How to Scrape B2B Directories with n8n and a Search API

Build an n8n workflow that extracts company data from B2B directories via search API. No browser automation needed. Step-by-step tutorial.

B2B directories like Clutch, G2, and industry-specific listings are rich sources of company data for outbound sales. Instead of building fragile browser scrapers that break whenever a directory changes its layout, you can query a search API for directory listings and extract structured data from the SERP titles and snippets. There is no headless browser to run and no HTML selectors to maintain, and each search costs $0.005 via Scavio. This tutorial builds an n8n pipeline that queries directories, extracts company information, and outputs a clean lead list.

Prerequisites

  • n8n instance running (self-hosted or cloud)
  • A Scavio API key from scavio.dev
  • Target industry or niche to prospect
  • Google Sheets or CRM for output

Walkthrough

Step 1: Define directory search queries

Craft search queries that target B2B directory listings. The pattern is site:directory.com + niche, plus an optional location. The site: operator restricts results to pages on that directory's domain, so unrelated websites are excluded.

JavaScript
// n8n Code node to generate targeted queries:
const directories = [
  { name: 'clutch', query: 'site:clutch.co' },
  { name: 'g2', query: 'site:g2.com' },
  { name: 'goodfirms', query: 'site:goodfirms.co' }
];
const niches = ['marketing agency', 'web development company', 'IT consulting'];

const queries = [];
for (const dir of directories) {
  for (const niche of niches) {
    queries.push({
      json: {
        directory: dir.name,
        searchQuery: `${dir.query} ${niche}`,
        niche
      }
    });
  }
}
return queries; // 9 targeted directory queries
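
The Code node above combines directory and niche but leaves out the location term. A minimal standalone sketch of how a location could be added follows; `buildQueries`, the `domain` field, and the example cities are illustrative names chosen here, not part of n8n or the Scavio API.

```javascript
// Hypothetical helper: builds "site:<domain> <niche> <location>" queries.
function buildQueries(directories, niches, locations = ['']) {
  const queries = [];
  for (const dir of directories) {
    for (const niche of niches) {
      for (const location of locations) {
        // Join only the non-empty parts so an empty location adds no trailing space.
        const searchQuery = [`site:${dir.domain}`, niche, location]
          .filter(Boolean)
          .join(' ');
        queries.push({ directory: dir.name, niche, location, searchQuery });
      }
    }
  }
  return queries;
}

const queries = buildQueries(
  [{ name: 'clutch', domain: 'clutch.co' }, { name: 'g2', domain: 'g2.com' }],
  ['marketing agency'],
  ['Chicago', 'Austin']
);
console.log(queries.length);         // 4
console.log(queries[0].searchQuery); // site:clutch.co marketing agency Chicago
```

Inside an n8n Code node, each entry would be wrapped as `{ json: entry }` before being returned. Every added location multiplies the number of searches, so the run cost grows with it.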

Step 2: Execute search queries via HTTP Request

For each query, call the Scavio API to get directory listing results. Company names appear in the titles of the organic results, and the snippets often carry descriptions and ratings.

JSON
// n8n HTTP Request node:
// Method: POST
// URL: https://api.scavio.dev/api/v1/search
// Headers: x-api-key: {{ $env.SCAVIO_API_KEY }}
// Body:
{
  "query": "{{ $json.searchQuery }}",
  "country_code": "us"
}
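
Outside n8n, the same request can be assembled in plain JavaScript. The sketch below is one way to do it, using only the endpoint, header, and body fields shown above; `buildSearchRequest` is a hypothetical helper name. Serializing the body with `JSON.stringify` keeps the JSON valid when a query contains double quotes, for example an exact-phrase term.

```javascript
// Hypothetical helper mirroring the HTTP Request node settings above.
function buildSearchRequest(searchQuery, apiKey, countryCode = 'us') {
  return {
    url: 'https://api.scavio.dev/api/v1/search',
    options: {
      method: 'POST',
      headers: { 'x-api-key': apiKey, 'Content-Type': 'application/json' },
      // JSON.stringify escapes quotes inside the query string.
      body: JSON.stringify({ query: searchQuery, country_code: countryCode })
    }
  };
}

const req = buildSearchRequest('site:clutch.co "web development" agency', 'test-key');
console.log(JSON.parse(req.options.body).query);
// Usage (not executed here): const resp = await fetch(req.url, req.options);
```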

Step 3: Parse company data from SERP results

Extract company names, URLs, and descriptions from the organic results. Directory pages have predictable title formats that can be parsed.

JavaScript
// n8n Code node (mode: Run Once for All Items) to parse directory listings.
// Assumes the query-generating node from Step 1 is named "Code".
const companies = [];
$input.all().forEach((item, i) => {
  // Pair each HTTP response with the query item that produced it,
  // so directory and niche are not always taken from the first query.
  const source = $('Code').itemMatching(i).json;
  for (const r of item.json.organic_results || []) {
    // Clutch titles: "Company Name - Reviews, Cost & More"
    // G2 titles: "Company Name Reviews 2026"
    const name = (r.title || '').split(' - ')[0].split(' Reviews')[0].trim();
    if (name.length > 2 && name.length < 100) {
      companies.push({
        json: {
          name,
          url: r.link,
          description: r.snippet || '',
          directory: source.directory,
          niche: source.niche
        }
      });
    }
  }
});

return companies;
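
To see how the title parsing behaves on its own, the split logic can be pulled into a small function and run against sample titles. This is a sketch: `extractCompanyName` is a hypothetical name, and the sample titles simply follow the two formats noted in the comments above.

```javascript
// Hypothetical helper: strips directory-specific suffixes from a result title.
function extractCompanyName(title) {
  if (typeof title !== 'string') return null;
  const name = title
    .split(' - ')[0]       // Clutch style: "Name - Reviews, Cost & More"
    .split(' Reviews')[0]  // G2 style: "Name Reviews 2026"
    .trim();
  // Same length bounds as the filter in the Code node above.
  return name.length > 2 && name.length < 100 ? name : null;
}

console.log(extractCompanyName('WebFX - Reviews, Cost & More')); // WebFX
console.log(extractCompanyName('SmartSites Reviews 2026'));      // SmartSites
console.log(extractCompanyName('Top 10 - Best Agencies'));       // Top 10
console.log(extractCompanyName(undefined));                      // null
```

The third call shows a limitation: category or list pages pass the length check and come through as if they were company names, so it may be worth reviewing the output or filtering on URL patterns before using it.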

Step 4: Deduplicate and enrich with website search

Remove duplicate companies across directories and optionally enrich with a direct search for each company to find their actual website and contact info.

JavaScript
// Dedup in a Code node:
const seen = new Set();
const unique = [];
for (const item of $input.all()) {
  const key = item.json.name.toLowerCase();
  if (!seen.has(key)) {
    seen.add(key);
    unique.push(item);
  }
}
return unique;


Then enrich each company with a second search, using the same HTTP Request node settings as in Step 2:

JSON
// n8n HTTP Request node body:
{
  "query": "{{ $json.name }} company website contact",
  "country_code": "us"
}
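
The enrichment search returns a list of organic results, and one simple heuristic for finding the company's own site is to take the first result whose host is not one of the directories. The sketch below assumes results carry a `link` field as in Step 3; `pickCompanyWebsite` and `DIRECTORY_DOMAINS` are hypothetical names, and the heuristic itself is an assumption rather than a documented method.

```javascript
// Hypothetical helper: choose the first result that is not a directory page.
const DIRECTORY_DOMAINS = ['clutch.co', 'g2.com', 'goodfirms.co'];

function pickCompanyWebsite(organicResults, excluded = DIRECTORY_DOMAINS) {
  for (const r of organicResults || []) {
    let host;
    try {
      host = new URL(r.link).hostname.replace(/^www\./, '');
    } catch {
      continue; // skip results with a missing or malformed link
    }
    // Match the directory domain itself and any of its subdomains.
    const isDirectory = excluded.some(d => host === d || host.endsWith(`.${d}`));
    if (!isDirectory) return { domain: host, sourceUrl: r.link };
  }
  return null; // nothing but directory pages in the results
}

const sample = [
  { link: 'https://clutch.co/profile/example-agency' },
  { link: 'https://www.example.com/contact' }
];
console.log(pickCompanyWebsite(sample).domain); // example.com
```

Social profiles and news sites can still rank above the company's own domain, so the exclusion list may need extending for a given niche.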

Step 5: Export to Google Sheets with cost tracking

Write the cleaned leads to a Google Sheet. Add a cost column so you know exactly what the extraction run cost.

JavaScript
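// Final Code node before Google Sheets:
const COST_PER_SEARCH = 0.005; // USD per search
const DIRECTORY_QUERIES = 9;   // number of queries generated in Step 1
const items = $input.all();

// Billing is per search, not per lead: the directory searches plus
// one enrichment search per lead (if you ran the enrichment in Step 4).
const totalCost = (DIRECTORY_QUERIES + items.length) * COST_PER_SEARCH;

const leads = items.map(item => ({
  json: {
    ...item.json,
    extractedAt: new Date().toISOString(),
    // This lead's even share of the total run cost, in USD
    estimatedCost: (totalCost / items.length).toFixed(4)
  }
}));

console.log(`Extracted ${leads.length} leads, cost: $${totalCost.toFixed(3)}`);
return leads;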

// Google Sheets node: Append to "B2B Leads" sheet
// Total cost: 9 directory queries + ~50 enrichment queries = ~$0.30
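
The arithmetic behind that total can be checked with a short standalone function. This is a sketch that assumes every search is billed at the $0.005 rate quoted in this tutorial; `estimateRunCost` is a hypothetical name.

```javascript
const COST_PER_SEARCH_USD = 0.005; // per-search price quoted in this tutorial

function estimateRunCost(directoryQueries, enrichmentQueries = 0) {
  const searches = directoryQueries + enrichmentQueries;
  // Work in tenths of a cent (integers) to avoid floating-point drift.
  const tenthsOfCent = searches * Math.round(COST_PER_SEARCH_USD * 1000);
  return { searches, usd: tenthsOfCent / 1000 };
}

console.log(estimateRunCost(9, 50)); // { searches: 59, usd: 0.295 }
console.log(estimateRunCost(3));     // { searches: 3, usd: 0.015 }
```

The first call matches the n8n run above (59 searches, about $0.30), and the second matches a run with three directory queries and no enrichment.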

Python Example

Python
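import os
import time

import requests

API_KEY = os.environ['SCAVIO_API_KEY']

def search(query: str) -> dict:
    """POST one query to the search API and return the parsed JSON body."""
    resp = requests.post('https://api.scavio.dev/api/v1/search',
        headers={'x-api-key': API_KEY, 'Content-Type': 'application/json'},
        json={'query': query, 'country_code': 'us'},
        timeout=30)
    resp.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
    return resp.json()

def scrape_directory(directory_site: str, niche: str) -> list:
    """Search one directory for a niche and parse company names from result titles."""
    data = search(f'site:{directory_site} {niche}')
    companies = []
    for r in data.get('organic_results', []):
        # Strip Clutch-style (" - Reviews, ...") and G2-style (" Reviews 2026") suffixes
        name = r.get('title', '').split(' - ')[0].split(' Reviews')[0].strip()
        if not name:
            continue  # skip results without a usable title
        companies.append({'name': name, 'url': r.get('link', ''),
                         'snippet': r.get('snippet', ''), 'directory': directory_site})
    return companies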

def main():
    directories = ['clutch.co', 'g2.com', 'goodfirms.co']
    all_leads = []
    seen = set()
    for d in directories:
        companies = scrape_directory(d, 'marketing agency')
        for c in companies:
            if c['name'].lower() not in seen:
                seen.add(c['name'].lower())
                all_leads.append(c)
        time.sleep(0.3)
    print(f'Found {len(all_leads)} unique companies')
    for lead in all_leads[:5]:
        print(f'  {lead["name"]} ({lead["directory"]})')

if __name__ == '__main__':
    main()

JavaScript Example

JavaScript
const API_KEY = process.env.SCAVIO_API_KEY;

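async function search(query) {
  const resp = await fetch('https://api.scavio.dev/api/v1/search', {
    method: 'POST',
    headers: { 'x-api-key': API_KEY, 'Content-Type': 'application/json' },
    body: JSON.stringify({ query, country_code: 'us' })
  });
  // fetch does not reject on HTTP errors, so check the status explicitly.
  if (!resp.ok) throw new Error(`Search failed: HTTP ${resp.status}`);
  return resp.json();
}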

async function main() {
  const directories = ['clutch.co', 'g2.com', 'goodfirms.co'];
  const seen = new Set();
  const leads = [];
  for (const dir of directories) {
    const data = await search(`site:${dir} marketing agency`);
    for (const r of data.organic_results || []) {
      const name = r.title.split(' - ')[0].split(' Reviews')[0].trim();
      if (!seen.has(name.toLowerCase())) {
        seen.add(name.toLowerCase());
        leads.push({ name, url: r.link, directory: dir });
      }
    }
  }
  console.log(`Found ${leads.length} unique companies`);
  leads.slice(0, 5).forEach(l => console.log(`  ${l.name} (${l.directory})`));
}

main().catch(console.error);

Expected Output

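Text
Found 27 unique companies
  WebFX (clutch.co)
  Ignite Digital (clutch.co)
  SmartSites (g2.com)
  Disruptive Advertising (g2.com)
  Thrive Internet Marketing (goodfirms.co)

The names and counts are illustrative and depend on live search results. Leads are listed in the order the directories are queried.

Cost: 3 directory queries x $0.005 = $0.015
With enrichment (one extra search per company): 30 searches, ~$0.15 total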

Frequently Asked Questions

How long does this tutorial take?
Most developers complete this tutorial in 15 to 30 minutes.

What do I need to get started?
A running n8n instance (self-hosted or cloud), a Scavio API key from scavio.dev (50 free credits on signup), a target industry or niche to prospect, and Google Sheets or a CRM for output. The standalone examples also need a working Python or JavaScript environment.

Can I complete it on the free tier?
Yes. The free tier includes 50 credits on signup, which is more than enough to complete this tutorial and prototype a working solution.

Does Scavio work with other frameworks?
Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt it to your framework of choice.
