Agent Discovery vs Extraction: Why Cost Split Matters

May 22, 2026
6 min read

Discovery via search API costs $0.005 per query. Extraction via scraper costs $0.05-0.50 per page. Running extraction on everything that passes discovery is the biggest cost mistake in agent pipelines. The fix is a relevance filter between the two stages that passes only 20-30% of discovered URLs to the extraction stage.

The Cost Asymmetry

Search API calls and scraping have fundamentally different cost structures:

  • Search API: structured result, no HTML parsing, no proxy management, $0.003-0.008/query
  • Scraper (Firecrawl, Jina, custom): full page content, HTML rendering, proxy handling, $0.05-0.50/page depending on anti-bot complexity

At about $0.005 per query, one scrape costs as much as 10 to 100 search API calls, so every unnecessary scrape is expensive. If your agent scrapes 50 pages when only 10 contained the relevant information, you paid 5x the necessary scrape cost.
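The arithmetic above can be checked in a few lines. This is a minimal sketch, assuming the flat per-call prices quoted in this post; the function names are illustrative and not part of any SDK.

```python
def searches_per_scrape(scrape_cost: float, search_cost: float = 0.005) -> float:
    """How many search calls one scrape call pays for."""
    return scrape_cost / search_cost


def wasted_scrape_spend(scraped: int, relevant: int, scrape_cost: float = 0.10) -> float:
    """Dollars spent scraping pages that did not contain relevant information."""
    return (scraped - relevant) * scrape_cost


# Low and high ends of the scrape price range quoted above.
for price in (0.05, 0.50):
    print(f"${price:.2f}/page buys {searches_per_scrape(price):.0f} searches")

# 50 pages scraped, 10 relevant, at an assumed $0.10 per page.
print(f"wasted: ${wasted_scrape_spend(50, 10):.2f}")
print(f"overspend factor: {50 / 10:.0f}x")
```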

Measuring Your Current Split

Before optimizing, measure:

Python
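class CostTracker:
    """Counts search and scrape calls and the dollars spent on each."""

    def __init__(self):
        self.search_calls = 0
        self.scrape_calls = 0
        self.search_cost = 0.0
        self.scrape_cost = 0.0

    def log_search(self, cost: float = 0.005) -> None:
        # Default is the typical per-query search price quoted above.
        self.search_calls += 1
        self.search_cost += cost

    def log_scrape(self, cost: float = 0.10) -> None:
        # Default is a mid-range per-page price; pass the real one if you know it.
        self.scrape_calls += 1
        self.scrape_cost += cost

    @property
    def scrape_ratio(self) -> float:
        # Share of all calls that are scrapes (0.0 when nothing has been logged).
        total = self.search_calls + self.scrape_calls
        return self.scrape_calls / total if total > 0 else 0.0

    @property
    def total_cost(self) -> float:
        return self.search_cost + self.scrape_cost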

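A scrape ratio above 0.5 means more than half of your calls are scrapes, which is worth investigating. The ratio also depends on how many results each query returns: with 10 results per query, scraping everything gives about 0.91, and a filter that passes 25% still leaves about 0.71. Compare against your own unfiltered baseline rather than treating 0.5 as a hard threshold.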
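To interpret a measured ratio, it helps to know what value to expect. The sketch below assumes each query returns a fixed number of results and that a fraction `pass_rate` of them is scraped. `expected_scrape_ratio` is a hypothetical helper that uses the same definition as `scrape_ratio` (scrapes divided by all calls).

```python
def expected_scrape_ratio(results_per_query: int, pass_rate: float = 1.0) -> float:
    """Scrapes / (searches + scrapes) when each search yields
    results_per_query URLs and pass_rate of them are scraped."""
    scrapes_per_search = results_per_query * pass_rate
    return scrapes_per_search / (1 + scrapes_per_search)


# Assumed 10 results per query, at several pass rates.
for pass_rate in (1.0, 0.5, 0.25, 0.1):
    ratio = expected_scrape_ratio(10, pass_rate)
    print(f"pass rate {pass_rate:.0%}: scrape ratio {ratio:.2f}")
```

A measured ratio close to the `pass_rate=1.0` line suggests nearly every discovered URL is being scraped.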

Adding the Relevance Filter

Between discovery and extraction, add an LLM call that judges relevance from titles and snippets alone:

Python
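def filter_for_extraction(task: str, search_results: list[dict]) -> list[str]:
    # Number each result so the model can answer with indices only.
    snippets = [
        f"{i}. [{r['title']}] {r['snippet']}"
        for i, r in enumerate(search_results, 1)
    ]

    prompt = f"""Task: {task}

Search results (title + snippet only):
{chr(10).join(snippets)}

Which result numbers are highly likely to contain primary source
information needed for the task? Return only numbers, comma-separated.
Be selective: only include results where the snippet strongly suggests
the page contains the specific information needed."""

    # `llm` stands for whichever client you use; complete() is assumed to return a string.
    response = llm.complete(prompt)

    # Tolerate a trailing period ("1, 3.") and skip repeated numbers.
    selected_nums = []
    for token in response.split(','):
        token = token.strip().rstrip('.')
        if token.isdigit() and int(token) not in selected_nums:
            selected_nums.append(int(token))

    # Ignore indices outside the list in case the model invents a number.
    return [search_results[i-1]['link'] for i in selected_nums
            if 1 <= i <= len(search_results)]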

This LLM call costs roughly $0.001 for 10 snippets on a small model such as Claude Haiku. It typically selects 2-3 URLs from 10 candidates, cutting scrape volume by 70-80%.
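The per-call figure is quoted for 10 snippets, so one way to keep each call near that size is to filter in batches. The sketch below assumes a `select` callable that takes a batch of results and returns the chosen links, for example a wrapper around `filter_for_extraction`. `batched_filter`, `chunks`, and the keyword stub are illustrative stand-ins; the stub is not a real LLM call.

```python
from typing import Callable, Iterator


def chunks(items: list[dict], size: int) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batched_filter(
    results: list[dict],
    select: Callable[[list[dict]], list[str]],
    batch_size: int = 10,
) -> list[str]:
    """Run `select` on each batch and merge the chosen links, dropping duplicates."""
    chosen: dict[str, None] = {}  # dict keeps insertion order
    for batch in chunks(results, batch_size):
        for link in select(batch):
            chosen.setdefault(link, None)
    return list(chosen)


def keyword_stub(batch: list[dict]) -> list[str]:
    """Stand-in for the LLM: keep results whose snippet mentions 'pricing'."""
    return [r["link"] for r in batch if "pricing" in r["snippet"].lower()]


# Made-up search results for demonstration only.
results = [
    {
        "title": f"Page {i}",
        "snippet": "Pricing table" if i % 4 == 0 else "About us",
        "link": f"https://example.com/{i}",
    }
    for i in range(25)
]
urls = batched_filter(results, keyword_stub)
print(len(urls), "of", len(results), "URLs selected")
```

Passing the selector in as a function also makes the filter stage testable without spending tokens.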

The Extraction Stage

Only scrape the filtered URLs:

Python
def run_pipeline(task: str, queries: list[str], tracker: CostTracker) -> list[dict]:
    # search_api and scraper stand for whichever clients you use; they are not defined here.
    all_results = []

    # Stage 1: Discovery (cheap: one search call per query)
    for query in queries:
        results = search_api.call(query)
        tracker.log_search()
        all_results.extend(results)

    # Stage 2: Filter (one LLM call over titles and snippets)
    urls_to_scrape = filter_for_extraction(task, all_results)

    # Stage 3: Extraction (expensive: only filtered URLs)
    extracted = []
    for url in urls_to_scrape:
        content = scraper.fetch(url)
        tracker.log_scrape()
        extracted.append({"url": url, "content": content})

    return extracted
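Overlapping queries often return the same URL more than once, and each duplicate that reaches the scraper is paid for again. The sketch below de-duplicates discovery results before the filter stage, assuming each result is a dict with a `link` key as in the code above. `dedupe_results` and the normalization rules are illustrative choices, not a standard.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lowercase scheme and host, drop the fragment and any trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def dedupe_results(results: list[dict]) -> list[dict]:
    """Keep the first result for each normalized link, preserving order."""
    seen: set[str] = set()
    unique = []
    for result in results:
        key = normalize_url(result["link"])
        if key not in seen:
            seen.add(key)
            unique.append(result)
    return unique


# Made-up results: the first two point at the same page.
all_results = [
    {"title": "A", "snippet": "...", "link": "https://Example.com/docs/"},
    {"title": "A again", "snippet": "...", "link": "https://example.com/docs#intro"},
    {"title": "B", "snippet": "...", "link": "https://example.com/blog?page=2"},
]
print(len(dedupe_results(all_results)), "unique of", len(all_results))
```

Calling `dedupe_results(all_results)` between Stage 1 and Stage 2 would also shrink the filter prompt.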

Real Cost Impact at Scale

For an agent running 100 research tasks per month, each with 10 queries and 10 results per query:

Without filter:

  • 1,000 search calls: $5
  • 10,000 scrape calls at $0.10: $1,000
  • Total: $1,005

With filter (25% pass rate):

  • 1,000 search calls: $5
  • 2,500 scrape calls at $0.10: $250
  • 100 filter LLM calls (one per task, 100 snippets each) at about $0.01: $1
  • Total: $256

The filter saves about $749/month on this workload, while the filter calls themselves cost about $1 in total.
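The figures above follow from a small cost model. The sketch below assumes flat per-call prices and a fixed pass rate. `monthly_cost` is a hypothetical helper, and the filter spend is passed in explicitly because it depends on how many snippets each LLM call carries.

```python
def monthly_cost(
    tasks: int,
    queries_per_task: int,
    results_per_query: int,
    pass_rate: float = 1.0,
    search_cost: float = 0.005,
    scrape_cost: float = 0.10,
    filter_spend: float = 0.0,
) -> dict[str, float]:
    """Break monthly spend into search, scrape, and filter components (USD)."""
    searches = tasks * queries_per_task
    scrapes = round(searches * results_per_query * pass_rate)
    search_total = searches * search_cost
    scrape_total = scrapes * scrape_cost
    return {
        "search": round(search_total, 2),
        "scrape": round(scrape_total, 2),
        "filter": round(filter_spend, 2),
        "total": round(search_total + scrape_total + filter_spend, 2),
    }


baseline = monthly_cost(100, 10, 10)
filtered = monthly_cost(100, 10, 10, pass_rate=0.25)  # filter spend left at 0 here
print("no filter:", baseline)
print("25% pass rate, before filter spend:", filtered)
print("scrape spend avoided:", round(baseline["scrape"] - filtered["scrape"], 2))
```

Changing `pass_rate` or `scrape_cost` shows how sensitive the saving is to each assumption.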

When to Skip the Filter

For targeted extraction where you already know which URLs contain relevant content (a curated list, a specific domain pattern, known documentation pages), skip the filter: you would be paying $0.001 for a decision you have already made. The filter earns its keep on broad research tasks where the relevant fraction of results is uncertain.
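The same reasoning gives a rough break-even check. The sketch below assumes you can estimate the fraction of candidate URLs that are actually relevant. `filter_pays_off` is a hypothetical helper, and it ignores filter mistakes such as relevant pages being dropped.

```python
def filter_pays_off(
    candidates: int,
    expected_relevant_fraction: float,
    scrape_cost: float = 0.10,
    filter_cost: float = 0.001,
) -> bool:
    """True when the scrapes the filter is expected to avoid cost more than the filter call."""
    avoided_scrapes = candidates * (1 - expected_relevant_fraction)
    return avoided_scrapes * scrape_cost > filter_cost


# Broad research: 10 candidates, roughly a quarter of them relevant.
print(filter_pays_off(10, 0.25))
# Curated list: every URL is already known to be relevant.
print(filter_pays_off(10, 1.0))
```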
