Per-Token Search Billing Makes Agent Costs Unpredictable
Per-token billing on search APIs like Perplexity Sonar and Exa contents makes agent costs unpredictable because cost follows result length, and result length is decided at run time. Either the agent chooses how much content to pull from each search, or the search API's own model chooses how much to return, and you only see those decisions on your bill. Credit-per-query pricing makes agent costs predictable because cost does not depend on result length.
How Per-Token Search Billing Works
Perplexity Sonar charges per token on both input and output. The output is the answer the model synthesizes from the web content it retrieves. If an agent asks a complex question, Perplexity pulls more web sources and generates a longer response, which means more tokens and a higher cost. The agent does not control this; Perplexity's model decides how much content to include.
Exa's content retrieval similarly charges per token for the content returned. A request for webpage contents that returns 5,000 tokens costs 5x more than one returning 1,000 tokens. The agent cannot predict how long the retrieved content will be.
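Here is a minimal sketch of per-token billing. It assumes a hypothetical flat rate of $5 per million returned tokens; real providers publish their own rates and may add per-request fees. It shows the point above: cost scales linearly with the number of tokens that come back, so 5,000 tokens costs five times as much as 1,000.

```python
def per_token_cost(output_tokens: int, output_price_per_m: float,
                   input_tokens: int = 0, input_price_per_m: float = 0.0) -> float:
    """Dollar cost of one request billed per token; prices are $ per million tokens."""
    return (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000


# Hypothetical rate for illustration only; check the provider's current pricing.
PRICE_PER_M = 5.0

short_cost = per_token_cost(1_000, PRICE_PER_M)
long_cost = per_token_cost(5_000, PRICE_PER_M)
print(f"1,000 tokens returned: ${short_cost:.3f}")
print(f"5,000 tokens returned: ${long_cost:.3f}")
print(f"ratio: {long_cost / short_cost:.1f}x")
```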
The Variance Problem
Consider an agent running 1,000 searches in a month, split between simple queries ("current Bitcoin price") and complex research queries ("explain the implications of the EU AI Act for small software companies"). The simple queries return short responses; the complex ones return long syntheses.
With per-token billing (illustrative rate of $5 per million output tokens):
- Simple query: 200 tokens output = $0.001
- Complex query: 2,000 tokens output = $0.01
If the ratio of complex to simple queries shifts — because users increasingly use the agent for research rather than lookups — your costs can double without your query count changing.
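You can check the doubling claim with the per-query costs above. This sketch assumes 1,000 queries a month and hypothetical complex-query shares of 10% and 32%. Only the mix changes between the two runs.

```python
SIMPLE_COST = 0.001   # $ per simple query (200 output tokens, from the example above)
COMPLEX_COST = 0.01   # $ per complex query (2,000 output tokens)


def monthly_per_token_cost(num_queries: int, complex_share: float) -> float:
    """Monthly cost when a fraction `complex_share` of queries are complex."""
    complex_queries = num_queries * complex_share
    simple_queries = num_queries - complex_queries
    return simple_queries * SIMPLE_COST + complex_queries * COMPLEX_COST


# Hypothetical mixes: same query count, different share of research queries.
before = monthly_per_token_cost(1_000, complex_share=0.10)
after = monthly_per_token_cost(1_000, complex_share=0.32)
print(f"10% complex: ${before:.2f}")
print(f"32% complex: ${after:.2f}")
```

Moving from 10% to 32% complex queries takes the bill from $1.90 to $3.88, at the same query count.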
With credit-per-query billing:
- Simple query: 1 credit = $0.005
- Complex query: 1 credit = $0.005
Your cost is purely a function of query count, which you can measure and predict.
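For contrast, here is a sketch of the same budget under credit-per-query billing, at the $0.005 credit price used above. The `complex_share` parameter is a hypothetical argument kept only to show that it has no effect on the result.

```python
CREDIT_PRICE = 0.005  # $ per query, the credit price from the example above


def monthly_credit_cost(num_queries: int, complex_share: float) -> float:
    """Monthly cost under credit-per-query billing.

    complex_share is accepted for comparison with the per-token version,
    but it is deliberately unused: every query costs one credit.
    """
    return num_queries * CREDIT_PRICE


for share in (0.10, 0.32, 0.90):
    print(f"{share:.0%} complex: ${monthly_credit_cost(1_000, share):.2f}")
```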
Practical Budget Planning
For budget planning with per-token APIs:
```python
# You cannot predict this reliably
num_queries = 1_000               # measurable from logs
avg_tokens_per_query = 380        # hypothetical: unknown in advance and varies by query type
token_price = 5 / 1_000_000       # hypothetical $ per output token; input tokens are priced differently
estimated_monthly_cost = num_queries * avg_tokens_per_query * token_price
# result: wide confidence interval on actual cost
```
For budget planning with credit-based APIs:
```python
# This is exact for a known query count
num_queries = 1_000     # measurable from logs
credit_price = 0.005    # $0.005 (Scavio), $0.008 (Tavily)
estimated_monthly_cost = num_queries * credit_price
# result: narrow confidence interval
```
Where Per-Token Billing Is Justified
Per-token billing makes sense when result length is predictably short and consistent, or when the synthesis itself is what you are paying for:
- Factual lookup queries that always return brief answers
- Structured data extraction where you control the output format
- Use cases where you need the synthesis built into the response (Perplexity's prose answers)
Per-token billing becomes expensive relative to credit-based billing when:
- Queries vary widely in complexity
- The API generates long synthesized answers you then re-process with your own LLM anyway (paying twice)
- Agents autonomously decide query complexity
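One way to choose between the two models is to find the break-even result length. This is the average number of output tokens per query at which per-token billing costs the same as one credit. The sketch below uses the illustrative $5-per-million-token and $0.005-per-credit rates from the variance example. It ignores input tokens and any per-request fees.

```python
def break_even_tokens(credit_price: float, output_price_per_m: float) -> float:
    """Average output tokens per query at which per-token and per-query billing cost the same."""
    return credit_price * 1_000_000 / output_price_per_m


def cheaper_billing(avg_output_tokens: float, credit_price: float,
                    output_price_per_m: float) -> str:
    """Name the billing model that is cheaper for a given average result length."""
    threshold = break_even_tokens(credit_price, output_price_per_m)
    return "per-token" if avg_output_tokens < threshold else "per-query credits"


# Illustrative rates from the variance example.
print(break_even_tokens(0.005, 5.0))
print(cheaper_billing(200, 0.005, 5.0))
print(cheaper_billing(2_000, 0.005, 5.0))
```

At these rates the break-even point is 1,000 output tokens per query. Below that, per-token billing is cheaper; above it, credits are. The catch, as argued above, is that an agent's average can drift across that line without your query count changing.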
The Double-Processing Problem
This is the hidden cost of per-token search APIs in agent pipelines. If your agent:
- Calls Perplexity Sonar to get a synthesized answer (pay per token for synthesis)
- Passes that synthesized answer to Claude for further reasoning (pay again for the input tokens)
You are paying for two rounds of LLM processing: Perplexity's synthesis, then Claude reading that synthesis as input. With a credit-based SERP API that returns raw results, you pay a flat price for the search and then once for your own LLM reasoning, and the SERP call is cheaper.
```python
# perplexity, claude, and serp_api are placeholder clients, not real SDK objects
# Inefficient: per-token synthesis + LLM reasoning
perplexity_answer = perplexity.search(query)        # pay per token for synthesis
claude_analysis = claude.reason(perplexity_answer)  # pay again for those tokens as input
# Efficient: raw results + LLM reasoning
search_results = serp_api.search(query)             # pay for search only
claude_analysis = claude.reason(search_results)     # pay once for LLM reasoning
```
The efficient version drops the paid synthesis step, which usually lowers cost, and gives you more control over the reasoning step.
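A small cost model makes the double-processing trade-off concrete. Every rate and token count here is hypothetical and chosen only for illustration:

- a per-token search API at $5 per million output tokens, returning a 2,000-token synthesis
- a credit-based SERP API at $0.005 per query, returning 3,000 tokens of raw results
- a downstream reasoning model at $3 per million input tokens and $15 per million output tokens

The model ignores the search API's input tokens and any per-request fees.

```python
def token_cost(input_tokens: int, output_tokens: int,
               input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one per-token call; prices are $ per million tokens."""
    return (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000


# All rates and token counts below are hypothetical, for illustration only.
SEARCH_OUT_PER_M = 5.0                   # per-token search API, output tokens
SERP_CREDIT = 0.005                      # credit-based SERP API, $ per query
LLM_IN_PER_M, LLM_OUT_PER_M = 3.0, 15.0  # downstream reasoning model

# Pipeline 1: pay per token for a 2,000-token synthesis, then the LLM reads it.
synthesis_pipeline = (token_cost(0, 2_000, 0.0, SEARCH_OUT_PER_M)
                      + token_cost(2_000, 500, LLM_IN_PER_M, LLM_OUT_PER_M))

# Pipeline 2: pay one credit for 3,000 tokens of raw results, then the LLM reads them.
raw_pipeline = SERP_CREDIT + token_cost(3_000, 500, LLM_IN_PER_M, LLM_OUT_PER_M)

print(f"per-token synthesis + LLM: ${synthesis_pipeline:.4f}")
print(f"credit SERP + LLM:         ${raw_pipeline:.4f}")
```

With these numbers the raw-results pipeline comes out slightly cheaper ($0.0215 vs $0.0235), even though the LLM reads more input tokens, because it skips the paid synthesis step. If your raw results are much longer than the synthesized answer, the balance can tip the other way, so plug in token counts from your own logs.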