LLM hallucinations (fabricated pricing, invented features, outdated version numbers) erode user trust and can cause real harm in production. Automated detection verifies LLM-generated claims against live search results before the output reaches users. The approach: extract factual assertions, search for each claim, flag mismatches.
Why Search-Based Verification Works
Many hallucination detection methods rely on the model's own confidence signals, which are unreliable: a model can be confidently wrong. Search-based verification is external. If the LLM says "Tavily costs $50/mo" and a Google search for "Tavily pricing 2026" returns results saying "$30/mo", that is a verified mismatch regardless of the model's confidence.
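Here is a minimal sketch of the mismatch idea for pricing claims. It assumes prices appear in snippets in a `$30` or `$1,200.50` style. The approach extracts every dollar amount from the evidence and compares those amounts with the claimed price. The three-way result separates evidence that contradicts a claim from evidence that simply says nothing about price. The helper names (`parse_price`, `classify_price_claim`) are hypothetical. A "supported" result is weak when a page lists several plans, because the claimed price may belong to a different tier.

```python
import re

# Dollar amounts such as "$30", "$16.50", or "$1,200" (assumed snippet format).
PRICE_RE = re.compile(r"\$\d[\d,]*(?:\.\d+)?")


def parse_price(text: str) -> float:
    """Convert a "$1,200.50"-style string to a float."""
    return float(text.replace("$", "").replace(",", ""))


def classify_price_claim(claimed: str, evidence: str) -> str:
    """Return 'supported', 'contradicted', or 'unverified'.

    supported    -> the claimed price is among the prices in the evidence
    contradicted -> the evidence mentions prices, but none equal the claim
    unverified   -> the evidence mentions no prices at all
    """
    claimed_value = parse_price(claimed)
    found = {parse_price(m) for m in PRICE_RE.findall(evidence)}
    if not found:
        return "unverified"
    return "supported" if claimed_value in found else "contradicted"


print(classify_price_claim("$50", "Plans start at $30/mo."))  # prints contradicted
```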
The Verification Pipeline
```python
import os
import re

import requests

# Search API key, read from the environment.
H = {"x-api-key": os.environ["SCAVIO_API_KEY"]}

def verify_claim(claim_text, search_query, expected_value):
    """Verify a factual claim against live search results."""
    resp = requests.post("https://api.scavio.dev/api/v1/search", headers=H,
                         json={"platform": "google", "query": search_query},
                         timeout=10)
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
    r = resp.json()
    # Use the snippets of the top 3 organic results as evidence.
    snippets = " ".join(
        s.get("snippet", "") for s in r.get("organic", [])[:3])
    # Require that the expected value is not followed by more digits,
    # so "$30" does not count as found inside "$300" or "$30.99".
    pattern = re.escape(expected_value) + r"(?![\d.,]?\d)"
    verified = re.search(pattern, snippets, re.IGNORECASE) is not None
    return {
        "claim": claim_text,
        "verified": verified,
        "evidence": snippets[:300],
    }

claims = [
    {"text": "Tavily costs $30/mo",
     "query": "Tavily pricing 2026",
     "expected": "$30"},
    {"text": "Firecrawl Hobby plan is $16/mo",
     "query": "Firecrawl pricing 2026",
     "expected": "$16"},
]

for c in claims:
    result = verify_claim(c["text"], c["query"], c["expected"])
    status = "PASS" if result["verified"] else "FAIL"
    print(f"[{status}] {result['claim']}")
```
Cold Start for LLM Failure Data
Building a dataset of known LLM failures is a cold start problem. Instead of waiting for community contributions, treat search API results as ground truth and compare LLM outputs against them automatically. If the LLM says "library X has function Y" and a search of the current docs says otherwise, that is a verified failure data point. For building the initial dataset, automated verification scales better than manual crowdsourcing.
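One way to turn failed checks into that initial dataset is to append each one to a JSON Lines file, one record per line. The minimal sketch below is only one possible layout. The record fields, the `model` field, and the `unsupported_by_search` label are assumptions, not a standard schema. The label is deliberately cautious: a failed match means the top results did not confirm the claim, but the search may also have missed the right page, so records may deserve a spot check.

```python
import json
from datetime import datetime, timezone


def record_failure(path, claim, query, expected, evidence, model=None):
    """Append one failed verification to a JSONL file and return the record."""
    record = {
        "claim": claim,
        "query": query,
        "expected": expected,
        "evidence": evidence,
        "model": model,  # hypothetical field: which LLM produced the claim
        "label": "unsupported_by_search",  # hypothetical label name
        "checked_at": datetime.now(timezone.utc).isoformat(),
    }
    # Append mode lets repeated runs accumulate records in one file.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

Appending plain JSON lines keeps the dataset easy to grow incrementally and to load later, one record per line.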
What to Verify
- Pricing claims: any dollar amount attributed to a specific product or service
- Version numbers: any specific version (v3.2, 2.0.1) claimed for software
- Feature claims: "X supports Y" or "X integrates with Y"
- Date-sensitive facts: "as of 2026" or "released in Q1"
- Comparative claims: "X is faster/cheaper/better than Y"
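Before anything can be searched, checkable claims have to be pulled out of the response. The following crude sketch handles the first two categories with regular expressions. The patterns and the `extract_checkable_claims` name are hypothetical. Regexes only flag candidate sentences. They miss feature and comparative claims entirely and will also tag non-version decimals such as "3.5 seconds", so a model-based extractor would be needed for the other categories.

```python
import re

# Hypothetical patterns for pricing and version-number claims.
PRICE_RE = re.compile(r"\$\d[\d,]*(?:\.\d+)?")
# The lookbehind keeps decimals inside prices (e.g. "$16.50") from counting as versions.
VERSION_RE = re.compile(r"(?<![$\d.])\bv?\d+\.\d+(?:\.\d+)?\b")


def extract_checkable_claims(text):
    """Split text into sentences and tag those containing prices or versions."""
    claims = []
    # Naive sentence split: break after ., !, or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        kinds = []
        if PRICE_RE.search(sentence):
            kinds.append("pricing")
        if VERSION_RE.search(sentence):
            kinds.append("version")
        if kinds:
            claims.append({"text": sentence, "kinds": kinds})
    return claims
```

Each returned sentence still needs a search query and an expected value before it can be passed to `verify_claim`.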
The cost of verification is low: 1-3 search queries per claim at $0.005 per credit, assuming one credit per query. For a typical LLM response with 5 factual claims, verification costs $0.025-$0.075. Compare that with the cost of a customer making a decision based on a hallucinated pricing comparison.
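The estimate is simple arithmetic: claims × queries per claim × credits per query × price per credit. Here is a small sketch that assumes one credit per query, matching the figures above. The function name and its defaults are illustrative.

```python
def verification_cost(num_claims, queries_per_claim,
                      price_per_credit=0.005, credits_per_query=1):
    """Estimated search cost in dollars for verifying one LLM response."""
    return num_claims * queries_per_claim * credits_per_query * price_per_credit


low = verification_cost(5, 1)
high = verification_cost(5, 3)
print(f"${low:.3f}-${high:.3f}")  # prints $0.025-$0.075
```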