How to Build an AI Content Grounding Pipeline

Build a pipeline that grounds LLM-generated content with verified search data. Reduce hallucinations by cross-referencing claims against live SERP results.

LLMs generate fluent text but often hallucinate statistics, dates, product details, and sources. Content grounding reduces this risk by running the LLM's assertions through a verification loop: extract factual claims from the generated text, search for each claim through a real-time search API, and flag or replace any claim that the search evidence does not support.

This tutorial builds a grounding pipeline that takes raw LLM output, extracts checkable claims, checks each one against Scavio search results, and produces an annotated version with citation URLs. The goal is to catch hallucinated numbers, outdated information, and fabricated sources before they reach production.

Prerequisites

  • Python 3.10+ installed
  • requests library installed
  • A Scavio API key from scavio.dev
  • An OpenAI API key (or any LLM API for claim extraction)
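
Before starting the walkthrough, it can help to confirm that both keys are visible to Python. The following is a minimal sketch, assuming the environment variable names used later in this tutorial (SCAVIO_API_KEY and OPENAI_API_KEY); the helper name missing_keys is hypothetical and not part of any SDK.

```python
import os

# Variable names match the ones read by the tutorial code.
REQUIRED_KEYS = ("SCAVIO_API_KEY", "OPENAI_API_KEY")

def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]

# Report what is missing without stopping the interpreter.
missing = missing_keys()
print("Missing keys:", ", ".join(missing) if missing else "none")
```

Passing a plain dict as env makes the check easy to exercise without touching the real environment.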

Walkthrough

Step 1: Extract factual claims from LLM output

Parse the generated text to identify statements that contain verifiable facts: numbers, dates, product names, company claims. Use a second LLM call to extract these as a list.

Python
import json
import os

import requests

SCAVIO_KEY = os.environ['SCAVIO_API_KEY']
OPENAI_KEY = os.environ['OPENAI_API_KEY']
SEARCH_ENDPOINT = 'https://api.scavio.dev/api/v1/search'
SEARCH_HEADERS = {'x-api-key': SCAVIO_KEY, 'Content-Type': 'application/json'}

# JSON mode returns an object, so ask for the array under a "claims" key.
EXTRACT_PROMPT = ('Extract all factual claims from the text. Return a JSON object with one key, '
    '"claims", whose value is an array of strings, each a single verifiable claim.')

def extract_claims(text):
    resp = requests.post('https://api.openai.com/v1/chat/completions',
        headers={'Authorization': f'Bearer {OPENAI_KEY}', 'Content-Type': 'application/json'},
        json={'model': 'gpt-4o', 'temperature': 0,
            'messages': [{'role': 'system', 'content': EXTRACT_PROMPT},
                {'role': 'user', 'content': text}],
            'response_format': {'type': 'json_object'}},
        timeout=60)
    resp.raise_for_status()  # fail loudly on auth or rate-limit errors
    content = resp.json()['choices'][0]['message']['content']
    return json.loads(content).get('claims', [])
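
Model output does not always arrive in the requested shape: the same prompt may yield a bare array, an object wrapping the array, or malformed text. The sketch below shows one way to normalise those cases before verification. The function name parse_claims is hypothetical, and the accepted shapes are an assumption about what an extractor might return, not documented behaviour of any API.

```python
import json

def parse_claims(raw):
    """Parse model output into a list of non-empty claim strings.

    Accepts either a bare JSON array or an object holding the array
    under a 'claims' key. Returns [] for anything else.
    """
    try:
        data = json.loads(raw)
    except (TypeError, ValueError):  # JSONDecodeError is a ValueError
        return []
    if isinstance(data, dict):
        data = data.get("claims", [])
    if not isinstance(data, list):
        return []
    # Keep only strings, trimmed, and drop blanks.
    return [c.strip() for c in data if isinstance(c, str) and c.strip()]
```

Returning an empty list on bad input lets the rest of the pipeline run as a no-op instead of raising midway through a batch.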

Step 2: Verify each claim against search results

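For each extracted claim, run a Scavio search query and check whether the snippets of the top results mention the claim's key terms. This keyword overlap is a coarse signal: a claim is marked SUPPORTED when the evidence shares vocabulary with it and UNVERIFIED otherwise, so it cannot detect an outright contradiction.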

Python
def verify_claim(claim):
    resp = requests.post(SEARCH_ENDPOINT, headers=SEARCH_HEADERS,
        json={'query': claim, 'country_code': 'us'}, timeout=30)
    resp.raise_for_status()
    results = resp.json().get('organic_results', [])[:5]
    snippets = [r['snippet'] for r in results if r.get('snippet')]
    sources = [r['link'] for r in results[:3] if r.get('link')]
    evidence = ' '.join(snippets).lower()
    # Coarse heuristic: supported if any word longer than 4 characters appears in the snippets.
    supported = any(word in evidence for word in claim.lower().split() if len(word) > 4)
    return {
        'claim': claim,
        'status': 'SUPPORTED' if supported else 'UNVERIFIED',
        'sources': sources,
        'evidence_preview': snippets[0][:200] if snippets else '',
    }
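
The keyword check above counts a claim as supported as soon as a single long word appears in the snippets, so a claim such as "FastAPI processes 10 million requests per second" passes whenever the evidence mentions FastAPI at all. A stricter variant, sketched below, scores the fraction of keywords and numbers that appear in the evidence and requires every number to match. This is an illustrative alternative, not Scavio's method; the names support_score and is_supported and the 0.6 threshold are assumptions.

```python
import re

NUMBER = r"\d+(?:\.\d+)?"

def support_score(claim, evidence, min_len=5):
    """Fraction of the claim's keywords and numbers found in the evidence."""
    claim_l, evidence_l = claim.lower(), evidence.lower()
    keywords = {w for w in re.findall(r"[a-z]+", claim_l) if len(w) >= min_len}
    numbers = set(re.findall(NUMBER, claim_l))
    ev_words = set(re.findall(r"[a-z]+", evidence_l))
    ev_numbers = set(re.findall(NUMBER, evidence_l))
    total = len(keywords) + len(numbers)
    if total == 0:
        return 0.0
    hits = len(keywords & ev_words) + len(numbers & ev_numbers)
    return hits / total

def is_supported(claim, evidence, threshold=0.6):
    """Require every number in the claim to appear, plus enough keyword overlap."""
    claim_numbers = set(re.findall(NUMBER, claim.lower()))
    evidence_numbers = set(re.findall(NUMBER, evidence.lower()))
    if not claim_numbers <= evidence_numbers:
        return False
    return support_score(claim, evidence) >= threshold
```

Matching whole tokens instead of substrings also avoids accidental hits, such as "second" matching inside "secondary". Neither version understands meaning, so a claim that passes still deserves a look at the cited source.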

Step 3: Build the grounded output with citations

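Annotate unverified claims in the original text and append source URLs as citations for supported claims. The annotation relies on an exact string match, so a claim that the extractor paraphrased is left unmarked in the text but still appears in the returned verification list.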

Python
def ground_content(raw_text):
    claims = extract_claims(raw_text)
    print(f'Extracted {len(claims)} claims to verify')
    verifications = []
    for claim in claims:
        result = verify_claim(claim)
        verifications.append(result)
        print(f"  [{result['status']}] {claim[:60]}")
    grounded = raw_text
    citations = []
    for v in verifications:
        if v['status'] == 'SUPPORTED' and v['sources']:
            citations.append(f"- {v['claim'][:80]}: {v['sources'][0]}")
        elif v['status'] == 'UNVERIFIED':
            # Exact match only: paraphrased claims are not annotated in the text.
            grounded = grounded.replace(v['claim'],
                f"{v['claim']} [UNVERIFIED - needs manual review]")
    if citations:
        grounded += '\n\nSources:\n' + '\n'.join(citations)
    supported = sum(v['status'] == 'SUPPORTED' for v in verifications)
    print(f'\n{supported}/{len(claims)} claims verified')
    cost = len(claims) * 0.005  # one search per claim
    print(f'Verification cost: ${cost:.3f} ({len(claims)} searches)')
    return grounded, verifications
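
Because the extractor often rewords a sentence into a standalone claim, an exact string replacement can miss the original sentence. One possible workaround, sketched here with the standard library's difflib, is to annotate the sentence that most closely resembles the claim. The helper names, the naive sentence splitter, and the 0.6 similarity cutoff are all assumptions chosen for illustration.

```python
import difflib
import re

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def annotate_claim(text, claim, note="[UNVERIFIED - needs manual review]", cutoff=0.6):
    """Append note to the sentence closest to claim.

    Returns the text unchanged when no sentence is similar enough.
    """
    sentences = split_sentences(text)
    match = difflib.get_close_matches(claim, sentences, n=1, cutoff=cutoff)
    if not match:
        return text
    # Annotate only the first occurrence of the matched sentence.
    return text.replace(match[0], f"{match[0]} {note}", 1)
```

For example, the claim "FastAPI processes 10 million requests per second" would attach the note to the sentence "It processes 10 million requests per second." even though the wording differs. A higher cutoff reduces the chance of tagging the wrong sentence at the cost of missing heavier paraphrases.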

Python Example

Python
import os

import requests

SCAVIO_KEY = os.environ['SCAVIO_API_KEY']
SH = {'x-api-key': SCAVIO_KEY, 'Content-Type': 'application/json'}

def search(query):
    resp = requests.post('https://api.scavio.dev/api/v1/search',
        headers=SH, json={'query': query, 'country_code': 'us'}, timeout=30)
    resp.raise_for_status()
    return resp.json()

def verify_claims(claims):
    results = []
    for claim in claims:
        organic = search(claim).get('organic_results', [])
        snippets = [r.get('snippet', '') for r in organic[:5]]
        sources = [r['link'] for r in organic[:3] if r.get('link')]
        evidence = ' '.join(snippets).lower()
        # Coarse check: any word longer than 4 characters found in the snippets.
        supported = any(w in evidence for w in claim.lower().split() if len(w) > 4)
        results.append({'claim': claim, 'ok': supported, 'sources': sources})
    return results

def ground(claims):
    verified = verify_claims(claims)
    for v in verified:
        tag = 'OK' if v['ok'] else 'UNVERIFIED'
        print(f'[{tag}] {v["claim"][:60]}')
    bad = [v for v in verified if not v['ok']]
    print(f'{len(verified) - len(bad)}/{len(verified)} claims verified')
    print(f'Cost: ${len(claims) * 0.005:.3f}')

claims = ['Python is the most popular programming language in 2026',
    'FastAPI processes 10 million requests per second']
ground(claims)

JavaScript Example

JavaScript
const SCAVIO_KEY = process.env.SCAVIO_API_KEY;
const SH = { 'x-api-key': SCAVIO_KEY, 'Content-Type': 'application/json' };

async function search(query) {
  return fetch('https://api.scavio.dev/api/v1/search', {
    method: 'POST', headers: SH, body: JSON.stringify({ query, country_code: 'us' })
  }).then(r => r.json());
}

async function verifyClaims(claims) {
  const results = [];
  for (const claim of claims) {
    const data = await search(claim);
    const snippets = (data.organic_results || []).slice(0, 5)
      .map(r => r.snippet || '').join(' ').toLowerCase();
    const sources = (data.organic_results || []).slice(0, 3).map(r => r.link);
    const supported = claim.toLowerCase().split(' ')
      .filter(w => w.length > 4).some(w => snippets.includes(w));
    results.push({ claim, ok: supported, sources });
  }
  return results;
}

async function ground(claims) {
  const results = await verifyClaims(claims);
  results.forEach(v => console.log(`[${v.ok ? 'OK' : 'UNVERIFIED'}] ${v.claim.slice(0, 60)}`));
  const verified = results.filter(v => v.ok).length;
  console.log(`${verified}/${results.length} claims verified`);
  console.log(`Cost: $${(claims.length * 0.005).toFixed(3)}`);
}

ground(['Python is the most popular language in 2026']).catch(console.error);

Expected Output

Text
Extracted 5 claims to verify
  [SUPPORTED] Python is the most popular programming language in 2026
  [UNVERIFIED] FastAPI processes 10 million requests per second
  [SUPPORTED] Django 5.2 was released in April 2026
  [SUPPORTED] OpenAI has over 200 million weekly active users
  [UNVERIFIED] Rust will replace Python by 2028

3/5 claims verified
Verification cost: $0.025 (5 searches)
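
The summary figures in the output above (3 of 5 claims supported, $0.025 for 5 searches) follow directly from the verification list. As a rough sketch, they could be computed in one place and returned as data instead of printed. The function name summarize is hypothetical; the $0.005 per-search figure is the one used in this tutorial's cost estimate.

```python
COST_PER_SEARCH = 0.005  # per-search price used in this tutorial's estimate

def summarize(verifications, cost_per_search=COST_PER_SEARCH):
    """Count supported and unverified claims and estimate the search cost."""
    total = len(verifications)
    supported = sum(1 for v in verifications if v["status"] == "SUPPORTED")
    return {
        "total": total,
        "supported": supported,
        "unverified": total - supported,
        "cost_usd": round(total * cost_per_search, 3),  # one search per claim
    }
```

Returning a dictionary makes it straightforward to log the result or fail a publishing step when the unverified count exceeds a limit you choose.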

Frequently Asked Questions

Most developers complete this tutorial in 15 to 30 minutes. You will need a Scavio API key (the free tier works) and a working Python or JavaScript environment.

The free tier includes 50 credits on signup, which is more than enough to complete this tutorial and prototype a working solution.

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt it to your framework of choice.
