
How to Build a Content Pipeline with Live Data

Feed real SERP data, Reddit opinions, and product prices into your AI content pipeline. Stop generating AI slop.


AI content without live data is slop. Articles about 'best APIs' with fabricated pricing, and product comparisons with imaginary features, fail because the LLM invented the details. This tutorial builds a content pipeline that fetches real data before generating text: search results from Google, user opinions from Reddit, and product prices and ratings from Amazon. Because every fact in the brief comes from a live source, the generated article can be checked against that source.

Prerequisites

  • Python 3.10+
  • requests library installed
  • A Scavio API key from scavio.dev
  • An OpenAI API key for content generation

Walkthrough

Step 1: Build the multi-source data fetcher

Pull data from Google, Reddit, and Amazon via Scavio.

Python
import os, requests

SK = os.environ['SCAVIO_API_KEY']
OK = os.environ['OPENAI_API_KEY']
SH = {'x-api-key': SK, 'Content-Type': 'application/json'}
URL = 'https://api.scavio.dev/api/v1/search'

def search(payload):
    # POST one search request; fail loudly on HTTP errors instead of parsing an error body.
    resp = requests.post(URL, headers=SH, json=payload, timeout=30)
    resp.raise_for_status()
    return resp.json()

def fetch_google(q):
    # Top 5 organic Google results.
    data = search({'query': q, 'country_code': 'us'})
    return [{'title': r.get('title', ''), 'snippet': r.get('snippet', ''), 'url': r.get('link', '')}
            for r in data.get('organic_results', [])[:5]]

def fetch_reddit(q):
    # Top 5 Reddit threads for the same query.
    data = search({'query': q, 'platform': 'reddit', 'country_code': 'us'})
    return [{'title': r.get('title', ''), 'snippet': r.get('snippet', '')}
            for r in data.get('organic_results', [])[:5]]

def fetch_products(q):
    # Top 5 Amazon US listings with price and rating.
    data = search({'query': q, 'platform': 'amazon', 'marketplace': 'US'})
    return [{'title': p.get('title', ''), 'price': p.get('price', 'N/A'), 'rating': p.get('rating', '')}
            for p in data.get('products', [])[:5]]
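
The three fetchers differ only in the extra request fields they send and the response key they read. A minimal sketch of that idea as a lookup table, which can be checked offline; SOURCES, build_payload, and extract are hypothetical names introduced here, and the sample response is hand-written rather than real API output:

```python
# Per-source request fields and response keys, copied from the fetchers above.
SOURCES = {
    'google': {'extra': {'country_code': 'us'}, 'key': 'organic_results'},
    'reddit': {'extra': {'platform': 'reddit', 'country_code': 'us'}, 'key': 'organic_results'},
    'amazon': {'extra': {'platform': 'amazon', 'marketplace': 'US'}, 'key': 'products'},
}

def build_payload(source, q):
    """Return the JSON body for one source (hypothetical helper)."""
    return {'query': q, **SOURCES[source]['extra']}

def extract(source, data, limit=5):
    """Return the first `limit` result rows from a decoded response."""
    return data.get(SOURCES[source]['key'], [])[:limit]

# Offline check with a hand-written stand-in for an API response.
sample = {'organic_results': [{'title': 'A', 'link': 'https://example.com'}]}
print(build_payload('reddit', 'best headphones'))
print(extract('google', sample))
```

Adding a fourth platform would then mean adding one entry to the table, assuming the API accepts it through the same endpoint.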

Step 2: Assemble a research brief

Compile data from all sources into a structured brief for the LLM.

Python
def research(topic, product_query=None):
    brief = f'Topic: {topic}\n\n=== Google ===\n'
    g = fetch_google(topic)
    brief += '\n'.join(f"- {r['title']}: {r['snippet']}" for r in g)
    brief += '\n\n=== Reddit ===\n'
    r = fetch_reddit(topic)
    brief += '\n'.join(f"- {d['title']}: {d['snippet']}" for d in r)
    credits = 2
    if product_query:
        p = fetch_products(product_query)
        brief += '\n\n=== Amazon Products ===\n'
        brief += '\n'.join(f"- {x['title']}: {x['price']} ({x['rating']})" for x in p)
        credits += 1
    print(f'Research cost: ${credits * 0.005:.3f}')
    return brief
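
The cost line in research() is plain arithmetic: one credit each for Google and Reddit, one more when Amazon is queried, at $0.005 per credit. A rough sketch for planning a batch, assuming that rate and the 50 free signup credits mentioned in the FAQ; estimate_cost and articles_within_free_tier are hypothetical helpers, not part of any Scavio SDK:

```python
CREDIT_PRICE_USD = 0.005  # per-credit rate used in research() above

def credits_per_article(with_products=True):
    """Google + Reddit cost 2 credits; Amazon adds 1."""
    return 2 + (1 if with_products else 0)

def estimate_cost(articles, with_products=True):
    """Research cost in USD for a batch of articles."""
    return articles * credits_per_article(with_products) * CREDIT_PRICE_USD

def articles_within_free_tier(free_credits=50, with_products=True):
    """How many full research briefs fit in the free credits (assumed to be 50)."""
    return free_credits // credits_per_article(with_products)

print(f'100 articles: ${estimate_cost(100):.2f}')
print(f'Free tier covers {articles_within_free_tier()} briefs with Amazon data')
```

This covers Scavio research calls only; OpenAI token charges are billed separately.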

Step 3: Generate grounded content

Pass the brief to the LLM with strict instructions to only use provided data.

Python
def generate(topic, brief):
    # Ask the model for an article grounded only in the research brief.
    resp = requests.post('https://api.openai.com/v1/chat/completions',
        headers={'Authorization': f'Bearer {OK}', 'Content-Type': 'application/json'},
        json={'model': 'gpt-4o', 'temperature': 0.3, 'messages': [
            {'role': 'system', 'content': 'Write based ONLY on the research brief. No fabricated stats. '
                'If data is missing, say so. Cite Reddit as "users report". Start with a direct answer.'},
            {'role': 'user', 'content': f'Write 600 words about: {topic}\n\n{brief}'}]},
        timeout=120)
    resp.raise_for_status()  # surface auth and rate-limit errors instead of a KeyError below
    return resp.json()['choices'][0]['message']['content']

brief = research('best noise canceling headphones 2026', 'noise canceling headphones')
article = generate('best noise canceling headphones 2026', brief)
print(article[:300])
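
Indexing straight into ['choices'][0] raises an unhelpful KeyError when the API returns an error body. A small sketch of a more defensive parse, assuming the usual Chat Completions shapes (a top-level 'error' object with a 'message' on failure, a 'choices' list on success); extract_content is a hypothetical helper name:

```python
def extract_content(payload):
    """Return the generated text, or raise with a readable message."""
    if 'error' in payload:
        message = payload['error'].get('message', 'unknown error')
        raise RuntimeError(f'OpenAI API error: {message}')
    choices = payload.get('choices') or []
    if not choices:
        raise RuntimeError('OpenAI response contained no choices')
    return choices[0]['message']['content']

# Offline check with hand-written payloads.
ok = {'choices': [{'message': {'content': 'Grounded article text'}}]}
print(extract_content(ok))
```

In generate(), the last line could then become return extract_content(resp.json()).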

Step 4: Validate prices in generated content

Check that dollar amounts in the article appear in source data.

Python
import re

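# Matches $298, $1,299, and $49.99 without swallowing trailing punctuation.
PRICE = r'\$\d+(?:,\d{3})*(?:\.\d+)?'

def validate(content, brief):
    # Compare whole price tokens so that $29 is not "verified" by $298 in the brief.
    source_prices = set(re.findall(PRICE, brief))
    prices = re.findall(PRICE, content)
    issues = [p for p in prices if p not in source_prices]
    if issues:
        print(f'WARNING: {len(issues)} unverified prices: {issues}')
    else:
        print('All prices verified against source data.')
    return issues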

validate(article, brief)
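
The check above compares price strings literally, so a price written as $298.00 in the article will not match $298 in the brief. One possible refinement, sketched here under the assumption that all prices are in US dollars, is to compare numeric values instead; extract_prices and unverified_prices are hypothetical names:

```python
import re
from decimal import Decimal

# Dollar amounts such as $298, $1,299, or $49.99; trailing punctuation is excluded.
PRICE_RE = re.compile(r'\$\s?(\d+(?:,\d{3})*(?:\.\d+)?)')

def extract_prices(text):
    """Return the set of dollar amounts in `text` as Decimal values."""
    return {Decimal(m.replace(',', '')) for m in PRICE_RE.findall(text)}

def unverified_prices(content, brief):
    """Amounts that appear in the article but nowhere in the brief."""
    return sorted(extract_prices(content) - extract_prices(brief))

# Hand-written example strings, not real pipeline output.
brief_text = '- Example Headphones: $298.00 (4.6)'
print(unverified_prices('Priced at $298.', brief_text))
print(unverified_prices('Priced at $349 today.', brief_text))
```

Decimal avoids float rounding and treats 298 and 298.00 as equal, so formatting differences no longer trigger warnings.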

Python Example

Python
import os, requests

SK = os.environ['SCAVIO_API_KEY']
SH = {'x-api-key': SK, 'Content-Type': 'application/json'}

def research(topic):
    g = requests.post('https://api.scavio.dev/api/v1/search',
        headers=SH, json={'query': topic, 'country_code': 'us'}).json().get('organic_results', [])[:3]
    r = requests.post('https://api.scavio.dev/api/v1/search',
        headers=SH, json={'query': topic, 'platform': 'reddit', 'country_code': 'us'}).json().get('organic_results', [])[:3]
    print(f'{len(g)} Google + {len(r)} Reddit results. Cost: $0.010')
    return g, r

research('best serp api 2026')

JavaScript Example

JavaScript
const SK = process.env.SCAVIO_API_KEY;
const SH = { 'x-api-key': SK, 'Content-Type': 'application/json' };

async function research(topic) {
  const [g, r] = await Promise.all([
    fetch('https://api.scavio.dev/api/v1/search', {
      method: 'POST', headers: SH,
      body: JSON.stringify({ query: topic, country_code: 'us' })
    }).then(r => r.json()),
    fetch('https://api.scavio.dev/api/v1/search', {
      method: 'POST', headers: SH,
      body: JSON.stringify({ query: topic, platform: 'reddit', country_code: 'us' })
    }).then(r => r.json()),
  ]);
  console.log(`${(g.organic_results||[]).length}G + ${(r.organic_results||[]).length}R. Cost: $0.010`);
}
research('best serp api 2026').catch(console.error);

Expected Output

Text
Research cost: $0.015

The Sony WH-1000XM5 remains the top noise canceling headphone
in 2026, priced at $298 on Amazon with a 4.6 rating. Users on
Reddit report the XM5 noise cancellation outperforms Bose QC
Ultra in airplane environments...

All prices verified against source data.


Frequently Asked Questions

Most developers complete this tutorial in 15 to 30 minutes. You will need a Scavio API key (free tier works) and a working Python or JavaScript environment.

You need Python 3.10+, the requests library, a Scavio API key from scavio.dev, and an OpenAI API key for content generation.

The free tier includes 50 credits on signup, which is more than enough to complete this tutorial and prototype a working solution.

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt to your framework of choice.
