ScavioScavio
ProductPricingDocs
Sign InGet Started
  1. Home
  2. Tutorials
  3. How to Aggregate Multi-Source Data with a Search API
Tutorial

How to Aggregate Multi-Source Data with a Search API

Aggregate data from Google, Amazon, YouTube, Reddit, and Walmart in a single pipeline. Deduplicate, normalize, and merge results across platforms.

Get Free API KeyAPI Docs

Aggregate data from multiple search platforms into a unified dataset by querying Google, Amazon, YouTube, Reddit, and Walmart through a single API, normalizing the response fields, deduplicating overlapping results, and merging them into a combined output. Multi-source aggregation is essential for competitive analysis, market research, and product comparison workflows where no single platform provides a complete picture. A unified pipeline eliminates the need to manage separate API keys and response parsers for each platform.

Prerequisites

  • Python 3.8+ installed
  • requests library installed
  • A Scavio API key from scavio.dev
  • A query or topic to research across platforms

Walkthrough

Step 1: Query all platforms

Search the same query across all supported platforms in one pass.

Python
import os, requests, time

API_KEY = os.environ['SCAVIO_API_KEY']
PLATFORMS = ['google', 'amazon', 'youtube', 'reddit', 'walmart']

def search_all(query: str) -> dict:
    all_results = {}
    for platform in PLATFORMS:
        resp = requests.post('https://api.scavio.dev/api/v1/search',
            headers={'x-api-key': API_KEY},
            json={'platform': platform, 'query': query}, timeout=15)
        data = resp.json()
        all_results[platform] = data.get('organic_results', [])
        time.sleep(0.2)
    return all_results

results = search_all('wireless earbuds')
for platform, items in results.items():
    print(f'{platform}: {len(items)} results')

Step 2: Normalize response fields

Map platform-specific fields to a common schema for unified processing.

Python
def normalize_result(result: dict, platform: str) -> dict:
    """Normalize a result from any platform to a common schema."""
    return {
        'platform': platform,
        'title': result.get('title', ''),
        'url': result.get('link', result.get('url', '')),
        'description': result.get('snippet', result.get('description', '')),
        'price': result.get('price', ''),
        'rating': result.get('rating', ''),
        'thumbnail': result.get('thumbnail', ''),
        'source': result.get('source', result.get('channel', '')),
    }

def normalize_all(raw_results: dict) -> list:
    normalized = []
    for platform, items in raw_results.items():
        for item in items:
            normalized.append(normalize_result(item, platform))
    return normalized

normalized = normalize_all(results)
print(f'Total normalized results: {len(normalized)}')
for r in normalized[:3]:
    print(f"  [{r['platform']}] {r['title'][:50]}")

Step 3: Deduplicate results

Remove duplicate entries that appear across multiple platforms.

Python
from urllib.parse import urlparse

def deduplicate(results: list) -> list:
    seen_titles = set()
    seen_domains = {}
    deduped = []
    for r in results:
        title_key = r['title'].lower().strip()[:60]
        if title_key in seen_titles:
            continue
        seen_titles.add(title_key)
        domain = urlparse(r['url']).netloc if r['url'] else ''
        if domain and domain in seen_domains:
            existing = seen_domains[domain]
            if len(r['description']) > len(existing['description']):
                deduped.remove(existing)
                deduped.append(r)
                seen_domains[domain] = r
            continue
        if domain:
            seen_domains[domain] = r
        deduped.append(r)
    return deduped

deduped = deduplicate(normalized)
print(f'Before: {len(normalized)}, After: {len(deduped)} (removed {len(normalized) - len(deduped)} dupes)')

Step 4: Merge and rank

Merge results from all platforms and rank by relevance signals.

Python
def rank_results(results: list) -> list:
    for r in results:
        score = 0
        if r['description']:
            score += 1
        if r['price']:
            score += 1
        if r['rating']:
            score += 1
        if r['url']:
            score += 1
        if r['platform'] == 'google':
            score += 1  # Boost web results for general queries
        r['relevance_score'] = score
    return sorted(results, key=lambda x: x['relevance_score'], reverse=True)

ranked = rank_results(deduped)
for r in ranked[:5]:
    print(f"  [{r['platform']}] score={r['relevance_score']} {r['title'][:45]}")

Step 5: Export aggregated dataset

Save the merged, deduplicated, and ranked results as a single JSON dataset.

Python
import json

def export_aggregated(results: list, output_path: str):
    dataset = {
        'query': 'wireless earbuds',
        'platforms': PLATFORMS,
        'total_results': len(results),
        'platform_breakdown': {},
        'results': results,
    }
    for p in PLATFORMS:
        dataset['platform_breakdown'][p] = len([r for r in results if r['platform'] == p])
    with open(output_path, 'w') as f:
        json.dump(dataset, f, indent=2)
    print(f'Exported {len(results)} results to {output_path}')
    for p, count in dataset['platform_breakdown'].items():
        print(f'  {p}: {count}')

export_aggregated(ranked, 'aggregated_results.json')

Python Example

Python
import requests, os
H = {'x-api-key': os.environ['SCAVIO_API_KEY']}

def multi_search(query, platforms=None):
    platforms = platforms or ['google', 'amazon', 'youtube']
    all_results = []
    for p in platforms:
        data = requests.post('https://api.scavio.dev/api/v1/search', headers=H,
            json={'platform': p, 'query': query}).json()
        for r in data.get('organic_results', [])[:3]:
            all_results.append({'platform': p, 'title': r.get('title', ''), 'url': r.get('link', '')})
    return all_results

print(multi_search('wireless earbuds'))

JavaScript Example

JavaScript
const H = {'x-api-key': process.env.SCAVIO_API_KEY, 'Content-Type': 'application/json'};
async function multiSearch(query, platforms = ['google', 'amazon', 'youtube']) {
  const all = [];
  for (const p of platforms) {
    const r = await fetch('https://api.scavio.dev/api/v1/search', {
      method: 'POST', headers: H,
      body: JSON.stringify({platform: p, query})
    });
    const results = (await r.json()).organic_results || [];
    results.slice(0, 3).forEach(r => all.push({platform: p, title: r.title, url: r.link}));
  }
  return all;
}
multiSearch('wireless earbuds').then(console.log);

Expected Output

JSON
A unified dataset combining normalized, deduplicated, and ranked results from Google, Amazon, YouTube, Reddit, and Walmart for comprehensive multi-source analysis.

Related Tutorials

  • How to Build a Shopify Competitor Tracker API
  • How to Build a Multi-Marketplace Price Tracker

Frequently Asked Questions

Most developers complete this tutorial in 15 to 30 minutes. You will need a Scavio API key (free tier works) and a working Python or JavaScript environment.

Python 3.8+ installed. requests library installed. A Scavio API key from scavio.dev. A query or topic to research across platforms. A Scavio API key gives you 50 free credits on signup.

Yes. The free tier includes 50 credits on signup, which is more than enough to complete this tutorial and prototype a working solution.

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt to your framework of choice.

Related Resources

Best Of

Best Multi-Platform Data APIs for Agent Grounding in May 2026

Read more
Best Of

Best Multi-Platform Search APIs for Agencies in 2026

Read more
Use Case

AI Trading Multi-Source Data Aggregation

Read more
Use Case

Pi Coding Agent Multi-Platform Search

Read more
Glossary

Search API Provider Landscape (2026)

Read more
Solution

Collect Data from Multiple Sources Through One API

Read more

Start Building

Aggregate data from Google, Amazon, YouTube, Reddit, and Walmart in a single pipeline. Deduplicate, normalize, and merge results across platforms.

Get Free API KeyRead the Docs
ScavioScavio

Real-time search API for AI agents. Search every platform, not just Google.

Product

  • Features
  • Pricing
  • Dashboard
  • Affiliates

Developers

  • Documentation
  • API Reference
  • Quickstart
  • MCP Integration
  • Python SDK

Alternatives

  • Tavily Alternative
  • SerpAPI Alternative
  • Firecrawl Alternative
  • Exa Alternative

Tools

  • JSON Formatter
  • cURL to Code
  • Token Counter
  • All Tools

© 2026 Scavio. All rights reserved.

Featured on TAAFT
Terms of ServicePrivacy Policy