
How to Build a Multi-Source News Aggregation Agent

Build a pipeline that runs 6 query bursts per day across 9 sources, deduplicates the results, and emits AI-edited articles. The pattern comes from a cybersecurity news build shared on r/IA_Italia.


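A post on r/IA_Italia documented a multi-source, AI-native cybersecurity news pipeline: 6 cron bursts per day, 9 sources, similarity-based deduplication, and Gemini as the editor. This tutorial reconstructs the pattern using Scavio for search. The code samples call Claude through the Anthropic SDK for the LLM steps, but any LLM client can take its place.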

Prerequisites

  • Python 3.10+
  • Scavio API key
  • Gemini or any LLM API key (the samples below use the Anthropic SDK, which reads ANTHROPIC_API_KEY from the environment)

Walkthrough

Step 1: Cron triggers (6 bursts/day)

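Spread the six runs across the day so each burst picks up fresh news.

Bash
# Open your crontab with: crontab -e
# Then add the entry below. It runs at 06:00, 10:00, 12:00, 15:00, 18:00, and 21:00.
# cron does not start in your project directory, so replace /path/to with the script's real location.
0 6,10,12,15,18,21 * * * /usr/bin/python /path/to/pipeline.py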

Step 2: LLM generates 6 query phrases

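Pass in the topics already covered today so the model does not repeat them.

Python
import anthropic
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def queries(covered):
    # Ask for one query per line so the reply is easy to split.
    msg = client.messages.create(model='claude-sonnet-4-6', max_tokens=300,
        messages=[{'role':'user','content':f'Generate 6 news queries about cybersecurity, one per line, with no numbering. Avoid: {covered}'}])
    # Drop blank lines and surrounding whitespace.
    return [q.strip() for q in msg.content[0].text.split('\n') if q.strip()]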
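
LLM replies do not always follow the requested format: they may include numbering, bullets, blank lines, or repeated phrases. The sketch below shows one way to normalize the raw text before it reaches the search step. `parse_queries` is a hypothetical helper, not part of the original pipeline, and the set of list markers it strips is an assumption about what a model might emit.

```python
import re

def parse_queries(text, limit=6):
    """Turn raw LLM output into a clean list of at most `limit` query strings."""
    cleaned = []
    for line in text.splitlines():
        # Strip list markers such as "1. ", "2) ", "- ", "* " or "• "
        q = re.sub(r'^\s*(?:\d+[.)]|[-*•])\s*', '', line).strip().strip('"')
        # Skip blanks and case-insensitive repeats
        if q and q.lower() not in (c.lower() for c in cleaned):
            cleaned.append(q)
    return cleaned[:limit]
```

Calling `parse_queries(msg.content[0].text)` in place of the plain split keeps the downstream steps from searching for empty or duplicated strings.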

Step 3: Parallel SERP across surfaces

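Scavio search returns Google News and organic results. The function below fetches news results for a single query.

Python
import os
import requests
API_KEY = os.environ['SCAVIO_API_KEY']

def news(q):
    resp = requests.post('https://api.scavio.dev/api/v1/search',
        headers={'x-api-key': API_KEY},
        json={'query': q, 'search_type': 'news'},
        timeout=30)  # fail fast instead of hanging a cron run
    resp.raise_for_status()  # surface HTTP errors such as a bad key
    return resp.json()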
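
The step title calls for parallel requests, while the function above handles one query at a time. A minimal sketch of the fan-out, using a thread pool from the standard library, might look like this. `fetch_all` is a hypothetical helper. It parallelizes across queries only; how the original build spread requests across its 9 sources is not described in the post summary, so that part is left out.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_all(queries, fetch, max_workers=6):
    """Run `fetch` for each query in parallel threads; return {query: result}."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {q: pool.submit(fetch, q) for q in queries}
        for q, fut in futures.items():
            try:
                results[q] = fut.result()
            except Exception as exc:
                # One failed request should not sink the whole burst
                print(f'query failed: {q!r}: {exc}')
    return results
```

With the functions from this tutorial, a burst would be `fetch_all(queries(covered), news)`. Threads suit this workload because each call spends most of its time waiting on the network.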

Step 4: Similarity-filter dedup

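Embed the titles and drop any new item that is a near-duplicate of something already published today.

Python
# Sketch using sentence-transformers; a hosted embedding API works the same way.
from sentence_transformers import SentenceTransformer, util
model = SentenceTransformer('all-MiniLM-L6-v2')

def dedup(new_items, published_today, threshold=0.85):
    if not new_items or not published_today:
        return list(new_items)  # nothing to compare against
    new_emb = model.encode([i['title'] for i in new_items], convert_to_tensor=True)
    pub_emb = model.encode([p['title'] for p in published_today], convert_to_tensor=True)
    # Highest cosine similarity of each new title to any published title
    max_sim = util.cos_sim(new_emb, pub_emb).max(dim=1).values
    # Keep items below the threshold; 0.85 or higher counts as a near-duplicate
    return [item for item, s in zip(new_items, max_sim) if s.item() < threshold]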
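
The step above compares new items against today's published set. Items inside a single burst can also duplicate each other, since several queries may surface the same story. The following dependency-free sketch shows the cosine-threshold idea applied within one batch. `cosine` and `drop_near_duplicates` are hypothetical helpers, and the greedy first-seen-wins policy is an assumption rather than something the original build is known to use.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def drop_near_duplicates(items, vectors, threshold=0.85):
    """Greedy pass: keep an item only if it is below `threshold` against every kept item."""
    kept, kept_vecs = [], []
    for item, vec in zip(items, vectors):
        if all(cosine(vec, k) < threshold for k in kept_vecs):
            kept.append(item)
            kept_vecs.append(vec)
    return kept
```

Here `vectors` holds one embedding per item, in the same order, from whichever embedding model you use. The pass is quadratic in the number of items, which is fine for the few dozen results a burst produces.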

Step 5: LLM edits article with editorial angle

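This step also covers internal links and SEO metadata. The prompt below passes related articles for internal linking; extend it if you want the model to return SEO fields as well.

Python
def article(item, related):
    # Reuses `client` from Step 2. `item` is one search result with 'title' and 'link' keys.
    msg = client.messages.create(model='claude-sonnet-4-6', max_tokens=600,
        messages=[{'role':'user','content':f'Write a 300-word news article about {item["title"]} with an editorial angle. Source: {item["link"]}. Related: {related}.'}])
    return msg.content[0].text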
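
One way to get SEO metadata out of the same call is to ask for a JSON object and validate the reply before publishing. The sketch below is an illustration only: the field names `headline`, `meta_description`, and `body`, and the assumption that each related item carries `title` and `link` keys, are hypothetical and not taken from the original build.

```python
import json

def build_prompt(item, related):
    """Prompt asking for the article plus SEO fields as one JSON object."""
    links = '\n'.join(f'- {r["title"]}: {r["link"]}' for r in related)
    return (
        f'Write a 300-word news article about {item["title"]} with an editorial angle.\n'
        f'Source: {item["link"]}\n'
        f'Link to these related articles where relevant:\n{links}\n'
        'Reply with only a JSON object with keys "headline", "meta_description", "body".'
    )

def parse_article(raw):
    """Parse the model reply; return None if it is not the expected JSON object."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    required = {'headline', 'meta_description', 'body'}
    return data if required <= data.keys() else None
```

Returning `None` on a malformed reply lets the caller skip or retry the item instead of publishing broken markup.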

Step 6: Throttle publication cadence

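Space out posts so the site is not flooded with a burst of articles at once.

Python
import time
# `articles` is the list of edited texts from Step 5; `publish` is your own CMS upload function.
for article_text in articles:
    publish(article_text)
    time.sleep(60 * 5)  # 5 min between posts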
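
The loop above also sleeps after the final post, which holds the process open for five extra minutes. A small variation, sketched here, pauses only between posts and takes the sleep function as a parameter so the logic can be tested without waiting. `publish_throttled` is a hypothetical helper, and `publish` stands in for whatever upload function your CMS provides.

```python
import time

def publish_throttled(articles, publish, delay_s=300, sleep=time.sleep):
    """Publish each article, pausing `delay_s` seconds between posts (not after the last)."""
    published = 0
    for i, text in enumerate(articles):
        if i > 0:
            sleep(delay_s)
        publish(text)
        published += 1
    return published
```

Keep the total in mind when scheduling: ten articles at five-minute spacing take 45 minutes, so a long queue can still be publishing when the next burst starts.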

Python Example

Python
# See steps above. Daily run cost: 6 bursts × 6 queries = 36 credits = $0.15.
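
The cost figure above can be unpacked with a few lines of arithmetic. This sketch assumes one credit per query, which is what the 36-credit total implies, and derives the per-credit price from the quoted $0.15 rather than from any published price list.

```python
BURSTS_PER_DAY = 6
QUERIES_PER_BURST = 6
DAILY_COST_USD = 0.15  # figure quoted above

credits_per_day = BURSTS_PER_DAY * QUERIES_PER_BURST  # assumes 1 credit per query
implied_usd_per_credit = DAILY_COST_USD / credits_per_day
monthly_cost_usd = DAILY_COST_USD * 30  # assumes a 30-day month

print(credits_per_day)
print(round(implied_usd_per_credit, 5))
print(round(monthly_cost_usd, 2))
```

At 36 credits per day, the 50 free signup credits cover one full day of bursts but not a second.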

JavaScript Example

JavaScript
// Same pattern in TS. Use the Vercel AI SDK or any LLM client for the edit step.

Expected Output

Expect about 20-30 published articles per day, deduplicated and written with an editorial angle, plus a daily recap at 11:30 PM covering the last 24 hours.
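
The recap needs the set of articles published in the trailing 24 hours. A minimal sketch of that selection follows. It assumes each published item is a dict with a `published_at` datetime, which is a hypothetical field name, and `recap_items` is likewise a hypothetical helper.

```python
from datetime import datetime, timedelta

def recap_items(published, now, window_hours=24):
    """Return items published within the last `window_hours`, newest first."""
    cutoff = now - timedelta(hours=window_hours)
    recent = [p for p in published if cutoff <= p['published_at'] <= now]
    return sorted(recent, key=lambda p: p['published_at'], reverse=True)
```

A separate cron entry at `30 23 * * *` could call this with `datetime.now()` and hand the result to the LLM for summarizing.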

Frequently Asked Questions

Most developers complete this tutorial in 15 to 30 minutes. You will need a Scavio API key (free tier works) and a working Python or JavaScript environment.

You need Python 3.10+, a Scavio API key, and a Gemini or other LLM API key. The free tier includes 50 credits on signup, which is more than enough to complete this tutorial and prototype a working solution.

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt to your framework of choice.
