
How to Build a Job Listing Aggregator Without Scraping

Build a job search aggregator using search APIs instead of fragile scrapers. Find listings from Google-indexed job boards and Reddit hiring threads.


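Job boards such as Indeed and LinkedIn actively fight scrapers, so scrapers aimed at them need constant repair as anti-bot defenses change. Many listings are also indexed by Google, including the public job pages that companies host on applicant tracking systems such as Greenhouse, Lever, and Ashby. Instead of maintaining scrapers, this tutorial searches Google for indexed job listings and Reddit for who-is-hiring threads, which gives you an aggregator that keeps working through anti-bot updates.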

Prerequisites

  • Python 3.8+ with the requests library installed
  • A Scavio API key, exported as the SCAVIO_API_KEY environment variable
  • Basic knowledge of job listing data

Walkthrough

Step 1: Search Google for job listings

Use site: operators joined by OR to restrict Google results to job boards hosted on Greenhouse, Lever, and Ashby, then keep each result's title, link, and snippet.

Python
import os
import requests

# Scavio reads the API key from the x-api-key header.
H = {'x-api-key': os.environ['SCAVIO_API_KEY'], 'Content-Type': 'application/json'}

def find_jobs(role: str, location: str = '') -> list:
    # site: operators joined by OR limit results to Greenhouse, Lever, and Ashby job boards.
    query = f'{role} {location} site:greenhouse.io OR site:lever.co OR site:jobs.ashbyhq.com'
    resp = requests.post('https://api.scavio.dev/api/v1/search', headers=H,
        json={'platform': 'google', 'query': query}, timeout=10)
    resp.raise_for_status()  # fail loudly on non-2xx responses instead of silently returning []
    return [{'title': r.get('title', ''), 'url': r.get('link', ''), 'snippet': r.get('snippet', '')}
            for r in resp.json().get('organic', [])]
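
The list of job boards is hard-coded in the query string. If you want to add or drop boards without editing that string, a minimal sketch is a small query builder. The helper name build_site_query and the DEFAULT_SITES tuple are hypothetical, not part of the Scavio API; the domains are the same three used in find_jobs.

```python
# Hypothetical default: the same three boards used in find_jobs above.
DEFAULT_SITES = ('greenhouse.io', 'lever.co', 'jobs.ashbyhq.com')

def build_site_query(role: str, location: str = '', sites: tuple = DEFAULT_SITES) -> str:
    """Build a Google query limited to the given job-board domains (hypothetical helper)."""
    # OR-join one site: operator per domain, matching the hard-coded query.
    site_filter = ' OR '.join(f'site:{domain}' for domain in sites)
    # Skip empty parts so a missing location does not leave a double space.
    return ' '.join(part for part in (role.strip(), location.strip(), site_filter) if part)

print(build_site_query('senior python developer', 'remote'))
```

Inside find_jobs, the query line would then become query = build_site_query(role, location).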

Step 2: Add Reddit hiring thread search

Search Reddit for posts about the role, then keep only threads whose titles mention hiring or jobs.

Python
def find_reddit_jobs(role: str) -> list:
    resp = requests.post('https://api.scavio.dev/api/v1/search', headers=H,
        json={'platform': 'reddit', 'query': f'{role} hiring'}, timeout=10)
    resp.raise_for_status()
    threads = resp.json().get('organic', [])
    results = []
    for t in threads:
        title = t.get('title', '')
        # Keep only posts whose title suggests an opening ("hiring", "job", "jobs").
        if 'hiring' in title.lower() or 'job' in title.lower():
            results.append({'title': title, 'url': t.get('link', ''),
                            'subreddit': t.get('subreddit', ''), 'score': t.get('score', 0)})
    return results
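
The title filter uses plain substring checks, so 'job' also matches words like "jobless", and a "[For Hire]" post from a candidate that mentions "jobs" passes as an opening. A minimal sketch of a stricter filter, assuming whole-word matching is what you want: is_hiring_post and both regular expressions are hypothetical, and the "for hire" exclusion assumes the bracketed-tag convention that some hiring subreddits use.

```python
import re

# Whole-word match, so "jobless" or "jobsite" no longer count as hits.
HIRING_RE = re.compile(r'\b(hiring|jobs?)\b', re.IGNORECASE)
# Posts from people offering their services (e.g. "[For Hire]") are not openings.
FOR_HIRE_RE = re.compile(r'\bfor\s+hire\b', re.IGNORECASE)

def is_hiring_post(title: str) -> bool:
    """Return True if a Reddit post title looks like a job opening (hypothetical helper)."""
    return bool(HIRING_RE.search(title)) and not FOR_HIRE_RE.search(title)

for title in ['[Hiring] Senior Python Developer (Remote)',
              '[For Hire] Python dev looking for jobs',
              'Feeling jobless after layoffs']:
    print(title, '->', is_hiring_post(title))  # True, False, False
```

To use it, replace the substring condition in find_reddit_jobs with is_hiring_post(t.get('title', '')).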

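Step 3: Merge and deduplicate

Combine both sources and drop duplicate URLs, keeping the first occurrence. Google listings come first, so the result follows source order rather than a relevance score.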

Python
def aggregate_jobs(role: str, location: str = '') -> dict:
    google_jobs = find_jobs(role, location)
    reddit_jobs = find_reddit_jobs(role)

    # Keep the first occurrence of each URL; Google results are processed first.
    seen_urls = set()
    unique_jobs = []
    for job in google_jobs + reddit_jobs:
        url = job.get('url', '')
        if url and url not in seen_urls:
            seen_urls.add(url)
            unique_jobs.append(job)

    return {
        'role': role,
        'location': location,
        'total_found': len(unique_jobs),
        # Per-source counts are taken before deduplication.
        'from_google': len(google_jobs),
        'from_reddit': len(reddit_jobs),
        'listings': unique_jobs,
    }

results = aggregate_jobs('senior python developer', 'remote')
print(f"Found {results['total_found']} unique listings")
for job in results['listings'][:5]:
    print(f"  - {job['title']}")

Step 4: Schedule daily updates

Run the aggregator once a day, for example from cron, and keep a history file of seen URLs so each run reports only listings it has not seen before.

Python
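import json
from pathlib import Path
from datetime import date

def daily_job_check(roles: list, location: str = '') -> dict:
    today = date.today().isoformat()
    all_listings = []

    for role in roles:
        results = aggregate_jobs(role, location)
        all_listings.extend(results['listings'])

    # Load URLs recorded by previous runs
    history_file = Path('job_history.json')
    seen = set()
    if history_file.exists():
        seen = set(json.loads(history_file.read_text()).get('urls', []))

    # A listing is new if it has a URL, was not seen in an earlier run,
    # and was not already reported for another role in this run.
    new_listings = []
    for job in all_listings:
        url = job.get('url')
        if url and url not in seen:
            seen.add(url)
            new_listings.append(job)

    # Persist the updated history (sorted so the file diffs cleanly between runs)
    history_file.write_text(json.dumps({'urls': sorted(seen), 'last_run': today}, indent=2))

    return {'date': today, 'new_listings': len(new_listings), 'total_tracked': len(seen), 'new': new_listings}

if __name__ == '__main__':
    # Example cron entry for a daily 08:00 run: 0 8 * * * python job_aggregator.py
    report = daily_job_check(['senior python developer'], 'remote')
    print(f"{report['new_listings']} new listings today")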
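
Both the Step 3 deduplication and the history check compare raw URL strings, so the same posting reached through a link with tracking parameters, a trailing slash, or different host capitalization counts as two listings. A minimal sketch of URL normalization, assuming that utm_* parameters plus a small hand-picked set are safe to drop: normalize_url and TRACKING_PARAMS are hypothetical, and you should check the parameter list against the boards you actually see.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameters to drop, in addition to any utm_* parameter.
# Keep parameters that may identify the job, such as gh_jid on Greenhouse embedded boards.
TRACKING_PARAMS = {'gh_src', 'ref'}

def normalize_url(url: str) -> str:
    """Canonicalize a listing URL so trivially different links compare equal (hypothetical helper)."""
    url = url.strip()
    if not url:
        return ''  # keep a missing URL falsy so existing `if url` checks still skip it
    parts = urlsplit(url)
    # Drop tracking parameters but keep the rest in their original order.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if not k.lower().startswith('utm_') and k.lower() not in TRACKING_PARAMS]
    # Scheme and host are case-insensitive; the path may not be, so only its trailing slash is trimmed.
    path = parts.path.rstrip('/') or '/'
    # The fragment (#...) is discarded entirely.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), ''))

print(normalize_url('https://Boards.Greenhouse.io/acme/jobs/123/?utm_source=google'))
# https://boards.greenhouse.io/acme/jobs/123
```

To use it, normalize wherever a URL is compared, e.g. url = normalize_url(job.get('url', '')) in aggregate_jobs and daily_job_check. A history file written before the switch holds raw URLs, so some previously seen listings may be reported as new once.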

Python Example

Python
import requests, os
H = {'x-api-key': os.environ['SCAVIO_API_KEY'], 'Content-Type': 'application/json'}

def find_jobs(role, location=''):
    q = f'{role} {location} site:greenhouse.io OR site:lever.co'
    resp = requests.post('https://api.scavio.dev/api/v1/search', headers=H,
        json={'platform': 'google', 'query': q}, timeout=10)
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return [{'title': x.get('title', ''), 'url': x.get('link', '')} for x in resp.json().get('organic', [])]

JavaScript Example

JavaScript
async function findJobs(role, location = '') {
  const q = `${role} ${location} site:greenhouse.io OR site:lever.co`;
  // Uses the global fetch available in Node 18+.
  const r = await fetch('https://api.scavio.dev/api/v1/search', {
    method: 'POST',
    headers: {'x-api-key': process.env.SCAVIO_API_KEY, 'Content-Type': 'application/json'},
    body: JSON.stringify({platform: 'google', query: q})
  });
  if (!r.ok) throw new Error(`Scavio search failed: HTTP ${r.status}`);
  const data = await r.json();
  return (data.organic ?? []).map(x => ({title: x.title, url: x.link}));
}

Expected Output

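Running the Step 3 example prints the number of unique listings, followed by up to five listing titles. aggregate_jobs returns a dict with 'role', 'location', 'total_found', 'from_google', 'from_reddit', and 'listings' keys. Each Google listing carries 'title', 'url', and 'snippet'; each Reddit listing carries 'title', 'url', 'subreddit', and 'score'. The result is a job listing aggregator that draws on Google-indexed job boards and Reddit hiring threads without maintaining any scrapers.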

Related Tutorials

  • How to Fetch Google Search Results in Python
  • How to Build an Autonomous Research Agent with Scavio

Frequently Asked Questions

Most developers complete this tutorial in 15 to 30 minutes.

You need Python 3.8+ (or a working JavaScript environment for the JavaScript example), a Scavio API key, and basic knowledge of job listing data.

A new Scavio API key comes with 50 free credits, which is more than enough to complete this tutorial and prototype a working solution on the free tier.

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt it to your framework of choice.


© 2026 Scavio. All rights reserved.

Featured on TAAFT
Terms of ServicePrivacy Policy