How to Build a HiringCafe-Style Job Aggregator

An r/hiringcafe thread surfaced the pattern: pull from career pages, AI-summarize, surface salary. Walk-through with Scavio + LLM.

An r/hiringcafe thread shared the AI Job Search Agent pattern: pull listings from real employer career pages, summarize each role with an LLM, and show salary upfront. This tutorial walks through building a HiringCafe-style aggregator with Scavio and an LLM.

Prerequisites

  • Scavio API key
  • An LLM API key
  • A list of target employers (or a way to discover them)

Walkthrough

Step 1: Discover career-page URLs via dorked search

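Search for each employer's own careers pages (site:company.com/careers, site:company.com/jobs) and for its boards on hosted applicant tracking systems (jobs.lever.co, boards.greenhouse.io).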

Python
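import os
import requests

# Scavio authenticates with an API key sent in the x-api-key header.
H = {'x-api-key': os.environ['SCAVIO_API_KEY']}
# {domain} is the full domain (acme.com); {slug} is the ATS board name (acme).
DORKS = [
    'site:{domain}/careers',
    'site:{domain}/jobs',
    'site:jobs.lever.co/{slug}',
    'site:boards.greenhouse.io/{slug}',
]
def find_career_urls(domain):
    # Assumes the ATS slug equals the first domain label; override for employers where it differs.
    slug = domain.split('.')[0]
    out = []
    for d in DORKS:
        q = d.format(domain=domain, slug=slug)
        r = requests.post('https://api.scavio.dev/api/v1/search', headers=H, json={'query': q}, timeout=30)
        r.raise_for_status()
        # Keep the top five organic results per query.
        out.extend(o['link'] for o in r.json().get('organic_results', [])[:5])
    return list(set(out))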
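
To see which queries a domain will generate before spending any credits, the dork expansion can be separated from the HTTP call. This is a minimal sketch, not part of the Scavio API: `build_queries`, `SITE_DORKS`, and `ATS_DORKS` are names introduced here for illustration, and the default slug (the first label of the domain) is a guess that will not hold for every employer.

```python
# Hypothetical helper: expand the dork templates without calling the API.
SITE_DORKS = ['site:{domain}/careers', 'site:{domain}/jobs']
ATS_DORKS = ['site:jobs.lever.co/{slug}', 'site:boards.greenhouse.io/{slug}']

def build_queries(domain, slug=None):
    """Return the search queries for one employer domain."""
    domain = domain.lower().strip()
    if domain.startswith('www.'):
        domain = domain[4:]
    # Assumption: the ATS board slug is the first domain label unless one is given.
    slug = slug or domain.split('.')[0]
    queries = [d.format(domain=domain) for d in SITE_DORKS]
    queries += [d.format(slug=slug) for d in ATS_DORKS]
    return queries

for q in build_queries('acme.com'):
    print(q)
```

Passing `slug` explicitly covers employers whose board name differs from their domain, for example `build_queries('acme.com', slug='acme-inc')`.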

Step 2: Extract listing pages as markdown

Scavio /extract turns the careers page into clean markdown.

Python
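def extract(url):
    # Reuses requests and H from Step 1. Returns '' if the response has no markdown field.
    r = requests.post('https://api.scavio.dev/api/v1/extract',
        headers=H, json={'url': url, 'format': 'markdown'}, timeout=60)
    r.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return r.json().get('markdown', '')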
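
Careers pages for large employers can run long. If the extracted markdown exceeds what your LLM accepts in one prompt (an assumption; limits vary by model), one option is to split it on blank lines and parse each chunk separately. The sketch below is illustrative only: `chunk_markdown` and the `max_chars` budget are hypothetical, and a character count is only a rough stand-in for tokens.

```python
def chunk_markdown(md, max_chars=12000):
    """Split markdown on blank lines into chunks of roughly max_chars or less.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    if not md.strip():
        return []
    chunks, current, size = [], [], 0
    for para in md.split('\n\n'):
        # +2 accounts for the blank line that rejoins paragraphs.
        if current and size + len(para) + 2 > max_chars:
            chunks.append('\n\n'.join(current))
            current, size = [], 0
        current.append(para)
        size += len(para) + 2
    if current:
        chunks.append('\n\n'.join(current))
    return chunks
```

Splitting on blank lines keeps each posting's heading and description together more often than a fixed-width cut would, though a posting can still straddle two chunks.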

Step 3: Parse roles with an LLM

Ask the LLM for structured output: title, location, salary if shown, and a short summary.

Python
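import json
PROMPT = '''Extract job postings from this careers page. For each, return JSON with:
- title, team, location, remote (bool), salary_min, salary_max (null if not shown), apply_url, summary (2 sentences).
Return a JSON list.
Page:
{md}'''
def parse_roles(llm, employer, markdown):
    # llm: any client exposing complete(prompt) -> str; swap in your provider's call.
    roles = json.loads(llm.complete(PROMPT.format(md=markdown)))
    # The prompt does not ask for the employer, so tag it here for the Step 4 dedupe key.
    return [{**r, 'employer': employer} for r in roles]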
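
LLM replies are not guaranteed to be clean JSON. The sketch below shows one defensive way to parse the reply, assuming the model sometimes wraps its answer in a Markdown code fence and occasionally omits fields. `parse_llm_roles` and the `REQUIRED` field list are hypothetical choices for illustration, not something the prompt or any LLM SDK enforces.

```python
import json

FENCE = '`' * 3  # a Markdown code fence marker
REQUIRED = ('title', 'location', 'apply_url', 'summary')

def parse_llm_roles(raw):
    """Parse the model's reply into a list of role dicts, dropping malformed items."""
    text = raw.strip()
    # Assumption: some models wrap JSON in a code fence; peel it off if present.
    if text.startswith(FENCE):
        text = text.split('\n', 1)[1] if '\n' in text else ''
        text = text.rsplit(FENCE, 1)[0]
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    roles = []
    for item in data:
        if isinstance(item, dict) and all(k in item for k in REQUIRED):
            # Salary is optional in the prompt, so default missing values to None.
            item.setdefault('salary_min', None)
            item.setdefault('salary_max', None)
            roles.append(item)
    return roles
```

Returning an empty list on failure keeps one bad page from stopping a batch run; logging the raw reply at that point makes prompt problems easier to diagnose.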

Step 4: Dedupe by (employer, title, location)

The same role often appears on several sources (the employer's own site plus an ATS board); collapse those copies into one record.

Python
def dedupe(roles):
    seen = set()
    out = []
    for r in roles:
        # First occurrence wins; later roles with the same key are dropped.
        key = (r['employer'], r['title'], r['location'])
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out
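
An exact-match key treats "Senior Engineer" and "senior  engineer" as different roles. A minimal sketch of a more forgiving key follows, assuming that case, punctuation, and spacing differences are the main source of near-duplicates. `normalize` and `dedupe_normalized` are hypothetical names, and the regex keeps ASCII letters and digits only, so it would need adjusting for non-English titles.

```python
import re

def normalize(value):
    """Lowercase, replace punctuation with spaces, collapse whitespace; None becomes ''."""
    text = re.sub(r'[^a-z0-9]+', ' ', (value or '').lower())
    return text.strip()

def dedupe_normalized(roles):
    """Keep the first role seen for each normalized (employer, title, location)."""
    seen = set()
    out = []
    for r in roles:
        key = tuple(normalize(r.get(f)) for f in ('employer', 'title', 'location'))
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out
```

Using `r.get(...)` rather than indexing also tolerates roles where the LLM left a field out, at the cost of merging roles that differ only in a missing field.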

Step 5: Rank by salary + recency + match score

The user's skills determine what surfaces first. The function here scores salary and skill match only; recency is not part of its score.

Python
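def rank(roles, user_skills):
    for r in roles:
        # Count the user's skills that appear in the title or summary (case-insensitive).
        text = (r['title'] + ' ' + r['summary']).lower()
        match = sum(1 for s in user_skills if s.lower() in text)
        # Note: with salaries in dollars, the salary term outweighs the match bonus; tune the weights.
        r['score'] = (r.get('salary_max') or 0) * 0.3 + match * 100
    return sorted(roles, key=lambda x: x['score'], reverse=True)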
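
The step title mentions recency, and raw dollar salaries can swamp a skill-match count. One way to blend all three signals is to scale each to the 0..1 range first. This is a sketch under stated assumptions, not the tutorial's scoring method: `posted_at` is a hypothetical ISO date field that the Step 3 prompt does not request, and the salary cap, age window, and weights are arbitrary starting points.

```python
from datetime import date

def score_role(role, user_skills, today, salary_cap=300_000, max_age_days=60):
    """Blend salary, recency, and skill match into one score between 0 and 1."""
    # Salary: scale against an assumed cap so it cannot swamp the other terms.
    salary = min((role.get('salary_max') or 0) / salary_cap, 1.0)
    # Recency: 1.0 for a role posted today, fading linearly to 0 at max_age_days.
    # 'posted_at' is a hypothetical ISO date string; missing dates score 0.
    posted = role.get('posted_at')
    if posted:
        age = (today - date.fromisoformat(posted)).days
        recency = min(1.0, max(0.0, 1.0 - age / max_age_days))
    else:
        recency = 0.0
    # Match: fraction of the user's skills found in the title or summary.
    text = f"{role.get('title') or ''} {role.get('summary') or ''}".lower()
    hits = sum(1 for s in user_skills if s.lower() in text)
    match = hits / len(user_skills) if user_skills else 0.0
    return 0.3 * salary + 0.2 * recency + 0.5 * match

def rank_blended(roles, user_skills, today=None):
    """Sort roles from highest to lowest blended score."""
    today = today or date.today()
    return sorted(roles, key=lambda r: score_role(r, user_skills, today), reverse=True)
```

Substring matching is crude (a skill like "go" matches "google"), so a production ranker would likely tokenize or use embeddings instead.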

Python Example

Python
# Per-employer cost: ~4 dorked searches (one per pattern in Step 1) + 1 extract + 1 LLM call = ~$0.02-0.05
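
The five steps can be wired together as one loop per employer. The sketch below passes each step in as a callable so the flow can be tested with stubs before any API credits are spent. `run_pipeline` is a hypothetical name, and `parse(employer, markdown)` is assumed to be a wrapper around the Step 3 LLM call that returns a list of role dicts.

```python
def run_pipeline(employers, find_urls, extract, parse, dedupe, rank, user_skills):
    """Run discover -> extract -> parse -> dedupe -> rank over employer domains.

    Every step is injected, so real API calls and test stubs are interchangeable.
    """
    roles = []
    for domain in employers:
        for url in find_urls(domain):
            try:
                markdown = extract(url)
            except Exception as exc:  # one bad page should not stop the run
                print(f'skip {url}: {exc}')
                continue
            if not markdown:
                continue
            for role in parse(domain, markdown):
                # Make sure the dedupe key from Step 4 always has an employer.
                role.setdefault('employer', domain)
                roles.append(role)
    return rank(dedupe(roles), user_skills)
```

With the functions from the walkthrough, a call might look like `run_pipeline(['acme.com'], find_career_urls, extract, my_parse, dedupe, rank, ['python'])`, where `my_parse` is your own wrapper around the LLM client.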

JavaScript Example

JavaScript
// Same flow in TS.

Expected Output

The pipeline returns a JSON list of jobs, each with title, salary, summary, and apply_url, deduplicated across sources and ranked by skill match and salary. The data layer is the easy part; relevance ranking remains the hard part.
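
For reference, the shape of one output record can be written down as a type. The field names below come from the Step 3 prompt plus the `employer` and `score` keys used in Steps 4 and 5; the types are assumptions, and the sample values are placeholders rather than real output.

```python
import json
from typing import Optional, TypedDict

class Role(TypedDict):
    employer: str
    title: str
    team: Optional[str]
    location: str
    remote: bool
    salary_min: Optional[int]  # None when the page shows no salary
    salary_max: Optional[int]
    apply_url: str
    summary: str
    score: float

# Placeholder record for illustration only.
example: Role = {
    'employer': 'example.com',
    'title': 'Backend Engineer',
    'team': None,
    'location': 'Remote',
    'remote': True,
    'salary_min': None,
    'salary_max': None,
    'apply_url': 'https://example.com/careers/backend-engineer',
    'summary': 'Placeholder summary.',
    'score': 0.0,
}

print(json.dumps([example], indent=2))
```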

Frequently Asked Questions

Most developers complete this tutorial in 15 to 30 minutes. You will need a Scavio API key (free tier works) and a working Python or JavaScript environment.

You need a Scavio API key, an LLM API key, and a list of target employers (or a way to discover them).

The Scavio free tier includes 50 credits on signup, which is more than enough to complete this tutorial and prototype a working solution.

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt to your framework of choice.
