How to Extract Structured Data from Any Website

Learn how to use Scavio's extract endpoint to pull structured data from any URL without writing custom scrapers.

Extracting structured data from websites typically requires writing custom scrapers for each site's HTML layout. Scavio's extract endpoint takes a URL and returns structured content without any parsing code. This tutorial shows how to extract data from product pages, articles, and company websites using a single API call.

Prerequisites

  • Python 3.8+ or Node.js 18+
  • requests library (Python) or built-in fetch (JS)
  • A Scavio API key from scavio.dev

Walkthrough

Step 1: Extract content from a URL

Send a URL to the extract endpoint and receive structured content.

Python
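import os
import requests

# Send the API key with every request; it is read from the environment.
H = {'x-api-key': os.environ['SCAVIO_API_KEY']}

def extract(url: str) -> dict:
    """POST a URL to the extract endpoint and return the parsed JSON body."""
    resp = requests.post('https://api.scavio.dev/api/v1/extract',
        headers=H, json={'url': url}, timeout=30)
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return resp.json()

data = extract('https://example.com/product-page')
print(data)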
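
Network calls can fail transiently, so a retry wrapper around the extract call may be useful. The following is a minimal sketch, not part of the Scavio API: extract_with_retry, its fetch parameter, and the backoff values are hypothetical names and defaults chosen for illustration. The function to call is passed in, so the extract function from this step can be supplied as fetch.

```python
import time
from typing import Callable

def extract_with_retry(url: str, fetch: Callable[[str], dict],
                       retries: int = 3, backoff: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch(url), retrying failed attempts with exponential backoff.

    fetch is any function that takes a URL and returns a dict (an assumption
    for this sketch); sleep is injectable so the wait can be skipped in tests.
    """
    last_error = None
    for attempt in range(retries):
        try:
            return fetch(url)
        except Exception as exc:  # in real use, narrow this to requests.RequestException
            last_error = exc
            if attempt < retries - 1:
                # Wait backoff, then 2x backoff, then 4x backoff, and so on
                sleep(backoff * 2 ** attempt)
    raise RuntimeError(f'extract failed after {retries} attempts: {url}') from last_error
```

Usage would look like extract_with_retry('https://example.com/product-page', extract). Retrying every exception is a simplification; client errors such as an invalid API key will not succeed on a second attempt.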

Step 2: Extract multiple URLs in batch

Process a list of URLs and aggregate the extracted data.

Python
import time

def extract_batch(urls: list, delay: float = 0.5) -> list:
    results = []
    for url in urls:
        try:
            data = extract(url)
            results.append({'url': url, 'status': 'ok', 'data': data})
        except Exception as e:
            results.append({'url': url, 'status': 'error', 'error': str(e)})
        time.sleep(delay)
    return results

urls = ['https://example.com/page1', 'https://example.com/page2']
extracted = extract_batch(urls)
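
For longer URL lists, the sequential loop above can be slow. One possible variant, sketched here under assumptions, runs a few requests in parallel with a thread pool. The name extract_concurrent and the max_workers default are hypothetical, and this tutorial does not state Scavio's rate limits, so the worker count is kept deliberately small.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def extract_concurrent(urls: list, fetch: Callable[[str], dict],
                       max_workers: int = 4) -> list:
    """Run fetch over urls in parallel, returning records in input order."""
    def one(url: str) -> dict:
        # Same record shape as extract_batch: one dict per URL, ok or error
        try:
            return {'url': url, 'status': 'ok', 'data': fetch(url)}
        except Exception as exc:
            return {'url': url, 'status': 'error', 'error': str(exc)}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input list
        return list(pool.map(one, urls))
```

Calling extract_concurrent(urls, extract) returns records in the same shape as extract_batch, so later steps do not need to change.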

Step 3: Combine search + extract for enrichment

Search for companies, then extract structured data from their websites.

Python
def search_and_extract(query: str) -> list:
    # Search for relevant pages
    search_resp = requests.post('https://api.scavio.dev/api/v1/search', headers=H,
        json={'platform': 'google', 'query': query}, timeout=10)
    search_resp.raise_for_status()
    results = search_resp.json().get('organic', [])[:3]  # keep the top three results
    # Extract structured data from each result
    enriched = []
    for r in results:
        try:
            extracted = extract(r['link'])
            enriched.append({'title': r['title'], 'url': r['link'], 'extracted': extracted})
        except (requests.RequestException, KeyError, ValueError) as e:
            # Skip results that fail, but report them instead of failing silently
            print(f'Skipping {r.get("link")}: {e}')
    return enriched

data = search_and_extract('best CRM software pricing')
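
Search results often include several pages from the same site, and each extract call presumably consumes credits. A small filter, sketched below, keeps only the first result per host before extraction. The helper dedupe_by_host is hypothetical; it assumes each result is a dict with a 'link' key, as in the code above.

```python
from urllib.parse import urlparse

def dedupe_by_host(results: list, link_key: str = 'link') -> list:
    """Keep the first result for each host, treating 'www.' as the same host."""
    seen = set()
    unique = []
    for r in results:
        host = urlparse(r.get(link_key, '')).netloc.lower()
        if host.startswith('www.'):
            host = host[4:]
        if not host or host in seen:
            continue  # no usable link, or a host that is already covered
        seen.add(host)
        unique.append(r)
    return unique
```

Inside search_and_extract, this could be applied to the 'organic' list before slicing off the top three results.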

Step 4: Save extracted data

Export the extracted data for downstream processing.

Python
import json

def save_extracted(data: list, filepath: str):
    with open(filepath, 'w') as f:
        json.dump(data, f, indent=2)
    print(f'Saved {len(data)} extracted records to {filepath}')

save_extracted(extracted, 'extracted_data.json')
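
If results are appended over several runs or streamed into another tool, JSON Lines (one JSON object per line) can be easier to handle than a single JSON array. The following is a sketch of that alternative; save_jsonl and load_jsonl are hypothetical helper names, not part of any Scavio SDK.

```python
import json

def save_jsonl(records: list, filepath: str) -> int:
    """Write one JSON object per line and return the number of records written."""
    with open(filepath, 'w', encoding='utf-8') as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + '\n')
    return len(records)

def load_jsonl(filepath: str) -> list:
    """Read a JSON Lines file back into a list, skipping blank lines."""
    with open(filepath, encoding='utf-8') as f:
        return [json.loads(line) for line in f if line.strip()]
```

For example, save_jsonl(extracted, 'extracted_data.jsonl') writes the batch from Step 2, and load_jsonl reads it back for downstream processing.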

Python Example

Python
import os
import requests

H = {'x-api-key': os.environ['SCAVIO_API_KEY']}

def extract(url):
    resp = requests.post('https://api.scavio.dev/api/v1/extract',
        headers=H, json={'url': url}, timeout=30)
    resp.raise_for_status()  # raise on 4xx/5xx responses
    return resp.json()

# Extract structured data from any URL:
data = extract('https://example.com/pricing')

JavaScript Example

JavaScript
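async function extract(url) {
  const resp = await fetch('https://api.scavio.dev/api/v1/extract', {
    method: 'POST',
    headers: {'x-api-key': process.env.SCAVIO_API_KEY, 'Content-Type': 'application/json'},
    body: JSON.stringify({url})
  });
  // fetch does not reject on HTTP error statuses, so check explicitly
  if (!resp.ok) throw new Error(`Extract failed with HTTP ${resp.status}`);
  return resp.json();
}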

Expected Output

A single API call returns the structured data extracted from the URL as JSON, with no custom parsing code needed.
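
This tutorial does not list the fields in the response, so a quick way to get oriented is to print the shape of whatever comes back. The helper below is a hypothetical sketch that summarizes the keys and value types of any parsed JSON value; it assumes nothing about Scavio's actual response schema.

```python
def describe(data, depth: int = 1, indent: int = 0) -> list:
    """Return lines such as 'title: str' describing the top levels of parsed JSON."""
    lines = []
    if isinstance(data, dict):
        for key, value in data.items():
            lines.append(' ' * indent + f'{key}: {type(value).__name__}')
            # Descend into nested containers until the depth budget runs out
            if depth > 0 and isinstance(value, (dict, list)):
                lines.extend(describe(value, depth - 1, indent + 2))
    elif isinstance(data, list) and data:
        lines.append(' ' * indent + f'[0]: {type(data[0]).__name__} ({len(data)} items)')
    return lines

# The dict below is made-up sample data, not a real API response.
sample = {'a': 1, 'b': {'c': 'x'}}
print('\n'.join(describe(sample)))
```

Running print('\n'.join(describe(data))) on a real response shows which keys are available before any code depends on them.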

Related Tutorials

  • How to Fetch Google Search Results in Python
  • How to Migrate a Web Scraper to a Search API

Frequently Asked Questions

How long does this tutorial take?

Most developers complete this tutorial in 15 to 30 minutes. You will need a Scavio API key (the free tier works) and a working Python or JavaScript environment.

What do I need before starting?

Python 3.8+ or Node.js 18+, the requests library (Python) or the built-in fetch (JS), and a Scavio API key from scavio.dev. A new key comes with 50 free credits on signup.

Can I complete this tutorial on the free tier?

Yes. The free tier includes 50 credits on signup, which is enough to complete this tutorial and prototype a working solution.

Can I use Scavio with my framework of choice?

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt the examples to your framework of choice.
