ScavioScavio
ProductPricingDocs
Sign InGet Started
  1. Home
  2. Tutorials
  3. How to Run SearXNG + Hermes + Qwen with Docker Compose
Tutorial

How to Run SearXNG + Hermes + Qwen with Docker Compose

Run SearXNG, Hermes, and Qwen in Docker Compose for local AI search. Compare local results with a cloud search API. Full compose file included.

Get Free API KeyAPI Docs

Run SearXNG, Hermes 3, and Qwen 2.5 as a local AI search stack using Docker Compose, then compare the result quality and latency against a cloud search API. Self-hosted search stacks appeal to privacy-focused teams, but they require ongoing maintenance and produce inconsistent results depending on which SearXNG engines are functioning. This tutorial sets up the full stack, runs comparison queries, and helps you decide where self-hosted works and where an API is the better choice.

Prerequisites

  • Docker and Docker Compose installed
  • 16GB+ RAM for local model inference
  • A Scavio API key from scavio.dev for comparison
  • Python 3.8+ installed

Walkthrough

Step 1: Write the Docker Compose file

Define services for SearXNG (metasearch), Ollama with Hermes/Qwen models, and a bridge service.

Python
# docker-compose.yml
# Save this as docker-compose.yml in your project directory

compose_yaml = """
version: '3.8'
services:
  searxng:
    image: searxng/searxng:latest
    ports:
      - '8080:8080'
    volumes:
      - ./searxng:/etc/searxng
    restart: unless-stopped

  ollama:
    image: ollama/ollama:latest
    ports:
      - '11434:11434'
    volumes:
      - ollama_data:/root/.ollama
    restart: unless-stopped

volumes:
  ollama_data:
"""

with open('docker-compose.yml', 'w') as f:
    f.write(compose_yaml)
print('docker-compose.yml written')
print('Run: docker compose up -d')
print('Then: docker exec ollama ollama pull hermes3')
print('Then: docker exec ollama ollama pull qwen2.5')

Step 2: Configure SearXNG engines

Customize SearXNG settings to enable the search engines you need and disable noisy ones.

Python
import os

os.makedirs('searxng', exist_ok=True)

settings_yml = """
use_default_settings: true
server:
  secret_key: 'change-this-to-a-random-string'
search:
  safe_search: 0
  autocomplete: ''
  default_lang: 'en'
engines:
  - name: google
    engine: google
    shortcut: g
    disabled: false
  - name: duckduckgo
    engine: duckduckgo
    shortcut: ddg
    disabled: false
  - name: brave
    engine: brave
    shortcut: br
    disabled: false
"""

with open('searxng/settings.yml', 'w') as f:
    f.write(settings_yml)
print('SearXNG settings written to searxng/settings.yml')

Step 3: Connect Hermes to SearXNG

Build a Python bridge that sends a query to SearXNG, passes results to Hermes/Qwen via Ollama, and returns the grounded answer.

Python
import requests

def searxng_search(query: str) -> list:
    resp = requests.get('http://localhost:8080/search',
        params={'q': query, 'format': 'json'}, timeout=15)
    results = resp.json().get('results', [])
    return [{'title': r.get('title', ''), 'url': r.get('url', ''),
             'content': r.get('content', '')} for r in results[:5]]

def ollama_generate(model: str, prompt: str) -> str:
    resp = requests.post('http://localhost:11434/api/generate',
        json={'model': model, 'prompt': prompt, 'stream': False}, timeout=60)
    return resp.json().get('response', '')

def local_grounded_search(query: str, model: str = 'hermes3') -> str:
    results = searxng_search(query)
    context = '\n'.join(f"{r['title']}: {r['content']}" for r in results)
    prompt = f'Using this context:\n{context}\n\nAnswer: {query}'
    return ollama_generate(model, prompt)

# print(local_grounded_search('best python frameworks 2026'))

Step 4: Test queries and compare with API

Run the same queries through both the local stack and Scavio, then compare result quality and latency.

Python
import time

SCAVIO_KEY = os.environ['SCAVIO_API_KEY']

def scavio_search(query: str) -> list:
    resp = requests.post('https://api.scavio.dev/api/v1/search',
        headers={'x-api-key': SCAVIO_KEY},
        json={'platform': 'google', 'query': query}, timeout=10)
    return resp.json().get('organic_results', [])[:5]

def compare(query: str) -> dict:
    # Local stack
    start = time.monotonic()
    local_results = searxng_search(query)
    local_ms = (time.monotonic() - start) * 1000
    # Cloud API
    start = time.monotonic()
    api_results = scavio_search(query)
    api_ms = (time.monotonic() - start) * 1000
    print(f'Query: {query}')
    print(f'  Local: {len(local_results)} results in {local_ms:.0f}ms')
    print(f'  API:   {len(api_results)} results in {api_ms:.0f}ms')
    return {'query': query, 'local_count': len(local_results), 'api_count': len(api_results),
            'local_ms': round(local_ms), 'api_ms': round(api_ms)}

# compare('best python web framework 2026')

Step 5: Compare results with API baseline

Evaluate which queries work well locally and which need a cloud API for reliable results.

Python
def evaluate_stack(queries: list) -> dict:
    results = []
    for q in queries:
        try:
            api = scavio_search(q)
            results.append({'query': q, 'api_results': len(api), 'status': 'ok'})
        except Exception as e:
            results.append({'query': q, 'error': str(e)})
    ok = sum(1 for r in results if r.get('status') == 'ok')
    print(f'{ok}/{len(queries)} queries returned API results')
    print('Local stack: good for privacy, slower, inconsistent engines')
    print('Cloud API: consistent results, structured JSON, multi-platform')
    return {'total': len(queries), 'ok': ok, 'results': results}

evaluate_stack(['best crm 2026', 'python async tutorial', 'react vs vue performance'])

Python Example

Python
import requests, os
H = {'x-api-key': os.environ['SCAVIO_API_KEY']}

# Cloud API comparison baseline
def cloud_search(query):
    data = requests.post('https://api.scavio.dev/api/v1/search', headers=H,
        json={'platform': 'google', 'query': query}).json()
    return data.get('organic_results', [])[:5]

# Local SearXNG (when running)
def local_search(query):
    try:
        data = requests.get('http://localhost:8080/search', params={'q': query, 'format': 'json'}, timeout=10).json()
        return data.get('results', [])[:5]
    except: return []

print(f'API: {len(cloud_search("best crm 2026"))} results')

JavaScript Example

JavaScript
const H = {'x-api-key': process.env.SCAVIO_API_KEY, 'Content-Type': 'application/json'};
async function cloudSearch(query) {
  const r = await fetch('https://api.scavio.dev/api/v1/search', {
    method: 'POST', headers: H, body: JSON.stringify({platform: 'google', query})
  });
  return (await r.json()).organic_results || [];
}
async function localSearch(query) {
  try {
    const r = await fetch(`http://localhost:8080/search?q=${encodeURIComponent(query)}&format=json`);
    return (await r.json()).results || [];
  } catch { return []; }
}
cloudSearch('best crm 2026').then(r => console.log('API:', r.length, 'results'));

Expected Output

JSON
A Docker Compose stack running SearXNG and Ollama with Hermes/Qwen, with comparison benchmarks against Scavio's cloud API for result quality and latency.

Related Tutorials

  • How to Fix Hermes Agent Web Search Failures
  • How to Add Live Web Search to a Local Research Stack

Frequently Asked Questions

Most developers complete this tutorial in 15 to 30 minutes. You will need a Scavio API key (free tier works) and a working Python or JavaScript environment.

Docker and Docker Compose installed. 16GB+ RAM for local model inference. A Scavio API key from scavio.dev for comparison. Python 3.8+ installed. A Scavio API key gives you 50 free credits on signup.

Yes. The free tier includes 50 credits on signup, which is more than enough to complete this tutorial and prototype a working solution.

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt to your framework of choice.

Related Resources

Best Of

Best Self-Hosted vs Hosted Search API Alternatives 2026

Read more
Best Of

Best Self-Hosted Search for AI Agents in 2026

Read more
Comparison

Self-Hosted Search (SearXNG, YaCy) vs Commercial Search APIs (Scavio, Tavily, SerpAPI)

Read more
Glossary

Search API Provider Landscape (2026)

Read more
Use Case

Local LLM Search Grounding via API

Read more
Comparison

Scavio vs SearXNG (self-hosted)

Read more

Start Building

Run SearXNG, Hermes, and Qwen in Docker Compose for local AI search. Compare local results with a cloud search API. Full compose file included.

Get Free API KeyRead the Docs
ScavioScavio

Real-time search API for AI agents. Search every platform, not just Google.

Product

  • Features
  • Pricing
  • Dashboard
  • Affiliates

Developers

  • Documentation
  • API Reference
  • Quickstart
  • MCP Integration
  • Python SDK

Alternatives

  • Tavily Alternative
  • SerpAPI Alternative
  • Firecrawl Alternative
  • Exa Alternative

Tools

  • JSON Formatter
  • cURL to Code
  • Token Counter
  • All Tools

© 2026 Scavio. All rights reserved.

Featured on TAAFT
Terms of ServicePrivacy Policy