How long does this tutorial on setting up RAG with search instead of vectors take?

Most developers complete this tutorial in 15 to 30 minutes. You will need a Scavio API key (free tier works) and a working Python or JavaScript environment.

What do I need before starting?

You need Python 3.9+, a Scavio API key, an Anthropic API key, and the anthropic SDK (the Python examples also use the requests package). Signing up for Scavio gives you 50 free credits.

Can I run this tutorial with the free tier?

Yes. The free tier includes 50 credits on signup, which is more than enough to complete this tutorial and prototype a working solution.

What frameworks does this work with?

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt it to your framework of choice.

RAG Without Vectors: Search API as Retrieval Layer

Search-based RAG replaces a vector store with a live search API call. Instead of embedding documents and doing similarity search, you query the web (or a platform) for relevant content and inject the results directly into the LLM context.

Prerequisites

Python 3.9+
Scavio API key
Anthropic API key
anthropic SDK and the requests package

Walkthrough

Step 1: Understand the pattern difference

Traditional RAG: embed documents -> store vectors -> similarity search -> inject. Search RAG: query API -> get snippets -> inject. No embedding step, no vector DB.

Python

# Traditional vector RAG (what you're replacing)
# 1. Embed your corpus (expensive, slow)
# 2. Store in Pinecone/Chroma/Weaviate
# 3. Embed the query
# 4. Cosine similarity search
# 5. Retrieve top-k chunks
# 6. Inject into prompt

# Search RAG (this tutorial)
# 1. Convert question to search query
# 2. Call search API (1 API call, 1 credit)
# 3. Extract top snippets
# 4. Inject into prompt
# Done. Fresh data. No embedding costs.
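
To make the shorter pipeline concrete, here is a minimal sketch of the search RAG flow with the search call and the LLM call passed in as plain functions. The names search_rag, fake_search, and fake_llm are illustrative stand-ins, not part of any SDK; the later steps replace them with real API calls.

```python
from typing import Callable

def search_rag(question: str,
               search_fn: Callable[[str], list],
               llm_fn: Callable[[str], str]) -> str:
    """Search RAG in three moves: query the API, collect snippets, inject."""
    docs = search_fn(question)                          # one search call, no embeddings
    snippets = "\n".join(d["snippet"] for d in docs)    # extract the top snippets
    prompt = f"{snippets}\n\nQuestion: {question}"      # inject them into the prompt
    return llm_fn(prompt)

# Offline stand-ins (assumptions for this sketch only).
def fake_search(query: str) -> list:
    return [{"snippet": f"Stub snippet about {query}"}]

def fake_llm(prompt: str) -> str:
    return "ECHO: " + prompt

answer = search_rag("vector databases", fake_search, fake_llm)
print(answer)
```

Keeping the two external calls behind function arguments also makes the flow easy to test without spending credits.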

Step 2: Build the retrieval function

The retriever takes a question, searches for relevant content, and returns a list of records, each with a title, a snippet, and a URL. Formatting happens in the next step.

Python

from typing import Optional

import requests

SCAVIO_KEY = "your-scavio-api-key"

def retrieve(question: str, num_results: int = 5, platform: Optional[str] = None) -> list[dict]:
    payload = {"query": question, "num_results": num_results}
    if platform:
        payload["platform"] = platform
    r = requests.post(
        "https://api.scavio.dev/api/v1/search",
        json=payload,
        headers={"x-api-key": SCAVIO_KEY},
        timeout=15
    )
    r.raise_for_status()
    results = r.json().get("organic_results", [])
    return [{"title": res["title"], "snippet": res.get("snippet", ""), "url": res["link"]} for res in results]
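
The retriever above indexes res["title"] and res["link"] directly, so a result that lacks either field would raise a KeyError. As a hedged sketch, the parsing can be split into a separate function that tolerates missing fields. It assumes the response shape used in this tutorial (a top-level "organic_results" list with "title", "snippet", and "link"); parse_results and the sample payload are hypothetical, not Scavio output.

```python
def parse_results(payload: dict, limit: int = 5) -> list:
    """Normalize a search response into title/snippet/url dicts."""
    docs = []
    for item in payload.get("organic_results", []):
        url = item.get("link")
        if not url:
            # Skip entries that cannot be cited as a source.
            continue
        docs.append({
            "title": item.get("title", "(untitled)"),
            "snippet": item.get("snippet", ""),
            "url": url,
        })
    return docs[:limit]

# Made-up payload for illustration.
sample = {
    "organic_results": [
        {"title": "A", "snippet": "first", "link": "https://example.com/a"},
        {"title": "B", "link": "https://example.com/b"},  # no snippet
        {"snippet": "orphan"},                            # no link, dropped
    ]
}
docs = parse_results(sample)
print(docs)
```

Because it takes a plain dict, this function can be exercised offline and then called on r.json() inside retrieve.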

Step 3: Format retrieved docs as context

Convert the search results into a context block that the LLM can reference.

Python

def format_context(docs: list[dict]) -> str:
    lines = []
    for i, doc in enumerate(docs, 1):
        lines.append(f"[{i}] {doc['title']}\nURL: {doc['url']}\n{doc['snippet']}")
    return "\n---\n".join(lines)
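
To see what the context block looks like, the sketch below runs a variant of format_context on two made-up documents. The max_snippet_chars parameter is an addition for illustration (it is not in the tutorial's function) and simply caps each snippet so a long result cannot crowd out the others.

```python
def format_context(docs: list, max_snippet_chars: int = 200) -> str:
    blocks = []
    for i, doc in enumerate(docs, 1):
        # Cap the snippet length; the numbering lets the model cite [1], [2], ...
        snippet = doc["snippet"][:max_snippet_chars]
        blocks.append(f"[{i}] {doc['title']}\nURL: {doc['url']}\n{snippet}")
    return "\n---\n".join(blocks)

# Made-up documents for illustration.
docs = [
    {"title": "Doc one", "url": "https://example.com/1", "snippet": "Alpha " * 100},
    {"title": "Doc two", "url": "https://example.com/2", "snippet": "Beta"},
]
context = format_context(docs, max_snippet_chars=11)
print(context)
# [1] Doc one
# URL: https://example.com/1
# Alpha Alpha
# ---
# [2] Doc two
# URL: https://example.com/2
# Beta
```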

Step 4: Generate answer with Anthropic Claude

Inject the retrieved context into the prompt and get a grounded answer.

Python

from typing import Optional

import anthropic

ANTHROPIC_KEY = "your-anthropic-key"

def rag_answer(question: str, platform: Optional[str] = None) -> dict:
    docs = retrieve(question, num_results=5, platform=platform)
    context = format_context(docs)
    prompt = f"""Use the following search results to answer the question. Cite sources by number.

{context}

Question: {question}
Answer:"""

    client = anthropic.Anthropic(api_key=ANTHROPIC_KEY)
    msg = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}]
    )
    return {"answer": msg.content[0].text, "sources": [d["url"] for d in docs]}

result = rag_answer("What are the latest AI models released in 2026?")
print(result["answer"])
print("\nSources:", result["sources"])
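
Since the prompt asks the model to cite sources by number, the returned sources list can be narrowed to the URLs the answer actually references. The helper below is a sketch of one way to do that; cited_sources is a hypothetical name, and it assumes citations appear as bracketed integers such as [1], which the model is asked for but not guaranteed to produce.

```python
import re

def cited_sources(answer: str, sources: list) -> list:
    """Return the URLs whose [n] markers appear in the answer, in first-seen order."""
    cited = []
    for match in re.findall(r"\[(\d+)\]", answer):
        idx = int(match) - 1  # markers are 1-based, list indexes are 0-based
        if 0 <= idx < len(sources) and sources[idx] not in cited:
            cited.append(sources[idx])
    return cited

# Made-up answer and sources for illustration.
sources = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
answer = "Option A is fastest [1]. Option C is cheapest [3], and [1] agrees. See also [7]."
print(cited_sources(answer, sources))
# ['https://example.com/a', 'https://example.com/c']
```

Out-of-range markers such as [7] are ignored rather than raising, since a model can occasionally cite a number that was never supplied.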

Python Example

Python

from typing import Optional

import anthropic
import requests

SCAVIO_KEY = "your-scavio-api-key"
ANTHROPIC_KEY = "your-anthropic-key"

def retrieve(question: str, n: int = 5, platform: Optional[str] = None) -> list[dict]:
    # One search call replaces the embed-and-similarity-search steps.
    payload = {"query": question, "num_results": n}
    if platform:
        payload["platform"] = platform
    r = requests.post(
        "https://api.scavio.dev/api/v1/search",
        json=payload,
        headers={"x-api-key": SCAVIO_KEY},
        timeout=15
    )
    r.raise_for_status()
    return [{"title": d["title"], "snippet": d.get("snippet", ""), "url": d["link"]}
            for d in r.json().get("organic_results", [])]

def format_context(docs: list[dict]) -> str:
    # Number each result so the model can cite it as [1], [2], ...
    return "\n---\n".join(f"[{i}] {d['title']}\n{d['snippet']}\n{d['url']}" for i, d in enumerate(docs, 1))

def rag_answer(question: str, platform: Optional[str] = None) -> dict:
    docs = retrieve(question, n=5, platform=platform)
    context = format_context(docs)
    prompt = f"Use these search results to answer. Cite source numbers.\n\n{context}\n\nQuestion: {question}\nAnswer:"
    client = anthropic.Anthropic(api_key=ANTHROPIC_KEY)
    msg = client.messages.create(model="claude-sonnet-4-6", max_tokens=1024,
                                  messages=[{"role": "user", "content": prompt}])
    return {"answer": msg.content[0].text, "sources": [d["url"] for d in docs]}

if __name__ == "__main__":
    questions = [
        "What are the most popular vector databases in 2026?",
        "Latest AI coding assistants compared"
    ]
    for q in questions:
        result = rag_answer(q)
        print(f"Q: {q}")
        print(f"A: {result['answer'][:300]}...\n")

JavaScript Example

JavaScript

const SCAVIO_KEY = 'your-scavio-api-key';
const ANTHROPIC_KEY = 'your-anthropic-key';

async function retrieve(question, n = 5, platform = null) {
  const payload = { query: question, num_results: n };
  if (platform) payload.platform = platform;
  const res = await fetch('https://api.scavio.dev/api/v1/search', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', 'x-api-key': SCAVIO_KEY },
    body: JSON.stringify(payload)
  });
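  // Fail fast on HTTP errors, mirroring raise_for_status() in the Python version.
  if (!res.ok) throw new Error(`Search request failed: ${res.status}`);
  const data = await res.json();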
  return (data.organic_results ?? []).map(d => ({ title: d.title, snippet: d.snippet ?? '', url: d.link }));
}

function formatContext(docs) {
  return docs.map((d, i) => `[${i+1}] ${d.title}\n${d.snippet}\n${d.url}`).join('\n---\n');
}

async function ragAnswer(question, platform = null) {
  const docs = await retrieve(question, 5, platform);
  const context = formatContext(docs);
  const prompt = `Use these search results to answer. Cite source numbers.\n\n${context}\n\nQuestion: ${question}\nAnswer:`;

  const res = await fetch('https://api.anthropic.com/v1/messages', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', 'x-api-key': ANTHROPIC_KEY, 'anthropic-version': '2023-06-01' },
    body: JSON.stringify({ model: 'claude-sonnet-4-6', max_tokens: 1024, messages: [{ role: 'user', content: prompt }] })
  });
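  // Surface API errors instead of failing later on a missing content field.
  if (!res.ok) throw new Error(`Anthropic request failed: ${res.status}`);
  const msg = await res.json();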
  return { answer: msg.content[0].text, sources: docs.map(d => d.url) };
}

// Top-level await requires an ES module (for example a .mjs file); Node 18+ provides fetch.
const result = await ragAnswer('What are the most popular vector databases in 2026?');
console.log(result.answer);

Expected Output

Text

Based on the search results, the most popular vector databases in 2026 include:

1. Pinecone - serverless, widely used in production [1]
2. Weaviate - open source with hybrid search [2]
3. Qdrant - performance-focused, Rust-based [3]
4. Chroma - popular for local development [4]
5. pgvector - PostgreSQL extension for teams already using Postgres [5]

Sources: https://pinecone.io, https://weaviate.io, ...


