Build a Perplexity-Style Search Tool with MCP

Perplexity answers questions by searching the web, reading the results, and synthesizing an answer with inline citations. The core loop is simple: search, retrieve, synthesize. You can build a basic version of this using an MCP search tool and any LLM. But "basic version" is the key phrase. Perplexity has custom ranking, their own index, streaming synthesis, and a team of 200+ engineers. Your version will be functional, not competitive.

The architecture

User asks a question
Agent calls search MCP tool to get relevant web results
Agent reads the titles, URLs, and snippets from results
Agent synthesizes an answer referencing specific sources by number
Output includes the answer text and a numbered source list

The search function

Python

import os

import requests

def search_web(query: str, num_results: int = 8) -> list:
    """Search and return structured results with source numbers."""
    resp = requests.post(
        "https://api.scavio.dev/api/v1/search",
        headers={"x-api-key": os.environ["SCAVIO_API_KEY"]},
        json={"query": query, "num_results": num_results},
        timeout=10,
    )
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
    results = resp.json().get("results", [])
    return [
        {
            "source_num": i + 1,
            "title": r["title"],
            "url": r["url"],
            "snippet": r.get("snippet", ""),
        }
        for i, r in enumerate(results)
    ]
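
The list comprehension above indexes `r["title"]` and `r["url"]` directly, so one malformed result raises a `KeyError`. Here is a minimal defensive sketch, assuming the response shape used in `search_web` (a list of dicts with `title`, `url`, and an optional `snippet`). The helper name `number_sources` is hypothetical and not part of any API.

```python
def number_sources(raw_results: list[dict]) -> list[dict]:
    """Drop unusable or duplicate results, then number the survivors from 1."""
    seen_urls = set()
    cleaned = []
    for r in raw_results:
        url = (r.get("url") or "").strip()
        if not url or url in seen_urls:
            continue  # skip entries without a URL and repeated URLs
        seen_urls.add(url)
        cleaned.append(
            {
                "source_num": len(cleaned) + 1,  # numbering stays gap-free
                "title": (r.get("title") or url).strip(),  # fall back to the URL
                "url": url,
                "snippet": (r.get("snippet") or "").strip(),
            }
        )
    return cleaned
```

Numbering after filtering keeps the `[1]`, `[2]` markers contiguous, which matters because the model is asked to cite by those numbers.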

The synthesis prompt

The key to good citation behavior is the system prompt. Tell the model to cite sources by number and never make claims without a supporting source.

Python

from openai import OpenAI

client = OpenAI()

def synthesize_answer(question: str, sources: list) -> str:
    """Synthesize an answer with inline citations."""
    sources_text = "\n".join(
        f"[{s['source_num']}] {s['title']} - {s['url']}\n{s['snippet']}"
        for s in sources
    )

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "Answer the user question using ONLY the provided sources. "
                    "Cite sources inline using [1], [2], etc. "
                    "If no source supports a claim, say you could not verify it. "
                    "End with a Sources section listing all cited URLs."
                ),
            },
            {
                "role": "user",
                "content": f"Question: {question}\n\nSources:\n{sources_text}",
            },
        ],
    )
    return response.choices[0].message.content
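
A system prompt only requests citation discipline; it does not enforce it. As a rough sketch of a post-check, the hypothetical helper below (`check_citations` is an illustrative name) compares the `[n]` markers in the model's answer against the source numbers that were actually supplied. It assumes citations appear exactly in the `[1]`, `[2]` form the prompt asks for.

```python
import re

def check_citations(answer: str, sources: list) -> dict:
    """Compare [n] markers in the answer against the source numbers supplied."""
    valid = {s["source_num"] for s in sources}
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    return {
        "cited": sorted(cited & valid),    # markers that map to a real source
        "unknown": sorted(cited - valid),  # numbers with no matching source
        "unused": sorted(valid - cited),   # sources the answer never referenced
    }
```

This only confirms that each cited number exists. It cannot tell whether the cited source supports the claim next to it.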

The full pipeline

Python

def answer_with_citations(question: str) -> str:
    """Perplexity-style search and synthesize."""
    # Step 1: Search
    sources = search_web(question)

    # Step 2: Synthesize with citations
    answer = synthesize_answer(question, sources)

    # Step 3: Append source list
    source_list = "\n".join(
        f"[{s['source_num']}] {s['title']} - {s['url']}"
        for s in sources
    )
    return f"{answer}\n\n---\nSources:\n{source_list}"

# Example
result = answer_with_citations(
    "What are the main differences between Bun and Deno in 2026?"
)
print(result)
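
The pipeline above calls the LLM even when the search returns nothing, and it is hard to test without network access. One possible variation, sketched under the assumption that the search and synthesis steps are passed in as callables (`answer_or_fallback` is a hypothetical name):

```python
from typing import Callable

def answer_or_fallback(
    question: str,
    search: Callable[[str], list],
    synthesize: Callable[[str, list], str],
) -> str:
    """Run search then synthesis, skipping the LLM call when search finds nothing."""
    sources = search(question)
    if not sources:
        # No sources means nothing to cite, so do not ask the model to answer.
        return "No sources found for this question, so no answer was generated."
    answer = synthesize(question, sources)
    source_list = "\n".join(
        f"[{s['source_num']}] {s['title']} - {s['url']}" for s in sources
    )
    return f"{answer}\n\n---\nSources:\n{source_list}"
```

In production this would be called as `answer_or_fallback(question, search_web, synthesize_answer)`, while tests can pass small stub functions instead.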

What Perplexity does that you cannot easily replicate

Custom re-ranking: Perplexity re-ranks search results using their own relevance model. You get Google's default ranking.
Page fetching: Perplexity reads full page content, not just snippets. You get 150-character snippets unless you add a scraping step.
Streaming: Perplexity streams the synthesis in real-time. Adding streaming to your pipeline requires SSE or WebSocket infrastructure.
Follow-up context: Perplexity maintains conversation context across follow-up questions. Your version is stateless per call unless you add memory.
Speed: Perplexity has optimized infrastructure for sub-2-second responses. Your version will take 3-8 seconds depending on search latency and LLM response time.
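
Of the gaps listed above, page fetching is probably the cheapest to narrow. The sketch below covers only the text-extraction half of a scraping step, using the standard library's `html.parser`. Downloading the page (for example with `requests.get(url, timeout=10).text`) is left out, and the names `html_to_text` and `max_chars` are illustrative assumptions. A dedicated extraction library would likely handle messy real-world pages better.

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collect visible text, ignoring script, style, and noscript contents."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(" ".join(data.split()))  # collapse whitespace

def html_to_text(html: str, max_chars: int = 4000) -> str:
    """Return the page's visible text, truncated to fit the LLM context budget."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)[:max_chars]
```

The extracted text could replace the `snippet` field for the top few results. The cost is extra latency and more input tokens per query.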

Cost per query

Search: 1 credit at $0.005. LLM synthesis with GPT-4o: ~3,000 input tokens ($0.0075) + ~500 output tokens ($0.005). Total per query: ~$0.0175. Perplexity Pro costs $20/mo for 300 Pro searches, or $0.067 per Pro search. Your DIY version is cheaper per query but lacks the polish and speed.
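
The arithmetic behind those figures can be reproduced in a few lines. The per-million-token prices below are not quoted in the text. They are the rates implied by the numbers above ($0.0075 for 3,000 input tokens and $0.005 for 500 output tokens), so treat them as assumptions and check current pricing before relying on them.

```python
SEARCH_COST = 0.005         # USD per search credit (figure from the text)
INPUT_PRICE_PER_M = 2.50    # USD per 1M input tokens, implied by $0.0075 / 3,000
OUTPUT_PRICE_PER_M = 10.00  # USD per 1M output tokens, implied by $0.005 / 500

def diy_cost_per_query(input_tokens: int = 3000, output_tokens: int = 500) -> float:
    """Search credit plus LLM token cost for one query, in USD."""
    llm_cost = (
        input_tokens / 1_000_000 * INPUT_PRICE_PER_M
        + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    )
    return round(SEARCH_COST + llm_cost, 4)

def subscription_cost_per_search(monthly_price: float = 20.0, searches: int = 300) -> float:
    """Effective per-search cost if every included search is used."""
    return round(monthly_price / searches, 3)
```

With the defaults, `diy_cost_per_query()` gives 0.0175 and `subscription_cost_per_search()` gives 0.067, matching the figures above. The subscription figure assumes all 300 searches are used each month. Lighter usage raises the effective per-search cost.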

When this approach makes sense

Build this when you need citation-backed answers inside your own application, where you control the UI and data flow. Embedding Perplexity means using their API ($5/1,000 queries on their Pro tier), which may cost more than the DIY approach for high-volume use cases. For personal or internal tools where polish matters less than function, the DIY version works. For user-facing products, Perplexity's speed and quality are hard to match.
