Perplexity answers questions by searching the web, reading the results, and synthesizing an answer with inline citations. The core loop is simple: search, retrieve, synthesize. You can build a basic version of this using an MCP search tool and any LLM. But "basic version" is the key phrase. Perplexity has custom ranking, their own index, streaming synthesis, and a team of 200+ engineers. Your version will be functional, not competitive.
The architecture
- User asks a question
- Agent calls search MCP tool to get relevant web results
- Agent reads the titles, URLs, and snippets from results
- Agent synthesizes an answer referencing specific sources by number
- Output includes the answer text and a numbered source list
The search function

import os

import requests


def search_web(query: str, num_results: int = 8) -> list:
    """Search and return structured results with source numbers."""
    resp = requests.post(
        "https://api.scavio.dev/api/v1/search",
        headers={"x-api-key": os.environ["SCAVIO_API_KEY"]},
        json={"query": query, "num_results": num_results},
        timeout=10,
    )
    # Fail loudly on HTTP errors instead of parsing an error body as results.
    resp.raise_for_status()
    results = resp.json().get("results", [])
    # Number sources from 1 so they line up with [1], [2], ... citations.
    return [
        {
            "source_num": i + 1,
            "title": r["title"],
            "url": r["url"],
            "snippet": r.get("snippet", ""),
        }
        for i, r in enumerate(results)
    ]

The synthesis prompt
The key to good citation behavior is the system prompt. Tell the model to cite sources by number and never make claims without a supporting source.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def synthesize_answer(question: str, sources: list) -> str:
    """Synthesize an answer with inline citations."""
    # One numbered entry per source: "[n] title - url", then its snippet.
    sources_text = "\n".join(
        f"[{s['source_num']}] {s['title']} - {s['url']}\n{s['snippet']}"
        for s in sources
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "Answer the user question using ONLY the provided sources. "
                    "Cite sources inline using [1], [2], etc. "
                    "If no source supports a claim, say you could not verify it. "
                    "End with a Sources section listing all cited URLs."
                ),
            },
            {
                "role": "user",
                "content": f"Question: {question}\n\nSources:\n{sources_text}",
            },
        ],
    )
    return response.choices[0].message.content

The full pipeline
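One step the pipeline can add between synthesis and output is a citation check, since the system prompt asks for numbered citations but nothing enforces them. The sketch below is an illustration, not part of the original pipeline: `check_citations` is a hypothetical helper that compares the `[n]` markers in the answer against the `source_num` values returned by the search step. It verifies only that the numbers exist, not that each source actually supports the claim it is attached to.

```python
import re


def check_citations(answer: str, sources: list) -> dict:
    """Compare [n] markers in the answer against the known source numbers."""
    valid = {s["source_num"] for s in sources}
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    return {
        "unknown": sorted(cited - valid),  # cited but not in the source list
        "unused": sorted(valid - cited),   # retrieved but never cited
    }


# Hand-written sample data, not real model output.
sources = [{"source_num": 1}, {"source_num": 2}, {"source_num": 3}]
answer = "Bun ships a bundler [1]. Deno uses a permission model [4]."
print(check_citations(answer, sources))
# {'unknown': [4], 'unused': [2, 3]}
```

A non-empty `unknown` list is a reasonable trigger for a retry or a warning, because it means the model cited a source number that was never provided.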

def answer_with_citations(question: str) -> str:
    """Perplexity-style search and synthesize."""
    # Step 1: Search
    sources = search_web(question)
    # Step 2: Synthesize with citations
    answer = synthesize_answer(question, sources)
    # Step 3: Append the full list of retrieved sources. The model's own
    # Sources section covers only the URLs it cited.
    source_list = "\n".join(
        f"[{s['source_num']}] {s['title']} - {s['url']}"
        for s in sources
    )
    return f"{answer}\n\n---\nSources:\n{source_list}"


# Example
result = answer_with_citations(
    "What are the main differences between Bun and Deno in 2026?"
)
print(result)

What Perplexity does that you cannot easily replicate
- Custom re-ranking: Perplexity re-ranks search results using their own relevance model. You get Google's default ranking.
- Page fetching: Perplexity reads full page content, not just snippets. You get 150-character snippets unless you add a scraping step.
- Streaming: Perplexity streams the synthesis in real-time. Adding streaming to your pipeline requires SSE or WebSocket infrastructure.
- Follow-up context: Perplexity maintains conversation context across follow-up questions. Your version is stateless per call unless you add memory.
- Speed: Perplexity has optimized infrastructure for sub-2-second responses. Your version will take 3-8 seconds depending on search latency and LLM response time.
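Of the gaps listed above, page fetching is the one a small scraping step can partly close. The following is a minimal sketch using only the standard library, assuming you have already downloaded the page HTML (for example with `requests.get(url, timeout=10).text`). The names `page_text` and `max_chars` are hypothetical, and real pages usually need more robust extraction for navigation boilerplate and JavaScript-rendered content.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping script/style/noscript contents."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def page_text(html: str, max_chars: int = 4000) -> str:
    """Return visible page text, truncated to keep the prompt small."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)[:max_chars]


html = (
    "<html><head><style>p{}</style></head><body>"
    "<h1>Bun</h1><p>A fast runtime.</p><script>x()</script>"
    "</body></html>"
)
print(page_text(html))  # Bun A fast runtime.
```

The extracted text could replace the `snippet` field of a source before synthesis. The truncation matters because every extra character of page text adds input tokens to each query.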
Cost per query
Search: 1 credit at $0.005. LLM synthesis with GPT-4o: ~3,000 input tokens ($0.0075) + ~500 output tokens ($0.005). Total per query: ~$0.0175. Perplexity Pro costs $20/mo for 300 Pro searches, or $0.067 per Pro search. Your DIY version is cheaper per query but lacks the polish and speed.
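The arithmetic above can be reproduced with a few lines of Python, which makes it easy to plug in your own token counts. The per-million-token rates below are not quoted directly in this article. They are the rates implied by its figures ($0.0075 for ~3,000 input tokens, $0.005 for ~500 output tokens), so treat them as assumptions and check current pricing. `cost_per_query` is a hypothetical helper name.

```python
SEARCH_COST = 0.005         # one search credit, per the figures above
INPUT_PRICE_PER_M = 2.50    # assumed: implied by $0.0075 per ~3,000 input tokens
OUTPUT_PRICE_PER_M = 10.00  # assumed: implied by $0.005 per ~500 output tokens


def cost_per_query(input_tokens: int = 3000, output_tokens: int = 500) -> float:
    """Search credit plus LLM token cost for one question, in dollars."""
    llm_cost = (
        input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000
    return SEARCH_COST + llm_cost


diy = cost_per_query()
pro = 20 / 300  # $20/mo spread over 300 Pro searches

print(f"DIY: ${diy:.4f} per query")                 # DIY: $0.0175 per query
print(f"Perplexity Pro: ${pro:.3f} per Pro search")  # Perplexity Pro: $0.067 per Pro search
```

Passing a larger `input_tokens` value shows how quickly the estimate grows if you feed full page text to the model instead of snippets.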
When this approach makes sense
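Build this when you need citation-backed answers inside your own application, where you control the UI and data flow. The alternative is embedding Perplexity through their API ($5/1,000 queries on their Pro tier). That request fee alone works out to $0.005 per query, below the ~$0.0175 DIY estimate, so the API costs more than the DIY approach only if additional charges such as per-token fees push it past that figure. Check current pricing against your expected volume before deciding. For personal or internal tools where polish matters less than function, the DIY version works. For user-facing products, Perplexity's speed and quality are hard to match.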