How long does this tutorial on evaluating MCP servers for data quality take?

Most developers complete this tutorial in 15 to 30 minutes. You will need a Scavio API key (free tier works) and a working Python or JavaScript environment.

What do I need before starting?

You need Python 3.8 or later with the requests library installed, a Scavio API key from scavio.dev (signup includes 50 free credits), and a set of test queries with known expected results.

Can I run this tutorial with the free tier?

Yes. The free tier includes 50 credits on signup, which is more than enough to complete this tutorial and prototype a working solution.

What frameworks does this work with?

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt to your framework of choice.

Evaluate MCP Servers for Data Quality (2026)

Evaluate MCP servers for data quality by running a standardized set of test queries and scoring the results on coverage, source authority, and freshness. Most MCP server comparisons focus on latency and uptime but ignore the quality of the data returned, which directly affects LLM output. This tutorial builds a scoring harness that tests a search server against a curated set of queries with known expected attributes, then produces a quality report. Scavio is the server under test; the code below queries it through its REST search API.

Prerequisites

Python 3.8+ installed
requests library installed
A Scavio API key from scavio.dev
A set of test queries with known expected results

Walkthrough

Step 1: Define the evaluation dataset

Create a list of test queries paired with expected attributes like minimum result count and required domains.

Python

import os, requests

API_KEY = os.environ['SCAVIO_API_KEY']

EVAL_SET = [
    {'query': 'python 3.13 release date', 'expected_domain': 'python.org', 'min_results': 3},
    {'query': 'react 19 new features', 'expected_domain': 'react.dev', 'min_results': 3},
    {'query': 'nvidia h200 price', 'expected_domain': 'nvidia.com', 'min_results': 2},
    {'query': 'fastapi latest version', 'expected_domain': 'fastapi.tiangolo.com', 'min_results': 3},
]
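Before spending credits, it can help to sanity-check the dataset. The following is a minimal sketch and not part of the original harness: `validate_eval_set` and `REQUIRED_KEYS` are hypothetical names, and the checks simply mirror the three keys the later steps read from each test case.

```python
REQUIRED_KEYS = {'query', 'expected_domain', 'min_results'}

def validate_eval_set(eval_set: list) -> list:
    """Return a list of human-readable problems; an empty list means the set looks usable."""
    problems = []
    for i, case in enumerate(eval_set):
        missing = REQUIRED_KEYS - case.keys()
        if missing:
            problems.append(f'case {i}: missing keys {sorted(missing)}')
            continue  # skip value checks when keys are absent
        if not str(case['query']).strip():
            problems.append(f'case {i}: empty query')
        if not isinstance(case['min_results'], int) or case['min_results'] < 1:
            problems.append(f'case {i}: min_results must be a positive integer')
    return problems

sample = [
    {'query': 'python 3.13 release date', 'expected_domain': 'python.org', 'min_results': 3},
    {'query': 'react 19 new features', 'min_results': 0},  # deliberately broken
]
for problem in validate_eval_set(sample):
    print(problem)
```

Requiring a positive `min_results` also keeps the coverage ratio in Step 3 from dividing by zero.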

Step 2: Run queries and collect results

Send each evaluation query through the Scavio API and record the raw results for scoring.

Python

def run_eval_query(test_case: dict) -> dict:
    resp = requests.post('https://api.scavio.dev/api/v1/search',
        headers={'x-api-key': API_KEY},
        json={'platform': 'google', 'query': test_case['query']}, timeout=15)
    resp.raise_for_status()
    results = resp.json().get('organic_results', [])
    return {
        'query': test_case['query'],
        'results': results,
        'expected_domain': test_case['expected_domain'],
        'min_results': test_case['min_results'],
    }
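As written, a single failed request raises and aborts the whole run. One possible way to keep the evaluation going is sketched below. It assumes a hypothetical wrapper, `run_safely`, that takes the fetch function as an argument (in the harness that would be `run_eval_query`) and returns an empty result set on failure. The sketch relies on the exception classes in requests deriving from `IOError`, which the `OSError` clause should catch; confirm this against your installed version.

```python
from typing import Callable

def run_safely(test_case: dict, fetch: Callable[[dict], dict]) -> dict:
    """Call fetch(test_case); on a network or parsing error, return an empty result set."""
    try:
        return fetch(test_case)
    except (OSError, ValueError) as exc:
        # OSError covers connection-level failures; ValueError covers malformed JSON.
        return {
            'query': test_case['query'],
            'results': [],
            'expected_domain': test_case['expected_domain'],
            'min_results': test_case['min_results'],
            'error': str(exc),
        }

# Stand-in fetcher that always fails, used here only to show the fallback shape.
def failing_fetch(test_case: dict) -> dict:
    raise ConnectionError('simulated network failure')

case = {'query': 'fastapi latest version', 'expected_domain': 'fastapi.tiangolo.com', 'min_results': 3}
outcome = run_safely(case, failing_fetch)
print(outcome['results'], outcome['error'])
```

A failed query then scores zero on every dimension rather than stopping the report, and the extra `error` field records why.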

Step 3: Score each response

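Score on three dimensions: coverage (result count meets the minimum, with partial credit below it), authority (the expected domain appears in the top five results), and freshness (top-five snippets mention the current or previous year, with three or more earning full credit).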

Python

from datetime import date

def score_response(eval_result: dict) -> dict:
    results = eval_result['results']
    top = results[:5]
    # Coverage: full credit at min_results, partial credit below it.
    coverage = 1.0 if len(results) >= eval_result['min_results'] else len(results) / eval_result['min_results']
    # Authority: the expected domain appears in a top-5 link (substring match).
    domain_found = any(eval_result['expected_domain'] in r.get('link', '') for r in top)
    authority = 1.0 if domain_found else 0.0
    # Freshness: count top-5 snippets that mention the current or previous year.
    this_year = date.today().year
    recent_years = (str(this_year), str(this_year - 1))
    year_mentions = sum(1 for r in top if any(y in r.get('snippet', '') for y in recent_years))
    freshness = min(year_mentions / 3, 1.0)
    return {
        'query': eval_result['query'],
        'coverage': round(coverage, 2),
        'authority': authority,
        'freshness': round(freshness, 2),
        'composite': round((coverage + authority + freshness) / 3, 2),
    }
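The authority check in `score_response` is a substring test on the link, so a URL that merely contains the expected domain somewhere (in another hostname or in a query string) also counts. A stricter variant is sketched below using the standard library's `urllib.parse`. `domain_matches` is a hypothetical helper, not part of the original tutorial, and it assumes result links are absolute URLs with a scheme.

```python
from urllib.parse import urlparse

def domain_matches(link: str, expected_domain: str) -> bool:
    """True if the link's hostname is expected_domain or one of its subdomains."""
    host = (urlparse(link).hostname or '').lower()
    expected = expected_domain.lower()
    return host == expected or host.endswith('.' + expected)

links = [
    'https://docs.python.org/3/whatsnew/3.13.html',  # subdomain of python.org
    'https://www.python.org/downloads/',             # subdomain of python.org
    'https://python.org.example.com/fake',           # contains the string, wrong host
    'https://example.com/?ref=python.org',           # contains the string in the query
]
for link in links:
    print(domain_matches(link, 'python.org'), link)
```

The first two links match and the last two do not, whereas the substring test would accept all four. Links without a scheme have no parsed hostname and are treated as non-matching.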

Step 4: Generate the quality report

Run the full evaluation and print a summary report with per-query scores and an aggregate quality score.

Python

def run_evaluation():
    scores = []
    for test in EVAL_SET:
        result = run_eval_query(test)
        score = score_response(result)
        scores.append(score)
        print(f'{score["query"][:40]:<42} C={score["coverage"]} A={score["authority"]} F={score["freshness"]} => {score["composite"]}')
    avg = round(sum(s['composite'] for s in scores) / len(scores), 2)
    print(f'\nAggregate quality score: {avg}/1.00')
    return scores

run_evaluation()
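The aggregate score hides which dimension is weak. A per-dimension summary is one way to surface that; the sketch below assumes a hypothetical `summarize` helper that takes the list returned by `run_evaluation()` and uses hand-written scores in place of a live run.

```python
import json

def summarize(scores: list) -> dict:
    """Average each dimension across queries so weak dimensions are visible."""
    dims = ('coverage', 'authority', 'freshness', 'composite')
    n = len(scores)
    averages = {d: round(sum(s[d] for s in scores) / n, 2) if n else 0.0 for d in dims}
    return {'queries': n, 'averages': averages, 'per_query': scores}

# Hand-written scores in the shape returned by score_response, for illustration only.
example_scores = [
    {'query': 'q1', 'coverage': 1.0, 'authority': 1.0, 'freshness': 1.0, 'composite': 1.0},
    {'query': 'q2', 'coverage': 0.5, 'authority': 1.0, 'freshness': 0.0, 'composite': 0.5},
]
report = summarize(example_scores)
print(json.dumps(report, indent=2))
```

Writing the report as JSON makes it straightforward to compare runs over time or across servers.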

Python Example

Python

import os, requests

H = {'x-api-key': os.environ['SCAVIO_API_KEY']}

def eval_query(query, expected_domain):
    resp = requests.post('https://api.scavio.dev/api/v1/search', headers=H,
        json={'platform': 'google', 'query': query}, timeout=15)
    resp.raise_for_status()  # fail loudly on HTTP errors instead of scoring an error body
    results = resp.json().get('organic_results', [])
    # Authority check: expected domain appears in one of the top 5 links.
    found = any(expected_domain in r.get('link', '') for r in results[:5])
    return {'query': query, 'count': len(results), 'authority': found}

print(eval_query('python 3.13 release date', 'python.org'))

JavaScript Example

JavaScript

const H = {'x-api-key': process.env.SCAVIO_API_KEY, 'Content-Type': 'application/json'};
async function evalQuery(query, expectedDomain) {
  const resp = await fetch('https://api.scavio.dev/api/v1/search', {
    method: 'POST', headers: H, body: JSON.stringify({platform: 'google', query})
  });
  // fetch does not reject on HTTP errors, so check the status explicitly.
  if (!resp.ok) throw new Error(`Search request failed: HTTP ${resp.status}`);
  const results = (await resp.json()).organic_results || [];
  // Authority check: expected domain appears in one of the top 5 links.
  const found = results.slice(0, 5).some(item => item.link?.includes(expectedDomain));
  return {query, count: results.length, authority: found};
}
evalQuery('python 3.13 release date', 'python.org').then(console.log);

Expected Output

The walkthrough script prints one line per test query with its coverage (C), authority (A), and freshness (F) scores and their composite, followed by an aggregate quality score out of 1.00.


Related Resources

Best MCP Search Servers: Community Edition, May 2026

Best MCP Search Server in 2026

MCP Custom Search Server

MCP Search Gateway for Multi-Agent Systems

MCP Data Server

Give AI Agents Multi-Source Search via MCP
