Build an analytics pipeline where n8n pulls Google Search Console and GA4 data via their APIs, saves pre-aggregated reports to Google Drive, and Claude reads them via MCP to write analysis into Obsidian. The key insight: pre-aggregate data before sending to Claude. Raw GSC exports with 50,000 rows overwhelm context windows. Summarize first, analyze second.
Architecture
# Pipeline stages:
#
# [n8n Cron: daily 6am]
# |
# v
# [GSC API] --> aggregate by page, query, date
# |
# [GA4 API] --> aggregate by page, source, events
# |
# v
# [n8n Function: pre-aggregate]
# |
# v
# [Google Drive: /analytics/daily-reports/]
# |
# v
# [Claude via MCP: reads Drive files]
# |
# v
# [Obsidian vault: /analysis/weekly-seo-report.md]Step 1: pull GSC data with n8n
# n8n HTTP Request node: GSC Search Analytics
# Method: POST
# URL: https://www.googleapis.com/webmasters/v3/sites/
# https%3A%2F%2Fyoursite.com/searchAnalytics/query
# Auth: OAuth2 (Google credentials)
#
# Body:
# {
# "startDate": "{{ $now.minus(7, 'days').format('yyyy-MM-dd') }}",
# "endDate": "{{ $now.minus(1, 'day').format('yyyy-MM-dd') }}",
# "dimensions": ["query", "page", "date"],
# "rowLimit": 5000,
# "dimensionFilterGroups": [{
# "filters": [{
# "dimension": "country",
# "expression": "usa"
# }]
# }]
# }
# This returns up to 5,000 rows with clicks, impressions, CTR, position
# for each query+page+date combinationStep 2: pre-aggregate before Claude sees it
# n8n Function node: aggregate GSC data
# Place after GSC HTTP Request node
const rows = $input.first().json.rows || [];
// Aggregate by page (sum clicks/impressions, avg position)
const byPage = {};
for (const row of rows) {
const page = row.keys[1]; // page dimension
if (!byPage[page]) {
byPage[page] = { clicks: 0, impressions: 0, positions: [], queries: new Set() };
}
byPage[page].clicks += row.clicks;
byPage[page].impressions += row.impressions;
byPage[page].positions.push(row.position);
byPage[page].queries.add(row.keys[0]);
}
// Create summary (fits in Claude's context)
const summary = Object.entries(byPage)
.map(([page, d]) => ({
page,
clicks: d.clicks,
impressions: d.impressions,
avg_position: (d.positions.reduce((a, b) => a + b, 0) / d.positions.length).toFixed(1),
unique_queries: d.queries.size,
}))
.sort((a, b) => b.clicks - a.clicks)
.slice(0, 50); // Top 50 pages only
return [{ json: { summary, total_pages: Object.keys(byPage).length, period: "last_7_days" } }];Step 3: pull GA4 data
# n8n HTTP Request node: GA4 Data API
# Method: POST
# URL: https://analyticsdata.googleapis.com/v1beta/
# properties/YOUR_PROPERTY_ID:runReport
# Auth: OAuth2 (Google credentials)
#
# Body:
# {
# "dateRanges": [{"startDate": "7daysAgo", "endDate": "yesterday"}],
# "dimensions": [
# {"name": "pagePath"},
# {"name": "sessionSource"}
# ],
# "metrics": [
# {"name": "sessions"},
# {"name": "engagedSessions"},
# {"name": "conversions"}
# ],
# "limit": 1000,
# "orderBys": [{"metric": {"metricName": "sessions"}, "desc": true}]
# }
# Aggregate GA4 in a Function node similar to GSC:
# Group by pagePath, sum sessions/conversions, list top sourcesStep 4: save to Google Drive
# n8n Google Drive node: Create File
# Operation: Upload
# File Name: analytics_{{ $now.format('yyyy-MM-dd') }}.json
# Folder: /analytics/daily-reports/
# Content: {{ JSON.stringify($json) }}
# The file contains pre-aggregated data:
# {
# "gsc": { "summary": [...top 50 pages...], "total_pages": 230, "period": "last_7_days" },
# "ga4": { "summary": [...top 50 pages...], "total_sessions": 4500, "period": "last_7_days" }
# }
#
# This is ~5-10KB instead of the raw 500KB+ exportStep 5: Claude reads via MCP and writes analysis
# MCP configuration for Claude (in .mcp.json or settings):
# {
# "mcpServers": {
# "google-drive": {
# "command": "npx",
# "args": ["-y", "@anthropic/google-drive-mcp"],
# "env": { "GOOGLE_CLIENT_ID": "...", "GOOGLE_CLIENT_SECRET": "..." }
# }
# }
# }
# Claude prompt for weekly analysis:
# "Read the latest analytics file from Google Drive at
# /analytics/daily-reports/. Compare this week's top pages
# to last week. Identify:
# 1. Pages with biggest position improvements
# 2. Pages losing clicks despite stable position (CTR drop)
# 3. New queries driving traffic this week
# 4. Pages with high impressions but low CTR (optimization targets)
# Write the analysis to my Obsidian vault."Why pre-aggregation matters
- Raw GSC export: 50,000 rows = 5MB+ = exceeds Claude's practical context window
- Pre-aggregated: 50 top pages with metrics = 10KB = fits easily, faster response
- Claude analyzes patterns in summaries better than scanning raw rows
- Cheaper: fewer tokens = lower API cost if using Claude API
- Reproducible: saved aggregates serve as historical records
Enriching with SERP data
import requests, os
def enrich_gsc_with_serp(gsc_summary: list) -> list:
"""Add SERP feature data to top GSC pages."""
for page in gsc_summary[:10]: # Top 10 pages only
# Find the top query for this page
top_query = page.get("top_query", page["page"].split("/")[-1].replace("-", " "))
resp = requests.post(
"https://api.scavio.dev/api/v1/search",
headers={"x-api-key": os.environ["SCAVIO_API_KEY"]},
json={"query": top_query, "platform": "google"},
timeout=10,
)
serp = resp.json()
page["serp_features"] = {
"ai_overview": bool(serp.get("ai_overview")),
"featured_snippet": bool(serp.get("featured_snippet")),
"paa_count": len(serp.get("people_also_ask", [])),
}
return gsc_summary
# 10 credits = $0.05 to enrich top pages with SERP context
# Helps Claude identify WHY CTR changed (AI Overview appeared, etc.)This pipeline costs under $5/month: n8n self-hosted is free, GSC/GA4 APIs are free, Google Drive is free, and optional SERP enrichment via Scavio is $1-2/month. The value is not the data collection -- it is the pre-aggregation that makes Claude's analysis actually useful instead of drowning in raw rows.