Reddit Stock Sentiment AI Pipeline

Reddit's r/wallstreetbets, r/stocks, and r/investing carry near-real-time retail sentiment that can move individual stocks. An AI pipeline that scans these subreddits costs under $1/day for 50 tickers. It uses a search API that returns structured Reddit results for thread data and an LLM for sentiment classification. No scraping, no proxies, and no Reddit API rate limits to manage; you work within the search provider's pricing and quotas instead.

Pipeline architecture

Fetch Reddit threads mentioning target tickers via search API
Extract thread titles, text snippets, URLs, and post dates from the search results
Run sentiment classification on each thread (bullish / bearish / neutral)
Aggregate daily sentiment scores per ticker
Alert on sentiment spikes (sudden shift from neutral to strongly bullish/bearish)

Step 1: collect Reddit data

Python

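import os
import requests

# Scavio search API key, read from the environment
H = {"x-api-key": os.environ["SCAVIO_API_KEY"]}
SEARCH_URL = "https://api.scavio.dev/api/v1/search"

def get_reddit_mentions(ticker: str) -> list[dict]:
    """Fetch Reddit threads mentioning a stock ticker, de-duplicated by URL."""
    queries = [
        f"{ticker} stock reddit",
        f"{ticker} DD wallstreetbets",
        f"{ticker} analysis r/stocks",
    ]
    threads = []
    seen_urls = set()
    for q in queries:
        resp = requests.post(SEARCH_URL, headers=H,
                             json={"query": q, "platform": "reddit"},
                             timeout=30)
        resp.raise_for_status()  # fail loudly on auth or quota errors
        for r in resp.json().get("organic_results", []):
            url = r.get("link", "")
            # The same thread often matches several queries; count it only once
            if url and url in seen_urls:
                continue
            seen_urls.add(url)
            threads.append({
                "title": r.get("title", ""),
                "snippet": r.get("snippet", ""),
                "url": url,
                "date": r.get("date", ""),
            })
    return threads

threads = get_reddit_mentions("NVDA")
print(f"Found {len(threads)} unique threads mentioning NVDA")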
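The fetch above returns whatever the search index ranks highest. That can include threads that are weeks old, while the pipeline aggregates per day. One way to keep the daily aggregate honest is to filter on the `date` field before classification.

The sketch below assumes that field is either a relative string such as "5 hours ago" or an ISO-8601 date. The actual format the search API returns is not documented here, so inspect a real response first. `is_recent` and `max_age_hours` are hypothetical names.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# ASSUMED relative-date format, e.g. "5 hours ago" or "1 day ago"
_RELATIVE = re.compile(r"^(\d+)\s+(minute|hour|day|week)s?\s+ago$", re.IGNORECASE)
_UNIT_HOURS = {"minute": 1 / 60, "hour": 1, "day": 24, "week": 24 * 7}

def is_recent(date_str: str, max_age_hours: float = 24,
              now: Optional[datetime] = None) -> bool:
    """Return True if date_str (assumed format) is within max_age_hours of now.

    Missing or unparseable dates return False, so undated threads are skipped.
    """
    now = now or datetime.now(timezone.utc)
    s = (date_str or "").strip()
    m = _RELATIVE.match(s)
    if m:
        age_hours = int(m.group(1)) * _UNIT_HOURS[m.group(2).lower()]
        return age_hours <= max_age_hours
    try:
        # Before Python 3.11, fromisoformat rejects a trailing "Z"; such
        # dates fall through to the except branch and count as unparseable
        posted = datetime.fromisoformat(s)
    except ValueError:
        return False
    if posted.tzinfo is None:
        posted = posted.replace(tzinfo=timezone.utc)  # assume UTC when no offset
    return now - posted <= timedelta(hours=max_age_hours)
```

Usage: `recent = [t for t in threads if is_recent(t["date"])]`, then pass `recent` to the classifier. Treating unparseable dates as stale is a deliberate choice. It may drop some valid threads, but it avoids counting old ones as today's sentiment.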

Step 2: sentiment classification

Python

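from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
LABELS = {"bullish", "bearish", "neutral"}

def classify_sentiment(threads: list) -> list:
    """Classify each thread as bullish, bearish, or neutral."""
    results = []
    for t in threads:
        text = f"{t['title']}. {t['snippet']}"
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "user",
                "content": f"Classify this Reddit post about a stock as bullish, bearish, or neutral. Reply with one word only.\n\n{text}"
            }],
            max_tokens=5,
            temperature=0)  # low temperature keeps labels consistent across runs
        # Normalize replies such as "Bullish." and map anything unexpected to neutral
        sentiment = (resp.choices[0].message.content or "").strip().strip(".!").lower()
        if sentiment not in LABELS:
            sentiment = "neutral"
        results.append({**t, "sentiment": sentiment})
    return results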

Step 3: aggregate and alert

Python

from collections import Counter

def daily_sentiment(ticker: str):
    threads = get_reddit_mentions(ticker)
    classified = classify_sentiment(threads)
    counts = Counter(t["sentiment"] for t in classified)
    total = len(classified)
    score = {
        "ticker": ticker,
        "total_threads": total,
        "bullish": counts.get("bullish", 0),
        "bearish": counts.get("bearish", 0),
        "neutral": counts.get("neutral", 0),
        "bullish_pct": round(counts.get("bullish", 0) / max(total, 1) * 100, 1),
    }
    return score

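# Search cost: 50 tickers x 3 queries each x $0.005 = $0.75/day
# LLM cost: one classification per returned thread, not per query. Assuming
# ~10 results per query, that is up to ~1,500 gpt-4o-mini calls/day; at
# ~120 input tokens each and $0.15 per 1M input tokens, roughly $0.03/day
# Total: under $1/day for 50 tickers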
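Step 3 computes the daily counts but does not yet alert. The sketch below implements the "sudden shift" rule from the architecture list. It compares today's bullish and bearish shares against the trailing average of earlier days and flags a jump above a threshold. It skips days with too few threads to compare.

It takes dicts shaped like the output of `daily_sentiment`; storing those from day to day is not shown. The `check_spike` name, the 20-point threshold, and the 10-thread minimum are illustrative assumptions, not tuned values.

```python
from statistics import mean
from typing import Optional

def check_spike(today: dict, history: list[dict],
                threshold_pts: float = 20.0,
                min_threads: int = 10) -> Optional[str]:
    """Return an alert message if today's bullish or bearish share is at least
    threshold_pts percentage points above its trailing average, else None.

    `today` and each item in `history` are dicts shaped like daily_sentiment() output.
    """
    def share(day: dict, label: str) -> float:
        return 100.0 * day[label] / max(day["total_threads"], 1)

    # Too few threads today, or no usable history, makes the comparison meaningless
    if today["total_threads"] < min_threads:
        return None
    baseline_days = [d for d in history if d["total_threads"] >= min_threads]
    if not baseline_days:
        return None

    for label in ("bullish", "bearish"):
        now_pct = share(today, label)
        base_pct = mean(share(d, label) for d in baseline_days)
        if now_pct - base_pct >= threshold_pts:
            return (f"{today['ticker']}: {label} share {now_pct:.1f}% vs "
                    f"{base_pct:.1f}% trailing average")
    return None
```

Comparing against a trailing average, rather than only yesterday, keeps one noisy day from setting the baseline.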

Limitations

Reddit sentiment is noisy: memes, sarcasm, and pump-and-dump campaigns are common
Search results lag: new posts take time to be indexed by search engines
Not a trading signal on its own: use as one input among many, not as a sole decision driver
Small sample sizes: a ticker with 5 threads is not statistically meaningful
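To put a number on the small-sample caveat, one standard option (not part of the pipeline above) is a Wilson score interval. You report it alongside `bullish_pct` instead of the bare percentage. With only a handful of threads the interval is wide, which makes the uncertainty visible. The sketch uses z = 1.96 for an approximate 95% interval; `bullish_interval` is a hypothetical helper name.

```python
import math

def bullish_interval(bullish: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval, in percent, for the bullish share of `total` threads."""
    if total == 0:
        return (0.0, 100.0)  # no data: the share could be anything
    p = bullish / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    # Clamp to [0, 1] to guard against floating-point rounding at the edges
    lo, hi = max(0.0, center - half), min(1.0, center + half)
    return (round(lo * 100, 1), round(hi * 100, 1))
```

For example, 3 bullish threads out of 5 gives roughly 23% to 88%, while 60 out of 100 gives roughly 50% to 69%. Both show 60% bullish, but only the second narrows the range enough to act on.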
