This is an experiment in building a complete data science project with Claude Code (Opus 4.6). The entire pipeline — data ingestion, LLM-based emotion scoring, interactive visualization, and deployment — was built by Claude as an autonomous coding agent, with a human providing direction and editorial judgment.
The subject matter is secondary to the process. The analyzed account was chosen because it provides a large, readily available single-author corpus of public posts. The approach is generalizable to any public account on any platform.
Each post is scored 0–10 across 12 emotion dimensions by google/gemini-2.0-flash-lite-001 via OpenRouter. The 12 emotions are grouped into families.
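A minimal sketch of one scoring call, using only the Python standard library. OpenRouter exposes an OpenAI-compatible chat-completions endpoint; the prompt wording, the emotion names, and the assumption that the model replies with a bare JSON object are all illustrative, not the project's actual prompt or schema:

```python
import json
import os
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
# Illustrative subset of the 12 dimensions; the real emotion list is project-specific.
EMOTIONS = ["joy", "anger", "sadness"]

def build_payload(post_text: str) -> dict:
    """Ask the model to score one post, replying with JSON only."""
    prompt = (
        "Score this post 0-10 on each emotion. "
        f"Reply with only a JSON object with keys {EMOTIONS}.\n\n{post_text}"
    )
    return {
        "model": "google/gemini-2.0-flash-lite-001",
        "messages": [{"role": "user", "content": prompt}],
    }

def parse_scores(reply: str) -> dict:
    """Parse the model's JSON reply, clamping each score into 0-10."""
    raw = json.loads(reply)
    return {e: max(0, min(10, int(raw.get(e, 0)))) for e in EMOTIONS}

def score_post(post_text: str) -> dict:
    """One round trip to OpenRouter; requires OPENROUTER_API_KEY in the env."""
    req = urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(build_payload(post_text)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return parse_scores(body["choices"][0]["message"]["content"])
```

Clamping in `parse_scores` guards against the model occasionally returning out-of-range values.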
Period-level percentages are computed by summing raw scores across all posts in that period, then normalizing to 100%. Retweets and posts under 10 characters are excluded.
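The aggregation step above can be sketched as follows. Field names like `is_retweet`, `text`, and `scores` are assumptions for illustration, not the pipeline's actual schema:

```python
def period_percentages(posts: list[dict], emotions: list[str]) -> dict:
    """Sum raw 0-10 scores per emotion over one period, then normalize to 100%.

    Retweets and posts under 10 characters are excluded before summing.
    """
    kept = [
        p for p in posts
        if not p.get("is_retweet") and len(p.get("text", "")) >= 10
    ]
    totals = {e: sum(p["scores"][e] for p in kept) for e in emotions}
    grand = sum(totals.values())
    if grand == 0:  # empty period: avoid division by zero
        return {e: 0.0 for e in emotions}
    return {e: 100.0 * totals[e] / grand for e in emotions}
```

Normalizing per period means the percentages describe each period's emotional mix, not absolute intensity, so periods with different posting volumes remain comparable.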
The processing pipeline (also written by Claude) handles data ingestion from multiple formats, deduplication, batched LLM scoring with progress checkpointing, and topic extraction. The visualization is a single index.html file using Chart.js and D3.js — no build step, no framework, no backend.
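One way the checkpointed batch scoring could work is an append-only results file keyed by post ID, so an interrupted run resumes where it left off. File name, batch size, and record shape here are illustrative, not the pipeline's actual implementation:

```python
import json
from pathlib import Path

def score_with_checkpoints(posts, score_fn, checkpoint="scores.jsonl",
                           batch_size=50):
    """Score posts in batches, appending JSONL records after each batch.

    On restart, IDs already present in the checkpoint file are skipped.
    """
    path = Path(checkpoint)
    done = set()
    if path.exists():
        with path.open() as f:
            done = {json.loads(line)["id"] for line in f}
    pending = [p for p in posts if p["id"] not in done]
    with path.open("a") as f:
        for i in range(0, len(pending), batch_size):
            for post in pending[i:i + batch_size]:
                record = {"id": post["id"], "scores": score_fn(post["text"])}
                f.write(json.dumps(record) + "\n")
            f.flush()  # persist progress at each batch boundary
    return path
```

JSONL works well here because appends are cheap and a partially written run is still valid line-by-line.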
Scores are audited by re-scoring a √n random sample with a stronger model and comparing per-emotion agreement. The audit reports MAE, Pearson correlation, and agreement rates across all 12 dimensions. Audited against:

- x-ai/grok-4
- anthropic/claude-sonnet-4-5

The audit script is configurable via the AUDIT_MODEL environment variable, so any OpenRouter-accessible model can be used as a reference.
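The audit math can be sketched in plain Python. This assumes the two models' scores for one emotion are held in parallel lists, and that "agreement" means the two scores fall within a small tolerance of each other (the project's exact agreement definition is not stated here):

```python
import math
import random

def audit_sample(post_ids: list, seed: int = 0) -> list:
    """Draw a sqrt(n) random sample of post IDs for re-scoring."""
    k = max(1, round(math.sqrt(len(post_ids))))
    return random.Random(seed).sample(post_ids, k)

def agreement_metrics(a: list[float], b: list[float],
                      tolerance: int = 1) -> dict:
    """MAE, Pearson r, and within-tolerance agreement rate for one emotion."""
    n = len(a)
    mae = sum(abs(x - y) for x, y in zip(a, b)) / n
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    # Pearson r is undefined when either series is constant; report 0.0 then.
    pearson = cov / math.sqrt(var_a * var_b) if var_a and var_b else 0.0
    agree = sum(abs(x - y) <= tolerance for x, y in zip(a, b)) / n
    return {"mae": mae, "pearson": pearson, "agreement": agree}
```

Reporting all three metrics per emotion, rather than one overall number, surfaces dimensions where the cheap model systematically drifts from the reference.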
All code is available in this repository. MIT licensed.