Current Era // Jan 2026

model-arena

AI blind taste test. Can you tell Claude from GPT from Gemini? Interactive challenges that put your AI intuition to the test.

HTML 85%, JavaScript 6%, CSS 5%, 388 KB

Project Overview

An interactive blind comparison platform where users try to identify which AI model generated a given response. Challenges present outputs from Claude, GPT, and Gemini side by side without labels. Vote on which is which, see the reveal, and track your accuracy on a leaderboard. It turns out the differences between frontier models are more subtle than most people think.

AI models compared: 3
Total size (lightweight): 388 KB
Dependencies: 0
Testing methodology: Blind

Tech Stack

Pure vanilla HTML, CSS, and JavaScript. No frameworks, no build tools, no server required. Everything runs client-side on GitHub Pages.

HTML 85%, JavaScript 6%, CSS 5%, TypeScript 4%
Vanilla HTML, Vanilla CSS, Vanilla JS, GitHub Pages, LocalStorage

Architecture

A simple client-side architecture. Pre-generated AI responses are embedded in the HTML. The comparison engine randomizes presentation order and tracks user votes in LocalStorage.
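As a rough illustration of the vote tracking described above, the sketch below keeps a running tally of correct and total guesses in any object that exposes `getItem` and `setItem`. The key name `model-arena:votes`, the stored shape, and all function names are assumptions made for this example, not the project's actual code.

```javascript
// Hypothetical storage key; the project's real key name is not known.
const STORAGE_KEY = "model-arena:votes";

// In-memory stand-in with the same getItem/setItem shape as localStorage,
// so the sketch also runs outside a browser.
function createMemoryStorage() {
  const data = new Map();
  return {
    getItem: (key) => (data.has(key) ? data.get(key) : null),
    setItem: (key, value) => {
      data.set(key, String(value));
    },
  };
}

// Read the saved tally, falling back to zeros if nothing valid is stored.
function loadStats(storage) {
  try {
    const parsed = JSON.parse(storage.getItem(STORAGE_KEY));
    if (
      parsed &&
      Number.isInteger(parsed.correct) &&
      Number.isInteger(parsed.total)
    ) {
      return { correct: parsed.correct, total: parsed.total };
    }
  } catch (err) {
    // Corrupt JSON: ignore it and start from an empty tally.
  }
  return { correct: 0, total: 0 };
}

// Record one guess and persist the updated tally.
function recordVote(storage, wasCorrect) {
  const stats = loadStats(storage);
  const updated = {
    correct: stats.correct + (wasCorrect ? 1 : 0),
    total: stats.total + 1,
  };
  storage.setItem(STORAGE_KEY, JSON.stringify(updated));
  return updated;
}

// Accuracy as a fraction between 0 and 1; 0 when no votes exist yet.
function accuracy(stats) {
  return stats.total === 0 ? 0 : stats.correct / stats.total;
}
```

In a browser, `window.localStorage` can be passed as `storage`; the in-memory version is only there so the logic can be exercised without one. Taking the storage object as a parameter keeps the tally logic independent of where the data lives.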

Model A
VS
Model B

Which response was written by Claude? GPT? Gemini?

Static HTML (GitHub Pages)
    |
Pre-generated AI responses (embedded in markup)
    |
Comparison Engine (client-side JS)
    |--- Randomize --- Shuffle model presentation order
    |--- Vote ------- Capture user identification guess
    |--- Reveal ----- Show correct model + scoring
    |--- Track ------ LocalStorage leaderboard
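The Randomize, Vote, and Reveal steps in the diagram above could be sketched as follows. This is a minimal version under stated assumptions: the response objects (`{ model, text }`), the slot labels "A", "B", "C", and the function names are all hypothetical. A Fisher-Yates shuffle is used here as one common way to get an unbiased order; the source does not say which shuffle the project uses.

```javascript
// Fisher-Yates shuffle; returns a new array and leaves the input untouched.
// `random` is injectable so the order can be made deterministic in tests.
function shuffle(items, random = Math.random) {
  const result = items.slice();
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [result[i], result[j]] = [result[j], result[i]];
  }
  return result;
}

// Randomize: shuffle the responses and hide model names behind neutral
// slot labels. The answer key is kept apart from what the player sees.
function createRound(responses, random = Math.random) {
  const order = shuffle(responses, random);
  const slots = [];
  const answerKey = {};
  order.forEach((response, i) => {
    const label = String.fromCharCode(65 + i); // 65 is "A"
    slots.push({ label, text: response.text });
    answerKey[label] = response.model;
  });
  return { slots, answerKey };
}

// Reveal: compare each guess with the answer key and count the hits.
// `guesses` maps a slot label to the model name the player picked.
function reveal(round, guesses) {
  const results = round.slots.map(({ label }) => ({
    label,
    guessed: guesses[label] ?? null,
    actual: round.answerKey[label],
    correct: guesses[label] === round.answerKey[label],
  }));
  const score = results.filter((r) => r.correct).length;
  return { results, score, total: results.length };
}
```

A round would be created once per challenge with `createRound(responses)`, rendered from `slots` only, and scored with `reveal(round, guesses)` after the player votes. Keeping `answerKey` out of `slots` means the rendering code never has the model name in hand, which makes it harder to leak a label by accident.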

Key Features

👀
Blind Comparison
Responses are presented without model labels. No bias, no brand loyalty -- just pure output quality assessment.
Interactive Voting
Click to identify which model wrote each response. Get instant feedback on your accuracy with detailed reveal screens.
🏆
Leaderboard
Track your identification accuracy over time. How well can you really distinguish between frontier AI models?
📊
Multiple Challenges
Different prompt categories test different model strengths: creative writing, code generation, reasoning, and analysis.
Zero Dependencies
Under 400 KB total. No frameworks, no build step, no server. Opens instantly in any browser.
💡
Metaball Lava Lamp
Custom canvas-based metaball visual effect inspired by Balatro. Pure GPU-accelerated eye candy as a background effect.
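For readers unfamiliar with metaballs, the sketch below shows the classic formulation: each ball contributes r² / d² to a scalar field, and a pixel is filled wherever the summed field reaches a threshold. This is an illustration of the general technique, not the project's code. The ball shape (`{ x, y, r }`), the colour, and the function names are assumptions, and this version runs per pixel on the CPU for clarity; how the actual effect is rendered is not stated in the source.

```javascript
// Summed influence of all balls at (x, y): r^2 / squared distance each.
function fieldAt(balls, x, y) {
  let sum = 0;
  for (const ball of balls) {
    const dx = x - ball.x;
    const dy = y - ball.y;
    const d2 = dx * dx + dy * dy;
    // At a ball's exact centre the influence is unbounded.
    if (d2 === 0) return Infinity;
    sum += (ball.r * ball.r) / d2;
  }
  return sum;
}

// Build an RGBA pixel buffer: opaque where the field reaches the
// threshold, fully transparent everywhere else.
function renderMask(balls, width, height, threshold = 1) {
  const pixels = new Uint8ClampedArray(width * height * 4);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      if (fieldAt(balls, x, y) >= threshold) {
        const i = (y * width + x) * 4;
        pixels[i] = 255; // red
        pixels[i + 1] = 120; // green
        pixels[i + 2] = 40; // blue
        pixels[i + 3] = 255; // alpha
      }
    }
  }
  return pixels;
}
```

In a browser, the buffer could be drawn with `ctx.putImageData(new ImageData(pixels, width, height), 0, 0)` on a 2D canvas context. Because the contributions add up, two balls that are each too far from a point to fill it alone can still fill it together, which is what produces the merging blobs of a lava lamp.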

Stats

Created: Jan 27, 2026
Repository size: 388 KB
Languages: 4
External dependencies: 0

Lessons Learned

01. LLM Comparison Methodology

Blind testing removes the strongest source of bias when comparing AI models: branding and expectations heavily influence how a response is perceived. Once the labels are removed, the perceived quality gap between frontier models narrows considerably.

02. Blind Testing UX

Designing a blind comparison UI requires careful randomization, clear visual separation between options, and satisfying reveal animations. The "moment of truth" is the core UX loop.

03. The Subtleties Between Models

Each model has telltale patterns -- Claude tends toward nuanced caveats, GPT toward confident structure, Gemini toward concise directness. But these patterns are inconsistent enough to keep the game genuinely challenging.