Current Era // Jan 2026

model-arena

AI blind taste test. Can you tell Claude from GPT from Gemini? Interactive challenges that put your AI intuition to the test.

HTML 85% · JavaScript 6% · CSS 5% · 388 KB

01 // Overview

Project Overview

An interactive blind comparison platform where users try to identify which AI model generated a given response. Challenges present outputs from Claude, GPT, and Gemini side by side without labels. Vote on which is which, see the reveal, and track your accuracy on a leaderboard. It turns out the differences between frontier models are more subtle than most people think.

3

AI models compared

388 KB

Total size (lightweight)

0

Dependencies

Blind

Testing methodology

02 // Tech Stack

Tech Stack

Pure vanilla HTML, CSS, and JavaScript. No frameworks, no build tools, no server required. Everything runs client-side on GitHub Pages.

HTML 85% · JavaScript 6% · CSS 5% · TypeScript 4%

Vanilla HTML · Vanilla CSS · Vanilla JS · GitHub Pages · LocalStorage

03 // Architecture

Architecture

A simple client-side architecture. Pre-generated AI responses are embedded in the HTML. The comparison engine randomizes presentation order and tracks user votes in LocalStorage.

[Illustration: Model A vs. Model B]

Which response was written by Claude? GPT? Gemini?

Static HTML (GitHub Pages)
    |
    Pre-generated AI responses (embedded in markup)
    |
Comparison Engine (client-side JS)
    |--- Randomize --- Shuffle model presentation order
    |--- Vote ------- Capture user identification guess
    |--- Reveal ----- Show correct model + scoring
    |--- Track ------ LocalStorage leaderboard
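
The Randomize, Vote, and Reveal steps in the diagram can be sketched in a few plain functions. This is a minimal illustration under stated assumptions, not the project's actual code: the function names (shuffle, buildRound, reveal), the { model, text } record shape, and the "Model A" style slot labels are all hypothetical.

```javascript
// Hypothetical sketch of the engine steps; all names are illustrative.

// Fisher-Yates shuffle. Returns a new array and leaves the input untouched.
// `random` is injectable so the order can be fixed in tests.
function shuffle(items, random = Math.random) {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Randomize: hide model names behind neutral slot labels (Model A, Model B, ...).
function buildRound(responses, random = Math.random) {
  const shuffled = shuffle(responses, random);
  const slots = shuffled.map((response, i) => ({
    label: "Model " + String.fromCharCode(65 + i),
    text: response.text,
  }));
  // The answer key is kept apart from the slots that get rendered.
  const answerKey = Object.fromEntries(
    shuffled.map((response, i) => [slots[i].label, response.model])
  );
  return { slots, answerKey };
}

// Vote + Reveal: compare the user's guesses with the answer key.
function reveal(round, guesses) {
  const results = round.slots.map(({ label }) => ({
    label,
    guessed: guesses[label],
    actual: round.answerKey[label],
    correct: guesses[label] === round.answerKey[label],
  }));
  const score = results.filter((result) => result.correct).length;
  return { results, score, total: results.length };
}
```

A round might be used as `const round = buildRound(responses)` followed by `reveal(round, { "Model A": "Claude", "Model B": "GPT", "Model C": "Gemini" })`. Keeping the answer key out of the rendered slots means a label cannot leak into the markup before the reveal, although on a purely client-side page a determined user can still find it in the source.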

04 // Key Features

Key Features

👀

Blind Comparison

Responses are presented without model labels. No bias, no brand loyalty -- just pure output quality assessment.

✍

Interactive Voting

Click to identify which model wrote each response. Get instant feedback on your accuracy with detailed reveal screens.

🏆

Leaderboard

Track your identification accuracy over time. How well can you really distinguish between frontier AI models?
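
Accuracy tracking of this kind needs only a small read-modify-write cycle against browser storage. The sketch below is an assumption about how it could look, not the project's implementation: the storage key, the { correct, total } record shape, and every function name are hypothetical. The storage object is passed in so the same code works with window.localStorage in a browser and with an in-memory stand-in elsewhere.

```javascript
// Hypothetical storage key and record shape; not taken from the project.
const STATS_KEY = "model-arena:stats";

// In-memory stand-in exposing the same getItem/setItem methods as localStorage,
// so the sketch also runs outside a browser.
function createMemoryStorage() {
  const data = new Map();
  return {
    getItem: (key) => (data.has(key) ? data.get(key) : null),
    setItem: (key, value) => {
      data.set(key, String(value));
    },
  };
}

// Read saved stats, falling back to zeros when nothing valid is stored.
function loadStats(storage) {
  try {
    const parsed = JSON.parse(storage.getItem(STATS_KEY));
    if (parsed && Number.isInteger(parsed.correct) && Number.isInteger(parsed.total)) {
      return parsed;
    }
  } catch (err) {
    // Corrupt or hand-edited data: fall through and start fresh.
  }
  return { correct: 0, total: 0 };
}

// Add one round's result to the running totals and persist them.
function recordRound(storage, correct, total) {
  const stats = loadStats(storage);
  const next = { correct: stats.correct + correct, total: stats.total + total };
  storage.setItem(STATS_KEY, JSON.stringify(next));
  return next;
}

// Fraction of correct identifications, or 0 before any votes.
function accuracy(stats) {
  return stats.total === 0 ? 0 : stats.correct / stats.total;
}
```

In the browser the call would be `recordRound(window.localStorage, 2, 3)`. Because the data lives in one browser's storage, a leaderboard built this way is per-device rather than shared between users.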

📊

Multiple Challenges

Different prompt categories test different model strengths: creative writing, code generation, reasoning, and analysis.
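
If votes are tagged with their prompt category, accuracy can be broken down per category to show where a player's intuition is strongest. The category names below come from the description above; the { category, correct } vote shape and the function name are assumptions made for illustration.

```javascript
// Category names are from the feature description; everything else is hypothetical.
const CATEGORIES = ["creative writing", "code generation", "reasoning", "analysis"];

// votes: [{ category, correct }] -> { [category]: { correct, total, accuracy } }
function accuracyByCategory(votes) {
  const summary = {};
  for (const name of CATEGORIES) {
    summary[name] = { correct: 0, total: 0, accuracy: 0 };
  }
  for (const vote of votes) {
    const entry = summary[vote.category];
    if (!entry) continue; // Ignore votes with an unknown category.
    entry.total += 1;
    if (vote.correct) entry.correct += 1;
  }
  for (const entry of Object.values(summary)) {
    entry.accuracy = entry.total === 0 ? 0 : entry.correct / entry.total;
  }
  return summary;
}
```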

⚡

Zero Dependencies

Under 400 KB total. No frameworks, no build step, no server. Opens instantly in any browser.

💡

Metaball Lava Lamp

Custom canvas-based metaball effect inspired by Balatro: GPU-accelerated eye candy that runs in the background.
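
For readers unfamiliar with metaballs, the usual formulation sums an inverse-square falloff from each ball and treats a pixel as "inside" where the sum reaches a threshold, which is what makes nearby blobs merge smoothly. The sketch below shows that classic technique on the CPU. The page does not describe how the project's effect is actually implemented, so the function names, the { x, y, r } ball shape, and the threshold of 1 are all assumptions; a GPU version would typically evaluate the same sum per pixel in a fragment shader.

```javascript
// Classic metaball field: the sum of r^2 / d^2 over all balls.
// Ball shape { x, y, r } is a hypothetical choice for this sketch.
function fieldAt(x, y, balls) {
  let sum = 0;
  for (const ball of balls) {
    const dx = x - ball.x;
    const dy = y - ball.y;
    const distSq = dx * dx + dy * dy;
    // Clamp the squared distance to avoid dividing by zero at a ball's center.
    sum += (ball.r * ball.r) / Math.max(distSq, 1e-9);
  }
  return sum;
}

// A point belongs to the blob when the summed field reaches the threshold.
function isInside(x, y, balls, threshold = 1) {
  return fieldAt(x, y, balls) >= threshold;
}

// Rasterize to a width*height mask of 0/1 values, sampling each pixel's center.
function rasterize(width, height, balls, threshold = 1) {
  const mask = new Uint8Array(width * height);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      mask[y * width + x] = isInside(x + 0.5, y + 0.5, balls, threshold) ? 1 : 0;
    }
  }
  return mask;
}
```

With a single ball, the threshold of 1 reproduces a plain circle of radius r. With two balls close together, points between them can pass the threshold even though neither ball alone would cover them, which produces the lava-lamp merging.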

05 // Stats

Stats

Jan 27

Created, 2026

388 KB

Repository size

Languages

0

External dependencies

06 // Lessons Learned

Lessons Learned

LLM Comparison Methodology

Blind testing is the fairest way to compare AI models, because branding and expectations heavily influence perception. When labels are removed, the perceived quality gap between frontier models narrows significantly.

Blind Testing UX

Designing a blind comparison UI requires careful randomization, clear visual separation between options, and satisfying reveal animations. The "moment of truth" is the core UX loop.

The Subtleties Between Models

Each model has telltale patterns -- Claude tends toward nuanced caveats, GPT toward confident structure, Gemini toward concise directness. But these patterns are inconsistent enough to keep the game genuinely challenging.