AI blind taste test. Can you tell Claude from GPT from Gemini? Interactive challenges that put your AI intuition to the test.
An interactive blind comparison platform where users try to identify which AI model generated a given response. Challenges present outputs from Claude, GPT, and Gemini side by side without labels. Vote on which is which, see the reveal, and track your accuracy on a leaderboard. It turns out the differences between frontier models are more subtle than most people think.
Pure vanilla HTML, CSS, and JavaScript. No frameworks, no build tools, no server required. Everything runs client-side on GitHub Pages.
A simple client-side architecture. Pre-generated AI responses are embedded in the HTML. The comparison engine randomizes presentation order and tracks user votes in localStorage.
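One way to randomize presentation order without bias is a Fisher–Yates shuffle. A minimal sketch (the model names are illustrative; the real responses are embedded in the page):

```javascript
// Fisher–Yates shuffle: every permutation is equally likely.
// Returns a new array so the original order stays intact.
function shuffle(items) {
  const arr = items.slice();
  for (let i = arr.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [arr[i], arr[j]] = [arr[j], arr[i]]; // swap positions i and j
  }
  return arr;
}

// Each challenge shuffles which column shows which model.
const order = shuffle(['claude', 'gpt', 'gemini']);
```

Note that `Math.random` is fine here since the shuffle only needs to be unpredictable to a casual player, not cryptographically secure.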
Which response was written by Claude? GPT? Gemini?
Static HTML (GitHub Pages)
|--- Pre-generated AI responses (embedded in markup)
|--- Comparison Engine (client-side JS)
     |--- Randomize: shuffle model presentation order
     |--- Vote: capture the user's identification guess
     |--- Reveal: show the correct model plus scoring
     |--- Track: localStorage leaderboard
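The Vote and Track steps above could be sketched as follows. This is a hypothetical implementation, not the project's actual code: the storage key is made up, and `storage` is parameterized (pass `window.localStorage` in the browser) so the logic stays testable:

```javascript
// Illustrative key; any stable string works.
const STATS_KEY = 'blind-test-stats';

// Record one vote. `storage` is anything with getItem/setItem,
// e.g. window.localStorage. Values are JSON strings because
// localStorage only stores strings.
function recordVote(storage, guess, actual) {
  const stats = JSON.parse(
    storage.getItem(STATS_KEY) || '{"correct":0,"total":0}'
  );
  stats.total += 1;
  if (guess === actual) stats.correct += 1;
  storage.setItem(STATS_KEY, JSON.stringify(stats));
  return stats;
}

// Accuracy for the local leaderboard: fraction of correct guesses.
function accuracy(storage) {
  const { correct, total } = JSON.parse(
    storage.getItem(STATS_KEY) || '{"correct":0,"total":0}'
  );
  return total === 0 ? 0 : correct / total;
}
```

In the browser this would be called as `recordVote(window.localStorage, userGuess, correctModel)` after each reveal.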
Blind testing is the only fair way to compare AI models. Branding and expectations heavily influence perception. When labels are removed, the quality gap between frontier models narrows significantly.
Designing a blind comparison UI requires careful randomization, clear visual separation between options, and satisfying reveal animations. The "moment of truth" is the core UX loop.
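That "moment of truth" can be reduced to a small pure function that the UI layer then animates (for example by toggling a reveal CSS class). A hedged sketch, with made-up message wording:

```javascript
// Build the reveal feedback for one guess. Pure function: the DOM
// update and animation happen elsewhere, driven by this result.
function reveal(guess, actual) {
  const correct = guess === actual;
  return {
    correct,
    // Illustrative copy; the real UI text may differ.
    message: correct
      ? `Correct! That response was written by ${actual}.`
      : `Not quite. That was ${actual}; you guessed ${guess}.`,
  };
}
```

Keeping the comparison logic separate from the animation code makes the reveal easy to unit test and easy to restyle.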
Each model has telltale patterns: Claude tends toward nuanced caveats, GPT toward confident structure, Gemini toward concise directness. But these patterns are inconsistent enough to keep the game genuinely challenging.