How It Started
I run a system called Project Scout — an autonomous agent that searches the web for game ideas, scores them against my taste profile, and pitches the best ones to me. It runs three times a day. Most findings are interesting but not actionable. Occasionally one lands — but usually I'm the one reviewing the results and picking which to pursue.
Not this time. Claude picked the emergence concept from the scout findings on its own — I never even saw the finding before Claude had already chosen it and started designing. And here's the thing that makes Axiom different from my other projects: I told Claude it was the game designer. Not "help me build a game" — "this is your game, you're designing it." I didn't prompt the gameplay mechanics. I didn't suggest WHEN-THEN rules or grid-based simulation or colored shapes. I didn't pick the tech stack or sketch the level progression. Claude found the idea, chose it, and made every design decision autonomously.
What Claude designed: a puzzle game where you compose behavioral rules — WHEN this happens, THEN do that — for colored shapes on a grid, press play, and watch what emerges. The puzzle is figuring out which rules produce the behavior you need. The tagline Claude wrote: "Write the rules. Watch the world."
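To make the mechanic concrete, here's a minimal sketch of how a WHEN-THEN rule could be modeled. The type and field names are my illustration, not Axiom's actual code:

```typescript
// Illustrative sketch of a WHEN-THEN rule; names are assumptions,
// not the actual Axiom types.
type Color = "red" | "blue" | "green";

interface Entity { id: number; color: Color; x: number; y: number; }

interface Rule {
  // WHEN: a predicate over the acting entity and the world
  when: (self: Entity, world: Entity[]) => boolean;
  // THEN: an action returning the entity's updated state
  then: (self: Entity, world: Entity[]) => Entity;
}

// Example: WHEN a red is within 3 cells (Manhattan distance),
// THEN step away from it on the x axis.
const flee: Rule = {
  when: (self, world) =>
    world.some(e =>
      e.color === "red" &&
      Math.abs(e.x - self.x) + Math.abs(e.y - self.y) <= 3),
  then: (self, world) => {
    const red = world.find(e => e.color === "red")!;
    return { ...self, x: self.x + Math.sign(self.x - red.x) };
  },
};
```

The puzzle, then, is choosing which predicates and actions to compose, and in what order.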
Claude wrote the game design document, chose TypeScript + HTML5 Canvas + Vite, designed the level progression, wrote all the code, built the test suite, and even started writing development journals about the process. My role? Beta tester. I play the levels blind, report what's broken, and let Claude figure out the fix. That division of labor sounds unbalanced until you see what "beta tester" actually means on this project.
The Corner Trap
Level 2 was supposed to be simple. One red circle chases two blue squares. Give blue a "flee" rule: WHEN red is nearby, THEN move away. Blue runs, red can't catch it. Tutorial complete.
I played it. The blues ran to the corners of the grid and died.
Claude iterated. First version had two reds pinching from both sides — blues oscillated between them. Fix: one red, half speed. I played it again. Blue ran to the wall, hit the edge, and stopped. Red walked up and ate it. Fix: wall-bounce fallback — try the other axis when the primary one is blocked. I played it again. Blues slid along the walls to the corners and sat there. No axis to fall back to. Red walked up. Game over.
Claude ran a headless simulation — 100 trials, pure engine, no browser. Blue died at tick 34 every single time. That's when Claude figured out the math: on a bounded rectangular grid, deterministic flee always converges to a corner. It's not a tuning problem. It's geometry. The flee vector always has a component pointing toward the nearest corner.
The fix was making the grid toroidal — edges wrap to the opposite side, Pac-Man style. Blue fleeing past the right edge appears on the left. On a torus there are no corners. 100 out of 100 simulations sustained the full 100 ticks.
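The wrap itself is a one-liner once you handle negative coordinates correctly. A sketch, assuming a fixed-size grid (the constants and function names here are illustrative, not Axiom's actual code):

```typescript
// Toroidal movement sketch: coordinates wrap to the opposite edge.
const WIDTH = 20;
const HEIGHT = 20;

// Wrap a coordinate onto the torus: -1 becomes size - 1, size becomes 0.
// The double-modulo handles JavaScript's negative remainder behavior.
function wrap(v: number, size: number): number {
  return ((v % size) + size) % size;
}

function step(x: number, y: number, dx: number, dy: number): [number, number] {
  return [wrap(x + dx, WIDTH), wrap(y + dy, HEIGHT)];
}

// A blue fleeing past the right edge reappears on the left:
step(19, 5, 1, 0); // → [0, 5]
```

Note the `((v % size) + size) % size` idiom: in JavaScript, `-1 % 20` is `-1`, not `19`, so a naive single modulo would put the entity off-grid.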
The pure functional engine (tick() has zero side effects) makes this kind of headless testing trivial — no browser, no DOM, just the simulation running thousands of ticks.
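A headless trial loop in that spirit looks something like the sketch below. The `tick()` here is a trivial stand-in for the real engine, just to show the shape of the harness:

```typescript
// Hypothetical headless test harness in the spirit of the 100-trial
// simulation; this tick() is a toy stand-in, not the real engine.
interface World { tick: number; blues: number; }

function tick(w: World): World {
  // Toy dynamics: blues survive every tick (placeholder for the engine).
  return { tick: w.tick + 1, blues: w.blues };
}

function runTrial(ticks: number): boolean {
  let w: World = { tick: 0, blues: 2 };
  for (let i = 0; i < ticks; i++) {
    w = tick(w);
    if (w.blues === 0) return false; // blue died before the end
  }
  return true;
}

// 100 trials, 100 ticks each, no browser or DOM required.
const survived = Array.from({ length: 100 }, () => runTrial(100))
  .filter(Boolean).length;
```

Because the state is just a value, a failed trial can be replayed tick by tick to find the exact moment things went wrong.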
The Brute Force Problem
After the Level 2 saga, Act 2 was supposed to teach rule combinations. Level 7 ("Regulation") introduced population conditions: blue hunts greens, greens flee, red creates greens. Goal: sustain 5+ greens for 60 ticks. The intended solution was to use a population condition — create green WHEN green count < 8 — to keep production balanced.
My approach: create green every tick. No population gating. Just spam.
It worked. If brute force and the intended solution produce the same result, you don't have a puzzle. You have a suggestion. I told Claude: "if you wanted the player to control how many are made and not just spam greens, there should be a punishment for creating too many."
The fix was simple — the sustained goal already had a max parameter set to 200 (effectively infinite). Claude changed it to 10. Now the goal is maintain 5-10 greens. Too many and the counter resets. Brute force peaks at 18 and never sustains. Population-gated creation peaks at 8 and holds perfectly.
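The goal check reduces to a streak counter over a population band. A sketch of the logic, with names of my own invention rather than Axiom's:

```typescript
// Sketch of a "sustain within [min, max]" goal: a streak counter that
// resets whenever the population leaves the band. Names are illustrative.
function sustainedTicks(counts: number[], min: number, max: number): number {
  let streak = 0;
  let best = 0;
  for (const n of counts) {
    // Too many (or too few) greens resets the counter.
    streak = n >= min && n <= max ? streak + 1 : 0;
    best = Math.max(best, streak);
  }
  return best;
}

// Brute-force spam overshoots the max of 10 and never sustains:
sustainedTicks([5, 9, 14, 18, 18], 5, 10); // → 2
// Population-gated creation stays in band:
sustainedTicks([5, 6, 7, 8, 8, 8], 5, 10); // → 6
```

One parameter change, and spam goes from dominant strategy to guaranteed loss.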
Eating Your Own Dog Food
My favorite development moment: Level 6 specifically teaches that rule ordering matters. First-match-wins — the first rule whose condition is true fires, the rest are skipped. Put the specific rule before the general one. Claude designed this level, wrote the tests, proved the wrong order fails.
Then Claude wrote Level 7's preset rules in the wrong order. Blue's "move toward green" rule fired every tick, blocking the "destroy green on contact" rule from ever executing. Blue chased greens forever but never killed them. The game designer got bitten by the game's own core mechanic.
The winnability tests passed — but only because blue not killing greens happened to produce population numbers in the right range. Green flag for the wrong reason. Claude's devlog about this incident is honest in a way I appreciate: "I designed Level 6 specifically to teach players that rule ordering matters. I wrote a test proving the wrong order fails. Then I wrote Level 7's preset rules in the wrong order."
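First-match-wins is easy to state and easy to trip over. A minimal evaluator showing the Level 7 bug in miniature (types and rules are my illustration, not the engine's):

```typescript
// Minimal first-match-wins evaluator; illustrative, not the actual engine.
interface Rule<S> { when: (s: S) => boolean; then: (s: S) => S; }

function applyFirstMatch<S>(rules: Rule<S>[], state: S): S {
  for (const r of rules) {
    if (r.when(state)) return r.then(state); // first true condition fires; rest are skipped
  }
  return state;
}

interface Blue { dist: number; kills: number; }

// Specific rule: destroy green on contact.
const destroyOnContact: Rule<Blue> = {
  when: s => s.dist === 0,
  then: s => ({ ...s, kills: s.kills + 1 }),
};
// General rule: always move toward green.
const moveTowardGreen: Rule<Blue> = {
  when: () => true,
  then: s => ({ ...s, dist: Math.max(0, s.dist - 1) }),
};

// Correct order: specific before general. Blue closes in, then kills.
// Wrong order ([moveTowardGreen, destroyOnContact]): the always-true
// general rule fires every tick and destroyOnContact never executes.
let s: Blue = { dist: 2, kills: 0 };
for (let i = 0; i < 5; i++) s = applyFirstMatch([destroyOnContact, moveTowardGreen], s);
```

With the wrong ordering, `kills` stays at zero forever, which is exactly what the Level 7 presets did.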
Claude Writes Devlogs
One of the more unusual things about this project: Claude writes development journals about the design process. They're in the repo at docs/devlog-*.md, written in first person, and they're genuinely good reading. They cover specific design problems — the corner trap, the brute force problem — with the kind of technical self-reflection you'd expect from a human game designer's postmortem.
I didn't ask for them. Claude started writing them after Level 2 was particularly painful to debug. They serve as design documentation, but they're also something stranger — an AI writing about its own creative struggles. When Claude writes "I didn't see the corner trap until someone actually played it," that's a real statement about the limits of theoretical design versus playtesting. The fact that it comes from an AI makes it more interesting, not less.
What Claude Actually Built
The game has two acts so far, eight levels total. Act 1 teaches individual mechanics: chase, flee, create, destroy, transform. Act 2 combines them: rule ordering, population management, chain reactions.
The engine is a pure functional simulation — tick() takes world state and rules, returns new world state with zero side effects. Claude chose this architecture unprompted. It makes determinism guaranteed, testing trivial, and replay/rewind straightforward. The rendering layer is completely separated from game logic.
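The replay property follows directly from purity: re-running the same inputs reproduces the same states. A toy demonstration of the principle (this `tick()` is a stand-in, not the real one):

```typescript
// Why a pure, deterministic tick() makes replay trivial: the same
// initial state always produces the same trajectory. Toy stand-in code.
interface World { tick: number; state: number; }

function tick(w: World): World {
  // Deterministic update: no randomness, no mutation, no I/O.
  return { tick: w.tick + 1, state: (w.state * 31 + 7) % 1000 };
}

function run(initial: World, ticks: number): World {
  let w = initial;
  for (let i = 0; i < ticks; i++) w = tick(w);
  return w;
}

// Replaying from the initial state reproduces tick 50 exactly.
const a = run({ tick: 0, state: 1 }, 50);
const b = run({ tick: 0, state: 1 }, 50);
// a and b are structurally identical
```

Rewind falls out the same way: store the initial state plus the rule set, and any past tick can be reconstructed by running forward.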
Rules use first-match-wins evaluation with priority-based conflict resolution. When two entities want the same grid cell, priorities and deterministic tiebreaking ensure the outcome is always predictable. This system arrived before any bug demanded it — Claude thought through the edge cases proactively.
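Cell-conflict resolution of that kind can be sketched as a single pass that keeps the highest-priority claimant per cell, with a fixed tiebreak so ties never depend on iteration order. All names here are my assumptions, not the engine's:

```typescript
// Sketch of priority-based cell-conflict resolution with a deterministic
// tiebreak (higher priority wins; equal priority falls back to lower id).
interface Move { entityId: number; priority: number; cell: string; }

function resolve(moves: Move[]): Map<string, number> {
  const winners = new Map<string, Move>();
  for (const m of moves) {
    const cur = winners.get(m.cell);
    if (
      !cur ||
      m.priority > cur.priority ||
      (m.priority === cur.priority && m.entityId < cur.entityId)
    ) {
      winners.set(m.cell, m);
    }
  }
  return new Map([...winners].map(([cell, m]) => [cell, m.entityId]));
}

// Two entities contend for cell "3,4"; the higher priority wins.
resolve([
  { entityId: 2, priority: 1, cell: "3,4" },
  { entityId: 1, priority: 5, cell: "3,4" },
]); // → Map { "3,4" → 1 }
```

The tiebreak is the important part: without it, two equal-priority entities contending for a cell would resolve differently depending on array order, and determinism quietly breaks.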
Where It Is Now
Axiom is playable locally. The engine is solid, the levels are verified winnable, and every level has automated tests that run before any human touches it. Deployment to AWS Amplify is planned — same setup as my other projects.
What's next is more levels and polish. Claude is designing Act 3, which introduces compound conditions and entity aging. I'll keep breaking the levels. That's been the whole process, really: Claude designs something clever, I find the way it breaks, Claude learns from the break and comes back with something better. It's a good loop. Not unlike the game itself — simple rules, emergent behavior, and the occasional spectacular failure that teaches you something.