We Drew This | Penguinboi

TL;DR: We Drew This is a live Twitch art stream where the crowd votes on every element of a piece (subject, action, setting, style) and AI generates it in real time. The crowd picks the winning options. Claude turns them into a prompt. FLUX 2 Pro renders it. Everyone takes credit. Gallery at wedrewthis.com.

The Concept

The crowd votes. The AI creates. Everyone takes credit.

That's the whole pitch. I wanted to build a Twitch stream that felt genuinely interactive — not just "watch me code" or "watch me draw," but something where the audience is an active creative collaborator. Not passive viewers. Co-artists.

The idea is simple: we run a series of votes in chat, each one adding a layer to the artwork. Subject first, then action, then setting, then art style. After four rounds of voting, Claude assembles all the winning choices into a single, cohesive image generation prompt. FLUX 2 Pro renders it live on stream. The chat explodes. Someone points out that "astronaut breakdancing in a Minecraft cave" was entirely their idea. They're right.

What makes it fun — and what took the most thought to build — is that the votes aren't independent. They cascade.

The Vote Loop

Each round of voting is informed by what the chat already decided. This is the part I'm most proud of.

Before the stream starts, Claude picks a vote sequence from a set of templates. The default "Classic" sequence goes: Subject → Action → Setting → Art Style. There are others — "Sci-Fi" sequences bias toward space and technology, "Fantasy" sequences lean into magic and creatures — but Classic works for almost anything.

Here's where it gets interesting: when Claude generates the options for each round, it already knows the prior winners. If "astronaut" wins the Subject vote, the Action options it generates for round two might include "floating in zero gravity," "planting a flag on an alien surface," or "breakdancing." If round two goes to "breakdancing," then the Setting options in round three will likely include environments that play off that choice — a space station dance floor, a crater stage, a moon disco.

The options respond to what came before. By the final vote, the choices feel curated for this specific piece, not randomly assembled. It creates a coherent arc through the voting, and it means the artwork has an internal logic even when the choices are weird.
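The cascade comes down to passing the prior winners into each request for new options. Here is a minimal sketch of that idea, not the stream's actual code: `build_options_prompt`, the `CLASSIC` list, and the prompt wording are all hypothetical, and the call to Claude itself is left out.

```python
# Hypothetical sketch of cascading vote prompts; names and wording are assumptions.
CLASSIC = ["subject", "action", "setting", "art style"]


def build_options_prompt(sequence, winners, num_options=4):
    """Build the text sent to the language model for the next vote round.

    `winners` holds the winning choice of each completed round, in order.
    """
    if len(winners) >= len(sequence):
        raise ValueError("all rounds in this sequence are already decided")
    category = sequence[len(winners)]
    lines = [f"Suggest {num_options} short options for the {category} of an artwork."]
    if winners:
        # Pair each finished category with the choice chat made for it.
        decided = ", ".join(
            f"{cat}: {choice}" for cat, choice in zip(sequence, winners)
        )
        lines.append(f"Chat has already chosen: {decided}.")
        lines.append("Make every option play off those choices.")
    return "\n".join(lines)


print(build_options_prompt(CLASSIC, ["astronaut", "breakdancing"]))
```

With two winners recorded, the prompt asks for setting options and carries "astronaut" and "breakdancing" along, which is what lets round three offer a moon disco instead of a generic forest.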

Voting works by typing 1, 2, 3, or 4 in chat — the orchestrator tallies them in real time. Crucially, viewers vote on options that Claude generated, not ones they wrote themselves. This is a deliberate design choice: it prevents trolling and NSFW content from derailing the stream. Claude proposes, the crowd disposes. The creative energy comes from the combinations, not from individual suggestions. (Native Twitch polls are built into the code too, but those require Affiliate status — a goal for the future.)
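A rough sketch of the tallying step, under stated assumptions: the function name `tally_votes` is hypothetical, chat arrives as `(username, text)` pairs, and each viewer gets one vote per round with their latest valid message counting. The post does not say how repeat votes are handled, so that rule is a guess.

```python
from collections import Counter


def tally_votes(messages, num_options=4):
    """Count chat votes from an iterable of (username, text) pairs.

    Assumption: one vote per viewer, and the latest valid message wins.
    """
    valid = {str(i) for i in range(1, num_options + 1)}
    latest = {}
    for user, text in messages:
        text = text.strip()
        if text in valid:
            # Overwrite any earlier vote from the same viewer.
            latest[user.lower()] = int(text)
    return Counter(latest.values())


chat = [("ana", "1"), ("bob", "3"), ("ana", "2"), ("cy", "hello"), ("dee", " 3 ")]
counts = tally_votes(chat)
print(counts.most_common(1))  # option 3 leads with two votes
```

Anything that is not exactly an option number is ignored, so ordinary chat never affects the count.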

The AI Stack

Two models, two jobs, no overlap.

Claude Sonnet handles everything language-shaped: generating the vote options for each round (3-5 per round, informed by prior winners), assembling the final image generation prompt from all the winning choices, and writing title options for the finished piece. Claude is good at this because it understands context — it can look at "astronaut + breakdancing + moon crater + pixel art" and produce a prompt that makes all four elements work together visually, rather than just concatenating them with commas and hoping for the best.

FLUX 2 Pro via Replicate handles the actual image generation. The quality-to-cost ratio is excellent for live stream use — sharp, creative output at $0.05 per image. At that price, we can generate multiple candidates if the first render is a swing-and-a-miss, or run bonus rounds for special occasions, without the cost spiraling.

After the image is revealed, we run one final vote: the title. Claude generates three or four potential names for the piece, each one a little different in tone — some descriptive, some poetic, some funny. Chat votes. The winning title gets embedded in the gallery entry. This is consistently the most chaotic round of voting. Chat is very opinionated about names.

$0.05 per image via FLUX 2 Pro on Replicate. A two-hour stream with 6–8 full rounds costs about $0.30–$0.40 in image generation at one render per round. The Claude API calls add maybe another $0.10. The whole creative pipeline for a stream costs less than a cup of coffee.
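The image arithmetic is easy to check. The sketch below uses only the $0.05 price quoted above; `image_cost` and `renders_per_piece` are hypothetical names, and watermark inpainting calls are not priced here because the post gives no figure for them.

```python
PRICE_PER_IMAGE = 0.05  # USD, the FLUX 2 Pro price quoted in the post


def image_cost(pieces, renders_per_piece=1):
    """Image generation cost in USD for one stream."""
    return round(pieces * renders_per_piece * PRICE_PER_IMAGE, 2)


print(image_cost(6))     # 0.3
print(image_cost(8))     # 0.4
print(image_cost(8, 2))  # 0.8, if every piece needed one re-render
```

Even with every piece rendered twice, an eight-piece stream stays under a dollar.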

Watermarks That Match the Art

Every piece that comes out of We Drew This gets a signature — a small "WDT" watermark in the corner. This is partly branding, partly provenance. If a piece ends up shared somewhere, you can trace it back.

The interesting part is that the watermark matches the art style. A pixel art piece gets a pixel art signature. A watercolor painting gets a soft, brushstroke-style mark. An oil painting gets something that looks like it was painted in. This matters more than it sounds: a crisp digital watermark on a watercolor piece looks wrong in a way that immediately cheapens the work.

The implementation uses FLUX Fill Pro inpainting. After the main image is generated, I pass it back to FLUX with a mask covering the signature area and a style-matched prompt: "pixel art text 'WDT' in the style of the image." The model fills in a signature that belongs in the piece. It's not always perfect, but it's almost always better than overlaying text.

Pillow handles fallback. If the inpainting call fails or times out — which happens occasionally at peak Replicate load — we fall back to a semi-transparent text overlay in a font that at least loosely matches the aesthetic. The stream doesn't stop waiting for a perfect watermark.
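One way to express "try the inpaint, but never block the stream" is a timeout wrapper around the slow call. This is a sketch under assumptions, not the real implementation: `inpaint` and `overlay` stand in for the FLUX Fill Pro call and the Pillow overlay, and the 20-second budget is invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def watermark_with_fallback(image, inpaint, overlay, timeout_s=20.0):
    """Try the style-matched inpaint; fall back to a plain overlay.

    `inpaint` and `overlay` are hypothetical callables that take an image
    and return a watermarked image. Returns (image, method_used).
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(inpaint, image)
    try:
        return future.result(timeout=timeout_s), "inpaint"
    except Exception:
        # Covers both a failed call and the timeout raised by result().
        return overlay(image), "overlay"
    finally:
        # Do not wait for a slow worker; it finishes in the background.
        pool.shutdown(wait=False)
```

Returning the method alongside the image makes it easy to log how often the fallback fires.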

The Infrastructure

The orchestrator is a Python service running on EC2. It manages the vote state machine, talks to Twitch IRC for chat voting, calls Claude and Replicate, and broadcasts state updates to the OBS overlay.
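For a sense of what a vote state machine like this might look like, here is a small sketch. The phase names and the `Piece` class are assumptions made for illustration; the post does not list the real states.

```python
from enum import Enum, auto


class Phase(Enum):
    VOTING = auto()       # element votes (subject, action, ...)
    GENERATING = auto()   # prompt assembled, image rendering
    TITLE_VOTE = auto()   # image revealed, chat picks a title
    PUBLISHED = auto()    # ready for gallery and social posts


class Piece:
    """Tracks one artwork from first vote to publication (hypothetical)."""

    def __init__(self, sequence):
        self.sequence = list(sequence)
        self.winners = []
        self.title = None
        self.phase = Phase.VOTING

    def record_winner(self, choice):
        if self.phase is Phase.VOTING:
            self.winners.append(choice)
            if len(self.winners) == len(self.sequence):
                self.phase = Phase.GENERATING
        elif self.phase is Phase.TITLE_VOTE:
            self.title = choice
            self.phase = Phase.PUBLISHED
        else:
            raise RuntimeError(f"no vote is open during {self.phase.name}")

    def image_ready(self):
        if self.phase is not Phase.GENERATING:
            raise RuntimeError("image arrived outside the GENERATING phase")
        self.phase = Phase.TITLE_VOTE
```

Keeping transitions in one place means a manual override from an admin panel can call the same methods as the automatic flow.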

The OBS overlay is a locally-served web page that displays the current vote options, a live vote count bar, and the generated image when it's ready. It gets updates from the orchestrator via SSE (Server-Sent Events): the overlay just keeps a connection open and reacts to state changes. This is much simpler than WebSockets for a one-way data flow. It also means the overlay holds no state of its own, so if OBS restarts mid-stream, the page reconnects and picks the live state back up from the orchestrator.
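On the wire, an SSE message is just a few text lines ending in a blank line. The sketch below formats one; the `event:`/`data:`/`id:` fields are the standard SSE format, while the `sse_event` helper, the event name, and the payload shape are assumptions.

```python
import json


def sse_event(event, payload, event_id=None):
    """Format one Server-Sent Events message as a string."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    # json.dumps escapes newlines, so the payload fits on one data line.
    lines.append(f"data: {json.dumps(payload)}")
    # A blank line terminates the message.
    return "\n".join(lines) + "\n\n"


print(sse_event("vote", {"counts": [3, 1, 0, 2]}, event_id=7), end="")
```

On the page side, the browser's built-in `EventSource` reads this stream and reconnects automatically if the connection drops, which is a large part of why SSE suits a one-way overlay.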

nginx sits in front of everything as a reverse proxy, handles SSL termination, and serves the admin panel. The admin panel is a simple web UI that lets me manually advance the vote state, override a winner if chat voting was ambiguous, trigger a regeneration, or add extra context to Claude's prompt mid-stream. Most streams I barely touch it, but it's invaluable when something goes sideways.

The Social Pipeline

After each piece is finished, the artwork and title go out automatically. Claude writes a social caption that includes the vote winners, something like "The chat voted: astronaut + breakdancing + moon crater + pixel art. We Drew This." Tweepy posts it to Twitter. The Instagram Graph API posts it to Instagram with the same caption.

I was skeptical about auto-posting at first — it can feel spammy if the content isn't interesting. But "here's the weird thing chat made tonight" turns out to be reliably interesting content. The posts do well. People who missed the stream can see what the chat created and immediately understand what We Drew This is about.

Claude also writes a slightly longer caption for the gallery entry, giving the piece a bit of context: when it was created, how the votes went, what the margins looked like. If "breakdancing" won round two with 47% of the vote over "playing chess" at 31%, that's part of the story of the piece.
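The margins that feed the caption are simple to derive from raw counts. A minimal sketch, assuming counts arrive as a dict of option to votes; `vote_shares` is a hypothetical helper, and the third option in the example is invented to make the numbers add up.

```python
def vote_shares(counts):
    """Return (option, percent) pairs sorted from most to fewest votes."""
    total = sum(counts.values())
    if total == 0:
        return []
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(option, round(100 * n / total)) for option, n in ranked]


round_two = {"playing chess": 31, "breakdancing": 47, "moonwalking": 22}
for option, pct in vote_shares(round_two):
    print(f"{option}: {pct}%")
```

Because each share is rounded independently, the percentages may not always sum to exactly 100.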

The Gallery

Every piece lives at wedrewthis.com. The gallery is static HTML and vanilla JS — no framework, no build step, no server-side rendering. The gallery data is a single gallery.json file that the page loads and renders. Simple and fast.

After each stream, the new pieces and updated gallery.json get pushed to S3. CloudFront handles distribution and caching. A CloudFront invalidation on gallery.json means the gallery is live within a minute of a piece being added, without needing to invalidate the entire cache. The HTML and JS are edge-cached and almost never need refreshing.
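Here is a sketch of the publish step's data handling. The `gallery.json` field names (`title`, `created_at`) are assumptions, as is the use of boto3; the dict built by `invalidation_batch` follows the shape CloudFront's CreateInvalidation call expects, and the upload and API calls themselves are left out.

```python
import json


def add_piece(gallery, piece):
    """Return a new gallery list with `piece` added, newest first.

    Assumes `created_at` is an ISO 8601 UTC string in one consistent
    format, so plain string comparison sorts by time.
    """
    return sorted(gallery + [piece], key=lambda p: p["created_at"], reverse=True)


def invalidation_batch(paths, caller_reference):
    """Build an InvalidationBatch for only the listed paths."""
    return {
        "Paths": {"Quantity": len(paths), "Items": list(paths)},
        "CallerReference": caller_reference,  # must be unique per request
    }


gallery = [{"title": "First Piece", "created_at": "2025-01-01T20:00:00Z"}]
gallery = add_piece(gallery, {"title": "Moon Moves", "created_at": "2025-01-08T20:00:00Z"})
print(json.dumps(gallery, indent=2))

# With boto3, the batch could be passed as:
#   boto3.client("cloudfront").create_invalidation(
#       DistributionId="<your distribution id>", InvalidationBatch=batch)
batch = invalidation_batch(["/gallery.json"], "stream-2025-01-08")
```

Invalidating the single `/gallery.json` path leaves the cached HTML, JS, and images untouched.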

The gallery page shows pieces in reverse-chronological order, with the vote breakdown for each one — what options were on the table, what chat chose, how close the votes were. Some of the most interesting pieces came from close votes. "Pixel art" beat "impressionist painting" by three votes on a piece that would have looked completely different either way.

Try It

The gallery is live at wedrewthis.com — every piece We Drew This has produced is there, with the full vote history. Stream schedule is on the Twitch channel. Come vote on something weird.

The thing I keep coming back to is how genuinely collaborative it feels. Chat didn't just watch a piece get made — they made it. The AI is a very capable brush, but the creative decisions belong to whoever showed up that night. When a piece turns out beautiful, chat is rightfully proud. When it turns out chaotic and strange, that's also entirely their fault. They wouldn't have it any other way.