TLDR claude-scrub is a free, open-source CLI tool that finds and removes secrets from Claude Code's local session data. Zero dependencies, single Python file, MIT licensed. Because your AI coding assistant remembers everything — including your API keys.

The Problem Nobody Talks About

AI coding assistants see everything you do. That's the whole point — they need context to be helpful. But "everything" includes your API keys, database credentials, and access tokens. When you paste an API key into a conversation, or your assistant reads your .env file, or a command outputs credentials — all of that gets stored in local session history. In plaintext. Indefinitely. Like a very diligent, very indiscreet note-taker.

This is true of any AI agent that stores conversation history locally — Claude Code, Copilot, Cursor, Windsurf, all of them. The assistant doesn't need to be malicious; the data just accumulates. It's how conversation context works. But it means your local data directory is a treasure trove of secrets sitting on disk, unencrypted, one stolen laptop away from ruining your weekend.

In Claude Code specifically, secrets leak into ~/.claude/ through more channels than you'd expect:

  • Direct pasting — "here's my API key: sk-ant-..."
  • Tool calls — Claude Code reads .env files, config files, and command output containing credentials
  • File snapshots — session files store the contents of files Claude Code reads, including files with secrets
  • Command output — env, cat .env, and printenv all dump secrets into conversation history
  • Memory files — if Claude Code summarizes a conversation that included a credential, that secret persists across conversations forever

Other AI agents have their own versions of this problem — different file paths, different storage formats, same "oh, that's been sitting there this whole time?" moment. I went looking for a tool that already cleaned this up. Nothing existed. I use Claude Code, so I built one for Claude Code's data.

Design Decisions

The first design question was scope. I already had a script called claude-sessions that listed and resumed Claude Code sessions. Adding scan-and-scrub functionality to it made more sense than starting a new tool, so claude-scrub absorbed the session listing and added two new commands: scan (read-only audit) and scrub (destructive removal).

A few design choices that shaped everything else:

Zero dependencies. The entire tool is a single Python file using only the standard library. This isn't minimalism for its own sake — this tool touches your most sensitive data. Every dependency is a supply chain attack surface. One file, stdlib only, nothing to compromise.

Scan before scrub. The read-only audit runs first so you see exactly what's exposed before deciding to remove anything. "Trust me, I'm a regex" is not a confidence-inspiring sales pitch.

No backups. This one felt wrong at first. Normally you'd want .bak files before destructive operations. But backup files containing your plaintext secrets defeat the entire purpose of scrubbing. Congratulations, you now have two copies of your AWS keys on disk.

Opt-in for external data. Paste cache, file history, and third-party tool databases aren't scrubbed by default because they're managed by other tools. Memory files are scrubbed by default because they persist across conversations and accumulate secrets over time — a higher-risk exposure vector than ephemeral cache files.

The False Positive Problem

The first version was simple: match every secret pattern, scrub everything. This went about as well as you'd expect. On real session data, the generic catch-all patterns (key=..., token=..., secret=...) matched roughly 28,000 times. The vast majority were config values, code discussion, and documentation — not actual secrets. Scrubbing all of them was like curing a headache by removing the patient's head.

This led to the pattern tier design:

  • Specific tier — patterns with distinctive formats that rarely false-positive. Prefixed API keys like sk-ant-, AKIA, ghp_. Private key headers. Luhn-validated credit card numbers. Scrubbed by default.
  • Generic tier — broad catch-all patterns like password=... and api_key=... that match by keyword plus value assignment. Shown in scan output but only scrubbed with --aggressive.

I also narrowed the generic keywords. Standalone key, token, and secret matched everything from React component keys to pagination tokens to someone explaining what an API key is in a code comment. Keeping only compound forms like api_key, secret_key, and password cut the noise dramatically.
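The two-tier split can be sketched as a pair of pattern tables. This is an illustrative sketch, not the tool's actual data structures — the names and exact regexes here are assumptions, showing only the shape: distinctive prefixed formats in the specific tier, compound keyword-plus-value assignments in the generic tier.

```python
import re

# Specific tier: distinctive formats that rarely false-positive.
# Scrubbed by default. (Regexes are illustrative approximations.)
SPECIFIC = {
    "anthropic_key": re.compile(r"sk-ant-[A-Za-z0-9_\-]{20,}"),
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "github_pat": re.compile(r"ghp_[A-Za-z0-9]{36}"),
}

# Generic tier: compound keyword + value assignment. Shown in scan
# output, but only scrubbed with --aggressive. Note the compound
# keywords (api_key, not bare "key") to cut false positives.
GENERIC = {
    "api_key_assignment": re.compile(r"api_key\s*[=:]\s*\S{8,}"),
    "password_assignment": re.compile(r"password\s*[=:]\s*\S{8,}"),
}
```

A bare `key=` or `token=` pattern deliberately doesn't appear in either table — that's the narrowing described above.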

~40 patterns, 2 tiers — built-in patterns cover AI providers, cloud platforms, payment processors, dev tools, crypto material, and credit cards, with an optional expanded database of 1,600+ patterns for comprehensive scanning.

Teaching a Regex to Think

Even with narrower keywords, api_key=development_mode and api_key=sK3j8fAx7mNp2qRwL9 both match the generic pattern. But one is a config value and the other is almost certainly a real secret.

The difference is randomness. Real API keys are generated strings with high character diversity. Config values are human-readable with repeated common letters. Shannon entropy quantifies this:

H(s) = -Σ (p_i × log₂(p_i))

For each unique character, take its probability of occurrence p_i, multiply by log₂ of that probability, sum the terms, and negate. High entropy (~4.0+ bits) means random-looking — probably a real secret. Low entropy (~3.0–3.5 bits) means human-readable — probably not.

I use this to promote high-entropy generic matches to the specific tier. They get scrubbed by default even though the pattern itself is generic. The threshold (~3.8 bits) sits in the natural gap between "things humans write" and "things key generators produce." Five lines of Python, zero dependencies, and the tool got dramatically smarter about what to scrub.
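The entropy gate really is just a few lines. Here's a sketch — the function name is illustrative, and the ~3.8-bit threshold is the figure from above, a tuning choice rather than a universal constant:

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Bits per character: -sum(p_i * log2(p_i)) over character frequencies."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())

THRESHOLD = 3.8  # sits in the gap between human-written and generated strings

shannon_entropy("development_mode")    # ~3.1 bits: repeated e's and m's, low diversity
shannon_entropy("sK3j8fAx7mNp2qRwL9")  # ~4.2 bits: every character unique
```

A generic match whose value clears the threshold gets promoted to the specific tier and scrubbed by default; one below it stays generic-only.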

Credit Cards and the Luhn Algorithm

PII is a different category from API keys, but credit card numbers in session data are a serious exposure. The challenge: a regex matching 13–19 digit sequences starting with 3–6 also matches a lot of things that aren't credit cards.

The Luhn algorithm — a mod-10 checksum built into every real card number — eliminates almost all false positives. Eight lines of Python turns a noisy regex into a high-precision detector. Credit cards go in the specific tier, scrubbed by default. If a number passes both the format regex and the Luhn check, it's a real card number and it shouldn't be sitting in your session history.
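Those eight lines look roughly like this — a standard Luhn implementation (the function name is mine, not necessarily the tool's):

```python
def luhn_valid(number: str) -> bool:
    """Mod-10 checksum: double every second digit from the right
    (subtracting 9 if the result exceeds 9); a valid card number's
    digit sum is divisible by 10."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:       # every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

luhn_valid("4111 1111 1111 1111")  # True: the classic Visa test number
luhn_valid("4111 1111 1111 1112")  # False: one digit off breaks the checksum
```

Random digit sequences pass a mod-10 check only one time in ten, so stacking Luhn on top of the format regex cuts false positives by roughly 90% before any other filtering.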

The "Oh No" Moment

I discovered this one the fun way: by running the tool on real data and watching it eat my session files. Claude Code stores session history as compact JSONL — one JSON object per line, no pretty-printing. The greedy catch-all regex patterns used \S+ to match secret values, which happily consumed JSON structural characters along with the actual secret.

A match like api_key=mykey12345678},{"other":"val"} — the \S{8,} pattern swallowed the entire string, including the }, ", and everything after it. The "scrubbed" file was silently corrupted JSON. The security tool had become the threat. Very on-brand for a tool called "scrub," honestly.

The fix was simple: replace \S with [^\s",}\]] — exclude JSON delimiters from the character class. But the lesson is important: tools that modify data in-place need to understand the format they're operating on, not just the content they're looking for. I also added make_json_safe_regex() to automatically sanitize external and custom patterns on load, so the same bug can't sneak in through user-defined patterns or the downloaded pattern database.
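The before-and-after is easy to demonstrate. This sketch shows the character-class substitution on the example above; the helper here is a simplified stand-in for the real make_json_safe_regex(), whose implementation may differ:

```python
import re

def make_json_safe(pattern: str) -> str:
    # Swap the greedy \S for a class that stops at JSON delimiters:
    # whitespace, double quote, closing brace, closing bracket, comma.
    return pattern.replace(r"\S", r'[^\s",}\]]')

line = '{"msg":"api_key=mykey12345678"},{"other":"val"}'

greedy = re.search(r"api_key=\S{8,}", line)
safe = re.search(make_json_safe(r"api_key=\S{8,}"), line)

greedy.group()  # swallows the closing quote, brace, and the next JSON object
safe.group()    # stops cleanly: 'api_key=mykey12345678'
```

Replacing the greedy match leaves valid JSON behind; replacing the safe match corrupts nothing.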

Hardening the Security Tool

A tool that reads and modifies secret-containing files is itself an attack surface. If an attacker can plant a symlink at ~/.claude/projects/evil/session.jsonl pointing to /etc/passwd, or craft a ReDoS pattern that hangs the scanner, the security tool becomes the vulnerability. It's turtles all the way down.

So: hardening.

  • Symlink protection everywhere the tool touches the filesystem — discovery, scrub, cache loading, session launch. Every is_file() check is paired with not is_symlink().
  • ReDoS prevention — is_safe_regex() rejects patterns over 500 characters or containing nested quantifiers like (a+)+. Applied to both custom user patterns and downloaded pattern databases.
  • Atomic writes during scrub — tempfile.mkstemp() in the same directory, write the scrubbed content, then os.replace(). If the process crashes mid-scrub, the original file is untouched rather than half-written.
  • Download safety — 30-second socket timeout and 10MB size limit when fetching the external patterns database.
  • Narrowed exception handling — download_patterns_db() catches (URLError, OSError) instead of bare Exception. Unexpected errors surface instead of being silently swallowed.
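The atomic-write pattern deserves a closer look, since it's what makes a mid-scrub crash harmless. A minimal sketch of the technique, assuming the standard mkstemp-then-replace idiom (the function name is illustrative):

```python
import os
import tempfile

def atomic_write(path: str, content: str) -> None:
    """Replace a file's contents atomically. The temp file lives in the
    same directory so os.replace() stays on one filesystem, where it is
    an atomic rename on both POSIX and Windows."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(content)
        os.replace(tmp_path, path)  # atomic: readers see old or new, never half
    except BaseException:
        os.unlink(tmp_path)  # clean up the temp file on any failure
        raise
```

A crash before os.replace() leaves the original session file byte-for-byte intact; a crash after it leaves the fully scrubbed version. There is no in-between state.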

The principle: a security tool should be at least as hardened as the data it protects.

Scrubbing Isn't Enough

Finding secrets is only half the job. If an API key was stored in plaintext session data, scrubbing the local copy doesn't invalidate the key — it's still live out there in the world, doing key things. Every credential that was exposed needs to be rotated.

--rotation-list generates a deduplicated checklist of credential types found, with guidance on what to rotate. Save it as plain text or JSON for integration with ticketing systems or runbooks. The tool doesn't just clean up — it tells you what else you need to do.

The Design Philosophy

The final architecture embodies a specific philosophy: scan is generous, scrub is conservative. Scan shows every match across all tiers so you see the full picture. Scrub only touches high-confidence matches by default. You can opt into more aggressive scrubbing, but the default protects session context while catching real secrets.

This is the opposite of most security tools, which default to maximum strictness. But for a tool that irreversibly modifies conversation history, false positives aren't just noise — they destroy useful data. I'd rather miss a low-confidence match than nuke half your session context because a regex got excited about the word "token" in a React tutorial.

From Script to Open Source

The jump from "working script" to "thing other humans can safely run" involved more work than the core features:

  • CI pipeline — GitHub Actions running ruff linting and pytest across Python 3.8, 3.10, and 3.12
  • Pre-commit hooks — ruff lint, ruff format, and local pytest catching issues before they reach CI
  • Community files — SECURITY.md, CONTRIBUTING.md, CHANGELOG.md, issue templates, PR template
  • Scan caching — results cached with a 1-hour TTL so scrub doesn't re-scan. The cache stores pattern names and line numbers, never secret values.
  • 165 tests covering utilities, cache, sessions, entropy, error paths, security hardening, and end-to-end workflows
40% features, 60% everything else — the code that finds secrets was maybe 40% of the total effort. The other 60% was hardening, testing, and making it safe for other people to use.

Go Use It

The repo is at github.com/penguinboi/claude-scrub. MIT licensed. Single Python file, zero dependencies, works anywhere Python 3.8+ runs. claude-scrub scan to see what's exposed, claude-scrub scrub to clean it up, --rotation-list to know what to rotate afterward.

If you use Claude Code, your session data almost certainly contains secrets you've forgotten about. And if you use a different AI coding assistant, the same problem exists — just in a different directory with a different storage format. claude-scrub targets Claude Code specifically, but the patterns, the tier architecture, and the entropy-gating approach are all portable ideas. If someone wants to build the Cursor or Copilot version, the hard design work is already done — steal freely.

In the meantime: your AI assistant has a great memory. Make sure it's not remembering things it shouldn't.