Skip to content
English
all Engineering playbooks

playbook

Refactor safely behind a green suite

Clean up a gnarly file without changing behaviour — using the test suite as your seatbelt, running it before and after so you can prove nothing broke instead of hoping.

medium ~30 min

when to reach for this

A file has grown into a mess — a 400-line function, duplicated logic, names that lie. You want it cleaner, but "refactor" without a safety net is just "rewrite and pray." This system makes the test suite the seatbelt: green before, green after, with the structural change in between — so "I didn't change behaviour" is a fact you proved, not a hope you have.

gather this first

  • The file you want to refactor, open in the repo.
  • A passing test suite that covers the code — if there isn't one, run the *backfill-tests* playbook first. The seatbelt has to exist before you drive.
  • The test command, so Claude can run it before and after and show you both results.

the workflow

  1. Establish green, then name the smells

    Prove the starting state is green first — that's your baseline. Then get a plan for *what* to change before any code moves.

    you ask
    First run the test suite and confirm it's green — that's our baseline. Then read this file and tell me the three biggest structural problems and how you'd fix each, in order, without changing behaviour. Don't refactor yet.

    what you get back A confirmed-green baseline plus a short, ordered plan — "extract the validation into its own function, collapse the duplicated branches, rename data to what it actually holds" — all behaviour-preserving.

    It's far cheaper to correct a wrong approach in prose than in a 300-line diff. Read the plan before it writes a line.

  2. Refactor in small, reviewable moves

    Do the changes as discrete steps, not one giant rewrite — so if something breaks, you know exactly which move did it.

    you ask
    Do the refactor as separate, small steps in the order we agreed. Behaviour must not change. After each step, show me the diff for that step so I can follow what moved.

    what you get back A sequence of focused diffs — extract, then dedupe, then rename — each one easy to read, instead of one unreviewable wall of changes.

  3. Prove behaviour is unchanged

    Run the same suite again. Same green is the whole point — it's the difference between "I refactored" and "I rewrote and hoped."

    you ask
    Run the full suite again and show me the result next to the baseline. If anything is red, that's a behaviour change — revert that step and tell me what it touched. Don't paper over a failure.

    what you get back The same green you started with. If a test went red, an honest "step 2 changed behaviour here" and a revert — not a quietly-edited test to make it pass.

    Never take "behaviour is unchanged" on its word — the re-run of the suite is what makes it true.

  4. Walk me through what changed and why

    End with a short narrative so your eventual reviewer (and future you) understands the intent behind the diff.

    you ask
    Summarize what you changed and why, in a few bullets I could paste into the PR description — what's cleaner now and what behaviour stayed identical.

    what you get back A tight changelog you can drop straight into the PR, so the reviewer reads intent, not just a diff.

make it your own

  • **No suite yet:** don't refactor blind. Run the *backfill-tests* playbook to add characterization tests first, *then* come back here.
  • **Bigger reshape:** for a structural change across several files, ask for the plan in plan mode first and approve it before any edits — see /features/plan-mode/.

watch out for

  • A refactor with no test run before and after is a rewrite with a nicer name. The two green runs are the entire safety guarantee — don't skip either.
  • If it suggests a behaviour change "while we're in here," stop and split it out. Mixing a refactor with a behaviour change means a red test could be either, and you've lost the seatbelt.
  • Don't let it edit a test to make a refactor's failure go away. A red test after a behaviour-preserving change is a real signal — listen to it.

you'll end up with A gnarly file becomes a clean one, with the same green suite proving behaviour never moved — a refactor you can stand behind in review instead of a rewrite you're quietly nervous about.