Skip to content
English
all Data & Analysts playbooks

playbook

Write the query and learn the SQL

Get the exact query for the question you actually have, a line-by-line plain-English walkthrough of every clause, and a reconciliation against a known total — the answer and the understanding to defend it.

medium ~40 min

when to reach for this

You need a number — monthly active users by plan, revenue by region, whatever your manager just asked for — and you know roughly what SQL is but not enough to write it cold, or to trust a query someone hands you. Copy-pasting a query you don't understand is how a wrong number ends up in a deck with your name on it. This system gets you the query, a walkthrough that teaches you what each clause does, and a reconciliation step — so you can both produce the answer and explain it when someone pushes back.

gather this first

  • The schema or a sample: the table name and its columns — paste the CREATE TABLE or a schema.sql, or just a few example rows from events.csv.
  • The question in one plain sentence ("active users by plan, last 6 months") and how you'd define the fuzzy words in it — what counts as 'active', where the month boundary falls.
  • A known total to reconcile against — a row count from the *Interview a dataset before you touch it* profile, or a number from last month's report.

the workflow

  1. State the question precisely before any SQL

    Most wrong queries come from a fuzzy question, not bad syntax. Make Claude pin down the definitions first — 'active' and 'last 6 months' mean nothing until you agree on them — so the query answers the question you actually have.

    you ask
    I want monthly active users by plan for the last 6 months from the events table. Before you write any SQL, read it back to me: how are you defining 'active', where does each month start and end, and which timezone? List any assumption you're making. Don't write the query yet.

    what you get back A short read-back: "active = at least one event in the month; months are calendar months in UTC; last 6 means Jan–Jun." If 'active' should mean a *login* not any event, you fix that here — before it's baked into the query.

    This 'read the question back to me first' step is the whole game. A query that answers the wrong question runs perfectly and lies confidently.

  2. Get the query with a line-by-line explanation

    Ask for the SQL and the walkthrough in the same breath. The explanation is what turns this from copy-paste into something you understand well enough to modify and defend.

    you ask
    Now write the query for that. Then walk me through it clause by clause in plain English: what the WHERE filter does, what GROUP BY plan, month groups together, why you used that date function, and what each column in the output means. Assume I'm not a SQL expert.

    what you get back The query, then a numbered walkthrough: "WHERE event_date >= '2026-01-01' keeps only the last 6 months; GROUP BY plan, month makes one row per plan per month; COUNT(DISTINCT user_id) counts each user once even if they fired ten events." You leave knowing why, not just what.

  3. Reconcile the result against a known total

    A number you can't trace back to something you already trust is a number you can't share. Tie the query's output to the row count or a prior report before you believe it.

    you ask
    Run it and show me the result. Then reconcile: does the sum of users across all plans and months line up with the distinct user count in the raw table? If they don't match, find the gap — a filter dropping rows, a NULL plan, a duplicate — and tell me which.

    what you get back The result table plus a reconciliation: "14,210 distinct users total, matches the raw table — except 38 rows have a NULL plan and were excluded; here's the count with and without them." Now the number is defensible, edge cases and all.

    Reconcile against a *known* total — a row count or last month's figure. A result that matches nothing you already trust is just a confident guess.

  4. Stress-test the edges and save the explained version

    Have Claude attack its own query for the cases that quietly break aggregates — duplicates, NULLs, timezone boundaries — then keep the commented version so future-you (and a reviewer) can read it.

    you ask
    What could make this query wrong that the result wouldn't reveal? Check for double-counting, NULLs in the GROUP BY, off-by-one month boundaries, and timezone edge cases. Then give me the final query with a one-line comment above each clause explaining it, ready to save as active-users.sql.

    what you get back A short risk list ("a user who switched plans is counted under both — is that intended?") and a commented active-users.sql you can reread in three months and still understand.

make it your own

  • **Spreadsheet, not a database:** same flow against a CSV — Claude can compute the answer directly over events.csv and still explain each step; you don't need a live database to learn the logic.
  • **The number moved and you don't know why:** when the question is 'why did this change,' hand the explained query straight into *Find out why the number moved* — having the query already understood makes the root-cause hunt far faster.
  • **A query you run every month:** once it's explained and reconciled, save the prompt as a /mau custom command or a skill (see the *Features* tab) so the recurring number comes back the same way each time — the foundation the *Build a recurring analysis report* capstone leans on.

watch out for

  • A query can be syntactically perfect and answer the wrong question. The read-back in step 1 is non-negotiable — verify the *definition* of every fuzzy word ('active', 'churned', 'month') before you trust the *output*.
  • If the table holds personal data, run the query locally and share only the aggregate. Paste a schema and a few sample rows when you can rather than dumping a raw table full of emails into the prompt.
  • Claude writes and explains the query; you own whether the answer is right. Read the walkthrough, reconcile the total, and confirm the logic matches your intent — don't ship a number you couldn't explain out loud.

you'll end up with The exact query for your real question, a plain-English walkthrough that taught you why it works, a reconciliation that proves the number, and a commented `.sql` file you can rerun and defend.