تجاوز إلى المحتوى
العربية
كل playbooks البيانات والمحللون

playbook

Interview a dataset before you touch it

Open a file you've never seen and get the full picture — row count, what every column means, the date range, blanks, and duplicates — before you ask a single analysis question.

سهل ~15 min

متى تلجأ إلى هذا

Someone drops a export.csv in your lap and asks for the number by standup. You've never opened it, you don't know if status has five values or fifty, and one bad assumption — a column that means something other than its name, a date range that's missing last week — quietly poisons every chart you build on top of it. The fix is a five-minute interview: make Claude read the whole file and describe it back to you so you know exactly what you're working with before you analyze anything.

جهّز هذا أولًا

  • The raw file as it landed — export.csv, events.csv, a spreadsheet tab saved out as CSV. Don't clean it first; you want to see the mess.
  • Whatever you were told it contains ("last quarter's orders") so you can check the file against the claim.
  • If it's sensitive — customer rows, anything with names or emails — work from a copy and keep it local; this is profiling, not sharing.

الـ workflow

  1. Have Claude read the whole file and count it

    Start with the shape, not the analysis. Knowing the row count up front gives you the number every later result has to reconcile against — it's the anchor for the whole project.

    أنت تطلب
    Open export.csv and give me the basics: how many rows, how many columns, and the column names exactly as they appear in the header. Don't analyze anything yet — just describe what's here.

    ما تحصل عليه A plain readout: "12,840 rows, 9 columns: order_id, customer_id, order_date, status, amount, currency, region, channel, refunded." Now you have the row count — 12,840 — to reconcile everything else against.

    Write that row count down. It's the known total you'll sanity-check every breakdown against later.

  2. Make it explain every column in plain English

    Column names lie. amount might be cents, status might have a value nobody documented, region might be blank for half the rows. Ask Claude to define each one from the actual values it sees, not from the name.

    أنت تطلب
    For each column, tell me in one line what it actually holds — based on the values, not the name. List the distinct values for status, region, and channel. Flag any column whose name doesn't match what's inside it.

    ما تحصل عليه A column glossary: "amount = order total in cents (e.g. 4999 = $49.99); status = one of paid, pending, refunded, chargeback; region blank on 18% of rows." The cents catch alone saves you a 100x error.

  3. Profile the dates and the blanks

    The two things that silently break an analysis are a date range that doesn't cover what you think, and missing values you didn't budget for. Surface both before you filter on either.

    أنت تطلب
    What's the date range of order_date — earliest and latest? Are there any gaps (whole days with no rows)? And give me a table of blanks per column: how many and what percent are empty in each.

    ما تحصل عليه "order_date runs Jan 1 to Mar 28; no rows at all for Feb 14–15. Blanks: region 18%, channel 4%, everything else 0%." Now you know the data stops on the 28th — not 'last week' — before you promise a last-week number.

  4. Hunt for duplicates and anything off

    Duplicate rows double your totals; outliers signal a broken export or a test order. Ask for both, plus a written 'here's what I'd be careful of' so the gotchas are on the record.

    أنت تطلب
    Are there duplicate order_id values? How many fully identical rows? Show me the 5 largest and 5 smallest amount values so I can spot test orders or errors. Then summarize, in 4 bullets, what I should be careful of when I analyze this file.

    ما تحصل عليه "47 duplicate order_ids (likely retries); one row with amount = 9999999 (a test order); negative amounts on the 31 refund rows." Plus a short caveats list you can paste straight into your notes.

    Those caveats are the most valuable output here — they're the assumptions a reviewer would otherwise catch after you'd shipped the wrong number.

اجعله ملكك

  • **Reuse it on every new file:** the four moves never change, so save them as a /profile custom command (see the Playbook's *Features* tab) — next time it's one command, not four prompts.
  • **Feeding a bigger job:** this is step one of every other Data playbook. Profile first, then hand the same file straight into *Write the query and learn the SQL* or *Two messy files into one trustworthy dataset*.
  • **Recurring source:** if the same export.csv lands every week, a scheduled agent (see the *Features* tab) can profile it on arrival and ping you only when the row count or blank rate jumps.

انتبه إلى

  • A profile describes the file; it doesn't certify the data is *correct*. Claude can tell you region is 18% blank — it can't tell you whether that's expected. You still decide what's a problem.
  • If the file holds personal data — names, emails, anything identifying — keep it on your own machine and have Claude work locally. Profiling doesn't require sending raw customer rows anywhere.
  • Treat the column glossary as a draft, not gospel. Spot-check two or three of Claude's column definitions against rows you actually understand before you build on them — Claude drafts the read, you confirm it.

ستحصل في النهاية على A one-page picture of a file you'd never opened — its size, every column's real meaning, the true date range, the blanks, the duplicates, and a caveats list — so you start analyzing from knowledge instead of from a guess.