Which Claude model should you use? Haiku, Sonnet and Opus, picked by the health job — not the hype.
Every few months the names change and the launch posts insist this one is the smartest model ever made. For making sense of your own health data, that’s the wrong question. There aren’t seven models to memorise. There’s one ladder, and a simple rule for which rung fits which job. Here’s the calm version: what Haiku, Sonnet and Opus actually are, and which one to reach for when you’re reading your own labs, sleep and notes.
Every few months a new Claude lands, the launch post calls it the most capable model ever built, and a fresh wave of carousels appears explaining the ‘new lineup’ as if it were a periodic table you now have to memorise. Haiku, Sonnet, Opus — and whatever bigger, bolder names get added next. It looks like seven things to learn. It isn’t.
It’s one ladder. The rungs climb from fast-and-cheap, through balanced, to powerful-and-expensive. Once you understand the ladder, you never have to re-learn it when the names change — and they will change. The skill that lasts isn’t knowing this month’s flagship. It’s knowing which rung your task actually needs. That’s the part the hype is built to make you forget, because ‘always use the biggest one’ is a better sales line than ‘most of the time you don’t need it.’
the ladder, in plain terms
Ignore the version numbers for a second. At any given time the family sorts into three jobs. The names move; the jobs don’t.
- Haiku — the fast one. The smallest, quickest, cheapest tier. Built for simple jobs done at speed and scale: tidy this export, pull the numbers out of this PDF, classify these notes, give me a one-line summary. Think of it as the intern who replies in two seconds — brilliant for volume, not the one you hand a hard judgement call.
- Sonnet — the all-rounder. The balanced middle of the family: smart enough for most real work, still easy on time and cost. This is the default for the overwhelming majority of everyday health tasks. The reliable daily driver — unglamorous, and almost always the right answer.
- Opus — the heavy hitter. The flagship, made for genuinely hard, multi-step thinking. More power, higher cost, slower. You reach for it when a problem is genuinely complex and the quality of the reasoning matters more than the speed or the bill — not as a reflex.
A useful mental image: it’s like coffee sizes. Small, medium, large — and occasionally ‘are you sure?’. Most days you want the medium. Learn the ladder once and you stop feeling lost every time the menu gets re-printed.
now map it to your own health stack
Here’s where this stops being trivia and starts saving you money and frustration. These are the jobs people actually do when they read their own data — matched to the rung that fits.
- Extracting and tidying — pulling values out of a lab PDF, turning an Oura or Apple Health export into a clean table, transcribing a messy note. This is fast, structured, low-judgement work. The fast tier handles it well, and there’s no reason to pay flagship prices for data entry.
- Your weekly read-out — ‘here’s this week’s sleep, training and mood; what changed, what’s drifting, what should I keep an eye on?’ This is the bread-and-butter of a personal health ledger, and it’s squarely the all-rounder’s job. Reach for the middle rung and stay there for most of what you do.
- Cross-referencing months of data — ‘look across the last four months of resting heart rate, alcohol and HRV and tell me where the patterns connect.’ This is harder, multi-factor reasoning over a lot of context. This is where stepping up to the heavy hitter earns its keep.
- A genuinely complex question — reconciling two conflicting lab panels, reasoning through a tangled medication-and-supplement timeline, or pressure-testing a protocol before you take it to a practitioner. Heavy thinking, high stakes. Use the flagship, and then take the output to a human.
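If you like seeing a rule written down, the mapping above can be sketched as a small lookup table. This is only an illustration: the job labels and the `pick_tier` helper are made-up names, not part of any Claude API, and the tiers are plain labels rather than real model identifiers, because those change with every release. The fallback to the middle rung is an assumption drawn from the article's 'most days you want the medium' advice.

```python
from enum import IntEnum


class Tier(IntEnum):
    """The three rungs, ordered from cheapest to most capable."""
    HAIKU = 1   # the fast one
    SONNET = 2  # the all-rounder
    OPUS = 3    # the heavy hitter


# Hypothetical job labels for the four kinds of work listed above.
JOB_TO_TIER = {
    "extract_and_tidy": Tier.HAIKU,
    "weekly_readout": Tier.SONNET,
    "cross_reference_months": Tier.OPUS,
    "complex_question": Tier.OPUS,
}

# Jobs whose output should explicitly go to a human afterwards.
# Only the complex-question case is flagged that way in the list above.
NEEDS_HUMAN_REVIEW = {"complex_question"}


def pick_tier(job: str) -> tuple[Tier, bool]:
    """Return (tier, needs_human_review) for a job label.

    Unknown jobs fall back to the all-rounder (an assumed default).
    """
    tier = JOB_TO_TIER.get(job, Tier.SONNET)
    return tier, job in NEEDS_HUMAN_REVIEW
```

Calling `pick_tier("weekly_readout")` returns the middle rung with no review flag, while `pick_tier("complex_question")` returns the flagship and a flag reminding you to take the answer to a practitioner. Keeping the table separate from any model name means only one place needs editing when the lineup is renamed.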
the guardrails question — and why ‘most powerful’ isn’t the whole story
Launch posts love a benchmark table: this model wins this many rows against the big names. Treat those with the same scepticism you’d treat any number a company publishes about its own product. They’re measured on tasks chosen by the people selling the model, and a percentage point on an agentic-coding benchmark tells you almost nothing about whether a model will read your sleep data sensibly.
What actually matters for health is something the leaderboards don’t show: how the model behaves when a question shades into medical territory. A well-built consumer model should get more careful, not less, when you ask it something that sounds like a diagnosis — hedging, citing uncertainty, telling you to see someone. That restraint is a feature, not a weakness. Be wary of any setup or ‘unlocked’ variant sold on the promise that it removes the guardrails. For your own body, the model that knows when to stop is worth more than the one that always has an answer.
“The model that says ‘I’m not sure — check this with a clinician’ is doing its job. The one that confidently diagnoses you from a screenshot is the one to worry about.”
the skill that survives the next launch
Here’s the uncomfortable part for anyone hoping to memorise their way to mastery: the names you learn today have a shelf life of months. New tiers arrive, old ones get renamed, the ‘best model in the world’ crown changes hands on a schedule. If your competence is ‘I know that Opus is the good one,’ it expires on the next release.
What doesn’t expire is the judgement underneath: knowing what your task actually requires, matching it to the cheapest rung that does the job well, and keeping a human in the loop for anything that touches your health. That’s the whole point of building your own stack instead of renting another app’s opinion. The tools keep getting better and the names keep changing — the rule for choosing between them, and the judgement behind it, stays yours.
If you want the ladder turned into a working setup — the right Claude tier wired to your own labs, exports and notes, with a weekly read-out that runs on the cheaper rung and only escalates when it should — that’s exactly what our Claude for Health guide and the ten-minute Setup walk you through. None of it is a secret mode. All of it is the calm version that keeps working after the next model drops.