The research was never proportionally about women. The apps inherited the gap.
Population health data was built largely on male defaults, then handed to algorithms that smoothed every woman's outliers into “normal.” The fix is unromantic: treat your own body as a sample size of one, and read it with a model that does not assume you are the average of millions of strangers.
It is not a conspiracy. It is a documented historical fact: for most of the twentieth century, the bodies that population health research was built on were disproportionately male. Women were excluded from large clinical trials by default until the 1990s on the grounds that the menstrual cycle was a confounder — which is a polite scientific way of saying the thing that makes you a woman was treated as noise to be filtered out.
Three decades later, the legacy is still everywhere. Drug dosing tables. Heart-attack symptom checklists. The reference ranges your last blood test was scored against. The auto-detected “normal” on your sleep tracker. The luteal-phase symptom your period app silently rounded into the average. None of these were built with malice. All of them were built on data that did not include enough of you.
what the apps inherited
Health apps and consumer wearables did not invent the data gap. They inherited it. Then they made it worse in two specific ways: by training on more of the same skewed data, and by smoothing your individual outliers into population averages so the dashboard looked clean. The dashboard looking clean is the problem. A clean dashboard is what tells you nothing is wrong when something is.
- Period apps that assume a 28-day cycle and quietly re-center your real cycle around it. Your real cycle was the data. The smoothing was the loss.
- Sleep trackers that score “normal” against a population average where the female luteal-phase sleep collapse is statistically washed out by the male nights.
- Heart-rate variability bands that were calibrated on athletic male cohorts and that flag your perfectly fine recovery as poor for two weeks of every month.
- Symptom checkers whose “classic presentation” for a heart attack is the male presentation, with the female presentation listed under “atypical.”
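To make the smoothing concrete, here is a minimal sketch in Python. Every number in it is invented for illustration: the sleep hours, the 6.0 to 9.0 hour "normal" band, and the one-hour threshold are assumptions, not values from any real tracker. It only shows the shape of the problem: the same week of short nights passes a wide population band untouched, yet stands out clearly against the person's own baseline.

```python
from statistics import mean

# Hypothetical nightly sleep (hours) across one 28-day cycle.
# Days 22-28 are shorter; all values are invented for illustration.
sleep_by_cycle_day = {day: 7.5 for day in range(1, 22)}
sleep_by_cycle_day.update({day: 6.3 for day in range(22, 29)})

# Hypothetical population band: anything inside it is scored "normal".
POPULATION_NORMAL = (6.0, 9.0)


def population_flags(series, band=POPULATION_NORMAL):
    """Days that fall outside the population band."""
    low, high = band
    return [day for day, hours in series.items() if not low <= hours <= high]


def personal_flags(series, baseline_days, threshold=1.0):
    """Days that differ from the person's own baseline by more than `threshold` hours."""
    baseline = mean(series[day] for day in baseline_days)
    return [day for day, hours in series.items() if abs(hours - baseline) > threshold]


print(population_flags(sleep_by_cycle_day))                # no days flagged
print(personal_flags(sleep_by_cycle_day, range(1, 22)))    # days 22 through 28
```

The population check reports nothing, because 6.3 hours is still inside the band. The personal check flags the whole last week, because it compares those nights to the first three weeks of the same person's data.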
why this matters more now, not less
It would be reasonable to assume the gap is closing. In some places it is. In 1993 the NIH Revitalization Act required NIH-funded clinical research to include women, and the FDA issued guidance calling for trial results to be analysed by sex. Cardiology guidelines have begun naming the female presentation as primary, not atypical. Endometriosis is no longer routinely dismissed as “women being dramatic” by every clinician under forty. Real progress, and slow.
But the apps in your phone right now were trained on the data that already existed, and that data is still skewed. Worse: the same apps are now being layered with AI features that confidently summarise your health to you. Confidence is not accuracy. A model that has been trained on a population that under-represents you will speak just as fluently when it is wrong about you as when it is right.
the n-of-1 fix
There is a fix that does not require waiting for the field to catch up. Treat your own body as a sample size of one. Stop asking the algorithm whether you are normal. Ask a reasoning model — fed only your own four months of notes — whether you have a pattern. The model does not need a population. It needs your data, honestly written down, and a question that respects what only you can see.
- Open one document. Four lines a day. Cycle day (or week, post-menopause). Sleep. One symptom rated 1–10. One sentence about energy. Ninety seconds. No screens flashing at you.
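- Do that for four cycles, or four months if you no longer cycle. Four repeats is roughly the smallest sample where a real pattern can show up above the noise of any one bad week: something that happens once is an event, something that happens on the same days four times running is a pattern.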
- Paste the whole thing into a free reasoning chat tool. Ask: “Across these four months of my own data, what pattern do you see — not what is normal for the population, what is consistent for me?”
- Take the one paragraph the model writes back, edit it for accuracy, and bring it to your next GP appointment as the reason you want one specific investigation. One. Not twelve.
- If the GP says “cycles vary,” you now have four months of evidence that yours varies in a specific repeatable way. That is the sentence that moves the conversation.
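If you would rather check the pattern yourself before or alongside the chat tool, the steps above can be sketched in a few lines of Python. The note layout here is an assumption: four "label: value" lines per day, with days separated by a blank line. The function names `parse_notes` and `repeatable_days` and the cut-off score of 6 are hypothetical, chosen only for this example.

```python
from collections import defaultdict


def parse_notes(text):
    """Parse one cycle of four-line notes (assumed layout: 'label: value' lines,
    one blank line between days) into a list of dicts."""
    entries = []
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.strip().splitlines():
            key, _, value = line.partition(":")
            fields[key.strip().lower()] = value.strip()
        entries.append({
            "cycle_day": int(fields["cycle day"]),
            "sleep": float(fields["sleep"]),
            "symptom": int(fields["symptom"]),
            "energy": fields["energy"],
        })
    return entries


def repeatable_days(cycles, min_score=6):
    """Cycle days where the symptom score reached `min_score` in every logged cycle."""
    scores_by_day = defaultdict(list)
    for entries in cycles:
        for entry in entries:
            scores_by_day[entry["cycle_day"]].append(entry["symptom"])
    return sorted(
        day for day, scores in scores_by_day.items()
        if len(scores) == len(cycles) and min(scores) >= min_score
    )


# Two invented cycles, two logged days each, just to show the call.
cycle_a = parse_notes(
    "cycle day: 3\nsleep: 7.5\nsymptom: 2\nenergy: fine\n\n"
    "cycle day: 24\nsleep: 6.0\nsymptom: 7\nenergy: flat by noon"
)
cycle_b = parse_notes(
    "cycle day: 3\nsleep: 7.0\nsymptom: 6\nenergy: ok\n\n"
    "cycle day: 24\nsleep: 6.2\nsymptom: 8\nenergy: tired"
)
print(repeatable_days([cycle_a, cycle_b]))
```

Requiring the score to clear the bar in every cycle is deliberately strict. One bad day 3 in a single cycle does not count; a day 24 that is bad every time does, and that is the "specific repeatable way" worth bringing to the appointment.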
“Population data could not see you. Your own four-month note can. The model that reads it back is not diagnosing you — it is doing the dashboard work the apps were supposed to do but could not, because the apps were built on a population you were never proportionally part of.”
for practitioners reading this
If you work with women — and most practitioners do — the fastest improvement in your intake quality is to stop asking new clients to bring app data and start asking them to bring four cycles of their own four-line note. Three reasons. The note is honest in a way the app data is not, because nobody lies to a notebook the way they lie to a streak. The note includes the columns the apps do not have, because the client wrote them. And the note can be read back in three minutes by a model in front of the client, which makes the read-back itself part of the relationship.
It also retires the silent assumption that the population averages your tools are built on apply cleanly to her. They do not. They apply on average. She is not on average. Nobody is.
the quiet leverage
The data gap will close. It is closing. But the apps you are using this month were trained before it closed, and the AI features being bolted onto them this year are confidently smoothing your real signal into the same old population average. Until the field catches up, the most accurate model of your health is the one that has read four months of your own honest notes — and nothing else. That model exists. It is free. The four-line note is what makes it useful.
Stop asking algorithms whether you are normal. Ask a model whether you have a pattern. The first question was never going to serve you. The second one is the one your body has been waiting for someone to ask.