RESEARCH

OpenEvidence vs Consensus: which one belongs in your Research Layer?

Two AI research tools keep coming up when people want to check what the evidence actually says about a supplement, a symptom, or a protocol. One is built for clinicians and reads like a specialist. One is built to summarise the literature and reads like a very fast librarian. Here’s how they differ, where each one earns its place, and how to stack both inside the Research Layer without handing over your judgement.

By Sabin · Wellness & AI · 9 min read

Somewhere between ‘I read one alarming headline’ and ‘I asked a chat model and it sounded confident,’ there is a better move: go and check what the actual literature says. That is the whole job of the Research Layer — the first layer of the stack, where you turn a question into evidence you can trust before it ever becomes a decision. And two tools keep coming up by name when people try to do this well: OpenEvidence and Consensus.

They get lumped together because they both point AI at published research instead of at the open internet. But they were built for different people, and if you use the wrong one for the wrong job you’ll either get an answer that’s too clinical to act on or one that’s too shallow to trust. This is a guide to telling them apart — and to using both without letting either one do your thinking for you.

what each one actually is

OpenEvidence is built for the point of care. Its natural user is a doctor with a question between patients, and it answers in that register: a synthesised, referenced response that reads like a colleague who has already done the reading. Ask it something specific and clinical and you get a paragraph of reasoning with citations threaded through it, drawn from medical journals and clinical sources rather than the general web.

Consensus starts from the other end. It is a search engine over the research literature that uses AI to pull and summarise findings across many papers at once. You ask a question, and instead of one authoritative answer it shows you a spread — here are the studies, here is roughly how many point which way, here is a one-line takeaway from each. It is designed to answer ‘what does the body of research say?’ more than ‘what should be done in this case?’

One tool tells you what a specialist would conclude. The other shows you the shape of the evidence a specialist would be standing on. Those are different questions, and many health mistakes come from asking one and hearing the other.

how they handle citations

This is where the difference gets practical, because a citation is only useful if you can follow it. Both tools cite, but they cite for different readers.

  • OpenEvidence threads references into a synthesised answer — the value is the reasoning, and the citations back it up. You still click through, but the tool has already made the call and is showing its work.
  • Consensus foregrounds the papers themselves — the value is the spread of sources, and the summary sits on top. You are reading the studies more directly, with the AI acting as a very fast triage of what each one found.

The rule that protects you is the same for both: a claim you cannot trace to a source you can open is not evidence yet; it is a rumour with good posture. Whichever tool you use, the citation is not decoration. It is the part you actually verify.

peer review, and what ‘evidence’ hides

Both tools lean on published, peer-reviewed research, which is exactly why people reach for them instead of a general chatbot. But ‘it’s in a study’ is the beginning of the work, not the end of it. A single small trial and a large meta-analysis are both ‘evidence,’ and they carry wildly different weight. This is where our own evidence hierarchy does the heavy lifting: systematic reviews and large randomised trials near the top; single studies, animal work and mechanistic reasoning much further down.

Consensus makes the spread visible, which helps you feel how settled — or unsettled — a question really is. OpenEvidence gives you a more finished conclusion, which is faster but asks you to trust the synthesis. Neither one removes your job: to notice study size, population, and whether the paper actually studied people like you. A great deal of health research was never about the person reading it.
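If you keep your notes in a script, the evidence hierarchy described in this section can be sketched in a few lines of Python. This is only an illustration: the two-tier split mirrors the wording ‘near the top’ and ‘much further down,’ and the names Tier, TIER_BY_STUDY_TYPE and strongest_first, along with the study-type labels, are assumptions made for this example rather than anything either tool exposes.

```python
from enum import IntEnum


class Tier(IntEnum):
    # Higher value = more weight. Only two tiers, because the article
    # only distinguishes "near the top" from "much further down".
    LOWER = 1
    UPPER = 2


# Hypothetical labels; use whatever wording you tag your own notes with.
TIER_BY_STUDY_TYPE = {
    "systematic review": Tier.UPPER,
    "large randomised trial": Tier.UPPER,
    "single study": Tier.LOWER,
    "animal study": Tier.LOWER,
    "mechanistic reasoning": Tier.LOWER,
}


def strongest_first(study_types):
    """Return study types with the heaviest evidence first.

    Python's sort is stable, so items in the same tier keep the
    order in which you listed them.
    """
    return sorted(study_types, key=lambda s: TIER_BY_STUDY_TYPE[s], reverse=True)


found = ["animal study", "systematic review", "single study", "large randomised trial"]
print(strongest_first(found))
```

The ranking says nothing about study size or population; those still have to be read off the paper itself.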

which one for the non-clinical reader

If you are not a clinician — you are a thoughtful person trying to understand your own body — here is the honest split. Consensus tends to be the friendlier front door. It is built to answer topic-level questions and to show you the landscape, which is usually what you want first: is there anything to this, and how strong is it? OpenEvidence is extraordinary but speaks fluent clinician; its answers assume a reader who can weigh a contraindication without flinching.

That does not make OpenEvidence off-limits — it makes it the tool you graduate to once you know what you’re asking. The mistake is treating either one’s output as instructions. A referenced answer about a treatment is context to bring to the person who can actually prescribe it, not a prescription you write yourself.

how to stack both in the Research Layer

The point of the Research Layer was never to crown one tool. It is to have a repeatable way to turn a worry into a well-founded question. Here is the sequence that uses each tool for what it’s best at.

  1. Start broad in Consensus. Ask the topic-level question — ‘does creatine help with X?’ — and read the spread. You’re looking for whether the evidence is strong, mixed, or thin, not for a verdict yet.
  2. Go deep in OpenEvidence on the specific version of your question — the dose, the interaction, the condition — when you need clinician-grade reasoning rather than a landscape.
  3. Open at least two citations yourself. Read the abstract and the population. This is the step that separates being informed from being impressed.
  4. Write down what you found in your own words, in a place you own — the Ledger. One line: what the evidence says, how strong it is, what you’re unsure about.
  5. Take the unresolved part to a qualified human. The tools got you a better question; the clinician gets you the decision. That hand-off is the design, not a failure of it.
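For readers who keep their Ledger in a file rather than a notebook, steps 3 and 4 above could be captured roughly like this. It is a minimal sketch under assumptions: the LedgerEntry class, its field names, and the three strength labels are hypothetical, chosen to match the wording of the steps, and the bracketed strings are placeholders for your own notes.

```python
from dataclasses import dataclass, field
from typing import List

# The three readings from step 1; the exact labels are an assumption.
STRENGTHS = ("strong", "mixed", "thin")


@dataclass
class LedgerEntry:
    question: str   # the topic-level question you started with
    finding: str    # what the evidence says, in your own words
    strength: str   # one of STRENGTHS
    unsure: str     # the unresolved part to take to a clinician
    opened_citations: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.strength not in STRENGTHS:
            raise ValueError(f"strength must be one of {STRENGTHS}")

    def is_verified(self) -> bool:
        # Step 3 asks for at least two citations you opened and read yourself.
        return len(self.opened_citations) >= 2

    def one_line(self) -> str:
        # Step 4: one line covering the finding, its strength, and the open question.
        return f"{self.finding} | strength: {self.strength} | unsure: {self.unsure}"


entry = LedgerEntry(
    question="does creatine help with X?",
    finding="(your one-sentence summary)",
    strength="mixed",
    unsure="(what you still need to ask)",
    opened_citations=["(first paper you opened)"],
)
print(entry.one_line())
print("citations checked:", entry.is_verified())
```

With only one citation logged, is_verified() returns False, which is the nudge to go back and open a second paper before the hand-off in step 5.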

The stack is not OpenEvidence or Consensus. It is Consensus to map, OpenEvidence to drill, your own reading to verify, and a human to decide. Asking any single tool to do all four is how people end up confidently wrong.

the part no tool can do

Both of these tools are genuinely good, better than anything you could have used five years ago, and both keep improving. That is exactly why the skill worth building is not ‘which one has the best answers’ but ‘how do I read any of them well.’ The tools will keep changing names and interfaces. The literacy (weighing a study, tracing a citation, knowing where your question stops and a clinician’s begins) is the part that stays yours and keeps its value no matter which engine is winning this quarter.

So use both. Let Consensus show you the shape of the field and OpenEvidence sharpen the clinical edge of your question. Then close the laptop, keep the finding somewhere you can read it later, and remember that the most valuable tool in the Research Layer was never the search engine. It was the reader who knows what to do with what it finds.
