
The coach can only say what the evidence can carry.

This page is for the visitor who wants to know whether DN1 is honest about what it does and doesn't know. The opening contract is simple: evidence climbs a ladder, uncertainty stays visible, and when we do not know, the coach has to say so.

The evidence ladder

The coach can't say something works for you until the evidence supports a claim of that strength. Five rungs, each with explicit boundaries.

  1. Tested in a bounded N-of-1 window. Enough observations to make a scoped personal read. This is not a claim of broad causal proof.

  2. Repeated pattern (observational). Seen multiple times in your data, but not yet under test conditions. The coach keeps the claim observational.

  3. Worth testing (tentative). A signal the system noticed and flagged for a real test. The coach proposes the protocol.

  4. No detectable effect. We tested it, and the signal isn't there at the strength we'd act on. The coach says so.

  5. Too noisy to call. Real life confounded the read. The coach names the confounders and the next useful measurement.
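
The rungs above can be read as a decision rule. The sketch below is illustrative, not DN1's implementation. The `Evidence` fields, the use of an uncertainty interval, and the `min_effect` threshold (the smallest change worth acting on) are all assumptions made for the example. It shows how a test window could land on rung 01, 04, or 05 depending on where its interval sits, while observational data never climbs above rung 02.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical summary of one question about one person."""
    under_test: bool          # collected in a bounded N-of-1 test window?
    times_seen: int           # how often the pattern appeared observationally
    ci_low: float = 0.0       # lower bound of the effect's interval (tests only)
    ci_high: float = 0.0      # upper bound of the effect's interval (tests only)
    min_effect: float = 1.0   # smallest effect worth acting on (assumed)


def ladder_rung(e: Evidence) -> str:
    """Assign a rung using illustrative rules, not DN1's actual ones."""
    if not e.under_test:
        # Observational data alone never climbs above "repeated pattern".
        return "02 repeated pattern" if e.times_seen >= 2 else "03 worth testing"
    if e.ci_low > 0 or e.ci_high < 0:
        # Interval excludes zero: a scoped personal read, not broad causal proof.
        return "01 tested (N-of-1)"
    if -e.min_effect < e.ci_low and e.ci_high < e.min_effect:
        # Interval straddles zero but rules out any effect worth acting on.
        return "04 no detectable effect"
    # Interval straddles zero and is too wide to rule out a meaningful effect.
    return "05 too noisy to call"
```

With hypothetical numbers, an interval of -1.5 to 1.8 points against a 3-point action threshold reads as "04 no detectable effect". An interval of -6 to 7 points against the same threshold reads as "05 too noisy to call".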

When the data isn't enough

Two artifacts. The first is a protocol that didn't move what it was supposed to. The second is a window where life got in the way of the read.

No useful signal on the named outcome · 14 nights / 2026-03-29 → 2026-04-12

Magnesium did nothing for my sleep score. I kept taking it.


Window: 14 nights at 420 mg at bedtime
WHOOP sleep score: ±2 points (null)
Subjective sleep: no step-change
Best-night H10 RMSSD: 33.3 ms (+8.7 vs prior)
Peak-hour H10 RMSSD: 41.7 ms (program high)
Diastolic BP floor: 86-88 mmHg (3 sub-90 readings)

The hypothesis on the named outcome (sleep) is not supported. A different autonomic or vascular read may be worth testing. Recommendation: keep the protocol, change what we measure, and verify before upgrading the claim.


Keep, reassign outcome · Tier: Tentative
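
One way to put an honest interval on a null like the ±2 point read above is a percentile bootstrap of the difference between a baseline window and a test window. This is a generic sketch, not DN1's method. `bootstrap_diff_ci` and its defaults are hypothetical. The sketch also treats nights as independent, which real sleep data often is not.

```python
import random
from statistics import fmean


def bootstrap_diff_ci(baseline, test, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(test) - mean(baseline).

    Assumes nights are exchangeable within each window; autocorrelated
    data would need a block bootstrap instead.
    """
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    diffs = []
    for _ in range(n_boot):
        # Resample each window with replacement, independently of the other.
        b = rng.choices(baseline, k=len(baseline))
        t = rng.choices(test, k=len(test))
        diffs.append(fmean(t) - fmean(b))
    diffs.sort()
    # Nearest-rank percentiles of the bootstrap distribution.
    lo = diffs[round((alpha / 2) * (n_boot - 1))]
    hi = diffs[round((1 - alpha / 2) * (n_boot - 1))]
    return fmean(test) - fmean(baseline), (lo, hi)
```

If the whole interval sits inside the range of changes you would not act on, the read supports "no detectable effect". If the interval is wide enough to include a meaningful change, it does not.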

Too noisy to call · Days 26-29 / 2026-04-05 → 2026-04-08

We tried to read this window. Real life confounded it.


Window: 4 nights (Apr 5-8)
Sleep duration (hours, by night): ~3.0 / 5.25 / 4.5 / fragmented
H10 RMSSD (ms, by night): not captured / 26.3 / 22.4 / fragmented
Mean sleep HR (bpm, by night): not captured / 65.5 / 66.3 / 70+

Confounders named

  • Personal stress, 2-hr wake periods
  • Easter family disruption
  • Children waking, dogs added Day 27
  • Under 1,200 cal/day on 3 days (appetite suppression)
  • Magnesium form change (pressed → capsule; ruled out)
  • Skin temperature +2.7 °F deviation (sympathetic thermoregulation)

The system named what got in the way of the read and what would close the gap. That's an evidence-bounded read.

Next useful step: four consecutive low-stress nights with matched sleep duration and full-stack compliance. That is the cleanest comparison the data hasn't seen yet.


Inconclusive · Tier: Too noisy
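
The next useful step in this window amounts to a search for a clean run of nights. The sketch below assumes a hypothetical `Night` record with a single boolean confounder flag and a fixed target sleep duration. Both are assumptions: the page does not say how DN1 defines "matched".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Night:
    """Hypothetical per-night record; fields are illustrative."""
    date: str
    confounded: bool    # any named confounder that night (stress, disruption, ...)
    sleep_hours: float
    full_stack: bool    # took the full bedtime stack as planned


def first_clean_run(nights, run_len=4, target_hours=7.0, tolerance=0.5) -> Optional[list]:
    """Return the first `run_len` consecutive clean nights, or None.

    Assumes `nights` is sorted by date with no gaps. "Matched sleep
    duration" is modeled as staying within `tolerance` of `target_hours`.
    """
    run = []
    for n in nights:
        clean = (not n.confounded and n.full_stack
                 and abs(n.sleep_hours - target_hours) <= tolerance)
        # Extend the current streak, or reset it on any unclean night.
        run = run + [n] if clean else []
        if len(run) == run_len:
            return run
    return None
```

Returning `None` rather than a shorter run keeps the rule strict. A three-night streak is not the comparison the step asks for.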

Your perception is the sensor

Most health tools ignore everything you cannot strap on. DN1 treats what you notice (energy, mood, how you actually feel) as data.

When you say “I had good energy through dinner and cleaned the kitchen instead of sitting down with my phone,” that is data. When you say “I skipped the walk and the whole afternoon got harder,” that is data. The system extracts structured signals from natural language: energy, clarity, mood, motivation, stress, sleep quality, soreness, digestion. No forms, no scales, no sliders.

In 35 days, the founder's conversations produced 294 subjective data rows across 8 canonical metrics, all derived from language by an LLM. This is how DN1 can track outcomes most tools cannot measure: whether the day actually worked, not just whether a number moved.
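
An LLM performs the extraction, and that step is not shown here. What can be sketched is the shape of the output: one row per metric mention, restricted to the eight canonical metrics named above. The `SubjectiveRow` type, the key spellings, and the 0-10 scale are assumptions made for illustration. Keeping `source_text` on each row preserves the words each value came from.

```python
from dataclasses import dataclass

# The eight metrics named on this page; the key spellings are assumed.
CANONICAL_METRICS = frozenset({
    "energy", "clarity", "mood", "motivation",
    "stress", "sleep_quality", "soreness", "digestion",
})


@dataclass(frozen=True)
class SubjectiveRow:
    """Hypothetical row produced by language extraction."""
    day: int          # program day
    metric: str       # must be one of CANONICAL_METRICS
    value: float      # normalized score; the 0-10 scale is an assumption
    source_text: str  # the user's words the value was extracted from

    def __post_init__(self):
        # Reject anything outside the canonical vocabulary or scale.
        if self.metric not in CANONICAL_METRICS:
            raise ValueError(f"unknown metric: {self.metric!r}")
        if not 0 <= self.value <= 10:
            raise ValueError(f"value out of range: {self.value}")
```

For example, "Energy is a 5" from the Day 17 transcript would become `SubjectiveRow(17, "energy", 5.0, "Energy is a 5.")`.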

What that looks like in practice

A real coaching turn, reconstructed from session notes. The user reported feeling flat despite textbook recovery metrics. The coach considered the pharmacology and changed the draft plan.

Coaching transcript · Day 17 / 2026-03-27

The coach saw the mismatch, asked what could explain it, and changed the draft plan.


User

Best metrics of the program this morning. Recovery 90, RHR 64, HRV 25, BP 128/92, slept 7 hours with textbook architecture. But I feel flat. Energy is a 5. Legs do not have any pop. What is going on?

Coach

Last night was your second consecutive night taking the full bedtime stack: magnesium glycinate 420 mg, ashwagandha, glycine. The same mechanisms that improved the metrics could be blunting the morning. Magnesium glycinate vasodilates and modulates GABA, with a six-to-eight-hour effect window, so residual sedation tracks. Ashwagandha blunts the cortisol awakening response. Glycine drops core temperature and slows the morning rebound. The metrics are the pharmacology. The flatness might be too. Did you take the stack again tonight?

User

Yes.

Coach

Then we test it. If tomorrow morning is energized, today was first-dose carry-over and your body is adapting. If you wake up flat again, the pattern is real and we move the stack earlier or pull ashwagandha alone for two nights to isolate it. Do not take it again until we read tomorrow.


The coach didn't tell me my sleep was poor. It read the pharmacology, asked the next question, and changed the draft plan. This turn is reconstructed from session notes because the verbatim exchange was not preserved.


Plan draft changed · Tier: Real conversation (reconstructed from session notes)

Why depth matters

A general-purpose LLM and DN1 both receive the same data. The difference is what the system knows about you when it reads it.

Substrate comparison

Same model. Different substrate. Different answer.

A real morning question, fed to two systems. The model is the same on both sides. The system around it is not.

User message

"Whoop says I'm at 31% recovery this morning. HRV is 17, RHR is 67. I did a hard mile yesterday, first one in a few weeks. I feel pretty good actually. Slight legs but mentally sharp. I have a Z3 interval workout on the calendar today. Should I do it or take it easy?"

A leading general-purpose LLM

Listening to your body is great, and the fact that you feel sharp is a real signal. That said, there is a meaningful divergence between how you feel and what your recovery markers are telling you. A 31% recovery score with HRV at 17 ms is a clear physiological signal that your autonomic nervous system is still processing yesterday’s effort.

The general principle in this situation: if recovery is below ~33%, the body is in a sympathetic-dominant state and high-intensity training risks accumulating fatigue rather than building fitness. A Z3 interval workout layered on top of an unrecovered system can push you from productive overload toward functional overreaching, which sets back training quality.

I would recommend swapping today’s session for active recovery, a Zone 2 walk, mobility work, or light cycling at conversational pace. Hydrate well, prioritize protein at your next meal, aim for 7 to 8 hours of sleep tonight. Re-evaluate tomorrow morning. If your recovery score climbs back above 60% and your HRV recovers, you are cleared to resume the planned intervals.

You can absolutely train hard, but the smarter move is to let your body finish the adaptation it started before you ask it for more.

The same model with DN1's grounding

Train. Do the Z3 session.

Your wrist score is composite and biased toward rest. The components disagree with each other: the H10 overnight read and sleep components are cleaner than the score implies, and your subjective state is sharp. The 17 ms HRV is the day-after-hard-effort dip, not a warning.

We have watched this pattern in your data several times now. On the recent ones where you trained, your next-day signals came back better more often than not. Your most recent match was a few days ago, and the trained-then-rebound shape held.

The mechanism is supercompensation. The stimulus drives the parasympathetic overshoot during recovery; resting at the trough prevents the rebound. Your overall load this week is not anywhere near overreach territory.

One condition. If your warmup tells you the legs are dead, not slight, dead, switch to Z2 and we will re-stack tomorrow. The warmup is the test, not the wrist score.

Same model on both sides. The DN1-grounded answer has the personal data, the components, and a system that is allowed to disagree with the wearable.
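
The grounded answer leans on personal precedent ("more often than not"). Below is one way such a count might be made, assuming a hypothetical `PastDay` record. It is not how DN1 matches days, which this page does not describe. Returning the sample size alongside the rate matters, because two out of three is "more often than not" but is still thin evidence.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PastDay:
    """Hypothetical record of one earlier day."""
    matches_today: bool     # low recovery the day after a hard effort
    trained: bool           # trained anyway
    next_day_better: bool   # next-day signals improved


def trained_rebound_rate(history) -> Optional[Tuple[float, int]]:
    """Share of matching past days where training was followed by a rebound.

    Returns (rate, n) so a small n stays visible, or None when there is
    no personal precedent at all.
    """
    trained = [d for d in history if d.matches_today and d.trained]
    if not trained:
        return None
    rebounds = sum(d.next_day_better for d in trained)
    return rebounds / len(trained), len(trained)
```

A `None` result is the honest answer when there is no history to lean on. In that case a generic rule like the first answer's is all that is left.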

How we know the methods are real

We run our own research program on the math behind the coach. Methods are checked against bounded validation artifacts before they're allowed to support stronger copy.

The principles we discuss publicly: an evidence ladder that bounds what the system can say, individual-level error control so small samples don't produce false discoveries, and honest uncertainty intervals whenever the system makes a claim. The principles and constraints are public; the implementation recipes that compose them stay private.
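
DN1's recipes are private, so the following is not its procedure. As a reference point for what individual-level error control can mean, here is the Benjamini-Hochberg step-up procedure. It is a standard way to control the false discovery rate when one person's data is used to test many hypotheses at once.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> List[int]:
    """Indices of hypotheses rejected by the BH step-up procedure at FDR q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ranks by ascending p
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Track the largest rank whose p-value clears its threshold rank/m * q.
        if p_values[i] <= rank / m * q:
            k_max = rank
    # Step-up: reject every hypothesis ranked at or below k_max.
    return sorted(order[:k_max])
```

With the example p-values [0.01, 0.045, 0.025, 0.20, 0.005] and q = 0.05, the procedure keeps indices 0, 2, and 4. An uncorrected 0.05 cutoff would also have flagged index 1.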

Why I built this

I've worn a Whoop for six years and a Garmin for thirteen. I've logged thousands of weights. I have lab panels going back further than that. The first time I asked a general-purpose LLM to read all of it together, it gave me an answer that sounded right and had forgotten it by the next morning.

I built DN1 because I wanted a system that remembered. I wanted something that, when I told it I'd tried magnesium for three weeks and my sleep score hadn't moved, didn't double down or capitulate. I wanted it to look at the autonomic and vascular markers the sleep score wasn't capturing and tell me what would be worth testing next.

The first user is me. The methods exist because I had to know whether what I was doing was working. The career-arc framing is the easy version of the story. The hard one is: I have years of data and I didn't trust any tool I could find to read it the way I'd read it.

Your version may start smaller: caffeine and afternoon clarity, creatine and training response, a blood-pressure marker, or one sleep score that keeps grading days it does not understand. Same method. Name the question, bind the claim to the evidence, and change the plan only when the signal earns it.

N-of-1 marker tracking, physician-monitored · 41 days / 2026-03-10 → 2026-04-20

BP moved in one physician-monitored founder window. Useful method proof, not a protocol to copy.


Day 0: 146/102 mmHg
Day 17: 128/92 (first sub-130 systolic)
Day 28: 120/88 (program low, both arms)
Day 41: 121/86 (diastolic floor confirmed)
Day 43: 122/87 (third consecutive sub-90 diastolic)
Window total: -25 systolic / -16 diastolic in 41 days
Strongest single lever: weight loss (-9.4 lbs)

One founder, one supervised window, one cheap marker measured at home with a rested morning protocol. Multiple lifestyle and supplement variables moved in parallel, so this does not isolate a lever and should not be read as blood-pressure treatment advice or a supplement recommendation. The public point is narrower: test-change-verify can make a cheap marker visible without pretending the result generalizes.


Marker moved · Tier: N=1, non-generalizable

Join the waitlist

If you've read this far, you're the user. Join the waitlist.

Methods | DeltaN1