What to log, at minimum:

1. The date and the action (initiate, scale, exit).
2. The thesis statement in one sentence.
3. The price and the position size.
4. The conviction level (expressed as a percentage).
5. The explicit bear case.
6. The catalyst and the timeline.
7. The three operational signals you are watching.
8. What would falsify the thesis.

Then, after every exit, log the outcome: was the thesis correct, partially correct, or wrong? Was the P&L good, neutral, or bad? Where did the two agree or disagree?
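One way to keep the eight fields from drifting is to give the journal a fixed schema. The following is a minimal sketch in Python, not a prescribed format: the class name `JournalEntry`, every field name, and the choice to express position size as a fraction of the portfolio are illustrative assumptions, and the sample values are made up.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple

ACTIONS = {"initiate", "scale", "exit"}


@dataclass
class JournalEntry:
    """One journal row. All names here are hypothetical, chosen for illustration."""
    entry_date: date                  # (1) the date
    action: str                       # (1) initiate, scale, or exit
    thesis: str                       # (2) one-sentence thesis statement
    price: float                      # (3) price at the time of the action
    position_size: float              # (3) assumed here: fraction of portfolio
    conviction_pct: float             # (4) conviction as a percentage, 0-100
    bear_case: str                    # (5) the explicit bear case
    catalyst: str                     # (6) the catalyst
    timeline: str                     # (6) the expected timeline
    signals: Tuple[str, str, str]     # (7) three operational signals
    falsifier: str                    # (8) what would falsify the thesis
    # Filled in only after an exit:
    thesis_outcome: Optional[str] = None   # "correct", "partial", or "wrong"
    pnl_outcome: Optional[str] = None      # "good", "neutral", or "bad"

    def __post_init__(self) -> None:
        # Reject entries that skip or mangle a required field.
        if self.action not in ACTIONS:
            raise ValueError(f"action must be one of {sorted(ACTIONS)}")
        if not 0 <= self.conviction_pct <= 100:
            raise ValueError("conviction_pct must be between 0 and 100")
        if len(self.signals) != 3:
            raise ValueError("log exactly three operational signals")


# Made-up sample entry, for illustration only.
entry = JournalEntry(
    entry_date=date(2024, 1, 15),
    action="initiate",
    thesis="Margins recover as input costs normalize.",
    price=42.0,
    position_size=0.03,
    conviction_pct=65,
    bear_case="Input costs stay elevated for another year.",
    catalyst="Next two quarterly reports",
    timeline="6-9 months",
    signals=("gross margin", "inventory days", "pricing announcements"),
    falsifier="Gross margin falls for two consecutive quarters.",
)
```

Validating at construction time means an entry with a missing bear case or a fourth signal fails immediately, rather than surfacing as a gap during the twelve-month review.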
| Quadrant | Thesis correct | Thesis wrong |
|---|---|---|
| P&L good | Skill — the framework worked as designed. Document what made the read correct. | Luck — the position made money for reasons unrelated to your edge. Do not generalize from these. |
| P&L bad | Bad luck (or a timing mismatch with the catalyst window). The framework worked; the outcome is noise. | Skill failure — the framework or the execution broke. The single most valuable category to study. |
The outcome-bias trap. Most retail investors implicitly grade their decisions on P&L alone, which conflates skill and luck. The practitioner discipline is to grade decisions on the four-quadrant matrix above. A profitable trade where the thesis was wrong is a luck-driven outcome that should NOT inflate confidence; an unprofitable trade where the thesis was correct should NOT shake conviction. The journal is the only reliable mechanism to keep skill and luck separated.
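The four-quadrant grading can be made mechanical so that P&L alone never decides the grade. The sketch below is one possible encoding, with assumptions stated plainly: the function name `grade_decision` is hypothetical, both inputs are reduced to booleans, and how to treat a "partially correct" thesis or a "neutral" P&L is left to the reader to decide before calling it. The sample outcomes are made up.

```python
from collections import Counter


def grade_decision(thesis_correct: bool, pnl_good: bool) -> str:
    """Map one closed position onto the four-quadrant matrix."""
    if thesis_correct and pnl_good:
        return "skill"          # framework worked as designed
    if pnl_good:
        return "luck"           # made money for reasons unrelated to the edge
    if thesis_correct:
        return "bad luck"       # right read, outcome is noise or timing
    return "skill failure"      # framework or execution broke


# Made-up outcomes: (thesis_correct, pnl_good) for five closed positions.
closed = [(True, True), (False, True), (True, False), (False, False), (True, True)]
tally = Counter(grade_decision(t, p) for t, p in closed)

for quadrant in ("skill", "luck", "bad luck", "skill failure"):
    print(f"{quadrant:>13}: {tally[quadrant]}")
```

A tally like this makes the outcome-bias trap visible: grading on P&L alone would count the "luck" row as a win and the "bad luck" row as a loss.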
Worked example: calibration curve construction. After 60 logged decisions, bucket them by conviction at initiation: 50-59%, 60-69%, 70-79%, 80%+. For each bucket, compute the actual hit rate, where a hit is defined as either "thesis correct" or "P&L good" (pick one definition and stay consistent). Compare the hit rate to the midpoint of the bucket; the open-ended 80%+ bucket has no natural midpoint, so one reasonable choice is the average stated conviction of the decisions in it. A well-calibrated analyst lands within five points of that reference in each tier. A miss of 15 points or more signals a systematic miscalibration that the analyst can identify and work on. The curve is the most honest scorecard an investor can produce.
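The bucketing and comparison described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a canonical method: the names `BUCKETS`, `bucket_label`, and `calibration_table` are hypothetical; bucket midpoints are taken as 55, 65, and 75; the open-ended 80%+ bucket is compared against the average stated conviction of its own entries; decisions below 50% conviction are skipped; and gaps between 5 and 15 points, which the text does not classify, are labeled "in between".

```python
from collections import defaultdict

# (label, lower bound inclusive, upper bound exclusive); None = open-ended.
BUCKETS = [("50-59%", 50, 60), ("60-69%", 60, 70), ("70-79%", 70, 80), ("80%+", 80, None)]


def bucket_label(conviction):
    """Return the bucket label for a conviction percentage, or None if below 50."""
    for label, lo, hi in BUCKETS:
        if conviction >= lo and (hi is None or conviction < hi):
            return label
    return None


def calibration_table(decisions):
    """decisions: iterable of (conviction_pct, hit) pairs, with hit a bool.

    'hit' must follow one consistent definition (thesis correct, or P&L good).
    """
    grouped = defaultdict(list)
    for conviction, hit in decisions:
        label = bucket_label(conviction)
        if label is not None:
            grouped[label].append((conviction, hit))

    rows = []
    for label, lo, hi in BUCKETS:
        items = grouped.get(label, [])
        if not items:
            continue
        n = len(items)
        hit_rate = 100 * sum(1 for _, h in items if h) / n
        if hi is None:
            # Assumption: no midpoint exists, so use mean stated conviction.
            reference = sum(c for c, _ in items) / n
        else:
            reference = (lo + hi) / 2
        gap = hit_rate - reference
        if abs(gap) <= 5:
            flag = "calibrated"
        elif abs(gap) >= 15:
            flag = "miscalibrated"
        else:
            flag = "in between"
        rows.append({"bucket": label, "n": n, "hit_rate": hit_rate,
                     "reference": reference, "gap": gap, "flag": flag})
    return rows


# Made-up journal data, for illustration only.
sample = [(55, True)] * 11 + [(55, False)] * 9 + [(75, True)] * 5 + [(75, False)] * 5
for row in calibration_table(sample):
    print(row)
```

A negative gap means the stated conviction ran ahead of the realized hit rate (overconfidence); a positive gap means the opposite. With only a handful of decisions in a bucket, the hit rate moves several points per decision, so the `n` column is worth reading alongside the gap.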
## See also: deeper references

- **Overconfidence and the confidence-accuracy gap:** `bf-1` in `behavioral-finance-201`, for the underlying behavioral mechanism the calibration curve measures.
- **Outcome bias and process vs results:** `bf-7` in `behavioral-finance-201`, for the four-quadrant framework's behavioral foundation.
- **Anchoring and the path-dependence trap:** `bf-3` in `behavioral-finance-201`, for the disposition effect's role in journal review.
- **Hindsight bias:** `bf-2` in `behavioral-finance-201`, for the most common journal-review failure mode (rewriting the thesis after the outcome is known).
- **Conviction-calibration mechanics:** `ptk-3` in `practitioner-toolkit-201`, for the items-for-further-diligence discipline that produces well-calibrated initiations in the first place.
Sit with the ideas.
You log every initiation, sizing decision, and exit in a structured journal. Twelve months in, you review the journal and notice a pattern: positions where you initiated at conviction levels of 70% or higher have an actual win rate of 48%, while positions where you initiated at 50% conviction have a win rate of 51%. What does this pattern indicate?