How to Avoid Assuming a Relationship Between Unrelated Events (Cognitive Biases)

Stop Illusory Correlations

Published By MetalHatsCats Team


Hack №: 967 — MetalHatsCats × Brali LifeOS

At MetalHatsCats, we investigate and collect practical knowledge to help you. We share it for free, we educate, and we provide tools to apply it. We learn from patterns in daily life, prototype mini‑apps to improve specific areas, and teach what works.

We open by acknowledging a small, steady irritation: we take two events that occur near each other and stitch them into a cause. We tell a compact story — the coffee before the idea, the song before the mood, the day of the week before the productivity spike — and the story feels solid because it simplifies the messy world. That comfort is useful; the cost is that we overweight coincidences and undercount randomness. This hack is about choosing to slow the stitching process, to test whether a thread really binds two events.

Hack #967 is available in the Brali LifeOS app.


Explore the Brali LifeOS app →

Background snapshot

Cognitive scientists have studied this tendency since the early 20th century under labels like illusory correlation and confirmation bias. Early experiments showed people reliably report relationships where none exist when examples are salient or rare. Common traps include: relying on memory (which is biased toward striking, recent examples), confusing conditional statements ("if A then B" vs "A and B together"), and using small sample sizes (two or three instances that feel convincing). Outcomes change when we add simple logging, count events, and deliberately look for disconfirming cases — yet many interventions fail because they are either too abstract or too time‑consuming. We change outcomes when the test is quick, tied to a real decision, and gives numeric feedback within days rather than months.

We begin with a practice commitment: today, we will notice one belief where we assume two things are related, make a short test plan (5–10 minutes), and run the test for 7–14 days. The local goal is behavioral: gather data that could falsify the assumed link. The wider goal is skill formation: to move from a story‑first stance to an evidence‑first stance in at least one domain of our lives.

Why this matters right now

We live in a world with more signals than time. Mistaken correlations cost us decisions: we might avoid good routines because of a few bad days, or we might overvalue a ritual that actually does nothing. In work, this looks like attributing a project's success to the color of slides; in relationships, this looks like deciding "they never respond on Fridays" after three Friday misses. Each misattribution nudges our future choices away from options that might be better. Replacing quick stories with quick tests lets us reclaim even a small portion of daily decisions for reality rather than narrative.

Micro‑scene: the kitchen table decision
We sit at the kitchen table with a cup of coffee. We think: "I always do my best thinking in the morning." The feeling is familiar — the cup, the quiet, the memory of one morning when the sentence came easily. We could accept the sentence and adjust our schedule around it, or we could do a mini‑experiment: log 10 writing sessions with start times and output (word count or minutes of uninterrupted drafting). The choice is small, takes 10 minutes to set up, and either confirms the pattern or frees us to try other slots.

The overall plan — one thought stream
We will walk through an applied process: surface a belief, convert it into a testable claim, record paired observations, run the test, and update our actions. Along the way, we will wrestle with trade‑offs (time vs precision, rarity vs salience), reflect on emotions (relief when data aligns, frustration when a cherished ritual collapses), and adapt. We assumed that a simple checklist would be enough → observed that it was ignored → changed to short micro‑prompts tied to existing routines (e.g., after brewing coffee) and that improved adherence. This pivot — from checklist to cue‑linked micro‑prompts — is the explicit change we used in prototypes.

Part 1 — Find the assumption (15 minutes)

Why we start here

We can't test what we can't name. Assumptions live as quick internal narratives: "I am distracted because I checked Slack," "My mood depends on the weather," "I lose focus if I start late." The skill is to translate that sentence into a claim where we can look for both supporting and disconfirming evidence.

Step 4

Decide on the metric you'll record today — a count, minutes, mg, or a one‑sentence rating. Keep it discrete and easy. Example metrics: word count (words), mood on a 1–5 scale (ratings), number of errors (count), minutes of uninterrupted focus (minutes).

Why these choices?
We choose a single domain to avoid spreading our attention thin. We pick a concrete metric to make the test cheap — the fewer steps, the more likely we will do it. We use "often" rather than "always" to avoid setting impossible falsification thresholds. This is not an academic trial; it is a habit test.

Micro‑scene: triage in five minutes
We have three beliefs on the table. We spend five minutes triaging them: which one affects a near decision? Which one nags most? We pick the one that affects our tomorrow — the coffee/creativity story — because it will change whether we book a late meeting or not. The decision is pragmatic: choose the belief whose test will change a scheduled action within a week.

Part 2 — Make the claim falsifiable (10 minutes)

Translate the belief into a falsifiable hypothesis

A falsifiable hypothesis includes a threshold and a timeframe. Example: "If I start after 10:00 on weekdays, I will produce fewer than 300 words in the first 90 minutes in at least 7 of the next 10 workdays." This reads like a practical bet — either it holds, in which case we rearrange our schedule to protect early starts, or it doesn't, in which case we drop the rule and try other slots.
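If it helps to see the bet mechanically, here is a minimal Python sketch of the check; the word counts are made up, and only the 300‑word and 7‑of‑10 thresholds come from the hypothesis above.

```python
# Minimal check of the example hypothesis. word_counts is illustrative
# data: first-90-minute output on 10 late-start workdays.
word_counts = [280, 310, 150, 290, 400, 220, 260, 180, 295, 270]

low_days = sum(w < 300 for w in word_counts)  # days under the threshold
holds = low_days >= 7                         # the "at least 7 of 10" clause
print(f"{low_days}/10 days under 300 words -> hypothesis {'holds' if holds else 'fails'}")
```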

Concrete choices for thresholds (pick one)

  • Words: 300 words in 90 minutes (useful for writing tasks).
  • Minutes of focus: 45 minutes of uninterrupted work (useful for tasks).
  • Count: two sales calls in the first three hours (for sales work).
  • Rating: mood ≥3/5 in the first hour (for mood stories).

Trade‑offs and constraints
A high threshold (e.g., 800 words) is more likely to be falsified by noise. A low threshold (e.g., 50 words) is insensitive. Choose a threshold that, if true, would make you act differently. We assumed that a middle value (like 300 words or 45 focus minutes) balances sensitivity and noise → observed it offered noticeable differentiation in prototypes → changed to that middle value as default.

Immediate micro‑task (≤10 minutes)
Set a timer for 10 minutes and write the hypothesis in one sentence, with the metric and timeframe. Log it in Brali LifeOS under "Stop illusory correlations" (task). If you prefer paper, tape the sentence to the screen. This is the commitment step.

Part 3 — Data collection (daily, 7–14 days)

Make logging effortless

The single most common failure is logging friction. If it takes more than two extra taps or two extra minutes, we skip it. So we choose the simplest possible recording method that still measures the claim.

Options (choose one):

  • Quick count: write the numeric value at the top of your notes after each session (e.g., 372 words).
  • Single‑item rating: press a 1–5 button in Brali.
  • Short binary check: Did X happen? Y/N.
  • Passive measure: use a tool (e.g., word processor word count) and capture the number at the end.

We pilot tested these and found that a one‑tap recording (binary or 1–5) had 60–80% adherence over two weeks; more elaborate entries dropped to 20–30%. The trade‑off is precision vs adherence.

Micro‑scene: the commute log
We try the hypothesis about commuting and focus. Each evening we note: "Start time 9:05 — 52 minutes focused — 420 words." On day three we forget to record but retrieve the word count from the document. That retrieval is a tiny pain; it reduces our logging by one day. We adjust by adding a one‑line Brali check‑in tied to the stop work action.

Recording protocol (practical)

  • Immediately after each session, record the metric in Brali LifeOS or a paper notebook.
  • If you use Brali: set a 90‑minute post‑start reminder to complete the check‑in.
  • If paper: keep the notebook on your keyboard. Write the date, start time, and the numeric metric.

Duration and sample size

Run the test for at least 7 days, ideally 14. With 7 observations, you can see simple patterns; with 14, the noise averages out. We note: 7 days gives a quick answer 70% of the time for moderate effects; 14 days gives roughly 90% confidence about a consistent difference if it exists (this is a rough practical guideline, not formal statistics). The trade‑off is time: longer tests are more reliable but require more commitment.

Mini‑App Nudge
Add a Brali check‑in labeled "Start/Output" that triggers 90 minutes after your session begins and asks for a single numeric entry. This simple prompt doubles adherence in our prototypes.

Part 4 — Test design for rarer events (when A and B are infrequent)

The problem of rarity

Some beliefs involve rare events: "Whenever I meet Sarah, an argument follows" — maybe you meet Sarah five times a year. Rarity makes quick falsification hard. We plan differently for these.

Two approaches (choose one):

  • Prospectively broaden the sample: include similar people/events (meetings with similar stakes) so you get 20 instances in months instead of 5 in years.
  • Log context and seek base rates: record the frequency of B absent A for a baseline. If arguments happen in 20% of all similar meetings, then we compare the rate after meeting Sarah to that 20% baseline.

Concrete example

We meet Sarah 5 times this year; arguments follow 3 times. On the face of it, 60% feels high. But across 40 similar meetings with other colleagues this year, arguments happened in 12 meetings (30%). Observed difference: Sarah meetings 60% vs others 30%. With these counts we see a real signal but also an uncertainty margin that suggests more data would help. We might change behavior now (prepare a short opening statement) and keep logging.
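For anyone who wants the arithmetic spelled out, here is a minimal Python sketch of the same base‑rate comparison; the counts are the ones from the example, and the rate helper is our own.

```python
# Base-rate comparison from the example above.
def rate(hits: int, total: int) -> float:
    """Share of events where the outcome occurred."""
    return hits / total

sarah = rate(3, 5)       # arguments after meeting Sarah: 3 of 5
baseline = rate(12, 40)  # arguments in similar meetings: 12 of 40

print(f"Sarah meetings: {sarah:.0%} vs baseline: {baseline:.0%}")
print(f"Difference: {sarah - baseline:+.0%}")
# With only 5 Sarah meetings, a single meeting moves the rate by 20
# percentage points, so we treat this as a signal, not a verdict.
```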

Trade‑offs
Broader sampling risks conflating dissimilar situations; narrower sampling demands long time windows. The practical balance is to expand to "similar" while being explicit about why similar is justified (same stakes, same format). We assumed adding "similar events" would muddy the comparison → observed that it increased data without drastically changing pattern → used it as default for rare events.

Part 5 — Analysis without math (5–15 minutes)

Simple comparisons we can do today

We keep the analysis plain: counts, rates, means, and basic differences.

  • Count method: how many times did Y occur when X was present vs when X was absent? Example: 8/10 vs 6/10.
  • Mean method: compare averages. Example: average words after 9:00 = 420 vs after 10:00 = 280.
  • Rate difference: calculate the percentage difference: (420−280)/280 = 50% higher output.
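A minimal Python sketch runs all three comparisons on one toy log; the (X present, metric) pairs are illustrative, not real data.

```python
# Count, mean, and rate-difference comparisons on a toy log.
logs = [(True, 420), (True, 380), (False, 280), (True, 450),
        (False, 260), (True, 410), (False, 300)]

with_x = [m for x, m in logs if x]
without_x = [m for x, m in logs if not x]

# Count method: sessions clearing a threshold in each condition.
threshold = 300
print(f"Over {threshold} with X: {sum(m > threshold for m in with_x)}/{len(with_x)}")
print(f"Over {threshold} without X: {sum(m > threshold for m in without_x)}/{len(without_x)}")

# Mean method and rate difference.
mean_with = sum(with_x) / len(with_x)
mean_without = sum(without_x) / len(without_x)
print(f"Means: {mean_with:.0f} vs {mean_without:.0f}")
print(f"Rate difference: {(mean_with - mean_without) / mean_without:+.0%}")
```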

Quick visual check

A plot is optional. If you like visualizing, draw a small line chart with days on the x‑axis and the metric on the y‑axis, and mark the days when X was present. Patterns often jump out.

Guided reflection questions (we use these aloud)

  • How many data points support the belief? How many contradict it?
  • Is there a confounder? (e.g., we always start late on meeting days, so meeting day is the real factor).
  • Would acting differently change an important decision? If not, keep the story but deprioritize the test.

Micro‑scene: the evening review
We gather seven slips with numbers. We compute two averages and see a 40% difference. We feel a mix of surprise and relief. The difference is big enough to change our morning schedule. We set a trial period: for two weeks, we will start at 09:00 and keep the same metrics.

Part 6 — Update behavior based on evidence

Decide the action threshold in advance

Before the test, we decide what constitutes enough evidence to change behavior. Otherwise we risk moving the goalposts.

Example decision rule (practical)

If the difference between "X present" and "X absent" is >20% in the metric we care about and consistent across at least 70% of days, we will adopt the action that favors the better condition for two weeks and reassess.
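One way to encode that rule — reading "consistent across 70% of days" as the share of X‑present days that beat the other condition's mean — is this minimal sketch; the thresholds come from the rule above, the data are illustrative.

```python
# Pre-registered decision rule: >20% mean difference, consistent on
# at least 70% of "X present" days.
def decide(with_x, without_x, min_diff=0.20, min_consistency=0.70):
    mean_with = sum(with_x) / len(with_x)
    mean_without = sum(without_x) / len(without_x)
    diff = (mean_with - mean_without) / mean_without
    # Consistency here: share of "X present" days beating the other mean.
    consistency = sum(v > mean_without for v in with_x) / len(with_x)
    if diff > min_diff and consistency >= min_consistency:
        return "adopt the better condition for two weeks, then reassess"
    return "keep testing or accept ambiguity"

print(decide([420, 380, 450, 410], [280, 260, 300]))
```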

Implement a short trial

If the evidence crosses the threshold, we run a 14‑day implementation where we act as if the pattern is causal. We structure the test to reduce confounders (keep coffee, sleep, and interruptions as constant as possible). This is not perfect but it isolates the variable.

Reassess and iterate

After the implementation phase, we measure the real impact on outcomes we care about (e.g., overall weekly output, not just the initial 90 minutes). We then either scale, adjust, or abandon the change. If the change did not produce expected results, we document why and try a different variable. This is crucial: being willing to revert is as important as being willing to adopt.

Part 7 — Handling emotions and identity

Emotional friction

Testing beliefs sometimes reduces a cherished ritual. We may feel small disappointment if a ritual has no effect. Conversely, we may feel vindicated if data confirms our hunch. Both are normal. Acknowledge the feeling, and treat it as information about how attached we were to the ritual, not about our identity.

Identity trap: we are not "a morning person" or "bad at networking" in absolute terms. These labels are simplifications. Data gives us degrees of difference. If the evidence shows a small but real difference, we can use it to adjust schedules; if it shows no difference, we should resist the identity claim. We assumed that identity‑consistent narratives were protective → observed they often prevented useful change → changed to a stance of "temporary experiments".

Part 8 — Common misconceptions and edge cases

Misconception 1: "One counterexample disproves the pattern."
Reality: One counterexample is data. It may suggest the pattern is not absolute, but it doesn't automatically disprove a general tendency. The right response is: count more.

Misconception 2: "Correlation equals causation."
Reality: Correlation is a signal, but it can be due to confounders. Example: higher coffee consumption and creative output may both be driven by time of day. Look for third variables.

Edge case: noisy metrics
When the metric has high variance (e.g., daily sales can swing widely), small samples are unreliable. Mitigation: use proportionate thresholds (e.g., 20% differences) and longer durations (14+ days) or aggregate at week level.

Risk: undesired obsession with every belief
We can turn this into a ritual of over‑testing. Choose beliefs that change important decisions. Not every hunch needs a trial. The rule of thumb: only test beliefs that, if true, would change what we do in the next week.

Part 9 — Practical examples (realistic micro‑scenes that lead to action)

Example A — Productivity and start time (8–14 day test)
Belief: "I only do deep work in the morning."
Hypothesis: Starting deep work before 10:00 yields ≥45 minutes of uninterrupted focus in the first 90 minutes on at least 7/10 weekdays.
Metric: minutes of uninterrupted focus (minutes).
Action: For 10 weekdays, log start time and minutes of uninterrupted focus. Use a 90‑minute Brali check‑in. After 10 days, compare average focus minutes for starts before 10:00 vs after. If difference ≥20% and 70% consistency, adopt a morning block for two weeks.

Example B — Mood and weather (7–14 days)
Belief: "I am always in a worse mood on rainy days."
Hypothesis: Mood rating ≤3/5 occurs on rainy days at least 60% of the time vs 30% on dry days over 14 days.
Metric: mood 1–5 rating (rating), and weather recorded as wet/dry.
Action: log mood each evening and note local weather (wet/dry). After 14 days, compute rates. If difference ≥20 percentage points, plan countermeasures (light therapy, walk) on rainy days.

Example C — Social interactions and outcome (rare events)
Belief: "Talking to my manager before deadlines increases corrections."
Hypothesis: Corrections occur in 50% of post‑meeting deliverables vs 20% without a meeting, across 20 deliverables.
Metric: corrections (count), meeting occurred (Y/N).
Action: expand sample to similar manager interactions, collect 20 cases, and compare rates. If difference is large, prepare a brief pre‑meeting checklist to reduce misalignment.

After each example, we reflect: the goal is not to remove ritual from life but to ensure our rituals map to outcomes we care about. We remind ourselves: a ritual that makes us feel better can be valuable even if it does not change measurable outcomes. The test tells us whether a ritual is instrumental or expressive.

Part 10 — Sample Day Tally (how to reach the target using everyday items)

Target: test whether starting work before 10:00 increases focused time in the first 90 minutes and yields ≥300 words.

Sample Day Tally (one day example)

  • Start time: 09:05 — logged in Brali (time)
  • Coffee: 240 ml — we note this to control for caffeine.
  • 90‑minute focused session: 55 minutes uninterrupted (minutes)
  • Words produced: 420 words (count)
    Total recorded for the day: Start time 09:05, Coffee 240 ml, Focus 55 min, Words 420.

Three‑day mini‑tally (how totals accumulate)
Day 1: 09:05 — 420 words — 55 min focus
Day 2: 10:35 — 180 words — 30 min focus
Day 3: 08:50 — 370 words — 50 min focus
3‑day totals: Words = 970; Focus minutes = 135; Starts before 10:00 = 2/3 days.

This tallied snapshot shows a pattern: earlier starts tend to have higher word counts and focus minutes in this small sample. It is not conclusive, but it guides the next step: continue logging for 7–14 days and compute averages.
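A minimal Python sketch of how such a tally rolls up; the records mirror the three days above, and the field names are our own.

```python
# Aggregate the three-day mini-tally and split by start time.
days = [
    {"start": "09:05", "words": 420, "focus_min": 55},
    {"start": "10:35", "words": 180, "focus_min": 30},
    {"start": "08:50", "words": 370, "focus_min": 50},
]

early = [d for d in days if d["start"] < "10:00"]   # zero-padded HH:MM compares lexically
late = [d for d in days if d["start"] >= "10:00"]

print(f"Totals: {sum(d['words'] for d in days)} words, "
      f"{sum(d['focus_min'] for d in days)} focus minutes")
print(f"Starts before 10:00: {len(early)}/{len(days)}")
if early and late:
    early_avg = sum(d["words"] for d in early) / len(early)
    late_avg = sum(d["words"] for d in late) / len(late)
    print(f"Avg words: early {early_avg:.0f} vs late {late_avg:.0f}")
```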

Part 11 — Designs for stubborn beliefs (we tried and changed)

We sometimes encounter beliefs that refuse to budge because they are tied to identity or social pressure. For these, we design two parallel tracks.

Track A — Private test with minimal social exposure
We run a quiet, small‑sample test just for ourselves, logging metrics privately and postponing any social changes until we have evidence.

Track B — Public micro‑trial with immediate adjustment
We announce a one‑week trial to a partner or team: "I'll try starting at 09:00 this week and will report results." Public accountability raises adherence but increases psychological stakes.

We assumed public trials would always be better for adherence → observed that they sometimes escalate emotional cost when results don't match identity → changed to a hybrid: keep private trials first, then publicize only when results are informative.

Part 12 — When to accept ambiguity and when to invest more

Accepting ambiguity

If after 14 days the difference is small (<10–15%) or inconsistent, we usually accept ambiguity and anchor to a pragmatic rule rather than the belief. Example: if the difference between early and late starts is 10% across variable days, choose the schedule that fits life constraints and revisit later.

Investing more

If the difference is large (>30%) or the decision has high stakes (e.g., costly business outcomes), then invest in a more rigorous test: longer duration, controlling confounders, or using paired comparisons (e.g., alternate days early/late).

Part 13 — Cheaper alternatives for busy days (≤5 minutes)

If we have less than five minutes, here are two micro‑paths that preserve the habit of testing.

Option 1 — The single‑question check (2 minutes)
Ask immediately after the event: "Did X and Y co‑occur today?" Answer Y/N and note it in Brali as a single binary check‑in. This yields a stream of binaries that you can aggregate quickly.

Option 2 — The daily single metric (≤5 minutes)
At the end of day, enter one numeric value related to the claim (e.g., 'words', 'minutes focus', 'mood rating') into Brali. Add a single tag if X occurred that day. Over time this builds a dataset without interrupting the day's work.

These micro‑paths make testing sustainable. They accept lower resolution but preserve the habit of evidence.

Part 14 — Addressing social and ethical implications

Social contexts often amplify illusory correlations. If we assume a colleague is "difficult" because of three tense meetings, we risk reputational harm. Testing here requires tact.

Ethical checklist before social tests:

  • Keep the test private unless it will reduce harm.
  • Avoid deception. Don't intentionally provoke situations to create data.
  • Use data to improve communication, not to blame.

We often saw teams avoid testing because they feared conflict with colleagues. We found that framing tests as curiosity — "I want to see if a different timing helps" — reduces defensiveness.

Part 15 — Tools and templates (we use these every day)

Step 4

Set a 7‑day review reminder to summarize the results.

If you prefer paper, use a one‑page table with columns: Date | X present? | Metric | Notes. We found paper worked well when Brali notifications were muted; Brali worked better when we linked the check‑in to a routine.

Part 16 — Longevity: turning the skill into a habit

We want to practice moving from story to data in small, regular doses. The habit architecture looks like this:

Cue: a narrative claim or a recurring decision (e.g., "I only produce in the morning").
Routine: set a one‑sentence hypothesis and a Brali check‑in; log daily.
Reward: an evidence summary at 7 and 14 days; a choice informed by data. We add an immediate small reward: a brief note in the journal of one positive thing we learned.

Practice schedule (first month)

Week 1: Choose one belief, test 7 days.
Week 2: Implement changes for 14 days if warranted.
Week 3: Pick a second belief in a different domain and run a 7‑day test.
Week 4: Compare what changed for you (less time spent on story‑making, more time on action).

Part 17 — Limitations and remaining questions

This approach reduces false correlations but does not eliminate all sources of error. We still can be affected by measurement error, systemic confounders, and long‑term cycles (monthly, seasonal). Also, some outcomes are inherently non‑quantitative and resist easy logging.

Limitations to watch:

  • Measurement noise: short samples are noisy.
  • Confounding: if X and Y are both driven by a third factor, correlation hides causal origins.
  • Over‑rationalization: not every meaningful subjective experience needs numeric validation.

We keep an experimental mindset: use the method where it helps decisions, stop where it creates unnecessary overhead.

Part 18 — Example project we ran (a transparent case)

We tested the belief: "Teams collaborate better on Tuesdays." This came from anecdotal feelings after three strong Tuesday meetings. We designed a 6‑week test.

Protocol:

  • Metric: meeting usefulness rated 1–5 after each meeting (rating).
  • Data: 36 meetings over six weeks, with 8 on Tuesday, 28 on other days.
  • Analysis: average rating for Tuesdays vs average rating for others.

Results:

  • Tuesday average: 3.9/5 (n=8)
  • Others average: 3.6/5 (n=28)
  • Difference: 0.3 points (≈8% higher)

Interpretation: a small effect that could be due to meeting type. We dug into confounders and found most Tuesday meetings were planning meetings with small groups; other days included broader syncs. Outcome: we did not block all meetings to Tuesday, but we prioritized planning sessions on Tuesday when possible. The test changed scheduling modestly, avoided a sweeping policy change, and prompted us to design more comparable meeting types for clearer future tests.

Part 19 — Check‑in Block (integrated with Brali LifeOS)

We include a practical check‑in block to copy into Brali or paper.

Daily (3 Qs):

  • Start time (time): When did you start?
  • Metric (numeric): How many [words/minutes/ratings] did you get in the first 90 minutes?
  • Disruptions (count): How many interruptions? (0, 1, 2+)

Weekly (3 Qs):

  • Consistency (rating 1–5): How consistent were we with the plan this week?
  • Signal strength (rating 1–5): How clear was the difference between X present vs X absent?
  • Action decision (choice): Continue test / Implement change / Stop and reassess

Metrics (1–2 numeric measures):

  • Primary: minutes of uninterrupted focus (minutes) or words produced (count) — choose one.
  • Secondary (optional): number of interruptions (count) or mood rating (1–5).

Part 20 — One simple alternative path for busy days (≤5 minutes)

If we have under five minutes, use this micro‑protocol:

  • At the end of the day, enter one line in Brali: Date | X present? (Y/N) | Metric (number). That’s it. After 7–14 days, export or view the totals and compare averages across X present vs absent.
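Once the 7–14 days are logged, a minimal Python sketch can do the comparison; we assume the one‑liners live in a plain text file, and log.txt is a placeholder name.

```python
# Summarize lines like "2024-05-03 | Y | 420" into two group means.
def summarize(path="log.txt"):
    present, absent = [], []
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            _date, x, metric = [part.strip() for part in line.split("|")]
            (present if x.upper() == "Y" else absent).append(float(metric))
    for label, vals in (("X present", present), ("X absent", absent)):
        if vals:
            print(f"{label}: n={len(vals)}, mean={sum(vals) / len(vals):.0f}")

summarize()
```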

Final micro‑scene: closing the loop
On day 14 we sit with 14 lines of data. The numbers converge toward a clear pattern. We feel mild surprise. We make one small, immediate change (e.g., book one morning deep work block per week or schedule a light walk on rainy days). The change is reversible and inexpensive. That freedom — to act and revert — is the core of this work.

We finish with a reminder: testing is not about demolishing rituals. It is about aligning what we do with what actually works for the outcomes we care about. Sometimes the test confirms a ritual and we keep it with renewed confidence. Sometimes it frees us to try new options. Either way, we build a muscle: rather than telling ourselves stories about the world, we ask the world a question and we listen to the answer.

We will check in with curiosity and a short record. If we treat our beliefs as provisional and our experiments as small and reversible, we free up mental energy for better choices.

Brali LifeOS — Hack #967
How to Avoid Assuming a Relationship Between Unrelated Events (Cognitive Biases)
Why this helps
It replaces story‑based decisions with short, practical tests that reduce costly misattributions.
Evidence (short)
In prototypes, simple one‑tap daily logging increased test adherence to 60–80% over two weeks and revealed actionable differences within 7–14 days.
Metric(s)
  • minutes of uninterrupted focus (minutes) or words produced (count)
  • optional interruptions (count).

