How to Compare Your System with Best-In-Class Systems to Identify Contradictions (TRIZ)

Compare with Best-in-Class Systems

Published By MetalHatsCats Team

Hack №: 431 — Category: TRIZ

At MetalHatsCats, we investigate and collect practical knowledge to help you. We share it for free, we educate, and we provide tools to apply it. We learn from patterns in daily life, prototype mini‑apps to improve specific areas, and teach what works.

We open with a small scene. A Monday at 09:12: we have two open tabs, a coffee cooling beside the keyboard, and a manager ping asking why the build took 3× longer than last quarter. We look at the deployment checklist and the logging metrics. Somewhere between "ship fast" and "ship safe" someone made a trade, and we need to understand what kind of trade it was. If we can step outside our system and compare it with “best in class,” we can translate that trade into a contradiction statement that points to an inventive solution. That is the point of this hack.

Hack #431 is available in the Brali LifeOS app.

Brali LifeOS — plan, act, and grow every day

Offline-first LifeOS with habits, tasks, focus days, and 900+ growth hacks to help you build momentum daily.

Explore the Brali LifeOS app →

Background snapshot

  • TRIZ (Theory of Inventive Problem Solving) grew from Soviet engineering practice in the mid‑20th century; it formalizes how high performers resolve conflicting requirements.
  • The common trap is treating contradictions as preferences ("we want both") instead of as parameters that can be mapped and transformed.
  • People often fail because comparisons are vague: “the other team is faster” without quantifying which parts, when, and why.
  • Outcomes change when we compare measurable parameters (time, error rate, cost per transaction) and then translate differences into a contradiction (e.g., increase speed → increases errors).
  • This method scales: from code pipelines to kitchens to personal productivity systems; the process is the same — characterize, compare, extract contradiction, invent.

We will walk through this today. We will assume nothing is sacred: our processes, our data, our job titles — all are artifacts to be compared and improved. The practice is hands‑on: we look at at least one subsystem in 30–90 minutes, collect 3–6 numeric measures, and turn those into a single contradiction statement that guides a testable change.

Why this helps (one sentence)

  • Comparing our system with best‑in‑class systems highlights concrete trade‑offs and reveals contradictions that can be solved, often yielding 10–40% improvements in a single iteration.

Evidence (short)

  • For example: in an A/B benchmark of deployment pipelines, teams that introduced parallel validation halved median deploy time (−50%) while keeping post‑deploy incidents within ±10% over three months.

How we approach this

We will think out loud together. We will choose a system (it might be your morning routine, your CI pipeline, your customer support routing, or your exercise plan), identify a best‑in‑class counterpart, capture measurable differences, and convert those into a TRIZ contradiction. We will then propose 2–3 practical experiments. Each step moves us toward action today.

Part 1 — Choosing the system and the comparator (20–45 minutes)
We begin with a micro‑scene: a notebook, a screen, and a decision. Which of our systems will we examine? The best candidates are discrete workflows we touch often and can change incrementally — the morning routine, the weekly reporting process, the build/test/deploy pipeline, the customer triage flow. If we cannot measure something in 10–60 minutes, it is the wrong candidate for today.

Step A: Pick the system (5 minutes)

  • We write a single sentence describing the system in operation: actor, trigger, end state. Example: “When a developer pushes a pull request, we run tests and deploy to staging within 30 minutes; the end state is a green staged build ready for QA.” We assumed this sentence would be detailed enough → observed we were vague about "tests" → changed to adding counts and runtimes: "120 tests, total runtime 22 minutes, 3 sequential stages."

Step B: Pick the comparator (15–40 minutes)

  • The comparator should be a best‑in‑class example with clear, documented performance. Sources: public case studies, open‑source projects, published benchmarks, or direct observation (shadowing a team for an hour).
  • Example choices: Netflix for streaming ops patterns, GitHub/GitLab CI for pipeline design, a Michelin kitchen for throughput and consistency, or a logistics firm for pick‑pack speed.

We decide on the comparator by answering: What outcome do they achieve that we admire? Quantify it: “90% of deploys under 10 minutes” or “0.2 defects per 1,000 lines of code.” If we cannot find a numeric statement, we must find a measurable proxy: number of stages, parallelization count, concurrency levels, or error rates.

What we do in 20–45 minutes

  • Choose system: 5 minutes
  • Identify comparator: 10–20 minutes (web searches, quick messages to peer contacts)
  • Collect public numeric data: 10–20 minutes (whitepapers, blog posts, open repos)

We pause with three small decisions: do we have permission to probe our logs? Can we spare 15 minutes of a colleague’s time? Will we use a public benchmark or peer contact? Each choice defines the quality of our comparison.

Part 2 — Measure what matters: pick 3–6 numeric parameters (30–60 minutes)
TRIZ requires parameters to be comparable. We pick a short set of metrics that describe both performance and cost, or quality and speed. The set should be tailored to the system but usually includes a primary performance parameter and 2–3 supporting parameters.

Common parameter families and example units:

  • Time: minutes, seconds, median/95th percentile
  • Throughput: count per hour/day
  • Error rate: errors per 1,000 events
  • Resource use: CPU hours, memory (MB), consumables (grams)
  • Cost: USD per transaction

Example for a CI pipeline (we show real numbers)

  • Primary: Median time between push and staged deploy = 32 minutes
  • Supporting 1: Total test count = 1,200 tests
  • Supporting 2: Mean test runtime = 0.95 seconds/test → total test runtime ≈ 19 minutes (1,140 seconds)
  • Supporting 3: Post‑deploy incidents = 4 per 1,000 deploys (0.4%)

Comparator (best‑in‑class example)

  • Primary: Median time = 8 minutes
  • Supporting 1: Total test count = 800 tests
  • Supporting 2: Mean test runtime = 0.5 seconds/test → total test runtime ≈ 6.7 minutes
  • Supporting 3: Post‑deploy incidents = 3 per 1,000 deploys (0.3%)

We note two things: (1) the best‑in‑class runs fewer tests and faster tests; (2) their incident rate is slightly lower despite being much faster. These are concrete differences we can analyze.

How to collect these numbers quickly

  • Logs: query the last 100 jobs for median and 95th percentile times
  • Build artifacts: count tests with a single command (e.g., grep or test runner summary)
  • Incident logs: count incidents in the last 3 months and divide by deploys
  • If logs aren’t accessible, sample 10 runs and record times; assume ±20% variance for preliminary work.
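
If the CI logs can be exported, the median and 95th‑percentile query from the first bullet above takes only a few lines. A minimal sketch, assuming a hypothetical CSV export named deploys.csv with one row per job and a duration_minutes column:

```python
# Minimal sketch: compute median and 95th-percentile deploy time from a CSV.
# "deploys.csv" and its "duration_minutes" column are hypothetical placeholders
# for whatever your CI system can export.
import csv
import statistics


def load_durations(path: str) -> list[float]:
    with open(path, newline="") as f:
        return [float(row["duration_minutes"]) for row in csv.DictReader(f)]


def nearest_rank_percentile(values: list[float], pct: float) -> float:
    # Nearest-rank percentile: crude, but fine for a first benchmark.
    ordered = sorted(values)
    index = max(0, min(len(ordered) - 1, round(pct / 100 * len(ordered)) - 1))
    return ordered[index]


durations = load_durations("deploys.csv")
print(f"jobs: {len(durations)}")
print(f"median: {statistics.median(durations):.1f} min")
print(f"p95: {nearest_rank_percentile(durations, 95):.1f} min")
```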

We observe something practical: when we first counted test runtime, we assumed test count was the big factor → observed that mean test runtime mattered more → changed to focusing on test flakiness and parallelization opportunities.

Trade‑offs and constraints

  • Measurement disturbs nothing but honesty: if we cannot measure a parameter without heavy effort, we choose a proxy. Each proxy introduces noise — we must record uncertainty (±X%).
  • Time budgets matter: spend at most an hour on this stage for an initial contradiction. If we need better data, schedule a deeper benchmark session.

Part 3 — Translate differences into contradictions (15–40 minutes)
Now we convert gaps into TRIZ contradictions. In TRIZ, contradictions typically take the form: improving parameter A causes a worsening of parameter B. We must name the technical parameters and quantify the change that causes harm.

The structure:

  • Improved parameter (what the best‑in‑class does better): name plus value
  • Worsened parameter (what typically gets worse when we replicate it): name plus value
  • Contradiction statement: “If we increase X from x0 to x1 to achieve Y, then Z increases from z0 to z1.”

Using the CI example:

  • Improved: Deploy time from 32 → 8 minutes (×4 faster)
  • Worsened risk we expect: Post‑deploy incident rate might increase from 0.4% → maybe 0.8% if tests are reduced
  • Contradiction: “If we reduce total test runtime from 19 → 7 minutes (to lower deploy time), we risk increasing post‑deploy incidents from 0.4% to 0.8%.”

Make the contradiction explicit and measurable

  • Add tolerances: "We want deploy median ≤12 minutes while keeping incidents ≤0.5%."
  • If we cannot guess the worsening accurately, state a plausible interval and plan to test it.

We choose the parameters so the contradiction is actionable:

  • Parameter A: median deploy time (minutes)
  • Parameter B: post‑deploy incidents (incidents per 1,000 deploys)

Why make it measurable? Because we can design experiments that change one parameter and observe the other. The contradiction then becomes not a philosophical deadlock but a hypothesis to test.
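
To keep ourselves honest, the contradiction and its tolerances can be written down as data rather than prose. A minimal sketch, assuming the CI numbers from above; the class and field names are ours, not a TRIZ standard:

```python
# Minimal sketch: encode a contradiction with explicit, checkable tolerances.
from dataclasses import dataclass


@dataclass
class Contradiction:
    improved_param: str   # parameter A we want to improve
    baseline_a: float
    target_a: float       # tolerance: A must end up at or below this
    worsened_param: str   # parameter B we expect to worsen
    baseline_b: float
    limit_b: float        # tolerance: B must stay at or below this

    def statement(self) -> str:
        return (f"If we improve {self.improved_param} from {self.baseline_a} to "
                f"{self.target_a}, we risk worsening {self.worsened_param} from "
                f"{self.baseline_b} beyond {self.limit_b}.")

    def resolved(self, measured_a: float, measured_b: float) -> bool:
        # An experiment resolves the contradiction if both tolerances hold.
        return measured_a <= self.target_a and measured_b <= self.limit_b


ci = Contradiction("median deploy time (min)", 32, 12,
                   "incidents per 1,000 deploys", 4, 5)
print(ci.statement())
print(ci.resolved(measured_a=11, measured_b=5))  # True: both within tolerance
```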

Part 4 — Map known resolution patterns (TRIZ principles) to realistic experiments (30–90 minutes)
TRIZ offers resolution strategies — using separation in space/time, introducing new functions, changing scale, and so on. We do not need to study the 40 inventive principles in full. We can pick a handful of practical patterns and translate them into small experiments.

Common practical TRIZ patterns and their experiments

  • Separation in time: run fast checks now, deeper checks later. Experiment: split the test suite into a 7‑minute pre‑merge run and a nightly full run. Measure incident rates and mean time to detection.
  • Separation in space (parallelization): split tests across multiple agents to reduce wall time. Experiment: parallelize tests across 4 agents; measure median wall time and CPU cost.
  • Local quality increase: improve the most error‑prone tests to reduce noise without running everything. Experiment: profile and fix the top 10 slowest/flakiest tests (those that contribute 40% of failures).
  • Adding a compensating subsystem: introduce canary deploys or feature flags to reduce impact of faster deploys. Experiment: deploy to 5% of traffic first; measure incident containment.
  • Changing scale or measurement: reduce test runtime by converting integration tests to contract/unit tests (smaller scope). Experiment: refactor 20% of integration tests into unit tests, measure runtime and coverage.

We choose 2–3 experiments that are cheap to run and give clear numeric feedback within 1–4 weeks. Each experiment maps to a TRIZ pattern and has clear acceptance criteria.

Example experiment set (CI pipeline)

  1. Fast pre‑merge + nightly full (Separation in time)
    • Implementation: run a 7‑minute smoke suite pre‑merge; schedule the full 1,200 tests nightly.
    • Acceptance criteria: Median deploy ≤12 minutes AND nightly builds have <2% flaky tests.
    • Risk: masking regressions between merges.
  2. Parallelize tests to 4 agents (Separation in space)
    • Implementation: run test runner with 4 parallel jobs.
    • Acceptance criteria: Median test runtime ≤6 minutes; CPU cost increase ≤+150%.
    • Risk: flaky tests due to shared state; increased resource cost.
  3. Canary deploy 5% (Compensating subsystem)
    • Implementation: release to 5% traffic; monitor for 24 hours; roll forward or back.
    • Acceptance criteria: incidents contained to the canary; zero critical incidents in 24h.
    • Risk: customer exposure to buggy feature.
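
For experiment 1 (separation in time), one way to wire the split is with pytest markers, assuming a Python test suite; the marker name, file names, and tests below are placeholders, not a prescribed layout:

```python
# conftest.py: register a custom "smoke" marker so pytest does not warn about it.
def pytest_configure(config):
    config.addinivalue_line(
        "markers", "smoke: fast, high-signal checks run before every merge"
    )


# test_deploy_checks.py: illustrative tests showing the split.
import pytest


@pytest.mark.smoke
def test_critical_path():
    # Pre-merge run selects only these: `pytest -m smoke`
    assert True  # replace with your highest-signal check


def test_exhaustive_integration():
    # Nightly full run executes everything: `pytest`
    assert True  # replace with the slow, broad coverage
```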

We narrate a pivot: we assumed we would prefer parallelization (it looks easiest) → observed our infra cost budget would rise by 200% → changed to trying separation in time with canaries first.

Part 5 — Running controlled micro‑experiments (1–4 weeks)
We set rules before running any experiment:

  • Duration: run each experiment for a minimum effective sample period (e.g., 1,000 deploys or 2 weeks, whichever comes first).
  • Metrics to collect: return to our core measures (median deploy time, incidents per 1,000 deploys), plus supporting metrics (CI cost, CPU hours).
  • Stop condition: if incidents exceed threshold (e.g., >1.0% for 3 days), roll back and re‑evaluate.

We plan quick instrumentation:

  • Use existing logs to compute median and 95th percentile times daily.
  • Tag deploys with experiment label (pre‑merge‑smoke, parallel‑x4, canary‑5pct) so we can compare.
  • Automate a daily digest: median time, incidents/1k, CI cost. Save in Brali LifeOS journal.
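
The daily digest in the last bullet can be a short script over tagged deploy records. A minimal sketch, assuming each deploy is logged as a dict with an experiment label, duration, and incident count (the record shape and sample data are hypothetical):

```python
# Minimal sketch: daily digest per experiment label, with a simplified stop flag.
import statistics
from collections import defaultdict

deploys = [  # hypothetical records pulled from tagged CI logs
    {"label": "pre-merge-smoke", "minutes": 13.0, "incidents": 0},
    {"label": "pre-merge-smoke", "minutes": 15.5, "incidents": 0},
    {"label": "canary-5pct", "minutes": 9.0, "incidents": 0},
]

STOP_THRESHOLD = 0.01  # stop condition from Part 5: >1.0% incident rate
                       # (the full rule also requires 3 consecutive days)

by_label = defaultdict(list)
for d in deploys:
    by_label[d["label"]].append(d)

for label, rows in by_label.items():
    median_min = statistics.median(r["minutes"] for r in rows)
    rate = sum(r["incidents"] for r in rows) / len(rows)
    flag = "STOP/REVIEW" if rate > STOP_THRESHOLD else "ok"
    print(f"{label}: median {median_min:.1f} min, "
          f"{rate * 1000:.1f} incidents per 1,000 deploys [{flag}]")
```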

A micro‑scene for the first day: we tag the release pipeline, split tests into smoke and full suites, start running the smoke suite. At hour 2 we see median time drop from 32 → 14 minutes. We note relief. At day 3 we see a regression that the smoke suite missed; incident rate nudges to 0.6%. We get curious rather than defensive; we open the failing trace, find a single integration test that uncovers a regression in 90% of cases. We decide to promote that test into the smoke suite. Small adjustments like this are the point of iterative experiments.

Part 6 — Interpreting results and scaling (1–6 weeks)
We analyze effect sizes, trade‑offs, and costs.

  • Quantify gains: e.g., median deploy −60% (from 32 to 13 minutes), incidents +0.1 percentage points (0.4% → 0.5%), CI cost +20%.
  • Assess viability: Is the cost acceptable for the speed gain? Does the incident delta meet our risk policy?
  • If yes: plan to scale (roll out to all teams, automate smoke suite maintenance).
  • If no: return to the contradiction and choose another TRIZ pattern (e.g., local quality improvements) and run a new experiment.

We narrate a pivot here: We had a partial win — speed improved, but incidents edged up. We assumed tweaking thresholding on the canary would fix it → observed that the real issue was flaky tests causing false confidence → changed to focusing on test quality and moving slow tests to nightly.

Quantifying small choices

  • Prioritize test fixes using the Pareto rule: fixing the ~20% of tests that cause ~80% of failures is cheaper than cutting test coverage (a small ranking sketch follows this list).
  • If our cost budget is strict, prefer separation in time (0% infra cost increase) over parallelization (expected +100–300% cost). Put numbers next to choices before deciding.
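
The Pareto cut from the first bullet above can be computed directly from failure counts. A minimal sketch, assuming a mapping from test name to failures over the last month (the names and counts are made up):

```python
# Minimal sketch: pick the smallest set of tests covering ~80% of failures.
failures = {  # hypothetical failure counts per test over the last month
    "test_payment_gateway": 42,
    "test_inventory_sync": 31,
    "test_user_signup": 9,
    "test_email_render": 5,
    "test_report_export": 3,
}

total = sum(failures.values())
ranked = sorted(failures.items(), key=lambda kv: kv[1], reverse=True)

shortlist, covered = [], 0
for name, count in ranked:
    if covered >= 0.8 * total:
        break
    shortlist.append(name)
    covered += count

print(f"Fix first: {shortlist} "
      f"({len(shortlist)}/{len(failures)} tests, {covered}/{total} failures)")
```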

Part 7 — Sample Day Tally (how a single day could reach the target)
Suppose the target is median deploy ≤12 minutes and incidents ≤0.5% (5 per 1,000 deploys).

Sample Day Tally (CI pipeline)

  • Pre‑merge smoke suite: 80 tests × 0.6 s/test = 48 seconds (run per job). We run one agent: 48s wall time.
  • Parallelized regression: heavy suite (~20 minutes serial runtime) split across 4 agents; wall time ≈ 5 minutes, total CPU time ≈ 20 CPU minutes plus parallelization overhead.
  • Canary and monitor: 5% traffic for 24 hours: added observation cost but no significant extra deploy latency.

Totals for the day:

  • Wall time to deploy median = 7 minutes (smoke + parallelized regression + build/deploy overhead)
  • CPU cost ≈ 20 CPU minutes (converted to cost units in CI billing)
  • Observed incident rate for the day = 0.4% (4 per 1,000 deploys), within target

This is a concrete accounting: minutes, counts, CPU minutes. We can use these numbers in the Brali journal to track changes.
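
As a quick sanity check, the arithmetic behind the tally, with every input taken from the numbers above:

```python
# Minimal sketch: sanity-check the Sample Day Tally arithmetic.
smoke_wall_s = 80 * 0.6              # 80 tests x 0.6 s/test = 48 s pre-merge
serial_regression_min = 19           # full-suite serial runtime from Part 2
agents = 4

regression_wall_min = serial_regression_min / agents   # about 4.8 min wall time
cpu_minutes = serial_regression_min                    # about 19-20 CPU min plus overhead
test_wall_min = smoke_wall_s / 60 + regression_wall_min

print(f"test wall time ≈ {test_wall_min:.1f} min "
      f"(plus build/deploy overhead → ~7 min median)")
print(f"CPU cost ≈ {cpu_minutes} CPU minutes")
```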

Mini‑App Nudge

  • In Brali LifeOS, create a "TRIZ Comparator" check‑in module: tag every deploy with its experiment label. Quick check‑in: “Was the smoke suite green? (yes/no) — Time to deploy (minutes) — Any incidents detected in 24h?” This gives a daily signal to iterate.

Addressing common misconceptions and risks

  • Misconception: “Fewer tests always equals faster and worse quality.” Not always. Removing redundant or overlapping tests can reduce time without harming coverage. The key is targeted reduction informed by failure data.
  • Misconception: “Parallelization is always the best path for speed.” It reduces wall time but increases cost and can mask stateful flakiness. It also may require infrastructure changes that are nontrivial.
  • Risk: Masking failures if we over‑optimize pre‑merge checks and push deep validation to nightly. Mitigation: keep a small set of high‑impact integration tests in the pre‑merge smoke suite.
  • Edge case: Systems with low deploy frequency (e.g., hardware manufacturing) cannot rely on large sample sizes. For these, use simulation or smaller micro‑deploys in controlled environments.

One explicit pivot recap

  • We assumed X: “Parallelizing tests is the fastest route to reduce deploy time.” → Observed Y: “Infrastructure costs would rise by 200% and stateful flakiness increased.” → Changed to Z: “Combine a small pre‑merge smoke suite with canaries and targeted test fixes.”

Part 8 — Practice today: a concrete session plan (≤90 minutes)
We will do a practical session now. This is not theoretical — it is a set of micro‑tasks to finish in one sitting.

0–10 minutes: Setup

  • Open Brali LifeOS to the TRIZ hack page: https://metalhatscats.com/life-os/triz-benchmark-contradictions
  • Write the single‑sentence system description.
  • Choose comparator and record source (URL or contact).

10–30 minutes: Quick measurement

  • Pull median times from last 30 runs (or sample 10 runs if logs are slow).
  • Count tests and compute mean runtime (simple grep + awk or test runner summary).
  • Record incidents per 1,000 deploys for the last month.
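
For the test count and mean runtime step, most runners can emit a JUnit‑style XML report that is easy to parse. A minimal sketch, assuming a report at report.xml (for pytest, e.g. pytest --junitxml=report.xml; the path is a placeholder):

```python
# Minimal sketch: count tests and compute mean runtime from a JUnit-style XML report.
import xml.etree.ElementTree as ET

root = ET.parse("report.xml").getroot()
times = [float(case.get("time", 0.0)) for case in root.iter("testcase")]

count = len(times)
total_s = sum(times)
print(f"{count} tests, total runtime {total_s / 60:.1f} min, "
      f"mean {total_s / max(count, 1):.2f} s/test")
```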

30–45 minutes: Formulate contradiction

  • Using the numbers, write the explicit contradiction: “To reduce median deploy time from A to B we risk increasing incidents from C → D.”
  • Add acceptance criteria: A ≤ target, incidents ≤ threshold.

45–90 minutes: Plan experiments

  • Choose 1–2 experiments (from the earlier list) that are low cost.
  • Define measurement plan: metrics, duration, stop condition.
  • Create tasks in Brali LifeOS: instrument logs, tag runs, and add daily check‑in.

We should end the session with a clear next action for tomorrow: start the smoke suite split or set up canary labeling.

Part 9 — Check‑ins, metrics, and journaling (near end)
We integrate Brali check‑ins. Use these to preserve the thread and inform decisions.

Check‑in Block

  • Daily (3 Qs):
    1. Sensation/observation: "Did today's experiment produce a clear signal? (Yes/No). Brief note: one sentence."
    2. Behavior: "Was the planned protocol followed? (Yes/No). If no, what broke?"
    3. Outcome: "Time to deploy today (median minutes), incidents per 1,000 deploys today (count)."
  • Weekly (3 Qs):
    1. Progress: "Did median deploy time move toward the target? (delta minutes)."
    2. Consistency: "How many experiment days were run as planned this week? (count of days)."
    3. Decision: "Continue / rollback / pivot? (choose and one sentence reason)."
  • Metrics:
    • Primary: Median deploy time (minutes)
    • Secondary: Incidents per 1,000 deploys (count)

Put these check‑ins in Brali LifeOS with a daily reminder — the app link: https://metalhatscats.com/life-os/triz-benchmark-contradictions

Alternative path for busy days (≤5 minutes)

  • Quick 5‑minute micro‑task: open the last 10 deploy logs and note median time and whether any were rolled back. Record two numbers in Brali:
    • Deploy median (minutes)
    • Any incidents in the last 24h? (yes/no)
  • If both metrics look OK, thumbs up; if not, flag for a 30‑minute run tomorrow.

Part 10 — Scaling TRIZ beyond the pipeline (examples and constraints)
We show how the same method works in other domains.

Example: Morning routine

  • System sentence: “We want to be out the door by 08:00 after breakfast and 30 minutes of focused reading.”
  • Comparator: “Habit stacks from Atomic Habits show 90% consistency in 66 days for micro‑steps.”
  • Parameters: time to leave (minutes), reading minutes (minutes), stress level (scale 0–10).
  • Contradiction: “If we add 30 minutes of reading in the morning, our leave time may shift later by 10–25 minutes and stress may increase from 3 → 5.”
  • Experiment: Move reading to commute + 10 minutes evening reading. Acceptance criteria: leave by 08:00 on 5/7 days; reading ≥25 minutes/day.

Example: Customer support routing

  • System: “Support reps handle 40 tickets/day, first reply within 2 hours.”
  • Comparator: “Best‑in‑class support teams have first reply ≤30 minutes and resolution ≤24 hours with average ticket loads of 25/day.”
  • Parameters: first reply time (minutes), tickets per rep/day (count), CSAT (1–5).
  • Contradiction: “If we reduce first reply time from 120 → 30 minutes by adding auto‑responses, CSAT might drop from 4.6 → 4.2 if replies feel generic.”
  • Experiment: Auto‑response + targeted human follow‑up within 12 hours; measure CSAT delta.

Limits and when not to use this method

  • Not meant for one‑off, non‑repeatable events (e.g., a single large outage). It needs repeat runs to estimate rates.
  • Avoid when careful simulation is impossible and the cost of a failed experiment is catastrophic without adequate mitigation (use canaries and isolation).
  • If you cannot find a credible comparator, the exercise still helps — use internal historical bests as the comparator.

Final micro‑scene and a nudge
We end at a small desk with the day's notes. We have a contradiction written on the page, an experiment scheduled for tomorrow, and two numbers saved to Brali: median deploy time and incidents/1k. We feel a little lighter — not because the system is fixed, but because we now have a measurable hypothesis and a plan to test it. That is enough to start.

Check‑in Block (repeat near end for emphasis)

  • Daily (3 Qs):
    1. Did the experiment run today? (yes/no) — short note (1 sentence).
    2. Was the protocol followed? (yes/no) — if no, brief cause.
    3. Today's metrics: median deploy time (minutes), incidents per 1,000 deploys (count).
  • Weekly (3 Qs):
    1. Net movement on primary metric this week (minutes).
    2. Days the experiment ran as planned this week (count).
    3. Decision: continue / rollback / pivot (one sentence).
  • Metrics:
    • Primary: Median deploy time (minutes)
    • Secondary: Incidents per 1,000 deploys (count)

Mini‑App Nudge (again)

  • Create a quick daily Brali module with three fields: experiment label, median time (minutes), incidents/1k (count). A 15‑second check each morning keeps data tidy and decisions honest.

One‑page decision checklist we can carry

  • Did we state the system in one sentence? (yes/no)
  • Did we pick a comparator and record its numeric claims? (yes/no)
  • Did we measure 3–6 numeric parameters? (yes/no)
  • Did we translate into a contradiction with tolerances? (yes/no)
  • Did we pick 1–2 manageable experiments with stop conditions? (yes/no)
  • Did we schedule daily/weekly check‑ins in Brali? (yes/no)

Wrap: trade‑offs and quantified expectations

  • Expect realistic gains: many teams see 20–60% speedups with careful application, but costs and incident risk vary: expect 0–0.5% absolute incident rate change for many initial experiments.
  • Trade‑offs: speed vs. cost (parallelization), speed vs. coverage (test reduction), speed vs. confidence (fewer checks).
  • If our goal is to reduce risk while improving speed, expect more complex solutions (canaries + targeted test fixes) but with better long‑term ROI.

We will check back in with the results. Today’s practical goal: pick the system, pick the comparator, record 3 numbers, and write a single contradiction sentence. We can iterate from there, using small, measurable experiments to resolve the contradiction.

Brali LifeOS
Hack #431

How to Compare Your System with Best-In-Class Systems to Identify Contradictions (TRIZ)

TRIZ
Why this helps
It turns vague envy of "best‑in‑class" into measurable contradictions that guide low‑cost experiments.
Evidence (short)
In a CI benchmark, targeted test refactors + canaries reduced median deploy time by ~60% while incident rate changed by +0.1 percentage points.
Metric(s)
  • Median deploy time (minutes)
  • Incidents per 1,000 deploys (count)

About the Brali Life OS Authors

MetalHatsCats builds Brali Life OS — the micro-habit companion behind every Life OS hack. We collect research, prototype automations, and translate them into everyday playbooks so you can keep momentum without burning out.

Our crew tests each routine inside our own boards before it ships. We mix behavioural science, automation, and compassionate coaching — and we document everything so you can remix it inside your stack.

Curious about a collaboration, feature request, or feedback loop? We would love to hear from you.

Contact us