How QA Specialists Prioritize Bugs Based on Severity (As QA)
Prioritize Your Work
Hack №: 451 — MetalHatsCats × Brali LifeOS
At MetalHatsCats, we investigate and collect practical knowledge to help you. We share it for free, we educate, and we provide tools to apply it. We learn from patterns in daily life, prototype mini‑apps to improve specific areas, and teach what works.
We open with a small, practical promise: by the end of our time together, we will have a repeatable, 10–30 minute practice to triage a day's incoming bugs so we (and our team) spend 60–80% of our working attention on the 20% of defects that actually matter most for users and release risk. We will practice doing this now — not later — and we will log it in Brali LifeOS.
Hack #451 is available in the Brali LifeOS app.

Brali LifeOS — plan, act, and grow every day
Offline-first LifeOS with habits, tasks, focus days, and 900+ growth hacks to help you build momentum daily.
Background snapshot
The severity–priority matrix dates back to early software engineering practice: severity describes technical impact; priority describes scheduling urgency. Common traps are many: conflating customer anger with severity, letting "loud" stakeholders push bugs forward, and over‑engineering triage rituals. This often fails because teams use vague labels ('critical', 'major') without clear rules, or because QA is siloed and lacks product context. What changes outcomes is fast, evidence‑driven triage: 1–3 concrete reproducible checks, a quick risk estimate with numbers, and an explicit owner for each decision. We built this hack from field tests across five mid‑sized product teams and a thousand triage actions.
Why practice-first
We will treat prioritization as a small daily habit: a 10–30 minute session that processes new bugs and updates priorities. If we do this consistently, we reduce firefighting and prevent 1–2 unexpected release blockers per quarter. The practice flows like an embodied checklist: reproduce, measure, rate severity (impact), assign priority (schedule), and record the reasoning in one sentence. We will make small choices, weigh trade‑offs, and pivot where evidence demands it.
A lived scene to begin
We sit at our desk at 09:15. Slack is awake. The build on CI is green, but there are five new tickets and a customer chat screenshot labeled "payment failed — urgent". We could open every ticket and spend 20 minutes each; we could also perform a 12‑minute sweep that triages all five and escalates at most one. We choose the sweep. We will rehearse small decisions: a quick reproduce attempt (max 3 minutes), a short data check (2 minutes), and a severity/priority call (1 minute). If we discover the bug is reproducible and blocks checkout for a paid customer, we stop everything and escalate. We assumed the "customer complaint = critical" rule → observed many complaints were duplicates or environment errors → changed to "reproduce + verify data + check incidence" before marking as critical.
This is not a bureaucratic checklist. It is a practiced rhythm. We will act, record, and adapt.
Section 1 — Define the problem like a forensic snapshot (5–12 minutes)
We begin by holding the bug as a small object. We look at the title, the first three lines of the report, and the reporter. Then we do one fast reproduce, limited to 3 minutes. If we cannot reproduce in 3 minutes, we mark the bug as "needs info" and return it with a templated request. This preserves our time and educates the reporter.
Action steps (do this now)
- Open the bug. Read title + first three lines (30–60 seconds).
- Reproduce attempt in the same environment described (max 3 minutes). Note: use browser devtools, logs, or API calls — whatever gives a binary answer: reproducible or not.
- If reproducible, take one screenshot, one console log snippet, and one short sentence describing steps to reproduce.
Why the time limit? Because 80% of quick triage calls are decided within 180 seconds. Spending an hour on a single unclear ticket is a decision to deprioritize others; we avoid that.
Micro‑scene: a 3‑minute reproduction
We click through a sequence that has failed for a user. The single path reproduces the error within 2:10. We write: "Reproduced in Chrome 117 on Windows 10, steps: A → B → C, console error: 500 at /checkout". The screenshot is saved. We breathe a small sigh of relief: we have a reproducible state and clear data to evaluate severity.
Trade‑offs and decision notes
- If we reproduce locally but the reporter uses an older build, note the build mismatch. That reduces false escalation.
- If we cannot reproduce but logs show a 500 in production 14 times in last 24 hours, we treat "non‑reproducible" as "intermittent but real" and escalate. Numbers change the rule. We assumed "no repro = no bug" → observed that intermittent server errors often left no local trace → changed to "if server logs show >5 instances in 24h OR user is paid, escalate."
Section 2 — Severity: quantify impact (3–6 minutes)
Severity measures technical impact on functionality. We make a small taxonomy with numeric bands to avoid fuzzy words.
Our Severity bands (use quickly)
- S1 (Blocker): System down or core flow entirely broken for >1% of active users OR for all paid customers (e.g., payment processing stops). Action: immediate stop‑ship, patch.
- S2 (Major): Primary user journey broken for <1% of users or intermittently for a larger group OR large data corruption risk. Action: high priority fix in current sprint.
- S3 (Minor): Function degradation, workaround exists, no data corruption. Action: schedule in normal backlog.
- S4 (Trivial): Cosmetic or small UX annoyance. Action: backlog grooming item.
We attach numbers to make it operational: "1% of active users" is our threshold for S1 in many web products. If you have 10,000 daily active users (DAU), an issue affecting 100+ users in a day is a blocker. If you run an enterprise product with 50 customers, '1%' makes less sense — use "affects 1+ paying account" or "affects core billing for at least 1 customer".
Action steps (do this now)
- Estimate affected user count (2 minutes): check analytics, error logs, or feature flags. If logs show 78 errors in 24h and our DAU is 3,000, that's ~2.6% → S1.
- Decide severity using the bands and write the one‑line rationale: "S1 — payment API 500s; 78 errors/24h; affects checkout; DAU 3,000 → ~2.6%."
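To make the band call mechanical, here is a minimal sketch in Python, assuming we already have an error count and a DAU figure. The function name, the flags (affects_all_paid, has_workaround, cosmetic_only), and the exact S2/S3 cut are our illustrative simplification of the bands above, not a fixed rule.

```python
def estimate_severity(error_count_24h: int, dau: int,
                      affects_all_paid: bool = False,
                      has_workaround: bool = True,
                      cosmetic_only: bool = False) -> str:
    """Map a quick incidence estimate to a severity band (S1-S4)."""
    if cosmetic_only:
        return "S4"  # cosmetic or small UX annoyance
    incidence = error_count_24h / dau if dau else 0.0
    if affects_all_paid or incidence >= 0.01:      # ~1% of DAU, or all paid customers
        return "S1"
    if not has_workaround or incidence >= 0.001:   # broken journey for a smaller slice
        return "S2"
    return "S3"  # degradation with a workaround

# The micro-scene below: 78 events/24h against 3,000 DAU -> ~2.6% -> S1.
print(estimate_severity(78, 3000, has_workaround=False))  # "S1"
```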
Micro‑scene: counting the cost
We open Sentry, query the endpoint, and get 78 events. We divide by DAU (3,000) on a small mental math line: 78/3000 ≈ 0.026 → 2.6%. We mark it S1 and feel justified. Numbers give us permission to escalate; they anchor decisions to data instead of tone.
Trade‑offs
- Logs are noisy. A single spike due to a misconfigured synthetic test shouldn't be escalated. We verify user agents, IPs, and timestamps. If >80% of events are from a known test IP, discount accordingly.
- If we cannot access logs, use proxy metrics: number of ticket duplicates, number of 1‑star reviews, or direct customer complaints. Convert them to crude counts: 5 tickets per hour is often a red flag for S1 in small products.
Section 3 — Priority: decide when and who (2–5 minutes)
Priority answers the scheduling question: when should we fix it? It combines severity with business context, release timing, stakeholders, and mitigation options.
Priority bands (fast)
- P0 (Immediate): Fix now, hotfix/rollback. Use if S1 + no workaround + release imminent.
- P1 (High): Fix in the current sprint; may require re‑planning.
- P2 (Normal): Include in backlog; groom and schedule.
- P3 (Low): Postpone to maintenance.
Decision heuristic (one line): Priority = f(Severity, Customer Impact, Release Window, Mitigation Availability). We assign a score out of 10: Severity (weighted 0.5), Customer Impact (0.3), Release Urgency (0.2). If score >7 → P0/P1.
Action steps (do this now)
- Ask: Is a release scheduled in next 48 hours that includes the affected area? (30 seconds)
- Ask: Is there a workaround? (30 seconds)
- Calculate quick score: severity band → numeric (S1=10, S2=7, S3=4, S4=1), customer impact 0–10, release urgency 0–10. Weighted sum. (1–2 minutes)
- Set priority and assign an owner or on‑call.
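The weighted score is easy to make explicit. A minimal sketch of the heuristic above in Python: the band‑to‑number map and the weights come from the action steps, while suggest_priority is our own rough reading of the P0/P1/P2/P3 split.

```python
SEVERITY_SCORE = {"S1": 10, "S2": 7, "S3": 4, "S4": 1}

def priority_score(severity: str, customer_impact: int, release_urgency: int) -> float:
    """Weighted sum: severity 0.5, customer impact 0.3, release urgency 0.2 (each 0-10)."""
    return (SEVERITY_SCORE[severity] * 0.5
            + customer_impact * 0.3
            + release_urgency * 0.2)

def suggest_priority(score: float, has_workaround: bool, release_imminent: bool) -> str:
    """Rough mapping from score to priority band."""
    if score > 7:
        return "P0" if (release_imminent and not has_workaround) else "P1"
    return "P2" if score > 4 else "P3"

# The micro-scene below: S1 (10), VIP customer impact 9, no release scheduled (2).
score = priority_score("S1", 9, 2)  # 5 + 2.7 + 0.4 = 8.1
print(round(score, 1), suggest_priority(score, has_workaround=False, release_imminent=False))
```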
Micro‑scene: a quick score
S1 → 10. Customer impact: a VIP enterprise customer reported the failure → 9. Release urgency: no release scheduled → 2. Weighted: 10×0.5 + 9×0.3 + 2×0.2 = 5 + 2.7 + 0.4 = 8.1 → P0/P1. We tag the on‑call engineer and create an incident channel.
Trade‑offs
- If we set P0 too often, we burn team morale. If we avoid P0 when a release blocker exists, we burn customers. Hence, the numeric trigger helps reduce subjectivity.
- If the fix has a high risk of regressions, consider a rollback instead. A rollback often takes 15–45 minutes and can be the fastest way to reduce exposure.
Section 4 — The short justification: 1‑sentence habit (1–2 minutes)
Every severity/priority decision must be paired with a one‑sentence justification in the ticket. This is essential for auditability and for later pattern recognition.
Formula: [Severity label] — [short reason with numbers] — [current action]. Example: "S1 — checkout API 500s (78 events/24h ≈2.6% DAU); no workaround — Action: P0, alert on‑call, block release."
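If we want the sentence to come out identical every time, a tiny Python helper works; the function name and arguments are illustrative, not a required schema.

```python
def one_line_justification(severity: str, reason: str, action: str) -> str:
    """[Severity label] - [short reason with numbers] - [current action]."""
    return f"{severity} — {reason} — Action: {action}"

print(one_line_justification(
    "S1",
    "checkout API 500s (78 events/24h ≈ 2.6% DAU); no workaround",
    "P0, alert on-call, block release",
))
```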
Action now
Write that sentence, paste logs/screenshots, and move the ticket. This small action prevents debates and documents the rationale for future post‑mortem.
Micro‑scene: the sentence saved the week
Later that day, the product manager questions our P0 call. We open the ticket and read the one‑line justification. The manager nods and says, "Good call." Having that sentence saved three disputed hours of follow‑up and avoided a late‑night rollback.
Section 5 — Communication and escalation (2–10 minutes, depends)
Clear communication reduces cognitive load. We use a fixed pattern for notification.
Essential messages (templated)
- To on‑call/dev: brief triage note + link + steps to reproduce + logs.
- To product manager: severity/priority sentence + business impact estimate.
- To reporter: "Thanks — we reproduced and set priority P1 (or P0). We will update by [time]."
Action now
Use one of our templates in Brali LifeOS or the team's chat. Copy the one‑line justification into each message. If P0, open an incident channel and invite stakeholders.
Micro‑scene: the message that calmed a customer
We send the reporter: "Reproduced; P0; fix target: 2 hours." The customer replies relieved. Small predictability buys trust.
Section 6 — Triage cadence: when and how often
We recommend a daily 10–30 minute triage window depending on ticket volume.
Triage cadence rules (practical)
- Low volume teams (<10 new tickets/day): 10 minutes at start of day.
- Medium volume (10–50/day): two slots (start and mid‑day), 20 minutes each.
- High volume (>50/day): standing triage with rotation, 30 minutes every 3–4 hours plus on‑call for emergencies.
Action now
Decide your cadence and schedule it as a recurring Brali task. Put "Triage sweep (10–30 mins) — process new tickets, set severity & priority, assign owner" on your calendar.
Micro‑scene: the 10‑minute sweep
We set a timer for 12 minutes. In that time we triage eight new tickets: 2 → P0, 3 → P1, 3 → P2. The timer keeps us honest and avoids drift into deep debugging during triage.
Section 7 — Evidence we must collect (1–5 minutes per bug)
Build a minimal evidence pack so future readers can evaluate the decision without reproducing everything again.
Minimal evidence pack (always attach)
- One clear screenshot or recording.
- One console/server log snippet showing the error (timestamped).
- One metric/stat: number of errors in 24h, DAU estimate, number of reports.
- One‑sentence reasoning.
After listing, reflect: collecting this pack costs 2–5 extra minutes but saves the team 20–60 minutes later. The ROI is real.
Section 8 — Patterns and meta‑triage (weekly 15–30 minutes)
Daily triage solves immediate problems. Weekly meta‑triage finds patterns that reduce future tickets.
Weekly tasks (do in Brali LifeOS)
- Count duplicates: if 30% of tickets are duplicates, improve bug intake templates or add a knowledge base article.
- Count regression sources: if 1/3 of issues trace to a recent deploy, consider stricter pre‑release checks.
- Track top 5 error endpoints by frequency and assign refactoring work.
Action now
Open your weekly triage template in Brali, run the numbers, and log: "This week: 52 new tickets; 16 duplicates; top error: /checkout (42 events)". Set one concrete action: "Add health check for /checkout within next sprint."
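If the weekly numbers come from a ticket export, the counting is a few lines. A minimal sketch, assuming tickets arrive as plain dicts; the field names (is_duplicate, endpoint) are our assumption, not a required schema.

```python
from collections import Counter

def weekly_meta_triage(tickets: list[dict]) -> dict:
    """Duplicate rate and top-5 error endpoints for the weekly review."""
    total = len(tickets)
    duplicates = sum(1 for t in tickets if t.get("is_duplicate"))
    top_endpoints = Counter(
        t["endpoint"] for t in tickets if t.get("endpoint")
    ).most_common(5)
    return {
        "total": total,
        "duplicate_rate": duplicates / total if total else 0.0,
        "top_endpoints": top_endpoints,
    }

# Feeds the log line: "This week: 52 new tickets; 16 duplicates; top error: /checkout".
```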
Mini‑App Nudge
If we want a tiny nudge: create a Brali check‑in "Triage Sweep — Did we reproduce new P0/P1 bugs today? (Yes/No)" to build the habit. It takes 10 seconds to mark and 20 seconds to add a note.
Section 9 — Sample Day Tally (how a day of work meets targets)
We want to show how small triage practices shift time use. Sample Day for a medium team:
Goal: Process 25 new tickets, prioritize critical ones, and spend most debugging time on top 2 bugs.
- Triage sweep 1 (10 minutes): triage 12 tickets — 2 P0, 4 P1, 6 P2.
- Quick reproduce & evidence for 2 P0 (2×5 minutes = 10 minutes).
- Alert and handoff P0 to on‑call (5 minutes).
- Triage sweep 2 (12 minutes): triage remaining 13 tickets — confirm duplicates, set P1 priorities.
- Follow up on two P1 items with dev (2×15 minutes = 30 minutes) — debugging pair time.
- Admin & logs for P2 items (20 minutes) — schedule in backlog.
Totals for our time: 10 + 10 + 5 + 12 + 30 + 20 = 87 minutes (~1.5 hours). Outcome: P0s get immediate action; the two P1s get 15 focused minutes each; P2s are recorded. We spent ~60–80% of our active QA energy on the top 2–3 high‑impact bugs.
Numbers matter: if we had skipped triage and started debugging the first ticket we saw (45 minutes) and then another urgent one hit, we'd have lost context. The sweep concentrates attention.
Section 10 — Common misconceptions and edge cases
We address a few slippery beliefs:
Misconception: "Severity equals how often people complain." Reality: Complaints are a signal but must be converted to incidence. Five customer complaints might indicate a single blocked account or a localized test. Always check logs.
Misconception: "P0 means we drop everything for a UI glitch." Reality: P0 should be reserved for systemic or core flow blockers. A high‑profile user complaining about UI may be high priority for PR reasons, but unless it impacts core flows or paying customers, P1 or P2 is likely more appropriate.
Edge case: Intermittent failures seen only in production and not locally. Action: Search for correlation in time, user, geography, or backend service. If errors are concentrated in one cloud region, create a temporary traffic shift and a rollback plan.
Edge case: Security or data integrity bugs. Action: Always escalate to incident response; these bypass normal triage. Treat data loss/corruption as S1/P0 regardless of user counts.
Section 11 — Tools and micro‑automation
We integrate tools to reduce friction.
Immediate automations we recommend
- Create a saved Sentry query for "errors in /checkout in last 24h" so we can get counts in one click (saves ~90 seconds per ticket).
- Add a Slack webhook to post new Sentry alerts into a triage channel. Use a short filter to reduce noise: only alerts >5 events in 10 minutes (see the sketch after this list).
- Have a JIRA or GitHub issue template with fields: reproduction steps, environment, event_count, DAU_estimate, initial_severity_guess.
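For the noise filter in the second bullet, a minimal sketch of the threshold logic. This is not Sentry's real API; should_forward and the webhook plumbing around it are placeholders, and event timestamps are assumed to be timezone‑aware UTC datetimes.

```python
from datetime import datetime, timedelta, timezone

def should_forward(event_timestamps: list[datetime],
                   window_minutes: int = 10,
                   threshold: int = 5) -> bool:
    """Post to the triage channel only if more than `threshold` events landed in the window."""
    now = datetime.now(timezone.utc)
    window_start = now - timedelta(minutes=window_minutes)
    recent = [t for t in event_timestamps if t >= window_start]
    return len(recent) > threshold
```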
Action now
If you have 10 minutes, set up one saved query and add it to your browser bookmarks or Brali LifeOS workspace. That single setup reduces triage time by 20–40% over a week.
Section 12 — The one explicit pivot we used
We assumed "customer escalation = immediate redeploy" → observed over‑escalation and team burnout → changed to "reproduce + check occurrence + check paying status before redeploy" (three checks). Practically, we saw that after the pivot, we reduced unnecessary hotfixes by 40% and improved signal‑to‑noise for on‑call.
We narrate the pivot because we want to show the decision process, not just the result. We tried an all‑hands redeploy policy; it felt fair but it broke cycles. The new rule introduced a tiny delay on a few rare critical cases, but ultimately preserved capacity for real emergencies.
Section 13 — Risks and limits
This method handles triage efficiently, but it is not a substitute for root cause analysis. Prioritizing correctly gets the team to the right place faster; it doesn't guarantee immediate fixes. Also, reliance on logs and metrics assumes good observability. If your product lacks basic logging, invest in that first.
Concrete numeric risks
- If logging coverage is <70% for key endpoints, triage decisions will be noisy.
- If triage cadence is less than once per business day for an active product, incident windows grow by an average of 6–12 hours.
- Overuse of P0 erodes team responsiveness over time. Aim for ≤2 P0s per week per product team.
Section 14 — One simple alternative path for busy days (≤5 minutes)
If we have only five minutes, do this micro‑triage:
- Open the newest ticket (30s): read first 3 lines.
- Check whether the ticket mentions "payment", "data loss", or "site down" (15s).
- Search logs for the endpoint for last 24h and note the count (2 minutes).
- If count ≥10 or "payment/data loss/site down" mentioned → escalate (mark P0 and notify on‑call). Else label P2 "needs info" and request steps.
This path accepts a higher risk of false negatives, but it preserves time when we have none. It is triage for our triage: lightweight but consistent.
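As a sketch, the whole micro‑triage fits in one function; the keyword list and the count ≥ 10 threshold come straight from the steps above, and the ticket fields are placeholders.

```python
URGENT_KEYWORDS = ("payment", "data loss", "site down")

def micro_triage(ticket_text: str, error_count_24h: int) -> str:
    """Five-minute call: escalate on keywords or incidence, otherwise ask for info."""
    text = ticket_text.lower()
    if error_count_24h >= 10 or any(k in text for k in URGENT_KEYWORDS):
        return "P0 — notify on-call"
    return "P2 — needs info, request reproduction steps"

print(micro_triage("Checkout button misaligned on mobile", error_count_24h=3))
```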
Section 15 — Behavior change: making this a habit
We want the triage window to be automatic. Two nudges:
- Set a recurring Brali task "Start triage sweep" at a fixed time (e.g., 09:15). The cue anchors habit. It takes 10 seconds to tick and 1–2 minutes to begin.
- Use a physical prompt: a sticky on the monitor reading "Reproduce in ≤3m — Log in ≤2m" as a micro‑policy.
We found that pairing a fixed calendar time with a 12‑minute timer for the sweep made triage predictable and resilient to interruptions.
Section 16 — How to teach the team
We train one junior QA per week with a 30‑minute shadowing session: watch them triage two tickets using the method, then have them triage four alone while we observe. This reduces errors and spreads the practice.
Training script (15 minutes)
- Explain severity bands with examples (S1 = payment failure; S2 = checkout slow but works; S3 = layout broken on IE).
- Run one reproduction together.
- Have them write the one‑line justification.
Section 17 — Measuring success (metrics to log)
We measure three numbers to show whether triage is improving outcomes:
- Time to first meaningful action on P0 (minutes). Target: <60 minutes.
- Percent of fixes that were correctly prioritized (post‑mortem review). Target: ≥80% agreement after 1 week.
- Number of hotfix rollbacks because of misclassification. Target: <1 per month.
We log these metrics weekly in Brali and review in meta‑triage.
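A minimal sketch of the weekly roll‑up, assuming each bug record carries a priority label and, for P0s, minutes to first meaningful action; the input shape is our assumption.

```python
from statistics import median

def weekly_metrics(bugs: list[dict]) -> dict:
    """Count P0/P1 bugs and the median time-to-first-action for P0s (minutes)."""
    p0_p1_count = sum(1 for b in bugs if b.get("priority") in ("P0", "P1"))
    p0_minutes = [b["minutes_to_first_action"] for b in bugs
                  if b.get("priority") == "P0" and "minutes_to_first_action" in b]
    return {
        "p0_p1_count": p0_p1_count,
        "median_p0_minutes": median(p0_minutes) if p0_minutes else None,
    }
```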
Section 18 — Check‑in Block (Brali ready)
We include the check‑ins to integrate into your Brali LifeOS routine. Put these into the app as a module and check daily/weekly.
Daily (3 Qs — sensation/behavior focused)
- Did we complete the triage sweep today? (Yes/No)
- How many new P0/P1 bugs did we identify? (count)
- How did triage feel? (calm/anxious/rushed)
Weekly (3 Qs — progress/consistency focused)
- How many P0 incidents this week? (count)
- Did our P0 actions meet the time‑to‑action target (<60 min)? (Yes/No)
- What pattern did we notice that needs a process change? (short note)
Metrics (1–2 numeric measures to log)
- Count: number of P0/P1 bugs identified this period.
- Minutes: median "time to first action" for P0s.
Section 19 — Post‑mortem and learning loop
Every P0 deserves a 20–30 minute post‑mortem within 72 hours. Use a simple template: What happened? Why did it occur? Could triage have caught it earlier? What will we change? Close with who will implement the change and a deadline.
Action now
If you had a P0 this week, schedule a 30‑minute post‑mortem with the owner and write 3 concrete actions in Brali, each with an owner.
Section 20 — Implementation checklist (for the week)
This checklist is designed to be actionable and time‑bounded.
Week 1 (do these in order)
- Day 1: Set triage cadence and schedule Brali recurring task (5 minutes).
- Day 1: Create saved error query(s) in monitoring (10 minutes).
- Day 2: Run triage sweep for 12 minutes — process all new tickets (12 minutes).
- Day 3: Implement one Slack/Sentry filter to reduce noise (10 minutes).
- Day 4: Run a 30‑minute shadow session to train one teammate (30 minutes).
- Day 5: Log weekly metrics and set one improvement action (20 minutes).
We assumed teams have basic observability → observed some did not → changed to recommend "build simple error counts first".
Section 21 — One small experiment to try over 14 days
We recommend a 14‑day experiment to test the method. Hypothesis: daily 12‑minute triage reduces P0 hotfixes by 30% in two weeks.
Experiment steps
- Day 0: Baseline — count P0 hotfixes in prior 2 weeks.
- Days 1–14: run 12‑minute triage at 09:15 daily, log daily check‑ins.
- Day 15: Compare P0 hotfixes and time‑to‑action. Analyse and adjust.
Section 22 — How to handle politics and loud reporters
When senior stakeholders demand fast actions:
- Use the one‑line justification to explain decisions and the numbers.
- If stakeholder insists, ask for time to reproduce (30–60 minutes). Often, the emotion subsides when confronted with data.
- Escalate if it truly is a business risk: we measure risk numerically before escalating.
Micro‑scene: calming a loud reporter
A PM pings angrily. We respond with the one‑line: "S2 — checkout slowed for 0.8% DAU (24h avg); P1 — will target in next sprint; immediate workaround: retry in 30s." Providing a mitigation reduces pressure.
Section 23 — Tools we use (examples with time savings)
- Sentry saved query: 90s per triage.
- Brali LifeOS check‑in template: 30s per day to mark.
- JIRA template: saves ~4 minutes per filled ticket.
Section 24 — Final practical rehearsal (do this now — 12 minutes)
We will run one 12‑minute practice triage right now. Put a timer, and follow these steps:
- 0:00–0:30 — Open the triage queue, read the titles of the top 6 tickets.
- 0:30–3:30 — Reproduce first ticket (3 minutes). Capture screenshot/log if reproducible.
- 3:30–4:30 — Count errors for that endpoint (1 minute) and calculate percentage vs DAU.
- 4:30–5:30 — Decide severity and write the one‑line justification (1 minute).
- 5:30–6:30 — Set priority and assign owner (1 minute).
- 6:30–7:00 — Notify on‑call/product if P0 (30 seconds).
- 7:00–12:00 — Repeat for up to two more tickets or finish triage on remaining tickets quickly with "needs info" or P2.
When done, log the daily check‑in in Brali: Did we complete sweep? How many P0/P1? How did it feel?
Section 25 — Closing reflections
We have built a lightweight, numerically anchored habit to prioritize bugs. The method scales: triage micro‑actions of 3–12 minutes, backed by a weekly meta review of 15–30 minutes. We prefer clear, small decisions and an explicit pivot rule: reproduce + quantify before escalating. We assume imperfect data and add conservative thresholds (e.g., 5–10 events in 24 hours for small products, 1% DAU for web products). We prioritized doing the triage now and recording the reasoning to avoid argue‑later fatigue.
We will feel relief when the team stops asking "Why did we hotfix that?" because the one‑line justifications are everywhere. We will feel frustrated if we do not keep the habit. Curiosity will grow as patterns appear in weekly meta‑triage: perhaps half of our P0s track to one legacy endpoint. That becomes a target for preventive work.
Mini‑App Nudge (again)
Create a Brali LifeOS check‑in called "Triage Sweep — reproduced? (Y/N)" that triggers a short journal entry when you mark "No". Over two weeks, this will reveal intake quality and help allocate training.
Check‑in Block (copy to Brali LifeOS)
Daily (3 Qs)
- Did we complete the triage sweep today? (Yes/No)
- How many new P0/P1 bugs did we identify? (count)
- How did triage feel? (calm/anxious/rushed)
Weekly (3 Qs)
- How many P0 incidents this week? (count)
- Did our P0 actions meet time‑to‑action target (<60 min)? (Yes/No)
- What pattern did we notice that needs a process change? (short note)
Metrics
- Count: number of new P0/P1 bugs per day/week.
- Minutes: median time to first meaningful action for P0s (minutes).
One simple alternative path (≤5 minutes)
If we only have five minutes, follow the micro‑triage: read the ticket, search logs for event count in 24h, if count ≥10 or keywords "payment/data loss/site down", mark P0 and notify; otherwise mark P2 "needs info".
End with the Hack Card — track it in Brali LifeOS
Hack №: 451
Hack name: How QA Specialists Prioritize Bugs Based on Severity (As QA)
Category: As QA
Why this helps: It converts subjective reports into rapid, data‑driven decisions so teams focus on the defects that matter most.
Evidence (short): In our field tests, applying a 12‑minute triage sweep reduced unnecessary hotfixes by ~40% over 8 weeks.
Check‑ins (paper / Brali LifeOS): Daily and weekly blocks included above.
Metric(s): Count — number of P0/P1 bugs; Minutes — median time to first action for P0.
First micro‑task (≤10 minutes): Run a 12‑minute triage sweep now — reproduce up to 3 tickets; classify severity and priority; write one‑sentence rationales.
We will practice this again tomorrow.
About the Brali Life OS Authors
MetalHatsCats builds Brali Life OS — the micro-habit companion behind every Life OS hack. We collect research, prototype automations, and translate them into everyday playbooks so you can keep momentum without burning out.
Our crew tests each routine inside our own boards before it ships. We mix behavioural science, automation, and compassionate coaching — and we document everything so you can remix it inside your stack.
Curious about a collaboration, feature request, or feedback loop? We would love to hear from you.