Phase 3 · W27–W28

W27–W28: Operational Metrics & Reporting

Turn classifier, routing, clustering, and eval outputs into automated reports that drive operational decisions.

Suggested time: 4–6 hours/week

Outcomes

  • A weekly ops report generated automatically.
  • A top recurring issues section with counts and trend context.
  • Routing breakdown across dev, data, config, and manual actions.
  • Model health stats including accuracy, needs_human, and drift flags.
  • A short executive summary that stays honest and actionable.

Deliverables

  • Automated weekly report file saved under /reports/ with date.
  • Metrics tables for routing, top clusters, and DQ failures.
  • AI health section with regression pass/fail and drift signals.
  • Top 5 next-action list linked to cluster/rule_id evidence.

Prerequisites

  • W25–W26: Evaluation (accuracy, drift, regressions)

W27–W28: Operational Metrics & Reporting

What you’re doing

You take your AI Ticket Analyzer and turn it into something that helps you run AMS like a machine.

Right now you have:

  • classification
  • routing
  • clustering
  • evaluation

Cool. Now make it visible.
Because if nobody sees it, nobody trusts it, and it dies.

Time: 4–6 hours/week
Output: an ops report pack (weekly/monthly) + metrics tables that show what’s happening and what to fix next


The promise (what you’ll have by the end)

By the end of W28 you will have:

  • A “weekly ops report” generated automatically
  • A “top recurring issues” section with counts + trends
  • Routing stats (what goes to dev vs data vs config)
  • Model health stats (accuracy, needs_human, drift flags)
  • A simple executive summary that doesn’t lie

The rule: reports must drive decisions

A report that doesn’t change a decision is just decoration.

Every section must answer:

  • what happened?
  • why does it matter?
  • what do we do next?

What to include in your report (keep it tight)

1) Volume and trend

  • total tickets processed
  • trend vs last period
  • spikes (top days)
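
A minimal sketch of the volume math, assuming each ticket is a dict with an ISO `created_at` date string (the field name is a placeholder for whatever your analyzer actually logs):

```python
from collections import Counter
from datetime import date, timedelta

def volume_and_trend(tickets, period_days=7):
    """Tickets this period vs last period, plus the busiest days.
    Assumes each ticket dict carries an ISO 'created_at' date string."""
    today = date.today()
    this_start = today - timedelta(days=period_days)
    last_start = this_start - timedelta(days=period_days)

    per_day = Counter()
    this_period = last_period = 0
    for t in tickets:
        day = date.fromisoformat(t["created_at"][:10])
        per_day[day] += 1
        if day >= this_start:
            this_period += 1
        elif day >= last_start:
            last_period += 1

    # spike days = highest-volume days inside the current period
    spikes = Counter({d: n for d, n in per_day.items() if d >= this_start})
    return {
        "total": this_period,
        "prev": last_period,
        "spike_days": [str(d) for d, _ in spikes.most_common(3)],
    }
```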

2) Routing breakdown

  • % dev required
  • % data quality
  • % config/customizing
  • % manual client action

This shows where your team is bleeding time.
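
A minimal sketch of the breakdown, assuming each ticket carries a route label from your routing step; the label values below are placeholders, use whatever your router actually emits:

```python
from collections import Counter

# placeholder labels -- swap in your real routing categories
ROUTES = ["dev", "data_quality", "config", "manual_client"]

def routing_breakdown(tickets):
    """Share of tickets per routing target, assuming t['route'] holds the label."""
    counts = Counter(t["route"] for t in tickets)
    total = sum(counts.values()) or 1
    return {route: round(100 * counts.get(route, 0) / total, 1) for route in ROUTES}
```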

3) Recurring issues (clusters)

  • top 10 clusters by count
  • trend vs last week/month
  • recommended actions (runbook/fix/mapping)

This is your “investment list”.
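
A sketch of the cluster table, assuming each ticket has a `cluster_id` assigned by your clustering step (the field name is an assumption):

```python
from collections import Counter

def top_clusters(this_week, last_week, n=10):
    """Top clusters by count with a naive trend vs last week.
    Assumes each ticket dict has a 'cluster_id' from the clustering step."""
    now = Counter(t["cluster_id"] for t in this_week)
    prev = Counter(t["cluster_id"] for t in last_week)
    return [
        {"cluster": cid, "count": count, "delta": count - prev.get(cid, 0)}
        for cid, count in now.most_common(n)
    ]
```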

4) Data quality top pain

  • top rule_id failures
  • top fields with issues
  • countries/sales orgs with most problems (if relevant)

This turns DQ into a targeted plan.
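
A sketch of the DQ table, assuming each failure record carries `rule_id` and `field` keys (names are assumptions; extend with country/sales org if you log them):

```python
from collections import Counter

def dq_pain(dq_failures, n=10):
    """Top failing rules and top affected fields.
    Assumes each failure dict has 'rule_id' and 'field' keys."""
    by_rule = Counter(f["rule_id"] for f in dq_failures)
    by_field = Counter(f["field"] for f in dq_failures)
    return {"top_rules": by_rule.most_common(n), "top_fields": by_field.most_common(n)}
```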

5) AI system health

  • golden set accuracy (latest)
  • needs_human rate
  • invalid output rate (should be 0)
  • drift flags (if label distribution changed)

This stops silent quality decay.
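
A sketch of the health numbers, assuming your W25–W26 eval gives you golden-set results with expected/predicted labels and your classifier output logs `needs_human`, `valid`, and `label` fields (all field names are assumptions):

```python
from collections import Counter

def ai_health(golden_results, predictions, baseline_dist, drift_threshold=0.15):
    """Accuracy, needs_human rate, invalid-output rate, and a crude drift check.
    baseline_dist maps label -> share of tickets in the last healthy period."""
    correct = sum(1 for r in golden_results if r["predicted"] == r["expected"])
    accuracy = correct / len(golden_results) if golden_results else 0.0

    total = len(predictions) or 1
    needs_human_rate = sum(1 for p in predictions if p["needs_human"]) / total
    invalid_rate = sum(1 for p in predictions if not p["valid"]) / total

    # flag labels whose share moved more than the threshold vs the baseline
    current = Counter(p["label"] for p in predictions)
    drift_flags = [
        label for label, share in baseline_dist.items()
        if abs(current.get(label, 0) / total - share) > drift_threshold
    ]
    return {
        "golden_accuracy": round(accuracy, 3),
        "needs_human_rate": round(needs_human_rate, 3),
        "invalid_output_rate": round(invalid_rate, 3),
        "drift_flags": drift_flags,
    }
```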


Step-by-step checklist

1) Define the report format

Pick one:

  • markdown report generated to /reports/
  • HTML page in your app
  • both (markdown is enough for v1)

Start with markdown. It’s easy to ship.

2) Build a report generator

One command:

  • queries DB (or reads logs)
  • computes metrics
  • prints a clean report file

Do not hand-write reports. Automate.
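
A minimal skeleton, assuming you compute the metrics with helpers like the sketches above and pass them in as one dict; reading tickets from your DB or logs is the only part left to your setup:

```python
from datetime import date
from pathlib import Path

def render_report(metrics):
    """Turn {section name: dict or list of rows} into a markdown report."""
    lines = [f"# Weekly Ops Report - {date.today().isoformat()}", ""]
    for section, rows in metrics.items():
        lines.append(f"## {section}")
        if isinstance(rows, dict):
            lines.extend(f"- {key}: {value}" for key, value in rows.items())
        else:
            lines.extend(f"- {row}" for row in rows)
        lines.append("")
    return "\n".join(lines)

def generate_weekly_report(metrics, base="reports"):
    """Write the report to reports/<year>-W<week>-ops-report.md and return the path."""
    year, week, _ = date.today().isocalendar()
    path = Path(base) / f"{year}-W{week:02d}-ops-report.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_report(metrics), encoding="utf-8")
    return path
```

One CLI entry point loads the data, builds the metrics dict from the helpers above, and calls generate_weekly_report(metrics). That is the whole "one command".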

3) Add “executive summary” (but keep it honest)

Top 3 bullets:

  • biggest spike
  • biggest recurring issue
  • best next fix to reduce volume

No corporate fluff. Just reality.
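
A sketch of deriving those bullets from the numbers instead of writing them by hand (it expects the dict shapes from the earlier sketches, so adjust the keys if yours differ):

```python
def executive_summary(volume, clusters, dq):
    """Three honest bullets pulled straight from the metrics."""
    bullets = []
    if volume["spike_days"]:
        bullets.append(f"Biggest spike: {volume['spike_days'][0]} "
                       f"({volume['total']} tickets this period vs {volume['prev']} last).")
    if clusters:
        top = clusters[0]
        bullets.append(f"Biggest recurring issue: cluster {top['cluster']} "
                       f"({top['count']} tickets, {top['delta']:+d} vs last week).")
    if dq["top_rules"]:
        rule, count = dq["top_rules"][0]
        bullets.append(f"Best next fix: kill DQ rule {rule} at the source ({count} failures).")
    return bullets
```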

4) Make it periodic

Weekly report is enough.
If you want a monthly report too, make it a rollup of the weeklies.
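
If you go that route, one simple option is to have each weekly run also dump its raw counts as JSON next to the markdown file and sum those snapshots; the routing_counts key below is an assumption, not something the earlier steps produce by themselves:

```python
import json
from pathlib import Path

def monthly_rollup(snapshot_paths):
    """Sum routing counts across weekly JSON snapshots into one monthly view."""
    totals = {}
    for path in snapshot_paths:
        weekly = json.loads(Path(path).read_text(encoding="utf-8"))
        for route, count in weekly.get("routing_counts", {}).items():
            totals[route] = totals.get(route, 0) + count
    return totals
```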

5) Make it shareable

If someone opens the report, they should understand it without you explaining for 30 minutes.


Deliverables (you must ship these)

Deliverable A — Automated weekly report

  • report file generated (markdown/html)
  • stored under /reports/ with date

Deliverable B — Metrics tables

  • routing breakdown
  • top clusters
  • top DQ failures

Deliverable C — AI health section

  • accuracy + needs_human + drift flags
  • regression summary (pass/fail)

Deliverable D — “Next actions” list

  • top 5 actions to reduce ticket load
  • each action linked to cluster/rule_id evidence
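
A sketch of generating that list from the cluster and DQ tables, so every action ships with its evidence (the wording is just a template; the input shapes match the earlier sketches):

```python
def next_actions(clusters, dq, n=5):
    """Top actions to reduce ticket load, each tied to cluster or rule_id evidence."""
    candidates = []
    for row in clusters:
        candidates.append((row["count"],
                           f"Write/refresh runbook for cluster {row['cluster']} "
                           f"(evidence: {row['count']} tickets this week)"))
    for rule, count in dq["top_rules"]:
        candidates.append((count,
                           f"Fix data quality rule {rule} at the source "
                           f"(evidence: {count} failures)"))
    candidates.sort(reverse=True)          # biggest pain first
    return [text for _, text in candidates[:n]]
```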

Common traps (don’t do this)

  • Trap 1: “Dashboard before report.”

Report first. Dashboards are optional.

  • Trap 2: “Too many charts.”

Tables + a short summary beats a rainbow dashboard.

  • Trap 3: “No next actions.”

Without actions, the report is useless.

Quick self-check (2 minutes)

Answer yes/no:

  • Is the report generated automatically (no manual work)?
  • Does it show routing breakdown and recurring issues?
  • Does it include DQ pain with rule_id evidence?
  • Does it include AI health (accuracy/needs_human/drift)?
  • Does it end with a clear next-action list?

If any “no” — fix it before moving on.


Next module: W29–W30: Knowledge Base Design (sources, chunking, metadata)