Phase 3 · W21–W22
W21–W22: Classification & Routing
Turn prompt outputs into a real, validated classification and routing feature with deterministic behavior.
Suggested time: 4–6 hours/week
Outcomes
- A runnable service endpoint or CLI that classifies tickets.
- Strict JSON output validation that blocks broken responses.
- Deterministic routing rules mapped to team/owner and next action.
- A clear human fallback path for low-confidence or invalid outputs.
- Prediction logging and run summaries for audit and improvement.
- A small evaluation run on the golden set with real metrics.
Deliverables
- Working classifier API/CLI with strict JSON I/O contract.
- Routing map file (label → team/action) used by the system.
- Fallback behavior for low confidence, invalid output, and missing info.
- Prediction log and evaluation report with real numbers.
Prerequisites
- W19–W20: Prompting Patterns for Ops (safe prompts, constraints)
What you’re doing
You stop having “a prompt”.
You build an actual feature.
This is where the AI Ticket Analyzer becomes a real system:
- input comes in
- output is structured
- routing is deterministic
- failures are handled
- everything is logged and measurable
Time: 4–6 hours/week
Output: an end-to-end classification + routing pipeline with JSON validation, fallback rules, and a small evaluation report
The promise (what you’ll have by the end)
By the end of W22 you will have:
- A runnable service endpoint (or CLI) that classifies tickets
- Strict JSON output validation (no broken responses)
- Routing rules (team/action mapping)
- A human-fallback path that is not embarrassing
- Logging + run summaries for predictions
- A small eval run on your golden set
The rule: AI is one component, not the whole system
The model can be wrong.
So your system must:
- validate
- constrain
- fall back
- and keep moving
Step-by-step checklist
1) Build the classification API contract
Define one endpoint (or CLI command) like:
- POST /classify
Input: your ticket schema
Output: your strict JSON classification schema
No free-form text.
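As a rough sketch, assuming Python with pydantic and FastAPI (a CLI wrapper works the same way; all names like TicketIn and ClassificationOut are illustrative, not prescribed):

```python
# Contract sketch -- assumes Python with pydantic + FastAPI (illustrative choices).
# Model and field names (TicketIn, ClassificationOut, ...) are examples.
from typing import Optional

from fastapi import FastAPI
from pydantic import BaseModel, Field

class TicketIn(BaseModel):
    ticket_id: str
    title: str
    description: str
    component: Optional[str] = None   # e.g. module/area, if known

class ClassificationOut(BaseModel):
    ticket_id: str
    primary_label: str                # one of your fixed label set
    confidence: float = Field(ge=0.0, le=1.0)
    needs_human: bool = False
    missing_fields: list[str] = []    # filled by the fallback path (step 4)

app = FastAPI()

@app.post("/classify", response_model=ClassificationOut)
def classify(ticket: TicketIn) -> ClassificationOut:
    # Placeholder: wire in the model call, validation, and routing (steps 2-4).
    return ClassificationOut(
        ticket_id=ticket.ticket_id,
        primary_label="UNKNOWN",
        confidence=0.0,
        needs_human=True,
    )
```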
2) Implement output validation
After the model responds:
- validate against schema
- if invalid → mark needs_human=true and log the error
- never pass broken output forward
This is the difference between a demo and production.
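A minimal validation gate might look like this, reusing the ClassificationOut model from the step-1 sketch (pydantic v2 assumed; there, malformed JSON also surfaces as a ValidationError):

```python
# Validation gate -- broken output never moves forward.
import logging

from pydantic import ValidationError

logger = logging.getLogger("classifier")

def parse_model_output(raw: str, ticket_id: str) -> ClassificationOut:
    try:
        return ClassificationOut.model_validate_json(raw)
    except ValidationError as exc:
        # Invalid response: log the error and hand off to a human.
        logger.error("invalid model output for %s: %s", ticket_id, exc)
        return ClassificationOut(
            ticket_id=ticket_id,
            primary_label="UNKNOWN",
            confidence=0.0,
            needs_human=True,
        )
```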
3) Add routing rules (deterministic mapping)
Create a routing map:
- label → owner/team → next action
Example:
- DEV_REQUIRED → “Dev team” → “create user story”
- MDG_MASTERDATA_CHANGE → “MDG ops” → “update in MDG + retrigger”
- UI_CLIENT_ACTION → “Support” → “provide GUI steps”
- CONFIG_CUSTOMIZING → “Config owner” → “check customizing + transport”
Keep routing rules outside code if possible (JSON/YAML).
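For example, the map can live in a small YAML file with a thin deterministic lookup around it (labels from above; the team/action strings and file name are examples, and PyYAML is an assumed dependency):

```yaml
# routing_map.yaml -- label -> team -> next action
DEV_REQUIRED:
  team: "Dev team"
  action: "create user story"
MDG_MASTERDATA_CHANGE:
  team: "MDG ops"
  action: "update in MDG + retrigger"
UI_CLIENT_ACTION:
  team: "Support"
  action: "provide GUI steps"
CONFIG_CUSTOMIZING:
  team: "Config owner"
  action: "check customizing + transport"
```

```python
import yaml  # PyYAML, assumed dependency

with open("routing_map.yaml") as fh:
    ROUTING = yaml.safe_load(fh)

def route(label: str) -> dict:
    # Unknown labels fall through to human triage (see step 4).
    return ROUTING.get(label, {"team": "human triage", "action": "manual review"})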
4) Add the fallback strategy
If confidence is low OR output invalid OR missing info:
- needs_human=true
- add “missing_fields” list or “questions_to_ask”
- route to “human triage”
This prevents hallucination from becoming action.
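A sketch of that gate, assuming the models from step 1 (the 0.7 threshold and REQUIRED_FIELDS are placeholder values to tune against your golden set):

```python
# Fallback gate -- runs after validation (step 2) and before routing (step 3).
# CONFIDENCE_THRESHOLD and REQUIRED_FIELDS are assumptions; tune on your data.
CONFIDENCE_THRESHOLD = 0.7
REQUIRED_FIELDS = ("title", "description")

def apply_fallback(result: ClassificationOut, ticket: TicketIn) -> ClassificationOut:
    missing = [f for f in REQUIRED_FIELDS if not getattr(ticket, f, None)]
    if result.needs_human or result.confidence < CONFIDENCE_THRESHOLD or missing:
        # Route to human triage instead of the mapped team.
        result.needs_human = True
        result.missing_fields = missing
    return result
```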
5) Log every prediction
Store:
- ticket_id
- model/prompt version
- predicted labels
- confidence
- needs_human flag
- routing output
- timestamp
This is your audit trail.
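One low-friction option is an append-only JSONL file, one record per prediction (PROMPT_VERSION and the file path below are illustrative):

```python
# Append-only JSONL prediction log; one record per classify call.
import json
from datetime import datetime, timezone

PROMPT_VERSION = "w21-v1"  # illustrative version tag

def log_prediction(result: ClassificationOut, routing: dict,
                   path: str = "predictions.jsonl") -> None:
    record = {
        "ticket_id": result.ticket_id,
        "model_prompt_version": PROMPT_VERSION,
        "predicted_label": result.primary_label,
        "confidence": result.confidence,
        "needs_human": result.needs_human,
        "routing": routing,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    with open(path, "a") as fh:
        fh.write(json.dumps(record) + "\n")
```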
6) Evaluate on your golden set
Run:
- primary label accuracy
- % needs_human
- top confusion pairs (A mistaken for B)
Write a short report. No vibes.
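A sketch of such an eval run, assuming golden labels stored as JSONL records with ticket_id and label fields, and the prediction log from step 5 (file names are illustrative):

```python
# Eval sketch: primary label accuracy, % needs_human, top confusion pairs.
import json
from collections import Counter

def evaluate(pred_path: str = "predictions.jsonl",
             gold_path: str = "golden.jsonl") -> None:
    with open(gold_path) as fh:
        gold = {r["ticket_id"]: r["label"] for r in map(json.loads, fh)}
    with open(pred_path) as fh:
        preds = [json.loads(line) for line in fh]

    scored = [p for p in preds if p["ticket_id"] in gold]
    if not scored:
        print("no overlap between predictions and golden set")
        return

    correct = sum(p["predicted_label"] == gold[p["ticket_id"]] for p in scored)
    flagged = sum(bool(p["needs_human"]) for p in scored)
    confusions = Counter(
        (gold[p["ticket_id"]], p["predicted_label"])
        for p in scored
        if p["predicted_label"] != gold[p["ticket_id"]]
    )

    print(f"primary label accuracy: {correct / len(scored):.2%} ({len(scored)} tickets)")
    print(f"% needs_human: {flagged / len(scored):.2%}")
    for (true_label, pred_label), n in confusions.most_common(3):
        print(f"confused {true_label} -> {pred_label}: {n}x")
```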
Deliverables (you must ship these)
Deliverable A — Working classifier
- API endpoint or CLI exists
- It accepts tickets and returns strict JSON
Deliverable B — Routing map
- A mapping file exists (label → team/action)
- It’s used by the system (not just documented)
Deliverable C — Fallback behavior
- Low confidence / invalid output triggers needs_human=true
- Missing info path is implemented
Deliverable D — Prediction log + eval report
- Predictions stored (file or DB)
- Eval report exists with real numbers
Common traps (don’t do this)
- Trap 1: “We’ll just trust the model.”
  No. Validate and constrain.
- Trap 2: “Routing inside code.”
  Put routing rules in data files so you can change them without redeploying.
- Trap 3: “No audit trail.”
  Without logs you can’t improve or explain failures.
Quick self-check (2 minutes)
Answer yes/no:
- Do I have a strict input/output contract?
- Is model output validated every time?
- Are routing rules deterministic and externalized?
- Is there a clear needs_human fallback path?
- Did I run evaluation with real numbers?
If any “no” — fix it before moving on.