Phase 3 · W21–W22
W21–W22: Classification & Routing
Turn prompt outputs into a real, validated classification and routing feature with deterministic behavior.
Suggested time: 4–6 hours/week
Outcomes
- A runnable service endpoint or CLI that classifies tickets.
- Strict JSON output validation that blocks broken responses.
- Deterministic routing rules mapped to team/owner and next action.
- A clear human fallback path for low-confidence or invalid outputs.
- Prediction logging and run summaries for audit and improvement.
- A small evaluation run on the golden set with real metrics.
Deliverables
- Working classifier API/CLI with strict JSON I/O contract.
- Routing map file (label → team/action) used by the system.
- Fallback behavior for low confidence, invalid output, and missing info.
- Prediction log and evaluation report with real numbers.
Prerequisites
- W19–W20: Prompting Patterns for Ops (safe prompts, constraints)
What you’re doing
You stop having “a prompt”.
You build an actual feature.
This is where the AI Ticket Analyzer becomes a real system:
- input comes in
- output is structured
- routing is deterministic
- failures are handled
- everything is logged and measurable
Time: 4–6 hours/week
Output: an end-to-end classification + routing pipeline with JSON validation, fallback rules, and a small evaluation report
The promise (what you’ll have by the end)
By the end of W22 you will have:
- A runnable service endpoint (or CLI) that classifies tickets
- Strict JSON output validation (no broken responses)
- Routing rules (team/action mapping)
- A human-fallback path that is not embarrassing
- Logging + run summaries for predictions
- A small eval run on your golden set
The rule: AI is one component, not the whole system
The model can be wrong.
So your system must:
- validate
- constrain
- fall back
- and keep moving
Step-by-step checklist
1) Build the classification API contract
Define one endpoint (or CLI command) like:
- POST /classify
Input: your ticket schema
Output: your strict JSON classification schema
No free-form text.
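As a rough sketch, assuming Python with pydantic and FastAPI (a CLI wrapper works the same way; all names like TicketIn and ClassificationOut are illustrative, not prescribed):

```python
# Contract sketch -- assumes Python with pydantic + FastAPI (illustrative choices).
# Model and field names (TicketIn, ClassificationOut, ...) are examples.
from typing import Optional

from fastapi import FastAPI
from pydantic import BaseModel, Field

class TicketIn(BaseModel):
    ticket_id: str
    title: str
    description: str
    component: Optional[str] = None   # e.g. module/area, if known

class ClassificationOut(BaseModel):
    ticket_id: str
    primary_label: str                # one of your fixed label set
    confidence: float = Field(ge=0.0, le=1.0)
    needs_human: bool = False
    missing_fields: list[str] = []    # filled by the fallback path (step 4)

app = FastAPI()

@app.post("/classify", response_model=ClassificationOut)
def classify(ticket: TicketIn) -> ClassificationOut:
    # Placeholder: wire in the model call, validation, and routing (steps 2-4).
    return ClassificationOut(
        ticket_id=ticket.ticket_id,
        primary_label="UNKNOWN",
        confidence=0.0,
        needs_human=True,
    )
```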
2) Implement output validation
After the model responds:
- validate against schema
- if invalid → mark needs_human=true and log the error
- never pass broken output forward
This is the difference between a demo and production.
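A minimal validation gate might look like this, reusing the ClassificationOut model from the step-1 sketch (pydantic v2 assumed; there, malformed JSON also surfaces as a ValidationError):

```python
# Validation gate -- broken output never moves forward.
import logging

from pydantic import ValidationError

logger = logging.getLogger("classifier")

def parse_model_output(raw: str, ticket_id: str) -> ClassificationOut:
    try:
        return ClassificationOut.model_validate_json(raw)
    except ValidationError as exc:
        # Invalid response: log the error and hand off to a human.
        logger.error("invalid model output for %s: %s", ticket_id, exc)
        return ClassificationOut(
            ticket_id=ticket_id,
            primary_label="UNKNOWN",
            confidence=0.0,
            needs_human=True,
        )
```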
3) Add routing rules (deterministic mapping)
Create a routing map:
- label → owner/team → next action
Example:
- DEV_REQUIRED → “Dev team” → “create user story”
- MDG_MASTERDATA_CHANGE → “MDG ops” → “update in MDG + retrigger”
- UI_CLIENT_ACTION → “Support” → “provide GUI steps”
- CONFIG_CUSTOMIZING → “Config owner” → “check customizing + transport”
Keep routing rules outside code if possible (JSON/YAML).
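For example, the map can live in a small YAML file with a thin deterministic lookup around it (labels from above; the team/action strings and file name are examples, and PyYAML is an assumed dependency):

```yaml
# routing_map.yaml -- label -> team -> next action
DEV_REQUIRED:
  team: "Dev team"
  action: "create user story"
MDG_MASTERDATA_CHANGE:
  team: "MDG ops"
  action: "update in MDG + retrigger"
UI_CLIENT_ACTION:
  team: "Support"
  action: "provide GUI steps"
CONFIG_CUSTOMIZING:
  team: "Config owner"
  action: "check customizing + transport"
```

```python
import yaml  # PyYAML, assumed dependency

with open("routing_map.yaml") as fh:
    ROUTING = yaml.safe_load(fh)

def route(label: str) -> dict:
    # Unknown labels fall through to human triage (see step 4).
    return ROUTING.get(label, {"team": "human triage", "action": "manual review"})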
4) Add the fallback strategy
If confidence is low OR output invalid OR missing info:
- needs_human=true
- add “missing_fields” list or “questions_to_ask”
- route to “human triage”
This prevents hallucination from becoming action.
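A sketch of that gate, assuming the models from step 1 (the 0.7 threshold and REQUIRED_FIELDS are placeholder values to tune against your golden set):

```python
# Fallback gate -- runs after validation (step 2) and before routing (step 3).
# CONFIDENCE_THRESHOLD and REQUIRED_FIELDS are assumptions; tune on your data.
CONFIDENCE_THRESHOLD = 0.7
REQUIRED_FIELDS = ("title", "description")

def apply_fallback(result: ClassificationOut, ticket: TicketIn) -> ClassificationOut:
    missing = [f for f in REQUIRED_FIELDS if not getattr(ticket, f, None)]
    if result.needs_human or result.confidence < CONFIDENCE_THRESHOLD or missing:
        # Route to human triage instead of the mapped team.
        result.needs_human = True
        result.missing_fields = missing
    return result
```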
5) Log every prediction
Store:
- ticket_id
- model/prompt version
- predicted labels
- confidence
- needs_human flag
- routing output
- timestamp
This is your audit trail.
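One low-friction option is an append-only JSONL file, one record per prediction (PROMPT_VERSION and the file path below are illustrative):

```python
# Append-only JSONL prediction log; one record per classify call.
import json
from datetime import datetime, timezone

PROMPT_VERSION = "w21-v1"  # illustrative version tag

def log_prediction(result: ClassificationOut, routing: dict,
                   path: str = "predictions.jsonl") -> None:
    record = {
        "ticket_id": result.ticket_id,
        "model_prompt_version": PROMPT_VERSION,
        "predicted_label": result.primary_label,
        "confidence": result.confidence,
        "needs_human": result.needs_human,
        "routing": routing,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    with open(path, "a") as fh:
        fh.write(json.dumps(record) + "\n")
```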
6) Evaluate on your golden set
Run:
- primary label accuracy
- % needs_human
- top confusion pairs (A mistaken for B)
Write a short report. No vibes.
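A sketch of such an eval run, assuming golden labels stored as JSONL records with ticket_id and label fields, and the prediction log from step 5 (file names are illustrative):

```python
# Eval sketch: primary label accuracy, % needs_human, top confusion pairs.
import json
from collections import Counter

def evaluate(pred_path: str = "predictions.jsonl",
             gold_path: str = "golden.jsonl") -> None:
    with open(gold_path) as fh:
        gold = {r["ticket_id"]: r["label"] for r in map(json.loads, fh)}
    with open(pred_path) as fh:
        preds = [json.loads(line) for line in fh]

    scored = [p for p in preds if p["ticket_id"] in gold]
    if not scored:
        print("no overlap between predictions and golden set")
        return

    correct = sum(p["predicted_label"] == gold[p["ticket_id"]] for p in scored)
    flagged = sum(bool(p["needs_human"]) for p in scored)
    confusions = Counter(
        (gold[p["ticket_id"]], p["predicted_label"])
        for p in scored
        if p["predicted_label"] != gold[p["ticket_id"]]
    )

    print(f"primary label accuracy: {correct / len(scored):.2%} ({len(scored)} tickets)")
    print(f"% needs_human: {flagged / len(scored):.2%}")
    for (true_label, pred_label), n in confusions.most_common(3):
        print(f"confused {true_label} -> {pred_label}: {n}x")
```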
Deliverables (you must ship these)
Deliverable A — Working classifier
- API endpoint or CLI exists
- It accepts tickets and returns strict JSON
Deliverable B — Routing map
- A mapping file exists (label → team/action)
- It’s used by the system (not just documented)
Deliverable C — Fallback behavior
- Low confidence / invalid output triggers needs_human=true
- Missing info path is implemented
Deliverable D — Prediction log + eval report
- Predictions stored (file or DB)
- Eval report exists with real numbers
Common traps (don’t do this)
- Trap 1: “We’ll just trust the model.”
  No. Validate and constrain.
- Trap 2: “Routing inside code.”
  Put routing rules in data files so you can change them without redeploying.
- Trap 3: “No audit trail.”
  Without logs you can’t improve or explain failures.
Quick self-check (2 minutes)
Answer yes/no:
- Do I have a strict input/output contract?
- Is model output validated every time?
- Are routing rules deterministic and externalized?
- Is there a clear needs_human fallback path?
- Did I run evaluation with real numbers?
If any “no” — fix it before moving on.