Phase 3 · W21–W22

W21–W22: Classification & Routing

Turn prompt outputs into a real, validated classification and routing feature with deterministic behavior.

Suggested time: 4–6 hours/week

Outcomes

  • A runnable service endpoint or CLI that classifies tickets.
  • Strict JSON output validation that blocks broken responses.
  • Deterministic routing rules mapped to team/owner and next action.
  • A clear human fallback path for low-confidence or invalid outputs.
  • Prediction logging and run summaries for audit and improvement.
  • A small evaluation run on the golden set with real metrics.

Deliverables

  • Working classifier API/CLI with strict JSON I/O contract.
  • Routing map file (label → team/action) used by the system.
  • Fallback behavior for low confidence, invalid output, and missing info.
  • Prediction log and evaluation report with real numbers.

Prerequisites

  • W19–W20: Prompting Patterns for Ops (safe prompts, constraints)

W21–W22: Classification & Routing

What you’re doing

You stop having “a prompt”.
You build an actual feature.

This is where the AI Ticket Analyzer becomes a real system:

  • input comes in
  • output is structured
  • routing is deterministic
  • failures are handled
  • everything is logged and measurable

Time: 4–6 hours/week
Output: an end-to-end classification + routing pipeline with JSON validation, fallback rules, and a small evaluation report


The promise (what you’ll have by the end)

By the end of W22 you will have:

  • A runnable service endpoint (or CLI) that classifies tickets
  • Strict JSON output validation (no broken responses)
  • Routing rules (team/action mapping)
  • A human-fallback path that is not embarrassing
  • Logging + run summaries for predictions
  • A small eval run on your golden set

The rule: AI is one component, not the whole system

The model can be wrong.
So your system must:

  • validate
  • constrain
  • fall back
  • and keep moving

Step-by-step checklist

1) Build the classification API contract

Define one endpoint (or CLI command):

  • POST /classify

Input: your ticket schema
Output: your strict JSON classification schema

No free-form text.
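
A minimal sketch of that contract as a service, assuming Flask; every field name below is illustrative, so swap in the ticket and classification schemas you already have:

```python
# Minimal sketch of the /classify contract, assuming Flask.
# Field names (ticket_id, primary_label, confidence, needs_human, routing)
# are illustrative; use your own ticket and classification schemas.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.post("/classify")
def classify():
    ticket = request.get_json(force=True)   # input: your ticket schema
    # ... model call + validation + routing happen here (steps 2-4) ...
    result = {
        "ticket_id": ticket.get("ticket_id"),
        "primary_label": "MDG_MASTERDATA_CHANGE",   # placeholder value
        "confidence": 0.82,                          # placeholder value
        "needs_human": False,
        "routing": {"team": "MDG ops", "action": "update in MDG + retrigger"},
    }
    return jsonify(result)                   # output: strict JSON only
```

The CLI variant is the same contract: read one ticket as JSON on stdin, print one validated JSON object on stdout.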

2) Implement output validation

After the model responds:

  • validate against schema
  • if invalid → mark needs_human=true and log the error
  • never pass broken output forward

This is the difference between demo and production.
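
A sketch of that validation step, assuming the jsonschema package and an illustrative output schema; adapt the required fields to your own contract:

```python
# Validate the raw model response before anything downstream sees it.
# Schema fields are illustrative; mirror your own classification contract.
import json
from jsonschema import validate, ValidationError

OUTPUT_SCHEMA = {
    "type": "object",
    "required": ["primary_label", "confidence", "needs_human"],
    "properties": {
        "primary_label": {"type": "string"},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
        "needs_human": {"type": "boolean"},
    },
}

def check_output(raw_text: str) -> dict:
    try:
        data = json.loads(raw_text)
        validate(instance=data, schema=OUTPUT_SCHEMA)
        return data
    except (json.JSONDecodeError, ValidationError) as err:
        # Broken output never moves forward; it goes to human triage.
        return {"primary_label": None, "confidence": 0.0,
                "needs_human": True, "validation_error": str(err)}
```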

3) Add routing rules (deterministic mapping)

Create a routing map:

  • label → owner/team → next action

Example:

  • DEV_REQUIRED → “Dev team” → “create user story”
  • MDG_MASTERDATA_CHANGE → “MDG ops” → “update in MDG + retrigger”
  • UI_CLIENT_ACTION → “Support” → “provide GUI steps”
  • CONFIG_CUSTOMIZING → “Config owner” → “check customizing + transport”

Keep routing rules outside code if possible (JSON/YAML).
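
The example above, written out as a routing file the service loads at startup (the file name routing.json is illustrative; a YAML equivalent works the same way):

```json
{
  "DEV_REQUIRED":          { "team": "Dev team",     "action": "create user story" },
  "MDG_MASTERDATA_CHANGE": { "team": "MDG ops",      "action": "update in MDG + retrigger" },
  "UI_CLIENT_ACTION":      { "team": "Support",      "action": "provide GUI steps" },
  "CONFIG_CUSTOMIZING":    { "team": "Config owner", "action": "check customizing + transport" }
}
```

A label that is not in the map should not route anywhere automatically; treat it like the fallback cases in step 4.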

4) Add the fallback strategy

If confidence is low, the output is invalid, or information is missing:

  • needs_human=true
  • add “missing_fields” list or “questions_to_ask”
  • route to “human triage”

This prevents hallucination from becoming action.
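
A sketch of that decision, assuming a confidence threshold of 0.6 (pick your own) and the illustrative field names from the earlier steps:

```python
# Apply the fallback rules after validation and before routing.
CONFIDENCE_THRESHOLD = 0.6   # illustrative; tune against your golden set

def apply_fallback(result: dict) -> dict:
    low_confidence = result.get("confidence", 0.0) < CONFIDENCE_THRESHOLD
    missing_info = bool(result.get("missing_fields") or result.get("questions_to_ask"))
    if result.get("needs_human") or low_confidence or missing_info:
        result["needs_human"] = True
        result["routing"] = {"team": "human triage", "action": "review ticket"}
    return result
```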

5) Log every prediction

Store:

  • ticket_id
  • model/prompt version
  • predicted labels
  • confidence
  • needs_human flag
  • routing output
  • timestamp

This is your audit trail.
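
One simple way to do this is an append-only JSONL file (file and field names below are illustrative); a database table with the same columns works just as well:

```python
# Append one JSON line per prediction; this file is the audit trail.
import json
from datetime import datetime, timezone

def log_prediction(record: dict, path: str = "predictions.jsonl") -> None:
    record["timestamp"] = datetime.now(timezone.utc).isoformat()
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

log_prediction({
    "ticket_id": "T-1042",            # illustrative values
    "prompt_version": "v3",
    "predicted_label": "DEV_REQUIRED",
    "confidence": 0.91,
    "needs_human": False,
    "routing": {"team": "Dev team", "action": "create user story"},
})
```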

6) Evaluate on your golden set

Compute:

  • primary label accuracy
  • % needs_human
  • top confusion pairs (A mistaken as B)

Write a short report. No vibes.
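
A sketch of that run, assuming each record pairs a gold label from the golden set with the logged prediction (field names illustrative):

```python
# Compute the three report numbers from (gold, predicted) pairs.
from collections import Counter

def evaluate(records: list[dict]) -> dict:
    total = len(records)
    correct = sum(r["predicted_label"] == r["gold_label"] for r in records)
    needs_human = sum(bool(r.get("needs_human")) for r in records)
    confusion = Counter(
        (r["gold_label"], r["predicted_label"])
        for r in records if r["predicted_label"] != r["gold_label"]
    )
    return {
        "primary_label_accuracy": correct / total if total else 0.0,
        "pct_needs_human": needs_human / total if total else 0.0,
        "top_confusion_pairs": confusion.most_common(3),  # gold mistaken as predicted
    }
```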


Deliverables (you must ship these)

Deliverable A — Working classifier

  • API endpoint or CLI exists
  • It accepts tickets and returns strict JSON

Deliverable B — Routing map

  • A mapping file exists (label → team/action)
  • It’s used by the system (not just documented)

Deliverable C — Fallback behavior

  • Low confidence / invalid output triggers needs_human=true
  • Missing info path is implemented

Deliverable D — Prediction log + eval report

  • Predictions stored (file or DB)
  • Eval report exists with real numbers

Common traps (don’t do this)

  • Trap 1: “We’ll just trust the model.”

No. Validate and constrain.

  • Trap 2: “Routing inside code.”

Put routing rules in data files so you can change them without redeploying.

  • Trap 3: “No audit trail.”

Without logs you can’t improve or explain failures.

Quick self-check (2 minutes)

Answer yes/no:

  • Do I have a strict input/output contract?
  • Is model output validated every time?
  • Are routing rules deterministic and externalized?
  • Is there a clear needs_human fallback path?
  • Did I run evaluation with real numbers?

If any “no” — fix it before moving on.