Phase 3 · W19–W20

W19–W20: Prompting Patterns for Ops (safe prompts, constraints)

Make LLM outputs predictable, constrained, and parseable so they are actually usable in operations.

Suggested time: 4–6 hours/week

Outcomes

  • 3–5 prompt templates that solve real ops tasks.
  • A strict output schema (JSON) that your app can parse.
  • Guardrails: what the model is allowed to say and not say.
  • A fallback strategy (unknown / needs-human).
  • A mini evaluation run against your golden set.

Deliverables

  • 3 prompt templates in the repo with clear input/output format.
  • JSON schema or TS type for output with validation in code.
  • Guardrails doc with allowed labels and unknown behavior rules.
  • Mini evaluation report with accuracy, failures, and next improvements.

Prerequisites

  • W17–W18: Ticket Data Modeling & Labeling

W19–W20: Prompting Patterns for Ops (safe prompts, constraints)

What you’re doing

You’re making LLM output predictable enough to be used in operations.

The biggest mistake people make:

  • they ask the model “what do you think?”
  • get a nice paragraph back
  • and call it automation

Ops needs:

  • structure
  • constraints
  • reproducibility
  • a clear “I don’t know” path

Time: 4–6 hours/week
Output: a set of prompt templates + output schemas + safety rules + a small eval routine using your golden set


The promise (what you’ll have by the end)

By the end of W20 you will have:

  • 3–5 prompt templates that solve real ops tasks
  • A strict output schema (JSON) that your app can parse
  • Guardrails: what the model is allowed to say and not say
  • A fallback strategy (unknown / needs-human)
  • A mini evaluation run against your golden set

The rule: no free-form output in production

If the output is not parseable, it’s not usable.

Your model output must be (example below):

  • JSON
  • with fixed keys
  • with allowed values
  • with confidence and reasons
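For example, a conforming output for a routing task could look like this (keys follow the schema you will pin down in step 2 below; the values are purely illustrative):

```json
{
  "primary_label": "config",
  "secondary_label": null,
  "confidence": 0.82,
  "extracted_fields": { "system": "PRD", "country": "DE" },
  "suggested_next_steps": ["Verify the relevant configuration entry", "Confirm with the requester"],
  "needs_human": false,
  "reasons": ["Ticket describes a settings change, not a code defect"]
}
```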

Build your prompt kit (simple but real)

1) Define tasks (pick 3)

Pick tasks that are actually useful:

  • classify ticket category
  • route to team (dev/data/config/manual)
  • suggest next steps checklist
  • detect duplicates / similar tickets
  • extract key fields (BP number, system, country, etc.)

Pick 3. Don’t do 10.

2) Define output schema (strict JSON)

Example keys:

  • primary_label
  • secondary_label (optional)
  • confidence (0–1)
  • extracted_fields (object)
  • suggested_next_steps (array)
  • needs_human (boolean)
  • reasons (array of short bullet strings)

Keep it stable.
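A minimal TypeScript sketch of that schema, assuming the keys above; the label union is a placeholder, substitute the labels from your own labeling work (W17–W18):

```typescript
// Sketch of the output contract. Label values are assumptions;
// replace them with your own allowed label set.
type Label = "dev" | "data" | "config" | "manual" | "unknown";

interface TriageOutput {
  primary_label: Label;
  secondary_label?: Label;            // optional
  confidence: number;                 // 0–1
  extracted_fields: Record<string, string>;
  suggested_next_steps: string[];
  needs_human: boolean;
  reasons: string[];                  // short bullet strings
}
```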

3) Write the prompt template (with constraints)

Your prompt must include (a sketch follows the list):

  • role (“You are an AMS triage assistant…”)
  • input format
  • allowed labels list
  • output JSON schema
  • instruction to say needs_human=true if unsure
  • “do not invent facts” rule
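A minimal sketch of such a template, written here as a TypeScript constant; the label list and wording are assumptions, and the final version belongs in the repo as .md or .txt (see Deliverable A):

```typescript
// Illustrative template only. ALLOWED_LABELS and the phrasing are assumptions;
// adapt both to your own label set and ticket fields.
const ALLOWED_LABELS = ["dev", "data", "config", "manual"] as const;

const TRIAGE_PROMPT = (ticketText: string) => `
You are an AMS triage assistant.

Classify the ticket below. Use ONLY these labels: ${ALLOWED_LABELS.join(", ")}.
Do not invent facts. If information is missing or you are unsure,
set "needs_human" to true and list what is missing in "reasons".

Respond with JSON only, using exactly these keys:
primary_label, secondary_label, confidence, extracted_fields,
suggested_next_steps, needs_human, reasons.

Ticket:
"""
${ticketText}
"""
`;
```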

4) Add refusal / unknown behavior

If the ticket is missing info:

  • model must request missing fields
  • or set needs_human=true

No guessing.
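The same rule applies on the application side: if the reply does not parse or validate, treat it as needs_human instead of guessing. A minimal sketch, assuming the TriageOutput type from step 2; the 0.6 confidence threshold and the helper names are assumptions:

```typescript
// Sketch: enforce the unknown path in code, not just in the prompt.
function handleModelReply(raw: string): TriageOutput {
  const fallback: TriageOutput = {
    primary_label: "unknown",
    confidence: 0,
    extracted_fields: {},
    suggested_next_steps: [],
    needs_human: true,
    reasons: ["output not parseable or failed validation"],
  };

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return fallback;                                   // not JSON -> human
  }

  if (!isTriageOutput(parsed)) return fallback;        // wrong shape -> human
  if (parsed.confidence < 0.6) parsed.needs_human = true; // low confidence -> human
  return parsed;
}

// Minimal structural check; swap in a real schema validator if you prefer.
function isTriageOutput(x: unknown): x is TriageOutput {
  const o = x as Record<string, unknown>;
  return (
    typeof o === "object" && o !== null &&
    typeof o["primary_label"] === "string" &&
    typeof o["confidence"] === "number" &&
    typeof o["needs_human"] === "boolean" &&
    Array.isArray(o["reasons"])
  );
}
```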

5) Add a mini-eval using your golden set

Run your prompt on the golden set and compute:

  • accuracy of primary label
  • % needs_human
  • top failure patterns

This is how you stop lying to yourself.
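A minimal sketch of that loop, assuming the golden set is an array of { text, expected_label } records and that classify() wraps your model call plus the parsing above (both names are assumptions):

```typescript
// Sketch of a mini-eval over the golden set.
interface GoldenItem { text: string; expected_label: Label; }

async function miniEval(
  goldenSet: GoldenItem[],
  classify: (text: string) => Promise<TriageOutput>
) {
  let correct = 0;
  let needsHuman = 0;
  const failures = new Map<string, number>();          // "expected -> predicted" counts

  for (const item of goldenSet) {
    const out = await classify(item.text);
    if (out.needs_human) needsHuman++;
    if (out.primary_label === item.expected_label) {
      correct++;
    } else {
      const key = `${item.expected_label} -> ${out.primary_label}`;
      failures.set(key, (failures.get(key) ?? 0) + 1);
    }
  }

  const topFailures = [...failures.entries()].sort((a, b) => b[1] - a[1]).slice(0, 3);
  console.log(`accuracy: ${(correct / goldenSet.length).toFixed(2)}`);
  console.log(`% needs_human: ${((needsHuman / goldenSet.length) * 100).toFixed(0)}%`);
  console.log("top failure patterns:", topFailures);
}
```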


Deliverables (you must ship these)

Deliverable A — Prompt templates

  • 3 prompt templates stored in the repo (as .md or .txt)
  • Each has clear input/output format

Deliverable B — Output schema

  • JSON schema / TS type exists
  • Your code validates model output

Deliverable C — Guardrails doc

  • A short doc covering:
      • allowed labels
      • “do not invent facts”
      • when to set needs_human=true

Deliverable D — Mini evaluation report

  • Run results on the golden set:
      • accuracy
      • failures
      • what to improve next

Common traps (don’t do this)

  • Trap 1: “Let it answer freely.” Free-form answers are useless in ops.
  • Trap 2: “No evaluation.” Without an eval you are doing vibes, not engineering.
  • Trap 3: “No unknown path.” If the model can’t say “I don’t know”, it will hallucinate.

Quick self-check (2 minutes)

Answer yes/no:

  • Are outputs strict JSON with fixed keys?
  • Do I have allowed labels and constraints in the prompt?
  • Do I have a needs_human fallback path?
  • Did I run a mini-eval on my golden set?
  • Do I know my top 3 failure patterns?

If any “no” — fix it before moving on.


Next module: W21–W22: Classification & Routing