W23–W24: Clustering & Recurring Issue Detection
Detect recurring ticket patterns and turn them into actionable fixes, runbooks, and automation opportunities.
Suggested time: 4–6 hours/week
Outcomes
- A similarity pipeline (even if simple).
- Clusters of tickets with readable titles/labels.
- A “top recurring issues” list with counts and examples.
- A way to link clusters to actions (runbook, fix, mapping update, etc.).
- A baseline you can improve later (better embeddings, better thresholds).
Deliverables
- A clustering script/service that outputs cluster_id and ticket lists.
- Readable cluster summaries with titles and keywords.
- A recurring issues report with top clusters, counts, and examples.
- A cluster-to-action mapping included in the report.
Prerequisites
- W21–W22: Classification & Routing
What you’re doing
You stop treating tickets as separate snowflakes.
In AMS reality, 60–80% of tickets are repeats:
- same root cause
- same missing mapping
- same “interface overwrote value”
- same “user did wrong step”
Clustering turns chaos into patterns.
Patterns turn into:
- fixes
- automation
- runbooks
- proactive monitoring
Time: 4–6 hours/week
Output: a repeat-issue detector that groups similar tickets and produces a “Top recurring problems” report
The promise (what you’ll have by the end)
By the end of W24 you will have:
- A similarity pipeline (even if simple)
- Clusters of tickets with readable titles/labels
- A “top recurring issues” list with counts and examples
- A way to link clusters to actions (runbook, fix, mapping update, etc.)
- A baseline you can improve later (better embeddings, better thresholds)
The rule: clusters must be actionable
A cluster is useless if it doesn’t lead to an action.
Your cluster output must include:
- what it is (human-readable summary)
- how many times it happened
- what to do next (suggested action)
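For example, one cluster record could look like the sketch below. Every field name and ticket number here is hypothetical — use whatever schema fits your tooling, as long as all three pieces are present.

```python
# Hypothetical cluster record — field names and ticket numbers are illustrative.
cluster = {
    "cluster_id": 7,
    "title": "MDG overwrite: dunning profile reset",        # what it is
    "count": 14,                                            # how many times it happened
    "example_tickets": ["INC0041233", "INC0041871"],        # made-up IDs
    "suggested_action": "Update mapping table; see runbook",  # what to do next
}
```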
Step-by-step checklist
1) Prepare the text for similarity
Use a consistent text field per ticket:
- short_description + key paragraphs
- remove noise (email signatures, greetings, ticket templates)
- keep important context (system name, object type, error codes)
You don’t need perfection. You need consistency.
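A minimal prep sketch, assuming tickets arrive as dicts with short_description and description keys (adapt to your real schema). The signature markers are placeholders — extend them with your own email and template noise.

```python
import re

# Placeholder signature markers — replace with the noise in your own tickets.
SIGNATURE_MARKERS = ["best regards", "kind regards", "thanks and regards"]

def ticket_text(ticket: dict) -> str:
    """One consistent text field per ticket for similarity."""
    body = ticket.get("description", "")
    # Cut everything after the first signature marker (cheap noise removal).
    for marker in SIGNATURE_MARKERS:
        idx = body.lower().find(marker)
        if idx != -1:
            body = body[:idx]
    # Collapse whitespace; deliberately keep system names and error codes.
    text = f"{ticket.get('short_description', '')} {body}"
    return re.sub(r"\s+", " ", text).strip()
```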
2) Choose a similarity method (start simple)
Start with one approach:
- embeddings + cosine similarity, or
- TF-IDF + cosine similarity (cheap and fine for a first version)
Don’t block on “the best model”.
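For the TF-IDF route, scikit-learn gets you a similarity matrix in a few lines. This sketch assumes a tickets list and the ticket_text() helper from step 1:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# One vector per ticket, built from the consistent text field (step 1).
texts = [ticket_text(t) for t in tickets]
vectorizer = TfidfVectorizer(max_features=5000, ngram_range=(1, 2))
X = vectorizer.fit_transform(texts)

# sim[i, j] in [0, 1]: how similar tickets i and j are.
sim = cosine_similarity(X)
```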
3) Define clustering logic
Simple approach:
- compute similarity matrix
- group items above threshold
- keep clusters of size >= 2 (or >= 3 if your data is noisy)
You can improve later, but get something working now.
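One way to implement "group items above threshold" is connected components over the similarity graph. A union-find sketch, using the sim matrix from step 2 (the 0.5 threshold is a starting guess — tune it on your data):

```python
import numpy as np

def threshold_clusters(sim: np.ndarray, threshold: float = 0.5, min_size: int = 2):
    """Connected components over the 'similar enough' graph (union-find)."""
    n = sim.shape[0]
    parent = list(range(n))

    def find(i: int) -> int:
        # Find with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # O(n^2) pass — fine for a few thousand tickets, revisit beyond that.
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] >= threshold:
                parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # Drop singletons — a one-off ticket isn't a "recurring" issue yet.
    return [members for members in groups.values() if len(members) >= min_size]
```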
4) Name clusters (make them readable)
For each cluster, generate:
- a short title (e.g., “MDG overwrite: dunning profile reset”)
- top keywords / error codes
- 2–3 example tickets
If cluster names are garbage, nobody will use the report.
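Top TF-IDF terms per cluster are a cheap source of keywords. A sketch reusing X and vectorizer from step 2 — the short title itself is best written by a human (or an LLM) looking at the keywords and example tickets:

```python
import numpy as np

def cluster_keywords(X, vectorizer, members: list[int], top_k: int = 5) -> list[str]:
    """Top-k mean TF-IDF terms across a cluster's tickets — crude but readable."""
    terms = vectorizer.get_feature_names_out()
    # Mean TF-IDF weight of each term over the cluster's rows.
    weights = np.asarray(X[members].mean(axis=0)).ravel()
    top = weights.argsort()[::-1][:top_k]
    return [terms[i] for i in top if weights[i] > 0]
```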
5) Produce the recurring issues report
Your report should show:
- top clusters by count
- cluster title
- label (from earlier classifier)
- example ticket ids
- suggested next action
This report is your “where to invest first” shortlist.
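A sketch of the report writer, wiring the earlier steps together. All parameter names are illustrative; tickets are assumed to be dicts with a 'number' field:

```python
def write_report(clusters, tickets, keywords_fn, actions, path="recurring_issues.md"):
    """Render the recurring-issues report as markdown.

    clusters:    list of ticket-index lists (step 3)
    keywords_fn: callable taking a member list, returning keywords (step 4)
    actions:     dict mapping cluster id -> suggested action (step 6)
    """
    ranked = sorted(enumerate(clusters), key=lambda c: len(c[1]), reverse=True)
    lines = ["# Top recurring issues", ""]
    for cid, members in ranked:
        lines += [
            f"## Cluster {cid} ({len(members)} tickets)",
            f"- keywords: {', '.join(keywords_fn(members))}",
            # If you have labels from the W21–W22 classifier, add them here too.
            f"- examples: {', '.join(str(tickets[i].get('number', i)) for i in members[:3])}",
            f"- suggested action: {actions.get(cid, 'TODO: assign an action')}",
            "",
        ]
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines))
```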
6) Link cluster → action
Create a simple mapping:
- cluster_id → runbook link OR “create fix ticket” OR “update mapping table”
Even if the action is manual for now, make it explicit.
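The mapping can be as dumb as a JSON file kept in version control. The runbook URL and action texts below are placeholders:

```python
import json

# Placeholder actions — the runbook URL and fix descriptions are made up.
CLUSTER_ACTIONS = {
    0: "runbook: https://wiki.example.com/runbooks/mdg-overwrite",
    1: "create fix ticket: add the missing mapping for the new company code",
    2: "update mapping table (manual for now)",
}

# json stringifies the integer keys — cast back with int() when loading.
with open("cluster_actions.json", "w", encoding="utf-8") as f:
    json.dump(CLUSTER_ACTIONS, f, indent=2)
```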
Deliverables (you must ship these)
Deliverable A — Clustering pipeline
- A script/service that clusters tickets
- Output includes cluster_id and ticket list
Deliverable B — Cluster summaries
- Each cluster has a readable title + keywords
- At least 5 clusters (if the dataset allows)
Deliverable C — Recurring issues report
- A markdown/HTML report exists
- Top clusters with counts and examples are visible
Deliverable D — Action mapping
- A mapping file exists (cluster_id → action)
- It’s included in the report
Common traps (don’t do this)
- Trap 1: “Clustering is only for data science.”
  No. This is ops intelligence.
- Trap 2: “Perfect clusters or nothing.”
  No. A messy but useful report beats a perfect idea that never ships.
- Trap 3: “No action mapping.”
  Without actions, clusters become trivia.
Quick self-check (2 minutes)
Answer yes/no:
- Do I have a consistent text field per ticket for similarity?
- Do clusters have readable titles?
- Does the report show top recurring issues with counts?
- Is each cluster linked to a next action?
- Can this report help me choose what to fix first?
If any “no” — fix it before moving on.