W23–W24: Clustering & Recurring Issue Detection
Detect recurring ticket patterns and turn them into actionable fixes, runbooks, and automation opportunities.
Suggested time: 4–6 hours/week
Outcomes
- A similarity pipeline (even if simple).
- Clusters of tickets with readable titles/labels.
- A “top recurring issues” list with counts and examples.
- A way to link clusters to actions (runbook, fix, mapping update, etc.).
- A baseline you can improve later (better embeddings, better thresholds).
Deliverables
- A clustering script/service that outputs cluster_id and ticket lists.
- Readable cluster summaries with titles and keywords.
- A recurring issues report with top clusters, counts, and examples.
- A cluster-to-action mapping included in the report.
Prerequisites
- W21–W22: Classification & Routing
What you’re doing
You stop treating tickets as separate snowflakes.
In AMS reality, 60–80% of tickets are repeats:
- same root cause
- same missing mapping
- same “interface overwrote value”
- same “user did wrong step”
Clustering turns chaos into patterns.
Patterns turn into:
- fixes
- automation
- runbooks
- proactive monitoring
Time: 4–6 hours/week
Output: a repeat-issue detector that groups similar tickets and produces a “Top recurring problems” report
The promise (what you’ll have by the end)
By the end of W24 you will have:
- A similarity pipeline (even if simple)
- Clusters of tickets with readable titles/labels
- A “top recurring issues” list with counts and examples
- A way to link clusters to actions (runbook, fix, mapping update, etc.)
- A baseline you can improve later (better embeddings, better thresholds)
The rule: clusters must be actionable
A cluster is useless if it doesn’t lead to an action.
Your cluster output must include:
- what it is (human-readable summary)
- how many times it happened
- what to do next (suggested action)
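For example, one cluster record could look like the sketch below. Every field name and ticket number here is hypothetical — use whatever schema fits your tooling, as long as all three pieces are present.

```python
# Hypothetical cluster record — field names and ticket numbers are illustrative.
cluster = {
    "cluster_id": 7,
    "title": "MDG overwrite: dunning profile reset",        # what it is
    "count": 14,                                            # how many times it happened
    "example_tickets": ["INC0041233", "INC0041871"],        # made-up IDs
    "suggested_action": "Update mapping table; see runbook",  # what to do next
}
```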
Step-by-step checklist
1) Prepare the text for similarity
Use a consistent text field per ticket:
- short_description + key paragraphs
- remove noise (email signatures, greetings, ticket templates)
- keep important context (system name, object type, error codes)
You don’t need perfection. You need consistency.
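A minimal prep sketch, assuming tickets arrive as dicts with short_description and description keys (adapt to your real schema). The signature markers are placeholders — extend them with your own email and template noise.

```python
import re

# Placeholder signature markers — replace with the noise in your own tickets.
SIGNATURE_MARKERS = ["best regards", "kind regards", "thanks and regards"]

def ticket_text(ticket: dict) -> str:
    """One consistent text field per ticket for similarity."""
    body = ticket.get("description", "")
    # Cut everything after the first signature marker (cheap noise removal).
    for marker in SIGNATURE_MARKERS:
        idx = body.lower().find(marker)
        if idx != -1:
            body = body[:idx]
    # Collapse whitespace; deliberately keep system names and error codes.
    text = f"{ticket.get('short_description', '')} {body}"
    return re.sub(r"\s+", " ", text).strip()
```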
2) Choose a similarity method (start simple)
Start with one approach:
- embeddings + cosine similarity, or
- TF-IDF + cosine similarity (cheap and fine for a first version)
Don’t block on “the best model”.
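For the TF-IDF route, scikit-learn gets you a similarity matrix in a few lines. This sketch assumes a tickets list and the ticket_text() helper from step 1:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# One vector per ticket, built from the consistent text field (step 1).
texts = [ticket_text(t) for t in tickets]
vectorizer = TfidfVectorizer(max_features=5000, ngram_range=(1, 2))
X = vectorizer.fit_transform(texts)

# sim[i, j] in [0, 1]: how similar tickets i and j are.
sim = cosine_similarity(X)
```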
3) Define clustering logic
Simple approach:
- compute similarity matrix
- group items above threshold
- keep clusters of size >= 2 (or >= 3 if your data is noisy)
You can improve later, but get something working now.
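One way to implement "group items above threshold" is connected components over the similarity graph. A union-find sketch, using the sim matrix from step 2 (the 0.5 threshold is a starting guess — tune it on your data):

```python
import numpy as np

def threshold_clusters(sim: np.ndarray, threshold: float = 0.5, min_size: int = 2):
    """Connected components over the 'similar enough' graph (union-find)."""
    n = sim.shape[0]
    parent = list(range(n))

    def find(i: int) -> int:
        # Find with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # O(n^2) pass — fine for a few thousand tickets, revisit beyond that.
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] >= threshold:
                parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # Drop singletons — a one-off ticket isn't a "recurring" issue yet.
    return [members for members in groups.values() if len(members) >= min_size]
```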
4) Name clusters (make them readable)
For each cluster, generate:
- a short title (e.g., “MDG overwrite: dunning profile reset”)
- top keywords / error codes
- 2–3 example tickets
If cluster names are garbage, nobody will use the report.
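Top TF-IDF terms per cluster are a cheap source of keywords. A sketch reusing X and vectorizer from step 2 — the short title itself is best written by a human (or an LLM) looking at the keywords and example tickets:

```python
import numpy as np

def cluster_keywords(X, vectorizer, members: list[int], top_k: int = 5) -> list[str]:
    """Top-k mean TF-IDF terms across a cluster's tickets — crude but readable."""
    terms = vectorizer.get_feature_names_out()
    # Mean TF-IDF weight of each term over the cluster's rows.
    weights = np.asarray(X[members].mean(axis=0)).ravel()
    top = weights.argsort()[::-1][:top_k]
    return [terms[i] for i in top if weights[i] > 0]
```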
5) Produce the recurring issues report
Your report should show:
- top clusters by count
- cluster title
- label (from earlier classifier)
- example ticket ids
- suggested next action
This report is your “where to invest first” shortlist.
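A sketch of the report writer, wiring the earlier steps together. All parameter names are illustrative; tickets are assumed to be dicts with a 'number' field:

```python
def write_report(clusters, tickets, keywords_fn, actions, path="recurring_issues.md"):
    """Render the recurring-issues report as markdown.

    clusters:    list of ticket-index lists (step 3)
    keywords_fn: callable taking a member list, returning keywords (step 4)
    actions:     dict mapping cluster id -> suggested action (step 6)
    """
    ranked = sorted(enumerate(clusters), key=lambda c: len(c[1]), reverse=True)
    lines = ["# Top recurring issues", ""]
    for cid, members in ranked:
        lines += [
            f"## Cluster {cid} ({len(members)} tickets)",
            f"- keywords: {', '.join(keywords_fn(members))}",
            # If you have labels from the W21–W22 classifier, add them here too.
            f"- examples: {', '.join(str(tickets[i].get('number', i)) for i in members[:3])}",
            f"- suggested action: {actions.get(cid, 'TODO: assign an action')}",
            "",
        ]
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines))
```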
6) Link cluster → action
Create a simple mapping:
- cluster_id → runbook link OR “create fix ticket” OR “update mapping table”
Even if the action is manual for now, make it explicit.
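The mapping can be as dumb as a JSON file kept in version control. The runbook URL and action texts below are placeholders:

```python
import json

# Placeholder actions — the runbook URL and fix descriptions are made up.
CLUSTER_ACTIONS = {
    0: "runbook: https://wiki.example.com/runbooks/mdg-overwrite",
    1: "create fix ticket: add the missing mapping for the new company code",
    2: "update mapping table (manual for now)",
}

# json stringifies the integer keys — cast back with int() when loading.
with open("cluster_actions.json", "w", encoding="utf-8") as f:
    json.dump(CLUSTER_ACTIONS, f, indent=2)
```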
Deliverables (you must ship these)
Deliverable A — Clustering pipeline
- A script/service that clusters tickets
- Output includes cluster_id and ticket list
Deliverable B — Cluster summaries
- Each cluster has a readable title + keywords
- At least 5 clusters (if the dataset allows)
Deliverable C — Recurring issues report
- A markdown/HTML report exists
- Top clusters with counts and examples are visible
Deliverable D — Action mapping
- A mapping file exists (cluster_id → action)
- It’s included in the report
Common traps (don’t do this)
- Trap 1: “Clustering is only for data science.”
  No. This is ops intelligence.
- Trap 2: “Perfect clusters or nothing.”
  No. A messy but useful report beats a perfect idea that never ships.
- Trap 3: “No action mapping.”
  Without actions, clusters become trivia.
Quick self-check (2 minutes)
Answer yes/no:
- Do I have a consistent text field per ticket for similarity?
- Do clusters have readable titles?
- Does the report show top recurring issues with counts?
- Is each cluster linked to a next action?
- Can this report help me choose what to fix first?
If any “no” — fix it before moving on.