Phase 3 · W23–W24

W23–W24: Clustering & Recurring Issue Detection

Detect recurring ticket patterns and turn them into actionable fixes, runbooks, and automation opportunities.

Suggested time: 4–6 hours/week

Outcomes

  • A similarity pipeline (even if simple).
  • Clusters of tickets with readable titles/labels.
  • A “top recurring issues” list with counts and examples.
  • A way to link clusters to actions (runbook, fix, mapping update, etc.).
  • A baseline you can improve later (better embeddings, better thresholds).

Deliverables

  • A clustering script/service that outputs cluster_id and ticket lists.
  • Readable cluster summaries with titles and keywords.
  • A recurring issues report with top clusters, counts, and examples.
  • A cluster-to-action mapping included in the report.

Prerequisites

  • W21–W22: Classification & Routing

W23–W24: Clustering & Recurring Issue Detection

What you’re doing

You stop treating tickets as separate snowflakes.

In AMS reality, 60–80% of tickets are repeats:

  • same root cause
  • same missing mapping
  • same “interface overwrote value”
  • same “user did wrong step”

Clustering turns chaos into patterns.
Patterns turn into:

  • fixes
  • automation
  • runbooks
  • proactive monitoring

Time: 4–6 hours/week
Output: a repeat-issue detector that groups similar tickets and produces a “Top recurring problems” report


The promise (what you’ll have by the end)

By the end of W24 you will have:

  • A similarity pipeline (even if simple)
  • Clusters of tickets with readable titles/labels
  • A “top recurring issues” list with counts and examples
  • A way to link clusters to actions (runbook, fix, mapping update, etc.)
  • A baseline you can improve later (better embeddings, better thresholds)

The rule: clusters must be actionable

A cluster is useless if it doesn’t lead to an action.

Your cluster output must include (see the example record after this list):

  • what it is (human-readable summary)
  • how many times it happened
  • what to do next (suggested action)
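
For concreteness, here is what one cluster record could look like. This is a minimal Python sketch; the field names and values are placeholders, not a required schema, but every record should answer the three questions above.

```python
# One cluster record, as a minimal sketch. Field names and values are placeholders;
# the title reuses the example from step 4 of this guide.
example_cluster = {
    "cluster_id": "C-012",
    "title": "MDG overwrite: dunning profile reset",   # what it is
    "count": 17,                                       # how many times it happened
    "example_tickets": ["TCK-10412", "TCK-10387"],     # 2-3 example ticket ids
    "suggested_action": "update mapping table + link runbook",  # what to do next
}
```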

Step-by-step checklist

1) Prepare the text for similarity

Use a consistent text field per ticket:

  • short_description + key paragraphs
  • remove noise (email signatures, greetings, ticket templates)
  • keep important context (system name, object type, error codes)

You don’t need perfection. You need consistency.
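
A minimal sketch of that preparation step, assuming each ticket is a dict with short_description and description fields; the field names and noise patterns are assumptions, so adapt them to whatever your ticket export actually contains.

```python
import re

# Noise patterns are assumptions - extend them with your own templates and signatures.
NOISE_PATTERNS = [
    re.compile(r"(?im)^(hi|hello|dear)\b.*$"),              # greeting lines
    re.compile(r"(?is)\b(best regards|kind regards)\b.*$"), # signature and everything after it
]

def ticket_text(ticket: dict) -> str:
    """Build one consistent text field per ticket for similarity."""
    parts = [ticket.get("short_description", ""), ticket.get("description", "")]
    text = "\n".join(p for p in parts if p)
    for pattern in NOISE_PATTERNS:
        text = pattern.sub(" ", text)
    # Collapse whitespace; system names, object types and error codes stay untouched.
    return " ".join(text.split())
```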

2) Choose a similarity method (start simple)

Start with one approach:

  • embeddings + cosine similarity, or
  • TF-IDF + cosine similarity (cheap and fine for a first version)

Don’t block on “the best model”.
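
A first-version sketch of the TF-IDF route, assuming scikit-learn is available. Swapping in embeddings later only changes how the matrix is built; everything downstream stays the same.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def build_similarity(texts: list[str]):
    """TF-IDF + cosine similarity, good enough for a first version.

    `texts` is the one-string-per-ticket list from step 1.
    Returns the ticket-to-ticket similarity matrix plus the TF-IDF
    matrix and vectorizer (reused for keywords in step 4).
    """
    vectorizer = TfidfVectorizer(max_features=5000, ngram_range=(1, 2))
    tfidf = vectorizer.fit_transform(texts)
    return cosine_similarity(tfidf), tfidf, vectorizer
```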

3) Define clustering logic

Simple approach:

  • compute similarity matrix
  • group items above threshold
  • keep clusters with size >= 2 or 3

You can improve later, but get something working now.
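
A sketch of that simple approach: treat "similarity >= threshold" as an edge and take connected components, then drop the singletons. The 0.5 default is only a starting guess to tune against your data.

```python
import numpy as np

def cluster_tickets(sim: np.ndarray, threshold: float = 0.5, min_size: int = 2) -> list[list[int]]:
    """Naive threshold clustering over a dense similarity matrix.

    Groups tickets whose similarity is above the threshold (connected
    components, found with a simple BFS) and keeps clusters of size >= min_size.
    """
    unassigned = set(range(sim.shape[0]))
    clusters = []
    while unassigned:
        seed = unassigned.pop()
        group, frontier = {seed}, [seed]
        while frontier:
            current = frontier.pop()
            neighbours = {j for j in unassigned if sim[current, j] >= threshold}
            unassigned -= neighbours
            group |= neighbours
            frontier.extend(neighbours)
        if len(group) >= min_size:
            clusters.append(sorted(group))
    return clusters  # list of lists of ticket indices
```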

4) Name clusters (make them readable)

For each cluster, generate:

  • a short title (e.g., “MDG overwrite: dunning profile reset”)
  • top keywords / error codes
  • 2–3 example tickets

If the cluster names are garbage, nobody will use them.
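
One way to get candidate keywords per cluster, reusing the TF-IDF matrix and vectorizer built in step 2. Treat the output as a draft title and plan on a quick manual pass to make the names genuinely readable.

```python
import numpy as np

def cluster_keywords(tfidf, vectorizer, ticket_indices: list[int], top_k: int = 5) -> list[str]:
    """Return the highest-weight TF-IDF terms for one cluster of tickets."""
    terms = vectorizer.get_feature_names_out()
    weights = np.asarray(tfidf[ticket_indices].mean(axis=0)).ravel()
    return [terms[i] for i in weights.argsort()[::-1][:top_k]]
```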

5) Produce the recurring issues report

Your report should show:

  • top clusters by count
  • cluster title
  • label (from earlier classifier)
  • example ticket ids
  • suggested next action

This report is your “where to invest”.
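
A sketch of the report generator, assuming cluster records shaped like the example record shown earlier (cluster_id, title, count, label, example_tickets, suggested_action); the field names and output path are placeholders.

```python
def write_report(clusters: list[dict], path: str = "recurring_issues.md") -> None:
    """Write the recurring issues report as markdown, top clusters first."""
    lines = ["# Top recurring issues", ""]
    for c in sorted(clusters, key=lambda item: item["count"], reverse=True):
        lines.append(f"## {c['title']} ({c['count']} tickets)")
        lines.append(f"- label: {c.get('label', 'unlabelled')}")
        lines.append(f"- examples: {', '.join(c['example_tickets'])}")
        lines.append(f"- suggested action: {c.get('suggested_action', 'TODO')}")
        lines.append("")
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines))
```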

6) Link cluster → action

Create a simple mapping:

  • cluster_id → runbook link OR “create fix ticket” OR “update mapping table”

Even if the action is manual for now, make it explicit.
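
A minimal sketch of that mapping as a hand-maintained JSON file; the cluster ids and action strings below are placeholders.

```python
import json

# cluster_id -> explicit next step. Even a hand-edited file like this is enough
# for the first version, as long as every top cluster has an entry.
ACTIONS = {
    "C-012": "update mapping table + link runbook",
    "C-007": "create fix ticket",
    "C-003": "runbook: <link>",
}

with open("cluster_actions.json", "w", encoding="utf-8") as f:
    json.dump(ACTIONS, f, indent=2)
```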


Deliverables (you must ship these)

Deliverable A — Clustering pipeline

  • A script/service that clusters tickets
  • Output includes cluster_id and ticket list

Deliverable B — Cluster summaries

  • Each cluster has a readable title + keywords
  • At least 5 clusters (if dataset allows)

Deliverable C — Recurring issues report

  • A markdown/HTML report exists
  • Top clusters with counts and examples are visible

Deliverable D — Action mapping

  • A mapping file exists (cluster_id → action)
  • It’s included in the report

Common traps (don’t do this)

  • Trap 1: “Clustering is only for data science.”
    No. This is ops intelligence.

  • Trap 2: “Perfect clusters or nothing.”
    No. A messy but useful report beats a perfect idea that never ships.

  • Trap 3: “No action mapping.”
    Without actions, clusters become trivia.

Quick self-check (2 minutes)

Answer yes/no:

  • Do I have a consistent text field per ticket for similarity?
  • Do clusters have readable titles?
  • Does the report show top recurring issues with counts?
  • Is each cluster linked to a next action?
  • Can this report help me choose what to fix first?

If any “no” — fix it before moving on.