Demand Forecasting in SAP AMS: From Ticket Closure to Controlled Operations (with Responsible Agentic Support)
The week before month-end close, a “small” change request lands: adjust pricing logic for a campaign, plus a master data correction because a load went wrong. At the same time, an interface backlog grows and blocks billing. The incident queue is “green” on SLA, but the same defect family reappears after every release, and the team is exhausted. This is L2–L4 AMS reality: complex incidents, change requests, problem management, process improvements, and small-to-medium developments happening in the same constrained capacity.
Why this matters now
Many AMS setups look healthy on paper because tickets are closed on time. The hidden cost sits elsewhere:
- Repeat incidents: the same interface errors, batch chain breaks, and authorization gaps return because prevention work keeps losing priority.
- Manual work: triage, log collection, and “please provide details” loops consume senior time.
- Knowledge loss: fixes live in chat threads and people’s heads, not in versioned runbooks.
- Cost drift: overtime and emergency changes rise during predictable peaks (close, campaigns, master data events).
The source record is blunt: AMS feels overloaded when demand is unmanaged and capacity is planned as if tomorrow will be like yesterday. Modern AMS is not a new tool. It is day-to-day control: forecasting demand from real signals (calendar, releases, interface health), shaping intake, and protecting prevention capacity so repeats go down.
Agentic / AI-assisted ways of working can help here—but only when they are used as controlled workflows, not as “auto-fix”.
The mental model
Classic AMS optimizes for ticket throughput: close incidents, meet SLA, keep the queue moving.
Modern AMS optimizes for outcomes and learning loops:
- fewer repeats (problem elimination),
- safer change delivery (less regression during peaks),
- predictable run cost (capacity as an operational control).
A simple model from the source JSON is the four capacity buckets:
- Run (incidents + mandatory ops)
- Change (planned delivery)
- Improve (problem elimination + automation)
- Reserve (shock absorber for spikes)
Two rules of thumb I use:
- Reserve must be real capacity, not a line in a spreadsheet. If it is always “borrowed”, you have no shock absorber.
- If Run stays above your threshold for 2+ weeks, cut Change or add containment—otherwise Improve dies and repeats grow (this rule is explicitly in the source).
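A minimal sketch of how that rule can be tracked week to week, assuming you log actuals per bucket as shares of total capacity (the bucket names follow the source model; the threshold value and data shapes are illustrative assumptions):

```python
from dataclasses import dataclass

@dataclass
class WeekActuals:
    """Weekly actuals per capacity bucket, as shares of total capacity (0.0–1.0)."""
    week: str       # e.g. "2026-W08"
    run: float      # incidents + mandatory ops
    change: float   # planned delivery
    improve: float  # problem elimination + automation
    reserve: float  # shock absorber

RUN_THRESHOLD = 0.60  # illustrative ceiling for Run; set your own

def run_is_hot(history: list[WeekActuals], weeks: int = 2) -> bool:
    """True if Run exceeded the threshold for the last `weeks` consecutive weeks."""
    recent = history[-weeks:]
    return len(recent) == weeks and all(w.run > RUN_THRESHOLD for w in recent)

def weekly_decision(history: list[WeekActuals]) -> str:
    if run_is_hot(history):
        # Source rule: cut Change or add containment, otherwise Improve dies.
        return "Run hot for 2+ weeks: reduce Change commitments or add containment."
    if history and history[-1].reserve <= 0.0:
        return "Reserve is only a spreadsheet line: restore a real shock absorber."
    return "Keep the plan; protect Improve and Reserve."

print(weekly_decision([WeekActuals("2026-W07", 0.65, 0.20, 0.10, 0.05),
                       WeekActuals("2026-W08", 0.70, 0.20, 0.10, 0.00)]))
```

The value is that the decision is forced by logged actuals, not by how the week felt.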
What changes in practice
- From “unexpected overload” to named demand sources
  Use the source categories: predictable (close, campaigns, master data loads, planned transports, vendor windows), semi-predictable (rollouts, org changes, new interfaces), unpredictable (outages, security incidents, rare standard defects). If you can name it, you can plan it (see the forecast sketch after this list).
- From yearly capacity plans to weekly control
  Weekly planning mechanics from the source: commit to changes within WIP limits, pick 1–3 “load-killer” Problems, and adjust Reserve based on forecasted risk windows.
- From incident closure to root-cause removal ownership
  Make the repeat-incident-family trend a first-class input (source). Assign a Problem Owner who is measured on repeat reduction during peak windows, not on ticket volume.
- From tribal knowledge to searchable, versioned runbooks
  Every critical flow (interfaces/IDocs, batch chains, close activities, master data events) needs a runbook with evidence links: symptoms, checks, safe actions, rollback notes, and who can approve what.
- From manual triage to AI-assisted triage with guardrails
  Use assistance to classify tickets, request missing fields via hard intake templates (source lever), and retrieve context (similar incidents, known errors). Do not let it decide production actions.
- From reactive firefighting to risk-window playbooks
  The source playbook is practical: pre-freeze low-value changes, increase monitoring sensitivity, and prepare runbooks and standby owners for critical flows. This is how you stop “surprises” that are actually predictable.
- From “one vendor” thinking to clear decision rights
  Separate who diagnoses, who approves, and who executes. Especially for data corrections and transports/imports, decision rights must be explicit to keep the audit trail clean.
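The first two shifts above are straightforward to make concrete. Here is a minimal sketch of a weekly forecast draft that turns named demand sources plus the business and release calendars into an expected load; the event names, uplift values, and the flat buffer for the unpredictable category are illustrative assumptions, not a prescribed model:

```python
from collections import defaultdict

# Named demand sources per the source categories. Baseline uplifts (tickets/week)
# are placeholders you would replace with your own 12–24 month history.
PREDICTABLE = {"month_end_close": 25, "campaign_pricing": 10, "master_data_load": 8}
SEMI_PREDICTABLE = {"rollout_wave": 12, "new_interface": 6}
UNPREDICTABLE_BUFFER = 0.15  # reserve uplift for outages / security incidents

def forecast_week(business_calendar: set[str], release_calendar: set[str],
                  baseline_incidents: float) -> dict[str, float]:
    """Draft the 'expected incident load' for one week from named demand sources."""
    load = defaultdict(float)
    load["baseline"] = baseline_incidents
    for event, extra in PREDICTABLE.items():
        if event in business_calendar:
            load[event] = extra
    for event, extra in SEMI_PREDICTABLE.items():
        if event in release_calendar:
            load[event] = extra
    load["reserve_uplift"] = sum(load.values()) * UNPREDICTABLE_BUFFER
    return dict(load)

# Example: a close week that also carries a planned rollout wave.
print(forecast_week({"month_end_close"}, {"rollout_wave"}, baseline_incidents=40))
```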
Agentic / AI pattern (without magic)
“Agentic” here means a workflow where a system can plan steps, retrieve context, and draft actions, and can execute only pre-approved safe tasks under human control.
One realistic end-to-end workflow for L2–L4 incidents and changes:
Inputs
- Ticket text + category + history (a 12–24 month trend is a source input)
- Monitoring signals (interface error rates, backlog velocity—source input)
- Release calendar / planned transports (source input)
- Runbooks and known error patterns (assumption: you store them somewhere searchable)
Steps
- Classify & route: suggest module/flow (e.g., interface vs batch vs authorization) and propose owner.
- Retrieve context: pull similar incident families, recent changes in the release calendar, and relevant runbook sections.
- Propose an action plan: checks to run, likely causes, and a containment option (e.g., pause a non-critical job, reroute a retry) with prerequisites.
- Request approval: for anything touching production behavior (changes, data corrections, security), generate an approval packet: risk, impact, rollback idea, evidence.
- Execute safe tasks (only if pre-approved): create/update ticket fields, draft comms, open a Problem record, generate a risk-window checklist (source output), or prepare a change draft. No direct production changes by default.
- Document: write a structured resolution note, link evidence, and propose a runbook update.
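A minimal sketch of the execution gate behind these steps, assuming the classify/retrieve/propose stages already produced a list of proposed tasks (the task names and dictionary shapes are illustrative; the point is that only allowlisted, approved tasks run, and everything else stays a draft):

```python
# Illustrative allowlist of safe tasks the assistant may execute once pre-approved.
SAFE_TASKS = {"update_ticket_fields", "draft_comms", "open_problem_record",
              "generate_risk_window_checklist", "prepare_change_draft"}

def run_triage_workflow(proposed_steps: list[dict], approvals: set[str]) -> list[dict]:
    """Gate every proposed step: safe + approved -> execute; otherwise keep a draft.

    `proposed_steps` would come from the classify/retrieve/propose stages
    (e.g. an assistant drafting against runbooks and the release calendar);
    here they are plain dictionaries so the gate itself stays visible.
    """
    audit_log = []
    for step in proposed_steps:
        name = step["name"]
        if name in SAFE_TASKS and name in approvals:
            status = "executed"            # pre-approved safe task
        elif name in SAFE_TASKS:
            status = "awaiting_approval"   # safe, but no approval recorded
        else:
            status = "draft_only"          # production-impacting: humans execute
        audit_log.append({"task": name, "status": status,
                          "evidence": step.get("evidence", []),
                          "approved_by": step.get("approver")})
    return audit_log

# Example: a ticket update executes; a containment proposal stays a draft.
print(run_triage_workflow(
    [{"name": "update_ticket_fields", "approver": "lead"},
     {"name": "pause_noncritical_job", "evidence": ["job log link"]}],
    approvals={"update_ticket_fields"}))
```

The same gate applies whether the plan was drafted by a person or by an assistant; only the author of the draft changes.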
Guardrails
- Least privilege: the assistant can read what it needs; write access is limited to ticketing/knowledge drafts unless explicitly approved.
- Separation of duties: the same actor (human or system) should not both approve and execute production-impacting actions.
- Audit trail: every suggestion and executed step is logged with inputs used and who approved.
- Rollback discipline: every change proposal must include rollback steps or a containment alternative.
- Privacy: redact personal data in prompts and stored notes; keep sensitive business data out of free-text where possible.
What stays human-owned: approving production changes and transports/imports, authorizing data corrections, security decisions, and business sign-off on process changes. Honestly, this will slow you down at first because you are building the approval and evidence habits you should have had anyway.
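For the approval packet itself, here is a minimal sketch of the structure and the separation-of-duties check. The field names are assumptions; the required content (risk, impact, rollback, evidence) follows the workflow described above:

```python
from dataclasses import dataclass, field

@dataclass
class ApprovalPacket:
    """Approval packet for production-impacting work (changes, data corrections, security)."""
    summary: str
    risk: str                        # what can go wrong, and how likely
    impact: str                      # who/what is affected if it does
    rollback: str                    # rollback steps or a containment alternative
    evidence: list[str] = field(default_factory=list)  # logs, ticket links, runbook refs
    requested_by: str = ""           # drafter (can be the assistant)
    approved_by: str | None = None   # must be a human

    def ready_for_execution(self, executor: str) -> bool:
        # Separation of duties: an approval exists, the approver is not the
        # executor, and a rollback or containment alternative is documented.
        return bool(self.rollback) and self.approved_by not in (None, "", executor)
```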
Implementation steps (first 30 days)
- Baseline demand signals
  How: extract ticket volume by category + repeat families trend (source inputs).
  Success: you can point to top 5 repeat families and top 3 peak drivers.
- Build a weekly forecast draft
  How: combine historical trends with business calendar (close, promotions) and release calendar (source).
  Success: a weekly “expected incident load” and “expected change demand” view exists (source outputs).
- Define capacity buckets and protect Improve
  How: agree Run/Change/Improve/Reserve split and track actuals weekly (source metric).
  Success: Improve time is not silently consumed by Run for two consecutive weeks.
- Create a risk-window protocol
  How: define what gets pre-frozen, what monitoring sensitivity changes, who is on standby (source playbook).
  Success: spike response time improves (source metric), fewer emergency changes during close.
- Introduce hard intake templates
  How: enforce required fields for common ticket families; reject incomplete requests (source lever). A minimal validation sketch follows this list.
  Success: fewer back-and-forth loops; lower reopen rate (generalization).
- Pick 1–3 load-killer Problems
  How: select by repeat volume and business impact; assign owners and weekly check-ins (source weekly planning).
  Success: repeat incident trend starts bending during peak windows (source metric).
- Pilot AI-assisted triage on a narrow scope
  How: limit to classification, context retrieval, and drafting; require evidence links.
  Success: reduced manual touch time; fewer misrouted tickets (generalization).
- Add approval packets for risky work
  How: standardize risk/rollback/evidence for data corrections, interface mapping changes, and transports.
  Success: change failure rate trend improves; audit questions are answered faster (generalization).
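Step 5 (hard intake templates) is the quickest lever to encode. A minimal validation sketch, with hypothetical field names per ticket family; the lever itself, rejecting incomplete requests, is from the source:

```python
# Required fields per common ticket family. Field names are assumptions.
INTAKE_TEMPLATES = {
    "interface_error": ["interface_name", "message_or_idoc_number", "timestamp",
                        "business_impact", "error_text"],
    "master_data_correction": ["object_type", "object_keys", "expected_value",
                               "source_of_truth", "business_approver"],
    "pricing_change": ["condition_type", "validity_period", "test_evidence",
                       "requested_go_live"],
}

def validate_intake(family: str, ticket: dict) -> tuple[bool, list[str]]:
    """Return (accepted, missing_fields). Incomplete requests bounce back, not into the queue."""
    required = INTAKE_TEMPLATES.get(family, [])
    missing = [f for f in required if not ticket.get(f)]
    return (len(missing) == 0, missing)

# Example: an interface ticket without a message/IDoc number is rejected immediately.
ok, missing = validate_intake("interface_error",
                              {"interface_name": "ORDERS_OUT", "timestamp": "2026-02-20T07:31"})
print(ok, missing)
```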
One limitation: forecasts will be wrong when the landscape changes suddenly (new partner issues, security events). That is why Reserve exists.
Pitfalls and anti-patterns
- Planning based on optimism (source anti-pattern)
- No Reserve capacity (source anti-pattern)
- Treating predictable peaks as “unexpected” (source anti-pattern)
- Automating broken intake: faster garbage in, faster garbage out
- Trusting AI summaries without checking logs/evidence
- Over-broad access for assistants (privacy and audit risk)
- Missing ownership for Problems (“everyone” owns it, so nobody does)
- Metrics that reward closure over prevention (repeat families keep growing)
- Over-customization of workflows that nobody maintains
- Ignoring change governance during peaks (“just this once” becomes normal)
Checklist
- Weekly demand forecast uses: ticket trends, repeat families, business calendar, release calendar, interface health signals
- Capacity tracked as Run/Change/Improve/Reserve actual split
- Reserve is protected and visible
- 1–3 Problems selected weekly with owners and expected load reduction
- Risk-window playbook: freeze rules, monitoring sensitivity, standby owners
- Hard intake templates reduce incomplete tickets
- Agentic workflow limited to safe tasks; approvals required for prod-impacting actions
- Audit trail and rollback notes are mandatory for changes and data corrections
FAQ
Is this safe in regulated environments?
Yes, if you enforce least privilege, separation of duties, audit trails, and approval gates. The assistant drafts and prepares; humans approve and execute sensitive actions.
How do we measure value beyond ticket counts?
Use the source metrics: Run/Change/Improve/Reserve split, forecast error (and why), spike response time, repeat incident trend during peak windows. Add change failure rate and reopen rate as supporting signals (generalization).
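If it helps to make the weekly review mechanical, here is a minimal sketch of two of these signals, forecast error and the share of repeat-family incidents; the formulas are simple assumptions, not prescribed by the source:

```python
def forecast_error(forecast: float, actual: float) -> float:
    """Signed relative error; positive means the load was under-forecast."""
    return (actual - forecast) / forecast if forecast else 0.0

def repeat_share(tickets: list[dict]) -> float:
    """Share of incidents that belong to a known repeat family."""
    if not tickets:
        return 0.0
    return sum(1 for t in tickets if t.get("repeat_family")) / len(tickets)

# Example week: 60 incidents forecast, 78 actual, 31 of them from known repeat families.
week = [{"repeat_family": "idoc_mapping"}] * 31 + [{}] * 47
print(round(forecast_error(60, 78), 2), round(repeat_share(week), 2))  # 0.3 0.4
```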
What data do we need for RAG / knowledge retrieval?
Minimum: resolved tickets with categories, repeat family tags, runbooks, release calendar notes, and interface health summaries. If you lack structured tags, start by tagging the top repeat families manually for a month.
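A minimal sketch of a retrieval-ready record, with hypothetical field names; the minimum content list above is what actually matters:

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeRecord:
    """One retrievable unit: a resolved ticket, runbook section, release note, or health summary."""
    record_type: str          # "resolved_ticket" | "runbook" | "release_note" | "interface_health"
    title: str
    category: str             # module / flow, e.g. "interface", "batch", "authorization"
    repeat_family: str | None # start with manual tags for the top families
    body: str                 # resolution note, runbook section, or calendar note
    evidence_links: list[str] = field(default_factory=list)

def searchable_text(record: KnowledgeRecord) -> str:
    """What gets indexed/embedded: keep the tags in the text so retrieval can filter on them."""
    tags = " ".join(filter(None, [record.record_type, record.category, record.repeat_family]))
    return f"{record.title}\n{tags}\n{record.body}"
```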
How to start if the landscape is messy?
Start with one critical flow (e.g., a high-volume interface or close activity). Build a runbook, define the intake template, and forecast just that slice. Expand once the weekly loop works.
Will forecasting replace experienced leads?
No. It gives earlier signals and forces explicit decisions. Someone still needs to decide what to freeze, what to ship, and what to stop.
Where does small-to-medium development fit?
In the Change bucket, with WIP limits and explicit risk windows. If Run is hot for 2+ weeks, reduce Change or add containment (source rule).
Next action
Next week, run a 45-minute weekly planning session using the four buckets, pick 1–3 load-killer Problems, and mark the next risk window from your business and release calendar—then decide, in writing, what you will pre-freeze and what Reserve you will protect.
Source: “Demand Forecasting & Capacity Planning: Stop Scheduling Surprises” (ams-033), Dzmitryi Kharlanau (SAP Lead). https://dkharlanau.github.io