Phase 1 · W4–W5

W4–W5: Data Fundamentals for SAP Work (schemas, validation, profiling)

You’re turning “SAP gut feeling” into rules and structure.

Suggested time: 4–6 hours/week

Outcomes

  • A clear schema for your core business objects (start small).
  • Validation rules that catch common SAP/MDG issues early.
  • A profiling report that shows what’s broken (not “I think” — facts).
  • A simple error taxonomy you can reuse in tickets and dashboards.
  • A clean dataset sample you can safely share (anonymized).

Deliverables

  • A schema file exists in the repo (json/pydantic/etc.).
  • A rules file exists (could be code or YAML).
  • A generated report exists (markdown or simple table).
  • A short doc exists explaining the categories.

Prerequisites

  • W2–W3: Modern Backend Basics (Python + FastAPI)

What you’re doing

You’re turning “SAP gut feeling” into rules and structure.

In AMS, everyone says:

  • “data is messy”
  • “interfaces overwrite stuff”
  • “MDG did something weird again”

Cool story. Now you build a system that can *prove* what’s wrong and *where*.

Time: 4–6 hours/week
Output: a data model + validation rules + profiling report + a repeatable way to detect bad data before it becomes a ticket


The promise (what you’ll have by the end)

By the end of W5 you will have:

  • A clear schema for your core business objects (start small)
  • Validation rules that catch common SAP/MDG issues early
  • A profiling report that shows what’s broken (not “I think” — facts)
  • A simple error taxonomy you can reuse in tickets and dashboards
  • A clean dataset sample you can safely share (anonymized)

The mindset shift

Stop treating data problems like “random incidents”.
Treat them like:

  • patterns
  • categories
  • rules
  • measurable quality

Once you can measure it, you can automate it.


Pick one object (don’t be greedy)

Choose ONE to start. Examples:

  • Business Partner (customer/vendor core)
  • Address + postal code + country
  • Partner Functions (SP/BP/PY/SH)
  • Payment terms / Incoterms
  • Dunning / Credit profiles

Pick the one you see in tickets every week.


Step-by-step checklist

1) Define your schema (the minimum truth)

Write a schema (JSON Schema, a Pydantic model, Zod, or whatever you already use).
It must include:

  • required fields
  • types
  • allowed values (where realistic)
  • formatting rules (where important)

This is your contract.
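
A minimal sketch of such a contract for an address slice, using only the Python standard library. The field names, the allowed country set, and the postal code patterns are illustrative assumptions, not SAP field definitions. A Pydantic model or a JSON Schema file would express the same four things (required fields, types, allowed values, formats).

```python
import re
from dataclasses import dataclass

# Hypothetical allowed values and formats; replace with your own lists.
ALLOWED_COUNTRIES = {"DE", "FR", "US"}
POSTAL_CODE_PATTERNS = {
    "DE": re.compile(r"\d{5}"),
    "FR": re.compile(r"\d{5}"),
    "US": re.compile(r"\d{5}(-\d{4})?"),
}


@dataclass(frozen=True)
class Address:
    partner_id: str   # required
    country: str      # required, must be in ALLOWED_COUNTRIES
    postal_code: str  # required, format depends on country
    city: str         # required
    street: str = ""  # optional

    def __post_init__(self):
        # Required fields: must be non-empty strings.
        for name in ("partner_id", "country", "postal_code", "city"):
            value = getattr(self, name)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"{name} is required")
        # Allowed values.
        if self.country not in ALLOWED_COUNTRIES:
            raise ValueError(f"country {self.country!r} is not allowed")
        # Formatting rule: the whole postal code must match the country pattern.
        if not POSTAL_CODE_PATTERNS[self.country].fullmatch(self.postal_code):
            raise ValueError(
                f"postal_code {self.postal_code!r} is invalid for {self.country}"
            )
```

Constructing `Address("BP001", "DE", "10115", "Berlin")` succeeds, while a four-digit German postal code raises `ValueError`. The schema only answers "is this record shaped correctly?"; business rules come in the next step.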

2) Define validation rules (business reality)

Examples of rules (adjust to your object):

  • Postal code must match the format valid for its country
  • Postal code must be present for regions that require one
  • Payment terms must be in an allowed set
  • Partner functions must be complete for order flow (SP/BP/PY/SH)
  • Transportation zone cannot be empty if country = X
  • VAT ID format depends on country

Rules should produce:

  • rule_id
  • severity (error/warn)
  • message
  • field(s)
  • suggested fix (optional but powerful)
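
One way to get that output shape is to write each rule as a small function that returns a list of findings. The sketch below is an assumption about structure, not a prescribed design: the rule IDs, the allowed payment terms, and the record layout (a plain dict) are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Finding:
    rule_id: str
    severity: str                      # "error" or "warn"
    message: str
    fields: tuple                      # names of the affected field(s)
    suggested_fix: Optional[str] = None


# Hypothetical allowed set; take the real one from your system.
ALLOWED_PAYMENT_TERMS = {"0001", "NT30", "NT60"}
REQUIRED_PARTNER_FUNCTIONS = {"SP", "BP", "PY", "SH"}


def check_payment_terms(record):
    value = record.get("payment_terms")
    if value not in ALLOWED_PAYMENT_TERMS:
        return [Finding(
            "R001", "error",
            f"payment terms {value!r} not in allowed set",
            ("payment_terms",),
            "use one of: " + ", ".join(sorted(ALLOWED_PAYMENT_TERMS)),
        )]
    return []


def check_partner_functions(record):
    missing = REQUIRED_PARTNER_FUNCTIONS - set(record.get("partner_functions", []))
    if missing:
        return [Finding(
            "R002", "error",
            "missing partner functions: " + ", ".join(sorted(missing)),
            ("partner_functions",),
            "maintain the missing partner functions",
        )]
    return []


RULES = [check_payment_terms, check_partner_functions]


def validate(record):
    """Run every rule and collect all findings for one record."""
    findings = []
    for rule in RULES:
        findings.extend(rule(record))
    return findings
```

Calling `validate({"payment_terms": "ZZZZ", "partner_functions": ["SP", "PY"]})` returns two findings, one per broken rule. Keeping rules as independent functions means adding rule 11 never touches rules 1 to 10.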

3) Build a profiling report (data facts)

You need a report like:

  • % missing fields
  • top invalid values
  • duplicates (based on key)
  • distribution of key fields
  • “top 10 errors by frequency”

This turns “we have issues” into “here are the exact issues and counts”.
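
A rough sketch of how those numbers could be computed with the standard library, assuming the data is already loaded as a list of dicts. The function names, the field names, and the sample rows are hypothetical.

```python
from collections import Counter


def profile(rows, key_field, fields):
    """Return basic quality facts for a list of dict rows."""
    total = len(rows)
    missing_pct = {}
    distributions = {}
    for field in fields:
        values = [row.get(field) for row in rows]
        filled = [v for v in values if v not in (None, "")]
        missing = total - len(filled)
        # Share of rows where the field is empty or absent.
        missing_pct[field] = round(100 * missing / total, 1) if total else 0.0
        # Ten most frequent values of the field.
        distributions[field] = Counter(filled).most_common(10)
    # Keys that occur more than once are duplicates.
    key_counts = Counter(row.get(key_field) for row in rows)
    duplicates = {key: n for key, n in key_counts.items() if n > 1}
    return {
        "rows": total,
        "missing_pct": missing_pct,
        "duplicates": duplicates,
        "distributions": distributions,
    }


def top_errors(rule_ids, n=10):
    """Rank rule IDs (one entry per finding) by frequency."""
    return Counter(rule_ids).most_common(n)


# Made-up rows for illustration only.
SAMPLE_ROWS = [
    {"partner_id": "BP001", "country": "DE", "postal_code": "10115"},
    {"partner_id": "BP002", "country": "DE", "postal_code": ""},
    {"partner_id": "BP002", "country": "FR", "postal_code": "75001"},
    {"partner_id": "BP003", "country": "", "postal_code": None},
]

report = profile(SAMPLE_ROWS, "partner_id", ["country", "postal_code"])
```

For the sample rows, `report["duplicates"]` flags `BP002` and `report["missing_pct"]` shows that half of the postal codes are empty. For larger extracts pandas would do the same job with less code; the logic stays the same.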

4) Create an error taxonomy (simple but stable)

You need categories you can reuse across the program. Example:

  • DATA_MISSING (required field empty)
  • DATA_INVALID_FORMAT (regex/format mismatch)
  • DATA_INVALID_VALUE (not in allowed set)
  • DATA_INCONSISTENT (cross-field logic broken)
  • MAPPING_MISSING (mapping table missing)
  • INTERFACE_OVERWRITE (value reset/overwritten)
  • MDG_SYNC (MDG → S4 mismatch)
  • PROCESS_CONFIG (customizing/setting issue)
  • MANUAL_STEP_REQUIRED (must be done via client/GUI)

Don’t create 50 categories. Keep it usable.
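
To keep the category names stable across scripts, tickets, and dashboards, they can live in one place in code. The sketch below puts the categories listed above into an enum; the rule IDs and their mapping to categories are hypothetical.

```python
from collections import Counter
from enum import Enum


class ErrorCategory(Enum):
    DATA_MISSING = "required field empty"
    DATA_INVALID_FORMAT = "regex/format mismatch"
    DATA_INVALID_VALUE = "not in allowed set"
    DATA_INCONSISTENT = "cross-field logic broken"
    MAPPING_MISSING = "mapping table missing"
    INTERFACE_OVERWRITE = "value reset/overwritten"
    MDG_SYNC = "MDG to S4 mismatch"
    PROCESS_CONFIG = "customizing/setting issue"
    MANUAL_STEP_REQUIRED = "must be done via client/GUI"


# Hypothetical rule IDs; every rule maps to exactly one category.
RULE_CATEGORIES = {
    "R001": ErrorCategory.DATA_INVALID_VALUE,
    "R002": ErrorCategory.DATA_MISSING,
    "R003": ErrorCategory.DATA_INVALID_FORMAT,
}


def count_by_category(rule_ids):
    """Aggregate findings (given as rule IDs) per taxonomy category name."""
    # An unmapped rule ID raises KeyError, so no rule ships without a category.
    return Counter(RULE_CATEGORIES[rule_id].name for rule_id in rule_ids)
```

`count_by_category(["R001", "R002", "R001"])` counts two `DATA_INVALID_VALUE` findings and one `DATA_MISSING`. The same category names can then go into ticket labels and dashboard filters unchanged.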

5) Anonymize a sample dataset

Take 50–200 rows and anonymize:

  • names
  • addresses
  • IDs

Keep structure and errors. Remove personal data. You want something you can demo publicly without problems.
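
A minimal sketch of one way to do this: replace personal values with salted hashes so that duplicates and empty fields survive. The column names and the salt handling are assumptions. A salted hash is pseudonymization rather than full anonymization, so check it against your organization's rules before publishing anything.

```python
import hashlib

# Hypothetical salt; keep the real value out of the repo.
SALT = "change-me"

# Hypothetical column names and the prefix used for each replacement value.
PERSONAL_FIELDS = {"partner_id": "BP", "name": "NAME", "street": "STREET"}


def pseudonym(value, prefix):
    """Map a value to a stable, unreadable token."""
    digest = hashlib.sha256((SALT + value).encode("utf-8")).hexdigest()[:8]
    return f"{prefix}_{digest}"


def anonymize(row):
    """Return a copy of the row with personal fields replaced."""
    out = dict(row)
    for field, prefix in PERSONAL_FIELDS.items():
        # Empty values stay empty so "missing field" errors are preserved.
        if out.get(field):
            out[field] = pseudonym(str(out[field]), prefix)
    return out
```

Because the same input always yields the same token, two rows sharing a `partner_id` still share one after anonymization, so duplicate checks keep working on the sample.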


Deliverables (you must ship these)

Deliverable A — Schema

  • A schema file exists in the repo (json/pydantic/etc.)
  • It represents the minimum contract for your chosen object

Deliverable B — Validation rules

  • A rules file exists (could be code or YAML)
  • Running validation produces a list of errors with rule_id + severity

Deliverable C — Profiling report

  • A generated report exists (markdown or simple table)
  • It includes counts and “top errors”
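
As an illustration of the "generated report" part, the sketch below renders error counts as a small Markdown table. The function name, the report layout, and the rule IDs in the example are assumptions.

```python
def render_report(total_rows, error_counts, top_n=10):
    """Render error counts per rule_id as a Markdown table, most frequent first."""
    lines = [
        "# Profiling report",
        "",
        f"Rows checked: {total_rows}",
        "",
        "| rule_id | count | % of rows |",
        "| --- | --- | --- |",
    ]
    # Sort by count descending, then rule_id for a stable order on ties.
    ranked = sorted(error_counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    for rule_id, count in ranked:
        pct = 100 * count / total_rows if total_rows else 0.0
        lines.append(f"| {rule_id} | {count} | {pct:.1f} |")
    return "\n".join(lines) + "\n"


# Made-up counts for illustration only.
markdown = render_report(200, {"R002": 12, "R001": 40})
```

Writing `markdown` to a file in the repo on every run gives you a report that is regenerated rather than hand-edited.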

Deliverable D — Error taxonomy

  • A short doc exists explaining the categories
  • You can reuse it later in ticket analyzer and dashboards

Common traps (don’t do this)

  • Trap 1: “I need perfect SAP semantics.”

No. Start with 10–20 rules that hit 80% of pain.

  • Trap 2: “I’ll model the whole BP monster.”

No. Pick a slice: addresses or partner functions or payment terms.

  • Trap 3: “I can’t share data.”

You can share anonymized structure + errors. That’s enough.

Quick self-check (2 minutes)

Answer yes/no:

  • Did I pick ONE object and keep scope small?
  • Do I have a schema that defines the minimum truth?
  • Do I have validation rules that produce structured errors?
  • Do I have a profiling report with real counts?
  • Can I explain my error taxonomy in 60 seconds?

If any “no” — fix it before moving on.


Next module preview (W6)

Next we harden your work:
packaging, test baseline, and a real “definition of done” gate.
So you stop shipping half-baked stuff.

Next module: W6: Packaging, Testing Baseline, and “Definition of Done”