Phase 4 · W39–W40

W39–W40: Review & Hardening (quality gates)

Harden the KB system with enforceable quality gates so every change is safe, testable, and demo-ready.

Suggested time: 4–6 hours/week

Outcomes

  • A “Definition of Done” for KB changes.
  • Automated checks that run on every change.
  • A stable retrieval benchmark with pass/fail thresholds.
  • A governance enforcement check (no restricted leaks).
  • A demo script that shows the system working end-to-end.

Deliverables

  • KB Definition of Done checklist used as a gate.
  • One command that runs all KB checks with non-zero exit on failure.
  • Deterministic demo script with 3 queries and source-grounded answers.
  • Short architecture document explaining the full KB system.

Prerequisites

  • W37–W38: Continuous Updates (content pipeline)

W39–W40: Review & Hardening (quality gates)

What you’re doing

This is where you stop having “a bunch of parts” and start having a system.

By now you have:

  • ingestion
  • chunking + metadata
  • retrieval
  • governance rules
  • docs/runbooks/RCAs
  • update pipeline

Now you harden it into a portfolio-grade system:

  • consistent
  • testable
  • reproducible
  • safe

Time: 4–6 hours/week
Output: quality gates for the KB system + a hardening pass that makes it reliable and demo-ready


The promise (what you’ll have by the end)

By the end of W40 you will have:

  • A “Definition of Done” for KB changes
  • Automated checks that run on every change
  • A stable retrieval benchmark with pass/fail thresholds
  • A governance enforcement check (no restricted leaks)
  • A demo script that shows the system working end-to-end

The rule: ship only what you can trust

If you can’t trust it, don’t demo it.
If you can’t demo it, it’s not portfolio-ready.


Build quality gates (minimum set)

Gate 1 — Content validity

Fail if:

  • required metadata missing
  • doc status invalid
  • owner missing
  • last_reviewed_at missing
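
Gate 1 can be a small lint over each doc’s parsed metadata. A minimal sketch, assuming front matter is parsed into a dict; the field names mirror this checklist, but the allowed status values are illustrative assumptions:

```python
# Gate 1 sketch: validate required metadata on a parsed doc record.
# ALLOWED_STATUSES is an assumption -- use your own lifecycle values.
ALLOWED_STATUSES = {"draft", "active", "deprecated"}
REQUIRED_FIELDS = ("owner", "status", "last_reviewed_at")

def lint_doc(meta: dict) -> list[str]:
    """Return a list of failures; an empty list means the doc passes Gate 1."""
    errors = [f"missing field: {f}" for f in REQUIRED_FIELDS if not meta.get(f)]
    status = meta.get("status")
    if status and status not in ALLOWED_STATUSES:
        errors.append(f"invalid status: {status!r}")
    return errors
```

Run it over every doc and fail the gate if any list is non-empty.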

Gate 2 — Ingestion integrity

Fail if:

  • ingestion command fails
  • chunks.jsonl is empty
  • chunk records missing required fields
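
A sketch of the Gate 2 check over `chunks.jsonl`, assuming one JSON record per line; the required chunk fields here are assumptions, not a fixed schema:

```python
# Gate 2 sketch: verify chunks.jsonl is non-empty and every record
# carries the required fields. Field names are assumptions.
import json

REQUIRED_CHUNK_FIELDS = ("doc_id", "text", "sensitivity")

def check_chunks(lines: list[str]) -> list[str]:
    """Return failures for a chunks.jsonl file given as a list of JSON lines."""
    if not lines:
        return ["chunks.jsonl is empty"]
    errors = []
    for i, line in enumerate(lines):
        record = json.loads(line)
        for field in REQUIRED_CHUNK_FIELDS:
            if field not in record:
                errors.append(f"line {i}: missing {field}")
    return errors
```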

Gate 3 — Retrieval benchmark

Fail if:

  • top-5 hit rate < threshold
  • worst-case queries degrade too much
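
The top-5 hit rate can be computed from a fixed benchmark of (query, expected doc) pairs. A sketch, where `retrieve` is a stand-in for your retriever and the 0.8 default threshold is an illustrative assumption:

```python
# Gate 3 sketch: top-5 hit rate over a benchmark, with a pass/fail threshold.
def top5_hit_rate(benchmark, retrieve) -> float:
    """benchmark: list of (query, expected_doc_id); retrieve(q) -> ranked doc ids."""
    hits = sum(expected in retrieve(query)[:5] for query, expected in benchmark)
    return hits / len(benchmark)

def gate_retrieval(benchmark, retrieve, threshold=0.8) -> bool:
    """True if the benchmark passes the threshold."""
    return top5_hit_rate(benchmark, retrieve) >= threshold
```

Keep the benchmark file under version control so the gate is stable across runs.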

Gate 4 — Governance enforcement

Fail if:

  • restricted content detected in allowed corpus
  • secret patterns detected (token/password/etc.)
  • sensitivity mismatches exist
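
A minimal sketch of the Gate 4 scan, assuming chunk records with `text` and `sensitivity` fields; the secret-pattern regexes are illustrative, not an exhaustive detector:

```python
# Gate 4 sketch: flag restricted content and secret-like patterns in chunks.
import re

SECRET_PATTERNS = [
    re.compile(r"(?i)\b(password|passwd|secret)\s*[:=]\s*\S+"),
    re.compile(r"(?i)\b(api[_-]?key|token)\s*[:=]\s*\S+"),
]

def governance_violations(chunks: list[dict]) -> list[str]:
    """Return a violation message per offending chunk; empty means the gate passes."""
    violations = []
    for i, chunk in enumerate(chunks):
        if chunk.get("sensitivity") == "restricted":
            violations.append(f"chunk {i}: restricted content in allowed corpus")
        for pattern in SECRET_PATTERNS:
            if pattern.search(chunk.get("text", "")):
                violations.append(f"chunk {i}: secret pattern matched")
    return violations
```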

Gate 5 — Demo readiness

Fail if:

  • you can’t run demo in one command
  • demo output is inconsistent
  • no “answers with sources” examples

Start with these five. They cover most real failures.


Step-by-step checklist

1) Create a KB Definition of Done

A short checklist file:

  • metadata present
  • status lifecycle followed
  • ingestion passes
  • retrieval benchmark passes
  • governance checks pass

Print it. Follow it.

2) Build a “kb check” command

One command like `make kb-check`.

It should run:

  • lint metadata
  • run ingestion
  • run retrieval tests
  • run governance scan

No manual steps.
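
The driver behind that command can stay tiny. A sketch that runs each gate in order and exits non-zero on any failure; the gate names and callables are placeholders you wire to your real checks:

```python
# Sketch of a kb-check driver: run every gate, report failures,
# and return a non-zero exit code if any gate fails.
import sys

def run_checks(checks) -> int:
    """checks: list of (name, callable returning True on pass). Returns exit code."""
    failed = [name for name, check in checks if not check()]
    for name in failed:
        print(f"FAIL: {name}")
    return 1 if failed else 0

if __name__ == "__main__":
    sys.exit(run_checks([
        # ("metadata lint", run_metadata_lint),  # wire in your real gates here
    ]))
```

The non-zero exit code is what lets CI (or `make`) treat any failed gate as a hard stop.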

3) Add a demo script

Demo = predictable.
Create a script that:

  • runs ingestion
  • runs retrieval for 3 example queries
  • prints top sources + snippet
  • shows refusal when no source

This is your “portfolio proof”.
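
A sketch of the demo’s core loop, with hardcoded queries for determinism; the query strings and the `retrieve` interface are placeholders for your own:

```python
# Demo sketch: fixed queries, top sources with snippets, and an explicit
# refusal when retrieval returns nothing. Queries are illustrative.
DEMO_QUERIES = [
    "How do I roll back a bad deploy?",
    "Where are the on-call runbooks?",
    "What is our incident severity scale?",
]

def demo_answer(query: str, retrieve) -> str:
    """Return an answer grounded in sources, or an explicit refusal."""
    results = retrieve(query)[:3]  # retrieve(q) -> list of (doc_id, snippet)
    if not results:
        return f"Q: {query}\nA: No grounded source found - refusing to answer."
    sources = "\n".join(f"  - {doc_id}: {snippet[:80]}" for doc_id, snippet in results)
    return f"Q: {query}\nSources:\n{sources}"
```

Because the queries are fixed and ingestion runs first, the demo produces the same output every time.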

4) Document the system

Write one doc:

  • architecture overview
  • where docs live
  • how ingestion works
  • how retrieval works
  • how governance is enforced
  • how to run the demo

Keep it clean and short.


Deliverables (you must ship these)

Deliverable A — KB Definition of Done

  • checklist exists
  • used as gate

Deliverable B — Automated KB checks

  • one command runs all checks
  • non-zero exit on failure

Deliverable C — Demo script

  • deterministic demo with 3 queries
  • shows sources and safe behavior

Deliverable D — Architecture doc

  • short and readable
  • explains the system end-to-end

Common traps (don’t do this)

  • Trap 1: “Hardening is boring.”

Hardening is what makes it real.

  • Trap 2: “Manual checking is enough.”

Manual checks get skipped. Automation doesn’t.

  • Trap 3: “No demo script.”

Without a demo script, your portfolio story is weak.

Quick self-check (2 minutes)

Answer yes/no:

  • Do I have pass/fail gates for content, ingestion, retrieval, governance?
  • Can I run all KB checks in one command?
  • Can I demo the KB system in 2 minutes?
  • Are the docs clear enough for a stranger to understand?
  • Would I trust this system in real ops?

If any “no” — fix it before moving on.


Next module: W41–W44: Hardening & Documentation (README, diagrams, demos)