W39–W40: Review & Hardening (quality gates)

Harden the KB system with enforceable quality gates so every change is safe, testable, and demo-ready.

Suggested time: 4–6 hours/week

Outcomes

A “Definition of Done” for KB changes.
Automated checks that run on every change.
A stable retrieval benchmark with pass/fail thresholds.
A governance enforcement check (no restricted leaks).
A demo script that shows the system working end-to-end.

Deliverables

KB Definition of Done checklist used as a gate.
One command that runs all KB checks with non-zero exit on failure.
Deterministic demo script with 3 queries and source-grounded answers.
Short architecture document explaining the full KB system.

Prerequisites

W37–W38: Continuous Updates (content pipeline)

What you’re doing

You stop having “a bunch of parts”.

By now you have:

ingestion
chunking + metadata
retrieval
governance rules
docs/runbooks/RCAs
update pipeline

Now you harden it into a portfolio-grade system:

consistent
testable
reproducible
safe

Time: 4–6 hours/week
Output: quality gates for the KB system + a hardening pass that makes it reliable and demo-ready

The promise (what you’ll have by the end)

By the end of W40 you will have:

A “Definition of Done” for KB changes
Automated checks that run on every change
A stable retrieval benchmark with pass/fail thresholds
A governance enforcement check (no restricted leaks)
A demo script that shows the system working end-to-end

The rule: ship only what you can trust

If you can’t trust it, don’t demo it.
If you can’t demo it, it’s not portfolio-ready.

Build quality gates (minimum set)

Gate 1 — Content validity

Fail if:

required metadata missing
doc status invalid
owner missing
last_reviewed_at missing
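A minimal sketch of how Gate 1 could be checked in code. The field list and the allowed status values below are assumptions for illustration; swap in whatever your own metadata schema defines.

```python
from typing import Any, Dict, List

# Assumed schema: adjust both constants to match your own KB metadata.
REQUIRED_FIELDS = ("title", "status", "owner", "last_reviewed_at")
VALID_STATUSES = {"draft", "reviewed", "published", "deprecated"}


def validate_metadata(meta: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the doc passes Gate 1."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = meta.get(field)
        # Treat absent, None, and blank values the same way: missing.
        if value is None or not str(value).strip():
            problems.append(f"missing required field: {field}")
    status = meta.get("status")
    if status and status not in VALID_STATUSES:
        problems.append(f"invalid status: {status}")
    return problems
```

The function returns problems instead of raising, so a caller can collect failures across all docs and report them in one pass before exiting non-zero.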

Gate 2 — Ingestion integrity

Fail if:

ingestion command fails
chunks.jsonl is empty
chunk records missing required fields
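The last two conditions can be sketched as a small validator over chunks.jsonl. The required field names here are hypothetical, since the chunk schema depends on how your ingestion step writes records.

```python
import json
from pathlib import Path
from typing import List

# Hypothetical chunk schema: replace with the fields your ingestion emits.
REQUIRED_CHUNK_FIELDS = ("chunk_id", "doc_id", "text", "sensitivity")


def check_chunks(path: Path) -> List[str]:
    """Return a list of problems found in a JSON Lines chunk file."""
    if not path.exists():
        return [f"{path} does not exist"]
    problems = []
    record_count = 0
    with path.open(encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # ignore blank lines
            record_count += 1
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                problems.append(f"line {line_no}: not valid JSON")
                continue
            if not isinstance(record, dict):
                problems.append(f"line {line_no}: record is not a JSON object")
                continue
            missing = [f for f in REQUIRED_CHUNK_FIELDS if f not in record]
            if missing:
                problems.append(f"line {line_no}: missing {', '.join(missing)}")
    if record_count == 0:
        problems.append(f"{path} is empty")
    return problems
```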

Gate 3 — Retrieval benchmark

Fail if:

top-5 hit rate < threshold
worst-case queries degrade too much (a query you marked as must-pass stops returning its expected source)
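One way to compute the top-5 hit rate, as a sketch. It assumes a benchmark that maps each query to one expected source ID, and a `retrieve` callable that returns ranked source IDs; both shapes are assumptions, not a fixed interface. The threshold is left to the caller on purpose.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def evaluate_benchmark(
    benchmark: Dict[str, str],
    retrieve: Callable[[str], List[str]],
    k: int = 5,
) -> Tuple[float, List[str]]:
    """Return (top-k hit rate, list of queries whose expected source was missed)."""
    if not benchmark:
        raise ValueError("benchmark must contain at least one query")
    misses = [
        query
        for query, expected in benchmark.items()
        if expected not in retrieve(query)[:k]
    ]
    hit_rate = (len(benchmark) - len(misses)) / len(benchmark)
    return hit_rate, misses


def gate_passes(
    hit_rate: float,
    misses: List[str],
    threshold: float,
    must_pass: Iterable[str] = (),
) -> bool:
    """Pass only if the overall rate holds and no must-pass query was missed."""
    return hit_rate >= threshold and not set(must_pass).intersection(misses)
```

Keeping the list of missed queries next to the rate makes a failure actionable: you see which query regressed, not only that the number dropped.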

Gate 4 — Governance enforcement

Fail if:

restricted content detected in allowed corpus
secret patterns detected (token/password/etc.)
sensitivity mismatches exist
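A rough sketch of a governance scan over chunk records. The sensitivity labels and the two regular expressions are illustrative assumptions; a real scan would use your own labels and a broader pattern set.

```python
import re
from typing import Dict, List

# Assumed labels: anything outside this set must not be in the served corpus.
ALLOWED_SENSITIVITY = {"public", "internal"}

# Illustrative patterns only; they will not catch every kind of secret.
SECRET_PATTERNS = {
    "credential_assignment": re.compile(
        r"\b(password|passwd|token|api[_-]?key|secret)\b\s*[:=]\s*\S+",
        re.IGNORECASE,
    ),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan_chunk(chunk: Dict[str, str]) -> List[str]:
    """Return governance findings for one chunk; an empty list means clean."""
    findings = []
    sensitivity = chunk.get("sensitivity")
    if sensitivity not in ALLOWED_SENSITIVITY:
        findings.append(f"sensitivity {sensitivity!r} not allowed in corpus")
    text = chunk.get("text", "")
    for name, pattern in SECRET_PATTERNS.items():
        if pattern.search(text):
            findings.append(f"secret pattern: {name}")
    return findings
```

A chunk with no sensitivity label is treated as a failure here. That is a design choice: unlabeled content is assumed unsafe until someone labels it.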

Gate 5 — Demo readiness

Fail if:

you can’t run the demo in one command
demo output differs between runs
no “answers with sources” examples
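"Inconsistent output" can be checked mechanically: run the demo more than once and compare a hash of what it prints. A small sketch, assuming the demo is a script at the hypothetical path `scripts/demo.py` that writes its results to stdout.

```python
import hashlib
import subprocess
import sys
from typing import Sequence

# Hypothetical demo entry point; point this at your own demo command.
DEMO_COMMAND = [sys.executable, "scripts/demo.py"]


def output_digest(argv: Sequence[str]) -> str:
    """Run a command and return the SHA-256 hex digest of its stdout."""
    result = subprocess.run(argv, capture_output=True, check=True)
    return hashlib.sha256(result.stdout).hexdigest()


def is_deterministic(argv: Sequence[str], runs: int = 2) -> bool:
    """True if every run of the command produces byte-identical stdout."""
    digests = {output_digest(argv) for _ in range(runs)}
    return len(digests) == 1
```

If the demo prints timestamps or durations, this check will fail. Either drop them from the demo output or strip them before hashing.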

Start with these five. They cover most real failures.

Step-by-step checklist

1) Create a KB Definition of Done

A short checklist file:

metadata present
status lifecycle followed
ingestion passes
retrieval benchmark passes
governance checks pass

Print it. Follow it.

2) Build a “kb check” command

One command like:

`make kb-check`

It should run:

lint metadata
run ingestion
run retrieval tests
run governance scan

No manual steps.
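Behind that one command you need something that runs each step in order and stops at the first failure. A minimal sketch of such a runner follows; the four script paths are hypothetical placeholders for your own lint, ingestion, retrieval, and governance commands.

```python
import subprocess
import sys
from typing import List, Sequence, Tuple

Step = Tuple[str, Sequence[str]]

# Hypothetical script paths: replace with the commands your repo actually has.
STEPS: List[Step] = [
    ("lint metadata", [sys.executable, "scripts/lint_metadata.py"]),
    ("run ingestion", [sys.executable, "scripts/ingest.py"]),
    ("run retrieval tests", [sys.executable, "scripts/retrieval_benchmark.py"]),
    ("run governance scan", [sys.executable, "scripts/governance_scan.py"]),
]


def run_checks(steps: Sequence[Step]) -> int:
    """Run steps in order; return 0 if all pass, else the first failing exit code."""
    for name, argv in steps:
        print(f"[kb-check] {name}")
        result = subprocess.run(argv)
        if result.returncode != 0:
            print(f"[kb-check] FAILED: {name}", file=sys.stderr)
            return result.returncode
    print("[kb-check] all checks passed")
    return 0
```

Calling `sys.exit(run_checks(STEPS))` from the script's entry point propagates the failing exit code, so a Makefile target or CI job that invokes the script fails too.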

3) Add a demo script

Demo = predictable.
Create a script that:

runs ingestion
runs retrieval for 3 example queries
prints top sources + snippet
shows a refusal when no source is found

This is your “portfolio proof”.
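The printing and refusal parts of the script could look like the sketch below. The `Hit` record and the refusal wording are assumptions; the point is that hits are sorted with a stable tie-break, and an empty result produces a refusal instead of an unsourced answer.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One retrieved result (assumed shape for this sketch)."""
    source: str
    snippet: str
    score: float


REFUSAL = "No source found. Refusing to answer."


def render_answer(query: str, hits: List[Hit], top_n: int = 3) -> str:
    """Format the top sources for a query, or a refusal when there are none."""
    if not hits:
        return f"Q: {query}\n{REFUSAL}"
    # Sort by score (descending), then by source name, so ties never reorder.
    ranked = sorted(hits, key=lambda h: (-h.score, h.source))[:top_n]
    lines = [f"Q: {query}"]
    for hit in ranked:
        lines.append(f"  [{hit.source}] {hit.snippet}")
    return "\n".join(lines)
```

Because the tie-break is on the source name, the same hits always print in the same order, whatever order the retriever returned them in.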

4) Document the system

Write one doc:

architecture overview
where docs live
how ingestion works
how retrieval works
how governance is enforced
how to run the demo

Keep it clean and short.

Deliverables (you must ship these)

Deliverable A — KB Definition of Done

checklist exists
used as gate

Deliverable B — Automated KB checks

one command runs all checks
non-zero exit on failure

Deliverable C — Demo script

deterministic demo with 3 queries
shows sources and safe behavior

Deliverable D — Architecture doc

short and readable
explains the system end-to-end

Common traps (don’t do this)

Trap 1: “Hardening is boring.”

Hardening is what makes it real.

Trap 2: “Manual checking is enough.”

Manual checks get skipped. Automation doesn’t.

Trap 3: “No demo script.”

Without a demo script, your portfolio story is weak.

Quick self-check (2 minutes)

Answer yes/no:

Do I have pass/fail gates for content, ingestion, retrieval, and governance?
Can I run all KB checks in one command?
Can I demo the KB system in 2 minutes?
Are the docs clear enough for a stranger to understand?
Would I trust this system in real ops?

If any “no” — fix it before moving on.

Next module: W41–W44: Hardening & Documentation (README, diagrams, demos)