Phase 4 · W39–W40
W39–W40: Review & Hardening (quality gates)
Harden the KB system with enforceable quality gates so every change is safe, testable, and demo-ready.
Suggested time: 4–6 hours/week
Outcomes
- A “Definition of Done” for KB changes.
- Automated checks that run on every change.
- A stable retrieval benchmark with pass/fail thresholds.
- A governance enforcement check (no restricted leaks).
- A demo script that shows the system working end-to-end.
Deliverables
- KB Definition of Done checklist used as a gate.
- One command that runs all KB checks with non-zero exit on failure.
- Deterministic demo script with 3 queries and source-grounded answers.
- Short architecture document explaining the full KB system.
Prerequisites
- W37–W38: Continuous Updates (content pipeline)
What you’re doing
You stop having “a bunch of parts” and start having one system.
By now you have:
- ingestion
- chunking + metadata
- retrieval
- governance rules
- docs/runbooks/RCAs
- update pipeline
Now you harden it into a portfolio-grade system:
- consistent
- testable
- reproducible
- safe
Time: 4–6 hours/week
Output: quality gates for the KB system + a hardening pass that makes it reliable and demo-ready
The promise (what you’ll have by the end)
By the end of W40 you will have:
- A “Definition of Done” for KB changes
- Automated checks that run on every change
- A stable retrieval benchmark with pass/fail thresholds
- A governance enforcement check (no restricted leaks)
- A demo script that shows the system working end-to-end
The rule: ship only what you can trust
If you can’t trust it, don’t demo it.
If you can’t demo it, it’s not portfolio-ready.
Build quality gates (minimum set)
Gate 1 — Content validity
Fail if:
- required metadata missing
- doc status invalid
- owner missing
- last_reviewed_at missing
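A minimal sketch of what a Gate 1 check could look like. The field list and the set of valid statuses are assumptions for illustration (only `owner`, `last_reviewed_at`, and a doc status are named above), and `validate_metadata` is a hypothetical helper name. Swap in your own schema.

```python
from typing import Dict, List

# Assumed schema: "title" and the status values below are illustrative,
# not something this course prescribes.
REQUIRED_FIELDS = ("title", "status", "owner", "last_reviewed_at")
VALID_STATUSES = {"draft", "reviewed", "published", "deprecated"}


def validate_metadata(meta: Dict[str, str]) -> List[str]:
    """Return the list of Gate 1 violations for one document's metadata."""
    errors = []
    for field in REQUIRED_FIELDS:
        value = meta.get(field)
        # Treat None, empty strings, and whitespace-only values as missing.
        if value is None or not str(value).strip():
            errors.append(f"missing required field: {field}")
    status = meta.get("status")
    if status and status not in VALID_STATUSES:
        errors.append(f"invalid status: {status}")
    return errors
```

An empty list means the document passes. Anything else should fail the gate.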
Gate 2 — Ingestion integrity
Fail if:
- ingestion command fails
- chunks.jsonl is empty
- chunk records missing required fields
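The chunk-file part of Gate 2 might be sketched like this. The required chunk fields (`chunk_id`, `doc_id`, `text`) are an assumed schema and `check_chunks` is a hypothetical name. Your pipeline may use different fields.

```python
import json
from pathlib import Path
from typing import List

# Assumed chunk schema; replace with the fields your pipeline really writes.
REQUIRED_CHUNK_FIELDS = ("chunk_id", "doc_id", "text")


def check_chunks(path: str) -> List[str]:
    """Return Gate 2 violations for a chunks.jsonl file."""
    file = Path(path)
    if not file.exists():
        return [f"{path}: file not found"]
    errors = []
    count = 0
    with file.open(encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # blank lines are not chunk records
            count += 1
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                errors.append(f"line {line_no}: invalid JSON")
                continue
            if not isinstance(record, dict):
                errors.append(f"line {line_no}: not a JSON object")
                continue
            # Empty values count as missing, same as absent keys.
            missing = [f for f in REQUIRED_CHUNK_FIELDS if not record.get(f)]
            if missing:
                errors.append(f"line {line_no}: missing {', '.join(missing)}")
    if count == 0:
        errors.append(f"{path}: no chunk records")
    return errors
```

Run it right after the ingestion command. A failed ingestion command is already caught by its own exit code.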
Gate 3 — Retrieval benchmark
Fail if:
- top-5 hit rate falls below the agreed threshold
- any benchmark query degrades beyond an agreed tolerance versus the last passing run
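One way to turn Gate 3 into a pass/fail function, as a sketch. It assumes each benchmark case names one expected document, that your retriever returns a ranked list of doc ids, and it reads “degrade” as “a query that hit in the baseline run now misses”. That reading, and the names `run_benchmark` and `gate_passes`, are assumptions.

```python
from typing import Callable, Dict, List


def run_benchmark(
    cases: List[Dict[str, str]],
    retrieve: Callable[[str], List[str]],
    k: int = 5,
) -> Dict[str, bool]:
    """Map each query to True if its expected doc is in the top-k results."""
    return {
        case["query"]: case["expected_doc"] in retrieve(case["query"])[:k]
        for case in cases
    }


def gate_passes(
    results: Dict[str, bool],
    baseline: Dict[str, bool],
    threshold: float,
    max_regressions: int = 0,
) -> bool:
    """Pass only if hit rate meets the threshold and regressions stay in budget."""
    if not results:
        return False  # an empty benchmark proves nothing
    hit_rate = sum(results.values()) / len(results)
    # A regression is a query that hit in the baseline run but misses now.
    regressions = [
        query for query, hit in baseline.items()
        if hit and not results.get(query, False)
    ]
    return hit_rate >= threshold and len(regressions) <= max_regressions
```

Keep the benchmark cases and the baseline results in version control, so the gate measures the same thing on every run.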
Gate 4 — Governance enforcement
Fail if:
- restricted content detected in allowed corpus
- secret patterns detected (token/password/etc.)
- sensitivity mismatches exist
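A rough sketch of a Gate 4 scan over one chunk. The regex patterns are illustrative only and will miss many real secrets, and the `sensitivity` field, the `"restricted"` label, and `scan_chunk` are assumed names. A real setup would likely use a vetted secret-scanning rule set.

```python
import re
from typing import Dict, List

# Illustrative patterns only; not a complete secret-detection rule set.
SECRET_PATTERNS = [
    re.compile(r"(?i)\b(password|passwd|secret|token|api[_-]?key)\b\s*[:=]\s*\S+"),
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
]


def scan_chunk(chunk: Dict[str, str], doc_sensitivity: Dict[str, str]) -> List[str]:
    """Return Gate 4 violations for one chunk from the allowed corpus."""
    errors = []
    chunk_id = chunk.get("chunk_id", "?")
    label = chunk.get("sensitivity")
    # Restricted content must never appear in the allowed corpus.
    if label == "restricted":
        errors.append(f"{chunk_id}: restricted content in allowed corpus")
    # The chunk's label should match the label of the document it came from.
    expected = doc_sensitivity.get(chunk.get("doc_id", ""))
    if expected is not None and label != expected:
        errors.append(f"{chunk_id}: sensitivity mismatch ({label} vs {expected})")
    text = chunk.get("text", "")
    if any(pattern.search(text) for pattern in SECRET_PATTERNS):
        errors.append(f"{chunk_id}: possible secret in text")
    return errors
```

Any non-empty result across the corpus should fail the gate.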
Gate 5 — Demo readiness
Fail if:
- you can’t run the demo in one command
- demo output differs between runs
- there are no “answers with sources” examples
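The “inconsistent output” condition can be checked mechanically. A small sketch, assuming your demo can be called as a function that returns its full output as a string (`run_demo` here is a hypothetical callable you pass in):

```python
import hashlib
from typing import Callable


def output_digest(text: str) -> str:
    """Hash the demo output so runs can be compared cheaply."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def is_deterministic(run_demo: Callable[[], str], runs: int = 3) -> bool:
    """Run the demo several times and pass only if every output is identical."""
    digests = {output_digest(run_demo()) for _ in range(runs)}
    return len(digests) == 1
```

If this fails, look for timestamps, random seeds, or unordered collections leaking into the output.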
Start with these five. They cover most real failures.
Step-by-step checklist
1) Create a KB Definition of Done
A short checklist file:
- metadata present
- status lifecycle followed
- ingestion passes
- retrieval benchmark passes
- governance checks pass
Print it. Follow it.
2) Build a “kb check” command
One command, like `make kb-check`.
It should run:
- lint metadata
- run ingestion
- run retrieval tests
- run governance scan
No manual steps.
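A sketch of the runner that such a command could call. The script paths in `CHECKS` are hypothetical placeholders, not files this course provides. Point them at whatever your repo contains.

```python
import subprocess
import sys
from typing import List, Sequence, Tuple

# Hypothetical script paths; replace with your real check commands.
CHECKS: List[Tuple[str, List[str]]] = [
    ("lint metadata", [sys.executable, "scripts/lint_metadata.py"]),
    ("ingestion", [sys.executable, "scripts/ingest.py"]),
    ("retrieval tests", [sys.executable, "scripts/retrieval_benchmark.py"]),
    ("governance scan", [sys.executable, "scripts/governance_scan.py"]),
]


def run_checks(checks: Sequence[Tuple[str, List[str]]]) -> int:
    """Run every check in order; return 0 only if all of them succeed."""
    failed = []
    for name, command in checks:
        result = subprocess.run(command)
        status = "PASS" if result.returncode == 0 else "FAIL"
        print(f"[{status}] {name}")
        if result.returncode != 0:
            failed.append(name)
    # Non-zero exit code is what lets CI or make stop on failure.
    return 1 if failed else 0
```

Call `sys.exit(run_checks(CHECKS))` from the script's entry point, and have the one command invoke that script. This version runs all checks even after a failure, so one run shows every broken gate.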
3) Add a demo script
Demo = predictable.
Create a script that:
- runs ingestion
- runs retrieval for 3 example queries
- prints top sources + snippet
- shows a refusal when no source is found
This is your “portfolio proof”.
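A toy version of the demo flow, to show the shape. Everything here is a stand-in: the in-memory `CORPUS` replaces the ingestion step, the keyword-overlap `retrieve` replaces your real retriever, and the documents and queries are made up.

```python
from typing import Dict, List

# Made-up corpus standing in for ingested chunks.
CORPUS = [
    {"source": "runbooks/restart-api.md",
     "text": "Restart the API service with systemctl restart api."},
    {"source": "rcas/2024-disk-full.md",
     "text": "Root cause: log rotation was disabled and the disk filled."},
]

DEMO_QUERIES = [
    "how do I restart the api service",
    "what caused the disk to fill",
    "what is the vacation policy",  # nothing in the corpus covers this
]


def retrieve(query: str, corpus: List[Dict[str, str]], min_overlap: int = 2):
    """Toy keyword-overlap retriever; ties break on source name for determinism."""
    terms = set(query.lower().split())
    scored = []
    for chunk in corpus:
        cleaned = chunk["text"].lower().replace(".", " ").replace(":", " ")
        overlap = len(terms & set(cleaned.split()))
        if overlap >= min_overlap:
            scored.append((overlap, chunk))
    scored.sort(key=lambda item: (-item[0], item[1]["source"]))
    return [chunk for _, chunk in scored]


def answer(query: str, corpus: List[Dict[str, str]]) -> str:
    """Print the top source and a snippet, or refuse when nothing matches."""
    hits = retrieve(query, corpus)
    if not hits:
        return f"Q: {query}\nA: No source found; refusing to answer."
    top = hits[0]
    return f"Q: {query}\nSource: {top['source']}\nSnippet: {top['text'][:80]}"


def run_demo() -> str:
    return "\n\n".join(answer(query, CORPUS) for query in DEMO_QUERIES)
```

The third query is there on purpose: a demo that shows a clean refusal is more convincing than one that only shows hits.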
4) Document the system
Write one doc:
- architecture overview
- where docs live
- how ingestion works
- how retrieval works
- how governance is enforced
- how to run the demo
Keep it clean and short.
Deliverables (you must ship these)
Deliverable A — KB Definition of Done
- checklist exists
- used as gate
Deliverable B — Automated KB checks
- one command runs all checks
- non-zero exit on failure
Deliverable C — Demo script
- deterministic demo with 3 queries
- shows sources and safe behavior
Deliverable D — Architecture doc
- short and readable
- explains the system end-to-end
Common traps (don’t do this)
- Trap 1: “Hardening is boring.”
Hardening is what makes it real.
- Trap 2: “Manual checking is enough.”
Manual checks get skipped. Automation doesn’t.
- Trap 3: “No demo script.”
Without a demo script, your portfolio story is weak.
Quick self-check (2 minutes)
Answer yes/no:
- Do I have pass/fail gates for content, ingestion, retrieval, governance?
- Can I run all KB checks in one command?
- Can I demo the KB system in 2 minutes?
- Are the docs clear enough for a stranger to understand?
- Would I trust this system in real ops?
If any answer is “no”, fix it before moving on.