Phase 4 · W33–W34
W33–W34: Guardrails & Governance (what is allowed, what is not)
Establish clear governance and guardrails so the knowledge system stays safe, source-grounded, and policy-compliant.
Suggested time: 4–6 hours/week
Outcomes
- A clear “allowed vs forbidden” content policy.
- A sensitivity classification (public/internal/restricted).
- A safe-answer protocol (“answers with sources or no answer”).
- A redaction/anonymization guideline.
- Simple enforcement: checks during ingestion and during answering.
Deliverables
- Governance policy doc with allowed/forbidden content and sensitivity levels.
- Redaction guideline with rules, examples, and uncertain-case handling.
- Safe-answer protocol covering the cite-or-refuse, unknown, and restricted paths.
- Enforcement checks at ingestion and answering time.
Prerequisites
- W31–W32: Retrieval Quality (search, ranking, relevance)
What you’re doing
You stop treating “knowledge base” like a cute wiki.
A KB + RAG system becomes a liability fast if:
- it leaks sensitive data
- it gives confident wrong answers
- it tells people to make risky changes
- it invents facts
So you build guardrails and governance like an adult.
Output: a governance policy + content rules + safe-answer behavior + basic enforcement checks
The rule: if you can’t cite it, don’t claim it
Your system must prefer “Here are sources” over “Trust me”.
And it must be allowed to say:
- “I don’t know”
- “Not enough info”
- “This is restricted”
No hero mode.
Step-by-step checklist
1) Define sensitivity levels
Start simple:
- PUBLIC (safe to publish)
- INTERNAL (company/team internal)
- RESTRICTED (PII, credentials, customer data, sensitive configs)
Everything ingested must have a sensitivity label.
If you can’t label it, don’t ingest it.
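The three levels and the "no label, no ingest" rule can be made machine-checkable. The sketch below is a minimal version under stated assumptions: labels live in a document metadata dict under a hypothetical `sensitivity` key, and the level ordering (PUBLIC < INTERNAL < RESTRICTED) is an assumption that later checks can compare against.

```python
from enum import IntEnum
from typing import Optional


class Sensitivity(IntEnum):
    """Ordered labels: a higher value means more sensitive."""
    PUBLIC = 0      # safe to publish
    INTERNAL = 1    # company/team internal
    RESTRICTED = 2  # PII, credentials, customer data, sensitive configs


def parse_label(raw: object) -> Optional[Sensitivity]:
    """Map a raw metadata value to a Sensitivity, or None if missing/unknown."""
    if not isinstance(raw, str):
        return None
    try:
        return Sensitivity[raw.strip().upper()]
    except KeyError:
        return None


def can_ingest(doc_metadata: dict) -> bool:
    """No label, no ingest: missing or unrecognized labels are rejected."""
    # "sensitivity" is a hypothetical metadata key.
    return parse_label(doc_metadata.get("sensitivity")) is not None
```

Using an `IntEnum` keeps the ordering explicit, so "is this chunk allowed here?" becomes a simple `<=` comparison.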
2) Define “allowed content”
Examples:
- runbooks without secrets
- RCA summaries without personal data
- mapping rules without customer identifiers
- interface descriptions at a conceptual level
- known-issue summaries
3) Define “forbidden content”
Examples:
- credentials, tokens, keys
- full customer records
- personal addresses/names
- screenshots with sensitive info
- anything that violates policy or contracts
Write it down clearly. Not vague.
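One way to keep the policy "not vague" is to store it as data that ingestion can check. This is a sketch, assuming every document carries a content-type tag; the tag names below are hypothetical stand-ins for the examples above. Items that need judgment, such as "anything that violates policy or contracts", cannot be a tag, so unlisted types fall through to human review.

```python
# Hypothetical content-type tags mirroring the examples above.
ALLOWED = frozenset({
    "runbook",            # runbooks without secrets
    "rca_summary",        # RCA summaries without personal data
    "mapping_rule",       # mapping rules without customer identifiers
    "interface_concept",  # interface descriptions at a conceptual level
    "known_issue",        # known-issue summaries
})
FORBIDDEN = frozenset({
    "credential",            # credentials, tokens, keys
    "customer_record",       # full customer records
    "personal_data",         # personal addresses/names
    "sensitive_screenshot",  # screenshots with sensitive info
})


def policy_decision(content_type: str) -> str:
    """Return 'forbidden', 'allowed', or 'review' for a content-type tag."""
    if content_type in FORBIDDEN:  # forbidden wins if a tag is ever listed twice
        return "forbidden"
    if content_type in ALLOWED:
        return "allowed"
    return "review"  # unlisted types go to a human, never a silent "allowed"
```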
4) Add redaction/anonymization rules
Rules like:
- replace IDs with placeholders
- mask emails
- remove addresses
- remove attachments or store only metadata
If you can’t anonymize safely, exclude.
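The "replace IDs" and "mask emails" rules can start as ordered regex substitutions. This is a minimal sketch, not a complete anonymizer: the email pattern is a common approximation, and the `CUST-` ID format is hypothetical. Addresses and attachments are not handled here; they need their own rules or exclusion.

```python
import re
from typing import List, Tuple

# Assumed patterns: adapt them to your own ID formats.
# The CUST- customer ID format below is hypothetical.
REDACTION_RULES: List[Tuple[re.Pattern, str]] = [
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "<EMAIL>"),
    (re.compile(r"\bCUST-\d{4,}\b"), "<CUSTOMER_ID>"),
]


def redact(text: str) -> Tuple[str, int]:
    """Apply each rule in order; return the redacted text and the replacement count."""
    total = 0
    for pattern, placeholder in REDACTION_RULES:
        text, count = pattern.subn(placeholder, text)
        total += count
    return text, total
```

For example, `redact("Contact jane.doe@example.com about CUST-123456.")` gives `("Contact <EMAIL> about <CUSTOMER_ID>.", 2)`. Placeholders keep the text readable for retrieval while dropping the identifying value.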
5) Define safe-answer behavior
Your answer policy should be:
- Always provide citations (source chunks)
- If no good source is found → say “I don’t know” + suggest where to look
- If restricted content would be required → refuse and explain restriction
- Never invent configuration steps or transactions if not in sources
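The policy can be enforced as a gate that runs before any text is generated and picks one of three paths: cite, unknown, or restricted. This is a sketch, assuming each retrieved chunk has a source ID, a relevance score, and a sensitivity label. The `MIN_SCORE` threshold is hypothetical. Generation itself is out of scope; the gate only decides which chunks the generator may use.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List


class Sensitivity(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    RESTRICTED = 2


@dataclass
class Chunk:
    source: str               # doc path or ID, used as the citation
    score: float              # retrieval relevance; higher is better (assumed)
    sensitivity: Sensitivity


@dataclass
class Decision:
    kind: str                 # "answer", "unknown", or "restricted"
    citations: List[str] = field(default_factory=list)
    message: str = ""


MIN_SCORE = 0.5  # hypothetical "good source" threshold; tune on your own data


def decide(chunks: List[Chunk], clearance: Sensitivity) -> Decision:
    relevant = [c for c in chunks if c.score >= MIN_SCORE]
    permitted = [c for c in relevant if c.sensitivity <= clearance]
    if permitted:
        # Cite path: the generator may use only these chunks and must cite them.
        return Decision("answer", citations=[c.source for c in permitted])
    if relevant:
        # Restricted path: good sources exist, but not at this clearance level.
        return Decision("restricted", message="The sources needed for this answer are restricted.")
    # Unknown path: no good source, so do not guess.
    return Decision("unknown", message="I don't know: not enough info in the indexed sources. Try asking the document owners.")
```

One design choice: if both permitted and restricted sources are relevant, this sketch answers from the permitted ones only. A stricter policy could refuse instead whenever any restricted chunk would be needed.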
6) Add enforcement checks
During ingestion:
- block docs labeled RESTRICTED unless this knowledge base is explicitly allowed to hold them
- run a simple scanner for secrets (patterns like “password=”, “token=”, etc.)
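Both ingestion checks fit in one function that returns reasons to block. This is a first-pass sketch: the `password=` / `token=` style patterns come from the text, while the API-key and private-key patterns are assumed extras. Real secret scanners use much larger rule sets. Expect false positives; a flagged document should go to review rather than straight into the index.

```python
import re
from typing import List

# First-pass secret patterns: the two from the text plus assumed extras.
SECRET_PATTERNS = [
    re.compile(r"(?i)(password|passwd|pwd)\s*[=:]\s*\S+"),
    re.compile(r"(?i)(token|api[_-]?key|secret)\s*[=:]\s*\S+"),
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
]


def check_ingestion(text: str, label: str, allow_restricted: bool = False) -> List[str]:
    """Return reasons to block this document; an empty list means it may proceed."""
    problems = []
    # allow_restricted is a hypothetical per-index setting.
    if label.strip().upper() == "RESTRICTED" and not allow_restricted:
        problems.append("RESTRICTED document is not allowed in this index")
    for pattern in SECRET_PATTERNS:
        if pattern.search(text):
            problems.append(f"possible secret: matches {pattern.pattern!r}")
    return problems
```

The patterns deliberately omit a leading word boundary, so compound names like `access_token=` are still caught.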
During answering:
- only retrieve chunks allowed for the current context
- refuse when an answer would need chunks above the sensitivity allowed for the current context
Start basic. It already helps a lot.
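Answering-time filtering works best inside retrieval, so disallowed chunks never reach the prompt. The sketch below redefines a small `Sensitivity` enum so it runs on its own. It assumes a hypothetical mapping from deployment context (for example a public site or an employee chat) to the most sensitive label that context may see, and a toy in-memory index standing in for a real vector store.

```python
from enum import IntEnum
from typing import Dict, List, Tuple


class Sensitivity(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    RESTRICTED = 2


# Hypothetical deployment contexts and the most sensitive label each may see.
CONTEXT_CLEARANCE: Dict[str, Sensitivity] = {
    "public_site": Sensitivity.PUBLIC,
    "employee_chat": Sensitivity.INTERNAL,
}

# Toy index entry: (chunk_id, label, relevance score).
Entry = Tuple[str, Sensitivity, float]


def retrieve(index: List[Entry], context: str, k: int = 3) -> List[str]:
    # Unknown contexts fall back to the lowest clearance (fail closed).
    clearance = CONTEXT_CLEARANCE.get(context, Sensitivity.PUBLIC)
    # Filter before ranking, so top-k is computed only over allowed chunks.
    allowed = [entry for entry in index if entry[1] <= clearance]
    ranked = sorted(allowed, key=lambda entry: entry[2], reverse=True)
    return [chunk_id for chunk_id, _, _ in ranked[:k]]
```

Filtering before taking the top k matters. If you rank first and filter afterwards, restricted chunks can crowd out allowed ones and leave the answer with fewer sources than it should have.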
Deliverables (you must ship these)
Deliverable A — Governance policy doc
- allowed vs forbidden
- sensitivity levels
- ownership and update rules
Deliverable B — Redaction guideline
- clear rules + examples
- “what to do when unsure”
Deliverable C — Safe-answer protocol
- “cite or refuse”
- “unknown path”
- “restricted path”
Deliverable D — Enforcement checks
- ingestion-time checks exist
- answering-time filtering exists (even if basic)
Common traps (don’t do this)
- Trap 1: “We’ll add governance later.”
Later means you ship a liability.
- Trap 2: “The model will behave.”
No. You need constraints and rules.
- Trap 3: “Everything is internal anyway.”
That’s how leaks happen. Label content properly.
Quick self-check (2 minutes)
Answer yes/no:
- Does every doc/chunk have a sensitivity label?
- Do I have a written allowed/forbidden policy?
- Does the system cite sources or refuse?
- Do ingestion and retrieval enforce sensitivity rules?
- Do we have a clear redaction/anonymization guideline?
If any answer is “no”, fix it before moving on.