Artifact Vault

Redacted proof packets for production AI work.

This vault makes the operating layer inspectable without exposing private client systems. Every artifact is a redacted reconstruction: enough structure to review the engineering judgment, no proprietary screenshots, tokens, account identifiers, raw rows, or client data.

Disclosure standard

Reconstructed enough to inspect. Redacted enough to publish.

These artifacts preserve the decisions, fields, gates, and failure modes a reviewer should care about. They intentionally remove client names, proprietary dashboards, credentials, account IDs, private identifiers, raw records, and anything that would imply access to live systems.

MCP Governance

How enterprise agent tools are scoped, reviewed, approved, recovered, and kept inside auditable boundaries.

Packet contains four redacted reconstructions tied to the related production case study.

Open related case study →
Registry sample · Redacted reconstruction

Tool contract inventory

Agent-callable tools were treated as governed contracts, not loose prompt affordances.

A reviewer can inspect the shape of the tool registry: owners, scopes, auth modes, approval class, failure handling, and logging expectations.

tool: ads.campaign.read · owner: media-ops · auth: oauth-delegated · scope: read-only
tool: finance.invoice.reconcile · approval: required-for-write · rate-limit: tenant budget
tool: content.brief.generate · fallback: queue-for-review · audit: request + source pack id
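A minimal sketch of what such a registry could look like as typed data; the field names (`owner`, `auth_mode`, `approval_class`) mirror the redacted sample and are illustrative assumptions, not the production schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolContract:
    """One governed tool contract; all field names are illustrative."""
    name: str
    owner: str
    auth_mode: str       # e.g. "oauth-delegated"
    scope: str           # e.g. "read-only"
    approval_class: str  # "none" or "required-for-write"

REGISTRY = {t.name: t for t in [
    ToolContract("ads.campaign.read", "media-ops", "oauth-delegated",
                 "read-only", "none"),
    ToolContract("finance.invoice.reconcile", "finance-ops", "oauth-delegated",
                 "write", "required-for-write"),
]}

def runs_unattended(tool_name: str) -> bool:
    # A tool may execute without a human gate only if no approval is required.
    return REGISTRY[tool_name].approval_class == "none"
```

Treating the registry as data rather than prose makes "is this tool approval-gated?" a single lookup instead of a judgment call.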
Related case study →
Governance checklist · Redacted reconstruction

Scope-diff review

New tool capabilities went through a review path before they expanded what agents could do.

A change-review checklist showing how permission creep, account boundaries, approval requirements, and blast radius were reviewed.

change: read_campaign -> mutate_budget · reviewer: platform + business owner
risk check: new write scope requires human approval and rollback note
release gate: dry-run evidence attached before production enablement
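The gate logic implied by the checklist can be sketched as follows; the write-scope prefixes and function names are assumptions for illustration, not the production rules:

```python
# Scopes whose addition counts as permission creep (hypothetical prefixes).
WRITE_PREFIXES = ("mutate_", "write_", "delete_")

def scope_diff_requires_review(old_scopes: set, new_scopes: set) -> bool:
    """A diff needs human review if it adds any write-class scope."""
    added = new_scopes - old_scopes
    return any(s.startswith(WRITE_PREFIXES) for s in added)

def release_gate(old_scopes: set, new_scopes: set,
                 approved: bool, rollback_note: str) -> bool:
    """New write scope ships only with approval and a rollback note attached."""
    if not scope_diff_requires_review(old_scopes, new_scopes):
        return True
    return approved and bool(rollback_note)
```

The point of encoding it this way is that a scope expansion cannot silently reach production: the gate fails closed until both artifacts exist.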
Related case study →
Approval object · Redacted reconstruction

Human-in-loop schema

High-impact actions were represented as reviewable objects before execution.

A structured approval payload for action intent, confidence, source evidence, reviewer identity, decision history, and audit trace.

intent: update campaign pacing · confidence: 0.82 · action_class: business-write
source_evidence: redacted metric window + model rationale + operator note
decision: approved | rejected | revise · audit_id: generated server-side
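A minimal validator for such a payload, assuming the field names shown above and a confidence range of 0 to 1; the required-field set is an illustrative guess at the schema:

```python
ALLOWED_DECISIONS = {"approved", "rejected", "revise"}

def validate_approval(payload: dict) -> bool:
    """An approval object is reviewable only if every field is present
    and the decision and confidence values are in range."""
    required = {"intent", "confidence", "action_class",
                "source_evidence", "decision", "audit_id"}
    if not required <= payload.keys():
        return False
    return (payload["decision"] in ALLOWED_DECISIONS
            and 0.0 <= payload["confidence"] <= 1.0)
```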
Related case study →
Incident note · Redacted reconstruction

OAuth recovery log

Integration failures were debugged at the identity boundary, not patched around with brittle prompts.

A sanitized incident timeline covering the opaque-token failure, the audience-parameter diagnosis, JWT validation, and downstream access recovery.

symptom: downstream tool received opaque access token; JWT claims unavailable
root cause: missing audience parameter in delegated Auth0 flow
recovery: audience set, JWT validation restored, downstream account list rechecked
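The symptom above is detectable mechanically: a JWT is three base64url segments whose header decodes to JSON, while an opaque token (what Auth0 issues when no `audience` is requested) is a single blob. A stdlib-only sketch of that check:

```python
import base64
import json

def looks_like_jwt(token: str) -> bool:
    """Heuristic from the incident: three dot-separated segments with a
    JSON JOSE header. Opaque access tokens fail this check, which is an
    early signal that the audience parameter was not sent."""
    parts = token.split(".")
    if len(parts) != 3:
        return False
    try:
        padded = parts[0] + "=" * (-len(parts[0]) % 4)
        json.loads(base64.urlsafe_b64decode(padded))
        return True
    except (ValueError, json.JSONDecodeError):
        return False
```

Running this on the received token at the integration boundary turns "claims unavailable" from a downstream mystery into an upstream configuration finding.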
Related case study →
RAG Reliability

How retrieval quality, citations, refusals, latency, cost, and regressions become release gates instead of demo polish.

Packet contains four redacted reconstructions tied to the related production case study.

Open related case study →
Eval summary · Redacted reconstruction

1,104-test gate report

RAG releases were gated by retrieval, citation, refusal, latency, cost, and regression checks.

A suite-level status report for the 1,104-test quality gate, including failure classes and release decision status.

retrieval_quality: pass · groundedness: pass · citation_integrity: pass
refusal_correctness: pass · latency_budget: pass · cost_budget: pass
release_decision: pass with monitored follow-up on stale-chunk alerts
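The gate aggregation can be sketched as: every suite must pass, and open follow-ups downgrade a clean pass without blocking release. The suite names and return strings are assumptions mirroring the report above:

```python
def release_decision(gate_results: dict, follow_ups: list) -> str:
    """Release is blocked on any failing gate; follow-ups are tracked but
    do not block, so a pass under monitoring is a distinct outcome."""
    if not all(v == "pass" for v in gate_results.values()):
        return "blocked"
    return "pass with monitored follow-up" if follow_ups else "pass"
```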
Related case study →
Golden set · Redacted reconstruction

Question matrix

Known-good questions, expected sources, and refusal cases were tracked together.

A redacted matrix of user questions, required source chunks, forbidden claims, refusal triggers, and context boundaries.

question: prove RAG quality · expected_source: eval harness case · forbidden: uncited metric
question: show private data · expected_behavior: refuse or summarize boundary only
question: compare platform work · required_sources: MCP + RAG + career brain
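One way such a matrix could drive automated grading; the dict keys (`required_sources`, `forbidden_claims`, `expect_refusal`) are hypothetical stand-ins for the redacted columns:

```python
def grade_answer(case: dict, answer: dict) -> bool:
    """A golden case passes if refusal cases refuse, required sources are
    cited, and no forbidden claim appears in the answer text."""
    if case.get("expect_refusal"):
        return answer.get("refused", False)
    cited = set(answer.get("citations", []))
    if not set(case["required_sources"]) <= cited:
        return False
    return not any(claim in answer.get("text", "")
                   for claim in case.get("forbidden_claims", []))
```

Keeping positive cases, forbidden claims, and refusal triggers in one grader is what lets a single corpus update be checked against all three failure directions at once.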
Related case study →
Budget sheet · Redacted reconstruction

Latency and cost envelope

Quality controls included runtime and spend limits, not just answer quality.

A release envelope for P95 latency, per-query cost ceilings, routing decisions, cache expectations, and fallback behavior.

target: sub-300ms retrieval p95 · model fallback: local extractive if generation fails
cost guard: cache common evidence paths; route routine summaries to cheaper model
failure mode: timeout returns cited retrieval rather than uncited generation
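The latency side of the envelope reduces to a percentile check against the 300 ms target; this sketch uses the nearest-rank p95, which is an assumption about the production estimator:

```python
import math

def p95_ms(samples_ms: list) -> float:
    """Nearest-rank 95th percentile over a sample window."""
    ordered = sorted(samples_ms)
    idx = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return ordered[idx]

def within_latency_budget(samples_ms: list, target_ms: float = 300.0) -> bool:
    return p95_ms(samples_ms) < target_ms
```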
Related case study →
Failure taxonomy · Redacted reconstruction

Regression classes

The failure model covered retrieval drift, citation mismatch, weak refusals, and cost creep.

A taxonomy that lets reviewers classify and fix RAG failures without treating every bad answer as the same problem.

retrieval_drift: expected chunk not returned after corpus update
citation_mismatch: answer claim cites unrelated source
weak_refusal: sensitive request summarized too specifically
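The taxonomy maps naturally onto an enum plus a classifier; the boolean signals here are illustrative stand-ins for the real detectors, and cost creep is included from the failure model described above:

```python
from enum import Enum

class RagFailure(Enum):
    RETRIEVAL_DRIFT = "expected chunk not returned"
    CITATION_MISMATCH = "claim cites unrelated source"
    WEAK_REFUSAL = "sensitive request answered too specifically"
    COST_CREEP = "per-query spend above envelope"

def classify(chunk_found: bool, citation_matches: bool,
             refused_when_required: bool, cost_ok: bool) -> list:
    """One bad answer can carry several failure classes at once."""
    failures = []
    if not chunk_found:
        failures.append(RagFailure.RETRIEVAL_DRIFT)
    if not citation_matches:
        failures.append(RagFailure.CITATION_MISMATCH)
    if not refused_when_required:
        failures.append(RagFailure.WEAK_REFUSAL)
    if not cost_ok:
        failures.append(RagFailure.COST_CREEP)
    return failures
```

Returning a list rather than a single label is the point: treating every bad answer as one problem is exactly what the taxonomy exists to prevent.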
Related case study →
Private-Data AI

How tenant boundaries, SQL guardrails, approval queues, and local model routing keep sensitive workflows usable without leaking data.

Packet contains four redacted reconstructions tied to the related production case study.

Open related case study →
Isolation test · Redacted reconstruction

Cross-tenant leakage checks

Tenant isolation was verified with negative tests, not merely assumed from application code.

A negative-test matrix for client-scoped prompts, server-side tenant injection, SQL validation, and evidence redaction.

attempt: ask tenant A about tenant B metrics · expected: refusal + no SQL execution
attempt: omit tenant filter · expected: server injects tenant scope before query plan
attempt: request raw rows · expected: aggregate answer or refusal
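A sketch of the invariant these negative tests assert: cross-tenant references refuse before any SQL runs, and the tenant binding comes from the server, never from the prompt. All names are hypothetical:

```python
def answer_query(caller_tenant: str, mentioned_tenant) -> dict:
    """Test target: a prompt mentioning another tenant must refuse
    without executing SQL; otherwise the caller's tenant is bound
    server-side regardless of what the prompt asked for."""
    if mentioned_tenant and mentioned_tenant != caller_tenant:
        return {"refused": True, "sql_executed": False}
    # tenant predicate is injected here, not parsed out of the prompt
    return {"refused": False, "sql_executed": True,
            "tenant_bound": caller_tenant}
```

The negative test then asserts on the side effects, not the answer text, which is what makes "no SQL execution" checkable at all.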
Related case study →
SQL guardrail · Redacted reconstruction

NL-to-SQL validation spec

Natural-language analytics were constrained by table policy, denied verbs, and tenant enforcement.

A SQL safety spec covering allowed relations, denied operations, explain-plan checks, timeout rules, and refusal triggers.

allowed: read-only aggregate selects against approved semantic views
denied: insert, update, delete, union exfiltration, unscoped joins
required: tenant predicate injected server-side before execution
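A minimal sketch of that validation, assuming hypothetical view names and a `:tenant` bind parameter standing in for the server-injected predicate:

```python
import re

# Denied verbs from the spec; the regex form is an illustrative choice.
DENIED = re.compile(r"\b(insert|update|delete|drop|union)\b", re.IGNORECASE)
ALLOWED_VIEWS = {"v_campaign_daily", "v_spend_summary"}  # hypothetical names

def validate_sql(sql: str, target_view: str) -> bool:
    """Reject denied operations, unapproved relations, and any statement
    that lacks the server-bound tenant predicate."""
    if DENIED.search(sql):
        return False
    if target_view not in ALLOWED_VIEWS:
        return False
    return ":tenant" in sql
```

A real implementation would parse the statement rather than pattern-match it, but even this shape shows why validation runs before execution, not after.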
Related case study →
Approval queue · Redacted reconstruction

Business-action review

AI recommendations that could change business state passed through a human review queue.

A review object for proposed business actions, confidence, source summaries, reviewer identity, decision state, and rollback notes.

action: adjust pacing recommendation · state: pending_review
evidence: redacted aggregate deltas + anomaly explanation + source window
review: approve, reject, revise, or escalate with decision history
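The decision states above imply a small state machine; the transition table and `decide` helper are illustrative, not the production implementation:

```python
# Legal moves per state; anything else raises, so invalid reviews fail loudly.
TRANSITIONS = {
    "pending_review": {"approve": "approved", "reject": "rejected",
                       "revise": "pending_review", "escalate": "escalated"},
    "escalated": {"approve": "approved", "reject": "rejected"},
}

def decide(obj: dict, action: str, reviewer: str) -> dict:
    """Apply one reviewer decision, recording it in the decision history."""
    next_state = TRANSITIONS[obj["state"]][action]  # KeyError = invalid move
    obj["history"] = obj.get("history", []) + [(obj["state"], action, reviewer)]
    obj["state"] = next_state
    return obj
```

Keeping the history append inside the same transition means no state change can exist without a recorded reviewer and decision.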
Related case study →
Model routing · Redacted reconstruction

Local inference map

Sensitive routine tasks were routed away from external APIs when local models were sufficient.

A routing map showing which analyst requests can stay local, which require cloud models, and what data must be removed before escalation.

local: query planning, summarization of redacted aggregates, draft classification
cloud: high-stakes synthesis after sensitive fields are removed
blocked: raw client rows, credentials, private identifiers, cross-tenant prompts
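The routing map reads as a pure function of task type and attached data classes; the labels here are assumptions mirroring the redacted lines:

```python
# Data classes that never leave the boundary, and tasks that stay local.
BLOCKED = {"raw_rows", "credentials", "private_identifiers", "cross_tenant"}
LOCAL_TASKS = {"query_planning", "aggregate_summary", "draft_classification"}

def route(task: str, data_classes: set) -> str:
    """Blocked data wins over everything; routine work stays local;
    anything else may go to a cloud model only after redaction."""
    if data_classes & BLOCKED:
        return "blocked"
    if task in LOCAL_TASKS:
        return "local"
    return "cloud_after_redaction"
```

Evaluating the blocked set first encodes the policy's priority order: sensitivity of the data overrides the needs of the task.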
Related case study →