01 — Core thesis

Applied AI engineering for systems that have to work.

RAG, MCP, governed agents, private-data workflows, AdTech automation, and release gates. Built with 28 years of systems craft behind the model layer.

28y systems craft · 1998 first production web work · 458 governed tool surfaces · 1,104 release checks

Quiet systems. Visible consequences.

Interactive · portfolio retrieval engine

Ask the work.

Ask a real hiring or operating question. The answer retrieves local career context before it writes. Useful, bounded, and explicit about what it used.

28y shipping software · 3 regions of operations (UK, US, APJ) · 1,104 eval tests in CI
52 sources indexed · 0ms retrieval · now refreshed · in browser
01 Question · Real operating question
02 Retrieval · Career context
03 Rank · Strongest sections
04 Answer · Bounded context
05 Link · Case or note
Operating snapshot

The system already exists under real constraints.

Current work combines platform ownership, governed agent tooling, and release behavior that can be inspected without turning the page into a courtroom brief.

Current role · platform remit

AI strategy with implementation authority.

Current work spans MCP server architecture, LLM agent systems, Snowflake governance, vendor procurement, local inference strategy, and desktop-agent policy.

Signal: live platform ownership, not generic AI interest.
MCP layer · governed tools

Tool count becomes interface design.

Hundreds of callable capabilities are framed around scopes, naming, approvals, logging, reusable skills, and private-data boundaries.

Signal: repeatability matters more than the number.
Eval layer · release control

Answers ship through gates.

Retrieval quality, refusals, latency, hallucination behavior, regressions, and spend are treated as deploy gates, not post-demo cleanup.

Signal: behavior is controlled before it reaches users.
02 — Platform competencies

The operating layer around AI.

The useful work is not the model call. It is the architecture, data access, observability, and organizational habits that make model work survivable inside a company.

01

Systems architecture and gateways

Go/Fiber services, multi-tenant API routing, OAuth boundaries, circuit breakers, queue discipline, and contracts that make hundreds of agent-callable tools legible.

Go · Fiber · MCP · OAuth · retry + rate limits
02

Data infrastructure

Snowflake governance, Windsor.ai and Monday.com pipelines, audience intent layers, analyst roles, vendor-portal extraction, and row-level boundaries.

Snowflake · Bombora · Leadspace · Windsor.ai
03

AI observability and governance

Eval gates, exfiltration review, Datadog telemetry, prompt-injection posture, Claude Desktop policy, shell execution rules, and local inference choices.

Datadog · RAGAS · Promptfoo · local Qwen
04

Product diligence and enablement

Vendor diligence, Field Guide training, developer standardization, AI brief workflows, decision memos, and rollout paths that teams can actually follow.

field guide · vendor review · adoption surfaces
pjb@principal-ai:~/platform$ tree --color=always_
458 tools · 12 MCP servers · 11 tenants · sub-300ms P95 · −65% inference cost
MCP + identity

Connected Claude to enterprise systems without turning auth into folklore.

Architected 12+ production MCP server integrations across NetSuite, Monday.com, HubSpot, Wrike, Google Ads, Adverity, Apify, Snowflake, LinkedIn, and Jasper. Root-caused a critical LinkedIn MCP OAuth failure: a missing audience parameter was producing opaque tokens instead of JWTs. The fix unblocked 8 downstream tools and cut resolution from 4 days to 2 hours. Patched a Google Ads list_accessible_customers parsing bug affecting multi-account access. The identity boundary runs on Auth0 with OAuth 2.1 / OIDC, with Passkeys / FIDO2 supported for end-user surfaces.
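
The opaque-token failure mode is easy to reproduce on paper. The sketch below is illustrative only and assumes an Auth0-style authorization server, which issues JWT access tokens only when an API audience is requested. The function names, domain, and client values are hypothetical, and the JWT check inspects token shape only; it does not verify a signature.

```python
import base64
import json
from typing import Optional
from urllib.parse import urlencode


def build_authorize_url(domain: str, client_id: str, redirect_uri: str,
                        scope: str, audience: Optional[str] = None) -> str:
    """Build an authorization URL; leaving out `audience` reproduces the bug."""
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
    }
    if audience:
        # Assumption: the server returns a JWT access token only when an
        # API audience is named; without it the token comes back opaque.
        params["audience"] = audience
    return f"https://{domain}/authorize?{urlencode(params)}"


def jwt_claims(token: str) -> Optional[dict]:
    """Return payload claims if the token is JWT-shaped, else None (opaque)."""
    parts = token.split(".")
    if len(parts) != 3:
        return None
    try:
        # base64url payloads are usually unpadded; restore padding first.
        padded = parts[1] + "=" * (-len(parts[1]) % 4)
        claims = json.loads(base64.urlsafe_b64decode(padded))
    except ValueError:
        return None
    return claims if isinstance(claims, dict) else None
```

A cheap recheck after a fix like this is to call `jwt_claims` on a freshly issued token and confirm the `aud` claim is present before re-enabling downstream tools.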

Data + content

Made the data plane usable by agents and analysts.

Architected the Snowflake role hierarchy, including an AUDIENCEINTELLIGENCE role and a tiered analyst account model serving 25 users across 3 regions. Consolidated 40 legacy roles in a cross-regional governance cleanup, cutting credit consumption by 35% (~£2,800/month). Integrated Bombora intent, Leadspace firmographics, and GWI audience data into a Snowflake + Claude + Jasper pipeline, giving 12 analysts access to queries previously locked in vendor portals, then built the custom content-pack-pipeline Jasper MCP server for four enterprise B2B clients.

Governance

Built the operating rules around the tools.

Defined the tooling architecture across Claude API, Claude Desktop, Claude Code, Cowork, OpenAI Codex, Cursor, Ollama, and LM Studio. Conducted a security review of autonomous agent options (OpenClaw, NanoClaw) and recommended a sandboxed NanoClaw deployment after weighing exfiltration risk, prompt-injection surface, and audit-logging maturity. Identified governance gaps in Cowork (Claude Desktop automation) that drove revised internal policy on autonomous desktop agents. Recommended Datadog for Claude activity observability and governed the library of 45+ skills used by 30 practitioners, including version control, review process, and backup strategy.

Diligence + rollout

Evaluated vendors like a builder and trained the organisation to use the result.

Delivered integration assessments for Google Ads, Reddit Ads, The Trade Desk, LinkedIn Ads, Meta Ads, DV360, StackAdapt, Clay.com, Windsor.ai, and Adverity. Owned the Claude + Firecrawl content audit pipeline with Apify failover. Built the 52-level Claude Field Guide adopted by 65+ practitioners, cutting time-to-first-useful-prompt from 3 days to 2 hours. Shipped an AI Brief Builder standardising input quality across 9 account teams. Codified a reproducible macOS Node.js engineering environment (fnm, pnpm, Starship, Codex) used by 8 internal engineers.

Recent outcomes

Operating outcomes from the current platform.

  • 40% reduction in NetSuite manual reconciliation, ~15 hours / week saved
  • 35% reduction in Snowflake credit consumption, ~£2,800 / month saved
  • £24,000 + 4 months of rebuild avoided by stabilizing the crawl pipeline instead of replacing it
  • £450,000 in media trading decisions informed by structured AdTech API assessments
  • £18,000 + 6 months of self-hosted observability avoided by standardizing on Datadog
  • 3 days → 2 hours time-to-first-useful-prompt across 65+ practitioners trained through the Claude Field Guide
  • 60% reduction in data-to-insight latency from the Windsor.ai vendor selection
Resilience

Self-healing agent architecture, not happy-path demos.

Circuit breakers around external integrations, token-bucket rate limiting, LRU caching with size-and-age eviction, priority queues for approval-gated actions, retry policies with jitter, and per-tool timeouts. Multi-tenant API gateways in Go (Fiber) sustaining 2M+ daily requests at p95 < 100ms when the workload is high-throughput rather than agent-paced. 100% strict TypeScript with Zod validation across 77+ typed API interfaces — the entire agent surface area is auditable in CI before deployment. The point is the second-day failure modes that turn a working demo into a midnight pager.
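
Two of the patterns named above, a circuit breaker and retries with jitter, fit in a short sketch. This is a minimal illustration, not the production gateway code (which the text places in Go and TypeScript). The class names, thresholds, and delays are assumptions chosen for readability.

```python
import random
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; skipping call")
            # Cool-down elapsed: half-open, so one trial call is let through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        self.failures = 0
        self.opened_at = None
        return result


def backoff_delays(attempts, base=0.2, cap=5.0, rng=random.random):
    """Full jitter: delay i is uniform in [0, min(cap, base * 2**i))."""
    return [rng() * min(cap, base * 2 ** i) for i in range(attempts)]


def call_with_retry(breaker, fn, attempts=3, sleep=time.sleep):
    delays = backoff_delays(attempts - 1)
    for attempt in range(attempts):
        try:
            return breaker.call(fn)
        except CircuitOpenError:
            raise  # do not keep hammering a dependency the breaker gave up on
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(delays[attempt])
```

The jitter matters most when many agents retry the same integration at once: randomized delays keep them from returning in lockstep.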

Local + edge inference

Two-tier local stack so 75% of routine work never hits an external API.

Designed a two-tier local inference strategy: Qwen 3.5 27B dense for planning and reasoning, Qwen3-Coder-Next for agentic code execution, with the Claude API reserved for high-stakes decisions. Eliminates external API exposure for 75% of routine internal tasks — relevant when the work is over private operational data and tenant-isolated pipelines. Edge inference (Cloudflare Workers AI, Vercel Edge / Fluid Compute) reserved for latency-bound public surfaces where TLS termination, rate limiting, and AI personalization need to happen at the POP.
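
The routing rule behind that split can be sketched in a few lines. The model labels come from the text; the `Task` shape and the `kind` labels are hypothetical, and how each label maps to a local runtime or endpoint is left out because the source does not state it.

```python
from dataclasses import dataclass

LOCAL_PLANNER = "Qwen 3.5 27B dense"   # planning and reasoning
LOCAL_CODER = "Qwen3-Coder-Next"       # agentic code execution
EXTERNAL_API = "Claude API"            # reserved for high-stakes decisions


@dataclass(frozen=True)
class Task:
    kind: str                 # hypothetical labels such as "plan" or "code"
    high_stakes: bool = False


def route(task: Task) -> str:
    """Pick a model tier; only high-stakes work leaves the local stack."""
    if task.high_stakes:
        return EXTERNAL_API
    if task.kind == "code":
        return LOCAL_CODER
    return LOCAL_PLANNER
```

Keeping the rule this small is deliberate: a routing decision that fits on one screen is one that can be reviewed and logged.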

Computer Use

Vision-driven agents for systems that don't have APIs.

Shipped Computer Use agent workflows on the Anthropic Agent SDK + Computer Use API — letting agents navigate browser interfaces, interact with legacy web apps that lack APIs, and automate multi-step UI tasks via vision-based screen understanding. Closes the gap between "everything must be an MCP server" and the long tail of business-critical tools that aren't.

Roadmap + leadership

FY26 planning baseline owned end-to-end.

Authored the Media Planning & Finance AI Augmentation Roadmap (synced HTML + Word source), presented to leadership as the FY26 planning baseline. AI roadmaps owned end-to-end and held to outcomes, not demos.

Infra underneath

The engineering layer the AI work runs on.

AWS Lambda, Bedrock, S3, IAM, CloudWatch · Snowflake · Docker · Terraform · GitHub Actions · OpenTelemetry, Prometheus, Datadog · Ollama and local Qwen inference · Python, TypeScript, Rust · React, FastAPI, Node.js · vLLM, Triton, TensorRT, MLflow, W&B, PyTorch, DeepSpeed, LoRA, PEFT.

The point is not the tool list. The point is choosing the cheapest controlled path that satisfies latency, privacy, eval, and rollout requirements.

03 — Systems atlas

A systems map you can touch.

Drag the nodes into corners, pull the system apart, and watch the constraints recompose. Models, MCP, OAuth, Snowflake, crawlers, evals, observability, and rollout governance all share one operating surface.

458 governed tools · 1,104 eval tests · 34-page MCP assessment
artifact 01 · tool plane

MCP connector registry

NetSuite, Monday.com, HubSpot, Wrike, Google Ads, Snowflake, LinkedIn, Jasper. The proof is the governed surface: OAuth, scopes, naming, approvals, and failure recovery.

tool: finance.invoice.reconcile
auth: oauth-delegated
scope: read-only + approval-for-write
audit: request, owner, fallback
open registry artifact →
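
One way to read the registry entry above as code: reads pass, writes need a named approver, and every request is logged. This is a sketch under assumptions, not the registry's actual schema. The `owner` and `fallback` values are hypothetical, and the audit record is one interpretation of "request, owner, fallback".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToolContract:
    name: str
    auth: str
    owner: str       # hypothetical: the team accountable for the tool
    fallback: str    # hypothetical: what happens when a call is denied
    audit_log: list = field(default_factory=list)

    def authorize(self, action: str, requester: str,
                  approved_by: Optional[str] = None) -> bool:
        """Read-only by default; writes require an approver. Always logs."""
        if action == "read":
            allowed = True
        elif action == "write":
            allowed = approved_by is not None
        else:
            allowed = False  # unknown actions are denied by default
        self.audit_log.append({
            "request": f"{requester}:{action}",
            "owner": self.owner,
            "approved_by": approved_by,
            "allowed": allowed,
            "fallback": None if allowed else self.fallback,
        })
        return allowed


reconcile = ToolContract(
    name="finance.invoice.reconcile",
    auth="oauth-delegated",
    owner="finance-platform",             # hypothetical owner
    fallback="queue for manual review",   # hypothetical fallback
)
```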
artifact 02 · quality gate

Eval release harness

1,104 tests across 29 suites covering retrieval quality, refusals, latency, regressions, hallucination behavior, and spend before deployment.

retrieval_quality: pass
citation_integrity: pass
refusal_correctness: pass
release_decision: ship with monitor
open eval artifact →
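
A release gate of this kind reduces to a small decision function. The sketch below assumes the three checks shown above are hard gates and that soft signals (for example latency or spend near budget) downgrade a clean pass to "ship with monitor". The gate list, the warning mechanism, and the "block" wording are assumptions, not the harness itself.

```python
HARD_GATES = ("retrieval_quality", "citation_integrity", "refusal_correctness")


def release_decision(results: dict, soft_warnings=()) -> str:
    """Return 'ship', 'ship with monitor', or 'block: <failed gates>'."""
    # A gate that is missing from the results counts as a failure.
    failed = [name for name in HARD_GATES if not results.get(name, False)]
    if failed:
        return "block: " + ", ".join(failed)
    return "ship with monitor" if soft_warnings else "ship"
```

Treating a missing result as a failure keeps a skipped suite from passing as a green build.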
artifact 03 · data plane

Private intelligence stack

Snowflake, Bombora, Leadspace, GWI, local inference. Analyst and agent workflows over private data without leaking sensitive context into every model call.

attempt: tenant A asks tenant B
expected: refusal + no SQL
required: server tenant predicate
blocked: raw rows, private identifiers
open isolation artifact →
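
The isolation case above can be expressed as a query builder that refuses before any SQL exists. This is a minimal sketch: the table, column names, and `tenant_id` column are hypothetical, and the tenant value is bound as a pyformat-style parameter from the server-side session rather than from anything the model wrote.

```python
# Hypothetical allowlist: only aggregate-safe columns are exposed to agents.
ALLOWED_COLUMNS = {"campaign_metrics": {"campaign", "impressions", "spend"}}


class TenantBoundaryError(PermissionError):
    """Raised instead of producing SQL when a request crosses a boundary."""


def build_query(session_tenant: str, requested_tenant: str,
                table: str, columns: list) -> tuple:
    if requested_tenant != session_tenant:
        raise TenantBoundaryError("cross-tenant request refused; no SQL generated")
    allowed = ALLOWED_COLUMNS.get(table)
    if allowed is None:
        raise TenantBoundaryError(f"table not exposed: {table}")
    blocked = [c for c in columns if c not in allowed]
    if not columns or blocked:
        raise TenantBoundaryError(f"columns not exposed: {blocked}")
    # Identifiers come from the allowlist; the tenant predicate is always
    # appended by the server and bound as a parameter.
    sql = (f"SELECT {', '.join(columns)} FROM {table} "
           "WHERE tenant_id = %(tenant_id)s")
    return sql, {"tenant_id": session_tenant}
```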
artifact 04 · identity boundary

Auth failure recovery

Opaque tokens instead of JWTs. A missing OAuth audience parameter on the LinkedIn MCP integration turned into the kind of low-level bug that blocks every downstream agent. Root-caused, fixed, 8 downstream tools unblocked, resolution time cut from 4 days to 2 hours.

symptom: opaque access token
root_cause: missing audience
recovery: JWT claims restored
recheck: downstream account list
open recovery artifact →
artifact 05 · enablement

52-level Claude Field Guide

Training as product, not documentation. Internal adoption moved from scattered prompt lore into an interactive React learning surface.

path: prompt lore -> guided missions
surface: React field guide
audience: 65+ practitioners
result: 3 days -> 2 hours
read writing →
artifact 06 · diligence

Vendor decisions with teeth

34-page Monday.com MCP assessment plus AdTech API reviews. Feasibility, security, licensing, rollout cost, and operating risk in one decision path.

review: capability, auth, cost
risk: rollout + data exposure
decision: pilot / defer / reject
handoff: owner + next check
ask the proof engine →
04 — Chronicle

Twenty-eight years before the current AI chapter.

The current platform work sits on an older record: web systems, regulated data, enterprise search, real-time campaigns, clinical ML, defense software, and the habit of turning pressure into working software.

  1. 1998

    Fordham web systems

    Worked as a university web developer while still a CS student. The degree became formal later; the habit of learning in public systems started here.

  2. 2005

    Operational software for real staff

    Client systems, intranets, commerce, data workflows, and the early muscle memory of making software work for people who did not care about the stack.

  3. 2012

    Real-time creative systems

    Oreo Daily Twist, Super Bowl blackout response, Coca-Cola Polar Bowl, and campaign platforms where timing, polish, and resilience mattered at public scale.

  4. 2018

    Clinical ML and regulated platforms

    HIPAA, SOC 2, PCI, IVF prediction, patient financing, and telehealth scale. Useful AI starts to look less like magic and more like controls around data.

  5. 2023

    Defense and enterprise AI operations

    Classified mission planning, MCP servers, Snowflake governance, local inference, desktop-agent policy, and global operating surfaces across UK, US, and APJ.

  6. 2026

    Production AI as a systems discipline

    The work now centers on governed agents, retrieval quality, eval gates, private-data workflows, and teaching surfaces that let teams operate the system without mythology.

05 — Evidence vault & writing

Public artifacts, dated thinking, and durable notes.

A compact index of the conventional artifacts and public writing that support the work: resume, repositories, case studies, recommendations, field notes, and older archive entries.

Essay 01

MCP is not a tooling problem. It is a governance problem.

Anyone can expose an API to a model. The hard part is deciding what the model is allowed to touch, what gets logged, what requires approval, and what happens when an integration fails halfway through a business process.

The impressive number is not 458 tools. It is the contract discipline that keeps hundreds of tools from becoming hundreds of new ways to lose control.

tool contracts · OAuth boundaries · approval paths
Read the essay →
Essay 02

Most eval harnesses are too impressed with answers.

Answer quality matters, but production systems fail in less flattering ways. Retrieval gets worse after a content migration. Refusals regress. Latency crosses the budget. Spend creeps. A prompt change helps one client and quietly hurts another.

A real eval harness is a release gate for behavior, cost, latency, refusal posture, and retrieval drift, not a scoreboard for pretty generations.

1,104 tests · 29 suites · hard CI gates
RAG & evals case study →
Essay 03

The best agent systems are deliberately unromantic.

The pitch says autonomy. The shipped product needs boring rails: scoped tools, observable plans, deterministic fallbacks, permission checks, and a clear place where a human can say no.

The job is not to make the agent seem alive. The job is to make it safe enough that the business can let it act.

human-in-loop · bounded autonomy · rollback paths
Agent platform case study →
Archive

Public writing should compound, not disappear.

Competitor sites get authority from a dated body of public work: papers, posts, essays, talks, and notes that prove the thinking existed before the current page. Older writing now lives here as an inspectable archive instead of being stranded on platform profiles.

The archive is not here to make every old post equally important. It is here to show a continuous public trail across AI, engineering, career, accessibility, frameworks, and developer education.

49 entries · DEV + Medium + LinkedIn · dated sources
Open writing archive →
06 — Field guide

A practical field guide for production AI judgment.

Eight text-only missions that turn the case-study patterns into local builds: agents, prompting, agentic coding, RAG, vector databases, evals, tool use, and rollout discipline.

8 Lessons
8 Local builds
~6h First pass
$0 Path via Ollama
Python Only prereq
Mission 01

What an agent actually is.

Learn the loop: context, plan, action, observation, state, stop.

Artifact: a tiny task loop that can choose search, summarize, or ask-for-clarification.
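
A first pass at that artifact might look like the sketch below. It is not the lesson's reference solution: a rule-based `choose_action` stands in for the model call, the search function is supplied by the caller, and the word-count threshold and step budget are arbitrary assumptions.

```python
def choose_action(state):
    """Stand-in policy; a real agent would ask a model to pick the action."""
    if len(state["question"].split()) < 3:
        return "ask_for_clarification"
    if not state["observations"]:
        return "search"
    return "summarize"


def run_loop(question, search_fn, max_steps=4):
    state = {"question": question, "observations": [], "trace": [], "answer": None}
    for _ in range(max_steps):                   # stop condition: step budget
        action = choose_action(state)            # plan
        state["trace"].append(action)
        if action == "ask_for_clarification":    # stop condition: need the user
            state["answer"] = "Can you say more about what you need?"
            break
        if action == "search":                   # act, then record the observation
            state["observations"].extend(search_fn(state["question"]))
            continue
        state["answer"] = " ".join(state["observations"])  # summarize and stop
        break
    if state["answer"] is None:
        state["answer"] = "Stopped: step budget exhausted without usable context."
    return state
```

The `trace` list is the part worth keeping when the policy is swapped for a model: it makes every plan step inspectable after the run.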
Mission 02

Prompting as interface design.

Turn prompt writing into contracts, schemas, refusals, and handoffs.

Artifact: one prompt rewritten into a system instruction, task brief, and structured output schema.
Mission 03

Agentic coding without chaos.

Use plans, file boundaries, diffs, tests, and rollback discipline.

Artifact: build the same small feature with two coding agents, then compare plans, patches, and tests.
Mission 04

RAG from first principles.

Build retrieval around chunks, metadata, citations, and answer grounding.

Artifact: a local document Q&A over markdown notes with citations back to source files.
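
The retrieval half of that artifact can start without any model at all. In the sketch below, keyword overlap stands in for embeddings (an explicit simplification), notes are split on blank lines, and each hit carries a citation back to its source file. The file names and the citation format are hypothetical.

```python
import re


def chunk_markdown(source, text):
    """Split a note into paragraph chunks, each tagged with its source file."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return [{"source": source, "chunk": i, "text": b.strip()}
            for i, b in enumerate(blocks) if b.strip()]


def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, chunks, top_k=2):
    """Rank chunks by shared words; chunks with no overlap are dropped."""
    q = tokens(question)
    scored = [(len(q & tokens(c["text"])), c) for c in chunks]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [c for _, c in scored[:top_k]]


def cite(chunks):
    return [f'{c["source"]}#chunk-{c["chunk"]}' for c in chunks]
```

An empty result from `retrieve` is the signal to refuse rather than answer, which is where answer grounding begins.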
Mission 05

Vector databases without mystery.

Compare keyword search, embeddings, filters, reranking, and top-k failure.

Artifact: a search comparison harness over one shared question set.
Mission 06

Evals before belief.

Make golden questions, regression checks, refusal checks, and cost limits.

Artifact: a 20-question eval file for hallucination, retrieval drift, refusals, and latency.
Mission 07

Tool use, MCP, and boundaries.

Define tool contracts, scopes, approval steps, dry runs, and trace logs.

Artifact: one safe tool with an input schema, dry-run mode, permission check, and trace log.
Mission 08

Ship the loop.

Connect prompts, retrieval, tools, evals, traces, cost guards, and fallback paths.

Artifact: a production agent capstone with logs, citations, evals, and a visible decision trail.
07 — Contact

Start with the system that has to change.

For Principal AI Engineer, Staff Applied AI, AI Platform, Agent Systems, RAG, MCP, or fractional technical leadership work.

Portrait of Philip John Basile

Twenty-eight years in. The unfashionable parts of the job — eval gates, rollback paths, latency budgets, the failure modes nobody writes blog posts about — are the parts I find most interesting.

Currently available for senior applied AI roles and fractional technical leadership.

Phil and I talk about AI regularly, especially MLOps. He has solid expertise with LLMs and RAG implementations in particular, plus knows how to put Python to work effectively in AI projects. He'd be a real asset to any team working in this space.
Alan Cafferkey, Ph.D. · AI Implementation Leader | Educational Technology Director | Mission-Driven Innovator

I just recently had the pleasure of working with Philip Basile on a team for an extended period. He was a committed, strong, and dedicated team member. He provided guidance and knowledge to the entire team, from assistance with onboarding and IDE configuration and integration with source control and CI systems to learning the newest offerings in our team's technology stack, followed by documenting and sharing his experience. He immediately became a mentor. Philip brought with him, and shared, an impressive depth of understanding of front-end systems, enterprise architecture, and the intricate interdependence of design, functionality and user experience. With all of this, he consistently produced elegant code, markup, and CSS that provided a comprehensive, engaging, and seamless user experience, catching and handling edge and corner cases gracefully. Philip was easy to work with, cooperative, and delivered constructive feedback in a manner that encouraged others to participate in a healthy and productive peer review process. He made the team stronger and greater than the sum of its parts.
Dennis Luken · Front-End/UI Architect/Developer | Full Stack Angular/React/Node

Phil contributed front-end development to our team as a contractor. He developed the web interface for a number of applications and ensured the user experience requirements were met in both desktop and mobile rendering. As a front-end engineer, the applications required TypeScript/JavaScript using the VueJS framework and Vuex for state management, with back-end data retrieval using REST. Phil ensured very high code coverage and code quality standards were met through unit testing with Jest and Vue Development Utils, end-to-end testing using WebdriverIO, and SonarQube quality scans. Docker environments were also part of the daily development lifecycle. Phil worked well with other team members.
Neil Hall · Web Architect / Full-stack Software Engineer at Phoenix Contact

Tell me what has to work.

A useful first note names the workflow, the data boundary, the failure mode, and who needs to trust the result.

Email directly