01 — Proof engine

I build production AI systems for companies moving past demos.

LLM agents, RAG, MCP, AdTech automation, private-data workflows, and eval gates. The operating layer between frontier models and real companies, built on 28 years of systems craft.

28y shipping software · 1998 Fordham web developer as a student · 458 agent-callable tools, governed · 1,104 eval tests in CI

Show me the receipts. That is the design brief.

Interactive · Gemini RAG proof engine

Ask for proof.

Ask the question a hiring partner would actually ask. The engine retrieves evidence first, then asks Gemini to respond from that bounded context. No generic pitch without a citation trail.
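
The shape is simple enough to sketch. A minimal illustration of the retrieve-then-bound flow, with stand-in sources and naive keyword scoring instead of the page's real index; the Gemini call itself is omitted.

```python
# Sketch only: stand-in sources, naive keyword scoring, and no actual
# Gemini call. The page's engine indexes 52 real sources; this is not its code.
SOURCES = [
    {"id": "case-02-mcp", "text": "458 governed agent tools across 12+ MCP servers"},
    {"id": "case-03-rag", "text": "1,104 eval tests gate every RAG release"},
    {"id": "record-1998", "text": "Started in 1998 as a Fordham web developer"},
]

def retrieve(question: str, k: int = 2) -> list[dict]:
    """Rank sources by naive keyword overlap with the question."""
    terms = set(question.lower().split())
    ranked = sorted(
        SOURCES,
        key=lambda s: len(terms & set(s["text"].lower().split())),
        reverse=True,
    )
    return ranked[:k]

def bounded_prompt(question: str) -> str:
    """Build the prompt the model sees: retrieved evidence only, cited by id."""
    evidence = retrieve(question)
    context = "\n".join(f"[{s['id']}] {s['text']}" for s in evidence)
    return (
        "Answer using ONLY the sources below and cite source ids inline.\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )

print(bounded_prompt("How are RAG releases tested?"))
```

The constraint is the point: the model only ever sees retrieved, citable context, never the open question alone.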

28y shipping software · 3 regions of operations (UK, US, APJ) · 1,104 eval tests in CI
52 sources indexed · retrieval refreshed in the browser
01 · Question · Hiring-grade ask
02 · Retrieval · Career brain + proof inventory
03 · Citations · Ranked trace
04 · Answer · Bounded context
05 · Artifact · Case study or packet
Proof ledger

The claims stay attached to sources.

A tighter inventory of what this page is proving: current platform ownership, governed agent tooling, and eval-backed delivery.

Current role · platform remit

AI strategy with operating scope.

Current work spans MCP server architecture, LLM agent systems, Snowflake governance, vendor procurement, local inference strategy, and desktop-agent policy.

Signal: this is live platform ownership, not a generic AI interest.
MCP layer · governed tools

Tool count becomes a contract problem.

Hundreds of callable capabilities are framed around scopes, naming, approvals, logging, reusable skills, and private-data boundaries.

Signal: the proof is governance and repeatability, not a big number by itself.
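
What a tool contract can look like, sketched minimally in Python. The registry, field names, and example tool are illustrative, not the production schema.

```python
# Sketch of the contract framing above: every callable capability is
# registered with a scope, an approval rule, and an audit hook.
# Names and fields are hypothetical.
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class ToolContract:
    name: str                # e.g. "finance.invoice.reconcile"
    scope: str               # "read" or "write"
    requires_approval: bool  # human sign-off before side effects
    handler: Callable[[dict], dict]

REGISTRY: dict[str, ToolContract] = {}

def register(contract: ToolContract) -> None:
    """Reject unnamespaced or duplicate tools before an agent can see them."""
    if "." not in contract.name or contract.name in REGISTRY:
        raise ValueError(f"contract rejected: {contract.name}")
    REGISTRY[contract.name] = contract

def call(name: str, args: dict, approved: bool = False) -> dict:
    """Every call is logged; write-scoped tools block without approval."""
    tool = REGISTRY[name]
    if tool.requires_approval and not approved:
        return {"status": "pending_approval", "tool": name}
    print(f"audit: {name} scope={tool.scope} args={args}")
    return tool.handler(args)

register(ToolContract("crm.contact.lookup", "read", False, lambda a: {"ok": True}))
print(call("crm.contact.lookup", {"email": "x@example.com"}))
```

The shape matters more than the fields: a capability that is not registered with a scope and an approval rule never reaches an agent.
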
Eval layer · release control

Answers ship through gates.

Retrieval quality, refusals, latency, hallucination behavior, regressions, and spend are treated as deploy gates, not post-demo cleanup.

Signal: the proof surface shows how AI behavior is allowed to ship.
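
A minimal sketch of an eval gate wired into CI. The metric names and thresholds are illustrative stand-ins for the real harness.

```python
# Sketch of a release gate: the deploy fails unless every metric clears its
# threshold. Metrics and limits are illustrative, not the 1,104-test suite.
import sys

THRESHOLDS = {
    "retrieval_recall": 0.85,     # minimum acceptable
    "refusal_correctness": 0.95,  # minimum acceptable
    "hallucination_rate": 0.02,   # maximum acceptable
    "p95_latency_ms": 300,        # maximum acceptable
    "cost_per_query_usd": 0.01,   # maximum acceptable
}

def gate(results: dict[str, float]) -> bool:
    failures = []
    for metric, limit in THRESHOLDS.items():
        value = results[metric]
        higher_is_better = metric in ("retrieval_recall", "refusal_correctness")
        ok = value >= limit if higher_is_better else value <= limit
        if not ok:
            failures.append(f"{metric}: {value} vs {limit}")
    for failure in failures:
        print("GATE FAIL", failure)
    return not failures

if __name__ == "__main__":
    run = {"retrieval_recall": 0.88, "refusal_correctness": 0.97,
           "hallucination_rate": 0.01, "p95_latency_ms": 240,
           "cost_per_query_usd": 0.008}
    sys.exit(0 if gate(run) else 1)  # nonzero exit blocks the deploy
```

The nonzero exit is the mechanism: the pipeline, not someone skimming a dashboard, decides whether the release ships.
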
02 — Full 360

AI is the current chapter. The advantage is the whole arc.

I started in 1998 as a Fordham University web developer while still a student. I lived the degree before it was official, then kept learning after the degree was already out of date. The career since then is not a random stack list; it is the craft record behind the AI work.

1998 → now

Models are the flour. Experience is the bake.

Everyone has access to AI now. Everyone also has access to flour. The difference is knowing how to combine ingredients, control heat, recover when something goes wrong, and make something worth serving. My edge is the 28-year operating record behind the prompts: self-taught, still learning, and always turning the work into a path other people can climb.

28y software craft
40+ systems shipped
458 governed AI tools
1,104 eval tests
  1. 01 · 1998

    Web foundations

    Fordham web development while still a CS student; the degree became formal later, but the work was already real.

  2. 02 · Data systems

    Operational software

    Nextel commissions data, Intrepid web and intranet modernization, and early Basilecom client systems where software had to serve real staff.

  3. 03 · Creative velocity

    Audience and taste

    360i real-time campaigns, Oreo Daily Twist, Cannes Grand Prix work, and BaubleBar launches taught timing, polish, and pressure.

  4. 04 · Regulated scale

    Systems with consequences

    Teladoc through NYSE-debut scale and IntegraMed clinical ML under HIPAA, SOC 2, PCI, patient-data, and uptime constraints.

  5. 05 · Enterprise + defense

    Trust boundaries

    IBM search, Atlas Air logistics, Dragos OT security, and Air Force mission planning gave the AI work its governance instincts.

  6. 06 · Now

    AI operating layer

    MCP, RAG, eval gates, local inference, prompt-injection defense, tenant isolation, and human approval paths for real organizations.

How the arc compounds

Self-taught never meant isolated. The pattern is learn the thing, ship the thing, document the thing, and raise the people around it. I have coached engineers into senior roles, compressed onboarding paths, built field guides, led interviews for clients, and usually tried to lift the team higher than myself. That is why the AI work is not just prompt skill; it is judgment turned into systems other people can trust and use.

03 — Hiring fit

Where the record is strongest right now.

A buyer or hiring team should not have to read the whole site to know whether the fit is real. These are the lanes where the proof, case studies, and public artifacts line up cleanly.

01 · Agent governance

Enterprise tools without uncontrolled autonomy.

MCP servers, OAuth boundaries, tool contracts, approvals, logging, desktop-agent policy, and recovery when integrations behave differently than the demo.

Best evidence: MCP platform + systems atlas
02 · RAG quality

Retrieval systems with release gates.

Private knowledge workflows, citation behavior, regression suites, refusal checks, cost budgets, latency targets, and evals that block bad releases.

Best evidence: RAG eval harness
03 · Private-data AI

AI workflows over data that cannot leak.

Tenant isolation, Snowflake roles, local inference, approval queues, NL-to-SQL guardrails, and useful analyst interfaces over sensitive operational data.

Best evidence: marketing intelligence case
04 · Technical leadership

Ambiguous AI work turned into operating systems.

Vendor diligence, platform architecture, adoption surfaces, field guides, internal standards, and the translation between executives, operators, and engineers.

Best evidence: current platform record
Not the right lane

I am not optimizing this site for frontier model research roles, academic publication paths, brand-only AI strategy, or prompt coaching without implementation authority. The strongest fit is senior applied AI, production systems, platform ownership, and fractional technical leadership.

04 — Selected work

Three production systems show the pattern.

Private data, governed tools, measurable quality, and business workflows that survive outside the demo. The 458-tool number is the total agent-callable surface across MCP tools, custom tools, Claude skills, and workflow actions — the architecture includes 12+ production MCP server integrations, not 458 separate MCP servers.

Evidence vault · Each case links claim, artifact, gate, and operating result.
Case 01 / Private data

Multi-tenant marketing intelligence — 11 client accounts with audited tenant-isolation controls.

Eleven brands, 2.3M private rows, 31 Python modules, 80+ REST endpoints, 20+ AI features. Local Qwen and Ollama inference for sensitive data. NL-to-SQL, attribution, forecasting, anomaly detection. Three layers of tenant isolation, audited.

11 Tenants
2.3M Rows
80+ Endpoints
Read the case study
Case 02 / Agents

The MCP agent platform — 458 governed tools, not 458 prompts.

Enterprise MCP work across NetSuite, Monday.com, HubSpot, Wrike, Google Ads, Adverity, Apify, Snowflake, LinkedIn, and Jasper. The impressive part is not tool count. It is reducing hundreds of capabilities into stable contracts, OAuth boundaries, approval paths, and reusable skills so teams can ship without prompt folklore.

458 Tools
23 Skills
−87% Research time
Read the case study
Case 03 / RAG & evals

Production RAG with 1,104 tests gating every release.

Five RAG systems at roughly 50 QPS, sub-300ms P95, retrieval accuracy up 40%, inference cost down 65%. RAGAS, LangSmith, and Promptfoo gates for retrieval quality, hallucination behavior, latency ceilings, refusal correctness, and spend. Most evals only test answer quality. These don't.

5 Systems
<300ms P95
−65% Cost
Read the case study
05 — Career proof beyond the AI cycle

28 years of production systems underneath the AI work.

Started as a Fordham web developer while still a student, then kept self-teaching as each official credential aged. Classified defense, HIPAA-regulated clinical data, NYSE-scale telehealth, global logistics, commerce, enterprise search, and Cannes-winning advertising platforms became the operating base the current AI platform sits on.

28 years shipping · 12 time zones led · 5 engineers coached
  • 01 · U.S. Air Force AMC: Classified mission planning at Scott AFB
  • 02 · IBM: Global search, 3× query speed, 80% cost reduction, CTO commendation
  • 03 · Teladoc Health: NYSE-debut scale, 12.2M → 15.1M members, sub-2-second WebRTC connect time
  • 04 · IntegraMed: 50+ clinics, 40K+ IVF cycles, ~30% prediction improvement, $5M+ ML impact
  • 05 · BaubleBar: $10M+ platform revenue, 30% conversion lift, 100K+ concurrent launch users
  • 06 · 360i / Dentsu: Oreo Daily Twist, Super Bowl blackout, sub-5-min content decisions, millions concurrent
Systems shipped across high-stakes environments

Where the work actually had to survive.

Defense
U.S. Air Force AMC (Scott AFB), classified mission planning
Healthcare & regulated
Teladoc Health, IntegraMed Fertility, Bayer HealthCare — HIPAA, SOC 2 Type II, PCI DSS, FedRAMP-aligned controls, WCAG 2.2 AA
Enterprise & logistics
IBM, Atlas Air, Dragos, ADP, FIS, Phoenix Contact, Shubert Ticketing, Bremer Bank, IFF
Creative & AdTech
360i / Dentsu, Publicis Groupe, Fox Sports, Cannes Grand Prix & SABRE Gold campaigns
Adjacent systems background

Real-time, simulation, creative — design lineage, not the brand.

Before the current agent platform work, I spent years on real-time systems, game AI, interactive campaigns, creative tools, and simulation workflows. Unreal Engine 5 (Epic Games-recognized developer since 2014, Nintendo and Sony licensed), Unity 6, behavior trees, AI perception, navigation, digital twins, ONNX integration. Plus the creative-AI stack — Stable Diffusion, Adobe Firefly, Midjourney, Sora, LoRA, ControlNet, ComfyUI, Creative Cloud integration — wired into multimodal pipelines that cut on-brand asset production time ~40%, with LoRA / PEFT fine-tuning delivering ~25% performance improvement on domain-specific generation tasks. That background shapes how I design agents today: state, memory, perception, pathing, tool choice, fallback behavior, and decisions under frame-time constraints. The site's main lane is still production AI for enterprise — this is the design lineage underneath it.

06 — Evidence ledger

Every major claim has a trail.

A case study, public artifact, resume entry, repo, field note, or operating plan behind every line.

AI platform ownership

Enterprise operating layer, not model research theater.

Sole technical decision-maker for AI platform strategy across UK, US, and APJ operations: MCP architecture, LLM agent systems, Snowflake governance, vendor procurement, local inference, and desktop-agent policy.

Open proof
Agent systems

458 governed tools with contracts, scopes, and approval paths.

The claim is not tool count. The evidence is the governed surface: OAuth, naming, scope control, reusable Claude skills, failure recovery, and human approval boundaries.

Case study
Quality gates

1,104 eval and regression tests before RAG deployment.

Retrieval quality, hallucination behavior, refusal posture, latency, cost, and regressions are treated as release gates instead of demo polish.

Case study
Private data

11 tenants, 2.3M rows, audited isolation controls.

Prompt constraints, SQL validation, server-side tenant injection, local inference paths, and approval queues that reduce cross-tenant leakage risk over private client data.

Case study
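
One of those controls, sketched minimally: server-side tenant injection, assuming a hypothetical table allow-list and column names. The production controls are broader than this.

```python
# Sketch: validate model-generated SQL, then rewrite every allowed table into
# a tenant-filtered subquery. The tenant value is bound server-side and never
# comes from the model's text. Table and column names are hypothetical.
import re

ALLOWED_TABLES = {"campaign_metrics"}

def scope_query(model_sql: str, tenant_id: str) -> tuple[str, dict]:
    sql = model_sql.strip().rstrip(";")
    if not re.fullmatch(r"(?is)select\b[^;]*", sql):
        raise ValueError("refused: only single SELECT statements")
    if not any(t in sql.lower() for t in ALLOWED_TABLES):
        raise ValueError("refused: table not on the allow-list")
    for table in ALLOWED_TABLES:
        sql = re.sub(
            rf"\b{table}\b",
            f"(SELECT * FROM {table} WHERE tenant_id = %(tenant_id)s) AS {table}",
            sql,
            flags=re.IGNORECASE,
        )
    return sql, {"tenant_id": tenant_id}

scoped, params = scope_query(
    "SELECT channel, SUM(spend) FROM campaign_metrics GROUP BY channel",
    tenant_id="tenant_a",
)
print(scoped)
print(params)
```

The tenant predicate never comes from model output, so a prompt that asks about another tenant has nothing to leak.
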
Career depth

Twenty-eight years shipping systems where failure is visible.

NYSE-scale telehealth, classified mission planning, global flight scheduling, clinical ML, commerce launch scale, and Cannes Grand Prix interactive work before the AI platform layer.

Record
Teaching surface

A free field guide built from production patterns.

Eight text-only missions, local artifacts, knowledge checks, persisted progress, and a self-issued certificate. The course supports the production thesis instead of replacing the portfolio.

Course
07 — Market map

Where this sits in the AI engineer market.

The competitor set is not one market. It splits into research authority, education authority, agent tooling companies, and enterprise AI operators. This site should win only one of those lanes: production AI systems inside real organizations.

Research authority

Frontier labs and university faculty.

Their edge: papers, citations, labs, students, awards, foundation-model history, and institutional reputation.

Our answer: do not pretend to be that. Translate frontier capability into controlled production systems: identity, tools, retrieval, evals, observability, and rollout policy.

Education authority

Massive courses and public teachers.

Their edge: learner scale, lectures, books, certificates, testimonials, and established teaching brands.

Our answer: a narrower field guide from production work. The course exists to expose judgment, not to compete with university-scale AI education.

Agent tooling companies

Platforms, frameworks, docs, and adoption metrics.

Their edge: product gravity, downloads, customer logos, SDKs, docs, changelogs, and ecosystem ownership.

Our answer: be the operator who chooses, wires, governs, evaluates, and recovers those tools inside a business with private data and real risk.

Enterprise AI operators

The quiet lane with the most buying intent.

Their edge: many have strong private experience but weak public surfaces, because the work sits behind client systems and internal documents.

Our answer: make the private work legible without leaking it: redacted case studies, artifact ledgers, system maps, public course material, and a proof engine.

The positioning is simple: not the best researcher, not the biggest teacher, not a tooling vendor. A production AI systems engineer with proof that the operating layer has already been built.

Trace the proof →
Competitive lane

I am not positioning as a frontier model researcher. I build the enterprise operating layer that lets teams use frontier models safely: tools, identity, retrieval, evals, governance, rollout, and recovery when the system fails in public.

pjb@principal-ai:~/platform$ tree --color=always
458 tools · 12 MCP servers · 11 tenants · sub-300ms P95 · −65% inference cost
MCP + identity

Connected Claude to enterprise systems without turning auth into folklore.

Architected 12+ production MCP server integrations across NetSuite, Monday.com, HubSpot, Wrike, Google Ads, Adverity, Apify, Snowflake, LinkedIn, and Jasper. Root-caused a critical LinkedIn MCP OAuth failure — missing audience parameter producing opaque tokens instead of JWTs — that unblocked 8 downstream tools and cut resolution from 4 days to 2 hours. Patched a Google Ads list_accessible_customers parsing bug affecting multi-account access. Identity boundary on Auth0 with OAuth 2.1 / OIDC, modern Passkeys / FIDO2 supported for end-user surfaces.
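
For anyone who has not hit this failure mode, a hedged sketch with placeholder ids: without the audience field, some authorization servers mint opaque tokens, and every downstream tool that expects JWT claims fails with unhelpful errors.

```python
# Sketch of the failure mode, not the fix as shipped. Endpoint, ids, and the
# audience URL are placeholders; the check is the useful part.
import base64
import json

def is_jwt(token: str) -> bool:
    """A JWT has three base64url segments with a JSON payload; an opaque
    token does not decode."""
    parts = token.split(".")
    if len(parts) != 3:
        return False
    try:
        payload = parts[1] + "=" * (-len(parts[1]) % 4)
        json.loads(base64.urlsafe_b64decode(payload))
        return True
    except ValueError:
        return False

def token_request(include_audience: bool) -> dict:
    body = {
        "grant_type": "client_credentials",
        "client_id": "PLACEHOLDER",
        "client_secret": "PLACEHOLDER",
    }
    if include_audience:
        # Without this field, the authorization server may mint an opaque
        # token that no downstream tool can decode into claims.
        body["audience"] = "https://api.example.com/linkedin-mcp"
    return body

# Guard before wiring a token into eight downstream tools:
# assert is_jwt(access_token), "opaque token: check the audience parameter"
```

The guard is cheap; the four days of opaque-token debugging were not.
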

Data + content

Made the data plane usable by agents and analysts.

Architected the Snowflake role hierarchy, including an AUDIENCEINTELLIGENCE role and a tiered analyst account model serving 25 users across 3 regions. Consolidated 40 legacy roles in a cross-regional governance cleanup, dropping credit consumption 35% (~£2,800/month). Integrated Bombora intent + Leadspace firmographics + GWI audience data into a Snowflake + Claude + Jasper pipeline (12 analysts on previously vendor-portal-locked queries), then built the custom content-pack-pipeline Jasper MCP server for four enterprise B2B clients.

Governance

Built the operating rules around the tools.

Defined the tooling architecture across Claude API, Claude Desktop, Claude Code, Cowork, OpenAI Codex, Cursor, Ollama, and LM Studio. Conducted a security review of autonomous agent options (OpenClaw, NanoClaw) — recommended sandboxed NanoClaw deployment after weighing exfiltration risk, prompt-injection surface, and audit-logging maturity. Identified governance gaps in Cowork (Claude Desktop automation) that drove revised internal policy on autonomous desktop agents. Recommended Datadog for Claude activity observability and governed the 45+ skill library (30 practitioners) including version control, review process, and backup strategy.

Diligence + rollout

Evaluated vendors like a builder and trained the organisation to use the result.

Delivered integration assessments for Google Ads, Reddit Ads, The Trade Desk, LinkedIn Ads, Meta Ads, DV360, StackAdapt, Clay.com, Windsor.ai, and Adverity. Owned the Claude + Firecrawl content audit pipeline with Apify failover. Built the 52-level Claude Field Guide adopted by 65+ practitioners (3 days → 2 hours time-to-first-useful-prompt). Shipped an AI Brief Builder standardising input quality across 9 account teams. Codified a reproducible macOS Node.js engineering environment (fnm, pnpm, Starship, Codex) used by 8 internal engineers.

Recent outcomes

Operating outcomes from the current platform.

  • 40% reduction in NetSuite manual reconciliation, ~15 hours / week saved
  • 35% reduction in Snowflake credit consumption, ~£2,800 / month saved
  • £24,000 + 4 months of rebuild avoided by stabilizing the crawl pipeline instead of replacing it
  • £450,000 in media trading decisions informed by structured AdTech API assessments
  • £18,000 + 6 months of self-hosted observability avoided by standardizing on Datadog
  • 3 days → 2 hours time-to-first-useful-prompt across 65+ practitioners trained through the Claude Field Guide
  • 60% reduction in data-to-insight latency from the Windsor.ai vendor selection
Resilience

Self-healing agent architecture, not happy-path demos.

Circuit breakers around external integrations, token-bucket rate limiting, LRU caching with size-and-age eviction, priority queues for approval-gated actions, retry policies with jitter, and per-tool timeouts. Multi-tenant API gateways in Go (Fiber) sustaining 2M+ daily requests at p95 < 100ms when the workload is high-throughput rather than agent-paced. 100% strict TypeScript with Zod validation across 77+ typed API interfaces — the entire agent surface area is auditable in CI before deployment. The point is the second-day failure modes that turn a working demo into a midnight pager.
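
Two of those rails, sketched minimally with illustrative thresholds rather than the deployed settings.

```python
# Sketch: a per-integration circuit breaker plus a retry policy with
# exponential backoff and jitter. Thresholds are illustrative.
import random
import time

class CircuitBreaker:
    def __init__(self, max_failures: int = 3, cooldown_s: float = 30.0):
        self.max_failures, self.cooldown_s = max_failures, cooldown_s
        self.failures, self.opened_at = 0, 0.0

    def allow(self) -> bool:
        if self.failures < self.max_failures:
            return True
        # Half-open after the cooldown so one probe can close the circuit.
        return (time.monotonic() - self.opened_at) > self.cooldown_s

    def record(self, ok: bool) -> None:
        if ok:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()

def call_with_retry(fn, breaker: CircuitBreaker, attempts: int = 4):
    if not breaker.allow():
        raise RuntimeError("circuit open: integration is failing, back off")
    for attempt in range(attempts):
        try:
            result = fn()
            breaker.record(ok=True)
            return result
        except Exception:
            breaker.record(ok=False)
            if attempt == attempts - 1:
                raise
            # Jittered exponential backoff so retries don't stampede.
            time.sleep((2 ** attempt) * 0.2 + random.uniform(0, 0.2))

# usage: call_with_retry(lambda: some_integration(), CircuitBreaker())
```
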

Local + edge inference

Two-tier local stack so 75% of routine work never hits an external API.

Designed a two-tier local inference strategy: Qwen 3.5 27B dense for planning and reasoning, Qwen3-Coder-Next for agentic code execution, with the Claude API reserved for high-stakes decisions. Eliminates external API exposure for 75% of routine internal tasks — relevant when the work is over private operational data and tenant-isolated pipelines. Edge inference (Cloudflare Workers AI, Vercel Edge / Fluid Compute) reserved for latency-bound public surfaces where TLS termination, rate limiting, and AI personalization need to happen at the POP.
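
The routing rule itself is small. A sketch of the decision, with illustrative tier names and criteria standing in for the deployed policy.

```python
# Sketch of the routing rule, not the deployed policy: sensitive or routine
# work stays local, high-stakes calls go to the hosted API.
from dataclasses import dataclass

@dataclass
class Task:
    contains_private_data: bool
    high_stakes: bool           # e.g. customer-facing or irreversible
    needs_code_execution: bool

def route(task: Task) -> str:
    if task.contains_private_data or not task.high_stakes:
        # Routine internal work lands here and never leaves the box.
        return "local:qwen-coder" if task.needs_code_execution else "local:qwen-dense"
    return "hosted:claude-api"  # reserved for high-stakes decisions

print(route(Task(contains_private_data=True, high_stakes=False, needs_code_execution=False)))
```
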

Computer Use

Vision-driven agents for systems that don't have APIs.

Shipped Computer Use agent workflows on the Anthropic Agent SDK + Computer Use API — letting agents navigate browser interfaces, interact with legacy web apps that lack APIs, and automate multi-step UI tasks via vision-based screen understanding. Closes the gap between "everything must be an MCP server" and the long tail of business-critical tools that aren't.

Roadmap + leadership

FY26 planning baseline owned end-to-end.

Authored the Media Planning & Finance AI Augmentation Roadmap (synced HTML + Word source), presented to leadership as the FY26 planning baseline. AI roadmaps owned end-to-end and held to outcomes, not demos.

Infra underneath

The engineering layer the AI work runs on.

AWS Lambda, Bedrock, S3, IAM, CloudWatch · Snowflake · Docker · Terraform · GitHub Actions · OpenTelemetry, Prometheus, Datadog · Ollama and local Qwen inference · Python, TypeScript, Rust · React, FastAPI, Node.js · vLLM, Triton, TensorRT, MLflow, W&B, PyTorch, DeepSpeed, LoRA, PEFT.

The point is not the tool list. The point is choosing the cheapest controlled path that satisfies latency, privacy, eval, and rollout requirements.

09 — Systems atlas

The operating map behind the work.

One graph for the production surface: models, MCP servers, OAuth, Snowflake, Jasper, crawlers, eval gates, observability, and rollout governance.

458 governed tools · 1,104 eval tests · 34-page MCP assessment
production-ai-os / systems atlas / evidence graph
artifact 01 · tool plane

MCP connector registry

NetSuite, Monday.com, HubSpot, Wrike, Google Ads, Snowflake, LinkedIn, Jasper. The proof is the governed surface: OAuth, scopes, naming, approvals, and failure recovery.

tool: finance.invoice.reconcile
auth: oauth-delegated
scope: read-only + approval-for-write
audit: request, owner, fallback
open registry artifact →
artifact 02 · quality gate

Eval release harness

1,104 tests across 29 suites covering retrieval quality, refusals, latency, regressions, hallucination behavior, and spend before deployment.

retrieval_quality: pass
citation_integrity: pass
refusal_correctness: pass
release_decision: ship with monitor
open eval artifact →
artifact 03 · data plane

Private intelligence stack

Snowflake, Bombora, Leadspace, GWI, local inference. Analyst and agent workflows over private data without leaking sensitive context into every model call.

attempt: tenant A asks tenant B
expected: refusal + no SQL
required: server tenant predicate
blocked: raw rows, private identifiers
open isolation artifact →
artifact 04 · identity boundary

Auth failure recovery

Opaque tokens instead of JWTs. A missing OAuth audience parameter on the LinkedIn MCP integration turned into the kind of low-level bug that blocks every downstream agent. Root-caused, fixed, 8 downstream tools unblocked, resolution time cut from 4 days to 2 hours.

symptom: opaque access token
root_cause: missing audience
recovery: JWT claims restored
recheck: downstream account list
open recovery artifact →
artifact 05 · enablement

52-level Claude Field Guide

Training as product, not documentation. Internal adoption moved from scattered prompt lore into an interactive React learning surface.

path: prompt lore -> guided missions
surface: React field guide
audience: 65+ practitioners
result: 3 days -> 2 hours
read writing →
artifact 06 · diligence

Vendor decisions with teeth

34-page Monday.com MCP assessment plus AdTech API reviews. Feasibility, security, licensing, rollout cost, and operating risk in one decision path.

review: capability, auth, cost
risk: rollout + data exposure
decision: pilot / defer / reject
handoff: owner + next check
ask the proof engine →
10 — Recommendations

Endorsed by peers across seven domains.

AI implementation, technical leadership, quality gates, delivery, and client engagement.

Phil and I talk about AI regularly, especially MLOps. He has solid expertise with LLMs and RAG implementations in particular, plus knows how to put Python to work effectively in AI projects. He'd be a real asset to any team working in this space.
Alan Cafferkey, Ph.D. · AI Implementation Leader | Educational Technology Director | Mission-Driven Innovator
I just recently had the pleasure of working with Philip Basile on a team for an extended period. He was a committed, strong, and dedicated team member. He provided guidance and knowledge to the entire team, from assistance with onboarding and IDE configuration and integration with source control and CI systems to learning the newest offerings in our team's technology stack, followed by documenting and sharing his experience. He immediately became a mentor. Philip brought with him, and shared, an impressive depth of understanding of front-end systems, enterprise architecture, and the intricate interdependence of design, functionality and user experience. With all of this, he consistently produced elegant code, markup, and CSS that provided a comprehensive, engaging, and seamless user experience, catching and handling edge and corner cases gracefully. Philip was easy to work with, cooperative, and delivered constructive feedback in a manner that encouraged others to participate in a healthy and productive peer review process. He made the team stronger and greater than the sum of its parts.
Dennis Luken · Front-End/UI Architect/Developer | Full Stack Angular/React/Node
Phil contributed front-end development to our team as a contractor. He developed the web interface for a number of applications and ensured the user experience requirements were met in both desktop and mobile rendering. As a front-end engineer, the applications required TypeScript/JavaScript using the VueJS framework and Vuex for state management, with back-end data retrieval using REST. Phil ensured very high code coverage and code quality standards were met through unit testing with Jest and Vue Development Utils, end-to-end testing using WebdriverIO, and SonarQube quality scans. Docker environments were also part of the daily development lifecycle. Phil worked well with other team members.
Neil Hall · Web Architect / Full-stack Software Engineer at Phoenix Contact
It was a pleasure working with Philip. He excelled at developing multiple UI components simultaneously while integrating with various microservices that utilized RESTful APIs. Philip displayed extraordinary communication skills while implementing UX designs by effectively detailing any blockers, inconsistent documentation, or missing requirements. He has also been able to confidently demonstrate and document his completed work. Philip always brings positive energy to meetings and discussions. He works well in teams and can quickly adapt to changes in organizational structure.
Tyler Rieke · Principal Software Engineer at Dragos, Inc.
Philip is an exceptional front-end developer and IT professional. I had the pleasure of working with him on a few challenging client engagements, and his ability to quickly step up in a lead capacity, drive work, and produce results was greatly appreciated. Combined with his technical skill, his professionalism and personality makes him a key asset to any team.
John Robinson · IT Director at RSC Solutions
Phil consistently produces new ideas and approaches to improve code or streamline development processes. He has a knack for identifying potential issues early and developing creative solutions to complex challenges. His passion for innovation makes him an asset in providing fresh perspectives during technical discussions. Phil is an asset to any team seeking an influential contributor with an innovative mindset.
Patrick Gross · Consulting Member of Technical Staff at Oracle
Philip is a UX Master, and a front-end usability champion. He has deep knowledge of CSS, JavaScript and User Experience Design. He improved the overall quality of Teladoc's website experience, and made the product better. Besides front-end, Philip is a self-starter who always keeps improving himself, and I know he has become quite good at backend development too. Finally, Philip is a great co-worker, reliable and with a fantastic sense of humor. He would be a valuable and important member of any team he joins or leads.
Joseph Hurtado · Tech Project Manager | Lead Agile Consultant

See all 37 recommendations →

11 — Writing

A public body of work, not a feed.

Native essays, older posts, and source-labeled archives brought back onto the site so the thinking is easy to inspect by date, platform, and topic.

Position 01

MCP is not a tooling problem. It is a governance problem.

Anyone can expose an API to a model. The hard part is deciding what the model is allowed to touch, what gets logged, what requires approval, and what happens when an integration fails halfway through a business process.

The impressive number is not 458 tools. It is the contract discipline that keeps hundreds of tools from becoming hundreds of new ways to lose control.

tool contracts · OAuth boundaries · approval paths
Ask the portfolio about MCP →
Position 02

Most eval harnesses are too impressed with answers.

Answer quality matters, but production systems fail in less flattering ways. Retrieval gets worse after a content migration. Refusals regress. Latency crosses the budget. Spend creeps. A prompt change helps one client and quietly hurts another.

A real eval harness is a release gate for behavior, cost, latency, refusal posture, and retrieval drift, not a scoreboard for pretty generations.

1,104 tests · 29 suites · hard CI gates
RAG & evals case study →
Position 03

The best agent systems are deliberately unromantic.

The pitch says autonomy. The shipped product needs boring rails: scoped tools, observable plans, deterministic fallbacks, permission checks, and a clear place where a human can say no.

The job is not to make the agent seem alive. The job is to make it safe enough that the business can let it act.

human-in-loop · bounded autonomy · rollback paths
Agent platform case study →
Position 04

Public writing should compound, not disappear.

Competitor sites get authority from a dated body of public work: papers, posts, essays, talks, and notes that prove the thinking existed before the current page. Older writing now lives here as an inspectable archive instead of being stranded on platform profiles.

The archive is not here to make every old post equally important. It is here to show a continuous public trail across AI, engineering, career, accessibility, frameworks, and developer education.

49 entries · DEV + Medium + LinkedIn · dated sources
Open writing archive →
12 — Field Guide

A field guide that exposes the production judgment behind the work.

Not a course business. A public artifact that exposes the judgment behind the case studies: agents, agentic coding, RAG, vector databases, evals, tool use, and rollout discipline — translated into eight text-only missions with hands-on builds.

8 Lessons
8 Local builds
~6h First pass
$0 Path via Ollama
Python Only prereq
Production AI field guide · 8 missions · each card opens its lesson
Mission 01

What an agent actually is.

Learn the loop: context, plan, action, observation, state, stop.

Artifact: a tiny task loop that can choose search, summarize, or ask-for-clarification.
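
A minimal sketch of that artifact, with a stub planner standing in for a model, to show how small the loop really is.

```python
# Sketch of the Mission 01 artifact: a tiny loop over context, plan, action,
# observation, state, stop. The "planner" is a stub, not a model.
def plan(question: str, notes: list[str]) -> str:
    if "?" not in question:
        return "clarify"
    return "summarize" if notes else "search"

def run(question: str, max_steps: int = 4) -> str:
    notes: list[str] = []                                   # state
    for _ in range(max_steps):
        action = plan(question, notes)                      # plan
        if action == "clarify":
            return "Can you rephrase that as a question?"
        if action == "search":
            notes.append(f"stub result for: {question}")    # action + observation
        else:
            return "Summary: " + "; ".join(notes)           # stop
    return "Stopped: step budget reached."

print(run("What gates a RAG release?"))
```
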
Mission 02

Prompting as interface design.

Turn prompt writing into contracts, schemas, refusals, and handoffs.

Artifact: one prompt rewritten into a system instruction, task brief, and structured output schema.
Mission 03

Agentic coding without chaos.

Use plans, file boundaries, diffs, tests, and rollback discipline.

Artifact: the same small feature with two coding agents, then compare plans, patches, and tests.
Mission 04

RAG from first principles.

Build retrieval around chunks, metadata, citations, and answer grounding.

Artifact: a local document Q&A over markdown notes with citations back to source files.
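
A minimal sketch of the same artifact: naive keyword retrieval over a hypothetical notes/ directory, with a file-and-paragraph citation kept on every chunk.

```python
# Sketch only: the notes/ directory, chunking rule, and scoring are stand-ins.
from pathlib import Path

def load_chunks(notes_dir: str = "notes") -> list[dict]:
    chunks = []
    for path in Path(notes_dir).glob("*.md"):
        for i, para in enumerate(path.read_text().split("\n\n")):
            if para.strip():
                chunks.append({"source": f"{path.name}#para-{i}", "text": para})
    return chunks

def answer(question: str, chunks: list[dict], k: int = 3) -> str:
    terms = set(question.lower().split())
    ranked = sorted(chunks, key=lambda c: len(terms & set(c["text"].lower().split())),
                    reverse=True)[:k]
    if not ranked:
        return "No grounded answer: nothing retrieved."
    cited = "\n".join(f"- {c['text'][:80]}  [{c['source']}]" for c in ranked)
    return f"Evidence for: {question}\n{cited}"

print(answer("How do eval gates work?", load_chunks()))
```
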
Mission 05

Vector databases without mystery.

Compare keyword search, embeddings, filters, reranking, and top-k failure.

Artifact: a search comparison harness over one shared question set.
Mission 06

Evals before belief.

Make golden questions, regression checks, refusal checks, and cost limits.

Artifact: a 20-question eval file for hallucination, retrieval drift, refusals, and latency.
Mission 07

Tool use, MCP, and boundaries.

Define tool contracts, scopes, approval steps, dry runs, and trace logs.

Artifact: one safe tool with an input schema, dry-run mode, permission check, and trace log.
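
A minimal sketch of one such tool, with hypothetical names. The schema check, permission gate, dry run, and trace log all sit around a single function.

```python
# Sketch of the Mission 07 artifact: one tool with an input check, a dry-run
# mode, a permission gate, and a trace log. Names are illustrative.
import json
import time

TRACE: list[dict] = []

def create_ticket(args: dict, *, caller_scopes: set[str], dry_run: bool = True) -> dict:
    # Input schema check
    if not isinstance(args.get("title"), str) or not args["title"].strip():
        result = {"status": "rejected", "reason": "title must be a non-empty string"}
    # Permission check
    elif "tickets:write" not in caller_scopes:
        result = {"status": "denied", "reason": "missing scope tickets:write"}
    # Dry run: show what would happen, change nothing
    elif dry_run:
        result = {"status": "dry_run", "would_create": args["title"]}
    else:
        result = {"status": "created", "id": "TICKET-PLACEHOLDER"}
    TRACE.append({"ts": time.time(), "tool": "create_ticket",
                  "args": args, "result": result})
    return result

print(create_ticket({"title": "Rotate the OAuth secret"},
                    caller_scopes={"tickets:write"}))
print(json.dumps(TRACE, indent=2))
```
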
Mission 08

Ship the loop.

Connect prompts, retrieval, tools, evals, traces, cost guards, and fallback paths.

Artifact: a production agent capstone with logs, citations, evals, and a visible decision trail.
AI platform record

The current competitive lane.

2023 — present
Founder & Principal AI Engineer · Basilecom · Transmission Agency

Sole technical decision-maker for AI platform strategy at a global B2B marketing agency, advising the SVP Global Operations across UK, US, and APJ. Owned MCP server architecture, LLM agent systems, Snowflake governance, vendor procurement, local inference strategy, and desktop-agent policy.

proof surface
Production AI operating layer · LLM agents · RAG · MCP · AdTech automation

458 agent-callable tools governed across 12+ production MCP servers, 23 packaged Claude skills (from a 45+ skill internal library used by 30 practitioners), 1,104 eval tests across 29 suites, 34-page Monday.com MCP assessment, 52-level Claude Field Guide, and 4 enterprise B2B clients on Jasper workflow.

Software systems record

The operating base underneath the AI work.

1998 — 2023
Principal Engineer · Basilecom · clients: IBM, Atlas Air, Dragos, U.S. Air Force AMC

Started in 1998 as a Fordham University web developer while still a CS student, then carried that self-taught habit through client systems, IBM global search modernization, Atlas Air "Hawk" global flight scheduling, Dragos ICS/OT cybersecurity, and classified Air Force AMC mission planning. Teams of 4–20 across 12 time zones, ~90% client retention, 75% repeat business, 5 engineers coached to senior roles.

2017 — 2018
Senior Full Stack Engineer & Data Scientist · IntegraMed Fertility

Predictive ML on 200+ clinical features driving ~30% IVF success-prediction improvement. HIPAA / SOC 2 / PCI DSS platform across 50+ clinics, 40K+ IVF cycles, $100M+ in patient financing at 99.9% uptime. Zero-downtime integration of 9 acquired clinics and 50K+ patient records. $5M+ revenue impact through optimized treatment protocols.

2015 — 2016
Senior Full Stack Engineer · Teladoc Health

Scaled platform through NYSE debut: 12.2M → 15.1M members and 240K+ quarterly visits. Migrated backend to Elixir/Phoenix for real-time telehealth with sub-2-second WebRTC connect times across ~50K concurrent sessions. CVS MinuteClinic API — first NCQA telehealth credentialing.

2013 — 2014
Senior Full Stack Engineer · BaubleBar

First technical hire — built platform driving $10M+ revenue with a 30% conversion-rate lift. Celebrity launch sites at 100K+ concurrent users. Served on executive search panel for hiring the CTO.

earlier
Senior Engineer · Cannes Grand Prix & SABRE Gold campaigns · 360i / Dentsu

Engineering lead on Oreo Daily Twist and the Super Bowl blackout real-time response (sub-5-minute content decisions at millions of concurrent users), Oscar Mayer Bacon Barter, Coca-Cola Polar Bowl, and platforms for Marvel, NBC, and National Geographic. 3-year technical advisor to Polywork (Product Hunt Golden Kitty winner, 50K+ users in 48 hours).

Education & certifications

Foundations underneath the work.

degree
B.S. Computer Science · Fordham University

Formal CS foundation for the operating systems, data structures, compilers, and web work that started while still a student.

ML cert
Machine Learning Specialization · Stanford / Coursera

Classic supervised learning, model evaluation, bias/variance tradeoffs, and practical ML workflows refreshed against current AI systems.

DL cert
Deep Learning Specialization · DeepLearning.ai

Neural-network, sequence-model, and optimization foundations behind the production RAG and agent-eval work.

service
15-year volunteer · Civil Air Patrol · U.S. Air Force Auxiliary · security-clearance eligible

Long-running operational context for mission planning, chain-of-command communication, and work where process discipline matters.

14 — Public artifacts

Things a stranger can inspect.

The competitors with the strongest careers have public gravity: papers, talks, open source, courses, and durable writing. This is the current public surface for the production-AI lane.

Research labs & public experiments

These are public applied-ML labs used to demonstrate local-first MLOps, model evaluation, cheminformatics, and reproducible pipelines. They support the case studies — they are not the production case studies themselves.

15 — First 90 days

What happens after the conversation.

The strongest competitor sites do not just list credentials; they make the next step feel concrete. This is the operating plan I would use to turn an AI mandate into controlled production movement.

Days 01-15

Map the surface.

Inventory the real AI system, including the quiet parts that usually live in Slack threads, browser extensions, notebooks, and one-off vendor pilots.

  • Models, prompts, tools, data paths, and owners
  • Shadow automations and approval gaps
  • Current costs, latency, failure modes, and risk
Days 16-30

Install control.

Turn the inventory into a practical operating layer: a tool registry, access boundaries, model policy, eval gates, and the first fast wins.

  • Tool contracts, scopes, and identity boundaries
  • Risk register with severity and ownership
  • One workflow selected for governed acceleration
Days 31-60

Ship the pilot.

Build one visible workflow that proves the standard: private-data handling, retrieval quality, citations, approvals, telemetry, and rollback.

  • Eval suite wired into release decisions
  • Human-in-the-loop review where it matters
  • Business-facing dashboard for quality and adoption
Days 61-90

Harden the system.

Move from impressive pilot to repeatable operating practice: documentation, training, incident paths, procurement standards, and scale rules.

  • Rollout policy and production readiness checklist
  • Team enablement surface and ownership model
  • Next-quarter roadmap with measured tradeoffs
Outputs by day 90

AI system inventory, governed tool registry, eval suite, risk register, rollout policy, working pilot, training surface, and a credible roadmap for the next quarter.

Start the conversation
Portrait of Philip John Basile

Twenty-eight years in. The unfashionable parts of the job — eval gates, rollback paths, latency budgets, the failure modes nobody writes blog posts about — are the parts I find most interesting.

Currently available for senior applied AI roles and fractional technical leadership.

If you have an AI system that needs to work in the real world.

For Principal AI Engineer, Staff Applied AI, AI Platform, Agent Systems, RAG, MCP, or fractional AI leadership work — start here.

Email directly