01 — Proof engine

I build production AI systems for companies moving past demos.

LLM agents, RAG, MCP, AdTech automation, private-data workflows, and eval gates. The operating layer between frontier models and real companies, built on 28 years of systems craft.

28y shipping software · 1998 Fordham web developer as a student · 458 agent-callable tools, governed · 1,104 eval tests in CI

Show me the receipts. That is the design brief.

Interactive · Gemini RAG proof engine

Ask for proof.

Ask the question a hiring partner would actually ask. The engine retrieves evidence first, then asks Gemini to respond from that bounded context. No generic pitch without a citation trail.
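
The shape is simple enough to sketch. A minimal illustration of the retrieve-then-bound flow, with stand-in sources and naive keyword scoring instead of the page's real index; the Gemini call itself is omitted.

```python
# Sketch only: stand-in sources, naive keyword scoring, and no actual
# Gemini call. The page's engine indexes 52 real sources; this is not its code.
SOURCES = [
    {"id": "case-02-mcp", "text": "458 governed agent tools across 12+ MCP servers"},
    {"id": "case-03-rag", "text": "1,104 eval tests gate every RAG release"},
    {"id": "record-1998", "text": "Started in 1998 as a Fordham web developer"},
]

def retrieve(question: str, k: int = 2) -> list[dict]:
    """Rank sources by naive keyword overlap with the question."""
    terms = set(question.lower().split())
    ranked = sorted(
        SOURCES,
        key=lambda s: len(terms & set(s["text"].lower().split())),
        reverse=True,
    )
    return ranked[:k]

def bounded_prompt(question: str) -> str:
    """Build the prompt the model sees: retrieved evidence only, cited by id."""
    evidence = retrieve(question)
    context = "\n".join(f"[{s['id']}] {s['text']}" for s in evidence)
    return (
        "Answer using ONLY the sources below and cite source ids inline.\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )

print(bounded_prompt("How are RAG releases tested?"))
```

The constraint is the point: the model only ever sees retrieved, citable context, never the open question alone.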

28y shipping software · 3 regions of operations (UK, US, APJ) · 1,104 eval tests in CI
52 sources indexed · retrieval refreshed in the browser
01 · Question · Hiring-grade ask
02 · Retrieval · Career brain + proof inventory
03 · Citations · Ranked trace
04 · Answer · Bounded context
05 · Artifact · Case study or packet
Proof ledger

The claims stay attached to sources.

A tighter inventory of what this page is proving: current platform ownership, governed agent tooling, and eval-backed delivery.

Current role · platform remit

AI strategy with operating scope.

Current work spans MCP server architecture, LLM agent systems, Snowflake governance, vendor procurement, local inference strategy, and desktop-agent policy.

Signal: this is live platform ownership, not a generic AI interest.
MCP layer · governed tools

Tool count becomes a contract problem.

Hundreds of callable capabilities are framed around scopes, naming, approvals, logging, reusable skills, and private-data boundaries.

Signal: the proof is governance and repeatability, not a big number by itself.
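
What a tool contract can look like, sketched minimally in Python. The registry, field names, and example tool are illustrative, not the production schema.

```python
# Sketch of the contract framing above: every callable capability is
# registered with a scope, an approval rule, and an audit hook.
# Names and fields are hypothetical.
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class ToolContract:
    name: str                # e.g. "finance.invoice.reconcile"
    scope: str               # "read" or "write"
    requires_approval: bool  # human sign-off before side effects
    handler: Callable[[dict], dict]

REGISTRY: dict[str, ToolContract] = {}

def register(contract: ToolContract) -> None:
    """Reject unnamespaced or duplicate tools before an agent can see them."""
    if "." not in contract.name or contract.name in REGISTRY:
        raise ValueError(f"contract rejected: {contract.name}")
    REGISTRY[contract.name] = contract

def call(name: str, args: dict, approved: bool = False) -> dict:
    """Every call is logged; write-scoped tools block without approval."""
    tool = REGISTRY[name]
    if tool.requires_approval and not approved:
        return {"status": "pending_approval", "tool": name}
    print(f"audit: {name} scope={tool.scope} args={args}")
    return tool.handler(args)

register(ToolContract("crm.contact.lookup", "read", False, lambda a: {"ok": True}))
print(call("crm.contact.lookup", {"email": "x@example.com"}))
```

The shape matters more than the fields: a capability that is not registered with a scope and an approval rule never reaches an agent.
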
Eval layer · release control

Answers ship through gates.

Retrieval quality, refusals, latency, hallucination behavior, regressions, and spend are treated as deploy gates, not post-demo cleanup.

Signal: the proof surface shows how AI behavior is allowed to ship.
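
A minimal sketch of an eval gate wired into CI. The metric names and thresholds are illustrative stand-ins for the real harness.

```python
# Sketch of a release gate: the deploy fails unless every metric clears its
# threshold. Metrics and limits are illustrative, not the 1,104-test suite.
import sys

THRESHOLDS = {
    "retrieval_recall": 0.85,     # minimum acceptable
    "refusal_correctness": 0.95,  # minimum acceptable
    "hallucination_rate": 0.02,   # maximum acceptable
    "p95_latency_ms": 300,        # maximum acceptable
    "cost_per_query_usd": 0.01,   # maximum acceptable
}

def gate(results: dict[str, float]) -> bool:
    failures = []
    for metric, limit in THRESHOLDS.items():
        value = results[metric]
        higher_is_better = metric in ("retrieval_recall", "refusal_correctness")
        ok = value >= limit if higher_is_better else value <= limit
        if not ok:
            failures.append(f"{metric}: {value} vs {limit}")
    for failure in failures:
        print("GATE FAIL", failure)
    return not failures

if __name__ == "__main__":
    run = {"retrieval_recall": 0.88, "refusal_correctness": 0.97,
           "hallucination_rate": 0.01, "p95_latency_ms": 240,
           "cost_per_query_usd": 0.008}
    sys.exit(0 if gate(run) else 1)  # nonzero exit blocks the deploy
```

The nonzero exit is the mechanism: the pipeline, not someone skimming a dashboard, decides whether the release ships.
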
02 — Full 360

AI is the current chapter. The advantage is the whole arc.

I started in 1998 as a Fordham University web developer while still a student. I lived the degree before it was official, then kept learning after the degree was already out of date. The career since then is not a random stack list; it is the craft record behind the AI work.

1998 → now

Models are the flour. Experience is the bake.

Everyone has access to AI now. Everyone also has access to flour. The difference is knowing how to combine ingredients, control heat, recover when something goes wrong, and make something worth serving. My edge is the 28-year operating record behind the prompts: self-taught, still learning, and always turning the work into a path other people can climb.

28y software craft
40+ systems shipped
458 governed AI tools
1,104 eval tests
  1. 01 · 1998

    Web foundations

    Fordham web development while still a CS student; the degree became formal later, but the work was already real.

  2. 02 · Data systems

    Operational software

    Nextel commissions data, Intrepid web and intranet modernization, and early Basilecom client systems where software had to serve real staff.

  3. 03 · Creative velocity

    Audience and taste

    360i real-time campaigns, Oreo Daily Twist, Cannes Grand Prix work, and BaubleBar launches taught timing, polish, and pressure.

  4. 04 · Regulated scale

    Systems with consequences

    Teladoc through NYSE-debut scale and IntegraMed clinical ML under HIPAA, SOC 2, PCI, patient-data, and uptime constraints.

  5. 05 · Enterprise + defense

    Trust boundaries

    IBM search, Atlas Air logistics, Dragos OT security, and Air Force mission planning gave the AI work its governance instincts.

  6. 06 · Now

    AI operating layer

    MCP, RAG, eval gates, local inference, prompt-injection defense, tenant isolation, and human approval paths for real organizations.

How the arc compounds

Self-taught never meant isolated. The pattern is learn the thing, ship the thing, document the thing, and raise the people around it. I have coached engineers into senior roles, compressed onboarding paths, built field guides, led interviews for clients, and usually tried to lift the team higher than myself. That is why the AI work is not just prompt skill; it is judgment turned into systems other people can trust and use.

03 — Hiring fit

Where the record is strongest right now.

A buyer or hiring team should not have to read the whole site to know whether the fit is real. These are the lanes where the proof, case studies, and public artifacts line up cleanly.

01 · Agent governance

Enterprise tools without uncontrolled autonomy.

MCP servers, OAuth boundaries, tool contracts, approvals, logging, desktop-agent policy, and recovery when integrations behave differently than the demo.

Best evidence: MCP platform + systems atlas
02 · RAG quality

Retrieval systems with release gates.

Private knowledge workflows, citation behavior, regression suites, refusal checks, cost budgets, latency targets, and evals that block bad releases.

Best evidence: RAG eval harness
03 · Private-data AI

AI workflows over data that cannot leak.

Tenant isolation, Snowflake roles, local inference, approval queues, NL-to-SQL guardrails, and useful analyst interfaces over sensitive operational data.

Best evidence: marketing intelligence case
04 · Technical leadership

Ambiguous AI work turned into operating systems.

Vendor diligence, platform architecture, adoption surfaces, field guides, internal standards, and the translation between executives, operators, and engineers.

Best evidence: current platform record
Not the right lane

I am not optimizing this site for frontier model research roles, academic publication paths, brand-only AI strategy, or prompt coaching without implementation authority. The strongest fit is senior applied AI, production systems, platform ownership, and fractional technical leadership.

04 — Selected work

Three production systems show the pattern.

Private data, governed tools, measurable quality, and business workflows that survive outside the demo. The 458-tool number is the total agent-callable surface across MCP tools, custom tools, Claude skills, and workflow actions — the architecture includes 12+ production MCP server integrations, not 458 separate MCP servers.

Evidence vault · Each case links claim, artifact, gate, and operating result.
Case 01 / Private data

Multi-tenant marketing intelligence — 11 client accounts with audited tenant-isolation controls.

Eleven brands, 2.3M private rows, 31 Python modules, 80+ REST endpoints, 20+ AI features. Local Qwen and Ollama inference for sensitive data. NL-to-SQL, attribution, forecasting, anomaly detection. Three layers of tenant isolation, audited.

11 Tenants
2.3M Rows
80+ Endpoints
Read the case study
Case 02 / Agents

The MCP agent platform — 458 governed tools, not 458 prompts.

Enterprise MCP work across NetSuite, Monday.com, HubSpot, Wrike, Google Ads, Adverity, Apify, Snowflake, LinkedIn, and Jasper. The impressive part is not tool count. It is reducing hundreds of capabilities into stable contracts, OAuth boundaries, approval paths, and reusable skills so teams can ship without prompt folklore.

458 Tools
23 Skills
−87% Research time
Read the case study
Case 03 / RAG & evals

Production RAG with 1,104 tests gating every release.

Five RAG systems at roughly 50 QPS, sub-300ms P95, retrieval accuracy up 40%, inference cost down 65%. RAGAS, LangSmith, and Promptfoo gates for retrieval quality, hallucination behavior, latency ceilings, refusal correctness, and spend. Most evals only test answer quality. These don't.

5 Systems
<300ms P95
−65% Cost
Read the case study
05 — Career proof beyond the AI cycle

28 years of production systems underneath the AI work.

Started as a Fordham web developer while still a student, then kept self-teaching as each official credential aged. Classified defense, HIPAA-regulated clinical data, NYSE-scale telehealth, global logistics, commerce, enterprise search, and Cannes-winning advertising platforms became the operating base the current AI platform sits on.

28 years shipping · 12 time zones led · 5 engineers coached
  • 01 · U.S. Air Force AMC: Classified mission planning at Scott AFB
  • 02 · IBM: Global search, 3× query speed, 80% cost reduction, CTO commendation
  • 03 · Teladoc Health: NYSE-debut scale, 12.2M → 15.1M members, sub-2-second WebRTC connect time
  • 04 · IntegraMed: 50+ clinics, 40K+ IVF cycles, ~30% prediction improvement, $5M+ ML impact
  • 05 · BaubleBar: $10M+ platform revenue, 30% conversion lift, 100K+ concurrent launch users
  • 06 · 360i / Dentsu: Oreo Daily Twist, Super Bowl blackout, sub-5-min content decisions, millions concurrent
Systems shipped across high-stakes environments

Where the work actually had to survive.

Defense
U.S. Air Force AMC (Scott AFB), classified mission planning
Healthcare & regulated
Teladoc Health, IntegraMed Fertility, Bayer HealthCare — HIPAA, SOC 2 Type II, PCI DSS, FedRAMP-aligned controls, WCAG 2.2 AA
Enterprise & logistics
IBM, Atlas Air, Dragos, ADP, FIS, Phoenix Contact, Shubert Ticketing, Bremer Bank, IFF
Creative & AdTech
360i / Dentsu, Publicis Groupe, Fox Sports, Cannes Grand Prix & SABRE Gold campaigns
Adjacent systems background

Real-time, simulation, creative — design lineage, not the brand.

Before the current agent platform work, I spent years on real-time systems, game AI, interactive campaigns, creative tools, and simulation workflows. Unreal Engine 5 (Epic Games-recognized developer since 2014, Nintendo and Sony licensed), Unity 6, behavior trees, AI perception, navigation, digital twins, ONNX integration. Plus the creative-AI stack — Stable Diffusion, Adobe Firefly, Midjourney, Sora, LoRA, ControlNet, ComfyUI, Creative Cloud integration — wired into multimodal pipelines that cut on-brand asset production time ~40%, with LoRA / PEFT fine-tuning delivering ~25% performance improvement on domain-specific generation tasks. That background shapes how I design agents today: state, memory, perception, pathing, tool choice, fallback behavior, and decisions under frame-time constraints. The site's main lane is still production AI for enterprise — this is the design lineage underneath it.

06 — Evidence ledger

Every major claim has a trail.

A case study, public artifact, resume entry, repo, field note, or operating plan behind every line.

AI platform ownership

Enterprise operating layer, not model research theater.

Sole technical decision-maker for AI platform strategy across UK, US, and APJ operations: MCP architecture, LLM agent systems, Snowflake governance, vendor procurement, local inference, and desktop-agent policy.

Open proof
Agent systems

458 governed tools with contracts, scopes, and approval paths.

The claim is not tool count. The evidence is the governed surface: OAuth, naming, scope control, reusable Claude skills, failure recovery, and human approval boundaries.

Case study
Quality gates

1,104 eval and regression tests before RAG deployment.

Retrieval quality, hallucination behavior, refusal posture, latency, cost, and regressions are treated as release gates instead of demo polish.

Case study
Private data

11 tenants, 2.3M rows, audited isolation controls.

Prompt constraints, SQL validation, server-side tenant injection, local inference paths, and approval queues that reduce cross-tenant leakage risk over private client data.

Case study
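
One of those controls, sketched minimally: server-side tenant injection, assuming a hypothetical table allow-list and column names. The production controls are broader than this.

```python
# Sketch: validate model-generated SQL, then rewrite every allowed table into
# a tenant-filtered subquery. The tenant value is bound server-side and never
# comes from the model's text. Table and column names are hypothetical.
import re

ALLOWED_TABLES = {"campaign_metrics"}

def scope_query(model_sql: str, tenant_id: str) -> tuple[str, dict]:
    sql = model_sql.strip().rstrip(";")
    if not re.fullmatch(r"(?is)select\b[^;]*", sql):
        raise ValueError("refused: only single SELECT statements")
    if not any(t in sql.lower() for t in ALLOWED_TABLES):
        raise ValueError("refused: table not on the allow-list")
    for table in ALLOWED_TABLES:
        sql = re.sub(
            rf"\b{table}\b",
            f"(SELECT * FROM {table} WHERE tenant_id = %(tenant_id)s) AS {table}",
            sql,
            flags=re.IGNORECASE,
        )
    return sql, {"tenant_id": tenant_id}

scoped, params = scope_query(
    "SELECT channel, SUM(spend) FROM campaign_metrics GROUP BY channel",
    tenant_id="tenant_a",
)
print(scoped)
print(params)
```

The tenant predicate never comes from model output, so a prompt that asks about another tenant has nothing to leak.
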
Career depth

Twenty-eight years shipping systems where failure is visible.

NYSE-scale telehealth, classified mission planning, global flight scheduling, clinical ML, commerce launch scale, and Cannes Grand Prix interactive work before the AI platform layer.

Record
Teaching surface

A free field guide built from production patterns.

Eight text-only missions, local artifacts, knowledge checks, persisted progress, and a self-issued certificate. The course supports the production thesis instead of replacing the portfolio.

Course
07 — Market map

Where this sits in the AI engineer market.

The competitor set is not one market. It splits into research authority, education authority, agent tooling companies, and enterprise AI operators. This site should win only one of those lanes: production AI systems inside real organizations.

Research authority

Frontier labs and university faculty.

Their edge: papers, citations, labs, students, awards, foundation-model history, and institutional reputation.

Our answer: do not pretend to be that. Translate frontier capability into controlled production systems: identity, tools, retrieval, evals, observability, and rollout policy.

Education authority

Massive courses and public teachers.

Their edge: learner scale, lectures, books, certificates, testimonials, and established teaching brands.

Our answer: a narrower field guide from production work. The course exists to expose judgment, not to compete with university-scale AI education.

Agent tooling companies

Platforms, frameworks, docs, and adoption metrics.

Their edge: product gravity, downloads, customer logos, SDKs, docs, changelogs, and ecosystem ownership.

Our answer: be the operator who chooses, wires, governs, evaluates, and recovers those tools inside a business with private data and real risk.

Enterprise AI operators

The quiet lane with the most buying intent.

Their edge: many have strong private experience but weak public surfaces, because the work sits behind client systems and internal documents.

Our answer: make the private work legible without leaking it: redacted case studies, artifact ledgers, system maps, public course material, and a proof engine.

The positioning is simple: not the best researcher, not the biggest teacher, not a tooling vendor. A production AI systems engineer with proof that the operating layer has already been built.

Trace the proof →
Competitive lane

I am not positioning as a frontier model researcher. I build the enterprise operating layer that lets teams use frontier models safely: tools, identity, retrieval, evals, governance, rollout, and recovery when the system fails in public.

pjb@principal-ai:~/platform$ tree --color=always
458 tools · 12 MCP servers · 11 tenants · sub-300ms P95 · −65% inference cost
MCP + identity

Connected Claude to enterprise systems without turning auth into folklore.

Architected 12+ production MCP server integrations across NetSuite, Monday.com, HubSpot, Wrike, Google Ads, Adverity, Apify, Snowflake, LinkedIn, and Jasper. Root-caused a critical LinkedIn MCP OAuth failure — missing audience parameter producing opaque tokens instead of JWTs — that unblocked 8 downstream tools and cut resolution from 4 days to 2 hours. Patched a Google Ads list_accessible_customers parsing bug affecting multi-account access. Identity boundary on Auth0 with OAuth 2.1 / OIDC, modern Passkeys / FIDO2 supported for end-user surfaces.
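
For anyone who has not hit this failure mode, a hedged sketch with placeholder ids: without the audience field, some authorization servers mint opaque tokens, and every downstream tool that expects JWT claims fails with unhelpful errors.

```python
# Sketch of the failure mode, not the fix as shipped. Endpoint, ids, and the
# audience URL are placeholders; the check is the useful part.
import base64
import json

def is_jwt(token: str) -> bool:
    """A JWT has three base64url segments with a JSON payload; an opaque
    token does not decode."""
    parts = token.split(".")
    if len(parts) != 3:
        return False
    try:
        payload = parts[1] + "=" * (-len(parts[1]) % 4)
        json.loads(base64.urlsafe_b64decode(payload))
        return True
    except ValueError:
        return False

def token_request(include_audience: bool) -> dict:
    body = {
        "grant_type": "client_credentials",
        "client_id": "PLACEHOLDER",
        "client_secret": "PLACEHOLDER",
    }
    if include_audience:
        # Without this field, the authorization server may mint an opaque
        # token that no downstream tool can decode into claims.
        body["audience"] = "https://api.example.com/linkedin-mcp"
    return body

# Guard before wiring a token into eight downstream tools:
# assert is_jwt(access_token), "opaque token: check the audience parameter"
```

The guard is cheap; the four days of opaque-token debugging were not.
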

Data + content

Made the data plane usable by agents and analysts.

Architected the Snowflake role hierarchy, including an AUDIENCEINTELLIGENCE role and a tiered analyst account model serving 25 users across 3 regions. Consolidated 40 legacy roles in a cross-regional governance cleanup, dropping credit consumption 35% (~£2,800/month). Integrated Bombora intent + Leadspace firmographics + GWI audience data into a Snowflake + Claude + Jasper pipeline (12 analysts on previously vendor-portal-locked queries), then built the custom content-pack-pipeline Jasper MCP server for four enterprise B2B clients.

Governance

Built the operating rules around the tools.

Defined the tooling architecture across Claude API, Claude Desktop, Claude Code, Cowork, OpenAI Codex, Cursor, Ollama, and LM Studio. Conducted a security review of autonomous agent options (OpenClaw, NanoClaw) — recommended sandboxed NanoClaw deployment after weighing exfiltration risk, prompt-injection surface, and audit-logging maturity. Identified governance gaps in Cowork (Claude Desktop automation) that drove revised internal policy on autonomous desktop agents. Recommended Datadog for Claude activity observability and governed the 45+ skill library (30 practitioners) including version control, review process, and backup strategy.

Diligence + rollout

Evaluated vendors like a builder and trained the organisation to use the result.

Delivered integration assessments for Google Ads, Reddit Ads, The Trade Desk, LinkedIn Ads, Meta Ads, DV360, StackAdapt, Clay.com, Windsor.ai, and Adverity. Owned the Claude + Firecrawl content audit pipeline with Apify failover. Built the 52-level Claude Field Guide adopted by 65+ practitioners (3 days → 2 hours time-to-first-useful-prompt). Shipped an AI Brief Builder standardising input quality across 9 account teams. Codified a reproducible macOS Node.js engineering environment (fnm, pnpm, Starship, Codex) used by 8 internal engineers.

Recent outcomes

Operating outcomes from the current platform.

  • 40% reduction in NetSuite manual reconciliation, ~15 hours / week saved
  • 35% reduction in Snowflake credit consumption, ~£2,800 / month saved
  • £24,000 + 4 months of rebuild avoided by stabilizing the crawl pipeline instead of replacing it
  • £450,000 in media trading decisions informed by structured AdTech API assessments
  • £18,000 + 6 months of self-hosted observability avoided by standardizing on Datadog
  • 3 days → 2 hours time-to-first-useful-prompt across 65+ practitioners trained through the Claude Field Guide
  • 60% reduction in data-to-insight latency from the Windsor.ai vendor selection
Resilience

Self-healing agent architecture, not happy-path demos.

Circuit breakers around external integrations, token-bucket rate limiting, LRU caching with size-and-age eviction, priority queues for approval-gated actions, retry policies with jitter, and per-tool timeouts. Multi-tenant API gateways in Go (Fiber) sustaining 2M+ daily requests at p95 < 100ms when the workload is high-throughput rather than agent-paced. 100% strict TypeScript with Zod validation across 77+ typed API interfaces — the entire agent surface area is auditable in CI before deployment. The point is the second-day failure modes that turn a working demo into a midnight pager.
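
Two of those rails, sketched minimally with illustrative thresholds rather than the deployed settings.

```python
# Sketch: a per-integration circuit breaker plus a retry policy with
# exponential backoff and jitter. Thresholds are illustrative.
import random
import time

class CircuitBreaker:
    def __init__(self, max_failures: int = 3, cooldown_s: float = 30.0):
        self.max_failures, self.cooldown_s = max_failures, cooldown_s
        self.failures, self.opened_at = 0, 0.0

    def allow(self) -> bool:
        if self.failures < self.max_failures:
            return True
        # Half-open after the cooldown so one probe can close the circuit.
        return (time.monotonic() - self.opened_at) > self.cooldown_s

    def record(self, ok: bool) -> None:
        if ok:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()

def call_with_retry(fn, breaker: CircuitBreaker, attempts: int = 4):
    if not breaker.allow():
        raise RuntimeError("circuit open: integration is failing, back off")
    for attempt in range(attempts):
        try:
            result = fn()
            breaker.record(ok=True)
            return result
        except Exception:
            breaker.record(ok=False)
            if attempt == attempts - 1:
                raise
            # Jittered exponential backoff so retries don't stampede.
            time.sleep((2 ** attempt) * 0.2 + random.uniform(0, 0.2))

# usage: call_with_retry(lambda: some_integration(), CircuitBreaker())
```
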

Local + edge inference

Two-tier local stack so 75% of routine work never hits an external API.

Designed a two-tier local inference strategy: Qwen 3.5 27B dense for planning and reasoning, Qwen3-Coder-Next for agentic code execution, with the Claude API reserved for high-stakes decisions. Eliminates external API exposure for 75% of routine internal tasks — relevant when the work is over private operational data and tenant-isolated pipelines. Edge inference (Cloudflare Workers AI, Vercel Edge / Fluid Compute) reserved for latency-bound public surfaces where TLS termination, rate limiting, and AI personalization need to happen at the POP.
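
The routing rule itself is small. A sketch of the decision, with illustrative tier names and criteria standing in for the deployed policy.

```python
# Sketch of the routing rule, not the deployed policy: sensitive or routine
# work stays local, high-stakes calls go to the hosted API.
from dataclasses import dataclass

@dataclass
class Task:
    contains_private_data: bool
    high_stakes: bool           # e.g. customer-facing or irreversible
    needs_code_execution: bool

def route(task: Task) -> str:
    if task.contains_private_data or not task.high_stakes:
        # Routine internal work lands here and never leaves the box.
        return "local:qwen-coder" if task.needs_code_execution else "local:qwen-dense"
    return "hosted:claude-api"  # reserved for high-stakes decisions

print(route(Task(contains_private_data=True, high_stakes=False, needs_code_execution=False)))
```
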

Computer Use

Vision-driven agents for systems that don't have APIs.

Shipped Computer Use agent workflows on the Anthropic Agent SDK + Computer Use API — letting agents navigate browser interfaces, interact with legacy web apps that lack APIs, and automate multi-step UI tasks via vision-based screen understanding. Closes the gap between "everything must be an MCP server" and the long tail of business-critical tools that aren't.

Roadmap + leadership

FY26 planning baseline owned end-to-end.

Authored the Media Planning & Finance AI Augmentation Roadmap (synced HTML + Word source), presented to leadership as the FY26 planning baseline. AI roadmaps owned end-to-end and held to outcomes, not demos.

Infra underneath

The engineering layer the AI work runs on.

AWS Lambda, Bedrock, S3, IAM, CloudWatch · Snowflake · Docker · Terraform · GitHub Actions · OpenTelemetry, Prometheus, Datadog · Ollama and local Qwen inference · Python, TypeScript, Rust · React, FastAPI, Node.js · vLLM, Triton, TensorRT, MLflow, W&B, PyTorch, DeepSpeed, LoRA, PEFT.

The point is not the tool list. The point is choosing the cheapest controlled path that satisfies latency, privacy, eval, and rollout requirements.

09 — Systems atlas

The operating map behind the work.

One graph for the production surface: models, MCP servers, OAuth, Snowflake, Jasper, crawlers, eval gates, observability, and rollout governance.

458 governed tools · 1,104 eval tests · 34-page MCP assessment
production-ai-os / systems atlas / evidence graph
artifact 01 · tool plane

MCP connector registry

NetSuite, Monday.com, HubSpot, Wrike, Google Ads, Snowflake, LinkedIn, Jasper. The proof is the governed surface: OAuth, scopes, naming, approvals, and failure recovery.

tool: finance.invoice.reconcile
auth: oauth-delegated
scope: read-only + approval-for-write
audit: request, owner, fallback
open registry artifact →
artifact 02 · quality gate

Eval release harness

1,104 tests across 29 suites covering retrieval quality, refusals, latency, regressions, hallucination behavior, and spend before deployment.

retrieval_quality: pass
citation_integrity: pass
refusal_correctness: pass
release_decision: ship with monitor
open eval artifact →
artifact 03 · data plane

Private intelligence stack

Snowflake, Bombora, Leadspace, GWI, local inference. Analyst and agent workflows over private data without leaking sensitive context into every model call.

attempt: tenant A asks tenant B
expected: refusal + no SQL
required: server tenant predicate
blocked: raw rows, private identifiers
open isolation artifact →
artifact 04 · identity boundary

Auth failure recovery

Opaque tokens instead of JWTs. A missing OAuth audience parameter on the LinkedIn MCP integration turned into the kind of low-level bug that blocks every downstream agent. Root-caused, fixed, 8 downstream tools unblocked, resolution time cut from 4 days to 2 hours.

symptom: opaque access token
root_cause: missing audience
recovery: JWT claims restored
recheck: downstream account list
open recovery artifact →
artifact 05 · enablement

52-level Claude Field Guide

Training as product, not documentation. Internal adoption moved from scattered prompt lore into an interactive React learning surface.

path: prompt lore -> guided missions
surface: React field guide
audience: 65+ practitioners
result: 3 days -> 2 hours
read writing →
artifact 06 · diligence

Vendor decisions with teeth

34-page Monday.com MCP assessment plus AdTech API reviews. Feasibility, security, licensing, rollout cost, and operating risk in one decision path.

review: capability, auth, cost
risk: rollout + data exposure
decision: pilot / defer / reject
handoff: owner + next check
ask the proof engine →
10 — Recommendations

Endorsed by peers across seven domains.

AI implementation, technical leadership, quality gates, delivery, and client engagement.

Phil and I talk about AI regularly, especially MLOps. He has solid expertise with LLMs and RAG implementations in particular, plus knows how to put Python to work effectively in AI projects. He'd be a real asset to any team working in this space.
Alan Cafferkey, Ph.D. · AI Implementation Leader | Educational Technology Director | Mission-Driven Innovator
I just recently had the pleasure of working with Philip Basile on a team for an extended period. He was a committed, strong, and dedicated team member. He provided guidance and knowledge to the entire team, from assistance with onboarding and IDE configuration and integration with source control and CI systems to learning the newest offerings in our team's technology stack, followed by documenting and sharing his experience. He immediately became a mentor. Philip brought with him, and shared, an impressive depth of understanding of front-end systems, enterprise architecture, and the intricate interdependence of design, functionality and user experience. With all of this, he consistently produced elegant code, markup, and CSS that provided a comprehensive, engaging, and seamless user experience, catching and handling edge and corner cases gracefully. Philip was easy to work with, cooperative, and delivered constructive feedback in a manner that encouraged others to participate in a healthy and productive peer review process. He made the team stronger and greater than the sum of its parts.
Dennis Luken · Front-End/UI Architect/Developer | Full Stack Angular/React/Node
Phil contributed front-end development to our team as a contractor. He developed the web interface for a number of applications and ensured the user experience requirements were met in both desktop and mobile rendering. As a front-end engineer, the applications required TypeScript/JavaScript using the VueJS framework and Vuex for state management, with back-end data retrieval using REST. Phil ensured very high code coverage and code quality standards were met through unit testing with Jest and Vue Development Utils, end-to-end testing using WebdriverIO, and SonarQube quality scans. Docker environments were also part of the daily development lifecycle. Phil worked well with other team members.
Neil Hall · Web Architect / Full-stack Software Engineer at Phoenix Contact
It was a pleasure working with Philip. He excelled at developing multiple UI components simultaneously while integrating with various microservices that utilized RESTful APIs. Philip displayed extraordinary communication skills while implementing UX designs by effectively detailing any blockers, inconsistent documentation, or missing requirements. He has also been able to confidently demonstrate and document his completed work. Philip always brings positive energy to meetings and discussions. He works well in teams and can quickly adapt to changes in organizational structure.
Tyler Rieke · Principal Software Engineer at Dragos, Inc.
Philip is an exceptional front-end developer and IT professional. I had the pleasure of working with him on a few challenging client engagements, and his ability to quickly step up in a lead capacity, drive work, and produce results was greatly appreciated. Combined with his technical skill, his professionalism and personality makes him a key asset to any team.
John Robinson · IT Director at RSC Solutions
Phil consistently produces new ideas and approaches to improve code or streamline development processes. He has a knack for identifying potential issues early and developing creative solutions to complex challenges. His passion for innovation makes him an asset in providing fresh perspectives during technical discussions. Phil is an asset to any team seeking an influential contributor with an innovative mindset.
Patrick Gross · Consulting Member of Technical Staff at Oracle
Philip is a UX Master, and a front-end usability champion. He has deep knowledge of CSS, JavaScript and User Experience Design. He improved the overall quality of Teladoc's website experience, and made the product better. Besides front-end, Philip is a self-starter who always keeps improving himself, and I know he has become quite good at backend development too. Finally, Philip is a great co-worker, reliable and with a fantastic sense of humor. He would be a valuable and important member of any team he joins or leads.
Joseph Hurtado · Tech Project Manager | Lead Agile Consultant

See all 37 recommendations →

11 — Writing

A public body of work, not a feed.

Native essays, older posts, and source-labeled archives brought back onto the site so the thinking is easy to inspect by date, platform, and topic.

Position 01

MCP is not a tooling problem. It is a governance problem.

Anyone can expose an API to a model. The hard part is deciding what the model is allowed to touch, what gets logged, what requires approval, and what happens when an integration fails halfway through a business process.

The impressive number is not 458 tools. It is the contract discipline that keeps hundreds of tools from becoming hundreds of new ways to lose control.

tool contracts · OAuth boundaries · approval paths
Ask the portfolio about MCP →
Position 02

Most eval harnesses are too impressed with answers.

Answer quality matters, but production systems fail in less flattering ways. Retrieval gets worse after a content migration. Refusals regress. Latency crosses the budget. Spend creeps. A prompt change helps one client and quietly hurts another.

A real eval harness is a release gate for behavior, cost, latency, refusal posture, and retrieval drift, not a scoreboard for pretty generations.

1,104 tests · 29 suites · hard CI gates
RAG & evals case study →
Position 03

The best agent systems are deliberately unromantic.

The pitch says autonomy. The shipped product needs boring rails: scoped tools, observable plans, deterministic fallbacks, permission checks, and a clear place where a human can say no.

The job is not to make the agent seem alive. The job is to make it safe enough that the business can let it act.

human-in-loop · bounded autonomy · rollback paths
Agent platform case study →
Position 04

Public writing should compound, not disappear.

Competitor sites get authority from a dated body of public work: papers, posts, essays, talks, and notes that prove the thinking existed before the current page. Older writing now lives here as an inspectable archive instead of being stranded on platform profiles.

The archive is not here to make every old post equally important. It is here to show a continuous public trail across AI, engineering, career, accessibility, frameworks, and developer education.

49 entries · DEV + Medium + LinkedIn · dated sources
Open writing archive →
12 — Field Guide

A field guide that exposes the production judgment behind the work.

Not a course business. A public artifact that exposes the judgment behind the case studies: agents, agentic coding, RAG, vector databases, evals, tool use, and rollout discipline — translated into eight text-only missions with hands-on builds.

8 Lessons
8 Local builds
~6h First pass
$0 Path via Ollama
Python Only prereq
Production AI field guide · 8 missions · each card opens its lesson
Mission 01

What an agent actually is.

Learn the loop: context, plan, action, observation, state, stop.

Artifact: a tiny task loop that can choose search, summarize, or ask-for-clarification.
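
A minimal sketch of that artifact, with a stub planner standing in for a model, to show how small the loop really is.

```python
# Sketch of the Mission 01 artifact: a tiny loop over context, plan, action,
# observation, state, stop. The "planner" is a stub, not a model.
def plan(question: str, notes: list[str]) -> str:
    if "?" not in question:
        return "clarify"
    return "summarize" if notes else "search"

def run(question: str, max_steps: int = 4) -> str:
    notes: list[str] = []                                   # state
    for _ in range(max_steps):
        action = plan(question, notes)                      # plan
        if action == "clarify":
            return "Can you rephrase that as a question?"
        if action == "search":
            notes.append(f"stub result for: {question}")    # action + observation
        else:
            return "Summary: " + "; ".join(notes)           # stop
    return "Stopped: step budget reached."

print(run("What gates a RAG release?"))
```
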
Mission 02

Prompting as interface design.

Turn prompt writing into contracts, schemas, refusals, and handoffs.

Artifact: one prompt rewritten into a system instruction, task brief, and structured output schema.
Mission 03

Agentic coding without chaos.

Use plans, file boundaries, diffs, tests, and rollback discipline.

Artifact: the same small feature with two coding agents, then compare plans, patches, and tests.
Mission 04

RAG from first principles.

Build retrieval around chunks, metadata, citations, and answer grounding.

Artifact: a local document Q&A over markdown notes with citations back to source files.
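
A minimal sketch of the same artifact: naive keyword retrieval over a hypothetical notes/ directory, with a file-and-paragraph citation kept on every chunk.

```python
# Sketch only: the notes/ directory, chunking rule, and scoring are stand-ins.
from pathlib import Path

def load_chunks(notes_dir: str = "notes") -> list[dict]:
    chunks = []
    for path in Path(notes_dir).glob("*.md"):
        for i, para in enumerate(path.read_text().split("\n\n")):
            if para.strip():
                chunks.append({"source": f"{path.name}#para-{i}", "text": para})
    return chunks

def answer(question: str, chunks: list[dict], k: int = 3) -> str:
    terms = set(question.lower().split())
    ranked = sorted(chunks, key=lambda c: len(terms & set(c["text"].lower().split())),
                    reverse=True)[:k]
    if not ranked:
        return "No grounded answer: nothing retrieved."
    cited = "\n".join(f"- {c['text'][:80]}  [{c['source']}]" for c in ranked)
    return f"Evidence for: {question}\n{cited}"

print(answer("How do eval gates work?", load_chunks()))
```
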
Mission 05

Vector databases without mystery.

Compare keyword search, embeddings, filters, reranking, and top-k failure.

Artifact: a search comparison harness over one shared question set.
Mission 06

Evals before belief.

Make golden questions, regression checks, refusal checks, and cost limits.

Artifact: a 20-question eval file for hallucination, retrieval drift, refusals, and latency.
Mission 07

Tool use, MCP, and boundaries.

Define tool contracts, scopes, approval steps, dry runs, and trace logs.

Artifact: one safe tool with an input schema, dry-run mode, permission check, and trace log.
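
A minimal sketch of one such tool, with hypothetical names. The schema check, permission gate, dry run, and trace log all sit around a single function.

```python
# Sketch of the Mission 07 artifact: one tool with an input check, a dry-run
# mode, a permission gate, and a trace log. Names are illustrative.
import json
import time

TRACE: list[dict] = []

def create_ticket(args: dict, *, caller_scopes: set[str], dry_run: bool = True) -> dict:
    # Input schema check
    if not isinstance(args.get("title"), str) or not args["title"].strip():
        result = {"status": "rejected", "reason": "title must be a non-empty string"}
    # Permission check
    elif "tickets:write" not in caller_scopes:
        result = {"status": "denied", "reason": "missing scope tickets:write"}
    # Dry run: show what would happen, change nothing
    elif dry_run:
        result = {"status": "dry_run", "would_create": args["title"]}
    else:
        result = {"status": "created", "id": "TICKET-PLACEHOLDER"}
    TRACE.append({"ts": time.time(), "tool": "create_ticket",
                  "args": args, "result": result})
    return result

print(create_ticket({"title": "Rotate the OAuth secret"},
                    caller_scopes={"tickets:write"}))
print(json.dumps(TRACE, indent=2))
```
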
Mission 08

Ship the loop.

Connect prompts, retrieval, tools, evals, traces, cost guards, and fallback paths.

Artifact: a production agent capstone with logs, citations, evals, and a visible decision trail.
AI platform record

The current competitive lane.

2023 — present
Founder & Principal AI Engineer · Basilecom · Transmission Agency

Sole technical decision-maker for AI platform strategy at a global B2B marketing agency, advising the SVP Global Operations across UK, US, and APJ. Owned MCP server architecture, LLM agent systems, Snowflake governance, vendor procurement, local inference strategy, and desktop-agent policy.

proof surface
Production AI operating layer · LLM agents · RAG · MCP · AdTech automation

458 agent-callable tools governed across 12+ production MCP servers, 23 packaged Claude skills (from a 45+ skill internal library used by 30 practitioners), 1,104 eval tests across 29 suites, 34-page Monday.com MCP assessment, 52-level Claude Field Guide, and 4 enterprise B2B clients on Jasper workflow.

Software systems record

The operating base underneath the AI work.

1998 — 2023
Principal Engineer · Basilecom · clients: IBM, Atlas Air, Dragos, U.S. Air Force AMC

Started in 1998 as a Fordham University web developer while still a CS student, then carried that self-taught habit through client systems, IBM global search modernization, Atlas Air "Hawk" global flight scheduling, Dragos ICS/OT cybersecurity, and classified Air Force AMC mission planning. Teams of 4–20 across 12 time zones, ~90% client retention, 75% repeat business, 5 engineers coached to senior roles.

2017 — 2018
Senior Full Stack Engineer & Data Scientist · IntegraMed Fertility

Predictive ML on 200+ clinical features driving ~30% IVF success-prediction improvement. HIPAA / SOC 2 / PCI DSS platform across 50+ clinics, 40K+ IVF cycles, $100M+ in patient financing at 99.9% uptime. Zero-downtime integration of 9 acquired clinics and 50K+ patient records. $5M+ revenue impact through optimized treatment protocols.

2015 — 2016
Senior Full Stack Engineer · Teladoc Health

Scaled platform through NYSE debut: 12.2M → 15.1M members and 240K+ quarterly visits. Migrated backend to Elixir/Phoenix for real-time telehealth with sub-2-second WebRTC connect times across ~50K concurrent sessions. CVS MinuteClinic API — first NCQA telehealth credentialing.

2013 — 2014
Senior Full Stack Engineer · BaubleBar

First technical hire — built platform driving $10M+ revenue with a 30% conversion-rate lift. Celebrity launch sites at 100K+ concurrent users. Served on executive search panel for hiring the CTO.

earlier
Senior Engineer · Cannes Grand Prix & SABRE Gold campaigns · 360i / Dentsu

Engineering lead on Oreo Daily Twist and the Super Bowl blackout real-time response (sub-5-minute content decisions at millions of concurrent users), Oscar Mayer Bacon Barter, Coca-Cola Polar Bowl, and platforms for Marvel, NBC, and National Geographic. 3-year technical advisor to Polywork (Product Hunt Golden Kitty winner, 50K+ users in 48 hours).

Education & certifications

Foundations underneath the work.

degree
B.S. Computer Science · Fordham University

Formal CS foundation for the operating systems, data structures, compilers, and web work that started while still a student.

ML cert
Machine Learning Specialization · Stanford / Coursera

Classic supervised learning, model evaluation, bias/variance tradeoffs, and practical ML workflows refreshed against current AI systems.

DL cert
Deep Learning Specialization · DeepLearning.ai

Neural-network, sequence-model, and optimization foundations behind the production RAG and agent-eval work.

service
15-year volunteer · Civil Air Patrol · U.S. Air Force Auxiliary · security-clearance eligible

Long-running operational context for mission planning, chain-of-command communication, and work where process discipline matters.

14 — Public artifacts

Things a stranger can inspect.

The competitors with the strongest careers have public gravity: papers, talks, open source, courses, and durable writing. This is the current public surface for the production-AI lane.

Research labs & public experiments

These are public applied-ML labs used to demonstrate local-first MLOps, model evaluation, cheminformatics, and reproducible pipelines. They support the case studies — they are not the production case studies themselves.

15 — First 90 days

What happens after the conversation.

The strongest competitor sites do not just list credentials; they make the next step feel concrete. This is the operating plan I would use to turn an AI mandate into controlled production movement.

Days 01-15

Map the surface.

Inventory the real AI system, including the quiet parts that usually live in Slack threads, browser extensions, notebooks, and one-off vendor pilots.

  • Models, prompts, tools, data paths, and owners
  • Shadow automations and approval gaps
  • Current costs, latency, failure modes, and risk
Days 16-30

Install control.

Turn the inventory into a practical operating layer: a tool registry, access boundaries, model policy, eval gates, and the first fast wins.

  • Tool contracts, scopes, and identity boundaries
  • Risk register with severity and ownership
  • One workflow selected for governed acceleration
Days 31-60

Ship the pilot.

Build one visible workflow that proves the standard: private-data handling, retrieval quality, citations, approvals, telemetry, and rollback.

  • Eval suite wired into release decisions
  • Human-in-the-loop review where it matters
  • Business-facing dashboard for quality and adoption
Days 61-90

Harden the system.

Move from impressive pilot to repeatable operating practice: documentation, training, incident paths, procurement standards, and scale rules.

  • Rollout policy and production readiness checklist
  • Team enablement surface and ownership model
  • Next-quarter roadmap with measured tradeoffs
Outputs by day 90

AI system inventory, governed tool registry, eval suite, risk register, rollout policy, working pilot, training surface, and a credible roadmap for the next quarter.

Start the conversation
Portrait of Philip John Basile

Twenty-eight years in. The unfashionable parts of the job — eval gates, rollback paths, latency budgets, the failure modes nobody writes blog posts about — are the parts I find most interesting.

Currently available for senior applied AI roles and fractional technical leadership.

If you have an AI system that needs to work in the real world.

For Principal AI Engineer, Staff Applied AI, AI Platform, Agent Systems, RAG, MCP, or fractional AI leadership work — start here.

Email directly