Most teams scaling AI agents to production are not blocked by the model. They are blocked by a CISO meeting that nobody on the team prepared for. The demo worked. The Slack thread was enthusiastic. Then a security officer or a compliance lead walks in with five questions, and the deployment slips by a quarter — sometimes two. Every team I have watched build agent systems hits the same wall in the same order. The model is fine. The infrastructure underneath the model was never built.
These are the five questions. None of them are about the model. All of them are about the dependency layer beneath it — the layer most teams treat as something to figure out after launch. It is not. It is the gate to launch.
Can Your Agent Prove It Is the Agent It Claims to Be?
OAuth was designed to delegate a human user's authority to an application. Its machine-to-machine flows authenticate a client credential, not the model, version, and configuration behind it, and there is no widely adopted equivalent for agents. When Agent A calls Agent B, "trust me, I am the production scanner" is not a credible answer in any enterprise compliance regime. Model spoofing, silent version downgrades, and unsigned model swaps are not theoretical; they are what a stack assembled in 2025 permits by default. If you cannot cryptographically prove which model, which version, and which configuration produced a given action, you do not have an agent identity. You have a logo on a dashboard.
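A minimal sketch of what such a proof could look like, assuming the agent runtime and the verifier share a secret key. Python's standard library has no public-key signatures, so HMAC stands in here for what a real deployment would more plausibly do with asymmetric keys. The field names (`model`, `version`, `config_sha256`), the function names, and the sample values are all hypothetical, not part of any standard.

```python
import hashlib
import hmac
import json


def canonical(obj: dict) -> bytes:
    """Serialize deterministically so signer and verifier hash the same bytes."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_action(key: bytes, identity: dict, action: dict) -> dict:
    """Bind an action to the identity (model, version, config) that produced it."""
    envelope = {"identity": identity, "action": action}
    tag = hmac.new(key, canonical(envelope), hashlib.sha256).hexdigest()
    return {"identity": identity, "action": action, "signature": tag}


def verify_action(key: bytes, signed: dict) -> bool:
    """Recompute the tag; any change to identity or action invalidates it."""
    envelope = {"identity": signed["identity"], "action": signed["action"]}
    expected = hmac.new(key, canonical(envelope), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed["signature"])


# Hypothetical values for illustration only.
KEY = b"demo-shared-secret"
identity = {
    "model": "production-scanner",
    "version": "1.4.2",
    "config_sha256": hashlib.sha256(b"temperature=0").hexdigest(),
}
signed = sign_action(KEY, identity, {"tool": "scan", "target": "repo-a"})
```

Calling `verify_action(KEY, signed)` succeeds only while the identity and action are exactly what was signed. A silent downgrade that rewrites `version` makes verification fail, which is the property the CISO is asking about.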
Can You Prove What Your Agent Actually Did?
Output verification alone is not enough. Two agents can return the same answer for different reasons — one because it ran the work, one because it hallucinated a plausible result. Process attestation is the layer most teams skip: a signed, tamper-evident log of every step the agent took, every tool it called, every input it received, every decision it branched on. Financial auditing exists because humans can lie too; the infrastructure is what makes the work checkable. Agents need the same thing. If a regulator asks "show me what this agent did on March 14 at 2:47 PM," the answer needs to be a transcript, not a vibe.
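One common way to make a step log tamper-evident is a hash chain, sketched below under simplifying assumptions. Each entry commits to the hash of the entry before it, so editing any earlier step breaks every later hash. The entry layout and the names `append_step` and `verify_log` are illustrative. The sketch covers only the tamper-evident half; the "signed" half would require signing the latest hash with a key the agent operator controls.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, step: dict) -> str:
    """Hash a step together with the hash of the entry before it."""
    payload = json.dumps({"prev": prev_hash, "step": step}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_step(log: list, step: dict) -> None:
    """Append a step, chaining it to the current head of the log."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"step": step, "prev": prev_hash, "hash": _digest(prev_hash, step)})


def verify_log(log: list) -> bool:
    """Walk the chain from the start and recompute every hash."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["step"]):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical steps for illustration only.
log = []
append_step(log, {"kind": "input", "value": "scan repo-a"})
append_step(log, {"kind": "tool_call", "tool": "linter", "args": ["src/"]})
append_step(log, {"kind": "decision", "branch": "report_findings"})
```

A chain like this detects edits to recorded steps but not truncation of the tail, so the latest hash would also need to be signed or published somewhere the agent cannot rewrite.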
Can You Verify the Output Without Re-Doing the Work?
My code-quality scanner produces 406 findings in 35.2 seconds. That is the entire pitch of the invisible workforce. It is also the entire problem: how does the CISO know those 406 findings are real? If verification means rerunning the scan with a human team for two weeks, the agent saved no time — it just moved the cost. Real verification means deterministic re-execution, cross-agent consensus, or external attestation against ground truth. If your only answer is "the model is usually right," you do not have a production system. You have a demo that ships output the buyer cannot trust.
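Of the three options, cross-agent consensus is the easiest to sketch. The example below assumes each agent reports its findings as a set of string identifiers and accepts only findings that at least `quorum` agents agree on. The finding format and the `consensus` function are hypothetical. Agreement is also not ground truth: agents built on the same model can be wrong in the same way.

```python
from collections import Counter


def consensus(reports: list, quorum: int) -> tuple:
    """Split findings into those reported by >= quorum agents and the rest."""
    if not 1 <= quorum <= len(reports):
        raise ValueError("quorum must be between 1 and the number of reports")
    # set(report) ensures each agent counts at most once per finding.
    counts = Counter(finding for report in reports for finding in set(report))
    agreed = {f for f, n in counts.items() if n >= quorum}
    disputed = set(counts) - agreed
    return agreed, disputed


# Hypothetical findings from three independent scanner agents.
reports = [
    {"sql-injection:app.py:42", "unused-import:util.py:3"},
    {"sql-injection:app.py:42", "unused-import:util.py:3", "dead-code:old.py:10"},
    {"sql-injection:app.py:42"},
]
agreed, disputed = consensus(reports, quorum=2)
```

Here the two findings reported by at least two agents land in `agreed`, and the single-agent finding lands in `disputed`, which is the short list a human reviewer would actually need to look at.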
When Your Agent Calls Another Agent, Who Owns the Failure?
Multi-agent stacks now ship in production. The orchestrator decomposes the task, hands subtasks to specialist agents, and assembles the result. The liability chain has not caught up. If the report agent calls a forecast agent that calls a research agent and the chain produces a wrong number on which a customer acts — whose contract was breached? Whose insurance pays? Whose audit log is canonical? "We use LangChain" is not an answer a general counsel accepts. Orchestration is a dependency-layer component for a reason: the handoff protocol is the contract, and most stacks do not have one written down.
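Writing the handoff down can start with something as small as a record per delegation. The sketch below assumes each handoff names its parent task and an accountable owner, so a wrong number can be traced from the agent that produced it back to the party that accepted the original request. The `Handoff` fields, the agent names, and the owners are hypothetical. This is not a LangChain API or any framework's actual protocol.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Handoff:
    task_id: str
    parent_id: Optional[str]  # None for the root task
    caller: str
    callee: str
    owner: str  # party accountable for the callee's output (hypothetical field)


def responsibility_chain(handoffs: list, task_id: str) -> list:
    """Walk from a task back to the root, collecting each handoff on the way."""
    by_id = {h.task_id: h for h in handoffs}
    chain, seen = [], set()
    current = by_id.get(task_id)
    # The seen set guards against a malformed log with a parent cycle.
    while current is not None and current.task_id not in seen:
        seen.add(current.task_id)
        chain.append(current)
        current = by_id.get(current.parent_id) if current.parent_id else None
    return chain


# Hypothetical chain: report agent -> forecast agent -> research agent.
handoffs = [
    Handoff("t1", None, "customer-api", "report-agent", "reporting-team"),
    Handoff("t2", "t1", "report-agent", "forecast-agent", "forecast-vendor"),
    Handoff("t3", "t2", "forecast-agent", "research-agent", "research-vendor"),
]
chain = responsibility_chain(handoffs, "t3")
owners = [h.owner for h in chain]
```

The record does not settle who is liable. It only makes the question answerable, because every link in the chain has a named owner before the failure happens.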
Will You Know If It Stops Working Tomorrow?
Server monitoring tells you a process crashed. Agent monitoring has to tell you that a process is still running, still returning answers, and silently producing garbage. The failure mode is not the 500 error — it is the 200 OK with a plausible-but-wrong response. Drift, prompt regression, upstream model deprecation, and tool-API breakage all present as "the agent still works, just worse." If your observability stack cannot distinguish "agent answered" from "agent answered correctly," you will discover the regression the same way 77% of enterprises discover their agent failures — from a customer.
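One way to separate "agent answered" from "agent answered correctly" is to run known-answer probes, sometimes called canaries, on a schedule. The sketch below assumes the agent can be called as a function from prompt to text and that exact string match is a good enough correctness check. Both are simplifications, and the names `canary_pass_rate` and `is_healthy`, the probes, and the 0.9 threshold are hypothetical.

```python
from typing import Callable, Dict


def canary_pass_rate(agent: Callable[[str], str], canaries: Dict[str, str]) -> float:
    """Fraction of known-answer probes the agent answers correctly."""
    if not canaries:
        raise ValueError("at least one canary is required")
    passed = sum(
        1 for prompt, expected in canaries.items()
        if agent(prompt).strip() == expected
    )
    return passed / len(canaries)


def is_healthy(agent: Callable[[str], str], canaries: Dict[str, str],
               threshold: float = 0.9) -> bool:
    """Healthy means answering correctly, not merely answering."""
    return canary_pass_rate(agent, canaries) >= threshold


# Hypothetical probes with known answers.
CANARIES = {"2+2": "4", "capital of France": "Paris"}


def healthy_agent(prompt: str) -> str:
    return CANARIES[prompt]


def degraded_agent(prompt: str) -> str:
    # Still responds to every prompt (the "200 OK"), but with a stale answer.
    return "4"
```

Both stand-in agents respond to every probe, so an uptime check would pass for each. Only the pass rate shows that `degraded_agent` gets half of them wrong.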
None of these are model questions. All five are infrastructure questions. The teams that answer them before scaling get to production. The teams that wait until after the breach pay consultants enterprise-sized fees to answer them in retrospect.
Chapters 2 and 5 of The AI Agent Economy are the long versions of this checklist — the five-component dependency layer, the three-layer trust framework, and the reason both will be more valuable than the agents that depend on them.
Frequently asked
What is the biggest reason AI agent deployments fail in production?
Most agent deployments fail at the dependency layer, not at the model. The pattern is consistent: a working demo, an enthusiastic team, and then a six-month stall when enterprise buyers ask basic questions about identity, process attestation, output verification, orchestration liability, and monitoring — and the team has no answers. The model worked fine. The infrastructure around the model was never built.
What should an enterprise check before scaling an AI agent?
Five things, in order: (1) Agent identity — can the agent prove which model and version it actually is? (2) Process attestation — is there a verifiable record of what steps the agent took? (3) Output verification — can the result be checked without redoing the work? (4) Orchestration accountability — when one agent calls another, who is liable for the result? (5) Observability — will the team know if the agent silently stops working tomorrow? If any answer is 'we'll figure it out later,' the deployment is not production-ready.
How do you verify what an AI agent actually did?
Through process attestation: a signed, tamper-evident log of every step the agent took, with inputs, outputs, and intermediate decisions. Output checking alone is not enough, because two agents can produce the same answer for different reasons: one did the work, and the other hallucinated a plausible result. Process attestation is the layer that lets a human auditor reconstruct what the agent did after the fact, the same way financial audit trails reconstruct human work.
What is agent identity and why does it matter?
Agent identity is the cryptographic ability to prove that a given action was taken by a specific model, version, and configuration — not a spoofed wrapper, not a downgraded fallback model, not an older snapshot that has since been deprecated. It matters because every compliance regime built for the agent economy will treat 'we cannot prove which agent did this' as functionally equivalent to 'an unidentified user did this' — and unidentified actions do not pass audit.