NTCTech

Originally published at rack2cloud.com

Nobody Knows How Many AI Agents They're Running

Field Notes — Engineering Notes from the Complexity Gap | Rack2Cloud

Ask an infrastructure team how many virtual machines they run and they'll give you a number. Ask how many Kubernetes clusters they operate and they'll point to a dashboard. Ask for an AI agent inventory and the answer usually becomes a discussion about definitions.

That discussion is itself the problem.

[Figure: AI agent inventory classification gap, workflows misclassified as automation instead of agents]

The Classification Problem Nobody Solved

Before any organization can inventory its agents, it has to decide what counts as one. Most haven't.

| Example | Agent? | Why It's Ambiguous |
| --- | --- | --- |
| Scheduled GPT workflow | Maybe | Invokes a model but may not have persistent tool grants or autonomous decision logic |
| Copilot Studio workflow with tool access | Maybe | Executes autonomously but ownership often remains with the workflow creator rather than infrastructure |
| n8n automation with LLM call | Depends | Classification hinges on whether it makes decisions or merely routes output |
| LangChain service | Usually | Has persistent context and tool invocation, but is deployed like an application |
| Claude Desktop with MCP tools | Maybe | Session-scoped authority that can span multiple systems; tool grants outlive the session |
| Bedrock Agent | Yes | Explicit agent runtime, managed execution, tool grants tracked at the service level |

These examples aren't edge cases. Most of them are production deployments at mid-size enterprises right now. The point is that organizations are making inventory decisions, including the decision not to maintain one, without agreeing on what they're inventorying.

Classification failures aren't new. Dependency classification failure drove the VMware dependency audit problem. Infrastructure classification failure is how shadow control planes form. Authority classification failure is what MCP tool chains expose. The agent inventory problem is the same failure recurring in a new layer of AI infrastructure architecture: the organization hasn't defined the category, so it can't close the gap.

The Deployment Pattern That Made This Inevitable

Agents entered the enterprise through workflow tooling, not infrastructure procurement. That single fact explains why nobody inventories them.

Virtual machines entered through infrastructure teams — request process, allocation review, tracking system, chargeback model. Containers entered through platform teams — Kubernetes clusters, namespaces, resource quotas, admission controllers. In both cases the inventory system existed before the workload arrived.

Agents entered through business users building Copilot Studio workflows, automation teams deploying n8n and Zapier integrations, developers wiring LangChain services into internal tooling, prompt engineers connecting models to production data sources via MCP, and SaaS vendors activating agent capabilities inside tools the enterprise already licensed.

None of those paths required infrastructure involvement. None triggered a procurement workflow with an inventory requirement. The agents exist in environments the infrastructure team owns — but they arrived through governance gaps those environments don't cover.

The Agent Already Exists. Nobody Calls It One.

Many organizations aren't failing to inventory agents because they don't have them. They're failing because they still classify them as workflows.

The scheduled summarization job that runs every morning, pulls from three internal systems, and posts a digest to Slack — that's an agent. The ticket triage workflow that reads incoming requests, classifies severity, assigns queues, and escalates edge cases — that's an agent. The procurement assistant that checks vendor contracts against policy and flags renewal risks — that's an agent.

Each was built by a team that would describe it as automation. The distinction matters architecturally: a workflow executes a fixed sequence. An agent makes decisions based on model output, invokes tools, and produces actions with real side effects. The execution authority profile is different. The governance requirement is different.
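The distinction above can be written down as an explicit rule, which is often the quickest way to get teams to agree on it. The sketch below is one possible encoding, not a standard: the `Automation` type, its field names, and the "all three criteria" threshold are assumptions made for illustration, and a real organization might weight the criteria differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Automation:
    """Hypothetical description of a deployed automation (illustrative fields)."""
    name: str
    decides_from_model_output: bool  # does model output choose the next step?
    invokes_tools: bool              # does it call systems or APIs on its own?
    has_side_effects: bool           # does it write, post, assign, or escalate?


def classify(a: Automation) -> str:
    """Return "agent" when all three criteria hold, otherwise "workflow".

    Assumption: all three are required. A fixed sequence that calls tools and
    has side effects is still a workflow under this rule.
    """
    if a.decides_from_model_output and a.invokes_tools and a.has_side_effects:
        return "agent"
    return "workflow"


# The ticket triage example from the text: the model classifies severity,
# and that output drives queue assignment and escalation.
triage = Automation("ticket-triage", True, True, True)

# A fixed-sequence job for contrast (hypothetical).
nightly_copy = Automation("nightly-file-copy", False, True, True)

print(classify(triage))        # agent
print(classify(nightly_copy))  # workflow
```

The value of a rule like this is less in the code than in forcing the three questions to be answered for every automation, including the ones their builders describe as "just a workflow".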

When agents arrive through non-infrastructure paths and get classified as workflows, they never reach any inventory layer. They exist at production scale in invisible categories.

[Figure: agent classification gap, workflows misclassified as automation instead of agents]

What "Running" Means — and Why the Inventory Gap Becomes an Authorization Gap

Even once an organization decides what counts as an agent, "how many are running" isn't a straightforward question.

Agents are often stateless between invocations. The execution ends, context clears, compute cost drops to zero. By a naive definition, it isn't running. But the tool grants remain active. The webhook registration persists. The scheduled trigger fires again tomorrow. The downstream system it's authorized to write to hasn't revoked that access. The agent is dormant, not decommissioned — and the distinction is rarely tracked.
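A minimal sketch of how the dormant-versus-decommissioned distinction could be tracked, assuming an inventory that records a last-active timestamp and the currently active tool grants for each agent. The `AgentRecord` type, the 30-day idle threshold, and the sample records are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class AgentRecord:
    """Hypothetical inventory row (illustrative fields)."""
    name: str
    last_active: datetime
    active_tool_grants: tuple[str, ...]
    decommissioned: bool = False


def dormant_with_grants(records, now, idle_after=timedelta(days=30)):
    """Names of agents that look idle but still hold active tool grants.

    An agent is flagged when it is not decommissioned, has at least one
    active grant, and has been inactive for longer than idle_after.
    """
    return [
        r.name
        for r in records
        if not r.decommissioned
        and r.active_tool_grants
        and now - r.last_active > idle_after
    ]


now = datetime(2025, 6, 1)  # fixed date so the example is reproducible
records = [
    AgentRecord("morning-digest", datetime(2025, 1, 10), ("slack:write",)),
    AgentRecord("ticket-triage", datetime(2025, 5, 30), ("tickets:assign",)),
    AgentRecord("old-poc", datetime(2024, 3, 1), (), decommissioned=True),
]

print(dormant_with_grants(records, now))  # ['morning-digest']
```

The check deliberately ignores compute cost. An agent that costs nothing while idle still appears here as long as its grants are live.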

This is where the inventory gap becomes an authorization gap. The progression is predictable:

Unknown Count → Unknown Ownership → Unknown Authority → Unknown Risk

An organization that can't answer how many agents exist also can't answer who owns each one, what each is authorized to do, or what actions it can trigger. That's not an inventory problem anymore — it's the same class of gap that the Governance & Runtime Control layer is designed to close, and the same structural failure that emerges when console access bypasses IaC, or when SaaS control planes accumulate authority nobody audited. The mechanism is different. The failure mode is structurally identical.

[Figure: AI agent authorization gap, the progression from unknown count to unknown authority]

What a Real AI Agent Inventory Looks Like

A minimum viable agent inventory isn't a CMDB project. It's a set of questions every deployed agent should be able to answer:

| Attribute | Why It Matters |
| --- | --- |
| Owner | Accountability: who is responsible when the agent produces an unintended action |
| Authority Level | Execution scope: not just what tools it can access, but what actions it can authorize |
| Tool Grants | Technical access: which systems, APIs, and data sources it can reach |
| Trigger Mechanism | Activation path: scheduled, event-driven, user-invoked, or chained from another agent |
| Last Active | Lifecycle signal: dormant agents with active tool grants are the core exposure |
| Decommission Criteria | Retirement governance: under what conditions does this agent get turned off |

The authority level attribute is the one most inventory attempts omit. Tool grants describe technical access. Authority level describes execution scope — whether the agent can approve, publish, modify, escalate, or commit. Two agents with identical tool grants can have radically different risk profiles depending on what those tools allow at the action layer. This exposure compounds when AI vendors activate agent capabilities inside tools the enterprise already licensed — the authority level was never defined because the agent was never classified as one.
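As a rough illustration, the six attributes can be modeled as a record in which every unanswered question is visible as a gap. This is a sketch under stated assumptions, not a schema recommendation: the `InventoryEntry` type, the `unanswered` helper, and all sample values (names, dates, grant strings) are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class InventoryEntry:
    """One agent in a minimum viable inventory; None means "nobody has answered"."""
    name: str
    owner: Optional[str] = None
    authority_level: Optional[frozenset] = None  # actions it can authorize
    tool_grants: Optional[frozenset] = None      # systems and APIs it can reach
    trigger_mechanism: Optional[str] = None
    last_active: Optional[str] = None            # ISO 8601 date, kept as text here
    decommission_criteria: Optional[str] = None


def unanswered(entry: InventoryEntry) -> list:
    """Attribute names, in declaration order, that are still None."""
    return [f.name for f in fields(entry) if getattr(entry, f.name) is None]


# Two agents with identical tool grants but different authority levels.
flagger = InventoryEntry(
    name="contract-flagger",
    owner="procurement-ops",
    authority_level=frozenset({"escalate"}),
    tool_grants=frozenset({"contracts-api"}),
    trigger_mechanism="scheduled",
    last_active="2025-05-30",
    decommission_criteria="90 days idle or owning team dissolved",
)
approver = InventoryEntry(
    name="contract-approver",
    authority_level=frozenset({"approve", "commit"}),
    tool_grants=frozenset({"contracts-api"}),
    trigger_mechanism="event-driven",
    last_active="2025-05-30",
)

print(flagger.tool_grants == approver.tool_grants)          # True
print(flagger.authority_level == approver.authority_level)  # False
print(unanswered(approver))  # ['owner', 'decommission_criteria']
```

Keeping `authority_level` separate from `tool_grants` is the design choice that matters here: the two sample entries are indistinguishable by technical access alone, and only the authority field shows that one of them can approve and commit.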

Architect's Verdict

The problem isn't that organizations don't know how many agents they're running. The problem is that they don't know how many entities are capable of exercising authority inside their environment. Counting agents is an inventory exercise. Understanding what they're allowed to do is an architecture exercise.

The classification failure precedes the inventory failure. Until organizations define what an agent is — consistently, across business units, workflow teams, and infrastructure — the inventory will always be incomplete. Every Copilot Studio workflow with tool access that gets classified as "just automation" is a gap in the authority map, not just a missing row in a spreadsheet.

Infrastructure teams built accountability structures for VMs and containers because those workloads arrived through processes that required it. Agents arrived through processes that didn't. Fixing the inventory problem means retrofitting that accountability structure onto a layer that was specifically deployed to avoid it.
