The POC works. The agent triages tickets, drafts documents, and pulls customer history on demand. Leadership sees the demo and wants it in three departments by Q3.
Then the questions nobody asked during the pilot start surfacing.
- Where does the agent get production data without someone manually exporting a CSV?
- Who else on the team deployed an agent last month that nobody registered?
- What happens when it writes the wrong value into your CRM?
IBM’s research found that 76% of executives are already operating agent POCs. The problem is that the infrastructure those agents need to run reliably in production (data access, coordination, governance, observability) wasn’t part of the pilot.
The gap between running a POC and running agents in production lies in the capabilities you build around them.
Here’s what those capabilities look like when your data team is three people and your engineering bench is five.
Your agents need a coordinator, not a control plane
A support team deploys two agents. One auto-responds to customer tickets. The other routes tickets to specialists based on complexity.
A billing dispute comes in. The response agent drafts a refund offer and sends it. The routing agent flags the same ticket and forwards it to the billing team for review.
The customer now has a refund promise in their inbox while the billing team is still deciding whether a refund is warranted. Both agents did exactly what they were designed to do, and the customer experience is worse for it.
This happens because neither agent knows the other exists. There is no coordination layer deciding which agent acts first, which waits, or which defers to a human.
What right-sized orchestration looks like
Microsoft’s own Store Assistant runs one coordinator agent that manages four to five expert skills: sales advice, technical support, human handoff, and conversation close.
Microsoft reported revenue at 142% of forecast and human transfers down 46% six months after launch. The architecture is deliberately simple — one planner, a roster of experts with explicit plug-ins, change-feed triggers from Cosmos DB feeding Azure Functions.
Microsoft’s own Cloud Adoption Framework recommends starting with a single agent and moving to multi-agent only when specific criteria require it, such as crossing security boundaries or enforcing separation of duties. The Azure Architecture Center’s orchestration guide puts it directly: use the lowest level of complexity that reliably meets your requirements.
What you can do
Start with one agent that has access to multiple tools. Move to a second agent only when the workflow crosses a security boundary or requires a genuinely separate knowledge domain.
If you run multiple agents, designate one as the coordinator and keep the total to five or fewer. The test is simple: if the infrastructure to coordinate your agents takes longer to build than the agents themselves, you’ve over-engineered the solution.
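The coordinator pattern above can be sketched in a few lines. This is a minimal illustration, not a production design: the `Ticket` shape, the `needs_human_review` rule, and both agent functions are hypothetical stand-ins. The point it demonstrates is the single decision point — exactly one agent acts on any given ticket, so the refund-promise-while-under-review conflict described earlier cannot occur.

```python
from dataclasses import dataclass

@dataclass
class Ticket:
    body: str
    amount: float = 0.0

def needs_human_review(ticket: Ticket) -> bool:
    # Hypothetical routing rule: billing disputes and large amounts
    # go to a specialist before any customer-facing response.
    return "refund" in ticket.body.lower() or ticket.amount > 100

def respond_agent(ticket: Ticket) -> str:
    # Stand-in for the auto-response agent.
    return "auto-response drafted"

def route_agent(ticket: Ticket) -> str:
    # Stand-in for the routing agent.
    return "routed to billing specialist"

def coordinator(ticket: Ticket) -> str:
    """Single decision point: one agent acts, the other never fires."""
    if needs_human_review(ticket):
        return route_agent(ticket)
    return respond_agent(ticket)
```

With a coordinator in place, adding a third expert means adding one branch here, not wiring three agents to watch each other.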
If you can’t list your agents, you can’t govern them
Three months after a team’s first agent goes live, the data team has deployed one for report generation, a product manager has built another using a low-code tool, and an engineer’s decommissioned POC still runs nightly, pulling customer records from the CRM.
When the CFO asks how many AI systems access financial data, the answer takes three days because nobody kept a list.
Grant Thornton’s survey found that 75% of boards have approved major AI investments, but 48% have not set governance expectations for AI, and 46% have not integrated AI risk into ongoing board or committee oversight.
The investments are funded. The mechanisms to determine whether what was funded is working, or even still running, aren’t in place. CIO.com calls unmanaged agents “the new shadow IT” and projects that the average organization will soon run 50+ specialized agents.
At mid-market scale, the number will be smaller, but the problem is the same. Every unregistered agent is an unmanaged access point to production data.
What you can do
The registry at mid-market scale is a shared document with five columns: agent name, owner, systems accessed, data sensitivity level, and last review date. Add a sixth: the kill-switch procedure for shutting the agent down if something goes wrong. Review it quarterly.
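A registry this small does not need a platform; even modeled in code it is one record type and one query. The sketch below assumes the six columns described above plus a 90-day review window — the field names, example agents, and kill-switch descriptions are illustrative, not prescriptive.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class AgentRecord:
    name: str
    owner: str
    systems_accessed: list      # production systems this agent touches
    data_sensitivity: str       # e.g. "public", "internal", "regulated"
    last_review: date
    kill_switch: str            # how to shut the agent down

def overdue_reviews(registry, today, max_age_days=90):
    """Return agents whose quarterly review is past due."""
    return [a.name for a in registry
            if today - a.last_review > timedelta(days=max_age_days)]

# Illustrative entries, including the forgotten POC from the scenario above.
registry = [
    AgentRecord("report-gen", "data-team", ["warehouse"], "internal",
                date(2025, 1, 10), "disable service principal"),
    AgentRecord("old-poc", "unknown", ["CRM"], "regulated",
                date(2024, 6, 1), "delete nightly cron job"),
]
```

The same query answers the CFO's question in seconds instead of three days: filter on `systems_accessed` or `data_sensitivity` and you have the list.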
The point is not tooling. Grant Thornton’s own recommendation is a federated model with central policy and delegated assessment, which, at mid-market scale, means that one person defines the boundaries, and each agent-owner verifies compliance within them.
Your agents are only as reliable as the data they reach
Dashboards built on stale data show wrong numbers. Agents built on stale data take wrong actions. That’s the difference that makes data quality an operational risk for agents in a way it never was for BI.
An agent that queries a batch-refreshed warehouse or a dataset with inconsistent definitions will give confident answers built on unreliable inputs.
The fix is data products: governed datasets with consistent definitions and standard interfaces.
What this looks like at mid-market scale
Gay Lea Foods, a dairy cooperative with ~1,200 member farms, had monthly reporting that took 24 days post-close because data was fragmented across Dynamics 365 and Excel with inconsistent definitions across departments.
After unifying onto a single Fabric semantic layer, reporting dropped to under a day. One governed dataset, standard SQL endpoints, every consumer querying the same definitions — including agents.
The same principle applies to context. An agent without persistent memory asks the same questions every session, eroding trust as fast as bad source data. Managed services like Azure AI Search, Agentic Retrieval, and Fabric IQ’s shared ontology handle this without a dedicated retrieval team.
Agent governance isn’t AI governance with a new label
Most AI governance frameworks were built for models that recommend. Agents don’t recommend. They act. They issue credits, modify records, send communications, and trigger workflows. And some of those actions can’t be cleanly reversed once they’ve been executed.
IBM’s AI Ethics Board identifies non-reversibility as the defining agent risk — alongside opaqueness, open-endedness, and compounding complexity — that traditional AI governance doesn’t address.
Grant Thornton’s survey found organizations with fully integrated AI are nearly four times more likely to report revenue growth than those still piloting. The difference, consistently across the data, is governance — the infrastructure that gives leadership confidence to expand what agents are allowed to do.
What you can do
Before any agent gets write access to a production system, define which actions it can execute independently and which require human confirmation. Refunds above a threshold, modifications to regulated records, outbound communications — approval gates built into the workflow, not reviewed after the fact.
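An approval gate of this kind can be a single function the agent must pass through before any write executes. This is a sketch under assumed policy rules — the threshold value, action names, and return strings are hypothetical; the structure it shows is the gate sitting before execution rather than in an after-the-fact review.

```python
REFUND_THRESHOLD = 50.0  # hypothetical policy threshold

def requires_approval(action: str, amount: float = 0.0,
                      regulated: bool = False) -> bool:
    """Policy check evaluated before the agent acts, not after."""
    if action == "refund" and amount > REFUND_THRESHOLD:
        return True
    if action == "modify_record" and regulated:
        return True
    if action == "send_outbound":
        # All outbound communications require a human sign-off.
        return True
    return False

def execute(action: str, amount: float = 0.0, regulated: bool = False) -> str:
    if requires_approval(action, amount, regulated):
        return "queued for human approval"
    return "executed autonomously"
```

Small refunds flow through untouched; anything above the threshold, any regulated-record change, and every outbound message stops at the gate. Widening what the agent may do autonomously then means changing one policy function, with the registry telling you which agents the change affects.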
The mid-market teams that will scale agents successfully are the ones that treat the infrastructure around the agent as the actual product. Coordination, inventory, data, governance — the agent is the easiest part to build. The scaffolding that makes it reliable is where most teams under-invest, and where the scaling gap lives.
If you’re evaluating how to build the infrastructure that enables agents to run reliably, here’s how we approach it.