The 5-layer governance framework for enterprise agentic AI - autonomy classification, action controls, escalation logic, and compliance mapped to the EU AI Act.
In eighteen months, agentic AI has moved from research prototype to enterprise reality. Gartner has projected that agentic AI capabilities will feature in a third of enterprise software applications by 2028, up from less than one per cent in 2024.[1] Salesforce launched Agentforce. Microsoft built agent orchestration into Copilot Studio. ServiceNow, SAP and dozens of others followed. The question for enterprises is no longer whether to deploy AI agents but whether they have agentic AI governance in place before something goes wrong.
The urgency is warranted. Agentic AI systems are qualitatively different from the AI that most governance frameworks were designed to address. A classifier assigns labels. A chatbot generates text. An agent pursues goals: planning multi-step workflows, selecting tools, executing actions, observing results and iterating - often across dozens of decisions with minimal human intervention. That operational autonomy is what makes agents valuable. It is also what makes them difficult to govern. Agentic AI governance - the set of policies, controls and oversight mechanisms that ensure autonomous AI agents operate within acceptable risk boundaries - is now an enterprise imperative, not a future consideration.
This guide lays out a practical framework for enterprise agentic AI governance. It covers what agentic AI systems are and why they demand distinct governance treatment; how existing regulations apply; the core governance controls organisations should implement; and how to operationalise those controls without strangling the innovation that agents are meant to enable.
What Makes Agentic AI Different
The term "agentic AI" describes AI systems that can take autonomous actions in pursuit of goals, rather than simply generating outputs for human consumption. The distinction matters because it changes the risk profile fundamentally.
A conventional AI system operates within a request-response paradigm. A user provides input; the system generates output; the user decides what to do with it. The human remains in the loop at every consequential step. An agentic system breaks this pattern. It receives a high-level objective, decomposes it into sub-tasks, selects and invokes tools to accomplish each step, evaluates intermediate results and adjusts its approach - all with varying degrees of human oversight, from full approval at each stage through to fully autonomous execution.
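To make the contrast concrete, here is a minimal sketch of that plan-act-observe loop in Python. The `stub_planner` stands in for what would be a model call in a real agent; all names are illustrative and do not come from any particular framework.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Action:
    tool_name: str
    arguments: dict

def stub_planner(objective: str, history: list) -> Optional[Action]:
    # Hypothetical planner: in a real agent this would be a model invocation
    # that decomposes the objective; here it looks something up once and stops.
    if not history:
        return Action("search", {"query": objective})
    return None  # objective judged complete

def run_agent(objective: str, tools: dict, max_steps: int = 20) -> list:
    history = []
    for _ in range(max_steps):
        action = stub_planner(objective, history)   # plan the next step
        if action is None:                          # planner decides it is done
            break
        result = tools[action.tool_name](**action.arguments)  # act on the environment
        history.append((action, result))            # observe, then iterate
    return history

if __name__ == "__main__":
    tools = {"search": lambda query: f"results for {query!r}"}
    print(run_agent("find the best candidate for this open role", tools))
```

A conventional system would stop after a single generate call; the loop above is what turns a model into an agent, and each pass through it is a decision that governance has to reach.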
Three characteristics distinguish agentic AI from conventional systems and create distinct governance challenges.
Autonomous action-taking
Agents do not merely recommend; they act. They send emails, execute code, modify databases, call APIs, create files and interact with external services. Each action changes the state of the world in ways that may be difficult or impossible to reverse. A misclassified image can be corrected. An email sent to the wrong recipient, a database record overwritten, or a financial transaction executed in error cannot simply be undone.
Multi-step reasoning chains
Agents operate across extended reasoning chains where each step builds on the last. A single high-level instruction - "find the best candidate for this open role" - may trigger dozens of intermediate decisions about which databases to query, what criteria to weight, which candidates to shortlist and how to communicate with them. Governance frameworks designed for single-decision systems struggle with this compounding complexity.
Dynamic tool invocation
Modern agent architectures allow agents to discover and invoke tools at runtime - APIs, databases, web services, code interpreters - that may not have been anticipated when the system was designed or assessed. This creates a moving target for risk assessment and compliance. The capabilities of the system on Monday may differ materially from its capabilities on Friday.
These three characteristics interact. An agent with autonomous action-taking, operating across multi-step chains, dynamically invoking tools it discovers at runtime, presents a governance challenge that is not merely harder than governing a conventional AI system - it is structurally different.
How Existing Regulations Apply
A common misconception holds that current AI regulation does not cover agentic systems - that a new legislative framework is needed before governance obligations attach. This is incorrect, and organisations that wait for "agentic-specific" regulation risk finding themselves non-compliant under laws already in force.
The EU AI Act
The EU AI Act's definition of an AI system in Article 3(1) describes "a machine-based system that is designed to operate with varying levels of autonomy and that may exhibit adaptiveness after deployment and that, for explicit or implicit objectives, infers, from the input it receives, how to generate outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments."[2]
Every element of this definition accommodates agentic AI without strain. "Varying levels of autonomy" explicitly contemplates a spectrum from human-directed to fully autonomous operation. "Implicit objectives" - whilst drafted primarily to capture systems where the objective is implied by design or training rather than stated as a prompt - is broad enough to cover the emergent sub-goals that agents pursue when decomposing high-level instructions, though this interpretive position has not yet been tested in enforcement or through formal AI Office guidance.[3] "Decisions" captures the action-selection that agents perform at each step. And "influence physical or virtual environments" reaches beyond passive output generation to encompass the environment-changing actions that define agentic behaviour.
The Act's enforcement timeline is phased. Prohibitions on unacceptable-risk AI systems took effect on 2 February 2025. Obligations for general-purpose AI (GPAI) models apply from 2 August 2025. The full suite of high-risk system obligations under Annex III takes effect on 2 August 2026.[4] Organisations deploying agentic systems in high-risk use cases have a narrowing window to prepare.
Where an agentic system falls within a high-risk use case listed in Annex III - employment decisions, credit scoring, law enforcement, critical infrastructure management - the full suite of high-risk obligations applies. These include risk management systems (Article 9), data governance (Article 10), technical documentation (Article 11), record-keeping and automatic logging (Article 12), transparency (Article 13), human oversight (Article 14), and accuracy, robustness and cybersecurity requirements (Article 15).[5] Article 12 is worth particular emphasis: for systems whose decisions unfold across multi-step reasoning chains with dynamic tool invocations, the logging requirements are both the most technically demanding and the most operationally important obligation to get right.
The Act's operational framework does strain under agentic use cases - particularly around conformity assessment for dynamic systems, the provider-deployer responsibility split when agents invoke third-party tools, and what "proportionate" human oversight means for systems making dozens of micro-decisions per second.[6] But the regulatory perimeter is clear. Agentic AI systems are AI systems under the Act, and the obligations apply.
GPAI model obligations
Many enterprise agents are built on general-purpose AI models provided by third parties - Anthropic, OpenAI, Google, Meta and others. The EU AI Act imposes specific obligations on GPAI model providers under Articles 51-56, including technical documentation, copyright policy transparency and, for models designated as posing systemic risk, adversarial testing and incident reporting.[7] The GPAI Code of Practice, finalised in late 2025, provides a compliance pathway for these obligations.
For enterprise deployers, the interaction between GPAI provider obligations and deployer obligations creates a layered compliance picture. The foundation model provider bears certain responsibilities; the deploying organisation bears others. Where an agent framework assembles a GPAI model, an orchestration layer and third-party tools into a composite system, the allocation of obligations across the value chain becomes a governance challenge in its own right - one the multi-stakeholder section below addresses directly.
ISO/IEC 42001
The international standard for AI management systems, published in December 2023, provides a framework for establishing, implementing and continually improving an AI management system.[8] Whilst it does not address agentic AI specifically, its controls around risk assessment, human oversight and continuous monitoring are directly applicable. Organisations pursuing ISO 42001 certification should ensure their AI management system explicitly covers autonomous agent deployments, not merely conventional AI applications.
NIST AI Risk Management Framework
The NIST AI RMF's four core functions - Govern, Map, Measure, Manage - provide a structured approach to AI risk management that extends naturally to agentic systems.[9] The framework's emphasis on contextual risk assessment and continuous monitoring is particularly relevant given the dynamic nature of agent behaviour. The companion document on generative AI risks (NIST AI 600-1) addresses several risks pertinent to tool-using agents, including prompt injection and unintended data exposure.[10] The OECD, whose AI system definition influenced Article 3(1), has also been actively developing classification frameworks that include autonomy level as a dimension - work that will likely shape how agentic AI governance evolves internationally.[11]
Emerging state and sector-specific requirements
The Colorado AI Act, effective February 2026 subject to pending legislative amendments, requires developers and deployers of high-risk AI systems to exercise reasonable care to avoid algorithmic discrimination in consequential decisions.[12] Agentic systems making or substantially influencing employment, financial, insurance or housing decisions fall squarely within scope. In financial services, interagency model risk management guidance (the Federal Reserve's SR 11-7 and OCC 2011-12) applies to AI agents used in risk assessment and decision-making, though it was designed for traditional statistical models and requires careful interpretation for foundation-model-based agents operating with autonomous tool access.[13]
The Agentic AI Governance Framework
Governing agentic AI requires controls that address the specific characteristics outlined above: autonomous action-taking, multi-step reasoning and dynamic tool invocation. The framework developed here, based on Enzai's work with enterprise governance teams, organises these controls into five layers: autonomy classification, action whitelisting, escalation logic, traceability, and continuous monitoring. Each layer builds on the one before it.
Layer 1: Autonomy classification
Not all agents require the same level of governance. An agent that drafts email replies for human review presents a fundamentally different risk profile from one that autonomously executes financial transactions. The first step in any agentic AI governance programme is classifying each agent's level of autonomy and mapping that classification to proportionate controls.
| Tier | Label | Description | Governance requirement |
|---|---|---|---|
| 1 | Assistive | Agent generates recommendations or drafts; human reviews and approves every action before execution | Standard AI quality controls |
| 2 | Supervised | Agent executes actions within defined parameters; human monitors and retains ability to intervene | Clear monitoring protocols and intervention mechanisms |
| 3 | Bounded autonomous | Agent operates autonomously within a constrained action space and predefined guardrails | Rigorous action whitelisting, escalation logic, continuous monitoring |
| 4 | Fully autonomous | Agent operates with broad discretion across open-ended tasks | Strongest controls: real-time monitoring, comprehensive audit trails, regular human outcome review |
The classification should be use-case specific, not model specific. The same foundation model might power a Tier 1 assistant in one deployment and a Tier 3 autonomous agent in another. Governance obligations attach to the deployment, not the model.
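As a sketch of how such a classification might be represented in practice - tier names from the table above, control sets illustrative rather than exhaustive:

```python
from enum import IntEnum

class AutonomyTier(IntEnum):
    ASSISTIVE = 1
    SUPERVISED = 2
    BOUNDED_AUTONOMOUS = 3
    FULLY_AUTONOMOUS = 4

# Baseline controls attached to each tier; obligations follow the deployment,
# not the underlying model, so the same model may appear at several tiers.
BASELINE_CONTROLS = {
    AutonomyTier.ASSISTIVE: {"quality_review"},
    AutonomyTier.SUPERVISED: {"quality_review", "monitoring", "intervention_path"},
    AutonomyTier.BOUNDED_AUTONOMOUS: {"quality_review", "monitoring", "intervention_path",
                                      "action_whitelist", "escalation_logic"},
    AutonomyTier.FULLY_AUTONOMOUS: {"quality_review", "monitoring", "intervention_path",
                                    "action_whitelist", "escalation_logic",
                                    "realtime_monitoring", "audit_trail", "outcome_review"},
}

def required_controls(tier: AutonomyTier) -> set:
    return BASELINE_CONTROLS[tier]
```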
Layer 2: Action whitelisting and bounded action spaces
The most effective control for agentic AI is constraining what it can do. Rather than attempting to predict and prevent every possible failure mode, action whitelisting defines the set of permissible actions an agent may take - the tools it may invoke, the APIs it may call, the data it may access, the systems it may modify.
This is an allow-list approach, not a deny-list approach. The default posture is that any action not explicitly permitted is prohibited. This inverts the typical security model for software systems (where anything not explicitly prohibited is permitted) and reflects the reality that agentic AI systems may discover and attempt actions that their designers never contemplated.
Practical implementation involves defining action boundaries at three levels:
Tool access: Which APIs, databases, services and code execution environments the agent may invoke
Parameter constraints: What inputs the agent may pass to each tool (for example, restricting a database query agent to read-only operations on specific tables)
Impact limits: Thresholds on the scope of any single action (for example, capping the monetary value of transactions an agent may authorise without human approval)
A critical operational consideration: action whitelists must be maintained as the tool landscape evolves. When a foundation model provider pushes an update that changes model behaviour, or when new tools are added to an agent's potential action space, the whitelist must be reviewed and revalidated. Treating whitelists as static configuration rather than living governance artefacts is a common failure mode.
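A minimal sketch of a default-deny whitelist covering the three boundary levels above - tool access, parameter constraints and impact limits. Tool names, validators and thresholds are illustrative assumptions:

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class ToolPolicy:
    allowed: bool = True
    # Per-parameter validators, e.g. restricting a query tool to read-only SQL.
    parameter_checks: dict = field(default_factory=dict)
    # Cap on a numeric impact measure, e.g. transaction value in EUR.
    max_impact: Optional[float] = None

class ActionWhitelist:
    def __init__(self, policies: dict):
        self.policies = policies

    def permits(self, tool_name: str, arguments: dict, impact: float = 0.0) -> bool:
        policy = self.policies.get(tool_name)
        if policy is None or not policy.allowed:
            return False                      # default posture: not listed means denied
        for name, check in policy.parameter_checks.items():
            if name in arguments and not check(arguments[name]):
                return False                  # parameter outside permitted bounds
        if policy.max_impact is not None and impact > policy.max_impact:
            return False                      # impact threshold exceeded
        return True

# Example: a read-only database tool and a payment tool capped at 500 EUR.
whitelist = ActionWhitelist({
    "query_db": ToolPolicy(parameter_checks={
        "sql": lambda s: str(s).lstrip().lower().startswith("select")}),
    "send_payment": ToolPolicy(max_impact=500.0),
})
assert whitelist.permits("query_db", {"sql": "SELECT * FROM candidates"})
assert not whitelist.permits("send_payment", {"amount": 10_000}, impact=10_000)
assert not whitelist.permits("delete_records", {})   # anything not listed is denied
```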
Layer 3: Escalation logic
Even within bounded action spaces, agents will encounter situations that exceed their competence or authority. Effective governance requires predefined escalation paths - clear rules about when an agent must hand control back to a human rather than continuing autonomously.
Escalation triggers should include:
Confidence thresholds: When the agent's confidence in its chosen action falls below a defined level
Impact thresholds: When a proposed action exceeds predefined impact limits (financial value, number of affected records, irreversibility)
Anomaly detection: When the agent's behaviour deviates from expected patterns
Domain boundaries: When the agent encounters a task or domain outside its defined scope
Failure conditions: When a tool invocation fails or returns unexpected results
The design of escalation mechanisms is as important as the triggers themselves. Escalation must be frictionless for the agent (it should not be "incentivised" to avoid escalation by design patterns that penalise it for doing so) and actionable for the human (the escalation must include sufficient context for the human to make an informed decision without reconstructing the agent's entire reasoning chain).
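A sketch of how these triggers might be evaluated before each autonomous step; the thresholds and field names are assumptions, not prescriptions:

```python
from dataclasses import dataclass

@dataclass
class ProposedStep:
    confidence: float        # agent's self-reported confidence in the chosen action
    impact: float            # estimated impact, e.g. monetary value or records affected
    in_scope: bool           # whether the task falls inside the agent's defined domain
    tool_failed: bool        # whether the last tool call failed or returned unexpected results
    anomalous: bool          # flagged by behavioural anomaly detection

def escalation_reasons(step: ProposedStep,
                       min_confidence: float = 0.7,
                       max_impact: float = 500.0) -> list:
    reasons = []
    if step.confidence < min_confidence:
        reasons.append("confidence below threshold")
    if step.impact > max_impact:
        reasons.append("impact limit exceeded")
    if not step.in_scope:
        reasons.append("outside defined domain")
    if step.tool_failed:
        reasons.append("tool invocation failed")
    if step.anomalous:
        reasons.append("behaviour deviates from expected pattern")
    return reasons   # an empty list means the agent may proceed autonomously

step = ProposedStep(confidence=0.55, impact=1200.0, in_scope=True,
                    tool_failed=False, anomalous=False)
print(escalation_reasons(step))  # ['confidence below threshold', 'impact limit exceeded']
```

The returned reasons would form part of the escalation payload handed to the human, alongside the relevant slice of the reasoning chain, so the reviewer can decide without reconstructing the whole session.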
Layer 4: Traceability and audit trails
Regulatory requirements under the EU AI Act (Article 12 for high-risk systems) and practical incident response both demand comprehensive logging of agent activity.[14] For agentic systems, this means capturing not just inputs and outputs but the full chain of reasoning, tool invocations, intermediate results and decisions at each step.
An effective audit trail for agentic AI should record:
The initial objective or instruction provided to the agent
Each reasoning step and the rationale for action selection
Every tool invocation, including the tool called, parameters passed and response received
Actions taken and their outcomes
Escalation events and human interventions
Environmental state changes resulting from agent actions
Timestamps and session identifiers linking related events
These logs serve multiple purposes: post-incident investigation, regulatory compliance, continuous improvement of governance controls, and accountability attribution when things go wrong. They should be immutable, timestamped and stored independently of the agent system itself to prevent tampering.
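One way to capture such records - a sketch using a JSON-lines file as a stand-in for an external, append-only store, with a per-record hash to support basic tamper checks (a production deployment would chain hashes or write to a genuinely write-once store):

```python
import hashlib
import json
import time
import uuid

SESSION_ID = str(uuid.uuid4())   # links all records from one agent session

def log_step(objective: str, reasoning: str, tool: str, arguments: dict,
             response: object, outcome: str, escalated: bool,
             path: str = "agent_audit.jsonl") -> None:
    record = {
        "session_id": SESSION_ID,
        "timestamp": time.time(),
        "objective": objective,
        "reasoning": reasoning,          # rationale for selecting this action
        "tool": tool,
        "arguments": arguments,
        "response": str(response),
        "outcome": outcome,              # resulting state change, if any
        "escalated": escalated,
    }
    # Hash of the record contents, recorded alongside it for later tamper checks.
    record["sha256"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()).hexdigest()
    with open(path, "a") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")

log_step(objective="triage inbound invoice",
         reasoning="invoice matches purchase order; draft approval email",
         tool="draft_email", arguments={"to": "ap@example.com"},
         response="draft created", outcome="no external state changed",
         escalated=False)
```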
Layer 5: Continuous monitoring
Point-in-time assessment - evaluating an agent before deployment and assuming it remains compliant - is insufficient for systems whose capabilities and behaviour may change dynamically. Agentic AI governance requires continuous monitoring across several dimensions:
Performance monitoring: Are the agent's outcomes meeting quality and accuracy thresholds?
Behavioural monitoring: Is the agent's behaviour remaining within expected patterns? Are action distributions shifting over time?
Compliance monitoring: Are the agent's actions remaining within its permitted action space? Are escalation protocols being followed?
Fairness monitoring: For agents making or influencing decisions about individuals, are outcomes equitable across protected characteristics?
Security monitoring: Is the agent being subjected to adversarial inputs, prompt injection attempts or other manipulation?
Monitoring should feed back into governance controls. When monitoring detects an anomaly - an action outside the permitted set, a shift in outcome distributions, a pattern suggestive of adversarial manipulation - the response should be automated where possible (pausing the agent, triggering escalation) and documented for human review.
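A sketch of a compliance monitor consuming those audit records: any action outside the permitted set pauses the agent automatically and queues the event for human review. The pause and review hooks here are placeholders for real operational integrations:

```python
PERMITTED_TOOLS = {"query_db", "draft_email"}   # illustrative whitelist

def pause_agent() -> None:
    # Placeholder for the automated response: halt further agent actions.
    print("agent paused pending review")

def queue_for_review(records: list) -> None:
    # Placeholder for documenting the event for human follow-up.
    print(f"{len(records)} event(s) queued for human review")

def check_compliance(audit_records: list) -> list:
    violations = [r for r in audit_records if r["tool"] not in PERMITTED_TOOLS]
    if violations:
        pause_agent()
        queue_for_review(violations)
    return violations

check_compliance([{"tool": "query_db"}, {"tool": "wire_transfer"}])
```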
Who Is Responsible When an Agent Acts?
One of the most difficult aspects of agentic AI governance is that responsibility is fragmented across multiple actors. A typical enterprise agent deployment involves at least four parties: the foundation model provider (OpenAI, Anthropic, Google, Meta or others); the agent framework or orchestration layer (which may be a third-party platform or built in-house); the deploying organisation; and the providers of tools and APIs that the agent invokes at runtime.
The EU AI Act splits obligations between providers and deployers, with some provisions addressing importers and distributors.[15] But this binary does not map cleanly onto the agentic value chain. Who is the "provider" of an agent that combines a third-party foundation model with an in-house orchestration layer calling external APIs? Article 25 addresses situations where third parties modify or repurpose AI systems, potentially becoming providers themselves, but the boundaries remain unclear for dynamically composed systems.[16]
This fragmentation also creates a foundation model supply chain risk that many organisations underestimate. When a model provider pushes an update - a new model version, changed safety filters, altered behaviour patterns - the governance controls validated against the previous version may no longer hold. Action whitelists, escalation thresholds and compliance assessments all assume a particular model behaviour baseline. A silent model update can invalidate those assumptions without any change on the deployer's side.
Practical governance must account for this fragmentation. Organisations deploying agentic AI should:
Map the full value chain for each agent deployment, identifying every actor and their governance responsibilities
Establish contractual arrangements with tool and API providers that address data handling, liability and incident response
Maintain a clear internal accountability structure - who owns each agent deployment, who is responsible for monitoring, and who has authority to intervene or shut down
Implement version pinning and change management processes for foundation models, with re-validation triggers when providers release updates
Document the allocation of responsibilities in a way that would satisfy a regulatory audit
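On the version-pinning point, a sketch of what a re-validation trigger might look like; the identifiers and field names are assumptions:

```python
# Governance artefacts (whitelists, escalation thresholds, assessments) are
# validated against a specific model version; version drift triggers re-validation.
PINNED = {
    "agent": "recruitment-screener",
    "model": "example-model",          # illustrative identifier, not a real product name
    "model_version": "2025-06-01",
    "validated_controls": ["action_whitelist", "escalation_logic"],
}

def needs_revalidation(runtime_model_version: str) -> bool:
    # Compare the version actually served at runtime with the pinned baseline.
    return runtime_model_version != PINNED["model_version"]

if needs_revalidation("2025-09-15"):
    print(f"Model drift detected for {PINNED['agent']}: re-validate "
          f"{', '.join(PINNED['validated_controls'])} before continuing operation.")
```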
Incident Response for Agentic AI
Governance is not only about preventing failures; it is about responding effectively when they occur. Agentic AI systems require incident response procedures that account for their autonomous, multi-step nature.
An agentic AI incident response plan should include:
Kill switches: The ability to immediately halt an agent's execution, revoking its tool access and preventing further actions. This must be technically reliable (not dependent on the agent's cooperation) and accessible to designated personnel within seconds
Rollback procedures: Where possible, predefined procedures for reversing actions taken by an agent. Not all actions are reversible, which makes the audit trail and impact limits described above critical for limiting blast radius
Notification obligations: In regulated industries, certain agent failures may trigger reporting requirements. In financial services, unauthorised transactions may require regulatory notification within hours. Under the EU AI Act, serious incidents involving high-risk AI systems must be reported to market surveillance authorities
Root cause analysis: Post-incident investigation should trace the full reasoning chain from the initial objective through each tool invocation and decision to the point of failure, using the audit trail captured by Layer 4
Incident response planning should be conducted before deployment, not after the first failure. Tabletop exercises that simulate agent failures - a breached action boundary, a failed escalation, a compromised tool response - help teams build the muscle memory needed to respond effectively under pressure.
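Because the kill switch must not depend on the agent's cooperation, one common pattern is to enforce it at the gateway that mediates every tool call rather than inside the agent's own loop. A minimal sketch, with the gateway and switch names as assumptions:

```python
import threading

KILL = threading.Event()   # set by designated personnel, e.g. from an ops dashboard

class ToolGateway:
    """Mediates every tool call; flipping the switch revokes all tool access."""

    def __init__(self, tools: dict):
        self._tools = tools

    def invoke(self, name: str, **kwargs):
        if KILL.is_set():
            raise PermissionError("agent halted by kill switch; tool access revoked")
        return self._tools[name](**kwargs)

def kill_switch() -> None:
    KILL.set()

gateway = ToolGateway({"echo": lambda text: text})
print(gateway.invoke("echo", text="still running"))
kill_switch()
try:
    gateway.invoke("echo", text="should not run")
except PermissionError as exc:
    print(exc)
```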
Operationalising Agentic AI Governance
Governance frameworks are only valuable if they can be operationalised - implemented in practice without creating such friction that they prevent the innovation agents are meant to enable. Several principles help bridge the gap between framework and practice.
Integrate governance into the agent lifecycle
Governance should not be a separate workstream that runs parallel to agent development and deployment. It should be embedded in the development pipeline - autonomy classification at the design stage, action whitelisting during development, escalation logic testing before deployment, continuous monitoring in production. This is analogous to the shift-left movement in security: governance built in from the start, not bolted on after the fact.
Automate governance controls
Manual governance processes cannot keep pace with agents that operate at machine speed. Action whitelisting should be enforced programmatically, not through policy documents. Escalation triggers should fire automatically, not rely on human monitoring of logs. Compliance checks should run as automated gates in CI/CD pipelines, giving engineering teams clear pass/fail signals rather than compliance forms to fill out.
Build governance into the AI inventory
Every agentic AI system should be registered in a centralised AI inventory that captures its autonomy classification, permitted action space, escalation protocols, monitoring configuration, accountability assignment and foundation model version. This inventory is the foundation of the governance programme - without it, an organisation cannot answer basic questions about what agents it has deployed, what they are authorised to do, and who is responsible for them.
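A sketch of what one inventory entry might capture, using the fields listed above; the schema is illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class AgentInventoryEntry:
    name: str
    autonomy_tier: int
    permitted_actions: list
    escalation_protocol: str
    monitoring_config: dict = field(default_factory=dict)
    accountable_owner: str = ""
    foundation_model: str = ""
    model_version: str = ""

inventory = [
    AgentInventoryEntry(
        name="invoice-triage-agent",
        autonomy_tier=3,
        permitted_actions=["query_db", "draft_email"],
        escalation_protocol="impact > 500 EUR or confidence < 0.7 -> human approval",
        monitoring_config={"compliance": True, "behavioural": True},
        accountable_owner="finance-ops-lead",
        foundation_model="example-model",
        model_version="2025-06-01",
    )
]
# The basic questions the inventory must answer: what is deployed,
# what it may do, and who is responsible for it.
print({entry.name: entry.accountable_owner for entry in inventory})
```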
Sequence implementation pragmatically
Organisations deploying dozens of agents cannot implement all five governance layers across every deployment simultaneously. A practical sequencing approach:
Start with the inventory. You cannot govern what you cannot see. Catalogue every agent deployment, including shadow deployments teams may have spun up informally.
Classify autonomy levels. Map each agent to a tier. This determines the proportionality of every subsequent control.
Implement action whitelisting for Tier 3 and Tier 4 agents first. These carry the highest risk and benefit most from bounded action spaces.
Build audit trails from day one. Logging is the cheapest control to implement early and the most expensive to retrofit.
Layer in continuous monitoring as deployments mature. Start with compliance monitoring (is the agent staying within its whitelist?) and expand to behavioural and fairness monitoring over time.
Treat governance as a competitive advantage
Organisations that view governance solely as a compliance cost will under-invest in it and build resentment among engineering teams. The better frame is governance as a condition for scale. Without systematic controls, agent deployments will be limited to low-risk, low-value use cases because the organisation cannot demonstrate adequate oversight for anything more ambitious. Governance is what enables an enterprise to move from cautious pilot programmes to production deployments with genuine business impact.
What Is Coming Next in Agentic AI Regulation
The agentic AI governance landscape will evolve rapidly over the next two years. CEN and CENELEC are developing harmonised standards under the EU AI Act through Joint Technical Committee 21, with the mandate amended in June 2025 to reflect the acceleration of the standards programme.[17] These standards will need to address what proportionate human oversight looks like for highly autonomous systems - bounded action spaces, structured checkpoints, audit trails and intervention mechanisms - without demanding a human in the loop at every turn.
The European Commission has the power to update the high-risk use-case list in Annex III through delegated acts as agentic-specific risks emerge, without requiring full legislative amendment.[18] Commission guidance can also clarify how the provider-deployer framework maps onto agentic value chains and when runtime tool invocation counts as "substantial modification" under Article 3(23). Codes of practice under Article 56 offer another lever, addressing agent-specific risks at the GPAI model layer - controllability features, tool-use logging and action-space constraints.[19]
Industry groups including the IAPP and the OECD are actively working on governance guidance for autonomous AI.[20] ISO/IEC JTC 1/SC 42 continues to expand the 42000 series of AI standards, and it would not be surprising to see agentic-specific work items emerge as the deployment landscape matures.
Organisations that wait for these standards and guidance before acting will find themselves retrofitting governance into agent deployments that were never designed for it - an expensive and disruptive exercise. The more prudent approach is to build governance infrastructure now and adapt as regulatory and standards-based requirements crystallise.
The challenge of governing agentic AI is real, but it is not unprecedented. Enterprises have governed other complex, autonomous and high-risk systems before - from algorithmic trading to autonomous vehicles to robotic process automation. The principles are the same: classify risk, constrain action, require escalation, maintain traceability and monitor continuously. What is new is the speed at which agentic AI is being deployed and the breadth of enterprise functions it is touching.
Implementing these five governance layers across dozens of agent deployments, each with different autonomy tiers, tool access configurations and regulatory obligations, is an infrastructure problem as much as a policy one. At Enzai, building that infrastructure - from autonomy classification and action whitelisting through to continuous monitoring and regulatory compliance across the EU AI Act, ISO 42001 and NIST AI RMF - is the challenge our platform is designed to address. To learn more, book a demo.
References
[1] Gartner, "Agentic AI: The Next Frontier of Enterprise AI," October 2024. Gartner projected that agentic AI capabilities would feature in 33% of enterprise software applications by 2028.
[2] Regulation (EU) 2024/1689 of the European Parliament and of the Council, Article 3(1). Official Journal of the European Union, L series, 12 July 2024.
[3] The European Commission's interpretive guidance on the AI system definition (published 2025) provides additional context but has not specifically addressed emergent sub-goal decomposition in agentic systems.
[4] Regulation (EU) 2024/1689, Articles 113-114 (entry into force and application dates).
[5] Regulation (EU) 2024/1689, Chapter III, Section 2, Articles 9-15.
[6] For a detailed analysis of these operational strains, see Enzai, "The EU AI Act Bends. It Need Not Break," March 2026.
[7] Regulation (EU) 2024/1689, Chapter V, Articles 51-56 (Obligations for providers of GPAI models).
[8] ISO/IEC 42001:2023, Information technology - Artificial intelligence - Management system. International Organization for Standardization, December 2023.
[9] NIST, Artificial Intelligence Risk Management Framework (AI RMF 1.0), NIST AI 100-1, January 2023.
[10] NIST, Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile, NIST AI 600-1, July 2024.
[11] OECD, "OECD Framework for the Classification of AI Systems," OECD Digital Economy Papers No. 323, February 2022. The OECD AI Principles were updated in May 2024.
[12] Colorado SB 24-205, Concerning Consumer Protections for Artificial Intelligence, signed May 2024. Original effective date 1 February 2026; subject to pending legislative amendments.
[13] Board of Governors of the Federal Reserve System, SR Letter 11-7, "Supervisory Guidance on Model Risk Management," April 2011; Office of the Comptroller of the Currency, OCC 2011-12.
[14] Regulation (EU) 2024/1689, Article 12 (Record-keeping / automatic logging for high-risk AI systems).
[15] Regulation (EU) 2024/1689, Articles 16 (provider obligations) and 26 (deployer obligations).
[16] Regulation (EU) 2024/1689, Article 25 (Obligations of other parties along the AI value chain).
[17] European Commission Standardisation Request M/593 to CEN and CENELEC, as amended June 2025; CEN-CENELEC JTC 21 work programme.
[18] Regulation (EU) 2024/1689, Article 7 (Amendments to Annex III).
[19] Regulation (EU) 2024/1689, Article 56 (Codes of practice for GPAI models).
[20] IAPP AI Governance Center; OECD AI Policy Observatory, oecd.ai; ISO/IEC JTC 1/SC 42 work programme.