
A structured AI vendor risk assessment framework - questionnaires, red flags, contractual protections, and ongoing monitoring under the EU AI Act.
Most enterprise AI is not built in-house. According to Gartner, by 2025 over 70% of organisations were expected to have adopted at least one form of AI delivered by a third-party provider [1]. That figure has only grown. Yet the governance frameworks most organisations rely on were designed for software they control, not for probabilistic systems trained on data they have never seen, updated on schedules they do not set, and operating according to logic that may be opaque even to the vendor itself.
The regulatory landscape has caught up with this reality. The EU AI Act, which entered its phased enforcement period in 2025, draws a clear line between providers (those who develop or place an AI system on the market) and deployers (those who use it under their own authority). But that line does not absolve deployers of responsibility. Article 26 makes clear that deployers of high-risk AI systems bear specific obligations around human oversight, input data quality, monitoring, and record-keeping [2]. The uncomfortable truth for procurement and compliance teams is this: regulatory liability does not transfer to the vendor simply because the vendor built the model.
This guide provides a structured AI vendor risk assessment framework, from initial due diligence through to ongoing monitoring.
Why third-party AI risk demands a different approach
Traditional vendor risk management evaluates uptime, data security, and contractual SLAs. These remain necessary for AI vendors, but they are not sufficient. AI systems introduce a category of risk that conventional IT procurement was never built to address.
Three properties make third-party AI fundamentally different from other software:
Opacity of decision logic. A conventional SaaS application executes deterministic code. An AI system may produce outputs shaped by training data, fine-tuning choices, and inference-time parameters that the deploying organisation has no visibility into. When a credit-scoring model denies an application or an HR screening tool ranks candidates, the deployer needs to explain that decision. If the vendor cannot provide a meaningful explanation, the deployer is left accountable for outcomes it cannot interpret.
Non-static behaviour. Traditional software changes through versioned releases. AI models can shift in behaviour through retraining, fine-tuning, or changes to underlying data pipelines, sometimes without a formal release cycle. A model that performed within acceptable parameters during procurement may drift over subsequent months. The deployer may not know until harm has already occurred.
Inherited data risk. The training data that shapes a model's behaviour may contain biases, copyrighted material, or personal data processed without adequate legal basis. The deployer inherits the consequences of these upstream choices, even though it played no part in making them. Under GDPR, if an AI system processes personal data in a manner inconsistent with the deployer's data processing agreements, the deployer bears enforcement risk alongside the provider [3].
Conventional vendor scorecards do not capture these dynamics. A purpose-built assessment framework is essential.
A structured AI vendor risk assessment framework
The following framework organises AI vendor evaluation into eight domains. Each domain maps to a specific governance concern and can be scored during procurement, periodic review, or triggered reassessment.
Organisations using platforms such as Enzai can integrate this framework into existing vendor lifecycle workflows, ensuring that AI-specific criteria sit alongside traditional IT risk indicators rather than in a separate, disconnected process.
Assessment questionnaire
| Domain | Assessment criteria | Scoring guidance |
|---|---|---|
| Model transparency | Does the vendor disclose model type, architecture family, and version? Can the vendor provide model cards or datasheets? Is there documentation of known limitations? | Full disclosure with model cards = high. Partial disclosure = medium. "Proprietary, cannot share" = low. |
| Data governance | What data was used for training? Does the vendor confirm lawful basis for data processing? Are data provenance records available? How is personal data handled in inference? | Documented data lineage with legal basis = high. General statements without evidence = medium. No information available = low. |
| Bias and fairness testing | Has the vendor conducted bias audits? Across which protected characteristics? Are results available? What remediation processes exist? | Independent third-party audit with published results = high. Internal testing with documentation = medium. No testing or "we don't test for that" = low. |
| Security | What security certifications does the vendor hold (SOC 2, ISO 27001)? How are models protected against adversarial attack, prompt injection, or data extraction? Is there a vulnerability disclosure programme? | Relevant certifications plus AI-specific security measures = high. General certifications only = medium. No certifications = low. |
| Compliance and certifications | Can the vendor demonstrate compliance with the EU AI Act, GDPR, or sector-specific regulation? Is there a conformity assessment for high-risk systems? | Conformity assessment completed with documentation = high. Compliance programme in progress = medium. No compliance activity = low. |
| Incident response | Does the vendor have a documented AI incident response plan? What are notification timelines? Is there a post-incident review process? | Documented plan with defined SLAs and post-incident review = high. General incident process not specific to AI = medium. No process = low. |
| Update and change management | How does the vendor communicate model updates? Is there a change notification window? Can the deployer test updates before they go live? Is rollback possible? | Pre-notification with testing window and rollback = high. Notification after deployment = medium. No notification process = low. |
| Sub-processor and foundation model dependency | Does the vendor rely on upstream AI providers (e.g. OpenAI, Anthropic, Google)? What happens if the upstream provider changes terms, updates models, or experiences an outage? Are contractual protections passed through? | Full disclosure of upstream providers with contractual pass-through and contingency plans = high. Partial disclosure = medium. No disclosure or single-provider dependency with no fallback = low. |
This table should be treated as a living instrument. Scoring thresholds will vary by use case: a high-risk AI system used in healthcare triage demands far stricter transparency than a low-risk content recommendation tool.
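To make the scoring concrete, here is a minimal sketch in Python of how the questionnaire above could be encoded in a vendor-lifecycle workflow. The domain weights, point values, and acceptance thresholds are illustrative assumptions to be calibrated per use case, not values prescribed by the EU AI Act or any standard.

```python
# Minimal sketch of the questionnaire scoring model from the table above.
# Weights, points, and thresholds are illustrative assumptions only.

RATING_POINTS = {"high": 2, "medium": 1, "low": 0}

# Illustrative weights: domains that dominate regulatory exposure count more.
DOMAIN_WEIGHTS = {
    "model_transparency": 1.5,
    "data_governance": 1.5,
    "bias_and_fairness": 1.5,
    "security": 1.0,
    "compliance": 1.5,
    "incident_response": 1.0,
    "change_management": 1.0,
    "upstream_dependency": 1.0,
}

def vendor_risk_score(ratings: dict[str, str]) -> float:
    """Return a 0-1 score (1 = strongest) from per-domain high/medium/low ratings."""
    earned = sum(DOMAIN_WEIGHTS[d] * RATING_POINTS[r] for d, r in ratings.items())
    maximum = sum(DOMAIN_WEIGHTS[d] * max(RATING_POINTS.values()) for d in ratings)
    return earned / maximum

def assessment_outcome(score: float, high_risk_use_case: bool) -> str:
    """Stricter bar for high-risk systems (healthcare triage vs. content recommendation)."""
    threshold = 0.8 if high_risk_use_case else 0.6
    return "acceptable" if score >= threshold else "requires remediation"

ratings = {
    "model_transparency": "high",
    "data_governance": "medium",
    "bias_and_fairness": "medium",
    "security": "high",
    "compliance": "medium",
    "incident_response": "low",
    "change_management": "medium",
    "upstream_dependency": "low",
}
score = vendor_risk_score(ratings)
# The weak incident_response and upstream_dependency ratings pull this
# vendor below the high-risk acceptance threshold.
print(f"score={score:.2f}:", assessment_outcome(score, high_risk_use_case=True))
```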
Key questions to ask vendors
A questionnaire only works if the questions are specific enough to expose genuine risk. Vague questions invite vague answers. The following questions, organised by category, are designed to produce actionable responses.
Model transparency and explainability
What model architecture does this system use, and what version is currently deployed in production?
Can you provide a model card or technical datasheet describing intended use cases, known limitations, and performance benchmarks?
What methods are available for explaining individual predictions or decisions to affected individuals?
If the model is based on a foundation model from a third party (e.g., an LLM provider), can you disclose which foundation model and version you are building on?
Data governance and privacy
What datasets were used to train and validate this model, and can you provide a data provenance record?
What is the lawful basis under GDPR (or equivalent regulation) for processing personal data in training?
Does the model retain, memorise, or reproduce training data during inference? What safeguards prevent data leakage?
How is customer data used after deployment - is it fed back into model training, and can the customer opt out?
Bias, fairness, and safety
Has this model been audited for bias across protected characteristics as defined under applicable anti-discrimination law?
Who conducted the audit, and are the results available for review?
What ongoing monitoring is in place to detect emergent bias or performance degradation across subgroups?
For generative AI systems: what guardrails prevent the generation of harmful, misleading, or legally problematic outputs?
Security and resilience
What specific protections are in place against adversarial attacks, prompt injection, model inversion, or training data extraction?
Has the system undergone AI-specific penetration testing or red-teaming beyond standard application security testing?
What is the disaster recovery and business continuity plan specific to the AI components of this service?
Compliance and regulatory readiness
Has this system been classified under the EU AI Act risk categories, and if so, what classification has it received?
Can you provide documentation of a conformity assessment for high-risk AI systems as required under Article 43?
What is your timeline for full compliance with applicable AI regulations in our operating jurisdictions?
Incident response and accountability
What is your AI-specific incident response plan, and what triggers an incident classification?
What is the contractual notification timeline for AI-related incidents affecting our deployment?
Can you provide examples of past AI incidents and how they were resolved?
Not every question will apply to every vendor. But the absence of credible answers to questions that clearly do apply is itself a finding.
Red flags in vendor responses
Vendor evaluation is as much about how vendors answer as about what those answers contain. Certain patterns in vendor responses should trigger heightened scrutiny.
Vagueness masquerading as confidentiality
There is a legitimate basis for protecting trade secrets. But a vendor that refuses to disclose the general architecture family of a model, the categories of training data used, or the existence of bias testing is not protecting intellectual property. It is obscuring risk. A vendor that says "our model is proprietary and we cannot share details" in response to every question about transparency is not a vendor that can support a deployer's regulatory obligations.
Absence of documentation
If a vendor cannot produce a model card, a data governance policy, or an incident response plan, the most likely explanation is that these do not exist. The absence of documentation is not a neutral finding. It indicates that the vendor has not invested in the governance infrastructure required to support responsible deployment.
Resistance to audit rights
Any vendor that pushes back on contractual audit rights, whether framed as logistically difficult or commercially unreasonable, should be treated with caution. The EU AI Act explicitly anticipates that deployers will need to verify provider compliance [4]. A vendor that resists audit provisions is a vendor that may not withstand scrutiny.
No incident response process
If the vendor's response to "What is your AI incident response plan?" is silence, a redirect to generic IT incident management, or a promise to develop one, the organisation should consider whether it is prepared to bear the full weight of any AI failure without vendor support.
Shifting liability language
Watch for contractual language that attempts to shift all liability for AI outcomes to the deployer. Whilst deployers do bear obligations, a vendor that accepts zero accountability for model performance, bias, or failure is signalling something about its confidence in its own systems.
The pattern matters more than any individual response. A vendor that is transparent about genuine limitations is far less risky than one that claims perfection whilst providing no evidence.
Contractual protections for AI procurement
Assessment is necessary but not sufficient. The findings must be embedded in enforceable contractual terms. Standard software procurement agreements rarely contain provisions adequate for AI-specific risk. The following clauses should be considered for inclusion in any AI vendor agreement.
Audit rights
The agreement should grant the deployer the right to audit the vendor's AI systems, either directly or through an independent third party, at reasonable intervals and upon the occurrence of specified trigger events (such as a reported incident or regulatory inquiry). This right should extend to model performance data, bias testing results, and data governance practices.
Incident notification
The agreement should specify maximum notification timelines for AI-related incidents, distinct from general service incidents. For high-risk systems, notification within 24 hours of discovery is a reasonable starting position. The notification should include the nature of the incident, affected systems, estimated impact, and remediation steps.
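As a small illustration, the hypothetical check below flags vendor notifications that arrive outside a 24-hour contractual window. The field names and the SLA value are assumptions drawn from the baseline suggested above, not a standard clause.

```python
# Minimal sketch of checking incident notifications against the 24-hour
# contractual window suggested above. Field names are hypothetical.

from datetime import datetime, timedelta

NOTIFICATION_SLA = timedelta(hours=24)  # starting position for high-risk systems

def notification_breached(discovered_at: datetime, notified_at: datetime) -> bool:
    """True if the vendor notified the deployer later than the SLA allows."""
    return notified_at - discovered_at > NOTIFICATION_SLA

# Incident discovered 09:00 on 1 March; deployer notified 12:30 on 2 March.
print(notification_breached(datetime(2025, 3, 1, 9, 0),
                            datetime(2025, 3, 2, 12, 30)))  # True: 27.5h > 24h
```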
Data handling and retention
The agreement should specify precisely how customer data is used in relation to the AI system: whether it is used for model improvement, how it is stored, when it is deleted, and whether the customer can request its removal from training datasets. This is particularly critical in light of ongoing regulatory attention to the intersection of AI training and data protection rights [5].
Model change notification
The agreement should require the vendor to provide advance notice of material changes to the model, including retraining on new data, architecture changes, and significant parameter adjustments. The notice period should allow the deployer to conduct testing before the updated model enters production in their environment. A 30-day notification window for non-urgent changes is a reasonable baseline.
Performance benchmarks and SLAs
Beyond traditional uptime SLAs, the agreement should establish measurable performance benchmarks for the AI system, including accuracy thresholds, fairness metrics, and latency requirements. Breach of these benchmarks should trigger defined remediation obligations and, where appropriate, termination rights.
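A sketch of how such benchmarks might be verified against periodic vendor reporting follows. The metric names and thresholds are hypothetical examples of what an agreement could specify, not standard values.

```python
# Minimal sketch of checking a deployment against contractual benchmarks.
# Metric names and bounds are hypothetical examples only.

CONTRACT_BENCHMARKS = {
    "accuracy": {"minimum": 0.92},                # accuracy threshold
    "demographic_parity_gap": {"maximum": 0.05},  # fairness metric
    "p95_latency_ms": {"maximum": 300},           # latency requirement
}

def benchmark_breaches(observed: dict[str, float]) -> list[str]:
    """Return human-readable SLA breaches; an empty list means compliant."""
    breaches = []
    for metric, bounds in CONTRACT_BENCHMARKS.items():
        value = observed.get(metric)
        if value is None:
            breaches.append(f"{metric}: not reported by vendor")
        elif "minimum" in bounds and value < bounds["minimum"]:
            breaches.append(f"{metric}: {value} below minimum {bounds['minimum']}")
        elif "maximum" in bounds and value > bounds["maximum"]:
            breaches.append(f"{metric}: {value} above maximum {bounds['maximum']}")
    return breaches

# Example monthly report: accuracy breach and missing latency figure flagged.
print(benchmark_breaches({"accuracy": 0.89, "demographic_parity_gap": 0.03}))
```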
Liability allocation
The agreement should allocate liability for AI-related harms in a manner that reflects each party's degree of control. A vendor that controls the model, training data, and update cycle should bear proportionate liability for defects in those components. Blanket indemnification in favour of the vendor is not appropriate for AI deployments.
Strong contracts do not replace governance, but they provide the enforcement mechanism that makes governance operational.
Ongoing monitoring: beyond the one-time assessment
An AI vendor assessment conducted once at procurement and filed away is a compliance artefact, not a risk management practice. AI systems change, and so do the risks they present.
Continuous monitoring indicators
Organisations should establish ongoing monitoring across several dimensions:
Performance drift. Track the AI system's output quality against the benchmarks established during procurement. Degradation may indicate model drift, data pipeline issues, or undisclosed model changes; a minimal sketch of one drift check follows this list.
Incident frequency and severity. Log all AI-related incidents, including near-misses, and trend them over time. An increasing frequency of anomalous outputs warrants investigation.
Regulatory developments. Monitor changes in applicable AI regulation that may alter the risk classification of the system or impose new obligations on deployers.
Vendor financial and operational stability. A vendor facing financial distress may reduce investment in model maintenance, security, and compliance - all of which directly affect the deployer.
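For the performance-drift indicator, one widely used measure is the population stability index (PSI), which compares the distribution of current model outputs against the procurement-time baseline. The sketch below is illustrative: the bin count and the conventional 0.1/0.25 thresholds are rules of thumb, not regulatory requirements.

```python
# Minimal sketch of drift detection via the population stability index (PSI).
# Bin count and thresholds are conventional rules of thumb, not mandates.

import numpy as np

def population_stability_index(baseline: np.ndarray, current: np.ndarray,
                               bins: int = 10) -> float:
    """PSI between baseline and current samples of model output scores."""
    # Fix bin edges from the baseline so drift is measured against procurement.
    edges = np.quantile(baseline, np.linspace(0, 1, bins + 1))
    current = np.clip(current, edges[0], edges[-1])  # out-of-range mass -> end bins
    expected = np.histogram(baseline, bins=edges)[0] / len(baseline)
    actual = np.histogram(current, bins=edges)[0] / len(current)
    # Small floor avoids log/division problems for empty bins.
    expected = np.clip(expected, 1e-6, None)
    actual = np.clip(actual, 1e-6, None)
    return float(np.sum((actual - expected) * np.log(actual / expected)))

rng = np.random.default_rng(0)
baseline = rng.normal(0.70, 0.10, 5000)  # output scores at procurement
current = rng.normal(0.62, 0.14, 5000)   # output scores this review period
psi = population_stability_index(baseline, current)
# Rule of thumb: < 0.1 stable, 0.1-0.25 monitor, > 0.25 investigate.
print(f"PSI = {psi:.3f}")
```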
Reassessment triggers
Certain events should trigger a full reassessment of the vendor, outside the regular review cycle:
The vendor announces a major model update or architecture change
A significant AI incident occurs, whether involving the organisation's deployment or reported publicly
Regulatory guidance changes the risk classification of the AI system
The organisation changes its use of the AI system in a way that materially alters the risk profile
The vendor is acquired, merges, or undergoes significant leadership change
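To keep these triggers from living only in a policy document, they can be encoded in the vendor-lifecycle tooling. The sketch below is an assumption-laden illustration: any single trigger forces an out-of-cycle full reassessment, and the mapping from each trigger to the questionnaire domains worth re-scoring first reuses the domain names from the earlier scoring sketch.

```python
# Minimal sketch of encoding the reassessment triggers above. The mapping
# of triggers to priority domains is an illustrative assumption.

from enum import Enum, auto

class Trigger(Enum):
    MAJOR_MODEL_UPDATE = auto()
    SIGNIFICANT_AI_INCIDENT = auto()
    RISK_CLASSIFICATION_CHANGE = auto()
    USE_CASE_CHANGE = auto()
    VENDOR_OWNERSHIP_CHANGE = auto()

# Questionnaire domains to re-score first within the full reassessment.
PRIORITY_DOMAINS = {
    Trigger.MAJOR_MODEL_UPDATE: {"model_transparency", "bias_and_fairness",
                                 "change_management"},
    Trigger.SIGNIFICANT_AI_INCIDENT: {"incident_response", "security"},
    Trigger.RISK_CLASSIFICATION_CHANGE: {"compliance"},
    Trigger.USE_CASE_CHANGE: {"compliance", "bias_and_fairness"},
    Trigger.VENDOR_OWNERSHIP_CHANGE: {"data_governance", "compliance",
                                      "upstream_dependency"},
}

def reassessment_plan(events: set[Trigger]) -> set[str]:
    """Any single trigger forces a full out-of-cycle review; the returned
    set indicates where to start."""
    return set().union(*(PRIORITY_DOMAINS[e] for e in events)) if events else set()

# Example: a vendor acquisition and a use-case change land mid-cycle.
print(reassessment_plan({Trigger.VENDOR_OWNERSHIP_CHANGE, Trigger.USE_CASE_CHANGE}))
```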
Embedding monitoring in governance workflows
Effective ongoing monitoring requires more than good intentions. It requires tooling that integrates vendor oversight into existing governance cadences. Enzai provides the infrastructure for continuous third-party AI monitoring, connecting assessment results, contractual obligations, and real-time performance indicators into a single governance workflow. Without this integration, monitoring obligations tend to degrade over time, leaving organisations exposed precisely when vigilance matters most.
Governance is not an event. It is a continuous practice.
The Article 25 trap: when deployers become providers
One of the most consequential and least understood provisions in the EU AI Act concerns the circumstances under which a deployer is reclassified as a provider. Under Article 25, a deployer (or any distributor, importer, or other third party) is considered a provider of a high-risk AI system if it puts its name or trademark on a high-risk AI system already placed on the market, makes a substantial modification to a high-risk AI system in such a way that it remains high-risk, or modifies the intended purpose of an AI system already on the market such that it becomes high-risk [6].
This has direct implications for organisations that customise third-party AI. Fine-tuning a vendor's model on proprietary data, modifying its outputs through additional processing layers, or deploying it for a use case materially different from the vendor's stated intended purpose may each be sufficient to trigger reclassification.
The consequences of reclassification are significant. A provider bears the full weight of EU AI Act obligations for high-risk systems, including conformity assessments, technical documentation, post-market monitoring, and registration in the EU database. These obligations are substantially more onerous than those imposed on deployers.
Practical steps to avoid the trap
Document the intended purpose. Maintain clear records of the vendor's stated intended purpose for the AI system and the organisation's actual use case. Any divergence should be reviewed by legal and compliance teams.
Assess customisation scope. Before fine-tuning, retraining, or materially modifying a third-party AI system, conduct an assessment of whether the modification constitutes a "substantial modification" under the Act. The European Commission's guidance on this point is expected to provide further clarity, but the risk exists now [7].
Contractual clarity. Ensure the vendor agreement clearly delineates which party is the provider and which is the deployer, and specify the conditions under which that classification might change.
Seek legal counsel early. The provider-deployer boundary is a question of substance, not labels. Calling oneself a deployer in a contract does not override a factual determination that one is acting as a provider.
The Article 25 trap is not hypothetical. As organisations increasingly customise and fine-tune vendor AI systems, the boundary between deployment and provision is likely to become one of the most actively litigated areas of AI regulation.
Building a defensible third-party AI programme
Third-party AI vendor risk assessment is not a procurement checkbox. It is an ongoing governance discipline that spans legal, technical, and operational domains. The organisations that manage this risk effectively will be those that treat AI vendor oversight with the same rigour they apply to financial controls or data protection.
The framework outlined here provides a starting point: structured assessment, specific questions, contractual protections, continuous monitoring, and awareness of the regulatory traps that can transform a deployer into a provider overnight. None of it is optional for organisations deploying high-risk AI systems in regulated environments.
For organisations seeking to operationalise third-party AI governance at scale, Enzai provides the platform infrastructure to manage vendor assessments, track obligations, and maintain continuous oversight across the full AI portfolio. Request a demo to see how it works in practice.
References
[1] Gartner, "Gartner Predicts 70% of Organisations Will Shift Focus to AI Governance," 2024.
[2] European Parliament and Council, Regulation (EU) 2024/1689 (EU AI Act), Article 26 - Obligations of Deployers of High-Risk AI Systems, 2024.
[3] European Data Protection Board, "Opinion on the Interplay Between the AI Act and the GDPR," 2024.
[4] European Parliament and Council, Regulation (EU) 2024/1689 (EU AI Act), Article 13 (Transparency and provision of information to deployers) and Article 26 (Deployer obligations), 2024.
[5] European Data Protection Board, "Guidelines on the Use of Personal Data in AI Model Training," 2025.
[6] European Parliament and Council, Regulation (EU) 2024/1689 (EU AI Act), Article 25 - Responsibilities along the AI value chain, 2024.
[7] European Commission, "Guidelines on Substantial Modification of AI Systems" (forthcoming), referenced in Recital 88 of the EU AI Act.
