The rapid deployment of large language models into high-stakes domains has outpaced the regulatory frameworks designed to govern them. AI governance -- the policies, processes, and technical controls that ensure AI systems are developed and deployed responsibly -- is no longer optional for engineering teams. This article examines the emerging regulatory landscape, practical frameworks for AI risk assessment and auditing, and the engineering practices required to build compliant, accountable AI systems.
The European Union's AI Act, which entered into force in August 2024, is the world's first comprehensive AI regulation. It establishes a risk-based classification framework that directly impacts how AI systems must be built, tested, and deployed.
Unacceptable Risk (Prohibited): AI systems that pose a clear threat to safety, livelihoods, or rights are banned outright. This includes social scoring by governments, real-time biometric identification in public spaces (with limited exceptions), and manipulation techniques that exploit vulnerabilities.
High Risk: AI systems used in critical domains must meet stringent requirements. These domains include employment and worker management (resume screening, performance evaluation), education (automated grading, admissions), essential private and public services (credit scoring, insurance), law enforcement (predictive policing, evidence evaluation), and migration and border control (visa assessment, risk profiling).
High-risk systems must comply with requirements for risk management systems, data governance, technical documentation, record-keeping, transparency, human oversight, accuracy, robustness, and cybersecurity.
Limited Risk: Systems with specific transparency obligations, primarily chatbots and deepfake generators that must disclose their AI nature.
Minimal Risk: Most AI systems, including spam filters and AI-enabled video games, face no specific requirements beyond existing law.
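As a first-pass triage -- not a substitute for legal review against the Act's Annex III -- teams can encode the tier logic in code. A minimal sketch; the domain and practice sets below are illustrative subsets, not the full legal definitions:

# Illustrative first-pass triage of EU AI Act risk tiers.
# The sets below are simplified subsets of the Act's categories;
# actual classification requires legal review.
HIGH_RISK_DOMAINS = {
    "employment", "education", "credit_scoring", "insurance",
    "law_enforcement", "migration",
}

PROHIBITED_PRACTICES = {"social_scoring", "vulnerability_exploitation"}

def classify_risk_tier(domain: str, interacts_with_humans: bool = False) -> str:
    if domain in PROHIBITED_PRACTICES:
        return "unacceptable"  # deployment is banned outright
    if domain in HIGH_RISK_DOMAINS:
        return "high"  # full high-risk requirements apply
    if interacts_with_humans:
        return "limited"  # transparency obligations (disclose AI nature)
    return "minimal"  # no specific obligations beyond existing law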
The EU AI Act follows a phased enforcement schedule that engineering teams must plan around:
- February 2, 2025: Prohibitions on unacceptable-risk practices and AI literacy obligations take effect.
- August 2, 2025: Obligations for general-purpose AI (GPAI) model providers and governance provisions apply.
- August 2, 2026: Most remaining provisions apply, including the bulk of the high-risk system requirements.
- August 2, 2027: Extended deadline for high-risk systems embedded in products already covered by EU product regulation.
For engineering teams, practical compliance steps include:
- Inventory all AI systems in use and classify each against the Act's risk tiers.
- Stand up audit logging and record-keeping for high-risk systems (see the logging requirements below).
- Maintain technical documentation as a living artifact rather than an after-the-fact report.
- Design human oversight and override mechanisms into the architecture from the start.
- Schedule conformity assessments well ahead of the August 2026 deadline.
Note: The August 2026 deadline for full high-risk system compliance is closer than it appears. Logging infrastructure and conformity assessments take months to implement correctly -- start now.
For engineering teams building LLM applications, the EU AI Act imposes concrete requirements:
class EUAIActCompliance:
"""Technical requirements for high-risk AI systems under EU AI Act."""
REQUIRED_DOCUMENTATION = [
"system_description", # General description of the AI system
"design_specifications", # Design and development methodology
"training_data_description", # Description of training datasets
"validation_testing", # Validation and testing procedures
"risk_management_measures", # Risk management measures adopted
"performance_metrics", # Accuracy and robustness metrics
"human_oversight_measures", # How human oversight is implemented
"expected_lifetime", # Expected lifetime and maintenance
]
LOGGING_REQUIREMENTS = {
"input_data": True, # Log inputs (or hashes for privacy)
"output_data": True, # Log outputs
"model_version": True, # Which model version produced output
"timestamp": True, # When inference occurred
"confidence_scores": True, # Model confidence metrics
"human_override": True, # Whether human overrode the output
"data_retention_period": "as specified by deployer, minimum for "
"regulatory purposes",
}
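A simple CI gate can verify that every required artifact exists before a release. A minimal sketch, assuming documentation is tracked as a set of named artifacts in the release bundle:

def missing_documentation(submitted: set[str]) -> list[str]:
    """Return required documentation artifacts absent from a release bundle."""
    return [
        doc for doc in EUAIActCompliance.REQUIRED_DOCUMENTATION
        if doc not in submitted
    ]

# Example: gate a release on documentation completeness.
gaps = missing_documentation({"system_description", "design_specifications"})
if gaps:
    print(f"Release blocked; missing documentation: {gaps}")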
Executive Order 14110, signed in October 2023, established reporting requirements for frontier AI models. Companies training models using more than 10^26 FLOPs must report safety testing results to the government. The order also directs agencies to develop AI risk standards, though its long-term implementation depends on political continuity.
China's AI regulations require algorithmic recommendation transparency, deepfake labeling, and government approval for generative AI services. Canada's proposed Artificial Intelligence and Data Act (AIDA) focuses on high-impact systems with requirements similar to the EU AI Act but with a lighter enforcement framework. Brazil's AI Act proposal follows the EU risk-based approach while adding provisions specific to the Brazilian context.
Model cards, introduced by Mitchell et al. (2019) in "Model Cards for Model Reporting," provide a standardized framework for documenting AI models. They serve as both a communication tool and a compliance artifact.
model_card:
model_details:
name: "CustomerAssist-v3"
version: "3.2.1"
type: "Fine-tuned LLM for customer service"
base_model: "Llama-3-70B"
training_date: "2025-09-15"
developers: "AI Platform Team"
license: "Internal use only"
intended_use:
primary_uses:
- "Automated customer support for product inquiries"
- "Order status checking and basic troubleshooting"
out_of_scope_uses:
- "Medical or health advice"
- "Financial investment recommendations"
- "Legal guidance"
- "Decisions affecting employment or credit"
target_users:
- "Customer service platform operators"
- "End users via chat interface"
training_data:
datasets:
- name: "internal_support_logs"
size: "2.4M conversations"
date_range: "2022-01 to 2025-06"
preprocessing: "PII removed, quality filtered"
- name: "product_documentation"
size: "50K documents"
preprocessing: "Chunked, deduplicated"
known_biases:
- "English-language dominant (85% of training data)"
- "Overrepresents North American customer patterns"
- "Product categories unevenly represented"
performance_metrics:
task_accuracy:
overall: 0.89
by_category:
billing: 0.92
technical: 0.85
returns: 0.91
fairness_metrics:
demographic_parity_ratio: 0.94
equalized_odds_ratio: 0.91
safety_metrics:
toxicity_rate: 0.001
hallucination_rate: 0.03
refusal_rate: 0.05
limitations:
- "Cannot access real-time inventory or order systems without API tools"
- "May hallucinate product specifications for recently launched products"
- "Lower accuracy on multi-language queries"
- "Not tested for accessibility compliance (screen reader compatibility)"
ethical_considerations:
- "Automated responses may lack empathy in sensitive situations"
- "Model may perpetuate biases in historical support interactions"
- "Customer data privacy requires careful context management"
For deployed AI systems (as opposed to standalone models), system cards extend model cards to include the full deployment context: infrastructure, guardrails, monitoring, human oversight mechanisms, and incident response procedures. Anthropic publishes system cards for Claude models, providing a reference template for the industry.
The National Institute of Standards and Technology (NIST) AI RMF, released in January 2023, provides a voluntary framework organized around four functions: Govern, Map, Measure, and Manage.
Govern: Establishing organizational AI risk governance structures. This includes defining roles and responsibilities, setting risk tolerance levels, and creating policies for AI development and deployment.
Map: Understanding the context in which AI systems operate. This involves identifying stakeholders, mapping potential impacts, and characterizing the AI system's technical properties.
Measure: Quantifying AI risks through testing, evaluation, verification, and validation (TEVV). This includes bias testing, robustness evaluation, and performance benchmarking.
Manage: Implementing controls to address identified risks. This includes technical mitigations, organizational processes, and monitoring systems.
A practical risk assessment for an LLM deployment involves systematic evaluation across multiple dimensions:
class AIRiskAssessment:
def __init__(self, system_name: str):
self.system_name = system_name
self.risks = []
def assess_risk(
self,
category: str,
description: str,
likelihood: int, # 1-5
impact: int, # 1-5
controls: list[str],
residual_likelihood: int,
residual_impact: int,
) -> dict:
inherent_risk = likelihood * impact
residual_risk = residual_likelihood * residual_impact
risk_entry = {
"category": category,
"description": description,
"inherent_risk": {
"likelihood": likelihood,
"impact": impact,
"score": inherent_risk,
"level": self._risk_level(inherent_risk),
},
"controls": controls,
"residual_risk": {
"likelihood": residual_likelihood,
"impact": residual_impact,
"score": residual_risk,
"level": self._risk_level(residual_risk),
},
}
self.risks.append(risk_entry)
return risk_entry
def _risk_level(self, score: int) -> str:
if score >= 20:
return "critical"
elif score >= 12:
return "high"
elif score >= 6:
return "medium"
return "low"
A typical assessment might include risks such as:
| Category | Risk | Inherent | Controls | Residual |
|---|---|---|---|---|
| Accuracy | Hallucinated information leads to wrong decisions | High (4x4=16) | RAG grounding, citation verification | Medium (2x4=8) |
| Bias | Discriminatory outputs in hiring context | Critical (5x5=25) | Fairness testing, human review | High (3x4=12) |
| Privacy | PII leakage in model outputs | High (3x5=15) | PII detection filter, data minimization | Medium (2x3=6) |
| Security | Prompt injection bypasses safety controls | High (4x4=16) | Input validation, output filtering | Medium (3x3=9) |
| Availability | Model service outage affects business operations | Medium (3x3=9) | Fallback systems, graceful degradation | Low (2x2=4) |
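The first row of the table maps directly onto the assessment class, for example:

# Recording the hallucination risk from the table above.
assessment = AIRiskAssessment("CustomerAssist-v3")
entry = assessment.assess_risk(
    category="Accuracy",
    description="Hallucinated information leads to wrong decisions",
    likelihood=4,
    impact=4,
    controls=["RAG grounding", "citation verification"],
    residual_likelihood=2,
    residual_impact=4,
)
print(entry["inherent_risk"]["level"])   # "high" (score 16)
print(entry["residual_risk"]["level"])   # "medium" (score 8)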
Auditability requires comprehensive logging that captures the full decision-making pipeline. This goes far beyond standard application logging.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class AIAuditRecord:
# Identity
record_id: str
timestamp: datetime
session_id: str
user_id: str # Pseudonymized
# Input
input_hash: str # Hash for privacy, full text in secure store
input_classification: dict # Topic, intent, risk level
# Model
model_id: str
model_version: str
system_prompt_hash: str
temperature: float
max_tokens: int
# Context
retrieved_documents: list[str] # Document IDs used for RAG
retrieval_scores: list[float]
# Output
output_hash: str
output_tokens: int
latency_ms: float
# Safety
guardrail_results: dict # Which guardrails triggered
content_classification: dict # Toxicity, PII, etc.
human_override: bool
override_reason: str | None
# Provenance
citation_ids: list[str]
grounding_score: float
Tip: Chain each record's hash to the previous record's hash. Any modification then invalidates every subsequent record, making tampering evident.
Audit logs must be stored immutably -- they cannot be modified after creation. Append-only databases, write-once storage, or blockchain-based solutions provide the required immutability. Retention periods depend on the regulatory context: the EU AI Act requires logs to be kept for the duration that the system is on the market plus a reasonable period, while industry-specific regulations (HIPAA, SOX) may impose longer requirements.
import hashlib
import json
from dataclasses import asdict

class ImmutableAuditLog:
    def __init__(self, storage_backend):
        self.storage = storage_backend
        self._latest_hash = None  # head of the hash chain

    def _get_latest_hash(self) -> str | None:
        """Return the integrity hash of the most recently written record."""
        return self._latest_hash

    def write(self, record: AIAuditRecord) -> str:
        # Compute integrity hash over the serialized record
        record_bytes = json.dumps(asdict(record), default=str).encode()
        integrity_hash = hashlib.sha256(record_bytes).hexdigest()
        # Chain to the previous record so tampering is detectable
        stored_record = {
            **asdict(record),
            "integrity_hash": integrity_hash,
            "previous_hash": self._get_latest_hash(),
        }
        # Append-only write
        record_id = self.storage.append(stored_record)
        self._latest_hash = integrity_hash
        return record_id
def verify_integrity(self, record_id: str) -> bool:
"""Verify record has not been tampered with."""
record = self.storage.get(record_id)
stored_hash = record.pop("integrity_hash")
previous_hash = record.pop("previous_hash")
computed_hash = hashlib.sha256(
json.dumps(record, default=str).encode()
).hexdigest()
return computed_hash == stored_hash
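Single-record verification catches in-place edits; detecting deletions or reordering requires walking the whole chain. A sketch, assuming the storage backend can iterate records in insertion order (iter_records here is a hypothetical method):

def verify_chain(storage) -> bool:
    """Verify the hash chain across the entire log."""
    expected_previous = None
    for record in storage.iter_records():  # hypothetical ordered iteration
        if record["previous_hash"] != expected_previous:
            return False  # a record was altered, removed, or reordered
        expected_previous = record["integrity_hash"]
    return True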
Data governance for LLM systems presents unique challenges because the model's "memory" of training data is implicit -- encoded in billions of parameters -- making traditional data governance approaches (data lineage, deletion, access control) much more complex.
Organizations must track what data was used to train or fine-tune their models. Key practices include:
- Version every training and fine-tuning dataset, with content hashes for integrity.
- Record provenance: where each dataset came from and under what license or consent basis.
- Document preprocessing steps (PII removal, filtering, deduplication), as in the model card above.
- Map datasets to the model versions they produced, so questions about training data can be answered per deployment.
Data flowing through the system at inference time requires its own governance:
class InferenceDataGovernance:
def __init__(self):
self.data_policies = {}
def classify_input_data(self, input_text: str) -> dict:
"""Classify input data sensitivity and applicable policies."""
return {
"contains_pii": self.detect_pii(input_text),
"data_classification": self.classify_sensitivity(input_text),
"applicable_regulations": self.identify_regulations(input_text),
"retention_policy": self.determine_retention(input_text),
"geographic_restrictions": self.check_data_residency(input_text),
}
def enforce_policies(self, classification: dict, context: dict) -> dict:
"""Apply governance policies based on classification."""
actions = {}
if classification["contains_pii"]:
actions["pii_handling"] = "redact_before_logging"
if "GDPR" in classification["applicable_regulations"]:
actions["consent_check"] = "required"
actions["data_minimization"] = "active"
if classification["geographic_restrictions"]:
actions["routing"] = "region_specific_endpoint"
return actions
Compliance engineering translates regulatory requirements into technical specifications and automated checks. It bridges the gap between legal text and code.
class ComplianceTestSuite:
"""Automated compliance tests run as part of CI/CD pipeline."""
def test_transparency_disclosure(self, system):
"""EU AI Act Article 52: Transparency obligations."""
response = system.generate("Hello, who are you?")
assert "AI" in response or "artificial intelligence" in response.lower(), \
"System must disclose AI nature in interactions"
def test_human_oversight_capability(self, system):
"""EU AI Act Article 14: Human oversight."""
# Verify override mechanism exists and functions
result = system.generate_with_oversight(
"Process this loan application",
override_available=True
)
assert result.override_mechanism_active, \
"Human override must be available for high-risk decisions"
def test_data_minimization(self, system):
"""GDPR Article 5(1)(c): Data minimization."""
audit_log = system.get_latest_audit_log()
assert "raw_user_input" not in audit_log, \
"Raw user input must not be stored in audit logs"
assert "input_hash" in audit_log, \
"Input hash should be stored for traceability"
def test_right_to_explanation(self, system):
"""GDPR Article 22: Right to explanation for automated decisions."""
result = system.generate("Why was my application rejected?")
assert result.explanation is not None, \
"Automated decisions must be explainable"
assert len(result.contributing_factors) > 0, \
"Contributing factors must be provided"
Treating compliance requirements as code enables version control, automated testing, and continuous verification:
# compliance-policies.yaml
policies:
- id: "EUAI-ART52-TRANSPARENCY"
regulation: "EU AI Act"
article: "52"
requirement: "AI systems interacting with persons must disclose AI nature"
implementation:
type: "system_prompt_injection"
content: "Always identify yourself as an AI assistant."
test: "test_transparency_disclosure"
frequency: "every_deployment"
- id: "EUAI-ART14-OVERSIGHT"
regulation: "EU AI Act"
article: "14"
requirement: "High-risk AI systems must allow human oversight"
implementation:
type: "architecture_pattern"
pattern: "human_in_the_loop"
threshold: "all high-risk decisions"
test: "test_human_oversight_capability"
frequency: "every_deployment"
Despite best efforts, AI systems will produce harmful outputs. A structured incident response process ensures these events are handled effectively.
class AIIncident:
SEVERITY_LEVELS = {
"P0_critical": {
"description": "Immediate harm to users or severe legal exposure",
"examples": ["PII data breach", "discriminatory hiring decision",
"dangerous medical advice followed by user"],
"response_time": "15 minutes",
"escalation": "VP Engineering + Legal + Comms",
},
"P1_high": {
"description": "Significant risk of harm or regulatory violation",
"examples": ["Systematic bias detected", "safety bypass discovered",
"hallucinated critical information"],
"response_time": "1 hour",
"escalation": "Engineering Manager + Legal",
},
"P2_medium": {
"description": "Moderate quality or safety issue",
"examples": ["Increased hallucination rate", "guardrail false positives",
"bias drift detected"],
"response_time": "24 hours",
"escalation": "Team Lead",
},
"P3_low": {
"description": "Minor issue requiring tracking",
"examples": ["Edge case failure", "user complaint about tone",
"performance degradation"],
"response_time": "1 week",
"escalation": "On-call engineer",
},
}
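The severity table doubles as routing configuration; a small helper can resolve the escalation path during triage:

def escalation_for(severity: str) -> str:
    """Resolve the escalation path and response SLA for a triaged incident."""
    level = AIIncident.SEVERITY_LEVELS[severity]
    return f"Escalate to {level['escalation']} within {level['response_time']}"

# escalation_for("P1_high")
# -> "Escalate to Engineering Manager + Legal within 1 hour"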
A structured response playbook ensures consistent handling of AI incidents:
1. Detect and triage: classify the incident against the severity levels above.
2. Contain: disable the affected capability, roll back to a known-good model version, or route traffic to human review.
3. Investigate: reconstruct the incident from audit logs -- inputs, model version, guardrail results, and overrides.
4. Remediate: fix the root cause (prompt, guardrail, model, or data) and verify with regression evaluations.
5. Notify: inform affected users and regulators where required; the EU AI Act mandates serious-incident reporting for high-risk systems.
6. Review: run a blameless post-incident review and fold findings back into tests and guardrails.
As AI capabilities advance, governments have established dedicated bodies to evaluate frontier models and set safety standards. These institutes are shaping the technical benchmarks that compliance teams will ultimately need to meet.
The UK AI Safety Institute, established in November 2023, was the first government body dedicated to evaluating advanced AI models for safety. AISI conducts pre-deployment evaluations of frontier models in collaboration with leading AI labs, developing standardized evaluation methodologies for dangerous capabilities (biosecurity, cybersecurity, persuasion, autonomy). Its evaluation framework, Inspect, is open-source and provides a reusable toolkit for building AI safety benchmarks. AISI has published results from evaluations of models from Anthropic, OpenAI, Google DeepMind, and Meta, establishing a norm of government access to pre-release models.
Housed within the National Institute of Standards and Technology (NIST), the US AI Safety Institute was established in early 2024 to develop measurement science for AI safety. It coordinates with AISI UK under a bilateral agreement and focuses on:
- Guidelines for red-team testing of frontier models
- Standards for detecting and labeling synthetic content
- Evaluation methodologies for dangerous capabilities
- Measurement science for AI reliability and safety benchmarks
USAISI works with the AI Safety Institute Consortium (AISIC), which includes over 200 organizations contributing to standards development.
Several additional national bodies are active:
- Japan's AI Safety Institute, launched in February 2024
- Singapore's AI Safety Institute, building on its Digital Trust Centre
- Canada's AI Safety Institute, announced in 2024
- South Korea's AI Safety Institute, established following the Seoul AI Summit
- The EU AI Office, which oversees the AI Act's general-purpose AI provisions
These bodies increasingly coordinate through the International Network of AI Safety Institutes, formed at the 2024 Seoul AI Summit, aiming to develop interoperable evaluation standards that reduce the compliance burden for organizations operating across jurisdictions.
For engineering teams, the practical implication is that pre-deployment safety evaluation is becoming a baseline expectation. Teams should build evaluation pipelines that can accommodate the testing methodologies these institutes produce -- most of which align with the structured red-teaming and benchmark approaches already considered best practice.
ISO/IEC 42001:2023 is the first international management system standard for artificial intelligence. Published in December 2023, it provides a certifiable framework for establishing, implementing, maintaining, and continually improving an AI management system (AIMS) within an organization.
The standard follows the familiar ISO management system structure (shared with ISO 27001 for information security and ISO 9001 for quality), making it accessible to organizations already operating under those frameworks. Core requirements include:
- An organizational AI policy with defined roles, responsibilities, and leadership accountability
- AI risk assessments and AI system impact assessments covering affected individuals and groups
- Lifecycle controls for AI development, deployment, and decommissioning
- Management of third-party AI suppliers and components
- Internal audits, management reviews, and continual improvement
Certification follows the standard ISO audit process: a Stage 1 audit reviews documentation and readiness, followed by a Stage 2 audit that evaluates implementation effectiveness. Certification is valid for three years with annual surveillance audits. Several accredited certification bodies now offer ISO 42001 audits, and early adopters report that the process takes six to twelve months from initiation to certification.
ISO 42001 certification is becoming a competitive differentiator and a procurement prerequisite. Enterprises pursue it to:
- Demonstrate responsible AI practices to customers, partners, and regulators
- Satisfy procurement and vendor-assessment requirements that increasingly ask for AI governance evidence
- Build a management-system foundation that maps onto emerging regulations such as the EU AI Act
- Integrate AI governance with existing ISO 27001/9001 programs rather than running it separately
The standard does not prescribe specific technical controls, making it compatible with the technical compliance approaches described elsewhere in this article.
Open-weight models present a distinct governance challenge: once model weights are publicly released, the provider has no technical mechanism to enforce usage policies. Governance must therefore shift from provider-side controls to deployer-side responsibility.
Note: Under the EU AI Act, deployers of open-source models used in high-risk applications bear the same compliance obligations as deployers of proprietary models. The openness of the model does not reduce the regulatory burden.
With proprietary API models (GPT-4, Claude), the provider can enforce acceptable use policies, apply safety filters, and revoke access. With open-weight models (Llama, Mistral, Qwen, Gemma), the deployer assumes full responsibility for safety, compliance, and misuse prevention. This means:
- Implementing guardrails and content filtering yourself, since none ship with the weights
- Running safety and bias evaluations before deployment, rather than relying on provider testing
- Building the audit logging and documentation that regulation requires
- Monitoring deployments for misuse that a provider would otherwise detect centrally
Open-weight model licenses vary significantly in what they permit:
Permissive Licenses: Apache 2.0 (used by Mistral, some Google models) imposes minimal restrictions. Deployers can modify, distribute, and use the model commercially with few obligations beyond attribution and license notice preservation. This maximizes flexibility but provides no governance guardrails.
Community and Responsible Use Licenses: Meta's Llama Community License permits commercial use but includes an acceptable use policy that prohibits certain applications (weapons development, surveillance, generating disinformation). It also imposes a monthly active user threshold (700 million) above which a separate license is required. Google's Gemma license similarly includes prohibited use restrictions.
Restricted and Research-Only Licenses: Some models are released under licenses that prohibit commercial use, limit redistribution, or require specific attribution. Research-only releases (common for safety-sensitive models) restrict deployment entirely.
Model-Specific Terms: An increasing number of models ship with supplementary use policies or "responsible use guides" that exist alongside the formal license. These may not be legally binding in the same way but signal the developers' intent and may factor into liability assessments.
For engineering teams, the key practice is to maintain a model license registry that tracks the license, acceptable use policy, and any additional terms for every model in use -- including models embedded in dependencies or used through third-party integrations.
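A minimal registry sketch follows; the entries shown are illustrative and must be verified against the actual license texts:

from dataclasses import dataclass, field

@dataclass
class ModelLicenseEntry:
    model_name: str
    license_id: str                    # e.g. "Apache-2.0"
    commercial_use_permitted: bool
    acceptable_use_policy: str | None  # reference to the AUP, if any
    additional_terms: list[str] = field(default_factory=list)

# Illustrative entries -- confirm against each model's license.
MODEL_LICENSE_REGISTRY = [
    ModelLicenseEntry("Mistral-7B", "Apache-2.0", True, None),
    ModelLicenseEntry(
        "Llama-3-70B", "Llama 3 Community License", True,
        "Meta Llama 3 Acceptable Use Policy",
        ["Separate license required above 700M monthly active users"],
    ),
]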
The OWASP Top 10 for Large Language Model Applications provides a standardized security reference for teams building LLM-powered systems. First published in 2023 and updated for 2025, it catalogs the most critical security risks specific to LLM applications, complementing the general OWASP Top 10 for web applications.
LLM01: Prompt Injection -- Attackers craft inputs that override system instructions, causing the model to perform unintended actions. This includes both direct injection (malicious user input) and indirect injection (adversarial content in retrieved documents or tool outputs). Mitigation requires input validation, privilege separation, and output filtering -- the layered defense approach covered in Guardrails & Content Filtering.
LLM02: Sensitive Information Disclosure -- The model reveals confidential data from training data, system prompts, or connected data sources. This includes PII leakage, system prompt extraction, and memorized training data regurgitation. Controls include data sanitization, output filtering, and least-privilege access to retrieval systems.
LLM03: Supply Chain Vulnerabilities -- Risks from third-party model components, including poisoned training data, compromised model weights, and malicious plugins or extensions. This is especially relevant for open-weight models where provenance verification is limited.
LLM04: Data and Model Poisoning -- Manipulation of training data or fine-tuning data to introduce backdoors, biases, or targeted misbehavior. This risk is heightened for models fine-tuned on user-generated or scraped data without rigorous quality controls.
LLM05: Improper Output Handling -- Downstream systems treat LLM output as trusted, enabling injection attacks (SQL injection, XSS, command injection) through model-generated content. Every LLM output must be treated as untrusted input by consuming systems.
LLM06: Excessive Agency -- LLM systems granted too many capabilities, permissions, or autonomy to act on behalf of users. Mitigations include least-privilege tool access, human-in-the-loop confirmation for consequential actions, and rate limiting on tool invocations.
LLM07: System Prompt Leakage -- Extraction of system prompts that reveal internal logic, security controls, or sensitive business rules. While not always a direct vulnerability, leaked prompts can inform more targeted attacks.
LLM08: Vector and Embedding Weaknesses -- Attacks targeting RAG pipelines through manipulated embeddings, adversarial document injection, or exploitation of similarity search mechanics to surface malicious content.
LLM09: Misinformation -- The model generates false, misleading, or fabricated information with high confidence. This covers hallucination in factual domains, fabricated citations, and confident errors in specialized fields.
LLM10: Unbounded Consumption -- Resource exhaustion attacks through crafted inputs that cause excessive token generation, recursive tool calls, or denial-of-service through computational overload. Mitigations include token budget limits, timeout controls, and request throttling.
Engineering teams should use the OWASP LLM Top 10 as a checklist during design reviews and red-teaming exercises, ensuring that each risk category is addressed through appropriate technical controls.
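One lightweight way to operationalize this is a review gate that flags any risk category without a documented control -- a sketch using the 2025 risk IDs listed above:

OWASP_LLM_TOP10 = {
    "LLM01": "Prompt Injection",
    "LLM02": "Sensitive Information Disclosure",
    "LLM03": "Supply Chain Vulnerabilities",
    "LLM04": "Data and Model Poisoning",
    "LLM05": "Improper Output Handling",
    "LLM06": "Excessive Agency",
    "LLM07": "System Prompt Leakage",
    "LLM08": "Vector and Embedding Weaknesses",
    "LLM09": "Misinformation",
    "LLM10": "Unbounded Consumption",
}

def unaddressed_risks(controls: dict[str, list[str]]) -> list[str]:
    """Return OWASP LLM risk IDs with no documented mitigating control."""
    return [rid for rid in OWASP_LLM_TOP10 if not controls.get(rid)]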
As AI systems make or influence consequential decisions, the legal frameworks for assigning liability when things go wrong are rapidly evolving. Engineering choices directly affect an organization's liability exposure.
EU AI Liability Directive: Proposed alongside the AI Act, this directive establishes rules for civil liability claims related to AI systems. It introduces a presumption of causality -- if a claimant can show that a relevant obligation was not complied with and a causal link to the AI output is reasonably likely, the burden of proof shifts to the AI provider or deployer. This makes compliance with the EU AI Act (documentation, logging, risk management) not just a regulatory requirement but a critical liability defense.
Product Liability Implications: The EU's revised Product Liability Directive (2024) explicitly includes software and AI systems within its scope. AI systems are treated as products, meaning strict liability applies -- claimants do not need to prove fault, only defect and damage. This has significant implications for organizations deploying AI in consumer-facing applications.
US Liability Landscape: The US lacks a federal AI liability framework, but existing tort law, product liability, and sector-specific regulations (FDA for medical AI, EEOC for employment AI) create a patchwork of liability exposure. Several state-level AI liability bills have been introduced, and courts are beginning to establish precedent through cases involving AI-generated content, autonomous vehicle decisions, and algorithmic discrimination.
A nascent but growing AI insurance market is emerging to cover risks that traditional policies exclude or inadequately address:
- AI-specific errors and omissions coverage for model failures and hallucination-driven losses
- Coverage for algorithmic discrimination and bias claims
- Intellectual property coverage for claims arising from AI-generated content
- Regulatory defense coverage for investigations and enforcement actions
Insurers are increasingly requiring evidence of AI governance practices -- risk assessments, audit trails, bias testing, and incident response plans -- as underwriting prerequisites. Organizations with mature governance frameworks (such as those aligned with ISO 42001 or the NIST AI RMF) are positioned for more favorable terms.
Note: Mature governance practices create a direct financial incentive: comprehensive observability and audit trails, systematic bias testing, and robust guardrails all reduce both the likelihood of incidents and the cost of insuring against them.
The global AI regulatory landscape is evolving rapidly. Engineering teams should build systems that are adaptable to changing requirements rather than optimized for any single regulation.
Key design principles for regulatory adaptability:
- Treat compliance rules as configuration (policy-as-code) rather than hard-coded logic, so new requirements become policy updates
- Log and document by default: retrofitting auditability is far more expensive than building it in
- Keep guardrails modular so they can be tightened per jurisdiction without retraining models
- Route requests by jurisdiction where data residency or disclosure rules differ
- Generate compliance artifacts (model cards, risk assessments) from development workflows rather than writing them after the fact
The trend is clear: AI regulation is becoming more specific, more enforceable, and more global. Organizations that build governance capabilities now will have a significant advantage as regulations mature.
This article connects to several related topics covered elsewhere in this series:
- Guardrails & Content Filtering: the layered defenses referenced throughout the security and OWASP discussions