AI Alignment

The process of ensuring AI systems' goals and behaviors are aligned with human values and intentions.

Definition

The ongoing process of ensuring an AI's objectives, reward functions, and decision boundaries reflect human goals and ethical norms. In practice this requires technical safeguards, policy guardrails, and human feedback loops.
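As a minimal sketch of the reward-function piece (all names and weights here are hypothetical, not a standard implementation), a system optimizing a raw proxy metric can be nudged toward human goals by folding a human-specified quality term into its objective:

```python
# Hypothetical sketch: aligning a recommender's reward with human goals.
# Raw "engagement" is only a proxy metric; blending in a human-assessed
# quality term keeps the optimized objective closer to what people value.

def misaligned_reward(engagement: float) -> float:
    """Optimizes only the proxy metric (e.g., clicks)."""
    return engagement

def aligned_reward(engagement: float, quality: float, weight: float = 0.5) -> float:
    """Blends the proxy metric with a human-assessed quality score.

    `quality` might come from moderator ratings or editorial guidelines;
    `weight` trades off engagement against those human values.
    """
    return (1 - weight) * engagement + weight * quality
```

Choosing `weight` is itself an alignment decision, which is why the definition treats alignment as an ongoing process revisited through human feedback rather than a one-time fix.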

Real-World Example

A news-recommendation AI learns to maximize clicks by surfacing sensational headlines. Product managers introduce a human-review flag: when click rates spike on extreme content, moderators vet samples and adjust the algorithm’s reward to prioritize trusted outlets, keeping the system aligned with quality journalism goals.
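A minimal sketch of that human-review feedback loop appears below; the spike threshold, the flagging logic, and the weight adjustments are all hypothetical illustrations, not a real product's API:

```python
# Hypothetical sketch of the human-review flag described above: when click
# rates spike on extreme content, samples go to moderators, and their
# verdicts adjust per-outlet reward weights toward trusted journalism.

from dataclasses import dataclass, field

SPIKE_THRESHOLD = 2.0  # click rate relative to an outlet's historical baseline

@dataclass
class Recommender:
    # Per-outlet multipliers applied to the engagement reward.
    outlet_weights: dict[str, float] = field(default_factory=dict)

    def needs_review(self, outlet: str, click_rate: float, baseline: float) -> bool:
        """Flag an outlet's content for human review when clicks spike abnormally."""
        return click_rate > SPIKE_THRESHOLD * baseline

    def apply_moderator_verdict(self, outlet: str, is_quality: bool) -> None:
        """Adjust the reward weight for an outlet based on a moderator's sample review."""
        current = self.outlet_weights.get(outlet, 1.0)
        # Down-weight sensational sources; modestly up-weight trusted ones.
        self.outlet_weights[outlet] = current * (1.1 if is_quality else 0.5)

    def reward(self, outlet: str, clicks: int) -> float:
        """Engagement reward, scaled by the human-adjusted outlet weight."""
        return clicks * self.outlet_weights.get(outlet, 1.0)
```

The design point is that the moderators never rank articles directly; their verdicts reshape the reward the algorithm optimizes, which is the human-feedback loop the definition calls for.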