Adversarial Attack

Techniques that manipulate AI models by introducing deceptive inputs to cause incorrect outputs.

Definition

Deliberate, often imperceptible modifications to input data (images, text, or audio) that exploit vulnerabilities in an AI model's decision boundaries, pushing the model toward confident but incorrect predictions. Such attacks expose weaknesses even in systems treated as black boxes, and they drive the need for proactive defenses: adversarial training (injecting crafted examples during training), input-sanitization layers, and ongoing "red-team" penetration tests.
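
As a concrete illustration, the sketch below crafts a perturbation with the Fast Gradient Sign Method (FGSM), one common attack, and then uses it in a single adversarial-training step. It assumes a PyTorch image classifier; the function names and the epsilon budget are illustrative choices, not a prescribed implementation.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon=0.03):
    """Craft adversarial examples: shift each input value by +/- epsilon
    in the direction that most increases the model's loss (FGSM)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()    # step along the sign of the loss gradient
    return x_adv.clamp(0.0, 1.0).detach()  # keep pixels in the valid [0, 1] range

def adversarial_training_step(model, optimizer, x, y, epsilon=0.03):
    """One defense step: train on a mix of clean and freshly crafted examples."""
    x_adv = fgsm_perturb(model, x, y, epsilon)
    optimizer.zero_grad()
    loss = 0.5 * (F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y))
    loss.backward()
    optimizer.step()
    return loss.item()
```

Mixing clean and perturbed batches, as above, is the usual way to trade off accuracy on normal inputs against robustness to attacks.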

Real-World Example

Security researchers place small, carefully positioned stickers on a stop sign so that a self-driving car's vision system misreads it as "Speed Limit 45." The automaker responds by integrating adversarial-example detectors and hardening the model with randomized input preprocessing.
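
A minimal sketch of the randomized input preprocessing defense mentioned above, assuming a PyTorch classifier: each image is randomly resized and padded before inference, so a perturbation tuned to exact pixel positions no longer lines up with the input it was crafted for. The function name and padding budget are illustrative.

```python
import random
import torch
import torch.nn.functional as F

def randomized_preprocess(x, max_shrink=8):
    """Randomly resize a batch of images (N, C, H, W) slightly smaller,
    then pad back to the original size at a random offset."""
    _, _, h, w = x.shape
    new_h = random.randint(h - max_shrink, h)
    new_w = random.randint(w - max_shrink, w)
    x = F.interpolate(x, size=(new_h, new_w), mode="bilinear", align_corners=False)
    pad_top = random.randint(0, h - new_h)
    pad_left = random.randint(0, w - new_w)
    # Pad order for the last two dims is (left, right, top, bottom).
    return F.pad(x, (pad_left, w - new_w - pad_left, pad_top, h - new_h - pad_top))
```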