Imbalanced Data
A dataset in which one class or category significantly outnumbers the others, which can bias AI models toward the majority class unless mitigated.
Definition
Imbalance occurs when target categories (e.g., fraud vs. legitimate transactions) are unevenly represented, causing models to favor the majority class and overlook rare but critical cases. Mitigation techniques include resampling (oversampling the minority class or undersampling the majority), synthetic-data generation (e.g., SMOTE), and class-weight adjustments that make errors on the rare class cost more. Governance requires monitoring class distributions, tracking performance per class, and documenting mitigation choices and their effects.
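The snippet below is a minimal sketch of two of these mitigations, assuming scikit-learn and imbalanced-learn are available; the synthetic dataset, the roughly 1% minority rate, and the logistic-regression model are illustrative choices, not part of the definition.

```python
# Sketch of two common mitigations for class imbalance. The synthetic
# dataset and LogisticRegression model are illustrative assumptions.
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data with ~1% positives standing in for the rare class.
X, y = make_classification(
    n_samples=20_000, n_features=20, weights=[0.99, 0.01], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.25, random_state=42
)
print("Class counts before mitigation:", Counter(y_train))

# Mitigation 1: oversample the minority class with SMOTE.
# Resample only the training split so the test set keeps the real distribution.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)
print("Class counts after SMOTE:", Counter(y_res))
smote_model = LogisticRegression(max_iter=1000).fit(X_res, y_res)

# Mitigation 2: keep the data as-is but reweight errors in the loss,
# so mistakes on the rare class cost more.
weighted_model = LogisticRegression(max_iter=1000, class_weight="balanced")
weighted_model.fit(X_train, y_train)
```

Resampling only the training split keeps the held-out evaluation faithful to the real class distribution, which matters when recording a mitigation's effect for governance.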
Real-World Example
A bank’s fraud-detection dataset contains 0.5% fraud cases. The data-science team applies SMOTE to oversample fraud examples, retrains the model with a class-weighted loss, and raises fraud recall from 60% to 85%, documenting each step for audit and compliance.
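A self-contained sketch of the kind of per-class tracking that supports such an audit trail; the tiny hand-written label arrays below stand in for real model predictions and are purely illustrative.

```python
# Sketch: per-class performance tracking for governance and audit records.
from collections import Counter

from sklearn.metrics import classification_report, recall_score

# First 95 entries are legitimate cases, last 5 are fraud (illustrative only).
y_true = [0] * 95 + [1] * 5
# The model flags 2 legitimate cases and catches 4 of the 5 fraud cases.
y_pred = [0] * 93 + [1] * 2 + [1] * 4 + [0] * 1

# Monitor the class distribution and the minority-class recall over time.
print("Class distribution:", Counter(y_true))
print("Fraud recall:", recall_score(y_true, y_pred, pos_label=1))

# Full per-class breakdown, suitable for inclusion in audit documentation.
print(classification_report(y_true, y_pred, target_names=["legitimate", "fraud"]))
```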