Data Quality

The condition of data, assessed on factors such as accuracy, completeness, reliability, and relevance. High data quality is crucial for effective AI model performance.

Definition

A multidimensional measure covering correctness (error-free), completeness (no missing values), consistency (uniform formats), timeliness (up-to-date), and relevance (fit for purpose). Data-quality programs deploy automated validation rules, cleansing pipelines, and quality dashboards, with escalation procedures that apply when a metric falls below its threshold.
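
As a rough illustration of how some of these dimensions might be scored and compared against thresholds, here is a minimal Python sketch. The record fields, the ISO date format, the one-year freshness window, and the threshold values are all assumptions made for this example, not part of any standard. Correctness and relevance are left out because they usually need a reference source or domain judgment.

```python
import re
from datetime import date, timedelta

# Hypothetical records; the field names are illustrative only.
RECORDS = [
    {"email": "a@example.com", "signup_date": "2024-01-15", "updated": date(2024, 6, 1)},
    {"email": None, "signup_date": "2024-02-01", "updated": date(2024, 5, 20)},
    {"email": "b@example.com", "signup_date": "03/02/2024", "updated": date(2023, 1, 10)},
    {"email": "c@example.com", "signup_date": "2024-03-09", "updated": date(2024, 5, 30)},
]

# Assumed expected format for date strings (YYYY-MM-DD).
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def completeness(records, field):
    """Share of records where the field is present and non-empty."""
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def consistency(records, field, pattern):
    """Share of records whose field fully matches the expected format."""
    ok = sum(
        1 for r in records
        if isinstance(r.get(field), str) and pattern.fullmatch(r[field])
    )
    return ok / len(records)


def timeliness(records, field, as_of, max_age):
    """Share of records updated within max_age of the as_of date."""
    fresh = sum(1 for r in records if as_of - r[field] <= max_age)
    return fresh / len(records)


def quality_report(records, as_of):
    """Score each dimension between 0 and 1."""
    return {
        "completeness": completeness(records, "email"),
        "consistency": consistency(records, "signup_date", ISO_DATE),
        "timeliness": timeliness(records, "updated", as_of, timedelta(days=365)),
    }


def breaches(report, thresholds):
    """Names of metrics that fall below their threshold (escalation candidates)."""
    return sorted(name for name, score in report.items() if score < thresholds[name])


# Assumed thresholds; real programs would set these per field and per use case.
THRESHOLDS = {"completeness": 0.95, "consistency": 0.70, "timeliness": 0.90}

report = quality_report(RECORDS, as_of=date(2024, 6, 15))
to_escalate = breaches(report, THRESHOLDS)
print(report)
print(to_escalate)
```

In a real program, the list returned by breaches would feed a dashboard or an escalation workflow rather than a print statement.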

Real-World Example

A credit-risk team tracks data-quality metrics for the income and employment fields in loan applications. When the missing-value rate for a field exceeds 2%, an automated alert triggers a review: data engineers correct the ETL scripts and notify frontline staff to enforce mandatory fields, restoring data completeness before the model is retrained.
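
The alert condition in this example could be sketched as follows. This is a simplified illustration under stated assumptions: applications are plain dictionaries, a value of None or an empty string counts as missing, and the function names and synthetic batch are hypothetical. Only the 2% threshold and the two field names come from the example above.

```python
MISSING_RATE_THRESHOLD = 0.02  # the 2% limit from the example


def missing_rate(applications, field):
    """Fraction of applications where the field is absent, None, or empty."""
    if not applications:
        return 0.0
    missing = sum(1 for app in applications if app.get(field) in (None, ""))
    return missing / len(applications)


def fields_needing_review(applications, fields, threshold=MISSING_RATE_THRESHOLD):
    """Map each field whose missing-value rate exceeds the threshold to its rate."""
    rates = {field: missing_rate(applications, field) for field in fields}
    return {field: rate for field, rate in rates.items() if rate > threshold}


# Synthetic batch of 100 applications (made-up values for illustration).
applications = [{"income": 50000, "employment": "salaried"} for _ in range(100)]
for i in range(3):
    applications[i]["income"] = None   # 3 of 100 missing income: 3%
applications[10]["employment"] = ""    # 1 of 100 missing employment: 1%

flagged = fields_needing_review(applications, ["income", "employment"])
for field, rate in flagged.items():
    print(f"ALERT: {field} missing in {rate:.1%} of applications")
```

Here only the income field is flagged, since its 3% missing rate is above the limit while employment stays at 1%. The review and ETL fixes described above would start from that alert.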