Metadata Management
The practice of capturing and maintaining descriptive data (e.g., data provenance, feature definitions, model parameters) to support traceability and audits.
Definition
Implementation of metadata registries that collect lineage information (source datasets, transformation steps), feature catalogs (definitions, data types), model artifacts (hyperparameters, training code versions), and usage logs. Governance enforces mandatory metadata capture at each pipeline stage, integrates metadata validation checks, and provides search and reporting interfaces for stakeholders to conduct audits and impact analyses.
Real-World Example
A pharmaceutical ML platform uses a metadata store to log: dataset versions, feature-engineering scripts, model-training Git commits, and deployment timestamps. When a model’s performance drops, investigators query the metadata store to pinpoint the exact data or code changes responsible.