Feature Selection

Identifying and retaining only the most relevant features for model training, reducing complexity and improving accuracy.

Definition

A process that ranks or filters features based on statistical metrics (e.g., mutual information, correlation), model-based importance scores, or wrapper methods (e.g., recursive feature elimination). Good feature selection reduces overfitting, speeds up training, and makes models easier to explain. Governance guidelines require documenting the selection criteria, ensuring sensitive attributes do not inadvertently leak into the selected feature set, and re-evaluating the selection as the data evolves.
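
A minimal sketch of the filter and wrapper approaches mentioned above, using scikit-learn on a synthetic dataset; the dataset shape, estimator, and number of features to keep are illustrative assumptions, not part of the definition:

```python
# Illustrative sketch: filter-style and wrapper-style feature selection.
# All parameter values here are arbitrary examples, not recommendations.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, mutual_info_classif
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data: 20 features, only 5 of which are informative.
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)

# Filter method: rank features by mutual information with the target.
mi_scores = mutual_info_classif(X, y, random_state=0)
ranked = sorted(enumerate(mi_scores), key=lambda t: t[1], reverse=True)
print("Top 5 by mutual information:", [i for i, _ in ranked[:5]])

# Wrapper method: recursive feature elimination with a linear model.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5)
rfe.fit(X, y)
print("Selected by RFE:", [i for i, kept in enumerate(rfe.support_) if kept])
```

In practice the ranking would be computed on training data only and the resulting feature list documented, per the governance guidelines above.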

Real-World Example

In credit-risk modeling, a data-science team uses L1 regularization and permutation-importance analysis to drop 40% of low-impact variables (e.g., minor demographic fields). The resulting model trains 30% faster, maintains performance, and is easier for auditors to review.
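
A hedged sketch of that workflow, assuming scikit-learn and a synthetic stand-in for the proprietary credit-risk data; the regularization strength and importance threshold are illustrative assumptions:

```python
# Sketch of the pruning step described above: an L1-penalized model
# zeroes out weak coefficients, and permutation importance confirms
# which surviving features actually affect held-out performance.
# Data, C value, and threshold are synthetic stand-ins.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=30,
                           n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# L1 penalty drives the coefficients of low-impact variables to zero.
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
model.fit(X_tr, y_tr)
kept = np.flatnonzero(model.coef_[0])
print(f"L1 kept {kept.size}/{X.shape[1]} features")

# Permutation importance: drop in test score when a feature is shuffled.
result = permutation_importance(model, X_te, y_te,
                                n_repeats=10, random_state=0)
low_impact = np.flatnonzero(result.importances_mean < 1e-3)
print(f"{low_impact.size} features look safe to drop")
```

Running both checks gives the audit trail the example describes: the L1 coefficients and permutation scores document why each dropped variable was judged low-impact.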