Feature Extraction
The process of mapping raw data (e.g., text, images) into numerical representations (features) suitable for input into ML algorithms.
Definition
Automated or algorithm-driven transformation of unstructured data into fixed-length vectors, using techniques such as TF-IDF for text, SIFT descriptors for images, or spectral features for audio. Modern approaches include learned embeddings (Word2Vec, BERT). Governance must validate that extraction methods generalize across domains, do not leak sensitive information, and remain robust to data drift and adversarial perturbations.
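As an illustration of the text case, here is a minimal TF-IDF sketch in plain Python. It assumes whitespace tokenization and one common smoothed IDF variant, log((1 + N) / (1 + df)) + 1; the function name tfidf_vectors and the sample documents are hypothetical, and production libraries differ in tokenization and normalization.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, vectors): one fixed-length TF-IDF vector per document."""
    # Assumption: lowercase + whitespace split stands in for a real tokenizer.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF (one common variant, chosen here for illustration).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        # Term frequency (relative count) times IDF, in vocabulary order.
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors

# Hypothetical sample corpus.
docs = ["the cat sat", "the dog sat down", "a cat and a dog"]
vocab, vectors = tfidf_vectors(docs)
```

Every document maps to a vector whose length equals the vocabulary size, so documents of different lengths become directly comparable inputs. Terms that occur in fewer documents receive a larger weight than equally frequent terms that occur in many.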
Real-World Example
A speech-to-text system uses Mel-frequency cepstral coefficients (MFCCs) to extract audio features from raw waveforms. These features feed into a neural network that achieves state-of-the-art word error rates. The team monitors MFCC distributions over time to detect microphone drift in deployed devices.
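The monitoring step could be sketched as follows. This is not the team's actual method: it assumes MFCC frames have already been computed elsewhere and simply compares per-coefficient means between a baseline window and a recent window. The function names, the threshold of 1.0 baseline standard deviations, and the synthetic frames are all hypothetical.

```python
import random
import statistics

def drift_scores(baseline_frames, recent_frames):
    """Per-coefficient shift of the recent mean, in units of baseline std."""
    n_coeffs = len(baseline_frames[0])
    scores = []
    for i in range(n_coeffs):
        base = [frame[i] for frame in baseline_frames]
        recent = [frame[i] for frame in recent_frames]
        base_std = statistics.pstdev(base) or 1e-12  # avoid division by zero
        shift = abs(statistics.fmean(recent) - statistics.fmean(base))
        scores.append(shift / base_std)
    return scores

def drifted_coefficients(baseline_frames, recent_frames, threshold=1.0):
    """Indices of coefficients whose mean moved by more than `threshold` stds."""
    # Assumption: the threshold is illustrative and would need tuning per device.
    scores = drift_scores(baseline_frames, recent_frames)
    return [i for i, score in enumerate(scores) if score > threshold]

# Synthetic stand-in for real MFCC frames: 13 coefficients per frame.
rng = random.Random(0)

def fake_frames(n, offset):
    # Only coefficient 2 receives the offset, simulating a drifting microphone.
    return [[rng.gauss(0.0, 1.0) + (offset if i == 2 else 0.0) for i in range(13)]
            for _ in range(n)]

baseline = fake_frames(2000, 0.0)
healthy = fake_frames(2000, 0.0)
shifted = fake_frames(2000, 2.0)
```

Calling drifted_coefficients(baseline, shifted) flags only the shifted coefficient, while drifted_coefficients(baseline, healthy) flags none. A mean-shift check is deliberately simple; it would miss changes in variance or distribution shape, which a fuller monitor might cover with a two-sample statistical test.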