Knowledge Distillation

A method of transferring knowledge from a larger “teacher” model into a smaller “student” model, balancing performance with resource and governance constraints.

Definition

A two-step process in which a large, complex teacher model is first trained (or obtained), and a compact student network is then trained to mimic the teacher’s output distributions (soft labels) or intermediate representations. Distillation reduces inference latency, energy use, and attack surface, which matters for edge deployment and regulated environments. Governance includes validating that the distilled model maintains the teacher’s fairness and accuracy characteristics, and documenting the distillation recipe for audit and reproducibility purposes.
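The sketch below illustrates the second step with a typical distillation loss in Python, assuming PyTorch (the source does not specify a framework): the student is trained on a weighted combination of ordinary cross-entropy against ground-truth labels and the KL divergence between temperature-softened teacher and student distributions. The temperature, the weighting factor alpha, and the toy tensors are illustrative assumptions, not part of the original definition.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soft targets: KL divergence between temperature-softened distributions.
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Hard targets: standard cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    # Weighted combination; alpha balances imitating the teacher
    # against learning directly from the labeled data.
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Minimal usage with random logits, purely for illustration:
student_logits = torch.randn(8, 3)     # batch of 8 examples, 3 classes
teacher_logits = torch.randn(8, 3)
labels = torch.randint(0, 3, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)

In practice the teacher is kept frozen (in evaluation mode, with gradients disabled) while only the student’s parameters are updated against this combined loss.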

Real-World Example

A mobile-app developer distills a large BERT-based sentiment-analysis model into a TinyBERT variant suitable for on-device inference. The distilled model retains 98% of the teacher’s accuracy while reducing the memory footprint by 90%. Documentation of the distillation process is stored in the corporate knowledge-management system for future audits.
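A quick, rough way to sanity-check this kind of footprint reduction is to compare parameter counts of the teacher and student checkpoints (parameter count is only a proxy for memory use). The sketch below assumes the Hugging Face transformers library and the public checkpoints "bert-base-uncased" and "huawei-noah/TinyBERT_General_4L_312D", which stand in for the developer’s actual teacher and student models.

from transformers import AutoModel

def param_count(name):
    # Load the checkpoint and sum the number of elements in every weight tensor.
    model = AutoModel.from_pretrained(name)
    return sum(p.numel() for p in model.parameters())

teacher_params = param_count("bert-base-uncased")                     # assumed teacher
student_params = param_count("huawei-noah/TinyBERT_General_4L_312D")  # assumed student

print(f"teacher: {teacher_params:,} parameters")
print(f"student: {student_params:,} parameters")
print(f"reduction: {1 - student_params / teacher_params:.0%}")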