Gradient Descent
An optimization algorithm that iteratively adjusts model parameters in the direction of steepest decrease of the loss function (the negative gradient).
Definition
The foundational technique for training neural networks and many other models. It computes gradients of the loss with respect to the parameters and updates them by stepping along the negative gradient, scaled by a learning rate. Variants include batch, stochastic (mini-batch), and adaptive methods such as Adam and RMSProp. Governance includes tracking convergence behavior, setting appropriate learning-rate schedules, detecting exploding or vanishing gradients, and logging training runs for reproducibility and audits.
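As a minimal sketch of the update rule described above (the names gradient_descent, grad_fn, and the toy least-squares data are illustrative, not from the source), the snippet below applies plain batch gradient descent to fit a one-parameter linear model:

```python
import numpy as np

def gradient_descent(grad_fn, params, learning_rate=0.1, num_steps=100):
    """Minimize a loss by repeatedly stepping along the negative gradient."""
    for _ in range(num_steps):
        params = params - learning_rate * grad_fn(params)
    return params

# Toy example: fit a 1-D linear model y = w * x by least squares.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])

def grad(w):
    # Gradient of the mean squared error 0.5 * mean((w*x - y)^2) with respect to w.
    return np.mean((w * x - y) * x)

w_fit = gradient_descent(grad, np.array(0.0), learning_rate=0.05, num_steps=200)
print(f"fitted slope: {w_fit:.3f}")  # converges toward roughly 2.0
```

Stochastic and adaptive variants keep the same update loop but change how the gradient is estimated (on mini-batches) or how the step size is scaled per parameter.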
Real-World Example
A deep-learning research team uses Adam (an adaptive gradient-descent variant) to train a language model. They log learning-rate changes, gradient norms, and loss curves in an experiment-tracking platform, enabling them to reproduce a high-quality checkpoint and quickly diagnose training instabilities.
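A simplified sketch of that kind of training loop, assuming PyTorch and using a plain Python list in place of a real experiment-tracking platform (the toy model, data, and log_history names are hypothetical):

```python
import torch
import torch.nn as nn

# Hypothetical toy model and data standing in for the language model.
model = nn.Linear(16, 1)
data = torch.randn(64, 16)
targets = torch.randn(64, 1)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
log_history = []  # stand-in for an experiment-tracking platform

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(data), targets)
    loss.backward()
    # Total gradient norm: a common signal for spotting exploding gradients.
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    log_history.append({
        "step": step,
        "loss": loss.item(),
        "grad_norm": grad_norm.item(),
        "lr": optimizer.param_groups[0]["lr"],
    })
```

Logging the loss, gradient norm, and learning rate at each step is what makes a run reproducible and auditable after the fact.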