Inference
The stage at which a trained AI model is applied to new data inputs to produce predictions or decisions.
Definition
The runtime stage where models apply learned parameters to unseen data, often under strict latency, throughput, and resource constraints. Inference governance ensures that production models use the correct version, adhere to performance SLAs, log inputs/outputs for monitoring, and enforce input-validation checks to prevent misuse or injection attacks.
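A minimal sketch of what such a governance wrapper might look like in Python is shown below. The `EXPECTED_MODEL_VERSION` constant, the `LATENCY_SLA_MS` budget, and the model's `version`/`predict()` interface are illustrative assumptions, not a standard API:

```python
import logging
import time

logger = logging.getLogger("inference")

EXPECTED_MODEL_VERSION = "2.3.1"  # hypothetical pinned model version
LATENCY_SLA_MS = 100              # hypothetical per-request latency budget

def governed_predict(model, features: dict) -> float:
    # Version check: refuse to serve predictions from an unexpected build.
    version = getattr(model, "version", None)
    if version != EXPECTED_MODEL_VERSION:
        raise RuntimeError(f"model version mismatch: {version!r}")

    # Input validation: reject malformed or out-of-range inputs up front.
    amount = features.get("amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        raise ValueError("invalid or missing 'amount' field")

    start = time.perf_counter()
    prediction = model.predict(features)  # assumed predict() interface
    elapsed_ms = (time.perf_counter() - start) * 1000

    # Log the input/output pair with latency metadata for monitoring.
    logger.info("prediction served",
                extra={"features": features, "prediction": prediction,
                       "latency_ms": round(elapsed_ms, 2)})
    if elapsed_ms > LATENCY_SLA_MS:
        logger.warning("latency SLA exceeded: %.1f ms", elapsed_ms)
    return prediction
```

Each governance concern from the definition maps to one step in the wrapper: version pinning, validation, timed execution against the SLA, and structured logging.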
Real-World Example
A fraud-detection service exposes a REST API for inference: the model is wrapped in a microservice that verifies input schemas, logs every request and response with metadata, and scales horizontally to maintain sub-100 ms response times during peak transaction loads.
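A minimal sketch of such a service, assuming FastAPI with pydantic for schema validation; the `/v1/score` route, the `Transaction` fields, and the `score_transaction` stand-in for the real model are illustrative, not part of any particular product:

```python
import logging

from fastapi import FastAPI
from pydantic import BaseModel, Field

logger = logging.getLogger("fraud-api")
app = FastAPI()

class Transaction(BaseModel):
    # pydantic validates the input schema; malformed payloads are
    # rejected with a 422 before they ever reach the model.
    account_id: str
    amount: float = Field(gt=0)
    merchant: str

class Verdict(BaseModel):
    fraud_score: float
    model_version: str

MODEL_VERSION = "2.3.1"  # hypothetical pinned version served by this replica

def score_transaction(tx: Transaction) -> float:
    # Stand-in for the real model call (e.g., a loaded gradient-boosted tree).
    return 0.02 if tx.amount < 1_000 else 0.75

@app.post("/v1/score", response_model=Verdict)
def score(tx: Transaction) -> Verdict:
    fraud_score = score_transaction(tx)
    # Log every request/response pair with metadata for audit and monitoring.
    logger.info("scored transaction",
                extra={"account_id": tx.account_id, "amount": tx.amount,
                       "fraud_score": fraud_score,
                       "model_version": MODEL_VERSION})
    return Verdict(fraud_score=fraud_score, model_version=MODEL_VERSION)
```

Because each request is stateless, horizontal scaling comes from running multiple replicas of this process (for example, several uvicorn workers behind a load balancer), which is how the service sustains sub-100 ms latencies under peak load.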