Deployment

Inference

Running a trained model on new input to produce an output, such as a prediction or generated text.

Plain-English explanation

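Inference is what happens each time a trained model is used: it takes an input and computes an output. The cost and latency of that step depend on model size, hardware, batching, context length, output length, and serving architecture.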
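To make the dependence on context length and output length concrete, here is a minimal sketch of a per-request estimate. It assumes a simple linear model: the prompt is processed at one fixed speed, output tokens are generated at another, and pricing is per token. The class ServingProfile, the function estimate_request, and every number below are hypothetical placeholders, not measurements of any real system. Model size and hardware are folded into the two throughput figures, and batching and queueing effects are ignored.

```python
from dataclasses import dataclass


@dataclass
class ServingProfile:
    """Hypothetical throughput and pricing figures for one model on one setup."""
    prefill_tokens_per_s: float    # speed of processing the prompt (context)
    decode_tokens_per_s: float     # speed of generating output tokens
    usd_per_million_input: float   # assumed price per million prompt tokens
    usd_per_million_output: float  # assumed price per million output tokens


def estimate_request(profile: ServingProfile, prompt_tokens: int, output_tokens: int):
    """Return (latency_seconds, cost_usd) for one request under a linear model."""
    latency = (
        prompt_tokens / profile.prefill_tokens_per_s
        + output_tokens / profile.decode_tokens_per_s
    )
    cost = (
        prompt_tokens * profile.usd_per_million_input
        + output_tokens * profile.usd_per_million_output
    ) / 1_000_000
    return latency, cost


# Illustrative numbers only.
profile = ServingProfile(
    prefill_tokens_per_s=2000.0,
    decode_tokens_per_s=50.0,
    usd_per_million_input=1.0,
    usd_per_million_output=4.0,
)
latency, cost = estimate_request(profile, prompt_tokens=4000, output_tokens=500)
print(f"latency ~ {latency:.1f} s, cost ~ ${cost:.4f}")
```

With these placeholder figures, generating the output takes longer than reading the prompt even though the prompt has eight times as many tokens. Real systems can behave quite differently, so treat the sketch as a way to frame questions rather than as a predictor.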

Why it matters

Inference matters because it is the step that runs on every request, so it shapes how AI systems are designed, evaluated, priced, and trusted. Knowing the term helps you ask concrete questions about model size, hardware, and serving setup, and avoid vague implementation decisions.

Ask how each inference choice changes quality, cost, speed, or safety.
Look for concrete examples in the workflow you are building.
Document the tradeoff before choosing a tool or architecture.