Deployment
Inference
Running a trained model on new input to produce an output.
Plain-English explanation
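Inference is the stage where a trained model is actually used: it receives an input, such as a prompt, and returns a prediction or generated text. How much each request costs and how long it takes depend on model size, hardware, batching, context length, output length, and serving architecture.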
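To make those factors concrete, here is a rough back-of-envelope sketch. It is an illustration under simplifying assumptions, not a description of how any real serving system bills or schedules work. The names `ServingProfile` and `estimate_request` and every number are hypothetical. Model size and hardware are folded into the two throughput figures, and batching is assumed to share hardware cost evenly without slowing any individual request.

```python
from dataclasses import dataclass


@dataclass
class ServingProfile:
    """Hypothetical throughput figures for one model on one piece of hardware."""
    prefill_tokens_per_s: float    # assumed speed of reading the input context
    decode_tokens_per_s: float     # assumed speed of generating output, per request
    hardware_cost_per_hour: float  # assumed hourly price of the hardware
    batch_size: int                # requests served at the same time


def estimate_request(profile: ServingProfile, context_tokens: int, output_tokens: int):
    """Return (latency in seconds, cost per request) under the simple model above."""
    # Longer context means more time spent reading the input.
    prefill_s = context_tokens / profile.prefill_tokens_per_s
    # Longer output means more time spent generating, one token after another.
    decode_s = output_tokens / profile.decode_tokens_per_s
    latency_s = prefill_s + decode_s
    # Hardware time is split across the batch, so cost per request falls as batch size grows.
    cost = latency_s / 3600 * profile.hardware_cost_per_hour / profile.batch_size
    return latency_s, cost


if __name__ == "__main__":
    profile = ServingProfile(
        prefill_tokens_per_s=2000.0,
        decode_tokens_per_s=50.0,
        hardware_cost_per_hour=3.6,
        batch_size=8,
    )
    latency, cost = estimate_request(profile, context_tokens=1000, output_tokens=100)
    print(f"latency: {latency:.2f} s, cost per request: {cost:.7f}")
```

Doubling `output_tokens` in this sketch adds far more latency than doubling `context_tokens`, because the assumed decode speed is much lower than the assumed prefill speed. Real systems vary, so treat the numbers as placeholders to replace with your own measurements.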
Why it matters
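Inference matters because it is the step users actually experience: its speed and cost shape how AI systems are designed, evaluated, and priced, and its outputs are what people decide whether to trust. Knowing the term helps you ask better questions and avoid vague implementation decisions.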
- Ask how each inference choice, such as model size, hardware, or batching, changes quality, cost, speed, or safety.
- Look for concrete examples in the workflow you are building.
- Document the tradeoff before choosing a tool or architecture.