Deployment

Quantization

Reducing the numerical precision of a model's weights (and sometimes its activations) to save memory and speed up inference.

Plain-English explanation

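Quantization stores a model's numbers in fewer bits, for example 8-bit or 4-bit integers instead of 16-bit or 32-bit floating point. That lets larger models fit on smaller hardware, but the rounding involved may reduce output quality. Test the quantized model on your target task rather than relying only on published benchmarks.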
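To make the rounding concrete, here is a minimal sketch of one common scheme, symmetric per-tensor int8 quantization, in plain Python. The function names and the sample weights are made up for illustration; real toolkits use more elaborate variants (per-channel scales, 4-bit formats, calibration data).

```python
def quantize_int8(weights):
    """Map floats to integer codes in [-127, 127] plus one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only to show the effect of rounding.
weights = [0.5, -1.27, 0.003, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# The worst-case rounding error is about half of one quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Small weights such as 0.003 collapse to zero here, which is one way quality can degrade. Whether that loss matters depends on the task, which is why testing on your own workload is the safer check.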

Why it matters

Quantization matters because it determines which hardware a model can run on, how much inference costs, how fast responses arrive, and how much output quality you give up in exchange. Knowing the term helps you ask better questions and avoid vague implementation decisions.

  • Ask how it changes quality, cost, speed, or safety.
  • Look for concrete examples in the workflow you are building.
  • Document the tradeoff before choosing a tool or architecture.
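One way to start documenting the tradeoff is a back-of-the-envelope memory estimate. The sketch below assumes a hypothetical 7-billion-parameter model and counts only the weights; activations, caches, and the small overhead of storing quantization scales are ignored, so treat the numbers as rough lower bounds.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Hypothetical model size, used only for illustration.
params = 7_000_000_000

for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gb(params, bits):.1f} GB")
```

Pair an estimate like this with a quality measurement on your own task, so the memory saving and the quality cost are recorded side by side.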