19. Small and on-device language models
See how smaller models run on phones, laptops, browsers, and private servers. This chapter covers distillation, quantization-aware choices, local runtimes, privacy tradeoffs, and the cases where a small model is a better fit than a large hosted one.