14. GPUs and distributed training
See what changes when training moves from a laptop to a cluster of GPUs or TPUs. This chapter covers memory limits, batching, mixed precision, parallelism, checkpointing, failure recovery, and the practical reasons training is expensive.