24. Build kernels for neural network workloads
Write and tune GPU code for convolutions, attention, normalization, embedding lookups, and batch processing. This chapter connects core GPU skills to the workloads behind deep learning and large language models.