14. Transformers that scale generation
Transformers combine attention, feed-forward layers, and positional information into a flexible architecture that is trained at large scale. This chapter covers encoder-only, decoder-only, and encoder-decoder designs and explains why transformers became the foundation for modern generative AI.
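
As a rough orientation before the details, the sketch below shows how the three ingredients named above (attention, a feed-forward layer, and positional information) might fit together in one block. It is a minimal illustration under stated assumptions, not a reference implementation: it uses a single attention head, omits layer normalization and training, and relies on small random weights. All function names, the `params` dictionary, and the sizes are hypothetical choices made for this example. The `causal` flag marks the main difference between encoder-style attention (every position sees every other) and decoder-style attention (each position sees only itself and earlier positions).

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def positional_encoding(seq_len, d_model):
    """Sinusoidal position signal: sine on even dimensions, cosine on odd."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

def self_attention(x, wq, wk, wv, causal=False):
    """Single-head scaled dot-product attention; returns output and weights."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = []
    for i, row in enumerate(scores):
        scaled = [s / math.sqrt(d_k) for s in row]
        if causal:
            # Decoder-style mask: position i may not attend to positions j > i.
            scaled = [s if j <= i else float("-inf") for j, s in enumerate(scaled)]
        weights.append(softmax(scaled))
    return matmul(weights, v), weights

def feed_forward(x, w1, w2):
    """Position-wise two-layer network with a ReLU in between."""
    hidden = [[max(0.0, h) for h in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)

def transformer_block(x, params, causal=False):
    """Attention and feed-forward sublayers, each with a residual connection.
    Layer normalization is left out to keep the sketch short."""
    attn_out, weights = self_attention(
        x, params["wq"], params["wk"], params["wv"], causal
    )
    x = add(x, attn_out)
    x = add(x, feed_forward(x, params["w1"], params["w2"]))
    return x, weights

def random_matrix(rows, cols, rng):
    """Small random weights; a stand-in for learned parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

# Hypothetical toy sizes chosen only for illustration.
rng = random.Random(0)
seq_len, d_model, d_ff = 4, 8, 16
params = {
    "wq": random_matrix(d_model, d_model, rng),
    "wk": random_matrix(d_model, d_model, rng),
    "wv": random_matrix(d_model, d_model, rng),
    "w1": random_matrix(d_model, d_ff, rng),
    "w2": random_matrix(d_ff, d_model, rng),
}
tokens = random_matrix(seq_len, d_model, rng)  # stand-in for token embeddings
x = add(tokens, positional_encoding(seq_len, d_model))
out, weights = transformer_block(x, params, causal=True)
```

With `causal=True` the block behaves like one decoder layer; calling it with `causal=False` gives the encoder-style variant. An encoder-decoder design would additionally let decoder positions attend to encoder outputs, which this sketch does not cover.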