10. Transformer blocks inside an LLM
Put attention, feed-forward layers, residual connections, layer normalization, and positional information into one working mental model. This chapter explains decoder-only transformers, the architecture behind most modern LLMs.
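As a rough orientation, the pieces listed above can be wired together in a few lines of NumPy. This is a minimal sketch, not a reference implementation. It assumes a single attention head, a ReLU feed-forward layer, pre-norm placement of layer normalization (without learnable scale or bias), and learned positional embeddings added to the token embeddings. Real models differ on all of these points. Every name here (`init_params`, `embed`, `decoder_block`, the parameter keys) is hypothetical and chosen only for illustration.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # Normalize each position's feature vector to zero mean, unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def causal_self_attention(x, w_q, w_k, w_v, w_o):
    # Single-head attention; position i may only attend to positions <= i.
    seq_len = x.shape[0]
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    return softmax(scores) @ v @ w_o


def feed_forward(x, w1, w2):
    # Position-wise two-layer MLP with ReLU (an assumption; GELU is also common).
    return np.maximum(0.0, x @ w1) @ w2


def decoder_block(x, p):
    # Pre-norm residual layout: x + sublayer(layer_norm(x)), applied twice.
    x = x + causal_self_attention(
        layer_norm(x), p["w_q"], p["w_k"], p["w_v"], p["w_o"]
    )
    x = x + feed_forward(layer_norm(x), p["w1"], p["w2"])
    return x


def init_params(vocab_size, max_len, d_model, d_ff, rng):
    # Small random weights; the shapes are what matter for this sketch.
    def w(*shape):
        return rng.normal(0.0, 0.1, size=shape)

    return {
        "tok_emb": w(vocab_size, d_model),
        "pos_emb": w(max_len, d_model),
        "w_q": w(d_model, d_model),
        "w_k": w(d_model, d_model),
        "w_v": w(d_model, d_model),
        "w_o": w(d_model, d_model),
        "w1": w(d_model, d_ff),
        "w2": w(d_ff, d_model),
    }


def embed(tokens, p):
    # Token embedding plus a learned embedding for each position.
    return p["tok_emb"][tokens] + p["pos_emb"][: len(tokens)]
```

A full decoder-only model would stack many such blocks and finish with a projection back to vocabulary logits. The causal mask is what makes the block usable for next-token prediction: the output at a given position depends only on that position and the ones before it.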