25. Build transformers from their parts
Assemble a transformer from its parts: token embeddings plus positional information form the input, and each transformer block combines self-attention, a feedforward layer, residual paths, and normalization. You will connect each part to the behavior of modern language and vision models.
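As a rough preview of how these parts fit together, here is a minimal sketch of one transformer block in NumPy. It makes several simplifying assumptions that are not prescribed by this outline. It uses a single attention head, fixed sinusoidal positional encodings, a ReLU feedforward layer, and a pre-norm residual layout (the original Transformer placed normalization after each residual addition). It also omits the learned LayerNorm scale and shift and uses random, untrained weights. All function and parameter names are hypothetical and chosen for illustration.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each position's feature vector to zero mean and unit variance.
    # Learned scale/shift parameters are omitted in this sketch.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def sinusoidal_positions(seq_len, d_model):
    # Fixed sine/cosine positional encodings; assumes d_model is even.
    pos = np.arange(seq_len)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv, Wo, causal=False):
    # Single-head scaled dot-product attention over the sequence.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if causal:
        # Block each position from attending to later positions.
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    return softmax(scores) @ v @ Wo

def feedforward(x, W1, b1, W2, b2):
    # Position-wise two-layer MLP with a ReLU nonlinearity.
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2

def transformer_block(x, p, causal=False):
    # Pre-norm layout: normalize, transform, then add back via the residual path.
    x = x + self_attention(layer_norm(x), p["Wq"], p["Wk"], p["Wv"], p["Wo"], causal)
    x = x + feedforward(layer_norm(x), p["W1"], p["b1"], p["W2"], p["b2"])
    return x

def init_params(d_model, d_ff, rng, scale=0.02):
    # Small random weights; a real model would learn these by training.
    return {
        "Wq": rng.normal(0, scale, (d_model, d_model)),
        "Wk": rng.normal(0, scale, (d_model, d_model)),
        "Wv": rng.normal(0, scale, (d_model, d_model)),
        "Wo": rng.normal(0, scale, (d_model, d_model)),
        "W1": rng.normal(0, scale, (d_model, d_ff)),
        "b1": np.zeros(d_ff),
        "W2": rng.normal(0, scale, (d_ff, d_model)),
        "b2": np.zeros(d_model),
    }

vocab_size, d_model, d_ff, seq_len = 50, 16, 64, 8
rng = np.random.default_rng(0)

# Token embeddings plus positional information form the block's input.
embedding = rng.normal(0, 0.02, (vocab_size, d_model))
tokens = rng.integers(0, vocab_size, size=seq_len)
x = embedding[tokens] + sinusoidal_positions(seq_len, d_model)

params = init_params(d_model, d_ff, rng)
out = transformer_block(x, params, causal=True)
print(out.shape)  # (8, 16)
```

The block preserves the input shape `(seq_len, d_model)`, which is what lets a model stack many such blocks. Setting `causal=True` gives the masking used by decoder-style language models. Vision transformers typically apply the same block without a causal mask to sequences of image-patch embeddings.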