21. Align language models with examples and preferences
Shape language model behavior with instruction data, preference data, rejection sampling, supervised fine-tuning, DPO-style methods, and RLHF concepts. You will focus on what each method needs, what it changes, and how to test the result safely.
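To make "what each method needs, what it changes" concrete for the DPO case, here is a minimal sketch of the standard DPO objective for a single preference pair. It assumes you already have the summed log-probability of each full response (chosen and rejected) under both the model being trained and a frozen reference model. The function name `dpo_loss` and the default `beta=0.1` are illustrative choices, not taken from this module.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (prompt, chosen, rejected) preference pair.

    Each *_logp argument is the summed log-probability of a whole response
    given the prompt, from either the trained policy or the frozen reference.
    `beta` (hypothetical default) scales how strongly the policy may drift
    from the reference model.
    """
    # How far the policy has moved each response away from the reference.
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp

    # Implicit reward margin: positive when the policy favors "chosen"
    # over "rejected" more than the reference model does.
    margin = beta * (chosen_shift - rejected_shift)

    # Loss is -log(sigmoid(margin)) = log(1 + exp(-margin)),
    # computed in a form that avoids overflow for large |margin|.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Policy identical to reference: margin is 0, so loss equals log(2).
    print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
    # Policy shifted toward the chosen response: loss falls below log(2).
    print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
```

The example shows what DPO needs (preference pairs plus a reference model) and what it changes (the relative likelihood of chosen versus rejected responses), without training a separate reward model as RLHF pipelines typically do. In practice you would average this loss over a batch and compute the log-probabilities with your model framework.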