12. Distill while the student acts
Train the student on states it actually visits, not only states from the teacher’s old dataset. This chapter covers on-policy rollouts, teacher queries during student play, DAgger-style aggregation, and safety limits during data collection.
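As a rough orientation for what follows, here is a minimal sketch of a DAgger-style loop on a toy one-dimensional task. Everything in it is an illustrative assumption rather than the chapter's own setup: the environment, the names `teacher_action`, `TabularStudent`, `collect_rollout`, and `dagger`, and the constants `TARGET`, `SAFE_MIN`, `SAFE_MAX`, and `MAX_STEPS` are all hypothetical. The student picks the actions, the teacher labels every state the student visits, the labels are aggregated across iterations, and a simple bound check stands in for a safety limit during collection.

```python
import random
from collections import Counter, defaultdict

# All constants below are hypothetical values for this toy example.
TARGET = 5                   # position the teacher steers toward
SAFE_MIN, SAFE_MAX = 0, 10   # assumed safe region for data collection
MAX_STEPS = 20               # assumed per-episode step cap


def teacher_action(state):
    """Toy expert: step toward TARGET, stay put once there."""
    if state < TARGET:
        return 1
    if state > TARGET:
        return -1
    return 0


class TabularStudent:
    """Student that predicts the most common teacher label seen per state."""

    def __init__(self):
        self.table = {}

    def fit(self, dataset):
        votes = defaultdict(Counter)
        for state, action in dataset:
            votes[state][action] += 1
        self.table = {s: c.most_common(1)[0][0] for s, c in votes.items()}

    def act(self, state, rng):
        # States with no label yet get a random action.
        return self.table.get(state, rng.choice([-1, 0, 1]))


def collect_rollout(student, start, rng):
    """Let the student act; record the teacher's label at each visited state."""
    labeled = []
    state = start
    for _ in range(MAX_STEPS):
        labeled.append((state, teacher_action(state)))  # teacher query
        next_state = state + student.act(state, rng)    # student's own move
        if not SAFE_MIN <= next_state <= SAFE_MAX:
            break  # safety limit: end the episode before leaving the safe region
        state = next_state
    return labeled


def dagger(iterations=5, rollouts_per_iter=10, seed=0):
    """Alternate on-policy collection with refitting on the aggregated data."""
    rng = random.Random(seed)
    dataset = []
    student = TabularStudent()
    for _ in range(iterations):
        for _ in range(rollouts_per_iter):
            start = rng.randint(SAFE_MIN, SAFE_MAX)
            dataset.extend(collect_rollout(student, start, rng))
        student.fit(dataset)  # retrain on everything gathered so far
    return student, dataset


if __name__ == "__main__":
    student, dataset = dagger()
    print(len(dataset), "labeled states;", len(student.table), "distinct states")
```

The point of the sketch is the data flow, not the learner: actions come from the student, labels come from the teacher, and the dataset only grows. A real student would be a trained model rather than a lookup table, and a real safety limit would depend on the environment.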