37. Train agents with reinforcement learning
Train agents that learn by taking actions in states, receiving rewards, and improving their policies and value functions. This chapter covers Q-learning, policy gradients, actor-critic methods, exploration, simulation, and why reinforcement learning is hard to validate.