Run one step of the loop - On Policy Distillation | Zoonk
Run one step of the loop
Walk through one complete interaction step: the agent sees the current observation, chooses an action, and the environment returns a new observation, a reward, and a done signal that says whether the episode has ended. Keep the timing straight: the reward and next observation are produced only after the action is taken, so they must not feed into choosing that action.
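One step of the loop can be sketched as follows. This is a minimal illustration, not a reference implementation: `CounterEnv`, `StepResult`, `policy`, and `run_one_step` are hypothetical names invented for this example, and the toy environment simply counts up to a target. The point is the ordering: the action is chosen from the current observation alone, and the reward, next observation, and done flag exist only after `env.step` returns.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    observation: int   # what the agent sees next
    reward: float      # feedback for the action just taken
    done: bool         # True once the episode has ended


class CounterEnv:
    """Hypothetical toy environment: a counter that ends at a target value."""

    def __init__(self, target: int = 3) -> None:
        self.target = target
        self.state = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int) -> StepResult:
        self.state += action
        done = self.state >= self.target
        reward = 1.0 if done else 0.0
        return StepResult(self.state, reward, done)


def policy(observation: int) -> int:
    # The policy receives only the current observation.
    # Assumed fixed behavior for illustration: always increment by 1.
    return 1


def run_one_step(env: CounterEnv, observation: int):
    # 1. Choose the action from the current observation only.
    action = policy(observation)
    # 2. The environment responds; reward and next observation exist from here on.
    result = env.step(action)
    # 3. Record the transition in time order: (o_t, a_t, r_t, o_{t+1}, done).
    return (observation, action, result.reward, result.observation, result.done)


env = CounterEnv(target=3)
obs = env.reset()
transition = run_one_step(env, obs)
print(transition)  # (0, 1, 0.0, 1, False)
```

To continue the loop, the next call would pass `transition[3]` as the new current observation and stop once the done flag is `True`.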