Draw the line between the decision maker and the world it acts in. Identify what the agent controls, what the environment controls, and why this boundary matters before any code or neural network appears.
Use what you learned in the previous lesson to solve real-world problems.
Decide what information the agent actually receives at a decision point. Separate observations from hidden facts about the world so you do not accidentally give the agent knowledge it would not have.
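As a minimal sketch of this separation, the snippet below filters a full environment state down to what the agent may see. The card-game setting, the key names, and the `observe` helper are all hypothetical and chosen only for illustration.

```python
# Hypothetical card game: the environment holds the full state,
# but one field is a hidden fact the agent must not receive.
HIDDEN_KEYS = {"dealer_hidden_card"}

def observe(state):
    """Return only the parts of the state the agent could know right now."""
    return {key: value for key, value in state.items() if key not in HIDDEN_KEYS}

state = {
    "agent_hand": 15,
    "dealer_visible_card": 10,
    "dealer_hidden_card": 6,  # known to the environment only
}

observation = observe(state)
print(observation)
```

Building the observation in one place makes it harder to leak hidden facts into the agent by accident.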
Check what you understood with a short quiz.
Match each decision to the set of actions the environment accepts. Recognize discrete choices like left/right, continuous controls like steering angle, and invalid actions that should not be offered to the agent.
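The three cases can be sketched in a few lines. The corridor layout, the 30-degree steering limit, and both function names are assumptions made for this example, not part of any particular library.

```python
GRID_WIDTH = 4  # hypothetical corridor with cells 0..3

def valid_actions(position):
    """Discrete case: offer only the moves that stay inside the corridor."""
    actions = []
    if position > 0:
        actions.append("left")
    if position < GRID_WIDTH - 1:
        actions.append("right")
    return actions

def clip_steering(angle_deg, limit_deg=30.0):
    """Continuous case: keep a steering angle inside the accepted range."""
    return max(-limit_deg, min(limit_deg, angle_deg))

print(valid_actions(0))     # at the left wall, "left" is not offered
print(clip_steering(45.0))  # a request beyond the limit is clipped
```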
Read rewards as numeric feedback from the environment after an action. Distinguish the immediate reward for one step from broader goals or performance measures, which are covered in later chapters.
Mark where an episode begins, what reset provides, and why an episode can end. Distinguish natural termination, such as success or failure, from an outside cutoff such as a time limit.
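One way to keep the two kinds of ending apart is to report them as separate flags. This is only a sketch: the goal cell, the step limit, and the names `reset` and `episode_end` are assumptions for this example.

```python
GOAL = 3       # hypothetical goal cell
MAX_STEPS = 5  # hypothetical time limit imposed from outside the task

def reset():
    """Start a new episode and return the first observation."""
    return 0

def episode_end(position, steps_taken):
    """Return (terminated, truncated) for the current situation."""
    terminated = position == GOAL  # natural ending: success
    truncated = (not terminated) and steps_taken >= MAX_STEPS  # outside cutoff
    return terminated, truncated

print(episode_end(3, 2))  # goal reached before the limit
print(episode_end(1, 5))  # time ran out away from the goal
```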
Walk through one complete interaction step: the agent sees an observation, chooses an action, and the environment returns a new observation, a reward, and an ending signal. Keep the timing straight so that the reward and next observation produced by a step are never used to choose that same step's action.
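A single step might look like the sketch below. `CorridorEnv` is a hypothetical toy environment written for this example; its `reset` and `step` methods imitate a common convention but are not taken from a specific library.

```python
class CorridorEnv:
    """Hypothetical 1-D corridor: start at cell 0, goal at cell 3."""

    GOAL = 3

    def reset(self):
        self.position = 0
        return self.position  # first observation

    def step(self, action):
        move = 1 if action == "right" else -1
        self.position = max(0, self.position + move)
        done = self.position == self.GOAL
        reward = 1.0 if done else 0.0
        return self.position, reward, done

env = CorridorEnv()
observation = env.reset()

# The action is chosen from the current observation only.
action = "right"

# The reward and next observation exist only after the environment has stepped.
next_observation, reward, done = env.step(action)
print(observation, action, reward, next_observation, done)
```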
Locate the policy as the part of the agent that maps what it sees to what it does. Separate the policy from the environment, the reward rule, and any later training method.
Fill a small step table by hand with observation, action, reward, next observation, and done status. Use the table to trace a full episode without yet using trajectories, returns, or probability calculations.
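The same table can be produced by a short script, which is useful for checking hand-filled rows. The corridor rules, the fixed `policy`, and the `env_step` helper are hypothetical and chosen to keep the episode three steps long.

```python
GOAL = 3  # hypothetical goal cell in a corridor starting at cell 0

def policy(observation):
    """Fixed illustrative policy: always move right."""
    return "right"

def env_step(position, action):
    """Environment rule: move, then report reward and done status."""
    next_position = max(0, position + (1 if action == "right" else -1))
    done = next_position == GOAL
    reward = 1.0 if done else 0.0
    return next_position, reward, done

rows = []
observation, done = 0, False
while not done:
    action = policy(observation)
    next_observation, reward, done = env_step(observation, action)
    rows.append((observation, action, reward, next_observation, done))
    observation = next_observation

print("obs  action  reward  next_obs  done")
for row in rows:
    print("{:<4} {:<7} {:<7} {:<9} {}".format(*row))
```

Each row records exactly one step, so the next observation in one row is the observation in the row below it.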
Compare a teacher agent and a student agent using the same observation and action interface. Reason about why distillation only makes sense when the student can see compatible inputs and produce compatible actions.
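A compatibility check can be sketched by describing each agent's interface as data. The strict rule used here, identical observation fields and identical action sets, is an assumption for illustration; a real setup may accept looser matches.

```python
def compatible(teacher, student):
    """Strict illustrative rule: same observation fields and same actions."""
    return (
        teacher["observation_keys"] == student["observation_keys"]
        and teacher["actions"] == student["actions"]
    )

teacher = {
    "observation_keys": {"position", "velocity"},
    "actions": {"left", "right"},
}
student_ok = {
    "observation_keys": {"position", "velocity"},
    "actions": {"left", "right"},
}
student_blind = {
    "observation_keys": {"position"},  # cannot see velocity
    "actions": {"left", "right"},
}

print(compatible(teacher, student_ok))
print(compatible(teacher, student_blind))
```

If the student cannot see an input the teacher relied on, the teacher's choices may look arbitrary from the student's side, which is why the check comes before any distillation.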
Review this chapter with practice based on your mistakes.