Agents that choose actions

Separate the chooser from the world it acts in. Identify the policy as the agent’s action-selecting rule, and keep it distinct from rewards, observations, and environment dynamics.
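A minimal sketch of that separation, assuming a toy one-dimensional world. The names `LineWorld` and `always_right_policy` are hypothetical and not taken from any library. The policy only maps an observation to an action, while the world owns the dynamics and the reward rule.

```python
class LineWorld:
    """The world: owns the state, the dynamics, and the reward rule."""

    def __init__(self, goal=3):
        self.goal = goal
        self.position = 0  # environment state

    def observe(self):
        # In this toy world the observation is simply the position.
        return self.position

    def apply(self, action):
        # Dynamics: the action (-1 or +1) shifts the position.
        self.position += action
        # Reward rule: 1.0 only when the goal is reached.
        reward = 1.0 if self.position == self.goal else 0.0
        return self.position, reward


def always_right_policy(observation):
    """The policy: the agent's rule for choosing an action from an observation."""
    return 1


world = LineWorld()
action = always_right_policy(world.observe())
next_observation, reward = world.apply(action)
```

The policy never touches `world.position` or the reward rule directly; swapping in a different policy would leave `LineWorld` unchanged.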

Read what the agent can see and do

Inspect an environment interface to identify what the agent receives and what it is allowed to send back. Distinguish observation from hidden environment state, action from outcome, and valid action choices from impossible moves.
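One way to picture these distinctions is a small guessing game, sketched here under the assumption of a hypothetical `GuessEnv` class (not part of any real library). The secret number is hidden state, the hint is the observation, and guesses outside `valid_actions` are rejected as impossible moves.

```python
class GuessEnv:
    """Hypothetical environment with a hidden number the agent never sees directly."""

    # The only actions the agent is allowed to send.
    valid_actions = tuple(range(10))

    def __init__(self, secret=7):
        self._secret = secret  # hidden environment state

    def step(self, action):
        # Reject impossible moves instead of silently accepting them.
        if action not in self.valid_actions:
            raise ValueError(f"invalid action: {action!r}")
        # The observation is a hint about the state, not the state itself.
        if action < self._secret:
            return "too low"
        if action > self._secret:
            return "too high"
        return "correct"


env = GuessEnv()
first_hint = env.step(3)   # the action is the guess; the outcome is the hint
second_hint = env.step(7)
```

The agent chooses the action (a guess), but the outcome (the hint) is decided by the environment from state the agent cannot read.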

Practice

Apply the previous explanations in a guided problem.

Start, step, and stop an episode

Use reset to get the first observation, then parse each step’s reply as next observation, reward, terminated flag, truncated flag, and an info dictionary, following the Gymnasium convention (older OpenAI Gym versions return a single done flag in place of the two ending flags). Treat reward as immediate feedback, and treat an episode ending, whether by termination or truncation, as the signal to stop acting and reset.
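The loop can be sketched without installing anything, assuming a hypothetical `CorridorEnv` that imitates the Gymnasium signatures: `reset()` returns `(observation, info)` and `step(action)` returns `(observation, reward, terminated, truncated, info)`. The class itself is illustrative only.

```python
class CorridorEnv:
    """Hypothetical stand-in that imitates Gymnasium's reset/step signatures."""

    def __init__(self, length=3, max_steps=10):
        self.length = length
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self):
        self.position = 0
        self.steps = 0
        return self.position, {}  # (observation, info)

    def step(self, action):
        # action: 0 = stay, 1 = move right
        self.position += action
        self.steps += 1
        terminated = self.position >= self.length  # goal reached
        truncated = (not terminated) and self.steps >= self.max_steps  # time limit hit
        reward = 1.0 if terminated else 0.0  # immediate feedback for this step
        return self.position, reward, terminated, truncated, {}


env = CorridorEnv()
observation, info = env.reset()
total_steps = 0
while True:
    action = 1  # a fixed rule standing in for a policy
    observation, reward, terminated, truncated, info = env.step(action)
    total_steps += 1
    if terminated or truncated:
        break  # stop acting; call reset() before starting a new episode
```

Either flag ends the episode, but they mean different things: `terminated` says the task itself finished, while `truncated` says an outside limit such as a step cap cut it short.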

Fill in a decision-loop trace

Work through a tiny environment row by row with columns for current observation, chosen action, reward, next observation, and done signal. Practice updating the loop correctly without adding returns, neural networks, or training code.
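A trace like this can also be generated mechanically. The sketch below assumes a hypothetical two-step corridor whose `step` function returns `(next_observation, reward, done)`; it records one row per step using the same five columns.

```python
def step(position, action, goal=2):
    """Hypothetical two-step corridor: returns (next_observation, reward, done)."""
    next_position = position + action
    done = next_position == goal
    reward = 1.0 if done else 0.0
    return next_position, reward, done


rows = []
observation, done = 0, False
while not done:
    action = 1  # fixed choice: always move right
    next_observation, reward, done = step(observation, action)
    # One trace row: current observation, action, reward, next observation, done
    rows.append((observation, action, reward, next_observation, done))
    # The next observation becomes the current observation for the next row.
    observation = next_observation

for row in rows:
    print(row)
```

The line that copies `next_observation` into `observation` is the update most easily missed when filling in a trace by hand: each row's current observation must equal the previous row's next observation.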

Practice

Apply the previous explanations in a guided problem.

Quiz

Check your understanding with a short quiz.

Review

Review this chapter with practice based on your mistakes.
