Beat 1 · Concrete
The output becomes the next input
One token at a time — each generated token is fed back in to predict the next.
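A minimal sketch of that loop. It assumes a hypothetical bigram table (`BIGRAMS`) in place of a real model, and greedy selection in place of sampling. The one point it makes is the feedback: each chosen token is appended to `context` and conditions the next prediction.

```python
# Hypothetical toy "language model": a bigram table mapping the last token
# to next-token probabilities. A real model conditions on the whole context;
# this stand-in only looks at the final token.
BIGRAMS = {
    "<s>": {"the": 0.9, "a": 0.1},
    "the": {"loop": 0.7, "model": 0.3},
    "loop": {"closes": 0.6, "returns": 0.4},
    "closes": {"</s>": 1.0},
    "a": {"model": 1.0},
    "model": {"</s>": 1.0},
    "returns": {"</s>": 1.0},
}

def next_token(context: list[str]) -> str:
    """Greedy choice: the most probable next token given the context."""
    probs = BIGRAMS[context[-1]]
    return max(probs, key=probs.get)

def generate(max_tokens: int = 10) -> list[str]:
    context = ["<s>"]
    for _ in range(max_tokens):
        token = next_token(context)  # predict from everything generated so far
        if token == "</s>":
            break
        context.append(token)        # the output becomes the next input
    return context[1:]

print(" ".join(generate()))  # the loop closes
```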
Beat 2 · Abstract
The agent loop is the same loop
Propose an action, the world responds, observe, repeat — and reward tunes the policy.
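The same loop with a reward attached, as a sketch. The environment (`world`), its three actions, and the learning rate are hypothetical. The policy is a softmax over logits, updated with REINFORCE using a running-average reward as a baseline. Nothing here is specific to language models.

```python
import math
import random

ACTIONS = ["left", "stay", "right"]

def world(action: str) -> float:
    """Hypothetical environment: only 'right' moves toward the goal and earns reward."""
    return 1.0 if action == "right" else 0.0

def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train(steps: int = 500, lr: float = 0.5, seed: int = 0) -> list[float]:
    rng = random.Random(seed)
    logits = [0.0] * len(ACTIONS)
    baseline = 0.0  # running average reward; reduces update variance
    for _ in range(steps):
        probs = softmax(logits)
        i = rng.choices(range(len(ACTIONS)), weights=probs)[0]  # propose an action
        reward = world(ACTIONS[i])                               # world responds; observe
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # REINFORCE: d log softmax_i / d logit_j = 1[j == i] - p_j
        for j in range(len(ACTIONS)):
            logits[j] += lr * advantage * ((1.0 if j == i else 0.0) - probs[j])
    return softmax(logits)

probs = train()
print(dict(zip(ACTIONS, (round(p, 3) for p in probs))))
```

Run long enough, nearly all probability mass moves to `"right"`, the only rewarded action.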
Beat 3 · Interactive
Close the loop yourself
Step the loop or nudge the reward; watch behavior settle toward the goal.
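The interactive beat can be approximated by a proportional feedback loop. This is an assumption about what the stage does, not its actual code: `gain`, `disturb_at`, and `disturbance` are hypothetical knobs. Each step observes the error and corrects a fixed fraction of it, so an injected error decays geometrically.

```python
def run(goal: float, steps: int = 40, gain: float = 0.3,
        disturb_at: int = 20, disturbance: float = 5.0) -> list[float]:
    """Simulate a hypothetical proportional feedback loop toward `goal`."""
    position = 0.0
    trace = []
    for t in range(steps):
        if t == disturb_at:
            position += disturbance  # inject error
        error = goal - position      # observe how far off we are
        position += gain * error     # act: correct a fraction of the error
        trace.append(position)
    return trace

trace = run(goal=10.0)
print(round(trace[19], 3), round(trace[20], 3), round(trace[-1], 3))
```

With `0 < gain < 2` the error shrinks by a factor of `|1 - gain|` per step. Above 1 the position overshoots and oscillates before settling; at exactly 2 it oscillates forever, and above 2 the loop diverges.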
The threads that tie back
2022 · ChatGPT
RLHF
Human preference judgments become a reward signal, and the policy is tuned by the same loop it runs in: generate, get scored, update.
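The "preference becomes reward" step can be sketched with the Bradley-Terry model that RLHF reward models are commonly trained under: the probability that response `w` beats `l` is `sigmoid(r_w - r_l)`. Here each response gets one scalar score, fitted by gradient ascent on the hypothetical comparisons in `PREFERENCES`. A real reward model is a neural network scoring (prompt, response) pairs, and the policy is then fine-tuned against it with RL (PPO in the InstructGPT recipe).

```python
import math

# Hypothetical human comparisons: (preferred, rejected) response ids.
PREFERENCES = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "b"), ("c", "b"), ("b", "a")]

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def fit_rewards(prefs: list[tuple[str, str]], steps: int = 2000,
                lr: float = 0.1) -> dict[str, float]:
    """Fit one scalar reward per response under Bradley-Terry,
    P(winner preferred over loser) = sigmoid(r_winner - r_loser),
    by gradient ascent on the log-likelihood of the comparisons."""
    ids = sorted({x for pair in prefs for x in pair})
    r = {i: 0.0 for i in ids}
    for _ in range(steps):
        grad = {i: 0.0 for i in ids}
        for winner, loser in prefs:
            # d/d(r_winner) of log sigmoid(r_winner - r_loser)
            g = 1.0 - sigmoid(r[winner] - r[loser])
            grad[winner] += g
            grad[loser] -= g
        for i in ids:
            r[i] += lr * grad[i]
    return r

rewards = fit_rewards(PREFERENCES)
print({k: round(v, 2) for k, v in rewards.items()})
```

Only differences between scores matter; each update adds to one score exactly what it subtracts from another, so the scores stay centered at zero.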
Generation
Diffusion
Image models denoise from pure noise back toward a sample — generation as iterated correction.
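A one-dimensional sketch of denoising as iterated correction. It assumes hypothetical data drawn from a Gaussian `N(MU, S^2)`, for which the ideal denoiser has a closed form; a real image model learns that function instead. Sampling starts from wide noise and takes Euler steps of the probability-flow ODE, each one moving `x` toward the current denoised estimate as the noise level `sigma` shrinks.

```python
import random
import statistics

MU, S = 3.0, 0.5  # hypothetical 1-D "data" distribution N(MU, S^2)

def denoise(x: float, sigma: float) -> float:
    """Best estimate of the clean value x0 given x = x0 + sigma * noise.
    Exact for Gaussian data; a real diffusion model learns this with a network."""
    return MU + S**2 / (S**2 + sigma**2) * (x - MU)

def sample(rng: random.Random, steps: int = 50, sigma_max: float = 80.0,
           sigma_min: float = 0.01) -> float:
    # Noise levels shrinking geometrically from sigma_max to sigma_min, then 0.
    ratio = sigma_min / sigma_max
    sigmas = [sigma_max * ratio ** (i / (steps - 1)) for i in range(steps)] + [0.0]
    x = rng.gauss(0.0, sigma_max)  # start from (almost) pure noise
    for s_cur, s_next in zip(sigmas, sigmas[1:]):
        noise_est = (x - denoise(x, s_cur)) / s_cur  # estimated noise per unit sigma
        x += (s_next - s_cur) * noise_est            # Euler step: remove a slice of it
    return x

rng = random.Random(0)
samples = [sample(rng) for _ in range(2000)]
print(round(statistics.mean(samples), 2), round(statistics.stdev(samples), 2))
```

The samples should come out with mean near `MU` and spread near `S`, even though each one starts at a noise level 160 times wider than the data.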
Agents
Tool use
The model proposes an action, a tool or the world responds, it observes, and acts again.
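A sketch of the tool-use loop. `toy_model` is a hypothetical scripted stand-in for the model and `TOOLS` a hypothetical one-entry registry; no real agent framework or API is implied. The tool's results accumulate in `history`, which is what the model sees on its next turn.

```python
from typing import Callable

# Hypothetical tool registry: name -> function taking a string argument.
TOOLS: dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split("+"))),
}

def toy_model(history: list[str]) -> tuple[str, str]:
    """Hypothetical stand-in for a model: returns (action, argument).
    It calls the 'add' tool once, then answers with what it observed."""
    observations = [h for h in history if h.startswith("observation:")]
    if not observations:
        return ("add", "17+25")
    return ("final", observations[-1].split(":", 1)[1].strip())

def agent_loop(task: str, max_turns: int = 5) -> str:
    history = [f"task: {task}"]
    for _ in range(max_turns):
        action, arg = toy_model(history)          # the model proposes an action
        if action == "final":
            return arg
        result = TOOLS[action](arg)               # the tool responds
        history.append(f"observation: {result}")  # observed on the next turn
    raise RuntimeError("no final answer within max_turns")

print(agent_loop("What is 17 + 25?"))  # 42
```

`max_turns` bounds the loop, since a real model is not guaranteed to stop on its own.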
Era 01 · 1948
The loop returns
Norbert Wiener, Cybernetics: "control and communication in the animal and the machine." The arc closes where it opened.