Inner Monologue: Embodied Reasoning through Planning with Language Models
We propose Inner Monologue, which incorporates language models into embodied agents to enable reasoning through planning and feedback.
¶
Think out loud, then act
¶
The agent generates an internal narrative: what it intends, what it observes, what went wrong. Language becomes the scratchpad for closed-loop control, not just user-facing chat.
¶
Feedback closes the loop
¶
Success detectors, scene descriptions, and human hints feed back into the monologue, so the model can replan without a full reset. This is closer to how people recover from a failed grasp than to one-shot command execution.
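¶
The loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the names run_inner_monologue, plan_next, execute, and describe_scene are hypothetical, the "Human:/Robot:/Success:/Scene:" line format is an assumption, and the language model is replaced by a rule-based stub so the example runs on its own.
¶
```python
from typing import Callable, List, Tuple


def run_inner_monologue(
    instruction: str,
    plan_next: Callable[[str], str],
    execute: Callable[[str], bool],
    describe_scene: Callable[[], str],
    max_steps: int = 10,
) -> List[str]:
    """Closed-loop planning: every outcome is appended to the text the planner sees."""
    monologue = [f"Human: {instruction}"]
    for _ in range(max_steps):
        # The planner (an LLM in practice, a stub here) reads the whole monologue.
        action = plan_next("\n".join(monologue))
        monologue.append(f"Robot: {action}")
        if action == "done":
            break
        # Feedback is written back into the monologue instead of resetting the plan.
        succeeded = execute(action)
        monologue.append(f"Success: {succeeded}")
        monologue.append(f"Scene: {describe_scene()}")
    return monologue


def make_toy_world() -> Tuple[Callable[[str], bool], Callable[[], str]]:
    """Hypothetical environment in which the first grasp attempt always fails."""
    state = {"attempts": 0, "held": False}

    def execute(action: str) -> bool:
        if action != "pick up the apple":
            return False
        state["attempts"] += 1
        state["held"] = state["attempts"] >= 2
        return state["held"]

    def describe_scene() -> str:
        return "apple in gripper" if state["held"] else "apple on table"

    return execute, describe_scene


def toy_planner(prompt: str) -> str:
    """Stand-in for the language model: retry until the scene shows the apple is held."""
    last_line = prompt.splitlines()[-1]
    if last_line == "Scene: apple in gripper":
        return "done"
    return "pick up the apple"


execute, describe_scene = make_toy_world()
log = run_inner_monologue("pick up the apple", toy_planner, execute, describe_scene)
print("\n".join(log))
```
¶
In this toy run the first grasp fails, the failure and the scene description land in the monologue, and the planner retries instead of starting over. Swapping toy_planner for a real model call would leave the loop itself unchanged.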
¶
With SayCan
¶
Read this alongside SayCan, which grounds skill selection in affordances; Inner Monologue adds temporal reasoning and recovery. Both treat the LLM as a planner, not a remote control.
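¶
For contrast, affordance-grounded skill selection in the style of SayCan can be sketched as weighting the language model's preference for each skill by an estimate of whether that skill can succeed in the current state. The function name select_skill and all the numbers below are illustrative assumptions; in a real system the two score tables would come from a language model and a learned value function.
¶
```python
from typing import Dict


def select_skill(llm_scores: Dict[str, float], affordances: Dict[str, float]) -> str:
    """Pick the skill with the highest product of LLM score and affordance score."""
    combined = {
        # A skill with no affordance estimate is treated as infeasible (score 0).
        skill: llm_scores[skill] * affordances.get(skill, 0.0)
        for skill in llm_scores
    }
    return max(combined, key=lambda skill: combined[skill])


# Illustrative numbers only: the LLM prefers picking up the sponge,
# but the robot is not near it, so that skill has a low affordance.
llm_scores = {
    "pick up the sponge": 0.6,
    "go to the table": 0.3,
    "pick up the apple": 0.1,
}
affordances = {
    "pick up the sponge": 0.1,
    "go to the table": 0.9,
    "pick up the apple": 0.2,
}
chosen = select_skill(llm_scores, affordances)
print(chosen)
```
¶
Here the feasible "go to the table" wins over the skill the language model alone would have preferred. Inner Monologue is complementary: it determines what text the planner conditions on after each step, not how a single step is scored.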