Papyros

Inner Monologue: Embodied Reasoning through Planning with Language Models

We propose Inner Monologue, which incorporates language models into embodied agents to enable reasoning through planning and feedback.

Think out loud, then act

The agent generates an internal narrative: what it intends, what it observes, what went wrong. Language becomes the scratchpad for closed-loop control, not just user-facing chat.

Feedback closes the loop

Success detectors, scene descriptions, and human hints feed back into the monologue as text. The model replans from the current state without a full reset. This is closer to how people recover from a failed grasp than to one-shot command execution.
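The loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `Monologue`, `run_episode`, and `demo` are hypothetical names, the planner is a rule-based stand-in for a language model, and the toy world simply lets the first grasp slip so the retry is visible in the transcript.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Monologue:
    """Running text transcript that the planner reads on every step."""
    lines: List[str] = field(default_factory=list)

    def add(self, speaker: str, text: str) -> None:
        self.lines.append(f"{speaker}: {text}")

    def prompt(self) -> str:
        return "\n".join(self.lines)


def run_episode(
    goal: str,
    planner: Callable[[str], str],          # stand-in for an LLM call
    execute: Callable[[str], None],         # runs one skill on the robot
    detect_success: Callable[[str], bool],  # success detector
    describe_scene: Callable[[], str],      # scene description
    max_steps: int = 10,
) -> Monologue:
    log = Monologue()
    log.add("Human", goal)
    log.add("Scene", describe_scene())
    for _ in range(max_steps):
        action = planner(log.prompt())
        if action == "done":
            break
        log.add("Robot", action)
        execute(action)
        # Feedback is appended as text, so the next plan sees the failure.
        log.add("Success", "true" if detect_success(action) else "false")
        log.add("Scene", describe_scene())
    return log


def demo() -> Monologue:
    """Toy world (an assumption for illustration): the first grasp slips."""
    world = {"held": None, "attempts": 0}

    def execute(action: str) -> None:
        world["attempts"] += 1
        if action == "pick up the cup" and world["attempts"] >= 2:
            world["held"] = "cup"

    def detect_success(action: str) -> bool:
        return world["held"] == "cup"

    def describe_scene() -> str:
        return "holding cup" if world["held"] else "cup on table"

    def planner(prompt: str) -> str:
        # Rule-based placeholder: retry until a success line appears.
        return "done" if "Success: true" in prompt else "pick up the cup"

    return run_episode("bring me the cup", planner, execute,
                       detect_success, describe_scene)
```

Calling `print(demo().prompt())` shows the failed attempt, the feedback line, and the retry in one transcript. The point of the design is that recovery needs no special code path: the failure is just another line in the prompt.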

With SayCan

Read this alongside SayCan, which grounds skill selection in affordances; Inner Monologue adds temporal reasoning and recovery. Both treat the LLM as a planner, not a remote control.
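For contrast, a rough sketch of affordance-grounded skill selection in the SayCan style: each candidate skill gets a language-model score (is this a useful next step?) and an affordance score (can the robot do it from here?), and the product decides. The function name `select_skill` and all numbers are made up for illustration; the real system uses learned value functions rather than a hand-written table.

```python
from typing import Dict


def select_skill(llm_scores: Dict[str, float],
                 affordances: Dict[str, float]) -> str:
    """Pick the skill with the highest llm_score * affordance product."""
    combined = {
        skill: score * affordances.get(skill, 0.0)  # unknown skill: not feasible
        for skill, score in llm_scores.items()
    }
    return max(combined, key=combined.get)


# Illustrative values only: the sponge is the most relevant object,
# but it is out of reach, so moving to the counter wins.
llm_scores = {
    "pick up the sponge": 0.6,
    "go to the counter": 0.3,
    "put down the sponge": 0.1,
}
affordances = {
    "pick up the sponge": 0.1,
    "go to the counter": 0.9,
    "put down the sponge": 0.0,
}
best = select_skill(llm_scores, affordances)
```

Here `best` is "go to the counter". This selects one step at a time; the monologue loop is what carries the outcome of that step into the next decision.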