Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
We propose SayCan, a method that grounds language in robotic affordances to enable physically situated robotic agents to follow high-level instructions.
¶
Language without grounding fails
¶
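A language model can describe "pick up the can" fluently and still propose steps the robot cannot carry out. SayCan splits the problem: a language model scores how relevant each skill is to the instruction, and a learned value function scores how feasible each skill is in the current scene. The two scores are multiplied, and the robot runs the skill with the highest product.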
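¶
To make the split concrete, here is a minimal Python sketch of the scoring step. The skill names and all numbers are invented for illustration, and `combine_scores` and `select_skill` are hypothetical helpers, not code from the paper. In the real system the relevance scores come from a language model and the feasibility scores from learned value functions.

```python
from typing import Dict


def combine_scores(relevance: Dict[str, float],
                   feasibility: Dict[str, float]) -> Dict[str, float]:
    """Multiply relevance by feasibility for every candidate skill.

    A skill with no feasibility estimate is treated as infeasible (0.0);
    that default is an assumption of this sketch.
    """
    return {skill: relevance[skill] * feasibility.get(skill, 0.0)
            for skill in relevance}


def select_skill(relevance: Dict[str, float],
                 feasibility: Dict[str, float]) -> str:
    """Return the skill with the highest combined score."""
    combined = combine_scores(relevance, feasibility)
    return max(combined, key=combined.get)


# Hypothetical scores: the language side prefers picking up the can,
# but no can is in view yet, so the value function rates that skill as unlikely to succeed.
relevance = {"pick up the can": 0.5, "find a can": 0.4, "go to the table": 0.1}
feasibility = {"pick up the can": 0.05, "find a can": 0.9, "go to the table": 0.8}

combined = combine_scores(relevance, feasibility)
best = select_skill(relevance, feasibility)
print(best)
```

With these made-up numbers the most relevant skill loses to a slightly less relevant one that the robot can actually perform from its current state.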
¶
Skills as the interface
¶
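The robot executes skills drawn from a fixed library of low-level behaviors (grasp, move, place), each paired with a short text description. At every step the LM scores each skill as a candidate next action, given the instruction and the steps chosen so far. The selected skill is executed and appended to the plan, and the process repeats until the LM selects a termination step. Because the feasibility score is applied at every step, hallucinated steps are down-weighted before they reach the hardware.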
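¶
The sequential selection can be sketched as a greedy loop. This is an illustrative toy under stated assumptions, not the authors' implementation: `plan`, `toy_relevance`, `toy_feasibility`, and `toy_execute` are hypothetical names, the world is a two-flag dictionary, and the relevance numbers are hard-coded stand-ins for language-model likelihoods.

```python
from typing import Callable, Dict, List, Sequence

DONE = "done"  # termination step; selecting it ends the plan


def plan(instruction: str,
         skills: Sequence[str],
         relevance_fn: Callable[[str, List[str], str], float],
         feasibility_fn: Callable[[str], float],
         execute_fn: Callable[[str], None],
         max_steps: int = 10) -> List[str]:
    """Greedily pick, execute, and record one skill per step."""
    history: List[str] = []
    for _ in range(max_steps):
        scores = {s: relevance_fn(instruction, history, s) * feasibility_fn(s)
                  for s in skills}
        best = max(scores, key=scores.get)
        if best == DONE:
            break
        execute_fn(best)       # the world changes, so feasibility changes too
        history.append(best)   # later relevance scores condition on this step
    return history


# Toy world state (an assumption of this sketch).
world: Dict[str, bool] = {"holding": False, "at_table": False}


def toy_relevance(instruction: str, history: List[str], skill: str) -> float:
    """Stand-in for LM scores; it deliberately over-rates putting the can down."""
    if skill in history:
        return 0.01  # steps already taken become unlikely
    base = {"put down the can": 0.5, "pick up the can": 0.4,
            "go to the table": 0.3, DONE: 0.05}
    return base[skill]


def toy_feasibility(skill: str) -> float:
    """Stand-in for a value function: 1.0 if the skill can succeed now, else 0.0."""
    if skill == "pick up the can":
        return 0.0 if world["holding"] else 1.0
    if skill == "go to the table":
        return 0.0 if world["at_table"] else 1.0
    if skill == "put down the can":
        return 1.0 if world["holding"] and world["at_table"] else 0.0
    return 1.0  # DONE is always available


def toy_execute(skill: str) -> None:
    """Apply the skill's effect to the toy world."""
    if skill == "pick up the can":
        world["holding"] = True
    elif skill == "go to the table":
        world["at_table"] = True
    elif skill == "put down the can":
        world["holding"] = False


skills = ["put down the can", "pick up the can", "go to the table", DONE]
steps = plan("put the can on the table", skills,
             toy_relevance, toy_feasibility, toy_execute)
print(steps)
```

The toy relevance function ranks "put down the can" first from the start, which is impossible while the gripper is empty. The feasibility term zeroes that choice out until the can has been picked up and carried to the table.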
¶
Embodied AI direction
¶
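This paper connects the LLM wave to robotics without pretending text alone is enough. Grounding lives in the intersection of language, perception, and control. Inner Monologue and related work extend the same thread, for example by feeding environment feedback such as success detection and scene descriptions back into the LM prompt.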