The Bitter Lesson

Knowledge loses to compute

Sutton argues researchers keep embedding human insight into systems (chess heuristics, linguistic features, vision pipelines) and keep getting overtaken by search and learning at scale. The bitter part: our cleverness wastes time.

History as evidence

Chess, Go, speech, vision, language: the arc repeats. Methods that use more data and compute win. Methods that encode expert structure plateau.

After transformers

Read in 2019 it sounded like reinforcement learning's manifesto. After GPT and scaling laws it reads like prophecy. Whether it stays true forever is another question Sutton would probably leave to compute.