Sparks of Artificial General Intelligence: Early experiments with GPT-4
We demonstrate that, beyond its mastery of language, GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more.
¶
Beyond benchmarks
¶
Bubeck et al. did not rely on a single score. They probed GPT-4 with tasks requiring planning, abstraction, and tool use. The paper documents behaviors that look like reasoning even when the mechanism is opaque.
¶
The AGI framing
¶
The title provoked debate. Critics said the evaluation was anecdotal; supporters said standard benchmarks had already saturated. The paper's value is descriptive: it catalogues what an early version of GPT-4 could do before the model's public release in March 2023.
¶
What remains open
¶
"Sparks" is the careful word. The model still hallucinates, forgets, and fails on adversarial prompts. The paper asks whether these capabilities cohere into general intelligence or remain a mosaic of narrow tricks. The question outlived the hype.