Papyros

Sparks of Artificial General Intelligence: Early experiments with GPT-4

We demonstrate that, beyond its mastery of language, GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more.

Beyond benchmarks

Bubeck et al. did not rely on a single score. They probed GPT-4 with tasks requiring planning, abstraction, and tool use. The paper documents behaviors that look like reasoning even when the mechanism is opaque.

The AGI framing

The title provoked debate. Critics said the evaluation was anecdotal; supporters said standard benchmarks had already saturated. The paper's value is descriptive: it catalogues what an early, pre-release version of GPT-4 could do, based on experiments run before the public had access and published in March 2023.

What remains open

"Sparks" is the careful word. The model still hallucinates, lacks long-term memory, and fails on adversarial prompts. The paper asks whether its capabilities add up to general intelligence or to a mosaic of narrow tricks. The question outlived the hype.