Papyros

Archive / paper

A Few Useful Things to Know About Machine Learning

Tapping into the next frontier of the information revolution depends vitally on machine learning.

Wisdom over benchmarks

Domingos compresses years of practice into aphorisms that still sting. Data is more important than the algorithm. Feature engineering matters more than model shopping. More features can hurt unless you regularize.

Leakage and evaluation

Train-test contamination, tuning on the test set, and mistaking correlation for causation appear as recurring failure modes. The paper is a checklist before you trust a leaderboard score.

For readers of scaling papers

Read this after Kaplan and Brown. Scale changed the game, but the fundamentals here still explain why models fail in production.