Papyros


A Mathematical Theory of Communication

The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point.

The move that started everything

Shannon's first act is an act of refusal. He sets meaning aside.

Frequently the messages have meaning [...]. These semantic aspects of communication are irrelevant to the engineering problem.

By declining to ask what a message means, Shannon makes the quantity of information measurable. A message is a selection from a set of possible messages. The more uncertainty resolved by a selection, the more information it carries.
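As a rough illustration of "uncertainty resolved", the sketch below assumes the logarithmic measure in base 2: picking one of N equally likely messages resolves log₂ N bits, and an outcome of probability p carries -log₂ p bits. The function names selection_bits and surprisal_bits are hypothetical, chosen only for this example.

```python
import math


def selection_bits(n_messages: int) -> float:
    """Bits resolved by selecting one of n equally likely messages.

    Hypothetical helper for illustration; assumes a uniform choice.
    """
    if n_messages < 1:
        raise ValueError("need at least one possible message")
    return math.log2(n_messages)


def surprisal_bits(p: float) -> float:
    """Bits carried by a single outcome that had probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)


if __name__ == "__main__":
    for n in (1, 2, 8, 1024):
        print(f"1 of {n} equally likely messages: {selection_bits(n)} bits")
    print(f"outcome with p = 0.25: {surprisal_bits(0.25)} bits")
```

A choice among a single possible message resolves nothing, which is why a certain outcome carries zero bits.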

Entropy

For a source emitting symbols with probabilities p_i, the average information per symbol is the entropy:

H = - Σ p_i log₂ p_i      (bits per symbol)

Entropy is maximized, at log₂ n bits for n possible outcomes, when the outcomes are equiprobable, and it collapses to zero when one outcome is certain. It is exactly the average surprise of the source, and the floor on the average number of bits per symbol below which no lossless code can compress.
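A minimal sketch of the entropy formula, assuming a memoryless source described by a plain list of probabilities. The name entropy_bits is hypothetical, and terms with p_i = 0 are skipped under the usual convention that 0 · log 0 = 0.

```python
import math
from typing import Sequence


def entropy_bits(probs: Sequence[float]) -> float:
    """Entropy H = -sum(p_i * log2(p_i)) in bits per symbol.

    Hypothetical helper for illustration. Zero-probability symbols
    contribute nothing and are skipped.
    """
    if any(p < 0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    return sum(-p * math.log2(p) for p in probs if p > 0)


if __name__ == "__main__":
    uniform = [0.25, 0.25, 0.25, 0.25]    # equiprobable: the maximum, log2(4)
    skewed = [0.5, 0.25, 0.125, 0.125]    # uneven: less than the maximum
    certain = [1.0, 0.0, 0.0, 0.0]        # one sure outcome: no surprise
    for name, dist in (("uniform", uniform), ("skewed", skewed), ("certain", certain)):
        print(f"{name}: {entropy_bits(dist):.3f} bits per symbol")
```

The three distributions share the same four-symbol alphabet, so the difference in entropy comes only from how evenly the probability is spread.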

The channel and its capacity

Every real channel adds noise. Shannon's deepest result, the noisy channel coding theorem, says something that sounded impossible in 1948: reliable communication over a noisy channel is achievable at any rate below a finite limit, the channel capacity, with an error probability as small as you like, provided you encode over long enough blocks.

Concept       What it bounds
Entropy H     Best possible lossless compression
Capacity C    Best possible reliable transmission rate

Below capacity: the error probability can be made arbitrarily small. Above capacity: no code can drive it to zero. The boundary is sharp.
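To make capacity concrete, the sketch below uses one standard textbook case that this page does not itself discuss: a binary symmetric channel that flips each bit independently with probability p, whose capacity is C = 1 - H_b(p), where H_b is the entropy of a coin with bias p. The names binary_entropy and bsc_capacity are hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of a two-outcome source with probabilities p and 1 - p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no surprise
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def bsc_capacity(flip_prob: float) -> float:
    """Capacity in bits per channel use of a binary symmetric channel.

    Hypothetical helper for illustration; assumes each transmitted bit
    is flipped independently with probability flip_prob.
    """
    if not 0.0 <= flip_prob <= 1.0:
        raise ValueError("flip_prob must be in [0, 1]")
    return 1.0 - binary_entropy(flip_prob)


if __name__ == "__main__":
    for p in (0.0, 0.01, 0.11, 0.5):
        print(f"flip probability {p}: capacity {bsc_capacity(p):.3f} bits per use")
```

A noiseless channel carries one full bit per use, while a channel that flips bits half the time carries nothing, because its output is independent of its input. At a flip probability near 0.11 the capacity is roughly half a bit per use, so under this model about two channel uses per message bit are the least any reliable code could spend.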

Why it belongs in every archive

Almost everything digital descends from this paper: modems, compression, cryptography, and the error-correcting codes in every disk and deep-space probe. Shannon did not improve communication. He drew its outer wall and proved where it stood.