In coding theory, Kraft’s inequality gives a necessary and sufficient condition for the existence of a prefix code.1 The result in fact generalises to uniquely decodable codes, in which case it goes by the name of Kraft-McMillan inequality.1 Below is a proof that a prefix code necessarily satisfies the inequality.

Definition. Let \(\newcommand{\abs}[1]{\left\lvert#1\right\rvert} \newcommand{\LHS}{\text{LHS}} \newcommand{\RHS}{\text{RHS}} \newcommand{\im}{\mathop{\rm im}\nolimits} S\) and \(T\) be finite sets (“alphabets”). A prefix code is a function \(C\colon S \to T^*\) such that for \(s,s' \in S\), if \(C(s) \preceq C(s')\) then \(s = s'\).2

Let \(C\colon S \to T^*\) be a prefix code. For \(s \in S\), write \(l(s) := \abs{C(s)}\).

Theorem (Kraft’s inequality as a necessary condition3).

\[\begin{equation}\label{eq:Kraft} \sum_{s \in S} \abs{T}^{-l(s)} \leq 1. \end{equation}\]

The inequality is however also a sufficient condition for the existence of a prefix code: whenever \(l\colon S \to \mathbf{Z}_{\geq 0}\) is a function that satisfies \(\eqref{eq:Kraft}\), then there exists a prefix code \(C\colon S \to T^*\) such that \(l(s) = |C(s)|\) for all \(s \in S\) (see T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. (2006), p. 107, Theorem 5.2.1). But I will not be concerned with this “converse” today.

I will deduce this as a special case of the following inequality (for \(\tau = ()\)), which describes a more detailed picture of what is going on.

Lemma. Let \(d \geq 0\) and \(\tau \in T^d\). Then \(\displaystyle \sum_{ \substack{s \in S, \\ \tau \preceq C(s)} } |T|^{-\abs{C(s)}} \leq |T|^{-d}.\)

Proof. The proof will be described just for \(\abs{T} = 2\), for simplicity. It trivially adapts to the general \(\abs{T}\), as the reader will see.

We proceed by reverse induction on \(d\).

Case: there is no \(s \in S\) with \(\tau \preceq C(s)\). Then \(\LHS = 0 < \RHS\), as desired.

Case: there is an \(s \in S\) with \(\tau \preceq C(s)\). Let us further break down this case.

Suppose there is an \(s \in S\) with \(\tau = C(s)\). Since \(C\) is a prefix code, \(s\) is then the only element in \(S\) with \(\tau \preceq C(s)\). Therefore \(\LHS = \abs{T}^{-\abs{C(s)}} = \RHS\), as desired.

Suppose there is no \(s \in S\) with \(\tau = C(s)\). Write \(T = \{0,1\}\). Then

\[\{ s \in S \mid \tau \preceq C(s) \} = \{ s \in S \mid \tau{:}0 \preceq C(s) \} \cup \{ s \in S \mid \tau{:}1 \preceq C(s) \},\]

where the colon (\(:\)) denotes concatenation and wherein the union, note, is disjoint. Therefore, by induction, we have

\[\LHS = \sum_{ \substack{s \in S, \\ \tau{:}0 \preceq C(s)} } 2^{-\abs{C(s)}} + \sum_{ \substack{s \in S, \\ \tau{:}1 \preceq C(s)} } 2^{-\abs{C(s)}} \leq 2^{-(d+1)} + 2^{-(d+1)} = \RHS,\]

as desired. This proves the lemma (hence the theorem).

  1. Reference: Wikipedia, “Kraft-McMillan inequality”  2

  2. Here, \(T^*\) is the Kleene star (i.e. the set of strings in \(T\)), and \(C(s) \preceq C(s')\) denotes that \(C(s)\) is an initial segment (or prefix) of \(C(s')\). 

  3. This statement asserts that Kraft’s inequality is necessarily satisfied by a prefix code.