A proof of Kraft’s inequality

In coding theory, Kraft’s inequality gives a necessary and sufficient condition for the existence of a prefix code.¹ The result in fact generalises to uniquely decodable codes, in which case it goes by the name of Kraft-McMillan inequality.¹ Below is a proof that a prefix code necessarily satisfies the inequality.

Definition. Let \(\newcommand{\abs}[1]{\left\lvert#1\right\rvert} \newcommand{\LHS}{\text{LHS}} \newcommand{\RHS}{\text{RHS}} \newcommand{\im}{\mathop{\rm im}\nolimits} S\) and \(T\) be finite sets (“alphabets”). A prefix code is a function \(C\colon S \to T^*\) such that for \(s,s' \in S\), if \(C(s) \preceq C(s')\) then \(s = s'\).²

Let \(C\colon S \to T^*\) be a prefix code. For \(s \in S\), write \(l(s) := \abs{C(s)}\).

Theorem (Kraft’s inequality as a necessary condition³).

\[\begin{equation}\label{eq:Kraft} \sum_{s \in S} \abs{T}^{-l(s)} \leq 1. \end{equation}\]

The inequality is however also a sufficient condition for the existence of a prefix code: whenever \(l\colon S \to \mathbf{Z}_{\geq 0}\) is a function that satisfies \(\eqref{eq:Kraft}\), then there exists a prefix code \(C\colon S \to T^*\) such that \(l(s) = |C(s)|\) for all \(s \in S\) (see T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. (2006), p. 107, Theorem 5.2.1). But I will not be concerned with this “converse” today.

I will deduce this as a special case of the following inequality (for \(\tau = ()\)), which describes a more detailed picture of what is going on.

Lemma. Let \(d \geq 0\) and \(\tau \in T^d\). Then \(\displaystyle \sum_{ \substack{s \in S, \\ \tau \preceq C(s)} } |T|^{-\abs{C(s)}} \leq |T|^{-d}.\)

Proof. The proof will be described just for \(\abs{T} = 2\), for simplicity. It trivially adapts to the general \(\abs{T}\), as the reader will see.

We proceed by reverse induction on \(d\).

Case: there is no \(s \in S\) with \(\tau \preceq C(s)\). Then \(\LHS = 0 < \RHS\), as desired.

Case: there is an \(s \in S\) with \(\tau \preceq C(s)\). Let us further break down this case.

Suppose there is an \(s \in S\) with \(\tau = C(s)\). Since \(C\) is a prefix code, \(s\) is then the only element in \(S\) with \(\tau \preceq C(s)\). Therefore \(\LHS = \abs{T}^{-\abs{C(s)}} = \RHS\), as desired.

Suppose there is no \(s \in S\) with \(\tau = C(s)\). Write \(T = \{0,1\}\). Then

\[\{ s \in S \mid \tau \preceq C(s) \} = \{ s \in S \mid \tau{:}0 \preceq C(s) \} \cup \{ s \in S \mid \tau{:}1 \preceq C(s) \},\]

where the colon (\(:\)) denotes concatenation and wherein the union, note, is disjoint. Therefore, by induction, we have

\[\LHS = \sum_{ \substack{s \in S, \\ \tau{:}0 \preceq C(s)} } 2^{-\abs{C(s)}} + \sum_{ \substack{s \in S, \\ \tau{:}1 \preceq C(s)} } 2^{-\abs{C(s)}} \leq 2^{-(d+1)} + 2^{-(d+1)} = \RHS,\]

as desired. This proves the lemma (hence the theorem).

Reference: Wikipedia, “Kraft-McMillan inequality” ↩ ↩²
Here, \(T^*\) is the Kleene star (i.e. the set of strings in \(T\)), and \(C(s) \preceq C(s')\) denotes that \(C(s)\) is an initial segment (or prefix) of \(C(s')\). ↩
This statement asserts that Kraft’s inequality is necessarily satisfied by a prefix code. ↩