# Laplace's Rule of Succession

The Rule of Succession gives a simple formula for “enumerative induction”: reasoning from observed instances to unobserved ones. If you’ve observed 8 ravens and they’ve all been black, how certain should you be the next raven you see will also be black? According to the Rule of Succession, 90%. In general, the probability is $(k+1)/(n+2)$ that the next observation will be positive, given $k$ positive observations out of $n$ total.

When does the Rule of Succession apply, and why is it $(k+1)/(n+2)$? Laplace first derived a special case of the rule in 1774, using certain assumptions. The same assumptions also allow us to derive the general rule, and following the derivation through answers both questions. $\newcommand{\p}{P}\newcommand{\given}{\mid}\newcommand{\dif}{d}$

As motivation, imagine we’re drawing randomly, with replacement, from an urn of marbles some proportion $p$ of which are black. Strictly speaking, $p$ must be a rational number in this setup. But formally, we’ll suppose $p$ can be any real number in the unit interval.

If we have no idea what $p$ is, it’s natural to start with a uniform prior over its possible values. Formally, $p$ is a random variable with a uniform density on the $[0,1]$ interval. Each draw induces another random variable, $$X_i = \begin{cases} 1 & \text{ if the i^\text{th} draw is black},\newline 0 & \text{ otherwise}. \end{cases}$$ We’ll define one last random variable $S_n$, which counts the black draws: $$S_n = X_1 + \ldots + X_n .$$ Laplace’s assumptions are then as follows.

1. Each $X_i$ has the same chance $p$ of being $1$.
2. That chance is independent of whatever values the other $X_j$’s take.
3. The prior distribution over $p$ is uniform: $f(p) = 1$ for $0 \leq p \leq 1$.

Given these assumptions, the Rule of Succession follows: $$\p(X_{n+1} = 1 \given S_n = k) = \frac{k+1}{n+2}.$$ We’ll start by deriving this result for the special case where all observations are positive, so that $k = n$.

# Laplace’s Special Case

When $k = n$, the Rule of Succession says: $$\p(X_{n+1} = 1 \given S_n = n) = \frac{n+1}{n+2}.$$ To derive this result, we start with the Law of Total Probability. \begin{align} \p(X_{n+1} = 1 \given S_n = n) &= \int_0^1 \p(X_{n+1} = 1 \given S_n = n, p) f(p \given S_n = n) \dif p\newline &= \int_0^1 \p(X_{n+1} = 1 \given p) f(p \given S_n = n) \dif p\newline &= \int_0^1 p \, f(p \given S_n = n) \dif p. \tag{1} \end{align} To finish the calculation, we need to compute $f(p \given S_n = n)$. We need to know how observing $n$ out of $n$ black marbles changes the probability density over $p$.

For this we turn to Bayes’ theorem. \begin{aligned} f(p \given S_n = n) &= \frac{ f(p) \p(S_n = n \given p) }{ \p(S_n = n) }\newline &= \frac{ \p(S_n = n \given p) }{ \p(S_n = n) }\newline &= \frac{ p^n }{ \p(S_n = n) }\newline &= c p^n. \end{aligned} Here $c$ is an as-yet unknown constant: the inverse of $\p(S_n = n)$, whatever that is. To find $c$, first observe by calculus that: $$\int_0^1 c p^n \dif p = \left. \left(\frac{c p^{n+1}}{n+1}\right) \right|_0^1 = \frac{c}{n+1}.$$ Then observe that this quantity must equal $1$, since we’ve integrated $f(p \given S_n = n)$, a probability density. Thus $c = n + 1$, and hence $$f(p \given S_n = n) = (n+1) p^n.$$ Returning now to finish our original calculation in Equation (1): \begin{aligned} \p(X_{n+1} = 1 \given S_n = n) &= \int_0^1 p \, f(p \given S_n = n) \dif p\newline &= \int_0^1 p \, (n+1) p^n \dif p\newline &= (n+1) \int_0^1 p^{n+1} \dif p\newline &= (n+1) \left. \left(\frac{p^{n+2}}{n+2}\right) \right|_0^1\newline &= \frac{n+1}{n+2}. \end{aligned} This is the Rule of Succession when $k = n$, as desired.

# The General Case

The proof of the general case starts similarly. We first apply the Law of Total Probability to obtain $$\p(X_{n+1} = 1 \given S_n = k) = \int_0^1 p \, f(p \given S_n = k) \dif p. \tag{2}$$ Then we use Bayes’ theorem to compute $f(p \given S_n = k)$. \begin{align} f(p \given S_n = k) &= \frac{ \p(S_n = k \given p) }{ \p(S_n = k) }\newline &= \frac{ \binom{n}{k} p^k (1-p)^{n-k} }{ \p(S_n = k) } \tag{3}. \end{align} Note that we used the formula for a binomial probability here to calculate the numerator $\p(S_n = k \given p)$.

Computing the denominator $\p(S_n = k)$ requires a different approach from the special case. We start with the Law of Total Probability: \begin{align} \p(S_n = k) &= \int_0^1 \p(S_n = k \given p) f(p) \dif p\newline &= \int_0^1 \p(S_n = k \given p) \dif p \newline &= \int_0^1 \binom{n}{k} p^k (1-p)^{n-k} \dif p \newline &= \binom{n}{k} \int_0^1 p^k (1-p)^{n-k} \dif p. \end{align} This leaves us facing an instance of a famous function, the “beta function,” which is defined: $$B(a, b) = \int_0^1 x^a (1-x)^{b} \dif x.$$ In our case $a$ and $b$ are natural numbers, so $B(a,b)$ has an elegant formula, which we use now and prove later: $$B(a, b) = \frac{a!b!}{(a + b + 1)!}.$$ For us, $a = k$ and $b = n-k$, so we have $$\p(S_n = k) = \binom{n}{k} B(k, n-k) = \binom{n}{k} \frac{k!(n-k)!}{(n + 1)!}.$$ Substituting back into our calculation of $f(p \given S_n = k)$ in Equation (3): \begin{aligned} f(p \given S_n = k) &= \frac{ \binom{n}{k} p^k (1-p)^{n-k} }{ \binom{n}{k} B(k, n-k) }\newline &= \frac{(n + 1)!}{k!(n-k)!} p^k (1-p)^{n-k} . \end{aligned} Then we finish our original calculation from Equation (2): \begin{align} \p(X_{n+1} = 1 \given S_n = k) &= \int_0^1 p \frac{(n + 1)!}{k!(n-k)!} p^k (1-p)^{n-k} \dif p\newline &= \frac{(n + 1)!}{k!(n-k)!} \int_0^1 p^{k+1} (1-p)^{n-k} \dif p\newline &= \frac{(n + 1)!}{k!(n-k)!} B(k+1, n-k)\newline &= \frac{(n + 1)!}{k!(n-k)!} \frac{(k+1)!(n-k)!}{(k+1 + n-k + 1)!}\newline &= \frac{k+1}{n + 2}. \end{align} This is the Rule of Succession, as desired.

# The Beta Function

Finally, let’s derive the formula we used for the beta function: $$\int_0^1 x^a (1-x)^{b} \dif x = \frac{a!b!}{(a + b + 1)!},$$ where $a$ and $b$ are natural numbers. We proceed in two steps: integration by parts, then a proof by induction.

Notice first that when $b = 0$ our integral simplifies and is straightforward: $$\int_0^1 x^a \dif x = \frac{1}{a+1}.$$ So let’s assume $b > 0$ and pursue integration by parts. If we let $$u = (1 - x)^b, \quad \dif v = x^a \dif x,$$ then $$\dif u = -b (1 - x)^{b-1}, \quad v = \frac{x^{a+1}}{a+1}.$$ So \begin{aligned} \int_0^1 x^a (1-x)^{b} \dif x &= \left. \left(\frac{ x^{a+1} (1 - x)^b }{ a+1 }\right) \right|_0^1 + \frac{b}{a+1} \int_0^1 x^{a+1} (1 - x)^{b-1} \dif x\newline &= \frac{b}{a+1} \int_0^1 x^{a+1} (1 - x)^{b-1} \dif x. \end{aligned}

Now we use this identity in an argument by induction. We already noted that when $b = 0$ we have $B(a, 0) = 1/(a+1)$. This satisfies the general formula $$B(a, b) = \frac{a!b!}{(a+b+1)!}.$$ By induction on $b > 0$, we find the formula holds in general: \begin{aligned} B(a, b) &= \int_0^1 x^a (1-x)^{b} \dif x\newline &= \frac{b}{a+1} \int_0^1 x^{a+1} (1 - x)^{b-1} \dif x\newline &= \frac{b}{a+1} B(a+1, b-1)\newline &= \frac{b}{a+1} \frac{(a+1)!(b-1)!}{(a + 1 + b - 1 + 1)!}\newline &= \frac{a!b!}{(a + b + 1)!}. \end{aligned}

# Acknowledgments

Our proof of the special case follows this excellent video by Joe Blitzstein. And our proof of the general case comes from Sheldon Ross’ classic textbook, A First Course in Probability, Exercise 30 on page 128 of the 7th edition.