Nobody Expects the Chance Function!

Here’s a striking result that caught me off guard the other day. It came up in a facebook thread, and judging by the discussion there it caught a few other people in this neighbourhood off guard too.

The short version: chances are “self-expecting” pretty much if and only if they’re “self-certain”. Less cryptically: the chance of a proposition equals its expected chance just in case the chance function assigns probability 1 to itself being the true chance function, modulo an exception to be discussed below.

The same result applies to any probabilities of course, whether they represent physical chances or evidential probabilities or whatever. In fact, thanks to friends on facebook, I learned that it drives this lovely paper by Kevin Dorst.

I just happened to stumble across it while thinking about chances, because Richard Pettigrew uses the assumption that chances are self-expecting in his Phil Review paper on accuracy and the Principal Principle. But later, in his landmark book on accuracy, he switches to the requirement that they be self-certain. It turns out this isn’t a coincidence. The result we’re about to look at illuminates this shift.

The result goes back to 1997 at least, in a paper by Dov Samet. Proving the full result is a bit more involved than what I’ll present here. For simplicity, I’ll only prove a special case at the end. But along the way we’ll look at some suggestive examples that illustrate the full version.

The Chance Matrix

Imagine we have just four possible worlds, resulting from two tosses of a coin. What are the physical chances at each of the four possible worlds $HH$, $HT$, $TH$, and $TT$? $\newcommand{\mstar}{\mathfrak{m}^*} \newcommand{\C}{\mathbf{C}}$

One natural thought is to apply Laplace’s classic rule of succession: given $s$ heads out of $n$ tosses, conclude that the probability of heads on each toss is $(s+1)/(n+2)$. So at $HH$-world for example, the chance of heads was $3/ 4$ on each toss.

If we assume the tosses are independent, then $HH$-world had chance $(3/ 4)(3/ 4) = 9/16$ of being actual, according to the chance function at $HH$-world. Whereas $HT$-world had chance $(3/ 4)(1/ 4) = 3/16$ of being actual at $HH$-world. The full chance function at $HH$-world can be displayed as a column vector: $$ \left( \begin{matrix} 9/16\\
3/16\\
3/16\\
1/16 \end{matrix} \right). $$ Applying the same recipe at $HT$-world would give us a different column vector. And sticking the columns for all four worlds together, we get a $4 \times 4$ chance matrix for our space of possible worlds: $$ \mathbf{C} = \left( \begin{matrix} 9/16 & 1/ 4 & 1/ 4 & 1/16\\
3/16 & 1/ 4 & 1/ 4 & 3/16\\
3/16 & 1/ 4 & 1/ 4 & 3/16\\
1/16 & 1/ 4 & 1/ 4 & 9/16 \end{matrix} \right). $$ Each column gives the chances at a world, while a row gives the chances of a world. For example, entry $c_{14}$ gives the chance at world $4$ of world $1$ being actual. It says how likely the sequence $HH$ was if the actual unfolding of events is instead $TT$, namely $1/16$.
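To make the recipe concrete, here’s a minimal Python sketch that rebuilds this matrix with exact fractions. The helper names (`WORLDS`, `laplace_heads`, `chance_at`) are just illustrative, and the sketch assumes, as above, that the two tosses are independent given the chance of heads.

```python
from fractions import Fraction
from itertools import product

# Worlds in the row/column order used above: HH, HT, TH, TT.
WORLDS = ["".join(w) for w in product("HT", repeat=2)]


def laplace_heads(world):
    """Rule of succession: (s + 1) / (n + 2) for s heads in n tosses."""
    return Fraction(world.count("H") + 1, len(world) + 2)


def chance_at(at, target):
    """Chance at world `at` of world `target`, treating the tosses as independent."""
    p = laplace_heads(at)
    result = Fraction(1)
    for toss in target:
        result *= p if toss == "H" else 1 - p
    return result


# Entry C[i][j] is the chance at world j of world i, so each column is a chance function.
C = [[chance_at(at, target) for at in WORLDS] for target in WORLDS]

for row in C:
    print(" ".join(str(x) for x in row))
```

Running it prints the same four rows as the matrix above. Swapping in a different rule for `laplace_heads` would give a different theory of chance with the same overall structure.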

A different thought would be to appeal to Carnap’s notorious “logical” prior $\mstar$: $$ \mstar = \left( \begin{matrix} 1/3\\
1/6\\
1/6\\
1/3 \end{matrix} \right). $$ This assignment of probabilities ignores the actual unfolding of events in each world. It falls out of a bit of a priori reasoning instead. There are three possible outcomes: $2$ heads, $1$ head, or $0$ heads. Each is equally likely, $1/ 3$. But there are two ways to get $1$ head, so the $1/ 3$ there gets subdivided equally between $HT$ and $TH$, leaving $1/6$ for each.

Since these chances ignore the actual unfolding of events in each world, the chance matrix we get here is extremely anti-Humean. It’s just four repetitions of $\mstar$: $$ \mathbf{C} = \left( \begin{matrix} 1/3 & 1/3 & 1/3 & 1/3\\
1/6 & 1/6 & 1/6 & 1/6\\
1/6 & 1/6 & 1/6 & 1/6\\
1/3 & 1/3 & 1/3 & 1/3 \end{matrix} \right). $$ You might think that’s a pretty terrible theory of chance, and I sympathize. But what we’re about to see is that, of our two chance matrices, only the second is “self-expecting”. And its terribleness is part of the reason why.
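The a priori recipe for $\mstar$ is simple enough to write down directly. Here’s a hedged Python sketch (the names are illustrative) that computes $\mstar$ for two tosses and stacks four copies of it into the Carnapian matrix.

```python
from fractions import Fraction

WORLDS = ["HH", "HT", "TH", "TT"]


def carnap_mstar(world, worlds=WORLDS):
    """m*: split 1 evenly over head counts, then evenly over sequences with that count."""
    n_tosses = len(world)
    heads = world.count("H")
    same_count = sum(1 for w in worlds if w.count("H") == heads)
    return Fraction(1, n_tosses + 1) / same_count


mstar = [carnap_mstar(w) for w in WORLDS]

# The chances ignore which world is actual, so every column is a copy of m*.
C = [[mstar[i] for _ in WORLDS] for i in range(len(WORLDS))]

print([str(x) for x in mstar])  # ['1/3', '1/6', '1/6', '1/3']
```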

Self Expectation

Pettigrew’s Phil Review paper assumes that chance functions are “self-expecting”. The chance of a proposition at a given world must equal its expected chance, where the expectation is taken according to the chances at that world.

In terms of a chance matrix $\C$, this amounts to the requirement that $\C \C = \C$. When we multiply $\C$ by $\C$, we take dot-products of rows and columns. For example, if we were doing the calculation by hand, we’d start by multiplying the first row of $\C$ by the first column of $\C$. And this is just the weighted average of the various possible chances of the first world, where the weights are the chances at that world. In other words, it’s the expected chance of $HH$-world at $HH$-world.

In general, the dot product of row $i$ with column $j$ is the expected chance of world $i$ at world $j$. For this expected chance to equal the chance of world $i$ at world $j$, it must be that $\C \C = \C$. More succinctly, $\C^2 = \C$.
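As a sanity check on the row-times-column reading, here’s a small Python sketch (exact fractions; the helper names are illustrative) that compares each entry of $\C\C$ with an explicitly computed expected chance, using the Laplace matrix from above.

```python
from fractions import Fraction as F

# Laplace chance matrix: column j lists the chances at world j (HH, HT, TH, TT).
C = [
    [F(9, 16), F(1, 4), F(1, 4), F(1, 16)],
    [F(3, 16), F(1, 4), F(1, 4), F(3, 16)],
    [F(3, 16), F(1, 4), F(1, 4), F(3, 16)],
    [F(1, 16), F(1, 4), F(1, 4), F(9, 16)],
]


def expected_chance(C, i, j):
    """Expected chance of world i, weighting each world k by its chance at world j."""
    return sum(C[k][j] * C[i][k] for k in range(len(C)))


def matmul(A, B):
    """Plain matrix product for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


CC = matmul(C, C)

# Entry (i, j) of CC is row i dotted with column j: the expected chance of i at j.
assert all(CC[i][j] == expected_chance(C, i, j) for i in range(4) for j in range(4))

print(expected_chance(C, 0, 0), C[0][0])  # 53/128 9/16
```

So at $HH$-world the expected chance of $HH$ is $53/128$, while its chance there is $9/16 = 72/128$. The two come apart.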

Matrices that have this property—squaring them leaves them unchanged—are called idempotent. And when our matrices are column stochastic (all values are nonnegative and each column sums to $1$), idempotence is a very… well, potent requirement.

For example, our first chance matrix based on Laplace’s rule of succession is not idempotent. Its square is not itself, but something quite different. Our second, Carnapian matrix is idempotent though. Its square is just itself. And that’s not a coincidence.
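Both claims are easy to check mechanically. A minimal sketch with exact fractions, where `is_idempotent` is an illustrative helper that just tests whether $\C\C = \C$ entry for entry:

```python
from fractions import Fraction as F


def matmul(A, B):
    """Plain matrix product for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def is_idempotent(C):
    """Self-expectation in matrix form: C times C equals C, entry for entry."""
    return matmul(C, C) == C


laplace = [
    [F(9, 16), F(1, 4), F(1, 4), F(1, 16)],
    [F(3, 16), F(1, 4), F(1, 4), F(3, 16)],
    [F(3, 16), F(1, 4), F(1, 4), F(3, 16)],
    [F(1, 16), F(1, 4), F(1, 4), F(9, 16)],
]
mstar = [F(1, 3), F(1, 6), F(1, 6), F(1, 3)]
carnap = [[m] * 4 for m in mstar]  # every column is a copy of m*

print(is_idempotent(laplace), is_idempotent(carnap))  # False True

# First column of the Laplace matrix squared, versus 9/16, 3/16, 3/16, 1/16.
print([str(row[0]) for row in matmul(laplace, laplace)])  # ['53/128', '27/128', '27/128', '21/128']
```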

Self Certainty

Any chance matrix whose columns are redundant (all identical) will be idempotent. After all, if the chances are the same at every world, then the chance of any given world doesn’t vary from one world to the next. So its expected chance just is the chance it has at every world.

But redundant columns also mean that the chances are self-certain. Each world’s chance assignment gives zero probability to the chances being anything other than what they are at that world, because there are no worlds where the chances are different.
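Self-certainty can be checked world by world: add up the chance that world $j$ gives to worlds whose column matches its own, and see whether the total is $1$. Here’s a minimal Python sketch of that test; `column` and `self_certain_at` are illustrative helper names.

```python
from fractions import Fraction as F


def column(C, j):
    """The chance function at world j."""
    return [row[j] for row in C]


def self_certain_at(C, j):
    """Does world j give chance 1 to worlds whose chance function matches its own?"""
    own = column(C, j)
    return sum(C[k][j] for k in range(len(C)) if column(C, k) == own) == 1


laplace = [
    [F(9, 16), F(1, 4), F(1, 4), F(1, 16)],
    [F(3, 16), F(1, 4), F(1, 4), F(3, 16)],
    [F(3, 16), F(1, 4), F(1, 4), F(3, 16)],
    [F(1, 16), F(1, 4), F(1, 4), F(9, 16)],
]
mstar = [F(1, 3), F(1, 6), F(1, 6), F(1, 3)]
carnap = [[m] * 4 for m in mstar]

print([self_certain_at(carnap, j) for j in range(4)])   # [True, True, True, True]
print([self_certain_at(laplace, j) for j in range(4)])  # [False, False, False, False]
```

In the Laplace matrix, for instance, $HT$-world and $TH$-world share a chance function, but together they only get chance $1/2$ at $HT$-world.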

The chances can vary from world to world and still be self-expecting though. There are idempotent chance matrices where the columns are not simply redundant. For example, here’s another idempotent chance matrix: $$ \left( \begin{matrix} 1/3 & 1/3 & 0 & 0\\
2/3 & 2/3 & 0 & 0\\
0 & 0 & 1/ 4 & 1/ 4\\
0 & 0 & 3/ 4 & 3/ 4 \end{matrix} \right). $$ But notice how it’s still kind of a degenerate case. There are two disjoint regions of modal space here that regard one another as zero-chance. And within each region the chances are the same at each world. Worlds $1$ and $2$ have the same chances, and they give zero chance to worlds $3$ and $4$. And vice versa from the point of view of worlds $3$ and $4$.
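The same two checks confirm what the prose says about this matrix. A self-contained sketch, restating the illustrative helpers so it runs on its own:

```python
from fractions import Fraction as F


def matmul(A, B):
    """Plain matrix product for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def column(C, j):
    return [row[j] for row in C]


def self_certain_at(C, j):
    """Does world j give chance 1 to worlds whose chance function matches its own?"""
    own = column(C, j)
    return sum(C[k][j] for k in range(len(C)) if column(C, k) == own) == 1


blocks = [
    [F(1, 3), F(1, 3), 0, 0],
    [F(2, 3), F(2, 3), 0, 0],
    [0, 0, F(1, 4), F(1, 4)],
    [0, 0, F(3, 4), F(3, 4)],
]

print(matmul(blocks, blocks) == blocks)                # True: self-expecting
print([self_certain_at(blocks, j) for j in range(4)])  # [True, True, True, True]
```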

In other words, self-expectation and self-certainty go hand in hand here once again.

A Sliver of Daylight

Is there any daylight at all then between self-expectation and self-certainty?

Self-certainty entails self-expectation, and the argument is pretty short. If the only worlds with positive chance according to world $j$ assign the same chances as world $j$ does, then any average of those chances will just be those same chances.
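Written out entrywise, with $c_{ij}$ for the chance at world $j$ of world $i$: suppose world $j$ is self-certain, so $c_{kj} > 0$ only when world $k$ has the same chances as world $j$, that is, only when $c_{ik} = c_{ij}$ for every $i$. Then, since column $j$ sums to $1$, $$ (\C^2)_{ij} = \sum_k c_{ik} c_{kj} = \sum_{k : c_{kj} > 0} c_{ik} c_{kj} = \sum_{k : c_{kj} > 0} c_{ij} c_{kj} = c_{ij} \sum_k c_{kj} = c_{ij}. $$ So if every world is self-certain, $\C^2 = \C$.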

But self-expectation doesn’t quite entail self-certainty. For example, here’s an idempotent chance matrix that’s not self-certain: $$ \left( \begin{matrix} 1/3 & 1/3 & 0 & 0 & 25/94\\
2/3 & 2/3 & 0 & 0 & 25/47\\
0 & 0 & 1/ 4 & 1/ 4 & 19/376\\
0 & 0 & 3/ 4 & 3/ 4 & 57/376\\
0 & 0 & 0 & 0 & 0 \end{matrix} \right). $$ It’s kind of a lame counterexample though, because the new, fifth world we’ve introduced (the coin explodes or something idk) has zero chance at every world, even itself.
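Here’s a sketch checking the five-world example the same way, with exact fractions again. The helpers are illustrative and restated so the block runs on its own.

```python
from fractions import Fraction as F


def matmul(A, B):
    """Plain matrix product for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def column(C, j):
    return [row[j] for row in C]


def self_certain_at(C, j):
    """Does world j give chance 1 to worlds whose chance function matches its own?"""
    own = column(C, j)
    return sum(C[k][j] for k in range(len(C)) if column(C, k) == own) == 1


C5 = [
    [F(1, 3), F(1, 3), 0, 0, F(25, 94)],
    [F(2, 3), F(2, 3), 0, 0, F(25, 47)],
    [0, 0, F(1, 4), F(1, 4), F(19, 376)],
    [0, 0, F(3, 4), F(3, 4), F(57, 376)],
    [0, 0, 0, 0, 0],
]

assert all(sum(column(C5, j)) == 1 for j in range(5))  # column stochastic
print(matmul(C5, C5) == C5)                            # True: self-expecting
print([self_certain_at(C5, j) for j in range(5)])      # [True, True, True, True, False]
print(C5[4][4])                                        # 0: world 5 rules itself out
```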

In fact this is what Samet proves: this is the only kind of counterexample possible! If probabilities are self-expecting, then at each world they must be either self-certain or self-effacing. They must assign zero chance to the chances being otherwise, or they must assign chance one to them being otherwise.

In terms of matrices, there are only three kinds of idempotent chance matrix:

  1. All columns are identical.
  2. The matrix is block diagonal, with identical columns inside each block.
  3. The matrix is as in (2), except for some columns $j_1, \ldots, j_n$, whose corresponding rows $j_1, \ldots, j_n$ contain only zeros.
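The dichotomy and the three-way classification can be checked mechanically for any particular matrix. This is a hedged sketch, not Samet’s proof: it only tests a given matrix, and the helper names are illustrative. It labels each world by the chance it gives to its own chance function, then checks that every world that isn’t self-certain has an all-zero row, as in kind (3).

```python
from fractions import Fraction as F


def column(C, j):
    return [row[j] for row in C]


def own_chance(C, j):
    """Chance that world j gives to the chances being exactly what they are at j."""
    own = column(C, j)
    return sum(C[k][j] for k in range(len(C)) if column(C, k) == own)


def classify(C):
    """Label each world by how much chance it gives its own chance function."""
    labels = []
    for j in range(len(C)):
        p = own_chance(C, j)
        if p == 1:
            labels.append("self-certain")
        elif p == 0:
            labels.append("self-effacing")
        else:
            labels.append("neither")
    return labels


def exceptions_have_zero_rows(C):
    """Kind (3): each world that isn't self-certain gets zero chance at every world."""
    return all(
        all(x == 0 for x in C[j])
        for j, label in enumerate(classify(C))
        if label != "self-certain"
    )


C5 = [
    [F(1, 3), F(1, 3), 0, 0, F(25, 94)],
    [F(2, 3), F(2, 3), 0, 0, F(25, 47)],
    [0, 0, F(1, 4), F(1, 4), F(19, 376)],
    [0, 0, F(3, 4), F(3, 4), F(57, 376)],
    [0, 0, 0, 0, 0],
]
laplace = [
    [F(9, 16), F(1, 4), F(1, 4), F(1, 16)],
    [F(3, 16), F(1, 4), F(1, 4), F(3, 16)],
    [F(3, 16), F(1, 4), F(1, 4), F(3, 16)],
    [F(1, 16), F(1, 4), F(1, 4), F(9, 16)],
]

print(classify(C5))                   # four 'self-certain', then 'self-effacing'
print(exceptions_have_zero_rows(C5))  # True
print(classify(laplace))              # ['neither', 'neither', 'neither', 'neither']
```

The Laplace matrix, which isn’t self-expecting, lands in the “neither” bin at every world, just as the result predicts it must.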

Strictly speaking (1) is actually a special case of (2). But (1) deserves direct attention because it arises in a way that’s interesting both philosophically and mathematically.

The Connected Case

Here’s a natural thought, one that’s driven a lot of the literature on chance and Lewis’ Principal Principle. The thought: however events unfold at one world, there’s a chance they could have evolved differently. There’s even some small, non-zero chance they could have evolved quite differently.

Taking this thought a bit further, you might think there’s a region of modal space where, even though worlds $w_1$ and $w_n$ have different chances, $w_n$ is always reachable from world $w_1$. More exactly, there’s always a connecting sequence of worlds $w_1, w_2, \ldots, w_n$ where $w_i$ gives non-zero chance to $w_{i+1}$.
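The “reachable by a series of hops” idea is ordinary graph reachability: draw an edge from world $j$ to world $i$ whenever $c_{ij} > 0$. Here’s a minimal sketch using breadth-first search. The three-world `chain` matrix is a made-up example, not one from the discussion above.

```python
from collections import deque
from fractions import Fraction as F


def reachable_from(C, start):
    """Worlds reachable from `start` by hops j -> i with C[i][j] > 0."""
    seen = {start}
    queue = deque([start])
    while queue:
        j = queue.popleft()
        for i in range(len(C)):
            if C[i][j] > 0 and i not in seen:
                seen.add(i)
                queue.append(i)
    return seen


def connected(C):
    """Every world is reachable from every other via positive-chance hops."""
    n = len(C)
    return all(reachable_from(C, j) == set(range(n)) for j in range(n))


# World 0 gives zero chance to world 2, but can reach it through world 1.
chain = [
    [F(1, 2), F(1, 2), 0],
    [F(1, 2), 0, F(1, 2)],
    [0, F(1, 2), F(1, 2)],
]
blocks = [
    [F(1, 3), F(1, 3), 0, 0],
    [F(2, 3), F(2, 3), 0, 0],
    [0, 0, F(1, 4), F(1, 4)],
    [0, 0, F(3, 4), F(3, 4)],
]

print(connected(chain), connected(blocks))  # True False
```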

In terms of coin tosses, maybe it’s a law of nature that coins always land heads at a world where all one hundred out of one hundred flips land heads. But when there’s a mix of heads and tails, there’s at least some chance the mix could have had a few more heads, or a few more tails. So every world where the sequence isn’t perfectly uniform can be reached from every other: if not in a single positive-chance hop, then at least by a series of hops, perhaps by switching the outcomes of the flips one at a time.

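In terms of graphs, such a region of modal space is said to be (strongly) connected. In terms of matrices, it amounts, near enough, to the chance matrix for this region being regular: there must be some power $n$ such that $\C^n$ contains all positive entries. Strictly speaking, connectedness alone doesn’t guarantee regularity, but it does for self-expecting matrices: since $\C^2 = \C$ forces $\C^n = \C$ for every $n$, any entry that is positive in some power of $\C$ is already positive in $\C$ itself.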
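Regularity can also be tested directly by looking for the first all-positive power. In the sketch below the names are illustrative, and `chain` and `swap` are made-up examples. The search stops at power $(k-1)^2 + 1$ for a $k \times k$ matrix, relying on Wielandt’s theorem that a regular matrix is already all-positive by then.

```python
from fractions import Fraction as F


def matmul(A, B):
    """Plain matrix product for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def first_positive_power(C):
    """Smallest n with every entry of C**n positive, or None if C isn't regular."""
    k = len(C)
    P = C
    for n in range(1, (k - 1) ** 2 + 2):  # Wielandt's bound: n <= (k - 1)**2 + 1
        if all(x > 0 for row in P for x in row):
            return n
        P = matmul(P, C)
    return None


chain = [
    [F(1, 2), F(1, 2), 0],
    [F(1, 2), 0, F(1, 2)],
    [0, F(1, 2), F(1, 2)],
]
swap = [[0, 1], [1, 0]]  # connected, but its powers alternate forever

print(first_positive_power(chain), first_positive_power(swap))  # 2 None
```

The `swap` matrix is connected but never regular. It isn’t idempotent either (its square is the identity), so it doesn’t threaten the argument.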

Now, regular chance matrices have the remarkable property that, as we multiply them by themselves more and more times, the result converges to a matrix $\mathbf{P}$ whose columns are all identical: $$ \lim_{n \rightarrow \infty} \C^n = \mathbf{P} = \left( \begin{matrix} p_1 & \ldots & p_1 \\
\vdots & \ldots & \vdots \\
p_k & \ldots & p_k \\
\end{matrix} \right). $$ Recall that for $\C$ to be self-expecting, it must be idempotent, meaning $\C^2 = \C$. That means $\C^n = \C$ for every power $n$. So $\C = \lim_{n \rightarrow \infty} \C^n = \mathbf{P}$, and $\C$ must already have redundant columns.
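To see the convergence numerically, here’s a short sketch that raises the Laplace matrix from earlier (which has all positive entries, so it’s regular) to the 64th power by repeated squaring. It uses floats rather than exact fractions to keep the numbers small, and it only illustrates the theorem on one example; it doesn’t prove it.

```python
def matmul(A, B):
    """Plain matrix product for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


laplace = [
    [9 / 16, 1 / 4, 1 / 4, 1 / 16],
    [3 / 16, 1 / 4, 1 / 4, 3 / 16],
    [3 / 16, 1 / 4, 1 / 4, 3 / 16],
    [1 / 16, 1 / 4, 1 / 4, 9 / 16],
]

# Square six times, so P ends up as laplace**64.
P = laplace
for _ in range(6):
    P = matmul(P, P)

for row in P:
    print([round(x, 6) for x in row])

# The limit has identical columns and, unlike laplace itself, is (numerically) idempotent.
PP = matmul(P, P)
print(max(abs(PP[i][j] - P[i][j]) for i in range(4) for j in range(4)) < 1e-12)
```

Every column comes out at roughly $(2/7, 3/14, 3/14, 2/7)$, the vector $\mathbf{p}$ with $\C\mathbf{p} = \mathbf{p}$. Since the Laplace matrix differs from this limit, it can’t be idempotent, which matches what we found above.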

What does this mean for us? One way to think about it: there isn’t as much room for chances to vary from world to world as one might have thought. If the chances are going to be self-expecting, they must be the same at every world across a whole region of modal space, despite the facts turning out quite differently at various worlds across that region.

This point is strongly reminiscent of David Lewis’ famous “Big Bad Bug” of course. And there’s tons of relevant literature, most of which I confess I never really absorbed. So I’ll link to one paper I’m finding especially helpful on this right now, Richard Pettigrew’s “What Chance-Credence Norms Should Not Be”.