Measuring the Strength of Evidence

All of the probabilities on this page are prior credences. As done on the preceding page, we’ll drop the subscript ‘1’ to simplify notation.

As we saw on the previous page, the likelihood principle says that evidence E favors whichever hypothesis makes E more likely. In other words, if pr(E|H1) is bigger than pr(E|H2), then E favors H1. By comparing those two conditional probabilities, we can determine which way the evidence tilts, so to speak. But that’s not all. By comparing pr(E|H1) with pr(E|H2), we can also assess the strength of the evidence.

To assess how strong the evidence is, we need to ask how much bigger is pr(E|H1) than pr(E|H2). Is it twice as big? Three times bigger? The answer to that “how much” question is the following ratio, which is called the Bayes factor:
$$\text{Bayes factor} = \frac{\mathrm{pr}(E \mid H_1)}{\mathrm{pr}(E \mid H_2)}$$
If hypothesis H1 makes E twice as likely as H2 does, then the Bayes factor is 2. If H1 makes E three times as likely as H2 does, the Bayes factor is 3. And so on.

The Bayes factor is related to, but distinct from, the Bayesian multiplier defined in the previous chapter. We’ll contrast the two quantities in a moment, and we’ll see why the Bayes factor—not the Bayesian multiplier—provides a useful measure of evidential strength. First, however, let’s examine how the Bayes factor works in the simplest case, where we are considering only one hypothesis. As we saw in the non-contrastive likelihood principle, we can determine whether E is evidence for or against a single hypothesis H by comparing pr(E|H) with pr(E|~H). This same comparison also gives us a measure of evidential strength. Specifically, the ratio of those two conditional probabilities provides a measure of the degree to which E supports H. This ratio is the Bayes factor of E relative to H:

$$\text{Bayes factor of } E \text{ (relative to } H\text{)} = \frac{\mathrm{pr}(E \mid H)}{\mathrm{pr}(E \mid {\sim}H)}$$

For instance, if E is 10 times more likely given H than given ~H, the Bayes factor is 10. The greater the Bayes factor, the stronger the evidence. A proposition with a Bayes factor of 100 (relative to hypothesis H) provides stronger evidence for H than a proposition with a Bayes factor of only 10, for instance. Similarly, propositions with a Bayes factor less than 1 count as evidence against H; and the closer the Bayes factor is to zero, the more heavily the evidence weighs against H.

Imagine you are a historian trying to figure out whether Herbert Hoover hated horses. In search of evidence for equine enmity, you might read his memoirs to see if he ever eschews equestrian excursions. If you find that he eschews equestrianism, this will support your hypothesis, since he’s more likely to eschew equestrianism if he hates horses than if he doesn’t.

How strong is this evidence? That depends on your prior conditional credences. Perhaps you consider it unlikely that Hoover’s memoirs will mention equestrianism at all. Still, he’s more likely to execrate equestrianism in his memoirs if he hates horses than if he doesn’t. Let’s suppose your prior conditional credences are pr(E|H) = .1 and pr(E|~H) = .001, where E is the proposition that he eschews equestrianism and H is the hypothesis that he hates horses. In this case, the Bayes factor is 100, since you think E is 100 times more likely if H is true than if H is false:
$$\frac{\mathrm{pr}(E \mid H)}{\mathrm{pr}(E \mid {\sim}H)} = \frac{.1}{.001} = 100$$
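The same calculation can be sketched in a few lines of Python. This is only an illustration: the function name bayes_factor is a hypothetical helper, not something defined in the text, and the reversed call at the end is an invented case added to show a Bayes factor below 1.

```python
def bayes_factor(p_e_given_h, p_e_given_not_h):
    """Return pr(E|H) / pr(E|~H), the Bayes factor of E relative to H."""
    if p_e_given_not_h == 0:
        # The ratio is undefined when pr(E|~H) is zero.
        raise ValueError("pr(E|~H) must be greater than zero")
    return p_e_given_h / p_e_given_not_h


# Prior conditional credences from the Hoover example.
hoover_bf = bayes_factor(0.1, 0.001)
print(round(hoover_bf, 6))

# Hypothetical reversed case: E is 100 times LESS likely given H,
# so the Bayes factor is below 1 and E counts as evidence against H.
reversed_bf = bayes_factor(0.001, 0.1)
print(round(reversed_bf, 6))
```

A Bayes factor above 1 indicates evidence for H, and a value between 0 and 1 indicates evidence against H, matching the rule stated earlier.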

By the way, it seems Hoover did, in fact, hate horses. He once remarked: “I have often wondered if a mistake had been made when God created the horse.” (Quoted by Hoover’s friend Ding Darling. And yes, Ding Darling was a real person. I promise I’m not making any of this up! If you don’t believe me, here’s proof from the National Archives: Matthew Schaefer, “Heroes on Horseback; Hoover on Horses,” 2021.)

It’s important not to confuse the Bayes factor with the Bayesian multiplier, defined previously. For comparison, here are both formulas:

$$\text{Bayes factor} = \frac{\mathrm{pr}(E \mid H)}{\mathrm{pr}(E \mid {\sim}H)}$$

$$\text{Bayesian multiplier} = \frac{\mathrm{pr}(E \mid H)}{\mathrm{pr}(E)}$$

Only the Bayes factor, not the Bayesian multiplier, provides a measure of evidential strength. The Bayesian multiplier is the number by which you should multiply your prior credence in H upon learning E. Since a large Bayesian multiplier means a large increase in your credence, it might seem to indicate strong evidence. However, the size of the Bayesian multiplier is not a good measure of the strength of the evidence. Here’s why. It is possible for weak evidence to result in a large shift in your credence; and, conversely, it’s possible for strong evidence to result in a small shift in your credence, as illustrated in the following example.

To see how strong evidence might nonetheless yield a small Bayesian multiplier, consider an extreme case: suppose evidence E logically entails hypothesis H. Logical entailment is the strongest possible evidential relationship. No evidence can support H more strongly than E does, since E literally guarantees the truth of H. Nevertheless, the Bayesian multiplier might be small, depending on your credence in H prior to learning E. If you already were 95% confident in H before you learned E, your credence can only shift upward by 5 percentage points, so the Bayesian multiplier is just barely more than 1. (Since your credence in H must increase from 95% to 100% when you learn E, the Bayesian multiplier in this case is 100/95, or approximately 1.05.) That’s why the Bayesian multiplier isn’t a good measure of evidential strength. The Bayes factor, in contrast, correctly indicates that logical entailment is the strongest possible evidential relationship. If E entails H, then pr(E|~H) is zero, so the Bayes factor is infinite! (Strictly speaking, the Bayes factor is undefined in this case, but we can express the crucial idea more carefully: as the strength of evidence E approaches that of logical entailment, the denominator pr(E|~H) approaches zero, so the Bayes factor approaches infinity.)
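To make the contrast concrete, here is a minimal Python sketch comparing the two quantities in two scenarios. Only the 95% prior comes from the text. Every other number is an assumed value chosen for illustration, and the function names are hypothetical. pr(E) is computed with the law of total probability.

```python
def bayes_factor(p_e_given_h, p_e_given_not_h):
    """pr(E|H) / pr(E|~H)."""
    return p_e_given_h / p_e_given_not_h


def bayesian_multiplier(p_e_given_h, p_e_given_not_h, p_h):
    """pr(E|H) / pr(E), with pr(E) from the law of total probability."""
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
    return p_e_given_h / p_e


# Scenario A: near-entailment, high prior. pr(H) = .95 is from the text;
# pr(E|H) = .5 and pr(E|~H) = .0001 are assumed values.
bf_a = bayes_factor(0.5, 0.0001)
mult_a = bayesian_multiplier(0.5, 0.0001, p_h=0.95)

# Scenario B: much weaker evidence, low prior (all values assumed).
bf_b = bayes_factor(0.5, 0.05)
mult_b = bayesian_multiplier(0.5, 0.05, p_h=0.001)

print(f"A: Bayes factor = {bf_a:.1f}, multiplier = {mult_a:.4f}")
print(f"B: Bayes factor = {bf_b:.1f}, multiplier = {mult_b:.4f}")
```

Under these assumptions, the evidence in scenario A is far stronger by the Bayes factor, yet its Bayesian multiplier stays just under 100/95, while the weaker evidence in scenario B has a multiplier close to 10. The multiplier depends on the prior credence in H, whereas the Bayes factor does not.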

Now, let’s return to the contrastive version of the likelihood principle. It says that evidence E favors whichever hypothesis makes E more likely. What does it mean for the evidence to favor one hypothesis over another? It means that the ratio of your credences should shift in favor of that hypothesis. In particular, if hypothesis H1 makes E more likely than H2 does, then the following ratio should increase when you learn E:
$$\frac{\mathrm{pr}(H_1)}{\mathrm{pr}(H_2)}$$

This leads to another question: how much should that ratio increase? Remarkably, the answer to this question is the same as the answer to the previous “how much” question discussed above: how much bigger is pr(E|H1) than pr(E|H2)? The answer to both questions is the Bayes factor. Not only does the Bayes factor tell us how much more likely one hypothesis makes the evidence, compared to a rival hypothesis. It also tells us how much the ratio of our credences in those two hypotheses should change when we encounter this evidence. The Bayes factor is precisely the amount by which the ratio of your unconditional credences in H1 and H2 should change when you learn E. In other words, when you conditionalize on E, the ratio of your posterior credences equals the prior ratio times the Bayes factor:
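$$\frac{\mathrm{pr}(H_1 \mid E)}{\mathrm{pr}(H_2 \mid E)} = \frac{\mathrm{pr}(H_1)}{\mathrm{pr}(H_2)} \times \frac{\mathrm{pr}(E \mid H_1)}{\mathrm{pr}(E \mid H_2)}$$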
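For readers who want to see where this equation comes from, here is a short derivation sketch. It assumes that pr(E), pr(H2), and pr(E|H2) are all greater than zero, so that every ratio is defined. Apply Bayes’s theorem to each hypothesis separately:

$$\mathrm{pr}(H_1 \mid E) = \frac{\mathrm{pr}(H_1)\,\mathrm{pr}(E \mid H_1)}{\mathrm{pr}(E)} \qquad \mathrm{pr}(H_2 \mid E) = \frac{\mathrm{pr}(H_2)\,\mathrm{pr}(E \mid H_2)}{\mathrm{pr}(E)}$$

Dividing the first equation by the second, the common denominator pr(E) cancels:

$$\frac{\mathrm{pr}(H_1 \mid E)}{\mathrm{pr}(H_2 \mid E)} = \frac{\mathrm{pr}(H_1)\,\mathrm{pr}(E \mid H_1)}{\mathrm{pr}(H_2)\,\mathrm{pr}(E \mid H_2)} = \frac{\mathrm{pr}(H_1)}{\mathrm{pr}(H_2)} \times \frac{\mathrm{pr}(E \mid H_1)}{\mathrm{pr}(E \mid H_2)}$$

Because pr(E) drops out, the change in the ratio depends only on the two conditional probabilities pr(E|H1) and pr(E|H2), not on how likely E was overall.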

To illustrate this last equation, let’s return to the missing car keys example, focusing on the hypotheses that the keys were borrowed or stolen. Let’s suppose your prior credences in these two hypotheses are pr(B) = .4 and pr(S) = .2, respectively. (These probabilities don’t add up to 1 because there is a third hypothesis that we are not presently considering: the hypothesis that the keys were merely misplaced.)

Now, you are about to check whether the car is in the driveway. Your prior conditional credence pr(C|B) is .25, as specified on the previous page, and pr(C|S) is much lower: .01, let’s say. Plugging these values into the equation above, we have:
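$$\frac{\mathrm{pr}(B \mid C)}{\mathrm{pr}(S \mid C)} = \frac{\mathrm{pr}(B)}{\mathrm{pr}(S)} \times \frac{\mathrm{pr}(C \mid B)}{\mathrm{pr}(C \mid S)} = \frac{.4}{.2} \times \frac{.25}{.01} = 2 \times 25$$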
The Bayes factor in this example is 25. This means that although you already were twice as confident in B as in S, the ratio of these credences will increase by a factor of 25 when you learn that the car is still in the driveway. Obviously, the presence of the car in the driveway is strong evidence favoring borrowed over stolen, even though it also provides evidence against both of those hypotheses (since it favors misplaced over each of them, as we saw on the previous page).
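The arithmetic in the car keys example can be checked with a short Python sketch. The four probabilities are the ones given above; the variable names are illustrative only. The last step cross-checks the result by computing the ratio of the unnormalized posteriors pr(H) × pr(C|H) directly.

```python
# Prior credences and prior conditional credences from the example.
prior_b, prior_s = 0.4, 0.2            # pr(B), pr(S)
p_c_given_b, p_c_given_s = 0.25, 0.01  # pr(C|B), pr(C|S)

prior_ratio = prior_b / prior_s           # about 2
bayes_factor = p_c_given_b / p_c_given_s  # about 25

# Posterior ratio = prior ratio times Bayes factor.
posterior_ratio = prior_ratio * bayes_factor

# Cross-check: pr(C) cancels, so the posterior ratio equals the ratio
# of the unnormalized posteriors.
check = (prior_b * p_c_given_b) / (prior_s * p_c_given_s)

print(round(prior_ratio, 6), round(bayes_factor, 6), round(posterior_ratio, 6))
```

After you learn that the car is in the driveway, you should be about 50 times as confident in borrowed as in stolen, up from twice as confident beforehand.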