Bayes’ Theorem (Part 2)
One of the things that makes Bayes’ Theorem particularly useful is that it allows us to update our probabilities as we receive new information. So let’s suppose that we have a fourth marble.
And it is also white. How can we update our probabilities?
We would set up the same type of Bayes' theorem formula again, but these values, P(A1) and P(A2), become our new prior probabilities. So we have the same Bayes' theorem formula, but instead of using one-half and one-half by the principle of indifference, we can now use eight-ninths and one-ninth.
Let me demonstrate what I mean.
The probability that we have urn 1, given the three whites and a fourth white, is equal to the probability of drawing a white marble from urn 1, times the prior probability that urn 1 is the urn we are observing, divided by the total probability of the data.
That total is the probability of observing white given urn 1, times the prior for urn 1 (8/9), plus the probability of observing white given urn 2, times the prior for urn 2 (1/9). So on top we have 0.2 times 8/9, and on the bottom we have (0.2)(8/9) plus (0.1)(1/9).
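In symbols, writing U1 and U2 for the two urns and W for a white draw (my notation, not the lecture's), the update looks like this:

```latex
% Update for the fourth white marble W, using the posteriors from the
% first three draws (8/9 and 1/9) as the new priors.
\[
P(U_1 \mid W)
  = \frac{P(W \mid U_1)\,P(U_1)}{P(W \mid U_1)\,P(U_1) + P(W \mid U_2)\,P(U_2)}
  = \frac{(0.2)\left(\tfrac{8}{9}\right)}
         {(0.2)\left(\tfrac{8}{9}\right) + (0.1)\left(\tfrac{1}{9}\right)}
  = \frac{1.6}{1.7} \approx 0.9412 .
\]
```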
If we convert these ratios to percentages, the probability of urn 1 is 94.12 percent and the probability of urn 2 is 5.88 percent.
So the probability of urn 2 is now 5.88 percent, down from its previous value of 11.1 percent, while the probability of urn 1 has risen from 88.9 percent to 94.12 percent.
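As a quick check of these numbers, here is a minimal Python sketch (the function and variable names are mine, not from the lecture) that runs the same sequential update, first on the three white marbles and then on the fourth:

```python
def update(priors, likelihoods):
    """One Bayes update: posterior_i is proportional to likelihood_i * prior_i."""
    unnormalized = [lik * p for lik, p in zip(likelihoods, priors)]
    marginal = sum(unnormalized)  # total probability of the data
    return [u / marginal for u in unnormalized]

# P(white | urn 1) = 0.2, P(white | urn 2) = 0.1
white_likelihoods = [0.2, 0.1]

beliefs = [0.5, 0.5]  # principle of indifference
for _ in range(3):    # the first three white marbles
    beliefs = update(beliefs, white_likelihoods)
print(beliefs)        # ~[0.889, 0.111], i.e. 8/9 and 1/9

beliefs = update(beliefs, white_likelihoods)  # the fourth white marble
print(beliefs)        # ~[0.9412, 0.0588]
```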
The different parts of Bayes' theorem have special names, and here you can see theta used to indicate the different parameters that are possible.
In this problem, urn 1 had a 0.2 probability of white marbles and urn 2 had a 0.1 probability of white marbles.
You can still use A if that notation seems more familiar and comfortable. The posterior probability is the probability after the new data are observed, and it is the result of the calculation. It is our best estimate, after the new data, of the probability that the 0.2 parameter is the one in play, or that the 0.1 parameter is the one in play.
The prior probability is the value we start with: either the probability before any data are observed, or the value carried over from a previous update, before the new data are observed.
Before any data we had 0.5 and 0.5, while before the new, fourth draw we had eight-ninths and one-ninth.
The likelihood of the data given each parameter theta-sub-i is the familiar forward probability. The total probability of the data on the bottom is called the marginal probability.
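To tie these names to the numbers in this problem, here is a small Python sketch (again my own labels, not the lecture's) that marks which piece is the prior, the likelihood, the marginal, and the posterior for the fourth white draw:

```python
# theta_1 = 0.2 and theta_2 = 0.1 are the two candidate parameters
# (the white-marble probabilities of urn 1 and urn 2).
thetas = [0.2, 0.1]

priors = [8 / 9, 1 / 9]   # prior: what we believed before the fourth draw
likelihoods = thetas      # likelihood: P(white | theta_i), the ordinary
                          # "forward" probability of the data

# marginal: total probability of the data (the denominator)
marginal = sum(lik * p for lik, p in zip(likelihoods, priors))

# posterior: probability of each parameter after the new data
posteriors = [lik * p / marginal for lik, p in zip(likelihoods, priors)]
print(posteriors)         # ~[0.9412, 0.0588]
```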