Bayes’ Theorem

Bayes’ theorem describes the probability of an event based on prior knowledge of conditions that might be related to the event.

Posterior probability: P(A│B)=P(A⋂B)/P(B)= P(B│A)×P(A)/P(B)


Exhibit 32.6 Dove buyers and Watsons shoppers. 40% of Dove buyers shop at Watsons.

Example 1: A new body wash brand Daisy has been launched. The brand is targeting the users of Dove, another body wash brand. From research studies, it is known that 5% of the consumers in this market are Dove buyers, and that a high proportion, 40%, of them shop at Watsons, a personal care retailer. It is also known that 25% of shoppers of personal care products shop at Watsons.

The brand manager is considering a sampling programme for Daisy at Watsons. However, she feels it will be viable only if at least 10% of Watsons shoppers are Dove buyers. Should she go ahead with the programme?

The answer is clear from Exhibit 32.6, which sums up the information. It can also be derived using Bayes’ theorem:

$$P(D): \text{Proportion of Dove buyers} = 0.05$$ $$P(W): \text{Proportion of Watsons shoppers} = 0.25$$ $$P(W|D): \text{Proportion of Dove buyers who shop at Watsons} = 0.40$$ $$P(D|W): \text{Proportion of Watsons shoppers who buy Dove}$$ $$P(D│W)=\frac{P(W│D)×P(D)}{P(W)} =\frac{0.40×0.05}{0.25}=0.08=8\%$$

Since the proportion of Watsons shoppers who buy Dove (8%) is less than the 10% threshold, the Daisy brand manager will not find it viable to conduct the proposed sampling exercise.
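The calculation in Example 1 can be sketched in a few lines of Python. This is only an illustration: the function name `bayes_posterior` and the variable names are assumptions introduced here, not part of the original text.

```python
def bayes_posterior(likelihood: float, prior: float, evidence: float) -> float:
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if evidence <= 0:
        raise ValueError("P(B) must be positive")
    return likelihood * prior / evidence


# Figures taken from Example 1.
p_dove = 0.05                # P(D): proportion of Dove buyers
p_watsons = 0.25             # P(W): proportion of Watsons shoppers
p_watsons_given_dove = 0.40  # P(W|D): Dove buyers who shop at Watsons

p_dove_given_watsons = bayes_posterior(p_watsons_given_dove, p_dove, p_watsons)

# The brand manager's viability rule: at least 10% of Watsons shoppers buy Dove.
viable = p_dove_given_watsons >= 0.10

print(f"P(D|W) = {p_dove_given_watsons:.2%}, viable: {viable}")
```

The result matches the 8% obtained above, which falls short of the 10% threshold.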

Example 2: A supermarket chain, Shop n Save (SnS), is facing stiff competition from market leader FairPrice (FP). The chain currently has a strong 32% base of regular shoppers, but is concerned that a high proportion of these regular shoppers are also shopping at FairPrice.

Based on published research on FairPrice, the SnS team knows that 85% of supermarket shoppers shop regularly at FP, and that 24% of these shoppers also shop regularly at SnS.

Based on this information, can the SnS team compute what proportion of their regular shoppers also shop at FP?

$$P(S): \text{Proportion who shop regularly at SnS} = 0.32$$ $$P(F): \text{Proportion who shop regularly at FP} = 0.85$$ $$P(S|F): \text{Proportion of FP regular shoppers who shop regularly at SnS} = 0.24$$ $$P(F|S): \text{Proportion of SnS regular shoppers who shop regularly at FP}$$ $$P(F│S)=\frac{P(S│F)×P(F)}{P(S)}=\frac{0.24×0.85}{0.32}=0.6375 \approx 64\%$$
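One way to check the figures in Example 2 is to rebuild the full two-by-two table of joint probabilities from the three given numbers. The sketch below does this; the variable names are assumptions chosen for illustration.

```python
# Figures taken from Example 2.
p_s = 0.32          # P(S): regular SnS shoppers
p_f = 0.85          # P(F): regular FP shoppers
p_s_given_f = 0.24  # P(S|F): FP regulars who also shop regularly at SnS

# Joint probability of shopping regularly at both chains.
p_both = p_s_given_f * p_f

# Remaining cells of the two-by-two table.
p_s_only = p_s - p_both
p_f_only = p_f - p_both
p_neither = 1.0 - (p_both + p_s_only + p_f_only)

# Bayes' theorem: share of SnS regulars who also shop regularly at FP.
p_f_given_s = p_both / p_s

print(f"both={p_both:.3f}  SnS only={p_s_only:.3f}  "
      f"FP only={p_f_only:.3f}  neither={p_neither:.3f}")
print(f"P(F|S) = {p_f_given_s:.2%}")
```

All four cells come out non-negative and sum to 1, so the inputs are mutually consistent, and the conditional probability agrees with the result above.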

Exhibit 32.7 Bayes theorem, deriving posterior probabilities from prior probabilities and new information.

As depicted in Exhibit 32.7, we often begin with initial or prior probabilities for certain events. Then we obtain additional information from, for instance, a sample of the universe. Given this information, we compute the revised or posterior probability.

Consider n mutually exclusive and collectively exhaustive events, A1, A2, …, An. Given that event B has occurred, we can use Bayes’ theorem to find the posterior probability:

$$P(A_i│B)=\frac{P(A_i⋂B)}{P(B)}=\frac{P(B│A_i)×P(A_i)}{P(B)}=\frac{P(B│A_i)×P(A_i)}{P(B│A_1)P(A_1)+P(B│A_2)P(A_2)+\cdots+P(B│A_n)P(A_n)}$$ $$P(A_i│B)=\frac{P(B│A_i)×P(A_i)}{P(B│A_1)P(A_1)+P(B│A_2)P(A_2)+\cdots+P(B│A_n)P(A_n)}$$
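The general formula translates directly into code. The following is a minimal sketch: the function name `posteriors` is an assumption, and the three-event priors and likelihoods in the usage lines are made-up illustrative numbers, not data from the text.

```python
from typing import Sequence


def posteriors(priors: Sequence[float], likelihoods: Sequence[float]) -> list[float]:
    """Return P(A_i|B) for every i, given P(A_i) and P(B|A_i).

    The events A_1..A_n are assumed to be mutually exclusive and
    collectively exhaustive, so the priors must sum to 1.
    """
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("priors must sum to 1")

    # Numerators: P(B|A_i) * P(A_i) = P(A_i and B).
    joints = [lik * prior for lik, prior in zip(likelihoods, priors)]

    # Denominator: P(B), by the law of total probability.
    evidence = sum(joints)
    if evidence == 0:
        raise ValueError("P(B) is zero; posteriors are undefined")

    return [joint / evidence for joint in joints]


# Hypothetical inputs, for illustration only.
example_priors = [0.5, 0.3, 0.2]
example_likelihoods = [0.01, 0.02, 0.05]
example_posteriors = posteriors(example_priors, example_likelihoods)
print([round(p, 3) for p in example_posteriors])
```

Because the denominator is the sum of the numerators, the posteriors always sum to 1.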

Example 3: If the cancer test results for a patient are positive (T+), what is the probability, P(C+|T+), that the patient has lung cancer (C+)?

Given:

  • Prevalence rate: 1% of the target population (aged above 50 years) develops lung cancer (C = lung cancer). P(C+) = 0.01, P(C−) = 0.99.
  • Sensitivity, P(T+|C+) = 95%. The test shows a positive result for 95% of patients with lung cancer. Sensitivity measures how well the test detects lung cancer when the patient actually has it.
  • False positive rate, P(T+|C−) = 7.5%. The test shows a positive result for 7.5% of patients who do not have lung cancer.
  • Specificity = 1 − P(T+|C−) = P(T−|C−) = 1 − 7.5% = 92.5%. In 92.5% of cases, the test correctly detects the absence of lung cancer when the patient does not have it.

NOTE: Laboratory tests reveal sensitivity and specificity. However, what the doctor needs to know is the likelihood that the patient has cancer, given that the test results are positive, i.e. the posterior probability, P(C+|T+).


$$P(T+ ∩ \,C+) = P(T+|\,C+) × P(C+) = 0.95 × 0.01 = 0.0095$$ $$P(T+ ∩ \,C-) = P(T+|\,C-) × P(C-) = 0.075 × 0.99 = 0.07425$$

Question: P(C+|T+): If the cancer test results for a patient are positive (T+), what is the probability the patient has cancer (C+)?

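$$P(T+) = P(T+ ∩ \,C+) + P(T+ ∩ \,C-) = 0.0095 + 0.07425 = 0.08375$$ $$P(C+│T+)=\frac{P(T+ ∩ \,C+)}{P(T+)} = \frac{0.0095}{0.08375} \approx 0.11$$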

There is an 11% probability that the patient has lung cancer, given that the test result is positive.
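The same calculation can be written as a small function. This is a sketch only: the name `positive_predictive_value` and its parameter names are assumptions introduced for illustration.

```python
def positive_predictive_value(prevalence: float,
                              sensitivity: float,
                              false_positive_rate: float) -> float:
    """Return P(C+|T+) from P(C+), P(T+|C+) and P(T+|C-)."""
    true_positive = sensitivity * prevalence                    # P(T+ and C+)
    false_positive = false_positive_rate * (1.0 - prevalence)   # P(T+ and C-)
    p_positive = true_positive + false_positive                 # P(T+)
    if p_positive == 0:
        raise ValueError("P(T+) is zero; the posterior is undefined")
    return true_positive / p_positive


# Figures taken from Example 3.
ppv = positive_predictive_value(prevalence=0.01,
                                sensitivity=0.95,
                                false_positive_rate=0.075)
print(f"P(C+|T+) = {ppv:.1%}")
```

The denominator is built from the two joint probabilities, so P(T+) does not need to be supplied separately.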

By applying Bayes’ theorem, we can transform the probabilities reported by a lab test or research study into the probabilities that are actually useful for decision-making.

In this example, if you underwent the cancer test and the result was positive, you would be terrified to learn that 95% of patients with cancer get the same positive result. On the other hand, the realization that the probability you have cancer is 11% is less terrifying. Though still not good news, it is much more useful and relevant information.
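It can help to restate the result as head counts. The sketch below applies the rates from Example 3 to a hypothetical cohort of 100,000 people; the cohort size is an assumption chosen only to give whole numbers.

```python
population = 100_000  # hypothetical cohort size, not from the text

# Integer arithmetic keeps the head counts exact.
with_cancer = population * 1 // 100            # 1% prevalence
without_cancer = population - with_cancer

true_positives = with_cancer * 95 // 100       # 95% sensitivity
false_positives = without_cancer * 75 // 1000  # 7.5% false positive rate

all_positives = true_positives + false_positives
share_with_cancer = true_positives / all_positives

print(f"{true_positives} of {all_positives} positive results "
      f"are true positives ({share_with_cancer:.1%})")
```

Of the 8,375 people who test positive, only 950 have cancer. The 7,425 false positives outnumber them because the disease is rare, which is why the posterior is so much lower than the sensitivity.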

Posterior Odds

By rules of conditional probability:

$$P(C+│\,T+)=\frac{P(T+ ∩\, C+)}{P(T+)} = \frac{P(T+│\,C+)×P(C+)}{P(T+)}$$ $$Posterior = Likelihood × \frac{Prior}{P(T+)}$$ $$\mathbf{Posterior ∝ Likelihood × Prior}$$

Likelihood is the data – in this case the likelihood the test is positive given that the patient has cancer.

Prior represents additional knowledge – 1% of the target population has cancer. Prior could come from anywhere. It could be a hunch, data taken from another study, anything we feel could have a bearing on the calculation.

Posterior is the information we are interested in: the probability that the patient has lung cancer, given that the test result is positive.

$$P(C+│\,T+)=\frac{P(T+ ∩\, C+)}{P(T+)} = \frac{P(T+│\,C+)×P(C+)}{P(T+)}$$ $$P(C−│\,T+)=\frac{P(T+ ∩\, C−)}{P(T+)} = \frac{P(T+│\,C−)×P(C−)}{P(T+)}$$

Taking ratios:

$$\frac{P(C+│\,T+)}{P(C−│\,T+)}=\frac{P(T+│\,C+)}{P(T+│\,C−)}×\frac{P(C+)}{P(C−)}$$ $$\mathbf{Posterior \,odds = likelihood \,ratio × prior \,odds}$$
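Applied to the lung cancer figures from Example 3, the odds form can be sketched as follows. The function names `posterior_odds` and `odds_to_probability` are assumptions introduced for illustration.

```python
def posterior_odds(prior_probability: float,
                   p_positive_given_present: float,
                   p_positive_given_absent: float) -> float:
    """Return posterior odds = likelihood ratio * prior odds."""
    prior_odds = prior_probability / (1.0 - prior_probability)
    likelihood_ratio = p_positive_given_present / p_positive_given_absent
    return likelihood_ratio * prior_odds


def odds_to_probability(odds: float) -> float:
    """Convert odds in favour of an event into its probability."""
    return odds / (1.0 + odds)


# Figures from Example 3: P(C+) = 0.01, P(T+|C+) = 0.95, P(T+|C-) = 0.075.
odds = posterior_odds(0.01, 0.95, 0.075)
probability = odds_to_probability(odds)
print(f"posterior odds = {odds:.4f}, P(C+|T+) = {probability:.1%}")
```

Note that P(T+) cancels when the ratio is taken, so the odds form never requires the denominator. Converting the posterior odds back to a probability recovers the same 11% as before.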

Example 4: A leading pharmaceutical firm segmented its customers based on their priorities and preferences. The segmentation process may be summed up in these 3 steps:

  • Qualitative study (Focus Groups) to understand priorities and preferences.
  • Quantitative study to measure customers’ preferences and priorities.
  • Cluster analysis to identify and name the segments.

The company identified 3 segments:

  • Quality seekers: Oriented towards clinical quality, less sensitive to cost of services.
  • Cost conscious customers: Priority on reducing their expenses.
  • Empowered health seekers: Choose to have greater control of their health options.

The firm is particularly interested in quality seeking customers, and wants to tailor services for this segment.

To identify these customers, a 5-minute screening questionnaire was developed.

Based on statistics from the quantitative study and cluster analysis, the probability that the screening test correctly identifies a quality seeker, P(T+|Q+), is 90%. However, 8% of customers who are not quality seekers also test positive, P(T+|Q−) = 8%. The cluster analysis also revealed that 20% of the firm’s customers are likely to be quality seekers, P(Q+) = 20%.

Based on this information, we can determine what proportion of customers who are screened as quality seekers are actually quality seekers:

$$P(T+)= P(T+ ∩\, Q+)+P(T+ ∩\, Q-) $$ $$P(T+ ∩\, Q+) = P(T+│\,Q+) × P(Q+) = 0.90 × 0.20 = 0.18$$ $$P(T+ ∩\, Q-) = P(T+│\,Q-) × P(Q-) = 0.08 × 0.80 = 0.064$$ $$P(T+)=0.18+ 0.064=0.244$$ $$P(Q+│\,T+)=\frac{P(T+ ∩\, Q+)}{P(T+)} = \frac{P(T+│\,Q+)×P(Q+)}{P(T+)} = \frac{0.90 ×0.20}{0.244}=73.8 \% $$

About 73.8% of customers who screen positive will actually be quality seekers.
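The screening calculation can be reproduced in a few lines. The sketch below also works through the negative branch, which the text does not compute, to show how many quality seekers the questionnaire would miss. Variable names are assumptions chosen for illustration.

```python
# Figures taken from Example 4.
p_q = 0.20               # P(Q+): share of customers who are quality seekers
p_pos_given_q = 0.90     # P(T+|Q+): screener correctly flags a quality seeker
p_pos_given_not_q = 0.08 # P(T+|Q-): screener wrongly flags a non-seeker

# Positive branch, as in the worked equations.
p_pos_and_q = p_pos_given_q * p_q
p_pos_and_not_q = p_pos_given_not_q * (1.0 - p_q)
p_pos = p_pos_and_q + p_pos_and_not_q
p_q_given_pos = p_pos_and_q / p_pos

# Negative branch (an extension, derived from the same three inputs).
p_neg_and_q = (1.0 - p_pos_given_q) * p_q
p_neg = 1.0 - p_pos
p_q_given_neg = p_neg_and_q / p_neg

print(f"P(T+) = {p_pos:.3f}")
print(f"P(Q+|T+) = {p_q_given_pos:.1%}")
print(f"P(Q+|T-) = {p_q_given_neg:.1%}")
```

Under these inputs, fewer than 3% of customers who screen negative are quality seekers, so the questionnaire misses relatively few of them.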
