Contingency Tables

A contingency table summarizes bivariate data, that is, data on two variables.

Smallpox in Boston (1726)

                  Inoculated
Result       yes       no     total
lived        238     5136      5374
died           6      844       850
total        244     5980      6224

  • 5136 is the count of people who lived AND were not inoculated. 
  • 6224 is the total number of observations.
  • 244 is the total number of people who were inoculated.
  • 5374 is the total number of people who lived.

Contingency Tables

  • A contingency table is essentially a two-variable frequency distribution.
  • We can convert the counts to proportions by dividing each count by the total number of observations.

 

                  Inoculated
Result       yes       no     total
lived     0.0382   0.8252    0.8634
died      0.0010   0.1356    0.1366
total     0.0392   0.9608    1.0000

  • 0.8252 is the proportion of people who lived AND were not inoculated. 
  • 1.0000 is the proportion of all observations. Think of this as 100% of the observations.
  • 0.0392 is the proportion of people who were inoculated.
  • 0.8634 is the proportion of people who lived.
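
The conversion from counts to proportions can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes; the names `counts`, `joint`, `row_totals`, and `col_totals` are chosen here for the example.

```python
# Counts from the Boston smallpox table, indexed as counts[result][inoculated].
counts = {
    "lived": {"yes": 238, "no": 5136},
    "died": {"yes": 6, "no": 844},
}

# Total number of observations (the bottom-right cell of the table).
n = sum(sum(row.values()) for row in counts.values())

# Cell proportions: each count divided by the total number of observations.
joint = {
    result: {status: c / n for status, c in row.items()}
    for result, row in counts.items()
}

# Row and column totals, also expressed as proportions of all observations.
row_totals = {result: sum(row.values()) / n for result, row in counts.items()}
col_totals = {
    status: sum(counts[result][status] for result in counts) / n
    for status in ("yes", "no")
}

print(n)                                # 6224
print(round(joint["lived"]["no"], 4))   # 0.8252
print(round(row_totals["lived"], 4))    # 0.8634
print(round(col_totals["yes"], 4))      # 0.0392
```

The printed values match the corresponding cells of the proportion table after rounding to four decimal places.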

 

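  • The row and column totals of the proportion table are marginal probabilities.
  • The probability of two events occurring together (\(A\) and \(B\)) is a joint probability. The interior cells of the proportion table are joint probabilities.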

What can we learn about the result of smallpox if we already know something about inoculation status?

  • For example, given that a person is inoculated, what is the probability of death?
  • To figure this out, we restrict our attention to the 244 inoculated cases. Of these, 6 died. So the probability is 6/244, or about 0.0246.

Conditional Probability

Conditional probability: the probability of some event \(A\) given that event \(B\) occurred (or is true): \[P(A|B) = \frac{P(A\text{ and }B)}{P(B)},\] where the symbol | is read as “given” and \(P(B) > 0\).

  • For death given inoculation, \[\begin{align} P(\text{death}|\text{inoculation}) &= \frac{P(\text{death and inoculation})}{P(\text{inoculation})} \\ &= \frac{0.0010}{0.0392} \approx 0.0255 \end{align}\]
  • We could also write this using the counts, which avoids rounding the proportions first: \[\begin{align} P(\text{death}|\text{inoculation}) &= \frac{P(\text{death and inoculation})}{P(\text{inoculation})} \\ &= \frac{6/6224}{244/6224} = \frac{6}{244} \approx 0.0246 \end{align}\] The small difference from 0.0255 comes from rounding 6/6224 to 0.0010.
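
As a rough check, the two routes to \(P(\text{death}|\text{inoculation})\) can be compared in Python. This sketch uses exact fractions so that no rounding enters until the end; the variable names are illustrative and not from the original notes.

```python
from fractions import Fraction

died_and_inoculated = 6   # count of people who died AND were inoculated
inoculated = 244          # count of people who were inoculated
n = 6224                  # total number of observations

# Route 1: restrict attention to the inoculated cases.
p_restricted = Fraction(died_and_inoculated, inoculated)

# Route 2: joint probability divided by the probability of inoculation.
p_joint = Fraction(died_and_inoculated, n)
p_inoculated = Fraction(inoculated, n)
p_formula = p_joint / p_inoculated

print(p_restricted == p_formula)      # True
print(round(float(p_formula), 4))     # 0.0246

# Using the already-rounded table entries gives a slightly different answer.
print(round(0.0010 / 0.0392, 4))      # 0.0255
```

Working from counts (or unrounded proportions) is the safer habit, since rounding a small probability such as 6/6224 can noticeably shift the ratio.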

Independent Events

If knowing whether event \(B\) occurs tells us nothing about event \(A\), the events are independent. For example, if we know that the first flip of a (fair) coin came up heads, that doesn’t tell us anything about what will happen next time we flip that coin.

We can test for independence by checking if \(P(A|B)=P(A)\).
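
To illustrate this check, the sketch below compares \(P(\text{death}|\text{inoculation})\) with \(P(\text{death})\) using the smallpox counts. It is a literal comparison of the two numbers, not a formal statistical test; with real data the two values will rarely be exactly equal, and the tolerance `rel_tol` is an assumption made for this example.

```python
import math

# Probability of death overall: 850 deaths out of 6224 observations.
p_death = 850 / 6224

# Probability of death given inoculation: 6 deaths among 244 inoculated.
p_death_given_inoculated = 6 / 244

# Independence would require P(death | inoculation) = P(death).
# rel_tol is an arbitrary tolerance chosen for this illustration.
independent = math.isclose(p_death_given_inoculated, p_death, rel_tol=1e-9)

print(round(p_death, 4))                    # 0.1366
print(round(p_death_given_inoculated, 4))   # 0.0246
print(independent)                          # False
```

Because the two probabilities differ substantially, knowing that a person was inoculated does change the probability of death, so these events do not appear to be independent.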

Multiplication Rule for Independent Processes

If \(A\) and \(B\) are independent events, then \[P(A \text{ and }B) = P(A)P(B).\]

  • We can extend this to more than two events: \[P(A \text{ and }B \text{ and } C \text{ and } \dots) = P(A)P(B)P(C)\dots.\]
  • Note that if \(P(A \text{ and }B) \ne P(A)P(B)\), then \(A\) and \(B\) are not independent.

Example

Find the probability of rolling a \(6\) on your first roll of a die and a \(6\) on your second roll.

Let \(A=\) (rolling a \(6\) on first roll) and \(B=\) (rolling a \(6\) on second roll). For each roll, the probability of getting a \(6\) is \(1/6\), so \(P(A) = \frac{1}{6}\) and \(P(B) = \frac{1}{6}\).

Then, because the two rolls are independent of each other, \[P(A \text{ and }B) = P(A)P(B) = \frac{1}{6}\times\frac{1}{6} = \frac{1}{36}.\]
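
The result can be double-checked by listing all equally likely outcomes of two rolls of a fair six-sided die. The following is a small sketch (variable names are illustrative) that compares the enumerated probability with the multiplication rule.

```python
from fractions import Fraction
from itertools import product

# All ordered pairs (first roll, second roll) for a fair six-sided die.
outcomes = list(product(range(1, 7), repeat=2))

# Outcomes where both rolls show a 6.
favorable = [o for o in outcomes if o == (6, 6)]

# Probability by counting equally likely outcomes.
p_enumerated = Fraction(len(favorable), len(outcomes))

# Probability by the multiplication rule for independent events.
p_rule = Fraction(1, 6) * Fraction(1, 6)

print(len(outcomes))           # 36
print(p_enumerated, p_rule)    # 1/36 1/36
```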

General Multiplication Rule

If \(A\) and \(B\) are any two events, then \[P(A \text{ and }B) = P(A|B)P(B).\]

  • This is just the conditional probability formula, rewritten in terms of \(P(A \text{ and }B)\)!
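
As a quick sanity check of the general rule, the sketch below recovers the joint probability of death and inoculation from the smallpox counts. The variable names are illustrative, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# P(inoculation): 244 inoculated out of 6224 observations.
p_inoculated = Fraction(244, 6224)

# P(death | inoculation): 6 deaths among the 244 inoculated.
p_death_given_inoculated = Fraction(6, 244)

# General multiplication rule: P(A and B) = P(A|B) P(B).
p_death_and_inoculated = p_death_given_inoculated * p_inoculated

print(p_death_and_inoculated == Fraction(6, 6224))   # True
print(round(float(p_death_and_inoculated), 4))       # 0.001
```

The product equals 6/6224, the proportion of people who died and were inoculated, which is the joint probability computed directly from the counts.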

Checkpoint

Suppose we know that 38.4% of US households have dogs and that among those with dogs, 23.1% have cats. Find the probability that a US household has both dogs and cats.