Measures of Variability

How much do the data vary?

Should we care? Yes! The more variable the data, the harder it is to be confident in our measures of center!

  • If you live in a place with extremely variable weather, it is going to be much harder to be confident in how to dress for tomorrow’s weather.
  • If you live in a place where the weather is always the same, it’s much easier to be confident in what you plan to wear.

Range

One easy way to think about variability is the range of the data: \[\text{range} = \text{maximum} - \text{minimum}\]

  • This is quick and convenient, but it is extremely sensitive to extreme values!
  • It also takes into account only two of the observations.
    • We would prefer a measure of variability that takes into account all the observations.
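As a quick illustration of how sensitive the range is, here is a minimal Python sketch. The temperature values and the helper name `data_range` are made up for this example and are not from the notes; the point is only that one extreme observation changes the range dramatically.

```python
# Hypothetical daily temperatures, chosen only for illustration.
temps = [68, 70, 71, 69, 72]


def data_range(values):
    """Return the range: maximum minus minimum."""
    return max(values) - min(values)


print(data_range(temps))          # 4
print(data_range(temps + [110]))  # 42 (one extreme value dominates)
```

Only the largest and smallest values enter the calculation, so the other observations could change freely without affecting the result.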

Deviation

A deviation is the signed distance of an observation from the mean: \[x - \bar{x}\]

  • We want to think about how far, on average, a typical observation is from the center.
  • We might think to take the average deviation, but it turns out that the deviations always sum to 0!
    • Conceptually, this is because the deviations below the mean (negative numbers) and the deviations above the mean (positive numbers) cancel each other out exactly.
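The cancellation can also be seen algebraically: \(\sum (x_i - \bar{x}) = \sum x_i - n\bar{x} = 0\), because \(\bar{x} = \frac{1}{n}\sum x_i\). The sketch below checks this numerically in Python for the five values 2, 5, 3, 4, 2 used in this section. The variable names are illustrative, and the comparison uses a tolerance because floating-point arithmetic may leave a tiny rounding error.

```python
import math

x = [2, 5, 3, 4, 2]
mean = sum(x) / len(x)

# Each deviation is the observation minus the mean.
deviations = [xi - mean for xi in x]
total = sum(deviations)

# Compare to 0 with a tolerance rather than testing exact equality.
print(math.isclose(total, 0.0, abs_tol=1e-9))  # True
```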

One way to deal with this is to make all of the numbers nonnegative, which we accomplish by squaring each deviation.

Observation          Deviation          Squared deviation
\(x\)                \(x - \bar{x}\)    \((x - \bar{x})^2\)
2                    -1.2               1.44
5                    1.8                3.24
3                    -0.2               0.04
4                    0.8                0.64
2                    -1.2               1.44
\(\bar{x} = 3.2\)    Total = 0          Total = 6.8
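The table can be reproduced with a short Python sketch like the one below. It is only one possible way to lay out the calculation; the names `rows` and `total_sq` are illustrative.

```python
x = [2, 5, 3, 4, 2]
mean = sum(x) / len(x)

# One row per observation: (x, deviation, squared deviation).
rows = [(xi, xi - mean, (xi - mean) ** 2) for xi in x]

for xi, dev, sq in rows:
    print(f"{xi:>2} {dev:>6.2f} {sq:>6.2f}")

# Add up the squared deviations in the last column.
total_sq = sum(sq for _, _, sq in rows)
print(f"mean = {mean:.1f}, total squared deviation = {total_sq:.2f}")
```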

Variance

Variance (denoted \(s^2\)) is the “average” squared distance from the mean: \[ s^2 = \frac{(x_1-\bar{x})^2 + (x_2-\bar{x})^2 + \dots + (x_n-\bar{x})^2}{n-1} \] where \(n\) is the sample size.

  • Notice that we divide by \(n-1\) and NOT by \(n\).
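To make the formula concrete, the following Python sketch computes \(s^2\) for the values 2, 5, 3, 4, 2, whose squared deviations total 6.8. It also divides by \(n\) purely for contrast, and compares the hand calculation with `statistics.variance` from Python's standard library, which divides by \(n-1\). The variable names are illustrative.

```python
import statistics

x = [2, 5, 3, 4, 2]
n = len(x)
mean = sum(x) / n
sum_sq = sum((xi - mean) ** 2 for xi in x)

sample_var = sum_sq / (n - 1)  # the s^2 defined above: 6.8 / 4
divide_by_n = sum_sq / n       # shown only for contrast: 6.8 / 5

print(round(sample_var, 4))               # 1.7
print(round(divide_by_n, 4))              # 1.36
print(round(statistics.variance(x), 4))   # 1.7
```

Dividing by \(n-1\) gives a slightly larger value than dividing by \(n\), and the difference shrinks as the sample size grows.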

Standard Deviation

Finally, we come to standard deviation (denoted \(s\)).

  • The standard deviation is the square root of the variance. \[s = \sqrt{s^2}\]
  • We say that a “typical” observation is within about one standard deviation of the mean (between \(\bar{x}-s\) and \(\bar{x}+s\)).
  • We will use a computer to calculate the variance and standard deviation.
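As one example of letting a computer do the work, here is a minimal sketch using Python's standard `statistics` module. The choice of Python is an assumption made for illustration, since the notes do not name a tool. It computes \(s\) for the values 2, 5, 3, 4, 2 and lists the observations that fall between \(\bar{x}-s\) and \(\bar{x}+s\).

```python
import statistics

x = [2, 5, 3, 4, 2]
mean = statistics.mean(x)
s = statistics.stdev(x)  # sample standard deviation (divides by n - 1)

# Interval of "typical" observations: within one standard deviation of the mean.
low, high = mean - s, mean + s
within = [xi for xi in x if low <= xi <= high]

print(round(s, 3))  # 1.304
print(within)       # [2, 3, 4, 2]
```

Here four of the five observations lie within one standard deviation of the mean; only the value 5 falls outside.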

We will think about one more measure of variability, the interquartile range, in the next section.