Measures of Central Tendency

What values are most common or most likely?

Three ways to measure:

  • mode
  • mean
  • median

Mode

Mode: the most commonly occurring value.

  • Used when working with categorical variables.

Mean

Mean: this is what we usually think of as the “average”.

  • Denoted \(\bar{x}\).
  • Add up all of the values and divide by the number of observations (\(n\)): \[ \bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \sum_{i=1}^n \frac{x_i}{n} \]
  • \(x_i\) is the \(i\)th observation a
  • \(\sum_{i=1}^n\) is the sum of all observations from 1 through \(n\).
    • This is called summation notation.

Median

Median: the middle number when the data are ordered from smallest to largest.

  • If there are an odd number of observations, this will be the number in the middle:
    {1, 3, 7, 9, 9} has median 7
  • If there are an even number of observations, there will be two numbers in the middle. The median will be their average.
    {1, 2, 4, 7, 9, 9} has median \(\frac{4+7}{2}=5.5\)

The mean is sensitive to extreme values and skew. The median is not!

\(x\): 1, 3, 7, 9, 9 \(y\): 1, 3, 7, 9, 45
\(\text{median} = 7\) \(\text{median} = 7\)
\(\bar{x} = \frac{29}{5} = 5.8\) \(\bar{y} = \frac{65}{5} = 13\)


Changing that 9 out for a 45 changes the mean a lot! But the median is 7 for both \(x\) and \(y\).

Because the median is not affected by extreme observations or skew, we say it is a resistant measure or that it is robust.

Which measure should we use?

  • Mean: symmetric, numeric data
  • Median: skewed, numeric data
  • Mode: categorical data

Note: If the mean and median are roughly equal, it is reasonable to assume the distribution is roughly symmetric.