Goal: formalize the concept of the strength of a linear relationship.
The correlation (or correlation coefficient) \(R\) between two variables describes the strength and direction of their linear relationship.
\[R = \frac{1}{n-1}\sum_{i=1}^n\left(\frac{x_i - \bar{x}}{s_x}\times\frac{y_i - \bar{y}}{s_y}\right)\]
- \(\bar{x}\) and \(\bar{y}\) are the sample means of \(x\) and \(y\).
- \(s_x\) and \(s_y\) are the respective sample standard deviations of \(x\) and \(y\).
- The sample size \(n\) is the total number of \((x,y)\) pairs.
- \(R\) always takes values between \(-1\) and \(1\), inclusive.
This is a pretty involved formula! We’ll let a computer handle this one.
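To see what the computer is doing, here is a minimal Python sketch that follows the formula term by term. The function name `correlation` and the small data set are made up for illustration, and the sketch assumes neither variable is constant (otherwise \(s_x\) or \(s_y\) is zero and the division fails).

```python
import statistics

def correlation(xs, ys):
    """Sample correlation R, computed directly from the formula above."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of values")
    n = len(xs)  # sample size: number of (x, y) pairs
    if n < 2:
        raise ValueError("need at least two (x, y) pairs")

    x_bar = statistics.mean(xs)   # sample mean of x
    y_bar = statistics.mean(ys)   # sample mean of y
    s_x = statistics.stdev(xs)    # sample standard deviation (divides by n - 1)
    s_y = statistics.stdev(ys)

    # Sum of the products of the standardized x and y values
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Illustrative data only (not from any real study)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(correlation(xs, ys), 4))  # prints 0.7746
```

In practice a library routine would usually be used instead. For example, Python 3.10 and later include `statistics.correlation`, which should give the same value for this data.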