Residuals are the leftover variation in the data after accounting for the model fit: \[\text{data} = \text{prediction} + \text{residual}\]
Note: If an observation lands above the regression line, \(e > 0\). If below, \(e < 0\).
Goal: get each residual as close to 0 as possible.
To shrink the residuals toward 0, we minimize: \[ \begin{align} \sum_{i=1}^n e_i^2 &= \sum_{i=1}^n (y_i - \hat{y}_i)^2 \\ & = \sum_{i=1}^n [y_i - (b_0 + b_1 x_i)]^2 \end{align} \]
The values \(b_0\) and \(b_1\) that minimize this will make up our regression line.
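Setting the partial derivatives of the sum of squared residuals with respect to \(b_0\) and \(b_1\) equal to zero gives the minimizing values in terms of the sample statistics:

\[ b_1 = R\,\frac{s_y}{s_x}, \qquad b_0 = \bar{y} - b_1\,\bar{x} \]

These formulas are all that is needed for the exercises below.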
Consider a dataset with two variables: eruptions, the length of each eruption, and waiting, the time between eruptions. The sample statistics for these data are
|  | waiting | eruptions |
|---|---|---|
| mean | \(\bar{x}=70.90\) | \(\bar{y}=3.49\) |
| sd | \(s_x=13.60\) | \(s_y=1.14\) |

The correlation is \(R = 0.90\).
Find the regression line and interpret the parameters.
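As a sketch of the plug-in computation (summary statistics taken from the table above; the variable names here are my own), the slope and intercept follow from \(b_1 = R\,s_y/s_x\) and \(b_0 = \bar{y} - b_1\bar{x}\):

```python
# Summary statistics from the table: waiting is the explanatory
# variable (x), eruptions is the response (y).
x_bar, s_x = 70.90, 13.60
y_bar, s_y = 3.49, 1.14
r = 0.90

b1 = r * s_y / s_x       # slope: change in eruption length per minute of waiting
b0 = y_bar - b1 * x_bar  # intercept: prediction when waiting = 0
print(f"predicted eruptions = {b0:.2f} + {b1:.4f} * waiting")
```

The slope says each extra minute of waiting predicts about 0.075 more minutes of eruption; the intercept is an extrapolation, since no waiting times near 0 were observed.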
Consider a dataset on height and age of \(n=84\) Loblolly pine trees. We use height to predict tree age. The sample statistics for these data are
|  | height | age |
|---|---|---|
| mean | \(32.36\) | \(13.00\) |
| sd | \(20.67\) | \(7.90\) |

The correlation is \(R = 0.99\).
Find the regression line and interpret the parameters.
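The same plug-in computation can be sketched for the pine trees (statistics from the table above; here height plays the role of \(x\) and age the role of \(y\)):

```python
# Summary statistics from the table: height is the explanatory
# variable (x), age is the response (y).
x_bar, s_x = 32.36, 20.67
y_bar, s_y = 13.00, 7.90
r = 0.99

b1 = r * s_y / s_x       # slope: years of age per unit of height
b0 = y_bar - b1 * x_bar  # intercept: predicted age at height = 0
print(f"predicted age = {b0:.2f} + {b1:.3f} * height")
```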
The coefficient of determination, \(R^2\), is the square of the correlation coefficient. It measures the proportion of variation in the response that is explained by the regression line.
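A quick numerical check of this interpretation (the synthetic data and variable names here are my own): for a least-squares line, \(R^2\) equals \(1 - SSE/SST\), the fraction of total variation not left in the residuals.

```python
import random

random.seed(1)
n = 50
x = [random.uniform(0, 10) for _ in range(n)]
y = [2 + 0.5 * xi + random.gauss(0, 1) for xi in x]  # linear trend plus noise

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)       # total variation (SST)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

r = sxy / (sxx * syy) ** 0.5                   # correlation coefficient
b1 = sxy / sxx                                 # least-squares slope
b0 = y_bar - b1 * x_bar                        # least-squares intercept

# residual variation (SSE) around the fitted line
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
print(abs(r ** 2 - (1 - sse / syy)))           # ~0: R^2 = variation explained
```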