How do we estimate \(\beta\)?

Consider the simple case where \(y = \beta_0 + \beta_1x + \epsilon\).

\[\begin{aligned} \hat{\beta}_0 &= \bar{y} - \hat{\beta}_1\bar{x} \\ \hat{\beta}_1 &= \frac{\sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^n(x_i-\bar{x})^2} \\ &= \frac{s_{xy}}{s_x^2} \\ &= r_{xy}\frac{s_y}{s_x} \end{aligned}\]

Notice how the sample correlation, covariance, variances, and coefficients are all related.
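We can check these identities numerically. A minimal sketch in R, using the three-point toy data from the plots below:

x <- c(0.25, 1, 1.75)
y <- c(13, 18, 19)

cov(x, y) / var(x)          # slope as s_xy / s_x^2
cor(x, y) * sd(y) / sd(x)   # slope as r_xy * s_y / s_x
coef(lm(y ~ x))[2]          # slope from lm(): all three agree

mean(y) - (cov(x, y) / var(x)) * mean(x)  # intercept as ybar - b1 * xbar
coef(lm(y ~ x))[1]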

Plotting the data along with two candidate lines and the least squares fit:

x <- c(0.25, 1, 1.75)
y <- c(13, 18, 19)
plot(x, y, pch = 20, cex = 4, xlim = c(0, 2), ylim = c(10, 20))
abline(12, 4, col = "blue", lwd = 2)  # candidate line: intercept 12, slope 4
abline(14, 4, col = "red", lwd = 2)   # candidate line: intercept 14, slope 4
abline(lm(y ~ x), lwd = 2)            # the least squares fit

Error

For a fitted line \(f\), the error at each observation is

\[e_i = y_i - f(x_i)\]

Our goal is to minimize the overall error across all \(n\) observations.

Why can’t we jump right in and minimize \(\sum_{i=1}^n e_i\)? Because the errors are signed: positive and negative errors cancel, so even a badly fitting line can have a total error of zero.
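To see the cancellation concretely, here is a minimal R sketch (the constant fit \(f(x) = \bar{y}\) is a deliberately bad choice): it ignores \(x\) entirely, yet its signed errors sum to zero.

x <- c(0.25, 1, 1.75)
y <- c(13, 18, 19)
e <- y - mean(y)   # errors from the constant fit f(x) = ybar
sum(e)             # 0 (up to floating point): the signed errors cancel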

Goal: Minimize Error

One possibility: absolute error \(|y_i - f(x_i)|\)

Another possibility: squared error \((y_i - f(x_i))^2\)

Squared error is used far more often than absolute error.

Why do you think that is? A few reasons: squared error is differentiable everywhere (absolute error is not differentiable at 0), it penalizes large errors more heavily, and, as we’ll see, it yields a closed-form solution.
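Both criteria can be minimized numerically, and they generally give different lines. A sketch using optim() on the toy data (Nelder-Mead copes with the non-smooth absolute error well enough here; dedicated tools such as quantreg::rq exist for least absolute deviations):

x <- c(0.25, 1, 1.75)
y <- c(13, 18, 19)

sae <- function(beta) sum(abs(y - (beta[1] + beta[2] * x)))
optim(c(0, 0), sae)$par   # least absolute deviations fit, approximately (12, 4)
coef(lm(y ~ x))           # least squares fit, (12.67, 4): a different line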

Least Squares: Minimizing Error

Least squares minimizes the overall squared vertical distance between the regression line and the observed \(y\) values.

  • That is, we minimize the sum of squared vertical distances between each point and the line (not the perpendicular distances), as sketched below.
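To visualize these vertical distances, here is a small R sketch that draws each residual as a dashed segment between the point and the fitted line:

x <- c(0.25, 1, 1.75)
y <- c(13, 18, 19)
fit <- lm(y ~ x)
plot(x, y, pch = 20, cex = 4, xlim = c(0, 2), ylim = c(10, 20))
abline(fit, lwd = 2)
segments(x, fitted(fit), x, y, col = "red", lty = 2)  # the residuals e_i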

Least Squares

Let \(f(x_i) = \beta_0 + \beta_1x_i\). Minimize \[\sum_{i=1}^n (y_i - f(x_i))^2\] to find estimates for \(\beta_0\) and \(\beta_1\).
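One way to see the minimization at work (a sketch on the toy data): evaluate the criterion at the least squares coefficients, then perturb them in any direction and watch the sum of squares grow.

x <- c(0.25, 1, 1.75)
y <- c(13, 18, 19)

sse <- function(beta) sum((y - (beta[1] + beta[2] * x))^2)
b <- coef(lm(y ~ x))
sse(b)               # the minimized sum of squares
sse(b + c(0.5, 0))   # shifting the intercept increases it
sse(b + c(0, -0.2))  # so does tilting the slope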

Note: least squares is a convex optimization problem.

  • That is, every local minimum is a global minimum.
    • (We don’t need to do any kind of second derivative check; a short justification is sketched below.)
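Why is it convex? A sketch of the standard argument, using the matrix notation introduced in the next section: the criterion is a quadratic in \(\beta\) whose Hessian is positive semidefinite.

\[\begin{aligned} g(\beta) &= (y - X\beta)^T(y - X\beta) \\ \nabla g(\beta) &= -2X^T(y - X\beta) \\ \nabla^2 g(\beta) &= 2X^TX \end{aligned}\]

Since \(v^T(2X^TX)v = 2\|Xv\|^2 \ge 0\) for every \(v\), the Hessian is positive semidefinite and \(g\) is convex.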

The general case

More generally, we can write the model as \(y = X\beta + \epsilon\), where \(X\) is the design matrix, and express the criterion with matrices: \[\sum_{i=1}^n\epsilon_i^2 = \epsilon^T\epsilon = (y-X\beta)^T(y-X\beta)\]

Differentiating with respect to \(\beta\) and setting the result to 0, we find that \(\hat{\beta}\) satisfies the normal equations

\[X^TX\hat{\beta} = X^Ty\]

…and if \(X^TX\) is invertible,

\[\begin{aligned} \hat{\beta} &= (X^TX)^{-1}X^Ty \\ X\hat{\beta} &= X(X^TX)^{-1}X^Ty \\ \hat{y} &= Hy \\ \end{aligned}\]

where \(H = X(X^TX)^{-1}X^T\) is called the hat matrix, since it “puts the hat on” \(y\).
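A minimal R sketch of the matrix route, again on the toy data (the explicit inverse is for exposition only; numerically, solve(crossprod(X), crossprod(X, y)) or simply lm() is preferable):

x <- c(0.25, 1, 1.75)
y <- c(13, 18, 19)

X <- cbind(1, x)                         # design matrix with an intercept column
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y
beta_hat                                 # matches coef(lm(y ~ x))

H <- X %*% solve(t(X) %*% X) %*% t(X)    # the hat matrix
y_hat <- H %*% y                         # H puts the hat on y
all.equal(as.vector(y_hat), unname(fitted(lm(y ~ x))))  # TRUE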