Linear Equations

  • You should already have seen linear equations like \[y=mx+b\]
  • In statistics, we write these as \[y=b_0 + b_1x\]
    • \(b_0\) and \(b_1\) are constants.
    • \(x\) is the independent variable.
    • \(y\) is the dependent variable.
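
For example (with made-up constants), the line \(y = 2 + 0.5x\) has \(b_0 = 2\) and \(b_1 = 0.5\); plugging in \(x = 4\) gives \(y = 2 + 0.5 \times 4 = 4\).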

Slope and Intercept

\[y=b_0 + b_1x\]

  • The y-intercept is \(b_0\), the value of \(y\) when \(x=0\).
  • The slope is \(b_1\), the change in \(y\) for a 1-unit change in \(x\).
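
A quick numeric check of these two facts, with made-up constants (the same $8 and $3.25 that appear in the coffee example below):

```python
b0, b1 = 8.0, 3.25          # illustrative intercept and slope

def y(x):
    return b0 + b1 * x

print(y(0))                 # 8.0  -> the y-intercept: the value of y when x = 0
print(y(5) - y(4))          # 3.25 -> the slope: the change in y for a 1-unit change in x
```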

A scatterplot shows the relationship between two (numeric) variables.

We call this type of data bivariate data.

Sometimes the relationship can be modeled perfectly with a straight line. For example, if a pound of coffee costs $8 and each coffee drink costs $3.25, the total cost \(y\) of a pound of coffee plus \(x\) drinks is \[y = 8 + 3.25x\]

  • But what if that pound of coffee didn’t always cost $8?
  • Or the coffee drinks didn’t always cost $3.25?
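
As a sketch of that situation (the noise level below is made up), we can simulate totals where the prices vary a little and look at the resulting scatterplot:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = np.arange(1, 11)                             # number of coffee drinks
y = 8 + 3.25 * x + rng.normal(0, 1.5, x.size)    # prices vary a bit around $8 and $3.25

plt.scatter(x, y)                                # bivariate data: the points no longer line up
plt.plot(x, 8 + 3.25 * x, color="gray")          # the "perfect" line for comparison
plt.xlabel("coffee drinks (x)")
plt.ylabel("total cost in dollars (y)")
plt.show()
```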

The linear regression model looks like \[y = \beta_0 + \beta_1x + \epsilon\]

  • \(\beta\) is the Greek letter “beta”.
  • \(\beta_0\) and \(\beta_1\) are constants.
  • Error (the fact that the points don’t all line up perfectly) is represented by \(\epsilon\).

We estimate \(\beta_0\) and \(\beta_1\) using data and denote the estimated line by \[\hat{y} = b_0 + b_1x \]

  • \(\hat{y}\), “y-hat”, is the estimated value of \(y\).
  • \(b_0\) is the estimate for \(\beta_0\).
  • \(b_1\) is the estimate for \(\beta_1\).
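
As a sketch of one common way to compute these estimates from data (ordinary least squares, with made-up numbers):

```python
import numpy as np

# Made-up bivariate data (x, y pairs).
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([11.0, 14.9, 18.2, 20.8, 24.6, 27.3])

# Least-squares estimates: b1 = sample cov(x, y) / sample var(x), b0 = ybar - b1 * xbar.
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x        # fitted values on the estimated line
print(b0, b1)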

We use a regression line to make predictions about \(y\) using values of \(x\). Think of this as the 2-dimensional version of a point estimate!

  • \(y\) is the response variable.
  • \(x\) is the predictor variable.

Example

Researchers took a variety of measurements on 344 adult penguins in Antarctica.

The regression model for these data, using flipper length (in mm) to predict body mass (in grams), is \[\hat{y}=-5780.83 + 49.69x\]
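
Assuming these are the Palmer penguins data (the palmerpenguins Python package matches the 344-penguin description), a sketch of recovering this fitted line:

```python
from palmerpenguins import load_penguins   # assumed data source; matches the 344-penguin description
from scipy.stats import linregress

penguins = load_penguins().dropna(subset=["flipper_length_mm", "body_mass_g"])

fit = linregress(penguins["flipper_length_mm"], penguins["body_mass_g"])
print(fit.intercept, fit.slope)            # approximately -5780.83 and 49.69
```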

To predict the body mass for a penguin with a flipper length of 180 mm, we just need to plug in 180 for flipper length (\(x\)): \[\hat{y}=-5780.83 + 49.69\times 180 = 3163.37\text{ g}.\]

  • Note that the regression line automatically handles the units: the slope is in grams per mm of flipper length, so the prediction comes out in grams.
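
A quick check of that arithmetic:

```python
b0, b1 = -5780.83, 49.69         # estimated intercept and slope
flipper_length_mm = 180

body_mass_hat = b0 + b1 * flipper_length_mm
print(round(body_mass_hat, 2))   # 3163.37 (grams)
```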