3.1 Linear Equations

Goals

Review linear equations.
Motivate regression.
Interpret a slope and intercept in context.
Use a regression line to predict values of a dependent variable.

Linear Equations

You should already have seen linear equations like \[y=mx+b\]
In statistics, we write these as \[y=b_0 + b_1x\]
- $b_0$ and $b_1$ are constants.
- $x$ is the independent variable.
- $y$ is the dependent variable.

Slope and Intercept

\[y=b_0 + b_1x\]

The y-intercept is $b_0$, the value of $y$ when $x=0$.
The slope is $b_1$, the change in $y$ for a 1-unit change in $x$.

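A scatterplot shows the relationship between two numeric variables.

Data made up of paired measurements on two variables are called bivariate data.

When every point in the scatterplot falls exactly on a line, the relationship can be modeled perfectly with a straight line, such as \[y = 8 + 3.25x\]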

Example: Interpret Slope and Intercept

\[y = 8 + 3.25x\] where $x$ is the number of coffees purchased in a month and $y$ is the amount of money spent on coffee.

The intercept is 8, which is the dollar amount of money spent on coffee (the value of $y$) when 0 coffees are purchased in that month (when $x=0$).
The slope is 3.25, which is the increase in amount of money spent on coffee (increase in $y$) for each additional coffee purchased (a one-unit increase in $x$).
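As a quick check on these interpretations, the coffee equation can be evaluated directly. This is a minimal Python sketch; the function name `coffee_spending` is made up for illustration and does not come from the notes.

```python
def coffee_spending(coffees):
    """Dollars spent in a month, using y = 8 + 3.25x from the example."""
    b0 = 8.0    # intercept: dollars spent when 0 coffees are purchased
    b1 = 3.25   # slope: extra dollars for each additional coffee
    return b0 + b1 * coffees

# The intercept is the value of y when x = 0.
intercept_check = coffee_spending(0)

# The slope is the change in y for a one-unit increase in x.
slope_check = coffee_spending(11) - coffee_spending(10)

print(intercept_check)  # 8.0
print(slope_check)      # 3.25
```

Any pair of consecutive values of $x$ gives the same difference, which is what makes the relationship linear.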

But what if that pound of coffee didn’t always cost \$8?
Or the coffee drinks didn’t always cost \$3.25?

The linear regression model looks like \[y = \beta_0 + \beta_1x + \epsilon\]

$\beta$ is the Greek letter “beta”.
$\beta_0$ and $\beta_1$ are constants.
Error (the fact that the points don’t all line up perfectly) is represented by $\epsilon$.

We estimate $\beta_0$ and $\beta_1$ using data and denote the estimated line by \[\hat{y} = b_0 + b_1x \]

$\hat{y}$, “y-hat”, is the estimated value of $y$.
$b_0$ is the estimate for $\beta_0$.
$b_1$ is the estimate for $\beta_1$.
…and 0 is the estimate for $\epsilon$, so it drops out of the estimated line.
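The notes do not say how $b_0$ and $b_1$ are computed from data. One common choice is least squares, and the sketch below assumes that method. The data are simulated, not real: they are generated from a line with $\beta_0 = 8$ and $\beta_1 = 3.25$ plus random error, so the estimates can be compared with the known constants. The helper name `fit_line` is hypothetical.

```python
import random

def fit_line(xs, ys):
    """Least-squares estimates (b0, b1) for the line y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx             # estimated slope
    b0 = y_bar - b1 * x_bar    # estimated intercept
    return b0, b1

# Simulated (hypothetical) data: y = beta0 + beta1 * x + epsilon.
rng = random.Random(0)         # fixed seed so the run is repeatable
beta0, beta1 = 8.0, 3.25
xs = list(range(1, 31))
ys = [beta0 + beta1 * x + rng.gauss(0, 2) for x in xs]

b0, b1 = fit_line(xs, ys)
print(f"b0 = {b0:.2f} (estimate of beta0 = {beta0})")
print(f"b1 = {b1:.2f} (estimate of beta1 = {beta1})")
```

The estimates land near 8 and 3.25 but not exactly on them, because the error term moves the points off the line.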

We use a regression line to make predictions about $y$ using values of $x$.

Think of this as the 2-dimensional version of a point estimate!

$y$ is the response variable.
$x$ is the predictor variable.

Example

Researchers took a variety of measurements on 344 adult penguins in Antarctica.

The regression model for these data, with flipper length in mm as the predictor ($x$) and body mass in g as the response ($y$), is \[\hat{y}=-5780.83 + 49.69x\]

To predict the body mass for a penguin with a flipper length of 180 mm, we just need to plug in 180 for flipper length ($x$): \[\hat{y}=-5780.83 + 49.69\times 180 = 3163.37\text{ g}.\]

- Note that the regression line automatically deals with units: $x$ goes in as millimeters and $\hat{y}$ comes out in grams.
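The same prediction can be reproduced in a few lines of Python. This is a minimal sketch using the coefficients given for the penguin model; the names `B0`, `B1`, and `predict_body_mass` are made up for illustration.

```python
# Coefficients of the estimated regression line from the penguin example.
B0 = -5780.83   # intercept, in grams
B1 = 49.69      # slope, in grams per millimeter of flipper length

def predict_body_mass(flipper_length_mm):
    """Predicted body mass in grams for a flipper length in millimeters."""
    return B0 + B1 * flipper_length_mm

prediction = predict_body_mass(180)
print(f"{prediction:.2f} g")  # 3163.37 g
```

Calling `predict_body_mass` with other flipper lengths gives other predictions from the same line.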

Homework Problems

Section 3.1 Exercises 1 and 2