Linear Equations

  • You should already have seen linear equations like \[y=mx+b\]
  • In statistics, we write these as \[y=b_0 + b_1x\]
    • \(b_0\) and \(b_1\) are constants.
    • \(x\) is the independent variable.
    • \(y\) is the dependent variable.
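
For example (with made-up constants), the line \(y = 2 + 0.5x\) has \(b_0 = 2\) and \(b_1 = 0.5\); plugging in \(x = 4\) gives \(y = 2 + 0.5 \times 4 = 4\).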

Slope and Intercept

\[y=b_0 + b_1x\]

  • The y-intercept is \(b_0\), the value of \(y\) when \(x=0\).
  • The slope is \(b_1\), the change in \(y\) for a 1-unit change in \(x\).
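
A quick numeric check of these two facts, with made-up constants (the same $8 and $3.25 that appear in the coffee example below):

```python
b0, b1 = 8.0, 3.25          # illustrative intercept and slope

def y(x):
    return b0 + b1 * x

print(y(0))                 # 8.0  -> the y-intercept: the value of y when x = 0
print(y(5) - y(4))          # 3.25 -> the slope: the change in y for a 1-unit change in x
```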

A scatterplot shows the relationship between two (numeric) variables.

We call this type of data bivariate data.

Sometimes the relationship can be modeled perfectly with a straight line. For example, if a pound of coffee costs $8 and each coffee drink costs $3.25, the total cost \(y\) of a pound of coffee plus \(x\) drinks is \[y = 8 + 3.25x\]

  • But what if that pound of coffee didn’t always cost $8?
  • Or the coffee drinks didn’t always cost $3.25?
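
As a sketch of that situation (the noise level below is made up), we can simulate totals where the prices vary a little and look at the resulting scatterplot:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = np.arange(1, 11)                             # number of coffee drinks
y = 8 + 3.25 * x + rng.normal(0, 1.5, x.size)    # prices vary a bit around $8 and $3.25

plt.scatter(x, y)                                # bivariate data: the points no longer line up
plt.plot(x, 8 + 3.25 * x, color="gray")          # the "perfect" line for comparison
plt.xlabel("coffee drinks (x)")
plt.ylabel("total cost in dollars (y)")
plt.show()
```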

The linear regression model looks like \[y = \beta_0 + \beta_1x + \epsilon\]

  • \(\beta\) is the Greek letter “beta”.
  • \(\beta_0\) and \(\beta_1\) are constants.
  • Error (the fact that the points don’t all line up perfectly) is represented by \(\epsilon\).

We estimate \(\beta_0\) and \(\beta_1\) using data and denote the estimated line by \[\hat{y} = b_0 + b_1x \]

  • \(\hat{y}\), “y-hat”, is the estimated value of \(y\).
  • \(b_0\) is the estimate for \(\beta_0\).
  • \(b_1\) is the estimate for \(\beta_1\).
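
As a sketch of one common way to compute these estimates from data (ordinary least squares, with made-up numbers):

```python
import numpy as np

# Made-up bivariate data (x, y pairs).
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([11.0, 14.9, 18.2, 20.8, 24.6, 27.3])

# Least-squares estimates: b1 = sample cov(x, y) / sample var(x), b0 = ybar - b1 * xbar.
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x        # fitted values on the estimated line
print(b0, b1)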

We use a regression line to make predictions about \(y\) using values of \(x\). Think of this as the 2-dimensional version of a point estimate!

  • \(y\) is the response variable.
  • \(x\) is the predictor variable.

Example

Researchers took a variety of measurements on 344 adult penguins in Antarctica.

The regression model for these data, using flipper length (in mm) to predict body mass (in grams), is \[\hat{y}=-5780.83 + 49.69x\]
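
Assuming these are the Palmer penguins data (the palmerpenguins Python package matches the 344-penguin description), a sketch of recovering this fitted line:

```python
from palmerpenguins import load_penguins   # assumed data source; matches the 344-penguin description
from scipy.stats import linregress

penguins = load_penguins().dropna(subset=["flipper_length_mm", "body_mass_g"])

fit = linregress(penguins["flipper_length_mm"], penguins["body_mass_g"])
print(fit.intercept, fit.slope)            # approximately -5780.83 and 49.69
```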

To predict the body mass for a penguin with a flipper length of 180 mm, we just need to plug in 180 for flipper length (\(x\)): \[\hat{y}=-5780.83 + 49.69\times 180 = 3163.37\text{ g}.\]

  • Note that the regression line automatically handles the units: the slope is in grams per mm of flipper length, so the prediction comes out in grams.
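
A quick check of that arithmetic:

```python
b0, b1 = -5780.83, 49.69         # estimated intercept and slope
flipper_length_mm = 180

body_mass_hat = b0 + b1 * flipper_length_mm
print(round(body_mass_hat, 2))   # 3163.37 (grams)
```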