- You should already have seen linear equations like \[y=mx+b\]
- In statistics, we write these as \[y=b_0 + b_1x\]
- \(b_0\) (the intercept) and \(b_1\) (the slope) are constants.
- \(x\) is the independent variable.
- \(y\) is the dependent variable.
\[y=b_0 + b_1x\]
A scatterplot shows the relationship between two (numeric) variables.
We call this type of data bivariate data.
This relationship can be modeled perfectly with a straight line: \[y = 8 + 3.25x\]
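As a quick check of what "modeled perfectly" means, the sketch below evaluates the line \(y = 8 + 3.25x\) at a few points and confirms that the slope between any two neighboring points is the same. The `x` values and the helper name `line` are illustrative assumptions, not part of the original data.

```python
def line(x, b0=8.0, b1=3.25):
    """Evaluate the straight line y = b0 + b1 * x."""
    return b0 + b1 * x

# Hypothetical x values chosen only for illustration.
xs = [0, 1, 2, 4, 10]
ys = [line(x) for x in xs]

# For a perfect linear relationship, the slope between any two
# consecutive points equals b1.
slopes = [
    (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
    for i in range(len(xs) - 1)
]

print(ys)
print(slopes)
```

Every point lies exactly on the line, so there is no leftover error to explain.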
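The linear regression model looks like \[y = \beta_0 + \beta_1x + \epsilon\] where \(\epsilon\) is a random error term that accounts for points not falling exactly on the line.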
We estimate \(\beta_0\) and \(\beta_1\) using data and denote the estimated line by \[\hat{y} = b_0 + b_1x \]
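One common way to compute \(b_0\) and \(b_1\) is ordinary least squares. The notes do not name the estimation method, so treat this as an assumption. The sketch below uses the standard formulas \(b_1 = S_{xy}/S_{xx}\) and \(b_0 = \bar{y} - b_1\bar{x}\). The function name `fit_line` and the small data set are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx            # estimated slope
    b0 = y_bar - b1 * x_bar   # estimated intercept
    return b0, b1

# Made-up data scattered around a line (assumption, not from the notes).
xs = [1, 2, 3, 4, 5]
ys = [11.0, 15.2, 17.4, 21.3, 24.1]

b0, b1 = fit_line(xs, ys)
print(f"y_hat = {b0:.2f} + {b1:.2f}x")
```

If the points happen to fall exactly on a line, the estimates match that line's intercept and slope.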
We use a regression line to make predictions about \(y\) using values of \(x\). Think of this as the 2-dimensional version of a point estimate!
Example: Researchers took a variety of measurements on 344 adult penguins in Antarctica.
The regression model for predicting body mass (\(y\)) from flipper length (\(x\)) is \[\hat{y}=-5780.83 + 49.69x\]
To predict the body mass for a penguin with a flipper length of 180 mm, we just need to plug in 180 for flipper length (\(x\)): \[\hat{y}=-5780.83 + 49.69\times 180 = 3163.37\text{ g}.\]
- Note that the units work out automatically: the slope is in grams per millimeter of flipper length, so the prediction comes out in grams.
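The same plug-in calculation can be sketched in a few lines of Python. The coefficients come from the fitted model above. The function name `predict_body_mass` is hypothetical, and the code assumes flipper length is measured in millimeters.

```python
# Coefficients of the fitted line y_hat = B0 + B1 * x.
B0 = -5780.83  # intercept, in grams
B1 = 49.69     # slope, in grams per mm of flipper length (unit is an assumption)

def predict_body_mass(flipper_length_mm):
    """Predicted body mass in grams for a given flipper length in mm."""
    return B0 + B1 * flipper_length_mm

prediction = predict_body_mass(180)
print(f"Predicted body mass: {prediction:.2f} g")
```

Predictions are most trustworthy for flipper lengths within the range observed in the data.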