ANOVA

“Are all these means the same?” The method for testing this question is called the ANalysis Of VAriance, or ANOVA.

  • \(H_0\): The mean outcome is the same across all groups.
  • \(H_A\): At least one mean differs from the rest.

OR

  • \(H_0: \mu_1 = \mu_2 = \dots = \mu_k\)
  • \(H_A: \mu_i \ne \mu_j\) for at least one pair \((i, j)\)

where \(k\) is the number of means being compared and \(\mu_i\) denotes the mean of the \(i\)th group (\(i\) can be any integer from 1 to \(k\)).

Conditions

For ANOVA, we have three key conditions:

  1. Observations are independent within and across groups.

    • For independence within groups, we want to convince ourselves that, within any particular group, the observations do not influence one another.
    • For independence across groups, we want to convince ourselves that observations in one group do not influence observations in another group.
  2. Data within each group are approximately normal.

  3. Variability is approximately equal across groups.
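
As a rough sketch of how the last two conditions might be checked in R (using the built-in chickwts data from the example later in this section; the independence condition is a matter of study design rather than computation):

# Equal-variability check: compare sample standard deviations across groups
tapply(chickwts$weight, chickwts$feed, sd)
# Normality check: a normal Q-Q plot for one group (repeat for each feed)
qqnorm(chickwts$weight[chickwts$feed == "casein"])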

Why Variance?

Consider the following boxplots:

[Boxplots: six groups, Experiment 1 (groups 1 to 3, large spread) and Experiment 2 (groups 4 to 6, small spread)]

Is there a difference in the means for Experiment 1? What about Experiment 2?

  • In fact, the means are \(\mu_1 = \mu_4 = 2\), \(\mu_2 = \mu_5 = 1\), and \(\mu_3 = \mu_6 = 0.5\).
  • The variances for the Experiment 1 groups are much larger than for the Experiment 2 groups.
  • The larger variances in Experiment 1 obscure any differences between the group means.
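
The original boxplots are not reproduced here, but a minimal simulation sketch in R illustrates the point; the standard deviations below are assumptions chosen for illustration, not values from the original figure.

set.seed(1)
means <- c(2, 1, 0.5)
# Experiment 1: same means, large within-group spread (assumed sd = 3)
exp1 <- lapply(means, function(m) rnorm(50, mean = m, sd = 3))
# Experiment 2: same means, small within-group spread (assumed sd = 0.2)
exp2 <- lapply(means, function(m) rnorm(50, mean = m, sd = 0.2))
boxplot(c(exp1, exp2), names = paste0("Group ", 1:6))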

Mean Square Groups

We need to consider whether the sample means differ more than we would expect them to based on random variation.

  • The variability among the sample means is measured by the mean square between groups, or \(MSG\).
  • It has associated degrees of freedom \(df_G = k-1\) where \(k\) is the number of groups.

\[MSG = \frac{SSG}{df_G}\] where \(SSG\) is the sum of squares between groups.

  • If \(H_0\) is true, variation in the sample means is due to chance.
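
As an illustration of the formula (not the usual workflow; R computes this automatically, as in the example below), \(MSG\) can be computed by hand for the chickwts data. The variable names here are ours, not from the original.

y <- chickwts$weight; g <- chickwts$feed
k <- nlevels(g)
group_means <- tapply(y, g, mean)
group_n <- tapply(y, g, length)
SSG <- sum(group_n * (group_means - mean(y))^2)  # sum of squares between groups
MSG <- SSG / (k - 1)  # matches the 46226 in the ANOVA table below, up to rounding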

Mean Square Error

  • We need some quantity that will give us an idea of how much variability to expect if the null hypothesis is true.
  • This is the mean square error, or \(MSE\), which has degrees of freedom \(df_E = n-k\), where \(n\) is the total number of observations.

\[MSE = \frac{SSE}{df_E}\] where \(SSE\) is the sum of squares error.
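
Similarly, a hand computation of \(MSE\) for the chickwts data (again just a sketch of the formula, with our own variable names):

y <- chickwts$weight; g <- chickwts$feed
n <- length(y); k <- nlevels(g)
SSE <- sum((y - ave(y, g))^2)  # ave() gives each observation its own group mean
MSE <- SSE / (n - k)           # matches the 3009 in the ANOVA table below, up to rounding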

F Test

We compare these two quantities by examining their ratio: \[F = \frac{MSG}{MSE}\]

  • This is the test statistic for the ANOVA.
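
Putting the pieces together, \(F\) is compared to an F distribution with \(df_G = k-1\) and \(df_E = n-k\) degrees of freedom; a self-contained sketch for the chickwts data, continuing the hand computations above:

y <- chickwts$weight; g <- chickwts$feed
n <- length(y); k <- nlevels(g)
SSG <- sum(tapply(y, g, length) * (tapply(y, g, mean) - mean(y))^2)
SSE <- sum((y - ave(y, g))^2)
F_stat <- (SSG / (k - 1)) / (SSE / (n - k))
pf(F_stat, k - 1, n - k, lower.tail = FALSE)  # right-tail area gives the p-value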

The Basic ANOVA Table

           df         Sum of Squares   Mean Squares   F Value   P-Value
  group    \(df_G\)   \(SSG\)          \(MSG\)        \(F\)     p-value
  error    \(df_E\)   \(SSE\)          \(MSE\)

Example: Chick Weights

  • R has data on the weights of chicks fed six different feeds (diets).
  • Assume these data are based on a random sample of chicks.
  • There are \(n=71\) total observations and \(k=6\) different feeds.
  • Let’s assume we want to test with a 0.05 level of significance.

Example: Chick Weights

  • \(H_0\): the mean weight is the same for all six feeds.
  • \(H_A\): at least one feed has a mean weight that differs.

The summaries for these data are

##         casein horsebean linseed meatmeal soybean sunflower
## n        12.00     10.00   12.00    11.00   14.00     12.00
## Mean    323.58    160.20  218.75   276.91  246.43    328.92
## Std Dev  64.43     38.63   52.24    64.90   54.13     48.84
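
The code that produced this table is not shown in the original; one way to compute such group-wise summaries (an assumption, not necessarily the original code):

tapply(chickwts$weight, chickwts$feed, length)  # n per group
tapply(chickwts$weight, chickwts$feed, mean)    # group means
tapply(chickwts$weight, chickwts$feed, sd)      # group standard deviations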

boxplot(chickwts$weight~chickwts$feed, ylab="Weight", xlab="Feed")

feeds <- levels(chickwts$feed)  # define the feed names referenced below
par(mfrow=c(2,3)); for(i in 1:6) hist(chickwts$weight[chickwts$feed==feeds[i]], xlab=feeds[i], main="")

Example: Chick Weights

anova(aov(chickwts$weight ~ chickwts$feed))
## Analysis of Variance Table
## 
## Response: chickwts$weight
##               Df Sum Sq Mean Sq F value    Pr(>F)    
## chickwts$feed  5 231129   46226  15.365 5.936e-10 ***
## Residuals     65 195556    3009                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
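
Since the p-value \(5.936 \times 10^{-10}\) is far below the 0.05 significance level, we reject \(H_0\): at least one feed has a mean chick weight that differs from the rest. The p-value can also be recovered directly from the F statistic and the degrees of freedom reported in the table:

pf(15.365, df1 = 5, df2 = 65, lower.tail = FALSE)  # matches Pr(>F) above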