Analysis of Variance

Insect sprays

The InsectSprays data in R gives the counts of insects in agricultural experimental units treated with six different insectisides. We can use this data to examine the relative effectiveness of the difference insectisides. (If the insectiside is effective, we would expect to see relatively few insects on the corresponding experimental unit.)

Exploratory analysis

Load the InsectSprays data set into our workspace.

data(InsectSprays)
?InsectSprays

We have observations on 2 different variables, one categorical and one numerical.

What are the variables in this data set? How many cases are in the sample?

As a first step in the analysis, we should consider summaries of the data.

summary(InsectSprays)
hist(InsectSprays$count)

This gives information for the number of experimental units assigned to each spray and the spread of the number of insects across all units. It does not tell us anything about how these variables might interact. We can consider this interaction using side-by-side boxplots.

boxplot(InsectSprays$count ~ InsectSprays$spray)

What does the plot suggest about the relationship between the count of insects and the spray used? Do you think there are differences between the sprays? If so, where do yo think the differences are?

ANOVA

To examine whether the conditions for ANOVA are satisfied, we can start by examining plots of the count for each spray group.

hist(InsectSprays$count[InsectSprays$spray == "A"], 
     main ="Histogram of Count for Spray A", xlab="count")

Repeat for sprays B - F. Do the groups appear to be normally distributed?

We can also check for normality using the Shapiro-Wilk test. Remember that the null hypothesis for the Shapiro-Wilk test is H0: the population is normally distributed. This means that if we reject, then the data is not normally distributed. If we fail to reject, then we say that it is reasonable to assume normality.

shapiro.test(InsectSprays$count[InsectSprays$spray == "A"])

## 
##  Shapiro-Wilk normality test
## 
## data:  InsectSprays$count[InsectSprays$spray == "A"]
## W = 0.95757, p-value = 0.7487

Repeat for sprays B - F. Is it reasonable to assume normality in each case?
Find the standard deviation of count for each spray. Do you think it is reasonable to assume that the six groups have equal variances?
Write the hypotheses for the ANOVA for these data.

We will assume that the assumptions are satisfied and will run the ANOVA. We do this using the function aov and the summary function. The first, aov, calculates the ANOVA. The latter summary formats the result into what ANOVA that we are familiar with.

summary(aov(InsectSprays$count ~ InsectSprays$spray))

What can you conclude based on this ANOVA table? Test at the 5% level of significance.

Finally, we want to examine where the differences are. We will use the function pairwise.t.test to examine all possible comparisons between the groups. Because we need to control for Type I error inflation, we will correct using the Bonferonni correction.

pairwise.t.test(x=InsectSprays$count, g=InsectSprays$spray, p.adj="bonf")

This outputs a matrix with the p-value for comparing each spray group, where the null hypothesis is that the means for the two groups are equal. Since probabilities are always between 0 and 1, if the Bonferonni correction makes a probability greater than 1, R will report 1. For example, for sprays A and B, the p-value for the test that mean(A) = mean(B) is 1.

Which sprays are significantly different from one another?

On your own

Use the following R code to read in a subset of the chickwts dataset in R. This data is for an experiment designed to measure the relative effectiveness of various feed supplements on chicken weights. Assume the different feeds are randomly assigned to the chickens. We are removing two of the feeds, horsebean and meatmeal, to simplify the problem slightly. To learn more about the full data, use ?chickwts.

chickwts <- chickwts[which(chickwts$feed != "horsebean" & chickwts$feed != "meatmeal"),]

What are the hypotheses for the ANOVA corresponding to these data?
Check the ANOVA assumptions for these data.
Assume the assumptions are satisfied. Conduct the ANOVA for these data. Include the ANOVA table in your report and make sure to report your results with a plain language explanation in the context of the study.
Run a post hoc test to determine where the differences exist. If there are no differences, use your post hoc test to confirm.

Note: This lab is derivative of an OpenIntro lab, released under a Creative Commons Attribution-ShareAlike 3.0 Unported. The original OpenIntro lab may be found here.