The InsectSprays
data in R
gives the counts of insects in agricultural experimental units treated with six different insectisides. We can use this data to examine the relative effectiveness of the difference insectisides. (If the insectiside is effective, we would expect to see relatively few insects on the corresponding experimental unit.)
Load the InsectSprays
data set into our workspace.
We have observations on 2 different variables, one categorical and one numerical.
As a first step in the analysis, we should consider summaries of the data.
This gives information for the number of experimental units assigned to each spray and the spread of the number of insects across all units. It does not tell us anything about how these variables might interact. We can consider this interaction using side-by-side boxplots.
count
of insects and the spray
used? Do you think there are differences between the sprays? If so, where do yo think the differences are?To examine whether the conditions for ANOVA are satisfied, we can start by examining plots of the count
for each spray
group.
hist(InsectSprays$count[InsectSprays$spray == "A"],
main ="Histogram of Count for Spray A", xlab="count")
We can also check for normality using the Shapiro-Wilk test. Remember that the null hypothesis for the Shapiro-Wilk test is H0: the population is normally distributed. This means that if we reject, then the data is not normally distributed. If we fail to reject, then we say that it is reasonable to assume normality.
##
## Shapiro-Wilk normality test
##
## data: InsectSprays$count[InsectSprays$spray == "A"]
## W = 0.95757, p-value = 0.7487
Repeat for sprays B - F. Is it reasonable to assume normality in each case?
Find the standard deviation of count
for each spray
. Do you think it is reasonable to assume that the six groups have equal variances?
Write the hypotheses for the ANOVA for these data.
We will assume that the assumptions are satisfied and will run the ANOVA. We do this using the function aov
and the summary
function. The first, aov
, calculates the ANOVA. The latter summary
formats the result into what ANOVA that we are familiar with.
Finally, we want to examine where the differences are. We will use the function pairwise.t.test
to examine all possible comparisons between the groups. Because we need to control for Type I error inflation, we will correct using the Bonferonni correction.
This outputs a matrix with the p-value for comparing each spray
group, where the null hypothesis is that the means for the two groups are equal. Since probabilities are always between 0 and 1, if the Bonferonni correction makes a probability greater than 1, R
will report 1. For example, for sprays A and B, the p-value for the test that mean(A) = mean(B) is 1.
Use the following R
code to read in a subset of the chickwts
dataset in R
. This data is for an experiment designed to measure the relative effectiveness of various feed supplements on chicken weights. Assume the different feeds are randomly assigned to the chickens. We are removing two of the feeds, horsebean and meatmeal, to simplify the problem slightly. To learn more about the full data, use ?chickwts
.
What are the hypotheses for the ANOVA corresponding to these data?
Check the ANOVA assumptions for these data.
Assume the assumptions are satisfied. Conduct the ANOVA for these data. Include the ANOVA table in your report and make sure to report your results with a plain language explanation in the context of the study.
Run a post hoc test to determine where the differences exist. If there are no differences, use your post hoc test to confirm.
Note: This lab is derivative of an OpenIntro lab, released under a Creative Commons Attribution-ShareAlike 3.0 Unported. The original OpenIntro lab may be found here.