Standard hypothesis tests are readily available as built-in R functions:

t tests
ANOVA
chi-square tests
etc.
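
As a quick illustration of two of these, here is a minimal sketch using `t.test()` and `chisq.test()` from base R. The `sleep` and `HairEyeColor` datasets ship with R but are not part of this lab; they are used here only as convenient examples.

```r
# Two-sample t test: does extra sleep differ between the two drug groups?
# (sleep is a built-in dataset, used here purely for illustration)
t_result <- t.test(extra ~ group, data = sleep)
t_result

# Chi-square test of independence between hair color and eye color
# (collapse the built-in 3-way HairEyeColor table over Sex first)
hair_eye <- margin.table(HairEyeColor, c(1, 2))
chi_result <- chisq.test(hair_eye)
chi_result
```

Both functions return an object of class `"htest"`, so pieces such as `t_result$p.value` can be pulled out by name.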

Consider the chickwts dataset.

head(chickwts)

##   weight      feed
## 1    179 horsebean
## 2    160 horsebean
## 3    136 horsebean
## 4    227 horsebean
## 5    217 horsebean
## 6    168 horsebean

In general, if you know which standard test you want to perform, you can look up the appropriate function online; everything else you need to run it follows from what we’ve already learned. For example, we might perform an ANOVA to see whether mean weight differs among the feed types.

# ?aov
test1 <- aov(weight ~ feed, data = chickwts) # fits the ANOVA model and stores the results
names(test1) # see what the aov command stored for us

##  [1] "coefficients"  "residuals"     "effects"       "rank"         
##  [5] "fitted.values" "assign"        "qr"            "df.residual"  
##  [9] "contrasts"     "xlevels"       "call"          "terms"        
## [13] "model"

anova(test1) # examine the ANOVA output

## Analysis of Variance Table
## 
## Response: weight
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## feed       5 231129   46226  15.365 5.936e-10 ***
## Residuals 65 195556    3009                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

par(mfrow=c(2,2))
plot(test1) # default plots are diagnostic plots
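
Because the fitted model is an ordinary R object, individual results can be extracted by name instead of being read off the printed table. The following is a small sketch of that idea; the object name `anova_table` is introduced here only for illustration.

```r
# Refit the same model so this block runs on its own
test1 <- aov(weight ~ feed, data = chickwts)

# Components listed by names(test1) are accessed with $
test1$df.residual              # residual degrees of freedom

# anova() returns a data frame, so columns can be indexed by name
anova_table <- anova(test1)
anova_table$`F value`[1]       # F statistic for feed
anova_table$`Pr(>F)`[1]        # p-value for feed
```

These values should match the `Residuals` degrees of freedom, `F value`, and `Pr(>F)` entries in the table printed by `anova(test1)`.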

Because statistical inference in R follows so directly from what we’ve already learned (and some things we will learn in the next labs), we will focus on alternative approaches to inference.

Nonparametric Tests

There are lots of nonparametric tests built into R, but let’s take a moment to work through a couple.

On Your Own

What follows is an explanation of a hypothesis test in R, the Wilcoxon signed-rank test. See if you can convert this explanation to R code.

Suppose we have some paired data on dock jumps, a type of dog agility competition.

dognames <- c("Suki","Harvey","Sausage","Heidi","Beans")
jump1 <- c(24.3,26.3,31.2,19.9,23.1)
jump2 <- c(24.6,27.1,30.0,22.5,24.1)
dockjump <- data.frame(dognames, jump1, jump2)
dockjump

##   dognames jump1 jump2
## 1     Suki  24.3  24.6
## 2   Harvey  26.3  27.1
## 3  Sausage  31.2  30.0
## 4    Heidi  19.9  22.5
## 5    Beans  23.1  24.1

We wish to determine if the average difference between these pairs is nonzero.

Calculate the difference \(d\) between the first and second jump for each dog and store this in a new variable.
Find the absolute value of each difference \(d\). Sort these values and assign a rank \(r\) to each sorted value (1 is the minimum, \(n=5\) is the maximum).
Calculate \[\sum_{i=1}^{n} \text{sign}(d_i)r_i\] where \(\text{sign}(x) = 1\) if \(x > 0\) and \(\text{sign}(x) = -1\) if \(x < 0\).
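
Once you have tried this yourself, here is one possible translation of the three steps to compare against. It is a sketch under two assumptions: the difference is taken as `jump1 - jump2`, and no difference is exactly zero (base R's `sign()` returns 0 in that case, which the definition above does not cover). The names `d`, `r`, and `W` are chosen here for illustration.

```r
# Paired jump data from above
jump1 <- c(24.3, 26.3, 31.2, 19.9, 23.1)
jump2 <- c(24.6, 27.1, 30.0, 22.5, 24.1)

# Step 1: difference between the first and second jump for each dog
d <- jump1 - jump2

# Step 2: rank the absolute differences (1 = smallest, n = largest);
# rank() handles the sorting implicitly
r <- rank(abs(d))

# Step 3: sum of the signed ranks
W <- sum(sign(d) * r)
W

# The built-in test reports V, the sum of the positive ranks only
builtin <- wilcox.test(jump1, jump2, paired = TRUE)
builtin$statistic
```

The built-in statistic `V` and the signed-rank sum are related by \(W = 2V - n(n+1)/2\), so the two results can be checked against each other.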

Statistical Inference

Lauren Cappiello
