Completely Randomized Designs

In a completely randomized design (CRD) treatments are assigned to experimental units at random.

  • Appropriate when units are homogeneous, but what happens when they’re not?
  • If we don’t know how to describe the form of the heterogeneity, we can continue to use CRD.
  • When units are heterogeneous in a known way, we can use blocks.

Blocks

  • Idea: experimental units are heterogeneous in a known way and can be arranged into blocks.
    • within-block variation ideally small
    • between block variation ideally large
    • a block design can then be more efficient than a CRD
  • We prefer to have a block size equal to the number of treatments.
    • If this cannot be done, an incomplete block design must be used.
  • Ex: we think the treatment will function differently for high- vs low-risk patients, so we use risk level as a block.

Randomized Block Designs

  • In a randomized block design, the treatment levels are assigned randomly within a block.
    • Randomization restricted relative to CRD
    • This means fewer possible permutations for random treatment assignment
  • Because they are a feature of the experimental, block effects must be included in the model regardless of significance.
    • That is, we cannot regain these degrees of freedom if not significant.

Randomized Block Designs

Suppose we have one treatment factor \(\tau\) at \(t\) levels and one blocking facor \(\beta\) at \(r\) levels: \[y_{ij}=\mu + \tau_i + \beta_j + \epsilon_{ij}\]

  • Suppose there is one observation for each treatment in each block.
  • This is a randomized complete block design.
  • Analysis very similar to two-way ANOVA
    • Model can be extended to include interactions.
    • Significance of block effect only useful for future reference.

Example

data(oatvar)
attach(oatvar)
xtabs (yield ~ variety + block)
##        block
## variety   I  II III  IV   V
##       1 296 357 340 331 348
##       2 402 390 431 340 320
##       3 437 334 426 320 296
##       4 303 319 310 260 242
##       5 469 405 442 487 394
##       6 345 342 358 300 308
##       7 324 339 357 352 220
##       8 488 374 401 338 320

Example

par(mfrow=c(1,2))
boxplot(yield ~ variety, xlab="yield", ylab="variety")
boxplot(yield ~ block, xlab="yield",ylab="block")

Example

par(mfrow=c(1,2))
interaction.plot(variety, block, yield)
interaction.plot(block, variety, yield)

Example

mod1 <- lm(yield ~ block+variety)
anova(mod1)
## Analysis of Variance Table
## 
## Response: yield
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## block      4  33396  8348.9  6.2449  0.001008 ** 
## variety    7  77524 11074.8  8.2839 1.804e-05 ***
## Residuals 28  37433  1336.9                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Example

The ANOVA corresponds to the sequential tests

  1. \(y \sim 1\)
  2. \(y \sim \text{block}\)
  3. \(y \sim \text{block} + \text{variety}\)

We want to test the blocking effect in this way, within the context of the model as a whole (as opposed to testing it alone against \(y\)).

Model Terms

Note that the order the terms enter the model doesn’t matter because this is an orthogonal design.

If the design is not orthogonal, for example if we had one less observation, this would not be the case (the ANOVA would change, significance could change, etc.)

  • In this case, we should put the blocking factor into the model first so that it’s included when we test the treatment factor.

Example

par(mfrow=c(1,2))
plot(mod1, which=c(1,2))

RCBD versus CRD

To examine relative efficiency, we find the ratio of the variances of models with and without the blocking effect.

gcrd <- lm(yield ~ variety) # model without block
(summary(gcrd)$sig / summary(mod1)$sig)^2
## [1] 1.655617
  • A CRD would require 66% more observations to obtain the same level of precision as the RCBD.
  • Increased efficiency not guaranteed for RCBD - use blocks only if there’s a strong scientific reason to.

Latin Squares

  • Latin Squares are useful when there are two blocking variables.
  • Example: risk and gender

Example Latin Square

  • Suppose there are 4 treatments (A-D) and two blocks with 4 levels each.
1 2 3 4
I A B C D
II B D A C
III C A D B
IV D C B A
  • Each treatment assigned to each block exactly once.
    • Each row and column contains all 4 treatments.
  • We should choose randomly from all possible Latin Square layouts.

Latin Squares

The model is \[y_{ij}=\mu + \tau_i + \beta_j + \gamma_{k} + \epsilon_{ijk}\]

  • Not all combinations of \(i,j,k\) will appear.
  • Test for treatment effect using F-test on the full model vs a model with the treatment removed.

Latin Squares vs. RCBD

  • Latin Squares more efficient when blocking effects are sizable.
  • Variations:
    • Can be replicated if more runs available.
    • If both block sizes not equal to number of treatments, can create Latin Rectangles by adjoining Latin squares.
    • Graeco-Latin squares occasionally used for three blocking variables (but rarely seen in practice).

Latin Squares without Blocks

  • Can also be used to compare three treatment factors.
    • Requires \(t^2\) runs instead of \(t^3\) runs.
    • Downside: cannot estimate interactions.
    • This is one type of fractional factorial.

Example

data(abrasion)
abrasion
##    run position material wear
## 1    1        1        C  235
## 2    1        2        D  236
## 3    1        3        B  218
## 4    1        4        A  268
## 5    2        1        A  251
## 6    2        2        B  241
## 7    2        3        D  227
## 8    2        4        C  229
## 9    3        1        D  234
## 10   3        2        C  273
## 11   3        3        A  274
## 12   3        4        B  226
## 13   4        1        B  195
## 14   4        2        A  270
## 15   4        3        C  230
## 16   4        4        D  225

Example

Latin Square Structure:

matrix(abrasion$material, 4, 4)
##      [,1] [,2] [,3] [,4]
## [1,] "C"  "A"  "D"  "B" 
## [2,] "D"  "B"  "C"  "A" 
## [3,] "B"  "D"  "A"  "C" 
## [4,] "A"  "C"  "B"  "D"

Example

par(mfrow=c(1,3))
with(abrasion, boxplot(wear ~ material, xlab="Material"))
with(abrasion, boxplot(wear ~ run, xlab="Run"))
with(abrasion, boxplot(wear ~ position, xlab="Position"))

Example

g <- lm (wear ~ material+run+position, abrasion)
drop1(g, test="F") # test each variable against full model
## Single term deletions
## 
## Model:
## wear ~ material + run + position
##          Df Sum of Sq    RSS     AIC F value    Pr(>F)    
## <none>                 367.5  70.146                      
## material  3    4621.5 4989.0 105.878 25.1510 0.0008498 ***
## run       3     986.5 1354.0  85.012  5.3687 0.0390130 *  
## position  3    1468.5 1836.0  89.884  7.9918 0.0161685 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Example

par(mfrow=c(1,2))
plot(g, which=c(1,2))

## 
## Call:
## lm(formula = wear ~ material + run + position, data = abrasion)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -9.750 -2.250  0.250  3.688  8.000 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  254.750      6.187  41.174 1.37e-08 ***
## materialB    -45.750      5.534  -8.267 0.000169 ***
## materialC    -24.000      5.534  -4.337 0.004892 ** 
## materialD    -35.250      5.534  -6.370 0.000703 ***
## run2          -2.250      5.534  -0.407 0.698423    
## run3          12.500      5.534   2.259 0.064657 .  
## run4          -9.250      5.534  -1.671 0.145658    
## position2     26.250      5.534   4.743 0.003180 ** 
## position3      8.500      5.534   1.536 0.175454    
## position4      8.250      5.534   1.491 0.186608    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.826 on 6 degrees of freedom
## Multiple R-squared:  0.9506, Adjusted R-squared:  0.8766 
## F-statistic: 12.84 on 9 and 6 DF,  p-value: 0.002828

Balanced Incomplete Block Designs

In a complete block design, block size = number treatments.

What happens when block size < number treatments?

  • Treatments and blocks not orthogonal.
  • Some treatment contrasts will not be identifiable from certain block contrasts.
    • Confounding: these treatment contrasts cannot be examined.

Balanced Incomplete Block (BIB) Designs

  • In a BIB design, all pairwise differences are identifiable and have same standard error.
    • Pairwise differences are more likely to be interesting than other contrasts, so the design is constructed to facilitate this.

Example

Suppose, we have four treatments (\(t=4\)) A, B, C, D and the block size, \(k=3\) and there are \(b=4\) blocks.

  • Each treatment will appear \(r=3\) times.
  • Each pair of treatments appears in the same block \(\lambda =2\) times.

Example

One possibility:

Block 1 Block 2 Block 3 Block 4
A A A B
B B C C
C D D D

  • Treatments (\(t\)), block size, \(k\), and \(b\) blocks.
  • Each treatment will appear \(r\) times.
  • Each pair of treatments appears in the same block \(\lambda\) times.

For a BIB design, we require

  • \(b \ge t > k\)
  • \(rt = bk = n\)
  • \(\lambda(t-1) = r(k-1)\)
    • Number of pairs in a block is \(k(k-1)/2\) so total pairs is \(bk(k-1)/2\).
    • Number of treatment pairs is \(t(t-1)/2\)
      • Ratio of these must be \(\lambda\)

BIB Designs

Since \(\lambda\) must be an integer, a BIB design isn’t always possible, even if the first two conditions hold.

  • Ex: \(r=4, t=3, b=6, k=2\) so \(\lambda=2\) BIB possible
  • Ex: \(r=4, t=4, b=8, k=2\) so \(\lambda=4/3\) BIB not possible

BIB Model

Same as for RCBD:

\[y_{ij}=\mu + \tau_i + \beta_j + \epsilon_{ij}\]

Example

  • A nutritionist studied the effects of six diets, “a” through “f,” on weight gain of domestic rabbits.
  • From past experience with sizes of litters, it was felt that only three uniform rabbits could be selected from each available litter.
  • There were 10 litters available forming blocks of size 3.

Example

data(rabbit)
xtabs (gain ~ treat+block, rabbit)
##      block
## treat   b1  b10   b2   b3   b4   b5   b6   b7   b8   b9
##     a  0.0 37.3 40.1  0.0 44.9  0.0  0.0 45.2 44.0  0.0
##     b 32.6  0.0 38.1  0.0  0.0  0.0 37.3 40.6  0.0 30.6
##     c 35.2  0.0 40.9 34.6 43.9 40.9  0.0  0.0  0.0  0.0
##     d  0.0 42.3  0.0 37.5  0.0 37.3  0.0 37.9  0.0 27.5
##     e  0.0  0.0  0.0  0.0 40.8 32.0 40.5  0.0 38.5 20.6
##     f 42.2 41.7  0.0 34.3  0.0  0.0 42.8  0.0 51.9  0.0
  • 0 values correspond to no observation.
  • Note BIB structure w/ each pair of diets appearing in same block exactly twice

Example

attach(rabbit); par(mfrow=c(1,2))
boxplot(gain ~ block); boxplot(gain ~ treat)

Example

mod1 <- lm(gain ~ block + treat, rabbit) # enter block first!
anova(mod1)
## Analysis of Variance Table
## 
## Response: gain
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## block      9 730.39  81.154  8.0738 0.0002454 ***
## treat      5 158.73  31.745  3.1583 0.0381655 *  
## Residuals 15 150.77  10.052                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Example

This design is not orthogonal, so changing order will impact the ANOVA:

mod2 <- lm(gain ~ treat + block, rabbit) # don't do this
anova(mod2)
## Analysis of Variance Table
## 
## Response: gain
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## treat      5 293.38  58.676  5.8375 0.0034544 ** 
## block      9 595.74  66.193  6.5854 0.0007602 ***
## Residuals 15 150.77  10.052                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Example

par(mfrow=c(1,2))
plot(mod1, which=c(1,2))

Example: Pairwise CIs

  • Since this is not a complete layout, we need to construct the Tukey CIs.
coef(summary(mod1))[11:15,]
##           Estimate Std. Error     t value   Pr(>|t|)
## treatb -1.74166667   2.241821 -0.77689835 0.44929765
## treatc  0.40000000   2.241821  0.17842642 0.86077558
## treatd  0.06666667   2.241821  0.02973774 0.97666828
## treate -5.22500000   2.241821 -2.33069505 0.03413408
## treatf  3.30000000   2.241821  1.47201793 0.16168557

Example: Pairwise CIs

Compute Tukey CV:

qtukey(0.95, 6, 15) # 6 means to compare, 15 model df
## [1] 4.594735

Example: Pairwise CIs

  • Std errors are all 2.242

Compute interval width:

qtukey(0.95, 6, 15)*2.242/sqrt(2)
## [1] 7.284187

Example

tcoefs <- c(0, coef(mod1)[11:15])
abs(outer(tcoefs, tcoefs,"-")) > 7.2842 # only e-f different
##              treatb treatc treatd treate treatf
##        FALSE  FALSE  FALSE  FALSE  FALSE  FALSE
## treatb FALSE  FALSE  FALSE  FALSE  FALSE  FALSE
## treatc FALSE  FALSE  FALSE  FALSE  FALSE  FALSE
## treatd FALSE  FALSE  FALSE  FALSE  FALSE  FALSE
## treate FALSE  FALSE  FALSE  FALSE  FALSE   TRUE
## treatf FALSE  FALSE  FALSE  FALSE   TRUE  FALSE

Example

Relative efficiency of this BIB vs CRD

mod.crd <- lm(gain ~ treat, rabbit)
(summary(mod.crd)$sig / summary(mod1)$sig)^2 # way more efficient!
## [1] 3.094508