Loops are a great, relatively straightforward way to repeatedly
execute a chunk of code. However, they aren’t especially efficient.
Enter the apply family of functions:
replicate()
apply()
lapply()
sapply()
tapply()
mapply()
These functions apply a piece of code to each element of an object without an explicit loop, which is usually more concise and often more efficient than a loop (as long as the elements in your object aren’t dependent on other elements in your object).
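To make the comparison with a loop concrete, here is a minimal sketch that computes the same result both ways. The vector `values` and the result names are hypothetical, chosen only for illustration.

```r
# Hypothetical input vector, used only for illustration
values <- c(2, 4, 6, 8)

# Explicit loop: pre-allocate the result, then fill it element by element
squares_loop <- numeric(length(values))
for (i in seq_along(values)) {
  squares_loop[i] <- values[i]^2
}

# Apply-style: the same computation without managing an index
squares_apply <- sapply(values, function(v) v^2)

identical(squares_loop, squares_apply)   # TRUE
```

Each element is processed independently of the others here, which is the situation where an apply-type function can stand in for a loop.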
replicate()
We will start with replicate because it’s arguably the easiest of
these functions to use. The replicate function repeats a function call n times.
replicate(n=4, "Hello")
## [1] "Hello" "Hello" "Hello" "Hello"
replicate(10, factorial(7))
## [1] 5040 5040 5040 5040 5040 5040 5040 5040 5040 5040
# histogram of the means from 100 random samples of size n=10 from a standard normal distribution
hist(replicate(100, mean(rnorm(10))),
main = "", xlab="Means")
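The histogram example works because replicate re-evaluates its expression on every repetition rather than copying one result. The sketch below illustrates that, along with the shape of the output when the expression returns more than one value. The seed and the object names are arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed so the run is reproducible

# The expression is re-evaluated on each repetition, so the draws differ
draws <- replicate(5, rnorm(1))
length(unique(draws))

# When the expression returns a vector of length 3, the results are
# simplified into a matrix with one column per repetition
m <- replicate(4, rnorm(3))
dim(m)   # 3 4

# simplify = FALSE keeps the results in a list instead
l <- replicate(4, rnorm(3), simplify = FALSE)
length(l)   # 4
```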
The function rep replicates the values in the first argument. This is not part of the apply family, but may serve a similar purpose to replicate.
Suppose I want to represent all of the possible combinations when rolling two four-sided dice.
v1 <- rep(1:4, times=4) # replicate the sequence 1:4, four times
v2 <- rep(1:4, each=4) # replicate 1:4, with each number replicated 4 times (in a row)
data.frame(v1, v2)
## v1 v2
## 1 1 1
## 2 2 1
## 3 3 1
## 4 4 1
## 5 1 2
## 6 2 2
## 7 3 2
## 8 4 2
## 9 1 3
## 10 2 3
## 11 3 3
## 12 4 3
## 13 1 4
## 14 2 4
## 15 3 4
## 16 4 4
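As a cross-check on the table above, base R's expand.grid (not used in the original text) should build the same set of combinations directly. This is a sketch for comparison, not a replacement for the rep approach.

```r
# Same construction as above, repeated so this block is self-contained
v1 <- rep(1:4, times = 4)
v2 <- rep(1:4, each = 4)
by_rep <- data.frame(v1, v2)

# expand.grid varies its first argument fastest, matching the layout above
by_grid <- expand.grid(v1 = 1:4, v2 = 1:4)

nrow(by_grid)            # 16
all(by_rep == by_grid)   # TRUE
```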
apply()
The apply function applies a given function to the rows
or columns of matrices (or arrays). It assembles the returned values
into a vector, array, or list, which it returns.
The apply() arguments:
X: an array (matrix)
MARGIN: the dimension(s) to apply the function over
1 indicates rows
2 indicates columns
c(1,2) indicates rows and columns
FUN: the function to be applied
...: additional arguments to be passed to FUN
data <- matrix(1:9, nrow=3, ncol=3)
# the following is equivalent to the command: colMeans(data)
apply(data, 2, mean) # data, columns, mean --> get column means
## [1] 2 5 8
# the following is equivalent to the command: rowSums(data)
apply(data, 1, sum) # data, rows, sum --> get row sums
## [1] 12 15 18
We can also use apply with user-defined functions.
# Define the function within the apply statement:
apply(data, 2, function(x){
y <- sum(x)^2 # sum of the input vector (here a column) squared
return(y)
}
)
## [1] 36 225 576
# Define the function outside of the apply statement:
fn <- function(x){
y <- sum(x)^2 # sum of the input vector (here a column) squared
return(y)
}
apply(data, 2, fn)
## [1] 36 225 576
The value that apply() returns depends on the function FUN.
If FUN returns an element of length 1, then apply will return a vector.
If FUN always returns an element of length \(n>1\), then apply will return a matrix with n rows, and the number of columns will correspond to how many rows/columns were iterated over.
If FUN returns an object that would vary in length, then apply will return a list where each element corresponds to a row or column that was iterated over.
In short, apply prioritizes returning a vector, array (matrix), and list (in that order). What is returned depends on the output of FUN.
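The three cases can be checked on a small matrix. This is a minimal sketch; the object names are made up for illustration, and `m` simply repeats the values of the earlier matrix.

```r
m <- matrix(1:9, nrow = 3, ncol = 3)   # same values as the earlier matrix

# Length-1 result per column -> plain vector of length 3
v <- apply(m, 2, max)
v   # 3 6 9

# Length-2 result per row -> matrix with 2 rows and one column per row of m
r <- apply(m, 1, range)
dim(r)   # 2 3

# Results of varying length -> list with one element per column
l <- apply(m, 2, function(col) col[col > 4])
is.list(l)   # TRUE
```

Note that when iterating over rows, each row's result becomes a column of the output, so the result can look transposed relative to the input.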
Note: running apply on a data frame will cause R to convert the data frame using as.matrix. This is often not what we want, so be cautious when doing so.
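One way to see why the conversion matters is to try it on a data frame with mixed column types. The data frame `df` below is hypothetical and exists only for this sketch.

```r
# Hypothetical data frame mixing character and numeric columns
df <- data.frame(id = c("a", "b", "c"),
                 score = c(90, 85, 77),
                 stringsAsFactors = FALSE)

# as.matrix has to pick one type for every cell, so the numbers become text
class(as.matrix(df)[, "score"])   # "character"

# apply therefore hands FUN character vectors for every column
apply(df, 2, class)

# sapply iterates over the columns as they are, preserving each type
sapply(df, class)
```

For column-wise work on a data frame, lapply or sapply (both covered below) avoid the conversion.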
x <- cbind(x1 = 3, x2 = c(4:1, 2:5))
fun1 <- function(x, c1, c2){
mean_vec <- c(mean(x[c1]), mean(x[c2]))
return(mean_vec)
}
apply(x, 1, fun1, c1 = "x1", c2 = c("x1","x2"))
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,] 3.0 3 3.0 3 3.0 3 3.0 3
## [2,] 3.5 3 2.5 2 2.5 3 3.5 4
mat <- matrix(c(-1, 1, 0,
2, -2, 20,
62,-2, -6), nrow = 3)
CheckPos <- function(Vec){
# Subset values of Vec that are positive
PosVec <- Vec[Vec > 0]
# Return only the positive values
return(PosVec)
}
# Check Positive values by column
apply(mat, 2, CheckPos)
## [[1]]
## [1] 1
##
## [[2]]
## [1] 2 20
##
## [[3]]
## [1] 62
Use an apply function to find the interquartile range (IQR()) of each variable in the ChickWeight data. (This dataset is built into R.)
lapply()
The lapply function is used to apply a function to each
element of a list. It collects the returned values into another list,
which it returns.
Arguments:
X: a list
FUN: the function to be applied
...: additional arguments to be passed to FUN
data_lst <- list(item1 = 1:5,
item2 = seq(4,36,8),
item3 = c(1,3,5,7,9))
data_vector <- 1:8
lapply(data_lst, sum)
## $item1
## [1] 15
##
## $item2
## [1] 100
##
## $item3
## [1] 25
lapply(data_vector, sum) # lapply performs an `as.list` command on X if it's not already a list
## [[1]]
## [1] 1
##
## [[2]]
## [1] 2
##
## [[3]]
## [1] 3
##
## [[4]]
## [1] 4
##
## [[5]]
## [1] 5
##
## [[6]]
## [1] 6
##
## [[7]]
## [1] 7
##
## [[8]]
## [1] 8
x <- list(a = 1:10,
beta = exp(-3:3),
logic = c(TRUE,FALSE,FALSE,TRUE))
# compute the mean of each list element
lapply(x, mean)
## $a
## [1] 5.5
##
## $beta
## [1] 4.535125
##
## $logic
## [1] 0.5
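The `...` argument listed above forwards extra arguments to FUN on every call. A minimal sketch, using a hypothetical list `scores` that contains missing values:

```r
# Hypothetical list containing missing values
scores <- list(a = c(1, 2, NA), b = c(10, NA, 30))

# Without extra arguments, each mean is NA
no_extra <- lapply(scores, mean)

# na.rm = TRUE is passed through `...` to every call of mean()
res <- lapply(scores, mean, na.rm = TRUE)
res$a   # 1.5
res$b   # 20
```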
Consider the built-in data set iris. If we use the as.list() function, each column is converted into an element of a list.
head(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
str(as.list(iris))
## List of 5
## $ Sepal.Length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
So if we use lapply() in this case, it will iterate over
the columns. We can find all values within a variable that are
greater than the variable mean (for columns 1-4, the numeric
variables).
lapply(iris[,1:4], function(column){
big_values <- column[column > mean(column)]
return(big_values)
})
## $Sepal.Length
## [1] 7.0 6.4 6.9 6.5 6.3 6.6 5.9 6.0 6.1 6.7 6.2 5.9 6.1 6.3 6.1 6.4 6.6 6.8 6.7
## [20] 6.0 6.0 6.0 6.7 6.3 6.1 6.2 6.3 7.1 6.3 6.5 7.6 7.3 6.7 7.2 6.5 6.4 6.8 6.4
## [39] 6.5 7.7 7.7 6.0 6.9 7.7 6.3 6.7 7.2 6.2 6.1 6.4 7.2 7.4 7.9 6.4 6.3 6.1 7.7
## [58] 6.3 6.4 6.0 6.9 6.7 6.9 6.8 6.7 6.7 6.3 6.5 6.2 5.9
##
## $Sepal.Width
## [1] 3.5 3.2 3.1 3.6 3.9 3.4 3.4 3.1 3.7 3.4 4.0 4.4 3.9 3.5 3.8 3.8 3.4 3.7 3.6
## [20] 3.3 3.4 3.4 3.5 3.4 3.2 3.1 3.4 4.1 4.2 3.1 3.2 3.5 3.6 3.4 3.5 3.2 3.5 3.8
## [39] 3.8 3.2 3.7 3.3 3.2 3.2 3.1 3.3 3.1 3.2 3.4 3.1 3.3 3.6 3.2 3.2 3.8 3.2 3.3
## [58] 3.2 3.8 3.4 3.1 3.1 3.1 3.1 3.2 3.3 3.4
##
## $Petal.Length
## [1] 4.7 4.5 4.9 4.0 4.6 4.5 4.7 4.6 3.9 4.2 4.0 4.7 4.4 4.5 4.1 4.5 3.9 4.8 4.0
## [20] 4.9 4.7 4.3 4.4 4.8 5.0 4.5 3.8 3.9 5.1 4.5 4.5 4.7 4.4 4.1 4.0 4.4 4.6 4.0
## [39] 4.2 4.2 4.2 4.3 4.1 6.0 5.1 5.9 5.6 5.8 6.6 4.5 6.3 5.8 6.1 5.1 5.3 5.5 5.0
## [58] 5.1 5.3 5.5 6.7 6.9 5.0 5.7 4.9 6.7 4.9 5.7 6.0 4.8 4.9 5.6 5.8 6.1 6.4 5.6
## [77] 5.1 5.6 6.1 5.6 5.5 4.8 5.4 5.6 5.1 5.1 5.9 5.7 5.2 5.0 5.2 5.4 5.1
##
## $Petal.Width
## [1] 1.4 1.5 1.5 1.3 1.5 1.3 1.6 1.3 1.4 1.5 1.4 1.3 1.4 1.5 1.5 1.8 1.3 1.5 1.2
## [20] 1.3 1.4 1.4 1.7 1.5 1.2 1.6 1.5 1.6 1.5 1.3 1.3 1.3 1.2 1.4 1.2 1.3 1.2 1.3
## [39] 1.3 1.3 2.5 1.9 2.1 1.8 2.2 2.1 1.7 1.8 1.8 2.5 2.0 1.9 2.1 2.0 2.4 2.3 1.8
## [58] 2.2 2.3 1.5 2.3 2.0 2.0 1.8 2.1 1.8 1.8 1.8 2.1 1.6 1.9 2.0 2.2 1.5 1.4 2.3
## [77] 2.4 1.8 1.8 2.1 2.4 2.3 1.9 2.3 2.5 2.3 1.9 2.0 2.3 1.8
Use lapply to find the range for each item in the list data_lst (which should already be in your R environment from an earlier code chunk).
sapply()
The sapply function works basically the same as the lapply function. The primary difference is that sapply attempts to simplify the result into a vector or
matrix (instead of a list). This simplification works the same way as in apply.
lapply(data_lst, sum) # returns a list
## $item1
## [1] 15
##
## $item2
## [1] 100
##
## $item3
## [1] 25
sapply(data_lst, sum) # returns a vector
## item1 item2 item3
## 15 100 25
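When FUN returns more than one value per element, the simplification should produce a matrix rather than a vector. The sketch below uses a hypothetical list `lst`; it also shows vapply, a base R relative of sapply that is not covered in the original text and that lets you state the expected result type up front.

```r
# Hypothetical list, used only for illustration
lst <- list(a = c(1, 2, 3), b = c(10, 20, 30, 40))

# Two values per element -> matrix with one column per list element
stats <- sapply(lst, function(v) c(mean = mean(v), max = max(v)))
dim(stats)          # 2 2
stats["max", "b"]   # 40

# vapply raises an error if any result is not a single numeric value
sums <- vapply(lst, sum, FUN.VALUE = numeric(1))
sums[["b"]]   # 100
```

Because sapply picks its return shape from the results, vapply can be a safer choice inside functions where the output type needs to be predictable.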
Use sapply to find the range for each item in the list data_lst.
tapply()
The tapply function breaks the data set up into groups
and applies a function to each group.
Arguments:
X: A 1 dimensional object
INDEX: A grouping factor or a list of factors
FUN: The function to be applied
data <- data.frame(name=c("Amy","Jose","Ray","Kim","Sam","Eve","Bob"),
age=c(24, 22, 21, 23, 20, 24, 21),
gender=factor(c("F","M","M","F","M","F","M")))
tapply(data$age, data$gender, min) # age, grouped by gender, min for each group
## F M
## 23 20
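Conceptually, the grouping can be thought of as two steps: split the values by group, then apply the function to each piece. The sketch below spells that out with split and sapply as an illustration of the idea, not a description of how tapply is implemented. The data frame `people` repeats the values from the example above under a different, hypothetical name.

```r
# Same values as above, repeated so this block is self-contained
people <- data.frame(age = c(24, 22, 21, 23, 20, 24, 21),
                     gender = factor(c("F", "M", "M", "F", "M", "F", "M")))

# Step 1: split the ages into one vector per gender
groups <- split(people$age, people$gender)

# Step 2: apply the function to each group
by_split <- sapply(groups, min)

# tapply does both steps in one call
by_tapply <- tapply(people$age, people$gender, min)

all(by_split == by_tapply)   # TRUE
```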
For the ChickWeight data, use tapply to find the mean weight for each chick.
mapply()
The mapply function is a multivariate version of sapply. It applies FUN to the first elements of each ... argument, the second elements, the third
elements, and so on.
Arguments:
FUN: The function to be applied
...: Arguments to vectorize over (vectors or lists of
strictly positive length, or all of zero length).
mapply(rep, times = 1:4, x = 4:1)
## [[1]]
## [1] 4
##
## [[2]]
## [1] 3 3
##
## [[3]]
## [1] 2 2 2
##
## [[4]]
## [1] 1 1 1 1
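The same element-by-element pairing works with a user-defined function. In the sketch below, `bases` and `exponents` are hypothetical inputs; MoreArgs and SIMPLIFY are mapply arguments not discussed above, shown here for arguments that should stay fixed across calls and for keeping the result as a list.

```r
# Hypothetical inputs: base and exponent vectors of equal length
bases <- c(2, 3, 4)
exponents <- c(3, 2, 1)

# Calls FUN(2, 3), FUN(3, 2), FUN(4, 1) and simplifies to a vector
powers <- mapply(function(b, e) b^e, bases, exponents)
powers   # 8 9 4

# MoreArgs holds arguments that are not vectorized over;
# SIMPLIFY = FALSE keeps the result as a list
pairs <- mapply(rep, x = 1:3, MoreArgs = list(times = 2), SIMPLIFY = FALSE)
pairs[[3]]   # 3 3
```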
More information and examples: http://adv-r.had.co.nz/Functionals.html