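Loops are a great, relatively straightforward way to repeatedly
execute a chunk of code. However, they can be verbose, and in R they
can be slow when written carelessly (for example, when the result
object is grown inside the loop).
Enter the apply family of functions:
replicate(), apply(), lapply(), sapply(), tapply(), mapply()
These functions take care of the iteration for you and collect the
results, which usually makes code shorter and easier to read. They can
also be faster than a loop that grows its result, although they still
process the elements one at a time. They are a natural fit whenever each
element of your object can be processed independently of the other
elements.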
replicate()
We will start with replicate because it's arguably the easiest of
these functions to use. The replicate function evaluates an
expression n times and returns the collected results.
replicate(n=4, "Hello")
## [1] "Hello" "Hello" "Hello" "Hello"
replicate(10, factorial(7))
## [1] 5040 5040 5040 5040 5040 5040 5040 5040 5040 5040
# histogram of the means from 100 random samples of size n=10 from a standard normal distribution
hist(replicate(100, mean(rnorm(10))),
main = "", xlab="Means")
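When the repeated expression returns more than one value, replicate simplifies the results into a matrix with one column per repetition, much as sapply does (replicate is built on sapply). Setting simplify = FALSE keeps the results as a list instead. A minimal sketch; the seed value is arbitrary and only makes the draws reproducible:

```r
set.seed(1)  # arbitrary seed so the random draws are reproducible
# Each evaluation of rnorm(2) returns 2 values, so the 3 repetitions
# are combined into a 2 x 3 matrix (one column per repetition)
draws_mat <- replicate(3, rnorm(2))
dim(draws_mat)
# simplify = FALSE keeps each repetition as a separate list element
draws_lst <- replicate(3, rnorm(2), simplify = FALSE)
length(draws_lst)
```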
The function rep repeats the values in its first
argument. It is not part of the apply family,
but it can serve a similar purpose to replicate. Unlike replicate, rep
evaluates its first argument only once and then repeats the result.
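The difference matters whenever the expression involves random numbers. In this sketch (the seed is arbitrary), replicate draws a new value on every repetition, while rep draws once and copies that single value:

```r
set.seed(42)  # arbitrary seed for reproducibility
# replicate() re-evaluates runif(1) on every repetition: 3 different values
from_replicate <- replicate(3, runif(1))
# rep() evaluates runif(1) once and repeats that single value 3 times
from_rep <- rep(runif(1), 3)
from_replicate
from_rep
```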
Suppose I want to represent all of the possible combinations when rolling two four-sided dice.
v1 <- rep(1:4, times=4) # replicate the sequence 1:4, four times
v2 <- rep(1:4, each=4) # replicate 1:4, with each number replicated 4 times (in a row)
data.frame(v1, v2)
## v1 v2
## 1 1 1
## 2 2 1
## 3 3 1
## 4 4 1
## 5 1 2
## 6 2 2
## 7 3 2
## 8 4 2
## 9 1 3
## 10 2 3
## 11 3 3
## 12 4 3
## 13 1 4
## 14 2 4
## 15 3 4
## 16 4 4
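With the two columns in place, the total of each roll is just an elementwise sum, and table() counts how often each total occurs. The sketch also uses expand.grid(), a base R function not covered above, as an alternative way to build the same 16 combinations:

```r
v1 <- rep(1:4, times = 4)  # first die
v2 <- rep(1:4, each = 4)   # second die
# expand.grid() varies its first argument fastest, matching the rep() layout
grid <- expand.grid(v1 = 1:4, v2 = 1:4)
all(grid$v1 == v1 & grid$v2 == v2)   # TRUE
# Frequency of each possible total (2 through 8)
dice_totals <- table(v1 + v2)
dice_totals
```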
apply()
The apply function applies a given function to the rows
or columns of matrices (or arrays). It assembles the returned values
into a vector, array, or list, which it returns.
The apply() arguments:
X: an array (e.g., a matrix)
MARGIN: the dimension(s) over which FUN is applied
    1 indicates rows
    2 indicates columns
    c(1,2) indicates rows and columns
FUN: the function to be applied
...: additional arguments to be passed to FUN
data <- matrix(1:9, nrow=3, ncol=3)
# the following is equivalent to the command: colMeans(data)
apply(data, 2, mean) # data, columns, mean --> get column means
## [1] 2 5 8
# the following is equivalent to the command: rowSums(data)
apply(data, 1, sum) # data, rows, sum --> get row sums
## [1] 12 15 18
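The argument list above also mentions MARGIN = c(1,2). In that case FUN is applied to each individual cell, and because every call returns a single value the result keeps the 3 x 3 shape. A small sketch that doubles every entry (plain data * 2 would of course do the same):

```r
data <- matrix(1:9, nrow = 3, ncol = 3)
# MARGIN = c(1, 2): FUN receives one cell at a time
doubled <- apply(data, c(1, 2), function(cell) cell * 2)
doubled
#      [,1] [,2] [,3]
# [1,]    2    8   14
# [2,]    4   10   16
# [3,]    6   12   18
```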
We can also pass user-defined functions to apply, either written
inline or defined beforehand.
# Define the function within the apply statement:
apply(data, 2, function(x){
y <- sum(x)^2 # sum of the input vector (here a column) squared
return(y)
}
)
## [1] 36 225 576
# Define the function outside of the apply statement:
fn <- function(x){
y <- sum(x)^2 # sum of the input vector (here a column) squared
return(y)
}
apply(data, 2, fn)
## [1] 36 225 576
The type of object that apply() returns depends on the output of
FUN:
If FUN returns an element of length 1, then
apply will return a vector.
If FUN always returns an element of length \(n>1\), then apply will
return a matrix with n rows, and the number of columns will correspond
to how many rows/columns were iterated over.
If the length of the object FUN returns varies, then apply will return
a list where each element corresponds to a row or column that was
iterated over.
In short, apply returns the simplest structure that fits: a vector if
possible, then an array (matrix), then a list.
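The three cases can be seen side by side on one small matrix. In this sketch only FUN changes between calls, and the shape of the result follows the length of what FUN returns:

```r
m <- matrix(1:6, nrow = 2)   # columns: (1, 2), (3, 4), (5, 6)
# FUN returns length 1 -> apply() returns a vector
out_vec <- apply(m, 2, sum)
out_vec   # 3 7 11
# FUN always returns length 2 -> 2-row matrix, one column per column of m
out_mat <- apply(m, 2, range)
# FUN's result length varies (1, 2, 2) -> apply() returns a list
out_lst <- apply(m, 2, function(col) col[col >= 2])
```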
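Note: running apply on a data frame will cause R to first
convert the data frame using as.matrix. If the columns have different
types (for example, numeric and character), every value becomes a
character string, which is often not what we want, so be cautious
doing that.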
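A small illustration of that pitfall, using a made-up two-column data frame. Each row reaches FUN as a character vector, while a column-wise function such as sapply sees the columns with their original types:

```r
df <- data.frame(score = c(90, 75), name = c("Ann", "Bo"))
# apply() first calls as.matrix(df); with a character column present,
# every value becomes a character string
row_classes <- apply(df, 1, class)
row_classes            # "character" "character"
# sapply() works column by column, so each column keeps its own type
col_classes <- sapply(df, class)
col_classes            # score: "numeric"; name: "character" (R >= 4.0)
```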
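The next example passes extra arguments (c1 and c2) to FUN through apply's ... argument. Because fun1 always returns a vector of length 2, apply returns a matrix with 2 rows and one column for each of the 8 rows of x. (Each row arrives in fun1 as a named vector, so x[c1] and x[c2] can select values by column name.)
x <- cbind(x1 = 3, x2 = c(4:1, 2:5))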
fun1 <- function(x, c1, c2){
mean_vec <- c(mean(x[c1]), mean(x[c2]))
return(mean_vec)
}
apply(x, 1, fun1, c1 = "x1", c2 = c("x1","x2"))
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,] 3.0 3 3.0 3 3.0 3 3.0 3
## [2,] 3.5 3 2.5 2 2.5 3 3.5 4
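In the next example, the number of positive values differs from column to column, so apply returns a list.
mat <- matrix(c(-1, 1, 0,
2, -2, 20,
62,-2, -6), nrow = 3)
CheckPos <- function(Vec){
# Subset values of Vec that are positive
PosVec <- Vec[Vec > 0]
# Return only the positive values
return(PosVec)
}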
# Check Positive values by column
apply(mat, 2, CheckPos)
## [[1]]
## [1] 1
##
## [[2]]
## [1] 2 20
##
## [[3]]
## [1] 62
Use an apply function to find the interquartile range
(IQR()) of each numeric variable (weight and Time) in the ChickWeight
data. (This dataset is built into R.)
lapply()
The lapply function is used to apply a function to each
element of a list. It collects the returned values into another list,
which it returns.
Arguments:
X: a list
FUN: the function to be applied
...: additional arguments to be passed to FUN
data_lst <- list(item1 = 1:5,
item2 = seq(4,36,8),
item3 = c(1,3,5,7,9))
data_vector <- 1:8
lapply(data_lst, sum)
## $item1
## [1] 15
##
## $item2
## [1] 100
##
## $item3
## [1] 25
lapply(data_vector, sum) # lapply performs an `as.list` command on X if it's not already a list
## [[1]]
## [1] 1
##
## [[2]]
## [1] 2
##
## [[3]]
## [1] 3
##
## [[4]]
## [1] 4
##
## [[5]]
## [1] 5
##
## [[6]]
## [1] 6
##
## [[7]]
## [1] 7
##
## [[8]]
## [1] 8
x <- list(a = 1:10,
beta = exp(-3:3),
logic = c(TRUE,FALSE,FALSE,TRUE))
# compute the mean of each list element
lapply(x, mean)
## $a
## [1] 5.5
##
## $beta
## [1] 4.535125
##
## $logic
## [1] 0.5
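The ... argument listed earlier passes extra arguments through to FUN on every call. In this sketch (re-creating data_lst from above), quantile()'s probs argument is supplied once and reused for each list element:

```r
data_lst <- list(item1 = 1:5,
                 item2 = seq(4, 36, 8),
                 item3 = c(1, 3, 5, 7, 9))
# probs = c(0.25, 0.75) is forwarded to quantile() for every element
quartiles <- lapply(data_lst, quantile, probs = c(0.25, 0.75))
quartiles$item2
# 25% 75%
#  12  28
```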
Consider the built-in data set iris. If we use the
as.list() function, each column is converted into an
element of a list.
head(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
str(as.list(iris))
## List of 5
## $ Sepal.Length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
So if we use lapply() on a data frame, it iterates over
the columns. For example, for each numeric variable (columns 1-4) we
can find all values that are greater than that variable's mean.
lapply(iris[,1:4], function(column){
big_values <- column[column > mean(column)]
return(big_values)
})
## $Sepal.Length
## [1] 7.0 6.4 6.9 6.5 6.3 6.6 5.9 6.0 6.1 6.7 6.2 5.9 6.1 6.3 6.1 6.4 6.6 6.8 6.7
## [20] 6.0 6.0 6.0 6.7 6.3 6.1 6.2 6.3 7.1 6.3 6.5 7.6 7.3 6.7 7.2 6.5 6.4 6.8 6.4
## [39] 6.5 7.7 7.7 6.0 6.9 7.7 6.3 6.7 7.2 6.2 6.1 6.4 7.2 7.4 7.9 6.4 6.3 6.1 7.7
## [58] 6.3 6.4 6.0 6.9 6.7 6.9 6.8 6.7 6.7 6.3 6.5 6.2 5.9
##
## $Sepal.Width
## [1] 3.5 3.2 3.1 3.6 3.9 3.4 3.4 3.1 3.7 3.4 4.0 4.4 3.9 3.5 3.8 3.8 3.4 3.7 3.6
## [20] 3.3 3.4 3.4 3.5 3.4 3.2 3.1 3.4 4.1 4.2 3.1 3.2 3.5 3.6 3.4 3.5 3.2 3.5 3.8
## [39] 3.8 3.2 3.7 3.3 3.2 3.2 3.1 3.3 3.1 3.2 3.4 3.1 3.3 3.6 3.2 3.2 3.8 3.2 3.3
## [58] 3.2 3.8 3.4 3.1 3.1 3.1 3.1 3.2 3.3 3.4
##
## $Petal.Length
## [1] 4.7 4.5 4.9 4.0 4.6 4.5 4.7 4.6 3.9 4.2 4.0 4.7 4.4 4.5 4.1 4.5 3.9 4.8 4.0
## [20] 4.9 4.7 4.3 4.4 4.8 5.0 4.5 3.8 3.9 5.1 4.5 4.5 4.7 4.4 4.1 4.0 4.4 4.6 4.0
## [39] 4.2 4.2 4.2 4.3 4.1 6.0 5.1 5.9 5.6 5.8 6.6 4.5 6.3 5.8 6.1 5.1 5.3 5.5 5.0
## [58] 5.1 5.3 5.5 6.7 6.9 5.0 5.7 4.9 6.7 4.9 5.7 6.0 4.8 4.9 5.6 5.8 6.1 6.4 5.6
## [77] 5.1 5.6 6.1 5.6 5.5 4.8 5.4 5.6 5.1 5.1 5.9 5.7 5.2 5.0 5.2 5.4 5.1
##
## $Petal.Width
## [1] 1.4 1.5 1.5 1.3 1.5 1.3 1.6 1.3 1.4 1.5 1.4 1.3 1.4 1.5 1.5 1.8 1.3 1.5 1.2
## [20] 1.3 1.4 1.4 1.7 1.5 1.2 1.6 1.5 1.6 1.5 1.3 1.3 1.3 1.2 1.4 1.2 1.3 1.2 1.3
## [39] 1.3 1.3 2.5 1.9 2.1 1.8 2.2 2.1 1.7 1.8 1.8 2.5 2.0 1.9 2.1 2.0 2.4 2.3 1.8
## [58] 2.2 2.3 1.5 2.3 2.0 2.0 1.8 2.1 1.8 1.8 1.8 2.1 1.6 1.9 2.0 2.2 1.5 1.4 2.3
## [77] 2.4 1.8 1.8 2.1 2.4 2.3 1.9 2.3 2.5 2.3 1.9 2.0 2.3 1.8
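Because lapply always returns a list, a second step is often needed to summarize the result. One option is base R's lengths(), which returns the length of each list element. Here it gives the number of above-mean values for each variable:

```r
big_values <- lapply(iris[, 1:4], function(column){
  column[column > mean(column)]
})
# lengths() returns the length of each list element as a named integer vector
n_big <- lengths(big_values)
n_big
```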
Use lapply to find the range for each item in the list
data_lst (which should already be in your R environment
from an earlier code chunk).
sapply()
The sapply function works much like the
lapply function. The primary difference is that
sapply attempts to simplify the result into a vector or
matrix (instead of a list). This simplification follows the same rules as in
apply.
lapply(data_lst, sum) # returns a list
## $item1
## [1] 15
##
## $item2
## [1] 100
##
## $item3
## [1] 25
sapply(data_lst, sum) # returns a vector
## item1 item2 item3
## 15 100 25
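The simplification rules carry over from apply: equal-length results of length greater than 1 become a matrix, and results of varying length fall back to a list. The sketch also shows vapply(), a closely related base R function not covered above, which takes a template (FUN.VALUE) for each result and stops with an error if FUN returns something else:

```r
data_lst <- list(item1 = 1:5, item2 = seq(4, 36, 8), item3 = c(1, 3, 5, 7, 9))
# Each call returns a length-2 vector, so sapply() builds a 2 x 3 matrix
stats <- sapply(data_lst, function(v) c(n = length(v), total = sum(v)))
stats
# Result lengths differ (0, 4, 0), so sapply() returns a list
big <- sapply(data_lst, function(v) v[v > 10])
# vapply() declares the expected shape of each result: one number
item_totals <- vapply(data_lst, sum, numeric(1))
```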
Use sapply to find the range for each item in the list
data_lst.
tapply()
The tapply function breaks the data set up into groups
and applies a function to each group.
Arguments:
X: A one-dimensional object
INDEX: A grouping factor or a list of factors
FUN: The function to be applied
data <- data.frame(name=c("Amy","Jose","Ray","Kim","Sam","Eve","Bob"),
age=c(24, 22, 21, 23, 20, 24, 21),
gender=factor(c("F","M","M","F","M","F","M")))
tapply(data$age, data$gender, min) # age, grouped by gender, min for each group
## F M
## 23 20
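INDEX can also be a list of factors, in which case tapply returns a table with one dimension per factor. A sketch using the built-in warpbreaks data (not used elsewhere in this section), which records the number of breaks by wool type and tension:

```r
# Total breaks for every wool x tension combination: a 2 x 3 table
breaks_totals <- tapply(warpbreaks$breaks,
                        list(wool = warpbreaks$wool,
                             tension = warpbreaks$tension),
                        sum)
breaks_totals
```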
For the ChickWeight data, use tapply to find
the mean weight for each chick.
mapply()
The mapply function is a multivariate version of
sapply. It applies FUN to the first elements
of each ... argument, then to the second elements, the third
elements, and so on.
Arguments:
FUN: The function to be applied
...: Arguments to vectorize over (vectors or lists of
strictly positive length, or all of zero length)
mapply(rep, times = 1:4, x = 4:1)
## [[1]]
## [1] 4
##
## [[2]]
## [1] 3 3
##
## [[3]]
## [1] 2 2 2
##
## [[4]]
## [1] 1 1 1 1
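Because each call in that example returned a different length, mapply returned a list. When every call returns a single value, mapply simplifies the result to a vector (as sapply would), and its MoreArgs argument supplies arguments that stay the same on every call instead of being vectorized over. A minimal sketch:

```r
# Elementwise sums: FUN gets (1, 4), then (2, 5), then (3, 6)
sums <- mapply(function(a, b) a + b, 1:3, 4:6)
sums     # 5 7 9
# MoreArgs supplies arguments that stay fixed (not vectorized) across calls
scaled <- mapply(function(a, b, scale) (a + b) * scale, 1:3, 4:6,
                 MoreArgs = list(scale = 10))
scaled   # 50 70 90
```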
More information and examples: http://adv-r.had.co.nz/Functionals.html