A matrix is a two-dimensional array of data of the same type. To create a matrix, we need to tell R the data along with the number of rows and columns.
m1 <- matrix(c(1, 9, 2, 0, 5, 7, 3, 8, 4),
nrow=3, ncol=3)
When a matrix is printed, R labels each row ([1,], [2,], and so on) and each column ([,1], [,2], and so on). When R puts the data into the matrix, it fills it column by column by default. We can override this with the byrow argument:
m2 <- matrix(c(1, 9, 2, 0, 5, 7, 3, 8, 4),
nrow=3, ncol=3,
byrow = TRUE)
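To see the two fill orders side by side, here is a minimal sketch using a smaller 2-by-3 matrix. The values `1:6` and the names `by_col` and `by_row` are made up for illustration:

```r
# the same six values, filled two different ways
by_col <- matrix(1:6, nrow = 2, ncol = 3)                # default: column by column
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)  # row by row

by_col[1, ]  # first row when filling by column: 1 3 5
by_row[1, ]  # first row when filling by row:    1 2 3
```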
We can create matrices of doubles, integers, logicals, characters, and so on, but matrices are mostly used for numeric data.
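A quick sketch of non-numeric matrices, with made-up values. Because every entry must share one type, mixing types inside `c()` coerces everything to a common type (here, character) before the matrix is built:

```r
# a character matrix and a logical matrix
chars <- matrix(c("a", "b", "c", "d"), nrow = 2)
flags <- matrix(c(TRUE, FALSE, FALSE, TRUE), nrow = 2)
typeof(chars)  # "character"
typeof(flags)  # "logical"

# mixing types: the number and the logical become text
mixed <- matrix(c(1, "b", TRUE, 4), nrow = 2)
typeof(mixed)  # "character"
mixed[1, 2]    # "TRUE", now a string rather than a logical
```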
Use R to create and store the following matrix:
1  5 19
6  2  9
0 12 13
As with vectors, we can use arithmetic operators on matrices. These operations are vectorized, meaning they work element by element; they are NOT matrix math.
# multiply by a scalar
2 * m1
# add matrices (element by element)
m1 + m2
# multiply matrices (element by element, NOT matrix multiplication)
m1 * m2
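Because `*` works element by element, each entry of the product is just the product of the matching entries. A small sketch to check this, using two made-up 2-by-2 matrices `a` and `b`:

```r
a <- matrix(c(1, 2, 3, 4), nrow = 2)
b <- matrix(c(10, 20, 30, 40), nrow = 2)

elementwise <- a * b
elementwise        # each cell is a[i, j] * b[i, j]
elementwise[2, 1]  # a[2, 1] * b[2, 1] = 2 * 20 = 40
```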
What about matrix math? We won't need it in this class, but R can absolutely handle it. The %*% operator performs true matrix multiplication:
# actual matrix multiplication
m1 %*% m2
## [,1] [,2] [,3]
## [1,] 10 33 14
## [2,] 33 170 85
## [3,] 14 85 69
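Matrix multiplication only requires the number of columns of the first matrix to equal the number of rows of the second, so the matrices need not be square. A minimal sketch with made-up 2-by-3 and 3-by-2 matrices `x` and `y`:

```r
x <- matrix(1:6, nrow = 2)  # 2 x 3
y <- matrix(1:6, nrow = 3)  # 3 x 2

xy <- x %*% y
dim(xy)  # 2 2: (2 x 3) times (3 x 2) gives a 2 x 2 result
xy       # [1, 1] is 1*1 + 3*2 + 5*3 = 22

# x * y would fail here: element-wise operations need identical dimensions
```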
A data frame is a collection of same-length vectors of (potentially) different types, with each vector forming one column. This is how most data is stored in R.
We can create a data frame manually using a set of same-length vectors.
age <- c(1, 8, 10, 30, 31)
gender <- c("Female", "Female", "Male","Female","Male")
married <- c(FALSE, FALSE, FALSE, TRUE, TRUE)
simpsons <- data.frame(age, gender, married)
simpsons
## age gender married
## 1 1 Female FALSE
## 2 8 Female FALSE
## 3 10 Male FALSE
## 4 30 Female TRUE
## 5 31 Male TRUE
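The "same-length" requirement is enforced. As a sketch with made-up vectors: R recycles a shorter vector only when its length divides the number of rows evenly, so lengths 3 and 2 produce an error. `tryCatch()` is used here only so the error can be inspected instead of stopping the script:

```r
# lengths 3 and 2: 3 is not a multiple of 2, so no recycling is possible
result <- tryCatch(
  data.frame(x = 1:3, y = c("a", "b")),
  error = function(e) e
)
inherits(result, "error")  # TRUE
conditionMessage(result)   # R's explanation of the row-count mismatch
```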
We can see the class of each column in a data frame using sapply(), which applies class() to every column:
sapply(simpsons, class)
## age gender married
## "numeric" "character" "logical"
To quickly check the size (dimensions) of a matrix or data frame, use dim(), nrow(), and ncol(). dim() returns the number of rows and columns together; nrow() and ncol() return each one separately.
dim(simpsons)
## [1] 5 3
nrow(simpsons)
## [1] 5
ncol(simpsons)
## [1] 3
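These functions work the same way on a matrix. One related trap is that `length()` means different things for the two objects. A short sketch, using the matrix from above and a small made-up data frame `df`:

```r
m <- matrix(c(1, 9, 2, 0, 5, 7, 3, 8, 4), nrow = 3, ncol = 3)
df <- data.frame(age = c(1, 8, 10, 30, 31),
                 married = c(FALSE, FALSE, FALSE, TRUE, TRUE))

dim(m)      # 3 3
length(m)   # 9: a matrix's length is its number of entries
length(df)  # 2: a data frame's length is its number of columns
```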
For vectors, we access individual elements using [index] (or a vector of indices). For matrices and data frames, we access individual elements using [row index, column index].
m <- matrix(c(1, 9, 2, 0, 5, 7, 3, 8, 4),
nrow=3, ncol=3)
m
## [,1] [,2] [,3]
## [1,] 1 0 3
## [2,] 9 5 8
## [3,] 2 7 4
m[1,2]
## [1] 0
m[c(1,2),c(1,3)]
## [,1] [,2]
## [1,] 1 3
## [2,] 9 8
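As with vectors, negative indices drop the listed rows or columns instead of keeping them. A quick sketch using the same `m`:

```r
m <- matrix(c(1, 9, 2, 0, 5, 7, 3, 8, 4), nrow = 3, ncol = 3)

# drop the first row, keep every column: a 2 x 3 matrix
m[-1, ]
# keep rows 1 and 3, drop the second column: a 2 x 2 matrix
m[c(1, 3), -2]
```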
We can access entire rows/columns by leaving one dimension blank.
# isolate the second row
m[2,]
## [1] 9 5 8
#isolate the second column
m[,2]
## [1] 0 5 7
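Notice that `m[2,]` comes back as a plain vector, not a one-row matrix. If the matrix shape matters, R's indexing accepts a `drop = FALSE` argument. A minimal sketch (the names `row_vec` and `row_mat` are illustrative):

```r
m <- matrix(c(1, 9, 2, 0, 5, 7, 3, 8, 4), nrow = 3, ncol = 3)

row_vec <- m[2, ]                # dimensions dropped: a vector of length 3
row_mat <- m[2, , drop = FALSE]  # stays a 1 x 3 matrix

is.matrix(row_vec)  # FALSE
dim(row_mat)        # 1 3
```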
Data frames (like lists) also support a special access operator: the dollar sign, $. It lets us access a column by its name.
age1 <- simpsons$age
age1
## [1] 1 8 10 30 31
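`$` is one of several ways to pull out a column by name; double brackets and the `[row, column]` form with a column name return the same vector. A small sketch using a made-up stand-in data frame `df` (so the `simpsons` object is left untouched):

```r
df <- data.frame(age = c(1, 8, 10, 30, 31),
                 married = c(FALSE, FALSE, FALSE, TRUE, TRUE))

a1 <- df$age
a2 <- df[["age"]]
a3 <- df[, "age"]  # a single column is returned as a plain vector

identical(a1, a2)  # TRUE
identical(a1, a3)  # TRUE
```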
We sometimes need to convert matrices to data frames and vice versa.
m <- as.data.frame(m)
m
## V1 V2 V3
## 1 1 0 3
## 2 9 5 8
## 3 2 7 4
m <- as.matrix(m)
m
## V1 V2 V3
## [1,] 1 0 3
## [2,] 9 5 8
## [3,] 2 7 4
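One caution: a matrix holds only one type, so converting a data frame with a character column to a matrix turns every entry into character. A sketch with a made-up two-column data frame `df`:

```r
df <- data.frame(age = c(1, 8, 10),
                 gender = c("Female", "Female", "Male"))

mat <- as.matrix(df)
is.matrix(mat)  # TRUE
typeof(mat)     # "character": the ages are now stored as text
colnames(mat)   # "age" "gender": column names carry over
```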
We also routinely need to change the type of individual columns in a data frame.
simpsons$gender <- as.factor(simpsons$gender)
simpsons$gender
## [1] Female Female Male Female Male
## Levels: Female Male
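A factor stores each value as an integer code plus a set of labels (the levels). A short sketch of inspecting and undoing the conversion, using a stand-alone copy `g` of the gender values:

```r
g <- factor(c("Female", "Female", "Male", "Female", "Male"))

levels(g)        # "Female" "Male" (sorted alphabetically by default)
as.integer(g)    # 1 1 2 1 2: the underlying level codes, not the labels
as.character(g)  # back to the original strings
```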
In both matrices and data frames, we can name both rows and columns. Data frames always have column names, but matrices may not. This is partly because data frames are used to store and work with data, whereas matrices are used more for linear algebra calculations.
rownames(simpsons)
## [1] "1" "2" "3" "4" "5"
colnames(simpsons)
## [1] "age" "gender" "married"
We can also assign row and column names.
rownames(simpsons) <- c("Maggie", "Lisa", "Bart", "Marge", "Homer")
colnames(simpsons) <- c("Age", "Gender", "Married")
simpsons
## Age Gender Married
## Maggie 1 Female FALSE
## Lisa 8 Female FALSE
## Bart 10 Male FALSE
## Marge 30 Female TRUE
## Homer 31 Male TRUE
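Matrices use the same rownames() and colnames() functions, and once names are set, entries can be accessed by name instead of position. A minimal sketch; the names "a", "b", "c" and "x", "y", "z" are hypothetical:

```r
m <- matrix(c(1, 9, 2, 0, 5, 7, 3, 8, 4), nrow = 3, ncol = 3)
colnames(m)  # NULL: a newly created matrix has no names

rownames(m) <- c("a", "b", "c")
colnames(m) <- c("x", "y", "z")
m["b", "z"]  # 8, the same entry as m[2, 3]
```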
There are other object types. You may occasionally run into them, or find that you need one. For example, you may need an object with three or more dimensions (called an array), but we will not use these in this class.
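For reference only, here is a small made-up three-dimensional array. Indexing simply takes one more position per extra dimension, and values fill the first dimension fastest, just as matrices fill column by column:

```r
# a 2 x 3 x 4 array: four "layers", each a 2 x 3 matrix
arr <- array(1:24, dim = c(2, 3, 4))

dim(arr)      # 2 3 4
arr[1, 2, 3]  # row 1, column 2, layer 3: 15
arr[, , 1]    # the first layer, returned as a 2 x 3 matrix
```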