We got a brief introduction to 1-D objects in R when we discussed naming and storing objects. R has a variety of different types of objects that store different types and dimensions of information.
A vector is a 1-D object, meaning that it stores a sequence of elements in a particular order. In general, the only rule here is that the elements all need to be the same type - numeric, character, etc. In our intro to R lab, we saw vectors with one element:
a <- 2
To create vectors with multiple elements, we use the concatenate function: c()
:
v <- c(1,2,3)
Create a vector named v2
of the (ordered) numbers: 5, 2, 9, and 10.
The data types available in R are double
, integer
, complex
, logical
, character
, and raw
.
raw
type is almost never used in practice.Unlike in other programming languages, we don’t need to tell R which data type we’re using. However, we may occasionally want to change whatever R did automatically. We can set this up when we create a vector, but I think the easiest way to manage this is to let R do its thing and then use as.type
to make the change (where “type” is replaced by the desired data type). For example,
v.double <- as.double(v)
We can also create vectors by combine other vectors of the same type:
v.combined <- c(v, v2)
Convert the variable v2
to character.
Vectors of all variable types can also include NA
, which indicates a missing value.
We can check which type R is using for a variable using typeof()
. Let’s check v
and v2
to make sure we got what we wanted (v
should be a double; v2
should be character).
We enter character type values as strings enclosed in double or single quotes.
char1 <- c("banana","apple")
is exactly equivalent to
char1 <- c('banana','apple')
Logical vectors can be entered directly using TRUE
and FALSE
or T
and F
. This is one place where R being case sensitive is really important - you can use t
and f
as variable names, but T
and F
are reserved.
logi1 <- c(FALSE, TRUE, T, F)
You can also create logical vectors by using logical operators within the c()
function.
logi2 <- c(2>3, 5!=6)
You can check the length of a vector using length()
:
length(logi1)
## [1] 4
We can do math on vectors directly! They don’t need to be of exactly the same type, but they do need to be of the same class. We can check the class using class()
.
Operations are performed element-by-element.
v3 <- c(1, 5, 6)
v4 <- c(-0.41, -1.20, pi)
class(v3)
## [1] "numeric"
class(v4)
## [1] "numeric"
Notice that class()
tells us that both vectors are numeric, so we’re good to go with the math. Let’s try a few operations with v3
and v4
:
This can sometimes feel a little weird, especially if you’ve taken vector calculus. That’s ok!
If we use a scalar (including an object holding only one value, like a <- 3
), it will apply that scalar operation to every element in the vector. Let’s try a few with v3
:
v5 <- c(10,20)
Let’s try a couple of operations with v3
and v4
:
What happened?
We can’t do math with character vectors.
We can do math with logical vectors, though. Since logical vectors are binary, R will treat TRUE
and 1 and FALSE
as 0. Why might this be useful?
Sometimes we may want to access or use specific elements from a vector without creating new objects for each element. An index is the location of an element in a vector. R starts its indexing at 1. Let’s get the first element from v5
.
You should add comments to the following code chunks describing what they do.
v5[1]
## [1] 10
We can also tell R to exclude specific elements by using the negative of the element we want to exclude.
v5[-1]
## [1] 20
And we can ask for several indices at once:
v3[c(3,1)]
## [1] 6 1
Factors are R’s way of working with categorical data.
hair <- c("Blonde", "Black", "Black", "Red", "Blonde", "Brown", "Black")
typeof(hair)
## [1] "character"
To make this more useful as categorical data, we convert to factor
:
hair <- as.factor(hair)
Levels are part of what makes factors easier to work with than basic character data. These show all of the unique or distinct values the variable takes on.
This also allows us to give R more information than the data might originally contain. How?
directions <- c("North", "West", "North", "East", "North", "West", "East")
f <- factor(directions)
The data is clearly working with the cardinal directions - north, east, south, and west - but since “south” is missing from the data, it’s also missing from the levels. If we wanted to write up or plot these data, we would probably want south represented. Luckily, we can add it in by telling R exactly what the levels should be:
f <- factor(directions,
levels = c("North", "East", "South", "West"))
Side note: what about that equals sign?
We can also take this a step further and abbreviate the level names using labels
:
f <- factor(directions,
levels = c("North", "East", "South", "West"),
labels = c("N", "E", "S", "W"))
Convert the following character vector to a factor, then give it levels cat
, dog
, bird
, and fish
. Save this factor in an object called pets
.
animals <- c("cat","dog","dog","cat","fish","fish","dog","cat")
Other times, we may work with the data in such a way where R has factor levels for which there are no observations and we might not want them anymore. In this case, we can use droplevels
to reduce the levels to only those for which we have data.
record <- c("win", "loss", "loss", "win", "loss", "win")
f <- factor(record,
levels = c("loss", "tie", "win"))
f.new <- droplevels(f)
Sometimes the levels of a variable have some natural ordering. For example, it’s common to have survey questions with response options “strongly disagree”, “disagree”, “neutral”, “agree”, and “strongly agree”.
To order a factor, we give R the labels in our desired order (from “lowest” to “highest”) and add ordered = TRUE
.
record = c("win", "tie", "loss", "tie", "loss", "win", "win")
f = factor(record,
ordered = TRUE,
levels = c("loss", "tie", "win"))
Take your pets
factor and order the levels in whatever way you like. Then, drop any unnecessary levels.
For the most part, the basic features available to use for vectors are the same for factors.
We can examine the length:
…convert to other variable types:
…get the class
(which will give us more information than just the type
):
…name the elements:
named1 <- c(sally = "win", tom = "win", ed = "lost", jane = "tie")
named1 <- factor(named1)
…and access elements:
For a factor, the summary
function will give a quick table overview of the contents of a factor. The table
function will do basically the same thing.
We can extend the use of table to create two-way tables:
hair <- c("Blonde", "Black", "Black", "Red", "Blonde", "Brown", "Black")
own_pets <- c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE)
hair <- factor(hair)
own_pets <- factor(own_pets)
table(hair, own_pets)
## own_pets
## hair FALSE TRUE
## Black 2 1
## Blonde 1 1
## Brown 1 0
## Red 0 1
What needs to be true for these tables to make sense?