Vectors

One-Dimensional Objects

We got a brief introduction to 1-D objects in R when we discussed naming and storing objects. R has a variety of different types of objects that store different types and dimensions of information.

A vector is a 1-D object, meaning that it stores a sequence of elements in a particular order. In general, the only rule here is that the elements all need to be the same type - numeric, character, etc. In our intro to R lab, we saw vectors with one element:

a <- 2

To create vectors with multiple elements, we use the concatenate function: c():

v <- c(1,2,3)

On Your Own

Create a vector named v2 of the (ordered) numbers: 5, 2, 9, and 10.

Types of Data

The data types available in R are double, integer, complex, logical, character, and raw.

  • Double: Real numbers
  • Integer: Integer numbers
  • Complex: Complex numbers
  • Logical: A vector containing True/False values
  • Character: a vector containing strings.
  • Raw: holds raw bytes (bytes are a unit of digital information storage). The raw type is almost never used in practice.

Unlike in other programming languages, we don’t need to tell R which data type we’re using. However, we may occasionally want to change whatever R did automatically. We can set this up when we create a vector, but I think the easiest way to manage this is to let R do its thing and then use as.type to make the change (where “type” is replaced by the desired data type). For example,

v.double <- as.double(v)

We can also create vectors by combine other vectors of the same type:

v.combined <- c(v, v2)

On Your Own

Convert the variable v2 to character.

Vectors of all variable types can also include NA, which indicates a missing value.

We can check which type R is using for a variable using typeof(). Let’s check v and v2 to make sure we got what we wanted (v should be a double; v2 should be character).

More on the Character Type

We enter character type values as strings enclosed in double or single quotes.

char1 <- c("banana","apple")

is exactly equivalent to

char1 <- c('banana','apple')

More on the Logical Type

Logical vectors can be entered directly using TRUE and FALSE or T and F. This is one place where R being case sensitive is really important - you can use t and f as variable names, but T and F are reserved.

logi1 <- c(FALSE, TRUE, T, F)

You can also create logical vectors by using logical operators within the c() function.

logi2 <- c(2>3, 5!=6)

Basic Features

Checking Length

You can check the length of a vector using length():

length(logi1)
## [1] 4

Vectorized Operations

We can do math on vectors directly! They don’t need to be of exactly the same type, but they do need to be of the same class. We can check the class using class().

Operations are performed element-by-element.

v3 <- c(1, 5, 6)
v4 <- c(-0.41, -1.20,  pi)
class(v3)
## [1] "numeric"
class(v4)
## [1] "numeric"

Notice that class() tells us that both vectors are numeric, so we’re good to go with the math. Let’s try a few operations with v3 and v4:

This can sometimes feel a little weird, especially if you’ve taken vector calculus. That’s ok!

If we use a scalar (including an object holding only one value, like a <- 3), it will apply that scalar operation to every element in the vector. Let’s try a few with v3:

Vectors of different lengths

v5 <- c(10,20)

Let’s try a couple of operations with v3 and v4:

What happened?

What about character and logical vectors?

We can’t do math with character vectors.

We can do math with logical vectors, though. Since logical vectors are binary, R will treat TRUE and 1 and FALSE as 0. Why might this be useful?

Accessing Individual Elements

Sometimes we may want to access or use specific elements from a vector without creating new objects for each element. An index is the location of an element in a vector. R starts its indexing at 1. Let’s get the first element from v5.

You should add comments to the following code chunks describing what they do.

v5[1]
## [1] 10

We can also tell R to exclude specific elements by using the negative of the element we want to exclude.

v5[-1]
## [1] 20

And we can ask for several indices at once:

v3[c(3,1)]
## [1] 6 1

Factors

Factors are R’s way of working with categorical data.

hair <- c("Blonde", "Black", "Black", "Red", "Blonde", "Brown", "Black")
typeof(hair)
## [1] "character"

To make this more useful as categorical data, we convert to factor:

hair <- as.factor(hair)

Levels

Levels are part of what makes factors easier to work with than basic character data. These show all of the unique or distinct values the variable takes on.

This also allows us to give R more information than the data might originally contain. How?

directions <- c("North", "West", "North", "East", "North", "West", "East")
f <- factor(directions)

The data is clearly working with the cardinal directions - north, east, south, and west - but since “south” is missing from the data, it’s also missing from the levels. If we wanted to write up or plot these data, we would probably want south represented. Luckily, we can add it in by telling R exactly what the levels should be:

f <- factor(directions,
            levels = c("North", "East", "South", "West")) 

Side note: what about that equals sign?

We can also take this a step further and abbreviate the level names using labels:

f <- factor(directions,
            levels = c("North", "East", "South", "West"),
            labels = c("N", "E", "S", "W"))

On Your Own

Convert the following character vector to a factor, then give it levels cat, dog, bird, and fish. Save this factor in an object called pets.

animals <- c("cat","dog","dog","cat","fish","fish","dog","cat")

Removing Extraneous Levels

Other times, we may work with the data in such a way where R has factor levels for which there are no observations and we might not want them anymore. In this case, we can use droplevels to reduce the levels to only those for which we have data.

record <- c("win", "loss", "loss", "win", "loss", "win")
f <- factor(record,
           levels = c("loss", "tie", "win"))
f.new <- droplevels(f)

Ordered Factors

Sometimes the levels of a variable have some natural ordering. For example, it’s common to have survey questions with response options “strongly disagree”, “disagree”, “neutral”, “agree”, and “strongly agree”.

To order a factor, we give R the labels in our desired order (from “lowest” to “highest”) and add ordered = TRUE.

record = c("win", "tie", "loss", "tie", "loss", "win", "win")
f = factor(record, 
           ordered = TRUE, 
           levels = c("loss", "tie", "win"))

On Your Own

Take your pets factor and order the levels in whatever way you like. Then, drop any unnecessary levels.

Basic Features

For the most part, the basic features available to use for vectors are the same for factors.

We can examine the length:

…convert to other variable types:

…get the class (which will give us more information than just the type):

…name the elements:

named1 <- c(sally = "win", tom = "win", ed = "lost", jane = "tie")
named1 <- factor(named1)

…and access elements:

Frequency Tables

For a factor, the summary function will give a quick table overview of the contents of a factor. The table function will do basically the same thing.

We can extend the use of table to create two-way tables:

hair <- c("Blonde", "Black", "Black", "Red", "Blonde", "Brown", "Black")
own_pets <- c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE)

hair <- factor(hair)
own_pets <- factor(own_pets)

table(hair, own_pets)
##         own_pets
## hair     FALSE TRUE
##   Black      2    1
##   Blonde     1    1
##   Brown      1    0
##   Red        0    1

What needs to be true for these tables to make sense?