# Lesson 5 - Vector Subsetting

Individual elements within a vector can be accessed by using square brackets [ ] with the vector name. To extract a single element, put the element number in square brackets following the vector name (note the element positions are 1-based):

    > jersey_numbers <- c(Pierce = 34L, Garnett = 5L, Rondo = 9L, Allen = 20L, Perkins = 43L)
    > jersey_numbers[1]
    Pierce
        34

If the vector elements are named, the element name can be used instead of the element number:

    > jersey_numbers["Pierce"]
    Pierce
        34

To extract multiple elements from a vector, pass a vector of element positions to the square brackets. The values of that vector correspond to the elements to be extracted. Here we will extract the first, third, and fourth elements of the jersey_numbers vector:

    > jersey_numbers[c(1,3,4)]
    Pierce  Rondo  Allen
        34      9     20

The positions can be listed in any order:

    > jersey_numbers[c(4,1,3)]
     Allen Pierce  Rondo
        20     34      9

Multiple elements can also be extracted via label names. Pass in a character class vector to the square brackets. The label names can be in any order:

    > jersey_numbers[c("Perkins", "Rondo")]
    Perkins   Rondo
         43       9
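One thing worth knowing when subsetting by name or position: asking for something that isn't there does not raise an error. The sketch below uses "Bird", a hypothetical name that is not in the lesson's vector, and position 6, which is one past the end. Both come back as NA.

```r
# Same vector as in the lesson
jersey_numbers <- c(Pierce = 34L, Garnett = 5L, Rondo = 9L, Allen = 20L, Perkins = 43L)

# "Bird" is a made-up name for illustration; it is not in the vector
missing_name <- jersey_numbers["Bird"]
out_of_range <- jersey_numbers[6]

anyNA(missing_name)   # TRUE: no error, just a missing value
anyNA(out_of_range)   # TRUE

# Checking membership first avoids surprise NAs
"Bird" %in% names(jersey_numbers)    # FALSE
"Rondo" %in% names(jersey_numbers)   # TRUE
```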

The negative sign operator - can be used to specify elements that should not be extracted:

    > jersey_numbers
     Pierce Garnett   Rondo   Allen Perkins
         34       5       9      20      43
    > jersey_numbers[-4] #All elements, except the fourth
     Pierce Garnett   Rondo Perkins
         34       5       9      43
    > jersey_numbers[-c(4,5)] #All elements, except the fourth and fifth
     Pierce Garnett   Rondo
         34       5       9
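The negative sign only works with positions. Something like jersey_numbers[-"Allen"] fails, because - can't be applied to a character value. Here is a minimal sketch of one way to exclude by name instead, using setdiff() on the names. The variable name keep is just for illustration.

```r
# Same vector as in the lesson
jersey_numbers <- c(Pierce = 34L, Garnett = 5L, Rondo = 9L, Allen = 20L, Perkins = 43L)

# Names to keep: every name except the ones we want to drop
keep <- setdiff(names(jersey_numbers), c("Allen", "Perkins"))
by_name <- jersey_numbers[keep]

# Same result as excluding the fourth and fifth positions
by_position <- jersey_numbers[-c(4, 5)]
identical(by_name, by_position)   # TRUE
```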

### Integer Sequences

So far, we've been creating integer class vectors with the c() function. To create a vector of consecutive integers, there's a programming shortcut: the colon :. Here are a few examples:

    > 1:10 #Vector of integers from 1 to 10
     [1]  1  2  3  4  5  6  7  8  9 10
    > sequence_vector <- 20:25 #Vector of integers assigned to a variable
    > sequence_vector
    [1] 20 21 22 23 24 25
    > 15:9 #Vector of integers in reverse.
    [1] 15 14 13 12 11 10  9

I suspect the colon : shortcut gets used frequently in R. Back to the jersey_numbers vector. Let's extract the first three elements, and then the last three elements in reverse:

    > jersey_numbers[1:3]
     Pierce Garnett   Rondo
         34       5       9
    > jersey_numbers[5:3]
    Perkins   Allen   Rondo
         43      20       9
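The 5:3 above hardcodes the vector's length. As a sketch of how the same idea might be written without knowing the length in advance, length() can supply the end point. The variable n is just a helper name. Note the parentheses: the colon binds tighter than subtraction, so n:n - 2 would mean (n:n) - 2.

```r
# Same vector as in the lesson
jersey_numbers <- c(Pierce = 34L, Garnett = 5L, Rondo = 9L, Allen = 20L, Perkins = 43L)

n <- length(jersey_numbers)

# Last three elements in reverse, without hardcoding 5:3
last_three_rev <- jersey_numbers[n:(n - 2)]

# An alternative spelling using tail() and rev()
also_last_three_rev <- rev(tail(jersey_numbers, 3))

# seq() handles steps other than 1: every other element
every_other <- jersey_numbers[seq(1, n, by = 2)]
```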

### Logical Class Vectors

Here is a design pattern for extracting vector elements using a logical (boolean) class vector. To begin, we'll find the elements of jersey_numbers that are less than 10:

    > jersey_numbers < 10
     Pierce Garnett   Rondo   Allen Perkins
      FALSE    TRUE    TRUE   FALSE   FALSE

A vector of those logical values lets us extract just the elements that are TRUE (jersey number is less than 10):

    > jersey_numbers[c(FALSE, TRUE, TRUE, FALSE, FALSE)]
    Garnett   Rondo
          5       9

The same can be done programmatically:

    > single_digits <- jersey_numbers < 10
    > jersey_numbers[single_digits]
    Garnett   Rondo
          5       9

Alternatively, the above could be written as a single statement:

    > jersey_numbers[jersey_numbers < 10]
    Garnett   Rondo
          5       9
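The same pattern extends to more than one condition. As a small sketch, the & operator combines two logical vectors element by element, and which() turns a logical vector into positions. The cutoffs 9 and 34 are arbitrary values picked for illustration.

```r
# Same vector as in the lesson
jersey_numbers <- c(Pierce = 34L, Garnett = 5L, Rondo = 9L, Allen = 20L, Perkins = 43L)

# Two conditions combined element-wise with &
mid_range <- jersey_numbers[jersey_numbers >= 9 & jersey_numbers <= 34]

# Positions of the TRUE values rather than the values themselves
single_digit_positions <- which(jersey_numbers < 10)

# Just the names of the matching elements
single_digit_names <- names(jersey_numbers)[jersey_numbers < 10]
```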

### Recycling

Our jersey_numbers vector has five elements. What happens if we try to extract elements using a vector of logicals that itself only has two elements?

    > jersey_numbers
     Pierce Garnett   Rondo   Allen Perkins
         34       5       9      20      43
    > jersey_numbers[c(TRUE, FALSE)]
     Pierce   Rondo Perkins
         34       9      43

The code above returned the first, third, and fifth elements with no warnings or errors. The logical vector has only two elements, so when R gets past its second (last) element, it "recycles" the vector by going back to the first element. The first element of the logical vector is TRUE and the second is FALSE, which covers the first and second elements of jersey_numbers. The logical vector is then recycled for the third and fourth jersey_numbers elements, and recycled one more time for the fifth. Essentially, this recycling example is equivalent to this:

    > jersey_numbers[c(TRUE, FALSE, TRUE, FALSE, TRUE)]
     Pierce   Rondo Perkins
         34       9      43
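To make the recycling visible, here is a rough sketch that builds the stretched-out logical vector explicitly with rep_len(). The names short_index and full_index are just for illustration. It also shows that a logical index is recycled silently even when its length doesn't divide evenly into the length of the vector being subset.

```r
# Same vector as in the lesson
jersey_numbers <- c(Pierce = 34L, Garnett = 5L, Rondo = 9L, Allen = 20L, Perkins = 43L)

short_index <- c(TRUE, FALSE)

# Repeat the short vector until it is as long as jersey_numbers
full_index <- rep_len(short_index, length(jersey_numbers))
full_index   # TRUE FALSE TRUE FALSE TRUE

# Both indexes select the same elements
identical(jersey_numbers[short_index], jersey_numbers[full_index])   # TRUE

# Length 3 doesn't divide evenly into 5, but it is still recycled quietly:
# the pattern becomes TRUE FALSE FALSE TRUE FALSE
uneven <- jersey_numbers[c(TRUE, FALSE, FALSE)]
names(uneven)   # "Pierce" "Allen"
```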

### Dave's Thoughts

There was a good mix of the familiar and unfamiliar for me in this post. Generically speaking, accessing elements/items of a collection/array/list with square brackets is very familiar. I've been using similar, if not identical syntax with C# and VB6/VBA for many years. The negative sign syntax to exclude elements is nice. I bet a lot of SQL developers would be jealous. If we want to select every column in a table, except for one, we have nothing similar to accomplish the task. We're relegated to typing out the names of every column. Sigh...

The ability to access a subset of vector elements without having to use looping constructs feels like a pretty big deal. Loops are generally easy to read and interpret, but they can be poor in terms of performance. I'm reminded of LINQ from the .NET Framework. It allows you to "query" programming objects to get a subset of them--without iterative looping.
