Solutions to this workshop can be found here
Review: vectors, functions, and types
Vectors and functions
# Create a variable, vect_1 that holds a vector containing any 4 numbers
vect_1 <- c(1, 2, 3, 5)
# Calculate the sum of vect_1 and print it out on the screen
sum(vect_1)
[1] 11
# Create a new variable, vect_2, that contains each value of vect_1 squared
# (Don't manually enter the numbers, use vect_1 to calculate this)
vect_2 <- vect_1^2
print(vect_2)
[1] 1 4 9 25
Notice that lots of functions work on vectors!
Sequences of numbers
# Create a vector of numbers from -5 to 5 containing 21 elements called review_vec
review_vec <- seq(from = -5, to = 5, length.out = 21)
print(review_vec)
[1] -5.0 -4.5 -4.0 -3.5 -3.0 -2.5 -2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5
[19] 4.0 4.5 5.0
When we get missing values
A HUGE part of analyzing real scientific data is dealing with missing values. There are lots of ways we can get missing values; these are probably the most common:
- When we try to get R to do a nonsensical thing
- When we perform some super taboo math operation
- When we are using real data and we didn’t collect every observation (e.g., you measured the height of a bunch of plants every week, but by the third week some of them had died)
Every programming language has its own special way of representing ‘missing’ data. R, ever so special, uses two: NA and NaN, depending on the situation. Don’t worry too much about the difference; I (Eugene) literally learned it while preparing this class, after 5 years of programming in R.
Let’s take a look at how these missing values work.
Taboo math operations
Another way to get missing values is to perform a math operation that R doesn’t like. Let’s try doing this. Remember that we can perform operations on vectors, e.g. adding two vectors to each other. Let’s try dividing vectors this way.
vector_1 <- c(1, 2, 0, 3, 0, -5)
vector_2 <- c(0, -3, 1, 6, 0, 0)
# What do you think you'll get if you divide vector_1 by vector_2?
# Try it
vector_1 / vector_2
[1] Inf -0.6666667 0.0000000 0.5000000 NaN -Inf
Notice that dividing non-zero by zero gives you Inf (or -Inf if the numerator is negative), dividing zero by non-zero gives you zero (this makes sense), but dividing zero by zero gives you NaN.
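To make the difference between NA and NaN a bit more concrete, here is a small sketch using the base R functions is.nan(), is.na(), and is.finite(). The vector name `results` is made up for this illustration; it just holds the same kinds of values the division above produced, plus one NA.

```r
# Hypothetical example vector: Inf, NaN, -Inf, a true NA, and an ordinary number
results <- c(1 / 0, 0 / 0, -5 / 0, NA, 2)

is.nan(results)     # TRUE only for NaN (the 0 / 0)
is.na(results)      # TRUE for both NaN and NA
is.finite(results)  # TRUE only for ordinary numbers (here, just the 2)
```

In other words, is.na() treats NaN as missing too, which is usually what you want when cleaning data.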
Real missing data
Finally, you can enter NA into your vector, just like you would any other value.
# Create a vector, data_vector, which has some numbers and at least one NA value
data_vector <- c(3, 8, NA, 9)
# What happens if you take the sum of data_vector?
sum(data_vector)
[1] NA
Lots of functions in R, including sum(), have a way of dealing with missing data automatically. There are lots of times when we may want to just ignore the missing values and work with the rest of the data we have. Take a look at the documentation for sum(); do you see anything in there that might suggest how you could get the function to ignore missing values and take the sum of data_vector anyways?
# Try using sum() to get the sum of the non-missing values in data_vector
sum(data_vector, na.rm = TRUE)
[1] 20
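The na.rm argument is not unique to sum(). As a quick sketch, here it is with two other base R summary functions, mean() and max(), on the same data_vector:

```r
data_vector <- c(3, 8, NA, 9)

mean(data_vector)                # NA, because one value is missing
mean(data_vector, na.rm = TRUE)  # average of 3, 8, and 9
max(data_vector, na.rm = TRUE)   # largest non-missing value, 9
```

Not every function has an na.rm argument, so it is worth checking the documentation each time.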
Logicals
Introduction to logicals
In R, in addition to character and numeric values, we can have logical values. Logical values are TRUE and FALSE. These words, when you type them, are something like NA in that they’re special words.
Logical values are often generated by comparisons between two values. These comparisons are made with comparison operators. Many of them you will be familiar with.
3 > 4
[1] FALSE
8 < 5
[1] FALSE
4 <= 4
[1] TRUE
Note that the operator for “is equal to” is == in R. If you only use =, you will reassign your variable (that is, a single = on its own line assigns a value, just like <-).
“Not” is specified by !. So “is not equal to” is !=.
Try to figure out what the output of each of the following lines will be before running them.
TRUE == TRUE
[1] TRUE
TRUE == FALSE
[1] FALSE
TRUE != FALSE
[1] TRUE
3 != 4
[1] TRUE
char = 'a'
char == 'a'
[1] TRUE
char = 'b'
char == 'a'
[1] FALSE
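The comparisons above all use single values, but the same operators also work element by element on vectors, just like the arithmetic in the review section. Here is a minimal sketch; the vector `temps` is a made-up example.

```r
# Hypothetical example vector
temps <- c(12, 25, 18, 30)

temps > 20    # FALSE TRUE FALSE TRUE
temps == 18   # FALSE FALSE TRUE FALSE
temps != 18   # TRUE TRUE FALSE TRUE
```

Each comparison returns a logical vector with one TRUE or FALSE per element.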
Looking for missing values in a vector
One useful function that returns a logical value is is.na().
# try using is.na() on position 11 in review_vec
print(review_vec[11])
[1] 0
is.na(review_vec[11])
[1] FALSE
# try using is.na() on review_vec
is.na(review_vec)
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[16] FALSE FALSE FALSE FALSE FALSE FALSE
# try using is.na() on data_vector
is.na(data_vector)
[1] FALSE FALSE TRUE FALSE
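You might wonder why we need is.na() at all instead of just writing == NA. The sketch below shows what happens if you try, along with two related base R helpers, which() and anyNA().

```r
data_vector <- c(3, 8, NA, 9)

data_vector == NA          # NA NA NA NA: comparing with NA never gives TRUE or FALSE
is.na(data_vector)         # FALSE FALSE TRUE FALSE
which(is.na(data_vector))  # 3, the position of the missing value
anyNA(data_vector)         # TRUE, because at least one value is missing
```

Since R doesn't know what a missing value is, it can't say whether it is equal to anything, so the comparison itself comes back missing.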
Just like you can make a vector of characters or a vector of numeric values, you can also make a vector of logicals.
# Make a vector with 3 TRUEs and 2 FALSEs, save it as logical_vec1
logical_vec1 <- c(TRUE, TRUE, TRUE, FALSE, FALSE)
# Make a vector called num_vec that starts at 2 and ends at 10 (increasing by 1)
num_vec <- 2:10
logical_vec2 <- num_vec %% 3 == 2 #what is this line of code doing?
print(num_vec)
[1] 2 3 4 5 6 7 8 9 10
print(logical_vec2)
[1] TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE
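One way to see what the logical_vec2 line is doing is to split it into two steps. This is just a sketch; the intermediate variable `remainders` is a name made up for this illustration.

```r
num_vec <- 2:10

# Step 1: %% gives the remainder after dividing each element by 3
remainders <- num_vec %% 3
print(remainders)        # 2 0 1 2 0 1 2 0 1

# Step 2: == compares each remainder with 2
print(remainders == 2)   # TRUE for 2, 5, and 8
```

So logical_vec2 is TRUE wherever the number leaves a remainder of 2 when divided by 3.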
Treating numbers as logicals
In R, FALSE is also encoded as 0, and TRUE is encoded as 1.
0==FALSE
[1] TRUE
1==TRUE
[1] TRUE
# How do you think we could easily figure out how many TRUEs there are in logical_vec1?
sum(logical_vec1)
[1] 3
# How can you get R to tell you the number of missing values in data_vector?
sum(is.na(data_vector))
[1] 1
Numbers can also be treated like logical values, for example with as.logical(). 0 is FALSE, every other number is TRUE.
as.logical(-2:2)
[1] TRUE TRUE FALSE TRUE TRUE
sum(as.logical(-2:2)) # what is going on here?
[1] 4
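To answer the "what is going on here?" question, it can help to look at each conversion separately. The variable names `nums` and `as_logicals` below are made up for this sketch.

```r
nums <- -2:2                    # -2 -1 0 1 2

as_logicals <- as.logical(nums)
print(as_logicals)              # TRUE TRUE FALSE TRUE TRUE (only 0 becomes FALSE)

# sum() treats TRUE as 1 and FALSE as 0 before adding
print(as.numeric(as_logicals))  # 1 1 0 1 1
print(sum(as_logicals))         # 4
```

The original numbers are gone after as.logical(); all sum() sees is four TRUEs and one FALSE.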
Exercise
Imagine that you’re doing an experiment measuring plant height, but some of the plants didn’t grow. You decide those should be considered missing values, so you enter NA for those plants. Let’s count the number of plants that DID grow in your experiment.
plant_heights <- c(1.9, 0.1, NA, 0.8, 0.4, 0.2, 7.9, NA, 16.8, 3.5)
# create a logical vector, plant_NA_vector, that contains information on whether
# each value in plant_heights is missing (NA)
plant_NA_vector <- is.na(plant_heights)
print(plant_NA_vector)
[1] FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
# use plant_NA_vector to figure out how many plants total didn't grow
# assign this number to a variable and print it out
no_grow_plants <- sum(plant_NA_vector)
print(no_grow_plants)
[1] 2
# use plant_NA_vector to figure out how many plants DID grow
# assign this number to a variable and print it out
grow_plants <- sum(!plant_NA_vector)
print(grow_plants)
[1] 8
# bonus: figure out what PROPORTION of plants grew
# assign this number to a variable and print it out
prop_grow <- grow_plants/(grow_plants + no_grow_plants)
print(prop_grow)
[1] 0.8
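Because TRUE counts as 1 and FALSE as 0, there is also a shortcut for the bonus question: the mean() of a logical vector is the proportion of TRUEs. This is one possible alternative, not the only right answer; `prop_grow_alt` is a made-up name.

```r
plant_heights <- c(1.9, 0.1, NA, 0.8, 0.4, 0.2, 7.9, NA, 16.8, 3.5)

# mean() of a logical vector is the proportion of TRUE values
prop_grow_alt <- mean(!is.na(plant_heights))
print(prop_grow_alt)   # 0.8
```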
Selecting and replacing values from vectors based on logicals
In addition to using indices to get R to tell us the value at a specific position in a vector, we can also use them to replace specific values. Take a look at this example.
vector_to_sub <- c(12, 13, 14, 15, NA, 17, 18, 19)
# replace the 2nd value in vector_to_sub with 54
vector_to_sub[2] <- 54
print(vector_to_sub)
[1] 12 54 14 15 NA 17 18 19
# now, try replacing the 1st, 3rd, and 8th values in vector_to_sub with 28,
# using only one line of code (i.e. do it all at once)
vector_to_sub[c(1,3,8)] <- 28
print(vector_to_sub)
[1] 28 54 28 15 NA 17 18 28
One really powerful way to use this vector substitution is to combine it with logical vectors. Take a look at the example below:
vector_to_sub2 <- 1:3
vector_to_sub2[c(TRUE, FALSE, TRUE)] <- 42
# What do you think vector_to_sub2 looks like now? Print it out to check
print(vector_to_sub2)
[1] 42 2 42
The logical vector used for indexing is telling R which positions should be replaced.
Try this for yourself. Lots of times, we want to replace missing values in vectors. Let’s say that, when you do your analysis on the plant example above, you want to treat the plants that didn’t grow as having a height of 0 rather than being missing.
# use plant_NA_vector to replace every missing value in plant_heights with a 0
print(plant_heights)
[1] 1.9 0.1 NA 0.8 0.4 0.2 7.9 NA 16.8 3.5
plant_heights[plant_NA_vector] <- 0
# print the modified plant_heights vector
print(plant_heights)
[1] 1.9 0.1 0.0 0.8 0.4 0.2 7.9 0.0 16.8 3.5
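Logical vectors can be used to pull values out of a vector as well as to replace them. As a sketch, here is how you could keep only the plants that were actually measured, starting again from the original data. The names `heights_raw` and `measured` are made up for this example.

```r
# Hypothetical copy of the original plant data, with the NAs still in place
heights_raw <- c(1.9, 0.1, NA, 0.8, 0.4, 0.2, 7.9, NA, 16.8, 3.5)

# Keep only the positions where is.na() is FALSE
measured <- heights_raw[!is.na(heights_raw)]
print(measured)          # 1.9 0.1 0.8 0.4 0.2 7.9 16.8 3.5
print(length(measured))  # 8
```

Whether to drop missing values like this or replace them with 0 depends on what the missing values mean in your experiment.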
Operations on logicals (AND, OR)
We can also do “and” (&) and “or” (|, all the way on the right side of your keyboard). “And” only returns true if both things are true. “Or” returns true if at least one thing is true.
Try predicting what all the operations below will return
TRUE & TRUE
[1] TRUE
TRUE & FALSE
[1] FALSE
FALSE & FALSE
[1] FALSE
FALSE | TRUE
[1] TRUE
TRUE | TRUE
[1] TRUE
FALSE | FALSE
[1] FALSE
# bonus: perform some operations on the following variables (unicorns and
# rainbows) that will result in a TRUE value being returned
unicorns <- FALSE
rainbows <- FALSE
!unicorns & !rainbows
[1] TRUE
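Like the comparison operators, & and | also work element by element on vectors, which makes them handy for combining criteria when indexing. Here is a minimal sketch using the plant data; the names `heights`, `tall`, and `small_or_missing` are made up for this illustration.

```r
heights <- c(1.9, 0.1, NA, 0.8, 0.4, 0.2, 7.9, NA, 16.8, 3.5)

# TRUE where the plant was measured AND is taller than 1
tall <- !is.na(heights) & heights > 1
print(heights[tall])          # 1.9 7.9 16.8 3.5

# TRUE where the plant is missing OR shorter than 0.5
small_or_missing <- is.na(heights) | heights < 0.5
print(sum(small_or_missing))  # 5
```

Checking is.na() as part of the condition keeps the missing values from sneaking NAs into the logical vector.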
Things we hope you’ve learned today (and will hopefully remember next time)
- When missing values arise
- Using is.na() to identify the positions with missing values in a vector
- Logicals
- Using logicals as indices to pull out or substitute elements of a vector that match some criteria
- Operations that can be performed on logicals (& and |)
