Solutions to this workshop can be found here
Testing functions
One critical point is that you need to test any new function extensively. It is incredibly easy to make a mistake when coding (think about all the typos you’ve ever made, except this time they can critically affect the results of your data analysis). The solution is to test any code you write thoroughly and, if possible, to have another coder who is familiar with what you’re doing look over it (which means always including plenty of comments!).
Before we do anything else, we’ll need to load the function we created last time to grow plants, since we’re going to be re-using it. To do this, run the source() command, with the full path to grow_plants.R as the argument:
source('~/Documents/Teaching/Intro_R_Course/grow_plants.R')
Now, let’s run some tests.
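As a minimal sketch of what such tests can look like, the block below checks a small stand-in function with stopifnot(). The function grow_once() and its arguments are hypothetical, made up for illustration; the real grow_plants() from last time may take different inputs and return something different, so the checks would need adapting.

```r
# Hypothetical stand-in for a growth function (NOT the course's grow_plants()):
# adds growth_rate to every height.
grow_once <- function(heights, growth_rate) {
  heights + growth_rate
}

test_heights <- c(1, 2.5, 4)
result <- grow_once(test_heights, growth_rate = 0.5)

# stopifnot() throws an error if any of its conditions is not TRUE
stopifnot(
  length(result) == length(test_heights),              # no plants lost or added
  all(result >= test_heights),                         # positive growth never shrinks a plant
  isTRUE(all.equal(result, c(1.5, 3, 4.5))),           # matches values worked out by hand
  identical(grow_once(test_heights, 0), test_heights)  # zero growth changes nothing
)
```

If every check passes, stopifnot() returns silently; a failed check stops with an error naming the condition that was not met.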
Another great sanity check in coding anything related to data is plotting. Try making a plot of your data, plotting height vs day for all the plants.
Loops
Introduction to loops
We now have a function that allows us to ‘grow’ the plants in our dataframe in a single line, just by providing the growth parameters. However, this doesn’t actually solve our problem of trying to model the growth of these plants over long periods of time. We ideally want to avoid running this model by hand 30+ times.
Instead, we can use something called a “loop”. Loops are a fundamental idea in programming: they let you repeat a task many times without writing it out each time.
for loops
The most commonly used type of loop is a ‘for’ loop. Imagine we have some list of values (e.g. a sequence of consecutive days). The loop assigns each value from the list to a variable, one at a time and in order, and runs the same block of code with the variable set to that value. Here’s how they look in R:
for (variable_from_list in some_list) {
  # do some stuff using variable_from_list
}
Note that unlike functions, loops are not closed-off boxes: in addition to variable_from_list, they have access to all the previously specified variables in R, and they can modify these variables.
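To illustrate the point about loops not being closed-off boxes, here is a small sketch in which a loop updates a variable that was created before the loop started. The variable names are made up for this example.

```r
running_total <- 0   # defined before (outside) the loop

for (day in 1:5) {
  # the loop reads and overwrites the outside variable
  running_total <- running_total + day
}

print(running_total)  # 15
print(day)            # 5: the loop variable still exists after the loop ends
```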
Let’s try an example: we can write a loop that starts with a vector of numbers, and produces an output vector of those numbers squared.
initial_vector <- c(0, 3, -2, 4, 15)
squares_vector <- c()
for (current_number in initial_vector) {
  print(current_number)
  current_number_squared <- current_number^2
  squares_vector <- c(squares_vector, current_number_squared)
}
print(squares_vector)
Of course, you don’t need a loop to do the above; simply running initial_vector^2 would have worked. In fact, running this kind of operation in the loop we wrote takes much longer than the vectorized alternative. However, there are many cases (like our plant growth problem, where each day’s heights depend on the previous day’s) where a loop is the natural way to do something.
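The comparison can be sketched as follows. The loop version here also creates the output vector at its full length up front instead of growing it with c(); that detail is an addition for illustration, not something the exercise above requires.

```r
initial_vector <- c(0, 3, -2, 4, 15)

# Vectorized version: one line, no loop
squares_vectorized <- initial_vector^2

# Loop version, filling in an output vector created at full length
squares_loop <- numeric(length(initial_vector))
for (i in seq_along(initial_vector)) {
  squares_loop[i] <- initial_vector[i]^2
}

print(identical(squares_vectorized, squares_loop))  # TRUE
```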
while loops
There’s also another kind of loop, a ‘while’ loop. Rather than going through a vector of pre-defined length, while loops check that some condition is true, and only continue while that is the case.
Here’s a while loop that subtracts 2 from a number and prints the result, stopping only when the result is equal to 0.
number <- 8
while (number != 0) {
  number <- number - 2
  print(number)
}
while loops are tricky, however. If you don’t think carefully about the starting value and the stopping condition, you can end up in an endless loop. For example, try changing the initial value of number to 7 in the loop above.
Hint: You can click inside your console and press Ctrl-C or Esc to stop the execution of any R code.
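One way to make the loop above safer, sketched here as a suggestion rather than the only fix, is to test with > instead of !=. Starting from 7, the values go 5, 3, 1, -1 and never land exactly on 0, so the != 0 version never stops; the > 0 version stops as soon as the number reaches or passes 0.

```r
number <- 7
while (number > 0) {   # stops once number is 0 or negative
  number <- number - 2
  print(number)
}
# prints 5, 3, 1, -1 and then stops
```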
Coding plant growth in a loop
Now that we know how to write loops, try to write some code that makes a new dataframe, plant_growth_df_2, and then uses a loop running the grow_plants function to loop through 30 days of plant growth.
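If you are unsure where to start, the general shape of such a loop might look like the sketch below. Everything in it is an assumption made for illustration: grow_plants_sketch() is a hypothetical stand-in (the real grow_plants() may take a dataframe and different parameters), and the starting heights, growth rate, and column names are made up.

```r
# Hypothetical stand-in for grow_plants(); the real function may differ.
# Each plant grows by a fixed fraction of its current height per day.
grow_plants_sketch <- function(heights, growth_rate) {
  heights * (1 + growth_rate)
}

n_days <- 30
current_heights <- c(1.0, 1.2, 0.8)   # made-up starting heights for 3 plants

# Day 0 rows; the column names are assumptions
plant_growth_sketch_df <- data.frame(
  plant.id = seq_along(current_heights),
  day = 0,
  height = current_heights
)

for (day in 1:n_days) {
  # today's heights depend on yesterday's, which is why a loop is needed
  current_heights <- grow_plants_sketch(current_heights, growth_rate = 0.05)
  todays_rows <- data.frame(
    plant.id = seq_along(current_heights),
    day = day,
    height = current_heights
  )
  plant_growth_sketch_df <- rbind(plant_growth_sketch_df, todays_rows)
}

print(nrow(plant_growth_sketch_df))  # 93: 3 plants x 31 time points (day 0 to 30)
```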
Putting it all together: Using simulations to plan experiments
Let’s come back to our initial problem. We want to figure out whether salt affects plant growth. To do this, we plan to compare two conditions: plants grown without salt, and plants grown with some standard concentration of salt added to the soil.
Imagine we suspect, based on previous literature, that salt decreases plant growth rate by 20%. How large of an experiment would we have to do to see this change? How long do we need to run this experiment for?
We can try to get ballpark answers to these questions with simulations.
First, let’s convert our plant dataframe-creating loop from above into a function. In addition to all the plant growth parameters, we may want to pass the number of individual plants we’re growing, as well as the duration of growth, into this function.
Now we can use our new simulate_growth() function to simulate plant growth with and without salt.
Now that we’ve simulated our data, we can do some plotting and statistical analysis on it!
combined_plant_df$plant.id.full <-
paste(combined_plant_df$Condition, combined_plant_df$plant.id)
View(combined_plant_df)
What would you do if you wanted to repeat this simulation and comparison 1000 times, saving the p-value associated with the effect of Condition each time?
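One possible approach is to wrap a single simulate-and-compare step in a function and call it inside a loop, storing one p-value per run. The sketch below uses a hypothetical stand-in, simulate_p_value(), that draws final heights directly from normal distributions and compares the two groups with a t-test. This is a simplification chosen for illustration; it is not the workshop's simulate_growth(), and all the numbers are made up.

```r
set.seed(1)  # make the random draws reproducible

# Hypothetical stand-in for one simulate-and-compare step: draws final
# heights for two groups and returns the t-test p-value.
simulate_p_value <- function(n_plants, salt_effect = 0.8) {
  control <- rnorm(n_plants, mean = 10, sd = 2)
  salt <- rnorm(n_plants, mean = 10 * salt_effect, sd = 2)
  t.test(control, salt)$p.value
}

n_simulations <- 1000
p_values <- numeric(n_simulations)   # one slot per simulated experiment
for (i in seq_len(n_simulations)) {
  p_values[i] <- simulate_p_value(n_plants = 10)
}

# Fraction of simulated experiments that detect the effect at p < 0.05
print(mean(p_values < 0.05))
```

The fraction printed at the end is a rough estimate of how often an experiment of this size would detect the effect, which can be compared across different numbers of plants.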
Using coding in science
I wanted to share some final resources that I think are great places to look to keep going with learning to code.
Software Carpentry: Sets of lessons on coding, mostly at an introductory/early intermediate level. Lots of overlap with this course in the R sections, but presented in a different way; there are also lessons on Python, Unix, and really useful tools like GitHub.
Jenny Bryan’s STAT545 course at UBC: walks you through intro- and intermediate-level R, explaining not just the programming, but good ways to think about analyzing data and coding the analysis. If you want a more in-depth course about R in general, I highly recommend this.
If you’re planning to spend a lot of time coding (i.e. not just a one-off analysis, but more long-term projects), the following articles are great sources for thinking about what to aspire to. They can be daunting if you try to follow all their suggestions at once. As both articles try to point out, a better approach is to understand what good, reproducible coding looks like and then slowly build more of these practices into your work as you grow. Few coding projects in biology manage to achieve everything outlined here, but these articles provide an excellent long-term roadmap as you become a better coder.
