Introduction to R
1. Getting started with R
Objectives - at the end of this session, students will:
- know how to install R and R Studio on a computer
- be familiar with the components of the R Studio interface
- be able to run code and assess output in the default graphics display
- be able to install packages (libraries) in R
- be able to use R help as well as Stack Overflow to address questions
- be able to access Panopto software, record a video and share it with instructors
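The package and help objectives above can be sketched as follows. This is only an illustration: the package name reshape2 is borrowed from a later session, and guarding the install with interactive() is an assumption made here so the script never tries to download anything when run unattended.

```r
# Install a package once per computer (needs an internet connection).
# Assumption: the install only runs in an interactive session.
if (interactive() && !requireNamespace("reshape2", quietly = TRUE)) {
  install.packages("reshape2")
}

# Load an installed package at the start of each session.
if (requireNamespace("reshape2", quietly = TRUE)) {
  library(reshape2)
}

# Look up the help page for a function you know by name.
# At the console, typing ?mean opens the same page.
mean_help <- help("mean")
if (interactive()) print(mean_help)

# Find functions on the search path whose names contain a keyword.
csv_functions <- apropos("csv")
csv_functions

# Show the arguments of a function without opening its full help page.
args(read.csv)
```

When the built-in help is not enough, searching Stack Overflow for the exact error message together with the function name is usually a good next step.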
Resources
- Website for downloading R Studio, which also provides a link to download R. Note that R and R Studio are separate programs and each must be installed: https://posit.co/download/rstudio-desktop/
- In this 4-minute video, Adam overviews the R Studio interface
- In this 4-minute video, Adam overviews a few options for getting help in R
Proficiency Demonstration
- Record a 2-minute video using screen-capture software of your choice and submit the link to the instructor for review. For MSU students, Panopto is the required platform; it is accessed through the MSU website using your NetID and password (see this 5-minute video on getting started with Panopto).
2. Basic notation and objects
Objectives - at the end of this session, students will:
- understand how to create objects and assign variable names
- know common operators: Arithmetic (+, -, *, etc.), Logical/Relational (>, <, !=, ==)
- understand the difference between vectors and data frames and have a general sense of what a list and a matrix are
- understand basic data frame and vector operations, including how to create a data frame manually or from a .csv file
- understand how to use square bracket indexing to locate rows, columns, and individual entries
- understand how to create, concatenate, and do basic operations on vectors
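A minimal sketch of the objectives above, using made-up values. All object names (site_label, temps, samples, and so on) are hypothetical, and the .csv file is written to a temporary location only so that the example is self-contained.

```r
# Objects are created with the assignment operator <-
site_label <- "Site A"   # character (string)
n_samples  <- 12L        # integer (the L suffix marks an integer)
mean_temp  <- 18.4       # double

class(site_label)    # "character"
typeof(n_samples)    # "integer"
typeof(mean_temp)    # "double"

# Arithmetic and relational operators
total     <- n_samples * 2 + 1   # 25
is_warm   <- mean_temp > 15      # TRUE
not_equal <- n_samples != 10     # TRUE

# Vectors: create, concatenate, change one entry, compare
temps <- c(17.2, 18.9, 21.5)
temps <- c(temps, 16.0)   # concatenate a fourth value
temps[2] <- 19.1          # replace the second entry via indexing
temps > 18                # FALSE TRUE TRUE FALSE
mean(temps)

# Data frame created manually
samples <- data.frame(
  site   = c("A", "B", "C"),
  temp_c = c(17.2, 19.1, 21.5)
)

# Data frame created from a .csv file (written first so the example runs)
csv_path <- tempfile(fileext = ".csv")
write.csv(samples, csv_path, row.names = FALSE)
from_csv <- read.csv(csv_path)

# Square bracket indexing: [row, column]
samples[2, ]           # second row (still a data frame)
samples[, "temp_c"]    # one column, returned as a vector
samples[3, "temp_c"]   # a single cell: 21.5
```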
Resources
- In this 15-minute video, Nick overviews basic operators and data types (supporting files here).
- In this 7-minute video, Adam overviews downloading data from WQX and creating a data frame in R.
- Data Types: https://www.w3schools.com/r/r_data_types.asp
- Data frames: https://www.w3schools.com/r/r_data_frames.asp
Proficiency Demonstration
Turn in a script that does the following:
- Assign a string/char variable and an integer or double, and determine the type of each with class() or typeof()
- Create a set of integer/double variables and write a line of code containing at least three mathematical operations with those variables (e.g. x1*x3/2 %% x2); be able to describe the order of operations
- Create a vector (with more than one entry); change one of the vector’s entries via indexing; have a line of code using one of the logical or relational operators on the vector (e.g. %in%, >, <), which returns a logical value or a vector of logical values
- Create 2 data frames, one with read.csv on a file of choice and one manually
- From one of these data frames, retrieve a row, a column, and an individual cell using square bracket indexing
Record a 2-minute video referencing your script and demonstrating that you have met the objectives for the week.
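The order of operations in an expression like x1*x3/2 %% x2 can be checked with a short sketch. The values assigned to x1, x2, and x3 here are arbitrary. In R, special operators such as %% and %in% bind more tightly than * and /, so the modulo is evaluated first.

```r
x1 <- 6
x2 <- 3
x3 <- 4

# %% has higher precedence than * and /, so 2 %% x2 is evaluated first.
result <- x1 * x3 / 2 %% x2

# Same calculation with the grouping written out: (6 * 4) / (2 %% 3) = 24 / 2
explicit <- (x1 * x3) / (2 %% x2)          # 12

# A naive left-to-right reading gives a different answer: (24 / 2) %% 3
left_to_right <- ((x1 * x3) / 2) %% x2     # 0

result == explicit        # TRUE
result == left_to_right   # FALSE

# %in% returns one logical value per entry of the left-hand vector.
sites <- c("A", "B", "C")
sites %in% c("B", "D")    # FALSE TRUE FALSE
```

When in doubt, adding parentheses makes the intended order explicit and costs nothing.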
3. Plotting Part 1
Objectives - at the end of this session, students will:
- be familiar with the fundamental arguments in the base R plotting function
- be able to use different graphics devices: default, windows (or quartz on Mac), png
- be able to use multiple plot types: xy scatter plots, boxplots, and bar plots
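As a rough sketch of these objectives, the code below draws a scatter plot, a boxplot, and a bar plot with base R and writes them to a .png file. The data frame, its column names, and the output location (a temporary folder) are all hypothetical stand-ins for a real dataset.

```r
# Made-up example data: two sites, three samples each
flow <- data.frame(
  site      = c("A", "A", "A", "B", "B", "B"),
  discharge = c(1.2, 2.5, 3.1, 0.8, 1.9, 2.7),
  nitrate   = c(0.4, 0.9, 1.3, 0.3, 0.6, 1.1)
)

# Open a png graphics device; everything plotted until dev.off() goes to the file.
# In an interactive session, windows() (or quartz() on Mac) opens a plot window instead.
png_file <- file.path(tempdir(), "example_plots.png")
png(filename = png_file, width = 1200, height = 400)

par(mfrow = c(1, 3))   # three panels side by side

# XY scatter plot, colored by site, with a labeled legend
site_colors <- c(A = "steelblue", B = "darkorange")
plot(flow$discharge, flow$nitrate,
     main = "Nitrate vs. discharge",
     xlab = "Discharge", ylab = "Nitrate",
     pch = 19, col = site_colors[flow$site])
legend("topleft", legend = names(site_colors),
       col = site_colors, pch = 19, title = "Site")

# Boxplot of nitrate grouped by site
boxplot(nitrate ~ site, data = flow,
        main = "Nitrate by site", xlab = "Site", ylab = "Nitrate")

# Bar plot of the mean nitrate value at each site
site_means <- tapply(flow$nitrate, flow$site, mean)
barplot(site_means,
        main = "Mean nitrate by site", xlab = "Site", ylab = "Mean nitrate")

dev.off()   # close the device so the file is written
```

The main, xlab, ylab, pch, and col arguments shown here are shared by most base R plotting functions.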
Resources
- In this 9-minute video, Nick overviews the basics of creating plots in R (supporting files here).
- Graph Gallery – comprehensive website for different plot types, includes both ggplot and base R methods: https://r-graph-gallery.com/
- https://www.geeksforgeeks.org/graph-plotting-in-r-programming/
Proficiency Demonstration
Turn in a script that does the following:
- Read in a csv for a dataset of your choice
- Create at least 2 different plots using the dataset
- For each plot, include a title and axis labels; if colors/shapes are used to distinguish features, include an appropriately labeled legend
- For at least one of the plots, write the plot to a .png output file
Record a 2-minute video referencing your script and demonstrating that you have met the objectives for the week.
4. Data pipeline structure and coding conventions
Objectives - at the end of this session, students will:
- understand relative path references and know what the working directory is
- understand the value of a structured data, code, and results folder environment for sharing work that is easily transferable and repeatable
- be able to create a set of folders with code using relative path references to read in data and save output to a results folder
- understand good practices in code organization, commenting, and naming conventions
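The folder structure and relative paths described above might look like the following sketch. The project is created inside a temporary folder only so the example can run anywhere; the folder names (input, code, output), file names, and data are assumptions for illustration.

```r
# Create a throwaway project folder and make it the working directory.
project_dir <- file.path(tempdir(), "example_project")
dir.create(project_dir, showWarnings = FALSE)
old_wd <- setwd(project_dir)   # setwd() returns the previous working directory
getwd()                        # confirm where relative paths will start from

# Build the pipeline folders with relative paths.
for (folder in c("input", "code", "output")) {
  dir.create(folder, showWarnings = FALSE)
}

# Stand-in for a raw data file that would normally already sit in input/.
raw_example <- data.frame(site = c("A", "B", "C"), value = c(0.5, 1.8, 2.4))
write.csv(raw_example, file.path("input", "raw_data.csv"), row.names = FALSE)

# Read from input/, edit, and save to output/ using relative paths only.
raw    <- read.csv(file.path("input", "raw_data.csv"))
edited <- raw[raw$value > 1, ]   # keep rows with value above 1
write.csv(edited, file.path("output", "edited_data.csv"), row.names = FALSE)

list.files("output")   # "edited_data.csv"

# Restore the original working directory.
setwd(old_wd)
```

Because every path is relative to the project folder, the whole folder can be zipped and run on another computer after changing only the working directory.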
Resources
- In this 5-minute video, Adam overviews the use of relative path names (supporting files here).
- In this 7.5-minute video, Adam overviews the use of the pipeline data structure (supporting files here).
- Chapter from a larger tutorial, focusing on R workspace management and file types: https://bookdown.org/ndphillips/YaRrr/importingdata.html
- Style guides: http://adv-r.had.co.nz/Style.html
Proficiency Demonstration
Turn in a zipped set of folders; specifically:
- Uploaded files should follow an input, code, output folder structure
- Create a script that reads in data from the input folder and writes an edited data frame as a .csv, along with a .png file of a plot from the data, to the output folder.
- Code from this point on should include comments
Record a 2-minute video referencing your script and demonstrating that you have met the objectives for the week. In addition to describing the script, the video should demonstrate knowledge of the working directory and the ability to change it.
5. Data wrangling
Objectives - at the end of this session, students will:
- understand the concept of tidy data and how to maintain it
- understand common functions for manipulating data frames: sort, merge, subset/filter
- be able to aggregate data using both the aggregate function in base R and the dcast function in the “reshape2” package
Specifically:
- be able to download water quality results and site information from WQX and link those based on Site ID
- be able to filter data in long format to get only data for an analyte of interest
- be able to aggregate data by mean (or other summary statistic) concentration based on site information, specifically county and/or watershed (HUC8)
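The link, filter, and aggregate steps above can be sketched with two small made-up tables. The column names (site_id, analyte, value, county, huc8) and all values are hypothetical simplifications, not the actual WQX field names, and the dcast step only runs if the reshape2 package is installed.

```r
# Made-up water quality results in long format (one row per measurement)
results <- data.frame(
  site_id = c("S1", "S1", "S2", "S2", "S3", "S3"),
  analyte = c("Nitrate", "Phosphorus", "Nitrate", "Phosphorus",
              "Nitrate", "Phosphorus"),
  value   = c(1.2, 0.05, 0.8, 0.10, 2.0, 0.07)
)

# Made-up site information (one row per site)
sites <- data.frame(
  site_id = c("S1", "S2", "S3"),
  county  = c("County X", "County X", "County Y"),
  huc8    = c("00000001", "00000001", "00000002")
)

# Link results to site information by the shared site ID column.
linked <- merge(results, sites, by = "site_id")

# Filter the long-format data to a single analyte of interest.
nitrate <- subset(linked, analyte == "Nitrate")

# Sort by concentration, lowest first.
nitrate <- nitrate[order(nitrate$value), ]

# Aggregate: mean nitrate concentration per county (base R).
county_means <- aggregate(value ~ county, data = nitrate, FUN = mean)
county_means

# Aggregate and reshape to wide format: one row per county,
# one column per analyte (reshape2 package, if available).
if (requireNamespace("reshape2", quietly = TRUE)) {
  wide <- reshape2::dcast(linked, county ~ analyte,
                          value.var = "value", fun.aggregate = mean)
  print(wide)
}
```

Replacing county with huc8 in the aggregate formula gives the same summary by watershed.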
Resources
- In this 5-minute video, Nick overviews use of the lubridate package to work with dates (supporting files here). Here is a cheatsheet for lubridate dates and times.
- In this 6-minute video, Nick overviews working with strings (supporting files here). Here is a cheatsheet for working with strings.
- Here are some example datasets from WQX and a script to link the sites data to the water quality results data (supporting files here).
- Tidy data: https://r4ds.had.co.nz/tidy-data.html
- dplyr documentation: https://dplyr.tidyverse.org/
- https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
- Data reshaping
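For a quick feel of the date and string handling covered in the resources above, here is a small sketch with made-up values. It mostly uses base R so that it runs without extra packages; the lubridate lines run only if that package is installed, and all object names are hypothetical.

```r
# Dates in base R: convert text to Date, then pull out parts
sample_dates <- as.Date(c("2023-07-04", "2023-11-15"))
sample_years  <- format(sample_dates, "%Y")               # "2023" "2023"
sample_months <- as.numeric(format(sample_dates, "%m"))   # 7 11
days_between  <- diff(sample_dates)                       # 134 days apart

# The same idea with lubridate, which parses other date layouts by name
if (requireNamespace("lubridate", quietly = TRUE)) {
  parsed <- lubridate::mdy("07/04/2023")   # month/day/year text
  print(lubridate::year(parsed))
  print(lubridate::month(parsed))
}

# Strings in base R: trim, change case, build, search, replace
raw_names   <- c(" site a ", "Site B")
clean_names <- toupper(trimws(raw_names))                # "SITE A" "SITE B"
site_codes  <- paste("Site", 1:2, sep = "_")             # "Site_1" "Site_2"
has_b       <- grepl("B", clean_names)                   # FALSE TRUE
renamed     <- sub("SITE", "STATION", clean_names)       # "STATION A" "STATION B"
```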
Proficiency Demonstration
Turn in the following:
- A folder of data and code that follows the pipeline structure from the previous lesson and demonstrates the objectives above.
Record a 2-minute video referencing your script and demonstrating that you have met the objectives for the week.