eBird Workshops OCA II

Introduction

This website contains the notes for two workshops, one on best practices for using eBird data and one on the eBird Status data products, presented at the Ornithological Congress of the Americas (OCA) in August 2023 in Gramado, Brazil. The two workshops are:

Best Practices for Using eBird Data: introduction to the eBird Basic Dataset (EBD), challenges associated with using eBird data for analysis, and best practices for preparing eBird data for modeling.
eBird Status and Trends: downloading eBird Status data products, loading the data into R, and using them for a variety of applications.

Setup

This workshop is intended to be interactive. All examples are written in the R programming language, and the instructor will work through the examples in real time while attendees are encouraged to follow along by writing the same code. To avoid unnecessary delays, please follow these setup instructions prior to the workshop:

Create an eBird account if you don’t already have one and request access to the raw eBird data and/or the eBird Status data products depending on which workshops you’re attending:
- Best Practices for Using eBird Data: request access to the eBird Basic Dataset.
- eBird Status and Trends: request access to the eBird Status data products.
Download and install the latest version of R. You must have R version 4.0.0 or newer to follow along with this workshop.
Download and install the latest version of RStudio. RStudio is not required for this workshop; however, the instructors will be using it and you may find it easier to follow along if you’re working in the same environment.
The lessons in this workshop use a variety of R packages. To install all the necessary packages, run the following code:

if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("ebird/ebird-best-practices")

Ensure all packages are updated to their most recent versions by clicking on the Update button on the Packages tab in RStudio.
Download the data package for the workshop you are attending:
- Best Practices for Using eBird Data
- eBird Status and Trends
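
Once the steps above are complete, a quick check along the following lines can confirm that the R version requirement is met and that dplyr, which is used in the examples below, is available. This is only a minimal sketch; the variable names r_ok and dplyr_ok are illustrative and are not part of the workshop materials.

```r
# Check that the running R version meets the workshop minimum (4.0.0)
r_ok <- getRversion() >= "4.0.0"
if (!r_ok) {
  message("R ", as.character(getRversion()), " is older than 4.0.0; please update R.")
}

# Check that dplyr, used in the examples below, is installed
dplyr_ok <- requireNamespace("dplyr", quietly = TRUE)
if (!dplyr_ok) {
  message("dplyr is not installed.")
}
```

If neither message appears, both checks passed.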

Tidyverse

Throughout this workshop, we use packages from the Tidyverse, an opinionated collection of R packages designed for data science. Two of the most well-known Tidyverse packages are ggplot2, for data visualization, and dplyr, for data manipulation, but there are many more. We’ll try to explain functions as they come up; for a good general resource on working with data in R using the Tidyverse, see the free online book R for Data Science by Hadley Wickham.

The one piece of the Tidyverse that we will cover up front is the pipe operator %>%. The pipe takes the expression to the left of it and “pipes” it into the first argument of the expression on the right.

library(dplyr)

# without pipe
mean(1:10)
#> [1] 5.5

# with pipe
1:10 %>% mean()
#> [1] 5.5
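
Because the piped value always fills the first argument, any further arguments are written inside the call as usual. The following sketch illustrates this with base R's sort(), chosen here only as an example; the variable names x and piped are illustrative.

```r
library(dplyr)

x <- c(3, 1, 2)

# without pipe: x is the first argument, decreasing is a named argument
sort(x, decreasing = TRUE)
#> [1] 3 2 1

# with pipe: x is piped into the first argument, decreasing is supplied as usual
piped <- x %>% sort(decreasing = TRUE)
piped
#> [1] 3 2 1
```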

The pipe can make code significantly more readable by avoiding nested function calls, reducing the need for intermediate variables, and making sequential operations read left-to-right. For example, to add a new variable to a data frame, then summarize using a grouping variable, the following are equivalent:

# intermediate variables
# (mtcars records wt in units of 1000 lbs; 1000 lbs is approximately 454 kg)
mtcars_kg <- mutate(mtcars, wt_kg = 454 * wt)
mtcars_grouped <- group_by(mtcars_kg, cyl)
summarize(mtcars_grouped, wt_kg = mean(wt_kg))
#> # A tibble: 3 × 2
#>     cyl wt_kg
#>   <dbl> <dbl>
#> 1     4 1038.
#> 2     6 1415.
#> 3     8 1816.

# nested function calls
summarize(
  group_by(
    mutate(mtcars, wt_kg = 454 * wt),
    cyl
  ),
  wt_kg = mean(wt_kg)
)
#> # A tibble: 3 × 2
#>     cyl wt_kg
#>   <dbl> <dbl>
#> 1     4 1038.
#> 2     6 1415.
#> 3     8 1816.

# pipes
mtcars %>% 
  mutate(wt_kg = 454 * wt) %>% 
  group_by(cyl) %>% 
  summarize(wt_kg = mean(wt_kg))
#> # A tibble: 3 × 2
#>     cyl wt_kg
#>   <dbl> <dbl>
#> 1     4 1038.
#> 2     6 1415.
#> 3     8 1816.
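
One way to check that these forms really are equivalent is to store the results and compare them. The sketch below does this for the nested and piped versions using base R's identical(); the object names nested and piped are illustrative.

```r
library(dplyr)

# nested function calls
nested <- summarize(
  group_by(mutate(mtcars, wt_kg = 454 * wt), cyl),
  wt_kg = mean(wt_kg)
)

# pipes
piped <- mtcars %>%
  mutate(wt_kg = 454 * wt) %>%
  group_by(cyl) %>%
  summarize(wt_kg = mean(wt_kg))

# returns TRUE only when the two results match exactly
identical(nested, piped)
```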

Exercise

Rewrite the following code using pipes:

set.seed(1)
round(log(runif(10, min = 0.5)), 1)
#>  [1] -0.5 -0.4 -0.2  0.0 -0.5 -0.1  0.0 -0.2 -0.2 -0.6

Solution

set.seed(1)
runif(10, min = 0.5) %>% 
  log() %>% 
  round(digits = 1)
#>  [1] -0.5 -0.4 -0.2  0.0 -0.5 -0.1  0.0 -0.2 -0.2 -0.6