eBird Best Practices Workshop (ROC)

Introduction

The contents of this website comprise the notes for a workshop on best practices for using eBird data and eBird Status data products presented to Red de Observadores de Aves y Vida Silvestre de Chile (ROC) in May 2023 in Santiago, Chile. The workshop is divided into three lessons covering:

eBird Data: introduction to the eBird Reference Dataset (ERD), challenges associated with using eBird data for analysis, and best practices for preparing eBird data for modeling.
Modeling Relative Abundance: best practices for using eBird data to model encounter rate, count, and relative abundance for a species.
eBird Status Data Products: downloading eBird Status data products, loading the data into R, and using them for a variety of applications.

Setup

This workshop is intended to be interactive. All examples are written in the R programming language, and the instructor will work through the examples in real time, while the attendees are encouraged following along by writing the same code. To ensure we can avoid any unnecessary delays, please follow these setup instructions prior to the workshop

Download the data package for this workshop. This data package contains a variety of datasets used throughout the three lessons.
Request access to the eBird Status data products.
Download and install the latest version of R. You must have R version 4.0.0 or newer to follow along with this workshop
Download and install the latest version of RStudio. RStudio is not required for this workshop; however, the instructors will be using it and you may find it easier to following along if you’re working in the same environment.
Create an RStudio project for working through the examples in this workshop. Move all the files from the data package so they’re in a data/ subdirectory of the project directory.
The lessons in this workshop use a variety of R packages. To install all the necessary packages, run the following code

if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("ebird/ebird-best-practices")

Ensure all packages are updated to their most recent versions by clicking on the Update button on the Packages tab in RStudio.

Tidyverse

Throughout this book, we use packages from the Tidyverse, an opinionated collection of R packages designed for data science. Packages such as ggplot2, for data visualization, and dplyr, for data manipulation, are two of the most well known Tidyverse packages; however, there are many more. We’ll try to explain any functions as they come up; however, for a good general resource on working with data in R using the Tidyverse see the free online book R for Data Science by Hadley Wickham.

The one piece of the Tidyverse that we will cover up front is the pipe operator %>%. The pipe takes the expression to the left of it and “pipes” it into the first argument of the expression on the right.

library(dplyr)

# without pipe
mean(1:10)
#> [1] 5.5

# with pipe
1:10 %>% mean()
#> [1] 5.5

The pipe can code significantly more readable by avoiding nested function calls, reducing the need for intermediate variables, and making sequential operations read left-to-right. For example, to add a new variable to a data frame, then summarize using a grouping variable, the following are equivalent:

# intermediate variables
mtcars_kg <- mutate(mtcars, wt_kg = 454 * wt)
mtcars_grouped <- group_by(mtcars_kg, cyl)
summarize(mtcars_grouped, wt_kg = mean(wt_kg))
#> # A tibble: 3 × 2
#>     cyl wt_kg
#>   <dbl> <dbl>
#> 1     4 1038.
#> 2     6 1415.
#> 3     8 1816.

# nested function calls
summarize(
  group_by(
    mutate(mtcars, wt_kg = 454 * wt),
    cyl
  ),
  wt_kg = mean(wt_kg)
)
#> # A tibble: 3 × 2
#>     cyl wt_kg
#>   <dbl> <dbl>
#> 1     4 1038.
#> 2     6 1415.
#> 3     8 1816.

# pipes
mtcars %>% 
  mutate(wt_kg = 454 * wt) %>% 
  group_by(cyl) %>% 
  summarize(wt_kg = mean(wt_kg))
#> # A tibble: 3 × 2
#>     cyl wt_kg
#>   <dbl> <dbl>
#> 1     4 1038.
#> 2     6 1415.
#> 3     8 1816.

Exercise

Rewrite the following code using pipes:

set.seed(1)
round(log(runif(10, min = 0.5)), 1)
#>  [1] -0.5 -0.4 -0.2  0.0 -0.5 -0.1  0.0 -0.2 -0.2 -0.6

Solution

set.seed(1)
runif(10, min = 0.5) %>% 
  log() %>% 
  round(digits = 1)
#>  [1] -0.5 -0.4 -0.2  0.0 -0.5 -0.1  0.0 -0.2 -0.2 -0.6