eBird Status and Trends

Introduction

This website houses the notes for a workshop on the eBird Status and Trends Data Products presented at the The Wildlife Society (TWS) Conference in November 2023. The workshop will be divided into two lessons covering the eBird Status Data Products and the eBird Trends Data Products. In each section, we demonstrate how to download the data, load it into R, and use it for some common conservation and management applications.

Setup

Before attending this workshop, create an eBird account if you don’t already have one and request access to the eBird Status and Trends Data Products by filling out the access request form at: https://science.ebird.org/en/status-and-trends/download-data

Software

This workshop is intended to be interactive. All examples are written in the R programming language, and the instructor will work through the examples in real time, while the attendees are encouraged following along by writing the same code. To ensure we can avoid any unnecessary delays, please follow these setup instructions prior to the workshop:

  1. Download and install the latest version of R. You must have R version 4.0.0 or newer to follow along with this workshop
  2. Download and install the latest version of RStudio. RStudio is not required for this workshop; however, the instructors will be using it and you may find it easier to following along if you’re working in the same environment.
  3. Working with the eBird Status and Trends Data Products in R requires the ebirdst R package. Install the latest version of the package by running the following code in R:
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("ebird/ebirdst", ref = "trends")
  1. Ensure all packages are updated to their most recent versions by clicking on the “Update” button on the “Packages” tab in RStudio.

Data

For those working through the notes on their own, all the required data will be downloaded as needed during the lessons; however, for those attending the workshop, having 30 people attempt to download a large amount of data on the same WiFi connect can pose a problem. With that in mind, attendees will be asked to download data in advance by running the following code:

# download data package
td <- file.path(tempdir(), "ebirdst-workshop-data")
dir.create(td, recursive = TRUE, showWarnings = FALSE)
tf <- file.path(td, "data.zip")
options(timeout = 10000)
download.file("https://cornell.box.com/shared/static/qn7knt6o865853uhbqqlr9fhz3mj1mhk.zip", 
              destfile = tf)

# unzip
unzip(tf, exdir = td)
files <- list.files(td, recursive = TRUE)
files <- files[grepl("^2022/", files)]
dest_dir <- ebirdst::ebirdst_data_dir()
# create directories
for (d in unique(dirname(files))) {
  dir.create(file.path(dest_dir, d), showWarnings = FALSE, recursive = TRUE)
}
# copy files
for (f in files) {
  if (!file.exists(file.path(dest_dir, f))) {
    file.copy(from = file.path(td, f), 
              to = file.path(dest_dir, f))
  }
}

# clean up
unlink(td, recursive = TRUE)

Template R scripts

During the workshop we’ll work through the lessons on this website, writing code together in real time; however, it will be useful to have script templates to work from. Open RStudio, then:

  1. Create a script named “ebird-status.R”, visit this link, and copy the contents into the script you just created.
  2. Create a script named “ebird-trends.R”, visit this link, and copy the contents into the script you just created.

Background knowledge

Tidyverse

Throughout this workshop, we use packages from the Tidyverse, an opinionated collection of R packages designed for data science. Packages such as ggplot2, for data visualization, and dplyr, for data manipulation, are two of the most well known Tidyverse packages; however, there are many more. We’ll try to explain any functions as they come up; however, for a good general resource on working with data in R using the Tidyverse see the free online book R for Data Science by Hadley Wickham.

The one piece of the Tidyverse that we will cover up front is the pipe operator %>%. The pipe takes the expression to the left of it and “pipes” it into the first argument of the expression on the right.

library(dplyr)

# without pipe
mean(1:10)
#> [1] 5.5

# with pipe
1:10 %>% mean()
#> [1] 5.5

The pipe can code significantly more readable by avoiding nested function calls, reducing the need for intermediate variables, and making sequential operations read left-to-right. For example, to add a new variable to a data frame, then summarize using a grouping variable, the following are equivalent:

# intermediate variables
mtcars_kg <- mutate(mtcars, wt_kg = 454 * wt)
mtcars_grouped <- group_by(mtcars_kg, cyl)
summarize(mtcars_grouped, wt_kg = mean(wt_kg))
#> # A tibble: 3 × 2
#>     cyl wt_kg
#>   <dbl> <dbl>
#> 1     4 1038.
#> 2     6 1415.
#> 3     8 1816.

# nested function calls
summarize(
  group_by(
    mutate(mtcars, wt_kg = 454 * wt),
    cyl
  ),
  wt_kg = mean(wt_kg)
)
#> # A tibble: 3 × 2
#>     cyl wt_kg
#>   <dbl> <dbl>
#> 1     4 1038.
#> 2     6 1415.
#> 3     8 1816.

# pipes
mtcars %>% 
  mutate(wt_kg = 454 * wt) %>% 
  group_by(cyl) %>% 
  summarize(wt_kg = mean(wt_kg))
#> # A tibble: 3 × 2
#>     cyl wt_kg
#>   <dbl> <dbl>
#> 1     4 1038.
#> 2     6 1415.
#> 3     8 1816.
Exercise

Rewrite the following code using pipes:

set.seed(1)
round(log(runif(10, min = 0.5)), 1)
#>  [1] -0.5 -0.4 -0.2  0.0 -0.5 -0.1  0.0 -0.2 -0.2 -0.6
set.seed(1)
runif(10, min = 0.5) %>% 
  log() %>% 
  round(digits = 1)
#>  [1] -0.5 -0.4 -0.2  0.0 -0.5 -0.1  0.0 -0.2 -0.2 -0.6

Working with spatial data in R

The Status and Trends Data Products are mostly spatial data in one of the following formats:

  • Raster: values assigned to a regular grid of square cells. Data products of this type are stored in GeoTIFF format and we use the R package terra to work with them. vector
  • Polygons: polygon boundaries with attribute data assigned to each polygon. Data products of this type (e.g. range polgyons) are stored in GeoPackage format and we use the R package sf to work with them.
  • Points: point locations defined by a pair of coordinates with attribute data assigned to each point. Data products of this type are stored in CSV or Parquet format and we work with these data in R as data frames or in an excplicitly spatial format using the sf package.

Some familiarity of the main spatial R packages sf and terra will be useful for following along with this workshop. The free online book Geocomputation with R is a good resource on working with spatial data in R.