eBird Status Data Products

Introduction

The contents of this website comprise the notes for a workshop on best practices for using eBird Status data products presented at the Australasian Ornithological Conference 2023 on Friday December 1 in Brisbane, Australia.

The eBird Status Data Products materials cover: downloading eBird Status data products, loading the data into R, and using them for a variety of applications.

Setup

This workshop is intended to be interactive. All examples are written in the R programming language, and the instructor will work through the examples in real time, while attendees are encouraged to follow along by writing the same code. To avoid unnecessary delays, please follow these setup instructions prior to the workshop:

Create an eBird account if you don’t already have one and request access to the eBird Status data products.
Download and install the latest version of R. You must have R version 4.0.0 or newer to follow along with this workshop.
Download and install the latest version of RStudio. RStudio is not required for this workshop; however, the instructors will be using it and you may find it easier to follow along if you’re working in the same environment.
Create an RStudio project for working through the examples in this workshop.
The lessons in this workshop use a variety of R packages. To install all the necessary packages, run the following code:

if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("ebird/ebird-best-practices")

Ensure all packages are updated to their most recent versions by clicking on the Update button on the Packages tab in RStudio.
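
The version requirement and package installation above can also be checked from the R console. The following is a minimal sketch using only base R; the package names in `pkgs` are just the two that appear in the code later in these notes (ebirdst and dplyr), not a complete list of what the workshop installs.

```r
# Compare the running R version against the 4.0.0 minimum stated above
r_ok <- getRversion() >= "4.0.0"
if (!r_ok) {
  message("R ", getRversion(), " is older than 4.0.0; please upgrade.")
}

# Check whether packages are installed without attaching them.
# This list is illustrative only; extend it as needed.
pkgs <- c("ebirdst", "dplyr")
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
installed
```

Any package reported as `FALSE` has not been installed and the install step should be run again.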

Data

For those working through the notes on their own, all the required data will be downloaded as needed during the lessons; however, for those attending the workshop, having 30 people attempt to download a large amount of data on the same WiFi connection can pose a problem. With that in mind, attendees will be asked to download data in advance by running the following code:

# download data package
td <- file.path(tempdir(), "ebirdst-workshop-data")
dir.create(td, recursive = TRUE, showWarnings = FALSE)
tf <- file.path(td, "data.zip")
options(timeout = 10000)
download.file("https://cornell.box.com/shared/static/le1try00p75jnw9vr7cb0c64m3icwkpm.zip", 
              destfile = tf)

# unzip
unzip(tf, exdir = td)
data_dir <- "data"
dir.create(data_dir, showWarnings = FALSE)

# move gis files
dd <- file.path(td, "data/")
pasf <- "capad2022.gpkg"
gisf <- "gis-data.gpkg"
file.copy(file.path(dd, pasf), file.path(data_dir, pasf))
file.copy(file.path(dd, gisf), file.path(data_dir, gisf))

# move ebirdst data
ebirdst_data_path <- file.path(td, "data/ebirdst-data/")
files <- list.files(ebirdst_data_path, recursive = TRUE)
files <- files[grepl("^2022/", files)]
dest_dir <- ebirdst::ebirdst_data_dir()
# create directories
for (d in unique(dirname(files))) {
  dir.create(file.path(dest_dir, d), showWarnings = FALSE, recursive = TRUE)
}
# copy files
for (f in files) {
  if (!file.exists(file.path(dest_dir, f))) {
    file.copy(from = file.path(ebirdst_data_path, f), 
              to = file.path(dest_dir, f))
  }
}

# clean up
unlink(td, recursive = TRUE)
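
After running the download code, it may be worth confirming that the GIS files ended up in the project's data/ directory. The sketch below assumes the working directory is the RStudio project root; `check_files()` is a hypothetical helper written for this illustration, not part of any package.

```r
# Hypothetical helper: report which of the expected files exist in a directory
check_files <- function(dir, files) {
  paths <- file.path(dir, files)
  setNames(file.exists(paths), files)
}

# The two GIS files copied by the download code above
check_files("data", c("capad2022.gpkg", "gis-data.gpkg"))
```

Both entries should be `TRUE`; a `FALSE` suggests the download or copy step did not complete.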

Template R script

During the workshop we’ll work through the lessons on this website, writing code together in real time; however, it will be useful to have a script template to work from. Open RStudio, then:

Create a script named “ebird-status.R”, visit this link, and copy the contents into the script you just created.

Tidyverse

Throughout this workshop, we use packages from the Tidyverse, an opinionated collection of R packages designed for data science. ggplot2, for data visualization, and dplyr, for data manipulation, are two of the best-known Tidyverse packages, but there are many more. We’ll try to explain functions as they come up; for a good general resource on working with data in R using the Tidyverse, see the free online book R for Data Science by Hadley Wickham.

The one piece of the Tidyverse that we will cover up front is the pipe operator %>%. The pipe takes the expression to the left of it and “pipes” it into the first argument of the expression on the right.

library(dplyr)

# without pipe
mean(1:10)
#> [1] 5.5

# with pipe
1:10 %>% mean()
#> [1] 5.5
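
When the function on the right takes more than one argument, the piped value still fills the first argument and any others are supplied as usual. The following small sketch illustrates this; the vector `x` is made up for the example.

```r
library(dplyr)

x <- c(1.234, 5.678)

# x is supplied as the first argument of round(); digits is passed as usual
piped <- x %>% round(digits = 1)

# the equivalent call without the pipe
direct <- round(x, digits = 1)

identical(piped, direct)
#> [1] TRUE
```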

The pipe can make code significantly more readable by avoiding nested function calls, reducing the need for intermediate variables, and making sequential operations read left-to-right. For example, to add a new variable to a data frame, then summarize using a grouping variable, the following are equivalent:

# intermediate variables
mtcars_kg <- mutate(mtcars, wt_kg = 454 * wt)
mtcars_grouped <- group_by(mtcars_kg, cyl)
summarize(mtcars_grouped, wt_kg = mean(wt_kg))
#> # A tibble: 3 × 2
#>     cyl wt_kg
#>   <dbl> <dbl>
#> 1     4 1038.
#> 2     6 1415.
#> 3     8 1816.

# nested function calls
summarize(
  group_by(
    mutate(mtcars, wt_kg = 454 * wt),
    cyl
  ),
  wt_kg = mean(wt_kg)
)
#> # A tibble: 3 × 2
#>     cyl wt_kg
#>   <dbl> <dbl>
#> 1     4 1038.
#> 2     6 1415.
#> 3     8 1816.

# pipes
mtcars %>% 
  mutate(wt_kg = 454 * wt) %>% 
  group_by(cyl) %>% 
  summarize(wt_kg = mean(wt_kg))
#> # A tibble: 3 × 2
#>     cyl wt_kg
#>   <dbl> <dbl>
#> 1     4 1038.
#> 2     6 1415.
#> 3     8 1816.

Exercise

Rewrite the following code using pipes:

set.seed(1)
round(log(runif(10, min = 0.5)), 1)
#>  [1] -0.5 -0.4 -0.2  0.0 -0.5 -0.1  0.0 -0.2 -0.2 -0.6

Solution

set.seed(1)
runif(10, min = 0.5) %>% 
  log() %>% 
  round(digits = 1)
#>  [1] -0.5 -0.4 -0.2  0.0 -0.5 -0.1  0.0 -0.2 -0.2 -0.6