eBird Best Practices: Modeling Relative Abundance

Introduction

The contents of this website comprise the notes for a workshop on best practices for using eBird data presented at the Australasian Ornithological Conference 2023 on Monday November 27 in Brisbane, Australia. The workshop is divided into two lessons covering:

  1. eBird Data: introduction to the eBird Reference Dataset (ERD), challenges associated with using eBird data for analysis, and best practices for preparing eBird data for modeling.
  2. Modeling Relative Abundance: best practices for using eBird data to model encounter rate, count, and relative abundance for a species.

Setup

This workshop is intended to be interactive. All examples are written in the R programming language, and the instructor will work through the examples in real time, while the attendees are encouraged following along by writing the same code. To ensure we can avoid any unnecessary delays, please follow these setup instructions prior to the workshop

  1. Download and install the latest version of R. You must have R version 4.0.0 or newer to follow along with this workshop
  2. Download and install the latest version of RStudio. RStudio is not required for this workshop; however, the instructors will be using it and you may find it easier to following along if you’re working in the same environment.
  3. Create an RStudio project for working through the examples in this workshop. In the Data section below, you’ll download the data to the proper path in the project you create.
  4. The lessons in this workshop use a variety of R packages. To install all the necessary packages, run the following code
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("ebird/ebird-best-practices")
  1. Ensure all packages are updated to their most recent versions by clicking on the Update button on the Packages tab in RStudio.

Data

Since having a large group of workshop participants attempt to download a large amount of data all at once, on the same WIFI connection, we’ve provided pre-packaged data and a script to load it into the correct directories before the workshop begins.

# download data package to root project directory
options(timeout = 10000)
download.file("https://cornell.box.com/shared/static/vpk03qqwjtswskam246prd5h9am3lzt5.zip", 
              destfile = "data.zip")

# unzip
unzip("data.zip")

Template R scripts

During the workshop we’ll work through the lessons on this website, writing code together in real time; however, it will be useful to have script templates to work from. Open RStudio, then:

  1. Create a script named “ebird-data.R”, visit this link, and copy the contents into the script you just created.
  2. Create a script named “ebird-rel-abd.R”, visit this link, and copy the contents into the script you just created.

Tidyverse

Throughout this book, we use packages from the Tidyverse, an opinionated collection of R packages designed for data science. Packages such as ggplot2, for data visualization, and dplyr, for data manipulation, are two of the most well known Tidyverse packages; however, there are many more. We’ll try to explain any functions as they come up; however, for a good general resource on working with data in R using the Tidyverse see the free online book R for Data Science by Hadley Wickham.

The one piece of the Tidyverse that we will cover up front is the pipe operator %>%. The pipe takes the expression to the left of it and “pipes” it into the first argument of the expression on the right.

library(dplyr)

# without pipe
mean(1:10)
#> [1] 5.5

# with pipe
1:10 %>% mean()
#> [1] 5.5

The pipe can code significantly more readable by avoiding nested function calls, reducing the need for intermediate variables, and making sequential operations read left-to-right. For example, to add a new variable to a data frame, then summarize using a grouping variable, the following are equivalent:

# intermediate variables
mtcars_kg <- mutate(mtcars, wt_kg = 454 * wt)
mtcars_grouped <- group_by(mtcars_kg, cyl)
summarize(mtcars_grouped, wt_kg = mean(wt_kg))
#> # A tibble: 3 × 2
#>     cyl wt_kg
#>   <dbl> <dbl>
#> 1     4 1038.
#> 2     6 1415.
#> 3     8 1816.

# nested function calls
summarize(
  group_by(
    mutate(mtcars, wt_kg = 454 * wt),
    cyl
  ),
  wt_kg = mean(wt_kg)
)
#> # A tibble: 3 × 2
#>     cyl wt_kg
#>   <dbl> <dbl>
#> 1     4 1038.
#> 2     6 1415.
#> 3     8 1816.

# pipes
mtcars %>% 
  mutate(wt_kg = 454 * wt) %>% 
  group_by(cyl) %>% 
  summarize(wt_kg = mean(wt_kg))
#> # A tibble: 3 × 2
#>     cyl wt_kg
#>   <dbl> <dbl>
#> 1     4 1038.
#> 2     6 1415.
#> 3     8 1816.
Exercise

Rewrite the following code using pipes:

set.seed(1)
round(log(runif(10, min = 0.5)), 1)
#>  [1] -0.5 -0.4 -0.2  0.0 -0.5 -0.1  0.0 -0.2 -0.2 -0.6
set.seed(1)
runif(10, min = 0.5) %>% 
  log() %>% 
  round(digits = 1)
#>  [1] -0.5 -0.4 -0.2  0.0 -0.5 -0.1  0.0 -0.2 -0.2 -0.6