Overview

This vignette shows how to use dwcPrepare to prepare biodiversity data that use Darwin Core terms.

Currently, the package supports a number of Darwin Core Event and Location terms, which can be calculated with two wrapper functions, dwc_Event() and dwc_Location(). We think the most useful part of the package is generating the Darwin Core Location terms that require some level of ‘calculation’, for example coordinateUncertaintyInMeters and the text for the Darwin Core locality term.

Most functions in the package can take data from a single location / event directly as part of their arguments. However, we will mostly have a spreadsheet with multiple locations / events that we wish to format to Darwin Core terms. The following examples show how to use dwcPrepare with dplyr::mutate() in the case where we have multiple locations / events. Please see the individual help files for examples with a single location / event.

Darwin Core Location terms

First let’s walk through generating the Darwin Core Location terms.

Note: if you have a small number of locations, you may wish to use the online Georeferencing Calculator (Wieczorek and Wieczorek 2021). This has more options than we provide here, including calculating the uncertainty from an unknown datum. We relied heavily on this calculator and associated documentation when writing this package. Output from the calculator can be formatted for use in R using the dwc_format_gco() function.

Load packages
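
The code chunk for this step is not shown here. Because the examples below call select() and mutate() unqualified and use functions and data from dwcPrepare, a minimal setup would presumably look like the following (an assumption, not confirmed by the text):

```r
# Assumed setup: dwcPrepare provides the dwc_*() functions and bundled data,
# dplyr provides select() and mutate()
library(dwcPrepare)
library(dplyr)
```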

Prepare raw data

dwcPrepare comes with a small mock data set. Let’s load the data and take a quick look.

data("thylacine_data")

thylacine_data
#> # A tibble: 8 × 13
#>   site   trap     date_trap_setup   date_trap_collected longitude_dd latitude_dd
#>   <chr>  <chr>    <chr>             <chr>                      <dbl>       <dbl>
#> 1 Sumac  Sumac 1  05/09/2022 10:32… 06/09/2022 12:24:00         145.       -41.2
#> 2 Sumac  Sumac 2  05/09/2022 12:15… 06/09/2022 14:57:00         145.       -41.2
#> 3 Sumac  Sumac 1  05/10/2022 08:23… 06/10/2022 0:00:00          145.       -41.2
#> 4 Sumac  Sumac 2  05/10/2022 10:14… 06/10/2022 10:29:00         145.       -41.2
#> 5 Picton Picton 1 10/09/2022 10:32… 11/09/2022 12:24:00         147.       -43.3
#> 6 Picton Picton 2 10/09/2022 12:15… 11/09/2022 14:57:00         147.       -43.2
#> 7 Picton Picton 1 10/10/2022 0:00:… 11/10/2022 08:46:00         147.       -43.3
#> 8 Picton Picton 2 10/10/2022 10:14… 11/10/2022 10:29:00         147.       -43.2
#> # ℹ 7 more variables: longitude_dms <chr>, latitude_dms <chr>,
#> #   longitude_ddm <chr>, latitude_ddm <chr>, gps_uncertainty <dbl>,
#> #   species <chr>, count <dbl>
# see ?thylacine_data for more information

You will see that the data include latitude and longitude in three formats: decimal degrees (_dd), degrees minutes seconds (_dms) and degrees decimal minutes (_ddm). These are the three formats supported by dwcPrepare, which uses the parzer R package to convert coordinates to decimal degrees if needed. We have included all three formats so that users can choose which format to use when trying out the functions. For this vignette we use degrees decimal minutes, so we’ll remove the decimal degrees and degrees minutes seconds columns:

thylacine_data <-
  thylacine_data |>
  select(-c(longitude_dd:latitude_dms))
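
To make the three coordinate formats concrete, the base R sketch below converts one made-up latitude from degrees minutes seconds and from degrees decimal minutes to decimal degrees. The helpers dms_to_dd() and ddm_to_dd() are hypothetical and are not part of dwcPrepare or parzer; they only illustrate the arithmetic.

```r
# Hypothetical helpers (not part of dwcPrepare or parzer)
dms_to_dd <- function(degrees, minutes, seconds, hemisphere) {
  # Southern and western hemispheres are negative in decimal degrees
  sign <- ifelse(hemisphere %in% c("S", "W"), -1, 1)
  sign * (degrees + minutes / 60 + seconds / 3600)
}

ddm_to_dd <- function(degrees, decimal_minutes, hemisphere) {
  sign <- ifelse(hemisphere %in% c("S", "W"), -1, 1)
  sign * (degrees + decimal_minutes / 60)
}

# Made-up latitude: 41 degrees 12 minutes 30 seconds S
# is the same position as 41 degrees 12.5 minutes S
lat_from_dms <- dms_to_dd(41, 12, 30, "S")
lat_from_ddm <- ddm_to_dd(41, 12.5, "S")

all.equal(lat_from_dms, lat_from_ddm)
```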

Location terms step-by-step

We can now go through building up the location terms step by step.

First, we use dwc_coordinates() to do some checks, get our longitude and latitude in the right format, and record some basic information:

thylacine_data_a <-
  thylacine_data |>
  mutate(
    dwc_coordinates(
      longitude = longitude_ddm,
      latitude = latitude_ddm,
      verbatimCoordinateSystem = "degrees decimal minutes",
      verbatimSRS = "EPSG:4326"
    )
  )

Two things to note from the above. One is that dwc_coordinates() is called inside of dplyr::mutate(). If, like above, a tibble::tibble() is returned by the dwcPrepare function, then there is no need to specify a column name within dplyr::mutate(). However, if a single vector is returned, then a column name needs to be specified (see the next code chunk).

The other thing to note is that by giving single strings to the verbatimCoordinateSystem and verbatimSRS arguments, we are telling dwc_coordinates() that these values apply to all rows in the dataframe.
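
Both points rely on general dplyr behaviour rather than anything specific to dwcPrepare: an unnamed data frame passed to mutate() is unpacked into columns, a vector needs a column name, and a single string is recycled to every row. A small sketch with two hypothetical helper functions (not from dwcPrepare):

```r
library(dplyr)
library(tibble)

# Hypothetical helper returning a tibble with one row per input element
returns_tibble <- function(x) {
  tibble(doubled = x * 2, squared = x^2)
}

# Hypothetical helper returning a single vector
returns_vector <- function(x) {
  x + 1
}

df <- tibble(x = 1:3)

df_out <-
  df |>
  mutate(
    returns_tibble(x),             # unnamed: columns doubled and squared are added
    plus_one = returns_vector(x),  # named: one new column
    srs = "EPSG:4326"              # single string: recycled to all rows
  )

names(df_out)
```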

Next we calculate the precision of our coordinates:

thylacine_data_b <-
  thylacine_data_a |>
  mutate(
    coordinatePrecision = dwc_coordinatePrecision(
      verbatimLatitude = verbatimLatitude,
      verbatimLongitude = verbatimLongitude,
      verbatimCoordinateSystem = verbatimCoordinateSystem
    )
  )
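
Darwin Core describes coordinatePrecision as a decimal representation of the precision of the coordinates. As a rough illustration of the idea, and not necessarily how dwc_coordinatePrecision() works internally, the smallest unit recorded in the verbatim coordinates can be expressed in decimal degrees:

```r
# Precision in decimal degrees for coordinates recorded to the nearest minute
precision_nearest_minute <- 1 / 60

# ... to the nearest second (degrees minutes seconds)
precision_nearest_second <- 1 / 3600

# ... to 0.001 of a minute (degrees decimal minutes, three decimal places)
precision_ddm_3dp <- 0.001 / 60

round(
  c(precision_nearest_minute, precision_nearest_second, precision_ddm_3dp),
  7
)
```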

We now have all the information we need to calculate the coordinate uncertainty in meters:

thylacine_data_c <-
  thylacine_data_b |>
  mutate(
    coordinateUncertaintyInMeters = dwc_coordinateUncertaintyInMeters(
      decimalLatitude = decimalLatitude,
      coordinatePrecision = coordinatePrecision,
      geodeticDatum = geodeticDatum,
      gps_uncertainty = gps_uncertainty
    )
  )
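
To give a feel for why latitude, precision and GPS uncertainty all enter this calculation, here is a deliberately simplified sketch. It assumes a spherical Earth and simply adds the rounding error to the GPS error; approx_uncertainty_m() is a hypothetical helper, not the method used by dwc_coordinateUncertaintyInMeters() or the Georeferencing Calculator, and it ignores the datum entirely.

```r
# Hypothetical helper: rough spherical approximation, NOT the dwcPrepare method
approx_uncertainty_m <- function(decimal_latitude, coordinate_precision,
                                 gps_uncertainty) {
  earth_radius_m <- 6371000  # assumed mean Earth radius in metres

  # Length of one degree of latitude, and of longitude at this latitude
  m_per_degree_lat <- pi * earth_radius_m / 180
  m_per_degree_lon <- m_per_degree_lat * cos(decimal_latitude * pi / 180)

  lat_error <- coordinate_precision * m_per_degree_lat
  lon_error <- coordinate_precision * m_per_degree_lon

  # Error from rounding the coordinates plus the error reported by the GPS
  sqrt(lat_error^2 + lon_error^2) + gps_uncertainty
}

# Made-up inputs: coordinates to the nearest second, GPS error of 10 m
uncertainty_m <- approx_uncertainty_m(
  decimal_latitude = -41.2,
  coordinate_precision = 1 / 3600,
  gps_uncertainty = 10
)
uncertainty_m
```

The same precision translates to fewer metres of longitude as you move away from the equator, which is why decimalLatitude is an input.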

Next, if we have a shape file with the country, countryCode, stateProvince and county information for our area of interest, we can use dwc_country_to_county() to assign these values for each of our points.

dwcPrepare includes a shape file for the area where our mock data were collected.

Warning: The dwc_country_to_county() function differs from the other dwcPrepare functions in two ways. First, it is not wrapped in dplyr::mutate() when applied to a dataframe. Second, column names are given as strings. We may update this in a future version of the package.

# Load the sf object with the county shape files that include the country,
# countryCode, stateProvince and county information for the area of interest
data("county_tas")

thylacine_data_d <-
  dwc_country_to_county(
    thylacine_data_c,
    decimalLongitude = "decimalLongitude",
    decimalLatitude = "decimalLatitude",
    county_sf = county_tas,
    country_column_name = "country",
    countryCode_column_name = "countryCode",
    stateProvince_column_name = "stateProvince",
    county_column_name = "county"
  )
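
Conceptually, assigning administrative areas to points is a point-in-polygon lookup. The sketch below shows that idea with a plain sf spatial join on made-up data; it is an illustration under that assumption, not the implementation of dwc_country_to_county(), and the polygon, points and object names are hypothetical.

```r
library(sf)

# Made-up square polygon standing in for a single county
square <- st_polygon(list(rbind(
  c(146, -44), c(148, -44), c(148, -42), c(146, -42), c(146, -44)
)))

counties <- st_sf(
  country = "Australia",
  countryCode = "AU",
  stateProvince = "Tasmania",
  county = "Example County",
  geometry = st_sfc(square, crs = 4326)
)

# Two made-up points: the first falls inside the polygon, the second outside
points <- st_as_sf(
  data.frame(id = 1:2, lon = c(147, 145), lat = c(-43.3, -41.2)),
  coords = c("lon", "lat"),
  crs = 4326
)

# Left join: points outside every polygon get NA for the polygon attributes
joined <- st_join(points, counties, join = st_within)
joined$county
```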

Finally, a useful feature of dwcPrepare is dwc_locality(), which provides text that can be used in the Darwin Core locality field.

This function requires an sf POINT object that includes the locality names and their longitude and latitude for the area of interest. dwcPrepare comes with such an object for Australian localities (locality_data_aus).

# load localities data
data("locality_data_aus")

thylacine_data_e <-
  thylacine_data_d |>
  mutate(
    locality = dwc_locality(
      decimalLongitude = decimalLongitude,
      decimalLatitude = decimalLatitude,
      localities_sf = locality_data_aus,
      localities_names = "locality_name"
    )
  )
#> The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
#> which was just loaded, will retire in October 2023.
#> Please refer to R-spatial evolution reports for details, especially
#> https://r-spatial.org/r/2023/05/15/evolution4.html.
#> It may be desirable to make the sf package available;
#> package maintainers should consider adding sf to Suggests:.
#> The sp package is now running under evolution status 2
#>      (status 2 uses the sf package in place of rgdal)
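
One way to think about a locality description is as a distance from the nearest named place. The sketch below finds the nearest locality to a point with sf and builds a short text from it. It is only an illustration: the localities are made up, and the wording and method used by dwc_locality() may well differ.

```r
library(sf)

# Made-up localities (names and coordinates are illustrative only)
localities <- st_as_sf(
  data.frame(
    locality_name = c("Town A", "Town B"),
    lon = c(147.0, 145.0),
    lat = c(-43.0, -41.0)
  ),
  coords = c("lon", "lat"),
  crs = 4326
)

# Made-up sampling site
site <- st_sfc(st_point(c(147.0, -43.3)), crs = 4326)

# Index of the closest locality, then the distance to it in kilometres
nearest <- st_nearest_feature(site, localities)
distance_km <- as.numeric(st_distance(site, localities[nearest, ])) / 1000

paste0(round(distance_km, 1), " km from ", localities$locality_name[nearest])
```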

Location terms with one function

Our final dataframe above, thylacine_data_e, includes the Darwin Core Location fields: decimalLatitude, decimalLongitude, geodeticDatum, coordinateUncertaintyInMeters, coordinatePrecision, verbatimLatitude, verbatimLongitude, verbatimCoordinateSystem, verbatimSRS, country, countryCode, stateProvince, county and locality.

Rather than generating these terms step-by-step, we can use the wrapper function dwc_Location() to do everything in one step:

thylacine_data_location <-
  thylacine_data |>
  mutate(
    dwc_Location(
      longitude = longitude_ddm,
      latitude = latitude_ddm,
      verbatimCoordinateSystem = "degrees decimal minutes",
      verbatimSRS = "EPSG:4326",
      gps_uncertainty = gps_uncertainty,
      localities_sf = locality_data_aus,
      localities_names = "locality_name",
      county_sf = county_tas
    )
  )

Much easier!

Darwin Core Event terms

There is only a single function for Darwin Core Event terms: dwc_Event(). This is because the lubridate R package already includes many functions that can be used to calculate Darwin Core Event terms. dwc_Event() provides a wrapper for a series of lubridate functions, and experienced R users may prefer to use the lubridate functions directly (we have found dwc_Event() to be slow for large datasets).

The Darwin Core terms returned by dwc_Event() are: eventDate, startDayOfYear, endDayOfYear, year, month, day, verbatimEventDate, and optionally fieldNumber, habitat, samplingProtocol, samplingEffort, fieldNotes and eventRemarks. If both a start and an end date/date-time are supplied, the function also returns sampleSizeValue and sampleSizeUnit.
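
For readers who prefer to call lubridate directly, a few of these terms could be derived roughly as follows. This is a sketch, not the implementation of dwc_Event(): the timestamps are made up, day/month/year order is assumed, and the exact eventDate format produced by the package may differ.

```r
library(lubridate)

# Made-up start and end date-times, assumed to be in day/month/year order
start <- dmy_hms("01/03/2023 09:00:00", tz = "Australia/Hobart")
end <- dmy_hms("02/03/2023 15:30:00", tz = "Australia/Hobart")

# Date parts of the start of the event
event_year <- year(start)
event_month <- month(start)
event_day <- day(start)

# Day of the year for the start and end of the event
start_day_of_year <- yday(start)
end_day_of_year <- yday(end)

# An ISO 8601 style interval: start and end separated by "/"
iso_format <- "%Y-%m-%dT%H:%M:%S%z"
event_interval <- paste(format(start, iso_format), format(end, iso_format), sep = "/")

# Elapsed time between start and end, in hours
elapsed_hours <- as.numeric(difftime(end, start, units = "hours"))
```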

For example, using our thylacine_data:

thylacine_data_event <-
  thylacine_data |>
  mutate(
    dwc_Event(
      start = date_trap_setup,
      end = date_trap_collected,
      tzone = "Australia/Hobart",
      samplingEffort = "1 trap"
    )
  )

References

Wieczorek C, Wieczorek J (2021) Georeferencing Calculator. Available: http://georeferencing.org/georefcalculator/gc.html. Accessed 2023-03-09.