Overview
This vignette shows how to use dwcPrepare to prepare biodiversity data that use Darwin Core terms.
Currently, the package supports a number of Darwin Core Event and
Location terms, which can be calculated with two wrapper functions:
dwc_Event() and dwc_Location(). We think the most useful part of the
package is generating the Darwin Core Location terms that require a
level of ‘calculation’, for example coordinateUncertaintyInMeters and
the text for the Darwin Core locality term.
Most functions in the package can take data from a single location /
event directly as part of their arguments. However, we will usually have
a spreadsheet with multiple locations / events that we wish to format to
Darwin Core terms. The following examples show how to use dwcPrepare
with dplyr::mutate() in the case where we have multiple locations /
events. Please see the individual help files for examples with a single
location / event.
Darwin Core Location terms
First let’s walk through generating the Darwin Core Location terms.
Note, if you have a small number of locations, you may wish to use
the online Georeferencing Calculator (Wieczorek and Wieczorek 2021).
This has more options than we provide here, including calculating the
uncertainty from an unknown datum. We relied heavily on this calculator
and associated documentation when writing this package. Output from the
calculator can be formatted for use in R using the dwc_format_gco()
function.
Prepare raw data
dwcPrepare comes with a small mock data set. Let’s load the data and take a quick look.
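# dplyr provides mutate(), which is used throughout this vignette
library(dwcPrepare)
library(dplyr)
data("thylacine_data")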
thylacine_data
#> # A tibble: 8 × 13
#> site trap date_trap_setup date_trap_collected longitude_dd latitude_dd
#> <chr> <chr> <chr> <chr> <dbl> <dbl>
#> 1 Sumac Sumac 1 05/09/2022 10:32… 06/09/2022 12:24:00 145. -41.2
#> 2 Sumac Sumac 2 05/09/2022 12:15… 06/09/2022 14:57:00 145. -41.2
#> 3 Sumac Sumac 1 05/10/2022 08:23… 06/10/2022 0:00:00 145. -41.2
#> 4 Sumac Sumac 2 05/10/2022 10:14… 06/10/2022 10:29:00 145. -41.2
#> 5 Picton Picton 1 10/09/2022 10:32… 11/09/2022 12:24:00 147. -43.3
#> 6 Picton Picton 2 10/09/2022 12:15… 11/09/2022 14:57:00 147. -43.2
#> 7 Picton Picton 1 10/10/2022 0:00:… 11/10/2022 08:46:00 147. -43.3
#> 8 Picton Picton 2 10/10/2022 10:14… 11/10/2022 10:29:00 147. -43.2
#> # ℹ 7 more variables: longitude_dms <chr>, latitude_dms <chr>,
#> # longitude_ddm <chr>, latitude_ddm <chr>, gps_uncertainty <dbl>,
#> # species <chr>, count <dbl>
# see ?thylacine_data for more information
You will see that the data include latitude and longitude in three
formats: decimal degrees (_dd), degrees minutes seconds (_dms) and
degrees decimal minutes (_ddm). These are the three formats supported by
dwcPrepare, which uses the parzer R package to convert coordinates to
decimal degrees if needed. We have included all three formats so that
users can choose which format to use when trying out the functions. Here
we choose degrees decimal minutes for this vignette, so we’ll remove
unnecessary columns:
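The chunk for this step is not shown here. As a minimal sketch of one way to drop the unused coordinate columns, the example below builds a small stand-in tibble (demo_data, with placeholder values rather than the package's data) that has the same coordinate column names as the printed thylacine_data, and keeps only the degrees decimal minutes columns that the later chunks use. The names demo_data and demo_trimmed are illustrative only.

```r
library(dplyr)

# Stand-in for thylacine_data: same coordinate column names as the printed
# tibble, but the values are placeholders, not the package's data
demo_data <- tibble(
  site = c("Sumac", "Picton"),
  longitude_dd = c(145, 147),
  latitude_dd = c(-41.2, -43.3),
  longitude_dms = c(NA_character_, NA_character_),
  latitude_dms = c(NA_character_, NA_character_),
  longitude_ddm = c(NA_character_, NA_character_),
  latitude_ddm = c(NA_character_, NA_character_)
)

# Drop the decimal degrees and degrees minutes seconds columns;
# ends_with("_dd") does not match the "_ddm" columns
demo_trimmed <-
  demo_data |>
  select(!c(ends_with("_dd"), ends_with("_dms")))

names(demo_trimmed)
```

The same select() call should apply to thylacine_data itself, assuming its column names are as printed above.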
Location terms step-by-step
We can now go through building up the location terms step by step.
First, we use dwc_coordinates() to do some checks, get our longitude
and latitude in the right format, and record some basic information:
thylacine_data_a <-
  thylacine_data |>
  mutate(
    dwc_coordinates(
      longitude = longitude_ddm,
      latitude = latitude_ddm,
      verbatimCoordinateSystem = "degrees decimal minutes",
      verbatimSRS = "EPSG:4326"
    )
  )
Two things to note from the above. One is that dwc_coordinates() is
called inside dplyr::mutate(). If, like above, a tibble::tibble() is
returned by the dwcPrepare function, then there is no need to specify a
column name within dplyr::mutate(). However, if a single vector is
returned, then a column name needs to be specified (see the next code
chunk).
The other thing to note is that by giving strings to
verbatimCoordinateSystem and verbatimSRS, we are telling
dwc_coordinates() that these values apply to all rows in the dataframe.
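Both points are general dplyr behaviour rather than anything specific to dwcPrepare. The sketch below illustrates them with a hypothetical helper, split_trap(), which is not part of dwcPrepare, and a few made-up trap labels.

```r
library(dplyr)

# Hypothetical helper, not part of dwcPrepare: returns a tibble with two columns
split_trap <- function(trap) {
  tibble(
    trap_site = sub(" [0-9]+$", "", trap),
    trap_number = as.integer(sub("^.* ", "", trap))
  )
}

traps <- tibble(trap = c("Sumac 1", "Sumac 2", "Picton 1"))

traps_out <-
  traps |>
  mutate(
    split_trap(trap),            # unnamed tibble: its columns are added as-is
    trap_upper = toupper(trap),  # single vector: needs a column name
    verbatimSRS = "EPSG:4326"    # length-one string: recycled to every row
  )

names(traps_out)
```

Because the tibble returned by split_trap() is left unnamed, mutate() adds trap_site and trap_number as ordinary columns, which is the same pattern used with dwc_coordinates().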
Next we calculate the precision of our coordinates:
thylacine_data_b <-
  thylacine_data_a |>
  mutate(
    coordinatePrecision = dwc_coordinatePrecision(
      verbatimLatitude = verbatimLatitude,
      verbatimLongitude = verbatimLongitude,
      verbatimCoordinateSystem = verbatimCoordinateSystem
    )
  )
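For intuition, the precision implied by a degrees decimal minutes string can be worked out from the number of decimal places recorded for the minutes, since one minute is 1/60 of a degree. The sketch below uses a hypothetical helper, ddm_precision(), and assumes a simple "degrees minutes hemisphere" layout such as "41 13.392 S". It is not dwc_coordinatePrecision() itself, which may handle more formats and edge cases.

```r
# Hypothetical helper, not part of dwcPrepare. Assumes a single string laid out
# as "<degrees> <decimal minutes> <hemisphere>", separated by whitespace.
ddm_precision <- function(x) {
  # Second whitespace-separated field holds the decimal minutes
  minutes <- strsplit(trimws(x), "\\s+")[[1]][2]
  decimals <- if (grepl(".", minutes, fixed = TRUE)) {
    # Count the digits after the decimal point, ignoring any trailing symbols
    nchar(gsub("[^0-9]", "", sub("^[^.]*\\.", "", minutes)))
  } else {
    0
  }
  # Smallest recorded step in minutes, converted to decimal degrees
  (10^-decimals) / 60
}

precision_thousandth_minute <- ddm_precision("41 13.392 S")
precision_whole_minute <- ddm_precision("41 13 S")
```

A coordinate recorded to the nearest whole minute gives 1/60 of a degree, which agrees with the "nearest minute" example value of 0.01667 in the Darwin Core definition of coordinatePrecision.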
We now have all the information we need to calculate the coordinate uncertainty in meters:
thylacine_data_c <-
  thylacine_data_b |>
  mutate(
    coordinateUncertaintyInMeters = dwc_coordinateUncertaintyInMeters(
      decimalLatitude = decimalLatitude,
      coordinatePrecision = coordinatePrecision,
      geodeticDatum = geodeticDatum,
      gps_uncertainty = gps_uncertainty
    )
  )
Next, if we have a shape file with the country, countryCode,
stateProvince and county information for our area of interest, we can
use dwc_country_to_county() to assign these values to each of our
points.
dwcPrepare includes a shape file for the area where our mock data were collected.
Warning: The dwc_country_to_county() function differs from the other
dwcPrepare functions in two ways. First, it is not wrapped in
dplyr::mutate() when applied to a dataframe. Second, column names are
given as strings. We may update this in a future version of the package.
# Load the sf object with the county shape files that include the country,
# countryCode, stateProvince and county information for the area of interest
data("county_tas")
thylacine_data_d <-
  dwc_country_to_county(
    thylacine_data_c,
    decimalLongitude = "decimalLongitude",
    decimalLatitude = "decimalLatitude",
    county_sf = county_tas,
    country_column_name = "country",
    countryCode_column_name = "countryCode",
    stateProvince_column_name = "stateProvince",
    county_column_name = "county"
  )
Finally, a useful feature of dwcPrepare is providing information that
can be used in the Darwin Core locality field using dwc_locality().
This function requires an sf POINT object that includes the locality
names and their longitude and latitude for the area of interest.
dwcPrepare comes with this object for Australian localities
(locality_data_aus).
# load localities data
data("locality_data_aus")
thylacine_data_e <-
  thylacine_data_d |>
  mutate(
    locality = dwc_locality(
      decimalLongitude = decimalLongitude,
      decimalLatitude = decimalLatitude,
      localities_sf = locality_data_aus,
      localities_names = "locality_name"
    )
  )
Location terms with one function
Our final dataframe above, thylacine_data_e, includes the Darwin Core Location fields: decimalLatitude, decimalLongitude, geodeticDatum, coordinateUncertaintyInMeters, coordinatePrecision, verbatimLatitude, verbatimLongitude, verbatimCoordinateSystem, verbatimSRS, country, countryCode, stateProvince, county and locality.
Rather than generating these terms step-by-step, we can use the
wrapper function dwc_Location() to do everything in one step:
thylacine_data_location <-
  thylacine_data |>
  mutate(
    dwc_Location(
      longitude = longitude_ddm,
      latitude = latitude_ddm,
      verbatimCoordinateSystem = "degrees decimal minutes",
      verbatimSRS = "EPSG:4326",
      gps_uncertainty = gps_uncertainty,
      localities_sf = locality_data_aus,
      localities_names = "locality_name",
      county_sf = county_tas
    )
  )
Much easier!
Darwin Core Event terms
There is only a single function for Darwin Core Event terms:
dwc_Event(). This is because the lubridate R package already includes
many functions that can be used to calculate Darwin Core Event terms.
dwc_Event() provides a wrapper for a series of lubridate functions, and
experienced R users may prefer to use the lubridate functions directly
(we have found dwc_Event() to be slow for large datasets).
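As a rough sketch of what using lubridate directly might look like, the example below derives a few Event terms from a start and end date-time. The date strings are illustrative values, not taken from thylacine_data, and day/month/year order is an assumption. This is not the code inside dwc_Event().

```r
library(lubridate)

# Illustrative date-times; day/month/year order is assumed
start <- dmy_hms("01/03/2022 09:00:00", tz = "UTC")
end <- dmy_hms("02/03/2022 17:30:00", tz = "UTC")

iso_format <- "%Y-%m-%dT%H:%M:%S"

event_terms <- list(
  # Date-time range written as start/end
  eventDate = paste(format(start, iso_format), format(end, iso_format), sep = "/"),
  startDayOfYear = yday(start),
  endDayOfYear = yday(end),
  year = year(start),
  month = month(start),
  day = day(start)
)

event_terms
```

Inside dplyr::mutate() the same lubridate calls are vectorised over whole columns, which is one reason they may be faster than a wrapper on large datasets.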
The Darwin Core terms returned by dwc_Event() are: eventDate,
startDayOfYear, endDayOfYear, year, month, day, verbatimEventDate, and
optionally fieldNumber, habitat, samplingProtocol, samplingEffort,
fieldNotes and eventRemarks. If both start and end date/date-time are
supplied, the function will also return sampleSizeValue and
sampleSizeUnit.
For example, using our thylacine_data()
References
Wieczorek C, J Wieczorek (2021) Georeferencing Calculator. Available: http://georeferencing.org/georefcalculator/gc.html. Accessed [2023-03-09].