| Title: | One Health VBD Hub |
|---|---|
| Description: | Interface with the One Health VBD (vector-borne disease) Hub <https://vbdhub.org/> and related repositories (VectorByte <https://www.vectorbyte.org>, GBIF <https://www.gbif.org> and AREAdata <https://pearselab.github.io/areadata/>) directly to find, download, and subset vector-borne disease data. |
| Authors: | Francis Windram [aut, cre, cph] (ORCID: <https://orcid.org/0000-0002-2129-826X>) |
| Maintainer: | Francis Windram <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 1.0.1.9000 |
| Built: | 2026-06-05 11:55:27 UTC |
| Source: | https://github.com/fwimp/ohvbd |
This string is used as the basis for all calls to AREAdata. It does not contain any tokens or session ids, and thus can be regenerated at any time.
ad_basereq()
Returns a string containing the root address of the AREAdata dataset.
Francis Windram
basereq <- ad_basereq()
Intelligently bind together data from AREAdata and other sources at a given spatial scale.
assoc_ad(
  data,
  areadata,
  targetdate = NA,
  enddate = NA,
  gid = 0,
  lonlat_names = c("Longitude", "Latitude"),
  cache_location = NULL,
  basereq = ad_basereq()
)
data |
the source data to bind AREAdata to. This must contain decimal lonlat data! |
areadata |
the AREAdata to bind, usually from |
targetdate |
ONE OF the following:
|
enddate |
The (exclusive) end of the range of dates to search for. If this is unfilled, only the |
gid |
the spatial scale to retrieve (0 = country-level, 1=province-level...). (Note: this will preferentially use the gid level of |
lonlat_names |
a vector containing the column names of the longitude and latitude columns IN THAT ORDER! |
cache_location |
path to cache location (defaults to a temporary user directory, or one set by |
basereq |
the url of the AREAdata database (usually generated by |
A matrix of the data with added columns extracted from areadata.
The date range is a partially open interval. That is to say the lower bound (targetdate) is inclusive, but the upper bound (enddate) is exclusive.
For example a date range of "2020-08-04" - "2020-08-12" will return the 8 days from the 4th through to the 11th of August, but not the 12th.
In cases where a full date is not provided, the earliest date possible with the available data is chosen.
So "2020-04" will internally become "2020-04-01".
If an incomplete date is specified as the targetdate and no enddate is specified, the range to search is inferred from the minimum temporal scale provided in targetdate.
For example "2020-04" will be taken to mean the month of April in 2020, and the enddate will internally be set to "2020-05-01".
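The date rules above can be sketched in base R. This is only an illustration of the half-open interval and the partial-date inference, not the package's own implementation; `resolve_date_range()` and `dates_in_range()` are hypothetical helper names, and the sketch assumes that any `enddate` supplied is a full date.

```r
# Hypothetical helper (not part of ohvbd): expand a possibly partial date
# string ("2020", "2020-04" or "2020-04-01") into a half-open [start, end) range.
resolve_date_range <- function(targetdate, enddate = NA) {
  parts <- strsplit(targetdate, "-", fixed = TRUE)[[1]]
  # Pad a missing month/day with "01" to get the earliest possible date.
  padded <- c(parts, rep("01", 3 - length(parts)))
  start <- as.Date(paste(padded, collapse = "-"))
  if (!is.na(enddate)) {
    # Assumption: an explicit enddate is always a full date.
    end <- as.Date(enddate)
  } else {
    # Infer the end from the finest unit supplied: year, month or day.
    unit <- c("year", "month", "day")[length(parts)]
    end <- seq(start, by = unit, length.out = 2)[2]
  }
  list(start = start, end = end)
}

# Dates covered by the range: start is included, end is excluded.
dates_in_range <- function(rng) {
  seq(rng$start, rng$end - 1, by = "day")
}

april <- resolve_date_range("2020-04")
august <- resolve_date_range("2020-08-04", "2020-08-12")
```

Here `april$end` is the 1st of May 2020, and `dates_in_range(august)` runs from the 4th to the 11th of August, leaving out the 12th.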
Francis Windram
vtdf <- search_hub("Aedes aegypti", "vt") |>
  tail(20) |>
  fetch() |>
  glean(cols = c(
    "DatasetID",
    "Latitude",
    "Longitude",
    "Interactor1Genus",
    "Interactor1Species"
  ), returnunique = TRUE)
areadata <- fetch_ad(metric="temp", gid=2, use_cache=TRUE)
ad_extract_working <- assoc_ad(vtdf, areadata,
                               targetdate = c("2021-08-04"), enddate=c("2021-08-06"),
                               gid=2, lonlat_names = c("Longitude", "Latitude"))
Intelligently bind together data with gadm IDs at all scales.
assoc_gadm(
  df,
  lonlat_names = c("Longitude", "Latitude"),
  cache_location = NULL,
  basereq = ad_basereq()
)
df |
the source data to bind gadm IDs to. This must contain decimal lonlat data! |
lonlat_names |
a vector containing the column names of the longitude and latitude columns IN THAT ORDER! |
cache_location |
path to cache location (defaults to a temporary user directory, or one set by |
basereq |
the url of the AREAdata database (usually generated by |
A matrix of the data with added gadm columns.
This will ALWAYS get and cache gid level 2 data sources. These files are about 80MB total, so if you are running on a metered connection do beware of this.
Francis Windram
vtdf <- search_hub("Aedes aegypti", "vt") |>
  tail(20) |>
  fetch() |>
  glean(cols = c(
    "DatasetID",
    "Latitude",
    "Longitude",
    "Interactor1Genus",
    "Interactor1Species"
  ), returnunique = TRUE) |>
  assoc_gadm(lonlat_names = c("Longitude", "Latitude"))
Attempt to access all presently supported databases and report if they were accessible.
check_db_status()
TRUE if all DB checks pass, else FALSE
Francis Windram
check_db_status()
Access ohvbd options and configured variables, and print them to the command line.
check_ohvbd_config(options_list = NULL)
options_list |
An (optional) list of variables to search for. |
TRUE if all desired options are set (though not necessarily turned on), else FALSE.
Francis Windram
check_ohvbd_config()
Delete all rda files from ohvbd AREAdata cache (Deprecated)
clean_ad_cache(cache_location = NULL)
cache_location |
location of the cache. |
No return value, called for side effects
clean_ad_cache() is now deprecated and should not be used. Please use clean_ohvbd_cache() instead.
Francis Windram
Delete files from ohvbd cache directories
clean_ohvbd_cache(subdir = NULL, path = NULL, dryrun = FALSE, force = FALSE)
subdir |
a subdirectory or list of subdirectories to clean. |
path |
location within which to remove rda files. (Defaults to the standard ohvbd cache location). |
dryrun |
if |
force |
do not ask for confirmation before cleaning. |
No return value, called for side effects
Francis Windram
clean_ohvbd_cache()
This is a convenience method that infers and applies the correct extractor for the input.
extract(res, ...)
res |
An object of type |
... |
Any arguments to be passed to the underlying extractors (see |
The extracted data, either as an ohvbd.data.frame or ohvbd.ad.matrix object.
extract() is now deprecated and should not be used. Please use glean() instead.
Francis Windram
This is a convenience method that infers and applies the correct fetch function for the input ids.
fetch(ids, ...)
ids |
An object of type |
... |
Any other arguments to be passed to the underlying fetch functions (see |
The downloaded data, as an ohvbd.responses object.
Francis Windram
search_hub("Ixodes", "vt") |>
  fetch()
Retrieve AREAdata dataset/s specified by metric and spatial scale (GID).
fetch_ad(
  metric = "temp",
  gid = 0,
  use_cache = TRUE,
  cache_location = NULL,
  refresh_cache = FALSE,
  timeout = 240,
  basereq = ad_basereq()
)
metric |
the metric to retrieve from areadata. |
gid |
the spatial scale to retrieve (0 = country-level, 1=province-level ...). |
use_cache |
load files from cache if possible, and save them if not present. |
cache_location |
path to cache location (defaults to a temporary user directory, or one set by |
refresh_cache |
force a refresh of the relevant cached data (and enables use_cache). |
timeout |
timeout for data download from figshare/github in seconds. |
basereq |
the url of the AREAdata database (usually generated by |
An ohvbd.ad.matrix of the requested data (with added attributes for gid and metric).
The following metrics are valid (alternative names are listed in brackets):
temp (temperature)
spechumid (specific humidity)
relhumid (relative humidity)
uv (ultraviolet)
precip (precipitation, rainfall)
popdens (population density, population)
forecast (future climate, future)
Francis Windram
fetch_ad(metric="temp", gid=0)
This tries to extract and simplify the citations from a dataset downloaded using ohvbd.
fetch_citations(dataset, ...)
dataset |
An object of type |
... |
Any arguments to be passed to the underlying funcs. |
The extracted data, either as an ohvbd.data.frame or ohvbd.ad.matrix object.
Francis Windram
search_hub("Ixodes", "vt") |>
  fetch() |>
  glean() |>
  fetch_citations()
Retrieve citations for vecdyn data either directly from the dataset or by redownloading the appropriate data.
fetch_citations_vd(
  dataset,
  redownload = TRUE,
  minimise = FALSE,
  collapse_cols = TRUE
)
dataset |
The dataset from which you wish to retrieve citations. |
redownload |
Redownload data if citation columns are missing. |
minimise |
Whether to return one row per citation (rather than one per dataset ID). |
collapse_cols |
Whether to remove completely empty columns. |
ohvbd.data.frame of citation data
Francis Windram
Retrieve citations for vectraits data either directly from the dataset or by redownloading the appropriate data.
fetch_citations_vt(dataset, redownload = TRUE, minimise = FALSE)
dataset |
The dataset from which you wish to retrieve citations. |
redownload |
Redownload data if citation columns are missing. |
minimise |
Whether to return one row per citation (rather than one per dataset ID). |
ohvbd.data.frame of citation data
Francis Windram
Retrieve AREAdata gadm mapping shapefiles specified by spatial scale (GID). These vectors are cached as GeoPackage files.
fetch_gadm_sfs(
  gid = 0,
  cache_location = NULL,
  refresh_cache = FALSE,
  basereq = ad_basereq(),
  call = rlang::caller_env()
)
gid |
the spatial scale to retrieve (0 = country-level, 1=province-level...). |
cache_location |
path to cache location (defaults to a temporary user directory, or one set by |
refresh_cache |
force a refresh of the relevant cached data. |
basereq |
the url of the AREAdata database (usually generated by |
call |
The env from which this was called (defaults to the direct calling environment). |
A SpatVector (from terra::vect()) of the requested shapefile.
Francis Windram
fetch_gadm_sfs(gid=0)
Retrieve GBIF dataset/s specified by their dataset ID.
fetch_gbif(ids, filepath = ".")
ids |
a string or character vector of ids (preferably in an |
filepath |
directory to save gbif download files into. |
A list of rgbif occ_download_get objects, as an ohvbd.responses object.
Only 300 datasets can be requested at once (for now) due to technical limitations originating from the GBIF server setup. It is worth splitting longer lists of ids into a couple of chunks if you need more than this.
If you regularly use ohvbd to download large numbers of datasets at once and chunking is causing you other issues, please raise an issue at https://github.com/fwimp/ohvbd/issues.
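One way to stay under the 300-dataset limit is to split the ids before requesting them. The sketch below uses only base R; `chunk_ids()` is a hypothetical helper rather than an ohvbd function, and the ids are placeholders, not real GBIF dataset keys.

```r
# Hypothetical helper (not part of ohvbd): split a vector of ids into
# consecutive chunks of at most `size` elements, preserving their order.
chunk_ids <- function(ids, size = 300) {
  split(ids, ceiling(seq_along(ids) / size))
}

ids <- sprintf("dataset-%03d", 1:650)  # placeholder ids, not real GBIF keys
chunks <- chunk_ids(ids)
chunk_sizes <- lengths(chunks)
```

Each element of `chunks` could then be passed to fetch_gbif() in turn.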
Francis Windram
fetch_gbif("dbc4a3ae-680f-44e6-ab25-c70e27b38dbc")
ohvbd.ids("dbc4a3ae-680f-44e6-ab25-c70e27b38dbc", "gbif") |>
  fetch() # Calls fetch_gbif()
Retrieve and parse VecDyn datasets specified by their dataset IDs in batches.
This is not usually necessary (generally you just need fetch_vd()), but it allows data that is not in use to be released from memory. If you would like more control over extraction or parsing, it is best to wrap fetch_vd() and glean_vd() in your own chunker instead.
fetch_glean_vd_chunked(
  ids,
  chunksize = 20,
  cols = NULL,
  returnunique = FALSE,
  rate = 5,
  connections = 2,
  basereq = vb_basereq()
)
ids |
a numeric vector of IDs (preferably in an |
chunksize |
an integer defining the size of chunks to retrieve in one iteration. |
cols |
a character vector of columns to extract from the dataset. |
returnunique |
whether to return only the unique rows within each dataset according to the filtered columns. |
rate |
maximum number of calls to the API per second. |
connections |
number of simultaneous connections to the server at once. Maximum 8. Do not enable unless you really need to as this hits the server significantly harder than usual. |
basereq |
an httr2 request object, as generated by |
An ohvbd.data.frame containing the requested data.
Francis Windram
fetch_glean_vd_chunked(c(423,424,425), chunksize = 2, rate=5)
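For readers who want to write their own chunker, as suggested in the description of this function, the general shape might look like the following. This is a sketch under stated assumptions, not the code used by fetch_glean_vd_chunked(): `fetch_glean_chunked()` is a hypothetical name, and `fake_fetch()` and `fake_glean()` are offline stand-ins for functions such as fetch_vd() and glean_vd() so that the example runs without network access.

```r
# Hypothetical chunker (not part of ohvbd). `fetch_fun` and `glean_fun` stand in
# for a fetch/glean pair such as fetch_vd() and glean_vd().
fetch_glean_chunked <- function(ids, fetch_fun, glean_fun, chunksize = 20) {
  chunks <- split(ids, ceiling(seq_along(ids) / chunksize))
  out <- lapply(chunks, function(chunk) {
    raw <- fetch_fun(chunk)  # raw responses only exist inside this call
    glean_fun(raw)           # keep only the parsed result for this chunk
  })
  # Bind the per-chunk data frames into one.
  do.call(rbind, unname(out))
}

# Offline stand-ins so the sketch is runnable without any API access.
fake_fetch <- function(ids) {
  lapply(ids, function(i) list(id = i, value = i * 2))
}
fake_glean <- function(res) {
  data.frame(
    id = vapply(res, function(r) r$id, numeric(1)),
    value = vapply(res, function(r) r$value, numeric(1))
  )
}

result <- fetch_glean_chunked(c(423, 424, 425), fake_fetch, fake_glean, chunksize = 2)
```

Because each chunk's raw responses go out of scope once they have been parsed, only the parsed results accumulate in memory.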
Retrieve and parse VecTraits datasets specified by their dataset IDs in batches.
This is not usually necessary (generally you just need fetch_vt()), but it allows data that is not in use to be released from memory. If you would like more control over extraction or parsing, it is best to wrap fetch_vt() and glean_vt() in your own chunker instead.
fetch_glean_vt_chunked(
  ids,
  chunksize = 20,
  cols = NULL,
  returnunique = FALSE,
  rate = 5,
  connections = 2,
  basereq = vb_basereq()
)
ids |
a numeric vector of IDs (preferably in an |
chunksize |
an integer defining the size of chunks to retrieve in one iteration. |
cols |
a character vector of columns to extract from the dataset. |
returnunique |
whether to return only the unique rows within each dataset according to the filtered columns. |
rate |
maximum number of calls to the API per second. |
connections |
number of simultaneous connections to the server at once. Maximum 8. Do not enable unless you really need to as this hits the server significantly harder than usual. |
basereq |
an httr2 request object, as generated by |
An ohvbd.data.frame containing the requested data.
Francis Windram
fetch_glean_vt_chunked(c(54,55,56), chunksize = 2, rate=5)
Retrieve VecDyn dataset/s specified by their dataset ID.
fetch_vd(ids, rate = 5, connections = 2, basereq = vb_basereq())
ids |
a numeric ID or numeric vector of ids (preferably in an |
rate |
maximum number of calls to the API per second. |
connections |
number of simultaneous connections to the server at once. Maximum 8. Do not enable unless you really need to as this hits the server significantly harder than usual. |
basereq |
an httr2 request object, as generated by |
A list of httr2 response objects, as an ohvbd.responses object.
Francis Windram
fetch_vd(54)
fetch_vd(c(423,424,425))
ohvbd.ids(c(423,424,425), "vd") |>
  fetch() # Calls fetch_vd()
Retrieve length of VecDyn dataset/s specified by their dataset ID.
fetch_vd_counts(
  ids,
  page_size = 50,
  cache_location = NULL,
  refresh_cache = FALSE,
  noprogress = FALSE,
  basereq = vb_basereq()
)
ids |
a numeric ID or numeric vector of ids (preferably in an |
page_size |
the page size returned by VecDyn (default is 50). |
cache_location |
path to cache location (defaults to a temporary user directory, or one set by |
refresh_cache |
force a refresh of the relevant cached data. |
noprogress |
disable non-essential messaging (progress bars etc.). |
basereq |
an httr2 request object, as generated by |
A dataframe describing the number of rows and number of pages for the set of ids.
Francis Windram
fetch_vd_counts(54)
fetch_vd_counts(c(423,424,425))
Fetch VecDyn metadata table (downloading if necessary) and cache if fresh.
fetch_vd_meta(
  ids = NULL,
  cache_location = NULL,
  refresh_cache = FALSE,
  noprogress = FALSE,
  basereq = vb_basereq()
)
ids |
a numeric ID or numeric vector of ids (preferably in an |
cache_location |
path to cache location (defaults to a temporary user directory, or one set by |
refresh_cache |
force a refresh of the relevant cached data. |
noprogress |
disable non-essential messaging (progress bars etc.). |
basereq |
an httr2 request object, as generated by |
A dataframe describing the current VecDyn metadata.
Francis Windram
fetch_vd_meta()
Retrieve VecTraits dataset/s specified by their dataset ID.
fetch_vt(ids, rate = 5, connections = 2, basereq = vb_basereq())
ids |
a numeric ID or numeric vector of ids (preferably in an |
rate |
maximum number of calls to the API per second. |
connections |
number of simultaneous connections to the server at once. Maximum 8. Do not enable unless you really need to as this hits the server significantly harder than usual. |
basereq |
an httr2 request object, as generated by |
A list of httr2 response objects, as an ohvbd.responses object.
Francis Windram
fetch_vt(54)
fetch_vt(c(54, 55, 56))
ohvbd.ids(c(54, 55, 56), "vt") |>
  fetch() # Calls fetch_vt()
Retrieve the IDs for any datasets matching the given database.
filter_db(ids, db)
ids |
an |
db |
a database name as a string. One of |
An ohvbd.ids vector of dataset IDs.
If filter_db() receives an ohvbd.ids object by mistake, it will transparently return it if the source database matches db.
Francis Windram
search_hub("Ixodes ricinus")
search_hub("Ixodes ricinus") |>
  filter_db("vt") |>
  fetch() |>
  glean()
Get all the current column names in VecDyn, alongside associated data if desired.
find_vd_columns(full = FALSE, basereq = vb_basereq())
full |
whether to return all the data about current fields, or else just return the names |
basereq |
an httr2 request object, as generated by |
A character vector (or dataframe) of column information.
Francis Windram
find_vd_columns()
Get all the current dataset IDs in VecDyn, as a numeric vector.
find_vd_ids(basereq = vb_basereq())
basereq |
an httr2 request object, as generated by |
An ohvbd.ids vector of VecDyn dataset IDs.
Francis Windram
find_vd_ids()
Get all the current dataset IDs in VecTraits, as a numeric vector.
find_vt_ids(basereq = vb_basereq())
basereq |
an httr2 request object, as generated by |
An ohvbd.ids vector of VecTraits dataset IDs.
Francis Windram
find_vt_ids()
Force an object to appear to come from a specific database
force_db(x, db)
x |
Object to force. |
db |
Database to apply to |
Object with the "db" attribute set to db
DO NOT use this function to create ids to feed into fetch()!
Objects created in this way may lack vital underlying data required later.
Instead use ohvbd.ids() for this purpose.
This is a synonym for ohvbd_db(x) <- db that's easier to work with in pipelines.
Francis Windram
force_db(c(1,2,3), "vt")
Format and output to the terminal a visualisation of the overlaps between a given period and another set of dates.
This is mostly used in the error handling of glean_ad(); however, it can also be used independently.
It was designed to fill a more general role within UI design using the cli package, and should be usable (or hackable) by others needing the same tool.
format_time_overlap_bar(
  start,
  end,
  targets,
  targetrange = FALSE,
  twobar = FALSE,
  width = NULL,
  style = list()
)
start |
the date that the reference period begins (as Date object). |
end |
the date that the reference period ends (as Date object). |
targets |
a vector of dates. |
targetrange |
is the target a range? If so this will treat the first two elements of |
twobar |
whether to render as two bars or as one with different colours for overlaps. |
width |
the width of the bars in characters. Defaults to 0.5 * console width. |
style |
a style from |
No return value
Francis Windram
format_time_overlap_bar(
  start = as.Date("2022-08-04"),
  end = as.Date("2022-08-11"),
  targets = c(as.Date("2022-08-05"), as.Date("2022-08-12")),
  targetrange = TRUE, twobar=TRUE
)
Given the large number of fields in vectraits it can be hard to know which of these you need to fill out. This generator asks a series of questions to determine what columns should be included in one's dataset.
generate_vt_template()
A character vector containing the column headers of the desired vectraits template or NULL.
Francis Windram
generate_vt_template()
Get ohvbd cache locations
get_default_ohvbd_cache(subdir = NULL, create = TRUE)
subdir |
The subdirectory within the cache to find/create (optional). |
create |
Whether to create the cache location if it does not already exist (defaults to TRUE). |
ohvbd cache path as a string
Francis Windram
get_default_ohvbd_cache()
This is a convenience method that infers and applies the correct extractor for the input
glean(res, ...)
res |
An object of type |
... |
Any arguments to be passed to the underlying extractors (unused). |
The extracted data, either as an ohvbd.data.frame or ohvbd.ad.matrix object.
Francis Windram
search_hub("Ixodes", "vt") |>
  fetch() |>
  glean(cols=c("Interactor1Species"))
fetch_ad(use_cache=TRUE) |>
  glean(targetdate="2020-08-04")
Extract the data returned by a call to fetch_ad(), filter columns of interest and by dates of interest.
Currently this does not handle Population Density or Forecast matrices, however the other 5 metrics are handled natively.
glean_ad(
  ad_matrix,
  targetdate = NA,
  enddate = NA,
  places = NULL,
  gid = NULL,
  printbars = TRUE
)
ad_matrix |
A matrix or |
targetdate |
ONE OF the following:
|
enddate |
The (exclusive) end of the range of dates to search for. If this is unfilled, only the |
places |
A character vector or single string describing what locality to search for in the dataset. |
gid |
The spatial scale of the AREAdata matrix (this is not needed if the matrix has been supplied by |
printbars |
Whether to print time overlap bars in the case of dates outside the data range. |
An ohvbd.ad.matrix or a named vector containing the extracted data.
This function attempts to intelligently infer place selections based upon the provided gid and place names.
So if you have an AREAdata dataset at gid=1, and provide country names, the function will attempt to match those country names and retrieve any GID1-level data that is present.
Occasionally (such as in the case of "Albania", the municipality in La Guajira, Colombia) the name of a place may occur in locations other than those expected by the researcher.
Unfortunately this is not an easy problem to mitigate, and as such it is worthwhile checking the output of this function to make sure it is as you expect.
The date range is a partially open interval. That is to say the lower bound (targetdate) is inclusive, but the upper bound (enddate) is exclusive.
For example a date range of "2020-08-04" - "2020-08-12" will return the 8 days from the 4th through to the 11th of August, but not the 12th.
In cases where a full date is not provided, the earliest date possible with the available data is chosen.
So "2020-04" will internally become "2020-04-01".
If an incomplete date is specified as the targetdate and no enddate is specified, the range to search is inferred from the minimum temporal scale provided in targetdate.
For example "2020-04" will be taken to mean the month of April in 2020, and the enddate will internally be set to "2020-05-01".
Francis Windram
# All dates in August 2022
fetch_ad("temp", gid=0) |>
  glean_ad(
    targetdate = "2022-08",
    places = c("Albania", "Thailand")
  )
# 4th, 5th, and 6th of August 2022 (remember the enddate is EXCLUSIVE)
fetch_ad("temp", gid=0) |>
  glean_ad(
    targetdate = "2022-08-04", enddate="2022-08-07",
    places = c("Albania", "Thailand")
  )
# 4th of August 2022 and 1st of August 2023
fetch_ad("temp", gid=0) |>
  glean_ad(
    targetdate = c("2022-08-04", "2023-08-01"),
    places = c("Albania", "Thailand")
  )
Extract the data returned by a call to fetch_gbif(), filter columns of interest, and find unique rows if required.
glean_gbif(res, cols = NULL, returnunique = FALSE)
res |
a list of responses from GBIF as an |
cols |
a character vector of columns to extract from the dataset. |
returnunique |
whether to return only the unique rows within each dataset according to the filtered columns. |
An ohvbd.data.frame containing the requested data.
Francis Windram
fetch_gbif("dbc4a3ae-680f-44e6-ab25-c70e27b38dbc") |>
  glean_gbif()
ohvbd.ids("dbc4a3ae-680f-44e6-ab25-c70e27b38dbc", "gbif") |>
  fetch() |>
  glean() # Calls glean_gbif()
Extract the data returned by a call to fetch_vd(), filter columns of interest, and find unique rows if required.
glean_vd(res, cols = NULL, returnunique = FALSE)
res |
a list of responses from VecDyn as an |
cols |
a character vector of columns to extract from the dataset. If specified, this will be adjusted to always include the "dataset_id" column. |
returnunique |
whether to return only the unique rows within each dataset according to the filtered columns. |
An ohvbd.data.frame containing the requested data.
Francis Windram
fetch_vd(247) |>
  glean_vd(cols=c("species", "sample_start_date", "sample_value"), returnunique=TRUE)
ohvbd.ids(247, "vd") |>
  fetch() |>
  glean() # Calls glean_vd()
Extract the data returned by a call to fetch_vt(), filter columns of interest, and find unique rows if required.
glean_vt(res, cols = NULL, returnunique = FALSE)
res |
a list of responses from VecTraits as an |
cols |
a character vector of columns to extract from the dataset. |
returnunique |
whether to return only the unique rows within each dataset according to the filtered columns. |
An ohvbd.data.frame containing the requested data.
Francis Windram
fetch_vt(54) |>
  glean_vt(cols=c("DatasetID", "Interactor1Genus", "Interactor1Species"), returnunique=TRUE)
ohvbd.ids(54, "vt") |>
  fetch() |>
  glean() # Calls glean_vt()
This function tests whether an object has the provenance information expected by ohvbd.
has_db(x, ...)
x |
An object to test. |
... |
Any arguments to be passed to the underlying functions (unused). |
Whether the data has a provenance tag (as a boolean).
Francis Windram
ids <- ohvbd.ids(c(1,2,3), "vd")
has_db(ids)
Check whether an object has been loaded from cache by ohvbd
is_cached(x)
x |
The object to check. |
A boolean indicating whether an object has been loaded from the cache.
Francis Windram
is_cached(c(1,2,3))
This function tests whether an object is considered (by ohvbd) to be from a given database.
This is a fairly coarse check, and so cannot "work out" data provenance from its structure.
is_from(x, db, ...)
x |
An object to test. |
db |
The database to test against. |
... |
Any arguments to be passed to the underlying functions (unused). |
Whether the data is from a given database (as a boolean).
Francis Windram
ids <- ohvbd.ids(c(1,2,3), "vd")
is_from(ids, "vd")
List all ohvbd cached files
list_ohvbd_cache(subdir = NULL, path = NULL, treeview = FALSE)
subdir |
a subdirectory or list of subdirectories to list. |
path |
location within which to list files. (Defaults to the standard ohvbd cache location). |
treeview |
display the full cache in a tree structure |
No return value
Francis Windram
list_ohvbd_cache()
Match country names to their equivalent naturalearth WKT polygons using rnaturalearth::ne_countries().
match_countries(countrynames, returnmulti = TRUE, onlywkt = FALSE)
countrynames |
a vector of country names to match to naturalearth. |
returnmulti |
return a single multipolygon containing all locations (otherwise return a named vector of individual country polygons). |
onlywkt |
only return location_wkt (see note for more details). |
A list containing:
$location_wkt: a multipolygon containing all locations (or a named vector of individual country polygons).
$missing_locs: any provided countries not found in naturalearth.
$found_locs: any provided countries that were found in naturalearth.
Francis Windram
match_countries(c("United Kingdom", "Germany"))
Match species names to their GBIF backbone ids using rgbif::name_backbone_checklist().
match_species(speciesnames, exact = FALSE, returnids = TRUE, omit = TRUE)
speciesnames |
a vector of species names to match to the GBIF backbone. |
exact |
whether to only return exact species matches. |
returnids |
return the GBIF taxon ids only (otherwise return the full lookup dataframe). |
omit |
omit missing taxon ids (inactive when |
The GBIF taxon ids associated with speciesnames, or the full GBIF lookup dataframe if returnids = FALSE.
If exact = TRUE and you search for a genus name, this will not be returned.
If you want more control over id filtering, use returnids = FALSE to get the source dataframe.
Francis Windram
match_species(c("Araneus diadematus", "Aedes aegypti"))
ohvbd uses a number of internal attributes to track data states within pipelines.
Generally these are not designed to be user-modified. They are, however, listed here for completeness (and curiosity).
It is typically not a good idea to manually modify these attributes directly unless a helper such as force_db() or ohvbd_db() is provided.
Even then, modifying these attributes may cause unexpected errors or data inconsistencies. These errors may not be signalled to the user by ohvbd, and they may not be obvious or even detectable.
Be sure when modifying the db attribute that the value you set it to is consistent with the origin of your data, and that the value is a db known to ohvbd.
| Attribute | Description | Object/s |
|---|---|---|
| db | The database from which the object has been retrieved. | ohvbd.ids, ohvbd.responses, ohvbd.data.frame, ohvbd.ad.matrix |
| metric | The AD metric. | ohvbd.ad.matrix |
| gid | The AD aggregation level. | ohvbd.ad.matrix |
| cached | Whether the data was loaded from a cache. | Any |
| writetime | The time at which a data file was originally cached. | Any |
| query | The search query sent to the Hub. | ohvbd.hub.search |
| searchparams | Any extra parameters sent to the Hub. | ohvbd.hub.search |
Note: (AD = AREAdata)
Type: string
The db attribute indicates to ohvbd where an object originated.
It is used to determine appropriate method dispatch (such as with fetch()) and to check that pipelines are sensibly constructed.
Type: string
metric signifies what AD metric the matrix contains. It is predominantly (but not exclusively) used for formatting and caching.
Type: integer
gid represents the spatial scale of data from AD. It is used for a variety of spatial operations.
Type: boolean
cached objects receive this flag at write-time. It sticks with the object when it is reloaded, and is mostly used for UI/UX purposes.
Type: POSIXct
writetime stores the time at which a cached object (that is likely to become stale) was written to the cache.
Type: string
query simply stores the base query that was sent to the vbdhub search API.
Type: named list
searchparams is a record of any other search parameters that were sent to the vbdhub search API (e.g. species IDs).
Francis Windram
Retrieve or set the provenance information expected by ohvbd.
ohvbd_db(x)
ohvbd_db(x) <- value
x |
An object. |
value |
The value to set the db to. |
The database identifier associated with an object (or NULL if missing).
Francis Windram
ids <- ohvbd.ids(c(1,2,3), "vd")
ohvbd_db(ids)
ohvbd_db(ids) <- "vt"
ohvbd_db(ids)
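The getter and setter pair above follows R's usual replacement-function pattern. The sketch below is a base R illustration of that pattern only, not ohvbd's implementation: `demo_db()`, `demo_db<-` and `known_dbs` are hypothetical names, and the list of identifiers is simply the set that appears in this manual.

```r
# Hypothetical list of identifiers, taken from the examples in this manual.
known_dbs <- c("vt", "vd", "gbif", "px")

# Getter: read the "db" attribute (NULL if it has never been set).
demo_db <- function(x) attr(x, "db", exact = TRUE)

# Setter: a replacement function that refuses unknown identifiers.
`demo_db<-` <- function(x, value) {
  if (!value %in% known_dbs) {
    stop("Unknown db: ", value)
  }
  attr(x, "db") <- value
  x
}

ids <- c(1, 2, 3)
demo_db(ids)          # NULL, nothing set yet
demo_db(ids) <- "vt"
demo_db(ids)          # "vt"
```

Validating the value inside the setter is one way to keep the attribute consistent with the databases a package knows about.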
Set this option to make ohvbd terminate searches before execution and return the request object instead.
This is usually only useful when debugging, testing, or developing ohvbd.
Francis Windram
options(ohvbd_dryrun = TRUE)
search_hub("Ixodes ricinus")
options(ohvbd_dryrun = NULL) # Unset dryrun
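When the option should only apply to a single expression, it can be set and restored automatically. The following is a minimal base R sketch; `with_dryrun()` is a hypothetical helper, not part of ohvbd.

```r
# Hypothetical helper: evaluate `code` with ohvbd_dryrun switched on,
# then restore whatever value the option had before.
with_dryrun <- function(code) {
  old <- options(ohvbd_dryrun = TRUE)
  on.exit(options(old), add = TRUE)
  force(code)  # the argument is evaluated lazily, i.e. after the option is set
}

# The option is TRUE while the expression runs...
with_dryrun(getOption("ohvbd_dryrun"))
# ...and back to its previous value afterwards.
getOption("ohvbd_dryrun")
```

Because `options()` returns the previous values, the restore step works whether or not the option was set beforehand.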
When retrieving data from previous searches (or saved lists of IDs), it can be useful to package these data up in the form that ohvbd would expect to come out of a search.
To do this, create an ohvbd.ids object, specifying the database that the ids refer to.
ohvbd.ids(ids, db)
ids |
A numeric vector of ids referring to datasets within the specified database. |
db |
A string specifying the database that these ids refer to. |
An id vector: an S3 vector with class ohvbd.ids.
Francis Windram
ohvbd.ids(c(1,2,3,4,5), db="vt")
ohvbd.ids(c(1,2,3,4,5), db="vd")
ohvbd.ids(
  c(
    "dbc4a3ae-680f-44e6-ab25-c70e27b38dbc",
    "fac87892-68c8-444a-9ae9-46273fdff724"
  ),
  db="gbif"
)
Retrieve the IDs for any datasets matching the given search parameters.
search_hub( query = "", db = c("vt", "vd", "gbif", "px"), fromdate = NULL, todate = NULL, locationpoly = NULL, taxonomy = NULL, exact = FALSE, withoutpublished = TRUE, returnlist = FALSE, simplify = TRUE, connections = 8 )
query |
a search string. |
db |
the databases to search. |
fromdate |
the date from which to search (ISO format: yyyy-mm-dd). |
todate |
the date up to which to search (ISO format: yyyy-mm-dd). |
locationpoly |
a polygon or set of polygons in |
taxonomy |
a numeric vector containing the gbif ids of taxa to search for (found using |
exact |
whether to return exact matches only. |
withoutpublished |
whether to return results without a publishing date when filtering by date. |
returnlist |
return the raw output list rather than a formatted dataframe. |
simplify |
if only a single database was searched, return an |
connections |
the number of connections to use to parallelise queries. |
An ohvbd.hub.search dataframe, an ohvbd.ids vector (if simplify = TRUE and length(db) == 1), or a list (if returnlist = TRUE) containing the search results.
Francis Windram
search_hub("Ixodes ricinus")
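A date-restricted search against a single database might be assembled as in the sketch below. It uses only the arguments documented above; the date range is an arbitrary illustration, and the call itself is guarded so that it only runs when ohvbd is installed and the session is interactive, since it needs network access.

```r
# Assemble the arguments first so they can be checked before any network call.
search_args <- list(
  query = "Ixodes ricinus",
  db = "vt",                # a single database
  fromdate = "2010-01-01",  # ISO format: yyyy-mm-dd (illustrative value)
  todate = "2020-12-31"     # ISO format: yyyy-mm-dd (illustrative value)
)

# Check that both dates parse as ISO dates and are in order.
dates <- as.Date(c(search_args$fromdate, search_args$todate), format = "%Y-%m-%d")
stopifnot(!anyNA(dates), dates[1] <= dates[2])

# Only contact the Hub when ohvbd is installed and the session is interactive.
if (requireNamespace("ohvbd", quietly = TRUE) && interactive()) {
  results <- do.call(ohvbd::search_hub, search_args)
  print(results)
}
```

Keeping the arguments in a list makes it easy to reuse the same filters across several databases by changing only `db`.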
Retrieve the IDs for any VecDyn datasets matching the given keywords.
search_vd(keywords, basereq = vb_basereq())
keywords |
either a string of search terms separated by spaces, or a vector of keywords. |
basereq |
an httr2 request object, as generated by |
An ohvbd.ids vector of VecDyn dataset IDs.
search_hub() is now preferred for keyword searches:
# old style
search_vd(c("Ixodes", "ricinus"))
# new style
search_hub("Ixodes ricinus", db = "vd")
search_vd() may be deprecated in the future.
Francis Windram
search_vd("Aedes aegypti")
search_vd(c("Aedes", "aegypti"))
Retrieve the IDs for any VecDyn datasets matching the given filter.
search_vd_smart(field, operator, value, basereq = vb_basereq())
field |
a field of VecDyn to search. |
operator |
an operator to use when searching. |
value |
the value that the field might/might not be. |
basereq |
an httr2 request object, as generated by |
An ohvbd.ids vector of VecDyn dataset IDs.
The following field names are valid (shortcut names are listed in brackets):
SpeciesName (species)
Title
Collections
Years (yrs)
CollectionMethods (methods)
Tags
The following operators are valid (alternative names are listed in brackets):
contains (contain, has, have)
!contains (!contains, !has, !have, ncontains)
equals (=, ==, equal, eq)
!equals (!=, not, !equal, !eq, neq)
starts (starts with, start with, start, sw)
!starts (not starts with, not start with, !start, nsw)
in (within)
!in (not in, not within, !within, nin)
greater (greater than, gt, >)
less (less than, lt, <)
Francis Windram
search_vd_smart("Collections", "gt", "1000")
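The operator list above is effectively a mapping from alternative names to canonical operators. A base R sketch of such a mapping is given below for illustration; it is not ohvbd's internal code, and `operator_aliases` and `normalise_operator()` are hypothetical names. Only aliases listed above are included.

```r
# Canonical operators and their alternative names, as listed above.
operator_aliases <- list(
  "contains"  = c("contain", "has", "have"),
  "!contains" = c("!has", "!have", "ncontains"),
  "equals"    = c("=", "==", "equal", "eq"),
  "!equals"   = c("!=", "not", "!equal", "!eq", "neq"),
  "starts"    = c("starts with", "start with", "start", "sw"),
  "!starts"   = c("not starts with", "not start with", "!start", "nsw"),
  "in"        = c("within"),
  "!in"       = c("not in", "not within", "!within", "nin"),
  "greater"   = c("greater than", "gt", ">"),
  "less"      = c("less than", "lt", "<")
)

# Hypothetical helper: map any accepted spelling to its canonical operator.
normalise_operator <- function(op) {
  op <- trimws(op)
  if (op %in% names(operator_aliases)) {
    return(op)
  }
  for (canonical in names(operator_aliases)) {
    if (op %in% operator_aliases[[canonical]]) {
      return(canonical)
    }
  }
  stop("Unknown operator: ", op)
}

normalise_operator("gt")  # "greater"
normalise_operator("!=")  # "!equals"
```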
Retrieve the IDs for any VecTraits datasets matching the given keywords.
search_vt(keywords, basereq = vb_basereq())
keywords |
either a string of search terms separated by spaces, or a vector of keywords. |
basereq |
an httr2 request object, as generated by |
An ohvbd.ids vector of VecTraits dataset IDs.
The ids returned from the server (and thus this function) do not necessarily precisely match the keywords that were requested.
For example search_vt("United Kingdom") does not return only items found in the United Kingdom. Instead it returns items where some part of the string "United Kingdom" appears in one of the indexed columns.
The indexed columns of VecTraits are:
DatasetID
OriginalTraitName
Variables
Interactor1Order
Interactor1Family
Interactor1Genus
Interactor1Species
Interactor1Stage
Interactor1Sex
Interactor2Genus
Interactor2Species
Citation
DOI
CuratedByDOI
SubmittedBy
search_hub() is now preferred for keyword searches:
# old style
search_vt(c("Ixodes", "ricinus"))
# new style
search_hub("Ixodes ricinus", db = "vt")
search_vt() may be deprecated in the future.
Francis Windram
search_vt("Aedes aegypti")
search_vt(c("Aedes", "aegypti"))
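The loose matching described above can be pictured with a toy example. The sketch below is an assumption-laden illustration in base R, not the server's actual algorithm: the data frame is made up, and `loose_match()` is a hypothetical function that treats a row as a hit when any word of the query appears in any of the chosen columns.

```r
# Made-up rows using two of the indexed column names listed above.
toy <- data.frame(
  DatasetID = 1:3,
  Interactor1Species = c("aegypti", "ricinus", "gambiae"),
  Citation = c(
    "Example A, United Kingdom survey",
    "Example B, Kingdom of Spain survey",
    "Example C, laboratory study"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical matcher: a row matches if ANY query word occurs in ANY column.
loose_match <- function(data, query, columns) {
  tokens <- strsplit(query, "\\s+")[[1]]
  hit <- rep(FALSE, nrow(data))
  for (col in columns) {
    for (tok in tokens) {
      hit <- hit | grepl(tok, as.character(data[[col]]), fixed = TRUE)
    }
  }
  data$DatasetID[hit]
}

# Row 2 matches on "Kingdom" alone, although it is not about the United Kingdom.
loose_match(toy, "United Kingdom", c("Interactor1Species", "Citation"))
```

This is why results of a keyword search are best treated as candidates to be filtered further after download.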
Retrieve the IDs for any VecTraits datasets matching the given filter.
search_vt_smart(field, operator, value, basereq = vb_basereq())
field |
a field of VecTraits to search. |
operator |
an operator to use when searching. |
value |
the value that the field might/might not be. |
basereq |
an httr2 request object, as generated by |
An ohvbd.ids vector of VecTraits dataset IDs.
The following field names are valid (shortcut names are listed in brackets):
DatasetID (id)
OriginalTraitName (traitname)
Variables
Interactor1Order (order)
Interactor1Family (family)
Interactor1Genus (genus)
Interactor1Species (species, spp)
Interactor1Stage (stage)
Interactor1Sex (sex)
Interactor2Genus (genus2)
Interactor2Species (species2, spp2)
Citation (cite)
DOI
CuratedByDOI (curateddoi)
SubmittedBy (who)
Tags
The following operators are valid (alternative names are listed in brackets):
contains (contain, has, have)
!contains (!contains, !has, !have, ncontains)
equals (=, ==, equal, eq)
!equals (!=, not, !equal, !eq, neq)
starts (starts with, start with, start, sw)
!starts (not starts with, not start with, !start, nsw)
in (within)
!in (not in, not within, !within, nin)
Francis Windram
search_vt_smart("Interactor1Genus", "equals", "Anopheles")
Set the default ohvbd cache location.
set_default_ohvbd_cache(d = NULL)
d |
The directory to set the cache path to (or NULL to use a default location). |
The path of the cache (invisibly)
To permanently set a path to use, add the following to your .Rprofile file:
options(ohvbd_cache = "path/to/directory")
Where path/to/directory is the directory in which you wish to cache ohvbd files.
You can find a good default path by running set_default_ohvbd_cache() with no arguments.
Francis Windram
set_default_ohvbd_cache()
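To confirm that an `.Rprofile` entry has taken effect, the option can be inspected directly. The sketch below uses only base R; `cache_option()` is a hypothetical helper and the path is a throwaway directory chosen for illustration.

```r
# Hypothetical helper: report the current value of the ohvbd_cache option.
cache_option <- function() {
  path <- getOption("ohvbd_cache")
  if (is.null(path)) {
    "ohvbd_cache is not set"
  } else {
    paste("ohvbd_cache =", path)
  }
}

# Temporarily set the option (as an .Rprofile entry would), inspect it, then restore.
demo_path <- file.path(tempdir(), "ohvbd-demo")  # illustrative path only
old <- options(ohvbd_cache = demo_path)
print(cache_option())
options(old)
```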
Set ohvbd to disable SSL verification for calls to external APIs. This should not be needed (and should not be done) unless you are otherwise experiencing SSL issues when using the package!
When in interactive mode, checks with you to make sure you want to do this. Does not check when run in a script.
set_ohvbd_compat(value = TRUE)
value |
The boolean value to set ohvbd_compat to. |
No return value, called for side effects
Francis Windram
set_ohvbd_compat()
Add a tee to a pipeline to get the data coming in through the pipe.
This is generally a useful function for debugging pipelines, and for caching data after expensive calls. It is also useful if you want the flexibility of multiple calls with the convenience of a fully-piped approach.
The name tee comes from the tee shell command within unix systems.
tee(x, .name = "teeout", .env = NULL)
x |
The data coming in (whatever it may be). |
.name |
The name to assign to the output within |
.env |
The environment within which to save the output at this point. Defaults to the caller environment (i.e. the environment in which the pipeline is run). |
The value that came from the left hand side of the pipe.
tee() does modify the external environment (if .env is not specified).
This can lead to unpredictable behaviour if not carefully managed, so it is generally
worthwhile restricting usage to interactive situations where the environment
can be more carefully monitored.
Francis Windram
pipeout <- 1:5 |> exp() |> tee("teeout") |> log()
print(pipeout)
print(teeout)
myenv <- new.env()
pipeout <- 1:5 |> exp() |> tee("teeout", .env = myenv) |> log()
print(myenv$teeout)
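For readers curious how a tee of this kind can work, the sketch below shows the general idea in base R: store a copy of the incoming value under a name, then pass the value along unchanged. `demo_tee()` is a hypothetical stand-in written for illustration and is not ohvbd's implementation. It requires R 4.1 or later for the native pipe.

```r
# Hypothetical tee: save `x` under `.name` in `.env`, then return `x` untouched.
demo_tee <- function(x, .name = "teeout", .env = NULL) {
  if (is.null(.env)) {
    .env <- parent.frame()  # assumption: default to the calling environment
  }
  assign(.name, x, envir = .env)
  x
}

# Use an explicit environment so the side effect stays contained.
store <- new.env()
result <- 1:5 |> exp() |> demo_tee("snapshot", .env = store) |> log()

print(result)          # the pipeline's final value
print(store$snapshot)  # the intermediate value captured mid-pipeline
```

Passing an explicit environment, as here, avoids writing into the workspace and so sidesteps the unpredictability noted above.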
This request is used as the basis for all calls to the vectorbyte API. It does not contain any tokens or session ids, and thus can be regenerated at any time.
vb_basereq( baseurl = "https://vectorbyte.crc.nd.edu/portal/api/", useragent = "ROHVBD", unsafe = FALSE, .qa = FALSE )
baseurl |
the base url for the vectorbyte API. |
useragent |
the user agent string used when contacting vectorbyte. |
unsafe |
disable SSL verification (shouldn't ever be required unless you are otherwise experiencing SSL issues!) |
.qa |
switch to the vb qa server (only useful for testing). |
Returns an httr2 request object, pointing at baseurl using useragent.
Francis Windram
basereq <- vb_basereq( baseurl="https://vectorbyte.crc.nd.edu/portal/api/", useragent="ROHVBD")
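As a rough picture of what "an httr2 request object, pointing at baseurl using useragent" means, the sketch below builds a comparable object by hand with httr2. This is an approximation under the assumption that nothing beyond the URL and user agent is needed; it is not the body of vb_basereq(), and it does not cover the unsafe or .qa arguments. No request is sent.

```r
base_url <- "https://vectorbyte.crc.nd.edu/portal/api/"

# Only run when httr2 is available.
if (requireNamespace("httr2", quietly = TRUE)) {
  # Create a request object pointing at the API root...
  req <- httr2::request(base_url)
  # ...and attach the user agent string.
  req <- httr2::req_user_agent(req, "ROHVBD")
  # Printing shows the request definition; nothing is sent over the network.
  print(req)
}
```

Because the object holds no tokens or session ids, rebuilding it at any time is harmless.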