---
title: "genesysr Tutorial"
author: "Matija Obreza & Nora Castaneda"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{genesysr Tutorial}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

Querying Genesys PGR
=====================

[Genesys PGR](https://www.genesys-pgr.org) is the global database on plant genetic resources maintained *ex situ* in national, regional and international genebanks around the world.

**genesysr** uses the [Genesys API](https://www.genesys-pgr.org/documentation/apis) to query Genesys data. The API is accessible at https://api.genesys-pgr.org.

Accessing data with **genesysr** is similar to downloading data in CSV or Excel format and loading it into R.

## For the impatient

Accession passport data is retrieved with the `get_accessions` function.

Accessing Genesys requires authentication, so the first step is to log in:

```
## Setup: use Genesys Sandbox environment
# genesysr::setup_sandbox() # Use this to connect to our test environment https://sandbox.genesys-pgr.org
# genesysr::setup_production() # This is initialized by default when loading genesysr

# Open a browser: log in to Genesys and authorize access
genesysr::user_login()
```

The database is queried by providing a `filter` (see Filters below) and the list of passport data fields that you wish to download from Genesys. If you don't specify your own list, a basic set of MCPD descriptors is used: "INSTCODE", "ACCENUMB", "DOI", "HISTORIC", "GENUS", "SPECIES", "SUBTAXA", "SAMPSTAT".

```
# Retrieve accessions for genus *Musa*
musa <- get_accessions(filters = list(taxonomy = list(genus = list('Musa'))))

# Retrieve all accession data for the Musa International Transit Center, Bioversity International
itc <- get_accessions(list(institute = list(code = list('BEL084'))))

# Retrieve all accession data for the Musa International Transit Center, Bioversity International (BEL084) and the International Center for Tropical Agriculture (COL003)
some <- get_accessions(list(institute = list(code = list('BEL084','COL003'))))
```

**genesysr** provides utility functions to create `filter` objects using [Multi-Crop Passport Descriptors (MCPD)](https://www.genesys-pgr.org/documentation/basics) definitions:

```
# Retrieve data by country of origin (MCPD)
get_accessions(mcpd_filter(ORIGCTY = list("DEU", "SVN")))
```

# Processing fetched data

Passport data follows the MCPD standard. Where a descriptor allows multiple values, they are separated by a semicolon `;`.

Example: Column "STORAGE" may include `11;12` or a single `11`.

# Filters

The `filter` object is a named `list()` where each name matches a Genesys filter and the value specifies the criteria to match.

The records returned by Genesys match all filters provided (*AND* operation), while an individual filter may specify multiple criteria, any of which may match (*OR* operation):

```r
# (GENUS == Musa) AND (SPECIES == aa) AND ((ORIGCTY == NGA) OR (ORIGCTY == CIV))
filters <- list(taxonomy = list(genus = list('Musa'), species = list('aa')),
                countryOfOrigin = list(iso3 = list('NGA', 'CIV')))

# The same filter, built step by step
filters <- list()
filters$taxonomy$genus <- list('Musa')
filters$taxonomy$species <- list('aa')
filters$countryOfOrigin$iso3 <- list('NGA', 'CIV')

# See filter object as JSON
jsonlite::toJSON(filters)
```

Genesys offers a number of filtering options. The best way to explore how filtering works is to use the actual website at https://www.genesys-pgr.org/a/overview, inspect the HTTP requests your browser sends to the API server, and then replicate them here.
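
The AND/OR semantics described above can be illustrated without contacting Genesys. The following is a minimal local sketch in base R: the `toy` data frame and its values are invented for illustration, and the subsetting only mimics how criteria combine; it is not how the Genesys server evaluates a filter.

```r
# Toy passport records; the values are made up for illustration only.
toy <- data.frame(
  ACCENUMB = c("X1", "X2", "X3", "X4"),
  GENUS    = c("Musa", "Musa", "Hordeum", "Musa"),
  ORIGCTY  = c("NGA", "DEU", "NGA", "CIV"),
  stringsAsFactors = FALSE
)

# Criteria within one filter are alternatives (OR): %in% matches any of them.
genus_ok   <- toy$GENUS %in% c("Musa")
origcty_ok <- toy$ORIGCTY %in% c("NGA", "CIV")

# Separate filters must all hold (AND).
matched <- toy[genus_ok & origcty_ok, ]
print(matched$ACCENUMB)
```

Only the records that satisfy both the genus condition and at least one of the country conditions remain in `matched`.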

### Taxonomy

`taxonomy$genus` filters by a *list* of genera.

```r
filters <- list(taxonomy = list(genus = list('Hordeum', 'Musa')))

# Print
jsonlite::toJSON(filters)
```

`taxonomy$species` filters by a *list* of species.

```r
filters <- list(taxonomy = list(genus = list('Hordeum'), species = list('vulgare')))

# Print
jsonlite::toJSON(filters)
```

### Origin of material

`countryOfOrigin$iso3` filters by the ISO3 code of the country of origin of the PGR material.

```r
# Material originating from Germany (DEU) and France (FRA)
filters <- list(countryOfOrigin = list(iso3 = list('DEU', 'FRA')))
```

`geo$latitude` and `geo$longitude` filter by the latitude and longitude (in decimal format) of the collecting site.

```r
filters <- list(geo = list(latitude = genesysr::range(-10, 30), longitude = genesysr::range(30, 50)))
```

### Holding institute

`institute$code` filters by a *list* of FAO WIEWS institute codes of the holding institutes.

```r
# Filter for ITC (BEL084) and CIAT (COL003)
list(institute = list(code = list('BEL084', 'COL003')))
```

`institute$country$iso3` filters by a *list* of ISO3 codes of the countries of the holding institutes.

```r
# Filter for genebanks in Slovenia (SVN) and Belgium (BEL)
list(institute = list(country = list(iso3 = list('SVN', 'BEL'))))
```

# Selecting columns

# Step-by-step example

Let's walk through the whole process of fetching accession passport data from Genesys.

1. Load genesysr

```r
library(genesysr)
```

2. Set up the connection and log in with user credentials

```r
setup_sandbox()
user_login()
```

3. Fetch basic data

```r
musa <- genesysr::get_accessions(list(taxonomy = list(genus = list('Musa'))))
```

4. Download columns of interest

```
# Fetch only accession number, storage and taxonomic data for *Musa* accessions
musa <- genesysr::get_accessions(list(taxonomy = list(genus = list('Musa'))),
  fields = list("ACCENUMB", "STORAGE", "GENUS", "SPECIES", "SUBTAXA"))
```

The following column names are available:

|Column|Description|
|--|-----|
|INSTCODE|FAO WIEWS code of the genebank managing the material|
|ACCENUMB|Accession number|
|DOI|DOI of the accession|
|HISTORIC|Flag indicating if the accession record is historical (`true`) or active (`false`)|
|CURATION|Type of curation applied to this accession|
|GENUS|Genus|
|SPECIES|Specific epithet|
|SPAUTHOR|Species authority|
|SUBTAXA|Subtaxon information at the most detailed taxonomic level|
|SUBTAUTHOR|Subtaxon authority|
|GRIN_TAXON_ID|GRIN Taxonomy ID of the taxon|
|GRIN_NAME|Taxon name according to GRIN Taxonomy|
|GRIN_AUTHOR|Taxon authority|
|CROPNAME|Crop name(s) as provided by the genebank|
|CROPCODE|Crop code used by Genesys|
|SAMPSTAT|Biological status of the accession|
|ACQDATE|Acquisition date|
|ACCENAME|Accession name|
|ORIGCTY|Country of provenance of the material|
|COLLSITE|Site of collecting|
|DECLATITUDE|Latitude of the collecting site|
|DECLONGITUDE|Longitude of the collecting site|
|COORDUNCERT|Coordinate uncertainty in meters|
|COORDDATUM|Coordinate datum|
|GEOREFMETH|Georeferencing method|
|ELEVATION|Elevation of the collecting site|
|COLLDATE|Collecting date|
|COLLSRC|Collecting source|
|COLLNUMB|Collecting number|
|COLLCODE|FAO WIEWS code of the institute that originally collected the material|
|COLLNAME|Name of the institute that collected the material|
|COLLINSTADDRESS|Address of the institute that collected the material|
|COLLMISSID|Collecting mission name/identifier|
|DONORCODE|FAO WIEWS code of the institute from which this accession was acquired|
|DONORNAME|Name of the institute from which this accession was acquired|
|DONORNUMB|Accession number at the donor institute|
|OTHERNUMB|Other numbers/identifiers associated with this accession|
|BREDCODE|FAO WIEWS code of the institute that developed/bred this material|
|BREDNAME|Name of the institute that developed this material|
|ANCEST|Ancestral data or pedigree information|
|DUPLSITE|FAO WIEWS codes of institutes where this accession is safety duplicated by the genebank|
|STORAGE|Types of germplasm storage|
|MLSSTAT|Status of the accession in the Multilateral System of the ITPGRFA|
|ACCEURL|Accession URL|
|REMARKS|Notes and remarks|
|DATAPROVIDERID|Database ID of this record in genebank's own database|
|PDCI|Passport Data Completeness Index for this accession|
|UUID|UUID assigned to this record by Genesys|
|LASTMODIFIED|Date when this record was last updated in Genesys|

# Downloading all non-historical records

Please use sparingly!

```
accessions <- get_accessions(filters = c(historic = list('false')))
```
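
Once records are downloaded, multi-value columns such as "STORAGE" (see Processing fetched data) usually need to be split before analysis. The sketch below uses base R only and assumes the result is a data frame with MCPD-named character columns; the `accessions_demo` data frame is a hand-made stand-in with invented values, not real Genesys output.

```r
# Hand-made stand-in for downloaded passport data; values are illustrative only.
accessions_demo <- data.frame(
  ACCENUMB = c("A1", "A2", "A3"),
  STORAGE  = c("11;12", "11", NA),
  stringsAsFactors = FALSE
)

# Split the semicolon-separated values into one character vector per record.
storage_list <- strsplit(accessions_demo$STORAGE, ";", fixed = TRUE)

# Flag records that list storage code "12" among their values.
has_12 <- vapply(storage_list, function(codes) "12" %in% codes, logical(1))

# Long format: one row per (accession, storage code) pair.
storage_long <- data.frame(
  ACCENUMB = rep(accessions_demo$ACCENUMB, lengths(storage_list)),
  STORAGE  = unlist(storage_list),
  stringsAsFactors = FALSE
)
print(storage_long)
```

The long format is convenient for counting or joining on individual codes, while the logical flag is enough for simple subsetting. Records with a missing value keep a single `NA` row in the long table.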