---
title: "genesysr Tutorial"
author: "Matija Obreza & Nora Castaneda"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{genesysr Tutorial}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

Querying Genesys PGR
=====================

[Genesys PGR](https://www.genesys-pgr.org) is the global database on plant genetic resources
maintained *ex situ* in national, regional and international genebanks around the world.

**genesysr** uses the [Genesys API](https://www.genesys-pgr.org/documentation/apis) to query Genesys data.
The API is accessible at https://api.genesys-pgr.org.

Accessing data with **genesysr** is similar to downloading data in CSV or Excel format and loading
it into R.

## For the impatient

Accession passport data is retrieved with the `get_accessions` function.

Accessing Genesys requires authentication, so the first step is to log in:

```r
## Setup: use Genesys Sandbox environment
# genesysr::setup_sandbox() # Use this to connect to our test environment https://sandbox.genesys-pgr.org
# genesysr::setup_production() # This is initialized by default when loading genesysr

# Open a browser: login to Genesys and authorize access
genesysr::user_login()
```

The database is queried by providing a `filter` (see Filters below) and the list of passport data
fields that you wish to download from Genesys. A basic list of MCPD descriptors ("INSTCODE", "ACCENUMB", "DOI", "HISTORIC", "GENUS", "SPECIES", "SUBTAXA", "SAMPSTAT")
is used if you don't specify your own list.

```r
# Retrieve accessions for genus *Musa*
musa <- get_accessions(filters = list(taxonomy = list(genus = list('Musa'))))

# Retrieve all accession data for the Musa International Transit Center, Bioversity International
itc <- get_accessions(list(institute = list(code = list('BEL084'))))

# Retrieve all accession data for the Musa International Transit Center, Bioversity International (BEL084) and the International Center for Tropical Agriculture (COL003)
some <- get_accessions(list(institute = list(code = list('BEL084','COL003'))))
```

**genesysr** provides utility functions to create `filter` objects using [Multi-Crop Passport Descriptors (MCPD)](https://www.genesys-pgr.org/documentation/basics) definitions:

```r
# Retrieve data by country of origin (MCPD)
get_accessions(mcpd_filter(ORIGCTY = list("DEU", "SVN")))
```

# Processing fetched data

Passport data follows the MCPD standard. Where a descriptor allows multiple values, they are separated by a semicolon `;`.

Example: Column "STORAGE" may include `11;12` or a single `11`.
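
As a minimal sketch of handling such multi-value columns, the base R snippet below splits a semicolon-separated `STORAGE` column into one character vector per accession. The data frame is made up for illustration (the accession numbers are hypothetical) and only stands in for a result returned by `get_accessions`; the column types in an actual result may differ.

```r
# Hypothetical data standing in for a get_accessions() result
accessions <- data.frame(
  ACCENUMB = c("ACC001", "ACC002"),
  STORAGE = c("11;12", "11"),
  stringsAsFactors = FALSE
)

# Split each STORAGE value on ";" into a character vector per accession
storage_codes <- strsplit(accessions$STORAGE, ";", fixed = TRUE)
names(storage_codes) <- accessions$ACCENUMB

# Flag the accessions that list storage type 12
has_12 <- vapply(storage_codes, function(codes) "12" %in% codes, logical(1))
```

Keeping the split values in a list preserves one entry per accession, which makes it easy to add the result back to the data frame as a new column.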

# Filters

The `filter` object is a named `list()` where names match a Genesys filter and the value
specifies the criteria to match.

The records returned by Genesys match all filters provided (*AND* operation), while individual filters
allow for specifying multiple criteria (*OR* operation):

```r
# (GENUS == Musa) AND (SPECIES == aa) AND ((ORIGCTY == NGA) OR (ORIGCTY == CIV))
filters <- list(taxonomy = list(genus = c('Musa'), species = c('aa')), countryOfOrigin = list(iso3 = c('NGA', 'CIV')))

# Alternatively, build the filter step by step
filters <- list()
filters$taxonomy$genus <- list('Musa')
filters$taxonomy$species <- list('aa')
filters$countryOfOrigin$iso3 <- list('NGA', 'CIV')

# See filter object as JSON
jsonlite::toJSON(filters)
```

Genesys offers many more filtering options than are listed here. The best way to explore them is
to use the actual website at https://www.genesys-pgr.org/a/overview, inspect the HTTP requests
your browser sends to the API server, and then replicate them here.

### Taxonomy

`taxonomy$genus` filters by a *list* of genera.

```r
filters <- list(taxonomy = list(genus = list('Hordeum', 'Musa')))
# Print
jsonlite::toJSON(filters)
```

`taxonomy$species` filters by a *list* of species.

```r
filters <- list(taxonomy = list(genus = list('Hordeum'), species = list('vulgare')))
# Print
jsonlite::toJSON(filters)
```

### Origin of material

`countryOfOrigin$iso3` filters by ISO3 code of country of origin of PGR material.

```r
# Material originating from Germany (DEU) or France (FRA)
filters <- list(countryOfOrigin = list(iso3 = list('DEU', 'FRA')))
```

`geo$latitude` and `geo$longitude` filter by the latitude and longitude (in decimal degrees) of the
collecting site.

```r
filters <- list(geo = list(latitude = genesysr::range(-10, 30), longitude = genesysr::range(30, 50)))
```


### Holding institute

`institute$code` filters by a *list* of FAO WIEWS institute codes of the holding institutes.

```r
# Filter for ITC (BEL084) and CIAT (COL003)
list(institute = list(code = list('BEL084', 'COL003')))
```

`institute$country$iso3` filters by a *list* of ISO3 codes of the country of the holding institute.

```r
# Filter for genebanks in Slovenia (SVN) and Belgium (BEL)
list(institute = list(country = list(iso3 = list('SVN', 'BEL'))))
```
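
Because filters are plain named lists, the individual filters described above can be merged into one `filters` object. The sketch below uses base R's `utils::modifyList` for this. The variable names are chosen for illustration only, and the approach assumes that each part uses a different top-level key.

```r
# Individual filters, written as in the sections above
by_taxonomy  <- list(taxonomy = list(genus = list('Musa')))
by_institute <- list(institute = list(code = list('BEL084', 'COL003')))
by_origin    <- list(countryOfOrigin = list(iso3 = list('NGA', 'CIV')))

# Merge the named lists into one filter; records must match all parts (AND)
filters <- Reduce(utils::modifyList, list(by_taxonomy, by_institute, by_origin))

# Inspect the structure before sending it to Genesys
str(filters)
```

The merge is only meant for parts with distinct keys. If two parts set criteria for the same key (for example two different genera), it is safer to write the combined list out by hand.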

# Selecting columns

Use the `fields` argument of `get_accessions` to choose which columns to download. The step-by-step example below shows a call with `fields` and lists the available column names.


# Step-by-step example

Let's walk through the whole process of fetching accession passport data from Genesys.

1. Load genesysr

```r
library(genesysr)
```

2. Set up the environment and log in with your user credentials

```r
setup_sandbox()
user_login()
```

3. Fetch basic data

```r
musa <- genesysr::get_accessions(list(taxonomy = list(genus = list('Musa'))))
```

4. Download columns of interest

```r
# Fetch only accession number, storage and taxonomic data for *Musa* accessions
musa <- genesysr::get_accessions(list(taxonomy = list(genus = list('Musa'))), fields = list("ACCENUMB", "STORAGE", "GENUS", "SPECIES", "SUBTAXA"))
```

The following column names are available:

|Column|Description|
|--|-----|
|INSTCODE|FAO WIEWS code of the genebank managing the material|
|ACCENUMB|Accession number|
|DOI|DOI of the accession|
|HISTORIC|Flag indicating if the accession record is historical (`true`) or active (`false`)|
|CURATION|Type of curation applied to this accession|
|GENUS|Genus|
|SPECIES|Specific epithet|
|SPAUTHOR|Species authority|
|SUBTAXA|Subtaxon information at the most detailed taxonomic level|
|SUBTAUTHOR|Subtaxon authority|
|GRIN_TAXON_ID|GRIN Taxonomy ID of the taxon|
|GRIN_NAME|Taxon name according to GRIN Taxonomy|
|GRIN_AUTHOR|Taxon authority|
|CROPNAME|Crop name(s) as provided by the genebank|
|CROPCODE|Crop code used by Genesys|
|SAMPSTAT|Biological status of the accession|
|ACQDATE|Acquisition date|
|ACCENAME|Accession name|
|ORIGCTY|Country of provenance of the material|
|COLLSITE|Site of collecting|
|DECLATITUDE|Latitude of the collecting site|
|DECLONGITUDE|Longitude of the collecting site|
|COORDUNCERT|Coordinate uncertainty in meters|
|COORDDATUM|Coordinate datum|
|GEOREFMETH|Georeferencing method|
|ELEVATION|Elevation of the collecting site|
|COLLDATE|Collecting date|
|COLLSRC|Collecting source|
|COLLNUMB|Collecting number|
|COLLCODE|FAO WIEWS code of the institute that originally collected the material|
|COLLNAME|Name of the institute that collected the material|
|COLLINSTADDRESS|Address of the institute that collected the material|
|COLLMISSID|Collecting mission name/identifier|
|DONORCODE|FAO WIEWS code of the institute from which this accession was acquired|
|DONORNAME|Name of the institute from which this accession was acquired|
|DONORNUMB|Accession number at the donor institute|
|OTHERNUMB|Other numbers/identifiers associated with this accession|
|BREDCODE|FAO WIEWS code of the institute that developed/bred this material|
|BREDNAME|Name of the institute that developed this material|
|ANCEST|Ancestral data or pedigree information|
|DUPLSITE|FAO WIEWS codes of institutes where this accession is safety duplicated by the genebank|
|STORAGE|Types of germplasm storage|
|MLSSTAT|Status of the accession in the Multilateral System of the ITPGRFA|
|ACCEURL|Accession URL|
|REMARKS|Notes and remarks|
|DATAPROVIDERID|Database ID of this record in genebank's own database|
|PDCI|Passport Data Completeness Index for this accession|
|UUID|UUID assigned to this record by Genesys|
|LASTMODIFIED|Date when this record was last updated in Genesys|
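
A typo in a column name is easy to miss, so it can help to compare the requested fields against the names in the table before sending a request. The following is a minimal sketch in base R. `unknown_fields` is a hypothetical helper, not part of **genesysr**, and `known_columns` holds only a subset of the table for brevity.

```r
# A subset of the column names from the table above
known_columns <- c("INSTCODE", "ACCENUMB", "DOI", "GENUS", "SPECIES",
                   "SUBTAXA", "STORAGE", "ORIGCTY", "SAMPSTAT")

# Hypothetical helper: return the requested names that are not in `known`
unknown_fields <- function(fields, known = known_columns) {
  setdiff(unlist(fields), known)
}

# "SPECIS" is a deliberate typo for "SPECIES"
requested <- list("ACCENUMB", "STORAGE", "GENUS", "SPECIS")
unknown_fields(requested)
```

An empty result means every requested name appears in the known list.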


# Downloading all non-historical records

Please use sparingly!

```r
# Only active (non-historical) records
accessions <- get_accessions(filters = list(historic = 'false'))
```
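
One way to keep such large downloads rare is to save the result locally and reuse it in later sessions. The sketch below shows the idea with base R's `saveRDS` and `readRDS`. `cached_fetch` is a hypothetical helper, not part of **genesysr**, and `fake_fetch` is a stand-in that lets the example run without contacting Genesys.

```r
# Hypothetical helper: run fetch() once and reuse the saved result afterwards
cached_fetch <- function(fetch, path) {
  if (file.exists(path)) {
    return(readRDS(path))  # reuse the local copy, no request is sent
  }
  result <- fetch()
  saveRDS(result, path)
  result
}

# Stand-in for a real download so that the sketch runs offline;
# in practice this function would wrap the get_accessions() call shown above
calls <- 0
fake_fetch <- function() {
  calls <<- calls + 1
  data.frame(ACCENUMB = c("ACC001", "ACC002"), stringsAsFactors = FALSE)
}

cache_file <- tempfile(fileext = ".rds")
first  <- cached_fetch(fake_fetch, cache_file)  # runs fake_fetch()
second <- cached_fetch(fake_fetch, cache_file)  # read from cache_file
```

A temporary file is used here only to keep the example self-contained. A cache that should survive between sessions needs a permanent path, and the file should be deleted when fresh data is required.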