---
title: "openFDA"
description: >
  Get up and running with accessing openFDA from R.
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{openFDA}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
# save the built-in output hook
hook_output <- knitr::knit_hooks$get("output")

# set a new output hook to truncate text output
knitr::knit_hooks$set(output = function(x, options) {
  if (!is.null(n <- options$out.lines)) {
    x <- strsplit(x, split = "\n")[[1]]
    if (length(x) > n) {
      # truncate the output
      x <- c(head(x, n), "....\n")
    }
    x <- paste(x, collapse = "\n")
  }
  hook_output(x, options)
})

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

Using `{openFDA}` can be a breeze, if you know how to construct good queries.
This short guide will get you started with `{openFDA}`, and show you how to put together more complicated queries.

```{r load_openFDA}
library(openFDA)
```

```{r set_key}
set_api_key("tvzdUXoC3Yi21hhxtKKochJlQgm6y1Lr7uhmjcpT")
```

# The openFDA API
The openFDA API makes FDA data publicly available through a simple web API.
Users of this API have access to FDA data on food, human and veterinary drugs, devices, and more.
You can read all about it at their [website](https://open.fda.gov/).

# A simple openFDA query
The simplest way to query the openFDA API is to identify the **endpoint** you want to use and provide other search terms.
For example, this snippet retrieves 1 record about **adverse events** in the **drugs** endpoint.
The empty search string (`""`) means the results will be non-specific.

```{r example1}
search <- openFDA(search = "", endpoint = "drug-event", limit = 1)
search
```
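
For orientation, a call like the one above corresponds roughly to a plain HTTP GET request against the openFDA servers.
The sketch below builds such a URL in base R.
Note that `build_openfda_url()` is a hypothetical helper for illustration only and is not part of `{openFDA}`, and the mapping from `"drug-event"` to the path `drug/event.json` is an assumption based on the public `https://api.fda.gov/drug/event.json` URL.

```r
# Hypothetical helper, not part of {openFDA}: build a raw openFDA URL
build_openfda_url <- function(endpoint, search = "", limit = 1) {
  # Assumption: an endpoint such as "drug-event" maps to "drug/event.json"
  path <- paste0(sub("-", "/", endpoint, fixed = TRUE), ".json")
  # Percent-encode characters such as spaces and quotes in the search string
  query <- paste0("search=", utils::URLencode(search), "&limit=", limit)
  paste0("https://api.fda.gov/", path, "?", query)
}

build_openfda_url("drug-event", search = "", limit = 1)
```

In practice `openFDA()` handles this for you, along with the API key; the sketch is only meant to show what the `endpoint`, `search` and `limit` arguments stand for.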

## openFDA results
The function returns an `{httr2}` response object, with attached JSON data.
We use `httr2::resp_body_json()` to extract the underlying data.

```{r example2_json}
json <- httr2::resp_body_json(search)
```

If you don't specify a field to `count` on, the JSON data has two sections - `meta` and `results`.

### Meta
The `meta` section has important metadata on your results, which includes:

* `disclaimer` - An important disclaimer regarding the data provided by openFDA.
* `license` - A webpage with license terms that govern the openFDA API.
* `last_updated` - The last date when this openFDA endpoint was updated.
* `results.skip` - How many results were skipped? Set by the `skip` parameter in `openFDA()`.
* `results.limit` - How many results were retrieved? Set by the `limit` parameter in `openFDA()`.
* `results.total` - How many results were there in total matching your `search` criteria?

```{r example_2a_json_meta}
json$meta
```
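
As a small illustration of how these fields fit together, the sketch below works out how many matching records were *not* returned by a request.
It uses a mock list with the same shape as `json$meta$results`; the numbers are invented, not taken from openFDA.

```r
# Mock metadata shaped like `json$meta$results` (values invented)
meta_results <- list(skip = 0, limit = 1, total = 250)

# Records that matched the search but were not returned by this request
remaining <- meta_results$total - (meta_results$skip + meta_results$limit)
remaining
#> [1] 249
```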

### Results

For non-`count` queries, this will be a set of records which were found in the endpoint and match your `search` term.

```{r example_2b_json_results, out.lines = 10}
json$results
```

### Results when `count`-ing
If you set the `count` query, then the openFDA API will not return full records.
Instead, it will count the number of records for each member in the openFDA field you specified for `count`.
For example, let's look at drug manufacturers in the [Drugs@FDA endpoint](https://open.fda.gov/apis/drug/drugsfda/).
The empty search string (`""`) means that every record in the endpoint is counted.
We'll use the `limit` parameter to limit our results to the first 3 drug manufacturers found.

```{r example_3_count}
count <- openFDA(search = "",
                 endpoint = "drug-drugsfda",
                 limit = 3,
                 count = "openfda.manufacturer_name.exact") |>
  httr2::resp_body_json()
count$results
```
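
Count results arrive as a list of `term`/`count` pairs, which can be awkward to work with directly.
One possible way to flatten them into a data frame is sketched below in base R, using mock data with invented manufacturer names and counts.

```r
# Mock `count` results (terms and counts invented for illustration)
count_results <- list(
  list(term = "Manufacturer A", count = 12),
  list(term = "Manufacturer B", count = 7),
  list(term = "Manufacturer C", count = 3)
)

# Build a two-column data frame with one row per counted term
count_df <- data.frame(
  term = vapply(count_results, function(x) x$term, character(1)),
  count = vapply(count_results, function(x) x$count, numeric(1))
)
count_df
```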

You can count on fields with a date to create a time series, as demonstrated on the [openFDA website](https://open.fda.gov/apis/timeseries/).
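
A minimal sketch of that idea follows, assuming that date counts come back as `time`/`count` pairs with `time` formatted as `"YYYYMMDD"`; check this against the openFDA documentation for your endpoint.
The values below are mock data, not real openFDA output.

```r
# Mock results from counting on a date field (values invented)
date_counts <- list(
  list(time = "20200101", count = 5),
  list(time = "20200102", count = 8)
)

# Parse the date strings and collect the counts into a data frame
series <- data.frame(
  date = as.Date(vapply(date_counts, function(x) x$time, character(1)),
                 format = "%Y%m%d"),
  count = vapply(date_counts, function(x) x$count, numeric(1))
)
series
```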

# Using search terms
We can increase the complexity of our query using the `search` parameter, which lets us search against specific openFDA API fields.
These fields are harmonised to different degrees in each API, which you will need to check [online](https://open.fda.gov/apis/openfda-fields/).

## Searching on one field
You can provide search strategies to `openFDA()` as single strings.
They are constructed as `[FIELD_NAME]:[STRING]`, where `FIELD_NAME` is the openFDA field you want to search on.
If your `STRING` contains spaces, you must surround it with double quotes, or openFDA will search against each word in the string.
So, for example, a search for drugs with the class `"thiazide diuretic"` should be formatted as `"openfda.pharm_class_epc:\"thiazide diuretic\""`, or the API will collect all drugs which have the words `"thiazide"` or `"diuretic"` in their established pharmacological class (EPC).
Let's do an unrefined search first:

```{r example3a_single_search_term_unrefined, out.lines = 15}
search_unrefined <- openFDA(
  search = "openfda.pharm_class_epc:thiazide diuretic",
  endpoint = "drug-drugsfda",
  limit = 1
)
httr2::resp_body_json(search_unrefined)$meta$results$total
```

Let's compare this to our refined search, where we add double-quotes around the search term:

```{r example3a_single_search_term_refined, out.lines = 15}
search_refined <- openFDA(
  search = "openfda.pharm_class_epc:\"thiazide diuretic\"",
  endpoint = "drug-drugsfda",
  limit = 1
)
httr2::resp_body_json(search_refined)$meta$results$total
```

As you can see, the unrefined search picked up `r httr2::resp_body_json(search_unrefined)$meta$results$total - httr2::resp_body_json(search_refined)$meta$results$total` more results, most of which are probably diuretics that are not thiazides.
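
If you build such strings often, escaping the inner double quotes by hand gets tedious.
The sketch below shows one way to do it in base R; `quote_term()` is a hypothetical helper written for this guide, not a function exported by `{openFDA}`.

```r
# Hypothetical helper, not part of {openFDA}: quote terms that contain spaces
quote_term <- function(field, term) {
  if (grepl(" ", term, fixed = TRUE)) {
    # Wrap multi-word terms in double quotes so they are matched as a phrase
    term <- paste0("\"", term, "\"")
  }
  paste0(field, ":", term)
}

# cat() shows the string as the API would receive it, without R's escapes
cat(quote_term("openfda.pharm_class_epc", "thiazide diuretic"))
```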

## Searching on multiple fields
The openFDA API lets you search on various fields at once.
Simple methods for doing this are implemented in `{openFDA}`.

### Write your own search term
Using the guides on the [openFDA website](https://open.fda.gov/apis/query-parameters/), you can put together your own query.
For example, the following query looks for up to 5 records which were submitted by Walmart and are taken orally.
We can use `{purrr}` functions to extract a brand name for each record.
Note that though a single record can have multiple brand names, we are choosing to only extract the first one.

```{r example3b_supply_scalar_search_term}
# fields and terms are separated by a colon, not an equals sign
search_term <- "openfda.manufacturer_name:Walmart+AND+openfda.route:oral"
search <- openFDA(search = search_term,
                  limit = 5,
                  endpoint = "drug-drugsfda")
json <- httr2::resp_body_json(search)
purrr::map(json$results, .f = \(x) {
  purrr::pluck(x, "openfda", "brand_name", 1)
})
```

### Let `openFDA()` construct the search term
You can let the package do the heavy lifting for you with `openFDA()`, by providing a named character vector of field/search term pairs to the `search` parameter.
The function will automatically add double quotes (`""`) around your search terms, if you're providing field/value pairs like this.

```{r example3b_format_scalar_search_term, out.lines = 15}
search <- openFDA(search = c("openfda.generic_name" = "amoxicillin"),
                  endpoint = "drug-drugsfda")
httr2::resp_body_json(search)$meta$results$total
```

You can include as many fields as you like, as long as you only provide each field once.
By default, the terms are combined with an `OR` operator in `openFDA()`.
The search strategy below will therefore pick up all entries in [Drugs@FDA](https://open.fda.gov/apis/drug/drugsfda/) which either have the generic name amoxicillin or are taken by mouth.

```{r example3b_format_nonscalar_search_term, out.lines = 15}
search <- openFDA(search = c("openfda.generic_name" = "amoxicillin",
                             "openfda.route" = "oral"),
                  endpoint = "drug-drugsfda", limit = 1)
httr2::resp_body_json(search)$meta$results$total
```

### Pre-construct a search term

To apply multiple search terms with `AND` operators, use `format_search_term()` with `mode = "and"`:
```{r example3b_format_nonscalar_search_term_with_and, out.lines = 15}
search_term <- format_search_term(c("openfda.generic_name" = "amoxicillin",
                                    "openfda.route" = "oral"),
                                  mode = "and")
search <- openFDA(search = search_term,
                  endpoint = "drug-drugsfda", limit = 1)
httr2::resp_body_json(search)$meta$results$total
```
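
To make the difference between the two operators concrete, here is a rough sketch of how a named character vector could be turned into a single search string.
`combine_terms()` is a hypothetical stand-in written for this guide; it is not how `format_search_term()` is actually implemented, and its `operator` argument is an invented name.

```r
# Hypothetical helper, not the package's implementation
combine_terms <- function(terms, operator = c("or", "and")) {
  operator <- match.arg(operator)
  # Turn each field/value pair into FIELD:"VALUE"
  pairs <- paste0(names(terms), ":\"", terms, "\"")
  # Join the pairs with +OR+ or +AND+
  paste(pairs, collapse = paste0("+", toupper(operator), "+"))
}

cat(combine_terms(c("openfda.generic_name" = "amoxicillin",
                    "openfda.route" = "oral"),
                  operator = "and"))
```

With `operator = "or"` the same pairs are joined with `+OR+` instead, which is why the OR search matches many more records than the AND search.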

## Wildcards
You can use the wildcard character `"*"` to match zero or more characters.
For example, we could take the prototypical ending to a common drug class - e.g. the **sartans**, which are angiotensin-II receptor blockers - and see which manufacturers are most represented in Drugs@FDA for this class.
When using wildcards, either pre-format the string yourself *without double-quotes* or use `format_search_term()` with `exact = FALSE`.
If you try to search with both double-quotes and the wildcard character, you will get a 404 error from openFDA.

```{r example4_wildcards, out.lines = 15}
search_term <- format_search_term(c("openfda.generic_name" = "*sartan"),
                                  exact = FALSE)
search <- openFDA(search = search_term,
                  count = "openfda.manufacturer_name.exact",
                  endpoint = "drug-drugsfda",
                  limit = 5)
# parse the response once, then extract the name and count from each result
results <- httr2::resp_body_json(search)$results
terms <- purrr::map_chr(results, \(x) purrr::pluck(x, "term"))
counts <- purrr::map_int(results, \(x) purrr::pluck(x, "count"))

setNames(counts, terms)
```

It looks like `"Alembic Pharmaceuticals"` is very active in this space - interesting!
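
If you want a feel for what a pattern such as `*sartan` matches before sending it to the API, you can try it locally.
The sketch below uses `utils::glob2rx()` on a hand-picked vector of generic names; it only approximates the "zero or more characters" behaviour described above and makes no claim about how openFDA evaluates wildcards on its servers.

```r
# Illustrative generic names (chosen by hand, not fetched from openFDA)
generic_names <- c("losartan", "valsartan", "amoxicillin", "sartan")

# Translate the wildcard pattern into a regular expression
pattern <- utils::glob2rx("*sartan")

# Keep the names that the wildcard pattern would match
matched <- generic_names[grepl(pattern, generic_names)]
matched
```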

# Other openFDA API features
This short guide does not cover all aspects of openFDA.
We recommend visiting the [openFDA API website](https://open.fda.gov/apis/) and checking out the resources there for information on:

* [Dates and ranges](https://open.fda.gov/apis/dates-and-ranges/)
* [Search for fields with missing values](https://open.fda.gov/apis/missing-values/)
* [Generating time series](https://open.fda.gov/apis/timeseries/)
* [Paging larger queries](https://open.fda.gov/apis/paging/)
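
As a taste of the last point, paging boils down to repeating the same query with increasing `skip` values.
The sketch below only computes those values; the totals are invented, and you should check the openFDA paging documentation for the limits that apply to `limit` and `skip`.

```r
# Invented numbers: 2,350 matching records, fetched 1,000 at a time
total <- 2350
limit <- 1000

# One `skip` value per page, starting from the first record
skips <- seq(0, total - 1, by = limit)
skips
```

Each value in `skips` could then be passed as the `skip` argument to `openFDA()`, together with the same `search`, `endpoint` and `limit`.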