---
title: "rcatfish Tutorial"
author: "Samuel R. Borstein, Brandon E. Dominy"
date: "`r format(Sys.time(), '%d %B, %Y')`"
output:
  html_document:
    keep_md: true
vignette: >
  %\VignetteIndexEntry{rcatfish Tutorial}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

# 1: Introduction

This is a tutorial for using the R package `rcatfish`. `rcatfish` provides access to the California Academy of Sciences Eschmeyer's Catalog of Fishes within R (Eschmeyer et al., 1998; Fricke et al., 2025; https://researcharchive.calacademy.org/research/ichthyology/catalog/fishcatmain.asp). The Catalog of Fishes database is the gold standard for fish taxonomy, as it provides thorough citations of the taxonomic history of fishes and is updated continuously via standard monthly releases. While other R packages can be used for checking the taxonomy of organisms, including fishes (e.g. `rfishbase`, `taxize`, `ritis`), the databases accessed by these packages lack the expansive information on the taxonomic history of fishes and are typically not up to date with the cutting edge of fish systematics. `rcatfish` introduces functions to access the various information in the California Academy of Sciences Eschmeyer's Catalog of Fishes. This tutorial provides a basic introduction to using the package and its functions.

Please note when using this package that it is intended to act solely as an interface to Eschmeyer's Catalog of Fishes and is not affiliated with the California Academy of Sciences. As Eschmeyer's Catalog of Fishes is only published in a non-machine-readable format, `rcatfish` parses the catalog data into a format more suitable for analysis through web scraping. As such, `rcatfish` will only be as accurate as the data in the published catalog, and spelling or formatting errors may arise due to inconsistencies in Eschmeyer's Catalog of Fishes data entry.

# 2: Installation

## 2.1: Installation From CRAN

To install the stable CRAN version of the `rcatfish` package:

```
install.packages("rcatfish")
```

## 2.2: Installation of Development Version From GitHub

We recommend using the stable CRAN version of this package. If for any reason you wish to use the development version, we recommend using the `devtools` package to temporarily install it from GitHub:

```
# 1. Install 'devtools' if you do not already have it installed:
install.packages("devtools")

# 2. Load the 'devtools' package and temporarily install the development version of
# 'rcatfish' from GitHub:
library(devtools)
dev_mode(on = TRUE)
install_github("sborstein/rcatfish") # install the package from GitHub
library(rcatfish) # load the package

# 3. Leave developer mode after using the development version of 'rcatfish' so it will
# not remain on your system permanently.
dev_mode(on = FALSE)
```

## 2.3: Installation Requirements for Windows OS Users

To make HTTPS connections, the dependencies of `rcatfish` utilize curl. On Windows 7 and later, the curl implementation in R can use either `OpenSSL` or Windows Secure Channel (`Schannel`). Only one of these options can be active at a time and the default is `Schannel`, which conflicts with this package. To see which one you have active, you can do the following:

```
curl::curl_version()$ssl_version
```

The output may list more than one option. The ones in parentheses are not in use, while the ones lacking parentheses are in use. If you have `Schannel` in use, you will need to add the following line to your `~/.Renviron` file to have curl use OpenSSL.

```
CURL_SSL_BACKEND=openssl
```

This can be added manually or can be done directly in R by running the following line of code.

```
write('CURL_SSL_BACKEND=openssl', file = "~/.Renviron", append = TRUE)
```

After adding this line to your `~/.Renviron`, restart R. Then check the `curl_version` again.
`Schannel` should now be in parentheses while your `OpenSSL` option should now lack parentheses, indicating it is in use.

```
curl::curl_version()$ssl_version
```

# 3: Using rcatfish

Once installed, you can load `rcatfish` and all of its functions/data:

```
library(rcatfish)
```

Upon loading, you will see a message showing you the version of the Catalog of Fishes as well as how to properly cite the Catalog of Fishes and this R package. If you need the version of the Catalog of Fishes at any point, you can call `rcatfish_version()`, which takes no arguments and returns the date of the version of the Catalog of Fishes being accessed.

## 3.1: Basic data necessary to run rcatfish

To use the majority of functions in `rcatfish` you will need the following data as inputs. First, for all functions which search the catalog, a `query` is required. This can be either a single search term or a vector of terms to be searched in series, the content of which will depend on the particular search function being used (e.g. when searching for references, `query` may be a catalog reference number).

Aside from a `query`, several functions require a `type` as well. This is used to differentiate the search method in functions that support multiple types (e.g. search by `type = "Genus"` or `type = "keyword"`). Similar to `query`, the content of this parameter will vary based on the function it is being used in, but unlike `query` it typically cannot be vectorized and must match one of a specific set of options in each function that uses it.

All other parameters used in this package are optional and unique to each individual function. Several of these will be discussed below, but each function's options can be seen by running `?function_name` or `help(function_name)`.
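
To make the roles of `query` and `type` concrete, the following is a minimal sketch rather than a definitive recipe. It uses only the `rcatfish_search` arguments described in this tutorial. The object names `search_terms` and `search_type` are our own, and the `interactive()` guard is an assumption added here so that the catalog is only contacted when you run the code by hand.

```r
# `query` may hold one term or several; several terms are searched in series.
search_terms <- c("Rhincodontidae", "Aldrichetta")

# `type` is a single value and is not vectorized.
search_type <- "Species"

# Guard (our assumption): only contact the catalog when rcatfish is installed
# and the session is interactive.
if (requireNamespace("rcatfish", quietly = TRUE) && interactive()) {
  results <- rcatfish::rcatfish_search(query = search_terms, type = search_type)
  str(results) # inspect the structure of the returned object
}
```

Keeping the search terms in a named vector makes it easy to reuse the same terms across functions that accept a `query`.
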

## 3.2: Using rcatfish to Search for Species and Genera in Eschmeyer's Catalog of Fishes (`rcatfish_search`)

To search the Catalog of Fishes' taxonomic records, the function `rcatfish_search` should be used. This function is equivalent to using the "Search Eschmeyer's Catalog" tab on the catalog's website. While it has several parameters with default arguments, only `query` and `type` must be specified. As an example, a search can be performed for all available species names in the family *Rhincodontidae* using the following function call:

```
# Search CoF for Available Species Names in Rhincodontidae
rhinco_species <- rcatfish_search(query = "Rhincodontidae", type = "Species")
View(rhinco_species)
```

When viewing the created object `rhinco_species` above, you should see a large dataframe containing the species results for that family.

### 3.2.1 The `type` Parameter in `rcatfish_search`

The `type` parameter allows you to specify what type of results you want returned and is the equivalent of changing the radio button on the "Search Eschmeyer's Catalog" tab between "Genera" and "Species" (the "References" option is available using the `rcatfish_references` function, which will be discussed later). It has two acceptable inputs, either "Species" or "Genus". This parameter is not vectorizable, so the searches in a single call are either all by genus or all by species. There is no default option, so it must always be explicitly assigned a value during function calls. To see an example of the difference between the two options:

```
# Search Rhincodontidae by Species
by_species <- rcatfish_search(query = "Rhincodontidae", type = "Species")
View(by_species)

# Search Rhincodontidae by Genus
by_genus <- rcatfish_search(query = "Rhincodontidae", type = "Genus")
View(by_genus)
```

When viewing the outputs from above, you will notice that although both searches use the same query, the results differ. Please always ensure you have the correct value assigned when searching the catalog.
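
Because `type` has no default and accepts only "Species" or "Genus", it can help to check the value before sending a request. The sketch below is one possible way to do this in base R. `check_search_type()` is a hypothetical helper of our own and is not part of `rcatfish`; it also assumes the values must match exactly as written in this tutorial, including capitalization, which the package itself may or may not require.

```r
# Hypothetical helper (not part of rcatfish): validate `type` before searching.
check_search_type <- function(type) {
  valid_types <- c("Species", "Genus")
  # `type` must be one character value that matches a valid option exactly
  # (exact, case-sensitive matching is our assumption).
  if (!is.character(type) || length(type) != 1 || !type %in% valid_types) {
    stop("`type` must be a single value, either \"Species\" or \"Genus\".",
         call. = FALSE)
  }
  type
}

# Returns the value unchanged when it is valid
check_search_type("Genus")
```

A call such as `check_search_type(c("Species", "Genus"))` stops with an error, which mirrors the point above that `type` is not vectorizable.
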

### 3.2.2 The `query`, `phrase`, `unavailable`, and `resolve` Parameters in `rcatfish_search`

The `query` parameter represents the keyword(s) to search for in the catalog. These can be any text found in the catalog's entries, including species names, genera, family and subfamily names, author names, type specimen information, and more. This argument is vectorizable, meaning multiple different queries can be passed in one function call. By default there will be a 10-second wait between each query, as the Catalog of Fishes asks for at least this much time between requests. As an example:

```
# Search CoF Using Multiple Queries
searchTerms <- c("Rhincodontidae", "Aldrichetta")
result <- rcatfish_search(query = searchTerms, type = "Species")
View(result)
```

As you can see with the created object `result` from the function call above, one dataframe is returned containing all of the data from both queries, with the first column showing which query returned each result.

By default, queries of more than one word will search for a separate instance of each word in the Catalog of Fishes' entries. For example, `query = "status uncertain"` will search for any entries which contain the words "status" and "uncertain" anywhere in their text. If you wish to only search for entries which contain the exact phrase "status uncertain", you can specify this using the `phrase` parameter. By default this parameter is set to `FALSE`, but by explicitly setting it to `TRUE` you can have `rcatfish_search` treat each independent query as an exact phrase. As an example, see the difference in results for the following two searches:

```
# Search Catalog of Fishes with phrase = FALSE
no_phrase <- rcatfish_search(query = "Mugil abu", type = "Species")
View(no_phrase)

# Search Catalog of Fishes with phrase = TRUE
yes_phrase <- rcatfish_search(query = "Mugil abu", type = "Species", phrase = TRUE)
View(yes_phrase)
```

This package also has some functionality for correcting misspellings in searches.
By default this is not performed, but it can be toggled on by setting `resolve = TRUE` within the `rcatfish_search` function call. When this is set, `rcatfish` will perform the search as usual. If no results are found, it will attempt to resolve the search queries using fuzzy matching by finding the closest matches to them through the Global Names Verifier. See the difference below between these two options:

```
# Searching Catalog of Fishes with a Misspelled Species Name
# Without Resolving Names
no_resolve <- rcatfish_search(query = "rhincodon tipus", type = "Species")
View(no_resolve)

# With Resolving Names
yes_resolve <- rcatfish_search(query = "rhincodon tipus", type = "Species", resolve = TRUE)
View(yes_resolve)
```

The catalog also allows the searching of unavailable names. By default these names are excluded when running a search, but by changing the `unavailable` parameter to `TRUE` they can be included. See the difference below:

```
# Search Catalog of Fishes Without Unavailable Names
no_unavailable <- rcatfish_search(query = "Mitsukurinidae", type = "Species")
View(no_unavailable)

# Search Catalog of Fishes With Unavailable Names
yes_unavailable <- rcatfish_search(query = "Mitsukurinidae", type = "Species", unavailable = TRUE)
View(yes_unavailable)
```

### 3.2.3 The `common.name` Parameter in `rcatfish_search`

Currently, Eschmeyer's Catalog of Fishes does not include the common names of species. `rcatfish` does, however, have the capability of searching for species by common name by utilizing the `rfishbase` package. To do this, you can utilize the `common.name` parameter in `rcatfish_search`. Please note that searching by common names can only be performed at the species level, not by genus. By default this parameter is set to `FALSE`. When explicitly changed to `TRUE`, the function will first match the `query` input of common names to any associated scientific names currently found in FishBase.
Note that while `query` is still vectorizable in this mode, you cannot combine common names with other search terms (e.g. you can search for `query = c("Humphead Wrasse", "Channel Catfish")` but searching for `query = c("Humphead Wrasse", "Lophius piscatorius")` may return unexpected results). With `common.name = TRUE`, the function returns a list containing the normal `rcatfish_search` result dataframe and a second dataframe showing the common names provided and the taxonomic names that they were matched to. As an example:

```
# Search Catalog of Fishes by Common Name
common_name_result <- rcatfish_search(query = "Humphead wrasse", type = "Species", common.name = TRUE)
View(common_name_result) # The full list returned
View(common_name_result[[1]]) # The first dataframe in the list, the normal rcatfish_search output
View(common_name_result[[2]]) # The second dataframe in the list, the common names searched and their matches
```

You can search using common names from other languages if you so desire by setting the `language` parameter. By default it is set to "English".

### 3.2.4 The `taxon.history` Parameter in `rcatfish_search`

Each entry in Eschmeyer's Catalog of Fishes also contains a complete history of that entry's taxonomic status. By default, this is not captured with `rcatfish_search`, although it can be obtained by setting the `taxon.history` parameter to `TRUE`. When this is done, an additional dataframe is returned containing each result's original status, current status, and every change to its status made in between. This search can be performed both by species and by genus. Please note that, particularly for queries with a large number of changes in their history, searching with `taxon.history = TRUE` may take considerably longer than a typical search.
As an example of what this may look like:

```
# Search Catalog of Fishes with Taxonomic History
taxon_history_result <- rcatfish_search(query = "Platyrhina", type = "Genus", taxon.history = TRUE)
View(taxon_history_result) # The full list returned
View(taxon_history_result[[1]]) # The first dataframe in the list, the normal rcatfish_search output
View(taxon_history_result[[2]]) # The second dataframe in the list, the taxonomic histories of the results
```

### 3.2.5 Other Parameters in `rcatfish_search`

Several other parameters exist in the `rcatfish_search` function to modify minor aspects of the search.

The `verbose` parameter toggles the message displayed to the user when running a search (e.g. "Now on query 1 of 100"). By default it is set to `TRUE`. Messages can be disabled by changing it to `FALSE`.

The `sleep.time` parameter sets the length of time that the search function will wait between requests to the Catalog of Fishes' server when performing a search of multiple terms. This is set to 10 seconds as requested by the catalog. **This parameter should not be modified. Changing this value may result in blacklisting by the catalog.**

## 3.3: Using rcatfish to Search for References in Eschmeyer's Catalog of Fishes (`rcatfish_references`)

To search through references in Eschmeyer's Catalog of Fishes, the function `rcatfish_references` should be used. This function is equivalent to using the "Search Eschmeyer's Catalog" tab on the catalog's website and selecting the "References" radio button. It has the parameters `query` and `type`, both of which must be specified. The `query` parameter can search either by reference number in the catalog or by keyword and can be passed as either a single search term or a vector of terms. The type of search performed is dictated by the `type` parameter, which will accept either "RefNo" to search by reference number or "keyword" to search by keyword.
Note that when searching by reference number, the query can be passed either as an integer or as a character string (e.g. 41479 and "41479" will return the same results).

```
# Search references by keyword
keyword_reference <- rcatfish_references(query = "Tunisia", type = "keyword")

# Search references by reference number
RefNo_reference <- rcatfish_references(query = 41479, type = "RefNo")
```

`rcatfish_references` can be combined with a result from `rcatfish_search` to obtain all references associated with the resulting species. As an example:

```
# Search the catalog for a given species
search_result <- rcatfish_search(query = "Cichla cataractae", type = "Species")

# Retrieve references from resulting search
references <- rcatfish_references(query = search_result$DescriptionRef, type = "RefNo")
```

## 3.4: Using rcatfish to See Recent Catalog Updates

Eschmeyer's Catalog of Fishes receives monthly updates. These updates include changes to the taxonomic status of genera and species, changes related to authorship, and the addition of newly described taxa. Users can see these updates by using the `rcatfish_updates` function. By default, this function requires no arguments and will return all changes provided by the most recent update. However, users can specify whether to return the taxonomic changes, authorship changes, added genera, and added species by setting the corresponding arguments to `TRUE` or `FALSE`. For example, if we want to obtain all the changes in a version of the catalog, we can do either of the following:

```
updates <- rcatfish_updates()
```

Alternatively, we can set the arguments explicitly to return specific update components. These are all set to `TRUE` by default, but users can change each of them by name.

```
updates <- rcatfish_updates(changes = TRUE, author.changes = TRUE, added.genera = TRUE, added.species = TRUE)
updates
```

We can see when running the above code that a list is returned of changes made in the newest edition of the catalog (which is updated once a month).
This list will be of variable length depending on which elements the user asked to return. Elements of the returned list include `Changes`, `AuthorshipChanges`, `AddedGenera`, and `AddedSpecies`, which contain the taxonomic changes, authorship changes, genera newly added to the catalog, and species newly added to the catalog, respectively.

## 3.5: Using rcatfish to Obtain the Number of Species and Genera Described Per Taxonomic Entity

Eschmeyer's Catalog of Fishes provides information on the number of species and genera described per family and subfamily via a table on the following page (https://researcharchive.calacademy.org/research/ichthyology/catalog/SpeciesByFamily.asp). `rcatfish` provides access to this page through the `rcatfish_species_by` function, which can also return species totals for higher taxonomic entities than family and subfamily, such as orders and classes. This function simply takes a query that is a subfamily, family, order, or class that the user wishes to obtain data for. For example, if we want to return information for the family Cichlidae we can do so using the following:

```
rcatfish_species_by("Cichlidae")
```

We can see that this has returned a data frame containing the number of available and valid genera and species, as well as the number of genera and species described in the last decade, for the family and all subfamilies in Cichlidae. While the Catalog of Fishes does not report these figures at higher taxonomic levels, `rcatfish` can. We can obtain the number of described genera and species for the order Cichliformes with the following:

```
rcatfish_species_by("Cichliformes")
```

We can see that this has provided not just the number of genera and species in each family within the Cichliformes, but has also returned the total for the entire order, which is not reported on the Catalog of Fishes.
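
If you want counts for several taxa, one option is to call `rcatfish_species_by` once per taxon and pause between calls. The sketch below is only an illustration under stated assumptions: `species_by_many()` is a hypothetical helper of our own, not an `rcatfish` function, and the 10-second pause simply mirrors the `sleep.time` default described for `rcatfish_search`. Whether `rcatfish_species_by` already handles multiple queries or its own pacing is not covered in this tutorial, so check `?rcatfish_species_by` first.

```r
# Hypothetical helper (not part of rcatfish): look up several taxa one at a time.
# `fetch` is the function called for each taxon; `pause` is the wait in seconds
# between calls (10 is our assumption, mirroring sleep.time in rcatfish_search).
species_by_many <- function(taxa, fetch, pause = 10) {
  results <- vector("list", length(taxa))
  names(results) <- taxa
  for (i in seq_along(taxa)) {
    # Wrap in list() so a NULL result does not drop the element
    results[i] <- list(fetch(taxa[i]))
    # Pause between requests, but not after the last one
    if (i < length(taxa)) Sys.sleep(pause)
  }
  results
}

# Guard (our assumption): only contact the catalog when rcatfish is installed
# and the session is interactive.
if (requireNamespace("rcatfish", quietly = TRUE) && interactive()) {
  counts <- species_by_many(c("Cichlidae", "Cichliformes"),
                            fetch = rcatfish::rcatfish_species_by)
  names(counts)
}
```

Passing the lookup function in as `fetch` keeps the helper independent of the network, so the looping logic can be tried with a stand-in function before querying the catalog.
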

## 3.6: Browsing the Eschmeyer's Catalog of Fishes Classification

Eschmeyer's Catalog of Fishes provides a hierarchical classification of fishes organized by Class, Order, Suborder, Family, and Subfamily (https://www.calacademy.org/scientists/catalog-of-fishes-classification/). The `rcatfish` function `rcatfish_classification` provides access to this table. This function takes no arguments and can simply be called as follows:

```
# See Current Breakdown of Fish Classification, from Class Through Subfamily
fish_classification <- rcatfish_classification()
fish_classification
```

The function returns a data frame that progresses from left to right from most to least inclusive. In addition to providing the hierarchy for Class, Order, Suborder, Family, and Subfamily, the authorship of these taxonomic entities as well as their common names are returned.

## 3.7: Accessing Eschmeyer's Catalog of Fishes Glossary

The glossary of terms used in the catalog can be retrieved with the `rcatfish_glossary` function:

```
# See a Glossary of Terms Used in the Catalog
glossary <- rcatfish_glossary()
```

The `glossary` object created in the line of code above is a data frame containing a list of technical terms used in the catalog along with definitions and applicable sub-terms.

## 3.8: Using rcatfish to Obtain Information on the Journals Cited in the Catalog of Fishes

Besides citations for the references used in the Catalog of Fishes, the Catalog of Fishes also provides various information on the journals used for references (https://researcharchive.calacademy.org/research/ichthyology/catalog/journals.asp). For example, the Catalog of Fishes provides ISSN numbers, publishers, and comments, such as name changes for journals. Information on the journals can be accessed in `rcatfish` using the `rcatfish_journals` function. This function takes the argument `query`, which is a string to search for, and the argument `phrase`, which specifies whether the query should be passed in quotes and searched as a phrase.
For example, to search for journals that are related to Texas, we can do the following:

```
rcatfish_journals("Texas")
```

We can see that most of these contain Texas in the title. One result, "Contributions in Marine Science", is returned because its entry notes that it is a continuation of Publications of the Institute of Marine Science, University of Texas. Note that passing the query as a phrase may impact the success of the search. For example, if we wanted to search for Journal of Zoology, the search will fail if we do not pass the query as a phrase, as it will look for each word separately.

```
rcatfish_journals("Journal of Zoology")
```

We can successfully search for this query by setting `phrase = TRUE` in the arguments:

```
rcatfish_journals("Journal of Zoology", phrase = TRUE)
```

## 3.9: Using rcatfish to Search the Eschmeyer's Catalog of Fishes Guide to Fish Collections

Eschmeyer's Catalog of Fishes provides information, such as collection abbreviations, locality, previous names, and online access, for museum collections with fish holdings (https://researcharchive.calacademy.org/research/ichthyology/catalog/collections.asp). This information can be accessed through `rcatfish` via the `rcatfish_collections` function. `rcatfish_collections` allows users to search for collections by abbreviation, country, or query term. For example, if we knew the museum abbreviation we wanted to search for, such as UMMZ for the University of Michigan Museum of Zoology, we could simply provide UMMZ to the `abbreviation` argument:

```
rcatfish_collections(abbreviation = "UMMZ", country = NULL, query = NULL, verbose = TRUE)
```

We can also pass information to more than one field. This can be useful for narrowing down collection results, such as for countries that have a lot of natural history collections. For this example, let's search for collections in the United States of America and query for collections in California and Alaska.
Note that to search with more than one term, we must ensure that all arguments are the same length. So, in this case, we need to pass the country twice in our search as follows:

```
rcatfish_collections(abbreviation = NULL, country = rep("U.S.A.", 2), query = c("California", "Alaska"), sleep.time = 10)
```

We may also want to query a phrase, such as "Museum of Zoology", to get a list of collections that contain that name across all collections (similar to what was covered for `rcatfish_journals`). To run queries that are phrases, we need to use the `phrase = TRUE` argument:

```
rcatfish_collections(query = "Museum of Zoology", phrase = TRUE)
```

## 4.0: Troubleshooting

Most of the functions in `rcatfish` require a stable internet connection to run, as they connect to the online Catalog of Fishes database. If you run into problems, we recommend checking your internet connection as well as visiting the California Academy of Sciences Eschmeyer's Catalog of Fishes site (https://researcharchive.calacademy.org/research/ichthyology/catalog/fishcatmain.asp) to ensure that it is not down for routine maintenance. Should you find any entries which appear to return different data than anticipated, check the Catalog of Fishes directly to confirm the error and then report it in the issues section on GitHub (repo sborstein/rcatfish). Remember that `rcatfish` is designed to return exactly what is published and will capture any mistakes that are directly in the catalog. Please note that the authors of this package are not affiliated with Eschmeyer's Catalog of Fishes nor the California Academy of Sciences. As such, we are not able to correct any errors that exist on the Catalog of Fishes or fix/troubleshoot any issues with the Catalog of Fishes itself.

## 5.0: Final Comments

Further information on the functions and their usage can be found in the help files `help(package=rcatfish)`.
For any further issues and questions send an email with subject 'rcatfish support' to borstein@txstate.edu or post to the issues section on GitHub.

## References:

Eschmeyer WN (1998). Catalog of Fishes. California Academy of Sciences, San Francisco, California, 2905 pp.

Fricke R (2025). Eschmeyer's Catalog of Fishes: References. https://researcharchive.calacademy.org/research/ichthyology/catalog/fishcatmain.asp.

Fricke R, Eschmeyer WN (2025). Eschmeyer's Catalog of Fishes: Guide to Fish Collections. https://researcharchive.calacademy.org/research/ichthyology/catalog/collections.asp.

Fricke R, Eschmeyer WN (2025). Eschmeyer's Catalog of Fishes: Journals. https://researcharchive.calacademy.org/research/ichthyology/catalog/journals.asp.

Fricke R, Eschmeyer WN, Fong JD (2025). Eschmeyer's Catalog of Fishes: Species by family/subfamily in the Catalog of Fishes. https://researcharchive.calacademy.org/research/ichthyology/catalog/SpeciesByFamily.asp.

Fricke R, Eschmeyer WN, van der Laan R (2025). Eschmeyer's Catalog of Fishes: Genera, Species, References. https://researcharchive.calacademy.org/research/ichthyology/catalog/fishcatmain.asp.

Fricke R, van der Laan R, Fong JD (2025). Eschmeyer's Catalog of Fishes: Changes and Additions. https://researcharchive.calacademy.org/research/ichthyology/catalog/ChangeSummary.asp.

van der Laan R, Fricke R, Eschmeyer WN (2025). Eschmeyer's Catalog of Fishes: Classification. https://www.calacademy.org/scientists/catalog-of-fishes-classification/.

van der Laan R, Fricke R, Fong J (2025). Eschmeyer's Catalog of Fishes: Glossary. https://www.calacademy.org/scientists/catalog-of-fishes-glossary/.