Require approach, comparing pak and renv

Require is a single package that combines features of
base::install.packages, base::library,
base::require, as well as pak::pkg_install,
remotes::install_github, and
versions::install_version, plus the snapshotting
capabilities of renv. It takes its name from the idea that
a user could have a single line, named after the require
function, that loads a package. Unlike require, that line will also
install the package if necessary. Set it and forget it. This means that
even if a user has a dependency that is removed from CRAN (“archived”),
the line will still work. Because it can be done in one line, it is
relatively easy to share, which facilitates, for example, making
reprexes for debugging. This package can be a key part of a reproducible
workflow.
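The "one line that loads, installing first if needed" idea can be illustrated in plain base R. This is a minimal sketch, not Require's implementation: requireOrInstall() is a hypothetical helper, and the repos default is an assumption. It also omits everything that makes Require useful beyond this (archives, GitHub, version requirements, caching).

```r
# Hypothetical helper (not part of Require): load a package, installing it
# first only if it cannot be found in the current library paths.
requireOrInstall <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # Only reached when the package is missing (i.e., typically the first run)
    utils::install.packages(pkg, repos = repos)
  }
  # Attach the package; this errors if the installation above failed
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# "stats" ships with R, so this loads it without installing anything
requireOrInstall("stats")
```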
Require

Require is designed with features that facilitate
running R code that is part of a continuous reproducible workflow, from
data to decisions. For this to work, all functions called by a user
should have a property whereby the first call does the
heavy work, and subsequent calls are fast enough that the user
is not forced to skip over lines of code when rerunning code. This is
called “rerun-tolerance”, i.e., the line can be rerun under identical
conditions and very quickly return the original result. The
reproducible package has a function, Cache, which can
convert many function calls to have this property. It does not work well
for functions whose objectives are side effects, like installing and
loading packages. Require fills this gap.
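The rerun-tolerance pattern for a side-effect function can be sketched in a few lines of base R. This is a generic illustration under stated assumptions, not Require's code: ensureFile() and makeFile() are hypothetical helpers, and Sys.sleep() stands in for slow work such as an installation. The key design choice is to check for the existing result before doing any work.

```r
# Hypothetical rerun-tolerant wrapper: the heavy step runs only when its
# result (here, a file) does not exist yet; every call returns the same value.
ensureFile <- function(path, create) {
  if (!file.exists(path)) create(path)  # heavy work: only on the first call
  path
}

nCreated <- 0
makeFile <- function(p) {
  nCreated <<- nCreated + 1  # count how often the heavy work really ran
  Sys.sleep(0.5)             # stand-in for slow work, e.g., an installation
  writeLines("done", p)
}

target <- file.path(tempdir(), "rerun-demo.txt")
unlink(target)                          # start this demo from a clean state
first  <- ensureFile(target, makeFile)  # slow: does the work
second <- ensureFile(target, makeFile)  # fast: result already present
nCreated                                # 1
```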
Features include:

- package version specifications (e.g., ==3.5.0 or >=3.5.0);
- options-level control of which packages should be installed from source (see RequireOptions()), even if they are being downloaded from a binary repository;
- handling of errors from install.packages like “already in use”.

Require uses install.packages internally to
install packages. However, it does not let install.packages
download the packages. Rather, it identifies dependencies recursively,
finds out where they are (CRAN, GitHub, Archives, Local), and downloads them
(or gets them from a local cache, or clones them from a specified package library).
If libcurl is available (assessed via
capabilities("libcurl")), it will download them in parallel
from CRAN-like repositories. If sys is installed, it will
also download GitHub packages in parallel. If a user has not set
options("Ncpus") manually, then it will set that to a value
up to 8 for parallel installs of binary and source packages.
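The parallelism rules above can be expressed as a small decision function. This is a hedged base-R sketch, not Require's code. chooseInstallSettings() is hypothetical, and capping at min(8, detectCores()) is an assumption about how "a value up to 8" might be chosen.

```r
# Hypothetical sketch of the rules described above: respect a user-set
# options("Ncpus"); otherwise pick a value capped at 8 (assumed heuristic).
chooseInstallSettings <- function(maxCpus = 8L) {
  ncpus <- getOption("Ncpus")
  if (is.null(ncpus)) {
    cores <- parallel::detectCores()
    if (is.na(cores)) cores <- 1L  # detectCores() may return NA on some systems
    ncpus <- min(maxCpus, cores)
  }
  list(
    # parallel downloads from CRAN-like repositories need libcurl
    parallelDownloads = isTRUE(capabilities("libcurl")),
    Ncpus = as.integer(ncpus)
  )
}

str(chooseInstallSettings())
```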
To be functionally reproducible, code must be regularly run and tested on many operating systems and computers. When this does not happen, a user or developer does not know that certain code chunks no longer work until they try to run them later: code gets stale because underlying algorithms and data change. To be rerun-tolerant, a function must:

- do the heavy work the first time it is called, and
- return very quickly when rerun under identical conditions.

Require does both of these. See below “why is it
fast”.
It is common during code development to work in teams and to be updating package code. This applies whether the team is very tight, all working on exactly the same project, or looser, sharing only certain components across diverse projects.
If the whole team is working on the same “whole” project, then it may
be useful to use a “package snapshot” approach, as is used with the
renv package. Require offers similar
functionality with the function pkgSnapshot(). Using this
approach provides a mechanism for each team member to update code, then
snapshot the project, commit the snapshot and push to the cloud for the
team to share.
However, if a team is more diversified and its members share
new code but not the whole project, then project snapshots will be
very inefficient, and package management must be done on a package-by-package
basis, not for the whole project. In other words, the code developer can work
on their package, and the other team members will have 2 options: stay at
the bleeding edge, or update only if necessary for dependencies. More likely,
they will want a mixture of these strategies, i.e., bleeding edge for some
code, but only if necessary for others. Require offers
programmatic control for this. For example:
```r
library(Require)
Require::Install(
  c("PredictiveEcology/reproducible@development (HEAD)",
    "PredictiveEcology/SpaDES.core@development (>=2.0.5.9004)"))
```

This will keep the project at the bleeding edge of the development branch
of reproducible, but will update the development branch of SpaDES.core
only if necessary (based on the version needed, expressed by the inequality).
The user does not have to decide at run time whether an update should be
made, and for which packages.
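To make the (HEAD) and inequality semantics concrete, here is a hedged base-R sketch of the decision "should this request trigger an install?". needsUpdate() is hypothetical and much simpler than Require. It ignores Git SHAs, treats (HEAD) as "always re-check the branch tip", and takes the installed version as an argument instead of reading it from a library.

```r
# Hypothetical helper (not Require's internals): does a request such as
# "owner/pkg@branch (>=1.2.3)" call for an install, given the installed
# version (NA if the package is not installed)?
needsUpdate <- function(spec, installed) {
  if (is.na(installed)) return(TRUE)                  # not installed: install it
  req <- regmatches(spec, regexpr("\\(.*\\)", spec))  # e.g. "(>=2.0.5.9004)"
  if (length(req) == 0) return(FALSE)                 # no requirement: keep it
  req <- gsub("[()[:space:]]", "", req)
  if (req == "HEAD") return(TRUE)                     # simplification: re-check the tip
  op   <- regmatches(req, regexpr("^[<>=]+", req))
  ver  <- sub("^[<>=]+", "", req)
  have <- package_version(installed)
  ok <- switch(op,
    ">=" = have >= ver,
    ">"  = have >  ver,
    "==" = have == ver,
    "<=" = have <= ver,
    "<"  = have <  ver,
    stop("unsupported operator: ", op))
  !ok
}

needsUpdate("PredictiveEcology/SpaDES.core@development (>=2.0.5.9004)", "2.0.5")  # TRUE
needsUpdate("PredictiveEcology/SpaDES.core@development (>=2.0.5.9004)", "2.1.0")  # FALSE
```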
Require differs from other approaches

For packages that are not yet installed:

| Call | Outcome |
|---|---|
| Install("data.table") | data.table installed |
| install.packages("data.table") | data.table installed |
| pak::pkg_install("data.table") | data.table installed |
| renv::install("data.table") | data.table installed |

For packages that are already installed:

| Call | Outcome |
|---|---|
| Install("data.table") | No installation |
| install.packages("data.table") | data.table installed |
| pak::pkg_install("data.table") | No installation |
| renv::install("data.table") | data.table installed |

For packages that are already installed, but are not the latest version on CRAN:

| Call | Outcome |
|---|---|
| Install("data.table") | No installation |
| install.packages("data.table") | data.table installed |
| pak::pkg_install("data.table") | data.table installed; asks the user whether to update, if an update is available |
| renv::install("data.table") | data.table installed; asks the user whether to update, if an update is available |
pak and Require

This table is based on Require v1.0.0 and
pak v0.7.2. An asterisk (*) indicates that there is an example below.

| Feature | Require | pak |
|---|---|---|
| Parallel downloads | Yes | Yes |
| Parallel installs | Yes | Yes |
| Archived package* (e.g., "knn") | Automatic | Must prefix with url:: and the exact URL path |
| Archived package in dependency* | Automatic | May not work, even if manually adding url:: or any:: |
| Dependency conflicts* | Yes | No (see example below using any::) |
| Multiple requests of same package* | Resolves by version number specification, or most recent version | Error |
| Control individual package updates | With HEAD | No |
| Very clean messaging | Somewhat, with options(Require.installPackagesSys = 1) | Yes |
| Package dependencies | data.table, sys | None (though yes if user wants control, e.g., pkgcache) |
| Uses local cache | Yes | Yes |
| Package updates (default) | No, unless needed by version number | Yes, prompts user |
| Package install by version | Yes | Yes, but does not deal well with multiple packages with specific versions |
| Package conflict (CRAN & GitHub)* | Prefers CRAN, if version requirements are met | Error |
| Version specification by user | Yes, e.g., Require (>=1.0.0) | Not an option |
| Exact version specification by user | Uses the DESCRIPTION file approach, e.g., Require (==1.0.0) | Uses @, e.g., Require@1.0.0 |
| Version conflicts | Require attempts to resolve them, detailing the conflict | Reports “dependency conflict” without details |
| Cache of package dependencies | Yes (internally in Require::pkgDep) | No (cache not used in pak::pkg_deps) |
| Additional_repositories (in the DESCRIPTION file of a package) | Uses | Does not use (like install.packages) |
| Cache of package binaries built locally from source | Yes | No (pak version 0.7.2) |
Between mid-March 2024 and April 5, 2024, fastdigest was
taken off CRAN. If it is one of your direct dependencies,
you can remove it and find an alternative. However, if it is an indirect
dependency, you don’t have that choice: your workflow will break.
Require will just get the most recent archived copy, and the
work can continue. While fastdigest is back on CRAN, other packages
are not, e.g., an older knn package.
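Finding "the most recent archived copy" mostly comes down to picking the highest version among the tarballs in a package's CRAN archive directory (https://cran.r-project.org/src/contrib/Archive/<pkg>/). The sketch below is hypothetical, not Require's code. It works on a vector of file names rather than fetching the listing, and the file names in the example are made up. It compares versions with package_version(), because a plain string comparison would rank 1.2-1 above 1.10.

```r
# Hypothetical helper: given file names listed in a CRAN archive directory,
# return the tarball with the highest version for `pkg` (NA if none).
latestArchived <- function(files, pkg) {
  prefix <- paste0(pkg, "_")
  tarballs <- files[startsWith(files, prefix) & endsWith(files, ".tar.gz")]
  if (length(tarballs) == 0) return(NA_character_)
  # Strip "pkg_" and ".tar.gz" to leave the version string, e.g. "1.2-1"
  vers <- package_version(
    substr(tarballs, nchar(prefix) + 1, nchar(tarballs) - nchar(".tar.gz")))
  tarballs[vers == max(vers)][1]
}

archiveFiles <- c("knn_1.0.tar.gz", "knn_1.2-1.tar.gz", "knn_1.10.tar.gz")
latestArchived(archiveFiles, "knn")  # "knn_1.10.tar.gz"
```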
When doing code development, it is common to use many
GitHub packages. Each of these (or their dependencies) may
point to one or more branches, either directly by the user or in a
Remotes field. In the next example, pak
errors, while Require makes decisions and installs. This is
a common occurrence for teams developing packages concurrently. The
pak approach suggests prepending any:: to the
package(s) causing the conflict. This may suffice in some
situations. The Require approach is to assume the
equivalent of any::, which means prioritizing based on (in
this order): 1. package version requirements, 2. CRAN-like
repositories, 3. the order in which packages are listed.
```r
library(Require)
# Fails because packages were taken off CRAN and multiple GitHub branches
# are requested within the nested dependencies
pkgs <- c("reproducible", "PredictiveEcology/SpaDES@development")
dirTmp <- tempdir2(sub = "first")
.libPaths(dirTmp)
install.packages("pak") # need this in the library; can't use personal library version
try(pak::pkg_install(pkgs))
# ✔ Loading metadata database ... done
# Error : ! error in pak subprocess
# Caused by error:
# ! Could not solve package dependencies:
# * reproducible: dependency conflict
# * PredictiveEcology/SpaDES@development: Can't install dependency PredictiveEcology/reproducible@development (>= 2.0.10)
# * PredictiveEcology/reproducible@development: Conflicts with reproducible
pkgsAny <- c("any::reproducible", "PredictiveEcology/SpaDES@development")
try(pak::pkg_install(pkgsAny))
# Fine
dirTmp <- tempdir2(sub = "second")
.libPaths(dirTmp)
Require::Install(pkgs)
```

```r
# Fails
try(pk <- pak::pak(c("PredictiveEcology/LandR@development", "PredictiveEcology/LandR@main")))
# Error : ! error in pak subprocess
# Caused by error:
# ! Could not solve package dependencies:
# * PredictiveEcology/LandR@development: Conflicts with PredictiveEcology/LandR@main
# * PredictiveEcology/LandR@main: Conflicts with PredictiveEcology/LandR@development
# Fine -- takes in order, so main first in this example
rq <- Require::Install(c("PredictiveEcology/LandR@main", "PredictiveEcology/LandR@development"))
# Fine -- takes by version requirement, so takes development,
# which is the only one that fulfills the requirement on Jul 25, 2024
rq <- Require::Install(c("PredictiveEcology/LandR@main", "PredictiveEcology/LandR@development (>=1.1.5)"))
```

pak also fails when BioSIM, a dependency on GitHub, is not found
(see the pkg_deps example below). This may be because the package name
is not the repository name, but the error message does not make the
reason clear.
- Version number requirements drive package updates. If a user does not need an update because the installed version numbers are sufficient, no update will occur.
- If there is no version number specification, installs only occur if the package is not present.
- Multiple simultaneous requests to install a package from what appear to be incompatible sources will not create a conflict unless version requirements cause the conflict. If version number requirements are not specified, CRAN versions take precedence; otherwise, the order in which packages are listed at installation takes precedence.
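The priority order above can be written as a single call to order(). This is a hedged sketch, not Require's resolver. pickRequest() is hypothetical, it treats any request containing "/" as a GitHub source, and it assumes the caller already knows which requests can satisfy the version requirement (satisfiesVersion). Requests without a requirement count as satisfying it.

```r
# Hypothetical helper: choose one request among several for the same package.
# Priority: 1) version requirement satisfiable, 2) CRAN-like source,
# 3) the order in which the requests were listed.
pickRequest <- function(requests, satisfiesVersion = rep(TRUE, length(requests))) {
  isCRAN <- !grepl("/", requests, fixed = TRUE)  # "owner/repo" means GitHub
  # order() puts FALSE before TRUE, so negate each criterion to rank TRUE first;
  # seq_along() breaks any remaining ties by listing order
  ranking <- order(!satisfiesVersion, !isCRAN, seq_along(requests))
  requests[ranking[1]]
}

pickRequest(c("PredictiveEcology/reproducible@development", "reproducible"))
# "reproducible" (CRAN preferred)
pickRequest(c("PredictiveEcology/LandR@main", "PredictiveEcology/LandR@development"))
# "PredictiveEcology/LandR@main" (listed first)
pickRequest(c("PredictiveEcology/LandR@main", "PredictiveEcology/LandR@development (>=1.1.5)"),
            satisfiesVersion = c(FALSE, TRUE))
# "PredictiveEcology/LandR@development (>=1.1.5)" (only one meeting the requirement)
```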
```r
# The following has no version specifications,
# so the CRAN version will be installed, or none if already installed
Require::Install(c("PredictiveEcology/reproducible@development", "reproducible"))
# The following specifies "HEAD" after the GitHub package name. This means the
# tip of the development branch of reproducible will be installed if not already installed
Require::Install(c("PredictiveEcology/reproducible@development (HEAD)", "reproducible"))
# The following specifies "HEAD" after the package name. This means the
# tip of the development branch of reproducible will be installed
Require::Install(c("PredictiveEcology/reproducible@development", "reproducible (HEAD)"))
# Not a problem because the version number specification determines which one is used
Require::Install(c("PredictiveEcology/reproducible@modsForLargeArchives (>=2.0.10.9010)",
                   "PredictiveEcology/reproducible (>= 2.0.10)"))
# Even if a branch does not exist, there is no error if a later version requirement
# specifies a different branch
Require::Install(c("PredictiveEcology/reproducible@modsForLargeArchives (>=2.0.10.9010)",
                   "PredictiveEcology/reproducible@validityTest (>= 2.0.9)"))
```

Require can handle package version specifications in the
function call, whereas pak currently cannot (pak can handle them only
if they are in a DESCRIPTION file, and only if they are >=).
Why is it fast

Some features make Require fast the first time it is used on a system; others make it fast the second and subsequent times on a system (which can be the first time in a new project). These features are caching, cloning, and parallel downloads.
Require creates a local cache of several steps: the
package files (source or binary, including locally built binaries); the
package dependency tree (currently only in RAM, so it only affects the same
session); and the available-package matrices for CRAN-like repositories.
Together, these speed up the installation of packages on any computer that
can access the local cache, e.g., for each new project.
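The in-RAM dependency cache follows a familiar memoization pattern: an environment keyed by package name that lives for the session. Below is a minimal sketch of that pattern, not Require's internals. cachedDeps(), slowLookup(), and the package names pkgA, pkgB, pkgC are all hypothetical, and slowLookup() stands in for an expensive repository query.

```r
# Hypothetical session-level cache: results live in an environment, so they
# vanish when the R session ends (like a RAM-only cache).
.depCache <- new.env(parent = emptyenv())

cachedDeps <- function(pkg, lookup) {
  if (!exists(pkg, envir = .depCache, inherits = FALSE)) {
    # Slow path: only the first request for `pkg` in this session does the work
    assign(pkg, lookup(pkg), envir = .depCache)
  }
  get(pkg, envir = .depCache, inherits = FALSE)
}

# Stand-in for an expensive lookup; counts how often it actually runs
nLookups <- 0
slowLookup <- function(pkg) {
  nLookups <<- nLookups + 1
  switch(pkg, pkgA = c("pkgB", "pkgC"), character(0))
}

cachedDeps("pkgA", slowLookup)
cachedDeps("pkgA", slowLookup)  # served from the cache
nLookups                        # 1
```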
Require keeps the binary once a source
package is built, so it can install that binary on each
subsequent installation. This results in dramatically faster
installations of source packages after they have been built locally.
Require has an option,
options("Require.cloneFrom"), which, when set, will create
hard links between the current project’s package library and the
library pointed to by the option. Setting it to,
e.g., options("Require.cloneFrom" = Sys.getenv("R_LIBS_USER")),
will allow packages in the user’s personal library to be the source of
the “copying” to the project library. This is dramatically faster than
installing, even when the installation is from a local binary in the local
cache.
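Why hard links are fast can be seen with base R's file.link(). A hard link adds a new directory entry for existing file contents, so no bytes are copied. The sketch below is hypothetical and is not how Require implements Require.cloneFrom. linkTree() and the toy "mypkg" directory are illustrations only, and hard links generally require both paths to be on the same file system.

```r
# Hypothetical sketch: "clone" a package directory by hard-linking every file
# into a new location instead of copying it.
linkTree <- function(from, to) {
  files <- list.files(from, recursive = TRUE, all.files = TRUE, no.. = TRUE)
  # Recreate the directory structure first
  for (d in unique(dirname(file.path(to, files)))) {
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  # file.link() creates hard links (vectorized); returns TRUE per success
  all(file.link(file.path(from, files), file.path(to, files)))
}

# A toy "installed package" in one library, cloned into another
src <- file.path(tempdir(), "libA", "mypkg")
dst <- file.path(tempdir(), "libB", "mypkg")
unlink(dst, recursive = TRUE)  # links cannot overwrite existing files
dir.create(file.path(src, "R"), recursive = TRUE, showWarnings = FALSE)
writeLines("Package: mypkg", file.path(src, "DESCRIPTION"))
writeLines("f <- function() 1", file.path(src, "R", "mypkg"))
ok <- linkTree(src, dst)
```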
On Linux, users can install pre-built binary packages,
e.g., from the Posit Package Manager. Sometimes the binary is
incompatible with a user’s system, even though it is built for the correct
operating system. This generally occurs for several packages, which
must therefore be installed from source. Require has a function,
sourcePkgs(), which can be informed by
options("Require.spatialPkgs") and
options("Require.otherPkgs"); these can be set by a user on a
package-by-package basis. By default, some packages are automatically installed
from "source" because, in our experience, they tend to fail
if installed from the binary.
```r
# In this example, it is `terra` that generally needs to be installed from source on Linux
if (Require:::isUbuntuOrDebian()) {
  Require::setLinuxBinaryRepo()
  pkgs <- c("terra", "PSPclean")
  pkgFullName <- "ianmseddy/PSPclean@development"
  try(remove.packages(pkgs))
  pak::cache_delete() # make sure a locally built one is not present in the cache
  try(pak::pkg_install(pkgFullName))
  # ✔ Loading metadata database ... done
  #
  # → Will install 2 packages.
  # → Will download 2 packages with unknown size.
  # + PSPclean 0.1.4.9005 [bld][cmp][dl] (GitHub: fed9253)
  # + terra 1.7-71 [dl] + ✔ libgdal-dev, ✔ gdal-bin, ✔ libgeos-dev, ✔ libproj-dev, ✔ libsqlite3-dev
  # ✔ All system requirements are already installed.
  #
  # ℹ Getting 2 pkgs with unknown sizes
  # ✔ Got PSPclean 0.1.4.9005 (source) (43.29 kB)
  # ✔ Got terra 1.7-71 (x86_64-pc-linux-gnu-ubuntu-22.04) (4.24 MB)
  # ✔ Downloaded 2 packages (4.28 MB) in 2.9s
  # ✔ Installed terra 1.7-71 (61ms)
  # ℹ Packaging PSPclean 0.1.4.9005
  # ✔ Packaged PSPclean 0.1.4.9005 (420ms)
  # ℹ Building PSPclean 0.1.4.9005
  # ✖ Failed to build PSPclean 0.1.4.9005 (3.7s)
  # Error:
  # ! error in pak subprocess
  # Caused by error in `stop_task_build(state, worker)`:
  # ! Failed to build source package PSPclean.
  # Type .Last.error to see the more details.

  # Works fine because of `sourcePkgs()`
  try(remove.packages(pkgs)) # uninstall to make sure it is a clean install for this test
  Require::cacheClearPackages(pkgs, ask = FALSE) # remove any existing local packages
  Require::Install(pkgFullName)
}
```

pkgDep(..., which = XX) includes LinkingTo

pkgDep, by default, includes LinkingTo dependencies, as
these are required when a package compiles against, e.g., Rcpp, and so are
strictly necessary. pak::pkg_deps does not include
LinkingTo by default.
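Why including LinkingTo changes the result can be shown with a toy dependency table and a breadth-first resolver. This is a hypothetical illustration, not pkgDep's implementation. pkgA, pkgB, and pkgC are made up, while Rcpp and cpp11 are used only as names. cpp11 is reachable only through a LinkingTo field, so it disappears when LinkingTo is not followed.

```r
# Hypothetical toy dependency table (made-up packages pkgA, pkgB, pkgC)
toyDb <- list(
  pkgA  = list(Imports = c("pkgB", "Rcpp"), LinkingTo = "Rcpp"),
  pkgB  = list(Imports = "pkgC", LinkingTo = "cpp11"),
  pkgC  = list(),
  Rcpp  = list(),
  cpp11 = list()
)

# Breadth-first recursive resolution over the chosen dependency fields
recursiveDeps <- function(pkg, db, which = c("Depends", "Imports", "LinkingTo")) {
  seen <- character(0)
  queue <- pkg
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    # Missing fields (or unknown packages) contribute nothing
    direct <- unlist(db[[current]][which], use.names = FALSE)
    fresh <- setdiff(direct, c(seen, pkg))
    seen <- c(seen, fresh)
    queue <- c(queue, fresh)
  }
  sort(seen)
}

recursiveDeps("pkgA", toyDb)                                  # includes cpp11
recursiveDeps("pkgA", toyDb, which = c("Depends", "Imports"))  # omits cpp11
```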
```r
depPak <- pak::pkg_deps("PredictiveEcology/LandR@LandWeb")
depRequire <- Require::pkgDep("PredictiveEcology/LandR@LandWeb") # Slightly different default in Require
# Clean both results the same way (drop base packages)
pakDepsClean <- setdiff(Require::extractPkgName(depPak$ref), Require:::.basePkgs)
requireDepsClean <- setdiff(Require::extractPkgName(depRequire[[1]]), Require:::.basePkgs)
setdiff(pakDepsClean, requireDepsClean)
setdiff(requireDepsClean, pakDepsClean) # pak does not report "RcppArmadillo", "RcppEigen", "cpp11", which are LinkingTo
```

If there is no version specification, Require prefers
CRAN packages when there are multiple pointers to a package. Thus, even
though a package may have a Remotes field pointing to, e.g.,
PredictiveEcology/SpaDES.tools@development, if a
recursive dependency within that package specifies
SpaDES.tools without a Remotes field, then
pkgDep will return the CRAN version. To override this
behaviour, a user can specify a version requirement that can only be
satisfied by the Remotes option; pkgDep will then take that.

pak::pkg_deps prefers the top-level specification, i.e.,
the non-recursive Remotes field will be returned, even if
the same package is also specified within a recursive dependency without
a Remotes field. In other words, if a recursive dependency points to the
CRAN package, pak will not return that version of the dependency.
pak fails for packages on GitHub whose package name differs from the
Git repository name in Remotes

```r
gg <- pak::pkg_deps("PredictiveEcology/LandR@development", dependencies = TRUE)
# Error:
# ! error in pak subprocess
# Caused by error:
# ! Could not solve package dependencies:
# * PredictiveEcology/LandR@development: Can't install dependency BioSIM
# * BioSIM: Can't find package called BioSIM.
# Type .Last.error to see the more details.
ff <- Require::pkgDep("PredictiveEcology/LandR@development", dependencies = TRUE)
# $`PredictiveEcology/LandR@development`
# [1] "BH" "BIEN"
# [3] "BioSIM" "DBI (>= 0.8)"
# [5] "Deriv" "ENMeval"
# ...
```

renv and Require

renv has a concept of a lockfile. This lockfile records
a specific version of a package. If the currently installed version of a
package differs from the lockfile (e.g., I am the developer and I
increment the local version), renv will attempt to revert
the local changes (with a prompt to confirm), unless the local
package is installed from a cloud repository (e.g., GitHub) and a
snapshot is taken. This sequence is largely incompatible
with pkgload::load_all() or
devtools::install(), as these do not record “where” to get
the current version from. Thus, the renv sequence can be
quite time-consuming (1-2 minutes, instead of 1 second with
pkgload::load_all()).

Require does not attempt to update anything unless
required by a package, so this issue never comes up. If and when it
is important to “snapshot”, pkgSnapshot or
pkgSnapshot2 can be used.
DESCRIPTION file to maintain minimum versions

During a project, a user can build and maintain a “project-level”
DESCRIPTION file, which can be useful for an renv-managed
project. This approach does not, however, automatically detect minimum
version changes or GitHub branch changes (renv::status does
not recognize these). For a user to inherit the correct
requirements, a manual renv::install
must be run. For even moderately sized projects, this can take over
20 seconds.

Require does not need a lockfile; violations of package requirements are
found on the fly.
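Finding violations "on the fly" amounts to comparing each requirement with what is installed whenever the code runs. Here is a hedged base-R sketch of that check. findViolations() is hypothetical and understands only the ">=" form used for minimum versions in DESCRIPTION files. It reads versions with utils::packageVersion(), and Require's own checks are more complete.

```r
# Hypothetical helper: return the requirements (e.g. "pkg (>= 1.2.0)") that the
# library does not satisfy, either because the package is missing or too old.
findViolations <- function(reqs, lib.loc = NULL) {
  pkg    <- trimws(sub("\\(.*$", "", reqs))
  hasMin <- grepl("\\(\\s*>=", reqs)
  minVer <- sub("^.*\\(\\s*>=\\s*([^)]+)\\).*$", "\\1", reqs)
  ok <- vapply(seq_along(reqs), function(i) {
    # system.file() returns "" when the package is not installed
    if (!nzchar(system.file(package = pkg[i], lib.loc = lib.loc))) return(FALSE)
    if (!hasMin[i]) return(TRUE)
    utils::packageVersion(pkg[i], lib.loc = lib.loc) >= minVer[i]
  }, logical(1))
  reqs[!ok]
}

findViolations(c("stats", "utils (>= 3.0.0)", "tools (>= 999.0)"))
# "tools (>= 999.0)"
```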