---
title: "z - Advanced topic: Understanding the rPackages set release cycle and using bleeding edge packages"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{z-advanced-topic-understanding-the-rpackages-set-release-cycle-and-using-bleeding-edge-packages}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r, include=FALSE}
library(rix)
```

## Introduction

It is important to understand the release cycle of the rPackages set and what
steps you should take if you need bleeding edge packages. R packages on
`nixpkgs` tend to get updated alongside a new release of R, and the reason is to
ensure a certain level of quality. The vast majority of CRAN (and Bioconductor)
packages are made available through `nixpkgs` in a fully automated way. But some
packages do require some manual intervention to work on Nix, and we only know
this if we try to build these packages, but building packages requires quite a
lot of resources. We can’t build CRAN packages every single day to see if
everything works well on Nix, so we only rebuild the whole tree whenever there’s
a new release of R. Packages get built on a CI infrastructure called *Hydra*,
and then these packages get cached on
[cache.nixos.org](https://cache.nixos.org/) so whenever someone wants to install
a package, a pre-built binary gets download from the cache. This avoids having
to build software from source locally. For packages that don’t need compiling
this is not that big of a time save, but for packages that do need to get
compiled it is huge. Depending on which packages you want to install, if you had
to build everything from source, it could potentially take hours, but if you can
install pre-built binaries it’s just a matter of how quick your Internet
connection is.

## R packages available through Nix

As explained in the introduction, the *rPackages* set on `nixpkgs` gets updated
shortly after a new release of R. The process involves first updating the
package definitions found
[here](https://github.com/NixOS/nixpkgs/tree/master/pkgs/development/r-modules),
and then building the whole tree on a CI platform called *Hydra*. Build failures
then get fixed by volunteers (to learn how you can contribute, read the
`vignette("z-contributing_to_nixpkgs")`). After the most important packages have
been fixed, the whole rPackages set gets updated and made available through
`nixpkgs` master branch.

Essentially this means that if you start a project with `{rix}` using
`"latest-upstream"` as the `r_ver` just after the rPackages set got updated,
this project will use very fresh packages. But if instead you start a project
just before an R release, then the environment will be using older packages. In
practice this rarely matters, unless you absolutely need a very recent version
of a specific package because you need a specific feature, or if you need an
environment with bleeding edge packages for development. For cases like this, we
provide the `r_ver = "bleeding-edge"` and `r_ver = "frozen-edge"` options. If
you need to test the current development version of R, you can use
`r_ver = "r-devel"`, and if you need to test the current development version
of Bioconductor use `r_ver = "bioc-devel"` and if you need both the development
version of R and Bioconductor use `r_ver = "r-devel-bioc-devel"`. The table below
illustrates this more clearly:

```{=html}
<table border="1">
  <thead>
    <tr>
      <th>r_ver or date</th>
      <th>Intended use</th>
      <th>State of R version</th>
      <th>State of CRAN packages</th>
      <th>State of Bioconductor packages</th>
      <th>State of other packages in Nixpkgs</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>r_ver = "latest-upstream"</td>
      <td>Start of new project where versions don’t matter</td>
      <td>Current or previous</td>
      <td>Outdated (up to 6 months)</td>
      <td>Outdated (up to 6 months)</td>
      <td>Current at time of generation</td>
    </tr>
    <tr>
      <td>r_ver = "4.4.2" (or other)</td>
      <td>Reproducing old project or starting a new project where versions don’t matter</td>
      <td>Same as in `r_ver`, check `available_r()`</td>
      <td>Outdated (up to 2 months if using latest release)</td>
      <td>Outdated (up to 2 months if using latest release)</td>
      <td>Potentially outdated (up to 12 months)</td>
    </tr>
    <tr>
      <td>date = "2024-12-14"</td>
      <td>Reproducing old project or starting a new project using the most recent date</td>
      <td>Current at that date, check `available_dates()`</td>
      <td>Current at that date, check `available_dates()`</td>
      <td>Current at that date, check `available_dates()`</td>
      <td>Potentially outdated (up to 12 months)</td>
    </tr>
    <tr>
      <td>r_ver = "bleeding-edge"</td>
      <td>To develop against the latest release of CRAN</td>
      <td>Always current</td>
      <td>Always current</td>
      <td>Always current</td>
      <td>Always current</td>
    </tr>
    <tr>
      <td>r_ver = "frozen-edge"</td>
      <td>To develop against the latest release of CRAN, but manually manage updates</td>
      <td>Current at time of generation</td>
      <td>Current at time of generation</td>
      <td>Current at time of generation</td>
      <td>Current at time of generation</td>
    </tr>
    <tr>
      <td>r_ver = "r-devel"</td>
      <td>To develop/test against the development version of R</td>
      <td>Development version</td>
      <td>Always current</td>
      <td>Always current</td>
      <td>Always current</td>
    </tr>
    <tr>
      <td>r_ver = "r-devel-bioc-devel"</td>
      <td>To develop/test against the development version of R and Bioconductor</td>
      <td>Development version</td>
      <td>Always current</td>
      <td>Development version</td>
      <td>Always current</td>
    </tr>
    <tr>
      <td>r_ver = "bioc-devel"</td>
      <td>To develop/test against the development version of Bioconductor</td>
      <td>Always current</td>
      <td>Always current</td>
      <td>Development version</td>
      <td>Always current</td>
    </tr>
  </tbody>
</table>
```

If you want to benefit from relatively fresh packages and have a stable environment
that for production purposes, using a date for `r_ver` is your best option.

## Using bleeding and frozen edge package for your environments

CRAN is continuously getting new or updated packages. When you use R outside of
Nix, running `install.packages(pkg)` will install the latest version of the
`{pkg}` package available from CRAN (unless you changed the default repository
url). With Nix, packages do not get downloaded from CRAN but for the `nixpkgs`
repository and they may be outdated as explained above. If you require bleeding
edge packages, use:

- `"bleeding-edge"` for the very latest packages as available currently on CRAN
  and Bioconductor (there could be a slight delay of some hours though): be
  aware that building this expression will always result in a different,
  updated environment. As such, `"bleeding-edge"` is ideal for running tests
  against the current state of CRAN or Bioconductor on CI for example;
- `"frozen-edge"`, which is the same as `"bleeding-edge"`, but with a pinned
  revision: in other words, building this expression will always result in the
  same environment.
- `"r-devel"`: the same as `"bleeding-edge"` but also with added development
  version of R. Building this expression will also always result in a different,
  updated environment.
- `"bioc-devel"`: the same as `"bleeding-edge"` but also with added development
  versions of Bioconductor packages instead of the latest release. Building this
  expression will also always result in a different, updated environment.
- `"r-devel-bioc-devel"`: the same as `"bioc-devel"` but also with added
  development version of R. Building this expression will also always result in
  a different, updated environment.

Just as when you provide a date or an R version to `rix()`, using these options
will point to our `rstats-on-nix` fork of
[Nixpkgs](https://github.com/rstats-on-nix/nixpkgs/tree/r-daily). This fork gets
updated every 12 hours with the latest commits from both the `nixpkgs`
repository and CRAN. This means that environments generated using this fork will
contain bleeding packages for both the CRAN (and Bioconductor) packages as well
as system-level dependencies.

In any case, whatever option you choose, using them comes at a cost of which you
must be aware.

First, because these packages are bleeding edge, they have not had the chance to
be built by *Hydra* yet. *Hydra* periodically builds packages and these get
cached. So if you’re using Nix, pre-compiled binaries get used instead of being
built from source. This is not the case if you use our fork, **unless** you also
use the cache that we provide, courtesy of
[cachix.org](https://app.cachix.org/cache/rstats-on-nix#pull). However, we are
limited in space, and cannot possible cache all the CRAN and Bioconductor
packages. So only the most popular packages get built and cached, and hopefully
the packages you need are part of this limited set. To use the cache, run the
following commands on your computer. First, install the Cachix client (if you
followed these instructions already when you set up `{rix}` you can skip them):

```
nix-env -iA cachix -f https://cachix.org/api/v1/install
```

then use the cache:

```
cachix use rstats-on-nix
```

that’s it! Packages now not only will get pulled from the official cache, but
also from the dedicated `rstats-on-nix` cache. Our cache also contains the
latest version of R, which sometimes can also lag behind on the official
`nixpkgs` repository.

When building an environment, if you see the following message:

```
warning: ignoring the client-specified setting 'trusted-substituters', because it is a restricted setting and you are not a trusted user
```

then this means that the cache was not configured properly. Try `cachix use
rstats-on-nix` again, and check that the `/etc/nix/nix.conf` file has a line
that looks like `trusted-users = root YOUR_USERNAME`. If not, add it using
admin/root privileges.

## Making your own cache

If you need packages that are not included in the `rstats-on-nix` cache, you can
very easily roll out your own binary cache. Simply build the environment once on
one machine, and then push the binaries in a cache for re-use. Refer to the
vignette `vignette("z-binary_cache")` to learn how to set up your own cache.