---
title: "Quick Tour of Package **optimCheck**"
author: "Martin Lysy"
date: "`r Sys.Date()`"
output:
  html_vignette:
    toc: true
bibliography: references.bib
csl: taylor-and-francis-harvard-x.csl
link-citations: true
vignette: >
  %\VignetteIndexEntry{optimCheck: A Quick Tour}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, eval = FALSE, echo = FALSE}
rmarkdown::render("optimCheck-quicktut.Rmd") # R code to render vignette
```
\newcommand{\xx}{\boldsymbol{x}}
\newcommand{\hxx}{\hat{\boldsymbol{x}}}
\newcommand{\txx}{\tilde{\boldsymbol{x}}}
\newcommand{\bb}{\boldsymbol{b}}
\renewcommand{\AA}{\boldsymbol{A}}
## Introduction
The **optimCheck** package provides a set of tools to check that the output of an optimization algorithm is indeed at a local mode of the objective function. The tools include both visual and numerical checks, the latter serving to automate formal unit tests with, e.g., the **R** packages [**testthat**](https://CRAN.R-project.org/package=testthat) or [**RUnit**](https://CRAN.R-project.org/package=RUnit).
## Example: Quadratic Objective Function
A brief overview of the package's functionality is given through the following example. Let
$$
Q(\xx) = \xx'\AA\xx - 2 \bb'\xx
$$
denote a quadratic objective function in $\xx \in \mathbb R^d$. If $\AA_{d \times d}$ is a positive-definite matrix, then the unique minimum of $Q(\xx)$ is $\hxx = \AA^{-1}\bb$. Let us now ignore this information and try to minimize $Q(\xx)$ using **R**'s simplest built-in mode-finding routine, provided by the **R** function `stats::optim()`.
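As a quick check of this claim, note that with $\AA$ symmetric (as it is in the code below), setting the gradient of $Q$ to zero recovers the stated solution:
$$
\nabla Q(\xx) = 2(\AA\xx - \bb) = \boldsymbol{0}
\qquad \iff \qquad
\hxx = \AA^{-1}\bb.
$$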
In its simplest configuration, `stats::optim()` requires only the objective function and a starting value $\xx_0$ to initialize the mode-finding procedure. Let's consider a difficult setting for `stats::optim()`, with a relatively large $d = 12$ and a starting value $\xx_0$ which is far from the optimal value $\hxx$.
```{r, echo = -1}
set.seed(2608) # set seed to fix output
d <- 12 # dimension of optimization problem
# create the objective function: Q(x) = x'Ax - 2b'x
A <- crossprod(matrix(rnorm(d^2), d, d)) # positive definite matrix
b <- rnorm(d)
objfun <- function(x) crossprod(x, A %*% x)[1] - 2 * crossprod(b, x)[1]
xhat <- solve(A, b) # analytic solution
# numerical mode-finding using optim
xfit <- optim(fn = objfun, # objective function
              par = xhat * 5, # initial value is far from the solution
              control = list(maxit = 1e5)) # very large max. number of iterations
```
### Visual Checks with `optim_proj()`
Like most solvers, `stats::optim()` uses various criteria to determine whether its algorithm has converged, which can be assessed with the following command:
```{r}
# any value other than 0 means optim failed to converge
xfit$convergence
```
Here `stats::optim()` reports that its algorithm has converged. Now let's check this visually with **optimCheck** using *projection plots*. That is, let $\txx$ denote the potential optimum of $Q(\xx)$. Then for each $i = 1,\ldots,d$, we plot
$$
Q_i(x_i) = Q(x_i, \txx_{-i}), \qquad \txx_{-i} = (\tilde x_1, \ldots, \tilde x_{i-1}, \tilde x_{i+1}, \ldots, \tilde x_d).
$$
In other words, projection plot $i$ varies only $x_i$, while holding all other elements of $\xx$ fixed at the value of the potential solution $\txx$. These plots are produced with the **optimCheck** function `optim_proj()`:
```{r, fig.width = 10, fig.height = 6, out.width = "97%"}
require(optimCheck) # load package
# projection plots
xnames <- parse(text = paste0("x[", 1:d, "]")) # variable names
oproj <- optim_proj(fun = objfun, # objective function
                    xsol = xfit$par, # potential solution
                    maximize = FALSE, # indicates that a local minimum is sought
                    xrng = .5, # range of projection plot: x_i +/- .5*|x_i|
                    xnames = xnames)
```
In each of the projection plots, the potential solution $\tilde x_i$ is plotted in red. The `xrng` argument to `optim_proj()` specifies the plotting range. Among the various ways of doing this, perhaps the simplest is a single scalar value indicating that each plot should span $\tilde x_i \pm$ `xrng` $\cdot |\tilde x_i|$. From these plots we can see that the `stats::optim()` solution is in some coordinates up to 10% away from the minimum of the corresponding projection plot.
### Quantification of Projection Plots
Projection plots are a powerful method of assessing the convergence of mode-finding routines to a local mode. While great for interactive testing, plots are not well-suited to automated unit testing as part of an **R** package development process. To this end, **optimCheck** provides a means of quantifying the result of a call to `optim_proj()`. Indeed, a call to `optim_proj()` returns an object of class `optproj` with the following elements:
```{r}
sapply(oproj, function(x) dim(as.matrix(x)))
```
As described in the function documentation, `xproj` and `yproj` are matrices whose columns contain, respectively, the $x$-axis and $y$-axis coordinates of the points in each projection plot. The `summary()` method for `optproj` objects converts these to absolute and relative errors in both the potential solution and the objective function. The `print()` method conveniently displays these results:
```{r}
oproj # same print method as summary(oproj)
```
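The raw projection data can also be used directly. For instance, here is a minimal sketch of re-drawing the first projection plot by hand, assuming (as described above) that column $i$ of `xproj` and `yproj` corresponds to coordinate $x_i$:
```{r, fig.width = 4, fig.height = 4}
# re-draw projection plot 1 from the raw coordinates
xp <- oproj$xproj[, 1] # x-axis values for coordinate x_1
yp <- oproj$yproj[, 1] # corresponding values of the objective function
plot(xp[order(xp)], yp[order(xp)], type = "l",
     xlab = expression(x[1]), ylab = expression(Q(x)))
abline(v = xfit$par[1], col = "red") # potential solution
```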
The documentation for `summary.optproj()` describes the various calculations it provides. Perhaps the most useful of these are the elementwise absolute and relative differences between the potential solution $\tilde{\xx}$ and $\hxx_\mathrm{proj}$, the vector of optimal 1D solutions in each projection plot. For convenience, these can be extracted with the `diff()` method:
```{r}
diff(oproj) # equivalent to summary(oproj)$xdiff
# here's exactly what these are
xsol <- summary(oproj)$xsol # candidate solution
xopt <- summary(oproj)$xopt # optimal solution in each projection plot
xdiff <- cbind(abs = xopt-xsol, rel = (xopt-xsol)/abs(xsol))
range(xdiff - diff(oproj))
```
Thus, the `summary()` and `diff()` methods for projection plots can be combined to construct automated unit tests; a minimal sketch of such a test is given at the end of this subsection. In this case, plotting itself can be disabled by passing `optim_proj()` the argument `plot = FALSE`. See the `optimCheck/tests` folder for **testthat** examples featuring:
- Logistic regression (`stats::glm()` function).
- Quantile regression (`quantreg::rq()` function in [**quantreg**](https://CRAN.R-project.org/package=quantreg)).
- Multivariate normal mixtures (`mclust::emEEE()` function in [**mclust**](https://CRAN.R-project.org/package=mclust)).
You can run these tests with the command
```{r eval = FALSE}
testthat::test_package("optimCheck", reporter = "progress")
```
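As a minimal sketch of such a test, the following checks that the potential solution is within a small relative tolerance of each projection optimum; the tolerance `0.01` is a hypothetical choice to be adapted to the problem at hand, and the second column of `diff(oproj)` contains the relative differences, as shown above:
```{r, eval = FALSE}
testthat::test_that("optim() finds the minimum of the quadratic objective", {
  oproj <- optim_proj(fun = objfun, xsol = xfit$par,
                      maximize = FALSE, plot = FALSE) # no plotting during tests
  max_rel <- max(abs(diff(oproj)[, 2])) # largest relative difference across plots
  testthat::expect_lt(max_rel, 0.01) # hypothetical tolerance
})
```
For the rough `stats::optim()` fit above, this particular test would in fact fail, which is precisely the kind of problem it is designed to catch.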
## `optim_refit()`: A Numerical Alternative to Projection Plots
There are some situations in which the numerical quantification of projection plots leaves something to be desired. Generating all of the projection plots requires `N = 2 * npts * length(xsol)` evaluations of the objective function (where the default value is `npts = 100`), which can considerably slow down automated unit testing; in the example above with $d = 12$, this amounts to $N = 2400$ evaluations. A different test for mode-finding routines is to recalculate the optimal solution using a "very good" starting point: the current potential solution. This is the so-called "**refi**ne op**t**imization" -- or `refit` -- strategy.
The `optim_refit()` function refines the optimization with a call to **R**'s built-in general-purpose optimizer: the function `stats::optim()`. In particular, it selects the default Nelder-Mead simplex method with a simplified parameter interface. As seen in the unit tests above, the `refit` checks are 2-3 times faster than their projection plot counterparts. Consider now the example of refining the original `stats::optim()` solution to the quadratic objective function:
```{r}
orefit <- optim_refit(fun = objfun, # objective function
                      xsol = xfit$par, # potential solution
                      maximize = FALSE) # indicates that a local minimum is sought
summary(orefit) # same print method as orefit
```
Thus we can see that the first and second runs of `stats::optim()` produce quite different solutions.
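To see this concretely, here is a quick hand-rolled comparison of the two solutions, using the same absolute/relative format as `diff()` above:
```{r}
# elementwise differences between the initial and refined solutions
cbind(abs = orefit$xopt - xfit$par,
      rel = (orefit$xopt - xfit$par)/abs(xfit$par))
```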
Of course, this does not mean that the refit solution produced by `stats::optim()` is itself at a local mode:
```{r, fig.width = 10, fig.height = 6, out.width = "97%"}
# projection plots with refined solution
optim_proj(xsol = orefit$xopt, fun = objfun,
           xrng = .5, maximize = FALSE)
```
Indeed, the default `stats::optim()` method is only accurate when initialized close to the optimal solution. Therefore, one may wish to run the refit test with a different optimizer. This can be done externally to `optim_refit()`, by computing the refined solution first and then passing it to the function via its argument `xopt`. This is illustrated below using `stats::optim()`'s gradient-based quasi-Newton method:
```{r}
# gradient of the objective function
objgrad <- function(x) 2 * drop(A %*% x - b)
# mode-finding using quasi-Newton method
xfit2 <- optim(fn = objfun, # objective function
               gr = objgrad, # gradient
               par = xfit$par, # initial value (first optim fit)
               method = "BFGS")
# external refit test with optimizer of choice
orefit2 <- optim_refit(fun = objfun,
                       xsol = xfit$par, # initial value (first optim fit)
                       xopt = xfit2$par, # refit value (2nd fit with quasi-Newton method)
                       maximize = FALSE)
# projection plot test on refit solution
optim_proj(xsol = orefit2$xopt, fun = objfun,
           xrng = .5, maximize = FALSE, plot = FALSE)
```
## Future Work: Constrained Optimization
Many constrained statistical optimization problems seek a "sparse" solution, i.e., one for which some of the elements of the optimal solution are exactly zero. In such cases, the relative difference between the potential and optimal solutions is an unreliable metric. A working proposal is to flag these "true zeros" in `optim_proj()` and `optim_refit()`, so as to add 1 to the denominators of the corresponding relative differences. Other suggestions on this and **optimCheck** in general are [welcome](mailto:mlysy@uwaterloo.ca).
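To illustrate both the problem and the proposed fix, here is a minimal hand-rolled sketch; the vectors `xsol` and `xopt` below are purely hypothetical stand-ins for a candidate solution containing a true zero and the corresponding (projection or refit) optima:
```{r}
xsol <- c(0, 0.5, -1.2) # hypothetical candidate solution with a "true zero"
xopt <- c(0.01, 0.52, -1.19) # hypothetical projection/refit optima
(xopt - xsol)/abs(xsol) # ordinary relative difference: infinite in the zero coordinate
(xopt - xsol)/(1 + abs(xsol)) # proposed adjustment: well-defined everywhere
```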