geexIn the basic set-up, M-estimation applies to estimators of the \(p \times 1\) parameter \(\theta=(\theta_1, \theta_2, \dots, \theta_p)^{\intercal}\) which can be obtained as solutions to an equation of the form
\[\begin{equation} \label{eq:psi} \sum_{i = 1}^m \psi(O_i, \theta) = 0, \end{equation}\]
where \(O_1, \ldots, O_m\) are \(m\) independent and identically distributed (iid) random variables, and the function \(\psi\) returns a vector of length \(p\) and does not depend on \(i\) or \(m\) (Stefanski and Boos 2002, hereafter SB). See SB for the case where the \(O_i\) are independent but not necessarily identically distributed. The roots of (1) are referred to as M-estimators and denoted by \(\hat \theta\). M-estimators can be solved for analytically in some cases or computed numerically in general. Under certain regularity conditions, the asymptotic properties of \(\hat{\theta}\) can be derived from Taylor series approximations, the law of large numbers, and the central limit theorem. In brief, let \(\theta_0\) be the true parameter value defined by \(\int \psi(o, \theta_0) dF(o) = 0\), where \(F\) is the distribution function of \(O\). Let \(\dot{\psi}(o, \theta) = {\partial \psi(O_i, \theta)/\partial \theta}^{\intercal}\), \(A(\theta_0) = E[-\dot{\psi}(O_1, \theta_0)]\), and \(B(\theta_0) = E[\psi(O_1, \theta_0)\psi(O_1, \theta_0)^{\intercal}]\). Then under suitable regularity assumptions, \(\hat{\theta}\) is consistent and asymptotically Normal, i.e.,
\[\begin{equation*} \label{eq:variance} \sqrt{m}(\hat{\theta} - \theta_0) \overset{d}{\to} N(0, V(\theta_0)) \text{ as } m \to \infty, \end{equation*}\]
where \(V(\theta_0) = A(\theta_0)^{-1} B(\theta_0) \{A(\theta_0)^{-1} \}^{\intercal}\). The sandwich form of \(V(\theta_0)\) suggests several possible large sample variance estimators. For some problems, the analytic form of \(V(\theta_0)\) can be derived and estimators of \(\theta_0\) and other unknowns simply plugged into \(V(\theta_0)\). Alternatively, \(V(\theta_0)\) can be consistently estimated by the empirical sandwich variance estimator, where the expectations in \(A(\theta)\) and \(B(\theta)\) are replaced with their empirical counterparts. Let \(A_i = - \dot{\psi}(O_i, \theta)|_{\theta = \hat{\theta}}\), \(A_m = m^{-1} \sum_{i = 1}^m A_i\), \(B_i = \psi(O_i, \hat{\theta}) \psi(O_i, \hat{\theta})^{\intercal}\), and \(B_m = m^{-1} \sum_{i = 1}^m B_i\). The empirical sandwich estimator of the variance of \(\hat{\theta}\) is:
\[\begin{equation} \label{eq:esve} \hat{\Sigma} = A_m^{-1} B_m \{A_m^{-1}\}^{\intercal}/m . \end{equation}\]
The geex package provides an application programming
interface (API) for carrying out M-estimation. The analyst provides a
function, called estFUN, corresponding to \(\psi(O_i, \theta)\) that maps data \(O_i\) to a function of \(\theta\). Numerical derivatives approximate
\(\dot{\psi}\) so that evaluating \(\hat{\Sigma}\) is entirely a computational
exercise. No analytic derivations are required from the analyst.
Consider estimating the population mean \(\theta = E[Y_i]\) using the sample mean \(\hat{\theta} = m^{-1} \sum_{i=1}^m Y_i\) of \(m\) iid random variables \(Y_1, \dots, Y_m\). The estimator \(\hat{\theta}\) can be expressed as the solution to the following estimating equation:
\[ \sum_{i = 1}^m (Y_i - \theta) = 0. \]
which is equivalent to solving \(\eqref{eq:psi}\) where \(O_i = Y_i\) and \(\psi(O_i, \theta) = Y_i - \theta\). An
estFUN is a translation of \(\psi\) into a function whose first argument
is data and returns a function whose first argument is
theta. An estFUN corresponding to the
estimating equation for the sample mean of \(Y\) is:
meanFUN <- function(data){ function(theta){ data$Y - theta } } .
If an estimator fits into the above framework, then the user need
only specify estFUN. No other programming is required to
obtain point and variance estimates. The remaining sections provide
examples of translating \(\psi\) into
an estFUN.
The three examples of M-estimation from SB are presented here for
demonstration. For these examples, the data are \(O_i = \{Y_{1i}, Y_{2i}\}\) where \(Y_1 \sim N(5, 16)\) and \(Y_2 \sim N(2, 1)\) for \(m = 100\) and are included in the
geexex dataset.
The first example estimates the population mean (\(\theta_1\)) and variance (\(\theta_2\)) of \(Y_1\). The solution to the estimating equations below are the sample mean \(\hat{\theta}_1 = m^{-1} \sum_{i = 1}^m Y_{1i}\) and sample variance \(\hat{\theta}_2 = m^{-1} \sum_{i = 1}^m (Y_{1i} - \hat{\theta}_1)^2\).
\[ \psi(Y_{1i}, \theta) = \begin{pmatrix} Y_{1i} - \theta_1 \\ (Y_{1i} - \theta_1)^2 - \theta_2 \end{pmatrix} \]
SB1_estfun <- function(data){
Y1 <- data$Y1
function(theta){
c(Y1 - theta[1],
(Y1 - theta[1])^2 - theta[2])
}
}The primary geex function is m_estimate,
which requires two inputs: estFUN (the \(\psi\) function), data (the
data frame containing \(O_i\) for \(i = 1, \dots, m\)). The package defaults to
rootSolve::multiroot or estimating roots of \(\eqref{eq:psi}\), though the solver
algorithm can be specified in the root_control argument.
Starting values for rootSolve::multiroot are passed via the
root_control argument; e.g.,
library(geex)
results <- m_estimate(
estFUN = SB1_estfun,
data = geexex,
root_control = setup_root_control(start = c(1,1)))n <- nrow(geexex)
A <- diag(1, nrow = 2)
B <- with(geexex, {
Ybar <- mean(Y1)
B11 <- mean( (Y1 - Ybar)^2 )
B12 <- mean( (Y1 - Ybar) * ((Y1 - Ybar)^2 - B11) )
B22 <- mean( ((Y1 - Ybar)^2 - B11)^2 )
matrix(
c(B11, B12,
B12, B22), nrow = 2
)
})
# closed form roots
theta_cls <- c(mean(geexex$Y1),
# since var() divides by n - 1, not n
var(geexex$Y1) * (n - 1)/ n )
# closed form sigma
Sigma_cls <- (solve(A) %*% B %*% t(solve(A))) / n
comparison <- list(geex = list(estimates = coef(results), vcov = vcov(results)),
cls = list(estimates = theta_cls, vcov = Sigma_cls))The m_estimate function returns an object of the
S4 class geex, which contains an
estimates slot and vcov slot for \(\hat{\theta}\) and \(\hat{\Sigma}\), respectively. These slots
can be accessed by the functions coef (or
roots) and vcov. The point estimates obtained
for \(\theta_1\) and \(\theta_2\) are analogous to the base R
functions mean and var (after multiplying by
\(m-1/m\) for the latter). SB gave a
closed form for \(A(\theta_0)\) (an
identity matrix) and \(B(\theta_0)\)
(not shown here) and suggest plugging in sample moments to compute \(B(\hat{\theta})\). The point and variance
estimates using both geex and the analytic solutions are
shown below}. The maximum absolute difference between either the point
or variance estimates is 4e-11, thus demonstrating excellent agreement
between the numerical results obtained from geex and the
closed form solutions for this set of estimating equations and data.
## $geex
## $geex$estimates
## [1] 5.044563 10.041239
##
## $geex$vcov
## [,1] [,2]
## [1,] 0.10041239 0.03667969
## [2,] 0.03667969 2.49219638
##
##
## $cls
## $cls$estimates
## [1] 5.044563 10.041239
##
## $cls$vcov
## [,1] [,2]
## [1,] 0.10041239 0.03667969
## [2,] 0.03667969 2.49219638
This example calculates a ratio estimator and illustrates the delta method via M-estimation. The estimating equations target the means of \(Y_1\) and \(Y_2\), labelled \(\theta_1\) and \(\theta_2\), as well as the estimand \(\theta_3 = \theta_1/ \theta_2\).
\[ \psi(Y_{1i}, Y_{2i}, \theta) = \begin{pmatrix} Y_{1i} - \theta_1 \\ Y_{2i} - \theta_2 \\ \theta_1 - \theta_3\theta_2 \end{pmatrix} \]
The solution to \(\eqref{eq:psi}\) for this \(\psi\) function yields \(\hat{\theta}_3 = \bar{Y}_1 / \bar{Y}_2\), where \(\bar{Y}_j\) denotes the sample mean of \(Y_{j1}, \ldots,Y_{jm}\) for \(j=1,2\).
SB2_estfun <- function(data){
Y1 <- data$Y1; Y2 <- data$Y2
function(theta){
c(Y1 - theta[1],
Y2 - theta[2],
theta[1] - (theta[3] * theta[2])
)
}
}results <- m_estimate(
estFUN = SB2_estfun,
data = geexex,
root_control = setup_root_control(start = c(1, 1, 1)))# Comparison to an analytically derived sanwich estimator
A <- with(geexex, {
matrix(
c(1 , 0, 0,
0 , 1, 0,
-1, mean(Y1)/mean(Y2), mean(Y2)),
byrow = TRUE, nrow = 3)
})
B <- with(geexex, {
matrix(
c(var(Y1) , cov(Y1, Y2), 0,
cov(Y1, Y2), var(Y2) , 0,
0, 0, 0),
byrow = TRUE, nrow = 3)
})
## closed form roots
theta_cls <- c(mean(geexex$Y1), mean(geexex$Y2))
theta_cls[3] <- theta_cls[1]/theta_cls[2]
## closed form covariance
Sigma_cls <- (solve(A) %*% B %*% t(solve(A))) / nrow(geexex)
comparison <- list(geex = list(estimates = coef(results), vcov = vcov(results)),
cls = list(estimates = theta_cls, vcov = Sigma_cls))SB gave closed form expressions for \(A(\theta_0)\) and \(B(\theta_0)\), into which we plug in
appropriate estimates for the matrix components and compare to the
results from geex. The point estimates again show excellent
agreement (maximum absolute difference 4.4e-16), while the covariance
estimates differ by the third decimal (maximum absolute difference
1e-03).
comparison## $geex
## $geex$estimates
## [1] 5.044563 2.012793 2.506250
##
## $geex$vcov
## [,1] [,2] [,3]
## [1,] 0.100412389 -0.008718712 0.06074328
## [2,] -0.008718712 0.010693092 -0.01764626
## [3,] 0.060743283 -0.017646263 0.05215103
##
##
## $cls
## $cls$estimates
## [1] 5.044563 2.012793 2.506250
##
## $cls$vcov
## [,1] [,2] [,3]
## [1,] 0.10142666 -0.00880678 0.06135685
## [2,] -0.00880678 0.01080110 -0.01782451
## [3,] 0.06135685 -0.01782451 0.05267781
This example extends Example 1 to again illustrate the delta method. The estimating equations target not only the mean (\(\theta_1\)) and variance (\(\theta_2\)) of \(Y_1\), but also the standard deviation (\(\theta_3\)) and the log of the variance (\(\theta_4\)) of \(Y_1\).
\[ \psi(Y_{1i}, \mathbf{\theta}) = \begin{pmatrix} Y_{1i} - \theta_1 \\ (Y_{1i} - \theta_1)^2 - \theta_2 \\ \sqrt{\theta_2} - \theta_3 \\ \log(\theta_2) - \theta_4 \end{pmatrix} \]
SB3_estfun <- function(data){
Y1 <- data$Y1
function(theta){
c(Y1 - theta[1],
(Y1 - theta[1])^2 - theta[2],
sqrt(theta[2]) - theta[3],
log(theta[2]) - theta[4])
}
}## closed form roots
theta_cls <- numeric(4)
theta_cls[1] <- mean(geexex$Y1)
theta_cls[2] <- sum((geexex$Y1 - theta_cls[1])^2)/nrow(geexex)
theta_cls[3] <- sqrt(theta_cls[2])
theta_cls[4] <- log(theta_cls[2])
## Compare to closed form ##
theta2 <- theta_cls[2]
mu3 <- moments::moment(geexex$Y1, order = 3, central = TRUE)
mu4 <- moments::moment(geexex$Y1, order = 4, central = TRUE)
## closed form covariance
Sigma_cls <- matrix(
c(theta2, mu3, mu3/(2*sqrt(theta2)), mu3/theta2,
mu3, mu4 - theta2^2, (mu4 - theta2^2)/(2*sqrt(theta2)), (mu4 - theta2^2)/theta2,
mu3/(2 * sqrt(theta2)), (mu4 - theta2^2)/(2*sqrt(theta2)), (mu4 - theta2^2)/(4*theta2), (mu4 - theta2^2)/(2*theta2^(3/2)),
mu3/theta2, (mu4 - theta2^2)/theta2, (mu4 - theta2^2)/(2*theta2^(3/2)), (mu4/theta2^2) - 1) ,
nrow = 4, byrow = TRUE) / nrow(geexex)
## closed form covariance
# Sigma_cls <- (solve(A) %*% B %*% t(solve(A))) / n
comparison <- list(geex = list(estimates = coef(results), vcov = vcov(results)),
cls = list(estimates = theta_cls, vcov = Sigma_cls))SB again provided a closed form for \(A(\theta_0)\) and \(B(\theta_0)\), which we compare to the
geex results. The maximum absolute difference between
geex and the closed form estimates for both the parameters
and the covariance is 3.8e-11.
## $geex
## $geex$estimates
## [1] 5.044563 10.041239 3.168791 2.306700
##
## $geex$vcov
## [,1] [,2] [,3] [,4]
## [1,] 0.100412389 0.03667969 0.005787646 0.003652905
## [2,] 0.036679687 2.49219638 0.393240840 0.248196105
## [3,] 0.005787646 0.39324084 0.062049026 0.039162582
## [4,] 0.003652905 0.24819611 0.039162582 0.024717678
##
##
## $cls
## $cls$estimates
## [1] 5.044563 10.041239 3.168791 2.306700
##
## $cls$vcov
## [,1] [,2] [,3] [,4]
## [1,] 0.100412389 0.03667969 0.005787646 0.003652905
## [2,] 0.036679687 2.49219638 0.393240840 0.248196105
## [3,] 0.005787646 0.39324084 0.062049026 0.039162582
## [4,] 0.003652905 0.24819611 0.039162582 0.024717678