---
title: "Regression in Gifi"
author: "Patrick Mair, Jan De Leeuw"
output: rmarkdown::html_vignette
bibliography: gifi.bib
vignette: >
  %\VignetteIndexEntry{Regression in Gifi}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
  
```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```


This vignette shows various regression options with optimal scaling (OS). In Gifi this idea is called **morals** [@Young:1976]. We start with simple linear regression, followed by other models of increasing complexity. 

Throughout this unit we use a dataset included in the `MPsychoR` package on emotion development. Participant's age and gender are predictors, and granularity is the response. Granularity refers to a person's ability to separate their emotions into specific types. People with low granularity (here scaled in the positive direction) struggle to separate their emotions (e.g., reporting that sadness, anger, fear, and others all just feel "bad"), whereas people with high granularity are very specific in how they parse their emotions (e.g., easily distinguishing between nuanced emotions like disappointment and frustration).


## Linear Regression

We first focus on simple regression settings with `age` as predictor and `granularity` as response. We start with emulating standard **linear regression**. We standardize the variables such that we get standardized regression coefficients right away.

```{r, fig.width=5, fig.height=5}
library("Gifi")
library("MPsychoR")
data("granularity")
granularity1 <- scale(granularity[,1:2]) |> as.data.frame()
head(granularity1)
plot(granularity1[,2:1], main = "Scatterplot")
```

```{r}
fitlin1 <- lm(gran ~ -1 + age, data = granularity1)
coef(fitlin1)
```

We get a standardized slope of `r round(coef(fitlin1), 4)`, and an $R^2$ of `r round(summary(fitlin1)$r.squared, 3)`. 

Now we mimic this model using **morals**. Note that here we operate on the unstandardized, original data. Since we are performing linear (metric) transformations on granularity as well as on age, `morals()` will standardize the data internally. Note that `morals()` requires to set up the linear transformations using splines: `type = "E"` in `knotsGifi()` implies no interior knots, `xdegrees = 1` and `ydegrees = 1` tells `morals()` to fit a polynomial of degree 1, and we also set the ordinal arguments to `FALSE`.

```{r}
xknots_age <- knotsGifi(granularity$age, type = "E")      
yknots_gran <- knotsGifi(granularity$gran, type = "E")
fitlin2 <-  morals(x = granularity$age, y = granularity$gran, 
   xknots = xknots_age, yknots = yknots_gran, xdegrees = 1, ydegrees = 1,
   xordinal = FALSE, yordinal = FALSE)
fitlin2
```

The results are the same as in the `lm()` fit above. 


## Polynomial Regression

Let us extend these models by fitting **quadratic regressions**. First, we call `lm()` on the unstandardized data (we could also use the standardized data, it doesn't matter). Then we emulate this quadratic regression with `morals()`. Note that, as opposed to the linear fit, we set the polynomial degree of the predictor to a value of 2.

```{r}
fitquad1 <- lm(gran ~ age + I(age^2), data = granularity)
fitquad1
fitquad2 <-  morals(x = granularity$age, y = granularity$gran, 
  xknots = xknots_age, yknots = yknots_gran, xdegrees = 2, ydegrees = 1, 
  xordinal = FALSE, yordinal = FALSE) 
fitquad2
```

Let us produce two plots based on the quadratic `morals()` fit and then explain how `morals()` achieves a quadratic fit. The top panel shows the *optimally scaled* predictor and response, including the regression line. The bottom panel shows quadratic line when age is kept on the *original scale* while the response is on the transformed scale.

```{r, echo = 2:6, fig.width=5, fig.height=7}
op <- par(mfrow = c(2,1))
plot(fitquad2$xhat, fitquad2$yhat, xlab = "Age (transformed)", ylab = "Granularity (transformed)", 
     main = "Optimally Scaled Scatterplot")
lines(fitquad2$xhat, fitquad2$ypred, col = "coral4", lwd = 2)
plot(granularity$age, fitquad2$yhat, xlab = "Age", ylab = "Granularity (transformed)", 
     main = "Quadratic Morals Fit")
ind <- order(granularity$age)
lines(granularity$age[ind], fitquad2$ypred[ind], col = "coral4", lwd = 2)
par(op)
```

For both models we get an $R^2$ of `r round(summary(fitquad1)$r.squared, 3)`. This reflects a substantial increase compared to the linear fit. 

Both `lm()` and `morals()` calls lead to the same results even though in the LM fit we get 2 slope parameters (one for the linear term and one for the quadratic term), whereas in `morals()` we only get a single one. 

In the `morals()` call we specified a linear transformation for the response and a quadratic transformation for the predictor which leads to a quadratic regression fit (see `plot(fitquad2, plot.type = "transplot")`). Internally, `morals()` scales both variables according to these transformation functions in a way such that their relationship becomes linear (sometimes called *bilinearizability*; see top panel of the figure above). The parameter we get from the `morals()` output reflects the slope of this line. The sign of this parameter is irrelavant since it can happen that, based on the starting values used for the OS process, the sign of the transformed age scores can be switched.

If we produce the scatterplot with the original age scale we see what we actually fitted (bottom panel of the figure above): a quadratic regression where the granularity score decreases for the younger participants and increases for the older ones. On the y-axis we plot the transformed (i.e., standardized) response in order to be able to picture the curve. A cubic regression can be fitted by setting `xdegrees=3`. 

## Piecewise Linear Regression

The next model we fit is a **piecewise linear regression**. That is, we want to fit two lines which connect at a particular age knot. In spline terminology we fit a spline of degree 1 with one interior knot. Using the knots utility function for setting quantile knots, it places the knot at the median.

```{r}
xknots_age2 <- knotsGifi(granularity$age, "Q", n = 1)
xknots_age2
```

Alternatively we could also set a knot at a particular age value of interest (e.g., 18 years) by modifying `xknots_age2` as follows: 

```{r}
xknots_age2[[1]] <- 18
```

Now we fit the piecewise regression model and produce the effects plot: 

```{r}
fitpiece <-  morals(x = granularity$age, y = granularity$gran, 
   xknots = xknots_age2, yknots = yknots_gran, xdegrees = 1, ydegrees = 1,
   xordinal = FALSE, yordinal = FALSE)
fitpiece
```

```{r, echo = 2:6, fig.width=5, fig.height=7}
op <- par(mfrow = c(2,1))
plot(fitpiece$xhat, fitpiece$yhat, xlab = "Age (transformed)", ylab = "Granularity (transformed)", 
     main = "Optimally Scaled Scatterplot")
lines(fitpiece$xhat, fitpiece$ypred, col = "coral4", lwd = 2)
plot(granularity$age, fitpiece$yhat, xlab = "Age", ylab = "Granularity (transformed)", 
     main = "Piecewise Linear Morals Fit")
ind <- order(granularity$age)
lines(granularity$age[ind], fitpiece$ypred[ind], col = "coral4", lwd = 2)
par(op)
```

We get an $R^2$ of `r round(fitpiece$smc, 3)`. The regression fit with the break at 18 years is shown in the bottom panel of the figure above. 

## Spline Regression

Next, we fit a **spline regression** using a quadratic spline (`xdegrees = 2`) with 3 interior quantile knots (i.e., we set them at the terciles), and produce the effects plots as above:

```{r}
xknots_age3 <- knotsGifi(granularity$age, "Q", n = 3)
xknots_age3
fitspline <-  morals(granularity$age, granularity$gran, 
   xknots = xknots_age3, yknots = yknots_gran, xdegrees = 2, ydegrees = 1,
   xordinal = FALSE, yordinal = FALSE)
fitspline
```
  
  
```{r, echo = 2:6, fig.width=5, fig.height=7}
op <- par(mfrow = c(2,1))
plot(fitspline$xhat, fitspline$yhat, xlab = "Age (transformed)", ylab = "Granularity (transformed)", 
     main = "Optimally Scaled Scatterplot")
lines(fitspline$xhat, fitspline$ypred, col = "coral4", lwd = 2)
plot(granularity$age, fitspline$yhat, xlab = "Age", ylab = "Granularity (transformed)", 
     main = "Spline Morals Fit")
ind <- order(granularity$age)
lines(granularity$age[ind], fitspline$ypred[ind], col = "coral4", lwd = 2)
par(op)
```


Compared to the quadratic and the piecewise linear fit, there is only a slight improvement in $R^2$ `r round(fitspline$smc, 3)`. The bottom panel of the figure above shows the corresponding smooth regression fit. 

## Monotone Regression

Next we present a **monotone regression** fit. Montone regression fits a step function into the scatterplot. If the step function is monotonically decreasing, it is called *antitone regression*. If it is monotonically increasing, *isotone regression*. Isotone regression is described in detail in @deLeeuw+Hornik+Mair:2009, who also provide a flexible implementation of various isotone regression techniques by means of the package `isotone`. In `Gifi`, a monotone regression can be fitted as follows (the knots are set at the data points). 

```{r}
xknots_age4 <- knotsGifi(granularity$age, "D")
fitmono <-  morals(x = granularity$age, y = granularity$gran, 
  xknots = xknots_age4, yknots = yknots_gran, ydegrees = 1, yordinal = FALSE)
```

```{r, echo = 2:6, fig.width=5, fig.height=7}
op <- par(mfrow = c(2,1))
plot(fitmono$xhat, fitmono$yhat, xlab = "Age (transformed)", ylab = "Granularity (transformed)", 
     main = "Optimally Scaled Scatterplot")
lines(fitmono$xhat, fitmono$ypred, col = "coral4", lwd = 2)
plot(granularity$age, fitmono$yhat, xlab = "Age", ylab = "Granularity (transformed)", 
     main = "Monotone Morals Fit")
sfun <- stepfun(sort(granularity$age)[-nrow(granularity)], fitmono$ypred[order(granularity$age)])
plot(sfun, col = "coral4", add = TRUE, pch = 19, cex = 0.7, lwd = 2)
par(op)
```

Obviously this U-shaped relationship is not the best example for a monotone regression fit. Still, it shows nicely how monotone regression works. In the figure above we see that after a certain point the descending step function is not able anymore to capture the increasing granularity pattern for the older participants, since it is restricted to be monotone. Thus, it becomes a flat line. The $R^2$ of `r round(fitmono$smc, 3)` is still good because the model is very precise for the younger participants. 

## Nominal Transformation

The last model we fit uses nominal transformation of age which leads to the most flexible specification. This model is nonsense, of course: aside from totally overfitting the data, it doesn't take any metric or ordinal `age` information into account. But it certainly demonstrates the flexibility of morals.


```{r}
xknots_age5 <- knotsGifi(granularity$age, "D")
fitnom <-  morals(granularity$age, granularity$gran, 
   xknots = xknots_age5, yknots = yknots_gran, xdegrees = 1, ydegrees = 1,
   xordinal = FALSE, yordinal = FALSE)
```

```{r, echo = 2:6, fig.width=5, fig.height=7}
op <- par(mfrow = c(2,1))
plot(fitnom$xhat, fitnom$yhat, xlab = "Age (transformed)", ylab = "Granularity (transformed)", 
     main = "Optimally Scaled Scatterplot")
lines(fitnom$xhat, fitnom$ypred, col = "coral4", lwd = 2)
plot(granularity$age, fitnom$yhat, xlab = "Age", ylab = "Granularity (transformed)", 
     main = "Nominal Morals Fit")
ind <- order(granularity$age)
lines(granularity$age[ind], fitnom$ypred[ind], col = "coral4", lwd = 2)
par(op)
```

This model has a remarkable $R^2$-value of `r round(fitnom$smc, 3)`. But we overfit, of course.


## Cross-Validation

At this point the question is: Which model should we use? We have seen that, apart from the linear and the nominal model, all $R^2$-values were in the 0.2 range. In order to examine potential overfitting and stability of these models, we can perform a $K$-fold *cross-validation* (CV). Avery basic version is implemented in `cv.morals()`, subject to further refinement in the future. For a typical value of $K=10$, the idea is to leave out 10 observations at random, fit `morals()` without these observations, and then use this model to predict the left out observations. For each model we compute a `CV error`: 

\begin{equation}
CV_{(K)} = \frac{1}{K} \sum_{k=1}^K ({\hat y_k} - y_k)^2,
\end{equation}

where $y_k$ are the observed (but scaled) response values in fold $k$, and $\hat y_k}$ are the predicted values based on the fit without the left-out observations. In other words, for prediction we use the optimally scaled linear morals model as shown in the left panels of the figures above. The smaller $CV_{(K)}$, the less we tend to overfit the data. 

```{r cvmorals, cache=TRUE}
set.seed(123)
cvlin <- cv(fitlin2, folds = 10)
cvquad <- cv(fitquad2, folds = 10)
cvpiece <- cv(fitpiece, folds = 10)
cvspline <- cv(fitspline, folds = 10)
cvmono <- cv(fitmono, folds = 10)
cvnom <- cv(fitnom, folds = 10)
cvvec <- c(cvlin, cvquad, cvpiece, cvspline, cvmono, cvnom)
r2vec <- c(fitlin2$smc, fitquad2$smc, fitpiece$smc, fitspline$smc,
          fitmono$smc, fitnom$smc)
cvr2 <- cbind(cvvec, r2vec)
dimnames(cvr2) <- list(c("linear", "quadratic", "piecewise", "spline",
                         "monotone", "nominal"), c("CV-error", "R2"))
round(cvr2, 5)
```

For the nominal model the CV-error is large, but the $R^2$ is high. Conversely, for the linear model is the CV-error is low, but the $R^2$ is bad. 

## Multiple Linear Regression 

Morals can easily be extended to multiple predictors. The starting point is the linear model from above. We add a categorical predictor (`gender`; converted into numerical), and an interaction between `gender` and `age`. 

```{r}
granularity2 <- granularity 
granularity2$gender <- as.numeric(granularity$gender)-1
granularity2 <- scale(granularity2) |> as.data.frame()
head(granularity2)
fitmlin1 <- lm(gran ~ -1 + age*gender, data = granularity2)
fitmlin1
```

Now we mimic this model using `morals()`. We operate on the standardized data and create the interaction terms as follows: 

```{r}
granularity2$int <- granularity2$age * granularity2$gender
xknots_age <- knotsGifi(granularity2[,2:4], "E")  
yknots_gran <- knotsGifi(granularity2$gran, "E")
fitmlin2 <-  morals(x = granularity2[,2:4], y= granularity2$gran, 
   xknots = xknots_age, yknots = yknots_gran, xdegrees = 1, ydegrees = 1,
   xordinal = FALSE, yordinal = FALSE)
fitmlin2
```

The `morals()` result matches the one from the `lm()` call.

## Multiple Ordinal Regression

So far we have not touched the response variable in terms of using a transformation other than linear (which just standardizes $Y$). Let us fit an ordinal response version using `morals()`. We categorize the `granularity` into a Likert-type item (5 categories). First we fit a *proportional odds model* using `polr()` with a quadratic `age` effect and a main effect for `gender`. By default, `polr()` uses a logit link.

```{r}
library("MASS")
grancat <- cut(granularity$gran, 5, labels = 1:5)
fitord1 <- polr(grancat ~ age + I(age^2) + gender, data = granularity)
summary(fitord1)
```

There is basically no gender effect. In `morals()` we do not need to specify a particular link function. Having an ordinal response we can simply apply an ordinal transformation on it. 

```{r}
granularity3 <- granularity 
granularity3$gender <- as.numeric(granularity$gender)
xknots_age <- knotsGifi(granularity3[,2:3], "E")  
yknots_gran2 <- knotsGifi(grancat, "D")
fitord2 <-  morals(x = granularity3[,2:3], y = as.numeric(grancat),
  xknots = xknots_age, yknots = yknots_gran2, xdegrees = c(2, -1), ydegrees = 1, 
  xordinal = FALSE, yordinal = TRUE) 
fitord2
```

```{r, fig.height=7, fig.width=7}
plot(fitord2, "transplot", main = c("Granularity Categorical", "Age", "Gender"))
```

The figure shows the transformation plots. For the response we have an ordinal transformation which stretches/squeezes the 1-5 input categories. Age is quadratic, and gender is taken as nominal. The parameters do not have the same meaning as is a logistic regression context. 

From the coefficients and in the plot below we see that there is a large effect for age, and the gender effect is close to 0. Again, these effects represent the slope parameters in the linearized regressions based on the transformed scores (see figures below). 

```{r echo = 2:11, fig.width=5, fig.height=7}
op <- par(mfrow = c(2,1))
plot(fitord2$xhat[,1], fitord2$yhat, xlab = "Age (transformed)", ylab = "Granularity (transformed)", 
     main = "Optimally Scaled Scatterplot")
lines(fitord2$xhat[,1][granularity3$gender == 1], fitord2$ypred[granularity3$gender == 1], col = "coral4", lwd = 2)
lines(fitord2$xhat[,1][granularity3$gender == 2], fitord2$ypred[granularity3$gender == 2], col = "cadetblue", lwd = 2)
legend(-0.02, 0.14, bty = "n", legend = c("male", "female"), lty = 1, col = c("coral4", "cadetblue"))
plot(granularity3$age, fitord2$yhat, xlab = "Age", ylab = "Granularity (transformed)", main = "Ordinal Polynomial Morals Fit")
ind1 <- order(granularity3$age[granularity3$gender == 1])
lines(granularity3$age[granularity3$gender == 1][ind1], fitord2$ypred[granularity3$gender == 1][ind1], col = "coral4", lwd = 2)
ind2 <- order(granularity3$age[granularity3$gender == 2])
lines(granularity3$age[granularity3$gender == 2][ind2], fitord2$ypred[granularity3$gender == 2][ind2], col = "cadetblue", lwd = 2)
legend(20, 0.14, bty = "n", legend = c("male", "female"), lty = 1, col = c("coral4", "cadetblue"))
par(op)
```

<br>
<br>