---
title: "4. Circular statistics"
author: "Tobias Stephan"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{4. Circular statistics}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  markdown: 
    wrap: 72
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

This vignette teaches you how to retrieve the mean direction of stress 
datasets. 

```{r setup, echo=TRUE, message=FALSE}
library(tectonicr)
library(ggplot2) # load ggplot library
```

## Mean direction
Directional data is $\pi$-periodical. 
Thus, for the calculation of mean, the average of 35 and 355$^{\circ}$ 
should be 15 instead of 195$^{\circ}$. 
**tectonicr** provides the circular mean (`circular_mean()`) and the 
quasi-median (`circular_median()`) as metrics to describe average direction:

```{r mean, echo=TRUE}
data("san_andreas")
circular_mean(san_andreas$azi)
circular_median(san_andreas$azi)
```

## Quality weighted mean direction
Because the stress data is heteroscedastic, the data with less precise 
direction should have less impact on the final mean direction The weighted 
mean or quasi-median uses the reported measurements weighted 
by the inverse of the uncertainties:
```{r weighted, echo=TRUE}
circular_mean(san_andreas$azi, 1 / san_andreas$unc)
circular_median(san_andreas$azi, 1 / san_andreas$unc)
```

The spread of directional data can be expressed by the standard deviation (for 
the mean) or the quasi-interquartile range (for the median):
```{r weighted_spread, echo=TRUE}
circular_sd(san_andreas$azi, 1 / san_andreas$unc) # standard deviation
circular_IQR(san_andreas$azi, 1 / san_andreas$unc) # interquartile range
```


## Statistics in the Pole of Rotation (PoR) reference frame

**NOTE:** Because the $\sigma_{SHmax}$ orientations are subjected to angular 
distortions in the geographical coordinate system, it is recommended to express
statistical parameters using the transformed orientations of the PoR reference 
frame. 
```{r por, echo=TRUE}
data("cpm_models")
por <- cpm_models[["NNR-MORVEL56"]] |>
  equivalent_rotation("na", "pa")
san_andreas.por <- PoR_shmax(san_andreas, por, type = "right")
```


```{r por_stats, echo=TRUE}
circular_mean(san_andreas.por$azi.PoR, 1 / san_andreas$unc)
circular_sd(san_andreas.por$azi.PoR, 1 / san_andreas$unc)

circular_median(san_andreas.por$azi.PoR, 1 / san_andreas$unc)
circular_IQR(san_andreas.por$azi.PoR, 1 / san_andreas$unc)
```

The collected summary statistics can be quickly obtained by `circular_summary()`:
```{r summary_stats, echo=TRUE}
circular_summary(san_andreas.por$azi.PoR, 1 / san_andreas$unc, mode = TRUE)
```

The summary statistics additionally include the circular quasi-quantiles, the 
variance, the skewness, the kurtosis, the mode, the 95% confidence angle, and 
the mean resultant length (R).

## Rose diagram
**tectonicr** provides a rose diagram, i.e. histogram for angular data. 
```{r rose1, echo=TRUE}
rose(san_andreas$azi,
  weights = 1 / san_andreas$unc, main = "North pole",
  dots = TRUE, stack = TRUE, dot_cex = 0.5, dot_pch = 21
)

# add the density curve
plot_density(san_andreas$azi, kappa = 20, col = "#51127CFF", shrink = 1.5)
```

The diagram shows the uncertainty-weighted frequencies (equal area rose fans), 
the von Mises density distribution (blue curve), and the circular mean (red line) 
incl. its 95% confidence interval (transparent red). 


Showing the distribution of the transformed data gives the better representation
of the angle distribution as there is no angle distortion due to the arbitrarily 
chosen geographic coordinate system.
```{r rose2, echo=TRUE}
rose(san_andreas.por$azi,
  weights = 1 / san_andreas$unc, main = "PoR",
  dots = TRUE, stack = TRUE, dot_cex = 0.5, dot_pch = 21
)
plot_density(san_andreas.por$azi, kappa = 20, col = "#51127CFF", shrink = 1.5)

# show the predicted direction
rose_line(135, radius = 1.1, col = "#FB8861FF")
```

The green line shows the predicted direction.

## QQ Plot

The (linearised) circular QQ-Plot (`circular_qqplot()`) can be used to visually 
assess whether our stress sample is drawn from an uniform distribution or has 
a preferred orientation.

```{r qqplot, echo=TRUE}
circular_qqplot(san_andreas.por$azi.PoR)
```

Our data clearly deviates from the diagonal line, indicating the data is not randomly distributed and has a strong preferred orientation around the 50% quantile.


## Statistical tests
### Test for random distribution
Uniformly distributed orientation can be described by the 
*von Mises distribution* (Mardia and Jupp, 1999). If the directions are 
distributed randomly can be tested with the **Rayleigh Test**:
```{r random, echo=TRUE}
rayleigh_test(san_andreas.por$azi.PoR)
```

Here, the test rejects the Null Hypothesis (`statistic > p.value`). 
Thus the $\sigma_{SHmax}$ directions have a preferred orientation.

Alternative statistical tests for circular uniformity are `kuiper_test()` and 
`watson_test()`. Read `help()` for more details...


## Test for goodness-of-fit
Assuming a von Mises Distribution (circular normal distribution) of the 
orientation data, a $100(1-\alpha)\%$  **confidence interval** can be 
calculated (Mardia and Jupp, 1999):
```{r confidence, echo=TRUE}
confidence_interval(san_andreas.por$azi.PoR, conf.level = 0.95, w = 1 / san_andreas$unc)
```
The prediction for the $\sigma_{SHmax}$ orientation is $135^{\circ}$. 
Since the prediction lies within the confidence interval, it can be concluded 
with 95% confidence that the orientations follow the predicted trend of 
$\sigma_{SHmax}$.

The (weighted) **circular dispersion** of the orientation angles around the 
prediction is another way of assessing the significance of a normal distribution
around a specified direction. It can be measured by:
```{r dispersion, echo=TRUE}
circular_dispersion(san_andreas.por$azi.PoR, y = 135, w = 1 / san_andreas$unc)
```

The value of the dispersion ranges between 0 and 2.

The standard error and the confidence interval of the calculated circular 
dispersion can be estimated by bootstrapping via:
```{r dispersion_MLE, echo=TRUE}
circular_dispersion_boot(san_andreas.por$azi.PoR, y = 135, w = 1 / san_andreas$unc, R = 1000)
```


The statistical test for the goodness-of-fit is the (weighted) **Rayleigh Test** 
with a specified mean direction (here the predicted direction of $135^{\circ}$:
```{r rayleigh2, echo=TRUE}
weighted_rayleigh(san_andreas.por$azi.PoR, mu = 135, w = 1 / san_andreas$unc)
```

Here, the Null Hypothesis is rejected, and thus, the alternative, i.e. an 
non-uniform distribution with the predicted direction as the mean cannot be excluded.


# References
Mardia, K. V., and Jupp, P. E. (Eds.). (1999). "Directional Statistics" 
Hoboken, NJ, USA: John Wiley & Sons, Inc. 
doi: 10.1002/9780470316979.
<!--doi: [10.1002/9780470316979](https://doi.org/10.1002/9780470316979).-->

Ziegler, Moritz O., and Oliver Heidbach. 2017. “Manual of the Matlab Script Stress2Grid” 
GFZ German Research Centre for Geosciences; World Stress Map Technical Report 17-02. 
doi: [10.5880/wsm.2017.002](https://doi.org/10.5880/wsm.2017.002).