---
title: "Beta Diversity"
author: "N. Frerebeau"
date: "`r Sys.Date()`"
output:
  markdown::html_format:
    options:
      toc: true
      number_sections: true
vignette: >
  %\VignetteIndexEntry{Beta Diversity}
  %\VignetteEngine{knitr::knitr}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

$\beta$-diversity measures how different local systems are from one another (Moreno and Rodríguez 2010). **tabula** lets you calculate several turnover and similarity measures from a count table (absolute frequencies giving the number of individuals for each category, i.e. a contingency table).

*It assumes that you keep your data tidy*: each variable (type/taxon) must be saved in its own column and each observation (sample/case) must be saved in its own row.

```{r intro, fig.width=7, fig.height=5, fig.align="center"}
## Install extra packages (if needed)
# install.packages("folio") # Datasets

## Load packages
library(tabula)

## Ceramic data from Lipo et al. 2015
data("mississippi", package = "folio")

## Turnover
turnover(mississippi, method = "whittaker")

## Similarity
BR <- similarity(mississippi, method = "brainerd")

## Plot
plot_spot(BR, col = color("YlOrBr")(12))
```

Under the hood, the `index_*()` functions are called (see details below).
We denote the $m \times p$ incidence matrix by $X = \left[ x_{ij} \right] ~\forall i \in \left[ 1,m \right], j \in \left[ 1,p \right]$ and the $p \times p$ corresponding co-occurrence matrix by $Y = \left[ y_{ij} \right] ~\forall i,j \in \left[ 1,p \right]$, with row and column sums:

\begin{align}
 x_{i \cdot} = \sum_{j = 1}^{p} x_{ij} &&
 x_{\cdot j} = \sum_{i = 1}^{m} x_{ij} &&
 x_{\cdot \cdot} = \sum_{j = 1}^{p} \sum_{i = 1}^{m} x_{ij} &&
 \forall x_{ij} \in \lbrace 0,1 \rbrace \\
 y_{i \cdot} = \sum_{j \geqslant i}^{p} y_{ij} &&
 y_{\cdot j} = \sum_{i \leqslant j}^{p} y_{ij} &&
 y_{\cdot \cdot} = \sum_{i = 1}^{p} \sum_{j \geqslant i}^{p} y_{ij} &&
 \forall y_{ij} \in \lbrace 0,1 \rbrace
\end{align}

```{r woodland}
## Data from Magurran 1988, p. 162
## Values fill the matrix column by column: one block of six values per taxon
woodland <- matrix(
  data = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE,
           TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
           FALSE, FALSE, TRUE, FALSE, TRUE, FALSE,
           FALSE, FALSE, FALSE, TRUE, TRUE, TRUE,
           FALSE, FALSE, FALSE, FALSE, TRUE, TRUE,
           FALSE, FALSE, FALSE, TRUE, FALSE, TRUE),
  nrow = 6, ncol = 6
)
colnames(woodland) <- c("Birch", "Oak", "Rowan", "Beech", "Hazel", "Holly")
```

# Turnover

The following methods can be used to ascertain the degree of turnover in taxa composition along a gradient on qualitative (presence/absence) data. This assumes that the order of the matrix rows (from 1 to $m$) follows the progression along the gradient/transect.

Data are standardized on a presence/absence scale ($0$/$1$) beforehand.

## Whittaker (1960)

$$ \beta_W = \frac{S}{\alpha} - 1 $$

Where:

* $S$ is the total number of taxa recorded,
* $\alpha$ is the mean sample diversity: $\alpha = \frac{x_{\cdot \cdot}}{m}$.

```{r whittaker}
index_whittaker(woodland)
```

## Cody (1975)

$$ \beta_C = \frac{g(H) + l(H)}{2} $$

Where:

* $g(H)$ is the number of taxa gained along the transect,
* $l(H)$ is the number of taxa lost along the transect.
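
The two counts listed above can be illustrated with a minimal base R sketch. The helper `count_gain_loss()` and the `toy` matrix are hypothetical and are not part of **tabula**. The sketch assumes that a taxon is "gained" when it is absent from the first sample but recorded later, and "lost" when it is recorded somewhere but absent from the last sample; `index_cody()` may count them differently.

```r
## Hypothetical helper (not part of tabula): count taxa gained and lost
## along a transect whose samples are the rows of the input matrix.
## Assumption: "gained" = absent from the first sample but recorded later;
## "lost" = recorded in at least one sample but absent from the last one.
count_gain_loss <- function(x) {
  x <- x > 0                          # standardize to presence/absence
  m <- nrow(x)
  recorded <- colSums(x) > 0          # taxa observed at least once
  gained <- sum(recorded & !x[1, ])   # absent at the start of the transect
  lost <- sum(recorded & !x[m, ])     # absent at the end of the transect
  c(gained = gained, lost = lost)
}

## Toy transect (illustrative data): 3 samples in rows, 4 taxa in columns
toy <- matrix(
  data = c(1, 1, 0,   # taxon A: present in samples 1-2 only (lost)
           0, 2, 3,   # taxon B: absent from sample 1 (gained)
           4, 1, 1,   # taxon C: present throughout
           0, 1, 0),  # taxon D: present in sample 2 only (gained and lost)
  nrow = 3, ncol = 4,
  dimnames = list(NULL, c("A", "B", "C", "D"))
)

count_gain_loss(toy)
```

Under these assumptions, the `woodland` matrix defined earlier has four taxa gained (Rowan, Beech, Hazel and Holly) and two taxa lost (Birch and Rowan). A taxon that is absent from both ends of the transect, such as Rowan, is counted in both totals.
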
```{r cody}
index_cody(woodland)
```

## Routledge (1977)

### Routledge 1

$$ \beta_R = \frac{S^2}{2 y_{\cdot \cdot} + S} - 1 $$

```{r routledge1}
index_routledge1(woodland)
```

### Routledge 2

$$ \beta_I = \log x_{\cdot \cdot} - \frac{\sum_{j = 1}^{p} x_{\cdot j} \log x_{\cdot j}}{x_{\cdot \cdot}} - \frac{\sum_{i = 1}^{m} x_{i \cdot} \log x_{i \cdot}}{x_{\cdot \cdot}} $$

```{r routledge2}
index_routledge2(woodland)
```

### Routledge 3

$$ \beta_E = \exp(\beta_I) - 1 $$

```{r routledge3}
index_routledge3(woodland)
```

## Wilson & Shmida (1984)

$$ \beta_T = \frac{g(H) + l(H)}{2\alpha} $$

```{r wilson}
index_wilson(woodland)
```

# Similarity

Similarity between two samples $a$ and $b$ can be measured as follows.

These indices provide a scale of similarity from $0$ to $1$, where $1$ is perfect similarity and $0$ is no similarity, with the exception of the Brainerd-Robinson index, which is scaled between $0$ and $200$.

Thereafter, we denote by:

* $S_a$ and $S_b$ the total number of taxa observed in samples $a$ and $b$, respectively,
* $N_a$ and $N_b$ the total number of individuals in samples $a$ and $b$, respectively,
* $a_j$ and $b_j$ the number of individuals in the $j$-th type/taxon, $j \in \left[ 1,S \right]$,
* $o_j$ the number of types/taxa common to both samples/cases: $o_j = \sum_{k = 1}^{S} a_k \cap b_k$.

## Qualitative similarity measures

Data are standardized on a presence/absence scale ($0$/$1$) beforehand.

### Jaccard

$$ C_J = \frac{o_j}{S_a + S_b - o_j} $$

### Dice (1945) - Sorensen (1948)

$$ C_S = \frac{2 \times o_j}{S_a + S_b} $$

## Quantitative similarity measures

### Brainerd (1951) - Robinson (1951)

$$ C_{BR} = 200 - \sum_{j = 1}^{S} \left| \frac{a_j \times 100}{\sum_{j = 1}^{S} a_j} - \frac{b_j \times 100}{\sum_{j = 1}^{S} b_j} \right| $$

### Bray-Curtis

Bray and Curtis's (1957) modified version of the Dice-Sorensen index.

$$ C_N = \frac{2 \sum_{j = 1}^{S} \min(a_j, b_j)}{N_a + N_b} $$

### Morisita-Horn

Horn's (1966) modified version of the Morisita (1959) overlap index.

$$ C_{MH} = \frac{2 \sum_{j = 1}^{S} a_j \times b_j}{\left( \frac{\sum_{j = 1}^{S} a_j^2}{N_a^2} + \frac{\sum_{j = 1}^{S} b_j^2}{N_b^2} \right) \times N_a \times N_b} $$

# References

Brainerd, G. W. (1951). The Place of Chronological Ordering in Archaeological Analysis. *American Antiquity*, 16(4), 301-313. DOI: [10.2307/276979](https://doi.org/10.2307/276979).

Bray, J. R. & Curtis, J. T. (1957). An Ordination of the Upland Forest Communities of Southern Wisconsin. *Ecological Monographs*, 27(4), 325-349. DOI: [10.2307/1942268](https://doi.org/10.2307/1942268).

Cody, M. L. (1975). Towards a Theory of Continental Species Diversity: Bird Distributions Over Mediterranean Habitat Gradients. In M. L. Cody & J. M. Diamond (Eds.), *Ecology and Evolution of Communities*, 214-257. Cambridge, MA: Harvard University Press.

Dice, L. R. (1945). Measures of the Amount of Ecologic Association Between Species. *Ecology*, 26(3), 297-302. DOI: [10.2307/1932409](https://doi.org/10.2307/1932409).

Horn, H. S. (1966). Measurement of "Overlap" in Comparative Ecological Studies. *The American Naturalist*, 100(914), 419-424. DOI: [10.1086/282436](https://doi.org/10.1086/282436).

Moreno, C. E. & Rodríguez, P. (2010). A Consistent Terminology for Quantifying Species Diversity? *Oecologia*, 163(2), 279-282. DOI: [10.1007/s00442-010-1591-7](https://doi.org/10.1007/s00442-010-1591-7).

Morisita, M. (1959). Measuring of Interspecific Association and Similarity Between Communities. *Memoirs of the Faculty of Science, Kyushu University*, Series E, 3, 65-80.

Robinson, W. S. (1951). A Method for Chronologically Ordering Archaeological Deposits. *American Antiquity*, 16(4), 293-301. DOI: [10.2307/276978](https://doi.org/10.2307/276978).

Routledge, R. D. (1977). On Whittaker's Components of Diversity. *Ecology*, 58(5), 1120-1127.
DOI: [10.2307/1936932](https://doi.org/10.2307/1936932).

Sorensen, T. (1948). A Method of Establishing Groups of Equal Amplitude in Plant Sociology Based on Similarity of Species Content and Its Application to Analyses of the Vegetation on Danish Commons. *Kongelige Danske Videnskabernes Selskab*, 5(4), 1-34.

Whittaker, R. H. (1960). Vegetation of the Siskiyou Mountains, Oregon and California. *Ecological Monographs*, 30(3), 279-338. DOI: [10.2307/1943563](https://doi.org/10.2307/1943563).

Wilson, M. V. & Shmida, A. (1984). Measuring Beta Diversity with Presence-Absence Data. *The Journal of Ecology*, 72(3), 1055-1064. DOI: [10.2307/2259551](https://doi.org/10.2307/2259551).