---
title: "Using FLORAL for survival models with longitudinal microbiome data"
output: 
  rmarkdown::html_vignette:
    md_extensions: [ 
      "-autolink_bare_uris" 
    ]
vignette: >
  %\VignetteIndexEntry{Using FLORAL for survival models with longitudinal microbiome data}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  chunk_output_type: console
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  out.width = "100%"
)
```

```{r setup, warning=FALSE, message=FALSE}
library(FLORAL)
library(dplyr)
library(patchwork)
library(survival)
set.seed(8192024)
```

In this vignette, we illustrate how to apply `FLORAL` to fit a Cox model with longitudinal microbiome data. Due to limited availability of public data sets with survival information, we use simulated data for illustrative purposes.

## Data simulation
We will use the built-in simulation function `simu()` to generate longitudinal compositional features and the corresponding time-to-event. The underlying methodology used for the simulation is based on a piece-wise exponential distribution as described by [Hendry 2014](https://doi.org/10.1002/sim.5945).

By default, the first 10 features out of the 500 features simulated below are associated with the time-to-event.

```{r simulation}

simdat <- simu(n=200, # sample size
               p=500, # number of features
               model="timedep",
               pct.sparsity = 0.8, # proportion of zeros
               rho=0, # feature-wise correlation
               longitudinal_stability = TRUE # choose to simulate longitudinal features with stable trajectories
)

```

With the simulated data, the log-ratio lasso Cox model with time-dependent features can be fitted by running the following function. Here we provide a detailed description on each arguments:

* First of all, please use `longitudinal = TRUE` such that the algorithm would use the appropriate method to handle longitudinal data.
* The feature matrix input `x` should be the count matrix where rows specify samples and columns specify features.
* The vector of IDs of subjects/patients corresponding to the rows of `x` should be input as `id`.
* The vector of sample collection times corresponding to the rows of `x` should be input as `tobs`.
* The `Surv` object (`Surv(time,status)`) of **unique patients** should be input as `y`. Please note that the survival data should be sorted with respect to the IDs specified in `id`. 

```{r FLORAL, warning=FALSE, message=FALSE}

fit <- FLORAL(x=simdat$xcount,
              y=Surv(simdat$data_unique$t,simdat$data_unique$d),
              family="cox",
              longitudinal = TRUE,
              id = simdat$data$id,
              tobs = simdat$data$t0,
              progress=FALSE,
              plot=TRUE)

fit$selected
  
```

The list of selected features is saved in `fit$selected` as shown above.

To appropriately prepare the data in practice, we have the following recommendations:

* Start with patient metadata which includes survival data (time and status), sorting the metadata by patient IDs. Extract time and status variables for the `Surv` object for input as `y`.
* Curate the microbiome feature data matrix, sorted by patient IDs and time of sample collection. Save the patient ID and time of sample collection vectors for `id` and `tobs`. Save the feature table for input as `x`.