--- title: "nonparametric distribution" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{nonparametric} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup} library(denim) ``` ## Non-parametric vs parametric In denim, users have 2 options to define dwell-time distribution: - As a parametric distribution: using `d_*()` functions. - As a non-parametric distribution: using `nonparametric()` function and user must provide the histogram of distribution where bin-width matches `timeStep`. ### Example 1 To demonstrate the difference between 2 approaches, we can try modeling an SIR model with Weibull distributed infectious period using `nonparametric()` and `d_weibull()`. ***Model definition using `d_weibull()`*** ```{r} sir_parametric <- denim_dsl({ S -> I = beta * (I/N) * S * timeStep I -> R = d_weibull(scale = r_scale, shape = r_shape) }) ``` Model parameters that must be defined are: `beta`, `N`, `r_scale`, `r_shape` ***Model definition using*** ***`nonparametric()`*** ```{r} sir_nonparametric <- denim_dsl({ S -> I = beta * (I/N) * S * timeStep I -> R = nonparametric(dwelltime_dist) }) ``` Model parameters that must be defined are: `beta`, `N`, `dwelltime_dist` (the discrete dwell time distribution) ***Run model*** We will run both models under the following model settings ```{r} # parameters mod_params <- list( beta = 0.4, N = 1000, r_scale = 4, r_shape = 3 ) # initial population init_vals <- c(S = 950, I = 50, R = 0) # simulation duration and timestep sim_duration <- 30 timestep <- 0.05 ``` Running the model with `d_weibull()` is straight forward ```{r} parametric_mod <- sim(sir_parametric, initialValues = init_vals, parameters = mod_params, simulationDuration = sim_duration, timeStep = timestep) plot(parametric_mod, ylim = c(0, 1000)) ``` However, to run the model using `nonparametric()`, we first need to compute the discrete dwell time distribution (`dwelltime_dist`). Since all parametric distributions are asymptotic to 1, we will set the maximal dwell time as the time point where the cumulative probability is sufficiently close to 1 (i.e. above the threshold `1 - error_tolerance`). A helper function to compute discrete dwell time distribution from a distribution function in R is provided below.
Helper function ```{r} # Compute discrete distribution of dwell-tinme # dist_func - R distribution function for dwell time (pexp, pgamma, etc.) # ... - parameters for dist_func compute_dist <- function(dist_func,..., timestep=0.05, error_tolerance=0.0001){ maxtime <- timestep prev_prob <- 0 prob_dist <- numeric() while(TRUE){ # get current cumulative prob and check whether it is sufficiently close to 1 temp_prob <- ifelse( dist_func(maxtime, ...) < (1 - error_tolerance), dist_func(maxtime, ...), 1); # get f(t) curr_prob <- temp_prob - prev_prob prob_dist <- c(prob_dist, curr_prob) prev_prob <- temp_prob maxtime <- maxtime + timestep if(temp_prob == 1){ break } } prob_dist } ```
We can then run the model as followed ```{r} # Compute the discrete distribution dwelltime_dist <- compute_dist(pweibull, scale = mod_params$r_scale, shape = mod_params$r_shape, timestep = timestep) # Compute the discrete distribution nonparametric_mod <- sim(sir_nonparametric, initialValues = init_vals, parameters = list( beta = mod_params$beta, N = mod_params$N, dwelltime_dist = dwelltime_dist ), simulationDuration = sim_duration, timeStep = timestep) plot(nonparametric_mod, ylim = c(0, 1000)) ``` ### Example 2 By using `nonparametric()`, we can run the model with any dwell time distribution shape Consider the following multimodal distribution. ```{r echo=FALSE} first_dist <- compute_dist(pweibull, scale = 1.5, shape = 4, timestep = timestep) second_dist <- compute_dist(pweibull, scale = 3, shape = 3.5, timestep = timestep) first_dist <- c(rep(0, length(second_dist) - length(first_dist)), first_dist) multimodal_dist <- first_dist + second_dist ``` ```{r} timestep <- 0.05 plot(seq(0, by = 0.05, length.out = length(multimodal_dist)), multimodal_dist, type = "l", col = "#374F77", lty = 1, lwd = 3, xlab = "Length of stay (days)", ylab = "", yaxt = 'n') ``` We can also run the `sir_nonparametric` model from last example with this dwell time distribution ```{r} # model parameter parameters <- list(beta = 0.4, N = 1000, dwelltime_dist = multimodal_dist) # initial population init_vals <- c(S = 950, I = 50, R = 0) # simulation duration and timestep sim_duration <- 30 timestep <- 0.05 # Run the model with multimodel distribution nonparametric_mod <- sim( sir_nonparametric, initialValues = init_vals, parameters = parameters, simulationDuration = sim_duration, timeStep = timestep) plot(nonparametric_mod, ylim = c(0, 1000)) ```