SNSeg supports change-points estimation for both univariate and multivariate time series (including high-dimensional time series with dimension greater than 10.) using Self-Normalization (SN) based framework. Please read Zhao, Jiang and Shao (2022) <doi.org/10.1111/rssb.12552> for details of the SN-based algorithms.
The package contain three functions for change-points estimation:
SNSeg_Uni works for a univariate time
series.SNSeg_Multi works for multivariate time
series with dimension up to ten.SNSeg_HD works for high-dimensional time
series with dimension above ten.All functions contain two important input arguments:
grid_size_scale and grid_size.
grid_size_scale: the trimming parameter \(\epsilon\), a parameter to control the
local window size grid_size. The range of
grid_size_scale should be between 0.05 and 0.5. Any input
grid_size_scale smaller than 0.05 will be automatically
changed to 0.05, and any input grid_size_scale greater than
0.5 will be automatically changed to 0.5.grid_size: the local window size \(h\) for local scanning for each time point.
According to Zhao et al. (2022), grid_size is a product of
grid_size_scale and the length of time.Users can set their own grid_size or leave it as
NULL in the input arguments. If grid_size is
NULL, these functions will compute the SN test statistic
using the value of grid_size_scale. If
grid_size is set by users, the functions will first
calculate grid_size_scale by diving the length of time, and
then compute the SN test statistic.
For the other input arguments: * ts: Users should enter
a time series for this argument. For SNSeg_Uni, the
dimension of ts should be exactly one in most cases (or two
when paras_to_test = 'bivcor'); for
SNSeg_Multi, the dimension must be at least two but no more
than ten; for SNSeg_HD, the dimension must be greater than
ten. * paras_to_test: This argument in functions
SNSeg_Uni and SNSeg_Multi allows users to
enter the parameter(s) they would like to test the change in. *
confidence: Users should choose a confidence level among
0.9, 0.95, 0.99, 0.995 and 0.995. A smaller confidence level is easier
to reject the null hypothesis and detect a change-point.
The package also offers an option to plot the time series (this only
works for the univariate time series cases!) Users need to set
plot_SN = TRUE to visualize the time series, and
est_cp_loc = TRUE to add the estimated change-point
locations in the plot.
max_SNsweepTo visualize the computed SN-based test statistic at each time point,
users can apply the function max_SNsweep by plugging in the
output object from one of the functions SNSeg_Uni,
SNSeg_Multi and SNSeg_HD. The options
est_cp_loc = TRUE and critical_loc = TRUE are
provided to draw the estimated change-point locations and the critical
value threshold inside the test statistic plot.
max_SNsweep also returns the SN-based test statistic for
each time point. A large number of test statistics in the output can be
messy for users who only seek to generate a plot. To hide the test
statistics output, users can create an arbitrary variable name and set
it to the max_SNsweep function, e.g.,
SN_stat <- max_SNsweep(...).
SNSeg_estimateThe function SNSeg_estimate allows users to compute the
parameter estimates (e.g., mean, variance, acf, quantile, etc.) of each
of the segments separated by the estimated change-points. To use this
function, users should use the output of the functions
SNSeg_Uni, SNSeg_Multi and
SNSeg_HD as the input of SNSeg_estimate.
The typical S3 methods summary, print and
plot are available to SNSeg_Uni,
SNSeg_Multi and SNSeg_HD objects. The
summary method displays the parameter to be tested, the
estimated change-point amount and locations, the grid_size,
confidence level as well as the critical value of the SN-based test. The
print method shows the change-point locations. The
plot method plots the time series, and similar to the
argument plot_SN = TRUE, the plot method
allows users to generate time series segmentation plot(s) with the
estimated change-point locations. It also provides ts_index
option to allow users to plot any individual time series they want.
Users can apply their preferred color of the change-point(s) within the
plot(s) by setting cpts.col to any color.
We then provide the examples of the functions SNSeg_Uni,
SNSeg_Multi and SNSeg_HD.
SNSeg_Uni:The function SNSeg_Uni detect change-points for a
univariate time series based on the change in a single or
parameters.
paras_to_test allows one or a combination of multiple
parameters from “mean”, “variance”, “acf” and a numeric quantile value
between 0 and 1. For instance, to test the change in autocorrelation and
80th quantile, users should set paras_to_test to c(‘acf’,
0.8).paras_to_test should be set to bivcor.paras_to_test using their own
parameters. In this case, the input of paras_to_test needs
to be a function which returns a numeric value.We provide examples for different cases:
# Please run the following function before running examples:
mix_GauGPD <- function(u,p,trunc_r,gpd_scale,gpd_shape){
indicator <- u<p
rv <- rep(0, length(u))
rv[indicator>0] <- qtruncnorm(u[indicator>0]/p,a=-Inf,b=trunc_r)
rv[indicator<=0] <- qgpd((u[indicator<=0]-p)/(1-p), loc=trunc_r, scale=gpd_scale,shape=gpd_shape)
return(rv)
}set.seed(7)
n <- 2000
reptime <- 2
cp_sets <- round(n*c(0,cumsum(c(0.5,0.25)),1))
mean_shift <- c(0.4,0,0.4)
rho <- -0.7
ts <- MAR(n, reptime, rho)
no_seg <- length(cp_sets)-1
for(index in 1:no_seg){ # Mean shift
tau1 <- cp_sets[index]+1
tau2 <- cp_sets[index+1]
ts[tau1:tau2,] <- ts[tau1:tau2,] + mean_shift[index]
}
ts <- ts[,2]
# grid_size undefined
result <- SNSeg_Uni(ts, paras_to_test = "mean", confidence = 0.9,
grid_size_scale = 0.05, grid_size = NULL,
plot_SN = FALSE, est_cp_loc = FALSE)
# grid_size defined & generate time series segmentation plot
result <- SNSeg_Uni(ts, paras_to_test = "mean", confidence = 0.9,
grid_size_scale = 0.05, grid_size = 116,
plot_SN = TRUE, est_cp_loc = TRUE)# Estimated change-point locations
result$est_cp
#> [1] 1018 1487
# Parameter estimates (mean) of each segment
SNSeg_estimate(result)
#> $mean
#> [1] 0.39444849 0.02107613 0.38362925
#>
#> attr(,"class")
#> [1] "SNSeg_estimate"
# plot the SN-based test statistic
SN_stat <- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
critical_loc = TRUE)We then show how to use the S3 methods summary,
print and plot.
summary(result)
#> There are 2 change-point(s) detected at 90th confidence level based on the change in the single mean parameter.
#>
#> The critical value of SN-based test is 135.39283594
#>
#> The detected change-point location(s) are 1018,1487 with a grid_size of 116print(result)
#> The detected change-point location(s) are 1018,1487plot(result, cpts.col = 'red')We can see both the plot_SN = TRUE option and the
plot.SN function generates the same plot. Users can select
any choice based on their preferences. The class of the argument
result is SNSeg_Uni.
set.seed(7)
ts <- MAR_Variance(2, "V1")
ts <- ts[,2]
# grid_size defined
result <- SNSeg_Uni(ts, paras_to_test = "variance", confidence = 0.9,
grid_size_scale = 0.05, grid_size = NULL,
plot_SN = FALSE, est_cp_loc = TRUE)
# Estimated change-point locations
result$est_cp
# Parameter estimates (variance) of each segment
SNSeg_estimate(result)
# plot the SN-based test statistic
SN_stat <- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
critical_loc = TRUE)# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
plot(result, cpts.col = 'red')set.seed(7)
ts <- MAR_Variance(2, "A3")
ts <- ts[,2]
# grid_size defined
result <- SNSeg_Uni(ts, paras_to_test = "acf", confidence = 0.9,
grid_size_scale = 0.05, grid_size = 92, plot_SN = FALSE,
est_cp_loc = TRUE)
# Estimated change-point locations
result$est_cp
# Parameter estimates (acf) of each segment
SNSeg_estimate(result)
# plot the SN-based test statistic
SN_stat <- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
critical_loc = TRUE)# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
plot(result, cpts.col = 'red')library(mvtnorm)
set.seed(7)
n <- 1000
sigma_cross <- list(4*matrix(c(1,0.8,0.8,1), nrow=2),
matrix(c(1,0.2,0.2,1), nrow=2),
matrix(c(1,0.8,0.8,1), nrow=2))
cp_sets <- round(c(0,n/3,2*n/3,n))
noCP <- length(cp_sets)-2
rho_sets <- rep(0.5, noCP+1)
ts <- MAR_MTS_Covariance(n, 2, rho_sets, cp_sets, sigma_cross)
ts <- ts[1][[1]]
# grid_size defined
result <- SNSeg_Uni(ts, paras_to_test = "bivcor", confidence = 0.9,
grid_size_scale = 0.05, grid_size = 77,
plot_SN = FALSE, est_cp_loc = TRUE)
# Estimated change-point locations
result$est_cp
# Parameter estimates (bivariate correlation) of each segment
SNSeg_estimate(result)
# plot the SN-based test statistic
SN_stat <- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
critical_loc = TRUE)# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
plot(result, cpts.col = 'red')Please download the packages truncnorm and
evd before running the following code.
library(truncnorm)
library(evd)
set.seed(7)
n <- 1000
cp_sets <- c(0,n/2,n)
noCP <- length(cp_sets)-2
reptime <- 2
rho <- 0.2
# AR time series with no change-point (mean, var)=(0,1)
ts <- MAR(n, reptime, rho)*sqrt(1-rho^2)
trunc_r <- 0
p <- pnorm(trunc_r)
gpd_scale <- 2
gpd_shape <- 0.125
for(ts_index in 1:reptime){
ts[(cp_sets[2]+1):n, ts_index] <- mix_GauGPD(pnorm(ts[(cp_sets[2]+1):n, ts_index]),
p,trunc_r,gpd_scale,gpd_shape)
}
ts <- ts[,2]
# grid_size undefined
# test in 90% quantile
result <- SNSeg_Uni(ts, paras_to_test = 0.9, confidence = 0.9,
grid_size_scale = 0.066, grid_size = NULL,
plot_SN = FALSE, est_cp_loc = TRUE)
# Estimated change-point locations
result$est_cp
# Parameter estimates (90th quantile) of each segment
SNSeg_estimate(result)
# plot the SN-based test statistic
SN_stat <- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
critical_loc = TRUE)# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
plot(result, cpts.col = 'red')To perform the SN-based test using any general functional, users
should define their own function as the input of
paras_to_test. The input of this function should consist of
the followings:
ts: A univariate time series provided by the user.The user-defined function should return a numeric value as the output.
set.seed(7)
n <- 500
reptime <- 2
cp_sets <- round(n*c(0,cumsum(c(0.5,0.25)),1))
mean_shift <- c(0.4,0,0.4)
rho <- -0.7
ts <- MAR(n, reptime, rho)
no_seg <- length(cp_sets)-1
for(index in 1:no_seg){ # Mean shift
tau1 <- cp_sets[index]+1
tau2 <- cp_sets[index+1]
ts[tau1:tau2,] <- ts[tau1:tau2,] + mean_shift[index]
}
ts <- ts[,2]
# Define a general functional for the input 'paras_to_test'
paras_to_test = function(ts){
mean(ts)
}
result.SNCP.general <- SNSeg_Uni(ts, paras_to_test = paras_to_test,
confidence = 0.9, grid_size_scale = 0.05,
grid_size = NULL, plot_SN = FALSE,
est_cp_loc = TRUE)
# Estimated change-point locations
result.SNCP.general$est_cp
# Parameter estimates (general functional) of each segment
SNSeg_estimate(result.SNCP.general)
# plot the SN-based test statistic
SN_stat <- max_SNsweep(result.SNCP.general, plot_SN = TRUE, est_cp_loc = TRUE,
critical_loc = TRUE)# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
plot(result, cpts.col = 'red')set.seed(7)
n <- 1000
cp_sets <- c(0,333,667,1000)
no_seg <- length(cp_sets)-1
rho <- 0
# AR time series with no change-point (mean, var)=(0,1)
ts <- MAR(n, 2, rho)*sqrt(1-rho^2)
no_seg <- length(cp_sets)-1
sd_shift <- c(1,1.6,1)
for(index in 1:no_seg){ # Mean shift
tau1 <- cp_sets[index]+1
tau2 <- cp_sets[index+1]
ts[tau1:tau2,] <- ts[tau1:tau2,]*sd_shift[index]
}
d <- 2
ts <- ts[,2]
# Test in 90th and 95th quantile with grid_size undefined
result <- SNSeg_Uni(ts, paras_to_test = c(0.9, 0.95), confidence = 0.9,
grid_size_scale = 0.05, grid_size = NULL,
plot_SN = FALSE, est_cp_loc = FALSE)
# Test in 90th quantile and the variance with grid_size undefined
result <- SNSeg_Uni(ts, paras_to_test = c(0.9, 'variance'),
confidence = 0.95, grid_size_scale = 0.078,
grid_size = NULL, plot_SN = FALSE,
est_cp_loc = FALSE)
# Test in 90th quantile, variance and acf with grid_size undefined
result <- SNSeg_Uni(ts, paras_to_test = c(0.9,'variance', "acf"),
confidence = 0.9, grid_size_scale = 0.064,
grid_size = NULL, plot_SN = TRUE,
est_cp_loc = TRUE)
# Estimated change-point locations
result$est_cp# Test in 60th quantile, mean, variance and acf with grid_size defined
result.last <- SNSeg_Uni(ts, paras_to_test = c(0.6, 'mean', "variance",
"acf"), confidence = 0.9, grid_size_scale = 0.05,
grid_size = 83, plot_SN = FALSE, est_cp_loc = TRUE)
# Estimated change-point locations
result.last$est_cp
# Parameter estimates (of the last example) of each segment
SNSeg_estimate(result.last)
# SN-based test statistic segmentation plot
SN_stat <- max_SNsweep(result.last, plot_SN = TRUE, est_cp_loc = TRUE,
critical_loc = TRUE)# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
plot(result, cpts.col = 'red')SNSeg_MultiThe function SNSeg_Multi detects change-points for
multivariate time series based on the change in multivariate means or
covariances. Users can set paras_to_test to
mean or covariance for change-points
estimation.
mean, the dimension is equivalent to the
number of time series. If the parameter is covariance, the
dimension is equivalent to the number of unknown parameters within the
covariance matrix.For examples of multivariate means and covariances, please check the following commands:
# Please run this function before running examples:
exchange_cor_matrix <- function(d, rho){
tmp <- matrix(rho, d, d)
diag(tmp) <- 1
return(tmp)
}library(mvtnorm)
set.seed(10)
d <- 5
n <- 1000
cp_sets <- round(n*c(0,cumsum(c(0.075,0.3,0.05,0.1,0.05)),1))
mean_shift <- c(-3,0,3,0,-3,0)/sqrt(d)
mean_shift <- sign(mean_shift)*ceiling(abs(mean_shift)*10)/10
rho_sets <- 0.5
sigma_cross <- list(exchange_cor_matrix(d,0))
ts <- MAR_MTS_Covariance(n, 2, rho_sets, cp_sets=c(0,n), sigma_cross)
noCP <- length(cp_sets)-2
no_seg <- length(cp_sets)-1
for(rep_index in 1:2){
for(index in 1:no_seg){ # Mean shift
tau1 <- cp_sets[index]+1
tau2 <- cp_sets[index+1]
ts[[rep_index]][,tau1:tau2] <- ts[[rep_index]][,tau1:tau2] + mean_shift[index]
}
}
ts <- ts[1][[1]]
# grid_size undefined
result <- SNSeg_Multi(ts, paras_to_test = "mean", confidence = 0.95,
grid_size_scale = 0.079, grid_size = NULL,
plot_SN = FALSE, est_cp_loc = TRUE)
# grid_size defined
result <- SNSeg_Multi(ts, paras_to_test = "mean", confidence = 0.99,
grid_size_scale = 0.05, grid_size = 65,
plot_SN = FALSE, est_cp_loc = TRUE)
# Estimated change-point locations
result$est_cp
# Parameter estimates (multivariate mean) of each segment
SNSeg_estimate(result)
# SN-based test statistic segmentation plot
SN_stat <- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
critical_loc = TRUE)# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
par(mfrow=c(2,3))
plot(result, cpts.col = 'red')library(mvtnorm)
set.seed(10)
reptime <- 2
d <- 4
n <- 1000
sigma_cross <- list(exchange_cor_matrix(d,0.2),
2*exchange_cor_matrix(d,0.5),
4*exchange_cor_matrix(d,0.5))
rho_sets <- c(0.3,0.3,0.3)
mean_shift <- c(0,0,0) # with mean change
cp_sets <- round(c(0,n/3,2*n/3,n))
ts <- MAR_MTS_Covariance(n, reptime, rho_sets, cp_sets, sigma_cross)
noCP <- length(cp_sets)-2
no_seg <- length(cp_sets)-1
for(rep_index in 1:reptime){
for(index in 1:no_seg){ # Mean shift
tau1 <- cp_sets[index]+1
tau2 <- cp_sets[index+1]
ts[[rep_index]][,tau1:tau2] <- ts[[rep_index]][,tau1:tau2] +
mean_shift[index]
}
}
ts <- ts[[1]]
# grid_size undefined
result <- SNSeg_Multi(ts, paras_to_test = "covariance",
confidence = 0.9, grid_size_scale = 0.05,
grid_size = NULL, plot_SN = FALSE, est_cp_loc = FALSE)
# grid_size defined
result <- SNSeg_Multi(ts, paras_to_test = "covariance",
confidence = 0.9, grid_size_scale = 0.05,
grid_size = 81, plot_SN = FALSE,
est_cp_loc = TRUE)
# Estimated change-point locations
result$est_cp
# Parameter estimates (covariance estimate) of each segment
SNSeg_estimate(result)
# SN-based test statistic segmentation plot
SN_stat <- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
critical_loc = TRUE)# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
par(mfrow=c(1,1))
plot(result, cpts.col = 'red')SNSeg_HDThe function SNSeg_HD performs change-points estimation
for high-dimensional time series (dimension greater than 10) using the
change in high-dimensional means. For usage examples of
SNSeg_HD, we provide the followig code:
n <- 600
p <- 100
nocp <- 5
cp_sets <- round(seq(0,nocp+1,1)/(nocp+1)*n)
num_entry <- 5
kappa <- sqrt(4/5) # Wang et al(2020)
mean_shift <- rep(c(0,kappa),100)[1:(length(cp_sets)-1)]
set.seed(1)
ts <- matrix(rnorm(n*p,0,1),n,p)
no_seg <- length(cp_sets)-1
for(index in 1:no_seg){ # Mean shift
tau1 <- cp_sets[index]+1
tau2 <- cp_sets[index+1]
ts[tau1:tau2,1:num_entry] <- ts[tau1:tau2,1:num_entry] +
mean_shift[index] # sparse change
}
# SN segmentation plot
# grid_size undefined (plot the first 3 time series)
result <- SNSeg_HD(ts, confidence = 0.9, grid_size_scale = 0.05,
grid_size = NULL, plot_SN = FALSE, est_cp_loc = TRUE,
ts_index = c(1:3))
# grid_size defined (plot the 1st, 3rd and 5th time series)
result <- SNSeg_HD(ts, confidence = 0.9, grid_size_scale = 0.05,
grid_size = 52, plot_SN = FALSE, est_cp_loc = TRUE,
ts_index = c(1,3,5))
# Estimated change-point locations
result$est_cp
# Parameter estimates (high-dimensional means) of each segment
summary.stat <- SNSeg_estimate(result)
# SN-based test statistic segmentation plot
SN_stat <- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
critical_loc = TRUE)# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
plot(result, cpts.col = 'red', ts_index = c(1:3))We note that only the selected time series were plotted by setting
values for ts_index.