“early_warnings” subpackage

This is the antiCPy.early_warnings subpackage. Up to now it contains a LangevinEstimation class that provides a few possibilities to parameterize the drift and diffusion function in terms of polynomials. The parameters of these functions can be estimated via Markov Chain Monte Carlo (MCMC) sampling and the drift slope can be calculated in a rolling window approach in order to detect changes in the resilience and noise level of a system.

class antiCPy.early_warnings.langevin_estimation.LangevinEstimation(data, time, drift_model='3rd order polynomial', diffusion_model='constant', prior_type='Non-informative linear and Gaussian Prior', prior_range=None, scales=None, detrending_of_whole_dataset=None, gauss_filter_mode='reflect', gauss_filter_sigma=6, gauss_filter_order=0, gauss_filter_cval=0.0, gauss_filter_truncate=4.0, plot_detrending=False)[source]

The LangevinEstimation class includes tools to estimate a polynomial Langevin equation with various drift and diffusion terms and provides the drift slope \(\zeta\) as a resilience measure.

Parameters

data (One dimensional numpy array of floats) – A one dimensional numpy array containing the times series to analyse.
time (One dimensional numpy array of floats) – A one dimensional numpy array containing the time samples of the time series data.
drift_model (str) – Defines the drift model. Default is '3rd order polynomial'. Additionally, a 'first order polynomial' can be chosen.
diffusion_model (str) – Defines the diffusion model. Default is 'constant'. Additionally, a 'first order polynomial' can be chosen.
prior_type – Defines the used prior to calculate the posterior distribution of the data. Default is 'Non-informative linear and Gaussian Prior'. A flat prior can be chosen with 'Flat Prior'.
prior_range (Two-dimensional numpy array of floats, optional) – Customize the prior range of the estimated polynomial parameters :math:` heta_i` starting for :math:` heta_0` in row with index 0 and upper limits in the first column. Default is None which means prior ranges of -50 to 50 for the drift parameters and 0 to 50 for constant diffusion terms.
scales (One-dimensional numpy array with two float entries) – Two tailed percentiles to create credibility bands of the estimated measures.
theta (One-dimensional numpy array of floats) – Array that contains the estimated drift and diffusion parameters with drift and ending with diffusion. The lowest order is mentioned first.
ndim (int) – Total number of parameters to estimate.
nwalkers (int) – Number of MCMC chains.
nsteps (int) – Length of Markov chains.
nburn (int) – Length of burn in period of the MCMC chains.
data_size (int) – Length of the data array.
dt (float) – Time intervall of the equidistant time array.
drift_slope (One-dimensional numpy array of floats) – Array with the current drift slope estimate in the 0th component. Component 1, 2 and 3, 4 contain the upper and lower bound of the credibility intervals defined by the scales variable.
noise_level_estimate (One-dimenional numpy array of floats) – Array with the noise level estimate in the 0th component. Component 1, 2 and 3, 4 contain the upper and lower bound of the credibility intervals defined by the scales variable.
loop_range (One-dimensional (data_size - window_size / window_shift + 1) numpy array of integers) – Array with the start indices of the time array for each rolled time window.
window_size (int) – Size of rolling windows.
data_window (One-dimensional numpy array of floats) – Array that contains data of a certain interval of the dataset in data.
time_window (One-dimensional numpy array of floats) – Array that contains time of a certain interval of the dataset in data.
increments (One-dimensional numpy array of floats) – Array that contains the increments from \(t\) to \(t+1\).
joint_samples (Two-dimensional (ndim, num_of_joint_samples) numpy array of floats) – Array that contains drawed parameter samples from their joint posterior probability density function that is estimated from the data in the current data_window.
slope_estimates (One-dimensional numpy array of floats) – Estimates of the drift slope on the current data_window.
fixed_point_estimate (float) – Fixed point estimate calculated as the mean of the current data_window.
starting_guesses (Two-dimensional numpy array of floats) – Array that contains the initial guesses of the MCMC sampling.
theta_array (Two-dimensional numpy array of floats) – Array that contains the drift and diffusion parameters sampled with MCMC.
slope_storage (Two-dimensional (5, loop_range.size) numpy array of floats) – Array that contains the drift slope estimates \(\hat{\zeta}\) of the rolling windows. The columns encode each rolling window with the drift slope estimate \(\hat{\zeta}\) in the 0th row and in rows 1,2 and 3,4 the upper and lower bound of the credibility intervals defined by scales.
noise_level_storage (Two-dimensional (5, loop_range.size) numpy array of floats) – Array that contains the noise level estimates \(\hat{\sigma}\) of the rolling windows. The columns encode each rolling window with the noise level estimate \(\hat{\sigma}\) in the 0th row and in rows 1,2 and 3,4 the upper and lower bound of the credibility intervals defined by scales.
slope_kernel_density_obj (scipy kernel density object of the scipy.gaussian_kde(...) function.) – Kernel density object of scipy.gaussian_kde(...) of the slope posterior created with a Gaussian kernel and a bandwidth computed by silverman’s rule.
noise_kernel_density_obj – Kernel density object of scipy.gaussian_kde(...) of the noise level posterior created with a Gaussian kernel and a bandwidth computed by silverman’s rule.
slow_trend (One-dimensional numpy array of floats.) – Contains the subtracted slow trend if a detrending is applied to the whole data or each window separately.
detrending_of_whole_dataset (str) – Default is None. If 'linear' the whole data is detrended linearly. If Gaussian kernel smoothed a Gaussian smoothed curve is subtracted from the whole data. The parameters of the kernel smoothing can be set by gauss_filter_mode, gauss_filter_sigma, gauss_filter_order, gauss_filter_cval and gauss_filter_truncate.
gauss_filter_mode (str) – According to the scipy.ndimage.filters.gaussian_filter option.
gauss_filter_sigma (float) – According to the scipy.ndimage.filters.gaussian_filter option.
gauss_filter_order (int) – According to the scipy.ndimage.filters.gaussian_filter option.
gauss_filter_cval (float) – According to the scipy.ndimage.filters.gaussian_filter option.
gauss_filter_truncate (float) – According to the scipy.ndimage.filters.gaussian_filter option.
plot_detrending (bool) – Default is False. If True, the self.data as well as the self.slow_trend and the detrended version are shown.

static detrend(time, data, detrend_mode='linear', gauss_filter_mode='reflect', gauss_filter_sigma=6, gauss_filter_order=0, gauss_filter_cval=0.0, gauss_filter_truncate=4.0, plot_detrending=False)[source]

Method to apply a linear or Gaussian kernel smoothed detrending to the whole data or to each data window separately.

Parameters

time (One-dimensional numpy array of floats.) – Time points of the time series.
data (One-dimensional numpy array of floats.) – Time series that is to be detrended.
detrend_mode (str) – Each window is linearly detrended for detrend = 'linear' and with a Gaussian kernel filter via detrend = 'gauss_kernel'. The linear detrending is the default option.
gauss_filter_mode (str) – According to the scipy.ndimage.filters.gaussian_filter option.
gauss_filter_sigma (float) – According to the scipy.ndimage.filters.gaussian_filter option.
gauss_filter_order (int) – According to the scipy.ndimage.filters.gaussian_filter option.
gauss_filter_cval (float) – According to the scipy.ndimage.filters.gaussian_filter option.
gauss_filter_truncate (float) – According to the scipy.ndimage.filters.gaussian_filter option.

Returns

The first return object contains the detrended data, the second one the subtracted slow trend.

Return type

One-dimensional numpy arrays.

third_order_polynom_slope_in_fixed_point()[source]: Calculate the slope_estimates of a third order polynomial drift function with joint_samples of the posterior distribution around a fixed point estimate given as the data mean of the current window.

static calc_D1(theta, data, drift_model)[source]: Returns the drift parameterized by a first or third order polynomial for input data. Static function in order to speed up computation time via numba.jit(nopython = True).

D1()[source]: Calls the static calc_D1 function and returns its value.

static calc_D2(theta, data, diffusion_model)[source]: Returns the diffusion parameterized by a constant or first order polynomial for input data. Static function in order to speed up computation time via numba.jit(nopython = True).

D2()[source]: Calls the static calc_D2 function and returns its value.

log_prior()[source]: Returns the logarithmic prior probability for a given set of parameters based on the short time propagator depending on the drift and diffusion models and the given prior_range of the parameters theta. A flat prior for the parameters apart from restriction to positive constant diffusion and a non-informative prior for linear parts and Gaussian priors for higher orders are implemented. The standard deviation of the Gaussian priors is given by the scales variable.

log_likelihood()[source]: Returns the logarithmic likelihood of the data for the given model parametrization.

log_posterior(theta)[source]: Returns the logarithmic posterior probability of the data for the given model parametrization.

neg_log_posterior(theta)[source]: Returns the negative logarithmic posterior distribution of the data for the given model parametrization.

declare_MAP_starting_guesses(nwalkers=None, nsteps=None)[source]: Declare the maximum a posterior (MAP) starting guesses for a MCMC sampling with nwalkers and nsteps.

compute_posterior_samples(print_AC_tau, ignore_AC_error, thinning_by, print_progress)[source]: Compute the theta_array with \(nwalkers \cdot nsteps\) Markov Chain Mone Carlo samples. If ignore_AC_error = False the calculation will terminate with error if the autocorrelation of the sampled chains is to high compared to the chain length. Otherwise the highest autocorrelation length will be used to thin the sampled chains. In order to run tests in shorter time you can set ignore_AC_error = True and define a thinning_by n steps. If print_AC_tau = True the autocorrelation lengths of the sampled chains is printed.

static init()[source]: Defines the animation background for easy visualization of a rolling window scan of a given time series. It prepares plots for the time series and the time evolution of the drift slope estimate \(\hat{\zeta}\) and noise level estimate \(\hat{\sigma}\) and its probability density estimate. The method needs to be static. Some variables have to be declared global in to guarantee the functionality of the animation procedure.

animation(i, slope_grid, noise_grid, nwalkers, nsteps, nburn, n_joint_samples, n_slope_samples, n_noise_samples, cred_percentiles, print_AC_tau, ignore_AC_error, thinning_by, print_progress, detrending_per_window, gauss_filter_mode, gauss_filter_sigma, gauss_filter_order, gauss_filter_cval, gauss_filter_truncate, plot_detrending)[source]: Function that is called iteratively by the matplotlib.animation.FuncAnimation(...) tool to generate an animation of the rolling window scan of a time series. The time series, the rolling windows, the time evolution of the drift slope estimate \(\hat{\zeta}\) and the noise level estimate \(\hat{\sigma}\) and its posterior probality density is shown in the animation.

calc_driftslope_noise(slope_grid, noise_grid, nwalkers=50, nsteps=10000, nburn=200, n_joint_samples=50000, n_slope_samples=50000, n_noise_samples=50000, cred_percentiles=[16, 1], print_AC_tau=False, ignore_AC_error=False, thinning_by=60, print_progress=False, detrending_per_window=None, gauss_filter_mode='reflect', gauss_filter_sigma=6, gauss_filter_order=0, gauss_filter_cval=0.0, gauss_filter_truncate=4.0, plot_detrending=False)[source]

Calculates the drift slope estimate \(\hat{\zeta}\) and the noise level \(\hat{\sigma}\) from the MCMC sampled parameters of a Langevin model for a given window_size and given window_shift. In the course of the computations the parameters joint_samples, a noise_kernel_density_obj, a slope_kernel_density_obj, the drift_slope with credibility intverals, the noise_level with credibility intervals and in case of a third order polynomial drift the estimated fixed_point_estimate of the window will be stored.

Parameters

slope_grid (One-dimensional numpy array of floats) – Array on which the drift slope kernel density estimate is evaluated.
noise_grid (One-dimensional numpy array of floats) – Array on which the noise level kernel density estimate is evaluated.
nwalkers (int) – Number of walkers that are initialized for the MCMC sampling via the package emcee. Default is 50.
nsteps (int) – Length of the sampled MCMC chains. Default is 10000.
nburn (int) – Number of data points at the beginning of the Markov chains which are discarded in terms of a burn in period. Default is 200.
n_joint_samples (int) – Number of joint samples that are drawn from the estimated joint posterior probability in order to calculate the drift slope estimate \(\zeta\) and corresponding credibility bands. Default is 50000.
n_slope_samples (int) – Number of drift slope samples that are drawn from the estimated drift slope posterior computed based on the sampling results of the joint posterior probability of the Langevin parameters :math:` heta_i`. Default is 50000.
n_noise_samples (int) – Number of constant noise level samples that are drawn from the estimated marignal probility distribution of the diffusion model is chosen constant. Default is 50000.
cred_percentiles (One-dimensional numpy array of integers) – Two entries to define the percentiles of the calculated credibility bands of the estimated parameters. Default is numpy.array([16,1]).
print_AC_tau – If True the estimated autocorrelation lengths of the Markov chains is shown. The maximum length is used for thinning of the chains. Default is False.
ignore_AC_error (bool) – If True the autocorrelation lengths of the Markov chains is not estimated and thus, it is not checked whether the chains are too short to give an reliable autocorrelation estimate. This avoids error interruption of the procedure, but can lead to unreliable results if the chains are too short. The option should be chosen in order to save computation time, e.g. when debugging your code with short Markov chains. Default is False.
thinning_by (int) – If ignore_AC_error = True the chains are thinned by thinning_by. Every thinning_by-th data point is stored. Default is 60.
print_progress (bool) – If True the progress of the MCMC sampling is shown. Default is False.
detrending_per_window (str) – Default is None. If 'linear', the self.data_window is detrended linearly. If Gaussian kernel smoothed, a Gaussian smoothed curve is subtracted from self.data_window. The parameters of the kernel smoothing can be set by gauss_filter_mode, gauss_filter_sigma, gauss_filter_order, gauss_filter_cval and gauss_filter_truncate. Drift slope and noise level are estimated after detrending in these cases.
gauss_filter_mode (str) – According to the scipy.ndimage.filters.gaussian_filter option.
gauss_filter_sigma (float) – According to the scipy.ndimage.filters.gaussian_filter option.
gauss_filter_order (int) – According to the scipy.ndimage.filters.gaussian_filter option.
gauss_filter_cval (float) – According to the scipy.ndimage.filters.gaussian_filter option.
gauss_filter_truncate (float) – According to the scipy.ndimage.filters.gaussian_filter option.
plot_detrending (bool) – Default is False. If True, the self.data as well as the self.slow_trend and the detrended version are shown.

perform_resilience_scan(window_size, window_shift, slope_grid, noise_grid, nwalkers=50, nsteps=10000, nburn=200, n_joint_samples=50000, n_slope_samples=50000, n_noise_samples=50000, cred_percentiles=[16, 1], print_AC_tau=False, ignore_AC_error=False, thinning_by=60, print_progress=False, slope_save_name='default_save_slopes', noise_level_save_name='default_save_noise', save=True, create_animation=False, ani_save_name='default_animation_name', animation_title='', mark_critical_point=None, mark_noise_level=None, detrending_per_window=None, gauss_filter_mode='reflect', gauss_filter_sigma=6, gauss_filter_order=0, gauss_filter_cval=0.0, gauss_filter_truncate=4.0, plot_detrending=False)[source]

Performs an automated window scan with defined window_shift over the whole time series. In each window the drift slope and noise level estimates with corresponding credibility bands are computed and saved in the slope_storage and the noise_level_storage. It can also be used to create an animation of the sliding window approach plotting the time series, the moving window, and the time evolution of the drift slope estimates \(\hat{\zeta}\), the noise level \(\hat{\sigma}\) and the noise kernel density estimate. The start indices of the shifted windows is also saved in order to facilitate customized plots.

Parameters

window_size (int) – Time window size.
window_shift (int) – The rolling time window is shifted about window_shift data points.
slope_grid (One-dimensional numpy array of floats) – Array on which the drift slope kernel density estimate is evaluated.
noise_grid (One-dimensional numpy array of floats) – Array on which the noise level kernel density estimate is evaluated.
nwalkers (int) – Number of walkers that are initialized for the MCMC sampling via the package emcee. Default is 50.
nsteps (int) – Length of the sampled MCMC chains. Default is 10000.
nburn (int) – Number of data points at the beginning of the Markov chains which are discarded in terms of a burn in period. Default is 200.
n_joint_samples (int) – Number of joint samples that are drawn from the estimated joint posterior probability in order to calculate the drift slope estimate \(\zeta\) and corresponding credibility bands. Default is 50000.
n_slope_samples (int) – Number of drift slope samples that are drawn from the estimated drift slope posterior computed based on the sampling results of the joint posterior probability of the Langevin parameters :math:` heta_i`. Default is 50000.
n_noise_samples (int) – Number of constant noise level samples that are drawn from the estimated marignal probility distribution of the diffusion model is chosen constant. Default is 50000.
cred_percentiles (One-dimensional numpy array of integers) – Two entries to define the percentiles of the calculated credibility bands of the estimated parameters. Default is numpy.array([16,1]).
print_AC_tau – If True the estimated autocorrelation lengths of the Markov chains is shown. The maximum length is used for thinning of the chains. Default is False.
ignore_AC_error (bool) – If True the autocorrelation lengths of the Markov chains is not estimated and thus, it is not checked whether the chains are too short to give an reliable autocorrelation estimate. This avoids error interruption of the procedure, but can lead to unreliable results if the chains are too short. The option should be chosen in order to save computation time, e.g. when debugging your code with short Markov chains. Default is False.
thinning_by (int) – If ignore_AC_error == True the chains are thinned by thinning_by. Every thinning_by-th data point is stored. Default is 60.
print_progress (bool) – If True the progress of the MCMC sampling is shown. Default is False.
slope_save_name (str) – Name of the file in which the slope_storage array will be saved. Default is default_save_slopes.
noise_level_save_name (str) – Name of the file in which the noise_level_storage array will be saved. Default is default_save_noise.
save (bool) – If True the slope_storage and the noise_level_storage arrays are saved in the end of the scan in an .npy file. Default is True.
create_animation (bool) –
If True an automated animation of the time evolution of the drift slope estimate :math:hat{zeta}`, the noise level \(\hat{\sigma}\) and the noise kernel density is shown together with the time series and rolling windows. Default is False.

Warning

It makes use of global variables to incorporate the movie animation tool in the class. The create_animation parameter is only intended to easily reconstruct the plot results of the related publication. In other circumstances it should not be used in order to avoid conflicts with the global variable names.
ani_save_name (str) – If create_animation = True the animation is saved in a .mp4 file with this name. Default is default_animation_name.
animation_title (str) – A title can be given to the animation.
mark_critical_point (float) – A red dotted vertical line is shown at time mark_critical_point. Default is None.
mark_noise_level (float) – A green dotted line is shown at the noise level mark_noise_level in the noise kernel density plot and the time evolution plot of the noise level. Default is None.