The “drift_slope” package

The detailed documentation of the drift_slope package can be found in the following.

class antiCPy.early_warnings.drift_slope.rocket_fast_resilience_estimation.RocketFastResilienceEstimation(antiCPyObject=None)[source]

Superclass of the LangevinEstimation, BinningLangevinEstimation and NonMarkovEstimation class. Initializes functions for a strong parallelisation of the rolling window based MCMC and MAP algorithm by coordinating multiple window calculations over a freely eligible number of subprocesses. Instead of parallelizing the MCMC sampling itself, the whole window calculations are parallelized with single CPU MCMC sampling. This implementation yields significant speed up of computation times.

Parameters:

antiCPyObject (str) – Internal identifier of the object type.

fast_resilience_scan(window_size, window_shift, slope_grid, noise_grid, OU_grid=None, X_coupling_grid=None, nwalkers=50, nsteps=10000, nburn=200, n_joint_samples=50000, detrending_per_window=None, n_slope_samples=50000, n_noise_samples=50000, n_OU_param_samples=50000, n_X_coupling_samples=50000, MCMC_AC_estimate='standard', cred_percentiles=array([16, 1]), print_AC_tau=False, ignore_AC_error=False, thinning_by=60, print_progress=False, print_details=False, print_time_scale_info=False, slope_save_name='default_save_slopes', noise_level_save_name='default_save_noise', OU_param_save_name='default_save_OUparam', X_coupling_save_name='default_save_Xcoupling', num_processes=None, save=True)[source]

The function’s structure to use is almost equivalent to perform_resilience_scan of the LangevinEstimation and the NonMarkovEstimation class. It initializes a multiprocessing pool to calculate several rolling windows simultaneously.

static init_parallel_Langevin(antiCPyObject, data, time, loop_range, window_size, slope_grid, noise_grid, nwalkers, nsteps, nburn, n_joint_samples, n_slope_samples, n_noise_samples, cred_percentiles, print_AC_tau, ignore_AC_error, thinning_by, print_progress, print_details, drift_model, diffusion_model, prior_type, prior_range, scales, slope_storage_connector, noise_storage_connector, detrending_per_window, OU_param_storage_connector=None, X_coupling_storage_connector=None, n_OU_param_samples=None, n_X_coupling_samples=None, MCMC_AC_estimate=None, Y_model=None, Y_drift_model=None, Y_diffusion_model=None, activate_time_scale_separation_prior=False, slow_process=False, time_scale_separation_factor=None, max_likelihood_starting_guesses=None, OU_grid=None, X_coupling_grid=None, print_time_scale_info=None)[source]

Internal function. Only to initialize the workers of RocketFastResilienceEstimation.fast_resilience_scan(...).

static execute_window(m)[source]

Internal function. Subprocess Script of the parallel worker of RocketFastResilienceEstimation.fast_MAP_resilience_scan(...).

fast_MAP_resilience_scan(window_size, window_shift, num_processes='half', cred_percentiles=array([16, 1]), error_propagation='summary statistics', summary_window_size=10, sigma_multiples=array([1, 3]), print_progress=True, print_details=False, slope_save_name='default_save_slopes', noise_level_save_name='default_save_noise', OU_param_save_name='default_save_OUparam', X_coupling_save_name='default_save_Xcoupling', save=True, print_time_scale_info=False)[source]

The function’s structure to use is almost equivalent to perform_MAP_resilience_scan of the LangevinEstimation, BinningLangevinEstimation and the NonMarkovEstimation class. It initializes a multiprocessing pool to calculate several rolling windows simultaneously.

static init_parallel_MAP_Langevin(antiCPyObject, data, time, loop_range, window_size, cred_percentiles, print_details, print_progress, drift_model, diffusion_model, slope_storage_connector, noise_storage_connector, error_propagation, summary_window_size, sigma_multiples, prior_range, prior_type, scales, bin_num=None, OU_param_storage_connector=None, X_coupling_storage_connector=None, Y_model=None, Y_drift_model=None, Y_diffusion_model=None, activate_time_scale_separation_prior=False, slow_process=False, time_scale_separation_factor=None, max_likelihood_starting_guesses=None, print_time_scale_info=None)[source]

Internal function. Only to initialize the workers of RocketFastResilienceEstimation.fast_MAP_resilience_scan(...).

static execute_MAP_window(m)[source]

Internal function. Subprocess Script of the parallel worker of RocketFastResilienceEstimation.fast_MAP_resilience_scan(...).

class antiCPy.early_warnings.drift_slope.langevin_estimation.LangevinEstimation(data, time, drift_model='3rd order polynomial', diffusion_model='constant', prior_type='Non-informative linear and Gaussian Prior', prior_range=None, scales=array([4, 8]), detrending_of_whole_dataset=None, gauss_filter_mode='reflect', gauss_filter_sigma=6, gauss_filter_order=0, gauss_filter_cval=0.0, gauss_filter_truncate=4.0, plot_detrending=False)[source]

Bases: RocketFastResilienceEstimation

The LangevinEstimation class includes tools to estimate a polynomial Langevin equation with various drift and diffusion terms and provides the drift slope \(\zeta\) as a resilience measure.

Parameters:
  • data (One dimensional numpy array of floats) – A one dimensional numpy array containing the times series to analyse.

  • time (One dimensional numpy array of floats) – A one dimensional numpy array containing the time samples of the time series data.

  • drift_model (str) – Defines the drift model. Default is '3rd order polynomial'. Additionally, a 'first order polynomial' can be chosen.

  • diffusion_model (str) – Defines the diffusion model. Default is 'constant'. Additionally, a 'first order polynomial' can be chosen.

  • prior_type – Defines the used prior to calculate the posterior distribution of the data. Default is 'Non-informative linear and Gaussian Prior'. A flat prior can be chosen with 'Flat Prior'.

  • prior_range (Two-dimensional numpy array of floats, optional) – Customize the prior range of the estimated polynomial parameters :math:` heta_i` starting for :math:` heta_0` in row with index 0 and upper limits in the first column. Default is None which means prior ranges of -50 to 50 for the drift parameters and 0 to 50 for constant diffusion terms.

  • scales (One-dimensional numpy array with two float entries) – Two tailed percentiles to create credibility bands of the estimated measures.

  • theta (One-dimensional numpy array of floats) – Array that contains the estimated drift and diffusion parameters with drift and ending with diffusion. The lowest order is mentioned first.

  • ndim (int) – Total number of parameters to estimate.

  • nwalkers (int) – Number of MCMC chains.

  • nsteps (int) – Length of Markov chains.

  • nburn (int) – Length of burn in period of the MCMC chains.

  • data_size (int) – Length of the data array.

  • dt (float) – Time intervall of the equidistant time array.

  • drift_slope (One-dimensional numpy array of floats) – Array with the current drift slope estimate in the 0th component. Component 1, 2 and 3, 4 contain the upper and lower bound of the credibility intervals defined by the scales variable.

  • noise_level_estimate (One-dimenional numpy array of floats) – Array with the noise level estimate in the 0th component. Component 1, 2 and 3, 4 contain the upper and lower bound of the credibility intervals defined by the scales variable.

  • loop_range (One-dimensional (data_size - window_size / window_shift + 1) numpy array of integers) – Array with the start indices of the time array for each rolled time window.

  • window_size (int) – Size of rolling windows.

  • data_window (One-dimensional numpy array of floats) – Array that contains data of a certain interval of the dataset in data.

  • time_window (One-dimensional numpy array of floats) – Array that contains time of a certain interval of the dataset in data.

  • increments (One-dimensional numpy array of floats) – Array that contains the increments from \(t\) to \(t+1\).

  • joint_samples (Two-dimensional (ndim, num_of_joint_samples) numpy array of floats) – Array that contains drawed parameter samples from their joint posterior probability density function that is estimated from the data in the current data_window.

  • slope_estimates (One-dimensional numpy array of floats) – Estimates of the drift slope on the current data_window.

  • fixed_point_estimate (float) – Fixed point estimate calculated as the mean of the current data_window.

  • starting_guesses (Two-dimensional numpy array of floats) – Array that contains the initial guesses of the MCMC sampling.

  • theta_array (Two-dimensional numpy array of floats) – Array that contains the drift and diffusion parameters sampled with MCMC.

  • slope_storage (Two-dimensional (5, loop_range.size) numpy array of floats) – Array that contains the drift slope estimates \(\hat{\zeta}\) of the rolling windows. The columns encode each rolling window with the drift slope estimate \(\hat{\zeta}\) in the 0th row and in rows 1,2 and 3,4 the upper and lower bound of the credibility intervals defined by scales.

  • noise_level_storage (Two-dimensional (5, loop_range.size) numpy array of floats) – Array that contains the noise level estimates \(\hat{\sigma}\) of the rolling windows. The columns encode each rolling window with the noise level estimate \(\hat{\sigma}\) in the 0th row and in rows 1,2 and 3,4 the upper and lower bound of the credibility intervals defined by scales.

  • slope_kernel_density_obj (scipy kernel density object of the scipy.gaussian_kde(...) function.) – Kernel density object of scipy.gaussian_kde(...) of the slope posterior created with a Gaussian kernel and a bandwidth computed by silverman’s rule.

  • noise_kernel_density_obj – Kernel density object of scipy.gaussian_kde(...) of the noise level posterior created with a Gaussian kernel and a bandwidth computed by silverman’s rule.

  • slow_trend (One-dimensional numpy array of floats.) – Contains the subtracted slow trend if a detrending is applied to the whole data or each window separately.

  • detrending_of_whole_dataset (str) – Default is None. If 'linear' the whole data is detrended linearly. If Gaussian kernel smoothed a Gaussian smoothed curve is subtracted from the whole data. The parameters of the kernel smoothing can be set by gauss_filter_mode, gauss_filter_sigma, gauss_filter_order, gauss_filter_cval and gauss_filter_truncate.

  • gauss_filter_mode (str) – According to the scipy.ndimage.filters.gaussian_filter option.

  • gauss_filter_sigma (float) – According to the scipy.ndimage.filters.gaussian_filter option.

  • gauss_filter_order (int) – According to the scipy.ndimage.filters.gaussian_filter option.

  • gauss_filter_cval (float) – According to the scipy.ndimage.filters.gaussian_filter option.

  • gauss_filter_truncate (float) – According to the scipy.ndimage.filters.gaussian_filter option.

  • plot_detrending (bool) – Default is False. If True, the self.data as well as the self.slow_trend and the detrended version are shown.

static detrend(time, data, detrend_mode='linear', gauss_filter_mode='reflect', gauss_filter_sigma=6, gauss_filter_order=0, gauss_filter_cval=0.0, gauss_filter_truncate=4.0, plot_detrending=False)[source]

Method to apply a linear or Gaussian kernel smoothed detrending to the whole data or to each data window separately.

Parameters:
  • time (One-dimensional numpy array of floats.) – Time points of the time series.

  • data (One-dimensional numpy array of floats.) – Time series that is to be detrended.

  • detrend_mode (str) – Each window is linearly detrended for detrend = 'linear' and with a Gaussian kernel filter via detrend = 'gauss_kernel'. The linear detrending is the default option.

  • gauss_filter_mode (str) – According to the scipy.ndimage.filters.gaussian_filter option.

  • gauss_filter_sigma (float) – According to the scipy.ndimage.filters.gaussian_filter option.

  • gauss_filter_order (int) – According to the scipy.ndimage.filters.gaussian_filter option.

  • gauss_filter_cval (float) – According to the scipy.ndimage.filters.gaussian_filter option.

  • gauss_filter_truncate (float) – According to the scipy.ndimage.filters.gaussian_filter option.

Returns:

The first return object contains the detrended data, the second one the subtracted slow trend.

Return type:

One-dimensional numpy arrays.

third_order_polynom_slope_in_fixed_point()[source]

Calculate the slope_estimates of a third order polynomial drift function with joint_samples of the posterior distribution around a fixed point estimate given as the data mean of the current window.

static calc_D1(theta, data, drift_model)[source]

Returns the drift parameterized by a first or third order polynomial for input data.

D1()[source]

Calls the static calc_D1 function and returns its value.

static calc_D2(theta, data, diffusion_model)[source]

Returns the diffusion parameterized by a constant or first order polynomial for input data.

D2()[source]

Calls the static calc_D2 function and returns its value.

static log_prior(theta, drift_model, diffusion_model, prior_type, prior_range, scales)[source]

Returns the logarithmic prior probability for a given set of parameters based on the short time propagator depending on the drift and diffusion models and the given prior_range of the parameters theta. A flat prior for the parameters apart from restriction to positive constant diffusion and a non-informative prior for linear parts and Gaussian priors for higher orders are implemented. The standard deviation of the Gaussian priors is given by the scales variable.

static log_posterior(theta)[source]

Returns the logarithmic posterior probability of the data for the given model parametrization.

static log_likelihood(theta, data, dt, drift_model, diffusion_model)[source]

Returns the logarithmic likelihood of the data for the given model parametrization.

static neg_log_posterior(theta)[source]

Returns the negative logarithmic posterior distribution of the data for the given model parametrization.

declare_MAP_starting_guesses(nwalkers=None, nsteps=None)[source]

Declare the maximum a posterior (MAP) starting guesses for a MCMC sampling with nwalkers and nsteps.

compute_posterior_samples(print_AC_tau, ignore_AC_error, thinning_by, print_progress, nburn, MCMC_parallelization_method=None, num_processes=None, num_chop_chains=None)[source]

Compute the theta_array with \(nwalkers \cdot nsteps\) Markov Chain Monte Carlo (MCMC) samples. If ignore_AC_error = False the calculation will terminate with error if the autocorrelation of the sampled chains is too high compared to the chain length. Otherwise the highest autocorrelation length will be used to thin the sampled chains. In order to run tests in shorter time you can set “ignore_AC_error = True” and define a thinning_by n steps. If print_AC_tau = True the autocorrelation lengths of the sampled chains is printed.

static init()[source]

Defines the animation background for easy visualization of a rolling window scan of a given time series. It prepares plots for the time series and the time evolution of the drift slope estimate \(\hat{\zeta}\) and noise level estimate \(\hat{\sigma}\) and its probability density estimate. The method needs to be static. Some variables have to be declared global in to guarantee the functionality of the animation procedure.

animation(i, slope_grid, noise_grid, nwalkers, nsteps, nburn, n_joint_samples, n_slope_samples, n_noise_samples, cred_percentiles, print_AC_tau, ignore_AC_error, thinning_by, print_progress, detrending_per_window, gauss_filter_mode, gauss_filter_sigma, gauss_filter_order, gauss_filter_cval, gauss_filter_truncate, plot_detrending, MCMC_parallelization_method, num_processes, num_chop_chains)[source]

Function that is called iteratively by the matplotlib.animation.FuncAnimation(...) tool to generate an animation of the rolling window scan of a time series. The time series, the rolling windows, the time evolution of the drift slope estimate \(\hat{\zeta}\) and the noise level estimate \(\hat{\sigma}\) and its posterior probality density is shown in the animation.

calc_driftslope_noise(slope_grid, noise_grid, nburn, n_joint_samples=50000, n_slope_samples=50000, n_noise_samples=50000, cred_percentiles=[16, 1], print_AC_tau=False, ignore_AC_error=False, thinning_by=60, print_progress=False, detrending_per_window=None, gauss_filter_mode='reflect', gauss_filter_sigma=6, gauss_filter_order=0, gauss_filter_cval=0.0, gauss_filter_truncate=4.0, plot_detrending=False, MCMC_parallelization_method=None, num_processes=None, num_chop_chains=None)[source]

Calculates the drift slope estimate \(\hat{\zeta}\) and the noise level \(\hat{\sigma}\) from the MCMC sampled parameters of a Langevin model for a given window_size and given window_shift. In the course of the computations the parameters joint_samples, a noise_kernel_density_obj, a slope_kernel_density_obj, the drift_slope with credibility intverals, the noise_level with credibility intervals and in case of a third order polynomial drift the estimated fixed_point_estimate of the window will be stored.

Parameters:
  • slope_grid (One-dimensional numpy array of floats) – Array on which the drift slope kernel density estimate is evaluated.

  • noise_grid (One-dimensional numpy array of floats) – Array on which the noise level kernel density estimate is evaluated.

  • nburn (int) – Number of data points at the beginning of the Markov chains which are discarded in terms of a burn in period. Usually the defaul is 200, set by the perform_resilience_scan method.

  • n_joint_samples (int) – Number of joint samples that are drawn from the estimated joint posterior probability in order to calculate the drift slope estimate \(\zeta\) and corresponding credibility bands. Default is 50000.

  • n_slope_samples (int) – Number of drift slope samples that are drawn from the estimated drift slope posterior computed based on the sampling results of the joint posterior probability of the Langevin parameters :math:` heta_i`. Default is 50000.

  • n_noise_samples (int) – Number of constant noise level samples that are drawn from the estimated marignal probility distribution of the diffusion model is chosen constant. Default is 50000.

  • cred_percentiles (One-dimensional numpy array of integers) – Two entries to define the percentiles of the calculated credibility bands of the estimated parameters. Default is numpy.array([16,1]).

  • print_AC_tau – If True the estimated autocorrelation lengths of the Markov chains is shown. The maximum length is used for thinning of the chains. Default is False.

  • ignore_AC_error (bool) – If True the autocorrelation lengths of the Markov chains is not estimated and thus, it is not checked whether the chains are too short to give an reliable autocorrelation estimate. This avoids error interruption of the procedure, but can lead to unreliable results if the chains are too short. The option should be chosen in order to save computation time, e.g. when debugging your code with short Markov chains. Default is False.

  • thinning_by (int) – If ignore_AC_error = True the chains are thinned by thinning_by. Every thinning_by-th data point is stored. Default is 60.

  • print_progress (bool) – If True the progress of the MCMC sampling is shown. Default is False.

  • detrending_per_window (str) – Default is None. If 'linear', the self.data_window is detrended linearly. If Gaussian kernel smoothed, a Gaussian smoothed curve is subtracted from self.data_window. The parameters of the kernel smoothing can be set by gauss_filter_mode, gauss_filter_sigma, gauss_filter_order, gauss_filter_cval and gauss_filter_truncate. Drift slope and noise level are estimated after detrending in these cases.

  • gauss_filter_mode (str) – According to the scipy.ndimage.filters.gaussian_filter option.

  • gauss_filter_sigma (float) – According to the scipy.ndimage.filters.gaussian_filter option.

  • gauss_filter_order (int) – According to the scipy.ndimage.filters.gaussian_filter option.

  • gauss_filter_cval (float) – According to the scipy.ndimage.filters.gaussian_filter option.

  • gauss_filter_truncate (float) – According to the scipy.ndimage.filters.gaussian_filter option.

  • plot_detrending (bool) – Default is False. If True, the self.data as well as the self.slow_trend and the detrended version are shown.

perform_resilience_scan(window_size, window_shift, slope_grid, noise_grid, nwalkers=50, nsteps=10000, nburn=200, n_joint_samples=50000, n_slope_samples=50000, n_noise_samples=50000, cred_percentiles=[16, 1], print_AC_tau=False, ignore_AC_error=False, thinning_by=60, print_progress=False, slope_save_name='default_save_slopes', noise_level_save_name='default_save_noise', save=True, create_animation=False, ani_save_name='default_animation_name', animation_title='', mark_critical_point=None, mark_noise_level=None, detrending_per_window=None, gauss_filter_mode='reflect', gauss_filter_sigma=6, gauss_filter_order=0, gauss_filter_cval=0.0, gauss_filter_truncate=4.0, plot_detrending=False, MCMC_parallelization_method=None, num_processes=None, num_chop_chains=None)[source]

Performs an automated window scan with defined window_shift over the whole time series. In each window the drift slope and noise level estimates with corresponding credibility bands are computed and saved in the slope_storage and the noise_level_storage. It can also be used to create an animation of the sliding window approach plotting the time series, the moving window, and the time evolution of the drift slope estimates \(\hat{\zeta}\), the noise level \(\hat{\sigma}\) and the noise kernel density estimate. The start indices of the shifted windows is also saved in order to facilitate customized plots.

Parameters:
  • window_size (int) – Time window size.

  • window_shift (int) – The rolling time window is shifted about window_shift data points.

  • slope_grid (One-dimensional numpy array of floats) – Array on which the drift slope kernel density estimate is evaluated.

  • noise_grid (One-dimensional numpy array of floats) – Array on which the noise level kernel density estimate is evaluated.

  • nwalkers (int) – Number of walkers that are initialized for the MCMC sampling via the package emcee. Default is 50.

  • nsteps (int) – Length of the sampled MCMC chains. Default is 10000.

  • nburn (int) – Number of data points at the beginning of the Markov chains which are discarded in terms of a burn in period. Default is 200.

  • n_joint_samples (int) – Number of joint samples that are drawn from the estimated joint posterior probability in order to calculate the drift slope estimate \(\zeta\) and corresponding credibility bands. Default is 50000.

  • n_slope_samples (int) – Number of drift slope samples that are drawn from the estimated drift slope posterior computed based on the sampling results of the joint posterior probability of the Langevin parameters :math:` heta_i`. Default is 50000.

  • n_noise_samples (int) – Number of constant noise level samples that are drawn from the estimated marignal probility distribution of the diffusion model is chosen constant. Default is 50000.

  • cred_percentiles (One-dimensional numpy array of integers) – Two entries to define the percentiles of the calculated credibility bands of the estimated parameters. Default is numpy.array([16,1]).

  • print_AC_tau – If True the estimated autocorrelation lengths of the Markov chains is shown. The maximum length is used for thinning of the chains. Default is False.

  • ignore_AC_error (bool) – If True the autocorrelation lengths of the Markov chains is not estimated and thus, it is not checked whether the chains are too short to give an reliable autocorrelation estimate. This avoids error interruption of the procedure, but can lead to unreliable results if the chains are too short. The option should be chosen in order to save computation time, e.g. when debugging your code with short Markov chains. Default is False.

  • thinning_by (int) – If ignore_AC_error == True the chains are thinned by thinning_by. Every thinning_by-th data point is stored. Default is 60.

  • print_progress (bool) – If True the progress of the MCMC sampling is shown. Default is False.

  • slope_save_name (str) – Name of the file in which the slope_storage array will be saved. Default is default_save_slopes.

  • noise_level_save_name (str) – Name of the file in which the noise_level_storage array will be saved. Default is default_save_noise.

  • save (bool) – If True the slope_storage and the noise_level_storage arrays are saved in the end of the scan in an .npy file. Default is True.

  • create_animation (bool) –

    If True an automated animation of the time evolution of the drift slope estimate :math:hat{zeta}`, the noise level \(\hat{\sigma}\) and the noise kernel density is shown together with the time series and rolling windows. Default is False.

    Warning

    It makes use of global variables to incorporate the movie animation tool in the class. The create_animation parameter is only intended to easily reconstruct the plot results of the related publication. In other circumstances it should not be used in order to avoid conflicts with the global variable names.

  • ani_save_name (str) – If create_animation = True the animation is saved in a .mp4 file with this name. Default is default_animation_name.

  • animation_title (str) – A title can be given to the animation.

  • mark_critical_point (float) – A red dotted vertical line is shown at time mark_critical_point. Default is None.

  • mark_noise_level (float) – A green dotted line is shown at the noise level mark_noise_level in the noise kernel density plot and the time evolution plot of the noise level. Default is None.

  • detrending_per_window – Default is None. If 'linear' the data_window is detrended linearly. If Gaussian kernel smoothed a Gaussian smoothed curve is subtracted from the data_window. The parameters of the kernel smoothing can be set by gauss_filter_mode, gauss_filter_sigma, gauss_filter_order, gauss_filter_cval and gauss_filter_truncate.

  • gauss_filter_mode (str) – According to the scipy.ndimage.filters.gaussian_filter option.

  • gauss_filter_sigma (float) – According to the scipy.ndimage.filters.gaussian_filter option.

  • gauss_filter_order (int) – According to the scipy.ndimage.filters.gaussian_filter option.

  • gauss_filter_cval (float) – According to the scipy.ndimage.filters.gaussian_filter option.

  • gauss_filter_truncate (float) – According to the scipy.ndimage.filters.gaussian_filter option.

  • plot_detrending (bool) – Default is False. If True, the self.data_window as well as the self.slow_trend and the detrended version are shown.

  • MCMC_parallelization_method (str) – Default is None. If None the basic serial MCMC computation is performed. If MCMC_parallelization_method = 'multiprocessing', a multiprocessing pool with num_processes is used to accelerate MCMC sampling. If MCMC_parallelization_method = 'chop_chain' is used, the total length of the desired Markov chain is divided into 'chop_chain' parts each of which is sampled in parallel and joined together in the end.

  • num_processes (str or int) – Default is None. If 'half, almost half of the CPU kernels are used. If 'all', all CPU kernels are used. If integer number, the defined number of CPU kernels is used for multiprocessing.

  • num_chop_chains (int) – Number by which the total length of the Markov chain is divided. Each slice is sampled in parallel and joined together in the end of the calculations.

perform_MAP_resilience_scan(window_size, window_shift, cred_percentiles=array([16, 1]), error_propagation='summary statistics', summary_window_size=10, sigma_multiples=array([1, 3]), print_progress=True, print_details=False, slope_save_name='default_save_slopes', noise_level_save_name='default_save_noise', save=True, create_plot=False, ani_save_name='default_animation_name', animation_title='', mark_critical_point=None, mark_noise_level=None, print_hint=True, fastMAPflag=False)[source]

Performs an automated MAP window scan with defined window_shift over the whole time series. In each window the drift slope and noise level estimates with corresponding credibility bands are computed and saved in the slope_storage and the noise_level_storage. It can also be used to create an plot of the sliding window approach plotting the time series, the moving window, and the time evolution of the drift slope estimates \(\hat{\zeta}\) and the noise level \(\hat{\sigma}\). The start indices of the shifted windows are also saved in order to facilitate customized plots.

Parameters:
  • window_size (int) – Time window size.

  • window_shift (int) – The rolling time window is shifted about window_shift data points.

  • cred_percentiles (One-dimensional numpy array of integers.) – One or two entries to define the percentiles of the calculated credibility bands of the estimated parameters. It is stored in the attribute credibility_bands. Default is numpy.array([16,1]).

  • error_propagation (str) –

    Defines the method that is used to compute the confidence bands. Default is 'summary statistics. In that case drift slope samples of size summary_window_size are used to compute the drift slope mean and its standard error. The parameter sigma_multiples defines the width of the summary statistics’ symmetric error bands. If 'error bound' or 'uncorrelated Gaussian' is chosen, the marginal uncertainties corresponding to cred_percentiles are computed employing Wilks’ theorem. With 'error bound' the highest uncertainties per parameter and cred_percentiles level is interpreted as a symmetric worst case bound of the error. If 'uncorrelated Gaussian' is chosen, the uncertainties are treated corresponding to the slight asymmetric results of the Wilks’ theorem confidence intervals. Both options give trustworthy results in the case of a first order polynomial drift. In case of the third order polynomial drift they are too optimistic, i.e. narrow, since the error bounds ('error bound') or errors ('uncorrelated Gaussian') are propagated without regard of correlations in the model parameters. This issue might be solvable by introducing an estimate of the covariance matrix in future, but is not planned for now. Furthermore, the 'uncorrelated Gaussian' option keeps the slightly asymmetric errors proposed by Wilks’ theorem for the marginal parameter distributions. This small formal incorrectness is to maintain information about asymmetry in the estimates in a first Wilks’ theorem guess. The cred_percentiles are fixed to cred_percentiles = numpy.array([5,1]) to transform the half confidence bands under the assumption of Gaussian distributions by the factors 1.96 and 2.5525, respectively. The propagated result is transformed back.

    Hint

    The options 'error bound' and 'uncorrelated Gaussian' are only trustworthy for a linear drift term. They might also be correct for a third order polynomial drift in the case ofvery high amounts of data per window. Otherwise, the default 'summary statistics' should be used, unless the window shift is very high. In that case it might introduce a too strong delay due to the averaging procedure.

  • print_progress (Boolean) – If True the progress of the MAP estimation per window is shown. Default is False.

  • print_details (Boolean) – If True a more detailed print output in the case the binning procedure is provided. Default is False.

  • slope_save_name (string) – Name of the file in which the slope_storage array will be saved. Default is default_save_slopes.

  • noise_level_save_name (string) – Name of the file in which the noise_level_storage array will be saved. Default is default_save_noise.

  • save (Boolean) – If True the slope_storage and the noise_level_storage arrays are saved in the end of the scan in an .npy file. Default is True.

  • create_plot (Boolean) – If True, the MAP resilience scan results are plotted. Default is False. NOTE: Not implemented yet.

  • summary_window_size (int) – If error_propagation = 'summary statistics' is chosen, the parameter defines the number of drift slope estimates to use in a window summary statistic. The windows are shifted by one.

  • sigma_multiples (One dimensional numpy array of float .) – The array hast two entries. If error_propagation = 'summary statistics' is chosen, the entries define the drift slope standard error multiples which are used to calculate the uncertainty bands.

class antiCPy.early_warnings.drift_slope.binning_langevin_estimation.BinningLangevinEstimation(data, time, bin_num, drift_model='3rd order polynomial', diffusion_model='constant', prior_type='Non-informative linear and Gaussian Prior', prior_range=None, scales=array([4, 8]))[source]

Bases: LangevinEstimation

Child class of the LangevinEstimation class. Inherits all features and methods from the parent class. The class is used to apply a computation shortcut of the implemented resilience screening methods by a data binning approach (cf. [Kleinhans2012] ). Needed attributes and (overloaded) methods are provided.

Hint

The fitting procedure is faster than the MAP estimation if the windows contain a certain amount of data.

Parameters:
  • _bin_centers (One-dimensional numpy array of float) – Attribute that contains the bin centers of the time series.

  • _bin_num (int) – Number of chosen bins for the time series binning procedure.

  • _bin_array (One-dimensional numpy array of float) – Attribute that contains the limits of each bin.

  • _bin_labels (One-dimensional numpy array of int) – Attribute that contains the bin labels of each data point. Entries reach from one to _bin_num.

  • _number_bin_members (One-dimensional numpy array of int) – Attribute that contains the number of data points of a time series that belong to each bin.

  • _bin_mean_increment (One-dimensional numpy array of float) – Attribute that contains the mean increments of each bin.

  • _bin_mean_increment_squared (One-dimensional numpy array of float) – Attribute that contains the squared mean increments of each bin.

  • _bin_MAP (One-dimensional numpy array of float) – Attribute that contains the maximum a posteriori probability of each bin.

property bin_centers

Contains the center value of the binned data.

property bin_increment_mean

Contains the mean increments per bin.

property bin_increment_mean_squared

Contains the squared mean increment of each bin.

D1()[source]

Overloaded drift function in the case of the binning data approach.

D2()[source]

Overloaded diffusion function in the case of the binning data approach.

static calc_bin_neg_log_likelihood(pars, bin_incr_mean, bin_incr_mean_squared, num_bin_members, dt)[source]

Calculates the negative logarithmic likelihood function of a bin in the binning data approach.

bin_neg_log_likelihood()[source]

Calculates the negative logarithmic likelihood function of a bin in the binning data approach. The method calls the static helper function calc_bin_neg_log_likelihood(...).

log_posterior(theta)[source]

Calculates the logarithmic posterior for a given tuple of parameters \(\underline{\theta}\) .

neg_log_posterior(theta)[source]

Calculates the negative logarithmic posterior for a given tuple of parameters \(\underline{\theta}\) .

class antiCPy.early_warnings.drift_slope.non_markov_estimation.NonMarkovEstimation(data, time, X_drift_model='3rd order polynomial', Y_model='Ornstein-Uhlenbeck noise', Y_drift_model='no offset first order', X_coupling_term='constant', Y_diffusion_model='constant', prior_type='Non-informative linear and Gaussian Prior', prior_range=None, scales=array([4, 8]), activate_time_scale_separation_prior=False, slow_process=None, time_scale_separation_factor=None, max_likelihood_starting_guesses=None, detrending_of_whole_dataset=None, gauss_filter_mode='reflect', gauss_filter_sigma=6, gauss_filter_order=0, gauss_filter_cval=0.0, gauss_filter_truncate=4.0, plot_detrending=False)[source]

Bases: LangevinEstimation

Child class of LangevinEstimation. Inherits its functions to guarantee similar function structure.

Parameters:
  • data (One dimensional numpy array of floats) – A one dimensional numpy array containing the times series to analyse.

  • time (One dimensional numpy array of floats) – A one dimensional numpy array containing the time samples of the time series data.

  • X_drift_model (str) – Defines the drift model. Default is '3rd order polynomial'. Additionally, a 'first order polynomial' or a '3rd order odds correlated' can be chosen. In '3rd order odds correlated' the first and third order coefficients are the same to reduce complexity. Corresponds to drift_model in the LangevinEstimation case.

  • X_coupling_term (str) – Defines the X_coupling_term. Default is 'constant'. Additionally, a 'first order polynomial' can be chosen. Corresponds to diffusion_model in the LangevinEstimation case.

  • Y_model (str) – Default is 'Ornstein-Uhlenbeck noise'. It defines automatically the Y drift model to be linear without offset and the Y diffusion to be constant. The drift coefficient is given by \(-\frac{1}{\theta_5^2}\) . The diffusion coefficient is given by \(\frac{1}{\theta_5}\) .

  • Y_drift_model (str) – Other model settings than Y_model = 'Ornstein-Uhlenbeck noise' are not tested yet. In principle, Y_drift_model can be chosen to be 'first order polynomial', '3rd order polynomial' and '3rd order odds correlated'.

  • Y_diffusion_model (str) – Other model settings than Y_model = 'Ornstein-Uhlenbeck noise' are not tested yet. In principle, 'constant' and 'first order polynomial' models can be chosen.

  • prior_type (str) – Defines the used prior to calculate the posterior distribution of the data. Default is 'Non-informative linear and Gaussian Prior'. A flat prior can be chosen with 'Flat Prior'.

  • prior_range (Two-dimensional numpy array of floats, optional) – Customize the prior range of the estimated polynomial parameters \(\theta_i\) starting for \(\theta_0\) in row with index 0 and upper limits in the first column. Default is None which means prior ranges of -50 to 50 for the drift parameters and 0 to 50 for constant diffusion terms.

  • scales (One-dimensional numpy array with two float entries) – Two tailed percentiles to create credibility bands of the estimated measures.

  • theta (One-dimensional numpy array of floats) – Array that contains the estimated drift and diffusion parameters with drift and ending with diffusion. The lowest order is mentioned first.

  • ndim (int) – Total number of parameters to estimate.

  • nwalkers (int) – Number of MCMC chains.

  • nsteps (int) – Length of Markov chains.

  • nburn (int) – Length of burn in period of the MCMC chains.

  • data_size (int) – Length of the data array.

  • dt (float) – Time intervall of the equidistant time array.

  • drift_slope (One-dimensional numpy array of floats) – Array with the current drift slope estimate in the 0th component. Component 1, 2 and 3, 4 contain the upper and lower bound of the credibility intervals defined by the scales variable.

  • noise_level_estimate (One-dimenional numpy array of floats) – Array with the noise level estimate in the 0th component. Component 1, 2 and 3, 4 contain the upper and lower bound of the credibility intervals defined by the scales variable.

  • loop_range (One-dimensional (data_size - window_size / window_shift + 1) numpy array of integers) – Array with the start indices of the time array for each rolled time window.

  • window_size (int) – Size of rolling windows.

  • data_window (One-dimensional numpy array of floats) – Array that contains data of a certain interval of the dataset in data.

  • time_window (One-dimensional numpy array of floats) – Array that contains time of a certain interval of the dataset in data.

  • increments (One-dimensional numpy array of floats) – Array that contains the increments from \(t\) to \(t+1\).

  • joint_samples (Two-dimensional (ndim, num_of_joint_samples) numpy array of floats) – Array that contains drawed parameter samples from their joint posterior probability density function that is estimated from the data in the current data_window.

  • slope_estimates (One-dimensional numpy array of floats) – Estimates of the drift slope on the current data_window.

  • fixed_point_estimate (float) – Fixed point estimate calculated as the mean of the current data_window.

  • starting_guesses (Two-dimensional numpy array of floats) – Array that contains the initial guesses of the MCMC sampling.

  • theta_array (Two-dimensional numpy array of floats) – Array that contains the drift and diffusion parameters sampled with MCMC.

  • slope_storage (Two-dimensional (5, loop_range.size) numpy array of floats) – Array that contains the drift slope estimates \(\hat{\zeta}\) of the rolling windows. The columns encode each rolling window with the drift slope estimate \(\hat{\zeta}\) in the 0th row and in rows 1,2 and 3,4 the upper and lower bound of the credibility intervals defined by scales.

  • noise_level_storage (Two-dimensional (5, loop_range.size) numpy array of floats) – Array that contains the noise level estimates \(\hat{\sigma}\) of the rolling windows. The columns encode each rolling window with the noise level estimate \(\hat{\sigma}\) in the 0th row and in rows 1,2 and 3,4 the upper and lower bound of the credibility intervals defined by scales.

  • slope_kernel_density_obj (scipy kernel density object of the scipy.gaussian_kde(...) function.) – Kernel density object of scipy.gaussian_kde(...) of the slope posterior created with a Gaussian kernel and a bandwidth computed by silverman’s rule.

  • noise_kernel_density_obj – Kernel density object of scipy.gaussian_kde(...) of the noise level posterior created with a Gaussian kernel and a bandwidth computed by silverman’s rule.

  • slow_trend (One-dimensional numpy array of floats.) – Contains the subtracted slow trend if a detrending is applied to the whole data or each window separately.

  • detrending_of_whole_dataset (str) – Default is None. If 'linear' the whole data is detrended linearly. If Gaussian kernel smoothed a Gaussian smoothed curve is subtracted from the whole data. The parameters of the kernel smoothing can be set by gauss_filter_mode, gauss_filter_sigma, gauss_filter_order, gauss_filter_cval and gauss_filter_truncate.

  • gauss_filter_mode (str) – According to the scipy.ndimage.filters.gaussian_filter option.

  • gauss_filter_sigma (float) – According to the scipy.ndimage.filters.gaussian_filter option.

  • gauss_filter_order (int) – According to the scipy.ndimage.filters.gaussian_filter option.

  • gauss_filter_cval (float) – According to the scipy.ndimage.filters.gaussian_filter option.

  • gauss_filter_truncate (float) – According to the scipy.ndimage.filters.gaussian_filter option.

  • plot_detrending (bool) – Default is False. If True, the self.data as well as the self.slow_trend and the detrended version are shown.

  • activate_time_scale_separation_prior (bool) – If True, a time scale separation of the observed process \(X\) and the unobserved process \(Y\) is assumed. Default is False.

  • slow_process (str) – If activate_time_scale_separation_prior = True, define the slow process via 'X' or 'Y', respectively.

  • time_scale_separation_factor (int) – If activate_time_scale_separation_prior = True, define the factor by which you assume the time scales of \(X\) and \(Y\) to differ.

  • max_likelihood_starting_guesses (One-dimensional numpy array of float) –

    If None, the MAP starting guesses (essentially maximum likelihood because of flat priors) are are computed starting with all parameters set to one. In general, you can pass a one-dimensional numpy array with ndim entries to this argument to define different starting values.

    Note

    This is especially necessary, if you enable the time scale separation, since the default array of ones will contradict normally to the assumed two time scale model.

static D1(data, theta, Y_model, drift_model, component)[source]

Calls the static calc_D1 function and returns its value. Passing of parameters is adapted to the NonMarkovEstimation model.

static D2(data, theta, Y_model, diffusion_model, component)[source]

Calls the static calc_D2 function and returns its value. Passing of parameters is adapted to the NonMarkovEstimation model.

static log_prior(theta, X_drift_model, X_coupling_term, Y_model, prior_type, prior_range, scales, data_window, activate_time_scale_separation_prior, slow_process, time_scale_separation_factor, print_time_scale_info)[source]

Returns the logarithmic prior probability for a given set of parameters depending on the drift model and coupling term of the observed variable X and the unobserved drift-diffusion process Y that couples into X via the specified coupling term. The prior probability is computed within the given prior_range of the parameters theta. A flat prior for the parameters apart from restriction to positive constant diffusion and a non-informative prior for linear parts and Gaussian priors for higher orders are implemented. The standard deviation of the Gaussian priors is given by the scales variable. If two separate time scales are assumed via activate_time_scale_separation_prior=True, the slow_process and the time_scale_separation_factor can be specified. The time scales for each sampled parameter set are approximated by \(\left\lvert \frac{1}{f'(x)}\right\rvert\) and \(\left\lvert\frac{1}{f'(y)}\right\rvert\) with prime denoting derivative with respect to the variables \(x\) and \(y\), respectively.

static log_likelihood(theta, data, X_drift_model, Y_drift_model, X_coupling_term, Y_diffusion_model, Y_model, dt)[source]

Returns the logarithmic likelihood of the data for the given model parametrization. It is given by the modified short-time propagator of [Willers2021] .

static log_posterior(theta)[source]

Returns the logarithmic posterior probability of the data for the given model parametrization.

static neg_log_posterior(theta)[source]

Calculates the negative logarithmic posterior for a given tuple of parameters :math:` heta`.

static init_parallel_EnsembleSampler(data_window, X_drift_model, Y_drift_model, Y_diffusion_model, X_coupling_term, Y_model, dt, prior_type, prior_range, scales, activate_time_scale_separation_prior, time_scale_separation_factor, slow_process, print_time_scale_info)[source]

Initializes the workers for the MCMC sampling if MCMC_parallelization_method = 'multiprocessing' in perform_resilience_scan(...).

compute_posterior_samples(print_AC_tau, ignore_AC_error, thinning_by, print_progress, nburn, MCMC_parallelization_method=None, num_processes=None, num_chop_chains=None, MCMC_AC_estimate='standard')[source]

Compute the theta_array with \(nwalkers \cdot nsteps\) Markov Chain Monte Carlo (MCMC) samples. If ignore_AC_error = False the calculation will terminate with error if the autocorrelation of the sampled chains is too high compared to the chain length. Otherwise, the highest autocorrelation length will be used to thin the sampled chains. In order to run tests in shorter time you can set “ignore_AC_error = True” and define a thinning_by n steps. If print_AC_tau = True the autocorrelation lengths of the sampled chains is printed.

declare_MAP_starting_guesses(nwalkers=None, nsteps=None, print_time_scale_info=False)[source]

Declare the maximum a posterior (MAP) starting guesses for a MCMC sampling with nwalkers and nsteps.

calc_drift_slope_and_noise(slope_grid, noise_grid, OU_grid, X_coupling_grid, nburn, n_joint_samples=50000, n_slope_samples=50000, n_noise_samples=50000, n_OU_param_samples=50000, n_X_coupling_samples=50000, cred_percentiles=array([16, 1]), print_AC_tau=False, print_time_scale_info=False, ignore_AC_error=False, thinning_by=60, print_progress=False, detrending_per_window=None, gauss_filter_mode='reflect', gauss_filter_sigma=6, gauss_filter_order=0, gauss_filter_cval=0.0, gauss_filter_truncate=4.0, plot_detrending=False, print_details=False, MCMC_parallelization_method=None, num_processes=None, num_chop_chains=None, MCMC_AC_estimate='standard')[source]

Calculates the drift slope estimate \(\hat{\zeta}\) and the noise level \(\hat{\psi}\) from the MCMC sampled parameters of the non-Markovian model for a given window_size and given window_shift. In the course of the computations the parameters joint_samples, a noise_kernel_density_obj, a slope_kernel_density_obj, a X_coupling_density_obj, a OU_kernel_density_obj, the drift_slope with credibility intervals, the noise_estimates, the modified noise_level \(\hat{\psi}\) with credibility intervals as well as the X_coupling_estimate and OU_param_estimate with credibility invervals and in case of a third order polynomial drift the estimated fixed_point_estimate of the window will be stored.

Parameters:
  • slope_grid (One-dimensional numpy array of floats.) – Array on which the drift slope kernel density estimate is evaluated.

  • noise_grid (One-dimensional numpy array of floats.) – Array on which the noise level \(\hat{\psi}\) kernel density estimate is evaluated.

  • OU_grid (One-dimensional numpy array of floats.) – Array on which the Ornstein Uhlenbeck (OU) parameter’s kernel density estimate is evaluated.

  • X_coupling_grid (One-dimensional numpy array of floats.) – Array on which the X coupling strength kernel density estimate is evaluated.

  • nburn (int) – Number of data points at the beginning of the Markov chains which are discarded in terms of a burn in period. Usually the defaul is 200, set by the perform_resilience_scan method.

  • n_joint_samples (int) – Number of joint samples that are drawn from the estimated joint posterior probability in order to calculate the drift slope estimate \(\zeta\) and corresponding credibility bands. Default is 50000.

  • n_slope_samples (int) – Number of drift slope samples that are drawn from the estimated drift slope posterior computed based on the sampling results of the joint posterior probability of the Langevin parameters :math:` heta_i`. Default is 50000.

  • n_noise_samples (int) – Number of constant noise level samples that are drawn from the estimated marginal probability distribution of the diffusion model is chosen constant. Default is 50000.

  • n_X_coupling_samples (int) – Number of X coupling samples that are drawn from the estimated posterior probability. Default is 50000.

  • n_OU_param_samples (int) – Number of OU parameter samples that are drawn from the estimated posterior probability. Default is 50000.

  • cred_percentiles (One-dimensional numpy array of int) – Two entries to define the percentiles of the calculated credibility bands of the estimated parameters. Default is numpy.array([16,1]).

  • print_AC_tau – If True the estimated autocorrelation lengths of the Markov chains is shown. The maximum length is used for thinning of the chains. Default is False.

  • ignore_AC_error (Boolean) – If True the autocorrelation lengths of the Markov chains is not estimated and thus, it is not checked whether the chains are too short to give an reliable autocorrelation estimate. This avoids error interruption of the procedure, but can lead to unreliable results if the chains are too short. The option should be chosen in order to save computation time, e.g. when debugging your code with short Markov chains. Default is False.

  • thinning_by (int) – If ignore_AC_error = True the chains are thinned by thinning_by. Every thinning_by-th data point is stored. Default is 60.

  • print_progress (Boolean) – If True the progress of the MCMC sampling is shown. Default is False.

  • detrending_per_window – Default is None. If 'linear' the data_window is detrended linearly. If Gaussian kernel smoothed a Gaussian smoothed curve is subtracted from the data_window. The parameters of the kernel smoothing can be set by gauss_filter_mode, gauss_filter_sigma, gauss_filter_order, gauss_filter_cval and gauss_filter_truncate.

  • gauss_filter_mode (str) – According to the scipy.ndimage.filters.gaussian_filter option.

  • gauss_filter_sigma (float) – According to the scipy.ndimage.filters.gaussian_filter option.

  • gauss_filter_order (int) – According to the scipy.ndimage.filters.gaussian_filter option.

  • gauss_filter_cval (float) – According to the scipy.ndimage.filters.gaussian_filter option.

  • gauss_filter_truncate (float) – According to the scipy.ndimage.filters.gaussian_filter option.

  • plot_detrending (bool) – Default is False. If True, the self.data_window as well as the self.slow_trend and the detrended version are shown.

  • print_time_scale_info (bool) – If True, control values are plotted during each prior evaluation. First, the fast time scale estimate times the time_scale_separation_factor is printed. Second, the slow time scale estimate and third, a check up whether the desired time scale separation is fulfilled are printed.

  • MCMC_parallelization_method (str) – Default is None. If None the basic serial MCMC computation is performed. If MCMC_parallelization_method = ‘multiprocessing’, a multiprocessing pool with num_processes is used to accelerate MCMC sampling. If MCMC_parallelization_method = ‘chop_chain’ is used, the total length of the desired Markov chain is divided into ‘chop_chain’ parts each of which is sampled in parallel and joined together in the end.

  • num_processes (str or int) – Default is None. If 'half', almost half of the CPU kernels are used. If 'all' , all CPU kernels are used. If integer number, the defined number of CPU kernels is used for multiprocessing.

  • num_chop_chains (int) – Number by which the total length of the Markov chain is divided. Each slice is sampled in parallel and joined together in the end of the calculations.

  • MCMC_AC_estimate (str) – If default ‘standard’ is used, emcee’s .get_autocorr_time() is applied to estimate th sampled Markov chain’s autocorrelation length for thinning. In some cases the estimation procedure requires longer chains than you can run and does not converge at all. In such situations you can try to estimate the autocorrelation length with parametric models following the suggestions of the emcee documentation via MCMC_AC_estimate = 'alternative'.

perform_resilience_scan(window_size, window_shift, slope_grid, noise_grid, OU_grid, X_coupling_grid, nwalkers=50, nsteps=10000, nburn=200, n_joint_samples=50000, n_slope_samples=50000, n_noise_samples=50000, n_OU_param_samples=50000, n_X_coupling_samples=50000, cred_percentiles=array([16, 1]), print_AC_tau=False, ignore_AC_error=False, thinning_by=60, print_progress=False, print_details=False, print_time_scale_info=False, slope_save_name='default_save_slopes', noise_level_save_name='default_save_noise', save=True, create_animation=False, ani_save_name='default_animation_name', animation_title='', mark_critical_point=None, mark_noise_level=None, detrending_per_window=None, gauss_filter_mode='reflect', gauss_filter_sigma=6, gauss_filter_order=0, gauss_filter_cval=0.0, gauss_filter_truncate=4.0, plot_detrending=False, MCMC_parallelization_method=None, num_processes=None, num_chop_chains=None, MCMC_AC_estimate='standard')[source]

Performs an automated window scan with defined window_shift over the whole time series. In each window the drift slope, noise level, X coupling strength and Ornstein-Uhlenbeck (OU) parameter estimates with corresponding credibility bands are computed and saved in the slope_storage, noise_level_storage, X_coupling_storage and OU_param_storage. It can also be used to create an animation of the sliding window approach plotting the time series, the moving window, and the time evolution of the drift slope estimates \(\hat{\zeta}\), the noise level \(\hat{\psi}\) and the noise kernel density estimate. The start indices of the shifted windows are also saved in order to facilitate customized plots.

Parameters:
  • window_size (int) – Time window size.

  • window_shift (int) – The rolling time window is shifted about window_shift data points.

  • slope_grid (One-dimensional numpy array of floats.) – Array on which the drift slope kernel density estimate is evaluated.

  • noise_grid (One-dimensional numpy array of floats.) – Array on which the noise level kernel density estimate is evaluated.

  • OU_grid (One-dimensional numpy array of floats.) – Array on which the Ornstein Uhlenbeck (OU) parameter’s kernel density estimate is evaluated.

  • X_coupling_grid (One-dimensional numpy array of floats.) – Array on which the X coupling strength kernel density estimate is evaluated.

  • nwalkers (int) – Number of walkers that are initialized for the MCMC sampling via the package emcee. Default is 50.

  • nsteps (int) – Length of the sampled MCMC chains. Default is 10000.

  • nburn (int) – Number of data points at the beginning of the Markov chains which are discarded in terms of a burn in period. Default is 200.

  • n_joint_samples (int) – Number of joint samples that are drawn from the estimated joint posterior probability in order to calculate the drift slope estimate \(\zeta\) and corresponding credibility bands. Default is 50000.

  • n_slope_samples (int) – Number of drift slope samples that are drawn from the estimated drift slope posterior computed based on the sampling results of the joint posterior probability of the Langevin parameters :math:` heta_i`. Default is 50000.

  • n_noise_samples (int) – Number of constant noise level samples that are drawn from the estimated marginal probability distribution of the diffusion model is chosen constant. Default is 50000.

  • n_X_coupling_samples (int) – Number of X coupling samples that are drawn from the estimated posterior probability. Default is 50000.

  • n_OU_param_samples (int) – Number of OU parameter samples that are drawn from the estimated posterior probability. Default is 50000.

  • cred_percentiles (One-dimensional numpy array of integers.) – Two entries to define the percentiles of the calculated credibility bands of the estimated parameters. It is stored in the attribute credibility_bands. Default is numpy.array([16,1]).

  • print_AC_tau – If True the estimated autocorrelation lengths of the Markov chains is shown. The maximum length is used for thinning of the chains. Default is False.

  • ignore_AC_error (Boolean) – If True the autocorrelation lengths of the Markov chains is not estimated and thus, it is not checked whether the chains are too short to give an reliable autocorrelation estimate. This avoids error interruption of the procedure, but can lead to unreliable results if the chains are too short. The option should be chosen in order to save computation time, e.g. when debugging your code with short Markov chains. Default is False.

  • thinning_by (int) – If ignore_AC_error = True the chains are thinned by thinning_by. Every thinning_by-th data point is stored. Default is 60.

  • print_progress (Boolean) – If True the progress of the MCMC sampling is shown. Default is False.

  • print_details (Boolean) – If True a more detailed print output in the case the binning procedure is provided. Default is False.

  • slope_save_name (string) – Name of the file in which the slope_storage array will be saved. Default is default_save_slopes.

  • noise_level_save_name (string) – Name of the file in which the noise_level_storage array will be saved. Default is default_save_noise.

  • save (Boolean) – If True the slope_storage and the noise_level_storage arrays are saved in the end of the scan in an .npy file. Default is True.

  • create_animation (Boolean) –

    If True an automated animation of the time evolution of the drift slope estimate :math:hat{zeta}`, the noise level \(\hat{\sigma}\) and the noise kernel density is shown together with the time series and rolling windows. Default is False.

    Warning

    The implementation is same as in LangevinEstimation. It might work to generate animations with the option, but it is not tested. If errors occur, they are almost certainly bugs for that reason.

  • ani_save_name (string) – If create_animation = True the animation is saved in a .mp4 file with this name. Default is default_animation_name.

  • animation_title (string) – A title can be given to the animation.

  • mark_critical_point (float) – A red dotted vertical line is shown at time mark_critical_point. Default is None.

  • mark_noise_level (float) – A green dotted line is shown at the noise level mark_noise_level in the noise kernel density plot and the time evolution plot of the noise level. Default is None.

  • detrending_per_window – Default is None. If 'linear' the data_window is detrended linearly. If Gaussian kernel smoothed a Gaussian smoothed curve is subtracted from the data_window. The parameters of the kernel smoothing can be set by gauss_filter_mode, gauss_filter_sigma, gauss_filter_order, gauss_filter_cval and gauss_filter_truncate.

  • gauss_filter_mode (str) – According to the scipy.ndimage.filters.gaussian_filter option.

  • gauss_filter_sigma (float) – According to the scipy.ndimage.filters.gaussian_filter option.

  • gauss_filter_order (int) – According to the scipy.ndimage.filters.gaussian_filter option.

  • gauss_filter_cval (float) – According to the scipy.ndimage.filters.gaussian_filter option.

  • gauss_filter_truncate (float) – According to the scipy.ndimage.filters.gaussian_filter option.

  • plot_detrending (bool) – Default is False. If True, the self.data_window as well as the self.slow_trend and the detrended version are shown.

  • print_time_scale_info (bool) – If True, control values are plotted during each prior evaluation. First, the fast time scale estimate times the time_scale_separation_factor is printed. Second, the slow time scale estimate and third, a check up whether the desired time scale separation is fulfilled are printed.

  • MCMC_parallelization_method (str) – Default is None. If None the basic serial MCMC computation is performed. If MCMC_parallelization_method = ‘multiprocessing’, a multiprocessing pool with num_processes is used to accelerate MCMC sampling. If MCMC_parallelization_method = ‘chop_chain’ is used, the total length of the desired Markov chain is divided into ‘chop_chain’ parts each of which is sampled in parallel and joined together in the end.

  • num_processes (str or int) – Default is None. If 'half', almost half of the CPU kernels are used. If 'all', all CPU kernels are used. If integer number, the defined number of CPU kernels is used for multiprocessing.

  • num_chop_chains (int) – Number by which the total length of the Markov chain is divided. Each slice is sampled in parallel and joined together in the end of the calculations.

  • MCMC_AC_estimate (str) –

    If default ‘standard’ is used, emcee’s .get_autocorr_time() is applied to estimate the sampled Markov chain’s autocorrelation length for thinning. In some cases the estimation procedure requires longer chains than you can run and does not converge at all. In such situations you can try to estimate the autocorrelation length with parametric models following the suggestions of the emcee documentation via MCMC_AC_estimate = 'alternative'.

animation(i, slope_grid, noise_grid, OU_grid, X_coupling_grid, nwalkers, nsteps, nburn, n_joint_samples, n_slope_samples, n_noise_samples, n_OU_param_samples, n_X_coupling_samples, cred_percentiles, print_AC_tau, ignore_AC_error, thinning_by, print_progress, print_details, MCMC_parallelization_method, num_processes, num_chop_chains)[source]

Function that is called iteratively by the matplotlib.animation.FuncAnimation(…) tool to generate an animation of the rolling window scan of a time series. The time series, the rolling windows, the time evolution of the drift slope estimate \(\hat{\zeta}\) and the noise level estimate \(\hat{\psi}\) and its posterior probality density is shown in the animation.

perform_MAP_resilience_scan(window_size, window_shift, cred_percentiles=array([16, 1]), error_propagation='summary statistics', summary_window_size=10, sigma_multiples=array([1, 3]), print_progress=True, print_details=False, slope_save_name='default_save_slopes', noise_level_save_name='default_save_noise', save=True, create_plot=False, ani_save_name='default_animation_name', animation_title='', mark_critical_point=None, mark_noise_level=None, print_time_scale_info=False, print_hint=True, fastMAPflag=False)[source]

Performs an automated MAP window scan with defined window_shift over the whole time series. This might be the first approach for a non-Markovian estimation, since the MCMC computations are time-consuming. In each window the drift slope, noise level, X coupling strength and OU parameter estimates with corresponding credibility bands are computed and saved in the slope_storage and the noise_level_storage.

Parameters:
  • window_size (int) – Time window size.

  • window_shift (int) – The rolling time window is shifted about window_shift data points.

  • cred_percentiles (One-dimensional numpy array of integers.) – Two entries to define the percentiles of the calculated credibility bands of the estimated parameters. It is stored in the attribute credibility_bands. Default is numpy.array([16,1]).

  • error_propagation (str) –

    Defines the method that is used to compute the confidence bands. Default is 'summary statistics. In that case drift slope samples of size summary_window_size are used to compute the drift slope mean and its standard error. The parameter sigma_multiples defines the width of the summary statistics’ symmetric error bands. If 'error bound' or 'uncorrelated Gaussian' is chosen, the marginal uncertainties corresponding to cred_percentiles are computed employing Wilks’ theorem. With 'error bound' the highest uncertainties per parameter and cred_percentiles level is interpreted as a symmetric worst case bound of the error. If 'uncorrelated Gaussian' is chosen, the uncertainties are treated corresponding to the slight asymmetric results of the Wilks’ theorem confidence intervals. Both options give trustworthy results in the case of a first order polynomial drift. In case of the third order polynomial drift they are too optimistic, i.e. narrow, since the error bounds ('error bound') or errors ('uncorrelated Gaussian') are propagated without regard of correlations in the model parameters. Furthermore, the 'uncorrelated Gaussian' option keeps the slightly asymmetric errors proposed by Wilks’ theorem for the marginal parameter distributions. This small formal incorrectness is to maintain information about asymmetry in the estimates in a first guess. The cred_percentiles are fixed to cred_percentiles = numpy.array([5,1]) to transform the half confidence bands under the assumption of Gaussian distributions by the factors 1.96 and 2.5525, respectively. The propagated result is transformed back.

    Hint

    The options 'error bound' and 'uncorrelated Gaussian' are only trustworthy for a linear drift term. They might also be correct for a third order polynomial drift in the case ofvery high amounts of data per window. Otherwise, the default 'summary statistics' should be used, unless the window shift is very high. In that case it might introduce a too strong delay due to the averaging procedure.

  • print_progress (Boolean) – If True the progress of the MAP estimation per window is shown. Default is False.

  • print_details (Boolean) – If True a more detailed print output in the case the binning procedure is provided. Default is False.

  • slope_save_name (string) – Name of the file in which the slope_storage array will be saved. Default is default_save_slopes.

  • noise_level_save_name (string) – Name of the file in which the noise_level_storage array will be saved. Default is default_save_noise.

  • save (Boolean) – If True the slope_storage and the noise_level_storage arrays are saved in the end of the scan in an .npy file. Default is True.

  • create_plot (Boolean) – If True, the MAP resilience scan results are plotted. Default is False. NOTE: Not implemented yet.

  • summary_window_size (int) – If error_propagation = 'summary statistics' is chosen, the parameter defines the number of drift slope estimates to use in a window summary statistic. The windows are shifted by one.

  • sigma_multiples (One dimensional numpy array of float .) – The array hast two entries. If error_propagation = 'summary statistics' is chosen, the entries define the drift slope standard error multiples which are used to calculate the uncertainty bands.

static auto_window(taus, c)[source]

Helper function to perform parametric Markov chain autocorrelation estimation following the suggestions in emcee’s documentation .

static autocorr_ml(y, thin=1, c=5.0)[source]

Helper function to perform parametric Markov chain autocorrelation estimation following the suggestions in emcee’s documentation .

static autocorr_new(y, c=5.0)[source]

Helper function to perform parametric Markov chain autocorrelation estimation following the suggestions in emcee’s documentation .

static autocorr_func_1d(x, norm=True)[source]

Helper function to perform parametric Markov chain autocorrelation estimation following the suggestions in emcee’s documentation .

static next_pow_two(n)[source]

Helper function to perform parametric Markov chain autocorrelation estimation following the suggestions in emcee’s documentation .

antiCPy.early_warnings.drift_slope.summary_statistics_helper.summary_statistics_helper(metric, summary_window_size=10, sigma_multiples=array([1, 3]))[source]

Computes the mean of the drift slope \(\hat{\zeta}\) and its standard error in a predefined summary statistics window for the LangevinEstimation approaches with enabled summary statistics. In case of the NonMarkovEstimation the summary statistics are computed for the drift slope \(\hat{\zeta}\) and the noise level estimates \(\hat{\psi}\).

Parameters:
  • summary_window_size (int) – If error_propagation = 'summary statistics' is chosen, the parameter defines the number of drift slope estimates to use in a window summary statistic. The windows are shifted by one.

  • sigma_multiples (One dimensional numpy array of float .) – The array hast two entries. If error_propagation = 'summary statistics' is chosen, the entries define the drift slope standard error multiples which are used to calculate the uncertainty bands.