The “drift_slope” package
The detailed documentation of the drift_slope package can be found in the following.
- class antiCPy.early_warnings.drift_slope.rocket_fast_resilience_estimation.RocketFastResilienceEstimation(antiCPyObject=None)[source]
Superclass of the
LangevinEstimation,BinningLangevinEstimationandNonMarkovEstimationclass. Initializes functions for a strong parallelisation of the rolling window based MCMC and MAP algorithm by coordinating multiple window calculations over a freely eligible number of subprocesses. Instead of parallelizing the MCMC sampling itself, the whole window calculations are parallelized with single CPU MCMC sampling. This implementation yields significant speed up of computation times.- Parameters:
antiCPyObject (str) – Internal identifier of the object type.
- fast_resilience_scan(window_size, window_shift, slope_grid, noise_grid, OU_grid=None, X_coupling_grid=None, nwalkers=50, nsteps=10000, nburn=200, n_joint_samples=50000, detrending_per_window=None, n_slope_samples=50000, n_noise_samples=50000, n_OU_param_samples=50000, n_X_coupling_samples=50000, MCMC_AC_estimate='standard', cred_percentiles=array([16, 1]), print_AC_tau=False, ignore_AC_error=False, thinning_by=60, print_progress=False, print_details=False, print_time_scale_info=False, slope_save_name='default_save_slopes', noise_level_save_name='default_save_noise', OU_param_save_name='default_save_OUparam', X_coupling_save_name='default_save_Xcoupling', num_processes=None, save=True)[source]
The function’s structure to use is almost equivalent to
perform_resilience_scanof theLangevinEstimationand theNonMarkovEstimationclass. It initializes a multiprocessing pool to calculate several rolling windows simultaneously.
- static init_parallel_Langevin(antiCPyObject, data, time, loop_range, window_size, slope_grid, noise_grid, nwalkers, nsteps, nburn, n_joint_samples, n_slope_samples, n_noise_samples, cred_percentiles, print_AC_tau, ignore_AC_error, thinning_by, print_progress, print_details, drift_model, diffusion_model, prior_type, prior_range, scales, slope_storage_connector, noise_storage_connector, detrending_per_window, OU_param_storage_connector=None, X_coupling_storage_connector=None, n_OU_param_samples=None, n_X_coupling_samples=None, MCMC_AC_estimate=None, Y_model=None, Y_drift_model=None, Y_diffusion_model=None, activate_time_scale_separation_prior=False, slow_process=False, time_scale_separation_factor=None, max_likelihood_starting_guesses=None, OU_grid=None, X_coupling_grid=None, print_time_scale_info=None)[source]
Internal function. Only to initialize the workers of
RocketFastResilienceEstimation.fast_resilience_scan(...).
- static execute_window(m)[source]
Internal function. Subprocess Script of the parallel worker of
RocketFastResilienceEstimation.fast_MAP_resilience_scan(...).
- fast_MAP_resilience_scan(window_size, window_shift, num_processes='half', cred_percentiles=array([16, 1]), error_propagation='summary statistics', summary_window_size=10, sigma_multiples=array([1, 3]), print_progress=True, print_details=False, slope_save_name='default_save_slopes', noise_level_save_name='default_save_noise', OU_param_save_name='default_save_OUparam', X_coupling_save_name='default_save_Xcoupling', save=True, print_time_scale_info=False)[source]
The function’s structure to use is almost equivalent to
perform_MAP_resilience_scanof theLangevinEstimation,BinningLangevinEstimationand theNonMarkovEstimationclass. It initializes a multiprocessing pool to calculate several rolling windows simultaneously.
- static init_parallel_MAP_Langevin(antiCPyObject, data, time, loop_range, window_size, cred_percentiles, print_details, print_progress, drift_model, diffusion_model, slope_storage_connector, noise_storage_connector, error_propagation, summary_window_size, sigma_multiples, prior_range, prior_type, scales, bin_num=None, OU_param_storage_connector=None, X_coupling_storage_connector=None, Y_model=None, Y_drift_model=None, Y_diffusion_model=None, activate_time_scale_separation_prior=False, slow_process=False, time_scale_separation_factor=None, max_likelihood_starting_guesses=None, print_time_scale_info=None)[source]
Internal function. Only to initialize the workers of
RocketFastResilienceEstimation.fast_MAP_resilience_scan(...).
- class antiCPy.early_warnings.drift_slope.langevin_estimation.LangevinEstimation(data, time, drift_model='3rd order polynomial', diffusion_model='constant', prior_type='Non-informative linear and Gaussian Prior', prior_range=None, scales=array([4, 8]), detrending_of_whole_dataset=None, gauss_filter_mode='reflect', gauss_filter_sigma=6, gauss_filter_order=0, gauss_filter_cval=0.0, gauss_filter_truncate=4.0, plot_detrending=False)[source]
Bases:
RocketFastResilienceEstimationThe
LangevinEstimationclass includes tools to estimate a polynomial Langevin equation with various drift and diffusion terms and provides the drift slope \(\zeta\) as a resilience measure.- Parameters:
data (One dimensional numpy array of floats) – A one dimensional numpy array containing the times series to analyse.
time (One dimensional numpy array of floats) – A one dimensional numpy array containing the time samples of the time series data.
drift_model (str) – Defines the drift model. Default is
'3rd order polynomial'. Additionally, a'first order polynomial'can be chosen.diffusion_model (str) – Defines the diffusion model. Default is
'constant'. Additionally, a'first order polynomial'can be chosen.prior_type – Defines the used prior to calculate the posterior distribution of the data. Default is
'Non-informative linear and Gaussian Prior'. A flat prior can be chosen with'Flat Prior'.prior_range (Two-dimensional numpy array of floats, optional) – Customize the prior range of the estimated polynomial parameters :math:` heta_i` starting for :math:` heta_0` in row with index 0 and upper limits in the first column. Default is
Nonewhich means prior ranges of -50 to 50 for the drift parameters and 0 to 50 for constant diffusion terms.scales (One-dimensional numpy array with two float entries) – Two tailed percentiles to create credibility bands of the estimated measures.
theta (One-dimensional numpy array of floats) – Array that contains the estimated drift and diffusion parameters with drift and ending with diffusion. The lowest order is mentioned first.
ndim (int) – Total number of parameters to estimate.
nwalkers (int) – Number of MCMC chains.
nsteps (int) – Length of Markov chains.
nburn (int) – Length of burn in period of the MCMC chains.
data_size (int) – Length of the data array.
dt (float) – Time intervall of the equidistant time array.
drift_slope (One-dimensional numpy array of floats) – Array with the current drift slope estimate in the 0th component. Component 1, 2 and 3, 4 contain the upper and lower bound of the credibility intervals defined by the
scalesvariable.noise_level_estimate (One-dimenional numpy array of floats) – Array with the noise level estimate in the 0th component. Component 1, 2 and 3, 4 contain the upper and lower bound of the credibility intervals defined by the
scalesvariable.loop_range (One-dimensional (
data_size - window_size / window_shift + 1) numpy array of integers) – Array with the start indices of the time array for each rolled time window.window_size (int) – Size of rolling windows.
data_window (One-dimensional numpy array of floats) – Array that contains data of a certain interval of the dataset in
data.time_window (One-dimensional numpy array of floats) – Array that contains time of a certain interval of the dataset in
data.increments (One-dimensional numpy array of floats) – Array that contains the increments from \(t\) to \(t+1\).
joint_samples (Two-dimensional (
ndim, num_of_joint_samples) numpy array of floats) – Array that contains drawed parameter samples from their joint posterior probability density function that is estimated from the data in the currentdata_window.slope_estimates (One-dimensional numpy array of floats) – Estimates of the drift slope on the current
data_window.fixed_point_estimate (float) – Fixed point estimate calculated as the mean of the current
data_window.starting_guesses (Two-dimensional numpy array of floats) – Array that contains the initial guesses of the MCMC sampling.
theta_array (Two-dimensional numpy array of floats) – Array that contains the drift and diffusion parameters sampled with MCMC.
slope_storage (Two-dimensional (
5, loop_range.size) numpy array of floats) – Array that contains the drift slope estimates \(\hat{\zeta}\) of the rolling windows. The columns encode each rolling window with the drift slope estimate \(\hat{\zeta}\) in the 0th row and in rows 1,2 and 3,4 the upper and lower bound of the credibility intervals defined byscales.noise_level_storage (Two-dimensional (
5, loop_range.size) numpy array of floats) – Array that contains the noise level estimates \(\hat{\sigma}\) of the rolling windows. The columns encode each rolling window with the noise level estimate \(\hat{\sigma}\) in the 0th row and in rows 1,2 and 3,4 the upper and lower bound of the credibility intervals defined byscales.slope_kernel_density_obj (
scipykernel density object of thescipy.gaussian_kde(...)function.) – Kernel density object ofscipy.gaussian_kde(...)of the slope posterior created with a Gaussian kernel and a bandwidth computed by silverman’s rule.noise_kernel_density_obj – Kernel density object of
scipy.gaussian_kde(...)of the noise level posterior created with a Gaussian kernel and a bandwidth computed by silverman’s rule.slow_trend (One-dimensional numpy array of floats.) – Contains the subtracted slow trend if a detrending is applied to the whole data or each window separately.
detrending_of_whole_dataset (str) – Default is
None. If'linear'the wholedatais detrended linearly. IfGaussian kernel smootheda Gaussian smoothed curve is subtracted from the wholedata. The parameters of the kernel smoothing can be set bygauss_filter_mode,gauss_filter_sigma,gauss_filter_order,gauss_filter_cvalandgauss_filter_truncate.gauss_filter_mode (str) – According to the
scipy.ndimage.filters.gaussian_filteroption.gauss_filter_sigma (float) – According to the
scipy.ndimage.filters.gaussian_filteroption.gauss_filter_order (int) – According to the
scipy.ndimage.filters.gaussian_filteroption.gauss_filter_cval (float) – According to the
scipy.ndimage.filters.gaussian_filteroption.gauss_filter_truncate (float) – According to the
scipy.ndimage.filters.gaussian_filteroption.plot_detrending (bool) – Default is
False. IfTrue, theself.dataas well as theself.slow_trendand the detrended version are shown.
- static detrend(time, data, detrend_mode='linear', gauss_filter_mode='reflect', gauss_filter_sigma=6, gauss_filter_order=0, gauss_filter_cval=0.0, gauss_filter_truncate=4.0, plot_detrending=False)[source]
Method to apply a linear or Gaussian kernel smoothed detrending to the whole data or to each data window separately.
- Parameters:
time (One-dimensional numpy array of floats.) – Time points of the time series.
data (One-dimensional numpy array of floats.) – Time series that is to be detrended.
detrend_mode (str) – Each window is linearly detrended for
detrend = 'linear'and with a Gaussian kernel filter viadetrend = 'gauss_kernel'. The linear detrending is the default option.gauss_filter_mode (str) – According to the
scipy.ndimage.filters.gaussian_filteroption.gauss_filter_sigma (float) – According to the
scipy.ndimage.filters.gaussian_filteroption.gauss_filter_order (int) – According to the
scipy.ndimage.filters.gaussian_filteroption.gauss_filter_cval (float) – According to the
scipy.ndimage.filters.gaussian_filteroption.gauss_filter_truncate (float) – According to the
scipy.ndimage.filters.gaussian_filteroption.
- Returns:
The first return object contains the detrended data, the second one the subtracted slow trend.
- Return type:
One-dimensional numpy arrays.
- third_order_polynom_slope_in_fixed_point()[source]
Calculate the slope_estimates of a third order polynomial drift function with joint_samples of the posterior distribution around a fixed point estimate given as the data mean of the current window.
- static calc_D1(theta, data, drift_model)[source]
Returns the drift parameterized by a first or third order polynomial for input data.
- static calc_D2(theta, data, diffusion_model)[source]
Returns the diffusion parameterized by a constant or first order polynomial for input data.
- static log_prior(theta, drift_model, diffusion_model, prior_type, prior_range, scales)[source]
Returns the logarithmic prior probability for a given set of parameters based on the short time propagator depending on the drift and diffusion models and the given prior_range of the parameters theta. A flat prior for the parameters apart from restriction to positive constant diffusion and a non-informative prior for linear parts and Gaussian priors for higher orders are implemented. The standard deviation of the Gaussian priors is given by the scales variable.
- static log_posterior(theta)[source]
Returns the logarithmic posterior probability of the data for the given model parametrization.
- static log_likelihood(theta, data, dt, drift_model, diffusion_model)[source]
Returns the logarithmic likelihood of the data for the given model parametrization.
- static neg_log_posterior(theta)[source]
Returns the negative logarithmic posterior distribution of the data for the given model parametrization.
- declare_MAP_starting_guesses(nwalkers=None, nsteps=None)[source]
Declare the maximum a posterior (MAP) starting guesses for a MCMC sampling with
nwalkersandnsteps.
- compute_posterior_samples(print_AC_tau, ignore_AC_error, thinning_by, print_progress, MCMC_parallelization_method=None, num_processes=None, num_chop_chains=None)[source]
Compute the theta_array with \(nwalkers \cdot nsteps\) Markov Chain Monte Carlo (MCMC) samples. If ignore_AC_error = False the calculation will terminate with error if the autocorrelation of the sampled chains is too high compared to the chain length. Otherwise the highest autocorrelation length will be used to thin the sampled chains. In order to run tests in shorter time you can set “ignore_AC_error = True” and define a thinning_by n steps. If print_AC_tau = True the autocorrelation lengths of the sampled chains is printed.
- static init()[source]
Defines the animation background for easy visualization of a rolling window scan of a given time series. It prepares plots for the time series and the time evolution of the drift slope estimate \(\hat{\zeta}\) and noise level estimate \(\hat{\sigma}\) and its probability density estimate. The method needs to be static. Some variables have to be declared global in to guarantee the functionality of the animation procedure.
- animation(i, slope_grid, noise_grid, nwalkers, nsteps, nburn, n_joint_samples, n_slope_samples, n_noise_samples, cred_percentiles, print_AC_tau, ignore_AC_error, thinning_by, print_progress, detrending_per_window, gauss_filter_mode, gauss_filter_sigma, gauss_filter_order, gauss_filter_cval, gauss_filter_truncate, plot_detrending, MCMC_parallelization_method, num_processes, num_chop_chains)[source]
Function that is called iteratively by the
matplotlib.animation.FuncAnimation(...)tool to generate an animation of the rolling window scan of a time series. The time series, the rolling windows, the time evolution of the drift slope estimate \(\hat{\zeta}\) and the noise level estimate \(\hat{\sigma}\) and its posterior probality density is shown in the animation.
- calc_driftslope_noise(slope_grid, noise_grid, nwalkers=50, nsteps=10000, nburn=200, n_joint_samples=50000, n_slope_samples=50000, n_noise_samples=50000, cred_percentiles=[16, 1], print_AC_tau=False, ignore_AC_error=False, thinning_by=60, print_progress=False, detrending_per_window=None, gauss_filter_mode='reflect', gauss_filter_sigma=6, gauss_filter_order=0, gauss_filter_cval=0.0, gauss_filter_truncate=4.0, plot_detrending=False, MCMC_parallelization_method=None, num_processes=None, num_chop_chains=None)[source]
Calculates the drift slope estimate \(\hat{\zeta}\) and the noise level \(\hat{\sigma}\) from the MCMC sampled parameters of a Langevin model for a given
window_sizeand givenwindow_shift. In the course of the computations the parametersjoint_samples, anoise_kernel_density_obj, aslope_kernel_density_obj, thedrift_slopewith credibility intverals, thenoise_levelwith credibility intervals and in case of a third order polynomial drift the estimated fixed_point_estimate of the window will be stored.- Parameters:
slope_grid (One-dimensional numpy array of floats) – Array on which the drift slope kernel density estimate is evaluated.
noise_grid (One-dimensional numpy array of floats) – Array on which the noise level kernel density estimate is evaluated.
nwalkers (int) – Number of walkers that are initialized for the MCMC sampling via the package
emcee. Default is 50.nsteps (int) – Length of the sampled MCMC chains. Default is 10000.
nburn (int) – Number of data points at the beginning of the Markov chains which are discarded in terms of a burn in period. Default is 200.
n_joint_samples (int) – Number of joint samples that are drawn from the estimated joint posterior probability in order to calculate the drift slope estimate \(\zeta\) and corresponding credibility bands. Default is 50000.
n_slope_samples (int) – Number of drift slope samples that are drawn from the estimated drift slope posterior computed based on the sampling results of the joint posterior probability of the Langevin parameters :math:` heta_i`. Default is 50000.
n_noise_samples (int) – Number of constant noise level samples that are drawn from the estimated marignal probility distribution of the diffusion model is chosen constant. Default is 50000.
cred_percentiles (One-dimensional numpy array of integers) – Two entries to define the percentiles of the calculated credibility bands of the estimated parameters. Default is
numpy.array([16,1]).print_AC_tau – If
Truethe estimated autocorrelation lengths of the Markov chains is shown. The maximum length is used for thinning of the chains. Default isFalse.ignore_AC_error (bool) – If
Truethe autocorrelation lengths of the Markov chains is not estimated and thus, it is not checked whether the chains are too short to give an reliable autocorrelation estimate. This avoids error interruption of the procedure, but can lead to unreliable results if the chains are too short. The option should be chosen in order to save computation time, e.g. when debugging your code with short Markov chains. Default isFalse.thinning_by (int) – If
ignore_AC_error = Truethe chains are thinned bythinning_by. Everythinning_by-th data point is stored. Default is 60.print_progress (bool) – If
Truethe progress of the MCMC sampling is shown. Default isFalse.detrending_per_window (str) – Default is
None. If'linear', theself.data_windowis detrended linearly. IfGaussian kernel smoothed, a Gaussian smoothed curve is subtracted fromself.data_window. The parameters of the kernel smoothing can be set bygauss_filter_mode,gauss_filter_sigma,gauss_filter_order,gauss_filter_cvalandgauss_filter_truncate. Drift slope and noise level are estimated after detrending in these cases.gauss_filter_mode (str) – According to the
scipy.ndimage.filters.gaussian_filteroption.gauss_filter_sigma (float) – According to the
scipy.ndimage.filters.gaussian_filteroption.gauss_filter_order (int) – According to the
scipy.ndimage.filters.gaussian_filteroption.gauss_filter_cval (float) – According to the
scipy.ndimage.filters.gaussian_filteroption.gauss_filter_truncate (float) – According to the
scipy.ndimage.filters.gaussian_filteroption.plot_detrending (bool) – Default is
False. IfTrue, theself.dataas well as theself.slow_trendand the detrended version are shown.
- perform_resilience_scan(window_size, window_shift, slope_grid, noise_grid, nwalkers=50, nsteps=10000, nburn=200, n_joint_samples=50000, n_slope_samples=50000, n_noise_samples=50000, cred_percentiles=[16, 1], print_AC_tau=False, ignore_AC_error=False, thinning_by=60, print_progress=False, slope_save_name='default_save_slopes', noise_level_save_name='default_save_noise', save=True, create_animation=False, ani_save_name='default_animation_name', animation_title='', mark_critical_point=None, mark_noise_level=None, detrending_per_window=None, gauss_filter_mode='reflect', gauss_filter_sigma=6, gauss_filter_order=0, gauss_filter_cval=0.0, gauss_filter_truncate=4.0, plot_detrending=False, MCMC_parallelization_method=None, num_processes=None, num_chop_chains=None)[source]
Performs an automated window scan with defined
window_shiftover the whole time series. In each window the drift slope and noise level estimates with corresponding credibility bands are computed and saved in theslope_storageand thenoise_level_storage. It can also be used to create an animation of the sliding window approach plotting the time series, the moving window, and the time evolution of the drift slope estimates \(\hat{\zeta}\), the noise level \(\hat{\sigma}\) and the noise kernel density estimate. The start indices of the shifted windows is also saved in order to facilitate customized plots.- Parameters:
window_size (int) – Time window size.
window_shift (int) – The rolling time window is shifted about
window_shiftdata points.slope_grid (One-dimensional numpy array of floats) – Array on which the drift slope kernel density estimate is evaluated.
noise_grid (One-dimensional numpy array of floats) – Array on which the noise level kernel density estimate is evaluated.
nwalkers (int) – Number of walkers that are initialized for the MCMC sampling via the package
emcee. Default is 50.nsteps (int) – Length of the sampled MCMC chains. Default is 10000.
nburn (int) – Number of data points at the beginning of the Markov chains which are discarded in terms of a burn in period. Default is 200.
n_joint_samples (int) – Number of joint samples that are drawn from the estimated joint posterior probability in order to calculate the drift slope estimate \(\zeta\) and corresponding credibility bands. Default is 50000.
n_slope_samples (int) – Number of drift slope samples that are drawn from the estimated drift slope posterior computed based on the sampling results of the joint posterior probability of the Langevin parameters :math:` heta_i`. Default is 50000.
n_noise_samples (int) – Number of constant noise level samples that are drawn from the estimated marignal probility distribution of the diffusion model is chosen constant. Default is 50000.
cred_percentiles (One-dimensional numpy array of integers) – Two entries to define the percentiles of the calculated credibility bands of the estimated parameters. Default is
numpy.array([16,1]).print_AC_tau – If
Truethe estimated autocorrelation lengths of the Markov chains is shown. The maximum length is used for thinning of the chains. Default isFalse.ignore_AC_error (bool) – If
Truethe autocorrelation lengths of the Markov chains is not estimated and thus, it is not checked whether the chains are too short to give an reliable autocorrelation estimate. This avoids error interruption of the procedure, but can lead to unreliable results if the chains are too short. The option should be chosen in order to save computation time, e.g. when debugging your code with short Markov chains. Default isFalse.thinning_by (int) – If
ignore_AC_error == Truethe chains are thinned bythinning_by. Everythinning_by-th data point is stored. Default is 60.print_progress (bool) – If
Truethe progress of the MCMC sampling is shown. Default isFalse.slope_save_name (str) – Name of the file in which the
slope_storagearray will be saved. Default isdefault_save_slopes.noise_level_save_name (str) – Name of the file in which the
noise_level_storagearray will be saved. Default isdefault_save_noise.save (bool) – If
Truetheslope_storageand thenoise_level_storagearrays are saved in the end of the scan in an .npy file. Default isTrue.create_animation (bool) –
If
Truean automated animation of the time evolution of the drift slope estimate :math:hat{zeta}`, the noise level \(\hat{\sigma}\) and the noise kernel density is shown together with the time series and rolling windows. Default isFalse.Warning
It makes use of global variables to incorporate the movie animation tool in the class. The
create_animationparameter is only intended to easily reconstruct the plot results of the related publication. In other circumstances it should not be used in order to avoid conflicts with the global variable names.ani_save_name (str) – If
create_animation = Truethe animation is saved in a .mp4 file with this name. Default isdefault_animation_name.animation_title (str) – A title can be given to the animation.
mark_critical_point (float) – A red dotted vertical line is shown at time
mark_critical_point. Default isNone.mark_noise_level (float) – A green dotted line is shown at the noise level
mark_noise_levelin the noise kernel density plot and the time evolution plot of the noise level. Default isNone.detrending_per_window – Default is
None. If'linear'thedata_windowis detrended linearly. IfGaussian kernel smootheda Gaussian smoothed curve is subtracted from thedata_window. The parameters of the kernel smoothing can be set bygauss_filter_mode,gauss_filter_sigma,gauss_filter_order,gauss_filter_cvalandgauss_filter_truncate.gauss_filter_mode (str) – According to the
scipy.ndimage.filters.gaussian_filteroption.gauss_filter_sigma (float) – According to the
scipy.ndimage.filters.gaussian_filteroption.gauss_filter_order (int) – According to the
scipy.ndimage.filters.gaussian_filteroption.gauss_filter_cval (float) – According to the
scipy.ndimage.filters.gaussian_filteroption.gauss_filter_truncate (float) – According to the
scipy.ndimage.filters.gaussian_filteroption.plot_detrending (bool) – Default is
False. IfTrue, theself.data_windowas well as theself.slow_trendand the detrended version are shown.MCMC_parallelization_method (str) – Default is None. If None the basic serial MCMC computation is performed. If
MCMC_parallelization_method = 'multiprocessing', a multiprocessing pool with num_processes is used to accelerate MCMC sampling. IfMCMC_parallelization_method = 'chop_chain'is used, the total length of the desired Markov chain is divided into'chop_chain'parts each of which is sampled in parallel and joined together in the end.num_processes (str or int) – Default is
None. If'half, almost half of the CPU kernels are used. If'all', all CPU kernels are used. If integer number, the defined number of CPU kernels is used for multiprocessing.num_chop_chains (int) – Number by which the total length of the Markov chain is divided. Each slice is sampled in parallel and joined together in the end of the calculations.
- perform_MAP_resilience_scan(window_size, window_shift, cred_percentiles=array([16, 1]), error_propagation='summary statistics', summary_window_size=10, sigma_multiples=array([1, 3]), print_progress=True, print_details=False, slope_save_name='default_save_slopes', noise_level_save_name='default_save_noise', save=True, create_plot=False, ani_save_name='default_animation_name', animation_title='', mark_critical_point=None, mark_noise_level=None, print_hint=True, fastMAPflag=False)[source]
Performs an automated MAP window scan with defined window_shift over the whole time series. In each window the drift slope and noise level estimates with corresponding credibility bands are computed and saved in the slope_storage and the noise_level_storage. It can also be used to create an plot of the sliding window approach plotting the time series, the moving window, and the time evolution of the drift slope estimates \(\hat{\zeta}\) and the noise level \(\hat{\sigma}\). The start indices of the shifted windows are also saved in order to facilitate customized plots.
- Parameters:
window_size (int) – Time window size.
window_shift (int) – The rolling time window is shifted about window_shift data points.
cred_percentiles (One-dimensional numpy array of integers.) – One or two entries to define the percentiles of the calculated credibility bands of the estimated parameters. It is stored in the attribute credibility_bands. Default is numpy.array([16,1]).
error_propagation (str) –
Defines the method that is used to compute the confidence bands. Default is
'summary statistics. In that case drift slope samples of sizesummary_window_sizeare used to compute the drift slope mean and its standard error. The parametersigma_multiplesdefines the width of the summary statistics’ symmetric error bands. If'error bound'or'uncorrelated Gaussian'is chosen, the marginal uncertainties corresponding tocred_percentilesare computed employing Wilks’ theorem. With'error bound'the highest uncertainties per parameter andcred_percentileslevel is interpreted as a symmetric worst case bound of the error. If'uncorrelated Gaussian'is chosen, the uncertainties are treated corresponding to the slight asymmetric results of the Wilks’ theorem confidence intervals. Both options give trustworthy results in the case of a first order polynomial drift. In case of the third order polynomial drift they are too optimistic, i.e. narrow, since the error bounds ('error bound') or errors ('uncorrelated Gaussian') are propagated without regard of correlations in the model parameters. This issue might be solvable by introducing an estimate of the covariance matrix in future, but is not planned for now. Furthermore, the'uncorrelated Gaussian'option keeps the slightly asymmetric errors proposed by Wilks’ theorem for the marginal parameter distributions. This small formal incorrectness is to maintain information about asymmetry in the estimates in a first Wilks’ theorem guess. Thecred_percentilesare fixed tocred_percentiles = numpy.array([5,1])to transform the half confidence bands under the assumption of Gaussian distributions by the factors 1.96 and 2.5525, respectively. The propagated result is transformed back.Hint
The options
'error bound'and'uncorrelated Gaussian'are only trustworthy for a linear drift term. They might also be correct for a third order polynomial drift in the case ofvery high amounts of data per window. Otherwise, the default'summary statistics'should be used, unless the window shift is very high. In that case it might introduce a too strong delay due to the averaging procedure.print_progress (Boolean) – If True the progress of the MAP estimation per window is shown. Default is False.
print_details (Boolean) – If True a more detailed print output in the case the binning procedure is provided. Default is False.
slope_save_name (string) – Name of the file in which the slope_storage array will be saved. Default is default_save_slopes.
noise_level_save_name (string) – Name of the file in which the noise_level_storage array will be saved. Default is default_save_noise.
save (Boolean) – If True the slope_storage and the noise_level_storage arrays are saved in the end of the scan in an .npy file. Default is True.
create_plot (Boolean) – If
True, the MAP resilience scan results are plotted. Default isFalse. NOTE: Not implemented yet.summary_window_size (int) – If
error_propagation = 'summary statistics'is chosen, the parameter defines the number of drift slope estimates to use in a window summary statistic. The windows are shifted by one.sigma_multiples (One dimensional numpy array of float .) – The array hast two entries. If
error_propagation = 'summary statistics'is chosen, the entries define the drift slope standard error multiples which are used to calculate the uncertainty bands.
- class antiCPy.early_warnings.drift_slope.binning_langevin_estimation.BinningLangevinEstimation(data, time, bin_num, drift_model='3rd order polynomial', diffusion_model='constant', prior_type='Non-informative linear and Gaussian Prior', prior_range=None, scales=array([4, 8]))[source]
Bases:
LangevinEstimationChild class of the
LangevinEstimationclass. Inherits all features and methods from the parent class. The class is used to apply a computation shortcut of the implemented resilience screening methods by a data binning approach (cf. [Kleinhans2012] ). Needed attributes and (overloaded) methods are provided.Hint
The fitting procedure is faster than the MAP estimation if the windows contain a certain amount of data.
- Parameters:
_bin_centers (One-dimensional numpy array of float) – Attribute that contains the bin centers of the time series.
_bin_num (int) – Number of chosen bins for the time series binning procedure.
_bin_array (One-dimensional numpy array of float) – Attribute that contains the limits of each bin.
_bin_labels (One-dimensional numpy array of int) – Attribute that contains the bin labels of each data point. Entries reach from one to
_bin_num._number_bin_members (One-dimensional numpy array of int) – Attribute that contains the number of data points of a time series that belong to each bin.
_bin_mean_increment (One-dimensional numpy array of float) – Attribute that contains the mean increments of each bin.
_bin_mean_increment_squared (One-dimensional numpy array of float) – Attribute that contains the squared mean increments of each bin.
_bin_MAP (One-dimensional numpy array of float) – Attribute that contains the maximum a posteriori probability of each bin.
- property bin_centers
Contains the center value of the binned data.
- property bin_increment_mean
Contains the mean increments per bin.
- property bin_increment_mean_squared
Contains the squared mean increment of each bin.
- static calc_bin_neg_log_likelihood(pars, bin_incr_mean, bin_incr_mean_squared, num_bin_members, dt)[source]
Calculates the negative logarithmic likelihood function of a bin in the binning data approach.
- bin_neg_log_likelihood()[source]
Calculates the negative logarithmic likelihood function of a bin in the binning data approach. The method calls the static helper function
calc_bin_neg_log_likelihood(...).
- class antiCPy.early_warnings.drift_slope.non_markov_estimation.NonMarkovEstimation(data, time, X_drift_model='3rd order polynomial', Y_model='Ornstein-Uhlenbeck noise', Y_drift_model='no offset first order', X_coupling_term='constant', Y_diffusion_model='constant', prior_type='Non-informative linear and Gaussian Prior', prior_range=None, scales=array([4, 8]), activate_time_scale_separation_prior=False, slow_process=None, time_scale_separation_factor=None, max_likelihood_starting_guesses=None, detrending_of_whole_dataset=None, gauss_filter_mode='reflect', gauss_filter_sigma=6, gauss_filter_order=0, gauss_filter_cval=0.0, gauss_filter_truncate=4.0, plot_detrending=False)[source]
Bases:
LangevinEstimationChild class of
LangevinEstimation. Inherits its functions to guarantee similar function structure.- Parameters:
data (One dimensional numpy array of floats) – A one dimensional numpy array containing the times series to analyse.
time (One dimensional numpy array of floats) – A one dimensional numpy array containing the time samples of the time series data.
X_drift_model (str) – Defines the drift model. Default is
'3rd order polynomial'. Additionally, a'first order polynomial'or a'3rd order odds correlated'can be chosen. In'3rd order odds correlated'the first and third order coefficients are the same to reduce complexity. Corresponds todrift_modelin theLangevinEstimationcase.X_coupling_term (str) – Defines the X_coupling_term. Default is
'constant'. Additionally, a'first order polynomial'can be chosen. Corresponds todiffusion_modelin theLangevinEstimationcase.Y_model (str) – Default is
'Ornstein-Uhlenbeck noise'. It defines automatically the Y drift model to be linear without offset and the Y diffusion to be constant. The drift coefficient is given by \(-\frac{1}{\theta_5^2}\) . The diffusion coefficient is given by \(\frac{1}{\theta_5}\) .Y_drift_model (str) – Other model settings than
Y_model = 'Ornstein-Uhlenbeck noise'are not tested yet. In principle,Y_drift_modelcan be chosen to be'first order polynomial','3rd order polynomial'and'3rd order odds correlated'.Y_diffusion_model (str) – Other model settings than
Y_model = 'Ornstein-Uhlenbeck noise'are not tested yet. In principle,'constant'and'first order polynomial'models can be chosen.prior_type (str) – Defines the used prior to calculate the posterior distribution of the data. Default is
'Non-informative linear and Gaussian Prior'. A flat prior can be chosen with'Flat Prior'.prior_range (Two-dimensional numpy array of floats, optional) – Customize the prior range of the estimated polynomial parameters \(\theta_i\) starting for \(\theta_0\) in row with index 0 and upper limits in the first column. Default is
Nonewhich means prior ranges of -50 to 50 for the drift parameters and 0 to 50 for constant diffusion terms.scales (One-dimensional numpy array with two float entries) – Two tailed percentiles to create credibility bands of the estimated measures.
theta (One-dimensional numpy array of floats) – Array that contains the estimated drift and diffusion parameters with drift and ending with diffusion. The lowest order is mentioned first.
ndim (int) – Total number of parameters to estimate.
nwalkers (int) – Number of MCMC chains.
nsteps (int) – Length of Markov chains.
nburn (int) – Length of burn in period of the MCMC chains.
data_size (int) – Length of the data array.
dt (float) – Time intervall of the equidistant time array.
drift_slope (One-dimensional numpy array of floats) – Array with the current drift slope estimate in the 0th component. Component 1, 2 and 3, 4 contain the upper and lower bound of the credibility intervals defined by the
scalesvariable.noise_level_estimate (One-dimenional numpy array of floats) – Array with the noise level estimate in the 0th component. Component 1, 2 and 3, 4 contain the upper and lower bound of the credibility intervals defined by the
scalesvariable.loop_range (One-dimensional (
data_size - window_size / window_shift + 1) numpy array of integers) – Array with the start indices of the time array for each rolled time window.window_size (int) – Size of rolling windows.
data_window (One-dimensional numpy array of floats) – Array that contains data of a certain interval of the dataset in
data.time_window (One-dimensional numpy array of floats) – Array that contains time of a certain interval of the dataset in
data.increments (One-dimensional numpy array of floats) – Array that contains the increments from \(t\) to \(t+1\).
joint_samples (Two-dimensional (
ndim, num_of_joint_samples) numpy array of floats) – Array that contains drawed parameter samples from their joint posterior probability density function that is estimated from the data in the currentdata_window.slope_estimates (One-dimensional numpy array of floats) – Estimates of the drift slope on the current
data_window.fixed_point_estimate (float) – Fixed point estimate calculated as the mean of the current
data_window.starting_guesses (Two-dimensional numpy array of floats) – Array that contains the initial guesses of the MCMC sampling.
theta_array (Two-dimensional numpy array of floats) – Array that contains the drift and diffusion parameters sampled with MCMC.
slope_storage (Two-dimensional (
5, loop_range.size) numpy array of floats) – Array that contains the drift slope estimates \(\hat{\zeta}\) of the rolling windows. The columns encode each rolling window with the drift slope estimate \(\hat{\zeta}\) in the 0th row and in rows 1,2 and 3,4 the upper and lower bound of the credibility intervals defined byscales.noise_level_storage (Two-dimensional (
5, loop_range.size) numpy array of floats) – Array that contains the noise level estimates \(\hat{\sigma}\) of the rolling windows. The columns encode each rolling window with the noise level estimate \(\hat{\sigma}\) in the 0th row and in rows 1,2 and 3,4 the upper and lower bound of the credibility intervals defined byscales.slope_kernel_density_obj (
scipykernel density object of thescipy.gaussian_kde(...)function.) – Kernel density object ofscipy.gaussian_kde(...)of the slope posterior created with a Gaussian kernel and a bandwidth computed by silverman’s rule.noise_kernel_density_obj – Kernel density object of
scipy.gaussian_kde(...)of the noise level posterior created with a Gaussian kernel and a bandwidth computed by silverman’s rule.slow_trend (One-dimensional numpy array of floats.) – Contains the subtracted slow trend if a detrending is applied to the whole data or each window separately.
detrending_of_whole_dataset (str) – Default is
None. If'linear'the wholedatais detrended linearly. IfGaussian kernel smootheda Gaussian smoothed curve is subtracted from the wholedata. The parameters of the kernel smoothing can be set bygauss_filter_mode,gauss_filter_sigma,gauss_filter_order,gauss_filter_cvalandgauss_filter_truncate.gauss_filter_mode (str) – According to the
scipy.ndimage.filters.gaussian_filteroption.gauss_filter_sigma (float) – According to the
scipy.ndimage.filters.gaussian_filteroption.gauss_filter_order (int) – According to the
scipy.ndimage.filters.gaussian_filteroption.gauss_filter_cval (float) – According to the
scipy.ndimage.filters.gaussian_filteroption.gauss_filter_truncate (float) – According to the
scipy.ndimage.filters.gaussian_filteroption.plot_detrending (bool) – Default is
False. IfTrue, theself.dataas well as theself.slow_trendand the detrended version are shown.activate_time_scale_separation_prior (bool) – If
True, a time scale separation of the observed process \(X\) and the unobserved process \(Y\) is assumed. Default isFalse.slow_process (str) – If
activate_time_scale_separation_prior = True, define the slow process via'X'or'Y', respectively.time_scale_separation_factor (int) – If
activate_time_scale_separation_prior = True, define the factor by which you assume the time scales of \(X\) and \(Y\) to differ.max_likelihood_starting_guesses (One-dimensional numpy array of float) –
If
None, the MAP starting guesses (essentially maximum likelihood because of flat priors) are are computed starting with all parameters set to one. In general, you can pass a one-dimensional numpy array withndimentries to this argument to define different starting values.Note
This is especially necessary, if you enable the time scale separation, since the default array of ones will contradict normally to the assumed two time scale model.
- static D1(data, theta, Y_model, drift_model, component)[source]
Calls the static calc_D1 function and returns its value. Passing of parameters is adapted to the NonMarkovEstimation model.
- static D2(data, theta, Y_model, diffusion_model, component)[source]
Calls the static calc_D2 function and returns its value. Passing of parameters is adapted to the NonMarkovEstimation model.
- static log_prior(theta, X_drift_model, X_coupling_term, Y_model, prior_type, prior_range, scales, data_window, activate_time_scale_separation_prior, slow_process, time_scale_separation_factor, print_time_scale_info)[source]
Returns the logarithmic prior probability for a given set of parameters depending on the drift model and coupling term of the observed variable X and the unobserved drift-diffusion process Y that couples into X via the specified coupling term. The prior probability is computed within the given prior_range of the parameters theta. A flat prior for the parameters apart from restriction to positive constant diffusion and a non-informative prior for linear parts and Gaussian priors for higher orders are implemented. The standard deviation of the Gaussian priors is given by the scales variable. If two separate time scales are assumed via
activate_time_scale_separation_prior=True, theslow_processand thetime_scale_separation_factorcan be specified. The time scales for each sampled parameter set are approximated by \(\left\lvert \frac{1}{f'(x)}\right\rvert\) and \(\left\lvert\frac{1}{f'(y)}\right\rvert\) with prime denoting derivative with respect to the variables \(x\) and \(y\), respectively.
- static log_likelihood(theta, data, X_drift_model, Y_drift_model, X_coupling_term, Y_diffusion_model, Y_model, dt)[source]
Returns the logarithmic likelihood of the data for the given model parametrization. It is given by the modified short-time propagator of [Willers2021] .
- static log_posterior(theta)[source]
Returns the logarithmic posterior probability of the data for the given model parametrization.
- static neg_log_posterior(theta)[source]
Calculates the negative logarithmic posterior for a given tuple of parameters :math:` heta`.
- static init_parallel_EnsembleSampler(data_window, X_drift_model, Y_drift_model, Y_diffusion_model, X_coupling_term, Y_model, dt, prior_type, prior_range, scales, activate_time_scale_separation_prior, time_scale_separation_factor, slow_process, print_time_scale_info)[source]
Initializes the workers for the MCMC sampling if
MCMC_parallelization_method = 'multiprocessing'inperform_resilience_scan(...).
- compute_posterior_samples(print_AC_tau, ignore_AC_error, thinning_by, print_progress, MCMC_parallelization_method=None, num_processes=None, num_chop_chains=None, MCMC_AC_estimate='standard')[source]
Compute the theta_array with \(nwalkers \cdot nsteps\) Markov Chain Monte Carlo (MCMC) samples. If ignore_AC_error = False the calculation will terminate with error if the autocorrelation of the sampled chains is too high compared to the chain length. Otherwise, the highest autocorrelation length will be used to thin the sampled chains. In order to run tests in shorter time you can set “ignore_AC_error = True” and define a thinning_by n steps. If print_AC_tau = True the autocorrelation lengths of the sampled chains is printed.
- declare_MAP_starting_guesses(nwalkers=None, nsteps=None, print_time_scale_info=False)[source]
Declare the maximum a posterior (MAP) starting guesses for a MCMC sampling with nwalkers and nsteps.
- calc_drift_slope_and_noise(slope_grid, noise_grid, OU_grid, X_coupling_grid, n_joint_samples=50000, n_slope_samples=50000, n_noise_samples=50000, n_OU_param_samples=50000, n_X_coupling_samples=50000, cred_percentiles=array([16, 1]), print_AC_tau=False, print_time_scale_info=False, ignore_AC_error=False, thinning_by=60, print_progress=False, detrending_per_window=None, gauss_filter_mode='reflect', gauss_filter_sigma=6, gauss_filter_order=0, gauss_filter_cval=0.0, gauss_filter_truncate=4.0, plot_detrending=False, print_details=False, MCMC_parallelization_method=None, num_processes=None, num_chop_chains=None, MCMC_AC_estimate='standard')[source]
Calculates the drift slope estimate \(\hat{\zeta}\) and the noise level \(\hat{\psi}\) from the MCMC sampled parameters of the non-Markovian model for a given window_size and given window_shift. In the course of the computations the parameters joint_samples, a noise_kernel_density_obj, a slope_kernel_density_obj, a X_coupling_density_obj, a OU_kernel_density_obj, the drift_slope with credibility intervals, the noise_estimates, the modified noise_level \(\hat{\psi}\) with credibility intervals as well as the X_coupling_estimate and OU_param_estimate with credibility invervals and in case of a third order polynomial drift the estimated fixed_point_estimate of the window will be stored.
- Parameters:
slope_grid (One-dimensional numpy array of floats.) – Array on which the drift slope kernel density estimate is evaluated.
noise_grid (One-dimensional numpy array of floats.) – Array on which the noise level \(\hat{\psi}\) kernel density estimate is evaluated.
OU_grid (One-dimensional numpy array of floats.) – Array on which the Ornstein Uhlenbeck (OU) parameter’s kernel density estimate is evaluated.
X_coupling_grid (One-dimensional numpy array of floats.) – Array on which the X coupling strength kernel density estimate is evaluated.
nwalkers (int) – Number of walkers that are initialized for the MCMC sampling via the package emcee. Default is 50.
nsteps (int) – Length of the sampled MCMC chains. Default is 10000.
nburn (int) – Number of data points at the beginning of the Markov chains which are discarded in terms of a burn in period. Default is 200.
n_joint_samples (int) – Number of joint samples that are drawn from the estimated joint posterior probability in order to calculate the drift slope estimate \(\zeta\) and corresponding credibility bands. Default is 50000.
n_slope_samples (int) – Number of drift slope samples that are drawn from the estimated drift slope posterior computed based on the sampling results of the joint posterior probability of the Langevin parameters :math:` heta_i`. Default is 50000.
n_noise_samples (int) – Number of constant noise level samples that are drawn from the estimated marginal probability distribution of the diffusion model is chosen constant. Default is 50000.
n_X_coupling_samples (int) – Number of X coupling samples that are drawn from the estimated posterior probability. Default is 50000.
n_OU_param_samples (int) – Number of OU parameter samples that are drawn from the estimated posterior probability. Default is 50000.
cred_percentiles (One-dimensional numpy array of int) – Two entries to define the percentiles of the calculated credibility bands of the estimated parameters. Default is numpy.array([16,1]).
print_AC_tau – If True the estimated autocorrelation lengths of the Markov chains is shown. The maximum length is used for thinning of the chains. Default is False.
ignore_AC_error (Boolean) – If True the autocorrelation lengths of the Markov chains is not estimated and thus, it is not checked whether the chains are too short to give an reliable autocorrelation estimate. This avoids error interruption of the procedure, but can lead to unreliable results if the chains are too short. The option should be chosen in order to save computation time, e.g. when debugging your code with short Markov chains. Default is False.
thinning_by (int) – If ignore_AC_error = True the chains are thinned by thinning_by. Every thinning_by-th data point is stored. Default is 60.
print_progress (Boolean) – If True the progress of the MCMC sampling is shown. Default is False.
detrending_per_window – Default is
None. If'linear'thedata_windowis detrended linearly. IfGaussian kernel smootheda Gaussian smoothed curve is subtracted from thedata_window. The parameters of the kernel smoothing can be set bygauss_filter_mode,gauss_filter_sigma,gauss_filter_order,gauss_filter_cvalandgauss_filter_truncate.gauss_filter_mode (str) – According to the
scipy.ndimage.filters.gaussian_filteroption.gauss_filter_sigma (float) – According to the
scipy.ndimage.filters.gaussian_filteroption.gauss_filter_order (int) – According to the
scipy.ndimage.filters.gaussian_filteroption.gauss_filter_cval (float) – According to the
scipy.ndimage.filters.gaussian_filteroption.gauss_filter_truncate (float) – According to the
scipy.ndimage.filters.gaussian_filteroption.plot_detrending (bool) – Default is
False. IfTrue, theself.data_windowas well as theself.slow_trendand the detrended version are shown.print_time_scale_info (bool) – If True, control values are plotted during each prior evaluation. First, the fast time scale estimate times the
time_scale_separation_factoris printed. Second, the slow time scale estimate and third, a check up whether the desired time scale separation is fulfilled are printed.MCMC_parallelization_method (str) – Default is None. If None the basic serial MCMC computation is performed. If MCMC_parallelization_method = ‘multiprocessing’, a multiprocessing pool with num_processes is used to accelerate MCMC sampling. If MCMC_parallelization_method = ‘chop_chain’ is used, the total length of the desired Markov chain is divided into ‘chop_chain’ parts each of which is sampled in parallel and joined together in the end.
num_processes (str or int) – Default is
None. If'half', almost half of the CPU kernels are used. If'all', all CPU kernels are used. If integer number, the defined number of CPU kernels is used for multiprocessing.num_chop_chains (int) – Number by which the total length of the Markov chain is divided. Each slice is sampled in parallel and joined together in the end of the calculations.
MCMC_AC_estimate (str) – If default ‘standard’ is used, emcee’s
.get_autocorr_time()is applied to estimate th sampled Markov chain’s autocorrelation length for thinning. In some cases the estimation procedure requires longer chains than you can run and does not converge at all. In such situations you can try to estimate the autocorrelation length with parametric models following the suggestions of the emcee documentation viaMCMC_AC_estimate = 'alternative'.
- perform_resilience_scan(window_size, window_shift, slope_grid, noise_grid, OU_grid, X_coupling_grid, nwalkers=50, nsteps=10000, nburn=200, n_joint_samples=50000, n_slope_samples=50000, n_noise_samples=50000, n_OU_param_samples=50000, n_X_coupling_samples=50000, cred_percentiles=array([16, 1]), print_AC_tau=False, ignore_AC_error=False, thinning_by=60, print_progress=False, print_details=False, print_time_scale_info=False, slope_save_name='default_save_slopes', noise_level_save_name='default_save_noise', save=True, create_animation=False, ani_save_name='default_animation_name', animation_title='', mark_critical_point=None, mark_noise_level=None, detrending_per_window=None, gauss_filter_mode='reflect', gauss_filter_sigma=6, gauss_filter_order=0, gauss_filter_cval=0.0, gauss_filter_truncate=4.0, plot_detrending=False, MCMC_parallelization_method=None, num_processes=None, num_chop_chains=None, MCMC_AC_estimate='standard')[source]
Performs an automated window scan with defined
window_shiftover the whole time series. In each window the drift slope, noise level, X coupling strength and Ornstein-Uhlenbeck (OU) parameter estimates with corresponding credibility bands are computed and saved in theslope_storage,noise_level_storage,X_coupling_storageandOU_param_storage. It can also be used to create an animation of the sliding window approach plotting the time series, the moving window, and the time evolution of the drift slope estimates \(\hat{\zeta}\), the noise level \(\hat{\psi}\) and the noise kernel density estimate. The start indices of the shifted windows are also saved in order to facilitate customized plots.- Parameters:
window_size (int) – Time window size.
window_shift (int) – The rolling time window is shifted about
window_shiftdata points.slope_grid (One-dimensional numpy array of floats.) – Array on which the drift slope kernel density estimate is evaluated.
noise_grid (One-dimensional numpy array of floats.) – Array on which the noise level kernel density estimate is evaluated.
OU_grid (One-dimensional numpy array of floats.) – Array on which the Ornstein Uhlenbeck (OU) parameter’s kernel density estimate is evaluated.
X_coupling_grid (One-dimensional numpy array of floats.) – Array on which the X coupling strength kernel density estimate is evaluated.
nwalkers (int) – Number of walkers that are initialized for the MCMC sampling via the package emcee. Default is 50.
nsteps (int) – Length of the sampled MCMC chains. Default is 10000.
nburn (int) – Number of data points at the beginning of the Markov chains which are discarded in terms of a burn in period. Default is 200.
n_joint_samples (int) – Number of joint samples that are drawn from the estimated joint posterior probability in order to calculate the drift slope estimate \(\zeta\) and corresponding credibility bands. Default is 50000.
n_slope_samples (int) – Number of drift slope samples that are drawn from the estimated drift slope posterior computed based on the sampling results of the joint posterior probability of the Langevin parameters :math:` heta_i`. Default is 50000.
n_noise_samples (int) – Number of constant noise level samples that are drawn from the estimated marginal probability distribution of the diffusion model is chosen constant. Default is 50000.
n_X_coupling_samples (int) – Number of X coupling samples that are drawn from the estimated posterior probability. Default is 50000.
n_OU_param_samples (int) – Number of OU parameter samples that are drawn from the estimated posterior probability. Default is 50000.
cred_percentiles (One-dimensional numpy array of integers.) – Two entries to define the percentiles of the calculated credibility bands of the estimated parameters. It is stored in the attribute credibility_bands. Default is numpy.array([16,1]).
print_AC_tau – If True the estimated autocorrelation lengths of the Markov chains is shown. The maximum length is used for thinning of the chains. Default is False.
ignore_AC_error (Boolean) – If True the autocorrelation lengths of the Markov chains is not estimated and thus, it is not checked whether the chains are too short to give an reliable autocorrelation estimate. This avoids error interruption of the procedure, but can lead to unreliable results if the chains are too short. The option should be chosen in order to save computation time, e.g. when debugging your code with short Markov chains. Default is False.
thinning_by (int) – If ignore_AC_error = True the chains are thinned by thinning_by. Every thinning_by-th data point is stored. Default is 60.
print_progress (Boolean) – If True the progress of the MCMC sampling is shown. Default is False.
print_details (Boolean) – If True a more detailed print output in the case the binning procedure is provided. Default is False.
slope_save_name (string) – Name of the file in which the slope_storage array will be saved. Default is default_save_slopes.
noise_level_save_name (string) – Name of the file in which the noise_level_storage array will be saved. Default is default_save_noise.
save (Boolean) – If True the slope_storage and the noise_level_storage arrays are saved in the end of the scan in an .npy file. Default is True.
create_animation (Boolean) –
If True an automated animation of the time evolution of the drift slope estimate :math:hat{zeta}`, the noise level \(\hat{\sigma}\) and the noise kernel density is shown together with the time series and rolling windows. Default is False.
Warning
The implementation is same as in
LangevinEstimation. It might work to generate animations with the option, but it is not tested. If errors occur, they are almost certainly bugs for that reason.ani_save_name (string) – If create_animation = True the animation is saved in a .mp4 file with this name. Default is default_animation_name.
animation_title (string) – A title can be given to the animation.
mark_critical_point (float) – A red dotted vertical line is shown at time mark_critical_point. Default is None.
mark_noise_level (float) – A green dotted line is shown at the noise level mark_noise_level in the noise kernel density plot and the time evolution plot of the noise level. Default is None.
detrending_per_window – Default is
None. If'linear'thedata_windowis detrended linearly. IfGaussian kernel smootheda Gaussian smoothed curve is subtracted from thedata_window. The parameters of the kernel smoothing can be set bygauss_filter_mode,gauss_filter_sigma,gauss_filter_order,gauss_filter_cvalandgauss_filter_truncate.gauss_filter_mode (str) – According to the
scipy.ndimage.filters.gaussian_filteroption.gauss_filter_sigma (float) – According to the
scipy.ndimage.filters.gaussian_filteroption.gauss_filter_order (int) – According to the
scipy.ndimage.filters.gaussian_filteroption.gauss_filter_cval (float) – According to the
scipy.ndimage.filters.gaussian_filteroption.gauss_filter_truncate (float) – According to the
scipy.ndimage.filters.gaussian_filteroption.plot_detrending (bool) – Default is
False. IfTrue, theself.data_windowas well as theself.slow_trendand the detrended version are shown.print_time_scale_info (bool) – If True, control values are plotted during each prior evaluation. First, the fast time scale estimate times the
time_scale_separation_factoris printed. Second, the slow time scale estimate and third, a check up whether the desired time scale separation is fulfilled are printed.MCMC_parallelization_method (str) – Default is None. If None the basic serial MCMC computation is performed. If MCMC_parallelization_method = ‘multiprocessing’, a multiprocessing pool with num_processes is used to accelerate MCMC sampling. If MCMC_parallelization_method = ‘chop_chain’ is used, the total length of the desired Markov chain is divided into ‘chop_chain’ parts each of which is sampled in parallel and joined together in the end.
num_processes (str or int) – Default is
None. If'half', almost half of the CPU kernels are used. If'all', all CPU kernels are used. If integer number, the defined number of CPU kernels is used for multiprocessing.num_chop_chains (int) – Number by which the total length of the Markov chain is divided. Each slice is sampled in parallel and joined together in the end of the calculations.
MCMC_AC_estimate (str) –
If default ‘standard’ is used, emcee’s
.get_autocorr_time()is applied to estimate the sampled Markov chain’s autocorrelation length for thinning. In some cases the estimation procedure requires longer chains than you can run and does not converge at all. In such situations you can try to estimate the autocorrelation length with parametric models following the suggestions of the emcee documentation viaMCMC_AC_estimate = 'alternative'.
- animation(i, slope_grid, noise_grid, OU_grid, X_coupling_grid, nwalkers, nsteps, nburn, n_joint_samples, n_slope_samples, n_noise_samples, n_OU_param_samples, n_X_coupling_samples, cred_percentiles, print_AC_tau, ignore_AC_error, thinning_by, print_progress, print_details, MCMC_parallelization_method, num_processes, num_chop_chains)[source]
Function that is called iteratively by the matplotlib.animation.FuncAnimation(…) tool to generate an animation of the rolling window scan of a time series. The time series, the rolling windows, the time evolution of the drift slope estimate \(\hat{\zeta}\) and the noise level estimate \(\hat{\psi}\) and its posterior probality density is shown in the animation.
- perform_MAP_resilience_scan(window_size, window_shift, cred_percentiles=array([16, 1]), error_propagation='summary statistics', summary_window_size=10, sigma_multiples=array([1, 3]), print_progress=True, print_details=False, slope_save_name='default_save_slopes', noise_level_save_name='default_save_noise', save=True, create_plot=False, ani_save_name='default_animation_name', animation_title='', mark_critical_point=None, mark_noise_level=None, print_time_scale_info=False, print_hint=True, fastMAPflag=False)[source]
Performs an automated MAP window scan with defined window_shift over the whole time series. This might be the first approach for a non-Markovian estimation, since the MCMC computations are time-consuming. In each window the drift slope, noise level, X coupling strength and OU parameter estimates with corresponding credibility bands are computed and saved in the slope_storage and the noise_level_storage.
- Parameters:
window_size (int) – Time window size.
window_shift (int) – The rolling time window is shifted about window_shift data points.
cred_percentiles (One-dimensional numpy array of integers.) – Two entries to define the percentiles of the calculated credibility bands of the estimated parameters. It is stored in the attribute credibility_bands. Default is numpy.array([16,1]).
error_propagation (str) –
Defines the method that is used to compute the confidence bands. Default is
'summary statistics. In that case drift slope samples of sizesummary_window_sizeare used to compute the drift slope mean and its standard error. The parametersigma_multiplesdefines the width of the summary statistics’ symmetric error bands. If'error bound'or'uncorrelated Gaussian'is chosen, the marginal uncertainties corresponding tocred_percentilesare computed employing Wilks’ theorem. With'error bound'the highest uncertainties per parameter andcred_percentileslevel is interpreted as a symmetric worst case bound of the error. If'uncorrelated Gaussian'is chosen, the uncertainties are treated corresponding to the slight asymmetric results of the Wilks’ theorem confidence intervals. Both options give trustworthy results in the case of a first order polynomial drift. In case of the third order polynomial drift they are too optimistic, i.e. narrow, since the error bounds ('error bound') or errors ('uncorrelated Gaussian') are propagated without regard of correlations in the model parameters. Furthermore, the'uncorrelated Gaussian'option keeps the slightly asymmetric errors proposed by Wilks’ theorem for the marginal parameter distributions. This small formal incorrectness is to maintain information about asymmetry in the estimates in a first guess. Thecred_percentilesare fixed tocred_percentiles = numpy.array([5,1])to transform the half confidence bands under the assumption of Gaussian distributions by the factors 1.96 and 2.5525, respectively. The propagated result is transformed back.Hint
The options
'error bound'and'uncorrelated Gaussian'are only trustworthy for a linear drift term. They might also be correct for a third order polynomial drift in the case ofvery high amounts of data per window. Otherwise, the default'summary statistics'should be used, unless the window shift is very high. In that case it might introduce a too strong delay due to the averaging procedure.print_progress (Boolean) – If True the progress of the MAP estimation per window is shown. Default is False.
print_details (Boolean) – If True a more detailed print output in the case the binning procedure is provided. Default is False.
slope_save_name (string) – Name of the file in which the slope_storage array will be saved. Default is default_save_slopes.
noise_level_save_name (string) – Name of the file in which the noise_level_storage array will be saved. Default is default_save_noise.
save (Boolean) – If True the slope_storage and the noise_level_storage arrays are saved in the end of the scan in an .npy file. Default is True.
create_plot (Boolean) – If
True, the MAP resilience scan results are plotted. Default isFalse. NOTE: Not implemented yet.summary_window_size (int) – If
error_propagation = 'summary statistics'is chosen, the parameter defines the number of drift slope estimates to use in a window summary statistic. The windows are shifted by one.sigma_multiples (One dimensional numpy array of float .) – The array hast two entries. If
error_propagation = 'summary statistics'is chosen, the entries define the drift slope standard error multiples which are used to calculate the uncertainty bands.
- static auto_window(taus, c)[source]
Helper function to perform parametric Markov chain autocorrelation estimation following the suggestions in emcee’s documentation .
- static autocorr_ml(y, thin=1, c=5.0)[source]
Helper function to perform parametric Markov chain autocorrelation estimation following the suggestions in emcee’s documentation .
- static autocorr_new(y, c=5.0)[source]
Helper function to perform parametric Markov chain autocorrelation estimation following the suggestions in emcee’s documentation .
- static autocorr_func_1d(x, norm=True)[source]
Helper function to perform parametric Markov chain autocorrelation estimation following the suggestions in emcee’s documentation .
- static next_pow_two(n)[source]
Helper function to perform parametric Markov chain autocorrelation estimation following the suggestions in emcee’s documentation .
- antiCPy.early_warnings.drift_slope.summary_statistics_helper.summary_statistics_helper(metric, summary_window_size=10, sigma_multiples=array([1, 3]))[source]
Computes the mean of the drift slope \(\hat{\zeta}\) and its standard error in a predefined summary statistics window for the
LangevinEstimationapproaches with enabled summary statistics. In case of theNonMarkovEstimationthe summary statistics are computed for the drift slope \(\hat{\zeta}\) and the noise level estimates \(\hat{\psi}\).- Parameters:
summary_window_size (int) – If
error_propagation = 'summary statistics'is chosen, the parameter defines the number of drift slope estimates to use in a window summary statistic. The windows are shifted by one.sigma_multiples (One dimensional numpy array of float .) – The array hast two entries. If
error_propagation = 'summary statistics'is chosen, the entries define the drift slope standard error multiples which are used to calculate the uncertainty bands.