“trend_extrapolation” subpackage
This is the antiCPy.trend_extrapolation subpackage. It contains the class CPSegmentFit, which bundles all attributes needed to perform a Bayesian non-parametric fit that accounts for possible change points. The basic procedure is described in [vdL14] [K14], and the nomenclature follows those references. Each calculation step is implemented as a method of CPSegmentFit, so the cited works can be used as a guide to the code. For example, the segment fit can be applied to time series of drift slope estimates \(\hat{\zeta}(t) \equiv y(x)\) computed with the antiCPy.early_warnings module.
- class antiCPy.trend_extrapolation.cp_segment_fit.CPSegmentFit(x_data, y_data, number_expected_changepoints, num_MC_cp_samples, predict_up_to=None, z_array_size=100)
The CPSegmentFit class contains tools to perform a Bayesian segmental fit under the assumption of a certain number of change points.
- Parameters
x_data (One-dimensional numpy array of floats) – Given data on the x-axis. Saved in attribute x.
y_data (One-dimensional numpy array of floats) – Given data on the y-axis. Saved in attribute y.
number_expected_changepoints (int) – Number of expected change points in the fit.
num_MC_cp_samples (int) – Maximum number of MC summands that shall be incorporated in order to extrapolate the fit. Saved in attribute n_MC_samples.
n_MC_samples (int) – Attribute that contains the number of MC summands of the performed extrapolation of the fit. The sum is exact whenever the number of possible change point configurations is smaller than num_MC_cp_samples.
cp_prior_pdf (One-dimensional numpy array of floats) – Attribute that contains the flat prior probability of the considered change point configurations.
num_cp_configs (int) – Attribute of the number of possible change point configurations.
exact_sum_control (bool) – If this attribute is True, the exact sum over all possible change point configurations is computed in order to extrapolate the fit. If it is False, the given maximum number num_MC_cp_samples of summands is smaller than the number of all possible change point configurations, and the sum is approximated by a sum over num_MC_cp_samples randomly chosen change point configurations.
predict_up_to (float) – Defines the x-horizon of the extrapolation of the fit. Default is None, since it depends on the time scale of the given problem. It is saved in the attribute prediction_horizon.
d (One-dimensional numpy array of floats) – Attribute that contains the given y_data.
x (One-dimensional numpy array of floats) – Attribute that contains the given x_data.
A_matrix (Three-dimensional (num_MC_cp_samples, x_data.size, number_expected_changepoints + 2) numpy array of floats) – Attribute that contains the coefficients of the linear segments for the considered change point configurations.
A_dim (One-dimensional numpy array of floats) – Contains the dimensions of the A_matrix.
N (int) – Attribute that contains the data size of the input x_data and y_data.
n_cp (int) – Attribute that contains the number_expected_changepoints.
MC_cp_configurations (Two-dimensional (num_MC_cp_samples, number_expected_changepoints + 2) numpy array of floats) – Attribute that contains all possible change point configurations under the given assumptions and amount of data.
f0 (Two-dimensional (num_MC_cp_samples, number_expected_changepoints + 2) numpy array of floats) – Attribute that defines a matrix of mean design ordinates. Each row corresponds to a vector of a specific configuration of change point positions.
x_start (float) – Attribute that contains the start value of x_data/x.
x_end (float) – Attribute that contains the end value of x_data/x.
prediction_horizon (float) – Attribute in which the upper limit of the extrapolation x-horizon is saved.
Q_matrix (Three-dimensional (num_MC_cp_samples, number_expected_changepoints + 2, number_expected_changepoints + 2) numpy array of floats) – Attribute that contains the matrices \(Q=A^{T}A\) of the considered change point configurations.
Q_inverse (Three-dimensional (num_MC_cp_samples, number_expected_changepoints + 2, number_expected_changepoints + 2) numpy array of floats) – Attribute that contains the inverse Q_matrices of each considered change point configuration.
Res_E (One-dimensional (num_MC_cp_samples) numpy array of floats) – Attribute that contains the residues \(R(E)=d^T d - \sum_k (u_k^Td)^2\) of each possible change point configuration \(E\).
marginal_likelihood_pdf (One-dimensional (num_MC_cp_samples) numpy array of floats) – Attribute that contains the marginal likelihood of each change point configuration.
marginal_log_likelihood (One-dimensional (num_MC_cp_samples) numpy array of floats) – Attribute that contains the natural logarithm of the marginal likelihood of each change point configuration.
marginal_cp_pdf (One-dimensional (num_MC_cp_samples) numpy array of floats) – Attribute that contains the normalized a posteriori probability of the computed change point configurations. The normalization is valid for the grid of x_data.
prob_cp (One-dimensional (num_MC_cp_samples) numpy array of floats) – Attribute that contains the probability \(P(E \vert \underline{d}, \underline{x}, \mathcal{I})\) of a given change point configuration \(E\).
D_array (One-dimensional numpy array of floats) – Attribute that contains the fitted values in the interval from the beginning of the time series up to prediction_horizon.
DELTA_D2_array (One-dimensional numpy array of floats) – Attribute that contains the variances of the fitted values in D_array.
transition_time (float) – Attribute which contains the time at which the extrapolated function crosses zero.
upper_uncertainty_bound (float) – Attribute which contains the time at which the upper uncertainty boundary crosses zero.
lower_uncertainty_bound (float) – Attribute which contains the time at which the lower uncertainty boundary crosses zero.
- initialize_MC_cp_configurations(print_sum_control=False, config_output=False)
Defines the array MC_cp_configurations of all possible change point configurations, including the start and end x, if the exact sum is computed. Otherwise it creates an approximate set of random change point configurations based on the cited literature.
- Parameters
print_sum_control (bool) – If True, prints whether the exact or the approximate MC sum is computed. Default is False.
config_output (bool) – If True, the possible change point configurations without the start and end data points and the shape of the corresponding array are printed. Additionally, the MC_cp_configurations attribute, which includes the start and end values, and its shape are printed. Default is False.
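The choice between the exact sum and the MC approximation described above can be sketched as follows. This is a minimal plain-Python illustration, not the antiCPy implementation: the helper name build_cp_configurations, the seed argument, and the use of the interior data points as candidate change point positions are assumptions made for the example.

```python
import itertools
import math
import random


def build_cp_configurations(x, n_cp, max_samples, seed=0):
    """Return (configurations, exact) for n_cp change points on the grid x.

    Hypothetical helper: candidate positions are the interior points of x,
    and every configuration is padded with the first and last x value.
    """
    interior = list(x[1:-1])
    n_configs = math.comb(len(interior), n_cp)
    exact = n_configs <= max_samples
    if exact:
        # Enumerate every ordered choice of n_cp interior positions.
        chosen = list(itertools.combinations(interior, n_cp))
    else:
        # Draw max_samples random configurations (duplicates are possible).
        rng = random.Random(seed)
        chosen = [tuple(sorted(rng.sample(interior, n_cp)))
                  for _ in range(max_samples)]
    configs = [(x[0], *cp, x[-1]) for cp in chosen]
    return configs, exact


if __name__ == "__main__":
    configs, exact = build_cp_configurations([0.0, 1.0, 2.0, 3.0, 4.0], 1, 10)
    print(exact, configs)
```

Each returned tuple has number_expected_changepoints + 2 entries, which mirrors the second dimension documented for MC_cp_configurations.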
- initialize_A_matrices()
Creates the A_matrices of the MC summands which correspond to possible change point configurations.
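For a single change point configuration, one such matrix can be sketched in plain Python. The sketch assumes a continuous piecewise linear model in which each data point is a linear interpolation between the ordinates at its two neighbouring support points (start, change points, end). The function name design_matrix is hypothetical, and the library's actual construction may differ.

```python
def design_matrix(x, support):
    """Build an N x (n_cp + 2) matrix of linear interpolation weights.

    support holds the start value, the change points, and the end value in
    ascending order. Points outside [support[0], support[-1]] keep an
    all-zero row in this sketch.
    """
    rows = []
    for xi in x:
        row = [0.0] * len(support)
        for k in range(len(support) - 1):
            left, right = support[k], support[k + 1]
            if left <= xi <= right:
                w = (xi - left) / (right - left)
                row[k] = 1.0 - w      # weight of the left ordinate
                row[k + 1] = w        # weight of the right ordinate
                break
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in design_matrix([0.0, 1.0, 2.0, 3.0, 4.0], (0.0, 2.0, 4.0)):
        print(row)
```

With this construction every row inside the support sums to one, and the number of columns equals number_expected_changepoints + 2, as documented for A_matrix.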
- Q_matrix_and_inverse_Q(save_Q_matrix=False)
Computes the Q_matrices and the inverse of them for each MC summand which corresponds to a possible change point configuration.
- calculate_f0()
Calculates f0 as the mean \(f_0\) of the normal distribution that characterizes the probability density function of the ordinate vectors \(f\).
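The relation between \(Q=A^{T}A\), its inverse, and \(f_0\) can be illustrated for one configuration. The sketch assumes that \(f_0\) is the least-squares solution \(f_0 = Q^{-1}A^{T}d\), which is the usual mean for a Gaussian likelihood with a flat prior on the ordinates. The document does not state this formula, so treat it as an assumption. All helper names are hypothetical.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]


def invert(m):
    """Gauss-Jordan inversion with partial pivoting for a small square matrix."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mean_ordinates(A, d):
    """Return Q = A^T A, its inverse, and f0 = Q^{-1} A^T d (assumed form)."""
    At = transpose(A)
    Q = matmul(At, A)
    Q_inv = invert(Q)
    Atd = matmul(At, [[v] for v in d])
    f0 = [row[0] for row in matmul(Q_inv, Atd)]
    return Q, Q_inv, f0


if __name__ == "__main__":
    # Interpolation weights for x = 0..4 with one change point at x = 2.
    A = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.0, 1.0, 0.0],
         [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]]
    d = [1.0, 2.0, 3.0, 1.0, -1.0]
    print(mean_ordinates(A, d)[2])
```

In the usage example the data lie exactly on a piecewise linear curve, so the recovered ordinates reproduce the values at the start, the change point, and the end up to rounding.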
- calculate_marginal_likelihood()
Computes the marginal_log_likelihood as \(1/Z (R(E))^{(N-3)/2}\) and the corresponding marginal_likelihood of each considered change point configuration.
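Working with the logarithm matters because exponentiating strongly negative log-likelihoods underflows to zero. The sketch below shows one common way to turn log-likelihoods into normalized weights by subtracting the maximum first. Whether antiCPy applies this shift is not stated in this documentation, and the function name is hypothetical.

```python
import math


def weights_from_log_likelihood(log_likelihood):
    """Convert log-likelihoods to weights that sum to one.

    Subtracting the maximum leaves the ratios unchanged but keeps the
    largest exponent at zero, so at least one term is exactly 1.0.
    """
    shift = max(log_likelihood)
    unnormalized = [math.exp(v - shift) for v in log_likelihood]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


if __name__ == "__main__":
    log_l = [-1000.0, -1001.0, -1002.0]
    print(math.exp(log_l[0]))                  # underflows to 0.0
    print(weights_from_log_likelihood(log_l))  # still well defined
```

With a flat prior over configurations, as documented for cp_prior_pdf, such normalized likelihood weights are proportional to the posterior weights of the configurations.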
- calculate_marginal_cp_pdf(integration_method='Riemann sum')
Calculates the marginal posterior marginal_cp_pdf of each possible configuration of change point positions and normalizes the resulting probability density function. The normalization constant is determined by numerical integration of the resulting pdf with the method selected by integration_method.
- Parameters
integration_method (str) – Determines the integration method to compute the normalization. Default is 'Riemann sum', which performs numerical integration via a sum of rectangles with the sample width. Alternatively, the 'Simpson rule' can be chosen in the case of one possible change point. The Simpson rule can be unstable in some cases. The method should be the same as the integration method used in calculate_prob_cp(...).
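The two integration options can be illustrated on an equidistant grid with one change point. This is a generic sketch of a rectangle sum and the composite Simpson rule, not the library's code, and the function names are hypothetical.

```python
def riemann_integral(values, dx):
    """Rectangle sum: every sample contributes value * sample width."""
    return sum(values) * dx


def simpson_integral(values, dx):
    """Composite Simpson rule; needs an odd number (>= 3) of samples."""
    if len(values) < 3 or len(values) % 2 == 0:
        raise ValueError("Simpson rule needs an odd number of samples >= 3")
    odd = sum(values[1:-1:2])    # samples weighted by 4
    even = sum(values[2:-1:2])   # interior samples weighted by 2
    return dx / 3.0 * (values[0] + values[-1] + 4.0 * odd + 2.0 * even)


def normalize_riemann(pdf_values, dx):
    """Scale the pdf so that its rectangle sum equals one."""
    area = riemann_integral(pdf_values, dx)
    return [p / area for p in pdf_values]


if __name__ == "__main__":
    pdf = normalize_riemann([1.0, 2.0, 1.0], 0.5)
    print(pdf, riemann_integral(pdf, 0.5))
```

A pdf normalized with one rule generally does not integrate to one under the other, which is why the same method should be used for the normalization and for the change point probability.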
- calculate_prob_cp(integration_method='Riemann sum')
Calculates the probability prob_cp of each configuration of change point positions.
- Parameters
integration_method (str) – Determines the integration method to compute the change point probability. Default is 'Riemann sum' for numerical integration with rectangles. Alternatively, the 'Simpson rule' can be chosen under the assumption of one change point. The Simpson rule can be unstable in some cases. The method should be the same as the integration method used in calculate_marginal_cp_pdf(...).
- predict_D_at_z(z)
- Parameters
z (float) – The x-value for which an extrapolated value D with variance DELTA_D2 shall be calculated.
- Returns
The extrapolated y-data point D and its variance DELTA_D2 for a given x-data point z.
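One way to read a prediction built from MC summands is as a weighted mixture over change point configurations. The sketch below combines per-configuration means and variances with the law of total variance. That antiCPy combines its summands in exactly this way is an assumption, and the function and argument names are hypothetical.

```python
def mixture_prediction(weights, means, variances):
    """Combine per-configuration predictions into one mean and variance.

    weights:   probability of each change point configuration
    means:     predicted value at z under each configuration
    variances: predictive variance at z under each configuration
    """
    total = sum(weights)
    w = [v / total for v in weights]          # make the weights sum to one
    mean = sum(wi * m for wi, m in zip(w, means))
    second_moment = sum(wi * (var + m * m)
                        for wi, m, var in zip(w, means, variances))
    # Total variance = E[var] + Var[mean] = E[second moment] - mean^2.
    return mean, second_moment - mean * mean


if __name__ == "__main__":
    print(mixture_prediction([0.5, 0.5], [1.0, 3.0], [0.0, 0.0]))
```

The combined variance is non-zero even when every single configuration is certain, because disagreement between configurations contributes its own spread.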
- cp_scan(print_sum_control=False, integration_method='Riemann sum', config_output=False)
Performs a change point scan on the dataset.
- Parameters
print_sum_control (bool) – If True, prints whether the exact or the approximate MC sum is computed. Default is False.
integration_method (str) – Determines the integration method to compute the change point probability. Default is 'Riemann sum' for numerical integration with rectangles. Alternatively, the 'Simpson rule' can be chosen under the assumption of one change point. The Simpson rule can be unstable in some cases. The method should be the same as the integration method used in calculate_marginal_cp_pdf(...).
- fit(sigma_multiples=3, print_progress=True, integration_method='Riemann sum', config_output=False, print_sum_control=True)
Computes the segmental linear fit of the time series data with integrated change point assumptions over the z_array, which contains z_array_size equidistant data points in the range from the first entry of x up to the prediction_horizon. The fit results and corresponding variances are saved in the attributes D_array and DELTA_D2_array, respectively.
- Parameters
sigma_multiples – Specifies which multiple of standard deviations is chosen to determine the upper_uncertainty_bound and the lower_uncertainty_bound. Default is 3.
print_progress (bool) – If True, the currently predicted data count is printed and updated successively.
integration_method (str) – Determines the integration method to compute the change point probability. Default is 'Riemann sum' for numerical integration with rectangles. Alternatively, the 'Simpson rule' can be chosen under the assumption of one change point. The Simpson rule can be unstable in some cases. The method should be the same as the integration method used in calculate_marginal_cp_pdf(...).
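The zero crossings behind transition_time, upper_uncertainty_bound, and lower_uncertainty_bound can be sketched from arrays like D_array and DELTA_D2_array. The sketch assumes that the bounds are the fit plus or minus sigma_multiples standard deviations and that crossings are located by linear interpolation between grid points. Both choices and the function names are assumptions, not the documented implementation.

```python
import math


def first_zero_crossing(z, values):
    """Return the first z at which values reaches or crosses zero, else None."""
    for i in range(len(z) - 1):
        a, b = values[i], values[i + 1]
        if a == 0.0:
            return z[i]
        if a * b < 0.0:
            # Linear interpolation between the two bracketing grid points.
            return z[i] + (z[i + 1] - z[i]) * (-a) / (b - a)
    return z[-1] if values[-1] == 0.0 else None


def crossing_times(z, fit, variances, sigma_multiples=3):
    """Zero crossings of the fit and of its upper and lower uncertainty band."""
    band = [sigma_multiples * math.sqrt(v) for v in variances]
    upper = [m + s for m, s in zip(fit, band)]
    lower = [m - s for m, s in zip(fit, band)]
    return (first_zero_crossing(z, fit),
            first_zero_crossing(z, upper),
            first_zero_crossing(z, lower))


if __name__ == "__main__":
    z = [0.0, 1.0, 2.0, 3.0]
    print(crossing_times(z, [-3.0, -2.0, -1.0, 1.0], [0.25] * 4, sigma_multiples=1))
```

For a rising fit the upper band crosses zero first and the lower band last, so the two bound crossings bracket the crossing of the fit itself. A return value of None means no crossing occurs before the end of the grid.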
Bibliography
- vdL14
W. von der Linden, V. Dose, and U. von Toussaint, Bayesian Probability Theory: Applications in the Physical Sciences (Cambridge University Press, Cambridge, 2014). doi:10.1017/CBO9781139565608
- K14
A. Klöckner, F. van der Linden, and D. Zimmer, in Proceedings of the 10th International Modelica Conference, March 10-12, 2014, Lund, Sweden (Linköping University Electronic Press, 2014)