Finite Iterative Forecasting Model Based on Fractional Generalized Pareto Motion

: In this paper, an efﬁcient prediction model based on the fractional generalized Pareto motion (fGPm) with Long-Range Dependent (LRD) and inﬁnite variance characteristics is proposed. Firstly, we discuss the meaning of each parameter of the generalized Pareto distribution (GPD), and the LRD characteristics of the generalized Pareto motion are analyzed by taking into account the heavy-tailed characteristics of its distribution. Then, the mathematical relationship H = 1/ α between the self-similar parameter H and the tail parameter α is obtained. Also, the generalized Pareto increment distribution is obtained using statistical methods, which offers the subsequent derivation of the iterative forecasting model based on the increment form. Secondly, the tail parameter α is introduced to generalize the integral expression of the fractional Brownian motion, and the integral expression of fGPm is obtained. Then, by discretizing the integral expression of fGPm, the statistical characteristics of inﬁnite variance is shown. In addition, in order to study the LRD prediction characteristic of fGPm, LRD and self-similarity analysis are performed on fGPm, and the LRD prediction conditions H > 1/ α is obtained. Compared to the fractional Brownian motion describing LRD by a self-similar parameter H , fGPm introduces the tail parameter α , which increases the ﬂexibility of the LRD description. However, the two parameters are not independent, because of the LRD condition H > 1/ α . An iterative prediction model is obtained from the Langevin-type stochastic differential equation driven by fGPm. The prediction model inherits the LRD condition H > 1/ α of fGPm and the time series, simulated by the Monte Carlo method, shows the superiority of the prediction model to predict data with high jumps. Finally, this paper uses power load data in two different situations (weekdays and weekends), used to verify the validity and general applicability of the forecasting model, which is compared with the fractional Brown prediction model, highlighting the “high jump data prediction advantage” of the fGPm prediction model.


Introduction
This paper studies the infinite variance prediction model for random sequences with LRD characteristics. As random sequences are independent, the incremental processes of many existing models (such as the Gamma process [1], Markov process [2] and Wiener process [3]) are independent and cannot rely on the current state of the process to accurately predict random sequences. The LRD model [4][5][6] can predict more efficiently a random sequence by comprehensively considering the influence of past states and the current state on the future state. In addition, in terms of data characteristics, random processes can be divided into finite variance processes and infinite variance processes. The significant Previously, Stanislavsky et al. [12] proposed a fractional autoregressive integrated moving average (FARIMA) model with Pareto noise, and the FARIMA model is a discretetime simulation that considers non-Gaussian statistics and the fractional Langevin equation of LRD [25]. Therefore, in the following we propose a Langevin-type stochastic differential equation (SDE) [26,27] driven by fGPm. First, we extend the fractional Black-Scholes model [28,29] and obtain a parameterized SDE, where the two parameters µ and δ represent the mean (drift) and diffusion coefficients of the sequence, respectively, and the improved maximum likelihood method [30][31][32] can be used for parameter estimation. Then, the fGPm is discretized by a fractional Taylor series [33], and the mathematical relationship between the increment of fGPm and the generalized double Pareto white noise [34] is obtained and substituted into the discrete Langevin-type SDE. Finally, using the discrete Langevin-type SDE and some difference equations, the expression of the fGPm finite difference iterative prediction model is obtained. In addition, the LRD condition of the prediction model fulfills the LRD condition αH > 1 inherited from fGPm.
To verify the effectiveness of the proposed prediction model, we apply it to historical power load data collected by the Eastern Slovak Electric Power Company [35]. In order to evaluate the LRD characteristics of the power load sequence, we first use R/S [36] and the improved maximum likelihood estimation method [30][31][32] to estimate the parameters of the sequence, and then, we analyze the power load according to the condition αH > 1 to check whether the sequence owns the LRD characteristics. In order to verify the general applicability of the proposed prediction model, we analyze the power load data in two different situations, i.e., working days and weekends, to predict and forecast the trend of the historical data set for the next 6, 12, 18 and 24 h. The working days' data, i.e., the power load from Monday to Thursday, is used as historical data to predict the power load trend on Friday. Since the power consumption on working days is similar, and Monday to Thursday are close to Friday, it can accurately reflect the trend of power load data. In the same way, we select the data on Saturday and Sunday of two weeks to predict the data on Saturday and Sunday of the other week. At the same time, in order to highlight the superiority of the fGPm iterative prediction model in predicting high-jump data, the fractional Brown iterative prediction model [23,36] is used to compare the prediction accuracy of the two models.
The structure of the paper is as follows. Section 2 introduces the meaning and incremental distribution of each parameter in GPD, and introduces the LRD expression of generalized Pareto motion. Section 3 deals with the integral expression of fGPm and its increment. We also analyze the self-similarity and LRD characteristics of fGPm. In Section 4, the fGPm finite difference iterative prediction model is proposed. The Langevin-type SDE driven by fGPm establishes a finite-difference iterative prediction model and introduces the parameter estimation method of the Langevin-type SDE, an improved maximum likelihood estimation method [30][31][32]. Power load data considered in cases are used to show the effectiveness of the prediction model in Section 5. Summary and future perspectives of the work are given in Section 6.

Parameter Meaning of Generalized Pareto
The classical Pareto distribution is defined by the following probability density function (PDF) [14,37]: where x ≥ α > 0, δ > 0. However, in order to describe the random process in a more flexible way, the following PDF generalization has been proposed [9][10][11].
Which depends on the additional parameter µ and where x ≥ µ, α > 0, δ > 0. This variant is also called the generalized Pareto distribution (GPD). The parameter α is called the tail, where δ is the scale parameter and µ the location. The location µ indicates the minimum value of x, and the scale parameter δ represents the discrete nature of the distribution (Figure 1).

{| | > }~−
where is constant. The tail of the distribution with 0 < < 2 decreases to zero so slowly that the variance tends to infinity, an of , the slower it decays. From the probability distribution poi the tails are thicker ( Figure 2).   In the following, we consider only an infinite variance process, thus limiting ourselves to the case 0 < α < 2, for both practical and theoretical reasons [4]. When x → ∞ , the probability tails of X satisfy the conditions (see e.g., [19]): where C is constant. The tail of the distribution with 0 < α < 2 obeys a power law and decreases to zero so slowly that the variance tends to infinity, and the smaller is the value of α, the slower it decays. From the probability distribution point of view, it is said that the tails are thicker ( Figure 2). distribution ( Figure 1). In the following, we consider only an infinite variance pro selves to the case 0 < < 2, for both practical and theoretical re the probability tails of satisfy the conditions (see e.g., [19]): where is constant. The tail of the distribution with 0 < < 2 decreases to zero so slowly that the variance tends to infinity, and of , the slower it decays. From the probability distribution poin the tails are thicker (Figure 2).

LRD Characteristics of Generalized Pareto Motion
It is known that in the case of finite variance, the LRD characteristics of the finite variance time series can be expressed by its covariance. The covariance Cov(x 0 , x t ) slowly drops to 0 in the form of power law, that is, Cov(x 0 , x t ) ∼ Ct −a , ( t → ∞ ) so slowly that ∑ ∞ t=0 Cov(x 0 , x t ) = ∞, and this conditions characterizes the time series of finite variance with LRD characteristics [13,14]. However, the generalized Pareto time series with tail parameter α ∈ (0, 2) is an infinite variance process, which implies the absence of covariance. To analyze the LRD characteristics of the generalized Pareto time series, we start from the mathematical definition of covariance to analyze its long correlation. The definition of covariance is as follows [15]: where p(x 1 , t 1 ; x 2 , t 2 ) is the joint probability density function of GPD. It can be seen from Equation (4) that when p (x) in the calculation of the autocorrelation function slowly decays, also Cov(x(t 1 ), x(t 2 )) slowly decays, and the decay speed of Cov(x(t 1 ), x(t 2 )) is proportional to the decay speed of p (x), where, the intensity of LRD is inversely proportional to the decay rate of Cov(x(t 1 ), x(t 2 )). Meanwhile, the LRD intensity can be represented by the self-similar parameter H, and when H ∈ (1/2, 1), H is proportional to the intensity of the LRD [17,18]. So, it can be concluded that there is an inverse relationship between H and α.
For the infinite variance process, both the Levy stable distribution and GPD are characterized by the parameter α, and their tails P ( > x) satisfy [7,19]: where F (x) is the cumulative distribution function and ∼ indicates that the functions on the left and the right sides are asymptotically equivalent, i.e., their ratio tends to 1. It is known that the relationship between H and α in Levy stable motion is H = 1/α [20], and when the range of Pareto α is (0, 2), it is asymptotically in the attractive domain of Levy stable distribution [7,12]. So, the relationship between H and α of GPD motion is:

Incremental Distribution of the Generalized Pareto
The subsequent iterative prediction model is expressed by the incremental form. So, in this section, we use statistical methods to obtain the incremental distribution of GPD. The main steps can be summarized as follows: (1). Generate GPD-compliant time series.
(2). Determine the time interval τ, and the two-state quantities separated by τ in the sequence are differentiated multiple times, i.e., ∆x(t) = x(t + τ) − x(t). Repeat the above process to make multiple differences of the generated time series to form an incremental set. (3). Draw the histogram of the incremental set, in order to obtain the probability density map of the set and select a known distribution to fit it according to the characteristics of the probability distribution. (4). After proposing a distribution that meets the characteristics of the incremental data, the χ 2 test method [38,39] is used to test the distribution fit.
In the following, we use the GPD with parameters α = 1.5, δ = 2, µ = 10. The corresponding time series is shown in Figure 3.  The GPD is given in Figure 3 and the GPD incrementa The peak and high variability of the GPD time series data in heavy-tailed. At the same time, the GPD time series are no dom) noise, and the obvious clustering in the data indicates time series.
It can be seen from Figure 5 that the probability densi the characteristics of symmetry and tail weight, and in orde the parameters in the probability density, we can choose th istics and a similar probability density function. Generalize is used, then to fit the date. The PDF is as follows [34]: where > 0 is a scale parameter, > 0 is a tail paramet sents the mean value of the random variable. Similarly Equation (3). The influence of and on the distribution ures 6 and 7). The GPD is given in Figure 3 and the GPD incremental data are plotted in Figure 4. The peak and high variability of the GPD time series data indicate that the distribution is heavytailed. At the same time, the GPD time series are not similar to white (pure random) noise, and the obvious clustering in the data indicates the self-similarity of the GPD time series. The GPD is given in Figure 3 and the GPD increment The peak and high variability of the GPD time series data i heavy-tailed. At the same time, the GPD time series are n dom) noise, and the obvious clustering in the data indicate time series.
It can be seen from Figure 5 that the probability dens the characteristics of symmetry and tail weight, and in ord the parameters in the probability density, we can choose th istics and a similar probability density function. Generaliz is used, then to fit the date. The PDF is as follows [34]: where > 0 is a scale parameter, > 0 is a tail parame sents the mean value of the random variable. Similarly Equation (3). The influence of and on the distribution ures 6 and 7).  It can be seen from Figure 5 that the probability density of the GPD increment has the characteristics of symmetry and tail weight, and in order not to affect the meaning of the parameters in the probability density, we can choose those with these two characteristics and a similar probability density function. Generalized double Pareto distribution is used, then to fit the date. The PDF is as follows [34]: where δ > 0 is a scale parameter, α > 0 is a tail parameter, and the location µ represents the mean value of the random variable. Similarly, when x → ∞ , f (x) follows Equation (3). The influence of δ and α on the distribution is consistent with GPD ( Figures 6 and 7).     It can be seen from Figure 5 that the position param double Pareto distribution. And because when = 0, ( of 0 can be used to estimate . It can be seen from Figu It can be seen from Figure 5 that the position parameter is µ = 0 in the generalized double Pareto distribution. And because when x = 0, f (x) = 1/2δ, the probability value of 0 can be used to estimate δ. It can be seen from Figure 4, their peak value of the GPD increment sequence is due to the corresponding peak value of the GPD time sequence and their tails are similar in distribution, so their corresponding tail parameters α are the same. It can be concluded that the values of the parameters in the generalized double Pareto distribution are µ = 0, δ = 1.1638 and α = 1.5. To form the χ2 test [38,39] between the empirical distribution in Figure 5 and the generalized double Pareto distribution with known parameters, the process is as follows: (1). Make the following test assumptions, H 0 : Figure 4 obeys the generalized double Pareto distribution, H 1 : Figure 4 does not obey the generalized double Pareto distribution. (2). Partition the value range of the overall data X in Figure 4 , where the size of k is not strictly specified, but if it is too small, it will make the test too rough, and if it is too large, it will increase random errors. Usually, the sample size n is larger, and k can be slightly larger, but generally 5 ≤ k ≤ 16. In this example, there are four grouping cases k = 10, 12, 14, 16. (3). Assuming that H 0 holds, calculate the theoretical probability p i and theoretical frequency np i of each interval: (4). According to the sample observation values (x 1 , x 2 , . . . , x n ) in Figure 4, calculate the actual frequency v i of falling in the interval [a i−1 ,a i ], and then calculate the observation value of the statistic χ2: Fractal Fract. 2022, 6, 471 9 of 20 (5). Accordingly, choosing 95% based on confidence, check the χ2 distribution Table, and get χ2(k − r − 1), where r is the number of unknown parameters in the generalized Pareto distribution PDF, because the generalized Pareto distribution PDF parameters are known, so r = 0. The four sets of χ2(k − 1) values found out from the table and the χ2 values obtained from the four experiments are compared in Table 1: It can be seen from Table 1 that the experimental results show that the χ2 value is always smaller than the χ2(k − 1) value obtained from the look-up Table, so there is a 95% probability that H 0 holds. In a statistical sense, it can be said that the time series of Figure 4 obeys the generalized double Pareto distribution.

Fractional Generalized Pareto Motion Model
We define a parametric family of fractional Brownian motion in terms of the stochastic Weyl integral [7,40].
where a and b are arbitrary constants, x H−1/2 is Gaussian with mean 0 and variance |ds|, H is the self-similarity parameter. The integral expression analogous to fractional Brownian motion is generalized to the method of fractional Levy stable motion [8,21,22], and similarly, the integral expression of fGPm can also be obtained. In addition, the relationship between H and α in the generalized Pareto motion is H = 1/α . The representation of fGPm is acquired by transforming the exponent in the Equation (5) from H − 1/2 to H − 1/α, and a GPD random measure with the location parameter µ = 0 replaces the standard Brownian random measure as the driving function. The fGPm is defined by: where p(ds) is a GPD random process with the location parameter µ = 0 and scale parameter = |ds|. When t > s > 0, we discretize s as {0, 1, 2, · · · , t}, and Equation (11) can be written as: where a(t − s) H− 1 α − b(−s) H− 1 α can also be written as a constant sequence A n = {a 0 , a 1 , · · · , a t }, to obtain: P H,α (t) = (a 0 + a 1 p + · · · + a n )p(1) = A n p(1) Therefore, fGPm in Equation (11) can be regarded as a superposition of GPD with different weighting coefficients, where µ and δ of fGPm are 0 and ds, respectively. It can be concluded that the fGPm has infinite variance and its probability distribution is subject to GPD. Similarly, we can also define the fGPm as the following Riemann-Liouville fractional integral [41]: where p(dτ) is the GPD motion with the location parameter µ = 0 and the scale parameter = dτ. Γ(·) is the gamma function, the gamma function is defined as Γ(x) = +∞ 0 t x−1 e −t dt(x > 0). Figure 8 shows the fGPm sequence generated with different α values for H = 0.75. We should notice that the random walk of the fGPm sequence increases as α increases.
where ( ) is the GPD motion with the location parameter = 0 and the scale parameter = . (·) is the gamma function, the gamma function is defined as ( ) = ∫ −1 − ( > 0) +∞ 0 . Figure 8 shows the fGPm sequence generated with different values for = 0.75. We should notice that the random walk of the fGPm sequence increases as increases.   Figure 9 shows the fGPm incremental sequence generated with different , where = 0.75. We observe that the influence of noise increases as the parameters increases.

Fractional Generalized Pareto Motion Incremental Processes Model
The 1-step incremental model of fGPm is obtained from the identity X H,α (t) = P H,α (t + 1) − P H,α (t): (15) where ω α (s) = p[d(s + 1)]−p(ds)]. From Section 2.1, we can see that ω α (s) follows the generalized double Pareto distribution with position parameter µ = 0 and scale parameter δ = 1. Figure 9 shows the fGPm incremental sequence generated with different α, where H = 0.75. We observe that the influence of noise increases as the parameters increases.

Long-Range Dependence and Self-Similarity of Fractional Generalized Pareto Motion
The basic idea of self-similarity for stochastic processes can be seen as the invariance in distribution under suitable scaling of time. Self-similarity can be expressed as: where ( ) = [ ( + 1)] − ( )]. From Section 2.1, we can see that ( ) follows the generalized double Pareto distribution with position parameter = 0 and scale parameter = 1. Figure 9 shows the fGPm incremental sequence generated with different , where = 0.75. We observe that the influence of noise increases as the parameters increases.

Long-Range Dependence and Self-Similarity of Fractional Generalized Pareto Motion
The basic idea of self-similarity for stochastic processes can be seen as the invariance in distribution under suitable scaling of time. Self-similarity can be expressed as: where (Hurst exponent) is the self-similarity parameter, ≜ denotes equality in distribution. Several methods can be used to calculate the self-similarly parameter H such as the absolute value method, the periodogram estimation method, the wavelet estimation method, rescaled range method etc. [42][43][44]. Since the best accuracy is achieved by the rescaled range method [36], we use this method for the estimation of the self-similar parameters in LRD random processes. Generalized Pareto motion is a self-similar process with a self-similarity parameter 1 ⁄ , namely, ( ) ≜ −1/ ( ) for all > 0. As can be seen from Figure 8, the fGPm does not resemble white (pure random) noise as the clustering in data is clearly visible, this indicates the existence of self-similarity. Following N. Laskin et al.'s proof that fractional Levy stable motion is a self-similar process [41], fGPm self-similarity parameters are obtained as follows: Equation (17) shows that the fGPm is a self-similar process with self-similarly parameter − 1/2 + 1/ . Similarly, the fGPm incremental process is also self-similar with parameter − 1/2 + 1/ .
As shown in Equation (6), the LRD of fGPm is closely related to its integral kernel shows that the value of the current time is always related to the value of all the times in the past, so that the integral kernel has a strong memory. We say that fGPm has LRD where H (Hurst exponent) is the self-similarity parameter, denotes equality in distribution. Several methods can be used to calculate the self-similarly parameter H such as the absolute value method, the periodogram estimation method, the wavelet estimation method, rescaled range method etc. [42][43][44]. Since the best accuracy is achieved by the rescaled range method [36], we use this method for the estimation of the self-similar parameters in LRD random processes. Generalized Pareto motion is a self-similar process with a self-similarity parameter 1 α , namely, p(t) a −1/α p(at) for all a > 0. As can be seen from Figure 8, the fGPm does not resemble white (pure random) noise as the clustering in data is clearly visible, this indicates the existence of self-similarity. Following N. Laskin et al.'s proof that fractional Levy stable motion is a self-similar process [41], fGPm self-similarity parameters are obtained as follows: Equation (17) shows that the fGPm is a self-similar process with self-similarly parameter H − 1/2 + 1/α. Similarly, the fGPm incremental process is also self-similar with parameter H − 1/2 + 1/α.
As shown in Equation (6), the LRD of fGPm is closely related to its integral kernel When t > s, if H = 1 α , fGPm always shows that the value of the current time t is always related to the value of all the times s in the past, so that the integral kernel has a strong memory. We say that fGPm has LRD when H > 1 α and negative dependence when H < 1 α . Moreover, the sequence of the self-similar parameter H ∈ (1/2, 1) has LRD characteristics [17,18]. Therefore, we restrict H to the interval (1/2, 1) with H > 1 α so that α belongs to (1,2). The key feature of the fGPm model is that parameters α, H are not independent since the fGPm has the LRD condition for αH > 1.

Iterative Forecasting Model
A. A. Stanislavsky et al. [12] proposed the FARIMA model with Pareto noise, which is a discrete-time analog of the fractional Langevin equation, in which the non-Gaussian statistics and the LRD are considered [25]. Therefore, let us consider the following Langevintype stochastic differential equation [26,27] driven by GPD motion: where dp α (t) stands for the increments of generalized Pareto motion p α (t). By replacing P H,α (t) to p α (t), we obtain the Langevin-type stochastic differential equation driven by fGPm: where b(t, X(t)) and δ(t, X(t)) represent the drift and diffusion functions, respectively. The Black-Scholes model [28,29] was generalized to the fractional Black-Scholes model by Dai et al. [45][46][47], based on the stochastic equation: where µ indicates the expected return rate, δ is the volatility rate. In fGPm, the iterative prediction model is based on incremental modeling, so the parameters of the differential equation correspond to the parameters of the fGPm increment. In Section 2.3, we use the generalized double Pareto distribution to fit the increment of GPD. Figure 9 shows that the increment of fGPm also obeys the generalized double Pareto distribution. In addition, when the range of the generalized double Pareto α is (0, 2), it is asymptotically in the attractive domain of Levy's stable distribution [7,12]. Therefore, µ represents the mean value of random variables, the parameter δ represent the diffusion coefficient. Consequently, Equation (20) can be rewritten as follows: dX H,α (t) = µX H,α (t)dt + δX H,α (t)dP H,α (t) (21) where µ, δ are constants. By using the Maruyama symbol [33] dB t = w(t)(dt) 1/2 (ω(t) represents Gaussian white noise), we get: and where 0 < a < 1 represents the self-similar parameter of x. f (t) denote a continuous function. The incremental expression of fGPm can be obtained by replacing f (t) with w α (t): It can be seen from Figure 9 that the increment of fGPm obeys the generalized double Pareto distribution, so ω α (s) follows the generalized double Pareto distribution with position parameter µ = 0 and scale parameter δ = 1.
Equation (18) can be discretized as: The iterative predictive model was obtained from the identity ∆X(t) = X(t + 1) − X(t): By using Monte Carlo simulation [48], most likely curves for multiple time series can be generated. Assuming that α = 1.5, µ = 0.4586, δ = 0.0396, H = 0.75, X 0 = 0.6, we performed 50 simulations by the Monte Carlo method and obtained the results in Figure 10. We can see that the data simulated by the fGPm iterative prediction model has high jumps, and the model will have good predictability for high jump data. ract. 2022, 6, x FOR PEER REVIEW By using Monte Carlo simulation [48], most likely curves fo can be generated. Assuming that = 1.5, µ = 0.4586, = 0 0.6, we performed 50 simulations by the Monte Carlo method an in Figure 10. We can see that the data simulated by the fGPm itera has high jumps, and the model will have good predictability for hi

Parameter Estimation of ,δ,α
As mentioned in 3.1., is the mean of the series, so its estim pressed as the mean of the time series. In addition, the estimatio done by the most common maximum likelihood estimation met may be no maximum likelihood estimates for and . To find a pointed out that by a change of parameters to = 1⁄ and = duced to a unidimensional search. We search for , which gives a profile log-likelihood (the log-likelihood maximized over 1⁄ ). Th Step 1: Let | =1… be the sampling data for the fGPm, Step 2: let (1) ≤ (2) ≤···≤ ( ) be the order statistics.

Parameter Estimation of µ,δ,α
As mentioned in 3.1., µ is the mean of the series, so its estimated value can be expressed as the mean of the time series. In addition, the estimation of α and δ can be done by the most common maximum likelihood estimation method [30,31]. But there may be no maximum likelihood estimates for α and δ. To find a solution, Davison [32] pointed out that by a change of parameters to θ = 1/αδ and α = α, the problem is reduced to a unidimensional search. We search for θ, which gives a local maximum of the profile log-likelihood (the log-likelihood maximized over 1/α). The specific process is: Step 1: Let x i | i=1...N be the sampling data for the fGPm, Step 2: let x (1) ≤ x (2) ≤ ··· ≤ x (N) be the order statistics.

Case Study
To verify the validity of the fGPm forecasting model, in this section we give the results of an experiment considering power load curves and predictions in the future of 6, 12, 18 and 24 h model for the two cased. The actual power load data is collected by the Eastern Slovak Electric Power Company [35] and is sampled every 30 min. By selecting the historical power load data in two cases (weekdays-case1 and weekends-case2), the parameters of the are estimated using R/S and the improved maximum likelihood estimation method. The calculated parameters are shown in Table 2. According to the condition αH > 1, both sets of data have LRD characteristics. Therefore, the two sets of data are modeled by the fGPm forecast model. The experimental process is shown in Figure 11.  [35] and is sampled every 30 min the historical power load data in two cases (weekdays-case1 and weeken parameters of the are estimated using R/S and the improved maximum li mation method. The calculated parameters are shown in Table 2. Accordi dition > 1, both sets of data have LRD characteristics. Therefore, th data are modeled by the fGPm forecast model. The experimental proces Figure 11.  Figure 11. Forecasting process.

Case 1: Weekdays
The power load of the first week of working days in January is use

Case 1: Weekdays
The power load of the first week of working days in January is used as the input sequence: that is, the power load from Monday to Thursday is used as historical data to predict the power load on Friday. Since the power consumption on weekdays is similar, and Monday to Thursday are close to Friday, it can accurately reflect the trend of power load data. Replacing this set with an iterative predictive model, we predict the trend of data sets for the next 6, 12, 18 and 24 h. The 24-h forecast trend of historical data is shown in Figure 12. At the same time, the fractional Brownian iterative prediction model [23,36] is used to repeat the above process and the results are compared with the fGPm prediction results in Figure 13. The maximum and average absolute percentage errors of the prediction results are shown in Table 3. The results show that the fGPm iterative prediction model has a good effect on the high-jump power load data prediction.

Case 2: Weekends
We repeated the analysis of Case 1, using the weekends of the first two wee January to predict the power load trend in the next weekend. Since the electric trends on weekends are similar, the electric load sequence between each weekend has LRD characteristics. Therefore, it can also be used to predict power load trends

Case 2: Weekends
We repeated the analysis of Case 1, using the weekends of the first two weeks of January to predict the power load trend in the next weekend. Since the electric load trends on weekends are similar, the electric load sequence between each weekend also has LRD characteristics. Therefore, it can also be used to predict power load trends. The results of weekend power load forecasting are shown in Figure 14. The comparison of the two forecasting models is shown in Figure 15. The error analysis results are reported in Table 4

Case 2: Weekends
We repeated the analysis of Case 1, using the weekends of the first two weeks January to predict the power load trend in the next weekend. Since the electric lo trends on weekends are similar, the electric load sequence between each weekend a has LRD characteristics. Therefore, it can also be used to predict power load trends. T results of weekend power load forecasting are shown in Figure 14. The comparison of two forecasting models is shown in Figure 15. The error analysis results are reported Table 4.

Conclusions
The main contributions of this paper are as follows. Firstly, the incremental distribution of GPD is obtained using statistical methods and the LRD characteristics of the generalized Pareto motion are analyzed according to the heavy-tailed characteristics of its distribution: the mathematical relationship H = 1/α of the self-similarly parameter H and the tail parameter α is obtained. Secondly, analogously to the method of generalizing fBm's integral expression to fractional Levy stable motion, the integral expression of fGPm is obtained, discretized and analyzed, revealing that fGPm has infinite variance. Third, the fGPm-driven Langevin-type SDE has been used to establish the fGPm iterative prediction model and the parameter estimation method in the Langevin-type SDE was also provided. The prediction model for considered only the case of α ∈ (1, 2): the case of α ∈ (0, 1) has yet to be studied in the future.
In order to verify the effectiveness of the fGPm prediction model, two different scenarios (weekdays and weekends) of power load data were considered in the case study. The prediction accuracy in the study is good, indicating that the model is universally applicable. At the same time, the comparison of the fGPm iterative prediction model with the fractional Brownian iterative prediction model highlights the superiority of the prediction model in this paper for high-jump data prediction. Wind speed prediction is a challenging job at present. Due to the strong randomness and intermittency of wind speed, the wind speed sequence is high-jump data. This model may have a good prediction effect on this kind of problem.