Abstract
Triple seasonal autoregressive (TSAR) models have been introduced to model time series data with three layers of seasonality; however, the Bayesian identification problem of these models has not been tackled in the literature. In this paper, we fill this gap by presenting a Bayesian procedure to identify the best order of TSAR models. Assuming that the TSAR model errors are normally distributed and employing three priors on the model parameters, i.e., normal-gamma, Jeffreys’ and g priors, we derive the marginal posterior distributions of the TSAR model parameters. In particular, we show that the marginal posteriors are multivariate t and gamma distributions for the TSAR model coefficients vector and precision, respectively. Using the marginal posterior distribution of the TSAR model coefficients vector, we present an identification procedure for the TSAR models based on a sequence of t-tests of significance. We evaluate the accuracy of the proposed Bayesian identification procedure by conducting an extensive simulation study, followed by a real application to hourly electricity load datasets in six European countries.
Keywords:
TSAR models; prior and posterior distributions; layers of seasonality; hourly electricity load
MSC:
62F15
1. Introduction
Modeling time series data with multiple seasonalities is a challenging task [1,2,3], and researchers have extended autoregressive moving average (ARMA) models to fit these time series and accommodate multiple seasonalities [4,5,6,7]. Bayesian analysis of seasonal ARMA (SARMA) models and their extensions is difficult because the likelihood function of these models is analytically intractable, which complicates the posterior and predictive analyses. Accordingly, two families of approximations have been introduced in the literature to simplify the Bayesian analysis of these models: analytical approximations and approximations based on Markov chain Monte Carlo (MCMC) methods. Analytical approximations modify the posterior and predictive densities of SARMA models so that they take standard, closed-form distributions that are analytically tractable; see, for example, Shaarawy and Broemeling [8], Shaarawy and Ismail [9], Shaarawy and Ali [10]. MCMC-based approximations use one or more MCMC methods to simulate from the available conditional posterior or predictive densities of SARMA models in order to empirically approximate the corresponding marginal posterior or predictive densities; see, for example, Barnett et al. [11], Vermaak et al. [12], Amin [13].
The Bayesian analysis of single SARMA models is well established in the literature [14], starting from the work of Barnett et al. [11,15], who used MCMC methods to develop the Bayesian estimation of seasonal autoregressive (SAR) and ARMA models. In addition, Vermaak et al. [12] used the Metropolis-within-Gibbs sampling algorithm for the Bayesian estimation of SAR models when modeling speech production for voiced sounds. Ismail [16,17] used the Gibbs sampler for the Bayesian analysis of the SAR and seasonal moving average (SMA) models, respectively. On the other hand, Shaarawy and Ali [10] used an analytical approximation to introduce the Bayesian identification of SAR models. Moreover, Ismail and Amin [18] applied the Gibbs sampler to the Bayesian inference of SARMA models, and, recently, Amin [19] introduced both Bayesian estimation and prediction of SARMA models via the Gibbs sampler.
The Bayesian analysis of double seasonal ARMA (DSARMA) models is relatively recent, and only a few works have been presented in the literature. Ismail and Zahran [20] used analytical approximations to introduce the Bayesian estimation of double SAR (DSAR) models, and Amin [21] applied the Gibbs sampler to develop the Bayesian estimation of DSARMA models. In addition, Amin [22,23] used analytical approximations to introduce the Bayesian estimation of DSARMA models and the Bayesian identification of DSAR models, respectively. Moreover, Amin [24,25] recently presented the Bayesian analysis of DSAR models via the Gibbs sampler. For time series with three layers of seasonality, Bayesian analysis has not been discussed in the literature, except in our recent work, which used analytical approximations [13] and the Gibbs sampling algorithm [26] for the Bayesian estimation of the TSAR models. However, specifying the best TSAR model order, i.e., the TSAR model identification, is the first and an important stage of the Bayesian analysis of these models, and it needs to be tackled in order to model high-frequency time series with three layers of seasonality in real applications.
Therefore, our contribution in this paper is to fill this gap by introducing a testing-based Bayesian procedure to identify the best order of TSAR models. Assuming that the TSAR model errors are normally distributed and employing three priors on the model parameters, i.e., normal-gamma, Jeffreys’ and g priors, we first derive the marginal posterior distributions of the TSAR model parameters. In particular, we show that the marginal posteriors are multivariate t and gamma distributions for the TSAR model coefficients vector and precision, respectively. Since the derived marginal posterior distribution of the TSAR model coefficients vector is a multivariate t-distribution, we present an identification procedure for the TSAR models based on a sequence of t-tests of significance, which is simple to adopt in real applications. We evaluate the accuracy of the proposed Bayesian identification procedure by conducting an extensive simulation study. In addition, we show the applicability of our work to real hourly electricity load time series datasets in six European countries.
The remainder of this paper is structured as follows. The TSAR models are first introduced in Section 2. In Section 3, we discuss the posterior analysis of these TSAR models, followed by the proposed Bayesian identification procedure. We discuss the simulation study setting and results and then present a real application of the proposed Bayesian identification of TSAR models on hourly electricity load time series in six European countries in Section 4. Finally, in Section 5, the paper is concluded.
2. TSAR Models
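The time series $\{y_t\}$ is said to be generated from the zero-mean triple seasonal autoregressive (TSAR) model of order p, $P_1$, $P_2$ and $P_3$, i.e., TSAR$(p)(P_1)_{s_1}(P_2)_{s_2}(P_3)_{s_3}$, if it has the form:
$$\phi_p(B)\,\Phi_{P_1}(B^{s_1})\,\Lambda_{P_2}(B^{s_2})\,\Delta_{P_3}(B^{s_3})\,y_t = \varepsilon_t, \qquad (1)$$
where $\{\varepsilon_t\}$ is a sequence of random errors that are assumed to be independent and identically normally distributed with zero mean and unknown precision τ, $s_1$, $s_2$ and $s_3$ are the seasonal periods and B is the back-shift operator defined as $B^r y_t = y_{t-r}$.
$\phi_p(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p$ is the non-seasonal autoregressive operator of order p. The TSAR model (1) has three seasonal autoregressive operators to accommodate the triple seasonal patterns [13], which are:
$$\Phi_{P_1}(B^{s_1}) = 1 - \Phi_1 B^{s_1} - \Phi_2 B^{2s_1} - \cdots - \Phi_{P_1} B^{P_1 s_1},$$
$$\Lambda_{P_2}(B^{s_2}) = 1 - \Lambda_1 B^{s_2} - \Lambda_2 B^{2s_2} - \cdots - \Lambda_{P_2} B^{P_2 s_2},$$
and
$$\Delta_{P_3}(B^{s_3}) = 1 - \Delta_1 B^{s_3} - \Delta_2 B^{2s_3} - \cdots - \Delta_{P_3} B^{P_3 s_3},$$
with orders $P_1$, $P_2$ and $P_3$, respectively.
The non-seasonal and seasonal autoregressive coefficients are $\phi_1, \ldots, \phi_p$, $\Phi_1, \ldots, \Phi_{P_1}$, $\Lambda_1, \ldots, \Lambda_{P_2}$ and $\Delta_1, \ldots, \Delta_{P_3}$, respectively.
Using the summation notation, we can expand and simplify the compact form of the TSAR model (1) to be:
$$y_t = \sum_{(i,j,k,l) \in \mathcal{I}} (-1)^{m_{ijkl}+1}\, \phi_i \Phi_j \Lambda_k \Delta_l \; y_{t-i-js_1-ks_2-ls_3} + \varepsilon_t, \qquad (2)$$
where $\mathcal{I}$ is the set of all index combinations with $0 \le i \le p$, $0 \le j \le P_1$, $0 \le k \le P_2$ and $0 \le l \le P_3$, excluding $(0,0,0,0)$, $m_{ijkl}$ is the number of non-zero indices in $(i,j,k,l)$, and $\phi_0 = \Phi_0 = \Lambda_0 = \Delta_0 = 1$ for notational convenience.
In addition, we can write the matrix form of the TSAR model as:
$$y = X\beta + \varepsilon, \qquad (3)$$
where $y = (y_1, \ldots, y_n)^{\top}$ is the vector of observed time series, $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)^{\top}$ is the vector of unobserved model errors, X is an $n \times k$ design matrix, i.e., $k = (1+p)(1+P_1)(1+P_2)(1+P_3) - 1$, with the tth row:
$$x_t = \big( y_{t-i-js_1-ks_2-ls_3} \big)_{(i,j,k,l) \in \mathcal{I}}, \qquad (4)$$
and β is the vector of TSAR model coefficients given as:
$$\beta = \big( (-1)^{m_{ijkl}+1}\, \phi_i \Phi_j \Lambda_k \Delta_l \big)^{\top}_{(i,j,k,l) \in \mathcal{I}}. \qquad (5)$$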
It should be noted that, when the TSAR model orders are unknown, the design matrix X in the TSAR model (3) is a function of these unknown orders, which complicates the posterior analysis of these models and motivates our work in this paper.
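To make the dependence of the design matrix on the model orders concrete, the following is a minimal sketch (not the authors' code) that enumerates the lags produced by expanding the product of the four autoregressive operators. The function names `tsar_terms` and `design_row`, and the argument names for the orders and periods, are hypothetical and introduced only for illustration. The sign attached to each term is $(-1)^{m+1}$, where m is the number of operators contributing a non-zero lag, which is what results from moving every expanded term to the right-hand side.

```python
from itertools import product


def tsar_terms(orders, periods):
    """Enumerate the regressors of an expanded multiplicative TSAR model.

    orders  = (p, P1, P2, P3): non-seasonal and three seasonal AR orders
              (argument names are illustrative assumptions).
    periods = (s1, s2, s3): the three seasonal periods.
    Returns a list of (lag, index_tuple, sign), one entry per column of X.
    """
    p, P1, P2, P3 = orders
    s1, s2, s3 = periods
    terms = []
    for i, j, k, l in product(range(p + 1), range(P1 + 1),
                              range(P2 + 1), range(P3 + 1)):
        m = sum(1 for idx in (i, j, k, l) if idx > 0)
        if m == 0:
            continue  # the all-zero tuple corresponds to y_t itself
        lag = i + j * s1 + k * s2 + l * s3
        sign = 1 if m % 2 == 1 else -1  # equals (-1) ** (m + 1)
        terms.append((lag, (i, j, k, l), sign))
    return terms


def design_row(y, t, terms):
    """Row of X for the 0-based time index t; requires t >= the largest lag."""
    return [y[t - lag] for lag, _, _ in terms]


terms = tsar_terms(orders=(1, 1, 0, 0), periods=(12, 60, 600))
print([(lag, sign) for lag, _, sign in terms])  # [(12, 1), (1, 1), (13, -1)]
```

The number of columns grows multiplicatively with the orders: with all four orders equal to one there are already 15 regressors, which is why a change in any single order changes the whole design matrix.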
3. Posterior Analysis and Proposed Identification Procedure for TSAR Models
We can obtain the posterior distribution of the TSAR model parameters by simply combining the prior distribution of these model parameters with the likelihood function of time series data [27]. First, for the likelihood function of observed time series data , assuming the random errors of the TSAR model (3) are normally distributed and using a straightforward transformation from the model errors to [28], the conditional likelihood function on the first initial values, i.e., = , can be given as:
On the other hand, for the prior specification of the TSAR model parameters β and τ, we assume the products of non-seasonal and seasonal coefficients to be free coefficients and consider three prior distributions: normal-gamma, g and Jeffreys’ priors. Suppose that and ; the normal-gamma prior of β and τ can be written as:
where the parameters and are hyper-parameters of the normal-gamma prior that need to be estimated to conduct the Bayesian analysis.
In addition, we employ the g prior for the parameters β and τ with the objective of simplifying the elicitation of the covariances of these parameters. Thus, the g prior of β and τ can be given as:
where g is a hyper-parameter that might be specified as a function of the size of time series n and the number of model coefficients; for more details about how to set the g hyper-parameter value, see Fernandez et al. [29].
When little or no information is available about the model parameters β and τ, Jeffreys’ prior can be employed, and it is given as:
In order to obtain the joint posterior of β and τ, we multiply the likelihood function given in (6) by each one of the three prior distributions given in (7)–(9). Following Bayes’ rule, for the normal-gamma prior (7), the joint posterior of β and τ is given as:
In addition, for the g prior (8), the joint posterior of β and τ is given as:
Moreover, for Jeffreys’ prior (9), the joint posterior of β and τ is given as:
In our previous work [13], we derived from the joint posteriors in (10)–(12) the marginal posteriors of the TSAR model coefficients vector β and precision τ, and we summarize these results in the following theorem and two corollaries.
Theorem 1.
Given the conditional likelihood function (6) and the normal-gamma prior of β and τ (7), the marginal posterior of the TSAR model coefficients vector β is a multivariate t-distribution with the degrees of freedom , mean vector and dispersion matrix , and the marginal posterior of the TSAR model precision τ is a gamma distribution with the two parameters and , where , and .
Proof.
We multiply the conditional likelihood function (6) of observed time series data by the normal-gamma prior of β and τ (7) to obtain the joint posterior of β and τ as given in Equation (10). We then complete the square with respect to β and integrate (10) over τ, leading to the given marginal posterior of the TSAR model coefficients vector β. On the other hand, we complete the square in the exponent of (10) with respect to β and then integrate over β, leading to the given marginal posterior of the TSAR model precision τ. □
The following two corollaries are special cases of Theorem 1 that apply when little or no information is available a priori about the TSAR model parameters. Employing the g and Jeffreys’ priors, the marginal posteriors of the TSAR model coefficients vector and precision are again multivariate t and gamma distributions, respectively, with different parameters.
Corollary 1.
Given the conditional likelihood function (6) and the g prior of β and τ (8), the marginal posterior of the TSAR model coefficients vector β is a multivariate t-distribution with the degrees of freedom , mean vector and dispersion matrix , and the marginal posterior of the TSAR model precision τ is a gamma distribution with the two parameters and , where , and .
Proof.
We set , and , then directly it follows from Theorem 1. □
Corollary 2.
Given the conditional likelihood function (6) and Jeffreys’ prior of β and τ (9), the marginal posterior of the TSAR model coefficients vector β is a multivariate t-distribution with the degrees of freedom , mean vector , and dispersion matrix , and the marginal posterior of the TSAR model precision τ is a gamma distribution with the two parameters and , where , , and .
Proof.
We set , , and ; then, directly, it follows from Theorem 1. □
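The completing-the-square step behind Theorem 1 can be sketched as follows. The parameterization is an assumption made purely for illustration: the symbols $\mu_0$, $V_0$, $a_0$ and $b_0$ (prior mean, prior scale matrix, gamma shape and gamma rate), the symbol k for the number of columns of X, and the use of all n observations in the conditional likelihood are hypothetical choices and need not match the notation of (6)–(12).

```latex
% Assumed parameterization (illustrative, not necessarily that of prior (7)):
%   beta | tau ~ N_k(mu_0, tau^{-1} V_0),   tau ~ Gamma(a_0, rate = b_0),
%   likelihood proportional to tau^{n/2} exp{-(tau/2)(y - X beta)'(y - X beta)}.
\begin{align*}
\pi(\beta,\tau \mid y)
  &\propto \tau^{\frac{n+k}{2}+a_0-1}
     \exp\!\Big\{-\frac{\tau}{2}\Big[2b_0 + (\beta-\mu_0)^{\top}V_0^{-1}(\beta-\mu_0)
     + (y-X\beta)^{\top}(y-X\beta)\Big]\Big\} \\
  &= \tau^{\frac{k}{2}+a_n-1}
     \exp\!\Big\{-\frac{\tau}{2}\Big[2b_n + (\beta-\mu_n)^{\top}V_n^{-1}(\beta-\mu_n)\Big]\Big\}, \\[4pt]
V_n &= \big(X^{\top}X + V_0^{-1}\big)^{-1}, \qquad
\mu_n = V_n\big(X^{\top}y + V_0^{-1}\mu_0\big), \\
a_n &= a_0 + \frac{n}{2}, \qquad
b_n = b_0 + \frac{1}{2}\Big(y^{\top}y + \mu_0^{\top}V_0^{-1}\mu_0 - \mu_n^{\top}V_n^{-1}\mu_n\Big), \\[4pt]
\pi(\beta \mid y) &\propto \Big[2b_n + (\beta-\mu_n)^{\top}V_n^{-1}(\beta-\mu_n)\Big]^{-\frac{2a_n+k}{2}}
  \;\Longrightarrow\; \beta \mid y \sim t_k\!\Big(2a_n,\ \mu_n,\ \frac{b_n}{a_n}V_n\Big), \\
\pi(\tau \mid y) &\propto \tau^{a_n-1}e^{-b_n\tau}
  \;\Longrightarrow\; \tau \mid y \sim \mathrm{Gamma}(a_n,\ \text{rate}=b_n).
\end{align*}
```

Under the further assumption that Jeffreys’ prior is written as $\pi(\beta,\tau) \propto \tau^{-1}$, the same algebra applies formally with $V_0^{-1} = 0$, $a_0 = -k/2$ and $b_0 = 0$, which gives $n - k$ degrees of freedom and a posterior location equal to the least-squares estimate $(X^{\top}X)^{-1}X^{\top}y$.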
An important property of the multivariate t-distribution is that each single component of a vector following this distribution has a univariate t-distribution, and any subvector of this vector has a multivariate t-distribution [30]. Accordingly, using Theorem 1 and Corollaries 1 and 2, the marginal posterior of each of the non-seasonal and seasonal coefficient vectors of the TSAR model is a multivariate t-distribution, and the marginal posterior of any single component of each of them is a univariate t-distribution.
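The marginalization property quoted above can be written out as follows. This is a sketch of the standard result (see, e.g., [30]); the symbols $\nu$, $\mu$ and $\Sigma$, and the index set A of size q, are notation assumed here for illustration. The F form for a subvector is the usual way a joint hypothesis on several coefficients is evaluated, and it is stated as an assumption about how such tests could be computed rather than as the authors' implementation.

```latex
% Assumed notation: beta ~ t_k(nu, mu, Sigma), with Sigma the dispersion (scale) matrix.
\begin{align*}
\beta_i &\sim t\big(\nu,\ \mu_i,\ \Sigma_{ii}\big)
  \quad\Longrightarrow\quad
  \frac{\beta_i-\mu_i}{\sqrt{\Sigma_{ii}}} \sim t_{\nu}, \\
\beta_A &\sim t_q\big(\nu,\ \mu_A,\ \Sigma_{AA}\big)
  \quad\Longrightarrow\quad
  \frac{1}{q}\,(\beta_A-\mu_A)^{\top}\Sigma_{AA}^{-1}(\beta_A-\mu_A) \sim F_{q,\nu}.
\end{align*}
```

For a single coefficient, checking whether zero is a plausible value therefore amounts to comparing $\mu_i/\sqrt{\Sigma_{ii}}$ with the quantiles of a t-distribution with $\nu$ degrees of freedom.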
Based on the derived marginal posterior of the TSAR model coefficients, we propose a Bayesian identification procedure to identify the best value of the TSAR model order by first assuming that the maximum values of the TSAR model order p, , and are known and given as , , and , respectively. We then consider the following testing scheme as follows:
- Test:using the marginal posterior of the coefficient , which is a univariate t-distribution.
- If is rejected, then the identified value for is . Otherwise, test:using the marginal posterior of the coefficients , which is a multivariate t-distribution.
- Continue executing this sequence of t-test of significance until the null hypothesis is rejected, and then the identified value for is , where .
- Test:using the marginal posterior of the coefficients , which is a multivariate t-distribution.
- If is rejected, then the identified value for is . Otherwise, test:using the marginal posterior of the coefficients , which is a multivariate t-distribution.
- Continue executing this sequence of t-test of significance until the null hypothesis is rejected, and then the identified value for is , where .
- In the same way, run sequences of t-test of significance for the TSAR model coefficients vectors and until the null hypotheses are rejected, and then the identified values for p and are and , respectively.
The outcome of this testing scheme is the values , , and as the best value for the TSAR model order p, , and , respectively.
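The general idea of a backward sequence of significance tests can be illustrated with a deliberately simplified sketch. It is not the testing scheme above: it handles only a plain non-seasonal AR model, assumes a Jeffreys-type prior proportional to $\tau^{-1}$ (so that the posterior location is the least-squares estimate), refits the model at each candidate order, uses only univariate tests on the last coefficient, and replaces t quantiles by a normal approximation, which is reasonable only when the degrees of freedom are large. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def last_coef_test(y, order):
    """Standardized posterior location and approximate p-value for the last AR coefficient."""
    rows = [[y[t - i] for i in range(1, order + 1)] for t in range(order, len(y))]
    resp = y[order:]
    k, n = order, len(resp)
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * v for r, v in zip(rows, resp)) for a in range(k)]
    beta = solve(xtx, xty)  # posterior location under the assumed prior
    rss = sum((v - sum(c * x for c, x in zip(beta, r))) ** 2
              for r, v in zip(rows, resp))
    inv_last = solve(xtx, [0.0] * (k - 1) + [1.0])[-1]  # last diagonal entry of (X'X)^{-1}
    stat = beta[-1] / math.sqrt(rss / (n - k) * inv_last)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(stat)))  # normal approximation to t_{n-k}
    return stat, p_value


def identify_ar_order(y, max_order, alpha=0.05):
    """Start at max_order and step down until the last coefficient is significant."""
    for order in range(max_order, 0, -1):
        _, p_value = last_coef_test(y, order)
        if p_value < alpha:
            return order
    return 0


random.seed(1)
y = [0.0, 0.0]
for _ in range(2000):  # simulate an AR(2) series with illustrative coefficients
    y.append(0.5 * y[-1] - 0.5 * y[-2] + random.gauss(0.0, 1.0))
print(identify_ar_order(y, max_order=2))
```

With a strongly non-zero second coefficient and 2000 observations, the first test already rejects, so the sketch stops at order 2. In a TSAR setting the same stepping-down logic would run once per operator, using the relevant marginal posterior at each step.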
An important point to mention here is the computational challenge of applying the proposed Bayesian identification procedure. Capturing triple seasonality requires observing the time series under study for a long time, which results in a very large sample size that can be considered big time series data. For example, observing the hourly electricity load for four years results in a time series of about 35,000 observations. Analyzing big time series data has a high computational cost, especially when the analysis is conducted using interpreted software.
4. Simulation Study and Real Application
In this section, we evaluate the accuracy of the proposed Bayesian identification for TSAR models by conducting an extensive simulation study, and then we show the applicability of the proposed Bayesian identification procedure on real time series datasets of electricity load in six European countries that have three layers of seasonality.
4.1. Simulation Study
In this simulation study, we evaluate the accuracy of the proposed Bayesian identification procedure for TSAR models by covering different seasonality patterns with different data types and sample sizes. We generate 500 time series datasets of size n (from 3000 to 6000 with an increment of 1000 observations) from five TSAR models, and the design of these TSAR models and the true parameter values are presented in Table 1. As can be seen from this table, the seasonal periods for the generated time series are $s_1$ = 12, $s_2$ = 60 and $s_3$ = 600, and since the time series size has to be greater than , our selection of 3000 as the minimum value of the time series size n is justified.
Table 1.
Simulation design.
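A minimal sketch of how one series could be generated from a multiplicative TSAR model is given below. It is not the authors' simulation code: the coefficient values are illustrative placeholders rather than the values of Table 1, the function names are hypothetical, and zero start-up values with a burn-in period are an assumed way of removing the effect of initial conditions. Only the seasonal periods 12, 60 and 600 are taken from the text.

```python
import random


def poly_mul(a, b):
    """Multiply two polynomials in B stored as {power: coefficient} dicts."""
    out = {}
    for pa, ca in a.items():
        for pb, cb in b.items():
            out[pa + pb] = out.get(pa + pb, 0.0) + ca * cb
    return out


def ar_operator(coefs, period=1):
    """Return 1 - c_1 B^period - c_2 B^(2*period) - ... as a {power: coefficient} dict."""
    op = {0: 1.0}
    for j, c in enumerate(coefs, start=1):
        op[j * period] = -c
    return op


def simulate_tsar(n, phi, seas1, seas2, seas3, periods,
                  sigma=1.0, burn=2000, seed=0):
    """Simulate n observations; at least one coefficient list must be non-empty."""
    s1, s2, s3 = periods
    full = ar_operator(phi)
    for coefs, s in ((seas1, s1), (seas2, s2), (seas3, s3)):
        full = poly_mul(full, ar_operator(coefs, s))
    # Move every lagged term to the right-hand side: y_t = sum_lag c_lag * y_{t-lag} + e_t
    lags = {lag: -c for lag, c in full.items() if lag > 0}
    max_lag = max(lags)
    rng = random.Random(seed)
    y = [0.0] * max_lag  # zero start-up values, discarded with the burn-in
    for _ in range(n + burn):
        e = rng.gauss(0.0, sigma)
        y.append(sum(c * y[-lag] for lag, c in lags.items()) + e)
    return y[-n:]


y = simulate_tsar(n=3000, phi=[0.4], seas1=[0.3], seas2=[0.2], seas3=[0.1],
                  periods=(12, 60, 600), seed=42)
print(len(y))  # 3000
```

With all four orders equal to one, the largest lag in the expanded model is 1 + 12 + 60 + 600 = 673, and with all orders equal to two it is 1346, which illustrates why the series must be considerably longer than the longest seasonal period.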
After obtaining the 500 time series datasets from each one of these five TSAR models with different sizes, we apply the proposed Bayesian identification procedure to identify the best value for the TSAR model order as follows.
- First, we assume that the maximum values of the model order are , since the maximum order value in the simulated TSAR models is not more than two.
- Second, we employ different priors for the model parameters and in order to evaluate the robustness or sensitivity to prior specification. In particular, we employ Jeffreys’ prior and g prior with five values for the hyper-parameter g, i.e., , , , and .
- Third, for each time series, we execute the proposed testing scheme in Section 3 using the marginal posterior of the TSAR model coefficients resulting from the employed priors, and the outcome of this testing scheme is the identified TSAR model order, i.e., , , and .
For all the generated time series datasets, we obtain the percentage of correctly identified true TSAR models by simply comparing the identified TSAR order obtained from the testing scheme with the true order used to generate the given time series.
In order to explain how the proposed Bayesian identification procedure works to identify the best value of the TSAR model order, we present the results of the testing scheme for only one generated time series with from TSAR Model I in Table 2. In this table, we present, for each test, the null hypothesis, the p-value calculated from the t-distribution and whether the null hypothesis is rejected at the 5% significance level. It is worth noting that using the 5% significance level for each test does not guarantee that the overall significance level of the testing scheme equals 0.05, and indeed it is difficult to determine the overall significance level; for more discussion about this point, see, for example, Lütkepohl [31]. The results of the simulation study for all TSAR models are presented in Table 3.
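The remark on the overall significance level can be illustrated with a small calculation. This is only a sketch under stated assumptions: the number of tests m is a hypothetical input, the first figure assumes independent tests (which the tests in the scheme are not, since they share one posterior), and the second is the Bonferroni upper bound, which holds regardless of dependence.

```python
def familywise_error(alpha, m):
    """Return (rate if the m tests were independent, Bonferroni upper bound)."""
    independent = 1.0 - (1.0 - alpha) ** m
    bonferroni = min(1.0, m * alpha)
    return independent, bonferroni


indep, bound = familywise_error(alpha=0.05, m=4)
print(round(indep, 4), round(bound, 4))  # 0.1855 0.2
```

Even with only four tests at the 5% level, the chance of at least one false rejection can be far above 5%, which is why the per-test level should not be read as the level of the whole scheme.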
Table 2.
Testing scheme results for one generated time series from TSAR Model I.
Table 3.
Percentage of correctly identified true TSAR models.
From the simulation results in Table 3, we can draw some important conclusions. First, the proposed Bayesian identification procedure provides a high percentage of correctly identified true TSAR models, since, in most of the cases, this percentage is at least 85%. Second, the larger the time series size, the higher the percentage of correctly identified true TSAR models. Third, employing different prior distributions for the TSAR model coefficients leads to different TSAR identification results, and their impact is evident in the percentage of correctly identified true TSAR models. Fourth, employing the g prior with for the TSAR model parameters β and τ substantially improves the percentage of correctly identified true TSAR models compared to using other g values as well as Jeffreys’ prior, since, in all the cases of employing the g prior with , the percentage of correctly identified true TSAR models is at least 92.2%.
In order to evaluate the robustness of the proposed Bayesian procedure to violations of the normality assumption, we generate the 500 time series datasets from the five TSAR models while assuming other distributions for the model errors. These error distributions include Student’s t with 15 degrees of freedom, i.e., t(15), Laplace, log-normal and skew-normal with moderate skewness, i.e., skewness = 0.75. We apply the proposed Bayesian TSAR identification procedure in the same way as discussed above, and the Bayesian identification results for Model I are presented in Table 4.
Table 4.
Percentage of correctly identified true TSAR models for Model I with different error distributions.
These results show that the proposed Bayesian TSAR identification procedure is robust to violations of the normality assumption, and the same conclusion is obtained from the results of the other four TSAR models. In general, the results of the simulation study confirm the accuracy of the proposed Bayesian procedure for TSAR model identification, especially when assuming the g prior for the TSAR model coefficients β and precision τ.
4.2. Applications on Real Hourly Electricity Load Datasets
In this section, we apply the proposed Bayesian identification procedure of TSAR models to real hourly time series datasets with three layers of seasonality. These datasets are electricity loads per hour collected over four years, from Sunday 1 January 2006 to Thursday 31 December 2009, in six European countries. These electricity load datasets exhibit three seasonality layers: intraday, intraweek and intrayear patterns. The time plots for these time series datasets of electricity load are displayed in Figure 1. For more details about these electricity load datasets, the reader is referred to Amin [26], where these datasets were first introduced and analyzed.
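As a quick check of the sample size implied by the stated date range, the following sketch counts the hourly observations between the two dates. It assumes one observation for every hour of every day, with no missing hours and no daylight-saving adjustments; the function name is hypothetical.

```python
from datetime import date


def hourly_sample_size(start, end):
    """Number of hourly observations from 00:00 on `start` to 23:00 on `end`, inclusive."""
    n_days = (end - start).days + 1
    return 24 * n_days


n = hourly_sample_size(date(2006, 1, 1), date(2009, 12, 31))
print(n)  # 35064
```

The range covers 1461 days because 2008 is a leap year, which gives 35,064 hourly observations per country under the stated assumptions.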
Figure 1.
Time plots for electricity load datasets in European countries.
It is clear from Figure 1 that the seasonality with three layers is very strong in these hourly electricity load time series; accordingly, we set the maximum order values of the TSAR model as , and we apply our proposed Bayesian identification procedure for the TSAR model with the same design of the simulation study using Jeffreys’ prior and the g prior with . We present the results of the identified TSAR model for each one of these electricity load datasets in Table 5.
Table 5.
Identified TSAR models for electricity load datasets.
We can observe from the results of this real application in Table 5 that the proposed identification procedure identifies almost the same TSAR model for the electricity load datasets that have the same stochastic behavior, as displayed in Figure 1. In addition, the identified TSAR models for these electricity load datasets are very similar for both Jeffreys’ and g priors.
5. Conclusions
In this paper, we introduced a Bayesian identification procedure for TSAR models, which are used to model time series with triple seasonality. We first employed three prior distributions, i.e., Jeffreys’, g and normal-gamma priors, on the TSAR model coefficients and precision, and we assumed that the TSAR model errors are independent and identically normally distributed. We then derived the marginal posteriors of the TSAR model coefficients vector and precision to be the multivariate t and gamma distributions, respectively. Using the derived marginal posterior of the TSAR model coefficients, we proposed an identification procedure for determining the best order of TSAR models based on a sequence of t-tests of significance. We conducted an extensive simulation study, whose results confirmed the accuracy of the proposed Bayesian TSAR identification, and we applied our proposed procedure to real hourly electricity load datasets in six European countries. Since the current work considers only the autoregressive component, future work may include the moving average component in the time series model [8] and may also include an extension to multivariate time series models [32].
Author Contributions
Conceptualization, A.A.A.; Methodology, A.A.A.; Software, A.A.A.; Writing—original draft, A.A.A.; Writing—review & editing, S.A.A.; Project administration, S.A.A.; Funding acquisition, S.A.A. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Data Availability Statement
All the datasets used in this paper are available from the authors upon request.
Conflicts of Interest
The authors declare no conflict of interest.
References
1. Taylor, J.W. Triple seasonal methods for short-term electricity demand forecasting. Eur. J. Oper. Res. 2010, 204, 139–152.
2. Taylor, J.W. A comparison of univariate time series methods for forecasting intraday arrivals at a call center. Manag. Sci. 2008, 54, 253–265.
3. Taylor, J.W. An evaluation of methods for very short-term load forecasting using minute-by-minute British data. Int. J. Forecast. 2008, 24, 645–658.
4. Taylor, J.W. Exponentially weighted methods for forecasting intraday time series with multiple seasonal cycles. Int. J. Forecast. 2010, 26, 627–646.
5. Taylor, J.W.; Snyder, R.D. Forecasting intraday time series with multiple seasonal cycles using parsimonious seasonal exponential smoothing. Omega 2012, 40, 748–757.
6. De Livera, A.M.; Hyndman, R.J.; Snyder, R.D. Forecasting time series with complex seasonal patterns using exponential smoothing. J. Am. Stat. Assoc. 2011, 106, 1513–1527.
7. Sulandari, W.; Suhartono, S.; Rodrigues, P.C. Exponential Smoothing on Modeling and Forecasting Multiple Seasonal Time Series: An Overview. Fluct. Noise Lett. 2021, 20, 2130003.
8. Shaarawy, S.; Broemeling, L. Bayesian inferences and forecasts with moving averages processes. Commun. Stat.-Theory Methods 1984, 13, 1871–1888.
9. Shaarawy, S.; Ismail, M. Bayesian inference for seasonal ARMA models. Egypt. Stat. J. 1987, 31, 323–336.
10. Shaarawy, S.M.; Ali, S.S. Bayesian identification of seasonal autoregressive models. Commun. Stat.-Theory Methods 2003, 32, 1067–1084.
11. Barnett, G.; Kohn, R.; Sheather, S. Robust Bayesian Estimation Of Autoregressive–Moving-Average Models. J. Time Ser. Anal. 1997, 18, 11–28.
12. Vermaak, J.; Niranjan, M.; Godsill, S.J. Markov Chain Monte Carlo Estimation for the Seasonal Autoregressive Process with Application to Pitch Modelling; Technical Report; Department of Engineering, University of Cambridge: Cambridge, UK, 1998.
13. Amin, A.A. Bayesian Inference of Triple Seasonal Autoregressive Models. Pak. J. Stat. Oper. Res. 2022, 18, 853–865.
14. Amin, A.A.; Emam, W.; Tashkandy, Y.; Chesneau, C. Bayesian Subset Selection of Seasonal Autoregressive Models. Mathematics 2023, 11, 2878.
15. Barnett, G.; Kohn, R.; Sheather, S. Bayesian estimation of an autoregressive model using Markov chain Monte Carlo. J. Econom. 1996, 74, 237–254.
16. Ismail, M.A. Bayesian Analysis of Seasonal Autoregressive Models. J. Appl. Stat. Sci. 2003, 12, 123–136.
17. Ismail, M.A. Bayesian Analysis of the Seasonal Moving Average Model: A Gibbs Sampling Approach. Jpn. J. Appl. Stat. 2003, 32, 61–75.
18. Ismail, M.A.; Amin, A.A. Gibbs sampling for SARMA models. Pak. J. Stat. 2014, 30, 153–168.
19. Amin, A. Gibbs Sampling for Bayesian Prediction of SARMA Processes. Pak. J. Stat. Oper. Res. 2019, 15, 397–418.
20. Ismail, M.A.; Zahran, A.R. Bayesian inference on double seasonal autoregressive models. J. Appl. Stat. Sci. 2013, 21, 13.
21. Amin, A. Gibbs sampling for double seasonal ARMA models. In Proceedings of the 29th Annual International Conference on Statistics and Computer Modeling in Human and Social Sciences, Cairo, Egypt, 28–30 March 2017.
22. Amin, A.A. Bayesian inference for double SARMA models. Commun. Stat.-Theory Methods 2018, 47, 5333–5345.
23. Amin, A.A. Bayesian identification of double seasonal autoregressive time series models. Commun. Stat.-Simul. Comput. 2019, 48, 2501–2511.
24. Amin, A.A. Bayesian analysis of double seasonal autoregressive models. Sankhya B 2020, 82, 328–352.
25. Amin, A.A. Full Bayesian analysis of double seasonal autoregressive models with real applications. J. Appl. Stat. 2023, 1–21.
26. Amin, A.A. Gibbs sampling for Bayesian estimation of triple seasonal autoregressive models. Commun. Stat.-Theory Methods 2023, 52, 7303–7322.
27. Broemeling, L.D. Bayesian Analysis of Linear Models; CRC Press: Boca Raton, FL, USA, 1985.
28. Broemeling, L.D. Bayesian Analysis of Time Series; CRC Press: Boca Raton, FL, USA, 2019.
29. Fernandez, C.; Ley, E.; Steel, M.F. Benchmark priors for Bayesian model averaging. J. Econom. 2001, 100, 381–427.
30. Box, G.E.; Tiao, G. Bayesian Inference in Statistical Analysis; John Wiley & Sons: Hoboken, NJ, USA, 1973.
31. Lütkepohl, H. New Introduction to Multiple Time Series Analysis; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2005.
32. Reinsel, G.C. Elements of Multivariate Time Series Analysis; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2003.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).