A set of state space models at a high disaggregation level to forecast Italian Industrial Production

Econometric models that forecast the Italian Industrial Production Index normally do not exploit the information already available at time t+1 for its main industry groupings. A new strategy is sketched here that uses state space models and aggregates their estimates to obtain improved results. The variables available at time t+1 are consumption of electricity, compressed natural gas distributed on its own network, production of compressed natural gas, compressed natural gas for thermoelectric use, and registrations of commercial vehicles for Italy, Germany, France and Spain. Unfortunately, for the other main industry groupings no variables are available that are not prone to large revisions. The issue arising from holidays that fall on a Tuesday or a Thursday is tackled. How to handle in-sample forecasts with different aggregation weights is considered for the period before 1 January 2010, for which it is impossible to use the weight structure of the base year 2010.


Introduction
Forecasting Industrial Production can be a very difficult task, but forecasting its subcomponents at a high level of disaggregation can be even more challenging. This happens because little information is available at the disaggregated level and, unless such data show strong comovements with the subcomponents, the risk is to worsen the forecasts by simply exploiting the past. I analyzed raw (gross) data on the Italian Industrial Production Index at a higher level of disaggregation, exploiting the high correlation with other time series when they are available. On average I did not find a high forecasting performance when poor or no information is available, consistent with the existing literature, but I found larger improvements for those disaggregated components that are more strongly correlated with other variables, both over their past and at the one-step-ahead prediction. To the best of my knowledge this is the only study that compares the short-term forecasting performance of a naive autoregressive model applied to the Italian Industrial Production Index with that of a set of state space models applied to its main industry groupings. The effort is motivated by the need to understand the evolution of the Italian economy in the short term: the estimates must be reviewed continuously in order to monitor the evolution of the quarterly estimates of Gross Domestic Product. There is a trade-off between quickly providing an Industrial Production forecast closely related to the future evolution of Gross Domestic Product and doing so while exploiting, as much as possible, all the information available with the most suitable econometric tools. Brunhes-Lesage and Darné (2012), analyzing the French Industrial Production Index, also point out that: "The IIP, however, is characterized by a significant publication delay, around 40 days after the end of the reference month for the main European countries, and the first IIP estimations are often revised significantly. Thus, it is less useful for short-term forecasting exercises." Italian data are not so prone to large revisions, but they are characterized by the same delay (40 days).
The aim of this paper is, therefore, to propose several models designed to forecast the current-month Italian Industrial Production Index (nowcast), using the data listed in table 1, and to forecast the index for the two months following the nowcasted month. The remainder of the paper is organized as follows. Section 2 describes the data used and introduces the models. Section 3 describes the issues related to long weekends. Section 4 outlines some issues related to backcasting some series and to constructing the weights where it is impossible to use the structure of the base year 2010. Section 5 concerns the model used to estimate simultaneously the IPI for intermediate and capital goods. Section 6 describes the models for durable goods and non durable goods. Section 7 describes the models for the disaggregation of energy. Section 8 is about the forecasting performance. Section 9 concludes.

A brief description of the models used for the disaggregated components
Figure 3 plots the levels of the Industrial Production Index for intermediate goods and for capital goods, together with the sum of registrations of light commercial vehicles over Italy, Spain, France and Germany. The last variable is always available at time t+1 and is used, together with the two aforementioned endogenous variables, in a vector autoregressive moving average model specified in first differences of the first log seasonal differences. I am assuming that these goods are transported in these countries every month and that a relationship persists among their seasonal growth rates. A slightly different strategy is adopted for the industrial production index of durable goods, without the log transformation: a bivariate system of seemingly unrelated time series equations (see the example about car drivers' accidents in Durbin and Koopman (2012)) is composed of the total sum of light commercial vehicles and the industrial production index of durable goods. In contrast, no information is available for the industrial production index of non durable goods, so here a popular benchmark is used, namely the autoregressive model of order three on the seasonal differences with no logarithmic transformation. On the other hand, a good deal of information concerning production of electricity and other sub-components is ready at the time of publication on the ISTAT website. The daily data on consumption of electricity can be cumulated to obtain a preliminary estimate of the monthly figure, which is released roughly fifteen to twenty days later. Data on compressed natural gas are less prone to revision and, when cumulated to obtain their monthly value, do not differ much from their preliminary value.
More in detail, to obtain an estimate of the main industry grouping Industrial Production Index of Electricity, gas, steam and air conditioning supply we have to consider five sub-components:
• Extraction of crude petroleum and natural gas;
• Manufacture of coke and refined petroleum products;
• Electric power generation, transmission and distribution;
• Manufacture of gas; distribution of gaseous fuels through mains;
[...] (2012)).
For t+2 roughly 10 daily observations are always available and are appended to the daily time series on production of compressed natural gas. Manufacture of coke and refined petroleum products is estimated with the automatic autoregressive moving average identification procedure described in Gómez and Maravall (2000), applied to ISTAT's monthly data. Electric power generation, transmission and distribution is somewhat more complicated because two variables are available at time t+1: consumption of electricity and compressed natural gas for thermoelectric use. A univariate model on the seasonal differences, with no logarithmic transformation, is set up with cumulated daily consumption of electricity and compressed natural gas for thermoelectric use as exogenous variables; for period t+1 these two variables are then used to forecast electric power generation, transmission and distribution. For t+2 and t+3, a naive autoregressive model of order three on the first seasonal weekly differences, applied to daily data on consumption of electricity, gas for thermoelectric use, national natural gas production and transportation, gives more reliable estimates for production of electricity, extraction of petroleum and natural gas and, finally, distribution of gaseous fuels through mains.
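The cumulation of daily observations into a preliminary monthly figure can be sketched as follows. This is a minimal illustration, not the procedure actually used in the paper: the function name `cumulate_monthly` and the sample values are hypothetical, and the data are assumed to be held in a plain dictionary keyed by date.

```python
from collections import defaultdict
from datetime import date


def cumulate_monthly(daily):
    """Sum daily observations into (year, month) totals.

    `daily` maps a datetime.date to that day's value, e.g. the daily
    consumption of electricity (hypothetical input layout).
    """
    monthly = defaultdict(float)
    for day, value in daily.items():
        monthly[(day.year, day.month)] += value
    return dict(monthly)


# Hypothetical daily figures, for illustration only.
sample = {
    date(2017, 1, 30): 10.0,
    date(2017, 1, 31): 12.0,
    date(2017, 2, 1): 11.0,
}
totals = cumulate_monthly(sample)
```

A month for which only the first ten or so days are available simply yields a partial total, which is why the text treats the t+2 observations separately.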

Long weekends
In some months there are fixed holidays that fall on Tuesdays or Thursdays. A typical example is the Immaculate Conception, on 8 December of every year. In 2015, for instance, many workers took Monday 7 December off, and in this way obtained four consecutive days (Saturday 5 December, Sunday 6 December, Monday 7 December and finally the day of the Immaculate Conception). The raw data of the Industrial Production Index are affected by the resulting slowdown on Monday the 7th. Nevertheless, ISTAT does not account for the people who do not work on that day: its calendar and seasonal adjustment is strict. Yet a calendar-adjusted time series is usually more predictable than a raw one, and a time series not affected by strikes or unexpected holidays can be forecast better than a raw one. To tackle this problem I propose to trace on the calendar, from 1 January 2001, the following days:
• Epiphany, 6 January;
• Italian Republic Day, 2 June;
• Immaculate Conception, 8 December;
• International Workers' Day, 1 May.
To trace the long weekends I propose the following strategy:
• for Epiphany, Italian Republic Day and Immaculate Conception, find the years in which they fall on a Tuesday or a Thursday;
• create a binary dummy variable that informs the model of a discrepancy between what it expects and what happens in the real world, and place it on the Monday or the Friday in the daily data;
• for 1 May, let the computer decide whether the month of April is affected when Labour Day falls on a Tuesday or a Thursday;
• finally, compute the dummy variables for the months and years detected by the procedure above.
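The tracing strategy above can be sketched in a few lines. This is only an illustration under stated assumptions: the names `FIXED_HOLIDAYS`, `bridge_day`, `long_weekend_days` and `monthly_dummy` are hypothetical, the bridge day is assumed to be the Monday before a Tuesday holiday or the Friday after a Thursday holiday, and the separate treatment of 1 May is left out.

```python
from datetime import date, timedelta

# Fixed-date holidays named in the text, as (month, day).
# 1 May is deliberately omitted: the text handles it separately.
FIXED_HOLIDAYS = {
    "Epiphany": (1, 6),
    "Republic Day": (6, 2),
    "Immaculate Conception": (12, 8),
}


def bridge_day(holiday):
    """Return the assumed bridge day for a holiday, or None.

    Tuesday holiday  -> the preceding Monday.
    Thursday holiday -> the following Friday.
    """
    if holiday.weekday() == 1:  # Tuesday (Monday is 0)
        return holiday - timedelta(days=1)
    if holiday.weekday() == 3:  # Thursday
        return holiday + timedelta(days=1)
    return None


def long_weekend_days(first_year, last_year):
    """Sorted list of bridge days between two years, inclusive."""
    days = []
    for year in range(first_year, last_year + 1):
        for month, day in FIXED_HOLIDAYS.values():
            candidate = bridge_day(date(year, month, day))
            if candidate is not None:
                days.append(candidate)
    return sorted(days)


def monthly_dummy(bridge_days):
    """Map (year, month) -> 1 for every month containing a bridge day."""
    return {(d.year, d.month): 1 for d in bridge_days}


daily_dummies = long_weekend_days(2001, 2017)
monthly_dummies = monthly_dummy(daily_dummies)
```

With these assumptions, Monday 7 December 2015 from the example in the text is flagged as a bridge day, and December 2015 receives a monthly dummy equal to one.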

Some reflections on the period from January 2006 up to December 2009
The period from January 2001 up to December 2009 deserves ad hoc treatment. Because I want to compare the performance of the disaggregated composite model with that of a naive benchmark (i.e. an autoregressive model of order three on the log seasonal differences of the whole index), I have to tackle two issues:
• backcasting daily data on consumption of electricity from 2006 up to 2013, exploiting daily data on compressed natural gas for thermoelectric use;
• since the weights of the subcomponents are not fixed in this period as they are from January 2010 onwards (i.e. they were rebuilt in terms of growth rates at the expense of the fixed-weight base-year structure), it is necessary to compute the discrepancy between the published data and the data obtained by holding the base-year (i.e. 2010) weight structure constant.
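The second issue can be illustrated with a small sketch. It assumes, as a simplification not stated in the text, that the fixed-weight aggregate is a weighted average of the sub-component indices; the function names, component labels and numbers are hypothetical and are not ISTAT weights.

```python
def fixed_weight_index(components, weights):
    """Weighted average of sub-component indices for one month.

    `components` maps a component name to its index level;
    `weights` maps the same names to base-year weights.
    """
    total = sum(weights.values())
    return sum(weights[name] * components[name] for name in weights) / total


def discrepancies(published, components_by_month, weights):
    """Published index minus the fixed-weight aggregate, month by month."""
    return {
        month: published[month]
        - fixed_weight_index(components_by_month[month], weights)
        for month in published
    }


# Hypothetical figures, for illustration only.
weights_2010 = {"a": 0.25, "b": 0.75}
components_by_month = {"2009-12": {"a": 100.0, "b": 120.0}}
published = {"2009-12": 116.0}
gap = discrepancies(published, components_by_month, weights_2010)
```

The resulting series of gaps is the quantity that has to be tracked when the in-sample forecasts of the sub-components are aggregated with the 2010 weights.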

Conditional Vector Autoregressive Moving Average Model for Intermediate Goods, Capital Goods, Total Sum of Light Commercial Vehicles Registration over Italy, Spain, France and Germany
Bruno and Lupi (2004) use an unrestricted vector autoregressive model in which $\Delta = (1 - L)$, $\Delta_{12} = (1 - L^{12})$, $L$ is the usual lag operator such that $L^p z_t = z_{t-p}$, and $y_t = (\log(\mathit{IIPI}_t), \log(\mathit{TONFS}_t), \mathit{PP}_t)$. Here $\mathit{PP}_t$ represents production prospects from ISTAT surveys and $\mathit{TONFS}_t$ stands for tons of raw material transported by Italian railways for intermediate goods. Bruno and Lupi (2004) apply a logistic transformation to $\mathit{PP}_t$, while $d_t$ contains some deterministic components (constant, specific impulse dummies). The endogenous variables available at the forecast time t+1 are $\mathit{TONFS}_t$ and $\mathit{PP}_t$. For the following periods the forecast is unconditional, since there is no further information. Since intermediate goods and capital goods together account for 61 percent of the whole index and are transported on light commercial vehicles, I take the sum of the registrations of such vehicles over Italy, France, Spain and Germany. The new model is therefore specified with $\Delta$, $\Delta_{12}$ and $L$ defined as above and with $y_t = (\log(\mathit{IPIINTERMEDIATE}_t), \log(\mathit{IPICAPITAL}_t), \log(\mathit{SUMLCV}_t))$, where $\mathit{SUMLCV}_t$ represents the total level of registrations of light commercial vehicles over the four European countries mentioned, while $d_t$ contains the long weekend dummies, the growth rates of the number of days worked over the four countries and the growth rates of the number of days worked in Italy, computed as first differences of the log year-on-year differences.
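The transformation applied to each series before estimation, the first difference of the log seasonal difference, can be sketched as below. The helper names `diff` and `delta_delta12_log` are hypothetical, and the sketch covers only the data transformation, not the estimation of the vector model.

```python
import math


def diff(x, lag=1):
    """Return x[t] - x[t - lag] for t = lag, ..., len(x) - 1."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]


def delta_delta12_log(x):
    """Apply (1 - L)(1 - L^12) to the logarithm of a monthly series.

    The result is 13 observations shorter than the input.
    """
    logs = [math.log(value) for value in x]
    return diff(diff(logs, lag=12), lag=1)


# Hypothetical monthly index growing at a constant 1% per month:
# its seasonal log difference is constant, so the result is ~0.
series = [100.0 * 1.01 ** t for t in range(36)]
transformed = delta_delta12_log(series)
```

For a series with a constant growth rate the transformed values are numerically zero, which is a quick check that both differences are applied in the intended order.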

Seemingly Unrelated Time Series Equations for Durable Goods, seasonal differences autoregressive model for non durable goods
For durable goods I exploit a system of equations similar to the one mentioned above (see Durbin and Koopman (2012)), without the logarithmic transformation. More in detail, here $y_t = (\mathit{IPIDURABLEGOODS}_t, \mathit{SUMLCV}_t)$. In contrast, at the time the Italian Industrial Production Index is published on the ISTAT website, no information is available for the index of non-durable consumer goods. I therefore use a popular autoregressive model without the logarithmic transformation,
where $y_t = \mathit{IPINONDURABLEGOODS}_t$ and $d_t$ contains the same dummies and the same working-day variables mentioned in equation 3.

Which models for energy's disaggregation?
Here the picture is somewhat complicated by the fact that information is available on both a monthly and a daily basis. More in detail, the daily observations on consumption of electricity downloadable from the Terna website (www.terna.it) start from 1 January 2013. For the past only chunks of observations are available. It is therefore necessary to backcast this time series using daily data on cubic metres of compressed natural gas for thermoelectric use, available from the Snam website (www.snam.it). In the past most electricity was generated from compressed natural gas, so it seemed reasonable to exploit this series, which has no missing values, as an exogenous regressor in the benchmark model, with weekly frequencies in place of monthly frequencies, an ad hoc set of daily dummies (see the appendix for further details) and no logarithmic transformation, using data from 2013 up to now. Once the model is identified and cast in the Kalman filter equations, the chunks available from the Terna website are inserted into the equations of the Kalman filter smoother to backcast the whole time series of daily consumption from January 2006 up to now. Such data can finally be cumulated on a monthly basis and compared with the published monthly data from 2006 up to now. For the sake of simplicity the discrepancies are not reported here, since they are negligible. At this point three endogenous variables are available:
• the index of production of electricity, until time t;
• consumption of electricity, until time t+1;
• cubic metres of compressed natural gas for thermoelectric use, until time t+1.
Since at the time of publication on the ISTAT website roughly ten daily observations for t+2 are already available, corresponding to roughly one third of a month, the benchmark model on daily data mentioned above is used to compute t+2 and t+3. Again, an equation similar to equation 3 is used to estimate the index of production of electricity at times t+1, t+2 and t+3.
The endogenous variables available for the forecast window are consumption of electricity, cumulated from daily data, and volumes of natural gas for thermoelectric use. The same strategy is applied, with only two variables, to the index correlated with the transportation of cubic metres of compressed natural gas. Extraction of crude petroleum and natural gas requires a more flexible model, so a system of seemingly unrelated equations is used. See Gómez (2015) for all the details about the implementation. Manufacture of coke and refined petroleum products suffers from insufficient information, somewhat compensated by its relatively low weight (only 11 per cent of the main industry grouping index, see fig. 2). Only for this last component is an automated model identification procedure used (see Gómez and Maravall (2000)).
Performance of the index compared to a naive benchmark over three steps
Table 2 summarizes the performance of the models applied to the disaggregated components shown in figure 1. An expanding window starting from January 2001 and moving from January 2008 up to November 2017 is considered. The mean absolute forecast error (https://en.wikipedia.org/wiki/Mean_absolute_error) is computed over the three-step-ahead horizon using the first logarithmic seasonal differences. The column MAE-BEN shows the results for the autoregressive model of order three on the seasonal logarithmic differences.
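The evaluation scheme can be sketched as follows. This is a minimal illustration, not the code behind the reported tables: `expanding_window_errors`, `mae_by_horizon` and `last_value_forecaster` are hypothetical names, and the last-value forecaster is only a stand-in for the autoregressive and state space models discussed in the paper.

```python
def expanding_window_errors(series, first_origin, forecaster, steps=3):
    """Forecast errors from an expanding estimation window.

    At each origin the forecaster sees series[:origin] and returns
    `steps` forecasts, which are compared with the realised values.
    """
    errors = []
    for origin in range(first_origin, len(series) - steps + 1):
        history = series[:origin]
        forecasts = forecaster(history, steps)
        actual = series[origin:origin + steps]
        errors.append([f - a for f, a in zip(forecasts, actual)])
    return errors


def mae_by_horizon(errors):
    """Mean absolute error for each forecast step."""
    steps = len(errors[0])
    return [sum(abs(e[h]) for e in errors) / len(errors) for h in range(steps)]


def last_value_forecaster(history, steps):
    """Stand-in model: repeat the last observed value."""
    return [history[-1]] * steps


# Hypothetical series, for illustration only.
toy = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
toy_mae = mae_by_horizon(expanding_window_errors(toy, 2, last_value_forecaster))
```

Replacing `last_value_forecaster` with two competing models and comparing the resulting lists step by step mirrors the comparison between the MAE-BEN and MAE-DIS columns.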

[Tables: Index | Step 1 | Step 2 | Step 3]

The column MAE-DIS, by contrast, shows the performance of the aforementioned models. The disaggregated model outperforms the benchmark at one step. The benchmark, however, beats the disaggregated model at the second and third steps (see the line IPI-WHOLE). Disaggregating while simply using the past does not help. The line IPI-BENAGG shows that aggregating the benchmarks applied to the subcomponents worsens the situation. This is confirmed by the results shown in table 3. Simply using ten days of data gives better results not only at step 1 but also at step 2 (see the column MAE-DIS for the row IPI-BENAGG). These observations suggest that the autoregressive model with no logarithmic transformation and with the long weekend dummies may be used for step 2 and step 3 (see the last line of table 4).

Conclusions
The aim of this paper is to understand whether it is worthwhile to exploit the information already available at the highest disaggregation level at time t+1 and at time t+2. Of course, when the forecasting errors propagate over time, even a naive autoregressive model performs better than a more disaggregated one. Yet with a hybrid approach, i.e. conditioning a model that exploits the persistence of long weekends over time on the aggregated one-step forecast, the results improve over the three-step horizon.
Daily data from January 2013 onwards can be downloaded from www.terna.it and from www.snam.it. While daily data on production of natural gas for thermoelectric use are available from 2006 up to 2012, only chunks are available for consumption of electricity over the same period. I assume that in the remote past renewable sources of energy (such as wind, solar power, hydroelectricity and geothermal sources) were less relevant.

About this document
$$\Delta_7 \log(C_t) = \mathrm{const}_t + \phi_1 \Delta_7 \log(C_{t-1}) + \phi_2 \Delta_7 \log(C_{t-2}) + \phi_3 \Delta_7 \log(C_{t-3}) + \beta_1 \Delta_7 \log(\mathit{CH4}_t) + \gamma D_t + \varepsilon_t$$
Equation 5 is cast in state space form to backcast daily consumption data in the remote past. The equation is a univariate autoregressive model at weekly frequency with a set of exogenous regressors composed of the weekly seasonal log differences of the daily data on natural gas for thermoelectric use and of a fixed seasonal cycle over the year (see Gómez (2015)). More in detail, $D_t$ is obtained from the following equation, according to the Gregorian calendar with a period of 365.25:
$$s_t = a \cos(wt) + b \sin(wt), \qquad w = \frac{2\pi k}{n}$$
For a more complex example considering two fixed seasonal patterns (according to the Gregorian and the Hijri calendars) see De Livera et al. (2011). For all the explanations concerning filtering and smoothing (used here to backcast data), see chapter four of Gómez (2016).
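The two building blocks of the regression, the weekly log difference and the fixed yearly cycle, can be sketched as follows. The function names `weekly_log_diff` and `yearly_cycle` are hypothetical, the coefficients a and b are taken as given rather than estimated, and the state space estimation itself is not shown.

```python
import math


def weekly_log_diff(x):
    """Return log(x[t]) - log(x[t - 7]) for a daily series."""
    return [math.log(x[t]) - math.log(x[t - 7]) for t in range(7, len(x))]


def yearly_cycle(t, a, b, k=1, n=365.25):
    """Trigonometric term a*cos(w*t) + b*sin(w*t) with w = 2*pi*k/n."""
    w = 2.0 * math.pi * k / n
    return a * math.cos(w * t) + b * math.sin(w * t)


# Hypothetical daily series with an exact weekly pattern:
# its weekly log differences are all zero.
pattern = [50.0, 52.0, 53.0, 52.5, 51.0, 40.0, 35.0]
daily = pattern * 4
differenced = weekly_log_diff(daily)

# Yearly cycle evaluated on the first 30 days, with assumed coefficients.
cycle = [yearly_cycle(t, a=2.0, b=5.0) for t in range(30)]
```

Using n = 365.25 makes the cycle repeat once per Gregorian year on average; higher values of k would add harmonics of the same yearly pattern.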