# Data-Driven Natural Gas Spot Price Forecasting with Least Squares Regression Boosting Algorithm


## Abstract


## 1. Introduction

## 2. Preliminaries

#### 2.1. Linear Regression

#### 2.2. SVM

#### 2.3. Boosting

**Algorithm 1.** The gradient boosting algorithm.

Input: a training set ${\left\{\left({x}_{i},{y}_{i}\right)\right\}}_{i=1}^{N}$, a differentiable loss function $L\left(y,F\left(x\right)\right)$, and the number of iterations $M$.

Initialize ${F}_{0}\left(x\right)=\mathrm{arg}\,{\mathrm{min}}_{\rho}{\sum}_{i=1}^{N}L\left({y}_{i},\rho \right)$.

For $m=1$ to $M$ do:

1. Compute the pseudo-residuals ${\tilde{y}}_{i}=-{\left[\frac{\partial L\left({y}_{i},F\left({x}_{i}\right)\right)}{\partial F\left({x}_{i}\right)}\right]}_{F\left(x\right)={F}_{m-1}\left(x\right)}$, $i=1,\cdots ,N$.

2. Fit the weak learner to the pseudo-residuals: ${\alpha}_{m}=\mathrm{arg}\,{\mathrm{min}}_{\alpha ,\beta}{\sum}_{i=1}^{N}{\left[{\tilde{y}}_{i}-\beta h\left({x}_{i};\alpha \right)\right]}^{2}$.

3. Find the step size by line search: ${\rho}_{m}=\mathrm{arg}\,{\mathrm{min}}_{\rho}{\sum}_{i=1}^{N}L\left({y}_{i},{F}_{m-1}\left({x}_{i}\right)+\rho h\left({x}_{i};{\alpha}_{m}\right)\right)$.

4. Update the model: ${F}_{m}\left(x\right)={F}_{m-1}\left(x\right)+{\rho}_{m}h\left(x;{\alpha}_{m}\right)$.

End for

Output: the final regression function ${F}_{M}\left(x\right)$.
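Algorithm 2 below follows from Algorithm 1 by specializing the loss function. The check is a one-line derivative:

```latex
% Substituting the squared-error loss L(y,F) = (y - F)^2 / 2 into the
% pseudo-residual step of Algorithm 1 gives the ordinary residual:
\tilde{y}_i
  = -\left[ \frac{\partial L\left(y_i, F(x_i)\right)}{\partial F(x_i)} \right]_{F(x) = F_{m-1}(x)}
  = \left[\, y_i - F(x_i) \,\right]_{F(x) = F_{m-1}(x)}
  = y_i - F_{m-1}(x_i).
% The initialization likewise reduces to the sample mean:
% F_0(x) = \arg\min_{\rho} \sum_{i=1}^{N} (y_i - \rho)^2 / 2 = \bar{y}.
```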

**Algorithm 2.** The LSBoost algorithm.

Input: a training set ${\left\{\left({x}_{i},{y}_{i}\right)\right\}}_{i=1}^{N}$, the squared-error loss $L\left(y,F\right)={\left(y-F\right)}^{2}/2$, and the number of iterations $M$.

Initialize ${F}_{0}\left(x\right)=\overline{y}$.

For $m=1$ to $M$ do:

1. Compute the residuals ${\tilde{y}}_{i}={y}_{i}-{F}_{m-1}\left({x}_{i}\right)$, $i=1,\cdots ,N$.

2. Fit the weak learner and its multiplier jointly: $\left({\rho}_{m},{\alpha}_{m}\right)=\mathrm{arg}\,{\mathrm{min}}_{\alpha ,\rho}{\sum}_{i=1}^{N}{\left[{\tilde{y}}_{i}-\rho h\left({x}_{i};\alpha \right)\right]}^{2}$.

3. Update the model: ${F}_{m}\left(x\right)={F}_{m-1}\left(x\right)+{\rho}_{m}h\left(x;{\alpha}_{m}\right)$.

End for

Output: the final regression function ${F}_{M}\left(x\right)$.
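As a concrete illustration, the loop of Algorithm 2 can be sketched in a few lines of Python for one-dimensional inputs. The least-squares decision stump standing in for the weak learner $h(x;\alpha)$, and all function names, are our assumptions for the sketch, not the paper's implementation:

```python
from statistics import mean

def fit_stump(x, r):
    """Least-squares decision stump: pick the threshold t that minimizes
    the squared error of predicting each side of the split by its mean."""
    best, best_sse = None, float("inf")
    for t in sorted(set(x))[:-1]:
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm, rm = mean(left), mean(right)
        sse = sum((ri - lm) ** 2 for ri in left) + \
              sum((ri - rm) ** 2 for ri in right)
        if sse < best_sse:
            best, best_sse = (t, lm, rm), sse
    return best

def predict_stump(alpha, xi):
    t, lm, rm = alpha
    return lm if xi <= t else rm

def lsboost_fit(x, y, M):
    """Algorithm 2: F0 = mean(y), then M rounds of fitting the residuals."""
    f0 = mean(y)                                   # F_0(x) = y-bar
    F = [f0] * len(y)
    stumps = []
    for _ in range(M):
        resid = [yi - Fi for yi, Fi in zip(y, F)]  # residuals y~_i
        alpha = fit_stump(x, resid)
        # The stump's leaf values are already least-squares means of the
        # residuals, so the joint (rho, alpha) minimization gives rho_m = 1.
        F = [Fi + predict_stump(alpha, xi) for Fi, xi in zip(F, x)]
        stumps.append(alpha)
    return f0, stumps

def lsboost_predict(model, xi):
    f0, stumps = model
    return f0 + sum(predict_stump(a, xi) for a in stumps)
```

On a toy series the training error shrinks with each round; in the paper's setting the weak learners would instead be fit on the multivariate predictor set (HO, WTI, NGMP, and so on).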

## 3. Datasets and Models

#### 3.1. Data Preparation and Description

#### 3.2. Model Validation Techniques

#### 3.3. Forecasting Performance Evaluation Criteria

The coefficient of determination (R²) measures the goodness-of-fit of the entire regression [42]. The R² value ranges from 0 to 1: the closer it is to 1, the better the regression line fits the observed values, and vice versa [43,44].
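For reference, the four criteria reported in the comparison tables (R², MAE, MSE, RMSE) can be written out directly. The helper below uses the standard definitions, with R² computed as 1 − SSE/SST; the function name is our choice:

```python
from math import sqrt

def metrics(y_true, y_pred):
    """R-Square, MAE, MSE, and RMSE for a vector of predictions."""
    n = len(y_true)
    err = [yt - yp for yt, yp in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in err) / n          # mean absolute error
    mse = sum(e * e for e in err) / n           # mean squared error
    rmse = sqrt(mse)                            # root mean squared error
    y_bar = sum(y_true) / n
    r2 = 1.0 - sum(e * e for e in err) / sum((yt - y_bar) ** 2 for yt in y_true)
    return r2, mae, mse, rmse
```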

## 4. Empirical Analysis

#### 4.1. An Empirical Study of Natural Gas Price Series

#### 4.2. Comparisons of Existing Predictive Methods

## 5. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## Appendix A

## References

1. International Energy Agency (IEA). Key World Energy Statistics. 2018. Available online: https://webstore.iea.org/key-world-energy-statistics-2018 (accessed on 8 March 2019).
2. Nick, S.; Thoenes, S. What drives natural gas prices?—A structural VAR approach. Energy Econ. **2014**, 45, 517–527.
3. The International Gas Union (IGU). Wholesale Gas Price Survey. 2018. Available online: https://www.igu.org/publication/301683/31 (accessed on 8 March 2019).
4. MacAvoy, P.W.; Moshkin, N.V. The new long-term trend in the price of natural gas. Resour. Energy Econ. **2000**, 22, 315–338.
5. Buchanan, W.K.; Hodges, P.; Theis, J. Which way the natural gas price: An attempt to predict the direction of natural gas spot price movements using trader positions. Energy Econ. **2001**, 23, 279–293.
6. Woo, C.K.; Olson, A.; Horowitz, I. Market efficiency, cross hedging and price forecasts: California’s natural-gas markets. Energy **2006**, 31, 1290–1304.
7. Nguyen, H.T.; Nabney, I.T. Short-term electricity demand and gas price forecasts using wavelet transforms and adaptive models. Energy **2010**, 35, 3674–3685.
8. Azadeh, A.; Sheikhalishahi, M.; Shahmiri, S. A hybrid neuro-fuzzy approach for improvement of natural gas price forecasting in vague and noisy environments: Domestic and industrial sectors. In Proceedings of the International Conference on Trends in Industrial and Mechanical Engineering (ICTIME’2012), Dubai, UAE, 24–25 March 2012; pp. 123–127.
9. Zheng, H.; Yuan, J.; Chen, L. Short-term load forecasting using EMD-LSTM neural networks with a Xgboost algorithm for feature importance evaluation. Energies **2017**, 10, 1168.
10. Kuo, P.H.; Huang, C.J. A high precision artificial neural networks model for short-term energy load forecasting. Energies **2018**, 11, 213.
11. Merkel, G.; Povinelli, R.; Brown, R. Short-term load forecasting of natural gas with deep neural network regression. Energies **2018**, 11, 2008.
12. Abrishami, H.; Varahrami, V. Different methods for gas price forecasting. Cuadernos Econ. **2011**, 34, 137–144.
13. Busse, S.; Helmholz, P.; Weinmann, M. Forecasting day ahead spot price movements of natural gas—An analysis of potential influence factors on basis of a NARX neural network. In Proceedings of the Tagungsband der Multikonferenz Wirtschaftsinformatik (MKWI), Braunschweig, Germany, 29 February–2 March 2012; pp. 1–3.
14. Salehnia, N.; Falahi, M.A.; Seifi, A.; Adeli, M.H.M. Forecasting natural gas spot prices with nonlinear modeling using gamma test analysis. J. Natl. Gas Sci. Eng. **2013**, 14, 238–249.
15. Čeperić, E.; Žiković, S.; Čeperić, V. Short-term forecasting of natural gas prices using machine learning and feature selection algorithms. Energy **2017**, 140, 893–900.
16. Freund, Y.; Schapire, R.E. Experiments with a new boosting algorithm. In Proceedings of the Thirteenth International Conference on Machine Learning (ICML), Bari, Italy, 3–6 July 1996; pp. 148–156.
17. Schapire, R.E. The Boosting Approach to Machine Learning: An Overview. In Nonlinear Estimation and Classification; Springer: New York, NY, USA, 2003; pp. 149–171.
18. Bishop, C.M. Pattern Recognition and Machine Learning; Springer: New York, NY, USA, 2007; pp. 1–738.
19. Schapire, R.E. The strength of weak learnability. Mach. Learn. **1990**, 5, 197–227.
20. Freund, Y. Boosting a weak learning algorithm by majority. Inf. Comput. **1995**, 121, 256–285.
21. Mendes-Moreira, J.; Soares, C.; Jorge, A.M.; Sousa, J.F.D. Ensemble approaches for regression: A survey. ACM Comput. Surveys **2012**, 45, 10.
22. Friedman, J.; Hastie, T.; Tibshirani, R. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed.; Springer Series in Statistics; Springer: New York, NY, USA, 2009; pp. 1–745.
23. Breiman, L. Random forests. Mach. Learn. **2001**, 45, 5–32.
24. Fama, E.F.; French, K.R. The capital asset pricing model: Theory and evidence. J. Econ. Perspect. **2004**, 18, 25–46.
25. Ehrenberg, R.G.; Smith, R.S. Modern Labor Economics: Theory and Public Policy, 13th ed.; Routledge: New York, NY, USA, 2018.
26. DeFries, R.S.; Rudel, T.; Uriarte, M.; Hansen, M. Deforestation driven by urban population growth and agricultural trade in the twenty-first century. Nat. Geosci. **2010**, 3, 178–181.
27. Witten, I.H.; Frank, E.; Hall, M.A.; Pal, C.J. Data Mining: Practical Machine Learning Tools and Techniques, 4th ed.; Morgan Kaufmann: Cambridge, UK, 2016.
28. Bianco, V.; Manca, O.; Nardini, S. Electricity consumption forecasting in Italy using linear regression models. Energy **2009**, 34, 1413–1421.
29. Smola, A.J.; Schölkopf, B. A tutorial on support vector regression. Statist. Comput. **2004**, 14, 199–222.
30. Vapnik, V. The Nature of Statistical Learning Theory; Springer Science & Business Media: New York, NY, USA, 2013.
31. Xie, W.; Yu, L.; Xu, S.Y.; Wang, S.Y. A new method for crude oil price forecasting based on support vector machines. In Proceedings of the International Conference on Computational Science (ICCS), Reading, UK, 28–31 May 2006; pp. 444–451.
32. Freund, Y.; Schapire, R.; Abe, N. A short introduction to boosting. J. Jpn. Soc. Artif. Intell. **1999**, 14, 771–780.
33. Kearns, M.; Valiant, L. Cryptographic limitations on learning Boolean formulae and finite automata. J. ACM **1994**, 41, 67–95.
34. Friedman, J.; Hastie, T.; Tibshirani, R. Additive logistic regression: A statistical view of boosting (with discussion and a rejoinder by the authors). Ann. Stat. **2000**, 28, 337–407.
35. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. **2001**, 29, 1189–1232.
36. Friedman, J.H. Stochastic gradient boosting. Comput. Stat. Data Anal. **2002**, 38, 367–378.
37. Mason, L.; Baxter, J.; Bartlett, P.L.; Frean, M.R. Boosting algorithms as gradient descent. In Proceedings of the 12th International Conference on Neural Information Processing Systems (NIPS), 29 November–4 December 1999; pp. 512–518.
38. Moonam, H.M.; Qin, X.; Zhang, J. Utilizing data mining techniques to predict expected freeway travel time from experienced travel time. Math. Comput. Simul. **2019**, 155, 154–167.
39. Reddy, L.V.; Yogitha, K.; Bandhavi, K.; Vinay, G.S.; Kumar, G.D. A Modern Approach Student Performance Prediction using Multi-Agent Data Mining Technique. i-Manag. J. Softw. Eng. **2015**, 10, 14–20.
40. EIA. Available online: https://www.eia.gov/ (accessed on 8 March 2019).
41. Kohavi, R. A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI), Montreal, QC, Canada, 20–25 August 1995; pp. 1137–1145.
42. Cameron, A.C.; Windmeijer, F.A. An R-squared measure of goodness of fit for some common nonlinear regression models. J. Econ. **1997**, 77, 329–342.
43. Jin, R.; Chen, W.; Simpson, T.W. Comparative studies of metamodelling techniques under multiple modelling criteria. Struct. Multidiscipl. Optim. **2001**, 23, 1–13.
44. Touzani, S.; Granderson, J.; Fernandes, S. Gradient boosting machine for modeling the energy consumption of commercial buildings. Energy Build. **2018**, 158, 1533–1543.
45. Willmott, C.J.; Matsuura, K. Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Clim. Res. **2005**, 30, 79–82.
46. Chai, T.; Draxler, R.R. Root mean square error (RMSE) or mean absolute error (MAE)?—Arguments against avoiding RMSE in the literature. Geosci. Model Dev. **2014**, 7, 1247–1250.
47. Willmott, C.J. Some comments on the evaluation of model performance. Bull. Am. Meteorol. Soc. **1982**, 63, 1309–1313.
48. Hyndman, R.J.; Koehler, A.B. Another look at measures of forecast accuracy. Int. J. Forecast. **2006**, 22, 679–688.
49. Willmott, C.J. On the validation of models. Phys. Geogr. **1981**, 2, 184–194.
50. Voyant, C.; Notton, G.; Kalogirou, S.; Nivet, M.L.; Paoli, C.; Motte, F.; Fouilloy, A. Machine learning methods for solar radiation forecasting: A review. Renew. Energy **2017**, 105, 569–582.
51. Malliaris, M.; Malliaris, S. Forecasting inter-related energy product prices. Eur. J. Financ. **2008**, 14, 453–468.

**Figure 1.** Henry Hub natural gas spot price series data of different timescales: (**a**) Daily; (**b**) Weekly; (**c**) Monthly.

**Figure 3.** Predicted values versus actual values in different timescales: (**a**) Daily; (**b**) Weekly; (**c**) Monthly.

**Table 1.** Summary of descriptive statistics of natural gas spot price series for different timescales.

| Timescale | Mean | Median | Max | Min | Standard Deviation |
|---|---|---|---|---|---|
| Daily | 4.7693 | 4.16 | 18.48 | 1.49 | 2.2456 |
| Weekly | 4.7811 | 4.17 | 14.49 | 1.57 | 2.2518 |
| Monthly | 4.7907 | 4.135 | 13.42 | 1.73 | 2.2343 |

| Timescale | R-Square | MAE | MSE | RMSE |
|---|---|---|---|---|
| Daily | 0.91 | 0.4493 | 0.4376 | 0.6615 |
| Weekly | 0.90 | 0.4761 | 0.5116 | 0.7153 |
| Monthly | 0.78 | 0.6859 | 1.1166 | 1.0567 |

| Timescale | HO | WTI | NGMP | NGRRC | NGC | NGUSC | NGI |
|---|---|---|---|---|---|---|---|
| Daily | 0.0075 | 0.0015 | 0.0551 | 0.0040 | 0.0036 | 0.0084 | 0.0021 |
| Weekly | 0.0066 | 0.0031 | 0.0557 | 0.0042 | 0.0032 | 0.0078 | 0.0025 |
| Monthly | 0.0043 | 0.0034 | 0.0582 | 0.0051 | 0.0008 | 0.0079 | 0.0020 |

| Method | R-Square | MAE | MSE | RMSE |
|---|---|---|---|---|
| Linear Regression | 0.56 | 1.0477 | 2.1963 | 1.482 |
| Linear SVM | 0.54 | 1.0029 | 2.3059 | 1.5185 |
| Quadratic SVM | 0.79 | 0.6699 | 1.0559 | 1.0276 |
| Cubic SVM | 0.88 | 0.4845 | 0.6103 | 0.7812 |
| LSBoost | 0.91 | 0.4493 | 0.4376 | 0.6615 |

| Method | R-Square | MAE | MSE | RMSE |
|---|---|---|---|---|
| Linear Regression | 0.56 | 1.0507 | 2.2119 | 1.4873 |
| Linear SVM | 0.54 | 1.0061 | 2.3203 | 1.5233 |
| Quadratic SVM | 0.78 | 0.7045 | 1.1306 | 1.0633 |
| Cubic SVM | 0.88 | 0.5159 | 0.6318 | 0.7949 |
| LSBoost | 0.90 | 0.4761 | 0.5116 | 0.7153 |

| Method | R-Square | MAE | MSE | RMSE |
|---|---|---|---|---|
| Linear Regression | 0.54 | 1.0891 | 2.2945 | 1.5148 |
| Linear SVM | 0.54 | 1.0389 | 2.3062 | 1.5186 |
| Quadratic SVM | 0.72 | 0.7913 | 1.3829 | 1.176 |
| Cubic SVM | 0.77 | 0.7242 | 1.1388 | 1.0672 |
| LSBoost | 0.78 | 0.6859 | 1.1166 | 1.0567 |

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Su, M.; Zhang, Z.; Zhu, Y.; Zha, D. Data-Driven Natural Gas Spot Price Forecasting with Least Squares Regression Boosting Algorithm. *Energies* **2019**, *12*, 1094.
https://doi.org/10.3390/en12061094

**AMA Style**

Su M, Zhang Z, Zhu Y, Zha D. Data-Driven Natural Gas Spot Price Forecasting with Least Squares Regression Boosting Algorithm. *Energies*. 2019; 12(6):1094.
https://doi.org/10.3390/en12061094

**Chicago/Turabian Style**

Su, Moting, Zongyi Zhang, Ye Zhu, and Donglan Zha. 2019. "Data-Driven Natural Gas Spot Price Forecasting with Least Squares Regression Boosting Algorithm" *Energies* 12, no. 6: 1094.
https://doi.org/10.3390/en12061094