Graph Regression Model for Spatial and Temporal Environmental Data—Case of Carbon Dioxide Emissions in the United States
Abstract
1. Introduction
Contributions
2. Problem Statement—The Classical Approach
2.1. Preliminaries
2.2. System Model
- 1. Determine, for each of the N different locations, the specific relationship between the response variables and the set of covariates.
- 2. Based on this relationship, make a prediction of the CO2 levels in different locations in space and time.
2.3. Problem Formulation with a Classical Linear Regression Model
- (A1) The matrix Φ is nonrandom and has full rank, i.e., its columns are linearly independent;
- (A2) The vector is a random vector such that the following hold:
  - (i) for some ;
  - (ii) is a known positive definite matrix.
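
As an illustration of a regression under assumptions of this kind, the following is a minimal sketch of the generalized least squares (Aitken) estimator. It assumes the standard setting in which the response is `y = Phi @ beta + noise`, the noise has zero mean, and its covariance is proportional to a known positive definite matrix. Apart from Φ, the names `y`, `beta`, `Omega` and `gls_estimate` are hypothetical labels chosen for this example, and the data are synthetic.

```python
import numpy as np

def gls_estimate(Phi, y, Omega):
    """Generalized least squares estimate of beta in y = Phi @ beta + noise,
    assuming the noise covariance is proportional to the known matrix Omega."""
    # Whiten the problem with the Cholesky factor C of Omega (Omega = C @ C.T).
    C = np.linalg.cholesky(Omega)
    Phi_w = np.linalg.solve(C, Phi)
    y_w = np.linalg.solve(C, y)
    # Ordinary least squares on the whitened data gives the GLS solution.
    beta, *_ = np.linalg.lstsq(Phi_w, y_w, rcond=None)
    return beta

# Synthetic example (all values are illustrative, not taken from the study).
rng = np.random.default_rng(0)
n, p = 200, 3
Phi = rng.normal(size=(n, p))            # full column rank with probability one
beta_true = np.array([1.0, -2.0, 0.5])
idx = np.arange(n)
Omega = 0.6 ** np.abs(idx[:, None] - idx[None, :])   # AR(1)-type correlation matrix
noise = 0.1 * np.linalg.cholesky(Omega) @ rng.normal(size=n)
y = Phi @ beta_true + noise

beta_hat = gls_estimate(Phi, y, Omega)
print(beta_hat)
```

Whitening first and then calling a least squares solver avoids forming the inverse of `Omega` explicitly, which tends to be numerically safer than the textbook closed form.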
2.4. Generalized Linear Models
3. Proposed Graph Regression Model
3.1. Penalized Regression Model over Graph
- Case 1—: the penalization induces the smoothness of the successive mean vectors over a static graph structure .
- Case 2—: the penalization induces the smoothness of the successive mean vectors over a time-varying graph structure, .
- Case 3— or : The penalization induces the smoothness of the time difference mean vectors over a graph structure which could be either static or time varying, respectively. The matrix of dimension defined as
3.2. Learning and Prediction Procedure
Algorithm 1. Learning procedure of the proposed penalized regression model over graph
Input: ,
Output: Optimal hyperparameters and regression coefficients
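
The learning procedure can be sketched as follows for the static-graph setting of Case 1. This is one plausible reading, not the authors' implementation: it assumes a squared-error loss, a Laplacian quadratic penalty on the mean vector at each time step, a small ridge term added only for numerical stability, and hyperparameter selection by validation RMSE on a chronological split. All names (`fit_graph_regression`, `select_gamma`, `gamma`, `ridge`) and the synthetic data are hypothetical.

```python
import numpy as np

def block_design(X_t):
    """Arrange per-location covariate rows X_t (N, p) into an (N, N*p) block matrix."""
    N, p = X_t.shape
    A = np.zeros((N, N * p))
    for i in range(N):
        A[i, i * p:(i + 1) * p] = X_t[i]
    return A

def fit_graph_regression(X, Y, L, gamma, ridge=1e-6):
    """Minimise sum_t ||y_t - A_t b||^2 + gamma * (A_t b)^T L (A_t b) + ridge * ||b||^2.
    X: (T, N, p) covariates, Y: (T, N) responses, L: (N, N) graph Laplacian."""
    T, N, p = X.shape
    M = np.eye(N) + gamma * L
    lhs = ridge * np.eye(N * p)
    rhs = np.zeros(N * p)
    for t in range(T):
        A = block_design(X[t])
        lhs += A.T @ M @ A
        rhs += A.T @ Y[t]
    # Setting the gradient to zero gives the linear system lhs @ b = rhs.
    return np.linalg.solve(lhs, rhs).reshape(N, p)

def predict(X, B):
    # Mean for location i at time t is X[t, i] @ B[i].
    return np.einsum("tnp,np->tn", X, B)

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

def select_gamma(X_tr, Y_tr, X_val, Y_val, L, grid):
    """Pick the penalty weight with the lowest validation RMSE."""
    scores = {g: rmse(predict(X_val, fit_graph_regression(X_tr, Y_tr, L, g)), Y_val)
              for g in grid}
    return min(scores, key=scores.get), scores

# Synthetic example on a 4-node path graph (illustrative values only).
rng = np.random.default_rng(1)
T, N, p = 40, 4, 2
W = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W
X = rng.normal(size=(T, N, p))
B_true = rng.normal(size=(N, p))
Y = predict(X, B_true) + 0.1 * rng.normal(size=(T, N))

split = 30  # chronological split: first 30 steps for training, the rest for validation
best_gamma, scores = select_gamma(X[:split], Y[:split], X[split:], Y[split:], L,
                                  grid=[0.0, 0.01, 0.1, 1.0])
B_hat = fit_graph_regression(X[:split], Y[:split], L, best_gamma)
print(best_gamma, B_hat.shape)
```

With `gamma = 0` the fit reduces to independent per-location least squares, which gives a convenient baseline for checking whether the graph penalty helps on held-out data.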
4. Numerical Study—CO2 Prediction in the United States
4.1. Choice of Covariates and Data Pre-Processing
- Daily weather data (available on the platform of National Centers for Environmental Information (NCEI) https://www.ncdc.noaa.gov/ghcnd-data-access (accessed on 1 August 2023)) in the United States of America including maximal temperature (TMAX), minimal temperature (TMIN) and precipitation (PREC);
- Temporal information to capture the time patterns of the data;
- Lagged CO2 emission variables to take into account the time correlation of the response.
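
The three covariate groups listed above can be assembled into a design matrix for one location roughly as follows. This is a sketch under assumptions: the number of lags, the annual sine and cosine encoding of time, and the function name `build_features` are all hypothetical choices, not details taken from the study.

```python
import numpy as np

def build_features(y, weather, n_lags=2, period=365.25):
    """Build a design matrix for one location.
    y: (T,) daily response; weather: (T, 3) columns TMAX, TMIN, PREC.
    Row j of the output describes day t = n_lags + j and is paired with y[t]."""
    T = len(y)
    t = np.arange(n_lags, T)
    # Temporal information: annual seasonality as sine/cosine (assumed encoding).
    season = np.column_stack([np.sin(2 * np.pi * t / period),
                              np.cos(2 * np.pi * t / period)])
    # Lagged responses y[t-1], ..., y[t-n_lags].
    lags = np.column_stack([y[n_lags - k:T - k] for k in range(1, n_lags + 1)])
    intercept = np.ones((T - n_lags, 1))
    features = np.hstack([intercept, weather[n_lags:], season, lags])
    return features, y[n_lags:]

# Toy data: the response is simply 0, 1, ..., 9 so the lag columns are easy to read.
y_demo = np.arange(10.0)
weather_demo = np.zeros((10, 3))
features, targets = build_features(y_demo, weather_demo, n_lags=2)
print(features.shape, features[0, -2:], targets[0])
```

The first `n_lags` days are dropped because their lagged values are not available.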
4.2. Graph Construction of the Spatial Component
4.3. Numerical Experiments
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A. Proof of Proposition 1
Appendix B. List of Counties Used in the Numerical Study
List of Counties

Number | Counties | States | Number | Counties | States |
---|---|---|---|---|---|
1 | Anoka County | Minnesota | 31 | Daviess County | Kentucky |
2 | Dakota County | Minnesota | 32 | Hopkins County | Kentucky |
3 | Lyon County | Minnesota | 33 | Russell County | Kentucky |
4 | Buchanan County | Iowa | 34 | Alamance County | North Carolina |
5 | Crawford County | Iowa | 35 | Lenoir County | North Carolina |
6 | Page County | Iowa | 36 | Pender County | North Carolina |
7 | Union County | Iowa | 37 | Randolph County | North Carolina |
8 | Ashley County | Arkansas | 38 | Charleston County | South Carolina |
9 | Columbia County | Arkansas | 39 | Dillon County | South Carolina |
10 | Outagamie County | Wisconsin | 40 | Lee County | South Carolina |
11 | Dane County | Wisconsin | 41 | Marlboro County | South Carolina |
12 | Clark County | Illinois | 42 | Pickens County | South Carolina |
13 | Mercer County | Illinois | 43 | Bartholomew County | Indiana |
14 | Ogle County | Illinois | 44 | Posey County | Indiana |
15 | Stephenson County | Illinois | 45 | Mahoning County | Ohio |
16 | Lawrence County | Tennessee | 46 | Shelby County | Ohio |
17 | Obion County | Tennessee | 47 | Delta County | Michigan |
18 | Cumberland County | Tennessee | 48 | Montcalm County | Michigan |
19 | Hinds County | Mississippi | 49 | Washtenaw County | Michigan |
20 | Tate County | Mississippi | 50 | Armstrong County | Pennsylvania |
21 | Blount County | Alabama | 51 | Montour County | Pennsylvania |
22 | Autauga County | Alabama | 52 | Lebanon County | Pennsylvania |
23 | Marengo County | Alabama | 53 | Luzerne County | Pennsylvania |
24 | Morgan County | Alabama | 54 | Addison County | Vermont |
25 | Talladega County | Alabama | 55 | Windsor County | Vermont |
26 | Bulloch County | Georgia | 56 | Grant Parish | Louisiana |
27 | Habersham County | Georgia | 57 | Red River Parish | Louisiana |
28 | Bradford County | Florida | 58 | Vermilion Parish | Louisiana |
29 | Clay County | Florida | 59 | Madison Parish | Louisiana |
30 | Taylor County | Florida | | | |

Root Mean Square Error (RMSE): Distances Versus Empirical Correlations

Perc. Train | Testing: Graph (Distance) | Testing: Graph (Correlation) | Validation: Graph (Distance) | Validation: Graph (Correlation) | Training: Graph (Distance) | Training: Graph (Correlation) |
---|---|---|---|---|---|---|
70% | 16.42 | 27.04 | 13.67 | 14.92 | 13.40 | 7.96 |
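
The two graph types compared in this table can be built, for example, as in the sketch below. The specific choices are assumptions made for illustration: a Gaussian kernel on great-circle distances for the distance graph, and absolute empirical correlations of the training responses for the correlation graph. The coordinates, the kernel scale `scale_km` and all function names are hypothetical and do not come from the study.

```python
import numpy as np

def haversine_km(lat, lon):
    """Pairwise great-circle distances in km between points given in degrees."""
    lat, lon = np.radians(lat), np.radians(lon)
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

def distance_graph(lat, lon, scale_km):
    """Weights decay with distance through a Gaussian kernel (assumed form)."""
    W = np.exp(-(haversine_km(lat, lon) / scale_km) ** 2)
    np.fill_diagonal(W, 0.0)
    return W

def correlation_graph(Y):
    """Y: (T, N) training responses; weights are absolute empirical correlations."""
    W = np.abs(np.corrcoef(Y, rowvar=False))
    np.fill_diagonal(W, 0.0)
    return W

def laplacian(W):
    """Combinatorial graph Laplacian: degree matrix minus weight matrix."""
    return np.diag(W.sum(axis=1)) - W

# Made-up coordinates and responses for four locations (illustrative only).
lat = np.array([45.0, 44.7, 41.9, 33.5])
lon = np.array([-93.2, -93.1, -91.8, -86.6])
rng = np.random.default_rng(2)
Y_train = rng.normal(size=(100, 4))

L_dist = laplacian(distance_graph(lat, lon, scale_km=500.0))
L_corr = laplacian(correlation_graph(Y_train))
print(L_dist.shape, L_corr.shape)
```

A correlation graph estimated from training responses adapts to the data but can overfit, which is one possible reading of its lower training error and higher testing error in the table.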

Root Mean Square Error (RMSE)

Perc. Train | Testing: Graph Reg. | Testing: Ridge | Testing: OLS | Validation: Graph Reg. | Validation: Ridge | Validation: OLS | Training: Graph Reg. | Training: Ridge | Training: OLS |
---|---|---|---|---|---|---|---|---|---|
50% | 35.65 | 41.43 | 42.10 | 16.80 | 17.86 | 17.65 | 9.13 | 6.74 | 6.55 |
60% | 30.02 | 36.77 | 41.41 | 15.02 | 19.60 | 19.73 | 21.73 | 6.52 | 6.52 |
70% | 16.42 | 22.65 | 49.52 | 13.67 | 17.13 | 16.44 | 13.40 | 7.94 | 7.02 |
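
The evaluation protocol behind a table of this shape can be sketched as a chronological split followed by an RMSE computation on each subset. How the data left over after training are divided between validation and testing is not stated here, so the sketch assumes equal halves; `chronological_split`, `val_share` and the series length are hypothetical.

```python
import numpy as np

def chronological_split(T, train_frac, val_share=0.5):
    """Split time indices 0..T-1 in order into training, validation and testing.
    val_share is the assumed fraction of the non-training part used for validation."""
    n_train = int(round(train_frac * T))
    n_val = int(round(val_share * (T - n_train)))
    idx = np.arange(T)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def rmse(y_true, y_pred):
    """Root mean square error between two arrays of equal shape."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Example with a 70% training share on a series of 100 time steps.
train, val, test = chronological_split(100, 0.7)
print(len(train), len(val), len(test))
print(rmse([0.0, 0.0], [3.0, 4.0]))
```

Keeping the split in time order means the validation and testing periods always lie after the training period, which matches a forecasting use case.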

Root Mean Square Error (RMSE) without Lagged Variables

Perc. Train | Testing: Graph Reg. | Testing: Ridge | Testing: OLS | Validation: Graph Reg. | Validation: Ridge | Validation: OLS | Training: Graph Reg. | Training: Ridge | Training: OLS |
---|---|---|---|---|---|---|---|---|---|
70% | 38.54 | 38.54 | 41.76 | 20.28 | 20.28 | 20.34 | 9.65 | 9.65 | 9.64 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Tayewo, R.; Septier, F.; Nevat, I.; Peters, G.W. Graph Regression Model for Spatial and Temporal Environmental Data—Case of Carbon Dioxide Emissions in the United States. Entropy 2023, 25, 1272. https://doi.org/10.3390/e25091272