Big Data Analytics for Long-Term Meteorological Observations at Hanford Site
Abstract
1. Introduction
2. Methods
2.1. Site Description
2.2. Data Description
2.3. Basis for Defining Threshold and Trend Analysis
2.4. RF Classification
2.5. GUI Development for Rapid Assessment
3. Results
3.1. Threshold for Extreme Events
3.2. Seasonal MK Test and Sen’s Slope Analysis
3.3. Features before, during, and after an Extreme Event
3.4. EWEC GUI for Rapid Assessment
3.5. Extreme Event Classification
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Extreme event | Threshold | Duration |
---|---|---|
High wind | >30 mph | 1 h |
Low wind | <5 mph | 1 day |
Heatwave | >38 °C | Maximum temperature above 38 °C for more than 2 days |
Extreme cold | <−15 °C | Low temperature below −15 °C for more than 2 days |
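The threshold-plus-duration definitions in the table above can be sketched as a simple run-length check. This is only an illustrative sketch, not the authors' implementation: the function name `find_runs`, the sample temperatures, and the reading of "more than 2 days" as at least 3 consecutive daily values are all assumptions made here.

```python
from typing import Callable, List, Sequence, Tuple


def find_runs(values: Sequence[float],
              is_extreme: Callable[[float], bool],
              min_length: int) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs (end inclusive) of consecutive runs
    in which is_extreme(value) holds for at least min_length samples."""
    runs: List[Tuple[int, int]] = []
    start = None  # index where the current run began, or None if not in a run
    for i, value in enumerate(values):
        if is_extreme(value):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the end of the series.
    if start is not None and len(values) - start >= min_length:
        runs.append((start, len(values) - 1))
    return runs


# Hypothetical daily maximum temperatures in °C (made-up values, not site data).
daily_max_c = [35.0, 39.1, 40.2, 38.5, 37.0, 39.0, 39.5, 36.0]

# Assumption: "more than 2 days" means at least 3 consecutive days above 38 °C.
heatwaves = find_runs(daily_max_c, lambda t: t > 38.0, min_length=3)
print(heatwaves)
```

The same helper could be applied to the other rows by swapping the predicate and the minimum run length, for example `lambda t: t < -15.0` for extreme cold.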
Heatwave classification (rows: predicted class; columns: actual class)

Data set | Predicted class | Actual heatwave | Actual not heatwave |
---|---|---|---|
Training | Heatwave | 985 | 0 |
Training | Not heatwave | 9 | 9958 |
Validation | Heatwave | 138 | 12 |
Validation | Not heatwave | 77 | 2120 |
Testing | Heatwave | 155 | 7 |
Testing | Not heatwave | 77 | 2109 |

Strong wind classification (rows: predicted class; columns: actual class)

Data set | Predicted class | Actual strong wind | Actual not strong wind |
---|---|---|---|
Training | Strong wind | 92 | 0 |
Training | Not strong wind | 14 | 44299 |
Validation | Strong wind | 10 | 0 |
Validation | Not strong wind | 2 | 9503 |
Testing | Strong wind | 11 | 3 |
Testing | Not strong wind | 1 | 9501 |
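Summary metrics can be derived from the testing-data confusion matrices above with a few lines of arithmetic. The sketch below assumes the layout stated in the table (rows are predicted classes, columns are actual classes); the `ConfusionMatrix` class and its field names are hypothetical helpers introduced here for illustration and are not part of the original study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # predicted event, actual event
    fp: int  # predicted event, actual non-event
    fn: int  # predicted non-event, actual event
    tn: int  # predicted non-event, actual non-event

    @property
    def accuracy(self) -> float:
        """Fraction of all samples classified correctly."""
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    @property
    def precision(self) -> float:
        """Fraction of predicted events that were actual events."""
        return self.tp / (self.tp + self.fp)

    @property
    def recall(self) -> float:
        """Fraction of actual events that were predicted as events."""
        return self.tp / (self.tp + self.fn)


# Testing-data counts copied from the table, assuming rows = predicted class.
heatwave_test = ConfusionMatrix(tp=155, fp=7, fn=77, tn=2109)
strong_wind_test = ConfusionMatrix(tp=11, fp=3, fn=1, tn=9501)

for name, cm in [("heatwave", heatwave_test), ("strong wind", strong_wind_test)]:
    print(f"{name}: accuracy={cm.accuracy:.3f}, "
          f"precision={cm.precision:.3f}, recall={cm.recall:.3f}")
```

Because non-events greatly outnumber events in both cases, accuracy is dominated by the majority class; precision and recall are likely the more informative figures when comparing the two classifiers.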
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhou, H.; Ren, H.; Royer, P.; Hou, H.; Yu, X.-Y. Big Data Analytics for Long-Term Meteorological Observations at Hanford Site. Atmosphere 2022, 13, 136. https://doi.org/10.3390/atmos13010136