
Predicting Frost Depth of Soils in South Korea Using Machine Learning Techniques

1 Northern Infrastructure Specialized Team, Korea Institute of Civil Engineering and Building Technology, Goyang 10223, Korea
2 Department of Geotechnical Engineering Research, Korea Institute of Civil Engineering and Building Technology, Goyang 10223, Korea
3 Department of Civil and Environmental Engineering, University of Ulsan, Ulsan 44610, Korea
* Author to whom correspondence should be addressed.
Sustainability 2022, 14(15), 9767; https://doi.org/10.3390/su14159767
Submission received: 2 June 2022 / Revised: 27 July 2022 / Accepted: 5 August 2022 / Published: 8 August 2022
(This article belongs to the Special Issue Geotechnical Engineering towards Sustainability)

Abstract

Predicting the frost depth of soils is critical to the sustainability of pavements because of their mechanical vulnerability to frozen-thawed soil. Reliable prediction of frost depth is challenging because of its high uncertainty and because the geotechnical properties required by the empirical and analytical equations available in the literature are often unavailable. Therefore, this study proposes a new framework that predicts the frost depth of soil below the pavement using eight machine learning (ML) algorithms (five single ML algorithms and three ensemble learning algorithms) without requiring geotechnical properties. Among the eight ML models, the hyperparameter-tuned gradient boosting model showed the best performance, with a coefficient of determination (R2) of 0.919. Furthermore, the developed ML model was shown to be applicable to predicting several levels of frost depth and to assessing the sensitivity of pavement-related predictors for predicting the frost depth of soils.

1. Introduction

Soils are porous materials containing pore water, which freeze from the ground surface downward when seasonally frozen soils are exposed to atmospheric temperatures below 0 °C [1]. When freezing is initiated at the surface, pore water moves from the unfrozen to the frozen soil by capillary action, which drives the progress of freezing with volume expansion [2,3]. This freezing-induced expansion of soil is called frost heaving, and more significant frost heaving is anticipated when the frozen soil mass continues to expand under groundwater supply [4,5]. Among the many types of infrastructure potentially affected by frost heaving (e.g., buildings, bridge foundations, and utility lines), pavement is one of the most substantially affected.
During the thawing period (when the atmospheric temperature rises above 0 °C), the frozen soil starts melting from the ground surface. The pavement structure (e.g., subgrade, granular base) then becomes saturated as excess pore water from melted ice is trapped between the ground surface and the still-frozen layer. The increased water content poses a risk of high pore water pressure, which reduces the shear strength of the soil [6,7] and degrades the performance of the pavement structure; the increase in pore water pressure also significantly reduces the bearing capacity of the pavement structure. Under such conditions, substantial displacement can occur if high vehicle loads are applied to the pavement [8]. The deeper the frost penetration, the greater the volume of ground affected by frozen-thawed soil and the greater the risk of pavement damage.
Because pavement above frozen-thawed soil is vulnerable, the frost depth is an important design parameter, particularly in cold regions [9,10], and its reliable prediction is essential for pavement engineering design. The frost depth is affected by environmental factors such as atmospheric temperature, solar radiation, and wind speed, and by soil-related factors such as mineralogy, thermal properties (e.g., thermal conductivity), and water content. Many studies have investigated frost depth using numerical and analytical approaches [11,12,13,14], and an empirical equation for predicting frost depth from the thermal conductivity of soil and the average atmospheric temperature has been reported [15]. However, these approaches require geotechnical properties, some of which are hard to measure and may not be available at the site where the frost depth prediction is needed.
In this study, a new framework was proposed to predict the frost depth of soils without coring the pavement layer to measure it directly. Eight machine learning (ML) algorithms were used to develop an optimal frost depth prediction model, trained on a dataset measured in South Korea that contains no geotechnical properties. The best-performing ML model was determined by comparing the coefficients of determination of the eight models after hyperparameter tuning. In addition, the predictability of each level of frost depth and the sensitivity of the frost depth to pavement-related predictors were discussed using the best-performing model.

2. Dataset

2.1. Field Experiment

The frost depth was measured using the methylene blue solution, which is blue only at a temperature above 0 °C [16,17]. The device for measuring the frost depth consists of an acrylic cylinder with an outer diameter of 25.4 mm and an inner diameter of 15 mm, a transparent tube with a diameter of 10 mm, a protective iron body, and a cover (Figure 1). The acrylic inner tube was filled with methylene blue solution, which appears colorless at a temperature below 0 °C. The frost depth was evaluated by measuring the depth at which the methylene blue solution in the acrylic inner tube turned colorless. The accuracy of the measuring device was ±50 mm [18].
To measure the frost depth, a total of 80 stations were installed throughout South Korea, as shown in Figure 2 [19]. The location of each station was selected to measure the representative frost depth of the area and satisfied the following conditions: (1) representative soil of the area, (2) relatively high altitude, and (3) shaded location. To analyze the factors affecting the frost depth, a reference survey of each observation station was conducted by measuring the east longitude, north latitude, and the thickness of each pavement layer (asphalt surface, asphalt base, granular base, subgrade, and frost protection layer).
As shown in Figure 3, the installation of the frost depth observatory was divided into six steps [19]. (1) The locations of the observatory were determined. (2) Pavement was cored to a depth of 15 cm with a diameter of 15 cm, which allowed the protective iron body to be sufficiently inserted. (3) Borings with a diameter of 3 cm and a depth of 150 cm were performed on the pavement structure. (4) The water-proof acrylic tube containing the methylene blue solution was installed. (5) The protective iron body that can be opened for measurement and prevents damage to the device from the vehicle loads was installed. (6) The measuring device was fixed by filling the gap between the protective iron body and the pavement with mortar.

2.2. Data Analysis

A total of 686 data points were used to develop the ML models in this study. The dataset consists of the frost depth (Y) and nine predictors: X1 = longitude, X2 = latitude, X3 = elevation (m), X4 = thickness of asphalt surface (cm), X5 = thickness of asphalt base (cm), X6 = thickness of granular base (cm), X7 = thickness of subgrade (cm), X8 = thickness of frost protection layer (cm), and X9 = freezing index (°C·day). The set of predictors includes three location-related parameters (X1, X2, and X3) and five pavement-related parameters (X4–X8). The freezing index (X9) in this study refers to the difference between the maximum and minimum of the cumulative temperature profile throughout the year; it therefore depends only on temperature and not on soil properties. This allows an ambient-temperature-dependent parameter to be included among the predictors for the frost depth of soils, which is also a function of groundwater level, ground temperature, and soil properties (e.g., thermal conductivity, particle size, and porosity).
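The freezing index as defined above can be computed from daily mean temperatures alone. The following is a minimal sketch of that stated definition (the paper does not publish its computation code; the function name and the toy temperature series are invented here for illustration):

```python
import numpy as np

def freezing_index(daily_mean_temp_c):
    """Freezing index (degC*day) per the paper's stated definition:
    the difference between the maximum and minimum of the cumulative
    daily mean temperature curve over the year."""
    cumulative = np.cumsum(daily_mean_temp_c)
    return float(cumulative.max() - cumulative.min())

# Toy year: a 90-day sub-zero winter followed by a mild season
temps = np.concatenate([np.full(90, -5.0), np.full(275, 10.0)])
fi = freezing_index(temps)
```

Note that no soil property enters the computation, which is what permits this predictor to be used at sites where geotechnical data are unavailable.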
Table 1 summarizes the statistical descriptions of the dataset. As shown in Figure 4, the predictors and the frost depth differ in both range and variation. The location-related predictors, freezing index, and frost depth are relatively well distributed across their ranges (Figure 4a–c,i,j), whereas the pavement-related predictors take values in only a few narrow ranges (Figure 4d–h).
High correlation between predictors may degrade the performance of ML models. Therefore, the Pearson correlation coefficient between each pair of predictors was evaluated and illustrated in the heatmap shown in Figure 5. The highest value, 0.73, was observed between the thickness of the asphalt surface (X4) and the thickness of the asphalt base (X5), implying a relatively strong correlation between these two predictors. However, because X4 and X5 are pavement design parameters with no physical relationship between them, they can both be critical predictors of frost depth. Furthermore, X4 and X5 in the dataset take only a few distinct values (Figure 4d,e). Therefore, both X4 and X5 were retained as predictors in the following sections.
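A pairwise Pearson check like the one behind Figure 5 can be reproduced with `numpy.corrcoef`. The sketch below uses synthetic stand-ins for X4 and X5 (the real dataset is not public; the distributions and the correlation strength here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 686  # dataset size reported in the paper

# Synthetic stand-ins for two correlated pavement predictors
x4 = rng.normal(5.5, 3.5, n)              # asphalt surface thickness (cm)
x5 = 1.6 * x4 + rng.normal(0.0, 4.0, n)   # asphalt base thickness (cm), made correlated with x4

# Pearson correlation coefficient, the quantity plotted in the heatmap
r = np.corrcoef(x4, x5)[0, 1]
```

A coefficient near 0.73, as reported for X4–X5, would normally prompt consideration of dropping one predictor; the paper keeps both because the correlation is a design artifact rather than a physical relationship.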

3. Methodology

3.1. Machine Learning Techniques

Machine learning is an approach that predicts output values from given input data through regression or classification, and many ML algorithms have been developed. In this paper, eight commonly used ML techniques were applied to find the best-performing algorithm for predicting the frost depth in South Korea [20,21,22]. Five were single ML algorithms (K-nearest neighbor (KNN), neural network (NN), stochastic gradient descent (SGD), support vector machine (SVM), and decision tree (DT)), and three were ensemble learning techniques (random forest (RF), gradient boosting (GB), and extreme gradient boosting (XGB)). RF is one of the most frequently used parallel ensemble learning methods, while GB and XGB were selected to represent sequential ensemble learning (also known as boosting). The characteristics of each ML technique used in this study are summarized in Table 2.
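The paper does not state which software library was used; assuming a scikit-learn-style workflow, the eight regressors could be assembled as below (XGB comes from the separate `xgboost` package and is shown only as a comment to keep this sketch self-contained):

```python
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.linear_model import SGDRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
# The eighth model would be: from xgboost import XGBRegressor

# Default-hyperparameter models, corresponding to the "before tuning" stage
models = {
    "KNN": KNeighborsRegressor(),
    "NN":  MLPRegressor(max_iter=2000),
    "SGD": SGDRegressor(),
    "SVM": SVR(),
    "DT":  DecisionTreeRegressor(random_state=0),
    "RF":  RandomForestRegressor(random_state=0),
    "GB":  GradientBoostingRegressor(random_state=0),
}
```

All seven estimators share the `fit`/`predict` interface, so the same training and evaluation loop can be applied to each.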

3.2. Dataset Partition and Scaling

The distributions of the predictors shown in Figure 4 imply that the nine predictors (X1–X9) and frost depth (Y) span different scales, which may degrade the performance of ML models. Therefore, min-max scaling (normalizing each predictor and the frost depth by the difference between its maximum and minimum) was applied to improve model performance. In addition, 80% of the dataset was used for training the ML models and 20% for assessing the generalization capacity of the developed models. The training and test sets were shuffled and randomly selected to avoid sampling bias.
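The split-and-scale step can be sketched as follows with synthetic data standing in for the real measurements. The paper does not say whether the scaler was fitted before or after the split; fitting on the training set only, as here, is the common precaution against leaking test-set statistics:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(1)
# Toy stand-ins for three predictors (e.g., longitude, latitude, elevation)
X = rng.uniform([126.0, 33.0, 0.0], [130.0, 39.0, 800.0], size=(686, 3))
y = rng.uniform(0.0, 159.0, 686)  # frost depth (cm)

# Shuffled 80/20 split, as in the paper
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=42)

# Min-max scaling: (x - min) / (max - min), fitted on the training set only
scaler = MinMaxScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)
```

After scaling, every training-set predictor lies in [0, 1]; test-set values may fall slightly outside when their extremes exceed those of the training set.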

3.3. Performance Measurement and K-Fold Cross-Validation

The performance of the eight developed ML models for predicting frost depth was assessed using the coefficient of determination (R2) and root mean squared error (RMSE). The RMSE provides the absolute error of the min-max scaled frost depth between measured and predicted values, while R2 provides the relative predictability of the developed models. In addition, k-fold cross-validation was performed, in which the training dataset is split into k subsets; k−1 subsets are used for training and the remaining subset for testing, returning k evaluation scores. Note that k = 10, a value commonly used for k-fold cross-validation, was selected in this study.
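A 10-fold evaluation of this kind, returning both metrics, might look like the sketch below (synthetic data; the paper's exact pipeline is not published):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, (200, 5))
y = X @ np.array([2.0, -1.0, 0.5, 0.0, 1.5]) + rng.normal(0.0, 0.1, 200)

# k = 10, as selected in the study
cv = KFold(n_splits=10, shuffle=True, random_state=0)

r2_scores = cross_val_score(
    GradientBoostingRegressor(random_state=0), X, y, cv=cv, scoring="r2")
# scikit-learn maximizes scores, so RMSE is exposed as a negated score
rmse_scores = -cross_val_score(
    GradientBoostingRegressor(random_state=0), X, y, cv=cv,
    scoring="neg_root_mean_squared_error")
```

Each array holds one score per fold; the mean of `r2_scores` is the K-fold figure the paper reports alongside the train and test R2.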

3.4. Hyperparameter Tuning

Hyperparameter tuning was implemented for all eight ML algorithms to obtain the best-performing configuration of each. Because manually tuning hyperparameters is usually time-consuming and inefficient, a grid search approach was applied. Several important hyperparameters were first selected for each algorithm, followed by a few candidate values in a reasonable range for each hyperparameter. All possible combinations were then evaluated, with performance assessed by R2, to determine the best-performing set of hyperparameters. In this paper, k = 5 was used in the cross-validation for obtaining the optimal hyperparameter combination.
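For the GB model, such a grid search could be sketched as below. The hyperparameter names and candidate values here are hypothetical (the paper does not list its exact grids), and the data are synthetic:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(3)
X = rng.uniform(0.0, 1.0, (150, 4))
y = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0.0, 0.05, 150)

# Hypothetical grid: a few values in a reasonable range per hyperparameter
param_grid = {
    "n_estimators": [100, 300],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}

# cv=5 matches the k = 5 used for tuning in the paper; scoring by R2
search = GridSearchCV(GradientBoostingRegressor(random_state=0),
                      param_grid, cv=5, scoring="r2")
search.fit(X, y)
best = search.best_params_
```

`GridSearchCV` exhaustively evaluates all 2 × 2 × 2 = 8 combinations with 5-fold cross-validation and refits the estimator on the full data with the winning combination.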

4. Results and Discussion

4.1. Performance of Developed Models

Table 3 shows the K-fold cross-validation, R2, and RMSE values of the eight developed ML models without hyperparameter tuning (i.e., with default hyperparameters). As shown in Table 3, each ML model performed differently on the given dataset. The relatively high R2 values of some models (e.g., SVM, GB, and XGB) imply that the dataset supports the prediction of frost depth even without hyperparameter tuning. However, the very low R2 (0.569 for the test dataset) and high RMSE (0.146 for the test dataset) of SGD, together with the relatively low R2 values of NN (0.805) and DT (0.806), indicate that hyperparameter tuning is needed to improve these models. Notably, the very high training R2 values of DT, RF, and XGB, combined with their much lower R2 values for the test set and K-fold cross-validation, indicate poor generalization capacity. Overall, the GB algorithm shows the best performance among the eight models (based on the K-fold and test R2 values in Table 3), and the ensemble algorithms (RF, GB, and XGB) outperform NN, SGD, and DT.

4.2. Improved Performance after Hyperparameter Tunings

Table 4 summarizes the performance of the developed ML models after hyperparameter tuning, and Figure 6 compares the K-fold cross-validation and test R2 values before (Table 3) and after (Table 4) tuning. As seen in Table 4 and Figure 6, R2 increased and RMSE decreased after hyperparameter tuning for all ML models. The performance improvement was more significant for the five single-learner regression models (KNN, NN, SGD, SVM, and DT) than for the three ensemble models, implying that ensemble algorithms can produce reasonably high-performing prediction models even without hyperparameter tuning. Nevertheless, the performance of the three ensemble models still improved slightly after tuning (Figure 6), meaning that hyperparameter tuning remains worthwhile for ensemble learning algorithms when better prediction of frost depth is needed.
GB showed the best performance among the eight ML models after hyperparameter tuning and also outperformed XGB tuned under the same conditions (same hyperparameters and ranges). Although XGB is generally evaluated as outperforming GB, in this study GB performed better, likely because of the configuration and distribution of the given data.
Figure 7 shows scatter plots of predicted against measured values for two algorithms, DT and GB. A good model predicts values as close to the measured values as possible; in a scatter plot, its points lie close to the diagonal line. Compared with the other algorithms, DT differed significantly in the performance indicators (R2 and RMSE) between the training and test datasets, meaning that DT generalizes poorly and is not suitable for predicting frost depth in this study.

4.3. Prediction of Each Level of Frost Depth (Confusion Matrix)

In this paper, the GB-based model was used to classify frost-susceptible ground into given ranges of frost depth, evaluated with a confusion matrix. The deeper the frost depth, the more frost expansion occurs and the more susceptible the ground is to freezing. Ground with a deep frost depth that is sensitive to freezing falls into a category at risk of frost damage and requiring continuous management. This study divided the range of frost depths into four and into eight categories, as shown in Table 6. The maximum frost depth in the dataset was 159 cm, and the categories were defined with identical intervals (the dataset was split by the ranges of values listed in Table 6).
Figure 8 and Figure 9 illustrate the confusion matrices corresponding to the ranges of frost depth presented in Table 6. As shown in Figure 8, relatively high training and test accuracies were observed when the dataset was divided into four categories. In contrast, low training and test accuracies were observed for eight categories (Figure 9), which is more or less intuitive. Because the number of categories in the second scenario is twice that in the first, the number of data points in every two categories in Figure 9 equals that in each category in Figure 8 (e.g., 70 + 18 + 6 + 92, the data in the first two categories in Figure 9, equals 186, the data in the first category in Figure 8). Therefore, the developed GB model can be expected to predict frost depth well when the number of categories is low (≤4) for designing civil infrastructure. However, the low accuracy in Figure 9 implies that a more accurate model would be required to achieve an accurate prediction of frost depth when a higher number of categories (≥5) is needed.
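Turning a regression model's continuous predictions into a confusion matrix amounts to binning both measured and predicted depths into the same equal-width categories. The sketch below uses four bins over 0–159 cm as in the first scenario; the measured/predicted values are invented toy numbers, not the paper's data:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Four equal-width categories over 0-159 cm -> three interior bin edges
edges = np.linspace(0.0, 159.0, 5)[1:-1]   # [39.75, 79.5, 119.25]

# Toy measured and model-predicted frost depths (cm)
measured = np.array([10, 45, 80, 120, 150, 30, 70, 110])
predicted = np.array([12, 50, 75, 118, 140, 35, 95, 108])

# Map each continuous value to its category index 0..3
m_cat = np.digitize(measured, edges)
p_cat = np.digitize(predicted, edges)

cm = confusion_matrix(m_cat, p_cat, labels=[0, 1, 2, 3])
acc = accuracy_score(m_cat, p_cat)
```

Diagonal entries of `cm` count depths whose predicted category matches the measured one; off-diagonal entries near the diagonal correspond to predictions that missed by one category, the dominant error mode when the number of categories grows.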

4.4. Sensitivity of Pavement-Related Predictors for Predicting Frost Depth

To quantify the sensitivity of the frost depth to variations in pavement design, the distribution of min-max scaled frost depth was evaluated as each pavement-related predictor (X4–X8) was varied, as shown in Figure 10. All scaled predictors were fixed at 0.5 (corresponding to the unscaled value min + (max − min) × 0.5), except the predictor on the horizontal axis in Figure 10, which was assigned uniformly distributed values from 0 to 1.
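This one-at-a-time sweep can be sketched as follows. A GB model fitted to synthetic data stands in for the paper's trained model; the toy response is constructed (as an assumption, not from the paper) so that the predictor in the X4 position dominates:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(4)
X = rng.uniform(0.0, 1.0, (300, 9))       # nine min-max scaled predictors
y = 0.5 * X[:, 3] + 0.1 * X[:, 5] + 0.2   # toy response: column 3 ("X4") dominates column 5 ("X6")

model = GradientBoostingRegressor(random_state=0).fit(X, y)

def sweep(model, index, n_points=21, n_features=9):
    """Fix all scaled predictors at 0.5 and vary one over [0, 1]."""
    grid = np.full((n_points, n_features), 0.5)
    grid[:, index] = np.linspace(0.0, 1.0, n_points)
    return model.predict(grid)

# Spread of predicted (scaled) frost depth as each predictor varies
range_x4 = np.ptp(sweep(model, 3))
range_x6 = np.ptp(sweep(model, 5))
```

The spread of each sweep (here via `np.ptp`, the max-minus-min of the predictions) plays the role of the boxplot width in Figure 10: a wider spread marks a more influential predictor.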
As shown in Figure 10, all scaled frost depths under variation of the pavement-related predictors fell into the range between 0.45 and 0.55 (71.55 to 87.45 cm), implying that varying a single pavement-related predictor does not significantly affect the frost depth of soils. Nevertheless, even a difference of a few centimeters in predicted frost depth may be critical in some cases. Moreover, the range of values of the pavement-related predictors in this study (Table 1) is somewhat limited, and larger variations in predicted frost depth could be observed for values above the maxima in Table 1.
The boxplots in Figure 10 also imply that the thickness of the asphalt surface most strongly influences the frost depth of soils, whereas variation in the thickness of the granular base hardly affects it. The widths of the boxplots in Figure 10 rank the pavement-related predictors by influence as X4, X5, X7, X8, and X6. Because the values of the pavement-related predictors in the dataset were not well distributed over their ranges, this order of sensitivity might change if better-distributed values of those predictors were used to develop the ML models.

5. Conclusions

This study developed eight ML models, with and without hyperparameter tuning, for predicting the frost depth of soils below pavement. The R2 and RMSE of each model were evaluated to identify the best-performing one. In addition, the prediction performance for categorized frost depths and the sensitivity of pavement-related predictors were evaluated using a confusion matrix and a sensitivity analysis with the GB model. Based on the observations made in this study, the following conclusions can be drawn:
(1)
The evaluated R2 and RMSE values indicated that ensemble ML algorithms (RF, GB, and XGB) showed higher performance than single ML algorithms (KNN, NN, SGD, SVM, and DT). After performing hyperparameter tuning, GB showed the best performance among the eight ML algorithms.
(2)
The performance was improved more significantly after hyperparameter tuning for five single ML models than for three ensemble ML models, which implies the ensemble ML algorithms can be used to develop models for predicting frost depth with reasonably high performance without performing hyperparameter tuning.
(3)
The developed best performing GB model can be used to assess the predictability of frost depth in a predefined number of categories using the confusion matrix. However, the low prediction accuracy in the confusion matrix for eight categories implies that a more accurate model would be required to achieve high predictability of frost depth.
(4)
The result of sensitivity analysis for pavement-related predictors implies that the thickness of the asphalt surface is the most critical factor affecting the frost depth of soils.

Author Contributions

Conceptualization, S.K. and Y.K.; methodology, H.-J.C. and J.W.; software, H.-J.C. and J.W.; validation, S.K., Y.K. and J.W.; investigation, Y.K.; data curation, S.K.; writing—original draft preparation, H.-J.C. and J.W.; writing—review and editing, H.-J.C., Y.K. and J.W.; visualization, H.-J.C. and J.W.; supervision, Y.K. and J.W.; project administration, Y.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by a Korea Agency for Infrastructure Technology Advancement grant funded by the Ministry of Land, Infrastructure and Transport (RS-2022-00143644).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Some or all data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Li, Q.; Wei, H.; Han, L.; Wang, F.; Zhang, Y.; Han, S. Feasibility of Using Modified Silty Clay and Extruded Polystyrene (XPS) Board as the Subgrade Thermal Insulation Layer in a Seasonally Frozen Region, Northeast China. Sustainability 2019, 11, 804. [Google Scholar] [CrossRef] [Green Version]
  2. Penner, E. The Mechanism of Frost Heaving in Soils. Highw. Res. Board Bull. 1959, 225, 1–22. [Google Scholar]
  3. Zhang, Y.; Korkiala-Tanttu, L.K.; Gustavsson, H.; Miksic, A. Assessment for Sustainable Use of Quarry Fines as Pavement Construction Materials: Part I-Description of Basic Quarry Fine Properties. Materials 2019, 12, 1209. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  4. Vaitkus, A.; Gražulyte, J.; Skrodenis, E.; Kravcovas, I. Design of Frost Resistant Pavement Structure Based on Road Weather Stations (RWSs) Data. Sustainability 2016, 8, 1328. [Google Scholar] [CrossRef] [Green Version]
  5. Liu, Y.; Li, D.; Chen, L.; Ming, F. Study on the Mechanical Criterion of Ice Lens Formation Based on Pore Size Distribution. Appl. Sci. 2020, 10, 8981. [Google Scholar] [CrossRef]
  6. Yao, L.Y.; Broms, B.B. Excess Pore Pressures Which Develop during Thawing of Frozen Fine-Grained Subgrade Soils. Highw. Res. Rec. 1965, 39–57. [Google Scholar]
  7. Eigenbrod, K.D.; Knutsson, S.; Sheng, D. Pore-Water Pressures in Freezing and Thawing Fine-Grained Soils. J. Cold Reg. Eng. 1996, 10, 77–92. [Google Scholar] [CrossRef]
  8. Simonsen, E.; Isacsson, U. Thaw Weakening of Pavement Structures in Cold Regions. Cold Reg. Sci. Technol. 1999, 29, 135–151. [Google Scholar] [CrossRef]
  9. Remišová, E.; Decký, M.; Podolka, L.; Kováč, M.; Vondráčková, T.; Bartuška, L. Frost Index from Aspect of Design of Pavement Construction in Slovakia. Procedia Earth Planet. Sci. 2015, 15, 3–10. [Google Scholar] [CrossRef] [Green Version]
  10. Fu, J.; Shen, A. Meso- and Macro-Mechanical Analysis of the Frost-Heaving Effect of Void Water on Asphalt Pavement. Materials 2022, 15, 414. [Google Scholar] [CrossRef]
  11. Chisholm, R.A.; Phang, W.A. Measurement and Prediction of Frost Penetration in Highways. Transp. Res. Rec. 1983, 918, 1–10. [Google Scholar]
  12. Kahimba, F.C.; Ranjan, R.S.; Mann, D.D. Modeling Soil Temperature, Frost Depth, and Soil Moisture Redistribution in Seasonally Frozen Agricultural Soils. Appl. Eng. Agric. 2009, 25, 871–882. [Google Scholar] [CrossRef]
  13. Orakoglu, M.E.; Liu, J.; Tutumluer, E. Frost Depth Prediction for Seasonal Freezing Area in Eastern Turkey. Cold Reg. Sci. Technol. 2016, 124, 118–126. [Google Scholar] [CrossRef]
  14. Roustaei, M.; Hendry, M.T.; Roghani, A. Investigating the Mechanism of Frost Penetration under Railway Embankment and Projecting Frost Depth for Future Expected Climate: A Case Study. Cold Reg. Sci. Technol. 2022, 197, 103523. [Google Scholar] [CrossRef]
  15. Rajaei, P.; Baladi, G.Y. Frost Depth: General Prediction Model. Transp. Res. Rec. J. Transp. Res. Board 2015, 2510, 74–80. [Google Scholar] [CrossRef]
  16. Iwata, Y.; Hirota, T.; Suzuki, T.; Kuwao, K. Comparison of Soil Frost and Thaw Depths Measured Using Frost Tubes and Other Methods. Cold Reg. Sci. Technol. 2012, 71, 111–117. [Google Scholar] [CrossRef]
  17. Kim, H.S.; Lee, J.; Kim, Y.S.; Kang, J.-M.; Hong, S.-S. Experimental and Field Investigations for the Accuracy of the Frost Depth Indicator with Methylene Blue Solution. J. Korean Geosynth. Soc. 2013, 12, 75–79. [Google Scholar] [CrossRef]
  18. Gandahi, R. Determination of the Ground Frost Line by Means of a Simple Type of Frost Depth Indicator; Statens Väginstitut: Stockholm, Sweden, 1963; pp. 14–19. [Google Scholar]
  19. Hong, S.; Kim, Y.; Kim, S. A Study on the Frost Penetration Depth in Pavements; Korea Institute of Civil Engineering and Building Technology: Goyang, Korea, 2019; pp. 57–59. [Google Scholar]
  20. Liakos, K.; Busato, P.; Moshou, D.; Pearson, S.; Bochtis, D. Machine Learning in Agriculture: A Review. Sensors 2018, 18, 2674. [Google Scholar] [CrossRef] [Green Version]
  21. Garg, R.; Aggarwal, H.; Centobelli, P.; Cerchione, R. Extracting Knowledge from Big Data for Sustainability: A Comparison of Machine Learning Techniques. Sustainability 2019, 11, 6669. [Google Scholar] [CrossRef] [Green Version]
  22. Nguyen, H.; Vu, T.; Vo, T.P.; Thai, H.-T. Efficient Machine Learning Models for Prediction of Concrete Strengths. Constr. Build. Mater. 2021, 266, 120950. [Google Scholar] [CrossRef]
  23. Guo, G.; Wang, H.; Bell, D.; Bi, Y.; Greer, K. KNN Model-Based Approach in Classification. In On The Move to Meaningful Internet Systems 2003: CoopIS, DOA, and ODBASE—OTM 2003; Springer: Berlin/Heidelberg, Germany, 2003; pp. 986–996. [Google Scholar]
  24. Abiodun, O.I.; Jantan, A.; Omolara, A.E.; Dada, K.V.; Mohamed, N.A.; Arshad, H. State-of-the-Art in Artificial Neural Network Applications: A Survey. Heliyon 2018, 4, e00938. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  25. Géron, A. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems; O’Reilly Media: Newton, MA, USA, 2019; ISBN 1492032611. [Google Scholar]
  26. Cortes, C.; Vapnik, V. Support-Vector Networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
  27. Myles, A.J.; Feudale, R.N.; Liu, Y.; Woody, N.A.; Brown, S.D. An Introduction to Decision Tree Modeling. J. Chemom. 2004, 18, 275–285. [Google Scholar] [CrossRef]
  28. Belgiu, M.; Drăguţ, L. Random Forest in Remote Sensing: A Review of Applications and Future Directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31. [Google Scholar] [CrossRef]
  29. Natekin, A.; Knoll, A. Gradient Boosting Machines, a Tutorial. Front. Neurorobot. 2013, 7. [Google Scholar] [CrossRef] [Green Version]
  30. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; ACM: New York, NY, USA, 2016; pp. 785–794. [Google Scholar]
Figure 1. Description of the frost depth measuring device: (a) configuration of the acrylic tube; (b) installation of measuring device.
Figure 2. Locations of observatory stations marked with red dots.
Figure 3. Installation of frost depth observatory in six steps: (a) positioning; (b) pavement coring; (c) boring for acrylic tube; (d) installation for acrylic tube; (e) protective iron body insertion; (f) backfill.
Figure 4. Histograms of dataset used in this study: (a) X1 = longitude; (b) X2 = latitude; (c) X3 = elevation (m); (d) X4 = thickness of asphalt surface (cm); (e) X5 = thickness of asphalt base (cm); (f) X6 = thickness of granular base (cm); (g) X7 = thickness of subgrade (cm); (h) X8 = thickness of frost protection layer (cm); (i) X9 = freezing index (°C·day); (j) Y = frost depth (m).
Figure 5. Heatmap of correlation of each pair of input predictors. Note that numbers in each zone represent Pearson’s coefficient between two predictors.
Figure 6. R2 values for K-fold cross-validation, train, test of eight ML models: (a) before hyperparameter tuning (using default hyperparameters in Table 5); (b) after hyperparameter tuning (using tuned hyperparameters in Table 5).
Figure 7. Scatter plots of cross-validation predicted Y (frost depth) against measured Y (frost depth): (a) train data for DT; (b) train data for GB; (c) test data for DT; (d) test data for GB.
Figure 8. Confusion matrix of four ranges of frost depth (D = frost depth): (a) training dataset; (b) test dataset.
Figure 9. Confusion matrix of eight ranges of frost depth (D = frost depth): (a) training dataset; (b) test dataset.
Figure 10. Distribution of scaled frost depth according to the variation of pavement-related predictors (GB model was used here).
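Sensitivity curves of the kind shown in Figure 10 can be approximated by a one-at-a-time sweep: vary a single pavement-related predictor over its range while holding the others at their means, then record the model's scaled output. The sketch below uses synthetic data and a default gradient boosting model; it illustrates the idea rather than the authors' exact procedure.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in for the frost-depth dataset (9 predictors).
X, y = make_regression(n_samples=200, n_features=9, noise=5.0, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

j = 3  # index of the predictor being varied (illustrative choice)
grid = np.linspace(X[:, j].min(), X[:, j].max(), 20)

# Hold all other predictors at their mean while sweeping predictor j.
base = np.tile(X.mean(axis=0), (20, 1))
base[:, j] = grid

pred = model.predict(base)
# Min-max scale the response to [0, 1], analogous to the scaled
# frost depth plotted in Figure 10.
scaled = (pred - pred.min()) / (pred.max() - pred.min())
```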
Table 1. Statistical descriptions of dataset used in this study.

| Predictors | Mean ± Std. | Min | Q1 | Median | Q3 | Max | CV |
|---|---|---|---|---|---|---|---|
| X1 | 127.72 ± 0.80 | 126.23 | 127.02 | 127.60 | 128.37 | 129.46 | 0.01 |
| X2 | 36.70 ± 1.04 | 33.43 | 36.07 | 36.86 | 37.51 | 38.13 | 0.03 |
| X3 | 225.63 ± 193.90 | 7.00 | 73.00 | 192.00 | 328.00 | 805.00 | 0.86 |
| X4 | 5.48 ± 3.46 | 0.00 | 5.00 | 5.00 | 5.00 | 28.00 | 0.63 |
| X5 | 9.59 ± 7.73 | 0.00 | 6.00 | 10.00 | 15.00 | 50.00 | 0.81 |
| X6 | 2.56 ± 5.93 | 0.00 | 0.00 | 0.00 | 0.00 | 20.00 | 2.31 |
| X7 | 27.32 ± 11.75 | 0.00 | 20.00 | 30.00 | 30.00 | 50.00 | 0.43 |
| X8 | 12.45 ± 15.92 | 0.00 | 0.00 | 0.00 | 25.00 | 60.00 | 1.28 |
| X9 | 195.50 ± 159.18 | 3.94 | 71.38 | 155.11 | 283.40 | 954.16 | 0.81 |
| Y | 59.75 ± 39.35 | 0.00 | 26.05 | 53.75 | 87.00 | 159.00 | 0.66 |

Std.: standard deviation; Q1 and Q3: first and third quartiles; CV: coefficient of variation.
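Each column of Table 1 follows from standard summary statistics. The sketch below shows how such a row could be derived with pandas; the five sample values are placeholders taken from the X9 column's extremes and quartiles, not the actual dataset.

```python
import pandas as pd

# Placeholder sample for one predictor (values borrowed from the
# X9 row of Table 1 for illustration only).
x9 = pd.Series([3.94, 71.38, 155.11, 283.40, 954.16], name="X9")

summary = {
    "Mean": x9.mean(),
    "Std.": x9.std(),          # sample standard deviation (ddof=1)
    "Min": x9.min(),
    "Q1": x9.quantile(0.25),   # first quartile
    "Median": x9.median(),
    "Q3": x9.quantile(0.75),   # third quartile
    "Max": x9.max(),
}
# Coefficient of variation: dispersion relative to the mean.
summary["CV"] = summary["Std."] / summary["Mean"]
```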
Table 2. Key features of selected ML techniques in this study.

| ML Technique | Features | Reference |
|---|---|---|
| KNN | Non-parametric algorithm; high-speed training; computationally expensive | [23] |
| NN | Able to learn nonlinear models; sensitive to scaling of predictors; number of layers and neurons must be optimized | [24] |
| SGD | Optimizes a differentiable cost function; able to train on large datasets; hyperparameter tuning is necessary | [25] |
| SVM | Non-parametric algorithm; able to model high-dimensional data; computationally expensive | [26] |
| DT | Non-parametric algorithm; provides predictor importance; prone to overfitting | [27] |
| RF | Homogeneous ensemble algorithm; base ML algorithm is selectable (e.g., DT, SVM, NN, KNN) | [28] |
| GB | Sequential learning method; minimizes the loss function by parameterizing DTs | [29] |
| XGB | Adds a regularization term to GB to mitigate overfitting; many loss functions available | [30] |
Table 3. Performance of developed ML models by using default hyperparameters.

| ML Technique | R2 (K-Fold) | R2 (Train) | R2 (Test) | RMSE (K-Fold) | RMSE (Train) | RMSE (Test) |
|---|---|---|---|---|---|---|
| KNN | 0.884 ± 0.031 | 0.935 | 0.884 | 0.082 ± 0.009 | 0.063 | 0.085 |
| NN | 0.823 ± 0.040 | 0.814 | 0.805 | 0.103 ± 0.011 | 0.107 | 0.108 |
| SGD | 0.480 ± 0.049 | 0.523 | 0.569 | 0.179 ± 0.016 | 0.175 | 0.146 |
| SVM | 0.880 ± 0.028 | 0.898 | 0.863 | 0.083 ± 0.007 | 0.078 | 0.094 |
| DT | 0.834 ± 0.051 | 1.000 | 0.808 | 0.098 ± 0.013 | 0.000 | 0.106 |
| RF | 0.872 ± 0.035 | 0.987 | 0.877 | 0.872 ± 0.035 | 0.029 | 0.079 |
| GB | 0.910 ± 0.019 | 0.950 | 0.910 | 0.073 ± 0.008 | 0.055 | 0.077 |
| XGB | 0.888 ± 0.033 | 0.999 | 0.892 | 0.081 ± 0.008 | 0.009 | 0.078 |
Table 4. Performance of developed ML models after hyperparameter tuning.

| ML Technique | R2 (K-Fold) | R2 (Train) | R2 (Test) | RMSE (K-Fold) | RMSE (Train) | RMSE (Test) |
|---|---|---|---|---|---|---|
| KNN | 0.889 ± 0.032 | 0.954 | 0.889 | 0.079 ± 0.009 | 0.053 | 0.081 |
| NN | 0.889 ± 0.022 | 0.920 | 0.847 | 0.081 ± 0.010 | 0.070 | 0.093 |
| SGD | 0.782 ± 0.043 | 0.798 | 0.769 | 0.111 ± 0.010 | 0.111 | 0.118 |
| SVM | 0.882 ± 0.030 | 0.905 | 0.865 | 0.082 ± 0.008 | 0.076 | 0.094 |
| DT | 0.865 ± 0.042 | 0.932 | 0.818 | 0.088 ± 0.011 | 0.065 | 0.103 |
| RF | 0.888 ± 0.033 | 0.985 | 0.910 | 0.079 ± 0.009 | 0.030 | 0.076 |
| GB | 0.918 ± 0.015 | 0.955 | 0.919 | 0.070 ± 0.007 | 0.053 | 0.068 |
| XGB | 0.906 ± 0.014 | 0.975 | 0.904 | 0.076 ± 0.005 | 0.040 | 0.075 |
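The K-fold evaluation behind Tables 3 and 4 can be sketched with scikit-learn as follows. Synthetic regression data stands in for the frost-depth dataset, and only two of the eight models (DT and GB) are shown; the reported numbers come from the paper's data, not from this snippet.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in: 200 samples, 9 predictors (as in the paper).
X, y = make_regression(n_samples=200, n_features=9, noise=10.0, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

for name, model in [("DT", DecisionTreeRegressor(random_state=0)),
                    ("GB", GradientBoostingRegressor(random_state=0))]:
    # Per-fold R2 and RMSE, summarized as mean ± standard deviation
    # in the style of the K-Fold columns of Tables 3 and 4.
    r2 = cross_val_score(model, X, y, cv=cv, scoring="r2")
    rmse = -cross_val_score(model, X, y, cv=cv,
                            scoring="neg_root_mean_squared_error")
    print(f"{name}: R2 = {r2.mean():.3f} ± {r2.std():.3f}, "
          f"RMSE = {rmse.mean():.3f} ± {rmse.std():.3f}")
```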
Table 5. Best hyperparameters for each ML algorithm.

| ML Technique | Range of Hyperparameters | Tuned Hyperparameters |
|---|---|---|
| KNN | leaf_size = 10, 15, 20, …, 40, 45, 50; n_neighbors = 3, 4, 5, …, 8, 9, 10 | leaf_size = 10; n_neighbors = 3 |
| NN | α = 0.00001, 0.0001, 0.001, 0.01, 0.1; early_stopping = True, False; hidden_layer_sizes = [50], [100], [100, 100], [50, 50, 50] | α = 0.0001; early_stopping = True; hidden_layer_sizes = [50, 50, 50] |
| SGD | α = 0.0001, 0.0001, 0.001, 0.005; eta0 = 0.005, 0.01, 0.03, 0.05, 0.1, 0.2; max_iter = 500, 600, 700, …, 1800, 1900, 2000 | α = 0.0001; eta0 = 0.2; max_iter = 1200 |
| SVM | C = 0.5, 0.6, 0.7, …, 1.3, 1.4, 1.5; gamma = 0.5, 0.55, 0.6, …, 0.9, 0.95, 1.0; kernel = 'linear', 'poly', 'rbf', 'sigmoid' | C = 1.4; gamma = 1.9; kernel = 'rbf' |
| DT | min_samples_leaf = 1, 2, 3, …, 8, 9, 10; min_samples_split = 1, 2, 3, …, 8, 9, 10 | min_samples_leaf = 7; min_samples_split = 3 |
| RF 1 | n_estimators = 50, 60, 70, …, 280, 290, 300 | n_estimators = 280 |
| GB 1 | max_depth = 2, 3, 4, …, 8, 9, 10; n_estimators = 20, 30, 40, …, 280, 290, 300 | max_depth = 2; n_estimators = 270 |
| XGB 1 | max_depth = 2, 3, 4, …, 8, 9, 10; n_estimators = 20, 30, 40, …, 280, 290, 300 | max_depth = 4; n_estimators = 50 |

1 Base predictor = DT.
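The grid search implied by Table 5 could be implemented along these lines with scikit-learn's `GridSearchCV`, shown here for the GB model. A deliberately reduced grid and synthetic data are used so the sketch runs quickly; the full ranges are those listed in the table.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the frost-depth dataset.
X, y = make_regression(n_samples=200, n_features=9, noise=5.0, random_state=0)

# Reduced grid for brevity; Table 5 sweeps max_depth = 2, 3, ..., 10
# and n_estimators = 20, 30, ..., 300.
param_grid = {"max_depth": [2, 3], "n_estimators": [20, 50]}

# Exhaustive search over the grid, scored by cross-validated R2.
search = GridSearchCV(GradientBoostingRegressor(random_state=0),
                      param_grid, cv=5, scoring="r2")
search.fit(X, y)
best = search.best_params_
```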
Table 6. Ranges of frost depth in each category in Figure 8 and Figure 9.

| Category No. | First Scenario (Figure 8): Range of Frost Depth (cm) | Second Scenario (Figure 9): Range of Frost Depth (cm) |
|---|---|---|
| 1 | 0–39.7 | 0–19.9 |
| 2 | 39.7–79.5 | 19.9–39.8 |
| 3 | 79.5–119.3 | 39.8–59.6 |
| 4 | 119.3–159 | 59.6–79.5 |
| 5 | - | 79.5–99.4 |
| 6 | - | 99.4–119.3 |
| 7 | - | 119.3–139.1 |
| 8 | - | 139.1–159 |
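Confusion matrices like those in Figures 8 and 9 are obtained by discretizing the continuous measured and predicted frost depths into the Table 6 categories. A minimal sketch of the first (four-category) scenario, with illustrative depth values in place of the real data:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# First-scenario bin edges in cm (four equal-width classes over 0-159).
edges = np.array([0.0, 39.7, 79.5, 119.3, 159.0])

# Illustrative measured and ML-predicted frost depths (cm).
measured = np.array([12.0, 45.0, 90.0, 130.0, 60.0])
predicted = np.array([15.0, 50.0, 85.0, 110.0, 95.0])

# np.digitize against the interior edges maps each depth to a
# category number 1-4, matching the Table 6 numbering.
m_cat = np.digitize(measured, edges[1:-1]) + 1
p_cat = np.digitize(predicted, edges[1:-1]) + 1

# Rows: measured category; columns: predicted category.
cm = confusion_matrix(m_cat, p_cat, labels=[1, 2, 3, 4])
```

In this toy example three of the five predictions fall in the correct category, which appears as three counts on the matrix diagonal.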
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Choi, H.-J.; Kim, S.; Kim, Y.; Won, J. Predicting Frost Depth of Soils in South Korea Using Machine Learning Techniques. Sustainability 2022, 14, 9767. https://doi.org/10.3390/su14159767
