Predicting Terrestrial Heat Flow in North China Using Multiple Geological and Geophysical Datasets Based on Machine Learning Method

: Geothermal heat ﬂow is an essential parameter for the exploration of geothermal energy. The cost is often prohibitive if dense heat ﬂow measurements are arranged in the study area. Regardless, an increase in the limited and sparse heat ﬂow observation points is needed to study the regional geothermal setting. This research is signiﬁcant in order to provide a new reliable map of terrestrial heat ﬂow for the subsequent development of geothermal resources. The Gradient Boosted Regression Tree (GBRT) prediction model used in this paper is devoted to solving the problem of an insufﬁcient number of heat ﬂow observations in North China. It considers the geological and geophysical information in the region by training the sample data using 12 kinds of geological and geophysical features. Finally, a robust GBRT prediction model was obtained. The performance of the GBRT method was evaluated by comparing it with the kriging interpolation, the minimum curvature interpolation, and the 3D interpolation algorithm through the prediction performance analysis. Based on the GBRT prediction model, a new heat ﬂow map with a resolution of 0.25 ◦ × 0.25 ◦ was proposed, which depicted the terrestrial heat ﬂow distribution in the study area in a more detailed and reasonable way than the interpolation results. The high heat ﬂow values were mostly concentrated in the northeastern boundary of the Tibet Plateau, with a few scattered and small-scale high heat ﬂow areas in the southeastern part of the North China Craton (NCC) adjacent to the Paciﬁc Ocean. The low heat ﬂow values were mainly resolved in the northern part of the Trans-North China Orogenic belt (TNCO) and the southmost part of the NCC. By comparing the predicted heat ﬂow map with the plate tectonics, the olivine-Mg#, and the hot spring distribution in North China, we found that the GBRT could obtain a reliable result under the constraint of geological and geophysical information in regions with scarce and unevenly distributed heat ﬂow observations.


Introduction
Terrestrial heat flow is a crucial parameter to constrain the Earth's thermal structure, thermal conductivity, global heat loss, and to provide a model of the thermomechanical evolution of the lithosphere [1].Terrestrial heat flow is commonly measured using temperature logs and geothermal conductivity measurements in boreholes.In-situ measurements to constrain the heat flow distribution of a large area are still sparse and limited.Thus, predicting regional terrestrial heat flow is essential to compensate for the unevenly distributed heat flow observations.Obtaining a regional heat flow map is also important for the exploration of geothermal energy, which has become the focus of development as a clean and low-carbon renewable energy [2].
Thus far, the proposed terrestrial heat flow maps of China are all obtained by the interpolation of the published heat flow measurements [1,3].The observations of the geothermal heat flow data are highly uneven, which is a severe challenge for data interpolation in Energies 2023, 16, 1620 2 of 14 areas with significant data loss.Machine learning methods can compensate for the sparse distribution of the measured data by using geological and geophysical features related to the geothermal heat flow (such as the crustal thickness, the Moho temperature, the thermal lithospheric thickness, and the distance to hotspots).Using machine learning techniques, the samples can be trained under the constraint of the relevant geological and geophysical features to establish a complex relationship between heat flow and these features.Finally, a robust geothermal heat flow prediction model can be obtained.The machine learning method used in this paper is the gradient boosted regression tree (GBRT), which is a mature, supervised machine learning technique initially proposed by Friedman (2001).The GBRT produces competitive, highly robust, interpretable procedures for both regression and classification, which is especially appropriate for mining data that is less than clean [4].It has been widely applied to various fields, such as the prediction of solar power generation at multiple stations [5], the prediction of binding affinity between protein and RNA [6], and the prediction of geothermal heat flux [7].The cross-validation in in a prediction of geothermal heat flux in Greenland has proven that the prediction performance of the GBRT is better than that of the linear regression and the constant predictor [7].In that prediction, there are only nine direct geothermal heat flow measurements in Greenland [7].Although the Gaussian kernel function is used to supplement more than 60 geothermal measurements, it will inevitably reduce the prediction accuracy.The same method was used to predict the geothermal heat flow in Antarctica [8], in which the influence of a single feature on the prediction results was discussed.However, predictions in Antarctica and Greenland faced the same issue of having very few heat flow observations.In addition, the GBRT was applied to the prediction of the terrestrial heat flow in the Haihe River Basin in North China [9], which suggested that the GBRT was superior to the random forests (RF) and the extra tree regressor (ETR) methods in predicting the terrestrial heat flow.
Comprehensive geophysical, petrological, and geochemical data have been collected in North China, but the geothermal heat flow measurements are still inadequate.It is an ideal study area to predict the regional geothermal heat flow using the integrated constraints from the extensive geological and geophysical datasets.In this paper, the geothermal heat flow data were trained using the GBRT approach under the constraints of 12 kinds of geological and geophysical features based on the published observations of heat flow data in North China and the adjacent area.The importance of different geological and geophysical features was calculated and analyzed.The prediction performance of the GBRT was compared with that of the kriging interpolation, the minimum curvature, and the 3D interpolation methods through the prediction performance analysis.Finally, the distribution of geothermal heat flow was discussed in relation to the regional tectonics, the hot spring locations, and the lithospheric modification.

Data
In the following study, we present 12 geological and geophysical datasets that are used to construct the GBRT prediction model.

Geothermal Heat Flow
We integrated all the published heat flow data which included 1230 heat flow measurements in China [10][11][12].Among them, 716 measurements were located in the NCC.After dividing and averaging them onto a 0.25 • × 0.25 • grid, the final number of data points was 379.The data distribution is shown in Figure 1.Generally, the data distribution was uneven, with dense measurements in the south and eastern coastal areas and sparse measurements in the western and northern areas.The histogram in Figure 2. shows that the heat flow in the study area had a maximum value of 140.7 mW/m 2 , and a minimum value of 26.4 mW/m 2 , with an average value of 61.5 mW/m 2 .

Geological and Geophysical Information
The use of reliable data was crucial for the prediction results, with the first step being determination of the features to be used [8].In the process of constructing the GBRT prediction model, we selected 12 features (Table 1) with strong relation to the geothermal heat flow to participate in the training.Among them, the Bouguer gravity anomaly is inferred from the Global Ocean Gravity Model [13] and the EGM2008 model [14].The Global Ocean Gravity Model was constructed by combining the new radar altimeter measurements from the CryoSat-2 and the Jason-1 satellites.The EGM2008 model was constructed using the GRACE satellite tracking data (ITG-GRACE03S position coefficient information and corresponding covariance information), the satellite altimetry data, and the ground gravity data.The magnetic anomaly data were derived from the satellite aeromagnetic data [15].The upper crustal thickness, the mid-crustal thickness, the crustal thickness, the upper mantle density, and the Depth to Moho data were all from CRUST1.0 [16], a global 3D crustal model constructed from seismic wave velocities, which has higher accuracy

Geological and Geophysical Information
The use of reliable data was crucial for the prediction results, with the first ste determination of the features to be used [8].In the process of constructing the GB diction model, we selected 12 features (Table 1) with strong relation to the geo heat flow to participate in the training.Among them, the Bouguer gravity anoma ferred from the Global Ocean Gravity Model [13] and the EGM2008 model [14].Th Ocean Gravity Model was constructed by combining the new radar altimeter m ments from the CryoSat-2 and the Jason-1 satellites.The EGM2008 model was con using the GRACE satellite tracking data (ITG-GRACE03S position coefficient info and corresponding covariance information), the satellite altimetry data, and the gravity data.The magnetic anomaly data were derived from the satellite aerom data [15].The upper crustal thickness, the mid-crustal thickness, the crustal thickn upper mantle density, and the Depth to Moho data were all from CRUST1.0 [16], 3D crustal model constructed from seismic wave velocities, which has higher a

Geological and Geophysical Information
The use of reliable data was crucial for the prediction results, with the first step being determination of the features to be used [8].In the process of constructing the GBRT prediction model, we selected 12 features (Table 1) with strong relation to the geothermal heat flow to participate in the training.Among them, the Bouguer gravity anomaly is inferred from the Global Ocean Gravity Model [13] and the EGM2008 model [14].The Global Ocean Gravity Model was constructed by combining the new radar altimeter measurements from the CryoSat-2 and the Jason-1 satellites.The EGM2008 model was constructed using the GRACE satellite tracking data (ITG-GRACE03S position coefficient information and corresponding covariance information), the satellite altimetry data, and the ground gravity data.The magnetic anomaly data were derived from the satellite aeromagnetic data [15].The upper crustal thickness, the mid-crustal thickness, the crustal thickness, the upper mantle density, and the Depth to Moho data were all from CRUST1.0 [16], a global 3D crustal model constructed from seismic wave velocities, which has higher accuracy and resolution than the previous CRUST5.1 [17] and CRUST2.0[18] models.The topographic relief data were referenced by combining the existing depth soundings with the high-resolution ocean gravity information from the Geosat and the ERS-1 spacecraft to produce the digital ocean bathymetry maps with horizontal resolution of 1 to 12 km [19].For the Moho temperature and the thermal lithosphere thickness, we used the result of Xia et al. (2020) [3].We selected the ten most typical volcanoes in China [20].The locations of the hotspots were from Anderson (2016) [21].These geophysical and geological features were processed in the same way as the geothermal heat flow data, and the above data were interpolated onto a 0.25 • × 0.25 • grid.In addition, for the volcanoes and the hotspots, we added the distance from the data points to the nearest volcano and hotspot as features for training.

Method
The GBRT prediction method does not completely depend on the number and the spatial distribution of the measured heat flow values, and there have been many successful applications in the heat flow prediction [7][8][9].Therefore, in this study, we choose the GBRT method to predict the heat flow in the NCC.The GBRT is a supervised machine learning method initially proposed by Friedman (2001), which is a strong learner weighted by the ensemble of multiple weak learners [4,22].Each weak learner is an individual regression decision tree, and each subtree learns in the direction of the negative gradient of the residuals of the previous tree.By iterating continuously to reach the minimum value of the loss function, the GBRT prediction model is constructed.The algorithm mainly includes the growing of regression decision trees, pruning of regression decision trees, and the ensemble of regression decision trees.

Growing a Regression Decision Tree
Growing a regression decision tree is mainly based on the minimum mean square deviation, finding the best split variable j and the split point d in the training sample set D, and then recursively constructing the binary tree [23].The split variable j represents the jth feature.The splitting point d is the splitting position with the minimum mean square error when splitting according to the jth feature.The training sample set [4] is: where x i is the input variable, and y i represents the corresponding label, in which i = 1, 2, ..., N.
First, a feature is selected to sort N training samples by the feature value, and then the training samples are divided into two subsets, D L and D R , by using the feature value from the smallest to the largest as the split point.The optimal split pair (j, d) is solved by [23] min where c L represents the average value of all y in the left subset D L , and c R stands for the average value of all y in the right subset D R .The optimal split result under the feature is determined by minimizing the squared error.Finally, all features are traversed, and the above process is repeated to obtain the optimal split variable j and the optimal split point d.
Then we continue to cut the left and right subsets D L and D R in the same way and increase the branches of the regression decision tree until the conditions are met so as to generate an unpruned regression decision tree f (x) [22] Each subset is a feature unit, and each feature unit has a fixed output value c r , which is the label mean of the corresponding subset D r .

Pruning of Regression Decision Tree
In order to make the structure of the regression decision tree as simple as possible and to prevent overfitting, it needs to be pruned without loss of accuracy.Equation ( 4) is often used as an evaluation criterion for pruned subtrees [23,24]: where C(T) represents the mean square error of the node sample set; |T| is the number of leaf nodes in subtree T; α is a parameter used to balance the data fitting and complexity of the model: where t is any node inside a generated regression decision tree, and T t is the subtree with t as the root node.
After the value of α is calculated by Equation ( 5), the node with the smallest value of α is pruned to obtain the new subtree after one pruning.The above process is repeated until the root node is left, and then a sequence of regression decision trees {T 0 , T 1 , T 2 , . . ., T n } is obtained, where T 0 is the initial unpruned regression decision tree.Finally, the subtree with the smallest evaluation criteria C α (T) is selected as a complete regression decision tree in the decision tree sequence.

Ensemble of Regression Decision Trees
After pruning the regression decision tree, it is possible to learn based on its residuals, and then integrate the regression decision tree to obtain a strong learner.Firstly, the strong learner F 0 (x) is initialized with the mean square error as the loss function L [4,25]: where N represents the number of training samples, and β is the coefficient of the weak learner.Equation ( 7) calculates the negative gradient value of the loss function with F(x) as a variable after each iteration, which is the negative derivative value y i of the loss function at F m-1 (x).The value of m is 1, 2, ..., M. M is the total number of regression trees.
After each training session, the optimal solution α m of the parameters is solved by the negative gradient value y i and the current trained regression decision tree h (x i ; α) [4].
Subsequently, execute the line search of the steepest descent method to solve the coefficient β m of the regression decision treen [4]: In this way, an iteration is completed.Subsequently, the regression decision tree is updated according to the regression decision tree obtained from the iteration, which is the last result added to the next iteration: By repeating the above process and iterating for M times, the GBRT strong learner prediction model integrated by M regression decision trees is obtained.The overall process of the GBRT algorithm is summarized in Figure 3.
Energies 2023, 16, x FOR PEER REVIEW 6 of 14 After each training session, the optimal solution αm of the parameters is solved by the negative gradient value i y and the current trained regression decision tree h (xi; α) [4].
Subsequently, execute the line search of the steepest descent method to solve the coefficient βm of the regression decision treen [4]: (9) In this way, an iteration is completed.Subsequently, the regression decision tree is updated according to the regression decision tree obtained from the iteration, which is the last result added to the next iteration: By repeating the above process and iterating for M times, the GBRT strong learner prediction model integrated by M regression decision trees is obtained.The overall process of the GBRT algorithm is summarized in Figure 3.

Prediction Performance Analysis
We randomly divided the 379 geodetic heat flow data samples in the NCC region into two data sets, of which 80% of the data volume was used as the training set and the remaining 20% as the validation set (Figure 4).Assuming that the heat flow values at the locations of the data points in the validation set were unknown, the remaining 20% of the heat flow data were predicted using four methods based on the training data set, namely the GBRT, the kriging interpolation, the minimum curvature interpolation and the 3D interpolation.Finally, the obtained results were compared and analyzed with the observed values in the validation set.
In order to assess the accuracy of the heat flow prediction, the uncertainty of the prediction results was quantified by using mean absolute error (MAE), and normalized mean square error (N_RMSE).MAE represents the average of the absolute values of the deviation of all individual observations from the arithmetic mean, with smaller values representing a smaller deviation of the data values from the mean; N_RMSE is a simplified transformation of the expression of mean square error.The value is between 0 and 1, with

Prediction Performance Analysis
We randomly divided the 379 geodetic heat flow data samples in the NCC region into two data sets, of which 80% of the data volume was used as the training set and the remaining 20% as the validation set (Figure 4).Assuming that the heat flow values at the locations of the data points in the validation set were unknown, the remaining 20% of the heat flow data were predicted using four methods based on the training data set, namely the GBRT, the kriging interpolation, the minimum curvature interpolation and the 3D interpolation.Finally, the obtained results were compared and analyzed with the observed values in the validation set.
minimum curvature interpolation and the 3D interpolation methods are 10.10, 10.59, 1 and 12.52, respectively, showing an increase of errors.The normalized mean square e (N_RMSE) was 0.22, 0.23, 0.24 and 0.30, respectively, which also indicated an increa errors.The difference between the maximum value and the minimum value was 0.0 is obvious that the GBRT had the best predictive performance among the four meth while the 3D interpolation showed the worst performance.In order to assess the accuracy of the heat flow prediction, the uncertainty of the prediction results was quantified by using mean absolute error (MAE), and normalized square error (N_RMSE).MAE represents the average of the absolute values of the deviation of all individual observations from the arithmetic mean, with smaller values representing a smaller deviation of the data values from the mean; N_RMSE is a simplified transformation of the expression of mean square error.The value is between 0 and 1, with smaller values representing smaller errors.The calculations of MAE and N_RMSE are shown in Equations ( 11) and ( 12) [26], respectively: where y i is the ith heat flow observation value in the validation set, y is the average heat flow observation value in the validation set, ŷi is the ith heat flow predicted value in the validation set, and n is the number of heat flow observed in the validation set.
The results of the linear correlation analysis between the heat flow measurements in the validation set and the predicted values of the four methods; the GBRT, the kriging interpolation, the minimum curvature interpolation, and the 3D interpolation; are shown in Figure 5.The average absolute errors (MAE) of the GBRT, the kriging interpolation, the minimum curvature interpolation and the 3D interpolation methods are 10.10, 10.59, 11.01 and 12.52, respectively, showing an increase of errors.The normalized mean square error (N_RMSE) was 0.22, 0.23, 0.24 and 0.30, respectively, which also indicated an increase in errors.The difference between the maximum value and the minimum value was 0.08.It is obvious that the GBRT had the best predictive performance among the four methods, while the 3D interpolation showed the worst performance.

Calculating Feature Importance
The geological and the geophysical information features used in this study were care fully selected during the training of the samples based on the calculation of importance (Figure 6).The importance of each feature is the magnitude of the effect of each feature component on the classification or the regression, and is calculated by summing up the splitting mass of each feature in the decision tree.Since this study is a regression problem the splitting mass is the decreasing value of the regression error for each split.The calculation is a cumulative summation of the splitting masses of each feature component in the decision tree.The sum of the splitting masses of the sth feature component is assumed to be qs, and the splitting masses are normalized after summing up the feature components as follows [4]: where w represents the importance of each feature and p represents the dimensionality o the feature vector.The larger the value, the greater the importance of the corresponding feature, and the greater the contribution of the corresponding feature to the regression.

Calculating Feature Importance
The geological and the geophysical information features used in this study were carefully selected during the training of the samples based on the calculation of importance (Figure 6).The importance of each feature is the magnitude of the effect of each feature component on the classification or the regression, and is calculated by summing up the splitting mass of each feature in the decision tree.Since this study is a regression problem, the splitting mass is the decreasing value of the regression error for each split.The calculation is a cumulative summation of the splitting masses of each feature component in the decision tree.The sum of the splitting masses of the sth feature component is assumed to be q s , and the splitting masses are normalized after summing up the feature components as follows [4]: where w represents the importance of each feature and p represents the dimensionality of the feature vector.The larger the value, the greater the importance of the corresponding feature, and the greater the contribution of the corresponding feature to the regression.The correlation between the 12 geological and geophysical features and the geothermal heat flow was different.The distance to hotspot had the highest weighting of 0.13, thus it had the strongest correlation with the geothermal heat flow.The thickness of the upper crust and the Moho temperature also showed a strong correlation with a weighting of 0.11 and 0.10, respectively.Since the heat flow measurements were relatively distant from the craters [27][28][29][30][31], the distance to the volcano only showed a small weight value of 0.08 in this study.Two geophysical features, the Bouguer gravity anomaly and the satellite aeromagnetic anomaly, had the weakest correlation with the geothermal heat flow, with feature weights of 0.06 and 0.05, respectively, indicating that the two parameters, the density and the magnetization, were less intrinsically linked to the geothermal heat flow.The weight share and the ranking of each geological and geophysical feature are shown in Figure 6.

Predicting NCC Heat Flow
The heat flow map with a resolution of 0.25° was obtained in the NCC by the GBRT prediction model developed in this study, as shown in Figure 7.The 379 heat flow observations were divided into three intervals: >80 mW/m 2 , 50~80 mW/m 2 , and <50 mW/m 2 , corresponding to the red, green, and blue dots in Figure 7, respectively.Figure 7 shows that the predicted heat flow values agreed well with the observations.The heat flow in the study area was in the range of 50-80 mW/m 2 .Generally, the heat flow values in the eastern coastal area of the NCC and the northeastern Tibet Plateau were higher than those in the central part.The highest heat flow value was located in the northern part of the Tibet Plateau, with a value of 133 mW/m 2 .Several relatively scattered and small-scale regions with high heat flow are located in the central part of Trans-North China Orogenic Belt (TNCO), the Eastern Block and the southeastern Sulu Belt, with heat flow values greater than 80 mW/m 2 .The low heat flow areas were mainly distributed in the eastern Qinling-Dabie Belt, the Yangtze Block and on the boundary of the TNCO and Central Asian Orogenic Belt, with heat flow values of less than 50 mW/m 2 .The lowest value was about 26 mW/m 2 , which appeared in the northwestern part of Yangtze Block.The correlation between the 12 geological and geophysical features and the geothermal heat flow was different.The distance to hotspot had the highest weighting of 0.13, thus it had the strongest correlation with the geothermal heat flow.The thickness of the upper crust and the Moho temperature also showed a strong correlation with a weighting of 0.11 and 0.10, respectively.Since the heat flow measurements were relatively distant from the craters [27][28][29][30][31], the distance to the volcano only showed a small weight value of 0.08 in this study.Two geophysical features, the Bouguer gravity anomaly and the satellite aeromagnetic anomaly, had the weakest correlation with the geothermal heat flow, with feature weights of 0.06 and 0.05, respectively, indicating that the two parameters, the density and the magnetization, were less intrinsically linked to the geothermal heat flow.The weight share and the ranking of each geological and geophysical feature are shown in Figure 6.

Predicting NCC Heat Flow
The heat flow map with a resolution of 0.25 • was obtained in the NCC by the GBRT prediction model developed in this study, as shown in Figure 7.The 379 heat flow observations were divided into three intervals: >80 mW/m 2 , 50~80 mW/m 2 , and <50 mW/m 2 , corresponding to the red, green, and blue dots in Figure 7, respectively.Figure 7 shows that the predicted heat flow values agreed well with the observations.The heat flow in the study area was in the range of 50-80 mW/m 2 .Generally, the heat flow values in the eastern coastal area of the NCC and the northeastern Tibet Plateau were higher than those in the central part.The highest heat flow value was located in the northern part of the Tibet Plateau, with a value of 133 mW/m 2 .Several relatively scattered and small-scale regions with high heat flow are located in the central part of Trans-North China Orogenic Belt (TNCO), the Eastern Block and the southeastern Sulu Belt, with heat flow values greater than 80 mW/m 2 .The low heat flow areas were mainly distributed in the eastern Qinling-Dabie Belt, the Yangtze Block and on the boundary of the TNCO and Central Asian Orogenic Belt, with heat flow values of less than 50 mW/m 2 .The lowest value was about 26 mW/m 2 , which appeared in the northwestern part of Yangtze Block.

Discussion
Although the observation points of heat flow within the study area were sparse and extremely unevenly distributed, we still obtained plausible heat flow values under the constraints of the 12 geological and geophysical features.The GBRT has been shown to outperform the kriging interpolation, the minimum curvature interpolation, and the 3D interpolation in the prediction performance analysis.The major difference between the GBRT and the interpolation is that multiple geological and geophysical information features can be used to constrain the prediction of the GBRT, which fully takes into account the tectonic background of the study area.The heat flow map obtained from the kriging in Xia et al. 2020 showed a narrow band of high heat flow concentration extending from the central and northern Tibetan Plateau, through the southeastern WB, across the central TNCO to the EB, with heat flow values of 70-90 mW/m 2 .This agrees well with our predication.
The temperature of hot springs is a direct indication of the geothermal heat flow [35][36][37][38][39][40].To evaluate the accuracy of our prediction results, we studied the distribution of hot springs by dividing the regions of hot springs into three intervals based on their temperature: the high-temperature hot springs (T ≥ 75 °C), the medium-temperature hot springs (50 °C ≤ T < 75 °C), and the low-temperature hot springs (25 °C ≤ T < 50 °C) [38] (Figure 8a).The hot springs in the study area were mainly medium-and low-temperature hot springs with temperatures less than 75°C.Only a few hot springs showed temperatures of greater than 75 °C.We found that the distribution of hot springs showed a certain regularity, with medium-to low-temperature hot springs mainly exposed in areas of low heat flow in the central north and south of the study area, while the high-temperature hot springs are exposed in areas with higher heat flow values.This verified the robustness of our prediction model.

Discussion
Although the observation points of heat flow within the study area were sparse and extremely unevenly distributed, we still obtained plausible heat flow values under the constraints of the 12 geological and geophysical features.The GBRT has been shown to outperform the kriging interpolation, the minimum curvature interpolation, and the 3D interpolation in the prediction performance analysis.The major difference between the GBRT and the interpolation is that multiple geological and geophysical information features can be used to constrain the prediction of the GBRT, which fully takes into account the tectonic background of the study area.The heat flow map obtained from the kriging interpolation in Xia et al. 2020 showed a narrow band of high heat flow concentration extending from the central and northern Tibetan Plateau, through the southeastern WB, across the central TNCO to the EB, with heat flow values of 70-90 mW/m 2 .This agrees well with our predication.
The temperature of hot springs is a direct indication of the geothermal heat flow [35][36][37][38][39][40].To evaluate the accuracy of our prediction results, we studied the distribution of hot springs by dividing the regions of hot springs into three intervals based on their temperature: the high-temperature hot springs (T ≥ 75 • C), the medium-temperature hot springs (50 • C ≤ T < 75 • C), and the low-temperature hot springs (25 8a).The hot springs in the study area were mainly medium-and low-temperature hot springs with temperatures less than 75 • C.Only a few hot springs showed temperatures of greater than 75 • C. We found that the distribution of hot springs showed a certain regularity, with medium-to low-temperature hot springs mainly exposed in areas of low heat flow in the central north and south of the study area, while the high-temperature hot springs are exposed in areas with higher heat flow values.This verified the robustness of our prediction model.
Due to the remote response of the collision of the Cenozoic Indian plate with the Eurasian plate in the western part of the NCC, the overall elevation in and around the Tibetan Plateau was uplifted.The thickness of the thermal lithosphere in these regions was relatively thin compared to its surrounding areas [38].The southeastern coastal area was subjected to the Mesozoic subduction of the Paleo-Pacific plate underneath the Eurasian plate [3].The heat flow in the above two regions (the red regions in Figure 8b) was high (>80 mW/m 2 ) and the olivine-Mg# was low (<90), indicating that they had been severely reworked during the long-term plate activity, and that the original lithospheric mantle components had been strongly modified compared with the surrounding areas [3,41].As a result, the degree of fracture/fault development was significantly high, which provided the conditions for the formation of heat-conducting channels.The heat from the Earth's interior was transported upward along the channels, producing high heat flow values at the surface.Correspondingly, the high-temperature hot springs and the medium-to-low-temperature hot springs were concentrated in the area of high heat flow values.Although the northern part of the TNCO is influenced by the Paleozoic closure of the Paleo-Asian Ocean, and the Qinling-Dabie orogen is influenced by the Triassic subduction of the Yangtze Block, the heat flow values in these regions were relatively low, with relatively high olivine-Mg# (90-92).The degree of the fracture/fault development was also relatively low.As a result, these areas were relatively intact.Only several small-scale high heat flow areas appeared in the central part of the TNCO due to the influence of the Cenozoic extensional tectonics.
Due to the scarce distribution of heat flow measurements in Greenland and Antarctica, the Gaussian kernel function was used to supplement the sample set [7,8].In North China, the data coverage had reached two thirds of the study area, thus it was not necessary to supplement the sample set.It is generally accepted that interpolation methods are highly dependent on the quantity and the spatial distribution of heat flow measurements, which were major challenges for previous heat flow maps in mainland China [1].The comparison of the predicted heat flow map with the plate tectonics, the olivine-Mg#, and the hot spring distribution in North China has justified the prediction from the GBRT method.

Conclusions
In this study, a new heat flow map of North China was obtained by the GBRT method.The prediction performance analysis result showed that this heat flow map was more reasonable than those derived from the kriging interpolation, the minimum curvature interpolation and the 3D interpolation methods.The new heat flow map of the NCC elucidates the intrinsic relationship between the geothermal heat flow and the plate tectonics, the olivine-Mg#, and the hot spring distribution.
We found that the eastern margin of the Tibet Plateau and the eastern coastal areas were of high geothermal heat flow, which were affected by the collision of the Cenozoic Indian plate to the Eurasian plate and the subduction of the Mesozoic Pacific plate underneath the Eurasian plate, respectively.The NCC was largely destructed, with fractures developed to form heat-conducting channels, showing high terrestrial heat flow values and high temperatures of exposed hot springs.Conversely, the northern part of the TNCO and the southmost NCC exhibited low heat flow values due to a low degree of lithospheric modification and less fault/fracture development, where only the medium and the low temperature hot springs were exposed.
By comparing the predicted heat flow map and the location of heat flow observations in North China, we found that the GBRT could still obtain a reliable result under the constraint of geological and geophysical information in the region where the observations were scarce and unevenly distributed.The regions of different heat flow values mentioned above possessed only some discontinuous and scattered heat flow observation points, but still showed reasonable heat flow patterns consistent with the plate tectonics, the olivine-Mg#, and the hot spring distribution.The high heat flow values obtained in the study area provide effective indicators for the development of geothermal energy in China.This is the first terrestrial heat flow map derived from machine learning method in North China, using multiple geological and geophysical datasets.From a global perspective, the global heat flow distribution can be supplemented and the global heat flow resolution can be improved.

Figure 1 .
Figure 1.Map of heat flow measurements in North China.

Figure 2 .
Figure 2. Histogram of heat flow values.

Figure 1 .
Figure 1.Map of heat flow measurements in North China.

Figure 1 .
Figure 1.Map of heat flow measurements in North China.

Figure 2 .
Figure 2. Histogram of heat flow values.

Figure 2 .
Figure 2. Histogram of heat flow values.

Figure 3 .
Figure 3. Flowchart of the GBRT algorithm, where D(yi, xi) represents the training sample.DL and DR is the left and right subset obtained by optimal splitting of D, respectively.DLL, DLR, DRL, and DRR are the left and right subsets obtained by further optimal splitting of DL and DR.{T0, T1, T2, …, Tn} signify the sequence of pruned regression decision trees.h (xi; α) is the subtree with the smallest evaluation criterion selected, and is also the regression decision tree of current training.αm stands for the parameter of the regression decision tree, βm represents the coefficient of the regression decision tree (m=1, 2, …, M), M denotes the number of regression trees, and FM(x) is the strong learner of the Mth iteration.

Figure 3 .
Figure 3. Flowchart of the GBRT algorithm, where D(y i , x i ) represents the training sample.D L and D R is the left and right subset obtained by optimal splitting of D, respectively.D LL , D LR , D RL , and D RR are the left and right subsets obtained by further optimal splitting of D L and D R .{T 0 , T 1 , T 2 , . . ., T n } signify the sequence of pruned regression decision trees.h (x i ; α) is the subtree with the smallest evaluation criterion selected, and is also the regression decision tree of current training.α m stands for the parameter of the regression decision tree, β m represents the coefficient of the regression decision tree (m=1, 2, . . ., M), M denotes the number of regression trees, and F M (x) is the strong learner of the Mth iteration.

Figure 4 .
Figure 4. Distribution of the training and the validation data points.

Figure 4 .
Figure 4. Distribution of the training and the validation data points.

Figure 6 .
Figure 6.Relative importance of each feature on the trained predication model.

Figure 6 .
Figure 6.Relative importance of each feature on the trained predication model.

Figure 7 .
Figure 7. Heat flow prediction results with regional tectonics.The red solid line delineates the most significant fault in the NCC, the north-south trending Tan-Lu Fault.Gray solid lines indicate minor faults [32-34].Black dashed lines indicate the boundaries of different tectonic units.WB: Western Block; TNCO: Trans-North China Orogenic Belt; EB: Eastern Block.

Figure 7 .
Figure 7. Heat flow prediction results with regional tectonics.The red solid line delineates the most significant fault in the NCC, the north-south trending Tan-Lu Fault.Gray solid lines indicate minor faults [32-34].Black dashed lines indicate the boundaries of different tectonic units.WB: Western Block; TNCO: Trans-North China Orogenic Belt; EB: Eastern Block.

Table 1 .
The geophysical and geological features used in this study with their respective sources.