Estimating Planetary Boundary Layer Height over Central Amazonia Using Random Forest

Silva, Paulo Renato P.; Carneiro, Rayonil G.; Moraes, Alison O.; Dias-Junior, Cleo Quaresma; Fisch, Gilberto

doi:10.3390/atmos16080941


by Paulo Renato P. Silva ^1,†, Rayonil G. Carneiro ^2,*,†, Alison O. Moraes ^3,4,†, Cleo Quaresma Dias-Junior ^5,† and Gilberto Fisch ^6,†

¹ Instituto Tecnológico de Aeronáutica, São José dos Campos 12228-900, SP, Brazil

² Instituto Nacional de Pesquisas Espaciais, São José dos Campos 12227-010, SP, Brazil

³ Instituto de Aeronáutica e Espaço, São José dos Campos 12228-904, SP, Brazil

⁴ Departamento de Informática, Universidade de Taubaté, Taubaté 12020-270, SP, Brazil

⁵ Instituto Federal do Pará, Belém 66645-240, PA, Brazil

⁶ Departamento de Ciências Agrárias, Universidade de Taubaté, Taubaté 12020-270, SP, Brazil

^* Author to whom correspondence should be addressed.

^† These authors contributed equally to this work.

Atmosphere 2025, 16(8), 941; https://doi.org/10.3390/atmos16080941

Submission received: 30 May 2025 / Revised: 31 July 2025 / Accepted: 1 August 2025 / Published: 5 August 2025

(This article belongs to the Special Issue Applications of Artificial Intelligence in Atmospheric Sciences)


Abstract

This study investigates the use of Random Forest (RF), an artificial intelligence (AI) model, to estimate the planetary boundary layer height (PBLH), a key metric for air quality, weather forecasting, and climate modeling, over Central Amazonia from climatic elements data collected during the GoAmazon experiment, held in 2014 and 2015. The novelty of this study lies in estimating PBLH using only surface-based meteorological observations. This approach is validated against remote sensing measurements (e.g., LIDAR, ceilometer, and wind profilers), which are seldom available in the Amazon region. The dataset includes various meteorological features, though substantial missing data for the latent heat flux (LE) and net radiation (Rn) measurements posed challenges. We addressed these gaps through different data-cleaning strategies, such as feature exclusion, row removal, and imputation techniques, assessing their impact on model performance using the Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and r^{2} metrics. The best-performing strategy achieved an RMSE of 375.9 m. In addition to the RF model, we benchmarked its performance against Linear Regression, Support Vector Regression, LightGBM, XGBoost, and a Deep Neural Network. While all models showed moderate correlation with observed PBLH, the RF model outperformed all others with statistically significant differences confirmed by paired t-tests. SHAP (SHapley Additive exPlanations) values were used to enhance model interpretability, revealing hour of the day, air temperature, and relative humidity as the most influential predictors for PBLH, underscoring their critical role in atmospheric dynamics in Central Amazonia. Despite these optimizations, the model underestimates the PBLH values by an average of 197 m, particularly in the austral spring and early summer, when atmospheric conditions are more variable. These findings emphasize the importance of robust data preprocessing and highlight the potential of ML models for improving PBLH estimation in data-scarce tropical environments.

Keywords: central Amazon; machine learning; boundary layer estimation; low-cost meteorological modeling

1. Introduction

The planetary boundary layer (PBL) is the lowest part of the atmosphere, directly influenced by its interaction with the Earth’s surface on hourly and subdiurnal scales [1,2]. This region plays a critical role in governing weather patterns, air quality, and the dispersion of pollutants. The planetary boundary layer height (PBLH) is a key parameter for characterizing the vertical structure of the PBL, providing crucial information on the depth of turbulent mixing, convective transport, and pollutant dispersion. In addition, the PBLH is directly associated with cloud generation and development, energy and mass exchanges between the surface and the atmosphere, and the hydrological cycle [3].

The role of the PBL is central to several applications, such as weather forecasting, numerical modeling of the atmosphere, climate change studies, and air quality research [4,5]. The transition of the PBL from the nocturnal stable phase to the diurnal convective phase, for example, is crucial for the evolution of atmospheric convection and cloud formation (shallow or deep convection), especially in tropical regions [6,7]. Understanding the temporal behavior of the PBLH, through continuous measurements and modeling, is essential to represent with more fidelity the processes of vertical mixing and surface–atmosphere coupling.

Traditional methods for estimating PBLH rely on numerical and theoretical approaches grounded in turbulence theory and thermodynamics. Foundational works such as Kaimal and Finnigan [1] and Stull [2] introduced definitions based on the Richardson number and potential temperature gradients. These remain embedded in large-eddy simulations and operational models, often supplemented by vertical mixing schemes such as those by Troen and Mahrt [8] or Noh et al. [9]. However, such methods are sensitive to parameter tuning and may fail under weak forcing or complex terrain.

Observation-based methods, such as radiosondes and LIDAR (Light Detection and Ranging), provide valuable PBLH estimates but are constrained by cost, coverage, and launch frequency. Although LIDARs and ceilometers allow high-resolution continuous monitoring, their reliance on backscatter properties and sensitivity to aerosol conditions can limit robustness. These challenges motivate the use of machine learning (ML), which has demonstrated success in capturing complex, nonlinear environmental relationships [10,11].

Machine learning techniques have opened new opportunities for estimating environmental variables by capturing complex, nonlinear relationships in large datasets [12,13]. In the context of PBLH estimation, early ML applications focused on tree-based models such as Random Forest (RF), known for their robustness, interpretability, and resistance to overfitting [14,15]. Krishnamurthy et al. [16], for instance, used RF with LIDAR and meteorological inputs, achieving strong performance compared to classical methods.

Recent advancements have introduced more sophisticated architectures, including deep learning and hybrid ensemble models, often relying on remote sensing or reanalysis datasets to improve accuracy [17,18,19,20]. While these approaches represent the state of the art, they typically require high-resolution inputs and substantial data availability.

Our study takes a distinct approach by using exclusively surface-based meteorological observations to estimate PBLH in the Central Amazon. We evaluate the performance of a Bayesian-optimized Random Forest model under several preprocessing and imputation strategies, compare it with traditional ML models (Linear Regression and Support Vector Regression) and state-of-the-art models (LightGBM, XGBoost, and a Deep Neural Network (DNN)), and incorporate explainability tools to provide insight into the model’s behavior. This framework addresses the scarcity of in situ remote sensing instruments in the region and offers a cost-effective, scalable alternative for modeling tropical boundary layer dynamics. By training on variables such as surface fluxes, temperature, humidity, and wind speed, we aim to assess the feasibility of RF for this task, benchmark it against baseline models, and identify the most influential meteorological drivers of PBLH variability. While not intended to compete with large-scale, data-intensive approaches, this work demonstrates the potential of interpretable and operationally viable ML solutions in data-scarce environments.

The remainder of this report is structured as follows: Section 2 describes the dataset, preprocessing steps, and machine learning methods employed. Section 3 presents the results, including model performance across different strategies, feature importance analysis, and explainability insights. Finally, Section 4 discusses the implications of our findings and concludes with suggestions for future research directions.

2. Materials and Methods

2.1. Data

The dataset used for this study was measured during the Green Ocean Amazon (GoAmazon) 2014/5 project (https://www.arm.gov/research/campaigns/amf2014goamazon, accessed on 9 May 2025), which provides a comprehensive collection of meteorological and atmospheric measurements essential for estimating the PBLH values. These data were collected at the T3 research station (03°12′36″ S, 60°36′00″ W), located north of the municipality of Manacapuru, in the state of Amazonas, Brazil, approximately 9.5 km from the nearest urban area and about 11.5 km from the left bank of the Solimões River, near the confluence with the Manacapuru River in the Central Amazon basin. The T3 site is situated in a pasture area surrounded by native forest with an average canopy height of approximately 35 m [21]. The climate is closely linked to rainfall patterns, with distinct wet (February to May) and dry (August to October) seasons. Temperature variation is minimal, around 2 °C, and the energy partitioning between sensible and latent heat fluxes is largely influenced by rainfall and soil moisture. The wind speed in Central Amazonia is low, between 2.0 and 4.0 m s⁻¹. In this work, we use the following variables: air temperature, relative humidity, wind velocity, net radiation (Rn), and the surface turbulent fluxes of sensible (H) and latent (LE) heat. These variables are directly associated with the heating of the soil and air and consequently with the variability of the PBLH. Table 1 summarizes the instruments used for obtaining the features used in this work, along with the measurements and sensor resolution.

The PBLH values, used as an observational reference (ground truth) for model validation, were measured by a CL31 ceilometer (Vaisala Inc., Helsinki, Finland). This LIDAR remote sensing instrument operates with pulsed diode laser technology in the near-infrared (900–1100 nm), emitting vertical pulses autonomously and continuously. With a vertical range of up to 7700 m, the ceilometer records aerosol backscatter profiles at high temporal resolution (2 s measurement interval and 16 s sampling rate), allowing detailed monitoring of the PBL evolution throughout the day. The PBLH was estimated based on the layer of scattered particles derived from the backscatter profiles, assuming that this layer represents the daytime PBLH, given that the entrainment zone is typically shallow. This approach provides reliable, continuous PBLH data that are critical for training and validating our ML model.

The ceilometer database used has a 30 min time resolution, spanning two years (2014–2015), which allows for capturing detailed temporal variations in PBLH dynamics throughout the day. This level of detail helps reveal rapid changes influenced by factors like solar heating and atmospheric stability. The data are organized across multiple files (see Table 2), each containing a different feature or desired output. Although some data gaps exist due to occasional sensor downtimes or recording errors, these missing values were addressed using appropriate techniques, as outlined in the following sections. A sample of the 35,328-row dataset used for training the ML model is also presented, illustrating typical values found in the data.

2.2. Preliminary Analysis

We started the study with a preliminary analysis of the collected dataset. Table 3 provides descriptive statistics of the key features under consideration: measures of central tendency (mean), dispersion/variability (standard deviation), and range (minimum and maximum values) for each variable. Additionally, the table highlights the presence of missing values across different columns, which were addressed during the preprocessing phase. These descriptive statistics give an overview of the data’s distribution, helping to assess its suitability for model training and informing decisions on potential data transformations or imputation methods required for handling missing or outlier values. The characteristics of the dataset, including time resolution and units of measurement for each feature, were taken into account to ensure that the data are properly prepared for use in the RF model.

We then proceed to a deeper feature analysis, where we examine the distribution of each feature in detail. The histograms in Figure 1 reveal the frequency distribution of each feature across the dataset. We observe that most features, such as air temperature and relative humidity, display distributions with central tendencies, while some features, like H, LE, and Rn, exhibit wider ranges, indicating variability in atmospheric conditions. Additionally, features such as wind velocity exhibit a skewed distribution, reflecting the predominance of lower wind speeds, typically around 2–4 m s⁻¹. This preliminary analysis helps us understand the spread and central values of each feature, providing insight into the atmospheric conditions that influence PBL dynamics.

In addition to the original dataset features shown in Figure 1, we incorporated the hour of the day (4th column of Table 2) as a feature to capture daily patterns in PBL dynamics. In order to facilitate numerical processing, these values were converted to decimal format, allowing the model to work with continuous values instead of date–time objects. Additionally, a binary feature was engineered to distinguish between daytime (from 6 AM up to 6 PM) and nighttime (6 PM up to 6 AM), enhancing the model’s ability to account for temporal variations in PBL behavior. These additional features were designed to improve the model’s capacity to capture both diurnal and nocturnal dynamics in an effective way.
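The two time-derived features described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `hour_features` is hypothetical, and treating 06:00 as daytime and 18:00 as nighttime at the exact boundaries is an assumption, since the text does not state how the boundary values are handled.

```python
from datetime import datetime


def hour_features(timestamp: datetime) -> tuple[float, int]:
    """Return (decimal hour, daytime flag) for one observation time."""
    # Convert the clock time to a continuous decimal hour, e.g. 14:30 -> 14.5.
    decimal_hour = (
        timestamp.hour + timestamp.minute / 60 + timestamp.second / 3600
    )
    # Assumption: daytime covers 06:00 (inclusive) up to 18:00 (exclusive).
    is_daytime = 1 if 6 <= decimal_hour < 18 else 0
    return decimal_hour, is_daytime


print(hour_features(datetime(2014, 3, 1, 14, 30)))  # (14.5, 1)
```

With a 30 min time resolution, the decimal hour takes only values such as 0.0, 0.5, 1.0, and so on, which a tree-based model can split on directly.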

In the data preprocessing phase, several steps were implemented to enhance data quality and ensure consistency before model training. First, data cleaning was conducted to remove erroneous readings and extreme outliers, which were likely the result of instrument errors or temporary malfunctions. Figure 2, for instance, illustrates the boxplot of relative humidity values, highlighting the presence of outliers. While deviation from the central cluster, in some cases, highlights the dynamics of the feature, Figure 2 also reveals distinct outliers, which may indicate anomalies caused by sensor errors or extreme atmospheric conditions. Outliers like these were identified and removed based on statistical analysis to avoid skewing the results. Moreover, given that RF models generally do not require feature scaling, we opted not to apply feature scaling to any of the features in this study.

Missing values, primarily due to sensor inactivity, were addressed using three approaches: row removal, mean imputation, and K-Nearest Neighbors (KNN) imputation [22]. The row removal approach excluded rows with missing data entirely, while KNN imputation estimated missing values based on the closest neighbors in feature space, leveraging relationships between variables. Mean imputation followed a structured two-step process: for specific day–hour combinations with missing observations, values were imputed using the mean of available observations for the corresponding hour across all non-missing days, thus preserving temporal patterns and minimizing the impact of missing data on model performance. When no hourly mean was available, the missing values were imputed using the overall mean of the corresponding feature or target variable. These preprocessing strategies aimed to produce a cleaner and more reliable dataset, supporting robust model training and evaluation. However, it is important to acknowledge that imputing a substantial portion of a variable’s values, such as 30%, can significantly alter its statistical properties and underlying physical meaning. At this level, imputation may introduce additional uncertainty and potentially dampen the natural variability that the model is intended to capture.
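The two-step mean imputation described above (hourly mean first, overall mean as a fallback) might look like the sketch below. The function name `impute_hourly_mean` and the sample values are illustrative assumptions rather than data from the study, and KNN imputation is not covered here.

```python
from collections import defaultdict
from statistics import mean


def impute_hourly_mean(hours, values):
    """Fill None entries with the same-hour mean, else the overall mean."""
    # Step 0: collect the observed (non-missing) values for each hour.
    by_hour = defaultdict(list)
    for h, v in zip(hours, values):
        if v is not None:
            by_hour[h].append(v)
    # Fallback used when an hour has no observation on any day.
    overall = mean(v for v in values if v is not None)

    filled = []
    for h, v in zip(hours, values):
        if v is not None:
            filled.append(v)                 # keep observed value
        elif by_hour[h]:
            filled.append(mean(by_hour[h]))  # step 1: hourly mean
        else:
            filled.append(overall)           # step 2: overall mean
    return filled


# Illustrative values only (hour of day, net radiation in W m^-2).
hours = [10.0, 10.0, 10.0, 11.0, 12.0]
rn = [400.0, None, 420.0, None, 500.0]
print(impute_hourly_mean(hours, rn))  # [400.0, 410.0, 420.0, 440.0, 500.0]
```

The second entry is filled with the 10:00 mean (410.0), while the 11:00 entry has no same-hour observations and falls back to the overall mean (440.0).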

The feature correlation matrix in Figure 3 reveals several important relationships among the variables in our dataset, offering insights that are valuable for interpreting the dynamics of the PBL. Constructed using all the available data, the matrix shows strong positive correlations between Rn, air temperature, and H, suggesting that as solar energy increases, air temperature and H values also rise, which may contribute to the expansion of the PBL. Conversely, relative humidity shows a strong negative correlation with both air temperature and Rn, indicating that as the atmosphere warms and gains energy, humidity levels typically decrease. This aligns with expected meteorological patterns, where warmer, drier air during periods of high solar radiation leads to lower relative humidity and deeper PBL heights. Wind speed has moderate positive correlations with Rn, air temperature, and H, showing that higher energy conditions enhance atmospheric mixing. Overall, these correlations highlight the interactions between energy fluxes, temperature, humidity, and wind speed, underscoring their collective influence on PBL height dynamics. Despite the collinearity concern raised by the strong correlation between air temperature and relative humidity (−0.93) shown in the correlation matrix, and given that we have only six features, we chose not to remove any of them and proceeded with the complete dataset.
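Each entry of a correlation matrix such as the one in Figure 3 is typically a pairwise Pearson coefficient; the text does not name the coefficient used, so Pearson is assumed here. The sample values below are invented for illustration and are not taken from the GoAmazon dataset.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations.
    sx = sqrt(sum((a - mean_x) ** 2 for a in x))
    sy = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sx * sy)


# Illustrative values: air temperature (deg C) and relative humidity (%).
temperature = [24.0, 26.0, 28.0, 30.0, 32.0]
humidity = [95.0, 88.0, 80.0, 70.0, 62.0]
print(round(pearson(temperature, humidity), 3))  # -0.998
```

A value this close to −1 signals near-collinearity between the two inputs, which is the kind of concern the paragraph above raises for the reported −0.93.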

The final dataset includes the following variables:

Sensible Heat Flux (H): Directly impacts atmospheric buoyancy and PBL growth and depth.
Latent Heat Flux (LE): Provides information on evapotranspiration and moisture flux.
Net Radiation (Rn): Drives surface energy balance, influencing temperature, relative humidity, and atmospheric stability.
Air Temperature and Relative Humidity: Key atmospheric conditions affecting thermal stratification and mixing.
Wind Velocity: Contributes to mechanical turbulence and mixing processes.
Hour of the Day and Daytime: Captures diurnal and nocturnal patterns in PBL dynamics.
These features, along with the PBL height, form the basis of our Random Forest (RF) model for PBL estimation.

2.3. Random Forest (RF) Model

The RF algorithm is an ensemble learning method that constructs multiple decision trees during training and outputs the average prediction (for regression) or the majority vote (for classification) of all trees, as depicted in Figure 4. This algorithm is particularly well suited to the problem of estimating the PBLH due to its ability to capture complex, non-linear interactions between features without requiring extensive feature engineering [19]. Random Forest’s inherent resilience to overfitting, especially when working with a relatively small number of input features, further makes it a reliable choice for modeling the atmospheric dynamics that influence PBL behavior.
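The aggregation step mentioned above (averaging for regression, majority vote for classification) can be illustrated on its own. This sketch covers only how per-tree outputs are combined, not how the trees are trained; the function names and the per-tree values are hypothetical.

```python
from collections import Counter
from statistics import mean


def forest_regress(tree_predictions):
    """Regression forest output: the average of the per-tree predictions."""
    return mean(tree_predictions)


def forest_classify(tree_votes):
    """Classification forest output: the most frequent per-tree label."""
    return Counter(tree_votes).most_common(1)[0][0]


# Hypothetical PBLH predictions (m) from four trees for a single sample.
print(forest_regress([850.0, 910.0, 790.0, 880.0]))  # 857.5
print(forest_classify(["stable", "convective", "convective"]))  # convective
```

Averaging over many trees trained on different bootstrap samples reduces the variance of individual trees, which is the usual explanation for the resistance to overfitting noted above.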

2.4. Parameter Tuning

For this study, we carefully selected and tuned key hyperparameters of the RF model to optimize its predictive performance. The number of trees in the RF model, defined by the n_estimators parameter, was chosen to balance model accuracy and computational efficiency. We also adjusted the maximum depth of each tree (max_depth) to prevent the model from growing excessively complex, thereby reducing the risk of overfitting. Additional hyperparameters, such as the minimum samples required to split a node (min_samples_split) and the minimum samples required at a leaf node (min_samples_leaf), were fine-tuned to ensure that the model generalizes well to unseen data. In order to identify the optimal combination of hyperparameters, we employed Optuna [24,25], a Bayesian optimization framework, which enabled efficient exploration of the hyperparameter space. Optuna systematically tested a range of values for each hyperparameter and identified the configuration that minimized the cross-validation error. This automated tuning process was faster and more efficient than traditional grid or randomized searches, allowing us to build an RF model that captures the non-linear relationships within our dataset while generalizing effectively.

2.5. Evaluation Metrics

In order to assess the performance of the Random Forest model in estimating the PBLH values, we used three evaluation metrics: Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and R^{2}. RMSE is calculated as the square root of the average of the squared differences between the predicted and actual values, providing a measure of the model’s accuracy by penalizing larger errors more heavily. This metric is well suited for our regression task as it reflects the model’s ability to make precise and reliable predictions of PBLH. The RMSE is defined as follows:

RMSE = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(y_{i} - {\hat{y}}_{i})}^{2}}    (1)

where N is the number of data points, y_{i} represents the actual PBL height for the i-th observation, and {\hat{y}}_{i} is the predicted PBL height.

MAE measures the average magnitude of the prediction errors and provides an intuitive estimate of how much predictions deviate from observed values. It is defined as follows:

MAE = \frac{1}{N} \sum_{i = 1}^{N} |y_{i} - {\hat{y}}_{i}|    (2)

where y_{i} is the observed value, {\hat{y}}_{i} is the predicted value, and N is the total number of observations.

The R^{2} quantifies the proportion of variance in the observed data that is explained by the model. It is given by

R^{2} = 1 - \frac{\sum_{i = 1}^{N} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{N} {(y_{i} - \bar{y})}^{2}}    (3)

where \bar{y} is the mean of the observed values. An R^{2} value closer to 1 indicates better model performance, while values near 0 suggest limited explanatory power.
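Equations (1)–(3) translate directly into code. The sketch below implements them in plain Python; the observed and predicted PBLH values are invented for illustration and are not results from the study.

```python
from math import sqrt


def rmse(y, y_hat):
    """Equation (1): root of the mean squared difference."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def mae(y, y_hat):
    """Equation (2): mean of the absolute differences."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def r2(y, y_hat):
    """Equation (3): one minus residual over total sum of squares."""
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1 - ss_res / ss_tot


# Illustrative PBLH values in metres.
observed = [500.0, 1000.0, 1500.0, 2000.0]
predicted = [600.0, 900.0, 1400.0, 1800.0]
print(round(rmse(observed, predicted), 1))  # 132.3
print(mae(observed, predicted))             # 125.0
print(round(r2(observed, predicted), 3))    # 0.944
```

The RMSE (132.3 m) exceeds the MAE (125.0 m) because the single 200 m error is weighted more heavily after squaring.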

For model training and evaluation, we randomly sampled the dataset to ensure representation of diverse climate dynamics (including wet periods, dry periods, and El Niño events), splitting it into 80% for training (27,296 rows) and 20% for testing (6824 rows). This split ensures that the model is trained on the majority of the data while retaining a substantial portion for independent testing, allowing us to evaluate its generalization performance. Additionally, we employed cross-validation to obtain a more robust estimate of the model’s performance. By dividing the training data into multiple folds and rotating the training and validation sets across these folds, cross-validation reduces the likelihood of overfitting and provides a comprehensive assessment of the model’s predictive capability across different data subsets [26,27].
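A random 80/20 split followed by fold rotation on the training portion can be sketched as below. The total of 34,120 rows is simply the sum of the reported training and test sizes; the number of folds (five), the random seed, and the function names are assumptions, since the text does not specify them.

```python
import random


def train_test_indices(n_rows, test_fraction=0.2, seed=0):
    """Shuffle row indices and split them into train and test lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold(indices, k=5):
    """Yield (train, validation) index lists, rotating the held-out fold."""
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        validation = folds[held_out]
        train = [
            j for m, fold in enumerate(folds) if m != held_out for j in fold
        ]
        yield train, validation


train_idx, test_idx = train_test_indices(34120)
print(len(train_idx), len(test_idx))  # 27296 6824

for fold_train, fold_val in k_fold(train_idx):
    # Each training row is used for validation in exactly one fold.
    assert len(fold_train) + len(fold_val) == len(train_idx)
```

Note that a purely random split of a 30 min time series places neighboring time steps in both the training and test sets, so test errors may be somewhat optimistic compared with a split by contiguous time blocks.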

2.6. Computational Performance

The computational performance of the proposed approach is evaluated by measuring the execution time for both model training and inference. The following experiments are conducted on a system with an Intel Core i5-11400 processor, 16 GB RAM, and a GeForce RTX 4060 GPU. The average training time across the strategies is approximately 15 s. Scenarios employing KNN imputation take slightly longer, averaging 20 s, while those with no imputation or feature removal complete in less than 10 s. Inference, however, is extremely efficient, with the model predicting PBLH values in just 16.586 milliseconds per sample, demonstrating its suitability for real-time or near-real-time applications. Hyperparameter tuning with Optuna requires approximately 90 min for 100 trials, averaging around 1 min per trial. This underscores the computational cost associated with exploring the parameter space for optimal model performance.

3. Results

In this section, we present the outcomes of the RF model in estimating PBLH based on the selected surface-based features. Model performance is assessed using the three evaluation metrics, defined in Section 2.5 as measures of predictive accuracy. To provide further insight into model behavior, we also analyze the contribution of individual features to PBLH predictions. For comparison, two additional models, namely, Linear Regression (LR) and Support Vector Regression (SVR), were trained and evaluated under the same conditions. Additionally, we compared the RF model with state-of-the-art methods, including LightGBM, XGBoost, and a DNN, using standard, untuned configurations commonly reported in the literature. While these models were not fully optimized, this comparison offers a baseline reference. We acknowledge that with a larger dataset and proper hyperparameter tuning, more advanced models could likely outperform RF.

Table 4 summarizes the different data preprocessing strategies adopted in this study, each defined by a specific combination of feature exclusion, row removal, and imputation technique. These configurations were designed to evaluate model sensitivity to data quality and completeness and to reflect realistic operational scenarios commonly encountered in atmospheric datasets. The evaluation results for all models and strategies are presented in Section 3.1. The data-cleaning strategies focused on selective feature exclusion and imputation methods, primarily due to significant missing values. The ‘Row Removal’ column specifies which rows were removed from the dataset to address incomplete or partial data and ensure a cleaner input for the model, while the ‘Imputation’ column outlines the specific data imputation techniques applied. Mean imputation and nearest neighbor imputation (using three neighbors) were used to fill in missing values, depending on the strategy.

The hyperparameters for the RF model were optimized using Optuna, with the selection guided by their known influence on model complexity and performance, as supported by the prior literature and preliminary exploratory runs, resulting in the following configuration: 300 estimators, a maximum depth of 41 levels (from the root down to the deepest leaf), a minimum samples split of 2, and a minimum samples leaf of 4. This configuration was selected to balance model complexity and predictive performance, enhancing the model’s ability to make accurate PBLH predictions. The hyperparameters of the models chosen for comparison are listed in Appendix A.

3.1. Model Performance

Table 5 summarizes the evaluation metrics for each model and scenario, providing insights into their predictive performance and generalizability. Among the evaluated models, Random Forest consistently achieved the best results across multiple preprocessing strategies, with the lowest RMSE and MAE and a moderate R^{2}. In contrast, both SVR and LR models showed higher errors and lower explanatory power.

When rows with any missing values are excluded, the RMSE is 381 m, suggesting that missing data negatively affect the model’s performance. When fully missing rows are removed and mean imputation is applied, the RMSE decreases slightly to 375.9 m, indicating that while row removal may reduce the impact of missing data, the choice of imputation strategy affects model accuracy. On the other hand, when no rows are removed and mean imputation is used to fill missing values, the RMSE increases to 381.2 m. This suggests that data imputation alone is not enough to reduce the RMSE and that row removal is necessary.

Excluding LE values from the model results in an RMSE of 383.1 m with mean imputation, which is very similar to the result of mean imputation without any feature exclusion (381.2 m), showing that excluding LE does not significantly affect the model’s performance. However, when both the LE and Rn are excluded, the RMSE increases to 387 m, indicating that these features contribute to model accuracy, and their exclusion leads to a slight performance decline. Additionally, when fully missing rows are removed and mean imputation is applied after LE exclusion, the RMSE reduces slightly to 377.8 m, suggesting that removing rows with missing values and excluding the LE feature together may enhance model performance. Similar results can be noted when both LE and Rn are excluded with an RMSE of 382.2 m.

When removing fully missing rows and using mean imputation for missing PBLH values and using feature nearest neighbors imputation, the RMSE result is 381.2 m. However, when fully missing rows are removed, followed by a removal of rows with empty PBLH and nearest neighbors imputation being applied for the features, the RMSE drops to 380 m, suggesting that this strategy is slightly more effective at preserving model performance compared to other combinations. When first applying a mean imputation, and proceeding to row removals as previously stated in this paragraph, one can note an increase of 10 m in the RMSE, varying from 381.2 m to 391.2 m. However, when the nearest neighbor strategy is applied, the RMSE drops to 376.1 m.

The best result is achieved when fully missing rows are removed and mean imputation is applied, yielding an RMSE of 375.9 m. Although very simple, this strategy demonstrates that a combination of row removal and imputation improves model accuracy. In general, the results show that row removal combined with imputation yields the best outcomes, while excluding key features like LE and Rn or relying on row removal without effective imputation techniques tends to worsen the performance, as evidenced by the higher RMSE values.

3.2. Comparison with State-of-the-Art Models

To assess the robustness of our approach, we compared the performance of the RF model with three widely adopted state-of-the-art algorithms: LightGBM, XGBoost, and a DNN. These models represent some of the most powerful machine learning techniques currently used for regression tasks, particularly in environmental and geophysical applications [17,19,28].

All models were trained using the same input features and preprocessing pipeline adopted for the RF model, ensuring a consistent and fair comparison. While the RF model underwent Bayesian hyperparameter optimization using Optuna, no extensive hyperparameter tuning was conducted for the state-of-the-art models.

Table 6 summarizes the performance of the RF model compared to the selected state-of-the-art algorithms. Although all models performed similarly in terms of R^{2}, the RF model achieved the lowest RMSE (375.89 m) and MAE (241.27 m), indicating better predictive accuracy. LightGBM and XGBoost produced comparable results, while the DNN underperformed, particularly in terms of R^{2}. These outcomes reflect the strong baseline performance of the RF model under the constraints of this study.

Despite the reputation of advanced models like XGBoost and DNNs in capturing complex nonlinear patterns, their performance may have been limited by the relatively small dataset and the lack of extensive hyperparameter tuning. In contrast, the RF model benefited from Bayesian optimization and is known for its robustness under data-limited conditions.

Furthermore, RF offers notable advantages in terms of interpretability, computational efficiency, and resilience to missing data, features that are particularly valuable in operational contexts where vertical profiles or dense observational data are unavailable. While all models demonstrated moderate R^{2} values, highlighting the inherent challenge of predicting PBLH from surface-only inputs, the RF model consistently delivered a favorable trade-off between accuracy and practicality.

3.3. Statistical Analysis

To evaluate whether the observed performance differences between the RF and the other models are statistically significant, we conducted paired t-tests using the prediction errors over the test set. The results confirm that RF significantly outperforms all baseline and state-of-the-art models. The t-statistics and corresponding p-values for the comparisons are as follows: RF vs. Linear Regression (t = −23.8, p < 1 × 10⁻¹¹⁹), RF vs. SVR (t = −16.96, p < 1 × 10⁻⁶²), RF vs. LightGBM (t = −4.71, p < 2.6 × 10⁻⁶), RF vs. XGBoost (t = −6.74, p < 1.7 × 10⁻¹¹), and RF vs. DNN (t = −12.53, p < 1.3 × 10⁻³⁵). Despite relatively close RMSE and MAE values for some models, these results provide strong statistical evidence that RF achieves superior predictive performance across the dataset and support the use of RF as a reliable and explainable tool for boundary layer height estimation in data-scarce environments such as the Central Amazon.
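The t-statistic of a paired t-test is the mean of the per-sample error differences divided by its standard error. The sketch below computes it in plain Python; applying the test to absolute errors is an assumption (the text says only "prediction errors"), and the error values are invented for illustration. The p-value, which requires the Student t distribution, is not computed here.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(errors_a, errors_b):
    """Return (t statistic, degrees of freedom) for paired error samples."""
    # Differences are taken sample by sample, since both models are
    # evaluated on the same test observations.
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)  # stdev uses n - 1
    return mean(diffs) / standard_error, n - 1


# Illustrative absolute errors (m) for two models on the same five samples.
errors_rf = [100.0, 150.0, 200.0, 120.0, 180.0]
errors_other = [130.0, 160.0, 260.0, 150.0, 200.0]
t_stat, dof = paired_t(errors_rf, errors_other)
print(round(t_stat, 2), dof)  # -3.59 4
```

A negative statistic means the first model has the smaller errors on average, which matches the sign convention of the values reported above.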

In the subsequent subsections, all analyses, including feature importance, SHAP explainability, and model performance on diurnal and seasonal data, are based on the best-performing model identified in Table 5, which was trained using Strategy 2.

3.4. Feature Importance Analysis

To identify which features contribute most to PBL height estimation, we analyzed the feature importance scores generated by the RF model. Feature importance in RF models is computed using the mean decrease in impurity (MDI), where splits are evaluated based on reductions in mean squared error (MSE). This is the standard technique for estimating feature importance in regression trees, and it reflects how much each feature contributes to reducing prediction error across the ensemble.
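The building block of MDI is the MSE reduction achieved by a single split. The sketch below computes that reduction for one node of a regression tree on hypothetical toy data (the values, thresholds, and function names are illustrative assumptions, not taken from the study). In a full forest, these reductions are weighted by the share of samples reaching each node, accumulated per feature over all nodes and trees, and then normalized.

```python
def mse(values):
    """Mean squared error of values around their own mean (node impurity)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def impurity_decrease(feature, target, threshold):
    """Weighted MSE reduction from splitting a node at feature <= threshold."""
    left = [y for x, y in zip(feature, target) if x <= threshold]
    right = [y for x, y in zip(feature, target) if x > threshold]
    n = len(target)
    weighted_child_mse = (len(left) / n) * mse(left) + (len(right) / n) * mse(right)
    return mse(target) - weighted_child_mse


# Hypothetical toy node: air temperature (°C), wind speed (m/s), PBLH (m).
air_temp = [23.0, 24.0, 25.0, 31.0, 32.0, 33.0]
wind = [1.0, 3.0, 2.0, 1.5, 2.5, 0.5]
pblh = [100.0, 150.0, 200.0, 1400.0, 1500.0, 1600.0]

gain_temp = impurity_decrease(air_temp, pblh, threshold=28.0)
gain_wind = impurity_decrease(wind, pblh, threshold=1.75)
print(f"MSE reduction: temperature={gain_temp:.0f}, wind={gain_wind:.0f}")
```

In this toy node, the temperature split separates low and high PBLH values almost perfectly, so it yields a much larger MSE reduction than the wind split and would receive a correspondingly larger importance.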

Figure 5 shows the relative importance of each feature for the best model presented in Table 5. Air Temperature and Relative Humidity were found to be the most important predictors, suggesting their strong influence on PBL dynamics. Notably, the features hour of the day and the engineered daytime indicator also contributed substantially to the model’s performance, with hour of the day accounting for over 10% of the total feature importance.

3.5. Explainability Analysis

To provide insight into the model’s decision process, we applied SHAP (SHapley Additive exPlanations) [29] values to the trained RF model. SHAP assigns an importance value to each feature based on its contribution to the prediction, offering both global and local interpretability.

Figure 6 presents a summary plot of SHAP values for all input features considered in this work, revealing their relative importance and the distribution of their effects across the dataset. In the figure, each point represents a SHAP value for a single prediction; the color indicates the corresponding feature value (green: high, blue: low). Features are ordered by their mean absolute SHAP value, highlighting the most influential variables across the dataset. The horizontal spread of each feature reflects the range and direction of its influence on the model output. Among the features, Hour, Air Temperature, and Relative Humidity exhibit the greatest impact. Notably, higher temperatures and later hours are associated with increased predictions, while higher relative humidity tends to reduce them.

It is worth noting that while the feature importance plot in Figure 5 shows Air Temperature, Relative Humidity, and Hour as the top three predictors, the SHAP analysis indicated a slightly different ordering: Hour, followed by Air Temperature and Relative Humidity. This divergence is expected, as traditional feature importance metrics in Random Forests are based on the average reduction in impurity, which can be biased toward features with higher cardinality or correlated variables [30]. In contrast, SHAP values provide a more nuanced view by quantifying each feature’s contribution to individual predictions, leading to potentially different, but often more interpretable, rankings.

Additionally, Figure 7 displays a SHAP waterfall plot for a randomly selected sample showing how each feature contributes to the model output. In the figure, the base value, $E[f(X)]$, is 712.20 m, while the final predicted value for this instance is 892.78 m. Each bar represents the SHAP value of a feature, with positive contributions (in aquamarine color) pushing the prediction upward. The most influential features in this case are Hour, Air Temperature, and LE, which together account for the largest positive shifts from the baseline.
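A waterfall plot relies on the additivity of Shapley values: the base value plus all feature contributions equals the prediction for that instance. The brute-force sketch below illustrates this property on a hypothetical three-feature toy model. The model, the background rows, and the instance are assumptions made for illustration; this is neither the study’s RF nor the tree-specific algorithm used by the SHAP library.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values for a coalition value function.

    value_fn maps a frozenset of feature indices to the expected model
    output when only those features are known.
    """
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def toy_model(hour, temp, rh):
    """Hypothetical PBLH model (m); not the RF from the study."""
    return 40.0 * hour + 30.0 * temp - 5.0 * rh + 0.5 * hour * temp


# Hypothetical background rows and instance: (hour, air temp. in °C, RH in %).
background = [(2.0, 23.0, 98.0), (10.0, 28.0, 80.0),
              (14.0, 32.0, 60.0), (20.0, 25.0, 95.0)]
instance = (13.0, 31.0, 65.0)


def value_fn(known):
    """Average output with known features fixed to the instance values."""
    total = 0.0
    for row in background:
        mixed = [instance[k] if k in known else row[k] for k in range(3)]
        total += toy_model(*mixed)
    return total / len(background)


phi = shapley_values(value_fn, 3)
base_value = value_fn(frozenset())
prediction = toy_model(*instance)
print(f"base={base_value:.2f}, contributions={phi}, prediction={prediction:.2f}")
```

Summing `base_value` and the entries of `phi` recovers `prediction`, which is the same bookkeeping that links the base value and the final predicted value in a waterfall plot.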

3.6. RF Performance on Diurnal and Seasonal Scale Data

We analyzed the best model’s performance across day and night periods to assess any differences in prediction accuracy relative to time of day. As defined earlier, daytime covers the hours between 6:00 AM and 6:00 PM, while nighttime covers the remaining hours from 6:00 PM to 6:00 AM. This division allowed us to investigate whether the model’s predictive capabilities varied between diurnal and nocturnal conditions, which are characterized by distinct atmospheric dynamics [1,2]. In addition, the site is very close to the Equator (2° S), so sunrise and sunset times change very little across the year. Figure 8 shows the raw error plots according to the actual and predicted PBLH values for both daytime (left panel) and nighttime (right panel) conditions. The error analysis revealed distinct patterns between day and night periods. During the day, the model exhibited a wider spread of errors, with frequent high-magnitude underestimations reaching up to 2500 m, suggesting greater variability and modeling challenges under convective conditions. It should be mentioned that, during dry periods (typically from September to December), the PBL height can reach 2500–3000 m. In contrast, nighttime errors were more tightly clustered around zero, indicating more stable performance. This is consistent with the observed nighttime PBL heights over the Central Amazon region, which are lower than 500 m [31]. Overall, the results suggest that the model tends to perform more reliably during the night, while daytime predictions are more prone to large deviations, likely due to the increased complexity of atmospheric dynamics and the larger daytime surface fluxes (H, LE, and Rn) [32].
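The day/night comparison can be sketched as follows. The sample values are hypothetical, and treating 6:00 PM itself as nighttime is an assumption, since the text does not say to which period the boundary hours belong.

```python
import math


def is_daytime(hour):
    """Daytime as defined in the text: 6:00 AM up to (not including) 6:00 PM."""
    return 6.0 <= hour < 18.0


def error_summary(records):
    """Return RMSE, MAE and mean error (predicted - actual) for (actual, predicted) pairs."""
    errors = [pred - actual for actual, pred in records]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    bias = sum(errors) / n
    return {"rmse": rmse, "mae": mae, "bias": bias}


# Hypothetical (local hour, observed PBLH, predicted PBLH) triples in metres.
samples = [
    (2.0, 120.0, 150.0),
    (5.5, 90.0, 110.0),
    (10.0, 800.0, 700.0),
    (14.0, 2200.0, 1700.0),
    (17.5, 1500.0, 1300.0),
    (22.0, 200.0, 180.0),
]

day = [(obs, pred) for hour, obs, pred in samples if is_daytime(hour)]
night = [(obs, pred) for hour, obs, pred in samples if not is_daytime(hour)]
day_stats = error_summary(day)
night_stats = error_summary(night)
print("day:", day_stats)
print("night:", night_stats)
```

With the sign convention used here (predicted minus actual), a negative daytime bias corresponds to the underestimation of deep convective boundary layers described above.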

In a similar way, Figure 9 presents scatter plots of predicted versus actual PBL height values for the years 2014 (left panel) and 2015 (right panel). In 2014, there was a noticeable concentration of predictions close to the 1:1 line up to approximately 2000 m, indicating that the model performs well within this range. A similar pattern was observed in 2015, with most predictions clustering below 2500 m. It should also be mentioned that 2015 was affected by an ENSO teleconnection (an El Niño phase) that caused a very intense dry period. However, both panels reveal noticeable scatter, particularly for higher PBLH values, where predictions tend to underestimate actual values. Specifically, in 2014, the model significantly underestimates the highest PBL height values. This comparison highlights a consistent model behavior across both years, demonstrating reliable performance in lower PBLH ranges but a tendency to underestimate higher values. Notably, this underestimation of high PBLH values is a well-documented behavior of RF models in environmental prediction tasks. The bias stems from the ensemble averaging mechanism, which tends to smooth extreme values and pull predictions toward the mean [33].
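The smoothing effect can be seen from the standard form of an RF regression prediction, given here as a generic sketch rather than a description of the specific implementation used in this study. With $B$ trees, and with $L_b(x)$ denoting the set of training samples that share a leaf with $x$ in tree $b$:

```latex
\hat{y}(x) = \frac{1}{B} \sum_{b=1}^{B} T_b(x),
\qquad
T_b(x) = \frac{1}{|L_b(x)|} \sum_{i \in L_b(x)} y_i
\quad\Longrightarrow\quad
\min_i y_i \;\le\; \hat{y}(x) \;\le\; \max_i y_i .
```

Because every prediction is an average of training targets, it can never exceed the largest PBLH seen in training, and rare extreme values are typically averaged together with more frequent moderate ones.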

We then moved to a seasonal (monthly) analysis for both 2014 and 2015. Figure 10 and Figure 11 present scatter plots for each month of 2014 and 2015, respectively. Note that December 2015 is excluded from the analysis due to a lack of available data, as the data collection phase of GoAmazon 2014/5 ended during that month. In general, the model shows relatively consistent performance throughout the months, with predictions clustering near the ideal 1:1 line, especially for lower PBL height values (below 1000 m). However, the model tends to underestimate higher PBL values (above 2000 m) in most months, as indicated by points falling below the 1:1 line, particularly in January, February, and March, when the underestimation is more pronounced. The spread of data points is wider from April through August, suggesting that the model faces more variability in PBL height predictions during this period, likely due to more dynamic atmospheric conditions. This is a transition period (from wet to dry): some days are dry and others are rainy, which impacts the PBLH values (deeper or shallower, respectively). In these months, the model still tends to underpredict higher PBLH values, but there is a noticeable reduction in the dispersion compared to the previous months of the year (which represent the rainy season). From September to December (typically the dry period), the model’s predictions begin to show more consistent alignment with the 1:1 line, especially in October and November. While the model still slightly underestimates high PBL values, the overall pattern suggests an improvement in accuracy in the latter months of the year.

In 2015, shown in Figure 11, a similar pattern is evident. Notably, September, October, and November also display predictions closer to the ideal line, indicating better performance during these months. However, the model still generally underestimates higher PBLH values throughout the year.

Overall, the model exhibits a consistent tendency to underestimate high PBLH values across months, by approximately 197.1 m on average, suggesting that it may not fully capture all atmospheric processes that drive extreme PBLH values. The seasonal fluctuations in accuracy highlight areas for improvement. Enhancing the model’s sensitivity to seasonal changes and extreme PBLH values could improve generalization and performance year-round.

To study the performance of the RF model in detail, we chose two transition months (when the atmosphere presents its highest variability): November and April. Figure 12 presents a time series of RF predictions for the period from 12 to 15 November in both 2014 (left column) and 2015 (right column). The model generally captures the overall PBL dynamics across both years, with predicted values (dashed lines) following the observed PBL values (solid lines). However, some notable discrepancies emerge, particularly in the nighttime and during sharp fluctuations. In 2014, the model tends to smooth out rapid variations, especially in the early morning and late evening hours, which are the transition periods from stable to unstable conditions and vice versa, respectively. For instance, on 13 and 14 November, it struggles to capture the sharp peaks observed during the night. In contrast, the 2015 results show more pronounced underestimations, especially for higher PBLH values. On 12 and 15 November, the model consistently predicted lower PBLH compared to actual measurements, particularly in the afternoon and evening. These results highlight a recurring limitation in capturing abrupt PBLH variations and nocturnal behavior.

Based on Figure 12, the 2014 predictions appear to be slightly better than those from 2015. In 2014, the model followed the observed PBL values more closely, even though it smoothed out some of the rapid fluctuations. The overall trend alignment is more consistent, and large deviations are less frequent. Overall, while both years exhibit challenges, the 2014 predictions seem to demonstrate a relatively more stable and accurate performance, in accordance with Figure 8 and Figure 9. This performance is associated with the fact that 2014 was considered a typical year in terms of wet and dry periods, while 2015 was atypical, as the El Niño teleconnection influenced the rainfall distribution.

In a similar fashion, Figure 13 displays the time series of RF predictions for random days in April 2014 (left) and 2015 (right). The plot reveals a broader range of actual PBLH values, likely reflecting more pronounced atmospheric dynamics during this period, as mentioned before. As observed in earlier figures, the RF model tends to smooth the predictions, which results in increased variability in the PBLH estimates. This pattern aligns with the trends seen in Figure 8 and Figure 9, confirming the model’s tendency to generalize across fluctuating PBL dynamics.

The RF model in this study, with a minimum RMSE of 441 m and a bias of −197.1 m under high PBLH conditions, aligns well with the performance ranges reported in recent literature. Reference [34] reported RMSE values ranging from 167 to 209 m in a comparison between ceilometer-based ML predictions and radiosonde measurements at Amazonian sites. Their work also observed seasonal variability in PBLH predictions, particularly under convective conditions, consistent with our findings of greater underestimation in austral spring and early summer. Additionally, our feature importance results corroborate the findings of [20], who identified air temperature, radiation, and surface energy fluxes as key drivers for ML-based PBLH estimation. Unlike the integration of multiple remote sensing products in [20], our model relies solely on ground-based surface observations yet achieves comparable predictive capability, illustrating the robustness of the Random Forest approach when optimized for regional climatological characteristics.

3.7. Residual Analysis

In order to evaluate the consistency of the model’s predictions, we conducted a residual analysis, as shown in Figure 14. The histogram shows that the distribution is slightly skewed towards negative values, as confirmed by the results shown in this study. While the negative residuals suggest that the model typically overpredicts PBLH values, the large positive residuals point to instances where the model underestimates the PBLH values. This distribution suggests that the model may struggle with accurately predicting extreme values or capturing certain dynamics that lead to larger errors. Refining the model, such as addressing outliers or adjusting for data imbalances, could help mitigate the negative bias, ultimately enhancing prediction accuracy.
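A residual summary of this kind can be sketched as below, assuming residuals are computed as actual minus predicted, so that negative residuals correspond to overprediction and positive residuals to underestimation. The observed and predicted values are hypothetical.

```python
import statistics


def residuals(actual, predicted):
    """Residuals as actual minus predicted (assumed sign convention)."""
    return [a - p for a, p in zip(actual, predicted)]


def residual_summary(res):
    """Summarize the sign balance and the tails of a residual list."""
    return {
        "share_negative": sum(1 for r in res if r < 0) / len(res),
        "median": statistics.median(res),
        "mean": statistics.fmean(res),
        "max_positive": max(res),
    }


# Hypothetical observed and predicted PBLH values (m).
observed = [100.0, 300.0, 500.0, 900.0, 2500.0]
predicted = [150.0, 340.0, 520.0, 880.0, 1900.0]

res = residuals(observed, predicted)
summary = residual_summary(res)
print(summary)
```

In this toy case, most residuals are small and negative while a single large positive residual comes from the deepest boundary layer, which mirrors the pattern of frequent mild overprediction and occasional strong underestimation described above.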

4. Conclusions

This study evaluated the effectiveness of a Random Forest (RF) model in estimating the planetary boundary layer height (PBLH) over the Central Amazon using surface meteorological data from the GoAmazon 2014/5 campaign. The model demonstrated solid predictive capability, achieving an RMSE of 375.89 m and a correlation coefficient of 0.785, indicating a high level of agreement with ceilometer-derived observations (ground truth). Feature importance analysis highlighted air temperature and relative humidity (which together account for approximately 60–65% of the feature importance) as the dominant predictors, reinforcing their key roles in PBL dynamics under tropical conditions. Despite its overall strong performance, the model exhibited a consistent underestimation bias (197.1 m on average) for higher PBLH values, particularly during convective phases (diurnal scale) and transitional months (seasonal scale). This underperformance is likely attributable to both a limited representation of extreme PBLH cases in the training dataset and the inherent difficulty of capturing complex atmospheric processes with only surface-level variables.

Overall, the RF model proved to be robust in estimating the PBLH, even in the presence of incomplete data. Its strong ability to capture the core dynamics of the PBL was evident throughout the study. Among the input variables, relative humidity and air temperature emerged as the most influential factors affecting model performance, although the energy fluxes (H, LE, and Rn) also improve the results.

The model exhibited high accuracy in predicting PBLH values under various conditions, successfully reproducing expected patterns in most cases. Although some challenges were identified in capturing rapid intraday fluctuations—particularly during nighttime periods—these limitations were relatively minor and did not significantly compromise the model’s overall effectiveness. Even when higher PBL values were present, the model consistently provided reliable estimates.

These findings suggest that the model has a solid foundation and could benefit further from the inclusion of additional atmospheric variables, as well as from enhancements that improve its ability to handle extreme variations and noisy data. The integration of remote sensing vertical profiles and land–atmosphere interaction parameters could further improve accuracy, especially during daytime transitions or high PBLH values.

In contrast to many recent studies that rely on remote sensing, reanalysis data, or deep learning architectures, this work emphasizes a practical, interpretable, and data-efficient approach to PBLH estimation. By benchmarking the RF model against traditional ML models (LR and SVR) and state-of-the-art models (LightGBM, XGBoost, and DNN), and by integrating explainability tools such as SHAP, the study provides transparent insight into model behavior. While not aiming to outperform state-of-the-art deep learning models trained on large-scale datasets, the results demonstrate that surface-only machine learning approaches, when carefully tuned, can achieve competitive accuracy in data-scarce environments like the Amazon.

Furthermore, accurate estimation of PBLH over the Amazon has direct application to operational activities such as smoke dispersion modeling during forest fires, convective rainfall prediction, and simulations of the atmospheric transport of pollution. Such applications are vital to public health management, agricultural production, and climate resilience over the region. Moreover, enhancing knowledge and prediction of PBLH contributes to sustainable environmental management and is aligned with the United Nations Sustainable Development Goals (SDGs), namely, SDG 13 (Climate Action) and SDG 15 (Life on Land).

Given the critical role of PBLH in air quality, cloud formation, and energy exchange processes, the ability to estimate it accurately using accessible and low-cost surface data represents a significant advancement. The results of this study not only reinforce the applicability of machine learning in atmospheric research within complex tropical environments but also point toward promising future directions—such as hybrid models combining remote sensing and machine learning—that could offer even greater precision and operational utility.

Author Contributions

Conceptualization, P.R.P.S., R.G.C., A.O.M., C.Q.D.-J. and G.F.; Formal analysis, P.R.P.S., R.G.C., A.O.M., C.Q.D.-J. and G.F.; Investigation, P.R.P.S., R.G.C., A.O.M., C.Q.D.-J. and G.F.; Methodology, P.R.P.S., R.G.C., A.O.M., C.Q.D.-J. and G.F.; Supervision, A.O.M. and G.F.; Visualization, P.R.P.S., R.G.C., A.O.M., C.Q.D.-J. and G.F.; Writing—original draft, P.R.P.S. and R.G.C.; Writing—review & editing, P.R.P.S., R.G.C., A.O.M., C.Q.D.-J. and G.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research received financial support from the National Council for Scientific and Technological Development (CNPq).

Data Availability Statement

The data supporting the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgments

We would like to especially thank all the people involved in the technical, logistical, and scientific support of the GoAmazon data. The GoAmazon data were obtained from the Atmospheric Radiation Measurement (ARM) user facility, a U.S. Department of Energy (DOE) Office of Science user facility managed by the Biological and Environmental Research program. We would like to thank CNPq for financial support (grant 440170/2022-2). CQDJ acknowledges the support from CNPq grants (Processes 406884/2022-6, 307530/2022-1, 406307/2023-7, 444929/2024-0, 445451/2024-6 and 404254/2024-1). RGC acknowledges support from CNPq (Process 381525/2025-2). PRPS acknowledges the support from CNPq (Process 440170/2022-2). GF acknowledges support from CNPq (Processes 440170/2022-2 and 311884/2023-7). AOM acknowledges support from CNPq (Process 309389/2021-6).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Hyperparameters and configurations for ML models chosen for comparison.

Model	Hyperparameters / Configuration
LR	Default parameters from
	`sklearn.linear_model.LinearRegression`
SVR	Kernel: `'rbf'`
	C: `1.0`
	Epsilon: `0.1`
	Input and target scaled using `StandardScaler`
LightGBM	Objective: `'regression'`
	Metric: `'rmse'`
	Boosting type: `'gbdt'`
	Learning rate: `0.05`
	Num leaves: `31`
	Feature fraction: `0.9`
	Bagging fraction: `0.8`
	Bagging frequency: `5`
	Verbosity: `-1`
XGBoost	Objective: `'reg:squarederror'`
	n_estimators: `100`
	Learning rate: `0.1`
	Max depth: `6`
	Subsample: `0.8`
	Colsample_bytree: `0.8`
	Random state: `42`
DNN	Architecture: 3 hidden layers (128 → 64 → 32 neurons)
	Activation: `'relu'` for all hidden layers
	Dropout rates: `0.2`, `0.1`, `0.1`
	Output: 1 neuron (linear activation)
	Optimizer: `'adam'`
	Loss: `'mse'`
	Metrics: `'mae'`

References

1. Kaimal, J.C.; Finnigan, J.J. Atmospheric Boundary Layer Flows: Their Structure and Measurement; Oxford University Press: Oxford, UK, 1994.
2. Stull, R.B. An Introduction to Boundary Layer Meteorology; Springer Science & Business Media: New York, NY, USA, 2012; Volume 13.
3. Xi, X.; Zhang, Y.; Gao, Z.; Yang, Y.; Zhou, S.; Duan, Z.; Yin, J. Diurnal climatology of correlations between the planetary boundary layer height and surface meteorological factors over the contiguous United States. Int. J. Climatol. 2022, 42, 5092–5110.
4. Geiß, A.; Wiegner, M.; Bonn, B.; Schäfer, K.; Forkel, R.; von Schneidemesser, E.; Münkel, C.; Chan, K.L.; Nothard, R. Mixing layer height as an indicator for urban air quality? Atmos. Meas. Tech. 2017, 10, 2969–2988.
5. de Arruda Moreira, G.; Guerrero-Rascado, J.L.; Bravo-Aranda, J.A.; Foyo-Moreno, I.; Cazorla, A.; Alados, I.; Lyamani, H.; Landulfo, E.; Alados-Arboledas, L. Study of the planetary boundary layer height in an urban environment using a combination of microwave radiometer and ceilometer. Atmos. Res. 2020, 240, 104932.
6. Díaz-Esteban, Y.; Raga, G.B. Observational evidence of the transition from shallow to deep convection in the Western Caribbean Trade Winds. Atmosphere 2019, 10, 700.
7. Oliveira, M.I.; Acevedo, O.C.; Sörgel, M.; Nascimento, E.L.; Manzi, A.O.; Oliveira, P.E.; Brondani, D.V.; Tsokankunku, A.; Andreae, M.O. Planetary boundary layer evolution over the Amazon rainforest in episodes of deep moist convection at the Amazon Tall Tower Observatory. Atmos. Chem. Phys. 2020, 20, 15–27.
8. Troen, I.; Mahrt, L. A simple model of the atmospheric boundary layer; Sensitivity to surface evaporation. Bound.-Layer Meteorol. 1986, 37, 129–148.
9. Noh, Y.; Cheon, W.; Hong, S.; Raasch, S. Improvement of the K-profile model for the planetary boundary layer based on large eddy simulation data. Bound.-Layer Meteorol. 2003, 107, 401–427.
10. Yao, H.; Li, X.; Pang, H.; Sheng, L.; Wang, W. Application of random forest algorithm in hail forecasting over Shandong Peninsula. Atmos. Res. 2020, 244, 105093.
11. Mohammed, A.; Kora, R. A comprehensive review on ensemble deep learning: Opportunities and challenges. J. King Saud Univ. Comput. Inf. Sci. 2023, 35, 757–774.
12. Zheng, L.; Lin, R.; Wang, X.; Chen, W. The development and application of machine learning in atmospheric environment studies. Remote Sens. 2021, 13, 4839.
13. Peng, Z.; Zhang, B.; Wang, D.; Niu, X.; Sun, J.; Xu, H.; Cao, J.; Shen, Z. Application of machine learning in atmospheric pollution research: A state-of-art review. Sci. Total Environ. 2024, 910, 168588.
14. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
15. Das, S.; Chakraborty, R.; Maitra, A. A random forest algorithm for nowcasting of intense precipitation events. Adv. Space Res. 2017, 60, 1271–1282.
16. Krishnamurthy, R.; Newsom, R.K.; Berg, L.K.; Xiao, H.; Ma, P.L.; Turner, D.D. On the estimation of boundary layer heights: A machine learning approach. Atmos. Meas. Tech. Discuss. 2020, 2020, 4403–4424.
17. Xing Zhao, Z.; Lin Fu, S.; Jie Chen, J. A machine learning algorithm for planetary boundary layer height estimating by combining remote sensing data. In Proceedings of the Sixth Conference on Frontiers in Optical Imaging and Technology: Applications of Imaging Technologies, Nanjing, China, 22–24 October 2023; SPIE: Bellingham, WA, USA, 2024; Volume 13157, pp. 393–398.
18. Peng, K.; Xin, J.; Zhu, X.; Wang, X.; Cao, X.; Ma, Y.; Ren, X.; Zhao, D.; Cao, J.; Wang, Z. Machine learning model to accurately estimate the planetary boundary layer height of Beijing urban area with ERA5 data. Atmos. Res. 2023, 293, 106925.
19. Canché-Cab, L.; San-Pedro, L.; Ali, B.; Rivero, M.; Escalante, M. The atmospheric boundary layer: A review of current challenges and a new generation of machine learning techniques. Artif. Intell. Rev. 2024, 57, 1–51.
20. Zhang, D.; Comstock, J.; Sivaraman, C.; Mo, K.; Krishnamurthy, R.; Tian, J.; Su, T.; Li, Z.; Roldán-Henao, N. Best Estimate of the Planetary Boundary Layer Height from Multiple Remote Sensing Measurements. EGUsphere 2025, 2025, 1–36.
21. Martin, S.T.; Artaxo, P.; Machado, L.A.T.; Manzi, A.O.; Souza, R.A.F.d.; Schumacher, C.; Wang, J.; Andreae, M.O.; Barbosa, H.; Fan, J.; et al. Introduction: Observations and modeling of the Green Ocean Amazon (GoAmazon2014/5). Atmos. Chem. Phys. 2016, 16, 4785–4797.
22. Jadhav, A.; Pramod, D.; Ramanathan, K. Comparison of performance of data imputation methods for numeric dataset. Appl. Artif. Intell. 2019, 33, 913–933.
23. Genuer, R.; Poggi, J.M. Random Forests; Springer: Berlin/Heidelberg, Germany, 2020.
24. Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2623–2631.
25. Shekhar, S.; Bansode, A.; Salim, A. A comparative study of hyper-parameter optimization tools. In Proceedings of the 2021 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE), Brisbane, Australia, 8–10 December 2021; IEEE: New York, NY, USA, 2021; pp. 1–6.
26. Nti, I.K.; Nyarko-Boateng, O.; Aning, J. Performance of machine learning algorithms with different K values in K-fold cross-validation. Int. J. Inf. Technol. Comput. Sci. 2021, 13, 61–71.
27. Zhang, X.; Liu, C.A. Model averaging prediction by K-fold cross-validation. J. Econom. 2023, 235, 280–301.
28. Su, T.; Zhang, Y. Deep-learning-derived planetary boundary layer height from conventional meteorological measurements. Atmos. Chem. Phys. 2024, 24, 6477–6493.
29. Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2017, 30, 4768–4777.
30. Scornet, E. Trees, forests, and impurity-based variable importance in regression. In Annales de l’Institut Henri Poincare (B) Probabilites et Statistiques; Institut Henri Poincaré: Paris, France, 2023; Volume 59, pp. 21–52.
31. Mendonça, A.C.; Dias-Júnior, C.Q.; Acevedo, O.C.; Marra, D.M.; Cely-Toro, I.M.; Fisch, G.; Brondani, D.V.; Manzi, A.O.; Portela, B.T.; Quesada, C.A.; et al. Estimation of the nocturnal boundary layer height over the Central Amazon forest using turbulence measurements. Agric. For. Meteorol. 2025, 367, 110469.
32. Carneiro, R.G.; Ribeiro, M.M.; Gatti, L.V.; de Souza, C.M.A.; Dias-Júnior, C.Q.; Tejada, G.; Domingues, L.G.; Rykowska, Z.; dos Santos, C.A.; Fisch, G. Assessing the effectiveness of convective boundary layer height estimation using flight data and ERA5 profiles in the Amazon biome. Clim. Dyn. 2025, 63, 109.
33. Kamińska, J.A. Residuals in the modelling of pollution concentration depending on meteorological conditions and traffic flow, employing decision trees. ITM Web Conf. 2018, 23, 00016.
34. Stapleton, A.; Dias-Junior, C.Q.; Von Randow, C.; Farias D’Oliveira, F.A.; Pöhlker, C.; de Araújo, A.C.; Roantree, M.; Eichelmann, E. Intercomparison of machine learning models to determine the planetary boundary layer height over Central Amazonia. J. Geophys. Res. Atmos. 2025, 130, e2024JD042488.

Figure 1. Distribution histograms of key features in the dataset, illustrating the range and frequency of values for each atmospheric variable.

Figure 2. Boxplot of relative humidity values, showing some outliers that may indicate sensor errors or peculiar atmospheric conditions.

Figure 3. Feature correlation matrix showing the Pearson correlation coefficients between the variables used for PBL height estimation.

Figure 4. Graphical representation of an RF algorithm. Adapted from [23].

Figure 5. Feature importance showing the relative contribution of each variable to the Random Forest model’s predictions.

Figure 6. SHAP summary plot showing the global feature importance for the RF model in predicting PBLH.

Figure 7. SHAP waterfall plot illustrating the contribution of each feature to a single RF prediction of PBLH.

Figure 8. Comparison of model performance during day (left) and night (right) periods, with daytime defined as 6:00 AM to 6:00 PM and nighttime as 6:00 PM to 6:00 AM.

Figure 9. Predicted versus actual PBL values for the years 2014 (left) and 2015 (right) using the Random Forest model.

Figure 10. Predicted versus actual PBL values (in meters) for each month of 2014 using the Random Forest (RF) model. Each plot shows monthly model performance.

Figure 11. Predicted versus actual PBL values (in meters) for each month of 2015 using the Random Forest (RF) model. Note: December 2015 is excluded from the analysis due to a lack of available data.

Figure 12. Random Forest predictions for 12–15 November in 2014 (left) and 2015 (right), illustrating the model’s ability to capture the overall dynamics of the PBL. Dashed lines represent predicted PBL height values, while solid lines denote observed PBL height values from a ceilometer.

Figure 13. Random Forest predictions for 8, 11, 15, and 23 April in 2014 (left) and 2015 (right), illustrating the model’s ability to capture the PBL height dynamics.

Figure 14. Histogram of residuals for the model’s predictions, displaying the distribution of differences between actual and predicted PBL values.

Table 1. Instruments and measurements used in this study.

Instrument	Measurements or Features	Sensor Resolution
Ceilometer (CL31 from Vaisala Inc., Helsinki, Finland)	PBL height estimates (m)	16 s to 10 min
Thermo-hygrometer (HMP45C Vaisala Inc., Helsinki, Finland)	Air temperature (°C)	1 min
Thermo-hygrometer (HMP45C Vaisala Inc., Helsinki, Finland)	Relative Humidity (%)	1 min
RM Young 05103/05106 Wind Monitor	Wind Velocity (m/s)	1 min
CNR4/CNF4 (Kipp & Zonen, Delft, the Netherlands)	Radiation Balance (W m$^{-2}$)	30 min
Sonic Anemometer (3D; CSAT model-3, Campbell Scientific Inc., Logan, USA)	Latent Heat Flux (W m$^{-2}$)	30 min
Sonic Anemometer (3D; CSAT model-3, Campbell Scientific Inc., Logan, USA)	Sensible Heat Flux (W m$^{-2}$)	30 min

Table 2. Sample of collected data used for training the machine learning model.

Y	M	D	Hour ¹	H (W m$^{-2}$)	LE (W m$^{-2}$)	Rn (W m$^{-2}$)	Air Temp. (°C)	Rel. Hum. (%)	Wind Speed (m/s)	PBL Height (m)
2014	2	27	10:00	-	−216	-	28.0	83.0	1.2	503
2014	10	1	13:30	61	287	-	34.7	53.0	2.7	2048
2014	7	18	01:30	−1	-	−12	23.4	97.2	0.0	97
2015	1	7	16:30	−21	104	263	32.0	62.5	3.1	1481
2015	5	14	18:30	−1	−1	−39	24.9	99.1	0.7	2107
2015	4	1	04:00	0	−13	−5	21.9	99.5	0.9	-
2015	4	1	08:30	43	−26	151	24.2	97.5	1.1	280

¹ Local time.

Table 3. Summary statistics of the dataset used for training the ML model.

Statistic	H	LE	Rn	Air Temperature	Relative Humidity	Wind Speed	PBL Height
Count	25,889	21,833	22,616	33,037	32,457	33,322	31,954
Missing Ratio	27%	38%	36%	6%	8%	6%	7.3%
Mean	20.6	33.3	121.5	27	87.1	1.5	616.6
Std	54.7	124.8	213.6	3.5	15	1.2	616.4
Min	−449.7	−1233	−290.8	13.2	−245	0	40
25%	−1.8	−1.8	−27.3	24.3	78.1	0.59	146.1
50%	0.9	1.7	−8.8	26	94.1	1.2	340
75%	29.7	76.3	234.4	29.6	98.3	2.2	970.4
90%	83.3	206.7	491.4	32.3	99.7	3.3	1559.6
Max	384.6	573.4	1708.8	38.8	102.8	10.6	4000

Table 4. Overview of data pre-processing and imputation strategies evaluated in this study.

Strategy	Feature Exclusion	Row Removal	Imputation
1	-	Any missing values	-
2	-	Fully Missing Rows	Mean Imputation
3	-	-	Mean Imputation
4	Exclude LE	-	Mean Imputation
5	Exclude LE	Fully Missing Rows	Mean Imputation
6	Exclude LE and Rn	-	Mean Imputation
7	Exclude LE and Rn	Fully Missing Rows	Mean Imputation
8	-	Fully Missing Rows	Nearest Neighbors ²; Mean Imputation ³
9	-	Fully Missing Rows; Empty PBL	Nearest Neighbors ²
10	-	Any missing values	Mean Imputation ¹
11	-	Fully Missing Rows; Empty PBL	Nearest Neighbors; Mean Imputation ¹

¹ Imputation performed first, based on the day–hour combination, followed by row removal. ² On features only. ³ On PBL observations only.
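The ordering in footnote ¹ of Table 4 (group-based mean imputation first, row removal second) can be illustrated with a short sketch. It is not the authors' implementation: it assumes that "day–hour combination" means rows sharing the same day and hour fields, and the helper names `impute_by_group_mean` and `drop_incomplete`, the dictionary keys, and the toy rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def impute_by_group_mean(rows, feature, key_fields=("D", "Hour")):
    """Fill missing (None) values of `feature` with the mean of the
    non-missing values that share the same key (assumed: day and hour).
    Rows whose group has no observed value stay missing."""
    groups = defaultdict(list)
    for row in rows:
        if row[feature] is not None:
            groups[tuple(row[k] for k in key_fields)].append(row[feature])

    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        key = tuple(row[k] for k in key_fields)
        if new_row[feature] is None and key in groups:
            new_row[feature] = mean(groups[key])
        filled.append(new_row)
    return filled

def drop_incomplete(rows, fields):
    """Remove rows that still have a missing value in any of `fields`."""
    return [r for r in rows if all(r[f] is not None for f in fields)]

# Toy rows (illustrative only, not data from the study).
rows = [
    {"D": 1, "Hour": "10:00", "LE": 100.0, "PBL": 500.0},
    {"D": 1, "Hour": "10:00", "LE": None, "PBL": 520.0},
    {"D": 1, "Hour": "10:00", "LE": 140.0, "PBL": 480.0},
    {"D": 2, "Hour": "04:00", "LE": None, "PBL": 90.0},
]

filled = impute_by_group_mean(rows, "LE")        # step 1: imputation
clean = drop_incomplete(filled, ["LE", "PBL"])   # step 2: row removal
print(len(rows), "rows in,", len(clean), "rows kept")
```

Imputing before removal keeps rows that would otherwise be discarded for a single missing feature; only rows whose group offers no observed value are dropped in the second step.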

Table 5. Model performance (RMSE, MAE, and R²) across imputation strategies for Random Forest, Linear Regression, and Support Vector Regression.

Strategy	RF RMSE	RF MAE	RF R²	LR RMSE	LR MAE	LR R²	SVR RMSE	SVR MAE	SVR R²
1	381.09	240.60	0.65	424.01	281.53	0.57	479.41	294.48	0.45
2	375.89	241.27	0.60	421.06	281.64	0.50	442.94	272.20	0.45
3	381.21	238.43	0.59	421.07	278.69	0.49	443.56	276.14	0.44
4	383.11	239.94	0.58	421.57	278.98	0.49	442.32	272.91	0.44
5	377.81	242.40	0.60	422.24	282.85	0.50	442.76	270.23	0.45
6	387.00	244.22	0.57	421.49	278.89	0.49	441.36	270.49	0.45
7	382.19	245.36	0.59	426.81	284.35	0.49	443.81	268.46	0.44
8	381.18	245.79	0.59	424.30	286.49	0.50	442.75	272.37	0.45
9	380.05	241.56	0.61	414.34	275.90	0.54	437.02	262.74	0.49
10	391.18	251.41	0.62	434.10	290.71	0.53	472.44	290.29	0.45
11	376.10	239.06	0.62	413.47	275.88	0.54	436.15	261.65	0.49
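The three scores reported in Table 5 can be computed as in the sketch below, assuming the standard definitions: RMSE as the square root of the mean squared residual, MAE as the mean absolute residual, and R² as one minus the ratio of the residual to the total sum of squares. The function name `regression_metrics` and the sample values are hypothetical and do not come from the study.

```python
import math

def regression_metrics(observed, predicted):
    """Return (RMSE, MAE, R^2) for paired observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]

    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    if ss_tot == 0.0:
        raise ValueError("R^2 is undefined when all observations are equal")

    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2

# Toy PBL heights in metres (illustrative only, not data from the study).
observed = [100.0, 300.0, 500.0, 700.0]
predicted = [150.0, 250.0, 550.0, 650.0]

rmse, mae, r2 = regression_metrics(observed, predicted)
print(f"RMSE = {rmse:.2f} m, MAE = {mae:.2f} m, R2 = {r2:.2f}")
```

RMSE and MAE carry the units of the target (metres for PBL height), while R² is dimensionless. Because RMSE squares the residuals, it penalizes large errors more heavily than MAE, which is consistent with RMSE exceeding MAE in every row of Table 5.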

Table 6. Performance comparison between the RF model and state-of-the-art models.

Model	RMSE	MAE	R²
RF	375.89	241.27	0.6
LightGBM	379.16	244.3	0.6
XGBoost	379.31	244	0.6
DNN	390.89	251.56	0.57

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

