MDPI - Publisher of Open Access Journals

27 pages, 3666 KiB

Open Access Article

A LightGBM-Based Power Grid Frequency Prediction Method with Dynamic Significance–Correlation Feature Weighting

by Jie Zhou, Xiangqian Tong, Shixian Bai and Jing Zhou

Energies 2025, 18(13), 3308; https://doi.org/10.3390/en18133308 - 24 Jun 2025

Viewed by 352

Accurate grid frequency prediction is essential for maintaining the stability and reliability of power systems. However, the complex dynamic characteristics of grid frequency and the nonlinear correlations among massive time series data make it difficult for traditional time series prediction methods to balance efficiency and accuracy. In this paper, we propose a Dynamic Significance–Correlation Weighting (D-SCW) method that generates weight coefficients that evolve over time. It does so through a joint screening mechanism that combines time series correlation analysis of features with statistical significance testing, coupled with the LightGBM gradient-boosting decision tree (GBDT) framework, to achieve high-precision prediction of grid frequency time series. To verify the effectiveness of D-SCW, this study conducts comparative experiments on two real grid operation datasets, including typical scenarios in which wind and photovoltaic (PV) installations account for 5–35% of the grid. Spearman’s rank correlation coefficient, mutual information (MI), Lasso regression, and recursive feature elimination (RFE) serve as baseline feature screening methods, and root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) are adopted as evaluation metrics. The results show that, in high renewable penetration scenarios, the D-SCW-LightGBM framework reduces RMSE by 5.2% to 10.4% and shortens the dynamic response delay by 52% compared with the baseline methods, confirming its effectiveness in both prediction accuracy and computational efficiency.
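The abstract does not give the D-SCW weighting formula, so the following is only a minimal sketch of the general idea under stated assumptions: within each sliding window, every feature's Spearman correlation with the frequency target is computed, a t-test on that correlation (hypothetical cutoff `t_crit=2.0`) acts as the significance screen, and the surviving |ρ| values are normalised into weights that change from window to window. The names `dynamic_weights`, `window`, and `t_crit` are illustrative, not taken from the paper.

```python
import math

def rank(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no correlation signal
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    return pearson(rank(x), rank(y))

def dynamic_weights(features, target, window, t_crit=2.0):
    """Per-window feature weights (hypothetical D-SCW-style screening).

    features: dict mapping feature name -> list aligned with target.
    Returns one dict of normalised weights per window position."""
    if window < 3:
        raise ValueError("window must contain at least 3 samples")
    result = []
    for end in range(window, len(target) + 1):
        y = target[end - window:end]
        raw = {}
        for name, series in features.items():
            rho = spearman(series[end - window:end], y)
            # t-statistic for H0: rho = 0 with window - 2 degrees of freedom
            t = math.inf if abs(rho) >= 1.0 else rho * math.sqrt((window - 2) / (1 - rho * rho))
            raw[name] = abs(rho) if abs(t) >= t_crit else 0.0  # significance screen
        total = sum(raw.values())
        result.append({k: (v / total if total > 0 else 0.0) for k, v in raw.items()})
    return result
```

In a full pipeline these weights might, for example, rescale the features passed to LightGBM at each step; that coupling is also an assumption rather than the paper's stated design.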

30 pages, 7559 KiB

Open Access Article

Deciphering Socio-Spatial Integration Governance of Community Regeneration: A Multi-Dimensional Evaluation Using GBDT and MGWR to Address Non-Linear Dynamics and Spatial Heterogeneity in Life Satisfaction and Spatial Quality

by Hong Ni, Jiana Liu, Haoran Li, Jinliu Chen, Pengcheng Li and Nan Li

Buildings 2025, 15(10), 1740; https://doi.org/10.3390/buildings15101740 - 20 May 2025

Viewed by 635

Abstract

Urban regeneration is pivotal to sustainable development, requiring innovative strategies that align social dynamics with spatial configurations. Traditional paradigms increasingly fail to tackle systemic challenges such as neighborhood alienation, social fragmentation, and resource inequality because they cannot integrate human-centered spatial governance. This study addresses these shortcomings with a novel multidimensional framework that merges social perception (life satisfaction) analytics with spatial quality (GIS-based) assessment. At its core is a combination of geospatial and machine learning models: an ensemble of Gradient Boosted Decision Trees (GBDT), Random Forest (RF), and multiscale geographically weighted regression (MGWR) is deployed to decode nonlinear socio-spatial interactions within Suzhou’s community environmental matrix. Our findings reveal critical intersections where residential density thresholds interact with commercial accessibility patterns and transport network configurations. Notably, we highlight the scale-dependent influence of educational proximity and healthcare distribution on community satisfaction, challenging conventional planning doctrines that rely on static buffer-zone models. Through rigorous spatial econometric modeling, this research uncovers three transformative insights: (1) The urban environment exerts a dominant influence on life satisfaction, accounting for 52.61% of the variance. Air quality emerges as a critical determinant, while proximity to educational institutions, healthcare facilities, and public landmarks exhibits nonlinear effects across spatial scales. (2) Housing price growth in Suzhou displays significant spatial clustering, with a Moran’s I of 0.130. Green space coverage correlates positively with price appreciation (β = 21.6919 ***), whereas floor area ratio has a negative impact (β = −4.1197 ***), highlighting the trade-off between density and property value. (3) The MGWR model outperforms OLS in explaining housing price dynamics, achieving an R² of 0.5564 and an AICc of 11,601.1674; that is, MGWR explains 55.64% of the variation in pre- and post-pandemic prices while better reflecting spatial heterogeneity. By merging community-expressed sentiment mapping with morphometric urban analysis, this interdisciplinary research pioneers a protocol for socio-spatially integrated urban transitions, one in which algorithmic urbanism meets human-scale needs rather than technological determinism. These findings recalibrate urban regeneration paradigms, demonstrating that data-driven socio-spatial integration is not a theoretical aspiration but an achievable governance reality.
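Global Moran's I, which the abstract reports as 0.130 for housing-price growth, measures whether similar values sit next to each other (positive values indicate clustering, negative values a checkerboard pattern). Below is a minimal stdlib sketch of the standard formula I = (n / S0) · Σ w_ij z_i z_j / Σ z_i²; the four-district layout and binary rook-contiguity weights are invented for illustration, since the study's own weight matrix is not described in the abstract.

```python
def morans_i(values, weights):
    """Global Moran's I for values observed at n locations.

    weights[i][j] is the spatial weight between locations i and j
    (zero on the diagonal)."""
    n = len(values)
    if n != len(weights) or any(len(row) != n for row in weights):
        raise ValueError("weights must be an n x n matrix")
    mean = sum(values) / n
    dev = [v - mean for v in values]                       # z_i = x_i - mean
    s0 = sum(sum(row) for row in weights)                  # S0: sum of all weights
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    variance_sum = sum(d * d for d in dev)
    if s0 == 0 or variance_sum == 0:
        raise ValueError("Moran's I is undefined for zero weights or constant values")
    return (n / s0) * (cross / variance_sum)

# Hypothetical example: four districts in a row; neighbours share an edge.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
clustered = morans_i([1.0, 1.0, 0.0, 0.0], W)    # similar values adjacent: positive
alternating = morans_i([1.0, 0.0, 1.0, 0.0], W)  # dissimilar values adjacent: negative
```

Significance (the stars on the β values, or a Moran's I p-value) would need a permutation or normal-approximation test, which this sketch leaves out.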

(This article belongs to the Special Issue Sensing the Built Environment: Measurements, Correlations, and Implications)

20 pages, 15335 KiB

Open Access Article

A Method for Identifying Landslide-Prone Areas Using Multiple Factors and Adaptive Probability Thresholds: A Case Study in Northern Tongren, Longwu River Basin, Qinghai Province

by Jiawen Bao, Xiaojun Luo, Yueling Shi, Mingyue Hou, Jichao Lv and Guoxiang Liu

Remote Sens. 2025, 17(8), 1380; https://doi.org/10.3390/rs17081380 - 12 Apr 2025

Viewed by 640

Abstract

Early and accurate identification of landslide-prone areas is critical for monitoring and early-warning systems, forming the foundation of disaster prevention and mitigation. However, current landslide susceptibility assessment methods often rely on arbitrary probability classification thresholds, leading to subjective and regionally non-adaptive results that neglect low-susceptibility areas, thereby limiting their practical utility in disaster management. To address these limitations, this study proposes a novel method for identifying landslide-prone areas by integrating multi-factor analysis with adaptive probability thresholds. The methodology combines landslide catalog data with key landslide influencing factors, including geology, topography, precipitation, surface deformation, and human activities. The gradient boosting decision tree (GBDT) algorithm is employed to estimate landslide susceptibility probabilities, while an adaptive threshold criterion, based on minimizing the weighted sum of Jensen–Shannon (JS) divergences between landslide-prone areas and positive samples, is established to classify regions objectively. Validation experiments were conducted in the northern Tongren region of the Longwu River Basin, Qinghai Province, China. Historical landslides (February 2016–June 2017) were used for model training, and subsequent landslides (June 2017–November 2022) served as validation data. The results demonstrate exceptional performance: the susceptibility model achieved an AUC of 0.99, with 94.07% accuracy in classifying landslide positive samples. Furthermore, 77.78% of post-2017 landslides occurred within the identified prone areas, corresponding to a 22.22% omission rate. These findings highlight the method’s ability to adapt dynamically to regional characteristics, balance sensitivity and specificity, and provide actionable insights for landslide risk management.
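The abstract's threshold criterion minimises a weighted sum of JS divergences, but its exact terms and weights are not given. The sketch below keeps a single, unweighted term as a simplifying assumption: it scans candidate probability thresholds and keeps the one whose "prone" cells have a probability histogram closest, in JS divergence, to that of the known landslide positive samples. The bin count, candidate grid, and function names are illustrative.

```python
import math

def js_divergence(p, q):
    """Jensen–Shannon divergence between two discrete distributions (base 2, in [0, 1])."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def histogram(probs, bins=10):
    """Normalised histogram of probabilities in [0, 1]."""
    counts = [0] * bins
    for v in probs:
        counts[min(int(v * bins), bins - 1)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("empty sample")
    return [c / total for c in counts]

def adaptive_threshold(cell_probs, positive_probs, candidates, bins=10):
    """Hypothetical single-term criterion: choose the threshold t whose prone cells
    (probability >= t) best match the positive-sample probability distribution."""
    target = histogram(positive_probs, bins)
    best_t, best_d = None, math.inf
    for t in candidates:
        prone = [p for p in cell_probs if p >= t]
        if not prone:
            continue
        d = js_divergence(histogram(prone, bins), target)
        if d < best_d:
            best_t, best_d = t, d
    return best_t, best_d
```

A fuller version of the published criterion would add further JS terms (for example, for other sample groups) with their own weights; those are left out here because the abstract does not define them.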

30 pages, 7731 KiB

Open Access Article

Interpretable GBDT Model for Analysing Ridership Mechanisms in Urban Rail Transit: A Case Study in Shenzhen

by Wenjing Wang, Haiyan Wang, Jian Xu, Chengfa Liu, Shipeng Wang and Qing Miao

Appl. Sci. 2025, 15(7), 3835; https://doi.org/10.3390/app15073835 - 31 Mar 2025

Viewed by 392

Abstract

With the acceleration of urbanisation and the diversification of residents’ travel needs, rail transit plays a critical role in mitigating traffic congestion. However, existing studies predominantly rely on linear models, neglecting the nonlinear effects and spatial heterogeneity of built environment factors on ridership. To address this gap, this study integrates the Multiscale Geographically Weighted Regression (MGWR) model and the Gradient Boosting Decision Tree (GBDT) model to analyse the impact of built environment factors on total, inbound, and outbound ridership in Shenzhen. Using Automatic Fare Collection (AFC) data and multiple built environment variables, we identify six key factors (office type, accessibility, road network density, floor area ratio (FAR), public services, and residential type) through SHapley Additive exPlanations (SHAP) values and partial dependence plot (PDP) analysis. Notably, this study constructs a three-dimensional PDP to explore the joint effect of FAR and accessibility on ridership. The results demonstrate that the GBDT model outperforms MGWR in handling high-dimensional nonlinear data. This paper provides policy recommendations for transport authorities, highlighting the synergies between built environment planning and rail transport development that can improve the efficiency of short-distance commuting while supporting long-distance cross-city travel.
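A partial dependence plot averages a model's predictions over the observed data while one feature (or, for the study's three-dimensional PDP, a pair of features) is held at fixed values. The sketch below works with any prediction function; `toy_model` and the sample rows are made-up stand-ins for the trained GBDT and the Shenzhen station data.

```python
from statistics import mean

def partial_dependence(predict, rows, features, grid):
    """Average model output when `features` are forced to each grid point.

    predict:  callable taking a dict of feature values.
    features: tuple of feature names (one name gives a 1-D PDP, two give a surface).
    grid:     iterable of value tuples, one value per feature."""
    result = []
    for point in grid:
        preds = []
        for row in rows:
            modified = dict(row)                 # copy so the data are not mutated
            modified.update(zip(features, point))
            preds.append(predict(modified))
        result.append(mean(preds))
    return result

def toy_model(r):
    # Hypothetical ridership model with an FAR x accessibility interaction.
    return 2.0 * r["far"] + r["far"] * r["access"]

rows = [{"far": 1.0, "access": 0.0}, {"far": 3.0, "access": 2.0}]
one_d = partial_dependence(toy_model, rows, ("far",), [(0.0,), (1.0,)])
two_d = partial_dependence(toy_model, rows, ("far", "access"), [(1.0, 1.0), (2.0, 0.0)])
```

Evaluating `two_d` over a full grid of (FAR, accessibility) pairs yields the surface that a three-dimensional PDP draws.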

36 pages, 10042 KiB

Open Access Article

Unraveling Spatial Nonstationary and Nonlinear Dynamics in Life Satisfaction: Integrating Geospatial Analysis of Community Built Environment and Resident Perception via MGWR, GBDT, and XGBoost

by Di Yang, Qiujie Lin, Haoran Li, Jinliu Chen, Hong Ni, Pengcheng Li, Ying Hu and Haoqi Wang

ISPRS Int. J. Geo-Inf. 2025, 14(3), 131; https://doi.org/10.3390/ijgi14030131 - 20 Mar 2025

Cited by 4 | Viewed by 1134

Abstract

Rapid urbanization has accelerated the transformation of community dynamics, highlighting the critical need to understand how subjective perceptions and objective built environments interact to shape life satisfaction for sustainable urban development. Existing studies predominantly focus on linear relationships between isolated factors, neglecting spatial heterogeneity and nonlinear dynamics, which limits the ability to address localized urban challenges. This study addresses these gaps by using multi-scale geographically weighted regression (MGWR) to assess the spatial nonstationarity of subjective perceptions and built environment factors, gradient-boosting decision trees (GBDT) to capture their nonlinear relationships, and eXtreme Gradient Boosting (XGBoost) to improve predictive accuracy. Using geospatial data (POIs, social media data) and survey responses in Suzhou, China, the findings reveal that (1) proximity to business facilities (β = 0.41) and educational resources (β = 0.32) strongly correlate with satisfaction, while landscape quality shows contradictory effects between central (β = 0.12) and peripheral zones (β = −0.09). (2) XGBoost further quantifies predictive disparities: subjective factors such as property service satisfaction (R² = 0.64, MAPE = 3.72) outperform objective metrics (e.g., dining facilities, R² = 0.36), yet objective housing prices demonstrate greater stability (MAPE = 3.11 vs. subjective MAPE = 6.89). (3) Nonlinear thresholds are identified for household income and green space coverage (saturation effects above 15%). These findings expose critical mismatches: residents prioritize localized services over citywide economic metrics, while objective amenities such as healthcare accessibility (threshold = 1 km) require spatial recalibration. By bridging spatial nonstationarity (MGWR) and nonlinearity (XGBoost), this study advances a dual-path framework for adaptive urban governance that supports the community-level prioritization of high-impact subjective factors (e.g., service quality) and data-driven spatial planning informed by nonlinear thresholds (e.g., facility density). The results offer actionable pathways to align smart urban development with socio-spatial equity, emphasizing the need for hyperlocal, perception-sensitive regeneration strategies.
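The second finding reports both R² and MAPE. Assuming the conventional definitions (R² = 1 − SS_res / SS_tot, and MAPE expressed as a percentage, so that 3.72 would mean 3.72%, which the abstract does not state explicitly), the two metrics can be computed as follows; the sample values are invented.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; undefined if any true value is 0."""
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical example values, not data from the study.
r2 = r_squared([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
err = mape([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

The two metrics answer different questions: R² is relative to the variance of the target, while MAPE is relative to each target's magnitude, which is why a factor can have a modest R² yet a low MAPE.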

(This article belongs to the Special Issue Spatial Information for Improved Living Spaces)

22 pages, 5841 KiB

Open Access Article

Enhancing Genomic Prediction Accuracy of Reproduction Traits in Rongchang Pigs Through Machine Learning

by Junge Wang, Jie Chai, Li Chen, Tinghuan Zhang, Xi Long, Shuqi Diao, Dong Chen, Zongyi Guo, Guoqing Tang and Pingxian Wu

Animals 2025, 15(4), 525; https://doi.org/10.3390/ani15040525 - 12 Feb 2025

Cited by 1 | Viewed by 1151

Abstract

The increasing volume of genome sequencing data presents challenges for traditional genome-wide prediction methods in handling large datasets. Machine learning (ML) techniques, which can process high-dimensional data, offer promising solutions. This study aimed to identify a suitable genome-wide prediction method for local pig breeds, using 10 datasets with varying SNP densities derived from imputed sequencing data of 515 Rongchang pigs and the Pig QTL database. Three reproduction traits (litter weight, total number of piglets born, and number of piglets born alive) were predicted using six traditional methods and five ML methods, namely kernel ridge regression, random forest, Gradient Boosting Decision Tree (GBDT), Light Gradient Boosting Machine (LightGBM), and AdaBoost. The methods’ efficacy was evaluated using fivefold cross-validation and independent tests. The predictive performance of both traditional and ML methods initially increased with SNP density, peaking at 800–900 k SNPs. ML methods outperformed traditional ones, with improvements of 0.4–4.1%. The integration of GWAS and the Pig QTL database enhanced ML robustness. ML models exhibited superior generalizability, with high correlation coefficients (0.935–0.998) between cross-validation and independent test results. GBDT and random forest showed high computational efficiency, making them promising methods for genomic prediction in livestock breeding.
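Fivefold cross-validation partitions the animals into five folds, trains on four, and scores the held-out fold, so every animal is tested exactly once. A minimal stdlib sketch of the index split follows, using the 515 animals from the abstract; the shuffling seed and the round-robin fold assignment are illustrative choices, not the study's protocol.

```python
import random

def k_fold_indices(n_samples, k=5, seed=42):
    """Yield (train_idx, test_idx) pairs; each sample lands in exactly one test fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)          # reproducible shuffle (assumed seed)
    folds = [idx[i::k] for i in range(k)]     # round-robin assignment to k folds
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for m, fold in enumerate(folds) if m != i for j in fold)
        yield train, test

splits = list(k_fold_indices(515, k=5))
```

Any genomic prediction model (GBDT, random forest, GBLUP, and so on) would be fitted on each `train` index set and evaluated on the matching `test` set, with the five scores then averaged.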

(This article belongs to the Section Pigs)

29 pages, 11007 KiB

Open Access Article (Editor’s Choice)

Research on Innovative Apple Grading Technology Driven by Intelligent Vision and Machine Learning

by Bo Han, Jingjing Zhang, Rolla Almodfer, Yingchao Wang, Wei Sun, Tao Bai, Luan Dong and Wenjing Hou

Foods 2025, 14(2), 258; https://doi.org/10.3390/foods14020258 - 15 Jan 2025

Cited by 5 | Viewed by 2601

Abstract

In the domain of food science, apple grading holds significant research value and application potential. Currently, apple grading relies predominantly on manual methods, which suffer from low production efficiency and high subjectivity. This study marks the first integration of advanced computer vision, image processing, and machine learning technologies to design an innovative automated apple grading system. The system aims to reduce human interference and enhance grading efficiency and accuracy. A lightweight detection algorithm, FDNet-p, was developed to capture stem features, and an auxiliary positioning strategy was designed for image acquisition. An improved DPC-AWKNN segmentation algorithm is proposed for segmenting the apple body. Image processing techniques are employed to extract apple features such as color, shape, and diameter, culminating in an intelligent apple grading model based on the GBDT algorithm. Experimental results demonstrate that, in stem detection tasks, the lightweight FDNet-p model outperforms various other detection models, achieving an mAP@0.5 of 96.6% with 3.4 GFLOPs and a model size of just 2.5 MB. In apple grading experiments, the GBDT grading model achieved the best overall performance among the classification models, with weighted Jaccard score, precision, recall, and F1 score values of 0.9506, 0.9196, 0.9683, and 0.9513, respectively. The proposed stem detection and apple body classification models provide innovative solutions for detection and classification tasks in automated fruit grading, offering a comprehensive and replicable research framework for standardizing image processing and feature extraction for apples and similar spherical fruits.
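The grading results are reported as weighted scores. Assuming "weighted" has its usual meaning (per-class scores averaged with weights proportional to each class's support, as with scikit-learn's `average="weighted"` option), they can be computed as below; the toy labels are invented, and scoring a class with no predictions as 0.0 is a convention chosen here.

```python
from collections import Counter

def weighted_scores(y_true, y_pred):
    """Support-weighted precision, recall, F1 and Jaccard over the classes in y_true."""
    support = Counter(y_true)
    n = len(y_true)
    totals = {"precision": 0.0, "recall": 0.0, "f1": 0.0, "jaccard": 0.0}
    for c, s in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = s - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / s
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        jaccard = tp / (tp + fp + fn)        # tp + fn = s > 0, so never divides by zero
        w = s / n                            # class weight = share of true labels
        totals["precision"] += w * precision
        totals["recall"] += w * recall
        totals["f1"] += w * f1
        totals["jaccard"] += w * jaccard
    return totals

# Hypothetical grades: 0 = first class, 1 = second class.
scores = weighted_scores([0, 0, 1, 1], [0, 1, 1, 1])
```

Because each class's recall is weighted by its share of the samples, weighted recall reduces to overall accuracy, whereas weighted precision, F1, and Jaccard do not.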

29 pages, 7541 KiB

Open Access Article

Machine Learning-Based Water Quality Classification Assessment

by Wenliang Chen, Duo Xu, Bowen Pan, Yuan Zhao and Yan Song

Water 2024, 16(20), 2951; https://doi.org/10.3390/w16202951 - 17 Oct 2024

Cited by 7 | Viewed by 3946

Abstract

Water is a vital resource, and its quality has a direct impact on human health. Groundwater, as one of the primary water sources, requires careful monitoring to ensure its safety. Although manual methods for testing water quality are accurate, they are often time-consuming, costly, and inefficient when dealing with large and complex data sets. In recent years, machine learning has become an effective alternative for water quality assessment. However, current approaches still face challenges, such as the limited performance of individual models, minimal improvements from optimization algorithms, lack of dynamic feature weighting mechanisms, and potential information loss when simplifying model inputs. To address these challenges, this paper proposes a hybrid model, BS-MLP, which combines GBDT (gradient-boosted decision tree) and MLP (multilayer perceptron). The model leverages GBDT’s strength in feature selection and MLP’s capability to manage nonlinear relationships, enabling it to capture complex interactions between water quality parameters. We employ Bayesian optimization to fine-tune the model’s parameters and introduce a feature-weighting attention mechanism to develop the BS-FAMLP model, which dynamically adjusts feature weights, enhancing generalization and classification accuracy. In addition, a comprehensive parameter selection strategy is employed to maintain data integrity. These innovations significantly improve the model’s classification performance and efficiency in handling complex water quality environments and imbalanced datasets. This model was evaluated using a publicly available groundwater quality dataset consisting of 188,623 samples, each with 15 water quality parameters and corresponding labels. The BS-FAMLP model shows strong classification performance, with optimized hyperparameters and an adjusted feature-weighting attention mechanism. 
Specifically, it achieved an accuracy of 0.9616, precision of 0.9524, recall of 0.9655, F1 Score of 0.9589, and an AUC score of 0.9834 on the test set. Compared to single models, classification accuracy improved by approximately 10%, and when compared to other hybrid models with additional attention mechanisms, BS-FAMLP achieved an optimal balance between classification performance and computational efficiency. The core objective of this study is to utilize the acquired water quality parameter data for efficient classification and assessment of water samples, with the aim of streamlining traditional laboratory-based water quality analysis processes. By developing a reliable water quality classification model, this research provides robust technical support for water safety management.
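The abstract does not specify the form of the feature-weighting attention mechanism. One common pattern is a softmax over per-feature scores, sketched below in plain Python for a single sample; the linear scoring rule and the parameters `w` and `b` (which a real model would learn during training) are assumptions, not the BS-FAMLP design.

```python
import math

def feature_attention(x, w, b):
    """Re-weight one sample's features with softmax attention.

    Hypothetical scoring rule: score_i = w[i] * x[i] + b[i]. Softmax turns the
    scores into weights that sum to 1; the weighted features are returned
    alongside the weights."""
    scores = [wi * xi + bi for wi, xi, bi in zip(w, x, b)]
    m = max(scores)                              # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return weights, [a * xi for a, xi in zip(weights, x)]

# With zero parameters every feature gets the same weight (1/3 each here).
uniform_w, uniform_x = feature_attention([1.0, 2.0, 3.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0])
# A larger bias on the second feature shifts attention towards it.
biased_w, _ = feature_attention([1.0, 1.0], [0.0, 0.0], [0.0, 1.0])
```

In an MLP, the re-weighted vector would replace the raw inputs of the first layer, letting training emphasise informative water quality parameters sample by sample.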

(This article belongs to the Section Water Quality and Contamination)

24 pages, 6869 KiB

Open Access Article

Automobile-Demand Forecasting Based on Trend Extrapolation and Causality Analysis

by Zhengzhu Zhang, Haining Chai, Liyan Wu, Ning Zhang and Fenghe Wu

Electronics 2024, 13(16), 3294; https://doi.org/10.3390/electronics13163294 - 19 Aug 2024

Viewed by 3927

Abstract

Accurate automobile-demand forecasting can provide effective guidance for automobile-manufacturing enterprises in production planning and supply planning. However, automobile sales volume is affected by historical sales volume and other external factors, and it exhibits strong non-stationarity, nonlinearity, autocorrelation, and other complex characteristics, which make it difficult to forecast accurately with traditional models. To solve this problem, a forecasting model combining trend extrapolation and causality analysis is proposed, drawing on both the historical patterns of sales volume and the influence of external factors. In the trend-extrapolation submodel, the historical patterns of the sales series were captured using the Seasonal Autoregressive Integrated Moving Average (SARIMA) model and Polynomial Regression (PR); then, Empirical Mode Decomposition (EMD), a stationarity-test algorithm, and an autocorrelation-test algorithm were introduced to reconstruct the sales sequence into stationary components with strong seasonality and trend components, which reduced the influence of non-stationarity and nonlinearity on the modeling. In the causality-analysis submodel, 31-dimensional feature data were extracted from influencing factors such as date, macroeconomy, and promotional activities, and a Gradient-Boosting Decision Tree (GBDT) was used to establish the mapping between influencing factors and future sales because of its excellent ability to fit nonlinear relationships. Finally, the forecasting performance of three combination strategies (the boosting series, stacking parallel, and weighted-average parallel strategies) was tested. Comparative experiments on three groups of sales data showed that the weighted-average parallel combination strategy performed best, with loss reductions of 16.81% and 4.68% for data from the number-one brand, 25.60% and 2.79% for data from the number-two brand, and 46.26% and 14.37% for data from the number-three brand compared with the other combination strategies. Further ablation studies and comparative experiments with six basic models demonstrated the effectiveness and superiority of the proposed model.
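In a weighted-average parallel strategy, the trend-extrapolation and causality-analysis submodels forecast independently and their outputs are averaged point by point. The abstract does not say how the weights are set; the sketch below assumes weights proportional to the inverse of each submodel's validation error, a common heuristic, and the forecasts and errors are made-up numbers.

```python
def inverse_error_weights(val_errors):
    """Weights proportional to 1 / validation error (assumed rule), summing to 1."""
    if any(e <= 0 for e in val_errors):
        raise ValueError("validation errors must be positive")
    inv = [1.0 / e for e in val_errors]
    total = sum(inv)
    return [v / total for v in inv]

def weighted_average_forecast(forecasts, val_errors):
    """Combine parallel forecasts (equal-length lists) point by point."""
    weights = inverse_error_weights(val_errors)
    horizon = len(forecasts[0])
    return [sum(w * f[t] for w, f in zip(weights, forecasts)) for t in range(horizon)]

# Hypothetical monthly forecasts from the two submodels and their validation MSEs.
trend_forecast = [10.0, 20.0]
causal_forecast = [20.0, 40.0]
combined = weighted_average_forecast([trend_forecast, causal_forecast], [1.0, 3.0])
```

With these numbers the more accurate submodel receives weight 0.75 and the other 0.25. A series (boosting) strategy would instead fit the second model to the first model's residuals, and stacking would learn the combination with a meta-model.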

(This article belongs to the Special Issue Innovations, Challenges and Emerging Technologies in Data Engineering)

22 pages, 4761 KiB

Open Access Article

An Improved Method for Broiler Weight Estimation Integrating Multi-Feature with Gradient Boosting Decision Tree

by Ximing Li, Jingyi Wu, Zeyong Zhao, Yitao Zhuang, Shikai Sun, Huanlong Xie, Yuefang Gao and Deqin Xiao

Animals 2023, 13(23), 3721; https://doi.org/10.3390/ani13233721 - 1 Dec 2023

Cited by 5 | Viewed by 3162

Abstract

Broiler weighing is essential in the broiler farming industry. Camera-based systems can economically weigh various broiler types without expensive platforms. However, existing computer vision methods for weight estimation are less mature because they focus on young broilers; as a result, their estimation error increases with the age of the broiler. To tackle this, this paper presents a novel framework. First, it employs Mask R-CNN for instance segmentation of depth images captured by 3D cameras. Next, once the images of either a single broiler or multiple broilers are segmented, the extended handcrafted (artificial) features and the learned features extracted by a Customized Resnet50 (C-Resnet50) are fused by a feature fusion module. Finally, the fused features are used to estimate the body weight of each broiler with a gradient boosting decision tree (GBDT). By integrating diverse features with GBDT, the proposed framework can effectively obtain each broiler instance from depth images of multiple broilers in the visual field, despite complex backgrounds. Experimental results show that this framework significantly improves accuracy and robustness. With an MAE of 0.093 kg and an R² of 0.707 on a test set of 240 images of 63-day-old bantam chickens, it outperforms other methods.
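The final stage fuses handcrafted and learned features and regresses body weight with GBDT. The sketch below shows the boosting idea in plain Python under simplifying assumptions: fusion is plain concatenation (the paper's fusion module may do more), the loss is squared error, and each weak learner is a depth-1 stump. In practice a library implementation such as scikit-learn's `GradientBoostingRegressor`, LightGBM, or XGBoost would be used; the feature values are toy numbers, not broiler data.

```python
import math

def fuse(handcrafted, learned):
    """Concatenate per-sample handcrafted and learned feature vectors (assumed fusion)."""
    return [list(h) + list(l) for h, l in zip(handcrafted, learned)]

def fit_stump(X, residuals):
    """Depth-1 regression tree minimising squared error on the residuals."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            if not left or not right:
                continue
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lmean, rmean)
    if best is None:  # every feature is constant: fall back to a single leaf
        mean = sum(residuals) / len(residuals)
        return (0, math.inf, mean, mean)
    return best[1:]

def stump_predict(stump, row):
    j, thr, left_value, right_value = stump
    return left_value if row[j] <= thr else right_value

def fit_gbdt(X, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, pred)]  # negative gradient of 0.5*(y - f)^2
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [p + learning_rate * stump_predict(stump, row) for p, row in zip(pred, X)]
    return base, learning_rate, stumps

def gbdt_predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)

# Hypothetical data: one handcrafted feature, one (uninformative) learned feature.
X = fuse([[0.1], [0.2], [0.8], [0.9]], [[5.0], [5.0], [5.0], [5.0]])
y = [1.0, 1.0, 3.0, 3.0]
model = fit_gbdt(X, y)
```

The learning rate shrinks each stump's contribution, so the fit approaches the targets gradually over the rounds rather than in one step.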

(This article belongs to the Section Poultry)

19 pages, 8653 KiB

Open Access Article

Estimation of Agronomic Characters of Wheat Based on Variable Selection and Machine Learning Algorithms

by Dunliang Wang, Rui Li, Tao Liu, Chengming Sun and Wenshan Guo

Agronomy 2023, 13(11), 2808; https://doi.org/10.3390/agronomy13112808 - 13 Nov 2023

Cited by 1 | Viewed by 2824

Abstract

Wheat is one of the most important food crops in the world, and its high and stable yield is of great significance for ensuring food security. Timely, non-destructive, and accurate monitoring of wheat growth information is essential for optimizing cultivation management, improving fertilizer utilization efficiency, and improving wheat yield and quality. Different color indices and vegetation indices were calculated from the wheat canopy reflectance obtained by a UAV remote sensing platform equipped with a digital camera and a hyperspectral camera. Three variable-screening algorithms, namely competitive adaptive reweighted sampling (CARS), iteratively retaining informative variables (IRIV), and the random forest (RF) algorithm, were used to screen the acquired indices, and three regression algorithms, namely gradient boosting decision tree (GBDT), multiple linear regression (MLR), and random forest regression (RFR), were then used to construct monitoring models of wheat aboveground biomass (AGB) and leaf nitrogen content (LNC). The results showed that the three variable-screening algorithms performed differently for different growth indicators: the optimal variable-screening algorithm was RF for AGB and CARS for LNC. In addition, every variable-screening algorithm selected more vegetation indices than color indices, and screening effectively avoided autocorrelation among the variables input into the model. This study indicates that constructing a model through variable-screening algorithms can reduce redundant information input into the model and achieve better estimation of growth parameters. A suitable combination of variable-screening algorithms and regression algorithms should be considered when constructing models for estimating crop growth parameters in the future.
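The abstract does not list which color and vegetation indices were computed. As an illustration only, two widely used examples are sketched below: NDVI from near-infrared and red reflectance (as a hyperspectral camera provides), and the Excess Green (ExG) color index from digital-camera RGB values. Neither is confirmed as part of the study's index set.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - red) / (NIR + red)."""
    if nir + red == 0:
        raise ValueError("NIR + red reflectance must be non-zero")
    return (nir - red) / (nir + red)

def excess_green(r, g, b):
    """Excess Green index, ExG = 2g - r - b, on chromatic coordinates of RGB values."""
    total = r + g + b
    if total == 0:
        raise ValueError("RGB values must not all be zero")
    rn, gn, bn = r / total, g / total, b / total  # normalise so rn + gn + bn = 1
    return 2 * gn - rn - bn

# Hypothetical canopy pixel values, not measurements from the study.
vegetation = ndvi(0.5, 0.1)
greenness = excess_green(50, 100, 50)
grey = excess_green(80, 80, 80)
```

Indices computed this way for every plot form the candidate variable pool that a screening algorithm such as CARS, IRIV, or RF importance would then prune before regression.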

(This article belongs to the Section Precision and Digital Agriculture)

22 pages, 21410 KiB

Open Access Article

A High-Resolution Spatial Distribution-Based Integration Machine Learning Algorithm for Urban Fire Risk Assessment: A Case Study in Chengdu, China

by Yulu Hao, Mengdi Li, Jianyu Wang, Xiangyu Li and Junmin Chen

ISPRS Int. J. Geo-Inf. 2023, 12(10), 404; https://doi.org/10.3390/ijgi12100404 - 3 Oct 2023

Cited by 9 | Viewed by 2645

Abstract

The development and functional enhancement of urban areas have led to increasingly severe fire risks in recent decades. Previous urban fire risk assessment methods relied on subjective judgment, coarse data collection, and simple linear statistical methods. These drawbacks can lead to low robustness of evaluation and poor generalization ability. To resolve these problems, this paper selects indicators and regression models based on high-resolution data describing the spatial distribution characteristics of Longquanyi District in Chengdu, China, and proposes an integrated machine learning algorithm for fire risk assessment. First, kernel density analysis is used to map fourteen urban characteristics related to fire risk. The contributions of these indicators (characteristics) to fire risk and the corresponding index are determined by Random Forest (RF), Gradient Boosting Decision Tree (GBDT), and eXtreme Gradient Boosting (XGBoost). Then, the spatial correlation of fire risk is determined through Moran’s I, and the spatial distribution pattern of indicator weights is clarified through raster coefficient spatial analysis. Finally, with these selected indicators, we test the regression performance with a backpropagation neural network (BPNN) algorithm and a geographically weighted regression (GWR) model. The results indicate that numerical variables are more suitable than dummy variables for estimating micro-scale fire risks. The main high-contribution factors are all numerical variables, including roads, gas pipelines, GDP, hazardous chemical enterprises, petrol and charging stations, cultural heritage protection units, assembly occupancies, and high-rise buildings. The machine learning algorithm integrating RF and BPNN shows the best performance (R² = 0.97), followed by the RF-GWR integrated algorithm (R² = 0.87).
Compared with previous methods, this algorithm reduces the subjectivity of traditional assessment models and can automatically identify the key indicators of urban fire risk. Hence, this approach provides a more robust tool for assessing the future fire safety level of urban areas.
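The abstract uses Moran’s I to test whether fire risk is spatially clustered. As a hedged illustration only (not the authors’ code), the global Moran’s I statistic can be computed from per-cell risk values and a spatial weight matrix. The function name `morans_i`, the toy four-cell grid, and the rook-adjacency weights are assumptions chosen for illustration.

```python
from typing import Sequence


def morans_i(values: Sequence[float], weights: Sequence[Sequence[float]]) -> float:
    """Global Moran's I for n observations and an n x n spatial weight matrix.

    I = (n / S0) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2,
    where S0 is the sum of all weights. Diagonal weights are expected to be 0.
    """
    n = len(values)
    if n == 0 or len(weights) != n or any(len(row) != n for row in weights):
        raise ValueError("weights must be an n x n matrix matching values")
    mean = sum(values) / n
    dev = [x - mean for x in values]  # deviations from the mean risk value
    s0 = sum(sum(row) for row in weights)  # total spatial weight
    denom = sum(d * d for d in dev)
    if s0 == 0 or denom == 0:
        raise ValueError("need non-zero weights and non-constant values")
    # Cross-products of deviations between neighbouring cells.
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return n * num / (s0 * denom)


# Hypothetical example: four raster cells in a row, rook adjacency (shared edge) = 1.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
print(morans_i([1, 1, 5, 5], W))  # clustered risk -> positive, about 0.333
print(morans_i([1, 5, 1, 5], W))  # alternating risk -> negative, -1.0
```

Values near +1 indicate clustering of similar risk levels, values near -1 indicate a checkerboard-like pattern; significance testing (for example, by permutation) is not shown here.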

(This article belongs to the Special Issue Application of Geographical Information System in Urban Design, Management or Evaluation)


27 pages, 8593 KiB

Open Access Article

Machine Learning for Predicting Forest Fire Occurrence in Changsha: An Innovative Investigation into the Introduction of a Forest Fuel Factor

by Xin Wu, Gui Zhang, Zhigao Yang, Sanqing Tan, Yongke Yang and Ziheng Pang

Remote Sens. 2023, 15(17), 4208; https://doi.org/10.3390/rs15174208 - 27 Aug 2023

Cited by 20 | Viewed by 4155

Abstract


Affected by global warming and increasingly frequent extreme weather, Hunan Province experienced a phased and concentrated outbreak of forest fires in 2022, causing significant damage and impact. Predicting the occurrence of forest fires can improve early prediction and strengthen early warning and response. Currently, fire prevention and suppression in China’s forests and grasslands face severe challenges due to the overlap of natural and social factors. Existing forest fire occurrence prediction models mostly consider vegetation, topographic, meteorological, and human activity factors; however, the occurrence of forest fires is closely related to the moisture content of forest fuel. In this study, traditional driving factors of forest fire, such as satellite hotspots, vegetation, meteorology, topography, and human activities from 2004 to 2021, were introduced along with forest fuel factors (vegetation canopy water content and evapotranspiration from the top of the vegetation canopy), and a database of factors for predicting forest fire occurrence was constructed. A forest fire occurrence prediction model was then built using machine learning methods, namely the Random Forest (RF), Gradient Boosting Decision Tree (GBDT), and Adaptive Boosting (AdaBoost) models. The accuracy of the models was verified using the Area Under the Curve (AUC) and four other metrics. The RF model, with an AUC of 0.981, predicted the probability of forest fire occurrence more accurately than the other models, followed by GBDT (AUC = 0.978) and AdaBoost (AUC = 0.891). The RF model was therefore selected to predict the monthly forest fire probability in Changsha in 2022 and was combined with Inverse Distance Weighting (IDW) interpolation to map the monthly forest fire probability in Changsha.
We found that the monthly spatial and temporal distribution of forest fire probability in Changsha varied significantly, with March, April, May, September, October, November, and December having higher forest fire probability. The highest probability of forest fires occurred in the central and northern regions. The core drivers of forest fire occurrence in Changsha City were vegetation canopy evapotranspiration and vegetation canopy water content, and the RF model was identified as the more suitable forest fire occurrence probability prediction model for Changsha City. This study also found that vegetation characteristics and fuel factors have more influence on forest fire occurrence in Changsha City than meteorological factors, and that surface temperature has less influence.
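The abstract combines point-wise RF probabilities with Inverse Distance Weighting to produce continuous monthly probability maps. The sketch below shows the basic IDW estimate, a weighted mean whose weights decay as 1/d^p. The power p = 2, the function name `idw`, and the sample coordinates and probabilities are assumptions for illustration; the abstract does not state the power or search neighbourhood used.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def idw(samples: Sequence[Tuple[Point, float]], target: Point, power: float = 2.0) -> float:
    """Inverse Distance Weighting: weighted mean of sample values with weights 1 / d**power."""
    if not samples:
        raise ValueError("at least one sample is required")
    num = 0.0
    den = 0.0
    for (x, y), value in samples:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value  # target coincides with a sample point
        w = 1.0 / d ** power  # closer samples get larger weights
        num += w * value
        den += w
    return num / den


# Hypothetical sample points carrying predicted fire probabilities.
stations = [((0.0, 0.0), 0.2), ((2.0, 0.0), 0.6)]
print(idw(stations, (1.0, 0.0)))  # equidistant -> about 0.4 (plain mean)
print(idw(stations, (0.5, 0.0)))  # closer to the first point -> about 0.24
```

A larger power makes the surface more local (nearest samples dominate); a smaller power smooths it.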

(This article belongs to the Topic Application of Remote Sensing in Forest Fire)


26 pages, 6929 KiB

Open Access Article

A Novel Heterogeneous Ensemble Framework Based on Machine Learning Models for Shallow Landslide Susceptibility Mapping

by Haozhe Tang, Changming Wang, Silong An, Qingyu Wang and Chenglin Jiang

Remote Sens. 2023, 15(17), 4159; https://doi.org/10.3390/rs15174159 - 24 Aug 2023

Cited by 12 | Viewed by 1878

Abstract


Landslides are devastating natural disasters that seriously threaten human life and property. Landslide susceptibility mapping (LSM) plays a key role in landslide hazard management. Machine learning (ML) models are widely used in LSM but suffer from limitations such as overfitting and unreliable accuracy. To improve on the classification performance of a single ML model, this study selects logistic regression (LR), support vector machine (SVM), random forest (RF), and gradient boosting decision tree (GBDT), proposes a novel heterogeneous ensemble framework based on Bayesian optimization (BO), namely stratified weighted averaging (SWA), and tests its applicability in a typical landslide area in Yanbian Prefecture, China. First, a dataset of 1531 historical landslides was collected from field investigations and historical records, and a spatial database containing 16 predisposing factors was established. The dataset was divided into a training set and a test set in a 7:3 ratio. The results showed that SWA effectively improved the accuracy, AUC, and robustness of the model compared with a single ML model, and SWA achieved the best classification results (accuracy = 91.39%, AUC = 0.967). To verify the generalization ability of SWA, we tested it on published landslide datasets from Yanshan County and Yongxin County in China, where SWA also performed well, with AUCs of 0.871 and 0.860, respectively. As indicated by Shapley values (SVs), the Normalized Difference Vegetation Index (NDVI) is the factor with the greatest impact on landslide occurrence. The landslide susceptibility maps obtained in this study provide an effective reference for land use planning and disaster prevention and mitigation projects in Yanbian Prefecture, China.
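The core idea behind a weighted-averaging ensemble is to combine the landslide probabilities of several base models and evaluate the result with AUC. The sketch below shows plain weighted soft voting plus a rank-based AUC; it is not the paper’s SWA, whose stratification and BO-tuned weights are not described in the abstract. The function names, the equal weights, and the toy scores are all hypothetical.

```python
from typing import List, Sequence


def weighted_average(probas: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted soft voting: combine per-model probabilities with non-negative weights."""
    if not probas or len(probas) != len(weights):
        raise ValueError("need one weight per model")
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")
    n = len(probas[0])
    # Normalize by the weight total so the output stays a probability in [0, 1].
    return [sum(w * p[i] for w, p in zip(weights, probas)) / total for i in range(n)]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = landslide cell, 0 = non-landslide cell.
labels = [1, 1, 0, 0]
model_a = [0.9, 0.4, 0.6, 0.1]  # scores from one base learner
model_b = [0.3, 0.9, 0.5, 0.2]  # scores from another base learner
ensemble = weighted_average([model_a, model_b], [0.5, 0.5])
print(auc(labels, model_a), auc(labels, model_b), auc(labels, ensemble))  # 0.75 0.75 1.0
```

The toy numbers only illustrate how averaging can correct ranking errors that the individual models make on different samples; they are not results from the study.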

(This article belongs to the Special Issue Remote Sensing Techniques for Landslide Prediction, Monitoring, and Early Warning)


19 pages, 4069 KiB

Open Access Article

A Multi-Source Data Fusion Method for Assessing the Tunnel Collapse Risk Based on the Improved Dempster–Shafer Theory

by Bo Wu, Jiajia Zeng, Ruonan Zhu, Weiqiang Zheng and Cong Liu

Appl. Sci. 2023, 13(9), 5606; https://doi.org/10.3390/app13095606 - 1 May 2023

Cited by 5 | Viewed by 2325

Abstract


Collapse is the main engineering disaster in tunnel construction using the drilling and blasting method, and risk assessment is an important means of significantly reducing engineering disasters. To address random decision-making and the misjudgment caused by single indices in traditional risk assessment, this study proposes a high-accuracy multi-source data fusion method based on an improved Dempster–Shafer evidence theory (D-S model), which enables accurate assessment of the tunnel collapse risk value. The evidence conflict coefficient K is used as the identification index, and credibility and importance are introduced. The weight coefficient is determined separately for two situations, depending on whether the evidence is identified as conflicting. The advanced geological forecast data, on-site inspection data, and instrument monitoring data are trained with a Cloud Model (CM), Gradient Boosting Decision Tree (GBDT), and Support Vector Classification (SVC), respectively, to obtain the initial basic probability assignment (BPA) values. The identified conflicting evidence is adjusted with the weight coefficients, and the evidence from different sources is then fused to obtain the overall collapse risk value. Finally, accuracy is used to verify the proposed method. The proposed method has been successfully applied to the Wenbishan Tunnel. The results show that the evaluation accuracy of the proposed multi-source information fusion method reaches 88%, which is 16% higher than that of the traditional D-S model and more than 20% higher than that of the single-source information method. The high-precision multi-source data fusion method proposed in this paper shows good universality and effectiveness in tunnel collapse risk assessment.
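The abstract builds on Dempster’s combination rule, in which the conflict coefficient K is the total mass that two sources assign to incompatible hypotheses. The sketch below implements the classical rule only; the paper’s credibility and importance weighting of conflicting evidence is not reproduced. The two-level risk frame (“high”/“low”) and the mass values standing in for BPAs from two data sources are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet, Tuple

Mass = Dict[FrozenSet[str], float]  # basic probability assignment over subsets of the frame


def dempster_combine(m1: Mass, m2: Mass) -> Tuple[Mass, float]:
    """Classical Dempster's rule; returns the fused BPA and the conflict coefficient K."""
    fused: Mass = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            fused[inter] = fused.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass falling on the empty set contributes to K
    if conflict >= 1.0:
        raise ValueError("total conflict (K = 1): evidence cannot be combined")
    # Renormalize the non-conflicting mass by 1 - K.
    return {k: v / (1.0 - conflict) for k, v in fused.items()}, conflict


# Hypothetical BPAs from two sources over the frame {high, low} collapse risk.
H, L = "high", "low"
theta = frozenset({H, L})  # mass on the whole frame expresses ignorance
m_geo = {frozenset({H}): 0.7, frozenset({L}): 0.2, theta: 0.1}
m_mon = {frozenset({H}): 0.6, frozenset({L}): 0.3, theta: 0.1}
fused, k = dempster_combine(m_geo, m_mon)
print(round(k, 2))                      # 0.33
print(round(fused[frozenset({H})], 3))  # 0.821
```

When K approaches 1 the classical rule produces counter-intuitive results, which is the situation that improved D-S variants such as the one in this abstract aim to handle by reweighting conflicting evidence before fusion.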

