MDPI - Publisher of Open Access Journals

25 pages, 2771 KB

Open AccessArticle

Data Fusion and Machine Learning for Diagnosing Electrical and Mechanical Faults in BLDC Motors

by Marek Karbowniczyn and Jerzy Baranowski

Machines 2026, 14(6), 680; https://doi.org/10.3390/machines14060680 (registering DOI) - 11 Jun 2026

Viewed by 47

One of the main challenges in BLDC motor diagnostics is the identification of faults with different physical origins, especially in mixed states where the symptoms of multiple faults may overlap. In this work, a classification system based on feature-level data fusion was developed by combining current and rotational signals. A homogeneous Stacking Ensemble model was used as the main mechanism for fault classification. The study was conducted on a dataset of 184 samples representing four operating conditions: healthy operation, mechanical faults, electrical faults associated with permanent magnet degradation, and their combined occurrence. The stability of the proposed classifier was evaluated using ten different data splits. The experiments showed that omitting PCA preserves more diagnostically relevant information contained in the raw features, resulting in a classification accuracy of 97.3% with a standard deviation of 0.017. PCA consistently reduced performance across all considered data modalities. The model was further analysed using SHAP, indicating that its decisions were driven by physically interpretable features from both the rotational and current domains. Full article

(This article belongs to the Section Machine Design and Theory)

30 pages, 14210 KB

Open AccessArticle

Characterising Multivariate Air Pollution State Evolution in an Urban Atmosphere Using Deep-Learned Baseline Representations: London

by Arda Eraslan, David Topping, Dudley E. Shallcross, M. A. H. Khan and Aşan Bacak

Atmosphere 2026, 17(6), 589; https://doi.org/10.3390/atmos17060589 - 8 Jun 2026

Viewed by 374

Abstract

Urban air quality management plays a significant role because of its effects on public health and on the pollution characteristics of countries with constantly changing policies. Traditional approaches capture how much pollution is present but are unable to detect changes in the chemical character of the atmosphere, the relationships between co-emitted species, the balance of photochemical processing, and the combustion fingerprint of emission sources. This study introduces a framework that identifies and diagnoses such evolutions within the pollutants of the atmosphere. A chemistry-aware Variational Autoencoder (VAE) is trained on 19 multivariate pollution features (7 raw concentrations, 5 chemical ratios, 7 temporal gradients) at London Marylebone Road (urban roadside) and North Kensington (urban background) from 2015 to 2019, and tested on 2022–2025. A four-method ensemble framework (VAE reconstruction error, reconstruction probability, Isolation Forest, and statistical Z-score) requires agreement among at least three methods to identify high-confidence departed pollution states. Per-feature decomposition of the reconstruction probability diagnoses the chemical character of each departure. At the roadside site, 14.5% of post-COVID hours fall within departed states, dominated by the CO/NO_x combustion ratio (513.2) and the photostationary state proxy (391.4), which are chemical relationships rather than individual concentrations. This indicates that at the point of emission, London’s fleet modernisation and Ultra Low Emission Zone (ULEZ) have changed the combustion fingerprint and photochemical equilibrium. The same structural indicators are carried over during the COVID-19 lockdown; however, O₃ rises 3.2× during the pandemic period, reflecting suppressed NO titration. Conversely, at the urban background site, where the departures are driven by concentrations and boundary-layer trapping (r = −0.659), the combustion fingerprint of the atmosphere cannot be detected (CO/NO_x = −45.0). These findings indicate that London’s emission landscape has undergone fundamental transformations over the past decade, and that the consequences of ULEZ and similar interventions, or the greater impacts of pandemic-related events, are not homogeneously distributed across the region.
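The agreement rule described in this abstract (a state counts as departed only when at least three of the four detectors flag it) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `departed_states` and the five hours of detector flags are hypothetical, and each detector is assumed to have already been thresholded into a True/False flag per hour.

```python
from typing import List, Sequence


def departed_states(method_flags: Sequence[Sequence[bool]], min_agree: int = 3) -> List[bool]:
    """Mark an hour as departed when at least `min_agree` methods flag it.

    method_flags[m][t] is True when method m flags hour t as anomalous.
    """
    n_hours = len(method_flags[0])
    if any(len(flags) != n_hours for flags in method_flags):
        raise ValueError("all methods must cover the same hours")
    return [
        sum(1 for flags in method_flags if flags[t]) >= min_agree
        for t in range(n_hours)
    ]


# Hypothetical flags for five hours from the four detectors named in the abstract.
vae_error = [True, True, False, False, True]
vae_probability = [True, True, False, True, True]
isolation_forest = [True, False, False, False, True]
z_score = [False, False, False, True, True]

flags = departed_states([vae_error, vae_probability, isolation_forest, z_score])
print(flags)
```

Requiring a majority of detectors trades sensitivity for confidence: an hour flagged by only two of the four methods is not reported as departed.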

(This article belongs to the Special Issue Advances in Air Pollution Data Analysis: From Classical Geostatistics to Big Data and Artificial Intelligence)

31 pages, 1820 KB

Open AccessArticle

Do Complex Models Matter? Evidence from Multiclass Machine Learning Models in Credit Outlook Prediction

by Rashmi Malhotra, Davinder Malhotra, Robert Nydick and Nathan Coates

J. Risk Financial Manag. 2026, 19(6), 389; https://doi.org/10.3390/jrfm19060389 - 28 May 2026

Viewed by 190

Abstract

This research explores whether boosting model complexity enhances the forecasting of corporate financial outlook in a multiclass credit outlook setup. Instead of viewing distress as simply a yes-or-no result, companies are divided into negative, neutral, and positive outlook categories to better reflect shifting credit conditions. The study evaluates a parametric baseline against several nonlinear classifiers—including ensemble, kernel-based, and similarity-driven approaches—while applying a consistent validation process and statistical testing. On average, nonlinear models outperform the linear specification in terms of out-of-sample accuracy and provide more homogeneous classification across the three outlook categories. Importantly, they substantially improve the identification of firms with financial vulnerabilities. Among nonlinear models, average performance differences are economically small and statistically insignificant. These findings suggest that there are diminishing returns to additional complexity once nonlinear structure is allowed for in the models. SHAP-based interpretability provides exploratory evidence that model decisions are economically intuitive and broadly consistent with nonlinear, state-dependent credit risk dynamics. Negative financial surprises tend to be penalized more heavily than positive ones are appreciated, demonstrating the convex nature of the underlying risk dynamics. Full article

(This article belongs to the Special Issue Financial Decision Making in the Age of Artificial Intelligence)

29 pages, 4432 KB

Open AccessArticle

When Does Machine Learning Add Value over Theory? Predicting API Solubility in Binary Mixtures with COSMO-RS and DOOIT2 Across Diverse and Homogeneous Systems

by Maciej Przybyłek, Tomasz Jeliński, Adrian Drużyński and Piotr Cysewski

Molecules 2026, 31(10), 1566; https://doi.org/10.3390/molecules31101566 - 8 May 2026

Viewed by 682

Abstract

Predicting the solubility of active pharmaceutical ingredients (APIs) in binary aqueous-organic mixtures is critical for formulation design, yet remains challenging. Physics-based models such as COSMO-RS provide a solid theoretical foundation but often struggle with non-ideal mixing behavior in complex systems. This study asks a practical question: when does machine learning actually add value beyond established theory? We compared COSMO-RS with DOOIT2 (Dual-Objective Optimization with Iterative Feature Pruning), a hybrid COSMO-RS/machine-learning correction workflow, across two complementary datasets: 85 structurally diverse APIs and related formulation-relevant compounds (10,140 data points) and 37 acid-centered solutes (6030 data points). The datasets also incorporate newly measured solubilities of lidocaine, benzocaine, and vanillic acid in aqueous 4-formylmorpholine mixtures. DOOIT2 employs rigorous API-out Structured Group K-Fold validation with fold-specific ensemble models to ensure realistic assessment of generalization to unseen compounds. The obtained results are dataset-dependent. For the homogeneous acid series, COSMO-RS already delivers strong predictive performance (RMSD = 0.321, R² = 0.925), and DOOIT2 brings no meaningful improvement (RMSD = 0.310, R² = 0.923). In contrast, for the diverse API set, DOOIT2 reduces RMSD from 0.686 to 0.527 and increases R² from 0.829 to 0.849. Residual analysis indicates that prediction uncertainty is driven primarily by the low-solubility region rather than by a simple monotonic dependence on molecular weight alone. These findings delineate the practical boundaries of machine-learning assistance in solubility prediction and offer clear guidance for formulation scientists. Full article
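The idea behind compound-out validation, where every data point of a given solute lands in the same fold so that test compounds are never seen in training, can be sketched with a plain group K-fold splitter. This is an assumption-laden illustration rather than the DOOIT2 "Structured Group K-Fold" procedure itself, whose details the abstract does not give; the function `group_k_fold` and the toy group labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def group_k_fold(groups: Sequence[Hashable], n_splits: int) -> List[Tuple[List[int], List[int]]]:
    """Return (train_idx, test_idx) pairs; each group appears in exactly one test fold."""
    by_group: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, group in enumerate(groups):
        by_group[group].append(idx)
    if n_splits < 2 or n_splits > len(by_group):
        raise ValueError("n_splits must be between 2 and the number of groups")

    # Greedy balancing: place the largest groups first, each into the smallest fold so far.
    folds: List[List[int]] = [[] for _ in range(n_splits)]
    for group in sorted(by_group, key=lambda g: -len(by_group[g])):
        smallest = min(range(n_splits), key=lambda k: len(folds[k]))
        folds[smallest].extend(by_group[group])

    splits = []
    for k in range(n_splits):
        test = sorted(folds[k])
        train = sorted(i for j in range(n_splits) if j != k for i in folds[j])
        splits.append((train, test))
    return splits


# Toy example: nine data points belonging to three solutes (labels taken from the abstract).
groups = ["lidocaine"] * 3 + ["benzocaine"] * 2 + ["vanillic acid"] * 4
splits = group_k_fold(groups, n_splits=3)
for train_idx, test_idx in splits:
    print(sorted({groups[i] for i in test_idx}), "held out;", len(train_idx), "training points")
```

Splitting by compound rather than by data point gives a more realistic estimate of performance on unseen solutes, because measurements of the same compound at different solvent compositions are strongly correlated.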

(This article belongs to the Special Issue Organic Molecules in Drug Discovery and Development)

37 pages, 636 KB

Open AccessArticle

Protocol-Dependent Critical Exponents in Random Composites: Beyond Universality

by Simon Gluzman, Zhanat Zhunussova, Akylkerey Sarvarov and Vladimir Mityushev

Symmetry 2026, 18(4), 700; https://doi.org/10.3390/sym18040700 - 21 Apr 2026

Cited by 1 | Viewed by 449

Abstract

Classical homogenization theory treats critical exponents as universal quantities depending only on spatial dimension, but recent evidence shows that this assumption fails for continuum composites once the mechanism of randomness generation is taken into account. We synthesize three complementary frameworks (structural approximation, structural sums, and self-similar renormalization) to develop a unified geometric theory of criticality in random composites. Dilute-regime expansions for the effective conductivity and shear modulus are expressed in terms of structural sums whose ensemble statistics depend sensitively on the randomness protocol. To bridge the dilute and critical regimes, we employ self-similar factor approximants, iterated-root approximants, additive approximants, and renormalization schemes based on minimal-difference and minimal-sensitivity conditions, combined with Borel summation. For maximally disordered protocols P(τ), the conductivity index s and the elasticity index S fall within comparable numerical ranges, indicating a shared geometric origin and spectral response to the continuous breaking of translational symmetry. A regular periodic arrangement of inclusions (τ = 0) possesses full discrete translational symmetry; as a stochastic protocol P(τ) is applied (τ increases), this symmetry is gradually degraded until statistical chaos is reached. For instance, the parameter τ can be considered as a time of stirring. During this evolution, the system traverses a continuous spectrum of critical indices, s = s[P(τ)], which encodes the geometric and topological memory of the initial ordered state. It is established that the classical “universality” of percolation corresponds to a fixed point τ within a broader manifold of protocol-dependent critical behaviors. The framework developed here provides a coherent basis for inverse design, diagnostics, and classification of random composites by their disorder history, offering a geometric alternative to the universality paradigm.

(This article belongs to the Section Mathematics)

29 pages, 2804 KB

Open AccessArticle

Ensemble Graph Neural Networks for Probabilistic Sea Surface Temperature Forecasting via Input Perturbations

by Alejandro J. González-Santana, Giovanny A. Cuervo-Londoño and Javier Sánchez

Electronics 2026, 15(8), 1583; https://doi.org/10.3390/electronics15081583 - 10 Apr 2026

Viewed by 402

Abstract

Accurate regional ocean forecasting requires models that are both computationally efficient and capable of representing predictive uncertainty. This work investigates ensemble learning strategies for sea surface temperature (SST) forecasting using Graph Neural Networks (GNNs), with a focus on how input perturbation design affects forecast skill and uncertainty representation. We adapt a GNN architecture to the Canary Islands region in the North Atlantic and implement a homogeneous ensemble approach inspired by bagging, where diversity is introduced during inference by perturbing initial ocean states rather than retraining multiple models. Several noise-based ensemble generation strategies are evaluated, including Gaussian noise, Perlin noise, and fractal Perlin noise, with systematic variation of noise intensity and spatial structure. Ensemble forecasts are assessed over a 15-day horizon using deterministic metrics (RMSE and bias) and probabilistic metrics, including the Continuous Ranked Probability Score (CRPS) and the Spread–skill ratio. The results show that, while deterministic skill remains comparable to the single-model forecast, the type and structure of input perturbations influence uncertainty representation, particularly at longer lead times. Ensembles generated with spatially coherent perturbations, such as low-resolution Perlin noise, achieve improved calibration and lower CRPS compared to purely random Gaussian perturbations. These findings highlight the role of noise structure and scale in ensemble GNN design, indicating that specifically structured input perturbations can improve ensemble diversity and calibration without additional training cost. These results provide a methodological contribution toward the study of ensemble-based GNN approaches for regional ocean forecasting. Full article
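For readers unfamiliar with the Continuous Ranked Probability Score named in this abstract, the standard ensemble estimator can be sketched in a few lines: the mean absolute error of the members minus half the mean absolute difference between member pairs. This is a generic textbook form for a single grid point and lead time, not the paper's evaluation code; the function name `crps_ensemble` and the sample temperatures are hypothetical.

```python
from typing import Sequence


def crps_ensemble(members: Sequence[float], observation: float) -> float:
    """CRPS estimate for one forecast: E|X - y| - 0.5 * E|X - X'| over ensemble members."""
    m = len(members)
    if m == 0:
        raise ValueError("ensemble must contain at least one member")
    # Average distance of the members from the observation (accuracy term).
    accuracy = sum(abs(x - observation) for x in members) / m
    # Half the average distance between all member pairs (spread term).
    spread = sum(abs(a - b) for a in members for b in members) / (2 * m * m)
    return accuracy - spread


# Hypothetical SST forecasts (degrees C) from three perturbed initial states.
sst_members = [20.1, 20.4, 20.9]
sst_observed = 20.5
print(round(crps_ensemble(sst_members, sst_observed), 4))
```

With a single member the spread term vanishes and the score reduces to the absolute error, which is why CRPS is often read as a probabilistic generalisation of MAE; lower values are better.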

(This article belongs to the Special Issue Feature Papers in Artificial Intelligence)

16 pages, 5847 KB

Open AccessArticle

Reshaping Optical Speckles and Random Light Beam

by Yi Cui and Jun Xiong

Photonics 2026, 13(4), 342; https://doi.org/10.3390/photonics13040342 - 31 Mar 2026

Viewed by 577

Abstract

Speckle patterns generated by coherent illumination of random media are ubiquitous in optical imaging and information processing. However, most existing studies have primarily focused on isotropic or homogeneous speckle fields, while controlled manipulation of speckle patterns with customized geometric morphologies has received comparatively little attention. Here, we propose a random phase-coded array (RPA) as a general framework for generating geometrically reshaped speckle, enabling the formation of nonconventional random light fields whose ensemble-averaged intensity distributions follow prescribed geometric shapes. In this framework, the speckle geometry is determined by the unit-cell structure of the RPA, the unit-cell size governs the overall spatial extent of the speckle pattern, and the illuminating beam size sets the characteristic speckle grain size. These relationships are rigorously validated through theoretical derivations and numerical simulations. As a result, the global statistical envelope of the random light field can be intuitively and flexibly controlled without compromising the fully developed speckle characteristics. Our experimental framework offers a straightforward, scalable, and versatile approach for generating customized random light fields, with potential applications in optical information processing, secure optical communication, computational imaging, and speckle-based metrology. Full article

(This article belongs to the Special Issue Ghost Imaging and Quantum-Inspired Classical Optics)

21 pages, 575 KB

Open AccessArticle

An Adaptive Online Knowledge Distillation Algorithm for Edge Computing Models Enhanced by Elite-Students

by Jincheng Xia, Yan Zhou, Xu Yang and Chengyan Zhao

Mathematics 2026, 14(5), 878; https://doi.org/10.3390/math14050878 - 5 Mar 2026

Viewed by 661

Abstract

In recent years, deep learning models have exhibited exceptional performance across several tasks. However, the substantial computational and storage demands impede implementation on edge devices with constrained resources. Online Knowledge Distillation (OKD) has arisen as an effective model compression strategy that removes the reliance on pre-trained teachers characteristic of conventional distillation approaches. Nonetheless, OKD persists in facing challenges, including substantial performance variances within student networks, insufficient learning capacity for difficult data, and network homogeneity. To address those issues, this paper proposes an Elite-Student-Enhanced Adaptive Online Knowledge Distillation (ESAKD) algorithm. ESAKD introduces a patience factor-based adaptive temperature scheduling mechanism to dynamically balance knowledge clarity and richness during knowledge transfer. This mechanism utilizes the performance benefits of elite-students, particularly during initial training phases, to offer superior supervision that successfully transcends the learning capacity limitations of current OKD approaches. This method promotes swift convergence and substantially enhances the ultimate precision of the standard-student models. Furthermore, a confidence-weighted ensemble student model is designed to improve collective decision-making. Experimental assessments indicate that ESAKD provides substantial performance improvements over supervised learning without distillation and other leading online distillation techniques. On the CIFAR-100 dataset, ESAKD improves the test accuracy of various models by 1.49–6% over the undistilled baselines, and by 0.27–2.18% compared to advanced online distillation algorithms. Moreover, it exhibits enhanced performance on the Tiny-ImageNet-200 dataset as well. Full article
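The role of temperature in knowledge distillation, which this abstract describes as a balance between knowledge clarity and richness, can be illustrated with the standard temperature-scaled softmax. The sketch below shows only that generic building block; it does not reproduce ESAKD's patience-factor schedule, which the abstract does not specify. The function name and the example logits are hypothetical.

```python
import math
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Convert logits to probabilities; a higher temperature gives a softer distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits of one peer network for a three-class problem.
logits = [1.0, 2.0, 3.0]
sharp = softmax_with_temperature(logits, temperature=1.0)
soft = softmax_with_temperature(logits, temperature=4.0)
print([round(p, 3) for p in sharp])
print([round(p, 3) for p in soft])
```

A low temperature keeps the top class dominant (clear targets), whereas a high temperature exposes more of the relative similarity between classes (richer targets), so a schedule that adapts the temperature during training moves between these two regimes.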

49 pages, 2852 KB

Open AccessArticle

Color–Distance Relations in Cometary Comae: A 14-Comet, Multi-Epoch Statistical Study

by Alberto Silva Betzler, Ingrid dos Santos Delfino, Agábio Brasil dos Santos and Orahcio Felicio de Sousa

Universe 2026, 12(3), 65; https://doi.org/10.3390/universe12030065 - 27 Feb 2026

Viewed by 566

Abstract

Color–distance relations in the comae of 14 comets are analyzed using homogeneous broadband UBV/BVRI photometry. The sample includes several inner-Solar-System–reaching comets, including a subset from near-Earth orbits in the dynamical sense (perihelion distance q < 1.3 au), so the results are directly relevant to the near-Earth meteoroid environment. For each comet, we combine robust color statistics, rank-correlation tests, and simple activity laws to define two empirical diagnostics: an absolute color at 1 au and a differential heliocentric color index that measures color changes with distance. The ensemble does not follow a single universal trend; instead, we identify three empirical classes. One class of comets shows significant color gradients, usually confined to blue-sensitive indices and consistent with varying gas-to-dust ratios along the orbit. A second class exhibits colors that are persistently redder than the Sun and are statistically consistent with being constant both with heliocentric distance and across perihelion. A third class of “step comets” shows discrete changes in color level between pre- and post-perihelion branches, most often in red or red–near-IR indices, with little or no monotonic color–distance correlation within each branch. Several objects therefore defy the intuitive expectation of becoming bluer as they approach the Sun, emphasizing that heliocentric color evolution is highly object-dependent and that multi-epoch color monitoring is essential for interpreting cometary coma behavior.

(This article belongs to the Special Issue Detection and Tracking of Near-Earth Asteroids)

25 pages, 2523 KB

Open AccessArticle

Link Prediction in Heterogeneous Information Networks: Improved Hypergraph Convolution with Adaptive Soft Voting

by Sheng Zhang, Yuyuan Huang, Ziqiang Luo, Jiangnan Zhou, Bing Wu, Ka Sun and Hongmei Mao

Entropy 2026, 28(2), 230; https://doi.org/10.3390/e28020230 - 16 Feb 2026

Cited by 1 | Viewed by 667

Abstract

Complex real-world systems are often modeled as heterogeneous information networks with diverse node and relation types, bringing new opportunities and challenges to link prediction. Traditional methods based on similarity or meta-paths fail to fully capture high-order structures and semantics, while existing hypergraph-based models homogenize all high-order information without considering their importance differences, diluting core associations with redundant noise and limiting prediction accuracy. Given these issues, we propose the VE-HGCN, a link prediction model for HINs that fuses hypergraph convolution with soft-voting ensemble strategy. The model first constructs multiple heterogeneous hypergraphs from HINs via network frequent subgraph pattern extraction, then leverages hypergraph convolution for node representation learning, and finally employs a soft-voting ensemble strategy to fuse multi-model prediction results. Extensive experiments on four public HIN datasets show that the VE-HGCN outperforms seven mainstream baseline models, thereby validating the effectiveness of the proposed method. This study offers a new perspective for link prediction in HINs and exhibits good generality and practicality, providing a feasible reference for addressing high-order information utilization issues in complex heterogeneous network analysis. Full article
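The final fusion step mentioned in this abstract, soft voting over several models' link probabilities, can be sketched as a weighted average followed by a threshold. This is a generic illustration only: how VE-HGCN chooses its adaptive weights is not stated in the abstract, so the weights, the 0.5 threshold, and the function names below are assumptions.

```python
from typing import Sequence


def soft_vote(probabilities: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of per-model probabilities that a candidate link exists."""
    if not probabilities or len(probabilities) != len(weights):
        raise ValueError("need one weight per model probability")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(p * w for p, w in zip(probabilities, weights)) / total


def predict_link(probabilities: Sequence[float], weights: Sequence[float], threshold: float = 0.5) -> bool:
    """Predict a link when the fused probability reaches the threshold (assumed 0.5 here)."""
    return soft_vote(probabilities, weights) >= threshold


# Hypothetical scores from three hypergraph models for one candidate link.
model_probabilities = [0.9, 0.4, 0.6]
model_weights = [1.0, 1.0, 2.0]  # illustrative weights, e.g. derived from validation performance
fused = soft_vote(model_probabilities, model_weights)
print(fused, predict_link(model_probabilities, model_weights))
```

Unlike hard (majority) voting, soft voting keeps each model's confidence, so a strongly confident model can outweigh two weakly dissenting ones.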

(This article belongs to the Special Issue Advances in Complex Networks and Their Applications, from COMPLEX NETWORKS 2025)

24 pages, 2506 KB

Open AccessArticle

CEVD: Cluster-Based Ensemble Learning for Cross-Project Vulnerability Detection

by Yang Cao, Yunwei Dong and Jie Liu

Future Internet 2026, 18(2), 85; https://doi.org/10.3390/fi18020085 - 5 Feb 2026

Viewed by 614

Abstract

Deep learning has become an important approach for automated software vulnerability detection. However, due to domain shift, existing models often suffer from significant performance degradation when applied to unseen projects. To address this issue, prior studies have widely adopted Domain Adaptation (DA) techniques to improve cross-project generalization. Nevertheless, these methods typically rely on the implicit “project-as-domain” assumption and require access to target project data during training, which limits their applicability in practice. To overcome these limitations, this paper proposes a vulnerability detection framework that combines semantic clustering with ensemble-based Domain Generalization (DG), termed Cluster-based Ensemble Learning for Vulnerability Detection (CEVD). CEVD first performs unsupervised clustering on code semantic embeddings to automatically identify latent semantic structures that transcend project boundaries, constructing pseudo-domains with intra-domain homogeneity. A soft domain labeling strategy is further introduced to model the membership of samples in multiple pseudo-domains, preserving semantic overlap across boundaries. Building upon this, CEVD employs an ensemble learning framework that jointly trains multiple expert models and a domain classifier. The predictions of these experts are dynamically fused based on the weights generated by the domain classifier, enabling effective vulnerability detection on unseen projects without requiring access to target data. Extensive experiments on real-world datasets demonstrate that CEVD consistently outperforms state-of-the-art baselines across different pre-trained backbone models. This work demonstrates the effectiveness of semantic structure mining in capturing latent domains and offers a practical solution for improving generalization in cross-project vulnerability detection.

(This article belongs to the Special Issue Security of Computer System and Network)

28 pages, 922 KB

Open AccessArticle

MAESTRO: A Multi-Scale Ensemble Framework with GAN-Based Data Refinement for Robust Malicious Tor Traffic Detection

by Jinbu Geng, Yu Xie, Jun Li, Xuewen Yu and Lei He

Mathematics 2026, 14(3), 551; https://doi.org/10.3390/math14030551 - 3 Feb 2026

Viewed by 810

Abstract

Malicious Tor traffic data contains deep domain-specific knowledge, which makes labeling challenging, and the lack of labeled data degrades the accuracy of learning-based detectors. Real-world deployments also exhibit severe class imbalance, where malicious traffic constitutes a small minority of network flows, which further reduces detection performance. In addition, Tor’s fixed 512-byte cell architecture removes packet-size diversity that many encrypted-traffic methods rely on, making feature extraction difficult. This paper proposes an efficient three-stage framework, MAESTRO v1.0, for malicious Tor traffic detection. In Stage 1, MAESTRO extracts multi-scale behavioral signatures by fusing temporal, positional, and directional embeddings at cell, direction, and flow granularities to mitigate feature homogeneity; it then compresses these representations with an autoencoder into compact latent features. In Stage 2, MAESTRO introduces an ensemble-based quality quantification method that combines five complementary anomaly detection models to produce robust discriminability scores for adaptive sample weighting, helping the classifier to emphasize high-quality samples. MAESTRO also trains three specialized GANs per minority class and applies strict five-model ensemble validation to synthesize diverse high-fidelity samples, addressing extreme class imbalance. We evaluate MAESTRO under systematic imbalance settings, ranging from the natural distribution to an extreme 1% malicious ratio. On the CCS’22 Tor malware dataset, MAESTRO achieves 92.38% accuracy, 64.79% recall, and 73.70% F1-score under the natural distribution, improving F1-score by up to 15.53% compared with state-of-the-art baselines. Under the 1% malicious setting, MAESTRO maintains 21.1% recall, which is 14.1 percentage points higher than the best baseline, while conventional methods drop below 10%. Full article

(This article belongs to the Special Issue New Advances in Network Security and Data Privacy)

33 pages, 12641 KB

Open AccessArticle

Exploring the Impact of Different Clustering Algorithms on the Performance of Ensemble Learning-Based Mass Appraisal Models

by Suleyman Sisman, Abdullah Kara and Arif Cagdas Aydinoglu

Buildings 2026, 16(3), 615; https://doi.org/10.3390/buildings16030615 - 2 Feb 2026

Cited by 1 | Viewed by 871

Abstract

Mass appraisal models are increasingly used to improve valuation accuracy, yet their performance remains highly sensitive to how spatial and non-spatial data are structured before training. Clustering algorithms can be used to segment heterogeneous property groups into more homogeneous ones, potentially improving predictive performance. This study investigates the impact of different clustering algorithms (i.e., K-Means, K-Medians, and the Spatially Constrained Multivariate Clustering Algorithm (SCMCA)) on the performance of prominent ensemble learning-based mass appraisal models (i.e., Random Forest (RF), the Gradient Boosting Machine (GBM), Extreme Gradient Boosting (XGBoost), and the Light Gradient Boosting Machine (LightGBM)). Using a comprehensive real estate dataset, clustering quality is evaluated using the Silhouette, Calinski–Harabasz, and Davies–Bouldin indices, and the performance of cluster-based ensemble mass appraisal models is then compared. The findings indicate that the best performance is achieved with the SCMCA–LightGBM model combination, which reached RMSE = 0.061 and R² = 0.722. Furthermore, clustering-based models provide improvements of up to 7.26% in MAE, 10.61% in MAPE, and 8.40% in RMSE, depending on the combination. The results show that clustering is an effective preprocessing step that can substantially enhance the predictive performance and overall quality of mass appraisal models.

(This article belongs to the Special Issue Study on Real Estate and Housing Management—2nd Edition)

23 pages, 8188 KB

Open AccessArticle

Enhanced Pix2pixGAN with Spatial-Channel Attention for Underground Medium Inversion from GPR

by Sicheng Yang, Liangshuai Guo, Yahan Yang and Hongxia Ye

Remote Sens. 2026, 18(3), 448; https://doi.org/10.3390/rs18030448 - 1 Feb 2026

Viewed by 687

Abstract

Ground penetrating radar (GPR) data inversion, especially in parallel-layered homogeneous media with multiple subsurface targets, still faces challenges in accurately reconstructing geometric structures due to weak reflections and complex target–medium interactions. To address these limitations, this paper proposes a novel multi-scale inversion framework named GPRGAN-SCSE (Ground Penetrating Radar Generative Adversarial Network with Spatial-Channel Squeeze and Excitation). Built upon the Pix2Pix Generative Adversarial Network (Pix2PixGAN), the proposed model incorporates a Spatial-Channel Squeeze and Excitation (SCSE) module into a residual U-Net generator to adaptively enhance target features embedded in layered media. Furthermore, a tri-scale discriminator ensemble is designed to enforce structural consistency and suppress layer-induced artifacts. The network is optimized using a composite loss integrating adversarial loss, L1 loss, and gradient difference loss to jointly improve structural continuity and boundary sharpness. Experiments conducted on a simulation dataset of parallel-layered homogeneous media with multiple targets demonstrate that GPRGAN-SCSE substantially outperforms existing inversion networks. The proposed method reduces the MAE by 63.8% and achieves a Structural Similarity Index (SSIM) of 99.96%, effectively improving the clarity of subsurface edges and the fidelity of geometric contours. These results confirm that the proposed framework provides a robust and high-precision solution for non-destructive subsurface imaging under layered media conditions.
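The composite loss mentioned in the abstract can be pictured with a small plain-Python sketch that combines an adversarial term, an L1 term, and a gradient difference term on 2D grids. This is only an illustration under stated assumptions: the abstract gives neither the weights nor the exact formulation, so the weights `w_adv`, `w_l1`, `w_gdl`, the function names, and the particular gradient difference form (absolute difference of absolute neighbour differences) are all hypothetical, and the adversarial term is passed in as a precomputed number.

```python
def l1_loss(pred, target):
    """Mean absolute error over all pixels of two equally sized 2D grids."""
    n = len(pred) * len(pred[0])
    return sum(
        abs(p - t)
        for pred_row, target_row in zip(pred, target)
        for p, t in zip(pred_row, target_row)
    ) / n


def gradient_difference_loss(pred, target):
    """Mismatch between the absolute neighbour differences of pred and target.

    Vertical and horizontal differences are both accumulated, then averaged
    over the number of pixels.
    """
    rows, cols = len(pred), len(pred[0])
    total = 0.0
    for i in range(rows):
        for j in range(cols):
            if i > 0:  # vertical neighbour
                total += abs(
                    abs(target[i][j] - target[i - 1][j])
                    - abs(pred[i][j] - pred[i - 1][j])
                )
            if j > 0:  # horizontal neighbour
                total += abs(
                    abs(target[i][j] - target[i][j - 1])
                    - abs(pred[i][j] - pred[i][j - 1])
                )
    return total / (rows * cols)


def composite_loss(adv, pred, target, w_adv=1.0, w_l1=1.0, w_gdl=1.0):
    """Weighted sum of the three terms; the default weights are placeholders."""
    return (
        w_adv * adv
        + w_l1 * l1_loss(pred, target)
        + w_gdl * gradient_difference_loss(pred, target)
    )


# Hypothetical toy grids: a sharp vertical boundary, a blurred copy,
# and a copy with a constant offset.
target = [[0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]]
blurred = [[0.25, 0.25, 0.75, 0.75], [0.25, 0.25, 0.75, 0.75]]
shifted = [[0.5, 0.5, 1.5, 1.5], [0.5, 0.5, 1.5, 1.5]]
```

On these toy grids, the blurred copy is penalised by both the L1 and the gradient difference terms, whereas the constant-offset copy is penalised only by L1, because its boundary is as sharp as the target's. That difference is why an edge-sensitive term complements a pixel-wise one.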

(This article belongs to the Topic Ground Penetrating Radar (GPR) Techniques and Applications, 2nd Edition)

20 pages, 12133 KB

Open AccessArticle

Lithofacies Identification by an Intelligent Fusion Algorithm for Production Numerical Simulation: A Case Study on Deep Shale Gas Reservoirs in Southern Sichuan Basin, China

by Yi Liu, Jin Wu, Boning Zhang, Chengyong Li, Feng Deng, Bingyi Chen, Chen Yang, Jing Yang and Kai Tong

Processes 2025, 13(12), 4040; https://doi.org/10.3390/pr13124040 - 14 Dec 2025

Viewed by 578

Abstract

Lithofacies, as an integrated representation of key reservoir attributes including mineral composition and organic matter enrichment, provides crucial geological and engineering guidance for identifying “dual sweet spots” and designing fracturing strategies in deep shale gas reservoirs. However, reliable lithofacies characterization remains particularly challenging owing to significant reservoir heterogeneity, scarce core data, and imbalanced facies distribution. Conventional manual log interpretation tends to be cost-prohibitive and inaccurate, while existing intelligent algorithms suffer from inadequate robustness and suboptimal efficiency, failing to meet demands for both precision and practicality in such complex reservoirs. To address these limitations, this study developed a super-integrated lithofacies identification model termed SRLCL, leveraging well-logging data and lithofacies classifications. The proposed framework synergistically combines multiple modeling advantages while maintaining a balance between data characteristics and optimization effectiveness. Specifically, SRLCL incorporates three key components: Newton-Weighted Oversampling (NWO) to mitigate data scarcity and class imbalance, the Polar Light Optimizer (PLO) to accelerate convergence and enhance optimization performance, and a Stacking ensemble architecture that integrates five heterogeneous algorithms, namely Support Vector Machine (SVM), Random Forest (RF), Light Gradient Boosting Machine (LightGBM), Convolutional Neural Network (CNN), and Long Short-Term Memory (LSTM), to overcome the representational limitations of single-model or homogeneous ensemble approaches. Experimental results indicated that the NWO-PLO-SRLCL model achieved an overall accuracy of 93% in lithofacies identification, exceeding conventional methods by more than 6% while demonstrating remarkable generalization capability and stability. Furthermore, production simulations of fractured horizontal wells based on the lithofacies-controlled geological model showed only a 6.18% deviation from actual cumulative gas production, underscoring how accurate lithofacies identification facilitates development strategy optimization and provides a reliable foundation for efficient deep shale gas development.

(This article belongs to the Special Issue Numerical Simulation and Application of Flow in Porous Media)

Search Results (128)
