1. Introduction
Machine learning has evolved into a pivotal discipline within contemporary computation, driving innovation across an expanding range of scientific, industrial, and societal domains. Its rapid development over recent decades has enabled the creation of data-driven solutions that address increasingly complex real-world problems. Today, machine learning techniques underpin advances in engineering, manufacturing, finance, medicine, environmental science, and numerous other fields. This progress is supported by a diverse methodological landscape that spans long-established models—such as linear regression, k-nearest neighbors, decision trees, support vector machines, and classical neural networks—as well as modern, high-capacity approaches including deep learning architectures, transformers, and ensemble-based boosted methods.
Despite remarkable achievements, several methodological challenges remain central to the advancement of machine learning. Identifying suitable model architectures, designing efficient training procedures, and tuning hyperparameters continue to be demanding tasks that significantly influence predictive accuracy and generalization. Additionally, machine learning workflows frequently contend with large-scale, incomplete, noisy, or uncertain data, which require robust preprocessing strategies and specialized modeling techniques. Another enduring priority is interpretability: the ability to articulate how and why a model arrives at its predictions is essential for ensuring transparency, trustworthiness, and responsible deployment in high-stakes environments.
This Special Issue brings together contributions that address these challenges from complementary perspectives. Following rigorous peer review, nine high-quality papers were selected from twenty-five submissions. Collectively, these works encompass diverse problem formulations, data representations, feature engineering strategies, learning algorithms, and performance evaluation methodologies. They additionally highlight the importance of measurable improvements over existing techniques and of the clear interpretation of empirical results. Through this curated selection, the Special Issue seeks to illustrate not only the breadth of machine learning applications but also the methodological innovations that support continued progress in the field.
The research contributions collected in this Special Issue reflect the accelerating evolution of data-driven science, in which machine learning, hybrid computational modeling, environmental monitoring, wearable systems, predictive analytics, and industrial intelligence converge to address a diverse array of scientific and societal challenges. Although the nine articles under consideration originate from distinct disciplinary backgrounds, they share a conceptual foundation in the rigorous treatment of data irregularities, the exploitation of nonlinear learning architectures, the infusion of domain knowledge into computational models, and the pursuit of interpretable yet high-performing predictive systems. This editorial aims to integrate these contributions into a cohesive scientific narrative, providing enough depth, contextualization, and analytical synthesis to serve as a substantive reference for researchers across fields as diverse as hydrology, atmospheric chemistry, pediatric motor development, nephrology, evolutionary computation, commodities forecasting, and production logistics.
2. Summary of the Contributions
The editorial begins by examining contributions focused on environmental modeling, highlighting how agricultural yield prediction, rainfall reconstruction, and ozone forecasting share a reliance on robust imputation and nonlinear machine learning architectures for handling variability induced by climate dynamics and anthropogenic pressures. It then transitions to wearable sensor analytics, where studies of gait irregularity detection and schoolchildren’s motor competence demonstrate how pervasive sensing technologies enable novel forms of physiological and biomechanical assessment. From there, the analysis moves to clinical machine learning for mortality prediction among hemodialysis patients, illustrating the complexities of prognostics in multivariate biomedical domains and the need for standardized predictive feature sets. The editorial then explores computational optimization and its applications to feature selection and financial modeling, before concluding with an examination of data-driven decision support in manufacturing environments. Throughout, the aim is not merely to summarize each study but to clarify how they collectively illuminate broader scientific patterns, methodological tensions, and conceptual advancements.
2.1. Environmental and Climate-Oriented Modeling
The first cluster of articles addresses the pervasive challenge of irregular, incomplete, or noisy environmental data. The article by Călin, Coroiu, and Mureșan situates itself at the nexus of agronomy, climate modeling, and computational data analysis by examining machine learning-based crop yield prediction [1], with a specific focus on sunflower yield under conditions of climatic variability. Sunflower crops grown without irrigation are especially sensitive to fluctuations in precipitation, temperature, timing of sowing, and the accumulated stress of extreme weather conditions. These variables seldom conform to the structured, dense datasets favored by traditional statistical modeling; instead, agronomic datasets frequently include missing entries across key meteorological variables. The authors therefore treat missing data imputation not as a mundane preprocessing step but as an essential methodological challenge that fundamentally shapes the performance of downstream predictive models.
Their analysis includes a comparison among mean value imputation, similarity-based approaches, deterministic interpolation techniques such as linear and spline interpolation, and predictive imputation based on linear regression, Random Forest, extreme gradient boosting, and histogram gradient boosting. Critically, the authors also examine four unsupervised outlier removal methods to investigate how anomalous observations interact with imputation strategies. Their experiments reveal that Random Forest-based imputation combined with One-Class SVM outlier removal delivers a substantial improvement in predictive accuracy, raising the coefficient of determination from 0.723 to 0.938. This finding should be interpreted within the broader global literature, which increasingly recognizes that ensemble-based imputers capture nonlinear co-variations within environmental variables more effectively than deterministic approaches [2]. The longitudinal, seasonal, and meteorological dependencies inherent to agricultural yield formation are particularly difficult to reconstruct through simple interpolation because such methods implicitly assume smooth temporal progression rather than variable responses to rainfall anomalies, heat stress, and soil moisture dynamics. The study therefore serves as an important demonstration of why predictive imputation must be treated as an integral component of agronomic modeling pipelines, especially as climate change intensifies variability and disrupts data continuity.
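The contrast between interpolation and predictive imputation can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the soil-moisture and rainfall values are invented, the variable names are hypothetical, and a one-variable least-squares fit stands in for the regression and tree-based imputers compared in the study. Linear interpolation fills the gap from its temporal neighbors alone, whereas the regression-based imputer uses a covariate observed on the same day.

```python
def linear_interpolate(series):
    """Fill interior None gaps with a straight line between observed neighbors."""
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            frac = (i - left) / (right - left)
            filled[i] = series[left] + frac * (series[right] - series[left])
    return filled


def regression_impute(series, covariate):
    """Fill None gaps from a one-variable least-squares fit on the observed rows."""
    pairs = [(c, v) for c, v in zip(covariate, series) if v is not None]
    mean_c = sum(c for c, _ in pairs) / len(pairs)
    mean_v = sum(v for _, v in pairs) / len(pairs)
    slope = (sum((c - mean_c) * (v - mean_v) for c, v in pairs)
             / sum((c - mean_c) ** 2 for c, _ in pairs))
    intercept = mean_v - slope * mean_c
    return [intercept + slope * c if v is None else v
            for c, v in zip(covariate, series)]


# Invented data (assumption): soil moisture tracks same-day rainfall,
# and the soil-moisture reading on the heavy-rain day (index 3) is missing.
rainfall = [0.0, 2.0, 1.0, 30.0, 0.0, 3.0]
soil_moisture = [10.0, 11.0, 10.5, None, 10.0, 11.5]

interpolated = linear_interpolate(soil_moisture)
predicted = regression_impute(soil_moisture, rainfall)
print(interpolated[3], predicted[3])
```

In this constructed example the gap coincides with a heavy-rain day, so interpolation returns a value close to its dry neighbors (about 10.25), while the covariate-based estimate follows the rainfall spike (about 25). Real agronomic data are noisier, and any gain would depend on how informative the covariates are.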
The second article in this cluster, authored by Saubhagya, Tilakaratne, Lakraj, and Mammadov, extends the challenge of missing data reconstruction into the domain of hydrology. Rainfall data are notoriously prone to missingness, often due to equipment malfunction in extreme weather, accessibility limitations in mountainous terrain, or prolonged periods during which manual gauge measurements cannot be performed. The Ratnapura region of Sri Lanka, characterized by diverse topography and exposure to monsoon dynamics, presents an especially challenging dataset in which extended periods of missing values occur at multiple stations across five years.
The authors propose a hybrid imputation model integrating temporal modeling via multilayer perceptrons with spatial interpolation via kriging. Their innovation lies in the optimal combination of the two components, recognizing that rainfall exhibits nonlinear temporal patterns influenced by seasonal cycles, but also spatial coherence shaped by orographic effects and localized convective behavior. By assigning optimal weights to MLP-based predictions and kriging-based estimates, the hybrid model outperforms standalone approaches, including spatiotemporal kriging. This suggests that rainfall imputation benefits from deep learning’s ability to capture high-frequency temporal anomalies and kriging’s ability to express spatial continuity. Their contribution reflects a broader scientific trend in hydrological modeling where hybrid models increasingly replace monolithic statistical approaches [3]. As climate change alters rainfall distribution patterns, such models may become essential for flood forecasting, reservoir management, and the calibration of hydrological simulations that depend on complete rainfall datasets.
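One simple way to combine two estimators, shown here only as an illustration, is a convex blend whose weight is fitted by least squares on held-out observations. The editorial does not specify how the authors derive their weights, so the closed-form weight, the function names, and the rainfall values below are all assumptions.

```python
def optimal_weight(y_true, pred_a, pred_b):
    """Least-squares weight w for the blend w*pred_a + (1 - w)*pred_b, clipped to [0, 1]."""
    num = sum((y - b) * (a - b) for y, a, b in zip(y_true, pred_a, pred_b))
    den = sum((a - b) ** 2 for a, b in zip(pred_a, pred_b))
    if den == 0.0:
        return 0.5  # identical predictors: every weight yields the same blend
    return min(1.0, max(0.0, num / den))


def blend(w, pred_a, pred_b):
    """Convex combination of two prediction lists."""
    return [w * a + (1.0 - w) * b for a, b in zip(pred_a, pred_b)]


def rmse(y_true, y_pred):
    """Root-mean-square error between observations and predictions."""
    return (sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5


# Hypothetical validation values (mm of rainfall), not taken from the study.
observed = [0.0, 12.0, 30.0, 5.0, 18.0]
temporal_pred = [2.0, 10.0, 36.0, 4.0, 15.0]   # stand-in for the temporal (MLP) component
spatial_pred = [0.0, 16.0, 24.0, 8.0, 20.0]    # stand-in for the spatial (kriging) component

w = optimal_weight(observed, temporal_pred, spatial_pred)
hybrid = blend(w, temporal_pred, spatial_pred)
print(w, rmse(observed, temporal_pred), rmse(observed, spatial_pred), rmse(observed, hybrid))
```

Because the weights 0 and 1 reproduce the two standalone predictors, the fitted blend cannot have a larger error than either of them on the data used for fitting. Whether that advantage carries over to unseen gaps has to be checked on separate data.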
The third article in this section, by Domínguez-García and Arellano-Vázquez, examines ozone concentration forecasting in Mexico City. The metropolitan valley, marked by high population density, mountainous topography, vehicular traffic, and intense solar radiation, presents a complex atmospheric environment where ozone formation stems from nonlinear chemical interactions among nitrogen oxides, volatile organic compounds, and meteorological factors. Forecasting ozone levels twenty-four hours ahead has profound implications for public health because ozone exacerbates respiratory conditions and contributes to acute and chronic health burdens [4].
The authors compare Random Forests, Support Vector Regression, and gradient boosting, demonstrating superior performance of ensemble-based models. Their results accord with international atmospheric modeling research, which has repeatedly shown that gradient boosting techniques outperform linear and kernel-based models in capturing the complex relationships underlying tropospheric ozone formation. This study also underscores the limitations of relying exclusively on chemical transport models, which, while physically grounded, often require substantial computational resources and extensive tuning. The authors’ findings confirm that machine learning, when supported by high-quality meteorological inputs, can provide accurate short-term ozone forecasts that enhance environmental decision-making and contextualize real-time air quality alerts.
2.2. Wearable Sensing and Human Movement Analysis
The next thematic grouping concerns wearable sensor analytics and its application to human movement, walkability, and pediatric motor development. The ability to detect subtle changes in gait or walking environments using accelerometer data has major implications for urban planning, fall prevention, rehabilitation, and physical activity monitoring [5].
Ng, Zhong, Nam, and Youn contribute a study on detecting irregular walking surfaces using LSTM networks trained on data recorded from a single wearable accelerometer. Detecting uneven surfaces is essential for understanding walkability, which influences mobility behaviors, accessibility, and pedestrian risk. Traditional walkability surveys rely heavily on human inspectors or subjective scoring systems, which limits their reproducibility and scalability. The authors propose a classifier that processes raw accelerometer sequences as well as stride-level features extracted from the signal. They find that models trained on multi-stride feature sets achieve the highest area under the curve, although raw-signal models retain advantages in capturing micro-perturbations.
This study belongs to a growing global body of research using wearable sensors to map environmental conditions indirectly through physiological signatures [6]. The LSTM architecture is particularly well-suited to capturing the temporal dependencies in gait dynamics, where surface irregularities cause subtle variations in step timing, acceleration, and impact absorption. The recent literature in movement science has shown that temporal deep learning models outperform classical machine learning in detecting gait abnormalities, early neurodegenerative signs, and fall-risk indicators [7]. The authors’ work therefore contributes directly to methodological advances in computational gait analysis, while also offering practical opportunities for low-cost urban walkability monitoring.
Sulla-Torres, Calla Gamboa, Avendaño Llanque, Angulo Osorio, and Zúñiga Carnero extend wearable analytics into pediatric evaluation by examining motor competence among schoolchildren in Arequipa. Motor competence, encompassing walking, running, balance, coordination, and dynamic mobility, has long been associated with physical health trajectories and developmental outcomes [8]. However, traditional assessment methods often rely on trained evaluators or laboratory-based equipment, which limits accessibility in low-resource settings.
The authors employ smart bands to collect data on cadence, stride, heart rate, and energy expenditure during six-minute walking and running tests administered to more than 760 children. They apply the LMS method to generate normative percentile curves and use a wide range of machine learning models to classify motor competence levels. The gradient boosting model produces the highest performance, illustrating that nonlinear relationships across physiological features are critical for accurate classification. By validating a software prototype designed to implement their classification model in educational settings, the authors bridge the gap between research and applied practice. Their work adds to international research emphasizing the importance of culturally and geographically tailored motor competence assessment using affordable digital tools.
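For readers unfamiliar with the LMS method, the following sketch shows the standard Box-Cox-based conversion between a measurement and its z-score or centile, given the three age-specific parameters L (skewness), M (median), and S (coefficient of variation). The parameter values and the choice of six-minute walk distance as the measured quantity are hypothetical and are not taken from the study.

```python
from math import exp, log
from statistics import NormalDist


def lms_zscore(y, L, M, S):
    """z-score of measurement y under the LMS model (log form when L is zero)."""
    if abs(L) < 1e-12:
        return log(y / M) / S
    return ((y / M) ** L - 1.0) / (L * S)


def lms_centile(p, L, M, S):
    """Measurement value at cumulative probability p (for example 0.9 for the 90th centile)."""
    z = NormalDist().inv_cdf(p)
    if abs(L) < 1e-12:
        return M * exp(S * z)
    return M * (1.0 + L * S * z) ** (1.0 / L)


# Hypothetical parameters for one age and sex group (assumption):
# six-minute walk distance in meters.
L_param, M_param, S_param = -0.5, 550.0, 0.12

p50 = lms_centile(0.50, L_param, M_param, S_param)
p90 = lms_centile(0.90, L_param, M_param, S_param)
z_child = lms_zscore(600.0, L_param, M_param, S_param)
print(p50, p90, z_child, NormalDist().cdf(z_child))
```

Evaluating `lms_centile` over a grid of ages, each with its own fitted L, M, and S, traces out one percentile curve. The fitting of those smoothed parameters, which is the core of the method, is not shown here.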
2.3. Predictive Nephrology and Clinical Risk Modeling
Hemodialysis mortality prediction forms a distinct thematic axis in this Special Issue. The systematic review by Motofelea, Mihaescu, Olariu, Marc, Chisavu, Pop, Crintea, Jura, Ivan, Apostol, and Schiller examines machine learning models used to predict mortality among hemodialysis patients, offering a rigorous synthesis guided by PRISMA principles. Hemodialysis patients exhibit mortality rates 10 to 20 times higher than those of the general population, owing to the high burden of comorbidities, chronic inflammation, cardiovascular deterioration, and the physiological instability inherent to dialysis sessions.
The authors identify logistic regression, Random Forest, and XGBoost as the most frequently employed prediction models. Their meta-analysis finds that Random Forest achieves the highest average AUC, particularly at three-year follow-up horizons, although logistic regression frequently performs comparably despite its relative simplicity. This result is significant because it challenges the assumption that high model complexity necessarily leads to superior performance in clinical prognostics. Instead, model performance depends as much on the quality and structure of input features as it does on algorithmic complexity.
A major insight from the review is that predictor heterogeneity across studies greatly limits model generalization. Some studies rely heavily on biochemical indicators, whereas others emphasize comorbidity indices or dialysis-specific parameters such as ultrafiltration volume, interdialytic weight gain, or heart rate variability. This inconsistency hinders the development of standardized prognostic frameworks. The authors argue that future progress requires the harmonization of predictive feature sets, longitudinal study designs, and robust interpretability mechanisms. Their emphasis on interpretability echoes global trends in clinical machine learning, where black-box models face increasing scrutiny and where transparent, clinically actionable features remain essential for adoption in patient care [9].
2.4. Other Topics: Evolutionary Optimization, Financial Modeling, and Smart Manufacturing
Zhou, Zheng, Li, Hao, Zhang, Gao, and Wang present a knowledge-guided competitive co-evolutionary algorithm designed for feature selection in high-dimensional datasets. Feature selection plays a crucial role in machine learning, particularly in domains where interpretability, efficiency, and dimensionality reduction are essential [10]. In many applications, including biomedical modeling, industrial diagnostics, remote sensing, and financial forecasting, the input feature space is vast, and correlations among features can significantly influence model performance. The authors introduce a co-evolutionary mechanism that incorporates correlation-based domain knowledge to guide the search process, enhancing the ability of the algorithm to find meaningful and non-redundant feature subsets.
Their competitive–cooperative evolutionary architecture improves convergence speed while maintaining diversity among solutions, addressing a classical challenge in evolutionary computation. Traditional multi-objective evolutionary algorithms often suffer from premature convergence, or stagnation in local optima, especially when dealing with complex feature landscapes. By integrating correlation awareness into evolutionary dominance relationships, the authors demonstrate that feature selection becomes more efficient and yields subsets that contribute to higher predictive quality and more interpretable downstream models. This approach resonates with the emerging trend of hybrid optimization strategies in which domain-specific information is infused into evolutionary search, thereby reducing the randomness inherent in traditional population-based algorithms and enabling more refined exploration of the solution space [11].
Financial time series prediction represents another area where hybrid machine learning models increasingly prove advantageous. The study by Huo, Xie, and Li proposes a deep learning-based futures price prediction system supported by a chaotic-mapping artificial bee colony algorithm. Futures markets are characterized by volatility, abrupt regime shifts, nonlinear temporal dependencies, and sensitivity to geopolitical or macroeconomic fluctuations. The LSTM architecture is well-suited for modeling long-term dependencies within such time series [12], yet LSTM performance is heavily influenced by hyperparameter choices that define network size, learning rates, dropout proportions, and internal gating behavior. Hyperparameter tuning becomes particularly challenging in financial contexts, where overfitting is common and model generalization requires careful balancing of complexity and regularization.
The authors incorporate chaotic mapping into the artificial bee colony algorithm to enhance search diversity and counteract premature convergence. Chaotic sequences generate quasi-random oscillations that improve the exploration of the hyperparameter space, thereby enabling the optimization method to escape local minima. Through backtesting on gold and natural gas futures, the authors demonstrate that the chaotic bee colony-enhanced LSTM exhibits superior performance across error metrics compared to models optimized through conventional heuristic methods. This result reflects a broader shift in computational finance toward hybrid architectures that marry recurrent neural networks with swarm intelligence or evolutionary tuning. Recent research indicates that financial forecasting benefits greatly from models capable of capturing nonlinear patterns while maintaining adaptability in unstable markets [13]. The hybrid system proposed in this study illustrates how computational intelligence can complement deep learning to address the intrinsic unpredictability of financial data.
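To make the idea of chaotic mapping concrete, the sketch below uses the logistic map to spread an initial population of candidate hyperparameter settings over a search box. The editorial does not state which chaotic map the authors adopt or where in the bee colony algorithm it is applied, so the logistic map, the starting value, the hyperparameter names, and the bounds are all assumptions.

```python
def logistic_map_sequence(x0, n, r=4.0):
    """Return n successive iterates of the logistic map x <- r * x * (1 - x)."""
    values = []
    x = x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        values.append(x)
    return values


def chaotic_population(bounds, size, x0=0.37):
    """Scale chaotic values in [0, 1] onto each hyperparameter's [low, high] range."""
    names = list(bounds)
    seq = logistic_map_sequence(x0, size * len(names))
    population = []
    for i in range(size):
        candidate = {}
        for j, name in enumerate(names):
            low, high = bounds[name]
            candidate[name] = low + seq[i * len(names) + j] * (high - low)
        population.append(candidate)
    return population


# Hypothetical LSTM search space; integer-valued settings such as
# hidden_units would be rounded before training a model.
bounds = {
    "hidden_units": (16, 256),
    "learning_rate": (1e-4, 1e-1),
    "dropout": (0.0, 0.5),
}
population = chaotic_population(bounds, size=10)
for candidate in population[:3]:
    print(candidate)
```

The sequence is deterministic for a given starting value yet does not settle into a short cycle when r = 4, which is the property such schemes rely on to diversify the search. In a full algorithm the same generator could also perturb candidates that have stopped improving.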
In the domain of industrial analytics, Çağlayan and Aksoy investigate the use of machine learning to support the selection of material feeding systems in modern manufacturing. Material feeding strategies determine how components are delivered to assembly lines and directly influence cost efficiency, production stability, lead time, and material handling overhead. Conventional selection methods typically rely on engineering heuristics, rule-of-thumb guidelines, or expert judgment. These approaches become increasingly inadequate as modern manufacturing systems grow more dynamic, with greater variation in product types, material dimensions, demand profiles, and process constraints.
The authors construct a large dataset of more than two thousand materials, each represented by attributes such as usage frequency, storage constraints, handling characteristics, and consumption rates. They train several classification models, ultimately demonstrating the superior performance of artificial neural networks for predicting optimal feeding strategies. A particularly noteworthy aspect of their work is the use of Shapley value analysis, which provides a transparent explanation of feature contributions to the model’s decisions. This step is crucial in industrial contexts, where interpretability not only increases trust but also enables practitioners to integrate model recommendations into existing engineering workflows. Their system shows how machine learning can enhance the adaptability and intelligence of production systems, aligning with global Industry 4.0 trends [14] toward data-driven automation, resource optimization, and real-time decision-making.
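The principle behind Shapley value analysis can be shown with an exact computation on a toy model. The sketch below enumerates every coalition of features, which is feasible only for a handful of inputs; practical tools rely on approximations, and the editorial does not say which estimator the authors use. The scoring function, the feature names, and the zero baseline for absent features are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley values; `value` maps a frozenset of feature names to a payoff."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Hypothetical material record and scoring model (assumptions for illustration).
instance = {"usage_frequency": 1.0, "unit_weight": 2.0, "storage_limit": 1.0}


def model_score(x):
    """Toy score with one interaction term between two features."""
    return (2.0 * x["usage_frequency"] + x["unit_weight"]
            + 3.0 * x["usage_frequency"] * x["storage_limit"])


def coalition_value(subset):
    """Score when features outside the coalition are set to a baseline of 0."""
    x = {name: (instance[name] if name in subset else 0.0) for name in instance}
    return model_score(x)


phi = shapley_values(list(instance), coalition_value)
print(phi)
```

The attributions sum to the difference between the full prediction and the baseline prediction, and the interaction term is split equally between the two features that take part in it. These are the properties that make such explanations straightforward to audit in an engineering setting.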
3. Cross-Domain Themes and Conceptual Integration
The diverse contributions summarized above exhibit several overarching scientific themes. One prominent theme is the centrality of high-quality data preprocessing. Whether reconstructing missing agricultural records, filling rainfall gaps, or reconciling inconsistent hemodialysis datasets, many of the studies emphasize that the quality of the input data often determines the attainable predictive performance more than model architecture alone. This observation aligns with the growing movement toward data-centric approaches in machine learning, which argue that computational models reach their full potential only when the structure, continuity, and resolution of input data are sufficiently refined [15].
Another theme concerns the growing reliance on hybrid approaches [16]. Hydrological modeling in the Special Issue combines neural networks with geostatistics; futures forecasting merges LSTM networks with swarm intelligence; feature selection fuses evolutionary search with correlation-based knowledge; and wearable analytics blends handcrafted features with deep recurrent architectures. These hybrid models illustrate the advantages of synthesizing domain knowledge with algorithmic flexibility. Rather than choosing between physically grounded models and data-driven models, the articles demonstrate how the combination of both can outperform either approach alone. This reflects a broader epistemological shift in applied sciences, where hybridization increasingly becomes the default expectation rather than a methodological exception.
Interpretability constitutes a third cross-domain theme [17]. Several articles explicitly incorporate interpretability mechanisms, such as Shapley values, or emphasize the need for transparent feature selection and clinically meaningful predictors. This focus reflects growing recognition that black-box outputs cannot be adopted uncritically in medical, industrial, or policy-relevant applications. Interpretability enables expert validation, fosters trust, and supports ethical decision-making. In manufacturing and clinical settings, where practitioners must justify model recommendations, explainable AI is no longer optional but fundamental.
4. Conclusions
This Special Issue highlights significant advances in data-driven modeling across environmental sciences, human health, financial analytics, and industrial decision support. The nine articles collectively demonstrate that the integration of machine learning, hybrid optimization, and wearable sensing is reshaping predictive science. As these fields continue converging, interdisciplinary collaboration and methodological transparency will remain essential to sustaining scientific progress.