1. Introduction
Machine learning (ML) and artificial intelligence (AI) have entered a phase of accelerated evolution, reshaping the computational landscape and influencing an ever-growing spectrum of scientific and industrial activities. What once were specialized tools designed for narrow analytical tasks have now become integral components of complex pipelines that support decision-making, automate perception, enhance prediction, and facilitate interaction between humans and digital systems. This expansion has been driven not only by advances in algorithmic design but also by the increasing availability of heterogeneous data, the maturation of distributed computing ecosystems, and a heightened societal expectation for systems that are adaptive, transparent, and contextually aware.
As ML systems are deployed in environments that diverge markedly from controlled laboratory conditions, researchers face new methodological tensions. Data encountered in real-world settings may be incomplete, noisy, weakly labeled, fragmented across devices or institutions, or evolving in ways that challenge static modeling assumptions. These constraints have motivated the exploration of learning paradigms capable of functioning under limited supervision, handling uncertainty, and respecting privacy requirements. At the same time, domain specialists in areas such as agriculture, sports analytics, cyberdefense, and document intelligence are pushing for models that incorporate domain structure rather than relying on generic architectures. Complementing these technical imperatives is a parallel line of inquiry that addresses how people perceive, adopt, and interact with AI-driven tools, raising questions about interpretability, trust, accountability, and user behavior.
These developments converge around three major trajectories that increasingly define contemporary ML research: (1) distributed and semi-supervised learning frameworks, which aim to enable robustness and scalability when labels are scarce, data are decentralized, or uncertainty is inherent to the measurement process; (2) specialized, domain-responsive ML solutions that integrate contextual knowledge into model parameterization and evaluation, allowing AI to operate effectively in complex applied settings; and (3) human-centered behavioral modeling and technology acceptance studies, which investigate how individuals and organizations integrate intelligent systems into their workflows, and how model transparency, perceived utility, and personal traits influence adoption.
The ten articles included in this Special Issue exemplify these intertwined directions. They collectively expand the theoretical and practical boundaries of modern ML by introducing novel algorithmic strategies, validating them on demanding real-world datasets, and examining their implications for human-technology interaction. Their contributions resonate with several major currents in global research, including federated and distributed learning [1], multimodal transformer architectures [2], predictive models of user acceptance and behavior [3], and the development of trustworthy, explainable, and ethically aligned AI systems [4].
To guide readers through this multidisciplinary landscape, the remainder of this editorial is organized as follows.
Section 2 provides a structured synthesis of the ten contributions, grouping them into three thematic categories that reflect the aforementioned research trajectories.
Section 3 discusses conceptual themes that cut across these categories, highlighting methodological synergies and shared challenges.
Section 4 concludes with reflections on emerging research opportunities and the broader significance of the advances presented in this Special Issue.
2. Summary of the Contributions
2.1. Distributed, Semi-Supervised and Weakly-Supervised Learning
The first group of contributions addresses a core challenge in contemporary ML: how to learn reliable models when data are distributed across networks, labels are incomplete or ambiguous, and feature vectors may be both uncertain and partially missing. Rather than assuming centralized data access and fully supervised training, these works embrace the reality of fragmented, weakly supervised, and noisy environments. They do so by combining ideas from distributed optimization, semi-supervised learning, partial-label modeling, and multi-dimensional classification, thereby contributing to a growing body of research on federated and decentralized learning under imperfect supervision [1,5,6,7].
The paper “Distributed Semi-Supervised Multi-Dimensional Uncertain Data Classification over Networks” by Xu and Chen focuses on multi-dimensional classification in distributed networks where each node observes uncertain data and only a subset of instances have reliable labels. The authors consider a setting in which local nodes construct multi-dimensional classifiers based on their own data while exchanging limited information with neighboring nodes in a communication graph. The proposed method explicitly models the uncertainty of input data and couples this with a semi-supervised learning strategy that can exploit unlabeled observations to regularize the decision boundaries. By embedding this into a consensus-based framework over the network, the algorithm allows each node to benefit from global structure without centralizing the raw data, which is crucial in privacy-sensitive or bandwidth-limited scenarios [8]. Beyond incremental improvements in accuracy, the main innovation lies in the joint treatment of uncertainty, multi-dimensional outputs, and distributed semi-supervised optimization, providing a template for future work on robust learning in sensor networks and edge-intelligence applications.
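To make the consensus mechanism concrete, the following minimal sketch (a generic illustration of neighbor averaging, not the authors' algorithm) shows how nodes on a communication graph can approach a network-wide average model without any central server; the ring topology and parameter values are invented for the example.

```python
import numpy as np

def consensus_step(params, adjacency):
    """One round of uniform neighbor averaging over a communication graph.

    params: (n_nodes, dim) array of local parameter vectors.
    adjacency: (n_nodes, n_nodes) symmetric 0/1 matrix, no self-loops.
    """
    n = params.shape[0]
    new = np.empty_like(params)
    for i in range(n):
        neighbors = np.flatnonzero(adjacency[i])
        # Each node averages its own parameters with its neighbors' copies.
        stacked = np.vstack([params[i], params[neighbors]])
        new[i] = stacked.mean(axis=0)
    return new

# Toy example: 4 nodes on a ring, one scalar parameter per node.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])
theta = np.array([[0.0], [4.0], [8.0], [4.0]])
for _ in range(50):
    theta = consensus_step(theta, A)
# All nodes converge toward the network-wide average (4.0) while only
# ever exchanging parameters with direct neighbors, never raw data.
```

In practice each node would interleave such averaging rounds with local training steps on its own (partially labeled) data, which is the coupling the paper develops.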
The second contribution in this group, authored by Lee and Han, takes the perspective of cybersecurity and mobile platforms. Here, the data distribution is not merely decentralized but also characterized by rapidly evolving threats and limited labeled examples. SMAD (Semi-Supervised Android Malware Detection) tackles Android malware detection by converting application packages into image-like representations and using a segmentation-oriented backbone to extract pixel-level, multi-scale features from these Android Package Kit (APK) images. On top of this representation, the authors introduce a dual-branch semi-supervised objective, in which two parallel prediction branches are encouraged to remain consistent on unlabeled samples. This consistency regularization enables the method to leverage large volumes of unlabeled telemetry data and to remain effective when the distribution of malware families drifts over time. In contrast to traditional signature-based or purely supervised ML systems [9], SMAD illustrates how modern semi-supervised techniques—rooted in consistency and perturbation-based learning—can be transplanted into the malware domain, bringing ideas from image-based SSL and self-training into security analytics. The fine-grained spatial modeling of APK imagery is particularly innovative, as it allows the detector to capture subtle structural cues that are difficult to encode with handcrafted features.
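The consistency-regularization principle behind such dual-branch objectives can be sketched generically (a toy linear model for illustration, not SMAD's segmentation backbone): two branches score independently perturbed views of the same unlabeled batch, and their squared disagreement becomes an extra loss term that requires no labels.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(x, w):
    """Toy linear branch: class probabilities via a softmax over two logits."""
    logits = x @ w
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def consistency_loss(x_unlabeled, w1, w2, noise_scale=0.1):
    """Mean squared disagreement of two branches on perturbed views (no labels)."""
    view_a = x_unlabeled + noise_scale * rng.standard_normal(x_unlabeled.shape)
    view_b = x_unlabeled + noise_scale * rng.standard_normal(x_unlabeled.shape)
    return float(np.mean((predict(view_a, w1) - predict(view_b, w2)) ** 2))

x_u = rng.standard_normal((32, 4))   # an unlabeled batch of feature vectors
w = rng.standard_normal((4, 2))
same_loss = consistency_loss(x_u, w, w, noise_scale=0.0)  # exactly 0.0: identical branches, identical views
diverged_loss = consistency_loss(x_u, w, -w)              # disagreeing branches are penalized
```

Minimizing such a term alongside a supervised loss on the labeled subset is what lets unlabeled telemetry shape the decision boundary.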
The remaining two papers in this section focus on partial labels and missing data, further relaxing the assumption of clean supervision. In “Distributed Partial Label Learning for Missing Data Classification”, Xu and Chen study scenarios where each training instance is associated with a set of candidate labels (only one of which is correct), and where feature vectors are themselves incomplete. They propose a distributed partial-label missing-data classification (dPMDC) algorithm that combines generative and discriminative ideas into a unified framework. On the generative side, they design a probabilistic, information-theoretic imputation scheme that exploits the weak supervisory signal embedded in ambiguous labels to infer the missing features. On the discriminative side, once the features are imputed, a classifier is trained using random feature mappings of a kernel function, enabling nonlinear decision boundaries at modest computational cost. The entire procedure is implemented in a distributed manner so that nodes collaboratively refine imputations and classifiers without pooling data centrally. This work extends classical partial-label learning [6] by coupling it explicitly with missing-feature imputation and by pushing the computation to the network edge, where data are produced.
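Random feature mappings can be illustrated with the classic random Fourier feature construction for the Gaussian (RBF) kernel; this is a standard sketch of the technique, and we do not assert that it matches the paper's exact kernel or implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def rff_map(X, n_features, gamma):
    """Random Fourier features z(x) such that z(x)·z(y) ≈ exp(-gamma * ||x - y||^2)."""
    d = X.shape[1]
    # Frequencies sampled from the Fourier transform of the RBF kernel.
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, n_features))
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

X = rng.standard_normal((5, 3))
gamma = 0.5
Z = rff_map(X, n_features=5000, gamma=gamma)
approx = Z @ Z.T                                      # plain inner products in feature space
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
exact = np.exp(-gamma * sq_dists)                     # true RBF kernel matrix
max_err = np.abs(approx - exact).max()
# A linear classifier trained on Z behaves like a kernel machine, at the
# cost of an explicit, finite-dimensional feature map.
```

This is why the trick suits distributed settings: nodes only need to agree on the random projection, after which training reduces to a cheap linear problem.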
In “Distributed Partial Label Multi-Dimensional Classification via Label Space Decomposition”, the same authors further generalize partial-label learning to the multi-dimensional case. Here, each instance is associated with multiple heterogeneous label variables, and for each dimension only a subset of candidate labels is known. The proposed dPL-MDC algorithm performs a one-vs.-one decomposition of the original multi-dimensional output space, effectively transforming the problem into a collection of distributed partial multi-label learning tasks. This label-space decomposition serves two purposes: it reduces the complexity of directly modeling interactions in a high-dimensional label space, and it enables parallelized computation across decomposed subproblems, which is well suited to distributed environments. The approach connects naturally with broader research on multi-label and multi-dimensional classification [7] and complements recent advances in partial-label learning that address issues such as out-of-distribution candidate labels and long-tailed label distributions [10]. By extending these ideas into a decentralized framework, dPL-MDC helps bridge the gap between sophisticated label modeling and the constraints of networked data acquisition.
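The one-vs.-one decomposition step can be sketched as follows (an illustrative enumeration, not the dPL-MDC implementation; the dimension and class names are invented): each label dimension is expanded into independent pairwise tasks that could then be trained in parallel across nodes.

```python
from itertools import combinations

def one_vs_one_subproblems(label_spaces):
    """Decompose each label dimension into pairwise (one-vs-one) tasks.

    label_spaces: dict mapping dimension name -> list of class labels.
    Returns a list of (dimension, class_a, class_b) subproblems.
    """
    tasks = []
    for dim, classes in label_spaces.items():
        for a, b in combinations(classes, 2):
            tasks.append((dim, a, b))
    return tasks

# Two label dimensions with 3 and 2 classes yield 3 + 1 = 4 pairwise tasks,
# instead of modeling the 3 x 2 = 6-way joint label space directly.
spaces = {"crop_type": ["wheat", "corn", "rice"], "disease": ["yes", "no"]}
tasks = one_vs_one_subproblems(spaces)
```

Each subproblem is small and independent, which is exactly what makes the decomposition attractive for parallel, networked computation.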
Taken together, the four articles form a coherent line of research on learning in distributed, weakly supervised, and imperfect environments. All of them leverage local computation and limited communication instead of assuming fully centralized access, aligning with contemporary developments in federated and edge learning [1,5]. At the same time, they explore complementary dimensions of supervision: semi-supervised consistency in SMAD for security telemetry; exploitation of unlabeled and uncertain instances in multi-dimensional classification; and partial labels combined with missing-feature imputation or label-space decomposition. Their novelty is not only algorithmic but also conceptual: they illustrate how ideas from semi-supervised learning, probabilistic modeling, and structured prediction can be systematically adapted to realistic deployment settings where uncertainty, ambiguity, and decentralization are the norm rather than the exception. As such, this cluster of works provides a strong methodological foundation for the subsequent, more application-oriented contributions in this Special Issue.
2.2. Specialized Domain Applications with Machine Learning
The second thematic group highlights how ML methods can be adapted, extended, and refined to meet the unique structural and operational challenges of real-world application domains. These three papers illustrate the broader movement toward domain-aware AI, in which models are not simply transferred from generic benchmarks but are redesigned to exploit domain structure—be it multimodal agricultural data, complex performance dynamics in sports, or geometric distortions in document images. This aligns with a growing body of work emphasizing task-specific inductive biases and multimodal fusion as central to achieving state-of-the-art results in applied ML [11,12,13,14,15].
The first contribution in this group, by Jácome Galarza et al., presents an innovative application of transformer architectures to agricultural forecasting. While crop yield prediction has traditionally relied on statistical agronomic models or convolutional networks applied to remote sensing data, AgriTransformer integrates multiple heterogeneous data streams—including satellite imagery and tabular data—within a unified transformer-based framework. By leveraging attention mechanisms, the model learns dynamic interdependencies across modalities. This design reflects the broader trend toward multimodal learning in Earth observation [16] and is particularly significant given the scarcity, noise, and spatial variability characteristic of agricultural datasets. The authors demonstrate that attention-driven fusion not only improves predictive accuracy but also, through the use of explainability techniques, enables the identification of which components of different modalities contribute most to yield estimates. In doing so, the work provides a path forward for decision support in sustainable agriculture, food security, and climate-resilient farming.
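The attention-driven fusion idea can be illustrated with a bare-bones scaled dot-product attention step (a generic sketch, not AgriTransformer's architecture; the modality names and dimensions are invented): an embedding from one modality queries the tokens of another, and the resulting weights double as a simple attribution signal.

```python
import numpy as np

def attention_fuse(query, keys, values):
    """Scaled dot-product attention: fuse one query vector with a token set."""
    d = keys.shape[-1]
    scores = query @ keys.T / np.sqrt(d)     # similarity of query to each token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # softmax attention weights
    return weights @ values, weights

rng = np.random.default_rng(0)
tabular_query = rng.standard_normal(8)       # e.g. an embedded weather/soil record
image_tokens = rng.standard_normal((16, 8))  # e.g. satellite patch embeddings
fused, w = attention_fuse(tabular_query, image_tokens, image_tokens)
# 'fused' is a weighted summary of the image tokens, conditioned on the tabular
# data; 'w' shows which patches contributed most, a simple explainability hook.
```

Inspecting the attention weights is one common route to the kind of modality-level attribution the paper reports.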
The paper by Chandru et al. turns to the rapidly evolving domain of sports analytics, where strategic decisions increasingly rely on quantitative assessment of player performance. Building on the expanding literature in sports data science [17], the authors design an ML pipeline that integrates performance metrics, game statistics, and contextual features to estimate players’ contributions and forecast team strategy outcomes. Their approach explores, among other techniques, ensemble learning and optimized model selection to improve accuracy while accommodating nonlinear interactions among variables describing players’ statistics. What distinguishes this contribution is the explicit linkage between predictive modeling and actionable strategy: the authors go beyond prediction to provide tactical insights for coaches and analysts. This demonstrates a broader pattern in domain-focused ML—shifting from merely descriptive analytics to prescriptive models that guide decision-making. In an era where professional sports are increasingly data-driven, the article offers a well-structured example of how ML can enhance competitive advantage through evidence-based strategic planning.
In the paper by Cai et al., the authors address a fundamental challenge in document intelligence: reliably detecting and geometrically correcting tables within scanned or photographed documents. Traditional OCR systems and naive detection architectures struggle with distortions, rotations, and variable table layouts, motivating recent research in document AI that leverages deep learning and geometric reasoning [18,19]. LocRecNet introduces a two-stage synergistic pipeline that first employs localization modules to detect characteristic points and then uses a rectification module to compensate for skew, perspective distortion, and irregular cell geometry. The contribution is notable for its integration of detection and rectification rather than treating them as isolated tasks. This design mirrors trends in computer vision—especially in structured document understanding—where joint optimization leads to more stable and generalizable performance. By demonstrating strong results across diverse document types, the work presents a practical and scalable solution for applications such as automated data extraction, digital archiving, and financial or legal document processing.
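The rectification stage rests on classical perspective geometry; as a hedged sketch (standard homography estimation from four corner correspondences, not LocRecNet's learned module, with invented coordinates), one can solve for the 8-parameter transform that maps detected table corners onto an upright rectangle.

```python
import numpy as np

def homography_from_corners(src, dst):
    """Solve the 8-parameter perspective transform mapping src points to dst."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Linear constraints from u = (h1*x + h2*y + h3) / (h7*x + h8*y + 1),
        # and similarly for v, with h9 fixed to 1.
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(H, x, y):
    """Apply the homography to one point (homogeneous coordinates)."""
    p = H @ np.array([x, y, 1.0])
    return p[0] / p[2], p[1] / p[2]

# A skewed table whose detected corners should map onto an upright rectangle.
detected = [(10, 12), (205, 30), (215, 160), (5, 150)]   # TL, TR, BR, BL
target   = [(0, 0), (200, 0), (200, 150), (0, 150)]
H = homography_from_corners(detected, target)
u, v = warp_point(H, 10, 12)   # the detected top-left corner lands at (0, 0)
```

A learned pipeline replaces the hand-picked corners with predicted characteristic points, but the geometric correction it drives is of this form.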
Viewed collectively, the three articles exemplify how ML methods can be specialized to leverage domain structure, multimodality, and task-specific constraints. Each contribution demonstrates a distinct innovation: attention-based multimodal fusion in agriculture, predictive strategy analytics in sports, and synergistic geometric reasoning for document understanding. At the same time, they share methodological themes with broader ML research, such as the importance of representation learning, the integration of learning with domain knowledge, and the increasing relevance of task-aware architectures. These works underscore the maturation of applied ML: moving from generic models to tailored, interpretable, and operationally meaningful intelligent systems.
2.3. Human Factors and Behavioral Prediction
The third group of papers underscores a central insight of contemporary AI research: the success of intelligent systems ultimately depends not only on algorithmic performance but also on how humans perceive, interact with, and integrate these systems into their workflows. As AI becomes embedded in identity verification, customer service, and managerial decision-making, understanding user behavior, trust, and acceptance becomes essential. This evolution aligns with long-standing research in information systems and human-computer interaction [3,20,21], as well as with the growing demand for transparent and trustworthy AI [4,22].
The first article in this category, by Halász et al., tackles the crucial problem of biometric authentication under open-set conditions. Unlike closed-set classification, open-set recognition must identify whether a user is known or entirely novel, requiring models that can generalize beyond the classes observed during training. The authors adapt an existing open-set framework to time-series biometrics, demonstrating strong performance in identifying impostors, even when variability in user behavior is substantial. Their work ties into broader concerns about secure and explainable biometric systems [23,24], illustrating how open-set approaches can mitigate risks associated with spoofing and distributional shift. Importantly, the article shows that incorporating time-series dynamics and uncertainty modeling can help bridge the gap between theoretical open-set constructs and the operational realities of biometric authentication.
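A minimal open-set decision rule (a generic confidence-thresholding heuristic shown for illustration, not the framework the authors adapt; the probabilities and threshold are invented) accepts the top-scoring known identity only when the model is sufficiently confident, and otherwise rejects the sample as unknown.

```python
import numpy as np

def open_set_decide(probs, threshold=0.9):
    """Accept the arg-max class only if its probability clears a threshold;
    otherwise flag the sample as unknown (a possible impostor)."""
    labels = []
    for p in probs:
        k = int(np.argmax(p))
        labels.append(k if p[k] >= threshold else -1)   # -1 = unknown/rejected
    return labels

# Two confident known-user predictions and one ambiguous (likely novel) sample.
probs = np.array([[0.95, 0.03, 0.02],
                  [0.01, 0.97, 0.02],
                  [0.40, 0.35, 0.25]])
decisions = open_set_decide(probs, threshold=0.9)
# decisions == [0, 1, -1]
```

More sophisticated open-set methods replace the raw softmax score with calibrated uncertainty or distance-based scores, but the accept-or-reject structure is the same.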
In the work by Gené-Albesa and de Andrés-Sánchez, the focus shifts toward human-AI interaction in service-oriented contexts. The authors examine how customers perceive AI-driven chatbots in insurance policyholder support—a domain where user trust, perceived usefulness, and clarity of communication directly influence adoption. By combining explainable AI (XAI) with Importance-Performance Map Analysis (IPMA), the study identifies which features most strongly affect acceptance and offers interpretable insights for system designers. This methodology reflects the ongoing shift toward AI systems whose outputs and recommendations must be understandable to end-users [4,25]. The integration of XAI with behavioral modeling is a notable innovation, illustrating how explainability can serve not only regulatory or ethical purposes but also practical design objectives.
Finally, the paper by Cuc et al. explores how individual characteristics influence the intention to adopt AI tools in professional decision-making. Managerial accounting, increasingly shaped by automation, predictive analytics, and decision-support systems, provides a compelling context for studying technology adoption. The authors use decision-tree regression to uncover nonlinear relationships between AI knowledge, personality factors, and usage intention—patterns that traditional linear acceptance models often fail to capture. Their findings resonate with the broader literature on technology adoption, particularly the emphasis on cognitive, affective, and contextual determinants of behavior [3]. By leveraging ML not only as a subject of study but also as an analytical tool, the paper demonstrates how predictive modeling can augment theory-building in behavioral research.
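The appeal of tree-based models for such data can be seen in a toy depth-1 regression tree (our illustration with fabricated synthetic numbers, not the authors' model or data): a single learned threshold captures a step-like, nonlinear relationship that a straight line would smooth over.

```python
import numpy as np

def best_stump(x, y):
    """Depth-1 regression tree: pick the split minimizing squared error."""
    best = (None, np.inf, y.mean(), y.mean())
    for t in np.unique(x)[:-1]:
        left, right = y[x <= t], y[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[1]:
            best = (t, sse, left.mean(), right.mean())
    return best

# Synthetic nonlinear pattern: intention jumps once 'AI knowledge' passes a level.
knowledge = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
intention = np.array([2, 2, 2, 2, 7, 7, 7, 7], dtype=float)
threshold, sse, low_mean, high_mean = best_stump(knowledge, intention)
# The stump recovers the jump exactly (threshold 4, means 2 and 7, zero error),
# whereas a linear fit would smear the step across the whole range.
```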
Taken together, the three articles highlight a crucial dimension of modern AI systems: their embedding in socio-technical ecosystems. From biometric authentication to service chatbots to managerial decision support, human behavior emerges as both a constraint and a driver of system effectiveness. Across these studies, themes such as trust, transparency, and user diversity recur, aligning with ongoing global discussions about responsible AI. The emphasis on interpretability, open-set robustness, and behavior-aware modeling illustrates an important shift in AI research—from a technology-centered paradigm to a human-centered one—mirroring contemporary calls for systems that are trustworthy, explainable, and psychologically attuned to their users.
3. Cross-Domain Themes and Conceptual Integration
Although the ten contributions span diverse application areas and methodological perspectives, several overarching themes reveal a deeper conceptual alignment across the Special Issue. A central unifying motif is the movement toward learning under real-world constraints, where data are incomplete, weakly supervised, uncertain, or distributed across heterogeneous computational environments. This shift reflects a growing recognition that classical assumptions of IID data, centralized processing, and fully supervised labels rarely hold in operational contexts. As illustrated by recent surveys on federated and distributed learning [1,5], the increasing prominence of edge devices, privacy regulations, and decentralized infrastructures creates strong demand for algorithms capable of collaboration without centralized data aggregation. The contributions in distributed semi-supervised and partial-label learning directly speak to this global trend, offering models that balance local autonomy with global consistency.
Another cross-domain theme is the progressive specialization of ML architectures. Instead of relying on universal modeling paradigms, researchers increasingly adapt architectures to the structural properties of specific domains, whether through multimodal attention mechanisms in agriculture, geometric rectification in document analysis, or fine-grained spatial representations in cybersecurity. This reflects the broader shift in ML toward domain-aware inductive biases and task-specific architectures, a movement emphasized in contemporary discussions of transformer-based models [2]. The domain studies included in this Special Issue illustrate how such specialization enhances interpretability, robustness, and operational significance.
A third integrative thread emerges from the human-centered dimension of ML. As AI systems become embedded in professional workflows, consumer-facing services, and identity verification processes, understanding user perception, behavioral variability, and trust becomes as important as optimizing algorithmic accuracy. Research on technology acceptance, explainability, and decision support [3,4,26] highlights that the success of AI initiatives depends on an interplay between model capabilities, user characteristics, and organizational practices. The studies on chatbot acceptance, biometric identification under open-set conditions, and AI adoption in managerial accounting underscore that responsible AI development must integrate psychological, sociotechnical, and ethical considerations alongside computational ones.
Finally, a broader methodological synthesis can be observed across all contributions: the convergence toward data-centric and context-sensitive ML. This orientation prioritizes understanding the structure, quality, and limitations of data, whether through robust strategies for missing labels, multimodal fusion techniques, or behavioral modeling frameworks. Recent calls within the ML community [27] emphasize that improvements in data quality, representation, and alignment often yield greater gains than increasing model complexity alone. The collective work presented in this Special Issue exemplifies this philosophy by pairing methodological advances with a realistic appraisal of domain constraints and user-centered requirements.
Taken together, these cross-domain themes reflect an evolving ML landscape—one in which scalability, domain integration, and human-centered design are no longer optional enhancements but essential elements of modern intelligent systems. The convergence of these perspectives suggests a future in which ML research becomes increasingly interdisciplinary, context-aware, and attuned to both technical performance and societal impact.