1. Introduction
The rise of ubiquitous computing and the Internet of Things (IoT) has generated a data ecosystem characterized by continuous and heterogeneous streams that challenge traditional approaches to analysis [1,2]. These streams are high-dimensional, exhibit complex temporal patterns, and can reach arrival rates of millions of instances per second [3,4,5].
Against this backdrop, intelligent learning on multidimensional streams is emerging as an interdisciplinary field that integrates machine learning, real-time processing, and high-performance distributed systems [5]. This field faces key challenges such as the curse of dimensionality, concept drift, and the need for computational scalability [6,7].
In this paper, the term intelligent learning is used to broadly refer to the use of machine-learning techniques (including deep learning, reinforcement learning, ensemble methods, and adaptive/online algorithms) on continuously evolving multidimensional data streams in real-time or near-real-time scenarios. This concept encompasses both classic stream-mining algorithms and modern distributed approaches, including privacy-preserving ones.
The analysis of scientific production in emerging areas such as intelligent learning applied to multidimensional data flows is key to understanding both current trends and the real relevance that this field has acquired in the scientific and technological sphere. Bibliometric studies provide a powerful tool for tracking the evolution of research in this domain, identifying patterns of collaboration, citation dynamics, and levels of impact that highlight its growing consolidation within the academic community.
Bibliometrics has become a benchmark discipline in recent decades. The creation of the Institute for Scientific Information (ISI) by Eugene Garfield in the 1960s marked the beginning of the systematic and quantitative analysis of publications, journals, authors, and institutions [8]. This approach examines aspects such as authorship, scientific productivity, citations, and thematic content using objective indicators applied to large bodies of scientific literature [9]. Today, the availability of massive databases allows for the automatic measurement of parameters such as keywords, number of citations, number of authors per article, collaboration networks, institutional impact, and annual production trends, among others. The most widely accepted criterion is that a higher volume of citations generally reflects greater relevance and perceived influence in the field [10].
Researchers choose to cite works that contribute fundamental ideas or are directly related to their own research. Given that this selection is unrestricted, they tend to prioritize the most outstanding contributions that are closest to their line of work; therefore, the most cited articles are a solid indicator of the real impact they have had in their field. This data is extremely valuable for universities, research centers, and funding agencies, as it facilitates decisions related to hiring, defining strategic priorities, and evaluating performance. In addition, bibliometric analyses make it possible to reconstruct the historical trajectory of a topic, detect its key moments, and anticipate the directions it is taking, which is especially helpful for those new to the field in quickly finding their place [11].
This type of study is only feasible thanks to comprehensive bibliographic databases such as Scopus and Web of Science, which have become essential tools for academic evaluation processes. Although the bibliometric approach has been successfully used in countless disciplines to detect trends, thematic evolutions, and collaborative structures, the specific field of intelligent learning in multidimensional data flows has hardly been addressed from this perspective. There are numerous studies focused on the development of specific algorithms and models for this type of data [12], but very few studies have systematically and quantitatively analyzed global scientific production in this domain. This absence limits our understanding of the degree of maturity, scope, and real impact of a line of research that has positioned itself as one of the fundamental pillars of intelligent real-time data processing.
Intelligent learning in multidimensional streams consists of extracting useful knowledge and making adaptive decisions in real time in the face of dynamic changes [13,14]. It is based on four pillars: multidimensionality, continuous temporality, dynamic adaptability, and computational efficiency [15,16,17,18].
The main challenges are the curse of dimensionality in temporal environments [19,20,21,22,23], multidimensional concept drift [20,21,24], and combinatorial explosion in attention and tensor operations [18,25,26,27].
Its most relevant applications include anomaly detection and optimization in IoT [28,29,30,31], adaptive risk management in finance [32], dynamic personalization in digital platforms, and continuous monitoring in healthcare.
The evolution of the field shows three stages: 2000–2010 (foundations with incremental trees and first characterizations), 2010–2018 (consolidation of frameworks and ensembles), and 2018–2025 (shift towards high dimensionality, online feature selection, and integration with deep learning).
The most notable current trends are Continuous Learning Streaming to avoid catastrophic forgetting [1,33,34,35], specialized Transformer architectures and temporal tensor factorization [2,3,4,36,37], online feature selection based on norms and reinforcement learning [13,16,17,38], and hardware architectures for extreme scalability [39,40].
Theoretical limitations (Hughes’ paradox and No-Free-Lunch [41,42]) and practical limitations of interpretability and heterogeneity persist. Emerging directions include Quantum-Enhanced Streaming [43], Neuromorphic Stream Processing [44], and paradigms such as Causal Stream Learning and Continuous Meta-Learning.
3. Results
The analysis was carried out based on the bibliographic metadata of the documents indexed in the Scopus database, restricting the selection to those works related to the topic of intelligent learning on multidimensional data streams. In total, 594 publications were identified, distributed across 276 sources (journals and books, among others) in the period between 2015 and 2025. These contributions were produced by 1588 authors, of whom only 41 authored single-author documents, while most publications were co-authored, with an average of 4.07 authors per publication. The set analyzed showed an annual growth rate of 39.38% and an average of 9.25 citations per document, reflecting sustained and expanding interest in the subject. In addition, the publications included 4771 Keywords Plus terms and 5730 author keywords, demonstrating the diversity of approaches associated with the field of study.
The results presented in Table 1 show that scientific production on intelligent learning on multidimensional data streams is mainly concentrated in Computer Science (30.33%) and Engineering (19.98%), confirming the predominance of approaches focused on the design of computational models and the development of algorithms applied to spatial data processing. However, the presence of other disciplines such as Mathematics (11.44%), Physics and Astronomy (6.03%), and Decision Sciences (5.49%) reveals a growing interest in addressing the topic from complementary perspectives that incorporate theoretical foundations, mathematical modeling, and decision-making support criteria.
Together, these five areas represent 73.28% of the records analyzed, highlighting not only a significant thematic concentration but also the existence of a considerable scope for strengthening interdisciplinary research. In particular, the contribution of mathematics and physical sciences opens up the possibility of delving deeper into advanced analytical methods, while the incorporation of decision sciences highlights the potential for application in areas related to mobility, logistics, and strategic planning.
Table 2 shows that scientific output on intelligent learning applied to multidimensional data streams has grown steadily over the last decade. Since 2015, the number of publications has increased progressively, with particularly high year-on-year growth in 2019 (59.09%), 2022 (70.27%), and 2025 (42.20%). This trend demonstrates a consolidation of interest in the academic community, particularly in the last five years, when annual output grew most markedly.
Although there were slight declines in some periods, such as in 2021 with a decrease of 9.76%, the overall trend reflects a rapidly expanding field, with a significant increase from 6 articles in 2015 to 166 in 2025. These results suggest that the subject is maturing, driven by the development of more advanced methodologies, the availability of large volumes of data, and the relevance of its application in various scientific and technological fields. In this sense, the growth dynamics confirm that this is a constantly evolving domain with increasing potential to impact interdisciplinary research.
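As a consistency check on these figures, the 39.38% annual growth rate reported for the corpus matches the compound annual growth rate (CAGR) implied by going from 6 articles in 2015 to 166 in 2025. The Python sketch below assumes that the annual rate is computed as a CAGR over the ten yearly intervals and that year-on-year changes are simple relative differences; the function names are illustrative and are not part of any bibliometric package.

```python
def annual_growth_rate(first_count: float, last_count: float, n_years: int) -> float:
    """Compound annual growth rate between the first and last year of a series."""
    if first_count <= 0 or n_years <= 0:
        raise ValueError("first_count and n_years must be positive")
    return (last_count / first_count) ** (1.0 / n_years) - 1.0


def year_on_year_growth(counts: list[float]) -> list[float]:
    """Relative change of each year with respect to the previous year."""
    return [(curr - prev) / prev for prev, curr in zip(counts, counts[1:])]


# 6 publications in 2015 and 166 in 2025 span 2025 - 2015 = 10 yearly intervals.
rate = annual_growth_rate(6, 166, 2025 - 2015)
print(f"{rate:.2%}")  # prints 39.38%
```

A decline such as the 9.76% decrease in 2021 would appear as a negative entry in the output of `year_on_year_growth`.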
3.3. Most Cited Articles
An analysis of the most cited articles highlights the thematic diversity and interdisciplinary breadth that research on intelligent learning applied to multidimensional data streams has achieved, as can be seen in Table 6. The most influential work, “Advancing Biosensors with Machine Learning” with 551 citations, integrates artificial intelligence and biomedicine, showing how machine learning methods, especially convolutional and recurrent neural networks, can enhance detection and analysis using electrochemical, fluorescent, and spectral biosensors, as well as the fusion of data from multiple sensors for more accurate diagnoses. This article highlights the expansion of chemometrics into intelligent and automated applications in the biomedical field.
In the field of energy and smart systems, “Secure and Efficient Federated Learning for Smart Grid With Edge–Cloud Collaboration” with 216 citations stands out for its federated learning proposal that allows artificial intelligence models to be shared without compromising the privacy of users’ energy data. Its approach combines edge computing, cloud computing, and reinforcement learning to optimize the quality of local models and communication efficiency, addressing non-IID data problems and user participation limitations. Complementarily, “A high-accuracy, real-time, intelligent material perception system” with 197 citations applies machine learning to intelligent perception using hybrid e-skins capable of recognizing multidimensional materials and stimuli in real time, opening up possibilities for physical interfaces and touch-sensitive robotics.
In the industrial and transportation sector, publications such as “Digital Twin as Enabler for an Innovative Digital Shopfloor Management System” with 190 citations and articles in Transportation Research Part C (145 and 126 citations) demonstrate the importance of intelligent learning in manufacturing, logistics, and transportation. This research highlights the management of digital twins and the imputation of missing data using Bayesian models or variational autoencoders to optimize traffic prediction, scenario simulation, and production and vehicle flow planning in complex and multidimensional environments.
Finally, other highly cited works demonstrate the breadth of applications of intelligent learning. For example, studies on mechanical fault diagnosis using recurrent neural networks and intelligent microfluidics illustrate its impact on advanced manufacturing and biomedicine. Likewise, research on intelligent personal assistants for language learning and heart disease prediction using supervised algorithms reflects its ability to improve human interactions and the analysis of large medical datasets, consolidating the relevance of these methodologies in practical and multidisciplinary environments.
The high number of citations of these articles confirms that they are fundamental references in their respective fields, either for their theoretical contributions or for their applicability in real-world scenarios. Furthermore, the breadth of areas involved, ranging from biomedicine and energy to transportation and education, highlights the cross-cutting nature of intelligent learning, consolidating it as a key driver for the development of innovative solutions in diverse domains.
Table 7 shows the most productive authors in the field of study. According to the results obtained, the most prominent authors are Wang Yaoze and Zhang Yushuang, who top the list with 21 and 20 articles, respectively. They are followed by Wang Xiuwen with 16 publications, and Li Xiuzheng and Li Yonghui, both with 15 papers. This pattern of productivity suggests the existence of researchers who act as key references within the field, consolidating stable lines of research and promoting the continuous generation of knowledge. In addition, the concentration of publications by these authors indicates that their scientific leadership influences the orientation of studies and the consolidation of collaborative networks in the area.
The analysis of the most productive authors allows us to identify the main contributors in this area of research, as well as the institutions that have promoted greater scientific production. The presence of multiple prominent researchers at specific universities shows a concentration of scientific activity in these centers, which could be related to the existence of specialized research clusters that favor the development and advancement of the discipline.
Likewise, these results reflect trends in institutional and geographical collaboration, mainly in Chinese universities, suggesting that efforts in intelligent learning applied to multidimensional data streams are being led by well-established and well-structured research groups capable of generating a significant impact on scientific production in the area.
3.5. Keyword Strategy Diagram
The strategic keyword diagram allows us to identify the most relevant subject areas within the field of intelligent learning applied to multidimensional data streams, differentiating between those that are well established and those that are emerging or in decline. This analysis is based on two parameters: density, which reflects the degree of internal cohesion of each topic, and centrality, which indicates its level of connection and influence with other topics in the field [56]. Therefore, the strategic diagram distinguishes research topics according to their degree of development (density) and degree of relevance (centrality).
Figure 1 shows four quadrants that allow us to interpret the status and evolution of topics within the field.
Prior to analysis, generic terms and linguistic noise (“human,” “article,” “controlled study,” “algorithm,” etc.) were removed using an expanded stop-word list and manual review.
The map classifies topics according to their centrality (importance in the field) and density (degree of internal development of the topic):
Motor Themes (high density and high centrality): Deep learning, machine-learning systems, neural networks, convolutional neural networks. These are the most developed and central topics in the field.
Basic Themes (low density and high centrality): Learning systems, data mining, decision making, Internet of Things. Cross-cutting and fundamental topics.
Niche Themes (high density and low centrality): Adversarial machine learning, contrastive learning, federated learning. Highly specialized topics undergoing rapid internal development but still with less connection to the rest of the field.
Emerging or Declining Themes (low density and low centrality): After cleaning, this quadrant is practically empty, confirming that there are no themes in clear decline.
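The quadrant logic above can be illustrated with a simplified version of Callon's centrality and density measures. In this Python sketch, which is an assumption rather than the exact Bibliometrix implementation (its scaling constants and thresholds may differ), centrality is the total weight of co-occurrence links from a cluster's terms to terms in other clusters, density is the total internal link weight divided by the number of terms in the cluster, and the quadrant boundaries are placed at the median of each measure. The function name and the toy clusters are hypothetical.

```python
from statistics import median


def strategic_diagram(cooc, clusters):
    """Classify keyword clusters into the four quadrants of a strategic diagram.

    cooc: dict mapping a (term_a, term_b) pair to its co-occurrence weight.
    clusters: dict mapping a cluster label to the set of terms it contains.
    """
    term_to_cluster = {t: label for label, terms in clusters.items() for t in terms}
    internal = {label: 0.0 for label in clusters}
    external = {label: 0.0 for label in clusters}
    for (a, b), weight in cooc.items():
        ca, cb = term_to_cluster[a], term_to_cluster[b]
        if ca == cb:
            internal[ca] += weight  # link inside the theme: cohesion
        else:
            external[ca] += weight  # link between themes: connectedness
            external[cb] += weight

    centrality = external
    density = {label: internal[label] / len(clusters[label]) for label in clusters}
    c_cut, d_cut = median(centrality.values()), median(density.values())

    quadrants = {}
    for label in clusters:
        high_c, high_d = centrality[label] > c_cut, density[label] > d_cut
        if high_c and high_d:
            quadrants[label] = "motor"
        elif high_c:
            quadrants[label] = "basic"
        elif high_d:
            quadrants[label] = "niche"
        else:
            quadrants[label] = "emerging or declining"
    return quadrants


# Toy co-occurrence weights for four hypothetical two-term clusters.
clusters = {"A": {"a1", "a2"}, "B": {"b1", "b2"}, "C": {"c1", "c2"}, "D": {"d1", "d2"}}
cooc = {("a1", "a2"): 10, ("b1", "b2"): 1, ("c1", "c2"): 10, ("d1", "d2"): 1,
        ("a1", "b1"): 5, ("c1", "d1"): 1}
print(strategic_diagram(cooc, clusters))
# {'A': 'motor', 'B': 'basic', 'C': 'niche', 'D': 'emerging or declining'}
```

Splitting at the median guarantees that the four quadrants are populated relative to one another; with absolute thresholds instead, a quadrant can end up empty, as happens with the emerging/declining quadrant in the cleaned map.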
Overall, the map reveals that the core of the field is solidly rooted in deep learning and classical learning systems, while federated learning and adversarial techniques are emerging as specialized niches with great future potential.
In the upper right quadrant is “learning systems”, a well-developed and highly central topic. This indicates that it is a structuring axis of the area and also maintains strong links with other topics, becoming a driver of research. Its size in the figure confirms its weight within the field. The upper left quadrant contains the topic “adversarial machine learning”, a topic with high density but low centrality. Although it is well defined and developed internally, its contribution is more marginal within the research as a whole. This positions it as a specialized niche. The topic “human” is located in the lower left quadrant, with low density and low centrality. Its location indicates that it is an underdeveloped topic with little influence in the field, which can be interpreted as an emerging but still incipient line or a declining trend. Finally, in the lower right quadrant, we see “deep learning”, characterized by strong centrality but low density. This positioning makes it an essential topic that is widely connected to others, although it does not yet have such a consolidated internal development. It is a basic topic that supports other lines of research and has the potential to evolve into driving topics.
Overall, the diagram reflects that the field is structured around “learning systems” as a driving topic, while “deep learning” acts as a cross-cutting foundation. The topics “adversarial machine learning” and “human” show more specific trajectories: the former as a highly specialized line and the latter as a possible emerging trend.
Figure 2, which corresponds to the analysis of the authors’ keywords, shows patterns that complement and, in some cases, contrast with those identified in Figure 1. In the upper right quadrant, “learning systems” stands out. With its high density and centrality, it is consolidated as a driving theme in the discipline, showing a cohesive structure and strong influence on the evolution of the field. In the upper left quadrant are “federated learning” and “adversarial machine learning”, specialized topics that, although highly developed internally, have less connection to the central cores of research, similar to what occurred in Figure 1 with more peripheral topics.
In contrast, the lower left quadrant contains terms such as “human” and “controlled study”, which reflect emerging or marginal areas with low density and little influence on the articulation of the main advances in the field. Finally, in the lower right quadrant are “deep learning” and “machine learning”, which, as in Figure 1, appear as highly central cross-cutting axes, although not yet fully developed, confirming their role as methodological and conceptual foundations that support other lines of research.
Comparatively, while Figure 1 placed “deep learning” as a driving theme, in Figure 2 it is repositioned as a basic theme, suggesting that from the authors’ perspective, it constitutes a fundamental and broadly transversal axis, rather than a core of specialization in itself. Thus, both figures together allow us to appreciate the coexistence of consolidated approaches, specialized areas, and emerging topics, highlighting the diversity and dynamism of the field of intelligent learning applied to multidimensional data streams.
This diagram shows how the literature has structured the thematic contributions of the field, revealing a balance between established, specialized, and emerging areas. The prominence of approaches such as deep learning and intelligent systems, together with the emergence of specialized topics such as federated learning, reflects the trend toward the integration of advanced and collaborative techniques to address the challenges of managing large volumes of multidimensional data.
3.6. Thematic Evolution of Keywords
For the thematic analysis of keyword evolution, we used the Bibliometrix package with its Biblioshiny graphical interface, establishing ranges of years that allowed us to observe changes in the dynamics of the topics. Based on Keywords Plus, Figure 3 shows how the topics have evolved between the periods 2015–2020 and 2021–2025.
Topics in blue tones correspond to consolidated concepts from 2015 to 2019 (concept drift, feature selection, high dimensionality). Topics in yellow/orange tones represent emerging areas since 2021 (federated learning, transformers, edge computing, privacy-preserving). There is a clear shift towards distributed and deep learning-based paradigms.
In the first period, the main topics were “deep learning”, “machine learning”, “learning systems”, and “article”, which formed the conceptual basis of the initial studies. In the second period, some of these topics were transformed or integrated into new orientations. For example, “learning systems” remains a central theme, persisting and expanding as part of the broader “deep learning” topic.
Likewise, new topics such as “human” and “convolutional neural networks” are emerging, reflecting both a shift towards human–machine interaction and the incorporation of specific architectures in the field of machine learning. On the other hand, “deep learning” retains its relevance, albeit with a more limited scope, and “machine learning” tends to be redistributed towards these new areas of research.
In Figures 3 and 4, the thematic evolution (Sankey) diagrams include both author keywords and Keywords Plus, which for Scopus records correspond to the index keywords assigned automatically by the database. Some terms such as “article,” “human(s),” “review,” or “male/female” are bibliometric artifacts that Scopus assigns according to document type or population studied, and do not reflect conceptual content in the field. These terms appear in the flows because they are present in the original Scopus dataset but should not be interpreted as relevant scientific topics. The real and significant thematic flows in the field are those that connect technical concepts such as “concept drift,” “online learning,” “feature selection,” “high-dimensional data,” “deep learning,” “federated learning,” and “transformers,” as highlighted in the interpretation below.
Overall, the thematic evolution analysis reveals a shift from more general notions of artificial intelligence and learning toward more specific and applied approaches, which demonstrates the maturation of the field and its progressive diversification into emerging lines of research.
Figure 4 shows the thematic evolution of author keywords between the periods 2015–2020 and 2021–2025. In the first stage, as with keyword plus, general topics such as “article”, “deep learning”, “learning systems” and “machine learning” predominate, reflecting the initial interest in laying the conceptual and methodological foundations for research in the field.
The map highlights four clearly differentiated thematic areas:
Red cluster: Detection and adaptation to concept drift and mining of evolutionary flows in non-stationary environments.
Green cluster: Management of high dimensionality, feature selection, and dimensionality reduction in multidimensional data streams.
Blue cluster: Federated learning, edge computing, and applications in the Internet of Things (IoT).
Purple cluster: Integration with deep learning, transformer architectures, deep neural networks, and learning on complex time series.
The size of the nodes is proportional to the frequency of occurrence of each term, and the thickness of the links reflects the strength of co-occurrence. The clear separation between the four clusters confirms that, despite sharing the common core of “data streams + intelligent learning,” the scientific community has currently structured its work into four well-defined and relatively independent thematic subareas.
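The node sizes and link thicknesses described above are derived from per-document keyword lists. The following Python sketch is a minimal illustration, assuming that each document contributes each keyword and each keyword pair at most once and that keywords are normalized only by trimming and lowercasing (a real pipeline would also merge synonyms and plurals); the function name and sample keywords are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(documents):
    """Build node frequencies and co-occurrence link weights from keyword lists.

    documents: iterable of keyword lists, one list per publication.
    Returns (node_freq, link_weight); node_freq drives node size and
    link_weight drives link thickness in a co-occurrence map.
    """
    node_freq = Counter()
    link_weight = Counter()
    for keywords in documents:
        # Normalize and deduplicate so a keyword counts once per document.
        unique = sorted({k.strip().lower() for k in keywords})
        node_freq.update(unique)
        # Sorted input gives each pair a canonical (alphabetical) order.
        link_weight.update(combinations(unique, 2))
    return node_freq, link_weight


docs = [
    ["Concept drift", "Data streams"],
    ["data streams", "federated learning", "concept drift"],
    ["federated learning"],
]
nodes, links = cooccurrence_network(docs)
print(nodes.most_common(3))
print(links[("concept drift", "data streams")])  # prints 2
```

Clustering algorithms then operate on `link_weight`, so groups such as the four clusters above emerge from terms that co-occur more with each other than with the rest of the vocabulary.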
In the second period, there is a shift towards more specialized and applied areas. Machine learning and deep learning remain central, although integrated with new lines of research. At the same time, topics such as “human” emerge, reflecting the growing attention to human–machine interaction, and “convolutional neural networks”, denoting the incorporation of specific architectures in the analysis of complex data. Notably, “deep learning” remains a topic with an independent evolution, which provides opportunities for expansion toward more autonomous and adaptive learning approaches.
This thematic shift suggests a process of maturation in the field, in which general concepts of machine learning have evolved toward more robust and specialized models. The diversification of topics highlights methodological sophistication and a focus on more practical applications, in line with advances in artificial intelligence and large-scale data processing.
3.7. Degree of Concentration of Selected Variables
To evaluate the distribution of scientific output in this field, the degree of concentration of different bibliometric variables was analyzed using Shannon entropy, a fundamental measure in information theory [57]. This metric quantifies the uncertainty or diversity in a data distribution: once normalized, values close to 1 indicate that the elements are evenly distributed, while values close to 0 reflect a high concentration in a few elements. Mathematically, for a discrete probability distribution $P = (p_1, p_2, \ldots, p_n)$ that satisfies $\sum_{i=1}^{n} p_i = 1$, entropy is defined in Equation (1).
$$H(P) = -\sum_{i=1}^{n} p_i \log p_i \tag{1}$$
In bibliometrics, entropy is used to study the evenness or concentration of relevant variables, such as research topics and authors. To facilitate interpretation, normalized Shannon entropy is used, defined in Equation (2).
$$H_{\mathrm{norm}}(P) = \frac{H(P)}{\log n} \tag{2}$$
where $0 \le H_{\mathrm{norm}}(P) \le 1$, with $H_{\mathrm{norm}}(P) = 1$ indicating a uniform distribution, i.e., without concentration, and $H_{\mathrm{norm}}(P) = 0$ when the entire distribution is concentrated at a single point.
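Equations (1) and (2) translate directly into code. The following Python sketch computes the normalized entropy of a list of frequency counts (for example, publications per country or per source); it uses the natural logarithm, which does not change the normalized value, and treats zero counts as contributing nothing (the usual 0·log 0 = 0 convention). The function name is illustrative, and returning 0 for an empty or single-category input is an assumption of this sketch.

```python
import math


def normalized_shannon_entropy(counts):
    """Return H(P) / log(n) for a list of non-negative frequency counts.

    n is the number of categories (including empty ones); the result is 1
    for a uniform distribution and 0 when all mass is in one category.
    """
    n = len(counts)
    total = sum(counts)
    if n <= 1 or total == 0:
        return 0.0  # no diversity can be measured
    probs = [c / total for c in counts if c > 0]
    entropy = sum(-p * math.log(p) for p in probs)  # Equation (1)
    return entropy / math.log(n)  # Equation (2)


print(normalized_shannon_entropy([25, 25, 25, 25]))  # close to 1.0 (uniform)
print(normalized_shannon_entropy([100, 0, 0, 0]))    # 0.0 (fully concentrated)
```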
The normalized entropic concentration index was calculated for the distributions of authors, sources, countries, research areas, and article citations, with the results presented in Table 9. Authors show a highly homogeneous distribution, as shown in Table 7, as do sources, shown in Table 5. In contrast, countries show a high concentration, based on Table 3, indicating that scientific production is geographically concentrated in a few countries. However, the distribution of authors within these countries is balanced, suggesting that individual contributions are not dominated by a small group. Research areas show moderate concentration (Table 1), with a predominance of computer science, engineering, mathematics, physics and astronomy, and decision sciences, which together account for 73.28% of publications. Finally, article citations show moderate concentration (H = 0.8154, Table 6), indicating the existence of multiple influential articles without centralization in a few works.
These results reflect that scientific production in the area is heterogeneous in terms of authors and sources, which points to a field open to new contributions and collaborations. In geographical terms, although some countries account for a large part of the production, the participation of authors within these countries is balanced. In terms of research areas and the influence of articles, it is evident that certain key disciplines and publications are more relevant but without generating excessive dominance, suggesting a diverse and dynamic scientific ecosystem.
Another way to examine how authors are distributed according to their productivity is through Lotka’s law. Based on Lotka’s empirical findings [58], this law states that the number of authors who publish n articles follows an inverse power relationship similar to Zipf’s law. Lotka originally analyzed a database limited to physics and chemistry, and the law is expressed in Equation (3).
$$A_n = \frac{A_1}{n^2} \tag{3}$$
where $A_n$ represents the number of authors who publish n articles and $A_1$ represents the number of authors who publish a single article. To generalize this relationship and better fit other fields, it can be expressed as Equation (4).
$$A_n = \frac{A_1}{n^c} \tag{4}$$
where c is a parameter estimated to optimize the fit to the observed data. In this study, the fitted value of c yields a very accurate fit to the observed distribution.
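Because the procedure used to estimate c is not specified, the following Python sketch shows one common option as an assumption: an ordinary least-squares fit on the log-log form of the generalized Lotka relation A_n = A_1 / n^c, where A_n is the number of authors with n articles, keeping A_1 fixed at its observed value so that the regression line passes through the origin. The helper names are hypothetical; once c is known, `lotka_expected` computes adjusted frequencies of the kind compared with the observed ones in Table 10.

```python
import math


def lotka_expected(a1, c, max_n):
    """Expected number of authors with n articles, A_n = A_1 / n**c."""
    return {n: a1 / n ** c for n in range(1, max_n + 1)}


def fit_lotka_exponent(author_counts):
    """Estimate c by least squares on log(A_1 / A_n) = c * log(n).

    author_counts maps n (articles per author) to A_n (number of authors);
    A_1 is taken as observed, so the regression line passes through the origin.
    """
    a1 = author_counts[1]
    xs, ys = [], []
    for n, a_n in author_counts.items():
        if n >= 2 and a_n > 0:
            xs.append(math.log(n))
            ys.append(math.log(a1 / a_n))
    if not xs:
        raise ValueError("need at least one class with n >= 2 and A_n > 0")
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Sanity check on synthetic data that follow the inverse-square law exactly,
# using the 1287 single-article authors reported in Table 10 as A_1.
synthetic = lotka_expected(1287, 2.0, 10)
print(round(fit_lotka_exponent(synthetic), 6))  # prints 2.0
print(synthetic[2])  # prints 321.75
```

Fixing A_1 reflects the form of the law, which anchors every prediction on the observed number of single-article authors; a regression with a free intercept is an equally plausible alternative.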
Table 10 presents the observed distribution of authors according to the number of articles published, together with the frequency adjusted according to Lotka’s law. It can be seen that most authors publish only one article (1287 authors, 81% of the total), while authors with two or more publications represent much smaller proportions. In some cases, such as authors with seven articles, the observed frequency is slightly higher than predicted, reflecting the presence of particularly productive researchers.
Overall, the results confirm that authorship is not evenly distributed: a small group of authors contributes multiple publications, while the majority publish only once. This concentration of productivity is consistent with Lotka’s law and points to a small core of highly active researchers within the field, who could influence the direction and impact of research.
Interpreting these clouds allows readers to immediately and visually identify the dominant themes in the field over the last decade: classic challenges (concept drift, high dimensionality, incremental learning) remain central; paradigms emerging since 2020 (deep learning/transformers, federated learning, edge computing, privacy) have gained significant weight; the most recurrent applications are concentrated in IoT, health, energy, and finance. This visual summary complements previous quantitative analyses and offers a concise and accessible overview of the current state of the field, which is especially useful for novice researchers and professionals seeking entry points into the field.
3.8. Charts of Citations, Sources, and Authors
The visualization presented in Figure 5 was generated using VOSviewer software, a tool specialized in bibliometric analysis and the graphical representation of relationships between terms. This application, developed by Van Eck and Waltman (2010), allows the identification of co-occurrence patterns based on keywords extracted from titles, abstracts, and descriptors of scientific publications [45].
Some terms with high frequency but low informational value remain on the map due to their widespread use in titles and abstracts. Therefore, the interpretation focuses exclusively on the technical and applied concepts of greatest scientific relevance, grouped into five major thematic clusters.
The figure shows a density and semantic connection map that groups key concepts related to the field of intelligent learning on multidimensional data streams. Each color in the diagram represents a thematic cluster, that is, a subset of terms that share high levels of co-occurrence within the analyzed documents.
The light blue cluster focuses on terms such as research, use, modeling, concept, and data mining. This group represents an orientation toward the development of analytical models and data mining techniques applied in organizational contexts and complex systems. In the green cluster, words such as simulation, user, practice, student, course, and higher education stand out, suggesting a focus on designing user-centered (student-centered) educational experiences to improve teaching and learning processes. The red cluster groups terms such as recognition, classification, image, prediction model, and input. This segment is clearly linked to visual data processing, pattern recognition, and prediction using machine-learning models. The yellow cluster contains words such as internet, vehicle, smart city, and energy efficiency. This group points to technological applications in the field of the Internet of Things (IoT), smart vehicles, and energy efficiency. The dark blue cluster includes terms such as computer, progress, AI technology, and mapping, which refer to advances in emerging technologies in artificial intelligence and digital cartography. Finally, the purple cluster is composed of terms such as conference, topic, case study, and intelligent systems, reflecting a focus on academic production and the dissemination of knowledge in scientific forums.
When the map is colored by average publication year, terms in blue correspond to topics consolidated between 2015 and 2019 (concept drift, feature selection, high dimensionality), whereas terms in yellow and orange tones represent emerging topics that have gained relevance since 2021 (federated learning, transformers, edge computing, privacy preservation). A clear shift is thus observed from the traditional challenges of concept drift and high dimensionality toward modern paradigms based on distributed learning and Transformer architectures.
Central terms such as research, recognition, data mining, classification, and modeling act as bridges between the different clusters, highlighting their integrative role in the development of research. This type of analysis makes it possible to identify key areas of interest, as well as possible future lines of exploration in the field of intelligent learning on data streams.
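The bridging role of central terms can be approximated, under a simplifying assumption, by counting how many other clusters each term is directly linked to. The sketch below is a hypothetical heuristic, not the procedure used by VOSviewer or in this study; the cluster assignments and links are invented for illustration.

```python
from collections import defaultdict


def bridge_scores(links, cluster_of):
    """Count, for each term, the distinct other clusters it links to.

    links: iterable of (term_a, term_b) co-occurrence pairs.
    cluster_of: dict mapping each term to its cluster label.
    """
    reached = defaultdict(set)
    for a, b in links:
        if cluster_of[a] != cluster_of[b]:
            reached[a].add(cluster_of[b])
            reached[b].add(cluster_of[a])
    return {term: len(reached[term]) for term in cluster_of}


# Hypothetical cluster assignments and links, for illustration only
cluster_of = {
    "classification": "red", "image": "red",
    "data mining": "light blue", "smart city": "yellow", "student": "green",
}
links = [
    ("classification", "image"), ("classification", "data mining"),
    ("classification", "smart city"), ("data mining", "student"),
    ("image", "smart city"),
]
scores = bridge_scores(links, cluster_of)
print(sorted(scores.items(), key=lambda kv: -kv[1])[:2])
# [('classification', 2), ('data mining', 2)]
```

Network measures such as betweenness centrality are a more standard alternative; this count is used only because it is easy to read off a clustered map.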
Figure 6 shows a second term map generated with VOSviewer, this time using binary counting. Unlike the full counting used in Figure 5, binary counting counts each term only once per document, regardless of how many times it is repeated. This subtly modifies the results: it reduces the weight of the most frequent words and gives greater visibility to the lexical diversity present in the analyzed texts.
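The difference between the two counting methods can be sketched as follows. The documents are hypothetical token lists: with full counting every occurrence adds to a term's total, whereas with binary counting a term contributes at most once per document.

```python
from collections import Counter


def term_counts(documents, binary=False):
    """Total occurrences per term across documents.

    binary=False: full counting, every occurrence is counted.
    binary=True: binary counting, a term counts at most once per document.
    """
    totals = Counter()
    for tokens in documents:
        totals.update(set(tokens) if binary else tokens)
    return totals


# Hypothetical documents, not data from this study
docs = [
    ["model", "model", "model", "drift"],
    ["model", "sensor"],
    ["drift", "sensor", "sensor"],
]
full = term_counts(docs)
binary = term_counts(docs, binary=True)
print(full["model"], binary["model"])    # 4 2
print(full["sensor"], binary["sensor"])  # 3 2
```

Here `model` drops from 4 to 2 because three of its occurrences come from a single document, whereas `drift`, which appears once in each of two documents, keeps the same total (2) under both methods.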
Even with binary counting, some very common terms persist because they appear in virtually all documents. For this reason, the interpretation again focuses exclusively on the technical and applied concepts of greatest scientific relevance, leaving aside methodological or generic terms that contribute no distinctive thematic value.
This new co-occurrence map shows significant changes in the structure of the thematic clusters. One of the most notable is the formation of a yellow cluster around key terms such as classification, topic, recognition, and detection. This grouping suggests a more focused approach to the analysis of classification and detection strategies applied to intelligent systems and machine learning, which distinguishes it from the previous map.
The red cluster remains one of the most prominent, including words such as experiment, feature, input, prediction model, and experimental result. These terms reinforce the idea that a considerable part of the literature continues to focus on practical applications, empirical validation, and experimental testing, especially in contexts such as fault diagnosis, intelligent transportation, and multidimensional feature analysis.
The green and blue clusters maintain their relevance, albeit with slight changes in their connections. The green cluster groups terms such as progress, innovation, limitation, view, and advancement, reflecting a critical and forward-looking perspective on technological advances in the field. The blue cluster includes words such as platform, industry, IoT, and interaction, suggesting the growing integration of these technologies into industrial environments and interconnected systems.
The small purple cluster, although less dense, incorporates terms such as implementation, teaching, and learner, denoting an emerging interest in knowledge transfer and capacity building in areas related to intelligent systems.
This type of visualization makes the thematic diversity of the field easier to detect, as well as the interconnections between its technical, methodological, and applied areas. It also provides a useful tool for identifying emerging trends and possible future lines of research.
The entropy values reported in Table 9 can be interpreted as follows: low entropy for authors (H = 0.412) and countries (H = 0.538) indicates a high concentration of scientific production in a few actors and regions (especially China and a small group of very prolific researchers). In contrast, the higher entropy for research areas (H = 0.719) reflects a growing diversification beyond pure computer science, with notable participation from mathematics, physics, and decision sciences. This contrast offers the reader a clear picture of the maturity of the field: high productive concentration coupled with progressive interdisciplinary openness, which is particularly valuable information for funders and science policymakers.
Figure 7 presents a cloud map of publication sources, distinguishing the main academic outlets that have contributed to the dissemination of research in this field. As also shown in Table 5, the source with the highest number of contributions is Lecture Notes in Computer Science, with 23 articles, which consolidates its position as the most representative space for the dissemination of work in this area. It is followed by Communications in Computer and Information Science and Smart Innovation, Systems and Technologies, with 15 and 11 publications, respectively, and by Advances in Intelligent Systems and Computing and the IEEE Internet of Things Journal, with 9 articles each.
The size of each node indicates the total number of publications in that source, while the color (blue → green → yellow scale) reflects the average publication year of the articles in each source. This visualization makes it possible to identify immediately:
Established sources with a long history (blue tones): Lecture Notes in Computer Science, IEEE Transactions, etc.
Recently emerging and rapidly growing sources (yellow tones): IEEE Internet of Things Journal, IEEE Access, Smart Innovation...
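The encoding of node size and color in this map can be sketched as follows, assuming each record is a (source, publication year) pair. Node size is taken as the number of records per source and color as the average publication year mapped onto three discrete bands. VOSviewer itself uses a continuous color scale, so the band thresholds and the 2015–2025 range are illustrative assumptions, and the source names are placeholders.

```python
from collections import defaultdict


def source_overlay(records):
    """Return {source: (number_of_records, average_publication_year)}."""
    years = defaultdict(list)
    for source, year in records:
        years[source].append(year)
    return {s: (len(ys), sum(ys) / len(ys)) for s, ys in years.items()}


def color_band(avg_year, start=2015, end=2025):
    """Map an average year onto a blue -> green -> yellow scale (illustrative thresholds)."""
    position = (avg_year - start) / (end - start)
    if position < 1 / 3:
        return "blue"
    if position < 2 / 3:
        return "green"
    return "yellow"


# Placeholder sources and years, not data from this study
records = [("Source A", 2016), ("Source A", 2018), ("Source B", 2023), ("Source B", 2024)]
for source, (size, avg) in source_overlay(records).items():
    print(source, size, avg, color_band(avg))
```

Source A (average 2017) falls in the blue band and Source B (average 2023.5) in the yellow band, mirroring the distinction between established and recently emerging sources.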
The combination of both types confirms the maturity of the field (continuity in classic sources) and its current dynamism (rapid growth of high-impact specialized journals on IoT and AI).
Journals such as the IEEE Internet of Things Journal and IEEE Access have participated more recently, whereas series such as Lecture Notes in Computer Science show sustained continuity over time, reaffirming their central role in the dissemination of knowledge. Together, these sources demonstrate the diversity of publication venues, ranging from specialized conferences to high-impact journals in artificial intelligence and the Internet of Things, and confirm the interdisciplinary and dynamic nature of the field.
Figure 8 presents the cloud map constructed from the most cited articles listed in Table 6, where the size of each node depends on the number of citations received. This visualization, similar to the previous figure, graphically highlights the influence of certain works on the development of the topic Intelligent Learning on Multidimensional Data Streams.
The size of each node reflects the total number of citations received by the author’s works (or the most representative article in the case of teams). The color indicates the average year of the citations received (blue → green → yellow scale): blue tones correspond to seminal authors/articles whose influence was consolidated between 2015 and 2019, while shades that transition to yellow indicate more recent impactful contributions (2022–2025).
This visualization allows immediate identification of:
Classic and foundational authors (blue tones): Cui (2020), Wang (2018), and Chen (2017), among others.
Emerging authors with recent high impact (yellow tones): Su (2022), Wei (2022), and Tang (2023), among others.
The most prominent node corresponds to the article by Cui et al. (2020), published in ACS Sensors, which, with 551 citations, is the most influential work in applying machine-learning techniques to the advancement of biosensors [46]. It is followed by Su et al. (2022), who propose federated learning for smart grids [47], and Wei et al. (2022), who introduce a real-time material perception system based on machine learning [48]. These results reflect a shift toward concrete applications in highly relevant domains such as health, energy, and smart materials.
Beyond these leading works, the visualization shows considerable thematic diversity. Works such as Brenner and Hummel (2017), on digital twins applied to smart factory management [49], and Zhu et al. (2022), on traffic data imputation using Bayesian tensor factorization [51], consolidate this breadth of approaches. In this way, the most cited articles not only mark trends of impact in different disciplines but also confirm the cross-cutting nature of intelligent learning applied to multidimensional data streams.
Applying Shannon entropy to the distributions of authors, countries, and research areas gives the reader a clear structural overview of the field without having to inspect individual counts: the distribution of authors and countries shows a relatively high concentration (typical of emerging technical areas dominated by established groups in China, the US, and India), whereas the distribution of research areas reveals significantly greater and increasing diversity. This combination reflects a field undergoing healthy maturation: a strong core of scientific leadership coupled with progressive openness to interdisciplinary contributions from advanced mathematics, physics, and decision sciences. This information is particularly useful for early-career researchers, doctoral students, and organizations seeking to identify both current centers of excellence and opportunities for thematic diversification.
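Table 9 does not state how its entropy values were normalized. Assuming they are Shannon entropies divided by their maximum possible value log k, where k is the number of observed categories (which keeps them between 0 and 1), they could be computed as in the sketch below; the frequency lists are hypothetical.

```python
import math


def normalized_entropy(counts):
    """Shannon entropy of a frequency distribution, divided by log(k).

    k is the number of categories with a nonzero count (an assumption;
    other normalizations are possible). Returns a value in [0, 1].
    """
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    k = len(probs)
    if k < 2:
        return 0.0  # a single category carries no uncertainty
    h = -sum(p * math.log(p) for p in probs)
    return h / math.log(k)


# Hypothetical frequency lists, not data from this study
even = normalized_entropy([25, 25, 25, 25])  # ≈ 1.0: perfectly even spread
skewed = normalized_entropy([97, 1, 1, 1])   # well below 0.5: highly concentrated
print(round(even, 3), round(skewed, 3))
```

A value close to 1 means that production is spread evenly across categories, and a value close to 0 means that it is concentrated in a few of them; this is the sense in which the author and country distributions are described as concentrated and the research-area distribution as diverse.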
This bibliometric study follows the standard multidimensional approach widely accepted in this type of work (annual production, research areas, geographical distribution, collaboration patterns, keyword analysis, and thematic evolution) with the aim of providing a comprehensive overview of the field. Although geographical distribution may be of sociological interest, the central contribution of this article lies in identifying the predominant methods and domains of application and their evolution over time. These aspects are explicitly addressed through the analysis of authors' keywords (Table 8), co-occurrence networks (Figure 6), thematic maps (Figure 7 and Figure 8), and, especially, thematic evolution diagrams (Figure 3 and Figure 4), which clearly show the progressive predominance of methods based on deep learning, concept drift adaptation techniques, federated learning, and transformer architectures, as well as their main domains of application (IoT systems/sensor networks, energy, finance, health, and transportation).
4. Conclusions
The bibliometric analysis carried out made it possible to characterize the current state and evolution of scientific production on intelligent learning applied to multidimensional data streams during the period 2015–2025. The results show sustained and accelerating growth in the literature, with an annual growth rate of 39.38%, which indicates a rapidly expanding field. The volume of publications, the diversity of sources, and the consolidation of thematic lines reflect the progressive maturity of this area, which combines methodological developments with practical applications in various domains.
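The formula behind the 39.38% figure is not given in this section. Assuming it is a compound annual growth rate computed from the publication counts of the first and last years of the period (a common definition in bibliometric software), it can be sketched as follows; the counts in the example are hypothetical.

```python
def annual_growth_rate(first_count, last_count, n_years):
    """Compound annual growth rate, in percent.

    n_years is the number of calendar years in the period (11 for 2015-2025),
    so growth compounds over n_years - 1 yearly intervals.
    """
    if first_count <= 0 or n_years < 2:
        raise ValueError("need a positive first-year count and at least two years")
    return ((last_count / first_count) ** (1 / (n_years - 1)) - 1) * 100


# Hypothetical counts: 10 papers in the first year, 40 in the third year
rate = annual_growth_rate(10, 40, 3)
print(f"{rate:.2f}%")  # 100.00%
```

Because only the endpoints enter the formula, the rate summarizes overall expansion but is insensitive to fluctuations in intermediate years.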
In disciplinary terms, the concentration in computer science and engineering confirms the leading role of these areas in the formulation of algorithms and models applied to the processing of spatial and multidimensional data. However, the growing participation of mathematics, the physical sciences, and the decision sciences reveals an interdisciplinary openness in which theoretical foundations, mathematical modeling, and applications oriented toward strategic planning and decision making converge. This openness is quantitatively supported by the high dispersion value obtained for the distribution of research areas (entropy = 0.7188, Table 9) and by the notable presence of mathematics, physics and astronomy, and decision sciences among the fifteen most frequent subject areas. This diversity suggests fertile ground for research that integrates complementary perspectives, particularly in emerging fields such as mobility, logistics, and complex systems management.
Geographically, the results show an uneven picture: China leads in volume of production, while countries such as the United States, Canada, and Germany stand out for their impact as measured by citations, reflecting different contribution profiles. International cooperation, however, remains limited, although countries such as Canada and Australia show a higher degree of collaboration. This finding points to the need to strengthen international research networks, especially in emerging regions, to promote greater diversity and global visibility.
A review of the most cited articles revealed the cross-cutting nature of intelligent learning applied to multidimensional data streams, with applications in biomedicine, energy, manufacturing, transportation, and human–machine interfaces. These works are not only key theoretical references but also demonstrate a high degree of applicability in real-world scenarios. The variety of topics identified confirms that the field is establishing itself as a catalyst for innovative solutions in different scientific and technological domains.
Analysis of keywords and their thematic evolution shows that research has shifted from general approaches, such as deep learning and machine learning, to more specific and applied areas, such as federated learning, convolutional neural networks, and topics related to human–machine interaction. This transition reflects a process of diversification and maturation in the field, in which established topics, specialized areas, and emerging lines of research coexist. Overall, the results allow us to conclude that intelligent learning on multidimensional data streams is an area of growing academic and practical relevance, with high potential for interdisciplinary impact and multiple open challenges in terms of quality, international collaboration, and consolidation of new areas of research.
In summary, this work offers readers an updated and quantitatively grounded roadmap of the field of intelligent learning in multidimensional streams: it identifies established and emerging thematic clusters, the most influential authors and institutions, geographical and international collaboration gaps, and the fastest-growing methodological trends (federated learning, transformers, edge computing). This information is particularly useful for early-career researchers seeking to position their work, doctoral students who need to define their state of the art, and funding agencies wishing to prioritize lines of research with high potential and a low current level of international collaboration.
Future research in intelligent learning on multidimensional data streams should focus on the design of more efficient and adaptive algorithms capable of processing heterogeneous data in real time. In addition, it is essential to strengthen interdisciplinary and international collaboration to broaden the impact of findings and diversify applications in areas such as mobility, biomedicine, and complex systems management. Finally, progress is needed on ethical and sustainability issues, including privacy protection and the energy efficiency of models.