Intelligent Learning on Multidimensional Data Streams: A Bibliometric Analysis of Research Evolution and Future Directions

Reyes, Gary; Tolozano-Benites, Roberto; Lanzarini, Laura; Hasperué, Waldo; Barzola-Monteses, Julio

doi:10.3390/info16121067


by Gary Reyes ^1,2,*, Roberto Tolozano-Benites ^1, Laura Lanzarini ^3, Waldo Hasperué ^3 and Julio Barzola-Monteses ^1,2

^1 Artificial Intelligence Research Group, Universidad Bolivariana del Ecuador, Campus Durán Km 5.5 vía Durán Yaguachi, Durán 092405, Ecuador

^2 Facultad de Ciencias Matemáticas y Físicas, Universidad de Guayaquil, Cdla. Universitaria Salvador Allende, Guayaquil 090514, Ecuador

^3 Instituto de Investigación en Informática LIDI (Centro CICPBA), Facultad de Informática, Universidad Nacional de La Plata, Buenos Aires CP1900, Argentina

^* Author to whom correspondence should be addressed.

Information 2025, 16(12), 1067; https://doi.org/10.3390/info16121067

Submission received: 30 October 2025 / Revised: 26 November 2025 / Accepted: 29 November 2025 / Published: 3 December 2025

(This article belongs to the Section Artificial Intelligence)


Abstract

Intelligent learning applied to multidimensional data streams has established itself as a rapidly expanding field, driven by the growth of ubiquitous computing and the Internet of Things. The complexity of these streams, characterized by their high dimensionality, variability, and continuous nature, poses significant challenges for traditional approaches to analysis. This study presents a bibliometric analysis of scientific output indexed in Scopus between 2015 and 2025, with the aim of identifying trends, challenges, and opportunities in this field. The results show sustained growth in publications, a marked interdisciplinary orientation, and a diversity of applications including transportation, biomedicine, energy, and information systems. Likewise, there is a geographical concentration in certain leading countries and uneven development in terms of international collaboration. This work contributes to mapping the current state of the field and points to future lines of research aimed at its consolidation.

Keywords: data streams; intelligent learning; bibliometric analysis; knowledge discovery; information processing; research trends


1. Introduction

The rise of ubiquitous computing and the Internet of Things (IoT) has generated a data ecosystem characterized by continuous and heterogeneous streams that challenge traditional approaches to analysis [1,2]. These streams are high-dimensional, exhibit complex temporal patterns, and can reach arrival rates of millions of instances per second [3,4,5].

Against this backdrop, intelligent learning on multidimensional streams is emerging as an interdisciplinary field that integrates machine learning, real-time processing, and high-performance distributed systems [5]. This field faces key challenges such as the curse of dimensionality, concept drift, and the need for computational scalability [6,7].

In this paper, the term intelligent learning is used to broadly refer to the use of machine-learning techniques (including deep learning, reinforcement learning, ensemble methods, and adaptive/online algorithms) on continuously evolving multidimensional data streams in real-time or near-real-time scenarios. This concept encompasses both classic stream-mining algorithms and modern distributed and privacy-preserving approaches.

The analysis of scientific production in emerging areas such as intelligent learning applied to multidimensional data streams is key to understanding both current trends and the real relevance that this field has acquired in the scientific and technological sphere. Bibliometric studies provide a powerful tool for tracking the evolution of research in this domain, identifying patterns of collaboration, citation dynamics, and levels of impact that highlight its growing consolidation within the academic community.

Bibliometrics has become a benchmark discipline in recent decades. The creation of the Institute for Scientific Information (ISI) by Eugene Garfield in the 1960s marked the beginning of the systematic and quantitative analysis of publications, journals, authors, and institutions [8]. This approach examines aspects such as authorship, scientific productivity, citations, and thematic content using objective indicators applied to large sets of the scientific literature [9]. Today, the existence of massive databases allows for the automatic measurement of parameters such as keywords, number of citations, number of authors per article, collaboration networks, institutional impact, and annual production trends, among others. The most widely accepted criterion is that a higher volume of citations generally reflects greater relevance and perceived influence in the field [10].

Researchers choose to cite works that contribute fundamental ideas or are directly related to their own research. Given that this selection is unrestricted, they tend to prioritize the most outstanding contributions that are closest to their line of work; therefore, the most cited articles are a solid indicator of the real impact they have had in their field. This data is extremely valuable for universities, research centers, and funding agencies, as it facilitates decisions related to hiring, defining strategic priorities, and evaluating performance. In addition, bibliometric analyses make it possible to reconstruct the historical trajectory of a topic, detect its key moments, and anticipate the directions it is taking, which is especially helpful for those new to the field in quickly finding their place [11].

This type of study is only feasible thanks to comprehensive bibliographic databases such as Scopus and Web of Science, which have become essential tools for academic evaluation processes. Although the bibliometric approach has been successfully used in countless disciplines to detect trends, thematic evolutions, and collaborative structures, the specific field of intelligent learning on multidimensional data streams has hardly been addressed from this perspective. There are numerous studies focused on the development of specific algorithms and models for this type of data [12], but very few have systematically and quantitatively analyzed global scientific production in this domain. This absence limits our understanding of the degree of maturity, scope, and real impact of a line of research that has positioned itself as one of the fundamental pillars of intelligent real-time data processing.

Intelligent learning in multidimensional streams consists of extracting useful knowledge and making adaptive decisions in real time in the face of dynamic changes [13,14]. It is based on four pillars: multidimensionality, continuous temporality, dynamic adaptability, and computational efficiency [15,16,17,18].

The main challenges are the curse of dimensionality in temporal environments [19,20,21,22,23], multidimensional concept drift [20,21,24], and combinatorial explosion in attention operations and tensors [18,25,26,27].

Its most relevant applications include anomaly detection and optimization in IoT [28,29,30,31], adaptive risk management in finance [32], dynamic personalization in digital platforms, and continuous monitoring in healthcare.

The evolution of the field shows three stages: 2000–2010 (foundations with incremental trees and first characterizations), 2010–2018 (consolidation of frameworks and ensembles), and 2018–2025 (shift towards high dimensionality, online feature selection, and integration with deep learning).

The most notable current trends are Continuous Learning Streaming to avoid catastrophic forgetting [1,33,34,35], specialized Transformer architectures and temporal tensor factorization [2,3,4,36,37], online feature selection based on $\ell_{1,2}$ norms and reinforcement learning [13,16,17,38], and hardware architectures for extreme scalability [39,40].

Theoretical limitations (Hughes’ paradox and No-Free-Lunch [41,42]) and practical limitations of interpretability and heterogeneity persist. Emerging directions include Quantum-Enhanced Streaming [43], Neuromorphic Stream Processing [44], and paradigms such as Causal Stream Learning and Continuous Meta-Learning.

2. Materials and Methods

To analyze scientific output in the field of “intelligent learning in multidimensional data streams,” a comprehensive methodology was designed that integrated bibliometric analysis with advanced data visualization tools. Information was collected from articles indexed exclusively in the Scopus database, taking into account those that addressed this topic using algorithms, models, or applications related to intelligent learning.

2.1. Search Strategy and Data Acquisition

In the first phase, records were retrieved from Scopus and their metadata were refined. The search was restricted to the Title, Abstract, and Keywords fields and combined terms related to intelligent learning (“intelligent learning”, “machine learning”, or “deep learning”) with the mandatory mention of “multidimensional” data and at least one of the expressions “data flow*” or “data stream*”. The publication period was limited to 2015 through September 2025.
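As a rough illustration, the search described above could be expressed as a Scopus advanced-search string along the following lines. This is a sketch under assumptions, not the authors' exact query: the grouping of terms and the year bounds are inferred from the description, and the helper name `build_scopus_query` is hypothetical. The September 2025 cut-off cannot be expressed with a publication-year filter alone and would instead correspond to the retrieval date.

```python
def build_scopus_query(learning_terms, required_term, stream_terms,
                       first_year, last_year):
    """Assemble a Scopus advanced-search string (illustrative only)."""
    def quoted_or(terms):
        # Join quoted phrases with OR, e.g. "a" OR "b".
        return " OR ".join(f'"{t}"' for t in terms)

    topic = (
        f"TITLE-ABS-KEY(({quoted_or(learning_terms)}) "
        f'AND "{required_term}" '
        f"AND ({quoted_or(stream_terms)}))"
    )
    # The year comparisons are strict, so widen the bounds by one year.
    years = f"PUBYEAR > {first_year - 1} AND PUBYEAR < {last_year + 1}"
    return f"{topic} AND {years}"


# Terms taken from the search description; the structure is an assumption.
query = build_scopus_query(
    learning_terms=["intelligent learning", "machine learning", "deep learning"],
    required_term="multidimensional",
    stream_terms=["data flow*", "data stream*"],
    first_year=2015,
    last_year=2025,
)
print(query)
```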

To ensure the relevance of the dataset, a strict filtering process was applied, manually discarding works that were not directly related to the subject of study. The criteria established were that the selected articles had to contain the words “intelligent,” “learning,” and “multidimensional” simultaneously, as well as incorporate at least one of the complementary terms “data flows” or “data streams” in their description. As a result, 594 articles were identified, covering both theoretical contributions and applied developments on intelligent learning in complex and multidimensional data streams.

2.2. Bibliometric Analysis Tools

The analysis of bibliographic networks was carried out using VOSviewer (version 1.6.20), a specialized software tool for representing co-citation networks, collaboration between authors, and thematic distribution [45]. The following fixed parameters were applied in all VOSviewer analyses:

Normalization method: association strength
Layout algorithm: VOSviewer layout
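To make the normalization setting more concrete, the sketch below applies association-strength normalization to a small co-occurrence matrix. It uses one common formulation, in which each link weight is divided by the product of the total link weights of its two endpoints. Any constant scaling factor the software may apply is omitted, the function name is hypothetical, and the counts are invented for illustration.

```python
def association_strength(cooc):
    """Normalize a symmetric co-occurrence matrix by association strength.

    cooc[i][j] is the number of co-occurrences of items i and j (the
    diagonal is ignored). Returns s[i][j] = cooc[i][j] / (w[i] * w[j]),
    where w[i] is the total co-occurrence weight of item i.
    """
    n = len(cooc)
    totals = [sum(cooc[i][j] for j in range(n) if j != i) for i in range(n)]
    strength = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and totals[i] > 0 and totals[j] > 0:
                strength[i][j] = cooc[i][j] / (totals[i] * totals[j])
    return strength


# Hypothetical co-occurrence counts for three keywords (not from the study).
cooc = [
    [0, 8, 2],
    [8, 0, 1],
    [2, 1, 0],
]
strength = association_strength(cooc)
```

The normalization reduces the advantage of very frequent items: a pair is considered strongly related only if it co-occurs more often than the overall frequency of both items would suggest.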

Bibliometrix (version 4.1.4) and its biblioshiny interface, developed in R, were also used to explore the evolution of topics, the use of keywords, and the identification of emerging trends. Both applications, which are open source and free, offer advantages in terms of replicability and allow other researchers to apply this approach in similar studies.

2.3. Specific Visualization Parameters

Keyword co-occurrence network: author keywords, full count, minimum occurrences = 10 → 127 keywords met the threshold.
Country collaboration network: minimum number of documents per country = 3; the largest connected set was retained.
Thematic evolution map: two time intervals (2015–2019 and 2020–2025), inclusion index weighted by word occurrences.
Shannon entropy was calculated using the standard formula for distributions of authors, countries, and research areas.
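The keyword co-occurrence settings listed above (full counting and a minimum-occurrence threshold) can be sketched as follows. This is an illustrative reimplementation under simple assumptions, not the VOSviewer code: each keyword is counted once per document, and each co-occurring pair adds a weight of 1 per document. The function name and toy corpus are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(docs_keywords, min_occurrences):
    """Build a keyword co-occurrence network with full counting.

    docs_keywords: list of keyword lists, one per document.
    Returns (occurrences, links) restricted to keywords that appear in
    at least `min_occurrences` documents.
    """
    # Count each keyword once per document.
    occurrences = Counter()
    for keywords in docs_keywords:
        occurrences.update(set(keywords))

    kept = {k for k, n in occurrences.items() if n >= min_occurrences}

    # Full counting: every co-occurring pair adds a weight of 1 per document.
    links = Counter()
    for keywords in docs_keywords:
        present = sorted(set(keywords) & kept)
        links.update(combinations(present, 2))

    kept_occurrences = {k: occurrences[k] for k in kept}
    return kept_occurrences, links


# Hypothetical toy corpus; the study itself used a threshold of 10.
docs = [
    ["deep learning", "data streams", "iot"],
    ["deep learning", "data streams"],
    ["deep learning", "federated learning"],
]
occ, links = cooccurrence_network(docs, min_occurrences=2)
```

With the threshold of 2 used in this toy example, only keywords appearing in at least two documents survive, and links are kept only between surviving keywords.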

Finally, Shannon entropy was used to assess the concentration and diversity of variables such as the distribution of authors, countries, and research areas. This measure allowed us to estimate the degree of homogeneity in the dispersion of the data and provided a solid basis for interpreting patterns of concentration in authorship, the dynamics of international collaborations, and the diversification of research lines related to intelligent learning applied to multidimensional data streams.
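For reference, the standard Shannon entropy of a discrete distribution with proportions p_i is H = −Σ p_i log p_i. The sketch below computes it from raw counts (for example, publications per country), together with a normalized variant in [0, 1]. The logarithm base and the normalization by the number of non-empty categories are assumptions, since the text does not specify them, and the function names are hypothetical.

```python
import math


def shannon_entropy(counts, base=2.0):
    """Shannon entropy H = -sum(p_i * log(p_i)) of a count distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    entropy = 0.0
    for c in counts:
        if c > 0:  # empty categories contribute nothing
            p = c / total
            entropy -= p * math.log(p, base)
    return entropy


def normalized_entropy(counts, base=2.0):
    """Entropy divided by its maximum, log(k), for k non-empty categories."""
    k = sum(1 for c in counts if c > 0)
    if k <= 1:
        return 0.0
    return shannon_entropy(counts, base) / math.log(k, base)


# Hypothetical publication counts per category (not from the study).
example_counts = [50, 30, 20]
print(shannon_entropy(example_counts), normalized_entropy(example_counts))
```

Values close to the maximum indicate an even spread across categories, whereas values close to zero indicate that output is concentrated in a few authors, countries, or areas.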


Although the PRISMA flow diagram is the most widely used standard for systematic reviews, in rapidly evolving technical fields (and especially in bibliometric studies), it is frequently adapted or replaced by specific protocols. In this study, a customized methodology was designed for the following three main reasons:

Highly specialized technical terms (“intelligent learning,” “multidimensional data streams”) required expert manual filtering that could not be adequately captured by automatic title/abstract searches.
It was essential to include conference proceedings and early-access articles, which are particularly relevant in computer science but are typically excluded from strict PRISMA schemes.
The bibliometric tools used (VOSviewer and Bibliometrix) required a clean, duplicate-free dataset from the outset.

Nevertheless, the four classic phases of the PRISMA protocol (identification → screening → eligibility → inclusion) have been rigorously followed and are transparently described in this section.

VOSviewer facilitated the construction of structural graphs that showed the interconnections between the most influential works, journals, and authors in the field.


3. Results

The analysis was carried out based on the bibliographic metadata of the documents indexed in the Scopus database, restricting the selection to those works related to the topic of intelligent learning on multidimensional data streams. In total, 594 publications were identified, distributed across 276 sources (journals and books, among others) in the period between 2015 and 2025. These contributions were produced by 1588 authors, of whom only 41 signed documents individually, while the majority were co-authored works, with an average of 4.07 authors per publication. The set analyzed showed an annual growth rate of 39.38% and an average of 9.25 citations per document, reflecting sustained and expanding interest in the subject. In addition, the publications included 4771 terms in Keywords Plus and 5730 keywords provided by the authors, demonstrating the diversity of approaches associated with the field of study.

The results presented in Table 1 show that scientific production on intelligent learning on multidimensional data streams is mainly concentrated in Computer Science (30.33%) and Engineering (19.98%), confirming the predominance of approaches focused on the design of computational models and the development of algorithms applied to spatial data processing. However, the presence of other disciplines such as Mathematics (11.44%), Physics and Astronomy (6.03%), and Decision Sciences (5.49%) reveals a growing interest in addressing the topic from complementary perspectives that incorporate theoretical foundations, mathematical modeling, and decision-making support criteria.

Together, these five areas represent 73.28% of the records analyzed, highlighting not only a significant thematic concentration but also the existence of a considerable scope for strengthening interdisciplinary research. In particular, the contribution of mathematics and physical sciences opens up the possibility of delving deeper into advanced analytical methods, while the incorporation of decision sciences highlights the potential for application in areas related to mobility, logistics, and strategic planning.

Table 2 shows that scientific output on intelligent learning applied to multidimensional data streams has grown steadily over the last decade. Since 2015, the number of publications has increased progressively, with notable year-on-year growth rates in 2019 (59.09%), 2022 (70.27%), and 2025 (42.20%). This trend demonstrates a consolidation of interest in the academic community, particularly in the last five years, when the number of articles rose sharply.

Although there were slight declines in some periods, such as in 2021 with a decrease of 9.76%, the overall trend reflects a rapidly expanding field, with a significant increase from 6 articles in 2015 to 166 in 2025. These results suggest that the subject is maturing, driven by the development of more advanced methodologies, the availability of large volumes of data, and the relevance of its application in various scientific and technological fields. In this sense, the growth dynamics confirm that this is a constantly evolving domain with increasing potential to impact interdisciplinary research.
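The growth figures can be checked with two short formulas. The sketch below assumes that the reported annual growth rate is a compound rate between the first and last year, which is consistent with the values of 6 articles in 2015 and 166 in 2025 given above. The function names are hypothetical, and the year-on-year example uses illustrative counts that are not taken from Table 2.

```python
def compound_annual_growth_rate(first_count, last_count, first_year, last_year):
    """Compound annual growth rate, in percent, between two years."""
    periods = last_year - first_year
    if periods <= 0 or first_count <= 0:
        raise ValueError("need a positive starting count and last_year > first_year")
    return ((last_count / first_count) ** (1 / periods) - 1) * 100


def year_on_year_growth(previous_count, current_count):
    """Percentage change from one year to the next."""
    if previous_count <= 0:
        raise ValueError("previous_count must be positive")
    return (current_count - previous_count) / previous_count * 100


# Counts for 2015 and 2025 as stated in the text.
rate = compound_annual_growth_rate(6, 166, 2015, 2025)
print(f"{rate:.2f}%")  # prints 39.38%

# Illustrative counts only: a drop from 41 to 37 articles.
print(f"{year_on_year_growth(41, 37):.2f}%")
```

Under this assumption, the compound rate over the ten intervals matches the 39.38% annual growth rate reported for the dataset.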

3.1. Geographical Distribution of the Corresponding Authors

Table 3 shows that China is the country with the highest scientific output in the field of intelligent learning on multidimensional data streams, accounting for 55.2% of total publications. It is followed by India and the United States with 7.2% and 3.9%, respectively. Together, the top ten countries account for 74.8% of the articles published in this line of research. In terms of the type of collaboration, single-country publications (SCPs) greatly predominate over multi-country publications (MCPs). Canada stands out as the country with the highest proportion of internationally co-authored publications (77.8%), followed by Australia (50.0%) and Germany (40.0%). In contrast, Italy, Ukraine, and Spain have no publications in collaboration with other countries.
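The collaboration percentages quoted above are the share of a country's publications that involve at least one foreign co-author, that is, MCP / (SCP + MCP). A minimal sketch, using hypothetical country names and counts rather than the values in Table 3:

```python
def mcp_ratio(single_country, multi_country):
    """Share of multi-country publications (MCP), in percent."""
    total = single_country + multi_country
    if total == 0:
        raise ValueError("country has no publications")
    return multi_country / total * 100


# Hypothetical (SCP, MCP) counts, not taken from Table 3.
countries = {"Country A": (2, 7), "Country B": (5, 5), "Country C": (4, 0)}
ratios = {
    name: round(mcp_ratio(scp, mcp), 1)
    for name, (scp, mcp) in countries.items()
}
print(ratios)
```

Because the ratio is relative to each country's own output, a country with few publications can show a very high share, so the percentages are best read alongside the absolute counts.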

Table 4 shows the leading countries in terms of impact measured by citations. The overall average number of citations per article is 14.93. The United States (34.0) and Canada (55.4) stand out with averages well above this figure, reflecting greater recognition of their contributions in the scientific community. Germany (37.5), Japan (42.3), and Bangladesh (46.5) also far exceed the average. In contrast, China and India, despite being the countries with the most articles published, have an average of 6.8 citations per publication, one of the lowest among the countries with the highest output, suggesting a lower relative visibility of their work compared with their high productivity.

In general terms, the data show that China and the United States lead scientific production in this field, albeit with different profiles: China accounts for the largest number of publications, while the United States has a much higher average number of citations per article, which demonstrates the greater influence and visibility of its contributions. These differences could be related to factors such as the perceived quality of the work, access to high-impact journals, or the strength of scientific collaboration networks. Likewise, the proportion of international publications varies significantly: countries such as Canada and Germany stand out for their high level of cooperation, while others, such as Italy, Ukraine, and Spain, have no internationally co-authored publications in this field. The fact that nearly three quarters of the articles (74.8%) are concentrated in only ten countries confirms that production on intelligent learning on multidimensional data streams is not yet evenly distributed globally, which opens up opportunities to strengthen research in emerging regions through international cooperation networks.

3.2. Main Publication Sources

The analysis of the main sources of publication in the field of intelligent learning on multidimensional data streams, presented in Table 5, shows a clear predominance of book series and scientific journals over conference proceedings. In particular, the Lecture Notes in Computer Science collection ranks first with 23 articles, confirming the close relationship between the topic and computer science and artificial intelligence. It is followed by other series such as Communications in Computer and Information Science (15 articles), Smart Innovation, Systems and Technologies (11), and Advances in Intelligent Systems and Computing (9), which stand out for their role in disseminating recent advances related to technological innovation and intelligent systems.

Likewise, high-impact journals such as the IEEE Internet of Things Journal, IEEE Access, IEEE Transactions on Industrial Informatics, and IEEE Transactions on Intelligent Transportation Systems, each with between six and nine publications, show that the field is also consolidating in peer-reviewed spaces focused on practical applications in the Internet of Things, industrial computing, and intelligent transportation systems. Finally, the presence of publications such as the Proceedings of SPIE and the Journal of Image and Graphics reinforces the multidisciplinary nature of the area, where perspectives from engineering, image processing, and applied sciences converge.

Taken together, these results suggest that research in this area is disseminated through two complementary channels: rapid publication of preliminary advances in book series and conference collections, and consolidation of findings in internationally indexed journals. This reflects an expanding field with high interdisciplinary potential.

3.3. Most Cited Articles

An analysis of the most cited articles highlights the thematic diversity and interdisciplinary breadth that research on intelligent learning applied to multidimensional data streams has achieved, as can be seen in Table 6. The most influential work, “Advancing Biosensors with Machine Learning” with 551 citations, integrates artificial intelligence and biomedicine, showing how machine learning methods, especially convolutional and recurrent neural networks, can enhance detection and analysis using electrochemical, fluorescent, and spectral biosensors, as well as the fusion of data from multiple sensors for more accurate diagnoses. This article highlights the expansion of chemometrics into intelligent and automated applications in the biomedical field.

In the field of energy and smart systems, “Secure and Efficient Federated Learning for Smart Grid With Edge–Cloud Collaboration” with 216 citations stands out for its federated learning proposal that allows artificial intelligence models to be shared without compromising the privacy of users’ energy data. Its approach combines edge computing, cloud computing, and reinforcement learning to optimize the quality of local models and communication efficiency, addressing non-IID data problems and user participation limitations. Complementarily, “A high-accuracy, real-time, intelligent material perception system” with 197 citations applies machine learning to intelligent perception using hybrid e-skins capable of recognizing multidimensional materials and stimuli in real time, opening up possibilities for physical interfaces and touch-sensitive robotics.

In the industrial and transportation sector, publications such as “Digital Twin as Enabler for an Innovative Digital Shopfloor Management System” with 190 citations and articles in Transportation Research Part C (145 and 126 citations) demonstrate the importance of intelligent learning in manufacturing, logistics, and transportation. This research highlights the management of digital twins and the imputation of missing data using Bayesian models or variational autoencoders to optimize traffic prediction, scenario simulation, and production and vehicle flow planning in complex and multidimensional environments.

Finally, other highly cited works demonstrate the breadth of applications of intelligent learning. For example, studies on mechanical fault diagnosis using recurrent neural networks and intelligent microfluidics illustrate its impact on advanced manufacturing and biomedicine. Likewise, research on intelligent personal assistants for language learning and heart disease prediction using supervised algorithms reflects its ability to improve human interactions and the analysis of large medical datasets, consolidating the relevance of these methodologies in practical and multidisciplinary environments.

The high number of citations of these articles confirms that they are fundamental references in their respective fields, either for their theoretical contributions or for their applicability in real-world scenarios. Furthermore, the breadth of areas involved, ranging from biomedicine and energy to transportation and education, highlights the cross-cutting nature of intelligent learning, consolidating it as a key driver for the development of innovative solutions in diverse domains.

Table 7 shows the most productive authors in the field of study. According to the results obtained, the most prominent authors are Wang Yaoze and Zhang Yushuang, who top the list with 21 and 20 articles, respectively. They are followed by Wang Xiuwen with 16 publications, and Li Xiuzheng and Li Yonghui, both with 15 papers. This pattern of productivity suggests the existence of researchers who act as key references within the field, consolidating stable lines of research and promoting the continuous generation of knowledge. In addition, the concentration of publications by these authors indicates that their scientific leadership influences the orientation of studies and the consolidation of collaborative networks in the area.

The analysis of the most productive authors allows us to identify the main contributors in this area of research, as well as the institutions that have promoted greater scientific production. The presence of multiple prominent researchers at specific universities shows a concentration of scientific activity in these centers, which could be related to the existence of specialized research clusters that favor the development and advancement of the discipline.

Likewise, these results reflect trends in institutional and geographical collaboration, mainly in Chinese universities, suggesting that efforts in intelligent learning applied to multidimensional data streams are being led by well-established and well-structured research groups capable of generating a significant impact on scientific production in the area.

3.4. Main Keywords

The analysis of the most frequent keywords, presented in Table 8, reveals the centrality of concepts associated with machine learning and its applications in multidimensional data processing. Among the author keywords, “deep learning” and “machine learning” stand out with 247 and 193 articles, respectively, followed by “learning systems” with 136 articles, which shows the predominance of approaches based on advanced artificial-intelligence techniques. Other notable terms include “artificial intelligence” (79), “machine learning” (71), “learning algorithms” (68), and “intelligent systems” (64), reflecting the consolidation of specialized vocabulary linked to the design of predictive and optimization models.

As for the automatically generated Keywords Plus terms, “deep learning” (158) and “learning systems” (135) appear repeatedly, along with “machine learning” (82), “machine-learning” (70), and “learning algorithms” (68). In addition, terms associated with specific applications appear, such as “intelligent systems” (63), “forecasting” (50), “data mining” (47), and “multidimensional data” (43), which underscore the importance of prediction, large-volume information management, and complex data analysis.
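The counts above list “machine learning” and “machine-learning” as separate terms. A simple normalization step can merge such spelling variants before frequencies are compared. The sketch below is only illustrative and is not part of the reported workflow: the function names are hypothetical, and summing the two counts may overstate the number of documents if some records carry both spellings.

```python
import re
from collections import Counter


def normalize_keyword(keyword):
    """Lowercase, treat hyphens as spaces, and collapse whitespace."""
    text = keyword.lower().replace("-", " ")
    return re.sub(r"\s+", " ", text).strip()


def merged_keyword_counts(raw_counts):
    """Sum the counts of keywords that share the same normalized form."""
    merged = Counter()
    for keyword, count in raw_counts.items():
        merged[normalize_keyword(keyword)] += count
    return merged


# Keywords Plus counts quoted in the paragraph above.
raw = {"machine learning": 82, "machine-learning": 70, "learning algorithms": 68}
merged = merged_keyword_counts(raw)
print(merged.most_common())
```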

The overlap between both types of keywords indicates a strong thematic convergence, in which deep learning and machine learning are positioned as the backbone of the research. This pattern confirms that the field is oriented towards the integration of intelligent algorithms with prediction and analysis capabilities in environments characterized by large scale, heterogeneity, and the multidimensionality of data.

Additionally, the analysis suggests that research is not only focused on the development of learning models but also on their practical application to specific problems, such as data mining, information management, and the prediction of complex phenomena. This trend reflects a dual approach: on the one hand, the refinement of algorithmic techniques and, on the other, their implementation in real environments, which demonstrates the maturity of the field and the relevance of its theoretical and applied contributions.

3.5. Keyword Strategy Diagram

The strategic keyword diagram allows us to identify the most relevant subject areas within the field of intelligent learning applied to multidimensional data streams, differentiating between those that are well established and those that are emerging or in decline. This analysis is based on two parameters: density, which reflects the degree of internal cohesion of each topic, and centrality, which indicates its level of connection and influence with other topics in the field [56]. Therefore, the strategic diagram distinguishes research topics according to their degree of development (density) and degree of relevance (centrality). Figure 1 shows four quadrants that allow us to interpret the status and evolution of topics within the field.

Prior to analysis, generic terms and linguistic noise (“human,” “article,” “controlled study,” “algorithm,” etc.) were thoroughly cleaned using expanded stop words and manual review.

The map classifies topics according to their centrality (importance in the field) and density (degree of internal development of the topic):

Motor Themes (high density and high centrality): Deep learning, machine-learning systems, neural networks, convolutional neural networks. These are the most developed and central topics in the field.
Basic Themes (low density and high centrality): Learning systems, data mining, decision making, Internet of Things. Cross-cutting and fundamental topics.
Niche Themes (high density and low centrality): Adversarial machine learning, contrastive learning, federated learning. Highly specialized topics undergoing rapid internal development but still with less connection to the rest of the field.
Emerging or Declining Themes (low density and low centrality): After cleaning, this quadrant is practically empty, confirming that there are no themes in clear decline.
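The quadrant logic described above can be sketched as a simple classification on the two axes. This is a minimal illustration, assuming that the axes cross at the median centrality and median density of the themes. The actual cut-offs used by the mapping tool are not stated in the text, and the theme names, values, and function name below are hypothetical.

```python
from statistics import median


def classify_themes(themes):
    """Assign each theme to a strategic-diagram quadrant.

    themes: dict mapping theme name -> (centrality, density).
    The axes are assumed to cross at the median of each measure.
    """
    centrality_cut = median(c for c, _ in themes.values())
    density_cut = median(d for _, d in themes.values())

    labels = {}
    for name, (centrality, density) in themes.items():
        central = centrality >= centrality_cut
        dense = density >= density_cut
        if central and dense:
            labels[name] = "motor"
        elif central:
            labels[name] = "basic"
        elif dense:
            labels[name] = "niche"
        else:
            labels[name] = "emerging or declining"
    return labels


# Hypothetical (centrality, density) values, not taken from Figure 1.
themes = {
    "theme A": (0.9, 0.8),
    "theme B": (0.8, 0.2),
    "theme C": (0.1, 0.9),
    "theme D": (0.2, 0.1),
}
labels = classify_themes(themes)
print(labels)
```

Because the cut-offs are relative to the set of themes being mapped, a theme's quadrant can change when generic terms are removed or when a different keyword field is analyzed.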

Overall, the map reveals that the core of the field is solidly rooted in deep learning and classical learning systems, while federated learning and adversarial techniques are emerging as specialized niches with great future potential.

In the upper right quadrant is “learning systems”, a well-developed and highly central topic. This indicates that it is a structuring axis of the area and also maintains strong links with other topics, becoming a driver of research. Its size in the figure confirms its weight within the field. The upper left quadrant contains the topic “adversarial machine learning”, a topic with high density but low centrality. Although it is well defined and developed internally, its contribution is more marginal within the research as a whole. This positions it as a specialized niche. The topic “human” is located in the lower left quadrant, with low density and low centrality. Its location indicates that it is an underdeveloped topic with little influence in the field, which can be interpreted as an emerging but still incipient line or a declining trend. Finally, in the lower right quadrant, we see “deep learning”, characterized by strong centrality but low density. This positioning makes it an essential topic that is widely connected to others, although it does not yet have such a consolidated internal development. It is a basic topic that supports other lines of research and has the potential to evolve into driving topics.

Overall, the diagram reflects that the field is structured around “learning systems” as a driving topic, while “deep learning” acts as a cross-cutting foundation. The topics “adversarial machine learning” and “human” show more specific trajectories: the former as a highly specialized line and the latter as a possible emerging trend.

Figure 2, which corresponds to the analysis of the authors’ keywords, shows patterns that complement and, in some cases, contrast with those identified in Figure 1. In the upper right quadrant, “learning systems” stand out. With their high density and centrality, they are consolidated as driving themes in the discipline, showing a well-cohesive structure and strong influence on the evolution of the field. In the upper left quadrant are “federated learning” and “adversarial machine learning”, specialized topics that, although highly developed internally, have less connection to the central cores of research, similar to what occurred in Figure 1 with more peripheral topics.

In contrast, the lower left quadrant contains terms such as “human” and “controlled study”, which reflect emerging or marginal areas with low density and little influence on the articulation of the main advances in the field. Finally, in the lower right quadrant are “deep learning” and “machine learning”, which, as in Figure 1, appear as highly central cross-cutting axes, although not yet fully developed, confirming their role as methodological and conceptual foundations that support other lines of research.

Comparatively, while Figure 1 placed “deep learning” as a driving theme, in Figure 2 it is repositioned as a basic theme, suggesting that from the authors’ perspective, it constitutes a fundamental and broadly transversal axis, rather than a core of specialization in itself. Thus, both figures together allow us to appreciate the coexistence of consolidated approaches, specialized areas, and emerging topics, highlighting the diversity and dynamism of the field of intelligent learning applied to multidimensional data streams.

This diagram shows how the literature has structured the thematic contributions of the field, revealing a balance between established, specialized, and emerging areas. The prominence of approaches such as deep learning and intelligent systems, together with the emergence of specialized topics such as federated learning, reflects the trend toward the integration of advanced and collaborative techniques to address the challenges of managing large volumes of multidimensional data.

3.6. Thematic Evolution of Keywords

For the thematic analysis of keyword evolution, we used the Bibliometrix package with its Biblioshiny graphical interface, establishing ranges of years that allowed us to observe changes in the dynamics of the topics. Based on keyword plus, Figure 3 shows how the topics have evolved between the periods 2015–2020 and 2021–2025.

Topics in blue tones correspond to consolidated concepts from 2015 to 2019 (concept drift, feature selection, high dimensionality). Topics in yellow/orange tones represent emerging areas since 2021 (federated learning, transformers, edge computing, privacy-preserving). There is a clear shift towards distributed and deep learning-based paradigms.

In the first period, the main topics were “deep learning”, “machine learning”, “learning systems”, and “article”, which formed the conceptual basis of the initial studies. In the second period, some of these topics were transformed or integrated into new orientations. For example, “learning systems” remains a central theme, consolidating itself as a persistent and expanding topic within the topic of “deep learning”.

Likewise, new topics such as “human” and “convolutional neural networks” are emerging, reflecting both a shift towards human–machine interaction and the incorporation of specific architectures in the field of machine learning. On the other hand, “deep learning” retains its relevance, albeit with a more limited scope, and “machine learning” tends to be redistributed towards these new areas of research.

Figure 3 and Figure 4: The thematic evolution diagrams (Sankey) include both author keywords and keyword plus automatically generated by Scopus. Some terms such as “article,” “human(s),” “review,” or “male/female” are bibliometric artifacts that Scopus assigns according to document type or population studied, and do not reflect conceptual content in the field. These terms appear in the flows because they are present in the original Scopus dataset but should not be interpreted as relevant scientific topics. The real and significant thematic flows in the field are those that connect technical concepts such as “concept drift,” “online learning,” “feature selection,” “high-dimensional data,” “deep learning,” “federated learning,” and “transformers,” as highlighted in the interpretation below.

Overall, the thematic evolution analysis reveals a shift from more general notions of artificial intelligence and learning toward more specific and applied approaches, which demonstrates the maturation of the field and its progressive diversification into emerging lines of research.
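One simple way to quantify how strongly a theme in one period continues into a theme in the next is an inclusion index over their keyword sets. The sketch below assumes the index is the size of the intersection divided by the size of the smaller set; the function name and the keyword sets are hypothetical, and the study does not state which overlap measure underlies its diagrams.

```python
def inclusion_index(keywords_a, keywords_b):
    """Share of the smaller keyword set that also appears in the other set."""
    a, b = set(keywords_a), set(keywords_b)
    if not a or not b:
        return 0.0  # no overlap can be measured with an empty set
    return len(a & b) / min(len(a), len(b))


# Hypothetical keyword sets for two periods (illustration only).
period_1 = {"learning systems", "machine learning", "data mining"}
period_2 = {"learning systems", "deep learning", "federated learning", "data mining"}
overlap = inclusion_index(period_1, period_2)
```

Here two of the three keywords in the smaller set reappear in the second period, so the index is 2/3; values near 1 would suggest a theme that persists, and values near 0 a theme that dissolves or is replaced.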

Figure 4 shows the thematic evolution of author keywords between the periods 2015–2020 and 2021–2025. In the first stage, as with keyword plus, general topics such as “article”, “deep learning”, “learning systems” and “machine learning” predominate, reflecting the initial interest in laying the conceptual and methodological foundations for research in the field.

The map highlights four clearly differentiated thematic areas:

Red cluster: Detection of and adaptation to concept drift, and mining of evolving data streams in non-stationary environments.
Green cluster: Management of high dimensionality, including feature selection and feature reduction in multidimensional data streams.
Blue cluster: Federated learning, edge computing, and applications in the Internet of Things (IoT).
Purple cluster: Integration with deep learning, transformer architectures, deep neural networks, and learning about complex time series.

The size of the nodes is proportional to the frequency of occurrence of each term, and the thickness of the links reflects the strength of co-occurrence. The clear separation between the four clusters confirms that, despite sharing the common core of “data streams + intelligent learning,” the scientific community has currently structured its work into four well-defined and relatively independent thematic subareas.
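The two quantities that drive such a map, term frequency (node size) and co-occurrence strength (link thickness), can be sketched as follows. This is a minimal illustration that assumes each document is reduced to a list of keywords and that every pair of distinct keywords in a document adds one unit of link strength; the function name and the sample documents are hypothetical.

```python
from collections import Counter
from itertools import combinations


def term_stats(documents):
    """Return (term frequency, co-occurrence strength) for keyword lists.

    A term is counted once per document, and each unordered pair of
    distinct terms in a document adds 1 to that pair's link strength.
    """
    freq = Counter()
    links = Counter()
    for keywords in documents:
        unique = sorted(set(keywords))  # sorted so each pair has one canonical order
        freq.update(unique)
        links.update(combinations(unique, 2))
    return freq, links


# Hypothetical keyword lists (illustration only).
docs = [
    ["concept drift", "data streams", "online learning"],
    ["data streams", "federated learning", "IoT"],
    ["concept drift", "data streams"],
]
freq, links = term_stats(docs)
```

In this toy input, "data streams" appears in all three documents and co-occurs twice with "concept drift", so it would be drawn as the largest node with its thickest link pointing to "concept drift".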

In the second period, there is a shift towards more specialized and applied areas. Machine learning and deep learning remain central, although integrated with new lines of research. At the same time, topics such as “human” emerge, reflecting the growing attention to human–machine interaction, and “convolutional neural networks”, denoting the incorporation of specific architectures in the analysis of complex data. Notably, “deep learning” remains a topic with an independent evolution, which provides opportunities for expansion toward more autonomous and adaptive learning approaches.

This thematic shift suggests a process of maturation in the field, in which general concepts of machine learning have evolved toward more robust and specialized models. The diversification of topics highlights methodological sophistication and a focus on more practical applications, in line with advances in artificial intelligence and large-scale data processing.

3.7. Degree of Concentration of Selected Variables

To evaluate the distribution of scientific output in this field, the degree of concentration of different bibliometric variables was analyzed using Shannon entropy, a fundamental measure in information theory [57]. This metric quantifies the uncertainty or diversity in a data distribution: in its normalized form, values close to 1 indicate that the elements are evenly distributed, while values close to 0 reflect a high concentration in a few elements. Mathematically, for a discrete probability distribution $P = \{p_{j};\ j = 1, \dots, N\}$ that satisfies $\sum_{j = 1}^{N} p_{j} = 1$, entropy is defined in Equation (1).

$$S[P] = - \sum_{j = 1}^{N} p_{j} \ln (p_{j}) \tag{1}$$

In bibliometrics, entropy is used to study how evenly or unevenly relevant variables, such as research topics and authors, are distributed. To facilitate interpretation, normalized Shannon entropy is used, defined in Equation (2).

$$H[P] = \frac{S[P]}{S_{\max}} = \frac{- \sum_{j = 1}^{N} p_{j} \ln (p_{j})}{\ln N} \tag{2}$$

where $0 \leq H \leq 1$, with $H = 1$ indicating a uniform distribution, i.e., without concentration, and $H = 0$ when the entire distribution is concentrated at a single point.
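As an illustration of Equations (1) and (2), the following minimal sketch computes the normalized entropy from raw counts (for example, documents per country). The function name and the sample counts are hypothetical, and returning 0 for degenerate input (fewer than two categories) is a convention assumed here, not one stated in the study.

```python
import math


def normalized_entropy(counts):
    """Normalized Shannon entropy H = S / ln(N) of a list of counts."""
    total = sum(counts)
    n = len(counts)
    if n < 2 or total <= 0:
        return 0.0  # convention for degenerate input (an assumption)
    # Zero counts contribute nothing, since p * ln(p) tends to 0 as p tends to 0.
    probs = [c / total for c in counts if c > 0]
    s = -sum(p * math.log(p) for p in probs)
    return s / math.log(n)


# Hypothetical counts (illustration only).
uniform = normalized_entropy([5, 5, 5, 5])        # evenly spread
concentrated = normalized_entropy([20, 0, 0, 0])  # everything in one category
skewed = normalized_entropy([17, 1, 1, 1])        # dominated by one category
```

The uniform case yields H = 1 and the fully concentrated case yields H = 0, matching the two limits described above, while the skewed case falls strictly between them.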

The normalized entropic concentration index was calculated for the distribution of authors, sources, countries, research areas, and article citations, with the results presented in Table 9. Authors show a highly homogeneous distribution with $H = 0.9539$ (based on Table 7), as do sources with $H = 0.9359$ (based on Table 5). In contrast, countries show a high concentration with $H = 0.4950$ (based on Table 3), indicating that scientific production is geographically concentrated in a few countries. However, the distribution of authors within these countries is balanced, suggesting that individual contributions are not dominated by a small group. Research areas show moderate concentration with $H = 0.7188$ (based on Table 1), with a predominance of disciplines such as computer science, engineering, decision sciences, and mathematics, which account for 73.28% of publications. Finally, article citations show moderate concentration ($H = 0.8154$, Table 6), indicating the existence of multiple influential articles without centralization in a few works.

These results reflect that scientific production in the area is heterogeneous in terms of authors and sources, which points to a field open to new contributions and collaborations. In geographical terms, although some countries account for a large part of the production, the participation of authors within these countries is balanced. In terms of research areas and the influence of articles, it is evident that certain key disciplines and publications are more relevant but without generating excessive dominance, suggesting a diverse and dynamic scientific ecosystem.

Another way to examine how authors are distributed according to their productivity is through Lotka’s law. According to Lotka’s empirical finding [58], the number of authors who publish n articles follows an inverse power relationship similar to Zipf’s law. Lotka originally analyzed a database limited to physics and chemistry, and the law is expressed in Equation (3).

$$a_{n} = \frac{a_{1}}{n^{2}}, \quad n = 1, 2, \dots, N \tag{3}$$

where $a_{n}$ represents the number of authors who publish n articles and $a_{1}$ represents the number of authors who publish a single article. To generalize this relationship and better adjust it to other fields, it can be expressed as Equation (4).

$$a_{n} = \frac{a_{1}}{n^{c}}, \quad n = 1, 2, \dots, N \tag{4}$$

where c is a parameter estimated to optimize the fit to the observed data. In this study, the estimated value is $c = 2.52$ with $R^{2} = 0.96$, indicating a very close fit.
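The study does not state how c was estimated. One common option, shown here purely as an assumption, is ordinary least squares on the log-log form of Equation (4), $\ln a_{n} = \ln a_{1} - c \ln n$. The function name `fit_lotka` and the synthetic counts are hypothetical and do not reproduce the data in Table 10.

```python
import math


def fit_lotka(observed):
    """Fit a_n = a_1 / n**c by least squares on the log-log scale.

    `observed` maps n (articles per author) to the number of authors,
    which must be positive. Returns (c, fitted a_1, R^2 on the log scale).
    """
    xs = [math.log(n) for n in observed]
    ys = [math.log(a) for a in observed.values()]
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return -slope, math.exp(intercept), 1 - ss_res / ss_tot


# Synthetic counts generated from a_n = 1000 / n**2 (not the paper's data).
synthetic = {n: 1000 / n**2 for n in range(1, 8)}
c, a1, r2 = fit_lotka(synthetic)
```

Because the synthetic counts follow Equation (3) exactly, the fit recovers c close to 2 and an R² close to 1. With real author counts the estimate depends on the fitting method, so a value obtained this way need not coincide with the one reported in the study.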

Table 10 presents the observed distribution of authors according to the number of articles published, together with the frequency adjusted according to Lotka’s law. It can be seen that most authors publish only one article (1287 authors, 81% of the total), while authors with two or more publications represent much smaller proportions. In some cases, such as authors with seven articles, the observed frequency is slightly higher than predicted, reflecting the presence of particularly productive researchers.

Overall, the results confirm that authorship is not evenly distributed: a small group of authors contributes multiple publications, while the majority publish only once. This concentration of productivity is consistent with what Lotka’s law predicts and is evidence of a small core of highly active researchers within the field of study, which could influence the direction and impact of research.

Interpreting these clouds allows readers to immediately and visually identify the dominant themes in the field over the last decade: classic challenges (concept drift, high dimensionality, incremental learning) remain central; paradigms emerging since 2020 (deep learning/transformers, federated learning, edge computing, privacy) have gained significant weight; the most recurrent applications are concentrated in IoT, health, energy, and finance. This visual summary complements previous quantitative analyses and offers a concise and accessible overview of the current state of the field, which is especially useful for novice researchers and professionals seeking entry points into the field.

3.8. Charts of Citations, Sources, and Authors

The visualization presented in Figure 5 was generated using VOSviewer software, a tool specialized in bibliometric analysis and the graphical representation of relationships between terms. This application, developed by Van Eck and Waltman (2010), allows the identification of co-occurrence patterns based on keywords extracted from titles, abstracts, and descriptors of scientific publications [45].

Some terms with high frequency but low informational value remain on the map due to their widespread use in titles and abstracts. Therefore, the interpretation focuses exclusively on the technical and applied concepts of greatest scientific relevance, grouped into five major thematic clusters.

The figure shows a density and semantic connection map that groups key concepts related to the field of intelligent learning on multidimensional data streams. Each color in the diagram represents a thematic cluster, that is, a subset of terms that share high levels of co-occurrence within the analyzed documents.

The light blue cluster focuses on terms such as research, use, modeling, concept, and data mining. This group represents an orientation toward the development of analytical models and data mining techniques applied in organizational contexts and complex systems. In the green cluster, words such as simulation, user, practice, student, course, and higher education stand out, suggesting a focus on designing user-centered (student-centered) educational experiences to improve teaching and learning processes. The red cluster groups terms such as recognition, classification, image, prediction model, and input. This segment is clearly linked to visual data processing, pattern recognition, and prediction using machine-learning models. The yellow cluster contains words such as internet, vehicle, smart city, and energy efficiency. This group points to technological applications in the field of the Internet of Things (IoT), smart vehicles, and energy efficiency. The dark blue cluster includes terms such as computer, progress, AI technology, and mapping, which refer to advances in emerging technologies in artificial intelligence and digital cartography. Finally, the purple cluster is composed of terms such as conference, topic, case study, and intelligent systems, reflecting a focus on academic production and the dissemination of knowledge in scientific forums.

Terms in blue correspond to topics consolidated between 2015 and 2019 (concept drift, feature selection, high dimensionality), whereas terms in yellow/orange tones represent emerging topics that have gained relevance since 2021 (federated learning, transformers, edge computing, privacy-preserving). A clear shift is observed from the traditional challenges of concept drift and high dimensionality toward modern paradigms based on distributed learning and Transformer architectures.

Central terms such as research, recognition, data mining, classification, and modeling function as bridges between the different clusters, highlighting their integrative role in the development of research. This type of analysis allows us to identify key areas of interest, as well as possible future lines of exploration in the field of intelligent learning about data streams.

Figure 6 shows a new term map generated with VOSviewer, this time using a binary counting method. Unlike the complete counting used in Figure 5, this method counts each term only once per document, regardless of how many times it is repeated. This approach subtly modifies the results, as it reduces the weight of the most frequent words and gives greater visibility to the lexical diversity present in the analyzed texts.
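The difference between the two counting methods can be illustrated with a minimal sketch. It assumes each document has already been reduced to a list of extracted terms; the function name `count_terms` and the sample documents are hypothetical, and the code does not reproduce VOSviewer's own term extraction.

```python
from collections import Counter


def count_terms(documents, binary=False):
    """Count term occurrences across tokenized documents.

    Full counting adds every occurrence; binary counting adds at most
    one occurrence of a term per document.
    """
    totals = Counter()
    for tokens in documents:
        totals.update(set(tokens) if binary else tokens)
    return totals


# Hypothetical term lists (illustration only).
docs = [
    ["classification", "classification", "classification", "image"],
    ["classification", "sensor"],
]
full = count_terms(docs)
binary = count_terms(docs, binary=True)
```

Under full counting, "classification" scores 4 because of its repetition in the first document; under binary counting it scores 2, one per document, which narrows the gap to terms such as "image" and "sensor" that appear only once.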

In the binary count, some extremely frequently used terms persist due to their appearance in virtually all documents. For this reason, the interpretation focuses exclusively on the technical and applicative concepts of greatest scientific relevance, avoiding methodological or generic terms that do not contribute any distinctive thematic value.

In this new co-occurrence map, significant changes can be observed in the structure of thematic clusters. One of the most notable aspects is the merging of the yellow cluster with key terms such as classification, topic, recognition, and detection. This grouping suggests a more focused approach to the analysis of classification and detection strategies applied to intelligent systems and machine learning, marking a difference from the previous map.

The red cluster remains one of the most prominent, including words such as experiment, feature, input, prediction model, and experimental result.

The entropy values in Table 9 can be interpreted as follows: the low entropy for countries (H = 0.4950) indicates that scientific production is concentrated in a few regions (especially China), whereas the high entropy for authors (H = 0.9539) indicates that output is spread across many researchers rather than dominated by a few. The intermediate entropy for research areas (H = 0.7188) reflects diversification beyond pure computer science, with notable participation from mathematics, physics, and decision sciences. This contrast offers the reader a clear picture of the maturity of the field: geographic concentration coupled with progressive interdisciplinary openness, which is particularly valuable information for funders and science policy makers.

These terms reinforce the idea that a considerable part of the literature continues to focus on practical applications, empirical validations, and experimental testing, especially in contexts such as fault diagnosis, intelligent transportation, and multidimensional feature analysis.

For their part, the green and blue clusters maintain their relevance, albeit with slight changes in their connections. The green cluster groups terms such as progress, innovation, limitation, view, and advancement, reflecting a critical and forward-looking view of technological advances in the field. The blue cluster, on the other hand, includes words such as platform, industry, IoT, and interaction, suggesting the growing integration of these technologies in industrial environments and interconnected systems.

The small purple cluster, although less dense, incorporates terms associated with implementation and teaching, such as implementation, teaching, and learner, denoting an emerging interest in knowledge transfer and capacity building in areas related to intelligent systems.

This type of visualization allows for clearer detection of the thematic diversity of the field of study, as well as the interconnections between technical, methodological, and applied areas. In addition, it provides a useful tool for identifying emerging trends and possible future lines of research.

Figure 7 presents a cloud map of publication sources, distinguishing the main academic media that have contributed to the dissemination of research on algorithms, trajectory clustering methods, GPS trajectories, urban planning, and traffic. As also shown in Table 5, the source with the highest number of contributions is Lecture Notes in Computer Science, with 23 articles, consolidating its position as the most representative space for the dissemination of work in this area. It is followed by Communications in Computer and Information Science and Smart Innovation, Systems and Technologies, with 15 and 11 publications, respectively, as well as Advances in Intelligent Systems and Computing and the IEEE Internet of Things Journal, with 9 articles each.

The size of each node indicates the total frequency of publication in that source, while the color (blue → green → yellow scale) reflects the average year of publication of articles in each source. This visualization makes it possible to immediately identify:

Established sources with a long history (blue tones): Lecture Notes in Computer Science, IEEE Transactions, etc.
Recently emerging and rapidly growing sources (yellow tones): IEEE Internet of Things Journal, IEEE Access, Smart Innovation...
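The two attributes encoded in this map, the number of documents per source (node size) and the average publication year per source (node color), can be derived from a list of (source, year) records. The sketch below is a minimal illustration; the function name, the record format, and the sample data are hypothetical.

```python
from collections import defaultdict


def source_summary(records):
    """Map each source to (number of documents, mean publication year)."""
    years = defaultdict(list)
    for source, year in records:
        years[source].append(year)
    return {s: (len(ys), sum(ys) / len(ys)) for s, ys in years.items()}


# Hypothetical (source, year) records (illustration only).
records = [
    ("Source A", 2016), ("Source A", 2018), ("Source A", 2020),
    ("Source B", 2023), ("Source B", 2024),
]
summary = source_summary(records)
```

In this toy input, "Source A" would appear as a larger node in a bluer tone (three documents, mean year 2018), while "Source B" would be smaller and closer to yellow (two documents, mean year 2023.5).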

The combination of both types confirms the maturity of the field (continuity in classic sources) and its current dynamism (explosion in high-impact specialized journals on IoT and AI).

The map allows us to visualize the frequency of appearance of each source, reflected in the size of the nodes, and its evolution over time using a color scale. We can see that journals and conferences such as IEEE Internet of Things Journal and IEEE Access have had a more recent participation, while series such as Lecture Notes in Computer Science show sustained continuity over time, reaffirming their central role in the dissemination of knowledge. Together, these sources demonstrate the diversity of publication spaces, ranging from specialized conferences to high-impact journals in artificial intelligence and the Internet of Things, confirming the interdisciplinary and dynamic nature of the field.

Figure 8 presents the cloud map constructed from the most cited articles listed in Table 6, where the size of each node depends on the number of citations received. This visualization, similar to the previous figure, graphically highlights the influence of certain works on the development of the topic Intelligent Learning on Multidimensional Data Streams.

The size of each node reflects the total number of citations received by the author’s works (or the most representative article in the case of teams). The color indicates the average year of the citations received (blue → green → yellow scale): blue tones correspond to seminal authors/articles whose influence was consolidated between 2015 and 2019, while shades that transition to yellow indicate more recent impactful contributions (2022–2025).

This visualization allows for immediate identification of:

Classic and foundational authors (blue tones): Cui (2020), Wang (2018), and Chen (2017), among others.
Emerging authors with recent high impact (yellow tones): Su (2022), Wei (2022), and Tang (2023), among others.

The most prominent node corresponds to the article by Cui et al. (2020), published in ACS Sensors, which, with 551 citations, is the most cited work and applies machine-learning techniques to the advancement of biosensors [46]. It is followed by Su et al. (2022), with a proposal for federated learning for smart grids [47], and Wei et al. (2022), who introduce a real-time material perception system based on machine learning [48]. These results reflect a shift towards concrete applications in highly relevant domains such as health, energy, and smart materials.

In contrast to the previous figure—where the most cited articles focused on vehicle trajectories and precision mapping—the current visualization shows greater thematic diversity. Works such as Brenner and Hummel (2017), on Digital Twin applied to smart factory management [49], and Zhu et al. (2022), focused on traffic data imputation using Bayesian tensor factorization, consolidate this breadth of approaches [51]. In this way, the most cited articles not only mark trends of impact in different disciplines but also confirm the cross-cutting nature of intelligent learning applied to multidimensional data streams.

Applying Shannon entropy to these variables gives the reader a clear structural overview of the field: the distribution of countries shows a relatively high concentration (typical of emerging technical areas led by established groups in China, the US, and India), while authors and sources are broadly distributed and research areas show moderate diversity. This combination reflects a field undergoing healthy maturation: a strong core of scientific leadership coupled with a progressive openness to interdisciplinary contributions from advanced mathematics, physics, and decision sciences. This information is particularly useful for early-career researchers, doctoral students, and organizations seeking to identify both current centers of excellence and opportunities for thematic diversification.

This bibliometric study follows the standard multidimensional approach widely accepted in this type of work (annual production, research areas, geographical distribution, collaboration patterns, keyword analysis, and thematic evolution) with the aim of providing a comprehensive overview of the field. Although geographical distribution may be of sociological interest, the central contribution of this article lies in identifying the predominant methods and domains of application and their evolution over time. These aspects are explicitly addressed through the analysis of authors’ keywords (Table 8), co-occurrence networks (Figure 6), thematic maps (Figure 7 and Figure 8), and, especially, thematic evolution diagrams (Figure 3 and Figure 4), which clearly show the progressive predominance of methods based on deep learning, concept drift adaptation techniques, federated learning, and transformer architectures, as well as their main domains of application (IoT systems/sensor networks, energy, finance, health, and transportation).

4. Conclusions

The bibliometric analysis carried out made it possible to characterize the current state and evolution of scientific production on intelligent learning applied to multidimensional data streams during the period 2015–2025. The results show sustained and accelerated growth in the literature, with an annual increase of 39.38%, which shows that this is a rapidly expanding field. The volume of publications, the diversity of sources, and the consolidation of thematic lines reflect the progressive maturity of this area, which combines both methodological developments and practical applications in various domains.

In disciplinary terms, the concentration in computer science and engineering confirms the leading role of these areas in the formulation of algorithms and models applied to the processing of spatial and multidimensional data. However, the growing participation of mathematics, physical sciences, and decision sciences reveals an interdisciplinary openness in which theoretical foundations, mathematical modeling, and applications oriented toward strategic planning and decision making converge. This openness is quantitatively supported by the dispersion value obtained for the distribution of research areas (entropy = 0.7188, Table 9) and by the notable presence of mathematics, physics and astronomy, and decision sciences among the fifteen most frequent subject areas. This diversity suggests fertile ground for research that integrates complementary perspectives, particularly in emerging fields such as mobility, logistics, and complex systems management.

Geographically, the results show an uneven picture: China leads in terms of quantity of production, while countries such as the United States, Canada, and Germany stand out for their impact as measured by citations, reflecting different contribution profiles. Likewise, international cooperation is still limited, although cases such as Canada and Australia show a higher degree of collaboration. This finding points to the need to strengthen international research networks, especially in emerging regions, to promote greater diversity and global visibility.

A review of the most cited articles revealed the cross-cutting nature of intelligent learning applied to multidimensional data streams, with applications in biomedicine, energy, manufacturing, transportation, and human–machine interfaces. This research not only constitutes key references in theoretical terms but also demonstrates a high degree of applicability in real-world scenarios. The variety of topics identified confirms that the field is establishing itself as a catalyst for innovative solutions in different scientific and technological domains.

Analysis of keywords and their thematic evolution shows that research has shifted from general approaches, such as deep learning and machine learning, to more specific and applied areas, such as federated learning, convolutional neural networks, and topics related to human–machine interaction. This transition reflects a process of diversification and maturation in the field, in which established topics, specialized areas, and emerging lines of research coexist. Overall, the results allow us to conclude that intelligent learning on multidimensional data streams is an area of growing academic and practical relevance, with high potential for interdisciplinary impact and multiple open challenges in terms of quality, international collaboration, and consolidation of new areas of research.

In summary, this work offers readers an updated and quantitatively grounded roadmap of the field of intelligent learning in multidimensional streams: it identifies established and emerging thematic clusters, the most influential authors and institutions, geographical and international collaboration gaps, and the fastest-growing methodological trends (federated learning, transformers, edge computing). This information is particularly useful for early-career researchers seeking to position their work, doctoral students who need to define their state of the art, and funding agencies wishing to prioritize lines of research with high potential and a low current level of international collaboration.

Future research in intelligent learning on multidimensional data streams should focus on the design of more efficient and adaptive algorithms capable of processing heterogeneous data in real time. In addition, it is essential to strengthen interdisciplinary and international collaboration to broaden the impact of findings and diversify applications in areas such as mobility, biomedicine, and complex systems management. Finally, progress is needed on ethical and sustainability issues, including privacy protection and the energy efficiency of models.

Author Contributions

Conceptualization, G.R.; methodology, G.R.; validation, L.L. and W.H.; formal analysis, R.T.-B. and J.B.-M.; investigation, G.R.; data curation, G.R.; writing—original draft preparation, G.R.; writing—review and editing, R.T.-B., L.L., W.H. and J.B.-M.; supervision, R.T.-B. and J.B.-M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

During the preparation of this manuscript, the authors used ChatGPT (GPT-5.1, OpenAI) for the purposes of language editing, text organization, and improvement in clarity and readability. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Giannini, F.; Ziffer, G.; Cossu, A.; Lomonaco, V. Streaming Continual Learning for Unified Adaptive Intelligence in Dynamic Environments. IEEE Intell. Syst. 2024, 39, 81–85.
2. Omranpour, S.; Rabusseau, G.; Rabbany, R. Higher Order Transformers: Efficient Attention Mechanism for Tensor Structured Data. arXiv 2024, arXiv:2412.02919.
3. Picón, G.C.; Oleksiienko, I.; Hedegaard, L.; Bakhtiarnia, A.; Iosifidis, A. Continual Low-Rank Scaled Dot-product Attention. arXiv 2024, arXiv:2412.03214.
4. Qiu, R.; Jang, J.G.; Lin, X.; Liu, L.; Tong, H. TUCKET: A Tensor Time Series Data Structure for Efficient and Accurate Factor Analysis over Time Ranges. Proc. VLDB Endow. 2024, 17, 4746–4759.
5. Lanzarini, L.C.; Hasperué, W.; Villa Monte, A.; Jimbo Santana, P.; Reyes Zambrano, G.; Corvi, J.P.; Fernández Bariviera, A.; Olivas Varela, J.Á. Minería de Datos, Minería de Textos y Big Data. In Proceedings of the XXI Workshop de Investigadores En Ciencias de La Computación (WICC 2019, Universidad Nacional de San Juan), San Juan, Argentina, 25–26 April 2019.
6. Shu, H.; Li, J.; Jin, Y.; Wang, H. Guaranteed Multidimensional Time Series Prediction via Deterministic Tensor Completion Theory. arXiv 2025, arXiv:2501.15388.
7. Hou, Y.; Tang, P. Multi-Head Self-Attending Neural Tucker Factorization. arXiv 2025, arXiv:2501.09776.
8. Merediz-Solà, I.; Bariviera, A.F. A Bibliometric Analysis of Bitcoin Scientific Production. Res. Int. Bus. Financ. 2019, 50, 294–305.
9. Haddow, G. Bibliometric Research. In Research Methods, 2nd ed.; Elsevier: Amsterdam, The Netherlands, 2018; Chapter 10; pp. 241–266.
10. Dede, E.; Ozdemir, E. Mapping and Performance Evaluation of Mathematics Education Research in Turkey: A Bibliometric Analysis from 2005 to 2021. J. Pedagog. Res. 2022, 6, 1–24.
11. Singh, N.; Gupta, A.; Kapur, B. A Bibliometric Analysis of IJQRM Journal (2002–2022). Int. J. Qual. Reliab. Manag. 2023, 40, 1647–1666.
12. Yuan, J.; Zheng, Y.; Zhang, C.; Xie, W.; Xie, X.; Sun, G.; Huang, Y. T-Drive: Driving Directions Based on Taxi Trajectories. In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems (GIS ’10), San Jose, CA, USA, 2–5 November 2010; ACM: New York, NY, USA, 2010; pp. 99–108.
13. Chen, Z.; He, Y.; Wu, D.; Zuo, L.; Li, K.; Zhang, W.; Deng, Z. ℓ_{1,2}-Norm and CUR Decomposition Based Sparse Online Active Learning for Data Streams with Streaming Features. In Proceedings of the 2024 IEEE International Conference on Big Data (BigData), Washington, DC, USA, 15–18 December 2024; pp. 384–393.
14. Zambrano, G.R.; Vera, L.O. Reference Architecture for an Intelligent Transportation System. Int. J. Innov. Appl. Stud. 2016, 15, 175–182.
15. Mutambik, I. An Entropy-Based Clustering Algorithm for Real-Time High-Dimensional IoT Data Streams. Sensors 2024, 24, 7412.
16. Chen, F.; Wu, D.; Yang, J.; He, Y. An Online Sparse Streaming Feature Selection Algorithm. arXiv 2022, arXiv:2208.01562.
17. Wang, A.; Yang, H.; Mao, F.; Zhang, Z.; Yu, Y.; Liu, X. Anti-Drifting Feature Selection via Deep Reinforcement Learning (Student Abstract). Proc. AAAI Conf. Artif. Intell. 2023, 37, 16356–16357.
18. Lu, C.; Shi, L.; Chen, Z.; Wu, C.; Wierman, A. Overcoming the Curse of Dimensionality in Reinforcement Learning Through Approximate Factorization. arXiv 2024, arXiv:2411.07591.
19. Yuan, Z.; Sun, Y.; Shasha, D. Forgetful Forests: High Performance Learning Data Structures for Streaming Data under Concept Drift. arXiv 2022, arXiv:2212.07876.
20. Gama, J.; Žliobaitė, I.; Bifet, A.; Pechenizkiy, M.; Bouchachia, A. A Survey on Concept Drift Adaptation. ACM Comput. Surv. 2014, 46, 44.
21. Žliobaitė, I.; Pechenizkiy, M.; Gama, J. An Overview of Concept Drift Applications. In Big Data Analysis: New Algorithms for a New Society; Japkowicz, N., Stefanowski, J., Eds.; Springer International Publishing: Cham, Switzerland, 2016; Volume 16, pp. 91–114.
22. Johnson, W.B.; Lindenstrauss, J. Extensions of Lipschitz Mappings into a Hilbert Space. In Contemporary Mathematics; Beals, R., Beck, A., Bellow, A., Hajian, A., Eds.; American Mathematical Society: Providence, RI, USA, 1984; Volume 26, pp. 189–206.
23. Reyes, G.; Lanzarini, L.; Hasperué, W.; Bariviera, A.F. Proposal for a Pivot-Based Vehicle Trajectory Clustering Method. Transp. Res. Rec. J. Transp. Res. Board 2022, 2676, 281–295.
24. Ditzler, G.; Roveri, M.; Alippi, C.; Polikar, R. Learning in Nonstationary Environments: A Survey. IEEE Comput. Intell. Mag. 2015, 10, 12–25.
25. Gaber, M.M.; Zaslavsky, A.; Krishnaswamy, S. Mining Data Streams: A Review. ACM SIGMOD Rec. 2005, 34, 18–26.
26. Sheth, A.; Henson, C.; Sahoo, S.S. Semantic Sensor Web. IEEE Internet Comput. 2008, 12, 78–83.
27. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980.
28. Atzori, L.; Iera, A.; Morabito, G. The Internet of Things: A Survey. Comput. Netw. 2010, 54, 2787–2805.
29. Gubbi, J.; Buyya, R.; Marusic, S.; Palaniswami, M. Internet of Things (IoT): A Vision, Architectural Elements, and Future Directions. Future Gener. Comput. Syst. 2013, 29, 1645–1660.
30. Bonomi, F.; Milito, R.; Zhu, J.; Addepalli, S. Fog Computing and Its Role in the Internet of Things. In Proceedings of the First Edition of the MCC Workshop on Mobile Cloud Computing, Helsinki, Finland, 17 August 2012; pp. 13–16.
31. Reyes, G.; Lanzarini, L.; Estrebou, C.; Fernandez Bariviera, A. Dynamic Grouping of Vehicle Trajectories. J. Comput. Sci. Technol. 2022, 22, e11.
32. Taylor, S.J.; Letham, B. Forecasting at Scale. Am. Stat. 2018, 72, 37–45.
33. McCloskey, M.; Cohen, N.J. Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem. In Psychology of Learning and Motivation; Elsevier: Amsterdam, The Netherlands, 1989; Volume 24, pp. 109–165.
34. Kirkpatrick, J.; Pascanu, R.; Rabinowitz, N.; Veness, J.; Desjardins, G.; Rusu, A.A.; Milan, K.; Quan, J.; Ramalho, T.; Grabska-Barwinska, A.; et al. Overcoming Catastrophic Forgetting in Neural Networks. Proc. Natl. Acad. Sci. USA 2017, 114, 3521–3526.
35. Reyes, G.; Estrada, V.; Tolozano-Benites, R.; Maquilón, V. Batch Simplification Algorithm for Trajectories over Road Networks. ISPRS Int. J. Geo-Inf. 2023, 12, 399.
36. Kolda, T.G.; Bader, B.W. Tensor Decompositions and Applications. SIAM Rev. 2009, 51, 455–500.
37. Tucker, L.R. Some Mathematical Notes on Three-Mode Factor Analysis. Psychometrika 1966, 31, 279–311.
38. Wright, J.; Yang, A.; Ganesh, A.; Sastry, S.S.; Ma, Y. Robust Face Recognition via Sparse Representation. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 31, 210–227.
39. Jouppi, N.P.; Young, C.; Patil, N.; Patterson, D.; Agrawal, G.; Bajwa, R.; Bates, S.; Bhatia, S.; Boden, N.; Borchers, A.; et al. In-Datacenter Performance Analysis of a Tensor Processing Unit. In Proceedings of the 44th Annual International Symposium on Computer Architecture, Toronto, ON, Canada, 24–28 June 2017; pp. 1–12.
40. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
41. Hughes, G. On the Mean Accuracy of Statistical Pattern Recognizers. IEEE Trans. Inf. Theory 1968, 14, 55–63.
42. Wolpert, D.; Macready, W. No Free Lunch Theorems for Optimization. IEEE Trans. Evol. Comput. 1997, 1, 67–82.
43. Shor, P. Algorithms for Quantum Computation: Discrete Logarithms and Factoring. In Proceedings of the 35th Annual Symposium on Foundations of Computer Science, Santa Fe, NM, USA, 20–22 November 1994; pp. 124–134.
44. Mead, C. Neuromorphic Electronic Systems. Proc. IEEE 1990, 78, 1629–1636.
45. Van Eck, N.J.; Waltman, L. Software Survey: VOSviewer, a Computer Program for Bibliometric Mapping. Scientometrics 2010, 84, 523–538.
46. Cui, F.; Yue, Y.; Zhang, Y.; Zhang, Z.; Zhou, H.S. Advancing Biosensors with Machine Learning. ACS Sens. 2020, 5, 3346–3364.
47. Su, Z.; Wang, Y.; Luan, T.H.; Zhang, N.; Li, F.; Chen, T.; Cao, H. Secure and Efficient Federated Learning for Smart Grid with Edge-Cloud Collaboration. IEEE Trans. Ind. Inform. 2022, 18, 1333–1344.
48. Wei, X.; Li, H.; Yue, W.; Gao, S.; Chen, Z.; Li, Y.; Shen, G. A High-Accuracy, Real-Time, Intelligent Material Perception System with a Machine-Learning-Motivated Pressure-Sensitive Electronic Skin. Matter 2022, 5, 1481–1501.
49. Brenner, B.; Hummel, V. Digital Twin as Enabler for an Innovative Digital Shopfloor Management System in the ESB Logistics Learning Factory at Reutlingen-University. Procedia Manuf. 2017, 9, 198–205.
50. Chen, X.; He, Z.; Chen, Y.; Lu, Y.; Wang, J. Missing Traffic Data Imputation and Pattern Discovery with a Bayesian Augmented Tensor Factorization Model. Transp. Res. Part C Emerg. Technol. 2019, 104, 66–77.
51. Zhu, J.; Jiang, Q.; Shen, Y.; Qian, C.; Xu, F.; Zhu, Q. Application of Recurrent Neural Network to Mechanical Fault Diagnosis: A Review. J. Mech. Sci. Technol. 2022, 36, 527–542.
52. Galan, E.A.; Zhao, H.; Wang, X.; Dai, Q.; Huck, W.T.; Ma, S. Intelligent Microfluidics: The Convergence of Machine Learning and Microfluidics in Materials Science and Biomedicine. Matter 2020, 3, 1893–1922.
53. Boquet, G.; Morell, A.; Serrano, J.; Vicario, J.L. A Variational Autoencoder Solution for Road Traffic Forecasting Systems: Missing Data Imputation, Dimension Reduction, Model Selection and Anomaly Detection. Transp. Res. Part C Emerg. Technol. 2020, 115, 102622.
54. Moussalli, S.; Cardoso, W. Intelligent Personal Assistants: Can They Understand and Be Understood by Accented L2 Learners? Comput. Assist. Lang. Learn. 2020, 33, 865–890.
55. Princy, R.J.P.; Parthasarathy, S.; Hency Jose, P.S.; Raj Lakshminarayanan, A.; Jeganathan, S. Prediction of Cardiac Disease Using Supervised Machine Learning Algorithms. In Proceedings of the 2020 4th International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India, 13–15 May 2020; pp. 570–575.
56. Cobo, M.; López-Herrera, A.; Herrera-Viedma, E.; Herrera, F. An Approach for Detecting, Quantifying, and Visualizing the Evolution of a Research Field: A Practical Application to the Fuzzy Sets Theory Field. J. Inf. 2011, 5, 146–166.
57. Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, 27, 379–423.
58. Lotka, A.J. The Frequency Distribution of Scientific Productivity. J. Wash. Acad. Sci. 1926, 16, 317–323.

Figure 1. Strategic diagram of Keywords Plus generated with bibliometrix.

Figure 2. Strategic diagram of the authors’ keywords generated with bibliometrix.

Figure 3. Thematic evolution of Keywords Plus generated with bibliometrix.

Figure 4. Thematic evolution of the authors’ keywords generated with bibliometrix.

Figure 5. Map of word clouds in titles and abstracts (full count), generated with VOSviewer.

Figure 6. Map of word clouds in titles and abstracts (binary count), generated with VOSviewer.

Figure 7. Cloud map of journals where articles on “Intelligent Learning on Multidimensional Data Streams” are published, generated with VOSviewer.

Figure 8. Cloud map created from authors with journal papers on “Intelligent Learning on Multidimensional Data Streams”, generated with VOSviewer.

Table 1. Main areas of research assigned to the sample papers. Source: Scopus.

Research Areas	Records	% of 1276
Computer Science	387	30.33%
Engineering	255	19.98%
Mathematics	146	11.44%
Physics and Astronomy	77	6.03%
Decision Sciences	70	5.49%
Total of the 5 main research areas	935	73.28%

Table 2. Number of articles published per year. Source: Scopus.

Year	Articles	Annual Growth Rate
2015	6	100.00%
2016	12	100.00%
2017	19	58.33%
2018	22	15.79%
2019	35	59.09%
2020	41	17.14%
2021	37	−9.76%
2022	63	70.27%
2023	84	33.33%
2024	109	29.76%
2025	166	52.29%
Total (compound annual growth rate, 2015–2025)	594	39.38%
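
The growth figures in Table 2 can be recomputed from the yearly counts alone. The following is a minimal sketch, assuming that the annual growth rate is the relative change over the previous year and that the figure in the total row is a compound annual growth rate between the first and last years; the function names are illustrative and are not taken from bibliometrix. The first year has no predecessor in the sample, so no year-over-year rate is computed for it.

```python
# Article counts per year, copied from Table 2.
counts = {
    2015: 6, 2016: 12, 2017: 19, 2018: 22, 2019: 35, 2020: 41,
    2021: 37, 2022: 63, 2023: 84, 2024: 109, 2025: 166,
}


def annual_growth(counts):
    """Year-over-year growth in percent, keyed by the later year."""
    years = sorted(counts)
    return {
        later: (counts[later] / counts[earlier] - 1) * 100
        for earlier, later in zip(years, years[1:])
    }


def compound_annual_growth(counts):
    """Compound annual growth rate in percent between first and last year."""
    years = sorted(counts)
    first, last = years[0], years[-1]
    periods = last - first
    return ((counts[last] / counts[first]) ** (1 / periods) - 1) * 100


growth = annual_growth(counts)
cagr = compound_annual_growth(counts)

for year, rate in growth.items():
    print(f"{year}: {rate:.2f}%")
print(f"Compound annual growth rate: {cagr:.2f}%")
```

Under these assumptions the compound rate reproduces the 39.38% reported in the total row, which suggests that this row summarizes the whole period rather than averaging the yearly rates.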

Table 3. Top ten countries of corresponding authors (SCP: single-country publications; MCP: multiple-country publications). Source: Scopus.

Country	Articles	Frequency	SCP	MCP	MCP Ratio
China	328	55.2%	282	46	14.0%
India	43	7.2%	35	8	18.6%
USA	23	3.9%	21	2	8.7%
Australia	10	1.7%	5	5	50.0%
Germany	10	1.7%	6	4	40.0%
Canada	9	1.5%	2	7	77.8%
Italy	6	1.0%	6	0	0.0%
Ukraine	6	1.0%	6	0	0.0%
Korea	5	0.8%	4	1	20.0%
Spain	5	0.8%	5	0	0.0%
Total 10 countries	445	74.8%	372	73	22.9%
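
The MCP ratio in Table 3 is the share of a country's articles that involve co-authors from more than one country. The sketch below recomputes it from the SCP and MCP columns, under the assumption that the ratio is MCP divided by SCP plus MCP; the helper name `mcp_ratio` is illustrative. It also contrasts two possible summaries for the ten countries: the unweighted mean of the per-country ratios and the pooled ratio over all their articles.

```python
# (SCP, MCP) per country, copied from Table 3.
rows = {
    "China": (282, 46), "India": (35, 8), "USA": (21, 2),
    "Australia": (5, 5), "Germany": (6, 4), "Canada": (2, 7),
    "Italy": (6, 0), "Ukraine": (6, 0), "Korea": (4, 1), "Spain": (5, 0),
}


def mcp_ratio(scp, mcp):
    """Percentage of articles with international co-authorship."""
    total = scp + mcp
    return 100 * mcp / total if total else 0.0


per_country = {name: mcp_ratio(scp, mcp) for name, (scp, mcp) in rows.items()}

# Unweighted mean: every country counts equally, regardless of output.
mean_of_ratios = sum(per_country.values()) / len(per_country)

# Pooled ratio: every article counts equally.
total_scp = sum(scp for scp, _ in rows.values())
total_mcp = sum(mcp for _, mcp in rows.values())
pooled_ratio = mcp_ratio(total_scp, total_mcp)

print(f"Mean of country ratios: {mean_of_ratios:.1f}%")
print(f"Pooled ratio: {pooled_ratio:.1f}%")
```

The 22.9% in the total row matches the unweighted mean. The pooled ratio is lower, about 16.4%, because China contributes most of the articles and has a below-average MCP ratio.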

Table 4. Top ten total citations by country. Source: Scopus.

Country	Total Citations	Average Citations per Article
China	2218	6.80
USA	781	34.00
Canada	499	55.40
Germany	375	37.50
India	291	6.80
Spain	157	31.40
Japan	127	42.30
Australia	114	11.40
Bangladesh	93	46.50
Saudi Arabia	85	28.30
Total (all countries)	5299	14.93

Table 5. The ten most relevant sources. Source: Scopus.

Sources	Articles	Type
Lecture Notes in Computer Science	23	Book Series
Communications in Computer and Information Science	15	Book Series
Smart Innovation, Systems and Technologies	11	Book Series
Advances in Intelligent Systems and Computing	9	Book Series
IEEE Internet of Things Journal	9	Journal
IEEE Access	8	Journal
Proceedings of SPIE - The International Society for Optical Engineering	7	Conference Proceedings
IEEE Transactions on Industrial Informatics	6	Journal
IEEE Transactions on Intelligent Transportation Systems	6	Journal
Journal of Image and Graphics	6	Journal

Table 6. The ten most cited articles, arranged in descending order by number of citations. Source: Scopus.

Author (Year) and Title	Source	Citations
Cui F. (2020) Advancing Biosensors with Machine Learning [46].	ACS Sensors	551
Su Z. (2022) Secure and Efficient Federated Learning for Smart Grid with Edge-Cloud Collaboration [47].	IEEE Transactions on Industrial Informatics	216
Wei X. (2022) A High-Accuracy, Real-Time, Intelligent Material Perception System with a Machine-Learning-Motivated Pressure-Sensitive Electronic Skin [48].	Matter	197
Brenner B. (2017) Digital Twin as Enabler for an Innovative Digital Shopfloor Management System in the ESB Logistics Learning Factory at Reutlingen-University [49].	Procedia Manufacturing	190
Chen X. (2019) Missing Traffic Data Imputation and Pattern Discovery with a Bayesian Augmented Tensor Factorization Model [50].	Transportation Research Part C: Emerging Technologies	145
Zhu J. (2022) Application of Recurrent Neural Network to Mechanical Fault Diagnosis: A Review [51].	Journal of Mechanical Science and Technology	144
Galan E. (2020) Intelligent Microfluidics: The Convergence of Machine Learning and Microfluidics in Materials Science and Biomedicine [52].	Matter	129
Boquet G. (2020) A Variational Autoencoder Solution for Road Traffic Forecasting Systems: Missing Data Imputation, Dimension Reduction, Model Selection and Anomaly Detection [53].	Transportation Research Part C: Emerging Technologies	126
Moussalli S. (2020) Intelligent Personal Assistants: Can They Understand and Be Understood by Accented L2 Learners? [54].	Computer Assisted Language Learning	113
Princy R. (2020) Prediction of Cardiac Disease Using Supervised Machine Learning Algorithms [55].	2020 4th International Conference on Intelligent Computing and Control Systems (ICICCS)	99

Table 7. Most productive authors. Source: Scopus.

Authors	Institution	Articles
Wang Yaoze	Kunming University of Science and Technology, Kunming, China	21
Zhang Yushuang	Beijing Polytechnic University, Beijing, China	20
Wang Xiuwen	Dalian Minzu University, Dalian, China	16
Li Xiuzheng	School of Economics and Management, China	15
Li Yonghui	Anhui Xinhua University, Hefei, China	15

Table 8. Main keywords. Source: Scopus.

Author Keywords	Articles	Keywords Plus	Articles
deep learning	247	deep learning	158
machine learning	193	learning systems	135
learning systems	136	machine learning	82
artificial intelligence	79	machine-learning	70
machine-learning	71	learning algorithms	68
learning algorithms	68	intelligent systems	63
intelligent systems	64	forecasting	50
data mining	62	data mining	47
forecasting	51	data handling	43
big data	45	multidimensional data	43

Table 9. Entropic concentration index (H) of the selected variables. Source: Scopus.

Variable	H
Authors	0.9539
Sources	0.9359
Countries	0.4950
Areas of research	0.7188
Article citations	0.8154
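
To make the reading of Table 9 concrete, the sketch below computes a normalized Shannon entropy over a list of counts. This is an assumption about the form of the index, since its definition is not restated alongside the table: H is taken as the Shannon entropy of the shares divided by the logarithm of the number of categories, so that values near 1 indicate an even spread and values near 0 indicate concentration in a few categories. The function name and the toy inputs are illustrative.

```python
import math


def normalized_entropy(counts):
    """Shannon entropy of the shares, scaled to the range [0, 1].

    Assumed form: H = sum(p_i * ln(1 / p_i)) / ln(N), with N categories.
    """
    n = len(counts)
    total = sum(counts)
    if n < 2 or total == 0:
        return 0.0
    shares = [c / total for c in counts if c > 0]
    entropy = sum(p * math.log(1 / p) for p in shares)
    return entropy / math.log(n)


# Toy inputs (hypothetical): an even spread and a fully concentrated one.
even = normalized_entropy([5, 5, 5, 5])
concentrated = normalized_entropy([20, 0, 0, 0])

# Article counts of the ten countries listed in Table 3. This is only a
# subset of all countries, so it is not expected to reproduce Table 9.
top_ten_countries = normalized_entropy([328, 43, 23, 10, 10, 9, 6, 6, 5, 5])

print(even, concentrated, top_ten_countries)
```

Read this way, the comparatively low value for countries in Table 9 is consistent with the dominance of a single country in Table 3, whereas authors and sources are spread far more evenly.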

Table 10. Observed distribution of the number of authors who wrote a given number of articles and adjusted values of Lotka’s law. Source: Scopus.

Number of Articles	Authors	Observed Frequency	Adjusted Frequency
1	1287	0.8100	0.8110
2	154	0.0970	0.0970
3	60	0.0380	0.0378
4	38	0.0240	0.0239
5	12	0.0080	0.0076
6	5	0.0030	0.0032
7	7	0.0040	0.0044
8	4	0.0030	0.0025
9	3	0.0020	0.0019
10	3	0.0020	0.0019
12	2	0.0010	0.0013
13	4	0.0030	0.0025
14	2	0.0010	0.0013
15	2	0.0010	0.0013
16	2	0.0010	0.0013
20	1	0.0010	0.0006
21	1	0.0010	0.0006
53	1	0.0010	0.0006
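
Lotka's law [58] states that the share of authors with n articles falls off roughly as C / n^a, with the classical formulation using an exponent a close to 2. The sketch below estimates C and a from the author counts in Table 10 by ordinary least squares on the log-log scale. This is one common way to fit the law and is an assumption here, not necessarily the procedure behind the adjusted values in the table; `fit_lotka` is an illustrative name.

```python
import math

# Number of articles -> number of authors, copied from Table 10.
authors_by_output = {
    1: 1287, 2: 154, 3: 60, 4: 38, 5: 12, 6: 5, 7: 7, 8: 4, 9: 3,
    10: 3, 12: 2, 13: 4, 14: 2, 15: 2, 16: 2, 20: 1, 21: 1, 53: 1,
}


def fit_lotka(distribution):
    """Fit share(n) = C / n**a by least squares on log(n) vs. log(share)."""
    total_authors = sum(distribution.values())
    xs = [math.log(n) for n in distribution]
    ys = [math.log(count / total_authors) for count in distribution.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)  # exponent a, constant C


exponent, constant = fit_lotka(authors_by_output)
total_authors = sum(authors_by_output.values())
observed_share_one = authors_by_output[1] / total_authors

print(f"Estimated exponent a: {exponent:.2f}")
print(f"Estimated constant C: {constant:.3f}")
print(f"Observed share of single-article authors: {observed_share_one:.4f}")
```

An unweighted log-log fit gives every row the same influence, including the sparse tail where one or two authors account for a whole row, so the estimate is sensitive to those points; a maximum-likelihood fit would be a reasonable alternative.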

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Reyes, G.; Tolozano-Benites, R.; Lanzarini, L.; Hasperué, W.; Barzola-Monteses, J. Intelligent Learning on Multidimensional Data Streams: A Bibliometric Analysis of Research Evolution and Future Directions. Information 2025, 16, 1067. https://doi.org/10.3390/info16121067
