Bibliometric Mining of Research Trends in Machine Learning

Abstract: We present a method, including tool support, for bibliometric mining of trends in large and dynamic research areas. The method is applied to the machine learning research area for the years 2013 to 2022. A total of 398,782 documents from Scopus were analyzed. A taxonomy containing 26 research directions within machine learning was defined by four experts with the help of a Python program and existing taxonomies. The trends in terms of productivity, growth rate, and citations were analyzed for the research directions in the taxonomy. Our results show that the two directions Applications and Algorithms are the largest, and that the direction Convolutional Neural Networks grows the fastest and has the highest average number of citations per document. It also turns out that there is a clear correlation between the growth rate and the average number of citations per document, i.e., documents in fast-growing research directions have more citations. The trends for machine learning research in four geographic regions (North America, Europe, the BRICS countries, and The Rest of the World) were also analyzed. The number of documents during the time period considered is approximately the same for all regions. BRICS has the highest growth rate, and, on average, North America has the highest number of citations per document. Using our tool and method, we expect that one could perform a similar study in some other large and dynamic research area in a relatively short time.


Introduction
The basic idea of machine learning (ML) is that the behavior of a computer (a machine) should not be (completely) defined by a programmer. Instead, the computer should learn from existing data observations through the use of algorithms and, based on that, handle (e.g., classify) previously unknown data observations. This resembles the way that humans can handle new and unknown situations based on previous experience. The origin of modern ML is often associated with Frank Rosenblatt from Cornell University. In the late 1950s, he and his group created the perceptron classifier for recognizing the letters of the alphabet [1,2]. During the 1980s, the concept of backpropagation was rediscovered, increasing the interest in ML research. In the 1990s, there was a general shift toward more data-driven ML approaches compared with the prior knowledge-based approaches. Researchers created algorithms and methods for analyzing large amounts of data observations by training ML models, which learned patterns from the data [3].
Today, ML is a rapidly evolving research area, and there are many research directions within ML, e.g., different ML algorithms and different applications of ML technology. The rate at which the number of articles and other documents related to ML grows is staggering. It is, therefore, very difficult for researchers and practitioners to follow and analyze the research trends in ML. In the Scopus database, there are over 106,000 documents from 2022 that have "machine learning" in the title, abstract, or list of keywords.
One example of a classification system for research documents is the ACM Computing Classification System [4].

When looking at Table 1, it becomes clear that no study has considered as many documents as we did in the current study. We considered 398,782 documents, whereas the studies shown in Table 1 considered between 86 and 3057 documents, i.e., the number of documents in our study is more than 100 times larger than in any of the other studies.
Table 1 shows that Scopus and WoS (Web of Science) are the two dominating databases; we used Scopus. The table also shows that VOSviewer is a very popular tool for analyzing results from this kind of study. The typical way of analyzing keywords in bibliometric studies is to use tools such as VOSviewer or CiteSpace for plotting graphs showing how the different keywords relate to each other (see Section 2.2 for a discussion on VOSviewer and CiteSpace and why such tools are less suited in our case).
Table 2 summarizes 10 bibliometric studies in ML that are not application area specific. The focus of each study can be seen in the rightmost column. There are two types of focus areas for these studies: a specific type of algorithm (e.g., deep learning, neural networks, and support vector machines) or a specific geographic region (e.g., Africa, India, or China). Some studies combined these two types of foci. One study analyzed the publications in a certain journal during an eight-year period. The number of documents considered in these studies varies between 262 and 13,224, i.e., no study considered as many documents as we did in the current study. The databases, tools/graphs, and parameters analyzed in Table 2 are similar to those in Table 1.

Bibliometric Analysis and Tools
Bibliometrics is often used for research performance assessment of countries, universities, or researchers. Another way to use bibliometrics is science mapping, where bibliometric techniques are used to delimit a research field and detect subfields (research directions) [39,40]. Bibliometric research performance assessment is most popular in Scandinavia, Italy, the Netherlands, and Great Britain [41], and the H-index is one of the most popular performance metrics [42]. New research directions can, in some cases, be identified by analyzing citations [43], e.g., the yearly report on research fronts from the Chinese Academy of Science and Clarivate [44].
In order to identify research directions based on a corpus of documents, one wants to cluster related documents. One usually identifies relations between documents either based on citations or based on words in common [45]. Obviously, citation data only become available a certain time after a document has been published. To handle such delays, different ML techniques have been used to predict future citations [46]. ML techniques have also been used to handle the gradually changing semantic meaning of certain concepts and topics [47]. Clustering based on word relations, which is what we did in this study, has proven very useful in studies involving a large number of documents [48].
The two most popular software tools for bibliometric analysis are CiteSpace 6.2 [49] and VOSviewer 1.6.20 [50,51]. These tools have similar functionality [52], and both use graphs to visualize bibliometric data. Such graphs become very complex and difficult to interpret, even when the number of documents is relatively small (e.g., Figures 9 and 10 in [9], Figure 16 in [8], and Figure 13 in [7]). Such visualization tools are, therefore, not suitable for studies based on hundreds of thousands of documents.

Research Gaps
The research gaps related to each of the two contributions defined in the Introduction are described in the two paragraphs below.
Tables 1 and 2 show that all previous bibliometric ML studies considered a much smaller number of documents than the current study (398,782 documents). Our study, therefore, offers a much more comprehensive overview of the major research directions in ML than previous research. We analyzed productivity and citations for different research directions in ML, as well as the geographic distribution. From Tables 1 and 2, we see that productivity, citations, and geographic distribution are common (and important) parameters. Some studies also analyzed authors and journals. However, when dealing with a very large corpus of documents covering many application areas and algorithms, it becomes less relevant to analyze specific journals and authors. As discussed above, the automatic analysis of keywords (typically performed using VOSviewer or similar tools) often becomes confusing and difficult to interpret. We handle these two problems using an expert-defined blacklist and thesaurus (see Section 3 and Appendix B for details).
There is no bibliometric tool that, in a useful way, (semi-)automatically identifies important research directions and trends in large and dynamic research areas such as ML. Our approach is semi-automatic: using a tool (in the form of a Python program), research area experts identify research directions through data mining of a large corpus of documents. It turned out that only 30-40 h of the experts' time were needed (further discussed in Section 5). Since we considered author-defined keywords when defining the taxonomy, our approach easily adapts to emerging topics and new trends in dynamic research areas such as ML.

Methodology
Our research question is, "What are the trends in machine learning regarding research directions, geographic regions, and citations?". All authors are experts with several years of experience in ML research (see Appendix A for details).
The first step was to create an initial taxonomy of different research directions in ML. The taxonomy was hierarchical, i.e., it corresponded to a tree. This taxonomy was created by the authors in three two-hour workshops. During these workshops, the taxonomy was created in parallel with the thesaurus that connects author-defined keywords with the research directions in the taxonomy. As a starting point, the experts considered existing taxonomies in ML [53][54][55][56][57], as well as ACM's Computing Classification System (the relevant subsection: Theory of computation -> Theory and algorithms for application domains -> Machine learning theory) [58] and Scikit Learn's API division [59]. It should be noted that most taxonomies on ML cover different subfields and, as such, do not cover the complete research area. When defining the taxonomy, the experts also used a tool in the form of a Python program that extracted common keywords in ML from the Scopus database (see Section 3.3 for details about the program). The list of these keywords provided useful information when defining the taxonomy, e.g., if a frequent author-defined keyword could not be allocated to a research direction, the experts discussed how the taxonomy could be modified so that it could. This means that the final taxonomy and the thesaurus were created in parallel during the three workshops. By adapting it to frequent author-defined keywords, our taxonomy reflected important trends that may not have been identified in existing taxonomies.
After the author-defined keywords were connected to the leaves in the taxonomy, the Python program was used to generate graphs and citation statistics for research directions at different levels in the hierarchical taxonomy (see Section 3.2).

The Mining Process
Figure 1 shows our process for bibliometric mining. First, the research area (machine learning) and time period (2013-2022) were sent to the program for bibliometric mining (Step 1 in Figure 1). We retrieved all documents with "machine learning" in the title, list of keywords, or abstract for the 10-year time period (Step 2). This was done using the 10 search strings: "TITLE-ABS-KEY ({machine learning}) AND (PUBYEAR = 2013)", …, "TITLE-ABS-KEY ({machine learning}) AND (PUBYEAR = 2022)".
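As a small illustration, the year-specific query strings can be generated programmatically. The sketch below (the function name build_queries is ours, not taken from the actual program) reproduces the Scopus advanced-search template quoted above:

```python
def build_queries(area: str, first_year: int, last_year: int) -> list[str]:
    """One Scopus advanced-search query per publication year."""
    return [
        f"TITLE-ABS-KEY ({{{area}}}) AND (PUBYEAR = {year})"
        for year in range(first_year, last_year + 1)
    ]

queries = build_queries("machine learning", 2013, 2022)  # 10 strings
```

Issuing one query per year keeps each result set within Scopus download limits and makes the yearly document counts directly available for the trend analysis.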
For each document retrieved using the search strings above, we obtained a record from Scopus (Step 3). Each record contained information that made it possible to determine which keywords the document could be connected to. This information included the title, abstract, and author-defined list of keywords of the document. The record also contained the number of citations of the document and the affiliations (including country) of the authors. In our case, there were 398,782 such records, with one record for each document retrieved from Scopus. There were 383,559 unique author-defined keywords. Some author-defined keywords were very general, e.g., 'data', 'research', and 'new' (all author-defined keywords were changed to lower case). Clearly, general keywords are not useful when defining research directions, e.g., a research direction called 'new' would be very difficult to relate to and, thus, would not be very useful. Once the experts looked at the list of the most common keywords from the documents retrieved from Scopus (Step 4), the keywords that were too general were put on a blacklist. After looking at the list of common author-defined keywords, the experts also created an initial taxonomy with research directions. They then clustered the common keywords into research directions, which were the leaves in the taxonomy. The clustering of the keywords into research directions became our thesaurus. The blacklist and the thesaurus were then sent to the program (Step 5). Some clustering of author-defined keywords into research directions was trivial, e.g., 'neural network' and 'neural networks' were put into the same cluster, and 'health care' and 'healthcare' were put into the same cluster. Moreover, 'internet of things' and 'iot' were put in the same cluster, and 'artificial intelligence' and 'ai' were put into the same cluster. However, some clustering required the experts' knowledge (e.g., putting 'malware' and 'security' in the same cluster and putting 'nlp' and 'sentiment analysis' in the same cluster) (see Appendix B for details).
Steps 4 and 5 were repeated three times in our case, one time for each workshop. As discussed above, the taxonomy and the thesaurus were created in parallel during these workshops. Once the experts were happy with the taxonomy and thesaurus, the taxonomy was finalized (Step 6). The results presented in Section 4 were generated using the program based on the documents from Scopus, the blacklist, and the thesaurus (Step 7).
As mentioned above, 383,559 unique author-defined keywords were collected from the 398,782 documents included in this study. A document belonged to a research direction (i.e., a leaf in the taxonomy) if at least one of the keywords that the experts allocated to that research direction was present in either the title, the abstract, or the list of author-defined keywords of the document. N.b., a document could belong to many research directions, and, as can be seen in Figure 2, a small number of documents did not belong to any research direction.
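The membership rule just described can be sketched as follows. This is a simplified version; the actual program may tokenize and match keywords differently (e.g., the naive substring test below would also let 'cnn' match 'cnns'):

```python
def directions_of(doc: dict, thesaurus: dict[str, str]) -> set[str]:
    """Directions whose keywords occur in the title, abstract, or
    author-defined keyword list; a document may match several or none."""
    text = (doc["title"] + " " + doc["abstract"]).lower()
    author_kws = {kw.lower() for kw in doc["keywords"]}
    matched: set[str] = set()
    for kw, direction in thesaurus.items():
        if kw in text or kw in author_kws:
            matched.add(direction)
    return matched

# Hypothetical document record and thesaurus excerpt.
doc = {"title": "Malware detection with CNNs",
       "abstract": "We study malware classifiers.",
       "keywords": ["deep learning"]}
```

For the hypothetical document above, directions_of returns both 'Security' (via 'malware') and 'CNN' (via 'cnn'), illustrating that one document can contribute to several research directions.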
In Figure 2, the keywords in the expert-defined blacklist are removed, and the remaining keywords are ordered according to the number of documents that have the keyword in either the title, abstract, or author-defined list of keywords, with the most frequent keywords first. The blue line in Figure 2 indicates the number of documents that have been classified as belonging to at least one research direction as a function of the number of keywords considered. The figure shows that for 200 (out of 383,559) keywords, more than 94% of the documents have already been classified as belonging to at least one research direction. The expert-defined thesaurus used in this study contained 202 keywords (see Appendix B).
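The curve in Figure 2 can be reproduced conceptually: order the non-blacklisted keywords by document frequency and track the cumulative number of classified documents as keywords are added one by one. A minimal sketch with toy data (the helper name coverage_curve is ours):

```python
from collections import Counter

def coverage_curve(doc_keywords: list[list[str]], blacklist: set[str]) -> list[int]:
    """After blacklisting, add keywords in order of document frequency and
    record the cumulative number of documents classified so far."""
    freq = Counter(kw for kws in doc_keywords for kw in kws
                   if kw not in blacklist)
    covered: set[int] = set()
    curve: list[int] = []
    for kw, _ in freq.most_common():
        covered |= {i for i, kws in enumerate(doc_keywords) if kw in kws}
        curve.append(len(covered))
    return curve

docs = [["a", "b"], ["a"], ["b", "c"], ["d"]]  # toy keyword lists per document
curve = coverage_curve(docs, {"d"})            # 'd' is blacklisted
```

The curve flattens quickly because frequent keywords cover most documents, which is why only a few hundred of the 383,559 unique keywords were needed in the thesaurus.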

Data Related to Research Directions
The $p_i$ keywords that correspond to research direction $i$, i.e., to leaf $i$ in the taxonomy, were obtained from the expert-defined thesaurus (as mentioned above, $\sum_i p_i = 202$ in our case). For each of the $p_i$ keywords, we created a set $B_{k_i}$ consisting of the documents that contain keyword $k_i$ ($1 \le k_i \le p_i$) in the title, abstract, or list of keywords. A set $A_i$ is then created as the union of these sets:

$$A_i = \bigcup_{k_i = 1}^{p_i} B_{k_i} \qquad (1)$$

The number of documents belonging to research direction $i$ is given by the cardinality of $A_i$ in (1).
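In code, this amounts to a plain set union over the per-keyword document sets; a minimal sketch with integer document IDs:

```python
def direction_documents(keyword_sets: list[set[int]]) -> set[int]:
    """A_i: the union of the per-keyword document sets B_k of direction i."""
    docs: set[int] = set()
    for b_k in keyword_sets:
        docs |= b_k
    return docs

B = [{1, 2, 3}, {3, 4}, {5}]  # toy B_k sets; document IDs are integers
A_i = direction_documents(B)
# |A_i| = 5: overlapping documents (here document 3) are counted once
```

Using a set union rather than summing the sizes of the $B_{k_i}$ ensures that a document matching several keywords of the same direction is counted only once.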
As mentioned previously, and as will be shown in Section 4, the expert-defined taxonomy is hierarchical. This means that there are internal nodes, i.e., nodes that are not leaves. We refer to such nodes as high-level research directions. The set of documents belonging to a high-level research direction is the union of the document sets of all the nodes below the internal node in the taxonomy tree.
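This union over a subtree can be sketched recursively; the taxonomy excerpt below is illustrative, not the full tree of Figure 3:

```python
def high_level_documents(tree: dict[str, list[str]],
                         docs_per_leaf: dict[str, set[int]],
                         node: str) -> set[int]:
    """Union of the document sets of every node below `node`
    (leaves keep their own sets)."""
    children = tree.get(node, [])
    if not children:  # leaf node
        return docs_per_leaf.get(node, set())
    out: set[int] = set()
    for child in children:
        out |= high_level_documents(tree, docs_per_leaf, child)
    return out

# Illustrative taxonomy excerpt with toy document IDs.
tree = {"Algorithms": ["Neural Networks", "Other"],
        "Neural Networks": ["CNN", "RNN"]}
leaves = {"CNN": {1, 2}, "RNN": {2, 3}, "Other": {4}}
```

Because the aggregation is again a set union, a document belonging to two leaves under the same internal node (document 2 above) is counted only once for the high-level direction.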
The total number of documents for each research direction (including high-level research directions) during the time period of 2013 to 2022 was calculated. The growth factor for each research direction and each year was also calculated. The growth factor for research direction i for year j is defined as the number of documents for direction i for year j divided by the number of documents for direction i for the year 2013 (the first year in the time interval considered).
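A sketch of the growth-factor computation (the yearly counts are made-up numbers for illustration):

```python
def growth_factor(docs_per_year: dict[int, int], year: int,
                  base_year: int = 2013) -> float:
    """Documents in `year` divided by documents in the base year (2013)."""
    return docs_per_year[year] / docs_per_year[base_year]

counts = {2013: 200, 2018: 600, 2022: 1000}  # made-up yearly counts
```

By construction, the growth factor for 2013 is 1 for every direction, so the curves in the growth plots all start at the same point.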
A document that was published 10 years ago will normally have more citations than a document that was published recently. In order to compare citation counts between documents published in different years, a year-normalized citation score, NCS (Normalized Citation Score), was defined for each document. The NCS is obtained by dividing the number of citations of the document by the average number of citations of all documents in our dataset from the same year. As a consequence of this definition, the average NCS is 1 for the documents in our dataset.
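The NCS computation can be sketched as follows (the yearly averages shown are invented for illustration):

```python
def ncs(citations: int, year: int, yearly_mean: dict[int, float]) -> float:
    """Citations of a document divided by the mean citation count of all
    documents published the same year; the dataset-wide mean NCS is 1."""
    return citations / yearly_mean[year]

yearly_mean = {2015: 20.0, 2021: 4.0}  # invented per-year averages
```

For example, a 2015 document with 40 citations and a 2021 document with 8 citations both score NCS = 2 against these averages, making documents from different years directly comparable.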

Data Related to Geographic Information
We considered four geographic regions: Europe, North America (USA and Canada), the BRICS countries (Brazil, Russia, India, China, and South Africa), and The Rest of the World. A document with authors from more than one geographic region was counted proportionally in the corresponding regions, e.g., a document with three authors (one from China, one from Sweden, and one from Argentina) was allocated 1/3 to the region BRICS, 1/3 to the region Europe, and 1/3 to the region The Rest of the World. A small fraction (less than 3%) of the documents did not have information about affiliation country. These documents were excluded from this part of the study.
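The proportional (fractional) counting can be sketched as below; the country-to-region table is a small illustrative subset:

```python
from collections import defaultdict

# Illustrative subset of the country-to-region table.
REGION = {"China": "BRICS", "Sweden": "Europe",
          "Argentina": "Rest of the World", "USA": "North America"}

def region_shares(author_countries: list[str]) -> dict[str, float]:
    """Allocate 1/n of a document to the region of each of its n authors."""
    shares: dict[str, float] = defaultdict(float)
    for country in author_countries:
        shares[REGION[country]] += 1 / len(author_countries)
    return dict(shares)

shares = region_shares(["China", "Sweden", "Argentina"])  # 1/3 each
```

Fractional counting keeps the regional totals consistent: every document with affiliation data contributes exactly one document in total, however its authors are distributed.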

Data Related to Authors and Document Sources
We identified the 10 most productive authors in our dataset during the time period of 2013 to 2022. For the five most productive authors, we plotted the number of documents per year. We also identified the 10 document sources that contributed the most to our dataset during the time period. For the five most important sources, we plotted the number of documents per year.

The Program for Bibliometric Mining
The Python program for bibliometric mining used the pybliometrics interface to Scopus [60]. The program went through all the documents two times. First, all author-defined keywords were collected and put on a list. After that, the program went through all documents again and, for each keyword, counted the number of documents with the keyword in the title, author-defined list of keywords, or abstract. As discussed above, each leaf in the taxonomy (i.e., each research direction) corresponded to a list of author-defined keywords (defined by the thesaurus). For each leaf in the taxonomy, a set was created. This set contained all documents that had one of the author-defined keywords associated with the taxonomy leaf in the document title, abstract, or author-defined keyword list. Finally, the normalized citation score (NCS), the number of documents, and the growth factor for each research direction and geographic region were calculated. The code is available at https://github.com/Lars-Lundberg-bth/bibliometric-ml (accessed on 10 January 2024).
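As a rough sketch of the two-pass counting logic described above (using plain dictionaries rather than the pybliometrics API; field names such as 'title' and 'keywords' are our simplification of the Scopus record):

```python
from collections import Counter

def count_keyword_documents(docs: list[dict]) -> Counter:
    """For each author-defined keyword (pass 1), count the documents that
    contain it in the title, abstract, or keyword list (pass 2)."""
    # Pass 1: collect every author-defined keyword, lower-cased.
    all_kws = {kw.lower() for d in docs for kw in d.get("keywords", [])}
    counts: Counter = Counter()
    # Pass 2: count documents per keyword.
    for kw in all_kws:
        for d in docs:
            text = (d["title"] + " " + d["abstract"]).lower()
            doc_kws = {k.lower() for k in d.get("keywords", [])}
            if kw in text or kw in doc_kws:
                counts[kw] += 1
    return counts
```

The first pass fixes the keyword universe; the second pass produces the frequency list that the experts inspected when building the blacklist and thesaurus.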

Results
The number of documents retrieved from Scopus was 398,782; 316,462 of these documents had author-defined keywords. The total number of author-defined keywords was 1,642,925, and 383,559 of these keywords were unique.
Section 4.1 presents the taxonomy defined by the experts, Section 4.2 presents the trends and citation counts for the research directions defined in the taxonomy, and Section 4.3 presents the trends and citation counts for the four geographic regions considered.


Trends in ML Research
Figure 4 shows that the total number of documents in ML is increasing rapidly and that the rate of increase has been constant during the last years, i.e., the curve is more or less linear.

Figure 5 shows that Algorithms and Applications are the two largest research directions in ML. Figure 6 shows that all five top-level research directions in ML are growing. However, Algorithms and System and hardware are the directions that have grown fastest during the 10-year period from 2013 to 2022.
Figure 6 also shows that the research direction Learning paradigms has a local maximum in 2019. By looking at the keywords included in this research direction, it was clear that this local maximum was related to the keyword 'reinforcement learning' (see Figure 7). We believe that the peak in interest in reinforcement learning during 2019 was related to the remarkable success of reinforcement learning in games such as Go and Chess during that period, e.g., the Go world champion Lee Sedol lost to AlphaGo in March 2016. For instance, MuZero, a computer program developed by the AI research company DeepMind to master games without knowing their rules, was released in 2019.
Figure 8 shows the average NCS for the five top-level research directions in ML from 2013 to 2022. The figure shows that documents in the research direction Learning paradigms are the most cited (NCS = 1.16) and that documents in the research direction System and hardware have the lowest citation score (NCS = 0.99). One can see that the average of the NCS values shown in Figure 8 is more than 1. By definition, the average NCS for all documents is 1. However, as mentioned previously, some documents may belong to many research directions, and some other documents may not belong to any research direction. Since the average of the NCS values in Figure 8 is larger than 1, it seems that documents with many citations tend to belong to many research directions.

Trends for the Direction Algorithms in ML Research
As can be seen in the taxonomy in Figure 3, the research direction Algorithms was divided into two categories: Neural Networks and Other Algorithms Than Neural Networks. Figure 9 shows that these two categories contain almost the same number of documents (Neural Networks is slightly larger). Figure 10 shows that the category Neural Networks is growing significantly faster than the other category.


Trends for Research on Neural Networks
The taxonomy in Figure 3 shows that the research direction Neural Networks can be split into three categories: CNN (Convolutional Neural Networks), RNN (Recurrent Neural Networks), and Other Neural Network Algorithms. Figure 12 shows that the category Other Neural Network Algorithms contains the largest number of documents for the time period considered. Figure 13 shows that the research direction CNN is growing very rapidly.

Figure 14 shows that, on average, documents belonging to the research direction CNN have more citations compared with the other two categories.

Trends for Research on Algorithms Other than Neural Networks
The taxonomy in Figure 3 shows that the research direction Other Algorithms Than Neural Networks can be split into eight categories: Ensemble methods, Linear methods, Nearest Neighbor methods, Kernel-based methods, Tree-based methods, Clustering, Statistical methods, and Other algorithms. Figure 15 shows that the categories Ensemble methods and Kernel-based methods contain the largest number of documents for the time period considered. Figure 16 shows that the categories Ensemble methods and Linear methods are the categories within the research direction Other Algorithms Than Neural Networks that grow the fastest.

Trends for the Direction Applications in ML Research
The taxonomy shows that the research direction Applications can be split into eight categories: Data mining, Healthcare, Social media, Security, Image Processing and Computer Vision (IP and CV), Natural Language Processing (NLP), Internet of Things (IoT), and Other applications. Figure 18 shows that the categories Healthcare and Data mining contain the largest number of documents for the time period considered. Figure 19 shows that the category IoT grows very fast. To better visualize the growth factors for the other categories, the growth factors for these categories, excluding IoT, have been plotted in Figure 20. Figure 20 shows that the growth factors of the categories Other applications and Healthcare are the highest for these seven categories. Figure 21 shows that, on average, documents belonging to the category Healthcare have more citations compared with the other categories in the research direction Applications.

Geographic Regions in ML Research
Table 3 shows the 20 most productive countries in ML research for the period of 2013 to 2022. We considered four geographic regions: Europe, North America, BRICS (Brazil, Russia, India, China, and South Africa), and The Rest of the World. Figure 22 shows that approximately the same number of documents were produced in these regions from 2013 to 2022 (the BRICS countries have a slightly higher production than the other regions). Figure 23 shows that the BRICS region has the highest growth factor and that the growth factors for Europe and North America are similar.
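The growth factor compared here can be sketched as a simple ratio. We assume, consistent with the Discussion (where the growth factor is said to be relative to the productivity in 2013), that a year's growth factor is that year's document count divided by the count in the base year 2013; the counts below are invented illustrations.

```python
def growth_factors(docs_per_year, base_year=2013):
    """docs_per_year: mapping year -> number of documents.
    Returns a mapping year -> growth factor relative to base_year."""
    base = docs_per_year[base_year]
    return {year: count / base for year, count in sorted(docs_per_year.items())}
```

Under this definition, a region that already produced many documents in 2013 will show a lower growth factor than a late starter with the same absolute increase.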

Figure 24 shows that, on average, documents from North America have more citations compared with documents from the other regions. Documents from the BRICS countries have the lowest number of citations on average. In fact, there are, on average, 54% more citations to documents from North America compared with BRICS (1.25/0.81 = 1.54).

Important Authors and Document Sources for ML Research
Figure 25 shows the total number of documents during the time period of 2013 to 2022 for the 10 most productive authors in our dataset. The figure shows that Amir Mosavi from Obuda University in Hungary is the most productive author, with 180 documents. The total number of documents for all 10 top authors is 1,352 (see datafile on https://github.com/Lars-Lundberg-bth/bibliometric-ml (accessed on 10 January 2024)). This is less than 0.4% of the total number of publications (398,782). This means that ML research is a large area that is not dominated by a small set of authors. Figure 26 shows the number of documents per year for the five most productive authors in our dataset. The figure shows that Amir Mosavi produced 76 documents in 2020, which is the highest number of documents by the same author during the same year.
Figure 27 shows the total number of documents during the time period 2013 to 2022 for the 10 largest sources of ML documents, i.e., the 10 largest sources of documents in our dataset. The figure shows that Lecture Notes in Computer Science (including the subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) is the source with the largest number of documents in our dataset (12,603 documents). The figure also shows that IEEE Access is the journal with the largest number of documents in our dataset. Figure 28 shows the number of documents per year for the five largest sources of documents in our dataset. The figure shows that none of these five sources increased significantly during the last two years of the period considered (in fact, most of them decreased). Since the total number of documents in ML also increased significantly during the last two years of the time period (see Figure 4), it is clear that the spread of sources of ML documents increased. This seems reasonable since ML has worked its way into new areas, resulting in a wider range of publication sources.

Discussion
When looking at the graphs in Section 4, one can make some general observations. One such observation is that there seems to be a correlation between a high NCS score and a large growth factor. In total, 26 research directions and subdirections were analyzed in Section 4. For each direction and subdirection, there is a pair (growth factor 2022, NCS). If one looks at these 26 pairs and calculates the Spearman correlation [61] between the two values in the pair, one gets a value of 0.67, i.e., there is a clear positive correlation between the growth factor and the NCS value. This means that, on average, documents in fast-growing directions are cited more than documents in directions with slower growth.
Since we only considered four geographic regions, it did not make sense to do any statistical analysis between the NCS value and the growth factor of the regions. However, the relationship between citations and growth factor seems to be the opposite compared with the research directions, i.e., the region that grows the fastest (BRICS) has the lowest NCS, and the regions that grow the slowest (North America and Europe) have the highest NCS values. One reason for this finding could be that North America and Europe are pioneers and leaders in ML research. One implication of being a leader in the field is that one gets more citations. This would explain the high NCS values for North America and Europe. An implication of being a pioneer is that one starts with research in ML early, resulting in a relatively high number of documents already in 2013. Since the growth factor is relative to the productivity in 2013, this could explain why the growth factors in North America and Europe are lower than the growth factors in BRICS and The Rest of the World.
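The Spearman correlation used for the 26 (growth factor, NCS) pairs is the Pearson correlation of the rank vectors; a self-contained version, with average ranks for ties, can be sketched as follows. The input lists are placeholders for the pairs from Section 4, not the study's actual data.

```python
def ranks(values):
    """Assign average 1-based ranks, giving tied values the mean of
    the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

Because it works on ranks rather than raw values, the coefficient is robust to the very different scales of growth factors and NCS values.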
Our results show that documents from North America are the most cited ones. The same observation was made in a similar study related to Big Data [12]. In that study, the country/region with the lowest NCS was China, and in this study, the region/group with the lowest NCS is BRICS. This means that the citation patterns seem to be similar in ML and Big Data, which is not very surprising since there is some overlap between these two research areas.
Our method provides tool support, making it possible to identify research directions and trends in areas with a large number of documents (almost 400,000 documents in our case). Since we used human experts, the identified research directions became useful and easy to relate to. Completely automatic approaches tend to identify research directions (in the form of common keywords) that are too general (e.g., 'risk' [6], 'costs' [7], 'new' [8], and 'mouse' [9]) or confusing (e.g., 'micro grid' different from 'microgrid' different from 'microgrids' [7]; 'svm' different from 'support vector machine' [9]; and 'neural network' different from 'neural networks' [10]). In our case, the problem with keywords that are too general was handled using the blacklist, and the problem with confusing keywords was handled using the thesaurus.
The time required by the experts was rather limited. Defining the taxonomy and thesaurus and grouping the keywords into research directions or a blacklist (see Appendix B) was performed during three workshops. Each such workshop took approximately two hours, and all four experts participated in these workshops. Taking into account some preparation work before the workshops, we estimate that only 30-40 h of the experts' time were needed to produce the taxonomy, thesaurus, and blacklist.
The Python program was not only used to automatically generate the results after the experts had grouped the keywords into research directions and the blacklist; it was also used when defining the taxonomy. The program produced a list of the most common author-defined keywords in the corpus of documents considered. Having a list of common author-defined keywords was very helpful when defining research directions in the taxonomy, since this list provided a bottom-up approach in the sense that the experts needed to consider how the list of author-defined keywords could be grouped into research directions in a good way. This kind of bottom-up grouping was a very useful complement to the different existing taxonomies that the experts looked at when defining the taxonomy used in this study. We believe that a combination of bottom-up grouping of author-defined keywords and consideration of existing taxonomies was the best way to create our taxonomy. Since we considered frequent author-defined keywords when defining our taxonomy, our approach easily adapted to hot topics and new trends in dynamic research areas such as ML.
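The bottom-up step described above, producing a list of the most common author-defined keywords for the experts to group, amounts to a frequency count over the corpus. A minimal sketch is shown below; the corpus format (one keyword list per document) is an assumption for illustration.

```python
from collections import Counter

def top_keywords(documents, n=10):
    """documents: iterable of author-keyword lists.
    Returns the n most common keywords (lowercased) with their counts."""
    counts = Counter(kw.lower() for doc in documents for kw in doc)
    return counts.most_common(n)
```

Lowercasing keeps trivially different spellings ("CNN" vs. "cnn") from splitting a keyword's count; merging true synonyms is the job of the thesaurus.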
As mentioned in the Introduction, the method used in this study is similar to the one used in a previous bibliometric study on trends in Big Data research [12]. However, one important improvement in the current study is the method and tool support used for defining the taxonomy. In the study on trends in Big Data research, no taxonomy was defined. If a study similar to the one presented here is conducted five years from now using the same tool and methodology, the tool will identify new frequent author-defined keywords, and the experts will adapt the taxonomy accordingly. For instance, if the recent trends in multi-modal machine learning [62,63] turn out to be important five years from now, this will be reflected in the author-defined keywords found by the tool and, as a consequence, be reflected in future versions of the expert-defined taxonomy. In this way, the tool and methodology will identify emerging topics. The trends in the areas defined in the existing taxonomy can be identified using the graphs presented in Section 4. For instance, Figure 13 shows that there is a rapid increase in research related to CNN, and Figure 19 shows that there is a clear trend toward ML research related to IoT. Identification of such trends is important for policy- and decision-makers in industry and academia when they decide the direction for future research and innovation programs.
Since we used a program in the mining process, it became easy to calculate metrics that would have been difficult to obtain without this kind of support. Examples of such metrics are the growth factor and the average NCS per research direction and geographic region.
All keywords that we used were defined as keywords by authors in the research area, i.e., we did not perform any general text analysis when identifying keywords. Therefore, no weighting, such as TF-IDF (Term Frequency-Inverse Document Frequency [64]), was needed in order to find relevant keywords. Instead, we benefited from the authors' intellectual work when they defined their keywords. When searching for potential keywords in general text, one needs to use techniques such as Inverse Document Frequency to filter out common words that do not carry any interesting information. We used the blacklist for this purpose.
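The classification pipeline described in this Discussion (normalize author keywords via the thesaurus, drop overly general terms via the blacklist, then map the remainder to research directions) can be sketched as follows. All dictionaries below are small invented examples, not the study's actual thesaurus, blacklist, or taxonomy.

```python
# Illustrative placeholders, not the study's real data:
THESAURUS = {"svm": "support vector machine",
             "neural networks": "neural network"}
BLACKLIST = {"risk", "new", "costs"}
DIRECTION_OF = {"support vector machine": "Kernel-based methods",
                "neural network": "Neural Networks"}

def classify(author_keywords):
    """Return the set of research directions a document belongs to."""
    directions = set()
    for kw in author_keywords:
        kw = THESAURUS.get(kw.lower(), kw.lower())  # merge spelling variants
        if kw in BLACKLIST:
            continue  # too general to carry information
        if kw in DIRECTION_OF:
            directions.add(DIRECTION_OF[kw])
    return directions
```

Note that a document can land in several directions or in none, which is why the per-direction average NCS values need not average out to exactly 1.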

Conclusions and Future Work
By using research area experts and a computer program, we defined a taxonomy for research in ML and evaluated trends in research productivity and citations in the different research directions defined by that taxonomy. The first conclusion is that ML is a very active research area that grew rapidly during the period of 2013 to 2022. Our results also show that the two largest research directions in ML are Algorithms and Applications. The fastest-growing subdirection in Algorithms is CNN, and the fastest-growing application area in the research direction Applications is IoT.
Most directions and subdirections in ML grew monotonically during the period of 2013 to 2022. However, the number of documents that contain "reinforcement learning" in the title, abstract, or author-defined list of keywords actually peaked in 2019. The reason for this is probably related to the remarkable success of the use of reinforcement learning in games such as Go and Chess during that period.
When looking at citations, we see that there is a positive correlation between the number of citations to documents in a research direction and the rate at which the research direction is growing; the Spearman correlation coefficient was 0.67. For instance, the fast-growing research direction CNN has a Normalized Citation Score of 1.39, whereas the application area NLP only has an NCS value of 0.81; NLP grows relatively slowly compared with other directions and subdirections in ML.
ML is an active research area in all parts of the world. We considered four regions/groups of countries: North America (USA and Canada), Europe, BRICS (Brazil, Russia, India, China, and South Africa), and The Rest of the World. It turns out that the growth rate is highest in the BRICS countries and lowest in Europe and North America. However, the citation rate of papers from North America is the highest, and the citation rate for papers from the BRICS countries is the lowest. In fact, there are, on average, 54% more citations of documents from North America compared with those of documents from BRICS.
When considering important authors and document sources related to ML research, it is clear that ML research is a large area that is not dominated by a small set of authors.
Moreover, the spread of sources of ML documents seems to have increased during the last few years, which is reasonable since ML has worked its way into new areas, resulting in a wider range of publication sources.
In this study, we considered the Scopus database. However, the methodology could also be applied to other databases. Some parts of the code for the mining program could be reused. Obviously, the interface to the document database would have to be replaced if our approach were to be used for databases other than Scopus.
The tool (program) and method presented here are not specific to ML. We expect that it would be possible to use the same method to identify research directions and trends in other large and dynamic research areas with a relatively limited effort. Due to the support of the program and the design of the mining process, we only needed 30-40 h of the experts' time in this study.
As discussed in Section 2.2, keyword graphs (i.e., graphs with keywords as nodes) produced using visualization tools such as VOSviewer and CiteSpace become hard to read and interpret when applied to large research areas such as ML. However, these kinds of graphs could probably also be useful for visualizing the connections between different expert-defined research directions, i.e., putting the expert-defined research directions as the nodes and letting the edges indicate the overlap between two directions in terms of the number of shared documents. That said, more research is needed regarding how these kinds of graphs can visualize interesting relationships in large and complex areas with hierarchical taxonomies. This is something that will be investigated in future studies.

Figure 1.
Figure 1. Overview of the mining process.

Figure 2.
Figure 2. The number of classified documents (i.e., documents present in at least one research direction defined by the thesaurus) as a function of the number of keywords in the thesaurus.

Figure 3
Figure 3 shows the expert-defined taxonomy used in this study. As can be seen in the figure, there are 22 leaf nodes, four internal nodes (Algorithms, Neural Network, Algorithms other than NN, and Applications), and one top node (Machine learning). Section 4.2 covers the different levels in the taxonomy.

Figure 3.
Figure 3. The expert-defined hierarchical taxonomy with the research directions considered.

Figure 4.
Figure 4. Total number of documents per year in ML for the period of 2013 to 2022.

Figure 5.
Figure 5. The relative proportion of the total number of documents for the five top-level research directions in ML for the period of 2013 to 2022.

Figure 6.
Figure 6. Growth factor per year for the five top-level research directions in ML.

Figure 7.
Figure 7. The number of documents containing the keyword "reinforcement learning" in the title, abstract, or author-defined list of keywords.

Figure 8.
Figure 8. The average Normalized Citation Score (NCS) for the five top-level research directions in ML.

Figure 9.
Figure 9. The relative proportion of the total number of documents for the two research directions Neural Networks and Other Algorithms Than Neural Networks for the period 2013 to 2022.

Figure 10.
Figure 10. Growth factor per year for the two research directions Neural Networks and Other Algorithms Than Neural Networks.

Figure 11
Figure 11 shows the NCS for the two categories in the research direction Algorithms. The figure shows that, on average, documents in the category Neural Networks have more citations than documents in the other category.

Figure 11.
Figure 11. The average Normalized Citation Score (NCS) for the two research directions Neural Networks and Other Algorithms Than Neural Networks.

Figure 12.
Figure 12. The relative proportion of the total number of documents for the three research directions CNN, RNN, and Other Neural Network Algorithms for the period 2013 to 2022.

Figure 13.
Figure 13. Growth factor per year for the three research directions CNN, RNN, and Other Neural Network Algorithms.

Figure 14.
Figure 14. The average Normalized Citation Score (NCS) for the three research directions CNN, RNN, and Other Neural Network Algorithms.

Figure 15.
Figure 15. The relative proportion of the total number of documents for the research directions related to Algorithms Other Than Neural Networks for the period of 2013 to 2022.

Figure 16.
Figure 16. Growth factor per year for the research directions related to Algorithms Other Than Neural Networks.

Figure 17
Figure 17 shows that, on average, documents belonging to the category Ensemble methods have more citations than the other categories in the research direction Other Algorithms Than Neural Networks. The figure also shows that documents belonging to the category Clustering have the smallest NCS value.

Figure 17.
Figure 17. The average Normalized Citation Score (NCS) for the research directions related to Algorithms Other Than Neural Networks.

Figure 18.
Figure 18. The relative proportion of the total number of documents for the research directions related to Applications for the period 2013 to 2022.

Figure 19 .
Figure 19.Growth factor per year for the research directions related to Applications.

Figure 20. Growth factor per year for the research directions related to Applications, except for IoT.

Figure 21. The average Normalized Citation Score (NCS) for the research directions related to Applications.

Figure 22. The relative proportion of the total number of documents for the four geographic regions considered.

Figure 23. Growth factor per year for the four geographic regions considered.

Figure 24. The average Normalized Citation Score (NCS) for the four geographic regions considered.

Figure 25. The 10 most productive authors in ML during the period of 2013 to 2022.

Figure 26. The number of documents per year for the five most productive authors in ML.

Figure 27 shows the total number of documents during the time period 2013 to 2022 for the 10 largest sources of ML documents, i.e., the 10 largest sources of documents in our dataset. The figure shows that Lecture Notes in Computer Science (including the subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) is the source with the largest number of documents in our dataset (12,603 documents). The figure also shows that IEEE Access is the journal with the largest number of documents in our dataset. Figure 28 shows the number of documents per year for the five largest sources of documents in our dataset. The figure shows that none of these five sources increased

Figure 27. The 10 largest sources of ML documents during the period 2013 to 2022.

Figure 28. The number of documents per year for the five largest sources of ML documents.

Table 1. …and the system for Mathematics Subject Classification (Cont.).

Table 2. Other bibliometric studies in ML.

Table 3. The major contributing countries in ML for the period 2013 to 2022.