Bibliometric Analysis of Information Theoretic Studies

Statistical information theory is a method for quantifying the amount of stochastic uncertainty in a system. The theory originated in communication theory, and information theoretic approaches have since been extended to many other fields. This paper presents a bibliometric analysis of information theoretic publications indexed in the Scopus database. Data on 3701 documents were extracted from Scopus and analyzed with Harzing’s Publish or Perish and VOSviewer. The results cover publication growth, subject areas, geographical contributions, country co-authorship, most cited publications, keyword co-occurrence, and citation metrics. Publication growth has been steady since 2003. The United States has the highest number of publications and received more than half of the total citations of all 3701 publications. Most of the publications are in computer science, engineering, and mathematics. The United States, the United Kingdom, and China have the highest levels of cross-country collaboration. The focus of information theoretic research is slowly shifting from mathematical models to technology-driven applications such as machine learning and robotics. This study highlights the trends and developments in information theoretic publications, helping researchers understand the state of the art of information theoretic approaches and identify future contributions to this research domain.


Introduction
The entropy function is the basic building block of information theory. Calculating the entropy of a probabilistic system is widely recognized as the standard way to quantify the amount of uncertainty in that system [1]. Statistical information theory provides a framework for quantifying, in a single value, the proportion of the total information in one set of measures that is explained by another set of measures, as well as the amount of redundant information.
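For reference, the entropy mentioned above is commonly written in Shannon's form for a discrete random variable X with outcomes x_1, ..., x_n and probabilities p(x_i). The notation and the choice of base 2 (entropy measured in bits) are assumptions made here for illustration; the text does not fix a particular form.

```latex
H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)
```

Under this definition, a uniform distribution over n outcomes gives the maximum value H(X) = \log_2 n, while a system with a single certain outcome gives H(X) = 0.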
The application of information theoretic approaches has been extended to various areas. Information theoretic approaches have been applied in the analysis of questionnaire data [2], product vulnerability [3], and brand switching [4]. Besides that, the information theoretic approach has been illustrated for identifying shared information and asymmetric relationships among variables [5]. Still and Precup [6] have studied reinforcement learning by drawing on ideas from information theory, and information theoretic approaches have been studied by past researchers in stock markets and complex systems [7][8][9][10][11].
Bibliometric analysis involves systematic quantitative and qualitative analyses of a particular research topic or domain to assess publication trends and developments over time [12]. The main advantage of bibliometric analysis is the ability to assess publication growth, geographical contribution, research focus, and scientific achievements such as citations and h-index [13]. Popular analyses in bibliometrics include publication growth, subject area, geographical contribution, source title, most cited publications, and keyword analyses [14].
Bibliometric analysis helps researchers identify emerging areas and future directions of a research domain with the help of visualization tools. It can handle massive volumes of unstructured data from scientific databases and provide factual and objective information in the form of citation metrics [15]. To the best of our knowledge, there has been no previous bibliometric analysis of information theoretic studies. Therefore, this paper performs a bibliometric analysis of information theoretic publications indexed in the Scopus database from 1958 to 2022. It aims to provide a comprehensive overview of information theoretic studies and to help researchers identify knowledge gaps from which new ideas can be derived.
The remainder of this paper is organized as follows. Section 2 introduces the data and methodology. Section 3 presents the bibliometric analysis results, including the publication trends, subject areas, geographical contribution, country co-authorship, leading source titles, most cited publications, keyword analyses, and citation metrics. Section 4 concludes the paper with a summary and the limitations of the study.

Data and Methodology
This study assesses the information theoretic literature using data obtained from the Scopus database. Scopus is a widely accepted citation database that houses many high-quality peer-reviewed papers [16,17]. Since Scopus has wider coverage than the Web of Science, it is well suited to bibliometric analysis [18,19]. The data were extracted from Scopus on 17 August 2022. This study focuses on documents with "information theoretic" in the title, using the search query (TITLE ("information theoretic")). The first such publication appeared in 1958, and the query returned 3701 documents as of 17 August 2022.
The 3701 publications were classified into eleven document types; Table 1 lists their frequencies and percentages. The two most common document types were conference papers (1759) and articles (1756), accounting for 47.53% and 47.45% of the total publications, respectively. The other document types were book chapter (72), review (51), erratum (18), conference review (14), editorial (10), letter (10), note (6), book (4), and data paper (1).
The source type describes where the final version of a document was published. Of the 3701 documents, 50.66% were published in journals (1875) and 40.23% in conference proceedings (1489); the remainder appeared in book series (281 documents or 7.59%), books (55 documents or 1.49%), and trade journals (1 document or 0.03%). Figure 1 describes the source types of information theoretic publications. After the data were extracted from the Scopus database, Harzing's Publish or Perish 8 was used to process and analyze the information.
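The document-type percentages quoted above can be checked with a few lines of Python. This is only an illustrative sketch: the counts are copied from the text, while the function name `percentage_shares` and the dictionary layout are hypothetical and not part of the study's workflow.

```python
# Document-type counts as reported in the text (Table 1).
DOCUMENT_TYPES = {
    "conference paper": 1759,
    "article": 1756,
    "book chapter": 72,
    "review": 51,
    "erratum": 18,
    "conference review": 14,
    "editorial": 10,
    "letter": 10,
    "note": 6,
    "book": 4,
    "data paper": 1,
}


def percentage_shares(counts):
    """Return each category's share of the total, in percent (2 decimals)."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


shares = percentage_shares(DOCUMENT_TYPES)
# Total number of documents across all eleven types.
print(sum(DOCUMENT_TYPES.values()))
# Shares of the two most common document types.
print(shares["conference paper"], shares["article"])
```

The same function can be applied to the source-type counts (journals, conference proceedings, book series, books, trade journals) to reproduce the percentages reported for Figure 1.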
The strength of bibliometric analysis lies in its ability to provide quantitative and qualitative indicators of the topic under study. The total number of papers (TP) is an important quantitative index for measuring publication growth over the years. Other indices, such as total citations (TC), number of cited papers (NCP), citations per paper (C/P), and citations per cited paper (C/CP), contribute to the qualitative assessment. For forecasting future publications, the h-index (h) and g-index (g) are often studied together. All these indicators, grouped by publication year, country, and source title, can be obtained from Harzing's Publish or Perish software [20][21][22].
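The indicators listed above can be illustrated with a minimal Python sketch. It assumes the commonly used definitions (h is the largest number such that h papers have at least h citations each; g is the largest number such that the top g papers together have at least g squared citations). The function names and the sample citation counts are hypothetical, and Publish or Perish may apply additional conventions that are not reproduced here.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g papers together have >= g*g citations."""
    ranked = sorted(citations, reverse=True)
    g, running_total = 0, 0
    for rank, cites in enumerate(ranked, start=1):
        running_total += cites
        if running_total >= rank * rank:
            g = rank
    return g


def citation_metrics(citations):
    """Compute TP, TC, NCP, C/P, C/CP, h and g from per-paper citation counts."""
    tp = len(citations)                        # total papers
    tc = sum(citations)                        # total citations
    ncp = sum(1 for c in citations if c > 0)   # number of cited papers
    return {
        "TP": tp,
        "TC": tc,
        "NCP": ncp,
        "C/P": tc / tp if tp else 0.0,
        "C/CP": tc / ncp if ncp else 0.0,
        "h": h_index(citations),
        "g": g_index(citations),
    }


# Hypothetical citation counts for six papers (not data from the study).
example = [10, 8, 5, 4, 3, 0]
print(citation_metrics(example))
```

For the hypothetical list above, five of the six papers are cited, so C/P and C/CP differ: the uncited paper lowers the average per paper but not the average per cited paper.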
Network visualization and overlay visualization can be mapped with VOSviewer. VOSviewer is a freely available programme created by Van Eck and Waltman [23]. VOSviewer has also gained popularity due to its ability to perform co-occurrence and co-citation analyses with a user-friendly interface for map generation [24]. In this study, VOSviewer version 1.6.17 is used to visualize keyword co-occurrences and country co-authorship.

Results
This section presents the bibliometric analysis of information theoretic publications based on data extracted from the Scopus database. It covers publication growth, subject areas, geographical contributions, highly cited publications, and keyword analyses. Finally, citation metrics are presented as a brief summary of all 3701 information theoretic publications as of 17 August 2022.

Publication Growth
The annual publication of information theoretic papers from 1958 to 2022 is presented in Table 2. Publication growth was slow from the first paper in 1958 until 2002, with an average annual growth rate of approximately 20% in this period. The number of published papers exceeded 100 in 2003 and has been relatively stable since then, reflecting the increased popularity of information theoretic approaches among researchers. Quantifying information based on probability distributions is necessary for more data-driven analyses. Information theoretic approaches have attracted considerable attention from computer science, engineering, and mathematics researchers. Because the data were extracted from the Scopus database on 17 August 2022, the TP for the full year 2022 was not yet available; 96 papers had been published and indexed in Scopus from 1 January to 17 August 2022. The citation metrics of the annual information theoretic publications are also listed in Table 2. The highest C/P was found in 1985: 2934 total citations from only 16 publications yield a C/P of 183.38 and, with 12 cited papers, a C/CP of 244.50. The high C/P and C/CP in 1985 are driven by the most cited publication, by Wax and Kailath [25], which has received 2731 citations since its publication.
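The 1985 values follow directly from the definitions of C/P (total citations divided by total papers) and C/CP (total citations divided by the number of cited papers). As a worked check, using only the figures quoted in the text:

```latex
\text{C/P}_{1985} = \frac{\text{TC}}{\text{TP}} = \frac{2934}{16} \approx 183.38,
\qquad
\text{C/CP}_{1985} = \frac{\text{TC}}{\text{NCP}} = \frac{2934}{12} = 244.50
```

Since 2731 of the 2934 citations belong to a single paper, both averages for 1985 are dominated by that one publication.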

Subject Areas
Information theoretic research has attracted considerable attention from researchers in computer science, engineering, and mathematics. Information theory adopts mathematical laws to monitor and maximize the value of data movement, storage, and retrieval [26]. In total, 2259 papers (32.89%) were listed under computer science, while 1254 papers (18.26%) and 1068 papers (15.55%) were categorized under engineering and mathematics, respectively. Other areas that apply information theoretic approaches include physics and astronomy (8.36%), social sciences (3.61%), materials science (2.77%), biochemistry, genetics and molecular biology (2.66%), chemistry (1.86%), medicine (1.86%), and decision sciences (1.76%). The complete list of subject areas associated with information theoretic publications is presented in Table 3.

Geographical Contribution of Information Theoretic Publications
Information theoretic research has attracted contributions from researchers in more than 80 countries worldwide. The United States (1624) contributes the most to the TP, followed by China (285) and the United Kingdom (279). The 3701 publications have received 82,000 citations in total. Publications from the United States have received 45,797 citations, which amounts to 55.85% of the total citations. The country with the second highest number of citations is the United Kingdom, with 7352 citations.
Meanwhile, the United States also has the highest h-index, 89. This means that 89 of its papers have each been cited at least 89 times, signaling high scientific achievement [21]. The United Kingdom and Germany have h-indices of 40 and 35, respectively. The highest C/P and C/CP, 41.58 and 49.46, respectively, belong to Israel. Even though China has the second highest number of publications, it still has room to improve the quality of its scientific contributions. Table 4 presents the top 10 geographical contributions of information theoretic publications.
Many publications have more than one author because the complexity and scope of a study require collaboration; co-authorship analysis assesses this research collaboration. Country co-authorship analysis portrays the level of collaboration among countries and highlights the most prominent countries in the research domain. The country co-authorship network map for information theoretic publications is shown in Figure 2. Larger nodes represent more prominent countries in information theoretic research. The distance between nodes and the thickness of the lines reflect the collaboration among researchers across countries. From Figure 2, the most prominent countries in the co-authorship of information theoretic publications are the United States, United Kingdom, China, Germany, Canada, France, Italy, Israel, Japan, and Australia. The United States has the highest total link strength of 490 from more than 1600 publications, followed by the United Kingdom (193), China (169), Germany (128), and Canada (127).
Table 6 tabulates the top 10 most cited information theoretic publications since 1958. The most cited document, "Detection of signals by information theoretic criteria" by Wax and Kailath [25], received 2731 citations, or 73.81 citations per year. This paper proposed a new method to detect the number of signals in a multi-channel time series, removing any subjective setting. The second most cited paper was "Breaking spectrum gridlock with cognitive radios: an information theoretic perspective" by Goldsmith et al. [27]. This paper discussed the use of cognitive radios for higher spectral efficiency with an information theoretic survey. The third most cited paper, by Biglieri et al. [28], found that information theoretic approaches such as equalization, coding, and modulation enhance the performance of fading dispersive channels. Davis et al. [29] introduced an information theoretic method for learning a Mahalanobis distance function. This study showed that the proposed method could handle multiple constraints and incorporate a prior on the distance function. The next most cited paper, by Bloch et al. [30], focused on the transmission of confidential data over wireless channels. This paper developed a practical secure communication protocol to ensure wireless information-theoretic security. Pedersen [31] presented an efficient non-interactive scheme for verifiable secret sharing without information about the secret. Vinh et al. [32] presented an organized study of information theoretic measures for comparing clusterings. The findings showed that the normalized information distance and the normalized variation of information satisfy the normalization as well as the metric properties. The next most cited paper, by Dhillon et al. [33], proposed an innovative co-clustering algorithm to increase the preserved mutual information by intertwining the column and row clusterings at all stages. This paper showed that the proposed algorithm performed well, especially in the presence of high dimensionality and sparsity. Ozarow et al. [34] presented an information theoretic analysis associated with digital cellular radio that focused on time division multiple access protocols. From the information theoretic point of view, double ray propagation was advantageous over single ray propagation when both were normalized to the same power. The tenth most cited paper, by Brown et al. [35], proposed a unifying framework for information theoretic feature selection. This paper showed that common heuristics for information-based feature selection are approximate iterative maximizers of the conditional likelihood.

Keyword Analyses
This subsection includes keyword analyses based on the keyword co-occurrence network and keyword overlay visualization maps. Keyword co-occurrence provides research highlights of the topic under study. Across the 3701 documents, there are 16,828 indexed keywords as reported by VOSviewer. One of the advantages of VOSviewer is the ability to adjust the minimum number of occurrences that a keyword must reach to appear in the visualization maps, which improves interpretation and clarity. The frequency with which a keyword appears indicates its significance in the study domain. The more often a keyword appears among the 3701 publications, the greater the attention placed on the corresponding research area, and restricting the map to frequent keywords also makes it clearer to analyze [36][37][38]. Therefore, the keyword co-occurrence map in Figure 3 considers only keywords that have appeared more than 30 times, with 148 keywords meeting the threshold and being relevant to information theoretic studies. The size of a keyword's node reflects the weight of the keyword in co-occurrence. The distance between nodes shows the strength of the relationship between them: the shorter the distance, the stronger the relationship. The thickness of a line signifies the co-occurrences of the two keywords it connects: the bolder the line, the higher the number of co-occurrences [24].
Nodes with similar colors are categorized as one cluster. VOSviewer classified the 148 keywords into four clusters. The first cluster involves the red nodes, with keywords such as antennas, channel capacity, cryptography, data privacy, decoding, estimation, fading channels, Fisher information, Gaussian distribution, information theory, Markov processes, mathematical models, Monte Carlo methods, network protocols, optimization, radar, sensors, set theory, Shannon entropy, stochastic systems, and topology. The second cluster involves the green nodes, with keywords such as artificial intelligence, Bayesian networks, cluster analysis, computation theory, data mining, decision making, entropy, feature extraction, forecasting, iterative methods, Kullback-Leibler divergence, learning algorithms, machine learning, probability density function, probability distributions, robotics, semantics, and uncertainty analysis. The blue nodes form the third cluster, which involves bioinformatics, computational biology, computer simulation, controlled study, information science, metabolism, neurons, physiology, and theoretical model. The last cluster is yellow and includes image analysis, image enhancement, image reconstruction, image segmentation, magnetic resonance imaging, and medical imaging.
The link strength is quantitative and can be used to identify the frequencies of co-occurrence. The total link strength of a node is the sum of its link strengths with all other nodes [23]. "Information theory" has the highest total link strength of 7598 and 2326 occurrences. Other nodes with high total link strengths include algorithms (2289), entropy (1488), and human (1445). Figure 3 shows the keyword co-occurrence map of information theoretic documents, while Table 7 presents the top 10 keywords by total link strength.
VOSviewer also provides an overlay visualization map to analyze keyword adoption across the years and observe the trend of the research topic. In the overlay visualization map in Figure 4, a yellow node implies that the keyword is of current research interest. For example, current research trends in information theoretic studies focus on information theoretic measures, network security, data privacy, the Fisher information matrix, uncertainty analysis, decision making, and machine learning. Based on these keywords, it can be forecast that future information theoretic publications will revolve around the technological revolution to quantify and extract information from various signals. Moreover, researchers are studying how to ensure that useful and valuable information can be quantified while maintaining user and system security and privacy.
The most promising thematic lines in information theoretic research include machine learning, robotics, quantification of information, and decision making. In Industry 4.0, all these thematic lines become the building blocks of transformation. Information theory is fundamental in processing and transmitting data and signals, which supports machine learning, robotics, information quantification, and decision making.
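The occurrence threshold, link strength, and total link strength used in this subsection can be sketched in Python as follows. This is a simplified illustration that assumes each co-occurrence of two keywords in one document adds 1 to their link strength; it is not VOSviewer's implementation and does not reproduce its clustering or layout. The function name `cooccurrence_network`, the parameter `min_occurrences`, and the sample keyword lists are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(keyword_lists, min_occurrences=1):
    """Build a keyword co-occurrence network from per-document keyword lists.

    Returns (occurrences, links, total_link_strength):
      occurrences          number of documents containing each keyword
      links                co-occurrence count for each retained keyword pair
      total_link_strength  sum of a keyword's link strengths with all others
    """
    occurrences = Counter()
    for keywords in keyword_lists:
        occurrences.update(set(keywords))  # count each keyword once per document

    # Keep only keywords that reach the minimum number of occurrences.
    kept = {k for k, n in occurrences.items() if n >= min_occurrences}

    links = Counter()
    for keywords in keyword_lists:
        present = sorted(set(keywords) & kept)
        for a, b in combinations(present, 2):
            links[(a, b)] += 1

    total_link_strength = Counter()
    for (a, b), weight in links.items():
        total_link_strength[a] += weight
        total_link_strength[b] += weight
    return occurrences, links, total_link_strength


# Hypothetical keyword lists for three documents (not data from the study).
documents = [
    ["entropy", "information theory", "machine learning"],
    ["entropy", "information theory"],
    ["information theory", "robotics"],
]
occ, links, tls = cooccurrence_network(documents, min_occurrences=2)
print(occ["information theory"], dict(links), dict(tls))
```

In this toy example only "entropy" and "information theory" reach the threshold of two occurrences, so the retained network has a single link. Raising the threshold shrinks the map in the same way that the threshold described above reduces 16,828 indexed keywords to 148.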

Discussion and Conclusions
This paper performed a comprehensive bibliometric analysis of the information theoretic publications listed in the Scopus database from 1958 to 2022. The publication rate of information theoretic papers was low until the beginning of the 21st century, and publication growth has been steady since 2003. Information theoretic research has received much attention in computer science, engineering, and mathematics, using mathematical models and laws to quantify information in signals. The United States has the highest number of publications and received more than half of the total citations of all 3701 publications. The United States also has the highest h-index, indicating the highest scientific achievement. Even though China has the second highest number of publications, China could improve the quality of its contributions. The United States, United Kingdom, and China have the highest levels of cross-country collaboration. The source title that publishes the most information theoretic papers is Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics by Springer Nature. The most cited publication is "Detection of signals by information theoretic criteria" by Wax and Kailath [25]. From the keyword analysis, the research interest in information theoretic studies has slowly shifted from mathematical models to technology-driven applications. The most promising thematic lines include machine learning, robotics, quantification of information, and decision making.
Even though this paper provides insights into the development of information theoretic publications since the first paper was listed in the Scopus database, the study has limitations. First, it queried information theoretic documents based on the title only. Second, the analysis is accurate only as of the query date, because the Scopus database is continuously updated with new publications; a bibliometric analysis of information theoretic studies may therefore be revisited in a few years. Third, this bibliometric analysis extracts scientific data from the Scopus database only. Future studies may cover other databases for a more extensive understanding of information theoretic studies.