The data were analyzed in two stages. First, we compared the six main areas in terms of five citation and collaboration variables. Then, we focused on the humanities and compared its five subareas using the same variables.
3.1. Comparisons of Six Main Areas
The six main OECD areas were compared by the citation and collaboration variables presented in Table 1. The values in Table 1 are medians across the categories in each area; that is, half of the categories in a given area had values greater than or equal to those shown. As shown in Table 1, the humanities have the lowest values among the six areas for all five variables. The first three variables, which are based on citations, are evaluated in detail first, followed by the other two (percentage of industry collaboration and percentage of international collaboration).
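Because Table 1 reports medians over categories, each category counts once regardless of how many papers it contains. The minimal sketch below illustrates that distinction; the category names `A`, `B`, `C` and all counts are hypothetical, and `pooled_rate` is shown only as a contrast, not as a measure used in Table 1.

```python
from statistics import median


def category_rates(categories):
    """Share of cited papers in each category (cited / papers)."""
    return {name: cited / papers for name, (papers, cited) in categories.items()}


def median_of_categories(categories):
    """Area-level value computed as the median over its categories."""
    return median(category_rates(categories).values())


def pooled_rate(categories):
    """Alternative area-level value: all papers in the area pooled together."""
    total_papers = sum(papers for papers, _ in categories.values())
    total_cited = sum(cited for _, cited in categories.values())
    return total_cited / total_papers


# Hypothetical area with one large and two small categories: (papers, cited papers).
example = {"A": (1000, 900), "B": (10, 1), "C": (20, 2)}
print(median_of_categories(example))  # each category counts once
print(pooled_rate(example))           # dominated by the large category A
```

In this toy area the median over categories is 0.1, whereas the pooled rate is about 0.877, because one large, highly cited category outweighs two small ones only in the pooled figure.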
Figure 1 shows the distribution of publications and citations across the OECD areas. It demonstrates the dominance of the pure sciences in publication output: 81% of the papers were published in three main pure science areas, namely the natural sciences (33%), medical sciences (27%), and engineering and technology (21%). The total number of humanities publications was similar to that of a relatively small pure science area, the agricultural sciences. However, the distribution of citations across the OECD subject areas revealed the main difference for the humanities: they received only 0.52% of all citations in the dataset, whereas the natural sciences received 44%, medical sciences 30%, engineering and technology 17%, social sciences 6%, and agricultural sciences 1.5%.
Another important aspect of Figure 1 is that it reveals differences among subareas, not only in the humanities but in all scientific areas. The numerical advantages of psychology, economics and management, clinical medicine, biological sciences, and electrical and electronic engineering over the other subareas in their respective domains are obvious.
The median citations per paper was 19 for the natural sciences. Agricultural sciences and medical and health sciences followed with 14 and 13, respectively. The median was 10 for engineering and technology and 9 for the social sciences, but it did not even reach 1 for the humanities (0.68). The boxplot in Figure 2 shows the values of citations per paper for the categories included in each area. The humanities stand out clearly from the other areas. Of the 29 categories classified under the humanities, 7 had values greater than 2 (in descending order: linguistics, ethics, archaeology, history and philosophy of science, language and linguistics, history of social sciences, and philosophy). Linguistics had the highest citations per paper at 7.85, which was still below the median of every other area. The six main areas differed significantly in terms of citations per paper (H = 113.629, effect size = 0.436), and the humanities, which differed significantly from all five other areas (effect sizes > 0.87), were the main source of this difference. In addition, engineering and technology (U = 539.000, effect size = 0.620) and the social sciences (U = 588.000, effect size = 0.611) differed from the natural sciences with substantial effect sizes.
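The effect sizes reported alongside the Kruskal–Wallis H statistics in this section are consistent with η²_H = (H − k + 1)/(n − k), where k is the number of groups and n the number of observations. As a hedged sketch, the pure-Python code below computes a tie-corrected H (`scipy.stats.kruskal` computes the same statistic) and η²_H. The assumption that η²_H is the measure used, and the value n = 255 categories, are our inferences: n = 255 with k = 6 is the only category count that reproduces all four reported values (0.436, 0.521, 0.540, 0.583), but this section does not state it.

```python
from collections import Counter
from itertools import chain


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position of the run tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H with the standard correction for ties."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n))


def eta_squared_h(h, k, n):
    """Effect size eta^2_H = (H - k + 1) / (n - k) for k groups, n observations."""
    return (h - k + 1) / (n - k)


# Check against the reported H values, assuming k = 6 areas and n = 255
# categories (n is inferred from the reported values, not stated in the text).
reported = {113.629: 0.436, 134.659: 0.521, 139.421: 0.540, 150.059: 0.583}
for h_value, effect in reported.items():
    print(h_value, round(eta_squared_h(h_value, k=6, n=255), 3), effect)
```

The loop prints each reported H next to the recomputed and the reported effect size; they agree to three decimals.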
Similarly, the median percentage of cited publications in the humanities was considerably low (16%), meaning that in half of the humanities categories, at least 84% of the papers had not yet been cited. The median cited-publication rate for the other areas ranged from 51% to 78%. When we compared the areas by the categories' cited-paper rates, the difference was statistically significant (H = 134.659, p < 0.001, effect size = 0.521). This was mainly due to the humanities, which differed from all other areas (p < 0.001). The effect size was 0.833 between the social sciences and the humanities, whereas it exceeded 0.96 between the humanities and each of the four other areas. The social sciences also contributed to the difference between areas: they differed from the agricultural sciences (U = 59.000, p < 0.001, effect size = 0.777), medical and health sciences (t = 5.070, p < 0.001, effect size = 1.023), engineering and technology (t = 4.187, p < 0.001, effect size = 0.869), and natural sciences (U = 269.000, p < 0.001, effect size = 0.822). In addition, the natural sciences differed from engineering and technology and from the medical and health sciences, with effect sizes of around 0.59 (p < 0.001 for both).
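The pairwise comparisons above report Mann–Whitney U statistics with effect sizes. This section does not state which effect size accompanies U, so the sketch below uses one common choice, the rank-biserial correlation r = 1 − 2U/(n₁n₂), as an assumption. It computes U by direct pair counting, which is equivalent to the rank-sum definition; the category values are hypothetical.

```python
def mann_whitney_u(x, y):
    """Return (U, rank-biserial r) for two independent samples.

    U is the smaller of the two U statistics; r = 1 - 2U / (n1 * n2),
    so r = 1 when the two samples do not overlap at all.
    """
    # U for x counts the (x, y) pairs in which x is larger; ties count one half.
    u_x = sum((a > b) + 0.5 * (a == b) for a in x for b in y)
    n1n2 = len(x) * len(y)
    u = min(u_x, n1n2 - u_x)
    return u, 1 - 2 * u / n1n2


# Hypothetical cited-paper rates (%) for the categories of two areas.
area_a = [12.0, 15.5, 16.0, 22.0, 30.0]
area_b = [48.0, 51.0, 55.5, 60.0, 29.0]
print(mann_whitney_u(area_a, area_b))
```

Only one of the 25 cross-area pairs is "reversed" here (30.0 versus 29.0), so U = 1 and r = 0.92, which shows how a small U translates into an effect size close to 1.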
In half of the categories in each area, at least 1 in every 400 papers was highly cited in the natural sciences, as was 1 in every 500–700 papers in the social sciences (0.19%), agricultural sciences (0.18%), medical and health sciences (0.17%), and engineering and technology (0.15%). More than half of the humanities categories (15 of 29) had no highly cited papers. Ethics had the highest rate of highly cited papers in the humanities (0.14%). The rates for the other 13 humanities categories ranged from 0.015% (one in every 6667 publications) to 0.0003%. The humanities differed significantly from all other areas in the percentage of highly cited papers (p < 0.001, effect sizes > 0.91).
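The paragraph above translates small percentages into "one in N" statements. The conversion is simply N = 100/p for a rate of p percent; the minimal sketch below makes that arithmetic explicit (the function names are illustrative).

```python
def one_in(percent):
    """Express a rate given in percent as 'one in N' (N rounded)."""
    return round(100 / percent)


def percent_of(n):
    """Inverse conversion: the percentage corresponding to one in every n items."""
    return 100 / n


for rate in (0.25, 0.19, 0.15, 0.015):
    print(f"{rate}% is about one in {one_in(rate)}")
```

For example, 0.19% and 0.15% correspond to roughly one in 526 and one in 667, which is the basis of the "1 in every 500–700" range, and 0.015% corresponds to one in 6667.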
Similar differences among the six OECD areas were observed for the percentage of papers produced through international collaboration (H = 139.421, p < 0.001, effect size = 0.540). In half of the natural sciences categories, at least one in five papers involved international collaboration, which made the natural sciences statistically different from the five other areas (agricultural sciences: p = 0.001, effect size = 0.619; medical and health sciences: p < 0.001, effect size = 0.672; engineering and technology: p < 0.001, d = −1.217). For the humanities, only one in 167 papers involved international collaboration (0.60%), whereas the rate ranged from 9% to 13% for the other four areas. The humanities differed significantly from all other areas (p < 0.001; effect size = 0.859 for the social sciences and > 0.97 for the others). In addition to the humanities, the social sciences differed from the agricultural sciences (p = 0.001, effect size = 0.636), engineering and technology (p < 0.001, d = 1.00), and natural sciences (p < 0.001, d = 2.018) but, interestingly, not from the medical and health sciences (effect size = 0.403).
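Several comparisons above report Cohen's d. Its sign depends only on which group is subtracted from which, so a negative value such as d = −1.217 reflects the order of the two groups, while its magnitude reflects the size of the difference. The sketch below assumes the standard pooled-standard-deviation form of d; this section does not say which standardizer was used, and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(x, y):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    n1, n2 = len(x), len(y)
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)


# Hypothetical international-collaboration rates (%) for the categories of two areas.
area_a = [18.0, 21.0, 24.0, 27.0]
area_b = [9.0, 11.0, 12.0, 14.0]
print(cohens_d(area_a, area_b), cohens_d(area_b, area_a))  # same size, opposite sign
```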
The rate of papers produced with industry collaboration was around 1–2% in all areas except the social sciences and the humanities, and it was highest for engineering and technology (at least 2% in half of its categories). In 24 social sciences categories, at least 1 in every 500 papers was produced with industry collaboration. No industry collaboration was observed in more than one-quarter of the humanities categories (8 of 29). Ethics (0.12%), history and philosophy of science (0.10%), and language and linguistics (0.10%) had the highest rates of industry collaboration among the humanities categories. All areas differed from each other except the natural sciences and the medical and health sciences (H = 150.059, p < 0.001, effect size = 0.583). Considering the effect sizes, the humanities and the social sciences were the two areas that differed most clearly from all others. The effect sizes for the humanities were close to 1 in every comparison except that with the social sciences (U = 0.000, p < 0.001, effect size = 0.830). Both the social sciences and the humanities differed significantly from engineering and technology (humanities: U = 0.000, p < 0.001, effect size = 1; social sciences: U = 53.000, p < 0.001, effect size = 0.951).
3.2. Comparisons of Five Subareas of Humanities
The humanities had the lowest values among the main OECD areas for all the variables analyzed. We also compared the five subareas of the humanities (history and archaeology; language and literature; philosophy, ethics, and religion; art; and other humanities) in terms of the citation and collaboration variables of the 51,934 sources.
First, about 21% of the 51,934 sources have not yet been cited. Philosophy, ethics, and religion (24.5%) and art (23.4%) were the subareas with the highest percentages of uncited sources. Moreover, almost 89% of the sources had at most 3 citations per document. Only 1% of the sources had more than 16 citations, and 2% had more than ten citations per paper. Only about 17% of the sources had more than half of their documents cited.
Figure 3, which presents 95% confidence intervals and a scatter of sources based on citations per publication and the percentage of documents cited, shows clearly that the five subareas of the humanities are not alike. Art and other humanities had lower values for citations per publication (see the upper left part of Figure 3). However, some sources in the art subcategory stood out with high numbers of citations per publication compared with other sources (see the upper right part of Figure 3). The most remarkable was the Canadian Medical Association Journal, with 177 citations per publication; note that it had only one WoS document. That paper, entitled Playing-related musculoskeletal disorders in musicians: A systematic review of incidence and prevalence, was highly cited compared with the papers published in the arts category. Importantly, the paper was indexed in the general internal medicine category of SCIE, as mentioned in the Materials and Methods section, but not in AHCI in the WoS; only the InCites database treated it as an art publication. Such data issues illustrate the limitations of citation databases.
On the other hand, language and literature, as well as history and archaeology, were the two subareas of the humanities with the highest numbers of citations per publication. In the language and literature subarea, English for Academic Purposes: An Advanced Resource Book, a book series with 201 citations, had the highest number of citations per publication, followed by another book, Etymologies of Isidore of Seville (129 citations). Philosophy, ethics, and religion had medium values for citations per publication but included many remarkable sources in this respect. Two books led the way: On Bullshit (446 citations) and Giving an Account of Oneself (324 citations). These were followed by an article published in PLOS Medicine (200 citations) and a book series entitled Birth of Biopolitics: Lectures at the College de France, 1978–1979 (184 citations per publication).
Figure 3 also presents the 95% confidence interval graph (lower left) and the scatter graph (lower right) for the percentage of cited documents in each subarea of the humanities. According to the confidence interval graph, the most distinct subarea was art (mean = 20.6, median = 12.5), which had the lowest percentage of cited documents among the subareas. The scatter graph (lower right) presents a different picture, showing that some sources had all (100%) of their documents cited at least once. Of the 1006 sources with 100% of their documents cited, 459 (45.6%) had only one document. These were mainly books, or journals from other areas that had only one article indexed under a specific subcategory.
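Figure 3 shows 95% confidence intervals for the subarea means, but this section does not state how they were computed. As a minimal sketch, assuming a normal approximation (very close to the t-based interval for subareas with thousands of sources), the interval is mean ± 1.96 · s/√n. The per-source values below are hypothetical; they also illustrate why single-document sources can only take the values 0% or 100%.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci(values, level=0.95):
    """Normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    half_width = z * stdev(values) / sqrt(len(values))
    m = mean(values)
    return m - half_width, m + half_width


# Hypothetical per-source percentages of cited documents in one subarea.
# A source with a single document can only score 0% or 100%.
pct_cited = [0.0, 0.0, 12.5, 20.0, 25.0, 33.3, 50.0, 100.0]
print(mean_ci(pct_cited))
```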
Although the five subareas of the humanities differed significantly, according to the p values, in the number of citations per paper (H = 358.681, p < 0.001, effect size = 0.0068) and in the percentage of cited documents (H = 259.286, p < 0.001, effect size = 0.0049), the very small effect sizes indicate that these differences are negligible in practice.
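The gap between the very small p values and the negligible effect sizes follows from sample size. Assuming the effect size is η²_H = (H − k + 1)/(n − k), a formula that reproduces both reported values with k = 5 subareas and n = 51,934 sources, the large H statistics are divided by a very large n:

```latex
\eta^2_H = \frac{H - k + 1}{n - k}, \qquad
\frac{358.681 - 5 + 1}{51\,934 - 5} = \frac{354.681}{51\,929} \approx 0.0068, \qquad
\frac{259.286 - 5 + 1}{51\,934 - 5} = \frac{255.286}{51\,929} \approx 0.0049
```

With n in the tens of thousands, even a slight shift in ranks yields p < 0.001, while η²_H indicates that subarea membership accounts for well under 1% of the variation in ranks.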
The rate of highly cited papers and the collaboration statistics of each humanities subarea are presented in Table 2. The highly cited publication rate was highest for philosophy, ethics, and religion, at about two highly cited publications per 10,000 papers. Language and literature stood out with the highest percentage of industry collaboration, about 4 in 10,000 documents. The corresponding rate was about 2 in 10,000 for history and archaeology and for philosophy, ethics, and religion. These two subareas also had the highest international collaboration rates, at about 2%. Art and other humanities, on the other hand, had the lowest percentages for all three variables.