1. Introduction
Categorizing academic journals is a fundamental practice in the field of library and information science. It involves the classification of indexed journals into subject-specific categories. Journal classification serves several purposes, including enhancing discoverability, facilitating efficient search and filtering, and enabling researchers to assess the quality and relevance of articles within their respective fields [1,2,3]. Journals are made available in databases such as Scopus, PubMed, and Web of Science through the process of categorization. One popular method is to assign journals to multiple subject categories and use a hierarchical organization of subject categories.
For example, the Scopus journal classification method, known as All Science Journal Classification (ASJC), follows a hierarchical organization that includes minor and major subject categories, as well as supergroups. Scopus uses approximately 333 minor subject categories [4]. We refer to such a classification scheme as a multiple-category hierarchical scheme. For instance, at the minor subject category level, a journal may be associated with Library and Information Science (ASJC 3309), Communication (ASJC 3315), and Information Systems (ASJC 1706).
One particularly interesting aspect of a multiple-category hierarchical scheme is the dominance of subject categories within the filtered set of journal data. Understanding the dominant characteristics within such datasets can provide valuable insights into the fundamental nature of defining fields of study and specialization across different disciplines.
In the context of this study, “specialization” refers to the diversity or heterogeneity reflected by the proposed set of subject category indices. To conduct this study, we formulated a set of subject category indices, including the Number of Journals (J), Total Instances of Subject Categories (SC), Number of Unique Subject Categories (USC), and Dominance Index (DOMI).
Thus, the objective of this study is to investigate the dominant characteristics of subject categories within the Scopus database and quantify their dominance using various subject indices. To examine the dominant characteristics of Scopus subject categories, we created distinctive subsets of data by filtering journals associated with each minor subject category. These subsets were used to analyze the frequency of occurrence and dominance of subject categories compared to other co-occurring subject categories.
By investigating the dominant characteristics of subject categories that utilize the multiple-category hierarchical scheme, this study aims to contribute to the existing literature on journal classification and research design. Focusing specifically on Scopus as a case study, the findings of this study offer insights into the disciplinary specialization, prevalence, and dominant characteristics of subject categories within the hierarchically organized multiple category scheme.
2. Literature Review
Categorizing and evaluating subject categories in scholarly databases is a topic of significant interest and importance. Prior research has contributed valuable insights into the organization and characteristics of subject categories, forming the basis for examining the dominant characteristics of subject categories in this study. For instance, García et al. [5] provided insights into the ranking and relative importance of subject areas within Scopus, which informs our understanding of field importance. Bartol et al. [6] conducted a comparative analysis of research fields in Scopus and Web of Science, highlighting the similarities and differences between these databases and their impact on research evaluation. These studies serve as a foundation for investigating subject category characteristics in research studies that compare disciplines. In addition, Yan [7] explored the growth rates and knowledge trading practices of different subjects in Scopus, complementing our exploration of dominant characteristics in a multiple-category hierarchical scheme.
Numerous efforts have been dedicated to enhancing existing journal classification systems. For instance, Singh et al. [8] compared the classification accuracy of databases such as Web of Science, Scopus, and Dimensions, with the aim of identifying anomalies within these systems. In a separate study, Rafols and Leydesdorff [9] delved into content-based and algorithmic journal classifications, uncovering their impacts on scientific communication. Gómez-Núñez et al. [10] introduced a method to refine subject category classifications through reference analysis. Their proposed approach aimed to boost accuracy and effectiveness by restructuring the Scimago Journal & Country Rank (SJR) journal classification. However, this method was predicated on the assumption that the highest citation counts accurately represent the most representative subject categories for each journal. While these studies contribute to the broader discourse on enhancing classification systems, they do not directly address the specific focus of this study: Scopus and its assignment of multiple subject categories.
Previous studies also addressed the challenges associated with assigning publications to multiple categories. In particular, Bornmann [11] examined the complexity and potential impact of such assignments in the Web of Science database and highlighted the need for careful consideration. The diversity and interrelatedness of subject categories reflect the multidimensional nature of scholarly research. For example, Aviv-Reuven and Rosenfeld [12] and Jesenko and Schlögl [13] investigated the interconnections and implications of subject category diversity, uncovering the complex relationships between fields. In addition, Kim [14], Wang and Ahlgren [15], and Silva et al. [16] studied the utilization and interdisciplinarity of subject categories, and they found intricate connections between different areas of study and the potential for interdisciplinary research. While studies on multiple subject categories shed light on the intricate nature of interconnected subject categories, a simpler approach to assess the dominant characteristics of individual subject categories is desirable.
Despite the wealth of existing literature on subject categories, there remains a research gap in adequately addressing dominance and specialization within these categories and in measuring dominance when multiple subject categories are used in classifying indexed journals. This study builds on and extends the existing literature to further contribute to our understanding of the dominant characteristics of subject categories when journals are assigned to multiple subject categories, particularly within the Scopus database.
3. Methodology
To assess the dominance characteristics of Scopus-indexed journals, we have formulated several subject indices:
J: The total number of journals within a given subject category;
SC: The total number of subject categories assigned to the journals;
USC: The total number of unique subject categories assigned to the journals after removing duplicate subject categories;
DOMI: The dominance index, defined as the ratio of J to SC.
The value of J represents the number of journals sharing the same subject classification, while SC represents the number of instances of subject categories in a filtered dataset. USC represents the richness of subject categories in a dataset. The concept of the dominance index has been suggested in the context of biodiversity and was initially proposed by Berger and Parker [17,18]. We adapted this concept to analyze the dominant characteristics of subject categories and defined DOMI as

$$\mathrm{DOMI} = \frac{J}{SC}$$
The resemblance between the DOMI (dominance index) and the Berger–Parker index lies in their common objective to assess dominance within a given dataset, although they are applied in different contexts. Both indices use a simple ratio to express dominance. However, the Berger–Parker index is calculated as the proportion of the most abundant species in a community, while the DOMI represents the ratio between the number of journals and the number of subject categories in a filtered dataset of a given subject category.
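For reference, the two ratios can be written side by side. The symbols $N_{\max}$ (abundance of the most common species) and $N$ (total number of individuals) are notation assumed here for illustration; they do not appear in the original text.

```latex
d_{\text{Berger--Parker}} = \frac{N_{\max}}{N},
\qquad
\mathrm{DOMI} = \frac{J}{SC}
```

In a subset filtered on one subject category, that category occurs once per journal, so J is also the count of the most frequent category and SC is the total count of category instances.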
To illustrate the calculation of the subject category index, consider the following journal data filtered from the Scopus journal list using the subject category “Equine” (ASJC 3402):
Equine Veterinary Education: 3402;
Equine Veterinary Journal: 3402, 2700;
Journal of Equine Science: 3402;
Journal of Equine Veterinary Science: 3402;
Pferdeheilkunde: 3402;
Theriogenology: 3402, 1103, 3403, 3404;
Veterinary Clinics of North America—Equine Practice: 3402.
Here, ASJC 3402 is the subject category used for filtering and is therefore common to all journals in the subset. Note that some journals are also assigned categories other than ASJC 3402 (e.g., ASJC 2700). The value of J is 7 because there are 7 journals, and SC is equal to 11 because there are a total of 11 subject category instances in the filtered data. USC is 5 because there are 5 unique subject categories in this filtered dataset. These include the specified subject category “ASJC 3402” and all other related subject categories: “ASJC 2700”, “ASJC 1103”, “ASJC 3403”, and “ASJC 3404”. Finally, to calculate the DOMI, we divide J by SC. Thus, 7 divided by 11 gives 0.64.
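The Equine calculation can be reproduced with a few lines of Python. This is a minimal sketch for illustration only: the study itself reports using R, and the function name `subject_indices` and the dictionary layout are assumptions introduced here.

```python
# Equine (ASJC 3402) subset as listed in the text: journal -> assigned ASJC codes.
equine_subset = {
    "Equine Veterinary Education": [3402],
    "Equine Veterinary Journal": [3402, 2700],
    "Journal of Equine Science": [3402],
    "Journal of Equine Veterinary Science": [3402],
    "Pferdeheilkunde": [3402],
    "Theriogenology": [3402, 1103, 3403, 3404],
    "Veterinary Clinics of North America: Equine Practice": [3402],
}


def subject_indices(subset):
    """Return (J, SC, USC, DOMI) for a filtered journal subset (hypothetical helper)."""
    all_codes = [code for codes in subset.values() for code in codes]
    j = len(subset)            # J: number of journals in the subset
    sc = len(all_codes)        # SC: total instances of subject categories
    usc = len(set(all_codes))  # USC: unique subject categories
    domi = j / sc              # DOMI: ratio of J to SC
    return j, sc, usc, domi


J, SC, USC, DOMI = subject_indices(equine_subset)
print(J, SC, USC, round(DOMI, 2))  # 7 11 5 0.64
```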
To empirically examine the proposed subject category indices, all journals indexed in the October 2022 Scopus list were included in this study and analyzed using four subject indices: J, SC, USC, and DOMI. According to the Scopus journal index for October 2022, there were 26,591 indexed journals (excluding trade journals) classified into 333 minor subject categories. Subject category indices were obtained for all these minor subjects. The major subject categories and supergroups were calculated by averaging the subject category index values of the corresponding minor subject categories. The R programming language was used for calculating and analyzing the subject category indices at all levels of subject categories.
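The filtering and averaging steps described above can be sketched as follows. The journal records are toy data, not Scopus entries, and the helper names are hypothetical; the only structural assumption taken from the text is that the first two digits of a four-digit ASJC code identify the major subject category (e.g., 29** for Nursing). The study's own R implementation may differ.

```python
from collections import defaultdict

# Toy journal list (hypothetical records, not real Scopus data).
journals = {
    "Journal A": [3402, 2700],
    "Journal B": [3402],
    "Journal C": [3403, 3402, 1103],
    "Journal D": [3403],
}


def indices_by_minor_category(journal_codes):
    """Compute J, SC, USC, and DOMI for the subset filtered on each ASJC code."""
    subsets = defaultdict(list)
    for codes in journal_codes.values():
        for code in set(codes):
            subsets[code].append(codes)  # the journal belongs to this code's subset
    result = {}
    for code, assigned in subsets.items():
        flat = [c for codes in assigned for c in codes]
        j, sc = len(assigned), len(flat)
        result[code] = {"J": j, "SC": sc, "USC": len(set(flat)), "DOMI": j / sc}
    return result


def major_category_mean(minor_indices, index_name):
    """Average one index over minor categories sharing the first two ASJC digits."""
    groups = defaultdict(list)
    for code, values in minor_indices.items():
        groups[code // 100].append(values[index_name])
    return {major: sum(vals) / len(vals) for major, vals in groups.items()}


minor = indices_by_minor_category(journals)
major_domi = major_category_mean(minor, "DOMI")
print(minor[3402])     # indices for the subset filtered on ASJC 3402
print(major_domi[34])  # mean DOMI over the 34** minor categories
```

Supergroup values could be obtained the same way by grouping minor categories under a supergroup label instead of the two-digit prefix.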
The subject indices (J, SC, USC, and DOMI) were used to test the following hypothesis:
H0. There is no significant difference in the mean values of the subject indices (J, SC, USC, and DOMI) across the supergroups (Health Sciences, Life Sciences, Physical Sciences, and Social Sciences).
We tested this hypothesis to determine whether there are statistically significant differences in the mean values of the subject indices between the supergroups, providing insights into the diversity and dominance characteristics of Scopus-indexed journals within various subject areas of Scopus.
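To make the test concrete, the classic one-way ANOVA F statistic can be computed by hand as below. This is a sketch under stated assumptions: it implements the standard equal-variance form, the exact variant used in the study is not specified in this section, and the input values are toy numbers rather than the study's data.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares: group sizes times squared mean deviations.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared deviations from each group's own mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy index values for three hypothetical groups (not taken from the study).
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")  # F(2, 6) = 13.00
```

A large F relative to the F distribution with these degrees of freedom leads to rejecting H0; obtaining the p-value requires the F distribution's survival function, which is omitted here to keep the sketch dependency-free.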
4. Results
4.1. Scopus Subject Categories and Journals Assigned with ASJC Codes
Table 1 provides an overview of the ASJC codes, major subject categories, the corresponding supergroups in Scopus, and the number of minor subject categories associated with each major subject category. At the highest level, there are four supergroups: Physical Sciences, Health Sciences, Social Sciences, and Life Sciences. These supergroups encompass a total of 27 major subject categories, each identified by an ASJC code. The special category 10** has only one minor subject category, 1000, which can be considered both a major and minor subject category.
The ASJC codes for the minor subject categories and their descriptions are available in [4]. Kim [14] demonstrated a wide variation in the number of journals associated with major subject categories and supergroups of subject categories. Consistent with that study, Table 1 shows notable variation in the number of minor subject categories in each supergroup, reflecting the diverse nature of journals and subjects within each supergroup. The Physical Sciences supergroup consists of the highest number of minor categories, while the Social Sciences supergroup has the lowest number of minor subject categories. It is important to note that in delineating the supergroups, our goal was to conform to the classification in Scopus’ All Science Journal Classification (ASJC) system. Certain aspects of this classification may be subject to interpretation and differing views.
Figure 1 shows the frequency of journals indexed by Scopus and the number of subject categories used to categorize individual journals. The average number of subject categories used to categorize journals is 2.37, meaning that journals are assigned 2.37 subject categories on average. The most commonly used number of subject categories per journal is two. The journal Latin American Research Review had the highest number of assigned subject categories (11 ASJC codes). However, journals assigned more than six subject categories are rare, as shown in this figure.
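The summary statistics behind such a frequency plot (mean, most common value, and counts per value) can be sketched as follows. The input reuses the per-journal code counts from the seven-journal Equine example in the Methodology section, not the full Scopus list, so the numbers differ from those reported for Figure 1.

```python
from collections import Counter
from statistics import mean, mode

# Number of ASJC codes per journal in the Equine example (toy input).
codes_per_journal = [1, 2, 1, 1, 1, 4, 1]

distribution = Counter(codes_per_journal)  # value -> number of journals
average = mean(codes_per_journal)          # mean categories per journal
most_common = mode(codes_per_journal)      # most frequent number of categories

print(sorted(distribution.items()))  # [(1, 5), (2, 1), (4, 1)]
print(round(average, 2), most_common)  # 1.57 1
```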
4.2. Ranks of Minor Subject Categories Based on Subject Category Index
Table 2 presents the distribution and characteristics of minor subject categories in relation to the subject category indices. The table displays the top and bottom five subject index values of minor subject categories. Multidisciplinary (ASJC 1000) stands out with the highest DOMI value of 0.7, indicating its dominance relative to other subject categories in the filtered subset. Despite its name suggesting connections to various categories, the high DOMI indicates its strong presence within the subset, with relatively fewer co-occurring subject categories. Thus, the high dominance of this subject category implies that it is not commonly associated with many other subject categories, despite its cross-disciplinary connotation. A lower DOMI implies that the subject category of interest is commonly associated with a broader spectrum of related subject categories. For instance, Reviews and References (medical) stands out as a subject category with a lower DOMI, suggesting a higher degree of co-occurrence with other subject categories.
Among the top five subject categories, History (ASJC 1202) and General Medicine (ASJC 2700) have the highest total numbers of subject category instances (SC), with 4221 and 4220, respectively. History (ASJC 1202) has the highest value in SC but does not appear in the top 5 of USC. Conversely, Computer Science Applications (ASJC 1706) has a high value in USC but does not appear in the top 5 of SC. This suggests that History co-occurs with a smaller number of distinct subject categories than Computer Science Applications, which co-occurs with a wider range of subject categories but has a lower SC than History. General Medicine (ASJC 2700) appears in the top five lists for J, SC, and USC, indicating the broad scope and multidisciplinary nature of research within this area.
Overall, subjects with lower values in J, SC, and USC tend to exhibit a narrower and more specialized focus in their research, as exemplified by Respiratory Care (ASJC 3615). In general, health science subjects are present in both the top and bottom five subject categories across all indices, underscoring the broad diversity of indexed subject categories within the field. The complete list of minor subject categories and their subject category indices is shown in the Supplementary Material Table S1.
4.3. Relationships among Subject Category Indices
The scatter plots in Figure 2 provide information about the relationships between subject category indices. They show clusters of subject categories with similar index values and highlight areas of high density. Each point in these subgraphs represents a subject category. In general, the indices have a wide range of values, indicating considerable variation. Particularly noticeable are the outliers for subject categories with extreme values for J, SC, USC, or DOMI. It is worth noting that most subject categories have DOMI values between 0.25 and 0.5, indicating moderate dominance in their respective fields.
The correlation analysis shows that while J, SC, and USC are positively correlated with each other, DOMI does not exhibit any meaningful correlation with the other subject category indices. J and SC have the highest correlation, with an R-squared value of 0.95 and a correlation coefficient (r) of 0.98. The linear relationship in the SC versus J scatterplot implies that as the number of journals increases, the total number of subject category instances grows proportionally. This finding underscores the close link between the quantity of journals and the number of assigned subject categories within the Scopus-indexed dataset. The analysis also shows minimal correlations between DOMI and J (0.089) and between DOMI and SC (−0.041). There is a weak negative correlation (−0.2) between DOMI and USC. The low correlations between DOMI and J, SC, and USC suggest that the dominance of a subject category within its filtered subset (DOMI) is not strongly tied to the sheer number of journals, subject category instances, or unique subject categories.
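The relationship between r and R-squared quoted above can be illustrated with a small Pearson correlation routine. The J and SC values below are toy numbers for five hypothetical subject categories, not values from the study; for a simple linear regression with one predictor, R-squared equals the square of r.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Toy J and SC values for five hypothetical subject categories.
j_values = [10, 20, 30, 40, 50]
sc_values = [25, 45, 80, 95, 130]

r = pearson_r(j_values, sc_values)
r_squared = r ** 2  # coefficient of determination for a one-predictor linear fit
print(round(r, 3), round(r_squared, 3))
```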
Examining outliers within subject category indices reveals the particular characteristics and dominance patterns of specific subject categories. Outliers can be visibly detected in this figure. The significance of outliers lies in their deviation from the typical range of values, suggesting that these subject categories possess unique characteristics that set them apart. This includes not only those that have extreme values on the x-axis or y-axis but also those that appear far from the linear regression lines. For example, the top five minor subject categories with a high DOMI mentioned in Table 2 can be traced near the top of the subplots in which DOMI is plotted. These include 1000 Multidisciplinary (0.7), 3402 Equine (0.64), 3500 General Dentistry (0.63), 3404 Small Animals (0.61), and 2900 General Nursing (0.58). Analyzing such outliers enriches our insights into the diversity and distinctive features exhibited by certain subject categories within Scopus-indexed journals. Equine, in particular, also has a low USC and stands out as an outlier in the top left corner of the lower right subplot in Figure 2, contributing to its distinctive position in the overall distribution of subject categories within the dataset.
4.4. Subject Category Indices of Major Subject Categories and Supergroups
Figure 3 shows subject category indices for major subject categories. As shown, the Multidisciplinary (ASJC 10**) subject category has the highest DOMI value of 0.7, indicating a high level of dominance and specialization in the multidisciplinary research domain. Within the Health Sciences supergroup, both the Veterinary and Dentistry subject categories exhibit relatively high J and SC values compared to their USC values. The Veterinary subject category particularly stands out with a high DOMI value of 0.48, suggesting a narrower focus and specialization compared to other subject categories. In contrast, within the Life Sciences supergroup, the Biochemistry, Genetics, and Molecular Biology subject category shows a relatively low DOMI value of 0.28, suggesting a more distributed and diverse landscape within this subject category without a single dominant research area.
Regarding J and SC, Arts and Humanities and Social Sciences stand out the most. However, their DOMIs (0.33 and 0.34) are moderate, suggesting that the focus and specialization of these subject categories are considered moderate among the supergroups, and the journals are categorized at a relatively moderate level with other subject categories.
Table 3 indicates variations in subject category indices (J, SC, USC, and DOMI) within subject categories among the supergroups—Health Sciences, Life Sciences, Physical Sciences, and Social Sciences. These calculations were based on the mean of the subject category indices of minor subject categories.
Table 3 presents the results of a one-way analysis of variance (ANOVA) examining the differences in subject indices across the supergroups. The Social Sciences (SS) supergroup stands out with higher mean values of J, SC, and USC than the other supergroups. This suggests a larger number of journals, subject category instances, and unique subject categories within the Social Sciences field. Furthermore, the Health Sciences (HS) supergroup shows a higher dominance (DOMI = 0.37), while Physical Sciences (PS) has the lowest DOMI (0.29). The hypothesis stated earlier in the methodology section was tested using the one-way ANOVA presented in Table 3. The results revealed significant differences in the mean values of subject indices (J, SC, USC, and DOMI) across the supergroups (Health Sciences, Life Sciences, Physical Sciences, and Social Sciences), as evidenced by the rejection of the null hypothesis (J: F(3, 17.3) = 1.9 × 10⁻¹⁰, p < 0.001; SC: F(3, 20.11) = 5.51 × 10⁻¹², p < 0.001; USC: F(3, 22.08) = 4.8 × 10⁻¹³, p < 0.001; DOMI: F(3, 35.28) < 2 × 10⁻¹⁶, p < 0.001). These findings suggest substantial variations in subject indices among the supergroups in the Scopus database.
5. Discussion
This study aimed to investigate the characteristics of a journal classification method that uses a multiple-category hierarchical scheme, utilizing various subject category indices, namely J, SC, USC, and DOMI. It is worth noting the general characteristics of these subject category indices. One of the fundamental characteristics of journals classified using a multiple-category hierarchical scheme is that J cannot be greater than SC, and USC cannot be greater than SC either. Furthermore, while each subject category index offers its own utility, DOMI depends on the values of J and SC. Thus, subject categories with high DOMI have J values close to their SC values, meaning that few additional categories are assigned per journal, whereas subject categories with low DOMI have SC values that are large relative to J.
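The bounds noted above follow from simple counting. The following is a sketch, assuming each journal in the filtered subset lists any given ASJC code at most once and writing $k_i \ge 1$ (notation introduced here, not in the original) for the number of codes assigned to journal $i$:

```latex
SC = \sum_{i=1}^{J} k_i \;\ge\; J
\quad\Longrightarrow\quad
\mathrm{DOMI} = \frac{J}{SC} = \frac{1}{\bar{k}} \in (0, 1],
\qquad
USC \;\le\; SC - J + 1 .
```

Under these assumptions, DOMI is the reciprocal of the mean number of categories per journal in the subset, and it equals 1 only when every journal carries the filter category alone. In the Equine example, SC − J + 1 = 11 − 7 + 1 = 5, which matches the reported USC of 5.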
In the case of Scopus, empirical results based on filtered subsets revealed observable characteristics of subject categories concerning subject category indices (J, SC, USC, and DOMI). Lower values in J, SC, and USC tended to indicate a narrower focus, while subject categories with higher values in these indices tended to be more general in nature. Although some subject categories with high DOMI values also exhibited lower values in J, SC, and USC (e.g., the subject category Equine), the distributional characteristics of subject categories concerning DOMI in relation to J, SC, and USC indicate a low or negligible correlation between DOMI and other subject category indices (J, SC, and USC). We emphasize that empirically observed low correlations between DOMI and other indices indicate that the dominance index explores distinctive facets of subject category characteristics, not closely associated with the sheer abundance of journals or subject categories. Thus, DOMI provides unique insights into the dominance patterns within subject categories beyond what can be inferred from basic quantity metrics.
Our results also revealed that variations in subject category indices among minor subject categories and differences among major subject categories underscore the importance of evaluating dominance and subject category features in hierarchical systems like Scopus. When subject category indices were measured at different levels, they exhibited distinct characteristics. At the supergroup level, we found that minor subject categories within the Health Sciences are more specialized and have a higher DOMI than those within the Physical Sciences, although the Physical Sciences supergroup contains more minor subject categories than the Health Sciences supergroup.
In general, minor subject categories showed more variation in subject category indices compared to their upper-level subject categories. For example, while the minor subject category General Nursing (ASJC 2900) had a DOMI of 0.58, Assessment and Diagnosis (ASJC 2903) had a DOMI of 0.21. However, both of these minor subject categories belong to the major subject category of 29** Nursing, which has a DOMI of 0.35. Thus, the averaged DOMI of a major subject category can mask a wide range of DOMI values among its minor subject categories.
Regarding the characteristics of indices, J serves as the most primitive index, aiding researchers in identifying journals with a higher total number of publications within a specific subject category. This information assists authors in targeting journals actively publishing research in their field of study. By definition of DOMI, J and SC are influential indicators in calculating the DOMI, while USC reflects the overall degree of unique subject categories that are assigned to the journal in a filtered dataset. These indices, when evaluated independently, are primitive and may have limited practical applications. However, their true utility unfolds when these indicators are used together to provide a more comprehensive understanding of the dominance characteristics of a filtered dataset derived by using a particular category as a filtering criterion.
One potential application of the proposed indices is that they can be used to assess the degree to which the subject category of interest is connected with other subject categories when using journal-based ranking websites, such as SJR. For instance, the subject category of Reviews and References (medical) [ASJC 2744], with J, SC, USC, and DOMI values of 2, 10, 8, and 0.2, respectively, exhibits a low DOMI, indicating connections with other subject categories. From these values, we can also see that few journals are indexed in this category (J = 2) and that other subject categories are frequently assigned together with it (SC = 10, USC = 8). Thus, using all these indices provides a more comprehensive picture of journals and subject categories associated with this particular subject category of interest.
Furthermore, the varying degree of dominance of subject categories in the Scopus classification has an important implication for the selection of subject categories in research studies. Consider, for example, the subject category of Library and Information Science (ASJC 3309). With a DOMI value of 0.4 and a ranking of 39th out of 333, Library and Information Science demonstrates relative dominance compared to other minor subject categories. Journals listed under this subject category are less likely to be associated with other subject categories. Consequently, researchers should thoughtfully weigh the overall dominant characteristic of the subject category and its extent of association with other subject categories when choosing journals for their studies. Journals within the Library and Information Science subject category, marked by a relatively high DOMI, offer valuable and relatively specialized literature within the field.
There are two limitations to this study that are worth mentioning. First, the subject category indices presented in this study rely on Scopus-indexed journals. As Gómez et al. [19] pointed out, journals may evolve, change their focus, or introduce new interdisciplinary perspectives, making it challenging to assign them to fixed disciplinary categories. Thus, as the data changes and new journal lists become available, the subject category indices will need to be recalculated.
Second, issues related to indexing were not explicitly considered, and it was assumed that subject categories accurately represented the journals. Solely relying on indexing words may lead to inaccuracies, known as the “indexer effect” [9]. Identifying journals’ fields of study or subfields of study through subject categories is inherently challenging and subject to subjective interpretation [19]. There are studies that have raised concerns regarding the broad categories and quality of the Scopus journal classification method [8,10,20].
6. Conclusions
We recognize that more sophisticated bibliometric methods may reveal deeper problems in journal classification; however, our selected measures provide a straightforward and interpretable assessment of diversity and dominance. The strength of our proposed approach lies in the generalizability of the subject category indices, which can be applied to other classification systems with multiple subject categories. It can also be extended to examine the dominance characteristics of research types that use multiple-category hierarchical schemes. In PubMed, for example, articles are categorized at the article level by study type or publication format, which provides researchers with a structured framework for navigating and exploring the literature in their areas of interest [21,22]. Further empirical studies can address some of the limitations mentioned above and contribute to a better understanding of the prevalent characteristics of subject categories.