Suborganizations of Institutions in Library and Information Science Journals

In this paper, we analyze Web of Science data records of articles published from 1991 to 2010 in library and information science (LIS) journals. We focus on the addresses of these articles' authors and create citation and collaboration networks of departments, which we define as the first suborganization of an institution. We present various rankings of departments (e.g., by citations, times cited, PageRank, publications, etc.) and highlight the most influential of them. The correlations between the individual rankings are also shown. Furthermore, we visualize the most intense citation and collaboration relationships between "LIS" departments (many of which are not genuine LIS departments but merely affiliations of authors publishing in journals covered by the specific Web of Science category) and give examples of two basic research performance distributions across departments of the leading universities in the field.


Introduction and Related Work
Bibliometric studies can roughly be conducted at three levels: individual researchers (micro-level), institutions (meso-level), and countries (macro-level). Of course, these "basic" levels can have their own sublevels (e.g., regions of a country) or they can be grouped into supralevels (such as continents). There have been many bibliometric analyses at various levels, but at the meso-level those analyses seem to have concentrated mainly on institutions as such, or they have not been truly large-scale, i.e., involving tens or hundreds of thousands of items. This study tries to bridge this gap in the field of library and information science (LIS) by analyzing several tens of thousands of bibliographic records at the meso-level and concentrating on the suborganizations of institutions. An institution (or primary organization) usually has an organizational structure comprising suborganizations (level 1) that may themselves consist of other suborganizations (level 2). The depth of this hierarchy varies: some institutions have a relatively flat structure, while others include suborganizations at even deeper levels. A typical academic institution (a university) may be divided into faculties, schools, departments, laboratories, and research groups, which are difficult to capture in scientometric studies because they appear inconsistently (or not at all) in authors' addresses. For the sake of simplicity, and as explained later on, we call level-1 suborganizations "departments". The main research questions of this study are the following: (a) Do Web of Science (WoS) data contain enough information to analyze the scientific performance and collaboration of the departments with which authors of journal articles in the LIS research area are affiliated (hereafter called "LIS" departments)? (b) What are the most intense citations and collaborations between "LIS" departments? (c) Which "LIS" departments are the most highly ranked by various indicators based on publications from 1991-2010? Answers to these questions are given in the following sections.
Bibliometric analysis of library and information science institutions has a long history in the United Kingdom. For instance, Bradley et al. [1] measured the publication patterns of the Department of Information Studies at the University of Sheffield, Holmes and Oppenheim [2] analyzed the citation impact of British LIS departments, and Oppenheim [3] ranked British LIS schools by citation impact. Seng and Willett [4] conducted a citation analysis of a small number of UK LIS departments, and UK LIS departments were also investigated by Webber [5]. British LIS departments were further analyzed webometrically by Thomas and Willett [6] and by Arakaki and Willett [7]. As for other regions of the world, Aina and Mooko [8] analyzed a small set of top African LIS researchers and identified the centers of African LIS research. Another small group of LIS publications was investigated by Herrero-Solana and Ríos-Gómez [9] to identify the most productive Latin American universities and departments. Meho and Spurgin [10] ranked American LIS schools by the visibility of their faculty in various databases, and Yazit and Zainab [11] reported on the publication productivity in LIS of some Malaysian institutions. There have been two large-scale studies, in which Yan and Sugimoto [12] explored citation patterns of various LIS institutions and He et al. [13] explored tens of thousands of LIS publications, but both remained at the institutional level. To our knowledge, this study is the only large-scale one at the departmental level. The visualization tools used in this article are discussed by Shannon et al. [14].

Data and Methods
In November 2012, we manually queried the Web of Science web interface to obtain records of all articles published in the period 1991-2010 and indexed in the Social Sciences Citation Index in the research area "Information Science & Library Science" (ISLS). We were interested in the "article" document type only. In this way, we acquired plain text metadata on 46,800 journal articles. (Saving to plain text took about 50 min because anyone with a Web of Science subscription can save at most 500 records at once.) These metadata typically include an article's title, journal name, volume, issue, pagination, and year, as well as its authors' names, addresses, times cited count, and some other information. An example of a journal record is presented in Figure 1. As we can see, only some of the cited references (CR) can be identified unambiguously, in this case with a digital object identifier (DOI). The remaining references can be identified using the volume, issue, and pagination, or cannot be identified at all. To create a citation network from the article records retrieved (a basic, root, or seed set of articles), we needed one more tool. In the next step, we therefore used the Web Services Lite application programming interface (API) to retrieve the records of articles citing the articles in the seed set. This API is available for free, after registration, to anyone with a Web of Science subscription. In total, we obtained 175,139 citing article records. The citing article records contain somewhat less information than the plain text seed article records; in particular, they contain no author address information. On the other hand, citing article records are structured similarly to XML records. See Figure 2 for an example of a citing article record. In the example, an article with ID (UT) 000283981500004 is cited by an article with ID 000283981500001. These IDs can then be matched with "UT WOS" in the seed article records (see the bottom of Figure 1) and, as a result, a complete citation network of the articles in the root set can be constructed. This citation graph had 94,836 edges, i.e., slightly over 54% of all citations were citations within the seed set. Since this paper is concerned with departments, the research depends on the extent to which affiliations and addresses of article authors are systematically present in the records we analyzed.
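The UT matching step described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes the seed article UTs and the (citing UT, cited UT) pairs have already been extracted from the plain text export and from the citing article records, respectively, and the function name `build_seed_citation_network` is hypothetical. In the toy data, the two UTs from Figure 2 are simply assumed to be seed articles; "X1", "X2", and "S3" are invented IDs.

```python
def build_seed_citation_network(seed_uts, citation_pairs):
    """Keep only citations whose citing and cited articles are both seed articles.

    seed_uts: iterable of WoS accession numbers (UT) of the seed articles.
    citation_pairs: iterable of (citing_ut, cited_ut) tuples, one per citation
        found in the citing article records (duplicate pairs are collapsed).
    Returns the set of within-seed edges and their share of all citations.
    """
    seeds = set(seed_uts)
    pairs = set(citation_pairs)
    edges = {(citing, cited) for citing, cited in pairs
             if citing in seeds and cited in seeds}
    share = len(edges) / len(pairs) if pairs else 0.0
    return edges, share


# Toy data (hypothetical): both Figure 2 UTs are treated as seed articles.
seed_uts = ["000283981500004", "000283981500001", "S3"]
citation_pairs = [
    ("000283981500001", "000283981500004"),  # both in the seed set
    ("X1", "000283981500004"),               # citing article outside the seed set
    ("X2", "S3"),                            # citing article outside the seed set
]
edges, share = build_seed_citation_network(seed_uts, citation_pairs)
print(edges)  # {('000283981500001', '000283981500004')}
print(share)  # 0.3333333333333333
```

With the counts reported above, the same ratio is 94,836 / 175,139, about 0.541, i.e., slightly over 54%.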
There is no genuine affiliation information in the records, but there is often information on authors' addresses, denoted with C1 and RP as in Figure 1. RP means a "reprint address", which is the address of the corresponding author (usually, but not always, the first author), and C1 is a field containing the authors' addresses. Reprint and "normal" addresses may sometimes be the same, for instance when there is only one author. In total, almost 88% of publications had some address information associated with them, and 65% had both a reprint and a normal address. 85% of publications had a reprint address and 68% had at least one normal address, but the latter percentage varied considerably across the years under study, as can be seen from Figure 3. While the share of publications with some address information was about 90% throughout the period, the share of publications with at least one normal address has only reached a similar level since 1998. Before 1998, a high percentage of publications (from 45% to 70%) had a reprint address but no normal address; this was almost negligible in later years, and so was the number of articles having a normal address but no reprint address throughout the whole period 1991-2010. As can be seen in Figure 1, addresses have a relatively clear structure, starting with an institution followed by suborganizations (from bigger to smaller ones) and ending with a city and a country. Organizations (institutions) and suborganizations are written using standardized abbreviations and are delimited with commas, as are cities and countries. In our experience, reprint addresses often also include other information such as street names and numbers, or state or province names. This additional information can distort the common address pattern "institution, suborganization1, …, suborganizationN, city (+ZIP), country", but based on our experiments with random address samples and a manual check of the pattern's correctness, the pattern is violated in only a few percent of cases, even if reprint addresses are included. As a result, we made an approximation and treated all addresses in all publications in the period 1991-2010 as having an institution as their first item, a city and a country as their last items, and suborganizations in between. The number of suborganizations can vary, as shown in Table 1. In the data under study, an institution (main organization) can have up to seven suborganizations associated with it, but most affiliations consist of an institution and one suborganization. Thus, before all the experiments whose results are reported in the next section, we retained suborganization 1 in each address and discarded the suborganizations of higher levels. We call the pair "institution; suborganization 1" a "department" because this is typically what it represents.
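The address approximation above can be sketched as a small Python function. It is a hypothetical illustration (the name `department_from_address` is ours), and it assumes the input is a single C1 or RP address string from which any author-name prefix has already been removed. Returning `None` for addresses without any suborganization is our choice, not something stated in the text, and the sample addresses are illustrative rather than taken from the data set.

```python
def department_from_address(address):
    """Return "institution; suborganization 1" for a WoS-style address.

    Assumes the pattern "institution, suborganization1, ..., suborganizationN,
    city (+ZIP), country": the first item is the institution, the last two are
    the city and the country, and everything in between is a suborganization.
    """
    items = [item.strip() for item in address.rstrip(". ").split(",")]
    items = [item for item in items if item]
    if len(items) < 4:
        # Institution + city + country only: no suborganization to keep.
        return None
    institution, suborganizations = items[0], items[1:-2]
    # Keep suborganization 1 and discard those of higher levels.
    return f"{institution}; {suborganizations[0]}"


print(department_from_address(
    "Indiana Univ, Sch Lib & Informat Sci, Bloomington, IN 47405 USA."))
# Indiana Univ; Sch Lib & Informat Sci
print(department_from_address("Univ X, Fac A, Dept B, City, Country"))
# Univ X; Fac A
print(department_from_address("Univ X, City, Country"))
# None
```

When a reprint address contains extra items such as a street name, the second item may not be a suborganization at all; this is the few-percent error that the approximation accepts.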

Results and Discussion
The citation graph of departments we obtained had 18,291 nodes and 154,744 edges. The graph is directed and the edges are weighted, with an average weight of 2.62 per edge. The total sum of edge weights in the graph (404,755) is the total number of citations between departments. In Table 2 we can see the departments that received the most citations: "Indiana Univ; Sch Lib & Informat Sci", "Leiden Univ; Ctr Sci & Technol Studies", and "Univ Sheffield; Dept Informat Studies". However, the numbers of publications by which the departments are represented (see the last column in Table 2) vary significantly, so "Leiden Univ; Ctr Sci & Technol Studies", with 3722 citations and 84 publications, is actually relatively more cited than "Indiana Univ; Sch Lib & Informat Sci", with 4334 citations and 243 publications (44 citations per publication compared to 18). But the measure of citations per publication is biased towards departments with fewer publications. For instance, the relatively most cited department in Table 2 is "Lib Hungarian Acad Sci; Bibliometr Serv" (position 33) with 165 citations per publication.
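One way to derive the weighted citation graph of departments from the article-level citation network is sketched below in Python. The aggregation rule is our assumption, not a description of the authors' implementation: each article-to-article citation adds 1 to the edge weight from every department of the citing article to every department of the cited article. The toy data reproduce the hypothetical case discussed in this section, a department with one article that is cited once by an article with three departments, where the department-level citations differ from the article-level times cited (TC).

```python
from collections import Counter
from itertools import product


def department_citation_graph(article_citations, article_departments):
    """Aggregate (citing article, cited article) pairs into weighted department edges."""
    weights = Counter()
    for citing, cited in article_citations:
        citing_depts = set(article_departments.get(citing, ()))
        cited_depts = set(article_departments.get(cited, ()))
        for src, dst in product(citing_depts, cited_depts):
            weights[(src, dst)] += 1
    return weights


def citations_received(weights):
    """Sum incoming edge weights: the "citations" indicator of each department."""
    received = Counter()
    for (_, dst), weight in weights.items():
        received[dst] += weight
    return received


def times_cited(article_departments, article_tc):
    """Sum the article-level TC counts over all articles of each department."""
    totals = Counter()
    for article, depts in article_departments.items():
        for dept in set(depts):
            totals[dept] += article_tc.get(article, 0)
    return totals


# Hypothetical example: department "D" has one article, "a1", which is cited once
# by article "a2", affiliated with three departments "E", "F", and "G".
article_departments = {"a1": ["D"], "a2": ["E", "F", "G"]}
weights = department_citation_graph([("a2", "a1")], article_departments)
print(citations_received(weights)["D"])                   # 3
print(times_cited(article_departments, {"a1": 1})["D"])  # 1
print(sum(weights.values()) / len(weights))               # 1.0 (average edge weight)
```

On the full graph, the same ratio of total weight to edge count gives 404,755 / 154,744, about 2.62, and dividing citations by publications gives about 44 for Leiden (3722 / 84) and about 18 for Indiana (4334 / 243).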
As far as the citations between individual departments are concerned, we can see the most intense of them in Figure 4. The size of the nodes is based on the "times cited" (see below for an explanation) of a department, and the thickness of the edges depends on the number of citations from one department to another. We can notice that there are two big components, one centred around "Wolverhampton Univ; Sch Comp & Informat Technol" and the other around "Penn State Univ; Sch Informat Sci & Technol". The most intense citations as such are those from "Wolverhampton Univ; Sch Comp & Informat Technol" to "Indiana Univ; Sch Lib & Informat Sci", "Victoria Univ Wellington; Sch Commun & Informat Management", and "Univ Western Ontario; Fac Informat & Media Studies". There are also intra-institutional citations, such as from "Wolverhampton Univ; Sch Comp & Informat Technol" to "Wolverhampton Univ; Sch Comp & Informat Sci" or from "Penn State Univ; Coll Informat Sci & Technol" to "Penn State Univ; Sch Informat Sci & Technol", but these may sometimes be self-citations of departments that changed their names or whose names are used inconsistently. These errors are inherent in the Web of Science data and could be removed only with a huge amount of manual effort. In total, we found that 4.3% of all citations were intra-institutional. The citations shown in Table 2 are based on the citation graph of departments, which was generated from the 46,800 core publication records retrieved. Citations from publications outside of this core are not counted, but they are included in the "Times Cited" indicator present in each publication record retrieved (TC in Figure 1). The ranking of departments by times cited differs from that in Table 2; the top departments are presented in Table 3. The best three departments are "Univ Minnesota; Carlson Sch Management", "Harvard Univ; Sch Med", and "Univ Maryland; Robert H Smith Sch Business". Again, departments with fewer publications often have higher times cited counts. An extreme case is "Univ So Calif; Knowledge Syst Lab", with only one publication and the largest times cited count in Table 3. Note that the times cited count is not always greater than or equal to citations, because the two indicators are based on different citation graphs: the citation graph of articles and the citation graph of departments, respectively. Imagine a department affiliated with only one article that is cited just once, from an article with which three distinct departments are affiliated. In that case, the cited department's times cited count is 1 and its citations indicator is 3. Thus, the ranks of individual departments in the two rankings can differ significantly. For example, "Univ So Calif; Knowledge Syst Lab" is ranked 10th by times cited but 396th by citations, and "Lib Hungarian Acad Sci; Bibliometr Serv" is 33rd by citations but 155th by times cited. One possible interpretation is that "Univ So Calif; Knowledge Syst Lab" is cited relatively more by researchers from other scientific fields than from the library and information science community, whereas "Lib Hungarian Acad Sci; Bibliometr Serv" is cited relatively more from within the community than from outside it. There is also one "department" ranked highly by times cited, namely "The Scientist; 3600 Market St", which is wrongly identified as such from frequent addresses associated with "The Scientist" journal articles in WoS data and which is ranked very low by citations. Nevertheless, the correlation between the department rankings by citations and by times cited is still rather high, as will be shown later on. Many of the present departments are not genuine LIS departments but affiliations of authors publishing in journals categorized as ISLS by WoS, which shows the multidisciplinarity of this field. On the other hand, some LIS research is also published in other WoS categories not covered by this study. We did not attempt to disambiguate and/or unify the names of institutions and suborganizations; we used them as they were in WoS data. Instead, we tried to estimate the share of possible duplicate departments. The easiest way to do so was to calculate the similarities of all department names in three random samples of 500 departments using a well-known string similarity algorithm and then manually check the department pairs whose similarity reached a certain threshold. The determined share of duplicate departments was always below 1%. Thus, we believe that the absence of name disambiguation and unification (which is a very time-consuming task) does not
significantly affect the results of this study. Apart from citations, we can also inspect collaboration patterns. The most intense collaborations between departments are depicted in Figure 5, where the node size depends on the publication count of a department and the edge thickness depends on the number of collaborations. The three most intense collaborations occur between "Univ Illinois; Coordinated Sci Lab" and "Univ Illinois; Grad Sch Lib & Informat Sci" (an intra-institutional collaboration), between "Brigham & Womens Hosp; Div Gen Med & Primary Care" and "Harvard Univ; Sch Med", and between "Harvard Univ; Sch Med" and "Harvard Univ; Sch Publ Hlth" (also an intra-institutional collaboration). "Harvard Univ; Sch Med" is the "centre" of the biggest community in Figure 5, collaborating with four "Brigham & Womens Hosp" departments, with another "Harvard Univ" department, and with "Childrens Hosp; Div Emergency Med". The share of intra-institutional interactions is substantially greater for collaborations than for citations: we found that almost 22% of all 22,569 collaborations were intra-institutional. As for the strength of the relationship between citations and collaborations, it does not seem meaningful to draw any conclusions from our data, since only about 6% of collaborations occurred more than once and only about 1.5% of citations occurred more than ten times.
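The collaboration network can be built in a similar way. The sketch below is a hypothetical illustration rather than the authors' procedure. It assumes that each pair of distinct departments appearing on the same article counts as one collaboration occurrence, that a "collaboration" in the percentages above is a distinct department pair (an edge), and that two departments belong to the same institution when the part before "; " is identical. The toy articles are invented but reuse department names mentioned above.

```python
from collections import Counter
from itertools import combinations


def collaboration_graph(article_departments):
    """Count, for each unordered pair of distinct departments, the articles they share."""
    weights = Counter()
    for depts in article_departments.values():
        for pair in combinations(sorted(set(depts)), 2):
            weights[pair] += 1
    return weights


def institution_of(department):
    """The institution is the part of "institution; suborganization 1" before "; "."""
    return department.split("; ", 1)[0]


def intra_institutional_share(weights):
    """Share of distinct collaborating pairs whose departments share an institution."""
    if not weights:
        return 0.0
    intra = sum(1 for a, b in weights if institution_of(a) == institution_of(b))
    return intra / len(weights)


def repeated_share(weights):
    """Share of distinct collaborating pairs that occurred on more than one article."""
    if not weights:
        return 0.0
    return sum(1 for w in weights.values() if w > 1) / len(weights)


articles = {  # hypothetical articles
    "p1": ["Univ Illinois; Coordinated Sci Lab", "Univ Illinois; Grad Sch Lib & Informat Sci"],
    "p2": ["Univ Illinois; Coordinated Sci Lab", "Univ Illinois; Grad Sch Lib & Informat Sci"],
    "p3": ["Harvard Univ; Sch Med", "Brigham & Womens Hosp; Div Gen Med & Primary Care"],
    "p4": ["Harvard Univ; Sch Med", "Harvard Univ; Sch Publ Hlth"],
}
collab = collaboration_graph(articles)
print(intra_institutional_share(collab))  # 0.6666666666666666
print(repeated_share(collab))             # 0.3333333333333333
```

Sorting each article's departments before forming pairs makes the pairs unordered, so the same two departments always map to one edge regardless of author order.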
In addition to the rankings by citations and times cited, we also created other rankings of "LIS" departments based on the following indicators: Publications (the number of publications), Indegree (like citations but with all weights in the citation graph of departments set to 1), AvgTimesCited (average times cited per publication), HindexByTimesCited (the h-index as defined by Hirsch [15], based on times cited), HindexByEdges (based on citations within the graph), HITS [16], PageRank [17], and Weighted PageRank [18]. Of these eight additional rankings, we only show the top 40 departments by PageRank and weighted PageRank in Table 4, and Spearman's rank correlations between all the rankings in Table 5 (all significant at the 0.01 level, two-tailed).
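Two of these indicators can be sketched in a few lines of Python. The h-index follows Hirsch's definition [15]: the largest h such that h items have at least h citations each. For PageRank, the sketch uses a plain power iteration with a damping factor of 0.85 and spreads the rank of departments without outgoing citations uniformly over all departments; the weighted variant makes transition probabilities proportional to edge weights. These parameter choices and this particular weighted formulation are our assumptions and may differ from the exact definitions behind Table 4 and in [18]. The toy graph is invented.

```python
def h_index(citation_counts):
    """Largest h such that h items have at least h citations each (Hirsch)."""
    ranked = sorted(citation_counts, reverse=True)
    return sum(1 for position, count in enumerate(ranked, start=1) if count >= position)


def pagerank(weights, damping=0.85, weighted=True, tol=1e-12, max_iter=500):
    """Power-iteration PageRank on a directed graph given as {(src, dst): weight}.

    With weighted=False every edge counts as 1 (as for Indegree); with
    weighted=True the walker follows an edge with probability proportional
    to its weight.
    """
    nodes = sorted({node for edge in weights for node in edge})
    if not nodes:
        return {}
    out = {node: {} for node in nodes}
    for (src, dst), weight in weights.items():
        out[src][dst] = weight if weighted else 1
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing edges is spread over all nodes.
        dangling = sum(rank[node] for node in nodes if not out[node])
        new = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for src, targets in out.items():
            total = sum(targets.values())
            for dst, weight in targets.items():
                new[dst] += damping * rank[src] * weight / total
        converged = sum(abs(new[node] - rank[node]) for node in nodes) < tol
        rank = new
        if converged:
            break
    return rank


graph = {("A", "C"): 3, ("B", "C"): 3, ("C", "A"): 1, ("B", "A"): 1}
print(h_index([10, 8, 5, 4, 3]))    # 4
scores = pagerank(graph)
print(max(scores, key=scores.get))  # C
```

In the toy graph, "B" cites "C" three times as often as "A", so "C" ranks first with weights, whereas the unweighted run treats both of B's edges equally and gives "A" and "C" the same score.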
The PageRank and weighted PageRank rankings are the most highly correlated of all, with a rank correlation coefficient of 0.996; the first difference between them is at rank 5, where "Haifa Univ; Dept Geog" appears by PageRank and "Univ Minnesota; Carlson Sch Management" by weighted PageRank. Otherwise, the rankings in Table 4 are quite similar to each other, but less so to the ranking by citations (correlation about 0.83) and even less to the ranking by times cited (around 0.69). PageRank-like algorithms (and also HITS) are iterative methods that depend on the structure of the citation graph of departments and are therefore much more closely related to citations than to times cited. Although the top departments shown in Table 4 do not resemble those in Tables 2 and 3, the overall rankings are still quite strongly correlated with all other rankings except Publications. The lowest correlation we found was between Publications and AvgTimesCited, only about 0.2. Publications is also the ranking most distant from all others, with an average correlation of 0.483.
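Spearman's rank correlation, as reported in Table 5, can be computed as the Pearson correlation of the ranks, with tied values receiving the average of the ranks they span. The pure-Python sketch below is our illustration, not the authors' code; it omits the significance test and assumes that neither indicator is constant (otherwise the coefficient is undefined). The indicator values are invented.

```python
def average_ranks(values):
    """Rank values from highest (rank 1) to lowest; ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical indicator values for five departments.
publications = [50, 10, 30, 2, 5]
cited = [400, 390, 150, 210, 30]
print(average_ranks([5, 5, 1]))             # [1.5, 1.5, 3.0]
print(round(spearman(publications, cited), 2))  # 0.5
```

For rankings without ties, this equals the familiar formula 1 - 6Σd²/(n(n² - 1)); for the toy data, Σd² = 10 and n = 5, giving 1 - 60/120 = 0.5.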
Finally, to conclude the section on results, in Table 6 we present examples of the most influential departments (by times cited) of the four leading universities with the greatest times cited counts in our LIS data set. These universities are "Univ Maryland", "Indiana Univ", "Georgia State Univ", and "Univ Minnesota". We can notice that there are basically two types of performance distribution at institutions: either there is one dominant department, like "Carlson Sch Management" at "Univ Minnesota", "Robert H Smith Sch Business" at "Univ Maryland", or, to a lesser extent, "Sch Lib & Informat Sci" at "Indiana Univ"; or there are several comparably well-performing departments, like "Coll Business Adm", "Robinson Coll Business", and "Dept Comp Informat Syst" at "Georgia State Univ". Even though this example is small, we hypothesize that influential institutions whose research influence is investigated at the level of departments fit into one of these two basic performance distribution schemes. In our future work on scientific performance and collaboration at the level of departments, we would like to focus on other fields of science, other publication sources (e.g., conference proceedings), and other time periods.

Figure 1. A sample journal article record.

Figure 2. A sample citing article record.

Figure 3. Numbers of publications with different types of addresses.


"Indiana Univ; Sch Lib & Informat Sci" is the best department in terms of citations, and "Univ Minnesota; Carlson Sch Management" is ranked first by times cited. The most intense citation relationship between two departments runs from "Wolverhampton Univ; Sch Comp & Informat Technol" to "Indiana Univ; Sch Lib & Informat Sci", and the most intense departmental collaboration occurs between "Univ Illinois; Coordinated Sci Lab" and "Univ Illinois; Grad Sch Lib & Informat Sci".

Table 1. Examples of various suborganizations of an institution.