RePEc has become an important bibliographic service for economics and related fields. A considerable amount of data has been collected regarding who authored which paper, where it was published, who reads it, and where it is cited. One way to use this wealth of data is to compute rankings of individuals, journals (and series), institutions, and even countries. Along with the growth of the underlying data, these rankings, even though they are still experimental, have grown in importance in the profession. Indeed, there is evidence that they are used more and more for evaluation purposes (promotion and tenure decisions) and even hiring. Also, country-specific rankings have been used in various professional publications and even the popular press.
It is therefore time for the methodology behind these rankings to be explained. While a criterion such as the number of citations may appear to be simple, it is necessary to understand how it is computed. Indeed, for ranking purposes in RePEc, self-citations are not counted, but citations to other versions of an article are counted. It is also important to understand how the citations are extracted, i.e., what citations can be considered in the statistics.
Compared with other ranking exercises, the present one also includes some criteria that are unique, such as those based on readership, those based on the number of authors citing, and those based on centrality among co-authors. It is also rare to find the same source being used both to establish impact factors of publications and rankings of authors or institutions. Finally, no other effort has included working papers, which have now become a very important way to disseminate research in economics, if not the most important.
The RePEc project would never have been possible without the efforts of the many volunteers that have participated in one way or another: the maintainers of the so-called RePEc archives who contribute the basic bibliographic data and all those who have contributed through their programming skills, making available hardware and/or bandwidth, giving advice or simply spreading the word about RePEc. RePEc is committed to honor the work of these volunteers by making sure their work will never be subject to fees, both for publishers and users, and will remain in the public domain.
The rest of the paper is structured in the following way. Section 2 describes how the various components of the data used in the rankings are gathered. Section 3 details the construction of the impact factors. Section 4 describes how articles and working papers can be ranked. The various criteria used to rank authors are introduced in Section 5, which also discusses the various ways these criteria can be aggregated and justifies the choices made for the “official” rankings. Sections 6, 7, and 8 present the procedures to rank, respectively, institutions, geographic regions, and finally other rankings. Section 9 takes a snapshot of the data and documents the concordance of the various rank criteria. Section 10 discusses how RePEc rankings differ from other rankings. Section 11
2. Data Gathering
This section describes how all the data underlying the rankings are gathered. All data come from RePEc and other projects related to RePEc. These data are continuously updated, and the rankings are refreshed on a monthly basis.
2.1. Bibliographic Data
The source of all the bibliographic data is RePEc. RePEc (Research Papers in Economics, http://repec.org/) was founded in June 1997 under the leadership of Thomas Krichel as a follow-up project to NetEc, founded in 1993. Under very little central management, publishers (commercial or academic) contribute the bibliographic data (called metadata) themselves using a common format. These data are provided through the servers of the publishers, which anybody can access and use. Thus RePEc is just a scheme to organize metadata and make it available in the public domain.
At the time of this writing, almost 1,500 archives were contributing metadata to RePEc, thus covering: 3,400+ series with 460,000 working papers, 1,500 journals with 720,000 articles, 15,500 book chapters, 12,000 books, and 2,700 software components, for a total of over 1,200,000 items. Almost 1,100,000 of them are available for download in full text.
So-called RePEc services are then allowed to use these data to freely provide public access to them. Several websites directly display the data collected through RePEc, the most popular being IDEAS [1], EconPapers [2], Inomics [3], and finally Socionet [4].
An email notification service for new on-line working papers is also available (NEP [5]). Finally, data gathered by RePEc are relayed through the Open Archives Initiative and therefore made available even more widely, but to services that do not specialize in economics, such as Google Scholar and Oyster.
RePEc data have so far been relatively under-used for the exploration of publishing in Economics, and only a few papers have exploited the dataset. [6] use RePEc data to compute some alternative rankings of economists, and in the last case to categorize them in archetypes. [9] use the RePEc journal impact factors to determine the impact of a publication on academic salaries in California universities. [10] use the RePEc author rankings to analyse the distribution of citations across authors within cohorts. Finally in this issue, [12] compare ISI and RePEc impact factors for econometrics journals and conclude that the 2-year ISI factors are not robust.
2.2. Author Data
For any ranking, one needs to collect information about the publications of an author. One great difficulty is the many ways an author’s name may be indexed. For example, John Maynard Keynes may be listed in the bibliographic metadata as
John Maynard Keynes
John M. Keynes
J. M. Keynes
Keynes, John Maynard
Keynes, John M.
Keynes, J. M.
and one can imagine many other ways, including misspellings. Variations are even more numerous if nicknames, titles or suffixes (Jr., Sr., III) are used or if accents are used. In addition, several people may have the same name, especially if the first name is abbreviated. Thus, an automated attribution of works to authors is bound to have a high level of errors. Human intervention is necessary here.
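To illustrate how quickly name variations multiply, the following minimal Python sketch generates the six forms listed above from a first, middle, and last name. The function `name_variants` is hypothetical and is not part of any RePEc service; it deliberately ignores nicknames, suffixes, accents, and misspellings.

```python
def name_variants(first: str, middle: str, last: str) -> list[str]:
    """Return common bibliographic spellings of a three-part name (illustrative only)."""
    f, m = first[0] + ".", middle[0] + "."
    given_forms = [
        f"{first} {middle}",  # John Maynard
        f"{first} {m}",       # John M.
        f"{f} {m}",           # J. M.
    ]
    # Each given-name form can precede the surname or follow it after a comma.
    return [f"{g} {last}" for g in given_forms] + [f"{last}, {g}" for g in given_forms]


variants = name_variants("John", "Maynard", "Keynes")
```

Even this simple rule yields six strings for a single author, and two authors sharing initials and a surname would produce overlapping sets, which is why an automated attribution remains error-prone.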
The best people to perform this intervention are the authors themselves. To do this, they register with the RePEc Author Service at http://authors.repec.org/. In doing so, they provide contact details, their affiliations (see next section), and their name variations expected in the metadata. The search engine then suggests to them works from the RePEc metadata that match the name variations, works that the author can then add to their profile.
One may ask why authors would go through that trouble. There are several incentives; see [13]. First, without being registered in the RePEc Author Service, an author is not ranked and his research output does not count toward the ranking of the institutions he is affiliated with. Second, when registered, an author obtains notification of new citations that are found within RePEc, a compilation of all citations, as well as a detailed ranking analysis every month.
At the time of this writing, over 32,000 authors were registered, claiming over 750,000 works as theirs, somewhat less than half of all the works listed in RePEc once the double-counts (works claimed by several authors) are taken into account.
The RePEc Author Service is based at the Economic Research Division of the Federal Reserve Bank of St. Louis and is monitored by the author of this paper. It runs on open source software written by Ivan Kurmanov and financed by a grant from the Ford Foundation, with extension funding provided by the Federal Reserve Bank of St. Louis.
2.3. Institutional Data
Institutional data are based on the institutional records collected since 1995 in EDIRC (Economics Departments, Institutes, and Research Centers in the World, [14]). This website collects links to academic institutions and government agencies that would principally employ economists. The data are quite accurate; for example, within a university, EDIRC lists all relevant departments (economics, finance, agricultural economics, business schools, and sometimes public policy and similar departments), research centers, institutes, formal research groups, and some chairs, as long as economists form a substantial part of the staff or economic issues are prominent in the mission of the group. A second condition is that the listed entity has its own website. It does not need to have its own server (virtual or not), but it needs to have a web page that is more substantial than just a listing of classes: there should be at least a listing of faculty by name.
Entities not based in universities can also be listed. The obvious ones are central banks and government agencies directly applying economic policy, say ministries of finance, treasury, labor, and industry, but also statistical agencies and various research agencies. The same applies to international organizations. Finally, independent research institutes and think tanks are also listed, but not most commercial institutions (banks, consultants). The only exceptions are those that have a RePEc archive or that provide substantial research for free through their website. Associations and societies are also listed.
All in all, over 12,700 institutions are listed, almost 6,000 of which are associated with an author registered in the RePEc Author Service (not counting those without claimed works). If they are specialized in a particular field, they are categorized, and almost all governmental agencies are categorized. Institutions are also categorized by countries or, in the case of the United States, by state. When authors register with the RePEc Author Service, they have the opportunity to specify with which institutions they are affiliated among those listed in EDIRC (except associations and societies), but they can also suggest new entities. If these suggested entities do not fit within the criteria of EDIRC, they are still kept in the author's list of affiliations without a link to an institution in EDIRC.
EDIRC is housed and managed at the Economic Research Division of the Federal Reserve Bank of St. Louis by the author of this paper.
2.4. Citation Data
Citation counts are often considered to be the most useful metric of the impact of a piece of research. Finding citations is, however, not a trivial matter. It can be performed either manually, at great cost, or automatically, a process that needs considerable fine tuning and many exception rules.
All citation data for RePEc ranking purposes are provided by the CitEc project [15], managed by José Manuel Barrueco Cruz, librarian at the University of Valencia. CitEc runs on hardware provided by the Valencian Economic Research Institute.
CitEc downloads all papers that it can find in pdf format, typically those that are not hidden behind a password or some IP protection. Those pdf files are then successively converted to PostScript and text. The text is then parsed to recognize the references, which are then paired with items listed in RePEc with a fuzzy matching algorithm on titles and authors. To prevent erroneous attributions, the level of confidence for a match needs to be set quite high. For somewhat lower levels of confidence, registered authors have the option to check and add appropriate citations.
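The idea of pairing a parsed reference with a catalog item only above a high confidence level can be sketched as follows. This is not the CitEc algorithm, whose details are not given here: the similarity measure (Python's standard `difflib.SequenceMatcher`), the equal weighting of title and authors, the 0.9 threshold, and all names in the example are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_reference(ref_title, ref_authors, catalog, threshold=0.9):
    """Return the id of the best catalog match, or None if confidence is too low.

    catalog maps an item id to a (title, authors) pair; all names are hypothetical.
    """
    best_id, best_score = None, 0.0
    for item_id, (title, authors) in catalog.items():
        # Assumption: confidence is the average of title and author similarity.
        score = 0.5 * similarity(ref_title, title) + 0.5 * similarity(ref_authors, authors)
        if score > best_score:
            best_id, best_score = item_id, score
    return best_id if best_score >= threshold else None


catalog = {
    "item-1": ("The General Theory of Employment, Interest and Money", "Keynes, J. M."),
    "item-2": ("A Treatise on Money", "Keynes, J. M."),
}
hit = match_reference(
    "The general theory of employment interest and money", "Keynes, J. M.", catalog
)
miss = match_reference("An unrelated paper on trade", "Smith, A.", catalog)
```

A high threshold trades missed citations for fewer erroneous attributions; matches falling just below it are the ones a registered author could be asked to confirm.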
At the time of this writing, over 360,000 documents have been processed, extracting over eight million references, over three million of which refer to over 400,000 items listed in RePEc. Given that only freely available documents can be analyzed, a large part of those documents are working papers. This has advantages and disadvantages. Working papers are typically more recent than published articles, thus it allows a much more up-to-date analysis than with articles alone. However, citations in published articles are considered to be much more valuable than those in working papers (erroneously, as discussed further in a subsequent section). This is partially corrected in three ways: (1) publishers directly provide information to CitEc about references in their articles, either because their content is gated or because they want to increase the quality of matches; (2) for authors who have both the working paper and published article version of an item in their profile, the references found in one version can be attributed to the other; (3) on an experimental basis, authors can add references to the database, something authors are quite keen to do to increase their citation count. The system requests that all references of a paper be added, so as to provide a positive externality for others as well.
2.5. Abstract Views and Downloads Data
Another measure of the impact of research is how often it has been “looked at.” Abstract views statistics assess the attractiveness of the title, the authors or the general topic. In addition, downloads statistics indicate how much abstracts have contributed to the attractiveness of the downloaded document.
Keeping track of abstract views is not difficult using the logs of a web server. The only drawback is that abstracts displayed during uses of the search engine cannot be counted. Downloads are more difficult, given that they typically link to external servers. Thus some mechanism needs to be put in place to keep track of downloads.
The decentralized nature of RePEc complicates the compilation of these statistics. The participating services first need to keep appropriate logs and second need to make them available in an appropriate format. The LogEc project [16], managed by Sune Karlsson at Örebro University, tries to collect this information. The following RePEc services provide information for downloads and abstract views: EconPapers, IDEAS, NEP, EconomistsOnline, and Socionet. The defunct NetEc also used to provide data. Other services that use RePEc data, in whole or part, unfortunately do not provide statistics. Among them are EconStor, Inomics, Econlit, Oyster, and any service making use of the RePEc data made available through the Open Archives Initiative (Google Scholar, for example).
Quite obviously, these statistics are subject to manipulation, as one could repeatedly download a paper to increase its count. For this reason, various pieces of information about the abstract viewer or downloader are recorded to prevent repeat counts. This is mainly performed through the use of the IP address, also taking IP clusters into account. Also, and this is mostly relevant for abstract views, visits by search engine robots need to be discarded as they do not represent human readership. Some robots identify themselves, and they can easily be taken care of. Others do not obey standard protocols and need to be recognized as robots. Various identification mechanisms are used to filter these additional robots from the data. Complete details on how all this is performed cannot be given here. Overall, however, about 80% of abstract views are thus discarded; the share is lower for downloads.
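Since the actual filters are not disclosed, the following is only a generic sketch of the two ideas mentioned above: discarding self-identified robots and counting a given address once per item. The log layout, the one-month window for repeats, and the robot name `examplebot` are all assumptions, and IP clusters are not modeled.

```python
def count_views(log, robot_agents=frozenset({"examplebot"})):
    """Count abstract views per item, keeping one view per (item, ip, month).

    log is an iterable of (item_id, ip, month, user_agent) tuples. The field
    layout and the robot list are hypothetical; IP clusters are not modeled.
    """
    seen = set()
    counts = {}
    for item_id, ip, month, agent in log:
        if agent.lower() in robot_agents:
            continue  # self-identified robot: not human readership
        key = (item_id, ip, month)
        if key in seen:
            continue  # repeat view from the same address in the same month
        seen.add(key)
        counts[item_id] = counts.get(item_id, 0) + 1
    return counts


log = [
    ("paper-a", "10.0.0.1", "2012-05", "Mozilla"),
    ("paper-a", "10.0.0.1", "2012-05", "Mozilla"),     # repeat, discarded
    ("paper-a", "10.0.0.2", "2012-05", "Mozilla"),
    ("paper-a", "10.0.0.3", "2012-05", "ExampleBot"),  # robot, discarded
    ("paper-b", "10.0.0.1", "2012-05", "Mozilla"),
]
views = count_views(log)
```

In this toy log, two of five entries are discarded; robots that do not identify themselves would require additional heuristics that this sketch does not attempt.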
Whether this is an over-count or under-count of the true count is unknown. Some robots may slip through. Some downloads are discarded as repeats despite originating from different users because they came from the same IP cluster. This happens in particular with institutions using a single cache or proxy server. We hope, however, that the statistics are sufficiently large for such accidents to even out relatively smoothly across all documents, so that no bias is introduced.
In addition, various checks and balances are implemented to recognize abnormal behavior, mostly from authors trying to manipulate the statistics. Obviously, these safeguards are not revealed here, but let it be known that a human eye has a final look at the server logs in these cases and that several authors have been caught.
Despite all these adjustments, LogEc records over two million abstract views and half a million downloads a month; in other words, every document’s abstract is viewed once or twice a month, and every item available on-line is downloaded once every second month, on average for reporting RePEc services.
2.6. Further Refinements of the Data
As the works covered in RePEc contain both publications and pre-publications, there is an issue with several versions of the same work being listed. In particular, a working paper may appear in several series. Thus, for any measure that considers the number of works someone has authored, one should count distinct works. For technical reasons, the matching of different versions is done only for works that are listed in a registered author’s profile. The basis is a very similar title and the author’s recognition of authorship. Manual adjustments are done when titles differ, upon request to the RePEc team.
Note that such works may have been cited in their different versions. A citation to any version is counted toward all versions. The same applies to references. Any author statistics involving a count of works or citations aggregates the data from the different versions. Such matching is not performed for abstract views and download counts.
2.7. Discussion of Coverage
Quite obviously, only journals and working paper series that are listed in RePEc can be classified, and only authors who have registered themselves can be included. Thus, there are omissions. This is obviously avoidable, but the structure of RePEc puts the burden of indexing on the publishers. Unlisted authors can easily correct this by registering themselves. Missing journals and working paper series can be indexed by their publishers and they will then be fully considered.
Being listed is not sufficient. The listing needs to be maintained, i.e., new items added as they are published. Some publishers are better at this task than others, be it with regard to timeliness, completeness (missing items), coverage (years covered), or data quality (syntax errors, confusing author names). Again, it is up to the publishers to do this work. Registered authors also need to maintain their profile with any additions.
Deceased authors are kept in the database, but their affiliations are removed, the logic being that they cannot contribute to the academic life of their employer anymore. The RePEc Author Service maintenance team tries to keep their profiles current. Authors whose email addresses no longer function are considered to have either moved or died. Hence, their affiliations are discarded from consideration.
Note that while some journals present in other studies are not classified here, our rankings also cover working paper series that are typically neglected by other studies. There is also a limited number of chapters and books. It turns out that some working paper series have very high impact factors, while many journals have low impact factors. It is thus wrong to believe that research is valued only when it is published in a journal. There are also software components. They are either stand-alone program code or material necessary to replicate some study. Citations to them are currently not considered, as it is often difficult to disentangle them from citations to the original works. More on this later, in the discussion of impact factors.
3. Computation of Impact Factors and Ranking of Series or Journals
Many ranking exercises for institutions or authors rely heavily on impact factors calculated elsewhere, and these impact factors are usually the most controversial issue with these rankings. Here we take a different approach in that the impact factors are determined with the RePEc data. We compute four sets of impact factors.
3.1. Simple Impact Factors
The computation of this simple impact factor is rather straightforward. Just find all citations to items in that particular series or journal, count those citations and divide by the number of items in the series or journal. Several adjustments are performed to the number of citations: (1) Self-citations within the series or journal are discarded, to prevent self-inflation. Self-citations by authors are still counted, though. (2) Considering that a work may have appeared in different series, all versions of the cited and citing work are considered, but only one is counted. This matters: for example, an article may be cited, while its working paper version is not, but the working paper series is still credited with this citation.
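The computation just described can be sketched in a few lines of Python. The data layout is an assumption: citations are given as (citing series, cited series) pairs that have already been de-duplicated across versions of the same work, and the function name is illustrative.

```python
def simple_impact_factor(series, n_items, citations):
    """Citations to `series` from other series, divided by its number of items.

    citations is a list of (citing_series, cited_series) pairs, assumed to be
    already de-duplicated across versions of the same work.
    """
    external = sum(
        1
        for citing, cited in citations
        if cited == series and citing != series  # drop series self-citations
    )
    return external / n_items


citations = [("B", "A"), ("C", "A"), ("A", "A"), ("A", "B")]
factor_a = simple_impact_factor("A", n_items=4, citations=citations)  # 2 / 4
```

Here series A receives three citations, one of which comes from A itself and is discarded, so its factor is 2 divided by its 4 items.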
3.2. Recursive Impact Factors
Recursive impact factors are computed in the same way as the simple impact factors, except that every citation carries some weight. That weight is the recursive impact factor of the citing series or journal. It is thus the fixed point of a function that could be specified in the following way:

$$R_I = \frac{\sum_{i \in I} \sum_{J \neq I} R_J \, C_{iJ}}{\sum_{i \in I} 1},$$

where $R_I$ is the recursive impact factor of series or journal $I$, which has items $i$, and $C_{iJ}$ represents all citations from journal $J$ to item $i$. To guarantee that a fixed point exists, the weights are normalized such that the average item (article or working paper) has a recursive impact factor of one. Also, when there are several versions of a citing item, the one with the highest impact factor is considered.
These factors are computed by iteration. In the first pass, simple impact factors are used and then in each pass the recursive impact factors from the previous iteration are taken. This process, however, never converges completely, as new items and citations are continuously added to the database. The results are relatively stable, though. Concretely, the weights are recomputed every day for all series and journals that are refreshed on IDEAS, that is, those that have had any amendments in the bibliographic data and those that have not been refreshed for thirty days.
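The iteration can be illustrated with a minimal Python sketch on a static set of citations. It is an approximation under stated assumptions: it starts from unit weights so that the first pass reproduces simple impact factors, it runs a fixed number of passes, and it ignores the rule about multiple versions of a citing item. The function name and data layout are hypothetical.

```python
def recursive_impact_factors(n_items, cites, passes=100):
    """Iterate citation-weighted impact factors toward a fixed point.

    n_items: {series: number of items}
    cites:   {(citing, cited): number of citations}; series self-citations ignored
    """
    total_items = sum(n_items.values())
    # Unit weights: the first pass then yields (normalized) simple impact factors.
    weights = {s: 1.0 for s in n_items}
    for _ in range(passes):
        raw = {}
        for s in n_items:
            weighted = sum(
                weights[citing] * n
                for (citing, cited), n in cites.items()
                if cited == s and citing != s
            )
            raw[s] = weighted / n_items[s]
        # Normalize so that the average item has a factor of one.
        mean = sum(raw[s] * n_items[s] for s in n_items) / total_items
        if mean == 0:
            return raw  # no external citations at all
        weights = {s: raw[s] / mean for s in n_items}
    return weights


n_items = {"A": 10, "B": 10, "C": 10}
cites = {("B", "A"): 20, ("C", "A"): 10, ("A", "B"): 10, ("A", "C"): 5, ("B", "C"): 5}
factors = recursive_impact_factors(n_items, cites)
```

In this toy example, series A ends up with the highest factor because it is cited most, and B outranks C because its citations come from the highly weighted A.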
The recursive impact factor computed here is similar to the Google PageRank [17], which ranks web pages higher if many other pages link to them, even more so if the linking web sites have a high PageRank. The difference is that Google computes a different factor for every page, whereas we compute one for every journal or paper series. The idea of the PageRank is to determine the probability that a web surfer clicking randomly would end up at that page. In our case, this would be the probability, or rather something proportional to it, that a reader randomly following references in articles and papers would end up with a particular journal or working paper series.
The recursive impact factor also bears some similarity with the Article Influence, which is a journal’s Eigenfactor divided by the proportion of articles from that journal (see [18] for details). We have, however, not checked how close to each other the two are.
3.3. Discounted Impact Factors
This factor is similar to the simple impact factor, with one important difference: each citation counts for the inverse of the age in years (plus one) of the citing paper. Thus, if an article is cited in a paper dated 2009 and we are in 2012, this citation would count for 0.25.
Such a factor gives an edge to what is cited now, and therefore highlights the publications series that are hot now. It does not mean, however, that its most recent publications are well cited, only that some of them—possibly old—are now well cited.
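The age discount is simple enough to state as a one-line function. The sketch below, with a hypothetical function name, reproduces the example of a 2009 citing paper evaluated in 2012.

```python
def discounted_citation_weight(citing_year: int, current_year: int) -> float:
    """Inverse of the citing paper's age in years plus one."""
    age = current_year - citing_year
    return 1.0 / (age + 1)


weight = discounted_citation_weight(citing_year=2009, current_year=2012)  # 1 / (3 + 1)
```

A citation from a paper dated in the current year thus counts fully, and the weight halves after one year.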
3.4. Recursive Discounted Impact Factors
This factor is the recursive version of the discounted impact factor. It thus uses its own factors as weights, multiplied by the age factor. This highlights publication series currently well cited in series that are currently well cited.
3.5. H-Index
This statistic is typically used for authors, and hence it is more thoroughly discussed when we go through author rankings. A journal has an h-index of h if h of its articles have at least h citations each. Quite obviously, this favors older journals or series that have a good and numerous stock of articles that attract citations. It takes quite a few years for young publication series to rank well.
3.6. Abstract Views
This criterion simply extracts the abstract views statistics from the LogEc project, using the numbers for the past twelve months.
3.7. Downloads
As for the previous criterion, download numbers are used for the past twelve months.
With six criteria, rankings are obviously going to differ, and every editor or publisher is going to find a favorite. There is nothing wrong with that, but one may want to have a more authoritative ranking. We suggest that aggregating these rankings may do the trick. For reasons explained below on the aggregation of author rankings, the harmonic mean of ranks of all six criteria is used.
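As an illustration of this aggregation, the harmonic mean of a set of ranks can be computed as below. The two journals and their ranks are invented for the example, and the function name is hypothetical.

```python
def harmonic_mean_rank(ranks):
    """Harmonic mean of a series' ranks across criteria (1 = best)."""
    return len(ranks) / sum(1.0 / r for r in ranks)


# Hypothetical ranks of two journals on six criteria.
journal_x = [1, 1, 2, 40, 50, 60]     # excellent on some criteria, weak on others
journal_y = [20, 20, 20, 20, 20, 20]  # uniformly middling
score_x = harmonic_mean_rank(journal_x)
score_y = harmonic_mean_rank(journal_y)
```

Because the harmonic mean is pulled toward the smallest values, journal X obtains a much better (lower) aggregate score than journal Y, although its arithmetic mean rank is worse.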
Some other published impact factors differentiate by type of article, for example, by giving different weights to full articles, notes and book reviews. One may also want to discard corrigenda. The metadata do not contain the type of the article, and the title in the vast majority of cases does not allow one to infer the type. Thus, we do not take them into consideration.
Also, some journal issues are different. For example, the American Economic Review has one issue a year with non-refereed short articles, the Papers and Proceedings of the annual meeting of the American Economic Association. These short papers are less likely to be cited and add to the article count, thereby diluting the impact factor of the regular articles. One could isolate these special issues, but the task then becomes subjective, as other journals are subject to the same issues at varying degrees. We want to stay objective in our ranking and thus do not adjust. In this particular example, the American Economic Association does not want this distinction to be made anyway.
There are also some small sample issues. Some working paper series, especially, have few items and may as a result have unexpectedly high or low impact factors—high if just one item is often cited. The current solution is not to rank series or journals with fewer than 50 items. The impact factors are, however, used as is.
Finally, there is a problem when a journal changes publishers. Technically, it is now a different journal in RePEc, as its metadata are supplied from a different source. Publishers have the opportunity to record in the journal metadata what the predecessor or the successor was, but few do (or are aware they can). When recognized, this is adjusted by hand by adding pairs, or even triplets, to an exception file. Statistics are then aggregated over them.
4. Ranking of Works
There are six different ways to rank works (working papers, articles, chapters, books). One is to simply count the number of citations it has gathered, again adjusting for different versions of the same item. The second is to discount each citation by its age. The remaining four are to weigh those citations by the impact factors of the citing series or journals.
Thus, if one were to add up all citations to articles in a particular journal, then divide the result by the number of articles, one would obtain the simple impact factor (except that self-citations within the journal need to be excluded). Or if one were to add up the scores of all articles in a journal, with scores using the recursive impact factors and excluding self-citations, one would obtain the recursive impact factor. Doing this with simple impact factors would result in the factors of the first pass in the recursive impact factor computation.
RePEc publicizes rankings for the top 1‰ (one in every thousand) items for each ranking method. In addition, items published five years ago or more recently that are among the top 2‰ are also listed. As there are several criteria, one could also think of aggregating them. Because of the large amount of data, the required computational time and the fact that these rankings are updated daily as abstract pages are refreshed, this has not been implemented.
5. Rankings of Authors
Every person registered in the RePEc Author Service with works listed in the profile is ranked. There are many ways to rank authors and this section discusses those used in the RePEc rankings. The strategy to aggregate the various rankings is then discussed.
5.1. Criteria based on the Number of Works
The simplest of all ways to rank authors is by the number of works they have authored. However, as working papers are also considered, the same work may appear several times, in different versions. These duplicates therefore cannot be counted. A ranking including the duplicates is provided, but it is not used in the calculation of the aggregate rankings.
The number of distinct works thus serves as basis for the following criteria. They are a combination of simple counts and counts with weights from the simple or recursive impact factors with those counts either divided by the number of authors or not. Thus, the following criteria are used (with their respective labels in bold face):
NbWorks: Simple count.
DNbWorks: Count divided by number of authors on each work.
ScWorks: Count with simple impact factor weights.
AScWorks: Count with simple impact factor weights divided by number of authors on each work.
WScWorks: Count with recursive impact factor weights.
AWScWorks: Count with recursive impact factor weights divided by number of authors on each work.
The first two criteria merely indicate how prolific an author is. The four others measure one characteristic of the quality of one’s work: where it was published. It is an imperfect measure, given that one may simply ride on the coat tails of other papers published in the same series or journal that have been frequently cited. But such counts based solely on the impact factors are the ones most frequently used, as they do not necessitate the compilation of citations if one simply takes the impact factors from somewhere else.
Note that the discounted impact factors and recursive discounted impact factors are not used here. They could also be considered, but this would put too much weight on criteria based on the number of works in the overall rankings.
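The six combinations described in this section (unweighted or weighted by the simple or recursive impact factor, each divided by the number of authors or not) can be sketched as follows. The dictionary keys are descriptive names chosen for this example rather than the RePEc labels, and the input layout is an assumption.

```python
def works_scores(works):
    """Compute count-based author criteria from a list of distinct works.

    Each work is a dict with hypothetical keys: n_authors, simple_if, recursive_if,
    where the impact factors are those of the series or journal of publication.
    """
    return {
        "count": len(works),
        "count_per_author": sum(1 / w["n_authors"] for w in works),
        "simple_if_weighted": sum(w["simple_if"] for w in works),
        "simple_if_weighted_per_author": sum(
            w["simple_if"] / w["n_authors"] for w in works
        ),
        "recursive_if_weighted": sum(w["recursive_if"] for w in works),
        "recursive_if_weighted_per_author": sum(
            w["recursive_if"] / w["n_authors"] for w in works
        ),
    }


works = [
    {"n_authors": 1, "simple_if": 2.0, "recursive_if": 1.5},
    {"n_authors": 2, "simple_if": 4.0, "recursive_if": 3.0},
]
scores = works_scores(works)
```

For this author, the co-authored paper in the better outlet contributes the same author-adjusted score as the solo paper in the weaker one.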
5.2. Criteria based on Citation Counts
Here, we have criteria similar to those based on the work counts, but we count citations. Self-citations are eliminated to avoid artificial and in some cases malicious inflation of citation scores. We may apply to each citation weights by any of the four impact factors, or no weight. And all these criteria may be divided by the number of authors or not.
In addition, we provide the h-index introduced by [19]. His definition: A scientist has index h if h of his/her papers have at least h citations each, and the other papers have no more than h citations each. Thus, this author would have at least $h^2$ citations (at least h papers with at least h citations each). Such a criterion puts more emphasis on an important body of work, instead of a few very highly cited papers, by giving higher score to those who have many cited papers. This index was developed for physics, where scientists write a lot of papers and also cite rather generously. Some physicists have h above 100, but in economics it is very rare to have an h above 20, mainly due to the fact that economists write fewer, but more involved papers.
A variation of the h-index is provided, the so-called Wu-index following [20]: A scientist has index w if w of his/her papers have at least $10w$ citations each, and the other papers have no more than $10(w+1)$ citations each.
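Both indices can be computed from a list of per-paper citation counts, as in the short Python sketch below. It assumes the usual threshold of 10w citations for the Wu-index; the function names and the citation counts are made up for illustration.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    # With counts sorted in decreasing order, the condition holds for exactly
    # the first h positions, so counting them gives h.
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def w_index(citations):
    """Largest w such that w papers have at least 10 * w citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= 10 * rank)


cites = [50, 30, 22, 8, 5, 3, 1]
h = h_index(cites)  # five papers have at least 5 citations each
w = w_index(cites)  # two papers have at least 20 citations each
```

The Wu-index is far more demanding: the same record that yields an h of 5 only reaches a w of 2.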
Finally, two criteria count the number of registered authors citing a particular author: first a simple count, second a count considering the rank of the citing author, giving more points for highly ranked citers. This can measure how widely an author is cited. For example, this penalizes those who cite each other repeatedly (“citing clubs”). Note that each co-author counts for these criteria if she has some self-citations. This is the only case where a self-citation may count. It is possible to compute these criteria thanks to the very nature of the RePEc data with author profiles. We are not aware of any other ranking using such criteria.
Thus, we have the following criteria based on citations:
NbCites: Simple citation count.
ANbCites: Citation count divided by number of authors on each work.
ScCites: Citation count with simple impact factor weights.
AScCites: Citation count with simple impact factor weights divided by number of authors on each work.
WScCites: Citation count with recursive impact factor weights.
AWScCites: Citation count with recursive impact factor weights divided by number of authors on each work.
DCites: Citation count discounted by age.
ADCites: Citation count discounted by age and divided by number of authors on each work.
DScCites: Citation count with discounted impact factor weights.
ADScCites: Citation count with discounted impact factor weights divided by number of authors on each work.
WDScCites: Citation count with recursive discounted impact factor weights.
AWDScCites: Citation count with recursive discounted impact factor weights divided by number of authors on each work.
NCAuthors: Count of citing registered authors.
RCAuthors: Rank weighted count of citing registered authors.
Due to scheduling differences between the upload of new citations and the ranking computations, new citations are included for only a minority of the authors in the current ranking; they are included for all authors in the next issue of the rankings. Again, all self-citations by the author are, of course, excluded.
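To make the first four citation criteria concrete, here is a minimal sketch. It rests on assumptions that are ours and not spelled out in the text: a citation is treated as a self-citation when the cited author is among the authors of the citing work, and the impact factor weight is taken from the citing publication. The `Citation` record and the function `citation_scores` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    citing_authors: frozenset  # authors of the citing work
    cited_authors: frozenset   # authors of the cited work
    citing_impact: float       # assumed: impact factor of the citing series


def citation_scores(author, citations):
    """NbCites, ANbCites, ScCites and AScCites for one author."""
    scores = {"NbCites": 0.0, "ANbCites": 0.0, "ScCites": 0.0, "AScCites": 0.0}
    for c in citations:
        if author not in c.cited_authors:
            continue  # citation to somebody else's work
        if author in c.citing_authors:
            continue  # self-citation, not counted
        share = 1.0 / len(c.cited_authors)  # split among co-authors
        scores["NbCites"] += 1.0
        scores["ANbCites"] += share
        scores["ScCites"] += c.citing_impact
        scores["AScCites"] += c.citing_impact * share
    return scores


citations = [
    Citation(frozenset({"B"}), frozenset({"A"}), 2.0),
    Citation(frozenset({"A", "C"}), frozenset({"A"}), 5.0),  # self-citation by A
    Citation(frozenset({"D"}), frozenset({"A", "B"}), 1.0),
]
print(citation_scores("A", citations))
# {'NbCites': 2.0, 'ANbCites': 1.5, 'ScCites': 3.0, 'AScCites': 2.5}
```

The recursive and discounted variants would follow the same pattern with a different weight per citation.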
5.3. Criteria based on Journal Page Counts
The following criteria concern only journal articles. Whether one publishes a note, which is shorter, or a full-length article is an indication of how editors feel about the contribution of an article. Also, some argue that editors allow particularly good pieces to run longer, while less important works are cut. Thus the page count can be an indication of the worth of one’s publication record. Again, the page count can be weighted or not and divided by the number of authors or not.
NbPages: Simple page count.
ANbPages: Page count divided by number of authors on each work.
ScPages: Page count with simple impact factor weights.
AScPages: Page count with simple impact factor weights divided by number of authors on each work.
WScPages: Page count with recursive impact factor weights.
AWScPages: Page count with recursive impact factor weights divided by number of authors on each work.
Thus, publishing a long article in an obscure journal is valued highly with the first two criteria, but barely factors in with the four others. Note that these criteria, in contrast to the others, pertain to a subset of all documents (articles). Also, these criteria can sometimes be somewhat misleading. For example, if a journal does not provide page numbers, either because they are missing in the metadata or because the article is online only and not in a paginated format, the number of pages defaults to one. This is justified by the fact that in some cases only the number of the starting page is provided, which makes such an article indistinguishable from a one-page article. In addition, these criteria do not take into account the size of the pages. Some journals publish in A4 or Letter format, whereas most have smaller formats. Font size may vary as well; thus the actual content of a page could be quite different from one journal to another. No such adjustments are performed, as there is no way to systematically verify those parameters and how they may change through the years, except through intensive manual labor such as counting the average number of words per page.
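The page-count rule, including the default of one page when pagination is missing or incomplete, can be sketched as follows. We assume here that the weight is the impact factor of the journal in which the article appeared; the tuple layout and the function names are ours, for illustration only.

```python
def page_count(first_page=None, last_page=None):
    """Number of pages of an article; defaults to one when the page
    range is missing, incomplete, or inconsistent."""
    if first_page is None or last_page is None or last_page < first_page:
        return 1
    return last_page - first_page + 1


def page_scores(articles):
    """articles: list of (first_page, last_page, n_authors, impact_factor)."""
    scores = {"NbPages": 0.0, "ANbPages": 0.0, "ScPages": 0.0, "AScPages": 0.0}
    for first, last, n_authors, impact in articles:
        pages = page_count(first, last)
        scores["NbPages"] += pages
        scores["ANbPages"] += pages / n_authors
        scores["ScPages"] += pages * impact
        scores["AScPages"] += pages * impact / n_authors
    return scores


articles = [
    (101, 120, 2, 3.0),    # 20 pages, two authors
    (45, None, 1, 0.5),    # only the starting page is known: counts as 1
    (None, None, 3, 1.0),  # online-only article: counts as 1
]
print(page_scores(articles))
```

With these made-up inputs, NbPages is 22 and ScPages is 61.5; the second and third articles each contribute a single page whatever their true length.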
Note also that the discounted impact factors are not considered. Adding them would be giving more weight to publications in journals. Given that many journals have impact factors lower than working paper series, there is no particular reason to privilege journals. The market can decide which is the better publication outlet.
5.4. Criteria based on Popularity on Reporting RePEc Services
Here, we measure how many times document abstracts have been viewed and how often they have been downloaded. As described in the section on LogEc, these statistics pertain to the subset of RePEc services that report such statistics. Furthermore, as all the metadata collected by RePEc are in the public domain, one cannot track how much they are used. Looking at the collected subset can, however, still give a good indication. Note that these statistics are checked for multiple views or downloads, and robot and web spider activity is excluded, as described above.
Again, we provide statistics with the criteria either divided by the number of authors or not. Thus the following four criteria are available in this category:
AbsViews: Total abstract views in the past 12 months.
AAbsViews: Total abstract views per author in the past 12 months.
Downloads: Total downloads in the past 12 months.
ADownloads: Total downloads per author in the past 12 months.
Statistics are computed for the past 12 months. On the one hand, including a longer period allows the smoothing out of inherent short-term variability—for example, new papers announced through NEP get a large one-time boost, and authors may not yet have claimed them in their profile. On the other hand, the period considered should not be too long. First, this allows one to take into account what is popular now; second, it corrects for bias stemming from items having been listed for a long time, while even older material may have been added only recently.
Note that counting abstract views and downloads starts as soon as the research item (article, paper, etc.) is added to RePEc, and these numbers are aggregated for registered authors. Thus, when an author creates a profile, the statistics for his/her papers are added also for the period where he/she was not yet registered.
For computational reasons, the criteria with statistics per author are computed with a one-month delay.
5.5. Criteria based on Co-Authorship Networks
These two criteria have been recently included and exploit the new CollEc project, http://collec.repec.org/, run by Thomas Krichel and hosted by the Economics Department at Washington University. CollEc looks at all registered authors and computes a network of all of them, using their ties through co-authorship. Several disconnected networks emerge from this analysis, one of them encompassing the majority of the authors (as of this writing, 23,994 of 32,664). Within this network, the shortest path between any two authors is computed. With this, we can compute two criteria:
The first, Close, measures the average number of hops through the co-authorship network that are necessary to reach all member authors. This is similar to the Erdős number mathematicians use to relate themselves to the most prolific of them, Paul Erdős. In economics, there is no such standout author, and we average over all. This is the only criterion where a smaller number is better.
The second, Betweenn, looks at how likely a shortest path (from the set of all shortest paths) is to run through a particular author. Because some authors in the network are only at the end-points of shortest paths (“dangling nodes”), the number of those that can be ranked with betweenness is smaller than for closeness, currently 16,791. In both cases, authors with null scores are ranked just behind the last author with a score.
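Both centrality measures can be illustrated on a toy co-authorship network. The sketch below uses a breadth-first search for the average number of hops and Brandes' algorithm for betweenness; this is a standard textbook approach and not necessarily how CollEc computes its numbers. The graph, the author labels, and the function names are invented for the example.

```python
from collections import deque


def average_hops(graph, author):
    """Average shortest-path length from `author` to every reachable author."""
    dist = {author: 0}
    queue = deque([author])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    others = [d for node, d in dist.items() if node != author]
    return sum(others) / len(others) if others else None


def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        stack, dist = [], {s: 0}
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}  # number of shortest paths from s
        sigma[s] = 1
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v precedes w on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:  # accumulate dependencies, farthest node first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {v: b / 2 for v, b in score.items()}  # each pair was counted twice


# Toy network: A - B - C - D (a chain of co-authors)
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(average_hops(graph, "A"))  # 2.0
print(betweenness(graph))        # A and D score 0.0, B and C score 2.0
```

In this chain, A and D are the “dangling nodes”: they are connected, so they have a closeness score, but no shortest path between two other authors runs through them.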
Why are these criteria used for rankings? For one, they measure how involved and networked authors may be. Co-authorship is penalized by other criteria, so this may be some redemption for authors who have helped many others with their work. But because these criteria do not merely measure the number of co-authors but also the centrality in the co-authorship network, they are more relevant to measuring the pre-eminence of an author. In addition, they encourage authors to get their co-authors to sign up with RePEc.
5.6. Aggregation of Criteria
Quite obviously, with so many criteria, it is difficult to agree on who the best economists are, especially as the rankings certainly do not correlate perfectly.5
Some way to aggregate the rankings is required and unfortunately different ways of doing so give different results. In fact, they emphasize different aspects that all have some relevance. We discuss here some of them and then discuss our choice.
5.6.1. Harmonic Mean of Ranks
The harmonic mean is defined as
$$H = \frac{n}{\sum_{i=1}^{n} \frac{1}{r_i}},$$
where $r_i$ is the ranking of an author in criterion $i$ and $n$ is the number of criteria. In such a mean, very good rankings have a lot of weight; for example, the first rank counts twice as much as the second one. But a one rank difference carries very little weight for higher numbers. This aggregation method therefore rewards those who are particularly good in some category, but perhaps rewards too much. For this reason, the harmonic mean is dampened somewhat by adding a constant (currently one) to each rank and then subtracting it from the mean.
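Written out, and assuming the constant is added to every rank before averaging and subtracted once from the result, the dampened harmonic mean would read as follows. The symbols are our notation for illustration: $r_i$ is the rank in criterion $i$, $n$ the number of criteria, and $c$ the constant.

$$\tilde{H} = \frac{n}{\sum_{i=1}^{n} \frac{1}{r_i + c}} - c, \qquad c = 1.$$

For an author ranked 1st and 9th on two criteria, the plain harmonic mean is $2 / (1 + 1/9) = 1.8$, whereas the dampened version gives $2 / (1/2 + 1/10) - 1 \approx 2.33$, so the single first place weighs less heavily.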
5.6.2. Arithmetic Mean of Ranks
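This is the easiest and most frequently used way to aggregate criteria and create indices. It is defined as
$$A = \frac{1}{n} \sum_{i=1}^{n} r_i,$$
with $r_i$ the rank of the author in criterion $i$ and $n$ the number of criteria.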
Doing poorly on one criterion penalizes an author particularly hard. Doing particularly well on one criterion to compensate is much more difficult. Thus, the arithmetic mean rewards those who rank consistently across criteria.
5.6.3. Geometric Mean of Ranks
The geometric mean is defined as
$$G = \left( \prod_{i=1}^{n} r_i \right)^{1/n},$$
where ∏ symbolizes the product. The geometric mean penalizes poor rankings and emphasizes good rankings. To see this, notice that the geometric mean is the exponential of the arithmetic mean of the logarithms of the ranks, and thus it dramatizes the features of the latter. Or put in another way, given a generalized mean with exponent $p$,
$$M_p = \left( \frac{1}{n} \sum_{i=1}^{n} r_i^{p} \right)^{1/p},$$
the geometric mean corresponds to $p \to 0$, which is between the arithmetic mean ($p = 1$) and the harmonic mean ($p = -1$).
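The relation between the three means can be checked numerically with a generalized mean. The sketch below is only illustrative; `generalized_mean` is a name we introduce, and the case p = 0 is handled as the limit, that is, the geometric mean.

```python
import math


def generalized_mean(ranks, p):
    """Generalized (power) mean of positive ranks with exponent p.

    p = 1 is the arithmetic mean, p = -1 the harmonic mean,
    and p = 0 is treated as the limiting case, the geometric mean.
    """
    n = len(ranks)
    if p == 0:
        return math.exp(sum(math.log(r) for r in ranks) / n)
    return (sum(r ** p for r in ranks) / n) ** (1 / p)


ranks = [1, 4, 16]  # hypothetical ranks of one author on three criteria
harmonic = generalized_mean(ranks, -1)
geometric = generalized_mean(ranks, 0)
arithmetic = generalized_mean(ranks, 1)
print(harmonic, geometric, arithmetic)
```

For these ranks the harmonic mean is about 2.29, the geometric mean is 4 (up to rounding), and the arithmetic mean is 7: the first place dominates the harmonic mean, while the sixteenth place dominates the arithmetic mean.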
5.6.4. Lexicographic Ordering of Ranks
Ranking extremely well for a particular criterion is the most rewarding with this aggregation method. For an author, all ranks are ordered from best to worst, then all authors are ranked in the following way: first all those with their best rank being a first rank, the tie breaker being their second best rank, then third best. Once all authors ranked first for any criterion are exhausted, those with rank two as their best rank are taken, and so on. This is akin to the ordering of words in the dictionary, hence it is named “lexicographic.” This concept is also used in economics to describe some preference classes in utility theory.
5.6.5. Graphicolexic Ordering of Ranks
This method takes the lexicographic method, but turns it on its head, hence its newly coined name: authors are ranked by their worst rank under any criterion, then their second worst rank to break ties, etc. This rewards authors that do not have a slip-up according to some criterion.
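Both orderings amount to sorting authors by their sorted list of ranks, best first for the lexicographic method and worst first for the graphicolexic one. A minimal sketch, with invented authors and ranks:

```python
def lexicographic_order(rank_table):
    """Order authors by best rank, then second best rank, and so on."""
    return sorted(rank_table, key=lambda author: sorted(rank_table[author]))


def graphicolexic_order(rank_table):
    """Order authors by worst rank, then second worst rank, and so on."""
    return sorted(
        rank_table, key=lambda author: sorted(rank_table[author], reverse=True)
    )


# Hypothetical ranks of three authors on three criteria
rank_table = {"A": [1, 40, 12], "B": [3, 2, 4], "C": [1, 15, 30]}
print(lexicographic_order(rank_table))  # ['A', 'C', 'B']
print(graphicolexic_order(rank_table))  # ['B', 'C', 'A']
```

Author A has a first place and the better second-best rank, so A leads the lexicographic ordering, but A's rank of 40 sends her to the bottom of the graphicolexic ordering, where the consistent author B comes first.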
5.6.6. Sum of Percent of Best in Criterion
All the aggregation methods above consider only how someone is ranked according to the various criteria, but not how far apart the scores are from each other for each criterion. For example, barely being first is valued in the same way as when there is a large gap between the first-ranked and the second-ranked. One way to take the latter into account is to attribute 100% to the first-ranked author, and then proportionally lower percentages to the lower-ranked authors. All these scores are then added. This aggregation method benefits the most those who have criteria where they are significantly better than others, especially for criteria where the dispersion of scores is larger.
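A minimal sketch of this aggregation follows, assuming that “proportionally” means each score is expressed as a percentage of the best score in its criterion and that a higher score is better. The function name and the numbers are made up.

```python
def percent_of_best(score_table):
    """Sum, over criteria, of each author's score as a percent of the best.

    score_table: {criterion: {author: score}}, higher scores assumed better.
    """
    totals = {}
    for scores in score_table.values():
        best = max(scores.values())
        for author, score in scores.items():
            percent = 100.0 * score / best if best else 0.0
            totals[author] = totals.get(author, 0.0) + percent
    return totals


score_table = {
    "NbCites": {"A": 200, "B": 100},   # A is far ahead
    "Downloads": {"A": 50, "B": 60},   # B is barely ahead
}
print(percent_of_best(score_table))
```

Here A ends up with about 183 points against 150 for B: each author is first on one criterion, but A's lead in citations is much larger than B's lead in downloads. The sketch also makes clear that the method presupposes criteria where a larger number is better.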
5.6.7. Exclusion of Extremes
The truncated mean excludes the x largest and the x smallest values. This reduces the impact of outliers. In particular, if one thinks that the particular aggregation mean one has chosen is influenced too much by such outliers, using truncation can make the mean more credible. There is no particular guideline to choose what the value of x should be. An alternative is the Winsorized mean, where the excluded ranks are not dropped but replaced by the largest and the smallest of the remaining ranks, respectively.
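The difference between truncating and Winsorizing can be seen in a short sketch. The function names and the ranks are ours; any of the means discussed above could then be applied to the resulting lists.

```python
def truncate(ranks, x=1):
    """Drop the x smallest and the x largest ranks."""
    ordered = sorted(ranks)
    if len(ordered) <= 2 * x:
        raise ValueError("not enough ranks left after truncation")
    return ordered[x:len(ordered) - x]


def winsorize(ranks, x=1):
    """Replace the x smallest and x largest ranks by the nearest remaining rank."""
    kept = truncate(ranks, x)
    return [kept[0]] * x + kept + [kept[-1]] * x


ranks = [2, 3, 5, 8, 120]  # one criterion is an outlier
print(truncate(ranks))     # [3, 5, 8]
print(winsorize(ranks))    # [3, 3, 5, 8, 8]
```

The arithmetic mean of the original ranks is 27.6; after truncation it is about 5.3 and after Winsorizing 5.4, so both remove most of the influence of the single poor rank.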
5.6.8. Discussion and Aggregation Choice
We have identified 35 different criteria for ranking authors and could have easily added more. In addition, we presented six aggregation methods, which can even be varied with the number of extremes to exclude and some other degrees of freedom. Each of the criteria can be multiplied by some weight. This is a dismaying array of possibilities, but we need to make choices. Those choices are easier if the criteria or aggregation methods lead to similar results. To some extent they do, as we see in a subsequent section, but there are noticeable differences. We still need to make a choice and take a stand.
Everyone would probably favor a combination of criteria and aggregation method that would favor oneself. We need to find something that is credible, in the sense that a person outside the profession would find it agreeable. We want to highlight the particular achievement, say that an author is particularly successful in downloads despite not having published much (yet), or that an author elicited many citations despite not being prolific. The harmonic mean achieves this, but needs to be tempered somewhat, and we thus add a constant of one to each rank. Also, we include all criteria but two, the simple number of works NbWorks (which does not distinguish distinct works, as multiple versions of the same work inflate this count) and the Wu-index (as it leads to a large number of ties and in particular a lot of null scores), in the aggregation. For each author, we further truncate by dropping the best and worst ranking. Thus, in summary: we consider for each author 31 rankings from a pool of 33, having dropped for all authors NbWorks and the Wu-index from the 35 presented rankings, with aggregation through an adjusted harmonic mean.
These choices can, and should, be argued and we leave the reader the opportunity to try other ways to rank on the website.
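The aggregation choice summarized above (drop the best and the worst rank of each author, then take a harmonic mean with a constant of one added to each rank and subtracted from the result) can be sketched as follows. This is our reading of the description, not the production code; the authors and ranks are invented, and only five criteria are used instead of 33.

```python
def aggregate_score(ranks, constant=1, drop=1):
    """Adjusted harmonic mean of ranks after dropping the extremes."""
    ordered = sorted(ranks)
    kept = ordered[drop:len(ordered) - drop]  # exclude best and worst rank
    harmonic = len(kept) / sum(1.0 / (r + constant) for r in kept)
    return harmonic - constant


# Hypothetical ranks of three authors on five criteria
authors = {
    "A": [1, 2, 4, 9, 500],
    "B": [3, 3, 3, 3, 3],
    "C": [1, 1, 50, 60, 70],
}
scores = {name: aggregate_score(ranks) for name, ranks in authors.items()}
order = sorted(scores, key=scores.get)  # a smaller score means a better rank
print(order)  # ['B', 'A', 'C']
```

With these numbers the consistent author B (score 3.0) comes ahead of A (about 3.74), whose rank of 500 is discarded by the truncation, and of C (about 4.60), who loses one of her two first places to the truncation.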
6. Ranking of Institutions
When registering, each author has the opportunity to affiliate himself with some institution(s). For those that are listed in EDIRC, the affiliation is recorded with an identifier that can be used to aggregate all authors from that institution. This then allows ranking of institutions.
A few rules apply. Only institutions listed in EDIRC are ranked. An author can affiliate himself with several institutions and all receive credit for that author. If an institution is a sub-entity of another institution also listed in EDIRC, the latter also receives credit. The ranking score of sub-entities is computed, but these institutions do not increment ranking counters. This means it is possible to consider this sub-entity as if it were a stand-alone entity. For each criterion, the institution’s score is just the sum of the scores of each affiliated author. The only exceptions are the h-index and the Wu-index, see below.
Quite obviously, institutions with many authors have an advantage. Clearly, taking an average score within an institution would make little sense, as author registration is not mandatory, and potentially lower-ranked authors may be discouraged from registering. On the contrary, adding up all authors’ scores gives the right incentive: everyone should register, including students who already have authored something in RePEc.
One controversial aspect, though, is how to treat authors with multiple affiliations. Until the December 2008 ranking, each affiliation counted equally and fully, which counted some authors multiple times, and some institutions with numerous “courtesy” appointments would rank much higher than expected. Since the January 2009 ranking, the rules for multiple appointments have changed in the following way. For each affiliation $i$, the number of registered authors is counted; call it $S_i$. Then, the weight of that institution is
$$W_i = \frac{1/S_i}{2 \sum_j 1/S_j}.$$
Note that these weights sum to 0.5. The remaining 0.5 is attributed to the affiliations whose website domain most closely matches the email address or personal website of the author (ties are split equally). If it is impossible to identify a principal affiliation, for example for authors without institutional homepages and with email accounts at Gmail or alumni accounts, all weights are doubled. For affiliations that are not listed in EDIRC, and thus do not have a well-defined $S_i$, by default the number of authors divided by the number of institutions in EDIRC with authors is taken.
Of course, authors may disagree with the weights. Since February 2012, authors can specify the weights themselves. In fact, any change in affiliation now requires the author to set these weights, with the hope that system-set weights will gradually fade out. In the end, these weights are supposed to better take into account courtesy appointments by giving them less weight and attribute authors to the location where they mostly work.
Finally, we need to explain how the h-index is computed in the case of institutions. Remember that for authors h is defined as the number of works with at least h citations. For institutions, we follow [21] and define the institutional h as the number of authors affiliated to that institution with an h-index of at least h. As the h can only be an integer and the support of its distribution is even smaller than for authors, there are numerous ties. To break these ties, we adapt [22]. They augment h by a rational number between zero and one, measuring the distance to the next h-index considering how many citations are required to reach it. In our case, we measure a similar distance, but we consider how many authors with appropriate h-indices are necessary to reach the next step. Note that for multiple affiliations, it is impossible to use the weights discussed above. The h of member authors is fully counted toward each institution.
A final note regarding institutions: due to the nature of these criteria, the measures of centrality in the co-authorship network, Close and Betweenn, cannot be computed.
7. Ranking of Geographic Regions
To rank geographic regions (countries, U.S. states), the same logic is used as for ranking institutions. All authors affiliated with institutions in a particular region are added to the pool of that region. However, authors with multiple affiliations have their scores split among all regions according to the weights discussed in the previous section.
For authors with affiliations not listed in EDIRC, the geographic location of their affiliation is guessed from the address of its web page. If it still cannot be found, then the home page of the author and then the email address are used. Obviously, this can still fail, as addresses with .com, .net, .org or .info are not geographically informative. We have at least tried to avoid errors.6
Once all these attributions are made, we simply add up the scores, properly weighted. The only exceptions are, again, the h-index, where the same scheme as for institutions is used, and the closeness and betweenness indicators in the co-authorship network. Note that we do not calculate scores for the United States as a whole, as it would obviously be number one in every aspect. Rankings for every state are given, though.
8. Other Rankings
A wealth of data is available, and this allows us to establish various other rankings. A few examples are below, and more will be added once sufficient critical mass is present to display somewhat credible results.
8.1. Ranking within Geographic Regions
Once authors have been attributed to a particular region, it is easy to rank them within that region as well. The same applies to institutions within that region. Publishing rankings with very few entities or authors does not make much sense, though. For this reason, a minimum of five authors or five institutions needs to be present. In some regions, there is little hope for authors to be listed, whatever their prestige, due to lack of participation by others in RePEc, or in small countries, due to the lack of economists. Therefore, rankings for regional conglomerates are presented as well, say the Mountain states in the United States, Central America and the Caribbean, or Africa.
Again, we need to mention authors with multiple affiliations here. If those span several geographic regions, their score is multiplied by the appropriate weight as computed above.7
A ranking that uses a straight excerpt from the world rankings is also provided for information (take the world ranking, and pick those from the specific region in the same order). But this ranking can differ significantly from the regional ranking for several reasons: first and as mentioned, authors with multiple affiliations across regions can only count part of their score toward a regional ranking; second, aggregate rankings are computed afresh within the region. This means that an author who is far ahead in the world ranking under some criteria (say, because of very high citation counts) is still ahead under the regional ranking, but not by much. This can matter for the aggregation of ranks.
The same rules apply for ranking institutions within regions, where author scores (multiplied by relevant weights) are added. And in a similar way, regional rankings may differ from a regional extraction from the world rankings.
8.2. Ranking of Female Economists
Women are, unfortunately, quite underrepresented in the economics profession. It appears, from a limited investigation, that they are also underrepresented within RePEc. One can still try to make a meaningful ranking with data collected within RePEc. Unfortunately, an author registering with RePEc does not declare his or her gender. This needs to be inferred from the first and middle names using a name data bank. There are, however, several difficulties: some names may be used for both females and males, and this may vary by culture. Also, given the international nature of RePEc, there is an incredible diversity in first names.
The following rules are applied for gender attribution: if there is more than 90% confidence the gender is correct, it is attributed. Ambiguous and unrecognized names are then manually entered into the exception tables: one for names that were not in the original tables, the other for case-by-case attributions.8 In the end, only 0.4% are left without a gender. Close to 18% are identified as female.
The ranking of female economists is performed solely among female economists, that is, without considering the gender-wide ranking: females are ranked within their group according to each criterion and then the rankings are aggregated. This means it is possible that the order of female economists when considering only gender may be different from the order of female economists among all economists, as also occurs in the regional rankings described above.
8.3. Ranking of Young Economists
It takes a long time for economists to make it into the top ranks; thus, it is of interest to compute rankings limited to young economists so that they have a chance of gaining some visibility. However, the RePEc Author Service does not collect data about birth date or graduation dates. As a proxy for age or professional experience, one can use the date of the first publication, whatever its form. It is commonplace to publish at least a working paper within a year of graduation, if not before finishing studies.
There is a small percentage of records in RePEc that do not carry dates. There is nothing that can be done about this, and we can only hope that those items are not the first works of some authors. For all others, the selection criterion is that the first work be within 5, 10, 15, or 20 years of the current year, counting whole years. Clearly, young economists have fewer papers and citations, so the rankings are much less stable once you go past the top few, especially for the very youngest. For this reason, the ranking is limited to the top 200.
8.4. Ranking of Deceased Economists
When we learn about the death of an economist, the deceased author is flagged and the profile continues to be maintained, as some works may still be added (posthumous publications as well as late additions to the database). As noted above, deceased authors do not have affiliations and are ranked alongside living authors.
As there are now about 150 deceased authors, it has become possible to rank them. It does not serve any particular purpose, except that it can be done.
8.5. Ranking of Top-Level Institutions
Institution rankings are performed at the department, school, institute, or center level, that is, whichever unit has a substantial number of economists. Some economists, however, are not affiliated with such units; they may, for example, be in a political science department. In a few cases they are senior administrators of the university. In addition, many universities have economists dispersed in several independent units that are listed in EDIRC; for example, the departments of economics, agricultural economics, public policy, and finance. The strength of a university with many such units may not be properly reflected in a ranking based on units. The same applies to the Federal Reserve System, whose constituents are treated as separate.
For this purpose, a separate ranking is created for “top-level” institutions. For example, anyone affiliated with any unit at Harvard University is counted toward the university. The aggregation is based on EDIRC records, and for those without an EDIRC affiliation the domain of the institutional webpage is used (authors can submit free text affiliations, where an institutional URL is required). The Federal Reserve Banks are aggregated as well. From there, ranking is performed in the same way as other institutional rankings.
8.6. Ranking within Fields
When registering, authors do not declare a field of research. It is therefore difficult to classify them within each field, although one could try to infer it from the JEL codes attached to their papers. However, as it is customary to put several JEL codes on each paper, and only about 20% of all papers have such a code, inferred field attributions would not be reliable. However, we can attribute authors to fields by using data collected with the NEP project [5].
NEP disseminates new working papers by email. At the time of writing, there are 91 field-specific NEP reports, each managed by an editor who selects, from all new papers, those that fit within her field. We use these assignments to classify authors. Thus, an author who had 75% of his papers in NEP announced in field A would get 75% of his score attributed towards his ranking in that field. To be ranked, a minimum threshold of 5 papers or 25% is required. As a paper can be announced in several NEP fields, an author may have attributions adding to more than 100%.
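The field attribution can be sketched as follows. We read the threshold as “at least 5 papers in the field, or at least 25% of the author's NEP-announced papers”, which is our interpretation of the sentence above; the field labels, the function name, and the data are invented.

```python
def field_shares(paper_fields, min_papers=5, min_share=0.25):
    """Share of an author's NEP-announced papers falling in each field.

    paper_fields: one set of field codes per announced paper.
    A field is kept if it reaches min_papers papers or min_share of all
    papers (assumed reading of the threshold).
    """
    total = len(paper_fields)
    if total == 0:
        return {}
    counts = {}
    for fields in paper_fields:
        for field in fields:
            counts[field] = counts.get(field, 0) + 1
    return {
        field: count / total
        for field, count in counts.items()
        if count >= min_papers or count / total >= min_share
    }


# Eight announced papers; two of them were announced in two reports
papers = [{"MAC"}] * 4 + [{"MAC", "MON"}] * 2 + [{"MON"}, {"LAB"}]
shares = field_shares(papers)
print(shares)  # {'MAC': 0.75, 'MON': 0.375}
```

The author's score would then count for 75% in field MAC and 37.5% in field MON, which adds up to more than 100% because two papers were announced in both reports; the single LAB paper does not reach either threshold.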
To rank institutions within fields, author scores are added for those affiliated, using the appropriate field and affiliation weights. No minimum threshold is used, the rationale being that institutions are expected to have much more diverse expertise than individuals.
In addition, one can also use the field code in EDIRC for institutions. For example, institutions working in agricultural economics or finance are well identified. Also, certain institution types are well documented: central banks, think tanks, international organizations. For others, patterns in their names (or their English translation) are used. This is the case for economics departments and business schools. For all of them, separate rankings are released, including for U.S. Economics departments. Note that for economics departments, an effort is made to remove mis-fits and add those missed by the automatic categorization.
Note that, as for regional rankings, ranking points are computed within the set of admissible authors or institutions and thus can differ from an excerpt of the world rankings, as for other rankings of subsets described above.
9. A Glimpse at Results
We do not want to give detailed rankings here; they are constantly updated and available at http://ideas.repec.org/top/. In the following, we present a comparison of the various criteria and aggregation methods using a snapshot of the data on July 9, 2012, with 32,731 registered authors affiliated with 5,825 institutions.
9.1. Impact Factors
How do the impact factors compare? Table 1 provides a summary with rank correlations. All of them are very high. This is quite natural as series with many citations ought also to be cited by series with high impact factors. Overall, it does not seem to matter which criterion is used when it comes to ranking series or journals.
Of particular interest here is to compare the impact of journal articles relative to working papers. Table 2 shows that there is no clear winner, which could surprise many. We have to keep in mind that some journals have very low impact factors, while some working paper series have impact factors superior to most journals. Note also, as explained in the previous sections, that if the article version of a paper is cited, it counts toward both. So these numbers do not reflect where the citing author found the reference.
Table 1. Rank correlations of series.
| ||All series|
|Recursive discounted factors||.951||.993||.96||1||.023||.508||.44|
|Recursive discounted factors||.966||.994||.965||1||-.016||.515||.524|
| ||Working paper series|
|Recursive discounted factors||.948||.992||.961||1||.023||.585||.457|
|Impact factor||w/ items|
| ||All series|
|Recursive discounted factors||.957||.991||.964||1||-.062||.352||.345|
|Recursive discounted factors||.956||.992||.954||1||-.069||.422||.455|
| ||Working paper series|
|Recursive discounted factors||.952||.988||.966||1||-.057||.394||.312|
How do the various rankings compare? Taking all articles and papers that are ranked in the top 500 in any of the six categories on February 23, 2009, and narrowing them down to those listed in all six categories, we obtain a sample of 416 items. The fact that 83% of the top 500 according to one criterion are listed in all other criteria is already an indication of high correlation. Within this (rather small) sample, the rank correlations are still fairly high, averaging 0.647 (Table 3). Rank correlations over the whole sample would be much larger, as demonstrated in other contexts below, but much more difficult to compute, for technical reasons.
We have 34 different ways to rank authors;9 thus if we want to compare how differently they perform, we need to look at 1,122 correlations ($34 \times 33$). Table 4 reports them along with correlations with the harmonic average of rank (excluding extreme ranks), which is used as the default for aggregate rankings. While all these numbers can be overwhelming, the following can be extracted: the average correlation stands at 0.822 and varies between 0.561 and 0.997. The table groups the criteria in categories (number of works, citations, derived from citations, article pages, visibility on RePEc); not surprisingly, correlations within these categories tend to be higher than across categories. It is more interesting to see where criteria seem to differ most: article pages versus co-authorship centrality on RePEc, with an average correlation of 0.646. This does not mean that they are orthogonal, though; 0.646 is still a significant correlation. But it is revealing that publishing in journals, or even in good journals more specifically, has relatively little to do with how much people are connected to each other. Looking at how the various criteria correlate with the harmonic mean, we note that correlations are again relatively high. No criterion stands out with a particularly high correlation, an indication that one cannot easily summarize the aggregate ranking with a single criterion. The criteria based on co-authorship networks score lower than the others, however.
Speaking of significance of correlations, there is a statistic that allows one to measure how independent the criteria are from each other,
is the number of criteria10
is the average correlation,
is the number of authors, and
. To be significant at 5%, the statistic would need to be below
. Therefore, we easily reject the null hypothesis that the criteria are independent.
One should expect that correlations are higher when we consider the aggregate ranking criteria than for the individual ranking criteria. This turns out to be wrong; see Table 5. They average 0.771, with a minimum of […] and a maximum of 1. This is because the “percent” aggregator is not well defined for the Close criterion, where a smaller number is better.11 Excluding the best and worst criterion for each author has a significant impact on the overall picture, however, as the Close criterion is then excluded for most authors in the “percent” aggregations; the exclusion has little impact otherwise. Indeed, experience shows that the exclusion of extremes can alter the rankings at the very top for a few authors with a large variance in the rankings across criteria, but it does relatively little for others. The only exception is the “percent” aggregation, where, for example, a strong lead in a category can cause a drastic drop in ranking when that category is excluded. It is also remarkable that harmonic, arithmetic, and geometric aggregation methods are all very close to each other.
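The effect of excluding extremes on the harmonic, arithmetic, and geometric aggregations can be illustrated with a small Python sketch. The ranks below are made-up values for a single hypothetical author, and the sketch assumes that exactly one best and one worst rank are dropped; the lexicographic and “percent” aggregations are not covered.

```python
from statistics import fmean, geometric_mean, harmonic_mean


def trim_extremes(ranks):
    """Drop the single best (lowest) and single worst (highest) rank."""
    if len(ranks) < 3:
        raise ValueError("need at least three ranks to drop both extremes")
    return sorted(ranks)[1:-1]


def aggregate(ranks, exclude_extremes=True):
    """Aggregate an author's ranks across criteria with three kinds of mean."""
    kept = trim_extremes(ranks) if exclude_extremes else list(ranks)
    return {
        "harmonic": harmonic_mean(kept),
        "arithmetic": fmean(kept),
        "geometric": geometric_mean(kept),
    }


# Hypothetical author: ranked 1st on one criterion and 200th on another.
author_ranks = [1, 4, 5, 6, 200]

trimmed = aggregate(author_ranks)                            # uses ranks 4, 5, 6
untrimmed = aggregate(author_ranks, exclude_extremes=False)  # uses all five ranks
```

In this example the three trimmed aggregates all fall between 4.8 and 5, whereas without trimming the harmonic mean is pulled toward the single best rank and the arithmetic mean toward the single worst one.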
Average impact factors.
|Discounted simple factors||0.86||0.61|
|Discounted recursive factors||0.31||0.19|
Rank correlations of scores for top items by criteria.
| ||Criteria from Left Column|
|Number of citations||1||.909||.465||.895||.831||.429|
|Discounted simple factors||.831||.883||.483||.867||1||.564|
|Discounted recursive factors||.429||.432||.598||.508||.564||1|
Rank correlations across criteria for authors, full sample.
| ||Works||Works||Works||Works||Works||Works||Works||Cites||Cites||Cites||Cites||Cites||Cites||Cites||Cites||Cites||Cites||Cites||Cites||Index||Authors||Authors||Pages||Pages||Pages||Pages||Pages||Pages||Views||Downloads||Views||Downloads|| ||Betweenn||Harmonic|
Rank correlations across aggregate criteria for authors, full sample.
The concordance of rankings across institutions is higher than that of authors for individual criteria and for aggregate criteria;12 see Table 6 and Table 7. Looking at the individual correlations, the patterns are also somewhat different compared with authors. For example, the h-index and citing authors rankings typically correlate less, while page counts and RePEc visibility correlate more. And of course, the “percent” aggregation correlates much more. Comparing the individual criteria with their harmonic average, we find that correlations are significantly lower than for authors, and the h-index stands out with a higher correlation than the others.13
One has to keep in mind that while a correlation is a linear concept, the harmonic mean is not, the more so when the extreme values are removed. One thus cannot necessarily infer from the rather low correlations that all criteria contribute information to the aggregate.
10. Comparison with Other Ranking Methodologies
The goal of this section is not to compare how the impact factors or rankings obtained by RePEc differ from other exercises.14 It is rather to highlight some of the conceptual differences: what RePEc may miss and what others may miss.
10.1. What RePEc can Do and Others Not
The rankings described above make use of the many facets of the data collected within the RePEc project. Some of them are quite unique, which certainly gives these rankings some added value when compared with existing rankings:
Rank correlations across criteria for institutions, full sample.
Rank correlations across aggregate criteria for institutions, full sample.
| || ||harmonic||arithmetic||geometric||lexicographic||graphicolexic||percent|
Timeliness: The data in RePEc are constantly updated and the results are continuously refreshed on its websites. For example, a working paper or article is typically listed within 24 hours of the publisher indexing it, its citation analysis is released within a month, and its downloads are continuously monitored.
Current Affiliations: Rankings of institutions reflect the current affiliations of authors and can take the move of an author from one affiliation to another into account within a month. Other counts typically take into account only the affiliation at the time of publication.
Pre-Publications: Established citation aggregators typically consider only citations in journals to journal articles. Even the set of journals is often severely limited. There are no such restrictions in RePEc. In fact, working papers are a very important means of dissemination in economics (and RePEc may have contributed to this) that should not be neglected. Note that analyzing working papers also significantly contributes to the timeliness of rankings.
Certainty about Authorship: Given that authors acknowledge what works they have authored when they maintain their RePEc profiles, one big issue in ranking authors is resolved: name ambiguities. Indeed, many publications provide only the initial of the first name. Also, there are homonyms in the profession. The use of RePEc data leaves no doubt.
New Ranking Criteria: Thanks to the fact that authors build profiles in RePEc, it is possible to reliably count how many different authors cite a particular author. We do not know of the use of the NCAuthors, RCAuthors, Close, and Betweenn criteria elsewhere. The same applies to the h-index for journals, series, and institutions.
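Two of the criteria mentioned in this list, the number of distinct citing authors and the h-index, are simple to compute once works are reliably attributed to authors. The Python sketch below illustrates the idea on made-up data; the data layout, the identifiers, and the rule that a citation counts as a self-citation whenever the cited author is among the authors of the citing work are assumptions of the sketch, not a description of the RePEc implementation.

```python
def h_index(citation_counts):
    """Largest h such that at least h items have at least h citations each."""
    h = 0
    for position, count in enumerate(sorted(citation_counts, reverse=True), start=1):
        if count >= position:
            h = position
        else:
            break
    return h


def citing_authors(author, works, citations):
    """Distinct authors citing `author`, skipping the author's own citing works.

    works: dict mapping a work identifier to the set of its authors.
    citations: iterable of (citing_work, cited_work) pairs.
    """
    citing = set()
    for citing_work, cited_work in citations:
        if author in works[cited_work] and author not in works[citing_work]:
            citing |= works[citing_work]
    return citing


# Hypothetical works and citations.
works = {
    "w1": {"alice"},
    "w2": {"alice", "bob"},
    "w3": {"bob"},
    "w4": {"carol", "dave"},
    "w5": {"carol"},
}
citations = [
    ("w2", "w1"),  # self-citation for alice, ignored
    ("w3", "w1"),
    ("w4", "w1"),
    ("w5", "w2"),
    ("w4", "w3"),
]

alice_citers = citing_authors("alice", works, citations)
bob_citers = citing_authors("bob", works, citations)
example_h = h_index([10, 8, 5, 4, 3])
```

The same `h_index` function applies unchanged to a journal, a series, or an institution, provided the list holds the citation counts of the items attributed to it.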
10.2. What RePEc cannot Do
There is very little human intervention in anything that RePEc does. Thus various aspects of other ranking analyses cannot be performed here:
Errors: Citation analysis is based heavily on automatic reference extraction from texts and pattern matching of titles. Errors can obviously happen, and probably more so than with analysis by humans. The most important case is when a list of other working papers in a particular series is printed on the last page of a paper, and this list is interpreted as the continuation of the citations. This is adjusted when reported, but affected authors have little incentive to report this. Authors can now remove citations that are not accurate, though.
Adjustments: Any criterion based on page counts could be adjusted for the size of the page or its average word count in order to truly reflect the length of the article. RePEc does not do this, as it is completely automated.
Stable impact factors: Due to the constant adjustments in RePEc, impact factors change frequently, within bounds. This can make the use of such factors difficult for third parties.
Comprehensiveness: Some important publications are still missing in RePEc, but RePEc has no staff to index them. Also, not all authors are registered with RePEc, and some do little to maintain the accuracy of their records.
In this paper, we hope to have demonstrated that the ranking exercises performed in RePEc are based on a sound methodology and can be useful. It should also be clear that they are a work in progress, as the data are not yet as comprehensive as they could be, both in terms of listed publications and, especially, registered authors. The citation database is the component that is the most experimental15 at this point, as reference extraction and matching are difficult and error-prone. As more publishers and more authors join the RePEc project, and as we perfect the analysis of the data, our confidence in the rankings will rise, and we hope the RePEc rankings will be regarded as a useful tool in the profession.