Article

Multilingual Ranking of Wikipedia Articles with Quality and Popularity Assessment in Different Topics

by Włodzimierz Lewoniewski *, Krzysztof Węcel and Witold Abramowicz
Department of Information Systems, Poznań University of Economics and Business, 61-875 Poznań, Poland
* Author to whom correspondence should be addressed.
Computers 2019, 8(3), 60; https://doi.org/10.3390/computers8030060
Submission received: 10 May 2019 / Revised: 2 August 2019 / Accepted: 13 August 2019 / Published: 14 August 2019

Abstract

On Wikipedia, articles about various topics can be created and edited independently in each language version. Therefore, the quality of information about the same topic depends on the language. Any interested user can improve an article and that improvement may depend on the popularity of the article. The goal of this study is to show what topics are best represented in different language versions of Wikipedia using results of quality assessment for over 39 million articles in 55 languages. In this paper, we also analyze how popular selected topics are among readers and authors in various languages. We used two approaches to assign articles to various topics. First, we selected 27 main multilingual categories and analyzed all their connections with sub-categories based on information extracted from over 10 million categories in 55 language versions. To classify the articles to one of the 27 main categories, we took into account over 400 million links from articles to over 10 million categories and over 26 million links between categories. In the second approach, we used data from DBpedia and Wikidata. We also showed how the results of the study can be used to build local and global rankings of the Wikipedia content.

1. Introduction

Nowadays, in order to make the right economic decisions, one needs to analyze and interpret a vast amount of information. The quantity and quality of information to a large extent determine the quality of decisions in various branches of the economy. On the one hand, one must take care of access to proper sources of information. On the other hand, the quality of information determined by various characteristics is also important. High-quality information is essential for effective operation and decision-making in organizations [1]. Inaccurate and incomplete information may have a negative impact on a company’s competitive edge [2].
The Internet enables cooperation and exchange of information on a global scale. Useful information can be found both in specialized sources and in general online resources. Nowadays, everyone can also contribute to the development of common human knowledge on the Internet. One of the best examples of such online repositories is Wikipedia, in which content can be created directly in a web browser. This online encyclopedia has been freely available for approximately 20 years, and anyone willing can co-create content. Wikipedia relatively quickly became an important source of information around the world. It contains over 50 million articles in over 300 different languages [3]. The English language version is the largest and contains over 5.8 million articles. Currently, Wikipedia ranks fifth among the most visited websites on the Internet [4], behind only Google, YouTube, Facebook, and Baidu.
The popularity of Wikipedia is even reflected in the language that scientists use in their works [5]. Despite its popularity, Wikipedia is often criticized for the low quality of content [6]. Articles on a specific subject (a thing, a human, an event, etc.) can be created and edited independently in each language version. Therefore, the quality of information about the same subject often varies depending on the language [7,8,9,10]. It should also be noted that the topic described in one language version can be translated into other languages. However, a relatively small number of users with knowledge of two or more languages take up such an initiative by transferring content between different language versions [11].
Even the largest English Wikipedia does not contain information about all subjects. As we can see in Figure 1, there are over 15 million unique subjects described in at least one of 55 considered language versions. This can be explained by the fact that some issues may be more common in smaller geographical areas, hence the probability of finding more information on a given topic in the relevant language versions (other than English). Overall, we can find almost 10 million subjects that are not covered in English and appear in less-developed versions of Wikipedia [7,12].
When a subject is not described in the analyzed language version or information about the subject is of low quality, we can try to find information about it in other Wikipedia languages. However, identifying the language version that best describes the subject may require significant effort from the user, because popular subjects are available in several dozen language versions.
Automatic quality assessment of Wikipedia articles is a known challenge in the scientific community. Existing works have some limitations, e.g., they focus mostly on the biggest edition (English) or other popular language versions of Wikipedia. Usually the measurement of quality is reduced to an analysis of the volume of content, i.e., the number of important elements that the article must contain (such as references, images, sections). However, for quality assessment, content must also be checked by other users in terms of neutral point of view, timeliness, quality of sources and other important elements, which can be challenging even with current approaches. Therefore, the popularity of the article may be another factor to consider in quality assessment: the more users read the content, the greater the probability that amendments are introduced to the article, especially when incorrect or outdated information is detected.
In this paper, we present the assessment of quality and popularity of Wikipedia articles in different languages related to selected topics. This assessment was performed for articles on two levels: within each considered language version (local) and for all languages combined (global).
For the purpose of this study, we selected 55 language versions of Wikipedia that in 2018 and 2019 had at least 100 thousand articles and a depth indicator of at least five. The depth (or editing depth) shows how frequently articles are updated in a specific language version [13]. Table 1 presents basic statistics about the 55 language versions of Wikipedia that were considered in the study.

2. Topic Classifications of Wikipedia Articles

2.1. Category Classification

Wikipedia has an extensive category network and each article can be annotated with multiple categories, organized into an “ontology of topics” [14]. Each language version can define its own structure and hierarchy of categories. Moreover, in some language versions that structure is often too fine-grained to be directly analyzed [15]. All this may make it difficult to determine the number of possible topics to deal with.
Category structure and alignment of articles to each category can be analyzed based on files from Wikipedia dumps. Three files have to be used (example for English Wikipedia):
  • enwiki-latest-category.sql.gz: category information; here we use category identifiers and their names;
  • enwiki-latest-categorylinks.sql.gz: wiki category membership link records; here we use information about the source page ID and destination category name;
  • enwiki-latest-page.sql.gz: base per-page data; here we use page IDs, titles and namespace information to identify article (ns 0) and category (ns 14) pages.
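As a minimal sketch of how the records from these files can be combined, the Python function below separates links from articles to categories from links between categories. It assumes the SQL dumps have already been parsed into plain tuples; the function name `split_links` and the tuple layouts are illustrative assumptions, not part of the dump format.

```python
from collections import defaultdict

ARTICLE_NS = 0    # namespace of regular articles
CATEGORY_NS = 14  # namespace of category pages


def split_links(pages, categorylinks):
    """Split category membership records by the type of the source page.

    pages: iterable of (page_id, namespace, title) rows (assumed layout).
    categorylinks: iterable of (source_page_id, category_name) rows (assumed layout).
    Returns (article -> categories, subcategory -> parent categories).
    """
    page_info = {page_id: (ns, title) for page_id, ns, title in pages}
    article_links = defaultdict(set)
    category_links = defaultdict(set)
    for source_id, category in categorylinks:
        if source_id not in page_info:
            continue  # link whose source page is missing from the page table
        ns, title = page_info[source_id]
        if ns == ARTICLE_NS:
            article_links[title].add(category)
        elif ns == CATEGORY_NS:
            category_links[title].add(category)
        # pages in other namespaces (talk, user, ...) are ignored
    return dict(article_links), dict(category_links)
```

Pages outside namespaces 0 and 14 are skipped, so only the two kinds of links used in the study remain.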
For further research, we extracted information about over 10 million categories in 55 language versions and analyzed about 400 million links from articles to categories and over 26 million links between categories. General statistics about categories are presented in Table 2. The category ratio shows the number of unique categories per number of articles in a particular language version. The highest value of this indicator was observed in Urdu Wikipedia (1.23). English Wikipedia, the largest version, is in the middle of the ranking for this indicator.
Another measure that can be useful when analyzing how often Wikipedia users assign different categories to describe each article is the average number of categories per article. Based on data from Table 2 we can define the top three leaders: Arabic with 30, English with 21, and French with 18 categories per article.
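The two indicators discussed above can be sketched as follows. This is only an illustration: the function names are hypothetical, and we assume that the average is taken over all articles of a language version (the text does not state whether uncategorized articles are included).

```python
def category_ratio(num_categories: int, num_articles: int) -> float:
    """Number of unique categories per article in one language version."""
    return num_categories / num_articles


def avg_categories_per_article(article_links: dict, num_articles: int) -> float:
    """Mean number of category assignments per article.

    article_links maps an article title to the set of its categories.
    """
    total_links = sum(len(cats) for cats in article_links.values())
    return total_links / num_articles
```

For instance, a hypothetical language version with 123 categories and 100 articles would have a category ratio of 1.23.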
We can also notice that in some language versions of Wikipedia there is a large number of categories that do not have their own page describing the category and pointing to the parent category. The highest values were found in Vietnamese, Chinese and Indonesian Wikipedia: about 100 thousand categories without pages. For the first two languages, with about 1 million articles, this is one-fourth and one-third of all categories, respectively. In Indonesian, with about 460 thousand articles, it is about half of all categories. For comparison, the largest English version with over 5 million articles has only 97 categories without a page.
The so-called main categories are present in the majority of considered languages. This applies mainly to those categories that are at the highest levels in the polyhierarchy. One of the main categories is presented on a special page, “Category: main topic classifications” [16]. Based on this page, we can identify 38 categories on specific topics in the English Wikipedia. Table 3 shows the names of these categories together with the number of considered language versions in which they are available. As we can see, some topics may not be available in all languages.
As mentioned before, the category structure is complex and ever-changing, as it can be edited by anyone: users can add a category or change its assignment to another category. The resulting category structure is noisy [14], sparse, and contains duplications and oversights [15]. As a result, a category can be repeated at different levels of the tree, whose root can be another main category (one of the 27 considered). To avoid such situations, we kept only the occurrence closest to the root and cut off the branches in which the category reappeared at deeper levels. Figure 2 shows an example of this procedure, in which the subcategory “Food and Drink” is found at different levels of the tree and only one occurrence remains, the one at the highest level.
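The pruning rule described above can be sketched as a breadth-first traversal started from all main categories at once, so that every category is kept only at its shallowest occurrence. This is an illustrative sketch rather than the exact procedure used in the study: the function name is hypothetical, and ties between main categories at the same depth are resolved by the order of `roots`, which is our assumption.

```python
from collections import deque


def assign_min_depth(children, roots):
    """Assign each category to the main category where it appears highest.

    children: dict mapping a category to a list of its subcategories.
    roots: list of main categories (depth 0).
    Returns {category: (main_category, depth)}.
    """
    assigned = {root: (root, 0) for root in roots}
    queue = deque(roots)
    while queue:
        category = queue.popleft()
        root, depth = assigned[category]
        for child in children.get(category, ()):
            if child not in assigned:  # deeper repeats are cut off
                assigned[child] = (root, depth + 1)
                queue.append(child)
    return assigned
```

Because the traversal is breadth-first, a category reached at depth 1 is never replaced by a later occurrence at depth 2 or deeper, and cycles in the category graph cannot cause infinite loops.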
By counting articles in English Wikipedia in each of the considered main categories we discovered that almost 15% of them are about people. The pie chart in Figure 3 shows shares of articles in English Wikipedia in 27 considered categories.
Figure 4 shows the distribution of articles by category within each considered language version of Wikipedia. Darker colors in the heatmap represent a higher share of articles in a particular main category within the selected Wikipedia language.
After combining articles from all considered language versions to a particular category we concluded that the largest number of articles are in one of two categories: Geography (12.68%) and People (11.48%). The pie chart in Figure 5 presents how articles in all considered Wikipedia languages are distributed among 27 main categories.
As we mentioned before, in some language versions there is a relatively high average number of categories assigned to each article. This may increase the possibility of an article falling into more than one main category. We studied this issue for the leading language versions (Arabic, English, French) with regard to the number of categories per article. Results are presented in Figure 6.

2.2. Semantic Classification

The second approach to category assignment to Wikipedia articles is based on Wikidata and DBpedia. Wikidata is a collaboratively edited knowledge base [17]. DBpedia is the semantic database resulting from the extraction of structured, multilingual knowledge from Wikipedia [18,19]. The data from these open databases are widely used in a number of domains: web search, life sciences, maritime domain, art market, digital libraries, business networks, and others [20,21,22,23].
DBpedia uses its own ontology with defined properties and classes organized into a hierarchy. DBpedia provides English names to each class, such as “Place”, “Species”, “Person” etc. Wikidata gives a unique identifier to each class, for example class “city” is marked as Q515, “human” as Q5, “Organization” as Q43229. Another difference between these databases lies in the number of classes and placing these classes in an ontology. Wikidata has over 300 thousand classes [24], while DBpedia ontology consists of about 800 classes [25].
A significantly larger number of classes in Wikidata can lead to difficulties in finding a list of objects on a particular topic. For example, if we want to find all cities, it is not enough to take into account only one class Q515 (city), because city can also be described by Q1637706 (city with millions of inhabitants), Q5119 (capital), Q2264924 (port city), Q58339717 (city of India), Q174844 (megacity) and other identifiers. This variety of classes leads to significantly fewer instances in each class in Wikidata than in DBpedia [24].
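One simple way to handle this variety is to group several Wikidata class identifiers under one topic. The sketch below uses only the identifiers listed above, so the set is deliberately incomplete; the names `CITY_CLASSES` and `is_city` are hypothetical.

```python
# Wikidata classes that describe a city (incomplete, taken from the examples above).
CITY_CLASSES = {
    "Q515",       # city
    "Q1637706",   # city with millions of inhabitants
    "Q5119",      # capital
    "Q2264924",   # port city
    "Q58339717",  # city of India
    "Q174844",    # megacity
}


def is_city(instance_of):
    """Return True if any class assigned to the item is a city class.

    instance_of: iterable of Wikidata class identifiers of one item.
    """
    return not CITY_CLASSES.isdisjoint(instance_of)
```

An item annotated only as a capital (Q5119) is then counted as a city, while an item annotated as human (Q5) is not.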
We should also consider a way of assigning a class to objects in these semantic databases. DBpedia extracts information from Wikipedia infoboxes and identifies classes based on the name of the infobox and values of some special parameters. Thus, articles with the same infobox name often go to the same class. In Wikidata, items can be edited by everyone, therefore different classes can be assigned to similar objects.
There are some papers that study the differences between DBpedia and Wikidata [24,26,27]. Each has its own advantages, so we decided to use combined data to divide articles into separate classes: actor, automobile, business, city, film, football player, human, programming, university, videogame, and website. One of the advantages of such a classification approach by topic is that we are dealing here with a more explicit assignment of articles to specific classes and each language version has at least several representatives of each class.

3. Quality Measures

In order to discern the quality of content, the Wikipedia community created a grading system for articles. However, each language version can use its own standards and grading scale [28,29]. For example, in English Wikipedia, articles can get one of 7 grades (from highest to lowest): featured article (FA), good article (GA), A-class, B-class, C-class, start, and stub. Russian Wikipedia also has seven quality grades but with other names and criteria: Izbrannaja Stat’ja (similar to FA), Horoshaja Stat’ja (similar to GA), Dobrotnaja Stat’ja, I, II, III, IV (similar to Stub). German Wikipedia uses only two quality grades (Exzellente Artikel and Lesenswerte Artikel), which have similar criteria to the FA and GA grades, respectively. Polish Wikipedia defined five quality grades: Artykuł na Medal (similar to FA), Dobry Artykuł (similar to GA), Czwórka (A-Class), Start, and Zalążek (similar to Stub).
Even though a grading system is available, the large number of unassessed articles remains a big challenge. For example, German and Polish Wikipedia have less than 1% of articles with quality grades. Moreover, articles about the same topic in different languages can also be graded using different criteria. The above facts not only pose problems for comparing the quality of articles in the same language but also for evaluating and comparing different language versions of articles on the same topic.
Using machine learning techniques, it is possible to treat quality assessment of Wikipedia articles as a classification task. In order to build such models, various features can be taken into account, for example the length of an article, the number of references, and the number of images or sections [30,31,32,33,34,35].
One of the universal approaches for quality assessment of multilingual articles is the objective revision evaluation service (ORES) [36]. This service automates tasks like detection of vandalism and removal of edits made in bad faith [37]. Additionally, the service can evaluate articles on a scale between 0 and 1 in some language versions. However, automatic quality assessment of an article by ORES is currently limited to nine language versions of Wikipedia and it does not include such developed language chapters as German, Spanish, Italian, Polish, Japanese, or Chinese.
In our previous studies [28,38] we defined the synthetic measure to combine several features of articles to allow ranking of Wikipedia articles on a scale between 0 and 100. It is based on the most universal features inferred from machine learning models built for several languages. In the paper, we present conclusions from an assessment of over 39 million articles. Additional focus of this work is analysis of demand for information about various topics in different languages from the point of view of readers, as well as from the authors of Wikipedia content. The intersection of those two dimensions is also considered.
Our previous study [39] showed that the popularity of the Wikipedia articles can be measured by different SEO metrics from other websites. Such indicators as social signals from Facebook, Twitter, Pinterest, Youtube, and others can help to determine also the quality of the content in a multilingual encyclopedia from the external sources. In this work we decided to use internal popularity measures from the point of view of readers and writers of the Wikipedia articles. Additionally, we decided to provide cumulative (global) values of these measures over the language versions about various subjects.
Diverse approaches to defining information by researchers lead also to inconsistencies in defining the notion of its quality. According to the most popular definition, the quality of information can be defined as fitness for use [40,41].
In order to define the quality dimensions in Wikipedia, one should take into account the similarity of this website with traditional encyclopedias and web 2.0 services. On the one hand, content in Wikipedia is created to be a reference point, in an encyclopedic style. According to various studies it has comparable accuracy to other traditional encyclopedias [42,43]. The quality of an article in a traditional encyclopedia can be defined by 7 dimensions: authority, completeness, format, objectivity, style, timeliness, uniqueness [44,45]. On the other hand, Wikipedia is built in a way to allow collaboration between users. It is therefore based on web 2.0 technologies, which have the following quality dimensions: accessibility, completeness, credibility, involvement, objectivity, readability, relevance, reputation, style, timeliness, uniqueness, usefulness [45,46].
Considering the quality criteria adopted by the Wikipedia community and previously described characteristics of traditional encyclopedia and web 2.0 documents, we can choose the following quality dimensions for the Wikipedia articles: completeness, credibility, objectivity, readability, relevance, style, timeliness. Figure 7 shows coverage between quality dimensions of the web 2.0, traditional encyclopedia and Wikipedia.
Each quality dimension contains a specific set of features (measures). Some features can be related to multiple quality dimensions. There are different ways to define and extract features of Wikipedia articles. Based on the literature and our own experiments, we focused on several important features that can show the quality of a Wikipedia article from different dimensions.
Length of text can be measured in various ways; most often it is represented by the length in bytes, or by the number of letters or words [28,38,47,48,49,50,51,52,53,54,55,56,57,58]. The length of an article is related to completeness and may indicate the presence of relevant facts and details in the article.
High-quality articles are expected to use reliable sources [59]. Readers of encyclopedias must be able to check where the information comes from [60]. Therefore, one of the most commonly used reliability measures is the number of references in a Wikipedia article [28,34,38,48,49,50,56,58,61,62,63,64]. References are related to the credibility of the article. Our previous research has shown that it is advantageous to analyze not only the quantity but also the quality of the references [39].
Length of text can be positively correlated with the number of references but it is important that all relevant facts in Wikipedia should be supported by reliable sources. For this purpose, the reference density can be calculated as the number of references divided by the length of text.
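As a small illustration, reference density can be computed as below. The function name is hypothetical, and the unit of text length (bytes, letters, or words) is left to the caller, since any of these units may be used.

```python
def reference_density(num_references: int, text_length: int) -> float:
    """Number of references divided by the length of text.

    Returns 0.0 for an empty article to avoid division by zero
    (this guard is our assumption, not part of the definition).
    """
    if text_length == 0:
        return 0.0
    return num_references / text_length
```

For example, an article with 10 references and a text length of 2000 units has a reference density of 0.005.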
Wikipedia articles must provide information in a fair and impartial manner. In this case, we can take into account information presented graphically, i.e., images [28,34,38,47,50,55,56,57,61,62,65,66]. On the one hand, pictures can help to assess the objectivity of the presented material. On the other hand, we can also measure completeness (because articles on a specific topic should contain images) and style (because the authors decided to add more photos instead of writing long text).
High-quality content must be prepared in accordance with the guidelines of Wikipedia regarding the style that applies to, among others, organization, and structure of the article. Therefore, one of the simplest and most popular measures of this dimension is the number of sections in the article [28,32,34,50,52,56,58,61,62,63].
The quality measures mentioned above can be combined to build a synthetic measure for evaluation of Wikipedia articles. Unlike most methods in this domain, the synthetic measure can assess the quality of Wikipedia articles on a scale from 0 to 100 [38]. Thus, we can compare the quality of articles between different language versions, which can have their own quality grading schemes.
The synthetic measure encompasses normalized values of the following five features: length, number of references, reference density, number of images, and number of sections. Every considered language version of Wikipedia has a special distinction for articles of the highest quality, equivalent to the FA and GA grades in the English version. Normalization of the 5 selected features depends on the language chapter of Wikipedia, since it uses thresholds that depend on the best articles in the considered language version [38].
Normalization of each feature was conducted according to the following rule: if the value of a given feature in a given language exceeded the threshold, i.e., the median value for the best articles in the same language version, it was set to 100 points; otherwise, its value was linearly scaled to reflect the relation of the value to the median value. For example, if the median for the number of references in Polish Wikipedia was 97, any article with a larger number of references would score 100 for this feature; an article with 59 references would score proportionally 60.82 (100 · 59/97) points after normalizing. Because the thresholds differ between language versions, the same change in the value of a metric has a different effect on the normalized value in each language version.
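The normalization rule can be written as a short function. This is a sketch of the rule as stated above; the function name is hypothetical and feature values are assumed to be non-negative.

```python
def normalize(value: float, median_best: float) -> float:
    """Scale a feature value to the range 0-100.

    median_best is the median of this feature among the best articles
    of the same language version; values at or above it score 100.
    """
    if value >= median_best:
        return 100.0
    return 100.0 * value / median_best
```

With the Polish Wikipedia example from the text, `normalize(59, 97)` gives approximately 60.82 and `normalize(120, 97)` gives 100.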
For each language version of Wikipedia, each feature could play an important role in assessing the quality; therefore we first counted the normalized metrics average (NMA) by the following formula:
$$NMA = \frac{1}{c} \sum_{i=1}^{c} \hat{m}_i,$$
where $\hat{m}_i$ is the normalized measure $m_i$ and $c$ is the number of measures.
Next we took into account the number of quality flaw templates (QFT) in the considered article (if they existed) and our final formula for the quality measure reads as follows:
$$QualityScore = NMA \cdot (1 - 0.05 \cdot QFT).$$
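Both formulas can be combined in a few lines of code. The sketch below assumes that the normalized feature values are already available and that each quality flaw template reduces the score by a factor of 0.05, which is how we read the final formula; the function names and the `penalty` parameter are hypothetical.

```python
def nma(normalized_measures):
    """Normalized metrics average: mean of the normalized feature values."""
    return sum(normalized_measures) / len(normalized_measures)


def quality_score(normalized_measures, qft=0, penalty=0.05):
    """Synthetic quality measure reduced by quality flaw templates.

    qft: number of quality flaw templates in the article.
    penalty: reduction per template (0.05 is an assumption taken
    from our reading of the formula).
    """
    return nma(normalized_measures) * (1 - penalty * qft)
```

For example, an article whose five normalized features are 100, 60, 80, 40 and 20 has an NMA of 60; with two quality flaw templates its quality score drops to about 54.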
Previous research [29] revealed that the synthetic measure was one of the most significant among 100 variables used in the quality model of Wikipedia.

4. Popularity Measures

The popularity of an article can be determined with measures reflecting the demand for the information contained in it by the readers and Wikipedia authors. Popularity can play an important role in quality estimation in specific language versions of Wikipedia [29,34]. A larger number of users reading an article can contribute to faster identification and correction of errors, therefore amendments can be made more often (including an update of the information).
Popularity of an article can be measured based on the number of visits [34,38]. For example, one of the studies compared page view numbers of reptile species across languages and in their spatial distribution along with various biological attributes [67].
For assessment of popularity, we decided to use features available in the Wikipedia database: page views and the number of unique authors of an article. We also provided local and global measurements characterizing articles, which took into account semantic links between language versions.
For each page of Wikipedia, daily page views statistics are available in a dedicated online service [68] and Wikimedia dumps [69]. We used dumps to analyze the popularity of over 39 million articles in considered language versions of Wikipedia.
Popularity measures in this study were calculated as the median of the number of page visits per day, as proposed in the previous study [38]. If the measurement concerns only a selected language version, we call it local popularity. We can also calculate the global popularity, which takes into account the popularity of articles about the same topic in different languages (the so-called interwiki links are considered). The global popularity of an article is calculated according to the following formula:
$$PopGlobal(article) = \sum_{lang=1}^{n} PopLocal_{lang}(article),$$
where $PopLocal_{lang}$ means the local popularity of the article, $lang$ is the index of a specific language version and $n$ is the number of language versions of the selected $article$.
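A minimal sketch of both popularity measures is given below. It assumes that daily page view counts have already been extracted from the dumps into one list per language version; the function names and the data layout are illustrative.

```python
from statistics import median


def local_popularity(daily_views):
    """Median number of page visits per day in one language version."""
    return median(daily_views)


def global_popularity(views_by_lang):
    """Sum of local popularity over all language versions of an article.

    views_by_lang: dict mapping a language code to a list of daily views.
    """
    return sum(local_popularity(views) for views in views_by_lang.values())
```

For an article with daily views [10, 30, 20] in one language and [1, 5, 3, 7] in another, the local popularity values are 20 and 4, and the global popularity is 24.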
For quality improvement, the number of real edits is even more important than the number of page views. Authors’ interest (AI) can be measured as the number of unique authors of a Wikipedia article. Each user editing articles on Wikipedia has their own experience and level of knowledge, and can adhere to a certain world view. In this regard, it can be assumed that a larger number of authors can positively influence the objectivity of the article, since it may contain different points of view on a particular question. At the same time, the number of authors of an article can also indicate the relevance of the article to the Wikipedia community. To sum up, articles created by a larger number of people may be more objective, hence one of the measures leveraged in our research is the number of unique authors [28,34,47,55,56,57,58,63,64,65,70,71,72,73,74,75].
The number of authors can be extracted from article history. Figure 8 shows part of the article’s history about Game of Thrones (season 8) in English and German Wikipedia with highlighted authors.
Similarly to measuring popularity, AI can also be calculated for a specific language version (local AI) and as a cumulative value for all languages (global AI). Authors are identified by names or IP addresses. Therefore, if the same user edited the article in different language versions, they are counted as one author in the global AI. This measure can be calculated using the following formula:
$$GlobalAI(article) = \left| \bigcup_{lang=1}^{n} Authors_{lang}(article) \right|,$$
where $Authors_{lang}$ means the set of authors’ names, $lang$ is the index of a specific language version and $n$ is the number of language versions of the $article$.
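Since the global AI counts each author once across all language versions, it corresponds to the size of the union of the per-language author sets. The sketch below assumes that author names and IP addresses have already been extracted from the article histories; the function names and the sample data are hypothetical.

```python
def local_ai(authors):
    """Number of unique authors in one language version."""
    return len(set(authors))


def global_ai(authors_by_lang):
    """Number of unique authors across all language versions.

    authors_by_lang: dict mapping a language code to an iterable of
    author names or IP addresses.
    """
    return len(set().union(*authors_by_lang.values()))
```

For example, if the English version was edited by Alice, Bob and the address 192.0.2.1, and the German version by Bob and Carol, the local AI values are 3 and 2, while the global AI is 4 because Bob is counted once.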

5. Quality and Popularity Assessment

Following the procedures described in previous sections, we extracted over 100 million values of features characterizing articles in all analyzed languages. These values were then used to calculate the synthetic measure that assesses the quality of the content. We next grouped articles by 27 main categories and 55 languages. Within each of the obtained groups (almost 1500), we calculated the sum of all synthetic measure values and divided it by the number of articles. The resulting average quality of articles is presented in Figure 9. Darker colors in the heatmap represent higher values of average quality of articles in a specific category and language version.
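The grouping step described above can be sketched as follows. We assume that each article has already been reduced to a record of language, main category and synthetic measure value; the function name and the record layout are hypothetical, and the sample scores in the usage note are made up for illustration.

```python
from collections import defaultdict


def average_quality(records):
    """Average synthetic measure per (language, category) group.

    records: iterable of (language, category, quality_score) tuples.
    Returns {(language, category): average score}.
    """
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum of scores, article count]
    for language, category, score in records:
        group = totals[(language, category)]
        group[0] += score
        group[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}
```

With the made-up records ("sk", "Crime", 60.0), ("sk", "Crime", 70.0) and ("en", "Crime", 40.0), the function returns an average of 65.0 for Crime in Slovak and 40.0 for Crime in English. With 27 categories and 55 languages there can be up to 1485 such groups.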
The highest average quality was achieved by articles in the category Crime in Slovak Wikipedia (sk): 63.92 points. This is due to the fact that in this language version only a few articles fall into this category and they are generally well written according to the studied features. Articles about crime also have relatively high quality scores in English (en) and Chinese (zh) Wikipedia.
Second place in the ranking is taken by articles about events in Uzbek Wikipedia (uz): 43.96 points. Again, this main category does not contain much content: there are only 31 articles. If we take into account the development of the Uzbek Wikipedia (about 130 thousand articles), we can conclude that this category is rather important for the local community of editors. Articles about events also have relatively high quality scores in Hungarian (hu), Slovak (sk), Hebrew (he), and Chinese (zh) Wikipedia.
Third place regarding quality is taken by articles about mathematics in Volapük Wikipedia: 39.63 points. However, in this language chapter the category contains only 2 articles. Latin Wikipedia (la) takes fourth place with the average quality of articles about religion: 37.77 points.
If we take into account the most developed English Wikipedia, the highest average quality of articles can be found in categories: Philosophy, Crime, Military, and History. Generally, we can conclude that English Wikipedia articles usually have a high value of average quality measure in different topics.
Figure 10 shows the average number of page views per article in the year 2018 for each category and language version of Wikipedia. Darker colors in the heatmap represent higher average number of page views of articles in specific category and language version.
Generally, page view values are higher for the most popular languages. As a result, the first 11 positions in the ranking are occupied by English (en) Wikipedia. The most popular topic in this language is Philosophy. Articles about crime, technology, entertainment, mathematics, culture, and health also have some of the highest average popularity values in this language. All these categories had an average of at least 20 thousand page views per article in 2018.
The second most popular language version is Spanish (es). Similarly to English, the most visited category is Philosophy. Two other popular categories in this language are also worth mentioning: Mathematics and Health. Articles in these three main categories of Spanish Wikipedia have at least 14 thousand page views per year.
Third place is taken by Russian (ru) Wikipedia and category Entertainment, with about 16 thousand page views per year. Entertainment is also the most popular topic in Chinese (zh) Wikipedia.
Finally, Figure 11 shows the average number of authors (authors’ interest) per article in 2018 in each category and language version of Wikipedia. Darker colors in the heatmap represent higher values of average number of authors of articles in a specific category and language version.
As in the case of popularity measured by page views, categories in English Wikipedia topped the ranking of authors’ interest. Here we have such popular categories as Crime, Philosophy, and Entertainment. Articles on these topics were edited by at least eight authors on average during 2018.
The language version with the second most active authors is Hebrew (he) Wikipedia, with articles about entertainment. On average, at least 6 authors edited each article on this topic during the year. Entertainment is also popular among authors in Italian (it), Spanish (es) and Chinese (zh) Wikipedia. Italian Wikipedia is the third language in the authors’ interest ranking.
Table 4 presents the main categories that have the highest value of average quality, average popularity and authors’ interest in each language version of Wikipedia.
Depending on the Wikipedia language version, we observed different categories with the highest average quality, popularity, and AI. For example, in English Wikipedia, articles in the category “Crime” have the highest average quality, but articles from the category “Philosophy” have the highest average popularity and AI. As another example, in Arabic Wikipedia articles from the Religion category are the best for all three measures. The same applies to Latin Wikipedia. Persian Wikipedia shows a similar situation, except for popularity, where the category “Philosophy” has the highest values. Articles in Russian Wikipedia from the category “Entertainment” are the most popular and have the highest average quality, while from the authors’ point of view the “Events” category is the most popular. A similar trend applies to German Wikipedia. The category “Government” occupies a leading position in Azerbaijani, Finnish, and Slovenian Wikipedia.
Finally, we performed similar calculations for articles in semantic classes: actor, automobile, business, city, film, football player, human, programming, university, videogame, website. Figure 12 shows average quality, authors’ interest, and page views in 2018 per article in each semantic class and language version of Wikipedia. Darker colors in the heatmaps represent higher values of the selected measures.
The leader in terms of average quality is Tamil (ta) Wikipedia, with articles that describe cars (automobiles): 43.22 points. Second place in this ranking is occupied by articles about football players in Hindi (hi) Wikipedia, with 40.35 points per article. Third place in quality goes to English (en) Wikipedia with articles about cars, at 37.39 points. Articles about cars also have relatively high quality in Hebrew (he), Hindi (hi), and Chinese (zh) Wikipedia, at over 31 points. The class that appears most often in this quality ranking is cities, with articles in English (en), Latin (la), German (de), Slovenian (sl), Serbo-Croatian (sh), and Greek (el) Wikipedia scoring over 30 points per article.
As for page views, the situation is similar to that of the main category classification: English Wikipedia has the highest values. The most popular class in this language version is programming, with over 40 thousand page visits per article during 2018. The next most popular classes, with over 23 thousand visits per article during the year, are video games, cities, cars, actors, and websites. The second language version near the top of the popularity ranking is Russian (ru) Wikipedia, with articles about websites and video games. Next is the German (de) version, with articles about websites.
The authors’ interest ranking of the classes also shows a leading position for English (en) Wikipedia. Here, articles about cities have the highest number of authors per article in 2018: over 10 authors edited each article during the year. Articles about cars, actors, video games, and programming languages are also popular among authors, with over eight authors per article during the year. These are followed by articles from Hebrew (he) Wikipedia describing actors, with over seven authors per article during the year. We can also observe relatively high interest among authors in Chinese (zh), Thai (th), Italian (it), Spanish (es), and Japanese (ja) Wikipedia, with over four authors per article about an actor during 2018. Articles about universities have similar values of average authors’ interest in English (en), Urdu (ur), Japanese (ja), and Korean (ko) Wikipedia.
Table 5 presents classes that have the highest value of average quality, average popularity and authors’ interest in each language version of Wikipedia.

6. Local and Global Rankings of Wikipedia Articles

Based on the assessment of over 39 million articles, we built rankings of articles in each language version of Wikipedia separately and also leveraged knowledge about links between languages to build multilingual global rankings. Page views and authors’ interest can change over time; therefore, we also conducted calculations for individual months, from January 2018 to March 2019. This allows interesting analyses of changes in the preferences of Wikipedia authors and readers.
Popularity can be measured for a specific language version of an article. In this case, the results are used to create a local ranking of the article in the selected Wikipedia language, whereas combining popularity measurements from all the surveyed language versions of the same article yields a global ranking. As mentioned before, popularity was measured as the median value of daily visits in the selected month. For ranking purposes, if the median is not sufficient to order articles, we use an additional criterion: the total number of visits in the selected month.
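The ranking rule described above can be illustrated with a minimal Python sketch. It assumes that the daily page views of each article for the selected month are already available as a list; the function names and the sample figures are hypothetical and are not taken from the tools used in the study.

```python
from statistics import median


def popularity_key(daily_views):
    """Return the sort key: (median of daily visits, total visits in the month)."""
    return (median(daily_views), sum(daily_views))


def rank_by_popularity(views_by_article):
    """Order article titles from most to least popular.

    The median of daily visits is the primary criterion; the total
    number of visits in the month breaks ties between equal medians.
    """
    return sorted(
        views_by_article,
        key=lambda title: popularity_key(views_by_article[title]),
        reverse=True,
    )


# Hypothetical daily page views (a shortened "month" of four days).
views = {
    "Article A": [10, 12, 11, 500],  # median 11.5, total 533
    "Article B": [10, 12, 11, 13],   # median 11.5, total 46
    "Article C": [20, 21, 19, 22],   # median 20.5, total 82
}

ranking = rank_by_popularity(views)
print(ranking)
```

In this sample, "Article A" and "Article B" share the same median, so the total number of visits decides their order. The single-day spike of "Article A" does not lift it above "Article C", which is the kind of robustness a median-based measure offers.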
Another measure, authors’ interest, is calculated as the number of unique authors who made changes to an article during the selected period (e.g., a month). If the number of authors for the selected articles is the same, we further sort based on the total number of page visits.
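A similar sketch can be given for the authors’ interest measure. It assumes that the revision history has already been filtered to the selected period and reduced to (article title, author) pairs; the function names, user names, and numbers below are hypothetical.

```python
def authors_interest(revisions):
    """Count unique authors per article from (title, author) pairs."""
    authors = {}
    for title, author in revisions:
        authors.setdefault(title, set()).add(author)
    return {title: len(names) for title, names in authors.items()}


def rank_by_authors_interest(revisions, total_views):
    """Order titles by number of unique authors, then by total page visits."""
    ai = authors_interest(revisions)
    return sorted(
        ai,
        key=lambda title: (ai[title], total_views.get(title, 0)),
        reverse=True,
    )


# Hypothetical revisions made within one month.
revisions = [
    ("Article A", "user1"),
    ("Article A", "user2"),
    ("Article A", "user1"),  # repeated edits by one author count once
    ("Article B", "user3"),
    ("Article B", "user4"),
    ("Article C", "user5"),
]
# Hypothetical total page visits in the same month (tie-breaker).
total_views = {"Article A": 900, "Article B": 1500, "Article C": 700}

ai_ranking = rank_by_authors_interest(revisions, total_views)
print(ai_ranking)
```

Here "Article A" and "Article B" both have two unique authors, so "Article B" is placed first because of its higher number of page visits.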
Popularity and AI measures can be used to build rankings on various topics and for a specific period. Thus, we can examine which articles are popular from the point of view of their authors and readers in each selected month. Global measures can show these results across several different language versions.
Table 6, Table 7 and Table 8 present top three articles about cars, films, and video games respectively with the highest values of page views and authors’ interest in each period in all considered language versions.
Monthly multilingual ranking of Wikipedia articles about cars shows that depending on the period under consideration, various car models may be at the forefront. From readers’ point of view, in the period of 2018–2019 the most interesting automobiles were: BMW 3 Series, Volkswagen Golf, Ford Mustang, Tesla Model S, Audi A4, BMW 3 Series (F30), and Toyota Supra. However, if we look from the authors’ point of view, there are other Wikipedia articles about cars in the lead: Honda Accord, Honda Civic Type R, Subaru Impreza, Toyota Land Cruiser, Tesla Model X, BMW 3 Series (E36), and Lincoln Continental.
In the multilingual ranking of Wikipedia articles related to films, we can also observe fluctuations among the leaders in each considered month. Readers of this encyclopedia preferred such movies as Avengers: Infinity War, Black Panther, Bohemian Rhapsody, Story of Yanxi Palace, Money Heist, Aquaman, The Umbrella Academy, You, The Haunting, The Matrix, Venom, Game of Thrones, and Green Book. This list largely did not overlap with the preferences of the authors, who contributed mostly to the following films: Bairavaa, Doctor Sleep, Escape Room, Kanne Kalaimaane, 8 Mile, Bean, Crazy Rich Asians, Jaws 2, War for the Planet of the Apes, The Ghost of Hui Family, and Traffik. Only one title appeared in both rankings: Crazy Rich Asians.
Analysis of the leading articles about video games in the multilingual ranking shows similar tendencies. Readers preferred Wikipedia articles about such games as Fortnite, Assassin’s Creed, Borderlands: The Pre-Sequel, Red Dead Redemption 2, Detroit: Become Human, God of War, Sekiro: Shadows Die Twice, Fallout 76, Spider-Man (2018 video game), Minecraft, PlayerUnknown’s Battlegrounds, Devilman, Kingdom Come: Deliverance, Call of Duty, Far Cry 5, World of Warcraft, and Apex Legends. Wikipedia authors had other priorities in the same period: H1Z1, Spider-Man 3 (video game), Ace Combat 7: Skies, Celeste (video game), Rules of Survival, Dick Vitale’s “Awesome Baby” College Hoops, Fire Emblem Warriors, Ace Combat 7: Skies Unknown, MicroVolts, Call of Duty: Black Ops III, RuneScape, Aliens: Colonial Marines, Unreal Tournament, and Portal 2. There is no overlap between the top titles from the readers’ and the authors’ points of view.
These rankings show that the most popular articles from a readers’ point of view usually do not match the priorities of the community of Wikipedia authors. This may be due to the fact that popular articles are sufficiently developed and do not require significant revisions. Nevertheless, we also found examples when popular articles are blocked for editing by anonymous users or users with low experience.
Such global quality rankings can show how popular a specific product is worldwide. Table 6, Table 7 and Table 8 show only a limited number of leading titles of Wikipedia articles in some of the categories. Therefore, we implemented various multilingual rankings in the WikiRank service [78], where it is possible to analyze how the position of a particular article has changed in the rankings compared with the previous period, which language version is the most popular, what the quality of the popular language version of the article is, etc. Figure 13 presents an example of the ranking of articles about films with different parameters.
A combination of measures from different languages makes it possible to create global rankings of all articles. Additionally, for each language version it is possible to generate local rankings; here, only measures from one language are taken into account. An example of a local ranking with a quality distribution of all articles in English Wikipedia is shown in Figure 14.
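The difference between local and global rankings can be sketched as follows. The text does not specify here how measures from different languages are combined, so this sketch assumes a simple sum over language versions; it also assumes that every local article is already mapped to a language-independent item identifier through interlanguage links. The identifiers, function names, and values are hypothetical.

```python
from collections import defaultdict


def local_rankings(measure):
    """Build one ranking per language from {(lang, item_id): value}."""
    by_lang = defaultdict(list)
    for (lang, item_id), value in measure.items():
        by_lang[lang].append((item_id, value))
    return {
        lang: [item for item, _ in sorted(pairs, key=lambda p: p[1], reverse=True)]
        for lang, pairs in by_lang.items()
    }


def global_ranking(measure):
    """Rank items by the sum of their values over all languages.

    Summation is an assumption made for this sketch, not necessarily
    the aggregation used in the study.
    """
    totals = defaultdict(int)
    for (_, item_id), value in measure.items():
        totals[item_id] += value
    return sorted(totals, key=totals.get, reverse=True)


# Hypothetical monthly popularity values per language version.
measure = {
    ("en", "item-1"): 100,
    ("ru", "item-1"): 20,
    ("en", "item-2"): 90,
    ("ru", "item-2"): 50,
    ("de", "item-3"): 60,
}

print(local_rankings(measure))
print(global_ranking(measure))
```

In this sample, "item-1" leads the local English ranking, while "item-2" leads the global ranking because of its stronger result in the Russian version.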
Calculated measures can be gathered to create an individual profile for each article in each language version. For example, Figure 15 presents such a profile for the article “Fortnite” in English Wikipedia on WikiRank with information about places in local and global rankings, quality and popularity scores, and also history of popularity rank.
Each Wikipedia article in the WikiRank service can have information about local and global measurements of popularity, AI, and their historical ranks for the last period (Figure 15 shows such data monthly from January 2018 to April 2019 on the right side).
If an article is written in more than one language, an additional ranking of the most popular language versions, as well as of the languages with the highest quality, is displayed. The language versions edited by the largest number of authors are also marked. Figure 16 shows an example of such a ranking of the best language versions of the article about Minecraft.
Profiles of Wikipedia articles can also be used to compare the demand for a specific product between various language communities. For example, the video game Dota 2 is the most popular in English, Russian, Chinese, German, and Spanish [83]. Based on the measures obtained for the action-adventure video game Grand Theft Auto V (GTA 5), we can see relatively large demand from the English, Russian, Arabic, Spanish, and Chinese language communities [84].

7. Results and Discussion

During the research, we encountered several restrictions, mainly related to the differences between language versions of Wikipedia. For example, as we showed in Table 3, some main categories do not have links to all considered language versions. This is also true for developed languages. For example, the category “Art” in English Wikipedia does not have a direct equivalent in German Wikipedia, which uses the category “Kunst und Kultur” [85] (“Arts and Culture”) to describe part of this topic.
Regarding categories, our experiments showed that each language version has a specific ratio between the number of articles and the number of categories. Additionally, some language versions can have a lot of undefined pages for the categories. There is also a difference in the number of categories that are assigned to each article. Some languages can use an average of 30 categories to describe one article, while the others are limited to 2–3 categories per article.
Depending on the Wikipedia language version, we observed different categories with the highest average quality, popularity, and authors’ interest. For example in English Wikipedia articles in category “Crime” have the highest average quality, but articles from category “Philosophy” have the highest average popularity and AI. Another example, Arabic Wikipedia has articles from the Religion category as the best for these three measures. Articles in Russian Wikipedia from category “Entertainment” are the most popular and have the highest average quality, while from the author’s point of view the most popular is “Events” category.
Results for authors’ interest can sometimes be biased due to temporary or permanent restrictions. According to one of the main principles of Wikipedia, anyone can edit content. However, in some particular situations this right can be revoked to protect content from unwanted changes (vandalism) [86]. Each language version can define its own levels of page protection. For example, in English Wikipedia there is full protection, where only administrators can edit an article, and semi-protection, which prevents editing by unregistered users or users that are not confirmed. Each article can be protected for a specified period. Figure 17 shows an example of the protected Wikipedia article about Bitcoin with a marked level of protection. As a result, some articles can have lower authors’ interest than they would have without protection.
In our work, we provided a classification of articles by main categories according to the structure of categories in English Wikipedia. However, each language can have its own definition of main categories. In the future, we plan to develop more sophisticated methods to take into account refined category structures.
Supplementary research results are available online at the WikiRank service [78]. In our research, we used some tools that are available on GitHub [88].

8. Conclusions and Future Work

In this paper, we presented results of the quality and popularity assessment of articles in multilingual Wikipedia. For this purpose, we calculated over 200 million values characterizing quality and popularity of articles in 55 language versions of Wikipedia. Additionally, we analyzed over 10 million categories, over 26 million links between them, and about 400 million links from articles to categories in order to determine the assignment of articles to one of the topics in the main classification. In order to assign articles from different languages to various topics we also used semantic databases: Wikidata and DBpedia. We combined data from these sources to obtain more comprehensive classifications of articles.
Results of the research showed not only how quality and popularity differ for articles from various topics and languages but also how the same topic is developed in different languages of Wikipedia in terms of quality and popularity of content. We observed that articles from topics that are popular in a given language are characterized by relatively higher quality. For instance, articles related to the main category ‘Religion’ have relatively higher quality and popularity in Arabic and Latin Wikipedia. Likewise, articles from the main category ‘Government’ have relatively higher quality and popularity in the Azerbaijani, Finnish, Armenian, Romanian, and Slovenian language versions of Wikipedia. Articles related to the main category ‘Entertainment’ are more popular in Chinese, Russian, and German Wikipedia. At the same time, articles from this category have relatively the highest quality compared to other main categories in those three language versions.
In addition to categories, we also studied semantic classes as defined by the DBpedia ontology and their relation to quality and popularity. In almost all considered language versions, the highest average number of page views among the different classes belonged to articles that described websites, e.g., Facebook, YouTube, Google. However, popular articles from this class were rarely assessed as articles of high quality. Articles about cities were relatively better described in English, German, Czech, Hindi, Polish, and Spanish Wikipedia. Actors were described better than other classes in the Bulgarian, Catalan, Danish, Greek, French, Hebrew, Croatian, Indonesian, Italian, Malay, Portuguese, Serbian, Tamil, Thai, Turkish, Ukrainian, and Chinese language versions.
With regard to popularity, we proposed paying attention not only to how often users visit certain articles but also to the authors’ interest in them. The authors’ interest measure can be calculated for a single language version or combined across the studied languages. Sometimes both popularity measures show similar leaders in main categories and semantic classes. For example, Slovenian Wikipedia has the most popular articles in the main category ‘Government’, while readers and authors of English Wikipedia show a higher preference for articles related to ‘Philosophy’. If we consider semantic classes, we can conclude that among the analyzed languages the most popular articles for Wikipedians are related to cities and automobiles. We also aggregated numbers for all considered languages so that global demand for specific products, such as films, video games, and cars, can be studied.
Additional analyses of popularity measures allowed us to identify the priorities and preferences of Wikipedians and readers over time. Often the subjects most popular among readers differed from the leading subjects from the authors’ point of view in the same periods. One explanation is that some popular articles are protected and cannot be edited by anonymous users. Additionally, some Wikipedia authors may choose articles based on various initiatives related to the improvement of specific topics in a certain period.
Presented results can be used to build more complex models for quality assessment of information in Wikipedia in different languages and topics. In the future, they can help not only to automatically enrich less-developed language versions of Wikipedia but also can be used to build massive semantic databases with a powerful inference system, creating new knowledge for humanity in a relatively short time.
Work towards a more precise assessment of Wikipedia quality will continue; in particular, different measures and approaches for quality assessment in Wikipedia and other collaborative knowledge bases will be studied. As of April 2019, based on our calculations, there were over 70 thousand wiki services on the Internet, which can potentially be used to enrich various knowledge bases used in enterprises. Additionally, there are over 1300 linked databases [89] that use data from open sources. We can also take into account dedicated web portals that allow companies and individuals to share their databases for research, such as Kaggle [90]. Local and global AI measurements can be improved by including additional features. For example, it is possible to divide all users into three categories: anonymous users, registered users, and bots. We can also take into account the reputation and experience of each author of the article. For this purpose, we can use the information provided by services like GUC [91] or WikiTop [92].

Author Contributions

K.W. and W.L. conceived the research problem; W.L. conducted the state-of-the-art analysis; K.W. proposed the research methodology and designed the experiments, starting from hypotheses to be verified statistically; W.L. collected data and performed the analysis; W.L. and K.W. interpreted the results; W.A. provided overall guidance.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Price, R.; Shanks, G. A Semiotic Information Quality Framework: Development and Comparative Analysis. In Enacting Research Methods in Information Systems; Springer: Cham, Switzerland, 2016; pp. 219–250.
  2. Xu, H.; Koronios, A. Understanding information quality in e-business. J. Comput. Inf. Syst. 2005, 45, 73–82.
  3. Wikipedia Meta-Wiki. List of Wikipedias. Available online: https://meta.wikimedia.org/wiki/List_of_Wikipedias (accessed on 5 May 2019).
  4. Alexa. Wikipedia.org Traffic Statistics. Available online: https://www.alexa.com/siteinfo/wikipedia.org (accessed on 8 October 2018).
  5. Thompson, N.; Hanley, D. Science Is Shaped by Wikipedia: Evidence from a Randomized Control Trial. MIT Sloan Research Paper No. 5238-17 2018. Available online: https://ssrn.com/abstract=3039505 (accessed on 13 August 2019).
  6. Osman, K. The role of conflict in determining consensus on quality in Wikipedia articles. In Proceedings of the 9th International Symposium on Open Collaboration, Hong Kong, China, 5–7 August 2013; p. 12.
  7. Callahan, E.S.; Herring, S.C. Cultural bias in Wikipedia content on famous persons. J. Am. Soc. Inf. Sci. Technol. 2011, 62, 1899–1915.
  8. Laufer, P.; Wagner, C.; Flöck, F.; Strohmaier, M. Mining cross-cultural relations from Wikipedia: A study of 31 European food cultures. In Proceedings of the ACM Web Science Conference, Oxford, UK, 28 June–1 July 2015; p. 3.
  9. Gieck, R.; Kinnunen, H.M.; Li, Y.; Moghaddam, M.; Pradel, F.; Gloor, P.A.; Paasivaara, M.; Zylka, M.P. Cultural Differences in the Understanding of History on Wikipedia. In Designing Networks for Innovation and Improvisation; Springer: Cham, Switzerland, 2016; pp. 3–12.
  10. Samoilenko, A.; Karimi, F.; Edler, D.; Kunegis, J.; Strohmaier, M. Linguistic neighbourhoods: Explaining cultural borders on Wikipedia through multilingual co-editing activity. EPJ Data Sci. 2016, 5, 9.
  11. Kim, S.; Park, S.; Hale, S.A.; Kim, S.; Byun, J.; Oh, A.H. Understanding editing behaviors in multilingual Wikipedia. PLoS ONE 2016, 11, e0155305.
  12. Bao, P.; Hecht, B.; Carton, S.; Quaderi, M.; Horn, M.; Gergle, D. Omnipedia: Bridging the wikipedia language gap. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Austin, TX, USA, 5–10 May 2012; pp. 1075–1084.
  13. Wikimedia Meta-Wiki. Wikipedia Article Depth. Available online: https://meta.wikimedia.org/wiki/Wikipedia_article_depth (accessed on 26 April 2019).
  14. Kittur, A.; Chi, E.H.; Suh, B. What’s in Wikipedia? Mapping topics and conflict using socially annotated category structure. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Boston, MA, USA, 4–9 April 2009; pp. 1509–1512.
  15. Boldi, P.; Monti, C. Cleansing wikipedia categories using centrality. In Proceedings of the 25th International Conference Companion on World Wide Web, Montréal, QC, Canada, 11–15 April 2016; pp. 969–974.
  16. English Wikipedia. Category: Main Topic Classifications. Available online: https://en.wikipedia.org/wiki/Category:Main_topic_classifications (accessed on 27 April 2019).
  17. Vrandečić, D. Wikidata: A new platform for collaborative data collection. In Proceedings of the 21st International Conference on World Wide Web, Lyon, France, 16–20 April 2012; pp. 1063–1064.
  18. Auer, S.; Bizer, C.; Kobilarov, G.; Lehmann, J.; Cyganiak, R.; Ives, Z. DBpedia: A Nucleus for a Web of Open Data. In The Semantic Web; Springer: Cham, Switzerland, 2007; pp. 722–735.
  19. Lehmann, J.; Isele, R.; Jakob, M.; Jentzsch, A.; Kontokostas, D.; Mendes, P.N.; Hellmann, S.; Morsey, M.; Van Kleef, P.; Auer, S.; et al. DBpedia—A large-scale, multilingual knowledge base extracted from Wikipedia. Semant. Web 2015, 6, 167–195.
  20. Abramowicz, W.; Auer, S.; Heath, T. Linked Data in Business. Bus. Inf. Syst. Eng. 2016, 58, 323–326.
  21. Lewańska, E. Towards Automatic Business Networks Identification. In Proceedings of the International Conference on Business Information Systems, Poznan, Poland, 28–30 June 2017; pp. 389–398.
  22. Filipiak, D.; Filipowska, A. Improving the Quality of Art Market Data Using Linked Open Data and Machine Learning. In Business Information Systems Workshops; Abramowicz, W., Alt, R., Franczyk, B., Eds.; Springer International Publishing: Cham, Switzerland, 2017; pp. 418–428.
  23. Stróżyna, M.; Eiden, G.; Abramowicz, W.; Filipiak, D.; Małyszko, J.; Węcel, K. A framework for the quality-based selection and retrieval of open data—A use case from the maritime domain. Electron. Mark. 2018, 28, 219–233.
  24. Färber, M.; Bartscherer, F.; Menne, C.; Rettinger, A. Linked data quality of dbpedia, freebase, opencyc, wikidata, and yago. Semant. Web 2018, 9, 77–129.
  25. DBpedia. Ontology Classes. Available online: http://mappings.dbpedia.org/server/ontology/classes/ (accessed on 5 May 2019).
  26. Ringler, D.; Paulheim, H. One Knowledge Graph to Rule Them All? Analyzing the Differences Between DBpedia, YAGO, Wikidata & co. In Joint German/Austrian Conference on Artificial Intelligence (Künstliche Intelligenz); Springer: Cham, Switzerland, 2017; pp. 366–372.
  27. Ismayilov, A.; Kontokostas, D.; Auer, S.; Lehmann, J.; Hellmann, S. Wikidata through the Eyes of DBpedia. Semant. Web 2018, 9, 493–503.
  28. Węcel, K.; Lewoniewski, W. Modelling the Quality of Attributes in Wikipedia Infoboxes. In Business Information Systems Workshops; Abramowicz, W., Ed.; Springer International Publishing: Cham, Switzerland, 2015; Volume 228, pp. 308–320.
  29. Lewoniewski, W. The Method of Comparing and Enriching Information in Multilingual Wikis Based on the Analysis of Their Quality. Ph.D. Thesis, Poznań University of Economics and Business, Poznań, Poland, 2018.
  30. Xu, Y.; Luo, T. Measuring article quality in Wikipedia: Lexical clue model. In Proceedings of the 2011 3rd Symposium on Web Society, Port Elizabeth, South Africa, 26–28 October 2011; pp. 141–146.
  31. Anderka, M.; Stein, B.; Lipka, N. Predicting quality flaws in user-generated content: The case of wikipedia. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, Portland, OR, USA, 12–16 August 2012; pp. 981–990.
  32. Warncke-Wang, M.; Cosley, D.; Riedl, J. Tell Me More: An Actionable Quality Model for Wikipedia. In Proceedings of the 9th International Symposium on Open Collaboration, Hong Kong, China, 5–7 August 2013; pp. 1–10.
  33. Su, Q.; Liu, P. A Psycho-Lexical Approach to the Assessment of Information Quality on Wikipedia. In Proceedings of the 2015 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), Singapore, 6–9 December 2015; Volume 3, pp. 184–187.
  34. Lewoniewski, W.; Węcel, K.; Abramowicz, W. Quality and Importance of Wikipedia Articles in Different Languages. In Proceedings of the International Conference on Information and Software Technologies, Druskininkai, Lithuania, 13–15 October 2016; pp. 613–624.
  35. Dang, Q.V.; Ignat, C.L. Quality assessment of Wikipedia articles without feature engineering. In Proceedings of the 2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL), Newark, NJ, USA, 19–23 June 2016; pp. 27–30.
  36. Halfaker, A.; Taraborelli, D. Artificial Intelligence Service ‘ORES’ Gives Wikipedians X-ray Specs to See Through Bad Edits. Available online: https://blog.wikimedia.org/2015/11/30/artificial-intelligence-x-ray-specs/ (accessed on 31 December 2017).
  37. Wikimedia Foundation. ORES. Available online: https://ores.wikimedia.org/ (accessed on 5 May 2019).
  38. Lewoniewski, W.; Węcel, K.; Abramowicz, W. Relative Quality and Popularity Evaluation of Multilingual Wikipedia Articles. Informatics 2017, 4, 43.
  39. Lewoniewski, W.; Härting, R.C.; Węcel, K.; Reichstein, C.; Abramowicz, W. Application of SEO Metrics to Determine the Quality of Wikipedia Articles and Their Sources. In Information and Software Technologies; Damaševičius, R., Vasiljevienė, G., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 139–152.
  40. Kahn, B.K.; Strong, D.M.; Wang, R.Y. Information quality benchmarks: Product and service performance. Commun. ACM 2002, 45, 184–192.
  41. Tayi, G.K.; Ballou, D.P. Examining data quality. Commun. ACM 1998, 41, 54–57.
  42. Giles, J. Internet encyclopaedias go head to head. Nature 2005, 438, 900–901.
  43. Holman Rector, L. Comparison of Wikipedia and other encyclopedias for accuracy, breadth, and depth in historical articles. Ref. Serv. Rev. 2008, 36, 7–22.
  44. Crawford, H. Encyclopedias. Ref. Inf. Serv. An Introd. 2001, 433–459.
  45. Lewoniewski, W. Measures for Quality Assessment of Articles and Infoboxes in Multilingual Wikipedia. In Proceedings of the International Conference on Business Information Systems, Seville, Spain, 26–28 June 2019; pp. 619–633.
  46. Dalip, D.H.; Gonçalves, M.A.; Cristo, M.; Calado, P. A general multiview framework for assessing the quality of collaboratively created content on web 2.0. J. Assoc. Inf. Sci. Technol. 2017, 68, 286–308.
  47. Yaari, E.; Baruchson-Arbib, S.; Bar-Ilan, J. Information quality assessment of community generated content: A user study of Wikipedia. J. Inf. Sci. 2011, 37, 487–498.
  48. Dang, Q.V.; Ignat, C.L. Measuring Quality of Collaboratively Edited Documents: The Case of Wikipedia. In Proceedings of the 2016 IEEE 2nd International Conference on Collaboration and Internet Computing (CIC), Pittsburgh, PA, USA, 1–3 November 2016; pp. 266–275.
  49. Shen, A.; Qi, J.; Baldwin, T. A Hybrid Model for Quality Assessment of Wikipedia Articles. In Proceedings of the Australasian Language Technology Association Workshop 2017, Brisbane, Australia, 6–8 December 2017; pp. 43–52.
  50. Zhang, S.; Hu, Z.; Zhang, C.; Yu, K. History-Based Article Quality Assessment on Wikipedia. In Proceedings of the 2018 IEEE International Conference on Big Data and Smart Computing (BigComp), Shanghai, China, 15–17 January 2018; pp. 1–8.
  51. Warncke-Wang, M.; Ranjan, V.; Terveen, L.G.; Hecht, B.J. Misalignment Between Supply and Demand of Quality Content in Peer Production Communities. In Proceedings of the Ninth International AAAI Conference on Web and Social Media, Oxford, UK, 26–29 May 2015; pp. 493–502.
  52. Lerner, J.; Lomi, A. Knowledge categorization affects popularity and quality of Wikipedia articles. PLoS ONE 2018, 13, e0190674.
  53. Blumenstock, J.E. Automatically Assessing the Quality of Wikipedia Articles; Technical Report; UC Berkeley: Berkeley, CA, USA, 2008.
  54. Dalip, D.H.; Gonçalves, M.A.; Cristo, M.; Calado, P. Automatic Assessment of Document Quality in Web Collaborative Digital Libraries. J. Data Inf. Qual. 2011, 2, 1–30.
  55. Stvilia, B.; Twidale, M.B.; Smith, L.C.; Gasser, L. Assessing information quality of a community-based encyclopedia. Proc. ICIQ 2005, 5, 442–454.
  56. Wu, K.; Zhu, Q.; Zhao, Y.; Zheng, H. Mining the factors affecting the quality of Wikipedia articles. In Proceedings of the Information Science and Management Engineering (ISME), Xi’an, China, 7–8 August 2010; Volume 1, pp. 343–346.
  57. Stvilia, B.; Twidale, M.B.; Gasser, L.; Smith, L.C. Information quality discussions in Wikipedia. Proc. ICKM 2005, 5, 101–113.
  58. Conti, R.; Marzini, E.; Spognardi, A.; Matteucci, I.; Mori, P.; Petrocchi, M. Maturity assessment of Wikipedia medical articles. In Proceedings of the 2014 IEEE 27th International Symposium on Computer-Based Medical Systems, New York, NY, USA, 27–29 May 2014; pp. 281–286.
  59. Wikipedia. Featured Article Criteria. Available online: https://en.wikipedia.org/wiki/Wikipedia:Featured_article_criteria (accessed on 5 May 2019).
  60. Wikipedia. Verifiability. Available online: https://en.wikipedia.org/wiki/Wikipedia:Verifiability (accessed on 5 May 2019).
  61. Blumenstock, J.E. Size matters: Word count as a measure of quality on Wikipedia. In Proceedings of the 17th international conference on World Wide Web, Beijing, China, 21–25 April 2008; pp. 1095–1096.
  62. Dalip, D.H.; Gonçalves, M.A.; Cristo, M.; Calado, P. Automatic quality assessment of content created collaboratively by web communities: A case study of wikipedia. In Proceedings of the 9th ACM/IEEE-CS Joint Conference on Digital Libraries, Austin, TX, USA, 15–19 June 2009; pp. 295–304.
  63. Ferschke, O.; Gurevych, I.; Rittberger, M. FlawFinder: A Modular System for Predicting Quality Flaws in Wikipedia. Available online: https://pdfs.semanticscholar.org/72d6/9432b9703b632bac1d477d5020631c05cd53.pdf (accessed on 13 August 2019).
  64. Di Sciascio, C.; Strohmaier, D.; Errecalde, M.; Veas, E. WikiLyzer: Interactive information quality assessment in Wikipedia. In Proceedings of the 22nd International Conference on Intelligent User Interfaces, Limassol, Cyprus, 13–16 March 2017; pp. 377–388.
  65. Liu, J.; Ram, S. Using big data and network analysis to understand Wikipedia article quality. Data Knowl. Eng. 2018, 115, 80–93.
  66. Shang, W. A Comparison of the Historical Entries in Wikipedia and Baidu Baike. In Proceedings of the International Conference on Information, Sheffield, UK, 25–28 March 2018; pp. 74–80.
  67. Roll, U.; Mittermeier, J.C.; Diaz, G.I.; Novosolov, M.; Feldman, A.; Itescu, Y.; Meiri, S.; Grenyer, R. Using Wikipedia page views to explore the cultural importance of global reptiles. Biol. Conserv. 2016, 204, 42–50.
  68. Wikimedia Toolforge. Pageviews Analysis. Available online: https://tools.wmflabs.org/pageviews/ (accessed on 5 May 2019).
  69. WMF Analytics. Wikistats Pageview Files. Available online: https://dumps.wikimedia.org/other/pagecounts-ez/ (accessed on 5 May 2019).
  70. Lih, A. Wikipedia as Participatory Journalism: Reliable Sources? Metrics for evaluating collaborative media as a news resource. In Proceedings of the 5th International Symposium on Online Journalism, Austin, TX, USA, 16–17 April 2004; p. 31.
  71. Wilkinson, D.M.; Huberman, B.A. Cooperation and quality in wikipedia. In Proceedings of the 2007 international symposium on Wikis WikiSym 07, Montreal, QC, Canada, 21–25 October 2007; pp. 157–164.
  72. Kittur, A.; Kraut, R.E. Harnessing the wisdom of crowds in wikipedia. In Proceedings of the ACM 2008 Conference on Computer Supported Cooperative Work—CSCW ’08, San Diego, CA, USA, 8–12 November 2008; p. 37.
  73. Wilkinson, D.M.; Huberman, B.A. Assessing the Value of Cooperation in Wikipedia. First Monday 2007, 12.
  74. Kane, G.C. A multimethod study of information quality in wiki collaboration. ACM Trans. Manag. Inf. Syst. 2011, 2, 4.
  75. Flekova, L.; Ferschke, O.; Gurevych, I. What makes a good biography?: Multidimensional quality analysis based on wikipedia article feedback data. In Proceedings of the 23rd International Conference on World Wide Web, Seoul, Korea, 7–11 April 2014; pp. 855–866.
  76. German Wikipedia. “Game of Thrones/Staffel 8”—Versionsgeschichte. Available online: https://de.wikipedia.org/w/index.php?title=Game_of_Thrones/Staffel_8&action=history (accessed on 1 June 2019).
  77. English Wikipedia. Game of Thrones (Season 8): Revision History. Available online: https://en.wikipedia.org/w/index.php?title=Game_of_Thrones_(season_8)&action=history (accessed on 1 June 2019).
  78. WikiRank. Quality and Popularity Assessment of Wikipedia. Available online: https://wikirank.net/ (accessed on 27 April 2019).
  79. WikiRank. Films Multilingual Ranking. Available online: https://wikirank.net/top/film (accessed on 1 June 2019).
  80. WikiRank. English Wikipedia. Available online: https://wikirank.net/en/ (accessed on 1 June 2019).
  81. WikiRank. Fortnite. Available online: https://wikirank.net/en/Fortnite (accessed on 1 June 2019).
  82. WikiRank. Minecraft. Available online: https://wikirank.net/en/Minecraft (accessed on 1 June 2019).
  83. WikiRank. Dota 2. Available online: https://wikirank.net/en/Dota_2 (accessed on 5 May 2019).
  84. WikiRank. Grand Theft Auto V. Available online: https://wikirank.net/en/Grand_Theft_Auto_V (accessed on 5 May 2019).
  85. Deutschsprachige Wikipedia. Kategorie: Kunst und Kultur. Available online: https://de.wikipedia.org/wiki/Kategorie:Kunst_und_Kultur (accessed on 5 May 2019).
  86. English Wikipedia. Wikipedia: Protection Policy. Available online: https://en.wikipedia.org/wiki/Wikipedia:Protection_policy (accessed on 5 May 2019).
  87. English Wikipedia. Bitcoin. Available online: https://en.wikipedia.org/wiki/Bitcoin (accessed on 1 June 2019).
  88. GitHub. Lewoniewski-User Profile. Available online: https://github.com/lewoniewski (accessed on 5 May 2019).
  89. The Linked Open Data Cloud. Datasets. Available online: https://lod-cloud.net/datasets (accessed on 5 May 2019).
  90. Kaggle. Datasets. Available online: https://www.kaggle.com/datasets (accessed on 5 May 2019).
  91. Wikimedia Toolforge. Global User Contributions. Available online: https://tools.wmflabs.org/guc/ (accessed on 1 June 2019).
  92. WikiTop. Wikipedians Top. Available online: http://wikitop.org/ (accessed on 1 June 2019).
Figure 1. Subject overlaps of articles in various language versions of Wikipedia. Source: own calculation based on Wikipedia dumps from April 2019. Over 175 thousand interactive combinations of these Venn diagrams can be found on the web page: http://data.lewoniewski.info/computers/vn1/.
Figure 2. Occurrence of similar sub-categories in the English Wikipedia category polyhierarchy. Source: own work based on Wikipedia dumps from April 2019.
Figure 3. Shares of articles in each category in English Wikipedia. Source: own calculation based on Wikipedia dumps in April, 2019.
Figure 4. Share of articles in the main categories within each of 55 language versions of Wikipedia. Source: own calculation based on Wikipedia dumps in April, 2019. A more detailed and interactive chart can be found on the web page: http://data.lewoniewski.info/computers/heatmap-cat-art.
Figure 5. Shares of articles in each category in 55 language versions of Wikipedia. Source: own calculation based on Wikipedia dumps in April, 2019.
Figure 6. Overlaps of articles between selected main categories in Arabic, English and French Wikipedia. Source: own calculation based on Wikipedia dumps from April 2019. Over a million interactive combinations of these Venn diagrams (for each main category and language version) can be found on the web page: http://data.lewoniewski.info/computers/vn2/.
Figure 7. Quality dimensions of web 2.0 portals, encyclopedias and Wikipedia. Source: own work based on [45].
Figure 8. Part of the article history about Game of Thrones (season 8) in English (en) and German (de) Wikipedia with highlighted authors. Source: [76,77].
Figure 9. Average quality of articles in each category and language version of Wikipedia. Source: own calculation based on Wikipedia dumps in April, 2019. A more detailed and interactive chart can be found on the web page: http://data.lewoniewski.info/computers/heatmap-cat-quality.
Figure 10. Average page views per article in year 2018 for each main category and language version of Wikipedia. Source: own calculation based on Wikipedia dumps. A more detailed and interactive chart can be found on the web page: http://data.lewoniewski.info/computers/heatmap-cat-views.
Figure 11. Average number of authors per article during 2018 in each main category and language version of Wikipedia. Source: own calculation based on Wikipedia dumps. A more detailed and interactive chart can be found on the web page: http://data.lewoniewski.info/computers/heatmap-cat-authors.
Figure 12. Average quality, authors’ interest and page views during 2018 per article in each class and language version of Wikipedia. A more detailed and interactive chart can be found on the web page: http://data.lewoniewski.info/computers/heatmap-classes.
Figure 13. List of the most popular articles about films in multilingual Wikipedia in WikiRank service. Source: [79].
Figure 14. Local ranking with quality distribution of all articles in English Wikipedia in WikiRank service. Source: [80].
Figure 15. Profile on WikiRank of the article about Fortnite in English Wikipedia with information about places in local and global rankings, quality and popularity scores, history of popularity rank. Source: [81].
Figure 16. The most popular language versions, languages with the highest quality and language versions with the highest authors’ interest (AI) value for article about Minecraft on WikiRank. Source: [82].
Figure 17. Wikipedia article about Bitcoin with a marked level of protection. Source: [87].
Table 1. The analyzed 55 language versions of Wikipedia with article count, views from unique devices and total page views (based on the dump in April 2019).
No. | Language Version | Abbr. | Articles | Authors | Total Page Views | Unique Devices
1 | English | en | 5,835,946 | 36,031,942 | 7,846,676,922 | 866,456,515
2 | Swedish | sv | 3,748,546 | 664,601 | 102,423,252 | 12,597,043
3 | German | de | 2,288,148 | 3,158,210 | 975,590,897 | 114,380,633
4 | French | fr | 2,094,723 | 3,405,365 | 742,709,055 | 96,553,550
5 | Dutch | nl | 1,962,531 | 986,565 | 155,136,113 | 23,873,475
6 | Russian | ru | 1,539,411 | 2,500,221 | 896,358,323 | 96,537,026
7 | Italian | it | 1,518,702 | 1,803,513 | 544,481,445 | 53,459,817
8 | Spanish | es | 1,514,431 | 5,375,409 | 1,090,438,930 | 180,071,200
9 | Polish | pl | 1,329,622 | 949,766 | 278,226,329 | 29,262,659
10 | Vietnamese | vi | 1,205,176 | 660,020 | 68,454,735 | 16,396,173
11 | Japanese | ja | 1,145,838 | 1,462,052 | 1,043,323,322 | 98,636,732
12 | Chinese | zh | 1,051,874 | 2,709,195 | 412,676,457 | 52,328,429
13 | Portuguese | pt | 1,007,942 | 2,230,598 | 352,570,671 | 69,605,320
14 | Ukrainian | uk | 896,476 | 448,345 | 62,906,361 | 10,849,975
15 | Arabic | ar | 715,850 | 1,643,146 | 188,230,435 | 39,994,487
16 | Persian | fa | 671,576 | 812,855 | 142,075,761 | 21,993,488
17 | Serbian | sr | 618,230 | 240,802 | 27,054,615 | 4,776,849
18 | Catalan | ca | 610,217 | 319,681 | 21,121,481 | 3,439,969
19 | Norwegian (Bokmål) | no | 506,510 | 457,767 | 36,974,998 | 6,017,919
20 | Indonesian | id | 458,034 | 1,047,391 | 146,481,271 | 33,774,831
21 | Finnish | fi | 454,859 | 413,533 | 65,437,832 | 7,372,105
22 | Korean | ko | 450,896 | 559,608 | 83,623,819 | 19,933,158
23 | Hungarian | hu | 448,744 | 133,232 | 54,741,921 | 8,298,454
24 | Serbo-Croatian | sh | 447,790 | 409,910 | 5,900,087 | 2,372,396
25 | Czech | cs | 425,852 | 448,816 | 73,574,810 | 9,338,114
26 | Romanian | ro | 393,439 | 470,902 | 39,466,674 | 7,711,157
27 | Basque | eu | 332,997 | 98,920 | 9,067,706 | 446,209
28 | Turkish | tr | 325,627 | 233,118 | 25,389,323 | 3,076,606
29 | Malay | ms | 325,592 | 1,028,128 | 12,291,727 | 3,960,414
30 | Esperanto | eo | 256,487 | 156,711 | 1,981,767 | 263,084
31 | Bulgarian | bg | 254,272 | 84,451 | 27,272,998 | 4,093,761
32 | Danish | da | 250,890 | 249,638 | 30,667,722 | 5,190,512
33 | Armenian | hy | 248,278 | 349,917 | 6,013,622 | 918,474
34 | Hebrew | he | 240,943 | 507,618 | 58,213,949 | 6,344,428
35 | Slovak | sk | 229,146 | 171,238 | 16,854,614 | 3,117,661
36 | Min Nan | zh-min-nan | 228,102 | 37,919 | 572,773 | 84,788
37 | Kazakh | kk | 223,881 | 85,934 | 11,562,925 | 2,142,268
38 | Croatian | hr | 204,240 | 216,016 | 21,779,929 | 4,497,371
39 | Lithuanian | lt | 194,537 | 131,095 | 12,276,882 | 1,984,922
40 | Estonian | et | 189,742 | 125,754 | 11,502,319 | 1,187,671
41 | Belarusian | be | 166,775 | 84,971 | 1,711,658 | 253,243
42 | Slovenian | sl | 164,036 | 178,042 | 8,497,867 | 1,491,437
43 | Greek | el | 160,482 | 271,125 | 34,866,919 | 6,330,938
44 | Galician | gl | 155,573 | 96,617 | 2,533,863 | 512,368
45 | Azerbaijani | az | 145,060 | 172,093 | 12,826,807 | 1,748,834
46 | Urdu | ur | 144,942 | 93,377 | 2,916,140 | 506,414
47 | Simple English | simple | 144,053 | 823,355 | 19,179,047 | 9,071,802
48 | Norwegian (Nynorsk) | nn | 142,635 | 95,945 | 1,733,721 | 563,079
49 | Uzbek | uz | 130,990 | 44,264 | 3,256,673 | 569,355
50 | Thai | th | 130,723 | 349,695 | 63,983,646 | 14,758,190
51 | Hindi | hi | 130,443 | 444,004 | 56,017,398 | 17,087,729
52 | Latin | la | 130,327 | 117,110 | 1,086,052 | 173,591
53 | Georgian | ka | 127,899 | 109,531 | 8,642,199 | 1,147,871
54 | Volapük | vo | 122,757 | 26,048 | 266,020 | 38,888
55 | Tamil | ta | 121,501 | 152,024 | 8,357,708 | 2,295,703
Table 2. The number of categories, number of links from articles to categories and between categories in 55 language versions of Wikipedia (sorted by category density). Source: own calculations in April 2019.
Wikipedia Language | Categories (All) | Categories (Without Page) | Category Ratio | Links (From Articles to Categories) | Links (Between Categories) | Average Number of Categories Per Article
Urdu (ur) | 178,271 | 8836 | 1.230 | 1,048,967 | 775,590 | 7.237
Arabic (ar) | 576,872 | 6368 | 0.806 | 21,548,319 | 1,982,157 | 30.102
Persian (fa) | 499,231 | 37 | 0.743 | 9,748,824 | 1,568,018 | 14.516
Turkish (tr) | 226,145 | 10,383 | 0.694 | 2,322,792 | 542,366 | 7.133
Belarusian (be) | 115,205 | 33,807 | 0.691 | 1,182,398 | 193,168 | 7.090
Norwegian (Nynorsk) (nn) | 88,804 | 18,156 | 0.623 | 789,450 | 158,280 | 5.535
Korean (ko) | 268,761 | 20,773 | 0.596 | 4,462,341 | 652,764 | 9.897
Thai (th) | 73,106 | 25,130 | 0.559 | 922,356 | 118,369 | 7.056
Georgian (ka) | 65,047 | 15,317 | 0.509 | 435,646 | 103,973 | 3.406
Slovenian (sl) | 77,146 | 21,649 | 0.470 | 1,078,180 | 119,567 | 6.573
Azerbaijani (az) | 65,627 | 2104 | 0.452 | 906,108 | 127,144 | 6.246
Hindi (hi) | 54,785 | 30,507 | 0.420 | 593,496 | 50,673 | 4.550
Indonesian (id) | 186,977 | 102,406 | 0.408 | 5,279,994 | 185,266 | 11.528
Galician (gl) | 62,109 | 577 | 0.399 | 689,762 | 120,190 | 4.434
Chinese (zh) | 395,448 | 101,111 | 0.376 | 12,793,208 | 716,798 | 12.162
Greek (el) | 60,056 | 3826 | 0.374 | 1,218,241 | 156,199 | 7.591
Armenian (hy) | 87,522 | 25,729 | 0.353 | 1,601,227 | 136,013 | 6.449
Czech (cs) | 140,757 | 665 | 0.331 | 2,730,698 | 333,870 | 6.412
Esperanto (eo) | 83,331 | 15,727 | 0.325 | 1,136,030 | 184,428 | 4.429
Portuguese (pt) | 316,318 | 11,293 | 0.314 | 9,346,482 | 751,718 | 9.273
Slovak (sk) | 70,586 | 76 | 0.308 | 919,689 | 199,717 | 4.014
Russian (ru) | 469,180 | 53,068 | 0.305 | 17,351,449 | 929,165 | 11.271
Hebrew (he) | 71,150 | 25 | 0.295 | 2,310,076 | 170,736 | 9.588
Norwegian (Bokmål) (no) | 148,816 | 6509 | 0.294 | 4,182,237 | 340,251 | 8.257
English (en) | 1,711,545 | 97 | 0.293 | 127,118,195 | 5,545,938 | 21.782
Latin (la) | 38,187 | 89 | 0.293 | 628,280 | 76,726 | 4.821
Romanian (ro) | 115,325 | 26,231 | 0.293 | 3,398,779 | 274,858 | 8.639
Malay (ms) | 91,578 | 62,870 | 0.281 | 1,393,588 | 59,264 | 4.280
Simple English (simple) | 40,052 | 477 | 0.278 | 778,386 | 101,112 | 5.403
Ukrainian (uk) | 248,614 | 46,181 | 0.277 | 7,008,669 | 538,437 | 7.818
Bulgarian (bg) | 68,898 | 2624 | 0.271 | 1,291,378 | 150,452 | 5.079
Spanish (es) | 398,828 | 23,074 | 0.263 | 9,103,226 | 903,999 | 6.011
Tamil (ta) | 30,477 | 7661 | 0.251 | 483,546 | 41,080 | 3.980
Danish (da) | 62,490 | 5005 | 0.249 | 1,861,533 | 156,608 | 7.420
Vietnamese (vi) | 276,936 | 101,173 | 0.230 | 7,745,566 | 476,364 | 6.427
Italian (it) | 348,216 | 32 | 0.229 | 14,715,516 | 847,583 | 9.690
Basque (eu) | 73,827 | 19,206 | 0.222 | 1,497,904 | 170,504 | 4.498
French (fr) | 425,707 | 76 | 0.203 | 38,654,880 | 2,583,394 | 18.453
Japanese (ja) | 232,881 | 20,231 | 0.203 | 8,060,212 | 551,980 | 7.034
Kazakh (kk) | 45,512 | 23,083 | 0.203 | 1,660,294 | 41,958 | 7.416
Estonian (et) | 29,889 | 441 | 0.158 | 553,027 | 53,933 | 2.915
Finnish (fi) | 72,006 | 280 | 0.158 | 2,707,673 | 157,913 | 5.953
German (de) | 354,701 | 29 | 0.155 | 12,255,563 | 886,269 | 5.356
Polish (pl) | 205,391 | 206 | 0.154 | 5,310,093 | 399,299 | 3.994
Min Nan (zh-min-nan) | 32,592 | 14,516 | 0.143 | 608,969 | 46,280 | 2.670
Hungarian (hu) | 60,203 | 30 | 0.134 | 2,895,750 | 111,067 | 6.453
Lithuanian (lt) | 24,721 | 316 | 0.127 | 541,911 | 45,874 | 2.786
Catalan (ca) | 75,951 | 168 | 0.124 | 2,672,097 | 179,483 | 4.379
Serbo-Croatian (sh) | 45,527 | 374 | 0.102 | 1,520,947 | 101,515 | 3.397
Serbian (sr) | 59,254 | 10,899 | 0.096 | 4,355,457 | 106,286 | 7.045
Swedish (sv) | 354,075 | 16 | 0.094 | 20,002,023 | 639,059 | 5.336
Croatian (hr) | 19,065 | 53 | 0.093 | 503,920 | 32,903 | 2.467
Uzbek (uz) | 12,026 | 4001 | 0.092 | 832,321 | 12,758 | 6.354
Dutch (nl) | 114,899 | 10 | 0.059 | 10,060,345 | 320,354 | 5.126
Volapük (vo) | 2440 | 269 | 0.020 | 353,343 | 2878 | 2.878
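The two derived columns of Table 2 can be reproduced from the raw counts. The sketch below assumes that the category ratio is the number of categories divided by the number of articles (Table 1), and that the average number of categories per article is the number of article-to-category links divided by the number of articles. These definitions are inferred from the published values rather than stated in the table, and the function and variable names are illustrative only.

```python
# Raw counts copied from Table 1 (articles) and Table 2 (categories, links)
# for three language versions; dumps from April 2019.
articles = {"ur": 144_942, "en": 5_835_946, "vo": 122_757}
categories = {"ur": 178_271, "en": 1_711_545, "vo": 2_440}
article_category_links = {"ur": 1_048_967, "en": 127_118_195, "vo": 353_343}


def category_ratio(lang: str) -> float:
    """Assumed definition: categories per article in one language version."""
    return categories[lang] / articles[lang]


def avg_categories_per_article(lang: str) -> float:
    """Assumed definition: article-to-category links per article."""
    return article_category_links[lang] / articles[lang]


if __name__ == "__main__":
    for lang in articles:
        print(
            lang,
            round(category_ratio(lang), 3),
            round(avg_categories_per_article(lang), 3),
        )
```

For the three language versions used here, the rounded results agree with the corresponding entries of Table 2, which supports the assumed definitions without confirming them as the authors' exact procedure.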
Table 3. List of the categories in “Category:Main topic classifications” in English Wikipedia with number of the considered language versions (April 2019).
No. | Category Name | Number of Considered Language Versions
1 | Education | 55
2 | Geography | 55
3 | History | 55
4 | Mathematics | 55
5 | Music | 55
6 | Philosophy | 55
7 | Religion | 55
8 | Science | 55
9 | Society | 55
10 | Sports | 55
11 | Arts | 54
12 | Organizations | 54
13 | People | 54
14 | Politics | 54
15 | Culture | 53
16 | Law | 53
17 | Technology | 53
18 | Health | 52
19 | Military | 52
20 | Entertainment | 51
21 | Events | 51
22 | Food and drink | 51
23 | Government | 49
24 | Nature | 49
25 | Crime | 48
26 | Business | 47
27 | Life | 47
28 | Academic disciplines | 45
29 | Human behavior | 44
30 | Knowledge | 44
31 | Concepts | 43
32 | Language | 39
33 | Objects | 37
34 | Mind | 28
35 | Humanities | 27
36 | World | 27
37 | Economy | 17
38 | Universe | 5
Table 4. Main category of articles with the highest value of average quality, average popularity and authors’ interest in each language version of Wikipedia. Source: own calculations.
Language Version | Quality | Popularity | Authors’ Interest
Arabic (ar) | Religion | Religion | Religion
Azerbaijani (az) | Government | Government | Government
Belarusian (be) | Government | Business | Events
Bulgarian (bg) | Events | Food and drink | Life
Catalan (ca) | Events | Law | Events
Czech (cs) | Organizations | Health | Crime
Danish (da) | Philosophy | Philosophy | Crime
German (de) | Entertainment | Entertainment | Events
Greek (el) | Entertainment | Health | Food and drink
English (en) | Crime | Philosophy | Philosophy
Esperanto (eo) | Philosophy | Events | Life
Spanish (es) | Philosophy | Philosophy | Crime
Estonian (et) | Crime | Food and drink | Crime
Basque (eu) | Education | Education | Education
Persian (fa) | Religion | Philosophy | Religion
Finnish (fi) | Government | Government | Government
French (fr) | Crime | Crime | Crime
Galician (gl) | Education | Events | Food and drink
Hebrew (he) | Entertainment | Events | Events
Hindi (hi) | Law | Law | Business
Croatian (hr) | Organizations | Mathematics | Military
Hungarian (hu) | Events | Events | Events
Armenian (hy) | Government | Government | Crime
Indonesian (id) | Arts | Business | Philosophy
Italian (it) | Entertainment | Education | Military
Japanese (ja) | Organizations | Events | Events
Georgian (ka) | Government | Crime | Music
Kazakh (kk) | Sports | Philosophy | Health
Korean (ko) | People | Business | Military
Latin (la) | Religion | Religion | Religion
Lithuanian (lt) | Education | Mathematics | Sports
Malay (ms) | People | Law | Business
Dutch (nl) | Education | Philosophy | Events
Norwegian (Nynorsk) (nn) | History | History | Music
Norwegian (Bokmål) (no) | Crime | Mathematics | Sports
Polish (pl) | Crime | Crime | Entertainment
Portuguese (pt) | Business | Health | Crime
Romanian (ro) | Government | Government | Food and drink
Russian (ru) | Entertainment | Entertainment | Events
Serbo-Croatian (sh) | Music | Mathematics | Science
Simple English (simple) | Organizations | Organizations | Organizations
Slovak (sk) | Crime | Crime | Crime
Slovenian (sl) | Government | Government | Government
Serbian (sr) | Crime | Crime | Life
Swedish (sv) | Events | Health | Geography
Tamil (ta) | Entertainment | Philosophy | Technology
Thai (th) | Arts | Military | Events
Turkish (tr) | Events | Politics | Nature
Ukrainian (uk) | Crime | Philosophy | Crime
Urdu (ur) | Education | Military | Organizations
Uzbek (uz) | Events | Philosophy | Events
Vietnamese (vi) | Organizations | Law | Sports
Volapük (vo) | Sports | Philosophy | Mathematics
Chinese (zh) | Entertainment | Entertainment | Crime
Min Nan (zh-min-nan) | Health | Technology | Politics
Table 5. Classes of articles with the highest value of average quality, average popularity and authors’ interest in each language version of Wikipedia. Source: own calculations.
Language Version | Quality | Popularity | Authors’ Interest
Arabic (ar) | website | website | website
Azerbaijani (az) | website | website | university
Belarusian (be) | footballplayer | programming | automobile
Bulgarian (bg) | actor | website | city
Catalan (ca) | actor | website | website
Czech (cs) | city | website | city
Danish (da) | actor | website | automobile
German (de) | city | website | city
Greek (el) | actor | website | city
English (en) | city | programming | automobile
Esperanto (eo) | footballplayer | website | city
Spanish (es) | city | website | city
Estonian (et) | website | website | programming
Basque (eu) | website | website | city
Persian (fa) | university | website | university
Finnish (fi) | website | website | city
French (fr) | actor | website | website
Galician (gl) | business | website | city
Hebrew (he) | actor | website | automobile
Hindi (hi) | city | website | footballplayer
Croatian (hr) | actor | website | city
Hungarian (hu) | university | website | university
Armenian (hy) | videogame | website | footballplayer
Indonesian (id) | actor | programming | website
Italian (it) | actor | website | footballplayer
Japanese (ja) | university | actor | automobile
Georgian (ka) | footballplayer | website | videogame
Kazakh (kk) | footballplayer | website | website
Korean (ko) | university | website | automobile
Latin (la) | programming | website | city
Lithuanian (lt) | website | website | footballplayer
Malay (ms) | actor | university | business
Dutch (nl) | website | website | website
Norwegian (Nynorsk) (nn) | automobile | website | city
Norwegian (Bokmål) (no) | website | website | videogame
Polish (pl) | city | website | city
Portuguese (pt) | actor | programming | website
Romanian (ro) | website | website | business
Russian (ru) | videogame | website | videogame
Serbo-Croatian (sh) | website | website | city
Simple English (simple) | website | programming | actor
Slovak (sk) | website | website | automobile
Slovenian (sl) | website | website | city
Serbian (sr) | actor | actor | website
Swedish (sv) | website | website | city
Tamil (ta) | actor | website | automobile
Thai (th) | actor | university | university
Turkish (tr) | actor | website | city
Ukrainian (uk) | actor | website | videogame
Urdu (ur) | university | programming | programming
Uzbek (uz) | film | website | film
Vietnamese (vi) | university | website | videogame
Volapük (vo) | film | website | film
Chinese (zh) | actor | actor | automobile
Min Nan (zh-min-nan) | videogame | website | city
Table 6. Top three articles about cars with highest number of page views and authors’ interest in multilingual ranking, monthly. Source: own calculations.
Month | Page Views | Authors’ Interest
January 2018 | Volkswagen Golf | Honda Accord
 | BMW 3 Series | Honda Ridgeline
 | Audi A4 | Toyota Avalon
February 2018 | BMW 3 Series | Honda Civic Type R
 | Volkswagen Golf | Tesla Model X
 | Audi A4 | Nissan GT-R
March 2018 | BMW 3 Series | Honda Civic Type R
 | Ford Mustang | Subaru Impreza
 | Volkswagen Golf | Tesla Model X
April 2018 | Ford Mustang | Honda Civic Type R
 | BMW 3 Series | Subaru Impreza
 | Volkswagen Golf | BMW M5
May 2018 | Ford Mustang | DMC DeLorean
 | BMW 3 Series | Subaru Impreza
 | Volkswagen Golf | McLaren P1
June 2018 | Ford Mustang | Acura RDX
 | BMW 3 Series | LaFerrari
 | Volkswagen Golf | Ford Model T
July 2018 | BMW 3 Series | Honda Accord
 | Ford Mustang | Volvo 850
 | Volkswagen Golf | Chevrolet Impala
August 2018 | BMW 3 Series | Pontiac GTO
 | Ford Mustang | Honda Accord
 | Volkswagen Golf | BMW M3
September 2018 | BMW 3 Series | Porsche 997
 | Ford Mustang | Opel Combo
 | Volkswagen Golf | Ford Falcon (AU)
October 2018 | BMW 3 Series | Toyota Land Cruiser
 | BMW 3 Series (F30) | Lamborghini Aventador
 | Volkswagen Golf | Lincoln Continental
November 2018 | BMW 3 Series | Toyota Land Cruiser
 | Tesla Model S | Honda Accord
 | Volkswagen Golf | Mitsubishi Triton
December 2018 | BMW 3 Series | Honda Civic Type R
 | Volkswagen Golf | Toyota Land Cruiser
 | Tesla Model S | Subaru Impreza
January 2019 | BMW 3 Series | Toyota Prius
 | Toyota Supra | Toyota Corolla
 | Volkswagen Golf | Ford F-Series
February 2019 | BMW 3 Series | BMW 3 Series (E36)
 | Volkswagen Golf | Lincoln Continental
 | Ford Mustang | Honda Accord
March 2019 | BMW 3 Series | Toyota Prius
 | Tesla Model S | Tesla Model X
 | Ford Mustang | BMW 3 Series (E36)
Table 7. Top three articles about films with highest number of page views and authors’ interest in multilingual ranking, monthly. Source: own calculations.
Month | Page Views | Authors’ Interest
January 2018 | Black Mirror | Pokkiri
 | The End of the F***ing World | Dhoom 3
 | Star Wars: The Last Jedi | Street Lights
February 2018 | Black Panther (film) | The Ghost of Hui Family
 | Altered Carbon (TV series) | Children of Men
 | Money Heist | Bairavaa
March 2018 | Black Panther (film) | Bairavaa
 | The Shape of Water | A Night to Remember (1958 film)
 | Avengers: Infinity War | Acrimony (film)
April 2018 | Avengers: Infinity War | Jason X
 | A Quiet Place (film) | Traffik (2018 film)
 | Money Heist | Crazy Rich Asians (film)
May 2018 | Avengers: Infinity War | Bairavaa
 | Deadpool 2 | War for the Planet of the Apes
 | Black Panther (film) | Masterpiece (2017 film)
June 2018 | Jurassic World: Fallen Kingdom | Bairavaa
 | Avengers: Infinity War | Hello (2017 film)
 | Westworld (TV series) | Crazy Rich Asians (film)
July 2018 | Ant-Man and the Wasp | Bairavaa
 | Avengers: Infinity War | Antenna (film)
 | The Handmaid’s Tale (TV series) | Bean (film)
August 2018 | Story of Yanxi Palace | Rangasthalam
 | Avengers: Infinity War | White Boy Rick
 | Crazy Rich Asians (film) | Happy Death Day
September 2018 | Story of Yanxi Palace | Jaws 2
 | The Nun (2018 film) | Bean (film)
 | The Matrix | Instant Family
October 2018 | Venom (2018 film) | Doctor Sleep (2019 film)
 | A Star Is Born (2018 film) | Escape Room (film)
 | The Haunting (TV series) | Jawani Phir Nahi Ani 2
November 2018 | Bohemian Rhapsody (film) | Doctor Sleep (2019 film)
 | Fantastic Beasts: The Crimes of Grindelwald | Enai Noki Paayum Thota
 | Fantastic Beasts and Where to Find Them (film) | Scooby-Doo! and the Curse of the 13th Ghost
December 2018 | Aquaman (film) | Unda (film)
 | Spider-Man: Into the Spider-Verse | Escape Room (film)
 | Bohemian Rhapsody (film) | Bairavaa
January 2019 | Glass (2019 film) | Bairavaa
 | You (TV series) | Vaagai Sooda Vaa
 | Aquaman (film) | Bros: After the Screaming Stops
February 2019 | Alita: Battle Angel | Doctor Sleep (2019 film)
 | The Umbrella Academy (TV series) | Kanne Kalaimaane
 | Green Book (film) | 8 Mile (film)
March 2019 | Captain Marvel (film) | Kanne Kalaimaane
 | Us (2019 film) | Son of Kashmir: Burhan
 | Game of Thrones | 8 Mile (film)
Table 8. Top three articles about video games with the highest number of page views and authors’ interest in multilingual ranking, monthly. Source: own calculations.
Month | Page Views | Authors’ Interest
January 2018 | Assassin’s Creed | Celeste (video game)
 | Devilman | Unreal Tournament
 | PlayerUnknown’s Battlegrounds | Lego Marvel Super Heroes 2
February 2018 | Assassin’s Creed | Celeste (video game)
 | Kingdom Come: Deliverance | Little Witch Academia: Chamber of Time
 | Fortnite | Fire Emblem: The Binding Blade
March 2018 | Fortnite | Ace Combat 7: Skies Unknown
 | Assassin’s Creed | The Crew 2
 | Call of Duty | Detective Pikachu
April 2018 | God of War (2018 video game) | Ace Combat 7: Skies Unknown
 | Fortnite | H1Z1
 | Far Cry 5 | Skynet (video game)
May 2018 | Fortnite | Spider-Man 3 (video game)
 | God of War (2018 video game) | AirAttack
 | Assassin’s Creed | Imperator: Rome
June 2018 | Detroit: Become Human | Ace Combat 7: Skies
 | Fortnite | Rules of Survival
 | Assassin’s Creed | Totally Accurate Battlegrounds
July 2018 | Fortnite | Ace Combat 7: Skies
 | Detroit: Become Human | MicroVolts
 | Assassin’s Creed | Aliens: Colonial Marines
August 2018 | Fortnite | Spider-Man 3 (video game)
 | Assassin’s Creed | H1Z1
 | World of Warcraft | Shovel Knight
September 2018 | Borderlands: The Pre-Sequel | Rules of Survival
 | Spider-Man (2018 video game) | Nickelodeon Kart Racers
 | Fortnite | H1Z1
October 2018 | Borderlands: The Pre-Sequel | RuneScape
 | Assassin’s Creed | H1Z1
 | Red Dead Redemption 2 | Starlink: Battle for Atlas
November 2018 | Borderlands: The Pre-Sequel | Call of Duty: Black Ops III
 | Red Dead Redemption 2 | Spider-Man 3 (video game)
 | Fallout 76 | Dragon Ball Xenoverse 2
December 2018 | Borderlands: The Pre-Sequel | Marvel: Ultimate Alliance
 | Fortnite | PewDiePie: Legend of the Brofist
 | Red Dead Redemption 2 | Yo-kai Watch
January 2019 | Borderlands: The Pre-Sequel | Portal 2
 | Fortnite | Dick Vitale’s “Awesome Baby” College Hoops
 | Minecraft | Fire Emblem Warriors
February 2019 | Borderlands: The Pre-Sequel | Dick Vitale’s “Awesome Baby” College Hoops
 | Apex Legends | Wargroove
 | Fortnite | Fire Emblem Warriors
March 2019 | Borderlands: The Pre-Sequel | Assassin’s Creed II
 | Fortnite | Dance Dance Revolution A20
 | Sekiro: Shadows Die Twice | Subnautica

Share and Cite

MDPI and ACS Style

Lewoniewski, W.; Węcel, K.; Abramowicz, W. Multilingual Ranking of Wikipedia Articles with Quality and Popularity Assessment in Different Topics. Computers 2019, 8, 60. https://doi.org/10.3390/computers8030060

