Informatics 2017, 4(4), 43; doi:10.3390/informatics4040043

Article
Relative Quality and Popularity Evaluation of Multilingual Wikipedia Articles
Włodzimierz Lewoniewski *, Krzysztof Węcel and Witold Abramowicz
Department of Information Systems, Poznań University of Economics and Business, 61-875 Poznań, Poland
* Correspondence: Tel.: +48-(61)-639-27-93
Current address: al. Niepodległości 10, 61-875 Poznań, Poland
These authors contributed equally to this work.
Academic Editors: Mouzhi Ge and Vlastislav Dohnal
Received: 21 September 2017 / Accepted: 2 December 2017 / Published: 8 December 2017

Abstract

Despite the fact that Wikipedia is often criticized for its poor quality, it continues to be one of the most popular knowledge bases in the world. Articles in this free encyclopedia on various topics can be created and edited independently in about 300 language versions. Our research has shown that in language-sensitive topics, the quality of information can be relatively better in the relevant language versions. However, in most cases, it is difficult for Wikipedia readers to determine the language affiliation of the described subject. Additionally, each language edition of Wikipedia can have its own rules for manually assessing the quality of content. There are also differences in grading schemes between language versions: some use a 6–8 grade system to assess articles, and some are limited to 2–3. This makes automatic quality comparison of articles across languages a challenging task, particularly given the large number of unassessed articles; some Wikipedia language editions have over 99% of articles without a quality grade. The paper presents the results of a relative quality and popularity assessment of over 28 million articles in 44 selected language versions. A comparative analysis of the quality and popularity of articles in popular topics was also conducted. Additionally, the correlation between the quality and popularity of Wikipedia articles on selected topics in various languages was investigated. The proposed method allows us to find articles with better-quality information that can be used to automatically enrich other language editions of Wikipedia.
Keywords:
Wikipedia; information quality; WikiRank; DBpedia

1. Introduction

Sustaining accurate, complete, reliable, and up-to-date information on the Web is very important, particularly during the development of collaborative platforms and the growth of their popularity. These platforms allow Internet users to create content without special technical skills. Despite the fact that even anonymous users can participate in content addition, information in these knowledge bases can be not only abundant but also trustworthy [1].
Wikipedia is one of the best examples of such collaborative platforms. This encyclopedia has become a popular source of information on many topics. Currently, it is the fifth most visited website in the world (https://www.alexa.com/siteinfo/wikipedia.org). Pages of this online knowledge base often appear among the top search results in Google, Bing, Yandex, and other search engines. There are about 300 language editions of Wikipedia with over 46 million articles, which cover all areas of human activity (https://meta.wikimedia.org/wiki/List_of_Wikipedias). The English edition is the largest, with over 5.4 million articles.
Despite its popularity and the large volume of freely accessible information, Wikipedia is often criticized for unreliable content (more information about criticism of Wikipedia can be found on the page https://en.wikipedia.org/wiki/Criticism_of_Wikipedia). This is primarily because anyone can participate in creating and editing articles without proving competence or education. Changes made by users (even anonymous ones) are immediately visible to a wide range of Wikipedia readers, and there is no professional editorial control. Articles about the same subject can be edited independently in each language version. Therefore, the quality of an article can differ between languages depending on the described topic.
To help readers quickly determine the quality of content, the Wikipedia community has defined a grading system for assessing articles. Each language version of Wikipedia can have its own rules and standards for writing. In many language versions, there are special awards for articles of the highest quality. In English Wikipedia, these articles are labeled as “featured articles” (FAs): they must be well-written, comprehensive, well-researched with reliable sources, appropriately structured, and present views fairly and without bias (https://en.wikipedia.org/wiki/Wikipedia:Featured_article_criteria). Another distinction, “good article” (GA), can be awarded to an article that does not meet the FA criteria but comes close. These English Wikipedia awards often have equivalents in other language editions. For example, the German Wikipedia equivalents of FA and GA are “exzellente artikel” and “lesenswerte artikel”, respectively. However, the share of the best articles is relatively small, around 0.3% per language on average.
In some language editions of Wikipedia, there are also other quality grades, which can reflect the maturity of an article. In English Wikipedia, in addition to the highest FA and GA grades, there are also A-class, B-class, C-class, start, and stub. Russian Wikipedia additionally has “solid article”, “I level”, “II level”, “III level”, and “IV level” grades. Polish Wikipedia has three additional grades: “four”, “start”, and “stub”. Equivalent classes in different language versions can have different assessment standards. For example, in some language versions, high grades impose a requirement on the article’s length. Therefore, each language version can have its own quality model, even if those languages have the same number of grades.
Thus, grading schemes differ between Wikipedia languages, and not all language versions have a developed system of quality grades. For example, German Wikipedia, one of the largest versions, has only the two highest grades (equivalent to FA and GA). These differences in quality grades do not allow us to directly compare the quality of articles across language versions. An additional challenge is the large number of articles without grades: in German and Polish Wikipedia, over 99% of articles are unassessed (i.e., over 2 million and 1.2 million articles, respectively).
The goal of this paper is to investigate the relationship between the quality of Wikipedia articles and their popularity. Our hypothesis is that relative popularity is positively correlated with the relative quality of an article. We introduce a method of quality assessment of Wikipedia articles as a synthetic measure, on a scale between 0 and 100. This approach is used to evaluate more than 28 million articles in 44 language versions of Wikipedia. In addition, we compare the quality of articles in different languages on selected topics. The paper also presents estimates of the relative popularity of these articles. This makes it possible to study the association between quality and popularity in each language–topic pair.
The paper is structured as follows. Section 2 describes related work concerning both the quality and popularity of Wikipedia articles. Section 3 introduces the synthetic measure we use to assess the quality of articles and presents various statistics. Section 4 explains how popularity is measured. Section 5 presents the results of the quality and popularity assessment of Wikipedia articles in 44 languages on different topics. In Section 6, we study the association between quality and popularity. Section 7 concludes the paper.

2. Related Work

2.1. Quality Assessment

Automatic quality assessment of Wikipedia articles is a well-studied topic. Using different methods, it is possible to estimate the quality of articles on the basis of their content, edit history, discussion pages, links, the reputation of their editors, and other sources. Related studies have proposed different sets of metrics, which can be divided into two groups: content-based and user-based methods.
Early works on content-based methods concluded that longer Wikipedia articles often had higher quality [2]. Other papers have shown that high-quality articles tend to have more images, sections, and references [3,4,5]. Some works have analyzed language features that characterize the writing style of articles: high-quality articles cover more concepts, objects, and facts than lower-quality articles [6,7]. According to these studies, the number of facts in a document can indicate its informativeness. The writing style of Wikipedia articles can also be estimated by analyzing character trigram metrics [8]. Another study used basic lexical metrics based on word usage as factors that reflect article quality: high-quality articles often used more nouns and verbs and fewer adjectives [9]. Finally, a quality evaluation of Wikipedia articles can also be based on special quality flaw templates [10].
The second group of studies, user-based, focuses on editors’ behavior. These studies analyze how users’ skills, experience, and coordination of their activities affect the quality of Wikipedia articles. These methods use different metrics related to users’ reputation and the changes they have made to pages [11,12]. If an article has a relatively large number of editors and edits, it is often of high quality [13]. Cooperation among authors and edited articles can be visualized as a network; using graph theory, it is possible to determine structural features associated with an article’s quality [14]. Artificial intelligence methods can be applied to score article quality by discovering damaging edits [15]. However, the described user-based approaches often require complex calculations, and they cannot indicate what needs to be corrected in an article to improve its quality.
Among other proposed methods, the Objective Revision Evaluation Service (ORES) [15] should also be noted: it can classify an article into one of the quality grades and can also automate vandalism detection. In this case, article quality can be evaluated on an interval scale (between 0 and 1). However, automatic quality assessment of articles by ORES is currently possible in only three Wikipedia language versions (https://www.mediawiki.org/wiki/ORES): English, French, and Russian. This may be because the approach works well on large language editions of Wikipedia (with over 1 million articles), for which it is possible to obtain a sufficient amount of training data. Another limitation is the specifics of grading schemes: a relatively well-developed grading scheme, with six or more quality grades, is necessary. Our previous works have shown that each Wikipedia language version can have its own grading scheme [4,5], and some of these versions use only 2–3 grades. For example, German Wikipedia, with over 2 million articles, has only the two highest quality grades for assessing articles. A less developed quality grading scheme is one of the main reasons for the large number of unassessed articles: more than 99% of articles in German, Polish, and other Wikipedia language versions do not have any quality grade.
Although existing works propose various sets of metrics for assessing the quality of Wikipedia articles, there is no universal feature set for this task [16]. An additional challenge is to consider different language versions, which can have different quality models [4,5]. Extraction rules for some metrics (e.g., lexical) can also be language-sensitive [6,7,9]. There are also a few works that aim to combine metrics from articles’ content and edit history [16,17].
In conclusion, by using different metrics and models, it is possible to estimate the quality of an article. The majority of approaches focus on only one language version (usually the largest, English) or a few. Additionally, these methods essentially allow articles to be evaluated and compared only within one selected language version of Wikipedia. This is due to the differences that can arise in quality models between Wikipedia languages [4,5].
In this paper, for the particular task of comparing quality using synthetic measures, we decided to take into account only important content-based metrics. Most existing studies treat quality evaluation of Wikipedia articles as a binary classification problem, which is limited when comparing articles of similar quality classes. Some researchers have aimed to build models that take into account all (or the major) quality grades in developed language versions (such as English), but in this case, precision decreases significantly. Additionally, previous studies have examined the quality of an article within one selected Wikipedia language instead of comparing the different language versions of the article.

2.2. Popularity Measures

The second measure that we analyze in this paper is the popularity of articles. Earlier studies have shown that for some developed language versions of Wikipedia (such as English, German, and Spanish), the popularity of articles was correlated with their number of edits [18]. Our prior work has shown that popularity can play an important role in estimating quality in specific language versions of Wikipedia [5]. Other studies have shown that measuring a topic’s popularity in English Wikipedia can help determine its number of good-quality articles: if the topic is popular, it has a larger number of high-quality articles [19]. Warncke-Wang et al. showed a misalignment between the popularity and quality of Wikipedia articles; however, the study was limited to four language versions of Wikipedia [20]. Additionally, none of these studies provided a comparative analysis of the popularity of the same article across language versions and its impact on quality. Popularity can also indicate, to a certain degree, the importance of an article for the groups of users who read it in a selected language version. It can also motivate assigning a higher quality grade to an article in a given language version compared to other languages, because a greater number of users can check the completeness, timeliness, and reliability of the facts it describes. Therefore, our hypothesis is that popularity can affect the quality dimension of an article.
This study is the continuation of work on building a synthetic measure for the quality assessment of Wikipedia articles in different languages [21]. Preliminary results have shown the high efficiency of this method in assessing articles on language-sensitive topics. Compared with our previous work [21], we decided to increase the number of analyzed languages (from 7 to 44), expand the rules for quality assessment, and analyze the popularity of the articles.

3. Quality Measure

Many of the existing studies solve the problem of automatic quality assessment of articles as a classification task: articles can be marked as complete or incomplete [3,4,5,6,7,9]. This is a major limitation for comparing articles in different languages, as it is not possible to show to what degree one article is better than another if both are tagged with the same class (e.g., incomplete). Additionally, it is necessary to take into account the different quality assessment standards defined by each community in the various language editions of Wikipedia.
In order to build a synthetic measure, we chose five important content-based metrics:
  • len—article length (in bytes);
  • ref—number of references;
  • img—number of images;
  • hdr—number of first- and second-level headers;
  • ral—the ratio of the number of references to the article length.
These metrics have previously shown high predictive power in the quality assessment of English Wikipedia [3], as well as other language editions of Wikipedia [4,5]. According to our findings, the above metrics are positively correlated with quality grades [4,5,21] (see Figure 1).
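The five metrics can be approximated directly from an article's wikitext. The following Python sketch is illustrative only: the regular expressions are simplifying assumptions rather than the authors' parser (which, for images, relied on the imagelinks table), "first- and second-level headers" is assumed here to mean `==` and `===` section headings, and the function name `content_metrics` is hypothetical.

```python
import re

# Assumed heuristics, not the authors' parser.
REF_RE = re.compile(r"<ref[\s>/]", re.IGNORECASE)                 # <ref>, <ref name=...>, <ref ... />
IMG_RE = re.compile(r"\[\[\s*(?:File|Image)\s*:", re.IGNORECASE)  # inline [[File:...]] links
HDR_RE = re.compile(r"^(={2,3})[^=].*?\1[ \t]*$", re.MULTILINE)   # "== H ==" and "=== H ===" lines


def content_metrics(wikitext: str) -> dict:
    """Return the five content-based metrics for one article's wikitext."""
    length = len(wikitext.encode("utf-8"))   # len: article length in bytes
    refs = len(REF_RE.findall(wikitext))     # ref: number of <ref> tags
    imgs = len(IMG_RE.findall(wikitext))     # img: number of inline images
    hdrs = len(HDR_RE.findall(wikitext))     # hdr: number of level-2/level-3 headings
    ral = refs / length if length else 0.0   # ral: references per byte
    return {"len": length, "ref": refs, "img": imgs, "hdr": hdrs, "ral": ral}
```

Counting from wikitext misses images and references injected by templates, which is one reason a dump-based parser such as the one described in Section 3.2 is preferable at scale.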
For over 4 million articles with assigned quality classes in English Wikipedia, we calculated the values of the proposed metrics by quality class. The values of the metrics increase with quality (stub is the lowest grade; FA is the highest). Table 1 presents the median of each metric over all articles in a particular quality class. We do not take into account the A-class, because this class is usually assigned to articles that already have an FA or GA grade. We also excluded 111,412 articles that had two or more different quality grades assigned by various Wikipedia projects.
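The per-class medians, with the two exclusions just described, can be sketched as follows. The input shape (a set of grade labels per article plus a dictionary of metric values) and the grade labels themselves are assumptions for illustration.

```python
from statistics import median


def medians_by_class(articles, metric):
    """articles: iterable of (grades, metrics) pairs, where `grades` is the set of
    distinct quality grades assigned to an article and `metrics` maps metric
    names to values. Returns {grade: median value of `metric`}."""
    groups = {}
    for grades, metrics in articles:
        if len(grades) != 1:   # skip unassessed and conflicting-grade articles
            continue
        (grade,) = grades
        if grade == "A":       # A-class is excluded
            continue
        groups.setdefault(grade, []).append(metrics[metric])
    return {grade: median(values) for grade, values in groups.items()}
```

Using a set for `grades` means that several projects assigning the same grade count as one grade, so only genuinely conflicting assessments are dropped.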
In addition to the above metrics, which were used in our previous work [21], we also decided to take into account special quality flaw templates, which indicate problems identified by Wikipedia editors in an article. There are 12 types of such templates in English Wikipedia, concerning, for example, verifiability, writing style, structure, and neutrality [10]. A preliminary search of the best articles for quality flaw templates showed that articles with an FA grade contained virtually no important quality flaw templates. Including this additional metric is therefore important for lowering the quality score of articles that have high values of content-based metrics but also have quality problems.

3.1. Language Versions

We applied the following selection criteria for language editions of Wikipedia: (a) more than 100,000 articles and (b) editing depth value higher than 20. The latter value reflects the depth of collaborativeness, that is, how frequently articles are updated (https://meta.wikimedia.org/wiki/Wikipedia_article_depth). This descriptor is highly relevant for Wikipedia. These criteria were met by 44 language versions. The list of languages along with a number of extracted articles and redirects is presented in Table 2.

3.2. Metrics Extraction

We used our own parser to extract the six considered metrics. This parser uses some of the files from Wikipedia dumps (a complete copy of all Wikimedia wikis, in the form of wikitext source, raw database tables in SQL, and metadata embedded in XML, is available at https://dumps.wikimedia.org/). Below is the list of files used by our parser for metrics extraction:
  • {lang}wiki-latest-pages-articles.xml.bz2—Recombined articles, templates, media/file descriptions, and primary meta-pages. Used for calculation of articles’ length, number of headers and references.
  • {lang}wiki-latest-imagelinks.sql.gz—Wiki media/files usage records. Used in calculation of number of images in articles.
  • {lang}wiki-latest-templatelinks.sql.gz—Wiki template inclusion link records. Used in calculation of number of quality flaw templates and for searching of articles with selected infoboxes (topics).
  • {lang}wiki-latest-redirect.sql.gz—Redirect list. Used to determine which article names redirect to other articles.
  • {lang}wiki-latest-langlinks.sql.gz—Wiki interlanguage link records. Used for determining name(s) of the article in other language version(s).
In the above file names, {lang} refers to the language code of the Wikipedia edition (as described in Table 2). Thus, for each language version, we downloaded and then processed these five compressed files.
To obtain the most complete list of language links for each article, it is necessary to follow language links from every language version. For example, if an article in a given language links to corresponding articles in other languages, one needs to check whether the links are mutual. An additional challenge was resolving redirects in the articles’ language links. In summary, we collected about 19.3 million language link sets, of which 5.6 million remained after removing duplicates. Further refinement, on the basis of similarity analysis, reduced this number to 4.2 million interlanguage link sets.
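One way to turn raw language link records into interlanguage link sets is to resolve redirects, keep only mutual links, and take the connected components of the resulting graph. The sketch below assumes pre-parsed dictionaries (hypothetical shapes, not the SQL dump format) and does not reproduce the similarity-based refinement mentioned above.

```python
def link_sets(langlinks, redirects):
    """langlinks: {(lang, title): {target_lang: target_title}}
    redirects: {(lang, title): (lang, resolved_title)}, a hypothetical pre-built map.
    Returns a set of frozensets of (lang, title) pages joined by mutual links."""
    def resolve(page):
        return redirects.get(page, page)

    # Resolve redirects at both ends of every outgoing link.
    out = {}
    for src, targets in langlinks.items():
        s = resolve(src)
        for lang, title in targets.items():
            out.setdefault(s, set()).add(resolve((lang, title)))

    # Keep an edge only if it is mutual (a -> b and b -> a).
    adj = {}
    for a, targets in out.items():
        for b in targets:
            if a in out.get(b, ()):
                adj.setdefault(a, set()).add(b)
                adj.setdefault(b, set()).add(a)

    # Connected components of the mutual-link graph are the link sets.
    seen, result = set(), set()
    for start in adj:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            page = stack.pop()
            if page in seen:
                continue
            seen.add(page)
            component.add(page)
            stack.extend(adj[page] - seen)
        result.add(frozenset(component))
    return result
```

Requiring mutual links discards one-directional links (such as `fr` in the test below), which is a conservative choice; a looser variant could accept one-way links whose target has no conflicting link.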
To count quality flaws, we had to take into account the various names of templates that correspond to specific English templates. For this purpose, we used the interlanguage links of important quality flaw templates in English Wikipedia to automatically obtain the corresponding template names in other languages.
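Mapping English quality flaw templates to their local names and counting them per article might look like the sketch below. The input shapes are assumptions, the template names in the test are placeholders, and template aliases (redirects between template pages) are not handled.

```python
def localized_flaw_templates(en_templates, template_langlinks, lang):
    """Local names of English quality flaw templates in `lang`, found through the
    templates' own interlanguage links: {en_template: {lang: local_template}}."""
    if lang == "en":
        return set(en_templates)
    return {
        template_langlinks[t][lang]
        for t in en_templates
        if lang in template_langlinks.get(t, {})
    }


def count_flaws(article_templates, flaw_names):
    """QFT: number of distinct flaw templates transcluded by an article
    (e.g. taken from templatelinks records)."""
    return len(set(article_templates) & flaw_names)
```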
In this paper, we have used the Wikipedia dumps from September 2017.

3.3. Building Quality Measure

As described in [21], there is often a positive correlation between article quality and the value of each of the five considered quality metrics (article length, number of references, images, and headers, and references per length). Figure 1 illustrates this using English Wikipedia, the largest version, as an example: it shows how the distribution of articles varies with the metric values, which is noticeable when the same number of articles is considered for each quality grade.
As mentioned previously, English Wikipedia is the biggest edition, has an extensive grading system, and has a large number of assessed articles. Less developed language versions (e.g., Belarusian, Georgian, Serbian, and Czech) do not always behave like their more developed counterparts. However, since the highest grade, FA, is present in all considered language versions of Wikipedia, we could calculate the median value of each metric for these best articles in each language. The medians for each considered metric and language version are shown in Table 3.
The above values were then used as thresholds in our quality measure. As proposed in [21], on the basis of the medians, we normalized each metric in each Wikipedia language version according to the following rule: if the value of the given metric in a given language exceeded the threshold, it was set to 100 points; otherwise, its value was scaled linearly in proportion to the median value. For example, if the median number of references in Japanese Wikipedia was 118, any article with more references would score 100 for this metric, and an article with 59 references would score 50 points after normalization.
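With thresholds taken as medians over each edition's FA articles, the normalization rule reduces to a capped linear scaling. The following is a minimal sketch: the function names are hypothetical, and the handling of a zero median is an assumption, since the text does not cover that case.

```python
from statistics import median

METRICS = ("len", "ref", "img", "hdr", "ral")


def fa_thresholds(fa_articles):
    """Median of each metric over one edition's featured articles."""
    return {m: median(a[m] for a in fa_articles) for m in METRICS}


def normalize(value, threshold):
    """0-100 score: linear up to the FA median, capped at 100 above it."""
    if threshold <= 0:
        return 100.0  # assumption: a zero FA median is reached by every article
    return min(value / threshold, 1.0) * 100
```

For the Japanese example, `normalize(59, 118)` returns `50.0` and any value above 118 returns `100.0`.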
The same change in a metric’s value can therefore have a different effect on the normalized value in each Wikipedia language version. Since each metric could play an important role in assessing quality in every language version, we first computed the normalized metrics average (NMA) using the following formula:
$$\mathit{NMA} = \frac{1}{c}\sum_{i=1}^{c} nm_i \tag{1}$$
where $nm_i$ is the normalized value of metric $m_i$ and $c$ is the number of metrics.
Next, we took into account the number of quality flaw templates $\mathit{QFT}$ in the considered article (if any); our final formula for the quality measure reads as follows:
$$\mathit{QualityScore} = \mathit{NMA} - \mathit{NMA} \cdot 0.05 \cdot \mathit{QFT} \tag{2}$$
In articles with a high quality score (e.g., 90 points), each quality flaw template reduced the score by 5% of the NMA (with one such template, an article with an NMA of 90 points would score 85.5 points). This way, an article with maximum values of particular metrics that also had quality flaw template(s) could not obtain the maximum quality score (100).
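Given the normalized metrics, the NMA and the final score follow directly from the two formulas above. A minimal sketch (function names are hypothetical):

```python
def nma(normalized):
    """Normalized metrics average: mean of the c normalized metric values."""
    return sum(normalized) / len(normalized)


def quality_score(normalized, qft):
    """Each quality flaw template removes 5% of the NMA (linear, not compounding)."""
    average = nma(normalized)
    return average - average * 0.05 * qft
```

With five metrics all normalized to 90 and one flaw template, `quality_score([90] * 5, 1)` gives 85.5, matching the example above. Because the penalty is linear and unclamped, the formula as stated reaches 0 at 20 flaw templates.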
After assessing more than 28 million articles in the 44 considered language editions of Wikipedia, we found that most articles obtained scores between 0 and 30 points. Figure 2 shows the distribution of articles on this scale (a more detailed, interactive chart is available at http://data.lewoniewski.info/informatics2017/).

4. Popularity Measure

The quality of Wikipedia articles can change over time. This is particularly true for articles that contain time-sensitive information: if they are not updated regularly, or are updated with delays, their quality will decrease over time. This lower quality is especially visible in comparison to equivalent articles in other languages that may be updated regularly. We can expect that more popular language versions of an article will be verified by authors more often and, if necessary, updated faster than less popular language versions. To some extent, this is reflected in the Wikipedia article depth measure. Therefore, it is useful to consider the popularity metrics of articles.
Similarly to other studies [19,20], we used page view information to measure the popularity of articles. Wikipedia records hourly counts of page visits for all language versions in special compressed files (https://dumps.wikimedia.org/other/analytics/). To measure the popularity of articles, we downloaded these files with statistics for the preceding year (from September 2016 to August 2017), about 442 GB of compressed raw data.
We define the following popularity metrics:
  • tp—total popularity: total number of visits during the considered period;
  • sp—stable popularity: stable number of visits, which is calculated as the median of daily visits during the considered period.
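Once the hourly page-view files have been parsed, tp and sp for one article in one language can be computed as below. Parsing of the files is omitted, the record shape is an assumption, and counting days without any recorded views as zero is also an assumption (it affects the median).

```python
from statistics import median


def popularity_metrics(hourly_views, period_days):
    """hourly_views: iterable of (day, views) records for one article in one
    language, one record per hour with views. period_days: every day of the
    considered period. Returns (tp, sp)."""
    daily = dict.fromkeys(period_days, 0)  # assumption: missing days count as 0 visits
    for day, views in hourly_views:
        if day in daily:                   # ignore records outside the period
            daily[day] += views
    tp = sum(daily.values())               # total popularity
    sp = median(daily.values())            # stable popularity: median daily visits
    return tp, sp
```

The median makes sp robust to short spikes (for example, a news event), which a total such as tp would absorb entirely.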
To calculate the relative popularity, we normalized both metrics with respect to the maximum values of the popularity metrics among the corresponding articles in other languages. Thus, for a popularity metric $p$ of an article with $n$ language versions (indexed by $v$), the language $l_p$ with the maximum value is found by the formulas:
$$l_{tp} = \arg\max_{v=1,\dots,n} tp(v), \qquad l_{sp} = \arg\max_{v=1,\dots,n} sp(v) \tag{3}$$
Now, to calculate the relative popularity $RP$ (on a scale between 0 and 100) of the selected language version $l$ of the article, we averaged the normalized popularity metrics $tp$ and $sp$:
$$RP(l) = \frac{tp(l)}{tp(l_{tp})} \times 50 + \frac{sp(l)}{sp(l_{sp})} \times 50 \tag{4}$$
Consider an example: suppose that an article has three language versions, en, de, and fr, with the following popularity metrics:
  • total popularity: $tp(en) = 2000$, $tp(de) = 1000$, and $tp(fr) = 500$;
  • stable popularity: $sp(en) = 30$, $sp(de) = 40$, and $sp(fr) = 20$.
English has the highest value of the $tp$ metric; therefore, $l_{tp} = en$, and we normalize using the value $tp(l_{tp}) = 2000$: $tp(en) = \frac{2000}{2000} = 1$, $tp(de) = \frac{1000}{2000} = 0.5$, and $tp(fr) = \frac{500}{2000} = 0.25$.
German has the highest value of the $sp$ metric; therefore, $l_{sp} = de$, and we normalize using the value $sp(l_{sp}) = 40$: $sp(en) = \frac{30}{40} = 0.75$, $sp(de) = \frac{40}{40} = 1$, and $sp(fr) = \frac{20}{40} = 0.5$.
Now substituting the normalized values into Equation (4), we obtain the following values of the relative popularity measure for each considered language version of the article:
  • $RP(en) = 1 \times 50 + 0.75 \times 50 = 87.5$;
  • $RP(de) = 0.5 \times 50 + 1 \times 50 = 75$;
  • $RP(fr) = 0.25 \times 50 + 0.5 \times 50 = 37.5$.
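The relative popularity formula and the worked example can be checked with a short sketch (the function name is hypothetical):

```python
def relative_popularity(tp, sp):
    """tp, sp: {lang: value} for one article's language versions.
    Returns {lang: RP} on a 0-100 scale."""
    tp_max = max(tp.values()) or 1  # tp(l_tp); "or 1" avoids 0/0 when every value is 0
    sp_max = max(sp.values()) or 1  # sp(l_sp)
    return {lang: tp[lang] / tp_max * 50 + sp[lang] / sp_max * 50 for lang in tp}


print(relative_popularity({"en": 2000, "de": 1000, "fr": 500},
                          {"en": 30, "de": 40, "fr": 20}))
# {'en': 87.5, 'de': 75.0, 'fr': 37.5}
```

Because tp and sp are normalized against possibly different languages, no version needs to reach 100; here the maximum is 87.5.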

5. Wikipedia Articles’ Assessment

In this section, we present the results of the quality and popularity assessment of Wikipedia articles in 44 languages on different topics, such as companies, films, persons, universities, and video games.

5.1. Dataset

Wikipedia provides a system of categories, specific to each language, that allows articles to be grouped. Thus, each language version of Wikipedia usually has its own category structure and its own practices for assigning categories. For example, in some languages, it is customary to tag an article with more than 20 categories; in others, the number can be limited to 2–5 categories. The quality of the category structure also differs among languages. For example, in some language versions, articles about people, events, transport, and other topics can be assigned to just one category.
A more reliable approach for classification is based on the infobox system. An infobox is a table, usually located in the top right-hand corner of an article, that concisely presents the main facts about the subject. Depending on the topic described, infoboxes have different names. This allows other popular knowledge bases (e.g., DBpedia, https://dbpedia.org) to develop a detailed ontology on the basis of these Wikipedia templates [22]. Popular infoboxes usually have their own names in various languages. For the purpose of our research, we chose 12 different infobox types on the basis of their popularity in English Wikipedia. Using interwiki links, we extracted the infobox names in other language versions. Table 4 shows that almost all Wikipedia languages have equivalents of the popular infoboxes from the English version.
To define groups of articles describing the same topic, we extracted lists of articles separately for each infobox in a particular language version. In some languages, the lack of an infobox does not mean the absence of articles on a given topic. For example, German Wikipedia does not use infoboxes for people (office holders, musicians, etc.). Moreover, there is no obligation to add an infobox at all, although it is often considered an important element of an article’s quality. In such cases, we can use interwiki links from identified articles in some languages to reach articles in other versions. The results of this procedure are presented in Table 5, which gives the number of articles on each topic in the analyzed Wikipedia languages.
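Propagating topic membership through interlanguage link sets, so that articles without an infobox (e.g., German biographies) are still assigned to the topic, can be sketched as follows. The input shapes and the function name are assumptions; in practice, infobox transclusions would come from the templatelinks dumps.

```python
def topic_articles(link_sets, has_infobox):
    """link_sets: iterable of sets of (lang, title) pages describing one subject.
    has_infobox: set of (lang, title) pages that transclude the topic's infobox
    (under its local name). Returns every page assigned to the topic."""
    result = set()
    for pages in link_sets:
        # One infobox-tagged version is enough to assign the whole set to the topic.
        if any(page in has_infobox for page in pages):
            result |= set(pages)
    return result
```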
Table 6 presents the results from another perspective: for each topic, it shows the number of articles that exist in a given number of language versions. Figure 3 visualizes this distribution (with a logarithmic scale on the vertical axis).
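The distribution behind Table 6 is a count of language versions per article. A sketch with an assumed input shape (one set of (lang, title) pages per article of a topic):

```python
from collections import Counter


def language_count_distribution(topic_link_sets):
    """Returns a Counter mapping the number of language versions to the
    number of articles with that many versions."""
    return Counter(len({lang for lang, _ in pages}) for pages in topic_link_sets)
```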
Another way to analyze the data on language versions from Table 5 is to show the overlaps among groups of three languages using Venn diagrams, which show how many articles specific languages have in common (see Figure 4).
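The seven regions of a three-language Venn diagram follow from basic set operations. In this sketch, each set holds identifiers of the articles (for example, link-set IDs) available in one language; the region labels are arbitrary.

```python
def venn3_counts(a, b, c):
    """Sizes of the seven disjoint regions of a three-set Venn diagram."""
    return {
        "a only": len(a - b - c),
        "b only": len(b - a - c),
        "c only": len(c - a - b),
        "a&b only": len((a & b) - c),
        "a&c only": len((a & c) - b),
        "b&c only": len((b & c) - a),
        "a&b&c": len(a & b & c),
    }
```

The seven counts always sum to the size of the union of the three sets, which is a convenient consistency check.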

5.2. Quality Assessment

For all articles from our dataset, we calculated a synthetic measure of quality as described in Section 3.3. Table 7 presents the average quality scores of articles for each topic in 44 Wikipedia language editions.
If we consider the distribution of the quality scores of Wikipedia articles, we can also observe differences across language versions and topics. Figure 5 presents the distribution of quality scores for three Wikipedia language versions (English, German, and French) in 12 considered topics (charts for other languages are available from the Web page: http://data.lewoniewski.info/informatics2017/).

5.3. Popularity Assessment

Our goal is to look for a correlation between quality and popularity. Therefore, we also collected popularity data as described in Section 4. Table 8 presents the average popularity metric $tp$ for articles on each topic in the 44 Wikipedia language editions.

6. Association between Quality and Popularity

In this section, we present a comparison of the quality and popularity of Wikipedia articles in different languages.
Because this analysis imposes additional requirements on the relations between language versions, we conducted it on a subset of Wikipedia articles: in each topic, we selected only those articles that had at least three language versions (cf. Table 6). We then analyzed combinations of a language and a topic, referred to as language–topic pairs. Table 9 presents the top 25 pairs ranked by the share of articles that had the highest quality in comparison to the other language versions (full data are presented in Table A1 in the Appendix). For example, the first row of this table should be interpreted as follows: for the topic "videogame", 60.5% of articles were, according to our quality score, best described in the English version.
An analogous table was prepared for popularity. Table 10 presents the top 25 language–topic pairs ranked by the share of articles whose version in the given language was the most popular among all language versions (full data are presented in Table A2 in the Appendix). Similarly to the previous table, the first row should be interpreted as follows: for the topic "album", the English version was the most popular (attracted the greatest number of visits) for 85.8% of articles.
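Tables 9 and 10 rest on the same counting step: for each article, find the language version with the highest score, then report for each language the share of articles it "wins". A minimal sketch, assuming per-article scores are available in a nested dictionary (a hypothetical format) and that ties are broken by taking the first maximum, since the paper does not state how ties were handled:

```python
from collections import Counter


def best_language_shares(scores_by_item, min_langs=3):
    """Share of items in which each language version has the top score.

    `scores_by_item` maps an item ID to a {language: score} dict, where the
    score is either the quality score or the popularity metric. Items with
    fewer than `min_langs` language versions are skipped. Ties are resolved
    by taking the first maximum in dict order (an assumption).
    """
    winners = Counter()
    considered = 0
    for scores in scores_by_item.values():
        if len(scores) < min_langs:
            continue
        considered += 1
        winners[max(scores, key=scores.get)] += 1
    if considered == 0:
        return {}
    return {lang: count / considered for lang, count in winners.items()}


if __name__ == "__main__":
    quality = {  # hypothetical quality scores per item and language
        "item-1": {"en": 50.0, "de": 40.0, "fr": 10.0},
        "item-2": {"en": 20.0, "de": 60.0, "pl": 30.0},
        "item-3": {"en": 90.0, "de": 10.0},  # skipped: only two versions
    }
    print(best_language_shares(quality))  # {'en': 0.5, 'de': 0.5}
```

Running the same function once on quality scores and once on popularity values gives the two kinds of shares reported in Tables A1 and A2.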
The goal of our research is to analyze the association between quality and popularity. We have done this on two levels, using appropriate statistics, both parametric and non-parametric.
We first present results of a parametric test using a phi coefficient, calculated for each language–topic pair. This is a measure of association for two binary variables. Our variables were coded as follows: if an article about a specific topic in a given language was of the highest quality among all languages, then it was assigned a value of 1 (high score); otherwise, it was assigned 0 (low score). Popularity was coded similarly: if an article about a specific topic in a given language was the most popular among all languages, then it was assigned a value of 1 (high score); otherwise it was assigned 0 (low score).
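Then, the phi coefficient was calculated by the following formula:

$$\phi = \frac{n_{11}\, n_{00} - n_{10}\, n_{01}}{\sqrt{n_{1\bullet}\, n_{0\bullet}\, n_{\bullet 0}\, n_{\bullet 1}}}$$

where $n_{11}$ is the number of articles with high quality and high popularity scores, $n_{10}$ is the number of articles with a high quality score and a low popularity score, $n_{01}$ is the number of articles with a low quality score and a high popularity score, and $n_{00}$ is the number of articles with low quality and low popularity scores. The marginal totals are $n_{1\bullet} = n_{11} + n_{10}$ and $n_{0\bullet} = n_{01} + n_{00}$ (articles with high and low quality scores, respectively), and $n_{\bullet 1} = n_{11} + n_{01}$ and $n_{\bullet 0} = n_{10} + n_{00}$ (articles with high and low popularity scores, respectively).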
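The formula can be checked with a small sketch that counts the four cells from two 0/1 sequences (one entry per article, for a fixed language–topic pair) and combines them with the marginal totals. This is an illustration of the standard phi coefficient, not the authors' code; the sample indicators are hypothetical.

```python
import math


def phi_coefficient(high_quality, high_popularity):
    """Phi coefficient of two equal-length sequences of 0/1 indicators.

    high_quality[i] is 1 if article i has its highest quality score in the
    chosen language, else 0; high_popularity[i] is coded the same way for
    popularity.
    """
    if len(high_quality) != len(high_popularity):
        raise ValueError("sequences must have equal length")
    n11 = n10 = n01 = n00 = 0
    for q, p in zip(high_quality, high_popularity):
        if q and p:
            n11 += 1
        elif q:
            n10 += 1
        elif p:
            n01 += 1
        else:
            n00 += 1
    # Marginal totals: n1. and n0. (quality), n.1 and n.0 (popularity).
    denominator = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denominator == 0:
        return float("nan")  # undefined when a marginal total is zero
    return (n11 * n00 - n10 * n01) / denominator


if __name__ == "__main__":
    quality = [1, 1, 0, 0, 1, 0]  # hypothetical indicators for one pair
    popularity = [1, 0, 0, 0, 1, 1]
    print(phi_coefficient(quality, popularity))  # approximately 0.333
```

Here $n_{11} = 2$, $n_{10} = 1$, $n_{01} = 1$ and $n_{00} = 2$, so $\phi = (2 \cdot 2 - 1 \cdot 1)/\sqrt{3 \cdot 3 \cdot 3 \cdot 3} = 1/3$. For 0/1 data this value coincides with Pearson's correlation coefficient.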
Depending on the language and topic, the correlation may differ substantially. Table 11 shows the top 25 language–topic pairs with the highest correlation coefficients (full data are presented in Table A3 in the Appendix).
The problem with the phi coefficient, a special case of Pearson's correlation coefficient, is that its results are highly granular (one value per language–topic pair) and cannot be easily generalized. Therefore, we also set up another experiment, in which we estimated the association between quality and popularity within a topic. For every topic, we prepared two lists of languages: one ordered by the share of articles that were of the highest quality (see Table A1), and the other ordered by the share of articles that were the most popular (see Table A2). These lists were effectively rankings. We wished to know whether the order of the languages was similar, which would support the hypothesis that quality and popularity are associated. For this purpose, we used Spearman's rank correlation coefficient between the shares of articles (also used by [19] in similar tasks). The results are presented in Table 12.
Spearman's rank correlation measures the strength of the monotonic association between two rankings; in our case, its highest value was 0.87 (for the topic "company"). The results show that the correlation between quality and popularity differs depending on the topic, but the coefficient was never lower than 0.61 (for the topic "settlement"). All associations were statistically significant (as shown by the p-values). Overall, the results of our calculations support the hypothesis that there is an association between the high quality of articles and their popularity. However, the strength of the association depends on the topic and the language version of Wikipedia.
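Because many shares in Tables A1 and A2 are equal (for example, the numerous 0.0 entries), ties matter when ranking languages. A minimal sketch of Spearman's coefficient that assigns tied values the average of their ranks and then computes Pearson's correlation on the ranks; it is an illustration of the standard definition, not the software cited by the authors, and it does not compute p-values (SciPy's `scipy.stats.spearmanr` returns both the coefficient and a p-value).

```python
import math


def average_ranks(values):
    """Rank values 1..n; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # convert 0-based positions to ranks
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho computed as Pearson's r on the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one ranking is constant
    return cov / math.sqrt(var_x * var_y)
```

For a given topic, `x` would hold the quality shares of the languages (one column of Table A1) and `y` the popularity shares of the same languages in the same order (the matching column of Table A2).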

7. Conclusions and Future Work

In this paper, we have described how the quality and popularity of Wikipedia articles can be measured across different languages. Depending on the topic and language, different correlations can be observed between the quality and popularity of Wikipedia articles. This can be due to several reasons.
First, Wikipedia language communities differ in the number of experts in each area. In less-developed language versions of Wikipedia, some topics have only a few experts, or none at all. This can be observed particularly in domains that are not specific to a given language community. Therefore, an article has a greater chance of receiving the highest grade through the assessment procedure than it would have in a more-developed language version. More-developed versions, having a larger user base, are more demanding and hence more critical: it is more difficult to obtain a high-quality grade when there are more eyes watching.
Second, quality can evolve over time. Suppose that an unpopular article in a less-developed language version once received a high grade from the community. Because the article is not popular, there is no incentive to update it, whereas the same subject can be much more popular in another language, where the facts are therefore updated regularly. In such less-developed language versions, we observe a discrepancy between the "graded" quality and the real quality. Popularity is a factor that can help to make this distinction.
Third, the large number of unassessed articles makes it difficult to build accurate quality models based on the grades assigned by users. Except for English and French Wikipedia, most language versions have a large number of unrated articles. In such models, important metrics are often related to the volume of information (e.g., article length, number of references, and number of images). Unfortunately, these metrics cannot capture other quality dimensions of the article content, such as timeliness.
The approach for quality assessment presented in this paper takes into account the specifics of the best articles of each language version of Wikipedia. By considering the popularity measure, we can improve the process of identification of language versions with the highest quality.
The proposed quality and popularity measures can be particularly helpful in automated knowledge extraction from Wikipedia articles, as performed, for example, by DBpedia. A frequently encountered problem there is conflict resolution, which is necessary when various language versions concerning the same subject contain conflicting information [24]. Our quality metrics can help in building more effective conflict-resolution strategies for data fusion. An example of such a conflict in DBpedia is presented in Figure 6.
Conflict resolution is a first step towards the overall objective of enriching less-developed Wikipedia language versions, for which the appropriate information is of poor quality or is even absent. Figure 7 shows the general scheme of enrichment of information by transferring values from infoboxes of language versions with the highest quality and popularity scores. Before transferring values of particular parameters of an infobox, the information is compared to other language versions, and versions with higher quality and popularity scores will have a higher influence (weight) on selecting the proper value.
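The weighting idea behind Figure 7 can be illustrated by a small sketch in which each language version "votes" for its value of an infobox parameter and the value with the largest total weight is selected. This assumes quality and popularity scores normalized to [0, 1] and combines them by multiplication; both the normalization and the product rule are hypothetical choices, since the paper only states that versions with higher scores have a higher influence.

```python
from collections import defaultdict


def select_value(candidates):
    """Choose the infobox value with the largest total weight.

    `candidates` is a list of (language, value, quality, popularity) tuples,
    with quality and popularity assumed to be normalized to [0, 1]. Each
    language version votes for its value with weight quality * popularity,
    a hypothetical combination rule.
    """
    if not candidates:
        raise ValueError("no candidate values")
    weights = defaultdict(float)
    for _lang, value, quality, popularity in candidates:
        weights[value] += quality * popularity
    return max(weights, key=weights.get)


if __name__ == "__main__":
    candidates = [  # hypothetical values of one infobox parameter
        ("en", "A", 0.9, 0.8),
        ("de", "B", 0.8, 0.6),
        ("fr", "B", 0.7, 0.3),
        ("pl", "C", 0.4, 0.1),
    ]
    print(select_value(candidates))  # A
```

In this toy example, a single highly rated version (weight 0.72) outweighs two moderately rated versions that agree with each other (0.48 + 0.21 = 0.69); a different combination rule, such as a weighted sum of the two scores, could change that outcome.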
The methods proposed in the paper have practical implications. The synthetic quality measure is used in the WikiRank service (http://wikirank.net), which assesses and compares articles in the various language versions of Wikipedia. A quality and popularity assessment of an article can also help to evaluate the quality of one of its important parts, the infobox. Such an evaluation is used in the Infoboxes service (http://infoboxes.net).
Some of the presented metrics can be expanded. For example, by analyzing the similarity of sources in Wikipedia articles across languages, we can also evaluate the quality of their content [25]. Furthermore, the references themselves can have their own quality metrics (e.g., impact factor), which can be used as an indirect indicator of the article's quality. For popularity measurements, it can be useful to add metrics related to link analysis in Wikipedia articles [26]. In the future, we plan to continue studying new metrics and their extraction methods in order to improve the Wikipedia article quality assessment model.

Author Contributions

K.W. and W.L. conceived the research problem; W.L. conducted the state-of-the-art analysis; K.W. proposed the research methodology and designed the experiments, starting from the hypotheses to be verified statistically; W.L. collected the data and performed the analysis; W.L. and K.W. interpreted the results; W.A. provided overall guidance.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Appendix A.1

Table A1. Shares of Wikipedia articles with the highest quality score compared with other language versions (articles with at least three language versions were considered). Source: own calculations.
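Lang. | Album | Comp. | Film | Footb. | Music. | Offic. | Person | Settl. | Taxobox | Telev. | Univ. | Videog.
ar | 0.0 | 0.002 | 0.001 | 0.008 | 0.001 | 0.003 | 0.007 | 0.001 | 0.004 | 0.001 | 0.007 | 0.004
az | 0.0 | 0.0 | 0.001 | 0.001 | 0.001 | 0.004 | 0.001 | 0.006 | 0.001 | 0.0 | 0.004 | 0.0
be | 0.0 | 0.0 | 0.0 | 0.003 | 0.0 | 0.001 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0
bg | 0.003 | 0.001 | 0.006 | 0.003 | 0.004 | 0.003 | 0.004 | 0.01 | 0.03 | 0.002 | 0.005 | 0.0
ca | 0.001 | 0.003 | 0.03 | 0.012 | 0.004 | 0.004 | 0.04 | 0.005 | 0.033 | 0.004 | 0.004 | 0.008
cs | 0.003 | 0.005 | 0.002 | 0.016 | 0.009 | 0.002 | 0.006 | 0.003 | 0.006 | 0.002 | 0.002 | 0.002
da | 0.001 | 0.002 | 0.001 | 0.003 | 0.002 | 0.001 | 0.0 | 0.0 | 0.0 | 0.001 | 0.0 | 0.0
de | 0.032 | 0.133 | 0.147 | 0.091 | 0.035 | 0.013 | 0.038 | 0.015 | 0.08 | 0.036 | 0.071 | 0.047
el | 0.005 | 0.002 | 0.003 | 0.005 | 0.004 | 0.006 | 0.006 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002
en | 0.555 | 0.497 | 0.437 | 0.291 | 0.49 | 0.393 | 0.387 | 0.212 | 0.271 | 0.478 | 0.435 | 0.605
es | 0.058 | 0.027 | 0.017 | 0.053 | 0.057 | 0.062 | 0.062 | 0.046 | 0.139 | 0.073 | 0.039 | 0.025
et | 0.001 | 0.004 | 0.001 | 0.002 | 0.003 | 0.007 | 0.004 | 0.003 | 0.003 | 0.001 | 0.004 | 0.0
fa | 0.001 | 0.002 | 0.002 | 0.003 | 0.002 | 0.008 | 0.003 | 0.012 | 0.0 | 0.002 | 0.005 | 0.001
fi | 0.018 | 0.013 | 0.014 | 0.007 | 0.012 | 0.009 | 0.012 | 0.005 | 0.009 | 0.005 | 0.006 | 0.006
fr | 0.032 | 0.05 | 0.06 | 0.052 | 0.067 | 0.065 | 0.077 | 0.115 | 0.026 | 0.038 | 0.06 | 0.051
gl | 0.0 | 0.001 | 0.0 | 0.0 | 0.001 | 0.001 | 0.001 | 0.001 | 0.002 | 0.0 | 0.001 | 0.0
he | 0.002 | 0.006 | 0.005 | 0.003 | 0.008 | 0.014 | 0.011 | 0.002 | 0.001 | 0.006 | 0.009 | 0.001
hi | 0.001 | 0.002 | 0.006 | 0.0 | 0.001 | 0.002 | 0.001 | 0.002 | 0.0 | 0.002 | 0.002 | 0.0
hr | 0.008 | 0.005 | 0.007 | 0.007 | 0.008 | 0.01 | 0.007 | 0.016 | 0.002 | 0.004 | 0.003 | 0.001
hu | 0.011 | 0.006 | 0.008 | 0.015 | 0.01 | 0.013 | 0.01 | 0.015 | 0.009 | 0.011 | 0.003 | 0.005
hy | 0.0 | 0.0 | 0.001 | 0.0 | 0.001 | 0.002 | 0.001 | 0.019 | 0.0 | 0.0 | 0.0 | 0.0
id | 0.004 | 0.004 | 0.007 | 0.002 | 0.005 | 0.005 | 0.003 | 0.001 | 0.002 | 0.007 | 0.007 | 0.001
it | 0.063 | 0.03 | 0.066 | 0.181 | 0.051 | 0.059 | 0.1 | 0.031 | 0.042 | 0.057 | 0.014 | 0.027
ja | 0.011 | 0.046 | 0.017 | 0.036 | 0.035 | 0.019 | 0.018 | 0.004 | 0.006 | 0.043 | 0.047 | 0.053
ka | 0.01 | 0.001 | 0.001 | 0.001 | 0.001 | 0.002 | 0.001 | 0.003 | 0.01 | 0.0 | 0.003 | 0.001
ko | 0.003 | 0.007 | 0.001 | 0.003 | 0.005 | 0.005 | 0.003 | 0.001 | 0.002 | 0.011 | 0.013 | 0.004
lt | 0.001 | 0.002 | 0.001 | 0.003 | 0.003 | 0.006 | 0.002 | 0.004 | 0.001 | 0.0 | 0.001 | 0.0
no | 0.006 | 0.008 | 0.011 | 0.011 | 0.012 | 0.014 | 0.033 | 0.011 | 0.007 | 0.006 | 0.006 | 0.004
pl | 0.038 | 0.016 | 0.018 | 0.027 | 0.038 | 0.06 | 0.04 | 0.088 | 0.034 | 0.024 | 0.025 | 0.015
pt | 0.043 | 0.015 | 0.027 | 0.024 | 0.02 | 0.018 | 0.019 | 0.028 | 0.033 | 0.035 | 0.013 | 0.018
ro | 0.003 | 0.003 | 0.002 | 0.004 | 0.003 | 0.004 | 0.003 | 0.017 | 0.029 | 0.002 | 0.003 | 0.001
ru | 0.034 | 0.032 | 0.031 | 0.057 | 0.04 | 0.078 | 0.04 | 0.039 | 0.026 | 0.019 | 0.04 | 0.045
sh | 0.0 | 0.001 | 0.005 | 0.0 | 0.001 | 0.005 | 0.002 | 0.036 | 0.0 | 0.001 | 0.002 | 0.0
simple | 0.002 | 0.002 | 0.002 | 0.006 | 0.004 | 0.006 | 0.004 | 0.004 | 0.001 | 0.003 | 0.004 | 0.002
sl | 0.003 | 0.001 | 0.001 | 0.0 | 0.003 | 0.004 | 0.003 | 0.017 | 0.0 | 0.0 | 0.0 | 0.001
sr | 0.001 | 0.001 | 0.001 | 0.001 | 0.002 | 0.003 | 0.002 | 0.002 | 0.013 | 0.002 | 0.001 | 0.001
ta | 0.0 | 0.002 | 0.001 | 0.0 | 0.001 | 0.005 | 0.002 | 0.002 | 0.001 | 0.001 | 0.003 | 0.0
th | 0.001 | 0.001 | 0.001 | 0.001 | 0.002 | 0.003 | 0.001 | 0.001 | 0.002 | 0.002 | 0.008 | 0.002
tr | 0.002 | 0.003 | 0.002 | 0.011 | 0.005 | 0.007 | 0.004 | 0.006 | 0.002 | 0.004 | 0.008 | 0.002
uk | 0.019 | 0.019 | 0.021 | 0.041 | 0.017 | 0.037 | 0.02 | 0.151 | 0.011 | 0.007 | 0.023 | 0.01
ur | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.001 | 0.001 | 0.037 | 0.0 | 0.0 | 0.002 | 0.0
uz | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.001 | 0.0 | 0.001 | 0.0 | 0.0 | 0.0 | 0.0
vi | 0.001 | 0.001 | 0.002 | 0.001 | 0.002 | 0.002 | 0.001 | 0.017 | 0.135 | 0.004 | 0.004 | 0.002
zh | 0.022 | 0.044 | 0.032 | 0.012 | 0.029 | 0.032 | 0.019 | 0.009 | 0.025 | 0.105 | 0.109 | 0.052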

Appendix A.2

Table A2. Shares of Wikipedia articles with the highest popularity compared with other language versions (articles with at least three language versions were considered). Source: own calculations.
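Lang. | Album | Comp. | Film | Footb. | Music. | Offic. | Person | Settl. | Taxobox | Telev. | Univ. | Videog.
ar | 0.0 | 0.001 | 0.002 | 0.004 | 0.003 | 0.003 | 0.006 | 0.0 | 0.001 | 0.001 | 0.01 | 0.0
az | 0.0 | 0.0 | 0.0 | 0.0 | 0.001 | 0.002 | 0.0 | 0.0 | 0.0 | 0.0 | 0.004 | 0.0
be | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0
bg | 0.0 | 0.0 | 0.001 | 0.001 | 0.002 | 0.003 | 0.001 | 0.002 | 0.0 | 0.001 | 0.003 | 0.0
ca | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.001 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0
cs | 0.0 | 0.003 | 0.003 | 0.006 | 0.006 | 0.001 | 0.004 | 0.001 | 0.001 | 0.001 | 0.003 | 0.0
da | 0.0 | 0.001 | 0.001 | 0.001 | 0.002 | 0.003 | 0.0 | 0.0 | 0.0 | 0.001 | 0.001 | 0.0
de | 0.005 | 0.06 | 0.032 | 0.041 | 0.008 | 0.004 | 0.017 | 0.008 | 0.033 | 0.023 | 0.032 | 0.004
el | 0.0 | 0.001 | 0.001 | 0.002 | 0.001 | 0.005 | 0.002 | 0.002 | 0.0 | 0.001 | 0.001 | 0.0
en | 0.858 | 0.663 | 0.733 | 0.552 | 0.676 | 0.544 | 0.624 | 0.391 | 0.736 | 0.644 | 0.508 | 0.857
es | 0.019 | 0.024 | 0.02 | 0.089 | 0.046 | 0.069 | 0.065 | 0.052 | 0.061 | 0.095 | 0.051 | 0.007
et | 0.0 | 0.001 | 0.0 | 0.0 | 0.001 | 0.002 | 0.001 | 0.004 | 0.0 | 0.0 | 0.001 | 0.0
fa | 0.0 | 0.001 | 0.002 | 0.004 | 0.002 | 0.006 | 0.004 | 0.005 | 0.0 | 0.001 | 0.008 | 0.0
fi | 0.003 | 0.004 | 0.001 | 0.001 | 0.006 | 0.005 | 0.003 | 0.002 | 0.001 | 0.001 | 0.002 | 0.0
fr | 0.011 | 0.032 | 0.043 | 0.035 | 0.029 | 0.037 | 0.049 | 0.113 | 0.033 | 0.022 | 0.051 | 0.005
gl | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0
he | 0.0 | 0.002 | 0.001 | 0.002 | 0.003 | 0.004 | 0.002 | 0.003 | 0.001 | 0.001 | 0.003 | 0.0
hi | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0
hr | 0.002 | 0.001 | 0.001 | 0.0 | 0.004 | 0.004 | 0.002 | 0.011 | 0.0 | 0.002 | 0.001 | 0.0
hu | 0.0 | 0.001 | 0.001 | 0.005 | 0.003 | 0.006 | 0.004 | 0.019 | 0.001 | 0.001 | 0.003 | 0.0
hy | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.001 | 0.0
id | 0.0 | 0.002 | 0.0 | 0.0 | 0.003 | 0.004 | 0.001 | 0.002 | 0.008 | 0.0 | 0.005 | 0.0
it | 0.011 | 0.018 | 0.037 | 0.035 | 0.019 | 0.027 | 0.038 | 0.024 | 0.006 | 0.012 | 0.012 | 0.004
ja | 0.058 | 0.077 | 0.035 | 0.06 | 0.06 | 0.022 | 0.029 | 0.007 | 0.014 | 0.08 | 0.093 | 0.075
ka | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.002 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0
ko | 0.0 | 0.006 | 0.0 | 0.002 | 0.002 | 0.004 | 0.002 | 0.001 | 0.001 | 0.006 | 0.011 | 0.0
lt | 0.0 | 0.001 | 0.0 | 0.0 | 0.001 | 0.002 | 0.001 | 0.001 | 0.0 | 0.0 | 0.002 | 0.0
no | 0.0 | 0.002 | 0.001 | 0.003 | 0.003 | 0.003 | 0.004 | 0.003 | 0.001 | 0.001 | 0.001 | 0.0
pl | 0.003 | 0.01 | 0.006 | 0.015 | 0.011 | 0.032 | 0.018 | 0.075 | 0.015 | 0.005 | 0.016 | 0.001
pt | 0.007 | 0.012 | 0.004 | 0.03 | 0.014 | 0.012 | 0.013 | 0.025 | 0.008 | 0.019 | 0.017 | 0.002
ro | 0.0 | 0.001 | 0.001 | 0.0 | 0.003 | 0.005 | 0.002 | 0.035 | 0.001 | 0.0 | 0.002 | 0.0
ru | 0.013 | 0.05 | 0.051 | 0.096 | 0.059 | 0.142 | 0.084 | 0.169 | 0.052 | 0.029 | 0.076 | 0.036
sh | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.001 | 0.0 | 0.0 | 0.0 | 0.0
simple | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0
sl | 0.0 | 0.0 | 0.0 | 0.0 | 0.001 | 0.001 | 0.001 | 0.004 | 0.0 | 0.0 | 0.0 | 0.0
sr | 0.001 | 0.0 | 0.004 | 0.0 | 0.003 | 0.005 | 0.003 | 0.014 | 0.0 | 0.002 | 0.005 | 0.0
ta | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0
th | 0.0 | 0.002 | 0.001 | 0.001 | 0.002 | 0.002 | 0.001 | 0.003 | 0.002 | 0.001 | 0.008 | 0.0
tr | 0.001 | 0.004 | 0.003 | 0.01 | 0.006 | 0.011 | 0.005 | 0.012 | 0.0 | 0.006 | 0.012 | 0.0
uk | 0.0 | 0.001 | 0.0 | 0.0 | 0.001 | 0.002 | 0.001 | 0.007 | 0.001 | 0.0 | 0.009 | 0.0
ur | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0
uz | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0
vi | 0.0 | 0.001 | 0.0 | 0.0 | 0.002 | 0.002 | 0.001 | 0.002 | 0.007 | 0.001 | 0.006 | 0.0
zh | 0.006 | 0.017 | 0.013 | 0.002 | 0.015 | 0.025 | 0.01 | 0.002 | 0.015 | 0.042 | 0.042 | 0.007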

Appendix A.3

Table A3. Phi correlation coefficients of articles with the highest quality and popularity in selected Wikipedia languages. Source: own calculations.
Lang.AlbumComp.FilmFootb.Music.Offic.PersonSettl.TaxoboxTelev.Univ.Videog.
ar0.3340.1990.2430.0860.3800.1770.2410.1040.2740.4570.390
az−0.0060.4630.1010.3500.2630.2260.0630.158−0.0050.458−0.010
be0.3040.2630.2690.0990.1610.0280.349
bg0.1400.3670.2230.0970.5340.4880.2440.3580.1390.6020.603
ca0.1800.0510.0410.1860.0730.0330.1700.1670.0530.2880.1810.088
cs0.2110.4550.3310.2790.4050.4000.4060.2090.1780.5800.6400.269
da0.4270.3720.3580.2250.4190.4610.1620.2890.1820.4770.4110.405
de0.3340.5230.3380.3640.2630.3410.3090.5120.2070.5350.5930.215
el0.1960.3030.5640.2460.2380.5110.3350.0260.0660.5220.134
en0.6360.6250.4450.4340.5760.5650.3900.4540.3630.6300.6760.589
es0.2620.4660.2950.3370.4360.5670.3150.6420.4190.5430.6830.185
et0.2070.3320.1890.1510.4610.3620.2830.2110.1570.6500.359
fa0.2980.4020.3860.2200.5950.4550.4580.1610.1410.6550.6010.181
fi0.3250.4430.2540.2500.4560.5570.2990.3860.1660.5110.3240.176
fr0.2380.4300.2370.2630.3430.3620.3300.3700.2040.4570.6360.205
gl−0.0010.1030.065−0.0050.0240.0830.2470.0820.1020.367
he0.3430.5040.2810.5250.4590.4100.3250.2560.1790.5860.552
hi0.074−0.015−0.0060.0740.0150.200−0.0270.026
hr0.2620.4210.2570.4020.5930.4000.3520.4090.3410.4430.456
hu0.1030.4670.2520.4170.4250.4580.4190.5460.0930.3690.5370.094
hy0.163−0.0040.198−0.0020.2200.2450.124−0.003−0.0070.1680.341
id0.2490.3620.1900.2120.4350.5820.3220.6560.0330.4660.705
it0.2480.4380.1370.3440.3370.4400.3130.4660.2830.2540.5360.159
ja0.3090.6370.4550.5940.5620.4820.4920.4470.3420.5320.6140.379
ka0.0520.2680.0200.1760.2010.1950.501−0.0010.1110.214
ko0.2130.4540.1900.3190.3690.5670.4170.5520.1760.4260.5110.150
lt0.1330.2470.2320.0880.3530.3960.3690.5410.2350.3180.558
no0.1490.3380.1910.0360.2530.3530.1550.1700.5360.3510.3420.227
pl0.3190.6070.2560.2920.4140.4240.3390.5240.1340.3040.7170.160
pt0.3340.4740.1790.4440.4960.4860.3670.5530.0640.6320.7070.252
ro0.1290.4450.4410.1010.4980.3700.3980.1130.4350.2590.473
ru0.2680.4600.3920.3650.4140.4560.3300.4210.3600.4700.6070.296
sh−0.0070.075−0.0010.0670.0270.1750.011−0.004−0.0050.495
simple0.0980.0910.137−0.0070.0490.0710.0770.074
sl0.2140.4360.642−0.0050.3290.2300.3290.1130.140−0.0260.597
sr0.1220.1890.3050.0450.3300.2510.3010.1130.0810.3980.556
ta−0.012−0.0130.0100.0380.0390.0330.2120.082
th0.1960.4400.4070.2260.6840.7620.4810.4950.1510.5440.8380.173
tr0.4510.3670.4530.3660.5070.3970.4050.5070.2900.5600.6770.163
uk0.1040.1880.0830.1100.2060.2160.2010.2040.0910.1840.391
ur0.1040.0440.013
uz0.2360.3140.2010.0730.338
vi0.1210.3020.2610.4200.6070.4810.3690.1420.3720.2090.7190.096
zh0.3630.4230.3780.2500.4560.6150.4100.1850.1260.5500.4620.243

References

  1. Staub, T.; Hodel, T. Wikipedia vs. Academia: An Investigation into the Role of the Internet in Education, with a Special Focus on Wikipedia. Univ. J. Educ. Res. 2016, 4, 349–354.
  2. Blumenstock, J.E. Size matters: Word count as a measure of quality on wikipedia. In Proceedings of the 17th International Conference on World Wide Web, Beijing, China, 21–25 April 2008; pp. 1095–1096.
  3. Warncke-Wang, M.; Cosley, D.; Riedl, J. Tell Me More: An Actionable Quality Model for Wikipedia. In Proceedings of the 9th International Symposium on Open Collaboration, Hong Kong, China, 5–7 August 2013; pp. 1–10.
  4. Węcel, K.; Lewoniewski, W. Modelling the Quality of Attributes in Wikipedia Infoboxes. In Business Information Systems Workshops; Abramowicz, W., Ed.; Lecture Notes in Business Information Processing; Springer International Publishing: Cham, Switzerland, 2015; Volume 228, pp. 308–320.
  5. Lewoniewski, W.; Węcel, K.; Abramowicz, W. Quality and Importance of Wikipedia Articles in Different Languages. In Information and Software Technologies: 22nd International Conference, ICIST 2016, Druskininkai, Lithuania, October 13-15, 2016, Proceedings; Springer International Publishing: Cham, Switzerland, 2016; pp. 613–624.
  6. Lex, E.; Voelske, M.; Errecalde, M.; Ferretti, E.; Cagnina, L.; Horn, C.; Stein, B.; Granitzer, M. Measuring the quality of web content using factual information. In Proceedings of the 2nd Joint WICOW/AIRWeb Workshop on Web Quality—WebQuality’12, Lyon, France, 16–20 April 2012; p. 7.
  7. Khairova, N.; Lewoniewski, W.; Węcel, K. Estimating the Quality of Articles in Russian Wikipedia Using the Logical-Linguistic Model of Fact Extraction. In Business Information Systems: 20th International Conference, BIS 2017, Poznan, Poland, June 28–30, 2017, Proceedings; Abramowicz, W., Ed.; Springer International Publishing: Cham, Switzerland, 2017; pp. 28–40.
  8. Lipka, N.; Stein, B. Identifying Featured Articles in Wikipedia: Writing Style Matters. In Proceedings of the 19th International Conference on World Wide Web (2010), Raleigh, NC, USA, 26–30 April 2010; pp. 1147–1148.
  9. Xu, Y.; Luo, T. Measuring article quality in Wikipedia: Lexical clue model. In Proceedings of the 2011 3rd Symposium on Web Society (SWS), Port Elizabeth, South Africa, 26–28 October 2011; pp. 141–146.
  10. Anderka, M. Analyzing and Predicting Quality Flaws in User-generated Content: The Case of Wikipedia. Ph.D. Thesis, Bauhaus-Universitaet, Weimar, Germany, 2013.
  11. Wu, G.; Harrigan, M.; Cunningham, P. Characterizing Wikipedia Pages Using Edit Network Motif Profiles. In Proceedings of the 3rd International Workshop on Search and Mining User-generated Contents, Glasgow, UK, 24–28 October 2011; pp. 45–52.
  12. Suzuki, Y.; Nakamura, S. Assessing the Quality of Wikipedia Editors Through Crowdsourcing. In Proceedings of the 25th International Conference Companion on World Wide Web, Montreal, QC, Canada, 11–15 April 2016; International World Wide Web Conferences Steering Committee: Geneva, Switzerland, 2016; pp. 1001–1006.
  13. Wilkinson, D.M.; Huberman, B.A. Cooperation and quality in wikipedia. In Proceedings of the 2007 International Symposium on Wikis WikiSym 07, Montreal, QC, Canada, 21–23 October 2007; pp. 157–164.
  14. Ingawale, M.; Dutta, A.; Roy, R.; Seetharaman, P. Network analysis of user generated content quality in Wikipedia. Online Inf. Rev. 2013, 37, 602–619.
  15. Halfaker, A.; Taraborelli, D. Artificial Intelligence Service “ORES” Gives Wikipedians X-Ray Specs to See Through Bad Edits. Available online: https://blog.wikimedia.org/2015/11/30/artificial-intelligence-x-ray-specs/ (accessed on 31 October 2017).
  16. Dang, Q.V.; Ignat, C.L. Quality assessment of Wikipedia articles without feature engineering. In Proceedings of the 2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL), Newark, NJ, USA, 19–23 June 2016; pp. 27–30.
  17. Dalip, D.H.; Lima, H.; Gonçalves, M.A.; Cristo, M.; Calado, P. Quality assessment of collaborative content with minimal information. In Proceedings of the IEEE/ACM Joint Conference on Digital Libraries, London, UK, 8–12 September 2014; pp. 201–210.
  18. Reinoso, A.J. Temporal and Behavioral Patterns in the Use of Wikipedia. Ph.D. Thesis, Universidad Rey Juan Carlos, Madrid, Spain, 2011. Available online: https://gsyc.urjc.es/~ajreinoso/thesis/ (accessed on 31 October 2017).
  19. Lehmann, J.; Müller-Birn, C.; Laniado, D.; Lalmas, M.; Kaltenbrunner, A. Reader preferences and behavior on wikipedia. In Proceedings of the 25th ACM Conference on Hypertext and Social Media, Santiago, Chile, 1–4 September 2014; pp. 88–97.
  20. Warncke-Wang, M.; Ranjan, V.; Terveen, L.G.; Hecht, B.J. Misalignment Between Supply and Demand of Quality Content in Peer Production Communities. In Proceedings of the ICWSM, Oxford, UK, 26–29 May 2015; pp. 493–502.
  21. Lewoniewski, W.; Węcel, K. Relative Quality Assessment of Wikipedia Articles in Different Languages Using Synthetic Measure. In Business Information Systems Workshops: BIS 2017 International Workshops, Poznań, Poland, June 28-30, 2017, Revised Papers; Abramowicz, W., Ed.; Springer International Publishing: Cham, Switzerland, 2017; pp. 282–292.
  22. Bizer, C.; Lehmann, J.; Kobilarov, G.; Auer, S.; Becker, C.; Cyganiak, R.; Hellmann, S. DBpedia-A crystallization point for the Web of Data. Web Semant. Sci. Serv. Agents World Wide Web 2009, 7, 154–165.
  23. Wessa, R. Spearman Rank Correlation (v1.0.3) in Free Statistics Software (v1.2.1), Office for Research Development and Education. Available online: https://www.wessa.net/rwasp_spearman.wasp/ (accessed on 31 October 2017).
  24. Bryl, V.; Bizer, C. Learning conflict resolution strategies for cross-language wikipedia data fusion. In Proceedings of the 23rd International Conference on World Wide Web, Seoul, Korea, 7–11 April 2014; pp. 1129–1134.
  25. Lewoniewski, W.; Węcel, K.; Abramowicz, W. Analysis of References Across Wikipedia Languages. In Information and Software Technologies: 23rd International Conference, ICIST 2017, Druskininkai, Lithuania, October 12–14, 2017, Proceedings; Damaševičius, R., Mikašytė, V., Eds.; Springer International Publishing: Cham, Switzerland, 2017; pp. 561–573.
  26. Hanada, R.; Cristo, M.; Pimentel, M.D.G.C. How do metrics of link analysis correlate to quality, relevance and popularity in wikipedia? In Proceedings of the 19th Brazilian Symposium on Multimedia and the Web, Salvador, Brazil, 5–8 November 2013; pp. 105–112.
Figure 1. Distribution of metrics in articles of each quality class in English Wikipedia (featured article (FA)—the highest grade; stub—the lowest). Source: own calculation.
Figure 2. Distribution of articles depending on quality score in each language version of Wikipedia. The medians of the quality scores for each language edition are in parentheses. Source: own calculation.
Figure 3. Number of articles that have a certain number of language versions in particular topics. Source: own calculations.
Figure 4. Coverage of articles that describe universities in different languages. Source: own calculation. Other interactive Venn diagrams for this paper with different topics and languages are available on the following Web page: http://data.lewoniewski.info/informatics2017/vn/.
Figure 5. Distribution of quality scores for three Wikipedia language versions (English, German, and French) in 12 considered topics. Source: own calculation.
Figure 6. Infobox about Basel with its data sources and its extraction to DBpedia from different Wikipedia language versions.
Figure 7. Scheme of information enrichment of Wikipedia infobox on the basis of quality and popularity assessment of other language versions by an example of Basel city. Source: own calculations from September 2017.
Table 1. Median of metrics in each quality class in English Wikipedia. Source: own calculations.
Quality/Metric | Len | Ref | Img | Hdr | Ral | No. of Articles
FA | 49,292.5 | 113.0 | 13.0 | 14.0 | 0.00231 | 5117
GA | 25,862.0 | 57.0 | 8.0 | 10.0 | 0.00215 | 26,126
B | 21,791.0 | 32.0 | 6.0 | 11.0 | 0.00157 | 69,545
C | 14,751.0 | 21.0 | 4.0 | 9.0 | 0.00147 | 178,902
Start | 6526.0 | 6.0 | 2.0 | 5.0 | 0.00097 | 1,300,912
Stub | 2182.0 | 1.0 | 2.0 | 2.0 | 0.00073 | 2,604,331
Table 2. Number of articles and redirects in considered language versions of Wikipedia.
Lang. Code | Full Name | Number of Articles | Number of Redirects
ar | Arabic | 540,604 | 469,411
az | Azerbaijani | 124,758 | 34,223
be | Belarusian | 146,060 | 187,545
bg | Bulgarian | 234,409 | 111,580
ca | Catalan | 555,036 | 360,622
cs | Czech | 389,769 | 246,868
da | Danish | 231,498 | 140,296
de | German | 2,102,498 | 1,403,049
el | Greek | 136,682 | 67,422
en | English | 5,479,834 | 7,865,769
es | Spanish | 1,354,835 | 1,655,009
et | Estonian | 161,221 | 117,093
fa | Persian | 575,876 | 1,471,443
fi | Finnish | 422,047 | 243,497
fr | French | 1,910,815 | 1,464,984
gl | Galician | 141,146 | 55,341
he | Hebrew | 212,814 | 171,196
hi | Hindi | 121,141 | 45,802
hr | Croatian | 177,762 | 50,454
hu | Hungarian | 417,182 | 187,423
hy | Armenian | 230,411 | 316,974
id | Indonesian | 410,170 | 442,416
it | Italian | 1,383,839 | 660,330
ja | Japanese | 1,076,601 | 641,393
ka | Georgian | 117,614 | 37,333
ko | Korean | 397,641 | 336,249
lt | Lithuanian | 182,961 | 79,476
no | Norwegian | 475,291 | 268,180
pl | Polish | 1,241,294 | 407,200
pt | Portuguese | 978,485 | 748,634
ro | Romanian | 379,141 | 495,065
ru | Russian | 1,421,808 | 1,860,232
sh | Serbo-Croatian | 439,889 | 3,537,980
simple | Simple English | 127,963 | 52,026
sl | Slovenian | 158,141 | 65,893
sr | Serbian | 356,250 | 848,652
ta | Tamil | 113,146 | 36,502
th | Thai | 119,425 | 137,551
tr | Turkish | 298,523 | 239,841
uk | Ukrainian | 734,784 | 416,183
ur | Urdu | 123,921 | 191,456
uz | Uzbek | 128,997 | 315,513
vi | Vietnamese | 1,161,311 | 198,618
zh | Chinese | 962,982 | 760,244
Table 3. Median metrics values in the highest quality class in various Wikipedia languages. Source: own calculation.
Lang. | Length | References | Images | Headers | Ref/Len
ar | 120,704.5 | 162.5 | 41.5 | 27.0 | 0.00133
az | 76,048.0 | 124.0 | 26.0 | 21.0 | 0.00162
be | 170,430.0 | 197.0 | 35.0 | 27.0 | 0.00113
bg | 76,416.0 | 60.0 | 22.0 | 21.0 | 0.00081
ca | 47,890.0 | 66.0 | 18.0 | 17.0 | 0.00144
cs | 70,012.0 | 123.0 | 18.0 | 21.0 | 0.00196
da | 72,937.5 | 125.0 | 22.0 | 29.5 | 0.00196
de | 56,438.0 | 55.0 | 17.0 | 21.0 | 0.00095
el | 89,168.0 | 83.5 | 13.0 | 18.0 | 0.00094
en | 49,316.0 | 113.0 | 13.0 | 14.0 | 0.00231
es | 76,565.5 | 99.0 | 19.0 | 21.0 | 0.00133
et | 16,834.0 | 27.0 | 10.0 | 12.5 | 0.00203
fa | 102,343.0 | 147.5 | 20.5 | 22.0 | 0.00141
fi | 49,264.0 | 113.0 | 15.0 | 20.0 | 0.00224
fr | 90,736.0 | 167.0 | 29.0 | 26.0 | 0.00186
gl | 89,990.0 | 157.0 | 21.0 | 22.0 | 0.00203
he | 64,263.0 | 38.0 | 17.0 | 19.0 | 0.0006
hi | 74,027.5 | 38.5 | 18.0 | 16.0 | 0.00057
hr | 36,925.0 | 25.0 | 14.0 | 17.0 | 0.00073
hu | 59,459.5 | 63.0 | 22.0 | 21.0 | 0.00114
hy | 157,587.0 | 169.0 | 38.0 | 33.0 | 0.00108
id | 49,018.0 | 92.0 | 14.0 | 16.0 | 0.00207
it | 82,750.0 | 141.0 | 29.0 | 23.0 | 0.00177
ja | 97,329.0 | 188.0 | 22.0 | 29.0 | 0.00198
ka | 92,822.0 | 46.0 | 21.0 | 20.0 | 0.00043
ko | 72,534.0 | 131.0 | 20.0 | 22.0 | 0.00186
lt | 52,274.0 | 44.0 | 27.0 | 22.0 | 0.00056
no | 62,999.0 | 77.0 | 20.0 | 23.0 | 0.00108
pl | 59,967.0 | 97.0 | 16.0 | 17.0 | 0.00168
pt | 70,432.5 | 146.0 | 23.0 | 17.0 | 0.00209
ro | 83,933.5 | 154.0 | 24.0 | 21.0 | 0.00197
ru | 139,812.0 | 164.0 | 24.0 | 22.0 | 0.00117
sh | 55,668.0 | 65.0 | 15.0 | 17.0 | 0.00116
simple | 22,231.0 | 51.0 | 8.0 | 9.0 | 0.00227
sl | 40,176.0 | 51.5 | 12.0 | 16.0 | 0.00135
sr | 112,775.0 | 109.0 | 29.0 | 24.0 | 0.00098
ta | 96,282.0 | 24.0 | 21.0 | 19.0 | 0.00017
th | 122,833.0 | 91.0 | 16.0 | 22.0 | 0.00088
tr | 65,254.0 | 98.0 | 18.0 | 17.0 | 0.00177
uk | 84,159.0 | 41.0 | 25.0 | 21.0 | 0.00051
ur | 54,045.5 | 31.5 | 17.5 | 21.0 | 0.00058
uz | 55,387.0 | 27.5 | 22.0 | 26.0 | 0.00081
vi | 89,724.0 | 138.0 | 21.0 | 20.0 | 0.00164
zh | 43,215.0 | 91.0 | 12.0 | 12.0 | 0.00219
Table 4. Number of considered language versions of Wikipedia with particular infobox. Source: own calculation.
Infobox Name | Abbreviation | No. of Lang.
Album | Album | 41
Company | Comp. | 41
Film | Film | 43
Football biography | Footb. | 38
Musical artist | Music. | 40
Officeholder | Office | 35
Person | Person | 41
Settlement | Settl. | 42
Taxobox | Taxobox | 43
Television | Telev. | 41
University | Univ. | 40
Videogame | Videog. | 43
Table 5. Number of articles on particular topic in various Wikipedia languages. Source: own calculations.
Lang.AlbumComp.FilmFootb.Music.OfficePersonSettl.TaxoboxTelev.Univ.Videog.
az246540469217592042377314,81898867755204956218
be16858925123461157552412,94410,155387060436111
bg26441133491948664395341633,34027,18330,6601240439203
ca139614838729808234884130128,84425,23927,9387665361214
cs69193189547110,44912,007374858,21218,28812,7161467547965
da2859251512,84969176745358624,469647860771082611770
de869923,05233,07937,65310,977699897,83633,74945,935618336433037
el2020781237225782123406428,65556412019491239377
en161,20767,416123,962149,140105,658142,209559,453513,861337,21143,42122,93422,666
es37,487950423,07128,57135,90836,945235,382168,487160,47012,32339276920
et14371053162524283016560920,82117,9595803397507178
fa6680509920,03716,516897912,98183,842150,34825,017304414621719
fi22,230643210,382995015,09410,94879,12622,88519,75837009503666
fr42,03021,84551,15743,02639,09041,593278,194217,022111,84511,041520112,364
gl38831146245821724080422921,55711,1705087611222498
he49282552453253106389935141,87610,70371552421603654
hi908872430762626514078297100133356362369
hr48751022199132323593430621,69125,4915127531200345
hu10,4532353598016,524772310,39656,162101,13221,41029982531076
hy2874855319624733970328622,98776,5283216630440149
id8567460010,51913,226536012,00939,41993,62296,84345181561673
it71,36813,11460,99950,13831,08234,395331,480183,63337,40810,59017058790
ja28,37531,71519,02916,87426,50116,449100,93643,25315,758483229178696
ka4634602169016552111517214,55430,79210,582384248204
ko7234751010,44611,20911,703901554,49824,35014,142738917212646
lt22731387212926443400397414,87021,2979309249453507
no11,5655460682210,83611,34114,458136,40536,22428,405148417421237
pl30,606818519,50639,58920,36330,018172,777230,48342,047597226053372
pt36,065945324,04426,85925,36018,047143,961153,436100,58010,12322325501
ro44522593439047435321808536,072157,47332,0081417484763
ru22,05912,94028,38632,14530,24848,437199,057244,56740,238551532296251
sh1735657889822612445652125,881119,86325781268623101
simple36051488268957135281606128,63325,20343501258836968
sl153696573022522194336331,79627,7122441217274140
sr25711068562125083105563327,707102,21196051170335315
ta12548314960166652297275377371250123972434
th27141439266317393662512613,911553559182687687796
tr96413689729417,988851010,05853,06857,582612727149931318
uk9880603113,96715,391917018,09582,276176,11124,649162618171656
ur143467322693841826771764,0906119953342
uz901841321775671034342771,79410244317715
vi42311915270626943297601419,693201,490796,74922786481038
zh11,05911,07510,12911,571766319,16768,975148,41697,55311,04046694477
Table 6. Number of articles that have a certain number of language versions (NoL) in particular topics. Source: own calculations.
NoLAlbumComp.FilmFootb.Music.OfficePersonSettl.TaxoboxTelev.Univ.Videog.
1263,745129,613226,052187,770164,297198,470912,559842,467113,950474,48231,83431,437
290,29939,04592,473103,27974,01684,295527,270574,990425,05426,97612,16017,700
354,34322,64053,07666,45247,03752,638368,955386,807180,52115,538699612,385
438,45115,75336,47548,10034,14538,797278,537296,388100,37310,65645129059
528,92911,95127,22737,44126,61730,934219,460235,16170,777800933806962
622,404949821,45630,38821,39025,496176,480196,75253,416619926705514
717,602780017,57225,04217,69321,374143,829167,86542,560493021674496
814,296648214,63620,36514,87618,262118,437149,57034,636402118063687
911,774550412,34416,98712,61015,66398,543131,89028,594332415263063
109652469610,61714,31510,79313,62582,757118,16223,935280813102536
1179574017922812,065933711,88470,551106,23320,298237911292094
1266473481806010,178812110,49160,80592,09317,25019999941774
1355793014705688467116927752,60376,81614,85516648761513
1447252635617777466225830346,10561,66012,74514287791309
1539732310541168625525746740,67148,62411,08512326731112
1633571994472961544925672536,01937,01197041051599935
1728401753413355074366610332,02030,2768514893523797
1823981567364249563861553628,54825,5397543763470669
1920401393318444863427499625,57022,2816694646430575
2017341256278740573027455422,87319,5205951543381492
2114491130244536562658418120,57216,8505293463340421
221250995216432762385384118,62213,7174740406300362
231049879190729712126350316,87810,5614248361261298
24899775166226411895323615,24584763801309240249
25744690146223711676295513,78070313405268218206
26617618128221451500272812,55361033030237191171
27504559111619421340249811,40654072681199166134
2839348397017451193228910,37148452360172147113
2931544084015671066207993814321209614013096
302343697371240945185882003876186412211680
3117733562610088441684728534001635989959
321302945238707631505651529461433858749
33892614316646601315563225381245737543
34722223465165731153487721931089626735
3548190277372494101341501856924526026
363616220728441688535021522782475120
372213715522235475129701321627424015
38161171111722986192477114951838338
39139674130241504200797441429244
406804291195406159481232822202
414652455136320120168325117160
4234714299724687654618011120
4313661360170584389118670
4411717219431022859040
Table 7. Average quality scores of articles for each topic in 44 Wikipedia language editions. Source: own calculations.
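| Lang. | Album | Comp. | Film | Footb. | Music. | Office | Person | Settl. | Taxobox | Telev. | Univ. | Videog. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ar | 12.5 | 14.0 | 10.8 | 8.9 | 14.2 | 16.6 | 13.7 | 12.8 | 14.2 | 13.5 | 13.9 | 13.9 |
| az | 13.1 | 12.4 | 10.3 | 15.9 | 11.1 | 13.1 | 11.0 | 16.4 | 13.1 | 15.2 | 20.7 | 11.7 |
| be | 12.0 | 10.1 | 11.9 | 10.8 | 9.9 | 9.1 | 8.7 | 7.4 | 9.8 | 12.1 | 8.1 | 20.4 |
| bg | 12.4 | 17.9 | 16.3 | 14.1 | 17.0 | 16.0 | 14.9 | 23.5 | 26.2 | 15.3 | 21.6 | 11.8 |
| ca | 18.3 | 23.9 | 23.0 | 20.0 | 21.8 | 22.5 | 18.1 | 23.0 | 25.0 | 24.4 | 23.6 | 22.6 |
| cs | 12.1 | 19.9 | 10.2 | 18.6 | 14.1 | 17.7 | 15.1 | 19.9 | 22.2 | 14.2 | 15.4 | 15.6 |
| da | 11.9 | 13.5 | 11.4 | 11.6 | 11.3 | 11.4 | 9.3 | 10.6 | 10.6 | 11.2 | 8.5 | 12.5 |
| de | 29.5 | 29.5 | 23.2 | 19.6 | 27.3 | 26.8 | 21.3 | 28.6 | 28.5 | 24.7 | 25.2 | 32.7 |
| el | 23.6 | 23.8 | 22.9 | 20.7 | 26.2 | 23.9 | 20.6 | 25.2 | 31.7 | 21.9 | 22.9 | 24.7 |
| en | 23.8 | 29.5 | 21.1 | 19.1 | 27.7 | 24.3 | 26.0 | 20.8 | 18.8 | 23.8 | 28.1 | 31.3 |
| es | 18.4 | 21.0 | 14.0 | 18.1 | 18.4 | 19.9 | 18.8 | 16.1 | 20.9 | 20.8 | 18.8 | 18.2 |
| et | 10.6 | 21.2 | 9.8 | 11.9 | 14.3 | 14.1 | 15.3 | 15.0 | 21.6 | 15.1 | 12.5 | 14.1 |
| fa | 9.3 | 14.3 | 7.6 | 7.8 | 8.9 | 12.2 | 8.7 | 12.8 | 7.5 | 10.2 | 15.7 | 12.9 |
| fi | 14.4 | 18.5 | 14.0 | 15.1 | 13.1 | 16.0 | 15.1 | 21.0 | 21.1 | 14.9 | 17.4 | 14.4 |
| fr | 14.9 | 19.5 | 13.7 | 15.8 | 17.9 | 16.4 | 17.0 | 19.8 | 11.8 | 18.9 | 18.0 | 16.8 |
| gl | 5.9 | 13.6 | 10.9 | 7.9 | 10.7 | 10.2 | 10.7 | 11.9 | 22.4 | 11.8 | 11.8 | 11.8 |
| he | 18.0 | 22.5 | 20.1 | 14.6 | 18.1 | 20.2 | 19.5 | 25.6 | 18.1 | 19.3 | 23.3 | 17.3 |
| hi | 23.0 | 25.8 | 17.8 | 46.3 | 27.0 | 19.3 | 19.5 | 18.9 | 19.7 | 19.9 | 14.1 | 30.3 |
| hr | 18.8 | 21.8 | 23.5 | 20.6 | 19.8 | 18.3 | 18.4 | 23.4 | 21.6 | 24.0 | 18.9 | 21.4 |
| hu | 16.3 | 21.8 | 19.5 | 15.3 | 19.2 | 17.6 | 16.7 | 15.2 | 18.0 | 18.6 | 23.5 | 22.7 |
| hy | 17.6 | 17.2 | 12.8 | 18.1 | 14.7 | 12.5 | 13.1 | 12.1 | 10.4 | 16.4 | 13.9 | 16.0 |
| id | 15.4 | 19.3 | 16.3 | 10.5 | 21.0 | 16.8 | 16.3 | 6.8 | 4.2 | 17.2 | 18.0 | 19.4 |
| it | 13.9 | 18.4 | 12.8 | 22.2 | 18.4 | 16.9 | 17.7 | 15.2 | 21.8 | 17.2 | 17.7 | 15.6 |
| ja | 10.5 | 15.6 | 15.1 | 17.4 | 16.1 | 15.0 | 15.7 | 19.5 | 18.0 | 22.9 | 19.1 | 18.1 |
| ka | 24.6 | 15.5 | 12.9 | 12.9 | 16.7 | 9.9 | 11.1 | 13.7 | 19.2 | 16.6 | 15.0 | 26.7 |
| ko | 11.5 | 13.3 | 8.7 | 9.0 | 12.3 | 12.5 | 11.2 | 7.7 | 16.3 | 13.9 | 13.7 | 17.7 |
| lt | 12.7 | 16.3 | 8.4 | 21.6 | 11.8 | 13.7 | 15.1 | 11.8 | 7.9 | 14.6 | 17.8 | 14.8 |
| no | 11.0 | 15.3 | 16.1 | 17.1 | 13.9 | 15.9 | 15.6 | 19.8 | 12.5 | 19.1 | 11.2 | 20.1 |
| pl | 16.0 | 18.5 | 12.1 | 11.6 | 19.4 | 15.4 | 15.4 | 19.1 | 20.6 | 17.9 | 23.6 | 19.6 |
| pt | 19.6 | 17.3 | 15.9 | 14.7 | 16.7 | 15.7 | 15.1 | 15.2 | 10.6 | 19.3 | 15.7 | 18.9 |
| ro | 17.2 | 18.1 | 15.6 | 15.6 | 16.4 | 12.9 | 13.5 | 16.1 | 23.1 | 15.1 | 16.0 | 16.9 |
| ru | 22.5 | 21.1 | 14.3 | 20.1 | 17.2 | 16.3 | 16.8 | 16.2 | 20.5 | 20.3 | 18.4 | 24.1 |
| sh | 17.8 | 18.7 | 12.5 | 9.7 | 15.4 | 13.2 | 12.8 | 26.0 | 20.8 | 12.5 | 15.5 | 16.5 |
| simple | 18.1 | 20.4 | 15.3 | 14.9 | 20.1 | 22.0 | 20.9 | 15.6 | 21.2 | 16.8 | 19.4 | 17.7 |
| sl | 25.9 | 20.5 | 13.9 | 8.6 | 17.9 | 19.1 | 14.2 | 23.2 | 21.1 | 15.6 | 13.8 | 23.7 |
| sr | 11.0 | 17.3 | 7.7 | 14.3 | 15.7 | 14.2 | 13.8 | 17.6 | 24.1 | 13.3 | 13.3 | 16.4 |
| ta | 16.5 | 25.6 | 11.9 | 17.1 | 24.1 | 24.8 | 23.8 | 26.5 | 26.7 | 18.6 | 21.4 | 27.8 |
| th | 17.3 | 19.5 | 15.7 | 14.8 | 18.2 | 19.9 | 17.5 | 18.4 | 19.2 | 15.7 | 21.1 | 19.3 |
| tr | 13.7 | 15.8 | 12.4 | 9.9 | 14.6 | 13.6 | 12.8 | 14.4 | 14.5 | 12.8 | 16.4 | 14.5 |
| uk | 19.3 | 20.7 | 14.9 | 20.5 | 18.0 | 16.0 | 16.9 | 24.2 | 17.0 | 20.3 | 18.0 | 25.7 |
| ur | 16.3 | 21.7 | 15.7 | 18.9 | 16.7 | 15.5 | 19.3 | 24.6 | 16.2 | 16.2 | 22.5 | 15.0 |
| uz | 13.3 | 15.2 | 17.1 | 13.7 | 13.1 | 14.9 | 13.0 | 8.0 | 11.0 | 12.2 | 11.7 | 10.0 |
| vi | 26.9 | 20.4 | 19.2 | 17.9 | 21.9 | 18.4 | 18.4 | 12.1 | 16.0 | 17.9 | 17.2 | 22.2 |
| zh | 22.3 | 25.0 | 26.8 | 21.6 | 27.9 | 22.5 | 21.8 | 12.1 | 13.5 | 27.8 | 29.3 | 29.4 |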
Table 8. Average popularity metric $t_p$ in articles for each topic in 44 Wikipedia language editions. Source: own calculations.
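| Lang. | Album | Comp. | Film | Footb. | Music. | Office | Person | Settl. | Taxobox | Telev. | Univ. | Videog. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ar | 940.6 | 1842.0 | 1578.2 | 328.0 | 2015.0 | 2852.0 | 1294.5 | 339.6 | 383.3 | 1997.4 | 1102.7 | 509.7 |
| az | 503.5 | 466.0 | 130.6 | 122.3 | 464.8 | 511.8 | 319.9 | 148.8 | 152.5 | 212.0 | 364.9 | 126.3 |
| be | 111.0 | 218.2 | 97.9 | 40.0 | 121.0 | 106.1 | 124.0 | 33.4 | 87.6 | 150.2 | 112.8 | 78.4 |
| bg | 162.9 | 904.3 | 483.3 | 376.9 | 1146.4 | 1419.1 | 613.5 | 247.7 | 168.3 | 1306.2 | 569.0 | 582.7 |
| ca | 94.1 | 342.7 | 115.5 | 66.5 | 396.0 | 355.1 | 130.5 | 109.1 | 136.9 | 545.4 | 204.2 | 97.9 |
| cs | 275.3 | 1603.8 | 850.3 | 350.9 | 1542.8 | 3414.3 | 1130.7 | 802.4 | 1910.9 | 2246.4 | 992.1 | 1352.9 |
| da | 208.3 | 920.0 | 223.5 | 326.5 | 856.2 | 1337.8 | 750.7 | 523.1 | 613.3 | 1496.3 | 312.5 | 453.5 |
| de | 2609.7 | 5147.7 | 6075.4 | 1263.6 | 11,532.1 | 12,524.1 | 6267.5 | 4579.8 | 2929.9 | 15,321.5 | 2551.3 | 6210.1 |
| el | 276.7 | 1539.8 | 1143.6 | 918.0 | 1796.0 | 1563.8 | 971.2 | 1114.3 | 2121.2 | 3287.0 | 1129.7 | 595.0 |
| en | 11,111.2 | 14,451.0 | 16,943.0 | 3250.7 | 18,625.7 | 9016.0 | 14,687.2 | 2491.9 | 2235.3 | 26,019.4 | 7132.1 | 21,296.7 |
| es | 3495.3 | 7508.5 | 7622.6 | 3110.3 | 7905.3 | 5143.7 | 4634.8 | 1369.0 | 1014.3 | 10,001.4 | 3242.6 | 5122.7 |
| et | 130.2 | 408.5 | 214.8 | 115.5 | 462.0 | 343.2 | 312.6 | 228.4 | 474.5 | 535.4 | 231.3 | 243.1 |
| fa | 869.7 | 1154.8 | 949.2 | 290.8 | 845.3 | 1510.3 | 801.7 | 120.0 | 347.9 | 1298.4 | 1297.1 | 554.7 |
| fi | 371.6 | 964.7 | 793.9 | 172.0 | 1044.6 | 803.6 | 561.0 | 572.3 | 659.4 | 1327.2 | 482.2 | 609.7 |
| fr | 2446.7 | 3997.3 | 3541.0 | 1457.4 | 4577.6 | 3759.7 | 3223.2 | 1041.3 | 872.6 | 9042.3 | 2020.9 | 1824.7 |
| gl | 33.5 | 159.0 | 59.2 | 48.7 | 102.5 | 98.5 | 96.4 | 103.1 | 133.9 | 105.0 | 128.9 | 61.8 |
| he | 920.1 | 1461.4 | 1438.0 | 545.9 | 1198.3 | 942.4 | 897.5 | 1098.1 | 1089.2 | 2312.9 | 861.4 | 1020.5 |
| hi | 265.0 | 681.2 | 120.4 | 614.5 | 505.8 | 569.5 | 961.8 | 315.6 | 1255.8 | 228.2 | 239.1 | 174.2 |
| hr | 247.7 | 985.7 | 623.0 | 424.4 | 1087.1 | 899.8 | 710.8 | 329.8 | 700.2 | 1186.9 | 405.3 | 557.9 |
| hu | 522.1 | 1374.7 | 1473.9 | 264.4 | 1617.8 | 1326.3 | 975.0 | 222.8 | 622.7 | 1963.6 | 1655.3 | 1112.1 |
| hy | 74.7 | 256.9 | 119.4 | 98.2 | 178.4 | 286.6 | 205.9 | 25.7 | 353.0 | 232.5 | 252.1 | 151.7 |
| id | 489.3 | 1472.4 | 621.0 | 181.7 | 1382.2 | 1148.6 | 718.8 | 204.4 | 105.5 | 920.1 | 1484.2 | 833.1 |
| it | 1352.8 | 2849.5 | 2585.8 | 1352.4 | 3431.3 | 2137.4 | 1724.3 | 639.5 | 1008.2 | 7068.9 | 1565.2 | 1847.2 |
| ja | 4217.1 | 6841.4 | 10,135.9 | 2112.0 | 11,154.6 | 7079.9 | 7882.0 | 2509.4 | 6141.3 | 25,687.4 | 6324.1 | 8822.1 |
| ka | 63.4 | 555.3 | 230.2 | 195.4 | 404.8 | 485.1 | 470.6 | 108.3 | 154.8 | 274.8 | 395.4 | 218.3 |
| ko | 802.7 | 1617.6 | 564.9 | 334.7 | 1224.4 | 1370.5 | 878.3 | 385.6 | 369.2 | 1762.8 | 1121.2 | 862.5 |
| lt | 141.0 | 510.3 | 210.4 | 97.8 | 432.8 | 593.8 | 491.1 | 228.7 | 460.0 | 547.5 | 393.1 | 307.3 |
| no | 190.4 | 525.6 | 372.7 | 241.4 | 606.4 | 466.3 | 270.2 | 293.4 | 226.5 | 931.1 | 223.3 | 354.7 |
| pl | 922.1 | 3305.7 | 1765.5 | 714.8 | 3805.7 | 2328.9 | 1753.5 | 485.6 | 1410.6 | 3654.2 | 1151.9 | 2152.0 |
| pt | 1348.1 | 3011.0 | 2071.7 | 1959.6 | 3637.2 | 2786.6 | 2283.5 | 549.8 | 412.6 | 4593.0 | 1601.3 | 2611.7 |
| ro | 280.3 | 880.6 | 499.7 | 432.1 | 1209.5 | 1007.2 | 781.0 | 99.8 | 180.8 | 996.2 | 543.9 | 747.3 |
| ru | 7657.9 | 7968.8 | 12,011.1 | 2904.5 | 8646.7 | 5561.7 | 6182.5 | 1464.4 | 3507.1 | 21,073.4 | 4641.4 | 17,428.5 |
| sh | 170.3 | 494.3 | 105.2 | 144.5 | 567.0 | 244.5 | 264.2 | 32.2 | 437.7 | 268.4 | 108.8 | 282.8 |
| simple | 139.8 | 486.4 | 187.5 | 79.6 | 249.7 | 492.9 | 321.8 | 143.4 | 775.0 | 187.3 | 186.0 | 178.5 |
| sl | 133.8 | 460.6 | 376.7 | 131.2 | 644.9 | 353.3 | 234.7 | 128.1 | 888.9 | 717.0 | 241.5 | 322.5 |
| sr | 449.2 | 806.9 | 582.9 | 562.7 | 1391.9 | 1102.1 | 840.3 | 114.7 | 321.4 | 1358.4 | 513.6 | 689.4 |
| ta | 111.4 | 262.0 | 64.8 | 67.0 | 186.5 | 307.9 | 302.2 | 122.7 | 285.6 | 130.7 | 91.5 | 158.7 |
| th | 780.4 | 2827.1 | 1651.4 | 1209.8 | 3302.6 | 2371.8 | 2277.0 | 2077.7 | 1624.1 | 3554.5 | 4558.2 | 938.1 |
| tr | 846.2 | 2424.1 | 1960.9 | 678.7 | 2135.5 | 2785.6 | 2102.8 | 464.7 | 1372.1 | 3826.7 | 1714.1 | 2111.2 |
| uk | 271.6 | 800.8 | 309.5 | 141.2 | 703.7 | 558.1 | 420.2 | 124.3 | 378.0 | 897.6 | 695.6 | 554.0 |
| ur | 70.7 | 177.2 | 57.0 | 69.4 | 135.7 | 218.7 | 146.4 | 16.8 | 214.7 | 81.0 | 68.3 | 96.5 |
| uz | 105.4 | 408.9 | 119.1 | 112.6 | 152.5 | 270.0 | 197.0 | 21.9 | 155.2 | 167.8 | 182.1 | 205.7 |
| vi | 794.2 | 2695.1 | 1531.6 | 1004.7 | 3342.2 | 2050.4 | 1686.5 | 72.9 | 14.3 | 2080.0 | 1798.1 | 1149.5 |
| zh | 8591.2 | 5689.7 | 13,477.2 | 600.2 | 17,115.2 | 4524.9 | 5499.5 | 361.8 | 495.2 | 17,052.6 | 3535.2 | 8218.4 |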
Table 9. Top 25 language–topic pairs with share of articles that have the highest quality in comparison to other languages (articles with at least three language versions were considered). Source: own calculations.
| Lang.–Topic | Share of Art. |
|---|---|
| en–Videogame | 60.5% |
| en–Album | 55.5% |
| en–Company | 49.7% |
| en–Musical artist | 49.0% |
| en–Television | 47.8% |
| en–Film | 43.7% |
| en–University | 43.5% |
| en–Officeholder | 39.3% |
| en–Person | 38.7% |
| en–Football | 29.1% |
| en–Taxobox | 27.1% |
| en–Settlement | 21.2% |
| it–Football | 18.1% |
| uk–Settlement | 15.1% |
| de–Film | 14.7% |
| es–Taxobox | 13.9% |
| vi–Taxobox | 13.5% |
| de–Company | 13.3% |
| fr–Settlement | 11.5% |
| zh–University | 10.9% |
| zh–Television | 10.5% |
| it–Person | 10.0% |
| de–Football | 9.1% |
| pl–Settlement | 8.8% |
| de–Taxobox | 8.0% |
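Tables 9 and 10 rank language–topic pairs by the share of articles whose version in a given language scores higher than every other language version of the same article. The count can be sketched as below, under assumptions that this section does not state: scores are already computed per language version, articles are matched across languages by a shared identifier (toy keys such as `Q1` here), tied maxima credit every tied language, and the denominator is the number of qualifying articles that exist in that language. `best_language_shares` is a hypothetical helper, not the authors' code.

```python
from collections import Counter
from typing import Dict, Tuple

# Hypothetical layout: article identifier -> {language code: score},
# where the score is a quality (Table 9) or popularity (Table 10) value.
ArticleScores = Dict[str, Dict[str, float]]


def best_language_shares(
    articles: ArticleScores,
    topic_of: Dict[str, str],
    min_languages: int = 3,
) -> Dict[Tuple[str, str], float]:
    """For each (language, topic) pair, return the share of that language's
    articles whose version has the highest score among all language versions.

    Assumptions (not taken from the paper): tied maxima credit every tied
    language, and only articles with at least `min_languages` versions count.
    """
    wins: Counter = Counter()
    present: Counter = Counter()
    for item, by_lang in articles.items():
        if len(by_lang) < min_languages:
            continue  # skip articles with too few language versions
        topic = topic_of[item]
        top = max(by_lang.values())
        for lang, score in by_lang.items():
            present[(lang, topic)] += 1
            if score == top:
                wins[(lang, topic)] += 1
    # Counter returns 0 for languages that never had the top score.
    return {key: wins[key] / total for key, total in present.items()}
```

Passing popularity values instead of quality scores yields the Table 10 variant of the same statistic; sorting the resulting dictionary by value gives a "top 25" list of the kind shown above.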
Table 10. Top 25 language versions and topics with share of articles that have the highest popularity in comparison to other languages (articles with at least three language versions were considered). Source: own calculations.
| Lang.–Topic | Share of Art. |
|---|---|
| en–Album | 85.8% |
| en–Videogame | 85.7% |
| en–Taxobox | 73.6% |
| en–Film | 73.3% |
| en–Musical artist | 67.6% |
| en–Company | 66.3% |
| en–Television | 64.4% |
| en–Person | 62.4% |
| en–Football | 55.2% |
| en–Officeholder | 54.4% |
| en–University | 50.8% |
| en–Settlement | 39.1% |
| ru–Settlement | 16.9% |
| ru–Officeholder | 14.2% |
| fr–Settlement | 11.3% |
| ru–Football | 9.6% |
| es–Television | 9.5% |
| ja–University | 9.3% |
| es–Football | 8.9% |
| ru–Person | 8.4% |
| ja–Television | 8.0% |
| ja–Company | 7.7% |
| ru–University | 7.6% |
| ja–Videogame | 7.5% |
| es–Officeholder | 6.9% |
Table 11. Top 25 language versions and topics with the highest phi coefficients between articles with the highest quality and popularity (articles with at least three language versions were considered). Source: own calculations.
| Lang.–Topic | Correlation Coeff. |
|---|---|
| th–University | 0.838 |
| th–Officeholder | 0.762 |
| vi–University | 0.719 |
| pl–University | 0.717 |
| pt–University | 0.707 |
| id–University | 0.705 |
| th–Musical artist | 0.684 |
| es–University | 0.683 |
| tr–University | 0.677 |
| en–University | 0.676 |
| id–Settlement | 0.656 |
| fa–Television | 0.655 |
| et–Television | 0.65 |
| sl–Film | 0.642 |
| cs–University | 0.64 |
| ja–Company | 0.637 |
| fr–University | 0.636 |
| pt–Television | 0.632 |
| en–Television | 0.63 |
| en–Company | 0.625 |
| zh–Officeholder | 0.615 |
| ja–University | 0.614 |
| vi–Musical artist | 0.607 |
| bg–University | 0.603 |
| bg–Television | 0.602 |
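Table 11 reports phi coefficients, which measure the association between two binary variables. A minimal sketch, assuming that for one language–topic pair each qualifying article (at least three language versions) contributes two booleans: whether this language version has the highest quality, and whether it has the highest popularity. From the 2×2 counts $n_{11}, n_{10}, n_{01}, n_{00}$ the coefficient is $\phi = (n_{11} n_{00} - n_{10} n_{01}) / \sqrt{n_{1\cdot} n_{0\cdot} n_{\cdot 1} n_{\cdot 0}}$, which equals the Pearson correlation of the two 0/1 variables. The function name and input layout are illustrative assumptions.

```python
import math
from typing import Sequence


def phi_coefficient(x: Sequence[bool], y: Sequence[bool]) -> float:
    """Phi coefficient of two binary variables observed on the same articles.

    x[i]: language version i has the highest quality (assumed encoding)
    y[i]: language version i has the highest popularity (assumed encoding)
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    # Cells of the 2x2 contingency table.
    n11 = sum(1 for a, b in zip(x, y) if a and b)
    n10 = sum(1 for a, b in zip(x, y) if a and not b)
    n01 = sum(1 for a, b in zip(x, y) if not a and b)
    n00 = sum(1 for a, b in zip(x, y) if not a and not b)
    # Product of the row and column marginals.
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return float("nan")  # undefined when either variable is constant
    return (n11 * n00 - n10 * n01) / denom
```

A value near 1, as for th–University, means the language version that is best in quality is usually also the most viewed one; values near 0 indicate that the two "best" labels are unrelated.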
Table 12. Spearman’s rank correlation coefficients for shares of the articles of the highest quality and popularity, on various topics. Source: own calculation using [23].
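| Topic | Spearman’s Rank Cor. Coef. | Two-Sided p-Value |
|---|---|---|
| Album | 0.7227 | $3.05 \times 10^{-8}$ |
| Company | 0.8749 | $8.29 \times 10^{-15}$ |
| Film | 0.6408 | $2.80 \times 10^{-6}$ |
| Football biography | 0.7872 | $2.33 \times 10^{-10}$ |
| Musical artist | 0.8453 | $5.27 \times 10^{-13}$ |
| Officeholder | 0.7665 | $1.32 \times 10^{-9}$ |
| Person | 0.8370 | $1.45 \times 10^{-12}$ |
| Settlement | 0.6146 | $9.09 \times 10^{-6}$ |
| Taxobox | 0.6997 | $1.26 \times 10^{-7}$ |
| Television | 0.7950 | $1.15 \times 10^{-10}$ |
| University | 0.8362 | $1.60 \times 10^{-12}$ |
| Videogame | 0.7436 | $7.35 \times 10^{-9}$ |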

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Informatics EISSN 2227-9709 Published by MDPI AG, Basel, Switzerland