Open Access Article

Analysis of Data Persistence in Collaborative Content Creation Systems: The Wikipedia Case

Department of Electronic Engineering, University of Rome “Tor Vergata”, Via Cracovia, 00133 Rome, Italy
* Author to whom correspondence should be addressed.
Information 2019, 10(11), 330; https://doi.org/10.3390/info10110330
Received: 9 September 2019 / Revised: 16 October 2019 / Accepted: 23 October 2019 / Published: 25 October 2019
(This article belongs to the Special Issue Knowledge Discovery on the Web)
A common problem in designing caching/prefetching systems, distribution networks, search engines, and web crawlers is determining how long a given piece of content lasts before being updated, i.e., its update frequency. While some content is rarely updated (e.g., videos), in other cases revisions periodically invalidate it. In this work, we present an analysis of Wikipedia, currently the 5th most visited website in the world, evaluating the statistics of updates of its pages and their relationship with page view statistics. We discovered that the number of updates of a page follows a lognormal distribution. We provide fitting parameters as well as a goodness-of-fit analysis, showing that the model describes the empirical data with statistical significance. We also analyze the views–updates relationship, showing that over a period of one month there is no evident correlation between the most updated pages and the most viewed pages. However, observing specific pages, we show that there is a strong correlation between the peaks of views and the peaks of updates, and we find that in more than 50% of cases the time difference between the two peaks is less than a week. This reflects an underlying process whereby an event causes both an update peak and a visit peak, which occur with different time delays. This behavior can pave the way for predictive traffic analysis applications based on content update statistics. Finally, we show how the model can be used to evaluate the performance of an in-network caching scenario.
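As a rough illustration of the fitting step summarized above, the sketch below fits a lognormal distribution by maximum likelihood and computes a Kolmogorov-Smirnov distance as a simple goodness-of-fit indicator. It is not the authors' procedure: the data are synthetic stand-ins for per-page update counts, the parameters 1.5 and 1.0 are arbitrary, and the names `fit_lognormal` and `ks_statistic` are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def fit_lognormal(counts):
    """Maximum-likelihood lognormal fit: mean and std of log(counts)."""
    logs = [math.log(c) for c in counts]
    return fmean(logs), pstdev(logs)


def ks_statistic(counts, mu, sigma):
    """Largest gap between the empirical CDF and the fitted lognormal CDF."""
    model = NormalDist(mu, sigma)  # log(X) is normal when X is lognormal
    xs = sorted(counts)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = model.cdf(math.log(x))
        # Compare the model CDF with the empirical step just after and just before x.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


# Synthetic sample (an assumption for illustration, not Wikipedia data).
rng = random.Random(0)
updates = [rng.lognormvariate(1.5, 1.0) for _ in range(5000)]

mu_hat, sigma_hat = fit_lognormal(updates)
ks = ks_statistic(updates, mu_hat, sigma_hat)
print(f"mu={mu_hat:.3f} sigma={sigma_hat:.3f} KS={ks:.4f}")
```

A small KS distance suggests that the fitted curve tracks the empirical distribution closely. Because the parameters are estimated from the same sample, a formal significance test would need corrected critical values or a bootstrap.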
Keywords: Wikipedia; real-data statistics; update statistics; popularity; caching; content revisions
MDPI and ACS Style

Bracciale, L.; Loreti, P.; Detti, A.; Blefari Melazzi, N. Analysis of Data Persistence in Collaborative Content Creation Systems: The Wikipedia Case. Information 2019, 10, 330.

