# Webometrics: Some Critical Issues of WWW Size Estimation Methods

## Abstract

## 1. Introduction

## 2. Contributions to Webometrics Study

#### 2.1. On Webometrics

#### 2.2. Study of Overlap

#### 2.3. Graph Nature of the World Wide Web

#### 2.4. Diameter of the Web Graph

#### 2.5. Experiments on Closed Environment vs. the Web

## 3. Estimating the Size of the Indexed Web

#### 3.1. Search Engines and WWW Size Estimation

#### 3.2. Methods Surveyed

#### 3.2.1. Statistical Approach Using Web Page Sampling

#### 3.2.2. Updated Experiment Setting

#### 3.2.3. Size Estimation through Quadrat Sampling

#### 3.2.4. Size Estimation through Extrapolation

#### 3.3. Index Stability

## 4. Discussion and Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## Abbreviations

Abbreviation | Meaning |
---|---|
WWW | World Wide Web |
W3C | World Wide Web Consortium |
IETF | Internet Engineering Task Force |
AFRINIC | African Network Information Centre |
DMOZ | directory.mozilla.org |

## References

- Rhodenizer, D.; Trudel, A. How Big is the World Wide Web? ICWI: Kingston, Jamaica, 2002; pp. 176–183.
- Van den Bosch, A.; Bogers, T.; De Kunder, M. Estimating search engine index size variability: A 9-year longitudinal study. Scientometrics **2016**, 107, 839–856.
- Broder, A.; Kumar, R.; Maghoul, F.; Raghavan, P.; Rajagopalan, S.; Stata, R.; Tomkins, A.; Wiener, J. Graph structure in the web. Comput. Netw. **2000**, 33, 309–320.
- Gulli, A.; Signorini, A. Building an open source meta-search engine. In Proceedings of the Special Interest Tracks and Posters of the 14th International Conference on World Wide Web, Chiba, Japan, 10–14 May 2005; ACM: New York, NY, USA, 2005; pp. 1004–1005.
- Brin, S.; Page, L. Reprint of: The anatomy of a large-scale hypertextual web search engine. Comput. Netw. **2012**, 56, 3825–3833.
- Björneborn, L.; Ingwersen, P. Toward a basic framework for webometrics. J. Assoc. Inf. Sci. Technol. **2004**, 55, 1216–1227.
- Björneborn, L.; Ingwersen, P. Perspective of webometrics. Scientometrics **2001**, 50, 65–82.
- Spink, A.; Jansen, B.J.; Blakely, C.; Koshman, S. A study of results overlap and uniqueness among major web search engines. Inf. Process. Manag. **2006**, 42, 1379–1391.
- Taneja, H. Mapping an audience-centric World Wide Web: A departure from hyperlink analysis. New Media Soc. **2016**, 19, 1331–1348.
- Bharat, K.; Broder, A. A technique for measuring the relative size and overlap of public web search engines. Comput. Netw. ISDN Syst. **1998**, 30, 379–388.
- Kleinberg, J.; Kumar, R.; Raghavan, P.; Rajagopalan, S.; Tomkins, A. The web as a graph: Measurements, models, and methods. In Proceedings of the International Computing and Combinatorics Conference, Tokyo, Japan, 26–28 July 1999; pp. 1–17.
- Albert, R.; Jeong, H.; Barabási, A.L. Internet: Diameter of the world-wide web. Nature **1999**, 401, 130–131.
- Orduña-Malea, E.; Ayllón, J.M.; Martín-Martín, A.; López-Cózar, E.D. Methods for estimating the size of Google Scholar. Scientometrics **2015**, 104, 931–949.
- Khabsa, M.; Giles, C.L. The number of scholarly documents on the public web. PLoS ONE **2014**, 9, e93949.
- Greenfield, D.N.; Davis, R.A. Lost in cyberspace: The web@work. CyberPsychol. Behav. **2002**, 5, 347–353.
- Bar-Ilan, J. Search engine results over time: A case study on search engine stability. Cybermetrics **1999**, 2, 1–16.
- Spink, A.; Jansen, B.J.; Kathuria, V.; Koshman, S. Overlap among major web search engines. Internet Res. **2006**, 16, 419–426.
- Gulli, A.; Signorini, A. The indexable web is more than 11.5 billion pages. In Proceedings of the Special Interest Tracks and Posters of the 14th International Conference on World Wide Web, Chiba, Japan, 10–14 May 2005; ACM: New York, NY, USA, 2005; pp. 902–903.
- Xing, S.; Paris, B.P. Measuring the size of the Internet via importance sampling. IEEE J. Sel. Areas Commun. **2003**, 21, 922–933.
- Lewandowski, D.; Wahlig, H.; Meyer-Bautor, G. The freshness of web search engine databases. J. Inf. Sci. **2006**, 32, 131–148.
- Lewandowski, D. A three-year study on the freshness of web search engine databases. J. Inf. Sci. **2008**, 34, 817–831.
- Rosenbaum, P.R.; Rubin, D.B. The central role of the propensity score in observational studies for causal effects. Biometrika **1983**, 70, 41–55.

**Figure 2.** Connectivity of the web, as in [3].

**Figure 4.** Size of static web pages in 1997, as in [10].

**Figure 5.** Estimated index sizes of Google and Bing, as in [2].

Website | Information Provided |
---|---|
WorldWideWebSize | Daily estimates of Google and Bing index sizes |
InternetLiveStats | Live updates of a variety of statistics on things connected to the internet |
InternetWorldStats | Statistics on worldwide internet usage |
Statista | A variety of statistics on the online and offline world of the consumer |
Alexa | Commercial web traffic data and analytics |
The Internet Map | Attempts to display the web as a map |
httpArchive | Statistics on how data is constructed and served on the internet |
Netcraft | Research data and analysis on many aspects of the internet |

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Mohana Arunachalam, S.; Koumpis, A.; Handschuh, S.
Webometrics: Some Critical Issues of WWW Size Estimation Methods. *Multimodal Technol. Interact.* **2018**, *2*, 12.
https://doi.org/10.3390/mti2020012
