Four Datasets Derived from an Archive of Personal Homepages (1995–2009)
Abstract
While data from social media are easily accessible, understanding how individuals expressed themselves on the Internet in its initial years of public availability (the mid-to-late 1990s) has proved difficult. In this data deposit, I describe how archival data from Geocities homepages were retrieved, processed to remove non-text data, and then refined into separate datasets, each of which provides unique insights into modes of personal expression on the early Internet. The four datasets, all derived from a larger collection of personal websites, are: (1) a large corpus of raw text data from Geocities personal homepages, (2) a linguistic analysis of basic psychological properties of the same Geocities pages, using an open-source implementation of the Linguistic Inquiry and Word Count (LIWC), (3) a dataset of links between homepages (suitable for network analysis), and (4) a manifest dataset summarizing the size and last update date of each file in the collection. Data from over 378,000 Geocities pages are included. In addition to providing a detailed description of how these datasets were created, I describe how they might be used in future research.
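As a rough illustration of the kind of processing the abstract describes (removing non-text data from a homepage and recording its outgoing links), the following is a minimal sketch using only the Python standard library. It is not the pipeline used to build the datasets; the names `PageExtractor` and `extract_page`, the choice of which tags to skip, and the sample page are all hypothetical.

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collect visible text and outgoing link targets from one HTML page.

    Hypothetical helper for illustration only; not the author's code.
    """

    # Assumption: script and style contents count as "non-text data".
    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.text_parts = []   # visible text fragments, in document order
        self.links = []        # href values of <a> tags, in document order
        self._skip_depth = 0   # > 0 while inside a skipped tag

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped tags.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract_page(html):
    """Return (plain_text, links) for a single HTML document string."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.links


if __name__ == "__main__":
    # Made-up sample page; the link path is illustrative only.
    sample = (
        "<html><head><style>b{color:red}</style></head><body>"
        "<h1>My Page</h1><script>var x=1;</script>"
        '<p>Welcome! <a href="/Area51/1234/">a friend</a></p>'
        "</body></html>"
    )
    text, links = extract_page(sample)
    print(text)
    print(links)
```

In a sketch like this, the text output would feed a corpus or word-count analysis, while the (page, link) pairs would form the edge list for a network analysis.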
Rife, S.C. Four Datasets Derived from an Archive of Personal Homepages (1995–2009). Data 2017, 2(2), 19.