Open Access Data Descriptor
Data 2017, 2(2), 19; doi:10.3390/data2020019

Four Datasets Derived from an Archive of Personal Homepages (1995–2009)

Department of Psychology, Murray State University, Murray, KY 42071, USA
Academic Editor: Xinyue Ye
Received: 31 December 2016 / Revised: 25 May 2017 / Accepted: 8 June 2017 / Published: 13 June 2017

Abstract

While data from social media are easily accessible, understanding how individuals expressed themselves on the Internet in its initial years of public availability (the mid-to-late 1990s) has proved difficult. In this data deposit, I describe how archival data from Geocities homepages were retrieved and processed to remove non-text data, then further refined to create separate datasets, each of which provides unique insights into modes of personal expression on the early Internet. The present paper describes four datasets, all derived from a larger collection of personal websites: (1) a large corpus of raw text data from Geocities personal homepages, (2) a linguistic analysis of basic psychological properties of the same Geocities pages, using an open-source implementation of the Linguistic Inquiry Word Count (LIWC), (3) a dataset of links between homepages (suitable for network analysis), and (4) a manifest dataset summarizing the size and last update date for each file in the dataset. Data from over 378,000 Geocities pages are included. In addition to providing a detailed description of how these datasets were created, I describe how they might be used in future research.
Keywords: Internet; linguistics; online culture; Linguistic Inquiry Word Count (LIWC); corpora; homepages; cyberpsychology; network analysis
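The abstract mentions two processing steps, removing non-text data from archived pages and recording links between homepages. The following is a minimal illustrative sketch of what such a step could look like, not the procedure used to build the published datasets. The names `PageExtractor` and `extract_page` are hypothetical, and the choice of Python's standard-library HTML parser is an assumption made for the example.

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collects visible text and outgoing links from one HTML page.

    Hypothetical helper for illustration; not part of the published datasets.
    """

    # Content inside these tags is code or styling, not page text.
    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.text_parts = []   # visible text fragments, in document order
        self.links = []        # href values of <a> tags, in document order
        self._skip_depth = 0   # > 0 while inside a skipped tag

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract_page(html):
    """Return (plain_text, links) for a single HTML document string."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.links
```

Applied to every page in an archive, the text output would correspond roughly to a raw text corpus and the link output to an edge list for network analysis. Real archived pages also raise character-encoding and malformed-markup issues that this sketch does not address.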
This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


MDPI and ACS Style

Rife, S.C. Four Datasets Derived from an Archive of Personal Homepages (1995–2009). Data 2017, 2, 19.


Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.

Data, EISSN 2306-5729, published by MDPI AG, Basel, Switzerland