Abstract
New health-related concepts, terms, and topics emerge, and the meanings of existing terms and topics keep changing. This study investigated and explored the evolutions of the women’s health topic on Wikipedia. The creation time, page views data, page edits data, and text of historical versions of 207 women-health-related entries from 2010 to 2017 on Wikipedia were collected. Coding, subject analysis, descriptive and inferential statistical analysis, and Self-Organizing Map and n-gram approaches were employed to explore the characteristics and evolutions of the entries for the women’s health topic. The results show that the number of the women-health-related entries kept increasing from 2010 to 2017, and nearly half of them were related to the supports and protection of women’s health. The total number of page views of the investigated items increased from 2011 to 2013, but it decreased from 2013 to 2017, while the total number of page edits stayed stable from 2010 to 2017. Growing subjects were found during the investigated period, such as abuse and violence, and family planning and reproduction. However, the entries related to the economy and politics were diminishing. There was no association between the internal characteristic evolution and the external popularity evolution of the women’s health topic.
1. Introduction
With the development of computer technology and Internet technology, the volume of health information keeps increasing on the Internet. According to Tu’s report, the proportion of people among all the consumers who sought health information online increased from 15.9% to 32.6% from 2001 to 2010 [1]. A survey in 2013 reported that 87% of USA adults use the Internet and among them, 72% stated that they sought health information online during the past year [2].
The emergence of Web 2.0 advocated the creation of social media, which changed the method of communication between health organizations, health consumers, patients, and health professionals [3]. Dawson reported that 81% of European consumers and 63% of USA consumers trust the health information on social media applications [4]. Marar, Al-Madaney, and Almousawi found that 85% of patients and their companions sought health information via social media [5]. These statistics reveal that social media has been recognized as an important channel for seeking health information in recent years.
The health-related information on social media covers a wide range of topics, including diseases and treatments, nutrition, health care and insurance, and healthy lifestyle [6]. The women’s health topic is a main category among the various health-related topics on social media. Women usually face special health risks, such as pregnancy, cervical and breast cancer, and accompanying physical and psychological issues [7,8,9]. Social media are currently utilized as important platforms by women to gain and share specific women-health-related information, to strengthen social supports by building social connections with others, and to self-manage health care [10,11,12,13].
As the volume of health information increases rapidly on social media, new terms, concepts, and topics emerge, which cause problems in health information seeking from both system and users’ perspectives. Therefore, it is necessary to explore the temporal features of health-related concepts, terms, and topics on social media. Wikipedia is one of the largest social media platforms consisting of user-generated articles and the semantic relations between them [14,15]. Wikimedia Downloads stores all the historical versions and data of entries, including editors, viewers, content, interactions, and temporal data [16]. All of its data are open to the public. Hence, to investigate the temporal features of the women’s health topic on social media, Wikipedia is a proper data source. However, few studies have focused on the women’s health user-generated content on Wikipedia, and the temporal characteristics of the women’s health topic on social media have not been adequately investigated, either. This study aims at discovering the themes, subjects, and entries related to women’s health on Wikipedia and the relations among them and exploring how the women’s health topic evolved from 2010 to 2017 on Wikipedia on both internal and external aspects.
2. Research Problem and Questions
The research problem of this study is to investigate and discover the evolution of the women’s health topic derived from the social media website Wikipedia. The research questions are as follows:
- RQ1: What are the emergent themes and subjects of the women’s health topic on Wikipedia and the relations among them?
- RQ2: How does the women’s health topic evolve on Wikipedia in terms of its internal characteristics and external popularity during the investigated time periods?
In this study, a theme of the topic means a specific and distinctive concern of a group of Wikipedia entries. The entries collected can be assigned to several categories, and every category has its own theme. A subject of the topic means the focus of an entry. Every Wikipedia entry can have one or more subjects. The structure of entries, themes, and subjects for the Women’s Health topic in a specific time period is shown in Figure 1.
Figure 1.
Concept Map.
The internal characteristics of a specific topic in different time periods show the emergences, growths, and disappearances of entries, subjects, and terms in each theme. The external popularity of an entry was measured by its number of page edits and number of page views. The number of page edits reflects the popularity of an entry among the Wikipedia editors, and the number of page views reflects the popularity of an entry among the Wikipedia viewers. This study explored the internal characteristics and external popularities of the women’s health topic from 2010 to 2017. Four periods were defined: 2010 to 2011 (Period 1), 2012 to 2013 (Period 2), 2014 to 2015 (Period 3), and 2016 to 2017 (Period 4).
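The period definitions above can be expressed as a small lookup. The following is a minimal sketch, assuming that period totals are simple sums of yearly counts; the names `PERIODS`, `period_of`, and `totals_by_period` are hypothetical and do not come from the study.

```python
# Two-year periods as defined in the text.
PERIODS = {
    1: (2010, 2011),
    2: (2012, 2013),
    3: (2014, 2015),
    4: (2016, 2017),
}

def period_of(year):
    """Return the period number for a year, or None if outside 2010-2017."""
    for number, (start, end) in PERIODS.items():
        if start <= year <= end:
            return number
    return None

def totals_by_period(yearly_counts):
    """Sum yearly counts (page views or page edits) into the four periods.

    Assumption: a period total is the plain sum of its two yearly counts.
    """
    totals = {number: 0 for number in PERIODS}
    for year, count in yearly_counts.items():
        number = period_of(year)
        if number is not None:
            totals[number] += count
    return totals
```

For example, `totals_by_period({2010: 5, 2011: 7})` assigns both years to Period 1.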
3. Materials and Methods
3.1. Data Collection
The data collection processes of this study contain entries collection, text collection, and page views and edits data collection from Wikipedia. Figure 2 illustrates the entire data collection processes.
Figure 2.
Data Collection Processes.
Wikipedia is regarded as a social media platform and its content is open to the public. It does not include any private data of editors and users. This study does not involve any human subjects and the topic is not sensitive. Therefore, it was exempt from ethics approval.
3.1.1. Entries Collection
To explore the evolution of the women’s health topic on social media, Wikipedia was selected as the data source in this study. Milne and Witten argued that Wikipedia is a rapidly growing platform containing vast interlinked information [15]. The richness of its content makes it an important resource for knowledge sharing and citation, and even for research. The history of each entry on Wikipedia is accessible to users, which means that all the historical versions of the entry are recorded by Wikipedia and could be viewed and collected by researchers.
Two methods were applied to retrieve entries related to women’s health on Wikipedia. For the first method, the entries in the “See also” section of the women’s health entry and the entries in the “See also” sections of these related entries were regarded as the associated entries, since the “See also” section of an entry contains the relevant entries selected by editors. For the second method, the term “women’s health” was used as the search term to retrieve related entries on Wikipedia. The search results returned were ranked by relevance, and the top 100 search results returned were examined by the researcher. The associated entries, which were not the same as the entries obtained by the first method, were collected.
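The two retrieval methods can be sketched as follows. This is an illustrative outline only: it assumes the “See also” links are already available as a dictionary and that the search results passed in have already been screened for relevance by the researcher. The function name `collect_entries` and its parameters are hypothetical.

```python
def collect_entries(see_also, seed, search_results, top_n=100):
    """Combine two-hop "See also" neighbours with screened search results.

    see_also: dict mapping an entry title to the titles in its "See also" section.
    seed: the starting entry title.
    search_results: titles ranked by relevance (assumed already judged relevant).
    """
    first_hop = list(see_also.get(seed, []))
    collected = []
    seen = {seed}
    # Method 1: entries in the seed's "See also" section, then in theirs.
    for title in first_hop:
        if title not in seen:
            seen.add(title)
            collected.append(title)
    for title in first_hop:
        for neighbour in see_also.get(title, []):
            if neighbour not in seen:
                seen.add(neighbour)
                collected.append(neighbour)
    # Method 2: keep only the top search hits not already found by method 1.
    for title in search_results[:top_n]:
        if title not in seen:
            seen.add(title)
            collected.append(title)
    return collected
```

The manual relevance check of the top 100 search results is not modelled here; it remains a human judgment.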
3.1.2. Text Collection
Every historical version of a Wikipedia entry contains several sections, such as the table of contents, the main text, and the references. Although not all entries consist of the same sections, certain sections are included in almost all entries: the title, other entries associated with the entry, a short description of the entry, the contents, the main body, “See also”, and the references. The contents section holds the table of contents of an entry, the main body holds its main text, and the reference section consists of the references of an entry and their URLs.
For each of the associated entries, the text data of the current version and the last version generated in 2011, 2013, 2015, and 2017 were collected based on the time periods determined. For each historical version, the text of all the sections of each entry was collected. The WikipediR package developed by Oliver Keyes, which runs on R, was adopted for text data collection [17]. R is developed by The R Foundation for Statistical Computing, which is based in Vienna, Austria [18]. WikipediR enables the researcher to retrieve and gather the text content of an entry’s current and historical versions.
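Selecting the last version of each target year from a revision history can be sketched as below. The sketch assumes that “the last version generated in a year” means the latest revision time-stamped within that year, and that timestamps arrive as ISO 8601 strings such as `2011-12-30T08:00:00Z`. The function name and the input structure are hypothetical and independent of the WikipediR package used in the study.

```python
from datetime import datetime

TARGET_YEARS = (2011, 2013, 2015, 2017)

def last_revision_per_year(revisions, years=TARGET_YEARS):
    """Pick, for each target year, the latest revision made in that year.

    revisions: iterable of (revision_id, ISO 8601 timestamp) pairs.
    Returns a dict {year: revision_id}; years without revisions are omitted.
    """
    latest = {}
    for rev_id, stamp in revisions:
        when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
        if when.year not in years:
            continue
        if when.year not in latest or when > latest[when.year][0]:
            latest[when.year] = (when, rev_id)
    return {year: rev_id for year, (when, rev_id) in latest.items()}
```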
3.1.3. Page Views and Edits Data Collection
The page views data from 2010 to 2017 were collected from Wikimedia Downloads. This website provides the Wikipedia data dumps that store all the historical page views data of all the Wikipedia entries since January 2010. The page edits data were collected from the view history page of the associated entries using R and RStudio. The software RStudio is developed by the RStudio Team in Boston, MA, USA [19].
3.2. Data Analysis
3.2.1. Categories and Themes
The entries obtained were related to different aspects of the topic. In order to explore the relations among the entries, they were grouped into several categories in terms of their content. Since there are no existing categories of these entries, the open coding method was employed to analyze the associated entries and group them into several categories. In this study, every category had only one theme.
3.2.2. Text Data Processing
The subjects of every theme were extracted from the entries belonging to it by clustering and text mining approaches. To apply these approaches, the text data obtained for the themes were cleansed and transformed first.
The open source software R and RStudio and the tm package were adopted for text data cleansing, transformation, and processing. Punctuation, stop words, meaningless words (e.g., numbers, dates, equations, and so on), and words whose frequencies were less than 4 were removed. For each theme, a document–term matrix (Equation (1)) of the vector space model was constructed. The matrix has m rows and n columns. The value of the cell (aij) in the matrix represents the frequency of the term j in the entry i.
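The cleansing and matrix construction steps can be illustrated with a short sketch. It is not the tm-based pipeline used in the study: the stop-word list is a tiny illustrative subset, and the frequency threshold is assumed to apply to a word’s total count across all entries of a theme.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "of", "and", "a", "in", "to", "is"}  # illustrative subset only

def tokenize(text):
    """Lower-case, drop punctuation and numbers, remove stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]

def document_term_matrix(docs, min_freq=4):
    """Return (terms, matrix) where matrix[i][j] is the count of term j in entry i.

    Assumption: terms whose total frequency across docs is below
    min_freq are dropped.
    """
    counts = [Counter(tokenize(d)) for d in docs]
    totals = Counter()
    for c in counts:
        totals.update(c)
    terms = sorted(t for t, n in totals.items() if n >= min_freq)
    matrix = [[c[t] for t in terms] for c in counts]
    return terms, matrix
```

Each row of the returned matrix corresponds to one entry and each column to one retained term, matching the m-by-n layout described above.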
Then, the document–term matrices were transformed to Term Frequency × Inverse Document Frequency (TF-IDF) matrices. Each value of the TF-IDF matrices (wij) was calculated based on Equation (2). In this equation, m is the number of the entries of a matrix, and ej represents the number of the entries containing the term j in the matrix. The TF-IDF matrices obtained were the input matrices for the following clustering process.
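The weighting step can be sketched as follows. Equation (2) is not reproduced in this text, so the sketch assumes the common form w_ij = a_ij × log(m / e_j) with the natural logarithm, using the symbols defined above; the function name `tf_idf` is hypothetical.

```python
import math

def tf_idf(matrix):
    """Weight a document-term matrix as w_ij = a_ij * log(m / e_j).

    m is the number of entries (rows); e_j is the number of entries
    containing term j. The natural logarithm is an assumption.
    """
    m = len(matrix)
    n = len(matrix[0]) if matrix else 0
    # e_j: number of rows in which term j occurs at least once.
    e = [sum(1 for row in matrix if row[j] > 0) for j in range(n)]
    return [
        [row[j] * math.log(m / e[j]) if e[j] else 0.0 for j in range(n)]
        for row in matrix
    ]
```

Under this form, a term that occurs in every entry receives a weight of zero, because log(m / m) = 0.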
3.2.3. SOM Approach
To cluster the entries of the categories, the Self-Organizing Map (SOM) approach was employed in this study. It is a widely used neural network method that measures similarities among items of input data so as to form similarity graphs. The whole procedure of this approach is a recursive regression process [20]. The input matrices are the TF-IDF matrices created for the categories.
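To make the iterative nature of the approach concrete, the following is a minimal sketch of a standard online SOM training loop. It is not the configuration used in the study: the grid size, number of epochs, learning rate, Gaussian neighbourhood, and random seed are all assumptions, and the function names are hypothetical.

```python
import math
import random

def best_matching_unit(weights, x):
    """Return the (row, col) of the node whose weight vector is closest to x."""
    best, best_dist = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            dist = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best

def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols map; each step pulls the best-matching node
    and its grid neighbours toward the presented input vector."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        # Learning rate and neighbourhood radius shrink linearly over time.
        frac = 1.0 - epoch / epochs
        lr, sigma = lr0 * frac, max(sigma0 * frac, 0.5)
        for x in rng.sample(data, len(data)):
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-d2 / (2 * sigma ** 2))
                    w = weights[r][c]
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])
    return weights
```

After training, each entry is assigned to the node returned by `best_matching_unit` for its TF-IDF vector, so similar entries tend to share a node or land on nearby nodes.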
The output of the SOM approach is an output display map. Similar entries were assigned to the same node on the output display map. A U-matrix was used for projecting the clustering results to SOM displays. Every entry was projected to the SOM display as a number. Numbers with shorter distances among them were more similar than those with longer distances. Moreover, the similarity among entries was indicated by the color of an SOM output. The color projected to the SOM display background was determined by a U-matrix [21]. Higher values of the U-matrix stood for cluster borders, while lower values represented clusters.
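One common way to compute a U-matrix is to take, for every node, the mean distance between its weight vector and those of its grid neighbours. The sketch below follows that definition with a four-neighbour rectangular grid; whether the study used this exact variant is an assumption, and the function name `u_matrix` is hypothetical.

```python
import math

def u_matrix(weights):
    """Mean Euclidean distance from each node's weights to its grid neighbours.

    weights: 2-D grid (list of rows) of weight vectors.
    Assumption: rectangular grid with up/down/left/right neighbours.
    """
    rows, cols = len(weights), len(weights[0])
    result = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dists.append(math.dist(weights[r][c], weights[nr][nc]))
            result[r][c] = sum(dists) / len(dists) if dists else 0.0
    return result
```

High values mark nodes whose neighbours differ strongly (cluster borders), and low values mark nodes inside a cluster, which is the reading applied to the SOM displays.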
According to the distances between numbers and the background colors, the entries of each matrix were clustered. Two criteria were applied: (1) numbers located in the same SOM node were grouped into one cluster; and (2) numbers located in two or more nodes were grouped into one cluster if the nodes were adjacent or separated by only one empty node, and the numbers lay in the same area where the U-matrix values were lower than half of the highest U-matrix value of the matrix. In this way, the entries of each category were assigned to several clusters.
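The two clustering criteria can be sketched as a small grouping routine. Several interpretive assumptions are made here and are not taken from the study: the grid is rectangular, “adjacent” includes diagonal neighbours, “separated by only one empty node” means exactly one unoccupied node on a straight line between two nodes, and “the same area” is approximated by requiring the nodes involved (including the empty node in between) to have U-matrix values below the threshold. The function name `cluster_entries` is hypothetical.

```python
def cluster_entries(positions, u_values):
    """Group entries by SOM node following the two criteria in the text.

    positions: dict {entry_number: (row, col)} giving each entry's node.
    u_values: 2-D list of U-matrix values, one per node.
    Returns a sorted list of clusters (each a sorted list of entry numbers).
    """
    threshold = max(max(row) for row in u_values) / 2.0
    nodes = {}
    for entry, node in positions.items():
        nodes.setdefault(node, []).append(entry)  # criterion (1)

    parent = {node: node for node in nodes}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def low(node):
        return u_values[node[0]][node[1]] < threshold

    occupied = list(nodes)
    for i, a in enumerate(occupied):
        for b in occupied[i + 1:]:
            dr, dc = b[0] - a[0], b[1] - a[1]
            gap = max(abs(dr), abs(dc))
            if gap == 1:
                near = True
            elif gap == 2 and dr % 2 == 0 and dc % 2 == 0:
                # Exactly one node lies between a and b; it must be
                # empty and (by assumption) inside the low-value area.
                middle = (a[0] + dr // 2, a[1] + dc // 2)
                near = middle not in nodes and low(middle)
            else:
                near = False
            if near and low(a) and low(b):  # criterion (2)
                parent[find(a)] = find(b)

    groups = {}
    for node, entries in nodes.items():
        groups.setdefault(find(node), []).extend(entries)
    return sorted(sorted(g) for g in groups.values())
```

Entries that end up alone in the result correspond to the isolated entries of an SOM display.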
3.2.4. Subject Analysis
To identify the subjects of the clusters and categories, the n-gram approach was employed. The n-gram package offered by R extracts the n-word phrases in unstructured text files.
The historical revisions of the entries in one cluster were merged into one document. For each category, the historical revisions in a specific period of its entries were also merged into one document. The most used 2-word, 3-word, and 4-word phrases in each document were extracted by the n-gram package. The set phrases and meaningless phrases (e.g., “of the” and “the study is a”) were removed from the dataset. If a phrase was a part of another one and the two phrases had the same meaning, then they were regarded as one phrase, and their frequencies were added together (e.g., “child development index” and “the child development index”). After data processing, a list containing phrases and frequencies was obtained for each document.
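The phrase extraction and merging steps can be sketched as follows. This is not the R n-gram package used in the study: tokenisation is by whitespace only, and the decision that two phrases share a meaning is supplied by a reviewer as explicit pairs. The function names are hypothetical.

```python
from collections import Counter

def ngram_counts(text, sizes=(2, 3, 4)):
    """Count every 2-, 3-, and 4-word phrase in a whitespace-tokenised text."""
    words = text.lower().split()
    counts = Counter()
    for n in sizes:
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts

def merge_equivalent(counts, pairs):
    """Fold phrases judged to have the same meaning into one entry.

    pairs: (kept_phrase, absorbed_phrase) tuples chosen by a human reviewer.
    The frequencies of the two phrases are added together, as in the text.
    """
    merged = Counter(counts)
    for kept, absorbed in pairs:
        merged[kept] += merged.pop(absorbed, 0)
    return merged
```

Removing set phrases and meaningless phrases such as “of the” would be a further filtering step on the resulting counts and is likewise a manual judgment.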
The researcher manually reviewed the lists to summarize the subjects of each document. One phrase could relate to more than one subject, and different phrases could relate to the same subjects. In this way, the subjects of each cluster were generated, and the subjects in each period of each theme were generated.
To find the increasing and decreasing phrases in each category, the differences between the frequencies of each phrase in adjacent periods were calculated. After all the frequency differences were obtained for each category, the researcher reviewed the most increasing and most decreasing phrases to generate the subjects.
3.2.5. Inferential Analysis
Inferential statistical tests allow the researcher to gain insights into the differences among the objects. In addition to descriptive statistical methods, inferential statistical analysis was applied to test the differences among the determined periods for the women’s health topic. The hypotheses are:
- H01: There were no significant differences among the investigated time periods in terms of the number of views of the entries relevant to women’s health.
- H02: There were no significant differences among the investigated time periods in terms of the number of edits of the entries relevant to women’s health.
The independent variable (time period) was categorical, the dependent variables (number of views and number of edits) were continuous with repeated measures, and the distributions of the dependent variables did not follow the normal distribution. Therefore, Friedman’s Test was applied to test the differences among the periods. To explore the difference between each pair of periods, a series of pairwise comparisons was conducted. Since the distribution of the differences between each pair of periods was not symmetrical, the Sign Test was used. The significance level of the inferential statistical tests was 0.05.
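The two tests can be illustrated with a compact sketch. It computes the Friedman chi-square statistic using mid-ranks for ties but without a tie correction, and an exact two-sided sign test that drops zero differences; both simplifications are assumptions, and statistical software may apply additional corrections. The Friedman statistic is compared with a chi-square distribution with k − 1 degrees of freedom, where k is the number of periods.

```python
import math

def friedman_statistic(table):
    """Friedman chi-square for rows = entries, columns = periods.

    Tied values within a row receive their average rank; no tie
    correction is applied to the statistic (an assumption).
    """
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        order = sorted(range(k), key=lambda j: row[j])
        i = 0
        while i < k:
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            mid_rank = (i + j) / 2.0 + 1.0  # average rank for tied values
            for t in range(i, j + 1):
                rank_sums[order[t]] += mid_rank
            i = j + 1
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)

def sign_test_p(before, after):
    """Two-sided exact sign test p-value; zero differences are dropped."""
    plus = sum(1 for b, a in zip(before, after) if a > b)
    minus = sum(1 for b, a in zip(before, after) if a < b)
    n = plus + minus
    if n == 0:
        return 1.0
    tail = sum(math.comb(n, i) for i in range(min(plus, minus) + 1)) / 2.0 ** n
    return min(1.0, 2.0 * tail)
```

In this setting, each row of `table` would hold one entry’s views (or edits) in Periods 1 to 4, and `sign_test_p` would be called once for every pair of periods.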
4. Results and Discussion
4.1. Descriptive Results
4.1.1. Entries and Themes
According to the data collection and analysis strategy, 207 associated entries were obtained, and four themes were generated from them. Table 1 lists the themes of the women’s health topic, the number of entries related to each theme, and the description of each theme. The Support and protection (WH-SP) theme had the most relevant entries (99 entries) among the four themes, which reflects that the general public cared more about the protection of women’s health than the other themes.
Table 1.
Themes of Women’s Health Topic.
Table 2 presents the numbers of the entries created from 2010 to 2017 for each theme. The number of the entries rose steadily. The WH-SP theme contributed the most to the entry increase among the four themes every year from 2010 to 2017. In 2010 and 2015, more new entries were created for this theme than in the other years. Another special case is the MIS theme, for which 4 new entries were generated in 2013, more than in any other year.
Table 2.
Number of Entries Created during the Investigated Time Periods.
4.1.2. Page Views and Edits
Figure 3 illustrates the Numbers of Yearly Page Edits (NYPEs) of the four themes of the women’s health topic, and the NYPE of the topic as well. This figure reveals that the general trend of the NYPE of the women’s health topic decreased from 2010 to 2017. The Support and protection theme received the largest NYPEs for six years (2010 to 2012 and 2015 to 2017) among the investigated eight years. In 2013 and 2014, the Discrimination, violence, harm, and subordination theme surpassed Support and protection and occupied the first position. The NYPEs of these two themes and Medical and interdisciplinary subjects fluctuated from 2010 to 2017, and no obvious ascending or descending trend was found for them. The NYPE of the Health problems and risks theme rose from 2010 and reached its peak in 2012; then, it began to drop and reached its trough in 2017.
Figure 3.
Numbers of Yearly Page Edits for Each Theme of Women’s Health.
Figure 4 displays the Numbers of Yearly Page Views (NYPVs) of the four themes of the women’s health topic, and the NYPV of the topic as well. The total page views decreased from 2010 to 2011, increased from 2011 to 2013, but then decreased again after that. The trends of the four themes were similar to the trend of the entire women’s health topic. The Support and protection theme and the Medical and interdisciplinary subjects theme ranked in the top two places among the four themes from 2010 to 2017. The Health problems and risks theme occupied the third place from 2010 to 2012 but fell to the last place from 2013. The trend of the Discrimination, violence, harm, and subordination theme was slightly different from the other three themes, because its NYPV increased rapidly from 2010 to 2013. However, the decrease in its NYPV from 2013 to 2017 was similar to that of the other themes and the Women’s Health topic.
Figure 4.
Numbers of Yearly Page Views for Each Theme of Women’s Health.
For each theme and the entire topic, the NYPE trend differed considerably from the NYPV trend from 2010 to 2017. No association was found between the NYPEs and the NYPVs. This indicates that the user groups who created the page edits and the page views had different interests in the investigated time periods. The Wikipedia editors were more interested in Support and protection, and Discrimination, violence, harm, and subordination, while the Wikipedia viewers were more interested in Support and protection and Medical and interdisciplinary subjects.
4.2. Subject Analysis Results and Discussion
Figure 5, Figure 6, Figure 7 and Figure 8 are the SOM displays for the four identified themes. The color bars on the right side of the figures represent different values of the U-matrix. A lower value means higher similarity. In the displays, every number stands for an entry, and the corresponding entry of each number is presented in Appendix A. Every rectangle or polygon represents a cluster and the numbers in the same rectangle/polygon represent the entries belonging to the same cluster. The numbers not included in any rectangle/polygon stand for the isolated entries, which were not grouped into any cluster. The clusters with more than three entries were recognized as large clusters, while those with three or fewer entries were the small clusters. The large clusters were represented by purple rectangles or polygons. The small clusters were represented by red rectangles.
Figure 5.
Self-Organizing Map (SOM) Display of DVHS.
Figure 6.
SOM Display of WH-HPR.
Figure 7.
SOM Display of MIS.
Figure 8.
SOM Display of WH-SP.
Table 3, Table 4, Table 5 and Table 6 display the high-frequency terms and phrases, as well as the subjects discovered within each large cluster. The high-frequency terms/phrases were extracted from the entries by the n-gram approach. The high-frequency terms and phrases are displayed in the second column of each table, and the frequency of each term/phrase is included in the brackets following the term/phrase. The researcher proposed the subjects of each large cluster by examining the high-frequency terms and phrases of it. The small clusters and isolated entries were not included in these tables.
Table 3.
Subject Analysis of DVHS.
Table 4.
Subject Analysis of WH-HPR.
Table 5.
Subject Analysis of MIS.
Table 6.
Subject Analysis of WH-SP.
4.2.1. The Discrimination, Violence, Harm, and Subordination (DVHS) Theme
Figure 5 presents four large clusters and two small clusters discovered for the DVHS theme. Clusters C3 and C4 were both located in the same blue area. Therefore, the entries in these two clusters were relevant to one another. Table 3 lists the clusters and their high-frequency terms, phrases, and subjects.
The minority group subject occurred in all four clusters of this theme, so it was the most dominant subject of the DVHS theme. The minority groups mentioned in the entries associated with this subject included LGBT people, women, and African Americans (black).
The inequality and discrimination subject and the abuse and violence subject appeared in three clusters. The former subject had three lower-level subjects, which were health care inequality and discrimination, inequality and discrimination in work, and inequality and discrimination in research. Health care inequality and discrimination was reflected by the “missing women” phenomenon. As described in the “Missing women” entry, this phenomenon indicated that the number of women in a region was smaller than the expected number of women, which was caused by sex-selective abortion, female infanticide, and inadequate health care and nutrition for female children.
The abuse and violence subject had two lower-level subjects: sexual violence and heterosexist violence. The findings imply that these two types of violence were the most discussed violence-related subjects in the women’s health topic. The associated entries of the abuse and violence subject mentioned not only different types of violence, but also the causes of the violence. For instance, rape culture was one of the main causes of high rape rates in certain countries, such as India.
Women protection was another subject of the DVHS theme. Its associated entries were about the organizations (e.g., the Supreme Court of the United States) and research (e.g., the triple oppression theory) of women protection.
4.2.2. The Health Problems and Risks (WH-HPR) Theme
Figure 6 shows that two clusters were generated for the WH-HPR theme. These two clusters were close to each other and in the same blue area, which indicates that the entries in the two clusters shared some similarities. Table 4 presents the high-frequency terms/phrases and subjects of the two clusters.
The WH-HPR theme only had two subjects: health issues and inequality and discrimination. Cluster C1 mainly concentrated on health problems, such as high blood pressure and sexually transmitted infections. In this cluster, a frequently used synonym of high blood pressure was found, which was hypertension. The term “hypertension” was often used by health professionals and the term “high blood pressure” was usually used by lay people. Since Wikipedia is a user-generated platform, these two expressions were both utilized in the Wikipedia entries.
The health issue subject of Cluster C2 had four lower-level subjects, including health service (e.g., health care), research (e.g., medical anthropology), organization (e.g., the World Health Organization), and problem (e.g., heart disease). Another subject of this cluster was inequality and discrimination, and this subject had a lower-level subject, research. For example, an entry of this subject was about the “Gender polarization” concept proposed by American psychologist Sandra Bem [22].
4.2.3. The Medical and Interdisciplinary Subjects (MIS) Theme
Figure 7 shows that four large clusters and two small clusters emerged from all the entries of the MIS theme. Two of the four clusters were either not close to each other or had green areas between them. These results show that the entries of the four clusters were not closely related. Table 5 lists the clusters, high-frequency terms, phrases, and subjects.
The health issue subject and the family planning and reproduction subject appeared in all four clusters, but each cluster of the MIS theme had its own unique subject: C1 had the abuse and violence subject, C2 had the minority group subject, C3 had the social factor subject, and C4 had the population issue subject. For the abuse and violence subject, a new lower-level subject emerged from this theme, which was the structural violence subject. Different from the previous types of violence, structural violence was caused by social structure or social institution.
4.2.4. The Support and Protection (WH-SP) Theme
Figure 8 shows that eight large clusters and five small clusters were discovered for the WH-SP theme. Clusters C1 to C7 were all located in the same blue area, which means that their entries had similarities to some extent. Cluster C8 stayed in another blue area, and the yellow and green areas separated it from the other clusters, which means that its entries had no strong connections with the entries of the other clusters. The eight large clusters and their high-frequency terms/phrases and subjects are displayed in Table 6.
Table 6 shows that the health issue subject appeared in the first seven clusters (Clusters C1 to C7) of the WH-SP theme, more than any other subjects of this theme. Therefore, the health issue subject was the salient subject of the WH-SP theme. This subject had several lower-level subjects, including health organizations (e.g., OHSU Center for Women’s Health), services (e.g., obstetric and neonatal nurses), research (e.g., women’s studies journals), problems (e.g., heart disease), education (e.g., Performance Indicators), and laws (e.g., Social Security Act). The health education subject of this theme covered the content about the performance indicators that were used for student assessment. The health law subject, which only occurred in this theme, referred to the entries of health-related laws and policies, such as the Social Security Act and the policies developed by the European Institute of Women’s Health.
The minority group subject, the population issue subject, the family planning and reproduction subject, and the woman protection subject each appeared in only one cluster. Different from the other three subjects, the woman protection subject did not occur together with the health issue subject, which indicates that there was no strong connection between C8 and the other seven clusters. The entries in C8 were relevant to woman protection activities and research. For example, many theorists proposed a series of feminism theories (e.g., liberal feminism and gender feminism) so as to fight against gender inequality. One instance was the history of women fighting for equal smoking rights.
Table 7 lists the subjects of the identified themes and shows the relations between the themes and subjects. It reveals that the DVHS theme, the MIS theme, and the WH-SP theme had more diverse subjects compared with the WH-HPR theme. In other words, the entries’ subjects of the WH-HPR theme were more centralized than those of the other themes. Among the four themes, the MIS theme and the WH-SP theme had more common subjects. Meanwhile, every two of the four themes had one or more subjects in common with each other, which indicates that these themes were relevant to each other.
Table 7.
Subjects of Women’s Health. The check mark (√) shows that a certain subject appears in a category.
4.3. Evolution of the Women’s Health Topic
4.3.1. Entry Growth
New entries created in a certain period reflect the Wikipedia editors’ new interests and focuses during the period. According to Table 2, among the four themes of women’s health, the WH-SP theme had many more new entries in the four investigated periods than the other three themes.
A review of all the newly created entries from 2010 to 2017 reveals that the new entries in the DVHS theme were related to sexism, such as sexism in the workplace (e.g., Women in law enforcement) and sexism in specific regions (e.g., Discrimination against girls in India). The entries in the WH-HPR theme and the MIS theme focused on women’s health issues, including the research of women’s health issues (e.g., Women’s health issues), the determinants of health issues (e.g., Social determinants of health in poverty), and women’s health status in specific regions (e.g., Women’s reproductive health in Russia).
The new entries in the WH-SP theme concentrated on the techniques and methods (e.g., Gynography), research (e.g., Black Women’s Health Study), organizations (e.g., EuroHealthNet), training and education (e.g., Oregon Health and Science University Center for Women’s Health), services (e.g., Midwife), and works of art (e.g., The Honest Body Project) that aimed to support and protect women and improve women’s health.
4.3.2. Changes of Subjects
To explore the internal characteristic evolution of the women’s health topic, the changes of the subjects from one period to another were explored. For each theme, the frequency difference of each term/phrase from one period to the next was calculated. The terms/phrases of each theme were ranked according to their frequency differences, and the terms/phrases whose frequencies increased or decreased the most from one period to the next were extracted from the rankings. Table 8, Table 9, Table 10 and Table 11 display the top 20 terms/phrases of the rankings, and only the terms/phrases whose frequencies increased or decreased by more than 4 are included in these tables. The subjects relevant to the terms/phrases are also included in these tables. The numbers in each table show the frequency differences. If a term’s frequency decreased from Periods 1 to 2, its frequency difference would be negative, and vice versa.
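The ranking procedure can be sketched as below. The sketch assumes that a phrase absent from a period has a frequency of zero there, and that the top-20 limit is applied separately to increases and decreases; the function name `top_changes` and its parameters are hypothetical.

```python
def top_changes(freq_prev, freq_next, top_n=20, min_change=4):
    """Rank phrases by frequency change between two adjacent periods.

    freq_prev, freq_next: dicts {phrase: frequency} for the earlier and
    later period. Returns (increased, decreased): lists of
    (phrase, difference) pairs, keeping at most top_n phrases whose
    absolute change exceeds min_change.
    """
    phrases = set(freq_prev) | set(freq_next)
    # Difference = later frequency minus earlier frequency, so a
    # decrease yields a negative number.
    diffs = {p: freq_next.get(p, 0) - freq_prev.get(p, 0) for p in phrases}
    ranked = sorted(diffs.items(), key=lambda item: (-item[1], item[0]))
    increased = [(p, d) for p, d in ranked if d > min_change][:top_n]
    decreased = [(p, d) for p, d in reversed(ranked) if d < -min_change][:top_n]
    return increased, decreased
```

Assigning subjects to the extracted phrases remains a manual step carried out by the researcher.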
Table 8.
Changes of Subjects in the Four Periods in the DVHS Theme.
Table 9.
Changes of Subjects in the Four Periods in the WH-HPR Theme.
Table 10.
Changes of Subjects in the Four Periods in the MIS Theme.
Table 11.
Changes of Subjects in the Four Periods in the WH-SP Theme.
1. The Discrimination, violence, harm, and subordination (DVHS) theme
Table 8 presents the terms/phrases whose frequencies changed the most from one period to the next in the DVHS theme. During all the periods, the terms about abuse and violence, inequality and discrimination, and minority group kept growing. One lower-level subject of abuse and violence, sexual violence, increased in all the periods, and another lower-level subject, domestic violence, increased from Periods 1 to 2, and Periods 3 to 4. The growth of the domestic violence subject was mainly caused by the increase of the terms relevant to female genital mutilation.
The content about inequality and discrimination focused on different aspects in different time periods. For instance, the interests about inequality in society, economy, and work increased from Periods 1 to 2, while from Periods 2 to 4, the interests about inequality in health care grew. For the minority group subject, the content about LGBT people increased in all the investigated periods. An examination of the high-frequency terms/phrases about LGBT shows that “transgender people” occurred the most. In other words, from Periods 1 to 4, the Wikipedia editors paid increasing attention to the LGBT group, especially transgender people.
2. The Health problems and risks (WH-HPR) theme
Table 9 shows that the terms about the health issue subject and the family planning and reproduction subject increased in all the periods. From Periods 1 to 2, the increasing terms about family planning and reproduction were related to family planning and reproduction organizations (e.g., United Nations Population Fund), while in the following periods, the terms were related to family planning and reproduction methods (e.g., induced abortion and medical abortion).
The increasing terms of the health issue subject were related to different aspects in different periods. From Periods 1 to 2, the increasing terms covered various lower-level subjects, including health problems, health organizations, health services, and causes of health problems. Among these lower-level subjects, only health problems and health services attracted more attention than before in the next period. From Periods 3 to 4, a new lower-level subject emerged, which was health research.
3. The Medical and interdisciplinary subjects (MIS) theme
Table 10 shows that, in the MIS theme, the increasing terms covered more and more subjects as time went by. From Periods 1 to 2, the terms were relevant to health issue and family planning and reproduction, which suggests that the Wikipedia editors’ interests focused on these two subjects. In Period 3, a new interest in the population issue emerged. In Period 4, in addition to the previous subjects, the Wikipedia editors showed two more interests: violence, and inequality and discrimination.
4. The Support and protection (WH-SP) theme
Table 11 illustrates that the terms about health issue and woman protection kept increasing from Periods 1 to 4, although in different periods, these terms focused on different aspects of the two subjects. For instance, the terms about treatment only increased from Periods 1 to 2, while the terms about health education increased from Periods 1 to 3.
The woman protection subject had three lower-level subjects, which were politics, health, and education. The terms about politics increased from Periods 2 to 4, which indicates that the Wikipedia editors had increasing interests in this subject in recent years. Furthermore, examination of the terms about politics demonstrates that the Wikipedia editors’ interests increased the most in women’s suffrage.
Table 12 summarizes the growing, diminishing, and fluctuating subjects of the Women’s Health topic from 2010 to 2017. The growing/diminishing subjects were the subjects whose associated terms and phrases kept increasing/decreasing during the investigated periods. In other words, the growing/diminishing subjects attracted increasing/decreasing attention during the investigated periods. The fluctuating subjects were the subjects whose associated terms and phrases increased in some periods but decreased in other periods.
Table 12.
Growing, Diminishing, and Fluctuating Subjects.
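The three categories summarized in Table 12 can be expressed as a simple rule on the signs of a subject's period-to-period changes. The sketch below is an assumed formalization of the definitions given above; the function name, the "unclassified" fallback for subjects with an unchanged transition, and the example values are hypothetical.

```python
def classify_subject(differences):
    """Classify a subject from its term-frequency changes across
    adjacent periods (e.g., Periods 1-2, 2-3, and 3-4)."""
    if differences and all(d > 0 for d in differences):
        return "growing"
    if differences and all(d < 0 for d in differences):
        return "diminishing"
    if any(d > 0 for d in differences) and any(d < 0 for d in differences):
        return "fluctuating"
    # Assumption: subjects with no change in some transition and no
    # mixed signs are left unclassified in this sketch.
    return "unclassified"


# Hypothetical changes for three subjects (not data from the study).
examples = {
    "subject A": [9, 4, 6],
    "subject B": [-5, -7, -2],
    "subject C": [8, -3, 5],
}
labels = {name: classify_subject(diffs) for name, diffs in examples.items()}
print(labels)
```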
The terms/phrases associated with the minority group subject kept increasing from Periods 1 to 4. In other words, the Wikipedia editors paid increasing attention to minority groups from 2010 to 2017.
4.3.3. Changes of External Popularities
The external popularity of a topic/theme was defined as the number of page edits and the number of page views of its associated entries. Friedman’s test was applied to test for differences among the periods. Table 13 presents the results.
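For readers unfamiliar with Friedman's test, the following is a minimal pure-Python sketch of the statistic for repeated measurements of the same entries across several periods. It is not the authors' implementation: the function names and the page-view numbers are hypothetical, the tie correction is omitted for brevity, and the p-value uses the usual chi-square approximation with k - 1 degrees of freedom.

```python
from math import erfc, exp, pi, sqrt


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions in the tie
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return exp(-half) * total
    total = 0.0
    term = sqrt(2 * x / pi)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return erfc(sqrt(half)) + exp(-half) * total


def friedman_test(rows):
    """rows[i][j] is the measurement of entry i in period j.
    Returns (statistic, approximate p-value); no tie correction applied."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    stat = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    return stat, chi2_sf(stat, k - 1)


# Hypothetical page views of five entries in four periods (not study data).
views = [
    [100, 150, 160, 170],
    [20, 45, 50, 60],
    [300, 420, 500, 510],
    [5, 9, 11, 12],
    [75, 90, 95, 130],
]
stat, p_value = friedman_test(views)
print(f"Friedman statistic = {stat:.2f}, p = {p_value:.4f}")
```

In this sketch a small p-value would lead to rejecting the null hypothesis of no difference among the periods, which is the role H01 and H02 play for page views and page edits.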
Table 13.
Hypothesis Testing Results of H01 and H02.
The results show that H01 was rejected, whereas H02 was not. This means that: (1) there were significant differences among the four periods in terms of the number of page views; (2) there were no significant differences among the four periods in terms of the number of page edits.
The Sign Test was used to explore the differences between every two periods. The comparisons intended to reveal the differences from one period to the next in order to show the temporal changes of external popularities. Hence, only the adjacent periods were compared. Since the result of H02 was not significant, no follow-up test was conducted for this hypothesis. The results of the follow-up tests for H01 are presented in Table 14.
Table 14.
Pairwise Comparison Results of H01 and H02.
Table 14 shows that, in terms of the number of page views, there were significant differences between Periods 1 and 2 and between Periods 3 and 4, but no significant difference between Periods 2 and 3. The detailed results of the pairwise comparisons show that the number of page views in Period 2 was larger than that in Period 1 (129 positive signs versus 34 negative signs) and the number of page views in Period 4 was smaller than that in Period 3 (53 positive signs versus 139 negative signs). Therefore, the number of page views of the associated entries in women’s health grew from Periods 1 to 2, remained stable from Periods 2 to 3, and dropped from Periods 3 to 4. These findings reveal that the Wikipedia editors’ activity on women’s health did not change significantly from 2010 to 2017, while the Wikipedia viewers’ interest in this topic grew rapidly from Periods 1 to 2 but dropped quickly from Periods 3 to 4, which suggests that the two groups largely consisted of different people.
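The sign counts reported above can be turned into a two-sided Sign Test p-value with an exact binomial calculation, sketched below. The function names are hypothetical, and the source does not state whether an exact test or a normal approximation was used, nor whether any correction for multiple comparisons was applied, so this is an illustration rather than a reproduction of Table 14.

```python
from math import comb


def count_signs(earlier, later):
    """Count entries whose value rose (positive) or fell (negative);
    entries with no change are ignored, as usual for the Sign Test."""
    positive = sum(1 for a, b in zip(earlier, later) if b > a)
    negative = sum(1 for a, b in zip(earlier, later) if b < a)
    return positive, negative


def sign_test_p_value(positive, negative):
    """Two-sided exact p-value under the null of equally likely signs."""
    n = positive + negative
    if n == 0:
        return 1.0
    k = min(positive, negative)
    tail = sum(comb(n, i) for i in range(k + 1))
    return min(1.0, 2 * tail / 2 ** n)


# Sign counts as reported in the text for page views.
p_periods_1_2 = sign_test_p_value(129, 34)
p_periods_3_4 = sign_test_p_value(53, 139)
print(f"Periods 1 vs. 2: p = {p_periods_1_2:.3g}")
print(f"Periods 3 vs. 4: p = {p_periods_3_4:.3g}")
```

With counts this unbalanced, both p-values fall far below the conventional 0.05 level, in line with the significant differences described for these two pairs of periods.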
5. Conclusions
This study reveals the evolution characteristics of the women’s health topic on Wikipedia. Two hundred and seven associated entries of women’s health were retrieved on Wikipedia, and four themes emerged from these entries: (1) Discrimination, violence, harm, and subordination; (2) Health problems and risks; (3) Medical and interdisciplinary subjects; and (4) Support and protection. This indicates that the Wikipedia editors focused on these four aspects of women’s health.
In terms of internal characteristics, the women’s health content on Wikipedia kept increasing from 2010 to 2017. The subjects became increasingly diverse as time went by. The editors paid more and more attention to abuse and violence, family planning and reproduction, health issue, inequality and discrimination, minority group, and woman protection, while their interests in economy and politics decreased. If a subject changed quickly in certain periods, the change was usually caused by social events or social issues.
In terms of external popularity, the overall popularity of the women’s health topic declined from 2010 to 2017, contrary to the growth of its content and the growth of online health information seeking. The themes identified in this study had similar popularity trends among the Wikipedia viewers. Their popularity grew rapidly from Periods 1 to 2, remained stable from Periods 2 to 3, and fell dramatically from Periods 3 to 4. However, the popularity trends among the viewers were not consistent with those among the editors, which suggests that the two groups were not composed of the same members.
The results show that no association was found between the internal characteristic evolution and the external popularity evolution of the women’s health topic on Wikipedia. In other words, the generation or change of content in Wikipedia entries did not appear to affect the behavior of the Wikipedia users.
The findings can enable health professionals, health care givers, and general users to get a more comprehensive understanding of women’s health information on social media by illustrating and discovering the entries, subjects, and themes of women’s health discussed on Wikipedia and the relationships among them. Exploring the women-health-related themes and subjects will contribute to the developments of health ontologies and consumer health vocabularies and assist Website designers in organizing online women’s health information. Revealing the temporal features of the women’s health topic can support the temporal information retrieval of women-health-related information.
There are plenty of health-related topics on social media, such as women’s health, men’s health, and children’s health, which are worthy research topics. However, because of the limitations of time and paper length, it is difficult to investigate all the related topics on different social media platforms in one study. In future research, the researchers will explore the characteristics of more health topics on Wikipedia and other social media platforms, and compare different health topics and health information on different platforms.
Author Contributions
Conceptualization, J.Z. and Y.W.; methodology, Y.W. and J.Z.; software, Y.W.; validation, Y.W.; formal analysis, Y.W.; investigation, Y.W.; data curation, Y.W.; writing—original draft preparation, Y.W.; writing—review and editing, J.Z. and Y.W.; visualization, Y.W.; supervision, J.Z.; project administration, J.Z. and Y.W.; funding acquisition, Y.W. All authors have read and agreed to the published version of the manuscript.
Funding
This research was supported by the Fundamental Research Funds for the Central Universities, and the Research Funds of Renmin University of China (Funding No. 19XNF028).
Conflicts of Interest
The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
Appendix A
Table A1.
Themes and Investigated Entries of Women’s Health Topic.
| Themes | Entries |
|---|---|
| Discrimination, violence, harm, and subordination (DVHS): 37 entries | (1) Ageism, (2) Airline seating sex discrimination controversy, (3) Ambivalent sexism, (4) Discrimination against girls in India, (5) Female genital mutilation, (6) Femicide, (7) Gender apartheid, (8) Gender bias on Wikipedia, (9) Gender inequality in India, (10) Gender inequality, (11) Glass cliff, (12) Hegemonic masculinity, (13) Heterosexism, (14) Husband stitch, (15) Hypermasculinity, (16) LGBT stereotypes, (17) Male privilege, (18) Misogyny in horror films, (19) Misogyny, (20) Missing women, (21) Occupational segregation, (22) Occupational sexism, (23) Patriarchy, (24) Pink-collar worker, (25) Rape culture, (26) Reverse sexism, (27) Sexism in the technology industry, (28) Sexism, (29) Transphobia, (30) Triple oppression, (31) Victim blaming, (32) Wife selling, (33) Women in firefighting, (34) Women in law enforcement, (35) Women in medicine, (36) Women in Pakistan, (37) Women in the workforce |
| Health problems and risks (WH-HPR): 25 entries | (1) Abortion, (2) Anilingus, (3) Birth control, (4) Complications of pregnancy, (5) Disease, (6) Diseases of affluence, (7) Diseases of poverty, (8) Drift hypothesis, (9) Gender disparities in health, (10) Gender polarization, (11) Hypertensive disease of pregnancy, (12) Incarceration of women in the United States, (13) Inequality in disease, (14) Infant mortality, (15) List of bacterial vaginosis microbiota, (16) Medical anthropology, (17) Mental health inequality, (18) Misandry, (19) Molar pregnancy, (20) Ovarian cancer, (21) Schistosomiasis, (22) Unnatural Causes: Is Inequality Making Us Sick?, (23) Water supply and sanitation in India, (24) Women’s Health Issues (journal), (25) Women and smoking |
| Medical and interdisciplinary subjects (MIS): 46 entries | (1) Epidemiology, (2) Etiology, (3) Face-ism, (4) Family planning, (5) Gender-blind, (6) Global health, (7) Health equity, (8) Health in China, (9) Health in India, (10) Health, (11) History of medicine, (12) History of nursing, (13) Immigrant paradox, (14) International Conference on Population and Development, (15) Intersectionality, (16) Maternal health, (17) Matriarchy, (18) Medical sociology, (19) Menstruation, (20) Mental health, (21) Molecular pathological epidemiology, (22) Pathogenesis, (23) Pathology, (24) Population Health Forum, (25) Population health, (26) Public health, (27) Race and health, (28) Reproductive health, (29) Richard G. Wilkinson, (30) Sex differences in humans, (31) Sex segregation, (32) Sexual division of labour, (33) Social determinants of health in Mexico, (34) Social determinants of health in poverty, (35) Social determinants of health, (36) Social determinants of obesity, (37) Social epidemiology, (38) Vaginal tightening, (39) Whitehall Study, (40) Women’s health in China, (41) Women’s health in Ethiopia, (42) Women’s health in India, (43) Women’s health, (44) Women’s reproductive health in Russia, (45) Women’s reproductive health in the United States, (46) Women who have sex with women |
| Support and protection (WH-SP): 99 entries | (1) Alexandria Regional Center for Women’s Health and Development, (2) American Medical Women’s Association, (3) AnMed Health Women’s & Children’s Hospital, (4) Antifeminism, (5) Association of Women’s Health, Obstetric and Neonatal Nurses, (6) Australian Longitudinal Study on Women’s Health, (7) Australian Women’s Health Network, (8) B.C. Women’s Hospital & Health Centre, (9) Black Women’s Health Study, (10) Condom, (11) Dennis Raphael, (12) Equity feminism, (13) EuroHealthNet, (14) European Institute of Women’s Health, (15) Female condom, (16) Female education, (17) Feminism, (18) Feminist health centers, (19) Feminist movement, (20) Feminist Women’s Health Center (Atlanta, Georgia), (21) Florence Hartley, (22) Gender equality, (23) Gender feminism, (24) Gender neutrality, (25) Global Library of Women’s Medicine, (26) Global Task Force on Expanded Access to Cancer Care and Control in Developing Countries, (27) Gynaecology, (28) Gynography, (29) Health (magazine), (30) Health Care for Women International, (31) Health care in the United States, (32) Health Disparities Center, (33) Health education, (34) Health literacy, (35) Health professional, (36) Healthcare and the LGBT community, (37) Healthcare in Canada, (38) Healthy People program, (39) HealthyWomen, (40) Hopkins Center for Health Disparities Solutions, (41) Hormone replacement therapy (menopause), (42) Howard Atwood Kelly, (43) International Journal of Women’s Health, (44) International Planned Parenthood Federation, (45) International Women’s Health Coalition, (46) Ipas (organization), (47) Journal of Midwifery & Women’s Health, (48) Journal of Women’s Health, (49) Kegel exercise, (50) Laura W. Bush Institute for Women’s Health, (51) List of first female physicians by country, (52) List of health and fitness magazines, (53) List of medical journals, (54) List of women’s studies journals, (55) Madsen v. Women’s Health Center, Inc., (56) Martha Ballard, (57) Men and feminism, (58) Michael Marmot, (59) Michigan Medicine, (60) Midwife, (61) Midwifery, (62) National Organization for Men Against Sexism, (63) National Organization for Women, (64) National Women’s Health Network, (65) New Space for Women’s Health, (66) Office on Women’s Health, (67) Oregon Health and Science University Center for Women’s Health, (68) Our Bodies, Ourselves, (69) Psychology of Women Quarterly, (70) Reproductive Health Supplies Coalition, (71) Reproductive rights, (72) Separatist feminism, (73) Sex Roles (journal), (74) Society for Women’s Health Research, (75) Sunnybrook Health Sciences Centre, (76) Sutter Health, (77) Sybil Shainwald, (78) Tamika D. Mallory, (79) The Heart Truth, (80) The Honest Body Project, (81) The NeuroGenderings Network, (82) Torches of Freedom, (83) United Nations Foundation, (84) United Nations Population Fund, (85) United States Department of Health and Human Services, (86) University of Pittsburgh Graduate School of Public Health, (87) Women’s College Hospital, (88) Women’s empowerment, (89) Women’s Health (magazine), (90) Women’s Health Action and Mobilization, (91) Women’s Health Care Nurse Practitioner-Board Certified, (92) Women’s Health Initiative, (93) Women’s health nurse practitioner, (94) Women’s medicine in antiquity, (95) Women’s rights in Iran, (96) Women’s rights, (97) Women’s suffrage, (98) Women & Health, (99) Women in India |
References
- Tu, H.T. Surprising decline in consumers seeking health information. Track. Rep. 2011, 26, 1–6.
- Fox, S.; Duggan, M. Health Online. 2013. Available online: http://www.pewinternet.org/2013/01/15/health-online-2013/ (accessed on 20 June 2020).
- Moorhead, S.A.; Hazlett, D.E.; Harrison, L.; Carroll, J.K.; Irwin, A.; Hoving, C. A new dimension of health care: Systematic review of the uses, benefits, and limitations of social media for health communication. J. Med. Internet Res. 2013, 15, e85.
- Dawson, J. Doctors Join Patients in Going Online for Health Information. Available online: http://connection.ebscohost.com/c/opinions/49259197/doctors-join-patients-going-online-health-information (accessed on 25 March 2010).
- Marar, S.D.; Al-Madaney, M.M.; Almousawi, F.H. Health information on social media. Perceptions, attitudes, and practices of patients and their companions. Saudi Med. J. 2019, 40, 1294–1299.
- Zhang, J.; Zhao, Y. A user term visualization analysis based on a social question and answer log. Inf. Process. Manag. 2013, 49, 1019–1048.
- Brasil, P.; Pereira, J.P.J.; Moreira, M.E.; Ribeiro Nogueira, R.M.; Damasceno, L.; Wakimoto, M.; Nielsen-Saines, K. Zika Virus Infection in Pregnant Women in Rio de Janeiro. N. Engl. J. Med. 2016, 375, 2321–2334.
- Oteng-Ntim, E.; Tezcan, B.; Seed, P.; Poston, L.; Doyle, P. Lifestyle interventions for obese and overweight pregnant women to improve pregnancy outcome: A systematic review and meta-analysis. Lancet 2015, 386, S61.
- Subramaniam, M.; Prasad, R.O.; Abdin, E.; Vaingankar, J.A.; Chong, S.A. Single mothers have a higher risk of mood disorders. Ann. Acad. Med. Singap. 2014, 43, 145–151.
- Asiodu, I.V.; Waters, C.M.; Dailey, D.E.; Lee, K.A.; Lyndon, A. Breastfeeding and use of social media among first-time African American mothers. J. Obstet. Gynecol. Neonatal Nurs. 2015, 44, 268–278.
- Gleeson, D.M.; Craswell, A.; Jones, C.M. Women’s use of social networking sites related to childbearing: An integrative review. Women Birth 2019, 32, 294–302.
- Holtz, B.; Smock, A.; Reyes-Gastelum, D. Connected motherhood: Social support for moms and moms-to-be on Facebook. Telemed. J. E-Health 2015, 21, 415–421.
- Ure, C.; Cooper-Ryan, A.M.; Condie, J.; Galpin, A. Exploring Strategies for Using Social Media to Self-Manage Health Care When Living with and Beyond Breast Cancer: In-Depth Qualitative Study. J. Med. Internet Res. 2020, 22, e16902.
- Milne, D.; Witten, I.H. Learning to Link with Wikipedia. In Proceedings of the 17th ACM Conference on Information and Knowledge Management, Napa Valley, CA, USA, 26–30 October 2008; Association for Computing Machinery: New York, NY, USA, 2008; pp. 509–518.
- Milne, D.; Witten, I.H. An open-source toolkit for mining Wikipedia. Artif. Intell. 2013, 194, 222–239.
- Wikimedia Downloads. Available online: https://dumps.wikimedia.org/ (accessed on 9 July 2020).
- Keyes, O. Package ‘WikipediR’. Available online: http://www.stats.bris.ac.uk/R/web/packages/WikipediR/WikipediR.pdf (accessed on 20 June 2020).
- Kohonen, T.; Kaski, S.; Lagus, K.; Salojarvi, J.; Honkela, J.; Paatero, V.; Saarela, A. Self organization of a massive document collection. IEEE Trans. Neural Netw. 2000, 11, 574–585.
- The R Foundation for Statistical Computing. The R Project for Statistical Computing. Available online: https://www.r-project.org/ (accessed on 17 July 2020).
- RStudio Team. RStudio. Available online: https://rstudio.com/ (accessed on 17 July 2020).
- Ultsch, A.; Siemon, H. Kohonen’s Self Organizing Feature Maps for Exploratory Data Analysis. In Proceedings of the INNC’90, International Neural Networks Conference, Palais des Congres, Paris, France, 9–13 July 1990; Kluwer Academic: Dordrecht, The Netherlands; Boston, MA, USA, 1990; pp. 305–308.
- Bem, S.L. Dismantling gender polarization and compulsory heterosexuality: Should we turn the volume down or up? J. Sex Res. 1995, 32, 329–334.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).