Investigation of Women’s Health on Wikipedia—A Temporal Analysis of Women’s Health Topic

Abstract: New health-related concepts, terms, and topics emerge, and the meanings of existing terms and topics keep changing. This study investigated the evolution of the women’s health topic on Wikipedia. The creation time, page views data, page edits data, and text of historical versions of 207 women-health-related entries from 2010 to 2017 on Wikipedia were collected. Coding, subject analysis, descriptive and inferential statistical analysis, and Self-Organizing Map and n-gram approaches were employed to explore the characteristics and evolution of the entries for the women’s health topic. The results show that the number of women-health-related entries kept increasing from 2010 to 2017, and nearly half of them were related to the support and protection of women’s health. The total number of page views of the investigated entries increased from 2011 to 2013 but decreased from 2013 to 2017, while the total number of page edits stayed stable from 2010 to 2017. Growing subjects were found during the investigated period, such as abuse and violence, and family planning and reproduction. However, the entries related to the economy and politics were diminishing. There was no association between the internal characteristic evolution and the external popularity evolution of the women’s health topic.


Introduction
With the development of computer technology and Internet technology, the volume of health information keeps increasing on the Internet. According to Tu's report, the proportion of people among all the consumers who sought health information online increased from 15.9% to 32.6% from 2001 to 2010 [1]. A survey in 2013 reported that 87% of USA adults use the Internet and among them, 72% stated that they sought health information online during the past year [2].
The emergence of Web 2.0 advocated the creation of social media, which changed the method of communication between health organizations, health consumers, patients, and health professionals [3]. Dawson reported that 81% of European consumers and 63% of USA consumers trust the health information on social media applications [4]. Marar, Al-Madaney, and Almousawi found that 85% of patients and their companions sought health information via social media [5]. These statistics reveal that social media has been recognized as an important channel for seeking health information in recent years.
The health-related information on social media covers a wide range of topics, including diseases and treatments, nutrition, health care and insurance, and healthy lifestyle [6]. The women's health topic is a main category among the various health-related topics on social media. Women usually face special health risks, such as pregnancy, cervical and breast cancer, and accompanying physical and psychological issues [7][8][9]. Social media are currently utilized as important platforms by women to gain and share specific women-health-related information, to strengthen social supports by building social connections with others, and to self-manage health care [10][11][12][13].
Informatics 2020, 7, x
As the volume of health information increases rapidly on social media, new terms, concepts, and topics emerge, which cause problems in health information seeking from both system and users' perspectives. Therefore, it is necessary to explore the temporal features of health-related concepts, terms, and topics on social media. Wikipedia is one of the largest social media platforms consisting of user-generated articles and the semantic relations between them [14,15]. Wikimedia Downloads stores all the historical versions and data of entries, including editors, viewers, content, interactions, and temporal data [16]. All of its data are open to the public. Hence, to investigate the temporal features of the women's health topic on social media, Wikipedia is a proper data source. However, few studies have focused on the women's health user-generated content on Wikipedia, and the temporal characteristics of the women's health topic on social media have not been adequately investigated, either. This study aims at discovering the themes, subjects, and entries related to women's health on Wikipedia and the relations among them and exploring how the women's health topic evolved from 2010 to 2017 on Wikipedia on both internal and external aspects.

Research Problem and Questions
The research problem of this study is to investigate and discover the evolution of the women's health topic derived from the social media website Wikipedia. The research questions are as follows:
• RQ1: What are the emergent themes and subjects of the women's health topic on Wikipedia and the relations among them?
• RQ2: How does the women's health topic evolve on Wikipedia in terms of its internal characteristics and external popularity during the investigated time periods?
In this study, a theme of the topic means a specific and distinctive concern of a group of Wikipedia entries. The entries collected can be assigned to several categories, and every category has its own theme. A subject of the topic means the focus of an entry. Every Wikipedia entry can have one or more subjects. The structure of entries, themes, and subjects for the women's health topic in a specific time period is shown in Figure 1. The internal characteristics of a specific topic in different time periods show the emergence, growth, and disappearance of entries, subjects, and terms in each theme. The external popularity of an entry was measured by its number of page edits and number of page views. The number of page edits reflects the popularity of an entry among the Wikipedia editors, and the number of page views reflects the popularity of an entry among the Wikipedia viewers. This study explored the internal characteristics and external popularity of the women's health topic from 2010 to 2017. Four

Data Collection
The data collection process of this study comprised entry collection, text collection, and the collection of page views and edits data from Wikipedia. Figure 2 illustrates the entire data collection process. Wikipedia is regarded as a social media platform and its content is open to the public. It does not include any private data of editors and users. This study does not involve any human subjects and the topic is not sensitive. Therefore, it was exempt from ethics approval.

Entries Collection
To explore the evolution of the women's health topic on social media, Wikipedia was selected as the data source in this study. Milne and Witten argued that Wikipedia is a rapidly growing platform containing vast interlinked information [15]. The richness of its content makes it an important resource for knowledge sharing and citation, and even for research. The history of each entry on Wikipedia is accessible to users, which means that all the historical versions of the entry are recorded by Wikipedia and could be viewed and collected by researchers.
Two methods were applied to retrieve entries related to women's health on Wikipedia. For the first method, the entries in the "See also" section of the women's health entry and the entries in the "See also" sections of these related entries were regarded as the associated entries, since the "See also" section of an entry contains the relevant entries selected by editors. For the second method, the term "women's health" was used as the search term to retrieve related entries on Wikipedia. The search results returned were ranked by relevance, and the top 100 search results returned were examined by the researcher. The associated entries, which were not the same as the entries obtained by the first method, were collected.
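The two retrieval methods described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the study (which relied on R and manual review): the `SEE_ALSO` dictionary, the entry titles, and the search result list are hypothetical stand-ins, and the researcher's manual relevance screening of the search results is not modeled.

```python
def collect_associated_entries(seed, see_also):
    """Return the entries reachable from `seed` within two "See also" hops.

    `see_also` maps an entry title to the titles listed in its
    "See also" section (a hypothetical, hand-made structure here).
    """
    first_level = set(see_also.get(seed, []))
    second_level = set()
    for entry in first_level:
        second_level.update(see_also.get(entry, []))
    # The seed entry itself is not counted as an associated entry.
    return (first_level | second_level) - {seed}


def merge_search_results(associated, search_results, limit=100):
    """Add the top `limit` search hits that the first method did not find."""
    extra = [title for title in search_results[:limit] if title not in associated]
    return set(associated) | set(extra)


# Toy data (hypothetical): "Obstetrics" is three hops away and is not collected.
SEE_ALSO = {
    "Women's health": ["Maternal health", "Reproductive health"],
    "Maternal health": ["Prenatal care", "Women's health"],
    "Reproductive health": ["Family planning"],
    "Prenatal care": ["Obstetrics"],
}
SEARCH_RESULTS = ["Gender disparities in health", "Maternal health"]

entries = collect_associated_entries("Women's health", SEE_ALSO)
all_entries = merge_search_results(entries, SEARCH_RESULTS)
```

Keeping the two methods in separate functions mirrors the text: the second method only contributes entries that the first one missed.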

Text Collection
Every historical version of a Wikipedia entry consists of several sections. Although not all entries consist of the same sections, certain sections are included in almost all entries: the title, links to other entries associated with the entry, a short description of the entry, the content section, the main body, "See also", and the reference section. The content section holds the table of contents of an entry, the main body holds its main text, and the reference section consists of the references of an entry and their URLs.
For each of the associated entries, the text data of the current version and the last version generated in 2011, 2013, 2015, and 2017 were collected based on the time periods determined. For each historical version, the text of all the sections of each entry was collected. The WikipediR package for R, developed by Oliver Keyes, was adopted for text data collection [17]. It enables the researcher to retrieve and gather the text content of an entry's current and historical versions. R is developed by The R Foundation for Statistical Computing, which is based in Vienna, Austria [18].

Page Views and Edits Data Collection
The page views data from 2010 to 2017 were collected from Wikimedia Downloads. This website provides the Wikipedia data dumps, which store the historical page views data of all Wikipedia entries since January 2010. The page edits data were collected from the view history pages of the associated entries using R and RStudio. RStudio is developed by the RStudio Team in Boston, MA, USA [19].

Categories and Themes
The entries obtained relate to different aspects of the topic. In order to explore the relations among the entries, they were grouped into several categories in terms of their content. Since there were no existing categories for these entries, the open coding method was employed to analyze the associated entries and group them into several categories. In this study, every category had only one theme.

Text Data Processing
The subjects of every theme were extracted from the entries belonging to it by clustering and text mining approaches. To apply these approaches, the text data obtained for the themes were cleansed and transformed first.
The open source software R and RStudio and the tm package were adopted for text data cleansing, transformation, and processing. Punctuation, stop words, meaningless words (e.g., numbers, dates, equations, and so on), and words whose frequencies were less than 4 were removed. For each theme, a document-term matrix (Equation (1)) of the vector space model was constructed. The matrix has m rows and n columns, where m is the number of entries and n is the number of terms. The value of the cell $a_{ij}$ in the matrix represents the frequency of the term j in the entry i.
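Written out from the definitions above, a document-term matrix of this kind has the following general shape. The symbol $A$ is an assumed name, and the form is a sketch consistent with the prose rather than a reproduction of the study's Equation (1).

```latex
A =
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix},
\qquad
a_{ij} = \text{frequency of term } j \text{ in entry } i .
```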
Then, the document-term matrices were transformed to Term Frequency × Inverse Document Frequency (TF-IDF) matrices. Each value of the TF-IDF matrices ($w_{ij}$) was calculated based on Equation (2). In this equation, m is the number of the entries of a matrix, and $e_j$ represents the number of the entries containing the term j in the matrix. The TF-IDF matrices obtained were the input matrices for the following clustering process.
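As an illustration of this transformation, the sketch below builds a small document-term matrix and weights it with one common TF-IDF form, $w_{ij} = a_{ij} \cdot \log(m / e_j)$. That formula is an assumption chosen to match the definitions of m and $e_j$ above; it is not necessarily identical to the study's Equation (2). The study used the R tm package, and the toy documents and function names here are hypothetical.

```python
import math
from collections import Counter


def document_term_matrix(docs):
    """Return (vocabulary, matrix); matrix[i][j] is the count of term j in document i."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


def tf_idf(matrix):
    """Weight each count a_ij by log(m / e_j) (assumed TF-IDF form)."""
    m = len(matrix)       # number of entries (rows)
    n = len(matrix[0])    # number of terms (columns)
    # e[j]: number of entries that contain term j at least once
    e = [sum(1 for row in matrix if row[j] > 0) for j in range(n)]
    return [[row[j] * math.log(m / e[j]) for j in range(n)] for row in matrix]


# Toy documents (hypothetical), standing in for cleaned entry text.
DOCS = [
    "maternal health care",
    "reproductive health rights",
    "maternal mortality rate",
]
vocab, dtm = document_term_matrix(DOCS)
weights = tf_idf(dtm)
```

A term that occurs in only one entry (such as "care") receives the largest weight per occurrence, while a term present in every entry would receive a weight of zero.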

SOM Approach
To cluster the entries of the categories, the Self-Organizing Map (SOM) approach was employed in this study. It is a widely used neural network method that measures similarities among items of input data so as to form similarity graphs. The whole procedure of this approach is a recursive regression process [20]. The input matrices are the TF-IDF matrices created for the categories.
The output of the SOM approach is an output display map. Similar entries were assigned to the same node on the output display map. A U-matrix was used for projecting the clustering results to SOM displays. Every entry was projected to the SOM display as a number. Numbers with shorter distances among them were more similar than those with longer distances. Moreover, the similarity among entries was indicated by the color of an SOM output. The color projected to the SOM display background was determined by a U-matrix [21]. Higher values of the U-matrix stood for cluster borders, while lower values represented clusters.
According to the distances between numbers and the background colors, the entries of each matrix were clustered. The criteria for clustering the numbers are as follows: (1) the numbers located in the same SOM node were grouped into one cluster; and (2) if the numbers are located in two or more nodes, and the nodes are adjacent, or separated by only one empty node, and at the same time, the numbers are located in the same area where the U-matrix values are lower than half of the highest U-matrix value of the matrix, then these numbers were grouped into one cluster. In this way, the entries of each category were assigned into several clusters.
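To make the role of the U-matrix more concrete, the sketch below computes one for a small, hand-made SOM codebook and marks the nodes whose U-matrix value is lower than half of the highest value, the threshold used in criterion (2). Several assumptions apply: the codebook values are hypothetical, no SOM training is performed, and the U-matrix is taken as the mean Euclidean distance from each node to its 4-connected grid neighbours, which is one common definition and may differ from what the study's software computes.

```python
import math


def u_matrix(codebook):
    """Mean distance from each node's weight vector to its grid neighbours.

    `codebook[r][c]` is the weight vector of the node at row r, column c.
    """
    rows, cols = len(codebook), len(codebook[0])
    result = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dists.append(math.dist(codebook[r][c], codebook[nr][nc]))
            # A 1 x 1 grid has no neighbours; keep 0.0 in that case.
            result[r][c] = sum(dists) / len(dists) if dists else 0.0
    return result


def low_value_nodes(umat):
    """Flag nodes whose U-matrix value is below half of the maximum value."""
    threshold = max(max(row) for row in umat) / 2
    return [[value < threshold for value in row] for row in umat]


# Hypothetical 1 x 7 codebook: two groups of similar nodes with a border between them.
CODEBOOK = [[(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (5.0, 0.0),
             (9.8, 0.0), (9.9, 0.0), (10.0, 0.0)]]

umat = u_matrix(CODEBOOK)
low = low_value_nodes(umat)
```

In this toy grid the middle nodes have high U-matrix values and act as a cluster border, while the nodes at both ends fall in low-value regions, which is the pattern the clustering criteria rely on.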

Subject Analysis
To identify the subjects of the clusters and categories, the n-gram approach was employed. The n-gram package offered by R extracts the n-word phrases in unstructured text files.
The historical revisions of the entries in one cluster were merged into one document. For each category, the historical revisions in a specific period of its entries were also merged into one document. The most used 2-word, 3-word, and 4-word phrases in each document were extracted by the n-gram package. The set phrases and meaningless phrases (e.g., "of the" and "the study is a") were removed from the dataset. If a phrase was a part of another one and the two phrases had the same meaning, then they were regarded as one phrase, and their frequencies were added together (e.g., "child development index" and "the child development index"). After data processing, a list containing phrases and frequencies was obtained for each document.
The researcher manually reviewed the lists to summarize the subjects of each document. One phrase could relate to more than one subject, and different phrases could relate to the same subjects. In this way, the subjects of each cluster were generated, and the subjects in each period of each theme were generated.
To find the increasing and decreasing phrases in each category, the differences between the frequencies of each phrase in adjacent periods were calculated. After all the frequency differences were obtained for each category, the researcher reviewed the phrases with the largest increases and decreases to generate the subjects.
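The phrase counting and the period-to-period comparison described in this section can be sketched as follows. This is a simplified illustration, not the study's R workflow: the period texts are hypothetical, and the removal of set phrases and meaningless phrases, which the researcher did by review, is approximated here by an assumed heuristic that drops n-grams starting or ending with a stop word.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count the n-word phrases in a whitespace-tokenized, lowercased text."""
    tokens = text.lower().split()
    grams = (" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return Counter(grams)


def drop_stop_phrases(counts, stopwords):
    """Assumed heuristic: remove phrases that start or end with a stop word."""
    return Counter({
        gram: count for gram, count in counts.items()
        if gram.split()[0] not in stopwords and gram.split()[-1] not in stopwords
    })


def frequency_changes(earlier, later):
    """Frequency difference (later minus earlier) for every phrase seen in either period."""
    phrases = set(earlier) | set(later)
    return {phrase: later[phrase] - earlier[phrase] for phrase in phrases}


# Hypothetical merged documents for two adjacent periods.
STOPWORDS = {"and", "of", "the"}
PERIOD_A = "birth control access and birth control policy"
PERIOD_B = "birth control policy and family planning and family planning services"

counts_a = drop_stop_phrases(ngram_counts(PERIOD_A, 2), STOPWORDS)
counts_b = drop_stop_phrases(ngram_counts(PERIOD_B, 2), STOPWORDS)
changes = frequency_changes(counts_a, counts_b)
fastest_growing = max(changes, key=changes.get)
```

Sorting `changes` by value gives the candidate lists of increasing and decreasing phrases that a reviewer would then inspect by hand.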

Inferential Analysis
Inferential statistical tests allow the researcher to gain insights into the differences among the objects. In addition to descriptive statistical methods, inferential statistical analysis was applied to test the differences among the determined periods for the women's health topic. The hypotheses are:
• H01: There were no significant differences among the investigated time periods in terms of the number of views of the entries relevant to women's health.
• H02: There were no significant differences among the investigated time periods in terms of the number of edits of the entries relevant to women's health.
The independent variable (time period) was categorical, the dependent variables (number of views and number of edits) were continuous with repeated measures, and the distributions of the dependent variables did not follow the normal distribution. Therefore, Friedman's test was applied to test the differences among the periods. To explore the difference between every pair of periods, a series of pairwise comparisons was conducted. Since the distribution of the differences between every pair of periods was not symmetrical, the sign test was used. The significance level of the inferential statistical tests was 0.05.
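For readers unfamiliar with these two tests, the sketch below computes the Friedman test statistic and an exact two-sided sign test using only the Python standard library. It is an illustrative example under assumptions, not the study's analysis: the view counts are hypothetical, the Friedman statistic omits the tie correction, and its p-value (obtained by comparing the statistic with a chi-square distribution with k - 1 degrees of freedom) is not computed here.

```python
import math


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def friedman_statistic(table):
    """Friedman chi-square statistic for rows = entries, columns = periods (no tie correction)."""
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    return 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)


def sign_test_p(before, after):
    """Exact two-sided sign test p-value; zero differences are discarded."""
    diffs = [b - a for a, b in zip(before, after) if b != a]
    n = len(diffs)
    k = min(sum(d > 0 for d in diffs), sum(d < 0 for d in diffs))
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical yearly page views: 5 entries (rows) in 3 periods (columns).
VIEWS = [
    [10, 20, 30],
    [12, 25, 31],
    [8, 15, 40],
    [20, 22, 35],
    [5, 9, 14],
]
chi_square = friedman_statistic(VIEWS)
p_first_vs_last = sign_test_p([row[0] for row in VIEWS], [row[2] for row in VIEWS])
```

The sign test uses only the direction of each paired difference, which is why it remains valid when the distribution of the differences is not symmetrical.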

Entries and Themes
According to the data collection and analysis strategy, 207 associated entries were obtained, and four themes were generated from them. Table 1 lists the themes of the women's health topic, the number of entries related to each theme, and the description of each theme. The Support and protection (WH-SP) theme had the most relevant entries (99 entries) among the four themes, which suggests that the general public cared more about the protection of women's health than about the other themes. Table 1 describes this theme as entries related to policies, laws, research studies, literary and artistic work, treatments, people, and organizations that support and protect women and improve women's health.
Table 2 presents the number of entries created from 2010 to 2017 for each theme. The number of entries rose steadily. The WH-SP theme contributed the most to the increase among the four themes every year from 2010 to 2017. In 2010 and 2015, more new entries were created for this theme than in the other years. Another special case is the MIS theme, for which 4 new entries were generated in 2013, more than in any other year.
Figure 4 displays the Numbers of Yearly Page Views (NYPVs) of the four themes of the women's health topic, as well as the NYPV of the entire topic. The total page views decreased from 2010 to 2011, increased from 2011 to 2013, and then decreased again. The trends of the four themes were similar to the trend of the entire women's health topic. The Support and protection theme and the Medical and interdisciplinary subjects theme ranked in the top two places among the four themes from 2010 to 2017. The Health problems and risks theme occupied third place from 2010 to 2012 but fell to last place from 2013 onward. The trend of the Discrimination, violence, harm, and subordination theme was slightly different from the other three themes, because its NYPV increased rapidly from 2010 to 2013. However, the decrease of its NYPV from 2013 to 2017 was similar to that of the other themes and of the women's health topic.
For each theme and the entire topic, the NYPE trend differed considerably from the NYPV trend from 2010 to 2017. No association was found between the NYPEs and the NYPVs. This indicates that the user groups who created the page edits and the page views had different interests in the investigated time periods. The Wikipedia editors were more interested in Support and protection, and Discrimination, violence, harm, and subordination, while the Wikipedia viewers were more interested in Support and protection, and Medical and interdisciplinary subjects.
In the SOM displays, every number stands for an entry, and the corresponding entry of each number is presented in Appendix A. Every rectangle or polygon represents a cluster, and the numbers in the same rectangle/polygon represent the entries belonging to the same cluster. The numbers not included in any rectangle/polygon stand for the isolated entries, which were not grouped into any cluster. The clusters with more than three entries were recognized as large clusters, while those with three or fewer entries were the small clusters. The large clusters were represented by purple rectangles or polygons. The small clusters were represented by red rectangles.

Subject Analysis Results and Discussion
Tables 3-6 display the high-frequency terms and phrases, as well as the subjects discovered within each large cluster. The high-frequency terms/phrases were extracted from the entries by the n-gram approach. They are displayed in the second column of each table, and the frequency of each term/phrase is given in the brackets following it. The researcher proposed the subjects of each large cluster by examining its high-frequency terms and phrases. The small clusters and isolated entries were not included in these tables. Clusters C3 and C4 were both located in the same blue area. Therefore, the entries in these two clusters were relevant to one another. Table 3 lists the clusters and their high-frequency terms, phrases, and subjects.
The minority group subject occurred in all four clusters of this theme, so it was the most dominant subject of the DVHS theme. The minority groups mentioned in the entries associated with this subject included LGBT people, women, and African Americans (black).
The inequality and discrimination subject and the abuse and violence subject appeared in three clusters. The former subject had three lower-level subjects, which were health care inequality and discrimination, inequality and discrimination in work, and inequality and discrimination in research. Health care inequality and discrimination was reflected by the "missing women" phenomenon. As described in the "Missing women" entry, this phenomenon indicated that the number of women in a region was smaller than the expected number of women, which was caused by sex-selective abortion, female infanticide, and inadequate health care and nutrition for female children.
Women protection was another subject of the DVHS theme. Its associated entries were about the organizations (e.g., the Supreme Court of the United States) and research (e.g., the triple oppression theory) of women protection.

The Health Problems and Risks (WH-HPR) Theme
Figure 6 shows that two clusters were generated for the WH-HPR theme. These two clusters were close to each other and in the same blue area, which indicates that the entries in the two clusters shared some similarities. Table 4 presents the high-frequency terms/phrases and subjects of the two clusters.

C1: blood pressure (20), high blood pressure (11), bacterial vaginosis (7), passive partner (7), active partner (6), chronic hypertension (5)
C2: [...] (15), women's health (14). Subjects: health issue (service, research, organization, problem), inequality and discrimination (research)
The WH-HPR theme had only two subjects: health issues, and inequality and discrimination. Cluster C1 mainly concentrated on health problems, such as high blood pressure and sexually transmitted infections. In this cluster, a frequently used synonym of high blood pressure was found, which was hypertension. The term "hypertension" was often used by health professionals, and the term "high blood pressure" was usually used by lay people. Since Wikipedia is a user-generated platform, both expressions were used in the Wikipedia entries.
The health issue subject of Cluster C2 had four lower-level subjects, including health service (e.g., health care), research (e.g., medical anthropology), organization (e.g., the World Health Organization), and problem (e.g., heart disease). Another subject of this cluster was inequality and discrimination, and this subject had a lower-level subject, research. For example, an entry of this subject was about the "Gender polarization" concept proposed by American psychologist Sandra Bem [22].

The Medical and Interdisciplinary Subjects (MIS) Theme
Figure 7 shows that four large clusters and two small clusters emerged from the entries of the MIS theme. Two of the four clusters were either not close to each other or had green areas between them. These results show that the entries of the four clusters were not closely related. Table 5 lists the clusters, high-frequency terms, phrases, and subjects.
C1: [...] (14), medical care (14), birth control (14). Subjects: health issue (service, problem, research, organization), family planning and reproduction, abuse and violence (structural violence)
C3: Whitehall Study (25), pelvic floor (23), reproductive health (21), Whitehall II (16), health care (16), reproductive rights (13), heart disease (13), Russian women (13), coronary heart disease (10), reproductive law and policy (8), Center for Reproductive Law (8), women's health (7), civil servants (6), social class (6), mortality rate (6), live births (6), risk factors (6), social determinants (6), blood pressure (6)
C4: [...] (14), health services (14), prenatal care (13), developing countries (12), United Nations (12), United States (12). Subjects: health issue (problem, service, organization), population issue (problem), family planning and reproduction
The health issue subject and the family planning and reproduction subject appeared in all four clusters, but each cluster of the MIS theme also had its own unique subject: C1 had the abuse and violence subject, C2 had the minority group subject, C3 had the social factor subject, and C4 had the population issue subject. For the abuse and violence subject, a new lower-level subject emerged from this theme, which was the structural violence subject. Different from the previous types of violence, structural violence was caused by social structure or social institution.

The Support and Protection (WH-SP) Theme
Figure 8 shows that eight large clusters and five small clusters were discovered for the WH-SP theme. Clusters C1 to C7 were all located in the same blue area, which means that their entries were similar to some extent. Cluster C8 lay in another blue area, and the yellow and green areas separated it from the other clusters, which means that its entries had no strong connections with the entries of the other clusters. The eight large clusters and their high-frequency terms/phrases and subjects are displayed in Table 6.
Table 6. Subject Analysis of WH-SP.

C1
healthy people (17), Department of Health (12), black women (9), health and human services (9), women's health (8), disease prevention (6), health promotion (5), Human Services Office (5), Office of Disease Prevention (5) Health issue (organization, problem, service), minority group (black) C2 women's health (42), Center for Women's Health (14), health sciences (13), Health Centre (10), women's hospital (10), AnMed Health (7), health services (7), OHSU Center (6), women and newborns (6) Table 6 shows that the health issue subject appeared in the first seven clusters (Clusters C1 to C7) of the WH-SP theme, more than any other subjects of this theme. Therefore, the health issue subject was the salient subject of the WH-SP theme. This subject had several lower-level subjects, including health organizations (e.g., OHSU Center for Women's Health), services (e.g., obstetric and neonatal nurses), research (e.g., women's studies journals), problems (e.g., heart disease), education (e.g., Performance Indicators), and laws (e.g., Social Security Act). The health education subject of this theme covered the content about the performance indicators that were used for student assessment. The health law subject, which only occurred in this theme, referred to the entries of health-related laws and policies, such as the Social Security Act and the policies developed by the European Institute of Women's Health.
The minority group subject, the population issue subject, the family planning and reproduction subject, and the woman protection subject each only appeared in one cluster, respectively. Different from the other three subjects, the woman protection subject did not occur together with the health issue subject, which indicates that there was no strong connection between C8 and the other seven clusters. The entries in C8 were relevant to woman protection activities and research. For example, many theorists proposed a series of feminism theories (e.g., liberal feminism and gender feminism) so as to fight against gender inequality. A certain instance was the history of women fighting for equal smoking rights. Table 7 lists the subjects of the identified themes and shows the relations between the themes and subjects. It reveals that the DVHS theme, the MIS theme, and the WH-SP theme had more diverse subjects compared with the WH-HPR theme. In other words, the entries' subjects of the WH-HPR theme were more centralized than those of the other themes. Among the four themes, the MIS theme and the WH-SP theme had more common subjects. Meanwhile, every two of the four themes had one or more subjects in common with each other, which indicates that these themes were relevant to each other.  Table 6 shows that the health issue subject appeared in the first seven clusters (Clusters C1 to C7) of the WH-SP theme, more than any other subjects of this theme. Therefore, the health issue subject was the salient subject of the WH-SP theme. This subject had several lower-level subjects, including health organizations (e.g., OHSU Center for Women's Health), services (e.g., obstetric and neonatal nurses), research (e.g., women's studies journals), problems (e.g., heart disease), education (e.g., Performance Indicators), and laws (e.g., Social Security Act). The health education subject of this theme covered the content about the performance indicators that were used for student assessment. 
The health law subject, which only occurred in this theme, referred to the entries of health-related laws and policies, such as the Social Security Act and the policies developed by the European Institute of Women's Health.
The minority group subject, the population issue subject, the family planning and reproduction subject, and the woman protection subject each appeared in only one cluster. Unlike the other three subjects, the woman protection subject did not occur together with the health issue subject, which indicates that there was no strong connection between C8 and the other seven clusters. The entries in C8 were relevant to woman protection activities and research. For example, many theorists proposed a series of feminism theories (e.g., liberal feminism and gender feminism) to fight against gender inequality. One such instance was the history of women fighting for equal smoking rights.
Table 7 lists the subjects of the identified themes and shows the relations between the themes and subjects. It reveals that the DVHS theme, the MIS theme, and the WH-SP theme had more diverse subjects than the WH-HPR theme. In other words, the subjects of the entries in the WH-HPR theme were more centralized than those of the other themes. Among the four themes, the MIS theme and the WH-SP theme shared more subjects with each other than any other pair of themes. Meanwhile, every two of the four themes had one or more subjects in common, which indicates that these themes were related to each other.

Entry Growth
New entries created in a certain period reflect the Wikipedia editors' new interests and focuses during the period. According to Table 2, among the four themes of women's health, the WH-SP theme had many more new entries in the four investigated periods than the other three themes.
A review of all the newly created entries from 2010 to 2017 reveals that the new entries in the DVHS theme were related to sexism, such as sexism in the workplace (e.g., Women in law enforcement) and sexism in specific regions (e.g., Discrimination against girls in India). The entries in the WH-HPR theme and the MIS theme focused on women's health issues, including the research of women's health issues (e.g., Women's health issues), the determinants of health issues (e.g., Social determinants of health in poverty), and women's health status in specific regions (e.g., Women's reproductive health in Russia).
The new entries in the WH-SP theme concentrated on the techniques and methods (e.g., Gynography), research (e.g., Black Women's Health Study), organizations (e.g., EuroHealthNet), training and education (e.g., Oregon Health and Science University Center for Women's Health), services (e.g., Midwife), and works of art (e.g., The Honest Body Project) that aimed to support and protect women and improve women's health.

Changes of Subjects
To explore the internal characteristic evolution of each selected topic, the changes of the subjects from one period to another were explored. For each theme of a selected topic, the frequency difference of each term/phrase from one period to the next period was calculated. The terms/phrases of each theme were ranked according to their frequency differences, and the terms/phrases whose frequencies increased or decreased the most from one period to the next were extracted from the rankings. Tables 8-11 display the top 20 terms/phrases of the rankings, and only the terms/phrases whose frequencies increased or decreased by more than 4 are included in these tables. The subjects relevant to the terms/phrases are also included in the tables. The numbers in each table show the frequency differences. If a term's frequency decreased from Periods 1 to 2, its frequency difference would be negative, and vice versa.
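The ranking procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the example counts, and the choice to rank by absolute change are all hypothetical; only the threshold (a change of more than 4) and the top-20 cut-off come from the text.

```python
from collections import Counter


def frequency_differences(prev_counts, next_counts):
    """Return {term: next - prev} for every term seen in either period."""
    terms = set(prev_counts) | set(next_counts)
    return {t: next_counts.get(t, 0) - prev_counts.get(t, 0) for t in terms}


def top_changes(prev_counts, next_counts, top_n=20, min_abs_change=4):
    """Keep terms whose frequency changed by more than min_abs_change,
    ranked by the size of the change (assumption: absolute value),
    with alphabetical order breaking ties."""
    diffs = frequency_differences(prev_counts, next_counts)
    kept = [(t, d) for t, d in diffs.items() if abs(d) > min_abs_change]
    kept.sort(key=lambda td: (-abs(td[1]), td[0]))
    return kept[:top_n]


# Hypothetical term counts for two adjacent periods of one theme.
period_1 = Counter({"sexual assault": 3, "pay gap": 12, "gender roles": 5})
period_2 = Counter({"sexual assault": 16, "pay gap": 4,
                    "gender roles": 7, "rape culture": 18})

changes = top_changes(period_1, period_2)
for term, diff in changes:
    print(f"{term}: {diff:+d}")
```

In this sketch a term that is absent from one period is treated as having frequency 0 there, so newly appearing terms show up as pure increases and a negative difference marks a term that lost frequency.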
1. The Discrimination, violence, harm, and subordination (DVHS) theme
Table 8 presents the terms/phrases whose frequencies changed the most from one period to the next in the DVHS theme. During all the periods, the terms about abuse and violence, inequality and discrimination, and minority group kept growing. One lower-level subject of abuse and violence, sexual violence, increased in all the periods, and another lower-level subject, domestic violence, increased from Periods 1 to 2 and from Periods 3 to 4. The growth of the domestic violence subject was mainly driven by the increase of the terms relevant to female genital mutilation.
The content about inequality and discrimination focused on different aspects in different time periods. For instance, the interest in inequality in society, economy, and work increased from Periods 1 to 2, while from Periods 2 to 4, the interest in inequality in health care grew. For the minority group subject, the content about LGBT people increased in all the investigated periods. An examination of the high-frequency terms/phrases about LGBT shows that "transgender people" occurred the most. In other words, from Periods 1 to 4, the Wikipedia editors paid increasing attention to the LGBT group, especially transgender people.

High-frequency terms and phrases: [...] (21), transgender people (20), age discrimination (18), rape culture (18), Glick P (16), World Health Organization (16), genital cutting (14), Oxford University (14), South Africa (14), gender roles (13), sexual assault (13), Islamic law (13)
Subjects: Abuse and violence (domestic violence, sexual violence), inequality and discrimination (society, economy, work), minority group (LGBT), woman protection (organization), health issue (organization)

High-frequency terms and phrases: [...] (13), women's rights (11), New York (9), gender bias (9), rape victims (9), global gender gap (9), gender roles (9), violence against women (9), labor force (9), sex differences (9), pay gap (9), gender gap report (9)
Subjects: Inequality and discrimination (health care, work), abuse and violence (sexual violence, domestic violence), minority group (LGBT, woman), woman protection

Table 9. Changes of Subjects in the Four Periods in the WH-HPR Theme.

Time Period High-Frequency Terms and Phrases Subjects
The increasing terms of the health issue subject were related to different aspects in different periods. From Periods 1 to 2, the increasing terms covered various lower-level subjects, including health problems, health organizations, health services, and causes of health problems. Among these lower-level subjects, only health problems and health services attracted more attention than before in the next period. From Periods 3 to 4, a new lower-level subject emerged, which was health research.
Table 10 shows that the increasing terms covered more and more subjects as time went by in the MIS theme. From Periods 1 to 2, the terms were relevant to health issue and family planning and reproduction, which means that the Wikipedia editors' interests focused on these two subjects. In Period 3, a new interest in the population issue emerged. In Period 4, in addition to the previous subjects, the Wikipedia editors had two more interests: violence, and inequality and discrimination.
Table 11 illustrates that the terms about health issue and woman protection kept increasing from Periods 1 to 4, although in different periods, these terms focused on different aspects of the two subjects. For instance, the terms about treatment only increased from Periods 1 to 2, while the terms about health education increased from Periods 1 to 3.

The Support and protection (WH-SP) theme
The woman protection subject had three lower-level subjects, which were politics, health, and education. The terms about politics increased from Periods 2 to 4, which indicates that the Wikipedia editors had an increasing interest in this subject in recent years. Furthermore, examination of the terms about politics demonstrates that the Wikipedia editors' interest increased the most in women's suffrage.
Table 12 summarizes the growing, diminishing, and fluctuating subjects of the Women's Health topic from 2010 to 2017. The growing/diminishing subjects were the subjects whose associated terms and phrases kept increasing/decreasing during the investigated periods. In other words, the growing/diminishing subjects attracted increasing/decreasing attention during the investigated periods. The fluctuating subjects were the subjects whose associated terms and phrases increased in some periods but decreased in other periods. The terms/phrases associated with the minority group subject kept increasing from Periods 1 to 4. In other words, the Wikipedia editors paid increasing attention to the minority groups from 2010 to 2017.

Changes of External Popularities
The external popularity of a topic/theme was defined as the numbers of page edits and page views of its associated entries. The Friedman test was applied to test for differences among the four periods. Table 13 presents the results. The results show that H01 was rejected, whereas H02 was not. This means that: (1) there were significant differences among the four periods in terms of the number of page views; (2) there were no significant differences among the four periods in terms of the number of page edits.
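For readers unfamiliar with the Friedman test, the following is a minimal sketch of how its statistic can be computed for repeated measurements of the same entries over four periods. It is not the authors' analysis: the page-view numbers are invented, the helper names are hypothetical, no tie correction is applied, and the p-value helper is valid only for 3 degrees of freedom (four periods).

```python
import math


def average_ranks(values):
    """Rank values from 1 to k; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(rows):
    """rows: one list per entry with one value per period (no tie correction)."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Hypothetical page views of five entries in Periods 1 to 4.
views = [
    [120, 340, 330, 150],
    [80, 210, 220, 90],
    [300, 520, 500, 310],
    [45, 130, 125, 60],
    [10, 95, 90, 20],
]
statistic = friedman_statistic(views)
p_value = chi2_sf_df3(statistic)
print(statistic, p_value)
```

The test ranks the four periods within each entry, so entries with very different absolute numbers of page views contribute equally; this is one reason a rank-based test suits heavily skewed page-view data.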
The Sign Test was used to explore the differences between every two periods. The comparisons were intended to reveal the differences from one period to the next in order to show the temporal changes of external popularities. Hence, only the adjacent periods were compared. Since the result of H02 was not significant, no follow-up test was conducted for this hypothesis. The results of the follow-up tests for H01 are presented in Table 14.
Table 14 shows that there were significant differences between Periods 1 and 2 and between Periods 3 and 4, but no significant difference was found between Periods 2 and 3 in terms of the number of page views. The detailed results of the pairwise comparisons show that the number of page views of Period 2 was larger than that of Period 1 (129 positive signs versus 34 negative signs) and the number of page views of Period 4 was smaller than that of Period 3 (53 positive signs versus 139 negative signs). Therefore, the number of page views of the associated entries in women's health grew from Periods 1 to 2, remained stable from Periods 2 to 3, and dropped from Periods 3 to 4. These findings reveal that the Wikipedia editors' interests in women's health did not change quickly from 2010 to 2017, while the Wikipedia viewers' interests in this topic grew rapidly from Periods 1 to 2 but dropped quickly from Periods 3 to 4, which suggests that the two groups consisted of different people.
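The sign counts reported above are enough to illustrate how a sign test reaches its conclusion. The sketch below computes an exact two-sided binomial p-value from the numbers of positive and negative signs; it assumes that ties were already excluded from the counts, and it may differ from the authors' software, which could have used a normal approximation. The function name is hypothetical.

```python
from math import comb


def sign_test_p_value(n_positive, n_negative):
    """Exact two-sided sign test p-value under H0: P(positive) = 0.5.
    Ties are assumed to have been dropped before counting."""
    n = n_positive + n_negative
    k = min(n_positive, n_negative)
    # Probability of a split at least as extreme as k on the smaller side.
    tail = sum(comb(n, i) for i in range(k + 1))
    return min(1.0, 2 * tail / 2 ** n)


# Sign counts reported in the text for the page views.
p_periods_1_vs_2 = sign_test_p_value(129, 34)
p_periods_3_vs_4 = sign_test_p_value(53, 139)
print(p_periods_1_vs_2, p_periods_3_vs_4)
```

Both splits are far from an even division of signs, which is consistent with the significant differences reported for Periods 1 and 2 and for Periods 3 and 4.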

Conclusions
This study reveals the evolution characteristics of the women's health topic on Wikipedia. Two hundred and seven associated entries of women's health were retrieved from Wikipedia, and four themes emerged from these entries: (1) Discrimination, violence, harm, and subordination; (2) Health problems and risks; (3) Medical and interdisciplinary subjects; and (4) Support and protection. This indicates that the Wikipedia editors focused on these four aspects of women's health.
In terms of internal characteristics, the women's health content on Wikipedia kept increasing from 2010 to 2017. The subjects became increasingly diverse as time went by. The editors paid more and more attention to abuse and violence, family planning and reproduction, health issue, inequality and discrimination, minority group, and woman protection, while their interests in economy and politics decreased. When a subject changed quickly in certain periods, the change was usually caused by social events or social issues.
In terms of external popularity, the overall popularity of the women's health topic declined from 2010 to 2017, contrary to the growth of its content and the growth of online health information seeking. The themes identified in this study had similar popularity trends among the Wikipedia viewers. Their popularity grew rapidly from Periods 1 to 2, remained stable from Periods 2 to 3, and fell dramatically from Periods 3 to 4. However, the popularity trends among the viewers were not consistent with those among the editors. Therefore, the two groups were not composed of the same members.
The results show that no association was found between the internal characteristic evolution and the external popularity evolution of the women's health topic on Wikipedia. The content generation or change of Wikipedia entries had no impact on the Wikipedia users.
The findings can help health professionals, caregivers, and general users gain a more comprehensive understanding of women's health information on social media by illustrating the entries, subjects, and themes of women's health discussed on Wikipedia and the relationships among them. Exploring the women-health-related themes and subjects will contribute to the development of health ontologies and consumer health vocabularies and assist Website designers in organizing online women's health information. Revealing the temporal features of the women's health topic can support the temporal information retrieval of women-health-related information.
There are plenty of health-related topics on social media, such as women's health, men's health, and children's health, which are worthy research topics. However, because of the limitations of time and paper length, it is difficult to investigate all the related topics on different social media platforms in one study. In future research, the researchers will explore the characteristics of more health topics on Wikipedia and other social media platforms, and compare different health topics and health information on different platforms.

Conflicts of Interest:
The authors declare no conflict of interest. The founding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, and in the decision to publish the results.