1. Introduction
Diabetes mellitus (a metabolic disease), tuberculosis (a contagious infection) and thyroid disease are major chronic diseases that affect billions of people every year. Over the past decades these diseases have rapidly increased death rates, and they act as a gateway to several other diseases by weakening the human immune system. According to the World Health Organization, 422 million people are affected by diabetes, and 1.6 million deaths occur each year due to diabetes and tuberculosis [1]. A survey published in a BioMed Central (BMC) public health journal indicates that lower levels of thyroid hormones increase the risk of diabetes mellitus.
Diabetes mellitus is a metabolic disease in which blood glucose levels are abnormally high. Insulin, a hormone produced by the pancreas, is responsible for lowering the glucose level in the blood. Insufficient production of insulin, absence of insulin and the inability of the body to use insulin properly are the major causes of diabetes [2]. Diabetes mellitus is categorized as type 1 (insulin-dependent or juvenile-onset diabetes) and type 2 (non-insulin-dependent or adult-onset diabetes) [3]. In the United States, diabetes is the seventh leading cause of death.
Tuberculosis (TB) is an infectious disease caused by the bacterium Mycobacterium tuberculosis (MTB). TB primarily affects the lungs but can also spread to other organs. It passes from one person to another through coughs, sneezes and saliva. TB is categorized into active TB disease, which may be pulmonary or extrapulmonary, and latent TB infection. The BCG vaccine acts as a barrier against this deadly disease. The WHO describes TB as an “epidemic” and states that tuberculosis is one of the leading causes of death from a single infectious agent [4].
The thyroid gland is a butterfly-shaped endocrine gland located in the neck. It produces the thyroid hormones that control various metabolic activities in the human body. An abnormal increase or decrease in thyroid hormone levels leads to thyroid disease, which is classified into hyperthyroidism (an overactive thyroid) and hypothyroidism (an underactive thyroid). Hashimoto's disease, Graves’ disease, thyroid nodules and goiter are the most prominent disorders of the thyroid. Thyroid disease is a persistent condition that is almost impossible to eradicate and usually remains with the patient for life [5].
Social media platforms support interactive computer-mediated technologies that enable users to share new information, ideas and opinions [6] with their communities. Online health communities (OHCs) and health care professionals (HCPs) active on social media are an emerging phenomenon that connects groups of individuals with similar health-related issues and interests [7,8,9,10,11]. Using this persuasive platform, HCPs clarify public health-related problems, explain health care policy and practice issues, promote public health programs, motivate patients and educate individuals by providing continuous support and service.
The information collected from MedHelp, e-Health, WebMD, Healthline, Medscape, Everyday Health and Health Central is helpful in identifying inter-relationships among commonly occurring diseases [12]. Keywords extracted from the collected information help patients and physicians to explore these chronic diseases, and the knowledge gathered from the keywords can help to reduce the death rate.
The analysis of 750 messages collected for four different chronic diseases reveals a diverse range of activities carried out by moderators [13,14]. Community development and the strengthening of local networks help to improve the quality of life (QOL) of older people and of patients with self-harming behavior who are affected by various diseases [15,16,17].
In [18], data were collected from a rural community in Zambia, and the analysis describes the experience and responsibility of mothers, who must satisfy cultural and health expectations during newborn care. The effects of climate change on human physical and mental health are explained through community content and thematic relationships in [19]. Text mining and science mapping techniques are used to analyze and interpret the results [20]. A systematic pharmacological method is combined with other data mining techniques to evaluate drug similarity [21].
Data mining techniques are applied to a dataset of MTA (Metropolitan Transportation Authority) customer feedback to enhance quality of service (QOS) and identify customer satisfaction levels [22]. Another tool uses text mining techniques to identify and interpret diagnostic patterns in a large free clinical dataset of patient notes [23]. The K-means++ algorithm has also been used to increase the accuracy of a recommendation system [24].
An improved K-means algorithm combined with dimensionality reduction was used to cluster Arabic text [25]. A K-means text clustering algorithm was used efficiently for spam detection [26]. In another study, text clustering was performed with a weighted K-means algorithm [27].
IoT and cloud services play a major role in extracting and visualizing data without human intervention [28,29,30,31,32].
The analysis reports of trusted health care organizations are an important source from which to find relationships between the mentioned chronic diseases. The online health community platform is recommended by physicians to obtain accurate knowledge about all diseases. The OHC texts play a vital role in the extraction of keywords and in finding inter-relationships between all three chronic diseases.
The objectives are designed in a way to emphasize social values and to eradicate lingering diseases, namely diabetes mellitus (DM), tuberculosis (TB) and thyroid disorders. The three prominent objectives are delineated as follows:
To extract the important keywords of each disease from each cluster.
To find inter-relationships among the three chronic diseases.
To measure the accuracy of the extracted keywords by comparing them with the reports of the world’s trusted organizations.
4. Discussion
The clusters are labelled manually for all three diseases, and the most prominent keywords of each cluster are tabulated. The most important keywords are extracted as a result of the LDA process, and the sample terms of each cluster are selected based on their inter-relationships across the three diseases. Keywords for the diseases are listed in Table 1, Table 2 and Table 3.
Side effects, Habits and Healthy Lifestyle are the clusters found in common across all three diseases. The relationship between the three chronic diseases is inferred from these shared clusters.
Side effects: The Side effects cluster groups the problems faced by patients of each disease. The interpretation is that patients of all three diseases face common health problems even though the causes of the diseases differ.
Habits: The Habits cluster describes the preventive activities that patients should carry out to avoid all three chronic diseases.
Healthy Lifestyle: The Healthy Lifestyle cluster describes the activities that patients should carry out to recover from the diseases and to prevent death.
These three clusters and their respective keywords depict the prominent inter-relationships between diabetes mellitus, tuberculosis and thyroid disease. A Venn diagram is used to analyze the common themes: it shows the themes shared by all three diseases and also illustrates the similarity between diabetes mellitus and thyroid disease. The diagram is shown in Figure 5. Although the causes and impacts of the three chronic diseases differ, the similarity of their clusters indicates that the diseases are inter-related.
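The Venn diagram interpretation above amounts to set intersections over the cluster labels of each disease. The following is a minimal sketch of that idea, not the procedure used in the study: the three shared labels come from the text, while the extra labels ("Medication", "Transmission") and the function names are hypothetical and included only so that the intersections are non-trivial.

```python
# Cluster labels per disease. Only "Side effects", "Habits" and
# "Healthy Lifestyle" come from the text; "Medication" and
# "Transmission" are hypothetical placeholders for illustration.
clusters = {
    "diabetes mellitus": {"Side effects", "Habits", "Healthy Lifestyle", "Medication"},
    "tuberculosis": {"Side effects", "Habits", "Healthy Lifestyle", "Transmission"},
    "thyroid disease": {"Side effects", "Habits", "Healthy Lifestyle", "Medication"},
}


def common_themes(cluster_sets):
    """Return the labels present for every disease (centre of the Venn diagram)."""
    return set.intersection(*cluster_sets.values())


def pairwise_only(cluster_sets, a, b):
    """Return the labels shared by diseases a and b but not by all diseases."""
    return (cluster_sets[a] & cluster_sets[b]) - common_themes(cluster_sets)


shared = common_themes(clusters)
dm_thyroid = pairwise_only(clusters, "diabetes mellitus", "thyroid disease")

print(sorted(shared))      # ['Habits', 'Healthy Lifestyle', 'Side effects']
print(sorted(dm_thyroid))  # ['Medication']
```

Treating each disease as a set of labels keeps the comparison independent of cluster order and makes the pairwise overlap (here, diabetes mellitus and thyroid disease) as easy to read off as the three-way overlap.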
The accuracy score is measured by mapping the extracted keywords to the reports of the world’s trusted organizations. The keywords of each cluster, for every disease, are compared with these reports, and the comparison gives the accuracy of the keywords in each cluster. The twelve trusted organizations are the World Health Organization (WHO), the National Health Survey (NHS), the National Institutes of Health (NIH), the Centers for Disease Control and Prevention (CDC), the European Centre for Disease Prevention and Control (ECDC), the National Centre for Disease Control and Prevention (NCDC), the American Diabetes Association (ADA), the American Thyroid Association (ATA), Women’s Health, MedlinePlus, WebMD and Healthline. The accuracy scores of the keywords in each cluster, compared with the reports of these organizations, are tabulated.
The comparison shows that the keywords of every cluster extracted from the disease datasets are accurate and can be interpreted as having a factual meaning. The sample keywords from each cluster are compared with the reports of the 12 organizations mentioned above. Based on the occurrence of the keywords in those reports, the accuracy of each cluster is measured as a percentage, and the results are tabulated in Table 4, Table 5 and Table 6.
The American Diabetes Association reports give 94.8% accuracy for the diabetes mellitus (DM) keywords, and the majority of DM keywords match ADA reports. The thyroid keywords map well to the American Thyroid Association reports, which yield 92.3% overall accuracy. The tuberculosis sample keywords mostly match the National Institutes of Health reports, which show an overall accuracy of 90.5%.
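One plausible reading of the occurrence-based accuracy measure described above is the share of a cluster's keywords that appear in an organization's report text. The sketch below illustrates that reading only. The function name, the sample keywords and the sample report sentence are hypothetical, and the study's exact matching rule is not specified in the text.

```python
def keyword_accuracy(keywords, report_text):
    """Percentage of keywords that occur in the report text (case-insensitive).

    Assumption: a keyword counts as matched when it appears as a
    substring of the report; the study may use a different rule.
    """
    if not keywords:
        return 0.0
    text = report_text.lower()
    matched = [kw for kw in keywords if kw.lower() in text]
    return 100.0 * len(matched) / len(keywords)


# Hypothetical sample data, for illustration only.
sample_keywords = ["insulin", "glucose", "fatigue", "exercise"]
sample_report = "Insulin therapy and regular exercise help control blood glucose."

score = keyword_accuracy(sample_keywords, sample_report)
print(f"{score:.1f}%")  # 75.0%
```

Substring matching is the simplest choice but it also counts partial-word hits (for example, "exercise" inside "exercises"), so a stricter token-level comparison may be preferable when keywords are short.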