Abstract
This paper proposes an artificial intelligence model for managing risks in healthcare institutions. The model uses an emerging data source, social media, and exploits users’ interactions to identify and assess potential risks. It applies natural language processing techniques to users’ tweets to produce clear insights into the types of risk and their magnitude. In addition, big data analysis techniques, such as classification, are used to reduce the dimensionality of the data and manage it effectively. The resulting insights can help healthcare managers make better decisions for their institutions and patients, which can contribute to a more sustainable environment. We also formulate the proposed model mathematically and derive closed-form relations for risk analysis, identification and assessment. A case study on CVS Health in the USA indicates that roughly a quarter of patients’ tweets refer to risks in CVS services, such as operational, financial and technological risks, and that the magnitude of these risks is distributed as high risk (19%), medium risk (80.4%) and low risk (0.6%). Finally, several performance measures and a complexity analysis are given to show the validity of the proposed model.
1. Introduction
Sustainable development means meeting human needs while simultaneously advancing society and technology and preserving the planet’s natural systems. Continuous advancement of healthcare systems is necessary for sustainable global progress. Achieving this requires not only the development of infrastructure and systems but also appropriate management of potential risks [1,2]. Healthcare institutions may face different types of risk, such as operational, financial and technological risks, especially those that deliver their services digitally [3]. Hence, identifying, assessing and handling these risks with advanced techniques, such as artificial intelligence, is an essential step in delivering high-quality services to both patients and employees [4,5,6].
Artificial intelligence techniques, such as natural language processing (NLP), are widely used to analyze texts and extract their semantics. NLP emerged in the mid-20th century at the intersection of linguistics and artificial intelligence. It was initially distinct from text information retrieval (IR), which efficiently indexes and retrieves massive volumes of text using highly scalable statistics-based techniques [7]. NLP is used to solve many problems in healthcare, such as monitoring healthcare-associated infections by detecting medical events in computerized medical records [8]. It also helps to assess the performance of healthcare institutions, for example, by assessing hospital readmissions of patients with chronic obstructive pulmonary disease (COPD) [9].
Social media is popular, is developing quickly and increasingly affects the healthcare industry. It is changing how patients connect with one another: patients can now share their thoughts on support groups, new studies, medications and even doctors, so social media continues to play an active role in patient healthcare [10]. Moreover, social big data analysis can help healthcare institutions efficiently understand and analyze users’ interactions on social media websites, which can enhance the competitiveness and productivity of these institutions [11]. In addition, modern business intelligence systems are useful for managing risks in healthcare firms [12].
The objective of this study is to propose a natural language processing model that uses social media as an advanced data source to manage risks in healthcare institutions. The model also applies big data analysis techniques, such as classification, to reduce the dimensionality of the data and manipulate it effectively. We describe the model mathematically and derive explicit formulas for risk analysis, identification and assessment. In addition, we present a practical case study on CVS Health in the USA and give insights into recognizing and assessing the related risks. Finally, several performance measures are evaluated to demonstrate the efficiency of the proposed model.
The rest of the paper is structured as follows. A literature review of related articles is discussed in Section 2. In Section 3, we present the proposed model. A mathematical model is developed in Section 4. In Section 5, we provide a practical case study. The conclusion and further work are included in Section 6.
2. Literature Review
Hammoda and Durst [13] developed a taxonomy of knowledge risks for healthcare, with a description of each risk, and used an inductive approach to analyze their possible effects on healthcare organizations. Their findings identified 25 types of knowledge risks in healthcare organizations, classified into three main categories: human, technological and operational. In [14], the risk factors of environmental health and safety were evaluated and ranked, and the relationships between them were identified by constructing a fuzzy decision-making trial and evaluation laboratory (DEMATEL) approach. The results demonstrated the efficiency of the proposed approach in revealing the causal relationships among the risk factors and in ranking them.
Salih et al. [15] examined the features of IoT risk management in a healthcare environment and proposed a detailed IoT risk management model through a case study of a hospital in Sudan. The results showed good usability of the proposed model for evaluating and implementing IoT from a risk management perspective. In [16], the current risks related to big data security in healthcare are presented, and recent techniques and approaches that minimize these risks are surveyed. The findings showed that the main strengths and limitations of the presented methods relate to anonymization and encryption.
Simsekler et al. [17] focused on identifying patient safety risks to assist healthcare organizations with risk management, using the Freedom of Information Act to understand the type of support provided by trust-level policies, procedures and strategies. The results showed that the documents gave insufficient support for prospective risk identification methods, whereas control methods from other industries were highly promoted. In [18], big data analysis was used to identify and handle the reallocation of hospital resources between high- and low-risk patients. The findings indicated that the proposed approach has major implications for creating value in healthcare organizations.
Liverani et al. [19] proposed a conceptual approach that considers ecological, biological and socioeconomic factors to minimize zoonotic risk and identified ways in which an in-depth risk analysis can be deployed. They concluded that interdisciplinary cooperation on zoonotic risks must also account for the complexity of environmental risks. In [20], new runtime risks in the healthcare industry were identified and mitigated using neural networks to achieve effective performance. The results showed that risks are identified faster, and in a safe mode, using the proposed approach. Yucel et al. [21] developed a fuzzy risk assessment model that evaluates and minimizes risk before a new healthcare information system is deployed. The results made it possible to determine, with 100% confidence, the degree of risk of implementing a major new healthcare information system. In [22], the direct and indirect effects of risk on patients were explored using a systemic approach based on Reason’s theory of failures. The findings illustrated the ability of the proposed framework to support decision making on minimizing risks and developing risk management in healthcare organizations. A comparative study is presented in Table 1.
Table 1.
Comparative study on risk management in healthcare institutions.
3. The Proposed Model
This section presents a new framework that links risk management, healthcare and social media big data analysis. The framework comprises four phases, namely risk identification, risk assessment, risk control and risk monitoring, as shown in Figure 1.
Figure 1.
Artificial healthcare risk management model.
In the risk identification phase, healthcare institutions can detect potential risks using social media platforms by taking into account the views of clients and visitors on sites such as Facebook, Twitter and YouTube. It is also critical to identify and evaluate potential hazards by participating in social media discussions about the spread of pandemics and other public health issues. This phase involves three steps: collecting the data through APIs, web crawling and surveys; cleaning the data by removing invalid records, such as deceptive information, missing data and inconsistent data; and storing the data in big data storage, such as NoSQL databases and Apache Hive.
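The cleaning step can be illustrated with a minimal Python sketch. It assumes that each post is a record with hypothetical `post_id`, `text` and `month` fields, treats empty text as missing data, and treats repeated text (after case and whitespace normalization) as inconsistent duplicates. Detecting deceptive information is not attempted here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Post:
    """Hypothetical record for one collected post."""
    post_id: str
    text: Optional[str]
    month: Optional[str]  # e.g. "2022-02"


def clean_posts(posts: List[Post]) -> List[Post]:
    """Drop posts with missing text and duplicates of earlier posts (after normalization)."""
    seen = set()
    cleaned = []
    for post in posts:
        if post.text is None or not post.text.strip():
            continue  # missing data
        normalized = " ".join(post.text.lower().split())
        if normalized in seen:
            continue  # duplicate of an earlier post
        seen.add(normalized)
        cleaned.append(Post(post.post_id, normalized, post.month))
    return cleaned
```

Storing the cleaned records in a NoSQL database or Apache Hive is outside the scope of this sketch.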
In the risk assessment (estimation) phase, potential risks in healthcare organizations are estimated by studying the behavior of users on social media websites. This behavior can be observed in the numbers of likes, comments and shares on posts, as well as the number of hashtags. Consequently, extracting behavioral features from the collected dataset plays a vital role in analyzing users’ mood and assessing potential risks, because these features indicate how, where, how much and how long the users of healthcare organizations are exposed to a potential risk. This phase involves three steps: data classification, feature extraction and feature selection. Social media platforms hold large amounts of multimedia data, including texts, photographs, videos and sounds, which may be structured, semi-structured or unstructured. To reduce the complexity and dimensionality of social big data, the classification step divides the data into groups of related data using techniques such as quantum support vector machines, MapReduce-based k-nearest neighbor approaches and hybrid neural networks. The features of each resulting group are then extracted and collected for the following steps. Feature extraction transforms input data, such as texts and images, into numerical features, which are more suitable for processing by machine learning algorithms. It reduces the number of variables in the original dataset: starting from a set of weighted data, it produces derived values using methods such as N-grams, lexicon-based features, the Bag-of-Words technique and Principal Component Analysis (PCA).
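As an illustration of feature extraction, the following sketch turns each text into a vector of unigram and bigram counts, that is, a Bag-of-Words model extended with N-grams. Whitespace tokenization and the function names are assumptions made for brevity; the paper does not specify its tokenizer.

```python
from collections import Counter
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> List[str]:
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(texts: List[str], max_n: int = 2) -> Tuple[List[str], List[List[int]]]:
    """Build a sorted vocabulary of 1..max_n-grams and a document-term count matrix."""
    counts = []
    for text in texts:
        tokens = text.lower().split()
        counts.append(Counter(g for n in range(1, max_n + 1) for g in ngrams(tokens, n)))
    vocabulary = sorted(set().union(*counts))
    # A Counter returns 0 for terms that do not occur in a document.
    matrix = [[doc[term] for term in vocabulary] for doc in counts]
    return vocabulary, matrix
```

For example, `bag_of_ngrams(["Long wait", "long queue"])` yields the vocabulary `["long", "long queue", "long wait", "queue", "wait"]`, with one count row per text.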
The goal of feature selection is to reduce the number of attributes in the set of extracted features, and thereby the dimensionality and complexity of the data, using techniques such as chi-square (CHI) or information gain (IG).
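A minimal chi-square (CHI) selection sketch follows. It assumes binary labels (risk or no risk) and binary term presence, scores each term with the 2x2 chi-square statistic and keeps the k highest-scoring terms. The labels and the value of k are illustrative.

```python
from typing import List, Set


def chi_square(term_present: List[bool], is_risk: List[bool]) -> float:
    """2x2 chi-square statistic between term presence and the risk label."""
    pairs = list(zip(term_present, is_risk))
    n = len(pairs)
    a = sum(1 for t, r in pairs if t and r)          # term present, risk
    b = sum(1 for t, r in pairs if t and not r)      # term present, no risk
    c = sum(1 for t, r in pairs if not t and r)      # term absent, risk
    d = n - a - b - c                                # term absent, no risk
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denominator == 0 else n * (a * d - b * c) ** 2 / denominator


def select_top_k(docs: List[Set[str]], labels: List[bool], k: int) -> List[str]:
    """Keep the k terms most associated with the label (ties broken alphabetically)."""
    vocabulary = sorted(set().union(*docs))
    scores = {term: chi_square([term in doc for doc in docs], labels) for term in vocabulary}
    return sorted(vocabulary, key=lambda term: (-scores[term], term))[:k]
```

Terms that appear only in risk posts, or only in non-risk posts, receive the highest scores, so they survive the selection.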
Different forms of analysis, including descriptive, diagnostic, predictive and prescriptive analysis, can be used in the control phase. These techniques help to obtain a better understanding of the positive and negative factors, which can improve decision making. Descriptive analysis, which summarizes the historical data needed to deliver usable information, is the first stage of data manipulation; for example, it can be used to determine the proportion of specific observations among all observations, as in the experimental results section. Diagnostic analysis is a more advanced type of analysis that draws on approaches such as data mining, data correlation and data discovery. Predictive analysis transforms data into useful and valuable information and involves several crucial steps, including predictive modeling, decision analysis and optimization, and transaction profiling. Prescriptive analysis searches for decision options that uncover future opportunities or reduce prospective hazards.
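For instance, the descriptive step of computing the share of each observation type among all observations can be sketched as follows. The label names are hypothetical examples.

```python
from collections import Counter
from typing import Dict, List


def percentage_shares(labels: List[str]) -> Dict[str, float]:
    """Percentage of observations carrying each label, rounded to one decimal place."""
    total = len(labels)
    if total == 0:
        return {}
    return {label: round(100 * count / total, 1) for label, count in Counter(labels).items()}
```

Applied to a list of per-tweet risk magnitudes, this gives percentages of the kind reported for high, medium and low risks.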
In the monitoring phase, the insights from the previous phase are visualized so that the investors and financial organizations of healthcare institutions receive useful feedback on current hazards and on strategies for mitigating them. Actions are then expected to be taken in response to this feedback, and these actions affect performance either positively or negatively. The risk management process is therefore a closed loop that is iterated to keep identifying and managing risks.
4. Mathematical Model
In this section, we describe the proposed model mathematically and derive essential closed-form formulas for risk analysis, risk identification and risk assessment. The basic notation used in this section is listed in the Abbreviation section.
Let P represent the posts of patients, or of people interested in a specific healthcare organization, where n is the number of posts in the study. Each patient may share their impressions of the organization’s services on social media, and each such post carries knowledge that needs to be extracted. We assume that each patient writes one post, which can be represented by multiple words as follows:
where and is word number for patient such that and .
This vector may contain a large amount of data in different word classes, such as nouns, verbs and prepositions. To reduce the dimensionality of the data, we categorize the data in Equation (1) into groups of similar topics using the following formula:
where refers to the noun number y in the dataset.
Thus, Equation (1) can be split into topic categories of the following form:
where .
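Because the exact form of Equations (2) and (3) is not reproduced here, the following sketch only illustrates the idea under stated assumptions: nouns are recognized with a small hypothetical noun lexicon instead of a part-of-speech tagger, the k most frequent nouns form the topic set, and each post is assigned to every topic noun that it contains.

```python
from collections import Counter
from typing import Dict, List, Set


def top_nouns(posts: List[str], noun_lexicon: Set[str], k: int) -> List[str]:
    """Return the k most frequent nouns across all posts."""
    counts = Counter(
        word for post in posts for word in post.lower().split() if word in noun_lexicon
    )
    return [noun for noun, _ in counts.most_common(k)]


def group_by_topic(posts: List[str], topics: List[str]) -> Dict[str, List[str]]:
    """Assign each post to every topic noun that it mentions."""
    groups: Dict[str, List[str]] = {topic: [] for topic in topics}
    for post in posts:
        words = set(post.lower().split())
        for topic in topics:
            if topic in words:
                groups[topic].append(post)
    return groups
```

In the case study, the three topic groups are pharmacy, care and store.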
In addition, the knowledge is expressed in many word classes, such as nouns, prepositions, adjectives and adverbs. To reduce the size of the data, we therefore focus only on the fundamental words, namely nouns, verbs, adjectives and adverbs. Hence, the vector of words associated with the knowledge vector in Equation (1) has the following form:
where .
To analyze the knowledge of each patient, we define criteria that weight the fundamental words according to three cases, as follows:
We develop a closed-form formula that can measure the knowledge based on the criteria in Equation (5), which can be shown as the following:
where the symbols for the verb, adverb, noun and adjective formats are as listed in the Abbreviation section, d is the number of risk words in the text, and Pr is an indicator of possibility words that takes the value 1 if a possibility word is found in the text and 0 otherwise.
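The two lexicon-based quantities in Equation (6), the number of risk words d and the possibility indicator Pr, can be computed as in the sketch below. The tiny word sets are hypothetical stand-ins for the paper’s dictionary of more than 7500 risk words, and only single-word terms are matched.

```python
import string
from typing import List, Tuple

# Hypothetical miniature lexicons; the real risk dictionary is far larger.
RISK_WORDS = {"hacked", "dangerous", "overcharged", "wrong"}
POSSIBILITY_WORDS = {"may", "might", "could", "possibly"}


def tokenize(text: str) -> List[str]:
    """Lower-case the text and strip surrounding punctuation from each token."""
    return [token.strip(string.punctuation) for token in text.lower().split()]


def risk_features(post: str) -> Tuple[int, int]:
    """Return (d, Pr): the number of risk words and the possibility-word indicator."""
    tokens = tokenize(post)
    d = sum(1 for token in tokens if token in RISK_WORDS)
    pr = 1 if any(token in POSSIBILITY_WORDS for token in tokens) else 0
    return d, pr
```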
Consequently, the vector in Equation (4) produces a new vector that has numeric values according to Equation (6). This vector has the following form:
To perform a risk analysis for multiple topics, we use Algorithm 1, as follows:
Algorithm 1. Pseudo-code for risk analysis
Input ← P
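Since the body of Algorithm 1 is not reproduced here, the following is only a hypothetical Python sketch of a risk analysis over several topics. It assumes a three-way rule suggested by the labels used in Table 2: a post with no risk words is "no risk", risk words together with a possibility word give "potential risk", and risk words alone give "risk". The lexicons are again small hypothetical stand-ins.

```python
import string
from collections import Counter
from typing import Dict, List

# Hypothetical miniature lexicons standing in for the paper's dictionaries.
RISK_WORDS = {"hacked", "dangerous", "overcharged", "wrong"}
POSSIBILITY_WORDS = {"may", "might", "could", "possibly"}


def analyze_post(post: str) -> str:
    """Label one post as 'no risk', 'potential risk' or 'risk' (assumed rule)."""
    tokens = [t.strip(string.punctuation) for t in post.lower().split()]
    d = sum(1 for t in tokens if t in RISK_WORDS)       # number of risk words
    pr = any(t in POSSIBILITY_WORDS for t in tokens)    # possibility indicator
    if d == 0:
        return "no risk"
    return "potential risk" if pr else "risk"


def risk_analysis(groups: Dict[str, List[str]]) -> Dict[str, Counter]:
    """Count the risk states of the posts in each topic group."""
    return {topic: Counter(analyze_post(p) for p in posts) for topic, posts in groups.items()}
```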
To determine the type of risk , we can employ the following formula:
where R equals 1 if any term in the sentence indicates risk and 0 otherwise; E, F, M and O refer to environmental, financial, market and operational risks, respectively, each equal to 1 if any term in the text refers to that type and 0 otherwise; and e, f, m and o are positive constants. The process of identifying the risk type is shown in Algorithm 2, as follows:
Algorithm 2. Pseudo-code for identifying the risk
Input ← RA
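A hypothetical sketch of the risk-type step follows. It computes the indicator R and the indicators E, F, M and O from small illustrative lexicons and simply returns the flagged types; how the type formula above combines the indicators with the constants e, f, m and o is not reproduced.

```python
import string
from typing import Dict, List, Set

# Hypothetical miniature lexicons.
RISK_WORDS = {"hacked", "dangerous", "overcharged", "closed", "wrong"}
TYPE_LEXICONS: Dict[str, Set[str]] = {
    "environmental": {"pollution", "contaminated", "outbreak"},
    "financial": {"overcharged", "refund", "price"},
    "market": {"competitor", "shortage", "boycott"},
    "operational": {"closed", "wait", "understaffed"},
}


def identify_risk_types(post: str) -> List[str]:
    """Return the risk types flagged in a post, or [] when R = 0."""
    tokens = {t.strip(string.punctuation) for t in post.lower().split()}
    r = 1 if tokens & RISK_WORDS else 0  # R indicator
    if r == 0:
        return []
    # E, F, M and O indicators: a type is flagged if any of its terms occurs in the post.
    return [risk_type for risk_type, lexicon in TYPE_LEXICONS.items() if tokens & lexicon]
```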
For estimating the risk , we define:
where Pr is defined in Equation (6), and the remaining indicators refer to high and low risks, respectively (see the Abbreviation section). The risk estimation process is summarized in Algorithm 3, as follows:
Algorithm 3. Pseudo-code for the estimation of risks
Input ← RA
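Similarly, a hypothetical sketch of risk estimation is given below. It assumes that the high-risk and low-risk indicators are driven by small illustrative lexicons of intensifying and softening words, and that a post is "medium" when neither or both indicators fire. The role of Pr in the estimation formula is not reproduced.

```python
import string

# Hypothetical lexicons for the high-risk and low-risk indicators.
HIGH_WORDS = {"never", "worst", "always", "dangerous"}
LOW_WORDS = {"slightly", "minor", "bit"}


def estimate_magnitude(post: str) -> str:
    """Return 'high', 'medium' or 'low' for a post already flagged as risky."""
    tokens = {t.strip(string.punctuation) for t in post.lower().split()}
    high = bool(tokens & HIGH_WORDS)  # high-risk indicator
    low = bool(tokens & LOW_WORDS)    # low-risk indicator
    if high and not low:
        return "high"
    if low and not high:
        return "low"
    return "medium"
```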
5. Case Study
CVS Health in the USA is one of the leading healthcare organizations in the world, so we chose it for our case study. Using the Twint tool, we scraped more than 28 k tweets, selected by user geolocation, over six months from 1 February 2022 to 1 August 2022.
5.1. Results and Performance Measures
Figure 2 shows the number of tweets in every month:
Figure 2.
Distribution of the number of tweets in 6 months.
Using Equation (2), we identified the most common words in the dataset, as shown in Figure 3.
Figure 3.
Distribution of the top frequencies of words in the dataset.
To identify risks in the dataset, we first performed a risk analysis, which determines the risk state of each tweet. We focused on three common services, namely pharmacy, care and store. We built a new dictionary of more than 7500 words that refer to risk, such as hacking, poor services and dangerous. The risk analysis was performed on more than 5 k, 4.5 k and 3 k tweets for the pharmacy, care and store categories, respectively. The results indicated that there were no risks in approximately three-quarters of the samples, while more than a quarter of the samples referred to risk. The results are visualized in Figure 4, and examples of tweets indicating no risk, potential risk and risk are given in Table 2.
Figure 4.
Risk analysis for pharmacy, care and store services.
Table 2.
Some examples of patients’ tweets with their classification.
To validate the obtained results, we used several performance measures: accuracy, precision, recall and F1-score [24]. Because manually annotating all 28 k tweets was impractical, we relied on statistical sampling theory and used the following formula to obtain the sample size s:
where the remaining terms are the population size, the Z-score, the standard deviation and the margin of error, respectively. We chose a confidence level of 95%, so the Z-score equals 1.96; the margin of error is 5% and the standard deviation is 50%. Substituting these values into Equation (10) gives a sample size of 379 tweets. Table 3 shows the distribution of the sample across the categories.
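The sample-size calculation can be checked numerically. The sketch below assumes that Equation (10) is the common finite-population form s = (Z^2 * sigma^2 / e^2) / (1 + Z^2 * sigma^2 / (e^2 * N)), with parameter names chosen here for illustration, and assumes a population of N = 28,000 tweets. With Z = 1.96, sigma = 0.5 and e = 0.05, it reproduces the reported 379.

```python
import math


def sample_size(population: int, z: float = 1.96, sd: float = 0.5, margin: float = 0.05) -> int:
    """Sample size with finite-population correction, rounded up."""
    n0 = (z ** 2) * (sd ** 2) / (margin ** 2)  # infinite-population size, about 384.16
    return math.ceil(n0 / (1 + n0 / population))


print(sample_size(28_000))  # 379
```

Because the correction term n0 / N is small for a population of this size, the result stays close to the infinite-population value of about 384.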
Table 3.
Distribution of sample size for each category.
We then annotated the sampled tweets and computed the performance measures shown in Table 4.
Table 4.
Performance measures of risk analysis.
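The measures in Table 4 can be computed from the annotated sample as in the following sketch. It assumes macro-averaging over classes (the paper does not state which averaging it uses), and the labels in the example are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def evaluate(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F1-score."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return {"accuracy": accuracy, "precision": mean(precisions),
            "recall": mean(recalls), "f1": mean(f1s)}
```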
The obtained risks included several types, such as operational risk, financial risk, and technological risk, as shown in Figure 5. In addition, some tweet examples for these types are given in Table 5. The validation results of the risk identification process are shown in Table 6.
Figure 5.
Risk identification for pharmacy, care and store services.
Table 5.
Some examples of risk tweets with the determination of their type.
Table 6.
Performance measures of risk identification.
In addition, we estimated the risks in each category as shown in Figure 6 with some examples of tweets for risk assessment given in Table 7. The performance measures for the risk estimations are given in Table 8.
Figure 6.
Risk estimation for pharmacy, care and store services.
Table 7.
Some examples of risk tweets with their estimation.
Table 8.
Performance measures of risk assessment.
5.2. Complexity of the Proposed Algorithms
To calculate the time complexity of the proposed algorithms, we concentrated on the essential steps. In Algorithm 1, line 7 includes a constant loop, so its cost is . Inside this loop, there is another loop in line 8, which traverses the rows of matrix . This matrix includes a constant number of rows and columns c such that (see Equations (2) and (3)). Hence, the cost of this matrix is . Line 9 searches for a word in two lexicons, one for risk words and the other for possibility words, whose costs are , in addition to some simple if conditions, whose cost is . Therefore, the overall cost of Algorithm 1 is . Similarly, Algorithms 2 and 3 each use two lexicons: one for risk words and one for risk types in Algorithm 2, and one for risk words and one for risk assessments in Algorithm 3. Hence, the time complexity of each of them is .
6. Conclusions
This study proposed a natural language processing model to identify and assess risks in healthcare institutions. The model uses social media as an emerging data source and applies classification to reduce the dimensionality of the input data. We derived closed-form mathematical expressions for risk analysis, identification and assessment. Furthermore, we presented a practical case study on CVS Health in the USA, in which potential risks were identified, analyzed and assessed, and several measures were computed to evaluate the performance of the proposed model. In future work, we will employ other techniques to improve the accuracy of the results and reduce the complexity of the model.
Author Contributions
Conceptualization, A.D.; Writing—original draft, Abdelaziz Darwiesh; Writing—review & editing, A.H.E.-B. and A.Z.A.; Project manager and supervisor, M.E. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Conflicts of Interest
The authors declare no conflict of interest.
Abbreviation
| Notation | Description |
| P | Group of all the posts |
| N | Number of patients in the study |
|  | Post of a given patient in the study |
|  | Knowledge that needs to be extracted from the post of a patient |
|  | Number of words in a post |
|  | A given word in the post of a patient |
| T | Group of the most frequent nouns in all posts |
|  | Noun number y |
|  | Verb |
|  | Adverb |
|  | Noun |
|  | Adjective |
| D | Number of risk words in the post |
| Pr | Indicator that takes 1 if there are words that refer to possibility in the post and 0 otherwise |
|  | Vector of risks for one topic |
|  | Vector of knowledge or risks for several topics |
|  | Variable referring to the type of risk |
|  | Vector of risk types |
|  | Variable referring to the magnitude of risk |
|  | Vector of risk magnitudes |
| R | Indicator that takes 1 when there are words that refer to risk and 0 otherwise |
| E, F, M and O | Indicators that take 1 if there are words that refer to environmental, financial, market or operational risks, respectively, and 0 otherwise |
| e, f, m and o | Positive constants associated with indicators E, F, M and O |
|  | Indicators that take 1 if there are words in the post that indicate maximum (high) or minimum (low) risk, respectively, and 0 otherwise |
References
- Cucuzzella, C. Creativity, sustainable design and risk management. J. Clean. Prod. 2016, 135, 1548–1558.
- Corvalan, C.; Prats, E.V.; Sena, A.; Campbell-Lendrum, D.; Karliner, J.; Risso, A.; Wilburn, S.; Slotterback, S.; Rathi, M.; Stringer, R.; et al. Towards climate resilient and environmentally sustainable health care facilities. Int. J. Environ. Res. Public Health 2020, 17, 8849.
- Gupta, S.; Kamboj, S.; Bag, S. Role of risks in the development of responsible artificial intelligence in the digital healthcare domain. Inf. Syst. Front. 2021, 1–18.
- Yeh, S.C.; Wu, A.W.; Yu, H.C.; Wu, H.C.; Kuo, Y.P.; Chen, P.X. Public perception of artificial intelligence and its connections to the sustainable development goals. Sustainability 2021, 13, 9165.
- Abdulrahman, S.A.; Salem, A.B.M. An efficient deep belief network for Detection of Coronavirus Disease COVID-19. Fusion Pract. Appl. 2020, 2, 5–13.
- Abualkishik, A.Z.; Alwan, A.A. Multi-objective Chaotic Butterfly Optimization with Deep Neural Network based Sustainable Healthcare Management Systems. Am. J. Bus. Oper. Res. 2021, 4, 39–48.
- Nadkarni, P.M.; Ohno-Machado, L.; Chapman, W.W. Natural language processing: An introduction. J. Am. Med. Inform. Assoc. 2011, 18, 544–551.
- Tvardik, N.; Kergourlay, I.; Bittar, A.; Segond, F.; Darmoni, S.; Metzger, M.H. Accuracy of using natural language processing methods for identifying healthcare-associated infections. Int. J. Med. Inform. 2018, 117, 96–102.
- Agarwal, A.; Baechle, C.; Behara, R.; Zhu, X. A natural language processing framework for assessing hospital readmissions for patients with COPD. IEEE J. Biomed. Health Inform. 2017, 22, 588–596.
- Korda, H.; Itani, Z. Harnessing social media for health promotion and behavior change. Health Promot. Pract. 2013, 14, 15–23.
- Darwiesh, A.; Alghamdi, M.; El-Baz, A.H.; Elhoseny, M. Social Media Big Data Analysis: Towards Enhancing Competitiveness of Firms in a Post-Pandemic World. J. Healthc. Eng. 2022, 2022, 6967158.
- Darwiesh, A.; El-Baz, A.H.; Tarabia, A.M.K.; Elhoseny, M. Business Intelligence for Risk Management: A Review. Am. J. Bus. Oper. Res. 2022, 6, 16–27.
- Hammoda, B.; Durst, S. A taxonomy of knowledge risks for healthcare organizations. VINE J. Inf. Knowl. Manag. Syst. 2022, 52, 354–372.
- Bhalaji, R.K.A.; Bathrinath, S.; Ponnambalam, S.G.; Saravanasankar, S. A Fuzzy Decision-Making Trial and Evaluation Laboratory approach to analyse risk factors related to environmental health and safety aspects in the healthcare industry. Sādhanā 2019, 44, 55.
- Salih, F.I.; Bakar, N.A.A.; Hassan, N.H.; Yahya, F.; Kama, N.; Shah, J. IOT security risk management model for healthcare industry. Malays. J. Comput. Sci. 2019, 131–144.
- Abouelmehdi, K.; Beni-Hessane, A.; Khaloufi, H. Big healthcare data: Preserving security and privacy. J. Big Data 2018, 5, 1.
- Simsekler, M.C.; Card, A.J.; Ward, J.R.; Clarkson, P.J. Trust-level risk identification guidance in the NHS East of England. Int. J. Risk Saf. Med. 2015, 27, 67–76.
- Bates, D.W.; Saria, S.; Ohno-Machado, L.; Shah, A.; Escobar, G. Big data in health care: Using analytics to identify and manage high-risk and high-cost patients. Health Aff. 2014, 33, 1123–1131.
- Liverani, M.; Waage, J.; Barnett, T.; Pfeiffer, D.U.; Rushton, J.; Rudge, J.W.; Coker, R.J. Understanding and managing zoonotic risk in the new livestock industries. Environ. Health Perspect. 2013, 121, 873–877.
- Jadi, A.; Zedan, H.; Alghamdi, T. Risk management based early warning system for healthcare industry. In Proceedings of the IEEE International Conference on Computer Medical Applications (ICCMA), Sousse, Tunisia, 20–22 January 2013.
- Yucel, G.; Cebi, S.; Hoege, B.; Ozok, A.F. A fuzzy risk assessment model for hospital information system implementation. Expert Syst. Appl. 2012, 39, 1211–1218.
- Cagliano, A.C.; Grimaldi, S.; Rafele, C. A systemic methodology for risk management in healthcare sector. Saf. Sci. 2011, 49, 695–708.
- Vekaria, D.; Kumari, A.; Tanwar, S.; Kumar, N. ξboost: An AI-based data analytics scheme for COVID-19 prediction and economy boosting. IEEE Internet Things J. 2020, 8, 15977–15989.
- Jiang, T.A.O.; Li, J.P.; Haq, A.U.; Saboor, A.; Ali, A. A novel stacking approach for accurate detection of fake news. IEEE Access 2021, 9, 22626–22639.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).