Medical Instructed Real-Time Assistant for Patient with Glaucoma and Diabetic Conditions

Abstract: Virtual assistants are involved in the daily activities of humans such as managing calendars, making appointments, and providing wake-up calls. They provide a conversational service to customers around the clock and make their daily life manageable. With this emerging trend, many well-known companies have launched their own virtual assistants that manage the daily routine activities of customers. In the healthcare sector, virtual medical assistants also provide a list of relevant diseases linked to a specific symptom. Due to low accuracy and uncertainty, these generated recommendations are untrusted and may lead to hypochondriasis. In this study, we propose a Medical Instructed Real-time Assistant (MIRA) that listens to the user's chief complaint and predicts a specific disease. Instead of being informed about the medical condition, the user is referred to a nearby appropriate medical specialist. We designed an architecture for MIRA that considers the limitations of existing virtual medical assistants, such as weak authentication, a lack of understanding of multiple-intent statements about a specific medical condition, and uncertain diagnosis recommendations. To implement the designed architecture, we collected the chief complaints along with the dialogue corpora of real patients. Then, we manually validated these data under the supervision of medical specialists. We then used these data for natural language understanding, disease identification, and appropriate response generation. For the prototype version of MIRA, we considered the cases of glaucoma (an eye disease) and diabetes (a metabolic disease) only. The performance of MIRA was evaluated in terms of accuracy (89%), precision (90%), sensitivity (89.8%), specificity (94.9%), and F-measure (89.8%). The task completion was calculated using Cohen's Kappa (k = 0.848), which categorizes MIRA as 'Almost Perfect'. Furthermore, the voice-based authentication identifies the user effectively and protects against masquerading attacks. Simultaneously, the user experience shows relatively good results in all aspects based on the User Experience Questionnaire (UEQ) benchmark data. The experimental results show that MIRA efficiently predicts a disease based on chief complaints and supports the user in decision making.


Introduction
With the emerging trends of technology, virtual assistants help users complete their daily routine tasks efficiently. Most virtual assistants use artificial intelligence and provide personalized assistance to users in the form of managing calendars, controlling smart environments, navigation, making appointments, providing wake-up calls, and more [1]. Many applications from different domains currently have their own built-in virtual assistants, such as televisions [2], mobile devices [3], vehicles [4], and the Internet of Things [5,6]. The virtual assistant is also known as a chatbot, dialogue manager, virtual agent, interactive assistant, or conversational agent. Many well-known companies, including Apple (Siri), Google (Assistant), Samsung (Bixby), and Amazon (Alexa), have introduced their own virtual assistants. These virtual assistants provide an interactive user interface (text, speech, or both) that has the ability to understand requests, handle complex tasks, and generate an appropriate response using a machine learning model [7].
In the healthcare sector, the adoption of machine learning has facilitated diagnosis [8,9], treatment [10,11], and the streamlining of administrative tasks [12]. With the popularity of virtual assistants, healthcare is also moving toward this technology. It prevents unnecessary visits to the doctor, which reduces the administrative burden, increases efficiency, and supports clinical decisions. According to a survey conducted in [13], primary care physicians spent more time managing electronic medical records (EMRs) than engaging with patients. Therefore, several virtual medical assistants were introduced, such as Nuance [14], Suki [15], and Robin Healthcare [16], which automate the process of documenting clinical information using artificial intelligence and provide services to the healthcare provider [17][18][19]. Moreover, several virtual medical assistants provide trusted information based on an analysis of medical symptoms, including MedWhat [20], Your.MD [21], and Sensely [22]. These provide personal healthcare assistance using medical knowledge on the web and EMRs. These virtual medical assistants show a list of relevant diseases that match the input symptoms.
For example, the statement 'I have abdominal pain' is linked with a list of conditions that includes bowel cancer, constipation, Crohn's disease, and gluten intolerance. The recommendation is predicted based on one symptom and requires approval from a medical specialist [23]. Due to the low accuracy and uncertainty of the existing virtual medical assistants, the resulting list of conditions may lead to depression, anxiety, and hypochondriasis [24]. To the best of our knowledge, none of the existing virtual medical assistants in the natural language processing domain considered real-time disease diagnosis based on the user's chief complaint, which has the utmost priority, is stated in the patient's own words, and is the main reason for the patient's visit. More than one disease may have the same kind of chief complaint, making it hard to identify a specific disease. Furthermore, every person has their own accent and way of explaining the chief complaint, so understanding this type of conversation is also a challenge for the virtual medical assistant.
In this study, we considered the challenges faced by existing virtual medical assistants and proposed a solution in the form of the Medical Instructed Real-time Assistant (MIRA). MIRA supports primary healthcare services and uses spoken natural language for interactive communication to achieve a high success rate on task completion [25]. Moreover, MIRA analyzes the user's chief complaint and predicts a specific disease. Then, the user is referred to a nearby appropriate medical specialist based on the predicted disease. For the prototype version of MIRA, we used the chief complaints of glaucoma and diabetes based on the availability of collaborating medical specialists from the Yeouido Saint Mary's Hospital, Republic of Korea.
The main contributions provided by this study are summarized as follows:

• We introduced MIRA, which identifies a disease based on the user's chief complaint, understands single- and multiple-intent statements about a specific medical condition, and generates an appropriate response.

• We added an identity and access manager, a session manager, and security event logging and monitoring to the MIRA architecture. These provide strong authentication, manage the conversational state, and monitor the system for anomalies, respectively.

• We created a dataset of 816 patient chief complaints that were manually validated under the supervision of medical specialists and were classified into glaucoma, diabetes, and other labels under the broad category of diseases.

• We designed stock phrases from the 816 recorded dialogue corpora, which contain 11,532 utterances. Each utterance was manually annotated for intent and context identification.

• We evaluated MIRA based on performance measures (including accuracy, precision, sensitivity, specificity, and F-measure), task completion, security, and user experience.

The rest of this paper is organized as follows. An overview of the literature related to virtual medical assistants is given in Section 2. Then, Section 3 provides a comprehensive description of the MIRA methodology, including the system architecture, the digital brain, and a case study. Subsequently, the evaluation of MIRA is presented in Section 4. Finally, Section 5 summarizes the work proposed in this study.

Related Work
We performed a systematic search of the existing literature in well-known digital libraries such as IEEE, ScienceDirect, ACM, Springer, PubMed, and Scopus. In line with the scope of this study, we focused on spoken dialogue-based systems that support healthcare services. Therefore, we excluded literature that does not focus on healthcare services or that uses text, click, or touch as the interactive medium. Moreover, studies that considered the Wizard-of-Oz concept were also filtered out. Based on these criteria, we found 14 studies and classified them into Finite State Assistants (10 studies) and Frame-based Assistants (4 studies). A comprehensive description of each category is provided in the subsequent sections.

Finite State Assistants
The finite state assistant asks a series of relevant questions to make a decision. This type of assistant does not support personalized recommendations because it follows the same sequential steps for each user. Philip et al. designed an Embodied Conversational Agent (ECA) for sleep disorder patients that asks questions using the Epworth Sleepiness Scale and identifies somnolent patients [26]. Similarly, the mental disorder diagnostic system conducts an interview based on DSM-5 criteria and identifies patients with major depressive disorders [27]. Moreover, an ECA was proposed for autism spectrum disorder patients that uses audiovisual features for teaching social communication skills [28]. The proposed system is also effective for those experiencing social complications. To reduce hospitalization of suicidal patients, an e-caring avatar was proposed in [29], which involves patients in self-care conversations and recommends relevant videos. To monitor chronic pain patients, Levin et al. proposed a Pain Monitoring Voice Diary that asks a sequence of questions and identifies the severity of pain accordingly [30]. Moreover, a virtual agent for monitoring diabetic patients was proposed in [31], which makes a phone call once a week to collect vitals. Similarly, the spoken dialogue-based diabetic monitoring system collects patient vitals and helps physicians provide recommendations remotely based on the recorded information [32]. Virtual human interviewers are becoming popular due to the anonymity and rapport building that support posttraumatic stress disorder patients. Lucus et al. proposed a virtual human interviewer, which conducts an interview with military service members involved in an intense situation and identifies the symptoms associated with their mental state [33]. A similar kind of virtual agent was proposed in [34], which interacts with users and identifies their mental symptoms using mixed methods for triangulation of data. Moreover, a rule-based patient-centric application was proposed in [35], which provides medical coaching services.

Frame-Based Assistants
The frame-based assistant analyzes and extracts the content from the user's conversation, then fills in an existing template to generate an appropriate response. The generated response may be personalized depending upon the business logic and training model of the corresponding virtual assistant. Ireland et al. proposed 'Harlie', which converses with the user on a variety of topics and helps with the neurological conditions of Parkinson's patients [36]. Similarly, a virtual nurse was proposed in [37] to support maternal healthcare and provide guidance to expectant mothers during pregnancy. A few smartphone applications are also available that provide medical information after an analysis of symptoms, such as MedWhat [20], Your.MD [21], and Sensely [22]. Giorgino et al. proposed a virtual medical assistant that interacts with hypertensive patients and collects relevant data, which help the physician to evaluate the risk of cardiovascular disease [38]. In [39], the virtual medical assistant supports general practitioners by analyzing patient health conditions (using a breast cancer ontological model) and recommending an oncologist.

Limitations of Existing Studies
According to our survey analysis, we identified three limitations in the existing studies that focused on spoken dialogue-based virtual medical assistants.

• None of the existing studies considered security as a primary factor except [30], which uses the traditional PIN-based authentication mechanism [40] and is vulnerable to brute-force attacks [41]. The virtual medical assistant interacts with users and gathers health-related information. The leakage of such information may lead to different attacks such as masquerading and ransomware [42,43]. Moreover, commercially available applications such as Your.MD [21] and Sensely [22] only comply with the security standards.

• Most of the existing studies, along with commercially available virtual medical assistants, analyze the input symptoms and provide either a list of specific diseases or relevant information [44]. None of the existing spoken dialogue-based systems considered patient chief complaint corpora for disease prediction or medical advice.

• Few studies focused on frame-based assistants due to various challenges such as intent identification, context awareness, and appropriate response generation. However, such assistants provide interactions in a natural way (i.e., similar to humans) and keep the user motivated to continue the conversation [45].

Medical Awareness Survey
We conducted a survey to assess medical awareness among university students and determine the need for MIRA. For this purpose, we designed a questionnaire and obtained approval from the Kyung Hee University Ethics Assessment Committee (KHU-EAC) after a rigorous analysis of privacy aspects. The questionnaire was distributed via email among different departments, including Computer Science and Engineering, Electrical and Electronic Engineering, Biomedical Engineering, Life Sciences, and Foreign Languages. The survey form was active for five consecutive working days. We received 119 responses from international students aged 18 to 36 years across 11 countries. Figure 1 presents the country-based distribution of participants along with the gender ratio of male (50.8%) to female (49.2%). The participants responded to five polar questions, as shown in Table 1. The survey results showed that 25% of the respondents had an awareness of medication and took medicine without consulting a doctor (such as aspirin for pain and fever, amoxicillin for infection, and many more). These participants were also able to identify appropriate medical specialists based on their symptoms. The remaining 75% discussed their symptoms with friends, family, or general physicians. Healthcare services are expensive in most countries. Therefore, the majority of respondents preferred to discuss their symptoms with friends or family, which helps them to determine whether to seek an appropriate medical specialist. However, a small number of participants were not open to these discussions due to personal reasons. Overall, the majority of participants were excited about an application that understands speech-based natural language, determines a specific disease based on chief complaints, and recommends a nearby appropriate medical specialist.

Methodology
In this section, we provide a comprehensive description of our virtual medical assistant (MIRA), which is designed to provide efficient and reliable service to the user. First, we describe the overall system architecture of MIRA as shown in Figure 2, where three modules (identity and access manager, session manager, and security event logging and monitoring) are introduced and integrated with the basic architecture (i.e., voice user interface, speech recognition, natural language understanding, and dialogue manager). Then, the next sub-section provides details about the composition of MIRA's digital brain, which includes the knowledge source and stock phrases that support natural language understanding and appropriate response generation. Finally, we provide a case study at the end of this section that gives a better understanding of MIRA.

MIRA System Architecture
As illustrated in Figure 2, we added the identity and access manager, session manager, and security event logging and monitoring to the existing architecture of the virtual assistant [46,47] to overcome the identified limitations of the existing literature and virtual medical assistants. Here, the voice user interface provides an interactive communication medium between MIRA and the user. We developed the prototype version of MIRA for Android due to its wider compatibility with devices. Therefore, any smart device (including smartwatches, smartphones, tablets, laptops, and some vendor-specific devices) that has a microphone and a speaker and supports Android can use the MIRA application. The speech recognition module recognizes human speech, breaks it into voice samples, and transcribes each voice sample into text using a neural network algorithm for signal processing [48]. The MIRA speech recognition module automatically transcribes the voice sample in a context-specific format. Then the Natural Language Understanding (NLU) module determines the intent of the user's input based on the trained model. We used the Rasa framework for machine learning-based NLU and dialogue management [49]. For tokenization and part-of-speech annotation, we extracted the semantic concepts from the Unified Medical Language System (UMLS) [50]. The NLU also analyzes the nature of the intent and forwards a request to a specific module (such as the identity and access manager, session manager, or dialogue manager).
To the best of our knowledge, MIRA is the only virtual medical assistant that uses the concept of identity and access management [51]. We used our designed voice-based authentication protocol, which identifies the user based on their voice samples [52]. Instead of random text, we matched the Mel-Frequency Cepstral Coefficients (MFCC) of each natural language input to provide a strong authentication mechanism. Moreover, the identity and access manager consists of two sub-modules: identity registration, and identity verification and validation. To use MIRA services, the user has to complete the registration process using the identity registration sub-module. For this purpose, MIRA collects a smart device identifier along with personal information such as name, address, gender, age, medical history, and voice samples. Among the collected information, the smart device identifier and the voice samples support authentication. The medical history, gender, and age help in personalized recommendation. Moreover, this module also analyzes the collected information to avoid duplication and assigns a unique 7-digit identifier, which can be used in situations such as authentication failure, identity verification, or permanent data removal. The identity verification and validation sub-module verifies and validates the identity of a registered user. First, the smart device identifier links a user to the information that they provided during the registration phase. To authenticate the user, the smart device identifier is used to retrieve the MFCC of the enrolled voice sample, which is then compared with the MFCC calculated from the natural language input to obtain the similarity index (SI). If the SI is greater than 70%, the user is authenticated and MIRA generates an appropriate response.
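The verification step described above can be sketched as follows. This is a minimal illustration, not the protocol of [52]: the text does not say how the similarity index is computed, so cosine similarity between two fixed-length MFCC vectors is an assumption, and the names `similarity_index`, `authenticate`, and `registry` are hypothetical. Only the 70% threshold and the lookup by smart device identifier come from the description.

```python
import math

SI_THRESHOLD = 0.70  # from the text: authenticate when SI is greater than 70%


def similarity_index(enrolled, observed):
    """Cosine similarity between two MFCC vectors (assumed metric, not from the paper)."""
    if not enrolled or len(enrolled) != len(observed):
        raise ValueError("MFCC vectors must be non-empty and of equal length")
    dot = sum(a * b for a, b in zip(enrolled, observed))
    norm = math.sqrt(sum(a * a for a in enrolled)) * math.sqrt(sum(b * b for b in observed))
    return dot / norm if norm else 0.0


def authenticate(device_id, observed_mfcc, registry):
    """Look up the enrolled MFCC by device identifier and compare it with the input."""
    enrolled = registry.get(device_id)  # hypothetical store: device_id -> enrolled MFCC
    if enrolled is None:
        return False  # unknown device: the user has not registered
    return similarity_index(enrolled, observed_mfcc) > SI_THRESHOLD
```

A real deployment would compare sequences of MFCC frames extracted from audio rather than a single short vector; the sketch only shows where the device lookup and the threshold check sit in the flow.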
The session manager assigns a session identifier to each authenticated user, which is bound to the user identity and is valid for a specific session only. We used the keyword spotting technique, which detects the 'Hello MIRA' and 'Bye MIRA' keywords in the spoken utterances. 'Hello MIRA' is used to initiate a session, and all communication during this period is bound to the issued session identifier. 'Bye MIRA' is used to terminate the ongoing session. We used two templates, 'Hello [Given Name], How may I help you?' and 'Hello [Given Name], How may I help you today?', for greeting a new user with no medical history and an established user with a medical history, respectively. Moreover, MIRA checks the validity of the corresponding session upon receiving an input request. In the case of a timeout (idle for 60 minutes), a renewal request is forwarded to the session manager.
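The session life cycle described above can be sketched as a small state holder. The class and method names are hypothetical, keyword spotting is reduced to a substring check on the transcribed text, and the renewal step is simplified: an expired session is dropped and the user must say 'Hello MIRA' again, which is an assumption rather than the paper's stated behavior.

```python
import uuid

SESSION_TIMEOUT_SECONDS = 60 * 60  # from the text: idle for 60 minutes


class SessionManager:
    """Tracks one session per authenticated user (illustrative sketch only)."""

    def __init__(self):
        self._sessions = {}  # user_id -> (session_id, time of last activity)

    def handle(self, user_id, utterance, now):
        """Return the active session identifier, or None if no session is active."""
        text = utterance.strip().lower()
        current = self._sessions.get(user_id)
        # Drop a session that has been idle for the full timeout period.
        if current is not None and now - current[1] >= SESSION_TIMEOUT_SECONDS:
            del self._sessions[user_id]
            current = None
        if "bye mira" in text:
            self._sessions.pop(user_id, None)  # terminate the ongoing session
            return None
        if current is None:
            if "hello mira" not in text:
                return None  # nothing to bind this utterance to
            current = (uuid.uuid4().hex, now)  # issue a new session identifier
        self._sessions[user_id] = (current[0], now)  # refresh the idle timer
        return current[0]
```

Passing `now` explicitly (for example from `time.time()`) keeps the timeout logic easy to exercise without waiting an hour.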
The dialogue manager is responsible for scenario understanding, state tracking, and managing the flow of the conversation. This module identifies the conversational context from the natural language input and generates an appropriate response. The user may start another conversation without terminating or concluding the previous one; handling this type of conversation is not in the scope of this study. The dialogue manager consists of six sub-modules. (i) The story data are used to train the dialogue management model. A story is the representation of a complete dialogue between the user and the virtual assistant. We designed the story data manually from the recorded dialogue corpora, which helps MIRA make the conversation real and natural. (ii) The state tracking is the core module of MIRA, which predicts the user goal (represented by slot-value pairs) at every dialogue turn. It maintains the conversation state, performs an action based on policy, and generates a relevant response after analyzing the natural language input. (iii) The dialogue templates consist of predefined statements that can be used by filling in a keyword. Although we trained a model to understand the conversation and generate responses, some statements are common and differ only in the keyword. Consider the statements 'Do you feel hungry?' and 'Do you feel tired?'. Both sentences are identical except for the keywords 'hungry' and 'tired'. To improve the performance and response generation of MIRA, we used templates for these kinds of statements that have similar semantics. (iv) The chief complaint data is the knowledge source that helps identify the conversation context. Based on the identified context, MIRA analyzes the dialogue corpora and asks a follow-up question. (v) The medical history consists of the health record that a user provided during registration. It also stores each recommendation along with the key attributes (signs and symptoms) that result from the conversation between MIRA and the user. Keeping these health records helps MIRA to generate a personalized decision in future conversations. (vi) The response formulation has a challenging role in the interaction because it generates a relevant response based on the input query. Therefore, this module takes the necessary information from the different sub-modules of the dialogue manager and generates an appropriate text-based statement.
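The template idea in sub-module (iii) amounts to slot filling. The sketch below uses the example statements and greeting templates quoted in the text; the template identifiers and the functions `render` and `greet` are hypothetical names chosen for illustration and do not reflect how the Rasa-based implementation stores its responses.

```python
# Template strings are quoted from the text; the keys are illustrative names.
TEMPLATES = {
    "ask_feeling": "Do you feel {keyword}?",
    "greet_new": "Hello {given_name}, How may I help you?",
    "greet_established": "Hello {given_name}, How may I help you today?",
}


def render(template_id, **slots):
    """Fill the named slots of a predefined statement."""
    return TEMPLATES[template_id].format(**slots)


def greet(given_name, has_medical_history):
    """Pick the greeting for a new user or an established user with a history."""
    template_id = "greet_established" if has_medical_history else "greet_new"
    return render(template_id, given_name=given_name)
```

With one template per statement family, only the keyword varies: `render("ask_feeling", keyword="tired")` and `render("ask_feeling", keyword="hungry")` cover both example sentences.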
The text-to-speech synthesis analyzes and processes the text-based statement using natural language processing. Then, it converts the processed text into synthesized speech using digital signal processing and conveys it to the end user in a polite female voice. MIRA deals with healthcare data and directs the user to a nearby appropriate medical specialist based on the chief complaints. This kind of dialogue contains sensitive information, and its leakage may lead to serious consequences such as masquerading and ransomware attacks. The security event logging and monitoring module continuously monitors the communication channels for anomalies. It also collects information that can be used as an audit trail for intrusion prevention and event management. With the proposed system architecture, MIRA understands single- and multiple-intent statements, supports adaptability, and provides data control.

Understanding the MIRA Digital Brain
According to [53], a virtual assistant consists of a digital brain, which is divided into a knowledge source, stock phrases, and conversation memory. MIRA's digital brain is divided into a knowledge source and stock phrases. We incorporated the conversation memory inside the stock phrases for efficient response generation. The knowledge source is an important part of a virtual assistant that helps in understanding the context of a conversation. MIRA focuses on the identification of a disease based on the user's chief complaint. In this regard, the first challenge that we faced was the selection of an appropriate dataset. We analyzed the publicly available datasets on the Internet, but to the best of our knowledge, none of the available datasets in English considered the patient chief complaint. Most of the datasets considered medical terminologies that are hard for non-medical professionals to understand. Therefore, we decided to create a dataset based on patient chief complaints. For this purpose, we selected two well-known diseases, glaucoma and diabetes, due to the availability of collaborating medical specialists from the Yeouido Saint Mary's Hospital in the Republic of Korea. Under the hospital's legal policy (Institutional Review Board approval) and HIPAA (Health Insurance Portability and Accountability Act), we briefed the participants before their medical examinations, and a written consent form was signed by each participant. This form explained that the data would be collected anonymously and used strictly for research purposes only, with the privacy aspects taken into account. We collected 816 patient chief complaints and, based on the medical specialists' recommendations, classified them into glaucoma (48.5%), diabetes (46.2%), and other (5.3%). These labels were assigned based on the broad category of diseases. The glaucoma label covers all glaucoma-related patients, including angle-closure suspect, glaucoma suspect, and pure glaucoma patients. Similarly, the diabetes label covers all types of diabetic patients, including type 1, type 2, and gestational. The other label covers patients with conditions other than glaucoma and diabetes, including normal conditions. We represented the data in tabular form, consisting of 816 rows and 32 columns. Each row represents one patient with potential symptoms, while the columns represent the observed features for that patient, including the diagnosis class label (glaucoma, diabetes, or other). Table 2 describes the 31 features of the MIRA dataset. Collecting such data helps us to identify specific patients based on their chief complaints, since the categorization of these patients is based on different laboratory test results and the medical specialists' opinions.
After the creation of the knowledge source, the next challenge was to identify the most appropriate predictive model. For this purpose, we used MOD [54], which filters out seven applicable machine learning models (decision trees, naive Bayes, K-nearest neighbors, random forest, random tree, decision stump, and deep learning) based on the provided dataset features. To determine the accuracy of each predictive model on MIRA's dataset, we used RapidMiner with 10-fold cross-validation and evaluated the predictive model accuracy as shown in Figure 3. The results show the highest accuracy for the deep learning model (99.14%) because it learns from data incrementally and identifies the hidden relationships. Therefore, we selected deep learning as the most suitable predictive model for MIRA. The predictive model, along with the knowledge source, helps in the context identification of a dialogue corpus, which determines the category of the disease as glaucoma, diabetes, or other. The stock phrases help MIRA to understand the user intent (what the user is trying to say) and support response generation. We searched online for publicly available patient-doctor dialogue corpora in the English language, but no relevant datasets were found. Therefore, we decided to design the dialogue corpus from the recorded patient-doctor conversations, which include 816 dialogue corpora (11,532 utterances). We manually annotated each utterance for the NLU and the dialogue manager to make the interactive environment of MIRA as real and natural as possible.
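For readers unfamiliar with the validation scheme, 10-fold cross-validation on 816 rows can be sketched in plain Python as below. The study itself used RapidMiner, so this is only an illustration of the procedure; `k_fold_indices`, `cross_validated_accuracy`, and the `fit_and_predict` callback are hypothetical names, and the shuffling seed is an arbitrary choice.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def cross_validated_accuracy(labels, fit_and_predict, k=10, seed=0):
    """Average accuracy over all folds.

    fit_and_predict(train, test) must train on the rows in `train` and return
    one predicted label per index in `test` (hypothetical callback).
    """
    correct = 0
    for train, test in k_fold_indices(len(labels), k, seed):
        predictions = fit_and_predict(train, test)
        correct += sum(p == labels[i] for p, i in zip(predictions, test))
    return correct / len(labels)
```

With 816 rows and k = 10, six folds hold 82 rows and four hold 81, so each model is trained on roughly 90% of the data and scored on the remaining 10% in every round.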

Case Study
To understand the working scenario of MIRA, consider John Doe, a registered user, who wants to discuss his medical condition with MIRA and is looking for an appropriate medical specialist nearby. John starts the conversation by saying 'Hello MIRA'. The speech recognition module recognizes the natural language input received from the voice user interface, transcribes it into text, and sends it to the NLU to identify the intent of the utterance, which is a greeting in this case. The intent-text pair (intent: greeting, text: Hello MIRA), along with the voice-print, is sent to the identity and access manager, which verifies and validates John's identity using the MFCC matching technique. Upon approval, the request is forwarded to the session manager, which determines whether John has an ongoing session or whether the phrase initiates a new conversational session. According to the session manager, John does not have an ongoing session. Therefore, the session manager generates a new session identifier linked with John's identity and forwards the request to the dialogue manager to generate a relevant response. First, the dialogue manager analyzes the state for an ongoing conversation using state tracking, then infers the intent of the request based on the chief complaint data and medical history. In this case, John did not provide any medical history during the registration phase and initiated the conversation with a greeting utterance, which does not link to any of the chief complaints. Therefore, the inferred request is forwarded to the story data to select an appropriate story. Then, a new-user greeting template is selected from the dialogue templates and forwarded to the response formulation, which customizes the template based on the user identifier to generate a text statement. The text-to-speech synthesis receives this text, converts it into a spoken response, and plays it on the smart device speaker. A similar procedure is followed for handling each dialogue corpus. At any point in the conversation, the user can say 'Bye MIRA' to terminate the session. Figure 4 illustrates the MIRA implementation model for handling a complex conversation. The different colored lines present the workflow and inter-connectivity between the basic modules.

Figure 5 presents screenshots of the MIRA smartphone application. The user interface shows a circular gray button on the main screen, which is pressed to activate MIRA. Upon activation, MIRA starts listening, and the color of the button changes to bright green. We set the listening duration to 5 seconds, but it can be changed to 1 minute in the application settings. When the time is up, MIRA starts analyzing the spoken natural language and changes the button color to orange. We used the color change technique because warm colors have a positive impact on the user's emotions and behavior according to psychology [55]. Furthermore, MIRA displays the input and output natural language on the smartphone screen in the form of chat bubbles, along with the spoken response, for better understanding. MIRA switches to an idle state (gray color) if the user does not speak for 5 seconds, after which it must be reactivated by pressing the gray button. However, the session identifier remains valid until the user terminates the session by saying 'Bye MIRA' or the conversation is idle for 60 consecutive minutes. As a final recommendation, a Google Maps frame shows the nearby appropriate medical specialist. Clicking the map frame opens the query in Google Maps.

Evaluation
MIRA provides efficient and reliable healthcare services to users. To ensure productivity, we evaluated MIRA based on performance measures, task completion, security, and user experience. For this purpose, we circulated a call for participants on the university's mailing list and social media. A total of 33 participants from seven countries registered, including 20 males and 13 females aged 18 to 43 years, as shown in Figure 6. The participants included Healthcare Subject Matter Experts (5), Medical Practitioners (4), and students from the Medical (7), Computer Science (9), Bioinformatics (3), Life Science (3), and International Relations (2) disciplines. Each participant was given a set of procedural documents, which contained a checklist of tasks, a consent form, hints for acting as a particular patient type, and a user experience questionnaire. The consent form clearly described the data collection procedure, including audio and video recording of interactions with MIRA, data storage, data usage, and disposal details. Moreover, participants were briefed on the goal of the activity, and we instructed them to sign the consent form after reading it carefully. Upon agreement, the voice sample along with demographic information (name, address, gender, age, and medical history) was collected to complete the MIRA registration process. As per the scope of this study, MIRA predicts glaucoma and diabetes based on the trained model. The remaining diseases, including normal conditions, are out of scope and are considered under the other label. Therefore, MIRA analyzed user interactions, identified chief complaints, and categorized these as glaucoma, diabetes, or other. Among the 33 registered participants, 17 did not belong to the medical profession. For this reason, we provided a list of chief complaints, as described in Table 3, which guided the participants to act as a patient for the three health conditions. For the other label, we selected cardiovascular and orthopedic chief complaints that are similar to those of glaucoma and diabetes. If MIRA did not generate a final recommendation for some reason, it politely responded 'I am sorry, I am not able to diagnose your disease based on the provided knowledge. Do you want me to assist you further?'. Moreover, the participants were allowed to use synonyms, ask questions in a random sequence, and interact in a natural way.

Experimental Setup
We set up an interactive environment based on the available resources, which included three Android smartphones (Samsung Galaxy S7), three iPhones (6s), three cell phone holders, and three tripod mounts. The MIRA application was installed on the three Android smartphones, which were attached to classroom desks with adjustable cell phone holders. The three iPhones, attached to tripod mounts, were used for audio and video recording of each user's interaction with MIRA. A complete set of equipment (an Android smartphone, cell phone holder, iPhone, tripod mount, classroom desk, and chair) was placed at each of three corners of the classroom. Only three participants could interact with MIRA simultaneously in the designed experimental setup. Therefore, we divided the participants into 11 groups (three members per group) based on their availability. Each member of a group could interact with MIRA independently for an allocated time of 60 minutes while acting as a patient using the provided hints.

Performance Evaluation
To assess the effectiveness of MIRA, we used common performance evaluation measures based on an independently distributed confusion matrix, as described in Table 4. The values were assigned based on the final recommendation label. The diagonal and off-diagonal values of the confusion matrix represent correctly and incorrectly classified results, respectively. Similarly, the rows and columns of the confusion matrix show the actual values per label and the predicted values per label, respectively. Each participant completed the interaction for three health conditions: glaucoma, diabetes, and other. Figure 7 illustrates the value of each label. We recorded a total of 99 dialogues (three per participant) based on the interactions of the 33 participants. The performance is reported in terms of accuracy, precision, sensitivity, specificity, and F-measure, which are described as follows.
• Accuracy identifies the effectiveness of an algorithm based on the probability of true values, as stated in Equation (1). MIRA achieved an overall accuracy of 89.8% because it correctly identified 90.9% of glaucoma (30 of 33), 84.8% of diabetes (28 of 33), and 93.9% of other (31 of 33) labels among the 99 recorded dialogues.
• Precision, or confidence, is the positive predictive value of a label and can be derived using Equation (2). We obtained the precision for each label, namely glaucoma (88.24%), diabetes (93.33%), and other (88.57%), with an average precision of 90%.
• Sensitivity (also known as recall) corresponds to the true positive rate of a specific label and can be computed with Equation (3). The values were glaucoma (90.91%), diabetes (84.85%), and other (93.94%), with an average value of 89.8%.
• Specificity corresponds to the true negative rate of a specific label and can be computed using Equation (4). The values were glaucoma (93.94%), diabetes (96.97%), and other (93.94%), with an average value of 94.9%.
• The F-measure, also known as F-score or F1-score, is the weighted harmonic mean of precision and sensitivity (recall), as stated in Equation (5). The F-measures for each label in MIRA were as follows: glaucoma (89.55%), diabetes (88.89%), and other (91.18%), with an average value of 89.8%. We used β = 1, which weights precision and sensitivity equally in the F-score.
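The per-label figures above can be reproduced from the counts they imply. The following is a minimal sketch, assuming one-vs-rest definitions for each label. The column totals (34, 30, 35) are inferred here from the reported precision values rather than read from Table 4, and the names `label_metrics` and `macro_average` are hypothetical.

```python
# Counts implied by the reported percentages (inferred, not copied from Table 4).
LABELS = ("glaucoma", "diabetes", "other")
TRUE_POS = {"glaucoma": 30, "diabetes": 28, "other": 31}   # diagonal of the matrix
ACTUAL = {"glaucoma": 33, "diabetes": 33, "other": 33}     # row totals, 33 dialogues each
PREDICTED = {"glaucoma": 34, "diabetes": 30, "other": 35}  # column totals (assumption)
TOTAL = sum(ACTUAL.values())                               # 99 dialogues


def label_metrics(label):
    """Return one-vs-rest precision, sensitivity, specificity, and F1 for a label."""
    tp = TRUE_POS[label]
    fn = ACTUAL[label] - tp        # actual cases of this label that were missed
    fp = PREDICTED[label] - tp     # other labels wrongly predicted as this one
    tn = TOTAL - tp - fn - fp      # everything else
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)  # beta = 1
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def macro_average(metric):
    """Unweighted mean of a metric over the three labels."""
    return sum(label_metrics(label)[metric] for label in LABELS) / len(LABELS)


# Overall accuracy: correctly classified dialogues over all dialogues.
accuracy = sum(TRUE_POS.values()) / TOTAL

if __name__ == "__main__":
    for label in LABELS:
        print(label, {k: round(100 * v, 2) for k, v in label_metrics(label).items()})
    print("accuracy", round(100 * accuracy, 2))
```

Small differences in the last reported digit of the averages can arise depending on whether values are rounded or truncated.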

Task Completion
Task completion is an important factor for a virtual assistant. It measures the probability of task success over the dialogue corpora. To assess MIRA's task completion, we used the PARADISE (PARAdigm for DIalogue System Evaluation) framework, which uses the Kappa coefficient to operationalize the measure of task-based success [56]. The Kappa coefficient k measures the success rate of task completion and is computed with Equation (6).
$$k = \frac{P(A) - P(E)}{1 - P(E)} \tag{6}$$
P(A) is the proportion of times that the actual and scenario attribute values agree. P(E) is the proportion of times that they are expected to agree by chance. Because k corrects for the agreement expected by chance, it accounts for task complexity and allows virtual assistants that perform different tasks to be compared. If agreement occurs only by chance, then k = 0; for total agreement, k = 1. Moreover, if the expected chance of agreement P(E) is unknown, it can be estimated from the confusion matrix using Equation (7).
$$P(E) = \sum_{i=1}^{n} \left( \frac{t_i}{T} \right)^2 \tag{7}$$
Here, t_i is the sum of the frequencies in the i-th column of the confusion matrix, and T is the sum of the frequencies t_1 + t_2 + ... + t_n in the confusion matrix. Similarly, P(A) can be calculated from the confusion matrix with Equation (8), if unknown.
$$P(A) = \frac{\sum_{i=1}^{n} M(i, i)}{T} \tag{8}$$
Here, M(i, i) is the i-th diagonal entry of the confusion matrix.
MIRA's task completion based on the PARADISE framework gives an expected agreement P(E) = 0.334, an actual agreement P(A) = 0.898, and a Kappa coefficient k = 0.848. According to the interpretation of Kappa in [57], MIRA is categorized as 'Almost Perfect' in terms of task completion.
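The reported agreement values can be checked with a few lines of arithmetic. The sketch below assumes the usual PARADISE definitions: P(A) is the diagonal sum over the total, P(E) is the sum of squared column proportions, and k = (P(A) − P(E)) / (1 − P(E)). The column totals (34, 30, 35) are inferred from the reported per-label precision, not taken from Table 4. The function names are hypothetical, and the interpretation bands assume the Landis and Koch scale, which is what the 'Almost Perfect' label suggests.

```python
def kappa_from_confusion(diagonal, column_totals):
    """Return (P(A), P(E), kappa) from the diagonal and column sums of a confusion matrix."""
    total = sum(column_totals)
    p_a = sum(diagonal) / total                          # actual agreement
    p_e = sum((t / total) ** 2 for t in column_totals)   # agreement expected by chance
    return p_a, p_e, (p_a - p_e) / (1 - p_e)


def interpret_kappa(kappa):
    """Map kappa to a verbal band (assumed Landis and Koch scale)."""
    if kappa < 0:
        return "Poor"
    for upper, name in ((0.20, "Slight"), (0.40, "Fair"), (0.60, "Moderate"),
                        (0.80, "Substantial")):
        if kappa <= upper:
            return name
    return "Almost Perfect"


# Diagonal: correctly classified glaucoma, diabetes, other dialogues.
# Column totals: predicted counts per label (inferred, see lead-in).
p_a, p_e, kappa = kappa_from_confusion([30, 28, 31], [34, 30, 35])

if __name__ == "__main__":
    print(f"P(A)={p_a:.4f}  P(E)={p_e:.4f}  kappa={kappa:.4f}  {interpret_kappa(kappa)}")
```

With these counts the sketch agrees with the reported values to within rounding of the third decimal place.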

Security
Healthcare applications deal with sensitive data such as medical records, health conditions, and quality of life. Illegal use of these data may lead to several attacks. Within the scope of this study, we therefore launched a masquerading attack against MIRA. Preventing masquerading attacks also minimizes the risk of ransomware.
The masquerading attack uses a fake identity to gain unauthorized access [58]. To launch this attack on MIRA, we asked the members of each group to swap positions. For example, the participants at positions C, B, and A moved to positions A, C, and B, respectively. Each participant thereby gained access to another member's authenticated MIRA account and started an interaction. During the analysis of the natural language input, MIRA verified the device identifier but could not validate the MFCC value. Therefore, MIRA held the ongoing session and asked the participant for identity verification: 'Sorry for the interruption, malicious activity was detected. To proceed with the ongoing session, please enter your seven digit identity verification key'. At this stage, the user has to enter the identity verification key to continue interacting with MIRA. Moreover, if an unauthorized user wants to interact after the session times out (60 minutes), MIRA responds, 'I am sorry, but I am not able to verify your identity. Do you want me to assist you through the registration process?'. Furthermore, one smart device identifier can be bound to multiple user identities, which means that more than one user can use the same device, but registration is mandatory for each user. The results show that MIRA resists masquerading attacks: because of the voice-based authentication, none of the participants was able to interact with another user's account.
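The decision flow described above (device check, voice check, fallback to the verification key) can be sketched as follows. This is an illustration only, not MIRA's implementation. The source does not state how the MFCC value is validated, so the cosine-similarity comparison, the 0.9 threshold, and all names (`RegisteredUser`, `check_utterance`, `resume_with_key`) are assumptions made for the example.

```python
import hmac
import math
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Decision(Enum):
    CONTINUE = "continue the session"
    ASK_KEY = "hold the session and ask for the identity verification key"
    REGISTER = "identity not verified, offer registration"


@dataclass
class RegisteredUser:
    user_id: str
    device_id: str
    voice_template: List[float]  # hypothetical: averaged MFCC vector from registration
    verification_key: str        # seven-digit identity verification key


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def check_utterance(session_user: Optional[RegisteredUser], device_id: str,
                    utterance_mfcc: List[float], threshold: float = 0.9) -> Decision:
    """Decide what to do with an utterance received on a device."""
    if session_user is None or session_user.device_id != device_id:
        # No authenticated session for this device (for example, after a timeout).
        return Decision.REGISTER
    if cosine_similarity(session_user.voice_template, utterance_mfcc) >= threshold:
        return Decision.CONTINUE
    # Device matches but the voice does not: possible masquerading.
    return Decision.ASK_KEY


def resume_with_key(session_user: RegisteredUser, entered_key: str) -> bool:
    """Constant-time comparison of the entered key with the registered key."""
    return hmac.compare_digest(session_user.verification_key, entered_key)
```

For example, an utterance whose features differ strongly from the account owner's template yields `Decision.ASK_KEY`, and the session resumes only if `resume_with_key` returns `True`.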

User Experience
After interacting with MIRA, the participants were asked to fill out the User Experience Questionnaire (UEQ) [59], which covers all aspects of user experience in a comprehensive way. The UEQ is widely used as a subjective measurement of user experience and provides a data analysis tool for assessing user responses. Therefore, we used it to evaluate the MIRA user experience. It consists of 26 items rated on a 7-point Likert scale. The results of these 26 items are mapped to six scales: attractiveness (6 items), perspicuity (4 items), efficiency (4 items), dependability (4 items), stimulation (4 items), and novelty (4 items), as shown in Figure 8. The x-axis and y-axis present the list of items and the rating scale (extremely good (+3), neutral (0), horribly bad (−3)), respectively. Furthermore, the scales are grouped into pragmatic quality (perspicuity, efficiency, and dependability) and hedonic quality (stimulation and novelty). Pragmatic quality deals with task-related aspects, while hedonic quality describes non-task-related aspects. Figure 9 illustrates the results of MIRA on the six scales, which show good results because all values are greater than 1.6. Moreover, Figure 10 presents MIRA's attractiveness, pragmatic quality, and hedonic quality, where each value is greater than 1.80, reflecting a positive evaluation based on the UEQ criteria. To identify the correlation of items per scale, the UEQ uses Cronbach's alpha coefficient, which measures the consistency of a scale, as shown in Table 5. The alpha value of attractiveness is higher than 0.7, indicating consistent responses across its items; together with the high scale mean, this suggests that the users enjoyed the interactions with MIRA. Most of the participants recommended an avatar instead of a simple user interface for MIRA, and the alpha coefficient of novelty was accordingly less than 0.5. Furthermore, Figure 11 presents a comparative analysis of MIRA against the UEQ benchmark dataset, which consists of 401 product evaluations collected from 18,483 participants. The results show that MIRA is relatively good in all aspects based on the benchmark data.
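To make the consistency measure concrete, the sketch below computes Cronbach's alpha in its standard variance-based form, α = k/(k − 1) · (1 − Σ item variances / variance of the summed score), and a scale mean on the −3 to +3 range. The response data are synthetic and only illustrate the calculation. The official UEQ analysis tool may compute alpha differently and also reverses the polarity of some items, which is omitted here; the function names are hypothetical.

```python
from statistics import mean, variance


def to_ueq_range(answer):
    """Map a 1..7 answer to the -3..+3 range (polarity reversal omitted)."""
    return answer - 4


def scale_mean(responses):
    """Mean over participants of each participant's mean item score on one scale."""
    return mean(mean(row) for row in responses)


def cronbach_alpha(responses):
    """Cronbach's alpha; rows are participants, columns are items of one scale."""
    k = len(responses[0])
    item_variances = [variance(column) for column in zip(*responses)]
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("alpha is undefined when all summed scores are equal")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Synthetic answers of four participants to a four-item scale (1..7).
    raw = [[6, 7, 6, 5], [5, 6, 6, 5], [7, 7, 6, 6], [4, 5, 5, 4]]
    scored = [[to_ueq_range(a) for a in row] for row in raw]
    print("scale mean:", round(scale_mean(scored), 2))
    print("alpha:", round(cronbach_alpha(scored), 2))
```

Alpha approaches 1 when participants answer the items of a scale in the same pattern and drops when items are answered inconsistently.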

Discussion
MIRA was evaluated by 33 participants from different domains, age groups, genders, and nationalities. The participants were given 60 minutes to complete a list of tasks during the interaction with MIRA. Among the 33 participants, 27 completed their tasks in an average of 40 minutes because their interactions were smooth, with little or no misinterpretation. However, 6 participants took an average of 55 minutes because of several misinterpretations, such as 'thirsty' as 'thirty', 'tired' as 'tire', 'driving' as 'diving', and 'tear' as 'tire'. Based on these interactions, MIRA achieved an overall accuracy of 89%, which we attribute to the deep learning predictive model that learns from the data incrementally and manages complex dialogues efficiently. We used the macro-average instead of the micro-average to calculate the precision (90%), sensitivity (89.8%), specificity (94.9%), and F-measure (89.8%) of the complete system. Note that the macro-average gives equal weight to each class label, whereas the micro-average is biased towards the larger class labels. These results indicate that MIRA is efficient and effective. Moreover, the PARADISE framework was used to evaluate the task completion of MIRA, where the actual agreement (P(A) = 0.898) is considerably higher than the agreement expected by chance (P(E) = 0.334), because the stock phrases were designed from real conversations, which helped MIRA understand natural language input. The Cohen's Kappa value (k = 0.848) was interpreted as 'Almost Perfect'; we attribute this in part to MIRA generating responses in a realistic and natural way using a female voice, which kept the users motivated to continue the interaction.
MIRA also keeps a record of the conversational dialogue corpus along with the final recommendation about the appropriate medical specialist, which supports personalized interactions with established users. For the prototype version of MIRA, we considered authentication rather than confidentiality, integrity, and availability. A strong authentication mechanism minimizes the risk of security vulnerabilities being exploited but affects the performance and efficiency of the system. Therefore, we used a lightweight version of our voice-based authentication protocol, which identifies the user based on the MFCC values extracted from natural language utterances; this method was evaluated against a masquerading attack. The results showed that MIRA successfully identified the users in real time based on their voice samples and strongly resisted the masquerading attack.
We used the UEQ to evaluate the user experience because it simplifies data analysis and calculates the necessary statistics. Because of its reliability, different organizations have used the UEQ to evaluate their products and consider it a good measure. According to the UEQ, MIRA was evaluated in terms of attractiveness, pragmatic quality, and hedonic quality, where the value of pragmatic quality is smaller than those of the other two qualities (attractiveness and hedonic quality). This is due to the low values of the 'secure' and 'predictable' items under pragmatic quality. Some participants interpreted 'secure' in terms of security, whereas the item evaluates the user's feeling of control over the interaction. Moreover, MIRA uses synonyms of specific words to generate a relevant response, which may be unpredictable in some conversational situations. For example, in one interaction MIRA asked a user, 'How about your empty-bellied?' instead of 'Do you feel extremely hungry?'. The value of pragmatic quality is affected by these two factors. However, the overall UEQ results present positive feedback, and the users were satisfied with MIRA's interactive communication.
After the completion of the tasks, the participants were awarded a shopping coupon worth 30,000 KRW as an incentive. The diverse nationalities of the participants also helped us assess how MIRA deals with a variety of accents. According to our analysis, some participants did not notice the voice-based authentication mechanism, owing to the lightweight protocol, until they were asked to switch positions for the masquerading attack. In the future, we plan to evaluate MIRA with real glaucoma and diabetic patients and then compare the results of both assessments. Furthermore, we will evaluate MIRA against relevant emerging cyber-attacks.

Conclusions
In this study, we introduced a state-of-the-art virtual medical assistant, MIRA, that interacts with the user in spoken natural language, diagnoses a disease based on the user's chief complaint, and refers the user to a nearby appropriate medical specialist. The key contributions of MIRA include disease identification based on the chief complaint, understanding of single and multiple intents, a voice-based authentication mechanism, conversational state tracking, and continuous monitoring of the system to detect anomalies. Moreover, we designed a chief complaint dataset and stock phrases from the recorded dialogue corpora. MIRA is the first assistant of its kind that considers security aspects (such as authentication), although it requires improvements in transmission security and audit control to become HIPAA compliant. The designed knowledge source of MIRA considers glaucoma and diabetes chief complaints only and can be extended to other medical conditions in the future.
There are many challenges in developing these kinds of interactive systems, such as privacy concerns, accuracy constraints, correct decision making, precise response generation, and gaining user trust. Compliance with standards may help minimize the risks. Despite these challenges, such systems are beneficial for society, especially in underdeveloped countries, where people suffer from many diseases due to the lack of healthcare facilities. These kinds of virtual medical assistants help patients identify an appropriate medical specialist and reduce healthcare costs. They also support medical practitioners and students in clinical decision making.

Figure 1. Medical awareness survey: Country-based distribution of participants along with gender ratio.

Figure 3. MIRA prediction accuracy with machine learning models.

Figure 4. MIRA implementation model for conversational handling.

Figure 6. Country-based distribution of MIRA evaluation participants.

Figure 7. Performance evaluation measure of interactive scenarios.

Figure 8. MIRA user experience questionnaire mean value per item.

Figure 9. MIRA user experience questionnaire resulting scores on the six scales.

Figure 10. MIRA user experience questionnaire aggregated score of pragmatic and hedonic qualities.

Figure 11. MIRA user experience questionnaire scores on the six scales along with benchmark data.

Table 1. Medical awareness survey questionnaire results.

Table 2. MIRA dataset features with ranges, measurement units, and meaning of each feature.

Table 3. Sample hints for acting as glaucoma, diabetes, and other patient types.

Table 5. Correlation of items per scale using Cronbach's alpha coefficient.