Online Textual Symptomatic Assessment Chatbot Based on Q&A Weighted Scoring for Female Breast Cancer Prescreening

Abstract: The increasing incidence of female breast cancer (FBC) in the East, a region predominated by Chinese-language speakers, has generated concerns over women's medical care. To minimize the mortality rate associated with FBC in the region, governments and health experts are jointly encouraging women to undergo mammography screening at the earliest suspicion of FBC symptoms. However, studies show that a large number of women affected by FBC tend to delay medical consultation at its early stage because of factors such as complacency due to unawareness of FBC symptoms, procrastination due to lifestyle, and embarrassment in discussing private matters, especially with medical personnel of the opposite gender. To address these issues, we propose a symptomatic assessment chatbot (SAC) based on artificial intelligence (AI) designed to prescreen women for FBC symptoms via a textual question-and-answer (Q&A) approach. The purpose of our chatbot is to assist women in engaging in communication regarding FBC symptoms, so as to subsequently initiate formal medical consultations for early FBC diagnosis and treatment. We implemented the SAC with natural language processing (NLP) techniques suitable for Chinese word segmentation (CWS) and trained the model with real-world FBC Q&A data obtained from a major hospital in Taiwan. The results from our experiments showed that the SAC achieved up to 95.4% accuracy in FBC assessment scoring in comparison to FBC patients' screening benchmark scores obtained from doctors.


Introduction
In recent years, several global health organizations have attributed the high mortality rate associated with female breast cancer (FBC) chiefly to delayed diagnosis and late treatment. Several studies [1][2][3] have revealed that lack of awareness, hesitancy, and shyness to discuss FBC symptoms during its early stages are among the top factors exposing women to the danger of late treatment. Previous research considered Asian women the least vulnerable to FBC compared to women in other parts of the world [4]; recently, however, there has been an unprecedented rapid rise in the rate of FBC incidence in Asia [3][4][5]. In Taiwan, for example, the government has deployed several medical vans across the country, known as mobile mammography units [6], as a community outreach initiative to encourage screening for FBC symptoms. Unfortunately, irrespective of the costs of such an initiative, several patient-centric drawbacks concerning privacy, fear of COVID-19, lack of knowledge of FBC symptoms, etc., render such efforts less effective. Although the precise cause of FBC and its prevalence remains unascertained [7], the first step to averting these life-threatening risks is for the affected persons to address their symptoms in a timely manner by communicating with the appropriate healthcare providers.
In medical practice, before diagnosis and subsequent treatment are administered to patients, bilateral interaction based on question-and-answer (Q&A) is the basic and most vital approach used by doctors to determine the type and stage of patients' internal symptoms (such as early-stage FBC), which are usually not visible by mere observation. Therefore, effective communication between doctors and patients is generally advantageous towards achieving successful medical care outcomes [8], and telemedicine via chatbots is a promising modern way to foster it [9,10]. On this premise, various healthcare establishments are beginning to adopt interactive text-based chatbot systems [11] to streamline communication between doctors and their remotely distributed patients. With artificial intelligence (AI) technologies, chatbots are quickly gaining the potential to facilitate interactive communication-dependent services and the capability to make adaptive judgment calls independently and reliably in task-oriented closed domains.
In this paper, we present the FBC symptomatic assessment chatbot (SAC), a Chinese-text-based interactive Q&A dialog system developed for Chang Gung Memorial Hospital (CGMH) in Taiwan. The aim of our research is to offer a solution to the patient-centric communication barriers hindering most women from initiating consultations for mammography screening, especially at the early stage of FBC symptoms. For this task, we first recognize that, in critical vertical domains such as the healthcare sector, a chatbot should be built to process and match the textual Q&A dialog contexts with high-level logical consistency. According to Chen et al. [12], the degree of dialog output accuracy is often the primary criterion used in judging the performance of a chatbot.
However, with respect to achieving high output accuracy, modeling a healthcare chatbot in Chinese presents both technical and linguistic challenges, especially with regard to Chinese word segmentation (CWS) [13] and to medical ethics constraints that often limit access to the substantial samples of patients' medical data [11,14] required for the training of AI models. Nonetheless, we implemented the SAC model with a lightweight trigram-based natural language processing (NLP) algorithm trainable with low-volume data. In addition, we created a web and an iOS front-end chat interface for easy access to the services of the chatbot. Figure 1 illustrates how the SAC achieves its goal of connecting prospective FBC patients to the healthcare providers over the network. Against this background, we affirm that our proposed SAC dialog system is a resourceful application capable of contributing substantial benefits to both the end users and the healthcare domain responsible for handling FBC incidences. Such benefits include but are not limited to the following:
• Benefit to users: SAC's Q&A assessment raises awareness of FBC ailment and encourages self-examination of the breasts during dialog in the comfort of the users' valued privacy, which will stimulate them to communicate freely and sincerely about their FBC symptoms. In this regard, an earlier study by Lucas et al. [15] revealed that patients are more comfortable responding honestly to computer-administered health assessments than to those administered by human physicians, owing to the "intimate" nature of the information usually required in such conversational contexts. In addition, SAC has the ability to promptly schedule medical appointments for users scored with positive FBC symptoms in order to expedite timely FBC diagnosis and early medical treatment.
• Benefit to hospitals: SAC offers the potential benefit of controlling the undue influx of patients into FBC healthcare departments, most of whom may schedule FBC medical appointments while having unrelated symptoms (i.e., FBC false negative cases). According to Bibault et al. [14], dialog systems can allow physicians to spend more time treating the patients who need consultation the most. On this account, the healthcare sectors of many countries are beginning to utilize the digital services of chatbots for patients' disease prescreening assessment and triage in order to reduce stress on health providers [16]. Therefore, by adopting SAC, hospitals can reduce wasted time, money, and other resources, and more importantly, doctors and other healthcare personnel will be able to work more productively to achieve effective patient-centric care delivery.
The remainder of this paper is structured as follows: Section 2 reviews related works, and Section 3 presents the model description, comprising the concepts adopted in implementing the SAC model. Section 4 contains brief details about the experiment and the corresponding results. Finally, Section 5 summarizes the work and provides a conclusion.

Related Works
Modern-day health service delivery is becoming more patient-centric than ever before. This new norm is instigating medical institutions across the world to seek urgent solutions on how to prioritize patients' medicare outcomes because a greater percentage of their reputation and profitability now depends on it [17]. Meanwhile, meeting every care need of patients is currently almost impossible for human physicians, considering the bottlenecks of gross maldistribution and shortage of healthcare workforce in the real-world scenario. Therefore, the paradigm and demand for patient-centric healthcare nowadays is introducing a new wave of physical and emotional stress problems for doctors and other health workers [18], which can result in low efficiency in delivering essential healthcare interventions and services.
On the basis of the above-stated problem, a plethora of research studies have explored the use of text-based chatbots as a solution to support doctors and patients towards effective healthcare service delivery in various areas of medicine. In this regard, Omoregbe et al. [19] developed an SMS-based NLP technique engineered with a knowledge base in the domain of known disease facts to assist medical experts in boosting medical service delivery. Additionally, other similar studies have widely covered the use of chatbots to support healthy lifestyles [9], mental health [11], pregnancy care [20,21], etc. [22]; however, very few related works have been conducted on the subject of oncology. With respect to breast cancer, Bibault et al. [14] developed an informative chatbot, referred to as Vik, which provides support to medically confirmed FBC patients and is capable of diagnosing other diseases as well. A subsequent study by Chaix et al. [23] investigated the effectiveness of the Vik chatbot from the patient-centric point of view and discovered that the use and acceptance of chatbots in healthcare can also improve patients' medication adherence. From the perspective of Palanica et al. [22], the generic features of text-based chatbots, such as being gender neutral, racially unbiased, cost-effective, and available round-the-clock to ask or respond coherently to chats when needed, are among the appealing qualities that make chatbots suitable for supporting patients' care needs.
Recently, modelling chatbots with deep neural network (DNN) algorithms has become the prevailing approach for achieving intelligent dialog outcomes. However, Najafabadi et al. [24] and Wang et al. [21] considered this approach very data-intensive. The former [24] indicated that the output accuracy of some DNN-based models depends on the size of the data used for model training, which often necessitates extensive computational resources that only a limited audience can afford. In confirmation, Shaikhina and Khovanova [25] stated that DNNs trained with small datasets often exhibit unstable output performance. Data for research in the closed domain of the healthcare sector are often either scarce [14,25,26], owing to strict medical ethics enacted to protect the privacy of patients' information, or unusable [27], which raises concerns regarding the application of DNNs in this domain. Although authors such as Kapočiūtė-Dzikienė [28] have conducted experiments on DNN-based chatbots (e.g., stacked bidirectional long short-term memory (BiLSTM)) and achieved interesting results after training with small-scale datasets for a closed-domain task, such approaches may not yet be fully accredited for implementation in critical service domains such as the healthcare sector, where dialog errors can have fatal implications [12,14]. In contrast to DNNs, scholars [9,14,16,19] have demonstrated the practical effectiveness of modelling medical chatbots with rule-based NLP frameworks that are based on traditional hand-crafted machine-learning approaches. This approach is often based on pattern matching [9,10,14,19] with a knowledge base for domain-specific tasks and requires less data and fewer computational resources for model training than the DNN approach.
Further research directions relevant to our work border on the implementation of text-based natural language understanding for dialog systems. On this subject, English and Chinese are the most researched languages [28]. Zhang et al. [29] highlighted the social phenomenon of today's multilingualism and proposed a two-stage Chinese-English mixed text normalization module to benefit NLP preprocessing tasks, owing to the prevalent habit among Chinese-speaking locals of mixing informal Chinese texts with English words, especially on social messaging platforms. Similarly, Shao et al. [30] investigated a greedy search algorithm with probabilistic N-gram matching for the Chinese-to-English neural machine translation task. With specific regard to Chinese texts, several authors [31][32][33] agree that, because Chinese sentences are character sequences without word delimiters, Chinese word segmentation (CWS) is a key prerequisite in NLP. Ma and Hinrichs [31] proposed an embedding-based matching model for CWS using a greedy search segmenter to extract and match the historical and predicted N-gram-based character context features (i.e., unigrams and bigrams). In contrast to [31], Cai et al. [32] implemented greedy search with a neural scorer (i.e., LSTM) to achieve fast and accurate CWS. Fu et al. [33], in turn, conducted several experiments on eight diverse NLP models trained on seven different datasets, after which the authors proposed an attribute-aided evaluation method for CWS on the basis of their finding that CWS evaluation performance varies per model and depends on the type of dataset used for training.
From the above reviewed studies, we generated the concept to model our proposed FBC symptomatic assessment chatbot (SAC) for best performance using the traditional handcrafted NLP approach. Our method also involved the application of trigram-based Chinese-text segmentation with probabilistic greedy search to effectively achieve contextual feature matching of the Q&A chat, in a domain-specific rule-based neural architecture. The justification for adopting this approach was necessitated by the limited size of the Q&A medical data availed by our research sponsor (i.e., Chang Gung Memorial Hospital) for this work.

Methodology
Generally, the healthcare service industry is a critical rule-based sector with low tolerance for professional misconduct. With this understanding, we devised the SAC dialog system to be domain-specific, i.e., based on adaptive logical rules bound within the locus of FBC data in the system's knowledge base. In other words, the SAC only asks and responds to textual chats related to the subject of breast cancer symptoms and ailments. The main reason behind this mechanism is to keep the SAC task-oriented in order to prevent any tendencies of deviation towards out-of-scope conversational outcomes and ambiguities. Therefore, human emotional cues were deemed unnecessary in this study and, thus, were not considered.
As proof of our compliance with medical ethics and other best practices adopted in this research, the SAC was granted full approval for clinical trial by the institutional review board (IRB) of Chang Gung Medical Foundation with number: 202000815B0C501. In this section of the paper, we present the most important aspects of the SAC's concepts and modus operandi.

The Symptomatic Assessment Process
According to Fadhil and Schiavo [34], chatbots can gain self-learning ability through pattern matching to provide more natural interaction with users. Being a domain-specific rule-based neural architecture, the SAC is integrated with the SQLite database containing structured FBC medical Q&A data curated by the physicians of the general breast surgical department at the Linkou branch of CGMH, Taiwan. In compliance with medical ethics and best practices, we anonymized patients' information from the obtained data by filtering out personal details as was performed by Bibault et al. [14]. Thereafter, in conjunction with two experienced FBC specialist doctors in CGMH, we structured the symptomatic assessment knowledge-base questions and their corresponding answer probability matches.

Question and Answer Distribution
The data structure in SAC's knowledge base is formatted to help the chatbot's logic adapter mechanism match Q&As effectively and retrieve the appropriate next sub-question. Following the directive of the FBC specialist doctors, we adapted a total of 16 main question categories (QCs), unevenly distributed and hierarchically structured to cover every potential topic regarding female breast cancer ailment and symptoms (see Table 1 and the statistical representation in Figure 2). The agent can ask several sub-questions under each question category in Table 1 in a sequence-to-sequence manner (as illustrated in Figure 3). In other words, each response to SAC's query provided by the user is processed by an adaptive logical process module in the system's backend (i.e., the chatbot's engine), which utilizes the Q&A contextual match to determine the next relevant sub-question or next question category to retrieve from the knowledge base. This summarizes the dialog flow handling process (as shown in Figure 3). Additionally, since the SAC asks the questions and utilizes the context of the user's response to determine the next relevant main or sub-question to be retrieved from the knowledge base, the multi-turn response selection problem that commonly poses a challenge in retrieval-based dialog systems is eliminated [12]. Moreover, according to Figure 4, each of the 16 main categories of questions on FBC symptoms contains a varying number of sub-questions (see Figure 2) to enable the SAC agent to understand the extent of the user's FBC symptoms as a human physician would.
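The dialog flow described above can be illustrated with a minimal Python sketch. It assumes a simplified structure in which each question maps a matched answer label to an optional follow-up sub-question; the `Question` class, the `next_question` function, and the English sample questions are hypothetical and are not taken from the SAC knowledge base.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    text: str
    # Maps a matched answer label to the next sub-question. A label with no
    # entry ends this category and moves the dialog to the next category.
    follow_ups: dict = field(default_factory=dict)


def next_question(categories, cat_index, current, answer_label):
    """Return (category index, question) to ask next, or None when finished."""
    follow_up = current.follow_ups.get(answer_label)
    if follow_up is not None:
        return cat_index, follow_up
    if cat_index + 1 < len(categories):
        return cat_index + 1, categories[cat_index + 1]
    return None


# Hypothetical two-category knowledge base (illustrative only).
lump_growth = Question("Has the lump grown recently?")
qc1 = Question("Do you feel a lump in your breast?", {"yes": lump_growth})
qc2 = Question("Is there any fluid discharge from the nipple?")
categories = [qc1, qc2]
```

Because the agent always selects the next question itself, the only state carried between turns in this sketch is the current category index and the current question.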

Knowledge-Base Data Structure
The most frequent and notable symptoms of FBC ailment include but are not limited to an evident feeling of lumps, persistent pain, fluid discharge from the nipple, and visible changes on the skin area of the breasts, illustrated and denoted as $s_1, \ldots, s_4$ in Figure 5. These symptoms often trigger a human physician's suspicion of a positive case of breast cancer before mammography screening is conducted. In the question category distribution (detailed in Table 1), most symptoms such as these were assigned high scores. The FBC symptomatic assessment answer scores range on a scale of 1 to 10 (i.e., low to high). For example, in sample question 4 (QC4) in Figure 4, the agent asks the user whether there is any visible depression, bulge, or lesion on the breast. If the user's response correlates with the context of A1 in the answer match probability (i.e., symptom $s_4$ in Figure 5), the SAC's logic module considers the user's FBC symptoms very serious (i.e., after matching the Q&A context with the domain lexical keywords in the knowledge base), and thus the agent weighs and scores the user's response as a 10, as predefined in the knowledge base (see Figure 4). Subsequently, such prospective FBC patients (scored high) are granted priority with respect to the scheduling of a doctor's appointment at the end of the chat session.
Figure 5. Notable symptoms of FBC ailment ($s_1, \ldots, s_4$). The SAC is modeled to sequentially compute and score the seriousness of the FBC symptoms (if any) during textual Q&A dialog with the user.
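As a rough illustration of how a user response might be matched to a predefined answer score, the following Python sketch counts keyword hits per answer option and returns the score of the best-matching option. The option labels, the Chinese keywords, the scores, and the fallback score of 1 are all hypothetical assumptions for illustration; the actual SAC lexicon and matching rules are not reproduced here.

```python
# Hypothetical answer options for one question category. Keywords and scores
# are illustrative assumptions, not entries from the CGMH knowledge base.
ANSWER_OPTIONS = [
    {"label": "A1", "keywords": ["凹陷", "隆起", "潰瘍"], "score": 10},
    {"label": "A2", "keywords": ["沒有", "正常"], "score": 1},
]


def score_answer(user_text, options, default_score=1):
    """Return (label, score) of the option with the most keyword hits.

    If no keyword matches, return (None, default_score); the fallback score
    is an assumption of this sketch.
    """
    best_label, best_score, best_hits = None, default_score, 0
    for option in options:
        hits = sum(1 for keyword in option["keywords"] if keyword in user_text)
        if hits > best_hits:
            best_label, best_score, best_hits = option["label"], option["score"], hits
    return best_label, best_score
```

Plain substring counting of this kind does not handle negation (for example, a reply stating that there is no depression), which is one reason a real system needs segmentation and contextual matching on top of keyword lookup.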

Weighing and Scoring Module
From our several liaisons with the FBC specialist doctors, we understood how different levels of FBC symptoms are classified during human screening assessment of FBC patients. Based on that, we classified the FBC symptomatic assessment questions, i.e., the 16 main QCs and their sub-questions ($Q$), into three levels. Let $A$, $B$, and $C$ denote the sets of questions at levels A, B, and C, respectively, and let $N_A$, $N_B$, and $N_C$ be the total number of questions that the patients answer at levels A, B, and C. Each main question is accompanied by a varying number of sub-questions depending on the chat response provided by the subject (i.e., the patient/user). Let $Q^A_{i,j}$, $Q^B_{i,j}$, and $Q^C_{i,j}$ denote the value of main question $i$ accompanied by sub-question $j$, which the subject answered during their chat with SAC. For example, $Q^A_{3,0}$ stands for the value of the subject's response to main question 3, which is assigned to level A (i.e., $Q^A_{i,j} \in A$). Similarly, $Q^A_{3,2}$ stands for the value of the subject's response to main question 3 accompanied by a second sub-question. For the FBC symptom score evaluation, we assigned three weight factors of different values, denoted as $f_A$, $f_B$, and $f_C$, to the three class levels allotted to the questions. The values of $f_A$, $f_B$, and $f_C$ were determined and provided by the CGMH FBC specialist doctors, and they sum to 1 (i.e., $f_A + f_B + f_C = 1$). Based on this rule, the SAC computes the final FBC assessment score, denoted as $S_1$, according to Equation (1), where $n$ is the number of sub-questions under a main FBC question topic, each question value lies in $[1, 10]$, and $\alpha$ is an adjustment coefficient. The coefficient $\alpha$ is an important variable in Equation (1) for accurately scaling the SAC's assessment scores for users' FBC symptoms.
Although users' positive answers to level A questions, for example, are scored by SAC as a very high probability of being at risk of having FBC symptoms, further confirmation is needed, such as a clinical evaluation by a human FBC specialist doctor. The experimental values of the weight factors $f_A$, $f_B$, and $f_C$ and the coefficient $\alpha$ are further discussed in Section 4.
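To make the role of the weight factors and the adjustment coefficient concrete, the following Python sketch computes a weighted score under one plausible reading of the definitions above: a per-level mean of the answered question values, weighted by the level's factor, scaled by α, and capped at 10. This assumed form, the cap, and the function name `assessment_score` are illustrative assumptions; the exact form of Equation (1) may differ.

```python
def assessment_score(answers, weights, alpha=1.0, max_score=10.0):
    """Weighted FBC score under an ASSUMED form (not necessarily Equation (1)):

        S1 = min(max_score, alpha * sum over levels L of f_L * mean(Q_L))

    answers: dict mapping level ("A", "B", "C") to the list of answered
             question values, each in [1, 10]
    weights: dict mapping level to its weight factor; factors must sum to 1
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weight factors must sum to 1")
    total = 0.0
    for level, values in answers.items():
        if values:  # a level with no answered questions contributes nothing
            total += weights[level] * sum(values) / len(values)
    # Capping at max_score is an assumption that keeps S1 on the 1-10 scale.
    return min(max_score, alpha * total)


# Initial weight factors reported for the experiments.
weights = {"A": 0.6, "B": 0.25, "C": 0.15}
# Hypothetical answered values for one user.
answers = {"A": [10, 8], "B": [4], "C": [2, 2]}
```

With these hypothetical answers, the unscaled score is 0.6 · 9 + 0.25 · 4 + 0.15 · 2 = 6.7, which shows how level A responses dominate the result.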

Natural Language Understanding
The effectiveness of chatbots in the healthcare domain requires advanced reasoning capability to formalize medical knowledge [19], and this process begins with understanding the language vocabulary. Because Chinese sentences are strings of characters with no natural white spaces [31][32][33], unlike English and other alphabet-based languages, it is imperative to use CWS to disambiguate Chinese texts for effective natural language understanding (NLU). Regardless of the approach used, the existing literature treats CWS as a sequence labeling problem in which the target Chinese word character influences the feature prediction [31]. Similar to Zhao et al. [35], we first used a greedy-search character tokenizer to represent the Chinese texts as vector embeddings, and then we fed the vectors as the input of a trigram module. Thereafter, the NLU output was passed to the multi-logic adapters in the chatbot's engine for Q&A matching.
Since our task is domain specific, we adapted the method by Zeng et al. [36] and constructed the trigram feature representation function. Its parameters are the word character length; $\tau$, the length of the trigram to be considered; and $\rho$, the probability that the position of the $i$th word character $w_i$ corresponds to a learned $j$th word $w_j$, provided that $w_j$ is in the agent's knowledge base. This process is illustrated in the model architecture in Figure 6.
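The idea of greedy segmentation over character n-grams up to length τ can be sketched in Python as follows. This is a simplified stand-in and not the trigram feature function itself: it assumes a hypothetical `lexicon` that maps known domain words to a probability-like weight, and at each position it picks the highest-weight known n-gram of at most τ characters, falling back to a single character when nothing matches.

```python
def greedy_segment(text, lexicon, tau=3):
    """Greedy left-to-right segmentation using character n-grams up to tau.

    lexicon maps known words to a probability-like weight (hypothetical
    values). Unknown characters are emitted as single-character tokens.
    """
    tokens, i = [], 0
    while i < len(text):
        # Candidate n-grams starting at position i, longest first.
        longest = min(tau, len(text) - i)
        candidates = [text[i:i + n] for n in range(longest, 0, -1)]
        known = [c for c in candidates if c in lexicon]
        # Highest weight wins; on ties max() keeps the first (longest) one.
        token = max(known, key=lexicon.get) if known else text[i]
        tokens.append(token)
        i += len(token)
    return tokens


# Hypothetical domain lexicon with illustrative weights.
lexicon = {"乳房": 0.9, "有": 0.5, "硬塊": 0.8, "分泌物": 0.7}
```

For instance, the sentence 乳房有硬塊 ("there is a lump in the breast") is split into 乳房, 有, and 硬塊 with this lexicon, whereas characters outside the lexicon are passed through one at a time.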

Figure 6. An overview of the SAC dialog system architecture. Herein, $e_A$ represents the user's input answer embedding, whereas $e_Q$ represents the agent's output question embedding.

The Logic Reasoning Module
To enrich the SAC with NLP and machine learning capabilities, we employed the ChatterBot (https://chatterbot.readthedocs.io/en/stable/logic/index.html, accessed on 10 February 2021) framework as the AI engine. ChatterBot is a stable open-source Python-based framework that comprises multiple scalable logic adapters, which we adopted in SAC to serve as the reasoning module (see Figure 6). We aligned the logic adapters to compute Equation (1) and to match the FBC symptomatic assessment Q&A during an ongoing chat session with a user. This also enables the agent to retrieve from the SAC's knowledge base the most appropriate and contextual next response (i.e., question), namely the one with the highest confidence score (also referred to as the best match).
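The best-match principle described above, in which the candidate with the highest confidence score is selected, can be sketched with the Python standard library alone. The sketch below deliberately does not use ChatterBot's API; it substitutes `difflib.SequenceMatcher` as a simple similarity measure, and the function name `best_match` and the English sample statements are hypothetical.

```python
from difflib import SequenceMatcher


def best_match(user_text, known_statements):
    """Return (response, confidence) for the closest known statement.

    known_statements maps a stored statement to the agent's next question.
    Confidence is a string-similarity ratio in [0, 1]; this is a stand-in
    for the comparison a real logic adapter would perform.
    """
    best_response, best_confidence = None, 0.0
    for statement, response in known_statements.items():
        confidence = SequenceMatcher(None, user_text, statement).ratio()
        if confidence > best_confidence:
            best_response, best_confidence = response, confidence
    return best_response, best_confidence


# Hypothetical stored statements and follow-up questions.
known = {
    "I feel a lump": "How long have you noticed the lump?",
    "There is nipple discharge": "What colour is the discharge?",
}
```

A production system would typically also apply a minimum confidence threshold and return a default clarifying question when no stored statement is close enough.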

Experimental Results
We implemented the SAC program with Python 3. The knowledge-base training data and the chat score log data are both stored in YAML (YAML Ain't Markup Language) format. YAML is a data-serialization language readable by both humans and machines, which makes it suitable for our task. For the trigram module's CWS processing, we set the text encoding to UTF-8 to accommodate Chinese texts. We used the double-blind assessment method to verify the correctness of the proposed SAC method. The weight factors $f_A$, $f_B$, and $f_C$ for SAC's FBC assessment score computation were originally assigned the values 0.6, 0.25, and 0.15, respectively, according to the directives of the doctors. Later, the values of $f_A$, $f_B$, and $f_C$ were slightly adjusted after we received the doctors' assessment score of each user. Other model hyper-parameters for the training were left at the defaults of the ChatterBot Python library. Owing to the limited data size, we trained the SAC with the entire FBC Q&A medical data provided by the research sponsor CGMH.
Although the performance of text-based dialog models is commonly evaluated with a variety of computer-based metrics such as those in [37], we instead subjected the performance of the SAC to expert human evaluators. According to Palanica et al. [22], human physicians have been the traditional benchmark for treating patients for many centuries. Thus, their keen evaluation and approval of a medical chatbot's performance is crucial. For this reason, the SAC model's scoring mechanism ($S_1$) was fine-tuned to meet the doctors' scoring standard (denoted as $S_2$). The doctors used 135 anonymized patients' FBC Q&A screening data to evaluate the symptomatic assessment scoring capabilities of the SAC. The subsequent approval of SAC for clinical trial by the IRB of Chang Gung Medical Foundation reflects the evaluators' satisfaction with SAC's high accuracy and stable performance.

Discussion
In the plot scenarios depicted in Figure 7, the red dots represent the users (i.e., FBC patients), the values on the y-axis indicate the SAC's FBC assessment scores, and the x-axis shows the doctors' FBC screening scores, which we considered in this study as the fixed benchmark standard. In case (a) of Figure 7, the performance of the SAC agent was very poor. For example, the batch of users that was scored 10 by the doctors received scores between 4.1 and 5.9 from the chatbot. Therefore, we introduced an adjustable scaling coefficient $\alpha$ in the algorithm in Equation (1) to address the low score outcome. Thereafter, in subsequent retraining of the SAC model, we experimentally varied the value of $\alpha$ and observed improved scoring abilities of the agent, as shown in Figure 7b,c, respectively. In Figure 7d, we adjusted the scaling value of $\alpha$ until we arrived at 1.53, and we also slightly modified the values of the weight factors to $f_A = 0.63$, $f_B = 0.22$, and $f_C = 0.15$, upon which, after completion of the training iterations, we achieved up to 95.4% assessment scoring accuracy. In this instance, only a few variations were observed between the agent's score ($S_1$) and the FBC doctors' score ($S_2$). For example, the lowest-scored user among those assigned a 10 by the doctors was scored 9.2 by the chatbot, and both scores are within the bounds of serious FBC ailment confirmation.
In Figure 8, we utilized more statistical representation approaches to further analyze SAC's assessment scoring performance in comparison to the FBC assessment benchmark standard set by the FBC doctors. The FBC prescreening scores of SAC were compared against those of the doctors using the same group of patients. Since several distributed users can access and chat with SAC at the same time over the network, we subjected the SAC to multiple instances of textual dialogs using the same responses from the patients' Q&A data.
In the SAC's testing process, a total of 135 anonymized patients' Q&A responses to the doctors' FBC screening questions were used. In the graph plot of Figure 8a, the blue plot represents the doctors' screening scores for each of the 135 patients, whereas the corresponding orange plot represents the SAC's FBC assessment scores of the same patients. The graph plot in Figure 8 indicates a high correlation between SAC's FBC assessment scores and the benchmark scores from the CGMH FBC specialist doctors.
In addition, we used the histogram in Figure 8b to show a cleaner comparison of SAC's scores versus the FBC doctors' scores using a random sample of 10 users. Here, the slight variations between the two sets of scores are clearly visible; nevertheless, the histogram also shows that the SAC's scores fall within the bounds of the doctors' score standard as described in Figure 7d.
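The calibration of the scaling coefficient against the doctors' benchmark scores can be illustrated with a short Python sketch. In the study, α was varied experimentally until the scores aligned; the closed-form least-squares fit shown here is a substituted technique used only for illustration, and the function names and sample numbers are hypothetical.

```python
def fit_alpha(raw_scores, doctor_scores):
    """Least-squares scale factor minimising sum((alpha * raw - doctor)^2).

    This closed-form fit is a stand-in for the experimental tuning of alpha.
    """
    numerator = sum(r * d for r, d in zip(raw_scores, doctor_scores))
    denominator = sum(r * r for r in raw_scores)
    if denominator == 0:
        raise ValueError("raw scores must not all be zero")
    return numerator / denominator


def mean_absolute_error(predicted, reference):
    """Average absolute gap between chatbot scores and benchmark scores."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


# Hypothetical unscaled chatbot scores and doctors' benchmark scores.
raw = [2.0, 4.0, 6.0]
doctor = [3.0, 6.0, 9.0]
alpha = fit_alpha(raw, doctor)
scaled = [alpha * r for r in raw]
```

In this toy example the raw scores are uniformly too low by the same ratio, so a single scale factor removes the gap entirely; with real data a residual error would remain and could be reported alongside the fitted coefficient.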

Limitations
Although we designed the SAC's dialog-flow system to encourage users to respond to the agent's questions with domain-knowledge-related answers (see Figure 3), users who persist with verbose explanations in chats may cause out-of-vocabulary situations for SAC, as it is a rule-based chatbot. We assume that such occurrences can slightly affect SAC's overall assessment scoring. Therefore, in future work, we aim to expand the domain knowledge (i.e., FBC lexical keywords) in SAC's knowledge base and to make the algorithm more dynamic in order to address these concerns.

Conclusions
In this paper, we presented the text-based chatbot (SAC) developed to prescreen users for symptoms of FBC ailment, which affects many women around the world. The key design goal of SAC is to assist patients scored as having positive FBC symptoms to overcome the personal barriers that often hinder them from initiating timely formal consultation with the appropriate healthcare providers. This task is also a viable attempt to empower prospective FBC patients to avert the risks associated with late FBC treatment. Therefore, the SAC promises substantial benefit to FBC healthcare providers and the general female population. To achieve the goal of this study, we modelled the SAC with bespoke trigram-based NLP technologies suitable for CWS. Finally, the evaluation of SAC's performance indicates that it achieves high $S_1$ accuracy (up to 95.4%) relative to the $S_2$ benchmark screening standard obtained from the FBC specialist doctors, which supports SAC's suitability for real-world practical use.