BRCA Variations Risk Assessment in Breast Cancers Using Different Artificial Intelligence Models

Artificial intelligence provides modelling on machines by simulating the human brain using learning and decision-making abilities. Early diagnosis is highly effective in reducing mortality in cancer. This study aimed to combine cancer-associated risk factors including genetic variations and design an artificial intelligence system for risk assessment. Data from a total of 268 breast cancer patients have been analysed for 16 different risk factors including genetic variant classifications. In total, 61 BRCA1, 128 BRCA2 and 11 both BRCA1 and BRCA2 genes associated breast cancer patients’ data were used to train the system using Mamdani’s Fuzzy Inference Method and Feed-Forward Neural Network Method as the model softwares on MATLAB. Sixteen different tests were performed on twelve different subjects who had not been introduced to the system before. The rates for neural network were 99.9% for training success, 99.6% for validation success and 99.7% for test success. Despite neural network’s overall success was slightly higher than fuzzy logic accuracy, the results from developed systems were similar (99.9% and 95.5%, respectively). The developed models make predictions from a wider perspective using more risk factors including genetic variation data compared with similar studies in the literature. Overall, this artificial intelligence models present promising results for BRCA variations’ risk assessment in breast cancers as well as a unique tool for personalized medicine software.


Introduction
Early diagnosis is the initial step in medical practice [1]. The integration of artificial intelligence (AI) approaches such as machine learning including fuzzy logic, neural network can transform big data into clinically actionable knowledge [2] and will become the foundation of precision medicine in three ways: quick decision making for clinicians, reasonable source for healthcare systems and individual data for better and precise treatment [3]. In particular, AI has been continuing to improve characterizations in genetic and molecular medicine since it was first conceived by providing knowledge management [4]. This has given rise to evidence-based computerized diagnostic tools, intended to aid physicians in making primary medical decisions and hence early diagnosis, which helps reduce the treatment options and increase survival rate [5]. An artificial intelligence model is used to simplify and accelerate this complex decision-making process. Some of the most important areas in medical research are related to cancer and cardiovascular diseases [4,6,7]. It is based on the complex clinical decision-making method that often accompanies the degree of uncertainty [8].
Breast cancer, as a heterogeneous disease, is the most common cause of cancer-related death in women and affects one in eight women globally [9]. In 2020, 2.3 million women were diagnosed with breast cancer and 685,000 deaths resulted from this disease [10]. Molecular, pathological and clinical characteristics complicate the progression of breast cancer [11]. However, the early detection of breast cancer is an effective method of reducing mortality [12]. Despite its complex aetiology, breast cancer is affected by both environmental and genetic factors. Generally, cancer results from the accumulation of genetic variations known as either somatic or germline. The majority (~70%) of breast cancer cases are sporadic [11]. While 10-30% cases are related to the inherited component, 4-5% cases were related autosomal dominant manner. Familial breast cancers are often seen in families and have been associated with susceptibility genes [13].
BRCA1 and BRCA2 are involved in maintaining genome integrity, at least in part, by engaging in DNA repair, cell cycle checkpoint control and even the regulation of key mitotic or cell division steps. Thus, the complete loss of function of either protein leads to a dramatic increase in genomic instability [14]. Women who inherit a deleterious germline BRCA1 or BRCA2 mutation face high lifetime risks of developing breast cancer by the age of 80, which are estimated to be 72% and 69%, respectively [15,16]. These women have a higher risk of having a second ipsilateral [17] or contralateral [18] breast cancer after being diagnosed with invasive breast cancer. Women with an inherited mutation in these genes also have a higher risk of developing ovarian cancer [19]. For BRCA1 mutation carriers, the risk increases significantly between the ages of 30-50, while the risks for BRCA2 mutations are highest between the ages of 40-60 [15]. BRCA1-associated breast cancers have aggressive pathological traits and are mainly hormone receptor-negative, whereas BRCA2associated breast cancers have sporadic characteristics and are predominantly hormone receptor-positive [16,20]. BRCA1 and BRCA2 genes alteration are also associate with other cancer types such as ovarian cancer (16.5-27%), prostate cancer (15%), pancreas cancer (2-7%) and possible melanoma [21,22]. The risk of ovarian cancer increases significantly by the age of 36-39 with BRCA1 mutation carriers and by the age of 44-46 with BRCA2 mutation carriers. On the other hand, the age range is around 63 for sporadic ovarian cancer [23].
The American College of Medical Genetics and Genomics (ACMG) has recommended a five different variant classification: pathogenic, likely-pathogenic, variant with unknown significance (VUS), likely-benign and benign [24]. The pathogenic variants contribute to the development of diseases [25]. However, a single pathogenic variant may not be sufficient to cause a disease. Likely pathogenic variants have a high likelihood (greater than 90% certainty) of causing disease; however, further evidence will be needed to confirm this assertion of pathogenicity [26]. VUS variants are crucial as the potential effect of the variant in the protein structure is either unknow or rare in the population or has not been registered before [24][25][26]. Thus, identification of VUS variants is important for precise treatment and targeted therapies. This developed artificial intelligence models have been successful in characterization the pathogenicity of VUS variants.
In the literature, many studies have used artificial intelligence models and created risk assessment or early prediction software [27][28][29][30][31][32][33][34][35][36][37]. To the best of our knowledge, this is the first study to assess breast cancer risk using BRCA1 and BRCA2 genetic variants using the MATLAB for both fuzzy logic and neural network.

Study Design and Cohorts
A retrospective integrated analysis was performed from two independent breast cancer cohorts from Bursa Uludag University, Department of Genetics and Erciyes University, Department of Medical Genetics, respectively. Sixteen different risk factors were determined for each subject. These risk factors were age, sex, consanguinity, family history, affected number of family number, tumour size, lymph node, degree of malignancy, tumour position, oestrogen receptor hormone, progesterone hormone, BRCA1 gene variation status, BRCA2 gene variation status, other gene status, diagnosis and variant classification.

MATLAB and Mamdani's Fuzzy Inference Method
MATLAB is a multiple paradigm digital computing software (R2018a) and a fourthgeneration programming language. MATLAB is a proprietary programming language developed by MathWorks and is a high-performance language for technical computing [38]. It combines computing, visualization and programming in an easy-to-use environment, where problems and solutions are expressed in familiar mathematical notations [39]. In this study, the fuzzy logic-based artificial intelligence model was developed on this platform using Mamdani's fuzzy inference method [40].
The five well-known main steps were used: (i) The fuzzification of inputs, (ii) Rule values were determined by using fuzzy logic operations, (iii) The implementation of fuzzy cluster logical processors as "and", "or", (iv) Collection of results; the combination of fuzzy clusters was represented as output of each rule, (v) Defuzzification, where the system clarified the total fuzzy cluster results and converted them into a single number ( Figure 1). The basis of the fuzzy logic system is the creation of a model that can think and make decisions by using data in input clusters [41]. All rules are evaluated in parallel, and the order of the rules is not important [42]. Our system was developed with 16 input attributes from the dataset of BRCA associated breast cancer patients and 1 output attribute with 5 features: pathogenic, likely-pathogenic, VUS, likely-benign and benign. Fuzzification was structured in triangular and trapezoidal membership functions. A Membership Function (MF) is a continuous curve that defines the degree of any numerical variable. The degree of membership is between 0 and 1. Implementation of the Mamdani inference system was made with a rule-based system of 268 rules using if-then statements. The system was characterized by these statements using logical combinations of inputs with an AND operator [43]. The Centroid technique was used for defuzzification and yielded a 95.5% accuracy.

Feed-Forward Neural Network Method
In this study, the neural network-based artificial intelligence model was developed on this platform using feed-forward method [44]. The three well-known main steps were used: (i) The Initialization of network, (ii) Feed-Forward; input values were set and hidden layer values were calculated by using neural network operations, (iii) Backpropagation, where the system clarified the total neural network cluster results and converted them into a single number [45].
An input layer moves in single direction from the input layer to the output layer, with a series of hidden layers and an output layer, each responding to different properties of the data [3]. Therefore, the system learns how to predict the output from the input data.

Data Collection and Study Design
Sixteen different risk factors were determined for each patient's data. Each risk factor was divided into sub-groups known as membership functions. Membership functions of each risk factor for both fuzzy logic and neural network models are shown in Table 1. Data from a total of 932 breast cancer patients were evaluated. 280 patients with genetic variations could be included in the study ( Table 2); 268 patients of out 280 were used to train the systems and 12 patients were introduced to systems. These 12 patients were used to test the systems. The remaining 652 patients were therefore not included. It is important to note that that only 22 patients out of 268 were male.

Generating Fuzzy Logic and Neural Network Systems on the MATLAB
As it is crucial to train the complete data to both systems, fuzzy logic and neural network, which includes input, rules and output sections, was generated on MATLAB.
A total of 43 different membership functions from 16 different input clusters were created (Table 1). Each patient's information was defined as a different rule, which yield as system with different perspectives and possibilities by evaluating 268 different data to give more accurate and sensitive results. For example, the fuzzy logic rules are shown in Figure 2.
Five membership functions, the values of which were given for each membership, were defined at the output section. A classification range was created to determine the variant pathogenicity which was predicted using in silico and variant analysis programs, previously ( Table 3). The values in Table 3 were determined by their pathogenic classification according to ACMG [21]. Results were evaluated at the test phase according to given classification values, which were created for the output cluster. Figure 3 shows a fuzzy logic interface on the MATLAB.  Randomly selected 160 patients were used to train the neural network system, 54 patients were tested for the system and finally, remaining 54 patients therefore used to validate the created data. The training regression (success) was obtained as 99.9% (R = 0.99976). The test success of the system was calculated as 99.7% (R = 0.99735). The validation rate was achieved as~99.6% (R = 0.99578). Thus, all regression were given as 99.9% (R = 0.99882) (Figure 4). Thus, this overall result was compatible with the accurate result that obtained by fuzzy logic (95.5%). (b) The test success of the system using 54 patient (99.7%). (c) The validation success of the system using remain 54 patients (~99.6%). (d) The overall success rate of the system (~99.9%). X-axis represented as output explain regressions data. Y-axis represented as target meaning success ratio between 0-1.

Testing the Systems
The designed software systems were tested using an operation test. Six different tests were conducted for 12 different individuals in the test group to check the accuracy and success rates in both systems. These individuals were grouped according to their variant results. It is important to note that these subjects had not previously been introduced to the system. However, the variant classifications were already known. Therefore, the system outcome result confirmed by previously conducted genetic analysis report. Four patients had two different pathogenic variants within either BRCA1 or BRCA2. Two subjects (subject 1 and 2) had the pathogenic BRCA2 c.7698deIC variant (Table 4) and the other two (subject 3 and 4) had the BRCA1 C.788dupG (p.Ser264*fs*1) pathogenic frameshift variant (Table 4). After the data for subject 1 and subject 2 were entered, the system calculated values for fuzzy logic were 0.900 and 0.890 and for neural network 0.999 and 0.999, respectively. According to the classification criteria and obtained values, the systems confirmed that both individuals were pathogenic. Subject 3 and subject 4 also had the same pathogenic variant, which gave the same risk scores of for both systems 0.900 and 0.999, respectively. Subject 5 and 6 had the same variant classified as likely pathogenic BRCA1 c.4070_4071delAA (p.Glu135.7Glyfs*10). While the test was focused on two likely pathogenic variants, we obtained 0.661 for both variants in the fuzzy logic system. However, neural network achieved 0.778 and 0.751, respectively. On the other hand, the system was tested for variants of unknown significance (VUS) such as BRCA2 c.9924 A > G (p.Ile3312Val), BRCA1 c.3368 A > G (p.Lys1290Glu) and RAD50 c.379 G > A, respectively (Table 4). Therefore, we focused on making the correct estimation of individuals with VUS and possible identification of VUS variants. In the fourth test, we checked the BRCA2 c.9924 A > G (p.Ile3312Val) VUS variant in both individuals (subject 7 and subject 8). The fuzzy logic system predicted cancer risk scores 0.425 and 0.489 for everyone, respectively. On the other hand, the neural network systems calculated the success rate as 0.502 and 0.449, respectively. Subject 9 and subject 10 both carried the BRCA1 c.3368 A > G (p.Lys1290Glu) VUS variant according to the ACMG criteria. The value obtained for subject 9 was 0.489. However, subject 10 had a cancer risk score of 0.571 cancer risk, which was within the likely pathogenic threshold in our fuzzy logic system. The neural network system values were 0.505 and 0.503, respectively. Subject 11 and subject 12 carried the same RAD50 c.379 G > A VUS variant and the fuzzy logic system predicted a value of 0.425 for both individuals, whereas the neural network values were 0.499 and 0.510, respectively.

Discussion
Artificial intelligence enables cheaper, faster and more practical results in medical diagnosis. As technology develops, the use of artificial intelligence will become more widespread especially in medical diagnosis. Rapid diagnosis and treatment are crucial for the prevention of many diseases, such as cancer in medicine. In this context, artificial intelligence applications have gained importance in recent years. In the last decade, the use of high-throughput sequencing methods accumulated enormous genetic variation data as well as patients' clinical and laboratory data. For this reason, it is thought that the use of the accumulated data in artificial intelligence applications would determine risk score assessment for the breast cancer which is the most common in women. Therefore, in this study, we aimed to evaluate the risk assessment for BRCA1and BRCA2associated breast cancer using fuzzy logic and neural networks systems.
Machine learning based on artificial intelligence was successfully used to classify cancer risk scores by Kaya and Turk (2020). They used a total of 140 data to test including 130 for test performance analysis and the remaining 10 for status determination [27]. In the current study, 268 different patients' data were trained in both fuzzy logic and neural network systems. Therefore, broader perspectives were used in both systems for decisionmaking section whereas the risk of making errors were reduced.
A previous study was focused on cytological and histological image analysis in breast diseases for diagnostic outcomes using the fuzzy logic system on the MATLAB [28].
In 2018, A fuzzy logic system was used to predict breast cancer mortality with only five risk factors such as age, personal history, grade, malignant tumour classification (TNM) stage and multicentricity [29]. Furthermore, Domingo et al., (2019) only used six risk factors on fuzzy logic for predicting the stages of breast cancer [30]. As the variety of risk factors was quite low, they mainly focused on lymph nodes and tumours with a narrow perspective.
Sahria and Mandang (2019) developed a program that could show the risk of breast cancer based on the fuzzy logic method using five histological risk factors for only young women [31]. Controversially, the developed systems in this study can applied any age, gender.
On the other hand, the neural network system was previously proposed to diagnose breast cancer patterns using histological and demographic characteristics, such as Togaçar et al., (2020) investigated the diagnostic process based on histological image analysis in breast cancer using deep learning and a convolutional neural network giving success rate of 98.80% [32]. In another study, a hybrid deep neural network with artificial intelligence was successfully used to classify breast cancer risk scores by Yan et al., (2020) based on histological image classification and an average accuracy was 91.3% [33]. Zhang et al. (2020) investigated three breast cancer molecular subtypes based on DCE-MRI images using a convolutional neural network [34]. Thus, in this study, 16 different risk factors were used with the aim of obtaining more accurate results affecting breast cancer with a broader perspective to give more significant value than similar studies in the literature.
The most important key point of the study was the risk assessment of two designed different artificial intelligence methods were based on cancer-associate genes and gene variants. Two recent study aimed to predict breast cancer using histopathology and radiology images for BRCA-mutation carriers using deep learning [35] and machine learning [36], respectively. This current study mainly focused on gene variant-based risk assessment in cancer.
Genetic variants are classified as pathogenic, likely pathogenic, VUS, likely benign and benign according to ACMG [21]. However, problems arising from the evaluation and diagnosis of VUS variants have been a major challenge for physicians and geneticist today. More importantly, VUS variant carrier cancer patients cannot benefit from treatment processes. A study designed to classify BRCA gene related VUS variant in breast cancer using statistical method, previously [37]. In their study, VUS variant classified as either pathogenic or non-pathogenic.
Fuzzy logic and neural network systems in this study were designed and trained to give risk scores to VUS variants using other clinical outcomes of the patient. Therefore, physicians can evaluate VUS variant with given risk score 87 patients with pathogenic, 23 with likely pathogenic, 128 VUS, 29 likely benign and 1 benign BRCA1 and BRCA2 gene variants together with 14 other clinical breast cancer risk factors. Moreover, systems were tested for 12 new individuals including two pathogenic (BRCA2 c.7698deIC and BRCA1 C.788dupG), one likely pathogenic (BRCA1 c.4070_4071deIAA) and three VUS (BRCA2 c.9934 A > G, BRCA1 c.3368 A > G, RAD50 c.379 G > A) variants.
In these models, the neural network system overall success rate was achieved as 99.9% whereas training success (99.9%), evaluating validation success (99.6%), test success (99.7%). Therewithal, the fuzzy logic system showed 95.5% accuracy rate. Therefore, as a result, the accuracy rates given by these systems were precisely correct. Software codes will be available Near East University DESAM Research Institute web link (https: //desam.neu.edu.tr/, accessed on 17 October 2021).

Conclusions
Overall, in this study, developed fuzzy logic and neural networks models were found to be successful in predicting correct risk scores for BRCA1 and BRCA2 associated breast cancers, especially classifying VUS variants. Thus, we believe that the generated fuzzy logic system will become a good source for the identification of VUS variants in breast cancer diagnosis. To conclude, the artificial intelligence model will provide significant advantages considering an early diagnosis and personalized therapy are vital in cancer.  Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.