Exploring Machine Learning Techniques to Predict the Response to Omalizumab in Chronic Spontaneous Urticaria

Background: Omalizumab is the best treatment for patients with chronic spontaneous urticaria (CSU). Machine learning (ML) approaches can be used to predict response to therapy and the effectiveness of a treatment. No studies are available on the use of ML techniques to predict the response to omalizumab in CSU. Methods: Data from 132 CSU outpatients were analyzed. Urticaria Activity Score over 7 days (UAS7) and treatment efficacy were assessed. Clinical and demographic characteristics were used for training and validating ML models to predict the response to treatment. Two methodologies were used to label the data based on the response to treatment (UAS7 ≥ 6): (A) at 1, 3 and 5 months; (B) classifying the patients as early responders (ER), late responders (LR) or non-responders (NR) (ER: UAS7 ≥ 6 at the first month; LR: UAS7 ≥ 6 at the third month; NR: if none of the previous conditions occurred). Results: ER were predominantly characterized by hypertension, while LR mainly suffered from asthma and hypothyroidism. A slight positive correlation (R² = 0.21) was found between total IgE levels and UAS7 at 1 month. Variable Importance Analysis (VIA) identified D-dimer and C-reactive protein as the key blood tests for the performance of the learning techniques. Using methodology (A), SVM (specificity of 0.81) and k-NN (sensitivity of 0.8) were the best models to predict LR at the third month. Conclusion: k-NN combined with the SVM model could be used to identify the response to treatment. D-dimer and C-reactive protein have greater predictive power in training ML models.


Introduction
Chronic spontaneous urticaria (CSU) is defined by the spontaneous occurrence of wheals, angioedema or both that last longer than 6 weeks. Globally, it affects 1% of the general population, has an unpredictable course and duration, and 11-14% of the patients suffer for more than 5 years [1]. Moreover, the impaired quality of life of these patients has a dramatic impact on daily life, personal relationships, work and sleep [2,3].
Appropriate and effective treatment is, therefore, extremely important. According to the EAACI/GA²LEN/EDF/WAO guideline for urticaria, the first-line therapy is second-generation H1-antihistamines at the standard dose, but, unfortunately, these are effective in less than 50% of CSU patients [4]. The third-line therapy, omalizumab, an anti-IgE monoclonal antibody, is more effective, with a complete response rate that ranges from 26% to 83%, as demonstrated in several landmark studies [5-9].
Ideally, the treatment of patients with CSU should be tailored to the patient's clinical or biochemical characteristics, based on predictors of response to treatment.
Identifying these predictors would save time and costs and improve the patient's quality of life. Supervised machine learning (ML) approaches are computational techniques that use labeled data to identify significant patterns, with the aim of building a statistical model that answers an unresolved question. Once the model is built, it can predict the class of new data whose label is unknown. Each labeled sample is described by a set of features, which the model uses to explain the label. In this study, CSU patients were labeled as early, late and non-responders, while the features used to train the model were characteristics associated with each patient, such as demographic and clinical parameters.
ML algorithms have already been used in medicine [10], for example to detect anaphylaxis cases [11], in human microbiome studies [12], in anesthesiology [13], obesity [14] and drug discovery [15], and in other medical fields [16]. To date, no study has assessed the performance of well-known ML approaches for predicting the response to treatment with omalizumab in patients with CSU.

Participants
From October 2018 to December 2019, data were retrospectively collected from 132 South Italian CSU outpatients recruited from the Allergy Disease Center "Prof. Giovanni Bonsignore" and the Allergology and Pulmonology Unit of Palermo. Urticaria activity was assessed using the UAS7. As the response threshold, UAS7 was considered equal to or greater than 6 to classify CSU patients as "early responders" (ER; if they started to respond at 1 month and remained in this condition until the fifth month) and "late responders" (LR; if the response was achieved at the third month). If none of the previous conditions occurred, the patients were considered "non-responders" (NR). The classification of our cohort is described in Table 1. Several baseline variables were collected, e.g., age, sex, residence, weight, height, urticaria start date, co-occurrence of angioedema, total serum IgE (IU/mL), total number (k/µL) and mean basophil number/mm³ [17], D-dimer (ng/mL), C-reactive protein (mg/L), co-occurrence of allergies or other concomitant diseases, pre-treatment outlier blood test results, basophil activation test (BAT) results, and UAS and UAS7 before treatment and after 1, 3 and 5 months.
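The labeling rule described above can be sketched as a small function. This is only an illustration: the function name and the boolean inputs are hypothetical, and each flag is assumed to state whether the patient met the study's response criterion at that visit. Treating a patient who responds at month 3 but not continuously from month 1 as LR is also an assumption, since the text does not spell out every combination.

```python
def label_patient(resp_m1: bool, resp_m3: bool, resp_m5: bool) -> str:
    """Return 'ER', 'LR' or 'NR' from per-visit response flags.

    Each flag says whether the response criterion was met at month 1, 3 or 5.
    """
    # Early responder: responds at month 1 and stays a responder until month 5.
    if resp_m1 and resp_m3 and resp_m5:
        return "ER"
    # Late responder: response achieved at the third month (assumed rule).
    if resp_m3:
        return "LR"
    # None of the previous conditions occurred.
    return "NR"
```

For example, `label_patient(False, True, True)` yields the LR label, while a patient who never meets the criterion at month 3 is labeled NR.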

Machine Learning Approach
Support Vector Machine (SVM) [18] is a widely used classification algorithm. SVM learns from the data by searching for the best hyperplane in a high-dimensional space that separates them into different classes. Kernel functions provide a way to deal with data that are not linearly separable. The algorithm relies on support vectors, i.e., the actual data points closest to the hyperplane, and maximizes the margin between the classes and the hyperplane. k-nearest neighbors (k-NN) classifies new input data according to the characteristics of the closest objects. New cases are labeled by a voting technique that assigns the most common class among their k neighbors [19]. A value of k = 1, meaning that only the single nearest neighbor is considered, is generally a good choice, although higher values of k can be used with larger datasets.
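The voting scheme behind k-NN can be illustrated with a minimal sketch in plain Python. It is not the implementation used in this study; the function name, the Euclidean distance and the default k = 3 are assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    # Pair every training point's Euclidean distance to x with its label.
    dists = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```

With k = 1 the function simply returns the label of the single closest training point.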
Cross-validation is used to stress the results and assess their ability to generalize. The classification is performed several times on smaller subsets of the original data, and the results are then combined to obtain the final classification performance. This procedure reduces the dependence of the estimate on a particular split of the dataset and gives a more reliable indication of how the model will perform on new data [20].

Data Preparation
Medical records were screened to identify eligible patients. All numerical and categorical variables (independent variables) were used to build both simple and multiple linear regression models to study their relationship with UAS7 (response variable).
Concomitant diseases were organized into general groups to identify the most frequent associated diseases (Table 2). The same approach was used for the abnormal blood test values (Table 3). Outlier values from each variable were removed. After that, missing values were replaced with the mean value. Finally, the numerical variables were preprocessed and normalized with a scaling function.
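The three preprocessing steps (outlier removal, mean imputation, scaling) can be sketched for a single numerical variable as follows. The text does not state which outlier rule or scaling function was used, so the 1.5 × IQR rule and z-score scaling below are assumptions, as is the function name.

```python
import statistics


def preprocess(values):
    """Remove outliers, impute missing values with the mean, then z-score.

    `values` is a list of numbers where missing entries are None.
    """
    present = [v for v in values if v is not None]
    # Assumed outlier rule: values beyond 1.5 * IQR from the quartiles.
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    kept = [v if v is not None and lo <= v <= hi else None for v in values]
    # Replace missing (and removed) entries with the mean of what remains.
    mean = statistics.fmean(v for v in kept if v is not None)
    filled = [mean if v is None else v for v in kept]
    # Assumed scaling: zero mean and unit (population) standard deviation.
    sd = statistics.pstdev(filled)
    return [(v - mean) / sd if sd else 0.0 for v in filled]
```

Imputed entries end up at exactly zero after scaling, because they sit at the mean of the variable.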
Different ML models (e.g., k-NN, SVM, lasso, logistic, ridge and elastic net regression) that have been profitably applied for predicting clinical information were explored [18-23] to find a model that could predict the response to treatment with omalizumab in CSU patients. The final dataset was randomly split into a training set and a test set of two-thirds and one-third, respectively. The results were assessed with a 2-fold cross-validation that was repeated 10 times.
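The splitting scheme can be sketched with index lists only. This is a minimal illustration, not the procedure actually run in R: the function names and seeds are hypothetical, and applying the repeated 2-fold cross-validation to the training portion (rather than to the whole dataset) is an assumption.

```python
import random


def train_test_split(n, test_fraction=1 / 3, seed=0):
    """Randomly split indices 0..n-1 into (train, test) index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def repeated_kfold(indices, k=2, repeats=10, seed=0):
    """Yield (train, validation) index lists for repeated k-fold CV."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # Deal the shuffled indices into k folds of near-equal size.
        folds = [shuffled[i::k] for i in range(k)]
        for i, val in enumerate(folds):
            train = [j for f_i, f in enumerate(folds) if f_i != i for j in f]
            yield train, val
```

With 132 patients this gives 88 training and 44 test indices, and 2 folds repeated 10 times produce 20 train/validation pairs whose metrics can then be averaged.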
The performance of the models was compared by calculating statistical measures, e.g., accuracy, specificity, sensitivity, precision and F1 score [24].
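These measures follow directly from the counts of a binary confusion matrix. The sketch below uses the standard definitions; the function name and the convention that label 1 marks a responder are assumptions made for the example.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity, specificity, precision and F1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        # Return 0.0 instead of failing when a denominator is empty.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)  # true positive rate (recall)
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }
```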
The contribution of each variable was evaluated through Variable Importance Analysis (VIA) in R with both the caret and Boruta packages. The model was trained with data scaling as preprocessing, the svmRadialWeights method and a train control using the repeatedcv resampling method.

Ethical Approval
The study was conducted according to the principles of the Declaration of Helsinki, and informed consent was obtained from all patients.

Concomitant Diseases
Results showed no correlation between concomitant diseases and the severity of urticaria (UAS7) or the response to therapy.
ER patients were predominantly characterized by hypertension, while LR and NR patients mainly suffered from asthma. Overall, rhinitis and dyslipidemia were common concomitant diseases. Conversely, hypothyroidism was found only in the group of NR patients. No statistically significant difference was found between groups in the number of concomitant diseases.

Treatment Efficacy
Interestingly, almost 60% of the CSU patients had a good response to omalizumab after 1 month of treatment, and this percentage rose to more than 80% after 5 months of treatment (Figure 1). The clinical response of CSU patients before and after treatment with omalizumab is shown in Figure 2.
The evolution of the response to omalizumab over time was also examined. The scores obtained at 1, 3 and 5 months confirmed that the majority of patients responded moderately at the first month and almost completely at the third and fifth months (Figure 3) [25].

Response to Treatment Based on Patient's Characteristics
It was also investigated whether gender, age, weight or height affected the severity of urticaria or the response to therapy. Only the variables height and age at 1 and 5 months, respectively, were found to be statistically significant for the response to treatment with omalizumab [26].
The place where patients live (inner city vs. countryside) affected neither the severity nor the duration of the disease.

Impact of Disease Duration
A positive, although limited, correlation was found between the number of years the patient had been suffering from urticaria and the response to therapy. Conversely, the presence of angioedema was not statistically related to either the severity of the disease or the response to therapy.

Association between Total IgE Levels and Response to Omalizumab
A slight positive correlation (R² = 0.21) was found between total serum IgE levels and UAS7 at month 1, with no correlation at months 3 or 5.
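For a simple linear regression with one predictor, R² equals the squared Pearson correlation between the two variables. The sketch below shows that computation; the function name is hypothetical and no patient data are implied.

```python
def r_squared(x, y):
    """Coefficient of determination of a simple linear fit of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of squares and cross-products around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # Squared Pearson correlation, which equals R² for one predictor.
    return sxy ** 2 / (sxx * syy)
```

An R² of 0.21 would mean that about one fifth of the variance in UAS7 at month 1 is explained by a linear relationship with total serum IgE.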

Variable Importance Analysis
Variable Importance Analysis (VIA) of the patient data is of considerable value for selecting the best features and for improving classification performance.
For this purpose, we considered personal patient information, disease duration, serological results and, finally, disease severity at 1, 3 and 5 months.
The importance of the variables for classification performance changes over time (Figures 4-6).
However, many variables ranked similarly in the first three months of treatment, such as D-dimer and C-reactive protein, followed by age, height and weight. By contrast, total serum IgE and basophil levels ranked lower in importance.
Boruta analysis showed that only C-reactive protein was significantly associated with treatment response at 1 month (Figure 7). In essence, the variable analysis selects statistically and clinically relevant tests. It is important to consider these results in the follow-up of CSU patients during omalizumab treatment.

Machine Learning Methods Classification
The overall results for each ML method are summarized in Table 4.
Table 4. Prediction of the response to treatment with omalizumab using 5 different ML methods.
The SVM model was built using the onset of urticaria (years), total serum IgE, basophil percentage and basophil count, D-dimer, C-reactive protein and pre-treatment UAS7. Several kernels were tested, e.g., linear, polynomial, radial basis and sigmoid. The best performance was obtained with the sigmoid kernel. The cost of constraint violation was set to 100. Finally, for each time point (1, 3 and 5 months of treatment), accuracy, sensitivity, specificity and precision were computed.
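For reference, the sigmoid kernel has the standard form K(x, z) = tanh(γ x·z + c₀). The sketch below only evaluates this function; the values of γ and c₀ are placeholders, since the text reports only the cost parameter (100), and the function name is hypothetical.

```python
import math


def sigmoid_kernel(x, z, gamma=0.1, coef0=0.0):
    """Sigmoid (hyperbolic tangent) kernel between two feature vectors.

    gamma and coef0 are illustrative defaults, not the study's settings.
    """
    dot = sum(a * b for a, b in zip(x, z))
    return math.tanh(gamma * dot + coef0)
```

In a soft-margin SVM, the cost parameter weighs margin violations against margin width, so a value of 100 penalizes misclassified training points heavily.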
Taken together, these results confirm the utility of the ML approach in learning from patient clinical records and suggest the use of feature selection through VIA as a powerful statistical tool.

Discussion
Electronic Medical Records (EMR) are a powerful source of information and temporal data that foster retrospective studies as the number of patients grows. The use of ML techniques in allergies is still being explored [27]. To our knowledge, this is the first time that a study explores the potential of ML approaches in predicting the response to omalizumab in patients with chronic spontaneous urticaria (CSU).
Moreover, these techniques help to identify the most important indicators through feature selection such as D-dimer and C-reactive protein.
Most patients treated with omalizumab respond quickly to treatment, although to varying degrees. The most selective classification methods (k-NN and SVM) provide high accuracy but lower precision. These results could be explained by the intrinsic diversity of our cohort and, by extension, of the original cause of urticaria and the way omalizumab affects each patient. Furthermore, the number of patients with active urticaria decreases over the analyzed time interval because many of them responded early to omalizumab. As a consequence, many classifiers identify the true negative responses very well at the beginning, while accuracy tends to decrease; conversely, the true response rate increases in the third month. In this scenario, SVM represents the most stable approach.
Classical ML approaches are suitable for small datasets, and their hyperparameter optimization is easier to control. These algorithms can shed light on disease-specific traits by analyzing the most statistically relevant characteristics emerging from the data. The results confirmed a mild robustness to potential bias [28] by revealing literature-based characteristics of CSU patients. Feature selection starts from clinical practice and can suggest new examinations and tests not previously linked to the disease, with the aim of predicting the outcome or the response to treatment in new patients. It is important that future studies extend the analysis with ML approaches, considering as much information as possible, especially for diseases of unknown etiology such as CSU.
Interestingly, ML approaches showed good accuracy as early as the third month of treatment with omalizumab, and selected concomitant diseases, disease duration and serological tests (IgE levels, D-dimer and C-reactive protein, among others) were crucial characteristics for achieving better performance. Classifying the patients as ER or LR was the best choice compared with the other alternatives.

Conclusions
Currently, omalizumab is the only approved third-line treatment for patients with antihistamine-refractory CSU [4]. ML techniques could be effectively used to predict the response to omalizumab therapy, extending our understanding of how it works [29,30].
Further studies involving transcriptional levels could broaden this landscape for the selection of new clinical biomarkers.

Institutional Review Board Statement: The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Ethics Committee "Palermo 1, Azienda Ospedaliera Universitaria Policlinico Paolo Giaccone di Palermo" (protocol n.1/2020, date 15 January 2020).

Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement: Not applicable.

Conflicts of Interest: The authors declare no conflict of interest.