Prediction Model for Diagnosis of Kawasaki Disease Using iTRAQ-Based Analysis

A quick prediction method may help confirm the diagnosis of Kawasaki disease (KD) and reduce the risk of coronary artery lesions. The purpose of this study was to evaluate potential candidate diagnostic serum proteins in KD using isobaric tagging for relative and absolute quantification (iTRAQ) gel-free proteomics. Ninety-two subjects, including 68 KD patients (1.6 ± 1.2 years, M/F 36/32) and 24 fever controls with evident respiratory tract infection (2.1 ± 1.2 years, M/F 13/11), were enrolled. Medical records were reviewed for demographic and laboratory data. iTRAQ gel-free proteomics was used to screen serum proteins comprehensively and compare the two groups, followed by specific validation with ELISA. The candidate proteins and conventional laboratory items were selected for the prediction model of KD diagnosis by support vector machine. Five selected candidate proteins, namely protein S100-A8, protein S100-A9, protein S100-A12, neutrophil defensin 1, and alpha-1-acid glycoprotein 1, were identified for developing the prediction model of KD diagnosis. They were used to develop an efficient KD prediction model with an area under the receiver operating characteristic curve (auROC) value of 0.92 (95% confidence interval: 0.84, 0.98). These protein biomarkers were significantly correlated with the following conventional laboratory items: C-reactive protein, glutamic pyruvic transaminase, white blood cell count, platelet count, segmented neutrophils and hemoglobin. These conventional laboratory items were used to develop a prediction model of KD diagnosis with an auROC value of 0.88 (95% confidence interval: 0.80, 0.96). Our results demonstrated that the prediction model combining the five selected candidate protein levels may be a good diagnostic tool for KD. The prediction model combining the six conventional laboratory items is also an acceptable alternative method for KD diagnosis.


Introduction
Kawasaki disease (KD) is an acute systemic vasculitis, preferentially affecting Asian infants and children under 5 years of age [1,2]. Delayed detection and management of KD (especially atypical KD) may lead to a high risk of coronary artery lesions (CALs) [2,3]. The diagnosis of KD depends mainly on the clinical features. Typical diagnostic symptoms of KD include fever for at least 5 days, nonpurulent conjunctivitis, extremity edema, oral mucosal changes with red lips and strawberry tongue, cervical lymphadenopathy, and a polymorphous skin rash [2]. However, these clinical features are not objective and resemble the symptoms of infections caused by pathogens, which makes accurate diagnosis and timely treatment of KD difficult.
A quick and efficient prediction method may facilitate the diagnosis of KD and thereby reduce the risk of CALs. The purpose of this study was to evaluate candidate diagnostic serum proteins in KD using iTRAQ gel-free proteomics, compare the selected candidate proteins with conventional laboratory items, and develop prediction models of KD diagnosis using the combined selected candidate proteins and the combined conventional laboratory items.

Subject Enrollment
We enrolled 68 typical KD patients and 24 fever control (FC) subjects as the comparison group between 1998 and 2018 at the Department of Pediatrics, Kaohsiung Veterans General Hospital (KVGH), Taiwan. The inclusion criteria were as follows: (1) KD patients meeting the diagnostic criteria of KD, including fever for at least 5 days, nonpurulent conjunctivitis, extremity edema, oral mucosal changes with red lips and strawberry tongue, cervical lymphadenopathy, a polymorphous skin rash and Bacillus Calmette-Guérin (BCG) site erythema [2]. (2) FC subjects with fever (body temperature ≥ 38 °C) for at least 5 days, with a diagnosis of evident respiratory tract infection, and without a history of KD, autoimmune disease, allergic disease or cardiovascular disease. The exclusion criteria were as follows: (1) KD patients who had no serum samples before IVIG treatment in the acute stage, had missing laboratory data, did not receive IVIG treatment or received initial IVIG treatment beyond 10 days of fever. (2) FC subjects who had no serum samples in the acute stage or had missing laboratory data. Medical records were reviewed for age, sex, presenting symptoms, doses of IVIG treatment (2 g/kg/dose), complications and laboratory data. Blood samples (about 1-2 mL) of KD patients were collected before IVIG treatment in the acute stage after obtaining informed consent from their guardians. The FC subjects also underwent blood sampling (about 1-2 mL) once after obtaining informed consent from their guardians. Samples were processed, separated into aliquots of 1 mL and then frozen at −80 °C until proteomic analysis. This study was carried out after the approval of the Institutional Review Board of KVGH (IRB number: VGHKS19-CT2-22). All guardians signed the informed consent form.

iTRAQ Gel-Free Proteomics
Serum samples were first treated with a high-abundance protein depletion kit to remove high-abundance proteins, including albumin, immunoglobulin (Ig)G, IgA, IgM, α1-acid glycoprotein, α1-antitrypsin, α2-macroglobulin, apolipoprotein A-I, apolipoprotein A-II, fibrinogen, haptoglobin and transferrin. Then, 12 serum samples from FC subjects were evenly pooled to generate two serum libraries. Twelve serum samples from KD patients in the acute phase were likewise evenly pooled to generate two serum libraries. This yielded two pooled FC and two pooled KD serum libraries. The libraries were then processed using the iTRAQ reagent kit according to the manufacturer's instructions. The samples were reduced with tris-(2-carboxyethyl) phosphine and alkylated with methyl methanethiosulfonate. Trypsin was added to the samples, and the solutions were incubated at 37 °C overnight. The peptide mixtures were then labeled with the iTRAQ 4-plex reagent. The two FC sample libraries were labeled with iTRAQ 114 and 115. The two KD sample libraries were labeled with iTRAQ 116 and 117.
To examine the difference in serum proteins between the KD and FC groups by iTRAQ, samples were pooled for analysis within each group. Serum proteins with a 1.5-fold change were selected as candidate biomarkers for KD diagnosis.
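The fold-change selection described above can be sketched as follows. This is only an illustration under stated assumptions: the protein names and intensities are invented, the ratio is taken as mean KD reporter intensity over mean FC reporter intensity, and the 1.5-fold cutoff is applied in both directions (up or down), which the text does not specify.

```python
from statistics import mean

# Hypothetical reporter-ion intensities per protein (all numbers invented).
# Channels 114/115 = pooled FC libraries, 116/117 = pooled KD libraries.
intensities = {
    "protein_A": {114: 100.0, 115: 110.0, 116: 210.0, 117: 230.0},
    "protein_B": {114: 200.0, 115: 190.0, 116: 205.0, 117: 195.0},
    "protein_C": {114: 300.0, 115: 320.0, 116: 150.0, 117: 160.0},
}

FC_CHANNELS = (114, 115)
KD_CHANNELS = (116, 117)
THRESHOLD = 1.5  # fold-change cutoff stated in the text


def fold_change(channels):
    """Ratio of mean KD intensity to mean FC intensity for one protein."""
    kd = mean(channels[c] for c in KD_CHANNELS)
    fc = mean(channels[c] for c in FC_CHANNELS)
    return kd / fc


def select_candidates(data, threshold=THRESHOLD):
    """Keep proteins changed at least `threshold`-fold in either direction.

    Treating the cutoff as two-sided is an assumption of this sketch.
    """
    selected = {}
    for name, channels in data.items():
        ratio = fold_change(channels)
        if ratio >= threshold or ratio <= 1.0 / threshold:
            selected[name] = ratio
    return selected


candidates = select_candidates(intensities)
for name, ratio in sorted(candidates.items()):
    print(f"{name}: KD/FC ratio = {ratio:.2f}")
```

With these invented values, protein_A (increased in KD) and protein_C (decreased in KD) pass the cutoff, whereas protein_B does not.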

Support Vector Machine (SVM) Alignment for Prediction Model of KD Diagnosis
SVM is a machine learning algorithm that is particularly well suited to binary classification problems, e.g., disease versus health, control versus treatment and positive versus negative [25]. To apply SVM to a binary classification task, the user must first prepare a positive set and a negative set, and each individual in these sets must be described by numerical variables (also called a vector). In this study, we collected a positive (KD) set and a negative (FC) set. Every subject had numerical values for the five selected candidate protein levels and for conventional laboratory data such as C-reactive protein (CRP), glutamic oxaloacetic transaminase (GOT) and glutamic pyruvic transaminase (GPT). The data of the positive and negative sets were then input into SVM, which generated a classification model. With this classification model, an unknown case (one not yet assigned to the positive or negative set) can be classified. The overall performance of a machine learning algorithm is usually reported as the area under the receiver operating characteristic curve (auROC) [26]. The X- and Y-axes of an ROC plot denote the false positive rate and the true positive rate, respectively.
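The workflow above (prepare labeled numerical vectors, fit a model, classify an unknown case) can be illustrated with a minimal linear soft-margin SVM trained by subgradient descent on the hinge loss. This is a sketch, not the implementation used in the study: the text does not state the kernel or software, and the two standardized features, the training values, the hyperparameters and the `unknown_case` are all hypothetical.

```python
# Hypothetical, already standardized two-feature vectors (all values invented).
# Label +1 = positive set (KD), label -1 = negative set (FC).
train = [
    ((1.0, 1.2), 1), ((1.5, 0.8), 1), ((0.9, 1.4), 1),
    ((-1.0, -1.1), -1), ((-1.3, -0.7), -1), ((-0.8, -1.5), -1),
]


def decision(w, b, x):
    """Signed score w.x + b; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_svm(data, lr=0.1, lam=0.01, epochs=50):
    """Minimize the L2-regularized hinge loss with per-sample subgradient steps."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            # True when x is misclassified or lies inside the margin.
            violated = y * decision(w, b, x) < 1.0
            # The regularization term shrinks w on every step.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if violated:
                # The hinge-loss subgradient pushes the boundary away from x.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 (positive set) or -1 (negative set)."""
    return 1 if decision(w, b, x) >= 0.0 else -1


w, b = train_linear_svm(train)
unknown_case = (1.2, 1.0)  # hypothetical subject awaiting classification
label = predict(w, b, unknown_case)
print("predicted set:", "positive (KD)" if label == 1 else "negative (FC)")
```

The continuous output of `decision` can serve as the score from which an ROC curve is computed, since varying the threshold on that score trades the true positive rate against the false positive rate.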

Statistical Analysis
Continuous variables are expressed as means with standard deviations. Categorical variables are presented as absolute numbers and percentages. Continuous variables were compared between the two groups using the two-tailed unpaired t-test, and categorical variables were compared using the chi-square test or Fisher's exact test as appropriate. Correlation analysis was carried out using the Pearson correlation method. SVM was used to build the prediction model of KD diagnosis. An auROC value with its 95% confidence interval (CI) was calculated. A p-value < 0.05 was considered statistically significant.
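As an illustration of how an auROC value and a 95% CI might be obtained from classifier scores, the sketch below computes the auROC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, and derives an interval by percentile bootstrap. The bootstrap is an assumption of this sketch, because the text does not state how the CI was calculated, and the scores are invented.

```python
import random


def auroc(scores_pos, scores_neg):
    """Fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def bootstrap_ci(scores_pos, scores_neg, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval; each group is resampled with replacement."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        pos = rng.choices(scores_pos, k=len(scores_pos))
        neg = rng.choices(scores_neg, k=len(scores_neg))
        stats.append(auroc(pos, neg))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical classifier scores (higher = more KD-like); invented numbers.
kd_scores = [0.9, 0.8, 0.75, 0.6, 0.55, 0.4]
fc_scores = [0.7, 0.5, 0.35, 0.3, 0.2]

auc = auroc(kd_scores, fc_scores)
ci_low, ci_high = bootstrap_ci(kd_scores, fc_scores)
print(f"auROC = {auc:.2f} (95% CI: {ci_low:.2f}, {ci_high:.2f})")
```

In this toy data, 26 of the 30 positive/negative pairs are ranked correctly, so the point estimate is 26/30. The interval depends on the resampling and is wide with so few cases.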

Discussion
Our study demonstrated that the prediction model combining S100A8, S100A9, S100A12, DEFA1 and ORM1 levels can help confirm the KD diagnosis with an auROC value of 0.92. The six combined conventional laboratory items (CRP, GPT, WBC, platelet, segment and Hgb) can help confirm the KD diagnosis with an auROC value of 0.88. The 95% confidence intervals of the two auROC values overlapped, so both models are useful for early diagnosis of KD. In clinical practice, the prediction model based on conventional laboratory data is an easier and more cost-effective way to diagnose KD than the model based on the selected candidate protein levels. Few cytokine-based platforms for KD diagnosis have been reported, which underlines the potential value of our prediction models using the six conventional laboratory items or the five selected candidate protein levels [27,28]. In this series, we further documented the significant association between the five selected candidate proteins and the conventional laboratory items.
Previous data on KD diagnosis based on proteomics are limited and inconsistent. Zhang et al. reported 29 differentially expressed proteins in KD patients and found that these proteins, especially transthyretin, are potential markers for therapeutic monitoring [19]. The results of Kimura et al. suggested that leucine-rich alpha-2-glycoprotein could be used as a biomarker to facilitate KD diagnosis [20]. Kuo et al. found that 70 proteins had high accuracy in the diagnosis of KD [21]. Compared with these previous studies using traditional proteomics [19-21], our study used the iTRAQ method and showed it to be a powerful tool for discovering candidate proteins for KD diagnosis. Further multi-center studies are needed to evaluate the feasibility of the iTRAQ method in selecting diagnostic or therapeutic targets in KD.
The roles of the five candidate proteins selected for KD diagnosis in this series have been reported previously. Fu et al. reported that S100A12 expression on the circulating endothelial cell (CEC) surface increased significantly in patients with KD [29]. Armaroli et al. suggested that S100A12 activates human coronary artery endothelial cells in an IL-1β-dependent manner [30]. The study by Li et al. showed that S100A12 promoted the infiltration of both freshly isolated clinical neutrophils and neutrophil-like cells through the endothelial layer in vitro [28]. A further in vitro study by Li et al. implied that S100A12 could be a potential therapeutic target for KD [28]. Lech et al. reported that elevated levels of plasma calprotectin months to decades after acute KD, together with infiltration of cells expressing S100A8 and S100A9 in vascular tissues, suggest ongoing, subclinical inflammation [31]. DEFA1 and ORM1 were first reported to be significantly correlated with Kawasaki disease in the acute stage by Li et al. [28]. ORM1 was highly significantly correlated with CRP in this series. Both ORM1 and DEFA1 are related to inflammation [32,33] and are therefore speculated to play a role in KD pathogenesis. Further studies are required to evaluate the role of these proteins, especially DEFA1 and ORM1, in the pathogenesis of KD.
This study has some limitations. First, the main limitation is that the comparison group did not share the major signs of KD, apart from fever duration ≥5 days and cervical adenopathy. We matched the two groups in terms of age, sex and blood sampling time since fever onset to reduce analytic variation. Second, the comparison group comprised FC subjects with heterogeneous respiratory tract infections. The extrapolation and validation of our results are therefore limited. Third, this is a single-center investigation with a limited number of patients. A multi-center study with a large cohort is suggested.

Conclusions
In conclusion, our results demonstrated that the prediction model combining S100A8, S100A9, S100A12, DEFA1 and ORM1 levels may be a good diagnostic tool for KD. The prediction model combining six conventional laboratory items (CRP, GPT, WBC, Hgb, platelet and segment) is an acceptable alternative method for KD diagnosis.

Informed Consent Statement:
After providing a detailed explanation of the study, informed consent was obtained from all guardians.

Conflicts of Interest:
The authors declare no conflict of interest.