Can AI Help Pediatricians? Diagnosing Kawasaki Disease Using DRSA

The DRSA method (dominance-based rough set approach) was used to create decision-making rules based on the results of physical examination and additional laboratory tests in the differential diagnosis of Kawasaki disease (KD), infectious mononucleosis and S. pyogenes pharyngitis in children. The study was conducted retrospectively. The search was based on the ICD-10 (International Classification of Diseases) codes of final diagnosis. Demographic and laboratory data from one Polish hospital (Poznan) were collected. Traditional statistical methods and the DRSA method were applied in data analysis. The algorithm formed 45 decision rules recognizing KD. The rules with the highest sensitivity (number of false negatives equals zero) were based on the presence of conjunctivitis and CRP (C-reactive Protein) ≥ 40.1 mg/L, thrombocytosis and ESR (Erythrocyte Sedimentation Rate) ≥ 77 mm/h; fair general condition and fever ≥ 5 days and rash; fair general condition and fever ≥ 5 days and conjunctivitis; fever ≥ 5 days and rash and CRP ≥ 7.05 mg/L. The DRSA analysis may be helpful in diagnosing KD at an early stage of the disease. It can be used even with a small amount of clinical or laboratory data.


Introduction
Despite the development of medicine, in some cases, there are still no tests giving a clear diagnosis. One such example is the differential diagnosis of Kawasaki disease (KD), infectious mononucleosis and Streptococcus pyogenes infections (angina, scarlet fever). The management of each of these diseases is completely different. Proper diagnosis and accurate treatment implementation, especially in KD and streptococcal pharyngitis, can reduce the risk of permanent complications [1].
Presenting symptoms such as pharyngitis with cervical lymphadenopathy and ambiguous skin lesions are almost the same in each of these diseases and do not differentiate between them. Additional tests might be helpful, but none of them is 100% sensitive. Table 1 presents the symptoms and laboratory findings at the initial stage of the three analyzed diseases.
That is why we decided to use artificial intelligence to address this problem. In this study, for the first time, the original methodology of the dominance-based rough set approach (DRSA) was used to create decision-making rules, based on the results of physical examination and additional tests, that can potentially help in the differential diagnosis of KD, infectious mononucleosis and S. pyogenes pharyngitis in children. DRSA can be applied even when data are incomplete, and it allows parametric and non-parametric data to be compared. The disadvantage of the classical scaling system is that converting clinical data into numeric values risks losing the primary character of the data. Different parameters are added together as if they were equal, so that sums of completely different parameters can yield the same result. The obtained results are non-informative in that they do not show how the diagnosis was made. This discourages the use of such a system in therapeutic decisions because it does not give decision makers the chance to evaluate the individual results.
Abbreviations (Table 1): ALT, alanine aminotransferase; anti-VCA, anti-viral capsid antigen; anti-EA, anti-early antigen; anti-EBNA, anti-EBV nuclear antigen; AST, aspartate transaminase; CRP, C-reactive protein; SR, sedimentation rate; PCT, procalcitonin; StrepTest, Group A Streptococcus antigen immunoassay (throat swab); N, normal; ↓, below normal range; ↑, slightly above normal range; ↑↑, much above normal range.

Materials and Methods
The study was conducted retrospectively. The search was based on the ICD-10 (International Classification of Diseases) codes of final diagnosis.
Data were collected from 1 January 2015 to 31 December 2019 in the Children's Hospital in Poznan, which includes four pediatric departments. For infectious mononucleosis and S. pyogenes infection, children of both sexes up to five years of age were included in the study. For KD, all children with this diagnosis were included in order to extend the study group as much as possible. The definite diagnosis of KD was based on AHA (American Heart Association) criteria. We used the age restriction because we wanted to eliminate adolescent patients with infectious mononucleosis. Epidemiological data show that this disease is most common in older children and young adults, in whom KD is very unlikely [2]. The second reason is that KD occurs primarily in children under five years of age. Without an age restriction for the non-KD groups, the compared groups would have been too heterogeneous in age to fit the overall study aim.
Streptococcal pharyngitis was diagnosed based on positive rapid antigen tests for S. pyogenes. According to Polish recommendations, when a high-sensitivity test is positive, it is not necessary to perform a throat culture [3]. The test was performed on admission to the hospital, that is, at the beginning of the diagnostic process and before treatment started. The diagnosis of infectious mononucleosis was based on positive serological results: positive IgM and IgG anti-VCA (anti-viral capsid antigen) antibodies. We collected the following data: basic demographic information, the duration of the fever and physical examination results. General status was defined as good, fair, serious or critical. For the physical examination, we chose the clinical signs that are known diagnostic criteria for KD: general condition, presence of conjunctivitis, oral mucosa findings, changes in the distal extremities, skin changes and presence of cervical lymph node enlargement.
The laboratory tests included those that are easily available: general blood count, alanine aminotransferase (ALT) activity, aspartate aminotransferase (AST) activity, urinalysis, urine culture, erythrocyte sedimentation rate (ESR), C-reactive protein (CRP) and procalcitonin levels. The values of the blood count parameters and the activity of ALT and AST were analyzed in relation to the normal values specified for a given age group [4][5][6]. Blood for laboratory tests was collected on the children's admission.

DRSA Analysis
There were two subclasses among the data. The first included patients who were diagnosed with Kawasaki Disease. The second one contained the remaining patients, i.e., those who were finally diagnosed with infectious mononucleosis or Streptococcal infection. A relatively high number of missing values was observed for ESR and procalcitonin attributes. Overall, the resulting data set was fully consistent.
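What "consistent" means in this setting can be illustrated with a minimal Python sketch of the dominance principle underlying DRSA: a patient who is at least as severe as another on every condition attribute should not be assigned to a less severe decision class. The attribute names, the toy records and the assumption that every attribute is of gain type (a higher value is stronger evidence for KD) are illustrative only and are not taken from the study data; missing values are left out for brevity.

```python
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, float]


def dominates(x: Record, y: Record, attrs: Sequence[str]) -> bool:
    """True if x is at least as high as y on every condition attribute."""
    return all(x[a] >= y[a] for a in attrs)


def inconsistent_pairs(
    records: List[Record], attrs: Sequence[str], decision: str = "kd"
) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) where record i dominates record j on the
    condition attributes but is assigned to a lower decision class."""
    pairs = []
    for i, x in enumerate(records):
        for j, y in enumerate(records):
            if i != j and dominates(x, y, attrs) and x[decision] < y[decision]:
                pairs.append((i, j))
    return pairs


# Hypothetical toy records (NOT study data): kd = 1 means KD, 0 means non-KD.
toy = [
    {"crp": 50.0, "fever_days": 6, "kd": 1},
    {"crp": 10.0, "fever_days": 2, "kd": 0},
    {"crp": 60.0, "fever_days": 7, "kd": 0},  # dominates record 0, lower class
]
print(inconsistent_pairs(toy, ["crp", "fever_days"]))
```

A data set is fully consistent in this sense when the function returns an empty list.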
Some attributes were transformed in a way that allowed the discovery of global and local monotonic relationships between condition and decision attributes [7]. The applied transformation was non-invasive, that is, it did not bias the nature of the discovered relationships. Sets of decision rules, which were essential for this analysis, were induced using the VC-DOMLEM algorithm [8]. These sets of rules were used to construct component classifiers in variable consistency bagging [9,10]. Variable consistency bagging (VC-bagging) and its variants developed to handle class imbalance were applied to increase the accuracy of the results produced by VC-DOMLEM [11]. Estimation of attribute relevance in rules was performed according to Blaszczynski et al. [12]. All results were obtained in a 10-fold cross validation experiment that was repeated 20 times.
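The evaluation protocol (10-fold cross validation repeated 20 times) can be sketched in plain Python as follows. This shows only how the train/test splits might be generated; it does not implement VC-DOMLEM or VC-bagging. Stratifying the folds by class and the fixed random seed are assumptions of this sketch, not details reported in the study. The class sizes (48 KD, 102 non-KD) follow the patient counts given in the Results.

```python
import random
from collections import defaultdict
from typing import Iterator, List, Sequence, Tuple


def repeated_stratified_folds(
    labels: Sequence[str], n_folds: int = 10, n_repeats: int = 20, seed: int = 0
) -> Iterator[Tuple[int, int, List[int], List[int]]]:
    """Yield (repeat, fold, train_indices, test_indices) for every split."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    for repeat in range(n_repeats):
        folds: List[List[int]] = [[] for _ in range(n_folds)]
        offset = 0
        # Deal each class round-robin into the folds so that class
        # proportions stay roughly equal across folds.
        for members in by_class.values():
            shuffled = members[:]
            rng.shuffle(shuffled)
            for pos, idx in enumerate(shuffled):
                folds[(offset + pos) % n_folds].append(idx)
            offset += len(shuffled)
        for k in range(n_folds):
            test = sorted(folds[k])
            train = sorted(
                i for j, fold in enumerate(folds) if j != k for i in fold
            )
            yield repeat, k, train, test


labels = ["KD"] * 48 + ["non-KD"] * 102
splits = list(repeated_stratified_folds(labels))
print(len(splits))  # number of train/test splits evaluated
```

Each rule set would be induced on the training indices and evaluated on the test indices of a split, and the results averaged over all splits.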
The rules most relevant to the correct classification of cases in repeated cross validation are presented in Table 2. Correct classification to either of the two considered classes was investigated in this setting.
Table 2 legend: Lymphadenopathy (0), cervical lymph nodes not enlarged; Lymphadenopathy (1), one-sided enlargement of cervical lymph nodes; Lymphadenopathy (2), symmetrical (both sides) enlargement of cervical lymph nodes.

Statistical Analysis
Statistical analysis was performed using PQStat 1.8.0 (PQStat Software). For one group of data, the Wilcoxon signed-rank test and the χ² test were used. Comparisons of two groups were performed using the χ² test, Student's t-test or a correction for this test (Cochran-Cox). The Shapiro-Wilk test was used to check the normal distribution of data, and the Fisher-Snedecor test was used to check the equality of variance of variables in the analyzed population. When comparing three groups of data, the Shapiro-Wilk test was used as well. Levene's test was used to check whether the variances of the variables in the population were equal for the interval scale. When comparing groups on the interval scale, the ANOVA test was used for unrelated variables (mean age of patients, mean length of hospitalization). The χ² test was used to compare the groups on the nominal scale. Post-hoc analysis, if needed, was performed with the Fisher test. For the SR value (sedimentation rate, Biernacki's test), it was impossible to perform statistical analysis because too few patients had this result. Statistical significance was calculated for PCT (procalcitonin) and CRP (C-reactive protein) values. The significance level was set at p < 0.05.

Results
There were 150 patients: 48 with KD, 49 patients with infectious mononucleosis and 53 with S. pyogenes pharyngitis. The analyzed population is characterized in Table 3.
Patients diagnosed with KD were admitted mainly during the winter and summer months (33% of all KD patients), those with infectious mononucleosis mostly during autumn months (31% of all these patients, p < 0.01) while children with S. pyogenes infection mostly in the spring (30% of the study group) and winter months (28% of the study group).
Patients with KD usually presented in a fair general condition (92%), while most patients with infectious mononucleosis (80%) and S. pyogenes pharyngitis (70%) were admitted in good general condition (p < 0.01). When analyzing the diagnostic criteria for KD, 79% of the study group presented bilateral, nonexudative conjunctivitis; 90% presented changes on mucous membranes; 48% presented changes on the extremities (mainly edema); 83% presented with a rash; and 77% presented with cervical lymphadenopathy. These KD symptoms were present not only in patients with KD but also in patients with infectious mononucleosis and S. pyogenes pharyngitis.
In patients with infectious mononucleosis, anemia (14%) and elevated ALT and AST activity (27% and 33%, respectively) were observed less frequently, while thrombocytopenia and sterile leukocyturia were not observed at all.
Similarly, in patients with S. pyogenes pharyngitis, there were almost no cases of anemia or thrombocythemia (2% of patients in both groups). Aseptic leukocyturia occurred in 9% of patients, and transaminases activities were elevated in 8% of cases. The mean CRP and procalcitonin levels were increased in patients in all studied groups. The highest CRP results were observed in patients with KD (112.3 mg/L, p < 0.01), while procalcitonin was the highest in patients with S. pyogenes pharyngitis (2.37 µg/L, p = 0.44).

DRSA Results
The most important predictors for the decision rules are presented in Figure 1.
Figure 1 legend: Lymphadenopathy (1), one-sided enlargement of cervical lymph nodes; Lymphadenopathy (2), symmetrical (both sides) enlargement of cervical lymph nodes.
The algorithm generated 45 decision rules recognizing KD. The rules with the highest sensitivity (number of false negatives equals zero) are as follows:

1. If a child suspected of KD has conjunctivitis and CRP ≥ 40.1 mg/L, it is KD.
2. If a child suspected of KD has thrombocythemia and ESR ≥ 77 mm/h, it is KD.
3. If a child suspected of KD has a fair general condition, rash and fever lasting ≥ 5 days, it is KD.
4. If a child suspected of KD has a fair general condition, conjunctivitis and fever lasting ≥ 5 days, it is KD.
5. If a child suspected of KD has fever lasting ≥ 5 days, rash and CRP ≥ 7.05 mg/L, it is KD.
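The five rules can be written down as simple predicates. The following Python sketch is only an illustration of how such rules might be checked against a patient record. The dictionary keys (`conjunctivitis`, `crp_mg_l`, `esr_mm_h`, and so on) are hypothetical names, and two choices are assumptions of this sketch rather than findings of the study: a rule does not fire when a value it needs is missing, and "fair general condition" is matched literally. The key `thrombocytosis` stands for the finding that the text calls both thrombocytosis and thrombocythemia.

```python
from typing import Callable, Dict, List, Optional

Patient = Dict[str, object]


def at_least(value: Optional[float], threshold: float) -> bool:
    """True only when the value is present and not below the threshold."""
    return value is not None and value >= threshold


# Rule numbers follow the list above; thresholds are those given in the text.
RULES: Dict[int, Callable[[Patient], bool]] = {
    1: lambda p: bool(p.get("conjunctivitis"))
    and at_least(p.get("crp_mg_l"), 40.1),
    2: lambda p: bool(p.get("thrombocytosis"))
    and at_least(p.get("esr_mm_h"), 77),
    3: lambda p: p.get("general_condition") == "fair"
    and bool(p.get("rash"))
    and at_least(p.get("fever_days"), 5),
    4: lambda p: p.get("general_condition") == "fair"
    and bool(p.get("conjunctivitis"))
    and at_least(p.get("fever_days"), 5),
    5: lambda p: at_least(p.get("fever_days"), 5)
    and bool(p.get("rash"))
    and at_least(p.get("crp_mg_l"), 7.05),
}


def matching_rules(patient: Patient) -> List[int]:
    """Return the numbers of all rules whose conditions the patient meets."""
    return [number for number, rule in RULES.items() if rule(patient)]


# Hypothetical patient (NOT study data); ESR was not measured.
example = {
    "general_condition": "fair",
    "rash": True,
    "fever_days": 6,
    "crp_mg_l": 10.0,
}
print(matching_rules(example))
```

Listing every matching rule, rather than returning a bare yes/no answer, keeps visible which findings led to the suggestion, which is the kind of transparency the Introduction argues is missing from summed scores.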

Discussion
The five rules generated using the DRSA method that had the most clinical significance were based on the following parameters: fair general condition, fever lasting ≥ 5 days, rash, conjunctivitis, CRP, ESR and thrombocytosis. One of the big advantages of DRSA is that it can combine parametric and non-parametric variables.
Rules 3 and 4 were based on the results of physical examination only. What is more, both of them use patients' general condition. The general condition assessed during admission to the hospital is a useful and easy-to-evaluate indicator, but it is still a subjective factor, depending on the experience of the doctor who makes this assessment [13,14]. The American Hospital Association has advised physicians to use general terminology when describing their patient's state of health. The patient is Good when the patient's vital signs are stable and they are within the normal limits. The patient is conscious, and the patient feels comfortable. The patient is Fair when the patient's vital signs may be stable and within the prescribed normal limits. While the patient may be conscious, there may, however, be minor complications, making the patient somewhat uncomfortable. The patient is Serious when the patient's vital signs might be fluctuating, and they do not adhere to normal safe limits. The patient is Critical when the vital signs of the patient are fluctuating and they do not satisfy the normal patient limits. The patient may have, at some point, lost their consciousness. This type of patient usually will require critical care or some other treatment in the hospital's intensive care unit (ICU). Nevertheless, such a factor as general condition has so far not been taken into account in the analyses of the KD diagnosis. We are aware that the general condition defined as fair is not unique and pathognomonic for KD, but its important feature is that no additional equipment is needed. Moreover, its assessment may help in deciding whether to treat or not because additional symptoms may appear later in the disease or may not occur at all. We believe that further studies should be carried out to optimize and validate this factor in similar analyses.
In addition, these rules are important at the beginning of hospitalization because they consider only one additional factor (clinical symptom), not all the criteria (four out of five are required for a diagnosis of classic KD). This opens the possibility of a faster diagnosis, especially in atypical KD. It may reduce the number of complications, coronary aneurysms among them [15]. Kim and Kim presented a study on the differential diagnosis of Kawasaki disease based on acute cervical lymphadenopathy using the decision tree method [16]. It took into account radiological (ultrasonography/computed tomography) and laboratory findings (CRP, general blood count). The usefulness of such a diagnostic algorithm was shown, but it requires additional tests, especially radiological ones. That is why it may be less useful in everyday clinical practice.
It should be added that we included only children with a clinical suspicion of KD and with a definite alternative final diagnosis. Other children were excluded to avoid placing in the non-KD group children who had KD but were not correctly diagnosed. From the hospital perspective, these three clinical entities are the most difficult to distinguish.
Viral infections other than EBV (Epstein-Barr virus) infections do not cause prolonged fever, and the children are not very sick, so parents do not bring them to hospital. In invasive bacterial infection, the history of fever is usually short, and because of the severity of the condition, parents go to the hospital immediately. Other diseases such as JRA (juvenile rheumatoid arthritis) and SLE (systemic lupus erythematosus) are very rare, and it was not possible to form a group large enough for comparison.
Half of the decision rules contained inflammatory markers (CRP and ESR). CRP is produced by hepatocytes in response to inflammatory processes. However, its production is not only activated by an infectious agent but also a non-infectious one [17]. Therefore, this is a non-specific marker. CRP was initially used to differentiate infectious diseases (viral versus bacterial etiology of disease), but over time it has been noticed that this protein is useful in estimating the prognosis of inflammatory process (not necessarily infectious diseases), including KD [18][19][20][21][22].
Among the analyzed patients, the CRP concentration was different in each disease. The highest concentration of this marker was observed in patients with KD (mean 112.3 mg/L) compared to the infectious diseases (infectious mononucleosis, 26.9 mg/L; streptococcal pharyngitis, 50.9 mg/L). These differences were statistically significant, but in practice, CRP is a very non-specific indicator and cannot be used on its own to differentiate the above diseases. However, CRP can become useful when it is combined with other factors using the DRSA method. As shown above, this marker combined with the factors obtained from the physical examination resulted in excellent sensitivity.
ESR is simple and cheap to perform. However, its variability in both physiological and pathological states and its low specificity, together with the introduction of ever newer inflammatory markers into laboratory practice, mean that it is used less often nowadays [23][24][25]. As in the case of CRP, its increase is observed in all inflammatory processes; therefore, it seems that if both tests are available, CRP should be preferred [26]. However, ESR is still included in the American Heart Association guidelines for the diagnosis of Kawasaki disease as an additional test [27]. A meta-analysis by Xuan et al. showed that ESR may be a prognostic factor for KD resistance to first-line immunoglobulin therapy [28,29]. In the analyzed group of patients, ESR could not be compared between the groups because of missing data. In the hospital, the principle of the superiority of CRP over ESR was followed. The children with KD in whom ESR was measured also had anemia and hypoalbuminemia, which are factors that can themselves increase the ESR result. Thus, it can be concluded that ESR may be useful in predicting resistance to the immunoglobulins used in therapy, but it is not an appropriate differentiating factor. However, using the DRSA method, which enables comparative analyses even in the absence of some data, the combination of thrombocytosis and an ESR ≥ 77 mm/h was found to be highly sensitive in the diagnosis of KD.
In addition, the DRSA analysis showed that general abnormalities in the additional tests, such as thrombocytosis or anemia, are not good factors in the differential diagnosis of KD. The typical causes of thrombocytosis are inflammatory diseases (both infectious and non-infectious), postoperative conditions, trauma, burns, blood loss, asplenia or hyposplenia [30]. What is more, anemia is a common problem in the population, also in Poland [31]. Knowing that thrombocytosis is associated with anemia by pathophysiological mechanisms, these factors are not useful in approximating the diagnosis of KD.
To conclude, it is important that the DRSA analysis made it possible to create decision rules from a small amount of data that was in some cases incomplete (only 150 patients were analyzed). Given the enormous potential of this method, further research should be carried out, especially prospective studies. In view of the increasing frequency of Pediatric Inflammatory Multisystem Syndrome, which clinically corresponds to KD, the results of such studies would be interesting.

Conclusions
A DRSA analysis may be helpful in diagnosing KD at an early stage of the disease. It can be used even with a small amount of, or incomplete, clinical or laboratory data. The most valuable rules for a KD diagnosis, which were created on the basis of the data presented in this work, are as follows:

1. If a child with suspected KD has conjunctivitis and CRP ≥ 40.1 mg/L, it is KD.
2. If a child with suspected KD has thrombocytosis and ESR ≥ 77 mm/h, it is KD.
3. If a child with suspected KD has a fair general condition and fever ≥ 5 days and rash, it is KD.
4. If a child with suspected KD has a fair general condition and fever ≥ 5 days and conjunctivitis, it is KD.
5. If a child with suspected KD has a fever ≥ 5 days and rash and CRP ≥ 7.05 mg/L, it is KD.

What is more, we think that it is worth using the DRSA method in medicine. It is useful for differential diagnosis, especially when no specific test exists for the disease. Many more prospective studies should be performed to confirm this thesis.

Institutional Review Board Statement:
It is explained in the Informed Consent Statement section. The data concerned hospitalized patients; no additional tests were taken beyond those required by the hospitalization itself. Parents signed this consent upon admission.

Informed Consent Statement:
On admission to the hospital, all patients' guardians were informed that their children's medical data obtained during hospitalization may be anonymously used for various analyses. That kind of consent is enough for this kind of research from the ethical point of view in Poland.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.