Article

Machine Learning and Feature Selection in Pediatric Appendicitis

1
Department of Computer Science, St. Francis Xavier University, Antigonish, NS B2G 2W5, Canada
2
Nova Scotia Health Authority, Halifax, NS B3H 1V8, Canada
*
Author to whom correspondence should be addressed.
Tomography 2025, 11(8), 90; https://doi.org/10.3390/tomography11080090
Submission received: 31 May 2025 / Revised: 1 August 2025 / Accepted: 6 August 2025 / Published: 13 August 2025
(This article belongs to the Special Issue Celebrate the 10th Anniversary of Tomography)

Abstract

Background/Objectives: Accurate prediction of pediatric appendicitis diagnosis, management, and severity is critical for clinical decision-making. We aimed to evaluate the predictive performance of a wide range of machine learning models, combined with various feature selection techniques, on a pediatric appendicitis dataset. A particular focus was placed on the role of ultrasound (US) image-descriptive features in model performance and explainability. Methods: We conducted a retrospective cohort study on a dataset of 781 pediatric patients aged 0–18 presenting to Children’s Hospital St. Hedwig in Regensburg, Germany, between January 2016 and February 2023. We developed and validated predictive models; machine learning algorithms included the random forest, logistic regression, stochastic gradient descent, and the light gradient boosting machine (LGBM). These were paired exhaustively with feature selection methods spanning filter-based (association and prediction), embedded (LGBM and linear), and a novel redundancy-aware step-up wrapper approach. We employed a machine learning benchmarking study design where AI models were trained to predict diagnosis, management, and severity outcomes, both with and without US image-descriptive features, and evaluated on held-out testing samples. Model performance was assessed using overall accuracy and area under the receiver operating characteristic curve (AUROC). A deep learner optimized for tabular data, GANDALF, was also evaluated in these applications. Results: US features significantly improved diagnostic accuracy, supporting their use in reducing model bias. However, they were not essential for maximizing accuracy in predicting management or severity. 
In summary, our best-performing models were, for diagnosis, the random forest with embedded LGBM feature selection (98.1% accuracy, AUROC: 0.993), for management, the random forest without feature selection (93.9% accuracy, AUROC: 0.980), and for severity, the LGBM with filter-based association feature selection (90.1% accuracy, AUROC: 0.931). Conclusions: Our results demonstrate that high-performing, interpretable machine learning models can predict key clinical outcomes in pediatric appendicitis. US image features improve diagnostic accuracy but are not critical for predicting management or severity.

1. Introduction

Pediatric appendicitis is characterized by inflammation of the appendix found in patients aged eighteen years and younger. When inflamed, the appendix causes pain and can lead to serious complications for the patient, including peritonitis and infection [1]. Symptoms can include nausea, loss of appetite, constipation, bloating, and abdominal pain [1]. Symptoms are not always easily identified or caught in time in younger patients, as they may not communicate as well and often experience fewer symptoms [2]. Appendicitis is typically caused by a blockage in the lumen, leading to an infection that then causes the appendix to expand and potentially burst [1]. While appendicitis can occur in both males and females, males have been found to be at a slightly higher risk, and most cases occur between the ages of ten and thirty [1]. A highly effective way to diagnose appendicitis is to evaluate the current state of the appendix using medical imaging. This is performed through computed tomography (CT), ultrasound (US), or magnetic resonance imaging (MRI), with CT being the most accurate of the three [3]. A shortcoming of these imaging techniques is that they are expensive and potentially time-consuming. MRI may not always be readily accessible due to high costs, limited availability, and the need for specialized interpretation, all of which can delay diagnosis and treatment. Additionally, CT relies upon ionizing radiation, which for most adults is safe, but may be risky for younger patients due to the radiation’s potential negative effects on their growing bodies [3].
Supervised machine learning is a common technology applied to predictive applications, such as diagnosing a given medical condition. The algorithms are provided with ground-truth training data, which are represented by sets of samples/instances, each containing a set of feature measurements that can inform predictions, and a target variable to be predicted. During training, algorithms establish complex correlational relationships between predictor variables and the target variable, supporting the creation of technologies that can be relied upon to make predictions on samples that were not trained upon. As such, as long as correlations exist between predictor variables and the target variable, AI has the potential to create highly accurate predictive models.
Using artificial intelligence technologies to diagnose appendicitis has been the subject of previous analyses. One study from Saudi Arabia used K-nearest neighbours (KNN), decision trees (DT), bagging, and stacking to identify acute appendicitis and found their stacking model to be the most successful with training accuracy, testing accuracy, testing precision, and testing F1 scores of 97.51%, 92.63%, 95.29%, and 92.04%, respectively [4]. From their study, they found their most important features to be neutrophils, white blood cell count, length of stay, and symptom days for their stacking model [4]. Another study [5] was conducted using results from previous studies [6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34] to determine whether using AI models is an effective way for diagnosing acute appendicitis in adults. This review analyzed twenty-nine studies on diagnosis and prognosis of acute appendicitis, and found that the model most commonly used was the artificial neural network (ANN) [5]. These ANNs produced accuracy scores typically above 80% with some reporting the area under the receiver operating characteristic curve (AUC) nearing 0.99 [5]. However, it should be noted that this analysis was based on an adult population, and so the findings therein may not hold in a pediatric population.
Several recent studies have applied machine learning approaches to pediatric appendicitis using subsets of the dataset analyzed in the present work. A foundational study [35] on a subset of the dataset addressed in this research [36] was previously conducted and included 430 patients. The machine learning models used were logistic regression, random forest, and generalized boosted regression model, all in the R programming language. Their results are summarized: “A random forest classifier achieved areas under the precision-recall curve of 0.94, 0.92, and 0.70, respectively, for the diagnosis, management, and severity of appendicitis”, based on held-out test samples as part of 10-fold validation [35]. A subsequent analysis, as part of a larger team, was performed using a larger subset (579 patients) of the dataset addressed in this study to diagnose pediatric appendicitis using deep learning methods with concept bottleneck models (CBMs) with a primary focus on the ultrasound images [37]. While the dataset contains images and corresponding descriptions of the images, some patients included do not contain a complete set of all of these features. The images are taken from multiple views of the same target area to help ensure imaging has captured key features of the appendix being analyzed. To handle this, the study used a semi-supervised extension in addition to the CBM [37]. They first used a shared encoder neural network to map the images to features, which are then aggregated across imaging views to produce representations and concepts understandable by humans, contributing to the prediction of the target class [37]. Results of 0.80 AUROC were reported for predicting the diagnosis of appendicitis. Two additional studies have been conducted on the updated dataset used in this analysis, focused exclusively on diagnosis [38,39]. 
This includes an approach achieving 94.5% accuracy with the random forest [38], and an approach based on the Hybrid Bat algorithm achieving 94% accuracy [39]. An additional analysis focused on diagnosis and severity [40] but did not report accuracy statistics.

Hypotheses and Contributions

Our objective in this study is to address the following hypotheses. We hypothesize that:
  • The use of open-source machine learning software applied to the Regensburg Pediatric Appendicitis Dataset may produce useful technology for predicting aspects of pediatric appendicitis patient care.
  • By creating technologies that can predict diagnosis, severity, and management of pediatric appendicitis, both by using and withholding US image-derived features, we can assess the apparent value of US imaging in the context of AI predictive technology.
  • Our models will be able to more accurately predict their respective target variables (diagnosis, management, and severity), as compared to previous works on this topic, by thoroughly examining a large set of combinations of machine learning and feature selection algorithms.
  • Feature selection subsets will be informative to clinicians and researchers as to factors that are predictive of diagnosis, management, and severity of pediatric appendicitis, respectively.
Contributions provided by this study include:
  • Consideration of a large selection of feature selection (FS) algorithms, including a novel redundancy-aware FS algorithm developed in our lab;
  • Consideration of novel subsets of features identified by FS;
  • Consideration of a variety of high-performing machine learning algorithms, including the computationally efficient light gradient boosting machine and a deep learner optimized for tabular data, known as Gandalf;
  • Evaluation of our study findings on an updated pediatric appendicitis dataset with more patients/samples than those included in the early work on this topic;
  • Confirmation of the value of ultrasound imaging features in helping to mitigate bias in prediction for diagnosis of appendicitis;
  • Demonstration of strong predictive performance from the models developed across three AI applications in pediatric appendicitis.
Section 1 introduced pediatric appendicitis, related AI technological development, and closely related work on the same dataset, and provided a Hypotheses and Contributions subsection. The rest of the paper proceeds as follows. Section 2 presents a study design overview (Section 2.1), the study participants (Section 2.2), a detailed description of the variables/measurements (Section 2.3), the preprocessing performed on the dataset (Section 2.4), the machine learning methods used (Section 2.5), and the statistics relied upon for machine learning evaluation (Section 2.6). Section 3 provides the results for predicting diagnosis (Section 3.1), management (Section 3.2), and severity (Section 3.3), along with the Gandalf deep learner results (Section 3.4). Section 4 discusses interactions between the machine learning and feature selection technologies employed (Section 4.1), the Gandalf results (Section 4.2), and the value of ultrasound features (Section 4.3), and provides a literature comparison (Section 4.4) and future work (Section 4.5). Our conclusions follow in Section 5.

2. Materials and Methods

2.1. Study Design Overview

We conducted a retrospective cohort study on a dataset of 781 pediatric patients aged 0–18 presenting to Children’s Hospital St. Hedwig in Regensburg, Germany, between January 2016 and February 2023. This study employed a comparative AI benchmarking approach using publicly available benchmarking software applied to an open-access pediatric appendicitis dataset. The analysis covered three clinical tasks: diagnosis (the AI is tasked with performing a diagnosis of appendicitis or not), management (the AI is tasked with predicting the treatment option for the patient), and severity (the AI is tasked with predicting the state of the patient’s appendicitis). The potential value from the inclusion of ultrasound image features was considered for all applications. This study was performed retrospectively on a public domain dataset; as such, no ethics committee approval was required for this analysis.

2.2. Participants

The dataset examined was initially assembled by Marcinkevičs et al., and their analysis was previously published [35]. The dataset was revisited [37] with an extended observation timeline, more patients, and additionally collected ultrasound images for many of the patients. The dataset previously studied [37] included records for 579 patients, whereas we examined an updated version of this dataset with 781 observations. The data was obtained from patients admitted to the tertiary Children’s Hospital St. Hedwig in Regensburg, Germany, with suspected appendicitis between 2016 and 2021. All aspects of the methods of this study were completed by the study authors except for the patient recruitment and data acquisition/curation previously completed [35,37].

2.3. Variables/Measurements

Patient data included demographic information, clinical examinations, laboratory tests, scoring results, and (potentially multiple per patient) ultrasound (US) images and expert-interpreted findings from the images. Descriptions of the feature measurements and target variables are detailed in Table 1, Table 2 and Table 3, and their numeric feature distributions in Table 4. The categorical feature statistics tables have also been provided in Appendix A (Table A1, Table A2 and Table A3). Detailed feature descriptions are also provided in Appendix A, broken down for different feature types, see Table A4, Table A5, Table A6, Table A7, Table A8 and Table A9. Note that there was a single patient/sample with a missing diagnosis field in this dataset; as such, it needed to be excluded from the diagnosis application, resulting in a count of 780 samples for the diagnosis application, whereas we were able to maintain the full sample count of 781 for the remaining two target variable applications. Predictive models were created to target the same three variables previously targeted [35] for binary classification:
  • Diagnosis: Appendicitis (n = 463, 59.36%) or no appendicitis (n = 317, 40.64%).
  • Management: Surgical (n = 298, 38.16%) or conservative (n = 483, 61.84%).
  • Severity: Complicated (n = 119, 15.24%) or uncomplicated (n = 662, 84.76%).
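To put later accuracy figures in context, the class counts above can be turned into the accuracy that a classifier would reach by always predicting the majority class. The following is a minimal sketch using only the counts listed above; it is computed on the full dataset, so the value on any particular held-out split may differ slightly.

```python
# Class counts as reported in Section 2.3.
class_counts = {
    "diagnosis": {"appendicitis": 463, "no appendicitis": 317},
    "management": {"surgical": 298, "conservative": 483},
    "severity": {"complicated": 119, "uncomplicated": 662},
}


def majority_baseline(counts):
    """Return (majority label, accuracy of always predicting that label)."""
    label = max(counts, key=counts.get)
    return label, counts[label] / sum(counts.values())


baselines = {task: majority_baseline(c) for task, c in class_counts.items()}
for task, (label, acc) in baselines.items():
    print(f"{task}: always predicting '{label}' gives {acc:.2%} accuracy")
```

The severity target is the most imbalanced of the three, so overall accuracy alone is a weak indicator there and is best read alongside AUROC.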

2.4. Data Preprocessing

Df-analyze, the software relied upon for our machine learning and feature selection analysis, performs its own data cleaning [41], so null value handling was left to its imputation feature with median selection. Several preprocessing steps were applied before running df-analyze. The US number was dropped, as it acted as a unique ID. All urine sample features were converted from categorical features to an ordinal scale from 0 to 3, so that the ordering of the values was captured in the feature encodings. The management target variable was reduced to a binary class by combining primary surgical, secondary surgical, and simultaneous appendectomy into a single surgical class, as df-analyze requires substantial class representation for all target values for its validation to function. The data summary suggests that secondary surgical management indicates surgery performed after the initial stay during which the patient data was recorded. As part of the previous analysis [35], patients were contacted at least 6 months after discharge, and their management was classified as (secondary) surgical if they had since had an appendectomy. As was previously investigated [35], we predict whether a patient required surgery, as doing so could potentially prevent a second visit to the hospital. Length of stay was also dropped from the dataset. For real-world utility, important target variables such as diagnosis, severity, and management should be predicted as early in a hospital admission as possible, and the correct length of stay cannot be established for a patient until the end of their hospitalization.
The presumptive diagnosis feature may not always match the final diagnosis and may provide additional information reinforced by the managing doctors’ education and expertise, which could be particularly useful in smaller datasets. However, the feature may bias a machine learning model, or in real-world applications, may not be available for input. As such, this feature was excluded from our dataset.
Lymph Nodes Location, Abscess Location, and Gynecological findings were excluded from our dataset, as they were all described as free-form text, mostly in German. When divided into classes by unique values, Abscess Location and Gynecological findings’ largest class had fewer than 20 instances, which is too few for informing reliable predictions in df-analyze [41]. Lymph Nodes Location had some unique values with at least 20 instances, but many of its classes were combinations of others, and the feature is null for more than 80% of records; as such, it was also excluded. To facilitate reproducibility, custom pre-processing code for this dataset is provided in clean-tabular-dataset.py [42].
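The preprocessing steps described in this section can be illustrated with a short sketch. This is not the authors' clean-tabular-dataset.py [42]; all column names, the urine category labels, and the management labels below are hypothetical placeholders chosen for illustration, and missing values are deliberately left untouched because imputation is delegated to df-analyze.

```python
# Hypothetical column names; the real dataset may name these differently.
DROPPED_COLUMNS = {
    "US_Number",               # acts as a unique ID
    "Length_of_Stay",          # only known at the end of hospitalization
    "Diagnosis_Presumptive",   # may bias the model or be unavailable
    "Lymph_Nodes_Location",    # free-form text, mostly null
    "Abscess_Location",        # free-form text
    "Gynecological_Findings",  # free-form text
}
URINE_COLUMNS = ("Ketones_in_Urine", "RBC_in_Urine", "WBC_in_Urine")
# Assumed category labels mapped onto the 0-3 ordinal scale.
URINE_LEVELS = {"no": 0, "+": 1, "++": 2, "+++": 3}
# Assumed labels for the three surgical management categories.
SURGICAL = {"primary surgical", "secondary surgical", "simultaneous appendectomy"}


def clean_record(record):
    """Apply the drop, ordinal-encoding, and target-binarization steps."""
    out = {k: v for k, v in record.items() if k not in DROPPED_COLUMNS}
    for col in URINE_COLUMNS:
        if out.get(col) is not None:  # leave missing values for imputation
            out[col] = URINE_LEVELS[out[col]]
    if out.get("Management") is not None:
        is_surgical = out["Management"] in SURGICAL
        out["Management"] = "surgical" if is_surgical else "conservative"
    return out


example = {
    "US_Number": 42,
    "Age": 11.5,
    "Ketones_in_Urine": "++",
    "RBC_in_Urine": None,
    "Management": "secondary surgical",
    "Length_of_Stay": 3,
}
cleaned = clean_record(example)
print(cleaned)
```

Encoding the urine features as integers rather than one-hot categories preserves their ordering, which is the motivation given above for the ordinal scale.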

2.5. Machine Learning

The machine learning software used in this study is df-analyze [41]. The models considered in this study include the light gradient boosting machine (LGBM), random forest (RF), logistic regression (LR), stochastic gradient descent (SGD), k-nearest neighbours (KNN), and a dummy model that predicts the class with the largest number of samples as a baseline. Df-analyze also supports assessment of a variety of feature selection (FS) technologies [41], each of which is exhaustively combined with all of the aforementioned machine learning methods. This includes two types of filter-based FS: association (assoc) and prediction (pred) [41], two types of embedded FS: linear (embed_linear) and LGBM (embed_lgbm) [41], and an emerging redundancy-aware step-up feature selection method (wrap) unique to df-analyze [43], as well as no (none) FS. The target features in this study were predicted from exhaustive combinations of supported machine learning and FS algorithms, trained and tested individually as part of a fair comparison validation. For each target variable, models are constructed with each FS method. Optuna hyperparameter tuning is supported in df-analyze [41] and was used in this analysis for all machine learning techniques.
The code for running all configurations of our dataset with command line interfaces (CLIs) is provided in run-df-analyze.sh [42]. Each target variable was run with and without US image features. Thus, our analysis involves six runs of df-analyze as follows:
  • Targeting Diagnosis with US Image Features Included;
  • Targeting Diagnosis without US Image Features Included;
  • Targeting Management with US Image Features Included;
  • Targeting Management without US Image Features Included;
  • Targeting Severity with US Image Features Included;
  • Targeting Severity without US Image Features Included.
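The size of this benchmark can be made concrete by enumerating the configurations. The sketch below is only an illustration of the exhaustive pairing described above: the feature selection names are those given in this section, the model labels are shorthand introduced here, and it assumes every model (including the dummy baseline) is paired with every feature selection option. It does not reproduce the df-analyze command line interface.

```python
from itertools import product

# Shorthand labels for the six models listed above (labels are illustrative).
MODELS = ["lgbm", "rf", "lr", "sgd", "knn", "dummy"]
# Feature selection options as named in this section.
SELECTORS = ["assoc", "pred", "embed_linear", "embed_lgbm", "wrap", "none"]
TARGETS = ["diagnosis", "management", "severity"]

# Six df-analyze runs: each target with and without US image features.
runs = list(product(TARGETS, (True, False)))

# Within each run, every model is paired with every selector.
grid = [
    (target, with_us, model, selector)
    for (target, with_us) in runs
    for model, selector in product(MODELS, SELECTORS)
]

print(len(runs), "runs;", len(grid), "model/selector evaluations in total")
```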

2.6. Statistical Analysis

Df-analyze conducts statistical analyses of each classification model paired with each FS method, using eight different metrics: overall accuracy (acc: the proportion of correct predictions out of all predictions), balanced accuracy (bal-acc: the expected accuracy if the dataset classes were balanced), F1-score (f1: the harmonic mean of recall and precision), negative predictive value (npv: the proportion of negative predictions that are correct), positive predictive value (ppv: the proportion of positive predictions that are correct), sensitivity (sens: the proportion of the group of interest predicted correctly), specificity (spec: the proportion of the group not-of-interest predicted correctly), and the area under the ROC curve (AUROC or AUC: the area under the curve outlining the tradeoff between sensitivity and specificity across operating points). The primary metrics used to evaluate each model are overall accuracy and AUROC. Two validation methods were employed: holdout set validation and K-fold validation on the holdout set with K = 5. The holdout set was established with a large 40% of the samples, randomly selected, in order to assist with reliability and reproducibility. Optuna hyperparameter tuning was completed with 50 runs. After completion of the above methods, a new version of df-analyze was released with support for an emerging deep learning method designed for tabular data, known as Gated Adaptive Network for Deep Automated Learning of Features (GANDALF) [44]. Df-analyze was re-accessed to assess this method as well (df-analyze access date: November 2024), and the experiments were re-run with GANDALF enabled. 
Due to the additional computational demands of GANDALF relative to the other machine learning methods assessed, df-analyze was run without redundancy-aware step-up feature selection enabled, as this was the slowest of our considered feature selection methods.
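The eight metrics defined in this section can be written out directly from a binary confusion matrix. The following is a minimal sketch using the standard definitions, not df-analyze's own implementation; it assumes labels coded as 1 (group of interest) and 0, and that both classes and both predictions occur so that no denominator is zero. The AUROC is computed through its rank interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half.

```python
def classification_metrics(y_true, y_pred):
    """Compute acc, bal-acc, f1, npv, ppv, sens, and spec for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sens = tp / (tp + fn)  # recall on the group of interest
    spec = tn / (tn + fp)  # recall on the group not of interest
    ppv = tp / (tp + fp)   # precision
    npv = tn / (tn + fn)
    return {
        "acc": (tp + tn) / len(pairs),
        "bal-acc": (sens + spec) / 2,
        "f1": 2 * ppv * sens / (ppv + sens),
        "npv": npv,
        "ppv": ppv,
        "sens": sens,
        "spec": spec,
    }


def auroc(y_true, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example with made-up labels and scores.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
metrics = classification_metrics(y_true, y_pred)
metrics["auroc"] = auroc(y_true, scores)
print(metrics)
```

Unlike the other seven metrics, AUROC depends on the predicted scores rather than on thresholded labels, which is why a model can rank well on AUROC while trailing on accuracy.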

3. Results

3.1. Predicting Diagnosis

For predicting diagnosis when including US image features, the best-performing model was the random forest (RF) with embedded LGBM-based feature selection, achieving an accuracy of 98.1% and an AUROC of 99.3% across both validation methods, see Appendix B. The features that this model relied upon are outlined in Appendix C, which provides a ranking of their respective apparent importance to inform prediction.
When excluding the US image-based features, the best-performing model was LGBM with no (none) feature selection, achieving an accuracy of 80.1% and an AUROC of 87.3–88.0% across both validation methods, see Appendix D.
The Optuna hyperparameter-tuned model parameters for the leading techniques are provided in Appendix E. A comparative visualization of leading findings is provided in Figure 1.

3.2. Predicting Management

For predicting management, the best-performing model that included US-based image features was the random forest with association filter-based feature selection (assoc), achieving accuracies of 92.0–93.6% and an AUROC of 97.3–98.4% across both validation methods, see Appendix F. The association feature selection method selected a large number of the available features in this dataset; the selected features are provided in detail in Appendix G. Note that a ranking of feature importance is provided separately for numerical and categorical features. The leading features informing prediction, according to the association filter-based method’s reliance on mutual information, were C-reactive protein, Alvarado score, the appendix diameter, white blood cell count, and neutrophil percentage for the numerical variables, and ipsilateral rebound tenderness, diagnosis, peritonitis, severity, and surrounding tissue reaction for the categorical variables.
When predicting management without the US-based image features, the best-performing model was the random forest (RF) with no (none) feature selection, achieving accuracies of 92.0–93.9% and an AUROC of 97.0–98.0% across both validation methods, see Appendix H. Noteworthy is that our emerging redundancy-aware step-up feature selection method (wrap), which is biased in favour of unusually small feature sets, achieved near equal accuracies of 92.0–92.7% and an AUROC of 94.2–96.0%, based on just 11 features, as outlined in Appendix I. The leading features relied upon were peritonitis, white blood cell count, body temperature, weight, severity, and C-reactive protein.
The Optuna hyperparameter-tuned model parameters for leading techniques are provided in Appendix E. A visualization of leading findings is provided in Figure 2.

3.3. Predicting Severity

For predicting severity, with US image features included, the best-performing model was logistic regression (LR) with wrapper-based redundancy-aware step-up feature selection (wrap), which achieved accuracy of 89.1–89.5% and an AUROC of 82.0–83.4% across both validation methods, see Appendix J. The feature selection results are provided in Appendix K. Leading features were meteorism (excess gas in the digestive tract), dysuria, weight, lower right abdominal pain, and free fluids.
When predicting severity with US image features excluded, the best-performing model was LGBM with filter-based association (assoc) feature selection, achieving an accuracy of 89.2–90.1% and an AUROC of 89.6–93.1% across both validation methods, see Appendix L. As is common, the association-based feature selection method selects a large number of the available features in this dataset. Also of interest, redundancy-aware step-up feature selection (wrap) produced similar results, achieving an accuracy of 88.8% and an AUROC of 80.5–81.1% when combined with logistic regression based on just five features, as outlined in Appendix M. The five features included were peritonitis, coughing pain, body temperature, thrombocyte count, and C-reactive protein.
The Optuna hyperparameter-tuned model parameters for leading techniques are provided in Appendix E. A visualization of leading findings is provided in Figure 3.

3.4. GANDALF Results

GANDALF [44] was run with an updated version of df-analyze, and so the results presented can only be roughly compared with the findings presented above due to it being run as an additional round of validation with unique randomization. When predicting diagnosis, the leading accuracy/AUROC for GANDALF was 80.5/90.6% with US features (filter-based prediction feature selection), and 66.7/75.7% without US features (filter-based prediction feature selection). When predicting management, the leading accuracy/AUROC for GANDALF was 91.5/96.9% with US features (no feature selection), and 90.5/97.5% without US features (embedded linear feature selection). When predicting severity, the leading accuracy/AUROC for GANDALF was 81.1/77.7% with US features (filter-based prediction feature selection), and 85.4/81.1% without US features (embedded linear feature selection).

4. Discussion

We performed a detailed study comparing several machine learning algorithms combined exhaustively with a variety of feature selection approaches applied to pediatric appendicitis diagnostics, management (treatment prediction), and severity. Results demonstrate that we are able to create high-performing models for each of the three main predictive tasks addressed. Our extensive use of feature selection has provided a variety of feature sets predictive of our three addressed target variables, information that can potentially assist in the clinical management of appendicitis and may inform the development of future AI technologies in this domain.

4.1. Interactions Between Machine Learning and Feature Selection Technologies

Our df-analyze benchmarking software has been previously used to assess machine learning and feature selection combinations that produce high-quality AI models to assist in schizophrenia diagnostics [45], chronic kidney disease diagnosis [46], mitigating bias in traffic stop outcomes [47], and studying proteins potentially linked with learning in the cerebral cortex [48]. In this study, we investigated the tool’s potential for use in three applications of pediatric appendicitis.
Logistic regression (LR) and stochastic gradient descent (SGD) were only among our top performers when using a feature selection method, suggesting that these methods are sensitive to the inclusion of noisy, useless, and/or redundant features. In contrast, the light gradient boosting machine (LGBM) and the random forest (RF) models often performed well in predicting appendicitis diagnosis, management, and severity both with and without feature selection methods. These results imply that the LGBM and RF are strong at ignoring noisy, useless, and/or redundant features in this application. These observations are expected, as the LGBM and RF are both based on collections of decision tree classifiers, which are inherently capable of ignoring weak features: such features tend not to be chosen when each base learner decision tree selects the feature on which to split. Our results also demonstrate potential from our novel redundancy-aware feature selection (FS) method, which contributed to high-performing models in both management and severity prediction based on relatively small feature sets. Such solutions have the potential to improve the explainability of our AI technologies through a greatly reduced feature set size. For management, our redundancy-aware FS method identified 11 features (see Appendix I), with the leading features relied upon being peritonitis, white blood cell count, body temperature, weight, severity, and C-reactive protein. For severity, our redundancy-aware FS method identified five features (see Appendix M): peritonitis, coughing pain, body temperature, thrombocyte count, and C-reactive protein. These feature sets are highly predictive of management and severity, respectively, and so may represent useful information for clinicians responsible for patient management.
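To illustrate why a step-up wrapper with a redundancy check tends to return small feature sets, the sketch below shows a generic greedy forward selection that skips candidates strongly correlated with an already selected feature. This is not the df-analyze algorithm [43], whose details are not described here; the correlation threshold, the scoring function, and the toy data are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx and vy else 0.0


def step_up_select(features, score, max_redundancy=0.9):
    """Greedy forward selection over a dict of name -> list of values.

    `score` maps a list of feature names to a number (higher is better).
    Candidates correlated above `max_redundancy` (an assumed threshold)
    with any selected feature are skipped as redundant.
    """
    selected, best = [], score([])
    while True:
        candidates = [
            f for f in features
            if f not in selected
            and all(abs(pearson(features[f], features[s])) < max_redundancy
                    for s in selected)
        ]
        scored = [(score(selected + [f]), f) for f in candidates]
        if not scored:
            break
        top_score, top_feature = max(scored)
        if top_score <= best:  # stop when no candidate improves the score
            break
        selected.append(top_feature)
        best = top_score
    return selected


# Toy data: "a_copy" duplicates "a" up to scale, "c" carries extra signal.
features = {
    "a": [1, 2, 3, 4, 5, 6],
    "a_copy": [2, 4, 6, 8, 10, 12],
    "c": [1, 0, 1, 0, 1, 0],
}
target = [4, 2, 6, 4, 8, 6]


def toy_score(names):
    """Illustrative score: |correlation| of the summed features with target."""
    if not names:
        return 0.0
    combined = [sum(vals) for vals in zip(*(features[n] for n in names))]
    return abs(pearson(combined, target))


selected = step_up_select(features, toy_score)
print(selected)
```

In this toy case only one of the two duplicated features is kept alongside the complementary one, which mirrors the motivation for a redundancy-aware approach: fewer features to explain without discarding independent signal.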

4.2. Discussion of GANDALF Results

GANDALF [44], an emerging deep learning architecture designed for tabular data, upon which deep learners have traditionally been underperformers, was assessed as an addendum to this study. Results demonstrate overall good performance from GANDALF; however, it was not the leading AI technology in our trials in terms of predictive accuracy. That said, GANDALF was very competitive in predicting management and severity, especially in terms of AUROC scores, implying the method is capable of creating internal embeddings of feature representations that assist in delineating between our target classes of interest as assessed by AUROC. It is well known that deep learners in particular benefit from large sample sizes to train upon, and so it is expected that in this application, with relatively few samples compared with many other machine learning studies, GANDALF is disadvantaged.

4.3. Predictive Significance of US Image Features

For predicting diagnosis, the performance tables in Appendix B and Appendix D consistently show a decrease in predictive accuracy of our top-performing models of 10–20% when withholding US image features, both on the holdout set and in 5-fold cross-validation on the holdout set. The significant drop in performance suggests that information in the US image features is important for diagnosing appendicitis and helps mitigate bias in the resultant models’ predictions relative to ground-truth diagnoses. When predicting management, there is no drop in performance across our top-performing models when US image features are removed (see Appendix F and Appendix H). Similar findings were observed when comparing performance with US image features included and excluded when predicting severity (see Appendix J and Appendix L). These results suggest US image-derived features are either not useful in predicting the management and severity of pediatric appendicitis or are redundant to non-US-based features available in this dataset.
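The comparison can be summarized numerically from the leading accuracies reported in Section 3. The short sketch below takes the upper end of each reported accuracy range for the best-performing model with and without US image features (a simplifying choice made here) and computes the change in percentage points when the US features are withheld.

```python
# Best reported accuracies (%) from Sections 3.1-3.3, upper end of each range.
best_accuracy = {
    "diagnosis": {"with_us": 98.1, "without_us": 80.1},
    "management": {"with_us": 93.6, "without_us": 93.9},
    "severity": {"with_us": 89.5, "without_us": 90.1},
}

# Change in accuracy (percentage points) when US features are withheld.
change = {
    task: round(acc["without_us"] - acc["with_us"], 1)
    for task, acc in best_accuracy.items()
}

for task, delta in change.items():
    print(f"{task}: {delta:+.1f} percentage points without US features")
```

Only the diagnosis task shows a large loss; the small positive differences for management and severity are within the spread between the two validation methods.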

4.4. Literature Comparison

The appendicitis dataset relied upon has been updated since the earliest publications on it [35,37], supporting a more statistically powered analysis with 781 patients in our study, as opposed to 430 patients [35] and 579 patients [37]. Thus, comparisons between our findings and the foundational papers on this dataset [35,37] are not exact, owing to the difference in dataset size and to the inevitably different validation strategies. A larger dataset is expected to improve predictive accuracy, as more samples are available for training, which is well known to improve the performance of machine learning models generally. Also noteworthy, our validation approach reserved 40% of the samples in the dataset for hold-out testing to help ensure reliability. This has the potential to reduce our reported predictive accuracy, as only 60% of the total samples were available for training in a relatively small dataset. Previous work on this dataset employed validation with 10% of samples included in the testing pools [35]. Our leading models produced AUROC scores of 0.993 for predicting diagnosis, 0.973–0.984 for predicting management, and 0.896–0.931 for predicting severity across our two validation methods. This compares favourably with literature work on a subset of this dataset [35], which reported AUROC scores of 0.96 (+/−0.01) for predicting diagnosis and 0.94 (+/−0.02) for predicting management; for predicting severity, our findings were approximately the same as the literature, which reported an AUROC score of 0.91 (+/−0.07) [35]. Overall, our results are roughly in line with those from the literature [35], with noteworthy improvements in AUROC scores for predicting diagnosis and management.
The improved performance of our models may be attributable to the increased sample size available in our dataset and to features of df-analyze, such as Optuna hyperparameter tuning, the extensive set of feature selection techniques evaluated, the use of state-of-the-art scikit-learn implementations of learning machines in Python (as opposed to relying on R), and the consideration of lightweight, high-performing algorithms such as the light gradient boosting machine (LGBM) and LGBM-based embedded feature selection. It should also be noted that two additional studies have been conducted on the updated dataset used in this analysis, focused exclusively on diagnosis [38,39]. These include an approach achieving 94.5% accuracy with the random forest [38] and an approach based on the Hybrid Bat algorithm achieving 94% accuracy [39]. An additional study, based on recursive feature elimination and the random forest, did not report overall accuracies [39] but reported AUROC scores for diagnosis of 0.96 +/−0.02 [40]. In contrast, our approach, enhanced by Optuna hyperparameter tuning and feature selection, compares favourably, with 98.1% accuracy and an AUROC score of 0.993 for diagnosis.
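The validation arithmetic discussed in this section can be illustrated with a minimal Python sketch. It is not the df-analyze implementation: the function names `holdout_split` and `auroc` are hypothetical, the split shown is a plain random (unstratified) split with an arbitrary seed, and the exact rounding of the 40% test pool is an assumption. AUROC is computed from its definition as the probability that a randomly chosen positive sample is scored above a randomly chosen negative one, with ties counted as one half.

```python
import random


def holdout_split(n_samples, test_fraction=0.4, seed=0):
    """Randomly partition sample indices into (train, test) lists.

    Assumption: unstratified split; the rounding rule is illustrative only.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    # A tie between a positive and a negative score counts as half a win.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# With 781 patients and a 40% holdout, 60% of the samples remain for training.
train_idx, test_idx = holdout_split(781)
print(len(train_idx), len(test_idx))

# Toy scores (made up for illustration, not taken from the study).
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Because AUROC depends only on how scores rank positives against negatives, it can remain high for a model whose thresholded accuracy is less competitive, which is one reason both metrics are reported.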

4.5. Future Work and Limitations

An interesting consideration that has resulted from this study relates to interactions between the target variables. There is potential value, for instance, in predicting diagnosis with and without knowledge of management, or predicting management with or without knowledge of the diagnosis. For instance, diagnosis is often not established until after surgical management, so the method selected for surgical management can potentially be a useful informative feature assisting in the predictive capacity of diagnosis. Conversely, management may benefit from knowledge of the final diagnosis if it is available. However, in situations where it is not (the patient’s final diagnosis is unknown), but the patient is proceeding to management/surgery, then a management prediction algorithm should not be informed as to the patient’s diagnosis when creating a technology to be relied upon clinically. Confounding issues, such as these, are important when creating a series of technologies to be relied upon for aiding clinical management of patients. Models can be created with and without knowledge of the other target variables of interest; thus, appropriate AI models can theoretically be relied upon clinically based on the availability (or not) of given target variables that may be helpful in informing prediction. Furthermore, AI technologies can be created that input a prediction of a target variable assessed by a different AI model. While this study is a research endeavour, and the models developed have not been clinically deployed, it is important for AI developers in medical applications to appreciate the various trade-offs and varying clinical utility of nearly identical models trained on almost the same set of potential predictor variables. Preliminary experiments indicate that high-performing models can be built with df-analyze for these applications with and without the inclusion of alternate target variables as features informing prediction. 
Limitations include that this study was performed on a single dataset, as this is the only dataset of its type publicly available; thus, independent dataset validation was not possible. Future work should involve validation on additional independent datasets in different healthcare environments to assess their generalisability across diverse pediatric populations. Future work should also involve consideration of emerging learning algorithms, such as updates to deep learners focused on tabular data.

5. Conclusions

We investigated the use of several machine learning technologies exhaustively combined with a variety of feature selection algorithms for predicting the diagnosis, management, and severity of pediatric appendicitis, with and without the inclusion of ultrasound image-derived features. Ultrasound image features were found to be important for maximizing accuracy when predicting diagnosis, providing support for the value of imaging features in mitigating bias in the AI model relative to ground-truth diagnoses. However, findings imply that image-derived features are not as useful when predicting the management and severity of the condition. A variety of leading learning machines were presented based on variable subsets of the features identified by our redundancy-aware feature selection, providing detailed information that can potentially aid in the explainability of our AI models. The methods outlined in this study produced AI technologies with robust predictive potential in three applications focused on pediatric appendicitis as assessed by the area under the receiver operating characteristic curve. The technologies developed in this study could potentially help identify and manage young patients with suspected appendicitis. Advantages of the approach taken in this study include the consideration of a novel redundancy-aware step-up feature selection algorithm, consideration of an emerging deep learner optimized for tabular data (GANDALF), assessment of the value of US-derived features, and the creation of highly accurate AI models for three applications. Disadvantages include that this study did not consider convolutional neural networks that process the US images available in this dataset, as well as being reliant on a single dataset for all analyses. Future work will investigate the role of image analysis deep learners, including on additional datasets.

Author Contributions

Conceptualization, J.K. and G.G.; methodology, J.K. and G.G.; software, J.K., G.G. and D.B.; validation, J.K., G.G. and D.B.; formal analysis, J.K. and G.G.; investigation, J.K. and G.G.; resources, J.L.; data curation, J.K. and G.G.; writing—original draft preparation, J.K. and G.G.; writing—review and editing, J.K., G.G. and J.L.; supervision, J.L.; project administration, J.L.; funding acquisition, J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This study was financially supported by a Canada Foundation for Innovation grant, a Nova Scotia Research and Innovation Trust grant, an NSERC Discovery grant, a Compute Canada Resource Allocation, and a Nova Scotia Health Authority grant to J.L.

Institutional Review Board Statement

This dataset was obtained from a public source; IRB approval was reported by the original study authors (without a specific date) as follows. The study involving human participants was reviewed and approved by the University of Regensburg institutional review board (Ethikkommission der Universität Regensburg, no. 18-1063-101). The results presented in this manuscript involved only secondary analysis of de-identified data. The dataset used in this study is publicly available, so institutional review board approval was not required to complete this retrospective analysis.

Informed Consent Statement

The study involving human participants was reviewed and approved by the University of Regensburg institutional review board (Ethikkommission der Universität Regensburg, no. 18-1063-101), which also waived informed consent to routine data analysis. The results presented in this manuscript involved only secondary analysis of de-identified data. For patients followed up after discharge, written informed consent was obtained from parents or legal representatives.

Data Availability Statement

The dataset used in this study is publicly available and can be accessed at https://archive.ics.uci.edu/dataset/938/regensburg+pediatric+appendicitis (accessed on 30 September 2024). No new data were created or collected specifically for this study. Since this was a retrospective analysis of public domain data, no institutional review board approval was necessary for conducting this study.

Conflicts of Interest

Dr. Levman is founder of Time Will Tell Technologies, Inc. The authors declare no relevant conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
LGBM: Light Gradient Boosting Machine
RF: Random Forest
LR: Logistic Regression
SGD: Stochastic Gradient Descent
AUROC: Area Under the Receiver Operating Characteristic Curve
US: Ultrasound
GANDALF: Gated Adaptive Network for Deep Automated Learning of Features

Appendix A

Categorical feature statistics for each of three applications.
Table A1. Categorical feature statistics for the target variable Diagnosis.
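Feature | Class | % Appendicitis | % No Appendicitis | % of Total
Sex | female | 53.19 | 46.81 | 48.33
Sex | male | 65.01 | 34.99 | 51.67
Management | conservative | 35.2 | 64.8 | 61.84
Management | primary surgical | 99.26 | 0.74 | 34.57
Management | secondary surgical | 96.15 | 3.85 | 3.46
Management | simultaneous appendectomy | 0 | 100 | 0.13
Severity | complicated | 99.16 | 0.84 | 15.24
Severity | uncomplicated | 52.19 | 47.81 | 84.76
Appendix on the US | no | 30.04 | 69.96 | 35.14
Appendix on the US | yes | 75 | 25 | 64.86
Migratory Pain | no | 56.23 | 43.77 | 72.7
Migratory Pain | yes | 66.82 | 33.18 | 27.3
Lower Right Abd Pain | no | 36.59 | 63.41 | 5.3
Lower Right Abd Pain | yes | 60.44 | 39.56 | 94.7
Contralateral Rebound Tenderness | no | 51.6 | 48.4 | 61.15
Contralateral Rebound Tenderness | yes | 70.13 | 29.87 | 38.85
Coughing Pain | no | 55.84 | 44.16 | 71.54
Coughing Pain | yes | 66.06 | 33.94 | 28.46
Nausea | no | 48.91 | 51.09 | 41.47
Nausea | yes | 66.45 | 33.55 | 58.53
Loss of Appetite | no | 51.05 | 48.95 | 49.22
Loss of Appetite | yes | 66.84 | 33.16 | 50.78
Neutrophilia | no | 44.47 | 55.53 | 50.68
Neutrophilia | yes | 74.52 | 25.48 | 49.32
Dysuria | no | 58.67 | 41.33 | 94.16
Dysuria | yes | 47.73 | 52.27 | 5.84
Stool | constipation | 59.77 | 40.23 | 11.37
Stool | constipation, diarrhea | 100 | 0 | 0.13
Stool | diarrhea | 65.62 | 34.38 | 16.73
Stool | normal | 57.19 | 42.81 | 71.76
Peritonitis | generalized | 87.8 | 12.2 | 5.3
Peritonitis | local | 86.98 | 13.02 | 24.84
Peritonitis | no | 47.04 | 52.96 | 69.86
Psoas Sign | no | 60.67 | 39.33 | 68.59
Psoas Sign | yes | 52.56 | 47.44 | 31.41
Ipsilateral Rebound Tenderness | no | 47.68 | 52.32 | 93.86
Ipsilateral Rebound Tenderness | yes | 73.68 | 26.32 | 6.14
US_Performed | no | 71.43 | 28.57 | 1.93
US_Performed | yes | 59.11 | 40.89 | 98.07
Free_Fluids | no | 50.61 | 49.39 | 56.88
Free_Fluids | yes | 71.94 | 28.06 | 43.12
Appendix Wall Layers | intact | 77.27 | 22.73 | 60.55
Appendix Wall Layers | partially raised | 100 | 0 | 4.13
Appendix Wall Layers | raised | 96.05 | 3.95 | 34.86
Appendix Wall Layers | upset | 100 | 0 | 0.46
Target Sign | no | 49.02 | 50.98 | 36.96
Target Sign | yes | 94.25 | 5.75 | 63.04
Appendicolith | no | 90.91 | 9.09 | 47.83
Appendicolith | suspected | 100 | 0 | 4.35
Appendicolith | yes | 100 | 0 | 47.83
Perfusion | hyperperfused | 96.77 | 3.23 | 49.21
Perfusion | hypoperfused | 96.43 | 3.57 | 44.44
Perfusion | no | 100 | 0 | 4.76
Perfusion | present | 100 | 0 | 1.59
Perforation | no | 88.24 | 11.76 | 41.98
Perforation | not excluded | 100 | 0 | 18.52
Perforation | suspected | 66.67 | 33.33 | 3.7
Perforation | yes | 100 | 0 | 35.8
Surrounding Tissue Reaction | no | 63.64 | 36.36 | 17.46
Surrounding Tissue Reaction | yes | 94.23 | 5.77 | 82.54
Appendicular Abscess | no | 86.15 | 13.85 | 76.47
Appendicular Abscess | suspected | 100 | 0 | 1.18
Appendicular Abscess | yes | 100 | 0 | 22.35
Pathological Lymph Nodes | no | 59.18 | 40.82 | 24.14
Pathological Lymph Nodes | yes | 53.25 | 46.75 | 75.86
Bowel Wall Thickening | no | 50 | 50 | 44.44
Bowel Wall Thickening | yes | 85.45 | 14.55 | 55.56
Conglomerate of Bowel Loops | no | 81.82 | 18.18 | 51.16
Conglomerate of Bowel Loops | yes | 90.48 | 9.52 | 48.84
Ileus | no | 83.78 | 16.22 | 61.67
Ileus | yes | 100 | 0 | 38.33
Coprostasis | no | 100 | 0 | 35.21
Coprostasis | yes | 50 | 50 | 64.79
Meteorism | no | 100 | 0 | 7.86
Meteorism | yes | 45.74 | 54.26 | 92.14
Enteritis | no | 86.67 | 13.33 | 22.73
Enteritis | yes | 31.37 | 68.63 | 77.27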
Table A2. Categorical feature statistics for the target variable management.
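Feature | Class | Conservative | Primary Surgical | Secondary Surgical
Sex | female | 65.52 | 29.97 | 4.24
Sex | male | 58.56 | 38.96 | 2.48
Severity | complicated | 0 | 96.64 | 3.36
Severity | uncomplicated | 72.96 | 23.41 | 3.47
Diagnosis | appendicitis | 36.72 | 57.88 | 5.4
Diagnosis | no appendicitis | 98.74 | 0.63 | 0.32
Appendix_on_US | no | 68.86 | 26.74 | 4.03
Appendix_on_US | yes | 58.53 | 38.89 | 2.58
Migratory_Pain | no | 63.17 | 33.81 | 2.85
Migratory_Pain | yes | 60.66 | 36.02 | 3.32
Lower_Right_Abd_Pain | no | 73.17 | 24.39 | 2.44
Lower_Right_Abd_Pain | yes | 61.8 | 35.06 | 3
Contralateral_Rebound_Tenderness | no | 70.79 | 27.29 | 1.71
Contralateral_Rebound_Tenderness | yes | 50.67 | 44.63 | 4.7
Coughing_Pain | no | 64.6 | 32.85 | 2.55
Coughing_Pain | yes | 59.17 | 38.07 | 2.29
Nausea | no | 73.83 | 23.36 | 2.8
Nausea | yes | 54.3 | 42.6 | 2.87
Loss_of_Appetite | no | 71.05 | 27.63 | 1.32
Loss_of_Appetite | yes | 54.34 | 41.33 | 4.08
Neutrophilia | no | 79.51 | 17.52 | 2.96
Neutrophilia | yes | 46.26 | 50.69 | 2.77
Dysuria | no | 64.32 | 32.58 | 2.96
Dysuria | yes | 61.36 | 36.36 | 2.27
Stool | constipation | 63.22 | 35.63 | 1.15
Stool | constipation, diarrhea | 0 | 100 | 0
Stool | diarrhea | 57.81 | 39.06 | 3.12
Stool | normal | 64.48 | 32.24 | 3.1
Peritonitis | generalized | 14.63 | 82.93 | 2.44
Peritonitis | local | 19.79 | 74.48 | 5.21
Peritonitis | no | 81.3 | 16.67 | 2.04
Psoas_Sign | no | 63.8 | 34.05 | 2.15
Psoas_Sign | yes | 66.24 | 29.49 | 4.27
Ipsilateral_Rebound_Tenderness | no | 80.03 | 18.76 | 1.2
Ipsilateral_Rebound_Tenderness | yes | 47.37 | 50 | 2.63
US_Performed | no | 26.67 | 46.67 | 26.67
US_Performed | yes | 62.78 | 34.34 | 2.75
Free_Fluids | no | 74.57 | 22.49 | 2.93
Free_Fluids | yes | 46.45 | 50.32 | 2.9
Appendix_Wall_Layers | intact | 71.97 | 26.52 | 1.52
Appendix_Wall_Layers | partially raised | 0 | 100 | 0
Appendix_Wall_Layers | raised | 17.11 | 76.32 | 6.58
Appendix_Wall_Layers | upset | 0 | 100 | 0
Target_Sign | no | 60.78 | 31.37 | 7.84
Target_Sign | yes | 29.89 | 68.97 | 1.15
Appendicolith | no | 54.55 | 36.36 | 9.09
Appendicolith | suspected | 100 | 0 | 0
Appendicolith | yes | 9.09 | 87.88 | 3.03
Perfusion | hyperperfused | 48.39 | 45.16 | 6.45
Perfusion | hypoperfused | 14.29 | 78.57 | 7.14
Perfusion | no | 0 | 100 | 0
Perfusion | present | 0 | 100 | 0
Perforation | no | 44.12 | 50 | 5.88
Perforation | not excluded | 0 | 100 | 0
Perforation | suspected | 66.67 | 33.33 | 0
Perforation | yes | 0 | 100 | 0
Surrounding_Tissue_Reaction | no | 77.27 | 22.73 | 0
Surrounding_Tissue_Reaction | yes | 26.44 | 69.71 | 3.85
Appendicular_Abscess | no | 38.46 | 58.46 | 3.08
Appendicular_Abscess | suspected | 0 | 100 | 0
Appendicular_Abscess | yes | 0 | 89.47 | 10.53
Pathological_Lymph_Nodes | no | 59.18 | 38.78 | 2.04
Pathological_Lymph_Nodes | yes | 68.83 | 27.27 | 3.9
Bowel_Wall_Thickening | no | 68.18 | 27.27 | 4.55
Bowel_Wall_Thickening | yes | 23.64 | 67.27 | 9.09
Conglomerate_of_Bowel_Loops | no | 31.82 | 63.64 | 4.55
Conglomerate_of_Bowel_Loops | yes | 9.52 | 85.71 | 4.76
Ileus | no | 27.03 | 62.16 | 8.11
Ileus | yes | 0 | 95.65 | 4.35
Coprostasis | no | 4 | 88 | 8
Coprostasis | yes | 69.57 | 30.43 | 0
Meteorism | no | 0 | 90.91 | 9.09
Meteorism | yes | 66.67 | 27.91 | 4.65
Enteritis | no | 20 | 73.33 | 6.67
Enteritis | yes | 90.2 | 9.8 | 0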
Table A3. Categorical feature statistics for target variable severity.
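Feature | Class | Complicated | Uncomplicated | % of Total
Sex | female | 14.85 | 85.15 | 48.33
Sex | male | 15.63 | 84.37 | 51.67
Management | conservative | 0 | 100 | 61.84
Management | primary surgical | 42.59 | 57.41 | 34.57
Management | secondary surgical | 14.81 | 85.19 | 3.46
Management | simultaneous appendectomy | 0 | 100 | 0.13
Diagnosis | appendicitis | 25.49 | 74.51 | 59.36
Diagnosis | no appendicitis | 0.32 | 99.68 | 40.64
Appendix_on_US | no | 17.22 | 82.78 | 35.14
Appendix_on_US | yes | 14.09 | 85.91 | 64.86
Migratory_Pain | no | 15.12 | 84.88 | 72.7
Migratory_Pain | yes | 15.17 | 84.83 | 27.3
Lower_Right_Abd_Pain | no | 19.51 | 80.49 | 5.3
Lower_Right_Abd_Pain | yes | 14.87 | 85.13 | 94.7
Contralateral_Rebound_Tenderness | no | 11.73 | 88.27 | 61.15
Contralateral_Rebound_Tenderness | yes | 19.46 | 80.54 | 38.85
Coughing_Pain | no | 14.05 | 85.95 | 71.54
Coughing_Pain | yes | 16.97 | 83.03 | 28.46
Nausea | no | 5.61 | 94.39 | 41.47
Nausea | yes | 21.85 | 78.15 | 58.53
Loss_of_Appetite | no | 7.37 | 92.63 | 49.22
Loss_of_Appetite | yes | 22.7 | 77.3 | 50.78
Neutrophilia | no | 5.12 | 94.88 | 50.68
Neutrophilia | yes | 23.82 | 76.18 | 49.32
Dysuria | no | 13.96 | 86.04 | 94.16
Dysuria | yes | 18.18 | 81.82 | 5.84
Stool | constipation | 17.24 | 82.76 | 11.37
Stool | constipation, diarrhea | 100 | 0 | 0.13
Stool | diarrhea | 19.53 | 80.47 | 16.73
Stool | normal | 13.11 | 86.89 | 71.76
Peritonitis | generalized | 51.22 | 48.78 | 5.3
Peritonitis | local | 29.17 | 70.83 | 24.84
Peritonitis | no | 7.22 | 92.78 | 69.86
Psoas_Sign | no | 15.66 | 84.34 | 68.59
Psoas_Sign | yes | 10.26 | 89.74 | 31.41
Ipsilateral_Rebound_Tenderness | no | 6.54 | 93.46 | 93.86
Ipsilateral_Rebound_Tenderness | yes | 23.68 | 76.32 | 6.14
US_Performed | no | 13.33 | 86.67 | 1.93
US_Performed | yes | 15.07 | 84.93 | 98.07
Free_Fluids | no | 7.58 | 92.42 | 56.88
Free_Fluids | yes | 23.55 | 76.45 | 43.12
Appendix_Wall_Layers | intact | 5.3 | 94.7 | 60.55
Appendix_Wall_Layers | partially raised | 66.67 | 33.33 | 4.13
Appendix_Wall_Layers | raised | 32.89 | 67.11 | 34.86
Appendix_Wall_Layers | upset | 100 | 0 | 0.46
Target_Sign | no | 19.61 | 80.39 | 36.96
Target_Sign | yes | 21.84 | 78.16 | 63.04
Appendicolith | no | 9.09 | 90.91 | 47.83
Appendicolith | suspected | 0 | 100 | 4.35
Appendicolith | yes | 48.48 | 51.52 | 47.83
Perfusion | hyperperfused | 16.13 | 83.87 | 49.21
Perfusion | hypoperfused | 28.57 | 71.43 | 44.44
Perfusion | no | 0 | 100 | 4.76
Perfusion | present | 100 | 0 | 1.59
Perforation | no | 11.76 | 88.24 | 41.98
Perforation | not excluded | 66.67 | 33.33 | 18.52
Perforation | suspected | 33.33 | 66.67 | 3.7
Perforation | yes | 68.97 | 31.03 | 35.8
Surrounding_Tissue_Reaction | no | 6.82 | 93.18 | 17.46
Surrounding_Tissue_Reaction | yes | 30.29 | 69.71 | 82.54
Appendicular_Abscess | no | 21.54 | 78.46 | 76.47
Appendicular_Abscess | suspected | 100 | 0 | 1.18
Appendicular_Abscess | yes | 78.95 | 21.05 | 22.35
Pathological_Lymph_Nodes | no | 16.33 | 83.67 | 24.14
Pathological_Lymph_Nodes | yes | 9.74 | 90.26 | 75.86
Bowel_Wall_Thickening | no | 11.36 | 88.64 | 44.44
Bowel_Wall_Thickening | yes | 36.36 | 63.64 | 55.56
Conglomerate_of_Bowel_Loops | no | 22.73 | 77.27 | 51.16
Conglomerate_of_Bowel_Loops | yes | 71.43 | 28.57 | 48.84
Ileus | no | 10.81 | 89.19 | 61.67
Ileus | yes | 82.61 | 17.39 | 38.33
Coprostasis | no | 28 | 72 | 35.21
Coprostasis | yes | 21.74 | 78.26 | 64.79
Meteorism | no | 27.27 | 72.73 | 7.86
Meteorism | yes | 13.18 | 86.82 | 92.14
Enteritis | no | 20 | 80 | 22.73
Enteritis | yes | 5.88 | 94.12 | 77.27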
Table A4. Demographic/Other.
Variable | Variable Name in Data Files | Explanation | Mode and Time of Measurement | Variable Type and Values
Age, years | Age | Obtained from the date of birth | At hospital admission | Continuous
Sex | Sex | Registered gender | At hospital admission | Binary: female/male
Height, cm | Height | Patient’s height | At hospital admission | Continuous
Weight, kg | Weight | Patient’s weight | At hospital admission | Continuous
Body mass index (BMI), kg/m² | BMI | Measures body fat; patient’s weight divided by the square of the height | At hospital admission | Continuous
Length of stay, days | Length_of_Stay | Length of stay in the hospital | At discharge | Continuous
Table A5. Scoring.
Variable | Variable Name in Data Files | Explanation | Mode and Time of Measurement | Variable Type and Values
Alvarado score (AS), pts | Alvarado_Score | Patient’s score according to the scoring system | At hospital admission, after clinical examination and laboratory data | Discrete
Pediatric appendicitis score (PAS), pts | Pediatric_Appendicitis_Score | Patient’s score according to the scoring system | At hospital admission, after clinical examination and laboratory data | Discrete
Table A6. Clinical features.
Variable | Variable Name in Data Files | Explanation | Mode and Time of Measurement | Variable Type and Values
Peritonitis/abdominal guarding | Peritonitis | Spasm of abdominal wall muscles detected on palpation, usually a result of inflammation | At hospital admission, during clinical examination, or after a few hours of observation, if needed, after analgesia | Categorical: no, localized, generalized
Migration of pain | Migratory_Pain | Abdominal pain; usually starts in the epigastrium and moves to the right lower quadrant | At hospital admission, during clinical examination or anamnesis | Binary: no/yes
Tenderness in right lower quadrant (RLQ) | Lower_Right_Abd_Pain | Right iliac fossa pain detected on palpation | At hospital admission, during clinical examination | Binary: no/yes
Contralateral rebound tenderness | Contralateral_Rebound_Tenderness | A state in which pain of the contralateral side (usually, the right lower quadrant) is felt on the release of pressure (usually, in the left lower quadrant) over the abdomen | At hospital admission, during clinical examination | Binary: no/yes
Ipsilateral rebound tenderness | Ipsilateral_Rebound_Tenderness | A state in which pain of the ipsilateral side is felt on the release of pressure over the abdomen | At hospital admission, during clinical examination | Binary: no/yes
Cough tenderness | Coughing_Pain | Abdominal pain from forced cough | At hospital admission, during clinical examination | Binary: no/yes
Psoas sign | Psoas_Sign | Abdominal pain produced by extension of the hip | At hospital admission, during clinical examination | Binary: negative/positive
Nausea/vomiting | Nausea | Feeling of sickness/ejection of contents from the stomach through the mouth | Anamnesis | Binary: no/yes
Anorexia | Loss_of_Appetite | Loss of appetite | Anamnesis | Binary: no/yes
Body temperature, °C | Body_Temperature | Measured by a thermometer placed in the rectum or in the auditory canal | At hospital admission or after a few hours of observation | Continuous
Dysuria | Dysuria | Pain or other difficulty during urination | Anamnesis | Binary: no/yes
Stool | Stool | Characteristics of bowel movements | Anamnesis | Categorical: normal, diarrhea, obstipation
Table A7. Laboratory Features.
Variable | Variable Name in Data Files | Explanation | Mode and Time of Measurement | Variable Type and Values
White blood cell count (WBC), 10^3/µL | WBC_Count | The number of leucocytes in a unit volume of blood; inflammation parameter | At hospital admission, obtained from a routine hemogram | Continuous
Red blood cell count (RBC), /pL | RBC_Count | The number of erythrocytes in a unit volume of blood | At hospital admission, obtained from a routine hemogram | Continuous
Hemoglobin, g/dL | Hemoglobin | Hemoglobin level; a red protein in the red blood cells that contains iron and is responsible for transporting oxygen | At hospital admission, obtained from a routine hemogram | Continuous
Red cell distribution width (RDW), % | RDW | A blood test that measures the differences in the volume and size of the erythrocytes | At hospital admission, obtained from a routine hemogram | Continuous
Thrombocyte count, /nL | Thrombocyte_Count | The number of platelets in a unit volume of blood | At hospital admission, obtained from a routine hemogram | Continuous
Neutrophils, % | Neutrophil_Percentage | Mature WBC in the granulocytic series | At hospital admission, obtained from differential WBC | Continuous
Neutrophilia, >= 75% | Neutrophilia | Relative neutrophilic leucocytosis, often a result of a bacterial infection | At hospital admission, obtained from differential WBC | Binary: no/yes
Segmented neutrophils, % | Segmented_Neutrophils | Most mature neutrophilic granulocytes present in circulating blood, increased during an inflammatory disorder | At hospital admission, obtained from differential WBC | Continuous
C-reactive protein (CRP), mg/L | CRP | Protein produced by the liver, elevated in case of inflammation, infection, or injury | At hospital admission, obtained from blood sample | Continuous
Ketones in urine | Ketones_in_Urine | Presence of ketone bodies in urine, e.g., in case of anorexia | At hospital admission, obtained from routine urine status | Categorical: no, +, ++, +++
Erythrocytes in urine | RBC_in_Urine | Blood in urine | At hospital admission, obtained from routine urine status | Categorical: neg: <5 ery/µL; +: approx. 5–10 ery/µL; ++: approx. 25 ery/µL; +++: approx. 50 ery/µL
White blood cells in urine | WBC_in_Urine | Leucocytes in urine, e.g., in case of infection | At hospital admission, obtained from routine urine status | Categorical: no, +, ++, +++
Table A8. Ultrasound Features.
Variable | Variable Name in Data Files | Explanation | Mode and Time of Measurement | Variable Type and Values
Performed ultrasound (US) | US_Performed | If an abdominal ultrasonography was performed or not | At hospital admission, after clinical examination, or after a few hours of observation | Binary: no/yes
Visibility of appendix | Appendix_on_US | Detectability of the vermiform appendix during sonographic examination | At hospital admission, after clinical examination, or after a few hours of observation | Binary: no/yes
Appendix diameter, mm | Appendix_Diameter | Maximal outer diameter of the appendix | At hospital admission, after clinical examination, or after a few hours of observation | Continuous
Free intraperitoneal fluid | Free_Fluids | Free fluids inside the abdomen | At hospital admission, after clinical examination, or after a few hours of observation | Binary: no/yes
Appendix layer structure | Appendix_Wall_Layers | Distribution and characteristics of appendix layers, e.g., irregular in case of increasing inflammation | At hospital admission, after clinical examination, or after a few hours of observation | Binary: regular/irregular
Target sign | Target_Sign | Axial image of appendix with a fluid-filled centre surrounded by echogenic mucosa and submucosa and hypoechoic muscularis | At hospital admission, after clinical examination, or after a few hours of observation | Binary: no/yes
Appendix perfusion | Perfusion | Blood flow to the appendix wall | At hospital admission, after clinical examination, or after a few hours of observation | Categorical: unremarkable, hypoperfused, hyperperfused
Surrounding tissue reaction | Surrounding_Tissue_Reaction | Inflammation signs in tissue (i.a. in omentum/fat tissue) surrounding appendix | At hospital admission, after clinical examination, or after a few hours of observation | Binary: no/yes
Pathological lymph nodes | Pathological_Lymph_Nodes | Enlarged and inflamed intra-abdominal lymph nodes | At hospital admission, after clinical examination, or after a few hours of observation | Binary: no/yes
Location of pathological lymph nodes | Lymph_Node_Location | The location of pathological lymph nodes in the abdomen | At hospital admission, after clinical examination, or after a few hours of observation | Free-form text (in German)
Thickening of the bowel wall | Bowel_Wall_Thickening | Edema of the intestinal wall, >2–3 mm for small bowel wall thickening | At hospital admission, after clinical examination, or after a few hours of observation | Binary: no/yes
Ileus | Ileus | Sonographic signs of paralytic ileus (e.g., dilated intestinal loops, pendulum peristalsis or absence of peristalsis) | At hospital admission, after clinical examination, or after a few hours of observation | Binary: no/yes
Coprostasis | Coprostasis | Fecal impaction in the colon | At hospital admission, after clinical examination, or after a few hours of observation | Binary: no/yes
Meteorism | Meteorism | Accumulation of gas in the intestine | At hospital admission, after clinical examination, or after a few hours of observation | Binary: no/yes
Enteritis | Enteritis | Sonographic features of gastroenteritis, e.g., wall thickening of the ileum, increased peristalsis | At hospital admission, after clinical examination, or after a few hours of observation | Binary: no/yes
Appendicolith | Apendicolith | Presence of fecalith in the appendix, e.g., acoustic shadow | At hospital admission, after clinical examination, or after a few hours of observation | Binary: no/yes
Perforation | Perforation | Signs of appendix perforation in US | At hospital admission, after clinical examination, or after a few hours of observation | Binary: no/yes
Appendicular abscess | Appendicular_Abscess | Appendiceal mass | At hospital admission, after clinical examination, or after a few hours of observation | Binary: no/yes
Location of abscess | Abscess_Location | Location of the abscess intraperitoneal | At hospital admission, after clinical examination, or after a few hours of observation | Free-form text (in German)
Conglomerate of bowel loops | Conglomerate_of_Bowel_Loops | Small intestine conglomerate as a sign of intraperitoneal inflammation | At hospital admission, after clinical examination, or after a few hours of observation | Binary: no/yes
Gynecological findings | Gynecological_Findings | Gynecological abnormalities, e.g., cysts, ovarian torsion | At hospital admission, after clinical examination, or after a few hours of observation | Free-form text (in German)
Ultrasound images | NA | Snapshots from the abdominal ultrasound exams | At hospital admission, after clinical examination, or after a few hours of observation | Images in BMP format
Table A9. Diagnosis/management/severity target variables.
Variable | Variable Name in Data Files | Explanation | Mode and Time of Measurement | Variable Type and Values
Presumptive diagnosis | Diagnosis_Presumptive | Patient’s suspected diagnosis | At hospital admission, after clinical examination, or after a few hours of observation | Free-form text (in German)
Diagnosis | Diagnosis | Patient’s diagnosis, histologically confirmed for operated patients. Conservatively managed patients were labelled as having appendicitis if they had an AS or PAS of ≥4 and an appendix diameter of ≥6 mm | At hospital admission, after clinical examination, or after a few hours of observation | Binary: no appendicitis/appendicitis
Management | Management | Management of the patient assigned by a senior pediatric surgeon: operative (appendectomy: laparoscopic, open or conversion) or conservative (without antibiotics). In case of the secondary surgery after prior stay, the patient was labelled as operatively managed. | At hospital admission after clinical examination, or after a few hours of observation; or during follow-up. | Categorical: conservative, primary surgical, secondary surgical
Severity | Severity | Severity of appendicitis: uncomplicated: subacute/catarrhal, fibrosis; phlegmonous or complicated: gangrenous, perforated, abscessed | At hospital admission after clinical examination, or after a few hours of observation; or during follow-up. | Binary: uncomplicated or no appendicitis/complicated appendicitis
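The labelling rule for conservatively managed patients given in the Diagnosis row of Table A9 can be written as a small predicate. This is only a sketch under stated assumptions: the function name `conservative_label` is hypothetical, the rule is read as "(AS ≥ 4 or PAS ≥ 4) and appendix diameter ≥ 6 mm", which is one plausible reading of the table text, and missing values are not handled.

```python
def conservative_label(alvarado_score, pas, appendix_diameter_mm):
    """Diagnosis label for a conservatively managed patient.

    Assumed reading of Table A9: appendicitis if either clinical score is at
    least 4 AND the appendix diameter on ultrasound is at least 6 mm.
    """
    score_criterion = alvarado_score >= 4 or pas >= 4
    diameter_criterion = appendix_diameter_mm >= 6
    if score_criterion and diameter_criterion:
        return "appendicitis"
    return "no appendicitis"


# Illustrative inputs (made up, not patient data).
print(conservative_label(alvarado_score=5, pas=3, appendix_diameter_mm=7.0))
print(conservative_label(alvarado_score=3, pas=3, appendix_diameter_mm=8.0))
```

Under this reading, the diagnosis label for non-operated patients depends directly on the Alvarado score, the PAS, and the US-measured appendix diameter, all of which also appear among the predictor variables.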

Appendix B

Results from predicting Diagnosis with US image features.
Table A10. Diagnosis with US image features holdout set performance.
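Model | Selection | Embed_Selector | Acc | Auroc | Bal-Acc | F1 | Npv | Ppv | Sens | Spec
rf | embed_lgbm | lgbm | 0.981 | 0.993 | 0.978 | 0.980 | 0.992 | 0.974 | 0.978 | 0.961
lgbm | embed_linear | linear | 0.981 | 0.993 | 0.979 | 0.980 | 0.984 | 0.979 | 0.979 | 0.969
rf | pred | none | 0.981 | 0.993 | 0.979 | 0.980 | 0.984 | 0.979 | 0.979 | 0.969
lgbm | none | none | 0.981 | 0.994 | 0.979 | 0.980 | 0.984 | 0.979 | 0.979 | 0.969
lgbm | pred | none | 0.978 | 0.996 | 0.976 | 0.977 | 0.976 | 0.978 | 0.976 | 0.969
rf | embed_linear | linear | 0.978 | 0.991 | 0.974 | 0.977 | 0.992 | 0.968 | 0.974 | 0.953
lgbm | assoc | none | 0.978 | 0.994 | 0.975 | 0.977 | 0.984 | 0.973 | 0.975 | 0.961
lgbm | embed_lgbm | lgbm | 0.978 | 0.996 | 0.976 | 0.977 | 0.976 | 0.978 | 0.976 | 0.969
rf | none | none | 0.965 | 0.992 | 0.958 | 0.963 | 0.992 | 0.948 | 0.958 | 0.921
rf | assoc | none | 0.952 | 0.993 | 0.942 | 0.949 | 0.991 | 0.929 | 0.942 | 0.890
rf | wrap | none | 0.875 | 0.945 | 0.867 | 0.870 | 0.861 | 0.884 | 0.867 | 0.827
lr | pred | none | 0.865 | 0.947 | 0.864 | 0.862 | 0.820 | 0.899 | 0.864 | 0.858
lgbm | wrap | none | 0.862 | 0.950 | 0.857 | 0.857 | 0.833 | 0.882 | 0.857 | 0.827
sgd | pred | none | 0.837 | 0.843 | 0.823 | 0.828 | 0.833 | 0.838 | 0.823 | 0.748
lr | assoc | none | 0.833 | 0.910 | 0.827 | 0.827 | 0.795 | 0.859 | 0.827 | 0.795
lr | none | none | 0.833 | 0.910 | 0.827 | 0.827 | 0.795 | 0.859 | 0.827 | 0.795
lr | embed_linear | linear | 0.833 | 0.910 | 0.827 | 0.827 | 0.795 | 0.859 | 0.827 | 0.795
lr | embed_lgbm | lgbm | 0.804 | 0.893 | 0.794 | 0.796 | 0.770 | 0.826 | 0.794 | 0.740
sgd | embed_linear | linear | 0.779 | 0.798 | 0.759 | 0.765 | 0.769 | 0.784 | 0.759 | 0.654
sgd | none | none | 0.766 | 0.753 | 0.753 | 0.755 | 0.725 | 0.792 | 0.753 | 0.685
sgd | assoc | none | 0.760 | 0.748 | 0.748 | 0.749 | 0.713 | 0.789 | 0.748 | 0.685
sgd | wrap | none | 0.760 | 0.795 | 0.753 | 0.752 | 0.700 | 0.802 | 0.753 | 0.717
lr | wrap | none | 0.753 | 0.847 | 0.724 | 0.730 | 0.766 | 0.748 | 0.724 | 0.567
sgd | embed_lgbm | lgbm | 0.753 | 0.743 | 0.743 | 0.743 | 0.702 | 0.787 | 0.743 | 0.685
knn | none | none | 0.683 | 0.715 | 0.651 | 0.653 | 0.649 | 0.697 | 0.651 | 0.480
knn | embed_linear | linear | 0.683 | 0.721 | 0.644 | 0.644 | 0.671 | 0.687 | 0.644 | 0.433
knn | assoc | none | 0.683 | 0.721 | 0.644 | 0.644 | 0.671 | 0.687 | 0.644 | 0.433
knn | pred | none | 0.676 | 0.722 | 0.634 | 0.633 | 0.667 | 0.679 | 0.634 | 0.409
knn | embed_lgbm | lgbm | 0.657 | 0.677 | 0.627 | 0.628 | 0.602 | 0.682 | 0.627 | 0.465
knn | wrap | none | 0.622 | 0.601 | 0.601 | 0.602 | 0.539 | 0.670 | 0.601 | 0.488
dummy | embed_lgbm | lgbm | 0.593 | 0.500 | 0.500 | 0.372 | nan | 0.593 | 0.500 | 0.000
dummy | wrap | none | 0.593 | 0.500 | 0.500 | 0.372 | nan | 0.593 | 0.500 | 0.000
dummy | pred | none | 0.593 | 0.500 | 0.500 | 0.372 | nan | 0.593 | 0.500 | 0.000
dummy | none | none | 0.593 | 0.500 | 0.500 | 0.372 | nan | 0.593 | 0.500 | 0.000
dummy | embed_linear | linear | 0.593 | 0.500 | 0.500 | 0.372 | nan | 0.593 | 0.500 | 0.000
dummy | assoc | none | 0.593 | 0.500 | 0.500 | 0.372 | nan | 0.593 | 0.500 | 0.000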
nan = Not a Number.
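The dummy rows in Table A10 are a useful sanity check on how the metric columns behave. The sketch below reproduces several of them for a classifier that always predicts the majority class (appendicitis). It rests on assumptions that are not stated in the source: the counts of 185 positive and 127 negative holdout samples are hypothetical values chosen to match the 0.593 majority-class rate, the helper names are illustrative, and F1 is taken to be macro-averaged over the two classes, an interpretation inferred from the reported 0.372 rather than confirmed by the authors.

```python
import math


def safe_div(num, den):
    """Return num / den, or nan when the denominator is zero (0/0 case)."""
    return num / den if den else math.nan


def binary_metrics(tp, fp, tn, fn):
    """Confusion-matrix metrics for a binary classifier."""
    recall_pos = safe_div(tp, tp + fn)   # sensitivity of the positive class
    recall_neg = safe_div(tn, tn + fp)   # specificity
    f1_pos = safe_div(2 * tp, 2 * tp + fp + fn)
    f1_neg = safe_div(2 * tn, 2 * tn + fn + fp)
    return {
        "acc": (tp + tn) / (tp + fp + tn + fn),
        "bal_acc": (recall_pos + recall_neg) / 2,
        "f1_macro": (f1_pos + f1_neg) / 2,
        "npv": safe_div(tn, tn + fn),    # undefined if nothing is predicted negative
        "ppv": safe_div(tp, tp + fp),
        "spec": recall_neg,
    }


# Hypothetical holdout counts: every sample is predicted "appendicitis".
dummy = binary_metrics(tp=185, fp=127, tn=0, fn=0)
for name, value in dummy.items():
    print(f"{name}: {value:.3f}")
```

Since the dummy model never predicts the negative class, NPV is 0/0, which is why the table reports nan for it; accuracy and PPV both collapse to the prevalence of the positive class.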
Table A11. Diagnosis with US image features 5-fold performance on the holdout set.
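Model | Selection | Embed_Selector | Acc | Auroc | Bal-Acc | F1 | Npv | Ppv | Sens | Spec
rf | embed_linear | linear | 0.984 | 0.993 | 0.981 | 0.983 | 0.992 | 0.979 | 0.981 | 0.968
lgbm | assoc | none | 0.984 | 0.995 | 0.983 | 0.983 | 0.985 | 0.984 | 0.983 | 0.976
lgbm | pred | none | 0.981 | 0.997 | 0.980 | 0.980 | 0.977 | 0.984 | 0.980 | 0.976
lgbm | none | none | 0.981 | 0.997 | 0.977 | 0.980 | 0.992 | 0.974 | 0.977 | 0.960
rf | embed_lgbm | lgbm | 0.981 | 0.993 | 0.977 | 0.980 | 0.992 | 0.974 | 0.977 | 0.960
rf | pred | none | 0.978 | 0.994 | 0.975 | 0.977 | 0.985 | 0.974 | 0.975 | 0.960
lgbm | embed_linear | linear | 0.974 | 0.993 | 0.971 | 0.973 | 0.985 | 0.969 | 0.971 | 0.952
rf | none | none | 0.962 | 0.992 | 0.956 | 0.960 | 0.977 | 0.954 | 0.956 | 0.929
rf | assoc | none | 0.926 | 0.988 | 0.932 | 0.925 | 0.880 | 0.971 | 0.932 | 0.960
lgbm | wrap | none | 0.859 | 0.944 | 0.851 | 0.851 | 0.848 | 0.881 | 0.851 | 0.810
rf | wrap | none | 0.859 | 0.949 | 0.849 | 0.852 | 0.853 | 0.868 | 0.849 | 0.794
lr | pred | none | 0.856 | 0.939 | 0.853 | 0.851 | 0.810 | 0.891 | 0.853 | 0.842
lgbm | embed_lgbm | lgbm | 0.827 | 0.909 | 0.819 | 0.819 | 0.792 | 0.853 | 0.819 | 0.778
lr | embed_linear | linear | 0.811 | 0.900 | 0.805 | 0.803 | 0.763 | 0.851 | 0.805 | 0.778
lr | none | none | 0.811 | 0.899 | 0.805 | 0.803 | 0.763 | 0.851 | 0.805 | 0.778
lr | assoc | none | 0.811 | 0.899 | 0.805 | 0.803 | 0.763 | 0.851 | 0.805 | 0.778
sgd | pred | none | 0.792 | 0.801 | 0.785 | 0.784 | 0.737 | 0.832 | 0.785 | 0.755
lr | embed_lgbm | lgbm | 0.785 | 0.873 | 0.774 | 0.773 | 0.746 | 0.817 | 0.774 | 0.715
sgd | embed_linear | linear | 0.750 | 0.806 | 0.747 | 0.743 | 0.678 | 0.806 | 0.747 | 0.731
sgd | assoc | none | 0.750 | 0.743 | 0.743 | 0.741 | 0.685 | 0.798 | 0.743 | 0.707
sgd | none | none | 0.743 | 0.741 | 0.737 | 0.734 | 0.674 | 0.799 | 0.737 | 0.707
sgd | embed_lgbm | lgbm | 0.731 | 0.726 | 0.726 | 0.723 | 0.659 | 0.787 | 0.726 | 0.700
lr | wrap | none | 0.712 | 0.809 | 0.686 | 0.688 | 0.685 | 0.728 | 0.686 | 0.550
knn | embed_linear | linear | 0.683 | 0.719 | 0.641 | 0.640 | 0.683 | 0.684 | 0.641 | 0.417
knn | assoc | none | 0.683 | 0.719 | 0.641 | 0.640 | 0.683 | 0.684 | 0.641 | 0.417
sgd | wrap | none | 0.682 | 0.684 | 0.675 | 0.672 | 0.605 | 0.744 | 0.675 | 0.636
knn | pred | none | 0.679 | 0.725 | 0.645 | 0.643 | 0.641 | 0.695 | 0.645 | 0.462
knn | none | none | 0.676 | 0.739 | 0.646 | 0.649 | 0.639 | 0.696 | 0.646 | 0.487
knn | embed_lgbm | lgbm | 0.670 | 0.717 | 0.643 | 0.642 | 0.606 | 0.701 | 0.643 | 0.503
dummy | embed_lgbm | lgbm | 0.593 | 0.500 | 0.500 | 0.372 | nan | 0.593 | 0.500 | 0.000
dummy | wrap | none | 0.593 | 0.500 | 0.500 | 0.372 | nan | 0.593 | 0.500 | 0.000
dummy | pred | none | 0.593 | 0.500 | 0.500 | 0.372 | nan | 0.593 | 0.500 | 0.000
dummy | none | none | 0.593 | 0.500 | 0.500 | 0.372 | nan | 0.593 | 0.500 | 0.000
dummy | embed_linear | linear | 0.593 | 0.500 | 0.500 | 0.372 | nan | 0.593 | 0.500 | 0.000
dummy | assoc | none | 0.593 | 0.500 | 0.500 | 0.372 | nan | 0.593 | 0.500 | 0.000
knn | wrap | none | 0.580 | 0.561 | 0.561 | 0.560 | 0.484 | 0.643 | 0.561 | 0.463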
nan = Not a Number.
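The nan entries in the dummy rows can be explained by a zero denominator: a classifier that always predicts the positive class produces no negative predictions, so NPV = TN / (TN + FN) is undefined. The sketch below is a minimal illustration, not the authors' evaluation code. The confusion-matrix counts are hypothetical and chosen only because they reproduce the rounded dummy row. The macro-averaging of F1 and the reading of Bal-Acc as the mean of per-class recall are assumptions that happen to match that row; the source does not state how the columns were computed.

```python
import math


def safe_div(num, den):
    """Return num / den, or nan when the denominator is zero."""
    return num / den if den else math.nan


def binary_metrics(tp, fp, tn, fn):
    """Summary metrics from a 2x2 confusion matrix (counts)."""
    total = tp + fp + tn + fn
    recall_pos = safe_div(tp, tp + fn)   # sensitivity of the positive class
    recall_neg = safe_div(tn, tn + fp)   # specificity
    f1_pos = safe_div(2 * tp, 2 * tp + fp + fn)
    f1_neg = safe_div(2 * tn, 2 * tn + fn + fp)
    return {
        "acc": (tp + tn) / total,
        "bal_acc": (recall_pos + recall_neg) / 2,  # mean per-class recall
        "macro_f1": (f1_pos + f1_neg) / 2,         # assumed macro average
        "npv": safe_div(tn, tn + fn),
        "ppv": safe_div(tp, tp + fp),
        "spec": recall_neg,
    }


# Hypothetical counts: every sample is predicted positive.
dummy = binary_metrics(tp=185, fp=127, tn=0, fn=0)
for name, value in dummy.items():
    print(f"{name}: {value:.3f}")
```

With these counts, NPV is nan because no sample is predicted negative, while the remaining values round to the figures in the dummy rows.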

Appendix C

Feature selection results for leading wrapper-based embedded LGBM feature selection for predicting Diagnosis with US image features.
Table A12. Selection scores (Importances: Larger magnitude = More important).
Feature | Score
Management_surgical | 6.800 × 10^1
Appendix_Diameter | 5.800 × 10^1
Appendix_Diameter_NAN | 4.900 × 10^1
Thrombocyte_Count | 3.400 × 10^1
Age | 3.400 × 10^1
Paedriatic_Appendicitis_Score | 2.900 × 10^1
WBC_Count | 2.700 × 10^1
Alvarado_Score | 2.500 × 10^1
CRP | 2.200 × 10^1
Appendix_on_US_yes | 1.800 × 10^1
Hemoglobin | 1.400 × 10^1
RDW | 1.400 × 10^1
Neutrophil_Percentage | 1.300 × 10^1
BMI | 1.000 × 10^1
Body_Temperature | 9.000 × 10^0
RBC_Count | 8.000 × 10^0
Coughing_Pain_yes | 7.000 × 10^0
Height | 4.000 × 10^0
Surrounding_Tissue_Reaction_nan | 2.000 × 10^0
Peritonitis_no | 2.000 × 10^0
Weight | 1.000 × 10^0
Contralateral_Rebound_Tenderness_yes | 1.000 × 10^0
Psoas_Sign_yes | 1.000 × 10^0

Appendix D

Results from predicting diagnosis without US image features.
Table A13. Diagnosis without US image features holdout set performance.
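Model | Selection | Embed_Selector | Acc | Auroc | Bal-Acc | F1 | Npv | Ppv | Sens | Spec
lgbm | none | none | 0.801 | 0.873 | 0.798 | 0.796 | 0.744 | 0.844 | 0.798 | 0.780
sgd | wrap | none | 0.792 | 0.780 | 0.780 | 0.782 | 0.758 | 0.812 | 0.780 | 0.717
lgbm | assoc | none | 0.782 | 0.882 | 0.778 | 0.776 | 0.722 | 0.827 | 0.778 | 0.756
rf | embed_lgbm | lgbm | 0.779 | 0.864 | 0.778 | 0.774 | 0.710 | 0.833 | 0.778 | 0.772
lgbm | embed_lgbm | lgbm | 0.776 | 0.871 | 0.766 | 0.767 | 0.728 | 0.807 | 0.766 | 0.717
rf | none | none | 0.769 | 0.861 | 0.772 | 0.765 | 0.690 | 0.838 | 0.772 | 0.787
rf | embed_linear | linear | 0.766 | 0.859 | 0.768 | 0.762 | 0.688 | 0.833 | 0.768 | 0.780
rf | assoc | none | 0.766 | 0.861 | 0.751 | 0.754 | 0.733 | 0.786 | 0.751 | 0.669
rf | wrap | none | 0.760 | 0.862 | 0.742 | 0.746 | 0.732 | 0.775 | 0.742 | 0.646
rf | pred | none | 0.760 | 0.858 | 0.744 | 0.747 | 0.724 | 0.781 | 0.744 | 0.661
lgbm | embed_linear | linear | 0.753 | 0.872 | 0.746 | 0.745 | 0.692 | 0.797 | 0.746 | 0.709
lr | wrap | none | 0.750 | 0.823 | 0.730 | 0.734 | 0.725 | 0.764 | 0.730 | 0.622
lgbm | pred | none | 0.747 | 0.855 | 0.735 | 0.736 | 0.697 | 0.779 | 0.735 | 0.669
lgbm | wrap | none | 0.744 | 0.858 | 0.734 | 0.734 | 0.685 | 0.784 | 0.734 | 0.685
lr | pred | none | 0.740 | 0.847 | 0.726 | 0.728 | 0.695 | 0.768 | 0.726 | 0.646
lr | embed_linear | linear | 0.734 | 0.814 | 0.710 | 0.715 | 0.712 | 0.745 | 0.710 | 0.583
lr | assoc | none | 0.728 | 0.815 | 0.704 | 0.708 | 0.702 | 0.740 | 0.704 | 0.575
sgd | embed_linear | linear | 0.724 | 0.753 | 0.707 | 0.710 | 0.678 | 0.751 | 0.707 | 0.614
lr | none | none | 0.715 | 0.809 | 0.692 | 0.695 | 0.679 | 0.733 | 0.692 | 0.567
sgd | pred | none | 0.715 | 0.785 | 0.699 | 0.701 | 0.661 | 0.747 | 0.699 | 0.614
lr | embed_lgbm | lgbm | 0.712 | 0.769 | 0.684 | 0.688 | 0.687 | 0.723 | 0.684 | 0.535
sgd | none | none | 0.712 | 0.766 | 0.695 | 0.697 | 0.658 | 0.744 | 0.695 | 0.606
sgd | embed_lgbm | lgbm | 0.708 | 0.689 | 0.689 | 0.691 | 0.661 | 0.735 | 0.689 | 0.583
sgd | assoc | none | 0.705 | 0.763 | 0.675 | 0.678 | 0.684 | 0.714 | 0.675 | 0.512
knn | wrap | none | 0.689 | 0.682 | 0.659 | 0.662 | 0.656 | 0.704 | 0.659 | 0.496
knn | none | none | 0.686 | 0.713 | 0.654 | 0.656 | 0.656 | 0.699 | 0.654 | 0.480
knn | embed_lgbm | lgbm | 0.673 | 0.721 | 0.633 | 0.632 | 0.654 | 0.680 | 0.633 | 0.417
knn | assoc | none | 0.660 | 0.659 | 0.623 | 0.623 | 0.621 | 0.676 | 0.623 | 0.425
knn | pred | none | 0.647 | 0.667 | 0.616 | 0.617 | 0.588 | 0.674 | 0.616 | 0.449
knn | embed_linear | linear | 0.644 | 0.620 | 0.620 | 0.621 | 0.574 | 0.681 | 0.620 | 0.488
dummy | wrap | none | 0.593 | 0.500 | 0.500 | 0.372 | nan | 0.593 | 0.500 | 0.000
dummy | embed_linear | linear | 0.593 | 0.500 | 0.500 | 0.372 | nan | 0.593 | 0.500 | 0.000
dummy | none | none | 0.593 | 0.500 | 0.500 | 0.372 | nan | 0.593 | 0.500 | 0.000
dummy | pred | none | 0.593 | 0.500 | 0.500 | 0.372 | nan | 0.593 | 0.500 | 0.000
dummy | embed_lgbm | lgbm | 0.593 | 0.500 | 0.500 | 0.372 | nan | 0.593 | 0.500 | 0.000
dummy | assoc | none | 0.593 | 0.500 | 0.500 | 0.372 | nan | 0.593 | 0.500 | 0.000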
nan = Not a Number.
Table A14. Diagnosis without US image features 5-fold performance on the holdout set.
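Model | Selection | Embed_Selector | Acc | Auroc | Bal-Acc | F1 | Npv | Ppv | Sens | Spec
lgbm | embed_lgbm | lgbm | 0.808 | 0.878 | 0.805 | 0.802 | 0.756 | 0.855 | 0.805 | 0.794
lgbm | none | none | 0.801 | 0.880 | 0.798 | 0.795 | 0.753 | 0.847 | 0.798 | 0.779
lgbm | embed_linear | linear | 0.795 | 0.894 | 0.788 | 0.787 | 0.751 | 0.833 | 0.788 | 0.755
lgbm | assoc | none | 0.785 | 0.883 | 0.784 | 0.780 | 0.720 | 0.840 | 0.784 | 0.779
rf | wrap | none | 0.782 | 0.865 | 0.776 | 0.774 | 0.730 | 0.826 | 0.776 | 0.747
rf | pred | none | 0.779 | 0.867 | 0.776 | 0.773 | 0.714 | 0.831 | 0.776 | 0.763
rf | assoc | none | 0.772 | 0.870 | 0.765 | 0.764 | 0.715 | 0.816 | 0.765 | 0.731
lgbm | pred | none | 0.772 | 0.864 | 0.763 | 0.762 | 0.720 | 0.809 | 0.763 | 0.715
lgbm | wrap | none | 0.769 | 0.866 | 0.775 | 0.765 | 0.686 | 0.856 | 0.775 | 0.810
lr | pred | none | 0.766 | 0.831 | 0.750 | 0.752 | 0.728 | 0.790 | 0.750 | 0.667
rf | none | none | 0.763 | 0.853 | 0.748 | 0.751 | 0.728 | 0.785 | 0.748 | 0.670
rf | embed_linear | linear | 0.759 | 0.866 | 0.766 | 0.755 | 0.671 | 0.851 | 0.766 | 0.802
rf | embed_lgbm | lgbm | 0.747 | 0.861 | 0.741 | 0.739 | 0.692 | 0.794 | 0.741 | 0.709
lr | wrap | none | 0.747 | 0.815 | 0.733 | 0.734 | 0.704 | 0.778 | 0.733 | 0.660
lr | assoc | none | 0.743 | 0.800 | 0.729 | 0.730 | 0.696 | 0.773 | 0.729 | 0.652
lr | embed_linear | linear | 0.737 | 0.799 | 0.722 | 0.723 | 0.687 | 0.768 | 0.722 | 0.644
lr | none | none | 0.730 | 0.798 | 0.714 | 0.716 | 0.682 | 0.761 | 0.714 | 0.628
sgd | wrap | none | 0.718 | 0.710 | 0.710 | 0.708 | 0.648 | 0.771 | 0.710 | 0.668
sgd | pred | none | 0.718 | 0.775 | 0.698 | 0.700 | 0.667 | 0.746 | 0.698 | 0.596
lr | embed_lgbm | lgbm | 0.708 | 0.754 | 0.686 | 0.686 | 0.657 | 0.738 | 0.686 | 0.573
sgd | assoc | none | 0.695 | 0.759 | 0.677 | 0.676 | 0.636 | 0.735 | 0.677 | 0.580
sgd | none | none | 0.692 | 0.750 | 0.678 | 0.676 | 0.626 | 0.741 | 0.678 | 0.604
knn | wrap | none | 0.686 | 0.706 | 0.671 | 0.671 | 0.625 | 0.728 | 0.671 | 0.590
knn | pred | none | 0.683 | 0.716 | 0.669 | 0.670 | 0.618 | 0.727 | 0.669 | 0.598
sgd | embed_lgbm | lgbm | 0.679 | 0.670 | 0.670 | 0.668 | 0.602 | 0.738 | 0.670 | 0.621
sgd | embed_linear | linear | 0.676 | 0.736 | 0.665 | 0.663 | 0.597 | 0.737 | 0.665 | 0.612
knn | none | none | 0.670 | 0.735 | 0.639 | 0.640 | 0.634 | 0.689 | 0.639 | 0.472
knn | embed_lgbm | lgbm | 0.657 | 0.722 | 0.619 | 0.617 | 0.613 | 0.673 | 0.619 | 0.416
knn | assoc | none | 0.654 | 0.680 | 0.638 | 0.639 | 0.580 | 0.702 | 0.638 | 0.552
dummy | embed_lgbm | lgbm | 0.593 | 0.500 | 0.500 | 0.372 | nan | 0.593 | 0.500 | 0.000
dummy | pred | none | 0.593 | 0.500 | 0.500 | 0.372 | nan | 0.593 | 0.500 | 0.000
dummy | embed_linear | linear | 0.593 | 0.500 | 0.500 | 0.372 | nan | 0.593 | 0.500 | 0.000
dummy | none | none | 0.593 | 0.500 | 0.500 | 0.372 | nan | 0.593 | 0.500 | 0.000
dummy | wrap | none | 0.593 | 0.500 | 0.500 | 0.372 | nan | 0.593 | 0.500 | 0.000
dummy | assoc | none | 0.593 | 0.500 | 0.500 | 0.372 | nan | 0.593 | 0.500 | 0.000
knn | embed_linear | linear | 0.570 | 0.569 | 0.557 | 0.556 | 0.474 | 0.641 | 0.557 | 0.487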
nan = Not a Number.

Appendix E

Hyperparameters established by Optuna hyperparameter tuning for each of our leading models.
Table A15. Optuna hyperparameters for example leading models.
Target | US Features | Model | Feature Selection | Hyperparameters
Diagnosis | Yes | rf | embed_linear | {'verbosity': -1, 'boosting_type': 'rf', 'bagging_freq': 1, 'bagging_fraction': 0.6424705933428012, 'n_estimators': 100, 'reg_alpha': 0.0003532789339921058, 'reg_lambda': 0.004369030571226374, 'num_leaves': 8, 'colsample_bytree': 0.8437223587619459, 'subsample': 0.403473633073295, 'subsample_freq': 1, 'min_child_samples': 5}
Diagnosis | No | lgbm | embed_lgbm | {'verbosity': -1, 'n_estimators': 50, 'reg_alpha': 0.06466023097198124, 'reg_lambda': 0.022294761212156983, 'num_leaves': 15, 'colsample_bytree': 0.5464250771120893, 'subsample': 0.5536293838457955, 'subsample_freq': 7, 'min_child_samples': 29}
Management | Yes | lgbm | assoc | {'verbosity': -1, 'n_estimators': 150, 'reg_alpha': 0.01918207182498792, 'reg_lambda': 7.461771397395436, 'num_leaves': 2, 'colsample_bytree': 0.5614712282427238, 'subsample': 0.8168115609573287, 'subsample_freq': 5, 'min_child_samples': 7}
Management | No | lgbm | no_select | {'verbosity': -1, 'n_estimators': 50, 'reg_alpha': 1.2550179156417959e-08, 'reg_lambda': 1.9742923076305905e-08, 'num_leaves': 2, 'colsample_bytree': 0.9856142911837322, 'subsample': 0.7805261984723494, 'subsample_freq': 0, 'min_child_samples': 5}
Severity | Yes | lr | wrap | {'max_iter': 2000, 'penalty': 'elasticnet', 'solver': 'saga', 'l1_ratio': 0.09092139813688659, 'C': 0.0007760418893874168}
Severity | No | lgbm | assoc | {'verbosity': -1, 'n_estimators': 200, 'reg_alpha': 0.025561180230324252, 'reg_lambda': 0.0020714646371430326, 'num_leaves': 67, 'colsample_bytree': 0.4887103613060258, 'subsample': 0.5044229983427804, 'subsample_freq': 3, 'min_child_samples': 18}

Appendix F

Results from predicting management with US image features.
Table A16. Management with US image features holdout set performance.
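Model | Selection | Embed_Selector | Acc | Auroc | Bal-Acc | F1 | Npv | Ppv | Sens | Spec
rf | assoc | none | 0.936 | 0.984 | 0.922 | 0.931 | 0.963 | 0.922 | 0.922 | 0.866
rf | embed_linear | linear | 0.936 | 0.978 | 0.922 | 0.931 | 0.963 | 0.922 | 0.922 | 0.866
rf | none | none | 0.933 | 0.980 | 0.917 | 0.927 | 0.971 | 0.914 | 0.917 | 0.849
lgbm | none | none | 0.930 | 0.982 | 0.916 | 0.924 | 0.953 | 0.917 | 0.916 | 0.857
rf | pred | none | 0.930 | 0.977 | 0.914 | 0.924 | 0.962 | 0.913 | 0.914 | 0.849
lgbm | pred | none | 0.927 | 0.980 | 0.911 | 0.920 | 0.953 | 0.913 | 0.911 | 0.849
rf | wrap | none | 0.923 | 0.946 | 0.902 | 0.916 | 0.980 | 0.897 | 0.902 | 0.815
rf | embed_lgbm | lgbm | 0.923 | 0.980 | 0.909 | 0.917 | 0.944 | 0.913 | 0.909 | 0.849
lgbm | assoc | none | 0.923 | 0.981 | 0.907 | 0.917 | 0.952 | 0.909 | 0.907 | 0.840
lgbm | embed_lgbm | lgbm | 0.923 | 0.982 | 0.906 | 0.916 | 0.961 | 0.905 | 0.906 | 0.832
lgbm | embed_linear | linear | 0.920 | 0.983 | 0.903 | 0.913 | 0.952 | 0.904 | 0.903 | 0.832
lgbm | wrap | none | 0.920 | 0.943 | 0.900 | 0.912 | 0.970 | 0.897 | 0.900 | 0.815
lr | pred | none | 0.879 | 0.949 | 0.855 | 0.866 | 0.909 | 0.864 | 0.855 | 0.756
lr | embed_linear | linear | 0.866 | 0.929 | 0.838 | 0.851 | 0.905 | 0.849 | 0.838 | 0.723
lr | none | none | 0.866 | 0.929 | 0.838 | 0.851 | 0.905 | 0.849 | 0.838 | 0.723
lr | assoc | none | 0.863 | 0.924 | 0.834 | 0.847 | 0.904 | 0.845 | 0.834 | 0.714
lr | embed_lgbm | lgbm | 0.853 | 0.920 | 0.823 | 0.836 | 0.892 | 0.836 | 0.823 | 0.697
sgd | pred | none | 0.847 | 0.890 | 0.837 | 0.837 | 0.798 | 0.876 | 0.837 | 0.798
sgd | assoc | none | 0.827 | 0.839 | 0.823 | 0.819 | 0.756 | 0.876 | 0.823 | 0.807
lr | wrap | none | 0.827 | 0.897 | 0.784 | 0.801 | 0.911 | 0.799 | 0.784 | 0.605
sgd | none | none | 0.805 | 0.789 | 0.789 | 0.791 | 0.754 | 0.834 | 0.789 | 0.723
sgd | embed_linear | linear | 0.802 | 0.872 | 0.785 | 0.788 | 0.752 | 0.830 | 0.785 | 0.714
sgd | wrap | none | 0.780 | 0.792 | 0.759 | 0.762 | 0.727 | 0.808 | 0.759 | 0.672
sgd | embed_lgbm | lgbm | 0.776 | 0.852 | 0.746 | 0.754 | 0.747 | 0.790 | 0.746 | 0.622
knn | embed_lgbm | lgbm | 0.741 | 0.772 | 0.699 | 0.706 | 0.721 | 0.749 | 0.699 | 0.521
knn | pred | none | 0.719 | 0.779 | 0.656 | 0.659 | 0.746 | 0.712 | 0.656 | 0.395
knn | none | none | 0.696 | 0.757 | 0.617 | 0.606 | 0.773 | 0.684 | 0.617 | 0.286
knn | embed_linear | linear | 0.696 | 0.757 | 0.617 | 0.606 | 0.773 | 0.684 | 0.617 | 0.286
knn | assoc | none | 0.696 | 0.757 | 0.617 | 0.606 | 0.773 | 0.684 | 0.617 | 0.286
knn | wrap | none | 0.687 | 0.720 | 0.637 | 0.640 | 0.630 | 0.707 | 0.637 | 0.429
dummy | embed_lgbm | lgbm | 0.620 | 0.500 | 0.500 | 0.383 | nan | 0.620 | 0.500 | 0.000
dummy | wrap | none | 0.620 | 0.500 | 0.500 | 0.383 | nan | 0.620 | 0.500 | 0.000
dummy | pred | none | 0.620 | 0.500 | 0.500 | 0.383 | nan | 0.620 | 0.500 | 0.000
dummy | none | none | 0.620 | 0.500 | 0.500 | 0.383 | nan | 0.620 | 0.500 | 0.000
dummy | embed_linear | linear | 0.620 | 0.500 | 0.500 | 0.383 | nan | 0.620 | 0.500 | 0.000
dummy | assoc | none | 0.620 | 0.500 | 0.500 | 0.383 | nan | 0.620 | 0.500 | 0.000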
nan = Not a Number.
Table A17. Management with US image features 5-fold performance on the holdout set.
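Model | Selection | Embed_Selector | Acc | Auroc | Bal-Acc | F1 | Npv | Ppv | Sens | Spec
lgbm | assoc | none | 0.930 | 0.976 | 0.918 | 0.924 | 0.945 | 0.922 | 0.918 | 0.866
lgbm | embed_linear | linear | 0.930 | 0.973 | 0.918 | 0.924 | 0.944 | 0.923 | 0.918 | 0.867
rf | embed_lgbm | lgbm | 0.927 | 0.973 | 0.912 | 0.920 | 0.953 | 0.914 | 0.912 | 0.850
rf | none | none | 0.920 | 0.968 | 0.907 | 0.913 | 0.937 | 0.914 | 0.907 | 0.850
rf | embed_linear | linear | 0.920 | 0.971 | 0.905 | 0.913 | 0.944 | 0.909 | 0.905 | 0.841
rf | assoc | none | 0.920 | 0.973 | 0.908 | 0.914 | 0.929 | 0.917 | 0.908 | 0.858
lgbm | embed_lgbm | lgbm | 0.920 | 0.969 | 0.910 | 0.914 | 0.919 | 0.922 | 0.910 | 0.866
lgbm | wrap | none | 0.911 | 0.937 | 0.893 | 0.902 | 0.942 | 0.897 | 0.893 | 0.816
rf | pred | none | 0.911 | 0.968 | 0.896 | 0.902 | 0.928 | 0.905 | 0.896 | 0.833
lgbm | none | none | 0.907 | 0.976 | 0.895 | 0.900 | 0.914 | 0.909 | 0.895 | 0.841
lgbm | pred | none | 0.907 | 0.971 | 0.895 | 0.900 | 0.911 | 0.906 | 0.895 | 0.841
rf | wrap | none | 0.904 | 0.938 | 0.886 | 0.895 | 0.936 | 0.892 | 0.886 | 0.808
lr | pred | none | 0.882 | 0.938 | 0.864 | 0.871 | 0.890 | 0.881 | 0.864 | 0.790
lr | embed_linear | linear | 0.805 | 0.892 | 0.778 | 0.786 | 0.795 | 0.812 | 0.778 | 0.664
lr | none | none | 0.805 | 0.892 | 0.778 | 0.786 | 0.795 | 0.812 | 0.778 | 0.664
sgd | pred | none | 0.801 | 0.861 | 0.789 | 0.789 | 0.748 | 0.839 | 0.789 | 0.739
lr | assoc | none | 0.799 | 0.888 | 0.771 | 0.779 | 0.786 | 0.808 | 0.771 | 0.656
lr | embed_lgbm | lgbm | 0.782 | 0.877 | 0.750 | 0.759 | 0.773 | 0.789 | 0.750 | 0.614
sgd | wrap | none | 0.770 | 0.778 | 0.756 | 0.756 | 0.711 | 0.814 | 0.756 | 0.697
lr | wrap | none | 0.754 | 0.837 | 0.710 | 0.719 | 0.759 | 0.755 | 0.710 | 0.529
sgd | none | none | 0.750 | 0.730 | 0.730 | 0.733 | 0.690 | 0.788 | 0.730 | 0.647
sgd | embed_linear | linear | 0.744 | 0.812 | 0.724 | 0.726 | 0.684 | 0.783 | 0.724 | 0.639
sgd | assoc | none | 0.741 | 0.791 | 0.716 | 0.720 | 0.687 | 0.775 | 0.716 | 0.614
sgd | embed_lgbm | lgbm | 0.725 | 0.814 | 0.702 | 0.704 | 0.649 | 0.768 | 0.702 | 0.605
knn | pred | none | 0.716 | 0.782 | 0.644 | 0.640 | 0.795 | 0.702 | 0.644 | 0.344
knn | none | none | 0.697 | 0.755 | 0.615 | 0.599 | 0.778 | 0.684 | 0.615 | 0.276
knn | embed_linear | linear | 0.697 | 0.755 | 0.615 | 0.599 | 0.778 | 0.684 | 0.615 | 0.276
knn | assoc | none | 0.697 | 0.755 | 0.615 | 0.599 | 0.778 | 0.684 | 0.615 | 0.276
knn | embed_lgbm | lgbm | 0.693 | 0.750 | 0.656 | 0.658 | 0.623 | 0.728 | 0.656 | 0.504
knn | wrap | none | 0.690 | 0.718 | 0.658 | 0.660 | 0.617 | 0.731 | 0.658 | 0.522
dummy | embed_lgbm | lgbm | 0.620 | 0.500 | 0.500 | 0.383 | nan | 0.620 | 0.500 | 0.000
dummy | embed_linear | linear | 0.620 | 0.500 | 0.500 | 0.383 | nan | 0.620 | 0.500 | 0.000
dummy | wrap | none | 0.620 | 0.500 | 0.500 | 0.383 | nan | 0.620 | 0.500 | 0.000
dummy | none | none | 0.620 | 0.500 | 0.500 | 0.383 | nan | 0.620 | 0.500 | 0.000
dummy | pred | none | 0.620 | 0.500 | 0.500 | 0.383 | nan | 0.620 | 0.500 | 0.000
dummy | assoc | none | 0.620 | 0.500 | 0.500 | 0.383 | nan | 0.620 | 0.500 | 0.000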
nan = Not a Number.

Appendix G

Filter-based Association Feature Selection Results for Predicting Management with US image derived features.
Table A18. Continuous Feature scores (Mutual Information: Higher = More important).
Feature_Targetclass | Mut_Info
CRP__Management.0 | 1.396 × 10^−1
CRP__Management.1 | 1.358 × 10^−1
Alvarado_Score__Management.1 | 1.052 × 10^−1
Appendix_Diameter__Management.0 | 9.615 × 10^−2
Appendix_Diameter__Management.1 | 8.457 × 10^−2
WBC_Count__Management.1 | 7.696 × 10^−2
Neutrophil_Percentage__Management.0 | 7.531 × 10^−2
Paedriatic_Appendicitis_Score__Management.0 | 7.084 × 10^−2
Neutrophil_Percentage__Management.1 | 6.487 × 10^−2
Alvarado_Score__Management.0 | 6.183 × 10^−2
WBC_Count__Management.0 | 6.059 × 10^−2
Height__Management.1 | 5.915 × 10^−2
RDW__Management.0 | 5.379 × 10^−2
Segmented_Neutrophils__Management.0 | 5.172 × 10^−2
Paedriatic_Appendicitis_Score__Management.1 | 4.978 × 10^−2
RDW__Management.1 | 4.709 × 10^−2
Ketones_in_Urine__Management.0 | 4.477 × 10^−2
Weight__Management.0 | 4.001 × 10^−2
Weight__Management.1 | 3.639 × 10^−2
Hemoglobin__Management.0 | 3.513 × 10^−2
Height__Management.0 | 3.048 × 10^−2
Body_Temperature__Management.0 | 2.882 × 10^−2
Ketones_in_Urine__Management.1 | 2.756 × 10^−2
RBC_Count__Management.1 | 1.856 × 10^−2
Body_Temperature__Management.1 | 1.781 × 10^−2
Hemoglobin__Management.1 | 1.754 × 10^−2
WBC_in_Urine__Management.1 | 1.720 × 10^−2
RBC_Count__Management.0 | 1.283 × 10^−2
Age__Management.0 | 9.069 × 10^−3
Age__Management.1 | 8.695 × 10^−3
RBC_in_Urine__Management.1 | 8.061 × 10^−3
Segmented_Neutrophils__Management.1 | 0.000 × 10^0
Thrombocyte_Count__Management.0 | 0.000 × 10^0
Thrombocyte_Count__Management.1 | 0.000 × 10^0
BMI__Management.1 | 0.000 × 10^0
RBC_in_Urine__Management.0 | 0.000 × 10^0
WBC_in_Urine__Management.0 | 0.000 × 10^0
BMI__Management.0 | 0.000 × 10^0
Table A19. Categorical Feature scores (Mutual Information: Higher = More important).
Feature_Targetclass | Mut_Info
Ipsilateral_Rebound_Tenderness__Management.surgical | 2.827 × 10^−1
Ipsilateral_Rebound_Tenderness__Management.conservative | 2.827 × 10^−1
Ipsilateral_Rebound_Tenderness | 2.827 × 10^−1
Diagnosis | 2.553 × 10^−1
Diagnosis__Management.surgical | 2.553 × 10^−1
Diagnosis__Management.conservative | 2.553 × 10^−1
Peritonitis__Management.conservative | 1.963 × 10^−1
Peritonitis | 1.963 × 10^−1
Peritonitis__Management.surgical | 1.963 × 10^−1
Severity | 1.800 × 10^−1
Severity__Management.conservative | 1.800 × 10^−1
Severity__Management.surgical | 1.800 × 10^−1
Surrounding_Tissue_Reaction__Management.conservative | 1.077 × 10^−1
Surrounding_Tissue_Reaction | 1.077 × 10^−1
Surrounding_Tissue_Reaction__Management.surgical | 1.077 × 10^−1
Neutrophilia__Management.surgical | 6.087 × 10^−2
Neutrophilia | 6.087 × 10^−2
Neutrophilia__Management.conservative | 6.087 × 10^−2
Appendix_Wall_Layers__Management.conservative | 5.302 × 10^−2
Appendix_Wall_Layers | 5.302 × 10^−2
Appendix_Wall_Layers__Management.surgical | 5.302 × 10^−2
Ileus__Management.conservative | 4.696 × 10^−2
Ileus__Management.surgical | 4.696 × 10^−2
Ileus | 4.696 × 10^−2
Dysuria__Management.conservative | 3.966 × 10^−2
Dysuria | 3.966 × 10^−2
Dysuria__Management.surgical | 3.966 × 10^−2
Free_Fluids__Management.surgical | 3.871 × 10^−2
Free_Fluids__Management.conservative | 3.871 × 10^−2
Free_Fluids | 3.871 × 10^−2
Perforation__Management.conservative | 3.749 × 10^−2
Perforation__Management.surgical | 3.749 × 10^−2
Perforation | 3.749 × 10^−2
Appendicolith | 3.328 × 10^−2
Appendicolith__Management.surgical | 3.328 × 10^−2
Appendicolith__Management.conservative | 3.328 × 10^−2
Psoas_Sign | 3.245 × 10^−2
Psoas_Sign__Management.surgical | 3.245 × 10^−2
Psoas_Sign__Management.conservative | 3.245 × 10^−2
Target_Sign__Management.surgical | 3.087 × 10^−2
Target_Sign__Management.conservative | 3.087 × 10^−2
Target_Sign | 3.087 × 10^−2
Contralateral_Rebound_Tenderness | 2.795 × 10^−2
Contralateral_Rebound_Tenderness__Management.surgical | 2.795 × 10^−2
Contralateral_Rebound_Tenderness__Management.conservative | 2.795 × 10^−2
Coprostasis__Management.conservative | 2.749 × 10^−2
Coprostasis | 2.749 × 10^−2
Coprostasis__Management.surgical | 2.749 × 10^−2
Perfusion__Management.conservative | 2.600 × 10^−2
Perfusion__Management.surgical | 2.600 × 10^−2
Perfusion | 2.600 × 10^−2
Nausea | 2.583 × 10^−2
Nausea__Management.surgical | 2.583 × 10^−2
Nausea__Management.conservative | 2.583 × 10^−2
Loss_of_Appetite__Management.surgical | 2.467 × 10^−2
Loss_of_Appetite__Management.conservative | 2.467 × 10^−2
Loss_of_Appetite | 2.467 × 10^−2
Enteritis__Management.surgical | 2.135 × 10^−2
Enteritis | 2.135 × 10^−2
Enteritis__Management.conservative | 2.135 × 10^−2
Stool__Management.conservative | 2.133 × 10^−2
Stool__Management.surgical | 2.133 × 10^−2
Stool | 2.133 × 10^−2
Conglomerate_of_Bowel_Loops__Management.conservative | 2.066 × 10^−2
Conglomerate_of_Bowel_Loops__Management.surgical | 2.066 × 10^−2
Conglomerate_of_Bowel_Loops | 2.066 × 10^−2
Bowel_Wall_Thickening__Management.surgical | 1.697 × 10^−2
Bowel_Wall_Thickening__Management.conservative | 1.697 × 10^−2
Bowel_Wall_Thickening | 1.697 × 10^−2
Appendicular_Abscess | 1.666 × 10^−2
Appendicular_Abscess__Management.conservative | 1.666 × 10^−2
Appendicular_Abscess__Management.surgical | 1.666 × 10^−2
Coughing_Pain__Management.conservative | 1.565 × 10^−2
Coughing_Pain__Management.surgical | 1.565 × 10^−2
Coughing_Pain | 1.565 × 10^−2
Meteorism | 1.319 × 10^−2
Meteorism__Management.surgical | 1.319 × 10^−2
Meteorism__Management.conservative | 1.319 × 10^−2
Appendix_on_US | 1.000 × 10^−2
Appendix_on_US__Management.conservative | 1.000 × 10^−2
Appendix_on_US__Management.surgical | 1.000 × 10^−2
US_Performed | 6.948 × 10^−3
US_Performed__Management.surgical | 6.948 × 10^−3
US_Performed__Management.conservative | 6.948 × 10^−3
Lower_Right_Abd_Pain__Management.conservative | 6.616 × 10^−3
Lower_Right_Abd_Pain__Management.surgical | 6.616 × 10^−3
Lower_Right_Abd_Pain | 6.616 × 10^−3
Migratory_Pain__Management.surgical | 6.200 × 10^−3
Migratory_Pain | 6.200 × 10^−3
Migratory_Pain__Management.conservative | 6.200 × 10^−3
Pathological_Lymph_Nodes__Management.conservative | 5.161 × 10^−3
Pathological_Lymph_Nodes__Management.surgical | 5.161 × 10^−3
Pathological_Lymph_Nodes | 5.161 × 10^−3
Sex__Management.conservative | 4.313 × 10^−4
Sex__Management.surgical | 4.313 × 10^−4
Sex | 4.313 × 10^−4
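For a categorical feature and a categorical target, mutual information can be computed directly from a contingency table of counts. The sketch below shows that calculation only; it is not the estimator used to produce the scores above, which the source does not specify. The function name `mutual_information`, the example counts, and the choice of natural-log units (nats) are all assumptions made for illustration. Continuous features would need a different estimator.

```python
import math


def mutual_information(table):
    """Mutual information (in nats) between the row and column variables
    of a contingency table given as a list of rows of counts."""
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, n in enumerate(row):
            if n == 0:
                continue  # zero cells contribute nothing to the sum
            # p(x, y) * log( p(x, y) / (p(x) * p(y)) )
            ratio = (n * total) / (row_sums[i] * col_sums[j])
            mi += (n / total) * math.log(ratio)
    return mi


# Hypothetical counts: rows = feature level, columns = target class.
independent = [[20, 30], [20, 30]]   # feature carries no information
dependent = [[50, 0], [0, 50]]       # feature determines the target

mi_independent = mutual_information(independent)
mi_dependent = mutual_information(dependent)
print(mi_independent, mi_dependent)
```

An uninformative feature scores zero, and a feature that fully determines a balanced binary target scores ln 2, so higher values indicate a stronger association with the target.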

Appendix H

Results from predicting management without US image features.
Table A20. Management without US image features holdout set performance.
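Model | Selection | Embed_Selector | Acc | Auroc | Bal-Acc | F1 | Npv | Ppv | Sens | Spec
rf | none | none | 0.939 | 0.980 | 0.925 | 0.934 | 0.972 | 0.923 | 0.925 | 0.866
rf | assoc | none | 0.936 | 0.980 | 0.922 | 0.931 | 0.963 | 0.922 | 0.922 | 0.866
lgbm | embed_linear | linear | 0.936 | 0.977 | 0.921 | 0.930 | 0.971 | 0.918 | 0.921 | 0.857
lgbm | embed_lgbm | lgbm | 0.933 | 0.975 | 0.918 | 0.927 | 0.962 | 0.918 | 0.918 | 0.857
rf | embed_linear | linear | 0.933 | 0.979 | 0.918 | 0.927 | 0.962 | 0.918 | 0.918 | 0.857
lgbm | none | none | 0.930 | 0.978 | 0.916 | 0.924 | 0.953 | 0.917 | 0.916 | 0.857
rf | embed_lgbm | lgbm | 0.930 | 0.979 | 0.916 | 0.924 | 0.953 | 0.917 | 0.916 | 0.857
lgbm | assoc | none | 0.930 | 0.977 | 0.914 | 0.924 | 0.962 | 0.913 | 0.914 | 0.849
rf | pred | none | 0.927 | 0.981 | 0.911 | 0.920 | 0.953 | 0.913 | 0.911 | 0.849
rf | wrap | none | 0.920 | 0.960 | 0.901 | 0.913 | 0.961 | 0.900 | 0.901 | 0.824
lgbm | pred | none | 0.917 | 0.978 | 0.896 | 0.909 | 0.970 | 0.893 | 0.896 | 0.807
lgbm | wrap | none | 0.917 | 0.953 | 0.894 | 0.908 | 0.979 | 0.889 | 0.894 | 0.798
lr | pred | none | 0.856 | 0.932 | 0.819 | 0.836 | 0.940 | 0.825 | 0.819 | 0.664
lr | none | none | 0.837 | 0.925 | 0.802 | 0.816 | 0.886 | 0.818 | 0.802 | 0.655
lr | embed_linear | linear | 0.837 | 0.925 | 0.802 | 0.816 | 0.886 | 0.818 | 0.802 | 0.655
lr | assoc | none | 0.837 | 0.925 | 0.802 | 0.816 | 0.886 | 0.818 | 0.802 | 0.655
sgd | none | none | 0.834 | 0.822 | 0.822 | 0.823 | 0.786 | 0.862 | 0.822 | 0.773
lr | embed_lgbm | lgbm | 0.824 | 0.916 | 0.788 | 0.802 | 0.864 | 0.809 | 0.788 | 0.639
sgd | assoc | none | 0.824 | 0.813 | 0.813 | 0.813 | 0.771 | 0.856 | 0.813 | 0.765
lr | wrap | none | 0.815 | 0.903 | 0.771 | 0.786 | 0.886 | 0.791 | 0.771 | 0.588
sgd | pred | none | 0.802 | 0.788 | 0.788 | 0.789 | 0.744 | 0.837 | 0.788 | 0.731
sgd | embed_linear | linear | 0.767 | 0.862 | 0.744 | 0.748 | 0.713 | 0.795 | 0.744 | 0.647
sgd | embed_lgbm | lgbm | 0.754 | 0.750 | 0.738 | 0.739 | 0.678 | 0.800 | 0.738 | 0.672
knn | assoc | none | 0.719 | 0.732 | 0.666 | 0.671 | 0.707 | 0.723 | 0.666 | 0.445
knn | embed_linear | linear | 0.719 | 0.732 | 0.666 | 0.671 | 0.707 | 0.723 | 0.666 | 0.445
knn | wrap | none | 0.709 | 0.723 | 0.642 | 0.642 | 0.741 | 0.702 | 0.642 | 0.361
knn | pred | none | 0.706 | 0.748 | 0.633 | 0.629 | 0.765 | 0.695 | 0.633 | 0.328
sgd | wrap | none | 0.703 | 0.802 | 0.677 | 0.680 | 0.618 | 0.749 | 0.677 | 0.571
knn | embed_lgbm | lgbm | 0.703 | 0.739 | 0.653 | 0.657 | 0.662 | 0.717 | 0.653 | 0.445
knn | none | none | 0.626 | 0.588 | 0.588 | 0.589 | 0.510 | 0.681 | 0.588 | 0.429
dummy | embed_lgbm | lgbm | 0.620 | 0.500 | 0.500 | 0.383 | nan | 0.620 | 0.500 | 0.000
dummy | wrap | none | 0.620 | 0.500 | 0.500 | 0.383 | nan | 0.620 | 0.500 | 0.000
dummy | pred | none | 0.620 | 0.500 | 0.500 | 0.383 | nan | 0.620 | 0.500 | 0.000
dummy | none | none | 0.620 | 0.500 | 0.500 | 0.383 | nan | 0.620 | 0.500 | 0.000
dummy | embed_linear | linear | 0.620 | 0.500 | 0.500 | 0.383 | nan | 0.620 | 0.500 | 0.000
dummy | assoc | none | 0.620 | 0.500 | 0.500 | 0.383 | nan | 0.620 | 0.500 | 0.000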
nan = Not a Number.
Table A21. Management without US image features 5-fold performance on the holdout set.
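Model | Selection | Embed_Selector | Acc | Auroc | Bal-Acc | F1 | Npv | Ppv | Sens | Spec
lgbm | none | none | 0.930 | 0.971 | 0.916 | 0.923 | 0.952 | 0.919 | 0.916 | 0.858
rf | wrap | none | 0.927 | 0.942 | 0.914 | 0.920 | 0.945 | 0.920 | 0.914 | 0.858
rf | embed_lgbm | lgbm | 0.927 | 0.968 | 0.912 | 0.920 | 0.952 | 0.914 | 0.912 | 0.850
lgbm | assoc | none | 0.920 | 0.973 | 0.905 | 0.913 | 0.944 | 0.910 | 0.905 | 0.841
rf | assoc | none | 0.920 | 0.969 | 0.905 | 0.913 | 0.943 | 0.909 | 0.905 | 0.841
rf | none | none | 0.920 | 0.970 | 0.907 | 0.913 | 0.934 | 0.913 | 0.907 | 0.849
lgbm | embed_linear | linear | 0.917 | 0.970 | 0.903 | 0.909 | 0.935 | 0.910 | 0.903 | 0.841
lgbm | pred | none | 0.914 | 0.974 | 0.905 | 0.907 | 0.903 | 0.922 | 0.905 | 0.867
rf | embed_linear | linear | 0.914 | 0.964 | 0.898 | 0.906 | 0.933 | 0.904 | 0.898 | 0.833
rf | pred | none | 0.907 | 0.968 | 0.892 | 0.899 | 0.925 | 0.900 | 0.892 | 0.825
lgbm | embed_lgbm | lgbm | 0.898 | 0.960 | 0.885 | 0.889 | 0.895 | 0.905 | 0.885 | 0.833
lgbm | wrap | none | 0.895 | 0.948 | 0.875 | 0.884 | 0.927 | 0.884 | 0.875 | 0.791
lr | pred | none | 0.776 | 0.863 | 0.743 | 0.751 | 0.767 | 0.785 | 0.743 | 0.605
lr | assoc | none | 0.760 | 0.864 | 0.725 | 0.733 | 0.735 | 0.772 | 0.725 | 0.580
lr | none | none | 0.760 | 0.864 | 0.725 | 0.733 | 0.735 | 0.772 | 0.725 | 0.580
lr | embed_linear | linear | 0.760 | 0.864 | 0.725 | 0.733 | 0.735 | 0.772 | 0.725 | 0.580
lr | wrap | none | 0.751 | 0.841 | 0.713 | 0.721 | 0.728 | 0.761 | 0.713 | 0.554
sgd | pred | none | 0.751 | 0.735 | 0.735 | 0.735 | 0.675 | 0.800 | 0.735 | 0.672
lr | embed_lgbm | lgbm | 0.748 | 0.854 | 0.713 | 0.720 | 0.708 | 0.766 | 0.713 | 0.571
sgd | none | none | 0.738 | 0.722 | 0.722 | 0.722 | 0.661 | 0.789 | 0.722 | 0.656
sgd | embed_linear | linear | 0.735 | 0.815 | 0.718 | 0.718 | 0.657 | 0.784 | 0.718 | 0.647
sgd | wrap | none | 0.732 | 0.787 | 0.708 | 0.711 | 0.665 | 0.773 | 0.708 | 0.613
sgd | assoc | none | 0.731 | 0.715 | 0.715 | 0.715 | 0.659 | 0.782 | 0.715 | 0.647
sgd | embed_lgbm | lgbm | 0.709 | 0.712 | 0.691 | 0.691 | 0.626 | 0.763 | 0.691 | 0.614
knn | wrap | none | 0.700 | 0.721 | 0.639 | 0.638 | 0.693 | 0.705 | 0.639 | 0.386
knn | pred | none | 0.697 | 0.759 | 0.612 | 0.592 | 0.812 | 0.681 | 0.612 | 0.260
knn | embed_lgbm | lgbm | 0.684 | 0.723 | 0.648 | 0.650 | 0.610 | 0.722 | 0.648 | 0.496
knn | embed_linear | linear | 0.658 | 0.708 | 0.622 | 0.622 | 0.571 | 0.705 | 0.622 | 0.471
knn | assoc | none | 0.658 | 0.708 | 0.622 | 0.622 | 0.571 | 0.705 | 0.622 | 0.471
knn | none | none | 0.636 | 0.620 | 0.620 | 0.617 | 0.522 | 0.715 | 0.620 | 0.555
dummy | embed_lgbm | lgbm | 0.620 | 0.500 | 0.500 | 0.383 | nan | 0.620 | 0.500 | 0.000
dummy | wrap | none | 0.620 | 0.500 | 0.500 | 0.383 | nan | 0.620 | 0.500 | 0.000
dummy | pred | none | 0.620 | 0.500 | 0.500 | 0.383 | nan | 0.620 | 0.500 | 0.000
dummy | none | none | 0.620 | 0.500 | 0.500 | 0.383 | nan | 0.620 | 0.500 | 0.000
dummy | embed_linear | linear | 0.620 | 0.500 | 0.500 | 0.383 | nan | 0.620 | 0.500 | 0.000
dummy | assoc | none | 0.620 | 0.500 | 0.500 | 0.383 | nan | 0.620 | 0.500 | 0.000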
nan = Not a Number.

Appendix I

Redundancy-Aware Step-Up Feature Selection Results for predicting Management without US image features.
Table A22. Selection scores (Accuracy: Higher = More important) for redundancy-aware feature selection predicting management without US features.
Feature | Score
Ipsilateral_Rebound_Tenderness_nan | 8.312 × 10^−1
Severity_uncomplicated | 8.932 × 10^−1
RDW | 8.953 × 10^−1
Peritonitis_no | 9.359 × 10^−1
WBC_Count | 9.359 × 10^−1
Peritonitis_local | 9.295 × 10^−1
Body_Temperature | 9.274 × 10^−1
Weight | 8.932 × 10^−1
CRP | 8.720 × 10^−1
Segmented_Neutrophils | 8.397 × 10^−1
Height | 7.884 × 10^−1
Thrombocyte_Count | 7.478 × 10^−1
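One way to read Table A22 is as a step-up trace: each row is the feature added at that step, and the score is the accuracy after adding it. This reading is an assumption. The table does not state the row order, although the scores are not sorted, which is consistent with order of addition. Under that assumption, the sketch below finds the shortest prefix of features that reaches the peak score. The helper `best_prefix` is hypothetical and is not the authors' selection code.

```python
# Rows of Table A22 in the order listed: (feature, accuracy score).
steps = [
    ("Ipsilateral_Rebound_Tenderness_nan", 0.8312),
    ("Severity_uncomplicated", 0.8932),
    ("RDW", 0.8953),
    ("Peritonitis_no", 0.9359),
    ("WBC_Count", 0.9359),
    ("Peritonitis_local", 0.9295),
    ("Body_Temperature", 0.9274),
    ("Weight", 0.8932),
    ("CRP", 0.8720),
    ("Segmented_Neutrophils", 0.8397),
    ("Height", 0.7884),
    ("Thrombocyte_Count", 0.7478),
]


def best_prefix(steps):
    """Return (features, score) for the shortest prefix reaching the top score.

    Ties are broken toward the earlier step, i.e., the smaller feature set.
    """
    best_i = max(range(len(steps)), key=lambda i: (steps[i][1], -i))
    features = [name for name, _ in steps[: best_i + 1]]
    return features, steps[best_i][1]


selected, peak = best_prefix(steps)
print(len(selected), "features, peak score", peak)
print(selected)
```

Under this reading, the score peaks after the fourth feature and declines as further features are added.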

Appendix J

Results from predicting Severity with US image features.
Table A23. Severity with US image features holdout set performance.
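Model | Selection | Embed_Selector | Acc | Auroc | Bal-Acc | F1 | Npv | Ppv | Sens | Spec
lgbm | pred | none | 0.891 | 0.908 | 0.723 | 0.756 | 0.911 | 0.719 | 0.723 | 0.966
lr | wrap | none | 0.891 | 0.834 | 0.706 | 0.745 | 0.905 | 0.750 | 0.706 | 0.974
lr | pred | none | 0.891 | 0.879 | 0.706 | 0.745 | 0.905 | 0.750 | 0.706 | 0.974
lgbm | embed_lgbm | lgbm | 0.891 | 0.939 | 0.782 | 0.787 | 0.933 | 0.652 | 0.782 | 0.940
lr | assoc | none | 0.888 | 0.893 | 0.704 | 0.741 | 0.905 | 0.724 | 0.704 | 0.970
sgd | assoc | none | 0.888 | 0.854 | 0.712 | 0.746 | 0.908 | 0.710 | 0.712 | 0.966
rf | embed_lgbm | lgbm | 0.888 | 0.929 | 0.780 | 0.783 | 0.932 | 0.638 | 0.780 | 0.936
lr | none | none | 0.888 | 0.895 | 0.704 | 0.741 | 0.905 | 0.724 | 0.704 | 0.970
lgbm | assoc | none | 0.888 | 0.932 | 0.755 | 0.771 | 0.923 | 0.659 | 0.755 | 0.947
rf | wrap | none | 0.885 | 0.879 | 0.727 | 0.753 | 0.913 | 0.667 | 0.727 | 0.955
rf | pred | none | 0.885 | 0.906 | 0.753 | 0.766 | 0.923 | 0.643 | 0.753 | 0.943
lr | embed_linear | linear | 0.885 | 0.892 | 0.702 | 0.736 | 0.905 | 0.700 | 0.702 | 0.966
sgd | none | none | 0.882 | 0.857 | 0.700 | 0.732 | 0.904 | 0.677 | 0.700 | 0.962
sgd | embed_linear | linear | 0.882 | 0.864 | 0.700 | 0.732 | 0.904 | 0.677 | 0.700 | 0.962
rf | assoc | none | 0.882 | 0.933 | 0.811 | 0.788 | 0.945 | 0.596 | 0.811 | 0.913
lr | embed_lgbm | lgbm | 0.882 | 0.883 | 0.691 | 0.726 | 0.901 | 0.690 | 0.691 | 0.966
sgd | wrap | none | 0.882 | 0.825 | 0.700 | 0.732 | 0.904 | 0.677 | 0.700 | 0.962
lgbm | none | none | 0.882 | 0.933 | 0.751 | 0.762 | 0.922 | 0.628 | 0.751 | 0.940
lgbm | embed_linear | linear | 0.882 | 0.930 | 0.743 | 0.758 | 0.919 | 0.634 | 0.743 | 0.943
knn | none | none | 0.882 | 0.833 | 0.674 | 0.713 | 0.896 | 0.720 | 0.674 | 0.974
knn | embed_linear | linear | 0.882 | 0.788 | 0.691 | 0.726 | 0.901 | 0.690 | 0.691 | 0.966
knn | embed_lgbm | lgbm | 0.882 | 0.826 | 0.666 | 0.706 | 0.893 | 0.739 | 0.666 | 0.977
knn | assoc | none | 0.882 | 0.788 | 0.691 | 0.726 | 0.901 | 0.690 | 0.691 | 0.966
rf | embed_linear | linear | 0.879 | 0.933 | 0.809 | 0.784 | 0.945 | 0.586 | 0.809 | 0.909
sgd | pred | none | 0.879 | 0.846 | 0.698 | 0.728 | 0.904 | 0.656 | 0.698 | 0.958
rf | none | none | 0.872 | 0.920 | 0.780 | 0.766 | 0.934 | 0.574 | 0.780 | 0.913
sgd | embed_lgbm | lgbm | 0.872 | 0.846 | 0.720 | 0.736 | 0.912 | 0.600 | 0.720 | 0.940
lgbm | wrap | none | 0.869 | 0.869 | 0.709 | 0.726 | 0.909 | 0.590 | 0.709 | 0.940
knn | wrap | none | 0.869 | 0.753 | 0.650 | 0.682 | 0.889 | 0.640 | 0.650 | 0.966
knn | pred | none | 0.856 | 0.743 | 0.625 | 0.651 | 0.882 | 0.560 | 0.625 | 0.958
dummy | embed_lgbm | lgbm | 0.847 | 0.500 | 0.500 | 0.458 | 0.847 | nan | 0.500 | 1.000
dummy | wrap | none | 0.847 | 0.500 | 0.500 | 0.458 | 0.847 | nan | 0.500 | 1.000
dummy | pred | none | 0.847 | 0.500 | 0.500 | 0.458 | 0.847 | nan | 0.500 | 1.000
dummy | none | none | 0.847 | 0.500 | 0.500 | 0.458 | 0.847 | nan | 0.500 | 1.000
dummy | embed_linear | linear | 0.847 | 0.500 | 0.500 | 0.458 | 0.847 | nan | 0.500 | 1.000
dummy | assoc | none | 0.847 | 0.500 | 0.500 | 0.458 | 0.847 | nan | 0.500 | 1.000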
nan = Not a Number.
Table A24. Severity with US image features 5-fold performance on the holdout set.
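Model | Selection | Embed_Selector | Acc | Auroc | Bal-Acc | F1 | Npv | Ppv | Sens | Spec
lr | wrap | none | 0.895 | 0.820 | 0.699 | 0.744 | 0.903 | 0.813 | 0.699 | 0.981
lr | assoc | none | 0.891 | 0.873 | 0.707 | 0.746 | 0.905 | 0.753 | 0.707 | 0.974
lr | none | none | 0.891 | 0.875 | 0.707 | 0.746 | 0.905 | 0.753 | 0.707 | 0.974
lr | embed_lgbm | lgbm | 0.891 | 0.873 | 0.707 | 0.746 | 0.905 | 0.753 | 0.707 | 0.974
lr | embed_linear | linear | 0.891 | 0.870 | 0.707 | 0.746 | 0.905 | 0.753 | 0.707 | 0.974
rf | assoc | none | 0.888 | 0.926 | 0.790 | 0.785 | 0.936 | 0.628 | 0.790 | 0.932
sgd | assoc | none | 0.888 | 0.845 | 0.696 | 0.735 | 0.902 | 0.747 | 0.696 | 0.974
knn | pred | none | 0.888 | 0.767 | 0.695 | 0.734 | 0.902 | 0.750 | 0.695 | 0.974
rf | embed_linear | linear | 0.885 | 0.924 | 0.789 | 0.781 | 0.936 | 0.617 | 0.789 | 0.928
lr | pred | none | 0.885 | 0.856 | 0.687 | 0.724 | 0.899 | 0.753 | 0.687 | 0.974
sgd | embed_lgbm | lgbm | 0.885 | 0.842 | 0.711 | 0.742 | 0.908 | 0.695 | 0.711 | 0.962
sgd | embed_linear | linear | 0.885 | 0.855 | 0.695 | 0.730 | 0.902 | 0.716 | 0.695 | 0.970
rf | none | none | 0.885 | 0.924 | 0.770 | 0.775 | 0.929 | 0.632 | 0.770 | 0.936
sgd | none | none | 0.885 | 0.845 | 0.695 | 0.730 | 0.902 | 0.716 | 0.695 | 0.970
lgbm | wrap | none | 0.885 | 0.837 | 0.694 | 0.730 | 0.902 | 0.713 | 0.694 | 0.970
knn | none | none | 0.882 | 0.781 | 0.649 | 0.688 | 0.888 | 0.800 | 0.649 | 0.985
rf | pred | none | 0.879 | 0.874 | 0.707 | 0.735 | 0.907 | 0.660 | 0.707 | 0.955
rf | wrap | none | 0.879 | 0.856 | 0.707 | 0.734 | 0.907 | 0.656 | 0.707 | 0.955
sgd | pred | none | 0.879 | 0.840 | 0.692 | 0.719 | 0.902 | 0.699 | 0.692 | 0.962
lgbm | pred | none | 0.876 | 0.867 | 0.693 | 0.714 | 0.902 | 0.642 | 0.693 | 0.958
lgbm | embed_linear | linear | 0.876 | 0.916 | 0.675 | 0.701 | 0.896 | 0.664 | 0.675 | 0.966
sgd | wrap | none | 0.876 | 0.824 | 0.689 | 0.718 | 0.901 | 0.652 | 0.689 | 0.958
lgbm | embed_lgbm | lgbm | 0.876 | 0.924 | 0.714 | 0.733 | 0.910 | 0.629 | 0.714 | 0.947
rf | embed_lgbm | lgbm | 0.872 | 0.928 | 0.645 | 0.676 | 0.887 | 0.680 | 0.645 | 0.974
knn | embed_linear | linear | 0.872 | 0.718 | 0.634 | 0.667 | 0.884 | 0.733 | 0.634 | 0.977
knn | assoc | none | 0.872 | 0.718 | 0.634 | 0.667 | 0.884 | 0.733 | 0.634 | 0.977
lgbm | none | none | 0.866 | 0.897 | 0.651 | 0.677 | 0.889 | 0.600 | 0.651 | 0.962
lgbm | assoc | none | 0.866 | 0.915 | 0.615 | 0.644 | 0.878 | 0.687 | 0.615 | 0.977
knn | wrap | none | 0.866 | 0.716 | 0.648 | 0.675 | 0.889 | 0.668 | 0.648 | 0.962
dummy | embed_lgbm | lgbm | 0.847 | 0.500 | 0.500 | 0.458 | 0.847 | nan | 0.500 | 1.000
knn | embed_lgbm | lgbm | 0.847 | 0.815 | 0.500 | 0.458 | 0.847 | nan | 0.500 | 1.000
dummy | wrap | none | 0.847 | 0.500 | 0.500 | 0.458 | 0.847 | nan | 0.500 | 1.000
dummy | pred | none | 0.847 | 0.500 | 0.500 | 0.458 | 0.847 | nan | 0.500 | 1.000
dummy | none | none | 0.847 | 0.500 | 0.500 | 0.458 | 0.847 | nan | 0.500 | 1.000
dummy | embed_linear | linear | 0.847 | 0.500 | 0.500 | 0.458 | 0.847 | nan | 0.500 | 1.000
dummy | assoc | none | 0.847 | 0.500 | 0.500 | 0.458 | 0.847 | nan | 0.500 | 1.000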
nan = Not a Number.

Appendix K

Redundancy-Aware Step-Up Feature Selection Results for predicting Severity with US image features.
Table A25. Selection scores (Accuracy: Higher = More important) for redundancy-aware feature selection predicting severity with US features.
Feature | Score
CRP | 8.697 × 10^−1
Peritonitis_no | 8.846 × 10^−1
Neutrophil_Percentage | 8.889 × 10^−1
Thrombocyte_Count | 8.954 × 10^−1
Weight_NAN | 8.996 × 10^−1
Dysuria_nan | 8.996 × 10^−1
Meteorism_nan | 8.997 × 10^−1
Lower_Right_Abd_Pain_nan | 8.975 × 10^−1
Free_Fluids_nan | 8.975 × 10^−1
Nausea_nan | 8.954 × 10^−1
Lower_Right_Abd_Pain_yes | 8.932 × 10^−1
Peritonitis_generalized | 8.846 × 10^−1
Segmented_Neutrophils | 8.740 × 10^−1
Height | 8.654 × 10^−1

Appendix L

Results from predicting Severity without US image features.
Table A26. Severity without US image features holdout set performance.
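Model | Selection | Embed_Selector | Acc | Auroc | Bal-Acc | F1 | Npv | Ppv | Sens | Spec
lgbm | assoc | none | 0.901 | 0.931 | 0.788 | 0.801 | 0.933 | 0.698 | 0.788 | 0.951
rf | assoc | none | 0.891 | 0.933 | 0.782 | 0.787 | 0.933 | 0.652 | 0.782 | 0.940
lr | wrap | none | 0.888 | 0.811 | 0.695 | 0.735 | 0.902 | 0.741 | 0.695 | 0.974
lr | pred | none | 0.888 | 0.881 | 0.695 | 0.735 | 0.902 | 0.741 | 0.695 | 0.974
lr | embed_lgbm | lgbm | 0.888 | 0.878 | 0.695 | 0.735 | 0.902 | 0.741 | 0.695 | 0.974
lgbm | none | none | 0.888 | 0.931 | 0.755 | 0.771 | 0.923 | 0.659 | 0.755 | 0.947
lr | assoc | none | 0.885 | 0.889 | 0.693 | 0.730 | 0.902 | 0.714 | 0.693 | 0.970
sgd | pred | none | 0.885 | 0.827 | 0.676 | 0.718 | 0.896 | 0.750 | 0.676 | 0.977
sgd | embed_lgbm | lgbm | 0.885 | 0.854 | 0.702 | 0.736 | 0.905 | 0.700 | 0.702 | 0.966
rf | wrap | none | 0.885 | 0.873 | 0.702 | 0.736 | 0.905 | 0.700 | 0.702 | 0.966
lr | none | none | 0.885 | 0.889 | 0.693 | 0.730 | 0.902 | 0.714 | 0.693 | 0.970
sgd | wrap | none | 0.885 | 0.788 | 0.685 | 0.724 | 0.899 | 0.731 | 0.685 | 0.974
knn | embed_linear | linear | 0.885 | 0.807 | 0.702 | 0.736 | 0.905 | 0.700 | 0.702 | 0.966
sgd | none | none | 0.882 | 0.855 | 0.700 | 0.732 | 0.904 | 0.677 | 0.700 | 0.962
knn | assoc | none | 0.882 | 0.829 | 0.666 | 0.706 | 0.893 | 0.739 | 0.666 | 0.977
lr | embed_linear | linear | 0.882 | 0.889 | 0.691 | 0.726 | 0.901 | 0.690 | 0.691 | 0.966
knn | none | none | 0.882 | 0.795 | 0.700 | 0.732 | 0.904 | 0.677 | 0.700 | 0.962
knn | wrap | none | 0.882 | 0.817 | 0.666 | 0.706 | 0.893 | 0.739 | 0.666 | 0.977
lgbm | embed_linear | linear | 0.882 | 0.929 | 0.768 | 0.770 | 0.929 | 0.617 | 0.768 | 0.932
sgd | assoc | none | 0.882 | 0.855 | 0.700 | 0.732 | 0.904 | 0.677 | 0.700 | 0.962
lgbm | pred | none | 0.879 | 0.928 | 0.749 | 0.758 | 0.922 | 0.614 | 0.749 | 0.936
sgd | embed_linear | linear | 0.879 | 0.817 | 0.681 | 0.715 | 0.898 | 0.679 | 0.681 | 0.966
knn | embed_lgbm | lgbm | 0.875 | 0.772 | 0.696 | 0.723 | 0.904 | 0.636 | 0.696 | 0.955
rf | embed_linear | linear | 0.875 | 0.933 | 0.781 | 0.770 | 0.935 | 0.585 | 0.781 | 0.917
rf | pred | none | 0.875 | 0.930 | 0.798 | 0.777 | 0.941 | 0.579 | 0.798 | 0.909
lgbm | wrap | none | 0.872 | 0.850 | 0.660 | 0.693 | 0.892 | 0.654 | 0.660 | 0.966
rf | embed_lgbm | lgbm | 0.872 | 0.926 | 0.805 | 0.776 | 0.945 | 0.567 | 0.805 | 0.902
rf | none | none | 0.872 | 0.926 | 0.797 | 0.773 | 0.941 | 0.569 | 0.797 | 0.906
lgbm | embed_lgbm | lgbm | 0.869 | 0.930 | 0.735 | 0.741 | 0.918 | 0.578 | 0.735 | 0.928
dummy | embed_linear | linear | 0.847 | 0.500 | 0.500 | 0.458 | 0.847 | nan | 0.500 | 1.000
dummy | none | none | 0.847 | 0.500 | 0.500 | 0.458 | 0.847 | nan | 0.500 | 1.000
dummy | wrap | none | 0.847 | 0.500 | 0.500 | 0.458 | 0.847 | nan | 0.500 | 1.000
dummy | pred | none | 0.847 | 0.500 | 0.500 | 0.458 | 0.847 | nan | 0.500 | 1.000
dummy | embed_lgbm | lgbm | 0.847 | 0.500 | 0.500 | 0.458 | 0.847 | nan | 0.500 | 1.000
dummy | assoc | none | 0.847 | 0.500 | 0.500 | 0.458 | 0.847 | nan | 0.500 | 1.000
knn | pred | none | 0.843 | 0.717 | 0.618 | 0.637 | 0.880 | 0.483 | 0.618 | 0.943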
nan = Not a Number.
Table A27. Severity without US image features 5-fold performance on the holdout set.
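Model | Selection | Embed_Selector | Acc | Auroc | Bal-Acc | F1 | Npv | Ppv | Sens | Spec
lgbm | assoc | none | 0.892 | 0.896 | 0.741 | 0.768 | 0.917 | 0.717 | 0.741 | 0.958
sgd | none | none | 0.891 | 0.850 | 0.699 | 0.739 | 0.903 | 0.783 | 0.699 | 0.977
lr | assoc | none | 0.891 | 0.869 | 0.707 | 0.745 | 0.906 | 0.770 | 0.707 | 0.974
lr | embed_lgbm | lgbm | 0.891 | 0.864 | 0.707 | 0.745 | 0.906 | 0.770 | 0.707 | 0.974
lr | none | none | 0.891 | 0.869 | 0.707 | 0.745 | 0.906 | 0.770 | 0.707 | 0.974
lr | embed_linear | linear | 0.891 | 0.869 | 0.707 | 0.745 | 0.906 | 0.770 | 0.707 | 0.974
lgbm | embed_lgbm | lgbm | 0.888 | 0.907 | 0.721 | 0.753 | 0.911 | 0.709 | 0.721 | 0.962
sgd | embed_lgbm | lgbm | 0.888 | 0.846 | 0.696 | 0.735 | 0.902 | 0.747 | 0.696 | 0.974
knn | none | none | 0.888 | 0.746 | 0.696 | 0.735 | 0.902 | 0.747 | 0.696 | 0.974
lr | wrap | none | 0.888 | 0.805 | 0.686 | 0.727 | 0.899 | 0.767 | 0.686 | 0.977
rf | assoc | none | 0.885 | 0.923 | 0.712 | 0.742 | 0.908 | 0.687 | 0.712 | 0.962
sgd | assoc | none | 0.885 | 0.853 | 0.695 | 0.730 | 0.902 | 0.716 | 0.695 | 0.970
rf | wrap | none | 0.885 | 0.821 | 0.694 | 0.731 | 0.902 | 0.728 | 0.694 | 0.970
lr | pred | none | 0.885 | 0.847 | 0.678 | 0.717 | 0.896 | 0.777 | 0.678 | 0.977
sgd | wrap | none | 0.885 | 0.770 | 0.685 | 0.724 | 0.899 | 0.737 | 0.685 | 0.974
sgd | pred | none | 0.882 | 0.780 | 0.684 | 0.720 | 0.899 | 0.741 | 0.684 | 0.970
knn | embed_linear | linear | 0.882 | 0.731 | 0.691 | 0.725 | 0.902 | 0.702 | 0.691 | 0.966
rf | pred | none | 0.876 | 0.921 | 0.726 | 0.731 | 0.914 | 0.593 | 0.726 | 0.943
rf | embed_lgbm | lgbm | 0.876 | 0.918 | 0.816 | 0.781 | 0.949 | 0.570 | 0.816 | 0.902
rf | embed_linear | linear | 0.875 | 0.925 | 0.689 | 0.717 | 0.901 | 0.645 | 0.689 | 0.958
knn | embed_lgbm | lgbm | 0.872 | 0.719 | 0.634 | 0.667 | 0.884 | 0.733 | 0.634 | 0.977
lgbm | embed_linear | linear | 0.869 | 0.928 | 0.593 | 0.605 | 0.872 | 0.700 | 0.593 | 0.992
lgbm | wrap | none | 0.869 | 0.807 | 0.684 | 0.709 | 0.900 | 0.620 | 0.684 | 0.951
knn | pred | none | 0.869 | 0.738 | 0.658 | 0.687 | 0.892 | 0.634 | 0.658 | 0.962
sgd | embed_linear | linear | 0.866 | 0.845 | 0.711 | 0.716 | 0.911 | 0.601 | 0.711 | 0.936
lgbm | none | none | 0.856 | 0.876 | 0.609 | 0.629 | 0.877 | 0.567 | 0.609 | 0.966
lgbm | pred | none | 0.847 | 0.887 | 0.500 | 0.458 | 0.847 | nan | 0.500 | 1.000
rf | none | none | 0.847 | 0.906 | 0.500 | 0.458 | 0.847 | nan | 0.500 | 1.000
knn | assoc | none | 0.847 | 0.804 | 0.500 | 0.458 | 0.847 | nan | 0.500 | 1.000
dummy | wrap | none | 0.847 | 0.500 | 0.500 | 0.458 | 0.847 | nan | 0.500 | 1.000
dummy | pred | none | 0.847 | 0.500 | 0.500 | 0.458 | 0.847 | nan | 0.500 | 1.000
dummy | none | none | 0.847 | 0.500 | 0.500 | 0.458 | 0.847 | nan | 0.500 | 1.000
knn | wrap | none | 0.847 | 0.816 | 0.500 | 0.458 | 0.847 | nan | 0.500 | 1.000
dummy | embed_linear | linear | 0.847 | 0.500 | 0.500 | 0.458 | 0.847 | nan | 0.500 | 1.000
dummy | embed_lgbm | lgbm | 0.847 | 0.500 | 0.500 | 0.458 | 0.847 | nan | 0.500 | 1.000
dummy | assoc | none | 0.847 | 0.500 | 0.500 | 0.458 | 0.847 | nan | 0.500 | 1.000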
nan = Not a Number.
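
The nan entries in the PPV column follow from the definition PPV = TP / (TP + FP): a model that never predicts the positive (complicated) class has TP + FP = 0, so the ratio is undefined. The sketch below illustrates this under two assumptions that the source does not confirm: that the dummy model is a majority-class predictor, and that the full-dataset class counts from Table 3 (662 uncomplicated, 119 complicated) stand in for the holdout split, whose size is not given here. The helper names `safe_ratio` and `binary_metrics` are illustrative only.

```python
import math


def safe_ratio(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else math.nan


def binary_metrics(tp, fp, tn, fn):
    """Compute a few confusion-matrix metrics for a two-class problem."""
    recall_pos = safe_ratio(tp, tp + fn)  # recall on the positive class
    recall_neg = safe_ratio(tn, tn + fp)  # recall on the negative class
    return {
        "acc": safe_ratio(tp + tn, tp + fp + tn + fn),
        "ppv": safe_ratio(tp, tp + fp),   # undefined with no positive predictions
        "npv": safe_ratio(tn, tn + fn),
        "spec": recall_neg,
        "bal_acc": (recall_pos + recall_neg) / 2,
    }


# Assumption: class counts taken from Table 3, not from the actual holdout split.
n_uncomplicated, n_complicated = 662, 119

# A majority-class predictor labels every patient as uncomplicated (negative).
dummy = binary_metrics(tp=0, fp=0, tn=n_uncomplicated, fn=n_complicated)

for name, value in dummy.items():
    print(f"{name}: {value:.3f}")
```

On these counts accuracy and NPV are both 662/781 (about 0.848, close to the 0.847 reported on the holdout set), specificity is 1.000, balanced accuracy is 0.500, and PPV is nan. The same pattern appears in the non-dummy rows that report nan, which suggests those models also predicted only the majority class on the holdout set.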

Appendix M

Redundancy-Aware Step-Up Feature Selection Results for predicting Severity without US image features.
Table A28. Selection scores (Accuracy: Higher = More important) for redundancy-aware feature selection predicting severity without ultrasound features.

| Feature | Score |
|---|---|
| CRP | 8.697 × 10⁻¹ |
| Peritonitis_no | 8.868 × 10⁻¹ |
| Coughing_Pain_nan | 8.868 × 10⁻¹ |
| Body_Temperature | 8.847 × 10⁻¹ |
| Thrombocyte_Count | 8.718 × 10⁻¹ |

References

  1. “Appendicitis,” Mayo Clinic. Available online: https://www.mayoclinic.org/diseases-conditions/appendicitis/symptoms-causes/syc-20369543 (accessed on 28 September 2024).
  2. “Does This Child Have Appendicitis?” Johns Hopkins Medicine. Available online: https://www.hopkinsmedicine.org/health/conditions-and-diseases/does-this-child-have-appendicitis#:~:text=Up%20to%2080%20percent%20of,easy%20to%20miss%20or%20delay. (accessed on 5 October 2024).
  3. “Appendicitis Tests: Medlineplus Medical Test,” MedlinePlus. Available online: https://medlineplus.gov/lab-tests/appendicitis-tests/#:~:text=CT%20scan%20(computed%20tomography%20scan,up%20better%20in%20the%20pictures (accessed on 28 September 2024).
  4. Gollapalli, M.; Rahman, A.; Kudos, S.A.; Foula, M.S.; Alkhalifa, A.M.; Albisher, H.M.; Al-Hariri, M.T.; Mohammad, N. Appendicitis Diagnosis: Ensemble Machine Learning and Explainable Artificial Intelligence-Based Comprehensive Approach. Big Data Cogn. Comput. 2024, 8, 108.
  5. Issaiy, M.; Zarei, D.; Saghazadeh, A. Artificial Intelligence and Acute Appendicitis: A Systematic Review of Diagnostic and Prognostic Models. World J. Emerg. Surg. 2023, 18, 59.
  6. Kang, C.B.; Li, X.W.; Hou, S.Y.; Chi, X.Q.; Shan, H.F.; Zhang, Q.J. Preoperatively predicting the pathological types of acute appendicitis using machine learning based on peripheral blood biomarkers and clinical features: A retrospective study. Ann. Transl. Med. 2021, 9, 835.
  7. Park, J.J.; Kim, K.A.; Nam, Y.; Choi, M.H.; Choi, S.Y.; Rhie, J. Convolutional-neural-network-based diagnosis of appendicitis via CT scans in patients with acute abdominal pain presenting in the emergency department. Sci. Rep. 2020, 10, 9556.
  8. Akbulut, S.; Yagin, F.H.; Cicek, I.B.; Koc, C.; Colak, C.; Yilmaz, S. Prediction of perforated and nonperforated acute appendicitis using machine learning-based explainable artificial intelligence. Diagnostics 2023, 13, 1173.
  9. Rajpurkar, P.; Park, A.; Irvin, J.; Chute, C.; Bereket, M.; Mastrodicasa, D. AppendiXNet: Deep learning for diagnosis of appendicitis from a small dataset of CT exams using video pretraining. Sci. Rep. 2020, 10, 3958.
  10. Prabhudesai, S.G.; Gould, S.; Rekhraj, S.; Tekkis, P.P.; Glazer, G.; Ziprin, P. Artificial neural networks: Useful aid in diagnosing acute appendicitis. World J. Surg. 2008, 32, 305–309.
  11. Park, S.H.; Kim, Y.J.; Kim, K.G.; Chung, J.-W.; Kim, H.C.; Choi, I.Y.; You, M.-W.; Lee, G.P.; Hwang, J.H. Comparison between single and serial computed tomography images in classification of acute appendicitis, acute right-sided diverticulitis, and normal appendix using EfficientNet. PLoS ONE 2023, 18, e0281498.
  12. Zhao, Y.; Yang, L.; Sun, C.; Li, Y.; He, Y.; Zhang, L. Discovery of urinary proteomic signature for differential diagnosis of acute appendicitis. Biomed. Res. Int. 2020, 2020, 3896263.
  13. Hsieh, C.H.; Lu, R.H.; Lee, N.H.; Chiu, W.T.; Hsu, M.H.; Li, Y.C. Novel solutions for an old disease: Diagnosis of acute appendicitis with random forest, support vector machines, and artificial neural networks. Surgery 2011, 149, 87–93.
  14. Phan-Mai, T.A.; Thai, T.T.; Mai, T.Q.; Vu, K.A.; Mai, C.C.; Nguyen, D.A. Validity of machine learning in detecting complicated appendicitis in a resource-limited setting: Findings from Vietnam. Biomed. Res. Int. 2023, 2023, 5013812.
  15. Sakai, S.; Kobayashi, K.; Toyabe, S.; Mandai, N.; Kanda, T.; Akazawa, K. Comparison of the levels of accuracy of an artificial neural network model and a logistic regression model for the diagnosis of acute appendicitis. J. Med. Syst. 2007, 31, 357–364.
  16. Lin, H.A.; Lin, L.T.; Lin, S.F. Application of artificial neural network models to differentiate between complicated and uncomplicated acute appendicitis. J. Med. Syst. 2023, 47, 38.
  17. Bunn, C.; Kulshrestha, S.; Boyda, J.; Balasubramanian, N.; Birch, S.; Karabayir, I. Application of machine learning to the prediction of postoperative sepsis after appendectomy. Surgery 2021, 169, 671–677.
  18. Eickhoff, R.M.; Bulla, A.; Eickhoff, S.B.; Heise, D.; Helmedag, M.; Kroh, A. Machine learning prediction model for postoperative outcome after perforated appendicitis. Langenbecks Arch. Surg. 2022, 407, 789–795.
  19. Ghareeb, W.M.; Emile, S.H.; Elshobaky, A. Artificial intelligence compared to alvarado scoring system alone or combined with ultrasound criteria in the diagnosis of acute appendicitis. J. Gastrointest. Surg. 2022, 26, 655–658.
  20. Ramirez-GarciaLuna, J.L.; Vera-Bañuelos, L.R.; Guevara-Torres, L.; Martínez-Jiménez, M.A.; Ortiz-Dosal, A.; Gonzalez, F.J.; Kolosovas-Machuca, E.S. Infrared thermography of abdominal wall in acute appendicitis: Proof of concept study. Infrared Phys. Technol. 2020, 105, 103165.
  21. Forsström, J.J.; Irjala, K.; Selén, G.; Nyström, M.; Eklund, P. Using data preprocessing and single layer perceptron to analyze laboratory data. Scand. J. Clin. Lab. Investig. Suppl. 1995, 222, 75–81.
  22. Afshari Safavi, A.; Zand Karimi, E.; Rezaei, M.; Mohebi, H.; Mehrvarz, S.; Khorrami, M.R. Comparing the accuracy of neural network models and conventional tests in diagnosis of suspected acute appendicitis. J. Maz. Univ. Med. Sci. 2015, 25, 58–65.
  23. Pesonen, E.; Eskelinen, M.; Juhola, M. Comparison of different neural network algorithms in the diagnosis of acute appendicitis. Int. J. Biomed. Comput. 1996, 40, 227–233.
  24. Ting, H.W.; Wu, J.T.; Chan, C.L.; Lin, S.L.; Chen, M.H. Decision model for acute appendicitis treatment with decision tree technology–a modification of the Alvarado scoring system. J. Chin. Med. Assoc. 2021, 73, 401–406.
  25. Son, C.S.; Jang, B.K.; Seo, S.T.; Kim, M.S.; Kim, Y.N. A hybrid decision support model to discover informative knowledge in diagnosing acute appendicitis. BMC Med. Inform. Decis. Mak. 2012, 12, 17.
  26. Yoldaş, Ö.; Tez, M.; Karaca, T. Artificial neural networks in the diagnosis of acute appendicitis. Am. J. Emerg. Med. 2012, 30, 1245–1247.
  27. Park, S.Y.; Kim, S.M. Acute appendicitis diagnosis using artificial neural networks. Technol. Health Care 2015, 23, S559–S565.
  28. Jamshidnezhad, A.; Azizi, A.; Zadeh, S.R.; Shirali, S.; Shoushtari, M.H.; Sabaghan, Y. A computer based model in comparison with sonography imaging to diagnosis of acute appendicitis in Iran. J. Acute Med. 2017, 7, 10–18.
  29. Gudelis, M.; Lacasta Garcia, J.D.; Trujillano Cabello, J.J. Diagnosis of pain in the right iliac fossa. A new diagnostic score based on decision-tree and artificial neural network methods. Cir. Esp. (Engl. Ed.) 2019, 97, 329–335.
  30. Kang, H.J.; Kang, H.; Kim, B.; Chae, M.S.; Ha, Y.R.; Oh, S.B.; Ahn, J.H. Evaluation of the diagnostic performance of a decision tree model in suspected acute appendicitis with equivocal preoperative computed tomography findings compared with Alvarado, Eskelinen, and adult appendicitis scores: A STARD compliant article. Medicine 2019, 98, e17368.
  31. Shahmoradi, L.; Safdari, R.; Mir Hosseini, M.; Arji, G.; Jannt, B.; Abdar, M. Predicting risk of acute appendicitis: A comparison of artificial neural network and logistic regression models. Acta Med. Iran. 2019, 56, 785.
  32. Li, P.; Zhang, Z.; Weng, S.; Nie, H. Establishment of predictive models for acute complicated appendicitis during pregnancy-a retrospective case-control study. Int. J. Gynaecol. Obstet. 2023, 162, 744–751.
  33. Lee, Y.H.; Hu, P.J.; Cheng, T.H.; Huang, T.C.; Chuang, W.Y. A preclustering-based ensemble learning technique for acute appendicitis diagnoses. Artif. Intell. Med. 2013, 58, 115–124.
  34. Xia, J.; Wang, Z.; Yang, D.; Li, R.; Liang, G.; Chen, H. Performance optimization of support vector machine with oppositional grasshopper optimization for acute appendicitis diagnosis. Comput. Biol. Med. 2022, 143, 105206.
  35. Marcinkevičs, R.; Reis Wolfertstetter, P.; Wellmann, S.; Knorr, C.; Vogt, J.E. Using machine learning to predict the diagnosis, management and severity of pediatric appendicitis. Front. Pediatr. 2021, 9, 662183.
  36. Regensburg Pediatric Appendicitis. Available online: https://archive.ics.uci.edu/dataset/938/regensburg+pediatric+appendicitis (accessed on 30 September 2024).
  37. Marcinkevičs, R.; Wolfertstetter, P.R.; Klimiene, U.; Chin-Cheong, K.; Paschke, A.; Zerres, J.; Denzinger, M.; Niederberger, D.; Wellmann, S.; Ozkan, E.; et al. Interpretable and intervenable ultrasonography-based machine learning models for pediatric appendicitis. Med. Image Anal. 2024, 91, 103042. Available online: https://www.sciencedirect.com/science/article/pii/S136184152300302X?via%3Dihub (accessed on 15 November 2024).
  38. Navaei, M.; Doogchi, Z.; Gholami, F.; Tavakoli, M.K. Leveraging Machine Learning for Pediatric Appendicitis Diagnosis: A Retrospective Study Integrating Clinical, Laboratory, and Imaging Data. Health Sci. Rep. 2025, 8, e70756.
  39. Chadaga, K.; Khanna, V.; Prabhu, S.; Sampathila, N.; Chadaga, R.; Umakanth, S.; Bhat, D.; Swathi, K.S.; Kamath, R. An interpretable and transparent machine learning framework for appendicitis detection in pediatric patients. Sci. Rep. 2024, 14, 24454. Available online: https://www.nature.com/articles/s41598-024-75896-y (accessed on 15 November 2024).
  40. Thapa, A.; Timilsina, S.; Chapagain, B. Dharma: A novel machine learning framework for pediatric appendicitis-diagnosis, severity assessment and evidence-based clinical decision support. medRxiv 2025.
  41. Berger, D. DF-analyze/readme.md. GitHub. Available online: https://github.com/stfxecutables/df-analyze/blob/02e546f50d66ba2b27faae94758f5f69d29ad8f8/README.md#feature-type-and-cardinality-inference (accessed on 18 October 2024).
  42. Kendall, J. appendicitis-ml. GitHub. Available online: https://github.com/johnkxl/appendicitis-ml (accessed on 1 December 2024).
  43. Berger, D. df-analyze: Redundancy-Aware Feature Selection [Experimental Branch], GitHub. Available online: https://github.com/stfxecutables/df-analyze/tree/experimental?tab=readme-ov-file#redundancy-aware-feature-selection-new (accessed on 15 November 2024).
  44. Joseph, M.; Raj, H. GANDALF: Gated Adaptive Network For Deep Automated Learning of Features. 2024. Available online: https://arxiv.org/abs/2207.08548 (accessed on 15 October 2024).
  45. Levman, J.; Jennings, M.; Rouse, E.; Berger, D.; Kabaria, P.; Nangaku, M.; Gondra, I.; Takahashi, E. A Morphological Study of Schizophrenia with Magnetic Resonance Imaging, Advanced Analytics, and Machine Learning. Front. Neurosci. 2022, 16, 926426.
  46. Figueroa, J.; Etim, P.; Shibu, A.; Berger, D.; Levman, J. Diagnosing and Characterizing Chronic Kidney Disease with Machine Learning: The Value of Clinical Patient Characteristics as Evidenced from an Open Dataset. Electronics 2024, 13, 4326.
  47. Saville, K.; Berger, D.; Levman, J. Mitigating Bias Due to Race and Gender in Machine Learning Predictions of Traffic Stop Outcomes. Information 2024, 11, 687.
  48. Huang, X.; Gauthier, C.; Berger, D.; Cai, H.; Levman, J. Identifying Cortical Molecular Biomarkers Potentially Associated with Learning in Mice Using Artificial Intelligence. Int. J. Mol. Sci. 2025, 26, 6878.
Figure 1. Comparative bar plot of leading models for predicting diagnosis with and without US features.
Figure 2. Comparative bar plot of leading models for predicting management with and without US features.
Figure 3. Comparative bar plot of leading models for predicting severity with and without US features.
Table 1. Diagnosis target variable.

| | Appendicitis | No Appendicitis |
|---|---|---|
| Frequency | 463 | 317 |
| Proportion | 463/780 | 317/780 |
Table 2. Management target variable.

| | Conservative | Primary Surgical | Secondary Surgical | Simultaneous Appendectomy |
|---|---|---|---|---|
| Frequency | 483 | 270 | 27 | 1 |
| Proportion | 483/781 | 270/781 | 27/781 | 1/781 |
| Relative Frequency | 61.84% | 34.57% | 3.46% | 0.13% |
Table 3. Severity target variable.

| | Uncomplicated | Complicated |
|---|---|---|
| Frequency | 662 | 119 |
| Proportion | 662/781 | 119/781 |
| Relative Frequency | 84.76% | 15.24% |
Table 4. Numeric feature statistics for patients with and without appendicitis.

| Variable | Appendicitis: Mean, SD | No Appendicitis: Mean, SD |
|---|---|---|
| Age | 11.08, 3.56 | 11.72, 3.46 |
| BMI | 18.45, 4.16 | 19.56, 4.62 |
| Height | 146.93, 20.43 | 149.51, 18.64 |
| Weight | 41.72, 17.47 | 45.25, 17.11 |
| Length_of_Stay | 5.11, 2.98 | 3.09, 0.98 |
| Alvarado_Score | 6.67, 1.93 | 4.83, 2.0 |
| Paedriatic_Appendicitis_Score | 5.82, 1.85 | 4.42, 1.81 |
| Appendix_Diameter | 8.7, 2.18 | 5.04, 1.17 |
| Body_Temperature | 37.52, 0.81 | 37.24, 1.0 |
| WBC_Count | 14.28, 5.34 | 10.33, 4.48 |
| Neutrophil_Percentage | 76.03, 12.63 | 65.6, 14.76 |
| Segmented_Neutrophils | 71.6, 12.51 | 55.23, 13.29 |
| RBC_Count | 4.79, 0.37 | 4.82, 0.64 |
| Hemoglobin | 13.38, 1.61 | 13.38, 1.02 |
| RDW | 13.4, 5.86 | 12.87, 0.87 |
| Thrombocyte_Count | 285.79, 70.83 | 284.48, 74.92 |
| Ketones_in_Urine | 1.15, 1.28 | 0.69, 1.11 |
| RBC_in_Urine | 0.4, 0.82 | 0.32, 0.71 |
| WBC_in_Urine | 0.24, 0.63 | 0.19, 0.55 |
| CRP | 44.9, 68.51 | 11.72, 24.92 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Kendall, J.; Gaspar, G.; Berger, D.; Levman, J. Machine Learning and Feature Selection in Pediatric Appendicitis. Tomography 2025, 11, 90. https://doi.org/10.3390/tomography11080090
