Machine Learning Logistic Regression Model for Early Decision Making in Referral of Children with Cervical Lymphadenopathy Suspected of Lymphoma

Simple Summary Cervical lymphadenopathy is common in children. A decision model for detecting high-grade lymphoma in children with cervical lymphadenopathy is currently lacking. Most previous studies identified individual predicting factors for lymphoma, a few created multivariate models, but none of these were sufficiently discriminative for application in clinical practice. We have developed a 12-factor diagnostic scoring model with machine learning logistic regression that is highly sensitive and specific in detecting high-grade lymphomas. This diagnostic model facilitates early decision making in children with cervical lymphadenopathy suspected of lymphoma. Its application may enable early referral to a pediatric oncologist in patients with high-grade lymphoma and may reduce the number of referrals in patients with benign lymphadenopathy, thus preventing unnecessary invasive procedures, such as biopsies. Abstract While cervical lymphadenopathy is common in children, a decision model for detecting high-grade lymphoma is lacking. Previously reported individual lymphoma-predicting factors and multivariate models were not sufficiently discriminative for clinical application. To develop a diagnostic scoring tool, we collected data from all children with cervical lymphadenopathy referred to our national pediatric oncology center within 30 months (n = 182). Thirty-nine putative lymphoma-predictive factors were investigated. The outcome groups were classical Hodgkin lymphoma (cHL), nodular lymphocyte-predominant Hodgkin lymphoma (NLPHL), non-Hodgkin lymphoma (NHL), other malignancies, and a benign group. We integrated the best univariate predicting factors into a multivariate, machine learning model. Logistic regression allocated each variable a weighing factor. The model was tested in a different patient cohort (n = 60). We report a 12-factor diagnostic model with a sensitivity of 95% (95% CI 89–98%) and a specificity of 88% (95% CI 77–94%) for detecting cHL and NHL. Our 12-factor diagnostic scoring model is highly sensitive and specific in detecting high-grade lymphomas in children with cervical lymphadenopathy. It may enable fast referral to a pediatric oncologist in patients with high-grade lymphoma and may reduce the number of referrals and unnecessary invasive procedures in children with benign lymphadenopathy.


Introduction
Lymphadenopathy is a common clinical finding in children and adolescents and can occur at any age [1][2][3][4][5][6][7]. In most cases it is caused by non-malignant conditions, such the future, the implementation of a diagnostic scoring model may lead to less unnecessary referrals to the pediatric oncologist and less biopsies in patients with benign causes of lymphadenopathy.

Patient Inclusion
This is a retrospective, single center study, performed with patient data from the Princess Máxima Centre for Pediatric Oncology in Utrecht, the Netherlands, which is the only pediatric oncology facility in the Netherlands. We studied all patients (≤18 years) referred with lymphadenopathy suspected of lymphoma, either directly from general practitioners or from pediatricians, between June 2018 and December 2020 (n = 333). We included patients with cervical (including supraclavicular) lymphadenopathy or mass as the reason of referral (n = 182). We excluded patients who presented with lymphadenopathy or masses in other body regions, relapsed lymphoma, post-transplant lymphoproliferative disorders (PTLD), patients who had already started treatment for lymphoma elsewhere, and patients with genetic syndromes causing tumors.
A second cohort of patients was created consisting of all patients from January 2021 to January 2022 (n = 60). This cohort was used to test the reproducibility of the diagnostic model. Inclusion and exclusion criteria were the same as for the study group.
The study was approved by the Dutch Medical Research Ethical Committee Utrecht under trial number 21-073/C and number . All patients, and, when needed, parents or guardians, gave written informed consent.

Data Collection and Definitions
We collected data from electronic patient files. Two different researchers (E.A.M.Z. and L.A.W.) collected data independently. Uncertainties were discussed together and, if necessary, with the other researchers (A.B., R.J.N.) to reach a consensus. Due to the retrospective design, no standard formats were used for the description of the investigated variables. For example, involvement of the body regions was scored based on the radiology reports. Sometimes, one of the investigators was not certain how to interpret the radiology reports. This was discussed with the colleagues, and if necessary, the radiology investigations were re-analyzed by the involved radiologist of our study.
Data were processed anonymously and encrypted. We identified predicting factors for lymphoma based on an extensive search of the literature using PubMed, Medline, and Embase. We searched for studies using Medical Subject Heading terms including "lymphadenopathy", "child", "adolescent", and "lymphoma". An overview of potential predicting factors based on this search of the literature and their results are given in Table S1 [8,[31][32][33][34][35][36][37][38][46][47][48][49][50][51][52]. We identified 39 potential predictors and included these in our univariate analyses: age, gender, presence of B-symptoms, 11 laboratory parameters including TARC, and several imaging findings. These variables and their definitions are listed in Table S2.
The body regions of the involved areas were scored individually. An overview of the separately scored anatomical body regions and an explanation is provided in Table S3.
We used pathology reports primarily for defining the diagnosis; 158 out of 182 patients underwent biopsy, including all cases of lymphoma. Twenty-four patients were diagnosed without a biopsy, but based on clinical, radiological, microbiological, and laboratory results (twenty infectious/reactive lymphadenopathy, one venous malformation, one lymphangioma, one branchiogenic cyst, and one dermoid cyst).
We categorized the patients into 12 groups according to their diagnosis. The malignant diagnoses in the study population included: cHL, NLPHL, ALCL, primary mediastinal large B-cell lymphoma (PMBCL), diffuse large B-cell lymphoma (DLBCL), Burkitt lymphoma (BL), T-LBL, B-cell lymphoblastic lymphoma (B-LBL), and other malignancies (Langerhans cell histiocytosis (LCH)). Furthermore, there were three groups with benign For the identification of predictive factors, we divided the outcome into the benign group and the malignant group for univariate analysis. However, the malignant group contained nine different diagnoses, which differ significantly in incidence and clinical presentation. Therefore, we subdivided the group into five categories for multivariate analysis: cHL, NLPHL, NHL, other malignancies, and the benign group. In brief, we collected data from electronic patient files. We identified 39 potential predictors based on an extensive search of the literature. We used pathology reports for defining the diagnosis.

Statistical Analysis
For univariate analysis, the Fisher's exact test was used for binary variables to determine differences between the malignant and benign group. Laboratory values were analyzed both as continuous factors as well as binary variables with a cut-off point to investigate in which way they predict best for lymphoma. To develop a cut-off point, we developed a receiver operating characteristic (ROC) curve to determine the best cut-off point. We calculated the odds ratios (OR) with 95% confidence intervals (CI) to investigate the best predicting factors. In the case that one of the predictors was not present in either the benign or malignant group (n = 0), the OR was set as 0. In that case, we used the p-value for the selection for multivariate analysis. A p-value of <0.01 was considered statistically significant.
Since TARC is a new biomarker and only known as a marker for cHL so far, TARC was also evaluated as variable between cHL, NLPHL, NHL, and the benign group. We used SPSS Statistical software (version 27.0, IBM, Armonk, NY, USA) for univariate analysis.
NLPHL is a unique form of lymphoma. Its clinical presentation mimics a benign cause of lymphadenopathy [22][23][24]. Therefore, we investigated whether there are predicting factors to distinguish between NLPHL and the benign group.
We performed multivariate data analysis using Python Programming software (version 3.8.8). We decided on a data-driven approach instead of using the more traditional bio statistical methodologies. We chose this novel approach for efficiency and automation in finding the best model, as well as the removal of human bias in decisions. We used the most relevant variables based on univariate analysis and only included variables which can be used in general clinical practice. Using univariate analysis for feature selection does not address multicollinearity, whereas multivariate techniques, such as LASSO, are more successful in addressing this [53]. To prevent collinearity from biasing our results, we tested the correlation of our input predictors against each other with a Kendall Tau test. Besides that, we also performed further feature selection using the multivariate technique least absolute shrinkage and selection operator (LASSO) and extracted the feature importance in LASSO to use in the assessment.
We built a machine learning model using logistic regression in the scikit-learn package version 0.24.1.
We tested out several different model types, including easily explainable versions, such as decision trees and logistic regression, and the higher quality models, such as random forests and support vector machines [54,55]. Details of the different model types and the comparisons of the different models are provided in the Text S1.
Logistic regression was chosen for its relatedness to other human-made models and comprehensibility and its limited compromise on model quality. The machine learning model is binary (positive or negative) and assumes that no missing variables occur. However, there were some missing variables. We decided to mark all missing variables as negative, since we hypothesized a diagnostic test is not performed when the suspicion of abnormalities is low.
The model calculated the importance of a variable and gave it a weighing factor using logistic regression. A more detailed explanation of the weighing factor and the formula is provided in the Text S2. In model training, we set false negatives to be twice as bad as false positives, as, in general, false negatives are to be avoided when dealing with potential malignant disease. To validate the model quality, we used a 5-fold cross validation on the complete dataset for quality calculation.
A 5-fold cross-validation (CV) was applied to test the model quality on the dataset directly without overfitting and by using the entire set in training. In the 5-fold CV, the data are first randomly split into five parts. Then, the model was trained on four parts and we validated the quality with the fifth part. We then applied the same technique five times, for each combination once. The results of the 5-fold CV are the averaged values of the quality metrics obtained from the five tests. We evaluated the model in terms of sensitivity, specificity, and likelihood: Besides these quality metrics, we also visualized the receiver operator curve (ROC) and calculated the area under the ROC curve, also known as the area under the curve (AUC). To define the best cut-off in the ROC curves, we used the Youden Index in combination with expert opinions.
Youden s Index = Sensitivity + Speci f icity − 1 Moreover, we also used the patient dataset to identify potential overfitting. After finishing the multivariate model, we tested the model in a second (test) cohort. We first investigated whether the test group and the study group are similar enough, such that we can use the test set to test the model. We calculated whether the input variable distribution was significantly different between the study and test cohort, for the whole set combined, and for the malignant and benign cases separately. Afterwards, we tested the model for its quality and lack of overfitting in the second (test) cohort.

Study Population and Baseline Characteristics
In total, 182 patients were referred with cervical lymphadenopathy and included in the study (consort diagram, Figure 1). The malignant group contained 115 patients, including 79 cHL, 11 NLPHL, 24 NHL, and 1 LCH patient. The benign group contained 67 patients, including 48 with reactive/infectious lymphadenopathy (Table 1). Baseline characteristics are provided in Table 1. The test set contained 60 patients: 40 patients with malignancies and 20 with benign lymphadenopathy. The patient characteristics of the test group were not statistically different from the study group (Table 1).

Univariate Analysis Identified 29 Predictive Factors for Lymphoma
Univariate analysis of all variables identified the significant variables (p < 0.01) as significant predictors for high-grade lymphoma. The results of the univariate analysis are presented in Table 2. Odds Ratios and a 95% CI showed the predicting value of the different variables. We found that p-values for the laboratory values as continuous variables were comparable with the p-values when analyzed as dichotomous variables. Odds ratios were easier to interpret when analyzed as dichotomous variables due to the high range of some of the laboratory variables. Moreover, a cut-off point is very useful in clinical practice. Therefore, we decided to use a cut-off point for laboratory values for further analysis.

Univariate Analysis Identified 29 Predictive Factors for Lymphoma
Univariate analysis of all variables identified the significant variables (p < 0.01) a significant predictors for high-grade lymphoma. The results of the univariate analysis ar presented in Table 2. Odds Ratios and a 95% CI showed the predicting value of the diffe ent variables. We found that p-values for the laboratory values as continuous variable were comparable with the p-values when analyzed as dichotomous variables. Odds ratio were easier to interpret when analyzed as dichotomous variables due to the high range o some of the laboratory variables. Moreover, a cut-off point is very useful in clinical prac tice. Therefore, we decided to use a cut-off point for laboratory values for further analysi

Characteristics Benign Group
Study Group, Benign n = 67 Test Group, Benign n = 20  Table 1 describes the baseline characteristics of the malignant and benign group separately. Furthermore, differences between the different groups were investigated We calculated whether the input variable distribution was significantly different between the study and test cohort, for the whole set combined, and for the malignant and benign cases separately.   Abbreviations: ESR-erythrocyte sedimentation rate; Hb-hemoglobin; LD-lactate dehydrogenase; CRP-C-reactive protein; TARC-thymus and activation regulated chemokine; NA-not applicable; m-missing. a Ultrasound characteristics of pathological lymph nodes are diffuse hypo-echogenicity, absence of fatty hilum, round shaped and/or abnormal cluster of lymph nodes, and a resistance index (RI) above 0.8 [21,24,50]. The size of the lymph node that is considered pathological is dependent on the locations of the lymph node. Cervical lymph nodes in level two are considered pathological when the shortest diameter is larger than 15 mm. Cervical lymph nodes in other levels are considered pathological when the shortest diameter is larger than 10 mm. For non-cervical regions, the shortest diameter of greater than 10 mm was considered pathological [10,51]. We registered the lymph node as pathological when it was described as pathological by the radiologist based on the characteristics above. When the lymph node was described as doubtful pathological, we scored it as negative. b Conversion factor 1, c conversion factor 0.6206, d conversion factor 0.059, e conversion factor 10. f LD results are based on univariate analysis of NHL versus the benign group, since we concluded that LD is only a marker for NHL. g TARC results are based on univariate analysis of cHL versus benign group, since we concluded that TARC is only a marker for cHL. Outcomes in bold represent the variables that were included in the multivariate analysis.

Gender
As described in the statistical analysis, we performed some extra analysis on TARC to investigate whether it is a predictive factor for all types of lymphoma. TARC was found to be a predictive factor for high-grade lymphoma in our univariate analysis. However, this was only due to the 79 cases of cHL in the malignant group. TARC values were significantly higher in the cHL group compared to the NLPHL, NHL, and the benign group (median 7.207 pg/mL in cHL versus median 66 pg/mL in NLPHL, median 155 pg/mL in NHL and median 88 pg/mL in the benign group, p < 0.001). The ROC analysis showed that TARC is an excellent predictor for cHL (AUC 0.954, 95% CI 0.920-0.989), but not for other types of lymphoma (AUC 0.567, 95% CI 0.429-0.704).
Based on the results of the univariate analysis, we decided to include all variables with a p-value of <0.001 combined with an OR of larger than five and a CI of greater than one in the multivariate analysis. The presence of trachea deviation, obstructed airway, and vena cava superior syndrome were very specific for lymphoma. However, we excluded these factors from the multivariate analysis, since these were almost only present in patients with enlarged mediastinum, and we decided to add enlarged mediastinum in our multivariate analysis, since this can be established on a routine X-thorax. The same accounted for the presence of cervical level IV lymph nodes and the presence of supraclavicular involvement in general, since level IV refers to a supraclavicular region. We, therefore, included only supraclavicular involvement.
Univariate analysis of the NLPHL and the benign group showed that patients with NLPHL were significantly more often male (85% versus 50% p = 0.02), had more pathological lymph nodes visible on ultrasound (92% versus 36% p < 0.001), and never had B-symptoms (0% versus 30% p = 0.02). The C-reactive protein (CRP) and erythrocyte sedimentation rate (ESR) were significantly lower in patients with NLPHL.

Multivariate Analysis Creates a Diagnostic Model for Predicting Lymphoma
For multivariate analysis, outcomes were split into five diagnostic groups. The group with other malignancies contained only one patient with LCH and this patient was therefore excluded. We decided to exclude NLPHL patients for the creation of the model, because NLPHL showed no predictive quality in univariate analysis and because of the lack of medical need for immediate care for these patients. However, we did include NLPHL in the visualization of the outcome of the scoring model to show the differences between the groups. Therefore, the model was created based on 171 patients. In our first model, NHL was difficult to differentiate from cHL. According to the literature, lactate dehydrogenase (LD) is a well-known predictive factor for NHL, but not for HL [8,56,57]. LD was not statistically significant in our univariate analysis. We hypothesized that this was caused by the outcome comparison grouped in malignant and benign, since the malignant group contained mostly HL cases. For that reason, we tested LD in our multivariate model as well. Adding LD gave a better predictive power for NHL, and therefore LD was included in the final model.
We started the multivariate analysis with 13 variables, based on the univariate analysis. One variable gave no added value in the multivariate analysis (CRP), and was therefore rejected. The Kendall Tau test showed that most predictors correlate with other predictors, and therefore raise concern for collinearity. LASSO, on the other hand, attributed a positive feature importance to all 12 variables; although, the contribution of some of the variables was relatively low. Since all variables contributed positively, we decided to keep these 12 variables in the final model.
The following variables were included in the final scoring model (listed based on weighting factor): involvement of more than three body regions; involvement of mediastinum or hilum; TARC > 850 pg/mL; the presence of pathological lymph nodes on ultrasound; involvement of more than three cervical levels; enlarged mediastinum; cervical level V involvement; involvement of supraclavicular lymph nodes; involvement of infraclavicular lymph nodes; enlarged liver and/or spleen; neutrophils > 6.0 × 10 9 /L and LD > 260 U/L. The variables and their corresponding weighing factors are distributed in Figure 2 (maximum score 102).
variables was relatively low. Since all variables contributed positively, we decided to ke these 12 variables in the final model.
The following variables were included in the final scoring model (listed based weighting factor): involvement of more than three body regions; involvement of medi tinum or hilum; TARC > 850 pg/mL; the presence of pathological lymph nodes on ult sound; involvement of more than three cervical levels; enlarged mediastinum; cervi level V involvement; involvement of supraclavicular lymph nodes; involvement of inf clavicular lymph nodes; enlarged liver and/or spleen; neutrophils > 6.0 × 10 9 /L and LD 260 U/L. The variables and their corresponding weighing factors are distributed in Fig  2 (maximum score 102). After CV, the final model provided a sensitivity of 95% (95% CI 89-98%) and a specificity of 88% (95%CI 77-94%). The ROC curve of the logistic regression model is shown in Figure 4A. The AUC was 92% (95% CI 87-96%). The logistic regression model curve is shown in Figure 4B. The ROC curves of the outcomes of the cross validation are shown in Figure S1. The sensitivity and specificity were similar to the alternative machine learning models, confirming our choice of using the explainable logistic regression (Table S4). ROC curves of the different models are shown in Figure S2. When keeping the numeric values continuous when training the model, we observed a similar sensitivity of 94% ((95% CI 87-98%), but a lower specificity of 76% ((95% CI 64-85%), potentially due to overfitting on the continuous scales. Youden's Index was the highest at 86% with a score of 33.5. However, this score does not take into account that false negatives are more costly than false positives; therefore, we chose to reduce the score to 27.5. With this cut-off point, there were two false negative cases of NHL, and six false positive cases in a total of 171 patients. After creating the model, we tested it on the second (test) cohort. First, we compared the test group and the study group for overall differences. For all included variables, there were no significant differences between the test group and study group, in total; between the two malignant groups; and between the two benign groups (p < 0.01). In the test cohort, the model provided a sensitivity of 100% (95% CI 80-100%) and a specificity of 93% (95% CI 80-98%) for detecting high-grade lymphoma. The ROC curve of the study and test cohort together is shown in Figure 4C. The AUC was 98% (95% CI 95-100%). The logistic regression model curve is shown in Figure 4D.
The final model follows the following formula: With f being any of the 12 predictors, x being the presence of feature f (1 if condition applies, 0 if not), and a being the weighing factor ( Figure 2). R corresponds to the resulting score that needs to be above 27.5 to result in a classification to have a very high chance to be malignant.
Added Value of the Model Compared to the Literature An enlarged mediastinum, pathological lymph nodes, supraclavicular involvement, and extended disease are sometimes described in the literature as "red flags". In the univariate analysis, 64 out of 66 patients (97%) with an enlarged mediastinum were diagnosed with a malignancy. The multivariate model recognized all these patients as malignant and could not identify the two patients without malignancy as benign. For the other red flags, the positive predictive value of the variables was lower; 91% for more than three body regions involved (extended disease), 84% for supraclavicular involvement, and 81% for pathological lymph nodes on ultrasound. The model showed added value for these patients; 2 out of 8 (25%) patients with more than three body regions were identified correctly as not malignant with the model, 13 out of 19 (68%) patients with supraclavicular involvement were identified correctly, and 18 out of 24 (75%) with pathological lymph nodes were identified correctly. The six patients that the model misdiagnosed had both supraclavicular involvement and pathological lymph nodes, and more than three body regions involved.
One of the two false negative patients had none of the red flags. For this patient, adding red flags for immediate referral would not have made a difference. The other patient did have more than three body regions involved, and for this patient, immediate referral would have made the difference.

Added Value of TARC
TARC is a novel biomarker that is so far not routinely used in the work-up for pediatric lymphadenopathy. Therefore, we analyzed the additional value of TARC in our model. To investigate this, we set all TARC values to be above the cut-off of 850 pg/mL, since we hypothesized missing a malignant case would be more harmful than an unnecessary referral. In that case, another 21 out of the 67 (31%) benign patients would have been referred unnecessary. If we set all TARC values to be normal (<850 pg/mL), one patient with malignancy would be missed. So, TARC has major value to the model, especially for avoiding unnecessary referral.

Discussion
We developed a machine learning diagnostic model with 12 variables, including the novel biomarker TARC, which can serve as an important decision tool for the adequate referral of children with cervical lymphadenopathy. Lymphadenopathy is a frequent problem in children, and therefore it deserves a targeted approach. Timely recognition of high-grade lymphomas is a prerequisite. On the other hand, unnecessary referral to a pediatric oncologist leads to high stress in children and parents. Most studies identified individual predicting factors for lymphoma. A few studies created a multivariate model, but none of these were sufficiently discriminative for application in clinical practice [31,34,37]. Decision models could contribute to a targeted approach and have not yet been described in the literature. The high statistical power of this diagnostic model is very promising.
The model is intended as a diagnostic tool for patients with a high suspicion of malignant lymphoma. Based on a high sensitivity and specificity for diagnosing high-grade lymphoma in our cohort, this model appears to be very suitable for targeted decision making in children with cervical lymphadenopathy suspected of lymphoma. Patients with high-grade lymphoma will be identified and referred early. It may also prevent referrals to the pediatric oncologist in patients with a benign cause of lymphadenopathy, leading to less unnecessary biopsies as well. In our study, all biopsies in benign cases were performed to rule out malignancy. In retrospect, in almost all cases the application of our diagnostic tool would have prevented the referral to the pediatric oncologist and the biopsy that was performed. Therefore, our decision model will contribute to a reduction of unnecessary referrals and biopsies. Additionally, this could be highly beneficial for children and parents because of the stress that unnecessary referrals to a pediatric oncologist may cause. Finally, a more targeted approach will most likely lead to a higher cost-effectiveness of care.
In the Princess Máxima Centre for pediatric oncology, the diagnostic model is implemented during decision making about a biopsy in patients with cervical lymphadenopathy. Furthermore, the score is used for decision making when pediatricians in the Netherlands are consulting the Princess Máxima Centre about a patient with cervical lymphadenopathy, especially in the case of there being a long distance between the different hospitals. Before using the diagnostic model in general pediatric care, the model should be validated first in that setting. However, implementation will be easy, since all variables can be performed in general pediatric care. The diagnostic process for patients in a general pediatric practice often includes a broader work-up, including screening for infectious diseases, which is not taken into account in this study. To use the model, the following diagnostic investigations should be obtained: full physical examination, including all lymph node stations; an ultrasound of the neck region, focusing on the presence of pathological lymph nodes and the involved levels; an X-thorax to detect mediastinal enlargement, hilar and mediastinal lymphadenopathy, or lung lesions; an ultrasound of the abdomen, focusing on the presence of pathological lymph nodes, involvement of organs, and hepatosplenomegaly; TARC; LD; and neutrophil count. The investigations do not have to be performed at once; if a patient does not score points at the first investigation, the pediatrician can calculate if it is necessary to complete further investigations, based on the scoring points of the model. TARC may not be determined in every hospital, but could be sent to a neighboring hospital for determination. This work-up may be broader than the current pediatric approach. However, all proposed diagnostic tests are not invasive or are minimally invasive, and this diagnostic model will lead to less unnecessary referrals and invasive biopsies.
We have previously shown that TARC is a diagnostic marker for pediatric cHL [38,39]. The current study showed that TARC is a valuable diagnostic marker in patients with cervical lymphadenopathy as well. TARC has the greatest added value in patients with a benign cause of lymphadenopathy and in patients with cHL.
Despite the high sensitivity and specificity, the model resulted in two false negative and six false positive cases. One patient with B-LBL presented with a skin lesion and cervical lymphadenopathy without other signs of disease and was, therefore, missed as lymphoma. The other patient had BL; in this case, there was no ultrasound of the neck performed, and therefore we could not score several variables. This most likely has contributed to the false negative result. The six false positive cases were patients with extended lymphadenopathy.
One patient was diagnosed with tuberculosis, one with atypical mycobacterial infection caused by an inborn IFNGR1 mutation, two with rheumatological disorders, and two with other infectious diseases. Four of these are rare disorders, sometimes mimicking lymphoma, thus justifying referral to the oncologist.
The diagnostic model is not suitable for the detection of NLPHL. NLPHL only accounts for 5-8% of pediatric HL patients [62] and most often presents with stage I or II disease [63,64]. All patients in our study achieved full remission after surgery alone or after three cycles of chemotherapy. The lymphadenopathy in these patients existed for a longer period (mean 8.9 months, range 1.5-42 months). We argue that it is acceptable that the model misses these NLPHL patients at the first assessment, since NLPHL is a low-grade tumor with a survival rate of 100% (22,26). However, it is important that NLPHL will be detected, referred, and treated. We therefore recommend performing a lymph node biopsy if the cervical lymphadenopathy persists for more than 4 months after the first admission and cannot be explained otherwise.
A strength of this study is the data-driven approach instead of the traditional biostatistical, hypothesis-driven approach. The data-driven approach is highly efficient in creating the most powerful model and reduces human bias in decision making. The best predicting model was created by machine learning, based on the available data. Using machine learning also has its limitations. This data-driven approach might surface non-existing correlations by chance, due to bias being present in the data. Alternatively, the final model might be too difficult to understand by the ones supposed to use it. This may reduce adaption significantly. To address these concerns, the model was judged and evaluated by the investigators by comparing it with the existing literature. The most important limitation of this multivariate model is the potential collinearity of the variables. From a clinical perspective, collinearity is to be expected, since patients most often present with a combination of symptoms. Our correlation matrix raised suspicion for collinearity; however, LASSO showed a positive contribution for each of the variables, and therefore we decided to leave all variables in. Future research should reveal if the model can be adjusted or minimalized. Potential solutions to address collinearity in future research are the use of an algorithm, such as optimized Elasticnet. Another limitation of this study is the retrospective design. Data were collected from the electronic patient files. However, no standard formats for description of the variables existed, for example, the involvement of different body regions. For these variables, it would be of added value to describe this in standard formats and analyze prospectively. Moreover, due to the retrospective design, some missing values in different variables were found. The machine learning model in multivariate analysis was binary (positive or negative) and assumed that no missing variables occur. Therefore, we decided to mark all missing variables as negative, since we hypothesized a diagnostic test is not performed when the suspicion of abnormalities is low. This was a risk, especially if there was a higher percentage of missing values in the malignant group. This was the case in the presence of pathological lymph nodes on ultrasound. We expect that these variables would be given a higher weighing factor if the values were not missing. All of the abovementioned limitations highlight the need to retrain the model with prospective data in the future.
The authors acknowledge that this work is based on a patient population referred to the pediatric oncologist, making the diagnosis of malignancy more likely. The strength of this study population is the high number of lymphoma patients, which is not reported in the literature yet. Moreover, due to the centralization of pediatric oncology care in the Netherlands, this dataset contains all cases of lymphadenopathy referred to the pediatric oncologist within the whole country. On the other hand, this study is limited by the fact that the study patients are a selection of lymphadenopathy patients with suspicion of malignancy seen by general practitioners and referral hospitals, which is illustrated by the fact that only a few patients had single-node disease. Therefore, additional research is needed to explore the effectiveness of the model in general pediatric care. Furthermore, the feasibility of the model, which integrates different diagnostics, including the novel biomarker TARC, should also be tested in general pediatric care.

Conclusions
In conclusion, lymphadenopathy is a frequent problem in children, and therefore deserves a targeted diagnostic approach. Timely detection of high-grade lymphomas is a prerequisite. On the other hand, unnecessary referral to a pediatric oncologist leads to high stress in children and parents. Decision models could contribute to this targeted approach. Our 12-factor logistic regression machine learning diagnostic model is a crucial step forward towards a targeted diagnostic approach in children with cervical lymphadenopathy suspected of lymphoma. Our study results suggest that this model will support the identification of patients with high-grade lymphoma. This may reduce unnecessary referral to the pediatric oncologist and unnecessary biopsies. Future studies should focus on testing and validating our decision model for children with cervical lymphadenopathy in different clinical settings.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to ethical considerations. The codes of the machine learning model are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest. The funding source had no involvement on the study design, data collection, data analysis, interpretation of data, writing the report, or the decision to submit the paper for publication.