Performance Metrics of the Scoring System for the Diagnosis of the Beckwith–Wiedemann Spectrum (BWSp) and Its Correlation with Cancer Development

Simple Summary
Beckwith–Wiedemann syndrome has recently been renamed Beckwith–Wiedemann spectrum (BWSp) to reflect its diverse presentation and clinical features. In 2018, an international consensus developed a diagnostic approach and redefined the clinical criteria, establishing a threshold score at which a diagnosis can be made even when genetic testing is negative. We describe a cohort of 831 patients to validate the efficacy of the 2018 consensus score for BWSp diagnosis and to gather data on the performance of previous and current scoring systems, as well as on the relationship between BWSp features, molecular tests, and the risk of cancer development.
Abstract
Different scoring systems for the clinical diagnosis of the Beckwith–Wiedemann spectrum (BWSp) have been developed over time, the most recent being the international consensus score. Here, we aimed to validate these scoring systems (the 2018 international consensus and the previous ones) and to provide data on their performance metrics, relating them to BWSp features, molecular tests, and the probability of cancer development in a cohort of 831 patients. The consensus scoring system had the best performance (sensitivity 0.85 and specificity 0.43). In our cohort, the diagnostic yield of tests on blood-extracted DNA was low in patients with a low consensus score (~20% with a score of 2), and the score did not correlate with cancer development. We observed hepatoblastoma (HB) in 4.3% of patients with UPD(11)pat and Wilms tumor in 1.9% of patients with isolated lateralized overgrowth (ILO). We validated the efficacy of the currently used consensus score for BWSp clinical diagnosis. Based on our observations, a first-tier analysis of tissue-extracted DNA may be considered in patients with <4 points. We discourage the use of the consensus score value as an indicator of the probability of cancer development. Moreover, we suggest considering cancer screening for molecularly negative patients with ILO (risk ~2%) and HB screening for patients with UPD(11)pat (risk ~4%).

Beckwith–Wiedemann syndrome has recently been renamed Beckwith–Wiedemann spectrum (BWSp) in an international consensus [10] to emphasize its heterogeneous clinical presentation, which variably includes several features in various degrees of severity. The spectrum includes so-called 'classic' and 'atypical' forms and extends to isolated lateralized overgrowth (ILO) [11]. Because of its mild forms, BWSp is known to be underestimated epidemiologically. BWSp diagnosis is hampered by the low specificity of some features, which are very common in the general population; by difficulties in recognizing dysmorphisms in different populations [12]; and by the tissue mosaicism of the underlying molecular defect, which is sometimes low-level or confined to tissues other than blood that are difficult to sample [13]. Indeed, some patients have negative molecular tests despite a clear-cut phenotype. Nevertheless, prompt suspicion and early diagnosis are key to initiating specific follow-up and cancer screening, given that most tumors in BWSp occur in early childhood [4]. To standardize a diagnostic approach that was heterogeneous in the past, several scoring systems based on clinical criteria have been devised [2,3,14–17] and have accumulated over time (Table 1), as have management recommendations [18]. Most recently, in 2018, an international consensus [10] elaborated a diagnostic approach and redefined the clinical diagnostic criteria, separating for the first time the criteria for requesting molecular tests from those permitting a clinical diagnosis in the case of negative testing. Based on their specificity, the consensus scoring system identified cardinal and supportive features, contributing 2 and 1 point, respectively. At least 2 points are required to trigger a specific molecular test and 4 to make a clinical diagnosis (even when the molecular test is negative).
Besides guiding genotype-based cancer screening, a positive molecular test establishes the diagnosis in cases with <4 points and supports the clinical diagnosis in those with ≥4 points. As pointed out in the international consensus [10], despite an extensive literature review, the scoring criteria were not derived from a methodological or statistical analysis of case series but rather from a shared, reasoned approach to the problem. Therefore, the results of implementing these recommendations in clinical practice need to be analyzed to expand the evidence on the effectiveness and efficiency of such a system. This study aims to provide a systematic, statistical validation of the consensus criteria by evaluating their reliability in clinical practice, comparing them with the previously used criteria [2,3,14–17], analyzing their performance metrics against the outcomes of molecular testing and tumor development, and identifying the clinical features with the highest diagnostic accuracy.

Methods
Study cohort: This is a multicenter, retrospective, observational study that included patients from the main clinical genetics and rare disease centers in Italy that diagnose and follow patients with BWSp. Consent was obtained from the participants or their parents, and this study was approved by the ethics committee (IRB 93/2021, Protocol 0070581-1 July 202). We asked the participating centers to provide comprehensive genotype and phenotype data on the patients diagnosed with BWSp by filling out a spreadsheet that included all the features described in BWSp. Efforts were made to collect medical records as completely as possible. Data were merged and entered into the database, assigning each patient a unique identification number.
The BWSp clinical score was calculated according to the international consensus criteria [10]. Scores and clinical diagnoses according to the previous diagnostic criteria were also calculated, following the definition given by each of the authors [2,3,14–17] (Table 1). This was done because, before the introduction of the consensus criteria, referral for molecular tests and clinical diagnosis of BWSp were heterogeneous. We included in the analysis cases diagnosed both before and after 2018, regardless of whether they fully met the consensus criteria for a clinical diagnosis of BWSp.
Genotyping: All the patients were tested on peripheral blood-extracted DNA and underwent analysis of the methylation level at the IC1 and IC2 by MS-MLPA [19,20], except for 20, who were tested by pyrosequencing [21]. Molecular testing on tissue other than blood was not carried out systematically by the various centers, so the outcome of these assessments has limited value for the purpose of this study. Overall, 14 patients with negative blood-extracted DNA tests were tested on DNA extracted from a skin biopsy of a hypertrophic body region (n = 13) or peritumoral tissue (n = 1).
MS-MLPA simultaneously detects gain of methylation at H19/IGF2:IG-DMR (IC1-GoM), loss of methylation at KCNQ1OT1:TSS-DMR (IC2-LoM), and copy number variants in these regions. In the case of both IC2-LoM and IC1-GoM with a normal copy number, a UPD(11)pat genotype was inferred. Either microsatellite segregation analysis or SNP array was then performed to refine the extent of the disomy, because when the disomy involves the whole of chromosome 11, paternal genome-wide UPD (GWpUPD) may be present [22]. We decided a priori to exclude cases with GWpUPD/whole-chromosome disomy because (a) this study was multicenter, and this test was not performed consistently across the referral centers that provided the data, and (b) these patients usually represent a very small subset of cases with a very different phenotype, so we expected the criteria not to perform as well in such cases. In MS-MLPA-negative cases with characteristics such as cleft palate, familial forms, a diagnostic score > 8, or a score ≥4 with omphalocele, CDKN1C sequencing for pathogenic variants [22] completes the diagnostic flowchart.
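The interpretation steps above can be sketched as a small decision function. This is an illustrative simplification under stated assumptions, not the centers' laboratory pipeline: the `MsMlpaResult` structure and function names are hypothetical, copy number variants are lumped into one "CNV" category because their interpretation is not detailed here, GWpUPD is assumed to have been excluded upstream, and the CDKN1C reflex covers only the example triggers listed in the text ("such as"), not an exhaustive list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MsMlpaResult:
    """Simplified MS-MLPA readout at 11p15.5 (hypothetical structure)."""
    ic1_gom: bool             # gain of methylation at H19/IGF2:IG-DMR (IC1)
    ic2_lom: bool             # loss of methylation at KCNQ1OT1:TSS-DMR (IC2)
    normal_copy_number: bool  # no copy number variant detected


def classify_ms_mlpa(r: MsMlpaResult) -> str:
    if not r.normal_copy_number:
        # Copy number variants need separate interpretation (not modeled here).
        return "CNV"
    if r.ic1_gom and r.ic2_lom:
        # Both defects with a normal copy number: inferred UPD(11)pat, to be
        # refined by microsatellite segregation or SNP array.
        return "UPD(11)pat"
    if r.ic2_lom:
        return "IC2-LoM"
    if r.ic1_gom:
        return "IC1-GoM"
    return "negative"


def reflex_cdkn1c(category: str, cleft_palate: bool, familial: bool,
                  score: int, omphalocele: bool) -> bool:
    """Whether CDKN1C sequencing follows a negative MS-MLPA (listed triggers only)."""
    if category != "negative":
        return False
    return cleft_palate or familial or score > 8 or (score >= 4 and omphalocele)


result = MsMlpaResult(ic1_gom=True, ic2_lom=True, normal_copy_number=True)
print(classify_ms_mlpa(result))  # prints: UPD(11)pat
```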
Statistical analysis: Differences in clinical characteristics between the molecular subgroups were evaluated with Fisher's exact test or the chi-square test for categorical variables and with Student's t-test for continuous variables (after verification of normality by the Shapiro–Wilk test). For comparisons of several groups, a one-way ANOVA with a post hoc Bonferroni test was performed for the continuous variables. Correlations were tested by the Pearson method (linear regression).
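For readers who want to reproduce a 2 × 2 comparison outside GraphPad Prism, the following is a minimal standard-library sketch of the two-sided Fisher's exact test, using the common convention of summing the probabilities of all tables with the same margins that are no more likely than the observed one. It illustrates the test itself, not the software used in the study; the function name and example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def table_prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x
        # when all row and column totals are fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = table_prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Relative tolerance so that tables tied with the observed one are counted.
    p_value = sum(
        p for p in (table_prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Hypothetical counts: feature present/absent in two molecular subgroups
print(round(fisher_exact_two_sided(3, 0, 0, 3), 3))  # prints: 0.1
```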
Positive and negative predictive values, sensitivity, specificity, and diagnostic accuracy for a positive methylation test were calculated with standard formulas for each clinical diagnostic score and for each item of the scoring systems proposed over time, using the positive molecular test as the gold standard. The ability of each scoring system to identify a positive molecular test was also analyzed by receiver operating characteristic (ROC) analysis, based on the area under the ROC curve (AUC).
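The standard formulas can be written out on a 2 × 2 table. In this minimal sketch, the counts are our own reconstruction from the Results, under the assumption that the "test" is a consensus score of ≥4 points and the reference is any positive molecular test (blood or tissue): of 524 positives, 73 + 5 scored <4; of 307 negatives, 132 scored <4. These counts reproduce the abstract's sensitivity (0.85) and specificity (0.43); the class name is hypothetical, and Table 4 remains the authoritative source for the reported metrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # score >= 4 and positive molecular test
    fn: int  # score < 4 and positive molecular test
    fp: int  # score >= 4 and negative molecular test
    tn: int  # score < 4 and negative molecular test

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.fp + self.tn)


# Reconstructed from the Results counts (assumption, see lead-in):
# 524 positives, of which 73 + 5 scored < 4; 307 negatives, of which 132 scored < 4.
consensus = Confusion(tp=524 - 78, fn=78, fp=307 - 132, tn=132)
print(round(consensus.sensitivity(), 2), round(consensus.specificity(), 2))  # prints: 0.85 0.43
```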
A two-sided significance threshold of p < 0.05 was used for all tests. Data were analyzed using GraphPad Prism 8.0 (GraphPad Holdings, LLC, San Diego, CA, USA).

Results
Genotype and phenotype: We analyzed the characteristics and molecular test results of 831 patients with features within the BWSp who underwent specific molecular testing. Six hundred ninety-nine received a diagnosis of BWSp according to the consensus definition (524 with a positive test and 175 with a negative test who met the clinical criteria). One hundred thirty-two patients (15.9%) had a negative test and <4 points. The cohort distribution is summarized in Figure 1: the inner circle divides the cohort into patients with positive and negative tests, while the outer circle distinguishes patients with a clinical diagnosis and/or a positive test from those with fewer than 4 points and a negative test. Among the 831 patients, 322 had IC2-LoM (38.7% of the cohort; 61.5% of those with a positive molecular test), 52 had IC1-GoM (6.3%; 9.9%), 138 had UPD(11)pat (16.6%; 26.3%), 12 had a pathogenic variant in CDKN1C (1.4%; 2.3%), and 307 were negative (36.9%). In 519 patients, the molecular defect was found on blood-extracted DNA; in addition, among the 14 patients tested on tissue-extracted DNA (all with a score <4), 5 (35.7%) had positive tests (3 UPD(11)pat and 2 IC1-GoM). Table 2 summarizes the clinical features of the patients by molecular subtype.
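The cohort percentages can be cross-checked from the raw counts. The short script below (variable names are illustrative) recomputes each subgroup's share of the whole cohort and of the molecularly positive patients, and verifies that the reported totals reconcile; it is a consistency check on the numbers in the text, not part of the study's analysis.

```python
counts = {
    "IC2-LoM": 322,
    "IC1-GoM": 52,
    "UPD(11)pat": 138,
    "CDKN1C": 12,
    "negative": 307,
}
total = sum(counts.values())
positives = total - counts["negative"]


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal, as reported in the text."""
    return round(100 * part / whole, 1)


share_of_cohort = {k: pct(v, total) for k, v in counts.items()}
share_of_positives = {k: pct(v, positives) for k, v in counts.items() if k != "negative"}

# Cross-checks between the reported totals
assert positives == 519 + 5          # blood-positive + tissue-positive
assert total == 699 + 132            # diagnosed + negative with < 4 points
assert counts["negative"] == 175 + 132

print(total, positives)              # prints: 831 524
print(share_of_positives["IC2-LoM"]) # prints: 61.5
```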
Several differences between the molecular subgroups, already reported elsewhere [4,23–30], were found: macroglossia was less represented in the UPD(11)pat group; facial naevus flammeus, ear anomalies, postnatal overgrowth, preterm birth, and abdominal wall defects (especially major ones) were more common in the IC2-LoM and CDKN1C variant groups; lateralized overgrowth was very common in the UPD(11)pat group and rare in the CDKN1C variant group; fetal overgrowth, organomegaly, and polyhydramnios were more represented among patients with IC1-GoM; placentomegaly, cryptorchidism, and cleft palate were more common in the CDKN1C variant group; transient hypoglycemia was less common among negative patients; and assisted reproduction and twinning were much more frequent in cases with IC2-LoM. Regarding tumors, previously reported differences [5,31–34] were confirmed: patients with IC1-GoM had a high risk (especially of Wilms tumor (WT)), those with UPD(11)pat an intermediate risk (mostly WT and hepatoblastoma (HB)), those with CDKN1C variants a low risk (especially of neuroblastic tumors), and those with IC2-LoM a very low risk (spread across several tumor types).
Abbreviations: gain of methylation at imprinting center 1 (IC1-GoM), loss of methylation at imprinting center 2 (IC2-LoM), paternal uniparental disomy of the 11p15.5 chromosomal region (UPD(11)pat), pathogenic variants in CDKN1C (CDKN1C mutation), standard deviation (SD). ⁑ p = 0.043; • p = 0.013; * p < 0.001, excluding patients with negative molecular tests.
Score: The patients had an average consensus score of 5.1 ± 2.1 (median 5.0) (Table 2). Patients with negative molecular tests had a lower score than those with a positive test (p < 0.001), a difference driven by fewer cardinal features (p < 0.001) rather than fewer supportive ones. The score did not differ between the molecular subgroups, but patients with IC1-GoM, despite a score similar to that of the other molecular subtypes, had fewer cardinal features (p = 0.043) and more supportive ones (p = 0.013) than the others. Figure 2 reports the distribution of patients' clinical scores for each molecular subtype, while Figure 3, conversely, displays the molecular subgroups at each point of the consensus score. Although score distributions did not differ among the four main molecular subgroups of BWSp, the distribution in cases with negative molecular tests was clearly skewed towards low scores.
The probability of a negative test decreased as the score increased (p < 0.001), ranging from 70% (score of 2 points) to 8% (score ≥ 10 points). There were 210 patients with a score <4: 73 had a positive molecular test on blood-extracted DNA (34.8%; 20 of those with 2 points, 23.8%, and 53 of those with 3 points, 40.7%), and 5 had a positive test on tissue-extracted DNA (of 14 tested, 35.7%). The remaining 132 patients were negative for molecular tests and had diagnostic scores <4. According to the consensus criteria, they cannot be diagnosed with BWSp; however, we included them in the cohort for follow-up and comparison. Of these, 56 had 2 points (8 with macroglossia and 48 with isolated lateralized overgrowth), and 76 had 3 points, deriving from a combination of one minor criterion with macroglossia (n = 11), lateralized overgrowth (n = 56), or omphalocele (n = 2), or from a combination of 3 minor criteria.
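The breakdown of the 210 patients with fewer than 4 points can be reconciled in the same way. In this sketch (illustrative variable names), the number of 3-point cases made up only of minor criteria is not stated in the text; it is derived here by subtraction and should be read as an inference from the reported counts.

```python
below_4 = 210
blood_positive = 73      # 20 with 2 points + 53 with 3 points
tissue_positive = 5      # out of 14 tested on tissue-extracted DNA
negative = below_4 - blood_positive - tissue_positive

two_points = {"macroglossia": 8, "isolated lateralized overgrowth": 48}
three_points_with_cardinal = {"macroglossia": 11, "lateralized overgrowth": 56, "omphalocele": 2}
three_points_total = 76
# Derived, not stated in the text: 3-point cases made only of minor criteria
three_minor_only = three_points_total - sum(three_points_with_cardinal.values())

print(negative, round(100 * blood_positive / below_4, 1), round(100 * tissue_positive / 14, 1))
# prints: 132 34.8 35.7
print(sum(two_points.values()), three_minor_only)  # prints: 56 7
```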
Tumors: Table 3 details the 48 patients who developed a total of 50 tumors (2 patients had 2 tumors). Tumor risk differed consistently across the four molecular defects, ranging from the highest in IC1-GoM (19.2%) to the lowest in IC2-LoM (1.6%) (p < 0.001). Tumor types were 26 WT (4 bilateral), 7 HB, 5 adrenal carcinomas (AK), 4 neuroblastomas (NB), 2 pancreatoblastomas (PB), 1 Sertoli cell tumor (SCT), 1 hepatocarcinoma (HC), 1 acute lymphoblastic leukemia (LLA), 1 Hodgkin lymphoma (LH), 1 rhabdomyosarcoma (RMS), and 1 pheochromocytoma (PCC). The score of patients without tumors was similar to that of patients with a tumor (excluding points related to tumors: 5.4 ± 2.4 vs. 5.0 ± 2.0, p not significant). There was no correlation between the consensus score and the likelihood of developing a tumor (r² = 0.301, p = 0.124), nor a difference between patients with more or fewer than 4 points (p = 0.160). Among the 132 negative patients with fewer than 4 points, 2 developed WT (1.5%; 1.9% of the patients with ILO); both had 3 points, with lateralized overgrowth as the cardinal feature.
Metrics of the scoring systems: Table 4 summarizes the performance metrics and characteristics of the scoring systems for BWSp proposed over time. Each clinical scoring system was evaluated by ROC curves against the outcome of the molecular test and against its ability to detect cases of cancer. For each scoring system, patients were divided into those with positive/negative molecular tests and with/without tumor development to assess the criteria's ability to identify cases of interest. The sensitivity of the criteria ranged from 0.40 [14] to 0.88 [17] and specificity from 0.28 [17] to 0.86 [14]. The criteria of Ibrahim et al., 2014 and the consensus criteria had the highest area under the curve (AUC) and diagnostic accuracy in detecting patients with a positive molecular test. These data are shown in Figure 4, where the ROC curve for each scoring system is compared with that of the consensus criteria (red line). Performance against cancer development was tested after excluding score points deriving from the development of embryonal tumors. The consensus criteria allowed the diagnosis of BWSp in 40 of the 48 cases with cancer (83.3%), including 7 with negative molecular tests; the 8 cases not diagnosed were negative on molecular tests and scored fewer than 4 points.
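The ROC comparison treats each clinical score as an ordinal classifier of the molecular result. As a minimal sketch (toy scores and hypothetical function names; not the study data or GraphPad's implementation), the empirical ROC curve can be built by sweeping the score threshold, and its trapezoidal area equals the Mann–Whitney probability that a randomly chosen positive patient scores higher than a randomly chosen negative one, with ties counted as one half.

```python
from typing import List, Tuple


def roc_points(pos: List[int], neg: List[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs for the rule 'score >= t', sweeping t from high to low."""
    points = [(0.0, 0.0)]
    for t in sorted(set(pos) | set(neg), reverse=True):
        tpr = sum(s >= t for s in pos) / len(pos)
        fpr = sum(s >= t for s in neg) / len(neg)
        points.append((fpr, tpr))
    return points


def auc_trapezoid(points: List[Tuple[float, float]]) -> float:
    """Area under the piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_mann_whitney(pos: List[int], neg: List[int]) -> float:
    """Fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy consensus scores (hypothetical): molecularly positive vs negative patients
positive_scores = [5, 6, 7]
negative_scores = [2, 3, 5]
print(round(auc_mann_whitney(positive_scores, negative_scores), 3))  # prints: 0.944
print(round(auc_trapezoid(roc_points(positive_scores, negative_scores)), 3))  # prints: 0.944
```

A single diagnostic cut-off, such as ≥4 points, corresponds to one (1 − specificity, sensitivity) point on this curve, whereas the AUC summarizes performance across all cut-offs.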
Notably, lateralized overgrowth (LO) was a consistent clinical feature in the molecularly negative patients who developed tumors without having reached the minimum score for a clinical diagnosis of BWSp. None of these patients has yet been tested on alternative tissues, so mosaicism cannot be excluded.

Discussion
The first international consensus group for diagnosing and managing BWSp was established in 2017. It provided the first standardized clinical scoring system and testing algorithm to improve the diagnosis and management of patients with BWSp. A major limitation of these recommendations has been noted: they were derived from historical data, mostly from 'classic BWS' cohorts, which hampers their generalization to the whole BWSp population [25]. In this study, we applied these diagnostic criteria to a large cohort of patients with BWSp or features within the BWSp to provide evidence-based validation in a real-world context. The population we analyzed included more than 800 subjects diagnosed with one of the entities of the BWSp (i.e., classic, atypical, or (isolated) lateralized overgrowth, with a variable association of cardinal features of BWS).
Our study group had an average score of 5.1, significantly lower than that of Duffy KA et al., another large cohort described for similar purposes (score 6.7) [25]. This is partly because we included more patients with atypical and ILO phenotypes, as well as cases with some features of the BWSp without a formal diagnosis. Our cohort has a high percentage of molecularly negative cases (36.9%), as is more frequently seen in patients with few clinical features of BWSp. Among the patients with positive molecular tests, the four molecular subgroups were represented roughly as expected from the literature: IC2-LoM predominated (50–60%), followed by UPD(11)pat (~25%), IC1-GoM (~10%), and CDKN1C mutations (<5%) [7]. Unlike other studies [25], we found no differences in the average score between the molecular subgroups; however, patients with IC1-GoM tended to have fewer cardinal features and more supportive features than the others. The previously reported genotype-phenotype correlations [2,4–6,15,17,23–30,33] were also observed in our cohort.
In the cohort, the probability of a negative molecular test decreased as the score increased, from 70% in patients with 2 points to 8% in patients with 10 or more points. This gradient may reflect more localized tissue mosaicism in patients with fewer than 4 points.
The likelihood of a positive molecular test on blood-extracted DNA was 50% for patients with a score of 4 points, 40% for those with 3 points, and less than 30% for those with 2 points. Among patients with fewer than 4 points, the diagnostic yield of molecular tests on blood-extracted DNA was 34.8% (23.8% for patients with only 2 points). Additional molecular investigations would have been advisable in these patients. As indicated in the consensus, molecular testing should be prioritized according to the most likely cause. For example, a milder phenotype with lateralized overgrowth may suggest mosaicism. In these cases, analyzing DNA from sources such as buccal swabs, cultured fibroblasts, or cells of mesenchymal origin (obtained through surgical resection or excision of hyperplastic tissues) can improve the detection rate for mosaic defects [10]. However, our study had relatively few cases in which tissue-extracted DNA was tested (14 out of 319 with negative molecular tests), and molecular testing was largely limited to blood-extracted DNA. This is likely due to the multicentric and retrospective nature of the study, as tissue-DNA testing was only recently introduced into clinical practice and was performed heterogeneously among the centers. This might explain the slightly higher fraction of patients with IC2-LoM in our cohort, as most patients who are negative on blood testing but positive on tissue-extracted DNA usually have UPD(11)pat or IC1-GoM. Notably, more than one-third of the patients tested on tissue-extracted DNA (5 out of 14) were positive (a diagnostic yield of 35.7%). As previously reported, the molecular defects in these cases were UPD(11)pat and IC1-GoM. Therefore, a first-tier approach with tissue testing could be at least comparable, if not superior, in this setting.
Given the considerable share of negative cases below 4 points, testing tissue-extracted DNA instead of blood could be advisable in patients with a definable regional involvement (e.g., tumor, ILO, pancreatic hyperplasia), an approach commonly used in other conditions characterized by overgrowth and localized mosaicism [35]. In these conditions, analyzing DNA from overgrown tissue has been shown to improve diagnostic performance. The greater invasiveness of tissue sampling would be justified not only because it allows a differential diagnosis with other forms of overgrowth (e.g., PIK3CA-related overgrowth spectrum, or vascular phenotypes overlapping PIK3CA-related overgrowth spectrum with mutations in other genes) or body asymmetry (e.g., Silver-Russell syndrome) [35][36][37], but also because it enables more precisely targeted cancer screening [31] or management [38,39] based on the molecular lesion identified within the BWSp. Further studies are needed to test this hypothesis, identify the best approach in this setting, and quantify the resulting increase in diagnostic yield.
To define the performance of the currently used diagnostic criteria for BWSp, we calculated the sensitivity and diagnostic accuracy of the consensus scoring system in detecting molecularly positive cases and cases with tumors. Using ROC curves, we also compared its performance with previous criteria proposed in the medical literature and widely, though heterogeneously, employed before 2018. The oldest criteria [14] were the least sensitive and the most specific, mostly allowing the diagnosis of 'Classic BWSp' only. Under these criteria, nearly 60% of molecularly positive patients would not have been tested molecularly at all. The criteria by Gaston et al., 2001 [17] were more sensitive but less specific. Over time, published criteria gained sensitivity at the expense of specificity, likely because many BWSp features are nonspecific and very common in healthy children as well as in the BWSp population. Excessively broad criteria would lead to a diagnosis in unaffected children who show multiple low-specificity features that are frequent in the general population (e.g., fetal macrosomia, diastasis recti, or facial naevus simplex). The performance metrics of the criteria improved progressively from 1994 to 2018: the positive predictive value was broadly maintained while the negative predictive value increased significantly, resulting in higher diagnostic accuracy over these 24 years. The more recent criteria (Ibrahim 2014 and the consensus) had the highest AUC and diagnostic accuracy in detecting patients with a positive molecular test. The consensus criteria performed best in identifying patients who developed cancer, correctly identifying BWSp in 40 of the 48 cases with cancer (83.3%), including seven cases with negative molecular tests and a clinical diagnosis.
The eight cases not diagnosed by the consensus criteria had negative molecular tests and scored fewer than 4 points, and all of them had ILO. Therefore, when molecular tests are negative but mosaicism is strongly suspected, testing of other tissues may be advisable. In addition, according to consensus recommendations, alternative diagnoses should be considered if all molecular tests are negative.
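The metrics compared above follow the standard 2×2 definitions. The sketch below, a minimal illustration rather than the authors' analysis code, computes them from a confusion matrix and estimates the ROC AUC with the rank-based (Mann-Whitney) formulation, in which the AUC is the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The class `Confusion` and the function `rank_auc` are hypothetical names; the only study figures used are the 40 of 48 cancer cases identified by the consensus criteria, and the AUC example uses made-up scores.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """2x2 table: criteria result vs. reference standard (e.g., molecular test)."""
    tp: int  # criteria positive, reference positive
    fp: int  # criteria positive, reference negative
    fn: int  # criteria negative, reference positive
    tn: int  # criteria negative, reference negative

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)


def rank_auc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """ROC AUC as P(positive scores higher than negative); ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Cancer cases: 40 of 48 identified by the consensus criteria (from the text).
# All 48 are reference-positive, so only sensitivity is defined for this subset.
cancer = Confusion(tp=40, fp=0, fn=8, tn=0)
print(f"sensitivity for cancer cases: {cancer.sensitivity():.1%}")  # 83.3%

# Hypothetical scores for illustration only (not study data).
print(rank_auc([4, 6, 3], [2, 4, 1]))
```

Applied to molecular-test outcomes, the same class would give the sensitivity, specificity, and predictive values compared across criteria; the study's underlying counts are not reproduced here.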
Indeed, the main novelty introduced by these criteria was to set separate score thresholds to trigger molecular analysis (≥2 points) and to formalize a clinical diagnosis of BWSp even with a negative molecular test (≥4 points). This approach compensates for the intrinsic nonspecificity of some clinical features of BWSp, which are common in the general population and are therefore used as supportive criteria. Similar reasoning could, in the future, justify adding ART to the supportive criteria, as ART is significantly more common in subjects with BWSp [40], although frequent in the general population. Future studies might clarify whether this would further improve the performance metrics of the score.
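The two thresholds described above can be expressed as a small decision function. The sketch below assumes the 2018 consensus weighting of 2 points per cardinal feature (e.g., lateralized overgrowth) and 1 point per suggestive feature; the optional ART point is purely hypothetical and models the future extension discussed here, not a current criterion. Patients scoring below 2 points are outside the scope of this sketch, and all function names are illustrative.

```python
from typing import Optional

CARDINAL_POINTS = 2        # assumed consensus weight per cardinal feature
SUGGESTIVE_POINTS = 1      # assumed consensus weight per suggestive feature
TESTING_THRESHOLD = 2      # score that triggers molecular analysis
CLINICAL_DX_THRESHOLD = 4  # score allowing a clinical diagnosis despite negative tests


def bwsp_score(n_cardinal: int, n_suggestive: int,
               art_conceived: bool = False, count_art: bool = False) -> int:
    """Sum feature points; `count_art` toggles a HYPOTHETICAL supportive ART point."""
    score = CARDINAL_POINTS * n_cardinal + SUGGESTIVE_POINTS * n_suggestive
    if count_art and art_conceived:
        score += SUGGESTIVE_POINTS
    return score


def classify(score: int, molecular_positive: Optional[bool] = None) -> str:
    """Map a score and a test result (None = not yet tested) to a category."""
    if score < TESTING_THRESHOLD:
        return "below testing threshold"
    if molecular_positive:
        return "molecular diagnosis"
    if score >= CLINICAL_DX_THRESHOLD:
        return "clinical diagnosis"
    if molecular_positive is None:
        return "molecular testing indicated"
    # Score 2-3 with a negative test: per the text, testing tissue-extracted
    # DNA may help when mosaicism is suspected.
    return "not diagnosed; consider tissue-extracted DNA"


# Isolated lateralized overgrowth: one cardinal feature, negative test.
print(classify(bwsp_score(1, 0), molecular_positive=False))
# Two cardinal features: clinical diagnosis even with a negative test.
print(classify(bwsp_score(2, 0), molecular_positive=False))
```

Keeping the ART point behind an explicit flag makes it easy to compare score performance with and without the proposed supportive criterion on the same data.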
Regarding ART, IC2-LoM was the subgroup with the highest rate of ART (16.1%). However, a substantial proportion of patients in the other subgroups were also born through ART. These data support the previously mentioned hypothesis of a connection between ART and BWS, as highlighted by Brioude et al. [10]. Moreover, it has previously been reported that many patients conceived via ART fall into the atypical or ILO groups [25] or have a less severe presentation [41]. Here we confirmed that, although their average score is similar to that of naturally conceived patients, ART-conceived patients less commonly have cardinal features and more commonly have supportive features. This further confirms that such patients less frequently belong to the "classic" BWSp group and are more frequently "atypical" [41]. This observation further supports including ART as a supportive criterion in the score in the future [42].
The most relevant concern for patients with BWSp is cancer development: managing cancer risk in this population requires specific screening programs for the early detection of tumors, to reduce the treatment burden and improve outcomes [43,44]. These screening programs can be implemented effectively in patients diagnosed with BWSp and offer the best balance between benefit and medicalization when they are genotype-based [31]. However, a substantial portion of patients is diagnosed only after a tumor has developed, which diminishes the benefits of screening; some of these patients have no or very mild phenotypes [34,45]. In this view, the currently employed consensus score represents a major improvement, as it is much more effective than previous criteria in recognizing patients who will develop cancer: it allows BWSp to be diagnosed, and tumor screening to be started, in 83% of the cases that later develop cancer. The main purpose of the diagnostic score is to enable an empirical classification of the patient that guides follow-up and, above all, the adoption of an appropriate screening strategy. The score did not miss any case with a positive molecular test, proving sensitive enough to identify all molecularly positive patients. Interestingly, patients who developed tumors showed no tendency toward high scores. The score did not predict the probability of developing tumors, so it should not be used for this purpose or for stratifying patients by cancer risk. Indeed, many tumors (17/48) occurred in patients with <4 points.
Among the 132 patients in our cohort with fewer than 4 points and negative molecular tests, most (104, 78.8%) had LO as the sole or cardinal feature, and 2 of these 104 (both with LO) developed WT (1.9%). This observation has relevant management implications. Reflecting different views of "acceptable risk" across healthcare systems, the consensus recommendation adopted a 5% tumor risk cutoff to advise tumor surveillance [10], whereas the American Association for Cancer Research (AACR) maintained a more conservative 1% cutoff [46]. This has led to different practices across countries: most clinicians in the U.S.A. continue to screen all patients with BWSp, whereas most EU countries screen only high-risk patients (IC1-GoM and UPD(11)pat). Currently, there are no clear recommendations on cancer screening in LO, owing to the few studies focused on this clinical entity [26,45,47,48]. Based on our results, patients with LO, negative molecular tests, and a score <4 have a ~2% risk of developing cancer, which is below the 5% threshold adopted by healthcare systems accepting a higher tumor risk and above the 1% threshold suggested by the AACR. A final observation concerns cancer screening: HB developed in 4.3% of patients with UPD(11)pat, a fraction significantly higher than in the other molecular subgroups. This finding further supports our previous recommendation to screen at least patients with UPD(11)pat with alpha-fetoprotein [49][50][51], with screening continued until 30 months of age [52].
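The threshold comparison above reduces to simple arithmetic. The sketch below, using hypothetical helper names, recomputes the Wilms tumor risk from the counts given in the text (2 of 104 patients with LO) and checks it against the 5% consensus cutoff [10] and the 1% AACR cutoff [46]. Treating a risk exactly equal to a cutoff as reaching it is an assumption of this sketch.

```python
# Tumor-risk cutoffs for advising surveillance, as cited in the text.
POLICY_CUTOFFS = {
    "consensus (5%)": 0.05,
    "AACR (1%)": 0.01,
}


def observed_risk(cases: int, patients: int) -> float:
    """Crude cumulative risk: tumors observed / patients followed."""
    return cases / patients


def surveillance_by_policy(risk: float) -> dict[str, bool]:
    """True where the risk reaches the policy cutoff (>= is an assumption)."""
    return {policy: risk >= cutoff for policy, cutoff in POLICY_CUTOFFS.items()}


# LO, negative molecular tests, <4 points: 2 Wilms tumors in 104 patients.
risk = observed_risk(2, 104)
print(f"observed risk: {risk:.1%}")   # observed risk: 1.9%
print(surveillance_by_policy(risk))   # {'consensus (5%)': False, 'AACR (1%)': True}
```

A crude proportion ignores follow-up duration and, with only two events, is imprecise; it is used here only to mirror the comparison made in the text.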

Conclusions
In conclusion, this study documented the very high performance and effectiveness of the currently employed diagnostic criteria for BWSp, supporting their widespread implementation. Moreover, our study offers several further suggestions for the diagnosis and management of patients with BWSp: (a) beginning molecular testing from tissue-extracted DNA could be an effective strategy in patients with <4 points and an identifiable affected body region, as the diagnostic yield of tests on blood-extracted DNA is low; (b) adding ART to the supportive criteria of the scoring system might improve score performance; (c) clinicians should refrain from using the score as an indicator of the probability of cancer development; (d) patients with ILO, <4 points, and a negative molecular test have a cancer risk between 1% and 5%, and screening in this condition should follow the specifics of local healthcare systems; and (e) our data support HB screening in patients with UPD(11)pat.