Screening Methods for Diagnosing Cystic Fibrosis-Related Diabetes: A Network Meta-Analysis of Diagnostic Accuracy Studies

Background: Cystic fibrosis-related diabetes (CFRD) has become more common due to higher life expectancy with cystic fibrosis. Early recognition and prompt treatment of CFRD lead to improved outcomes. Methods: We performed a network meta-analysis (NMA) to identify the most valuable of the available screening tools (index tests) for diagnosing CFRD, using the oral glucose tolerance test as the reference standard. Pooled sensitivity (Se), specificity (Sp), and superiority indices were calculated and used to rank the index tests. Results: A total of 31 articles with 25 index tests were eligible for inclusion. Two-day continuous glucose monitoring (CGM) ranked the highest (Se: 86%, Sp: 76%), followed by glucose measurement from blood capillary samples (Se: 70%, Sp: 82%) and three-day CGM (Se: 96%, Sp: 56%). When we compared CGM of different durations, two-day CGM performed best (Se: 88%, Sp: 80%), followed by three-day (Se: 96%, Sp: 59%) and six-day CGM (Se: 66%, Sp: 79%). Conclusions: Considering its overall performance ranking, as well as its high sensitivity, two-day CGM appears to be a promising screening test for CFRD.


Introduction
Cystic fibrosis (CF) is an autosomal recessive genetic disease caused by mutations of the cystic fibrosis transmembrane conductance regulator (CFTR) gene. The pathogenic mutations cause abnormal chloride transport across secretory epithelial cells, leading to thick, sticky mucus production, mainly affecting the lungs and the digestive system [1]. Due to advances in treatment and diagnostics, the life expectancy of CF patients has increased over the last decade. Consequently, previously rare extra-pulmonary complications are now highly prevalent. Risk factors for developing cystic fibrosis-related diabetes (CFRD) include pancreatic insufficiency, severe genotype (deltaF508 homozygotes), and increased age; CFRD affects approximately 20% of adolescent and 40-50% of adult individuals with CF [2,3]. CFRD is categorized as a specific subtype of diabetes mellitus (DM) in the 2018 American Diabetes Association (ADA) guidelines, sharing features with both type 1 and type 2 diabetes [4]. In the initial stage of the disease, insulin and glucagon deficiency develop due to pancreatic islet cell dysfunction and loss in combination with exocrine pancreatic insufficiency. In addition, acute and chronic inflammation result in fluctuating insulin resistance, which may play a significant role in the pathogenesis. CFRD is associated with diminished lung function, suboptimal nutritional status, and increased mortality [1,5]. Despite the resemblances to other types of diabetes, the pathophysiology of CFRD is fundamentally different, and it should be considered independently regarding diagnosis and treatment.
The current gold standard for CFRD screening is the 2 h oral glucose tolerance test (OGTT) (1.75 g glucose/kg body weight, maximum 75 g). It is recommended that this be performed annually in all people with CF from 10 years of age who have no previous diagnosis of CFRD [4]. The CFRD diagnostic criteria, according to the ADA guidelines for stable CF, are a 2 h OGTT (T120) plasma glucose ≥ 11.1 mmol/L, fasting plasma glucose (FPG) ≥ 7.0 mmol/L, haemoglobin A1c (HbA1c) ≥ 48 mmol/mol (6.5%), or classical symptoms of diabetes (polyuria and polydipsia) in the presence of a random glucose level ≥ 11.1 mmol/L [6].
The current gold standard procedure is inconvenient because it requires fasting and is time-consuming, and it does not represent everyday glucose homeostasis [4,7]. Moreover, OGTT T120 plasma glucose can be normal even when other screening methods, such as continuous glucose monitoring (CGM), reveal abnormal daily glucose excursions. Management with insulin and strict glycaemic control are accompanied by improved clinical outcomes and increased life expectancy. Early recognition of glucose homeostasis abnormalities is thus clearly important and an area of active research [8].
We undertook a network meta-analysis (NMA) to compare the diagnostic performance of the currently available screening tools for the diagnosis of CFRD, and to identify a potential alternative screening tool to the formal OGTT in CF patients. The NMA structure allows multiple diagnostic tests at multiple thresholds to be compared simultaneously to the gold standard.

Protocol and Registration
The network meta-analysis of multiple diagnostic tests (NMA-DT) is reported according to the Preferred Reporting Items for Systematic Review and Meta-Analysis (PRISMA) for Network Meta-Analyses Statement [9]. The protocol of the meta-analysis was registered on the International Prospective Register of Systematic Reviews (PROSPERO) under the registration number CRD42020160389.

Data Sources and Search Strategy
We conducted a systematic literature search using five medical databases, including MEDLINE (via PubMed), EMBASE, Web of Science Core Collection, Scopus, and CENTRAL in October 2019 (our search strategy is detailed in Table S1). Additionally, we performed a manual search for cited and citing reports of the eligible articles, revised through Google Scholar, applying the same eligibility criteria as for the database search.

Eligibility Criteria
The inclusion criteria for the study were the following: (1) prospective diagnostic accuracy studies, in which patients were previously diagnosed with CF; (2) studies must use the OGTT T120 value as the reference standard; and (3) studies must evaluate one or more diagnostic methods as index tests, compared to the reference standard. Studies were not limited based on the age of enrolled participants; both pediatric and adult studies were considered eligible.
Exclusion criteria were the following: (1) records examining only CF patients with previously known impaired glucose tolerance (IGT) or diabetes; (2) studies that assessed FPG (T0) or 1 h OGTT (T60) values as the only index test; and (3) fewer than five participants.
We included both full-text papers and conference abstracts to reduce publication bias. If useful data were available for an incomplete proportion of the study population, we used only the valid data from the smaller number of participants. In case of conspicuous overlapping populations between studies (matching authors, hospitals, index tests, and same study periods), we selected the most recent full text article rather than conference abstract and larger over smaller sample size. Although we included studies with potential risk of overlapping study populations (same authors, same hospitals, same or partially same study period but different index tests), we created two network analyses to avoid any over-representation of data. One network included all studies regardless of potential overlapping populations, and in the second, overlaps were avoided by evaluating the period and place of enrollment, study authors, eligibility criteria, and baseline characteristics of participants. We chose studies with CGM over other index tests, full-text articles over conference abstracts, and greater sample size over small sample size.

Selection and Data Collection Process
Two independent review authors (V.D.I., Á.R.M.) completed all steps of selection and data collection (onto the pre-defined data collection sheet) in duplicate. A third party (Z.S.) resolved any disagreement between the authors. We imported records from each database into EndNote X9 citation manager (Clarivate Analytics, Philadelphia, PA, United States). First, we removed duplicates using the citation manager, and then manually. Remaining records were assessed for inclusion according to the eligibility criteria, by their titles, abstracts, and then by full texts.
The following data were collected in the study data table: study type, author and publication information, demographic data, study period, details of diagnostic methods, and the numbers of CFRD and non-CFRD participants as detected by the reference standard and by the index tests.

Risk of Bias and Applicability
The quality of all included studies was assessed by two independent researchers (V.D.I., Á.R.M.), according to the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool [10]. Any disagreement was resolved by a third independent person (Z.S.).

Statistical Analysis
After data collection, we created 2 × 2 contingency tables with true positive (TP), true negative (TN), false positive (FP), and false negative (FN) values for each comparison. If more than one cut-off value was reported for the same index test, we chose the best-performing cut-off. If a study did not identify a best cut-off, we chose the one corresponding to the value from the currently available CFRD guidelines [4-6]. If there was no recommendation in the guidelines, we calculated combined sensitivity and specificity and chose the cut-off with the highest values.
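The cut-off selection described above can be illustrated with a minimal sketch. It assumes that "combined sensitivity and specificity" means the plain sum of the two (the text does not define the combination), and the counts, the cut-off values, and the names TwoByTwo and pick_cutoff are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Index test results cross-classified against the reference standard (OGTT)."""
    tp: int  # index test positive, OGTT positive
    fp: int  # index test positive, OGTT negative
    fn: int  # index test negative, OGTT positive
    tn: int  # index test negative, OGTT negative

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def pick_cutoff(tables: dict[float, TwoByTwo]) -> float:
    """Return the cut-off with the highest sensitivity + specificity.

    Assumption: 'combined' is taken to be the simple sum of Se and Sp.
    """
    return max(
        tables,
        key=lambda c: tables[c].sensitivity() + tables[c].specificity(),
    )


# Hypothetical counts for one index test reported at three cut-offs.
candidates = {
    5.5: TwoByTwo(tp=9, fp=30, fn=1, tn=60),
    5.8: TwoByTwo(tp=8, fp=15, fn=2, tn=75),
    6.5: TwoByTwo(tp=4, fp=3, fn=6, tn=87),
}

best = pick_cutoff(candidates)
print(best, candidates[best].sensitivity(), candidates[best].specificity())
```

In this made-up example the lowest cut-off has the highest sensitivity and the highest cut-off the highest specificity, while the middle one gives the best combined value.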

Network Meta-Analysis
We performed a network meta-analysis for diagnostic tests (NMA-DT) to investigate which diagnostic method performs best for CFRD diagnosis. NMA-DT allows multiple diagnostic tests at multiple thresholds to be compared simultaneously to the gold standard. This approach allowed us to make direct (head-to-head) as well as indirect comparisons, given the common comparator (OGTT T120) in all studies [11]. We considered evaluating diagnostic odds ratios (DORs); however, because of the continuity correction required, the results were uninterpretable. DOR is not defined when TP values are zero, which occurred in half of our articles. To assess the relative performance of a diagnostic test, we calculated pooled sensitivity (Se) and specificity (Sp) of the index tests compared to OGTT for the diagnosis of CFRD, and ranked them based on the superiority index (SI). The greater the SI, the more accurately a screening test is expected to identify the target condition. "This approach gives more weight to a diagnostic test doing comparatively well on both measures and less emphasis on tests doing relatively poorly on both measures or even doing extremely well on one measure but performing poorly on the other measure" [12].
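The problem with the DOR can be seen in a short sketch. It uses the standard definition DOR = (TP × TN)/(FP × FN) and assumes the conventional correction of adding 0.5 to every cell when any cell is zero; the text does not state which correction was tried, and the counts below are hypothetical.

```python
import math


def diagnostic_odds_ratio(tp, fp, fn, tn, correction=0.5):
    """Return (DOR, log DOR) for a 2 x 2 table.

    If any cell is zero, `correction` is added to all four cells first
    (assumed convention; not specified in the text).
    """
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + correction for x in (tp, fp, fn, tn))
    dor = (tp * tn) / (fp * fn)
    return dor, math.log(dor)


# Hypothetical small study in which the index test detected no true positives.
# Without correction the ratio is 0 and its logarithm cannot be computed.
dor_zero_tp, log_dor_zero_tp = diagnostic_odds_ratio(tp=0, fp=5, fn=3, tn=40)

# Hypothetical study with no empty cells: no correction is applied.
dor_full, log_dor_full = diagnostic_odds_ratio(tp=8, fp=15, fn=2, tn=75)

print(dor_zero_tp, dor_full)
```

With small studies, the corrected value is driven largely by the arbitrary 0.5 added to each cell, which is one plausible reason such results are hard to interpret.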
We illustrated the network graph using STATA (version 15.1). To display the network, we constructed a graph where nodes represent different screening methods, and lines represent head-to-head comparisons. In the network graph, the direct comparisons are presented with edges, the thickness of the edges represents the number of head-to-head trials, and the size of the nodes correlates with the number of studies [9]. All statistical calculations were performed in R using the ANOVA arm-based model by Nyaga et al. [13].
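The quantities that drive such a network graph (node size and edge thickness) can be derived from a list of studies, as in the sketch below. The study list is hypothetical and the plotting itself is omitted; this only illustrates the counting, not the STATA routine used for the published figure.

```python
from collections import Counter
from itertools import combinations

REFERENCE = "OGTT"

# Hypothetical studies: each entry lists the index tests compared against OGTT.
studies = [
    ["HbA1c"],
    ["HbA1c", "3d-CGM"],
    ["2d-CGM"],
    ["HbA1c", "fructosamine"],
]

node_size = Counter()    # number of studies evaluating each test
edge_weight = Counter()  # number of head-to-head comparisons per pair of tests

for index_tests in studies:
    tests = [REFERENCE, *index_tests]
    node_size.update(tests)
    # Every pair of tests evaluated in the same study is a direct comparison.
    for a, b in combinations(sorted(tests), 2):
        edge_weight[(a, b)] += 1

for (a, b), n in sorted(edge_weight.items()):
    print(f"{a} - {b}: {n} stud{'y' if n == 1 else 'ies'}")
```

Because every study includes the reference standard, the OGTT node is connected to all index tests, which is what makes indirect comparisons between index tests possible.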

Characteristics of the Studies Included
We included 31 studies (26 full-text papers and 5 conference abstracts) in the network meta-analysis. The selection process is shown in Figure 1. Included studies reported on a total of 1976 people with CF, 243 (12.3%) of whom had CFRD. One study included CF patients after lung transplantation without DM [42]. Although we excluded studies including participants with previously identified IGT, we included one study that compared HbA1c to OGTT T120. It selected patients with clinical decline or reactive symptomatic hypoglycaemia during OGTT, some of whom had impaired or indeterminate glucose tolerance (INDET). INDET means fasting and 2 h glucose levels are normal, but the 1 h glucose level is greater than 11.1 mmol/L. The studies included a total of 24 index tests, with HbA1c being the most widely used (20 studies). Nineteen studies used a single index test, 10 included two index tests, while 3, 5, and 8 index tests were used in other studies (one study each). Table 1 summarizes the characteristics of the included studies.

Diagnostic Performance of the Index Tests
Network graph A (Figure 2A) summarizes 31 eligible studies and compares all available index tests (n = 24) to the reference standard. The most commonly used (19 articles) comparator was HbA1c. The two-day CGM (2d-CGM) (Se: 86%, Sp: 76%), blood capillary sample (BCS) (Se: 70%, Sp: 82%), and three-day CGM (3d-CGM) (Se: 96%, Sp: 56%) ranked in the first three positions, according to their SI (Table 2).
Table 2. The top three diagnostic modalities ranked by their superiority indices in all four analyses. Network A represents the first analysis, which included all eligible studies (n = 31) and compared 24 index tests to the reference standard. 2d-CGM, BCS 120, and 3d-CGM ranked in the first three positions, according to the SI. A more detailed ranking table is available in Table S3. Network B lists the ranking of the second analysis, which included 25 articles and 13 index tests. Among these 13 index tests, 2d-CGM, 6d-CGM, and 3d-CGM ranked the highest. A more detailed ranking table is available in Table S4. Network C shows the results of the third analysis, comparing those index tests that were used in at least two different articles to the reference standard. From 23 articles, 6 different screening methods were evaluated. In the first three positions, 2d-CGM, 3d-CGM, and HbA1c were ranked. A more detailed ranking table is available in Table S5. Network D demonstrates the ranking of different lengths of CGM. 2d-CGM seems to be relatively the best diagnostic method, while 3d-CGM and 6d-CGM took second and third places. The full table can be seen in Table S6. In all four analyses, according to the SI, 2d-CGM ranked the highest. Abbreviations: SI: superiority index, CI: 95% confidence interval, BCS: blood capillary sample, CGM: continuous glucose monitoring, HbA1c: haemoglobin A1c.
Network graph B (Figure 2B) represents 25 studies comparing 13 index tests to OGTT T120. This network was created according to the a priori planned algorithm to avoid potential population overlaps in the first analysis. As in the initial analysis, the majority of articles (17 articles) used HbA1c as the index test. The second most-investigated screening tool was 3d-CGM (six articles). Fructosamine was evaluated in three studies, with fractional serum fructosamine (FSF); 2d-CGM; and 1,5-Anhydroglucitol (1,5-AG) each used in two articles. The glucose challenge test (GCT), glycated albumin, lunch meal, OGTT and CGM combination, combination of different test values, seven-day CGM (7d-CGM), and six-day CGM (6d-CGM) were reported only in single studies. Ranked by SI, 2d-CGM (Se: 87%, Sp: 78%) performed best, mirroring the first analysis. The 2d-CGM was followed by 6d-CGM (Se: 60%, Sp: 77%) and 3d-CGM (Se: 97%, Sp: 54%).
Network graph C (Figure 2C) includes only those index tests that were used in at least two studies. Six different screening tests were investigated. In accordance with the first two analyses, 2d-CGM (Se: 86%, Sp: 78%) was ranked the highest according to SI. The 3d-CGM (Se: 95%, Sp: 53%) was ranked second, and HbA1c (Se: 48%, Sp: 82%) was ranked third.
Network graph D (Figure 2D) was constructed to compare different durations of CGM (between two and seven days) to OGTT. We evaluated data from 10 articles that compared four screening tests. In line with the previous networks, this analysis also ranked 2d-CGM (Se: 88%, Sp: 80%) the highest.
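To make the sensitivity and specificity trade-off between CGM durations more concrete, the pooled estimates for network D reported in the abstract can be translated into expected screening errors. The sketch below is illustrative only: it assumes a hypothetical cohort of 1000 people and uses the pooled proportion of CFRD among included participants (243/1976) as the prevalence, which need not match any particular clinic. It is not the superiority index used for the ranking.

```python
def expected_errors(se, sp, prevalence, n=1000):
    """Expected missed cases and false alarms when screening n people."""
    with_cfrd = n * prevalence
    without_cfrd = n - with_cfrd
    missed = with_cfrd * (1 - se)            # false negatives
    false_alarms = without_cfrd * (1 - sp)   # false positives
    return missed, false_alarms


# Proportion of OGTT-defined CFRD among all included participants.
PREVALENCE = 243 / 1976

# Pooled (Se, Sp) for CGM durations in network D, as given in the abstract.
cgm = {
    "2d-CGM": (0.88, 0.80),
    "3d-CGM": (0.96, 0.59),
    "6d-CGM": (0.66, 0.79),
}

results = {
    name: expected_errors(se, sp, PREVALENCE) for name, (se, sp) in cgm.items()
}

for name, (missed, false_alarms) in results.items():
    print(f"{name}: ~{missed:.0f} missed, ~{false_alarms:.0f} false alarms per 1000")
```

Under these assumptions, 3d-CGM would miss the fewest cases but raise roughly twice as many false alarms as 2d-CGM, which fits the idea that a test doing well on both measures ranks above one that excels on only one.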

Risk of Bias and Applicability Assessment
The patient selection domain carried a low or unclear risk of bias in the majority of the articles, due to limited reporting in the publications. One study was considered to have a high risk of bias, since the selected patients were known to have abnormal glucose tolerance (but not CFRD), previously diagnosed by OGTT [37]. In the index test domain of QUADAS-2, three records were deemed as having a high risk of bias. In these articles, the authors defined cut-off values for the index tests based on the OGTT results [27,32,40]. The reference standard domain was considered at a low risk of bias in all but one case [41]. In 9% of the papers, the flow and timing domain was considered to have a high risk of bias. The source of bias in this section was the discrepancy between the target and accrued population size in the articles [30,32,41]. All studies had low or unclear applicability concerns in the "patient selection" and "reference standard" domains. The detailed risk of bias and applicability assessment figures are available in Figures S1-S3.

Discussion
The aim of our study was to robustly compare alternative screening tools to OGTT; therefore, we evaluated the diagnostic performance of currently reported screening methods for CFRD diagnosis. All four network meta-analyses indicate that two-day CGM performed best compared to the currently used gold standard, OGTT (T120).

OGTT
Recent guidelines recommend annual routine OGTT screening for people above the age of 10. Even though OGTT serves as a basis for defining CFRD, its thresholds were established in diabetes without CF, and were not designed to detect hyperglycaemia-associated risk of deterioration in lung function or under-nutrition [39,45]. Considering the practical challenges of OGTT and the low frequency of screening (according to the 2018 report of the Cystic Fibrosis Foundation patient registry, approximately 61.3% of children aged between 10 and 17 and 33.8% of adults were screened), the need for a more convenient screening method is urgent [3].

HbA1c
HbA1c, the only widely available, simple, relatively cheap diagnostic tool for diabetes, is not recommended for CFRD screening according to recent guidelines, although it is used to guide therapeutic decisions to ensure proper glycaemic control [4,5]. While an elevated HbA1c ≥ 48 mmol/mol (6.5%) serves as sufficient evidence of hyperglycaemia, a normal HbA1c does not exclude it [5]. Our results are in line with this statement from the International Society for Pediatric and Adolescent Diabetes (ISPAD) Clinical Practice Consensus Guidelines 2018: in our study, HbA1c had a pooled sensitivity of 49% (CI: 35-62%), with a pooled specificity of 81% (CI: 73-87%) (Table S2). Recent studies have suggested lower cut-off values (e.g., 5.4%) for HbA1c to increase sensitivity and specificity [20,46]. HbA1c ranked lower than 2d-CGM in all of our analyses. We evaluated 17 studies comparing HbA1c to OGTT in Panel B (Table S2), with eight using the recommended 48 mmol/mol (6.5%) cut-off value, and others applying different cut-offs ranging from 5.5% to 6.5% (Table 1).

Glycaemic Biomarkers
Other non-traditional glycaemic biomarkers, such as FSF; fructosamine; 1,5-AG; and glycated albumin are gaining more attention for monitoring short-term glycaemic control in type 1 and type 2 DM, as well as in CFRD [26,27]. The 2010 ADA guideline states that fructosamine has low sensitivity for screening CFRD, while other biomarkers are not mentioned in current guidelines [4,5]. These glycaemic markers ranked in the middle of our ranking table (Table 2). Considering the limited evidence on the diagnostic performance of these markers, further research is necessary to define their cut-off values and exact role in the diagnosis and management of CFRD.

Continuous Glucose Monitoring
The CGM system uses a subcutaneous sensor to measure interstitial fluid glucose levels and provides an average measurement every 5-15 min (depending upon the device). The machine is calibrated by capillary blood glucose levels (usually four times a day). This method has already been used successfully in type 1 and type 2 diabetes patients, as well as in insulin-treated children with CFRD for guiding safe and effective insulin therapy [5,47].
Due to frequent glucose measurements, early glucose abnormalities (both hyperglycaemia and reactive hypoglycaemia), often preceding CFRD, could be identified by CGM. However, current guidelines do not recommend its use as a screening tool, in spite of its ability to detect elevated glucose levels in CF patients (even when OGTT shows normal values) [22,23,25,29,48], although the clinical significance of intermittent hyperglycaemia detected by CGM is disputed [22,24,29,33]. It is also noteworthy that while OGTT results are not always reproducible and can vary over time, CGM seems to show good reproducibility and reliability [48]. Further advantages of CGM over OGTT include the absence of fasting periods, dietary requirements, and restrictions on physical activity, as well as better representation of everyday glucose homeostasis. Our findings demonstrate that, of all analysed index tests, CGM (using random glucose levels > 11.1 mmol/L) was relatively the best screening method, as reported in our ranking table (Table 2). In spite of the mildly burdensome calibration process and the substantial financial investment required to acquire a CGM device, in the long term it may help with early diagnosis of glucose abnormalities, allowing earlier treatment and better clinical outcomes for CFRD patients [22,23].
The CGM device is appropriate for both continuous and intermittent measurements. For tracking continuous blood glucose levels, CGM is usually applied for two to seven days. In order to understand the optimal length, we evaluated studies using various lengths of CGM (2-7 days). The diagnostic accuracy seems to be the highest for 2d-CGM. A short measurement period makes the method accessible to a greater proportion of people with CF and is less inconvenient, facilitating better uptake.

Strength and Limitations
To our knowledge, this is the largest meta-analysis of people with CF evaluating the diagnostic accuracy of the available techniques for the detection of CFRD. Strengths include the selection of homogeneous study populations and standardized OGTT methodology, although the effect of the quality of the studies on the results could not be estimated reliably, since the risk of bias in many studies was unclear.
Limitations include the divergent use of cut-off values of the index tests. Many studies used several different cut-off values for the same index tests. This variability was also seen within articles, but in diagnostic test accuracy (DTA)-NMAs, only one cut-off can be used to avoid overrepresentation of participants. Current guidelines do not state specific cut-off values for CGM or non-traditional glycaemic markers. Furthermore, we observed the use of alternative values even where cut-off values are clearly stated in the guidelines (e.g., HbA1c). In order to reduce confounding, we implemented a transparent decision-making algorithm to select the cut-off values used in the analysis.
Originally, we planned to use CGM as a reference standard, as indicated in the PROSPERO registration. However, we decided to deviate from our protocol, since OGTT seemed to be the better choice based on methodological recommendations of the Cochrane diagnostic test accuracy (DTA) guidelines, and the professional recommendations of the ADA guideline [6,49].
DTA-NMA allows the use of only one reference standard, which must be the same for all studies. This methodology precluded elucidation of whether an index test is better than the reference standard.

Implication for Practice
Our results indicate that CGM performs well in diagnosing CFRD, which raises the idea of using CGM as an alternative reference standard instead of OGTT in further studies. Two-day CGM seems sufficient, since longer follow-up did not improve global diagnostic performance; this has important financial implications.

Implication for Research
Studies investigating the role of CGM as the gold standard are awaited. In addition, prospective cohort studies should evaluate if abnormalities detected by CGM and missed by OGTT have prognostic value for clinical outcomes, and if patients benefit from treatment initiated based on CGM results. Furthermore, these abnormalities also should be investigated in special risk groups (e.g., severe and less severe genotypes, or pancreatic sufficient and insufficient CF patients).

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/10.3390/biom11040520/s1, Figure S1: Risk of bias assessment. Figure S2: Risk of bias assessment, applicability concerns. Figure S3: Bar graphs of QUADAS-2 risk of bias and applicability assessment. Table S1: Detailed search strategy. Table S2. Table S3: First analysis index tests ranked by their superiority indices. Table S4: Second analysis which included 25 articles and ranked 13 index tests by their superiority indices. Table S5: Index tests used in at least two articles, ranked by their superiority indices.
Funding: The article was funded by the project titled "GINOP-2.3.2-15-2016-00048-STAY ALIVE", which is co-financed by the European Union (European Regional Development Fund) within the framework of Programme Széchenyi 2020 and Human Resources Development Operational Programme Grant. Grant number EFOP-3.6.2-16-2017-00006-LIVE LONGER is also co-financed by the European Union (European Regional Development Fund) within the framework of Programme Széchenyi 2020. The János Bolyai Research Scholarship of the Hungarian Academy of Sciences and Cystic Fibrosis Trust Strategic Research Center Grant (NU-000600; SRC-019) went to P.H., A.P., and J.S. V.I. is the guarantor of this work, and as such, had full access to all the data in the study, and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.

Conflicts of Interest:
The authors declare no conflict of interest. Prior presentation information: we assert that the article has not received prior publication and is not under consideration for publication elsewhere.