Multiple Sclerosis in the Campania Region (South Italy): Algorithm Validation and 2015–2017 Prevalence

We aim to validate a case-finding algorithm to detect individuals with multiple sclerosis (MS) using routinely collected healthcare data, and to assess the prevalence of MS in the Campania Region (South Italy). To identify individuals with MS living in the Campania Region, we employed an algorithm using different routinely collected healthcare administrative databases (hospital discharges, drug prescriptions, outpatient consultations with payment exemptions), from 1 January 2015 to 31 December 2017. The algorithm was validated against the clinical registry from the largest regional MS centre (n = 1460). We used the direct method to standardise the prevalence rate and the capture-recapture method to estimate the proportion of undetected cases. The case-finding algorithm including individuals with at least one MS record during the study period captured 5362 MS patients (females = 64.4%; age = 44.6 ± 12.9 years), with 99.0% sensitivity (95% CI = 98.3%, 99.4%). Standardised prevalence rate per 100,000 people was 89.8 (95% CI = 87.4, 92.2) (111.8 for females [95% CI = 108.1, 115.6] and 66.2 for males [95% CI = 63.2, 69.2]). The number of expected MS cases was 2.7% higher than the number of cases we detected. We developed a case-finding algorithm for MS using routinely collected healthcare data from the Campania Region, which was validated against a clinical dataset, with high sensitivity and a low proportion of undetected cases. Our prevalence estimates are in line with those reported by international studies conducted using similar methods. In the future, this cohort could be used for studies with high granularity of clinical, environmental, healthcare resource utilisation, and pharmacoeconomic variables.


Introduction
Multiple sclerosis (MS) is the leading cause of disability from central nervous system disease in young adults [1,2]. Prevalence of MS has increased by 10% since 1990, reaching 91-164 cases per 100,000 in North America, Western Europe and Australasia [1]. Clinical onset is generally in early adult life, though there is increased awareness of presentation in childhood [2]. Prevalence of MS is similar in preteen boys and girls, but then progressively increases throughout the lifetime among women, with a 2:1 sex ratio in favour of women in the sixth decade of life [1].
In the past decades, fifteen disease-modifying therapies (DMTs) for MS have been developed within randomised controlled trials (RCTs), directly comparing different DMTs (or placebo) in a short time frame (24-36 months), and for specific clinical outcome measures (e.g., relapses, disability) [3,4]. Meanwhile, MS centres worldwide have developed MS registries to provide meaningful information on the natural history of MS, and on the long-term safety and effectiveness of DMTs in real life [5][6][7]. However, RCTs and MS registries do not include overall healthcare resource utilisation [6][8][9][10], and, not least, carry potential risks of bias from patient selection (e.g., with patients earlier in their disease course and more educated patients being more likely to visit MS centres and participate in research), and from follow-up (e.g., with patients doing poorly being more likely to be lost to follow-up) [11,12].
Population-based studies can overcome these limitations. Datasets of routinely collected healthcare data, derived from both publicly funded and private healthcare systems [13,14], provide standardised healthcare resource utilisation in the long term (e.g., diagnoses, procedures, medications), and on fully representative populations [15]. As such, population-based studies can provide a detailed description of disease epidemiology, comorbidities and treatment pathways [15], also thanks to the linkage to clinical registries [16,17].
Here, we aim to: (1) validate an algorithm to identify MS cases using routinely collected healthcare administrative databases; (2) estimate the 2015-2017 prevalence of MS in the Campania Region (South Italy) and in its provinces; and (3) estimate the proportion of undetected cases.

Study Design and Setting
This is a population-based study, based on the retrospective analysis of routinely collected healthcare data of individuals residing in the Campania Region (South Italy) from 2015 to 2017 (population on 1 January 2018: 5,826,860 with 2,841,049 males and 2,985,811 females) (http://dati.istat.it/).
The Italian National Healthcare Service (NHS) operates under the principles of universal coverage and is organised at the national, regional, and local levels. The central government controls the distribution of tax revenues for publicly financed healthcare, whilst the regions are responsible for organising and delivering healthcare services through Local Health Authorities (Azienda Sanitaria Locale (ASL)) [18]. In the Campania Region, there are seven Local Health Authorities (ASL Avellino, Benevento, Caserta, Napoli 1 Centro, Napoli 2 Nord, Napoli 3 Sud, and Salerno), overall including 10 qualified MS centres, complying with regulatory indications for DMT prescription and management [8,19,20]. Healthcare services delivered out of the Campania Region (e.g., DMT prescriptions, hospital admissions, outpatient consultations) are then reported to the Campania Region for refund purposes.
The study was approved by the Federico II Ethics Committee (355/19). All patients signed informed consent authorising the use of anonymised, routinely collected healthcare data, in line with data protection regulation (GDPR EU2016/679). The study was performed in accordance with good clinical practice and the Declaration of Helsinki.

Population
The dataset was created by merging different data sources of the Campania Region. In particular, the cohort comprised all individuals alive at the prevalence day (1 January 2018) who had at least one record from the following databases [21]:

1. Hospital Discharge Record database, which included all admissions in the study period with an ICD-9 CM code of MS as one of the discharge diagnoses.
2. Regional Drug Prescription database, which included all MS-specific DMTs prescribed in the study period (e.g., Alemtuzumab, Dimethyl Fumarate, Fingolimod, Glatiramer Acetate, Interferon Beta-1a, Interferon Beta-1b, Natalizumab, Peg-Interferon Beta-1a, Teriflunomide).
3. Outpatient database, which included all outpatient consultations with an MS-specific exemption from co-payment records (as defined in the Healthcare Co-payment Database).

Hospital admissions, DMT prescriptions, and outpatient consultations delivered out of the Campania Region were reported to the Campania Region Healthcare Regulatory Society (So.Re.Sa.) for refund purposes and then included in the above-mentioned datasets.
From the database, individuals with a diagnosis of MS not resident in the Campania Region were filtered out. Patient unique identifier code was fully anonymised by the Campania Region Healthcare Regulatory Society (So.Re.Sa.) before releasing the datasets. As the same anonymisation algorithm was used across datasets, data merging was possible. As an additional measure of patient privacy protection, the only demographic information retained from each dataset was year of birth, sex, educational attainment, and local health authority the individual was registered with. We also extracted type and frequency of DMT prescription, type and frequency of access to healthcare facilities, and type of exemption from co-payment records (e.g., MS-specific, disability due to other conditions, low household income).
Proportions of patients identified from Hospital Discharge Record database, Regional Drug Prescription database, and Outpatient database are presented in Figure 1.

Figure 1. The Venn diagram shows the number and the proportions of patients identified from different data sources from the overall population (n = 5362).
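The merging step described above can be illustrated with a minimal Python sketch. It assumes that each database has already been reduced to a list of anonymised patient identifiers, one entry per MS record; the function names (`find_cases`, `venn_counts`), the `min_records` parameter, and the toy identifiers are hypothetical and are not taken from the study's actual pipeline or data.

```python
from collections import Counter


def find_cases(hospital_ids, drug_ids, outpatient_ids, min_records=1):
    """Return anonymised IDs with at least `min_records` MS records overall.

    Each argument is an iterable of anonymised patient IDs, one entry per
    record, so the same ID may repeat within and across sources.
    """
    counts = Counter()
    for source in (hospital_ids, drug_ids, outpatient_ids):
        counts.update(source)
    return {pid for pid, n in counts.items() if n >= min_records}


def venn_counts(hospital_ids, drug_ids, outpatient_ids):
    """Count patients in each region of the three-source Venn diagram.

    Keys are (in_hospital, in_drug, in_outpatient) boolean tuples.
    """
    h, d, o = set(hospital_ids), set(drug_ids), set(outpatient_ids)
    regions = Counter()
    for pid in h | d | o:
        regions[(pid in h, pid in d, pid in o)] += 1
    return dict(regions)


# Hypothetical toy records, not study data
hospital = ["a", "b"]
drugs = ["b", "b", "c"]
outpatient = ["c", "d"]

cases_1 = find_cases(hospital, drugs, outpatient, min_records=1)
cases_2 = find_cases(hospital, drugs, outpatient, min_records=2)
regions = venn_counts(hospital, drugs, outpatient)
```

Because the same anonymisation algorithm is applied to every source, a plain set union on the identifier is enough to link records; the per-region counts correspond to the kind of breakdown a three-source Venn diagram reports.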

Clinical Dataset
We extracted individuals with a diagnosis of MS and resident in the Campania Region from the clinical registry of the MS Clinical Care and Research Centre, at the "Federico II" University of Naples, Italy, from 2015 to 2017 (n = 1460). This cohort is part of the Italian MS Registry [7] and has already been used for a number of cohort studies [22,23]. Of note, a recent meta-analysis defined this cohort at moderate risk of bias, compared with the serious risk of other similar cohorts [24,25].
Anonymisation was performed using the same algorithm as for routinely collected healthcare data to allow data linkage.

Statistical Analysis
Missing data were present for year of birth (2.5%) and Local Health Authority of registration (8.8%). A missing pattern analysis involving both graphical and statistical methods (e.g., logistic regression models) was carried out to assess whether data were missing at random. As this was the case, we used multiple imputation by chained equations (10 copies) to estimate missing data for age and local health authority. We included the following covariates in the imputation model: sex, number of records for each patient within the study period, DMT prescription (or no DMT prescription), hospital admissions (or no hospital admissions), outpatient consultations (or no outpatient consultations), and MS centre where healthcare was delivered.
Previously published algorithms to identify people with MS using administrative healthcare databases varied in the number of MS records required for case identification [26][27][28][29]. We therefore aimed to validate two versions of the algorithm (aim 1), which considered the presence of: (1) at least one MS record during the study period; and (2) at least two MS records during the study period. To validate the algorithm, we merged our dataset with a clinical registry and identified individuals who accessed the MS Clinical Care and Research Centre at the "Federico II" University of Naples using the information provided by the algorithm. Then, we assessed the ability of the candidate algorithms to capture people with MS from the clinical registry using sensitivity, specificity, positive and negative predictive values, area under the curve (AUC), and κ-statistics as measures of agreement [30].
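The agreement measures listed above can be computed from a 2×2 table of algorithm result against registry status. The following is a minimal sketch under stated assumptions: the counts are hypothetical, the helper names are illustrative, and the Wilson score interval is one common choice for a proportion's confidence interval (the text does not state which interval method was used).

```python
import math


def validation_metrics(tp, fp, fn, tn):
    """Agreement measures for a binary case-finding rule versus a reference."""
    n = tp + fp + fn + tn
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    # For a single binary classifier the ROC curve has one operating point,
    # so the area under it reduces to the mean of sensitivity and specificity.
    auc = (sens + spec) / 2
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_obs = (tp + tn) / n
    p_exp = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (p_obs - p_exp) / (1 - p_exp)
    return {"sensitivity": sens, "specificity": spec, "ppv": ppv,
            "npv": npv, "auc": auc, "kappa": kappa}


def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a proportion (assumed method, see lead-in)."""
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


# Hypothetical counts, not study data
metrics = validation_metrics(tp=90, fp=5, fn=10, tn=95)
sens_lo, sens_hi = wilson_ci(90, 100)
```

Sensitivity depends only on registry-confirmed cases, which is why it is the most directly interpretable measure when the reference is a clinical registry of people with MS.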
To capture MS prevalence (aim 2), age-standardised prevalence rates were calculated for the whole cohort and then stratified by sex using the direct standardisation method. The European population in 2018 was considered as a reference population. To assess differences in the prevalence ratios in the five provinces of the Campania Region (Avellino, Benevento, Caserta, Napoli (resulting from the combination of ASL Napoli 1 Centro, Napoli 2 Nord, and Napoli 3 Sud), and Salerno), standardised morbidity ratios (SMR) were calculated, considering the regional population as standard population (indirect standardisation). To calculate 95% confidence intervals (95%CI) for the standardised rates, Byar's approximation method based on the Poisson distribution was used. Provincial differences in the MS prevalence were also evaluated with a Poisson regression-based model, with robust estimation of the variance and adjusted for age and sex [31].
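The two standardisation approaches and Byar's approximation can be sketched as follows. This is an illustrative implementation with hypothetical age strata and counts; the function names are assumptions, and the study itself used Stata rather than Python.

```python
import math


def direct_standardised_rate(cases, population, standard_pop, per=100_000):
    """Direct method: age-specific rates weighted by a standard population."""
    weighted = sum(w * c / n for c, n, w in zip(cases, population, standard_pop))
    return weighted / sum(standard_pop) * per


def expected_cases(reference_rates, local_population):
    """Indirect method: cases expected if reference rates applied locally."""
    return sum(r * n for r, n in zip(reference_rates, local_population))


def smr_with_byar_ci(observed, expected, z=1.96):
    """Standardised morbidity ratio with Byar's approximate Poisson limits."""
    smr = observed / expected
    lower = observed * (
        1 - 1 / (9 * observed) - z / (3 * math.sqrt(observed))
    ) ** 3 / expected
    o1 = observed + 1
    upper = o1 * (1 - 1 / (9 * o1) + z / (3 * math.sqrt(o1))) ** 3 / expected
    return smr, lower, upper


# Hypothetical two-stratum example, not study data
asr = direct_standardised_rate(
    cases=[10, 40], population=[100_000, 100_000], standard_pop=[3, 1]
)
exp = expected_cases(reference_rates=[1e-4, 4e-4], local_population=[50_000, 25_000])
smr, lo, hi = smr_with_byar_ci(observed=100, expected=100.0)
```

An SMR above 1 means a province has more prevalent cases than expected from the regional age- and sex-specific rates; the interval excludes 1 when that difference is unlikely to be due to Poisson variation alone.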
Finally, to estimate the proportion of undetected cases (aim 3), we used the capture-recapture method, which employs log-linear models including main effects and specific interaction terms to assess dependence between sources. Capture-recapture has often been used as an indirect method to estimate incidence and prevalence of a condition considering the overlap between more than one data source [32][33][34][35][36]. Model selection was based on a number of parameters including the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC) and the goodness-of-fit based confidence intervals following the method published by Regal and Hook [35][36][37].
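The idea behind capture-recapture can be illustrated with its simplest two-source form. The sketch below uses Chapman's nearly unbiased estimator, which assumes the two sources are independent; it is a deliberately simplified stand-in and not the three-source log-linear modelling with interaction terms described above. The function names and counts are hypothetical.

```python
def chapman_estimate(n1, n2, m):
    """Chapman's two-source estimate of the total number of cases.

    n1, n2: cases found by source 1 and source 2; m: cases found by both.
    """
    if m < 0 or m > min(n1, n2):
        raise ValueError("overlap must lie between 0 and min(n1, n2)")
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


def undetected_fraction(estimated_total, detected):
    """Excess of estimated over detected cases, relative to detected cases."""
    return (estimated_total - detected) / detected


# Hypothetical counts, not study data
n1, n2, m = 100, 80, 40
total = chapman_estimate(n1, n2, m)
detected = n1 + n2 - m  # cases seen by at least one source
excess = undetected_fraction(total, detected)
```

The smaller the overlap between sources, the larger the estimated number of cases missed by both. Log-linear models generalise this to three or more sources and relax the independence assumption by adding interaction terms between sources.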
Statistical analyses were performed using Stata 15.0. Results were considered statistically significant for p < 0.05.

Data Availability
Data are available upon request from the Regional Healthcare Society (So.Re.Sa., www.soresa.it).

Results
These results (Figure 3) were confirmed by regression-based estimation showing that, when compared with the province of Naples, the prevalence was 42% greater in the provinces of Salerno and Avellino (prevalence ratio = 1.42; 95%CI = 1.39, 1.45; and prevalence ratio = 1.42; 95%CI = 1.38, 1.46 respectively), 13% greater in the province of Benevento (prevalence ratio = 1.13; 95%CI = 1.09, 1.18), and 3% greater in the province of Caserta (prevalence ratio = 1.03; 95%CI = 1.01, 1.06).
In the capture-recapture analysis, the model with the best fit showed that the number of expected MS cases was equal to 5508 (95%CI = 5480, 5539), with a 2.7% increase, when compared with the number of detected cases.

Figure 3. The forest plot shows standardised morbidity ratios (SMR) by sex for the five provinces of the Campania Region (Avellino, Benevento, Caserta, Napoli (resulting from the combination of ASL Napoli 1 Centro, Napoli 2 Nord, and Napoli 3 Sud), and Salerno), considering the regional population as the standard population. Ninety-five percent confidence intervals (95%CI) were calculated using Byar's approximation method based on the Poisson distribution.
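The 2.7% figure from the capture-recapture analysis can be reproduced from the two counts quoted in the text, namely 5508 expected cases and 5362 detected cases. The symbols $\hat{N}$ (expected cases) and $n_{\text{detected}}$ (detected cases) are notation introduced here for illustration only.

```latex
\[
\frac{\hat{N} - n_{\text{detected}}}{n_{\text{detected}}}
  = \frac{5508 - 5362}{5362}
  = \frac{146}{5362}
  \approx 0.027
\]
```

In other words, about 146 additional cases are estimated to exist beyond those captured by the algorithm.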

Discussion
We have validated a case-finding algorithm to capture individuals with MS from routinely collected healthcare data, and, based on this, we estimated the prevalence of MS in the Campania Region (South Italy). After linkage to a clinical registry, our algorithm showed high sensitivity (99.0%), with only 2.7% of MS cases remaining undetected on capture-recapture models [32][33][34][35][36]. The Campania Region accounts for 10% of the Italian population, and thus we captured a large, and possibly fully representative, sample of Italian MS patients.
Previously published MS case-finding algorithms to identify people with MS from administrative healthcare databases used one to three MS records for case identification [26][27][28][29]. Our algorithm used one MS record for case identification and had high sensitivity (99.0%) at detecting MS cases from a clinical registry. In particular, previous studies using similar algorithms reported the same or, frequently, lower sensitivity in MS (85.0%-99.0%) [27,38,39], and in other neurological diseases (e.g., 75.8%-91.2% in Parkinson's disease, and 85.9%-87.3% in epilepsy) [21]. We have also estimated specificity, positive predictive value, and negative predictive value; these, however, should be interpreted cautiously since they specifically relate to the clinical registry rather than to a general individual with/without MS from the Campania Region. Of note, our algorithm was developed considering Italian recommendations for case identification [21], and equally weighted the use of DMTs and other healthcare resources (e.g., hospital admissions, outpatients). Indeed, most algorithms proposed in recent years to identify MS cases from routinely collected healthcare data relied on drug prescriptions (and related consultations to MS Centres), biasing the inclusion towards early MS patients [21]. As such, our algorithm is also applicable to other fields of medicine and is currently under investigation to study other neurological (e.g., epilepsy, headache) and non-neurological diseases (e.g., kidney failure) in the Campania Region. However, the applicability of our algorithm to other Italian Regions should be evaluated carefully, since in the Campania Region all datasets are provided by the Regional Healthcare Society (So.Re.Sa.), whilst other Regions rely on different agencies, possibly with heterogeneity in data collection.
Our estimate of 90 MS cases per 100,000 people falls within the prevalence interval for South European countries, as measured by the Global Burden of Diseases study (60-120 per 100,000) [1]. However, a number of previous Italian studies estimated higher MS prevalence in Italy, being, on average, 176/100,000 in mainland Italy and Sicily, and 299/100,000 in Sardinia [27,38,40,41]. Our lower prevalence estimates are possibly a consequence of different factors. First, the Campania Region is located in South Italy at a low-risk latitude [1], and thus prevalence could actually be lower when compared with other regions from North and Central Italy. Genetic background can also account for differences in prevalence, as in the case of high prevalence in Sardinia [38,42]. There are also methodological differences in our study, as discussed below.
Compared with previous national and international studies using routinely collected healthcare data, our approach is novel in its application of a conservative case-finding algorithm [21], and in the linkage to a clinical dataset, which has been used here for algorithm validation and could be used in the future also for clinical validation (e.g., identification of relapses and disability levels). In particular, we included a clinical registry for validation purposes, but not for prevalence estimates as in previous Italian studies [39][40][41][43,44]. In our study, prevalence was measured directly on the population and was not derived from expected annual increase in prevalence adjusted by mortality [38]. Not least, we excluded MS patients who were receiving care in the Campania Region but were actually living elsewhere, which could have inflated MS prevalence in other Italian studies [14,27]. Other international cohorts included a nationwide cohort in France (>100,000 individuals with MS) [16], and a cohort representative of the UK population (>10,000 individuals with MS) [26], which, however, lack linkage to clinical datasets for validation of case-finding algorithms and clinical outcomes. In Germany, a cohort of >30,000 individuals has been described, though limited by the lack of linkage to clinical datasets and by the risk of multiple counting of patients who changed insurance number during the study period [45]. In the US, routinely collected healthcare data are derived from insurance claims with an unspecified number of individuals being excluded [46]. As such, our cohort, though smaller than previous international studies, looks representative of the Italian population and holds the potential of higher granularity in both healthcare resource utilisation and clinical perspectives.
Population demographics are in line with previous studies [28,47]. In particular, we found most cases between 40 and 60 years, with a female-to-male ratio of 1.8, progressively growing over the life span [1,2]. Of note, we used age at prevalence date/study inclusion (1 January 2018), whilst age at onset would need a specific validation study. We have also found a limited number of paediatric MS cases, which would deserve further validation of the algorithm due to possible differences when compared with adults. We favoured the algorithm considering at least one MS record in light of the larger sample identified, when compared with two MS records. On the contrary, to identify paediatric MS with high sensitivity, given the high risk of misclassification (e.g., monophasic demyelinating diseases) [2], the number of MS records should probably increase (e.g., two or more MS records) and the follow-up should be longer.
Limitations of the present study include the possibility of false negatives; though unlikely [16,21,27,47], some patients may not have accessed any MS-related service over three years (e.g., no DMTs, hospital admissions, outpatients with MS exemption), and might have been missed by our case-finding algorithm. In the Italian NHS, there is an overlap between payment exemptions, with the possibility that one person holds more than one reason for exemption; for instance, low household income exemption could be used instead of the MS exemption, thus limiting our case-finding algorithm, especially in high deprivation areas (e.g., the provinces of Naples and Caserta). We cannot exclude coding errors and data omissions that, however, would have been responsible for wrong compensations and thus would very likely have been detected during the administrative processing. We did not present specific clinical and treatment data, which would need further validation studies (e.g., incident MS, relapse occurrence, etc.) and consistent study objectives (e.g., treatment switch/discontinuation).

Conclusions
We have validated an algorithm for capturing MS cases from routinely collected healthcare data in the Campania Region (South Italy) and have confirmed expected rates of prevalence. In the future, this cohort will allow studies with high granularity of clinical, environmental, healthcare resource utilisation, and pharmacoeconomic variables on a large sample, representing 10% of the Italian population. Not least, the possibility of data linkage to a clinical dataset will give the opportunity of integrating routinely collected healthcare data with clinical variables and patient-reported outcome measures.

Conflicts of Interest:
The authors have nothing to disclose.