A Systematic Review and Meta-Analysis of the Brief Cognitive Assessment for Multiple Sclerosis (BICAMS) International Validations

Cognitive impairment is a prevalent and debilitating symptom of multiple sclerosis (MS) but is not routinely addressed in clinical care. The Brief Cognitive Assessment for Multiple Sclerosis (BICAMS) was developed in 2012 to screen and monitor MS patients’ cognition. This systematic review and meta-analysis aimed to identify, synthesise, and critically appraise the current international validations of BICAMS. The literature search was conducted using PubMed, PsycINFO and Web of Science electronic databases in August 2022. Quantitative, peer-reviewed adult studies, which followed the BICAMS international validation protocol and were published in English, were included. The search identified a total of 203 studies, of which 26 were eligible for inclusion. These reported a total of 2833 adults with MS and 2382 healthy controls (HC). The meta-analysis showed that BICAMS identified impaired cognitive functioning in adults with MS compared to HC for all three subtests: information processing speed (g = 0.854, 95% CI = 0.765, 0.944, p < 0.001), immediate verbal recall (g = 0.566, 95% CI = 0.459, 0.673, p < 0.001) and immediate visual recall (g = 0.566, 95% CI = 0.487, 0.645, p < 0.001). Recruitment sites and strategies limit the generalisability of results. BICAMS is a valid and feasible international MS cognitive assessment.


Introduction
Cognition is a significant component of most neurodegenerative conditions and yet systematic, internationally valid measurement remains elusive for most. Early recognition of cognitive impairment allows for diagnosis and appropriate treatment, education, psychosocial support and engagement in shared decision-making regarding life planning, health care, involvement in research and financial matters [1]. It has been hard to meet the challenge of psychometrically sound, clinically feasible assessments, with some exceptions [2]. There is increasing recognition that the stimuli should not disadvantage any particular cultures [3]. Harmonisation of data across different national and ethnic communities needs careful consideration of cultural and linguistic variables [4]. The increasingly diverse populations within individual countries require health services to be agile and inclusive [5]. Important steps to advance cognitive measurement technology are global collaboration and a consensus, credible international validation protocol.
Multiple sclerosis (MS) is a chronic autoimmune-mediated disease of the central nervous system, involving inflammatory and degenerative processes [6]. This can produce a constellation of symptoms in the physical, psychiatric and cognitive domains. MS affects over 2.8 million people worldwide [7] and is typically diagnosed in adults aged 20 to 30 years [8]. Cognitive impairment is a prevalent and debilitating symptom of MS, affecting 40-65% of patients [9]. It can be observed in all subtypes (Relapsing Remitting Multiple Sclerosis, RRMS; Secondary Progressive Multiple Sclerosis, SPMS; Primary Progressive Multiple Sclerosis, PPMS [10]), but severe cognitive impairment predominates in the progressive forms of the disease [11]. There are often marked deficits in information processing speed, attention, working memory and executive functioning [9]. Cognitive impairment has a negative impact on quality of life [9], including activities of daily living [12], employment [13], disease management [14,15], personality [16] and driving safety [17]. Given the significant adverse consequences of cognitive difficulties, early identification of cognitive status is needed to facilitate optimal management and preserve quality of life in people with MS (PwMS [18]).
Cognitive impairment remains a neglected and under-diagnosed symptom of MS. The "invisibility" of cognitive difficulties has meant they are often overlooked by family members, colleagues and healthcare professionals since there is no obvious external disability [19]. At routine consultation, neurologists are poor at identifying MS-related cognitive impairment [20]. There is a growing consensus, across MS patients and professionals, that routine cognitive testing should form part of clinical practice to inform management [21]. Despite this, objective cognitive testing is rarely delivered [22,23]. Both the National Institute for Health and Care Excellence (NICE [24]) and the American Academy of Neurology (AAN [25]) recommend an annual cognitive assessment for MS. Regularly monitoring cognition in MS patients can facilitate appropriate management as well as targeted specialist referrals for follow-up expert cognitive assessment and management [26,27]. Once cognitive impairment is identified, healthcare professionals can modify their interaction style with patients and monitor increased risks associated with cognitive impairment such as driving accidents, risk of falls, unemployment and poor disease management [18].
In 2012, an international consensus committee of 12 European and American MS experts convened to develop a review process to select scales that could be combined to produce a feasible, valid and international MS cognitive assessment. The committee examined the available cognition scales from the literature, as well as their psychometric qualities and clinical applicability. This approach took account of both the psychometric standards (reliability, validity and sensitivity) and the pragmatic standards (international applicability, ease of administration, patient acceptability and contextual feasibility). The committee agreed that the assessment tool should assess information processing speed, verbal memory and visual memory (immediate recall) and prompted the selection of the following subtests: the Symbol Digit Modalities Test (SDMT; spoken response), the first five learning trials of the California Verbal Learning Test (CVLT-II) and the first three learning trials of the Brief Visuospatial Memory Test-Revised (BVMT-R [28]). These three subtests are reliable and sensitive to MS cognitive impairment.
The SDMT [29] is a measure of information processing speed comprising a key of single numbers, each paired with an abstract symbol. The patient is presented with rows of symbols that are arranged pseudo-randomly. They are required to say the correct number for each of the symbols as fast and as accurately as they can in 90 s, using the key provided. The SDMT shows high sensitivity for MS-related cognitive dysfunction and is now widely acknowledged as the gold standard for a quick cognitive screening [30].
In the CVLT-II [31], a measure of verbal memory, only the first five learning trials are administered. The patient is read a 16-item word list at a slightly slower rate than one item per second. The list is read aloud five times, and the patient is instructed to recall as many of the items as possible, in any order, across the five learning trials.
In the BVMT-R [32], a measure of visual memory, only the first three learning trials are administered. This test involves presenting to patients a 2 × 3 stimulus array of abstract geometric figures across three learning trials, each 10 s in length. The array is then removed from the patient's view, and they are instructed to draw the geometric figures in the correct position from memory.
The Brief Cognitive Assessment for Multiple Sclerosis (BICAMS [28]) has been recommended as a 15 min international measure to routinely screen and monitor cognition in MS patients. It was designed for healthcare professionals who may not have specific training in cognitive assessments, allowing more clinics to address cognition. This brief assessment tool does not require any special equipment beyond a pen, paper and stop-watch and therefore allows cognition to be tested inexpensively. BICAMS can be easily implemented into routine clinical practice across centres and countries internationally [28]. The committee have also published an international validation protocol to guide national validation studies [33].
BICAMS has been validated in 26 countries to date, including Argentina, Belgium, Turkey and Japan (e.g., [34]). These national studies have investigated the validity and reliability of BICAMS in different cultures and language groups and its sensitivity to cognitive impairment in comparison with the "gold-standard" batteries. The AAN has recommended BICAMS in their quality measurement sets for MS in 2014 and 2020. The Canadian Guidelines for MS Treatment endorsed BICAMS in 2020 [35], and over 20 peer review papers in international clinical neurology journals have also recommended BICAMS for routine cognitive assessment in MS clinics (e.g., [36]). BICAMS has been adopted by the international MS community. For example, the Arabic version of BICAMS represents the most used cognitive battery for assessing MS cognition in the Arab world [37]. It has an international reach, with 11,000 patients routinely assessed every year. There has been a systematic review of the first 16 national validation studies on BICAMS [34]. However, there have since been additional national validation studies, warranting an updated systematic review of the validation literature and international findings. The aim of the present systematic review and meta-analysis was to identify, synthesise and critically evaluate current literature on the progress of BICAMS in meeting the objectives of global collaboration and a credible international validation protocol.

Search Strategy
The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement was followed as a guide for the standardised conduct and reporting of the current systematic review and meta-analysis [38]. Studies were identified using three databases: PubMed, PsycINFO and Web of Science. Boolean search terms were developed and used in August 2022 to identify studies examining the validity of BICAMS (Table 1). Search terms were informed by initial searches and developed further during the process of the review to ensure all relevant articles were identified.

Search Terms
"Multiple Sclerosis" OR "MS" OR "Clinically Isolated Syndrome" OR "CIS" AND "Brief International Cognitive Assessment for Multiple Sclerosis" OR "BICAMS" AND "Validation" OR "International Validation" OR "Validity" OR "Sensitivity"

Selection Criteria
The inclusion criteria were: (a) studies that followed the international validation BICAMS protocol, (b) quantitative studies, (c) peer-reviewed studies with no date restriction that are written in the English language and (d) samples including adults with any clinical subtypes of MS and Clinically Isolated Syndrome (CIS), the MS precursor stage.
The additional criteria for inclusion in the meta-analysis were as follows: (a) studies including an HC comparison group and (b) studies reporting standard quantitative information based on the SDMT, CVLT-II and BVMT-R subscales (mean, standard deviation and sample size) or appropriate substitute scales of the MS and/or CIS and HC comparison groups.

Quality Assessment
One reviewer (HP) extracted data from the studies directly into tables made specifically for the current review, and this was examined and verified by a second reviewer (DL).
Two reviewers independently assessed the quality of the retrieved articles using the Effective Public Health Practice Project (EPHPP), and any disagreements were discussed and resolved. A final quality rating was derived from the individual ratings of the categories.

Statistical Analysis
The meta-analysis was conducted using the Comprehensive Meta-Analysis (CMA; Version 3) software [39]. Three individual analyses were performed based on the average scores of the SDMT, CVLT-II and BVMT-R subtests for both groups (MS and HC). Effect sizes were calculated as standardised mean differences with Hedges' g using the following interpretation: 0.2 = small; 0.5 = medium; 0.8 = large [40].
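The effect size calculation described above can be sketched as follows. This is a minimal illustration that assumes the conventional textbook formulas for the standardised mean difference with the small-sample correction; it is not the CMA implementation. The function names, the example scores and the "negligible" label for values below 0.2 are assumptions introduced here.

```python
import math


def hedges_g(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (g, se) for group A versus group B.

    A positive g means group A scored higher than group B.
    """
    df = n_a + n_b - 2
    # Pooled within-group standard deviation.
    pooled_sd = math.sqrt(((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df)
    d = (mean_a - mean_b) / pooled_sd
    # Small-sample correction factor that turns Cohen's d into Hedges' g.
    j = 1 - 3 / (4 * df - 1)
    g = j * d
    var_d = (n_a + n_b) / (n_a * n_b) + d ** 2 / (2 * (n_a + n_b))
    se = math.sqrt(j ** 2 * var_d)
    return g, se


def interpret(g):
    """Map |g| onto the 0.2 / 0.5 / 0.8 benchmarks used in this review."""
    size = abs(g)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"  # label assumed here; the review names no category below 0.2


# Hypothetical subtest scores (not taken from any included study):
# HC mean 60 (SD 10, n = 50) versus MS mean 52 (SD 12, n = 50).
g, se = hedges_g(60, 10, 50, 52, 12, 50)
print(round(g, 3), round(se, 3), interpret(g))
```

Entering HC as group A and MS as group B yields a positive g when the MS group scores lower, which matches the sign convention of the effect sizes reported in this review.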
The meta-analysis employed a random-effects model because it estimates the mean of a distribution of effects as opposed to one true effect [41,42], and the number of studies was large enough (i.e., more than five). Heterogeneity was assessed using Cochran's Q test, and the magnitude of heterogeneity was evaluated using the I² statistic. The I² statistic assesses the percentage of variation across studies that is due to heterogeneity rather than chance and can be interpreted as a small (25%), moderate (50%) or high (75%) level of heterogeneity [43].
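To make the pooling and heterogeneity statistics concrete, the sketch below implements a DerSimonian-Laird random-effects estimate together with Cochran's Q and I². The choice of the DerSimonian-Laird estimator is an assumption made for illustration, since the review does not state which between-study variance estimator was used. The function names and the input values are hypothetical.

```python
import math


def i_squared(q, k):
    """Percentage of total variation attributable to heterogeneity."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - (k - 1)) / q) * 100


def random_effects(effects, ses):
    """DerSimonian-Laird pooling; needs at least two studies."""
    k = len(effects)
    w = [1 / se ** 2 for se in ses]  # fixed-effect (inverse-variance) weights
    fixed = sum(wi * g for wi, g in zip(w, effects)) / sum(w)
    q = sum(wi * (g - fixed) ** 2 for wi, g in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-study variance
    w_star = [1 / (se ** 2 + tau2) for se in ses]  # random-effects weights
    pooled = sum(wi * g for wi, g in zip(w_star, effects)) / sum(w_star)
    se_pooled = math.sqrt(1 / sum(w_star))
    return {
        "pooled": pooled,
        "ci": (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled),
        "Q": q,
        "df": df,
        "tau2": tau2,
        "I2": i_squared(q, k),
    }


# Hypothetical effect sizes and standard errors for four studies.
summary = random_effects([0.9, 0.7, 1.1, 0.6], [0.15, 0.2, 0.18, 0.12])
print(summary)

# Q = 51.9 with 26 studies corresponds to an I² of roughly 51.8%.
print(round(i_squared(51.9, 26), 1))
```

Under this definition of I², and assuming all 26 studies contributed to each analysis, the Q values reported in the results (51.9 for the SDMT and 77.9 for the CVLT-II) correspond to I² values of about 51.8% and 67.9%, in line with the reported figures.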
Forest plots were created for each subtest to visually summarise the amount of heterogeneity as well as the estimated effect sizes (Hedges' g) and 95% CIs. Funnel plots were also generated as a graphical tool for investigating publication bias and other biases (assessed by Egger's test), which, if present, may lead to funnel plot asymmetry [44]. If asymmetry was shown, the Duval and Tweedie trim and fill analysis would model the data as if they were symmetrically distributed by adjusting for missing studies [45].
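As an illustration of how funnel plot asymmetry can be quantified, the sketch below computes the intercept of Egger's regression, in which each study's standardised effect (g divided by its standard error) is regressed on its precision (1 divided by the standard error). An intercept far from zero suggests asymmetry. The function name and the data are hypothetical, and the p-value step is left out: the t statistic would be compared against a t distribution with k - 2 degrees of freedom.

```python
import math


def egger_intercept(effects, ses):
    """Return (intercept, se_intercept, t_stat) for Egger's regression.

    Needs at least three studies whose standard errors are not all equal.
    """
    x = [1 / se for se in ses]                   # precision
    y = [g / se for g, se in zip(effects, ses)]  # standardised effect
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s2 = residual_ss / (n - 2)                   # residual variance
    se_intercept = math.sqrt(s2 * (1 / n + x_bar ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, se_intercept, t_stat


# Hypothetical data in which less precise studies report larger effects,
# which is the pattern a funnel plot would show as asymmetry.
ses = [0.1, 0.2, 0.3, 0.4, 0.5]
effects = [0.71, 0.88, 1.115, 1.28, 1.5]
intercept, se_int, t_stat = egger_intercept(effects, ses)
print(round(intercept, 2), round(t_stat, 1))
```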

Search Results
Using the pre-specified eligibility criteria, 55 results were generated from PubMed, 24 from PsycINFO and 124 from Web of Science. First, 132 duplicate studies across databases were removed (Figure 1). To assess for eligibility, all titles and abstracts were initially screened independently by two reviewers (HP and DL). The 30 full-text articles retained after screening were then re-evaluated to determine their final inclusion or exclusion. Following this, four studies were excluded according to the inclusion criteria. A total of 26 studies met the criteria for final inclusion in the systematic review.
All 26 studies included in the systematic review also met the criteria for the meta-analysis. All relevant data for the current review and meta-analysis were obtained from numerical information in texts, tables, figures and statistical analyses.
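The study flow reported above can be checked with simple arithmetic. The sketch below uses only the counts given in the text. The number of records screened and the number excluded at title and abstract screening are not stated in the text and are derived here, on the assumption that every non-duplicate record was screened. The function and key names are hypothetical.

```python
def prisma_flow():
    """Recompute the study flow from the counts reported in the text."""
    records = {"PubMed": 55, "PsycINFO": 24, "Web of Science": 124}
    identified = sum(records.values())
    duplicates_removed = 132
    screened = identified - duplicates_removed   # derived, not stated in the text
    full_text_assessed = 30
    excluded_at_screening = screened - full_text_assessed  # derived
    excluded_at_full_text = 4
    included = full_text_assessed - excluded_at_full_text
    return {
        "identified": identified,
        "screened": screened,
        "excluded_at_screening": excluded_at_screening,
        "full_text_assessed": full_text_assessed,
        "included": included,
    }


flow = prisma_flow()
for stage, count in flow.items():
    print(f"{stage}: {count}")
```

The identified and included totals agree with the 203 records and 26 studies stated in the text.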

Study Characteristics and Sample Demographics
Data on study characteristics, sample demographics and patient disease information are shown in Table 2. The 26 validation studies were published between the years 2012 and 2022.
Adults with MS were recruited from a variety of settings including medical centres, university hospitals, specialist clinics and tertiary referral centres. HC were recruited from the community, from an established normative sample or from among relatives, friends or carers of PwMS. The studies included a total of 2833 adults with MS and 2382 HC. Sample sizes differed greatly between studies: in PwMS, the samples ranged from 40 to 500 participants, whilst for HC, they ranged from 20 to 276. The age of PwMS ranged from 20 to 61 years with an average age of 39.9, whilst the age of HC ranged from 22 to 51 years, with a similar average age of 38.9. Both samples disproportionately favoured females, who made up 47-82% of the MS samples and 33-86% of the HC samples. Eight studies used the same number of males and females. Years of education averaged 14.13 years in the MS sample and 14.58 years in HC. Higher rates of employment were seen in the HC in comparison to the MS samples (39-98% compared to 20-89%, respectively).

Patient Disease Information
Six studies recruited an exclusively RRMS sample, whilst the remaining studies also included a mixture of other phenotypes (e.g., SPMS or PPMS). RRMS was the most represented phenotype (33-100%), followed by SPMS (0-38%). Three studies included participants with CIS in their sample. The revised McDonald criteria for MS were the most commonly used diagnostic criteria [72]. The average disease duration was 9.16 years and ranged from 1.08 to 14.67 years. The average Expanded Disability Status Scale (EDSS [73]) score was 2.75, indicating that, on average, the participants were in the mild disability range and could walk unaided.
Few studies calculated sensitivity and specificity data (Table 3), and it is noteworthy that, in the large Czech Republic sample, BICAMS demonstrated the same sensitivity to cognitive impairment as the "gold-standard" Minimal Assessment of Cognitive Function in MS (MACFIMS [52]).

Correlations between BICAMS and Sample Variables
Correlations between BICAMS subtest scores and sample variables (age, disease duration, EDSS score, education, and employment) were extracted (Table 4). Correlations between age and BICAMS scores were the most frequently reported and usually significant; correlations between EDSS scores and BICAMS were occasionally reported and inconsistently significant.

Quality Ratings
The overall quality of the studies ranged from 'moderate' to 'weak' on the EPHPP template, reflecting the cross-sectional design typical of validation studies. No studies were removed from this review following the quality assessment.

Meta-Analysis of BICAMS Validation Studies
Data on the standard quantitative information based on the subtests of the SDMT, CVLT-II and BVMT-R of the MS and HC groups were extracted for baseline assessments of BICAMS (Table 3). The percentage of people in both groups identified with likely cognitive impairment on at least one subtest was also extracted, along with the sensitivity and specificity of BICAMS. The results from all three subtests showed that adults with MS performed significantly worse than HC. BICAMS identified likely impaired cognition, on at least one subtest, in 25-73% of the MS sample, which was significantly higher than in HC (1-20%).
The forest plot (Figure 2) shows the effect size for each study using the SDMT. Overall, information processing speed was significantly lower in the MS sample compared to HC with a large effect size (g = 0.854, 95% CI = 0.765, 0.944, p < 0.001). There was no evidence of outliers; however, moderate heterogeneity (Q = 51.9, p = 0.001) was indicated (I² = 51.8). There was no evidence of publication bias (Egger's test: p > 0.05, two-tailed). The funnel plot (Figure 3) indicates that the effect sizes were symmetrical. Duval and Tweedie's trim and fill analysis estimated that no studies were missing from the analysis.
A translated version of the CVLT-II was used in 18 validation studies. For two studies, the CVLT-II was not translated as the validation studies were conducted in English-speaking countries with existing validations [49,50]. Importantly, six of the studies used an alternative verbal memory test to substitute the conventional CVLT-II (Table 3). The mean and standard deviation scores of these alternative tests were included in the meta-analysis. Notably, the study with the smallest effect size, with a Hedges' g value of 0.017, used a substituted verbal memory test ([51]; Figure 4). The study with the highest effect size, with a Hedges' g value of 1.072, used a translated version of the CVLT-II ([52]; Figure 4).
The forest plot (Figure 4) shows the effect size for each study using the CVLT-II. Overall, immediate verbal recall memory was significantly lower in the MS sample compared to HC with a medium effect size (g = 0.566, 95% CI = 0.459, 0.673, p < 0.001). There was no evidence of outliers; however, a high level of heterogeneity (Q = 77.9, p < 0.001) was indicated (I² = 67.9). Duval and Tweedie's trim and fill analysis estimated that three studies would need to fall to the left of the mean effect size to make the plot symmetrical (Figure 5). Assuming a random-effects model, the adjusted mean effect size remained medium (g = 0.528, 95% CI = 0.420, 0.635). There was no evidence of publication bias, as the Egger's test remained non-significant (Egger's test: p > 0.05, two-tailed).
The forest plot (Figure 6) shows the effect size for each study using the BVMT-R. Overall, immediate visual recall memory was significantly lower in the MS sample compared to HC with a medium effect size (g = 0.566, 95% CI = 0.487, 0.645, p < 0.001). There was no evidence of outliers; however, moderate heterogeneity (Q = 42.6, p < 0.05) was indicated (I² = 41.4). There was no evidence of publication bias (Egger's test: p > 0.05, two-tailed). The funnel plot (Figure 7) indicates that the effect sizes were symmetrical. Duval and Tweedie's trim and fill analysis estimated that no studies were missing from the analysis.
Only four studies reported the sensitivity and specificity of BICAMS. Of these four studies, one reported on the sensitivity and specificity of BICAMS overall (94% and 86%, respectively), whilst the remaining three reported on the sensitivity and specificity of the individual subtests (see Table 3).
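For readers less familiar with these two metrics, the sketch below shows how sensitivity and specificity are derived from a two-by-two classification table, assuming impairment status is defined by a reference standard such as a longer neuropsychological battery. The function name and the counts are hypothetical and do not come from any included study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from a two-by-two classification table.

    tp: impaired on the reference standard and flagged by the screening tool
    fn: impaired on the reference standard but missed by the screening tool
    tn: unimpaired on the reference standard and not flagged
    fp: unimpaired on the reference standard but flagged
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts: 50 impaired and 50 unimpaired participants.
sens, spec = sensitivity_specificity(tp=40, fn=10, tn=45, fp=5)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```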

Summary of Findings
The current review identified, synthesised and appraised the current literature on the international validation of BICAMS to date. A total of 26 studies were included in both the systematic review and meta-analysis. The results from the systematic review showed that BICAMS has been embraced in many countries worldwide and with a range of clinical samples, including different MS phenotypes and, consequently, disease durations and severity. Most studies included an HC sample with a similar age and educational background. Although BICAMS was designed to be administered by a range of health professionals, in these validation studies it appears to have typically been administered by a neuropsychologist or psychology graduate; however, this information was not routinely reported. Finally, in most studies, the gender ratio in both samples disproportionately favoured females. It is important to consider that this female recruitment bias reflects the increased prevalence of MS in females, the female-to-male sex ratio being approximately 3:1 [8].
The meta-analysis showed that adults with MS performed significantly worse than HC on the three BICAMS subtests: information processing speed and immediate verbal and visual recall. Cognitive functioning was most impaired on the SDMT (a measure of information processing speed). These findings are in line with existing literature proposing that information processing speed is markedly reduced in MS [74] and constitutes the most common cognitive limitation in PwMS [75]. It is important to stress that BICAMS should be administered in its entirety, given that multiple aspects of daily life can be affected by cognitive impairment in addition to processing speed, e.g., visuospatial learning as assessed by the BVMT-R [76].
It is important to note that the BICAMS committee included experts from Europe and America only, so the development process may have lacked diversity and inclusivity, with implications for cross-cultural appropriateness [77,78]. The CVLT-II scores were more heterogeneous compared to the other subtests, possibly reflecting the additional linguistic and cultural demands of translating the verbal recall list. Prior to BICAMS, the CVLT-II had separate word lists and validations for the UK and USA. Six BICAMS validation studies used alternative verbal memory tests available in the required language. Several validation studies [49,51] reported difficulties with translating the CVLT-II and described similar scores on the CVLT-II between the MS sample and HC. The CVLT-II is also probably the most culturally sensitive of the three subtests and required more extensive work to accomplish a valid translation of the stimuli [69]. Semantic categories for the word list were sometimes adapted to be more applicable for the population, e.g., by swapping different types of sports for cooking utensils in Egypt [55].

Strengths
There are several strengths to this review. First, the search strategy was designed and validated using a combination of three databases (PubMed, PsycINFO and Web of Science) to cover a breadth of the available and relevant literature. Secondly, strict inclusion criteria were employed to ensure appropriate studies were generated. Furthermore, this review identified and synthesised international validation studies reporting objective scores of cognitive abilities in PwMS compared to matched HC in a standardised manner. This review captures the advances in validating BICAMS internationally since the previous review [34], with further validations in 12 more countries. Across the validation studies, there was a varied spread of cultures, languages and countries involved in the initiative. The countries that participated in the international validation protocol reported that BICAMS could be feasibly administered in approximately 15 min, with minimal materials, and was recommended for routine clinical cognitive assessment as a standard of MS care.

Limitations
There are also some notable limitations to the review methodology. First, English-language publication was a requirement for inclusion in the review, so it is important to recognise that this may have limited the inclusion of validation studies published in other languages. Secondly, only the terms "Multiple Sclerosis", "MS", "Clinically Isolated Syndrome" or "CIS" were used in the database search. This may have restricted the number of studies identified through the database search, as there are additional ways to describe MS (e.g., as an autoimmune disease). Thirdly, as part of the pre-defined criteria, only peer-reviewed studies were considered eligible for inclusion in this review, which meant that possible grey literature (e.g., thesis publications) that was not commercially published would not have been included. Fourthly, there are likely to be international disparities across studies in relation to healthcare systems, accessibility, economic status, and access to general MS support facilities [79,80]. MS healthcare in countries with developing economies may be constrained by limited access to high-efficacy disease-modifying therapies (DMTs) or diagnostic technology such as magnetic resonance imaging (MRI [81]). Developed countries have significantly higher prevalence and incidence rates of MS compared to developing countries, which may reflect better access to diagnostic facilities and subsequent earlier diagnosis and treatment [82]. These variations in access and quality of MS healthcare may have made comparisons of disease profiles, such as years since diagnosis and physical disability, less valid. Most of the studies included in this review were conducted in leading centres and university hospitals, which attract a certain sociodemographic population and, therefore, may not be entirely representative of all MS populations.
Fifthly, there was a great deal of heterogeneity between studies, namely in terms of sample size, age, MS phenotypes and disease duration. RRMS was overrepresented compared to other MS phenotypes. It is possible that this may have reduced the effect size since cognitive impairment is more common and severe in the progressive forms of the disease [10,11]. With progressive forms of MS being underrepresented in this review, cognitive impairment may also have been underrepresented in the identified studies compared to the general MS population. Finally, the quality assessment tool (EPHPP) used to analyse the methodological quality of the included studies may not have been appropriate for this systematic review, since it is not a scale designed for cross-sectional studies. This may explain why the overall quality of the studies ranged from 'moderate' to 'weak' on the EPHPP template. In addition, the possible risk of bias was not studied.

Future Directions
The adoption of an international validation protocol and a global collaboration have served to promote BICAMS to international currency for MS cognition. This is reflected in the number of international validations published, the report of BICAMS data in 150 published studies of MS cognition and its use in many large national and international trials. This initiative could serve as a model for other conditions, improving the awareness, understanding, assessment and management of cognitive impairment. It is hoped that further research investigating the feasibility of BICAMS in clinical practice will maximise its use in routine consultation to evaluate cognitive status in MS. This systematic review also prompts future studies to investigate the sensitivity and specificity of the scale in different forms of multiple sclerosis or in groups with different degrees of disability.

Conclusions
BICAMS has been translated and culturally adapted in 26 countries to date. It has been shown to be a valid measure of cognitive functioning in MS at a global level. It can detect cognitive impairment in individuals with MS compared to healthy controls across a range of cultures, languages, and countries. This review sheds light on the work of the international MS community in validating BICAMS using an international validation protocol. This represents progress in increasing awareness of MS cognition and in maximising the implementation of BICAMS into routine clinical practice, to assess and instigate the appropriate management of MS cognition across different countries.