Systematic Review of Primary Outcome Measurements for Chronic Fatigue Syndrome/Myalgic Encephalomyelitis (CFS/ME) in Randomized Controlled Trials

Background: Because its etiology is unknown, the objective diagnosis and treatment of chronic fatigue syndrome/myalgic encephalomyelitis (CFS/ME) remain challenging. Patient-reported outcomes (PROs) are generally the main means of assessing treatment response because the patient is the most important judge of whether changes are meaningful. Methods: To determine the overall characteristics of the main outcome measurements applied in clinical trials for CFS/ME, we systematically surveyed the literature in two electronic databases, PubMed and the Cochrane Library, through June 2020. We analyzed randomized controlled trials (RCTs) for CFS/ME, focusing on their main measurements. Results: Fifty-two RCTs out of a total of 540 retrieved records were selected according to the eligibility criteria. Thirty-one RCTs (59.6%) used a single primary outcome, and the others adopted two or more kinds of measurements. In total, 15 PRO-derived tools were adopted (50 RCTs; 96.2%), along with two behavioral measurements for adolescents (4 RCTs; 7.7%). The 36-item Short Form Health Survey (SF-36; 16 RCTs), Checklist Individual Strength (CIS; 14 RCTs), and Chalder Fatigue Questionnaire (CFQ; 11 RCTs) were most frequently used as the main outcomes. Since the first RCT in 1996, the Clinical Global Impression (CGI) and SF-36 were the dominant instruments in the first and the following decade (26.1% and 28.6%, respectively), while the CIS and Multidimensional Fatigue Inventory (MFI) have been the preferred instruments (21.4% each) in recent years (2016 to 2020). Conclusions: This review provides a comprehensive picture of how assessment tools have been chosen in RCTs of interventions for CFS/ME. Our data should be of practical help in designing clinical studies for CFS/ME-related therapeutic development.


Introduction
Chronic fatigue syndrome/myalgic encephalomyelitis (CFS/ME) is a debilitating disease characterized by medically unexplained, severe chronic fatigue lasting at least 6 months, along with key symptoms such as unrefreshing sleep, postexertional malaise (PEM), impairments in memory or concentration, and/or orthostatic intolerance [1]. The daily lives of patients are heavily impeded: approximately half of patients are unemployed, and one quarter are home- or bed-bound [2]. The prevalence of CFS/ME is suggested to be approximately 1-2% worldwide [3], and the annual economic cost of medical care is estimated to be up to USD 10,000 per patient in the US [4].
Although various etiologies of CFS/ME have been hypothesized, such as autonomic and neurological dysfunction, abnormalities in mitochondrial function, and aberrant gut microbiota, none has yet been clearly established [5]. Recently, the disease has come to be considered a multisystem neuroimmune disease [1]. To date, various randomized controlled trials (RCTs) of therapeutics have been conducted; however, no effective therapy for CFS/ME exists [6]. Recently, the PACE trial, a large-scale clinical study, reported that cognitive behavior therapy (CBT) and graded-exercise therapy (GET) were effective for CFS/ME [7]. However, the PACE trial remains controversial: its efficacy findings are debated, and researchers and patients have criticized how recovery was judged as well as side effects [8].
On the other hand, the absence of objective biomarkers of CFS/ME poses a problem for the diagnosis of this illness. In addition, clinical evaluations of treatment response also depend on self-reported assessments of symptom severity, which can complicate the investigation of new therapeutics [9]. Accordingly, methodologically well-designed tools for assessing meaningful treatment responses in CFS/ME are very important. To date, diverse patient-reported outcome (PRO) measurements have been developed and used to assess fatigue status in clinics, such as the Checklist Individual Strength (CIS) scale, Chalder Fatigue Questionnaire (CFQ), and Multidimensional Fatigue Inventory (MFI) [10][11][12]. Many clinical studies, however, have adopted various fatigue-nonspecific instruments, including the 36-item Short Form Health Survey (SF-36), Clinical Global Impression (CGI), and Sickness Impact Profile-8 (SIP-8) [13][14][15]. Researchers therefore need to review the available measurements carefully and choose the one best suited to the purpose of their own clinical studies. However, in the absence of well-established international guidelines, choosing appropriate measurement instruments for CFS/ME-related studies is not easy.
To identify the assessment tools that help in the clinical study process for CFS/ME, we comprehensively reviewed the primary measurements used in RCTs and determined changes in the use of these measurements.

Data Sources and Search Terms
In accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [16], a systematic literature survey was performed using two electronic literature databases, PubMed and the Cochrane Library, through June 2020. The search terms were encephalomyelitis/chronic fatigue syndrome, ME/CFS, encephalomyelitis, ME, chronic fatigue syndrome, CFS, randomized controlled trial, RCT, and clinical trial. The trial type was limited to RCTs, and all languages were included.
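As an illustration only, the listed search terms could be combined into a Boolean query as sketched below. The text does not report the exact query syntax, so the AND/OR grouping, the quoting, and the helper names `or_group` and `build_query` are assumptions rather than the search strategy actually used.

```python
# Search terms as listed in the text; how they were combined is an assumption.
CONDITION_TERMS = [
    "encephalomyelitis/chronic fatigue syndrome",
    "ME/CFS",
    "encephalomyelitis",
    "ME",
    "chronic fatigue syndrome",
    "CFS",
]
DESIGN_TERMS = ["randomized controlled trial", "RCT", "clinical trial"]


def or_group(terms):
    """Quote each term and join the group with OR inside parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(condition_terms, design_terms):
    """Require one condition term AND one study-design term (assumed logic)."""
    return f"{or_group(condition_terms)} AND {or_group(design_terms)}"


query = build_query(CONDITION_TERMS, DESIGN_TERMS)
print(query)
```

A string of this form would still need to be adapted to the field tags and filters of each database.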

Eligibility Criteria
Articles were selected for this study according to the following inclusion criteria: (1) RCTs or randomized controlled crossover trials, (2) patients with CFS/ME as participants, (3) an evaluation of the efficacy of the intervention for CFS/ME treatment, and (4) a fatigue-related measurement or outcome. The exclusion criteria were as follows: (1) articles with no full text and (2) studies that did not state the primary or main outcome. No criterion was set for the number of participants in RCTs.
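The criteria above can be read as a simple screening rule: a record is kept when all four inclusion criteria hold and neither exclusion criterion applies. The following minimal sketch expresses that rule; the `Candidate` record and its field names are hypothetical and serve only to make the logic explicit.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical screening record; field names are illustrative only."""
    is_rct_or_crossover: bool           # inclusion (1)
    enrolls_cfs_me_patients: bool       # inclusion (2)
    evaluates_treatment_efficacy: bool  # inclusion (3)
    has_fatigue_outcome: bool           # inclusion (4)
    full_text_available: bool           # exclusion (1) applies when False
    states_primary_outcome: bool        # exclusion (2) applies when False


def is_eligible(c: Candidate) -> bool:
    """Return True when all inclusion criteria hold and no exclusion applies."""
    included = (
        c.is_rct_or_crossover
        and c.enrolls_cfs_me_patients
        and c.evaluates_treatment_efficacy
        and c.has_fatigue_outcome
    )
    excluded = (not c.full_text_available) or (not c.states_primary_outcome)
    return included and not excluded


# Sample size is deliberately not part of the rule, as stated in the text.
example = Candidate(True, True, True, True, True, True)
print(is_eligible(example))
```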

Data Extraction and Analysis
We extracted data on general features of RCTs, such as the number of participants, age, intervention, and treatment period, along with the primary outcome measurement instrument (subscales, items, range of scores, versions, and application of cutoff scores for recruitment).
Because this study is descriptive, no statistical tests were applied. For the treatment period, the mean and standard deviation (SD) are presented.
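For illustration, a mean ± SD summary of treatment periods can be computed as in the sketch below. The durations are hypothetical values, not data from the included RCTs, and the use of the sample SD (`statistics.stdev`) is an assumption, since the text does not state which SD formula was applied.

```python
from statistics import mean, stdev


def summarize_weeks(weeks):
    """Format treatment periods as 'mean ± SD weeks' (sample SD assumed)."""
    return f"{mean(weeks):.1f} ± {stdev(weeks):.1f} weeks"


# Hypothetical treatment periods in weeks, for illustration only.
hypothetical_weeks = [6, 12, 18]
summary = summarize_weeks(hypothetical_weeks)
print(summary)  # 12.0 ± 6.0 weeks
```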

General Characteristics of RCTs
A total of 540 articles were initially identified from the PubMed and Cochrane Library databases, and 52 articles met the inclusion criteria for this study (Figure 1). Forty-eight RCTs (92.3%) were performed with adult patients (n = 5872), while 4 RCTs (7.7%) were performed with adolescent subjects (n = 387). Twenty-six RCTs evaluated the efficacy of pharmacologic interventions, and 27 RCTs evaluated nonpharmacologic interventions; one RCT used both (fluoxetine + graded exercise therapy) and is counted in each group. The mean treatment period was 15.0 ± 9.3 weeks (Table 1).

In terms of the number of primary outcomes in RCTs, 31 RCTs (59.6%) used a single primary outcome (29 RCTs with adults and 2 RCTs with adolescents). Fifteen RCTs (28.8%) adopted two kinds of main measurements (with adult patients), while six RCTs (11.5%) used three kinds of measurements (four RCTs with adults and two with adolescents) as a primary outcome (Table 1).

Table 1 notes: Supplementary Table S1. B: One RCT used both pharmacologic and nonpharmacologic interventions (fluoxetine + graded exercise therapy). C: Some items were applied multiple times; thus, the total percentage was larger than 100%. D: Eighteen RCTs applied a cutoff score for inclusion criteria.
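The reported proportions of RCTs by number of primary outcomes can be rechecked from the counts with a few lines of arithmetic. The sketch below uses only the counts stated in the text; the variable and function names are illustrative.

```python
TOTAL_RCTS = 52

# Counts of RCTs by number of primary outcomes, as reported in the text.
counts = {
    "single primary outcome": 31,
    "two primary outcomes": 15,
    "three primary outcomes": 6,
}


def share(count, total=TOTAL_RCTS):
    """Percentage of the total, rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {label: share(n) for label, n in counts.items()}
multiple_share = share(counts["two primary outcomes"] + counts["three primary outcomes"])

for label, pct in shares.items():
    print(f"{label}: {pct}%")
print(f"multiple primary outcomes: {multiple_share}%")
```

The three categories sum to the 52 included RCTs, and the combined share of RCTs with two or three primary outcomes is 40.4%.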

Characteristics of Primary Measurements in RCTs
As shown in Figure 2, the 52 RCTs used 17 kinds of methodological instruments, which were classified into survey-based measurements (15 instruments in 50 RCTs) and behavioral measurements (two instruments in four RCTs). All RCTs with adults adopted survey-based measurements, while four RCTs with adolescent patients adopted behavioral (two RCTs) and/or survey-based (two RCTs) measurements (Table 1).

Discussion
For CFS/ME, a symptom-based approach is a key strategy for not only therapy but also diagnosis because of its unknown etiology [1]. The Centers for Disease Control and Prevention (CDC) recommended symptomatic treatment based on the case definition of the Institute of Medicine (IOM) for providing alternative care for patients [68]. Patients' subjective complaints and a sound understanding of PROs are crucial in the diagnostic process as well as in evaluating therapeutic responses in clinical practice for CFS/ME. To provide practical guidance in choosing a suitable measurement in clinical studies for CFS/ME, we analyzed the primary outcome measurements in RCTs conducted to date.
Although common guidelines recommend a single primary outcome measurement in RCTs [69], 21 (40.4%) of the 52 RCTs employed multiple primary measurements (Table 1). This might be due to the absence of a well-established measurement tool specialized for CFS/ME. Among the 17 tools used in the 52 RCTs, only two behavioral measurements (school attendance rate and the number of steps per day) were adopted, in four RCTs that enrolled only adolescent participants (Table 1). It is generally well known that adolescent patients show a poorer school attendance rate than healthy controls [70]. The remaining 15 tools, used in 50 RCTs, were survey-based PRO measurements, as is the case for many subjective symptoms or disorders, including migraine, major depressive disorder, and anxiety [71][72][73]. We classified the measurements into two groups: nine non-fatigue-specialized tools, employed mainly in the earlier decade (1996 to 2005), and eight fatigue-specialized measurements, which have been dominant since 2016 (Figure 2).
The SF-36, which is not specialized for fatigue, is the most frequently used measurement based on our data (16 RCTs) (Table 2). It has been broadly applied for measuring patients' general health status in reference to health-related quality of life (HRQOL). It is well recognized that the HRQOL of CFS/ME sufferers is notoriously poor and has been linked to a 7-fold higher risk of suicide than in healthy controls [74,75]. Therefore, the SF-36, especially the physical functioning subscale, was steadily employed as a primary measurement until 2015, often in supportive combination with fatigue-specialized measurements (10 RCTs), such as the CIS or CFQ. Likewise, the SIP-8 score, which assesses dysfunction of daily behaviors, has been used as part of the primary outcome coupled with fatigue-specialized tools (Supplementary Table S1).
Among fatigue-specialized instruments, the fatigue severity subscale of the CIS and the total score of the CFQ (11-item version) were dominantly employed (Table 2). Both have been commonly endorsed for the evaluation of psychometric fatigue status in RCTs for CFS/ME and other disorders, including rheumatoid arthritis and fibromyalgia [76]. Both instruments assess not only physical but also mental fatigue status, such as concentration and motivation, and they are known to show a very high correlation in assessing fatigue severity [77]. In particular, the CFQ was employed mostly in trials conducted in the UK (9/11 adoptions), while the CIS was preferred in the Netherlands (12/14 adoptions). On the other hand, the MFI, markedly preferred in recent studies along with the CIS, was originally developed for assessing multifarious fatigue status in patients with cancer [12]. The MFI was one of the measures in the Wichita clinical study, which assessed over 30 kinds of measurements or parameters for CFS/ME in 2005, and it was shown to be a valid measurement [78]. Recently, the MFI was applied in a large-scale study exploring the cytokine signature, which showed a positive correlation between serum levels of TGF-β and the severity of CFS/ME [79]. Both the MFI and CIS were created by Dutch researchers and contain 20 nearly identical questionnaire items. However, they differ in scoring: a maximum of 140 points with 7-point scales on the CIS versus a maximum of 100 points with 5-point scales on the MFI (Supplementary Table S2). Unlike the CFQ-11, the MFI and CIS adopt both positive and negative questions and measure PEM-related symptoms with items such as "I am tired very quickly or easily"; PEM is one of the recently established hallmark symptoms of CFS/ME [1].
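The maximum totals quoted above follow directly from the number of items and the number of response points per item. The sketch below reproduces that arithmetic and adds a reverse-keying helper for positively worded items. The helper assumes items are scored from 1 up to the top of the scale and uses a common reverse-scoring convention; it is an illustrative assumption and not the official scoring manual of either instrument.

```python
def max_total(n_items, points_per_item):
    """Highest possible total when every item receives the top score."""
    return n_items * points_per_item


def reverse_score(raw, points_per_item):
    """Reverse-key an item scored 1..points_per_item (assumed convention)."""
    if not 1 <= raw <= points_per_item:
        raise ValueError("raw score outside the item's scale")
    return points_per_item + 1 - raw


cis_max = max_total(20, 7)  # 20 items on 7-point scales
mfi_max = max_total(20, 5)  # 20 items on 5-point scales
print(cis_max, mfi_max)     # 140 100
```

Because the two instruments use different ranges, raw totals from the CIS and MFI are not directly comparable without rescaling.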
Numerous studies have confirmed the validity and reliability of these commonly used instruments for CFS/ME, such as the CFQ-11, CIS, and MFI [10,78,80], while some researchers have pointed out the ceiling effects of these measurements, especially in clinical trials of treatments [81,82]. Their concern is that sufferers of CFS/ME tend to report scores close to the maximum, which hinders accurate reflection of both the baseline condition and the treatment response. Most measurement tools (including CFS/ME-specific instruments) contain questionnaire items that are not specific to CFS/ME, such as "I feel tired" or "I feel weak", which are frequent complaints in the general population. Accordingly, many trials (mostly RCTs that adopted a CIS-based primary outcome) used cutoff scores in the process of participant inclusion (Supplementary Table S1). On the other hand, respondents to the CFQ-11 tend to obtain high scores because items are rated in comparison with "usual" or "last well-state". Because most CFS patients have experienced many years of the disease with fluctuating symptoms, assessment methods involving comparisons to "usual" can hardly reflect either deterioration in status or treatment response [2]. Thus, some studies have adopted a modified CFQ-11 with a 10-point Likert scale (from 0 points for healthy conditions to 9 points for the worst status) in RCTs for drug development related to CFS/ME [41].
Although no definitive pathophysiology of CFS/ME has been identified, some new findings have been highlighted, such as aberrant composition of the gut microbiome and altered serotonergic metabolism within the brain [83,84]. In addition, several studies have investigated objective parameters for diagnostic and severity assessments, including elevated levels of TGF-β and nanoelectronic assays [79,85]. One group also found a reduction of red blood cell deformability in patients with CFS/ME [86]. Along with these advances in knowledge, a CFS/ME-specialized measurement instrument needs to be developed to reflect clinical severity and treatment response, and objective biomarkers need to be discovered to confirm the diagnosis of CFS/ME.

Conclusions
This systematic review provides a comprehensive overview of the choice of primary measurements in RCTs for CFS/ME to date. Approximately 40% of RCTs applied multiple primary measurements.
Of the 17 kinds of measurement tools, the SF-36 (a non-fatigue-specific measurement) was the most frequently applied through 2015, while two fatigue-specific measurements, the CIS and MFI, have been frequently employed in recent trials. Our data will be helpful in the practical design of clinical studies for CFS/ME-related therapeutic development.

Conflicts of Interest:
The authors have no conflicts of interest to declare.