Patient-Reported Outcomes in Inflammatory Bowel Disease: A Measurement of Effect in Research and Clinical Care

The measurement of outcomes is key in evaluating healthcare or research interventions in inflammatory bowel disease (IBD). In patient-centred care, patient-reported outcome measures (PROMs) are central to this evaluation. In this review, we provide an overview of validated, adult disease-specific PROMs developed for use in IBD. Our aim is to assist clinicians and researchers in selection of PROMs to measure outcomes in their patient cohort. The Consensus-based Standards for the Selection of Health Measurement Instruments database of systematic reviews was the primary resource used to identify PROMs used in IBD. Search terms were ‘Crohn’s disease’, ‘ulcerative colitis’, and ‘IBD’. Seven systematic reviews were identified from this search. In addition, the publication by the IBD Core Outcome Set Working Group was used to identify further PROMs. Three systematic reviews were excluded as they did not meet the inclusion criteria. From the five included systematic reviews, we identified 21 PROMs and their shortened versions. In conclusion, it does not appear that any one PROM is entirely suitable for both research and clinical practice. Overall, the IBDQ-32 is most widely used in research but has the limitation of cost, whereas the IBD-Control has been recommended in the clinical core outcome set.


Introduction
Inflammatory bowel diseases (IBDs) are commonly categorized into two principal diseases, Crohn's disease (CD) and ulcerative colitis (UC). Both are chronic, debilitating diseases causing inflammation and ulceration in the gastrointestinal (GI) tract; manifestations outside the GI tract, for example, skin, eye, and joint diseases, can also occur, each with its own impact upon patients' quality of life. UC usually only affects the colon, whereas CD may affect the entire GI tract from mouth to anus. CD is not limited to mucosal disease (as is the case with UC) but can also manifest with stricturing and fistulating disease, each with varying symptom presentation and burdens. Presenting symptoms tend to cluster, with abdominal pain and fatigue being more common in those with CD, and bloody diarrhoea with urgency more common in those suffering from UC [1]. Physical symptoms may also precipitate anxiety, for example, anxiety about distance to the toilet, which is present in the majority (59%) of patients presenting for the first time with UC [1]. Sleep quality and side effects from medication such as steroids also have a detrimental impact upon anxiety and depression [2]. Management of IBD is generally focused on reducing mucosal inflammation and inducing disease remission. Bryant et al. [3] note that endoscopic mucosal healing is a primary outcome in both clinical treatment and clinical trials in IBD, in addition to the resolution of symptoms.
The measurement of outcomes is key to evaluating healthcare or research interventions in IBD. The Consensus-based Standards for the Selection of Health Measurement Instruments (COSMIN) is an initiative of a multi-disciplinary, international team of researchers with the aim of improving the selection of outcome measurement tools used in research [4]. Further work within the Core Outcome Measures in Effectiveness Trials (COMET) initiative seeks to identify core outcome sets, each an agreed standardised set of outcomes that should be measured in all clinical trials of a specific disease or population. This commonality is essential to allow effective synthesis and comparison of research outcomes. In patient-centred research and care, patient-reported outcomes must be a priority and potentially provide the most meaningful evaluation of interventions. A core outcome set has been devised for the IBD healthcare setting considering four key elements: survival and disease control (measured by a disease index); healthcare utilization; disutility of care (e.g., disease complications and steroid use); and symptoms, function, and quality of life, measured by a patient-reported outcome measure [5].
Patient-reported outcome measures (PROMs) are tools often presented in the form of patient-focused questionnaires that may be generic or disease specific. The United States Food and Drug Administration [6] defines a PROM as any report of the status of a patient's health that comes directly from the patient. The report must be without interpretation of the patient's response by a clinician, and without laboratory assessments or measurements to inform the response [6]. This sets PROMs apart from commonly used disease indices in IBD, such as the Crohn's Disease Activity Index, that often combine patient reporting with other clinical parameters. Within IBD, Bojic et al. [7] reported that there are 23 different PROMs, including shortened forms, which have been developed with the aim of adequately capturing the patient's perception as measurements of outcomes from treatment interventions. These often comprise questions related to the impact of the disease, disability related to the disease, and/or health-related quality of life. With so many tools available, it is challenging to select an appropriate PROM for use. Additionally, it is unclear if any specific PROM is appropriate for both research purposes and clinical care.
In this review of systematic reviews, we provide an overview of identified, validated, adult disease-specific PROMs developed for use in IBD. We consider their application in both the research and clinical settings, with the aim of determining which of these may be most practical in each setting, and whether a single PROM could be used across all settings. Our aim is to assist clinicians and researchers in selection of PROMs to measure outcomes in their patient cohort.

Search Strategy
For the purposes of this narrative review, the focus was on validated tools only. For this reason, the COSMIN database of systematic reviews of outcome measurements was the primary resource used to identify PROMs used in IBD, as recommended by COMET [8]. Search terms were 'Crohn's disease', 'ulcerative colitis', and 'IBD'. As all reviews on the COSMIN database relate to outcome measures, it was not necessary to include 'PROM' as a search term. Seven systematic reviews were identified from this search. In addition, the publication by the IBD Core Outcome Set (COS) Working Group was used to identify further PROMs [5]. Two systematic reviews claimed evaluation of PROMs but included measures that were primarily disease indices that did not meet the required definition of a PROM. Three systematic reviews were excluded (Figure 1).

Data Extraction
The list of PROMs from each included paper was tabulated with key characteristics as described within the paper, including how it was administered, the number of questions, recall period, domains, and psychometric properties (Table 1). Relevant comments or observations made either by the systematic review authors or this group were also recorded.

Quality Evaluation
The systematic reviews were explored for overall quality of their process using the Critical Appraisal Skills Programme (CASP) [9] tool for systematic reviews. Each review was also scrutinized to identify the quality appraisal they had undertaken of each included PROM or disease activity measure.

Results
From the five included systematic reviews, we identified 21 PROMs and their shortened versions (Table 1). All the reviews included papers related to more than one PROM.
The CASP appraisal of the reviews revealed that only one [10] evaluated the methodological quality of the PROM papers they included beyond exploring the psychometric properties. All the systematic reviews examined the included PROMs for validity, consistency, and reliability to varying degrees of precision, but only two used a full range of criteria for reliability, validity, and responsiveness (Table 2). Only one review looked at cross-cultural validity [11], and only one looked at establishment [10].
A number of different grading systems were used across the reviews to explore psychometric quality. For example, Chen et al. [11] and Alrubaiy et al. [10] used the COSMIN checklist with a four-point scale [4]. Kim et al. [5] used a low, medium, and high grading system in assessing psychometric properties, but it was not clear on what criteria these grades were based. In developing the COS, Kim et al. [5] were seeking specific domains and thus additionally assessed PROMs against their chosen domains and applicability to clinical practice. Pallis et al. [12] did not describe a grading system. Dhruva et al. [13] used criteria developed by Streiner et al. in 1995 [14].
The domains of each PROM are detailed in Table 3, either as described within the systematic review or from the primary source if available.
Table 3. Domains of patient-reported outcome measures described in published systematic reviews.

Discussion
Selection of the most appropriate PROM for use in trials or clinical practice begins with understanding the measurement properties of that tool. Without this step, results can be biased and untrustworthy. The COSMIN initiative [36] aims to improve outcome measure selection by standardizing the selection process. The COSMIN risk of bias checklist [37] is recommended for systematic reviews to determine the methodological quality of each of the included studies. The purpose of applying this checklist is to determine whether the results of the reviews are trustworthy. In this process, content validity is the most important measurement property because it is essential that all the items in the PROM are relevant and complete regarding the target population. Structural validity, internal consistency, and cross-cultural validity are also examined to allow for scrutiny of the internal structure of the PROM, that is, the relationship between the items and the sub-scales of the instrument.
The systematic reviews we evaluated applied several tools to evaluate the psychometric properties of their selected PROMs. The findings varied across the reviews, and this, together with the lack of a consistent approach, prevented effective comparison. One study did not explore the psychometric properties of any of the tools they selected [5]. This was likely because the focus of their process was selection, by expert opinion, of the core outcome set that would be most appropriate for the mixed IBD population they were interested in. The approach that was used followed an established Delphi process [38]. Only two studies used the COSMIN approach [39,40], and these were also the studies that explored the psychometric properties of their included studies most completely. Even so, their gradings differed: Chen et al. [11] graded the EIBDQ tool 'good' for consistency, but Alrubaiy et al. [10] graded it 'poor'; Chen et al. rated the IBDQ-32 'good' for consistency, but Alrubaiy et al. rated it 'fair/poor'. It is likely that a greater level of consistency in their findings would have been achieved had all authors defined terms in the same way and been guided as to the criteria that correspond with levels of quality. Overall, two out of the five systematic reviews [39,40] provide high-quality, trustworthy information about the selected PROMs.
In terms of an overall recommendation of PROMs, Kim et al. [5] strongly recommended the IBD-Control questionnaire [35] as a quick and easy tool to use in clinical care that did not require a license for use. They recognised that the IBDQ was the tool most used in research, but the need for a license and the time taken for completion have meant it has not been embedded successfully in clinical practice. Three reviews concluded that the IBDQ-32 was the most widely used and published instrument with good reliability and validity [10][11][12], and it is available in several languages. Comparability of results between trials and clinical populations is a key benefit of using a tool that has been used extensively, and when a measure has been determined to be appropriate for the population, it should be widely adopted. Thus, while there are myriad tools available, these reviews favour the IBDQ-32 and the IBD-Control.

IBDQ-32
The IBDQ-32 is composed of 32 comprehensive questions related to IBD symptoms, general well-being, and mood over the previous 2 weeks. Questions ask how often during the last 2 weeks the patient has, for example, 'had to avoid attending events', 'had a problem with passing large amounts of gas', 'felt worried or anxious', or been 'troubled by nausea'. The PROM is patient administered, with patients selecting responses from a seven-point scale, phrased as 'none of the time' (1) to 'all of the time' (7), or terminology specific to the question such as 'no energy at all' (1) to 'full of energy' (7).
Our previously unpublished work on patient and public involvement in research suggests that the terminology used in PROMs is important to patients generally, as well as to those with a stoma particularly, when describing IBD. Many tools refer to 'having bowels opened' or visits to the toilet, which are not relatable to their experience of having a stoma. From the PROMs described, only the IBDQ has a version specifically for patients with a stoma.
The scoring system (1-7 per question) allows for detection of subtle changes in the patient's perceived quality of life over a period of time. The widespread use of the IBDQ and range of populations and languages where validity and reliability have been demonstrated make it a good candidate for widespread adoption. However, its usefulness is limited in clinical practice, or where research funding is an issue, due to the need to purchase a license for copyright use.
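The arithmetic behind this sensitivity can be illustrated with a minimal sketch. It assumes that the total score is the simple sum of the 32 item responses, which gives a possible range of 32 to 224; the function name `ibdq32_total`, the validation rules, and the example responses are hypothetical and are not taken from the licensed instrument.

```python
from typing import Sequence

N_ITEMS = 32       # number of questions in the IBDQ-32
MIN_RESPONSE = 1   # lowest point on the seven-point scale
MAX_RESPONSE = 7   # highest point on the seven-point scale


def ibdq32_total(responses: Sequence[int]) -> int:
    """Sum 32 item responses, each on the 1-7 scale (assumed simple sum)."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    for r in responses:
        if not MIN_RESPONSE <= r <= MAX_RESPONSE:
            raise ValueError(f"response {r} is outside {MIN_RESPONSE}-{MAX_RESPONSE}")
    return sum(responses)


# Hypothetical data: a single item moves by one point between two visits.
baseline = [4] * 32
follow_up = [4] * 31 + [5]
print(ibdq32_total(baseline))   # 128
print(ibdq32_total(follow_up))  # 129
```

Under these assumptions, a one-point shift on a single question is visible in the total, which is the kind of subtle change the seven-point format is able to register.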

IBD-Control
The IBD-Control is a patient-administered questionnaire of 14 questions presented in a variety of formats. Responses are primarily 'yes', 'no', or 'not sure', except for question 2 regarding bowel symptoms over the previous two weeks, for which the options are 'better', 'no change', or 'worse'. Other questions relating to IBD symptoms cover pain, sleep, fatigue, and missed activities. A further question asks the patient to indicate what they would like to discuss at their next clinic appointment, including changing treatment, side effects of treatment, new symptoms, or disease self-management. The final question asks the patient to rate the overall control of their IBD over the previous two weeks and is presented as a visual analogue scale from the worst possible/no control to the best possible. The terminology used in the questions is neutral and does not relate specifically to bowel movements or emptying a stoma bag, and is therefore relevant to patients with or without a stoma. However, the section of the IBD-Control regarding discussion points at the next clinic visit is unlikely to be useful within the context of a clinical trial, where participants are unlikely to have the option to change treatments or treatment patterns.

Disability Versus Quality of Life
Allen et al. [29] define disease-related quality of life as the subjective feelings and experiences of the patient, whereas disability relates to the restrictions and limitations on normal activity caused by the disease. Of the PROMs reviewed, some focus specifically on disability or the impact of disability, for example, the IBD Disability Score [29] or the IBD Disability Index [30]. Others focus on disease-related aspects of quality of life, such as the IBDQ [41] and IBD-Control [35]. The closest we can get to understanding whether patients have a view as to the utility of each type of tool comes from the Kim et al. study [5]. Their process for identifying the most appropriate set of measures or domains involved service users. Their focus groups with patients identified the importance of issues such as survival and complications from treatment, but it is difficult to identify the patients' preferences about other selected measures from their report. Generally, there is a lack of good-quality evidence about which measures patients consider most valuable. Therefore, clinicians and researchers must determine which aspect of the patient's experience they are attempting to capture when selecting the most appropriate PROM and, where possible, use Public and Patient Involvement groups and principles to make sure they are setting out to measure what is of real importance to the patient.

Limitations
The search strategy is a limitation in our findings. In relying on previous systematic reviews, we did not capture all currently available IBD PROMs, only those that had been included in a published systematic review that appears on the COSMIN database. Other PROMs are available that have not been included in our analysis. For example, in a Delphi consensus process, Ghosh et al. [40] developed the patient-administered IBD Disk, a shortened version of the IBD Disability Index often used in research, for routine use in clinical practice. The IBD Disk has only recently been validated and has been shown to have good validity and high consistency [41]. The IBD Disk is composed of 10 statements, each of which patients score from 0 (absolutely disagree) to 10 (absolutely agree), with the highest score reflecting the greatest burden of disease. The statements relate to 10 key areas of difficulties including 'abdominal pain, controlling defecation, interpersonal interactions, education/work, sleep, energy, emotions, body image, sexual function and joint pain'. The tool is presented in a user-friendly, coloured disk format. The visual format makes it easy to see a pattern of change in the patient's perceived disease burden over time.
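The way a pattern of change can be read from repeated IBD Disk scores can be sketched in a few lines. This is a minimal illustration only, assuming one 0 to 10 score per area listed above; the function names `check_scores` and `domain_changes` and the two sets of visit scores are hypothetical and are not part of the published tool.

```python
from typing import Dict

# The 10 areas named in the text above; each is scored from 0 to 10.
DOMAINS = [
    "abdominal pain", "controlling defecation", "interpersonal interactions",
    "education/work", "sleep", "energy", "emotions", "body image",
    "sexual function", "joint pain",
]


def check_scores(scores: Dict[str, int]) -> None:
    """Raise ValueError unless every area has exactly one score in 0-10."""
    if set(scores) != set(DOMAINS):
        raise ValueError("scores must cover exactly the 10 areas")
    for domain, value in scores.items():
        if not 0 <= value <= 10:
            raise ValueError(f"{domain}: score {value} is outside 0-10")


def domain_changes(earlier: Dict[str, int], later: Dict[str, int]) -> Dict[str, int]:
    """Per-area change (later minus earlier); positive means greater burden."""
    check_scores(earlier)
    check_scores(later)
    return {d: later[d] - earlier[d] for d in DOMAINS}


# Hypothetical scores for one patient at two clinic visits.
visit_1 = {d: 5 for d in DOMAINS}
visit_2 = dict(visit_1, sleep=8, energy=3)
changed = {d: c for d, c in domain_changes(visit_1, visit_2).items() if c != 0}
print(changed)  # {'sleep': 3, 'energy': -2}
```

Reporting change per area, rather than a single total, mirrors the disk format, in which each area is displayed separately.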
The IBD Disk is included in a recent systematic review by Van Andel et al. [42] that did not appear on the COSMIN database search. This review identified many of the PROMs included in our analysis with the addition of a large number of others, including some that appear to be disease indices rather than true PROMs, the IBD-QOL questionnaire and Function Related Quality of Life Instrument, both intended for use in clinical trials, as well as multiple other versions of the IBDQ. In total, they report 44 different IBD-related PROMs. Of these, they report that only five have sufficient evidence. These included, in terms of quality of life, the Crohn's Life Impact Questionnaire (CLIQ) and IBDQ-32 of moderate quality and of low quality for comprehensiveness, as well as the IBD-Control with sufficient comprehensiveness and low-quality evidence. The CLIQ comprises 27 statements to which the patient responds true (score 1) or false (score 0). Statements include 'I only feel comfortable at home', 'I feel dependent on others', and 'I rarely feel clean'. The authors suggest that the PROM is useful in both clinical and research settings, with the advantage of being specific to CD for this patient group. However, the binary nature of the scoring system may make it difficult to detect subtle changes in the patient's quality of life over a period of time.
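The binary scoring of the CLIQ can be sketched as follows. The sketch assumes that the total is a simple count of the statements answered 'true', giving a range of 0 to 27; the function name `cliq_total` and the example answers are hypothetical.

```python
from typing import Sequence

N_STATEMENTS = 27  # number of true/false statements in the CLIQ


def cliq_total(answers: Sequence[bool]) -> int:
    """Count the statements answered 'true' (true scores 1, false scores 0)."""
    if len(answers) != N_STATEMENTS:
        raise ValueError(f"expected {N_STATEMENTS} answers, got {len(answers)}")
    return sum(1 for answer in answers if answer)


# Hypothetical answers: 'true' to the first 9 statements, 'false' to the rest.
answers = [True] * 9 + [False] * 18
print(cliq_total(answers))  # 9
```

Because each statement contributes either 0 or 1, the total moves only when a patient's answer to a statement flips entirely, which is why gradual change within a statement would go unrecorded.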

Conclusions
The four PROMs discussed each have a slightly different focus and format, with the IBDQ and CLIQ focusing on disease-related quality of life, the IBD-Control on disease control, and the IBD Disk on disease burden. The sheer number of PROMs identified, the variability in quality of evidence, and the dilemma between measuring quality of life or disability demonstrate the scale of the challenge posed to clinicians and researchers in selecting appropriate tools. In this respect, it seems reasonable to focus outcome measures on those items identified in the Kim et al. COS [5] for both research and clinical practice. This will enable effective comparison of research results and clinical outcomes in patients with IBD. In terms of selection of an appropriate PROM, it does not appear that any one tool is entirely suitable for both research and clinical practice. The IBDQ-32 is most widely used in research but has the limitation of cost. The IBD-Control is recommended in the clinical COS but contains questions related to what patients would like to discuss at their next appointment, which may not be relevant to research aims in a clinical trial. In addition, differences in patient experience between those with and without a stoma must be respected. However, if it is possible to narrow the number of PROMs used in practice to these two, then comparison of outcomes could be more effective.