Trials and Tribulations of Collecting Evidence on Effectiveness in Disability-Inclusive Development: A Narrative Review

Disability-inclusive development is important because there are a billion people with disabilities, and they often fall behind in income, education, health, and wellbeing. More and better evidence is needed on the effectiveness of how development interventions include and target people with disabilities. This review outlines some of the methodological challenges facing impact evaluations of disability-inclusive development interventions. Identifying people with disabilities is complex. Most approaches focus on impairment or functional limitations. They may or may not recognise environmental or personal factors, which influence the experience of disability. The Washington Group Short Set is widely endorsed for disability assessment; the addition of anxiety and depression items may enhance this tool further. The appropriate outcomes for the impact evaluation should be selected based on the aims and target audience of the intervention, the availability of appropriate tools, and after consultation with people with disabilities. New and better tools are needed to measure the range of impacts that may occur with greater accuracy, including impacts that are direct/indirect, proximal/distal, intended/unintended, and positive/negative. Disaggregation of data by impairment type is recommended to understand the effectiveness of interventions for different groups where the sample size is sufficient to allow meaningful comparisons. The inclusion of people with disabilities throughout the research process will improve the quality and acceptability of the study conducted.


Background: Why Do We Need More Evidence on Disability-Inclusive Development?
Development scholars, practitioners and policy-makers are increasingly taking notice of the needs of minority groups, recognising that they will have particular priorities and vulnerabilities. Among these groups, people with disabilities are important as they number approximately one billion globally, and they face widespread exclusion [1]. For example, people with disabilities are less likely to go to school, be employed, or be engaged in the community, and as a result, they are more likely to be poor [2,3]. They are also more likely to have poor health, which will further exacerbate these deprivations [4]. These impacts matter for individuals and their families, as they aim to live a full and engaged life. These exclusions are also a violation of fundamental human rights, as set out in the UN Convention on the Rights of Persons with Disabilities, as well as the laws of most countries [5]. Furthermore, development targets, such as the Sustainable Development Goals, will be impossible to reach without addressing this large and frequently marginalised group. Including disability in the development agenda and programmes is therefore important to maximise the quality of life of people with disabilities and their families, enable the realisation of rights, and contribute towards the achievement of development goals.

Disability Measurement
A key methodological challenge is the identification or classification of people with disabilities. The WHO states that "disability is complex, dynamic, multidimensional, and contested" [1]. While this is indisputably true, it does not provide guidance for programme teams or researchers who are attempting to identify participants with disabilities for programmes and studies. Moreover, disability lies on a continuous spectrum, so categorising people as disabled or not requires arbitrary thresholds to be set. There are also many types of disability (e.g., physical, visual, cognitive).
A helpful framework for conceptualising disability is that of the WHO-International Classification of Functioning, Disability and Health (ICF) [9]. This model includes three components of disability: impairments, activities and participation (Figure 1). The presence of a health condition is considered a necessary feature of disability. This condition may be a stroke, diabetes, or HIV, as examples. The health condition may cause an abnormality in body structure or function, known as an impairment. For instance, diabetes may cause visual impairment from diabetic retinopathy, while strokes may result in mobility or cognitive impairments. As a result of the impairment, people may find it difficult to perform activities, such as walking, communicating or understanding. Environmental factors (e.g., presence of ramps, laws protecting the rights of people with disabilities) and personal factors (e.g., wealth, education, social support) play important roles in determining whether and how impairments lead to activity limitations. Activity limitations, again in association with certain environmental factors and personal factors, may lead to participation restriction, such as exclusion from employment, education or social life. Clearly, the pathway from a health condition to participation restriction will not be the same for everyone. It is also apparent that enhancing health services, assistive devices, environmental and personal factors can limit the disabling effect of a health condition, and these are key targets of disability-inclusive development. The ICF model can therefore serve as a conceptual framework for classifying people as having a disability or not, and a variety of methods are available.
The simplest technique is to ask the direct question "Are you disabled?" or "Do you have a disability?" While this question is short and succinct, it is likely to substantially underestimate the true prevalence of disability, as people may not consider themselves to be disabled (e.g., older adults with functional limitations due to ageing) or may not want to declare themselves as such (e.g., in the United Kingdom, fifty per cent of people covered by the Disability Discrimination Act do not identify as disabled, according to Department for Work and Pensions research) [10,11]. Alternatives exist to this simple approach. The UK 2011 Census question asked, "Are your day-to-day activities limited because of a health problem or disability which has lasted, or is expected to last, at least 12 months?" The guidance is to "include problems related to old age" to ensure that this group is not missed. However, this question may still miss certain groups, such as people with mental health conditions.
A second approach to measuring disability is to use clinical tools to directly determine the presence of an impairment, such as visual impairment, hearing impairment or cognitive deficit. While this approach is objective and is important for planning medical services, it does not capture the impact that the impairment has on the person's day-to-day life, and it ignores the impact of the environment. Moreover, clinical assessments can be costly and time-consuming and thus are beyond the scope of many surveys or censuses. Mobile-based tools are being developed to fill this gap by allowing cheap and quick measurement of impairments by lower grade health workers, such as Peek for visual impairment and HearTest for hearing assessments [12,13]. However, such tools are not yet available for all impairment types.
Another method of measuring disability is through self-reported functional assessment. This approach is used in the Washington Group question sets [14], which are widely recommended for disability assessment in surveys and censuses (Box 1). Here, people are asked if they have difficulties in different domains of functioning. For the Washington Group Short Set, these are: seeing, hearing, walking, remembering/concentrating, self-care and communicating. Those who report that they have "a lot of difficulty" in a domain, or "cannot do it at all", are classified as being at high risk of participation restriction, which is taken as a proxy for "disabled". The Washington Group Short Set questions have been widely used, which has created a pool of comparable data across countries and allows disaggregation of census and other survey data by disability (more on this later).
However, this method is also not without its limitations. The Washington Group tool measures functioning and is aligned with activities, but not participation. Additionally, there are concerns about the international comparability of the questions, as well as the interpretation of "a lot of difficulty" (and therefore disabled) versus only "some" difficulty (not disabled) [15]. The cut-off of "a lot" is relatively restrictive and thus may underestimate the true prevalence. Finally, the short set does not directly capture mental health conditions, such as depression and anxiety, which are common and influence participation. (These items are in the Washington Group Extended Set.) As a consequence, the Washington Group Short Set may underestimate the prevalence of disability. For instance, a study in the United Kingdom showed that the prevalence using the census question was three times higher than for the Washington Group Short Set, attributed to the exclusion of mental health issues and the higher severity threshold [16]. One solution is to use the Washington Group Short Set "Enhanced", which includes two additional items on the use of hands/arms and four items on anxiety/depression (Box 1).

Box 1. Washington Group Short Set.

1. Do you have difficulty seeing, even if wearing glasses?
2. Do you have difficulty hearing, even if using a hearing aid?
3. Do you have difficulty walking or climbing steps?
4. Do you have difficulty remembering or concentrating?
5. Do you have difficulty (with self-care such as) washing all over or dressing?
6. Using your usual (customary) language, do you have difficulty communicating, for example understanding or being understood?

Response options: No difficulty/Yes-some difficulty/Yes-a lot of difficulty/Cannot do at all

People are classified as having disabilities if they answer "a lot of difficulty" or more in at least one category.

The Washington Group Short Set Enhanced includes additional items on upper body, anxiety and depression:

7. Difficulty raising 2 litre bottle of water from waist to eye level? (No difficulty/some difficulty/a lot of difficulty/cannot do at all)
8. Degree of difficulty using hands and fingers? (No difficulty/some difficulty/a lot of difficulty/cannot do at all)
9. How often do you feel worried, nervous or anxious? (Daily/Weekly/Monthly/A few times a year/Never)
10. Thinking about the last time you felt worried, nervous or anxious, how would you describe the level of these feelings? (A little/A lot/Somewhere in between a little and a lot)
11. How often do you feel depressed? (Daily/Weekly/Monthly/A few times a year/Never)
12. Thinking about the last time you felt depressed, how would you describe the level of these feelings? (A little/A lot/Somewhere in between a little and a lot)

People are classified as having a disabling mental health condition if they answer "daily" and "a lot" for depression or anxiety.
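As a rough illustration of the classification rules in Box 1, the following Python sketch applies the Short Set cut-off and the Enhanced mental health rule to a single respondent. The numeric response codes (1 to 4), the dictionary keys and the function names are assumptions made for this example only; they are not part of the Washington Group materials, and the Washington Group's own analytic guidance should be followed in a real analysis.

```python
# Assumed numeric coding of the four functioning response options.
NO_DIFFICULTY, SOME_DIFFICULTY, A_LOT_OF_DIFFICULTY, CANNOT_DO = 1, 2, 3, 4

# Hypothetical keys for the six Short Set domains (items 1 to 6 in Box 1).
SHORT_SET_DOMAINS = (
    "seeing", "hearing", "walking",
    "remembering", "self_care", "communicating",
)


def has_disability_short_set(responses, threshold=A_LOT_OF_DIFFICULTY):
    """Return True if any Short Set domain is rated at or above the threshold.

    The default threshold reproduces the rule in Box 1: "a lot of
    difficulty" or more in at least one domain.
    """
    return any(responses[domain] >= threshold for domain in SHORT_SET_DOMAINS)


def has_disabling_mental_health_condition(responses):
    """Return True if anxiety or depression is reported "daily" and "a lot"."""
    anxiety = (
        responses.get("anxiety_frequency") == "daily"
        and responses.get("anxiety_level") == "a lot"
    )
    depression = (
        responses.get("depression_frequency") == "daily"
        and responses.get("depression_level") == "a lot"
    )
    return anxiety or depression


# An invented respondent: some difficulty seeing, plus daily severe anxiety.
respondent = {
    "seeing": SOME_DIFFICULTY,
    "hearing": NO_DIFFICULTY,
    "walking": NO_DIFFICULTY,
    "remembering": NO_DIFFICULTY,
    "self_care": NO_DIFFICULTY,
    "communicating": NO_DIFFICULTY,
    "anxiety_frequency": "daily",
    "anxiety_level": "a lot",
    "depression_frequency": "monthly",
    "depression_level": "a little",
}

print("Short Set:", has_disability_short_set(respondent))
print("Short Set, 'some' cut-off:",
      has_disability_short_set(respondent, threshold=SOME_DIFFICULTY))
print("Mental health rule:", has_disabling_mental_health_condition(respondent))
```

This invented respondent is not classified as disabled by the Short Set at the standard cut-off, but is classified once either the lower "some difficulty" cut-off or the anxiety and depression items are applied, which mirrors the two sources of underestimation described above.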
Using different approaches will generate different estimates of the prevalence of disability. For instance, in a survey in India, the prevalence of self-reported disability through the single question was 3.8% (95% CI 2.9-4.9%) [15]. The prevalence as assessed through the Washington Group Short Set was 7.5% (5.9-9.4%) and through impairment measures was 10.5% (9.4-11.7%). These different estimates will have different implications for planning services in the population.
An overarching concern is that none of these measures fully accounts for environmental barriers, which are variously reported as being a disability [17] or a major contributor to disability [18]. Nor do these measures account for personal factors, which may also be critical. These dimensions are captured in other existing tools, such as the Model Disability Survey (MDS) [19]. The MDS brief module includes 38 items, across dimensions of environmental factors, functioning, capacity and health conditions, and personal assistance and assistive products. It takes approximately 12 minutes to administer. A more extensive version is also available, but it may take more than two hours to complete. A concern with the MDS measure of disability is that it gives a continuous score, rather than allowing people to be classified easily as disabled or not, although this does reflect the reality of disability.
The key message is that different approaches are needed to measure disability. Selecting the appropriate tool will depend on how many questions can be included in a questionnaire and whether it is important to gather impairment data (e.g., for developing health services).

Definition of Outcome: What Is Effectiveness?
The ambition of disability-inclusive development is to end poverty and deprivations for people with disabilities. These outcomes can be considered in terms of the Sustainable Development Goals (SDGs), as an example, and thus include the reduction in poverty (SDG 1) and hunger (SDG 2), improvement in health (SDG 3), education (SDG 4) and livelihoods (SDG 8). Different disability-inclusive development programmes, therefore, have different aims and focus on different sectors. However, they generally attempt to reach their goals by improving the participation of people with disabilities. Enhanced participation is achieved in two main ways:

1. Improving the capabilities of individual people with disabilities (e.g., through rehabilitation, vocational training or provision of assistive devices). This category can include compensating people with disabilities through social assistance.

2. Changing environmental factors, for example, by improving attitudes towards people with disabilities, removing architectural barriers, and/or promoting institutional change (e.g., by establishing laws to protect the rights of disabled people).
These different approaches have pros and cons. Targeting individuals may be most straightforward, but does not tackle underlying drivers of exclusion among people with disabilities. Providing social assistance can help people transition out of poverty, but these transfers usually have to be claimed by people with disabilities, which will mean not everyone gets them, there may be an associated stigma, and there is an administrative cost. Environmental changes may be difficult to achieve, but should benefit other groups as well and continue into the future.
Programmes, therefore, differ not only in their sectors of intervention (e.g., health, education), but also in the approach they use to promote inclusion. The measures of effectiveness will consequently vary for impact evaluations of different types of disability-inclusive development programmes. Research studies, therefore, need to develop a logic model for their intervention, such as by producing a Theory of Change, which will help to identify appropriate effectiveness measures. In this process, researchers outline the intervention that they are investigating and how they believe it operates, which shows the relevant intermediate and impact outcomes that should be captured by the study. These measures must reflect outcomes that are important for people with disabilities, so that the success or failure of programmes is defined on their own terms. People with disabilities should therefore participate in programme logic modelling, such as the Theory of Change determination, and the selection of outcome measures. Measures must also be broad enough to capture impacts that are intended or unintended, positive or negative, direct or indirect.
As an example, we undertook an impact evaluation of the Disability Allowance in the Maldives [20]. In this programme, a monthly payment is made to people with disabilities, and the programme also offers referrals to medical services, such as rehabilitation or provision of assistive devices. Our Theory of Change for the programme was developed with key stakeholders, including people with disabilities. It set out that the programme would have a number of intermediate outcomes, such as increased employment and school participation, and that the ultimate outcomes anticipated were reduced poverty and improved quality of life for people with disabilities and their families. We therefore developed a questionnaire that captured the intermediate and ultimate outcomes and administered this questionnaire at baseline and one year after enrolment in the Disability Allowance.
Different steps in the Theory of Change can therefore be measured as intermediate outcomes or impacts of disability-inclusive development. However, each of these individual outcomes may be difficult to measure because tools are lacking or incomplete or are not appropriate for people with disabilities. For instance, tackling "stigma" is often a stated priority of disability-inclusive development programmes, as stigma is believed to be widespread and exacerbates exclusion. However, a recent systematic review of interventions to reduce the stigma experienced by children with disabilities and their families in LMICs found a lack of validated and consistently used scales [21]. A further concern is the lack of clarity of what we mean when we are discussing stigma. Do we mean negative attitudes of others towards the person with disabilities, or does it include self-stigma? Does stigma include discrimination, whereby people are excluded from services, or is that something else? And does stigma encompass emotional and verbal abuse, or even physical violence, or is this manifestation of negative attitudes a separate construct? Perhaps it is best to abandon discussions of stigma and instead focus on the different dimensions of negative attitudes, but these will need to be defined with respect to disability, and appropriate tools will need development.
Improving "participation" is another frequent goal of disability-inclusive development but is also tricky to measure. A variety of approaches have been proposed to measure the "participation" of people with disabilities. For instance, the Norwegian research organization SINTEF has conducted a number of surveys of living conditions among people with disabilities in different countries in Africa and Asia [22]. They included items on participation in community and family life. However, the desire to be engaged in each of these activities will vary between people, regardless of disability; some people will choose not to engage in community activities, irrespective of whether they can participate. It may therefore be better to think in terms of participation potential, or in Sen's terms, "capability" to participate, rather than participation itself [23]. Yet capabilities are also hard to capture, and we often end up measuring functioning instead. That is, we focus on what people have done/do, rather than what they are able to do.
Another issue arises when tools do exist but do not capture the experience of people with disabilities. Let us consider poverty. Many standard tools exist to measure poverty, whether focusing on assets, income, expenditure or multi-dimensional measures. However, people with disabilities frequently incur extra costs as a result of their disability because of their need for assistive devices, medicines, personal assistants, accessible transport, and so on. Therefore, for a given level of income or expenditure, people with disabilities often have a lower standard of living compared to those without disabilities. This is what Sen means by "Conversion Handicap" [24,25]. An income level which provides a good standard of living for someone without a disability may not be sufficient for someone who is disabled. We, therefore, need to adjust poverty measures to take these extra costs into account, but methods are lacking as to how this should be achieved.
Similarly, people with disabilities will on average have higher healthcare needs than those without disabilities [26,27]. Therefore, comparable levels of health service utilization among those with and without disabilities in reality mean that people with disabilities have greater unmet needs. Another example is access to clean water, adequate sanitation and hygiene. A questionnaire may indicate that water and sanitation are available in a household, yet people with disabilities may not be able to use these facilities, and thus the standard questionnaire will fail to capture their marginalisation [28].
In other scenarios, the standard outcome measures do not need adjustment but are inappropriate for people with disabilities. For instance, educational interventions are unlikely to equalise school exam results for people with severe intellectual impairments compared to their peers without. This does not mean, however, that education is useless for this group, but rather that different outcomes of educational interventions should be assessed, such as the ability to live independently, communication skills, happiness, or self-confidence.
The key message is that defining the outcomes for disability-inclusive development interventions needs to take into consideration the ambitions of the programme and be selected in partnership with people with disabilities. New tools may need to be developed.

Diversity and Disaggregation: For Whom Is It Effective?
Throughout this review, we have written about "people with disabilities", but in fact, this is an extremely heterogeneous group that includes people with different impairment types, across the full range of age, gender, ethnicity, and living in different environmental contexts. Even a more restricted category will still be extremely diverse. For instance, the category of children with disabilities related to Zika includes children with severe impairments and those more mildly affected, those in rich households with better access to services, and those in deprived households who are already struggling, children living in Brazil and others living in Angola, and so on.
This diversity is an important feature of disability, as specific interventions may be most suitable or most needed for certain groups living in particular situations. For example, children with intellectual disabilities are often the most excluded, and arguably among the most complex to include in education. In contrast, children with visual impairments often fare better and inclusion may be more straightforward [29]. Violence is a major concern for all people with disabilities [30], but women and girls may be at particularly high risk. It is therefore important that we test the impact of disability-inclusive development interventions for particular groups, or that results are disaggregated by features such as gender and impairment type.
Disaggregation of data by impairment type is not always feasible. Consider a trial that was undertaken of disability-inclusive sanitation promotion in Malawi [31]. The sample size required for the trial was estimated by comparing the likely improvement in sanitation facilities in the group that received the inclusive sanitation programme (intervention arm) with that in the group that did not (control arm). In this study, the assumption was made that in the control arm approximately 30% of households would improve their sanitation facilities during the course of the study, but that this would reach 50% in the intervention arm. Factoring in other things, like the loss to follow-up, the researchers concluded that they would need to include 175 households with persons with disabilities in each arm of the study to have a power of 80% to detect the expected difference. However, when the researchers started the study the prevalence of disability was much lower than they had anticipated, and therefore they only managed to find half the number of people with disabilities required (171 total, 78 in the control arm and 93 in the intervention arm), leaving the study under-powered. Disaggregating the data by disability type exacerbated this problem; "difficulties seeing" was the largest disability type but still only encompassed 21 controls and 26 intervention participants. With these group sizes, there was a lack of power to show whether the effect of the intervention differed between groups. Disaggregation of data is therefore unlikely to allow us to determine if an intervention is more or less effective for one disability type than another, unless the study is extremely large, which is usually not feasible.
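The arithmetic behind this example can be sketched with the standard normal-approximation formula for comparing two proportions. The sketch below is an illustration under stated assumptions, not the trial's actual calculation: it assumes a two-sided significance level of 5% (not given in the text), ignores clustering and loss to follow-up, and uses function names invented for this example.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def sample_size_per_arm(p_control, p_intervention, alpha=0.05, power=0.80):
    """Unadjusted sample size per arm to detect a difference in two proportions.

    Uses the normal approximation with unpooled variances. No allowance is
    made for clustering or loss to follow-up.
    """
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    variance = (p_control * (1 - p_control)
                + p_intervention * (1 - p_intervention))
    effect = p_intervention - p_control
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


def approximate_power(p_control, p_intervention, n_control, n_intervention,
                      alpha=0.05):
    """Approximate power of a two-sided test for the given arm sizes."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    standard_error = sqrt(
        p_control * (1 - p_control) / n_control
        + p_intervention * (1 - p_intervention) / n_intervention
    )
    return STD_NORMAL.cdf(abs(p_intervention - p_control) / standard_error
                          - z_alpha)


# Assumed proportions from the text: 30% (control) versus 50% (intervention).
required = sample_size_per_arm(0.30, 0.50)

# "Difficulties seeing" subgroup: 21 controls and 26 intervention participants.
subgroup_power = approximate_power(0.30, 0.50, n_control=21, n_intervention=26)

print(f"Unadjusted households per arm: {required}")
print(f"Approximate power in the seeing subgroup: {subgroup_power:.2f}")
```

Under these assumptions the unadjusted requirement is 91 households per arm, well below the 175 reported; the gap presumably reflects the adjustments the researchers factored in, such as loss to follow-up. For the "difficulties seeing" subgroup, the same approximation gives power of roughly 30% to detect even the overall 30% versus 50% difference, and showing that the effect differs between impairment types would require larger numbers still.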
An alternative is to consider the effectiveness of an intervention only for a restricted group of people, such as people with a particular impairment type. This restriction would allow us to make more conclusive assessments for that particular group. The disadvantage is that the relevance of findings will be more limited and potentially not generalisable to other groups. Additionally, most disability-inclusive development programmes attempt to work across multiple disability types rather than focus on specific groups.
The key message is that disaggregation of data by impairment type is recommended to understand the effectiveness of interventions for different groups but requires studies where the sample size is sufficient to allow meaningful comparisons.

Disability Inclusive Research
The Human Rights Approach to disability is guided by the principle of "Nothing about us, without us" [32]. This principle means that people with disabilities should be included in a meaningful capacity in activities that relate to them, and that includes the conduct of research. This requirement is self-evident: it is widely accepted that research on gender should not be conducted only by men or studies on ethnicity only by people who are White, and the same holds true for disability.
There are many reasons why disability-inclusive research will produce better evidence. People with disabilities often have far greater insights into the potential impacts of and solutions for exclusion and are therefore best placed to set the research agenda and develop tools. Furthermore, in the conduct of research participants with disabilities may be more willing to talk honestly and openly with researchers who they recognize have disabilities themselves. In the analysis of data, researchers with disabilities or who are carers may offer additional insights into the results obtained. They may also contribute to writing up and disseminating the findings in a way that is sensitive and appropriate.
There are different levels of disability-inclusive research, which can be conceptualized as the rungs on a ladder [33]. At the lowest levels, which must be avoided, there is no consultation or participation, or worse still, manipulation of people with disabilities (e.g., compelling them to take part in a study in order to receive a service). One extreme example is the Vipeholm Dental Caries Study, where people with intellectual disabilities in Sweden in the 1940s were fed a large amount of sweets to assess whether this led to the development of dental caries [34]. A step higher is to provide information on the research to participants with disabilities. While this is necessary in any case due to informed consent requirements, it means that the contribution of people with disabilities to the study is minimised, and the researchers are not learning from their wealth of experience and knowledge. A better approach is to consult people with disabilities on the research design, which harnesses this experience and knowledge. For instance, we recently consulted with a group of people with different disabilities in Malawi to talk through our research questions and approach and share our interview guides for their input. They made helpful suggestions on how to improve questions and suggested that we add further items. Better still is to include people with disabilities at certain or all stages of the research [35,36,37]. In a study in India, we included people with disabilities as field workers who were collecting the data, and our impression is that this improved the willingness of people with disabilities to take part in the study and potentially the quality of the data [15]. We have also partnered with qualitative researchers with disabilities, who helped in the collection of data [38]. Again, their strong rapport with the research participants led to greater depth in information collection.
It is also important to include people with disabilities at the analysis stage, but this can be more complex as it may require specific analytical training and skills [35,36]. The highest form of participation is for research to be led by disabled people and their organisations, also known as emancipatory research, so that they are responsible for the entire process from setting the question, through collecting and analysing the data, to disseminating the results [39]. The biggest constraint here is in the limited number of researchers with disabilities available to lead this type of research, especially in LMICs, given the barriers people with disabilities face in education and in the workplace.
Some concrete actions are needed to improve the participation of people with disabilities in research. First, reasonable accommodations must be in place, such as provision of accessible facilities, information and transport, and necessary adjustments, so that people with disabilities are enabled to be part of or lead the research team. Adjustments are often easier to make for people with visual or physical impairments, and so they may predominate among researchers with disabilities. However, consideration must also be given to the inclusion of children and of people with hearing or intellectual impairments or mental health conditions; otherwise they will continue to be marginalised [35,36]. Second, more training of disabled researchers is needed, including at the Master's degree and Doctoral level, to enable them to be independent researchers.
Additionally, better connections should be built between disabled people's organisations (DPOs) and academics and universities. DPOs tend to be more interested in advocacy and can be effective at disseminating findings and influencing practice. However, they may not have the same commitment to objectivity as academics. There are potential biases in research conducted by people with disabilities, including the assumption that participants will share their experiences or views, which should be addressed through training on reflexivity. On the other hand, academics may not be engaged with real-world problems and may be inaccessible or obscure. Particular attention is needed to overcome these limitations through training on impact and efforts to communicate better.
The key message is that the inclusion of people with disabilities throughout the research process will improve the quality and acceptability of the study conducted. Meaningful participation requires planning and adequate resourcing in terms of time and budget.

Conclusions
There is growing focus on disability-inclusive development, given that there are one billion people with disabilities, and they are consistently falling behind in development indicators. More and better evidence is needed to guide how these interventions should be implemented, where and for whom. This evidence should ideally be collected through impact evaluation, although studies may not provide a definitive answer on "what works", given the complexity of disability and the importance of contextual factors. Conducting impact evaluations will require more and better tools to measure outcomes that are appropriate and relevant for disability, as well as better measures of disability itself. People with disabilities and their organisations should be meaningfully included throughout the research process, which will require better training of disabled researchers, commitment to participation, and the provision of reasonable accommodations. Importantly, once the evidence is generated it should be used to better inform policy and practice, and here people with disabilities also have a crucial role to play.