It is critical that education data systems be disaggregated by disability so that progress in achieving access to quality education for children with disabilities can be measured, and efforts to enable this are moving forward globally. Disability-disaggregated education data are required to track progress towards various frameworks, including the Convention on the Rights of Persons with Disabilities (CRPD) [1
], the Sustainable Development Goals (SDG) [2
] and the Incheon Strategy to “Make the Right Real” for Persons with Disabilities in Asia and the Pacific [3
]. There is widespread consensus on the urgency to support Ministries of Education (MoEs) to disaggregate their Education Management Information Systems (EMISs) by disability, and the importance of doing so using tools which are valid and internationally comparable [2
]. Given the complexity of disability measurement, efforts to develop and agree upon tools for disability measurement that are valid, feasible and comparable have taken statisticians and researchers decades. Whilst debate remains lively, the urgency to gather baseline data for the SDGs has required consensus. In a statement titled Disability data disaggregation joint statement by the disability sector
], peak disability agencies such as the International Disability Alliance, the World Health Organization, UNICEF, United Nations Development Programme, and the UN Partnership to Promote the Rights of Persons with Disabilities, amongst others, agreed that the Washington Group on Disability Statistics (WG) modules should be used to disaggregate data sets to measure SDG indicators; the WG Short Set of questions for adults and the UNICEF/WG Child Functioning Module (CFM) for children.
The CFM was developed to measure child functioning in surveys, with parents/caregivers acting as proxy respondents for the child. It has been validated in different settings [7
] and was finalised in 2016. The CFM is designed for children between two and 17 years and covers a range of areas for measuring functioning difficulties, including: seeing, hearing, walking, self-care, speaking, learning, remembering, anxiety/worry, depression/sadness, controlling behaviour, attention/concentrating, accepting changes in routine and making friends. Response categories for most questions are: “no difficulty”, “some difficulty”, “a lot of difficulty” and “cannot do it at all”.
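To make the scoring logic concrete, the following sketch (in Python, with hypothetical data and with domain labels and function names of our own) reduces a child’s ordinal CFM responses to a binary disability flag at a chosen cut-off category:

```python
# Illustrative sketch only: reducing a child's ordinal CFM responses to a
# binary disability flag at a chosen cut-off. Data and names are hypothetical.

# Ordered response categories, least to most severe.
CATEGORIES = ["no difficulty", "some difficulty",
              "a lot of difficulty", "cannot do it at all"]
SEVERITY = {label: rank for rank, label in enumerate(CATEGORIES)}

def has_disability(responses, cutoff="a lot of difficulty"):
    """True if any domain is reported at or above the cut-off category."""
    threshold = SEVERITY[cutoff]
    return any(SEVERITY[answer] >= threshold for answer in responses.values())

# Hypothetical child: "some difficulty" seeing, no difficulty elsewhere.
child = {"seeing": "some difficulty", "hearing": "no difficulty",
         "walking": "no difficulty", "speaking": "no difficulty"}

print(has_disability(child))                            # False at the recommended cut-off
print(has_disability(child, cutoff="some difficulty"))  # True at the lower cut-off
```

The same child is counted or not counted depending entirely on where the cut-off is placed, which is the central issue examined below.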
Recent advice from the United States Agency for International Development (USAID), a key donor, is that the WG Short Set and/or the CFM should be used wherever possible in USAID-funded education programs to disaggregate data sets [17
]. This is a positive indication of donor commitment to measuring outcomes towards fundamental human rights. If the CFM is to be used to disaggregate EMISs, it is critical that its properties are understood when proxy respondents are teachers and to test its measurement accuracy when used for education systems.
There are various purposes for which disability identification is needed in data aimed at ensuring inclusion of children with disabilities: determining funding eligibility at an organizational or individual level, determining the learning and support needs of a student, and comparing equalization of access to socioeconomic rights through disability-disaggregated census or household survey data. The purpose has implications for the approach to disability identification, and for the degree of accuracy required in the instrument that determines the classification of disability. Madden highlighted the importance of designing valid tools which take into account the evidence for and consequences of score interpretation and use, and of establishing meaningful thresholds on the spectrum of disability experience [18].
Disability can be seen as a continuum ranging from minimal difficulties to fundamental impacts on a person’s life. On an instrument designed to measure this functioning-to-disability continuum, the point along the span used to define someone as having a disability is referred to in this paper as a cut-off. It is critical that the rationale for and implications of the cut-off are clearly understood. If, for example, the cut-off sits relatively low on the continuum and includes mild disabilities (such as difficulty seeing that can be entirely overcome with glasses), the number of children counted as having disability will be high. If, by contrast, severe disability is the cut-off (having a great deal of difficulty with basic functions), the number of children counted as having disability will be comparatively low. The cut-off level must be appropriate to, and will vary with, the purpose for identifying disability. Education systems may consider it important to identify children with mild and moderate disability to enable early intervention and educational accommodations, whereas a scheme establishing eligibility for monetary benefits may target a higher level of disability [19].
The recommended criterion for identifying disability using the CFM is having difficulty functioning at the level of at least “a lot of difficulty” [20
], or “daily” for anxiety and depression questions. The USAID guidance document [17
] states that “for a more nuanced analysis of disability, the answers can be used as a regular scale, with “cannot do it at all” denoting severe disability while “some difficulty” denoting minor disability in each functional domain. Answers across all domains can also be combined into a larger scale.
” (p. 4). However, recent studies in Cameroon, India and Fiji [12
] indicate significant variation in how parents choose response categories to report functioning difficulties, and that the cut-off “a lot of difficulty” misses significant numbers of children with moderate to severe impairments. That is, this cut-off had low sensitivity in identifying disability. In large household surveys or censuses, these differences may be considered within acceptable margins of error. Within an education system, however, the tool is used for different purposes and a response cut-off with high sensitivity is needed. Sensitivity and specificity are a trade-off, and selecting a lower severity response category, for example “some difficulty”, may result in lower specificity. That is, the chance increases of falsely identifying children as having a disability when they do not.
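This trade-off can be illustrated with a small worked example. The counts below are invented for illustration and are not study data; the formulas, however, are the standard definitions.

```python
# Hypothetical screening results for 100 children with clinically confirmed
# impairment and 100 without, at two cut-offs. All counts are invented.

def sensitivity(true_pos, false_neg):
    """Proportion of truly impaired children the cut-off identifies."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Proportion of unimpaired children the cut-off correctly excludes."""
    return true_neg / (true_neg + false_pos)

# Strict cut-off ("a lot of difficulty"): few false positives,
# but many impaired children reported below the cut-off are missed.
print(sensitivity(55, 45))  # 0.55
print(specificity(80, 20))  # 0.8

# Lenient cut-off ("some difficulty"): almost all impaired children
# are captured, but many unimpaired children are flagged as well.
print(sensitivity(98, 2))   # 0.98
print(specificity(33, 67))  # 0.33
```

Lowering the cut-off moves children from the false-negative cell into the true-positive cell, but simultaneously moves unimpaired children from the true-negative cell into the false-positive cell.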
In a rapidly modernising information technology age, EMISs are increasingly based on individual electronic data files [24
]. Data from these systems are used not only to monitor and evaluate progress towards inclusive education at a large area level, but are also capable of, and are being used for, determining individual student eligibility for funding related to disability status. A tool appropriate for national surveys may not be reliable or valid for identifying individual students’ levels of functioning. It is critical that people making decisions about incorporating disability within EMISs understand that tools recommended for national or large area monitoring may have limitations for individual level assessments.
This study was undertaken in the context of an Australian aid funded education sector project in Fiji. The required purposes for disability data in Fiji’s EMIS included identification of children with disabilities, by disability type and severity, to enable resource allocation based on individual level data, and to enable monitoring, planning and reporting against policy and other commitments. The key question for the Fiji MoE was the extent to which the CFM is effective when used by teachers to identify the presence and severity of disability amongst children in Fiji. Validity and reliability of specific domains (seeing, hearing, walking, speech and cognition) were reported elsewhere [21
]. This paper focuses on the performance of the CFM as a whole. With the overarching aim of identifying a valid, reliable and feasible method for Fiji to identify children with disabilities in schools to enable monitoring, planning and reporting against policy commitments, the objectives of this paper are to:
Determine the validity (sensitivity and specificity) of different cut-off levels of the CFM for predicting the presence of disabilities in primary school aged Fijian children compared to standard clinical assessments of impairment.
Determine the inter-rater reliability between teacher and parent CFM responses.
4. Discussion, Limitations and Further Research
This study found that the CFM is a useful core component of the data required for disability disaggregation of Fiji’s EMIS, and that teachers are adequately accurate proxy respondents to the CFM. However, the mixture of impairment severities reported across CFM response categories, and the ambiguity in the choice of cut-off level in both parent and teacher results, are limitations indicating that the CFM may not be accurate enough to serve as the sole method for identifying children with disabilities.
The first objective of this study was to determine the validity (sensitivity and specificity) of the CFM, which is operationally defined as the extent to which an overall score on the CFM at a given cut-off level identifies children who have an impairment as assessed using reference standard, or “gold standard”, clinical measures. For assessing sensitivity and specificity of the CFM, this paper effectively defines disability as clinically assessed impairment of a moderate or more severe level. There is debate about this medical perspective but for our purposes, it provides an objective assessment (in the sense of being made independently of those who stand to gain or lose from the assessment, or might perceive that they do), and so we have accepted it as the best available reference standard.
Overall diagnostic accuracy (a combined value of sensitivity and specificity) of the CFM was found to be just “fair” based on combined results from seeing, hearing, walking, speaking, learning, remembering and focusing attention, i.e., CFM-7. This is substantially lower than the previously reported accuracy of individual domain-specific questions on speaking, walking, seeing and hearing [21
], which are perhaps more observable functions. The cognitive domains had “fair” to “poor” accuracy [22]. Given the variation in accuracy across the different domains in the module, ranging from excellent to poor, it is not surprising that overall accuracy is only “fair”. This finding indicates that the CFM-7 may not be accurate enough to be used as the sole method for identifying children with disabilities.
Whilst the diagnostic accuracy of parent observations related to seeing, walking and speaking is stronger than that of teachers, teacher accuracy is acceptable, ranging from “good” to “very good” (0.823 to 0.909). Conversely, for the domains learning, remembering and focusing attention, teacher results are stronger than parent results. For hearing, accuracy is high and very similar between respondent types.
To disaggregate Fiji’s EMIS by disability, it is important to identify the appropriate cut-off level of the CFM. Field testing of the CFM as part of population-based surveys in Samoa, Mexico and Serbia showed that the “some difficulty” cut-off yields a much higher prevalence estimate than the “a lot of difficulty” cut-off [15
]. The cut-off recommended by UNICEF/Washington Group is “a lot of difficulty” [20
]. However, in our study a significant proportion of children with moderate or higher clinical impairment were reported as having only “some difficulty” on CFM-7, comprising seeing, hearing, walking, speaking, learning, remembering and focusing attention domains (Table 3
). These children would miss out on services if the cut-off were “a lot of difficulty”. Based just on these domains, approximately half of children with moderate clinical impairments (52.4%P) and more than a third of children with severe impairments (38.8%P) would be missed. However, when the CFM-13 was considered (which includes the additional six questions), the proportions missed unsurprisingly reduced, though only to some extent. Despite this, 39.7%P
of children with moderate clinical impairments and 27.5%P
of children with severe impairments would be missed. When domain-specific findings are considered, it is the children with moderate-severe cognitive impairments who miss out in greatest numbers [21
]. The decision to select a cut-off must also consider the fact that 47.8%P
of children with no clinical impairment are reported as having “some difficulty”. Our findings indicate that children reported as having “some difficulty” can neither be ignored nor be assumed to have disability.
The cross-tabulation also highlights the fact that the three CFM response categories—“some difficulty”, “a lot of difficulty” and “cannot do at all”—do not relate to the same levels of severity across different functioning domains. This is in contrast with the recommendations on the interpretation of these categories by UNICEF/Washington Group [20
] and USAID [17
]. Whilst most moderate impairments are reported as “some difficulty”, children with severe impairments appear relatively evenly across the three response categories, and the response categories do not carry the same meaning across different domains. For example, the category “cannot do at all” captures a large proportion of children with severe musculoskeletal impairment yet only approximately 2% of children with severe cognitive impairment. This extreme response category is used to a small extent for questions on hearing, walking, speaking and seeing, but almost never for questions on learning, remembering and focusing attention.
The CFM is described as being able “to determine the proportion of those who have mild difficulties (at least some difficulty
on one or more domains of functioning), or moderate levels of difficulty (those who respond at least a lot of difficulty
) or those with severe difficulties (those who respond cannot do at all)” [
] (p. 487). However, our findings suggest that this interpretation of the CFM response categories across disability domains would not work in Fiji. Mitra emphasised the value of using a “trichotomy” (severe, moderate and no difficulty), in which classification of people with moderate functional difficulty was based on “some difficulty” in at least one domain with no higher levels of difficulty recorded [43
]. This is consistent with our finding that the cut-off “some difficulty” captured most of the children with moderate impairments; however, the challenge remains that many children without impairments were also recorded as having “some difficulty”.
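One plausible rendering of this trichotomy as a classification rule is sketched below; the function, labels and data are our illustration, not code from the cited work.

```python
# Illustrative rendering of the trichotomy (severe / moderate / no difficulty):
# severe if any domain is at "a lot of difficulty" or worse, moderate if the
# highest response is "some difficulty", otherwise no difficulty.

LEVEL = {"no difficulty": 0, "some difficulty": 1,
         "a lot of difficulty": 2, "cannot do it at all": 3}

def trichotomy(responses):
    """Classify a child from their most severe reported domain."""
    worst = max(LEVEL[answer] for answer in responses.values())
    if worst >= LEVEL["a lot of difficulty"]:
        return "severe"
    if worst == LEVEL["some difficulty"]:
        return "moderate"
    return "no difficulty"

print(trichotomy({"seeing": "some difficulty",
                  "walking": "no difficulty"}))       # moderate
print(trichotomy({"seeing": "some difficulty",
                  "walking": "cannot do it at all"}))  # severe
```

Under this rule, a “some difficulty” response in any domain is sufficient for the moderate category only when no domain is reported at a higher level.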
The ROC curve results from earlier reports were complex and varied across domains and across metrics (sensitivity, specificity, the Youden Index and likelihood ratios). For the domains seeing, hearing, walking and speaking, “some difficulty” was a far more accurate cut-off than other levels [21
]. The cognitive domains learning, remembering and focusing attention also indicate “some difficulty” as the best cut-off, with teacher results superior to parent results at identifying children with cognitive impairments [22].
However, contrary to the individual domain-specific results, the diagnostic accuracy results for the CFM-7 showed “a lot of difficulty” as the best cut-off, albeit only marginally. This is because at “some difficulty” sensitivity is excellent (0.98P/0.96T) but specificity is very poor (0.33P/0.42T). At the cut-off “a lot of difficulty”, specificity was much better (0.80P/0.82T) but sensitivity dropped substantially (0.55P/0.57T). Notably, the Youden Index for the overall CFM was quite low at either cut-off (0.31P/0.40T for “some difficulty” and 0.36P/0.39T for “a lot of difficulty”). This was not surprising given that the diagnostic accuracy of the CFM-7 was only “fair”. These results highlight an important shortcoming of the CFM-7: there is no clear and strong cut-off response category for the overall module, and the cut-off which performs best for individual functional domains differs from that for the overall module.
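The Youden Index is simply sensitivity plus specificity minus one. Recomputing it from the rounded parent values quoted above illustrates the calculation, though results can differ slightly from the published indices, which were presumably derived from unrounded data.

```python
# Youden Index J = sensitivity + specificity - 1, recomputed from the
# rounded parent (P) values quoted in the text. Small differences from
# the published figures are expected due to rounding of the inputs.

def youden(sens, spec):
    return sens + spec - 1

# Parent results at the "some difficulty" cut-off.
print(round(youden(0.98, 0.33), 2))  # 0.31, matching the reported 0.31P

# Parent results at the "a lot of difficulty" cut-off.
print(round(youden(0.55, 0.80), 2))  # 0.35, close to the reported 0.36P
```

A Youden Index of 1 indicates a perfect cut-off and 0 a cut-off no better than chance, so values in the 0.31–0.40 range confirm that neither cut-off performs strongly for the overall module.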
The high proportion of children reported as having “some difficulty” on the six domains without a clinical reference standard highlights the need for further research to understand the impact of the cut-off level on identifying children with difficulties in these domains.
The second objective was to determine the inter-rater reliability between teacher and parent CFM responses. Our study showed that IRR of the CFM-13 is “good” (0.68), which in theory contributes to the case that the CFM can be used with teachers as respondents. However, there is great variation in IRR across domains [21
]. The potentially more observable domains (hearing, walking and speaking) have “excellent” IRR followed by “good” IRR for self-care, seeing and learning.
However, IRR needs to be considered in relation to accuracy. If both respondents are equally “wrong”, IRR may be high yet the tool is not useful; conversely, if parent responses are “wrong”, a low IRR could be interpreted positively in terms of teacher use of the tool. Considering accuracy together with IRR between parents and teachers, the most accurate and reliable CFM questions relate to the domains of seeing, hearing, walking and speaking. For the CFM questions for which this study has no clinical reference standards (and therefore no diagnostic accuracy analysis)—self-care, anxiety, sadness, controlling behaviour, accepting changes and making friends—the largely poor IRR results are harder to interpret. They may reflect poorly on the questions, or may imply varying perspectives and accuracy between parents and teachers; teachers may be better positioned to make a relative judgment for some of these items. The higher correlations between teacher results for domains which might be expected to relate (anxiety and depression; learning and remembering; changes to routine and focusing attention) provide some indication that teachers are observing these functional domains more consistently than parents and that teacher results may be more accurate in these domains. In relation to anxiety and depression, the results highlight a potentially important role for teachers in Fiji in identifying children at risk of psychosocial distress. These issues point to important areas for future research into parent and teacher response accuracy for these domains.
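For readers unfamiliar with IRR statistics, the sketch below computes unweighted Cohen’s kappa for hypothetical parent and teacher responses on a single item. The IRR figures reported in this study may be based on a different statistic (for example a weighted kappa or an intraclass correlation), so this is an illustration of the general idea only: agreement is corrected for the agreement expected by chance.

```python
# Unweighted Cohen's kappa on hypothetical ratings; illustration only.
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters scoring the same items."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from the product of each rater's marginal proportions.
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical responses on one CFM item for eight children.
parents  = ["none", "some", "some", "a lot", "none", "some", "none", "a lot"]
teachers = ["none", "some", "a lot", "a lot", "none", "none", "none", "a lot"]

print(round(cohen_kappa(parents, teachers), 2))  # 0.63
```

Raw percent agreement here is 75%, but kappa discounts the agreement that two raters with these marginal distributions would reach by chance, which is why the chance-corrected value is lower.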
Fiji’s MoE has committed to provide inclusive education in a way which leaves no one behind [44
] and following this study commenced disability inclusion grants to schools, calculated by number of children with disabilities. Messick [45
] and Shepard [46
] championed the importance of “consequential validity”: the investigation and prediction of the positive and negative social consequences of a test. The implication of Fiji’s policy, in relation to this study, is that a cut-off level with low sensitivity misses eligible children, as would be the case if “a lot of difficulty” were used. Hence, to ensure children are not missed, the cut-off “some difficulty” must be used. However, given the significant proportion of children classified at “some difficulty” who do not have a disability, follow-up assessments are required to verify the presence of disability (and to identify children for whom referral services are required).
Conversely, the low specificity of the “some difficulty” cut-off has cost implications for verification visits. Travelling to remote areas to assess children based solely on a reported “some difficulty” response would be cost-prohibitive and an inefficient use of already stretched MoE staff time. A solution to this challenge may be found in another series of results from the study, to be discussed in a subsequent paper, showing that combining CFM data with learning and support needs data enables a much more accurate estimation of disability. This would reduce the false positives on the list of children who need verification visits.
An essential feature of the CFM to highlight, in relation to assessing disability for funding eligibility, is the self-report nature of the tool. Whether the respondent is a parent/caregiver or a teacher, the results can be biased if there is perceived financial advantage in reporting higher levels of difficulty. The disability verification visit is necessary to pre-empt over-reporting. These visits involve qualified MoE district officers visiting the schools to discuss the results with teachers and undertake basic tests with the identified children, such as visual acuity tests (Snellen chart), observations of gross and fine motor function, classroom observation, review of student records, etc. The visit offers the chance for monitoring and mentoring of efforts towards disability-inclusive education.
An important limitation common to all diagnostic accuracy studies is the assumption that the clinical assessment standards are 100% sensitive and specific themselves. That is, that the tests for vision, hearing, musculoskeletal impairment, speech and cognition are indeed “gold standards” against which the CFM can be measured. The justification for selection of the five clinical assessments along with measures to ensure accuracy of the tests and to reduce classification bias [47
] have been presented in detail elsewhere [21
] and are summarised in Appendix B.
The five clinical assessments did not cover all the functioning constructs that are covered by the whole CFM (the CFM-13), specifically self-care, anxiety/worry, depression/sadness, behaviour and socialisation. We attempted to overcome this limitation by making interpretations based on IRR and simple proportions reported in different severity levels of the CFM-13. However, an outstanding recommendation for further research is for a diagnostic accuracy study which adequately covers these constructs.
A relatively high proportion of cases were from special schools (76.2%) due to the limited numbers of children with disabilities in mainstream schools. To achieve the required sample size across all five impairment groups, recruitment had to allow for this imbalance. Despite this, the target sample of 52 in each clinical impairment category was not reached for children with vision impairments (n = 35) and musculoskeletal impairments (n = 42). Future research should aim to rectify this sampling disparity and shortfall.
An important limitation relates to generalizing the findings to other populations. Of the parents/caregivers of the cases, 19% had attained a tertiary education, which is higher than the national average [48
]. The level amongst controls was 15%, which is closer to average. This highlights potential differences related to parents of children in special schools, but importantly raises the question of difference between parents of children with disabilities in school compared to those who are out of school. Future research should include out-of-school children with disabilities, whose parents may respond differently to the CFM questions.
Another limitation is that 62.8% of cases were male compared to 49.0% of controls and the mean age of cases was 10.15 years compared to 9.71 years amongst controls. However, correlations between age, sex and the CFM questions were explored, and the impact of these variations appears to be negligible. Age had significant but negligible correlation with the domains learning (0.164), remembering (0.118) and depression (0.097). Sex had significant but negligible correlation with the domains speaking (0.092), learning (0.144), controlling behaviour (0.156), focusing attention (0.096) and making friends (0.097).
Finally, the authors acknowledge the limitations of categorizing IRR values into the classifications “excellent/good/fair/poor”, since the appropriate interpretation depends on the purpose for which the test is to be used. For the purposes of this study, however, the categories provide a convenient means of comparing individual domains and the overall CFM-13.