Reliability and Validity of Non-Instrumental Clinical Assessments for Adults with Oropharyngeal Dysphagia: A Systematic Review

This systematic review of non-instrumental clinical assessment in adult oropharyngeal dysphagia (OD) provides an overview of published measures with reported reliability and validity. In alignment with PRISMA, four databases (CINAHL, Embase, PsycINFO, and PubMed) were searched, resulting in the inclusion of 16 measures and 32 psychometric studies. The included measures assessed any aspect of swallowing, contained at least one specific subscale relating to swallowing, were designed for assessment by clinical observation, targeted adults, and were originally developed in English. The included psychometric studies focused on adults, reported on measures for OD-related conditions, described non-instrumental clinical assessments, reported on validity or reliability, and were published in English. Methodological quality was assessed using the QualSyst standard quality assessment tool. Most measures targeted only restricted subdomains within the conceptual framework of non-instrumental clinical assessment. Across the 16 measures, hypothesis testing and reliability were the most frequently reported psychometric properties, whilst structural validity and content validity were the least reported. Overall, data on the reliability and validity of the included measures proved incomplete and frequently did not meet current psychometric standards. Future research should focus on the development of comprehensive non-instrumental clinical assessments for adults with OD using contemporary psychometric research methods.


Introduction
Oropharyngeal dysphagia (OD) is a symptom, or a collection of symptoms, of one or more underlying anatomical abnormalities or of impairments and disorders in the cognitive, sensory, and motor acts involved in transferring food and liquids from the mouth to the stomach [1]. OD may result in reduced efficiency and safety of swallowing, failure to maintain hydration and nutrition, risk of choking and of aspiration leading to pulmonary complications, and reduced quality of life [2]. Because of these serious sequelae, dysphagia is a leading cause of morbidity and mortality among older persons, children, adults with neurological disorders (e.g., cerebral palsy, stroke, and dementia), and patients with head and neck cancer, although it is not limited to these groups [3]. To reduce the devastating effects of OD, early diagnosis and intervention are crucial in a patient's illness trajectory.
The first step in the management of OD is screening to identify people at risk of dysphagia. Next, those patients who fail screening are referred for further clinical assessment, for example, to identify possible causes of the swallowing problems, estimate the efficacy and safety of swallowing including the risk of aspiration, support decisions on oral or alternative feeding routes, and establish baseline data for future reference when determining the effects of interventions or the impact of a disease over time [4]. Clinical assessment may involve either instrumental or non-instrumental assessment or both. As instrumental assessment (e.g., fiberoptic or videofluoroscopic evaluation of swallowing recordings) can diagnose aspiration, including silent aspiration and other physiological problems in the pharyngeal phase, instrumental assessment is often referred to as the 'gold standard' assessment. However, no international consensus exists about which visuoperceptual measure should be used for the evaluation of swallowing recordings, and access to instrumental assessment may not always be guaranteed due to its restricted availability [5]. Moreover, the psychometric properties of many existing visuoperceptual measures are either unknown or lack methodological robustness in line with current psychometric standards [5].
Non-instrumental clinical assessment by dysphagia experts is an alternative method of evaluation after failed screening. Because OD is a multidimensional phenomenon, non-instrumental clinical assessment comprises a large variety of assessments, each of which may describe different aspects of OD (e.g., medical history taking, physical examination, and patient-reported functional health status or dysphagia-related quality of life). In the literature, different combinations of non-instrumental clinical assessments can be found, typically including measures of cognition and communication; evaluation of the oral, laryngeal, and pharyngeal anatomy, physiology, and function (including cranial nerve examination); oral intake, nutritional status, and mealtime observations; and intervention trials (e.g., bolus modification, head and postural adjustments, and/or swallow manoeuvres) [4,6].
In 2022, the European Society for Swallowing Disorders (ESSD) published recommendations on how to select the best evidence-based screening and non-instrumental assessments for use in clinical practice targeting different constructs, subject populations, and respondents, based on criteria for diagnostic performance, psychometric properties (reliability, validity, and responsiveness), and feasibility [6]. The ESSD also emphasised discontinuing the use of non-validated dysphagia assessments in favour of measures that demonstrate robust psychometric properties. Several systematic reviews have summarised the diagnostic performance of screening tools (e.g., Benfield, Everton [7], Bours, Speyer [8], Brodsky, Suiter [9], Kertscher, Speyer [10], O'Horo, Rogus-Pulia [11]) and the psychometric properties of visuoperceptual measures used to evaluate fiberoptic or videofluoroscopic swallowing recordings [5], patient self-reported functional health status and quality-of-life questionnaires [12,13], and pediatric clinical assessments [14]. However, no psychometric overview of clinician-reported non-instrumental clinical assessments in adults with OD has yet been published.
The purpose of this systematic review was to (a) summarise the characteristics of the identified non-instrumental clinical assessments for adults with OD (excluding patient self-report), (b) determine which psychometric properties related to reliability and validity were reported, and (c) construct a conceptual map of the identified measures to determine how comprehensively existing non-instrumental clinical assessments measure all the underlying constructs. Psychometric properties were reported using the terminology and definitions of the COSMIN (COnsensus-based Standards for the selection of health Measurement INstruments) taxonomy [15,16]. Responsiveness (i.e., the ability of an instrument to detect change over time) was outside the scope of this review.

Materials and Methods
This systematic review was conducted and reported according to the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) 2020 statement and checklist [17]. The PRISMA statement and checklist (Supplementary Files S1 and S2) aim to enhance the essential and transparent reporting of systematic reviews. To report on psychometrics, terminology and definitions as defined in the COSMIN taxonomy were used [15,16] (Supplementary File S3).

Data Sources and Search Strategies
A systematic literature search was performed across four electronic databases: CINAHL, Embase, PsycINFO, and PubMed. All publication dates up to 14 February 2022 were included. Both subject headings and free text terms related to dysphagia, non-instrumental clinical assessment, and psychometrics were used to capture all relevant literature. Table 1 presents the search strategies used within this review, outlined for each database. Following the initial round of abstract selection, a further literature search was performed across the same four electronic databases using the names and acronyms of included measures to identify eligible psychometric studies. All publications up to 6 June 2022 were included.

Eligibility Criteria
The eligibility of individual measures was determined through the following inclusion and exclusion criteria: (1) measures assessed any aspect of swallowing (including oral intake), with measures investigating eating disorders or Gastro-Esophageal Reflux Disease (GERD) excluded; (2) at least one specific subscale, or a minimum of 50% of the measure's items, related to swallowing; (3) measures were developed for assessment by clinical observation or for eliciting clinical information by questionnaire, with all instrumental assessments, screening tools, and self-report questionnaires excluded; (4) measures targeted adults (i.e., 18 years old and above); and (5) measures were originally developed in English, so translated versions were excluded.
Psychometric studies included in the systematic review met the following inclusion and exclusion criteria: (1) studies focused on adult populations (18 years old and above); (2) studies reported on measures for conditions related to OD or swallowing difficulties, whilst studies related to psychogenic swallowing difficulties or eating disorders (e.g., anorexia or bulimia) were excluded; (3) studies described a non-instrumental clinical assessment, so any study that focused on instrumental assessment (e.g., videofluoroscopic or endoscopic evaluation of swallowing) was excluded; (4) studies reported on psychometric properties (either validity or reliability) of the included measures as defined by the COSMIN taxonomy [15], thus excluding responsiveness; and (5) studies were published in English.

Abstract and Measure Selection
Two reviewers independently screened the titles and abstracts of the records returned from the initial database search against the eligibility criteria. Any disagreements were discussed and, where consensus could not be reached, a third reviewer was consulted to resolve them. None of the three reviewers had any affiliation with the authors of the included studies or measures. The selection process followed the PRISMA guidelines and flow diagram [17] to minimise the risk of selection bias.
Following the initial database search, a further set of searches was performed including the names and acronyms of the included measures, with the aim of locating all eligible and relevant psychometric studies. The same procedure was followed to ensure the accuracy of the selection process. A separate search was undertaken to identify potential measures and studies that met the inclusion criteria from the reference lists of the included studies.

Data Extraction
Following the selection process of both the studies and the measures, data from the remaining articles were extracted using comprehensive data extraction forms. Data were extracted under the following categories: (1) Measure characteristics (e.g., purpose, target population, subscales, range of score) and (2) psychometric properties reported within the available studies. The use of a data extraction table ensured that the same data characteristics were extracted from all included papers [18]. One reviewer extracted all data, then a second reviewer checked the extracted data for accuracy.

Methodological Quality
The standard quality assessment tool QualSyst, as described by Kmet et al. (2004) [19], was used to evaluate the methodological strengths and weaknesses of the included studies. The QualSyst critical appraisal tool provides a systematic, reproducible, and quantitative means of evaluating the methodological quality of research across a broad range of study designs. Each of the 14 QualSyst criteria is scored individually (2 = yes, 1 = partial, 0 = no, or not applicable), after which the total score is converted to an overall quality percentage (the total score divided by the maximum possible score for the applicable items, multiplied by 100). An overall quality percentage of 80% or higher indicates strong methodological quality, a score between 70% and 79% indicates good quality, a score between 50% and 69% indicates adequate quality, and a score below 50% indicates poor methodological quality. The criteria for good psychometric properties were adapted from Prinsen and Mokkink [20]. All ratings were performed independently by two reviewers. After consensus was reached, studies with poor methodological quality (<50%) were excluded.
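To make the scoring rule concrete, here is a minimal Python sketch of a QualSyst-style summary score and the quality bands described above. It assumes the Kmet et al. item scoring of 2 (yes), 1 (partial), 0 (no) or not applicable, with the total divided by the maximum possible score for the applicable items; the function names and the example item scores are hypothetical and not taken from any included study.

```python
from typing import Optional, Sequence


def qualsyst_summary(item_scores: Sequence[Optional[int]]) -> float:
    """Return a QualSyst-style summary score as a percentage.

    Each item is scored 2 (yes), 1 (partial) or 0 (no); None marks an item
    as not applicable. The score is the total divided by the maximum
    possible score for the applicable items (2 points each), times 100.
    """
    applicable = [s for s in item_scores if s is not None]
    if not applicable:
        raise ValueError("at least one item must be applicable")
    if any(s not in (0, 1, 2) for s in applicable):
        raise ValueError("item scores must be 0, 1, 2 or None")
    return 100 * sum(applicable) / (2 * len(applicable))


def quality_band(percent: float) -> str:
    """Map a summary percentage to the quality bands used in this review."""
    if percent >= 80:
        return "strong"
    if percent >= 70:
        return "good"
    if percent >= 50:
        return "adequate"
    return "poor"  # studies in this band were excluded from the review


# Hypothetical study: 12 applicable items (10 scored 2, 2 scored 0), 2 not applicable.
scores = [2] * 10 + [0, 0] + [None, None]
pct = qualsyst_summary(scores)
print(f"{pct:.1f}% -> {quality_band(pct)}")  # 83.3% -> strong
```

Excluding not-applicable items from the denominator keeps studies with different designs comparable on the same percentage scale.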

Conceptual Mapping of Measures
To construct a conceptual map of the identified measures, this systematic review used OD non-instrumental clinical assessment theory and definitions to inform a deductive thematic analysis of the findings. Following thematic synthesis of the scales and subscales of the included measures by the first and second authors, domains, sub-domains, and elements were identified, resulting in a conceptual framework of non-instrumental clinical assessment of OD.

Systematic Literature Search
From the initial search, 1430 records were retrieved from the four electronic databases: 277 from CINAHL, 579 from Embase, 40 from PsycINFO, and 534 from PubMed. Of these, 301 duplicates were removed. From the measure-specific search, 2513 records were retrieved from the same four databases: 312 from CINAHL, 1201 from Embase, 316 from PsycINFO, and 684 from PubMed. Of these, 478 duplicates were removed, leaving a combined total of 3164 records to be screened. Figure 1 presents the PRISMA [17] flow chart of the studies and measures reviewed and excluded during the literature search. Following this selection process, 377 studies and 141 individual measures were assessed for eligibility, of which 345 studies and 125 measures were excluded (see Supplementary File S4). In total, 32 original psychometric studies that focused on OD or other swallowing difficulties and included a clinical non-instrumental measure for an adult population, together with 16 individual measures, were included.
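The counts above can be checked for internal consistency with a few lines of Python. The figures are those reported in this paragraph; the function name `flow_summary` is illustrative only.

```python
# Record counts per database, as reported for the two searches.
initial_search = {"CINAHL": 277, "Embase": 579, "PsycINFO": 40, "PubMed": 534}
measure_search = {"CINAHL": 312, "Embase": 1201, "PsycINFO": 316, "PubMed": 684}
initial_duplicates, measure_duplicates = 301, 478


def flow_summary() -> dict:
    """Recompute the totals of the selection process from their components."""
    initial_total = sum(initial_search.values())
    measure_total = sum(measure_search.values())
    screened = (initial_total - initial_duplicates) + (measure_total - measure_duplicates)
    return {
        "initial_total": initial_total,
        "measure_total": measure_total,
        "screened": screened,
        "studies_included": 377 - 345,  # studies assessed minus studies excluded
        "measures_included": 141 - 125,  # measures assessed minus measures excluded
    }


print(flow_summary())
```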

Characteristics of Included Measures and Psychometric Studies
Descriptions and characteristics of the included measures are presented in Table 2. All 16 included measures were either developed or adapted for adult populations: seven (44%) for stroke patients [21][22][23][24][25][26][27], two (12.5%) for adults with intellectual disability [28], and two (12.5%) for patients with head and neck cancer [29,30]. Measures ranged from a single scale to five subscales, with the number of items ranging from 1 [31] to 42 [32]. All measures were developed for clinical use, although the Eating and Drinking Ability Classification System (EDACS) can also be administered by a caregiver [33].
A tool designed to measure both the required level of supervision and the diet level by assigning a single number that describes whether there has been a change in functional status after speech therapy in patients with dysphagia

Stroke patients and those with brain lesions

1. Individual is not able to swallow anything safely by mouth.
2. Individual is not able to swallow safely by mouth for nutrition and hydration.
3. Alternative method of feeding required as individual takes less than 50% of nutrition and hydration by mouth.
4. Swallowing is safe, but usually requires moderate cues to use compensatory strategies.
5. Swallowing is safe with minimal diet restriction and/or occasionally requires minimal cueing to use compensatory strategies.
6. Swallowing is safe, and the individual eats and drinks independently and may rarely require minimal cueing.
7. The individual's ability to eat independently is not limited by swallow function.
Single 7-level ordinal scale (1 = nothing by mouth; 7 = no limit by swallowing). Range: 1-7. Interpretation: ↓ scores = ↑ dysphagia severity.

Dysphagia Disorders Survey
Sheppard, Hochman [28]: A quantitative observation tool capable of discriminating swallowing and feeding pathology from functionally competent patterns and of providing an objective description of the clinical presentation of swallowing and feeding disorders in developmental disability (SFD-DD).
Single scale (1). Levels:
1. Eats and drinks safely and efficiently.
2. Eats and drinks safely but with some limitations to efficiency.
3. Eats and drinks with some limitations to safety; there may be limitations to efficiency.
4. Eats and drinks with significant limitations to safety.
5. Unable to eat and drink safely; tube feeding may be considered to provide nutrition.

1. Do you have difficulty when you eat food or drink water?
2. Do you have difficulty when you swallow a pill?
3. Do you cough when you eat food or drink water?
4. Do you choke when you eat food or drink water?
5. Do you have feeling of something stuck in the throat when you swallow?
6. Do you feel pain when you swallow?
7. Do you take more than 30 min to eat an average meal?
8. Do you have drooling or spitting out food during a meal?
9. Have you ever been diagnosed with pneumonia?
10. Have you lost weight due to swallowing difficulty?
11. Do you have hoarse or wet voice after swallow?
12. Do you get sputum after a meal?
Binary scoring for each item (Yes/No). Range: 0-12 (total score from the sum of all "yes" responses). Interpretation: ↑ scores = ↑ dysphagia severity.

Functional Oral Intake Scale
Crary, Carnaby Mann [23] To determine patients' oral intake status, developed as an appropriate tool for estimating and documenting changes in the functional eating abilities of stroke patients over time

1. Nothing by mouth.
2. Tube dependent with minimal attempts of food or liquid.
3. Tube dependent with consistent oral intake of food or liquid.
4. Total oral diet of a single consistency.
5. Total oral diet with multiple consistencies, but requiring special preparation or compensations.
6. Total oral diet with multiple consistencies without special preparation, but with specific food limitations.
7. Total oral diet with no restrictions.

Mann Assessment of Swallowing Ability
Mann [27] Developed as a comprehensive clinical examination for identifying eating and swallowing disorders in patients with stroke.

Range: raw score (range 38-200; total score of all items) converted to severity grouping (no abnormality detected; mild; moderate; severe) for dysphagia and aspiration.

Mann Assessment of Swallowing Ability-Cancer (MASA-C)
Carnaby and Crary [29]: Modified version of the MASA designed for cancer patients.
Patients receiving radiotherapy for head and neck cancer
Clinician-observed
Adapted from the MASA with subscales, but subscales undetermined for the adapted measure (total number of items = 24): includes 15 of the original 24 MASA items, with an additional 9 cancer-specific items. 3-, 4- and 5-level ordinal scales (different weighting).
Range: raw score (range ; total score of all items) converted to severity grouping (no abnormality detected; mild; moderate; severe) for dysphagia and aspiration.
Interpretation: ↑ scores = ↓ impairment severity
An evaluative tool that assigns a numerical score to the functional abilities of the patient in the domains of self-feeding, positioning, oral motor skills for solid and liquid ingestion, and overall feeding safety.
Elderly persons with neurologic impairments

and 5-level ordinal scales (different weighting)
Range: 20-100 (total score of all items); Cut-off score: ≥95 (start oral diet); <95 (non-oral diet). Interpretation:
One of the five Nursing Outcome Classification (NOC) nursing outcomes that contain essential indicators to assess the entire swallowing process.
1. Ability to bring food to the mouth
2. Integrity of the chewing structures
3. Ability to maintain oral content in the mouth
4. Discomfort in swallowing the bolus
5. Emptying of the oral cavity after swallowing the bolus
6. Postural control of the head and neck relative to the body
7. Elevation of the larynx
10.

Conceptual Mapping of Non-Instrumental Clinical Measures
The systematic review utilised OD non-instrumental clinical assessment theory and definitions to inform a deductive thematic analysis of the findings [6]. Based on the thematic analysis, three domains were first identified, followed by sub-domains that were identified and subsumed under the most relevant domain, followed by elements that were subsumed under the most relevant sub-domain. The purpose of the conceptual mapping was to analyse the included measures in relation to how comprehensively they assess the construct of non-instrumental clinical measurement.
The content of the included measures (subscales and their items) varied and covered three domains (Figure 2): (1) Skills Related to Eating and Drinking; (2) Making Adjustments to Facilitate Eating and Drinking; and (3) Swallowing Act. The first domain, 'Skills Related to Eating and Drinking', consists of three subdomains: eating skills (two elements: self-feeding skills (e.g., setting up tray, grasping utensils, bringing food to mouth) and oral preparation (e.g., open-mouth anticipation of food, stripping spoon, biting off, taking appropriate bolus size, sipping from cup, mastication)), oral motor skills (three elements: movement and coordination, strength, and symmetry (e.g., of lips, tongue, soft palate)), and cognitive skills and sensory perception (two elements: cognitive skills (e.g., alertness, cooperation, comprehension) and sensory perception (e.g., taste, smell)).
The second domain, 'Making Adjustments to Facilitate Eating and Drinking', includes two subdomains: modified aspects related to the environment (three elements: instrumental feeding adaptation (e.g., adaptive utensils), adjustment of food and drink intake (e.g., bolus modification/food texture and drink consistency, caloric intake, nutritional supplements), and feeding support (e.g., cueing, prompting, adaptive swallowing strategies, guidance)) and modified aspects related to a person (two elements: posture and head control (e.g., symmetrical upright sitting posture, supported head control) and alternative methods of feeding (e.g., dependency versus independence of the method of food intake: non-oral, tube, or PEG versus oral intake)).
The third domain, 'Swallowing Act', comprises two subdomains: safety of swallowing and efficiency of swallowing. Safety of swallowing includes four elements: respiration (e.g., sputum in the upper airways, coordination of breathing and swallowing, pneumonia, chest status), pain and discomfort (e.g., globus feeling), pharyngeal or laryngeal clearance (e.g., aspiration, cough, choke, throat clearing, gag, pharyngeal response, laryngeal movement, voice change), and trache (i.e., tracheostomy or tracheostomy tube). Efficiency of swallowing consists of three elements: oral residue (e.g., oral food remains, multiple swallows to clear bolus, spitting food or drinks, sputum), speed (e.g., duration of completing meal, speed of oral intake, tiring), and direction (e.g., drooling or lip closure, regurgitation, vomiting, rumination).
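The framework described above is a three-level hierarchy (domain, sub-domain, element), which can be encoded as nested data to count elements and to check how much of it a given measure covers. The Python sketch below is illustrative only: the element labels are shortened from the text, and the helper names `elements_per_domain` and `coverage` are hypothetical rather than part of the review's method.

```python
from typing import Dict, List, Set

# Domains -> sub-domains -> elements, transcribed from the description above.
FRAMEWORK: Dict[str, Dict[str, List[str]]] = {
    "Skills Related to Eating and Drinking": {
        "Eating skills": ["Self-feeding skills", "Oral preparation"],
        "Oral motor skills": ["Movement and coordination", "Strength", "Symmetry"],
        "Cognitive skills and sensory perception": ["Cognitive skills", "Sensory perception"],
    },
    "Making Adjustments to Facilitate Eating and Drinking": {
        "Modified aspects related to the environment": [
            "Instrumental feeding adaptation",
            "Adjustment of food and drink intake",
            "Feeding support",
        ],
        "Modified aspects related to a person": [
            "Posture and head control",
            "Alternative methods of feeding",
        ],
    },
    "Swallowing Act": {
        "Safety of swallowing": [
            "Respiration",
            "Pain and discomfort",
            "Pharyngeal or laryngeal clearance",
            "Trache",
        ],
        "Efficiency of swallowing": ["Oral residue", "Speed", "Direction"],
    },
}


def elements_per_domain() -> Dict[str, int]:
    """Number of elements in each domain of the framework."""
    return {d: sum(len(els) for els in subs.values()) for d, subs in FRAMEWORK.items()}


def coverage(measure_elements: Set[str]) -> Dict[str, int]:
    """Count how many elements of each domain a measure's items address."""
    return {
        d: sum(1 for els in subs.values() for e in els if e in measure_elements)
        for d, subs in FRAMEWORK.items()
    }


print(elements_per_domain())  # 7, 5 and 7 elements: 19 in total
print(coverage({"Speed"}))  # a hypothetical measure covering a single element
```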

Validity Evidence
The validity properties of the measures (content validity, criterion validity (where applicable), and construct validity (i.e., hypothesis testing, structural validity, and cross-cultural validity (where applicable))) are summarised in Table 3. Additionally, Table 4 provides an overview of the psychometric properties reported for each measure.

Results:
There was a significant positive correlation (Kendall's tau = 0.69, p < 0.01) between EDACS level and the level of assistance required to bring food and fluid to the mouth, and a statistically significant but only moderate positive correlation (Kendall's tau = 0.5, p < 0.01) between the EDACS and the GMFCS [33]. According to the ROC curve analysis, the optimal cut-off score to maximize the sum of sensitivity and specificity was 5, with a sensitivity of 90.9% and a specificity of 67.5% [35]. Correlations between FOIS and pooling score were r = −0.355 (p = 0.008) for semisolids and r = −0.180 (p = 0.189) for liquids [43].
Aspect/Method: Diagnostic accuracy
Results: When compared with PAS for identifying dysphagia for liquids, FOIS had a sensitivity of 6.3% and a specificity of 94.9%, whilst for semisolids these values were 6.1% and 95.5%, respectively. When compared with pooling score, sensitivity was 10% and specificity was 97.1% for liquids, whilst the values for semisolids were 13.6% and 100%, respectively [43].
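Optimal cut-offs such as the one reported in [35] are commonly chosen by maximising the sum of sensitivity and specificity (Youden's J). A minimal Python sketch of that procedure follows, assuming that higher scores indicate dysphagia and that the reference standard is binary (for example, a classification derived from PAS). The scores and outcomes are hypothetical and do not reproduce any included study.

```python
from typing import Sequence, Tuple


def sens_spec(scores: Sequence[float], has_condition: Sequence[bool],
              cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when score >= cutoff counts as test-positive."""
    tp = fn = tn = fp = 0
    for s, y in zip(scores, has_condition):
        positive = s >= cutoff
        if y and positive:
            tp += 1
        elif y:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case with and one without the condition")
    return tp / (tp + fn), tn / (tn + fp)


def youden_cutoff(scores: Sequence[float], has_condition: Sequence[bool]):
    """Return (cutoff, J, sensitivity, specificity) maximising J = Se + Sp - 1."""
    best = None
    for c in sorted(set(scores)):
        se, sp = sens_spec(scores, has_condition, c)
        j = se + sp - 1
        if best is None or j > best[1]:  # on ties, keep the lower cut-off
            best = (c, j, se, sp)
    return best


# Hypothetical scores (higher = more symptoms) and reference-standard outcomes.
scores = [1, 2, 3, 4, 5, 6, 7, 8]
reference = [False, False, False, True, False, True, True, True]
print(youden_cutoff(scores, reference))  # (4, 0.75, 1.0, 0.75)
```

In this toy data, cut-offs of 4 and 6 give the same J; the sketch keeps the lower one, whereas published analyses may use other tie-breaking rules.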

Results:
Using Spearman correlations, the MISA score correlated moderately with age (r = −0.58, p < 0.001) but weakly with gender (r = −0.34, p < 0.02). Both correlations were negative and only age was significant. The relationship with stroke severity (discharge destination) was significant (H = 12.7, df = 3, p < 0.005). Dysphagia status was highly significant (p < 0.0001), but location of lesion was not (p < 0.01). Correlations between the MISA score and first or repeated stroke, and between the MISA score and location of lesion, were low, negative, and non-significant (r = −0.07, p < 0.67 and r = −0.14, p < 0.35, respectively). Low and non-significant correlations were obtained between the MISA score and type of stroke (r = 0.06, p < 0.7) [48]. Spearman rho correlations between the SFAM and FOIS showed a strong significant relationship at admission (ρ = 0.926, p < 0.01) and at discharge (ρ = 0.706, p < 0.01) [50].
Aspect/Method: Convergent validity
Results: Strong correlations were found at admission between the SFAM levels and the food texture ratings (r = 0.779, p ≤ 0.001) and between the SFAM levels and the liquid consistency ratings (r = 0.762, p ≤ 0.001). Moderately strong correlations were apparent at discharge between the SFAM levels and the food texture ratings (r = 0.673, p ≤ 0.001) and between the SFAM levels and the liquid consistency ratings (r = 0.567, p ≤ 0.001) [51].
Aspect/Method: Predictive validity
Results: When predicting discharge status by age, 72% of younger patients (50 years old and younger) reached SFAM Level 5, 6, or 7 (mild to no dysphagia), compared with 51% of older patients, and 59% of younger patients had a length of stay of 14 days or less, compared with 27% of older patients. When predicting discharge for patients with a cognitive FIM score of 14 or lower, 82% had severe dysphagia (SFAM score of 1 or 2), compared with 35% who had moderate dysphagia (SFAM score of 3 or 4); 61% had a length of stay of 15 days or more, compared with 39% who had a length of stay of 14 days or less [52].
Spearman's correlation coefficients between the SPEAD-rate and subjective swallowing outcomes were ρ = 0.71 (p < 0.001) for self-rated percentage eating and drinking speed, ρ = 0.72 (p < 0.001) for self-rated percentage swallow function, ρ = −0.68 (p < 0.001) for SWAL-QOL total score, and ρ = −0.70 for degree of dysphagia rated by the SLP. Similarly, the SPEAD-rate correlated with objective swallowing outcomes: ρ = 0.70 (p < 0.001) for FOIS, ρ = −0.51 (p = 0.001) for DIGEST grade, ρ = −0.50 (p = 0.001) for aspiration on VFS, and ρ = 0.49 (p < 0.001) for maximal mouth opening [30].
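Many of the convergent-validity results above are Spearman rank correlations. A minimal from-scratch Python sketch follows, assuming the usual definition of Spearman's rho as the Pearson correlation of the ranks, with tied values given their average rank. The paired scores are hypothetical, no p-value is computed, and in practice a statistics package would typically be used instead.

```python
from math import sqrt
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired scores for five patients on two ordinal scales.
print(round(spearman_rho([1, 2, 3, 4, 5], [5, 6, 7, 8, 7]), 3))  # 0.821
```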

Results:
Correlations of the SPEAD-rate with participant-reported dyspnoea, pain, and fatigue were weak (Spearman's ρ between 0.25 and 0.28) [30].
Aspect/Method: Discriminant validity
Results: As hypothesized, patients had a median SPEAD-rate of 2 g/s (range 0-10), compared with 6 g/s (range 2-11) for healthy participants, corresponding to a large effect size of 0.56. When participants were divided into four groups based on degree of dysphagia rated by the SLP (no, mild, moderate, and severe, with healthy participants rated as no dysphagia), the SPEAD-rate decreased with increasing degree of dysphagia (p < 0.001) [30].
Aspect/Method: Diagnostic accuracy
Results: When the SPEAD-rate was used to discriminate between patients and healthy participants, the area under the ROC curve was 0.82, with an optimal cut-off value of 4.2 g/s (sensitivity 80%, specificity 79%). When the SPEAD-rate was used to detect aspiration, the area under the ROC curve was 0.79, with an optimal cut-off value of 1.2 g/s, giving 100% sensitivity and 57% specificity [30].
The parametric bootstrap approximation to the Pearson chi-squared goodness-of-fit statistic indicated that the values obtained in the sample were similar to those expected from the model (p = 0.510). The fit on the two-way margins based on the 2PPC model showed no discrepancies in percentages of adjustment among indicators (chi-square residuals < 3.5), indicating similarity between the observed frequencies in the sample and the expected frequencies from the model. These results show good fit to the model and unidimensionality of the scale [25].
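The SPEAD-rate results are expressed as areas under the ROC curve. As an illustration only, the AUC can be computed as the Mann-Whitney probability that a randomly chosen patient is ranked as more abnormal than a randomly chosen healthy participant, with ties counted as one half. Because lower SPEAD-rates indicated dysphagia in [30], the sketch includes a flag for the direction of the scale; the function name and the eating-rate values are hypothetical.

```python
from typing import Sequence


def auc(positives: Sequence[float], negatives: Sequence[float],
        lower_is_positive: bool = False) -> float:
    """Area under the ROC curve via the Mann-Whitney formulation.

    Returns the proportion of (positive, negative) pairs in which the
    positive case is more abnormal, counting ties as 0.5. Set
    lower_is_positive=True when lower values indicate the condition.
    """
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p == n:
                wins += 0.5
            elif (p < n) == lower_is_positive:
                wins += 1.0
    return wins / (len(positives) * len(negatives))


# Hypothetical eating rates in g/s: patients tend to eat more slowly.
patients = [1, 2, 3]
healthy = [2, 5, 6]
print(round(auc(patients, healthy, lower_is_positive=True), 3))  # 0.833
```

The pairwise loop is quadratic in sample size, which is adequate for illustration; rank-based formulas give the same value more efficiently.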

Aspect/Method:
Conducted Differential Item functioning (DIF) analysis for gender, age, type of stroke, and stroke severity [25].

Results:
The measure did not show DIF for gender, age, type of stroke, and severity of stroke, indicating that these characteristics did not affect the final Swallowing Status outcome [25].

Results:
The results showed good fit to the model and that the measure is unidimensional [25].
The effects of sex were significant across all variables (discrete bites, masticatory cycles and swallows per cracker, total time to ingest, masticatory cycles per bolus, and swallows per bolus) with the exception of the derived measures of average time per masticatory cycle and average time per swallow [31].
, not main focus of study); † Including test-retest, intra, inter, intraclass correlation coefficient or Kappa; ‡ smallest detectable change or limits of agreement or minimal important change; * Hypothesis about relation between included measure and other instrument(s); ** Including differential item functioning, Measurement invariance, or item response theory (Rasch Analysis); *** Including classic test theory (Factor analysis) or item response theory (Rasch analysis).
The most commonly reported aspect of construct validity within the included studies was hypothesis testing (the degree to which the results are consistent with hypotheses based on the assumption that the instrument validly measures the construct to be measured [15,16]), with relevant data available for all 16 measures (see Table 4). Conversely, structural validity (the extent to which an instrument's scores adequately reflect the dimensionality of the construct to be measured [15,16]) was reported for only three (19%) of the sixteen measures: DDS [28], MASA-C [29], and Swallowing Status [39]. Cross-cultural validity refers to the degree to which the performance of the items on a translated or culturally adapted instrument adequately reflects the performance of the items of the original version of the instrument [15,16]. As translated versions of the included measures were excluded from this review, only other forms of measurement invariance were considered as a parameter of cross-cultural validity, where applicable. For two measures (MASA-C [29] and mMASA [22]), measurement invariance could have been determined but was not reported.

Reliability Evidence
The reliability properties of the measures (internal consistency, reliability (i.e., test-retest and intra-/inter-rater agreement), and measurement error) are outlined in Table 3. Internal consistency was reported for nine of the sixteen (56%) measures and was calculated using Cronbach's alpha in each case, with values ranging from "good" (α = 0.71) to "excellent" (α = 0.99), indicating sufficient internal consistency for each of these measures [59].
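Since every measure with internal-consistency data used Cronbach's alpha, a minimal Python sketch of the standard formula may help when reading these values: alpha = k / (k - 1) × (1 − sum of item variances / variance of total scores), for k items. The function name and the item scores below are hypothetical.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents x items score matrix.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    items = list(zip(*responses))  # transpose rows (respondents) into item columns
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical scores of four patients on a two-item subscale.
print(round(cronbach_alpha([[2, 3], [4, 5], [3, 3], [5, 6]]), 3))  # 0.967
```

Sample variances are used for both the items and the total; because the same denominator appears in numerator and denominator of the ratio, population variances would give the same alpha.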
Data on the reliability measurement property were reported for all but two measures. Inter-rater agreement was determined for 12 of the 16 (75%) measures, with the intraclass correlation coefficient (ICC) and the kappa coefficient being the most commonly reported statistics. ICC was reported for eight measures (50%), with values ranging from "moderate" (ICC = 0.68) to "excellent" (ICC = 1.00) [60]. The kappa coefficient was reported for five measures (31%), with values ranging from "moderate" (κ = 0.45) to "very good" (κ = 0.91) [61]. Additionally, intra-rater agreement was reported for four of the sixteen (25%) measures, using ICC in all four cases, with all values "excellent" (ICC = 0.94 to 1.00). Test-retest reliability was reported for four measures (25%), again using ICC, with values ranging from "moderate" (ICC = 0.571) to "excellent" (ICC = 1.00). For these four measures, the time interval between test and retest varied from approximately 15 min to six weeks. No data on measurement error were reported for any of the measures. Table 4 provides an overview of the reported psychometric properties within the domains of reliability and validity per measure.
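The two agreement statistics reported most often here were ICC and kappa. This section does not specify which ICC model each study used, so the Python sketch below assumes one common form, ICC(2,1) (two-way random effects, absolute agreement, single rater, in the Shrout and Fleiss formulation), together with unweighted Cohen's kappa for two raters. The function names and ratings are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters assigning categorical ratings."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1) for a subjects x raters matrix (two-way random, absolute agreement)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical ratings: two clinicians rating the same four patients.
print(cohens_kappa(["yes", "yes", "no", "no"], ["yes", "no", "no", "no"]))  # 0.5
print(round(icc_2_1([[1, 2], [2, 3], [3, 3], [4, 5]]), 3))  # 0.8
```

ICC(2,1) penalises systematic differences between raters (the rater mean square term), which is why it is often preferred when absolute agreement matters; consistency-type ICCs would ignore that term.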

Conceptual Mapping of Included Measures
Three assessment domains were identified: 'Skills Related to Eating and Drinking', 'Making Adjustments to Facilitate Eating and Drinking', and 'Swallowing Act' (see Figure 2). These three domains were divided into individual subdomains and elements to structure the analysis of the 16 non-instrumental clinical measures within this study.
Eight measures included items specific to the first domain, 'Skills Related to Eating and Drinking', with MASA [27] and MASA-C [29] including six of its seven elements (all but 'Self-feeding skills') and EDACS [33] and TOMASS [40] each including only one element. All but one measure (TOMASS [40]) included items specific to the second domain, 'Making Adjustments to Facilitate Eating and Drinking', with DDS [28] and DMSS [28] including all five elements, whilst seven measures (EDSQ [35], IDDSI-FDS [36], MASA [27], MASA-C [29], M-MASA [22], Swallowing Status [39], and SPEAD [30]) included only one element. Thirteen of the measures (81.3%) included items specific to the third domain, 'Swallowing Act', with EDACS [33] including six of its seven elements (all but 'Trache'), whilst SPEAD [30] and TOMASS [40] each included only one element ('Speed'). Within the first domain, 'Oral preparation' was the most frequently included element (seven of sixteen measures) and 'Sensory perception' the least (two of sixteen). Within the second domain, 'Adjust food & drink intake' was included most often (13 of 16) and 'Feeding adaptation' least often (2 of 16). Finally, within the third domain, 'Pharyngeal or laryngeal clearance' was included most often (10 of 16) and 'Trache' least often (2 of 16).
Overall, only seven of the sixteen measures included at least one item specific to each of the three domains, though thirteen of the sixteen included items specific to at least two of the three domains. Three measures targeted a single domain only. Measures included a mean of approximately six elements (M = 6.4; SD = 2.8). MASA [27] and MASA-C [29] included the most elements (12 of 19), whilst IDDSI-FDS [36] included the fewest (1 of 19) across the three domains.
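The aggregate domain counts reported above can be cross-checked for internal consistency. The Python sketch below encodes only figures stated in the text (16 measures; 7, 5, and 7 elements per domain; 8, 15, and 13 measures per domain; 7 measures covering all three domains, 13 covering at least two, and 3 covering one); the value of six measures covering exactly two domains is derived as 13 − 7. It checks that counting (measure, domain) pairs per measure and per domain gives the same total, and reproduces the reported percentages under half-up rounding, which is an assumption about the authors' convention. It does not verify the per-measure mapping shown in Figure 2.

```python
from decimal import Decimal, ROUND_HALF_UP

N_MEASURES = 16
# Elements per domain, as stated in the text
ELEMENTS_PER_DOMAIN = {
    "Skills Related to Eating and Drinking": 7,
    "Making Adjustments to Facilitate Eating and Drinking": 5,
    "Swallowing Act": 7,
}
# Measures including at least one item from each domain
MEASURES_PER_DOMAIN = {
    "Skills Related to Eating and Drinking": 8,
    "Making Adjustments to Facilitate Eating and Drinking": 15,
    "Swallowing Act": 13,
}
# Measures by number of domains covered; 6 = 13 (at least two) - 7 (all three)
MEASURES_BY_DOMAINS_COVERED = {3: 7, 2: 13 - 7, 1: 3}


def pct(count: int, total: int = N_MEASURES, places: str = "0.1") -> Decimal:
    """Percentage rounded half-up (Python's built-in round() rounds half to even)."""
    value = Decimal(count) * 100 / Decimal(total)
    return value.quantize(Decimal(places), rounding=ROUND_HALF_UP)


def tallies_consistent() -> bool:
    # Every measure must fall into exactly one coverage group.
    partitions = sum(MEASURES_BY_DOMAINS_COVERED.values()) == N_MEASURES
    # Counting (measure, domain) pairs by measure and by domain must agree.
    by_measure = sum(d * m for d, m in MEASURES_BY_DOMAINS_COVERED.items())
    by_domain = sum(MEASURES_PER_DOMAIN.values())
    return partitions and by_measure == by_domain


print(sum(ELEMENTS_PER_DOMAIN.values()))    # 19
print(tallies_consistent())                 # True
print(pct(13), pct(7), pct(3, places="1"))  # 81.3 43.8 19
```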

Methodological Quality
Supplementary File S5 shows the outcomes of the QualSyst critical appraisal tool by Kmet et al. [19]. As all studies had sufficient methodological quality, none were excluded. Overall methodological quality was strong, with ratings for the 32 included studies ranging from 90% to 100% across the ten aspects assessed. The item most commonly rated "Partial" or "No" was item 10, "Analytic methods described/justified and appropriate"; these ratings resulted from studies not meeting the criteria for good psychometric properties.
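For readers unfamiliar with QualSyst, the sketch below shows how a summary score of the kind reported above is typically derived, assuming the scoring described by Kmet et al. [19]: each applicable item is rated Yes (2), Partial (1), or No (0), items rated N/A are excluded, and the summary score is the points earned divided by the points possible. The function name and example ratings are hypothetical. With ten applicable items, a single "Partial" rating yields 95% and a single "No" rating yields 90%.

```python
POINTS = {"yes": 2, "partial": 1, "no": 0}


def qualsyst_summary(ratings: list[str]) -> float:
    """Summary score = points earned / points possible, excluding N/A items."""
    normalised = [r.strip().lower() for r in ratings]
    applicable = [r for r in normalised if r != "n/a"]
    if not applicable:
        raise ValueError("at least one applicable item is required")
    earned = sum(POINTS[r] for r in applicable)  # KeyError flags unknown labels
    return earned / (2 * len(applicable))


# Hypothetical ratings for one study on ten applicable items
ratings = ["Yes"] * 9 + ["Partial"]
print(f"{qualsyst_summary(ratings):.0%}")  # 95%
```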

Discussion
This systematic review, conducted in line with the PRISMA guidelines [17], aimed to provide an overview of the psychometric properties of clinician-reported non-instrumental assessments in OD. A total of 16 measures were retrieved with published data on one or more psychometric properties within the validity and/or reliability domains. No data were available on measurement error, and only three measures provided data on structural validity. As a result, none of the included measures provided a complete overview of their psychometric properties. Furthermore, the validity and reliability data retrieved from the literature may not always meet current psychometric standards. In other words, measures for which psychometric data are available may not meet the methodological quality criteria, such as those defined by COSMIN, that would support their implementation in research and daily clinical practice.
An important finding is that very few measures identified in the literature comprehensively measure the construct of non-instrumental clinical assessment, with seven measures (43.8%) consisting of a single scale covering only single aspects of OD. Based on the conceptual framework of non-instrumental clinical measures introduced in this review, the identified measures collectively varied greatly across domains, subdomains, and elements. The number of elements per measure ranged from one, reflecting a very narrow focus, to twelve, reflecting a very broad focus. On average, measures consisted of six elements. Even measures targeting all three domains of our conceptual framework (skills related to eating and drinking, making adjustments to facilitate eating and drinking, and swallowing act) would still exclude several subdomains and elements from the assessment of people with OD. Therefore, since OD is a multidimensional phenomenon [6] and most measures focus only on restricted aspects of OD, clinicians aiming to capture the full concept of OD should use multiple measures.
The conceptual framework also highlighted the importance of a multidisciplinary approach in the assessment of OD. Healthcare professionals from different disciplines can each contribute to a comprehensive evaluation across OD assessment domains. By combining expertise from different disciplines (e.g., speech pathologists, occupational therapists, nurses, psychologists, pulmonologists), OD can be evaluated in all its multidimensional aspects. Consequently, experts from all relevant disciplines should be involved from the outset of instrument development to ensure good content validity [15].
This review is a first step towards optimising non-instrumental clinical assessment of OD. Although it was based on the terminology and definitions of the COSMIN taxonomy [15,16], a further, more in-depth psychometric review is recommended that follows the robust COSMIN methodology and compares psychometric data and statistical methods against the quality criteria formulated by the COSMIN group. This would enable both the quality of the psychometric studies and the quality of the measures' psychometric properties to be evaluated thoroughly.
Future research should focus on developing more comprehensive non-instrumental clinical assessments that capture OD as a multidimensional phenomenon, using contemporary psychometric standards and methods such as item response theory and classical test theory. All psychometric properties should be determined and reported so that validity, reliability, and responsiveness can be established. Finally, before newly developed measures are implemented in research and clinical practice, feasibility aspects such as time constraints and accessibility should be considered. Non-instrumental clinical measures with robust psychometric properties could be of critical value, especially in health settings where instrumental assessment is unavailable or its availability is restricted.
Although the reporting of this review followed the PRISMA guidelines to reduce bias, some limitations are inherent to this study. As only studies and measures published in English were included, relevant measures published in other languages may have been missed. According to the COSMIN framework [15,16], nine psychometric properties should be considered where applicable. However, since no international consensus exists on a gold-standard non-instrumental clinician-reported assessment in OD, criterion validity was limited to comparisons between original measures and their revised versions (e.g., shortened versions or versions adapted to specific target populations). Furthermore, since only measures developed in English were included, translated versions were excluded from this review, limiting cross-cultural validity to other forms of measurement invariance, such as invariance across different clinical populations. Finally, as identifying studies on responsiveness would have required different search strategies in the electronic databases, this psychometric domain was outside the scope of the current review.

Conclusions
This systematic review, which followed the PRISMA guidelines and used terminology as defined by the COSMIN framework, summarised the reliability and validity of non-instrumental clinical assessments for adults with OD, excluding patient self-report. Only 16 measures with reported psychometric characteristics were identified. Even though OD is considered a multidimensional phenomenon, most measures captured only restricted subdomains within the conceptual framework of non-instrumental clinical assessments. Further, data on the reliability and validity of the included measures proved incomplete and did not always meet current psychometric standards. Future research should focus on the development of comprehensive non-instrumental clinical assessments for adults with OD using contemporary psychometric research methods.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/jcm12020721/s1, File S1: PRISMA 2020 for Abstracts Checklist; File S2: PRISMA 2020 Checklist; File S3: Definitions of the nine measurement properties according to COSMIN; File S4: Overview of excluded measures for OD in adults; File S5: Methodological quality of included studies based on QualSyst critical appraisal tool by Kmet et al., 2004 [19].