Quantitative Measurement of Swallowing Performance Using Iowa Oral Performance Instrument: A Systematic Review and Meta-Analysis

Swallowing is a complex but stereotyped motor activity that serves two vital purposes: alimentary function and protection of the upper airways. Therefore, any impairment of the swallowing act can represent a significant clinical and personal problem that requires an accurate diagnosis by means of reliable and non-invasive techniques. Thus, a systematic review and meta-analysis was performed to investigate the reliability of the Iowa Oral Performance Instrument (IOPI) in distinguishing healthy controls (HC) from patients affected by swallowing disorders or by pathologies and conditions that imply dysphagia. A comprehensive search was conducted following the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) guidelines, using the PubMed, Scopus, Web of Science, Cochrane, and Lilacs databases. Overall, 271 articles were identified and, after a three-step screening, 33 case-control and interventional studies reporting IOPI measurements were included. The retrieved studies were judged to be at low risk of bias. The meta-analysis of case-control studies showed that maximum tongue pressure (MIP) values were always higher in HC than in patients, with an overall MIP mean difference of 18.2 kPa (95% CI: 17.7–18.7 kPa). This result was confirmed when the sample was split into adults and children, although the MIP difference between HC and patients was greater in children than in adults (overall MIP mean difference: 21.0 vs. 15.4 kPa, respectively). Tongue endurance (TE) showed conflicting results among studies, with an overall effect near zero (0.7 s, 95% CI: 0.2–1.1 s) and a slight tendency toward higher TE values in HC than in patients. Among the intervention studies, MIP values were higher after treatment than before, with a better outcome after the experimental tongue training exercise than after traditional treatments (overall MIP mean difference: 10.8 and 2.3 kPa, respectively).
In conclusion, MIP values can be considered a reliable measure of swallowing function in adults and in children, with a more marked MIP difference between HC and patients in children. In patients, MIP measures can also detect the greater improvement in tongue function after tongue training exercises than after traditional training.


Introduction
Swallowing may appear to be a simple, obvious, and effortless act, but it actually involves complex mechanisms that engage all levels of the central nervous system (CNS) as well as 25 pairs of muscles in the oropharynx, larynx, and esophagus. Indeed, the oral cavity, pharynx, and larynx, though anatomically separate, are functionally [...]
The IOPI is an innovative tool that can effectively measure the pressure exerted by the tongue on an air-filled bulb [20] when the bulb is placed inside the mouth and pressed against the roof of the mouth. The measured pressures are displayed on a liquid crystal display (LCD) screen (Figure 1b,c).
Maximum isometric anterior tongue pressure is measured by placing the bulb posterior to the upper incisors, on the alveolar ridge (Figure 1b). Maximum isometric posterior tongue pressure is measured by placing the bulb 10 mm anterior to the most posterior circumvallate papilla (Figure 1c) [21]. To obtain the maximum tongue pressure (pMax), subjects are asked to push the tongue against the hard palate as hard as possible [22]. Tongue endurance is measured by placing the bulb in the desired location (anterior or posterior) and recording the number of seconds (s) a subject can maintain tongue pressure at 50% of pMax. Visual feedback from the IOPI and verbal encouragement from the clinician are provided during the endurance trials. The trial ends when the pressure drops steeply or when 50% of pMax cannot be maintained for a few seconds [23]. Tongue protrusion strength can also be measured, with the holder positioned between the upper and lower incisors and the tongue bulb facing intraorally; patients are then instructed to protrude the tongue against the bulb as hard as possible [24].
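The endurance protocol described above can be expressed as a short computation on a pressure trace. This is a minimal, hypothetical sketch: it assumes the pressure trace is available as a list of samples (kPa) at a fixed sampling rate, and the names `max_pressure` and `endurance_seconds` and the `grace_s` parameter (standing in for "a few seconds") are illustrative assumptions, not part of any IOPI software.

```python
from typing import Optional, Sequence


def max_pressure(trials: Sequence[Sequence[float]]) -> float:
    """pMax: the highest peak pressure (kPa) across repeated maximal-effort trials."""
    return max(max(trial) for trial in trials)


def endurance_seconds(trace: Sequence[float], p_max: float, rate_hz: float,
                      target: float = 0.5, grace_s: float = 2.0) -> float:
    """Seconds from first reaching target * pMax until the pressure stays below
    that threshold for longer than grace_s (hypothetical stopping rule)."""
    threshold = target * p_max
    grace_samples = int(grace_s * rate_hz)
    start: Optional[int] = None       # index where the threshold was first reached
    last_above: Optional[int] = None  # most recent index at or above the threshold
    below_run = 0                     # consecutive samples below the threshold
    for i, p in enumerate(trace):
        if p >= threshold:
            if start is None:
                start = i
            last_above = i
            below_run = 0
        elif start is not None:
            below_run += 1
            if below_run > grace_samples:
                break  # threshold not regained within the grace period: trial ends
    if start is None or last_above is None:
        return 0.0
    return (last_above - start + 1) / rate_hz


# Hypothetical example: three maximal trials, then one endurance trace sampled at 10 Hz.
trials = [[20.0, 55.0, 48.0], [30.0, 60.0, 52.0], [25.0, 58.0, 40.0]]
p_max = max_pressure(trials)                     # 60.0 kPa, so the 50% target is 30 kPa
trace = [10.0] * 5 + [35.0] * 50 + [20.0] * 30   # 50 samples at or above 30 kPa
print(endurance_seconds(trace, p_max, rate_hz=10.0))  # 5.0
```

Setting `target=0.25` would correspond to the 25% variant used in one of the included studies.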
Since the tongue muscles change the shape and position of the tongue during swallowing [25], evaluating tongue muscle function, and thus measuring tongue pressures, lends itself well to assessing swallowing function [26]. Moreover, IOPI provides biofeedback for oral motor exercise and objectively quantifies patient performance. This systematic review was motivated by the clinical need for objective measures and the practical need for easy-to-use devices. The aims were (1) to investigate the reliability of IOPI in distinguishing patients with swallowing disorders or conditions that imply dysphagia from healthy controls (HC), (2) to assess whether this reliability holds regardless of age, and (3) to determine the impact and effectiveness of tongue training exercises on swallowing performance compared to traditional training.

Materials and Methods
This systematic review was performed according to the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) statement [27]. The review protocol was registered in the international prospective register of systematic reviews (PROSPERO) under the identification number CRD42022297506. The review addresses a focused question defined using the participants, intervention, comparison, and outcomes (PICO) criteria.

Search Strategy
An electronic search of scientific databases (PubMed, Scopus, Web of Science, Cochrane, Lilacs, ACM Digital, EBSCOhost, and Google Scholar) was performed to identify suitable studies published after 31 December 1999, using the following terms and keywords alone or in combination: ("Iowa Oral Performance Instrument") AND (swallow* OR deglut* OR tongue) AND (abnormal* OR normal* OR physiolog* OR typical OR atypical OR disorder OR dysfunction OR dysphagia) AND (assess* OR analy* OR evaluat* OR quanti* OR measure*) NOT (review OR "case report*" OR "case series" OR preclinical OR animal).
The first search was performed in December 2021, repeated up to April 2022, and updated in September 2022. In addition to the electronic search, the reference lists of the selected studies were manually screened. Duplicates were removed using reference management software (Zotero, George Mason University, Fairfax, VA, USA), first automatically and then by manually checking the resulting list.

Eligibility Criteria
All inclusion/exclusion criteria are summarized in Table 1. The search was limited to studies published in English and to human studies whose populations included healthy or diseased children or adults of both genders. Only studies that used IOPI were selected, and IOPI outcomes, such as tongue pressures and tongue endurance, had to be reported. Case-control and intervention studies were eligible for inclusion in this systematic review.

Exclusion Criteria
All studies reporting other diagnostic tools and not referring to IOPI measures were excluded [28][29][30]. In addition, studies that were not peer-reviewed and were published in journals without an impact factor were excluded [31][32][33][34][35][36][37]. When the mean and standard deviation of the outcomes were not reported, the corresponding authors were contacted by email; only articles for which these values were provided were included, and the others were excluded [38][39][40]. Case reports, preclinical studies, reviews, systematic reviews, and meta-analyses were also excluded.

Focused PICO Question
Outcomes: maximum tongue pressure (MIP), lingual swallowing pressure (LSP), tongue endurance (TE), and tongue protrusion strength (TPS). We hypothesized that:
1. IOPI is a reliable tool to distinguish HC from patients with swallowing disorders or pathologies and conditions that imply dysphagia.
2. IOPI reliability is similar for children and adults.
3. IOPI is able to measure an improvement in swallowing performance following traditional treatments and tongue training exercises in HC and in patients.

Selection of Studies
Retrieved citations were independently screened by two authors (EDM and FGC), and relevant studies were identified based on title and abstract. When these did not provide sufficient information about the inclusion criteria, the full text was evaluated to assess eligibility. Disagreements were resolved by discussion, and a third reviewer (VP) was consulted to make final decisions.

Data Extraction and Analysis
Author and year, study design, study duration, participant baseline characteristics (number, mean age, age range, gender, pathologies), intervention, drop-outs/losses to follow-up, follow-up duration, and outcomes (MIP, TE, LSP, and TPS) were extracted independently from each included study by two authors (EDM and FGC) using a predesigned data extraction form. Microsoft Excel 2020 (Microsoft Office, Microsoft Corporation, Redmond, WA, USA) was used for data collection and descriptive analysis. A third reviewer (VP) was consulted when difficulties arose. The primary outcome was MIP; the secondary outcomes were TE, LSP, and TPS.

Assessment of Methodological Quality
The quality assessment of the included studies was independently performed by two reviewers (EDM and FGC) as part of the data extraction procedure.
Specifically, the Newcastle-Ottawa Scale (NOS) [41] was applied to case-control and cross-sectional studies, judging each study on eight items distributed across three categories: "Selection", "Comparability", and "Exposure". Each item is a multiple-choice question; depending on the answer, a study could be awarded a maximum of one star for each item in the Selection and Exposure categories and a maximum of two stars for the Comparability category, for a maximum of nine stars per study.
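The star arithmetic of the case-control NOS can be written as a small tally. This sketch assumes the standard case-control item split (four Selection items, one Comparability item, three Exposure items), which gives the eight items and nine-star maximum described above; it only counts stars, does not encode the answer options or any risk-of-bias cut-off, and the names `NOS_CASE_CONTROL` and `nos_total` are illustrative.

```python
# Case-control NOS layout: category -> (number of items, maximum stars per item).
NOS_CASE_CONTROL = {
    "Selection": (4, 1),
    "Comparability": (1, 2),
    "Exposure": (3, 1),
}


def nos_total(stars: dict) -> int:
    """Sum the stars awarded per item, checking item counts and per-item caps."""
    total = 0
    for category, (n_items, max_per_item) in NOS_CASE_CONTROL.items():
        awarded = stars.get(category, [])
        if len(awarded) != n_items:
            raise ValueError(f"{category}: expected {n_items} items, got {len(awarded)}")
        for s in awarded:
            if not 0 <= s <= max_per_item:
                raise ValueError(f"{category}: {s} stars is outside 0..{max_per_item}")
        total += sum(awarded)
    return total


max_stars = sum(n * cap for n, cap in NOS_CASE_CONTROL.values())
example = {"Selection": [1, 1, 0, 1], "Comparability": [2], "Exposure": [1, 1, 0]}
print(max_stars, nos_total(example))  # 9 7
```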
The ROBINS-I tool [42] was used for non-randomized intervention studies, covering the "Bias due to confounding", "Bias in selection of participants into the study", "Bias in classification of interventions", "Bias due to deviations from intended interventions", "Bias due to missing data", "Bias in measurement of outcomes", and "Bias in selection of the reported result" domains. Each study received one of five possible overall judgements: low, moderate, serious, or critical risk of bias, or no information. The RoB 2 tool [43] was instead used for randomized intervention studies, covering the "Randomization process", "Deviations from intended interventions", "Missing outcome data", "Measurement of the outcome", and "Selection of the reported result" domains. Each study received one of three possible overall judgements: low risk of bias, some concerns, or high risk of bias.
As it was practically unfeasible to keep patients and operators blinded to treatment, the related performance bias (blinding of participants and operators) was not accounted for, except for 4 studies [44][45][46][47] where conditions were explicitly blinded. Any disagreement was solved by discussion or consulting a third reviewer (VP) until consensus was achieved.

Statistical Analysis
Statistical analyses were conducted on the outcome variables of all selected studies that met the eligibility criteria. For these analyses, studies were divided into case-control and intervention studies. Spearman correlations and Student's t-tests were performed between outcome variables when applicable. Meta-analyses were performed when data on an outcome variable were provided by at least four studies. To answer the first PICO question, we selected all studies that measured at least one of the outcome variables in both HC and patients. Because of the heterogeneity of the diseases, the patients' group included all participants with a diagnosis related to swallowing disorders. To answer the second PICO question, the case-control studies were divided into two groups (children and adults) according to participants' age. To answer the third PICO question, we selected all studies that measured at least one of the outcome variables before and after traditional treatments and/or tongue training exercises. Because the age and sex of the populations were heterogeneous among the studies, the effect index for the meta-analyses was the mean difference in each outcome variable between HC and patients for the case-control studies, and between pre- and post-treatment values for the intervention studies. Each study was weighted according to its sample size and the standard deviations of the outcome variables, as described previously [48]. A 95% confidence interval (CI) was estimated for each effect index and for the overall effect.
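Weighting by sample size and standard deviation can be illustrated with a standard inverse-variance, fixed-effect pooling of mean differences. This is a sketch under stated assumptions: the exact scheme of [48] is not reproduced here, the variance formula applies to two independent groups (HC vs. patients), and pre/post comparisons would also need the within-subject correlation, which is not modeled. The names `Group`, `mean_difference`, `ci95`, and `pooled_fixed_effect` and the example numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Group:
    mean: float  # e.g., MIP in kPa
    sd: float
    n: int


def mean_difference(hc: Group, pt: Group) -> tuple:
    """Mean difference (HC - patients) and its variance for two independent groups."""
    md = hc.mean - pt.mean
    var = hc.sd ** 2 / hc.n + pt.sd ** 2 / pt.n
    return md, var


def ci95(estimate: float, var: float, z: float = 1.96) -> tuple:
    """Normal-approximation 95% confidence interval."""
    half = z * math.sqrt(var)
    return estimate - half, estimate + half


def pooled_fixed_effect(studies: list) -> tuple:
    """Inverse-variance weighted mean difference and its 95% CI.
    Larger samples and smaller SDs give a smaller variance, hence a larger weight."""
    effects = [mean_difference(hc, pt) for hc, pt in studies]
    weights = [1.0 / var for _, var in effects]
    total_w = sum(weights)
    pooled = sum(w * md for w, (md, _) in zip(weights, effects)) / total_w
    return pooled, ci95(pooled, 1.0 / total_w)


# Hypothetical studies (not data from this review).
studies = [
    (Group(60.0, 10.0, 25), Group(40.0, 10.0, 25)),
    (Group(55.0, 12.0, 40), Group(42.0, 11.0, 38)),
]
pooled, (low, high) = pooled_fixed_effect(studies)
print(f"pooled MD = {pooled:.1f} kPa, 95% CI {low:.1f} to {high:.1f} kPa")
```

A study whose interval from `ci95` includes zero shows no detectable difference on its own, even if it carries a large weight in the pooled estimate.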

Review Analysis
The search resulted in a total of 271 articles: 84 retrieved from PubMed, 23 from Cochrane, 2 from Lilacs, 77 from Web of Science, 85 from Scopus, 0 from ACM Digital, 81 from Google Scholar, and 32 from EBSCOhost. After duplicate removal, 87 studies were available for screening. Screening of titles and abstracts identified 47 studies. After full-text reading, 33 studies [44][45][46][47] were included, and 14 studies [28–40,78] that did not meet the inclusion/exclusion criteria were excluded. The search strategy is summarized in Figure 2. The excluded studies and the reasons for their exclusion are presented in Table 2.
Table 2. List of excluded studies (n = 14) and reasons for exclusion.
All of the characteristics of the studies and IOPI outcomes are summarized in Table 3.
The Newcastle-Ottawa scale was also applied to studies with a cross-sectional design: three studies [56,58,59] were at low risk of bias, whereas two studies [55,57] were at intermediate risk of bias because of inadequate information in the "Selection", "Comparability", and "Exposure" domains.
The quality assessment of case-control studies is available in Table 4.

Intervention Studies
Among randomized intervention studies, eight [44–47,60,64,65,69] were judged to be at low risk of bias. One study [61] was judged to be at moderate risk of bias because of inadequate information in the "Randomization process" domain of the RoB 2 tool (Figure 3). A low risk of bias was also found in four of the five non-randomized studies [62,66–68]; only one study [63] showed an overall moderate risk of bias, due to the "Bias in selection of participants into the study" domain (Table 5).
[...] (Table 6). MIP values were obtained by having subjects push the tongue against the roof of the mouth as hard as possible, thus expressing maximum tongue pressure. TE values were measured by asking subjects to sustain 50% of their maximum pressure for as long as possible; only one study [74] asked subjects to hold 25% of their maximum pressure. TPS values were obtained by asking subjects to protrude the tongue as hard as possible against the bulb, positioned between the upper and lower incisors and facing intraorally. LSP values were defined as the pressures generated while swallowing boluses. Due to the low number of studies, LSP and TPS measurements were excluded from all statistical evaluations. No significant correlation was found between MIP and TE values (p = 1). The meta-analysis showed that MIP values were higher in HC than in patients in every study (Figure 4), with an overall effect of 18.2 kPa (95% CI: 17.7–18.7 kPa).
Unlike MIP, TE showed contrasting results among studies (Figure 5). In three studies (9.09%) [71,72,74], TE values were lower in HC than in patients; in three others (9.09%) [50,54,77], TE values were similar in the two groups; and in the remaining three studies (9.09%) [59,73,76], TE values were higher in HC than in patients. The overall effect was close to zero (0.7 s, 95% CI: 0.2–1.1 s), reflecting a slight tendency toward higher TE values in HC than in patients. However, the study with the largest sample size [77] (n = 150 in total) reported a TE mean difference between the two groups close to zero, with a CI crossing the line of no effect (0.3 s, 95% CI: −1.8 to 2.3 s).
To assess whether IOPI outcomes could detect the effects of treatments on swallowing performance, intervention studies on HC and patients were included. Of all the studies and outcomes investigated, 11 intervention studies [44–47,60–63,66,67,69] (33.33%) reported MIP mean values before and after therapy and were used for statistical analyses. Eight studies [45,46,60,62,63,66,67,69] (24.24%) reported MIP mean values in patients before and after experimental training exercises (Table 7). MIP values were significantly higher after experimental training exercises than before (p = 0.003, paired t-test), with a mean percentage change of 39.6%. Only three studies [44,47,61] (9.09%) measured MIP values before and after the experimental training exercise in HC, with a mean percentage change of 25.1%. Two of these studies [44,61] (6.06%) also measured MIP values twice in the absence of any intervention, and the mean percentage change between the two measurements was 1.7%.
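The two summary quantities reported above, the mean percentage change and the paired t-test, can be sketched as follows. The text does not state whether the mean percentage change averages per-study changes or compares pooled means; this sketch assumes the former. It computes only the paired t statistic and its degrees of freedom, and the pre/post values are hypothetical.

```python
import math
import statistics


def percent_change(pre: float, post: float) -> float:
    """Relative change from pre- to post-treatment, in percent."""
    return (post - pre) / pre * 100.0


def mean_percent_change(pairs: list) -> float:
    """Average of per-study percentage changes (assumed definition)."""
    return statistics.mean(percent_change(pre, post) for pre, post in pairs)


def paired_t(pairs: list) -> tuple:
    """Paired t statistic on post - pre differences, with n - 1 degrees of freedom."""
    diffs = [post - pre for pre, post in pairs]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1


# Hypothetical per-study mean MIP values (kPa) before and after training.
pairs = [(40.0, 50.0), (30.0, 36.0), (50.0, 55.0)]
print(round(mean_percent_change(pairs), 1))  # 18.3
t, df = paired_t(pairs)
print(round(t, 2), df)  # 4.58 2
```

For a p-value, `scipy.stats.ttest_rel(post, pre)` on the same values returns this statistic together with a two-sided p-value.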

Post-Treatment
Four studies [45,46,60,66,69] (12.12%) reported MIP values before and after traditional training in patients, with a mean percentage change of 8.7%. Figure 7 shows MIP values before and after treatment for all intervention studies in HC and in patients.

Discussion
This systematic review showed the usefulness of IOPI as a quantitative measure of swallowing performance. In the case-control studies, MIP values were always greater in HC than in patients. This result was confirmed by the meta-analysis, suggesting that MIP is a reasonably reliable measure that clinicians may use as a non-invasive indicator of impaired swallowing performance related to different pathological conditions. Indeed, the included studies involved patients affected by sleep breathing disorders [53], head and neck cancer [71,72,76], muscular dystrophies [70,73–75], unilateral cleft lip and palate [50], Parkinson's disease [51], the post-stroke period and the period following oral endotracheal extubation [49,54], risk of malnutrition [55], mouth breathing behaviors [56,65], motor speech disorders and speech sound disorders [57], and Down syndrome [59], as well as oral phase dysphagia itself [77]. According to these results, patients showed weaker tongue pressure than HC, indicating poorer tongue muscle function due to any pathology or condition that might affect swallowing performance. Among all MIP measurements, the largest difference between HC and patients was found by Rodríguez-Alcalá et al. [67] in the OSAHS group, despite the limited sample size. IOPI MIP measures might also be used in the prodromal stage of dysphagia, which may herald disease progression [79].
The same cannot be said for TE measurements. Some studies [71,72,74] reported lower TE values in HC than in patients, others [50,54,77] reported similar TE values in the two groups, and the remaining studies [59,73,76] reported higher TE values in HC than in patients. The slight overall tendency toward higher TE values in HC was mainly driven by Rogus-Pulia et al., 2016 [76], who compared HC with patients with head and neck cancer and reported a very large TE difference between the two groups (20.5 s, 23.9-27.2 s CI), with a sample size similar to that of the other studies (n = 21 per group). Because no clear pattern emerged and the results varied widely, probably owing to the small number of studies investigating this measure, TE cannot currently be considered a reliable variable to distinguish healthy and diseased subjects, and further investigation of this measure is warranted. Consistently, Adams et al., 2013 [80] also noted the contrast between the acceptable reliability of tongue pressure and that of TE measurements. Unlike tongue pressure values, the TE mean values they reported changed by more than 10% between the first two consecutive trials, although this change decreased in subsequent trials; moreover, TE measurements showed unacceptably large typical errors and weak-to-moderate intra-class correlation coefficients.
The meta-analysis in children showed higher MIP values in HC than in patients, except for one study [75] of subjects aged 7–8 years; however, this difference was slight (−1.9 kPa, −3.4 to −0.4 kPa) compared to all of the others [50,56,57,59,65,70,75]. Further analysis showed that the difference between HC and patients was larger in children than in adults, suggesting that children with pathological conditions have more marked tongue weakness than adult patients. Children seem to show less efficient activation of faster, higher-threshold (type II) motor units than adults, leading to a lower maximal pressure output [81]. Tongue pressure also changes with age in HC: according to Potter and Short [82], it increases rapidly between 3 and 8 years of age and then more slowly until peaking in late adolescence or young adulthood. This might explain the MIP results obtained after dividing the studies by participants' age: the difference between HC and patients seems more evident in children, probably because diseased children have a weaker tongue not only because of the pathology but also because of age itself and the ongoing developmental changes in younger subjects.
The analysis of the intervention studies showed a greater improvement in patients after experimental tongue training exercises, especially when MIP values were worse at baseline [63,67]. Rodríguez-Alcalá et al. [67] showed the largest difference between MIP values measured before and after training. The results of Kim et al. [62] did not show a large improvement, probably because the only impaired condition considered in that study was defined as "complaints of swallowing difficulties (i.e., increased aspiration rate and foreign body sensation in throat)", which does not refer to a particularly disabling comorbidity. Tongue pressure also improved after traditional training in patients, although clearly less than after experimental training [45,46,60,69]. O'Connor-Reina et al. [66] also reported MIP values in patients with OSAHS who did not adhere to therapy. These values were not included in the meta-analysis because these patients were comparable to patients who received no intervention: their post-training tongue pressure values were almost identical to, or even slightly lower than, their pre-training values, whereas adherent patients improved after training. Notably, adherent patients had worse baseline tongue pressures than non-adherent patients, so their better compliance may have been due to greater initial motivation. This highlights that experimental tongue training, sometimes combined with traditional training [45,60,69], is effective at improving tongue function, especially in patients with more compromised baseline conditions. Traditional training may also help improve tongue function, but to a lesser extent. The beneficial effects of experimental tongue training in patients might be important, as an improvement in tongue pressure might correspond to a reduction in disease severity in some cases.
For example, Suzuki et al., 2020 [83] observed significantly increased tongue pressures after myofunctional therapy (MFT), consisting of exercises aimed at achieving appropriate functional positioning of the tongue, in OSA patients treated with continuous positive airway pressure (CPAP); the apnea-hypopnea index (AHI) also decreased significantly, suggesting that tongue exercise may have contributed to the reduction in OSA severity.
Tongue training exercises also improved tongue pressures in HC [44,47,61]. By contrast, only a slight difference was found in HC in the absence of intervention [44,61]: the mean tongue pressure after a period without intervention was slightly higher than at baseline. This suggests that periodic measurement of tongue pressure may itself increase tongue muscle activity, which could explain this time effect [84]. Although this may seem of little practical use, according to Lin et al. [84], experimental tongue training in HC offers the possibility of strengthening the tongue, producing positive changes that can prevent or halt the progressive alteration of swallowing characteristic of healthy aging, and thus represents a relevant strategy not only for dysphagia intervention but also for prevention. Finally, the meta-analysis of intervention studies confirmed that patients benefited from experimental training exercises, as shown by the increase in MIP values after therapy. Patients also improved their swallowing function after traditional training, as tongue pressure was always higher after the intervention, but to a lesser extent.

Conclusions
IOPI MIP is a reliable measure of swallowing function in HC and in patients with different pathologies. MIP values were always higher in HC than in patients.
MIP values were also higher in HC than in patients among children, with a more remarkable difference than in adults.
TE, by contrast, cannot be considered a reliable variable to distinguish healthy and diseased subjects, and further investigations are needed.
Experimental tongue training exercises, alone or combined with traditional training, improve tongue function and tongue pressure in patients, objectively more than traditional training alone.
Therefore, IOPI proved to be a valid tool for measuring tongue pressure and detecting the beneficial effects of tongue training in both healthy and diseased subjects.