Search Results (875)

Search Parameters:
Keywords = inter-rater reliability

20 pages, 1265 KB  
Article
Intra-Rater, Inter-Rater, and Test–Retest Reliability of a Laser- and Inclinometer-Based Hip Joint Position Sense Test in Healthy Adults: A Two-Phase Study with Preliminary Reference Values
by Joévin Burnel, Benoit Vallee, Benoit Pairot de Fontenay and Joachim Van Cant
Muscles 2026, 5(2), 45; https://doi.org/10.3390/muscles5020045 (registering DOI) - 19 Jun 2026
Abstract
Hip joint position sense (JPS), a key component of neuromuscular function arising from muscle spindle and periarticular mechanoreceptor input, remains underexplored, with no standardized and reliable clinical protocol available to assess hip proprioception. This study evaluated the intra- and inter-rater reliability of a laser- and inclinometer-based active hip JPS protocol and established preliminary references in healthy adults. A two-phase reliability study was conducted in accordance with GRRAS and COSMIN guidelines: 17 participants for reliability analyses and 57 for preliminary references. Six movement directions were assessed (flexion, extension, abduction, adduction, medial and lateral rotations). Reliability was quantified using intraclass correlation coefficients with their 95% confidence intervals, using two-way random-effects models with absolute agreement (ICC(3,1) for intra-rater and ICC(2,1) for inter-rater analyses), interpreted as poor (<0.50), moderate (0.50–0.70), or good (≥0.70). Absolute measurement error was reported as standard error of measurement (SEM%) and 95% minimal detectable change (MDC95%), normalized to target amplitudes to allow direct cross-direction comparison. Intra-rater reliability ranged from poor to moderate, with experienced raters reaching ICC = 0.64 (95% CI [0.39; 0.80]) for medial rotation. Inter-rater reliability improved across sessions, peaking for medial rotation (ICC = 0.78; 95% CI [0.50; 0.91]). Rotational movements yielded the lowest SEM% (3–6%), indicating high measurement precision despite trial-to-trial variability (MDC% 9–31%). Normative errors were largest in flexion (21.4 cm) and smallest in rotations (≈2.2–2.3°). Despite overall low-to-moderate reliability, the protocol achieved clinically acceptable measurement precision (SEM% < 10%) for rotational tasks, whereas the laser-based sagittal and frontal-plane components remained exploratory. 
The protocol provides preliminary reference values for hip JPS in healthy adults and requires further validation before clinical use.
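The abstract above reports inter-rater reliability as ICC(2,1). As a rough illustration only, the sketch below computes the single-rater, absolute-agreement ICC(2,1) from a subjects-by-raters table using the standard two-way ANOVA mean squares (Shrout and Fleiss form). The function name `icc_2_1` and the example scores are hypothetical and are not taken from the study.

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is an (n subjects x k raters) table with no missing values.
    Hypothetical helper for illustration; not the study's own code.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()

    # Sums of squares for subjects (rows), raters (columns), and residual.
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-subjects mean square
    msc = ss_cols / (k - 1)              # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Illustrative scores only: 6 subjects rated by 4 raters.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(example)
```

For this illustrative table the coefficient comes out at about 0.29, which would fall in the "poor" band (<0.50) under the thresholds quoted in the abstract. Large systematic differences between raters lower ICC(2,1) because the rater mean square enters the denominator.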
27 pages, 460 KB  
Review
Publisher-Built Generative AI Assistants in U.S. Higher Education: A Critical Review and a Reproducible TRIAD–JTBD Evaluation Framework
by Maikel Leon
Algorithms 2026, 19(6), 492; https://doi.org/10.3390/a19060492 (registering DOI) - 19 Jun 2026
Abstract
Artificial intelligence (AI) has reshaped higher education over six decades, evolving from drill-and-practice programs to adaptive cognitive tutors and, most recently, transformer-based generative models. This article presents a critical review of publisher-built generative AI assistants, adopting an explicitly socio-technical perspective that combines a technological lens with a pedagogical one. It makes three contributions. First, it synthesizes the technical and algorithmic evolution of educational AI, from rule-based and expert systems through knowledge tracing and learning analytics to large language models and retrieval-augmented generation, and organizes these mechanisms into a taxonomy. Second, it introduces a reproducible evaluation framework that couples the TRIAD rubric (Trust, Relevance, Impact, Adoption, and Design) with a Jobs-to-Be-Done (JTBD) lens, complete with anchored scoring criteria, an evidence-and-confidence grading scheme, and reported inter-rater reliability. Third, it applies the framework to eleven assistants released by U.S. publishers, distinguishing peer-reviewed evidence from institutional reports and commercial claims. The analysis reflects a mid-2025 snapshot and is presented as a reusable template rather than a static ranking. Findings reveal substantial variation in privacy safeguards, curricular alignment, documented impact, adoption, and usability. The review identifies application scenarios and recommendations for researchers and institutional leaders seeking to guide the responsible integration of AI in higher education. Full article
10 pages, 287 KB  
Article
A Cross-Sectional Study of Large Language Models in Lung Cancer Information Delivery: Readability, Quality, and Patient-Centred Evaluation
by Ömer Önal and Suzan Temiz Bekce
Healthcare 2026, 14(12), 1769; https://doi.org/10.3390/healthcare14121769 - 18 Jun 2026
Abstract
Background/Objectives: Lung cancer is a leading cause of cancer-related mortality worldwide. As patients increasingly utilize large language models (LLMs) for health information, evaluating the readability and patient-centeredness of these tools is critical. This study aims to compare the performance of ChatGPT-4o mini, Microsoft Copilot, and Google Gemini in providing lung cancer information, focusing on their utility for individuals with limited health literacy. Methods: In this cross-sectional study (March 2026), 30 responses to ten standardized lung cancer-related queries were analyzed. Outputs were assessed using JAMA benchmarks and mDISCERN for quality, the SMOG index for readability, and PEMAT-P for understandability and actionability. Inter-rater reliability was analyzed using intraclass correlation coefficients (ICCs). Results: ChatGPT-4o mini demonstrated superior readability, achieving a sixth-grade level (SMOG: 6.23 ± 0.72, p < 0.001). Gemini achieved higher JAMA scores, indicating stronger academic rigour. While PEMAT-P scores were highest for ChatGPT (63.7%), all models exhibited moderate mDISCERN quality. Inter-rater reliability was excellent for JAMA (ICC = 1.000) and PEMAT-P (ICC = 0.883), though moderate for mDISCERN (ICC = 0.365), reflecting inherent interpretative subjectivity in qualitative assessment. No hallucinations were observed. Conclusions: Current LLMs exhibit a trade-off between accessibility and academic rigour: ChatGPT favours patient-friendly readability, while Gemini emphasizes structured content. The observed inter-rater variability in mDISCERN underscores the complexity of standardizing qualitative AI evaluation. These findings suggest that LLMs function best as complementary aids rather than substitutes for physician-led communication. Full article
(This article belongs to the Special Issue Research on Health Literacy and Health Promotion in Healthcare)
22 pages, 885 KB  
Article
Iterative Audit Convergence in LLM-Managed Multi-Agent Systems: A Case Study in Prompt-Engineering Quality Assurance
by Elias Calboreanu
Software 2026, 5(2), 26; https://doi.org/10.3390/software5020026 - 18 Jun 2026
Abstract
Prompt specifications for multi-agent large language model (LLM) systems carry data contracts and integration logic across interdependent files but are rarely subjected to structured-inspection rigor. We report a single-system case study of iterative, agent-driven auditing applied to AEGIS (Autonomous Engineering Governance and Intelligence System), a seven-lane production pipeline whose 7152-line specification surface was audited across nine rounds, surfacing 51 consistency defects (per-round counts of 15, 8, 12, 2, 8, 1, 4, 1, 0). We present a seven-category post hoc taxonomy with explicit coding rules, non-monotonic convergence consistent with cascading edits and audit-scope expansion, and a locked audit protocol. We further report two partial replications on a public synthetic mini-specification: a cross-LLM panel of four frontier vendors (OpenAI, Anthropic, Google, xAI; 12 traces; multi-vendor union detects all five seeded defects) and an inter-rater reliability check on a stratified subsample (Cohen’s κ = 0.80 on category, 0.46 on severity). The full reproducibility bundle accompanies the submission. Full article
(This article belongs to the Special Issue Software Reliability, Security and Quality Assurance)
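The inter-rater check in the abstract above is reported as Cohen's κ. A minimal sketch of that statistic for two raters assigning nominal labels follows; the function name `cohens_kappa` and the label data are hypothetical, and the abstract does not say how the authors computed their values.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items (nominal labels)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)

    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal label proportions.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = count_a.keys() | count_b.keys()
    p_e = sum(count_a[c] * count_b[c] for c in labels) / (n * n)

    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use a single label")
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels for 50 items: 35 agreements, 15 disagreements.
a = ["y"] * 20 + ["y"] * 5 + ["n"] * 10 + ["n"] * 15
b = ["y"] * 20 + ["n"] * 5 + ["y"] * 10 + ["n"] * 15
kappa = cohens_kappa(a, b)
```

In this made-up example the raters agree on 70% of items, chance agreement is 50%, and κ is 0.40. Because κ discounts chance agreement, it can differ between coding dimensions (here, category versus severity) even when raw agreement looks similar.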
15 pages, 1204 KB  
Review
White Esthetic Score as a Tool for Esthetic Assessment of Tooth-Supported Restorations: A Comprehensive Review with Case Illustration
by Abdulrahman Alshabib, Silvia Rojas-Rueda, Jose Villalobos-Tinoco, Khalid M. Aldosary, Francisco Garcia-Torres, Carlos A. Jurado and Mark A. Antal
Bioengineering 2026, 13(6), 690; https://doi.org/10.3390/bioengineering13060690 - 16 Jun 2026
Viewed by 198
Abstract
Background: The White Esthetic Score (WES) is a standardized clinician-reported index that assesses the esthetic quality of a single-tooth restoration by comparison with a natural reference tooth, typically the contralateral tooth. It evaluates five domains: tooth form, crown outline/volume, color (hue/value), surface texture, and translucency/characterization. Each domain is scored from 0 to 2 (major discrepancy, minor discrepancy, no discrepancy), yielding a total score of 0–10; higher scores indicate a closer match. Although developed for single-tooth implant restorations, WES has also been applied to natural teeth and tooth-supported restorations. Methods: This comprehensive review summarizes case-report evidence applying WES to tooth-supported restorations, outlining the concept, scoring method, documentation requirements, and available data on reliability and interpretation. A case illustration is also presented in which a patient received eight anterior veneers; outcomes were assessed using all WES parameters. Results: Published reports support WES as a practical qualitative tool to assess esthetic outcomes in tooth-supported restorations. In the presented case, the veneers achieved a WES of 9, reflecting marked improvement in tooth form, crown outline/volume, color, surface texture, and translucency/characterization. Conclusions: The comprehensive review indicates WES is feasible for routine clinical use in practice, but agreement varies by parameter and improves with standardized photography and examiner calibration; some components show lower inter-rater agreement than simpler soft-tissue indices. Because correlations between WES and patient satisfaction are inconsistent, WES should be complemented with patient-reported outcome measures. Common thresholds consider WES ≥ 6 acceptable. Clinical use for crowns and veneers should emphasize case selection, standardized records, and combined clinician- and patient-centered outcome reporting. Full article
(This article belongs to the Special Issue New Tools for Multidisciplinary Treatment in Dentistry, 2nd Edition)
13 pages, 955 KB  
Article
Feasibility and Concordance of a Large Language Model (ChatGPT-5) as a Clinical Decision Support Tool in Gynecologic Oncology Tumor Boards: A Blinded, Multi-Observer Study
by Hatice Asoglu, Sevgul Kose, Oguzhan Kayim, Ali Abaci, Ghanim Khatib, Mehmet Ali Vardar, Emine Kilic Bagir, Derya Gumurdulu, Mehmet Mutlu Kidi, Ertugrul Bayram, Tolga Koseci, Berksoy Sahin and Ismail Oguz Kara
J. Clin. Med. 2026, 15(12), 4451; https://doi.org/10.3390/jcm15124451 - 9 Jun 2026
Viewed by 174
Abstract
Background: The reliability of large language models (LLMs) in complex oncologic decision-making remains inadequately validated. This study evaluated the concordance of ChatGPT-5 with multidisciplinary tumor board (MDT) decisions in gynecologic oncology, assessing accuracy, reproducibility, and domains of discordance. Methods: We analyzed 242 gynecologic cancer cases (endometrial n = 102, ovarian n = 85, cervical n = 40, rare n = 15) discussed at the Çukurova University Gynecologic Oncology MDT (2024–2025). Standardized clinical summaries were input into ChatGPT-5 using a structured prompt template. Each case was queried three times within a single calendar day using independent conversations. Recommendations were evaluated by two blinded medical oncologists using a 5-point Likert scale. A composite performance score (CPS) was calculated as (mean Likert/5) × 100. Concordance was analyzed using Cohen’s kappa (κ). Results: Inter-rater reliability was substantial to almost perfect for both MDT (κ = 0.761) and AI (κ = 0.814) evaluations (both p < 0.001). MDT–AI concordance was fair (Rater 1: κ = 0.258; Rater 2: κ = 0.334). CPS were significantly higher for MDT versus AI (Rater 1: 93.8% ± 5.2 vs. 89.4% ± 6.7; Rater 2: 93.4% ± 5.5 vs. 89.7% ± 6.4; both p < 0.001). Full consistency across three queries was achieved in only 37.2% of cases (90/242). AI performance was significantly inferior in advanced-stage disease (p = 0.008), genetic testing (p = 0.006), fertility-sparing (p = 0.018), and novel therapeutics (p = 0.003). Conclusions: ChatGPT-5 demonstrates potential as a clinical decision support tool but lacks sufficient reliability for independent use. Key limitations include inconsistency in 62.8% of cases, suboptimal performance in advanced-stage disease, and deficiencies in precision oncology domains. These findings suggest that human expertise remains indispensable for the individualized management of complex gynecologic malignancies. Full article
(This article belongs to the Section Oncology)
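The abstract above defines the composite performance score as (mean Likert/5) × 100 and reports full consistency across three queries in 90 of 242 cases. The sketch below restates both calculations. The function names are hypothetical, and treating "full consistency" as three identical recommendation labels is an assumption, since the abstract does not describe how consistency was judged.

```python
def composite_performance_score(likert_scores, scale_max=5):
    """CPS as defined in the abstract: (mean Likert / scale maximum) x 100."""
    if not likert_scores:
        raise ValueError("at least one Likert score is required")
    mean_score = sum(likert_scores) / len(likert_scores)
    return mean_score / scale_max * 100

def full_consistency_rate(recommendations_per_case):
    """Share of cases whose repeated queries all returned the same label.

    Assumption: consistency means identical labels across the repeated
    queries; the study may have used a different criterion.
    """
    consistent = sum(len(set(recs)) == 1 for recs in recommendations_per_case)
    return consistent / len(recommendations_per_case)

# Hypothetical ratings for one case: mean 4.5 on a 5-point scale.
cps = composite_performance_score([5, 4, 5, 4])

# The reported proportion: 90 fully consistent cases out of 242.
reported_rate = round(90 / 242 * 100, 1)
```

The reported figures are internally consistent: 90/242 rounds to 37.2%, and the remaining 62.8% matches the inconsistency rate quoted in the conclusions.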
29 pages, 1224 KB  
Systematic Review
Assessing Childhood Development: Systematic Review and Meta-Analysis on the Validation of Local Assessment Tools in the Context of Developing Countries
by Seep Lassi, Maira Niaz, Zoya Navid Ansari, Hamza Iftikar, Shanzay Rizvi, Hamna Amir, Zain Hasnain, Sidra Kaleem Jafri and Jai K. Das
Psychol. Int. 2026, 8(2), 35; https://doi.org/10.3390/psycholint8020035 - 5 Jun 2026
Viewed by 293
Abstract
Background: Accurate child development assessment is crucial, particularly in developing countries where access to validated tools remains limited. Many assessment tools are adapted for local contexts, but their psychometric properties require evaluation. Objective: This systematic review examines the reliability, validity, and overall psychometric properties of new and adapted child development assessment tools used in developing countries. The focus on these settings stems from the need to assess tools that are culturally appropriate, feasible, and accurate in resource-constrained environments, where early identification of developmental delays can significantly impact long-term child outcomes. Methods: Descriptive and meta-analyses were conducted to synthesize findings from eligible studies. Psychometric properties such as internal consistency, inter-rater reliability, construct validity, sensitivity and specificity were assessed. This review is registered on Open Science Framework (OSF) doi:10.17605/OSF.IO/GU28K. Results: The findings indicate that although some adapted tools demonstrate strong reliability and validity, others exhibit inconsistencies, highlighting challenges in adaptation. The meta-analysis provided pooled estimates of key psychometric properties with a net sensitivity and specificity of 0.859 and 0.805, respectively, illustrating the validity of these local tools but also variability in performance across different tools. Conclusion: The results emphasize the need for rigorous validation processes to ensure that adapted tools maintain their psychometric integrity. Future research should focus on refining these measures to improve their applicability in diverse cultural and socioeconomic settings. Full article
16 pages, 1625 KB  
Article
Translation, Cross-Cultural Adaptation and Validation of the Serbian Version of the Clinical Frailty Scale in Patients Undergoing Major Uro-Oncological Surgery
by Natasa Petrovic, Nebojsa Ladjevic, Vesna Jovanovic, Dimitrije Sarac, Ana Mimic, Milan Radovanovic, Mila Milicevic, Milos Lazic and Sandra Sipetic Grujicic
Healthcare 2026, 14(11), 1567; https://doi.org/10.3390/healthcare14111567 - 3 Jun 2026
Viewed by 174
Abstract
Background/Objectives: Frailty is well-recognized as a predictor of adverse postoperative outcomes. The Clinical Frailty Scale (CFS) is a widely recommended frailty assessment tool due to its simplicity and rapid bedside applicability; however, it has never been validated in Serbia. The aim of this study was to translate, culturally adapt, and validate the CFS in Serbia, in patients undergoing elective major surgical procedures. Methods: This cross-sectional study included 149 patients aged ≥50 years undergoing elective major urological oncology surgery. Frailty was assessed preoperatively using three scales: the CFS, the Edmonton Frail Scale (EFS), and the FRAIL scale. The CFS evaluations were independently performed by two raters and repeated after 7 days. Concurrent validity was evaluated via Spearman’s correlation between the CFS, the EFS, and the FRAIL scale. The “known-group” construct validity of the CFS was assessed using the test for trends across clinically relevant groups. Both inter-rater and test–retest reliability were assessed using the intraclass correlation coefficient (ICC). Results: The CFS was translated and culturally adapted into the Serbian language in accordance with ISPOR guidelines. The Serbian version of the CFS demonstrated both excellent inter-rater reliability (ICC = 0.957; 95% CI 0.941–0.968), and test–retest reliability (ICC = 0.958; 95% CI 0.943–0.970). A strong positive correlation was observed between the CFS and both the EFS (ρ = 0.698) and the FRAIL scale (ρ = 0.614). A known-group comparison confirmed the construct validity of the CFS. Conclusions: The Serbian version of the CFS is a reliable, valid, and clinically feasible tool for preoperative identification of frailty in patients aged 50 years and older undergoing major elective uro-oncological procedures. Full article
(This article belongs to the Section Clinical Care)
15 pages, 581 KB  
Article
Agreement Between Novice Visual Assessment and Classifications Derived from Markerless Motion Capture During Sit-to-Stand Performance in Healthy Adults
by Christopher Voltmer and Casey Imperio
Healthcare 2026, 14(11), 1549; https://doi.org/10.3390/healthcare14111549 - 2 Jun 2026
Viewed by 234
Abstract
Background: Visual assessment is commonly used in rehabilitation to evaluate movement quality during functional tasks such as sit-to-stand (STS) transfers. However, the extent to which observational ratings align with classifications derived from portable markerless motion capture systems remains unclear. This study examined agreement between novice observational ratings and motion-capture-derived classifications during STS performance. Methods: Fifty healthy adults performed STS transfers across three 18-inch seating conditions (firm, compliant, commode). Two final-year Doctor of Physical Therapy (DPT) students independently rated movement performance using a standardized observational rubric. Simultaneously, a portable markerless motion capture system (Kinotek) recorded joint kinematics, which were converted into ordinal severity classifications to enable a comparison. Inter-rater reliability and agreement were assessed using percent agreement and Krippendorff’s alpha. Results: Exact agreement between novice raters was high across all surfaces (82.3–82.9%), while Krippendorff’s alpha values were low despite high exact agreement (α = 0.250–0.323), consistent with restricted scale use. Agreement between observational ratings and motion-capture-derived classifications was low, with negative alpha values across all conditions (α = −0.224 to −0.561), indicating systematic differences in classification patterns. Observational raters more frequently assigned lower severity categories compared to motion-capture-derived classifications. Conclusions: Findings demonstrate low chance-corrected agreement under conditions of restricted scale use among novice raters and systematic disagreement between observational and motion-capture-derived classifications during STS performance. These findings reflect differences in classification approaches under the operational definitions used in this study. 
Motion capture was used as an objective comparator rather than a gold standard, and this study does not establish criterion validity. Further research is needed to evaluate agreement patterns in clinical populations and to examine how different measurement approaches influence functional movement classification.
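The pattern described above, high exact agreement alongside low Krippendorff's alpha when raters use only a narrow part of the scale, can be reproduced with a small made-up example. The sketch below implements Krippendorff's alpha for two raters with complete data and a nominal distance metric. That metric is an assumption chosen for simplicity: the study rated ordinal categories and may have used the ordinal form. The function names and ratings are hypothetical.

```python
from collections import Counter

def percent_agreement(r1, r2):
    """Share of units on which the two raters gave the same category."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def krippendorff_alpha_nominal(r1, r2):
    """Krippendorff's alpha, two raters, no missing data, nominal metric."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("both raters must rate the same non-empty unit list")
    n = 2 * len(r1)  # total number of pairable values

    # Each disagreeing unit adds two off-diagonal coincidences: (a, b) and (b, a).
    observed_off = 2 * sum(a != b for a, b in zip(r1, r2))

    # Expected off-diagonal mass from pooled category totals:
    # sum over c != k of n_c * n_k  ==  n^2 - sum of n_c^2.
    totals = Counter(r1) + Counter(r2)
    expected_off = n * n - sum(c * c for c in totals.values())
    if expected_off == 0:
        raise ValueError("alpha is undefined when only one category is used")

    return 1 - (n - 1) * observed_off / expected_off

# Hypothetical ratings for 50 units; both raters almost always choose 0.
rater_1 = [0] * 42 + [1] * 1 + [0] * 4 + [1] * 3
rater_2 = [0] * 42 + [1] * 1 + [1] * 4 + [0] * 3

agreement = percent_agreement(rater_1, rater_2)
alpha = krippendorff_alpha_nominal(rater_1, rater_2)
```

With these invented ratings the raters agree on 86% of units, yet alpha is only about 0.15. When nearly every rating falls in one category, chance agreement is already very high, so the few disagreements weigh heavily in the chance-corrected coefficient.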
25 pages, 1931 KB  
Article
Reproducibility Standards for Lean Maturity Models: Design Guidelines for Logistics Operations
by Padmaka Mirihagalla and Gyula Vastag
Logistics 2026, 10(6), 122; https://doi.org/10.3390/logistics10060122 - 2 Jun 2026
Viewed by 311
Abstract
Background: Lean management has been widely adopted, particularly in logistics operations. Achieving lean is not a discrete intervention but a continuous process of maturing in the integration of processes, work systems, and organizational capabilities within a coherent management philosophy. This maturation requires structured measurement instruments for tracking maturity progression. Although numerous lean maturity models (LMMs) have been proposed, none has achieved methodological standardization or acceptance as a measurement yardstick. Methods: This study addresses this gap by evaluating 27 qualified LMMs using a reproducibility-inspired assessment framework. The paper introduces the OVRGP framework (Opportunity, Validity, Reliability, Generalizability, and Process Integrity), comprising 17 rigor-based criteria. Independent raters with substantial lean expertise evaluated all 27 models using a six-point ordinal scale, achieving a pre-consensus inter-rater reliability of ICC(2,1) = 0.836. Results: Nine critical methodological weaknesses were identified, with average scores below 2.0 for criteria requiring empirical validation, structural integrity testing, and cross-context replication. Conclusions: The study offers targeted methodological guidelines for strengthening future LMM development in logistics and supply chain contexts, and introduces the OVRGP framework as a universal reference architecture for maturity model development across industries. It provides researchers, organizations, and consulting practitioners with a design reference standard for rigorous lean maturity instruments. Full article
16 pages, 1320 KB  
Article
Evaluating the Quality of Artificial Intelligence-Generated Information on Cleft Lip and Palate: A Comparative Cross-Sectional Study
by Amir Bilder, Michal Almos, Ahmad Hija, Andrei Krasovsky, Nidal Zeineh, Tal Capucha and Omri Emodi
Healthcare 2026, 14(11), 1535; https://doi.org/10.3390/healthcare14111535 - 1 Jun 2026
Viewed by 470
Abstract
Background/Objectives: Large language models (LLMs) are increasingly consulted for information about cleft lip and palate (CLP), yet the reliability of their outputs across clinical domains has not been evaluated. This study aimed to compare the quality of CLP-related information generated by GPT-4o and Gemini 2.5 Pro across multiple thematic domains using a validated quality instrument and a reliability-first analytic framework. Methods: Fifty-four standardized CLP questions across six domains were submitted to GPT-4o (OpenAI) and Gemini 2.5 Pro (Google DeepMind) on 25 September 2024 via their public interfaces, using new, history-free sessions and default settings, yielding 108 responses. Three independent, CLP-experienced raters scored each response using the Global Quality Score (GQS; 1–5 scale assessing accuracy, completeness, and clinical usefulness). Before comparing models, we applied a reliability-first filter: only domains where all three raters showed substantial agreement (Fleiss’ kappa [κ] ≥ 0.60) were included in statistical comparisons. Domains that failed this threshold were analyzed qualitatively to identify the source of disagreement. A descriptive taxonomy of errors was developed for low-scoring responses. Results: Three domains met the reliability threshold (General Care Information, General Cleft Information, and Pre-Treatment Information; 30 paired questions). Both models performed at a high and practically equivalent level: GPT-4o median GQS 4.33 (IQR 4.00–5.00) versus Gemini 2.5 Pro 5.00 (IQR 4.00–5.00); the difference was not statistically significant (Wilcoxon V = 139.00, p = 0.691; Hodges–Lehmann median difference 0.00, 95% CI −0.33 to 0.67). Three domains were excluded because rater agreement was insufficient; qualitative review showed this reflected genuine clinical practice variation rather than clear model errors. 
The most common inaccuracies were overgeneralization of outcomes, outdated surgical timing, and omission of multidisciplinary team roles. Conclusions: Both models provided high-quality CLP information in domains supported by clinical consensus, indicating they may serve as useful adjuncts for general patient and family counseling. Clinicians should, however, verify any treatment-specific content against current institutional protocols before relaying it to patients. Future research should assess readability, alignment with health literacy, and patient comprehension of AI-generated CLP information.
(This article belongs to the Section Artificial Intelligence in Healthcare)
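The reliability-first filter described in the abstract above keeps only domains where Fleiss' kappa across the three raters is at least 0.60. A minimal sketch of that idea follows, assuming each domain is summarised as a table counting how many raters put each response in each score category. The function names, domain names, and counts are hypothetical and do not come from the study.

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss' kappa from an (items x categories) table of rater counts.

    Every row must sum to the same number of raters.
    """
    m = np.asarray(counts, dtype=float)
    n_items = m.shape[0]
    n_raters = m[0].sum()
    if n_raters < 2 or not np.all(m.sum(axis=1) == n_raters):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    # Per-item agreement: proportion of rater pairs that agree on the item.
    p_i = ((m ** 2).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    # Chance agreement from the overall category proportions.
    p_j = m.sum(axis=0) / (n_items * n_raters)
    p_e = (p_j ** 2).sum()

    return (p_i.mean() - p_e) / (1 - p_e)

def reliable_domains(domain_counts, threshold=0.60):
    """Names of domains whose Fleiss' kappa meets the threshold."""
    return [name for name, table in domain_counts.items()
            if fleiss_kappa(table) >= threshold]

# Hypothetical two-category tables for 3 raters and 4 responses per domain.
domains = {
    "domain_a": [[3, 0], [0, 3], [3, 0], [0, 3]],  # raters always agree
    "domain_b": [[2, 1], [1, 2], [2, 1], [1, 2]],  # raters split every time
}
kept = reliable_domains(domains)
```

Here only `domain_a` passes the filter. Note that Fleiss' kappa treats score categories as unordered, so near-misses on an ordinal scale count as full disagreements.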
22 pages, 1490 KB  
Article
Development and Preliminary Validation of the Breath Motor Pattern Index (BMPI): An Observational Measure of Respiratory Pattern Quality in Children
by Aleksandra Moluszys, Łukasz Mański, Mirella Kozakiewicz, Marek Niedoszytko and Eliza Wasilewska
Children 2026, 13(6), 759; https://doi.org/10.3390/children13060759 - 29 May 2026
Viewed by 214
Abstract
Background/Objectives: Breathing is increasingly recognized as an integral component of the motor system, interacting with postural control and movement. Despite this, clinical assessment of respiratory function in children remains largely limited to physiological parameters, with relatively few tools available to evaluate breathing as an organized motor pattern. The aim of this study was to develop and preliminarily validate the Breath Motor Pattern Index (BMPI), an observational tool designed to assess the organization of respiratory motor patterns in children. Methods: A scoping review was conducted to identify key components of respiratory motor pattern organization. Based on these findings, the BMPI was developed and evaluated in a cohort of 210 children aged 0–72 months, divided into three groups: healthy controls, children with neurological conditions, and children with respiratory disorders. Inter-rater and test–retest reliability were assessed using intraclass correlation coefficients (ICC). Measurement error was quantified using the standard error of measurement (SEM) and minimal detectable change (MDC95). Construct-related validity was examined through correlations with the Gross Motor Function Measure (GMFM-88) and comparisons between clinical groups. Results: The BMPI showed high inter-rater reliability (ICC = 0.998) and test–retest reliability (ICC = 0.999), with low measurement error (SEM = 0.55; MDC95 = 1.53). A weak but statistically significant correlation with GMFM-88 was observed (rho = 0.23, p < 0.001). BMPI scores differed significantly between groups (p < 0.001), with lower values observed in the neurological group and higher values in the pulmonary group. Conclusions: The BMPI appears to be a promising observational tool with potential clinical applicability for assessing respiratory motor pattern organization in children. The findings support the conceptualization of breathing as an integrated component of the motor system while highlighting the need for further psychometric and longitudinal validation studies. Future research should further investigate the responsiveness of the BMPI as well as its potential utility in clinical decision-making and therapeutic monitoring.
(This article belongs to the Special Issue Physical Therapy in Pediatric Developmental Disorders)
13 pages, 246 KB  
Article
The Italian Version of the Drooling Impact Scale: Translation and Psychometric Validation in Children with Neurodevelopmental Conditions
by Federica Pauciulo, Marco Tofani, Giulia Stella, Alessandra Lacopo, Susanna Summa, Giulia Tullo, Caterina Delia, Antonella Cerchiari and Gessica Della Bella
Children 2026, 13(6), 757; https://doi.org/10.3390/children13060757 - 29 May 2026
Viewed by 213
Abstract
Background/Objectives: Drooling is a common and clinically relevant issue in children with neurodevelopmental conditions, with important consequences for daily functioning, social participation, and caregiver burden. The lack of validated tools in Italian makes it difficult to quantify the impact of drooling on daily life, support appropriate care pathways, and evaluate the effectiveness of interventions. The aim of this study was to translate, culturally adapt, and evaluate the psychometric properties of the Italian version of the Drooling Impact Scale (DIS) in a pediatric population. Methods: The DIS is a 10-item caregiver-reported outcome measure, with each item rated on an ordinal 0–10 scale, designed to assess the functional and psychosocial impact of drooling. It was translated using a standard forward–backward procedure, followed by expert review and cognitive debriefing with caregivers. Caregivers of children aged ≥2 years with heterogeneous neurodevelopmental conditions and feeding/swallowing impairments were consecutively recruited from a tertiary pediatric hospital. Psychometric properties were assessed in line with COSMIN recommendations, including internal consistency (Cronbach’s α), structural validity through exploratory factor analysis, inter-rater and test–retest reliability (intraclass correlation coefficients, ICC), measurement error (standard error of measurement, SEM; smallest detectable change, SDC), and construct validity through correlation with the Pediatric Quality of Life Inventory (PedsQL). Results: The Italian DIS was completed by caregivers of 126 children. It showed excellent internal consistency (Cronbach’s α = 0.92). Factor analysis indicated a clear dominant factor, explaining 56.5% of the variance, while additional factors contributed only marginally. Agreement between caregivers was excellent (ICC = 0.94), and test–retest reliability was good (ICC = 0.85). 
Measurement error analysis yielded SEM = 8.66, SDC_individual = 24.00, and SDC_group = 2.14. As expected, DIS scores were associated with health-related quality of life. Conclusions: The Italian version of the DIS appears to be a reliable and structurally sound instrument for assessing the impact of drooling in children with neurodevelopmental conditions. It may be useful in both clinical practice and research, although further studies are needed to explore its responsiveness and confirm these findings in different settings.
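The abstract does not say how the smallest detectable change was derived from the SEM. The sketch below assumes the conventional relationships SDC_individual = 1.96 × √2 × SEM and SDC_group = SDC_individual / √n, with n = 126 taken from the reported sample size. The function name `sdc_from_sem` is hypothetical and used only for illustration.

```python
import math


def sdc_from_sem(sem: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Return (SDC_individual, SDC_group) for a given SEM and sample size.

    Assumes the conventional formulas (not stated in the abstract):
      SDC_individual = z * sqrt(2) * SEM
      SDC_group      = SDC_individual / sqrt(n)
    """
    if sem < 0 or n <= 0:
        raise ValueError("sem must be non-negative and n must be positive")
    sdc_individual = z * math.sqrt(2) * sem
    sdc_group = sdc_individual / math.sqrt(n)
    return sdc_individual, sdc_group


# Values reported in the abstract: SEM = 8.66, 126 children.
sdc_ind, sdc_grp = sdc_from_sem(sem=8.66, n=126)
print(f"SDC_individual = {sdc_ind:.2f}, SDC_group = {sdc_grp:.2f}")
```

Under these assumptions the computed values round to 24.00 and 2.14, which matches the figures reported above.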
33 pages, 1964 KB  
Article
Built Environment Performance and User Perception of Urban Transit Interface: A Mixed-Methods Empirical Assessment of Bus Corridor in Udupi, India
by Amit Kinjawadekar, Nandineni Rama Devi and Shantharam Patil
Future Transp. 2026, 6(3), 118; https://doi.org/10.3390/futuretransp6030118 - 28 May 2026
Viewed by 201
Abstract
Transit accessibility is a critical determinant of urban equity (SDG-11) in the Global South, a term referring to emerging economies characterised by rapid urbanisation and significant infrastructure deficits. A significant ‘compliance–resilience gap’ persists in intermediate Indian cities. This study evaluates a 10.2 km primary transit corridor in Udupi, auditing 42 transit interfaces across 21 nodes using a unified 14-parameter framework. Analytical reliability was confirmed via inter-rater reliability testing (Krippendorff’s alpha = 0.822). Using a joint display synthesis, technical compliance failures were mapped to qualitative user narratives. Results supported Hypothesis 1 (H1) via chi-square testing, revealing systemic failures (p < 0.05) in ramps (2%) and information systems (0%). Hypothesis 2 (H2) was validated through a one-sample t-test, showing that stakeholder perception (mean = 1.55) was statistically significantly lower than the neutral threshold (t(99) = −21.10, p < 0.001). These deficits triggered restrictive user adaptation strategies, including temporal displacement and forced social dependency. The study establishes a replicable ‘justice-centred’ audit framework to prioritise interventions in resource-constrained urban contexts.
24 pages, 5445 KB  
Review
Transcranial Focused Ultrasound Stimulation for Alzheimer’s Disease—A Scoping Review
by Jon Crompton, Robyn Cuthell, Tom G. J. Steward, William W. Watts, Alanoud Alqahtani and Daniel J. Whitcomb
Brain Sci. 2026, 16(6), 570; https://doi.org/10.3390/brainsci16060570 - 28 May 2026
Viewed by 474
Abstract
Background/Objectives: Alzheimer’s disease (AD) remains a significant global health challenge, characterised by a persistent resistance to traditional pharmacological interventions. While non-invasive brain stimulation (NIBS) techniques like transcranial magnetic stimulation (TMS) and transcranial direct current stimulation (tDCS) show therapeutic promise, their limited depth of penetration restricts their efficacy in targeting deep-brain AD pathology. Transcranial focused ultrasound stimulation (tFUS) has emerged as a novel, non-invasive neuromodulatory tool capable of precise deep-brain targeting. This scoping review aims to systematically map the current evidence base regarding the neuromodulatory application of tFUS in AD. Methods: Following PRISMA-ScR guidelines, a scoping search was conducted across four major databases (Ovid MEDLINE, Embase, Web of Science, and CENTRAL). Studies were included if they investigated focused ultrasound stimulation (FUS) as a neuromodulatory intervention for AD, excluding applications involving blood–brain-barrier disruption via microbubbles. Two independent reviewers performed screening and data extraction, with inter-rater reliability assessed via Cohen’s kappa. Results: Our analysis indicates that tFUS represents a safe and potent multi-modal intervention for AD that addresses both pathological protein aggregation and electrophysiological network failure. Its ability to modulate neuroplasticity and metabolic recovery suggests a promising therapeutic trajectory. Conclusions: Future research should prioritise the standardisation of acoustic protocols and the pursuit of longitudinal clinical cohorts to establish the long-term efficacy of this non-invasive technology.
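For readers unfamiliar with the agreement statistic named in the Methods, the following is a minimal sketch of unweighted Cohen's kappa for two reviewers' screening decisions. The decision lists are invented purely for illustration and are not data from the review; the function name `cohens_kappa` is likewise hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / n**2
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical include/exclude decisions for ten records (illustrative only).
reviewer_1 = ["include", "include", "exclude", "exclude", "include",
              "exclude", "exclude", "exclude", "include", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude", "include",
              "exclude", "exclude", "include", "include", "exclude"]

kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.3f}")
```

In this made-up example the reviewers agree on 8 of 10 records (p_o = 0.80) while chance agreement is 0.52, so kappa is about 0.58, noticeably lower than the raw agreement rate.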
(This article belongs to the Section Neurodegenerative Diseases)