Search Results (832)

Search Parameters:
Keywords = inter-rater reliability

12 pages, 2324 KB  
Article
Evaluation of the Reliability of Radiographic and MRI Angles in Superior Femoral Epiphysiolysis: A Comparative Study
by Wassim Ben Abdennebi, Andreas Tsoupras, Eugénie Barras, Viola Sbampato, Romain Dayer, Giacomo De Marco, Oscar Vazquez, Christina Steiger, Amira Dhouib, Anne Tabard-Fougère and Dimitri Ceroni
Diagnostics 2026, 16(8), 1208; https://doi.org/10.3390/diagnostics16081208 - 17 Apr 2026
Abstract
Background/Objectives: Slipped Capital Femoral Epiphysis (SCFE) is a common, serious hip disorder in children and adolescents. Two-dimensional (2D) radiography is the gold standard for diagnosis but may not fully capture the deformity’s complexity, and it is vulnerable to positioning errors. Advances in three-dimensional (3D) imaging, such as computed tomography and magnetic resonance imaging (MRI), enable more accurate assessments. This study aimed to (1) assess the inter-rater reliability of 2D radiographic and 3D MRI measurements, and (2) evaluate the correlations and agreements between these outcomes. Methods: Patients were randomly selected from a cohort of patients aged under 16 years old and diagnosed with SCFE between January 2000 and December 2024. Southwick angles and posterior epiphyseal slip angles on 2D radiographs were independently measured by two orthopaedic surgeons. Posterior epiphyseal slip angles on 3D MRI were independently measured by two orthopaedic surgeons and two paediatric radiologists. Relationships between the three outcomes were evaluated using the Pearson correlation coefficient (r). Inter-rater reliability and agreements between the three outcomes were evaluated using the intraclass correlation coefficient (ICC) and the standard error measurement (SEM). Results: A total of 35 patients (35 hips) were recruited, with a mean age of 11.8 (1.2) years old and 19/35 (54%) females. Radiographic outcomes were moderately correlated (r < 0.75, p < 0.01) with MRI posterior epiphyseal slip angles. MRI posterior epiphyseal slip angles were systematically greater (16° on average) than both radiographic outcomes, regardless of whether contralateral correction was applied. The inter-rater reliability of radiographic outcomes was excellent (ICC > 0.85, SEM > 5.0°) and almost perfect (ICC > 0.95, SEM = 2.5°) for the MRI posterior epiphyseal slip angles measured by the paediatric radiologists. 
Conclusions: Findings suggest that while both diagnostic methods are reliable, radiographic measurements systematically underestimate epiphyseal slip severity by approximately 16° compared to MRI. This discrepancy could impact the accuracy of disease staging, leading to potential misclassifications. This highlights the need for a more standardised approach to evaluating SCFE, especially regarding the type of imaging used for angle measurement.
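The abstract above reports inter-rater reliability as an ICC with a standard error of measurement but does not state which ICC form was used. As a rough illustration only, the sketch below assumes the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1), and the common convention SEM = SD × sqrt(1 − ICC) with SD pooled over all ratings. The function names and the toy angle values are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import mean, stdev


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per subject, one column per rater.
    """
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of raters
    grand = mean([v for row in ratings for v in row])
    row_means = [mean(row) for row in ratings]
    col_means = [mean([row[j] for row in ratings]) for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-subjects mean square
    msc = ss_cols / (k - 1)               # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def sem_from_ratings(ratings, icc):
    """SEM = SD * sqrt(1 - ICC); SD pooled over all ratings (an assumption)."""
    sd = stdev([v for row in ratings for v in row])
    return sd * sqrt(1 - icc)


if __name__ == "__main__":
    # Hypothetical slip angles (degrees): 5 hips, each measured by 2 raters.
    toy_angles = [[30, 32], [45, 44], [52, 55], [38, 36], [60, 61]]
    icc = icc_2_1(toy_angles)
    print(f"ICC(2,1) = {icc:.3f}, SEM = {sem_from_ratings(toy_angles, icc):.2f} deg")
```

A different ICC form (consistency rather than absolute agreement, or averaged raters) would give different values on the same data, so published ICCs are only comparable when the form is reported.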
14 pages, 715 KB  
Article
The Nerve-Sparing Quality (NSQ) Score: A Novel Intraoperative Scoring System for Assessing Nerve-Sparing Quality During Robot-Assisted Radical Prostatectomy—A Concept and Feasibility Study
by Jakub Kempisty, Krzysztof Balawender, Oskar Dąbrowski and Karol Burdziak
J. Clin. Med. 2026, 15(8), 2979; https://doi.org/10.3390/jcm15082979 - 14 Apr 2026
Viewed by 234
Abstract
Introduction: Nerve-sparing (NS) during robot-assisted radical prostatectomy (RARP) plays a critical role in postoperative functional recovery, particularly urinary continence and erectile function. Despite the importance of precise neurovascular bundle (NVB) preservation, intraoperative assessment of NS quality remains largely subjective and lacks standardized evaluation tools. The aim of this study was to develop and preliminarily evaluate a structured intraoperative scoring system designed specifically for assessing NS quality during RARP. Methods: A novel 10-point intraoperative NS scoring system (NSQ Score) based on five domains was developed: dissection plane, bleeding control, bundle manipulation, continuity of dissection, and symmetry. Each parameter was rated on a 0–2 scale. Thirty robot-assisted radical prostatectomy (RARP) procedures performed in 2024 were randomly selected from a prospectively maintained institutional surgical video archive. Cases were not pre-filtered based on tumor stage, surgical difficulty, or intraoperative complexity. High-definition video recordings of the nerve-sparing phase were anonymized and independently evaluated by three experienced observers blinded to patient outcomes and to each other’s assessments. Inter-rater agreement was analyzed using weighted Cohen’s kappa statistics with quadratic weights, complemented by exact and near-agreement proportions. Cluster bootstrap resampling was applied to account for bilateral observations. Results: A total of 48 evaluable observations were analyzed. The overall inter-rater agreement demonstrated a weighted kappa of 0.41 (95% CI 0.36–0.48), indicating fair-to-moderate agreement among reviewers. Exact agreement occurred in 43% of observations, while near-agreement (allowing one ordinal level difference) reached 98%. Among individual parameters, symmetry demonstrated the highest reliability with substantial agreement (κ = 0.70; 95% CI 0.58–0.81). 
Other domains showed fair agreement, including intraoperative bleeding (κ = 0.36), continuity of dissection (κ = 0.39), bundle manipulation (κ = 0.34), and dissection plane (κ = 0.27). Agreement levels were comparable between left- and right-sided dissections. Conclusions: We propose a novel structured intraoperative scoring system for evaluating nerve-sparing quality during RARP. The scale is simple, procedure-specific, and feasible for structured postoperative or video-based assessment. Preliminary results demonstrate fair-to-moderate inter-rater reliability with very high near-agreement, supporting the feasibility of this tool for clinical use. The proposed scoring system may facilitate standardized training, objective performance assessment, and future studies correlating intraoperative NS quality with functional outcomes.
(This article belongs to the Special Issue Robotic Urologic Surgery: Clinical Applications and Advances)
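The NSQ abstract rates each domain on a 0–2 ordinal scale and reports quadratic-weighted Cohen's kappa. The sketch below shows the standard two-rater form of that statistic; how the study combined its three observers and applied the cluster bootstrap is not described in the abstract and is not reproduced here. The function name and the toy scores are hypothetical.

```python
def quadratic_weighted_kappa(rater_a, rater_b, categories):
    """Cohen's kappa with quadratic disagreement weights for two raters.

    categories: ordered list of all possible ordinal levels, e.g. [0, 1, 2].
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions for each rater.
    marg_a = [sum(observed[i]) for i in range(k)]
    marg_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_obs = 0.0
    weighted_exp = 0.0
    for i in range(k):
        for j in range(k):
            w = ((i - j) / (k - 1)) ** 2  # 0 on the diagonal, 1 at maximal distance
            weighted_obs += w * observed[i][j]
            weighted_exp += w * marg_a[i] * marg_b[j]
    return 1.0 - weighted_obs / weighted_exp


if __name__ == "__main__":
    # Hypothetical 0-2 scores from two observers on eight video clips.
    obs_1 = [2, 1, 2, 0, 1, 2, 1, 0]
    obs_2 = [2, 2, 1, 0, 1, 2, 0, 1]
    print(round(quadratic_weighted_kappa(obs_1, obs_2, [0, 1, 2]), 3))
```

Quadratic weights penalise a two-level disagreement four times as heavily as a one-level one, which is one way a scale can show modest exact agreement alongside very high near-agreement.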
12 pages, 233 KB  
Article
Analysis of Interrater Reliability and Interpretive Discrepancies in Polysomnography Scoring Across Clinical Subgroups
by Ji Ho Choi, Tae Kyoung Ha, Ji Eun Moon and Seockhoon Chung
Life 2026, 16(4), 669; https://doi.org/10.3390/life16040669 - 14 Apr 2026
Viewed by 188
Abstract
Background: Polysomnography (PSG) is the gold standard for diagnosing sleep disorders. However, the subjectivity of manual scoring can lead to inter-scorer variability, undermining diagnostic accuracy and subsequent clinical decisions. This study aims to quantitatively assess scoring concordance among multiple scorers across various clinical subgroups to identify the factors that contribute to interpretive discrepancies. Methods: We conducted a retrospective analysis of overnight diagnostic PSG data from adult patients at a tertiary university hospital sleep center. Interrater reliability was evaluated by three independent expert scorers for 30 subjects selected through stratified random sampling. The polysomnographic data were independently and blindly scored according to the American Academy of Sleep Medicine criteria, focusing on sleep stages, arousals, respiratory events, and leg movements, all scored in 30 s epochs. Interrater agreement was measured using Fleiss’ κ, along with 95% confidence intervals, and included subgroup analyses by diagnostic category. Results: The analysis included a total of 28,291 epochs from 30 adults across normal, insomnia, obstructive sleep apnea (OSA) [mild–severe], and periodic limb movement (PLM) disorder subgroups. The overall interrater agreement for sleep staging among the three scorers was nearly perfect (Fleiss’ κ = 0.932), with the highest concordance observed in stages W, N2, and R, and excellent agreement in stages N1 and N3. Respiratory events showed particularly high reliability, with near-perfect agreement for apnea (κ = 0.955) and substantial agreement for hypopnea, arousals, and PLMs. Pairwise analyses indicated the highest concordance between scorer 1 and scorer 3, while the agreement between scorer 1 and scorer 2 was lower, particularly for detecting arousals and limb movements. 
Subgroup analyses showed the highest and most stable agreement in moderate OSA, whereas severe OSA exhibited reduced reliability for sleep staging and arousal scoring, indicating increased scoring complexity with greater sleep fragmentation. Conclusions: Although expert PSG scoring demonstrates high overall reliability, significant variability persists in complex cases like severe OSA. These findings underscore the necessity for structured quality assurance and automated tools to improve diagnostic consistency in clinical practice.
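Agreement among the three PSG scorers is summarised with Fleiss' κ computed over 30 s epochs. A minimal sketch of the textbook Fleiss' κ follows, assuming every epoch is scored by the same number of raters. The function name, the stage order, and the toy epoch counts are hypothetical, and confidence intervals are omitted.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a fixed number of raters per item.

    counts[i][j] = number of raters who assigned item i to category j.
    Every row must sum to the same number of raters.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])

    # Overall proportion of assignments falling in each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    # Per-item agreement: share of rater pairs that agree on that item.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_item) / n_items          # mean observed agreement
    p_exp = sum(p * p for p in p_cat)      # agreement expected by chance
    return (p_bar - p_exp) / (1.0 - p_exp)


if __name__ == "__main__":
    # Hypothetical counts for six epochs, three scorers,
    # columns ordered as stages W, N1, N2, N3, R.
    epochs = [
        [3, 0, 0, 0, 0],
        [0, 2, 1, 0, 0],
        [0, 0, 3, 0, 0],
        [0, 0, 1, 2, 0],
        [0, 0, 0, 0, 3],
        [1, 2, 0, 0, 0],
    ]
    print(round(fleiss_kappa(epochs), 3))
```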
21 pages, 4215 KB  
Systematic Review
Inter-Rater Reliability of Subarachnoid Hemorrhage Radiological Grading Scales: A Systematic Review and Meta-Analysis
by Daria Dmitrievna Dolotova, Tatyana Alexandrovna Solominova, Natalia Alexeevna Polunina, Evgenia Romanovna Blagosklonova, Natalya Sergeevna Plyusova, Ganipa Ramazanovich Ramazanov, Rustam Shakhismailovich Muslimov, Maxim Vladimirovich Solominov and Andrey Vasilevich Gavrilov
J. Clin. Med. 2026, 15(8), 2899; https://doi.org/10.3390/jcm15082899 - 10 Apr 2026
Viewed by 235
Abstract
Background: Subarachnoid hemorrhage (SAH) has high mortality and disability rates. The timely and precise assessment of SAH severity is of critical importance in predicting life-threatening complications. Several CT-based radiological grading systems have been proposed, but a comprehensive meta-analysis of their inter-rater reliability (IRR) has not been conducted. Methods: This study followed the guidelines of Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). Two authors performed a systematic search of original articles in the PubMed database. Methodological quality of the studies was assessed using the Quality Appraisal of Reliability Studies (QAREL). Meta-analyses of Cohen’s kappa and intra-class correlation coefficient (ICC) were performed using R packages “metafor” and “meta”. Results: A systematic literature analysis was performed for twenty articles that met the inclusion criteria. The methodological quality was moderate in 14 of 20 studies; five studies were of low quality. Only eight articles were suitable for meta-analysis. Cohen’s kappa of the binarized Fisher scale was 0.85 (95% CI 0.70–0.93), though it was based on only two studies and 109 patients. The Hijdra scale had an ICC of 0.75 (95% CI 0.29–0.93). The original and modified Graeb scales proposed for the assessment of concomitant intra-ventricular hemorrhage demonstrated ICC of 0.83 (95% CI 0.59–0.94) and 0.93 (95% CI 0.84–0.97), respectively. For other scales, meta-analysis was not possible due to incomplete reporting or single evaluations. Conclusions: The current evidence on IRR of radiological grading scales for SAH is limited, emphasizing the need for further high-quality research to validate their reliability and clinical applicability.
(This article belongs to the Special Issue Intracranial Aneurysms: Diagnostics and Current Treatment)
13 pages, 675 KB  
Article
Domain-Specific vs. General-Purpose Large Language Models in Orthodontics: A Blinded Comparison of AlimGPT, GPT-4o, Gemini, and Llama
by Aksakalli Sertac, Giray Bilgin and Temel Cagri
Dent. J. 2026, 14(4), 219; https://doi.org/10.3390/dj14040219 - 8 Apr 2026
Viewed by 192
Abstract
Objective: The application of artificial intelligence (AI) in orthodontics has evolved rapidly in recent years, encompassing areas such as diagnosis, treatment planning, and patient management. AlimGPT is an AI-based tool that provides treatment options based on data and algorithms. Methods: Fourteen orthodontic questions were posed to each model, and the answers were analyzed. This study aimed to compare AlimGPT with GPT-4o, Gemini, and Llama using standardized tests to evaluate the quality of information provided, including the Likert scale, modified DISCERN (mDISCERN), and modified Global Quality Score (mGQS). Results: Significant differences were detected for reliability (χ² = 15.267, p = 0.0016) and usefulness (χ² = 20.557, p = 0.0001). Post hoc tests showed AlimGPT > Gemini and Llama for reliability and AlimGPT > GPT-4o, Gemini, and Llama for usefulness. mDISCERN was significant overall (χ² = 11.047, p = 0.0115), but no pairwise contrast met adjusted significance; mGQS showed no significant differences (χ² = 7.071, p = 0.0697). Inter-rater agreement was moderate-to-good for reliability (ICC = 0.710, 95% CI 0.60–0.80) and usefulness (ICC = 0.729, 95% CI 0.63–0.82), moderate for mGQS (ICC = 0.596, 95% CI 0.47–0.71), and poor-to-moderate for mDISCERN (ICC = 0.435, 95% CI 0.30–0.58). Conclusions: In this blinded, within-subjects experiment, the domain-specific model (AlimGPT) received higher clinician ratings for usefulness and, for reliability, exceeded two general baselines. Differences in mGQS were not detected. Expanding the number of raters, increasing item diversity or integrating updated baselines would be beneficial.
(This article belongs to the Special Issue Orthodontics and New Technologies: 2nd Edition)
25 pages, 1022 KB  
Article
Strategic Competence in Sustainability Education: Conceptual Patterns Identified Through AI-Assisted Qualitative Analysis
by Cathérine Conradty and Franz Xaver Bogner
Sustainability 2026, 18(7), 3643; https://doi.org/10.3390/su18073643 - 7 Apr 2026
Viewed by 263
Abstract
This study investigates how participants conceptualise sustainability and sustainability citizenship, as well as how these conceptualisations relate to perceived agency. Drawing on two open-ended prompts, it analyses participants’ visions of a sustainable future and the roles they would like to play within it. The dataset was based on 1714 coded response segments from 164 participants. Methodologically, the study combines qualitative content analysis, independent human-AI double coding, manual validation, inter-rater reliability assessment, and residual-based co-occurrence analysis within a qualitatively grounded mixed-methods design. The results show that sustainability is predominantly framed in civic, symbolic, and ecological terms, whereas strategic competence and professionally articulated agency remain less visible. Sustainability meanings and role conceptions also vary systematically across disciplinary contexts. In addition, the analyses reveal patterned gaps between participants’ future visions and their self-attributed roles in sustainability transformations. The study contributes empirical insights into sustainability meaning-making and perceived agency and shows how LLM-assisted coding can be embedded in a transparent mixed-methods workflow. For sustainability education, the findings underline the importance of strengthening strategic and systemic dimensions of competence and linking civic engagement more closely to professional pathways of action.
14 pages, 1814 KB  
Article
Endplate Bone Quality Assessment for Preoperative Planning and Patient-Specific Implementation in Lumbar Spine Surgery
by Wesley P. Jameson, Bailey D. Lupo, Andrew M. Schwartz, Andrew Daigle, Ahmed Anwar, Smith Surendran, Huy Tran, Christian Quinones, Deepak Kumbhare, Bharat Guthikonda and Stanley Hoang
J. Clin. Med. 2026, 15(7), 2800; https://doi.org/10.3390/jcm15072800 - 7 Apr 2026
Viewed by 332
Abstract
Background/Objectives: Poor bone quality is strongly associated with adverse surgical events. Although dual-energy X-ray absorptiometry (DXA) remains the gold standard for bone mineral density (BMD) assessment, logistical barriers may limit its preoperative application. The Endplate Bone Quality (EBQ) score is an MRI-derived metric quantifying subchondral bone quality at the vertebral endplate with demonstrated predictive value for cage subsidence following lumbar interbody fusion. However, EBQ has been measured exclusively at the operative level in surgical cohorts. This study aimed to assess level-specific EBQ scores across the entire lumbar spine and compare distributions across age, sex and osteoporosis subgroups. Methods: A single-institution retrospective review of T1-weighted lumbar MRI studies from patients evaluated for lower back pain from 2020 to 2025 was performed. EBQ was independently scored by two blinded raters at each disc space from L1–L2 to L5–S1 using 3 mm endplate ROIs normalized to a CSF ROI at L3. Interrater reliability was assessed via ICC, Pearson correlation, and RMSE. Patients were stratified by age (≤60 vs. >60 years), sex, and osteoporosis status, and subgroup comparisons were performed for overall and level-specific EBQ score. Results: A total of 96 patients with an average age of 61.0 ± 9.42 years were included in this study. The majority of patients included were female (87.5%), and 18.8% had been diagnosed with osteoporosis. EBQ scores demonstrated a progressive caudal increase across all subgroups from L2–L3 to L5–S1. Overall interrater reliability was acceptable (ICC = 0.76), with level-specific ICCs ranging from 0.70 to 0.83. No significant differences were observed between age or sex subgroups. Osteoporotic patients demonstrated significantly higher EBQ at L1–L2, L2–L3, and overall (all p < 0.05), with no significant differences at L3–L4 through L5–S1. 
Conclusions: This study provides normative, level-specific EBQ reference data throughout all levels of the lumbar spine. The increase in EBQ scores seen among caudal levels and reduced osteoporotic discriminatory power support the importance of level-specific context when interpreting EBQ thresholds. These findings may support future studies evaluating threshold development for EBQ.
(This article belongs to the Special Issue Clinical Advancements in Spine Surgery: Best Practices and Outcomes)
15 pages, 497 KB  
Article
An Assessment of GPT-3.5 and GPT-4.0 Responses to Scoliosis FAQs
by Tu-Lan Vu-Han, Enikö Regényi, Vikram Sunkara, Paul Köhli, Friederike Schömig, Alexander P. Hughes, Michael Putzier, Matthias Pumberger and Thilo Khakzad
J. Pers. Med. 2026, 16(4), 206; https://doi.org/10.3390/jpm16040206 - 7 Apr 2026
Viewed by 269
Abstract
Background: ChatGPT is a large language model (LLM) online chatbot developed by OpenAI and launched in November 2022. Early adoption studies have shown high readiness to use this technology for health-related questions and self-diagnosis. However, the quality and clinical adequacy of health-related responses remain incompletely characterized. This study aimed to explore responses generated by ChatGPT-3.5 and ChatGPT-4.0 to common patient questions regarding scoliosis. Methods: Ten scoliosis-related frequently asked questions (FAQs) were selected from a larger pool of over 250 patient-facing questions compiled from 17 publicly available FAQ webpages and informed by a Google Trends analysis. Questions were harmonized, grouped by theme, and then reduced by rule-based expert review to a final set intended to represent common patient concerns. Results: The median ratings of ChatGPT-3.5 and ChatGPT-4.0 responses ranged from satisfactory, requiring minimal (2) to moderate clarification (3). Across the ten matched questions, no statistically detectable difference was found between models in this study setting (W = 8.0, p = 0.59; Cliff’s δ = −0.12 [95% CI −0.58, 0.40]); however, given the small question set, unblinded rating process, and poor inter-rater reliability, this should not be interpreted as evidence of equivalence, non-inferiority, or comparable model performance. The results apply only to the 10–15 April 2024, online snapshots of ChatGPT-3.5 and ChatGPT-4.0 and should not be generalized to later model iterations. Conclusions: This study should be interpreted as a clinically oriented observational report, intended to inform physician awareness and patient-physician communication rather than validate chatbot accuracy or safety. In this 10–15 April 2024, sample, both model outputs frequently required clinician clarification. 
Given the small FAQ set, low inter-rater reliability, unblinded design, and single-sample outputs, the findings do not establish equivalence or superiority and apply only to the specific 10–15 April 2024, model snapshots and evaluated questions.
(This article belongs to the Special Issue AI and Precision Medicine: Innovations and Applications)
20 pages, 1508 KB  
Systematic Review
Blockchain Technology and Automated Project Governance: A Systematic Review of Governance Mechanisms, Enabling Conditions, and Future Research Directions
by Mohammed Saeed Alotaibi
Sustainability 2026, 18(7), 3589; https://doi.org/10.3390/su18073589 - 6 Apr 2026
Viewed by 337
Abstract
This study synthesizes peer-reviewed literature to examine how blockchain technology supports Automated Project Governance (APG), focusing on the organizational, institutional, and human conditions under which potential governance contribution is realized. A systematic literature review was conducted in accordance with PRISMA 2020 guidelines, yielding twenty-one empirically and conceptually grounded studies. Screening reliability was strengthened through independent dual screening at the full-text eligibility stage (inter-rater κ = 0.81). Seven blockchain-enabled governance mechanisms are synthesized and comparatively assessed in terms of evidentiary support and research maturity, suggesting that blockchain’s decentralized and immutable architecture may support transparency, accountability, and coordination when embedded within appropriate governance arrangements, but these benefits do not arise automatically from technological adoption. The synthesis further identifies enabling conditions, including stakeholder acceptance, organizational governance readiness, and institutional alignment, and maps explicit research gaps for each mechanism to guide future empirical inquiry. By grounding the synthesis in the Technology Acceptance Model and Institutional Theory, the study provides a literature-derived, socio-technical framework for understanding blockchain adoption in APG and offers governance-oriented insights for organizations and policymakers.
23 pages, 1629 KB  
Article
AI-Based Automated Scoring Layer Using Large Language Models and Semantic Analysis
by Anastasia Vangelova and Veska Gancheva
Appl. Sci. 2026, 16(7), 3537; https://doi.org/10.3390/app16073537 - 4 Apr 2026
Viewed by 850
Abstract
Automated scoring of open-ended questions is an important research direction in educational technology and artificial intelligence, as manual grading is time-consuming and often subject to inter-rater variation. This paper proposes an AI-based framework for automated scoring that combines large language models (LLMs), Retrieval-Augmented Generation (RAG), analytical rubrics, and structured machine-readable output within a Moodle-supported e-learning environment. The framework is designed to support context-grounded and criterion-based evaluation by combining the student response, retrieved instructional context, and rubric-defined scoring criteria within a controlled assessment workflow. The proposed approach aims to improve the consistency, traceability, and practical applicability of automated scoring for open-ended responses. To examine its performance, an experimental study was conducted in a real university setting involving a five-task open-ended examination. AI-generated scores were compared with independent human scores using agreement, reliability, correlation, and error metrics. The results indicate a strong level of agreement between automated and expert scoring within the tested setting, together with relatively low average deviation. These findings suggest that the proposed framework has practical potential for supporting automated assessment in digital learning environments, while also highlighting the importance of careful interpretation within the scope of the experimental design.
(This article belongs to the Special Issue Application of Semantic Web Technologies for E-Learning)
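The automated-scoring abstract compares AI-generated and human scores using correlation and error metrics without naming them. As an assumption for illustration, the sketch below uses three common choices: mean absolute error, root-mean-square error, and the Pearson correlation. The function name and the toy scores are hypothetical and do not come from the paper.

```python
from math import sqrt


def score_comparison(ai_scores, human_scores):
    """Return MAE, RMSE and Pearson r between two equally long score lists."""
    n = len(ai_scores)
    diffs = [a - h for a, h in zip(ai_scores, human_scores)]
    mae = sum(abs(d) for d in diffs) / n
    rmse = sqrt(sum(d * d for d in diffs) / n)

    mean_ai = sum(ai_scores) / n
    mean_hu = sum(human_scores) / n
    cov = sum((a - mean_ai) * (h - mean_hu) for a, h in zip(ai_scores, human_scores))
    var_ai = sum((a - mean_ai) ** 2 for a in ai_scores)
    var_hu = sum((h - mean_hu) ** 2 for h in human_scores)
    pearson_r = cov / sqrt(var_ai * var_hu)
    return {"mae": mae, "rmse": rmse, "pearson_r": pearson_r}


if __name__ == "__main__":
    # Hypothetical rubric scores for five open-ended tasks.
    ai = [7.5, 6.0, 9.0, 4.5, 8.0]
    human = [8.0, 6.0, 8.5, 5.0, 8.5]
    print(score_comparison(ai, human))
```

Correlation alone does not establish agreement: a scorer that is consistently one point too strict still yields r = 1, which is why error metrics are reported alongside it.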
16 pages, 717 KB  
Article
Validation and Cultural Adaptability of the MOBAK Test Battery for Assessing Fundamental Motor Skills in Chinese Children Aged 3–12 Years
by Jingjie Zhang, Ke Ning, Bingjun Wan, Hongmiao Chen, Chen Wang, Yue Ye and Hongyou Liu
Behav. Sci. 2026, 16(4), 534; https://doi.org/10.3390/bs16040534 - 2 Apr 2026
Viewed by 341
Abstract
Accurate assessment of children’s fundamental motor skills (FMS) is crucial for promoting lifelong healthy development and formulating effective physical education policies. However, China currently lacks standardized assessment tools that cover the entire age range from 3 to 12 years and have undergone thorough cultural adaptation. This study aimed to evaluate the psychometric properties and cultural adaptability of the MOBAK assessment tool in measuring FMS in Chinese children aged 3 to 12 years. A total of 1200 Chinese children from four regions of China participated in the study, including 623 boys (52%) and 577 girls (48%). The MOBAK tool was used to assess FMS across different age groups, focusing on two dimensions: object movement (e.g., throwing, catching, bouncing, and dribbling) and self-movement (e.g., balancing, rolling, jumping, and running). The study evaluated psychometric properties, including reliability and validity. Results indicate that MOBAK demonstrates excellent psychometric characteristics: (1) Good item discrimination (all CR values p < 0.001), with an appropriate difficulty index (0.51–0.67); (2) Extremely high reliability, manifested by high internal consistency (α > 0.80), high test–retest stability, and high inter-rater consistency (ICC > 0.90); (3) Robust construct validity, supported by exploratory and confirmatory factor analyses, which consistently confirmed the hypothesized two-factor model and had excellent fit indicators (CFI/TLI > 0.90, RMSEA/SRMR < 0.08). The MOBAK battery demonstrates strong psychometric properties and cultural validity in the Chinese context for reliably assessing FMS in children aged 3–12 years. These findings provide a foundation for future cross-cultural comparisons and validation studies in other populations.
11 pages, 1657 KB  
Article
Ergonomic Risk in Total Hip Arthroplasty: Approach-Specific Postural Loads and Position-Swap Effects During Cup Preparation
by Carmelo Marín-Martínez, José Emilio Mantilla-de-los-Ríos-García, Elena Galián-Muñoz, Marina Sánchez-Robles, Vicente Jesús León-Muñoz, Antonio Murcia-Asensio, Matilde Moreno-Cascales and Francisco Lajara-Marco
Appl. Sci. 2026, 16(7), 3418; https://doi.org/10.3390/app16073418 - 1 Apr 2026
Viewed by 353
Abstract
Musculoskeletal disorders (MSDs) among orthopaedic surgeons are associated with sustained, constrained postures during demanding intraoperative tasks. Total hip arthroplasty (THA) comprises sequential steps that may impose different postural loads on both the surgeon and assistant, yet team-level ergonomic design interventions remain underexplored. This study compared ergonomic risk during primary THA performed through the direct lateral (modified Hardinge) and posterolateral (Moore) approaches and assessed a simple workflow redesign: swapping surgeon and assistant positions during acetabular cup preparation (bottom reaming, perimeter reaming, and cup impaction). In a controlled Sawbones-based simulation using standard THA instruments, eight standardised surgical steps were recorded with 360° photographs. Forty-two postural instances (22 for the surgeon, 20 for the assistant) were analysed. Joint angles were measured with Kinovea and converted to Rapid Entire Body Assessment (REBA) scores; intra- and inter-rater reliability (ICC) and minimum detectable change (MDC95) were calculated. Surgeon REBA scores were in the medium-risk range and slightly lower with the posterolateral approach (mean 5.5) than with the direct lateral approach (mean 5.88), whereas assistant scores were in the low-risk range (means 3.43 and 3.29, respectively). The position-swap intervention successfully lowered the surgeon’s REBA action level, most notably during cup impaction, where ergonomic risk dropped from 10 (high risk) to 4 (medium risk) in the posterolateral approach, and from 7 (medium risk) to 3 (low risk) in the direct lateral approach, without increasing assistant risk. These findings provide controlled simulation-based evidence that this simple, zero-cost positional change can reduce the surgeon’s ergonomic action level during THA, although confirmation under real operative conditions is needed before broad generalization.
(This article belongs to the Special Issue Novel Approaches and Applications in Ergonomic Design, 4th Edition)
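
The abstract above reports ICC and MDC95 without stating how they were derived. As a minimal sketch, assuming the commonly used relations SEM = SD × sqrt(1 − ICC) and MDC95 = 1.96 × sqrt(2) × SEM (the study may have used a different variant), the calculation could look like this. The function names and the input values are hypothetical and are not taken from the article.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement, assuming SEM = SD * sqrt(1 - ICC)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie in [0, 1] for this formula")
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem: float) -> float:
    """Minimum detectable change at 95% confidence: 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2.0) * sem


# Hypothetical inputs for illustration only (not values from the study):
# a between-subject SD of 2.0 score points and an ICC of 0.91.
example_sd = 2.0
example_icc = 0.91
example_sem = sem_from_icc(example_sd, example_icc)
example_mdc = mdc95(example_sem)
print(f"SEM = {example_sem:.2f}, MDC95 = {example_mdc:.2f}")
```

Under these assumptions, a change between two measurements smaller than MDC95 cannot be distinguished from measurement error at the 95% level.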

9 pages, 1745 KB  
Article
Reliability of Preoperative MRI Findings for Differentiating Spontaneous Spinal Subdural and Epidural Hematomas: A Multi-Institutional Retrospective Study of 27 Surgically Treated Cases
by Shun Okuwaki, Hiroshi Takahashi, Katsuya Nagashima, Tomoyuki Asada, Takane Nakagawa, Takahiro Sunami, Yosuke Ogata, Kotaro Sakashita, Hisanori Gamada, Kousei Miura, Hiroshi Noguchi, Yosuke Takeuchi, Toru Funayama, Masao Koda and Masaki Tatsumura
J. Clin. Med. 2026, 15(7), 2602; https://doi.org/10.3390/jcm15072602 - 29 Mar 2026
Abstract
Background/Objectives: Spontaneous spinal subdural hematoma (SSSDH) is a rare and severe condition that causes rapid neurological decline. Spontaneous spinal epidural hematoma (SSEH) presents similarly but is more common, and surgical management differs because SSSDH requires an intradural approach. Few studies have assessed the reliability of magnetic resonance imaging (MRI) features used to distinguish SSSDH from SSEH in patients requiring surgery. Methods: We retrospectively reviewed 27 patients who underwent surgical evacuation of spinal hematomas at two institutions (2015–2025). Definitive hematoma location was determined intraoperatively. Four MRI features were independently evaluated by two reviewers: shape (crescentic vs. biconvex), location (ventral vs. dorsal), craniocaudal length (<5 vs. ≥5 segments), and spinal region. Inter- and intra-rater reliability was assessed using agreement rate and Cohen’s kappa (κ) with 95% confidence intervals (95% CIs). Results: Among 27 cases, three (11.1%) were SSSDH and 24 were SSEH. Hematoma location, length, and spinal region demonstrated perfect inter- and intra-rater agreement (κ = 1.00). For hematoma shape, intra-rater agreement was good (96.2%, κ = 0.84; 95% CI 0.52–1.00), whereas inter-rater agreement was poor to fair (84.6%, κ = 0.26; 95% CI −0.25–0.77). Notably, two of the three SSSDHs demonstrated a biconvex configuration, and 83.3% of SSEHs also exhibited a biconvex morphology. Conclusions: MRI features such as hematoma location, extent, and spinal level were highly reproducible, whereas hematoma shape showed limited reliability. Although ventral hematomas most strongly suggest SSSDH, atypical SSEH presentations occur. When dorsal exposure reveals no epidural hematoma, intradural exploration should be promptly considered.
(This article belongs to the Special Issue Clinical Advances in Spinal Neurosurgery)
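
The abstract above pairs a high raw agreement rate with a low Cohen's kappa for hematoma shape. The sketch below shows how that combination can arise when one category dominates, because kappa discounts the agreement expected by chance. The function name and the 2×2 counts are hypothetical, chosen purely for illustration; they are not the study's data.

```python
def cohen_kappa_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Return (observed agreement, Cohen's kappa) for two raters, two categories.

    a: both raters choose category 1
    b: rater 1 chooses category 1, rater 2 chooses category 2
    c: rater 1 chooses category 2, rater 2 chooses category 1
    d: both raters choose category 2
    """
    n = a + b + c + d
    p_observed = (a + d) / n
    # Chance agreement from each rater's marginal proportions.
    p_expected = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (p_observed - p_expected) / (1.0 - p_expected)
    return p_observed, kappa


# Hypothetical counts (not from the article): 50 cases, with category 1
# chosen far more often than category 2 by both raters.
agreement, kappa = cohen_kappa_2x2(a=40, b=4, c=4, d=2)
print(f"agreement = {agreement:.1%}, kappa = {kappa:.2f}")
```

With these illustrative counts the raters agree on 84% of cases, yet most of that agreement is what chance alone would produce given the skewed category frequencies, so kappa stays low.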

13 pages, 1141 KB  
Article
Validation and Reproducibility of an App for Continuous Measurement as an Assessment Tool for Idiopathic Scoliosis
by Isis Juliene Rodrigues Leite Navarro, Louis Jacob, Kevin Masetto, Francesco Dulio, Andrea Negrini, Stefano Negrini, Fabio Zaina and Alessandra Negrini
Sensors 2026, 26(7), 2099; https://doi.org/10.3390/s26072099 - 27 Mar 2026
Abstract
(1) Background: Idiopathic scoliosis is a three-dimensional deformity, yet clinical and research decision-making still relies largely on radiographic Cobb angle measurements. As a radiation-free alternative, clinical assessment of transverse and sagittal plane deformities has gained importance. This study evaluated the concurrent validity and intra- and interrater reproducibility of continuous measurements of rib hump, thoracic kyphosis, and lumbar lordosis obtained using a smartphone application in adolescents with spinal deformities. (2) Methods: Adolescents aged 10–17 years with scoliosis (>10° Cobb) or hyperkyphosis (>50° Cobb) were recruited. Continuous measurements of angle of trunk rotation (ATR) during the Adams forward bend test and in standing position, as well as sagittal profile, were collected using the ISICO app mounted on a standardized plastic tool. Concurrent validity was assessed against a scoliometer using Spearman correlation, root mean square error, and Bland–Altman analysis, while reproducibility was evaluated using intraclass correlation coefficients, standard error of measurement, and minimal detectable change. (3) Results: Thirty-two adolescents were included for validation and intrarater analyses and 34 for interrater analyses. ATR measured during the Adams test showed very high correlation with the scoliometer and minimal bias, while standing ATR showed moderate correlation. Reliability was excellent for rib hump during forward bending and moderate for sagittal parameters, with the lowest values observed for lumbar lordosis. (4) Conclusions: These findings support the clinical use of continuous app-based ATR assessment and suggest that sagittal measurements may be useful with appropriate examiner training.
(This article belongs to the Section Biomedical Sensors)
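
The Bland–Altman analysis named in the abstract above summarises agreement between two methods as a mean difference (bias) and 95% limits of agreement. A minimal sketch follows, assuming the usual definition of bias ± 1.96 × the sample SD of the paired differences. The function name and the paired readings are hypothetical and do not come from the study.

```python
import statistics


def bland_altman(method_a: list[float], method_b: list[float]) -> tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) for paired measurements.

    Bias is the mean of (a - b); the limits are bias +/- 1.96 * sample SD
    of the differences.
    """
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD (n - 1 in the denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired readings in degrees (illustration only).
app_readings = [5.0, 7.0, 10.0, 12.0]
scoliometer_readings = [4.0, 8.0, 9.0, 12.0]
bias, lower, upper = bland_altman(app_readings, scoliometer_readings)
print(f"bias = {bias:.2f}, limits of agreement = [{lower:.2f}, {upper:.2f}]")
```

A bias near zero with narrow limits would indicate that the two methods can be used interchangeably for practical purposes. What counts as "narrow" is a clinical judgement that this sketch does not address.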

18 pages, 821 KB  
Article
Phase-Based Motor Skill Acquisition in Preschool Children with Different Participation Experience in a Kinesiology Program
by Kristian Plazibat, Tihomir Vidranski and Renata Barić
J. Funct. Morphol. Kinesiol. 2026, 11(2), 133; https://doi.org/10.3390/jfmk11020133 - 24 Mar 2026
Abstract
Background: Early childhood is a critical period for the development of motor competence, which is closely related to later physical activity, educational readiness, and broader developmental outcomes. However, the temporal dynamics of motor skill acquisition in preschool children, particularly the time required to reach initial and early refinement phases of learning, remain insufficiently described. The aim of this study was to examine whether different levels of previous participation experience in an organized kinesiology program are associated with differences in the speed and quality of novel motor skill acquisition in preschool children, and to explore the relationship between baseline motor proficiency and phase-based indicators of motor learning. Methods: A total of 161 preschool children aged 5–6 years participated in the study and were grouped according to their previous participation experience in an organized kinesiology program (0 h, ~120 h, ~350 h, and ~470 h). Following BOT-2 assessment, all participants completed a standardized 7-week motor learning program that included nine previously unfamiliar motor tasks. Using a phase-based video analysis protocol, three learning indicators were recorded: time to Phase 1 (F1; first successful execution), time to Phase 2 (F2; initial refinement of performance), and final performance quality (K). Group differences and associations were first examined descriptively and correlationally, after which additional multivariable regression models were performed to determine whether previous participation experience and baseline motor proficiency were independently associated with motor learning outcomes. Results: The findings showed consistent differences across groups, with children who had greater previous participation experience generally reaching F1 and F2 more rapidly and achieving higher final performance quality scores. 
Higher BOT-2 scores were also associated with shorter learning times and better final performance quality. In the multivariable models, both previous participation experience in an organized kinesiology program and BOT-2 total score were independently associated with Phase 1 attainment time and final performance quality, whereas only previous participation experience remained independently associated with Phase 2 attainment time. The applied phase-based observational protocol demonstrated good to excellent inter-rater reliability across the evaluated motor learning variables. Conclusions: These findings provide phase-based temporal indicators of motor learning progression in preschool children and suggest that previous participation experience in an organized kinesiology program and baseline motor competence are meaningfully associated with the speed and quality of acquiring new motor tasks. The findings also demonstrate the potential of phase-based approaches for quantifying motor learning dynamics in early childhood settings. Such indicators may offer useful reference information for instructional pacing and the planning of motor learning activities, while also serving as practically relevant predictors for adapting future kinesiology programs to children’s motor readiness. Future research should further examine these relationships using longitudinal and analytically expanded designs.