Search Results (827)

Search Parameters:
Keywords = inter-rater reliability

26 pages, 798 KB  
Article
Strategic Competence in Sustainability Education: Conceptual Patterns Identified through AI-Assisted Qualitative Analysis
by Cathérine Conradty and Franz Xaver Bogner
Sustainability 2026, 18(7), 3643; https://doi.org/10.3390/su18073643 (registering DOI) - 7 Apr 2026
Abstract
This study investigates how participants conceptualise sustainability and sustainability citizenship, and how these conceptualisations relate to perceived agency. Drawing on two open-ended prompts, it analyses participants’ visions of a sustainable future and the roles they would like to play within it. The dataset comprised 1714 coded response segments from 164 participants. Methodologically, the study combines qualitative content analysis, independent human-AI double coding, manual validation, inter-rater reliability assessment, and residual-based co-occurrence analysis within a qualitatively grounded mixed-methods design. The results show that sustainability is predominantly framed in civic, symbolic, and ecological terms, whereas strategic competence and professionally articulated agency remain less visible. Sustainability meanings and role conceptions also vary systematically across disciplinary contexts. In addition, the analyses reveal patterned gaps between participants’ future visions and their self-attributed roles in sustainability transformations. The study contributes empirical insights into sustainability meaning-making and perceived agency and shows how large language model (LLM)-assisted coding can be embedded in a transparent mixed-methods workflow. For sustainability education, the findings underline the importance of strengthening strategic and systemic dimensions of competence and of linking civic engagement more closely to professional pathways of action.
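The abstract mentions residual-based co-occurrence analysis without naming the statistic. As a minimal sketch, assuming Pearson residuals (O − E)/√E on a 2 × 2 table that cross-tabulates whether two codes are present in the same response segment, the computation could look like this. The counts and the helper name `pearson_residuals` are hypothetical, and the authors may have used adjusted residuals or a different table layout.

```python
import math

def pearson_residuals(table):
    """Return (O - E) / sqrt(E) for every cell of a contingency table.

    E is the count expected under independence: row total * column total / N.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        out_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            out_row.append((observed - expected) / math.sqrt(expected))
        residuals.append(out_row)
    return residuals

# Hypothetical segment counts: rows = code A present/absent, columns = code B present/absent.
table = [[30, 10],
         [20, 40]]
res = pearson_residuals(table)
print(round(res[0][0], 3))  # 2.236: A and B co-occur more often than independence predicts
```

A positive cell means the two codes appear together more often than expected under independence; adjusted (standardized) residuals, which divide by an additional variance term, are the usual choice when individual cells are compared against ±1.96.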
14 pages, 1814 KB  
Article
Endplate Bone Quality Assessment for Preoperative Planning and Patient-Specific Implementation in Lumbar Spine Surgery
by Wesley P. Jameson, Bailey D. Lupo, Andrew M. Schwartz, Andrew Daigle, Ahmed Anwar, Smith Surendran, Huy Tran, Christian Quinones, Deepak Kumbhare, Bharat Guthikonda and Stanley Hoang
J. Clin. Med. 2026, 15(7), 2800; https://doi.org/10.3390/jcm15072800 - 7 Apr 2026
Abstract
Background/Objectives: Poor bone quality is strongly associated with adverse surgical events. Although dual-energy X-ray absorptiometry (DXA) remains the gold standard for bone mineral density (BMD) assessment, logistical barriers may limit its preoperative application. The Endplate Bone Quality (EBQ) score is an MRI-derived metric quantifying subchondral bone quality at the vertebral endplate, with demonstrated predictive value for cage subsidence following lumbar interbody fusion. However, EBQ has been measured exclusively at the operative level in surgical cohorts. This study aimed to assess level-specific EBQ scores across the entire lumbar spine and compare distributions across age, sex, and osteoporosis subgroups. Methods: A single-institution retrospective review of T1-weighted lumbar MRI studies from patients evaluated for lower back pain from 2020 to 2025 was performed. EBQ was independently scored by two blinded raters at each disc space from L1–L2 to L5–S1 using 3 mm endplate regions of interest (ROIs) normalized to a cerebrospinal fluid (CSF) ROI at L3. Interrater reliability was assessed via the intraclass correlation coefficient (ICC), Pearson correlation, and root mean square error (RMSE). Patients were stratified by age (≤60 vs. >60 years), sex, and osteoporosis status, and subgroup comparisons were performed for overall and level-specific EBQ scores. Results: A total of 96 patients with a mean age of 61.0 ± 9.42 years were included. The majority of patients were female (87.5%), and 18.8% had been diagnosed with osteoporosis. EBQ scores demonstrated a progressive caudal increase from L2–L3 to L5–S1 across all subgroups. Overall interrater reliability was acceptable (ICC = 0.76), with level-specific ICCs ranging from 0.70 to 0.83. No significant differences were observed between age or sex subgroups. Osteoporotic patients demonstrated significantly higher EBQ at L1–L2, L2–L3, and overall (all p < 0.05), with no significant differences at L3–L4 through L5–S1.
Conclusions: This study provides normative, level-specific EBQ reference data for all levels of the lumbar spine. The increase in EBQ scores at more caudal levels and the reduced discriminatory power for osteoporosis at those levels support the importance of level-specific context when interpreting EBQ thresholds. These findings may support future studies evaluating threshold development for EBQ.
(This article belongs to the Special Issue Clinical Advancements in Spine Surgery: Best Practices and Outcomes)

15 pages, 497 KB  
Article
An Assessment of GPT-3.5 and GPT-4.0 Responses to Scoliosis FAQs
by Tu-Lan Vu-Han, Enikö Regényi, Vikram Sunkara, Paul Köhli, Friederike Schömig, Alexander P. Hughes, Michael Putzier, Matthias Pumberger and Thilo Khakzad
J. Pers. Med. 2026, 16(4), 206; https://doi.org/10.3390/jpm16040206 - 7 Apr 2026
Abstract
Background: ChatGPT is a large language model (LLM) online chatbot developed by OpenAI and launched in November 2022. Early adoption studies have shown high readiness to use this technology for health-related questions and self-diagnosis. However, the quality and clinical adequacy of its health-related responses remain incompletely characterized. This study aimed to explore responses generated by ChatGPT-3.5 and ChatGPT-4.0 to common patient questions regarding scoliosis. Methods: Ten scoliosis-related frequently asked questions (FAQs) were selected from a larger pool of over 250 patient-facing questions compiled from 17 publicly available FAQ webpages and informed by a Google Trends analysis. Questions were harmonized, grouped by theme, and then reduced by rule-based expert review to a final set intended to represent common patient concerns. Results: The median ratings of ChatGPT-3.5 and ChatGPT-4.0 responses ranged from satisfactory requiring minimal clarification (2) to satisfactory requiring moderate clarification (3). Across the ten matched questions, no statistically detectable difference was found between models in this study setting (W = 8.0, p = 0.59; Cliff’s δ = −0.12 [95% CI −0.58, 0.40]); however, given the small question set, unblinded rating process, and poor inter-rater reliability, this should not be interpreted as evidence of equivalence, non-inferiority, or comparable model performance. The results apply only to the 10–15 April 2024 online snapshots of ChatGPT-3.5 and ChatGPT-4.0 and should not be generalized to later model iterations. Conclusions: This study should be interpreted as a clinically oriented observational report, intended to inform physician awareness and patient-physician communication rather than to validate chatbot accuracy or safety. In this 10–15 April 2024 sample, both models’ outputs frequently required clinician clarification.
Given the small FAQ set, low inter-rater reliability, unblinded design, and single-sample outputs, the findings do not establish equivalence or superiority and apply only to the specific 10–15 April 2024 model snapshots and evaluated questions.
(This article belongs to the Special Issue AI and Precision Medicine: Innovations and Applications)
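The scoliosis FAQ abstract reports Cliff’s δ as its effect size. Cliff’s δ is the proportion of cross-pairs in which a value from one group exceeds a value from the other, minus the proportion in which it is smaller, so it ranges from −1 to 1. The sketch below computes the point estimate only, on hypothetical ratings where higher values mean more clarification was needed; it does not reproduce the study’s data or its confidence-interval method.

```python
def cliffs_delta(x, y):
    """Cliff's delta: P(X > Y) - P(X < Y) over all cross-pairs of x and y."""
    greater = sum(1 for a in x for b in y if a > b)
    less = sum(1 for a in x for b in y if a < b)
    return (greater - less) / (len(x) * len(y))

# Hypothetical clarification ratings (higher = more clarification needed).
model_a = [2, 3, 3]
model_b = [2, 2, 3]
print(round(cliffs_delta(model_a, model_b), 3))  # 0.333
```

A value near 0, such as the reported δ = −0.12, means neither model’s ratings systematically dominate the other’s. With only ten questions, the wide interval around that value is what the abstract cautions about.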

20 pages, 1508 KB  
Systematic Review
Blockchain Technology and Automated Project Governance: A Systematic Review of Governance Mechanisms, Enabling Conditions, and Future Research Directions
by Mohammed Saeed Alotaibi
Sustainability 2026, 18(7), 3589; https://doi.org/10.3390/su18073589 - 6 Apr 2026
Abstract
This study synthesizes peer-reviewed literature to examine how blockchain technology supports Automated Project Governance (APG), focusing on the organizational, institutional, and human conditions under which its potential governance contribution is realized. A systematic literature review was conducted in accordance with PRISMA 2020 guidelines, yielding twenty-one empirically and conceptually grounded studies. Screening reliability was strengthened through independent dual screening at the full-text eligibility stage (inter-rater κ = 0.81). Seven blockchain-enabled governance mechanisms are synthesized and comparatively assessed in terms of evidentiary support and research maturity. The synthesis suggests that blockchain’s decentralized and immutable architecture may support transparency, accountability, and coordination when embedded within appropriate governance arrangements, but that these benefits do not arise automatically from technological adoption. The synthesis further identifies enabling conditions, including stakeholder acceptance, organizational governance readiness, and institutional alignment, and maps explicit research gaps for each mechanism to guide future empirical inquiry. By grounding the synthesis in the Technology Acceptance Model and Institutional Theory, the study provides a literature-derived, socio-technical framework for understanding blockchain adoption in APG and offers governance-oriented insights for organizations and policymakers.

23 pages, 1629 KB  
Article
AI-Based Automated Scoring Layer Using Large Language Models and Semantic Analysis
by Anastasia Vangelova and Veska Gancheva
Appl. Sci. 2026, 16(7), 3537; https://doi.org/10.3390/app16073537 - 4 Apr 2026
Viewed by 374
Abstract
Automated scoring of open-ended questions is an important research direction in educational technology and artificial intelligence, as manual grading is time-consuming and often subject to inter-rater variation. This paper proposes an AI-based framework for automated scoring that combines large language models (LLMs), Retrieval-Augmented Generation (RAG), analytical rubrics, and structured machine-readable output within a Moodle-supported e-learning environment. The framework is designed to support context-grounded, criterion-based evaluation by combining the student response, retrieved instructional context, and rubric-defined scoring criteria within a controlled assessment workflow. The proposed approach aims to improve the consistency, traceability, and practical applicability of automated scoring for open-ended responses. To examine its performance, an experimental study was conducted in a real university setting involving a five-task open-ended examination. AI-generated scores were compared with independent human scores using agreement, reliability, correlation, and error metrics. The results indicate strong agreement between automated and expert scoring within the tested setting, together with relatively low average deviation. These findings suggest that the proposed framework has practical potential for supporting automated assessment in digital learning environments, while also highlighting the need for careful interpretation within the scope of the experimental design.
(This article belongs to the Special Issue Application of Semantic Web Technologies for E-Learning)
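The automated-scoring abstract compares AI and human scores with "agreement, reliability, correlation, and error metrics" without listing them. The sketch below assumes three common choices (exact agreement, agreement within one point, and mean absolute error) and uses a hypothetical helper `agreement_summary` with invented rubric scores for five tasks; it is not the paper’s evaluation code.

```python
def agreement_summary(ai_scores, human_scores, tolerance=1):
    """Exact agreement, agreement within `tolerance` points, and mean absolute error."""
    if len(ai_scores) != len(human_scores):
        raise ValueError("score lists must be paired")
    diffs = [abs(a - h) for a, h in zip(ai_scores, human_scores)]
    n = len(diffs)
    return {
        "exact": sum(d == 0 for d in diffs) / n,
        "within_tolerance": sum(d <= tolerance for d in diffs) / n,
        "mae": sum(diffs) / n,
    }

# Hypothetical rubric scores for five tasks.
ai = [8, 6, 9, 4, 7]
human = [8, 7, 9, 6, 7]
print(agreement_summary(ai, human))
# {'exact': 0.6, 'within_tolerance': 0.8, 'mae': 0.6}
```

Exact and adjacent agreement describe how often the scores match, and MAE describes how far apart they are on average. Chance-corrected indices such as weighted kappa or ICC would be needed to speak to reliability proper.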

16 pages, 717 KB  
Article
Validation and Cultural Adaptability of the MOBAK Test Battery for Assessing Fundamental Motor Skills in Chinese Children Aged 3–12 Years
by Jingjie Zhang, Ke Ning, Bingjun Wan, Hongmiao Chen, Chen Wang, Yue Ye and Hongyou Liu
Behav. Sci. 2026, 16(4), 534; https://doi.org/10.3390/bs16040534 - 2 Apr 2026
Viewed by 212
Abstract
Accurate assessment of children’s fundamental motor skills (FMS) is crucial for promoting lifelong healthy development and formulating effective physical education policies. However, China currently lacks standardized assessment tools that cover the entire age range from 3 to 12 years and have undergone thorough cultural adaptation. This study aimed to evaluate the psychometric properties and cultural adaptability of the MOBAK assessment tool in measuring FMS in Chinese children aged 3 to 12 years. A total of 1200 children from four regions of China participated, including 623 boys (52%) and 577 girls (48%). The MOBAK tool was used to assess FMS across different age groups in two dimensions: object movement (e.g., throwing, catching, bouncing, and dribbling) and self-movement (e.g., balancing, rolling, jumping, and running). Psychometric properties, including reliability and validity, were evaluated. Results indicate that MOBAK demonstrates excellent psychometric characteristics: (1) good item discrimination (all CR values p < 0.001), with an appropriate difficulty index (0.51–0.67); (2) very high reliability, reflected in high internal consistency (α > 0.80), high test–retest stability, and high inter-rater consistency (ICC > 0.90); and (3) robust construct validity, supported by exploratory and confirmatory factor analyses, which consistently confirmed the hypothesized two-factor model with excellent fit indices (CFI/TLI > 0.90, RMSEA/SRMR < 0.08). The MOBAK battery demonstrates strong psychometric properties and cultural validity in the Chinese context for reliably assessing FMS in children aged 3–12 years. These findings provide a foundation for future cross-cultural comparisons and validation studies in other populations.
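The internal consistency reported above (α > 0.80) is Cronbach’s alpha. The standard formula is α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The sketch below applies it to hypothetical scores of four children on three items. The item structure and scoring of MOBAK itself are not assumed here.

```python
import statistics

def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score lists (one list per item, same respondents)."""
    k = len(items)
    item_variances = sum(statistics.variance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]
    return k / (k - 1) * (1 - item_variances / statistics.variance(totals))

# Hypothetical scores of four children on three items.
items = [
    [1, 2, 3, 4],
    [1, 2, 3, 4],
    [2, 2, 3, 3],
]
print(round(cronbach_alpha(items), 3))  # 0.931
```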

11 pages, 1657 KB  
Article
Ergonomic Risk in Total Hip Arthroplasty: Approach-Specific Postural Loads and Position-Swap Effects During Cup Preparation
by Carmelo Marín-Martínez, José Emilio Mantilla-de-los-Ríos-García, Elena Galián-Muñoz, Marina Sánchez-Robles, Vicente Jesús León-Muñoz, Antonio Murcia-Asensio, Matilde Moreno-Cascales and Francisco Lajara-Marco
Appl. Sci. 2026, 16(7), 3418; https://doi.org/10.3390/app16073418 - 1 Apr 2026
Viewed by 159
Abstract
Musculoskeletal disorders (MSDs) among orthopaedic surgeons are associated with sustained, constrained postures during demanding intraoperative tasks. Total hip arthroplasty (THA) comprises sequential steps that may impose different postural loads on both the surgeon and the assistant, yet team-level ergonomic design interventions remain underexplored. This study compared ergonomic risk during primary THA performed through the direct lateral (modified Hardinge) and posterolateral (Moore) approaches and assessed a simple workflow redesign: swapping surgeon and assistant positions during acetabular cup preparation (bottom reaming, perimeter reaming, and cup impaction). In a controlled Sawbones-based simulation using standard THA instruments, eight standardised surgical steps were recorded with 360° photographs. Forty-two postural instances (22 for the surgeon, 20 for the assistant) were analysed. Joint angles were measured with Kinovea and converted to Rapid Entire Body Assessment (REBA) scores; intra- and inter-rater reliability (ICC) and minimum detectable change (MDC95) were calculated. Surgeon REBA scores were in the medium-risk range and slightly lower with the posterolateral approach (mean 5.5) than with the direct lateral approach (mean 5.88), whereas assistant scores were in the low-risk range (means 3.43 and 3.29, respectively). The position-swap intervention lowered the surgeon’s REBA action level, most notably during cup impaction, where the REBA score dropped from 10 (high risk) to 4 (medium risk) with the posterolateral approach and from 7 (medium risk) to 3 (low risk) with the direct lateral approach, without increasing assistant risk. These findings provide controlled simulation-based evidence that this simple, zero-cost positional change can reduce the surgeon’s ergonomic action level during THA, although confirmation under real operative conditions is needed before broad generalization.
(This article belongs to the Special Issue Novel Approaches and Applications in Ergonomic Design, 4th Edition)
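The THA abstract combines two calculations: a minimum detectable change derived from reliability, and the mapping of REBA scores to risk levels. The sketch below assumes the common formulas SEM = SD · √(1 − ICC) and MDC95 = 1.96 · √2 · SEM. It also uses the conventional REBA bands (1 negligible, 2–3 low, 4–7 medium, 8–10 high, 11–15 very high), which match the categories quoted in the abstract. The SD and ICC inputs are hypothetical, and the study may have derived SEM differently, for example from repeated-measurement differences.

```python
import math

def mdc95(sd, icc):
    """Minimal detectable change at 95% confidence from a between-subject SD and an ICC.

    SEM = SD * sqrt(1 - ICC); MDC95 = 1.96 * sqrt(2) * SEM.
    """
    sem = sd * math.sqrt(1 - icc)
    return 1.96 * math.sqrt(2) * sem

def reba_risk(score):
    """Map a final REBA score (1-15) to its conventional risk level."""
    if not 1 <= score <= 15:
        raise ValueError("REBA scores range from 1 to 15")
    if score == 1:
        return "negligible"
    if score <= 3:
        return "low"
    if score <= 7:
        return "medium"
    if score <= 10:
        return "high"
    return "very high"

# Hypothetical inputs: SD of 2 REBA points and an ICC of 0.75.
print(round(mdc95(2.0, 0.75), 2))         # 2.77
print(reba_risk(10), "->", reba_risk(4))  # high -> medium
```

Under these assumed inputs, a drop from 10 to 4 (6 points) would exceed the MDC95 of about 2.77 points, so it could be read as a change beyond measurement error. The study’s own MDC95 values are not given in the abstract.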

9 pages, 1745 KB  
Article
Reliability of Preoperative MRI Findings for Differentiating Spontaneous Spinal Subdural and Epidural Hematomas: A Multi-Institutional Retrospective Study of 27 Surgically Treated Cases
by Shun Okuwaki, Hiroshi Takahashi, Katsuya Nagashima, Tomoyuki Asada, Takane Nakagawa, Takahiro Sunami, Yosuke Ogata, Kotaro Sakashita, Hisanori Gamada, Kousei Miura, Hiroshi Noguchi, Yosuke Takeuchi, Toru Funayama, Masao Koda and Masaki Tatsumura
J. Clin. Med. 2026, 15(7), 2602; https://doi.org/10.3390/jcm15072602 - 29 Mar 2026
Viewed by 202
Abstract
Background/Objectives: Spontaneous spinal subdural hematoma (SSSDH) is a rare and severe condition that causes rapid neurological decline. Spontaneous spinal epidural hematoma (SSEH) presents similarly but is more common, and surgical management differs because SSSDH requires an intradural approach. Few studies have assessed the reliability of magnetic resonance imaging (MRI) features used to distinguish SSSDH from SSEH in patients requiring surgery. Methods: We retrospectively reviewed 27 patients who underwent surgical evacuation of spinal hematomas at two institutions (2015–2025). Definitive hematoma location was determined intraoperatively. Four MRI features were independently evaluated by two reviewers: shape (crescentic vs. biconvex), location (ventral vs. dorsal), craniocaudal length (<5 vs. ≥5 segments), and spinal region. Inter- and intra-rater reliability were assessed using percentage agreement and Cohen’s kappa (κ) with 95% confidence intervals (95% CIs). Results: Among the 27 cases, three (11.1%) were SSSDH and 24 were SSEH. Hematoma location, length, and spinal region demonstrated perfect inter- and intra-rater agreement (κ = 1.00). For hematoma shape, intra-rater agreement was good (96.2%, κ = 0.84; 95% CI 0.52 to 1.00), whereas inter-rater agreement was poor to fair (84.6%, κ = 0.26; 95% CI −0.25 to 0.77). Notably, two of the three SSSDHs demonstrated a biconvex configuration, and 83.3% of SSEHs also exhibited a biconvex morphology. Conclusions: MRI features such as hematoma location, extent, and spinal level were highly reproducible, whereas hematoma shape showed limited reliability. Although ventral hematomas most strongly suggest SSSDH, atypical SSEH presentations occur. When dorsal exposure reveals no epidural hematoma, intradural exploration should be promptly considered.
(This article belongs to the Special Issue Clinical Advances in Spinal Neurosurgery)
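The hematoma abstract reports 84.6% inter-rater agreement on shape alongside a κ of only 0.26. A hypothetical 2 × 2 table shows how the two figures can coexist. When one category (here, biconvex) dominates both raters’ calls, the agreement expected by chance is already high, so kappa stays low even though raw agreement looks good. This is often called the kappa paradox. The table below is invented to reach a similar raw agreement and is not the study’s actual cross-tabulation.

```python
def cohens_kappa(table):
    """Return (observed agreement, Cohen's kappa) for a square table: rows = rater 1, columns = rater 2."""
    n = sum(sum(row) for row in table)
    k = len(table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_marg = [sum(table[i]) / n for i in range(k)]
    col_marg = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    expected = sum(row_marg[i] * col_marg[i] for i in range(k))  # chance agreement
    return observed, (observed - expected) / (1 - expected)

# Hypothetical shape ratings for 26 hematomas (categories: biconvex, crescentic).
table = [[21, 2],   # rater 1 biconvex:   rater 2 biconvex, crescentic
         [2, 1]]    # rater 1 crescentic: rater 2 biconvex, crescentic
agreement, kappa = cohens_kappa(table)
print(round(agreement, 3), round(kappa, 3))  # 0.846 0.246
```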

13 pages, 1141 KB  
Article
Validation and Reproducibility of an App for Continuous Measurement as an Assessment Tool for Idiopathic Scoliosis
by Isis Juliene Rodrigues Leite Navarro, Louis Jacob, Kevin Masetto, Francesco Dulio, Andrea Negrini, Stefano Negrini, Fabio Zaina and Alessandra Negrini
Sensors 2026, 26(7), 2099; https://doi.org/10.3390/s26072099 - 27 Mar 2026
Viewed by 387
Abstract
(1) Background: Idiopathic scoliosis is a three-dimensional deformity, yet clinical and research decision-making still relies largely on radiographic Cobb angle measurements. As a radiation-free alternative, clinical assessment of transverse and sagittal plane deformities has gained importance. This study evaluated the concurrent validity and intra- and interrater reproducibility of continuous measurements of rib hump, thoracic kyphosis, and lumbar lordosis obtained using a smartphone application in adolescents with spinal deformities. (2) Methods: Adolescents aged 10–17 years with scoliosis (>10° Cobb) or hyperkyphosis (>50° Cobb) were recruited. Continuous measurements of the angle of trunk rotation (ATR) during the Adams forward bend test and in the standing position, as well as of the sagittal profile, were collected using the ISICO app mounted on a standardized plastic tool. Concurrent validity was assessed against a scoliometer using Spearman correlation, root mean square error, and Bland–Altman analysis, while reproducibility was evaluated using intraclass correlation coefficients, the standard error of measurement, and the minimal detectable change. (3) Results: Thirty-two adolescents were included in the validation and intrarater analyses and 34 in the interrater analyses. ATR measured during the Adams test showed very high correlation with the scoliometer and minimal bias, while standing ATR showed moderate correlation. Reliability was excellent for rib hump during forward bending and moderate for sagittal parameters, with the lowest values observed for lumbar lordosis. (4) Conclusions: These findings support the clinical use of continuous app-based ATR assessment and suggest that sagittal measurements may be useful with appropriate examiner training.
(This article belongs to the Section Biomedical Sensors)
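The app-validation abstract uses Bland–Altman analysis and root mean square error to compare the app against a scoliometer. The sketch below shows the standard Bland–Altman quantities: the bias is the mean of the paired differences, and the 95% limits of agreement are the bias ± 1.96 times the standard deviation of those differences. The ATR readings are hypothetical, not values from the study.

```python
import math
import statistics

def bland_altman(a, b):
    """Bias and 95% limits of agreement for paired measurements from two methods."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

def rmse(a, b):
    """Root mean square error between paired measurements."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# Hypothetical ATR readings in degrees: app vs scoliometer on three patients.
app = [10.0, 12.0, 15.0]
scoliometer = [9.0, 12.0, 13.0]
bias, (lower, upper) = bland_altman(app, scoliometer)
print(bias, round(lower, 2), round(upper, 2))  # 1.0 -0.96 2.96
print(round(rmse(app, scoliometer), 3))         # 1.291
```

Correlation shows whether the two methods rank patients similarly. The bias and limits of agreement show how far an individual app reading may stray from the scoliometer, which is why the two analyses are reported together.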

18 pages, 821 KB  
Article
Phase-Based Motor Skill Acquisition in Preschool Children with Different Participation Experience in a Kinesiology Program
by Kristian Plazibat, Tihomir Vidranski and Renata Barić
J. Funct. Morphol. Kinesiol. 2026, 11(2), 133; https://doi.org/10.3390/jfmk11020133 - 24 Mar 2026
Viewed by 205
Abstract
Background: Early childhood is a critical period for the development of motor competence, which is closely related to later physical activity, educational readiness, and broader developmental outcomes. However, the temporal dynamics of motor skill acquisition in preschool children, particularly the time required to reach the initial and early refinement phases of learning, remain insufficiently described. The aim of this study was to examine whether different levels of previous participation experience in an organized kinesiology program are associated with differences in the speed and quality of novel motor skill acquisition in preschool children, and to explore the relationship between baseline motor proficiency and phase-based indicators of motor learning. Methods: A total of 161 preschool children aged 5–6 years participated in the study and were grouped according to their previous participation experience in an organized kinesiology program (0 h, ~120 h, ~350 h, and ~470 h). Following assessment with the Bruininks–Oseretsky Test of Motor Proficiency, Second Edition (BOT-2), all participants completed a standardized 7-week motor learning program that included nine previously unfamiliar motor tasks. Using a phase-based video analysis protocol, three learning indicators were recorded: time to Phase 1 (F1; first successful execution), time to Phase 2 (F2; initial refinement of performance), and final performance quality (K). Group differences and associations were first examined descriptively and correlationally, after which multivariable regression models were fitted to determine whether previous participation experience and baseline motor proficiency were independently associated with motor learning outcomes. Results: The findings showed consistent differences across groups, with children who had greater previous participation experience generally reaching F1 and F2 more rapidly and achieving higher final performance quality scores.
Higher BOT-2 scores were also associated with shorter learning times and better final performance quality. In the multivariable models, both previous participation experience in an organized kinesiology program and BOT-2 total score were independently associated with Phase 1 attainment time and final performance quality, whereas only previous participation experience remained independently associated with Phase 2 attainment time. The phase-based observational protocol demonstrated good to excellent inter-rater reliability across the evaluated motor learning variables. Conclusions: These findings provide phase-based temporal indicators of motor learning progression in preschool children and suggest that previous participation experience in an organized kinesiology program and baseline motor competence are meaningfully associated with the speed and quality of acquiring new motor tasks. They also demonstrate the potential of phase-based approaches for quantifying motor learning dynamics in early childhood settings. Such indicators may offer useful reference information for instructional pacing and the planning of motor learning activities, and may also serve as practically relevant predictors for adapting future kinesiology programs to children’s motor readiness. Future research should further examine these relationships using longitudinal and analytically expanded designs.
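The abstract describes inter-rater reliability as "good to excellent" without stating thresholds. One widely used convention for ICC values, from Koo and Li (2016), labels values below 0.50 poor, 0.50 to 0.75 moderate, 0.75 to 0.90 good, and above 0.90 excellent. The sketch assumes those bands; whether the authors used them is not stated. Koo and Li also recommend applying the bands to the 95% confidence interval rather than to the point estimate alone.

```python
def interpret_icc(icc):
    """Label an ICC point estimate using Koo and Li's (2016) bands."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"

# Hypothetical ICC values.
for value in (0.45, 0.70, 0.76, 0.83, 0.95):
    print(value, interpret_icc(value))
```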

11 pages, 1773 KB  
Article
Comparison of Different Classification Systems for Müllerian Duct Anomalies: A Retrospective Observational MRI Study
by Laura D’hoore, Eva Decroos, Pieter Julien Luc De Visschere, Ottavia Battaglia and Tjalina Hamerlynck
Medicina 2026, 62(3), 592; https://doi.org/10.3390/medicina62030592 - 21 Mar 2026
Viewed by 261
Abstract
Background and Objectives: Müllerian duct anomalies (MDAs) are congenital malformations of the female genital tract for which several classification systems have been proposed. The objective of this study was to estimate the interrater reliability of the American Fertility Society (AFS), European Society of Human Reproduction and Embryology/European Society for Gynaecological Endoscopy (ESHRE/ESGE), American Society for Reproductive Medicine (ASRM), and Congenital Uterine Malformation by Experts (CUME) classification systems for Müllerian duct anomalies. Materials and Methods: This retrospective cohort study was conducted at a tertiary care hospital and included 71 patients aged up to 45 years who were assessed for a Müllerian duct anomaly between January 2000 and April 2023. Pelvic MRI images were independently evaluated by four readers, followed by a consensus meeting. The primary outcome was interrater reliability (Krippendorff’s α), and the secondary outcomes were the proportions of indeterminate and unclassifiable cases after the consensus meeting. Results: The interrater reliability for MDA diagnosis was very low for all classification systems (AFS α = 0.63, 95% CI [0.57, 0.67]; ASRM α = 0.46, 95% CI [0.41, 0.52]; ESHRE/ESGE α = 0.33, 95% CI [0.29, 0.38]; CUME α = 0.57, 95% CI [0.45, 0.72]). After the consensus meeting, the ESHRE/ESGE system had the highest proportion of indeterminate cases (9.9%) and the ASRM system had the highest proportion of unclassifiable cases (20.6%). Conclusions: All classification systems for Müllerian duct anomalies had very low interrater reliability, with the most indeterminate cases in the ESHRE/ESGE system and the most unclassifiable cases in the ASRM system. We present our recommendations for improving each classification system. The ultimate goal of future research should be the development of a single uniform system that integrates the best features of these systems and uses clinically relevant cut-off values, considering patients’ reproductive outcomes.
(This article belongs to the Special Issue Interventional Radiology and Imaging in Cancer Diagnosis)
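The MDA study measures agreement among four readers with Krippendorff’s α. Unlike Cohen’s kappa, α handles any number of readers and missing ratings. A minimal sketch for nominal categories builds the coincidence matrix: within each case, every ordered pair of ratings from different readers adds 1/(m − 1), where m is the number of ratings for that case. The function then computes α = 1 − (n − 1) · Σ off-diagonal coincidences / Σ_{c≠k} n_c n_k. The function name and the category labels are illustrative only, and the study’s software and confidence-interval method (presumably bootstrapped) are not reproduced.

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` holds one list of ratings per case (one entry per reader);
    None marks a missing rating. Cases with fewer than two ratings are skipped.
    """
    coincidences = Counter()
    for ratings in units:
        values = [v for v in ratings if v is not None]
        m = len(values)
        if m < 2:
            continue
        # Every ordered pair of ratings from different readers adds 1 / (m - 1).
        for c, k in permutations(values, 2):
            coincidences[(c, k)] += 1.0 / (m - 1)
    totals = Counter()
    for (c, _), weight in coincidences.items():
        totals[c] += weight
    n = sum(totals.values())
    disagree_observed = sum(w for (c, k), w in coincidences.items() if c != k)
    disagree_expected = sum(
        totals[c] * totals[k] for c in totals for k in totals if c != k
    )
    if disagree_expected == 0:
        raise ValueError("alpha is undefined when only one category is used")
    return 1.0 - (n - 1) * disagree_observed / disagree_expected

# Hypothetical classifications of three MRI studies by two readers.
units = [["arcuate", "arcuate"], ["septate", "septate"], ["arcuate", "septate"]]
print(round(krippendorff_alpha_nominal(units), 3))  # 0.444
```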

26 pages, 2391 KB  
Article
Validated Methods for Synthesising Hearing Health Data for Machine Learning: A Comparative Study of KDE and VAE Approaches
by Liam Barrett, Roulla Katiri, Yuen Bing Ooi, Isabella Moffitt, Anne G. M. Schilder and Nishchay Mehta
Appl. Sci. 2026, 16(6), 2917; https://doi.org/10.3390/app16062917 - 18 Mar 2026
Viewed by 249
Abstract
Hearing loss affects approximately 1.5 billion people globally, yet access to comprehensive audiometric datasets for research remains limited due to privacy constraints. Synthetic data generation offers a promising solution, enabling broader data sharing while preserving privacy. This study developed and validated two complementary approaches for synthesising audiometric data: Kernel Density Estimation (KDE) and Variational Autoencoders (VAE). Using the National Health and Nutrition Examination Survey (NHANES) dataset comprising 36,676 participants with comprehensive hearing assessments, we trained both generative models and evaluated synthetic data quality through a rigorous Train-on-Synthetic-Test-on-Real (TSTR) machine learning validation framework and blinded expert clinical assessment by two independent audiologists. The VAE approach achieved 86.3% utility for hearing loss prediction relative to the real-data benchmark (Train-on-Real-Test-on-Real). Both methods demonstrated strong privacy preservation, with zero exact record matches and robust resistance to membership inference attacks. Statistical validation confirmed equivalence within clinically negligible margins (<1 dB HL) across all audiometric frequencies. Blinded assessment of 85 patient profiles by two independent expert audiologists revealed that VAE synthetic data achieved high clinical plausibility ratings, with 96.7% of VAE profiles rated as plausible, compared to 13.3% for KDE. Inter-rater reliability was moderate (Cohen’s weighted κ = 0.553, ICC = 0.556), with 84.7% of ratings within one point, and both raters independently ranked VAE data highest, followed by real data and then KDE data. These findings establish validated methodologies for generating privacy-preserving synthetic audiometric data suitable for machine learning applications and clinical education, addressing a critical gap in hearing health research infrastructure. Full article
(This article belongs to the Special Issue Advances in Machine Learning and Big Data Analytics)
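Cohen’s weighted κ, reported above for the two audiologists’ plausibility ratings, gives partial credit on an ordinal scale: ratings one point apart are penalised less than ratings far apart. The sketch below assumes integer ratings on a known ordered scale; the abstract does not say whether linear or quadratic weights were used, so both are exposed through the hypothetical `power` parameter. scikit-learn’s `cohen_kappa_score` with `weights="linear"` or `weights="quadratic"` can serve as a cross-check.

```python
from typing import Sequence


def weighted_kappa(
    rater_a: Sequence[int],
    rater_b: Sequence[int],
    categories: Sequence[int],
    power: int = 1,
) -> float:
    """Cohen's weighted kappa for two raters on an ordered scale (hypothetical helper).

    power=1 gives linear disagreement weights, power=2 quadratic weights.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty rating lists")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two categories")
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)
    # joint distribution of (rater A, rater B) ratings as proportions
    joint = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        joint[index[a]][index[b]] += 1.0 / n
    marg_a = [sum(joint[i]) for i in range(k)]
    marg_b = [sum(joint[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i: int, j: int) -> float:
        # 0 on the diagonal, 1 between the two extreme categories
        return (abs(i - j) / (k - 1)) ** power

    observed = sum(disagreement(i, j) * joint[i][j] for i in range(k) for j in range(k))
    expected = sum(
        disagreement(i, j) * marg_a[i] * marg_b[j] for i in range(k) for j in range(k)
    )
    if expected == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - observed / expected
```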

16 pages, 1140 KB  
Article
Large Language Models as Clinical Nutrition Decision Tools: Quantitative Bias and Guideline Deviation in Type 2 Diabetes Meal Planning
by Pinar Ece Karakas, Aysenur Calik, Ayse Betul Bilen, Kardelen Kandemir and Muveddet Emel Alphan
Healthcare 2026, 14(6), 739; https://doi.org/10.3390/healthcare14060739 - 13 Mar 2026
Viewed by 373
Abstract
Background/Objectives: Large language models (LLMs) are increasingly used as decision support tools in clinical nutrition, including meal planning for individuals with type 2 diabetes mellitus (T2DM). However, the clinical safety, quantitative accuracy, and guideline adherence of AI-generated dietary plans remain uncertain. This study aimed to evaluate systematic bias and agreement between LLM-generated diets and a guideline-concordant reference diet, and to assess whether current LLMs can function as reliable clinical nutrition decision support tools in T2DM. Methods: Six widely used LLMs generated standardized three-day, 1800 kcal dietary plans for T2DM using an identical prompt. Each day was treated as an independent observation (n = 18; six models × three days). Energy and macronutrient contents were analyzed using professional nutrition software and compared with a dietitian-designed reference diet based on ADA, EASD, IDF, and national guidelines. Agreement was evaluated using Bland–Altman analysis, proportional bias assessment, and intraclass correlation coefficients. Guideline adherence and clinical appropriateness were independently scored by registered dietitians. Results: Most LLM-generated diets systematically deviated from the reference diet, with lower total energy, reduced carbohydrate and fiber content, and variable protein distribution. Bland–Altman analyses demonstrated significant bias and wide limits of agreement for key nutrients, indicating clinically meaningful discrepancies. Guideline adherence scores varied substantially across models, with only one model showing relatively consistent performance. Inter-rater reliability between dietitians was high (ICC = 0.806). Conclusions: Current LLMs exhibit systematic quantitative bias and inconsistent guideline adherence when used for T2DM meal planning. AI-generated dietary plans are not interchangeable with dietitian-guided medical nutrition therapy and may pose clinical risks if used without professional oversight. Careful validation, domain-specific fine-tuning, and integration within supervised clinical workflows are required before implementation in diabetes care. Full article
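Bland–Altman analysis, named in the Methods above, summarises agreement as the mean paired difference (systematic bias) and 95% limits of agreement (bias ± 1.96 SD of the differences); proportional bias is commonly checked by regressing the differences on the pairwise means. The sketch below assumes paired values for a single nutrient, such as energy (kcal) for each LLM-generated day against the reference diet; the function name and returned keys are hypothetical, and it reports the slope without a significance test.

```python
from statistics import fmean, stdev
from typing import Dict, Sequence


def bland_altman(method: Sequence[float], reference: Sequence[float]) -> Dict[str, float]:
    """Bias, 95% limits of agreement and a proportional-bias slope (hypothetical helper)."""
    if len(method) != len(reference) or len(method) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [m - r for m, r in zip(method, reference)]
    means = [(m + r) / 2 for m, r in zip(method, reference)]
    bias = fmean(diffs)
    sd = stdev(diffs)  # sample SD of the differences
    # ordinary least-squares slope of difference on mean; a slope far from 0
    # suggests that the bias changes with the magnitude being measured
    mean_x = fmean(means)
    sxx = sum((x - mean_x) ** 2 for x in means)
    if sxx == 0:
        raise ValueError("pairwise means are constant; slope is undefined")
    slope = sum((x - mean_x) * (d - bias) for x, d in zip(means, diffs)) / sxx
    return {
        "bias": bias,
        "loa_lower": bias - 1.96 * sd,
        "loa_upper": bias + 1.96 * sd,
        "proportional_slope": slope,
    }
```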

21 pages, 4603 KB  
Article
From Casting to Printing: Rheological Modification of General-Purpose RTV-2 Silicones for Material Extrusion
by Francesco Buonamici, Lapo Governi, Yary Volpe, Monica Carfagni and Rocco Furferi
Appl. Sci. 2026, 16(6), 2764; https://doi.org/10.3390/app16062764 - 13 Mar 2026
Viewed by 335
Abstract
This study investigates the relationship between viscosity and manufacturability of two-component silicones in extrusion-based additive manufacturing. A methodology is proposed to adapt commercially available, low-viscosity general-purpose silicones for direct 3D printing using the Lynxter S300X material extrusion system. EcoFlex™ 00-50 silicone was modified through controlled additions of a thixotropic agent (THI-VEX), producing formulations with progressively increased viscosity. After a preliminary qualitative viscosity assessment, formulations were printed using identical process parameters and evaluated through a set of dedicated geometric benchmark specimens targeting critical failure modes, including unsupported thin walls, overhangs, gaps, and slender structures. Print outcomes were assessed via multi-rater visual inspection, with inter-rater reliability analysis to ensure consistency. Results reveal a strong correlation between thixotropy and geometric fidelity, identifying the formulation containing 4.0 wt% THI-VEX as optimal under the tested conditions. The study provides practical design and process guidelines for silicone additive manufacturing and highlights the importance of integrated material–process optimization for reliable fabrication of soft, highly deformable materials. Full article
(This article belongs to the Section Additive Manufacturing Technologies)
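The abstract does not name the statistic behind its multi-rater inter-rater reliability analysis. One common choice when more than two raters assign categorical outcomes (for example, pass/fail per benchmark feature) is Fleiss’ κ; the sketch below assumes that statistic and a complete design in which every specimen is scored by the same number of raters. The input layout and function name are hypothetical; statsmodels’ `statsmodels.stats.inter_rater.fleiss_kappa` can serve as a cross-check.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for many raters and nominal categories (hypothetical helper).

    counts[i][j] is how many raters placed item i in category j; every item
    must be rated by the same number of raters.
    """
    if not counts:
        raise ValueError("no items")
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters or len(row) != n_cats for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    # share of all assignments that fall in each category
    p_cat = [sum(row[j] for row in counts) / (n_items * n_raters) for j in range(n_cats)]
    # per-item agreement: fraction of rater pairs that agree on that item
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / n_items
    p_chance = sum(p * p for p in p_cat)
    if p_chance == 1:
        raise ValueError("kappa is undefined when every rating falls in one category")
    return (p_observed - p_chance) / (1 - p_chance)
```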

13 pages, 1024 KB  
Article
Artificial Intelligence as a Support Tool for Preoperative Patient Education in Anesthesiology: A Comparative Evaluation of Five Large Language Models
by Ahmet Tuğrul Şahin, Mehtap Gürler Balta, Vildan Kölükçü, Ali Genç, Serkan Karaman, Tuğba Karaman and Hakan Tapar
J. Clin. Med. 2026, 15(6), 2197; https://doi.org/10.3390/jcm15062197 - 13 Mar 2026
Viewed by 313
Abstract
Background/Objectives: Large language models (LLMs) are increasingly used for patient education, yet comparative evidence regarding their accuracy, safety, and ethical performance remains limited, particularly in high-risk fields such as anesthesiology. This study aimed to conduct a multidimensional comparison of five contemporary LLMs in answering common patient questions in anesthesiology. Methods: In this cross-sectional, comparative in silico study, 30 standardized patient questions covering general anesthesia, spinal/epidural anesthesia, and peripheral nerve blocks were submitted to ChatGPT, Gemini, Microsoft Copilot, DeepSeek, and Grok. Responses were independently evaluated under full blinding by five senior anesthesiology professors using a 5-point Likert scale across six domains: accuracy, safety, completeness, understandability, ethics, and overall assessment. Inter-rater reliability was assessed using intraclass correlation coefficients (ICC). Performance differences were analyzed using linear mixed-effects models accounting for question- and evaluator-level variability, with results reported as estimated marginal means. Results: Inter-rater agreement was good to excellent across all domains (ICC > 0.75). Significant model-related differences were observed for overall assessment, accuracy, safety, completeness, and ethics (all p < 0.001), whereas understandability did not differ significantly between models. ChatGPT achieved the highest overall performance, while Gemini demonstrated superior accuracy. Model performance varied across anesthesiology subspecialties, with significant model × topic interactions identified in multiple domains (p < 0.01). Conclusions: LLMs may serve as supportive tools for patient education in anesthesiology; however, their performance varies substantially across models and clinical contexts. Differences in accuracy, safety, and ethical performance highlight the need for cautious, context-aware integration of LLMs into clinical practice rather than their use as substitutes for anesthesiologists’ clinical judgment. Full article
(This article belongs to the Section Anesthesiology)
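The abstract reports ICCs without specifying the model. With the same five evaluators rating every response, a frequent choice is the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1) in Shrout and Fleiss’s notation; the sketch below assumes that form and a complete responses × evaluators matrix of Likert scores. The function name is hypothetical.

```python
from statistics import fmean
from typing import Sequence


def icc_2_1(scores: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater (hypothetical helper).

    scores[i][j] is rater j's score for item i; the matrix must be complete.
    """
    n = len(scores)
    k = len(scores[0]) if scores else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete matrix with at least 2 items and 2 raters")
    grand = fmean(x for row in scores for x in row)
    row_means = [fmean(row) for row in scores]
    col_means = [fmean(row[j] for row in scores) for j in range(k)]
    # mean squares from a two-way ANOVA without replication
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_error = sum(
        (scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_error = ss_error / ((n - 1) * (k - 1))
    denom = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denom == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_rows - ms_error) / denom
```

Because this form measures absolute agreement, an evaluator who scores every response one point higher than a colleague lowers the ICC even when their rankings match; a consistency-type ICC would not penalise that offset.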
