Search Results (121)

Search Parameters:
Keywords = Fleiss’ kappa

19 pages, 267 KB  
Article
Reliability of Auditory-Perceptual Analysis in the Study of Speech Function in Patients with Unilateral Cleft and Palate
by Alexandra Bloeck, Nora Ann Doyle, Sylva Bartel and Michael Krimmel
J. Clin. Med. 2026, 15(2), 588; https://doi.org/10.3390/jcm15020588 - 12 Jan 2026
Viewed by 138
Abstract
Background/Objectives: Multidisciplinary outcome studies are carried out to evaluate long-term treatment in patients with cleft lip and palate. Speech function, as one of the key outcomes of the treatment, is examined by means of an auditory-perceptual analysis. For scientific and global studies, it is essential to reduce the risk of bias as much as possible. The aim of the present study was to examine auditory-perceptual analyses on the basis of an outcome study and to evaluate their reliability. Methods: Twenty patients were examined to evaluate their speech function. The speech sample was obtained via the online tool Zoom™. The speech sample consisted of single words (picture supported), a version of the German “Great Ormond Street Speech Assessment” (GOS.SP.ASS) sentences and spontaneous speech. The analysis was carried out by three experienced examiners, all using the German version of the Universal Reporting Parameters at two different times. The intrarater and interrater reliability were calculated. Results: Twenty participants with unilateral cleft and palate and a minimum age of 18 years (mean 20.1) were enrolled in the analysis of the speech function. None of the participants had undergone a secondary operation due to velopharyngeal incompetence. The examination took place before an osteotomy might be needed. The multidisciplinary treatment of the 20 participants regarding their speech function was successful. There were only marginal abnormalities. The listeners showed a very good intrarater and moderate interrater reliability (ICC/Fleiss’ kappa). An overall percentage agreement of 88.3% was achieved. Conclusions: These positive results cannot be compared with outcome studies on a national or international level, since the construction of the speech sample as well as the structure and the implementation of the auditing process reveal considerable deficiencies in methodological rigor. The small number of examiners and patients as well as the patients’ minor residual impairments influence the significance of the statistical calculation by kappa and ICC. The auditory-perceptual analysis should be validated for German-speaking countries. Full article
(This article belongs to the Special Issue New Advances in Cleft Lip and Palate and Facial Plastic Surgery)
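For readers unfamiliar with the search keyword, the following is a minimal sketch of how Fleiss' kappa is commonly computed for a fixed number of raters per subject. It is not taken from any of the listed articles; the function name `fleiss_kappa` and the example counts are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a subjects x categories table of rating counts.

    counts[i][j] is the number of raters who assigned subject i to
    category j. Every subject must be rated by the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Observed agreement: share of agreeing rater pairs, averaged over subjects.
    per_subject = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_subject) / n_subjects

    # Chance agreement: sum of squared overall category proportions.
    proportions = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in proportions)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")

    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Hypothetical data: 4 subjects, 3 raters, 2 categories.
    example = [[3, 0], [2, 1], [0, 3], [1, 2]]
    print(f"Fleiss' kappa = {fleiss_kappa(example):.3f}")
```

Because chance agreement depends on the overall category proportions, a sample in which almost all subjects fall into one category can yield a low kappa despite high raw agreement, which is consistent with the caution this abstract expresses about minor residual impairments.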
10 pages, 1935 KB  
Article
Fracture Hunting in Ruby-Throated Hummingbirds (Archilochus colubris): A Comparative Study of General Radiography, Dental Radiography, Micro-CT, and 3D Reconstructed Imaging
by Haerin Rhim, Kimberly L. Boykin, Zoey Lex, Katie Bakalis, Rachel Jania, Kassandra Wilson, Devin Osterhoudt and Mark A. Mitchell
Animals 2026, 16(1), 62; https://doi.org/10.3390/ani16010062 - 25 Dec 2025
Viewed by 212
Abstract
Diagnosing fractures in hummingbirds is challenging because of their small size. This study evaluated the diagnostic performance and inter-reviewer agreement of four imaging modalities—conventional radiography, dental radiography, micro-computed tomography (micro-CT), and three-dimensional (3D)-reconstructed images from micro-CT scans—for identifying fractures in 16 ruby-throated hummingbirds (Archilochus colubris) admitted to a wildlife hospital. Six independent reviewers, with or in training for a specialty in veterinary radiology or wildlife medicine, assessed randomized image sets. Gross dissection of the carcasses using dermestid beetle larvae established the gold standard. Diagnostic performance metrics—sensitivity, specificity, predictive values, and likelihood ratios—were calculated for each modality. Inter-reviewer agreement was assessed using Fleiss’ kappa. Our results demonstrated that advanced imaging techniques improved diagnostic performance and inter-reviewer agreement compared to traditional radiography. While specificity (>88%) was comparable to other small animal studies, the sensitivity did not exceed 50% across all modalities. This low sensitivity reflects the challenges posed by minimal fracture displacement and hummingbirds’ extremely small size. Only 3D images achieved high positive likelihood ratios and superior inter-reviewer agreement, highlighting the unique value of 3D visualization in complex anatomical evaluations. Overall, the minute structures of hummingbirds present inherent diagnostic limitations, underscoring that negative radiographic results must be interpreted cautiously, and the possibility of false negatives should prompt consideration of advanced or follow-up imaging when clinical suspicion persists. Full article
(This article belongs to the Section Veterinary Clinical Studies)
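The diagnostic performance metrics named in this abstract (sensitivity, specificity, predictive values, likelihood ratios) all derive from a 2 × 2 table against the gold standard. A minimal sketch follows, assuming the usual definitions; the function name `diagnostic_metrics` and the counts in the usage example are hypothetical and are not data from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosticMetrics:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    lr_positive: float
    lr_negative: float


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> DiagnosticMetrics:
    """Compute standard diagnostic metrics from 2 x 2 table counts."""
    if min(tp + fn, tn + fp, tp + fp, tn + fn) == 0:
        raise ValueError("every row and column total must be non-zero")
    sensitivity = tp / (tp + fn)   # fractures detected among true fractures
    specificity = tn / (tn + fp)   # intact bones correctly called intact
    ppv = tp / (tp + fp)           # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    # Likelihood ratios are infinite when the denominator rate is zero.
    lr_positive = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    lr_negative = (1 - sensitivity) / specificity if specificity > 0 else float("inf")
    return DiagnosticMetrics(sensitivity, specificity, ppv, npv, lr_positive, lr_negative)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show low sensitivity with high specificity.
    print(diagnostic_metrics(tp=4, fp=1, fn=4, tn=9))
```

With sensitivity at or below 50%, the negative likelihood ratio stays close to 1, which is why the abstract advises interpreting negative radiographic results cautiously.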
13 pages, 1503 KB  
Article
Introduction of a Structured Reporting Protocol and Surgical Checklist for Rezum Water Vapor Therapy (VAPOR-SRP)
by Jan Ebbing, Viktor Alargkof, Christian Engesser, Anas Elyan, Hans-Helge Seifert, Nicola Keller, Brigitta Gahl, Pawel Trotsenko and Christian Wetterauer
J. Clin. Med. 2025, 14(23), 8431; https://doi.org/10.3390/jcm14238431 - 27 Nov 2025
Viewed by 400
Abstract
Background/Objectives: Rezum water vapor therapy for benign prostatic obstruction lacks standardized documentation, complicating data comparison. This study evaluates the completeness of non-standardized Rezum operative reports and validates a novel Rezum Structured Reporting Protocol (SRP) to enhance documentation quality. Methods: Following the establishment of content validity, the SRP (which includes detailed diagrams for various prostatic urethral lengths (PUL) and intravesical prostatic protrusion (IPP) to document injection sites, along with a comprehensive 10-item checklist capturing factors that may influence outcomes) was retrospectively applied to 100 Rezum cases. Operative videos and non-standardized reports were analyzed and compared against the SRP. For criterion validity, inter-rater reliability was evaluated through a blinded review of 20 cases by three Rezum users and the protocol development panel, comparing checklist item ratings. Results: The median number of injections was 4.0 (IQR: 2–6), injection density was 12.7 (IQR: 10–16.7) mL (PVOL)/injection, and injection interval was 0.7 (IQR: 0.5–1) cm (PUL)/injection. Variations in injection techniques were noted, including non-standard locations in 10% of cases and alternating injection sequences between lobes in 22%. Only 30% of reports detailed injection sites accurately. The intraclass correlation coefficient for the rating of PUL was 0.94 (95% CI: 0.89–0.97). The Fleiss’ kappa for MLE and IPP was 0.84 (95% CI: 0.66–1.02) and 0.85 (95% CI: 0.67–1.03), respectively. The agreement rate was 93% for bladder neck/urethra morphology and 100% for injection sequence. Kendall’s W was 0.37 (p = 0.343) for the item of injection sites. Conclusions: Variability in Rezum surgical techniques was observed, particularly in injection density, injection intervals, and precise injection locations, as well as in the structured information of non-SRP-standardized operative reports. Content validity of the SRP was achieved, leading to high inter-rater reliability in its application. The SRP promotes the standardization and completeness of Rezum data, thereby supporting improved, consistent, and high-quality Rezum documentation. Full article
(This article belongs to the Special Issue Emerging Surgical Techniques in the Management of Urological Diseases)
18 pages, 5010 KB  
Article
In Vitro Effect of Sequential Compressive Loading and Thermocycling on Marginal Microleakage of Digitally Fabricated Overlay Restorations Made from Five Materials
by Xavier Gutiérrez-Ruiz, Jordi Cano-Batalla, Òscar Figueras-Álvarez, Francisco Real-Voltas, Elena Núñez-Bielsa and Josep Cabratosa-Termes
Appl. Sci. 2025, 15(23), 12532; https://doi.org/10.3390/app152312532 - 26 Nov 2025
Viewed by 362
Abstract
Marginal microleakage compromises the longevity and biological seal of indirect restorations. Despite the growing adoption of computer-aided design and manufacturing (CAD/CAM) and three-dimensional (3D) printing technologies, limited evidence compares the marginal integrity of these materials under combined mechanical and thermal stresses. This study evaluated and compared the marginal microleakage of overlay restorations fabricated from five contemporary restorative materials, IPS e.max® ZirCAD Prime, BioHPP®, G-CAM, VarseoSmile CrownPlus, and IPS e.max® CAD, after sequential compressive loading and thermocycling. A total of 125 extracted human molars were prepared for standardized 1.5 mm-thick CAD/CAM overlay restorations and assigned to three experimental conditions: control, sequential compressive loading (3 × 500 N), and thermocycling (6000 cycles between 5 °C and 55 °C) followed by loading. Microleakage was assessed using 2% methylene blue dye and stereomicroscopy. Data were analyzed using Fisher’s exact test and Fleiss’ Kappa (α = 0.05). G-CAM and IPS e.max® ZirCAD Prime exhibited the lowest microleakage across all testing conditions, while BioHPP® showed the highest values. Both sequential compressive loadings and thermocycling significantly increased microleakage in all materials (p < 0.001). The results indicate that material type significantly influences marginal sealing, with G-CAM and IPS e.max® ZirCAD Prime maintaining superior marginal integrity compared with other materials tested. Full article
(This article belongs to the Special Issue Research on Restorative Dentistry and Dental Biomaterials)
15 pages, 988 KB  
Article
Feasibility and Reliability of Ammer–Coelho Computational Tool for Sex Estimation: A Pilot Study on an Elderly Scottish Sample
by Mackenzie S. Todd and Julieta G. García-Donas
Forensic Sci. 2025, 5(4), 49; https://doi.org/10.3390/forensicsci5040049 - 18 Oct 2025
Viewed by 1027
Abstract
Background/Objectives: Estimating the sex of unknown individuals is a critical step when constructing their biological profile. The distal humerus is a useful sex discriminator, as shown through metric, morphoscopic, and geometric morphometric approaches. A recently developed web application using geometric morphometric techniques has provided an accessible tool for estimating sex from the shape of the olecranon fossa. The aims of this study were to examine the accuracy of the Ammer–Coelho web application on Scottish individuals, as well as to test its repeatability and reproducibility among seven different observers. Methods: The right humerus was obtained from 52 Scottish individuals, and the Ammer–Coelho web application was used to estimate sex. Total accuracy rates and sex-specific rates were calculated, and an analysis of Cohen’s and Fleiss’ kappa was performed. Results: The results demonstrate an overall accuracy of 69.23% with a sex bias of −5.33%, with 55.56% of the sample being accurately estimated with probabilities equal to or higher than 0.95. Substantial agreement was reported for intra-observer error, and an overall low agreement was reported for inter-observer error. Conclusions: This is the first study that evaluates the Ammer–Coelho web application. A tendency to perceive more triangular shapes (male appearance) rather than oval shapes (female appearance) resulted in a high level of observer errors, with only 6% of females correctly estimated across the seven observers. The low accuracy rates obtained could also indicate inter-population variation, as shown by other studies. Given these results, research considering different levels of observers’ experience and diverse population samples is needed to confirm our findings. Full article
10 pages, 5646 KB  
Article
Radial Head Fractures: Is the Mason Classification Still Effective Today? A Large-Sample Validation of Intra- and Inter-Observer Reliability
by Filippo Calderazzi, Davide Donelli, Alessandro Marinelli, Paolo Bastia, Cristina Galavotti, Alessandro Nosenzo, Enricomaria Lunini, Alessandra Maresca, Giorgio Concari and Corrado Ciatti
J. Clin. Med. 2025, 14(20), 7252; https://doi.org/10.3390/jcm14207252 - 14 Oct 2025
Viewed by 986
Abstract
Introduction: Various classifications of radial head fractures have been reported in the literature; most of them are based solely on conventional radiographic criteria. The Mason–Johnston classification, currently the most widely used system worldwide, is affected by the limitations of conventional radiographs. The aim of our study is to confirm or refute the low reliability and reproducibility of the Mason–Johnston classification. Materials and Methods: The study collected elbow X-rays showing radial head fractures from 2011 to 2021. Images were evaluated by eight orthopedic surgeons and one consultant radiologist from different hospitals for classification. The first phase assessed inter-observer agreement, comparing classifications among participants. After four months, the same images were randomly reordered and then reclassified to evaluate intra-observer agreement. A total of 90 elbow X-rays from 50 women and 40 men were analyzed. Inter- and intra-observer agreement was assessed using Fleiss’ kappa, Krippendorff’s alpha, and Cohen’s kappa. Results: Overall inter-observer agreement by unweighted Fleiss’ κ was moderate in both sessions (κ = 0.49 and κ = 0.50), with an overall pairwise percent agreement of 63% and a prevalence- and bias-adjusted κ (PABAK, k = 4) of ≈ 0.50. As an ordinal sensitivity analysis, Krippendorff’s α (ordinal) was 0.726 and 0.744, indicating substantial agreement. Type-specific reliability was moderate for Types II–III and higher for Type IV. Unweighted Cohen’s kappa coefficients were calculated to assess intra-observer agreement, demonstrating moderate to substantial levels of concordance. Conclusions: The Mason–Johnston classification shows moderate inter-observer reliability, especially for Types II–III, and moderate to substantial intra-observer agreement. Full article
(This article belongs to the Special Issue Treatment and Long-Term Outcome of Fracture)
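The PABAK figure in this abstract can be checked from the reported percent agreement. The sketch below assumes the common multi-category form PABAK = (k·Po − 1)/(k − 1), and assumes that "k = 4" in the abstract denotes the number of classification categories; neither assumption is confirmed by the abstract itself. The function name `pabak` is hypothetical.

```python
def pabak(percent_agreement: float, n_categories: int) -> float:
    """Prevalence- and bias-adjusted kappa for n_categories >= 2.

    Replaces the data-driven chance agreement with the uniform value
    1 / n_categories, so the result depends only on observed agreement.
    """
    if n_categories < 2:
        raise ValueError("at least two categories are required")
    if not 0.0 <= percent_agreement <= 1.0:
        raise ValueError("percent_agreement must be a proportion in [0, 1]")
    return (n_categories * percent_agreement - 1.0) / (n_categories - 1.0)


if __name__ == "__main__":
    # Values quoted in the abstract: 63% pairwise agreement, k = 4.
    print(f"PABAK = {pabak(0.63, 4):.2f}")
```

Under these assumptions, (4 × 0.63 − 1)/3 ≈ 0.51, in line with the reported PABAK of approximately 0.50.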
11 pages, 2185 KB  
Article
Reproducibility Examination of Histopathological Growth Patterns of Liver Metastases in a Retrospective, Consecutive, Single-Center, Cohort Study with Literature Review
by Anita Sejben, Szintia Almási, Boglárka Pósfai, Bence Baráth, Ádám Ferenczi, Parsa Abbasi, Tamás Zombori and Tamás Lantos
Med. Sci. 2025, 13(4), 220; https://doi.org/10.3390/medsci13040220 - 3 Oct 2025
Viewed by 522
Abstract
Objectives: Histopathological growth patterns (HGPs) of liver metastases have been shown to possess prognostic significance. To date, only 2 studies have evaluated the reproducibility of HGP assessment. The aim of our study was to assess the interobserver reproducibility of HGP classification in liver metastases. Methods: A retrospective, consecutive, single-center cohort study was conducted, including patients who underwent surgical resection for liver metastases at the University of Szeged between 2011 and 2023. A comprehensive database was established, incorporating basic histopathological data for each case. Histological slides were independently reviewed by 2 pathologists, 3 pathology specialist trainees, and 2 medical students with varying levels of experience in gastrointestinal pathology. Interobserver agreement was evaluated using intraclass correlation coefficients (ICC) and Fleiss’ kappa. Results: The study included resection specimens from 205 patients, comprising 336 metastatic lesions, predominantly of gastrointestinal origin (n = 188). Excellent interobserver agreement was observed among specialist trainees (ICC = 0.911) and board-certified pathologists (ICC = 0.984). Overall agreement among all 7 evaluators was good (ICC = 0.822). Conclusions: Our findings demonstrate that HGPs can be reliably assessed by individuals with at least 2 years of experience in general pathology. To our knowledge, this study includes the largest number of board-certified pathologists and pathology specialist trainees in an HGP reproducibility analysis to date. Additionally, no comprehensive literature review on this topic has been previously conducted. Full article
14 pages, 1477 KB  
Article
Mammographic Calcifications in Lung Transplant Recipients: Prevalence and Evolution
by Jonathan Saenger, Jasmin Happe, Caroline Maier, Bjarne Kerber, Ela Uenal, Denise Bos, Thomas Frauenfelder and Andreas Boss
Biomedicines 2025, 13(9), 2318; https://doi.org/10.3390/biomedicines13092318 - 22 Sep 2025
Cited by 1 | Viewed by 736
Abstract
Objective: To investigate the prevalence and progression of macrocalcifications or sporadic scattered microcalcifications, breast arterial calcifications (BAC) and grouped microcalcifications in women undergoing lung transplantation (LTX). Materials and Methods: In this retrospective single-center cohort study, 176 adult female patients who underwent mammography between 2008 and 2025 were included: 82 LTX recipients and 94 age-matched controls. Mammographic findings were assessed using standardized BI-RADS criteria and a visual BAC scoring system. Clinical and demographic data were extracted from electronic medical records. Multivariable logistic regression and cumulative incidence analysis were used to evaluate associations and progression patterns. Interobserver agreement was assessed using Fleiss’ kappa. Results: BAC and grouped microcalcifications were significantly more prevalent in the LTX group in the last mammography (BAC: OR 6.57, 95% CI 2.34–20.7; microcalcifications: OR 14.6, 95% CI 3.93–73.9; both p < 0.001). Cumulative incidence analysis showed accelerated progression of BAC and grouped microcalcifications in LTX recipients (p ≤ 0.01), while macrocalcifications or sporadic scattered microcalcification progression did not differ significantly. BAC was often more extensive and potentially mimicked malignant findings. Interobserver agreement was highest for the four-level BAC scoring system (κ = 0.61), followed by BAC presence (κ = 0.59) and macrocalcifications (κ = 0.51), while grouped microcalcifications showed only fair agreement (κ = 0.33). Conclusions: Lung transplant recipients demonstrate significantly higher prevalence and faster progression of BAC and grouped microcalcifications compared to controls, complicating mammographic interpretation. Given their elevated risk of aggressive malignancies and diagnostic overlap between benign and suspicious calcifications, transplant recipients may benefit from tailored screening strategies. Full article
(This article belongs to the Special Issue Imaging Technology for Human Diseases)
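Verbal labels such as "fair", "moderate", and "substantial" in these abstracts are often assigned with the Landis and Koch benchmarks. The sketch below maps a kappa value to those labels; whether this study (or any other in the list) used exactly these cut-offs is an assumption, and the function name `landis_koch_label` is hypothetical.

```python
def landis_koch_label(kappa: float) -> str:
    """Return the Landis and Koch (1977) verbal label for a kappa value."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie in [-1, 1]")
    if kappa < 0.0:
        return "poor"
    # Upper bounds of each band, checked in ascending order.
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


if __name__ == "__main__":
    # Kappa values quoted in the abstract above.
    for value in (0.61, 0.59, 0.51, 0.33):
        print(value, landis_koch_label(value))
```

Under these benchmarks, the value of 0.33 for grouped microcalcifications falls in the "fair" band, matching the wording of the abstract.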
11 pages, 1112 KB  
Article
Thoracic MRI in Pediatric Oncology: Feasibility and Image Quality of Post-Contrast Free-Breathing Radial 3D T1 Weighted Imaging
by Patricia Tischendorf, Marc-David Künnemann, Tobias Krähling, Jan Hendrik Lange, Walter Heindel and Laura Beck
Biomedicines 2025, 13(9), 2302; https://doi.org/10.3390/biomedicines13092302 - 19 Sep 2025
Cited by 1 | Viewed by 1192
Abstract
Objectives: To compare the feasibility and image quality of a post-contrast free-breathing radial stack-of-stars 3D T1w turbo-field echo Dixon sequence (3D T1w VANE mDIXON) with a conventional cartesian breath-hold 3D T1w fast-field echo mDIXON sequence in pediatric oncology patients undergoing chest MRI. Methods: A total of 48 children (34 females; mean age 5.3 ± 3.7 years) underwent contrast-enhanced chest MRI, with 24 examined using the 3D T1w VANE mDIXON sequence and 24 with a conventional breath-hold 3D T1w mDIXON sequence. Image quality was independently assessed by three radiologists using a 5-point scale. Signal-to-noise ratio (SNR) was measured at two anatomical sites, a homogeneous paraspinal muscle region (SNRmuscle) and the liver apex (SNRliver), while avoiding vessels and signal inhomogeneities. The presence of respiratory artifacts, total imaging time, and the need for general anesthesia or sedation were recorded. Interobserver agreement was determined using Fleiss’s kappa (ϰ), and mean SNR values were compared between groups using an independent samples t-test. Results: The 3D T1w VANE mDIXON sequence yielded significantly higher SNRmuscle and SNRliver (530 ± 120; 570 ± 110 vs. 370 ± 110; 400 ± 90; p < 0.001), improved diagnostic image quality by approximately 25%, and reduced respiratory artifacts by about 23%. Interobserver agreement was almost perfect. Importantly, the need for general anesthesia was significantly reduced using the 3D T1w VANE mDIXON (p < 0.001). Conclusions: Free-breathing 3D T1w VANE mDIXON chest MRI is a feasible and effective imaging approach for pediatric oncology patients, offering superior image quality and reducing the need for general anesthesia compared to conventional methods. Full article
(This article belongs to the Special Issue Pediatric Tumors: Diagnosis, Pathogenesis, Treatment, and Outcome)
10 pages, 472 KB  
Article
Evaluating the Concordance Between ChatGPT and Multidisciplinary Teams in Breast Cancer Treatment Planning: A Study from Bosnia and Herzegovina
by Sefika Umihanic, Hedim Osmanovic, Nejra Selak, Dijana Kopric, Asija Huseinbasic, Erna Sehic-Kozica, Belma Babic and Fadil Umihanic
J. Clin. Med. 2025, 14(18), 6460; https://doi.org/10.3390/jcm14186460 - 13 Sep 2025
Viewed by 1067
Abstract
Background/Objectives: In many low- and middle-income countries (LMICs), including Bosnia and Herzegovina, oncology services are constrained by a limited number of specialists and uneven access to evidence-based care. Artificial intelligence (AI), particularly large language models (LLMs) such as ChatGPT, may provide clinical decision support to help standardize treatment and assist clinicians where oncology expertise is scarce. This study aimed to evaluate the concordance, safety, and clinical appropriateness of ChatGPT-generated treatment recommendations compared to decisions made by a multidisciplinary team (MDT) in the management of newly diagnosed breast cancer patients. Methods: This retrospective study included 91 patients with newly diagnosed, treatment-naïve breast cancer, presented to an MDT in Bosnia and Herzegovina in 2023. Patient data were entered into ChatGPT-4.0 to generate treatment recommendations. Four board-certified oncologists, two internal and two external, evaluated ChatGPT’s suggestions against MDT decisions using a 4-point Likert scale. Agreement was analyzed using descriptive statistics, Cronbach’s alpha, and Fleiss’ kappa. Results: The mean agreement score between ChatGPT and MDT decisions was 3.31 (SD = 0.10), with high consistency across oncologist ratings (Cronbach’s alpha = 0.86). Fleiss’ kappa indicated moderate inter-rater reliability (κ = 0.31, p < 0.001). Higher agreement was observed in patients with hormone receptor-negative tumors and those treated with standard chemotherapy regimens. Lower agreement occurred in cases requiring individualized decisions, such as low-grade tumors or uncertain indications for surgery or endocrine therapy. Conclusions: ChatGPT showed high concordance with MDT treatment plans, especially in standardized clinical scenarios. In resource-limited settings, AI tools may support oncology decision-making and help bridge gaps in clinical expertise. However, careful validation and expert oversight remain essential for safe and effective use in practice. Full article
(This article belongs to the Section Oncology)
24 pages, 2607 KB  
Article
Behavior Spectrum-Based Pedestrian Risk Classification via YOLOv8–ByteTrack and CRITIC–Kmeans
by Jianqi Sun and Yulong Pei
Appl. Sci. 2025, 15(18), 10008; https://doi.org/10.3390/app151810008 - 12 Sep 2025
Cited by 1 | Viewed by 752
Abstract
Pedestrian safety at signalized intersections remains a pressing concern in rapidly urbanizing cities. This study introduces a trajectory–signal behavior spectrum, grounded in Behavior Spectrum Theory (BST), to quantify crossing risk using readily observable data. Unmanned aerial vehicle (UAV) video is employed to record pedestrian movements, which are then detected with YOLOv8 and tracked with ByteTrack, producing frame-level trajectories without dependence on line-of-sight instrumentation. Five spatiotemporal features—speed, acceleration, crossing time, remaining pedestrian-signal green time, and red-phase duration—are compiled into the spectrum. Features are normalized using the interquartile range (IQR) method, and objective weights are determined with an improved CRITIC (Criteria Importance Through Intercriteria Correlation) scheme that incorporates a median-based coefficient of variation and absolute correlation for conflict measurement. The resulting risk eigenvalues are clustered with K-means into four levels: no risk, low, medium, and high. A case study of 1210 crossings at a two-way eight-lane intersection in Harbin, China (576 compliant, 634 non-compliant) demonstrates the approach. Results show greater variability among non-compliant speeds (mean 1.29 m/s) compared with compliant crossings (mean 1.40 m/s), with more extreme deviations. Clustering achieved silhouette coefficients of 0.60 for compliant and 0.69 for non-compliant groups, while expert validation on 20 samples yielded substantial agreement (Fleiss’ Kappa = 0.87). This study provides a systematic and interpretable method for risk classification, which supports both theoretical understanding and applied traffic safety management. Full article
14 pages, 265 KB  
Article
Co-Development and Content Validity of an Instrument to Collect Integratively the Social Determinants of Health in Postpartum Lactating People
by Paula Eugenia Barral, Agustín Ramiro Miranda and Elio Andrés Soria
World 2025, 6(3), 120; https://doi.org/10.3390/world6030120 - 1 Sep 2025
Viewed by 1203
Abstract
Postpartum lactating people are particularly vulnerable to inequities in social determinants of health (SDH), yet no validated tool currently exists to assess these factors comprehensively. This study aimed to co-develop and establish the content validity of an instrument to integratively evaluate SDH in this population. Guided by the Mixed Methods Appraisal Tool, an interdisciplinary e-Delphi panel assessed item sufficiency, clarity, coherence, and relevance. Statistical analyses included the item-level (I-CVI) and scale-level (S-CVI/Ave) content validity indices, average agreement between experts (AABE), Fleiss’ kappa (κ), and Aiken’s V coefficient (V) (p < 0.05). Cognitive interviews were conducted with postpartum lactating participants representing diverse characteristics to assess interpretability. The initial version of the instrument included 135 items across nine sections addressing general demographics, education, employment, home environment, lifestyle, social support, healthcare access, stress, intimate partner violence, insomnia, and nutrition. Based on expert input, it was refined to 131 items through structural and lexical revisions. Content validity indices indicated strong agreement: I-CVI ranged from 0.66–1.00, S-CVI/Ave > 0.95, AABE > 14.26, and κ and V > 0.90. Final adjustments following cognitive interviews led to a 128-item version optimized for clarity and relevance. This instrument offers strong content validity for SDH assessment in postpartum lactating people and supports sustainable use in health research. Full article
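The content validity indices listed in this abstract are simple functions of expert ratings. The sketch below assumes a 4-point relevance scale on which ratings of 3 or 4 count as "relevant", a convention often used for I-CVI; the abstract does not state the scale, so this is an assumption, and the function names and example ratings are hypothetical.

```python
from typing import Sequence


def item_cvi(ratings: Sequence[int], relevant_from: int = 3) -> float:
    """I-CVI: share of experts rating the item at or above relevant_from."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return sum(r >= relevant_from for r in ratings) / len(ratings)


def scale_cvi_ave(items: Sequence[Sequence[int]], relevant_from: int = 3) -> float:
    """S-CVI/Ave: mean of the item-level indices across all items."""
    if not items:
        raise ValueError("at least one item is required")
    return sum(item_cvi(r, relevant_from) for r in items) / len(items)


def aiken_v(ratings: Sequence[int], lowest: int = 1, highest: int = 4) -> float:
    """Aiken's V: summed distance from the lowest score, scaled to [0, 1]."""
    if not ratings or highest <= lowest:
        raise ValueError("need ratings and a scale with highest > lowest")
    return sum(r - lowest for r in ratings) / (len(ratings) * (highest - lowest))


if __name__ == "__main__":
    # Hypothetical ratings from four experts for two items.
    panel = [[4, 4, 3, 2], [4, 4, 4, 4]]
    print("I-CVI:", [item_cvi(r) for r in panel])
    print("S-CVI/Ave:", scale_cvi_ave(panel))
    print("Aiken's V:", [aiken_v(r) for r in panel])
```

I-CVI dichotomizes the ratings while Aiken's V uses the full scale, so reporting both, as the abstract does, gives two complementary views of the same expert judgments.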

15 pages, 2479 KB  
Article
Inter- and Intraobserver Variability in Bowel Preparation Scoring for Colon Capsule Endoscopy: Impact of AI-Assisted Assessment Feasibility Study
by Ian Io Lei, Daniel R. Gaya, Alexander Robertson, Benedicte Schelde-Olesen, Alice Mapiye, Anirudh Bhandare, Bei Bei Lui, Chander Shekhar, Ursula Valentiner, Pere Gilabert, Pablo Laiz, Santi Segui, Nicholas Parsons, Cristiana Huhulea, Hagen Wenzek, Elizabeth White, Anastasios Koulaouzidis and Ramesh P. Arasaradnam
Cancers 2025, 17(17), 2840; https://doi.org/10.3390/cancers17172840 - 29 Aug 2025
Viewed by 1059
Abstract
Background: Colon capsule endoscopy (CCE) has seen increased adoption since the COVID-19 pandemic, offering a non-invasive alternative for lower gastrointestinal investigations. However, inadequate bowel preparation remains a key limitation, often leading to higher conversion rates to colonoscopy. Manual assessment of bowel cleanliness is inherently subjective and marked by high interobserver variability. Recent advances in artificial intelligence (AI) have enabled automated cleansing scores that not only standardise assessment and reduce variability but also align with the emerging semi-automated AI reading workflow, which highlights only clinically significant frames. As full video review becomes less routine, reliable and consistent cleansing evaluation is essential, positioning bowel preparation AI as a critical enabler of diagnostic accuracy and scalable CCE deployment. Objective: This CESCAIL sub-study aimed to (1) evaluate interobserver agreement in CCE bowel cleansing assessment using two established scoring systems, and (2) determine the impact of AI-assisted scoring, specifically a TransUNet-based segmentation model with a custom Patch Loss function, on both interobserver and intraobserver agreement compared to manual assessment. Methods: As part of the CESCAIL study, twenty-five CCE videos were randomly selected from 673 participants. Nine readers with varying CCE experience scored bowel cleanliness using the Leighton–Rex and CC-CLEAR scales. After a minimum 8-week washout, the same readers reassessed the videos using AI-assisted CC-CLEAR scores. Interobserver variability was evaluated using bootstrapped intraclass correlation coefficients (ICC) and Fleiss’ Kappa; intraobserver variability was assessed with weighted Cohen’s Kappa, paired t-tests, and Two One-Sided Tests (TOSTs). Results: Leighton–Rex showed poor to fair agreement (Fleiss = 0.14; ICC = 0.55), while CC-CLEAR demonstrated fair to excellent agreement (Fleiss = 0.27; ICC = 0.90). AI-assisted CC-CLEAR achieved only moderate agreement overall (Fleiss = 0.27; ICC = 0.69), with weaker performance among less experienced readers (Fleiss = 0.15; ICC = 0.56). Intraobserver agreement was excellent (ICC > 0.75) for experienced readers but variable in others (ICC 0.03–0.80). AI-assisted scores were significantly lower than manual reads by 1.46 points (p < 0.001), potentially increasing conversion to colonoscopy. Conclusions: AI-assisted scoring did not improve interobserver agreement and may even reduce consistency amongst less experienced readers. The maintained agreement observed in experienced readers highlights its current value in experienced hands only. Further refinement, including spatial analysis integration, is needed for robust overall AI implementation in CCE.
(This article belongs to the Section Methods and Technologies Development)
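For readers unfamiliar with the interobserver statistic reported in this abstract, the following is a minimal sketch of Fleiss' kappa for a fixed number of raters per subject. It is not the authors' analysis code: the function name, the input layout (one list of category labels per video), and the sample data are assumptions chosen for illustration.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for nominal labels.

    ratings: one list per subject, each holding the labels assigned by the
    raters. Every subject must have the same number of ratings (at least 2).
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("each subject needs the same number (>= 2) of ratings")

    counts = [Counter(row) for row in ratings]
    categories = sorted({label for row in ratings for label in row})

    # Observed agreement: mean over subjects of the share of agreeing rater pairs.
    per_subject = [
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_subject) / n_subjects

    # Chance agreement from the overall proportion of each category.
    total = n_subjects * n_raters
    p_expected = sum((sum(c[k] for c in counts) / total) ** 2 for k in categories)

    if p_expected == 1.0:
        return float("nan")  # only one category was ever used; kappa is undefined
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical data: three readers grading two videos as "adequate" or "poor".
print(fleiss_kappa([["adequate", "adequate", "poor"], ["adequate", "poor", "poor"]]))
```

Because this statistic treats the categories as unordered, it can diverge from an ICC computed on the same ordinal scores, which is one possible reason the two measures differ in the results above.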

38 pages, 4944 KB  
Article
Integrated Survey Classification and Trend Analysis via LLMs: An Ensemble Approach for Robust Literature Synthesis
by Eleonora Bernasconi, Domenico Redavid and Stefano Ferilli
Electronics 2025, 14(17), 3404; https://doi.org/10.3390/electronics14173404 - 27 Aug 2025
Cited by 1 | Viewed by 1440
Abstract
This study proposes a novel, scalable framework for the automated classification and synthesis of survey literature by integrating state-of-the-art Large Language Models (LLMs) with robust ensemble voting techniques. The framework consolidates predictions from three independent models (GPT-4, LLaMA 3.3, and Claude 3) to generate consensus-based classifications, thereby enhancing reliability and mitigating individual model biases. We demonstrate the generalizability of our approach through comprehensive evaluation on two distinct domains: Question Answering (QA) systems and Computer Vision (CV) survey literature, using a dataset of 1154 real papers extracted from arXiv. Comprehensive visual evaluation tools, including distribution charts, heatmaps, confusion matrices, and statistical validation metrics, are employed to rigorously assess model performance and inter-model agreement. The framework incorporates advanced statistical measures, including k-fold cross-validation, Fleiss’ kappa for inter-rater reliability, and chi-square tests for independence to validate classification robustness. Extensive experimental evaluations demonstrate that this ensemble approach achieves superior performance compared to individual models, with accuracy improvements of 10.0% over the best single model on QA literature and 10.9% on CV literature. Furthermore, comprehensive cost–benefit analysis reveals that our automated approach reduces manual literature synthesis time by 95% while maintaining high classification accuracy (F1-score: 0.89 for QA, 0.87 for CV), making it a practical solution for large-scale literature analysis. The methodology effectively uncovers emerging research trends and persistent challenges across domains, providing researchers with powerful tools for continuous literature monitoring and informed decision-making in rapidly evolving scientific fields.
(This article belongs to the Special Issue Knowledge Engineering and Data Mining, 3rd Edition)
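The consensus step described in this abstract can be pictured with a simple strict-majority vote over the labels returned by the three models. The abstract does not specify the voting rule or how disagreements are handled, so the rule below, the `fallback` parameter, and the example labels are assumptions rather than the authors' method.

```python
from collections import Counter


def majority_vote(predictions, fallback=None):
    """Return the label chosen by more than half of the models, else fallback.

    predictions: one label per model for a single paper.
    fallback: value returned when no label reaches a strict majority
    (hypothetical tie-handling; the abstract does not describe one).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    label, votes = Counter(predictions).most_common(1)[0]
    return label if votes > len(predictions) / 2 else fallback


# Hypothetical labels from three models for two papers.
print(majority_vote(["retrieval", "retrieval", "generation"]))
print(majority_vote(["retrieval", "generation", "evaluation"], fallback="needs review"))
```

Returning an explicit fallback instead of an arbitrary winner keeps three-way disagreements visible, which is one way such cases could be routed to manual review.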

11 pages, 2637 KB  
Article
AI Enhances Lung Ultrasound Interpretation Across Clinicians with Varying Expertise Levels
by Seyed Ehsan Seyed Bolouri, Masood Dehghan, Mahdiar Nekoui, Brian Buchanan, Jacob L. Jaremko, Dornoosh Zonoobi, Arun Nagdev and Jeevesh Kapur
Diagnostics 2025, 15(17), 2145; https://doi.org/10.3390/diagnostics15172145 - 25 Aug 2025
Cited by 1 | Viewed by 1858
Abstract
Background/Objective: Lung ultrasound (LUS) is a valuable tool for detecting pulmonary conditions, but its accuracy depends on user expertise. This study evaluated whether an artificial intelligence (AI) tool could improve clinician performance in detecting pleural effusion and consolidation/atelectasis on LUS scans. Methods: In this multi-reader, multi-case study, 14 clinicians of varying experience reviewed 374 retrospectively selected LUS scans (cine clips from the PLAPS point, obtained using three different probes) from 359 patients across six centers in the U.S. and Canada. In phase one, readers scored the likelihood (0–100) of pleural effusion and consolidation/atelectasis without AI. After a 4-week washout, they re-evaluated all scans with AI-generated bounding boxes. Performance metrics included area under the curve (AUC), sensitivity, specificity, and Fleiss’ Kappa. Subgroup analyses examined effects by reader experience. Results: For pleural effusion, AUC improved from 0.917 to 0.960, sensitivity from 77.3% to 89.1%, and specificity from 91.7% to 92.9%. Fleiss’ Kappa increased from 0.612 to 0.774. For consolidation/atelectasis, AUC rose from 0.870 to 0.941, sensitivity from 70.7% to 89.2%, and specificity from 85.8% to 89.5%. Kappa improved from 0.427 to 0.756. Conclusions: AI assistance enhanced clinician detection of pleural effusion and consolidation/atelectasis in LUS scans, particularly benefiting less experienced users.
