J. Imaging, Volume 10, Issue 5 (May 2024) – 14 articles

  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive the tables of contents of newly released issues.
  • Papers are published in both HTML and PDF forms; PDF is the official format. To view a paper in PDF format, click on its "PDF Full-text" link and open it with the free Adobe Reader.
18 pages, 5382 KiB  
Article
Reliable Out-of-Distribution Recognition of Synthetic Images
by Anatol Maier and Christian Riess
J. Imaging 2024, 10(5), 110; https://doi.org/10.3390/jimaging10050110 - 1 May 2024
Viewed by 327
Abstract
Generative adversarial networks (GANs) and diffusion models (DMs) have revolutionized the creation of synthetically generated but realistic-looking images. Distinguishing such generated images from real camera captures is one of the key tasks in current multimedia forensics research. One particular challenge is the generalization to unseen generators or post-processing. This can be viewed as an issue of handling out-of-distribution inputs. Forensic detectors can be hardened by the extensive augmentation of the training data or specifically tailored networks. Nevertheless, such precautions only manage but do not remove the risk of prediction failures on inputs that look reasonable to an analyst but in fact are out of the training distribution of the network. With this work, we aim to close this gap with a Bayesian Neural Network (BNN) that provides an additional uncertainty measure to warn an analyst of difficult decisions. More specifically, the BNN learns the task at hand and also detects potential confusion between post-processing and image generator artifacts. Our experiments show that the BNN achieves on-par performance with the state-of-the-art detectors while producing more reliable predictions on out-of-distribution examples. Full article
(This article belongs to the Special Issue Robust Deep Learning Techniques for Multimedia Forensics and Security)
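As a rough illustration of the kind of uncertainty measure this abstract describes (not the authors' BNN implementation), Monte Carlo dropout in PyTorch can flag out-of-distribution inputs via predictive entropy; the model, sample count, and naming below are assumptions:

```python
# Hypothetical sketch: Monte Carlo dropout as a stand-in for a Bayesian
# neural network's predictive uncertainty; the paper's BNN differs.
import torch

def predict_with_uncertainty(model: torch.nn.Module,
                             x: torch.Tensor,
                             n_samples: int = 30):
    """Average n stochastic forward passes; return mean probs and entropy."""
    model.train()  # keep dropout layers active at inference time
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(n_samples)]
        )
    mean_probs = probs.mean(dim=0)
    # Predictive entropy: high values warn the analyst of unreliable inputs.
    entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=-1)
    return mean_probs, entropy
```

An analyst would then treat predictions whose entropy exceeds a calibration threshold as "difficult decisions" rather than trusting the class label.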
25 pages, 1456 KiB  
Article
Skin Tone Estimation under Diverse Lighting Conditions
by Success K. Mbatha, Marthinus J. Booysen and Rensu P. Theart
J. Imaging 2024, 10(5), 109; https://doi.org/10.3390/jimaging10050109 - 30 Apr 2024
Viewed by 295
Abstract
Knowledge of a person’s level of skin pigmentation, or so-called “skin tone”, has proven to be an important building block in improving the performance and fairness of various applications that rely on computer vision. These include medical diagnosis of skin conditions, cosmetic and skincare support, and face recognition, especially for darker skin tones. However, the perception of skin tone, whether by the human eye or by an optoelectronic sensor, relies on the reflection of light from the skin. The source of this light, or illumination, affects the skin tone that is perceived. This study aims to refine and assess a convolutional neural network-based skin tone estimation model that provides consistent accuracy across different skin tones under various lighting conditions. The 10-point Monk Skin Tone Scale was used to represent the skin tone spectrum. A dataset of 21,375 images was captured from volunteers across the pigmentation spectrum. Experimental results show that a regression model outperforms other models, with an estimated-to-target distance of 0.5. Using a threshold estimated-to-target skin tone distance of 2 for all lights results in average accuracy values of 85.45% and 97.16%. With the Monk Skin Tone Scale segmented into three groups, the lighter group exhibits strong accuracy, the middle group displays lower accuracy, and the dark group falls between the two. The overall skin tone estimation achieves an average error distance in the LAB space of 16.40 ± 20.62. Full article
(This article belongs to the Section Image and Video Processing)
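A minimal sketch of the threshold-based accuracy metric the abstract refers to, assuming estimates and targets on the 10-point Monk scale (the function name and example values are hypothetical):

```python
import numpy as np

def accuracy_at_distance(estimated, target, threshold=2.0):
    """Fraction of estimates within `threshold` Monk-scale steps of the target."""
    estimated = np.asarray(estimated, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean(np.abs(estimated - target) <= threshold))

# Example: regression outputs on the 10-point Monk Skin Tone Scale.
print(accuracy_at_distance([3.4, 7.9, 1.2], [3, 6, 2]))  # all within 2 -> 1.0
```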
21 pages, 12347 KiB  
Article
Optimizing Vision Transformers for Histopathology: Pretraining and Normalization in Breast Cancer Classification
by Giulia Lucrezia Baroni, Laura Rasotto, Kevin Roitero, Angelica Tulisso, Carla Di Loreto and Vincenzo Della Mea
J. Imaging 2024, 10(5), 108; https://doi.org/10.3390/jimaging10050108 - 30 Apr 2024
Viewed by 269
Abstract
This paper introduces a self-attention Vision Transformer model specifically developed for classifying breast cancer in histology images. We examine various training strategies and configurations, including pretraining, dimension resizing, data augmentation and color normalization strategies, patch overlap, and patch size configurations, in order to evaluate their impact on the effectiveness of the histology image classification. Additionally, we provide evidence of the gains in effectiveness obtained through geometric and color data augmentation techniques. We primarily utilize the BACH dataset to train and validate our methods and models, but we also test them on two additional datasets, BRACS and AIDPATH, to verify their generalization capabilities. Our model, developed from a transformer pretrained on ImageNet, achieves an accuracy rate of 0.91 on the BACH dataset, 0.74 on the BRACS dataset, and 0.92 on the AIDPATH dataset. Using models based on the prostate-small and prostate-medium HistoEncoder models, we achieve accuracy rates of 0.89 and 0.86, respectively. Our results suggest that pretraining on large-scale general datasets like ImageNet is advantageous. We also show the potential benefits of using domain-specific pretraining datasets, such as the extensive histopathological image collections used by HistoEncoder, although clear advantages have not yet emerged. Full article
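The patch size and overlap configurations mentioned above can be illustrated with a simple tiling routine; this is a generic sketch, not the paper's pipeline, and the 224-pixel patch and 50% overlap defaults are assumptions:

```python
import numpy as np

def extract_patches(image: np.ndarray, patch: int = 224, overlap: float = 0.5):
    """Tile an HxWxC image (at least patch-sized) into overlapping squares."""
    stride = max(1, int(patch * (1.0 - overlap)))
    h, w = image.shape[:2]
    patches = []
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            patches.append(image[y:y + patch, x:x + patch])
    return np.stack(patches)  # N x patch x patch x C
```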
17 pages, 3251 KiB  
Article
Artificial Intelligence, Intrapartum Ultrasound and Dystocic Delivery: AIDA (Artificial Intelligence Dystocia Algorithm), a Promising Helping Decision Support System
by Antonio Malvasi, Lorenzo E. Malgieri, Ettore Cicinelli, Antonella Vimercati, Antonio D’Amato, Miriam Dellino, Giuseppe Trojano, Tommaso Difonzo, Renata Beck and Andrea Tinelli
J. Imaging 2024, 10(5), 107; https://doi.org/10.3390/jimaging10050107 - 29 Apr 2024
Viewed by 547
Abstract
The position of the fetal head during engagement and progression in the birth canal is the primary cause of dystocic labor and arrest of progression, often due to malposition and malrotation. The authors investigated pregnant women in labor, all of whom underwent vaginal digital examination by obstetricians and midwives as well as intrapartum ultrasonography to collect four “geometric parameters”, measured in all the women. All parameters were evaluated using artificial intelligence and machine learning algorithms, collectively called AIDA (artificial intelligence dystocia algorithm), which incorporates a human-in-the-loop approach, that is, AI (artificial intelligence) algorithms that prioritize the physician’s decision, together with explainable artificial intelligence (XAI). AIDA was structured into five classes. Once the “geometric parameters” were collected, the data obtained from the AIDA analysis were assigned to a red, yellow, or green zone, linked to the analysis of the progress of labor. Using the AIDA analysis, we were able to identify five reference classes for patients in labor, each associated with a particular type of birth outcome. In two of these five classes, cesarean birth was predicted with 100% accuracy. The use of artificial intelligence, through the evaluation of certain obstetric parameters in specific decision-making algorithms, allows physicians to systematically understand how the results of the algorithms can be explained. This approach can be useful in evaluating the progress of labor and predicting the labor outcome: spontaneous delivery, an attempt at operative VD (vaginal delivery), or ICD (intrapartum cesarean delivery) where preferable or necessary. Full article
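Purely as an illustration of the class-to-zone structure described above, a minimal mapping might look like the following; the actual class definitions, thresholds, and zone assignments come from the paper's analysis, so everything here is hypothetical:

```python
# Hypothetical AIDA-style mapping from risk class to a traffic-light zone;
# the real class definitions derive from the four "geometric parameters".
AIDA_ZONES = {0: "green", 1: "green", 2: "yellow", 3: "red", 4: "red"}

def zone_for_class(aida_class: int) -> str:
    """Return the traffic-light zone for an AIDA class (0-4)."""
    return AIDA_ZONES.get(aida_class, "unknown")
```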
14 pages, 5855 KiB  
Article
Rock Slope Stability Analysis Using Terrestrial Photogrammetry and Virtual Reality on Ignimbritic Deposits
by Tania Peralta, Melanie Menoscal, Gianella Bravo, Victoria Rosado, Valeria Vaca, Diego Capa, Maurizio Mulas and Luis Jordá-Bordehore
J. Imaging 2024, 10(5), 106; https://doi.org/10.3390/jimaging10050106 - 28 Apr 2024
Viewed by 281
Abstract
Puerto de Cajas serves as a vital high-altitude passage in Ecuador, connecting the coastal region to the city of Cuenca. The stability of this rocky massif is carefully managed through the assessment of blocks and discontinuities, ensuring safe travel. This study presents a novel approach, employing rapid and cost-effective methods to evaluate an unexplored area within the protected expanse of Cajas. Using terrestrial photogrammetry and strategically positioned geomechanical stations along the slopes, we digitalized the slope and generated a detailed point cloud capturing elusive terrain features. The collected data were validated by comparing directional data from the CloudCompare software with manual readings taken at control points using a digital compass integrated into a phone. The analysis encompasses three slopes, employing the SMR, Q-slope, and kinematic methodologies. Results from the SMR system closely align with the kinematic analysis, indicating satisfactory slope quality. Nonetheless, continued vigilance in stability control remains imperative for ensuring road safety and preserving the site’s integrity. Moreover, this research lays the groundwork for the creation of a publicly accessible 3D repository, enhancing visualization capabilities through Google Virtual Reality. This initiative not only aids in replicating the findings but also facilitates access to an augmented reality environment, thereby fostering collaborative research endeavors. Full article
(This article belongs to the Special Issue Exploring Challenges and Innovations in 3D Point Cloud Processing)
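For readers unfamiliar with the directional data being compared (point-cloud plane orientations versus compass readings), a standard conversion from a fitted plane normal to dip and dip direction looks like this; a generic geometric sketch, not the authors' workflow:

```python
import numpy as np

def dip_and_dip_direction(normal):
    """Convert a discontinuity plane normal (x=East, y=North, z=up)
    to dip and dip direction in degrees, as read off a plane fit."""
    n = np.asarray(normal, dtype=float)
    n /= np.linalg.norm(n)
    if n[2] < 0:  # force the normal to point upward
        n = -n
    dip = np.degrees(np.arccos(n[2]))
    dip_direction = (np.degrees(np.arctan2(n[0], n[1])) + 360.0) % 360.0
    return dip, dip_direction

print(dip_and_dip_direction([0.3, 0.1, 0.95]))  # gentle dip toward ENE
```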
25 pages, 9712 KiB  
Article
Comparative Analysis of Color Space and Channel, Detector, and Descriptor for Feature-Based Image Registration
by Wenan Yuan, Sai Raghavendra Prasad Poosa and Rutger Francisco Dirks
J. Imaging 2024, 10(5), 105; https://doi.org/10.3390/jimaging10050105 - 28 Apr 2024
Viewed by 282
Abstract
The current study aimed to quantify the value of color spaces and channels as a potential superior replacement for standard grayscale images, as well as the relative performance of open-source detectors and descriptors for general feature-based image registration purposes, based on a large benchmark dataset. The public dataset UDIS-D, with 1106 diverse image pairs, was selected. In total, 21 color spaces or channels including RGB, XYZ, Y′CrCb, HLS, L*a*b* and their corresponding channels in addition to grayscale, nine feature detectors including AKAZE, BRISK, CSE, FAST, HL, KAZE, ORB, SIFT, and TBMR, and 11 feature descriptors including AKAZE, BB, BRIEF, BRISK, DAISY, FREAK, KAZE, LATCH, ORB, SIFT, and VGG were evaluated according to reprojection error (RE), root mean square error (RMSE), structural similarity index measure (SSIM), registration failure rate, and feature number, based on 1,950,984 image registrations. No meaningful benefits from color space or channel were observed, although XYZ, RGB color space and L* color channel were able to outperform grayscale by a very minor margin. Per the dataset, the best-performing color space or channel, detector, and descriptor were XYZ/RGB, SIFT/FAST, and AKAZE. The most robust color space or channel, detector, and descriptor were L*a*b*, TBMR, and VGG. The color channel, detector, and descriptor with the most initial detector features and final homography features were Z/L*, FAST, and KAZE. In terms of the best overall unfailing combinations, XYZ/RGB+SIFT/FAST+VGG/SIFT seemed to provide the highest image registration quality, while Z+FAST+VGG provided the most image features. Full article
(This article belongs to the Special Issue Image Processing and Computer Vision: Algorithms and Applications)
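One registration trial in the style of this benchmark can be sketched with OpenCV as below; the SIFT detector, Lowe-ratio threshold, and RANSAC settings are illustrative choices, not the study's exact protocol:

```python
import cv2
import numpy as np

def register_pair(img1, img2):
    """Detect, describe, match, fit a homography, and score reprojection error."""
    gray1, gray2 = (cv2.cvtColor(i, cv2.COLOR_BGR2GRAY) for i in (img1, img2))
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(gray1, None)
    kp2, des2 = sift.detectAndCompute(gray2, None)
    matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des1, des2, k=2)
    good = [p[0] for p in matches
            if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]  # Lowe ratio
    if len(good) < 4:
        return None  # count as a registration failure
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is None:
        return None
    proj = cv2.perspectiveTransform(src, H)  # reproject source keypoints
    err = np.linalg.norm((proj - dst).reshape(-1, 2), axis=1)
    return H, float(err[inliers.ravel() == 1].mean())  # mean reprojection error
```

Swapping `cv2.SIFT_create()` for other detector/descriptor constructors, and converting the inputs to different color spaces or channels beforehand, reproduces the shape of the study's comparison grid.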
12 pages, 5232 KiB  
Article
Vertebral and Femoral Bone Mineral Density (BMD) Assessment with Dual-Energy CT versus DXA Scan in Postmenopausal Females
by Luca Pio Stoppino, Stefano Piscone, Sara Saccone, Saul Alberto Ciccarelli, Luca Marinelli, Paola Milillo, Crescenzio Gallo, Luca Macarini and Roberta Vinci
J. Imaging 2024, 10(5), 104; https://doi.org/10.3390/jimaging10050104 - 27 Apr 2024
Viewed by 362
Abstract
This study aimed to demonstrate the potential role of dual-energy CT (DECT) in assessing bone mineral density (BMD) using hydroxyapatite–fat (HAP–fat) material pairing in postmenopausal women. A retrospective study was conducted on 51 postmenopausal female patients who underwent DXA and DECT examinations for other clinical reasons. DECT images were acquired with spectral imaging using a 256-slice system. These images were processed and visualized using a HAP–fat material pair. Statistical analysis was performed using the Bland–Altman method to assess the agreement between DXA and DECT HAP–fat measurements. Mean BMD and vertebral and femoral T-scores were obtained. For the vertebral analysis, the Bland–Altman plot showed an inverse correlation (R²: −0.042; RMSE: 0.690) between T-scores and DECT HAP–fat values for measurements from L1 to L4, while a good linear correlation (R²: 0.341; RMSE: 0.589) was found for measurements at the femoral neck. In conclusion, we demonstrate the value of BMD calculation through DECT, finding a statistically significant correlation only at the femoral neck, where BMD results do not seem to be influenced by the overlap of measurements on cortical and trabecular bone. This outcome could be beneficial in the future by reducing radiation exposure for patients already undergoing follow-up for chronic conditions. Full article
(This article belongs to the Section Medical Imaging)
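The Bland–Altman agreement method used here is straightforward to compute; a minimal sketch, assuming two paired measurement series such as DXA T-scores and DECT HAP–fat values:

```python
import numpy as np

def bland_altman(a, b):
    """Return the mean difference (bias) and 95% limits of agreement
    between two paired measurement series."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    diff = a - b
    bias = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)  # 95% limits of agreement
    return bias, (bias - half_width, bias + half_width)
```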
15 pages, 3624 KiB  
Article
A Multi-Modal Foundation Model to Assist People with Blindness and Low Vision in Environmental Interaction
by Yu Hao, Fan Yang, Hao Huang, Shuaihang Yuan, Sundeep Rangan, John-Ross Rizzo, Yao Wang and Yi Fang
J. Imaging 2024, 10(5), 103; https://doi.org/10.3390/jimaging10050103 - 26 Apr 2024
Viewed by 474
Abstract
People with blindness and low vision (pBLV) encounter substantial challenges when it comes to comprehensive scene recognition and precise object identification in unfamiliar environments. Additionally, due to vision loss, pBLV have difficulty in accessing and identifying potential tripping hazards independently. Previous assistive technologies for the visually impaired often struggle in real-world scenarios due to the need for constant training and a lack of robustness, which limits their effectiveness, especially in dynamic and unfamiliar environments, where accurate and efficient perception is crucial. We therefore frame the research question of this paper as follows: how can we assist pBLV in recognizing scenes, identifying objects, and detecting potential tripping hazards in unfamiliar environments, where existing assistive technologies often falter due to their lack of robustness? We hypothesize that by leveraging large pretrained foundation models and prompt engineering, we can create a system that effectively addresses the challenges faced by pBLV in unfamiliar environments. Large pretrained foundation models are increasingly prevalent, particularly in assistive robotics, because of the accurate perception and robust contextual understanding that extensive pretraining induces in real-world scenarios. Motivated by this, we present a pioneering approach that leverages foundation models to enhance visual perception for pBLV, offering detailed and comprehensive descriptions of the surrounding environment and providing warnings about potential risks. Specifically, our method begins by leveraging a large image-tagging model (i.e., the Recognize Anything Model (RAM)) to identify all common objects present in the captured images. The recognition results and the user query are then integrated into a prompt tailored specifically for pBLV using prompt engineering. By combining the prompt and the input image, a vision-language foundation model (i.e., InstructBLIP) generates detailed and comprehensive descriptions of the environment and identifies potential risks by analyzing environmental objects and scenic landmarks relevant to the prompt. We evaluate our approach through experiments conducted on both indoor and outdoor datasets. Our results demonstrate that our method can recognize objects accurately and provide insightful descriptions and analysis of the environment for pBLV. Full article
(This article belongs to the Special Issue Image and Video Processing for Blind and Visually Impaired)
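The prompt-engineering step described above (integrating recognition results and the user query into a pBLV-tailored prompt) might be sketched as follows; the template wording and the `run_image_tagger` helper are hypothetical, not the authors' exact prompt:

```python
def build_pblv_prompt(tags: list[str], user_query: str) -> str:
    """Combine image tags and the user's question into one instruction prompt."""
    tag_list = ", ".join(sorted(set(tags)))
    return (
        "You are assisting a person with blindness or low vision. "
        f"Objects detected in the scene: {tag_list}. "
        f"Question: {user_query} "
        "Describe the surroundings in detail and explicitly warn about "
        "potential tripping hazards."
    )

# prompt = build_pblv_prompt(run_image_tagger(image), "Is the path ahead clear?")
# The prompt and the image are then passed to a vision-language model
# (the paper uses InstructBLIP) to generate the description.
```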
15 pages, 3252 KiB  
Article
Precision Agriculture: Computer Vision-Enabled Sugarcane Plant Counting in the Tillering Phase
by Muhammad Talha Ubaid and Sameena Javaid
J. Imaging 2024, 10(5), 102; https://doi.org/10.3390/jimaging10050102 - 26 Apr 2024
Viewed by 271
Abstract
Sugarcane is the world’s most significant crop by production quantity. It is the primary source for sugar, ethanol, chipboards, paper, barrages, and confectionery, and many people around the globe are involved in sugarcane production and its products. Sugarcane industries contract with farmers before the tillering phase of the plants and are keen to obtain a pre-harvest estimate of the sugarcane field for planning their production and purchases. The contribution of the proposed research is twofold: we publish our newly developed dataset and present a methodology to estimate the number of sugarcane plants in the tillering phase. The dataset was obtained from sugarcane fields in the fall season. In this work, a modified architecture of Faster R-CNN with feature extraction using VGG-16 with Inception-v3 modules and a sigmoid threshold function is proposed for the detection and classification of sugarcane plants. Significantly promising results, with 82.10% accuracy, have been obtained with the proposed architecture, showing the viability of the developed methodology. Full article
(This article belongs to the Special Issue Imaging Applications in Agriculture)
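A generic detect-and-count loop conveys the idea of plant counting by detection; this sketch uses an off-the-shelf torchvision Faster R-CNN rather than the paper's modified VGG-16/Inception-v3 architecture, and the 0.5 score threshold is an assumption:

```python
import torch
import torchvision

# Pretrained general-purpose detector; a real system would be fine-tuned
# on sugarcane imagery such as the dataset published with the paper.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def count_plants(image: torch.Tensor, score_threshold: float = 0.5) -> int:
    """image: CxHxW float tensor in [0, 1]; returns detections above threshold."""
    with torch.no_grad():
        out = model([image])[0]
    return int((out["scores"] >= score_threshold).sum())
```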
18 pages, 8706 KiB  
Article
Separation and Analysis of Connected, Micrometer-Sized, High-Frequency Damage on Glass Plates due to Laser-Accelerated Material Fragments in Vacuum
by Sabrina Pietzsch, Sebastian Wollny and Paul Grimm
J. Imaging 2024, 10(5), 101; https://doi.org/10.3390/jimaging10050101 - 26 Apr 2024
Viewed by 344
Abstract
In this paper, we present a new processing method, called MOSES—Impacts, for the detection of micrometer-sized damage on glass plate surfaces. It extends existing methods by separating damaged areas, called impacts, to support state-of-the-art recycling systems in optimizing their parameters. These recycling systems are used to repair process-related damage on glass plate surfaces caused by accelerated material fragments, which arise during laser–matter interaction in a vacuum. Due to the high number of impacts, the presented MOSES—Impacts algorithm focuses on the separation of connected impacts in two-dimensional images. This separation is crucial for the extraction of relevant features such as the centers of gravity and radii of impacts, which are used as recycling parameters. The results show that the MOSES—Impacts algorithm effectively separates impacts, achieves a mean agreement with human users of (82.0 ± 2.0)%, and improves the recycling of glass plate surfaces by identifying around 7% of the glass plate surface area as not in need of repair compared to existing methods. Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
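Separating connected damage regions and extracting centers of gravity and radii is commonly done with a distance transform plus watershed; the sketch below shows that general technique with scikit-image, not the MOSES—Impacts algorithm itself, and the peak spacing is an assumption:

```python
import numpy as np
from scipy import ndimage
from skimage.feature import peak_local_max
from skimage.measure import regionprops
from skimage.segmentation import watershed

def separate_impacts(binary_mask: np.ndarray):
    """binary_mask: boolean image, True where the surface is damaged."""
    distance = ndimage.distance_transform_edt(binary_mask)
    peaks = peak_local_max(distance, min_distance=5,
                           labels=binary_mask.astype(int))
    markers = np.zeros(binary_mask.shape, dtype=int)
    markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)
    labels = watershed(-distance, markers, mask=binary_mask)
    # Center of gravity and equivalent radius per separated impact.
    return [(r.centroid, r.equivalent_diameter / 2.0)
            for r in regionprops(labels)]
```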
22 pages, 4136 KiB  
Article
DepthCrackNet: A Deep Learning Model for Automatic Pavement Crack Detection
by Alireza Saberironaghi and Jing Ren
J. Imaging 2024, 10(5), 100; https://doi.org/10.3390/jimaging10050100 - 26 Apr 2024
Viewed by 301
Abstract
Detecting cracks in the pavement is a vital component of ensuring road safety. Since manual identification of these cracks can be time-consuming, an automated method is needed to speed up this process. However, creating such a system is challenging due to factors including crack variability, variations in pavement materials, and the occurrence of miscellaneous objects and anomalies on the pavement. Motivated by the latest progress in deep learning applied to computer vision, we propose an effective U-Net-shaped model named DepthCrackNet. Our model employs the Double Convolution Encoder (DCE), composed of a sequence of convolution layers, for robust feature extraction while keeping parameters optimally efficient. We have incorporated the TriInput Multi-Head Spatial Attention (TMSA) module into our model; in this module, each head operates independently, capturing various spatial relationships and boosting the extraction of rich contextual information. Furthermore, DepthCrackNet employs the Spatial Depth Enhancer (SDE) module, specifically designed to augment the feature extraction capabilities of our segmentation model. The performance of the DepthCrackNet was evaluated on two public crack datasets: Crack500 and DeepCrack. In our experimental studies, the network achieved mIoU scores of 77.0% and 83.9% with the Crack500 and DeepCrack datasets, respectively. Full article
(This article belongs to the Special Issue Deep Learning in Image Analysis: Progress and Challenges)
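The mIoU metric used to score DepthCrackNet can be computed as follows; a minimal sketch, assuming integer-labeled prediction and ground-truth masks with background and crack classes:

```python
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int = 2) -> float:
    """Mean intersection-over-union across classes present in the union."""
    ious = []
    for c in range(num_classes):
        intersection = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(intersection / union)
    return float(np.mean(ious))
```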
11 pages, 726 KiB  
Article
Radiological Comparison of Canal Fill between Collared and Non-Collared Femoral Stems: A Two-Year Follow-Up after Total Hip Arthroplasty
by Itay Ashkenazi, Amit Benady, Shlomi Ben Zaken, Shai Factor, Mohamed Abadi, Ittai Shichman, Samuel Morgan, Aviram Gold, Nimrod Snir and Yaniv Warschawski
J. Imaging 2024, 10(5), 99; https://doi.org/10.3390/jimaging10050099 - 25 Apr 2024
Viewed by 281
Abstract
Collared femoral stems in total hip arthroplasty (THA) offer reduced subsidence and fewer periprosthetic fractures but raise concerns about fit accuracy and stem sizing. This study compares collared and non-collared stems to assess the stem–canal fill ratio (CFR) and fixation indicators, aiming to guide implant selection and enhance THA outcomes. This retrospective single-center study examined primary THA patients who received Corail cementless stems between August 2015 and October 2020, with a minimum of two years of radiological follow-up. The study compared preoperative bone quality assessments, including the Dorr classification, the canal flare index (CFI), the morphological cortical index (MCI), and the canal bone ratio (CBR), as well as postoperative radiographic evaluations, such as the CFR and component fixation, between patients who received a collared or a non-collared femoral stem. The study analyzed 202 THAs, with 103 in the collared cohort and 99 in the non-collared cohort. Patient demographics showed differences in age (p = 0.02) and ASA classification (p = 0.01) but similar preoperative bone quality between groups, as suggested by the Dorr classification (p = 0.15), CFI (p = 0.12), MCI (p = 0.26), and CBR (p = 0.50). At the two-year follow-up, femoral stem CFRs (p = 0.59 and p = 0.27) were comparable between the collared and non-collared cohorts. The subsidence rate was almost double for non-collared patients (19.2% vs. 11.7%), although this difference did not reach statistical significance (p = 0.17). The findings of this study show that both collared and non-collared Corail stems produce comparable outcomes in terms of the CFR and radiographic indicators of stem fixation. These findings reduce concerns about stem under-sizing and micro-motion in collared stems. While this study provides insights into the collar design debate in THA, further research remains necessary. Full article
(This article belongs to the Section Medical Imaging)
16 pages, 2697 KiB  
Article
Derivative-Free Iterative One-Step Reconstruction for Multispectral CT
by Thomas Prohaszka, Lukas Neumann and Markus Haltmeier
J. Imaging 2024, 10(5), 98; https://doi.org/10.3390/jimaging10050098 - 24 Apr 2024
Viewed by 251
Abstract
Image reconstruction in multispectral computed tomography (MSCT) requires solving a challenging nonlinear inverse problem, commonly tackled via iterative optimization algorithms. Existing methods necessitate computing the derivative of the forward map and potentially its regularized inverse. In this work, we present a simple yet highly effective algorithm for MSCT image reconstruction, utilizing iterative update mechanisms that leverage the full forward model in the forward step and a derivative-free adjoint problem. Our approach demonstrates both fast convergence and superior performance compared to existing algorithms, making it an interesting candidate for future work. We also discuss further generalizations of our method and its combination with additional regularization and other data discrepancy terms. Full article
(This article belongs to the Special Issue Image Processing and Computer Vision: Algorithms and Applications)
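To convey the flavor of a derivative-free update (a full nonlinear forward model in the residual, a fixed linear surrogate adjoint in the back-projection), here is a generic fixed-step iteration; this is an illustrative sketch under assumed operators, not the paper's algorithm:

```python
import numpy as np

def derivative_free_iteration(F, A, y, x0, step=1.0, n_iter=100):
    """F: nonlinear spectral forward map, A: linear surrogate operator
    (matrix), y: measured data, x0: initial image estimate."""
    x = x0.copy()
    for _ in range(n_iter):
        residual = y - F(x)              # full nonlinear forward model
        x = x + step * (A.T @ residual)  # adjoint step, no Jacobian of F
    return x
```

The point of such schemes is that no derivative of `F` (or its regularized inverse) ever needs to be formed, which is where iterative MSCT methods usually spend their effort.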
19 pages, 10662 KiB  
Article
SVD-Based Mind-Wandering Prediction from Facial Videos in Online Learning
by Nguy Thi Lan Anh, Nguyen Gia Bach, Nguyen Thi Thanh Tu, Eiji Kamioka and Phan Xuan Tan
J. Imaging 2024, 10(5), 97; https://doi.org/10.3390/jimaging10050097 - 24 Apr 2024
Viewed by 311
Abstract
This paper presents a novel approach to mind-wandering prediction in the context of webcam-based online learning. We implemented a Singular Value Decomposition (SVD)-based 1D temporal eye-signal extraction method, which relies solely on eye landmark detection and eliminates the need for gaze tracking or specialized hardware, and then extracted suitable features from the signals to train the prediction model. Our thorough experimental framework facilitates the evaluation of our approach alongside baseline models, particularly in the analysis of temporal eye signals and the prediction of attentional states. Notably, our SVD-based signal captures both subtle and major eye movements, including changes in the eye boundary and pupil, surpassing the limited capabilities of eye aspect ratio (EAR)-based signals. Our proposed model exhibits a 2% improvement in the overall Area Under the Receiver Operating Characteristic curve (AUROC) metric and a 7% improvement in the F1-score metric for ‘not-focus’ prediction, compared to the combination of EAR-based and computationally intensive gaze-based models used in the baseline study. These contributions have potential implications for enhancing the field of attentional state prediction in online learning, offering a practical and effective solution to benefit educational experiences. Full article
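The SVD-based 1D temporal eye-signal extraction can be illustrated on a matrix of landmark trajectories; the layout of the landmark array below is an assumption, and this sketch is not the authors' exact pipeline:

```python
import numpy as np

def svd_eye_signal(landmarks: np.ndarray) -> np.ndarray:
    """landmarks: T x 2K array of K (x, y) eye landmarks over T frames.
    Returns a 1D temporal signal: the dominant SVD component over time."""
    X = landmarks - landmarks.mean(axis=0)       # center each coordinate
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, 0] * S[0]                        # length-T temporal signal
```

Features computed from this signal (rather than from a raw eye aspect ratio) then feed the attentional-state classifier.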