J. Imaging, Volume 10, Issue 6 (June 2024) – 25 articles

Cover Story: The Alcazar of Seville, a UNESCO World Heritage Site, is home to the Charles V Pavilion, a Renaissance masterpiece with roots dating back to the 12th century. This study employs advanced geomatics and ground-penetrating radar (GPR) techniques, alongside Historic Building Information Modelling (HBIM), to meticulously document and preserve the pavilion. By integrating 3D laser scanning, GNSS, and GPR, we generated a precise model of the pavilion and its subsurface. This comprehensive BIM model enables detailed structural analysis, virtual reconstructions, and interactive public engagement. Our interdisciplinary approach not only aids in conservation but also fosters global collaboration and advances in cultural heritage preservation.
  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive table of contents of newly released issues.
  • PDF is the official format for papers published in both HTML and PDF forms. To view a paper in PDF format, click on the "PDF Full-text" link and open it with the free Adobe Reader.
0 pages, 6555 KiB  
Article
Video-Based Sign Language Recognition via ResNet and LSTM Network
by Jiayu Huang and Varin Chouvatut
J. Imaging 2024, 10(6), 149; https://doi.org/10.3390/jimaging10060149 - 20 Jun 2024
Viewed by 452
Abstract
Sign language recognition technology can help people with hearing impairments to communicate with non-hearing-impaired people. At present, with the rapid development of society, deep learning also provides certain technical support for sign language recognition work. In sign language recognition tasks, traditional convolutional neural networks used to extract spatio-temporal features from sign language videos suffer from insufficient feature extraction, resulting in low recognition rates. Nevertheless, a large number of video-based sign language datasets require a significant amount of computing resources for training while ensuring the generalization of the network, which poses a challenge for recognition. In this paper, we present a video-based sign language recognition method based on Residual Network (ResNet) and Long Short-Term Memory (LSTM). As the number of network layers increases, the ResNet network can effectively solve the granularity explosion problem and obtain better time series features. We use the ResNet convolutional network as the backbone model. LSTM utilizes the concept of gates to control unit states and update the output feature values of sequences. ResNet extracts the sign language features. Then, the learned feature space is used as the input of the LSTM network to obtain long sequence features. It can effectively extract the spatio-temporal features in sign language videos and improve the recognition rate of sign language actions. An extensive experimental evaluation demonstrates the effectiveness and superior performance of the proposed method, with an accuracy of 85.26%, F1-score of 84.98%, and precision of 87.77% on Argentine Sign Language (LSA64). Full article
(This article belongs to the Special Issue Recent Trends in Computer Vision with Neural Networks)

12 pages, 7865 KiB  
Article
Weakly Supervised SVM-Enhanced SAM Pipeline for Stone-by-Stone Segmentation of the Masonry of the Loire Valley Castles
by Stuardo Lucho, Sylvie Treuillet, Xavier Desquesnes, Remy Leconge and Xavier Brunetaud
J. Imaging 2024, 10(6), 148; https://doi.org/10.3390/jimaging10060148 - 19 Jun 2024
Viewed by 467
Abstract
The preservation of historical monuments presents a formidable challenge, particularly in monitoring the deterioration of building materials over time. Château de Chambord’s facade suffers from common issues such as flaking and spalling, and restoration efforts require experts to map stones and joints meticulously by hand. Advancements in computer vision have allowed machine-learning models to help in the automatic segmentation process. In this research, a custom architecture defined as SAM-SVM, based on the Segment Anything Model (SAM) and Support Vector Machines (SVM), is proposed to perform stone segmentation. By exploiting the zero-shot learning capabilities of SAM and its customizable input parameters, we obtain segmentation masks for stones and joints, which are then classified using SVM. Two more SAMs (three in total) are used, depending on how many stones are left to segment. Through extensive experimentation and evaluation, supported by computer vision methods, the proposed architecture achieves a Dice coefficient of 85%. Our results highlight the potential of SAM in cultural heritage conservation, providing a scalable and efficient solution for stone segmentation in historic monuments. This research contributes valuable insights and methodologies to the ongoing conservation efforts of Château de Chambord and could be extrapolated to other monuments.
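As a reading aid for the reported Dice coefficient, the sketch below shows how that score is typically computed for two binary masks. The function name and the toy masks are illustrative assumptions, not taken from the paper.

```python
def dice_coefficient(pred, truth):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two binary masks (lists of 0/1 rows)."""
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")
    # Count pixels set in both masks, then pixels set in each mask.
    intersection = sum(
        1 for p_row, t_row in zip(pred, truth) for p, t in zip(p_row, t_row) if p and t
    )
    total = sum(map(sum, pred)) + sum(map(sum, truth))
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy example (hypothetical data): 2 overlapping pixels, 3 pixels in each mask.
predicted_stone = [[1, 1, 0], [0, 1, 0]]
annotated_stone = [[1, 0, 0], [0, 1, 1]]
score = dice_coefficient(predicted_stone, annotated_stone)  # 2*2 / (3+3)
```

A Dice value of 1 means the predicted and annotated masks coincide; the 85% reported above would correspond to `0.85` on this scale.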

11 pages, 1888 KiB  
Article
Automatic Detection of Post-Operative Clips in Mammography Using a U-Net Convolutional Neural Network
by Tician Schnitzler, Carlotta Ruppert, Patryk Hejduk, Karol Borkowski, Jonas Kajüter, Cristina Rossi, Alexander Ciritsis, Anna Landsmann, Hasan Zaytoun, Andreas Boss, Sebastian Schindera and Felice Burn
J. Imaging 2024, 10(6), 147; https://doi.org/10.3390/jimaging10060147 - 19 Jun 2024
Viewed by 684
Abstract
Background: After breast conserving surgery (BCS), surgical clips indicate the tumor bed and, thereby, the most probable area for tumor relapse. The aim of this study was to investigate whether a U-Net-based deep convolutional neural network (dCNN) may be used to detect surgical clips in follow-up mammograms after BCS. Methods: 884 mammograms and 517 tomosynthetic images depicting surgical clips and calcifications were manually segmented and classified. A U-Net-based segmentation network was trained with 922 images and validated with 394 images. An external test dataset consisting of 39 images was annotated by two radiologists with up to 7 years of experience in breast imaging. The network’s performance was compared to that of human readers using accuracy and interrater agreement (Cohen’s Kappa). Results: The overall classification accuracy on the validation set after 45 epochs ranged between 88.2% and 92.6%, indicating that the model’s performance is comparable to the decisions of a human reader. In 17.4% of cases, calcifications have been misclassified as post-operative clips. The interrater reliability of the model compared to the radiologists showed substantial agreement (κreader1 = 0.72, κreader2 = 0.78) while the readers compared to each other revealed a Cohen’s Kappa of 0.84, thus showing near-perfect agreement. Conclusions: With this study, we show that surgery clips can adequately be identified by an AI technique. A potential application of the proposed technique is patient triage as well as the automatic exclusion of post-operative cases from PGMI (Perfect, Good, Moderate, Inadequate) evaluation, thus improving the quality management workflow. Full article
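For readers unfamiliar with the interrater statistic quoted above, here is a minimal sketch of Cohen's Kappa for two raters. The labels and data are invented for illustration and do not come from the study.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa: agreement between two raters, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each rater labelled at random with their own class rates.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical readings of ten findings by a model and a radiologist.
model_labels = ["clip"] * 5 + ["calc"] * 5
reader_labels = ["clip"] * 4 + ["calc"] + ["calc"] * 4 + ["clip"]
kappa = cohens_kappa(model_labels, reader_labels)  # observed 0.8, expected 0.5
```

In this toy case the raters agree on 8 of 10 items and chance agreement is 0.5, giving a Kappa of about 0.6.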

14 pages, 9575 KiB  
Article
Analysis of Gloss Unevenness and Bidirectional Reflectance Distribution Function in Specular Reflection
by So Nakamura, Shinichi Inoue, Yoshinori Igarashi, Hiromi Sato and Yoko Mizokami
J. Imaging 2024, 10(6), 146; https://doi.org/10.3390/jimaging10060146 - 17 Jun 2024
Viewed by 573
Abstract
Gloss is associated significantly with material appearance, and observers often focus on gloss unevenness. Gloss unevenness is the intensity distribution of reflected light observed within a highlight area, that is, the variability. However, it cannot be analyzed easily because it exists only within the highlight area and varies in appearance across the reflection angles. In recent years, gloss has been analyzed in terms of the intensity of specular reflection and its angular spread, or the bidirectional reflectance distribution function (BRDF). In this study, we develop an apparatus to measure gloss unevenness that can alter the angle with an angular resolution of 0.02°. Additionally, we analyze the gloss unevenness and BRDF in terms of specular reflection. Using a high angular resolution, we measure and analyze high-gloss materials, such as mirrors and plastics, and glossy materials, such as photo-like inkjet paper and coated paper. Our results show that the magnitude of gloss unevenness is the largest at angles marginally off the center of the specular reflection angle. We discuss an approach for physically defining gloss unevenness based on the BRDF. Full article
(This article belongs to the Special Issue Imaging Technologies for Understanding Material Appearance)

15 pages, 4331 KiB  
Article
Comparative Analysis of Micro-Computed Tomography and 3D Micro-Ultrasound for Measurement of the Mouse Aorta
by Hajar A. Alenezi, Karen E. Hemmings, Parkavi Kandavelu, Joanna Koch-Paszkowski and Marc A. Bailey
J. Imaging 2024, 10(6), 145; https://doi.org/10.3390/jimaging10060145 - 17 Jun 2024
Viewed by 571
Abstract
Aortic aneurysms, life-threatening and often undetected until they cause sudden death, occur when the aorta dilates beyond 1.5 times its normal size. This study used ultrasound scans and micro-computed tomography to monitor and measure aortic volume in preclinical settings, comparing it to the well-established measurement using ultrasound scans. The reproducibility of measurements was also examined for intra- and inter-observer variability, with both modalities used on 8-week-old C57BL6 mice. For inter-observer variability, the μCT (micro-computed tomography) measurements for the thoracic, abdominal, and whole aorta between observers were highly consistent, showing a strong positive correlation (R2 = 0.80, 0.80, 0.95, respectively) and no significant variability (p-value: 0.03, 0.03, 0.004, respectively). The intra-observer variability for thoracic, abdominal, and whole aorta scans demonstrated a significant positive correlation (R2 = 0.99, 0.96, 0.87, respectively) and low variability (p-values = 0.0004, 0.002, 0.01, respectively). The comparison between μCT and USS (ultrasound) in the suprarenal and infrarenal aorta showed no significant difference (p-value = 0.20 and 0.21, respectively). μCT provided significantly higher aortic volume measurements compared to USS. The reproducibility of USS and μCT measurements was consistent, showing minimal variance among observers. These findings suggest that μCT is a reliable alternative for comprehensive aortic phenotyping, consistent with clinical findings in human data. Full article
(This article belongs to the Section Medical Imaging)

18 pages, 14623 KiB  
Article
A Binocular Color Line-Scanning Stereo Vision System for Heavy Rail Surface Detection and Correction Method of Motion Distortion
by Chao Wang, Weixi Luo, Menghui Niu, Jiqiang Li and Kechen Song
J. Imaging 2024, 10(6), 144; https://doi.org/10.3390/jimaging10060144 - 13 Jun 2024
Viewed by 587
Abstract
Thanks to the line-scanning camera, the measurement method based on line-scanning stereo vision has high optical accuracy, data transmission efficiency, and a wide field of vision. It is more suitable for continuous operation and high-speed transmission of industrial product detection sites. However, the one-dimensional imaging characteristics of the line-scanning camera cause motion distortion during image data acquisition, which directly affects the accuracy of detection. Effectively reducing the influence of motion distortion is the primary problem to ensure detection accuracy. To obtain the two-dimensional color image and three-dimensional contour data of the heavy rail surface at the same time, a binocular color line-scanning stereo vision system is designed to collect the heavy rail surface data combined with the bright field illumination of the symmetrical linear light source. Aiming at the image motion distortion caused by system installation error and collaborative acquisition frame rate mismatch, this paper uses the checkerboard target and two-step cubature Kalman filter algorithm to solve the nonlinear parameters in the motion distortion model, estimate the real motion, and correct the image information. The experiments show that the accuracy of the data contained in the image is improved by 57.3% after correction. Full article

24 pages, 14601 KiB  
Article
U-Net Convolutional Neural Network for Mapping Natural Vegetation and Forest Types from Landsat Imagery in Southeastern Australia
by Tony Boston, Albert Van Dijk and Richard Thackway
J. Imaging 2024, 10(6), 143; https://doi.org/10.3390/jimaging10060143 - 13 Jun 2024
Viewed by 657
Abstract
Accurate and comparable annual mapping is critical to understanding changing vegetation distribution and informing land use planning and management. A U-Net convolutional neural network (CNN) model was used to map natural vegetation and forest types based on annual Landsat geomedian reflectance composite images for a 500 km × 500 km study area in southeastern Australia. The CNN was developed using 2018 imagery. Label data were a ten-class natural vegetation and forest classification (i.e., Acacia, Callitris, Casuarina, Eucalyptus, Grassland, Mangrove, Melaleuca, Plantation, Rainforest and Non-Forest) derived by combining current best-available regional-scale maps of Australian forest types, natural vegetation and land use. The best CNN generated using six Landsat geomedian bands as input produced better results than a pixel-based random forest algorithm, with higher overall accuracy (OA) and weighted mean F1 score for all vegetation classes (93 vs. 87% in both cases) and a higher Kappa score (86 vs. 74%). The trained CNN was used to generate annual vegetation maps for 2000–2019 and evaluated for an independent test area of 100 km × 100 km using statistics describing accuracy regarding the label data and temporal stability. Seventy-six percent of pixels did not change over the 20 years (2000–2019), and year-on-year results were highly correlated (94–97% OA). The accuracy of the CNN model was further verified for the study area using 3456 independent vegetation survey plots where the species of interest had ≥ 50% crown cover. The CNN showed an 81% OA compared with the plot data. The model accuracy was also higher than the label data (76%), which suggests that imperfect training data may not be a major obstacle to CNN-based mapping. 
Applying the CNN to other regions would help to test the spatial transferability of these techniques and whether they can support the automated production of accurate and comparable annual maps of natural vegetation and forest types required for national reporting. Full article
(This article belongs to the Section Color, Multi-spectral, and Hyperspectral Imaging)

20 pages, 3739 KiB  
Article
Automatic Switching of Electric Locomotive Power in Railway Neutral Sections Using Image Processing
by Christopher Thembinkosi Mcineka, Nelendran Pillay, Kevin Moorgas and Shaveen Maharaj
J. Imaging 2024, 10(6), 142; https://doi.org/10.3390/jimaging10060142 - 11 Jun 2024
Viewed by 674
Abstract
This article presents a computer vision-based approach to switching electric locomotive power supplies as the vehicle approaches a railway neutral section. Neutral sections are defined as a phase break in which the objective is to separate two single-phase traction supplies on an overhead railway supply line. This separation prevents flashovers due to high voltages caused by the locomotives shorting both electrical phases. The typical system of switching traction supplies automatically employs the use of electro-mechanical relays and induction magnets. In this paper, an image classification approach is proposed to replace the conventional electro-mechanical system with two unique visual markers that represent the ‘Open’ and ‘Close’ signals to initiate the transition. When the computer vision model detects either marker, the vacuum circuit breakers inside the electrical locomotive will be triggered to their respective positions depending on the identified image. A Histogram of Oriented Gradient technique was implemented for feature extraction during the training phase and a Linear Support Vector Machine algorithm was trained for the target image classification. For the task of image segmentation, the Circular Hough Transform shape detection algorithm was employed to locate the markers in the captured images and provided cartesian plane coordinates for segmenting the Object of Interest. A signal marker classification accuracy of 94% with 75 objects per second was achieved using a Linear Support Vector Machine during the experimental testing phase. Full article

21 pages, 1918 KiB  
Article
Residual-Based Multi-Stage Deep Learning Framework for Computer-Aided Alzheimer’s Disease Detection
by Najmul Hassan, Abu Saleh Musa Miah and Jungpil Shin
J. Imaging 2024, 10(6), 141; https://doi.org/10.3390/jimaging10060141 - 11 Jun 2024
Viewed by 973
Abstract
Alzheimer’s Disease (AD) poses a significant health risk globally, particularly among the elderly population. Recent studies underscore its prevalence, with over 50% of elderly Japanese facing a lifetime risk of dementia, primarily attributed to AD. As the most prevalent form of dementia, AD gradually erodes brain cells, leading to severe neurological decline. In this scenario, it is important to develop an automatic AD-detection system, and many researchers have been working to develop an AD-detection system by taking advantage of the advancement of deep learning (DL) techniques, which have shown promising results in various domains, including medical image analysis. However, existing approaches for AD detection often suffer from limited performance due to the complexities associated with training hierarchical convolutional neural networks (CNNs). In this paper, we introduce a novel multi-stage deep neural network architecture based on residual functions to address the limitations of existing AD-detection approaches. Inspired by the success of residual networks (ResNets) in image-classification tasks, our proposed system comprises five stages, each explicitly formulated to enhance feature effectiveness while maintaining model depth. Following feature extraction, a deep learning-based feature-selection module is applied to mitigate overfitting, incorporating batch normalization, dropout and fully connected layers. Subsequently, machine learning (ML)-based classification algorithms, including Support Vector Machines (SVM), Random Forest (RF) and SoftMax, are employed for classification tasks. Comprehensive evaluations conducted on three benchmark datasets, namely ADNI1: Complete 1Yr 1.5T, MIRAID and OASIS Kaggle, demonstrate the efficacy of our proposed model. Impressively, our model achieves accuracy rates of 99.47%, 99.10% and 99.70% for ADNI1: Complete 1Yr 1.5T, MIRAID and OASIS datasets, respectively, outperforming existing systems in binary class problems. 
Our proposed model represents a significant advancement in the AD-analysis domain. Full article

21 pages, 9953 KiB  
Article
A Multi-Shot Approach for Spatial Resolution Improvement of Multispectral Images from an MSFA Sensor
by Jean Yves Aristide Yao, Kacoutchy Jean Ayikpa, Pierre Gouton and Tiemoman Kone
J. Imaging 2024, 10(6), 140; https://doi.org/10.3390/jimaging10060140 - 8 Jun 2024
Viewed by 504
Abstract
Multispectral imaging technology has advanced significantly in recent years, allowing single-sensor cameras with multispectral filter arrays to be used in new scene acquisition applications. Our camera, developed as part of the European CAVIAR project, uses an eight-band MSFA to produce mosaic images that can be decomposed into eight sparse images. These sparse images contain only pixels with similar spectral properties and null pixels. A demosaicing process is then applied to obtain fully defined images. However, this process faces several challenges in rendering fine details, abrupt transitions, and textured regions due to the large number of null pixels in the sparse images. Therefore, we propose a sparse image composition method to overcome these challenges by reducing the number of null pixels in the sparse images. To achieve this, we increase the number of snapshots by simultaneously introducing a spatial displacement of the sensor by one to three pixels on the horizontal and/or vertical axes. The set of snapshots acquired provides a multitude of mosaics representing the same scene with a redistribution of pixels. The sparse images from the different mosaics are added together to get new composite sparse images in which the number of null pixels is reduced. A bilinear demosaicing approach is applied to the composite sparse images to obtain fully defined images. Experimental results on images projected onto the response of our MSFA filter show that our composition method significantly improves image spatial resolution and minimizes reconstruction errors while preserving spectral fidelity. Full article
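The composition idea in this abstract can be illustrated with a toy sketch. It assumes a hypothetical 2 × 2, four-band filter pattern rather than the eight-band MSFA of the paper, marks null pixels with `None`, ignores sensor borders, and merges snapshots by keeping the first non-null value at each position; all names are illustrative.

```python
# Hypothetical 2x2 filter layout: PATTERN[r % 2][c % 2] is the band seen at (r, c).
PATTERN = [[0, 1], [2, 3]]


def sparse_image(scene, band, shift):
    """Sample one band of the scene through the mosaic after shifting the sensor."""
    dr, dc = shift
    rows, cols = len(scene[band]), len(scene[band][0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Only pixels covered by this band's filter carry a value.
            if PATTERN[(r + dr) % 2][(c + dc) % 2] == band:
                out[r][c] = scene[band][r][c]
    return out


def compose(sparse_images):
    """Merge sparse images of one band: keep the first non-null value per pixel."""
    rows, cols = len(sparse_images[0]), len(sparse_images[0][0])
    out = [[None] * cols for _ in range(rows)]
    for image in sparse_images:
        for r in range(rows):
            for c in range(cols):
                if out[r][c] is None and image[r][c] is not None:
                    out[r][c] = image[r][c]
    return out


def null_count(image):
    return sum(value is None for row in image for value in row)


# Synthetic 4-band, 4x4 scene; scene[band][row][col] encodes its own coordinates.
scene = [[[100 * b + 10 * r + c for c in range(4)] for r in range(4)] for b in range(4)]
single_shot = sparse_image(scene, 0, (0, 0))
shifts = [(0, 0), (0, 1), (1, 0), (1, 1)]
composite = compose([sparse_image(scene, 0, s) for s in shifts])
```

With this toy pattern a single snapshot leaves 12 of 16 pixels null for band 0, while four shifted snapshots fill every pixel. A real eight-band mosaic would need more shifts and would generally still require demosaicing afterwards, as the abstract describes.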
(This article belongs to the Section Color, Multi-spectral, and Hyperspectral Imaging)

18 pages, 1987 KiB  
Article
Unsupervised Content Mining in CBIR: Harnessing Latent Diffusion for Complex Text-Based Query Interpretation
by Venkata Rama Muni Kumar Gopu and Madhavi Dunna
J. Imaging 2024, 10(6), 139; https://doi.org/10.3390/jimaging10060139 - 6 Jun 2024
Viewed by 723
Abstract
The paper demonstrates a novel methodology for Content-Based Image Retrieval (CBIR), which shifts the focus from conventional domain-specific image queries to more complex text-based query processing. Latent diffusion models are employed to interpret complex textual prompts. They transform complex textual queries into visually engaging representations, establishing a seamless connection between textual descriptions and visual content. A custom triplet network design is at the heart of our retrieval method. When trained well, the triplet network produces feature representations of the generated query image and of the different images in the database. The cosine similarity metric is used to assess the similarity between these feature representations in order to find and retrieve the relevant images. Our experimental results show that latent diffusion models can successfully bridge the gap between complex textual prompts and image retrieval without relying on labels or metadata attached to database images. This advancement sets the stage for future explorations in image retrieval, leveraging generative AI capabilities to cater to the ever-evolving demands of big data and complex query interpretation.
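The retrieval step described here, ranking database images by cosine similarity to the query's feature vector, can be sketched as follows. The three-dimensional feature vectors and image names are made-up placeholders; in the paper the features would come from the trained triplet network.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve(query_features, database, k=2):
    """Return the names of the k database entries most similar to the query."""
    ranked = sorted(
        database.items(),
        key=lambda item: cosine_similarity(query_features, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical embeddings of three database images and one generated query image.
database = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [0.9, 0.1, 0.0],
    "img_c": [0.0, 1.0, 0.0],
}
query = [1.0, 0.05, 0.0]
top_matches = retrieve(query, database, k=2)
```

Because cosine similarity depends only on direction, not vector length, it compares embeddings without being affected by their overall magnitude.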
(This article belongs to the Section Image and Video Processing)

16 pages, 1568 KiB  
Article
A Neural-Network-Based Watermarking Method Approximating JPEG Quantization
by Shingo Yamauchi and Masaki Kawamura
J. Imaging 2024, 10(6), 138; https://doi.org/10.3390/jimaging10060138 - 6 Jun 2024
Viewed by 478
Abstract
We propose a neural-network-based watermarking method that introduces the quantized activation function that approximates the quantization of JPEG compression. Many neural-network-based watermarking methods have been proposed. Conventional methods have acquired robustness against various attacks by introducing an attack simulation layer between the embedding network and the extraction network. The quantization process of JPEG compression is replaced by the noise addition process in the attack layer of conventional methods. In this paper, we propose a quantized activation function that can simulate the JPEG quantization standard as it is in order to improve the robustness against the JPEG compression. Our quantized activation function consists of several hyperbolic tangent functions and is applied as an activation function for neural networks. Our network was introduced in the attack layer of ReDMark proposed by Ahmadi et al. to compare it with their method. That is, the embedding and extraction networks had the same structure. We compared the usual JPEG compressed images and the images applying the quantized activation function. The results showed that a network with quantized activation functions can approximate JPEG compression with high accuracy. We also compared the bit error rate (BER) of estimated watermarks generated by our network with those generated by ReDMark. We found that our network was able to produce estimated watermarks with lower BERs than those of ReDMark. Therefore, our network outperformed the conventional method with respect to image quality and BER. Full article
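The abstract does not give the exact form of the quantized activation function, so the following is only one plausible construction: a sum of shifted hyperbolic tangents that forms a smooth staircase approximating rounding, scaled by a quantization step. The function names, the `levels` range and the `sharpness` value are assumptions made for illustration.

```python
import math


def soft_round(x, levels=8, sharpness=20.0):
    """Smooth staircase approximating round(x) for |x| < levels.

    Each tanh term contributes one unit step centred on a half-integer,
    so the sum behaves like rounding while remaining differentiable.
    """
    return 0.5 * sum(
        math.tanh(sharpness * (x - (k + 0.5))) for k in range(-levels, levels)
    )


def soft_quantize(coefficient, step, levels=8, sharpness=20.0):
    """Differentiable stand-in for JPEG-style quantization: step * round(c / step)."""
    return step * soft_round(coefficient / step, levels, sharpness)


# A DCT coefficient of 37 with quantization step 16 rounds to 2 * 16 = 32.
approx = soft_quantize(37.0, 16.0)
exact = 16.0 * round(37.0 / 16.0)
```

A larger `sharpness` makes the staircase closer to true rounding but the gradients near each step steeper, which is the usual trade-off when a non-differentiable operation is replaced by a smooth surrogate during training.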
(This article belongs to the Special Issue Robust Deep Learning Techniques for Multimedia Forensics and Security)

14 pages, 5049 KiB  
Article
PlantSR: Super-Resolution Improves Object Detection in Plant Images
by Tianyou Jiang, Qun Yu, Yang Zhong and Mingshun Shao
J. Imaging 2024, 10(6), 137; https://doi.org/10.3390/jimaging10060137 - 6 Jun 2024
Viewed by 583
Abstract
Recent advancements in computer vision, especially deep learning models, have shown considerable promise in tasks related to plant image object detection. However, the efficiency of these deep learning models heavily relies on input image quality, with low-resolution images significantly hindering model performance. Therefore, reconstructing high-quality images through specific techniques will help extract features from plant images, thus improving model performance. In this study, we explored the value of super-resolution technology for improving object detection model performance on plant images. Firstly, we built a comprehensive dataset comprising 1030 high-resolution plant images, named the PlantSR dataset. Subsequently, we developed a super-resolution model using the PlantSR dataset and benchmarked it against several state-of-the-art models designed for general image super-resolution tasks. Our proposed model demonstrated superior performance on the PlantSR dataset, indicating its efficacy in enhancing the super-resolution of plant images. Furthermore, we explored the effect of super-resolution on two specific object detection tasks: apple counting and soybean seed counting. By incorporating super-resolution as a pre-processing step, we observed a significant reduction in mean absolute error. Specifically, with the YOLOv7 model employed for apple counting, the mean absolute error decreased from 13.085 to 5.71. Similarly, with the P2PNet-Soy model utilized for soybean seed counting, the mean absolute error decreased from 19.159 to 15.085. These findings underscore the substantial potential of super-resolution technology in improving the performance of object detection models for accurately detecting and counting specific plants from images. The source codes and associated datasets related to this study are available at Github. Full article
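To put the reported error figures in context, the sketch below computes a mean absolute error for counting tasks and the relative reduction implied by the before and after values quoted above. The per-image counts are invented; only the four MAE values come from the abstract.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and true counts."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def relative_reduction(before, after):
    """Fraction by which an error metric dropped, relative to its starting value."""
    return (before - after) / before


# Hypothetical per-image fruit counts: errors of 2, 0 and 3 give an MAE of 5/3.
toy_mae = mean_absolute_error([10, 12, 7], [12, 12, 10])

# MAE values quoted in the abstract, without and with super-resolution.
apple_drop = relative_reduction(13.085, 5.71)      # roughly 56%
soybean_drop = relative_reduction(19.159, 15.085)  # roughly 21%
```

On these numbers the benefit of super-resolution is considerably larger for the apple-counting task than for soybean seed counting.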
(This article belongs to the Section AI in Imaging)

20 pages, 25770 KiB  
Article
Exploring Emotional Stimuli Detection in Artworks: A Benchmark Dataset and Baselines Evaluation
by Tianwei Chen, Noa Garcia, Liangzhi Li and Yuta Nakashima
J. Imaging 2024, 10(6), 136; https://doi.org/10.3390/jimaging10060136 - 4 Jun 2024
Viewed by 688
Abstract
We introduce an emotional stimuli detection task that targets extracting emotional regions that evoke people’s emotions (i.e., emotional stimuli) in artworks. This task offers new challenges to the community because of the diversity of artwork styles and the subjectivity of emotions, making it a suitable testbed for benchmarking the capability of current neural networks to deal with human emotion. For this task, we construct a dataset called APOLO for quantifying emotional stimuli detection performance in artworks by crowd-sourcing pixel-level annotation of emotional stimuli. APOLO contains 6781 emotional stimuli in 4718 artworks for validation and testing. We also evaluate eight baseline methods, including a dedicated one, to show the difficulties of the task and the limitations of the current techniques through qualitative and quantitative experiments.

18 pages, 3315 KiB  
Article
MSMHSA-DeepLab V3+: An Effective Multi-Scale, Multi-Head Self-Attention Network for Dual-Modality Cardiac Medical Image Segmentation
by Bo Chen, Yongbo Li, Jiacheng Liu, Fei Yang and Lei Zhang
J. Imaging 2024, 10(6), 135; https://doi.org/10.3390/jimaging10060135 - 3 Jun 2024
Viewed by 375
Abstract
The automatic segmentation of cardiac computed tomography (CT) and magnetic resonance imaging (MRI) plays a pivotal role in the prevention and treatment of cardiovascular diseases. In this study, we propose an efficient network based on the multi-scale, multi-head self-attention (MSMHSA) mechanism. The incorporation of this mechanism enables us to achieve larger receptive fields, facilitating the accurate segmentation of whole heart structures in both CT and MRI images. Within this network, features extracted from the shallow feature extraction network undergo an MHSA mechanism that closely aligns with human vision, resulting in the extraction of contextual semantic information more comprehensively and accurately. To improve the precision of cardiac substructure segmentation across varying sizes, our proposed method introduces three MHSA networks at distinct scales. This approach allows for fine-tuning the accuracy of micro-object segmentation by adapting the size of the segmented images. The efficacy of our method is rigorously validated on the Multi-Modality Whole Heart Segmentation (MM-WHS) Challenge 2017 dataset, demonstrating competitive results and the accurate segmentation of seven cardiac substructures in both cardiac CT and MRI images. Through comparative experiments with advanced transformer-based models, our study provides compelling evidence that despite the remarkable achievements of transformer-based models, the fusion of CNN models and self-attention remains a simple yet highly effective approach for dual-modality whole heart segmentation.

11 pages, 813 KiB  
Article
Accuracy of Digital Imaging Software to Predict Soft Tissue Changes during Orthodontic Treatment
by Theerasak Nakornnoi and Pannapat Chanmanee
J. Imaging 2024, 10(6), 134; https://doi.org/10.3390/jimaging10060134 - 31 May 2024
Viewed by 515
Abstract
This study aimed to evaluate the accuracy of the Digital Imaging software in the prediction of soft tissue changes following three types of orthodontic interventions: non-extraction, extraction, and orthognathic surgery treatments. Ninety-six patients were randomly selected from the records of three orthodontic interventions (32 subjects per group): (1) non-extraction, (2) extraction, and (3) orthodontic treatment combined with orthognathic surgery. The cephalometric analysis of soft tissue changes in both the actual post-treatment and the predicted treatment was performed using Dolphin Imaging software version 11.9. A paired t-test was utilized to assess the statistically significant differences between the predicted and actual treatment outcomes of the parameters (p < 0.05). In the non-extraction group, prediction errors were exhibited only in the lower lip parameters. In the extraction group, prediction errors were observed in both the upper and lower lip parameters. In the orthognathic surgery group, prediction errors were identified in chin thickness, facial contour angle, and upper and lower lip parameters (p < 0.05). Digital Imaging software exhibited inaccurate soft tissue prediction of 0.3–1.0 mm in some parameters of all treatment groups, which should be considered when applying Dolphin Imaging software in orthodontic treatment planning.
(This article belongs to the Section Medical Imaging)
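The paired t-test mentioned in the abstract above compares two measurements taken on the same subjects, here a predicted and an actual value per patient. The sketch below computes only the test statistic and its degrees of freedom using the Python standard library; the function name and the measurements are hypothetical, and obtaining a p-value would additionally require the t distribution (for example from a statistics package).

```python
import math
import statistics


def paired_t_statistic(predicted, actual):
    """Return (t, degrees_of_freedom) for paired samples.

    t = mean(d) / (stdev(d) / sqrt(n)), where d are the per-subject differences.
    """
    if len(predicted) != len(actual) or len(predicted) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [a - p for p, a in zip(predicted, actual)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_d == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean_d / (sd_d / math.sqrt(n)), n - 1


# Hypothetical lip-position measurements in mm (illustrative only).
predicted_mm = [2.0, 3.0, 4.0, 5.0]
actual_mm = [2.5, 3.5, 4.0, 6.0]

t_value, dof = paired_t_statistic(predicted_mm, actual_mm)
print(t_value, dof)
```

Because each patient serves as their own control, the test works on the differences rather than on the two groups separately, which removes between-patient variation from the comparison.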

11 pages, 2783 KiB  
Article
Implicit 3D Human Reconstruction Guided by Parametric Models and Normal Maps
by Yong Ren, Mingquan Zhou, Yifan Wang, Long Feng, Qiuquan Zhu, Kang Li and Guohua Geng
J. Imaging 2024, 10(6), 133; https://doi.org/10.3390/jimaging10060133 - 29 May 2024
Viewed by 440
Abstract
Accurate and robust 3D human modeling from a single image presents significant challenges. Existing methods have shown potential, but they often fail to generate reconstructions that match the level of detail in the input image. These methods particularly struggle with loose clothing. They typically employ parameterized human models to constrain the reconstruction process, ensuring the results do not deviate too far from the model and produce anomalies. However, this also limits the recovery of loose clothing. To address this issue, we propose an end-to-end method called IHRPN for reconstructing clothed humans from a single 2D human image. This method includes a feature extraction module for semantic extraction of image features. We propose an image semantic feature extraction aimed at achieving pixel model space consistency and enhancing the robustness of loose clothing. We extract features from the input image to infer and recover the SMPL-X mesh, and then combine it with a normal map to guide the implicit function to reconstruct the complete clothed human. Unlike traditional methods, we use local features for implicit surface regression. Our experimental results show that our IHRPN method performs excellently on the CAPE and AGORA datasets, and the reconstruction of loose clothing is noticeably more accurate and robust.
(This article belongs to the Special Issue Self-Supervised Learning for Image Processing and Analysis)

34 pages, 1881 KiB  
Article
Hybridizing Deep Neural Networks and Machine Learning Models for Aerial Satellite Forest Image Segmentation
by Clopas Kwenda, Mandlenkosi Gwetu and Jean Vincent Fonou-Dombeu
J. Imaging 2024, 10(6), 132; https://doi.org/10.3390/jimaging10060132 - 29 May 2024
Viewed by 369
Abstract
Forests play a pivotal role in mitigating climate change as well as contributing to the socio-economic activities of many countries. Therefore, it is of paramount importance to monitor forest cover. Traditional machine learning classifiers for segmenting images lack the ability to extract features such as the spatial relationship between pixels and texture, resulting in subpar segmentation results when used alone. To address this limitation, this study proposed a novel hybrid approach that combines deep neural networks and machine learning algorithms to segment an aerial satellite image into forest and non-forest regions. Aerial satellite forest image features were first extracted by two deep neural network models, namely, VGG16 and ResNet50. The resulting features were subsequently used by five machine learning classifiers including Random Forest (RF), Linear Support Vector Machines (LSVM), k-nearest neighbor (kNN), Linear Discriminant Analysis (LDA), and Gaussian Naive Bayes (GNB) to perform the final segmentation. The aerial satellite forest images were obtained from a DeepGlobe challenge dataset. The performance of the proposed model was evaluated using metrics such as Accuracy, Jaccard score index, and Root Mean Square Error (RMSE). The experimental results revealed that the RF model achieved the best segmentation results with accuracy, Jaccard score, and RMSE of 94%, 0.913 and 0.245, respectively; followed by LSVM with accuracy, Jaccard score and RMSE of 89%, 0.876, 0.332, respectively. The LDA took the third position with accuracy, Jaccard score, and RMSE of 88%, 0.834, and 0.351, respectively, followed by GNB with accuracy, Jaccard score, and RMSE of 88%, 0.837, and 0.353, respectively. The kNN occupied the last position with accuracy, Jaccard score, and RMSE of 83%, 0.790, and 0.408, respectively. The experimental results also revealed that the proposed model has significantly improved the performance of the RF, LSVM, LDA, GNB and kNN models, compared to their performance when used to segment the images alone. Furthermore, the results showed that the proposed model outperformed other models from related studies, thereby attesting to its superior segmentation capability.
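The three evaluation metrics named in the abstract above (accuracy, Jaccard score, and RMSE) can be illustrated on a binary forest/non-forest mask. This is a minimal sketch assuming masks flattened to lists of 0 and 1; the function name and the pixel values are hypothetical and do not reproduce the study's evaluation code.

```python
import math


def segmentation_metrics(truth, pred):
    """Return (accuracy, jaccard, rmse) for two flattened binary masks."""
    if len(truth) != len(pred) or not truth:
        raise ValueError("masks must be non-empty and of equal length")
    n = len(truth)
    pairs = list(zip(truth, pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / n
    union = tp + fp + fn
    # Jaccard = intersection over union of the positive (forest) class.
    jaccard = tp / union if union else 1.0
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in pairs) / n)
    return accuracy, jaccard, rmse


# Hypothetical 8-pixel masks: 1 = forest, 0 = non-forest (illustrative only).
truth_mask = [1, 1, 1, 0, 0, 0, 1, 0]
pred_mask = [1, 1, 0, 0, 0, 1, 1, 0]

accuracy, jaccard, rmse = segmentation_metrics(truth_mask, pred_mask)
print(accuracy, jaccard, rmse)
```

Accuracy and Jaccard are better when higher, whereas RMSE is better when lower, which is why the ranking in the abstract pairs rising accuracy with falling RMSE.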

28 pages, 12383 KiB  
Article
Greedy Ensemble Hyperspectral Anomaly Detection
by Mazharul Hossain, Mohammed Younis, Aaron Robinson, Lan Wang and Chrysanthe Preza
J. Imaging 2024, 10(6), 131; https://doi.org/10.3390/jimaging10060131 - 28 May 2024
Viewed by 619
Abstract
Hyperspectral images include information from a wide range of spectral bands deemed valuable for computer vision applications in various domains such as agriculture, surveillance, and reconnaissance. Anomaly detection in hyperspectral images has proven to be a crucial component of change and abnormality identification, enabling improved decision-making across various applications. These abnormalities/anomalies can be detected using background estimation techniques that do not require the prior knowledge of outliers. However, each hyperspectral anomaly detection (HS-AD) algorithm models the background differently. These different assumptions may fail to consider all the background constraints in various scenarios. We have developed a new approach called Greedy Ensemble Anomaly Detection (GE-AD) to address this shortcoming. It includes a greedy search algorithm to systematically determine the suitable base models from HS-AD algorithms and hyperspectral unmixing for the first stage of a stacking ensemble and employs a supervised classifier in the second stage of a stacking ensemble. It helps researchers with limited knowledge of the suitability of the HS-AD algorithms for the application scenarios to select the best methods automatically. Our evaluation shows that the proposed method achieves a higher average F1-macro score with statistical significance compared to the other individual methods used in the ensemble. This is validated on multiple datasets, including the Airport–Beach–Urban (ABU) dataset, the San Diego dataset, the Salinas dataset, the Hydice Urban dataset, and the Arizona dataset. The evaluation using the airport scenes from the ABU dataset shows that GE-AD achieves a 14.97% higher average F1-macro score than our previous method (HUE-AD), at least 17.19% higher than the individual methods used in the ensemble, and at least 28.53% higher than the other state-of-the-art ensemble anomaly detection algorithms. As the combination of a greedy algorithm and a stacking ensemble to automatically select suitable base models and associated weights has not been widely explored in hyperspectral anomaly detection, we believe that our work will expand the knowledge in this research area and contribute to the wider application of this approach.
(This article belongs to the Section Computer Vision and Pattern Recognition)
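The idea of greedily choosing base detectors for an ensemble, as described in the abstract above, can be sketched as forward selection: repeatedly add the detector that most improves the F1-macro score and stop when nothing helps. This sketch is a deliberate simplification and not the GE-AD method: it replaces the supervised second-stage classifier with a plain average of detector scores thresholded at 0.5, and all names and score values are hypothetical.

```python
def f1_macro(truth, pred):
    """Unweighted mean of the per-class F1 scores for labels 0 and 1."""
    scores = []
    for cls in (0, 1):
        tp = sum(1 for t, p in zip(truth, pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(truth, pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(truth, pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def ensemble_predict(detector_scores, chosen, threshold=0.5):
    """Average the chosen detectors' scores per pixel, then threshold.

    Assumption: mean-score fusion stands in for a trained stacking classifier.
    """
    n = len(detector_scores[chosen[0]])
    averaged = [sum(detector_scores[name][i] for name in chosen) / len(chosen)
                for i in range(n)]
    return [1 if value >= threshold else 0 for value in averaged]


def greedy_select(detector_scores, truth):
    """Forward selection: keep adding the detector that raises F1-macro most."""
    chosen, best = [], -1.0
    remaining = sorted(detector_scores)
    while remaining:
        score, name = max(
            (f1_macro(truth, ensemble_predict(detector_scores, chosen + [n])), n)
            for n in remaining
        )
        if score <= best:  # no remaining detector improves the ensemble
            break
        chosen.append(name)
        remaining.remove(name)
        best = score
    return chosen, best


# Hypothetical anomaly scores for six pixels from three detectors.
truth = [0, 0, 0, 1, 1, 0]
detector_scores = {
    "A": [0.1, 0.2, 0.6, 0.9, 0.4, 0.1],
    "B": [0.2, 0.1, 0.2, 0.4, 0.9, 0.3],
    "C": [0.9, 0.8, 0.7, 0.2, 0.1, 0.9],
}

chosen, best = greedy_select(detector_scores, truth)
print(chosen, best)
```

In this toy data, detectors A and B each miss a different anomaly, so combining them helps, while C disagrees with the labels and is never added. Greedy search evaluates far fewer combinations than an exhaustive search but is not guaranteed to find the best subset.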

16 pages, 1657 KiB  
Article
Modeling of Ethiopian Beef Meat Marbling Score Using Image Processing for Rapid Meat Grading
by Tariku Erena, Abera Belay, Demelash Hailu, Bezuayehu Gutema Asefa, Mulatu Geleta and Tesfaye Deme
J. Imaging 2024, 10(6), 130; https://doi.org/10.3390/jimaging10060130 - 28 May 2024
Viewed by 626
Abstract
Meat characterized by a high marbling value is typically anticipated to display enhanced sensory attributes. This study aimed to predict the marbling scores of rib-eye steaks sourced from the Longissimus dorsi muscle of different cattle types, namely Boran, Senga, and Sheko, by employing digital image processing and machine-learning algorithms. Marbling was analyzed using digital image processing coupled with an extreme gradient boosting (GBoost) machine learning algorithm. Meat texture was assessed using a universal texture analyzer. Sensory characteristics of beef were evaluated through quantitative descriptive analysis with a trained panel of twenty. Using selected image features from digital image processing, the marbling score was predicted with R2 (prediction) = 0.83. Boran cattle had the highest fat content in sirloin and chuck cuts (12.68% and 12.40%, respectively), followed by Senga (11.59% and 11.56%) and Sheko (11.40% and 11.17%). Tenderness scores for sirloin and chuck cuts differed among the three breeds: Boran (7.06 ± 2.75 and 3.81 ± 2.24, respectively), Senga (5.54 ± 1.90 and 5.25 ± 2.47), and Sheko (5.43 ± 2.76 and 6.33 ± 2.28 Nmm). Sheko and Senga had similar sensory attributes. Marbling scores were higher in Boran (4.28 ± 1.43 and 3.68 ± 1.21) and Senga (2.88 ± 0.69 and 2.83 ± 0.98) compared to Sheko (2.73 ± 1.28 and 2.90 ± 1.52). The study achieved a remarkable milestone in developing a digital tool for predicting marbling scores of Ethiopian beef breeds. Furthermore, the relationship between quality attributes and beef marbling score has been verified. After further validation, the output of this research can be utilized by the meat industry and quality control authorities.
(This article belongs to the Section Image and Video Processing)

25 pages, 3906 KiB  
Article
Point Cloud Quality Assessment Using a One-Dimensional Model Based on the Convolutional Neural Network
by Abdelouahed Laazoufi, Mohammed El Hassouni and Hocine Cherifi
J. Imaging 2024, 10(6), 129; https://doi.org/10.3390/jimaging10060129 - 27 May 2024
Viewed by 646
Abstract
Recent advancements in 3D modeling have revolutionized various fields, including virtual reality, computer-aided diagnosis, and architectural design, emphasizing the importance of accurate quality assessment for 3D point clouds. As these models undergo operations such as simplification and compression, introducing distortions can significantly impact their visual quality. There is a growing need for reliable and efficient objective quality evaluation methods to address this challenge. In this context, this paper introduces a novel methodology to assess the quality of 3D point clouds using a deep learning-based no-reference (NR) method. First, it extracts geometric and perceptual attributes from distorted point clouds and represents them as a set of 1D vectors. Then, transfer learning is applied to obtain high-level features using a 1D convolutional neural network (1D CNN) adapted from 2D CNN models through weight conversion from ImageNet. Finally, quality scores are predicted through regression utilizing fully connected layers. The effectiveness of the proposed approach is evaluated across diverse datasets, including the Colored Point Cloud Quality Assessment Database (SJTU_PCQA), the Waterloo Point Cloud Assessment Database (WPC), and the Colored Point Cloud Quality Assessment Database featured at ICIP2020. The outcomes reveal superior performance compared to several competing methodologies, as evidenced by enhanced correlation with average opinion scores.

18 pages, 13380 KiB  
Article
Integrated Building Modelling Using Geomatics and GPR Techniques for Cultural Heritage Preservation: A Case Study of the Charles V Pavilion in Seville (Spain)
by María Zaragoza, Vicente Bayarri and Francisco García
J. Imaging 2024, 10(6), 128; https://doi.org/10.3390/jimaging10060128 - 27 May 2024
Viewed by 739
Abstract
This paper highlights the fundamental role of integrating different geomatics and geophysical imaging technologies in understanding and preserving cultural heritage, with a focus on the Pavilion of Charles V in Seville (Spain). Using a terrestrial laser scanner, global navigation satellite system, and ground-penetrating radar, we constructed a building information modelling (BIM) system to derive comprehensive decision-making models to preserve this historical asset. These models enable the generation of virtual reconstructions, encompassing not only the building but also its subsurface, distributable as augmented reality or virtual reality online. By leveraging these technologies, the research investigates complex details of the pavilion, capturing its current structure and revealing insights into past soil compositions and potential subsurface structures. This detailed analysis empowers stakeholders to make informed decisions about conservation and management. Furthermore, transparent data sharing fosters collaboration, advancing collective understanding and practices in heritage preservation.

16 pages, 6240 KiB  
Article
Enabling Low-Dose In Vivo Benchtop X-ray Fluorescence Computed Tomography through Deep-Learning-Based Denoising
by Naghmeh Mahmoodian, Mohammad Rezapourian, Asim Abdulsamad Inamdar, Kunal Kumar, Melanie Fachet and Christoph Hoeschen
J. Imaging 2024, 10(6), 127; https://doi.org/10.3390/jimaging10060127 - 22 May 2024
Viewed by 726
Abstract
X-ray Fluorescence Computed Tomography (XFCT) is an emerging non-invasive imaging technique providing high-resolution molecular-level data. However, increased sensitivity with current benchtop X-ray sources comes at the cost of high radiation exposure. Artificial Intelligence (AI), particularly deep learning (DL), has revolutionized medical imaging by delivering high-quality images in the presence of noise. In XFCT, traditional methods rely on complex algorithms for background noise reduction, but AI holds promise in addressing high-dose concerns. We present an optimized Swin-Conv-UNet (SCUNet) model for background noise reduction in X-ray fluorescence (XRF) images at low tracer concentrations. Our method’s effectiveness is evaluated against higher-dose images. While various denoising techniques exist for X-ray and computed tomography (CT), only a few address XFCT. The DL model is trained and assessed using augmented data, focusing on background noise reduction. Image quality is measured using peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM), comparing outcomes with 100% X-ray-dose images. Results demonstrate that the proposed algorithm yields high-quality images from low-dose inputs, with maximum PSNR of 39.05 and SSIM of 0.86. The model outperforms block-matching and 3D filtering (BM3D), block-matching and 4D filtering (BM4D), non-local means (NLM), denoising convolutional neural network (DnCNN), and SCUNet in both visual inspection and quantitative analysis, particularly in high-noise scenarios. This indicates the potential of AI, specifically the SCUNet model, in significantly improving XFCT imaging by mitigating the trade-off between sensitivity and radiation exposure.
(This article belongs to the Special Issue Recent Advances in X-ray Imaging)
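The peak signal-to-noise ratio used in the abstract above compares a processed image with a reference image through their mean squared error: PSNR = 10 · log10(MAX² / MSE). The sketch below assumes 8-bit intensities (MAX = 255) and flattened pixel lists; the function name and pixel values are hypothetical and unrelated to the study's data.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two flattened images."""
    if len(reference) != len(test) or not reference:
        raise ValueError("images must be non-empty and of equal length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images have no noise to measure
    return 10 * math.log10(max_value ** 2 / mse)


# Hypothetical 4-pixel images (illustrative only).
reference_pixels = [100, 100, 100, 100]
noisy_pixels = [110, 90, 110, 90]   # every pixel is off by 10, so MSE = 100

value_db = psnr(reference_pixels, noisy_pixels)
print(round(value_db, 2))
```

Higher PSNR means the test image is closer to the reference. SSIM, the second metric in the abstract, additionally compares local structure and is not covered by this sketch.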

25 pages, 7584 KiB  
Article
Fine-Grained Food Image Recognition: A Study on Optimising Convolutional Neural Networks for Improved Performance
by Liam Boyd, Nonso Nnamoko and Ricardo Lopes
J. Imaging 2024, 10(6), 126; https://doi.org/10.3390/jimaging10060126 - 22 May 2024
Viewed by 697
Abstract
Addressing the pressing issue of food waste is vital for environmental sustainability and resource conservation. While computer vision has been widely used in food waste reduction research, existing food image datasets are typically aggregated into broad categories (e.g., fruits, meat, dairy, etc.) rather than the fine-grained singular food items required for this research. The aim of this study is to develop a model capable of identifying individual food items to be integrated into a mobile application that allows users to photograph their food items, identify them, and offer suggestions for recipes. This research bridges the gap in available datasets and contributes to a more fine-grained approach to utilising existing technology for food waste reduction, emphasising both environmental and research significance. This study evaluates various (n = 7) convolutional neural network architectures for multi-class food image classification, emphasising the nuanced impact of parameter tuning to identify the most effective configurations. The experiments were conducted with a custom dataset comprising 41,949 food images categorised into 20 food item classes. Performance evaluation was based on accuracy and loss. The DenseNet architecture emerged as the top performer among the seven examined, establishing a baseline performance (training accuracy = 0.74, training loss = 1.25, validation accuracy = 0.68, and validation loss = 2.89) on a predetermined set of parameters, including the RMSProp optimiser, ReLU activation function, a dropout rate of 0.5, and a 160×160 image size. Subsequent parameter tuning involved a comprehensive exploration, considering six optimisers, four image sizes, two dropout rates, and five activation functions. The results show the superior generalisation capabilities of the optimised DenseNet, showcasing performance improvements over the established baseline across key metrics. Specifically, the optimised model demonstrated a training accuracy of 0.99, a training loss of 0.01, a validation accuracy of 0.79, and a validation loss of 0.92, highlighting its improved performance compared to the baseline configuration. The optimal DenseNet has been integrated into a mobile application called FridgeSnap, designed to recognise food items and suggest possible recipes to users, thus contributing to the broader mission of minimising food waste.
(This article belongs to the Section AI in Imaging)

15 pages, 1666 KiB  
Article
MResTNet: A Multi-Resolution Transformer Framework with CNN Extensions for Semantic Segmentation
by Nikolaos Detsikas, Nikolaos Mitianoudis and Ioannis Pratikakis
J. Imaging 2024, 10(6), 125; https://doi.org/10.3390/jimaging10060125 - 21 May 2024
Viewed by 684
Abstract
A fundamental task in computer vision is the process of differentiation and identification of different objects or entities in a visual scene using semantic segmentation methods. The advancement of transformer networks has surpassed traditional convolutional neural network (CNN) architectures in terms of segmentation performance. The continuous pursuit of optimal performance, with respect to the popular evaluation metric results, has led to very large architectures that require a significant amount of computational power to operate, making them prohibitive for real-time applications, including autonomous driving. In this paper, we propose a model that leverages a visual transformer encoder with a parallel twin decoder, consisting of a visual transformer decoder and a CNN decoder with multi-resolution connections working in parallel. The two decoders are merged with the aid of two trainable CNN blocks: the fuser, which combines the information from the two decoders, and the scaler, which scales the contribution of each decoder. The proposed model achieves state-of-the-art performance on the Cityscapes and ADE20K datasets, maintaining a low-complexity network that can be used in real-time applications.
(This article belongs to the Special Issue Deep Learning in Computer Vision)
