Search Results (42)

Search Parameters:
Keywords = medical imaging interpolation

20 pages, 7543 KB  
Article
Contrastive Learning with Feature Space Interpolation for Retrieval-Based Chest X-Ray Report Generation
by Zahid Ur Rahman, Gwanghyun Yu, Lee Jin and Jin Young Kim
Appl. Sci. 2026, 16(1), 470; https://doi.org/10.3390/app16010470 - 1 Jan 2026
Viewed by 339
Abstract
Automated radiology report generation from chest X-rays presents a critical challenge in medical imaging. Traditional image-captioning models struggle with clinical specificity and rare pathologies. Recently, contrastive vision–language learning has emerged as a robust alternative that learns joint visual–textual representations. However, applying contrastive learning (CL) to radiology remains challenging due to severe data scarcity. Prior work has employed input-space augmentation, but these approaches incur computational overhead and risk distorting diagnostic features. This work presents CL with feature space interpolation for retrieval (CLFIR), a novel CL framework operating on learned embeddings. The method generates interpolated pairs in the feature embedding space by mixing original and shuffled embeddings within batches using a mixing coefficient λ ∼ U(0.85, 0.99). This approach increases batch diversity via synthetic samples, addressing the limitations of CL on medical data while preserving diagnostic integrity. Extensive experiments demonstrate state-of-the-art performance across critical clinical validation tasks. For report generation, CLFIR achieves BLEU-1/ROUGE/METEOR scores of 0.51/0.40/0.26 (Indiana University [IU] X-ray) and 0.45/0.34/0.22 (MIMIC-CXR). Moreover, CLFIR excels at image-to-text retrieval with R@1 scores of 4.14% (IU X-ray) and 24.3% (MIMIC-CXR) and achieves 0.65 accuracy in zero-shot classification on the CheXpert5×200 dataset, surpassing established vision–language models. Full article
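
The embedding-space mixing described in this abstract can be sketched in a few lines of NumPy; this is an illustrative reconstruction from the abstract alone (function name and batch handling are assumptions, not the authors' code):

```python
import numpy as np

def feature_space_interpolation(embeddings, low=0.85, high=0.99, seed=None):
    """Mix each embedding with a randomly shuffled partner from the same batch:
    z' = lam * z + (1 - lam) * z_shuffled, with lam ~ U(low, high).
    Keeping lam close to 1 preserves most of the original (diagnostic) signal
    while still adding batch diversity via synthetic samples."""
    rng = np.random.default_rng(seed)
    lam = rng.uniform(low, high, size=(embeddings.shape[0], 1))
    shuffled = embeddings[rng.permutation(embeddings.shape[0])]
    return lam * embeddings + (1.0 - lam) * shuffled
```

Because the mixing happens on learned embeddings rather than input pixels, no extra forward passes through the image encoder are needed for the synthetic pairs.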

11 pages, 886 KB  
Article
Quadratic Spline Fitting for Robust Measurement of Thoracic Kyphosis Using Key Vertebral Landmarks
by Nikola Kirilov and Elena Bischoff
Diagnostics 2025, 15(21), 2703; https://doi.org/10.3390/diagnostics15212703 - 25 Oct 2025
Viewed by 609
Abstract
Objective: The purpose of this study is to present a kyphosis measurement method based on quadratic spline fitting through three key vertebral landmarks: T12, T8 and T4. This approach aims to capture thoracic spine curvature more continuously and accurately than traditional methods such as the Cobb angle and circle fitting. Methods: A dataset of 560 lateral thoracic spine radiographs was retrospectively analyzed, including cases of postural kyphosis, Scheuermann’s disease, osteoporosis-induced kyphosis and ankylosing spondylitis. Two trained raters independently performed three repeated landmark annotations per image. The kyphosis angle was computed using two methods: (1) a quadratic spline fitted through the three landmarks, with the angle derived from tangent vectors at T12 and T4; and (2) a least-squares circle fit with the angle subtended between T12 and T4. Agreement with reference Cobb angles was evaluated using Pearson correlation, MAE, RMSE, ROC analysis and Bland–Altman plots. Reliability was assessed using intraclass correlation coefficients (ICC). Results: Both methods showed excellent intra- and inter-rater reliability (ICC ≥ 0.967). The spline method achieved lower MAE (5.81°), lower RMSE (8.94°) and smaller bias compared to the circle method. Both methods showed strong correlation with Cobb angles (r ≥ 0.851) and excellent classification performance (AUC > 0.950). Conclusions: Spline-based kyphosis measurement is accurate, reliable and particularly robust in cases with severe spinal deformity. Significance: This method supports automated, reproducible kyphosis assessment and may enhance clinical evaluation of spinal curvature using artificial intelligence-driven image analysis. Full article
(This article belongs to the Section Medical Imaging and Theranostics)
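
The spline-based angle described above can be sketched as follows; a minimal NumPy sketch assuming 2D landmark coordinates with x along the craniocaudal axis (the coordinate convention is an assumption, not taken from the paper):

```python
import numpy as np

def kyphosis_angle(t12, t8, t4):
    """Fit y = a*x^2 + b*x + c through the three landmark points and return
    the angle (degrees) between the tangent vectors at T12 and T4."""
    xs = np.array([t12[0], t8[0], t4[0]], dtype=float)
    ys = np.array([t12[1], t8[1], t4[1]], dtype=float)
    a, b, _ = np.polyfit(xs, ys, 2)          # exact quadratic through 3 points
    tan_t12 = np.array([1.0, 2 * a * xs[0] + b])   # tangent direction at T12
    tan_t4 = np.array([1.0, 2 * a * xs[2] + b])    # tangent direction at T4
    cos_angle = tan_t12 @ tan_t4 / (np.linalg.norm(tan_t12) * np.linalg.norm(tan_t4))
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))
```

A straight spine (collinear landmarks) yields an angle near 0°, while a pronounced arch yields a large angle between the end tangents.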

20 pages, 4773 KB  
Article
Progressive Disease Image Generation with Ordinal-Aware Diffusion Models
by Meryem Mine Kurt, Ümit Mert Çağlar and Alptekin Temizel
Diagnostics 2025, 15(20), 2558; https://doi.org/10.3390/diagnostics15202558 - 10 Oct 2025
Viewed by 1009
Abstract
Background/Objectives: Ulcerative Colitis (UC) lacks longitudinal visual data, which limits both disease progression modeling and the effectiveness of computer-aided diagnosis systems. These systems are further constrained by sparse intermediate disease stages and the discrete nature of the Mayo Endoscopic Score (MES). Meanwhile, synthetic image generation has made significant advances. In this paper, we propose novel ordinal embedding architectures for conditional diffusion models to generate realistic UC progression sequences from cross-sectional endoscopic images. Methods: By adapting Stable Diffusion v1.4 with two specialized ordinal embeddings (Basic Ordinal Embedder using linear interpolation and Additive Ordinal Embedder modeling cumulative pathological features), our framework converts discrete MES categories into continuous progression representations. Results: The Additive Ordinal Embedder outperforms alternatives, achieving superior distributional alignment (CMMD 0.4137, recall 0.6331) and disease consistency comparable to real data (Quadratic Weighted Kappa 0.8425, UMAP Silhouette Score 0.0571). The generated sequences exhibit smooth transitions between severity levels while maintaining anatomical fidelity. Conclusions: This work establishes a foundation for transforming static medical datasets into dynamic progression models and demonstrates that ordinal-aware embeddings can effectively capture disease severity relationships, enabling synthesis of underrepresented intermediate stages. These advances support applications in medical education, diagnosis, and synthetic data generation. Full article
(This article belongs to the Special Issue Computer-Aided Diagnosis in Endoscopy 2025)
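
The Basic Ordinal Embedder's linear interpolation between discrete severity embeddings might look like the following; a hypothetical sketch of the idea (the actual conditioning interface of the adapted Stable Diffusion model is not described at this level of detail):

```python
import numpy as np

def ordinal_embed(severity, class_embeddings):
    """Map a continuous severity score (e.g. 1.5, between MES 1 and MES 2)
    to a conditioning vector by linearly interpolating the two neighbouring
    discrete class embeddings."""
    lo = int(np.floor(severity))
    hi = min(lo + 1, len(class_embeddings) - 1)
    frac = severity - lo
    return (1.0 - frac) * class_embeddings[lo] + frac * class_embeddings[hi]
```

This converts the four discrete MES categories into a continuous progression axis, enabling synthesis of the underrepresented intermediate stages.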

15 pages, 2039 KB  
Article
Optimising Multimodal Image Registration Techniques: A Comprehensive Study of Non-Rigid and Affine Methods for PET/CT Integration
by Babar Ali, Mansour M. Alqahtani, Essam M. Alkhybari, Ali H. D. Alshehri, Mohammad Sayed and Tamoor Ali
Diagnostics 2025, 15(19), 2484; https://doi.org/10.3390/diagnostics15192484 - 28 Sep 2025
Viewed by 1294
Abstract
Background/Objective: Multimodal image registration plays a critical role in modern medical imaging, enabling the integration of complementary modalities such as positron emission tomography (PET) and computed tomography (CT). This study compares the performance of three widely used image registration techniques—Demons Image Registration with Modality Transformation, Free-Form Deformation using the Medical Image Registration Toolbox (MIRT), and MATLAB Intensity-Based Registration—in terms of improving PET/CT image alignment. Methods: A total of 100 matched PET/CT image slices from a clinical scanner were analysed. Preprocessing techniques, including histogram equalisation and contrast enhancement (via imadjust and adapthisteq), were applied to minimise intensity discrepancies. Each registration method was evaluated under varying parameter conditions with regard to sigma fluid (range 4–8), histogram bins (100 to 256), and interpolation methods (linear and cubic). Performance was assessed using quantitative metrics: root mean square error (RMSE), mean squared error (MSE), mean absolute error (MAE), the Pearson correlation coefficient (PCC), and standard deviation (STD). Results: Demons registration achieved optimal performance at a sigma fluid value of 6, with an RMSE of 0.1529, and demonstrated superior computational efficiency. The MIRT showed better adaptability to complex anatomical deformations, with an RMSE of 0.1725. MATLAB Intensity-Based Registration, when combined with contrast enhancement, yielded the highest accuracy (RMSE = 0.1317 at alpha = 6). Preprocessing improved registration accuracy, reducing the RMSE by up to 16%. Conclusions: Each registration technique has distinct advantages: the Demons algorithm is ideal for time-sensitive tasks, the MIRT is suited to precision-driven applications, and MATLAB-based methods offer flexible processing for large datasets. This study provides a foundational framework for optimising PET/CT image registration in both research and clinical environments. Full article
(This article belongs to the Special Issue Diagnostics in Oncology Research)
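
The quantitative metrics used above to score alignment are standard and easy to compute; a minimal NumPy sketch (the paper's MATLAB pipeline is not reproduced here):

```python
import numpy as np

def registration_metrics(fixed, registered):
    """Similarity metrics between a fixed image and a registered moving image:
    MSE, RMSE, MAE, and the Pearson correlation coefficient (PCC)."""
    diff = fixed.astype(float) - registered.astype(float)
    mse = float(np.mean(diff ** 2))
    return {
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "MAE": float(np.mean(np.abs(diff))),
        "PCC": float(np.corrcoef(fixed.ravel(), registered.ravel())[0, 1]),
    }
```

Lower error values and a PCC close to 1 indicate better alignment between the two modalities.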

23 pages, 1547 KB  
Article
An Adaptive Steganographic Method for Reversible Information Embedding in X-Ray Images
by Elmira Daiyrbayeva, Aigerim Yerimbetova, Ekaterina Merzlyakova, Ualikhan Sadyk, Aizada Sarina, Zhamilya Taichik, Irina Ismailova, Yerbolat Iztleuov and Asset Nurmangaliyev
Computers 2025, 14(9), 386; https://doi.org/10.3390/computers14090386 - 14 Sep 2025
Viewed by 930
Abstract
The rapid digitalisation of the medical field has heightened concerns over protecting patients’ personal information during the transmission of medical images. This study introduces a method for securely transmitting X-ray images that contain embedded patient data. The proposed steganographic approach ensures that the original image remains intact while the embedded data is securely hidden, a critical requirement in medical contexts. To guarantee reversibility, the Interpolation Near Pixels method was utilised, recognised as one of the most effective techniques within reversible data hiding (RDH) frameworks. Additionally, the method integrates a statistical property preservation technique, enhancing the scheme’s alignment with ideal steganographic characteristics. Specifically, the “forest fire” algorithm partitions the image into interconnected regions, where statistical analyses of low-order bits are performed, followed by arithmetic decoding to achieve a desired distribution. This process successfully maintains the original statistical features of the image. The effectiveness of the proposed method was validated through steganalysis on real-world medical images from previous studies. The results revealed high robustness, with minimal distortion of stegocontainers, as evidenced by high PSNR values ranging between 52 and 81 dB. Full article
(This article belongs to the Special Issue Using New Technologies in Cyber Security Solutions (2nd Edition))
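
The core idea behind interpolation-based RDH is that original pixels are never modified, only newly interpolated ones. The following is a toy sketch of that principle (it is not the paper's Interpolation Near Pixels method, and the "forest fire" statistical-preservation stage is omitted): originals sit on even coordinates of a 2× upscaled image, so the cover is exactly recoverable, while payload bits go into the LSBs of the interpolated pixels.

```python
import numpy as np

def interpolate_and_embed(img, bits):
    """Upscale 2x by neighbour averaging; hide bits in interpolated LSBs."""
    h, w = img.shape
    up = np.zeros((2 * h - 1, 2 * w - 1), dtype=np.int64)
    up[::2, ::2] = img                                   # originals untouched
    up[::2, 1::2] = (up[::2, :-2:2] + up[::2, 2::2]) // 2    # horizontal means
    up[1::2, ::2] = (up[:-2:2, ::2] + up[2::2, ::2]) // 2    # vertical means
    up[1::2, 1::2] = (up[:-2:2, :-2:2] + up[2::2, 2::2]) // 2  # diagonal means
    interp = np.ones(up.shape, dtype=bool)
    interp[::2, ::2] = False                             # interpolated positions
    vals = up[interp]
    n = min(len(bits), vals.size)
    vals[:n] = (vals[:n] & ~1) | np.asarray(bits[:n])    # LSB substitution
    up[interp] = vals
    return up

def extract_and_recover(up, n_bits):
    """Read the payload back and reconstruct the original cover exactly."""
    interp = np.ones(up.shape, dtype=bool)
    interp[::2, ::2] = False
    bits = (up[interp][:n_bits] & 1).tolist()
    return bits, up[::2, ::2].copy()
```

Reversibility is free here by construction: extraction simply reads the even-coordinate grid.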

19 pages, 8091 KB  
Article
Leveraging Synthetic Degradation for Effective Training of Super-Resolution Models in Dermatological Images
by Francesco Branciforti, Kristen M. Meiburger, Elisa Zavattaro, Paola Savoia and Massimo Salvi
Electronics 2025, 14(15), 3138; https://doi.org/10.3390/electronics14153138 - 6 Aug 2025
Viewed by 1006
Abstract
Teledermatology relies on digital transfer of dermatological images, but compression and resolution differences compromise diagnostic quality. Image enhancement techniques are crucial to compensate for these differences and improve quality for both clinical assessment and AI-based analysis. We developed a customized image degradation pipeline simulating common artifacts in dermatological images, including blur, noise, downsampling, and compression. This synthetic degradation approach enabled effective training of DermaSR-GAN, a super-resolution generative adversarial network tailored for dermoscopic images. The model was trained on 30,000 high-quality ISIC images and evaluated on three independent datasets (ISIC Test, Novara Dermoscopic, PH2) using structural similarity and no-reference quality metrics. DermaSR-GAN achieved statistically significant improvements in quality scores across all datasets, with up to 23% enhancement in perceptual quality metrics (MANIQA). The model preserved diagnostic details while doubling resolution and surpassed existing approaches, including traditional interpolation methods and state-of-the-art deep learning techniques. Integration with downstream classification systems demonstrated up to 14.6% improvement in class-specific accuracy for keratosis-like lesions compared to original images. Synthetic degradation represents a promising approach for training effective super-resolution models in medical imaging, with significant potential for enhancing teledermatology applications and computer-aided diagnosis systems. Full article
(This article belongs to the Section Computer Science & Engineering)
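
A degradation pipeline of the kind described (blur, downsampling, noise, quantisation) can be sketched as below. This is a simplified stand-in, assuming a single-channel image: a box blur replaces the paper's unspecified blur kernel, and JPEG compression is approximated by 8-bit quantisation only.

```python
import numpy as np

def degrade(img, blur_k=3, noise_sigma=5.0, scale=2, seed=0):
    """Synthetically degrade a high-quality image into a training input."""
    rng = np.random.default_rng(seed)
    h, w = img.shape
    pad = blur_k // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    blurred = np.zeros((h, w))
    for dy in range(blur_k):                 # box blur as a simple stand-in
        for dx in range(blur_k):
            blurred += padded[dy:dy + h, dx:dx + w]
    blurred /= blur_k ** 2
    low = blurred[::scale, ::scale]          # downsample by decimation
    noisy = low + rng.normal(0.0, noise_sigma, low.shape)   # sensor noise
    return np.clip(np.round(noisy), 0, 255).astype(np.uint8)  # quantise
```

Pairs of (degraded, original) images produced this way are what make supervised training of a super-resolution GAN possible without real low/high-quality pairs.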

23 pages, 1972 KB  
Article
Multi-Scale Fusion MaxViT for Medical Image Classification with Hyperparameter Optimization Using Super Beluga Whale Optimization
by Jiaqi Zhao, Tiannuo Liu and Lin Sun
Electronics 2025, 14(5), 912; https://doi.org/10.3390/electronics14050912 - 25 Feb 2025
Cited by 1 | Viewed by 1882
Abstract
This study presents an enhanced deep learning model, Multi-Scale Fusion MaxViT (MSF-MaxViT), designed for medical image classification. The aim is to improve both the accuracy and robustness of the image classification task. MSF-MaxViT incorporates a Parallel Attention mechanism for fusing local and global features, inspired by the MaxViT Block and Multihead Dynamic Attention, to improve feature representation. It also combines lightweight components such as the novel Multi-Scale Fusion Attention (MSFA) block, Feature Boosting (FB) Block, Coord Attention, and Edge Attention to enhance spatial and channel feature learning. To optimize the hyperparameters in the network model, the Super Beluga Whale Optimization (SBWO) algorithm is used, which combines bi-interpolation with adaptive parameter tuning; experiments show that it achieves excellent convergence performance. The network model, combined with the improved SBWO algorithm, achieves an image classification accuracy of 92.87% on the HAM10000 dataset, 1.85 percentage points higher than MaxViT, demonstrating the practicality and effectiveness of the model. Full article

16 pages, 3603 KB  
Article
Improvement of a Subpixel Convolutional Neural Network for a Super-Resolution Image
by Muhammed Fatih Ağalday and Ahmet Çinar
Appl. Sci. 2025, 15(5), 2459; https://doi.org/10.3390/app15052459 - 25 Feb 2025
Cited by 3 | Viewed by 2592
Abstract
Super-resolution technologies are among the tools used in image restoration, which aims to obtain high-resolution content from low-resolution images. Super-resolution technology aims to increase the quality of a low-resolution image by reconstructing it. It is a useful technology, especially for content where low-resolution images need to be enhanced. Super-resolution applications are used in areas such as face recognition, medical imaging, and satellite imaging. Deep neural network models used for single-image super-resolution are quite successful in terms of computational performance. In these models, low-resolution images are converted to high resolution using methods such as bicubic interpolation. Since the super-resolution process is performed in the high-resolution space, it adds memory cost and computational complexity. In our proposed model, a low-resolution image is given as input to a convolutional neural network to reduce computational complexity. In this model, a subpixel convolution layer is presented that learns a series of filters to upscale low-resolution feature maps into high-resolution images. In our proposed model, convolution layers are added to the efficient subpixel convolutional neural network (ESPCN) model, and to prevent gradient loss, we transfer the feature information of each layer from the previous layer to the next upper layer. The residual ESPCN (R-ESPCN) model proposed in this paper is restructured to reduce the time required to perform super-resolution operations on images in real time. The results show that our method significantly improves accuracy and demonstrates the applicability of deep learning methods in the field of image data processing. Full article
(This article belongs to the Special Issue Application of Artificial Intelligence in Image Processing)
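
The subpixel convolution layer at the heart of ESPCN-style models ends with a "pixel shuffle" rearrangement: C·r² low-resolution feature maps become C high-resolution maps. The rearrangement itself (not the learned filters) can be sketched in NumPy, following the usual depth-to-space convention:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) tensor into (C, H*r, W*r).
    Each group of r*r channels supplies the r x r subpixel block
    for every low-resolution spatial position."""
    c2, h, w = x.shape
    c = c2 // (r * r)
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)   # -> (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)
```

Because the convolutions run at low resolution and only this cheap reshuffle produces the high-resolution output, the memory and compute cost of upscaling first (e.g. via bicubic interpolation) is avoided.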

20 pages, 42222 KB  
Article
WGAN-GP for Synthetic Retinal Image Generation: Enhancing Sensor-Based Medical Imaging for Classification Models
by Héctor Anaya-Sánchez, Leopoldo Altamirano-Robles, Raquel Díaz-Hernández and Saúl Zapotecas-Martínez
Sensors 2025, 25(1), 167; https://doi.org/10.3390/s25010167 - 31 Dec 2024
Cited by 8 | Viewed by 3633
Abstract
Accurate synthetic image generation is crucial for addressing data scarcity challenges in medical image classification tasks, particularly in sensor-derived medical imaging. In this work, we propose a novel method using a Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP) and nearest-neighbor interpolation to generate high-quality synthetic images for diabetic retinopathy classification. Our approach enhances training datasets by generating realistic retinal images that retain critical pathological features. We evaluated the method across multiple retinal image datasets, including Retinal-Lesions, Fine-Grained Annotated Diabetic Retinopathy (FGADR), Indian Diabetic Retinopathy Image Dataset (IDRiD), and the Kaggle Diabetic Retinopathy dataset. The proposed method outperformed traditional generative models, such as conditional GANs and PathoGAN, achieving the best performance on key metrics: a Fréchet Inception Distance (FID) of 15.21, a Mean Squared Error (MSE) of 0.002025, and a Structural Similarity Index (SSIM) of 0.89 in the Kaggle dataset. Additionally, expert evaluations revealed that only 56.66% of synthetic images could be distinguished from real ones, demonstrating the high fidelity and clinical relevance of the generated data. These results highlight the effectiveness of our approach in improving medical image classification by generating realistic and diverse synthetic datasets. Full article
(This article belongs to the Collection Medical Applications of Sensor Systems and Devices)
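
Two ingredients named in this abstract are simple enough to sketch: nearest-neighbour interpolation for upscaling, and the random real/fake interpolation at which WGAN-GP evaluates its gradient penalty. A minimal NumPy sketch (the full GAN training loop requires an autodiff framework and is omitted):

```python
import numpy as np

def nn_upscale(img, scale):
    """Nearest-neighbour interpolation: repeat each pixel scale x scale."""
    return np.repeat(np.repeat(img, scale, axis=0), scale, axis=1)

def gp_interpolate(real, fake, seed=None):
    """Random points on the segment between real and fake samples; in
    WGAN-GP the critic's gradient norm is penalised toward 1 at these."""
    rng = np.random.default_rng(seed)
    eps = rng.uniform(size=(real.shape[0],) + (1,) * (real.ndim - 1))
    return eps * real + (1.0 - eps) * fake
```

Per-sample interpolation coefficients (one `eps` per batch element) are the standard WGAN-GP formulation.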

21 pages, 7299 KB  
Article
RDAG U-Net: An Advanced AI Model for Efficient and Accurate CT Scan Analysis of SARS-CoV-2 Pneumonia Lesions
by Chih-Hui Lee, Cheng-Tang Pan, Ming-Chan Lee, Chih-Hsuan Wang, Chun-Yung Chang and Yow-Ling Shiue
Diagnostics 2024, 14(18), 2099; https://doi.org/10.3390/diagnostics14182099 - 23 Sep 2024
Cited by 3 | Viewed by 2103
Abstract
Background/Objective: This study aims to utilize advanced artificial intelligence (AI) image recognition technologies to establish a robust system for identifying features in lung computed tomography (CT) scans, thereby detecting respiratory infections such as SARS-CoV-2 pneumonia. Specifically, the research focuses on developing a new model called Residual-Dense-Attention Gates U-Net (RDAG U-Net) to improve accuracy and efficiency in identification. Methods: This study employed Attention U-Net, Attention Res U-Net, and the newly developed RDAG U-Net model. RDAG U-Net extends the U-Net architecture by incorporating ResBlock and DenseBlock modules in the encoder to retain training parameters and reduce computation time. The training dataset includes 3,520 CT scans from an open database, augmented to 10,560 samples through data enhancement techniques. The research also focused on optimizing convolutional architectures, image preprocessing, interpolation methods, data management, and extensive fine-tuning of training parameters and neural network modules. Result: The RDAG U-Net model achieved an outstanding accuracy of 93.29% in identifying pulmonary lesions, with a 45% reduction in computation time compared to other models. The study demonstrated that RDAG U-Net performed stably during training and exhibited good generalization capability by evaluating loss values, model-predicted lesion annotations, and validation-epoch curves. Furthermore, using ITK-Snap to convert 2D predictions into 3D lung and lesion segmentation models, the results delineated lesion contours, enhancing interpretability. Conclusion: The RDAG U-Net model showed significant improvements in accuracy and efficiency in the analysis of CT images for SARS-CoV-2 pneumonia, achieving a 93.29% recognition accuracy and reducing computation time by 45% compared to other models. These results indicate the potential of the RDAG U-Net model in clinical applications, as it can accelerate the detection of pulmonary lesions and effectively enhance diagnostic accuracy. Additionally, the 2D and 3D visualization results allow physicians to understand lesions' morphology and distribution better, strengthening decision support capabilities and providing valuable medical diagnosis and treatment planning tools. Full article

16 pages, 5420 KB  
Article
A Comparative Analysis of U-Net and Vision Transformer Architectures in Semi-Supervised Prostate Zonal Segmentation
by Guantian Huang, Bixuan Xia, Haoming Zhuang, Bohan Yan, Cheng Wei, Shouliang Qi, Wei Qian and Dianning He
Bioengineering 2024, 11(9), 865; https://doi.org/10.3390/bioengineering11090865 - 26 Aug 2024
Cited by 3 | Viewed by 4274
Abstract
The precise segmentation of different regions of the prostate is crucial in the diagnosis and treatment of prostate-related diseases. However, the scarcity of labeled prostate data poses a challenge for the accurate segmentation of its different regions. We perform the segmentation of different regions of the prostate using U-Net- and Vision Transformer (ViT)-based architectures. We use five semi-supervised learning methods, including entropy minimization, cross pseudo-supervision, mean teacher, uncertainty-aware mean teacher (UAMT), and interpolation consistency training (ICT) to compare the results with the state-of-the-art prostate semi-supervised segmentation network uncertainty-aware temporal self-learning (UATS). The UAMT method improves the prostate segmentation accuracy and provides stable prostate region segmentation results. ICT plays a more stable role in the prostate region segmentation results, which provides strong support for the medical image segmentation task, and demonstrates the robustness of U-Net for medical image segmentation. UATS is still more applicable to the U-Net backbone and has a very significant effect on a positive prediction rate. However, the performance of ViT in combination with semi-supervision still requires further optimization. This comparative analysis applies various semi-supervised learning methods to prostate zonal segmentation. It guides future prostate segmentation developments and offers insights into utilizing limited labeled data in medical imaging. Full article
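
Of the semi-supervised methods compared above, interpolation consistency training (ICT) is directly tied to the search topic: it penalises a model whose prediction on a mixed input differs from the mix of its individual predictions. A minimal sketch of that loss for a generic callable model (batching and the teacher/student machinery of the full method are omitted):

```python
import numpy as np

def ict_consistency_loss(model, x1, x2, lam=0.6):
    """Interpolation Consistency Training loss on unlabeled inputs:
    the prediction at the interpolated input should match the
    interpolation of the two predictions."""
    mixed_pred = model(lam * x1 + (1.0 - lam) * x2)
    target = lam * model(x1) + (1.0 - lam) * model(x2)
    return float(np.mean((mixed_pred - target) ** 2))
```

The loss vanishes exactly for models that are linear between the two samples, which is why it encourages smooth decision boundaries in low-density regions of unlabeled data.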

14 pages, 2697 KB  
Article
An Improved Medical Image Classification Algorithm Based on Adam Optimizer
by Haijing Sun, Wen Zhou, Jiapeng Yang, Yichuan Shao, Lei Xing, Qian Zhao and Le Zhang
Mathematics 2024, 12(16), 2509; https://doi.org/10.3390/math12162509 - 14 Aug 2024
Cited by 8 | Viewed by 3165
Abstract
The complexity and poor legibility of medical images make diagnosis inconvenient and difficult for medical personnel. To address these issues, an optimization algorithm called GSL (Gradient Sine Linear), an improvement on the Adam algorithm, is proposed in this paper; it introduces a gradient pruning strategy, periodic adjustment of the learning rate, and a linear interpolation strategy. The gradient pruning technique scales the gradient to prevent gradient explosion, while the periodic adjustment of the learning rate and the linear interpolation strategy adjust the learning rate according to the characteristics of the sinusoidal function, accelerating convergence while reducing drastic parameter fluctuations and improving the efficiency and stability of training. The experimental results show that, compared to the classic Adam algorithm, this algorithm demonstrates better classification accuracy: the GSL algorithm achieves accuracies of 78% and 75.2% with the MobileNetV2 and ShuffleNetV2 networks on the Gastroenterology dataset, and 84.72% and 83.12% with the same networks on the Glaucoma dataset. The GSL optimizer achieved significant performance improvement on various neural network structures and datasets, proving its effectiveness and practicality in the field of deep learning, and also providing new ideas and methods for solving the difficulties in medical image recognition. Full article
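
Two of the three GSL ingredients admit short sketches: norm-based gradient scaling and a sinusoidal periodic learning-rate schedule. The functions below are hypothetical illustrations of those ideas (the paper's exact formulas, periods, and bounds are not given in the abstract):

```python
import numpy as np

def sinusoidal_lr(step, base_lr=1e-3, period=1000, floor=0.1):
    """Periodic learning rate oscillating between floor*base_lr and base_lr."""
    phase = np.sin(2.0 * np.pi * step / period)
    return base_lr * (floor + (1.0 - floor) * 0.5 * (1.0 + phase))

def prune_gradient(g, max_norm=1.0):
    """Scale the gradient down when its norm exceeds max_norm, preventing
    gradient explosion while preserving the update direction."""
    norm = np.linalg.norm(g)
    return g * (max_norm / norm) if norm > max_norm else g
```

Linear interpolation between scheduled values would smooth the transitions further; that step is omitted here.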

14 pages, 1857 KB  
Article
A Linear Interpolation and Curvature-Controlled Gradient Optimization Strategy Based on Adam
by Haijing Sun, Wen Zhou, Yichuan Shao, Jiaqi Cui, Lei Xing, Qian Zhao and Le Zhang
Algorithms 2024, 17(5), 185; https://doi.org/10.3390/a17050185 - 29 Apr 2024
Cited by 8 | Viewed by 2667
Abstract
The Adam algorithm is a widely used optimizer for neural network training due to its efficient convergence. However, it is prone to unstable learning rates and performance degradation on some models. To solve these problems, this paper proposes an improved algorithm named Linear Curvature Momentum Adam (LCMAdam), which introduces curvature-controlled gradient and linear interpolation strategies. The curvature-controlled gradient makes gradient updates smoother, and the linear interpolation technique adaptively adjusts the learning rate according to the characteristics of the curve during training so that it can find the exact value faster, improving the efficiency and robustness of training. The experimental results show that the LCMAdam algorithm achieves 98.49% accuracy on the MNIST dataset, 75.20% on the CIFAR10 dataset, and 76.80% on the Stomach dataset of harder-to-recognize medical images. The LCMAdam optimizer achieves significant performance gains on a variety of neural network structures and tasks, proving its effectiveness and utility in the field of deep learning. Full article

30 pages, 4929 KB  
Article
A Random Particle Swarm Optimization Based on Cosine Similarity for Global Optimization and Classification Problems
by Yujia Liu, Yuan Zeng, Rui Li, Xingyun Zhu, Yuemai Zhang, Weijie Li, Taiyong Li, Donglin Zhu and Gangqiang Hu
Biomimetics 2024, 9(4), 204; https://doi.org/10.3390/biomimetics9040204 - 28 Mar 2024
Cited by 6 | Viewed by 2549
Abstract
In today’s fast-paced and ever-changing environment, the need for algorithms with enhanced global optimization capability has become increasingly crucial due to the emergence of a wide range of optimization problems. To tackle this issue, we present a new algorithm called Random Particle Swarm Optimization (RPSO) based on cosine similarity. RPSO is evaluated using both the IEEE Congress on Evolutionary Computation (CEC) 2022 test dataset and Convolutional Neural Network (CNN) classification experiments. The RPSO algorithm builds upon the traditional PSO algorithm by incorporating several key enhancements. Firstly, the parameter selection is adapted and a mechanism called Random Contrastive Interaction (RCI) is introduced. This mechanism fosters information exchange among particles, thereby improving the ability of the algorithm to explore the search space more effectively. Secondly, quadratic interpolation (QI) is incorporated to boost the local search efficiency of the algorithm. RPSO utilizes cosine similarity for the selection of both QI and RCI, dynamically updating population information to steer the algorithm towards optimal solutions. In the evaluation using the CEC 2022 test dataset, RPSO is compared with recent variations of Particle Swarm Optimization (PSO) and top algorithms in the CEC community. The results highlight the strong competitiveness and advantages of RPSO, validating its effectiveness in tackling global optimization tasks. Additionally, in the classification experiments with optimizing CNNs for medical images, RPSO demonstrated stability and accuracy comparable to other algorithms and variants. This further confirms the value and utility of RPSO in improving the performance of CNN classification tasks. Full article
(This article belongs to the Special Issue Nature-Inspired Metaheuristic Optimization Algorithms 2024)
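
The quadratic interpolation (QI) step used for local search in RPSO has a standard closed form: fit a parabola through three evaluated points and jump to its vertex. A minimal one-dimensional sketch (the paper applies the idea per dimension within the swarm; that wrapping is not shown):

```python
def quadratic_interpolation_min(x1, x2, x3, f1, f2, f3):
    """Vertex of the parabola through (x1,f1), (x2,f2), (x3,f3):
    the candidate point proposed by QI-style local search."""
    num = (x1 ** 2) * (f2 - f3) + (x2 ** 2) * (f3 - f1) + (x3 ** 2) * (f1 - f2)
    den = 2.0 * (x1 * (f2 - f3) + x2 * (f3 - f1) + x3 * (f1 - f2))
    return num / den
```

For an objective that is locally quadratic, the vertex lands on the true minimum in a single step; in RPSO the candidate is accepted only if it improves on the current best.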

20 pages, 6807 KB  
Article
Single Image Super Resolution Using Deep Residual Learning
by Moiz Hassan, Kandasamy Illanko and Xavier N. Fernando
AI 2024, 5(1), 426-445; https://doi.org/10.3390/ai5010021 - 21 Mar 2024
Cited by 11 | Viewed by 7494
Abstract
Single Image Super Resolution (SISR) is an intriguing research topic in computer vision where the goal is to create high-resolution images from low-resolution ones using innovative techniques. SISR has numerous applications in fields such as medical/satellite imaging, remote target identification and autonomous vehicles. Compared to traditional interpolation-based approaches, deep learning techniques have recently gained attention in SISR due to their superior performance and computational efficiency. This article proposes an Autoencoder-based Deep Learning Model for SISR. The down-sampling part of the Autoencoder mainly uses 3 by 3 convolution and has no subsampling layers. The up-sampling part uses transpose convolution and residual connections from the down-sampling part. The model is trained using a subset of the VILRC ImageNet database as well as the RealSR database. Quantitative metrics such as PSNR and SSIM are found to be as high as 76.06 and 0.93 in our testing. We also used qualitative measures such as perceptual quality. Full article
(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)
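
The PSNR metric reported by this and several other entries above is a direct function of the mean squared error; a minimal sketch for 8-bit images:

```python
import numpy as np

def psnr(reference, reconstruction, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference image and a
    reconstruction; higher is better, infinite for a perfect match."""
    mse = np.mean((reference.astype(float) - reconstruction.astype(float)) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))
```

Because PSNR depends only on pixel-wise error, it is usually paired with a structural metric such as SSIM, as the papers in this listing do.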
