Search Results (65)

Search Parameters:
Keywords = Mish activation function

20 pages, 4080 KB  
Article
Lightweight and Accurate Table Recognition via Improved SLANet with Multi-Phase Training Strategy
by Liu Mao, Yujie Xiao, Kaihang Du, Jie Shen and Xia Xie
Mathematics 2026, 14(1), 25; https://doi.org/10.3390/math14010025 - 21 Dec 2025
Abstract
Tables, as an efficient form of structured data representation, are widely applied across domains. However, traditional manual processing methods are inadequate in the big data era, and existing table recognition models, such as SLANet, still face performance limitations. To address these issues, this paper proposes an improved SLANet framework. First, the original H-Swish activation is replaced with the Mish function to enhance feature representation. Second, an end-of-sequence (EOS) termination mechanism is introduced to reduce computational redundancy during inference. Third, a three-phase training strategy is designed to achieve progressive performance improvements. Experimental evaluation on the PubTabNet benchmark demonstrates that the improved SLANet achieves 77.25% accuracy with an average inference time of 774 ms, outperforming the baseline and most mainstream algorithms while retaining lightweight efficiency. The proposed algorithm achieves a TEDS score of 96.67%, significantly surpassing SLANet-based and other state-of-the-art methods. The code will be released upon acceptance. Full article
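For reference, the activation swap mentioned in the abstract above can be illustrated with the standard published definitions of the two functions: Mish is x * tanh(softplus(x)), and H-Swish is x * ReLU6(x + 3) / 6. The following is a minimal scalar sketch in plain Python, not the paper's implementation; the function names are chosen here for illustration only.

```python
import math


def softplus(x: float) -> float:
    # Numerically stable form of log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    # Mish: smooth, non-monotonic, slightly negative for negative inputs.
    return x * math.tanh(softplus(x))


def h_swish(x: float) -> float:
    # H-Swish: piecewise approximation of Swish built from ReLU6.
    relu6 = min(max(x + 3.0, 0.0), 6.0)
    return x * relu6 / 6.0


if __name__ == "__main__":
    for value in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"x={value:+.1f}  mish={mish(value):+.4f}  h_swish={h_swish(value):+.4f}")
```

Both functions pass through zero and approach the identity for large positive inputs. The difference is in the negative range, where Mish stays smooth while H-Swish is exactly zero for x <= -3.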

21 pages, 2419 KB  
Article
GC-FSegNet: A Flotation Froth Segmentation Network with Integrated Global Context Awareness
by Pengcheng Zhu, Zhihong Jiang, Zhen Peng and Gaipin Cai
Minerals 2025, 15(12), 1301; https://doi.org/10.3390/min15121301 - 12 Dec 2025
Viewed by 168
Abstract
Precise segmentation of flotation froths is a critical bottleneck to achieving intelligent perception and optimal control of process operations. Traditional convolutional neural networks (CNNs) are inherently limited by local receptive fields, making it challenging to accurately segment adhesive and multi-scale froths. To address this fundamental issue, this paper proposes a deep segmentation network with integrated global context awareness, termed GC-FSegNet, which establishes a new paradigm capable of jointly modeling macro-level structures and micro-level details. The proposed GC-FSegNet innovatively integrates the Global Context Network (GCNet) module into both the encoder and decoder of a Nested U-Net architecture. The GCNet captures long-range dependencies between froths, enabling macro-level modeling of clustered foam structures, while the Nested U-Net preserves high-resolution boundary details. Through their synergistic interaction, the model achieves simultaneous and efficient representation of both global contours and local details of froth images. Furthermore, the Mish activation function is employed to enhance the learning of weak boundary features, and a combined Dice and Binary Cross-Entropy (BCE) loss function is designed to optimize boundary segmentation accuracy. Experimental results on a self-constructed copper–lead flotation froth dataset demonstrate that GC-FSegNet achieves an mDice of 0.9443, mIoU of 0.8945, mRecall of 0.9866, and mPrecision of 0.9705, significantly outperforming mainstream models such as U-Net and DeepLabV3+. This study not only provides a reliable technical solution for high-adhesion froth segmentation but, more importantly, introduces a promising “global–local collaborative modeling” framework that can be extended to a wide range of complex industrial image segmentation scenarios. Full article
(This article belongs to the Section Mineral Processing and Extractive Metallurgy)
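The abstract above mentions a combined Dice and Binary Cross-Entropy loss but does not give its exact form. The sketch below shows one common way such a combination is written, over flat lists of predicted foreground probabilities and binary targets. The equal weighting (`dice_weight=0.5`), the smoothing constant, and all function names are assumptions made for illustration, not details taken from the paper.

```python
import math


def bce_loss(probs, targets, eps=1e-7):
    # Mean binary cross-entropy; probabilities are clamped to avoid log(0).
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)


def dice_loss(probs, targets, smooth=1.0):
    # Soft Dice loss: 1 - (2 * overlap + smooth) / (sum of both masks + smooth).
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


def dice_bce_loss(probs, targets, dice_weight=0.5):
    # Hypothetical weighting: the paper's actual coefficients are not stated.
    return (dice_weight * dice_loss(probs, targets)
            + (1.0 - dice_weight) * bce_loss(probs, targets))


if __name__ == "__main__":
    print(dice_bce_loss([0.9, 0.1, 0.8, 0.2], [1, 0, 1, 0]))
```

The Dice term rewards overlap on the whole mask, which helps with thin or small regions, while the BCE term supplies a per-pixel gradient.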

34 pages, 1741 KB  
Article
TRex: A Smooth Nonlinear Activation Bridging Tanh and ReLU for Stable Deep Learning
by Ahmad Raza Khan and Sarab Almuhaideb
Electronics 2025, 14(23), 4661; https://doi.org/10.3390/electronics14234661 - 27 Nov 2025
Viewed by 295
Abstract
Activation functions are fundamental to the representational capacity and optimization dynamics of deep neural networks. Although numerous nonlinearities have been proposed, ranging from classical sigmoid and tanh to modern smooth and trainable functions, no single activation is universally optimal, as each involves trade-offs among gradient flow, stability, computational cost, and expressiveness. This study introduces TRex, a novel activation function that combines the efficiency and linear growth of rectified units with the smooth gradient propagation of saturating functions. TRex features a non-zero, smoothed negative region inspired by tanh while maintaining near-linear behavior for positive inputs, preserving gradients and reducing neuron inactivation. We evaluate TRex against five widely used activation functions (ReLU, ELU, Swish, Mish, and GELU) across eight convolutional architectures (AlexNet, DenseNet-121, EfficientNet-B0, GoogLeNet, LeNet, MobileNet-V2, ResNet-18, and VGGNet) on two benchmark datasets (MNIST and Fashion-MNIST) and a real-world medical imaging dataset (SkinCancer). The results show that TRex achieves competitive accuracy, AUC, and convergence stability across most deep, connectivity-rich architectures while maintaining computational efficiency comparable to that of other smooth activations. These findings highlight TRex as a contextually efficient activation function that enhances gradient flow, generalization, and training stability, particularly in deeper or densely connected architectures, while offering comparable performance in lightweight and mobile-optimized models.
(This article belongs to the Section Artificial Intelligence)

28 pages, 3935 KB  
Article
A Novel Road Crack Detection Method Based on the YOLO Algorithm
by Li Fan, Qiuyin Xia and Jiancheng Zou
Appl. Sci. 2025, 15(23), 12354; https://doi.org/10.3390/app152312354 - 21 Nov 2025
Viewed by 526
Abstract
With the exponential growth of road transportation infrastructure, the need for pavement maintenance has increased significantly. Surface cracking represents a critical evaluation metric in roadway inspection. Conventional manual inspection methods impose substantial demands on personnel resources, time investment, and operational safety while being susceptible to subjective assessment biases. Leveraging advancements in computer vision technology, researchers have progressively investigated automated solutions for infrastructure defect identification. This study presents an enhanced deep learning framework for pavement crack detection within computer science applications, featuring three principal innovations: implementation of the SIoU loss function for improved boundary regression, adoption of the Mish activation function to enhance feature representation, and integration of the EfficientFormerV2 attention mechanism for optimized computational efficiency. Experimental validation confirms the technical feasibility of our approach, demonstrating measurable improvements in processing efficiency and computational speed compared to baseline methods. Full article

23 pages, 16159 KB  
Article
Adaptive Multi-Scale Feature Learning Module for Pediatric Pneumonia Recognition in Chest X-Rays
by Petra Radočaj, Goran Martinović and Dorijan Radočaj
Appl. Sci. 2025, 15(21), 11824; https://doi.org/10.3390/app152111824 - 6 Nov 2025
Viewed by 533
Abstract
Pneumonia remains a major global health concern, particularly among pediatric populations in low-resource settings where radiological expertise is limited. This study investigates the enhancement of deep convolutional neural networks (CNNs) for automated pneumonia diagnosis from chest X-ray images through the integration of a novel module combining Inception blocks, Mish activation, and Batch Normalization (IncMB). Four state-of-the-art transfer learning models—InceptionV3, InceptionResNetV2, MobileNetV2, and DenseNet201—were evaluated in their base form and with the proposed IncMB extension. Comparative analysis based on standardized classification metrics reveals consistent performance improvements across all models with the addition of the IncMB module. The most notable improvement was observed in InceptionResNetV2, where the IncMB-enhanced model achieved the highest accuracy of 0.9812, F1-score of 0.9761, precision of 0.9781, recall of 0.9742, and strong specificity of 0.9590. Other models also demonstrated similar trends, confirming that the IncMB module contributes to better generalization and discriminative capability. These enhancements were achieved while reducing the total number of parameters, indicating improved computational efficiency. In conclusion, the integration of IncMB significantly boosts the performance of CNN-based pneumonia classifiers, offering a promising direction for the development of lightweight, high-performing diagnostic tools suitable for real-world clinical application, particularly in underserved healthcare environments. Full article
(This article belongs to the Special Issue Engineering Applications of Hybrid Artificial Intelligence Tools)

25 pages, 8228 KB  
Article
Soybean Seed Classification and Identification Based on Corner Point Multi-Feature Segmentation and Improved MobileViT
by Yu Xia, Rui Zhu, Fan Ji, Junlan Zhang, Kunjie Chen and Jichao Huang
AgriEngineering 2025, 7(10), 354; https://doi.org/10.3390/agriengineering7100354 - 21 Oct 2025
Viewed by 694
Abstract
To address the challenges of high model complexity, substantial computational resource consumption, and insufficient classification accuracy in existing soybean seed identification research, we first perform soybean seed segmentation based on polygon features, constructing a dataset comprising five categories: whole seeds, broken seeds, seeds with epidermal damage, immature seeds, and spotted seeds. The MobileViT module is then optimized by employing Depthwise Separable Convolution (DSC) in place of standard convolutions, applying Transformer Half-Dimension (THD) for dimensional reconstruction, and integrating Dynamic Channel Recalibration (DCR) to reduce model parameters and enhance inter-channel interactions. Furthermore, by incorporating the CBAM attention mechanism into the MV2 module and replacing the ReLU6 activation function with the Mish activation function, the model’s feature extraction capability and generalization performance are further improved. These enhancements culminate in a novel soybean seed detection model, MobileViT-SD (MobileViT for Soybean Detection). Experimental results demonstrate that the proposed MobileViT-SD model contains only 2.09 million parameters while achieving a classification accuracy of 98.39% and an F1 score of 98.38%, representing improvements of 2.86% and 2.88%, respectively, over the original MobileViT model. Comparative experiments further show that MobileViT-SD not only outperforms several representative lightweight models in both detection accuracy and efficiency but also surpasses a number of mainstream heavyweight models. Its highly optimized, lightweight architecture combines efficient inference performance with low resource consumption, making it well-suited for deployment in computing-constrained environments, such as edge devices. Full article

27 pages, 6209 KB  
Article
Prediction of Skid Resistance of Asphalt Pavements on Highways Based on Machine Learning: The Impact of Activation Functions and Optimizer Selection
by Xiaoyun Wan, Xiaoqing Yu, Maomao Chen, Haixin Ye, Zhanghong Liu and Qifeng Yu
Symmetry 2025, 17(10), 1708; https://doi.org/10.3390/sym17101708 - 11 Oct 2025
Viewed by 476
Abstract
Skid resistance is a key factor in road safety, directly affecting vehicle stability and braking efficiency. To enhance predictive accuracy, this study develops a multilayer perceptron (MLP) model for forecasting the Sideway Force Coefficient (SFC) of asphalt pavements and systematically examines the role of activation functions and optimizers. Seven activation functions (Sigmoid, Tanh, ReLU, Leaky ReLU, ELU, Mish, Swish) and three optimizers (SGD, RMSprop, Adam) are evaluated using regression metrics (MSE, RMSE, MAE, R²) and loss-curve analysis. Results show that ReLU and Mish provide notable improvements over Sigmoid, with ReLU increasing goodness of fit and accuracy by 13–15%, and Mish further enhancing nonlinear modeling by 12–14%. For optimizers, Adam achieves approximately 18% better performance than SGD, offering faster convergence, higher accuracy, and stronger stability, while RMSprop shows moderate performance. The findings suggest that combining ReLU or Mish with Adam yields highly precise and robust predictions under multi-source heterogeneous inputs. This study offers a reliable methodological reference for intelligent pavement condition monitoring and supports safety management in highway transportation systems.
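The four regression metrics named in the abstract above have standard definitions, sketched below in plain Python for readers who want to reproduce this kind of comparison. The function name and the dictionary layout are illustrative assumptions, and the code is not drawn from the paper.

```python
import math


def regression_metrics(y_true, y_pred):
    # Returns MSE, RMSE, MAE and the coefficient of determination (R^2).
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)    # total sum of squares
    # R^2 is undefined when the targets are constant (ss_tot == 0).
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}


if __name__ == "__main__":
    print(regression_metrics([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```

An R² of 0 means the model does no better than always predicting the mean of the targets, which is a useful baseline when comparing activation functions on the same data.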

23 pages, 4203 KB  
Article
Improved Super-Resolution Reconstruction Algorithm Based on SRGAN
by Guiying Zhang, Tianfu Guo, Zhiqiang Wang, Wenjia Ren and Aryan Joshi
Appl. Sci. 2025, 15(18), 9966; https://doi.org/10.3390/app15189966 - 11 Sep 2025
Viewed by 1242
Abstract
To improve the performance of image super-resolution reconstruction, this paper optimizes the classical SRGAN model architecture. The original SRResNet is replaced with the EDSR network as the generator, which effectively enhances the ability to restore image details. To address the issue of insufficient multi-scale feature extraction in SRGAN during image reconstruction, an LSK attention mechanism is introduced into the generator. By fusing features from different receptive fields through parallel multi-scale convolution kernels, the model improves its ability to capture key details. To mitigate the instability and overfitting problems in the discriminator training, the Mish activation function is used instead of LeakyReLU to improve gradient flow, and a Dropout layer is introduced to enhance the discriminator’s generalization ability, preventing overfitting to the generator. Additionally, a staged training strategy is employed during adversarial training. Experimental results show that the improved model effectively enhances image reconstruction quality while maintaining low complexity. The generated results exhibit clearer details and more natural visual effects. On the public datasets Set5, Set14, and BSD100, compared to the original SRGAN, the PSNR and SSIM metrics improved by 13.4% and 5.9%, 9.9% and 6.0%, and 6.8% and 5.8%, respectively, significantly enhancing the reconstruction of super-resolution images, achieving more refined and realistic image quality improvement. The model also demonstrates stronger generalization ability on complex cross-domain data, such as remote sensing images and medical images. The improved model achieves higher-quality image reconstruction and more natural visual effects while maintaining moderate computational overhead, validating the effectiveness of the proposed improvements. Full article
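PSNR, one of the two quality metrics reported in the abstract above, is defined as 10 * log10(MAX² / MSE). The sketch below computes it over flat lists of pixel values. It assumes 8-bit images (peak value 255), and the function name is illustrative; SSIM is omitted because it needs windowed statistics that do not fit a short example.

```python
import math


def psnr(reference, reconstructed, peak=255.0):
    # Peak signal-to-noise ratio in dB between two equally sized pixel lists.
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return float("inf")  # identical images: no noise at all
    return 10.0 * math.log10(peak * peak / mse)


if __name__ == "__main__":
    print(psnr([100, 100, 100, 100], [101, 99, 100, 102]))
```

Because PSNR is logarithmic, a relative improvement stated as a percentage of the dB value (as in the abstract) is not the same as a percentage reduction in pixel error.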

26 pages, 3558 KB  
Article
Application of Inverse Optimization Algorithms in Neural Network Models for Short-Term Stock Price Forecasting
by Ekaterina Gribanova, Roman Gerasimov and Elena Viktorenko
Big Data Cogn. Comput. 2025, 9(9), 235; https://doi.org/10.3390/bdcc9090235 - 9 Sep 2025
Viewed by 1190
Abstract
This paper introduces novel inverse optimization algorithms (RC and DC) for neural network training in stock price forecasting in an attempt to overcome the traditional gradient descent limitation of local minima convergence. The key novelty is a stochastic algorithm for inverse problems adapted to neural network training, where target function values decrease iteratively through selective weight modification. Experimental analysis used closing price data from 40 Russian companies, comparing traditional activation functions (linear, sigmoid, tanh) with specialized functions (sincos, cloglogm, mish) across perceptrons and single-hidden-layer networks. Key findings show the superiority of the DC method for single-layer networks, while RC proves most effective for hidden-layer networks. The linear activation function with the RC algorithm delivered optimal results in most experiments, challenging conventional nonlinear activation preferences. The optimal architecture, namely, a single hidden layer with two neurons, achieved the best prediction accuracy in 70% of cases. The research confirms that inverse optimization algorithms can provide higher training efficiency than classical gradient methods, offering practical improvements for financial forecasting. Full article

27 pages, 13245 KB  
Article
LHRF-YOLO: A Lightweight Model with Hybrid Receptive Field for Forest Fire Detection
by Yifan Ma, Weifeng Shan, Yanwei Sui, Mengyu Wang and Maofa Wang
Forests 2025, 16(7), 1095; https://doi.org/10.3390/f16071095 - 2 Jul 2025
Cited by 1 | Viewed by 874
Abstract
Timely and accurate detection of forest fires is crucial for protecting forest ecosystems. However, traditional monitoring methods face significant challenges in effectively detecting forest fires, primarily due to the dynamic spread of flames and smoke, irregular morphologies, and the semi-transparent nature of smoke, which make it extremely difficult to extract key visual features. Additionally, deploying these detection systems to edge devices with limited computational resources remains challenging. To address these issues, this paper proposes a lightweight hybrid receptive field model (LHRF-YOLO), which leverages deep learning to overcome the shortcomings of traditional monitoring methods for fire detection on edge devices. First, a hybrid receptive field extraction module is designed by integrating the 2D selective scan mechanism with a residual multi-branch structure. This significantly enhances the model’s contextual understanding of the entire image scene while maintaining low computational complexity. Second, a dynamic enhanced downsampling module is proposed, which employs feature reorganization and channel-wise dynamic weighting strategies to minimize the loss of critical details, such as fine smoke textures, while reducing image resolution. Furthermore, a scale-weighted fusion module is introduced to optimize multi-scale feature fusion through adaptive weight allocation, addressing the issues of information dilution and imbalance caused by traditional fusion methods. Finally, the Mish activation function replaces the SiLU activation function to improve the model’s ability to capture flame edges and faint smoke textures. Experimental results on the self-constructed Fire-SmokeDataset demonstrate that LHRF-YOLO achieves significant model compression while further improving accuracy compared to the baseline model YOLOv11. The parameter count is reduced to only 2.25M (a 12.8% reduction), computational complexity to 5.4 GFLOPs (a 14.3% decrease), and mAP50 is increased to 87.6%, surpassing the baseline model. Additionally, LHRF-YOLO exhibits leading generalization performance on the cross-scenario M4SFWD dataset. The proposed method balances performance and resource efficiency, providing a feasible solution for real-time and efficient fire detection on resource-constrained edge devices with significant research value.
(This article belongs to the Special Issue Forest Fires Prediction and Detection—2nd Edition)

14 pages, 3123 KB  
Article
Impact of Activation Functions on the Detection of Defects in Cast Steel Parts Using YOLOv8
by Yunxia Chen, Yangkai He and Yukun Chu
Materials 2025, 18(12), 2834; https://doi.org/10.3390/ma18122834 - 16 Jun 2025
Viewed by 827
Abstract
In this paper, to address the issue of the unknown influence of activation functions on casting defect detection using convolutional neural networks (CNNs), we designed five sets of experiments to investigate how different activation functions affect the performance of casting defect detection. Specifically, the study employs five activation functions—Rectified Linear Unit (ReLU), Exponential Linear Units (ELU), Softplus, Sigmoid Linear Unit (SiLU), and Mish—each with distinct characteristics, based on the YOLOv8 algorithm. The results indicate that the Mish activation function yields the best performance in casting defect detection, achieving an mAP@0.5 value of 90.1%. In contrast, the Softplus activation function performs the worst, with an mAP@0.5 value of only 86.7%. The analysis of the feature maps shows that the Mish activation function enables the output of negative values, thereby enhancing the model’s ability to differentiate features and improving its overall expressive power, which enhances the model’s ability to identify various types of casting defects. Finally, gradient class activation maps (Grad-CAM) are used to visualize the important pixel regions in the casting digital radiography (DR) images processed by the neural network. The results demonstrate that the Mish activation function improves the model’s focus on grayscale-changing regions in the image, thereby enhancing detection accuracy. Full article

18 pages, 5973 KB  
Article
Power Line Segmentation Algorithm Based on Lightweight Network and Residue-like Cross-Layer Feature Fusion
by Wenqiang Zhu, Huarong Ding, Gujing Han, Wei Wang, Minlong Li and Liang Qin
Sensors 2025, 25(11), 3551; https://doi.org/10.3390/s25113551 - 4 Jun 2025
Viewed by 1125
Abstract
Power line segmentation plays a critical role in ensuring the safety of transmission line UAV inspection flights. To address the challenges of small target scale, complex backgrounds, and excessive model parameters in existing deep learning-based power line segmentation algorithms, this paper introduces RGS-UNet, a lightweight segmentation model integrating a residual-like cross-layer feature fusion module. First, ResNet18 is adopted to reconstruct a UNet backbone network as an encoder module to enhance the network’s feature extraction capability for small targets. Second, ordinary convolution in the residual block of ResNet18 is optimized by introducing the Ghost Module, which significantly reduces the computational load of the model’s backbone network. Third, a residual-like addition method is designed to embed the SIMAM attention mechanism module into both encoder and decoder stages, which improves the model’s ability to extract power lines from complex backgrounds. Finally, the Mish activation function is applied in deep convolutional layers to maintain feature extraction accuracy and mitigate overfitting. Experimental results demonstrate that compared with classical UNet, the optimized network achieves 2.05% and 2.58% improvements in F1-Score and IoU, respectively, while reducing the parameter count to 57.25% of the original model. The algorithm achieves better performance improvements in both accuracy and lightweighting, making it suitable for edge-side deployment. Full article
(This article belongs to the Section Fault Diagnosis & Sensors)

18 pages, 2989 KB  
Article
Interpretable Deep Learning for Pediatric Pneumonia Diagnosis Through Multi-Phase Feature Learning and Activation Patterns
by Petra Radočaj and Goran Martinović
Electronics 2025, 14(9), 1899; https://doi.org/10.3390/electronics14091899 - 7 May 2025
Cited by 3 | Viewed by 3205
Abstract
Pediatric pneumonia remains a critical global health challenge requiring accurate and interpretable diagnostic solutions. Although deep learning has shown potential for pneumonia recognition on chest X-ray images, gaps persist in understanding model interpretability and feature learning during training. We evaluated four convolutional neural network (CNN) architectures, i.e., InceptionV3, InceptionResNetV2, DenseNet201, and MobileNetV2, using three approaches—standard convolution, multi-scale convolution, and strided convolution—all incorporating the Mish activation function. Among the tested models, InceptionResNetV2, with strided convolutions, demonstrated the best performance, achieving an accuracy of 0.9718. InceptionV3 also performed well using the same approach, with an accuracy of 0.9684. For DenseNet201 and MobileNetV2, the multi-scale convolution approach was more effective, with accuracies of 0.9676 and 0.9437, respectively. Gradient-weighted class activation mapping (Grad-CAM) visualizations provided critical insights, e.g., multi-scale convolutions identified diffuse viral pneumonia patterns across wider lung regions, while strided convolutions precisely highlighted localized bacterial consolidations, aligning with radiologists’ diagnostic priorities. These findings establish the following architectural guidelines: strided convolutions are suited to deep hierarchical CNNs, while multi-scale approaches optimize compact models. This research significantly advances the development of interpretable, high-performance diagnostic systems for pediatric pneumonia using chest X-rays, bridging the gap between computational innovation and clinical application. Full article
(This article belongs to the Section Computer Science & Engineering)

15 pages, 3719 KB  
Article
Enhancing Human Pose Transfer with Convolutional Block Attention Module and Facial Loss Optimization
by Hsu-Yung Cheng, Chun-Chen Chiang, Chi-Lun Jiang and Chih-Chang Yu
Electronics 2025, 14(9), 1855; https://doi.org/10.3390/electronics14091855 - 1 May 2025
Viewed by 1755
Abstract
Pose transfer methods often struggle to simultaneously preserve fine-grained clothing textures and facial details, especially under large pose variations. To address these limitations, we propose a model based on the Multi-scale Attention Guided Pose Transfer (MAGPT) model, modifying its residual block by integrating the convolutional block attention module and changing the activation function from ReLU to Mish to capture more features related to clothing and skin color. Additionally, because the generated images had facial features that differed from those of the original image, we propose two different facial feature loss functions to help the model learn more precise image features. According to the experimental results, the proposed method outperforms MAGPT on the DeepFashion dataset, achieving a 3.41% reduction in FID, a 0.65% improvement in SSIM, a 2% decrease in LPIPS, and a 2.7% decrease in LPIPS. With the proposed system architecture, only one reference image is required to enable users to transform into different action videos.
(This article belongs to the Special Issue Machine Learning Techniques for Image Processing)

25 pages, 7481 KB  
Article
Grading Algorithm for Orah Sorting Line Based on Improved ShuffleNet V2
by Yifan Bu, Hao Liu, Hongda Li, Bryan Gilbert Murengami, Xingwang Wang and Xueyong Chen
Appl. Sci. 2025, 15(8), 4483; https://doi.org/10.3390/app15084483 - 18 Apr 2025
Cited by 1 | Viewed by 976
Abstract
This study proposes a grading algorithm for Orah sorting lines based on machine vision and deep learning. The original ShuffleNet V2 network was modified by replacing the ReLU activation function with the Mish activation function to alleviate the neuron death problem. The ECA attention module was incorporated to enhance the extraction of Orah appearance features, and transfer learning was applied to improve model performance. As a result, the ShuffleNet_wogan model was developed. Based on the operational principles of the sorting line, a time-sequential grading algorithm was designed to improve grading accuracy, along with a multi-sampling diameter algorithm for simultaneous Orah diameter measurement. Experimental results show that the ShuffleNet_wogan model achieved an accuracy of 91.12%, a 3.92% improvement compared to the original ShuffleNet V2 network. The average prediction time for processing 10 input images was 51.44 ms. The sorting line achieved a grading speed of 10 Orahs per second, with an appearance grading accuracy of 92.5% and a diameter measurement compliance rate of 98.3%. The proposed algorithm is characterized by high speed and accuracy, enabling efficient Orah sorting. Full article
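The "neuron death" problem mentioned in the abstract above refers to ReLU units whose pre-activation stays negative: their gradient is exactly zero, so they stop updating. The sketch below compares numerical gradients of ReLU and Mish at a negative input to show the difference. It is an illustration built on the standard definitions of both functions, with names chosen here, and is not code from the paper.

```python
import math


def relu(x: float) -> float:
    return max(x, 0.0)


def mish(x: float) -> float:
    # x * tanh(softplus(x)), with a numerically stable softplus.
    softplus = max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return x * math.tanh(softplus)


def numerical_grad(f, x: float, h: float = 1e-5) -> float:
    # Central-difference approximation of df/dx.
    return (f(x + h) - f(x - h)) / (2.0 * h)


if __name__ == "__main__":
    for value in (-2.0, -1.0, -0.5, 1.0):
        print(f"x={value:+.1f}  dReLU={numerical_grad(relu, value):+.4f}  "
              f"dMish={numerical_grad(mish, value):+.4f}")
```

For negative inputs ReLU reports a zero gradient everywhere, while Mish keeps a small nonzero slope over most of that range (it is zero only at its single minimum near x = -1.19), so a unit pushed into the negative region can still receive updates.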