Article

Dynamic Snake Convolution Neural Network for Enhanced Image Super-Resolution

1 School of Software, Northwestern Polytechnical University, Xi’an 710129, China
2 Key Laboratory of Brain-Machine Intelligence Technology, College of Artificial Intelligence, Nanjing University of Aeronautics and Astronautics, Ministry of Education, Nanjing 211106, China
3 School of Computing and Information Systems, University of Melbourne, Parkville 3010, Australia
4 Shenzhen Research Institute of Northwestern Polytechnical University, Northwestern Polytechnical University, Shenzhen 518057, China
5 Yangtze River Delta Research Institute, Northwestern Polytechnical University, Taicang 215400, China
* Author to whom correspondence should be addressed.
Mathematics 2025, 13(15), 2457; https://doi.org/10.3390/math13152457
Submission received: 27 June 2025 / Revised: 23 July 2025 / Accepted: 29 July 2025 / Published: 30 July 2025
(This article belongs to the Special Issue Structural Networks for Image Application)

Abstract

Image super-resolution (SR) is essential for enhancing image quality in critical applications, such as medical imaging and satellite remote sensing. However, existing methods are often limited in their ability to effectively process and integrate multi-scale information, from fine textures to global structures. To address these limitations, this paper proposes DSCNN, a dynamic snake convolution neural network for enhanced image super-resolution. DSCNN optimizes both feature extraction and network architecture to improve performance and efficiency. For feature extraction, the core innovation is a feature extraction and enhancement module with dynamic snake convolution that dynamically adjusts the convolution kernel’s shape and position to better fit the image’s geometric structures, significantly improving feature extraction. For the network structure, DSCNN employs an enhanced residual framework that utilizes parallel convolutional layers and a global feature fusion mechanism to further strengthen feature extraction capability and gradient flow efficiency. Additionally, the network incorporates a SwishReLU-based activation function and a multi-scale convolutional concatenation structure. This multi-scale design effectively captures both local details and global image structure, enhancing SR reconstruction. In summary, the proposed DSCNN outperforms existing methods in both objective metrics and visual perception (e.g., our method achieved the best PSNR and SSIM results on the Set5 ×4 dataset).

1. Introduction

Artificial intelligence (AI) has become integral to various industries, transforming traditional approaches and driving advancements in emerging technologies. As a fundamental AI technology, deep learning has demonstrated remarkable success across diverse fields (e.g., image denoising [1,2], image watermark removal [3], defect recognition [4,5]). Within this domain, single-image SR (SISR) has emerged as a prominent research topic. SISR aims to reconstruct high-resolution (HR) images with richer details from a single low-resolution (LR) input, fundamentally involving the inference of missing high-frequency information to enhance visual quality and clarity. The significance of SISR extends to numerous practical applications. In medical imaging [6], for instance, it facilitates accurate disease diagnosis and supports early detection and treatment. For satellite remote sensing [7], it improves image resolution, enabling clearer monitoring and assessment of natural disasters. In security and surveillance [8], it enhances video footage quality. Moreover, SISR techniques are widely utilized to optimize image quality, improve visual experiences, and support film and television production and restoration.
Driven by rapid advancements in deep learning, super-resolution technology has achieved significant breakthroughs, yielding results that substantially surpass traditional algorithms. Dong et al. [9,10] pioneered the first CNN for SR (SRCNN) with an end-to-end architecture and later proposed fast SRCNN (FSRCNN) to reduce computational costs by using deconvolution for backend upsampling. Seeking further improvements in both speed and performance, Shi et al. [11] introduced an efficient sub-pixel CNN (ESPCNN), incorporating sub-pixel convolution for upsampling to mitigate artifacts commonly associated with deconvolution. Consequently, fixed-scale super-resolution methods have predominantly adopted three main upsampling approaches: interpolation, deconvolution, and sub-pixel convolution. Inspired by ResNet [12], the concept of residual connections gained widespread adoption. Building on this, Kim et al. [13] proposed a very deep CNN (VDSR), employing global residual connections to preserve feature information and enhance reconstruction accuracy. Lim et al. [14] extended this work by proposing the enhanced deep residual network for SISR (EDSR), integrating residual connections and removing batch normalization (BN) layers for improved accuracy, albeit at the cost of increased network width, parameters, and computational load. The deeply recursive convolutional network for SISR (DRCN) [15] was the first to apply recurrent neural network (RNN) concepts to SR, reducing parameter counts. Concurrently, researchers developed lightweight SR networks, such as the convolutional anchored regression network for fast and accurate SISR (CARN) [16] and the attention-in-attention network for image SR (A2N) [17]. To advance performance, the deep Laplacian pyramid network for fast and accurate SR (LapSRN) [18], based on the Laplacian pyramid concept, achieved multi-scale SR reconstruction through a cascaded upsampling approach. Furthermore, the very deep residual channel attention network (RCAN) [19] enhanced performance by stacking residual channel attention blocks (RCABs), combining residual learning with channel attention. Despite these advancements, many prior CNN architectures for SR primarily focused on fixed, single-scale feature extraction. They were often limited in their ability to effectively process and integrate information across diverse scales (i.e., from fine textures to global structures) within a unified framework, frequently resulting in reconstructed images lacking intricate details and satisfactory perceptual fidelity.
To accommodate complex geometric structures, effectively integrate information from different scales, and improve training efficiency to enhance the performance of image SR models, this paper presents a novel CNN model termed DSCNN. The proposed approach aims to comprehensively improve SR capability by strengthening feature extraction and optimizing the network’s architecture. A key innovation is the embedding of dynamic snake convolution (DSConv), which significantly enhances the model’s ability to capture fine image details, thereby boosting the quality and efficiency of SR reconstruction. DSConv dynamically adjusts the shape and position of convolutional kernels, enabling precise adaptation to geometric structures within images and substantially improving feature extraction flexibility and adaptability. An improved residual network architecture is employed to effectively mitigate the vanishing gradient problem during deep network training through residual learning, enhancing both training efficiency and overall performance. The integration of SwishReLU [20] addresses the dead ReLU problem, contributing to improved reconstruction quality. Furthermore, a multi-scale convolution parallel structure is incorporated, enabling the model to simultaneously capture local details and global structures, further optimizing the SR effect. The main contributions of this work are summarized as follows:
(1) DSConv is embedded into the CNN architecture, leveraging its dynamic adjustment mechanism to capture intricate image detail features effectively.
(2) An improved residual structure is utilized to facilitate efficient utilization of multi-level feature information within the network.
(3) A parallel multi-scale convolution structure is integrated into the CNN to enable the model to consolidate both local details and global structural information of the image.
The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 details the proposed network for SISR. Section 4 presents the experimental analysis and results. Finally, Section 5 concludes this paper.

2. Related Work

2.1. Residual Structures for Image Processing

Residual networks (ResNets) [12], pioneered by He et al. for image classification, have become foundational in deep learning due to their efficacy in training very deep networks. The core innovation lies in “shortcut” or “skip” connections, enabling the network to learn residual functions relative to the layer inputs. This design effectively mitigates the vanishing gradient problem, facilitating unimpeded gradient flow during backpropagation and allowing optimization of networks with hundreds of layers. Subsequent research has yielded various ResNet variants aimed at enhancing performance and reducing complexity. He et al. [21] introduced ResNet V2, optimizing training efficiency through component reordering and a pre-activation mechanism. A key challenge in training very deep residual networks is diminishing feature reuse, which leads to significantly slower convergence. To address this problem, Zagoruyko et al. [22] proposed Wider ResNet to enhance model expressiveness by increasing the network’s width while managing computational loads. In CNNs for image classification, the resolution is progressively reduced, culminating in tiny feature maps where the scene’s spatial structure becomes indistinguishable. These problems can be alleviated by dilation, which increases the resolution of output feature maps without reducing the receptive field of individual neurons. Yu et al. [23] incorporated dilated convolutions into dilated ResNet to expand receptive fields and improve feature extraction. To enhance a network’s representational power, several recent methods demonstrate significant improvements by incorporating better spatial encoding strategies. Hu et al. [24] introduced the Squeeze-and-Excitation (SE) module within SENet, integrating channel attention to boost feature representation. Collectively, these variants address deep network optimization challenges through diverse mechanisms, improving computational performance and efficiency.
This powerful residual paradigm has been successfully adapted to low-level image processing tasks [25], particularly SISR. The VDSR network [13] was among the first to leverage this concept effectively, employing a single global residual connection to learn the difference between LR and HR images, yielding significant performance gains. Building on this, EDSR [14] adopted local residual blocks and crucially removed batch normalization (BN) layers, arguing that BN layers, while beneficial in high-level tasks, discard valuable range information and are therefore detrimental to image restoration performance.
Our work extends this foundation by proposing an improved residual structure. This structure aims to ensure stable training while enhancing feature propagation through parallel convolutional pathways and global feature fusion, promoting more comprehensive utilization of multi-level feature information.

2.2. Dynamic Convolution for Image Processing

Standard convolutional layers employ fixed, static kernels applied uniformly across an image. This rigidity limits their ability to adapt to the diverse and complex geometric structures inherent in natural images. Dynamic convolution [26] addresses this limitation by generating kernel weights or sampling locations conditioned on input features, allowing the receptive field to dynamically adjust its shape and focus based on image content. The evolution of dynamic convolution includes key milestones. The deformable convolutional network (DCN) [27] represents a seminal work, learning 2D offsets for the regular sampling grid of standard convolutions. This enables the receptive field to deform and align with object shapes, proving highly beneficial in detection and segmentation tasks. Conditionally parameterized convolution (CondConv) [28] achieves input-adaptive convolution by representing the kernel as a linear combination of multiple expert kernels. Subsequent developments refined these concepts. Dynamic convolution (DyConv) [29] enhanced training stability and performance using a temperature annealing strategy. For 3D data, Dynamic 3D Convolution (DyCo3D) [30] dynamically adjusts kernel parameters based on complex 3D structures, enabling more effective extraction of key information. Significantly, dynamic convolution often improves a model’s performance without incurring significant computational overhead. Table 1 summarizes the key differences between DSConv and existing dynamic convolutions. Unlike DCN’s point-wise offsets or CondConv’s kernel weighting, DSConv employs iterative kernel tracing along curvilinear structures—a unique advantage for preserving continuous edges/textures in SR.
DSConv’s geometric tracing mechanism provides unique advantages for SR: (1) The iterative kernel adjustment preserves edge continuity by preventing sampling point dispersion. (2) The directional focus along primary axes enhances texture coherence. (3) Offset constraints ensure stability when handling high-frequency details.
In this work, we integrate the feature extraction and enhancement module with DSConv [31] into the super-resolution task. Originally designed for segmenting curvilinear structures like blood vessels, DSConv is unique in that it iteratively adjusts the kernel’s shape and position to actively “trace” such features. The integration of DSConv is designed to empower our network to more effectively capture and reconstruct intricate textures, sharp edges, and other crucial high-frequency details that are essential for high-quality SR. These details are frequently smoothed over by static kernels or inadequately modeled by standard dynamic convolution approaches.

2.3. Deep CNNs for Image Super-Resolution

Deep learning has profoundly reshaped SISR. The pioneering SRCNN [9] first demonstrated the superiority of end-to-end learning, significantly outperforming traditional methods. To enhance computational efficiency, subsequent models like the FSRCNN [10] and ESPCNN [11] adopted a post-upsampling strategy. Instead of upsampling the input image at the start, these networks perform feature extraction in the LR space and employ a learnable upsampling layer (e.g., deconvolution or sub-pixel convolution) at the network’s end. This paradigm drastically reduces computational costs and has become standard practice.
Research progression led to increasingly deeper and more sophisticated architectures to boost reconstruction quality. Models like VDSR [13] and the deeply recursive convolutional network (DRCN) [15] explored greater depth using global residual learning and recursive structures. Attention mechanisms were later incorporated, exemplified by the residual channel attention network (RCAN) [19], allowing networks to focus on more informative features. However, a common limitation of many architectures is their tendency to treat all spatial locations equally during feature aggregation, potentially neglecting the influence of more informative pixels and limiting training efficiency. Tian et al. [32] applied asymmetric convolution to SR, enhancing the robustness of the model to rotation and flipping. Despite their remarkable performance, the practical deployment of deep learning methods is often hindered by their substantial computational requirements. Therefore, lightweight networks like the convolutional anchored regression network (CARN) [16] were developed to balance performance and efficiency. More recently, Tian et al. [33] explicitly combined dynamic network principles with SR, proposing a dynamic super-resolution network (DSRNet) that yielded significant performance gains. To effectively address the significant variations in scale and perspective inherent in natural images, jointly learning hierarchical features is a pivotal strategy for image SR. The Residual Dense Network (RDN) [34] further enhanced feature utilization by integrating residual and dense connections with local feature fusion. While deep CNNs generally extract richer structural information, identifying the most impactful layers within a single complex architecture remains challenging. Addressing this, Tian et al. [35] proposed the tree-guided SR network (TSRNet), leveraging a binary tree structure to guide the deep network. This approach aims to amplify the influence of critical nodes and enhance the modeling of hierarchical information relationships, thereby improving image recovery capability.
Despite these impressive advances, many SISR methods still rely predominantly on standard convolutional blocks processing features at a fixed scale. Our proposed DSCNN distinguishes itself by integrating two key innovations: (1) dynamic snake convolution for adaptive feature extraction sensitive to local geometry and (2) a multi-scale parallel structure designed to effectively integrate information across different receptive fields. Together, these components aim to enhance both the capture of fine details and the coherent reconstruction of global structures, leading to more faithful high-resolution image generation.

3. Proposed Method

3.1. Network Architecture

This study proposes a dynamic snake convolution neural network (DSCNN) for SISR. The network architecture comprises four core modules: a feature extraction and enhancement module, a multi-scale feature fusion module, a deep feature extraction module, and an image reconstruction module. The feature extraction and enhancement module hierarchically extracts deep features using dynamic snake convolution (DSConv) [31] blocks integrated with residual connections. The multi-scale feature fusion module then processes these features, integrating information across different scales to enrich the overall feature representation. Finally, the image reconstruction module upsamples the fused feature map and generates the high-resolution output image. This modular design effectively enhances both the objective quality and perceptual fidelity of the super-resolved results (Figure 1).
The feature extraction and enhancement module forms the central core of DSCNN, which is responsible for deep hierarchical feature extraction. This module progressively extracts and refines features through stacked blocks combining DSConv and residual connections. DSConv dynamically adjusts the convolution kernel’s shape and position based on local image geometry, significantly enhancing the network’s capacity to capture fine-grained details and complex textures. Concurrently, the residual connections facilitate stable gradient flow and effective feature propagation across layers, mitigating the vanishing gradient problem inherent in deep architectures.
The multi-scale feature fusion module processes and integrates the outputs from the feature extraction module. It employs a parallel multi-scale convolutional structure to concurrently capture both local details and global structural information. Specifically, this structure utilizes convolution kernels of diverse receptive fields (e.g., 1 × 5, 5 × 1, and 5 × 5) to extract complementary features at different scales. These multi-scale features are then concatenated along the channel dimension, forming a rich and comprehensive feature representation. This process not only significantly enriches the feature maps but also provides robust support for the subsequent high-fidelity image reconstruction.
The image reconstruction module is responsible for upsampling the fused feature map to the target high resolution and generating the final super-resolution image. This module first applies an upsampling operation (e.g., sub-pixel convolution) to increase the spatial resolution. Convolutional layers then map the upsampled features back to the RGB image space. Finally, a scaling operation adjusts the pixel values back to the original input range, yielding the high-quality super-resolution output. This module ensures the generation of visually consistent and natural-looking high-resolution images.
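To make the reconstruction step concrete, the sketch below shows one way a sub-pixel upsampling head of this kind could be written in PyTorch. The class name `ReconstructionHead` and the channel counts are illustrative assumptions, not the paper’s exact configuration.

```python
import torch
import torch.nn as nn

class ReconstructionHead(nn.Module):
    """Minimal sketch of an image reconstruction module: a convolution expands the
    channels by scale**2, PixelShuffle rearranges them into spatial resolution
    (sub-pixel convolution), and a final convolution maps the features back to RGB."""
    def __init__(self, channels=64, scale=4):
        super().__init__()
        self.upsample = nn.Sequential(
            nn.Conv2d(channels, channels * scale ** 2, kernel_size=3, padding=1),
            nn.PixelShuffle(scale),                       # sub-pixel convolution upsampling
        )
        self.to_rgb = nn.Conv2d(channels, 3, kernel_size=3, padding=1)

    def forward(self, feats):
        return self.to_rgb(self.upsample(feats))

# usage: map fused LR-space features to an HR RGB image
head = ReconstructionHead(channels=64, scale=4)
hr = head(torch.randn(1, 64, 48, 48))                     # -> (1, 3, 192, 192)
```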

3.2. Loss Function

The $L_1$ norm [36] is widely adopted as the loss function for super-resolution and other low-level image processing tasks due to its robustness to outliers and superior detail preservation compared to $L_2$. Consequently, we employ the $L_1$ norm to train the DSCNN model. Given a training set $\{I_{LR}^{i}, I_{HR}^{i}\}_{i=1}^{N}$, where $I_{LR}^{i}$ and $I_{HR}^{i}$ denote the $i$-th LR and HR image patches, respectively, and $N$ is the total number of patches, the loss function is defined as follows:
$$L(\theta) = \frac{1}{N} \sum_{i=1}^{N} \left\| \mathrm{DSCNN}\left(I_{LR}^{i}; \theta\right) - I_{HR}^{i} \right\|_{1},$$
where $L(\theta)$ is the $L_1$ loss, $\theta$ represents the learnable parameters of DSCNN, and $\mathrm{DSCNN}(I_{LR}^{i}; \theta)$ denotes the model’s output for input $I_{LR}^{i}$. This loss is optimized using the Adam optimizer.
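A minimal PyTorch sketch of this objective is given below; the convolutional stand-in for the network and the random patch tensors are placeholders for illustration only, not the released training code.

```python
import torch
import torch.nn as nn

# assumed stand-ins: `model` would be the DSCNN network, the tensors real LR/HR patches
model = nn.Conv2d(3, 3, kernel_size=3, padding=1)
criterion = nn.L1Loss()                                   # the L1 objective defined above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

lr_patch = torch.rand(16, 3, 64, 64)                      # I_LR patches (batch of 16)
hr_patch = torch.rand(16, 3, 64, 64)                      # I_HR patches (same size for the dummy model)

sr_patch = model(lr_patch)                                # DSCNN(I_LR; theta)
loss = criterion(sr_patch, hr_patch)                      # (1/N) * sum ||DSCNN(I_LR) - I_HR||_1
optimizer.zero_grad()
loss.backward()
optimizer.step()
```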

3.3. Improved Residual Structure

Standard residual networks (ResNets) [12] effectively mitigate the vanishing gradient problem in CNNs. However, their traditional architecture can be suboptimal for capturing the diverse complex features inherent in natural images. In very deep networks, low-level features such as edges and textures risk being diluted as they propagate through successive layers. To address this limitation, we introduce an enhanced residual learning framework explicitly designed to promote multi-scale feature representation and ensure robust information flow throughout the network.
This enhanced structure offers several key advantages. Firstly, it enables the network to adaptively weight the importance of features at different scales according to image content. Secondly, it retains the core residual benefit of stable training in deep architectures by preserving direct gradient paths through residual connections, bypassing non-linear transformation blocks. Additionally, this framework serves as a powerful backbone, providing a rich, hierarchically fused feature space. This enriched representation notably enhances the effectiveness of our feature extraction and enhancement module in modeling complex geometric deformations.
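Since the enhanced residual framework is described only at a high level, the following is a minimal sketch of how a residual block with parallel convolutional pathways and feature fusion could be arranged. The layer layout, the shallower second path, and the 1 × 1 fusion convolution are assumptions for illustration; the block used in DSCNN may differ.

```python
import torch
import torch.nn as nn

class ParallelResidualBlock(nn.Module):
    """Sketch of an enhanced residual block: two parallel convolutional paths are
    fused by a 1x1 convolution, while the skip connection keeps a direct gradient
    path around the non-linear transformations."""
    def __init__(self, channels=64):
        super().__init__()
        self.path_a = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.path_b = nn.Conv2d(channels, channels, 3, padding=1)     # shallower parallel path
        self.fuse = nn.Conv2d(channels * 2, channels, kernel_size=1)  # feature fusion

    def forward(self, x):
        fused = self.fuse(torch.cat([self.path_a(x), self.path_b(x)], dim=1))
        return x + fused                                              # residual connection
```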

3.4. Feature Extraction and Enhancement Module

To enhance the network’s perception of image structure in super-resolution tasks, a key challenge must be addressed: accurately modeling fine-grained textures and complex edges. Standard convolutional layers, which employ fixed and rigid kernels, exhibit limited adaptability to the diverse geometric patterns inherent in natural image content. This inflexibility can lead to the loss of high-frequency details. We therefore identify the need for a mechanism that can dynamically conform its receptive field to local structures. To this end, we introduce the dynamic snake convolution (DSConv) into this module. While DSConv was initially developed to enhance accuracy and continuity in tubular structure segmentation by tracing object contours, we posit that its core principle of adaptive kernel deformation is highly transferable to the problem of detail reconstruction in super-resolution. Its integration into our feature extraction and enhancement module serves to significantly augment the model’s ability to capture and represent intricate image features (Figure 2).
The core innovation of DSConv lies in its dynamic adaptation of the convolution kernel’s shape and position to better conform to target geometries. Specifically, DSConv computes adaptive offsets for a standard convolution kernel along both the x-axis and y-axis. To maintain attention continuity and prevent excessive receptive field dispersion caused by large deformation offsets, an iterative strategy is employed to determine focal positions. The positions of the sampling grids in the convolution kernel $K$ are determined sequentially. Originating from the central position $K_i$, the location of any subsequent grid $K_{i \pm c}$ is generated based on the position of its predecessor. This process is mathematically defined as follows for the x-axis direction:
$$K_{i \pm c} = \begin{cases} (x_{i+c},\, y_{i+c}) = \left( x_i + c,\; y_i + \sum_{i}^{i+c} \Delta y \right), \\[4pt] (x_{i-c},\, y_{i-c}) = \left( x_i - c,\; y_i + \sum_{i-c}^{i} \Delta y \right). \end{cases}$$
For the y-axis direction, the formula becomes
$$K_{j \pm c} = \begin{cases} (x_{j+c},\, y_{j+c}) = \left( x_j + \sum_{j}^{j+c} \Delta x,\; y_j + c \right), \\[4pt] (x_{j-c},\, y_{j-c}) = \left( x_j + \sum_{j-c}^{j} \Delta x,\; y_j - c \right), \end{cases}$$
where $c$ denotes the horizontal distance from the central grid, and $\Delta x$ and $\Delta y$ represent the dynamically learned offsets along their respective axes, enabling the kernel to flexibly adapt to the target’s geometric contours.
In super-resolution tasks, which inherently involve features at multiple scales and levels of detail, DSConv’s dynamic kernel adjustment capability proves highly beneficial. It allows the model to better capture fine-grained textures and structures within high-resolution images, leading to demonstrable improvements in reconstruction quality.
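The following is a minimal PyTorch sketch of the x-axis variant of this sampling scheme, written for illustration only: offsets are predicted by a small convolution, accumulated outwards from the kernel centre to mimic the iterative rule above, and applied through bilinear grid sampling. The class and parameter names (`SnakeConv1DAxis`, `max_offset`) are assumptions, not the reference DSConv implementation [31].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SnakeConv1DAxis(nn.Module):
    """Illustrative x-axis dynamic snake convolution: each of the k kernel positions
    receives a learned vertical offset that is accumulated outwards from the centre
    (cf. the K_{i±c} rule), and the deformed positions are read with bilinear
    sampling before a 1x1 convolution aggregates them."""
    def __init__(self, in_ch, out_ch, k=9, max_offset=1.0):
        super().__init__()
        assert k % 2 == 1
        self.k, self.max_offset = k, max_offset
        self.offset_conv = nn.Conv2d(in_ch, k, kernel_size=3, padding=1)
        self.agg = nn.Conv2d(in_ch * k, out_ch, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        dy = torch.tanh(self.offset_conv(x)) * self.max_offset          # (b, k, h, w), bounded offsets
        centre = self.k // 2
        # accumulate offsets outwards from the central grid to keep the kernel continuous
        left = torch.flip(torch.cumsum(torch.flip(dy[:, :centre], [1]), dim=1), [1])
        right = torch.cumsum(dy[:, centre + 1:], dim=1)
        dy = torch.cat([left, torch.zeros_like(dy[:, :1]), right], dim=1)

        ys, xs = torch.meshgrid(torch.arange(h, device=x.device, dtype=x.dtype),
                                torch.arange(w, device=x.device, dtype=x.dtype),
                                indexing="ij")
        samples = []
        for i in range(self.k):
            gx = (xs + (i - centre)).clamp(0, w - 1) / (w - 1) * 2 - 1          # fixed horizontal step
            gy = (ys.unsqueeze(0) + dy[:, i]).clamp(0, h - 1) / (h - 1) * 2 - 1  # learned vertical drift
            grid = torch.stack([gx.unsqueeze(0).expand(b, -1, -1), gy], dim=-1)
            samples.append(F.grid_sample(x, grid, align_corners=True))
        return self.agg(torch.cat(samples, dim=1))
```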

3.5. Multi-Scale Feature Fusion Module

To enhance the network’s ability to capture features across diverse spatial scales, we introduce an innovative multi-scale feature fusion module. This design efficiently extracts and integrates multi-scale information by processing the input feature map concurrently through four parallel convolutional layers with distinct kernel sizes: 3 × 3, 5 × 1, 1 × 5, and 5 × 5. This configuration enables the network to simultaneously attend to both localized details and broader structural patterns, yielding richer feature representations. The adoption of asymmetric convolutions such as 1 × 5 and 5 × 1 enhances the model’s robustness to image flipping and rotation. This approach improves the richness of feature representation and network performance without increasing computational overhead [37]. Experimental analyses revealed that optimal structural similarity (SSIM) and peak signal-to-noise ratio (PSNR) results were achieved when the four parallel layers each outputted one-fourth of the input channels (i.e., an output channel ratio of 1:1:1:1).
The resulting feature maps from these parallel paths are concatenated along the channel dimension to form a unified, multi-scale feature representation:
$$x_{\mathrm{concat}} = \mathrm{torch.cat}\left(\left[\, x_{3 \times 3},\; x_{5 \times 1},\; x_{1 \times 5},\; x_{5 \times 5}\,\right],\ \mathrm{dim} = 1\right),$$
where $x_{3 \times 3}$, $x_{5 \times 1}$, $x_{1 \times 5}$, and $x_{5 \times 5}$ denote the outputs of the respective convolutional layers.
This concatenated feature map is then combined with the output of the preceding layer via residual connections. The combined features subsequently pass through additional convolutional layers and activation functions to further enhance their expressive power.
This multi-scale parallel design significantly increases the network’s flexibility in handling image features of varying scales and complexities. It effectively captures both coarse-grained structural information and fine-grained texture details, which are crucial for accurate image super-resolution. Experimental validation confirms that this structure markedly improves model performance, leading to superior restoration of texture details, reduced blurring and distortion, and enhanced visual quality and objective metrics (PSNR and SSIM) in the reconstructed images.
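A minimal PyTorch sketch of this parallel multi-scale layer is given below, assuming 64 input channels split equally (1:1:1:1) across the four branches as in the ablation setting; the exact surrounding layers and hyperparameters are placeholders rather than the released implementation.

```python
import torch
import torch.nn as nn

class MultiScaleFusion(nn.Module):
    """Sketch of the multi-scale feature fusion layer: four parallel convolutions
    with 3x3, 5x1, 1x5, and 5x5 kernels each produce channels//4 feature maps,
    which are concatenated and combined with the input via a residual connection."""
    def __init__(self, channels=64):
        super().__init__()
        branch = channels // 4                                         # 1:1:1:1 output-channel ratio
        self.conv3x3 = nn.Conv2d(channels, branch, kernel_size=3, padding=1)
        self.conv5x1 = nn.Conv2d(channels, branch, kernel_size=(5, 1), padding=(2, 0))
        self.conv1x5 = nn.Conv2d(channels, branch, kernel_size=(1, 5), padding=(0, 2))
        self.conv5x5 = nn.Conv2d(channels, branch, kernel_size=5, padding=2)
        self.act = nn.ReLU(inplace=True)   # ReLU for brevity; the paper adopts a SwishReLU-based activation

    def forward(self, x):
        x_concat = torch.cat([self.conv3x3(x), self.conv5x1(x),
                              self.conv1x5(x), self.conv5x5(x)], dim=1)
        return self.act(x + x_concat)                                  # residual combination with the input
```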

4. Experimental Analysis and Results

4.1. Datasets

Our DSCNN was trained on the DIV2K dataset [38], the standard benchmark for image restoration tasks. We utilized its 800 training images for model optimization and 100 validation images for monitoring convergence and selecting optimal checkpoints. The evaluation was performed on four established benchmark datasets: Set5 [39] (5 natural images for preliminary validation), Set14 [40] (14 images containing diverse natural and artificial structures), B100 [41] (100 images from the Berkeley Segmentation Dataset with varied textures), and Urban100 [42] (100 challenging urban scenes characterized by fine geometric details and self-similar patterns). The human eye is much more sensitive to changes in luminance than changes in chrominance, and the Y channel of an image represents luminance information. Additionally, the vast majority of SR methods report PSNR and SSIM on the Y channel [9]. To ensure a fair comparison with previous work, all quantitative evaluations (PSNR and SSIM) were conducted on the luminance (Y) channel in YCbCr color space.
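As a concrete illustration of this evaluation protocol, the snippet below converts an RGB image to the BT.601 luminance channel and computes PSNR on it. It is a simplified sketch (no border cropping, no SSIM) rather than the authors’ evaluation script.

```python
import numpy as np

def rgb_to_y(img):
    """Convert an HxWx3 RGB image in [0, 255] to the Y (luminance) channel of YCbCr
    using the ITU-R BT.601 coefficients commonly used in SR evaluation."""
    img = img.astype(np.float64)
    return 16.0 + (65.481 * img[..., 0] + 128.553 * img[..., 1] + 24.966 * img[..., 2]) / 255.0

def psnr_y(sr, hr):
    """PSNR (dB) between two RGB images, measured on the Y channel only."""
    mse = np.mean((rgb_to_y(sr) - rgb_to_y(hr)) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)
```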

4.2. Experimental Settings

The proposed network architecture comprises four functional modules. The feature extraction and enhancement module employs one standard convolutional layer followed by three DSConv layers to extract initial features, with residual connections enhancing feature representation. The multi-scale feature fusion module further processes features through six standard convolutional layers combined with a multi-scale convolution module. Subsequent deep feature extraction is performed via sequential convolutional operations, while the final image reconstruction module utilizes a custom upsampling mechanism to generate super-resolved outputs. This modular design ensures clear information flow while optimizing detail reconstruction.
The batch size is 64 and the patch size is 64; the data augmentation strategy uses random cropping, random flipping, and random rotation, followed by normalization. The optimizer is Adam, with a learning rate initialized to $10^{-4}$ and halved every 400,000 iterations. We evaluate the super-resolution results using PSNR (dB) and SSIM. The formal experimental setup defines 1 epoch as 1000 iterations, with a total of 900 training epochs. The ablation experiments were conducted for 600 epochs, also with 1 epoch per 1000 iterations, but with the batch size and patch size reduced to 32 and 16, respectively. All implementations are built on PyTorch 1.13.1 and executed on a server with one NVIDIA GeForce RTX 3090 GPU (NVIDIA, Santa Clara, CA, USA).
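The training schedule described above can be sketched as follows; the convolutional stand-in for the model is a placeholder, and only the optimizer, step decay, and geometric augmentations stated in the text are reproduced.

```python
import random
import torch

model = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)   # stand-in for the DSCNN model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# halve the learning rate every 400,000 iterations (scheduler.step() called once per iteration)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=400_000, gamma=0.5)

def augment(lr_patch, hr_patch):
    """Joint random flip and 90-degree rotation of an LR/HR patch pair (after random cropping)."""
    if random.random() < 0.5:
        lr_patch, hr_patch = torch.flip(lr_patch, [-1]), torch.flip(hr_patch, [-1])
    k = random.randint(0, 3)
    return torch.rot90(lr_patch, k, [-2, -1]), torch.rot90(hr_patch, k, [-2, -1])
```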

4.3. Ablation Study

To validate the contributions of key components in our DSCNN framework, we conducted systematic ablation experiments for ×4 super-resolution. Starting from a baseline CNN with standard residual blocks, we incrementally integrated proposed modules (DSConv layers, multi-scale fusion, etc.) and evaluated performance progression on the Set5, Set14, and B100 datasets. The quantitative results demonstrating component-wise contributions are summarized in Table 2.
The ablation study begins with a baseline model consisting of a standard residual network without enhancements. We first integrated DSConv by replacing standard convolutions in residual blocks, which alone yielded significant performance gains, e.g., a 0.11 dB PSNR increase on Set5, demonstrating DSConv’s superior capability in capturing intricate structures compared to static kernels. Next, we incorporated the enhanced residual structure with multi-path design, improving feature propagation and fusion to achieve an additional 0.02 dB gain on Set5. We then replaced standard ReLU with SwishReLU activation, resolving potential “dying ReLU” issues while providing smoother gradients for more stable optimization. Finally, the multi-scale enhancement module was added to explicitly fuse features from different receptive fields, completing our DSCNN architecture. Table 2 demonstrates each component’s positive contribution to the overall performance, validating our design choices and highlighting the synergistic effect of these integrated innovations.
For the ×4 task, Table 3 presents the performance of different activation functions when the patch size is set to 8, the batch size is set to 8, the total number of training steps is 100,000, and the decay interval is set to 50,000 steps. It can be observed that the model employing SwishReLU achieves the best overall results. While reducing computational costs, SwishReLU is also capable of enhancing model performance.

4.4. Experimental Results

We conducted comprehensive quantitative and qualitative evaluations comparing our DSCNN against representative SISR methods, spanning traditional approaches (Bicubic) and deep learning models including SRCNN [9], VDSR [13], FSRCNN [10], and DRCN [15].
The quantitative results in Table 4, Table 5, Table 6 and Table 7 demonstrate DSCNN’s consistent superiority across all scale factors (×2, ×3, and ×4) and benchmark datasets. Notably, on the geometrically complex Urban100 dataset at ×4 scaling, DSCNN achieves a 0.23 dB PSNR improvement over lightweight CARN-M while significantly outperforming other established models. Meanwhile, we compared the proposed method with several classical super-resolution methods in terms of Params and FLOPs under the same input size and scaling factor in Table 8. It can be seen that the proposed method has the smallest Params and FLOPs.
Qualitative analysis addresses the limitations of objective metrics (PSNR/SSIM) in assessing perceptual quality, particularly for texture and detail reconstruction. We perform visual comparisons of DSCNN against six representative methods on Set14 and B100: traditional (Bicubic [43], ScSR [57], and Nearest Neighbor) and deep learning-based (SRCNN [9], SelfExSR [42], and CARN-M [16]) methods. The visualizations are structured as follows: (a) high-resolution reference, (b–g) comparative method outputs, and (h) our DSCNN reconstruction. Meanwhile, to more intuitively compare the performance of the proposed method in terms of human perception, we report two quantitative perceptual metrics, FSIM and LPIPS, in Table 9.
Figure 3 qualitatively compares “img014” from Set14, demonstrating our method’s superior visual realism. In the magnified region, conventional methods fail to reconstruct sharp contours on the zebra’s hind leg and lose grassy textures, while DSCNN faithfully recovers these intricate features, yielding perceptually plausible results that better align with human vision.
Figure 4 illustrates DSCNN’s enhanced visual clarity on B100’s “img077”. The magnified patch shows that our method precisely delineates color transitions on the marmot’s nose and maintains authentic color renditions, achieving significantly closer fidelity to the ground-truth than both interpolation and deep learning alternatives.
Figure 5 exemplifies DSCNN’s capability on Set14’s “img007”, producing visually sharper reconstruction. The magnified view reveals superior rendering of petal textures and stamen–petal junctions with markedly higher fidelity, outperforming conventional and learning-based methods in structural accuracy.
Figure 6 and Figure 7 demonstrate the super-resolution performance of our method on satellite and medical images, as well as the potential applications of super-resolution methods in multiple fields.

5. Conclusions

We propose the dynamic snake convolution neural network (DSCNN), a novel framework for efficient, high-quality image super-resolution. The central component of this framework is the feature extraction and enhancement module with dynamic snake convolution, which adaptively deforms convolution kernels to match geometric structures within images, significantly enhancing fine-detail capture. This innovation is synergistically integrated with an enhanced residual framework that ensures robust feature propagation, a multi-scale parallel convolution structure that fuses local and global information, and SwishReLU activations that improve optimization stability. Collectively, these components establish an effective solution that advances single-image super-resolution performance across both quantitative metrics and perceptual quality benchmarks. The code is available at https://github.com/WuZiang73/DSCNN (accessed on 25 July 2025).

Author Contributions

Conceptualization, W.X., Z.W. and C.T.; Methodology, W.X., Z.W. and C.T.; Software, Z.W.; Validation, W.X., Z.W. and C.T.; Investigation, T.B. and B.L.; Resources, Q.Z. and T.B.; Data curation, Z.W.; Writing—original draft, W.X. and Z.W.; Writing—review & editing, W.X. and C.T.; Visualization, W.X., Q.Z. and C.T.; Supervision, Q.Z., T.B., B.L. and C.T.; Project administration, Q.Z., T.B., B.L. and C.T.; Funding acquisition, C.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Basic and Applied Basic Research Foundation of Guangdong Province, grant 2025A1515011566; Leading Talents in Gusu Innovation and Entrepreneurship, grant ZXL2023170; and the Basic Research Programs of Taicang 2024, grant TC2024JC32.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Tian, C.; Zheng, M.; Lin, C.W.; Li, Z.; Zhang, D. Heterogeneous window transformer for image denoising. IEEE Trans. Syst. Man Cybern. Syst. 2024, 54, 6621–6632. [Google Scholar] [CrossRef]
  2. Zhang, K.; Zuo, W.; Chen, Y.; Meng, D.; Zhang, L. Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising. IEEE Trans. Image Process. 2017, 26, 3142–3155. [Google Scholar] [CrossRef]
  3. Tian, C.; Zheng, M.; Li, B.; Zhang, Y.; Zhang, S.; Zhang, D. Perceptive self-supervised learning network for noisy image watermark removal. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 7069–7079. [Google Scholar] [CrossRef]
  4. Ma, L.; Li, N.; Zhu, P.; Tang, K.; Khan, A.; Wang, F.; Yu, G. A novel fuzzy neural network architecture search framework for defect recognition with uncertainties. IEEE Trans. Fuzzy Syst. 2024, 32, 3274–3285. [Google Scholar] [CrossRef]
  5. Li, N.; Xue, B.; Ma, L.; Zhang, M. Automatic Fuzzy Architecture Design for Defect Detection via Classifier-Assisted Multiobjective Optimization Approach. IEEE Trans. Evol. Comput. 2025. [Google Scholar] [CrossRef]
  6. Li, Y.; Sixou, B.; Peyrin, F. A review of the deep learning methods for medical images super resolution problems. IRBM 2021, 42, 120–133. [Google Scholar] [CrossRef]
  7. Wang, P.; Bayram, B.; Sertel, E. A comprehensive review on deep learning based remote sensing image super-resolution methods. Earth-Sci. Rev. 2022, 232, 104110. [Google Scholar] [CrossRef]
  8. Rasti, P.; Uiboupin, T.; Escalera, S.; Anbarjafari, G. Convolutional neural network super resolution for face recognition in surveillance monitoring. In Proceedings of the Articulated Motion and Deformable Objects: 9th International Conference, AMDO 2016, Palma de Mallorca, Spain, 13–15 July 2016; pp. 175–184. [Google Scholar]
  9. Dong, C.; Loy, C.C.; He, K.; Tang, X. Image Super-Resolution Using Deep Convolutional Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 295–307. [Google Scholar] [CrossRef] [PubMed]
  10. Dong, C.; Loy, C.C.; Tang, X. Accelerating the Super-Resolution Convolutional Neural Network. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016. [Google Scholar]
  11. Shi, W.; Caballero, J.; Huszár, F.; Totz, J.; Aitken, A.P.; Bishop, R.; Rueckert, D.; Wang, Z. Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1874–1883. [Google Scholar] [CrossRef]
  12. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar] [CrossRef]
  13. Kim, J.; Lee, J.K.; Lee, K.M. Accurate Image Super-Resolution Using Very Deep Convolutional Networks. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1646–1654. [Google Scholar] [CrossRef]
  14. Lim, B.; Son, S.; Kim, H.; Nah, S.; Lee, K.M. Enhanced Deep Residual Networks for Single Image Super-Resolution. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017; pp. 1132–1140. [Google Scholar] [CrossRef]
  15. Kim, J.; Lee, J.K.; Lee, K.M. Deeply-Recursive Convolutional Network for Image Super-Resolution. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1637–1645. [Google Scholar] [CrossRef]
  16. Li, Y.; Agustsson, E.; Gu, S.; Timofte, R.; Van, L. CARN: Convolutional Anchored Regression Network for Fast and Accurate Single Image Super-Resolution. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018. [Google Scholar]
  17. Chen, H.; Gu, J.; Zhang, Z. Attention in attention network for image super-resolution. arXiv 2021, arXiv:2104.09497. [Google Scholar]
  18. Lai, W.S.; Huang, J.B.; Ahuja, N.; Yang, M.H. Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5835–5843. [Google Scholar] [CrossRef]
  19. Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 286–301. [Google Scholar]
  20. Rahman, J.U.; Zulfiqar, R.; Khan, A. SwishReLU: A unified approach to activation functions for enhanced deep neural networks performance. arXiv 2024, arXiv:2407.08232. [Google Scholar]
  21. He, K.; Zhang, X.; Ren, S.; Sun, J. Identity mappings in deep residual networks. In Computer Vision—ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016, Proceedings, Part IV 14; Springer: Cham, Switzerland, 2016; pp. 630–645. [Google Scholar]
  22. Zagoruyko, S.; Komodakis, N. Wide residual networks. arXiv 2016, arXiv:1605.07146. [Google Scholar]
  23. Yu, F.; Koltun, V.; Funkhouser, T. Dilated residual networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 472–480. [Google Scholar]
  24. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141. [Google Scholar]
  25. Tian, C.; Zheng, M.; Jiao, T.; Zuo, W.; Zhang, Y.; Lin, C.W. A self-supervised CNN for image watermark removal. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 7566–7576. [Google Scholar] [CrossRef]
  26. Han, Y.; Huang, G.; Song, S.; Yang, L.; Wang, H.; Wang, Y. Dynamic neural networks: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 7436–7456. [Google Scholar] [CrossRef]
  27. Dai, J.; Qi, H.; Xiong, Y.; Li, Y.; Zhang, G.; Hu, H.; Wei, Y. Deformable convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 764–773. [Google Scholar]
  28. Yang, B.; Bender, G.; Le, Q.V.; Ngiam, J. Condconv: Conditionally parameterized convolutions for efficient inference. Adv. Neural Inf. Process. Syst. 2019, 32. [Google Scholar]
  29. Chen, Y.; Dai, X.; Liu, M.; Chen, D.; Yuan, L.; Liu, Z. Dynamic convolution: Attention over convolution kernels. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11030–11039. [Google Scholar]
  30. He, T.; Shen, C.; Van Den Hengel, A. Dyco3d: Robust instance segmentation of 3d point clouds through dynamic convolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 354–363. [Google Scholar]
  31. Qi, Y.; He, Y.; Qi, X.; Zhang, Y.; Yang, G. Dynamic snake convolution based on topological geometric constraints for tubular structure segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 6070–6079. [Google Scholar]
  32. Tian, C.; Xu, Y.; Zuo, W.; Lin, C.W.; Zhang, D. Asymmetric CNN for image superresolution. IEEE Trans. Syst. Man, Cybern. Syst. 2021, 52, 3718–3730. [Google Scholar] [CrossRef]
  33. Tian, C.; Zhang, X.; Zhang, Q.; Yang, M.; Ju, Z. Image super-resolution via dynamic network. CAAI Trans. Intell. Technol. 2024, 9, 837–849. [Google Scholar] [CrossRef]
  34. Zhang, Y.; Tian, Y.; Kong, Y.; Zhong, B.; Fu, Y. Residual dense network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2472–2481. [Google Scholar]
  35. Tian, C.; Song, M.; Fan, X.; Zheng, X.; Zhang, B.; Zhang, D. A Tree-guided CNN for image super-resolution. IEEE Trans. Consum. Electron. 2025. [Google Scholar] [CrossRef]
  36. Huber, P.J. Robust estimation of a location parameter. In Breakthroughs in Statistics: Methodology and distribution; Springer: Berlin/Heidelberg, Germany, 1992; pp. 492–518. [Google Scholar]
  37. Ding, X.; Guo, Y.; Ding, G.; Han, J. ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar]
  38. Timofte, R.; Agustsson, E.; Van Gool, L.; Yang, M.H.; Zhang, L. Ntire 2017 challenge on single image super-resolution: Methods and results. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 114–125. [Google Scholar]
  39. Bevilacqua, M.; Roumy, A.; Guillemot, C.; Alberi-Morel, M.L. Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In Proceedings of the British Machine Vision Conference, Surrey, UK, 3–7 September 2012. [Google Scholar]
  40. Zeyde, R.; Elad, M.; Protter, M. On single image scale-up using sparse-representations. In Proceedings of the International Conference on Curves and Surfaces, Avignon, France, 24–30 June 2010; pp. 711–730. [Google Scholar]
  41. Martin, D.; Fowlkes, C.; Tal, D.; Malik, J. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings of the Eighth IEEE International Conference on Computer Vision, ICCV 2001, Vancouver, BC, Canada, 7–14 July 2001; Volume 2, pp. 416–423. [Google Scholar]
  42. Huang, J.B.; Singh, A.; Ahuja, N. Single image super-resolution from transformed self-exemplars. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 5197–5206. [Google Scholar] [CrossRef]
  43. Carlson, R.E.; Fritsch, F.N. Monotone piecewise bicubic interpolation. SIAM J. Numer. Anal. 1985, 22, 386–400. [Google Scholar] [CrossRef]
  44. Dai, D.; Timofte, R.; Van Gool, L. Jointly Optimized Regressors for Image Super-resolution. Comput. Graph. Forum 2015, 34, 95–104. [Google Scholar] [CrossRef]
  45. Wang, Z.; Liu, D.; Yang, J.; Han, W.; Huang, T. Deep Networks for Image Super-Resolution with Sparse Prior. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 370–378. [Google Scholar] [CrossRef]
  46. Timofte, R.; Desmet, V.; Vangool, L. A+: Adjusted Anchored Neighborhood Regression for Fast Super-Resolution. In Computer Vision—ACCV 2014; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2014. [Google Scholar]
  47. Schulter, S.; Leistner, C.; Bischof, H. Fast and accurate image upscaling with super-resolution forests. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3791–3799. [Google Scholar] [CrossRef]
  48. Mao, X.J.; Shen, C.; Yang, Y.B. Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections. Adv. Neural Inf. Process. Syst. 2016, 29. [Google Scholar]
  49. Lu, Z.; Yu, Z.; Yali, P.; Shigang, L.; Xiaojun, W.; Gang, L.; Yuan, R. Fast Single Image Super-Resolution Via Dilated Residual Networks. IEEE Access 2019, 7, 109729–109738. [Google Scholar] [CrossRef]
  50. Shi, Y.; Wang, K.; Chen, C.; Xu, L.; Lin, L. Structure-Preserving Image Super-Resolution via Contextualized Multitask Learning. IEEE Trans. Multimed. 2017, 19, 2804–2815. [Google Scholar] [CrossRef]
  51. Ren, H.; El-Khamy, M.; Lee, J. Image Super Resolution Based on Fusing Multiple Convolution Neural Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017; pp. 1050–1057. [Google Scholar] [CrossRef]
  52. Zhang, K.; Gao, X.; Tao, D.; Li, X. Single Image Super-Resolution With Non-Local Means and Steering Kernel Regression. IEEE Trans. Image Process. 2012, 21, 4544–4556. [Google Scholar] [CrossRef]
  53. Bae, W.; Yoo, J.; Ye, J.C. Beyond Deep Residual Learning for Image Restoration: Persistent Homology-Guided Manifold Simplification. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017; pp. 1141–1149. [Google Scholar] [CrossRef]
  54. Xu, J.; Li, M.; Fan, J.; Zhao, X.; Chang, Z. Self-Learning Super-Resolution Using Convolutional Principal Component Analysis and Random Matching. IEEE Trans. Multimed. 2019, 21, 1108–1121. [Google Scholar] [CrossRef]
  55. Tian, C.; Zhuge, R.; Wu, Z.; Xu, Y.; Zuo, W.; Chen, C.; Lin, C.W. Lightweight image super-resolution with enhanced CNN. Knowl.-Based Syst. 2020, 205, 106235. [Google Scholar] [CrossRef]
  56. Chen, Y.; Pock, T. Trainable Nonlinear Reaction Diffusion: A Flexible Framework for Fast and Effective Image Restoration. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1256–1272. [Google Scholar] [CrossRef]
  57. Yang, J.; Wright, J.; Huang, T.S.; Ma, Y. Image Super-Resolution Via Sparse Representation. IEEE Trans. Image Process. 2010, 19, 2861–2873. [Google Scholar] [CrossRef]
  58. Khan, A.H.; Micheloni, C.; Martinel, N. IDENet: Implicit Degradation Estimation Network for Efficient Blind Super Resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Seattle, WA, USA, 17–18 June 2024; pp. 6065–6075. [Google Scholar]
  59. Wen, W.; Guo, C.; Ren, W.; Wang, H.; Shao, X. Adaptive Blind Super-Resolution Network for Spatial-Specific and Spatial-Agnostic Degradations. IEEE Trans. Image Process. 2024, 33, 4404–4418. [Google Scholar] [CrossRef] [PubMed]
  60. Cao, F.; Chen, B. New architecture of deep recursive convolution networks for super-resolution. Knowl.-Based Syst. 2019, 178, 98–110. [Google Scholar] [CrossRef]
  61. Huang, Y.; Li, S.; Wang, L.; Tan, T. Unfolding the alternating optimization for blind super resolution. Adv. Neural Inf. Process. Syst. 2020, 33, 5632–5643. [Google Scholar]
  62. Hui, Z.; Wang, X.; Gao, X. Fast and Accurate Single Image Super-Resolution via Information Distillation Network. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 723–731. [Google Scholar] [CrossRef]
  63. Tai, Y.; Yang, J.; Liu, X.; Xu, C. MemNet: A Persistent Memory Network for Image Restoration. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 4549–4557. [Google Scholar] [CrossRef]
  64. Cheng, G.; Han, J.; Zhou, P.; Guo, L. Multi-class geospatial object detection and geographic image classification based on collection of part detectors. ISPRS J. Photogramm. Remote Sens. 2014, 98, 119–132. [Google Scholar] [CrossRef]
  65. Cheng, G.; Han, J. A survey on object detection in optical remote sensing images. ISPRS J. Photogramm. Remote Sens. 2016, 117, 11–28. [Google Scholar] [CrossRef]
  66. Cheng, G.; Zhou, P.; Han, J. Learning Rotation-Invariant Convolutional Neural Networks for Object Detection in VHR Optical Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2016, 54, 7405–7415. [Google Scholar] [CrossRef]
Figure 1. Network architecture of DSCNN.
Figure 2. Mechanism of the dynamic snake convolution (DSConv).
Figure 3. Super-resolution visualization results of different methods with scaling factor of × 2 on Set14-img014. (a) HR image, (b) Nearest Neighbor, (c) Bicubic, (d) SRCNN, (e) ScSR, (f) SelfExSR, (g) CARN-M, and (h) DSCNN (ours).
Figure 4. Super-resolution visualization results of different methods with scaling factor of × 3 on B100-img077. (a) HR image, (b) Nearest Neighbor, (c) Bicubic, (d) SRCNN, (e) ScSR, (f) SelfExSR, (g) CARN-M, and (h) DSCNN (ours).
Figure 5. Super-resolution visualization results of different methods with scaling factor of × 4 on Set14-img007. (a) HR image, (b) Nearest Neighbor, (c) Bicubic, (d) SRCNN, (e) ScSR, (f) SelfExSR, (g) CARN-M, and (h) DSCNN (ours).
Figure 6. Super-resolution visualization results with scaling factor of × 4 on the NWPU VHR-10 satellite image dataset [64,65,66]. (a) HR image, (b) Bicubic, and (c) DSCNN (ours).
Figure 7. Super-resolution visualization results with scaling factor of × 4 on the publicly available Blood Cell Images dataset. (a) HR image, (b) Bicubic, and (c) DSCNN (ours).
Table 1. Comparative analysis of dynamic convolution methods.
Method | Kernel Adaptation | Training Stability | Primary Applications
DCN [27] | Point-wise offsets | Moderate | Object detection
CondConv [28] | Kernel weighting | High | Classification
DyConv [29] | Attention | High (w/ annealing) | Keypoint detection
DSConv [31] | Iterative tracing | High | Curvilinear feature enhancement
Table 2. Ablation study results of DSCNN (PSNR/SSIM).
Methods | Set5 (PSNR (dB)/SSIM) | Set14 (PSNR (dB)/SSIM) | B100 (PSNR (dB)/SSIM)
Baseline | 31.52/0.8854 | 27.96/0.7737 | 27.37/0.7284
+DSConv | 31.63/0.8860 | 28.02/0.7742 | 27.38/0.7293
+Enhanced Residual Structure | 31.65/0.8878 | 28.04/0.7758 | 27.40/0.7302
+SwishReLU | 31.68/0.8886 | 28.05/0.7776 | 27.44/0.7319
+Multi-Scale Enhancement | 31.70/0.8890 | 28.10/0.7777 | 27.46/0.7323
Table 3. Comparison of ReLU-like activation functions in the ×4 task.
Functions | Set5 (PSNR (dB)/SSIM) | Set14 (PSNR (dB)/SSIM) | B100 (PSNR (dB)/SSIM)
ReLU | 30.20/0.8560 | 27.11/0.7494 | 26.79/0.7089
LeakyReLU | 30.25/0.8567 | 27.11/0.7501 | 26.82/0.7103
BReLU-6 | 30.08/0.8568 | 27.07/0.7497 | 26.74/0.7095
SwishReLU | 30.37/0.8604 | 27.21/0.7535 | 26.85/0.7121
Table 4. Average PSNR (dB) and SSIM for different methods with three scaling factors on the Set5 dataset. The best results are highlighted in red and the second best in blue.
DatasetsMethods×2×3×4
Set5Bicubic [43]33.66/0.929930.39/0.868228.42/0.8104
JOR [44]36.58/0.954332.55/0.906730.19/0.8563
SRCNN [9]36.66/0.954232.75/0.909030.48/0.8628
SelfEx [42]36.49/0.953732.58/0.909330.31/0.8619
VDSR [13]37.53/0.958733.66/0.921331.35/0.8838
CSCN [45]36.93/0.955233.10/0.914430.86/0.8732
FSRCNN [10]37.00/0.955833.16/0.914030.71/0.8657
A+ [46]36.54/0.954432.58/0.908830.28/0.8603
RFL [47]36.54/0.953732.43/0.905730.14/0.8548
RED [48]37.56/0.959533.70/0.922231.33/0.8847
FDSR [49]37.40/0.951333.68/0.909631.28/0.8658
RCN [50]37.17/0.958333.45/0.917531.11/0.8736
DRCN [15]37.63/0.958833.82/0.922631.53/0.8854
CNF [51]37.66/0.959033.74/0.922631.55/0.8856
DnCNN [52]37.58/0.959033.75/0.922231.40/0.8845
LapSRN [18]37.52/0.9590-31.54/0.8850
WaveResNet [53]37.57/0.958633.86/0.922831.52/0.8864
CPCA [54]34.99/0.946931.09/0.897528.67/0.8434
LESRCNN [55]37.65/0.958633.93/0.923131.88/0.8903
TNRD [56]36.86/0.955633.18/0.915230.85/0.8732
ScSR [57]35.78/0.948531.34/0.886929.07/0.8263
IDENet [58]37.16/0.9521-31.57/0.8846
GLFDN [59]37.47/0.954533.86/0.920331.90/0.8869
DSRNet [33]37.61/0.958433.92/0.922731.71/0.8874
DSCNN (Ours)37.66/0.958933.94/0.923431.90/0.8909
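For readers reproducing numbers of this kind, SR benchmarks conventionally report PSNR and SSIM on the luminance (Y) channel with a small border crop. The helper below follows that common convention using scikit-image; the exact evaluation protocol used for Tables 4-7 is not restated here, so treat this as an assumed, typical setup rather than the authors' evaluation script.

```python
import numpy as np
from skimage.color import rgb2ycbcr
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_sr(hr_rgb: np.ndarray, sr_rgb: np.ndarray, scale: int = 4):
    """Compute PSNR/SSIM on the Y channel with a border crop of `scale` pixels,
    a common (assumed) SR evaluation convention. Inputs are uint8 RGB arrays
    of identical shape."""
    hr_y = rgb2ycbcr(hr_rgb)[..., 0]
    sr_y = rgb2ycbcr(sr_rgb)[..., 0]
    hr_y = hr_y[scale:-scale, scale:-scale]
    sr_y = sr_y[scale:-scale, scale:-scale]
    psnr = peak_signal_noise_ratio(hr_y, sr_y, data_range=255)
    ssim = structural_similarity(hr_y, sr_y, data_range=255)
    return psnr, ssim
```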
Table 5. Average PSNR (dB) and SSIM for different methods with three scaling factors on the Set14 dataset. The best results are highlighted in red and the second best in blue.
Methods | ×2 | ×3 | ×4
Bicubic [43] | 30.24/0.8688 | 27.55/0.7742 | 26.00/0.7027
RFL [47] | 32.26/0.9040 | 29.05/0.8164 | 27.24/0.7451
SRCNN [9] | 32.42/0.9063 | 29.28/0.8209 | 27.49/0.7503
FDSR [49] | 33.00/0.9042 | 29.61/0.8179 | 27.86/0.7500
SelfEx [42] | 32.22/0.9034 | 29.16/0.8196 | 27.40/0.7518
VDSR [13] | 33.03/0.9124 | 29.77/0.8314 | 28.01/0.7674
DRRN [18] | 33.23/0.9136 | 29.96/0.8349 | 28.21/0.7720
CSCN [45] | 32.56/0.9074 | 29.41/0.8238 | 27.64/0.7578
FSRCNN [10] | 32.63/0.9088 | 29.43/0.8242 | 27.59/0.7535
A+ [46] | 32.28/0.9056 | 29.13/0.8188 | 27.32/0.7491
JOR [44] | 32.38/0.9063 | 29.19/0.8204 | 27.27/0.7479
RED [48] | 32.81/0.9135 | 29.50/0.8334 | 27.72/0.7698
RCN [50] | 32.77/0.9109 | 29.63/0.8269 | 27.79/0.7594
DRCN [15] | 33.04/0.9118 | 29.76/0.8311 | 28.02/0.7670
LapSRN [18] | 33.08/0.9130 | - | 28.19/0.7720
WaveResNet [53] | 33.09/0.9129 | 29.88/0.8331 | 28.11/0.7699
CPCA [54] | 31.04/0.8951 | 27.89/0.8038 | 26.10/0.7296
DnCNN [52] | 33.03/0.9128 | 29.81/0.8321 | 28.04/0.7672
NDRCN [60] | 33.20/0.9141 | 29.88/0.8333 | 28.10/0.7697
TNRD [56] | 32.51/0.9069 | 29.43/0.8232 | 27.66/0.7563
ScSR [57] | 31.64/0.8940 | 28.19/0.7977 | 26.40/0.7218
IDENet [58] | 32.84/0.9025 | - | 28.27/0.7678
DSCNN (Ours) | 33.28/0.9166 | 29.84/0.8386 | 28.17/0.7794
Table 6. Average PSNR (dB) and SSIM for different methods with three scaling factors on the B100 dataset. The best results are highlighted in red and the second best in blue.
Methods | ×2 | ×3 | ×4
Bicubic [43] | 29.56/0.8431 | 27.21/0.7385 | 25.96/0.6675
RFL [47] | 31.16/0.8840 | 28.22/0.7806 | 26.75/0.7054
SRCNN [11] | 31.36/0.8879 | 28.41/0.7863 | 26.90/0.7101
VDSR [13] | 31.90/0.8960 | 28.82/0.7976 | 27.29/0.7251
SelfEx [42] | 31.18/0.8855 | 28.29/0.7840 | 26.84/0.7106
DRRN [18] | 32.05/0.8973 | 28.95/0.8004 | 27.38/0.7284
FSRCNN [10] | 31.53/0.8920 | 28.53/0.7910 | 26.98/0.7150
TNRD [56] | 31.40/0.8878 | 28.85/0.7981 | 27.29/0.7253
CARN-M [16] | 31.92/0.8960 | 28.91/0.8000 | 27.44/0.7304
A+ [46] | 31.21/0.8863 | 28.29/0.7835 | 26.82/0.7087
JOR [44] | 31.22/0.8867 | 28.27/0.7837 | 26.79/0.7083
RED [48] | 31.96/0.8972 | 28.88/0.7993 | 27.35/0.7276
CSCN [45] | 31.40/0.8884 | 28.50/0.7885 | 27.03/0.7161
DRCN [15] | 31.85/0.8942 | 28.80/0.7963 | 27.23/0.7233
CNF [51] | 31.91/0.8962 | 28.82/0.7980 | 27.32/0.7253
LapSRN [18] | 31.80/0.8950 | - | 27.32/0.7280
NDRCN [60] | 32.00/0.8975 | 28.86/0.7991 | 27.30/0.7263
LESRCNN [55] | 31.95/0.8964 | 28.91/0.8005 | 27.45/0.7313
FDSR [49] | 31.87/0.8847 | 28.82/0.7797 | 27.31/0.7031
ScSR [57] | 30.77/0.8744 | 27.72/0.7647 | 26.61/0.6983
DnCNN [52] | 31.90/0.8961 | 28.85/0.7981 | 27.29/0.7253
DAN [61] | 31.76/0.8858 | 28.94/0.7919 | 27.51/0.7248
IDENet [58] | 31.65/0.8848 | - | 27.35/0.7235
DSRNet [33] | 31.96/0.8965 | 28.90/0.8003 | 27.43/0.7303
DSCNN (Ours) | 32.06/0.8983 | 28.94/0.8011 | 27.52/0.7342
Table 7. Average PSNR (dB) and SSIM for different methods with three scaling factors on the Urban100 dataset. The best results are highlighted in red and the second best in blue.
Methods | ×2 | ×3 | ×4
Bicubic [43] | 26.88/0.8403 | 24.46/0.7349 | 23.14/0.6577
SRCNN [9] | 29.50/0.8946 | 26.24/0.7989 | 24.52/0.7221
FDSR [49] | 30.91/0.9088 | 27.23/0.8190 | 25.27/0.7417
CARN-M [16] | 31.23/0.9193 | 27.55/0.8385 | 25.62/0.7694
JOR [44] | 29.25/0.8951 | 25.97/0.7972 | 24.29/0.7181
VDSR [13] | 30.76/0.9140 | 27.14/0.8279 | 25.18/0.7524
DRRN [18] | 31.23/0.9188 | 27.53/0.7378 | 25.44/0.7638
FSRCNN [10] | 29.88/0.9020 | 26.43/0.8080 | 24.62/0.7280
TNRD [56] | 29.70/0.8994 | 26.42/0.8076 | 24.61/0.7291
IDN [62] | 31.27/0.9196 | 27.42/0.8359 | 25.41/0.7632
WaveResNet [53] | 30.96/0.9169 | 27.28/0.8334 | 25.36/0.7614
RED [48] | 30.91/0.9159 | 27.31/0.8303 | 25.35/0.7587
DRCN [15] | 30.75/0.9133 | 27.15/0.8276 | 25.14/0.7510
A+ [46] | 29.20/0.8936 | 26.03/0.7973 | 24.32/0.7183
NDRCN [60] | 31.06/0.9175 | 27.23/0.8312 | 25.16/0.7546
MemNet [63] | 31.31/0.9195 | 27.56/0.8376 | 25.50/0.7630
DnCNN [52] | 30.74/0.9139 | 27.15/0.8276 | 25.20/0.7521
LESRCNN [55] | 31.45/0.9206 | 27.70/0.8415 | 25.77/0.7732
RFL [47] | 29.11/0.8904 | 25.86/0.7900 | 24.19/0.7096
ScSR [57] | 28.26/0.8828 | - | 24.02/0.7024
LapSRN [18] | 30.41/0.9100 | - | 25.21/0.7560
DAN [61] | 30.60/0.9060 | 27.65/0.8352 | 25.86/0.7721
SelfEx [42] | 29.54/0.8967 | 26.44/0.8088 | 24.79/0.7374
IDENet [58] | 30.22/0.9004 | - | 25.39/0.7585
DSRNet [33] | 31.41/0.9209 | 27.63/0.8402 | 25.65/0.7693
DSCNN (Ours) | 31.72/0.9244 | 27.69/0.8425 | 25.85/0.7787
Table 8. Parameters and FLOPs for different methods with 1280 × 960 input size and PSNR (dB)/SSIM with a ×4 scaling factor on Set5 dataset.
Models | Params (K) | FLOPs (G)
EDSR [14] | 1518 | 114.0
CARN [16] | 1592 | 90.9
DSCNN (ours) | 970 | 86.78
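The parameter counts in Table 8 can be reproduced for any PyTorch model by summing the sizes of its learnable tensors, while FLOPs additionally depend on the input resolution (1280 × 960 here) and the profiling tool used. The helper below is a generic sketch with a toy stand-in model, not the released DSCNN code.

```python
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    """Total learnable parameters; divide by 1e3 for the K units used in Table 8."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Toy stand-in model (the DSCNN architecture itself is not reproduced here).
toy = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 3, kernel_size=3, padding=1),
)
print(f"{count_parameters(toy) / 1e3:.1f} K parameters")
```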
Table 9. FSIM and LPIPS of different models on the Urban100 dataset for the ×4 task.
Model | FSIM | LPIPS
bicubic [43] | 0.9415 | 0.4727
SRCNN [9] | 0.9671 | 0.3516
SelfExSR [42] | 0.9761 | 0.3098
CARN-M [16] | 0.9811 | 0.2524
DSCNN (Ours) | 0.9824 | 0.2385
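LPIPS is commonly computed with the open-source lpips package (AlexNet backbone, inputs scaled to [-1, 1]); the sketch below shows that typical usage on toy tensors, and is not necessarily the exact configuration behind Table 9. FSIM requires a separate implementation and is not shown.

```python
import torch
import lpips  # pip install lpips

# LPIPS expects NCHW tensors scaled to [-1, 1]; the AlexNet backbone is a common default.
loss_fn = lpips.LPIPS(net="alex")

def lpips_distance(sr: torch.Tensor, hr: torch.Tensor) -> float:
    """sr, hr: float tensors of shape (1, 3, H, W) with values in [0, 1]."""
    with torch.no_grad():
        return loss_fn(sr * 2 - 1, hr * 2 - 1).item()

# Toy usage with random images (lower LPIPS means perceptually closer):
print(lpips_distance(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)))
```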