Article

Dynamic Snake Convolution Neural Network for Enhanced Image Super-Resolution

1 School of Software, Northwestern Polytechnical University, Xi’an 710129, China
2 Key Laboratory of Brain-Machine Intelligence Technology, College of Artificial Intelligence, Nanjing University of Aeronautics and Astronautics, Ministry of Education, Nanjing 211106, China
3 School of Computing and Information Systems, University of Melbourne, Parkville 3010, Australia
4 Shenzhen Research Institute of Northwestern Polytechnical University, Northwestern Polytechnical University, Shenzhen 518057, China
5 Yangtze River Delta Research Institute, Northwestern Polytechnical University, Taicang 215400, China
* Author to whom correspondence should be addressed.
Mathematics 2025, 13(15), 2457; https://doi.org/10.3390/math13152457
Submission received: 27 June 2025 / Revised: 23 July 2025 / Accepted: 29 July 2025 / Published: 30 July 2025
(This article belongs to the Special Issue Structural Networks for Image Application)

Abstract

Image super-resolution (SR) is essential for enhancing image quality in critical applications, such as medical imaging and satellite remote sensing. However, existing methods are often limited in their ability to effectively process and integrate multi-scale information, from fine textures to global structures. To address these limitations, this paper proposes DSCNN, a dynamic snake convolution neural network for enhanced image super-resolution. DSCNN optimizes both feature extraction and network architecture to improve performance and efficiency. For feature extraction, the core innovation is a feature extraction and enhancement module with dynamic snake convolution that dynamically adjusts the convolution kernel’s shape and position to better fit the image’s geometric structures, significantly improving feature extraction. For the network structure, DSCNN employs an enhanced residual framework that utilizes parallel convolutional layers and a global feature fusion mechanism to further strengthen feature extraction capability and gradient flow efficiency. Additionally, the network incorporates a SwishReLU-based activation function and a multi-scale convolutional concatenation structure. This multi-scale design effectively captures both local details and global image structure, enhancing SR reconstruction. In summary, the proposed DSCNN outperforms existing methods in both objective metrics and visual perception (e.g., our method achieved the best PSNR and SSIM results on the Set5 ×4 dataset).

1. Introduction

Artificial intelligence (AI) has become integral to various industries, transforming traditional approaches and driving advancements in emerging technologies. As a fundamental AI technology, deep learning has demonstrated remarkable success across diverse fields (e.g., image denoising [1,2], image watermark removal [3], defect recognition [4,5]). Within this domain, single-image SR (SISR) has emerged as a prominent research topic. SISR aims to reconstruct high-resolution (HR) images with richer details from a single low-resolution (LR) input, fundamentally involving the inference of missing high-frequency information to enhance visual quality and clarity. The significance of SISR extends to numerous practical applications. In medical imaging [6], for instance, it facilitates accurate disease diagnosis and supports early detection and treatment. For satellite remote sensing [7], it improves image resolution, enabling clearer monitoring and assessment of natural disasters. In security and surveillance [8], it enhances video footage quality. Moreover, SISR techniques are widely utilized to optimize image quality, improve visual experiences, and support film and television production and restoration.
Driven by rapid advancements in deep learning, super-resolution technology has achieved significant breakthroughs, yielding results that substantially surpass traditional algorithms. Dong et al. [9,10] pioneered the first CNN for SR (SRCNN) with an end-to-end architecture and later proposed fast SRCNN (FSRCNN) to reduce computational costs by using deconvolution for backend upsampling. Seeking further improvements in both speed and performance, Shi et al. [11] introduced an efficient sub-pixel CNN (ESPCNN), incorporating sub-pixel convolution for upsampling to mitigate artifacts commonly associated with deconvolution. Consequently, fixed-scale super-resolution methods have predominantly adopted three main upsampling approaches: interpolation, deconvolution, and sub-pixel convolution. Inspired by ResNet [12], the concept of residual connections gained widespread adoption. Building on this, Kim et al. [13] proposed a very deep CNN (VDSR), employing global residual connections to preserve feature information and enhance reconstruction accuracy. Lim et al. [14] extended this work by proposing the enhanced deep residual network for SISR (EDSR), integrating residual connections and removing batch normalization (BN) layers for improved accuracy, albeit at the cost of increased network width, parameters, and computational load. The deeply recursive convolutional network for SISR (DRCN) [15] was the first to apply recurrent neural network (RNN) concepts to SR, reducing parameter counts. Concurrently, researchers developed lightweight SR networks, such as the convolutional anchored regression network for fast and accurate SISR (CARN) [16] and the attention-in-attention network for image SR (A2N) [17]. To advance performance, the deep Laplacian pyramid network for fast and accurate SR (LapSRN) [18], based on the Laplacian pyramid concept, achieved multi-scale SR reconstruction through a cascaded upsampling approach. Furthermore, the very deep residual channel attention network (RCAN) [19] enhanced performance by stacking residual channel attention blocks (RCABs), combining residual learning with channel attention. Despite these advancements, many prior CNN architectures for SR primarily focused on fixed, single-scale feature extraction. They were often limited in their ability to effectively process and integrate information across diverse scales (i.e., from fine textures to global structures) within a unified framework, frequently resulting in reconstructed images lacking intricate details and satisfactory perceptual fidelity.
To accommodate complex geometric structures, effectively integrate information from different scales, and improve training efficiency to enhance the performance of image SR models, this paper presents a novel CNN model termed DSCNN. The proposed approach aims to comprehensively improve SR capability by strengthening feature extraction and optimizing the network’s architecture. A key innovation is the embedding of dynamic snake convolution (DSConv), which significantly enhances the model’s ability to capture fine image details, thereby boosting the quality and efficiency of SR reconstruction. DSConv dynamically adjusts the shape and position of convolutional kernels, enabling precise adaptation to geometric structures within images and substantially improving feature extraction flexibility and adaptability. An improved residual network architecture is employed to effectively mitigate the vanishing gradient problem during deep network training through residual learning, enhancing both training efficiency and overall performance. The integration of SwishReLU [20] addresses the dead ReLU problem, contributing to improved reconstruction quality. Furthermore, a multi-scale convolution parallel structure is incorporated, enabling the model to simultaneously capture local details and global structures, further optimizing the SR effect. The main contributions of this work are summarized as follows:
(1) DSConv is embedded into the CNN architecture, leveraging its dynamic adjustment mechanism to capture intricate image detail features effectively.
(2) An improved residual structure is utilized to facilitate efficient utilization of multi-level feature information within the network.
(3) A parallel multi-scale convolution structure is integrated into the CNN to enable the model to consolidate both local details and global structural information of the image.
The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 details the proposed network for SISR. Section 4 presents the experimental analysis and results. Finally, Section 5 concludes this paper.

2. Related Work

2.1. Residual Structures for Image Processing

Residual networks (ResNets) [12], pioneered by He et al. for image classification, have become foundational in deep learning due to their efficacy in training very deep networks. The core innovation lies in “shortcut” or “skip” connections, enabling the network to learn residual functions relative to the layer inputs. This design effectively mitigates the vanishing gradient problem, facilitating unimpeded gradient flow during backpropagation and allowing optimization of networks with hundreds of layers. Subsequent research has yielded various ResNet variants aimed at enhancing performance and reducing complexity. He et al. [21] introduced ResNet V2, optimizing training efficiency through component reordering and a pre-activation mechanism. A key challenge in training very deep residual networks is diminishing feature reuse, which leads to significantly slower convergence. To address this problem, Zagoruyko et al. [22] proposed Wider ResNet to enhance model expressiveness by increasing the network’s width while managing computational loads. In CNNs for image classification, the resolution is progressively reduced, culminating in tiny feature maps where the scene’s spatial structure becomes indistinguishable. These problems can be alleviated by dilation, which increases the resolution of output feature maps without reducing the receptive field of individual neurons. Yu et al. [23] incorporated dilated convolutions into dilated ResNet to expand receptive fields and improve feature extraction. To enhance a network’s representational power, several recent methods demonstrate significant improvements by incorporating better spatial encoding strategies. Hu et al. [24] introduced the Squeeze-and-Excitation (SE) module within SENet, integrating channel attention to boost feature representation. Collectively, these variants address deep network optimization challenges through diverse mechanisms, improving computational performance and efficiency.
This powerful residual paradigm has been successfully adapted to low-level image processing tasks [25], particularly SISR. The VDSR network [13] was among the first to leverage this concept effectively, employing a single global residual connection to learn the difference between LR and HR images, yielding significant performance gains. Building on this, EDSR [14] adopted local residual blocks and crucially removed batch normalization (BN) layers, arguing that BN layers, while beneficial in high-level tasks, discard valuable range information and are therefore detrimental to image restoration performance.
Our work extends this foundation by proposing an improved residual structure. This structure aims to ensure stable training while enhancing feature propagation through parallel convolutional pathways and global feature fusion, promoting more comprehensive utilization of multi-level feature information.

2.2. Dynamic Convolution for Image Processing

Standard convolutional layers employ fixed, static kernels applied uniformly across an image. This rigidity limits their ability to adapt to the diverse and complex geometric structures inherent in natural images. Dynamic convolution [26] addresses this limitation by generating kernel weights or sampling locations conditioned on input features, allowing the receptive field to dynamically adjust its shape and focus based on image content. The evolution of dynamic convolution includes key milestones. The deformable convolutional network (DCN) [27] represents a seminal work, learning 2D offsets for the regular sampling grid of standard convolutions. This enables the receptive field to deform and align with object shapes, proving highly beneficial in detection and segmentation tasks. Conditionally parameterized convolution (CondConv) [28] achieves input-adaptive convolution by representing the kernel as a linear combination of multiple expert kernels. Subsequent developments refined these concepts. Dynamic convolution (DyConv) [29] enhanced training stability and performance using a temperature annealing strategy. For 3D data, Dynamic 3D Convolution (DyCo3D) [30] dynamically adjusts kernel parameters based on complex 3D structures, enabling more effective extraction of key information. Significantly, dynamic convolution often improves a model’s performance without incurring significant computational overhead. Table 1 summarizes the key differences between DSConv and existing dynamic convolutions. Unlike DCN’s point-wise offsets or CondConv’s kernel weighting, DSConv employs iterative kernel tracing along curvilinear structures—a unique advantage for preserving continuous edges/textures in SR.
DSConv’s geometric tracing mechanism provides unique advantages for SR: (1) The iterative kernel adjustment preserves edge continuity by preventing sampling point dispersion. (2) The directional focus along primary axes enhances texture coherence. (3) Offset constraints ensure stability when handling high-frequency details.
In this work, we integrate the feature extraction and enhancement module with DSConv [31] into the super-resolution task. Originally designed for segmenting curvilinear structures like blood vessels, DSConv is unique in that it iteratively adjusts the kernel’s shape and position to actively “trace” such features. The integration of DSConv is designed to empower our network to more effectively capture and reconstruct intricate textures, sharp edges, and other crucial high-frequency details that are essential for high-quality SR. These details are frequently smoothed over by static kernels or inadequately modeled by standard dynamic convolution approaches.

2.3. Deep CNNs for Image Super-Resolution

Deep learning has profoundly reshaped SISR. The pioneering SRCNN [9] first demonstrated the superiority of end-to-end learning, significantly outperforming traditional methods. To enhance computational efficiency, subsequent models like the FSRCNN [10] and ESPCNN [11] adopted a post-upsampling strategy. Instead of upsampling the input image at the start, these networks perform feature extraction in the LR space and employ a learnable upsampling layer (e.g., deconvolution or sub-pixel convolution) at the network’s end. This paradigm drastically reduces computational costs and has become standard practice.
Research progression led to increasingly deeper and more sophisticated architectures to boost reconstruction quality. Models like VDSR [13] and the deeply recursive convolutional network (DRCN) [15] explored greater depth using global residual learning and recursive structures. Attention mechanisms were later incorporated, exemplified by the residual channel attention network (RCAN) [19], allowing networks to focus on more informative features. However, a common limitation of many architectures is their tendency to treat all spatial locations equally during feature aggregation, potentially neglecting the influence of more informative pixels and limiting training efficiency. Tian et al. [32] applied asymmetric convolution to SR, enhancing the robustness of the model to rotation and flipping. Despite their remarkable performance, the practical deployment of deep learning methods is often hindered by their substantial computational requirements. Therefore, lightweight networks like the convolutional anchored regression network (CARN) [16] were developed to balance performance and efficiency. More recently, Tian et al. [33] explicitly combined dynamic network principles with SR, proposing a dynamic super-resolution network (DSRNet) that yielded significant performance gains. To effectively address the significant variations in scale and perspective inherent in natural images, jointly learning hierarchical features is a pivotal strategy for image SR. The Residual Dense Network (RDN) [34] further enhanced feature utilization by integrating residual and dense connections with local feature fusion. While deep CNNs generally extract richer structural information, identifying the most impactful layers within a single complex architecture remains challenging. Addressing this, Tian et al. [35] proposed the tree-guided SR network (TSRNet), leveraging a binary tree structure to guide the deep network. This approach aims to amplify the influence of critical nodes and enhance the modeling of hierarchical information relationships, thereby improving image recovery capability.
Despite these impressive advances, many SISR methods still rely predominantly on standard convolutional blocks processing features at a fixed scale. Our proposed DSCNN distinguishes itself by integrating two key innovations: (1) dynamic snake convolution for adaptive feature extraction sensitive to local geometry and (2) a multi-scale parallel structure designed to effectively integrate information across different receptive fields. Together, these components aim to enhance both the capture of fine details and the coherent reconstruction of global structures, leading to more faithful high-resolution image generation.

3. Proposed Method

3.1. Network Architecture

This study proposes a dynamic snake convolution neural network (DSCNN) for SISR. The network architecture comprises four core modules: a feature extraction and enhancement module, a multi-scale feature fusion module, a deep feature extraction module, and an image reconstruction module. The feature extraction and enhancement module hierarchically extracts deep features using dynamic snake convolution (DSConv) [31] blocks integrated with residual connections. The multi-scale feature fusion module then processes these features, integrating information across different scales to enrich the overall feature representation. Finally, the image reconstruction module upsamples the fused feature map and generates the high-resolution output image. This modular design effectively enhances both the objective quality and perceptual fidelity of the super-resolved results (Figure 1).
The feature extraction and enhancement module forms the central core of DSCNN, which is responsible for deep hierarchical feature extraction. This module progressively extracts and refines features through stacked blocks combining DSConv and residual connections. DSConv dynamically adjusts the convolution kernel’s shape and position based on local image geometry, significantly enhancing the network’s capacity to capture fine-grained details and complex textures. Concurrently, the residual connections facilitate stable gradient flow and effective feature propagation across layers, mitigating the vanishing gradient problem inherent in deep architectures.
The multi-scale feature fusion module processes and integrates the outputs from the feature extraction module. It employs a parallel multi-scale convolutional structure to concurrently capture both local details and global structural information. Specifically, this structure utilizes convolution kernels of diverse receptive fields (e.g., 1 × 5, 5 × 1, and 5 × 5) to extract complementary features at different scales. These multi-scale features are then concatenated along the channel dimension, forming a rich and comprehensive feature representation. This process not only significantly enriches the feature maps but also provides robust support for the subsequent high-fidelity image reconstruction.
The image reconstruction module is responsible for upsampling the fused feature map to the target high resolution and generating the final super-resolution image. This module first applies an upsampling operation (e.g., sub-pixel convolution) to increase the spatial resolution. Convolutional layers then map the upsampled features back to the RGB image space. Finally, a scaling operation adjusts the pixel values back to the original input range, yielding the high-quality super-resolution output. This module ensures the generation of visually consistent and natural-looking high-resolution images.
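To make the reconstruction step concrete, the sketch below shows one way a sub-pixel upsampling head of this kind could be written in PyTorch. The class name `ReconstructionHead` and the channel counts are illustrative assumptions, not the paper’s exact configuration.

```python
import torch
import torch.nn as nn

class ReconstructionHead(nn.Module):
    """Minimal sketch of an image reconstruction module: a convolution expands the
    channels by scale**2, PixelShuffle rearranges them into spatial resolution
    (sub-pixel convolution), and a final convolution maps the features back to RGB."""
    def __init__(self, channels=64, scale=4):
        super().__init__()
        self.upsample = nn.Sequential(
            nn.Conv2d(channels, channels * scale ** 2, kernel_size=3, padding=1),
            nn.PixelShuffle(scale),                       # sub-pixel convolution upsampling
        )
        self.to_rgb = nn.Conv2d(channels, 3, kernel_size=3, padding=1)

    def forward(self, feats):
        return self.to_rgb(self.upsample(feats))

# usage: map fused LR-space features to an HR RGB image
head = ReconstructionHead(channels=64, scale=4)
hr = head(torch.randn(1, 64, 48, 48))                     # -> (1, 3, 192, 192)
```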

3.2. Loss Function

The $L_1$ norm [36] is widely adopted as the loss function for super-resolution and other low-level image processing tasks due to its robustness to outliers and superior detail preservation compared to $L_2$. Consequently, we employ the $L_1$ norm to train the DSCNN model. Given a training set $\{I_{LR}^{i}, I_{HR}^{i}\}_{i=1}^{N}$, where $I_{LR}^{i}$ and $I_{HR}^{i}$ denote the $i$-th LR and HR image patches, respectively, and $N$ is the total number of patches, the loss function is defined as follows:
$$L(\theta) = \frac{1}{N} \sum_{i=1}^{N} \left\| \mathrm{DSCNN}\left(I_{LR}^{i}; \theta\right) - I_{HR}^{i} \right\|_{1},$$
where $L(\theta)$ is the $L_1$ loss, $\theta$ represents the learnable parameters of DSCNN, and $\mathrm{DSCNN}(I_{LR}^{i}; \theta)$ denotes the model’s output for input $I_{LR}^{i}$. This loss is optimized using the Adam optimizer.
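A minimal PyTorch sketch of this objective is given below; the convolutional stand-in for the network and the random patch tensors are placeholders for illustration only, not the released training code.

```python
import torch
import torch.nn as nn

# assumed stand-ins: `model` would be the DSCNN network, the tensors real LR/HR patches
model = nn.Conv2d(3, 3, kernel_size=3, padding=1)
criterion = nn.L1Loss()                                   # the L1 objective defined above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

lr_patch = torch.rand(16, 3, 64, 64)                      # I_LR patches (batch of 16)
hr_patch = torch.rand(16, 3, 64, 64)                      # I_HR patches (same size for the dummy model)

sr_patch = model(lr_patch)                                # DSCNN(I_LR; theta)
loss = criterion(sr_patch, hr_patch)                      # (1/N) * sum ||DSCNN(I_LR) - I_HR||_1
optimizer.zero_grad()
loss.backward()
optimizer.step()
```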

3.3. Improved Residual Structure

Standard residual networks (ResNets) [12] effectively mitigate the vanishing gradient problem in CNNs. However, their traditional architecture can be suboptimal for capturing the diverse complex features inherent in natural images. In very deep networks, low-level features such as edges and textures risk being diluted as they propagate through successive layers. To address this limitation, we introduce an enhanced residual learning framework explicitly designed to promote multi-scale feature representation and ensure robust information flow throughout the network.
This enhanced structure offers several key advantages. Firstly, it enables the network to adaptively weight the importance of features at different scales according to image content. Secondly, it retains the core residual benefit of stable training in deep architectures by preserving direct gradient paths through residual connections, bypassing non-linear transformation blocks. Additionally, this framework serves as a powerful backbone, providing a rich, hierarchically fused feature space. This enriched representation notably enhances the effectiveness of our feature extraction and enhancement module in modeling complex geometric deformations.
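Since the enhanced residual framework is described only at a high level, the following is a minimal sketch of how a residual block with parallel convolutional pathways and feature fusion could be arranged. The layer layout, the shallower second path, and the 1 × 1 fusion convolution are assumptions for illustration; the block used in DSCNN may differ.

```python
import torch
import torch.nn as nn

class ParallelResidualBlock(nn.Module):
    """Sketch of an enhanced residual block: two parallel convolutional paths are
    fused by a 1x1 convolution, while the skip connection keeps a direct gradient
    path around the non-linear transformations."""
    def __init__(self, channels=64):
        super().__init__()
        self.path_a = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.path_b = nn.Conv2d(channels, channels, 3, padding=1)     # shallower parallel path
        self.fuse = nn.Conv2d(channels * 2, channels, kernel_size=1)  # feature fusion

    def forward(self, x):
        fused = self.fuse(torch.cat([self.path_a(x), self.path_b(x)], dim=1))
        return x + fused                                              # residual connection
```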

3.4. Feature Extraction and Enhancement Module

To enhance the network’s perception of image structure in super-resolution tasks, a key challenge must be addressed: accurately modeling fine-grained textures and complex edges. Standard convolutional layers, which employ fixed and rigid kernels, exhibit limited adaptability to the diverse geometric patterns inherent in natural image content. This inflexibility can lead to the loss of high-frequency details. We therefore identify the need for a mechanism that can dynamically conform its receptive field to local structures. To this end, we introduce the dynamic snake convolution (DSConv) into this module. While DSConv was initially developed to enhance accuracy and continuity in tubular structure segmentation by tracing object contours, we posit that its core principle of adaptive kernel deformation is highly transferable to the problem of detail reconstruction in super-resolution. Its integration into our feature extraction and enhancement module serves to significantly augment the model’s ability to capture and represent intricate image features (Figure 2).
The core innovation of DSConv lies in its dynamic adaptation of the convolution kernel’s shape and position to better conform to target geometries. Specifically, DSConv computes adaptive offsets for a standard convolution kernel along both the x-axis and y-axis. To maintain attention continuity and prevent excessive receptive field dispersion caused by large deformation offsets, an iterative strategy is employed to determine focal positions. The positions of the sampling grids in the convolution kernel $K$ are determined sequentially. Originating from the central position $K_i$, the location of any subsequent grid $K_{i \pm c}$ is generated based on the position of its predecessor. This process is mathematically defined as follows for the x-axis direction:
$$K_{i \pm c} = \begin{cases} (x_{i+c},\, y_{i+c}) = \left( x_i + c,\; y_i + \sum_{i}^{i+c} \Delta y \right), \\[4pt] (x_{i-c},\, y_{i-c}) = \left( x_i - c,\; y_i + \sum_{i-c}^{i} \Delta y \right). \end{cases}$$
For the y-axis direction, the formula becomes
$$K_{j \pm c} = \begin{cases} (x_{j+c},\, y_{j+c}) = \left( x_j + \sum_{j}^{j+c} \Delta x,\; y_j + c \right), \\[4pt] (x_{j-c},\, y_{j-c}) = \left( x_j + \sum_{j-c}^{j} \Delta x,\; y_j - c \right), \end{cases}$$
where $c$ denotes the horizontal distance from the central grid, and $\Delta x$ and $\Delta y$ represent the dynamically learned offsets along their respective axes, enabling the kernel to flexibly adapt to the target’s geometric contours.
In super-resolution tasks, which inherently involve features at multiple scales and levels of detail, DSConv’s dynamic kernel adjustment capability proves highly beneficial. It allows the model to better capture fine-grained textures and structures within high-resolution images, leading to demonstrable improvements in reconstruction quality.
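The following is a minimal PyTorch sketch of the x-axis variant of this sampling scheme, written for illustration only: offsets are predicted by a small convolution, accumulated outwards from the kernel centre to mimic the iterative rule above, and applied through bilinear grid sampling. The class and parameter names (`SnakeConv1DAxis`, `max_offset`) are assumptions, not the reference DSConv implementation [31].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SnakeConv1DAxis(nn.Module):
    """Illustrative x-axis dynamic snake convolution: each of the k kernel positions
    receives a learned vertical offset that is accumulated outwards from the centre
    (cf. the K_{i±c} rule), and the deformed positions are read with bilinear
    sampling before a 1x1 convolution aggregates them."""
    def __init__(self, in_ch, out_ch, k=9, max_offset=1.0):
        super().__init__()
        assert k % 2 == 1
        self.k, self.max_offset = k, max_offset
        self.offset_conv = nn.Conv2d(in_ch, k, kernel_size=3, padding=1)
        self.agg = nn.Conv2d(in_ch * k, out_ch, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        dy = torch.tanh(self.offset_conv(x)) * self.max_offset          # (b, k, h, w), bounded offsets
        centre = self.k // 2
        # accumulate offsets outwards from the central grid to keep the kernel continuous
        left = torch.flip(torch.cumsum(torch.flip(dy[:, :centre], [1]), dim=1), [1])
        right = torch.cumsum(dy[:, centre + 1:], dim=1)
        dy = torch.cat([left, torch.zeros_like(dy[:, :1]), right], dim=1)

        ys, xs = torch.meshgrid(torch.arange(h, device=x.device, dtype=x.dtype),
                                torch.arange(w, device=x.device, dtype=x.dtype),
                                indexing="ij")
        samples = []
        for i in range(self.k):
            gx = (xs + (i - centre)).clamp(0, w - 1) / (w - 1) * 2 - 1          # fixed horizontal step
            gy = (ys.unsqueeze(0) + dy[:, i]).clamp(0, h - 1) / (h - 1) * 2 - 1  # learned vertical drift
            grid = torch.stack([gx.unsqueeze(0).expand(b, -1, -1), gy], dim=-1)
            samples.append(F.grid_sample(x, grid, align_corners=True))
        return self.agg(torch.cat(samples, dim=1))
```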

3.5. Multi-Scale Feature Fusion Module

To enhance the network’s ability to capture features across diverse spatial scales, we introduce an innovative multi-scale feature fusion module. This design efficiently extracts and integrates multi-scale information by processing the input feature map concurrently through four parallel convolutional layers with distinct kernel sizes: 3 × 3, 5 × 1, 1 × 5, and 5 × 5. This configuration enables the network to simultaneously attend to both localized details and broader structural patterns, yielding richer feature representations. The adoption of asymmetric convolutions such as 1 × 5 and 5 × 1 enhances the model’s robustness to image flipping and rotation. This approach improves the richness of feature representation and network performance without increasing computational overhead [37]. Experimental analyses revealed that optimal structural similarity (SSIM) and peak signal-to-noise ratio (PSNR) results were achieved when the four parallel layers each outputted one-fourth of the input channels (i.e., an output channel ratio of 1:1:1:1).
The resulting feature maps from these parallel paths are concatenated along the channel dimension to form a unified, multi-scale feature representation:
$$x_{\mathrm{concat}} = \mathrm{torch.cat}\left(\left[\, x_{3 \times 3},\; x_{5 \times 1},\; x_{1 \times 5},\; x_{5 \times 5}\,\right],\ \mathrm{dim} = 1\right),$$
where $x_{3 \times 3}$, $x_{5 \times 1}$, $x_{1 \times 5}$, and $x_{5 \times 5}$ denote the outputs of the respective convolutional layers.
This concatenated feature map is then combined with the output of the preceding layer via residual connections. The combined features subsequently pass through additional convolutional layers and activation functions to further enhance their expressive power.
This multi-scale parallel design significantly increases the network’s flexibility in handling image features of varying scales and complexities. It effectively captures both coarse-grained structural information and fine-grained texture details, which are crucial for accurate image super-resolution. Experimental validation confirms that this structure markedly improves model performance, leading to superior restoration of texture details, reduced blurring and distortion, and enhanced visual quality and objective metrics (PSNR and SSIM) in the reconstructed images.
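A minimal PyTorch sketch of this parallel multi-scale layer is given below, assuming 64 input channels split equally (1:1:1:1) across the four branches as in the ablation setting; the exact surrounding layers and hyperparameters are placeholders rather than the released implementation.

```python
import torch
import torch.nn as nn

class MultiScaleFusion(nn.Module):
    """Sketch of the multi-scale feature fusion layer: four parallel convolutions
    with 3x3, 5x1, 1x5, and 5x5 kernels each produce channels//4 feature maps,
    which are concatenated and combined with the input via a residual connection."""
    def __init__(self, channels=64):
        super().__init__()
        branch = channels // 4                                         # 1:1:1:1 output-channel ratio
        self.conv3x3 = nn.Conv2d(channels, branch, kernel_size=3, padding=1)
        self.conv5x1 = nn.Conv2d(channels, branch, kernel_size=(5, 1), padding=(2, 0))
        self.conv1x5 = nn.Conv2d(channels, branch, kernel_size=(1, 5), padding=(0, 2))
        self.conv5x5 = nn.Conv2d(channels, branch, kernel_size=5, padding=2)
        self.act = nn.ReLU(inplace=True)   # ReLU for brevity; the paper adopts a SwishReLU-based activation

    def forward(self, x):
        x_concat = torch.cat([self.conv3x3(x), self.conv5x1(x),
                              self.conv1x5(x), self.conv5x5(x)], dim=1)
        return self.act(x + x_concat)                                  # residual combination with the input
```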

4. Experimental Analysis and Results

4.1. Datasets

Our DSCNN was trained on the DIV2K dataset [38], the standard benchmark for image restoration tasks. We utilized its 800 training images for model optimization and 100 validation images for monitoring convergence and selecting optimal checkpoints. The evaluation was performed on four established benchmark datasets: Set5 [39] (5 natural images for preliminary validation), Set14 [40] (14 images containing diverse natural and artificial structures), B100 [41] (100 images from the Berkeley Segmentation Dataset with varied textures), and Urban100 [42] (100 challenging urban scenes characterized by fine geometric details and self-similar patterns). The human eye is much more sensitive to changes in luminance than changes in chrominance, and the Y channel of an image represents luminance information. Additionally, the vast majority of SR methods report PSNR and SSIM on the Y channel [9]. To ensure a fair comparison with previous work, all quantitative evaluations (PSNR and SSIM) were conducted on the luminance (Y) channel in YCbCr color space.
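As a concrete illustration of this evaluation protocol, the snippet below converts an RGB image to the BT.601 luminance channel and computes PSNR on it. It is a simplified sketch (no border cropping, no SSIM) rather than the authors’ evaluation script.

```python
import numpy as np

def rgb_to_y(img):
    """Convert an HxWx3 RGB image in [0, 255] to the Y (luminance) channel of YCbCr
    using the ITU-R BT.601 coefficients commonly used in SR evaluation."""
    img = img.astype(np.float64)
    return 16.0 + (65.481 * img[..., 0] + 128.553 * img[..., 1] + 24.966 * img[..., 2]) / 255.0

def psnr_y(sr, hr):
    """PSNR (dB) between two RGB images, measured on the Y channel only."""
    mse = np.mean((rgb_to_y(sr) - rgb_to_y(hr)) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)
```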

4.2. Experimental Settings

The proposed network architecture comprises four functional modules. The feature extraction and enhancement module employs one standard convolutional layer followed by three DSConv layers to extract initial features, with residual connections enhancing feature representation. The multi-scale feature fusion module further processes features through six standard convolutional layers combined with a multi-scale convolution module. Subsequent deep feature extraction is performed via sequential convolutional operations, while the final image reconstruction module utilizes a custom upsampling mechanism to generate super-resolved outputs. This modular design ensures clear information flow while optimizing detail reconstruction.
The batch size is 64 and the patch size is 64; the data augmentation strategy uses random cropping, random flipping, and random rotation, followed by normalization. The optimizer is Adam, with a learning rate initialized to $10^{-4}$ and halved every 400,000 iterations. We evaluate the super-resolution results using PSNR (dB) and SSIM. The formal experimental setup defines 1 epoch as 1000 iterations, with a total of 900 training epochs. The ablation experiments were conducted for 600 epochs, also with 1 epoch per 1000 iterations, but with the batch size and patch size reduced to 32 and 16, respectively. All implementations are built on PyTorch 1.13.1 and executed on a server with one NVIDIA GeForce RTX 3090 GPU (NVIDIA, Santa Clara, CA, USA).
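The training schedule described above can be sketched as follows; the convolutional stand-in for the model is a placeholder, and only the optimizer, step decay, and geometric augmentations stated in the text are reproduced.

```python
import random
import torch

model = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)   # stand-in for the DSCNN model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# halve the learning rate every 400,000 iterations (scheduler.step() called once per iteration)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=400_000, gamma=0.5)

def augment(lr_patch, hr_patch):
    """Joint random flip and 90-degree rotation of an LR/HR patch pair (after random cropping)."""
    if random.random() < 0.5:
        lr_patch, hr_patch = torch.flip(lr_patch, [-1]), torch.flip(hr_patch, [-1])
    k = random.randint(0, 3)
    return torch.rot90(lr_patch, k, [-2, -1]), torch.rot90(hr_patch, k, [-2, -1])
```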

4.3. Ablation Study

To validate the contributions of key components in our DSCNN framework, we conducted systematic ablation experiments for ×4 super-resolution. Starting from a baseline CNN with standard residual blocks, we incrementally integrated proposed modules (DSConv layers, multi-scale fusion, etc.) and evaluated performance progression on the Set5, Set14, and B100 datasets. The quantitative results demonstrating component-wise contributions are summarized in Table 2.
The ablation study begins with a baseline model consisting of a standard residual network without enhancements. We first integrated DSConv by replacing standard convolutions in residual blocks, which alone yielded significant performance gains, e.g., a 0.11 dB PSNR increase on Set5, demonstrating DSConv’s superior capability in capturing intricate structures compared to static kernels. Next, we incorporated the enhanced residual structure with multi-path design, improving feature propagation and fusion to achieve an additional 0.02 dB gain on Set5. We then replaced standard ReLU with SwishReLU activation, resolving potential “dying ReLU” issues while providing smoother gradients for more stable optimization. Finally, the multi-scale enhancement module was added to explicitly fuse features from different receptive fields, completing our DSCNN architecture. Table 2 demonstrates each component’s positive contribution to the overall performance, validating our design choices and highlighting the synergistic effect of these integrated innovations.
For the ×4 task, Table 3 presents the performance of different activation functions when the patch size is set to 8, the batch size is set to 8, the total number of training steps is 100,000, and the decay interval is set to 50,000 steps. It can be observed that the model employing SwishReLU achieves the best overall results. While reducing computational costs, SwishReLU is also capable of enhancing model performance.

4.4. Experimental Results

We conducted comprehensive quantitative and qualitative evaluations comparing our DSCNN against representative SISR methods, spanning traditional approaches (Bicubic) and deep learning models including SRCNN [9], VDSR [13], FSRCNN [10], and DRCN [15].
The quantitative results in Table 4, Table 5, Table 6 and Table 7 demonstrate DSCNN’s consistent superiority across all scale factors (×2, ×3, and ×4) and benchmark datasets. Notably, on the geometrically complex Urban100 dataset at ×4 scaling, DSCNN achieves a 0.23 dB PSNR improvement over lightweight CARN-M while significantly outperforming other established models. Meanwhile, we compared the proposed method with several classical super-resolution methods in terms of Params and FLOPs under the same input size and scaling factor in Table 8. It can be seen that the proposed method has the smallest Params and FLOPs.
Qualitative analysis addresses the limitations of objective metrics (PSNR/SSIM) in assessing perceptual quality, particularly for texture and detail reconstruction. We perform visual comparisons of DSCNN against six representative methods on Set14 and B100: traditional (Bicubic [43], ScSR [57], and Nearest Neighbor) and deep learning-based (SRCNN [9], SelfExSR [42], and CARN-M [16]) methods. The visualizations are structured as follows: (a) high-resolution reference, (b–g) comparative method outputs, and (h) our DSCNN reconstruction. Meanwhile, to more intuitively compare the performance of the proposed method in terms of human perception, we report two quantitative perceptual metrics, FSIM and LPIPS, in Table 9.
Figure 3 qualitatively compares “img014” from Set14, demonstrating our method’s superior visual realism. In the magnified region, conventional methods fail to reconstruct sharp contours on the zebra’s hind leg and lose grassy textures, while DSCNN faithfully recovers these intricate features, yielding perceptually plausible results that better align with human vision.
Figure 4 illustrates DSCNN’s enhanced visual clarity on B100’s “img077”. The magnified patch shows that our method precisely delineates color transitions on the marmot’s nose and maintains authentic color renditions, achieving significantly closer fidelity to the ground-truth than both interpolation and deep learning alternatives.
Figure 5 exemplifies DSCNN’s capability on Set14’s “img007”, producing visually sharper reconstruction. The magnified view reveals superior rendering of petal textures and stamen–petal junctions with markedly higher fidelity, outperforming conventional and learning-based methods in structural accuracy.
Figure 6 and Figure 7 demonstrate the super-resolution performance of our method on satellite and medical images, as well as the potential applications of super-resolution methods in multiple fields.

5. Conclusions

We propose the dynamic snake convolution neural network (DSCNN), a novel framework for efficient, high-quality image super-resolution. The central component of this framework is the feature extraction and enhancement module with dynamic snake convolution, which adaptively deforms convolution kernels to match geometric structures within images, significantly enhancing fine-detail capture. This innovation is synergistically integrated with an enhanced residual framework that ensures robust feature propagation, a multi-scale parallel convolution structure that fuses local and global information, and SwishReLU activations that improve optimization stability. Collectively, these components establish an effective solution that advances single-image super-resolution performance across both quantitative metrics and perceptual quality benchmarks. The code is available at https://github.com/WuZiang73/DSCNN (accessed on 25 July 2025).

Author Contributions

Conceptualization, W.X., Z.W. and C.T.; Methodology, W.X., Z.W. and C.T.; Software, Z.W.; Validation, W.X., Z.W. and C.T.; Investigation, T.B. and B.L.; Resources, Q.Z. and T.B.; Data curation, Z.W.; Writing—original draft, W.X. and Z.W.; Writing—review & editing, W.X. and C.T.; Visualization, W.X., Q.Z. and C.T.; Supervision, Q.Z., T.B., B.L. and C.T.; Project administration, Q.Z., T.B., B.L. and C.T.; Funding acquisition, C.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Basic and Applied Basic Research Foundation of Guangdong Province, grant 2025A1515011566; Leading Talents in Gusu Innovation and Entrepreneurship, grant ZXL2023170; and the Basic Research Programs of Taicang 2024, grant TC2024JC32.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Tian, C.; Zheng, M.; Lin, C.W.; Li, Z.; Zhang, D. Heterogeneous window transformer for image denoising. IEEE Trans. Syst. Man Cybern. Syst. 2024, 54, 6621–6632. [Google Scholar] [CrossRef]
  2. Zhang, K.; Zuo, W.; Chen, Y.; Meng, D.; Zhang, L. Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising. IEEE Trans. Image Process. 2017, 26, 3142–3155. [Google Scholar] [CrossRef]
  3. Tian, C.; Zheng, M.; Li, B.; Zhang, Y.; Zhang, S.; Zhang, D. Perceptive self-supervised learning network for noisy image watermark removal. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 7069–7079. [Google Scholar] [CrossRef]
  4. Ma, L.; Li, N.; Zhu, P.; Tang, K.; Khan, A.; Wang, F.; Yu, G. A novel fuzzy neural network architecture search framework for defect recognition with uncertainties. IEEE Trans. Fuzzy Syst. 2024, 32, 3274–3285. [Google Scholar] [CrossRef]
  5. Li, N.; Xue, B.; Ma, L.; Zhang, M. Automatic Fuzzy Architecture Design for Defect Detection via Classifier-Assisted Multiobjective Optimization Approach. IEEE Trans. Evol. Comput. 2025. [Google Scholar] [CrossRef]
  6. Li, Y.; Sixou, B.; Peyrin, F. A review of the deep learning methods for medical images super resolution problems. IRBM 2021, 42, 120–133. [Google Scholar] [CrossRef]
  7. Wang, P.; Bayram, B.; Sertel, E. A comprehensive review on deep learning based remote sensing image super-resolution methods. Earth-Sci. Rev. 2022, 232, 104110. [Google Scholar] [CrossRef]
  8. Rasti, P.; Uiboupin, T.; Escalera, S.; Anbarjafari, G. Convolutional neural network super resolution for face recognition in surveillance monitoring. In Proceedings of the Articulated Motion and Deformable Objects: 9th International Conference, AMDO 2016, Palma de Mallorca, Spain, 13–15 July 2016; pp. 175–184. [Google Scholar]
  9. Dong, C.; Loy, C.C.; He, K.; Tang, X. Image Super-Resolution Using Deep Convolutional Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 295–307. [Google Scholar] [CrossRef] [PubMed]
  10. Dong, C.; Loy, C.C.; Tang, X. Accelerating the Super-Resolution Convolutional Neural Network. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016. [Google Scholar]
  11. Shi, W.; Caballero, J.; Huszár, F.; Totz, J.; Aitken, A.P.; Bishop, R.; Rueckert, D.; Wang, Z. Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1874–1883. [Google Scholar] [CrossRef]
  12. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar] [CrossRef]
  13. Kim, J.; Lee, J.K.; Lee, K.M. Accurate Image Super-Resolution Using Very Deep Convolutional Networks. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1646–1654. [Google Scholar] [CrossRef]
  14. Lim, B.; Son, S.; Kim, H.; Nah, S.; Lee, K.M. Enhanced Deep Residual Networks for Single Image Super-Resolution. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017; pp. 1132–1140. [Google Scholar] [CrossRef]
  15. Kim, J.; Lee, J.K.; Lee, K.M. Deeply-Recursive Convolutional Network for Image Super-Resolution. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1637–1645. [Google Scholar] [CrossRef]
  16. Li, Y.; Agustsson, E.; Gu, S.; Timofte, R.; Van, L. CARN: Convolutional Anchored Regression Network for Fast and Accurate Single Image Super-Resolution. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018. [Google Scholar]
  17. Chen, H.; Gu, J.; Zhang, Z. Attention in attention network for image super-resolution. arXiv 2021, arXiv:2104.09497. [Google Scholar]
  18. Lai, W.S.; Huang, J.B.; Ahuja, N.; Yang, M.H. Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5835–5843. [Google Scholar] [CrossRef]
  19. Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 286–301. [Google Scholar]
  20. Rahman, J.U.; Zulfiqar, R.; Khan, A. SwishReLU: A unified approach to activation functions for enhanced deep neural networks performance. arXiv 2024, arXiv:2407.08232. [Google Scholar]
  21. He, K.; Zhang, X.; Ren, S.; Sun, J. Identity mappings in deep residual networks. In Computer Vision—ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016, Proceedings, Part IV 14; Springer: Cham, Switzerland, 2016; pp. 630–645. [Google Scholar]
  22. Zagoruyko, S.; Komodakis, N. Wide residual networks. arXiv 2016, arXiv:1605.07146. [Google Scholar]
  23. Yu, F.; Koltun, V.; Funkhouser, T. Dilated residual networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 472–480. [Google Scholar]
  24. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141. [Google Scholar]
  25. Tian, C.; Zheng, M.; Jiao, T.; Zuo, W.; Zhang, Y.; Lin, C.W. A self-supervised CNN for image watermark removal. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 7566–7576. [Google Scholar] [CrossRef]
  26. Han, Y.; Huang, G.; Song, S.; Yang, L.; Wang, H.; Wang, Y. Dynamic neural networks: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 7436–7456. [Google Scholar] [CrossRef]
  27. Dai, J.; Qi, H.; Xiong, Y.; Li, Y.; Zhang, G.; Hu, H.; Wei, Y. Deformable convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 764–773. [Google Scholar]
  28. Yang, B.; Bender, G.; Le, Q.V.; Ngiam, J. Condconv: Conditionally parameterized convolutions for efficient inference. Adv. Neural Inf. Process. Syst. 2019, 32. [Google Scholar]
  29. Chen, Y.; Dai, X.; Liu, M.; Chen, D.; Yuan, L.; Liu, Z. Dynamic convolution: Attention over convolution kernels. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11030–11039. [Google Scholar]
  30. He, T.; Shen, C.; Van Den Hengel, A. Dyco3d: Robust instance segmentation of 3d point clouds through dynamic convolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 354–363. [Google Scholar]
  31. Qi, Y.; He, Y.; Qi, X.; Zhang, Y.; Yang, G. Dynamic snake convolution based on topological geometric constraints for tubular structure segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 6070–6079. [Google Scholar]
  32. Tian, C.; Xu, Y.; Zuo, W.; Lin, C.W.; Zhang, D. Asymmetric CNN for image superresolution. IEEE Trans. Syst. Man, Cybern. Syst. 2021, 52, 3718–3730. [Google Scholar] [CrossRef]
  33. Tian, C.; Zhang, X.; Zhang, Q.; Yang, M.; Ju, Z. Image super-resolution via dynamic network. CAAI Trans. Intell. Technol. 2024, 9, 837–849. [Google Scholar] [CrossRef]
  34. Zhang, Y.; Tian, Y.; Kong, Y.; Zhong, B.; Fu, Y. Residual dense network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2472–2481. [Google Scholar]
  35. Tian, C.; Song, M.; Fan, X.; Zheng, X.; Zhang, B.; Zhang, D. A Tree-guided CNN for image super-resolution. IEEE Trans. Consum. Electron. 2025. [Google Scholar] [CrossRef]
  36. Huber, P.J. Robust estimation of a location parameter. In Breakthroughs in Statistics: Methodology and distribution; Springer: Berlin/Heidelberg, Germany, 1992; pp. 492–518. [Google Scholar]
  37. Ding, X.; Guo, Y.; Ding, G.; Han, J. ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar]
  38. Timofte, R.; Agustsson, E.; Van Gool, L.; Yang, M.H.; Zhang, L. Ntire 2017 challenge on single image super-resolution: Methods and results. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 114–125. [Google Scholar]
  39. Bevilacqua, M.; Roumy, A.; Guillemot, C.; Alberi-Morel, M.L. Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In Proceedings of the British Machine Vision Conference, Surrey, UK, 3–7 September 2012. [Google Scholar]
  40. Zeyde, R.; Elad, M.; Protter, M. On single image scale-up using sparse-representations. In Proceedings of the International Conference on Curves and Surfaces, Avignon, France, 24–30 June 2010; pp. 711–730. [Google Scholar]
  41. Martin, D.; Fowlkes, C.; Tal, D.; Malik, J. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings of the Eighth IEEE International Conference on Computer Vision, ICCV 2001, Vancouver, BC, Canada, 7–14 July 2001; Volume 2, pp. 416–423. [Google Scholar]
  42. Huang, J.B.; Singh, A.; Ahuja, N. Single image super-resolution from transformed self-exemplars. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 5197–5206. [Google Scholar] [CrossRef]
  43. Carlson, R.E.; Fritsch, F.N. Monotone piecewise bicubic interpolation. SIAM J. Numer. Anal. 1985, 22, 386–400. [Google Scholar] [CrossRef]
  44. Dai, D.; Timofte, R.; Van Gool, L. Jointly Optimized Regressors for Image Super-resolution. Comput. Graph. Forum 2015, 34, 95–104. [Google Scholar] [CrossRef]
  45. Wang, Z.; Liu, D.; Yang, J.; Han, W.; Huang, T. Deep Networks for Image Super-Resolution with Sparse Prior. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 370–378. [Google Scholar] [CrossRef]
  46. Timofte, R.; Desmet, V.; Vangool, L. A+: Adjusted Anchored Neighborhood Regression for Fast Super-Resolution. In Computer Vision—ACCV 2014; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2014. [Google Scholar]
  47. Schulter, S.; Leistner, C.; Bischof, H. Fast and accurate image upscaling with super-resolution forests. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3791–3799. [Google Scholar] [CrossRef]
  48. Mao, X.J.; Shen, C.; Yang, Y.B. Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections. Adv. Neural Inf. Process. Syst. 2016, 29. [Google Scholar]
  49. Lu, Z.; Yu, Z.; Yali, P.; Shigang, L.; Xiaojun, W.; Gang, L.; Yuan, R. Fast Single Image Super-Resolution Via Dilated Residual Networks. IEEE Access 2019, 7, 109729–109738. [Google Scholar] [CrossRef]
  50. Shi, Y.; Wang, K.; Chen, C.; Xu, L.; Lin, L. Structure-Preserving Image Super-Resolution via Contextualized Multitask Learning. IEEE Trans. Multimed. 2017, 19, 2804–2815. [Google Scholar] [CrossRef]
  51. Ren, H.; El-Khamy, M.; Lee, J. Image Super Resolution Based on Fusing Multiple Convolution Neural Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017; pp. 1050–1057. [Google Scholar] [CrossRef]
  52. Zhang, K.; Gao, X.; Tao, D.; Li, X. Single Image Super-Resolution With Non-Local Means and Steering Kernel Regression. IEEE Trans. Image Process. 2012, 21, 4544–4556. [Google Scholar] [CrossRef]
  53. Bae, W.; Yoo, J.; Ye, J.C. Beyond Deep Residual Learning for Image Restoration: Persistent Homology-Guided Manifold Simplification. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017; pp. 1141–1149. [Google Scholar] [CrossRef]
  54. Xu, J.; Li, M.; Fan, J.; Zhao, X.; Chang, Z. Self-Learning Super-Resolution Using Convolutional Principal Component Analysis and Random Matching. IEEE Trans. Multimed. 2019, 21, 1108–1121. [Google Scholar] [CrossRef]
  55. Tian, C.; Zhuge, R.; Wu, Z.; Xu, Y.; Zuo, W.; Chen, C.; Lin, C.W. Lightweight image super-resolution with enhanced CNN. Knowl.-Based Syst. 2020, 205, 106235. [Google Scholar] [CrossRef]
  56. Chen, Y.; Pock, T. Trainable Nonlinear Reaction Diffusion: A Flexible Framework for Fast and Effective Image Restoration. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1256–1272. [Google Scholar] [CrossRef]
  57. Yang, J.; Wright, J.; Huang, T.S.; Ma, Y. Image Super-Resolution Via Sparse Representation. IEEE Trans. Image Process. 2010, 19, 2861–2873. [Google Scholar] [CrossRef]
  58. Khan, A.H.; Micheloni, C.; Martinel, N. IDENet: Implicit Degradation Estimation Network for Efficient Blind Super Resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Seattle, WA, USA, 17–18 June 2024; pp. 6065–6075. [Google Scholar]
  59. Wen, W.; Guo, C.; Ren, W.; Wang, H.; Shao, X. Adaptive Blind Super-Resolution Network for Spatial-Specific and Spatial-Agnostic Degradations. IEEE Trans. Image Process. 2024, 33, 4404–4418. [Google Scholar] [CrossRef] [PubMed]
  60. Cao, F.; Chen, B. New architecture of deep recursive convolution networks for super-resolution. Knowl.-Based Syst. 2019, 178, 98–110. [Google Scholar] [CrossRef]
  61. Huang, Y.; Li, S.; Wang, L.; Tan, T. Unfolding the alternating optimization for blind super resolution. Adv. Neural Inf. Process. Syst. 2020, 33, 5632–5643. [Google Scholar]
  62. Hui, Z.; Wang, X.; Gao, X. Fast and Accurate Single Image Super-Resolution via Information Distillation Network. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 723–731. [Google Scholar] [CrossRef]
  63. Tai, Y.; Yang, J.; Liu, X.; Xu, C. MemNet: A Persistent Memory Network for Image Restoration. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 4549–4557. [Google Scholar] [CrossRef]
  64. Cheng, G.; Han, J.; Zhou, P.; Guo, L. Multi-class geospatial object detection and geographic image classification based on collection of part detectors. ISPRS J. Photogramm. Remote Sens. 2014, 98, 119–132. [Google Scholar] [CrossRef]
  65. Cheng, G.; Han, J. A survey on object detection in optical remote sensing images. ISPRS J. Photogramm. Remote Sens. 2016, 117, 11–28. [Google Scholar] [CrossRef]
  66. Cheng, G.; Zhou, P.; Han, J. Learning Rotation-Invariant Convolutional Neural Networks for Object Detection in VHR Optical Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2016, 54, 7405–7415. [Google Scholar] [CrossRef]
Figure 1. Network architecture of DSCNN.
Figure 2. Mechanism of the dynamic snake convolution (DSConv).
Figure 3. Super-resolution visualization results of different methods with scaling factor of × 2 on Set14-img014. (a) HR image, (b) Nearest Neighbor, (c) Bicubic, (d) SRCNN, (e) ScSR, (f) SelfExSR, (g) CARN-M, and (h) DSCNN (ours).
Figure 4. Super-resolution visualization results of different methods with scaling factor of × 3 on B100-img077. (a) HR image, (b) Nearest Neighbor, (c) Bicubic, (d) SRCNN, (e) ScSR, (f) SelfExSR, (g) CARN-M, and (h) DSCNN (ours).
Figure 5. Super-resolution visualization results of different methods with scaling factor of × 4 on Set14-img007. (a) HR image, (b) Nearest Neighbor, (c) Bicubic, (d) SRCNN, (e) ScSR, (f) SelfExSR, (g) CARN-M, and (h) DSCNN (ours).
Figure 6. Super-resolution visualization results with scaling factor of × 4 on the NWPU VHR-10 satellite image dataset [64,65,66]. (a) HR image, (b) Bicubic, and (c) DSCNN (ours).
Figure 7. Super-resolution visualization results with scaling factor of × 4 on the publicly available Blood Cell Images dataset. (a) HR image, (b) Bicubic, and (c) DSCNN (ours).
Table 1. Comparative analysis of dynamic convolution methods.
Method | Kernel Adaptation | Training Stability | Primary Applications
DCN [27] | Point-wise offsets | Moderate | Object detection
CondConv [28] | Kernel weighting | High | Classification
DyConv [29] | Attention | High (w/ annealing) | Keypoint detection
DSConv [31] | Iterative tracing | High | Curvilinear feature enhancement
Table 2. Ablation study results of DSCNN (PSNR/SSIM).
Methods | Set5 (PSNR (dB)/SSIM) | Set14 (PSNR (dB)/SSIM) | B100 (PSNR (dB)/SSIM)
Baseline | 31.52/0.8854 | 27.96/0.7737 | 27.37/0.7284
+DSConv | 31.63/0.8860 | 28.02/0.7742 | 27.38/0.7293
+Enhanced Residual Structure | 31.65/0.8878 | 28.04/0.7758 | 27.40/0.7302
+SwishReLU | 31.68/0.8886 | 28.05/0.7776 | 27.44/0.7319
+Multi-Scale Enhancement | 31.70/0.8890 | 28.10/0.7777 | 27.46/0.7323
Table 3. Comparison of ReLU-like activation functions in the ×4 task.
Functions | Set5 (PSNR (dB)/SSIM) | Set14 (PSNR (dB)/SSIM) | B100 (PSNR (dB)/SSIM)
ReLU | 30.20/0.8560 | 27.11/0.7494 | 26.79/0.7089
LeakyReLU | 30.25/0.8567 | 27.11/0.7501 | 26.82/0.7103
BReLU-6 | 30.08/0.8568 | 27.07/0.7497 | 26.74/0.7095
SwishReLU | 30.37/0.8604 | 27.21/0.7535 | 26.85/0.7121
Table 4. Average PSNR (dB) and SSIM for different methods with three scaling factors on the Set5 dataset. The best results are highlighted in red and the second best in blue.
DatasetsMethods×2×3×4
Set5Bicubic [43]33.66/0.929930.39/0.868228.42/0.8104
JOR [44]36.58/0.954332.55/0.906730.19/0.8563
SRCNN [9]36.66/0.954232.75/0.909030.48/0.8628
SelfEx [42]36.49/0.953732.58/0.909330.31/0.8619
VDSR [13]37.53/0.958733.66/0.921331.35/0.8838
CSCN [45]36.93/0.955233.10/0.914430.86/0.8732
FSRCNN [10]37.00/0.955833.16/0.914030.71/0.8657
A+ [46]36.54/0.954432.58/0.908830.28/0.8603
RFL [47]36.54/0.953732.43/0.905730.14/0.8548
RED [48]37.56/0.959533.70/0.922231.33/0.8847
FDSR [49]37.40/0.951333.68/0.909631.28/0.8658
RCN [50]37.17/0.958333.45/0.917531.11/0.8736
DRCN [15]37.63/0.958833.82/0.922631.53/0.8854
CNF [51]37.66/0.959033.74/0.922631.55/0.8856
DnCNN [52]37.58/0.959033.75/0.922231.40/0.8845
LapSRN [18]37.52/0.9590-31.54/0.8850
WaveResNet [53]37.57/0.958633.86/0.922831.52/0.8864
CPCA [54]34.99/0.946931.09/0.897528.67/0.8434
LESRCNN [55]37.65/0.958633.93/0.923131.88/0.8903
TNRD [56]36.86/0.955633.18/0.915230.85/0.8732
ScSR [57]35.78/0.948531.34/0.886929.07/0.8263
IDENet [58]37.16/0.9521-31.57/0.8846
GLFDN [59]37.47/0.954533.86/0.920331.90/0.8869
DSRNet [33]37.61/0.958433.92/0.922731.71/0.8874
DSCNN (Ours)37.66/0.958933.94/0.923431.90/0.8909
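For readers reproducing numbers of this kind, SR benchmarks conventionally report PSNR and SSIM on the luminance (Y) channel with a small border crop. The helper below follows that common convention using scikit-image; the exact evaluation protocol used for Tables 4-7 is not restated here, so treat this as an assumed, typical setup rather than the authors' evaluation script.

```python
import numpy as np
from skimage.color import rgb2ycbcr
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_sr(hr_rgb: np.ndarray, sr_rgb: np.ndarray, scale: int = 4):
    """Compute PSNR/SSIM on the Y channel with a border crop of `scale` pixels,
    a common (assumed) SR evaluation convention. Inputs are uint8 RGB arrays
    of identical shape."""
    hr_y = rgb2ycbcr(hr_rgb)[..., 0]
    sr_y = rgb2ycbcr(sr_rgb)[..., 0]
    hr_y = hr_y[scale:-scale, scale:-scale]
    sr_y = sr_y[scale:-scale, scale:-scale]
    psnr = peak_signal_noise_ratio(hr_y, sr_y, data_range=255)
    ssim = structural_similarity(hr_y, sr_y, data_range=255)
    return psnr, ssim
```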
Table 5. Average PSNR (dB) and SSIM for different methods with three scaling factors on the Set14 dataset. The best results are highlighted in red and the second best in blue.
Methods | ×2 | ×3 | ×4
Bicubic [43] | 30.24/0.8688 | 27.55/0.7742 | 26.00/0.7027
RFL [47] | 32.26/0.9040 | 29.05/0.8164 | 27.24/0.7451
SRCNN [9] | 32.42/0.9063 | 29.28/0.8209 | 27.49/0.7503
FDSR [49] | 33.00/0.9042 | 29.61/0.8179 | 27.86/0.7500
SelfEx [42] | 32.22/0.9034 | 29.16/0.8196 | 27.40/0.7518
VDSR [13] | 33.03/0.9124 | 29.77/0.8314 | 28.01/0.7674
DRRN [18] | 33.23/0.9136 | 29.96/0.8349 | 28.21/0.7720
CSCN [45] | 32.56/0.9074 | 29.41/0.8238 | 27.64/0.7578
FSRCNN [10] | 32.63/0.9088 | 29.43/0.8242 | 27.59/0.7535
A+ [46] | 32.28/0.9056 | 29.13/0.8188 | 27.32/0.7491
JOR [44] | 32.38/0.9063 | 29.19/0.8204 | 27.27/0.7479
RED [48] | 32.81/0.9135 | 29.50/0.8334 | 27.72/0.7698
RCN [50] | 32.77/0.9109 | 29.63/0.8269 | 27.79/0.7594
DRCN [15] | 33.04/0.9118 | 29.76/0.8311 | 28.02/0.7670
LapSRN [18] | 33.08/0.9130 | - | 28.19/0.7720
WaveResNet [53] | 33.09/0.9129 | 29.88/0.8331 | 28.11/0.7699
CPCA [54] | 31.04/0.8951 | 27.89/0.8038 | 26.10/0.7296
DnCNN [52] | 33.03/0.9128 | 29.81/0.8321 | 28.04/0.7672
NDRCN [60] | 33.20/0.9141 | 29.88/0.8333 | 28.10/0.7697
TNRD [56] | 32.51/0.9069 | 29.43/0.8232 | 27.66/0.7563
ScSR [57] | 31.64/0.8940 | 28.19/0.7977 | 26.40/0.7218
IDENet [58] | 32.84/0.9025 | - | 28.27/0.7678
DSCNN (Ours) | 33.28/0.9166 | 29.84/0.8386 | 28.17/0.7794
Table 6. Average PSNR (dB) and SSIM for different methods with three scaling factors on the B100 dataset. The best results are highlighted in red and the second best in blue.
Methods | ×2 | ×3 | ×4
Bicubic [43] | 29.56/0.8431 | 27.21/0.7385 | 25.96/0.6675
RFL [47] | 31.16/0.8840 | 28.22/0.7806 | 26.75/0.7054
SRCNN [11] | 31.36/0.8879 | 28.41/0.7863 | 26.90/0.7101
VDSR [13] | 31.90/0.8960 | 28.82/0.7976 | 27.29/0.7251
SelfEx [42] | 31.18/0.8855 | 28.29/0.7840 | 26.84/0.7106
DRRN [18] | 32.05/0.8973 | 28.95/0.8004 | 27.38/0.7284
FSRCNN [10] | 31.53/0.8920 | 28.53/0.7910 | 26.98/0.7150
TNRD [56] | 31.40/0.8878 | 28.85/0.7981 | 27.29/0.7253
CARN-M [16] | 31.92/0.8960 | 28.91/0.8000 | 27.44/0.7304
A+ [46] | 31.21/0.8863 | 28.29/0.7835 | 26.82/0.7087
JOR [44] | 31.22/0.8867 | 28.27/0.7837 | 26.79/0.7083
RED [48] | 31.96/0.8972 | 28.88/0.7993 | 27.35/0.7276
CSCN [45] | 31.40/0.8884 | 28.50/0.7885 | 27.03/0.7161
DRCN [15] | 31.85/0.8942 | 28.80/0.7963 | 27.23/0.7233
CNF [51] | 31.91/0.8962 | 28.82/0.7980 | 27.32/0.7253
LapSRN [18] | 31.80/0.8950 | - | 27.32/0.7280
NDRCN [60] | 32.00/0.8975 | 28.86/0.7991 | 27.30/0.7263
LESRCNN [55] | 31.95/0.8964 | 28.91/0.8005 | 27.45/0.7313
FDSR [49] | 31.87/0.8847 | 28.82/0.7797 | 27.31/0.7031
ScSR [57] | 30.77/0.8744 | 27.72/0.7647 | 26.61/0.6983
DnCNN [52] | 31.90/0.8961 | 28.85/0.7981 | 27.29/0.7253
DAN [61] | 31.76/0.8858 | 28.94/0.7919 | 27.51/0.7248
IDENet [58] | 31.65/0.8848 | - | 27.35/0.7235
DSRNet [33] | 31.96/0.8965 | 28.90/0.8003 | 27.43/0.7303
DSCNN (Ours) | 32.06/0.8983 | 28.94/0.8011 | 27.52/0.7342
Table 7. Average PSNR (dB) and SSIM for different methods with three scaling factors on the Urban100 dataset. The best results are highlighted in red and the second best in blue.
Methods | ×2 | ×3 | ×4
Bicubic [43] | 26.88/0.8403 | 24.46/0.7349 | 23.14/0.6577
SRCNN [9] | 29.50/0.8946 | 26.24/0.7989 | 24.52/0.7221
FDSR [49] | 30.91/0.9088 | 27.23/0.8190 | 25.27/0.7417
CARN-M [16] | 31.23/0.9193 | 27.55/0.8385 | 25.62/0.7694
JOR [44] | 29.25/0.8951 | 25.97/0.7972 | 24.29/0.7181
VDSR [13] | 30.76/0.9140 | 27.14/0.8279 | 25.18/0.7524
DRRN [18] | 31.23/0.9188 | 27.53/0.7378 | 25.44/0.7638
FSRCNN [10] | 29.88/0.9020 | 26.43/0.8080 | 24.62/0.7280
TNRD [56] | 29.70/0.8994 | 26.42/0.8076 | 24.61/0.7291
IDN [62] | 31.27/0.9196 | 27.42/0.8359 | 25.41/0.7632
WaveResNet [53] | 30.96/0.9169 | 27.28/0.8334 | 25.36/0.7614
RED [48] | 30.91/0.9159 | 27.31/0.8303 | 25.35/0.7587
DRCN [15] | 30.75/0.9133 | 27.15/0.8276 | 25.14/0.7510
A+ [46] | 29.20/0.8936 | 26.03/0.7973 | 24.32/0.7183
NDRCN [60] | 31.06/0.9175 | 27.23/0.8312 | 25.16/0.7546
MemNet [63] | 31.31/0.9195 | 27.56/0.8376 | 25.50/0.7630
DnCNN [52] | 30.74/0.9139 | 27.15/0.8276 | 25.20/0.7521
LESRCNN [55] | 31.45/0.9206 | 27.70/0.8415 | 25.77/0.7732
RFL [47] | 29.11/0.8904 | 25.86/0.7900 | 24.19/0.7096
ScSR [57] | 28.26/0.8828 | - | 24.02/0.7024
LapSRN [18] | 30.41/0.9100 | - | 25.21/0.7560
DAN [61] | 30.60/0.9060 | 27.65/0.8352 | 25.86/0.7721
SelfEx [42] | 29.54/0.8967 | 26.44/0.8088 | 24.79/0.7374
IDENet [58] | 30.22/0.9004 | - | 25.39/0.7585
DSRNet [33] | 31.41/0.9209 | 27.63/0.8402 | 25.65/0.7693
DSCNN (Ours) | 31.72/0.9244 | 27.69/0.8425 | 25.85/0.7787
Table 8. Parameters and FLOPs for different methods with 1280 × 960 input size and PSNR (dB)/SSIM with a ×4 scaling factor on Set5 dataset.
Models | Params (K) | FLOPs (G)
EDSR [14] | 1518 | 114.0
CARN [16] | 1592 | 90.9
DSCNN (ours) | 970 | 86.78
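The parameter counts in Table 8 can be reproduced for any PyTorch model by summing the sizes of its learnable tensors, while FLOPs additionally depend on the input resolution (1280 × 960 here) and the profiling tool used. The helper below is a generic sketch with a toy stand-in model, not the released DSCNN code.

```python
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    """Total learnable parameters; divide by 1e3 for the K units used in Table 8."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Toy stand-in model (the DSCNN architecture itself is not reproduced here).
toy = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 3, kernel_size=3, padding=1),
)
print(f"{count_parameters(toy) / 1e3:.1f} K parameters")
```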
Table 9. FSIM and LPIPS of different models on the Urban100 dataset for the ×4 task.
Model | FSIM | LPIPS
bicubic [43] | 0.9415 | 0.4727
SRCNN [9] | 0.9671 | 0.3516
SelfExSR [42] | 0.9761 | 0.3098
CARN-M [16] | 0.9811 | 0.2524
DSCNN (Ours) | 0.9824 | 0.2385
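LPIPS is commonly computed with the open-source lpips package (AlexNet backbone, inputs scaled to [-1, 1]); the sketch below shows that typical usage on toy tensors, and is not necessarily the exact configuration behind Table 9. FSIM requires a separate implementation and is not shown.

```python
import torch
import lpips  # pip install lpips

# LPIPS expects NCHW tensors scaled to [-1, 1]; the AlexNet backbone is a common default.
loss_fn = lpips.LPIPS(net="alex")

def lpips_distance(sr: torch.Tensor, hr: torch.Tensor) -> float:
    """sr, hr: float tensors of shape (1, 3, H, W) with values in [0, 1]."""
    with torch.no_grad():
        return loss_fn(sr * 2 - 1, hr * 2 - 1).item()

# Toy usage with random images (lower LPIPS means perceptually closer):
print(lpips_distance(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)))
```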