Article

PECNet: A Lightweight Single-Image Super-Resolution Network with Periodic Boundary Padding Shift and Multi-Scale Adaptive Feature Aggregation

College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
* Author to whom correspondence should be addressed.
Symmetry 2025, 17(11), 1833; https://doi.org/10.3390/sym17111833
Submission received: 30 September 2025 / Revised: 22 October 2025 / Accepted: 26 October 2025 / Published: 1 November 2025

Abstract

Lightweight Single-Image Super-Resolution (SISR) faces the core challenge of balancing computational efficiency with reconstruction quality, particularly in preserving both high-frequency details and global structures under constrained resources. To address this, we propose the Periodically Enhanced Cascade Network (PECNet). Our main contributions are as follows: (1) Its core component is a novel Multi-scale Adaptive Feature Aggregation (MAFA) module, which employs three functionally complementary branches that work synergistically: one dedicated to extracting local high-frequency details, another to efficiently modeling long-range dependencies and a third to capturing structured contextual information within windows. (2) To seamlessly integrate these branches and enable cross-window information interaction, we introduce the Periodic Boundary Padding Shift (PBPS) mechanism, a symmetric preprocessing step that achieves implicit window shifting without introducing any additional computational overhead. Extensive benchmarking shows that PECNet achieves better reconstruction quality without increased complexity. Taking NGswin, a representative shift-window-based lightweight model, as an example: for ×4 SR on the Manga109 dataset, PECNet achieves an average PSNR 0.25 dB higher, while its computational cost (in FLOPs) is merely 40% of NGswin’s.

1. Introduction

Lightweight Single-Image Super-Resolution (SISR) seeks to generate high-resolution images under limited computing power, enabling real-time, on-device operation. While CNN-based approaches (e.g., IMDN [1], RFDN [2]) have established efficient frameworks, they struggle to reconcile computational efficiency with reconstruction fidelity—often sacrificing quality for speed or vice versa. This fundamental trade-off impedes practical applications where both metrics are critical.
Existing solutions bifurcate into two architecturally deficient paradigms. CNN-based methods pursue efficiency through feature distillation (IMDN [1]) or separation (BSRN [3]), yet their convolutional operators—constrained by limited receptive fields—fundamentally lack global contextual modeling. To enhance representational power, CNN-based methods [4,5,6] progressively adopt deeper and larger architectures, yet such computationally intensive designs hinder deployment on resource-constrained devices. Conversely, while the self-attention (SA) mechanism employed by ViT approaches (e.g., SwinIR [7], ELAN-light [8]) effectively models non-local information for coherence, it demands high computational resources and massive memory. Hybrid attempts such as NGswin [9] integrate convolutional priors into attention blocks yet exacerbate spectral imbalance due to self-attention’s inherent low-pass filtering. Many recent SR studies integrate local and non-local features; however, although existing multi-branch models are specifically designed to extract different types of features, the features from each branch do not complement one another well, leading to suboptimal feature integration.
These unresolved issues impose stubborn limitations on efficient, high-fidelity reconstruction in lightweight SISR research. First, prevailing multi-scale architectures exhibit weak feature synthesis: despite innovations such as SMFANet’s dual-path design [10], existing frameworks fail to holistically integrate discriminative local details (e.g., high-frequency edges/textures) and global structural dependencies (e.g., low-frequency layouts); the separate extraction of these complementary modes generates edge or corner artifacts in structurally complex regions, as quantitatively evidenced on the Urban100 benchmark [11]. Second, existing window-shifting methods often create architectural imbalance when applied partially: simply shifting windows in a single component disrupts feature consistency and degrades performance. A significant research gap therefore exists, as no solution can seamlessly integrate features and perform window shifting across the entire architecture without adding costly complexity.
To comprehensively address these limitations, we propose the Multi-scale Adaptive Feature Aggregation (MAFA) module, a collaboratively integrated three-branch architecture that resolves inefficient feature fusion via synergistic integration; we also propose a lightweight SISR algorithm, the Periodically Enhanced Cascade Network (PECNet), which employs MAFA as its main block.
(1)
MAFA integrates local details and global structural information to avoid weak feature synthesis—solving the problem that existing frameworks (even with improvements such as dual-path designs) still cannot integrate the two complementary feature types, namely “distinguishable local details” and “global structural dependencies”, as a whole, leaving their extraction disconnected. The MAFA module achieves holistic integration through three specialized branches: the Local Detail Estimation (LDE) branch enhances high-frequency details via depthwise convolution; the Efficient Approximation of Self-Attention (EASA) branch models long-range dependencies with variance modulation; and the Window Non-local Attention (WNA) branch captures intra-window contexts through 8 × 8 window attention.
(2)
We propose the Periodic Boundary Padding Shift (PBPS) mechanism in MAFA, which serves as a unified preprocessing backbone to structurally support and align the three complementary branches. Window shifting is difficult to apply within LDE, and shifting windows in only one branch (WNA) would create imbalance. Instead, for odd-indexed blocks, symmetric replicate-padding (4 pixels) expands the feature dimensions to induce a fixed window offset, while even-indexed blocks maintain the original resolution, followed by center cropping—eliminating explicit shifting operations while equivalently achieving SwinIR-style cross-window communication at zero computational overhead. Feature refinement is enhanced via a Partial Convolution-based Feed-forward Network (PCFN) that selectively processes channels while preserving identity paths. Our experimental evaluation shows that PECNet achieves an outstanding balance between reconstruction quality and computational efficiency across multiple benchmarks (see Figure 1).
We summarize our main contributions as follows:
  • Three-branch aggregation: We design a three-branch aggregation module in MAFA to address the gap in frequency and spatial-distance modeling for SISR: the EASA branch captures remote non-local low-frequency information, the LDE branch extracts local high-frequency details and the WNA branch focuses on non-local interactions within the shifted window.
  • PBPS mechanism: We propose the PBPS mechanism to integrate the three branches. PBPS not only alleviates the boundary discontinuity of traditional WNA blocks, but also enhances the generalization capability of the LDE and EASA branches, because window shifting is likewise applied to LDE and EASA.
  • PECNet algorithm: We propose a lightweight SISR algorithm, PECNet, which employs MAFA as its main block. PECNet extracts multi-frequency and multi-distance features through three specialized branches, with each block operating optimally in its respective domain.

2. Related Work

CNN-based SR. SRCNN [13] pioneered direct end-to-end mapping learning from low-resolution (LR) to high-resolution (HR) images, surpassing traditional interpolation methods. FSRCNN [14] and ESPCN [15] adopted post-upsampling strategies to reduce computational overhead, thereby improving efficiency. DRCN [16] introduced deep recursion with recursive supervision and skip-connections to achieve large receptive fields without increasing parameters. DRRN [4] employed recursive residual blocks with weight sharing and multi-path local connections to build a very deep network with minimal parameters. EDSR [5] and RCAN [6] improved performance by stacking hundreds of layers, pushing deep architectures to their limits. However, such large models with high computational costs are difficult to deploy in practical applications.
Lightweight SR. The classic lightweight SR algorithm AWSRN [17] introduces adaptive weighted residual units and local fusion blocks for efficient residual learning, and proposes an adaptive weighted multi-scale module to fully utilize features. OSFFNet [18] introduced an Omni-Stage Feature Fusion architecture with stacked initialization and dynamic fusion to fully leverage multi-level features, particularly shallow ones, for efficient and high-quality image super-resolution. IMDN [1] proposed an information distillation block, which progressively splits, refines and aggregates feature maps, significantly reducing model parameters. HiT-SR [19] introduced expanding hierarchical windows and a spatial-channel correlation mechanism with linear complexity to efficiently aggregate multi-scale features while significantly reducing computational costs. Existing methods mainly focus on multi-dimensional information fusion: DLGSANet [20] introduced a Multi-Head Dynamic Local Self-Attention module and a Sparse Global Self-Attention module to efficiently capture both locally variant structures and the most useful global dependencies with very low computational overhead. MTKD [21] applied a multi-teacher distillation strategy to compress models, with its knowledge transfer mechanism improving the PSNR of lightweight models by 0.21 dB. XPSR [22] explored cross-modal priors to enhance reconstruction quality. Although existing lightweight SR methods model local/global features or fuse multi-source information, they lack targeted high-frequency modeling.
ViT-based SR. IPT [23] pioneered the introduction of standard ViT into SR tasks, laying the foundation for global modeling. SwinIR [7] designed window self-attention, surpassing large CNN models. Swin2SR [24] improves SwinIR by adopting Swin Transformer V2, effectively solving the problems of training instability and resolution differences, and achieving faster convergence and competitive performance on compressed image super-resolution and restoration tasks. Shift-Net [25] introduced a shift-connection layer to U-Net for deep feature rearrangement, enabling semantically coherent and texture-rich inpainting with improved efficiency. ELAN [8] proposed grouped weight sharing, significantly reducing complexity. Inspired by local attribution maps, NGswin [9] introduced N-Gram context and an SCDP bottleneck to expand receptive fields and fuse multi-scale features efficiently for lightweight image super-resolution. Despite the advantages of integrating convolution and attention in ViT methods, they still have drawbacks: self-attention’s low-pass characteristic over-smooths reconstructions and causes frequency-domain imbalance, and these methods notably lack dedicated branches to explicitly model high-frequency details and low-frequency structures simultaneously, resulting in unbalanced feature representations.
Building upon the previously analyzed works, our MAFA module incorporates three dedicated branches that explicitly model both high-frequency details and low-frequency structures, enabling more balanced and comprehensive feature representation.
To collaborate with the three-branch structure of MAFA, this paper proposes a PBPS mechanism that implicitly enables shifted-window modeling while providing a unified structural foundation for parallel multi-branch processing, thereby allowing simultaneous extraction of high-frequency details and global dependencies.
In summary, PECNet is not an incremental improvement but a novel architectural solution via MAFA’s three-branch synergy and PBPS’s structural implicit shifting, achieving a superior balance between efficiency and fidelity. The comparison results are shown in Table 1.

3. Proposed Method

In this section, we give a detailed introduction to our proposed PECNet: Section 3.1 elaborates on its overall architecture, Section 3.2 presents the three-branch design of the MAFA module and Section 3.3 explains the PBPS mechanism and its inverse operation.

3.1. Overall Architecture

Figure 2 illustrates the overall architecture of our proposed PECNet, which takes a low-resolution (LR) image as input and employs a 3 × 3 convolutional layer to extract shallow low-level features. These features are then fed into a series of Periodically Enhanced Cascade Blocks (PECBs) to generate deep representative features, with each PECB integrating a Multi-scale Adaptive Feature Aggregation (MAFA) module and a Partial Convolution-based Feed-forward Network (PCFN). The MAFA module, central to feature processing, deeply incorporates the Periodic Boundary Padding Shift (PBPS) mechanism, which indirectly implements window shifting for the attention mechanism by expanding image boundaries in every alternate deep feature block. The processed features are then fed into three specialized branches: the Local Detail Estimation (LDE) branch, which captures fine-grained local details and high-frequency information; the Efficient Approximation of Self-Attention (EASA) branch, which models long-range non-local interactions focusing on distant low-frequency information; and the Window Non-local Attention (WNA) branch, which partitions features into fixed windows to compute intra-window attention, handling window-internal low-frequency information. The PCFN refines MAFA’s output via channel-selective processing, applying spatial convolution to a minority of channels while preserving identity paths for the majority, for efficiency.
The PECB, integrating MAFA and PCFN, achieves gradient stability via residual connections, formulated as $F_{\rho} = \mathrm{MAFA}(F_{\mathrm{in}}) + F_{\mathrm{in}}$ and $\hat{F}_{\rho} = \mathrm{PCFN}(F_{\rho}) + F_{\rho}$, where $F_{\mathrm{in}}$ denotes the block input, $F_{\rho}$ the intermediate feature after MAFA and $\hat{F}_{\rho}$ the block output. Building on the deep features output by the PECBs, the lightweight image reconstruction module generates the high-quality output: it uses a convolutional layer to adjust channel dimensions to match the target upscaling factor, followed by a PixelShuffle layer for resolution upscaling, with a global residual connection incorporated to preserve high-frequency details. Through the PBPS mechanism and the specialized three-branch structure in MAFA, PECNet effectively captures multi-scale and multi-frequency features.
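To make the block structure concrete, the following PyTorch sketch shows the residual wiring of one PECB; the MAFA and PCFN constructors are hypothetical stand-ins for the modules described above, not the released implementation.

```python
import torch.nn as nn

class PECB(nn.Module):
    """Minimal sketch of one Periodically Enhanced Cascade Block (assumed interface)."""
    def __init__(self, channels, index):
        super().__init__()
        self.mafa = MAFA(channels, index)  # hypothetical; `index` sets PBPS parity (odd/even)
        self.pcfn = PCFN(channels)         # hypothetical partial-convolution feed-forward net

    def forward(self, f_in):
        f_rho = self.mafa(f_in) + f_in    # F_rho = MAFA(F_in) + F_in
        return self.pcfn(f_rho) + f_rho   # F_hat_rho = PCFN(F_rho) + F_rho
```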

3.2. Three-Branch Architecture in MAFA

To synergistically model local and non-local features for enhanced reconstruction accuracy, we propose a lightweight Multi-scale Adaptive Feature Aggregation (MAFA) module. Within this module, the Periodic Boundary Padding Shift (PBPS) mechanism implicitly realizes shifted-window attention, while three specialized components operate in parallel: the Local Detail Estimation (LDE) branch focuses on edge and texture recovery, the Efficient Approximation of Self-Attention (EASA) branch efficiently models long-range dependencies and the Window Non-local Attention (WNA) branch captures non-local context and low-frequency structural information.
Formally, the MAFA module processes input features $F_{\mathrm{in}} \in \mathbb{R}^{H \times W \times C}$ through three parallel branches with integrated PBPS, as defined by Equation (1):

$$\mathrm{MAFA}(F_{\mathrm{in}}) = \Gamma_{\mathrm{crop}}\big(\mathrm{Conv}_{1\times1}\big(\mathrm{LDE}(\Gamma_{\mathrm{padding}}(F_{\mathrm{in}})) + \mathrm{EASA}(\Gamma_{\mathrm{padding}}(F_{\mathrm{in}})) + \mathrm{WNA}(\Gamma_{\mathrm{padding}}(F_{\mathrm{in}}))\big)\big) \tag{1}$$

where $\Gamma_{\mathrm{padding}}(\cdot)$ denotes the symmetric replicate-padding (4 px) applied for odd-indexed blocks, $\mathrm{LDE}(\cdot)$ denotes high-frequency extraction, $\mathrm{EASA}(\cdot)$ denotes the efficient approximation of self-attention, $\mathrm{WNA}(\cdot)$ denotes window attention and $\Gamma_{\mathrm{crop}}(\cdot)$ indicates the center-crop (4 px) restoration for odd-indexed blocks.
The processing flow within a PECB is as follows: the input features $F_{\mathrm{in}}$ first undergo the Periodic Boundary Padding Shift ($\Gamma_{\mathrm{padding}}$) for implicit window alignment. The padded features are then processed in parallel by the three specialized branches (LDE, EASA, WNA). Their outputs are aggregated via element-wise summation, followed by a 1 × 1 convolution for channel mixing and feature refinement. The resulting features are then passed through the inverse PBPS ($\Gamma_{\mathrm{crop}}$) to restore the original spatial dimensions. Finally, the output of the entire MAFA module is integrated back into the main pathway via a residual connection ($F_p = \mathrm{MAFA}(F_{\mathrm{in}}) + F_{\mathrm{in}}$), followed by further processing in the PCFN with another residual connection.
Feature partitioning and fusion. Given the input feature $F_{\mathrm{padded}} \in \mathbb{R}^{H \times W \times C}$ from PBPS, we first apply channel-wise normalization and expansion, as in Equation (2):

$$\{X, Y, Z\} = S\big(\mathrm{Conv}_{1\times1}\big(\|F_{\mathrm{padded}}\|_2\big)\big) \tag{2}$$

where $\|\cdot\|_2$ denotes channel-wise L2 normalization (each channel is normalized by the L2 norm of all its pixel values), $\mathrm{Conv}_{1\times1}$ performs channel expansion and $S(\cdot)$ partitions the feature channels sequentially and uniformly into three equal parts, requiring the total channel count to be divisible by 3; no channel shuffling or permutation is applied. We then process the features X, Y and Z in parallel via the LDE, EASA and WNA branches, producing $X_l$, $Y_d$ and $Z_w$, respectively. Finally, we fuse $X_l$, $Y_d$ and $Z_w$ with element-wise addition and feed the sum into a 1 × 1 convolution to form a representative output, as formulated in Equation (3):

$$F_{\mathrm{mafa}} = \mathrm{Conv}_{1\times1}(X_l + Y_d + Z_w) \tag{3}$$

where $F_{\mathrm{mafa}}$ is the output feature after three-branch fusion.
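A minimal sketch of this partition-and-fuse step, assuming the per-channel spatial L2 normalization described above and a plain sequential channel split for S(·):

```python
import torch.nn as nn

class PartitionFuse(nn.Module):
    """Sketch of Eqs. (2) and (3); normalization and split details are assumptions."""
    def __init__(self, channels):
        super().__init__()
        self.expand = nn.Conv2d(channels, 3 * channels, 1)  # channel expansion before S(.)
        self.fuse = nn.Conv2d(channels, channels, 1)        # 1x1 fusion conv of Eq. (3)

    def split(self, f_padded):
        # |.|_2: normalize each channel by the L2 norm of its pixel values
        norm = f_padded.norm(p=2, dim=(-2, -1), keepdim=True) + 1e-6
        x, y, z = self.expand(f_padded / norm).chunk(3, dim=1)  # S(.): sequential equal split
        return x, y, z

    def fuse_branches(self, x_l, y_d, z_w):
        return self.fuse(x_l + y_d + z_w)  # element-wise sum, then channel mixing
```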
LDE. The LDE branch captures high-frequency details essential for reconstruction quality. Dedicated to local feature extraction through cascaded convolutions, it complements EASA’s global low-frequency modeling and WNA’s window-based low-frequency focus [7]. This design is inspired by SMFANet [10].
The LDE branch processes the input feature X via depthwise convolution to generate the encoded feature $X_m$; then, through a 1 × 1 convolutional layer and a non-linear activation function, it outputs the refined high-frequency feature $X_l$, as formulated in Equations (4) and (5):

$$X_m = \mathrm{Conv}_{1\times1}\big(\mathrm{DWConv}_{3\times3}(X)\big) \tag{4}$$

$$X_l = \mathrm{Conv}_{1\times1}\big(\phi(X_m)\big) \tag{5}$$

where $\mathrm{DWConv}_{3\times3}(X)$ denotes local spatial feature extraction via 3 × 3 depthwise convolution, $X_m \in \mathbb{R}^{H \times W \times 2C}$ is the encoded high-frequency detail feature and $X_l$ represents the optimized local high-frequency feature.
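Under the stated shapes, the LDE branch can be sketched as follows; the activation φ is assumed to be GELU, which the text does not specify:

```python
import torch.nn as nn

class LDE(nn.Module):
    """Sketch of Eqs. (4) and (5): depthwise encoding, then activated 1x1 projection."""
    def __init__(self, channels):
        super().__init__()
        self.dwconv = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.encode = nn.Conv2d(channels, 2 * channels, 1)  # X_m has 2C channels
        self.act = nn.GELU()                                # phi(.), assumed activation
        self.project = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):
        x_m = self.encode(self.dwconv(x))   # Eq. (4)
        return self.project(self.act(x_m))  # Eq. (5)
```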
EASA. The EASA block extracts non-local low-frequency features to model long-range spatial dependencies, transcending window-constrained attention mechanisms. This design is inspired by SMFANet [10].
The EASA branch extracts low-frequency components via adaptive max pooling $D(\cdot)$ with a scaling factor of 8, then processes them using a 3 × 3 depthwise convolution $\mathrm{DWConv}_{3\times3}(\cdot)$. To characterize spatial distribution properties, it computes the channel-wise variance $\sigma^2(Y) \in \mathbb{R}^{1 \times 1 \times C}$ as a statistical divergence and merges it with the convolved features via a 1 × 1 convolution, as in Equations (6) and (7):

$$\sigma^2(Y) = \frac{1}{N}\sum_{i=0}^{N-1}(y_i - \mu)^2 \tag{6}$$

$$Y_h = \mathrm{Conv}_{1\times1}\big(\mathrm{DWConv}_{3\times3}(D(Y)) + \sigma^2(Y)\big) \tag{7}$$

where N is the total number of pixels in a channel (the summation runs from the 0-th to the $(N-1)$-th pixel, iterating over all pixels), $y_i$ denotes the value of each pixel and $\mu$ is the mean of all pixel values; the values $y_i$ in $\sigma^2(Y)$ are computed over all spatial locations of the input feature map Y before pooling. The modulated feature $Y_h \in \mathbb{R}^{H \times W \times C}$ is then adaptively aggregated with the original input Y to extract the representative structural representation $Y_d$, as in Equation (8):

$$Y_d = Y \odot U\big(\phi(Y_h)\big) \tag{8}$$

where $U(\cdot)$ denotes a nearest-neighbor upsampling operation and $\odot$ represents element-wise multiplication.
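A minimal sketch of the EASA branch under these equations; the pooled spatial size and variance broadcasting follow the text, while the activation choice is an assumption:

```python
import torch.nn as nn
import torch.nn.functional as F

class EASA(nn.Module):
    """Sketch of Eqs. (6)-(8): pooled depthwise conv plus variance, then gated upsampling."""
    def __init__(self, channels):
        super().__init__()
        self.dwconv = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.mix = nn.Conv2d(channels, channels, 1)
        self.act = nn.GELU()  # phi(.), assumed activation

    def forward(self, y):
        h, w = y.shape[-2:]
        y_low = F.adaptive_max_pool2d(y, (h // 8, w // 8))  # D(.), scaling factor 8
        mu = y.mean(dim=(-2, -1), keepdim=True)
        var = ((y - mu) ** 2).mean(dim=(-2, -1), keepdim=True)  # Eq. (6): channel-wise variance
        y_h = self.mix(self.dwconv(y_low) + var)                # Eq. (7): variance broadcasts spatially
        gate = F.interpolate(self.act(y_h), size=(h, w), mode="nearest")  # U(.)
        return y * gate                                         # Eq. (8): element-wise modulation
```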
WNA. The WNA branch captures low-frequency information within fixed windows, balancing computational efficiency and non-local context modeling. It adapts the Swin Transformer Layer (STL) from SwinIR [7].
The input feature $Z \in \mathbb{R}^{H \times W \times C}$ is divided into non-overlapping 8 × 8 windows via PBPS. In odd-indexed PECBs (expanded to $(H+8) \times (W+8) \times C$ via PBPS), 81 non-overlapping 8 × 8 windows are partitioned from the boundary-expanded feature map; in even-indexed PECBs (with the original size $H \times W \times C$), 64 non-overlapping 8 × 8 windows are partitioned directly, as in Equation (9):

$$\mathcal{W} = \begin{cases} \{W_{ij} \mid W_{ij} = Z[i{:}i{+}8,\, j{:}j{+}8,\, :],\ i,j \in \{0, 8, 16, \dots, 64\}\} & \text{(odd-indexed PECBs, 81 windows)} \\ \{W_{kj} \mid W_{kj} = Z[k{:}k{+}8,\, j{:}j{+}8,\, :],\ k,j \in \{0, 8, 16, \dots, 56\}\} & \text{(even-indexed PECBs, 64 windows)} \end{cases} \tag{9}$$

where $Z[i{:}i{+}8, j{:}j{+}8, :]$ and $Z[k{:}k{+}8, j{:}j{+}8, :]$ are window slicing operations, $W_{ij}$ (or $W_{kj}$) denotes a single window feature block of shape 8 × 8 × C and $\mathcal{W}$ is the window collection. Each window then independently executes self-attention, and the processed windows are reconstructed into $Z_w \in \mathbb{R}^{H \times W \times C}$ at their original positions. This design induces a 4-pixel offset between the window centers of the WNA branches in adjacent PECBs via PBPS, equivalently enabling cross-window information interaction, as in Equation (10):

$$Z_w = \mathrm{Reconstruct}\big(\{W_1, \dots, W_K\}\big) \tag{10}$$

where $W_i \in \mathbb{R}^{8 \times 8 \times C}$ represents a window after its internal attention has been computed.
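The window partition and reconstruction of Equations (9) and (10) reduce to reshapes; the sketch below shows a generic version for channel-first tensors, with the per-window self-attention (a SwinIR-style layer in the paper) omitted:

```python
import torch

def window_partition(z, ws=8):
    """Split (B, C, H, W) features into non-overlapping ws x ws windows (cf. Eq. (9))."""
    b, c, h, w = z.shape
    z = z.view(b, c, h // ws, ws, w // ws, ws)
    return z.permute(0, 2, 4, 1, 3, 5).reshape(-1, c, ws, ws)

def window_reconstruct(windows, b, c, h, w, ws=8):
    """Place attended windows back at their original positions (cf. Eq. (10))."""
    z = windows.view(b, h // ws, w // ws, c, ws, ws)
    return z.permute(0, 3, 1, 4, 2, 5).reshape(b, c, h, w)
```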

3.3. Periodic Boundary Padding Shift Mechanism and Its Inverse Operation in MAFA

PBPS–Padding. As shown in Figure 3, the PBPS mechanism is designed to simulate the window-shifting mechanism, since explicit shifting is difficult to apply within the LDE and EASA branches. For odd-indexed PECBs, the input feature map of dimensions $H \times W$ is expanded to $(H+8) \times (W+8)$ through replicate-padding, as in Equation (11):

$$F_{\mathrm{padded}} = \begin{cases} \mathrm{Pad}(F_{\mathrm{in}}, 4) & \text{if } \mathrm{index} \bmod 2 = 1 \\ F_{\mathrm{in}} & \text{otherwise} \end{cases} \tag{11}$$

where $F_{\mathrm{in}}$ denotes the input feature map, $\mathrm{Pad}(\cdot, 4)$ represents the symmetric border expansion operation (extending 4 pixels on all sides) and index corresponds to the PECB index (1, 2, 3, …). $F_{\mathrm{padded}}$ serves as the input to the three branches. (Input dimensions: $H \times W \times C$ in all cases; output dimensions: $(H+8) \times (W+8) \times C$ for odd-indexed PECBs and $H \times W \times C$ for even-indexed PECBs.)

The WNA branch relies on windowed non-local attention to model low-frequency dependencies, yet explicit window shifting (e.g., SwinIR’s shifted window) can only be applied to the WNA branch, and directly embedding such explicit shifts into WNA causes branch imbalance within the module. The PBPS design instead ensures that adjacent PECBs’ windows are offset by 4 pixels without explicit shifting, thereby introducing the window-offset effect simultaneously in the WNA, LDE and EASA branches and ensuring the overall coordination and efficient operation of the module. We tested various padding methods, and our experiments indicate that replicate-padding performs best; we discuss these findings in detail in the Ablation Study section.
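Because PBPS is a pure padding operation, it amounts to one conditional in code; a sketch under the Equation (11) convention (1-based block indices):

```python
import torch.nn.functional as F

def pbps_pad(x, block_index):
    """Eq. (11): replicate-pad 4 px on every side, but only for odd-indexed PECBs."""
    if block_index % 2 == 1:
        return F.pad(x, (4, 4, 4, 4), mode="replicate")  # (left, right, top, bottom)
    return x
```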
PBPS–Crop. To keep the input and output sizes of PECBs strictly consistent (always $H \times W \times C$), center cropping is applied to the fused output to restore the original size $H \times W$; this is performed only for odd-indexed PECBs, as shown in Equation (12):

$$F_p = \begin{cases} \mathrm{Crop}(F_{\mathrm{mafa}}, 4) & \text{if } \mathrm{index} \bmod 2 = 1 \\ F_{\mathrm{mafa}} & \text{otherwise} \end{cases} \tag{12}$$

where $\mathrm{Crop}(\cdot, 4)$ denotes the center cropping operation with a cropping width of 4, $F_{\mathrm{mafa}}$ is the output feature of the three-branch fusion and $F_p$ is the final output feature of the MAFA module. For odd-indexed PECBs, the input is $(H+8) \times (W+8) \times C$ and the output is obtained by extracting the central region [4:H+4, 4:W+4, :]; for even-indexed PECBs, the input is $H \times W \times C$. The output size in both cases is consistently $H \times W \times C$.
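The inverse operation is equally simple; a sketch matching Equation (12):

```python
def pbps_crop(x, block_index):
    """Eq. (12): center-crop 4 px per side for odd-indexed PECBs, restoring H x W."""
    if block_index % 2 == 1:
        return x[..., 4:-4, 4:-4]  # central region [4:H+4, 4:W+4] of the padded map
    return x
```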

4. Experimental Results

4.1. Datasets and Implementation

Datasets. High-quality training datasets are crucial for super-resolution performance. In this study, we adopt the widely used DIV2K [26] and DIV2K + Flickr2K (DF2K) [26] datasets as the main sources of training data. For a fair and effective comparison with current mainstream methods [9,10,27,28,29,30,31], we train the proposed model on the DIV2K and DF2K datasets, respectively. For evaluation, we select multiple recognized benchmark test sets: Set5 [32], Set14 [33], B100 [34], Urban100 [11] and Manga109 [12]. As evaluation metrics, we convert the reconstructed image to the YCbCr color space and compute the peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) on the Y channel to objectively measure restoration quality.
Implementation. During training, we randomly crop 64 image patches of size 64 × 64 from low-resolution (LR) images as the base input. The loss function follows SAFMN [35], combining an $L_1$ loss (weight = 1.0) with an FFT loss (weight = 0.05); the batch size is 64 patches per GPU, where each high-resolution patch is 256 × 256 pixels (64 × 64 for LR at the ×4 scale). Data augmentation includes random horizontal flipping and rotation, with a fixed random seed of 10. Optimization employs Adam [36] with $\beta_1 = 0.9$ and $\beta_2 = 0.99$. The initial learning rate is $1 \times 10^{-4}$ and the minimum learning rate is $1 \times 10^{-6}$, with Cosine Annealing Restart scheduling [37] (a 1,000,000-iteration period, no warm-up) and no gradient clipping. We trained the main model on the DF2K dataset for 1,000,000 iterations, while all models in the ablation study were trained on the DIV2K dataset for 250,000 iterations to reduce cost. The experiments were implemented in PyTorch 1.8 on a platform with an NVIDIA GeForce RTX 3090 GPU. The proposed PECNet architecture contains 8 Periodically Enhanced Cascade Blocks (PECBs), each with 36 channels.
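The combined objective can be sketched as below; the exact FFT-loss form (an L1 penalty on the complex spectra, as we read SAFMN’s formulation) is an assumption rather than the released training code:

```python
import torch

def sr_loss(sr, hr, fft_weight=0.05):
    """L1 loss (weight 1.0) plus FFT loss (weight 0.05), per the training setup."""
    l1 = torch.mean(torch.abs(sr - hr))
    # Assumed FFT loss: L1 distance between complex spectra, following SAFMN
    fft = torch.mean(torch.abs(torch.fft.rfft2(sr) - torch.fft.rfft2(hr)))
    return l1 + fft_weight * fft
```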

4.2. Ablation Experiment

In this section, we perform an ablation study to verify the effectiveness of our proposed PECNet. All ablation models were trained on DIV2K (250,000 iterations) and validated on Set5. The upsampling scale factor was fixed at ×4, and the network configuration remained identical across variants. The performance evaluation is based on the average PSNR and SSIM.
Verification of the effectiveness of the three-branch architecture. To systematically verify the contributions of each part of the three-branch structure in the PECNet network, we designed seven ablation models to quantitatively analyze the influence of each part on the reconstruction performance: Model S_LDE with only the LDE branch, Model S_EASA with only the EASA branch, Model S_WNA with only the WNA branch, Model D_L+E with both the LDE and EASA branches, Model D_E+W with both the EASA and WNA branches, Model D_L+W with both the LDE and WNA branches, and Model T_L+E+W with all three branches in the MAFA module. The results are shown in Table 2.
As shown in Table 2, Model S_EASA, which contains only the EASA branch, achieved the lowest PSNR and SSIM performance. The reconstruction performance improved markedly as more feature-extraction branches were incorporated. Model S_WNA significantly outperformed the other single-branch models, demonstrating the effectiveness of the WNA branch. Model T_L+E+W, which integrates all three branches, delivered the best performance, exceeding that of Model D_E+W (the best-performing two-branch model) by 0.01 dB. These results confirm that the three-branch structure effectively enhances the representational capacity of the MAFA module for high-quality image reconstruction.
Verification of the effectiveness of the PBPS mechanism. To systematically evaluate the effectiveness of the proposed PBPS mechanism, three model variants were designed for this ablation study: Model T_L+E+W from the previous ablation study, which contains the three-branch structure without PBPS and without window shifting; Model “WNA-shift”, which incorporates explicit window shifting only within the WNA branch but still without PBPS; and Model “PBPS”, which integrates both the three-branch structure and the proposed PBPS mechanism. Their reconstruction performance is quantitatively compared in Table 3.
As shown in Table 3, the row for Model PBPS(our) corresponds to the replicate padding method. Model PBPS(our) performs best, illustrating the effectiveness of the PBPS mechanism. Model WNA-shift, which applies window shifting only within the WNA branch, performs worst: the explicit window movement in the WNA branch disrupts the spatial consistency of feature extraction across the three branches of the MAFA module, introducing fusion noise and training instability whose negative effects outweigh the positive benefits of window interaction. The PBPS mechanism instead performs padding (and subsequent cropping) at the input level of the entire MAFA module, so all three branches receive feature maps that have undergone the same preprocessing. PBPS thus does not act only on WNA; its boundary expansion is also beneficial for the boundary-region processing of the LDE and EASA branches, enhancing the generalization ability of the model. Because the implicit windowing of PBPS applies to all three branches, Model PBPS(our) improves markedly, with PSNR gains of 0.08 dB over Model T_L+E+W and 0.23 dB over Model WNA-shift.
Verification of the effectiveness of replicate padding among different padding methods in the PBPS mechanism. To quantitatively evaluate the effects of different padding methods in the PBPS mechanism, we instantiate the PBPS model with four padding modes: replicate, circular, reflect and constant (zero). The results are shown in Table 4.
As can be seen from Table 4, the comparison of padding modes for the PBPS mechanism shows that replicate padding attains the highest PSNR and SSIM, achieving the best balance and indicating that maintaining boundary continuity is crucial for window attention.
Through these comparative experiments on the ablation models and padding methods, the effectiveness of each component of the MAFA module is verified.

4.3. Comparisons with State-of-the-Art Methods

Quantitative comparison. To comprehensively evaluate the proposed method, we conducted a systematic quantitative comparison. PECNet is compared with existing state-of-the-art lightweight SISR methods, including SMSR [28], ShuffleMixer [29], SAFMN [35], SMFANet [10], LAPAR-A [30], NGswin [9] and SRConvNet [31]. These competitors were carefully selected to cover the major architectural trends: SMSR, SAFMN and LAPAR-A represent traditional lightweight models; ShuffleMixer, SMFANet and SRConvNet are relatively recent lightweight models; and NGswin, as a representative shift-window-based lightweight model, allows a direct comparison of our implicit PBPS mechanism against the explicit window-shift approach. The quantitative results of each method on multiple benchmark datasets for upscaling factors ×2, ×3 and ×4 are presented in Table 5. PECNet(ours) was trained on the larger DF2K dataset to ensure a fair and rigorous comparison. In addition to the core image quality metrics PSNR and SSIM, we also list the parameter count (#Params) and computational cost (#FLOPs) of each model to measure complexity. For fairness, the model complexity (FLOPs) of all methods is calculated using the fvcore library (i.e., fvcore.nn.flops_count; this code is taken from the SMFANet codebase), with the measurement scenario set to super-resolving a low-resolution image to 1280 × 720 pixels. Our PECNet achieves a throughput of 34 FPS and a peak memory usage of 265 MB.
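For reference, an fvcore-based measurement reduces to a few lines; PECNet below is a hypothetical stand-in for the model class, configured with the 8 blocks and 36 channels reported above, and the 320 × 180 LR input corresponds to a 1280 × 720 output at ×4:

```python
import torch
from fvcore.nn import FlopCountAnalysis

model = PECNet(num_blocks=8, channels=36).eval()  # hypothetical constructor
lr = torch.randn(1, 3, 180, 320)                  # LR input for a 1280x720 x4 output
with torch.no_grad():
    flops = FlopCountAnalysis(model, lr)
    print(f"{flops.total() / 1e9:.1f} GFLOPs")
```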
Table 5 shows that our PECNet achieves better performance on almost all benchmark datasets, indicating that, thanks to the MAFA module, PECNet can more effectively mine complementary information in images. Compared with previous CNN-based lightweight SR methods, it demonstrates significant potential for improvement, particularly in detail restoration and structural fidelity. For instance, on the Manga109 test set for ×4 SR, PECNet (ours) achieves a PSNR 0.25 dB higher than NGswin, while its computational cost (FLOPs) is only 40% of NGswin’s.
Qualitative comparisons. We compare the visual results of our proposed PECNet with several lightweight SR methods on the Urban100 and Manga109 datasets at a scale of ×4. The comparison algorithms are as follows: SMSR, ShuffleMixer, SAFMN, SMFANet, NGswin and LAPAR-A. As shown in Figure 4, existing methods yield blur, distortions and inaccurate structures, while our PECNet produces sharper edges, structural integrity and coherent textures, demonstrating superior visual quality.
In the “img061” example from Urban100: The surface texture of the sunshade glass grilles reconstructed by our method is clearer and sharper, with the edges and texture directions remaining intact. Although NGswin achieved a slightly higher PSNR/SSIM in this example, the texture orientation in its restored image was disordered. In contrast, our PECNet produced visually superior reconstructions with more precise structural lines and significantly fewer artifacts in the detailed areas of the buildings. This indicates that quantitative metrics alone may not fully capture perceptual quality and structural fidelity, which are critical for practical applications. The results of other methods (such as SMFANet) show obvious blurriness and loss of detail; the straightness and continuity of lines are disrupted, and the texture direction is disordered.
In the “OL_Lunch” example from Manga109: The aircraft occupies only a very small portion of the image. Our method successfully restores its basic shape, including the left wing, fuselage and right wing. In the results of the other methods, the left wing of the aircraft has completely disappeared.
In the “TennenSenshiG” example from Manga109: Our PECNet precisely reconstructs the clean, continuous line texture of the character’s nose, maintaining both the sharpness and accurate curvature that closely aligns with the ground truth HR image. In contrast, other methods exhibit significant shortcomings in line recovery—NGswin produces distorted and improperly aligned nasal contours; SMSR generates blurred and unnaturally thick lines; while ShuffleMixer, LAPAR-A, SAFMN and SMFANet all fail to maintain line continuity and structural accuracy, creating either discontinuous or misaligned nasal contours that compromise facial integrity.
In the “HighschoolKimengumi_vo20” example from Manga109: Our PECNet accurately reconstructs the smooth, rounded contour beneath the character’s right eye, maintaining a natural circular shape that closely matches the ground truth. In contrast, all other compared methods fail to preserve this geometric fidelity—ShuffleMixer and SAFMN produce noticeably distorted polygonal shapes, while SMSR, LAPAR-A and SMFANet generate incomplete or irregular curves that disrupt the facial structure and visual coherence.
These visual comparison results strongly demonstrate that, thanks to the three-branch collaborative modeling of the MAFA module and the implicit cross-window information interaction brought by the PBPS mechanism, PECNet can achieve superior visual reconstruction quality, especially in restoring sharp edges and maintaining structural integrity.

5. Conclusions and Prospect

Conclusions. In this paper, we proposed PECNet, a lightweight yet powerful network for efficient image super-resolution. Its core component, the MAFA module, synergistically integrates three specialized branches: the LDE branch for high-frequency detail recovery, the EASA branch for long-range dependency modeling and the WNA branch for non-local context aggregation within shifted windows. We also proposed a novel PBPS mechanism to integrate the three branches, which implicitly achieves cross-window communication without additional computational cost. Extensive experiments demonstrate that PECNet achieves a superior balance between reconstruction fidelity and computational efficiency, outperforming existing lightweight methods across multiple benchmarks.
Limitations and Future Work. Despite its efficient performance, PECNet has certain limitations. The fixed 8 × 8 window size and 4-pixel padding stride in the PBPS mechanism may restrict its flexibility when handling images with complex, non-regular structures or extreme aspect ratios. Furthermore, while the three-branch design is more parameter-efficient than many competitors, its aggregate complexity remains higher than that of purely convolutional models, presenting a challenge for deployment on extremely resource-constrained devices.
Future work will explore adaptive window sizing and dynamic padding strategies to enhance model generalization across diverse image contents. Applying the PBPS mechanism to other low-level vision tasks, such as denoising and deblurring, is also a promising direction for validating its broader utility.

Author Contributions

Methodology, T.G.; Software, T.G.; Validation, T.G.; Writing—original draft, T.G.; Visualization, T.G.; Project administration, Y.L.; Funding acquisition, Y.L.; Supervision, Y.L.; Writing—review and editing, Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by National Natural Science Foundation of China under Grant 61972241, Natural Science Foundation of Shanghai under Grant 22ZR1427100 and Soft Science Project of Shanghai under Grant 25692107400.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Hui, Z.; Gao, X.; Yang, Y.; Wang, X. Lightweight Image Super-Resolution with Information Multi-distillation Network. In Proceedings of the 27th ACM International Conference on Multimedia, Nice, France, 21–25 October 2019; pp. 2024–2032. [Google Scholar]
  2. Liu, J.; Tang, J.; Wu, G. Residual Feature Distillation Network for Lightweight Image Super-Resolution. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 41–55. [Google Scholar]
  3. Li, Z.; Liu, Y.; Chen, X.; Cai, H.; Gu, J.; Qiao, Y.; Dong, C. Blueprint Separable Residual Network for Efficient Image Super-Resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 833–843. [Google Scholar]
  4. Tai, Y.; Yang, J.; Liu, X. Image Super-Resolution via Deep Recursive Residual Network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3147–3155. [Google Scholar]
  5. Lim, B.; Son, S.; Kim, H.; Nah, S.; Lee, K.M. Enhanced Deep Residual Networks for Single Image Super-Resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 136–144. [Google Scholar]
  6. Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image Super-Resolution Using Very Deep Residual Channel Attention Networks. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 286–301. [Google Scholar]
  7. Liang, J.; Cao, J.; Sun, G.; Zhang, K.; Timofte, R. SwinIR: Image Restoration Using Swin Transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 1833–1844. [Google Scholar]
  8. Zhang, X.; Zeng, H.; Guo, S.; Zhang, L. Efficient Long-Range Attention Network for Image Super-resolution. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 649–667. [Google Scholar]
  9. Choi, H.; Lee, J.; Yang, J. N-gram in swin transformers for efficient lightweight image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 2071–2081. [Google Scholar]
  10. Zheng, M.; Sun, L.; Dong, J.; Pan, J. SMFANet: A Lightweight Self-Modulation Feature Aggregation Network for Efficient Image Super-Resolution. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024. [Google Scholar]
  11. Huang, J.B.; Singh, A.; Ahuja, N. Single image super-resolution from transformed self-exemplars. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 5197–5206. [Google Scholar]
  12. Matsui, Y.; Ito, K.; Aramaki, Y.; Fujimoto, A.; Ogawa, T.; Yamasaki, T.; Aizawa, K. Sketch-based Manga Retrieval using Manga109 Dataset. Multimed. Tools Appl. 2017, 76, 21811–21838. [Google Scholar] [CrossRef]
  13. Dong, C.; Loy, C.C.; He, K.; Tang, X. Learning a deep convolutional network for image super-resolution. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 184–199. [Google Scholar]
  14. Dong, C.; Loy, C.C.; Tang, X. Accelerating the Super-Resolution Convolutional Neural Network. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 391–407. [Google Scholar]
  15. Shi, W.; Caballero, J.; Huszár, F.; Totz, J.; Wang, Z. Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1874–1883. [Google Scholar]
  16. Kim, J.; Lee, J.K.; Lee, K.M. Deeply-Recursive Convolutional Network for Image Super-Resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1637–1645. [Google Scholar]
  17. Wang, C.; Li, Z.; Shi, J. Lightweight Image Super-Resolution with Adaptive Weighted Learning Network. arXiv 2019, arXiv:2001.09191. [Google Scholar] [CrossRef]
  18. Wang, Y.; Zhang, T. OSFFNet: Omni-Stage Feature Fusion Network for Lightweight Image Super-Resolution. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 5660–5668. [Google Scholar]
  19. Zhang, X.; Zhang, Y.; Yu, F. HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024; pp. 483–500. [Google Scholar]
  20. Li, X.; Pan, J.; Tang, J.; Dong, J. DLGSANet: Lightweight Dynamic Local and Global Self-Attention Networks for Image Super-Resolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–3 October 2023; pp. 12792–12801. [Google Scholar]
  21. Jiang, Y.; Feng, C.; Zhang, F.; Bull, D. MTKD: Multi-Teacher Knowledge Distillation for Image Super-Resolution. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024; pp. 364–382. [Google Scholar]
  22. Qu, Y.; Yuan, K.; Zhao, K.; Xie, Q.; Hao, J.; Sun, M.; Zhou, C. XPSR: Cross-modal Priors for Diffusion-based Image Super-Resolution. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024; pp. 285–303. [Google Scholar]
  23. Chen, H.; Wang, Y.; Guo, T.; Xu, C.; Deng, Y.; Liu, Z.; Ma, S.; Xu, C.; Xu, C.; Gao, W. Pre-Trained Image Processing Transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 12299–12310. [Google Scholar]
  24. Conde, M.V.; Choi, U.J.; Burchi, M.; Timofte, R. Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 669–687. [Google Scholar]
  25. Yan, Z.; Li, X.; Li, M.; Zuo, W.; Shan, S. Shift-Net: Image Inpainting via Deep Feature Rearrangement. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  26. Agustsson, E.; Timofte, R. NTIRE 2017 Challenge on Single Image Super-Resolution: Dataset and Study. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 126–135. [Google Scholar]
  27. Su, H.; Tang, L.; Wu, Y.; Tretter, D.; Zhou, J. Spatially Adaptive Block-Based Super-Resolution. IEEE Trans. Image Process. 2012, 21, 1031–1045. [Google Scholar] [PubMed]
  28. Wang, L.; Dong, X.; Wang, Y.; Ying, X.; Guo, Y. Exploring Sparsity in Image Super-Resolution for Efficient Inference. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 4917–4926. [Google Scholar]
  29. Sun, L.; Pan, J.; Tang, J. Shufflemixer: An efficient convnet for image super-resolution. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 28 November–9 December 2022; Volume 35, pp. 17314–17326. [Google Scholar]
  30. Li, W.; Zhou, K.; Qi, L.; Jiang, N.; Lu, J.; Jia, J. LAPAR: Linearly-Assembled Pixel-Adaptive Regression Network for Single Image Super-resolution and Beyond. In Proceedings of the Advances in Neural Information Processing Systems, Virtual, 6–12 December 2020; Volume 33, pp. 20343–20355. [Google Scholar]
  31. Li, F.; Cong, R.; Wu, J.; Bai, H.; Wang, M.; Zhao, Y. SRConvNet: A Transformer-Style ConvNet for Lightweight Image Super-Resolution. Int. J. Comput. Vis. 2025, 133, 173–189. [Google Scholar] [CrossRef]
  32. Bevilacqua, M.; Roumy, A.; Guillemot, C.; Morel, A. Low-Complexity Single Image Super-Resolution Based on Nonnegative Neighbor Embedding. In Proceedings of the British Machine Vision Conference, Surrey, UK, 3–7 September 2012. [Google Scholar]
  33. Zeyde, R.; Elad, M.; Protter, M. On Single Image Scale-Up Using Sparse-Representations. In Proceedings of the International Conference on Curves and Surfaces, Oslo, Norway, 28 June–3 July 2012; pp. 711–730. [Google Scholar]
  34. Arbeláez, P.; Maire, M.; Fowlkes, C.; Malik, J. Contour Detection and Hierarchical Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 898–916. [Google Scholar] [CrossRef] [PubMed]
  35. Sun, L.; Dong, J.; Tang, J.; Pan, J. Spatially-Adaptive Feature Modulation for Efficient Image Super-Resolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–3 October 2023; pp. 13190–13199. [Google Scholar]
  36. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  37. Loshchilov, I.; Hutter, F. SGDR: Stochastic Gradient Descent with Warm Restarts. arXiv 2016, arXiv:1608.03983. [Google Scholar]
Figure 1. Performance of our method versus other state-of-the-art lightweight methods on Manga109 [12] for × 4 SR. The circle sizes represent the number of FLOPs of the model. The proposed PECNet achieves a better trade-off between computational efficiency and reconstruction performance.
Figure 2. Network architecture of the proposed PECNet. The proposed PECNet consists of a shallow feature-extraction module, periodically enhanced cascade blocks (PECBs) and a lightweight image reconstruction module. Each periodically enhanced cascade block contains one multi-scale adaptive feature aggregation (MAFA) module and one partial convolution-based feed-forward network (PCFN).
Figure 3. Network architecture of the proposed PBPS. The PBPS mechanism processes odd-indexed PECBs. It first applies padding to expand the feature map, and then a center cropping operation to restore the original size. LEW refers to the set of three branches (LDE, EASA, WNA) along with the feature partitioning and fusion components.
Figure 4. Visual comparisons for × 4 SR on the Urban100 and Manga109 datasets. The comparison between PECNet and typical lightweight methods is shown on the right side of each image.
Table 1. Comparison of different lightweight SISR methods.
Model | Core Architecture | High-Frequency | Global Modeling | Window Shift | Key Innovations and Limitations
SwinIR | Pure Transformer | Window self-attention | Window self-attention + explicit shift | Explicit | Pioneered the shift-window design, but self-attention is insensitive to high-frequency details and explicit shifting is not suitable for multiple branches
NGswin | Conv + Transformer hybrid | Partial relief via conv priors | N-Gram context + explicit shift | Explicit | Effective fusion, but relies on explicit shifting and lacks a dedicated high-frequency branch
SAFMN | Pure CNN (single-path) | Spatial adaptive modulation | Large-kernel conv (limited receptive field) | None | Simple and efficient, but limited long-range dependencies
SMFANet | CNN (dual-path) | Dedicated path for high-freq details | Dedicated path for low-freq structures | None | Feature separation via dual path, but insufficient inter-path interaction
PECNet (Ours) | Collaborative three-branch hybrid | Dedicated LDE branch | Dual path: EASA + WNA | PBPS implicit | Innovation 1: three-branch collaboration with clear division of labor; Innovation 2: PBPS enables implicit global shifting and uniformly supports all branches
Table 2. The ablation experimental results of the three-branch architecture. The best performances are highlighted in red colors.
Methods | LDE Branch | EASA Branch | WNA Branch | #Params (K) | #FLOPs (G) | Set5 (PSNR/SSIM)
S_LDE | ✓ | — | — | 241 | 10 | 31.49/0.8837
S_EASA | — | ✓ | — | 241 | 6 | 31.52/0.8857
S_WNA | — | — | ✓ | 241 | 11 | 31.79/0.8890
D_L+E | ✓ | ✓ | — | 251 | 11 | 31.77/0.8873
D_E+W | — | ✓ | ✓ | 251 | 12 | 31.93/0.8905
D_L+W | ✓ | — | ✓ | 251 | 15 | 31.87/0.8901
T_L+E+W (our MAFA) | ✓ | ✓ | ✓ | 262 | 16 | 31.94/0.8909
Table 3. The ablation experimental results of the PBPS mechanism. The best performances are highlighted in red colors.
Methods | Explicit Window Shifting | PBPS | #Params (K) | #FLOPs (G) | Set5 (PSNR/SSIM)
T_L+E+W | — | — | 262 | 16 | 31.94/0.8909
WNA-shift | ✓ | — | 260 | 14 | 31.79/0.8889
PBPS(our) | — | ✓ | 262 | 16 | 32.02/0.8917
Table 4. The ablation results of sub-ablation models with different padding methods in Model PBPS(our). The best performances are highlighted in red colors.
Padding Methods | #Params (K) | #FLOPs (G) | Set5 (PSNR/SSIM)
replicate | 262 | 16 | 32.02/0.8917
circular | 262 | 16 | 32.00/0.8916
reflect | 262 | 16 | 31.98/0.8914
constant = 0 | 262 | 16 | 31.98/0.8914
Table 5. Quantitative comparison of our PECNet with existing state-of-the-art lightweight SISR methods on five benchmark datasets. #FLOPs is measured corresponding to an HR image of the size 1280 × 720 pixels. The best and second-best performances are highlighted in red and blue colors.
Scale | Methods | #Params (K) | #FLOPs (G) | Set5 | Set14 | B100 | Urban100 | Manga109
×2 | SMSR | 985 | 132 | 38.00/0.9601 | 33.64/0.9197 | 32.17/0.8990 | 32.19/0.9284 | 38.76/0.9771
×2 | ShuffleMixer | 394 | 91 | 38.01/0.9606 | 33.63/0.9180 | 32.17/0.8995 | 31.89/0.9257 | 38.83/0.9774
×2 | SAFMN | 228 | 52 | 38.00/0.9605 | 33.54/0.9177 | 32.16/0.8995 | 31.84/0.9256 | 38.71/0.9771
×2 | SMFANet | 186 | 41 | 38.08/0.9607 | 33.65/0.9185 | 32.22/0.9002 | 32.20/0.9282 | 39.11/0.9779
×2 | LAPAR-A | 584 | 171 | 38.01/0.9605 | 33.62/0.9183 | 32.19/0.8999 | 32.10/0.9283 | 38.67/0.9772
×2 | NGswin | 998 | 146 | 38.05/0.9610 | 33.79/0.9199 | 32.27/0.9008 | 32.53/0.9324 | 38.97/0.9777
×2 | SRConvNet | 387 | 74 | 38.00/0.9605 | 33.58/0.9186 | 32.16/0.8995 | 32.05/0.9272 | 38.87/0.9774
×2 | PECNet(ours) | 250 | 61 | 38.09/0.9611 | 33.82/0.9201 | 32.24/0.9005 | 32.46/0.9309 | 39.19/0.9783
×3 | SMSR | 993 | 68 | 34.40/0.9270 | 30.33/0.8412 | 29.10/0.8050 | 28.25/0.8536 | 33.68/0.9445
×3 | ShuffleMixer | 415 | 43 | 34.40/0.9272 | 30.37/0.8423 | 29.12/0.8051 | 28.08/0.8498 | 33.69/0.9448
×3 | SAFMN | 233 | 23 | 34.34/0.9267 | 30.33/0.8418 | 29.08/0.8048 | 27.95/0.8474 | 33.52/0.9437
×3 | SMFANet | 191 | 19 | 34.42/0.9274 | 30.41/0.8430 | 29.16/0.8065 | 28.22/0.8523 | 33.96/0.9460
×3 | LAPAR-A | 594 | 114 | 34.36/0.9267 | 30.34/0.8412 | 29.11/0.8054 | 28.15/0.8523 | 33.51/0.9441
×3 | NGswin | 1007 | 66 | 34.52/0.9282 | 30.53/0.8456 | 29.19/0.8078 | 28.52/0.8603 | 33.89/0.9470
×3 | SRConvNet | 387 | 33 | 34.40/0.9272 | 30.30/0.8416 | 29.07/0.8047 | 28.04/0.8500 | 33.56/0.9443
×3 | PECNet(ours) | 255 | 28 | 34.51/0.9284 | 30.53/0.8453 | 29.20/0.8079 | 28.43/0.8565 | 34.18/0.9476
×4 | SMSR | 1006 | 42 | 32.12/0.8932 | 28.55/0.7808 | 27.55/0.7351 | 26.11/0.7868 | 30.54/0.9085
×4 | ShuffleMixer | 411 | 28 | 32.21/0.8953 | 28.66/0.7827 | 27.61/0.7366 | 26.08/0.7835 | 30.65/0.9093
×4 | SAFMN | 240 | 14 | 32.18/0.8948 | 28.60/0.7813 | 27.58/0.7359 | 25.97/0.7809 | 30.43/0.9063
×4 | SMFANet | 197 | 11 | 32.25/0.8956 | 28.71/0.7833 | 27.64/0.7377 | 26.18/0.7862 | 30.82/0.9104
×4 | LAPAR-A | 659 | 94 | 32.15/0.8944 | 28.61/0.7818 | 27.61/0.7366 | 26.14/0.7871 | 30.42/0.9074
×4 | NGswin | 1019 | 40 | 32.33/0.8963 | 28.78/0.7859 | 27.66/0.7396 | 26.45/0.7963 | 30.80/0.9128
×4 | SRConvNet | 382 | 22 | 32.18/0.8951 | 28.61/0.7818 | 27.57/0.7359 | 26.06/0.7845 | 30.35/0.9075
×4 | PECNet(ours) | 262 | 16 | 32.38/0.8969 | 28.81/0.7857 | 27.69/0.7396 | 26.35/0.7916 | 31.05/0.9136