Article

Lightweight Image Super-Resolution Reconstruction Network Based on Multi-Order Information Optimization

by Shengxuan Gao 1, Long Li 2, Wen Cui 2, He Jiang 2 and Hongwei Ge 1,*
1 School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
2 School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
* Author to whom correspondence should be addressed.
Sensors 2025, 25(17), 5275; https://doi.org/10.3390/s25175275
Submission received: 23 July 2025 / Revised: 12 August 2025 / Accepted: 21 August 2025 / Published: 25 August 2025
(This article belongs to the Section Sensing and Imaging)

Abstract

Traditional information distillation networks using single-scale convolution and simple feature fusion often result in insufficient information extraction and ineffective restoration of high-frequency details. To address this problem, we propose a lightweight image super-resolution reconstruction network based on multi-order information optimization. The core of this network lies in the enhancement and refinement of high-frequency information. Our method operates through two main stages to fully exploit the high-frequency features in images while eliminating redundant information, thereby enhancing the network’s detail restoration capability. In the high-frequency information enhancement stage, we design a self-calibration high-frequency information enhancement block. This block generates calibration weights through self-calibration branches to modulate the response strength of each pixel. It then selectively enhances critical high-frequency information. Additionally, we combine an auxiliary branch and a chunked space optimization strategy to extract local details and adaptively reinforce high-frequency features. In the high-frequency information refinement stage, we propose a multi-scale high-frequency information refinement block. First, multi-scale information is captured through multiplicity sampling to enrich the feature hierarchy. Second, the high-frequency information is further refined using a multi-branch structure incorporating wavelet convolution and band convolution, enabling the extraction of diverse detailed features. Experimental results demonstrate that our network achieves an optimal balance between complexity and performance, outperforming popular lightweight networks in both quantitative metrics and visual quality.

1. Introduction

Image super-resolution (SR) reconstructs a low-resolution (LR) image into a high-resolution (HR) one. It has made substantial contributions to medical imaging [1,2], smart mining [3,4], and computational photography [5,6]. Traditional SR methods fall into two categories: interpolation [7] and reconstruction [8]. However, interpolation often results in blurry, distorted images, while reconstruction is computationally intensive, hindering their practical use.
In recent years, with the development of deep learning, convolutional neural networks (CNNs) [9,10] have been widely used in image super-resolution. Dong et al. [11] first introduced CNNs into SR and proposed the super-resolution convolutional neural network (SRCNN), which consists of three convolutional layers. Subsequently, the same team proposed the fast super-resolution convolutional neural network (FSRCNN) [12], which significantly improves inference speed through deconvolution layers and a more compact architecture. Shi et al. [13] proposed a pixel shuffling strategy and used it to construct the efficient sub-pixel convolutional neural network (ESPCN). However, SRCNN, FSRCNN, and ESPCN are shallow networks with limited reconstruction quality. To address this, Kim et al. [14] increased the network depth and proposed very deep convolutional networks for single image super-resolution (VDSR), which improved performance to some extent. Lim et al. [15] proposed enhanced deep residual networks for single image super-resolution (EDSR), which builds an even deeper network by removing the batch normalization layers and achieves excellent reconstruction results. Although VDSR and EDSR achieve good reconstruction results, their large numbers of parameters and heavy computational overhead make them difficult to deploy on mobile devices in practical applications. Consequently, researchers have proposed recursive, pruning, and information distillation methods to develop lightweight networks. Among these, the residual feature distillation network (RFDN) [16], a typical information distillation network, refines features layer by layer through a flexible feature distillation mechanism, remaining lightweight while demonstrating efficient reconstruction capability.
Currently, most information distillation networks still adopt the RFDN distillation mechanism: feature extraction is performed by stacking single-scale convolutions in the distillation block, the distilled features of different layers are fused directly, and all features are then unified. Although this approach meets lightweight requirements, it tends to produce homogeneous features during extraction, and the simple fusion mechanism dilutes or loses high-frequency information. To address these issues, we propose a lightweight image super-resolution reconstruction network based on multi-order information optimization (MOION). The network optimizes image detail restoration through four streamlined stages: high-frequency enhancement, information distillation, frequency refinement, and feature fusion, effectively extracting critical features while eliminating redundancy.
The core component of MOION is the multi-order information optimization block (MOIOB), which incorporates three dual-branch self-calibrating high-frequency information enhancement blocks (SCHIEBs). These generate wavelet-derived calibration weights to amplify critical features while suppressing noise, complemented by auxiliary branches and spatial optimization for local detail extraction. Following the enhancement stage, we set up four distillation layers. Each layer compresses the enhanced multi-channel information into a small number of key features using low-dimensional convolution, which keeps the network lightweight. Subsequently, we construct a multi-scale high-frequency information refinement block (MSHIRB), which further refines the distilled key features through multiplicity sampling and a multi-branch feature extraction strategy, enabling the network to capture diverse image details. Finally, we introduce an enhanced spatial attention block to weight and map the processed features, further strengthening the representation of key regions and enabling full feature fusion and utilization.
The main contributions of this paper are summarized as follows:
  • We propose a self-calibrating high-frequency information enhancement block (SCHIEB). By designing an adaptive high-frequency enhancement mechanism, the network can dynamically adjust feature representation across different regions, addressing the insufficient high-frequency expression in traditional distillation networks.
  • We design a multi-scale high-frequency information refinement block (MSHIRB). By using a lightweight multiplicity sampling and multi-branch feature extraction method, it fully captures the remaining multi-scale information and high-frequency details, solving the problem of limited feature diversity in traditional distillation networks.
  • We propose a multi-order information optimization block (MOIOB). Compared to traditional distillation blocks, our architecture establishes a complete information optimization path, enabling better extraction of high-frequency features and removal of redundant information, thus improving detail recovery.

2. Related Work

2.1. Lightweight SR Network

Deep networks, which depend on vast numbers of parameters and heavy computation, are difficult to train and hinder practical applications. This has spurred interest in lightweight networks suitable for mobile deployment. Tai et al. [17] presented the deep recursive residual network (DRRN), which uses parameter sharing and recursive learning to effectively reduce network parameters. Yu et al. [18] introduced the distillation and iterative pruning network (DIPNet), adopting pruning techniques to remove redundant connections and thus enhance network efficiency and generalization ability. Sun et al. [19] proposed the spatially-adaptive feature modulation network (SAFMN), designing spatially adaptive feature modulation mechanisms to dynamically select representative features and increase information processing speed. Lu et al. [20] developed the efficient super-resolution transformer (ESRT), combining a CNN with a lightweight Transformer backbone to extract deep features efficiently at low computational cost. Li et al. [21] designed the cross-receptive focused inference network (CFIN), which combines cross-scale aggregation blocks with cross-receptive focusing mechanisms to eliminate redundant features and achieve a good balance between network performance and complexity.
Due to hardware constraints, super-resolution networks must balance computational efficiency with enhanced detail reconstruction. To overcome this limitation, we redesign the traditional information distillation structure with a multi-order information optimization block, establishing a comprehensive information optimization path. Through this optimized architectural design, our network enhances detail recovery while maintaining low computational overhead.

2.2. Lightweight SR Network Based on Information Distillation

In recent years, researchers have proposed a variety of efficient information distillation networks to meet the needs of practical applications. Hui et al. [22] first proposed the information distillation network (IDN), which splits the features along the channel dimension, processes only part of them, and finally aggregates the processed features with the retained original features, greatly reducing the computational complexity of the network. Subsequently, Hui et al. [23] proposed the information multi-distillation network (IMDN), which gradually extracts features by cascading distillation layers and improves the efficiency of feature extraction. Based on the IMDN architecture, Liu et al. [16] proposed RFDN, which refines features layer by layer through a flexible feature distillation mechanism and demonstrates efficient reconstruction capability while remaining lightweight. Kong et al. [24] proposed a residual local feature network (RLFN), which removes the feature distillation connections and significantly accelerates network inference. Li et al. [25] proposed a blueprint separable residual network (BSRN), which reduces redundant computation in the feature extraction block via blueprint separable convolution and enhances the distilled-feature extraction capability by combining it with an efficient attention block.
However, most of these networks focus on structural simplification and computational efficiency but neglect the balance between performance and complexity. The proposed multi-order information optimization network bridges this research gap. It enhances feature extraction and captures more image details. Additionally, it improves network performance with fewer added parameters, offering a better trade-off than existing information distillation networks.

3. Multi-Order Information Optimization Network

3.1. Network Architecture

The overall structure of the proposed multi-order information optimization network (MOION) is shown in Figure 1. MOION consists of four parts: shallow feature extraction, deep feature extraction, multi-layer feature fusion, and reconstruction. For the input low-resolution image $I_{LR}$, we use a 3 × 3 convolution to extract shallow features $F_0$. The process is represented as
$F_0 = h(I_{LR}) = C_3(I_{LR})$
where $h(\cdot)$ denotes the shallow feature extraction function and $C_3(\cdot)$ denotes a 3 × 3 convolution. $F_0$ is fed into multiple multi-order information optimization blocks (MOIOBs) to extract deep features step by step, and the process is expressed as
$F_n = H_{\mathrm{MOIOB}}^{n}\left(H_{\mathrm{MOIOB}}^{n-1}\left(\cdots H_{\mathrm{MOIOB}}^{1}\left(F_0\right)\cdots\right)\right)$
where $H_{\mathrm{MOIOB}}^{n}(\cdot)$ denotes the nth MOIOB function and $F_n$ denotes the output of the nth MOIOB. To fully utilize the features of all depths, the outputs are concatenated, fused by a 1 × 1 convolution, activated by GELU, and then refined by a 3 × 3 convolution, which is denoted as
$F_{fused} = C_3\left(C_1\left(\mathrm{Concat}\left(F_1, \ldots, F_n\right)\right)\right)$
where $\mathrm{Concat}(\cdot)$ denotes feature concatenation along the channel dimension, $C_1(\cdot)$ denotes a 1 × 1 convolution, and $F_{fused}$ denotes the fused features. To take advantage of residual learning, $F_0$ and the fused features are summed and fed into the reconstruction part, which consists of one 3 × 3 convolution and a pixel-shuffle operation [13] for up-sampling the image. The process is represented as
$I_{SR} = H_{rec}\left(F_{fused} + F_0\right)$
where $H_{rec}(\cdot)$ denotes the reconstruction function and $I_{SR}$ denotes the output super-resolution image. The network is trained by minimizing the $\ell_1$ loss function $L(\theta)$:
$L(\theta) = \frac{1}{N}\sum_{i=1}^{N}\left\| H_{\mathrm{MOION}}\left(I_{LR}^{i}\right) - I_{HR}^{i} \right\|_1$
where $\theta$ denotes the learnable parameters of MOION, $H_{\mathrm{MOION}}(\cdot)$ denotes the MOION function, $\|\cdot\|_1$ denotes the $\ell_1$ norm, $I_{LR}^{i}$ and $I_{HR}^{i}$ denote the ith pair of input LR and HR image samples, and $N$ is the total number of samples.
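For illustration, the following is a minimal PyTorch sketch of this pipeline (shallow 3 × 3 convolution, a chain of MOIOBs, multi-layer fusion with a residual connection, and a pixel-shuffle reconstruction head trained with the $\ell_1$ loss). The MOIOB here is only a placeholder for the block detailed in Section 3.2, and the layer details inside it are assumptions, not the authors' implementation; the 64 channels, 6 blocks, and ×4 scale follow the settings in Section 4.1.

```python
import torch
import torch.nn as nn

class PlaceholderMOIOB(nn.Module):
    """Stand-in for the multi-order information optimization block (Section 3.2)."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.GELU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

class MOIONSketch(nn.Module):
    """Shallow 3x3 conv -> n MOIOBs -> concat / 1x1 / GELU / 3x3 fusion -> residual -> pixel-shuffle head."""
    def __init__(self, channels=64, n_blocks=6, scale=4):
        super().__init__()
        self.head = nn.Conv2d(3, channels, 3, padding=1)           # F0 = C3(I_LR)
        self.blocks = nn.ModuleList([PlaceholderMOIOB(channels) for _ in range(n_blocks)])
        self.fuse = nn.Sequential(                                 # C3(C1(Concat(F1..Fn))) with GELU in between
            nn.Conv2d(channels * n_blocks, channels, 1),
            nn.GELU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.rec = nn.Sequential(                                  # H_rec: 3x3 conv + pixel shuffle [13]
            nn.Conv2d(channels, 3 * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, x):
        f0 = self.head(x)
        feats, f = [], f0
        for blk in self.blocks:
            f = blk(f)
            feats.append(f)                                        # keep every depth for multi-layer fusion
        fused = self.fuse(torch.cat(feats, dim=1))
        return self.rec(fused + f0)                                # residual learning before reconstruction

if __name__ == "__main__":
    net = MOIONSketch()
    lr_img, hr_img = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 256, 256)
    loss = nn.L1Loss()(net(lr_img), hr_img)                        # l1 training loss L(theta)
    print(loss.item())
```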

3.2. Multi-Order Information Optimization Block

Traditional information distillation blocks typically use single-scale convolution in their backbone to extract high-frequency information and then directly fuse the distilled results. This leads to insufficiently rich and overly uniform feature information, failing to fully capture the image’s diverse details. In order to solve the above problems, we propose a multi-order information optimization block (MOIOB), which fully exploits the high-frequency information in the image through four stages: high-frequency information enhancement, information distillation, high-frequency information refinement, and information fusion. The structure of MOIOB is shown in Figure 2.
The first stage consists of three series-connected self-calibrating high-frequency information enhancement blocks (SCHIEBs), which aim to enhance the high-frequency information of the input image. Furthermore, this stage provides the subsequent information distillation and refinement stages with richer high-frequency features. Taking the nth MOIOB in Figure 1 as an example, its input is $F_{n-1}$, and the high-frequency information enhancement stage can be expressed as follows:
$F_{S1} = H_{\mathrm{SCHIEB}}\left(F_{n-1}\right), \quad F_{S(i+1)} = H_{\mathrm{SCHIEB}}\left(F_{Si}\right), \quad i = 1, 2$
where $F_{Si}$ denotes the output of the ith SCHIEB and $H_{\mathrm{SCHIEB}}(\cdot)$ denotes the SCHIEB function.
The second stage consists of three 1 × 1 convolution layers and one 3 × 3 convolution layer, which compress the information of multiple channels into fewer key features through low-dimensional convolution to achieve information distillation. This stage reduces information redundancy, refines the feature representation, and keeps the network lightweight. The information distillation stage can be expressed as
$F_{distilled_1} = H_{distill}\left(F_{n-1}\right), \quad F_{distilled_{j+1}} = H_{distill}\left(F_{Sj}\right), \quad j = 1, 2, 3$
where $F_{distilled_1}$ and $F_{distilled_{j+1}}$ denote the distilled features and $H_{distill}(\cdot)$ denotes the information distillation function.
The third stage consists of two multi-scale high-frequency information refinement blocks (MSHIRBs), which further refine the distilled information and its high-frequency features, optimizing the final reconstruction. The high-frequency information refinement stage can be expressed as
$F_{M1} = H_{\mathrm{MSHIRB}}\left(\mathrm{Concat}\left(F_{distilled_1}, F_{distilled_2}\right)\right), \quad F_{M2} = H_{\mathrm{MSHIRB}}\left(\mathrm{Concat}\left(F_{distilled_3}, F_{distilled_4}\right)\right)$
where $F_{M1}$ and $F_{M2}$ denote the outputs of the two MSHIRBs and $H_{\mathrm{MSHIRB}}(\cdot)$ denotes the MSHIRB function. The outputs of the MSHIRBs are fused in the fourth stage. In this paper, the features are rearranged by channel shuffling to break the isolation of information between channels and avoid homogeneous feature information.
Finally, the features after 1 × 1 convolutional smoothing are then fed into the enhanced spatial attention block (ESAB) [26] for weighted combination and feature mapping, which helps the network to focus on more discriminative features in the spatial domain to improve the efficiency of information utilization. The information fusion stage can be represented as
$F_n = H_{fuse}\left(\mathrm{Concat}\left(F_{M1}, F_{M2}\right)\right)$
where $F_n$ denotes the output of the nth MOIOB and $H_{fuse}(\cdot)$ denotes the information fusion function. Through the above four stages, MOIOB removes redundant information, optimizes the information extraction and fusion process, and improves the network's ability to recover fine details.
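As a structural illustration of these four stages, the sketch below wires placeholder SCHIEB, MSHIRB, and attention modules in the described order (enhancement, distillation, refinement, fusion). The distilled channel count, the assignment of the 3 × 3 distillation convolution to the last feature, and the two-group channel shuffle are assumptions made to keep the example runnable.

```python
import torch
import torch.nn as nn

class MOIOBSketch(nn.Module):
    """Four stages: enhancement (three SCHIEBs), distillation (four low-dimensional convs),
    refinement (two MSHIRBs on adjacent distilled pairs), and fusion
    (channel shuffle -> 1x1 smoothing -> spatial attention)."""
    def __init__(self, channels=64, distilled=16, schieb=None, mshirb=None, esa=None):
        super().__init__()
        placeholder = lambda c: nn.Identity()                      # real SCHIEB / MSHIRB / ESAB plug in here
        self.schiebs = nn.ModuleList([(schieb or placeholder)(channels) for _ in range(3)])
        # Distillation: 1x1 convs on the input and first two SCHIEB outputs, a 3x3 on the last (assumed assignment).
        self.distill = nn.ModuleList([
            nn.Conv2d(channels, distilled, 1),
            nn.Conv2d(channels, distilled, 1),
            nn.Conv2d(channels, distilled, 1),
            nn.Conv2d(channels, distilled, 3, padding=1),
        ])
        self.mshirbs = nn.ModuleList([(mshirb or placeholder)(2 * distilled) for _ in range(2)])
        self.smooth = nn.Conv2d(4 * distilled, channels, 1)
        self.esa = (esa or placeholder)(channels)

    def forward(self, x):
        s = [x]
        for blk in self.schiebs:                                   # stage 1: high-frequency enhancement
            s.append(blk(s[-1]))
        d = [conv(f) for conv, f in zip(self.distill, s)]          # stage 2: information distillation
        m1 = self.mshirbs[0](torch.cat([d[0], d[1]], dim=1))       # stage 3: refinement of adjacent pairs
        m2 = self.mshirbs[1](torch.cat([d[2], d[3]], dim=1))
        fused = torch.cat([m1, m2], dim=1)                         # stage 4: information fusion
        b, c, h, w = fused.shape                                   # two-group channel shuffle breaks channel isolation
        fused = fused.view(b, 2, c // 2, h, w).transpose(1, 2).reshape(b, c, h, w)
        return self.esa(self.smooth(fused))
```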

3.3. Self-Calibrating High-Frequency Information Enhancement Block

In image super-resolution tasks, edges and textures are crucial for image restoration, and this information is usually embedded in the high-frequency components of the image. However, the stacked convolutions of conventional distillation blocks cannot dynamically adjust the feature representation in different regions, so the high-frequency information is easily corrupted by noise, which degrades the quality of image reconstruction. For this reason, we propose the self-calibrating high-frequency information enhancement block (SCHIEB), which adaptively enhances high-frequency information through a dual-branch design.
The structure of SCHIEB is shown in Figure 3; it includes a self-calibrating branch (SCB) and an auxiliary branch (AB). In SCB, the input features are processed in two steps. First, we use a 1 × 1 convolution to reduce the dimensionality of the input features and lower the computational complexity, and then extract local high-frequency features with a 3 × 3 convolution after GELU activation. Second, we introduce wavelet convolution (WTConv) [27], which provides a larger receptive field and helps the network capture the shape information of the image. The input features are passed through a WTConv-5 layer and a sigmoid activation to generate calibration weights, which are then multiplied with the output of the 3 × 3 convolutional layer. This operation controls the pixel-wise response intensity, suppresses noise, and adaptively enhances the high-frequency information. Taking the first SCHIEB in Figure 2 as an example, the processing of features by SCB can be expressed as follows:
$F_{n-1}^{SCB} = \sigma\left(WTConv_5\left(F_{n-1}\right)\right) \otimes C_3\left(GELU\left(C_1\left(F_{n-1}\right)\right)\right)$
where $\sigma(\cdot)$ denotes the sigmoid function, $WTConv_5(\cdot)$ denotes the wavelet convolution, $F_{n-1}$ denotes the input of SCHIEB, $GELU(\cdot)$ denotes the GELU activation function, and $F_{n-1}^{SCB}$ denotes the output of SCB. In AB, the input features are likewise dimension-reduced and activated, the salient features in each local region are then retained by max-pooling, and finally the local details are refined by a 1 × 1 convolution to assist in enhancing the high-frequency information. The processing of features in AB can be represented as
$F_{n-1}^{AB} = C_1\left(\mathrm{MaxPool}\left(GELU\left(C_1\left(F_{n-1}\right)\right)\right)\right)$
where $\mathrm{MaxPool}(\cdot)$ denotes max-pooling and $F_{n-1}^{AB}$ denotes the output of AB.
The features enhanced by SCB and AB are each activated by GELU and then concatenated along the channel dimension. To capture more useful spatial features, the concatenated features are fed into the chunked space optimization block (CSOB) to further optimize the feature representation; its structure is shown in Figure 4. The CSOB builds upon spatially adaptive feature modulation [19], employing feature partitioning and adaptive max-pooling for multi-scale downsampling. Local contextual information within the partitioned regions is captured through depthwise convolutions, followed by upsampling and channel-wise concatenation of the processed features. Spatial correlations across blocks are aggregated via efficient blueprint separable convolution [25], and the optimized features are obtained through element-wise multiplication. The CSOB effectively implements cross-scale information interaction and further enhances the diversity of the feature representation. Finally, the CSOB-optimized features are added to the original features via a residual connection to form the output of SCHIEB. The above process can be expressed as
$F_{n-1}^{\prime} = \mathrm{Concat}\left(GELU\left(F_{n-1}^{SCB}\right), GELU\left(F_{n-1}^{AB}\right)\right), \quad F_{S1} = F_{n-1}^{\prime} + H_{\mathrm{CSOB}}\left(LN\left(F_{n-1}^{\prime}\right)\right)$
where $F_{n-1}^{\prime}$ denotes the features after dual-branch concatenation, $F_{S1}$ denotes the output of SCHIEB, $LN(\cdot)$ denotes layer normalization, and $H_{\mathrm{CSOB}}(\cdot)$ denotes the CSOB function.
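A minimal PyTorch sketch of this dual-branch structure is given below. The WTConv-5 layer is replaced by a plain 5 × 5 convolution stand-in, CSOB is left as a pluggable placeholder, and the per-branch channel width is an assumption, so the sketch illustrates the data flow rather than the exact block.

```python
import torch
import torch.nn as nn

class SCHIEBSketch(nn.Module):
    """Dual-branch sketch: a self-calibrating branch (SCB) whose sigmoid-gated calibration
    weights modulate local 3x3 features, plus a max-pooling auxiliary branch (AB),
    followed by layer normalization, a CSOB placeholder, and a residual connection."""
    def __init__(self, channels=64, csob=None):
        super().__init__()
        dim = channels // 2                                        # assumed per-branch width
        # SCB main path: 1x1 reduction -> GELU -> 3x3 local high-frequency features
        self.scb_main = nn.Sequential(
            nn.Conv2d(channels, dim, 1), nn.GELU(), nn.Conv2d(dim, dim, 3, padding=1))
        # Calibration path: stand-in for WTConv-5 [27]; a plain 5x5 conv keeps the sketch self-contained.
        self.calib = nn.Sequential(nn.Conv2d(channels, dim, 5, padding=2), nn.Sigmoid())
        # AB: 1x1 reduction -> GELU -> max-pool (stride 1 keeps resolution) -> 1x1 refinement
        self.ab = nn.Sequential(
            nn.Conv2d(channels, dim, 1), nn.GELU(),
            nn.MaxPool2d(3, stride=1, padding=1), nn.Conv2d(dim, dim, 1))
        self.act = nn.GELU()
        self.norm = nn.GroupNorm(1, channels)                      # channel-wise layer normalization
        self.csob = csob if csob is not None else nn.Identity()   # chunked space optimization placeholder

    def forward(self, x):
        scb = self.calib(x) * self.scb_main(x)                     # calibration weights gate the 3x3 features
        ab = self.ab(x)
        fused = torch.cat([self.act(scb), self.act(ab)], dim=1)    # F'_{n-1}
        return fused + self.csob(self.norm(fused))                 # F_{S1} = F' + CSOB(LN(F'))
```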

3.4. Multi-Scale High-Frequency Information Refinement Block

The traditional information distillation block directly fuses features from different layers and then unifies all the features. In this process, high-frequency information is diluted or lost due to simple weighting and fusion. In order to further refine the distillation information and retain more high-frequency features, we propose the multi-scale high-frequency information refinement block (MSHIRB) as shown in Figure 5. It takes two adjacent post-distilled features as input. First, these features are spliced along the channel dimension. Then, channel blending is applied to enhance cross-level information exchange. Finally, a 1 × 1 convolution reduces the dimension, lowering the network’s computational complexity.
In MSHIRB, we use multiplicity sampling to extract high-frequency information; this approach captures multi-scale high-frequency information with very little overhead. Specifically, the dimension-reduced features are downsampled by factors of two, four, and eight via max-pooling and restored to the original feature-map size by interpolation, yielding features that contain mostly low-frequency information; these are subtracted element-wise from the original features to extract multi-scale high-frequency information. Taking the distilled features $F_{distilled_1}$ and $F_{distilled_2}$ as an example, the process of multiplicity sampling can be expressed as
$F_h = C_1\left(H_{Cshuffle}\left(\mathrm{Concat}\left(F_{distilled_1}, F_{distilled_2}\right)\right)\right), \quad F_{h2} = F_h - H_{up2}\left(H_{down2}\left(F_h\right)\right), \quad F_{h4} = F_h - H_{up4}\left(H_{down4}\left(F_h\right)\right), \quad F_{h8} = F_h - H_{up8}\left(H_{down8}\left(F_h\right)\right)$
where $F_h$ denotes the features after dimensionality reduction, $H_{Cshuffle}(\cdot)$ denotes the channel shuffle operation, $F_{h2}$, $F_{h4}$, and $F_{h8}$ are the multi-scale high-frequency features obtained by multiplicity sampling, $H_{down2}(\cdot)$, $H_{down4}(\cdot)$, and $H_{down8}(\cdot)$ denote two-, four-, and eight-times downsampling, and $H_{up2}(\cdot)$, $H_{up4}(\cdot)$, and $H_{up8}(\cdot)$ denote two-, four-, and eight-times upsampling, respectively. Multiplicity sampling effectively retains and refines the high-frequency features in the distilled information.
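The multiplicity-sampling step above can be written compactly as follows; the bilinear interpolation mode and a pooling window equal to the sampling factor are assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

def multiplicity_sampling(feat, factors=(2, 4, 8)):
    """Max-pool by each factor, interpolate back to the original size (keeping mostly
    low frequencies), and subtract from the input to isolate high-frequency residuals."""
    h, w = feat.shape[-2:]
    high_freq = []
    for r in factors:
        low = F.max_pool2d(feat, kernel_size=r, stride=r)                               # H_down_r
        low = F.interpolate(low, size=(h, w), mode="bilinear", align_corners=False)     # H_up_r
        high_freq.append(feat - low)                                                    # F_hr = F_h - up(down(F_h))
    return torch.cat(high_freq, dim=1)                                                  # concatenated for MBFEB

# usage: multiplicity_sampling(torch.rand(1, 32, 64, 64)).shape -> (1, 96, 64, 64)
```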
To further refine these features and enable the network to capture richer texture information, we propose the multi-branch feature extraction block (MBFEB) illustrated in Figure 6. The multi-scale high-frequency information, after channel splicing and dimensionality reduction, is fed into MBFEB. Within MBFEB, channel segmentation divides the information into four parts. While part of the original channel information is retained, the remaining three branches undergo wavelet and band convolutions separately. Wavelet convolution [27] captures the image’s shape information, while band convolution extracts its horizontal and vertical texture information. By fusing diverse information from different branches, MBFEB enhances the network’s ability to recover image details. The refinement process of features by MBFEB can be represented as
$F_h^{\prime} = \mathrm{Concat}\left(F_{h2}, F_{h4}, F_{h8}\right), \quad F_{M1} = H_{\mathrm{MBFEB}}\left(C_1\left(F_h^{\prime}\right)\right)$
where $F_h^{\prime}$ denotes the multiplicity-sampled features concatenated along the channel dimension, $H_{\mathrm{MBFEB}}(\cdot)$ denotes the MBFEB function, and $F_{M1}$ denotes the output of MSHIRB.
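A sketch of such a four-way split is shown below. The 5 × 5 depthwise convolution is only a stand-in for WTConv-5 [27], and the 1 × 7 / 7 × 1 strip kernels follow the 5-7-7 configuration discussed in Section 4.4; the exact layer composition of each branch is otherwise an assumption.

```python
import torch
import torch.nn as nn

class MBFEBSketch(nn.Module):
    """Split channels into four groups: keep one unchanged, process the others with a
    wavelet-convolution stand-in and horizontal/vertical strip convolutions, then fuse."""
    def __init__(self, channels=64):
        super().__init__()
        c = channels // 4
        self.wt = nn.Conv2d(c, c, 5, padding=2, groups=c)                  # stand-in for WTConv-5 [27]
        self.strip_h = nn.Conv2d(c, c, (1, 7), padding=(0, 3), groups=c)   # horizontal textures (DW-1x7)
        self.strip_v = nn.Conv2d(c, c, (7, 1), padding=(3, 0), groups=c)   # vertical textures (DW-7x1)
        self.fuse = nn.Conv2d(channels, channels, 1)                       # fuse the diverse branch outputs

    def forward(self, x):
        x0, x1, x2, x3 = torch.chunk(x, 4, dim=1)                          # channel segmentation into four parts
        return self.fuse(torch.cat([x0, self.wt(x1), self.strip_h(x2), self.strip_v(x3)], dim=1))
```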

4. Experimental Results and Analysis

4.1. Experimental Setup

The experiments are conducted on an Intel i5-13490F processor and an NVIDIA RTX 4070 graphics card using the PyTorch framework. The initial learning rate is set to $5 \times 10^{-4}$ and is halved every 200 epochs, with a total of 1000 training epochs. Training images are cropped into 64 × 64 patches, with 16 patches per batch. The optimizer is ADAM [28] with $\beta_1 = 0.9$, $\beta_2 = 0.999$, and $\epsilon = 10^{-8}$. The number of input channels of the network is 64 and the number of MOIOBs is 6. All networks in the ablation experiments are trained for 300 epochs with 32 input channels; the remaining training settings follow the configuration above.
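The following sketch reproduces this optimization setup in PyTorch, with a toy stand-in model and random tensors in place of the actual MOION and the DIV2K loader; the 200-epoch halving schedule reflects the interpretation stated above.

```python
import torch
import torch.nn as nn

# Toy stand-in model (not MOION): 3x3 conv stack with a x4 pixel-shuffle head.
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1),
    nn.Conv2d(64, 3 * 16, 3, padding=1),
    nn.PixelShuffle(4),
)
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4, betas=(0.9, 0.999), eps=1e-8)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=200, gamma=0.5)   # halve lr every 200 epochs
criterion = nn.L1Loss()

for epoch in range(2):                                    # 1000 epochs in the paper; 2 here for the demo
    lr_patch = torch.rand(16, 3, 64, 64)                  # 16 random 64x64 LR crops per batch
    hr_patch = torch.rand(16, 3, 256, 256)
    optimizer.zero_grad()
    criterion(model(lr_patch), hr_patch).backward()
    optimizer.step()
    scheduler.step()
```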

4.2. Datasets and Evaluation Indicators

In this paper, 800 image pairs from DIV2K [29] are used as the training set, and four publicly available datasets, Set5 [30], Set14 [31], B100 [32], and Urban100 [33], are used as the test sets. Network complexity is measured by the number of parameters and floating-point operations (FLOPs), and the quality of the reconstructed images is measured by the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) [34]. PSNR is measured in dB; the larger the value, the higher the quality of the reconstructed image. PSNR is calculated using the following formula:
$MSE = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W}\left(x(i,j) - y(i,j)\right)^2, \quad \mathrm{PSNR} = 10\lg\frac{MAX_I^2}{MSE}$
where $x$ is the reconstructed image, $y$ is the real high-resolution image, $MSE$ is the mean square error, $x(i,j)$ and $y(i,j)$ are the pixel values at the corresponding coordinates, $H$ and $W$ are the height and width of the image, respectively, and $MAX_I$ is the maximum pixel value in the image. SSIM evaluates reconstruction quality in terms of luminance, structure, and contrast, and ranges from 0 to 1; the closer the value is to 1, the higher the quality of the reconstructed image. SSIM is calculated by the following formula:
$\mathrm{SSIM} = \frac{\left(2\mu_x\mu_y + c_1\right)\left(2\sigma_{xy} + c_2\right)}{\left(\mu_x^2 + \mu_y^2 + c_1\right)\left(\sigma_x^2 + \sigma_y^2 + c_2\right)}$
where $x$ is the reconstructed image, $y$ is the real high-resolution image, $\mu_x$ and $\mu_y$ are the mean pixel values of $x$ and $y$, respectively, $\sigma_x^2$ and $\sigma_y^2$ are the variances of $x$ and $y$, respectively, $\sigma_{xy}$ is the covariance of $x$ and $y$, and $c_1$ and $c_2$ are constants that stabilize the division.
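For reference, the two metrics can be evaluated directly from these formulas as in the sketch below. Note that the SSIM function applies the expression globally, whereas the standard SSIM [34] averages it over local windows, and SR papers typically evaluate on the luminance channel; the constants follow the common 8-bit convention.

```python
import numpy as np

def psnr(x, y, max_val=255.0):
    """PSNR in dB from the MSE formula above; x is the reconstruction, y the ground truth."""
    mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def ssim_global(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Direct, global evaluation of the SSIM expression above; the standard SSIM [34]
    averages this quantity over local windows instead of whole images."""
    x, y = x.astype(np.float64), y.astype(np.float64)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

# usage with 8-bit images stored as numpy arrays:
# print(psnr(sr_img, hr_img), ssim_global(sr_img, hr_img))
```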

4.3. Network Performance Comparison

4.3.1. Comparison of Objective Quantitative Indicators

To verify the superiority of the proposed network, MOION is compared with current state-of-the-art lightweight networks, including EDSR-baseline [15], IMDN [23], RFDN [16], BSRN [25], SAFMN [19], DLSR [35], DRSAN [36], HAFRN [37], OSFFNet [38], and HSRNet [39]. As shown in Table 1, MOION achieves the best results on all metrics. As the scale factor increases, more high-frequency information is needed for reconstruction and the task becomes more difficult, and MOION's advantage over the other networks is more pronounced at large scales. The Urban100 dataset, with its challenging and complex textures, better validates the network's reconstruction capability. Taking ×4 as an example, MOION improves PSNR by 0.27 dB and SSIM by 0.0071 on Urban100 compared to HSRNet, whose parameter count exceeds 1M, while using only 65% of HSRNet's parameters. Compared with IMDN, which has a similar number of parameters, MOION improves PSNR by 0.51 dB and SSIM by 0.0167 on Urban100 with 12% less computation. This shows that MOION has superior performance and a good trade-off between complexity and performance.

4.3.2. Comparison of Subjective Visual Effects

In order to visualize the reconstruction performance of MOION, images with complex texture details in B100 and Urban100 are selected for reconstruction at × 4 scale, and the reconstruction results are compared with IMDN, RFDN, BSRN, LatticeNet [43], ESRT [20], and NGSwin [44] in terms of subjective visual effects. The experimental results are shown in Figure 7, Figure 8, Figure 9 and Figure 10. In image 86000, MOION reconstructs straighter and clearer grid lines, while the rest of the networks reconstruct distorted and blurred lines. In image 210088, MOION reconstructs the fisheye shape closest to HR, while the rest of the networks reconstruct the fisheye with obvious distortion. In image 058, MOION reconstructs all curves completely, while the remaining networks fail to reconstruct them completely or are illegible. In image 015, MOION reconstructs the lines in the correct orientation, while the remaining networks all reconstruct the wrong orientation. Overall, the comparison of the reconstruction results in Figure 7, Figure 8, Figure 9 and Figure 10 further demonstrates the advanced performance of MOION.

4.3.3. Comparison with Transformer-Based Networks

In recent years, the application of Transformers in SR has greatly improved reconstruction performance and proven highly competitive with CNN-based networks. To further verify the superiority of MOION, seven Transformer-based lightweight networks are selected for comparison, including SwinIR-light [45], LBNet [46], ESRT, NGSwin, DRSAN [36], CFIN [21], and HCFormer [47]; the results are shown in Table 2. Compared with HCFormer, MOION achieves 16 best and 6 second-best results out of 24 metrics, while HCFormer achieves 7 best and 5 second-best. The total number of best and second-best results of MOION exceeds that of HCFormer, and the number of parameters required at the ×2, ×3, and ×4 scales is reduced by 10.4%, 10.6%, and 11%, respectively. Compared with SwinIR-light, the total number of best and second-best results of MOION is still higher, and the number of parameters and the computation required are lower at all scales. Compared with the remaining networks, MOION achieves the best values on most metrics, which further validates its superiority in image reconstruction.

4.4. Ablation Studies

To investigate the effect of the main blocks on performance, we conduct ablation experiments on WTConv-5 (which provides the self-calibrating weights), CSOB (the chunked space optimization block), and MSHIRB (the multi-scale high-frequency information refinement block). The test dataset is Urban100, which has complex textures, and the scale factor is ×4. The network with all three blocks removed is used as the baseline, and each block is added to the baseline in turn; the results are shown in Table 3. When only WTConv-5 is used, PSNR improves by 0.12 dB and SSIM by 0.0036 compared to the baseline, while the number of parameters and the computation increase by only 38 K and 0.85 G, respectively. When only CSOB or only MSHIRB is used, the network also obtains a considerable performance gain at the cost of a small increase in overhead, which proves the effectiveness of each block. When any two blocks are used, the network obtains a larger improvement than with each block individually, and when all three blocks are used simultaneously, PSNR improves by 0.21 dB and SSIM by 0.0076 compared to the baseline, achieving the best performance with only a small increase in network complexity. This demonstrates the synergistic effect between the blocks.
MSHIRB mainly consists of multiplicity sampling and the multi-branch feature extraction block (MBFEB). To investigate the effect of these two components on network performance, the network with both removed is used as the baseline, and each component is added to the baseline in turn; the results are shown in Table 4. When only multiplicity sampling is used, PSNR improves by 0.02 dB and SSIM by 0.0008 over the baseline. When only MBFEB is used, PSNR improves by 0.07 dB and SSIM by 0.0023 over the baseline. When both are used, the performance is best. To visually demonstrate the effects of multiplicity sampling and MBFEB on image reconstruction, we conducted image restoration using three network configurations: (1) employing only multiplicity sampling, (2) utilizing solely MBFEB, and (3) integrating both simultaneously. The reconstructed images and their corresponding high-frequency information are shown in Figure 11a. As demonstrated, both MBFEB and multiplicity sampling contribute to capturing more high-frequency components. However, when used individually, each still leaves some artifacts in the reconstructed images. The combined use of both enables more accurate high-frequency recovery and consequently yields the best reconstruction quality. This demonstrates the complementary nature of multiplicity sampling for high-frequency information and the ability of MBFEB to refine features, while their synergy is required to better accomplish high-frequency information refinement.
To validate the rationale of SCHIEB’s dual-branch architecture, we conducted ablation studies by retaining only the self-calibrating branch (SCB) or auxiliary branch (AB) individually. As shown in Table 5, the dual-branch configuration achieves optimal performance, with SCB outperforming AB in single-branch tests. The SCB uses wavelet convolution to capture large-scale pixel relationships. It generates calibration weights that dynamically adjust regional feature representations, emphasizing high-frequency components and significantly enhancing high-frequency information. In contrast, the AB primarily preserves local salient features through max-pooling operations but lacks self-calibrating attention mechanisms for high-frequency compensation. To visually compare the effects of each branch, we performed image reconstruction using individual branches (SCB or AB) and dual branches. The reconstructed images and their high-frequency components are shown in Figure 11b,c. It can be observed that SCB helps the network reconstruct more high-frequency components, while AB compensates for partial high-frequency information. The dual-branch configuration achieves the highest reconstruction quality. This structural comparison shows that the dual-branch design combines their complementary advantages synergistically. It adaptively enhances high-frequency information and improves the ability to recover detailed textures.
To evaluate the impact of convolution kernel sizes in MBFEB, we tested six configurations combining wavelet convolution (WTConv) and dual strip convolutions (DW), where “5-7-7” denotes WTConv-5, DW-1×7, and DW-7×1. As shown in Table 6, larger kernels in both wavelet and strip convolutions consistently enhance network performance while maintaining moderate parameter and computational costs. Specifically, expanding the wavelet convolution kernel improves the capture of broad shape features, while larger strip convolution kernels strengthen modeling of long-range horizontal/vertical texture dependencies. Notably, since each MBFEB branch processes only a subset of channel features, the 5–7–7 configuration achieves optimal performance with minimal computational overhead.

5. Conclusions

This paper proposes a lightweight image super-resolution reconstruction network based on multi-order information optimization. The core of the network lies in the enhancement and refinement of high-frequency information. Through multiple stages, it fully extracts high-frequency features and removes redundant information to improve detail restoration. For high-frequency enhancement, we design a SCHIEB that regulates pixel-wise response intensities through learnable calibration weights. This block incorporates an auxiliary branch with chunked space optimization to adaptively enhance high-frequency components while preserving local structural details. In the refinement stage, we propose an MSHIRB. It first captures multi-scale information via multiplicity sampling, and then uses a multi-branch structure with wavelet and band convolutions to extract diverse detail features, further refining high-frequency information. Together, these blocks address the limitations of traditional distillation networks in high-frequency recovery and detail reconstruction. Experimental results show that the proposed network achieves competitive quantitative metrics and visual reconstruction quality while maintaining good balance between complexity and performance.

Author Contributions

Methodology, L.L.; formal analysis, S.G.; data curation, S.G., L.L., W.C., H.J. and H.G.; writing—original draft preparation, S.G., L.L., W.C., H.J. and H.G.; writing—review and editing, S.G., L.L., W.C., H.J. and H.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under Grants 61976034 and 52304182.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

We thank the China University of Mining and Technology and Dalian University of Technology.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Liu, Y.; Zhang, M.; Jiang, B.; Hou, B.; Liu, D.; Chen, J.; Lian, H. Flexible alignment super-resolution network for multi-contrast magnetic resonance imaging. IEEE Trans. Multimed. 2023, 26, 5159–5169. [Google Scholar] [CrossRef]
  2. Ren, S.; Guo, K.; Zhou, X.; Hu, B.; Zhu, F.; Luo, E. Medical image super-resolution based on semantic perception transfer learning. IEEE/ACM Trans. Comput. Biol. Bioinform. 2022, 20, 2598–2609. [Google Scholar] [CrossRef]
  3. Cheng, D.; Chen, J.; Kou, Q.; Nie, S.; Zhang, J. Super-resolution reconstruction of lightweight mine images by fusing hierarchical features and attention mechanisms. J. Instrum. 2022, 43, 73–84. [Google Scholar]
  4. Kou, Q.; Cheng, Z.; Cheng, D.; Chen, J.; Zhang, J. Lightweight super resolution method based on blueprint separable convolution for mine image. J. China Coal Soc. 2024, 49, 4038–4050. [Google Scholar]
  5. Jiang, H.; Asad, M.; Liu, J.; Zhang, H.; Cheng, D. Single image detail enhancement via metropolis theorem. Multimed. Tools Appl. 2024, 83, 36329–36353. [Google Scholar] [CrossRef]
  6. Cheng, D.; Yuan, H.; Qian, J.; Kou, Q.; Jiang, H. Image Super-Resolution Algorithms Based on Deep Feature Differentiation Network. J. Electron. Inf. 2024, 46, 1033–1042. [Google Scholar]
  7. Chao, J.; Zhou, Z.; Gao, H.; Gong, J.; Zeng, Z.; Yang, Z. A novel learnable interpolation approach for scale-arbitrary image super-resolution. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, Macao, China, 19–25 August 2023; pp. 564–572. [Google Scholar]
  8. Li, X.; Zhang, Y.; Ge, Z.; Cao, G.; Shi, H.; Fu, P. Adaptive nonnegative sparse representation for hyperspectral image super-resolution. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 4267–4283. [Google Scholar] [CrossRef]
  9. Chen, W.; Huang, G.; Mo, F.; Lin, J. Image super-resolution reconstruction algorithm with adaptive aggregation of hierarchical information. J. Comput. Eng. Appl. 2024, 60, 221–231. [Google Scholar]
  10. Zhang, J.; Jia, Y.; Zhu, H.; Li, H.; Du, J. 3D-MRI Super-Resolution Algorithm Fusing Attention and Dilated Encoder-Decoder. J. Comput. Eng. Appl. 2024, 60, 228–236. [Google Scholar]
  11. Dong, C.; Loy, C.C.; He, K.; Tang, X. Learning a deep convolutional network for image super-resolution. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part IV 13. Springer: Berlin/Heidelberg, Germany, 2014; pp. 184–199. [Google Scholar]
  12. Dong, C.; Loy, C.C.; Tang, X. Accelerating the super-resolution convolutional neural network. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 391–407. [Google Scholar]
  13. Shi, W.; Caballero, J.; Huszár, F.; Totz, J.; Aitken, A.P.; Bishop, R.; Rueckert, D.; Wang, Z. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1874–1883. [Google Scholar]
  14. Kim, J.; Lee, J.K.; Lee, K.M. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1646–1654. [Google Scholar]
  15. Lim, B.; Son, S.; Kim, H.; Nah, S.; Mu Lee, K. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 136–144. [Google Scholar]
  16. Liu, J.; Tang, J.; Wu, G. Residual feature distillation network for lightweight image super-resolution. In Proceedings of the Computer Vision–ECCV 2020 Workshops, Glasgow, UK, 23–28 August 2020; Proceedings, Part III 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 41–55. [Google Scholar]
  17. Tai, Y.; Yang, J.; Liu, X. Image super-resolution via deep recursive residual network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3147–3155. [Google Scholar]
  18. Yu, L.; Li, X.; Li, Y.; Jiang, T.; Wu, Q.; Fan, H.; Liu, S. Dipnet: Efficiency distillation and iterative pruning for image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 1692–1701. [Google Scholar]
  19. Sun, L.; Dong, J.; Tang, J.; Pan, J. Spatially-adaptive feature modulation for efficient image super-resolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 13190–13199. [Google Scholar]
  20. Lu, Z.; Li, J.; Liu, H.; Huang, C.; Zhang, L.; Zeng, T. Transformer for single image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 457–466. [Google Scholar]
  21. Li, W.; Li, J.; Gao, G.; Deng, W.; Zhou, J.; Yang, J.; Qi, G.J. Cross-receptive focused inference network for lightweight image super-resolution. IEEE Trans. Multimed. 2023, 26, 864–877. [Google Scholar] [CrossRef]
  22. Hui, Z.; Wang, X.; Gao, X. Fast and accurate single image super-resolution via information distillation network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 723–731. [Google Scholar]
  23. Hui, Z.; Gao, X.; Yang, Y.; Wang, X. Lightweight image super-resolution with information multi-distillation network. In Proceedings of the 27th ACM International Conference on Multimedia, Nice, France, 21–25 October 2019; pp. 2024–2032. [Google Scholar]
  24. Kong, F.; Li, M.; Liu, S.; Liu, D.; He, J.; Bai, Y.; Chen, F.; Fu, L. Residual local feature network for efficient super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 766–776. [Google Scholar]
  25. Li, Z.; Liu, Y.; Chen, X.; Cai, H.; Gu, J.; Qiao, Y.; Dong, C. Blueprint separable residual network for efficient image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 833–843. [Google Scholar]
  26. Liu, J.; Zhang, W.; Tang, Y.; Tang, J.; Wu, G. Residual feature aggregation network for image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 2359–2368. [Google Scholar]
  27. Finder, S.; Amoyal, R.; Treister, E.; Freifeld, O. Wavelet Convolutions for Large Receptive Fields. arXiv 2024, arXiv:2407.05848. [Google Scholar] [CrossRef]
  28. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  29. Timofte, R.; Agustsson, E.; Van Gool, L.; Yang, M.H.; Zhang, L. Ntire 2017 challenge on single image super-resolution: Methods and results. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 114–125. [Google Scholar]
  30. Bevilacqua, M.; Roumy, A.; Guillemot, C.; Alberi-Morel, M.L. Low-Complexity Single-Image Super-Resolution Based On Nonnegative Neighbor Embedding; BMVA Press: Durham, UK, 2012. [Google Scholar]
  31. Zeyde, R.; Elad, M.; Protter, M. On single image scale-up using sparse-representations. In Proceedings of the Curves and Surfaces: 7th International Conference, Avignon, France, 24–30 June 2010; Revised Selected Papers 7. Springer: Berlin/Heidelberg, Germany, 2012; pp. 711–730. [Google Scholar]
  32. Martin, D.; Fowlkes, C.; Tal, D.; Malik, J. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings of the Eighth IEEE International Conference on Computer Vision. ICCV 2001, Vancouver, BC, Canada, 7–14 July 2001; Volume 2, pp. 416–423. [Google Scholar]
  33. Huang, J.B.; Singh, A.; Ahuja, N. Single image super-resolution from transformed self-exemplars. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 5197–5206. [Google Scholar]
  34. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed]
  35. Huang, H.; Shen, L.; He, C.; Dong, W.; Liu, W. Differentiable neural architecture search for extremely lightweight image super-resolution. IEEE Trans. Circuits Syst. Video Technol. 2022, 33, 2672–2682. [Google Scholar] [CrossRef]
  36. Park, K.; Soh, J.W.; Cho, N.I. A dynamic residual self-attention network for lightweight single image super-resolution. IEEE Trans. Multimed. 2021, 25, 907–918. [Google Scholar] [CrossRef]
  37. Wang, K.; Yang, X.; Jeon, G. Hybrid attention feature refinement network for lightweight image super-resolution in metaverse immersive display. IEEE Trans. Consum. Electron. 2023, 70, 3232–3244. [Google Scholar] [CrossRef]
  38. Wang, Y.; Zhang, T. Osffnet: Omni-stage feature fusion network for lightweight image super-resolution. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 26–27 February 2024; Volume 38, pp. 5660–5668. [Google Scholar]
  39. Liu, Y.; Jia, Q.; Zhang, J.; Fan, X.; Wang, S.; Ma, S.; Gao, W. Hierarchical similarity learning for aliasing suppression image super-resolution. IEEE Trans. Neural Networks Learn. Syst. 2022, 35, 2759–2771. [Google Scholar] [CrossRef]
  40. Yasir, M.; Ullah, I.; Choi, C. Depthwise channel attention network (DWCAN): An efficient and lightweight model for single image super-resolution and metaverse gaming. Expert Syst. 2024, 41, e13516. [Google Scholar] [CrossRef]
  41. Song, W.; Yan, X.; Guo, W.; Xu, Y.; Ning, K. MSWSR: A Lightweight Multi-Scale Feature Selection Network for Single-Image Super-Resolution Methods. Symmetry 2025, 17, 431. [Google Scholar] [CrossRef]
  42. Li, F.; Cong, R.; Wu, J.; Bai, H.; Wang, M.; Zhao, Y. Srconvnet: A transformer-style convnet for lightweight image super-resolution. Int. J. Comput. Vis. 2025, 133, 173–189. [Google Scholar] [CrossRef]
  43. Luo, X.; Qu, Y.; Xie, Y.; Zhang, Y.; Li, C.; Fu, Y. Lattice network for lightweight image restoration. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 4826–4842. [Google Scholar] [CrossRef] [PubMed]
  44. Choi, H.; Lee, J.; Yang, J. N-gram in swin transformers for efficient lightweight image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 2071–2081. [Google Scholar]
  45. Liang, J.; Cao, J.; Sun, G.; Zhang, K.; Van Gool, L.; Timofte, R. Swinir: Image restoration using swin transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 1833–1844. [Google Scholar]
  46. Gao, G.; Wang, Z.; Li, J.; Li, W.; Yu, Y.; Zeng, T. Lightweight bimodal network for single-image super-resolution via symmetric CNN and recursive transformer. arXiv 2022, arXiv:2204.13286. [Google Scholar]
  47. Li, J.; Ke, Y. Hybrid convolution-transformer for lightweight single image super-resolution. In Proceedings of the ICASSP 2024—2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Republic of Korea, 14–19 April 2024; pp. 2395–2399. [Google Scholar]
Figure 1. Multi-order information optimization network structure.
Figure 2. Multi-order information optimization block.
Figure 3. Self-calibrating high-frequency information enhancement block.
Figure 4. Chunked space optimization block.
Figure 5. Multi-scale high-frequency information refinement block.
Figure 6. Multi-branch feature extraction block.
Figure 7. Visual comparison of different networks on B100: 86000.
Figure 8. Visual comparison of different networks on B100: 210088.
Figure 9. Visual comparison of different networks on Urban100: img058.
Figure 10. Visual comparison of different networks on Urban100: img015.
Figure 11. Reconstructed images and their high-frequency component images.
Table 1. Comparison of metrics under the baseline datasets when the scale factor is ×2, ×3, and ×4. Bold is optimal, underlined is sub-optimal, and - indicates that the network was not tested for this condition.

| Scale | Method | Params | FLOPs | Set5 PSNR/SSIM | Set14 PSNR/SSIM | B100 PSNR/SSIM | Urban100 PSNR/SSIM |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ×2 | EDSR-baseline [15] | 1370 K | 316.3 G | 37.99/0.9604 | 33.57/0.9175 | 32.16/0.8994 | 31.98/0.9272 |
| ×2 | IMDN [23] | 694 K | 186.7 G | 38.00/0.9605 | 33.63/0.9177 | 32.19/0.8996 | 32.17/0.9283 |
| ×2 | RFDN [16] | 534 K | 95.0 G | 38.05/0.9606 | 33.68/0.9184 | 32.16/0.8994 | 32.12/0.9278 |
| ×2 | BSRN [25] | 332 K | 73.0 G | 38.10/0.9610 | 33.74/0.9193 | 32.24/0.9006 | 32.34/0.9303 |
| ×2 | SAFMN [19] | 228 K | 52.0 G | 38.00/0.9605 | 33.54/0.9177 | 32.16/0.8995 | 31.84/0.9256 |
| ×2 | DLSR [35] | 322 K | 68.0 G | 38.04/0.9606 | 33.67/0.9183 | 32.21/0.9002 | 32.26/0.9297 |
| ×2 | DRSAN [36] | 1190 K | 274.6 G | 38.14/0.9611 | 33.75/0.9188 | 32.25/0.9010 | 32.46/0.9317 |
| ×2 | HAFRN [37] | 496 K | - | 38.05/0.9606 | 33.66/0.9187 | 32.21/0.8999 | 32.20/0.9289 |
| ×2 | OSFFNet [38] | 516 K | 83.2 G | 38.11/0.9610 | 33.72/0.9190 | 32.29/0.9012 | 32.67/0.9331 |
| ×2 | HSRNet [39] | 1260 K | - | 38.07/0.9607 | 33.78/0.9197 | 32.26/0.9006 | 32.53/0.9320 |
| ×2 | DWCAN [40] | 401 K | - | 37.60/0.9598 | 33.33/0.9160 | 32.07/0.8987 | 31.95/0.9267 |
| ×2 | MSWSR [41] | 312 K | 243.3 G | 38.01/0.9610 | 33.71/0.9193 | 32.22/0.9003 | 32.29/0.9301 |
| ×2 | SRConvNet-L [42] | 885 K | 160 G | 38.14/0.9610 | 33.81/0.9199 | 32.28/0.9010 | 32.59/0.9321 |
| ×2 | MOION | 816 K | 163.74 G | 38.16/0.9611 | 33.92/0.9204 | 32.32/0.9014 | 32.69/0.9339 |
| ×3 | EDSR-baseline [15] | 1555 K | 160.2 G | 34.37/0.9270 | 30.28/0.8417 | 29.09/0.8052 | 28.15/0.8527 |
| ×3 | IMDN [23] | 703 K | 84.0 G | 34.36/0.9270 | 30.32/0.8417 | 29.09/0.8046 | 28.17/0.8519 |
| ×3 | RFDN [16] | 541 K | 42.2 G | 34.41/0.9273 | 30.34/0.8420 | 29.09/0.8050 | 28.21/0.8525 |
| ×3 | BSRN [25] | 340 K | 33.3 G | 34.46/0.9277 | 30.47/0.8449 | 29.18/0.8068 | 28.39/0.8567 |
| ×3 | SAFMN [19] | 233 K | 23.0 G | 34.34/0.9267 | 30.33/0.8418 | 29.08/0.8048 | 27.95/0.8474 |
| ×3 | DLSR [35] | 329 K | - | 34.49/0.9279 | 30.39/0.8428 | 29.13/0.8061 | 28.26/0.8548 |
| ×3 | DRSAN [36] | 1290 K | 133.4 G | 34.59/0.9286 | 30.42/0.8443 | 29.18/0.8079 | 28.52/0.8593 |
| ×3 | HAFRN [37] | 505 K | - | 34.45/0.9276 | 30.40/0.8433 | 29.12/0.8058 | 28.16/0.8528 |
| ×3 | OSFFNet [38] | 524 K | 37.8 G | 34.58/0.9287 | 30.48/0.8450 | 29.21/0.8080 | 28.49/0.8595 |
| ×3 | HSRNet [39] | - | - | 34.47/0.9278 | 30.40/0.8435 | 29.15/0.8066 | 28.42/0.8579 |
| ×3 | DWCAN [40] | 401 K | - | 34.29/0.9258 | 30.29/0.8410 | 29.00/0.8027 | 28.18/0.8521 |
| ×3 | MSWSR [41] | 307 K | 249.6 G | 34.40/0.9277 | 30.35/0.8437 | 29.12/0.8067 | 28.22/0.8548 |
| ×3 | SRConvNet-L [42] | 906 K | 74 G | 34.59/0.9288 | 30.50/0.8455 | 29.22/0.8081 | 28.56/0.8600 |
| ×3 | MOION | 825 K | 73.72 G | 34.69/0.9294 | 30.57/0.8467 | 29.24/0.8091 | 28.68/0.8629 |
| ×4 | EDSR-baseline [15] | 1518 K | 114.0 G | 32.09/0.8938 | 28.58/0.7813 | 27.57/0.7357 | 26.04/0.7849 |
| ×4 | IMDN [23] | 715 K | 48.0 G | 32.21/0.8948 | 28.58/0.7811 | 27.56/0.7353 | 26.04/0.7838 |
| ×4 | RFDN [16] | 550 K | 23.9 G | 32.24/0.8952 | 28.61/0.7819 | 27.57/0.7360 | 26.11/0.7858 |
| ×4 | BSRN [25] | 352 K | 19.4 G | 32.35/0.8966 | 28.73/0.7847 | 27.65/0.7387 | 26.27/0.7908 |
| ×4 | SAFMN [19] | 240 K | 14.0 G | 32.18/0.8948 | 28.60/0.7813 | 27.58/0.7359 | 25.97/0.7809 |
| ×4 | DLSR [35] | 338 K | 20 G | 32.33/0.8963 | 28.68/0.7832 | 27.61/0.7374 | 26.19/0.7892 |
| ×4 | DRSAN [36] | 1270 K | 88.7 G | 32.34/0.8960 | 28.65/0.7841 | 27.63/0.7390 | 26.33/0.7936 |
| ×4 | HAFRN [37] | 517 K | - | 32.24/0.8953 | 28.60/0.7816 | 27.58/0.7365 | 26.02/0.7849 |
| ×4 | OSFFNet [38] | 537 K | 22.0 G | 32.39/0.8976 | 28.75/0.7852 | 27.66/0.7393 | 26.36/0.7950 |
| ×4 | HSRNet [39] | 1285 K | - | 32.28/0.8960 | 28.68/0.7840 | 27.64/0.7388 | 26.28/0.7934 |
| ×4 | DWCAN [40] | 401 K | - | 32.20/0.8938 | 28.56/0.2809 | 27.41/0.7339 | 26.06/0.7851 |
| ×4 | MSWSR [41] | 316 K | 257.6 G | 32.26/0.8966 | 28.67/0.7843 | 27.62/0.7379 | 26.17/0.7896 |
| ×4 | SRConvNet-L [42] | 902 K | 45 G | 32.44/0.8976 | 28.77/0.7857 | 27.69/0.7402 | 26.47/0.7970 |
| ×4 | MOION | 837 K | 42.13 G | 32.51/0.8984 | 28.85/0.7874 | 27.72/0.7418 | 26.55/0.8005 |
Table 2. Comparison with Transformer network metrics for scale factors of ×2, ×3, and ×4. Bold is optimal, underlined is sub-optimal, and - indicates that the network was not tested for this condition.

| Scale | Method | Params | FLOPs | Set5 PSNR/SSIM | Set14 PSNR/SSIM | B100 PSNR/SSIM | Urban100 PSNR/SSIM |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ×2 | SwinIR-light [45] | 878 K | 195.6 G | 38.14/0.9611 | 33.86/0.9206 | 32.31/0.9012 | 32.76/0.9340 |
| ×2 | LBNet [46] | - | - | - | - | - | - |
| ×2 | ESRT [20] | 677 K | 191.4 G | 38.03/0.9600 | 33.75/0.9184 | 32.25/0.9001 | 32.58/0.9318 |
| ×2 | NGSwin [44] | 998 K | 140.4 G | 38.05/0.9610 | 33.79/0.9199 | 32.27/0.9008 | 32.53/0.9324 |
| ×2 | DRSAN [36] | 1190 K | 274.6 G | 38.14/0.9611 | 33.75/0.9188 | 32.25/0.9010 | 32.46/0.9317 |
| ×2 | CFIN [21] | 675 K | 116.9 G | 38.14/0.9610 | 33.80/0.9199 | 32.26/0.9006 | 32.48/0.9311 |
| ×2 | HCFormer [47] | 911 K | - | 38.06/0.9609 | 34.18/0.9253 | 32.45/0.9051 | 32.67/0.9359 |
| ×2 | MOION | 816 K | 163.74 G | 38.16/0.9611 | 33.92/0.9204 | 32.32/0.9014 | 32.69/0.9339 |
| ×3 | SwinIR-light [45] | 886 K | 87.2 G | 34.62/0.9289 | 30.54/0.8463 | 29.20/0.8082 | 28.66/0.8624 |
| ×3 | LBNet [46] | 736 K | 68.4 G | 34.47/0.9277 | 30.38/0.8417 | 29.13/0.8061 | 28.42/0.8559 |
| ×3 | ESRT [20] | 770 K | 96.4 G | 34.42/0.9268 | 30.43/0.8433 | 29.15/0.8063 | 28.46/0.8574 |
| ×3 | NGSwin [44] | 1007 K | 66.6 G | 34.52/0.9282 | 30.53/0.8456 | 29.19/0.8078 | 28.52/0.8603 |
| ×3 | DRSAN [36] | 1290 K | 133.4 G | 34.59/0.9286 | 30.42/0.8443 | 29.18/0.8079 | 28.52/0.8593 |
| ×3 | CFIN [21] | 681 K | 53.5 G | 34.65/0.9289 | 30.45/0.8443 | 29.18/0.8071 | 28.49/0.8583 |
| ×3 | HCFormer [47] | 923 K | - | 34.51/0.9279 | 30.55/0.8459 | 29.31/0.8104 | 28.56/0.8613 |
| ×3 | MOION | 825 K | 73.72 G | 34.69/0.9294 | 30.57/0.8467 | 29.24/0.8091 | 28.68/0.8629 |
| ×4 | SwinIR-light [45] | 897 K | 49.6 G | 32.44/0.8976 | 28.77/0.7858 | 27.69/0.7406 | 26.47/0.7980 |
| ×4 | LBNet [46] | 742 K | 38.9 G | 32.29/0.8960 | 28.68/0.7832 | 27.62/0.7382 | 26.27/0.7906 |
| ×4 | ESRT [20] | 751 K | 67.7 G | 32.19/0.8947 | 28.69/0.7833 | 27.69/0.7379 | 26.39/0.7962 |
| ×4 | NGSwin [44] | 1019 K | 36.4 G | 32.33/0.8963 | 28.78/0.7859 | 27.66/0.7396 | 26.45/0.7963 |
| ×4 | DRSAN [36] | 1270 K | 88.7 G | 32.34/0.8960 | 28.65/0.7841 | 27.63/0.7390 | 26.33/0.7936 |
| ×4 | CFIN [21] | 699 K | 31.2 G | 32.49/0.8985 | 28.74/0.7849 | 27.68/0.7396 | 26.39/0.7946 |
| ×4 | HCFormer [47] | 940 K | 58.7 G | 32.41/0.8976 | 28.84/0.7874 | 27.66/0.7413 | 26.51/0.7987 |
| ×4 | MOION | 837 K | 42.13 G | 32.51/0.8984 | 28.85/0.7874 | 27.72/0.7418 | 26.55/0.8005 |
Table 3. Impact of different modules on network performance. Bold is optimal; ✔ means the block is included, × means it is removed.

| Scale | WTConv-5 | CSOB | MSHIRB | Params | FLOPs | Urban100 PSNR/SSIM |
| --- | --- | --- | --- | --- | --- | --- |
| ×4 | × | × | × | 162 K | 8.79 G | 25.73/0.7734 |
| ×4 | ✔ | × | × | 200 K | 9.64 G | 25.85/0.7770 |
| ×4 | × | ✔ | × | 194 K | 10.30 G | 25.84/0.7771 |
| ×4 | × | × | ✔ | 175 K | 9.43 G | 25.82/0.7758 |
| ×4 | ✔ | ✔ | × | 231 K | 11.14 G | 25.90/0.7797 |
| ×4 | × | ✔ | ✔ | 207 K | 10.93 G | 25.92/0.7806 |
| ×4 | ✔ | × | ✔ | 213 K | 10.27 G | 25.88/0.7789 |
| ×4 | ✔ | ✔ | ✔ | 244 K | 11.78 G | 25.94/0.7810 |
Table 4. Effect of multiplicity sampling and multi-branch feature extraction on performance. Bold is optimal; ✔ means the component is included, × means it is removed.

| Scale | Multiplicity Sampling (MS) | MBFEB | Params | FLOPs | Urban100 PSNR/SSIM |
| --- | --- | --- | --- | --- | --- |
| ×4 | × | × | 162 K | 8.79 G | 25.73/0.7734 |
| ×4 | × | ✔ | 166 K | 8.89 G | 25.80/0.7757 |
| ×4 | ✔ | × | 172 K | 9.34 G | 25.75/0.7742 |
| ×4 | ✔ | ✔ | 175 K | 9.43 G | 25.82/0.7758 |
Table 5. The impact of the dual-branch design on performance in SCHIEB. Bold is optimal.

| Scale | Branch Name | Params | FLOPs | Urban100 PSNR/SSIM |
| --- | --- | --- | --- | --- |
| ×4 | SCB | 195 K | 9.21 G | 25.81/0.7765 |
| ×4 | AB | 121 K | 6.39 G | 25.61/0.7692 |
| ×4 | Dual-Branch | 200 K | 9.64 G | 25.85/0.7770 |
Table 6. The influence of different convolutional kernel sizes on performance in MBFEB. Bold is optimal.

| Scale | Combination | Params | FLOPs | Urban100 PSNR/SSIM |
| --- | --- | --- | --- | --- |
| ×4 | 3-3-3 | 164 K | 8.83 G | 25.70/0.7724 |
| ×4 | 3-5-5 | 164 K | 8.84 G | 25.72/0.7725 |
| ×4 | 3-7-7 | 164 K | 8.84 G | 25.73/0.7733 |
| ×4 | 5-3-3 | 166 K | 8.87 G | 25.75/0.7735 |
| ×4 | 5-5-5 | 166 K | 8.88 G | 25.77/0.7747 |
| ×4 | 5-7-7 | 166 K | 8.89 G | 25.80/0.7757 |
