3.2.1. Quantitative Results
The proposed FS2R network is compared with eleven typical image super-resolution reconstruction networks, namely VDSR [39], DRCN [40], EDSR-baseline [10], CARN [41], IMDN [16], MADNet [17], SwinIR-light [12], RDN [11], CPAT [14], DRCT [15] and FIWHN [18], using objective evaluation metrics.
Table 1, Table 2 and Table 3 show the quantitative comparisons of the super-resolution reconstruction results for scaling factors of ×2, ×3 and ×4, respectively. The best, second-best and third-best results in each table are indicated in bold, underline and double underline, respectively. It can be observed that our FS2R network performs favorably on most datasets, particularly surpassing the majority of models in terms of the Structural Similarity Index Measure (SSIM).
SSIM evaluates the similarity between images based on three relatively independent components: luminance, contrast and structure. The improved performance of our model in this regard implies that it better captures and restores essential structural details, such as edges and textures, producing results that are more natural, more visually pleasing and better aligned with human visual perception.
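For reference, SSIM combines these three components in its widely used single-expression form:

\[
\mathrm{SSIM}(x,y)=\frac{(2\mu_x\mu_y+C_1)(2\sigma_{xy}+C_2)}{(\mu_x^2+\mu_y^2+C_1)(\sigma_x^2+\sigma_y^2+C_2)},
\]

where \(\mu_x,\mu_y\) are local means, \(\sigma_x^2,\sigma_y^2\) local variances, \(\sigma_{xy}\) the local covariance, and \(C_1,C_2\) small constants that stabilize the division.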
As shown in Table 1, FS2R achieves better objective evaluation metrics on the five benchmark test sets for the ×2 reconstruction task. Compared with performance-oriented models (RDN, CPAT and DRCT), the number of parameters of FS2R is reduced by 58.45%, 55.17% and 35.31%, respectively, while its SSIM on BSD100 is 0.9067, higher than that of RDN (0.9017), CPAT (0.9056) and DRCT (0.9051). Compared with other lightweight models (CARN, IMDN and FIWHN), the SSIM of FS2R-L on BSD100 is higher by 0.97%, 0.77% and 0.64%, respectively.
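Throughout this subsection we read such percentage gains as relative improvements, an assumption consistent with the Manga109 comparison reported later (where 0.9304 versus 0.9183 is quoted as 1.3%):

\[
\Delta_{\mathrm{rel}}=\frac{\mathrm{SSIM}_{\mathrm{ours}}-\mathrm{SSIM}_{\mathrm{ref}}}{\mathrm{SSIM}_{\mathrm{ref}}}\times 100\%,\qquad
\text{e.g.}\ \frac{0.9067-0.9017}{0.9017}\times 100\%\approx 0.55\%\ \text{over RDN on BSD100.}
\]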
As shown in Table 2, FS2R achieves better objective evaluation metrics on the five benchmark test sets for the ×3 reconstruction task. Compared with performance-oriented models (RDN, CPAT and DRCT), the SSIM of FS2R on BSD100 is 0.8342, higher than that of RDN (0.8093), CPAT (0.8174) and DRCT (0.8182). Compared with other lightweight models (CARN, IMDN and FIWHN), the SSIM of FS2R-L on BSD100 is higher by 3.81%, 3.65% and 3.26%, respectively.
As shown in Table 3, for the ×4 reconstruction task, the SSIM of FS2R on BSD100 is 0.7533, higher than that of RDN (0.7419), CPAT (0.7527) and DRCT (0.7532). Compared with the lightweight models CARN, IMDN and FIWHN, the SSIM of FS2R-L on BSD100 is higher by 2.48%, 2.42% and 1.77%, respectively.
The perceptual metrics of the images reconstructed by FS2R were compared against those of typical models, with the results presented in Table 4. LPIPS (Learned Perceptual Image Patch Similarity) [42] aligns more closely with human perception than traditional metrics such as PSNR and SSIM: the lower the LPIPS value, the more similar two images are, whereas a higher value signifies greater differences.
FS2R performs well on most datasets, achieving the best results on Set14 and Urban100 (0.1069 and 0.0121), confirming that the images reconstructed by FS2R are better suited to human perception.
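For context, LPIPS is typically measured with the reference implementation of Zhang et al. (the lpips package); the sketch below illustrates the usual protocol, with file names as placeholders rather than the actual test images.

```python
# Minimal LPIPS measurement sketch (pip install lpips).
import lpips
import torch
import numpy as np
from PIL import Image

loss_fn = lpips.LPIPS(net='alex')  # AlexNet backbone, the common default

def to_tensor(path):
    # LPIPS expects NCHW float tensors scaled to [-1, 1].
    img = np.asarray(Image.open(path).convert('RGB'), dtype=np.float32) / 255.0
    t = torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0)
    return t * 2.0 - 1.0

hr = to_tensor('hr.png')   # ground-truth image (placeholder path)
sr = to_tensor('sr.png')   # reconstructed image (placeholder path)
with torch.no_grad():
    d = loss_fn(hr, sr)    # lower = perceptually more similar
print(f'LPIPS = {d.item():.4f}')
```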
Different test sets have different data distributions. The BSD100 [36] test set mainly contains images of natural landscapes, animals, plants and architecture, with relatively rich textures but clear structures. FS2R achieves better objective evaluation metrics on BSD100, indicating that its model structure and the network weights obtained through training are well matched to the data distribution of BSD100; in particular, FS2R achieves the highest SSIM scores on this dataset at every magnification factor. For detailed objective metric comparisons, please refer to Table 1, Table 2 and Table 3.
Although the Set5 [34] and Set14 [35] test sets contain few images, those images contain large numbers of repetitive patterns, sharp edges and smooth areas. The Urban100 [37] test set contains a large number of urban architectural images with regular, dense geometric structures and long-range continuous edges. Manga109 [38] mainly consists of anime images with large areas of solid color, clear lines and minimal natural noise. Compared with FS2R, performance-oriented models such as DRCT [15] and CPAT [14] model long-range dependencies better, maintain the coherence of lines and the purity of solid colors better, and perform more stably across these test data distributions. For example, at the ×4 magnification factor on Manga109 [38], the SSIM of FS2R is 0.9183, while under the same conditions the SSIM of DRCT [15] is 0.9304 (1.3% higher) and that of CPAT [14] is 0.9309 (1.4% higher). This is one of the reasons why the SSIM of FS2R is slightly lower than that of performance-oriented models on test sets other than BSD100.
Performance-oriented models such as DRCT [15] and CPAT [14] have more parameters and more complex nonlinear transformations, enabling them to learn richer and more detailed feature representations. They also include attention mechanisms, which allow the model to adaptively focus on the more important regions and allocate more computation to reconstructing them. In contrast, FS2R and FS2R-L are designed with engineering applications in mind and seek a balance between model size and inference efficiency without adding complex attention structures. Our models aim to achieve good texture recovery and high perceptual quality (as shown in Table 4) through relatively simple network architectures. However, when dealing with complex structures that demand extremely high precision and coordination, their capabilities are slightly inferior to those of performance-oriented models, and the reconstruction results may contain minor misalignments or blurriness. These artifacts are detected by the structural similarity index, directly resulting in slightly lower SSIM on test sets other than BSD100 [36].
3.2.2. Subjective Evaluation
We carried out image reconstruction experiments at different scaling factors on the benchmark datasets. Figure 9 and Figure 10 present ×4 visual comparisons on the common test datasets. For img_36 from BSD100 [36] and img_88 from Urban100 [37], FS2R demonstrates superior grid-structure recovery compared to other methods, confirming its effectiveness. As depicted in Figure 9 and Figure 10, FS2R's local reconstruction of BSD100_img_36 and Urban100_img_88 achieves results on par with or even surpassing those of high-performance methods (e.g., RDN) and lightweight methods (e.g., FSRCNN, VDSR, EDSR-baseline, CARN, IMDN and FIWHN). FS2R effectively restores edges and textures, making details clearly visible. Notably, in the reconstruction of img_88, FS2R accurately recovers the textures of the architectural structures. This visual comparison further underscores that FS2R reaches an advanced level of performance, meeting the needs of practical engineering applications.
As shown in Table 1, Table 2 and Table 3, FS2R attains higher SSIM than other models, yet exhibits relatively lower PSNR. By analyzing the definitions of SSIM and PSNR and examining the reconstructed images in depth, we conclude that FS2R focuses more on restoring the structural information of images than on achieving absolute pixel-value matching. Prior work has demonstrated [23] that, in image super-resolution reconstruction, a higher PSNR does not necessarily equate to better reconstruction quality: some reconstructed images have high PSNR values, but their overly smooth details result in a worse visual impression. We performed edge extraction on the reconstructed images, with the results displayed in Figure 11. The reconstructed images of FS2R possess richer edge information than those of other models. At the pixel level, these restored details may not be entirely consistent with the original image, which increases the mean squared error (MSE) and lowers the PSNR.
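Since \(\mathrm{PSNR}=10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})\), any pixel-level deviation in recovered detail raises MSE and lowers PSNR even when structure improves. The specific detector behind Figure 11 is not restated here; the sketch below uses the Canny detector as one common choice, with placeholder file names.

```python
# Edge-extraction sketch for comparing reconstructions, as in Figure 11.
import cv2

sr = cv2.imread('sr.png', cv2.IMREAD_GRAYSCALE)   # reconstructed image
hr = cv2.imread('hr.png', cv2.IMREAD_GRAYSCALE)   # ground truth

# Canny with typical hysteresis thresholds; a richer edge map indicates
# more recovered high-frequency structure.
edges_sr = cv2.Canny(sr, 100, 200)
edges_hr = cv2.Canny(hr, 100, 200)

cv2.imwrite('edges_sr.png', edges_sr)
cv2.imwrite('edges_hr.png', edges_hr)
```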
RepGhost [21], which is mainly used for classification tasks, uses batch normalization layers to enhance feature expression and, through reparameterization, merges multiple branches into a single convolution to boost inference speed. FS2R, by contrast, targets image reconstruction and replaces the BN layers with 1 × 1 convolutions, preventing BN from damaging image contrast information in super-resolution tasks. Compared to RepVGG [20], which is mainly used for image classification, both networks use reparameterization, but FS2R also incorporates the lightweight idea of low-cost redundant-feature generation, exploring the issue of feature redundancy in neural networks. Compared to GhostSR [26], although both are image super-resolution reconstruction networks, GhostSR only uses feature-reuse techniques, whereas FS2R further compresses the model through structural reparameterization. FS2R-L introduces a bypass branch structure in place of dense connections, further reducing information redundancy and parameter growth. The proposed FS2R model is thus a comprehensive improvement on existing lightweight networks, retaining the advantages of advanced models while further exploring the balance between model size and inference efficiency in engineering applications.
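The fusion principle underlying this reparameterization can be sketched as follows; this is a minimal illustration of folding a parallel 3 × 3 conv + 1 × 1 conv pair into one 3 × 3 conv for inference, not the exact FS2R block, which is defined earlier in the paper.

```python
# Structural reparameterization sketch: two parallel branches at training
# time collapse to a single 3x3 convolution at inference time.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RepBranchPair(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv3 = nn.Conv2d(ch, ch, 3, padding=1, bias=True)
        self.conv1 = nn.Conv2d(ch, ch, 1, bias=True)  # low-cost branch

    def forward(self, x):                 # training-time form
        return self.conv3(x) + self.conv1(x)

    def fuse(self):                       # inference-time form
        fused = nn.Conv2d(self.conv3.in_channels, self.conv3.out_channels,
                          3, padding=1, bias=True)
        # Zero-pad the 1x1 kernel to 3x3, then add kernels and biases
        # (valid because convolution is linear in its weights).
        w1 = F.pad(self.conv1.weight, [1, 1, 1, 1])
        fused.weight.data = self.conv3.weight.data + w1
        fused.bias.data = self.conv3.bias.data + self.conv1.bias.data
        return fused

# Sanity check: the fused conv reproduces the two-branch output.
m = RepBranchPair(8).eval()
x = torch.randn(1, 8, 16, 16)
assert torch.allclose(m(x), m.fuse()(x), atol=1e-5)
```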
3.2.3. Ablation Study
A series of ablation experiments was designed to evaluate the effectiveness of the layers and modules in our model. The network was trained on images of size 64 × 64 and updated with the Adam optimizer, with an initial learning rate of 10⁻⁴. The learning rate was reduced to 10⁻⁵ after 750 training epochs and to 10⁻⁶ after 900 epochs, and training was terminated after 1000 epochs. The ×4 super-resolution reconstruction performance of FS2R was evaluated on three benchmark datasets: Set14 [35], BSD100 [36] and Urban100 [37]. The experimental results are documented in Table 5. The experiments reveal that the FS2R model achieves optimal ×4 performance when employing 16 FS-Blocks, each comprising 8 LR-Layers.
Table 6 shows that FS2R-L achieves optimal ×4 performance when employing 12 LFS-Blocks, each comprising 8 LR-Layers.
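For reproducibility, the training schedule used in these ablations can be expressed as the following minimal PyTorch sketch; the model and the per-epoch training loop body are placeholders for the actual pipeline.

```python
# Adam with lr 1e-4 -> 1e-5 after 750 epochs -> 1e-6 after 900, stop at 1000.
import torch

model = torch.nn.Conv2d(3, 3, 3, padding=1)       # stand-in for FS2R
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[750, 900], gamma=0.1)  # decay by 10x at each milestone

for epoch in range(1000):
    # train_one_epoch(model, optimizer)           # actual training omitted
    optimizer.step()                              # placeholder update
    scheduler.step()                              # applies the decay schedule
```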
To further validate the contribution of the redundant features generated by various low-cost operations and to demonstrate the merit of the structure designed in this paper, we performed substitution tests on the low-cost operation in the lightweight layer, using the FS2R model as the basis. We compared identity mapping, batch normalization and 1 × 1 convolution (as proposed in this paper). The results indicate that the lightweight layer structure designed in this paper obtains superior results in both objective metrics and subjective evaluation; the metric statistics are shown in Table 7.
Under the same model structure, employing 1 × 1 convolution as the low-cost operation improves the objective evaluation metrics by 2.72% on average compared with the batch normalization (BN) variant, and by 3.47% on average compared with identity mapping. These ablation experiments demonstrate that the model designed in this paper offers structural innovation and performance advantages in image super-resolution reconstruction.
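The three variants compared in Table 7 can be sketched as interchangeable branches of the lightweight layer; the helper below is illustrative, not the exact FS2R implementation.

```python
# The three low-cost operations substituted in the ablation of Table 7.
import torch.nn as nn

def make_lowcost_branch(kind: str, ch: int) -> nn.Module:
    if kind == 'identity':
        return nn.Identity()                     # identity mapping
    if kind == 'bn':
        return nn.BatchNorm2d(ch)                # batch normalization
    if kind == 'conv1x1':
        return nn.Conv2d(ch, ch, 1, bias=True)   # ours: 1x1 convolution
    raise ValueError(f'unknown branch kind: {kind}')
```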
3.2.4. Inference Time
In engineering applications, beyond the performance of a neural network model, its inference time is also an important metric. We selected representative networks and conducted comparative reconstruction-speed experiments on the BSD100 dataset (×4) using a high-performance GPU (NVIDIA GeForce RTX 3090). The test LR images were of size 64 × 64. An initial model warm-up was performed, as the first inference may include network loading time. Each model was then run 10 times and the average inference time was recorded, as shown in Table 8.
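The timing protocol just described (warm-up, then average over 10 runs) can be sketched as follows; the model argument is a placeholder, and CUDA events provide GPU-accurate timings.

```python
# Average GPU inference time over repeated runs, after a warm-up pass.
import torch

def avg_inference_ms(model, runs=10):
    x = torch.randn(1, 3, 64, 64, device='cuda')  # 64x64 LR input
    model = model.eval().cuda()
    with torch.no_grad():
        model(x)                                  # warm-up (excludes load time)
        torch.cuda.synchronize()
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(runs):
            model(x)
        end.record()
        torch.cuda.synchronize()
    return start.elapsed_time(end) / runs         # milliseconds per run
```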
While ensuring high-quality image super-resolution reconstruction, our model achieves a better balance between performance and lightweight design. Compared with high-performance super-resolution networks on the BSD100 dataset at scale ×4, FS2R obtains similar objective evaluation metrics with fewer parameters and even surpasses several representative algorithms on specific metrics. Compared with the advanced DRCT, FS2R reduces the parameter count by 35% while improving SSIM by 0.013%; compared with RDN, it reduces the parameter count by 58.5% and improves SSIM by 1.54%. Compared with advanced lightweight networks, our model achieves improved metrics on certain datasets: its SSIM is 1.8% higher than that of FIWHN and 1.71% higher than that of SwinIR-light. An intuitive comparison is shown in Figure 12.
To further validate the performance of FS2R in engineering applications, we conducted edge-hardware inference tests on the Jetson Nano B01, with the results shown in Table 9. The Jetson Nano B01 is an embedded edge-computing AI development kit from NVIDIA, equipped with a quad-core ARM Cortex-A57 processor and a 128-core NVIDIA Maxwell GPU. Designed specifically for edge computing, it can process and analyze data close to the data source. Its compact size, low power consumption and strong computing performance make it well suited to deployment in a variety of edge application scenarios.