Article

ResShift-4E: Improved Diffusion Model for Super-Resolution with Microscopy Images

Depeng Gao, Ying Gong, Jingzhuo Cao, Bingshu Wang, Han Zhang, Jiangkai Dong and Jianlin Qiu
1 School of Yonyou Digital and Intelligence, Nantong Institute of Technology, Nantong 226000, China
2 Faculty of Science and Technology, University of Macau, Macau 999078, China
3 School of Software, Northwestern Polytechnical University, Xi’an 710060, China
4 School of Artificial Intelligence, Optics and Electronics, Northwestern Polytechnical University, Xi’an 710060, China
5 Elefante AI Solution Company, Yuhang District, Hangzhou 311100, China
* Author to whom correspondence should be addressed.
Electronics 2025, 14(3), 479; https://doi.org/10.3390/electronics14030479
Submission received: 30 December 2024 / Revised: 20 January 2025 / Accepted: 22 January 2025 / Published: 24 January 2025
(This article belongs to the Special Issue Artificial Intelligence in Vision Modelling)

Abstract: Blind super-resolution algorithms based on diffusion models still face significant challenges, including high computational cost, long inference time, and limited cross-domain generalization. This paper applies super-resolution algorithms to optical microscopy imaging to reveal more microscopic structures and details. First, we propose a lightweight super-resolution model called ResShift-4E, which optimizes ResShift in two important respects: reducing the number of diffusion steps and strengthening the influence of the original residuals on model learning. Second, we construct a dataset of Multimodal High-resolution Microscopy Images (MHMI) containing 1220 images, which is available online. Moreover, we extend our model to application-oriented research on blind image super-resolution of optical microscopy imaging. Experimental results demonstrate that our ResShift-4E model outperforms other models on various microscopy images.

1. Introduction

Super-resolution reconstruction is an important computer-vision task that aims to reconstruct an input low-resolution image into a high-resolution image with more texture and detail [1,2,3]. Super-resolution techniques therefore play an indispensable role in tasks such as microscopy imaging, medical imaging, image segmentation, image classification, pedestrian re-identification, and geographic remote sensing. In real-world environments, however, the downsampling method assumed by the model is often inconsistent with the real image degradation, which leads to reconstruction bias and poor quality of the recovered image [4,5].
Most existing methods assume a predefined degradation process (e.g., bicubic downsampling) from the high-resolution image to the low-resolution image, which is difficult to apply to real images with unknown, complex degradation types [6,7,8]. Blind image super-resolution based on diffusion models instead aims to learn a more realistic degradation model and use it for reconstruction, recovering high-frequency information such as texture, morphology, and intrinsic semantics [9]. Nevertheless, the blur in real low-resolution images often differs from the model’s prediction, which leads to poor-quality reconstructed high-resolution images and inadequate recovery of high-frequency information. How to reconstruct images effectively, i.e., how to accurately extract blur information by mining the latent correlations in pairs of sharp and blurred images, has therefore become a key research direction in blind image super-resolution [10,11,12].
For microscopy images specifically, high-resolution microscopy techniques, such as photoactivated localization microscopy, stimulated emission depletion microscopy, and structured illumination microscopy, provide unique access to the inner workings of cells and a wide variety of biological processes [13]. However, these methods typically rely on complex optical setups, specific fluorescent markers, and sample preparation procedures, and require extensive computational post-processing [14]. Accurate image enhancement and super-resolution are therefore essential for acquiring and analyzing further details in microscopy images. Although microscopy images suffer less from acquisition instability than ordinary images, their super-resolution is still restricted by the limited texture and detail present in the images.
To address these issues, this paper explores an effective way to implement super-resolution reconstruction using diffusion models while achieving a balance between the computational cost of the model and the quality of the output image. We combine a blur-kernel prediction algorithm with the diffusion model to achieve good performance in single-image super-resolution. Our primary contributions are as follows:
(1) We propose the ResShift-4E model, which is more lightweight and roughly twice as computationally efficient as the original ResShift model.
(2) We construct the Multimodal High-resolution Microscopy Images (MHMI) dataset, which contains 1220 images captured with four different microscopy techniques.
(3) We apply the modified ResShift-4E model to biological image analysis. ResShift-4E is trained on the MHMI dataset and compared with other models; the results demonstrate its advantage over them across various types of microscopy images.

2. Related Works

2.1. Super-Resolution Image Datasets

Researchers use common datasets to present quantitative and visual comparisons of super-resolution models. In the field of blind super-resolution, widely used training datasets include DIV2K [15], Flickr2K [16], OutdoorSceneTraining [17], and 2020Track [18], all of which contain more than 1000 high-resolution images. Although harder to acquire, real image datasets are better suited to super-resolution reconstruction. Cai et al. [19] first developed a real image dataset, RealSR, with 595 LR–HR image pairs of the same scenes captured at different focal lengths and scales, accomplishing accurate alignment. Wei et al. [20] developed a larger dataset called DRealSR, which includes 840 image pairs. Traditional test sets include Set5 [21], Set14 [22], BSD100 [23], Urban100 [24], Manga109 [25], 2020Track1 [18], and 2020Track2 [18], with 5, 14, 100, 100, 109, 100, and 100 images, respectively. Meanwhile, test sets such as RealSR, OST300 [26], DPED [27], and ADE20K [28] are more frequently used for real image super-resolution. In summary, open microscopy image datasets for training are still scarce, so it is necessary to build a new dataset for the blind super-resolution task.

2.2. Super-Resolution Methods Based on Diffusion Models

With the widespread adoption of Denoising Diffusion Probabilistic Models (DDPM) [29], diffusion models have been progressively simplified and have achieved great success in image generation. A common approach is to insert LR images into the input of an existing diffusion model (e.g., DDPM) and retrain the model from scratch on the training data. Ramesh et al. [30] first applied diffusion models efficiently to the multimodal domain by learning a low-resolution model and multiple super-resolution diffusion models to save computation. Latent Diffusion Models (LDMs) [31] creatively learn feature distributions in a latent space rather than modeling images directly, greatly reducing the training cost of diffusion models. Sahak et al. [32] proposed a diffusion-based blind super-resolution model, SR3+, which combines self-supervised training with noise-modulation augmentation during training and testing.
Another approach uses an unconditionally pretrained diffusion model and exploits its reverse path to generate the expected HR image [33,34,35,36,37,38]. ResShift [33] limits the diffusion process from an indeterminate number of steps, often hundreds, to 15 steps. It also proposes a noise schedule that effectively controls the noise intensity and transition speed during diffusion, improving inference efficiency. Gu et al. [34] took an alternative approach to high-resolution image and video synthesis by integrating low-resolution diffusion into high-resolution generation, demonstrating strong zero-shot generalization to untrained resolutions. Li et al. [35] converted Gaussian noise into the SR prediction through a Markov chain, effectively alleviating overly smooth outputs, mode collapse, and large model footprints. DDIM [36] generalizes DDPM through a class of non-Markovian diffusion processes, yielding implicit models that generate high-quality samples faster and greatly improving the iteration speed of DDPM.
In summary, lightweight models and guided super-resolution reconstruction are the current mainstream research directions for diffusion models. These techniques are worth pursuing to achieve a good trade-off between computation and output image quality.

3. The Proposed Method

We propose a diffusion-based image super-resolution model, ResShift-4E, which magnifies the input low-resolution (LR) image by a factor of 4 to output an HR image. It has clear advantages over the ResShift-15 model [33]: it significantly reduces hardware requirements and runtime while maintaining a high level of super-resolution quality.
Figure 1 shows an overview of our ResShift-4E model. It reduces the diffusion process of ResShift to 4 steps and constructs a loss function $\mathcal{L}_{contrast}$ that focuses on the original residuals by adding a residual connection between the initial LR image and the target HR image. The rationale is that the diffusion model can then better learn the objective difference between the LR and HR images, which limits the diffusion model’s overly powerful random generation ability. This is expected to improve the objective evaluation metrics while keeping the subjective evaluation metrics stable.
In Figure 1, $x_T$ denotes the initial state of the model and $\hat{x}_0$ denotes its inference result, distributed as $\mathcal{N}(y, \kappa^2 I)$ and $p_\theta(x \mid x_T, y)$, respectively, where $y$ is an intermediate state in the Markov chain and may represent a feature map, a specific parameter, or a set of transformed image features, depending on the model architecture. During the forward diffusion process from $\hat{x}_0$ to $x_T$, the model learns the unknown image degradation by random Gaussian sampling with the sampling parameters $\mu_\theta(x_t, y, t)$ and $\epsilon$. The parameter $\epsilon$, which governs the randomness, follows the distribution $\epsilon_\theta(x_t, y, t)$.
In contrast with traditional diffusion models that generate images from scratch, the ResShift model uses a degraded image as the initial input, embedding it in the initial state. We follow the diffusion structure of ResShift and reduce the number of diffusion steps. A contrast loss function is added at the end of the forward diffusion process to strengthen the constraints on random sampling.
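To make the 4-step inference concrete, the sketch below shows what such a reduced-step reverse pass could look like in PyTorch. It is a minimal illustration, not the authors’ released implementation: the denoiser `model`, the shift schedule `eta`, and the noise level `kappa` are assumed stand-ins for the corresponding ResShift components.

```python
import torch

@torch.no_grad()
def resshift4e_sample(model, y, eta, kappa=2.0, T=4):
    """Illustrative 4-step reverse sampling in the style of ResShift.

    y     : bicubically upsampled LR input, shape (B, C, H, W)
    eta   : shift schedule tensor of length T + 1, with eta[0] ~ 0 and eta[T] = 1
    kappa : scalar noise level (the default value here is an assumption)
    model : network predicting the clean image x0 from (x_t, y, t)
    """
    # Initial state x_T ~ N(y, kappa^2 * I): the degraded input plus Gaussian noise
    x = y + kappa * torch.randn_like(y)
    for t in range(T, 0, -1):
        x0_hat = model(x, y, t)        # predicted HR image at this step
        e0_hat = y - x0_hat            # predicted original residual
        # Shift partway back along the residual direction
        x = x0_hat + eta[t - 1] * e0_hat
        if t > 1:                      # no noise is added at the final step
            alpha_t = eta[t] - eta[t - 1]
            var = kappa ** 2 * (eta[t - 1] / eta[t]) * alpha_t
            x = x + var.sqrt() * torch.randn_like(x)
    return x
```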
In the original ResShift model, the forward diffusion process is defined as

$$q(x_t \mid x_{t-1}, y_0) = \mathcal{N}\left(x_t;\ x_{t-1} + \alpha_t e_0,\ \kappa^2 \alpha_t I\right), \quad t = 1, 2, \ldots, T \qquad (1)$$
In ResShift-4E, the Markov chain transition for the forward process is modified as follows:

$$q(x_t \mid x_{t-1}, y_0) = \mathcal{N}\left(x_t;\ x_{t-1} + \alpha_t e_0 + (1 - \eta_t) e_0,\ \kappa^2 \alpha_t I\right), \quad t = 1, 2, \ldots, T \qquad (2)$$
The newly added transition term $(1 - \eta_t) e_0$ is independent of the random noise and decreases as the diffusion step increases. Each diffusion step thus adds an offset towards the original residual, not just the residual from the HR image at that step, and the effect of this offset gradually decays as diffusion progresses, ensuring that the simulated noise does not overfit. Similarly, to strengthen the influence of the original residual features on the diffusion model, a contrast loss function $\mathcal{L}_{contrast}$ is added. It is defined as follows:
$$\mathcal{L}_{contrast}(y, x_0, x_t) = \frac{1}{N} \sum_{n=1}^{N} \left[ \beta\, D_W^2(x_0, x_t) + (1 - \beta) \max\left(m - D_W(y, x_t),\ 0\right)^2 \right] \qquad (3)$$

$$D_W(x, y) = \lVert x - y \rVert_2 = \left( \sum_{i=1}^{P} (x_i - y_i)^2 \right)^{\frac{1}{2}}$$
$D_W$ denotes the Euclidean distance between two samples; $P$ denotes the feature dimension of a sample; $\beta$ weights the comparisons with the HR image and the LR image, respectively, at the current diffusion step; $m$ is the LR comparison margin; and $N$ is the number of samples. Specifically, $\beta$ controls the regularization strength to prevent overfitting, while $m$ determines the degree of smoothness in the image processing pipeline. Of course, the optimization of the model weights is still mainly determined by the diffusion formulation of ResShift, which minimizes its negative variational lower bound, i.e.,
$$\min_\theta \sum_t D_{KL}\left[ q(x_{t-1} \mid x_0, x_t, y_0) \,\|\, p_\theta(x_{t-1} \mid x_t, y_0) \right] \qquad (4)$$
where $D_{KL}[\cdot \,\|\, \cdot]$ denotes the Kullback–Leibler (KL) divergence; further mathematical details can be found in Sohl-Dickstein et al. [39] or Ho et al. [29]. Combining Equations (2) and (3) yields the new loss function we define, which is used to update the model weights.
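To illustrate how these pieces could fit together in training, the following PyTorch sketch implements Eq. (3) directly and approximates the variational term of Eq. (4) by the mean-prediction MSE it reduces to for Gaussian transitions. All defaults (`beta`, `m`, `lambda_c`) are illustrative assumptions; the paper does not report how the two loss terms are weighted.

```python
import torch
import torch.nn.functional as F

def contrast_loss(y, x0, xt, beta=0.5, m=1.0):
    """Sketch of Eq. (3). y: upsampled LR inputs; x0: HR targets;
    xt: model outputs; all of shape (N, C, H, W)."""
    d_hr = (x0 - xt).flatten(1).norm(dim=1)   # D_W(x0, x_t) per sample
    d_lr = (y - xt).flatten(1).norm(dim=1)    # D_W(y, x_t) per sample
    # Pull x_t toward the HR target; push it at least margin m away from the LR input
    per_sample = beta * d_hr.pow(2) + (1 - beta) * F.relu(m - d_lr).pow(2)
    return per_sample.mean()                  # the 1/N average in Eq. (3)

def forward_step(x_prev, e0, t, eta, alpha, kappa):
    """One forward transition of Eq. (2): the ResShift shift alpha_t * e0 plus
    the decaying offset (1 - eta_t) * e0, with noise variance kappa^2 * alpha_t."""
    noise = kappa * alpha[t].sqrt() * torch.randn_like(x_prev)
    return x_prev + alpha[t] * e0 + (1 - eta[t]) * e0 + noise

def training_loss(model, x0, y, t, eta, alpha, kappa, lambda_c=0.1):
    """Combined objective: MSE stand-in for the variational bound of Eq. (4)
    plus the contrast loss of Eq. (3), weighted by the assumed lambda_c."""
    e0 = y - x0                               # original residual
    xt = x0
    for s in range(1, t + 1):                 # simulate t forward steps of Eq. (2)
        xt = forward_step(xt, e0, s, eta, alpha, kappa)
    x0_hat = model(xt, y, t)
    return F.mse_loss(x0_hat, x0) + lambda_c * contrast_loss(y, x0, x0_hat)
```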

4. Experiments

In this section, we first introduce our Multimodal High-resolution Microscopy Images (MHMI) dataset. Then, our method is compared with BSRGAN [40], Real-ESRGAN [41], and other models; the experiments on the MHMI dataset are evaluated in terms of subjective and objective metrics. Finally, the strengths, weaknesses, and practical prospects of this application are discussed.

4.1. The MHMI Dataset

In this paper, we collected and constructed a multimodal, purely optical microscopy image dataset called Multimodal High-resolution Microscopy Images (MHMI), with a pixel scale of 0.5 μm per pixel. Advanced optical microscopes drive modern biology, which necessitates dedicated image processing algorithms to enhance microscopy images. MHMI was assembled with the help of multiple contributors. It consists of 1220 images captured by four microscopy techniques: differential interference contrast (DIC), fluorescence, phase contrast, and brightfield. We believe this dataset can support research on deep-learning-based microscopy image processing. It can be accessed at https://github.com/JimmCao/ResShift-4E (accessed on 21 January 2025).
The MHMI dataset comes from three main sources: (a) 240 ground-truth images selected from the Fluorescence Microscopy Denoising (FMD) [42] dataset; (b) 920 labeled images selected from the NeurIPS22-CellSeg (2022 Weakly Supervised Cell Segmentation Images) [43] dataset; and (c) 60 real-captured fluorescence microscopy images acquired from laboratories specializing in optical microscopy.
As shown in Figure 2, we group the images by optical microscopy type and visualize the four kinds of microscopy images after filtering and cropping to show their main features. Among them, the fluorescence and phase-contrast images are mainly 512 × 512 pixels, the differential interference contrast images are mainly 1024 × 1024 pixels, and the brightfield images are mainly 2048 × 1536 pixels. We divide the MHMI dataset into a training set, a validation set, and a test set of 1000, 120, and 100 images, respectively.
As shown in Figure 3, we downsample the high-resolution images in the dataset by a factor of four using bicubic interpolation and add random Gaussian blur, with the blur parameter drawn from the interval [0.45, 2], whose two values are the minimum and maximum of the blur range. To facilitate visual comparison and training, we uniformly crop the high-resolution images of the validation and test sets to 512 × 512 pixels; correspondingly, the low-resolution (LR) images are 128 × 128 pixels. Metrics such as PSNR, LPIPS, and NIQE are computed by comparing the high-resolution (HR) output with the ground truth (GT).
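A minimal version of this degradation pipeline might look like the sketch below. The ×4 bicubic downsampling and the blur interval [0.45, 2] follow the description above; the kernel size and the blur-before-downsampling order are assumptions, since the text does not specify them.

```python
import random
import torch.nn.functional as F
from torchvision.transforms.functional import gaussian_blur

def degrade(hr, scale=4, blur_range=(0.45, 2.0), kernel_size=21):
    """Turn a batch of HR crops (B, C, 512, 512) into LR images (B, C, 128, 128):
    random Gaussian blur with sigma drawn from blur_range, then x4 bicubic
    downsampling. kernel_size and the operation order are assumptions."""
    sigma = random.uniform(*blur_range)
    blurred = gaussian_blur(hr, kernel_size=kernel_size, sigma=sigma)
    lr = F.interpolate(blurred, scale_factor=1 / scale,
                       mode="bicubic", align_corners=False)
    return lr.clamp(0.0, 1.0)
```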

4.2. Experiment Settings

All experiments are conducted under the same settings. The operating system is Ubuntu 22.04, and the CPU is an Intel Xeon E5-2673. Four NVIDIA GeForce GTX 1080 Ti GPUs are used, each with 11 GB of memory, with CUDA 11.7 and cuDNN 8.5 for GPU acceleration. The programming language is Python 3.10, and the deep learning framework is PyTorch 2.0.1.
For the hyperparameters, “degradation_type” is set to the “real-esrgan” degradation pipeline, which consists of two identical degradation stages that share the same blur kernel with multiple random noise ranges. “crop_pad_size” is set to 300 pixels, meaning that oversized images in the training set are cropped into patches of at most 300 × 300 pixels to improve the model’s overall learning efficiency and training speed. In addition, “iterations”, “microbatch”, “epoch”, “warmup_iterations”, “milestones”, and “save_freq” are set to 1000, 2, 200, 5000, [68,000, 90,000], and 10,000, respectively. Moreover, we crop the high-resolution images of the validation and test sets into 512 × 512 pixel patches, which is convenient for quick validation, testing, and subsequent comparison.
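Gathered in one place, the quoted settings could be mirrored in a configuration dictionary like the hypothetical one below; the key names follow the paper’s wording and may differ from those in the released configuration files.

```python
# Hypothetical consolidation of the reported training settings.
train_config = {
    "degradation_type": "real-esrgan",  # two-stage degradation, shared blur kernel
    "crop_pad_size": 300,               # training crops capped at 300 x 300 px
    "iterations": 1000,
    "microbatch": 2,
    "epoch": 200,
    "warmup_iterations": 5000,
    "milestones": [68_000, 90_000],     # learning-rate schedule milestones
    "save_freq": 10_000,
    "val_test_crop": 512,               # validation/test crops of 512 x 512 px
}
```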

4.3. Effect of Different Noise and Blur on Training Results

The resolution of optical microscopy images is usually diffraction-limited, so optical super-resolution methods are used to obtain higher resolution. These methods operate in the frequency or spatial domain to obtain magnified images by separating light of different wavelengths and performing band-broadening operations on the captured light. However, such super-resolution methods may introduce irregular noise and blurring, which affect the output super-resolution quality. Because different types of noise and blurring during training can strongly affect the super-resolution results, these ablation experiments are necessary. The comparison results of ResShift-4E on MHMI are shown in Table 1, Figure 4 and Figure 5.
As shown in Table 1, to quantitatively evaluate the performance of ResShift-4E, we calculate the Peak Signal-to-Noise Ratio (PSNR), Learned Perceptual Image Patch Similarity (LPIPS) [44], and Natural Image Quality Evaluator (NIQE) [45] under different noise and blur settings. In the degradation model, the Gaussian noise is specified by a mean value and the Gaussian blur by an interval; the actual blur applied to each image is randomly sampled within this interval. The optically processed microscopy images are not sensitive to noise: a Gaussian noise difference of 0.05 causes a fluctuation of only about 0.05 in PSNR, LPIPS, and the other full-reference metrics. Both PSNR and LPIPS reach their best values when the noise is 0.45 and the blur interval is [0.2, 1.5].
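These metrics can be reproduced with standard packages; the sketch below uses the `lpips` and `pyiqa` libraries, which are our package choices rather than the paper’s stated tooling.

```python
import torch
import lpips                          # pip install lpips
import pyiqa                          # pip install pyiqa

lpips_fn = lpips.LPIPS(net="alex")            # learned perceptual similarity
niqe_fn = pyiqa.create_metric("niqe")         # no-reference quality measure

def evaluate(sr, gt):
    """sr, gt: float tensors of shape (1, C, H, W) with values in [0, 1]."""
    mse = torch.mean((sr - gt) ** 2)
    psnr = 10 * torch.log10(1.0 / mse)        # PSNR for images scaled to [0, 1]
    lp = lpips_fn(sr * 2 - 1, gt * 2 - 1)     # LPIPS expects inputs in [-1, 1]
    niqe = niqe_fn(sr)                        # NIQE needs no reference image
    return psnr.item(), lp.item(), niqe.item()
```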
We find that the smaller the noise and the larger the blur, the better the NIQE score, which reaches its best value at a noise of 0.40 and a blur interval of [0.3, 1.5]. Based on the results in Figure 4 and Figure 5, it can be inferred that larger blur and smaller noise allow the ResShift-4E model to better reproduce the granularity of the original GT image. This matters in the analysis of microscopy images, even though it may lead to a certain degree of over-sharpening.
Moreover, we had the results evaluated by several professionals in computing and optical microscopy, and finally chose the training results with a noise of 0.50 and a blur interval of [0.2, 1.5] as our benchmark for application.

4.4. Visual Results

Blind super-resolution models based on diffusion models often provide better visual results. To further verify the practical advantages of the ResShift-4E model, we compare it against GAN-based blind super-resolution models such as BSRGAN [40] and Real-ESRGAN [41]. ResShift-4E is superior to the compared methods on the three main types of microscopy images, as shown in Figure 6, Figure 7 and Figure 8, where the images labeled “GT” are cropped, zoomed-in panels of the original photo and “GT” means ground truth.
To ensure fairness, we also trained the BSRGAN [40] and Real-ESRGAN [41] models on the MHMI dataset and then compared the three models on the MHMI test set. The ResShift-4E model shows better results on fluorescence, brightfield, and phase-contrast images. Compared with BSRGAN [40] and Real-ESRGAN [41], which focus on local features, our ResShift-4E model exploits the diffusion model’s global view of the noise and achieves a better balance between local and global detail. This demonstrates that ResShift-4E is competitive when applied to super-resolution of microscopy images.
In most cases, our ResShift-4E model produces good super-resolution results on the MHMI dataset and has a clear advantage. However, it performs poorly on differential interference contrast micrographs; further optimization of diffusion-model-based super-resolution algorithms is needed.

5. Conclusions

In this paper, we propose a diffusion model called ResShift-4E for blind super-resolution. Its main novelty is optimizing ResShift by reducing the number of diffusion steps and strengthening the influence of the original residuals. We also construct a Multimodal High-resolution Microscopy Images (MHMI) dataset and apply our ResShift-4E model to it, experimentally demonstrating its advantages over other models. The proposed model can be used for automatic image enhancement, noise reduction, or resolution enhancement in biological imaging, medical diagnostics, or materials science.
In the future, we will cooperate further with laboratories in related fields to collect relevant data and expand the scale of the various types of microscopy images. Beyond expanding the data, another goal of our work is to optimize the original ResShift framework; for instance, by combining acceleration techniques such as the Teacher–Student model, we could reduce the diffusion process to a single step, which would greatly improve efficiency. We will also explore how ResShift-4E can be integrated into current microscopy workflows, either as a standalone tool or as part of an existing pipeline, improving processing efficiency, image quality, and overall workflow throughput.

Author Contributions

Conceptualization, B.W.; methodology and software, Y.G.; writing, D.G.; validation and visualization, J.C.; project administration, H.Z.; investigation, J.D.; funding acquisition, J.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science and Technology Plan Project of Nantong (No. JC2023023) and the Youth Fund of the National Natural Science Foundation of China (Nos. 62102318, 62406249). This work was also supported in part by the Nantong Key Laboratory of Virtual Reality and Cloud Computing (No. CP2021001), the Electronic Information Master’s project of Nantong Institute of Technology (No. 879002), the Software Engineering Key Discipline Construction project of Nantong Institute of Technology (No. 879005), and the PhD project of Nantong Institute of Technology (No. 2023XK(B)06).

Data Availability Statement

The dataset is available at https://github.com/JimmCao/ResShift-4E.

Conflicts of Interest

Author Jiangkai Dong was employed by the company Elefante AI Solution Company. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Xie, W.; Kuang, Z.; Wang, M. SCIFI: 3D face reconstruction via smartphone screen lighting. Opt. Express 2021, 29, 43938–43952. [Google Scholar] [CrossRef]
  2. Gao, J.; Tang, N.; Zhang, D. A Multi-Scale Deep Back-Projection Backbone for Face Super-Resolution with Diffusion Models. Appl. Sci. 2023, 13, 8110. [Google Scholar] [CrossRef]
  3. Zhang, X.; Cheng, B.; Yang, X.; Xiao, Z.; Zhang, J.; You, L. Improving Single-Image Super-Resolution with Dilated Attention. Electronics 2024, 13, 2281. [Google Scholar] [CrossRef]
  4. Shan, G.; Fan, X.; Hongjia, L.; Fudong, X.; Mingshu, Z.; Pingyong, X.; Fa, Z. DETECTOR: Structural information guided artifact detection for super-resolution fluorescence microscopy image. Biomed. Opt. Express 2021, 12, 5751–5769. [Google Scholar]
  5. Ci, X.; Yajun, C.; Chaoyue, S.; Longxiang, Y.; Rongzhen, L. AM-ESRGAN: Super-Resolution Reconstruction of Ancient Murals Based on Attention Mechanism and Multi-Level Residual Network. Electronics 2024, 13, 3142. [Google Scholar] [CrossRef]
  6. Feng, X.; Pan, Z. Detail enhancement for infrared images based on Relativity of Gaussian-Adaptive Bilateral Filter. OSA Contin. 2021, 4, 2671–2686. [Google Scholar] [CrossRef]
  7. Bing, H.; Xuebing, M.; Bo, K.; Bingchao, W.; Xiaoxue, W. DDMAFN: A Progressive Dual-Domain Super-Resolution Network for Digital Elevation Model Based on Multi-Scale Feature Fusion. Electronics 2024, 13, 4078. [Google Scholar] [CrossRef]
  8. Qi, J.; Ma, H. A Combined Model of Diffusion Model and Enhanced Residual Network for Super-Resolution Reconstruction of Turbulent Flows. Mathematics 2024, 12, 1028. [Google Scholar] [CrossRef]
  9. Zhang, Q.; Liu, F.; Lu, L.; Su, Z.; Pan, W.; Dai, X. Reconstruction of transparent objects using phase shifting profilometry based on diffusion models. Opt. Express 2024, 32, 13342–13356. [Google Scholar] [CrossRef]
  10. AlHalawani, S.; Benjdira, B.; Ammar, A.; Koubaa, A.; Ali, A.M. DiffPlate: A Diffusion Model for Super-Resolution of License Plate Images. Electronics 2024, 13, 2670. [Google Scholar] [CrossRef]
  11. Lai, X.; Li, Q.; Chen, Z.; Shao, X.; Pu, J. Reconstructing images of two adjacent objects passing through scattering medium via deep learning. Opt. Express 2021, 29, 43280–43291. [Google Scholar] [CrossRef]
  12. Long, Y.; Ruan, H.; Zhao, H.; Liu, Y.; Zhu, L.; Zhang, C.; Zhu, X. Adaptive Dynamic Shuffle Convolutional Parallel Network for Image Super-Resolution. Electronics 2024, 13, 4613. [Google Scholar] [CrossRef]
  13. Park, S.; Min, C.H.; Han, S.; Choi, E.; Cho, K.O.; Jang, H.J.; Kim, M. Super-resolution Microscopy with Adaptive Optics for Volumetric Imaging. Curr. Opt. Photon. 2022, 6, 550–564. [Google Scholar]
  14. Dehez, H.; Piché, M.; De Koninck, Y. Resolution and contrast enhancement in laser scanning microscopy using dark beam imaging. Opt. Express 2013, 21, 15912–15925. [Google Scholar] [CrossRef]
  15. Agustsson, E.; Timofte, R. Ntire 2017 challenge on single image super-resolution: Dataset and study. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 126–135. [Google Scholar]
  16. Timofte, R.; Agustsson, E.; Van, G.L.; Yang, M.H.; Zhang, L. Ntire 2017 challenge on single image super-resolution: Methods and results. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 114–125. [Google Scholar]
  17. Wang, X.; Yu, K.; Chan, K.C.K. BasicSR. 2020. Available online: https://github.com/xinntao/BasicSR (accessed on 10 December 2024).
  18. Lugmayr, A.; Danelljan, M.; Timofte, R. Ntire 2020 challenge on real-world image super-resolution: Methods and results. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 494–495. [Google Scholar]
  19. Cai, J.; Zeng, H.; Yong, H.; Cao, Z.; Zhang, L. Toward real-world single image super-resolution: A new benchmark and a new model. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3086–3095. [Google Scholar]
  20. Wei, P.; Xie, Z.; Lu, H.; Zhan, Z.; Ye, Q.; Zuo, W.; Lin, L. Component divide-and-conquer for real-world image super-resolution. In Proceedings of the 16th European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 101–117. [Google Scholar]
  21. Bevilacqua, M.; Roumy, A.; Guillemot, C.; Alberi, M.; Marie, L. Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In Proceedings of the British Machine Vision Conference, Surrey, UK, 3–7 September 2012. [Google Scholar]
  22. Zeyde, R.; Elad, M.; Protter, M. On single image scale-up using sparse-representations. In Proceedings of the 7th International Conference on Curves and Surfaces, Avignon, France, 24–30 June 2012; pp. 711–730. [Google Scholar]
  23. Martin, D.; Fowlkes, C.; Tal, D.; Malik, J. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings of the 8th IEEE International Conference on Computer Vision, Vancouver, BC, Canada, 7–14 July 2001; pp. 416–423. [Google Scholar]
  24. Huang, J.B.; Singh, A.; Ahuja, N. Single image super-resolution from transformed self-exemplars. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 5197–5206. [Google Scholar]
  25. Matsui, Y.; Ito, K.; Aramaki, Y.; Fujimoto, A.; Ogawa, T.; Yamasaki, T.; Aizawa, K. Sketch-based manga retrieval using manga109 dataset. Multimed. Tools Appl. 2017, 76, 21811–21838. [Google Scholar] [CrossRef]
  26. Wang, X.; Yu, K.; Dong, C.; Loy, C.C. Recovering realistic texture in image super-resolution by deep spatial feature transform. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 606–615. [Google Scholar]
  27. Ignatov, A.; Kobyshev, N.; Timofte, R.; Vanhoey, K.; Van, G.L. Dslr-quality photos on mobile devices with deep convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 3277–3285. [Google Scholar]
  28. Zhou, B.; Zhao, H.; Puig, X.; Xiao, T.; Fidler, S.; Barriuso, A.; Torralba, A. Semantic understanding of scenes through the ade20k dataset. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; Volume 127, pp. 302–321. [Google Scholar]
  29. Ho, J.; Jain, A.; Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 2020, 33, 6840–6851. [Google Scholar]
  30. Ramesh, A.; Dhariwal, P.; Nichol, A.; Chu, C.; Chen, M. Hierarchical text-conditional image generation with clip latents. arXiv 2022, arXiv:2204.06125. [Google Scholar]
  31. Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 10684–10695. [Google Scholar]
  32. Sahak, H.; Watson, D.; Saharia, C.; Fleet, D. Denoising diffusion probabilistic models for robust image super-resolution in the wild. arXiv 2023, arXiv:2302.07864. [Google Scholar]
  33. Yue, Z.; Wang, J.; Loy, C.C. Resshift: Efficient diffusion model for image super-resolution by residual shifting. Adv. Neural Inf. Process. Syst. 2024, 36. [Google Scholar]
  34. Gu, J.; Zhai, S.; Zhang, Y.; Susskind, J.; Jaitly, N. Matryoshka diffusion models. In Proceedings of the 12th International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023. [Google Scholar]
  35. Li, H.; Yang, Y.; Chang, M.; Chen, S.; Feng, H.; Xu, Z.; Li, Q.; Chen, Y. Srdiff: Single image super-resolution with diffusion probabilistic models. Neurocomputing 2022, 479, 47–59. [Google Scholar] [CrossRef]
  36. Song, J.; Meng, C.; Ermon, S. Denoising diffusion implicit models. arXiv 2020, arXiv:2010.02502. [Google Scholar]
  37. Liu, J.; Wang, Q.; Fan, H.; Wang, Y.; Tang, Y.; Qu, L. Residual denoising diffusion models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 2773–2783. [Google Scholar]
  38. Li, M.; Cai, T.; Cao, J.; Zhang, Q.; Cai, H.; Bai, J.; Jia, Y.; Li, K.; Han, S. Distrifusion: Distributed parallel inference for high-resolution diffusion models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 7183–7193. [Google Scholar]
  39. Sohl-Dickstein, J.; Weiss, E.; Maheswaranathan, N.; Ganguli, S. Deep unsupervised learning using nonequilibrium thermodynamics. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 2256–2265. [Google Scholar]
  40. Zhang, K.; Liang, J.; Van, G.L.; Timofte, R. Designing a practical degradation model for deep blind image super-resolution. In Proceedings of the IEEE International Conference on Computer Vision, Montreal, BC, Canada, 10–17 October 2021; pp. 4791–4800. [Google Scholar]
  41. Wang, X.; Xie, L.; Dong, C.; Shan, Y. Realesrgan: Training real-world blind super-resolution with pure synthetic data supplementary material. Comput. Vis. Found. Open Access 2022, 1, 2. [Google Scholar]
  42. Zhang, Y.; Zhu, Y.; Nichols, E.; Wang, Q.; Zhang, S.; Smith, C.; Howard, S. A poisson-gaussian denoising dataset with real fluorescence microscopy images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 11710–11718. [Google Scholar]
  43. Ma, J.; Xie, R.; Ayyadhury, S.; Ge, C.; Gupta, A.; Gupta, R.; Gu, S.; Zhang, Y.; Lee, G.; Kim, J.; et al. The multimodality cell segmentation challenge: Toward universal solutions. Nat. Methods 2024, 21, 1103–1113. [Google Scholar] [CrossRef]
  44. Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 586–595. [Google Scholar]
  45. Mittal, A.; Soundararajan, R.; Bovik, A.C. Making a “completely blind” image quality analyzer. IEEE Signal Process. Lett. 2013, 20, 209–212. [Google Scholar] [CrossRef]
Figure 1. The overview of the proposed ResShift-4E.
Figure 2. Four types of microscopy images.
Figure 3. Downsampling of the images in the validation set and test set.
Figure 4. Test results of ResShift-4E on MHMI under different noise and blur (a).
Figure 5. Test results of ResShift-4E on MHMI under different noise and blur (b).
Figure 6. Visual comparison of fluorescent images under different models: BSRGAN [40], Real-ESRGAN [41], and ours (ResShift-4E).
Figure 7. Visual comparison of brightfield images under different models: BSRGAN [40], Real-ESRGAN [41], and ours (ResShift-4E).
Figure 8. Visual comparison of phase-contrast images under different models: BSRGAN [40], Real-ESRGAN [41], and ours (ResShift-4E).
Table 1. Test results of ResShift-4E on MHMI under different noise and blur.

PSNR↑ (Noise \ Blur)    [0.3, 1.5]   [0.2, 1.5]   [0.1, 1.2]
0.50                    29.74        29.95        29.83
0.45                    29.70        30.00        29.79
0.40                    29.66        29.96        29.69

LPIPS↓ (Noise \ Blur)   [0.3, 1.5]   [0.2, 1.5]   [0.1, 1.2]
0.50                    0.4414       0.4249       0.4300
0.45                    0.4338       0.4217       0.4246
0.40                    0.4338       0.4241       0.4251

NIQE↓ (Noise \ Blur)    [0.3, 1.5]   [0.2, 1.5]   [0.1, 1.2]
0.50                    10.2901      10.2517      10.8768
0.45                    9.8009       10.0176      10.7537
0.40                    9.6013       9.8927       10.2258
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
