Super-Resolution Reconstruction of Cytoskeleton Image Based on A-Net Deep Learning Network

To date, live-cell imaging at the nanometer scale remains challenging. Even though super-resolution microscopy methods have enabled visualization of sub-cellular structures below the optical resolution limit, the spatial resolution is still far from sufficient for the structural reconstruction of biomolecules in vivo (e.g., the ~24 nm thickness of a microtubule fiber). In this study, a deep learning network named A-net was developed, and it is shown that the resolution of cytoskeleton images captured by a confocal microscope can be significantly improved by combining the A-net deep learning network with the DWDC algorithm based on a degradation model. By utilizing the DWDC algorithm to construct new datasets and taking advantage of the A-net neural network's features (i.e., considerably fewer layers and a relatively small dataset), the noise and flocculent structures that originally interfere with the cellular structure in the raw image are significantly removed, with the spatial resolution improved by a factor of 10. The investigation shows a universal approach for extracting structural details of biomolecules, cells and organs from low-resolution images.


Introduction
Microscale organizations and nanoscale biomolecular structures play essential roles in life machinery, e.g., nanopores control transportation [1] and the cytoskeleton behaves as a mechanosensor [2]. To understand the underlying mechanism of cellular behavior, it is important to monitor the dynamics of biomolecules at a resolution of tens of nanometers, e.g., the ~50 nm persistence length of DNA [3] and the ~24 nm thickness of a microtubule fiber [4]. Imaging platforms that are reported to achieve such resolution include transmission electron microscopy (TEM, 300 nm) [5], scanning electron microscopy (SEM, 200 nm) [6], cryogenic electron microscopy (Cryo-EM, 200 nm) [7] and stimulated emission depletion (STED, 20 nm) microscopy [8]. TEM, SEM and Cryo-EM, however, are not suitable for live-cell imaging and monitoring molecular dynamics in vivo [9]. STED microscopy is a promising technique. Its application is, however, hindered by the need for specific fluorophores [10], excessively complex operational procedures [11], and high cost [12]. Considering that large quantities of image data exist in various databases [13], and that most laboratories are only equipped with commonplace inverted microscopes with sub-micron resolution and a high noise level, it is critical to develop a numerical approach that can extract molecular information from poor-quality images.
Currently, reported image processing algorithms can be categorized as traditional [14,15], and deep-learning image processing algorithms. The latter have become a focus of the image processing community, and many algorithms have been developed. For instance, the super-resolution convolutional neural network (SRCNN) [16] is an end-to-end network

Related Works
As the purpose of this paper is to explore super-resolution algorithms based on neural networks, the existing algorithms for improving image resolution are reviewed in the first part, followed by a detailed introduction to the super-resolution algorithms based on deep learning.
Interpolation-based algorithms [31,32] use the original pixel information of the low-resolution image to "guess" the sub-pixel information of the image by interpolation. They can effectively upgrade the low-resolution image to a high-resolution image with more pixels. Nevertheless, in practical applications, interpolation algorithms can only improve the image details in a very limited way.
Degradation-model-based algorithms [33,34] focus on establishing an observation model for the image acquisition process, and then realize super-resolution reconstruction by solving the inverse problem of the observation model. The observation model describes how the imaging system produces the low-resolution observation image from the high-resolution image, as shown in Formula (1):
$$L = f(H) + N \quad (1)$$
where $L$ is the low-resolution image, $H$ is the high-resolution image, $f$ is a transformation function (i.e., the point-spread function in the optical system) and $N$ is noise. This method restores the actual information of the object with a higher resolution, based on the estimation of $f$. Commonly, this type of super-resolution algorithm includes iterative backprojection (IBP) [34], projection onto convex sets (POCS) [25], maximum posterior probability (MAP) [35], or Bayesian analysis [36,37] methods. These methods aim to improve the visual quality of images and restore object details. However, they also suffer from a series of problems, e.g., the processing speed is generally slow and the reconstruction may produce spurious images.
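As an illustration of the observation model described above, read here as L = f(H) + N, the following minimal sketch degrades a one-dimensional intensity profile. It assumes that f is a convolution with a Gaussian point-spread function and that N is additive Gaussian noise; the function names, the profile and all numerical values are hypothetical and are not taken from the paper.

```python
import math
import random


def gaussian_psf(sigma, radius):
    """Return a normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def degrade(high_res, psf, noise_std, rng):
    """Apply L = f(H) + N, with f a convolution with `psf` (zero padding)."""
    radius = len(psf) // 2
    low_res = []
    for i in range(len(high_res)):
        acc = 0.0
        for k, w in enumerate(psf):
            j = i + k - radius
            if 0 <= j < len(high_res):
                acc += w * high_res[j]
        # Additive noise term N (assumed Gaussian for this sketch).
        low_res.append(acc + rng.gauss(0.0, noise_std))
    return low_res


# Hypothetical high-resolution profile: two thin filaments 4 samples apart.
H = [0.0] * 41
H[18] = 1.0
H[22] = 1.0

rng = random.Random(0)
L = degrade(H, gaussian_psf(sigma=3.0, radius=9), noise_std=0.01, rng=rng)
```

With these values the two filaments are closer than the width of the point-spread function, so they merge into a single blurred peak in L. Recovering H from L is the inverse problem that the algorithms listed above attempt to solve.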
Learning-based algorithms [38,39] aim to build a mapping between the low-resolution image and the corresponding high-resolution image by prior training and learning from the dataset. Learning-based algorithms are mainly realized by machine learning. There are several commonly used machine learning methods, including neighborhood embedding [40], support vector regression [41,42], manifold learning [43] and sparse representation [44,45]. Learning-based algorithms are limited by several disadvantages, including the need to manually optimize parameters and the lack of end-to-end training, which leads to poor algorithm applicability.

Deep-Learning-Based Algorithm
In recent years, various deep-learning-based super-resolution algorithms have been developed. Dong et al. [16] first applied a deep neural network to super-resolution processing. They proposed SRCNN to learn the end-to-end mapping between low-resolution images and the corresponding high-resolution images. A three-layer convolutional neural network is combined with bicubic interpolation and nonlinear mapping to form the SRCNN algorithm. The SRCNN algorithm automatically optimizes all parameters by learning from the input training set, and reaches an average PSNR value of 30.09 dB with a runtime of 0.18 s per image. Dong et al. [17] developed FSRCNN based on SRCNN. A deconvolution layer was used at the end of the network to enlarge the image size, which can save time by eliminating the pretraining phase. The network also replaces the convolution kernels in SRCNN with smaller convolution kernels and shares convolution layers in order to reduce the computation. These improvements reduce the number of parameters and speed up the processing. As a consequence, FSRCNN has a very fast processing speed without loss of restoration quality, and achieves an average PSNR of 32.87 dB with a processing speed of 24.7 fps.
Kim et al. [46] extended the network to 20 layers based on SRCNN and introduced a residual structure into the network, i.e., very deep convolutional networks (VDSR). The deeper network has a larger receptive field, so more information can be learned with better accuracy. The VDSR network uses the residual learning method to limit the gradient to a certain range, which can speed up the convergence process. Compared to SRCNN, the VDSR network realized higher accuracy, faster convergence, and larger upscaling factors. In their investigation, VDSR achieves an average PSNR of 37.53 dB.
In the aforementioned methods, excessive smoothing of an image is inevitable and could lead to a spurious image. Ledig et al. [18] proposed a super-resolution generative adversarial network (SRGAN), which is developed on the basis of the generative adversarial network (GAN), to solve this problem and recover finer texture structures. The generator network produces high-resolution prediction images from low-resolution original images, and the discriminator network determines whether the prediction image is consistent with the corresponding label image. Although the PSNR values were not noticeably improved, the details of the image were enhanced to the super-resolution level. It should be noted, again, that these methods aim to enrich the pixel information of the image, e.g., from 512 × 512 pixels to 1024 × 1024 pixels, not to improve the intrinsic optical resolution of the image, e.g., from 300 to 100 nm.

Algorithm
To improve the resolution of a poor-quality image intrinsically, i.e., to extract the real structure from a blurred and noisy image, a new algorithm combining traditional image preprocessing algorithms based on the degradation model with the A-net network [47] is proposed here. The overall framework of this algorithm is shown schematically in Figure 1. For the A-net deep learning network, image pairs of original and label images are required to construct training datasets. Because of the scarcity and particularity of biological images, it is necessary to build our own biological microtubule image dataset (i.e., the SR_MUI dataset), which is obtained using the DWDC method [19]. This method consists of a series of preprocessing methods, the discrete wavelet method, the Lucy-Richardson deconvolution method and postprocessing methods. The training dataset is input into the network so that the A-net network can learn the mapping between low-resolution images and high-resolution label images. The test dataset is then input into the A-net network for prediction, and the super-resolution images are obtained.
Figure 1. Overall framework of the algorithm. A series of preprocessing methods are used to obtain the original image and the label image from the raw image. Denoising and a three-dimensional Gaussian interpolation are performed on the raw confocal images to obtain the original image, as shown by the green arrow. Then, the DWDC algorithm and a binarization are used to obtain high-resolution label images (shown by the yellow arrow). The original image and the corresponding label image form image pairs and are cropped to 512 × 512 pixels to construct the biological microtubule image dataset, referred to as the SR_MUI dataset. The A-net network trains its parameters on the images of the training dataset, and then produces the corresponding prediction results. In postprocessing, as shown by the orange arrow, binarization of the prediction image yields a shrunken outline of the microtubule structures. Subsequently, the binary image is multiplied by the test image to acquire the result image.

Raw Images and Processing Targets
The raw images to be processed in this paper are confocal fluorescence images of 3T3 fibroblast microtubules labeled with a tubulin fluorescent dye, which is excited at 640 nm and emits around 674 nm (Figure 2). The raw images were captured by a commercial confocal microscope (Nikon A1 LFOV) using an Olympus 100X NA1.4 oil immersion objective lens. Each raw confocal image is in 16-bit TIF format with a size of 512 × 512 pixels. The pixel interval is 0.25 µm. It can be observed that the filament-like microtubule structures are widely present in cells, and that the images are characterized by poor SNR, insufficient spatial resolution and heavy noise. The goal of this investigation is to develop a universal algorithm to obtain super-resolution images of microtubules from such low-resolution images.

Preprocessing
In order to improve the SNR of the image, threshold denoising is used first to reduce image noise. The pixel interval of 0.25 µm in the raw image limits how far the image resolution can be improved. To restore the details of the targets, a three-dimensional Gaussian interpolation is therefore performed twice. The Gaussian function is characterized by its transverse and axial radii; $x_c$, $y_c$ and $z_c$ are the interpolation center coordinates; $n$ is the refractive index of the medium; $NA$ is the numerical aperture of the lens and $\lambda_e$ is the wavelength of the excitation beam.
Accordingly, the image size is extended from 512 × 512 to 2048 × 2048, which reduces the pixel interval to 63 nm. The z-stack interval is also reduced from 1 µm to 250 nm. Then, the DWDC algorithm is used to obtain high-resolution label images [19]. In this method, discrete wavelet analysis and the Lucy-Richardson deconvolution method are combined with binarization and threshold processing, in order to extract the sketch of the microtubule structures and prevent the detailed information from being submerged by the background. The method is capable of improving the image resolution of a 3T3 fibroblast microtubule by up to 15 times and realizes a resolution of 123.7 nm. The details of the microtubule structure are clearly preserved. Therefore, it is appropriate to use the DWDC method to obtain the label images.
It is well known that a larger image size increases the size of the feature maps produced by the convolutional layers and the computation cost of the loss function. The expansion therefore places a heavy computational burden on the server and the network. For instance, if a 512 × 512 image imported into the U-net model yields a feature map of size 32 × 32 × 1024 at the deepest convolution layer, then a 2048 × 2048 image yields a feature map of size 128 × 128 × 1024 at the same layer. The storage requirement is increased 16 times, and the computation cost could increase by more than 16 times, since the neural network is nonlinear.
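The factor of 16 can be checked with a few lines of arithmetic. The sketch below assumes four 2 × 2 pooling steps, size-preserving convolutions and 1024 channels at the deepest layer, which is consistent with the 32 × 32 × 1024 example above; the function names are hypothetical.

```python
def bottleneck_shape(image_size, num_poolings=4, channels=1024):
    """Shape (height, width, channels) of the deepest feature map.

    Assumes every pooling halves the side length and that the
    convolutions themselves do not change the spatial size.
    """
    side = image_size // (2 ** num_poolings)
    return (side, side, channels)


def num_elements(shape):
    """Number of stored values in a feature map of the given shape."""
    height, width, channels = shape
    return height * width * channels


small = bottleneck_shape(512)    # (32, 32, 1024)
large = bottleneck_shape(2048)   # (128, 128, 1024)
ratio = num_elements(large) // num_elements(small)  # 16
```

The same ratio applies to every other layer, because each spatial dimension is four times larger throughout the network.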
To improve the efficiency of training, a series of preprocessing steps are applied, as diagrammed in Figure 3. First, the original 16-bit TIF images are converted to 8-bit TIF images by mapping the data from 16 bits to 8 bits using an approximately linear approach. Second, each image of 2048 × 2048 pixels, either the original one or the label one, is split into 16 sub-images of 512 × 512 pixels. The SR_MUI dataset is then constructed by pairing the corresponding sub-images of the original and label images.
Since the image size of the training set is 512 × 512, each test image must also be divided into 16 sub-images of 512 × 512 pixels before it is input into the A-net network. These sub-images are processed by the A-net network and the corresponding super-resolution prediction images are obtained.
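The bit-depth conversion and tiling steps can be sketched as follows. This is a minimal illustration using nested Python lists; the exact "approximately linear" mapping used by the authors is not specified, so a plain linear rescaling with rounding is assumed here, and the function names are hypothetical.

```python
def to_8bit(image16):
    """Linearly rescale 16-bit values (0..65535) to 8-bit values (0..255)."""
    # Adding half the divisor before integer division rounds to nearest.
    return [[(v * 255 + 32767) // 65535 for v in row] for row in image16]


def split_into_tiles(image, tile):
    """Split a square 2-D list into non-overlapping tile x tile blocks.

    Tiles are returned in row-major order (left to right, top to bottom).
    """
    size = len(image)
    tiles = []
    for top in range(0, size, tile):
        for left in range(0, size, tile):
            tiles.append([row[left:left + tile]
                          for row in image[top:top + tile]])
    return tiles


# Small stand-in for a 2048 x 2048 image: an 8 x 8 image cut into 2 x 2
# tiles gives the same 4 x 4 = 16 sub-image layout.
image16 = [[(r * 8 + c) * 1000 for c in range(8)] for r in range(8)]
tiles = split_into_tiles(to_8bit(image16), tile=2)
```

Calling `split_into_tiles(image, tile=512)` on a 2048 × 2048 image would produce the 16 sub-images described above in the same row-major order, which is also the order needed to reassemble the predictions.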

Figure 3. Diagram of the preprocessing procedures. At the beginning, both threshold denoising and a three-dimensional Gaussian interpolation are carried out on the raw confocal images. The image after these processing steps is adopted as the original image for the A-net network. Then, the DWDC algorithm is applied on the original images to obtain high-resolution label images. The label image is further binarized to prevent the network from learning additional feature information. Subsequently, both the original and label images are converted from 16-bit data to 8-bit, and split from 2048 × 2048 pixels to 16 sub-images of 512 × 512 pixels, to reduce the load of A-net computation. Finally, the corresponding sub-image pairs form the SR_MUI dataset.

A-Net Network
This investigation focuses on the filament-like microtubule structures that can be approximated as a cluster or mesh of segments. Thus, the U-net network, which has been widely used in image segmentation, was applied here for biostructure extraction and super-resolution processing. One of the most significant advantages of the U-net is that it does not require a large biological dataset. This is particularly important for us, since our dataset is relatively small and there are no established works or public datasets that can fulfill our purpose.
The U-net network is composed of an encoder network and a decoder network with symmetric structures. In the encoder network, there are four convolution blocks for feature maps of different sizes. Each convolution block contains two 3 × 3 convolutions in sequence, followed by 2 × 2 max pooling. In the decoder network, there are also four deconvolution blocks corresponding to the encoder network. Each deconvolution block contains two 3 × 3 convolutions in sequence, followed by a transposed convolution. The encoder network doubles the number of channels while halving the size of the feature map. The decoder network doubles the size of the feature map and halves the number of channels. Therefore, the encoder-decoder network transforms the input image into small-size, multichannel feature maps, and then decodes the feature maps into an output image with the same size. At the same time, skip connections are adopted in the U-net network. This operation connects feature maps of different sizes, which is helpful for gradient propagation and network convergence. All the convolutions in this neural network are followed by batch normalization (BN) and a rectified linear unit (ReLU) for faster training and to prevent the vanishing gradient problem.
In the original U-net network, the sizes of the input image and the output image are inconsistent. To make the output images have the same size as the input image, all "valid" convolutions in the network are replaced with "same" convolutions. With "same" convolutions, the feature maps of the corresponding layers in the encoding network and the decoding network have exactly the same size. It is therefore appropriate to directly copy the feature map of the encoding network to the decoding network, as shown in Figure 4, and combine it with the feature map of the decoding network through the skip connection. This process avoids the crop operations in the U-net network, which simplifies the processing and reduces image mismatching during cropping. Accordingly, the revised U-net network is named A-net in this paper.
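The effect of replacing "valid" with "same" convolutions can be traced numerically. The sketch below is an illustration under stated assumptions: a standard four-level U-net layout with two 3 × 3 stride-1 convolutions per level, 2 × 2 pooling in the encoder and 2× upsampling in the decoder. The 572 × 572 input used for the unpadded case comes from the original U-net publication, not from this paper, and the function names are hypothetical.

```python
def conv_out(size, kernel=3, padding=0):
    """Output side length of a stride-1 convolution."""
    return size + 2 * padding - kernel + 1


def unet_output_size(size, padding, depth=4):
    """Trace the side length through a U-net-style encoder-decoder.

    Returns the output side length and, for each decoder level, the
    difference between the encoder feature map and the upsampled decoder
    feature map, i.e. how much the skip connection would have to crop.
    """
    encoder_sizes = []
    for _ in range(depth):
        size = conv_out(conv_out(size, padding=padding), padding=padding)
        encoder_sizes.append(size)
        size //= 2  # 2 x 2 max pooling
    # Deepest level: two more convolutions, no pooling.
    size = conv_out(conv_out(size, padding=padding), padding=padding)
    crops = []
    for skip in reversed(encoder_sizes):
        size *= 2  # transposed convolution doubles the size
        crops.append(skip - size)
        size = conv_out(conv_out(size, padding=padding), padding=padding)
    return size, crops


# "Valid" convolutions (no padding): the output shrinks and skips need cropping.
valid_size, valid_crops = unet_output_size(572, padding=0)
# "Same" convolutions (padding 1 for a 3 x 3 kernel): sizes match everywhere.
same_size, same_crops = unet_output_size(512, padding=1)
```

With padding 1, every crop amount is zero and a 512 × 512 input yields a 512 × 512 output, which is the property that allows encoder feature maps to be copied directly across the skip connections.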
The loss function of A-net is calculated by combining the cross-entropy loss function with a pixel-wise soft-max on the final feature map. The soft-max function can be calculated as follows:
$$P_i(x) = \frac{\exp(a_i(x))}{\sum_{j=1}^{M} \exp(a_j(x))}$$
where $P_i(x)$ denotes the approximated maximum function, $i$ represents the category of pixels, $a_i(x)$ represents the activation score of category $i$ at the pixel position $x \in \Omega$ with $\Omega \subset \mathbb{Z}^2$, $M$ represents the number of classes, $a_j(x)$ represents the activation score when the category of the pixel is $j$, and $\sum_{j=1}^{M} \exp(a_j(x))$ is the sum of the exponentiated activations over all classes. In short, $P_i(x)$ is the classification result of pixel $x$ for class $i$, maximizing the most likely result while suppressing the probability of other categories. The sum of the probabilities of all prediction categories is 1. For the $i$ that has the maximum activation $a_i(x)$, the corresponding $P_i(x) \approx 1$, while for all other $i$ the corresponding $P_i(x) \approx 0$. The cross-entropy then penalizes the deviation of $P_{g(x)}(x)$ from 1 at every position by Equation (3):
$$E = -\sum_{x \in \Omega} \omega(x) \log\left(P_{g(x)}(x)\right) \quad (3)$$
where $\omega(x)$, with $x \in \Omega$, denotes a real-valued weight and $g(x)$ denotes the ground truth of each pixel. The purpose of setting $\omega$ is to give higher weights to pixels in the image that are close to the boundary points. In order to let the network learn to distinguish smaller boundaries, the weight map is calculated in advance from the ground truth of each pixel in the label images.
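The pixel-wise soft-max and the weighted cross-entropy described above can be sketched in plain Python. The loss is written here with a leading minus sign so that it is non-negative and vanishes when the ground-truth class has probability 1; the three-pixel example, its class labels and its weights are hypothetical values chosen only for illustration.

```python
import math


def softmax(scores):
    """Soft-max over the M class activations a_1(x), ..., a_M(x) of one pixel."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_cross_entropy(activations, labels, weights):
    """E = -sum over pixels x of w(x) * log(P_{g(x)}(x)).

    activations: per-pixel lists of class scores
    labels:      per-pixel ground-truth class index g(x)
    weights:     per-pixel weight w(x)
    """
    loss = 0.0
    for scores, g, w in zip(activations, labels, weights):
        loss -= w * math.log(softmax(scores)[g])
    return loss


# Hypothetical three-pixel, two-class example (0 = background, 1 = microtubule).
activations = [[4.0, -4.0], [-3.0, 3.0], [0.5, 0.0]]
labels = [0, 1, 1]
weights = [1.0, 1.0, 2.0]  # the third pixel is treated as close to a boundary
loss = weighted_cross_entropy(activations, labels, weights)
```

In this example the third pixel is both misclassified and weighted more heavily, so it contributes most of the loss, which is the intended effect of the boundary weighting.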

Postprocessing
The A-net network processes the input sub-images (512 × 512 pixels) of the test image and produces the corresponding predicted sub-images (512 × 512 pixels), which are subsequently assembled into the prediction image (2048 × 2048 pixels). Then, a binarization step is applied to the prediction image in order to obtain a shrunken outline of the microtubule structures. Subsequently, the binary image is multiplied by the test image to obtain the result image (shown in the Results section).
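The three postprocessing steps (assembling the tiles, binarizing, and masking the test image) can be sketched as follows. Nested Python lists stand in for images, the tile order is assumed to be row-major, and the threshold of 127 is a hypothetical value, since the text does not state the binarization threshold.

```python
def assemble_tiles(tiles, tiles_per_side):
    """Stitch row-major square tiles back into one 2-D list."""
    image = []
    for tile_row in range(tiles_per_side):
        start = tile_row * tiles_per_side
        row_tiles = tiles[start:start + tiles_per_side]
        for r in range(len(row_tiles[0])):
            image.append([v for tile in row_tiles for v in tile[r]])
    return image


def binarize(image, threshold):
    """Return a 0/1 mask: 1 where the pixel value exceeds `threshold`."""
    return [[1 if v > threshold else 0 for v in row] for row in image]


def apply_mask(mask, image):
    """Pixel-wise product of a binary mask and an image of the same size."""
    return [[m * v for m, v in zip(mask_row, image_row)]
            for mask_row, image_row in zip(mask, image)]


# Toy stand-in: four 2 x 2 predicted tiles assembled into a 4 x 4 prediction.
predicted_tiles = [
    [[200, 10], [180, 20]], [[5, 0], [15, 0]],
    [[0, 0], [0, 0]], [[0, 220], [0, 240]],
]
test_image = [[50 + 10 * c + r for c in range(4)] for r in range(4)]

prediction = assemble_tiles(predicted_tiles, tiles_per_side=2)
mask = binarize(prediction, threshold=127)
result = apply_mask(mask, test_image)
```

Because the mask only selects pixels, the result image keeps the original intensities of the test image inside the predicted microtubule outline and is zero elsewhere.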

SR_MUI Dataset
In this investigation, a biological microtubule image dataset, i.e., the SR_MUI dataset, is constructed based on 3T3 cell images. The raw confocal image, the original image, and the high-resolution label images are shown in Figure 5.
In the SR_MUI dataset, there are 200 image pairs for training and 50 images for testing. A preview of the SR_MUI dataset is shown in Figure 6. It can be seen that the label images clearly extract the sketch of the microtubule structures from the noisy and blurry raw images.

Implementation
The numerical experiment is performed on the PyTorch platform using the Python language. The A-net network was trained and tested on a server with 10 NVidia RTX 2080TI GPUs. The number of epochs is 200 and the minibatch size is 1. Throughout the training process, the A-net network uses the Adam optimizer. In the testing process, each test image (2048 × 2048 pixels) was split into 16 sub-images (512 × 512 pixels). The 16 sub-images were input into the A-net network to obtain the corresponding prediction images, which were then assembled into a prediction image of 2048 × 2048 pixels. The result image is obtained after postprocessing.

Results
Figure 7a is a typical test image that has low SNR and poor resolution. Although the cluster of microtubule structures can be roughly distinguished from the crowded background, the image cannot provide more accurate information on the microtubule distribution. In contrast, in the result images after A-net processing, the noise is significantly suppressed and the microtubule structure information is extracted from the test image. In addition, after zooming in on the local image structures of both the test and result images, it can be seen that the two images (Figure 7c,d) show consistent structures of the microtubule.
At the same time, in order to verify the consistency of the microtubule structures between the test image and the result image, the two images are overlapped, as shown in Figure 8. For a different test image, the microtubule structures are concisely highlighted by the result image. The results clearly demonstrate the capability of the A-net network to preserve the raw filament-like structures. It should be noted that, as a result of the high noise level, some structures have inevitably been broken into segments. However, this does not affect the estimation of the overall topology of the microtubule structures.
Figure 9 shows the comparison of the image intensity profiles along the horizontal direction between the test and result images. Here, only parts of the intensity profiles are plotted as an example to show the improvement in the images [48]. The image intensity distribution of the result image has a sharper peak and noticeably lower noise. Overall, the result image retains a large amount of the structural information contained in the test image. The resolution of distinguishing the microtubule structures can be evaluated by the full width at half maximum (FWHM) [49,50]. As shown in the right column of Figure 9, the FWHM in the test image is 1.19 µm, as compared to 120 nm in the result image. A super resolution with a ~10 times improvement of resolution compared to the original image has been realized.
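A minimal sketch of how an FWHM value can be measured from a sampled intensity profile is given below. It assumes a single peak on a zero background and locates the half-maximum crossings by linear interpolation between neighboring samples; the synthetic Gaussian profile and its width are hypothetical and are not data from the paper.

```python
import math


def fwhm(positions, intensities):
    """Full width at half maximum of a single-peaked profile.

    Finds the two half-maximum crossings on either side of the peak by
    linear interpolation between neighboring samples.
    """
    peak_index = max(range(len(intensities)), key=lambda i: intensities[i])
    half = intensities[peak_index] / 2.0

    def crossing(i_from, step):
        # Walk away from the peak while the next sample is above half maximum.
        i = i_from
        while 0 <= i + step < len(intensities) and intensities[i + step] > half:
            i += step
        j = i + step
        if not 0 <= j < len(intensities):
            raise ValueError("profile does not fall below half maximum")
        frac = (intensities[i] - half) / (intensities[i] - intensities[j])
        return positions[i] + frac * (positions[j] - positions[i])

    return crossing(peak_index, +1) - crossing(peak_index, -1)


# Synthetic Gaussian profile sampled every 63 nm (the interpolated pixel interval).
sigma_nm = 120.0
xs = [63.0 * k for k in range(-20, 21)]
profile = [math.exp(-x * x / (2.0 * sigma_nm ** 2)) for x in xs]
width_nm = fwhm(xs, profile)  # close to 2 * sqrt(2 * ln 2) * sigma
```

For a Gaussian profile the FWHM equals 2√(2 ln 2) σ, so the interpolated estimate can be checked against the analytical value. With real profiles, the background level would have to be subtracted first.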
The result images obtained by the A-net network are also compared with those obtained by the DWDC algorithm, as shown in Figure 10. The FWHM of the result image obtained by the DWDC algorithm is 290 nm, and that obtained by the A-net network is 252 nm. An improvement of over 10% has thus been realized, even though DWDC already exhibits super-resolution image processing capability.
For the result image, the SSIM and PSNR are 0.22 and 25.88, respectively, which are surprisingly low. This is because our purpose is to extract the information of microtubule structures with super resolution, so the SSIM and PSNR values cannot provide an effective evaluation of the image processing. Although there is currently no appropriate criterion for evaluating the processing, the improvement in the visibility and clarity of the image structure is sufficient to demonstrate the effectiveness of this algorithm.
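For reference, the two metrics can be sketched as follows. This is a hedged illustration only: the text does not state how the SSIM and PSNR were computed or which image served as the reference, so the function names, the flat-list image representation, and the 8-bit `data_range` of 255 are assumptions. The SSIM shown here is the simplified single-window (global) form, whereas standard implementations average the same expression over local sliding windows.

```python
import math


def psnr(reference, test, data_range=255.0):
    """Peak signal-to-noise ratio in dB for two images given as flat lists of pixels."""
    if len(reference) != len(test) or not reference:
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / mse)


def global_ssim(reference, test, data_range=255.0):
    """Single-window SSIM computed over the whole image (a simplification)."""
    if len(reference) != len(test) or not reference:
        raise ValueError("images must be non-empty and of equal size")
    n = len(reference)
    # Stabilising constants with the conventional K1 = 0.01 and K2 = 0.03.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x = sum(reference) / n
    mu_y = sum(test) / n
    var_x = sum((r - mu_x) ** 2 for r in reference) / n
    var_y = sum((t - mu_y) ** 2 for t in test) / n
    cov = sum((r - mu_x) * (t - mu_y) for r, t in zip(reference, test)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

Both metrics reward pixel-wise agreement with the reference image. A result in which noise and flocculent structures have been removed therefore scores low against a noisy reference even when the filaments themselves are preserved.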
Furthermore, in this study, the three-dimensional (3D) structures of the microtubules are built layer by layer on the basis of the raw and result images, as shown in Figure 11a,b, respectively. Figure 11c displays the 3D view of the lower left region of (b). As a result of the low signal-to-noise ratio of the raw images, the 3D microtubule structures constructed from the raw images are blurry and unclear. The spatial distributions of the structures and even the skeleton are indistinguishable. In contrast, the 3D microtubule structures constructed from the result images are free of the noise and show the structures clearly. The major biological structures are continuous, which supports the effectiveness of the method.
Figure 11. 3D microtubule structures built layer by layer from (a) the raw images and (b) the result images. (c) Zoom-in view of the lower left region of (b) (see Visualization 2 for details). The influence of noise has been significantly inhibited and the cell structure is clearly displayed.

Conclusions
In this investigation, a new method based on the A-net neural network and the DWDC method is proposed and used to extract the molecular structure of 3T3 fibroblast microtubule networks from poor-quality confocal images. The method requires a relatively small dataset and thus avoids the difficulty of acquiring biological images in the biomedical and medical imaging disciplines. The experimental results indicate a 10-fold improvement in spatial resolution, with a super resolution of 120 nm obtained from raw confocal images. The algorithm provides a general way of improving the resolution of filament-like structures with fewer computational resources. It will benefit broad biological and biomedical research, which relies strongly on optical imaging techniques.
