Single-Core Multiscale Residual Network for the Super Resolution of Liquid Metal Specimen Images

Abstract: In a gravity-free or microgravity environment, liquid metals without crystalline nuclei achieve a deep undercooling state. The resulting melts exhibit unique properties, and the study of this phenomenon is critical for exploring new metastable materials. Owing to the rapid crystallization rates of deeply undercooled liquid metal droplets, as well as cost concerns, experimental systems for studying liquid metal specimens usually use high-framerate, high-speed cameras, which produce low-resolution photographs. To facilitate subsequent studies by material scientists, it is necessary to use super-resolution techniques to increase the resolution of these photographs. However, existing super-resolution algorithms cannot quickly and accurately restore the details contained in images of deeply undercooled liquid metal specimens. To address this problem, we propose the single-core multiscale residual network (SCMSRN) algorithm for photographic images of liquid metal specimens. In this model, multiple cascaded filters are used to obtain feature information, and the multiscale features are then fused by a residual network. Compared to existing state-of-the-art artificial neural network super-resolution algorithms, such as SRCNN, VDSR and MSRN, our model achieved higher PSNR and SSIM scores while reducing network size and training time.


Introduction
Deep undercooling is a type of rapid solidification technique for preparing novel materials. Compared to the rapid quenching technique, deep undercooling allows alloys to rapidly solidify with slow cooling. This process provides a new means of studying some of the nonequilibrium phenomena that occur during rapid alloy solidification, and it also allows for the preparation of new materials with various outstanding properties, which are otherwise impossible to obtain by conventional solidification techniques [1]. To study the properties of deeply undercooled melts, it is necessary to simulate a microgravity environment on the ground [2], which is typically done using a vacuum drop tube. A levitation system, which contains laser heaters and cameras, is installed at the top of the drop tube. The laser heaters heat the levitated material, while the cameras record this process and photograph the deeply undercooled liquid metal droplet, which is held by the vacuum levitation apparatus after it has been melted by the laser heaters. Owing to cost concerns, these systems usually use high-framerate, high-speed cameras, which can only produce low-resolution photographs [3]. To obtain accurate state information about these deeply undercooled liquid metal droplets, the low-resolution photographs are reconstructed by super resolution; this is currently the most widely used approach to study the properties of liquid metal specimens.

Network Structure
The single-core multiscale residual network (SCMSRN) super-resolution model proposed in this work is inspired by the VGG [14] architecture. The VGG architecture uses small convolutional cores of the same size stacked on top of each other instead of large convolutional cores; given the same receptive field, this approach increases network depth and improves the efficiency of the neural network. In this paper, a super-resolution reconstruction network based on the SCMSRN is proposed. The main structure of the network comprises gross feature extraction, hierarchical feature fusion, sub-pixel up-sampling and reconstruction layers. The structure and process are shown in Figure 1. First, the low-resolution (LR) image is input into the gross feature extraction module, where a 3 × 3 × 64 transition convolutional layer is used to extract the LR image features of the metal sample melt, and the number of channels in the LR image is expanded for subsequent multiscale feature extraction. The hierarchical feature fusion module consists of M cascaded SCMSRBs, and the optimal value of M is obtained from the super-resolution reconstruction experiment on the melt image of the liquid metal sample, which is described in detail later. Four residual 3 × 3 convolution kernels are cascaded in each SCMSRB, and feature maps at five scales, namely 1 × 1, 3 × 3, 5 × 5, 7 × 7, and 9 × 9, are extracted and fused to form the local fusion feature map within the residual block. Second, the output feature map of each SCMSRB is extracted and fused through the residual network to form a global fusion feature map. Finally, the global fusion feature map is up-sampled by the sub-pixel convolutional layer, and the high-resolution image is obtained by the reconstruction layer.

Gross Feature Extraction
A 3 × 3 × 64 transition convolutional layer is used to increase the number of channels in the LR image before the LR image is input to the SCMSRB. The operation of this layer is represented by the following mathematical formula.
Here, * denotes a convolution computation, $I_{LR}$ denotes an LR liquid metal sample melt image after interpolation amplification, and $F_0$ denotes the features extracted by the transition convolutional layer.
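The definitions above are consistent with a single convolution applied to the input image. Under that reading, which is an assumption rather than a formula recovered from the source, and with $W_0$ and $B_0$ as hypothetical names for the weights and bias of the transition layer, the layer would compute:

```latex
F_0 = W_0 * I_{LR} + B_0
```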

Hierarchical Feature Fusion
In a convolutional neural network, the receptive field [15,16] is defined as the region in the input image that provides input to the feature map pixels of each convolutional layer. As network depth increases, the receptive field of the CNN increases in size, and the extracted feature maps also gradually become larger. The receptive field of the nth convolutional layer is given by the following equation.
$$L_n = L_{n-1} + (f_n - 1) \prod_{i=1}^{n-1} S_i$$
Here, $L_n$ is the size of the receptive field of the nth convolutional layer, $L_{n-1}$ is the size of the receptive field of the (n−1)th convolutional layer, $f_n$ is the size of the convolutional core in the nth convolutional layer, and $\prod_{i=1}^{n-1} S_i$ is the cumulative stride of the first n−1 convolutional layers. A shallow convolutional layer will have a small receptive field, while a deep convolutional layer will have a large receptive field. Hence, by fusing feature maps from every convolutional layer, one can process features on multiple scales. Therefore, this paper presents a hierarchical feature fusion structure that not only compresses the network scale but also enhances the information flow and feature reuse among the layers of the network, thereby allowing the network to extract more detailed information. This structure is shown in Figure 1 and includes local feature fusion (LFF) and global feature fusion (GFF). LFF is mainly used for feature fusion within each SCMSRB and is expressed by the following mathematical formula.
Here, $F_{LFF}$ represents local feature fusion, and $F_{SCB}(\cdot)$ represents the use of an SCMSRB to extract the local fusion feature. $F_0$ represents the output feature map of the gross feature extraction module. GFF is used to fuse the local features extracted from the M SCMSRBs. After fusion, a 1 × 1 × 64 convolution kernel is applied to the initial input feature, $F_0$, together with the fused outputs of the M SCMSRBs, to reduce dimensionality and avoid producing too many parameters. It is expressed by the following mathematical formula.
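The receptive-field recurrence described in this section (each layer adds its kernel size minus one, scaled by the cumulative stride of the layers before it) can be checked numerically. The following is a minimal sketch; the function name `receptive_fields` and the starting value of one pixel for the input are assumptions made for illustration. For four cascaded 3 × 3 layers with stride 1, it yields the 3 × 3, 5 × 5, 7 × 7 and 9 × 9 scales that the SCMSRB fuses.

```python
def receptive_fields(kernel_sizes, strides):
    """Return the receptive field size after each convolutional layer.

    Applies L_n = L_{n-1} + (f_n - 1) * prod(S_1 .. S_{n-1}), starting from
    a receptive field of 1 (a single input pixel).
    """
    if len(kernel_sizes) != len(strides):
        raise ValueError("kernel_sizes and strides must have the same length")
    fields = []
    field = 1               # receptive field before any convolution
    cumulative_stride = 1   # product of the strides of all previous layers
    for kernel, stride in zip(kernel_sizes, strides):
        field += (kernel - 1) * cumulative_stride
        fields.append(field)
        cumulative_stride *= stride
    return fields


# Four cascaded 3x3 convolutions with stride 1, as in one SCMSRB.
print(receptive_fields([3, 3, 3, 3], [1, 1, 1, 1]))  # [3, 5, 7, 9]
```

Each intermediate output of the cascade therefore sees the input at a different scale, which is why concatenating the intermediate outputs gives a multiscale feature map without any kernel larger than 3 × 3.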

Single-Core Multiscale Residual Block (SCMSRB)
In existing super-resolution algorithms, multiscale feature extraction is usually performed using convolutional cores of varying size. For instance, the multiscale residual network (MSRN) model proposed by Li et al. and the GoogLeNet Inception architecture [12] both use convolutional cores of varying size (e.g., 1 × 1, 3 × 3 and 5 × 5) for feature extraction from LR images. Large convolutional cores are also used in this process. The number of parameters in each convolutional layer, G, is given by the following equation.
$$G = K \times K \times C \times D + B$$
Here, K is the size of the convolutional cores in the convolutional layer, C is the number of channels in the LR image, D is the number of convolutional cores in the convolutional layer, and B is the number of bias terms, which is equal to D. If C and D are fixed, G grows with the size of the convolutional cores (approximately in proportion to K × K). Hence, the larger the convolutional cores, the greater the number of parameters that must be computed in each convolutional layer.
Factorized convolution [17,18] is the decomposition (factorization) of large convolutional cores into a number of small, connected convolutional cores of the same size. The purpose of this process is to reduce the number of parameters in the convolutional cores and the computational complexity of the algorithm. In this way, a (2K + 1) × (2K + 1) × D convolutional core can be factorized into K connected 3 × 3 × D convolutional cores. Given an input feature map with C channels, if the unfactorized convolutional core is $F \in \mathbb{R}^{(2K+1) \times (2K+1) \times D}$ and $M \in \mathbb{R}^{3 \times 3 \times D \times K}$ is the factorized convolutional core, factorization decreases the number of parameters by an amount H that grows with the size of the unfactorized convolutional core, F. In other words, the larger the convolutional core, the greater the decrease in the number of parameters.
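The saving from factorization can be worked out with the per-layer parameter count defined earlier (K × K × C × D weights plus D biases). The sketch below is illustrative only: the function names are hypothetical, and it assumes that every layer, large or small, has the same number of input and output channels (64 here), which the source does not state explicitly for this comparison.

```python
def conv_params(kernel, in_channels, out_channels):
    """Parameters of one conv layer: kernel*kernel*C*D weights plus D biases."""
    return kernel * kernel * in_channels * out_channels + out_channels


def factorization_saving(k, channels):
    """Parameters saved by replacing one (2k+1)x(2k+1) layer with k 3x3 layers.

    Assumption: every layer has `channels` input and `channels` output maps.
    """
    large = conv_params(2 * k + 1, channels, channels)
    small = k * conv_params(3, channels, channels)
    return large - small


# Kernel sizes 3, 5, 7 and 9 correspond to k = 1, 2, 3 and 4.
for k in (1, 2, 3, 4):
    print(2 * k + 1, factorization_saving(k, 64))
```

Under these assumptions the saving is zero for a 3 × 3 core and increases with each larger core, which matches the statement that larger cores benefit more from factorization.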
Based on multicore multiscale residual blocks (MCMSRBs), we propose the single-core multiscale residual block (SCMSRB). The architectures of the MCMSRB and its improved derivative, the SCMSRB, are shown in Figures 2 and 3, respectively. In the SCMSRB, N small and identically sized cascaded-residual convolutional cores are used for multiscale feature extraction. The extracted features are then fused by a concatenation operation and reassembled in one dimension, thus enabling multiscale feature extraction from LR images of liquid metal specimens. Each convolutional core generates 64 convolutional feature maps; after the feature maps have been concatenated, a 1 × 1 × 64 convolutional core is used to perform feature mapping and dimension reduction. The multiscale fused feature map from the SCMSRB is then used as input for the next SCMSRB, which extracts additional detail from the LR image. To improve the expressivity of the network model and increase its nonlinearity, a PReLU activation layer is placed after the convolutional layers. The inside of the SCMSRB is expressed by the following mathematical formula.
Here, $f_{1 \times 1}$ and $f_{3 \times 3}$ represent feature extraction by the 1 × 1 and 3 × 3 convolutional cores in the SCMSRB, respectively, while N is the number of convolutional cores in the SCMSRB.
Here, $F_{SCB}^{M}$ represents the features extracted by the Mth SCMSRB after reduction in dimensionality by the convolutional layer. W and B represent the weights and bias of the 1 × 1 convolutional layer, respectively.

Upsampling Construction
To generate the global fused feature map, a residual network is used to fuse the LR image with the fused feature map from all SCMSRBs. This is input into the subpixel convolution layer [8] for upsampling to obtain the high-resolution (HR) image. Since the subpixel convolution layer simply rearranges the pixels and does not perform any true convolution operations, this arrangement improves the efficiency of the network. The computations performed by the subpixel convolution layer are shown below.
Here, PS denotes the dimension transformation operation, which transforms the $H \times W \times C \cdot r^2$ feature map into the $rH \times rW \times C$ HR image. W and B represent the network weight parameter and bias vector, respectively. $\phi$ represents the tanh activation function, which performs nonlinear operations on the up-sampling output.
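The dimension transformation PS described above is a pure rearrangement of values, which a short sketch can make concrete. The code below is a minimal illustration on nested Python lists, not the implementation used in the paper; the function name `pixel_shuffle` is hypothetical, and the channel layout (channel index `(i*r + j)*C + c` holds sub-pixel offset `(i, j)` of output channel `c`) is one common convention assumed here.

```python
def pixel_shuffle(feature_map, r):
    """Rearrange an H x W x (C*r*r) nested list into an rH x rW x C one.

    Assumed channel layout: channel (i*r + j)*C + c stores the value for
    sub-pixel offset (i, j) and output channel c.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    depth = len(feature_map[0][0])
    if depth % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    channels = depth // (r * r)
    output = [[[0] * channels for _ in range(width * r)]
              for _ in range(height * r)]
    for h in range(height):
        for w in range(width):
            for i in range(r):
                for j in range(r):
                    for c in range(channels):
                        source = (i * r + j) * channels + c
                        output[h * r + i][w * r + j][c] = feature_map[h][w][source]
    return output


lr_features = [[[1, 2, 3, 4]]]        # 1 x 1 x 4, so C = 1 when r = 2
print(pixel_shuffle(lr_features, 2))  # [[[1], [2]], [[3], [4]]]
```

No multiplications are involved, only index arithmetic, which is the reason the text gives for the efficiency of this layer.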

Reconstruction
The up-sampled feature map is reconstructed using a 3 × 3 × 3 convolutional layer to obtain a reconstructed image, $I_{HR}$. It is expressed by the following mathematical formula.
Here, W and B represent the convolution kernel and bias used by the 3 × 3 reconstruction layer, respectively, and $I_{HR}$ represents the reconstructed HR image.

Experimental Environment
The computer used for this experiment was equipped with an Intel Core i7-9750H CPU running at 2.60 GHz, NVIDIA GeForce GTX 1070 GPU, and 16 GB of RAM. The software environment consisted of the 64-bit Windows 10 operating system, CUDA Toolkit 8.0, CUDNN 7.6.5, and the TensorFlow deep learning framework.

Training Details
The Adam optimization algorithm [19,20] was used to optimize the model's parameters during the training phase, with its momentum and weight decay parameters set to 0.9 and 0.0001, respectively. Training was performed with a fixed learning rate of 0.0001, and the training process was terminated when the training loss reached 0.00001. "SAME" padding was used during the training process to ensure that the size of the feature maps remained invariant.
Mach. Learn. Knowl. Extr. 2021, 3 460
As the aim of the training process was to learn an end-to-end mapping between LR and HR images, the training set was divided into N LR-HR image pairs, that is, $\{I_{LR}, I_{HR}\}$, and the network was made to learn the residual image between the LR and HR images. Parameter learning was performed using the mean square error (MSE) [20] between the output image and the residual image of the image pair. MSE is given by the following equation.
$$L(\theta) = \frac{1}{N} \sum_{i=1}^{N} \left\| F(x_i, \theta) - y_i \right\|^2$$
Here, L is the loss function, θ denotes the parameters to be trained, N is the total number of training pairs, and $F(x_i, \theta)$ and $y_i$ represent the reconstructed image and the corresponding HR image, respectively.
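The loss described above averages a squared error over the N training pairs. A minimal sketch in plain Python follows; the function name `mse_loss`, the flat-list image representation, and the 1/N normalisation are assumptions made for illustration and are not taken from the source.

```python
def mse_loss(predictions, targets):
    """Mean over N image pairs of the squared L2 distance between images.

    Each image is a flat list of pixel values. The 1/N normalisation is an
    assumption of this sketch.
    """
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("need the same non-zero number of predictions and targets")
    total = 0.0
    for predicted, target in zip(predictions, targets):
        total += sum((p - t) ** 2 for p, t in zip(predicted, target))
    return total / len(predictions)


# Two toy "images" of two pixels each; only one pixel differs, by 0.5.
loss = mse_loss([[0.5, 0.5], [1.0, 0.0]], [[0.5, 0.0], [1.0, 0.0]])
print(loss)  # 0.125
```

In the training procedure described in this section, a value like this would be compared against the stopping threshold of 0.00001 after each update.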

Evaluation Criteria
Two objective metrics, the peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM), are used to evaluate the efficacy of several super-resolution algorithms in the reconstruction of liquid-metal droplet images. PSNR represents the fidelity of the reconstructed image with respect to the original image; the higher the PSNR value, the lower the loss in fidelity and the greater the image quality. SSIM is an objective measure of similarity based on three characteristics: luminance, contrast, and structure. Unlike PSNR, which compares images in a pixel-by-pixel fashion, SSIM can be used to quantify structural differences between a pair of images. The greater the SSIM value, the more similar the images and the higher the image quality. The equations for PSNR and SSIM are shown below.
$$\mathrm{PSNR} = 10 \log_{10} \frac{(2^n - 1)^2}{\mathrm{MSE}}$$
where n is the number of bits per pixel; for 8-bit grayscale images, the maximum pixel value $2^n - 1$ is 255.
$$\mathrm{MSE} = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} \left( X(i,j) - Y(i,j) \right)^2$$
where X represents the reconstructed HR image, Y represents the original HR image, and H and W are the image height and width in pixels.
$$\mathrm{SSIM}(X, Y) = \frac{2\mu_X \mu_Y + C_1}{\mu_X^2 + \mu_Y^2 + C_1} \cdot \frac{2\sigma_X \sigma_Y + C_2}{\sigma_X^2 + \sigma_Y^2 + C_2} \cdot \frac{\sigma_{XY} + C_3}{\sigma_X \sigma_Y + C_3}$$
Here, $\mu_X$ and $\mu_Y$ are the mean values of the images, $\sigma_X^2$ and $\sigma_Y^2$ are their variances, and $\sigma_{XY}$ is their covariance. $C_1$, $C_2$ and $C_3$ are constants that prevent the denominators from being zero. The default interval of the SSIM value is [0, 1]. The closer the SSIM value is to 1, the closer the reconstructed image is to the HR image.
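Both metrics can be sketched in a few lines of plain Python. The code below is an illustration under stated assumptions, not the evaluation code used in the paper: images are flat lists of pixel values, SSIM is computed once over the whole image rather than over sliding windows, the constants $(0.01 \cdot 255)^2$ and $(0.03 \cdot 255)^2$ are conventional defaults that the source does not specify, and $C_3 = C_2 / 2$ is assumed so that the contrast and structure terms collapse into a single factor. The function names are hypothetical.

```python
import math


def psnr(reconstructed, reference, bits=8):
    """PSNR in dB between two equally sized flat pixel lists."""
    mse = sum((x - y) ** 2 for x, y in zip(reconstructed, reference)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    peak = 2 ** bits - 1
    return 10 * math.log10(peak ** 2 / mse)


def global_ssim(x, y, bits=8):
    """SSIM computed once over the whole image (no sliding window).

    Uses the simplified two-factor form that follows from C3 = C2 / 2.
    """
    peak = 2 ** bits - 1
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2  # conventional constants
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return ((2 * mean_x * mean_y + c1) * (2 * cov + c2)) / (
        (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2))


reference = [10, 20, 30, 40]
print(psnr([11, 21, 31, 41], reference))  # every pixel off by one grey level
print(global_ssim(reference, reference))  # identical images
```

Published SSIM figures are normally averaged over local windows, so values from this whole-image variant are not directly comparable with those in the result tables.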

Objective Index Analysis
Reconstruction experiments were performed on liquid-metal droplet images, and the results were evaluated using the two aforementioned objective metrics. Tables 1 and 2 compare the results of our SCMSRN algorithm with those of mainstream super-resolution algorithms, including the BICUBIC [21], SRCNN, ESPCN, VDSR and MSRN algorithms, as well as the multicore MSRN (MCMSRN) algorithm shown in Figure 2. The data shown in the tables are averages over all the images that were reconstructed by a given algorithm. Figure 5 illustrates the training convergence curves of the SCMSRN, MSRN and MCMSRN algorithms with a test set that consisted of liquid-metal droplet images with a magnification factor of ×2. Based on Tables 1 and 2, our SCMSRN algorithm outperformed all other algorithms at all magnification factors. Compared to the similarly sized MSRN and MCMSRN algorithms, the PSNR of our SCMSRN algorithm was 2.03 dB and 1.58 dB higher when the magnification factor was two, 0.08 dB and 0.04 dB higher when the magnification factor was three, and 0.21 dB and 0.2 dB higher when the magnification factor was four. The improvement in PSNR was most pronounced when the magnification factor was two, and the quality of the reconstruction was also highest at this level of magnification.
In Figure 5, it is shown that the introduction of the SCMSRB increased the speed of convergence and reconstruction performance. After 25 epochs of training, the PSNR of the SCMSRN algorithm reached a stable value. The value of the loss function also stopped decreasing after this point.

Visual Effects Analysis
To provide a more intuitive illustration of the results shown by the performance metrics, the image reconstructions were also analyzed in terms of subjective visual quality. Figure 6 shows the reconstructed and locally magnified images of a 2×-downsampled liquid metal droplet photograph. Comparing the reconstructed images in Figure 6, it is clear that the BICUBIC algorithm was outperformed by all deep learning-based image super-resolution algorithms, as the BICUBIC reconstructions are quite blurry. Compared to the BICUBIC result, the SRCNN algorithm greatly improved image clarity but had a much fuzzier background than the original HR image. The ESPCN and VDSR algorithms were better in terms of background clarity, and the overall clarity of their images was a significant improvement over that of the SRCNN algorithm. The MSRN, MCMSRN and SCMSRN algorithms were able to reconstruct much of the high-frequency detail, and their outputs are a significant improvement over those of the ESPCN and VDSR algorithms in terms of clarity and the definition of the droplet's edges. In terms of subjective visual quality, our algorithm was able to reconstruct the edges of the droplet more clearly than all other algorithms and produce a result that strongly resembles the original HR image.
For the reconstructed high-resolution images, the Canny operator is used to extract the image contour, and the diameter and area error ratios between each algorithm's reconstructed image and the original high-resolution image are compared by counting pixels, so as to verify each algorithm's effectiveness in the super-resolution reconstruction of melt images of liquid metal samples. Figure 7 shows that the algorithm is able to rebuild the liquid metal melt samples. From the contours extracted by the Canny operator, it can be seen that this algorithm rebuilds the image contour information with a richer, more complete outline of the small aperture. This algorithm adopts the hierarchical feature fusion mechanism, which allows more detail to be extracted from the liquid metal melt sample image. It can be seen from Table 3 that the diameter and area of the deeply undercooled melt can be measured accurately from the melt image reconstructed by the algorithm in this paper. Relative to the original image, the diameter error of the algorithm in this paper is only 0.0103, and the area error is only two, which is the best among all the compared algorithms.
By comparing a multitude of model architectures in terms of their efficacy in the super resolution of liquid-metal droplet images, it was found that the number of SCMSRBs was a critical factor for image quality. Network models with 1-8 SCMSRBs were trained to extract and merge 1 × 1, 3 × 3, 5 × 5, 7 × 7 and 9 × 9 residual maps, with each SCMSRB having four cascaded-residual 3 × 3 convolutional cores. To ensure the fairness of the experimental results, each model was trained on the training set described in Section 3.2, and its performance was tested on the test set. In the training stage, the learning rate was set to 0.0001, and the number of iterations was 100. Figure 8 illustrates how the number of SCMSRBs affects the super resolution of liquid-metal droplet images by the SCMSRN algorithm, with magnification factors of two, three, and four.
When the network model was cascaded with only one single-core multiscale residual block and the learning rate was fixed, the peak signal-to-noise ratios (PSNRs) on the test set of deep undercooling melt images were 41.23 dB, 37.75 dB and 36.12 dB at magnification factors of two, three, and four, respectively. This shows that multiscale feature fusion is a highly effective approach for the super resolution of liquid-metal droplet images.
When eight SCMSRBs are cascaded, the PSNR value of the sample melt image reconstructed by the model continues to decrease, regardless of the magnification factor. This might be attributed to the excessive depth of the model causing convergence difficulties during the training phase; the training loss of this network model only reached 0.00001 after almost 300 epochs of training.
Considering the reconstruction results at all three magnification factors together, the PSNR value of the reconstructed liquid-metal droplet image reaches its peak at each magnification factor when four single-core multiscale residual blocks are cascaded. Therefore, the number of single-core multiscale residual blocks, M, is set to four for the best reconstruction effect.

Performance Analysis
To quantify the efficiency gains that were obtained by the introduction of factorized convolution, MCMSRN and SCMSRN network models were constructed based on the basic modules shown in Figures 2 and 3, respectively, and then assessed in terms of computational efficiency. The metrics used to measure computational efficiency were the number of model parameters (Params), the number of floating-point operations (FLOPs) [21], and training time. The Params metric evaluates the size of the network model, that is, its spatial complexity; the higher the Params value, the greater the spatial complexity. The FLOPs metric assesses the computational complexity of the network model, that is, its time complexity. The lower the FLOPs, the lower the time complexity. Training time is defined as the time taken for the loss function to reach 0.00001 during the training phase. In Figure 5b, it is shown that model loss stabilizes after reaching 0.00001, which indicates that convergence occurs at this point.
In Table 4, it is shown that the SCMSRN algorithm reduced Params by 75%, FLOPs by 75%, and training time by 18 min compared to MCMSRN. Hence, it has been experimentally demonstrated that the SCMSRN algorithm is able to outperform the MCMSRN algorithm in terms of reconstruction quality (while using a smaller number of parameters) and training efficiency.
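The FLOPs metric can be illustrated for the factorization that underlies the SCMSRB. The sketch below compares a single 9 × 9 layer with four cascaded 3 × 3 layers; it is not meant to reproduce Table 4. The function name `conv_flops`, the 64-channel width, the 32 × 32 feature map, and the convention of counting one multiply and one add per weight per output pixel are all assumptions made for illustration.

```python
def conv_flops(kernel, in_channels, out_channels, height, width):
    """FLOPs of one stride-1, SAME-padded conv layer on a height x width map.

    Counts one multiply and one add per weight per output pixel (an assumed
    convention); bias additions are ignored.
    """
    return 2 * kernel * kernel * in_channels * out_channels * height * width


single_9x9 = conv_flops(9, 64, 64, 32, 32)
four_3x3 = 4 * conv_flops(3, 64, 64, 32, 32)
print(single_9x9, four_3x3, four_3x3 / single_9x9)
```

Under these assumptions the cascaded 3 × 3 layers need 36/81 of the operations of the 9 × 9 layer while covering the same receptive field; the overall reduction for a full network also depends on how the larger branches of the MCMSRB are arranged.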

Conclusions
This work proposes an image super-resolution network model for photographic images of liquid metal specimens (droplets), which uses factorized convolution to reduce the large number of model parameters that result from the use of large convolutional cores for multiscale feature extraction and, thus, improves training efficiency for network models of this type. Single-core multiscale residual blocks are created based on the ideas of residual networks, and they improve the performance of the network model by reducing its number of parameters while enabling the extraction of features with different scales and ensuring sufficient network depth. In liquid-metal droplet image reconstruction experiments, our network model outperformed the state-of-the-art super-resolution models it was compared with, in terms of PSNR and SSIM, at three different magnification factors. In the subjective assessment, our network model was able to clearly reconstruct the edges of the liquid metal droplet. The diameter and area that were calculated from the resulting profile were very similar to those derived from the original high-resolution image. Hence, our image super-resolution algorithm can provide accurate data for molten samples poised in simulated gravity-less or microgravity environments, which is significant for the study of novel metastable materials. In the future, we will incorporate an attention mechanism in the design of our network architecture to improve its performance in image reconstruction and, thus, reduce errors in diameter and area measurements.