Exponential-Distance Weights for Reducing Grid-like Artifacts in Patch-Based Medical Image Registration

Patch-based medical image registration has been well explored in recent decades. However, the patch fusion process can generate grid-like artifacts along the edges of patches for two reasons: firstly, zero-padding, which is used to ensure that the input and output have the same size, causes uncertainty at the edges of the output feature maps during feature extraction; secondly, extracting patches with a sliding window at different strides results in different degrees of grid-like artifacts. In this paper, we propose an exponential-distance-weighted (EDW) method to remove grid-like artifacts. To account for the uncertainty of predictions near patch edges, we used an exponential function to convert the distance from each point in the overlapping regions to the patch center into a weighting coefficient. This gives lower weights to areas near patch edges, down-weighting uncertain predictions. Finally, the dense displacement field was obtained by this EDW weighting method. We used the OASIS-3 dataset to evaluate the performance of our method. The experimental results show that the proposed EDW patch fusion method removed grid-like artifacts and achieved dice similarity coefficients superior to those of several state-of-the-art methods. The proposed fusion method can be used together with any patch-based registration model.


Introduction
Medical image registration aims to generate a dense displacement field (DDF) to accurately register a pair of images and to spatially align anatomical structures [1]. It is a fundamental procedure in various medical image analysis tasks [2]. However, finding the best DDF requires many optimization iterations between the two images, so traditional algorithms have a high time cost [3].
With the rapid development of deep learning, learning-based medical image registration methods have become common in recent years. They imitate the process of traditional image registration methods and quickly predict the DDF of two unseen images using a trained deep neural network. Patch-based training is less affected by a shortage of training data, since many image patches can be sampled from the original images. In addition, patch-based training usually performs better locally than whole-image-based training [4]. One challenge in patch-based image registration is the patch fusion process, which stacks many image patches to generate the final whole-image transformation. This patch fusion process can generate grid-like artifacts along the edges of the patches. To address this problem, Yang et al. [5] introduced Quicksilver, a fast deformable image registration method; during inference, they provided a probabilistic version which can calculate uncertainties in the predicted deformations. Cao et al. [6] proposed a deformable registration method based on a cue-aware deep regression network. In the application stage, the first step was to extract patches from image pairs using a key-point sampling strategy; then, the DDF patch was predicted by the deep regression network; finally, the whole DDF was obtained by block-wise thin-plate spline (TPS) interpolation. Fan et al. [7] introduced a dual-guided, fully convolutional network for brain image registration, which estimates the DDF only in the central region: the input image pair was 64 × 64 × 64 voxels, and the output was a 24 × 24 × 24 DDF patch. Hu et al. [8] used a stride of four to generate patches, which produces a smoother DDF at inference; at the same time, to further increase the smoothness of the DDF and to ensure that it contained enough neighborhood information, surface-discarding was adopted.
Finally, the whole DDF was obtained by the arithmetic average weighted (AAW) method.
Most of these methods remove grid-like artifacts from DDFs by using probabilistic models, a small stride, or by estimating only the central region. However, they cannot quickly obtain an artifact-free DDF at a large stride. Hence, we applied an exponential-distance-weighted (EDW) scheme to the patch fusion process. Compared with the fusion methods used in the above literature, the EDW patch fusion strategy proposed in this paper achieves better performance at larger strides in a shorter time. Our code is freely available at https://github.com/LiangWUSDU/EDW (accessed on 24 September 2021). The main contributions are summarized as follows:

1.
We calculate the relative weight of each point's prediction within each patch using an exponential function of the distance from that point to the patch center. This allows the predictions from all overlapping patches to be fused, while giving lower weight to predictions made near patch edges.

2.
The proposed patch fusion method can be used with any patch-based deep learning registration model, without modification, to significantly improve network predictions.

Grid-like Artifacts
There are two main reasons for the emergence of grid-like artifacts: (1) the use of zero-padding in the feature extraction process; and (2) the stride used when extracting patches with a sliding window in the testing stage.
Deep neural networks (DNNs) have been successfully applied to advance the state of the art in many segmentation, registration, and classification tasks. Among their operations, convolution is often used as an effective method for feature extraction. By sliding a filter over the input feature maps, the dot product is taken between the filter and parts of the input; the output is a new set of feature maps. As shown in Figure 1, in 2D convolution with a stride of 1, a 5 × 5 input feature map convolved with a 3 × 3 kernel produces a 3 × 3 output feature map. In some image processing tasks, such as image registration and segmentation, the output and input should have the same size. Zero-padding and transposed convolution are the two most popular approaches. As shown in Figure 2a, zero-padding allows us to control the size of the feature map by padding with zeros so that the output size matches the input size. In Figure 2b, transposed convolution is the reverse of normal convolution, but only regarding size; to ensure the output has the same size as the input, it is also necessary to pad with zeros before convolution. This kind of zero-padding increases the uncertainty of the edge regions, and the same conclusion holds for transposed convolution. In the patch-based image registration method, because the image features at the patch edges are incomplete, the prediction at the patch edges is less accurate. In addition, extracting patches using sliding windows with different strides also produces different degrees of grid-like artifacts. In Figure 3a, when the stride equals the patch size, there is no overlap between patches, which leads to severe grid-like artifacts after fusion, due to the inconsistent information at each patch edge. When the stride is smaller than the patch size, the AAW is generally used for the overlapping regions.
The weights of the overlapping regions are the same no matter how far from the center they are, so the same grid-like artifacts still appear, as shown in Figure 3b. In the patch-based training method, the central region is more informative than the edge region, and it is not reasonable to use the same weight for both. Therefore, this study set the weight using an exponential function calculated from the distance between each point in the patch and the center point.
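The output-size arithmetic behind Figures 1 and 2 can be checked with a small helper. This is an illustrative sketch (the function name is ours, not from the paper):

```python
def conv_output_size(n, k, stride=1, pad=0):
    """Output size of a convolution along one axis:
    floor((n + 2*pad - k) / stride) + 1."""
    return (n + 2 * pad - k) // stride + 1

# 5x5 input, 3x3 kernel, stride 1, no padding -> 3x3 output (Figure 1)
print(conv_output_size(5, 3))          # 3
# zero-padding of 1 keeps the output the same size as the input (Figure 2a)
print(conv_output_size(5, 3, pad=1))   # 5
```

The padded border contributes zeros rather than real image content, which is exactly the source of the edge uncertainty discussed above.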

Distance Functions
Some common distance functions in the field of image processing are the Euclidean distance [9], the Manhattan distance [9], and the Chebyshev distance [10]. These three functions are all metrics that compute a distance value from two data points, and they are widely used in medical image processing [11,12]. Hence, in this work, we evaluated these three distance functions. The patch size is h × w × c. In the patch coordinate system, the coordinate of the center point of the patch is ((h − 1)/2, (w − 1)/2, (c − 1)/2).

Euclidean Distance
Euclidean distance is the most widely used distance metric. The Euclidean distance d_e between two points in Euclidean space is the length of the line segment between them. For a point (i, j, k) in the patch:

d_e = √[(i − (h − 1)/2)² + (j − (w − 1)/2)² + (k − (c − 1)/2)²] (1)

Manhattan Distance
Manhattan distance is calculated as the sum of the absolute differences between the coordinates of the two points. The Manhattan distance d_m of a point (i, j, k) to the center point is:

d_m = |i − (h − 1)/2| + |j − (w − 1)/2| + |k − (c − 1)/2| (2)

Chebyshev Distance
The Chebyshev distance between two points is the maximum absolute difference between the coordinates of the points. The Chebyshev distance d_c of a point (i, j, k) to the center point is:

d_c = max(|i − (h − 1)/2|, |j − (w − 1)/2|, |k − (c − 1)/2|) (3)

In Figure 4b–d, the deep blue color indicates a large distance from the center point, and the deep yellow color indicates a small distance. It can be seen that all three distance functions decrease toward the center point. In addition, they increase from the center point to the surrounding area in a square, star, and circle manner, respectively. Additionally, in the heat map of the AAW method, the colors are the same for all positions, indicating that each position receives the same weight. From the frequency-domain analysis (Figure 5), the amplitude of the AAW method oscillates, indicating that it still has many high-frequency components. From the amplitude of the Chebyshev distance function, we observed that it also retains a small number of high-frequency components. Therefore, these two methods would still produce grid-like artifacts on DDFs. The amplitudes of the Manhattan distance and Euclidean distance are relatively stable, which achieves the purpose of removing grid-like artifacts while preserving the original information.
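The three distance maps of Equations (1)–(3) can be computed voxelwise with NumPy. This is an illustrative sketch (the function name is ours):

```python
import numpy as np

def distance_maps(h, w, c):
    """Distance from every voxel of an h x w x c patch to the patch
    center ((h-1)/2, (w-1)/2, (c-1)/2), for the three metrics."""
    center = np.array([(h - 1) / 2, (w - 1) / 2, (c - 1) / 2])
    grid = np.stack(np.meshgrid(np.arange(h), np.arange(w), np.arange(c),
                                indexing="ij"), axis=-1)
    diff = np.abs(grid - center)                 # per-axis |coordinate - center|
    d_e = np.sqrt((diff ** 2).sum(axis=-1))      # Euclidean, Equation (1)
    d_m = diff.sum(axis=-1)                      # Manhattan, Equation (2)
    d_c = diff.max(axis=-1)                      # Chebyshev, Equation (3)
    return d_e, d_m, d_c

d_e, d_m, d_c = distance_maps(5, 5, 5)
print(d_e[2, 2, 2], d_m[0, 0, 0], d_c[0, 0, 0])  # 0.0 6.0 2.0
```

Plotting a central slice of each map reproduces the circle, star, and square iso-contours described for Figure 4.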

Figure 5. Plot of the weights of the AAW method and the exponential distance weights (a) and the magnitude of the corresponding Fourier transformation (b) for the 32nd row of the 33rd slice. From left to right: AAW, exponential Chebyshev-distance-weighted (ECDW), exponential Manhattan-distance-weighted (EMDW), and exponential Euclidean-distance-weighted (EEDW). From the amplitude of the FFT, we can see that the AAW method and ECDW retain some high-frequency information and cannot remove the grid-like artifacts to a greater degree.

The Process of Patch Fusion
In our work (as illustrated in Figure 6), we extract overlapping patches via a sliding window with a stride of s.

Figure 6. The pipeline of patch fusion in the test phase. The size of both the moving image and the fixed image is H × W × C, and overlapping patches of size h × w × c are extracted using a sliding window with stride s. The DDF patch of size h × w × c × 3 is obtained by the trained registration model. The output DDF patch is located at the same location as the input patch. Finally, the whole DDF is obtained by the EDW method.

An MR image can generate m patches, which can be calculated as follows:

m = (⌊(H − h)/s⌋ + 1) × (⌊(W − w)/s⌋ + 1) × (⌊(C − c)/s⌋ + 1) (4)

n is the number of patches overlapping at a point (i, j, k), with n = n_i × n_j × n_k, where n_i is the number of overlaps along the i-axis; n_j and n_k can be calculated in the same way. According to the distance map of each patch, we can determine the distance d_t from the point (i, j, k) to the center of each of the n overlapping patches. Due to the uncertainty of the edges, we use the following equation to obtain the normalized weights:

ω_{i,j,k}^{t} = e^{−d_t} / Σ_{t=1}^{n} e^{−d_t} (6)

where ω_{i,j,k}^{t} is the weight of the t-th overlapping patch at the point (i, j, k), t ∈ [1, n], n is the number of patches overlapping the point (i, j, k) when extracting patches with a stride of s, and e (≈2.71828) is the natural constant, the base of the exponential function. From Figure 7, we observed that the further a point is from the patch center, the smaller its weight is. This gives greater confidence to the central region and improves the final prediction. Hence, the final predicted value at this point is:

φ̂_{i,j,k}^{s} = Σ_{t=1}^{n} ω_{i,j,k}^{t} φ_{i,j,k}^{t} (7)

where φ̂_{i,j,k}^{s} is the final predicted value of the DDF at the point (i, j, k) and φ_{i,j,k}^{t} is the value predicted by the t-th overlapping patch. The whole DDF φ̂ can be obtained by applying the above processing to all voxels.
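The fusion loop of Equations (6) and (7) can be sketched as follows. Note that because the denominator of Equation (6) is shared by all patches at a voxel, it suffices to accumulate e^{−d}-weighted predictions and a separate weight sum, then divide. This is a minimal sketch, assuming the DDF patches have already been predicted; `edw_fuse` is our illustrative name, not code from the paper:

```python
import numpy as np

def edw_fuse(patches, corners, image_shape, dist_map):
    """Fuse overlapping DDF patches with exponential-distance weights.

    patches    : list of h x w x c x 3 predicted DDF patches
    corners    : list of (i, j, k) corner positions of each patch
    image_shape: (H, W, C) of the whole image
    dist_map   : h x w x c distance-to-center map (e.g. Euclidean)
    """
    weight = np.exp(-dist_map)[..., None]   # e^{-d}, broadcast over the 3 channels
    acc = np.zeros(image_shape + (3,))      # sum of weighted predictions
    norm = np.zeros(image_shape + (1,))     # sum of weights (denominator of Eq. (6))
    h, w, c = dist_map.shape
    for patch, (i, j, k) in zip(patches, corners):
        acc[i:i+h, j:j+w, k:k+c] += weight * patch
        norm[i:i+h, j:j+w, k:k+c] += weight
    return acc / norm                       # Eq. (7): weighted average per voxel
```

For voxels covered by a single patch the normalization cancels exactly, so non-overlapping regions keep the raw model prediction unchanged.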

Figure 7. Illustration of the EDW method for two overlapping patches. Points A and B are the center points of patch 1 and patch 2, respectively, and point C is a point in the overlap region. The predicted values of point C in patch 1 and patch 2 are φ_A and φ_B, respectively. d_1 is the distance from point C to point A when point C is in patch 1, and d_2 is the distance from point C to point B when point C is in patch 2. The final predicted value is φ_C. According to Equation (6), the weights of point C in patch 1 and patch 2 are ω_1 = e^{−d_1}/(e^{−d_1} + e^{−d_2}) and ω_2 = e^{−d_2}/(e^{−d_1} + e^{−d_2}), and from Equation (7), φ_C = ω_1 φ_A + ω_2 φ_B.
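The two-patch case of Figure 7 can be worked numerically. The distances and predictions below (d_1 = 1, d_2 = 3, φ_A = 0.8, φ_B = 0.2) are illustrative values we chose, not figures from the paper:

```python
import math

# Illustrative distances from point C to the two patch centers (assumed values)
d1, d2 = 1.0, 3.0
phi_A, phi_B = 0.8, 0.2          # hypothetical predictions of point C in patch 1 / patch 2

w1 = math.exp(-d1) / (math.exp(-d1) + math.exp(-d2))   # Equation (6)
w2 = math.exp(-d2) / (math.exp(-d1) + math.exp(-d2))
phi_C = w1 * phi_A + w2 * phi_B                        # Equation (7)

print(round(w1, 3), round(w2, 3), round(phi_C, 3))     # 0.881 0.119 0.728
```

Point C sits much closer to the center of patch 1, so the fused value is pulled strongly toward φ_A.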

Dataset Description
Experimental data were obtained from The Open Access Series of Imaging Studies (OASIS-3) (https://www.oasis-brains.org, accessed on 15 July 2021) [13]. The OASIS-3 dataset is a longitudinal imaging, clinical, and cognitive dataset covering both normal aging and Alzheimer's disease. It includes 609 cognitively normal subjects and 489 subjects at varying stages of cognitive decline. We randomly selected 800 T1-weighted (T1) subjects for the experiment: 750 subjects for training and 50 subjects for inference, with the MNI-152 brain atlas used as the fixed image. In preprocessing, each subject was linearly aligned to the MNI-152 brain atlas. The final image size was 160 × 192 × 160 voxels with 1 × 1 × 1 mm³ voxel resolution. The dataset also contains segmentation label images of the cerebrospinal fluid (CSF), gray matter (GM), and white matter (WM).

Experimental Details
During training, we extracted 80 patches of size 64 × 64 × 64 from each image with a stride of 32. The total number of patch pairs of MR images was 600,000. During inference, we explored the performance of the proposed fusion method by extracting patches using sliding windows with different strides according to Equation (4), as shown in Table 1. We evaluated registration performance using the dice similarity coefficient (DSC). In addition, to quantify the deformation regularity, we calculated the Jacobian determinant det(Dφ⁻¹) of the DDF; det(Dφ⁻¹) < 0 indicates locations where folding has occurred. The proportion of folded voxels, ρ = (1/V) Σ δ(det(Dφ⁻¹) < 0), where V is the total number of voxels, is computed to evaluate the topology-preserving performance [14]. We conducted experiments on the OASIS-3 dataset to evaluate the performance of the proposed fusion method and to compare it with three state-of-the-art methods: AAW, MIScnn [15] and patchify [16]. Three kinds of deep learning models were trained: (1) VoxelMorph: a typical representative of unsupervised learning-based registration; (2) VoxelMorph (JD): to produce a smoother DDF, a Jacobian constraint loss JD = 0.5(|det(Dφ⁻¹)| − det(Dφ⁻¹)) was added to the loss function of VoxelMorph to reduce folding of the DDF [2]; (3) Label-reg: a weakly-supervised image registration model [1].
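The folding-rate metric ρ can be sketched with finite differences. This is an illustrative implementation under the assumption that the field is stored as a displacement u with φ(x) = x + u(x), so the Jacobian of φ is I + ∇u; `folding_rate` is our name, not code from the paper:

```python
import numpy as np

def folding_rate(u):
    """Fraction of voxels where det(Jacobian of phi) < 0, with
    phi(x) = x + u(x) and u an H x W x C x 3 displacement field."""
    grads = [np.gradient(u[..., a]) for a in range(3)]  # d u_a / d x_b
    J = np.empty(u.shape[:3] + (3, 3))
    for a in range(3):
        for b in range(3):
            J[..., a, b] = grads[a][b]
    J += np.eye(3)                      # Jacobian of phi = I + grad(u)
    det = np.linalg.det(J)
    return float((det < 0).mean())      # rho: proportion of folded voxels

u = np.zeros((8, 8, 8, 3))              # identity transform: no folding
print(folding_rate(u))                  # 0.0
```

A displacement that reverses an axis (e.g. u_x = −2x) makes the determinant negative everywhere, so the metric correctly reports ρ = 1.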

Experimental Results
We used the trained VoxelMorph (JD) model to test the patches extracted by a sliding window with a stride of 16. In our fusion method, the Euclidean distance was chosen as the distance function. Table 2 shows the DSC values of the CSF, GM, and WM, as well as the folding rate, for the four fusion methods. As can be seen from Table 2, our method obtained higher DSC values than the other three methods. However, the ρ values of MIScnn and AAW were lower, and their resulting DDFs were smoother. In Figure 8, we show the DDFs of the different fusion methods. It can be seen that the DDFs of AAW, MIScnn and patchify have obvious seam lines between patches, which indicates that the prediction of the edge region was inaccurate. Since our method considers the uncertainty of the edges, different weights were given to the center and edge regions, which effectively eliminated the seams. From the enlarged red box, we can see that the DDF obtained by our method does not show grid-like artifacts. However, from the red markers, both our method and patchify retain many folding points, which were mainly caused by the prediction model itself, while both the AAW and MIScnn methods changed the predicted values to some extent.

To verify the robustness of the method, we also tested it on the VoxelMorph and Label-reg models. The experimental results are shown in Figure 9. We found that we obtained DDFs without grid-like artifacts under the different models.


Comparisons of the Results with Different Strides
In the AAW method, the stride has a great influence on the patch fusion process. Table 3 shows the DSC, ρ and fusion time of the AAW and proposed methods with different strides. In Table 3, when the stride decreases, the smoothness of the DDF increases, with a smaller ρ; however, the DSC values of the CSF, GM and WM are nearly similar. As the number of patches increases with a small stride, the fusion time increases significantly. Compared with the AAW method, the DSC and ρ values of our method are higher under different strides. We found that the stride had little effect on the ρ obtained by our method, while the AAW method obtained a smoother deformation field with smaller strides. This further proves that our method can robustly maintain the predicted values of the registration model. In addition, the fusion times of the two methods differ little. In Figure 10, we found that the DDFs of our method under different strides removed the grid-like artifacts, but the AAW method still showed a small number of grid-like artifacts, even at a stride of 4 × 4 × 4. We can conclude that to obtain a DDF without grid-like artifacts, the AAW method requires a smaller stride, which significantly increases the fusion time. However, our method can obtain a DDF without grid-like artifacts in a shorter time, even at a larger stride, as well as a higher DSC value than that obtained by the AAW method.


Comparisons of the Results with Different Distance Functions
To explore the effect of the distance function on patch fusion, we chose the Chebyshev, Manhattan, and Euclidean distances for our experiments. The results are shown in Table 4. From Table 4, it can be observed that the distance functions show little difference in the quantitative analysis. We also show the fusion results of the three distance functions in Figure 11. From the top row, we can see that the slices in the middle of the DDFs are free of grid-like artifacts for all three distance functions. However, the Chebyshev distance still produces a significant seam at the edge slice, seen in the bottom row. In Figure 4, when using the Chebyshev distance as the distance function, the distances from the points at the edge to the centroid are equal. This gives the same weight to all predicted values at such a point no matter how many times it overlaps, so grid-like artifacts still appear at the edge. Combined with Figure 5, the Chebyshev distance undulates in the transition from low to high frequencies, which indicates that it is less stable at the edge and retains some high-frequency information. Therefore, with our method, using the Euclidean or Manhattan distance yields a DDF without grid-like artifacts.
Figure 11. DDFs of the three distance functions at slice 100 (a) and slice 160 (b) of the axial plane. The 1st column is the Chebyshev distance, the 2nd column is the Manhattan distance, and the 3rd column is the Euclidean distance.


Comparisons of the Results with Different Weighting Methods
The weighting method is the key to our fusion method. In this section, we compared our method with the inverse-distance weighting (IDW) method, using the Euclidean distance as the distance function. In Figure 12, we found that the IDW approach did not remove the grid-like artifacts. From the corresponding function curves in Figure 12c, it can be seen that in the central region (small distance), both weighting methods give larger weighting coefficients, while in the edge region (larger distance), the IDW method produces larger weights than the EDW method. This indicates that the IDW method does not assign suitable weighting coefficients to the central and edge regions to suppress the edge uncertainty, so its fusion results still contain grid-like artifacts.
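The contrast in Figure 12c can be reproduced by comparing normalized 1/d and e^{−d} weights for a voxel seen near one patch's center and near another patch's edge. The distances below (d = 1 and d = 10) are illustrative values we chose:

```python
import math

def normalize(ws):
    s = sum(ws)
    return [w / s for w in ws]

# A voxel seen by two patches: near the center of one (d = 1) and
# near the edge of the other (d = 10). Distances are illustrative.
d_center, d_edge = 1.0, 10.0

idw = normalize([1.0 / d_center, 1.0 / d_edge])             # inverse-distance weighting
edw = normalize([math.exp(-d_center), math.exp(-d_edge)])   # exponential-distance weighting

print([round(w, 4) for w in idw])   # IDW still gives the edge prediction ~9% weight
print([round(w, 4) for w in edw])   # EDW suppresses it to ~0.01%
```

Because 1/d decays only polynomially while e^{−d} decays exponentially, IDW keeps a non-negligible share of the uncertain edge prediction, which is consistent with the residual artifacts observed in Figure 12.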

Discussion
Zero-padding is a common operation in deep learning networks to keep the input and output sizes the same. However, this operation introduces uncertainty into edge predictions, which leads to grid-like artifacts in the patch fusion process. To solve this problem, we propose an exponential-distance-weighted fusion method. This method uses an exponential function to convert the distance from each predicted point in a patch to the patch center into a weight coefficient: the larger the distance, the smaller the weight. Finally, the predicted DDF value of each voxel is obtained by this weighting method.
We performed experiments on the OASIS-3 dataset. Comparing our proposed method with three different fusion methods, AAW, MIScnn and patchify, our method obtained a seam-free DDF. In the quantitative analysis, our overall DSC across the three types of brain tissue was 0.7784, while the other three methods achieved 0.7661, 0.7632, and 0.7668, respectively; our method is significantly better than these three methods. In addition, compared with AAW and MIScnn, our fusion method did not change the predicted values of the registration model. To demonstrate the robustness and effectiveness of our method, we also validated it on different models, and we examined the DDF results obtained under different strides, distance functions, and weighting methods. However, our method has two shortcomings: (1) although it removes grid-like artifacts, the weighting may alter the model's true prediction results; (2) the DDF obtained by our method is not smooth enough, and a nonzero folding rate remains.

Conclusions
In this paper, we introduced a distance function to reduce grid-like artifacts when performing patch-based image registration. We demonstrated that our proposed EDW method has significant advantages over existing patch fusion methods. Moreover, our method is easy to implement into existing deep learning models, even if they are already trained. In the future, we will use the registration model to learn the weight coefficients of overlapping regions so that the contextual information can be fully considered, and the true prediction results of the model can be preserved. In addition, only the Euclidean distance, Manhattan distance, and Chebyshev distance were selected for experiments in this paper. Next, we will introduce more distance functions, and explore the influence of the distance function used, the network structure and the distance variance on this method.
