Super-Resolution and Feature Extraction for Ocean Bathymetric Maps Using Sparse Coding

The comprehensive production of detailed bathymetric maps is important for disaster prevention, resource exploration, safe navigation, marine salvage, and monitoring of marine organisms. However, owing to observation difficulties, the amount of data on the world’s seabed topography is scarce. Therefore, it is essential to develop methods that effectively use the limited data. In this study, based on dictionary learning and sparse coding, we modified the super-resolution technique and applied it to seafloor topographical maps. Improving on the conventional method, before dictionary learning, we performed pre-processing to separate the teacher image into a low-frequency component that has a general structure and a high-frequency component that captures the detailed topographical features. We learn the topographical features by training the dictionary. As a result, the root-mean-square error (RMSE) was reduced by 30% compared with bicubic interpolation and accuracy was improved, especially in the rugged part of the terrain. The proposed method, which learns a dictionary to capture topographical features and reconstructs them using a dictionary, produces super-resolution with high interpretability.


Introduction
Ocean bathymetric maps provide basic information in various scientific and engineering fields, including geomorphology, physical oceanography, disaster prevention, and resource exploration. Despite its importance, more than three-quarters of the total ocean floor on Earth remains unmapped using detailed measurement methods such as acoustic surveys with a 15 arc-second (~250 m) interval grid, which is the reference resolution of the GEBCO_2021 Grid [1]. Because it is recognised as a global issue, several international and domestic projects, such as the "Nippon Foundation-GEBCO SEABED2030 Project (SEABED2030)" [2] and "DeSET Project" [3], is currently underway. Creating topographical maps with as high resolution as possible from available datasets and finding characteristic topographical patterns from them solves various problems. The technology to generate high-resolution (HR) from low-resolution (LR) seafloor topographical maps can be a supplementary means when comprehensive and detailed acoustic surveys using ships are difficult.
Super-resolution is a general term for image processing, such as upscaling and/or improving the image details. The simplest and straightforward method for upscaling an image uses geometrical interpolation, such as bilinear and bicubic methods. Conversely, this approach only uses information about the continuity of pixel value, which is a mainstream approach in the image-processing field, and uses useful information on the details of images by presumably learning the correspondence between LR and HR image pairs (example, [4][5][6][7][8]). The approach includes sparse-coding methods (example, [4,6,[9][10][11][12]), that use the property of images in which small patches from images can be represented by the sum of a small number of image bases and deep-learning methods that enable complex feature extraction contained in data by combining multiple layers and numerous feature-extraction filters (example, [13][14][15]).
Recently, deep-leaning-type methods have been applied to the super-resolution of bathymetric maps (example, [16][17][18]), and they have attracted considerable attention owing to their high prediction accuracy, which exceeds that of conventional interpolation methods (example, [19]). However, they have several drawbacks in terms of applications to scientific problems [20]. (1) They require numerous datasets which are appropriate for a specific target problem, and (2) they are highly black-boxed and have low interpretability which is not suitable for obtaining scientific knowledge. Therefore, it is expected to develop a highresolution method with interpretability that allows the results to be simply understood and leads to the derivation of scientific knowledge.
In this study, we focus on the potential of sparse coding super-resolution (ScSR) based on dictionary learning. This method is highly interpretable because it is a simple linear method for super-resolution which extracts a small number of important features. Therefore, the application of ScSR to seabed topographical maps is expected to simultaneously provide highly interpretable topographical-feature extraction and super-resolution. The objective of this study is to establish a super-resolution method for ocean bathymetric images using sparse modelling and verify its usefulness for increasing the resolution of rough bathymetry and extracting features of seafloor topography.
We describe the method, which is mainly based on Yang's and Elad's methods [6,9]; however, it is generally extended for use in natural science data. Subsequently, the method is applied to a multibeam echo sounder (MBES) from the Mid-Okinawa Trough. The results are compared with the bicubic method as a standard method and improvements in the accuracy and extraction of topographical features are discussed. This is a preliminary report on the incubation stag; the future direction of research is discussed.

Method
In this section, the core algorithm for the super-resolution of seabed topography is explained. The proposed algorithm, which is mainly based on [9], consists of three parts-dictionary learning, sparse coding, and reconstruction (Figures 1 and 2).

Dictionary learning
This subsection describes the data pre-processing procedures and the algorithms for dictionary learning. Important features in bathymetry often involve abrupt changes in depth. In order to extract topographical features from bathymetric maps, high-frequency components were extracted from original images by separating low-frequency components. The low-frequency component � blur was obtained by applying a Gaussian filter to � 0 , which is the original HR image for dictionary learning. The subtraction of � blur from � 0 yields the high-frequency component � ′ using the following equation.
This is the target of sparse-coding super-resolution (ScSR) estimation. Furthermore, � is the high-frequency component of the LR grid data � 0 , which is the original LR image used for dictionary learning.

Dictionary learning
This subsection describes the data pre-processing procedures and the algorithms for dictionary learning. Important features in bathymetry often involve abrupt changes in depth. In order to extract topographical features from bathymetric maps, high-frequency components were extracted from original images by separating low-frequency components. The low-frequency component � blur was obtained by applying a Gaussian filter to � 0 , which is the original HR image for dictionary learning. The subtraction of � blur from � 0 yields the high-frequency component � ′ using the following equation.
This is the target of sparse-coding super-resolution (ScSR) estimation. Furthermore, � is the high-frequency component of the LR grid data � 0 , which is the original LR image used for dictionary learning.

Dictionary Learning
This subsection describes the data pre-processing procedures and the algorithms for dictionary learning. Important features in bathymetry often involve abrupt changes in depth. In order to extract topographical features from bathymetric maps, high-frequency components were extracted from original images by separating low-frequency components. The low-frequency component X blur was obtained by applying a Gaussian filter to X 0 , which is the original HR image for dictionary learning. The subtraction of X blur from X 0 yields the high-frequency component X using the following equation. (1) This is the target of sparse-coding super-resolution (ScSR) estimation. Furthermore, Y is the high-frequency component of the LR grid data Y 0 , which is the original LR image used for dictionary learning. where H is the Gaussian filter used as a blurring operator. The Gaussian filter is an averaging filter weighted according to the spatial Gaussian distribution. Edge components of Y is extracted and divided into N patches. See Appendix A for details on edge component extraction of Y and its dimensionality reduction. The length of patches of each edge component is compressed to n low .
The LR dictionary D L is learned by applying the K-SVD algorithm [21] on the edge component patches of Y. For the obtained patch set with n low patch length, the K-SVD algorithm is used to solve the following optimisation problem to learn an LR dictionary D L ∈ R n low ×N atom , where N atom is the number of atoms in the dictionary.
where P ∈ R n low ×N is a matrix with each patch as a column element, α ∈ R N atom ×N is a matrix with sparse code α i as a column element corresponding to the i-th patch, and k 0 is the maximum number of non-zero elements. The learning process began by fixing the initial dictionary D and finding α using orthogonal matching pursuit (OMP). Then, α is fixed and dictionary D is updated using the K-SVD algorithm. The initial dictionary D was randomly sampled from the standard normal distribution, and the column was normalised.
To learn the HR dictionary D H from X , we focus on the edge component of X and generate the difference data X between X and the low-frequency component U = Q Y , where Q is an up-sampling operator. In this study, we adopted bicubic interpolation as the up-sampling method. Bicubic interpolation smoothly interpolates luminance values by fitting them with a cubic function using four pixels around the target coordinates. The HR dictionary D H is created using HR learning data X and the sparse representation matrix α obtained by learning the LR dictionary D L , where α T α + is the Moore-Penrose pseudo-inverse [22,23] of α T α.

Sparse-Coding and Reconstruction
The obtained LR image Y 0 is separated into a low-frequency component Y blur , and a high-frequency component Y.
We extracted the edge component of the high-frequency component Y by applying a differential filter in the same manner as in the dictionary learning process, and patch length of edge components of Y is compressed to n L by PCA (see Appendix A).
Using the learned D L for matrix P L ∈ R n L ×N with the obtained patches as the edge components of Y, we obtained matrixα ∈ R N atom ×N with the corresponding sparse codeα i as column elements. To solve this, we compute the following optimisation problem in the OMP [24] To reconstruct an HR image, we usedα obtained in the sparse coding process and the HR dictionary D H . We obtained matrixP H with a group of HR patches as column elements from the product of the obtained sparse representation matrixα and LR dictionary D L , The HR patch dataX p is reconstructed by adding U(= QY) to a matrix which is generated by stitching the set of patchesP H (Figure 2).
where F is an operator that superposes the adjacent patches and takes the average value of the overlap region. Subsequently, the patch dataX p is refined to X * by back projection, as proposed by [6]. Back-projection algorithm constrains the difference between the input LR image Y and a reconstructed LR image D Lα , which is not taken into account during the process of reconstructing an HR image. The refined image X * is obtained by computing where S is a down-sampling operator. Finally, the Y blur , which was initially removed as a low-frequency component, is up-sampled by cubic interpolation and combined to form the final reconstructed grid dataX,X = X * +Ŷ blur .
The algorithm structure of sparse coding and reconstruction is written below (Algorithm 1). We used Python for data pre-processing, dictionary learning, sparse coding, reconstruction, and visualisation of figures, and GMT for visualisation of the original bathymetric maps. Generate the HR patch:P H ← D Hα . 6: Up-sample the high-frequency component of the LR image, U ← QY 7: Superpose the adjacent patches and add U:X p ← FP H + U.

Data and Implementation
To verify the effectiveness of super-resolution by dictionary learning, we used bathymetry data from the Mid-Okinawa Trough (Figure 3), where the Iheya-Minor Ridge, small sea knolls, and faults associated with the Okinawa Trough have been identified [25][26][27]. Pairs of HR and LR grid data that require increased resolution of the area are used as training datasets. The target resolution data was a mesh grid of 50-m intervals drawn by calculation of the grid data from point clouds of water depths in the Mid-Okinawa Trough obtained by multiple types of MBES [28].
hyper-parameters: the number of bases was 256, the maximum number of non-zero elements was 2, the patch size was 16 × 16, and the image was reconstructed with stride 2. We set the values of these parameters based on the preliminary experimental results. Patch size (16) corresponds to an actual length of 800 m. The root-mean-square error (RMSE) relative to an original image was used to evaluate accuracy. We reconstructed the remaining seven regions using eight dictionaries generated by dictionary learning. To verify the effectiveness of ScSR, we applied bicubic interpolation to LR grid data for comparison.

Results
The results of the dictionary which were learned in area 0_0 ( Figure 4) and the image of area 0_2, which was reconstructed using dictionary 0_0, are presented. We obtained a sparse representation matrix which approximates the LR image of area 0_2 using dictionary 0_0.
and HR dictionary H were used to reconstruct the HR image in 0_2 (Figure 5). The bases within the dictionary of 0_0 show that the learning process extracts variable geomorphic characteristics, such as ridges or valley-like and mountain-like shapes The bathymetry data were normalised to the required extent before processing. The depth range of the input data was altitudes of −3000-0 m in this study. This was normalised to a range of 0-1 for training purposes. The original data X 0 has a 50-m bathymetric grid, which is a target resolution in this study. During the dictionary learning process, LR images were required to obtain an LR dictionary. The following process is applied to X 0 to obtain LR grid data Y 0 for super-resolution, where S is a down-sampling operator, that is, the data to be super-resolved in this study is an LR image obtained by down-sampling 50-m grid data to 100-m grid data. The obtained pairs of HR and LR images were divided into eight 25.6 km squares (Figure 3), and dictionary learning was performed in each area. Using these eight pairs of dictionaries, we performed super-resolution with ScSR in the other seven areas.
As mentioned above, the selected area of the Mid-Okinawa Trough was divided into eight sections ( Figure 3); eight dictionaries were created by dictionary learning using the K-SVD method in each area and the accuracy of each dictionary was validated in the remaining seven areas. In this study, dictionary learning was conducted with the following hyper-parameters: the number of bases was 256, the maximum number of nonzero elements was 2, the patch size was 16 × 16, and the image was reconstructed with stride 2. We set the values of these parameters based on the preliminary experimental results. Patch size (16) corresponds to an actual length of 800 m. The root-mean-square error (RMSE) relative to an original image was used to evaluate accuracy. We reconstructed the remaining seven regions using eight dictionaries generated by dictionary learning. To verify the effectiveness of ScSR, we applied bicubic interpolation to LR grid data for comparison.

Results
The results of the dictionary which were learned in area 0_0 (Figure 4) and the image of area 0_2, which was reconstructed using dictionary 0_0, are presented. We obtained a sparse representation matrix α which approximates the LR image of area 0_2 using dictionary 0_0. α and HR dictionary D H were used to reconstruct the HR image in 0_2 ( Figure 5). The bases within the dictionary of 0_0 show that the learning process extracts variable geomorphic characteristics, such as ridges or valley-like and mountain-like shapes ( Figure 4). Details of the topographical features indicated by the bases in the dictionary are discussed in the "Discussion" section.  The 0_2 image reconstructed with ScSR shows the topographical structure more clearly than the bicubic interpolation. Focusing on the eastern part of the Iheya-Minor Ridge, the ScSR image (Figure 5c) shows the topographical undulations more clearly compared to the LR (Figure 5b) and bicubic interpolation images (Figure 5d). In addition, in the western part of the Iheya-Minor Ridge, bicubic interpolation introduces undesired smoothing, that is not present in our proposed method. Figure 6 shows the residual images between the original HR image and bicubic-interpolated and reconstructed ScSR images. The bicubic interpolation image has large errors in large undulating areas, such as the Iheya-Minor Ridge, small knolls, and faults, whereas the ScSR has small errors.
The RMSE of the ScSR image was 1.156 m, while that of the bicubic image was 1.713 m. Table 1 presents the RMSEs for the relevant regions reconstructed using the dictionaries learned in the other seven areas. The RMSEs of bicubic interpolation are shown in the table. The RMSE of the ScSR is approximately 30% lower than that of bicubic interpolation in all the regions, indicating that ScSR improves the accuracy.   The 0_2 image reconstructed with ScSR shows the topographical structure more clearly than the bicubic interpolation. Focusing on the eastern part of the Iheya-Minor Ridge, the ScSR image (Figure 5c) shows the topographical undulations more clearly compared to the LR (Figure 5b) and bicubic interpolation images (Figure 5d). In addition, in the western part of the Iheya-Minor Ridge, bicubic interpolation introduces undesired smoothing, that is not present in our proposed method. Figure 6 shows the residual images between the original HR image and bicubic-interpolated and reconstructed ScSR images. The bicubic interpolation image has large errors in large undulating areas, such as the Iheya-Minor Ridge, small knolls, and faults, whereas the ScSR has small errors.
The RMSE of the ScSR image was 1.156 m, while that of the bicubic image was 1.713 m. Table 1 presents the RMSEs for the relevant regions reconstructed using the dictionaries learned in the other seven areas. The RMSEs of bicubic interpolation are shown in the table. The RMSE of the ScSR is approximately 30% lower than that of bicubic interpolation in all the regions, indicating that ScSR improves the accuracy.  The 0_2 image reconstructed with ScSR shows the topographical structure more clearly than the bicubic interpolation. Focusing on the eastern part of the Iheya-Minor Ridge, the ScSR image (Figure 5c) shows the topographical undulations more clearly compared to the LR (Figure 5b) and bicubic interpolation images (Figure 5d). In addition, in the western part of the Iheya-Minor Ridge, bicubic interpolation introduces undesired smoothing, that is not present in our proposed method. Figure 6 shows the residual images between the original HR image and bicubic-interpolated and reconstructed ScSR images. The bicubic interpolation image has large errors in large undulating areas, such as the Iheya-Minor Ridge, small knolls, and faults, whereas the ScSR has small errors.
(c) (d)   The RMSE of the ScSR image was 1.156 m, while that of the bicubic image was 1.713 m. Table 1 presents the RMSEs for the relevant regions reconstructed using the dictionaries learned in the other seven areas. The RMSEs of bicubic interpolation are shown in the table. The RMSE of the ScSR is approximately 30% lower than that of bicubic interpolation in all the regions, indicating that ScSR improves the accuracy.

Discussion
Because each basis was a feature extracted from the seafloor topography of the training dataset, the basis used in the reconstruction was chosen to represent the features of the topography of the corresponding area. The basis used in this study is an 800 m square, which is suitable for extracting geomorphological features, such as small sea knolls. In this section, we verify the characteristics of the bases which are learned in 0_0 and choose to reconstruct area 0_2 and verify if the topographical features in this area are extracted properly.
Focusing on the topographical features of each basis, some bases in the 0_0 dictionary show similar patterns to each other. To simplify the interpretation, we performed uniform manifold approximation and projection (UMAP) [29] to project similar bases close to each other and classified them into 23 groups based on the results (Figure 7). Figure 7 shows that the dictionary contains bases with ridges and valleys extending in the ENE-SWS direction (cluster 1 and 21), bases with small basin-like or mountain-like shapes in the central part (cluster 7 and 11), bases with the NE-SW ridge features (cluster 8), and bases with a ridge extending in the NS (cluster 14). Area 0_2 includes the Iheya-Minor Ridge of the Mid-Okinawa Trough. Several faults running in the ENE-SWS direction were identified to the south of the ridge. Small sea knolls were also observed in the centre of the southern part of the area and north of the ridge.

Discussion
Because each basis was a feature extracted from the seafloor topography of the training dataset, the basis used in the reconstruction was chosen to represent the features of the topography of the corresponding area. The basis used in this study is an 800 m square, which is suitable for extracting geomorphological features, such as small sea knolls. In this section, we verify the characteristics of the bases which are learned in 0_0 and choose to reconstruct area 0_2 and verify if the topographical features in this area are extracted properly.
Focusing on the topographical features of each basis, some bases in the 0_0 dictionary show similar patterns to each other. To simplify the interpretation, we performed uniform manifold approximation and projection (UMAP) [29] to project similar bases close to each other and classified them into 23 groups based on the results (Figure 7). Figure 7 shows that the dictionary contains bases with ridges and valleys extending in the ENE-SWS direction (cluster 1 and 21), bases with small basin-like or mountain-like shapes in the central part (cluster 7 and 11), bases with the NE-SW ridge features (cluster 8), and bases with a ridge extending in the NS (cluster 14). Area 0_2 includes the Iheya-Minor Ridge of the Mid-Okinawa Trough. Several faults running in the ENE-SWS direction were identified to the south of the ridge. Small sea knolls were also observed in the centre of the southern part of the area and north of the ridge. The colour shading of the symbols on (b) corresponds to the sum of the absolute values of the coefficients of each basis in the reconstruction. "c1" represents "cluster 1", and the same applies to "c2" and beyond.
We examined the extraction of the geomorphic features represented by the bases selected during the reconstruction of the Iheya-Minor Ridge, faults, and small sea knolls. Figures 8 and 9 show the contribution of each group to the reconstruction of area 0_2. Figure 8 shows that the absolute values of the coefficients of the bases are larger around the faults and Iheya-Minor Ridge. Focusing on the bases used to reconstruct the small sea knolls, different groups of bases were extracted for each small knoll. Specifically, cluster 14, which is represented by the basis of a ridge extending in the NS direction (Figure 7), is used for the sparse representation of two small knolls north of the Iheya-Minor Ridge (enclosed by green circles in Figure 8); however, it is not used for the representation of the knoll in the southern part of the region (within a yellow circle), which is located more than 10 km away from the ridge (Figure 8). Conversely, cluster 8, which captures the NE-SW Figure 7. (a) Visualisation of 256 bases from the "0_0" dictionary as embedded by UMAP. (b) Distribution of clusters of bases on the embedded space by UMAP. The colour shading of the symbols on (b) corresponds to the sum of the absolute values of the coefficients of each basis in the reconstruction. "c1" represents "cluster 1", and the same applies to "c2" and beyond.
We examined the extraction of the geomorphic features represented by the bases selected during the reconstruction of the Iheya-Minor Ridge, faults, and small sea knolls. Figures 8 and 9 show the contribution of each group to the reconstruction of area 0_2. Figure 8 shows that the absolute values of the coefficients of the bases are larger around the faults and Iheya-Minor Ridge. Focusing on the bases used to reconstruct the small sea knolls, different groups of bases were extracted for each small knoll. Specifically, cluster 14, which is represented by the basis of a ridge extending in the NS direction (Figure 7), is used for the sparse representation of two small knolls north of the Iheya-Minor Ridge (enclosed by green circles in Figure 8); however, it is not used for the representation of the knoll in the southern part of the region (within a yellow circle), which is located more than 10 km away from the ridge (Figure 8). Conversely, cluster 8, which captures the NE-SW ridge features, is used for sparse representation of the southern knoll and its surrounding faults but is not used for reconstruction of the northern knoll ( Figure 8). Reconstructing an HR image using sparse representation probably enables the capture of topographical features that are not apparent to the naked eye. ridge features, is used for sparse representation of the southern knoll and its surrounding faults but is not used for reconstruction of the northern knoll ( Figure 8). Reconstructing an HR image using sparse representation probably enables the capture of topographical features that are not apparent to the naked eye.  The accuracy of super-resolution detail was also examined using the residual map of area 0_2 ( Figure 6). As mentioned above, steep gradients, which are often features of geoscientific importance, are not well represented in the residual map of simple interpolation methods (Figure 6b). Conversely, ScSR shows a significant improvement in errors at the Iheya-Minor Ridge, small knolls, and faulted areas (Figure 6a). The proposed method selects the basis while constraining the reconstruction to capture topographical features at the patch-size scale. ridge features, is used for sparse representation of the southern knoll and its surrounding faults but is not used for reconstruction of the northern knoll ( Figure 8). Reconstructing an HR image using sparse representation probably enables the capture of topographical features that are not apparent to the naked eye.  The accuracy of super-resolution detail was also examined using the residual map of area 0_2 ( Figure 6). As mentioned above, steep gradients, which are often features of geoscientific importance, are not well represented in the residual map of simple interpolation methods (Figure 6b). Conversely, ScSR shows a significant improvement in errors at the Iheya-Minor Ridge, small knolls, and faulted areas (Figure 6a). The proposed method selects the basis while constraining the reconstruction to capture topographical features at the patch-size scale. The accuracy of super-resolution detail was also examined using the residual map of area 0_2 ( Figure 6). As mentioned above, steep gradients, which are often features of geoscientific importance, are not well represented in the residual map of simple interpolation methods (Figure 6b). Conversely, ScSR shows a significant improvement in errors at the Iheya-Minor Ridge, small knolls, and faulted areas (Figure 6a). The proposed method selects the basis while constraining the reconstruction to capture topographical features at the patch-size scale.
Variations in accuracy for each region are also discussed in terms of topographical features. Focusing on the RMSE of the bicubic interpolation, the value remarkably varies from one area to another (Table 1). Accuracy in bicubic interpolation is better in relatively gentle and uniform sea areas, e.g., 0_0 and 0_1. On the other hand, the bicubic interpolation accuracy is poor for other sea areas where local topographical changes are notably recognised, such as small ridges and sea hills (Figure 3). In terms of the accuracy of ScSR, we will discuss in which areas ScSR is highly effective compared to bicubic interpolation. A comparison of the RMSEs of ScSR and bicubic interpolation by reconstructed areas in Table 1 shows that ScSR is particularly effective in 0_2, 1_0, 1_2, and 1_3. The areas where local gradient changes and landforms, such as small ridges are more characteristic than other areas and are thought to be the result of the advantages ScSR, which reconstructs while selecting characteristic bases from the dictionary.
Despite the effectiveness of ScSR, the proposed method is preliminary and has room for improvement, as shown below. First, although the usefulness of ScSR has been demonstrated in the Mid-Okinawa Trough area, it is necessary to verify the method in other areas to confirm its versatility. Subsequently, the size of the patch and basis was 800 m square and faults and small knolls were successfully extracted. However, it must be further verified whether it is possible to extract a more detailed topography or global structure by changing the patch size. Finally, we used grid data which were converted from the point-cloud data obtained by the MBES in this study. Although grid data can be shared easily and its size is reduced, the information included in the original point-cloud data is lost to some extent when the data are converted. The proposed method can achieve super-resolution while providing most of the information on the observation data by improving it to apply the point-cloud data.

Conclusions
In this study, we improved the ScSR proposed by [6,9]. We separated the seabed topographical image into a high-frequency component that specialises in the information on topographical undulations and a low-frequency component that captures the global information and implements sparse modelling to the high-frequency component. ScSR was effective for the super-resolution of seabed topographical maps. This method was applied to the map of the Mid-Okinawa Trough and the obtained RMSE was improved by 30% over bicubic interpolation with small training datasets. The base-extraction process and reconstruction can provide super-resolution and geoscientific interpretation simultaneously.  Y were extracted by applying first-and second-order derivative filters in both the horizontal and vertical directions.
where Q and f denote an up-sampling operator and differential filter, respectively. In this study, we adopted bicubic interpolation as the up-sampling method as mentioned in the "Method" section. The subscripts h and v denote the direction of the differential filters, and 1 and 2 are the order of the derivative. in an operator that acts on the matrix with a differential filter. For each of Y h , Y v , Y hh , and Y vv , we divide them into patches of size √ n × √ n (where n is the patch length) to obtain a group of four matrices, each with N patches. We perform principal component analysis on the group of patches with a length of 4n and perform dimensionality reduction up to the n low -th component which has a cumulative contribution rate of 99.9% (n low < 4n).
In the process of reconstruction, edge components of Y is also extracted as the same manner.