A General Convolutional Neural Network to Reconstruct Remotely Sensed Chlorophyll-a Concentration

: Satellite-observed chlorophyll-a (Chl-a) concentrations are key to studies of phytoplankton dynamics. However, there are gaps in remotely sensed images mainly due to cloud coverage which requires reconstruction. This study proposed a method to build a general convolutional neural network (CNN) model that can reconstruct images in unfamiliar areas. Although several CNN models to reconstruct Chl-a in a speciﬁc area have already been proposed, the model in this research has the advantage of generality. The model uses a more ﬂexible U-net architecture so that it can accept input of different shapes. Images from three areas of different shapes were used in model training to improve the generality of the model. Six models, with different auxiliary input schemes and architectures, were trained and evaluated. Results show that the model with bathymetry input and coarse-to-ﬁne architecture has the best performance and can give reasonable reconstruction for the unfamiliar area. The best model shows better results than traditional interpolation methods when reconstructing for an unfamiliar area, especially in regions outside the data coverage.


Introduction
Phytoplankton serves as an indicator of marine ecosystem health.Algal blooms can have harmful impacts on anthropogenic activities [1].Therefore, monitoring and prediction of phytoplankton are crucial for the effective management of coastal and oceanic environment health and resources [2].Satellite observations of Chl-a concentrations, which are a direct proxy for phytoplankton biomass [3], are critical for studying phytoplankton dynamics [4].However, there are significant missing data in remotely sensed Chl-a images to varying degrees due to poor atmospheric conditions, missing orbits, and sensor malfunction [5,6].
Numerous methods have been developed to reconstruct missing data in satellite data.Traditional interpolation methods contain two main groups: deterministic and geostatistical interpolation.Deterministic methods predict directly based on the surrounding data points using mathematical functions [7].The most commonly used deterministic methods include linear interpolation, nearest neighbor interpolation, spline interpolation, etc. Geostatistical methods utilize the statistical properties of the measured points to reconstruct and estimate uncertainties of reconstruction.Geostatistical methods include different schemes of Kriging interpolation algorithms [8].These traditional methods are easy to operate and hence have been used and evaluated in much research [9][10][11].However, these methods do not work well when the data has limited spatial and temporal scopes.Because they estimate the missing values using only surrounding measured values [12] and no domain knowledge about the variable is used.
Optimal interpolation (OI) has been found to have better performance with limited measured data than traditional methods by combining background fields and observations.This method has been widely used to interpolate remotely sensed oceanic data in the past [13][14][15].However, OI requires background fields and the error covariance matrix.This a priori information is not always available, limiting the applicability of OI [16].
training areas and reconstruct images of new areas, with no need for long data series (DINEOF, DINCAE) or a priori information (OI) in the new areas.Therefore, this model has higher applicability and wider application scenarios than OI, DINEOF, and the CNN-based models discussed before [28][29][30][31].While compared with traditional interpolation methods, the method has the strength of higher accuracy thanks to the ability to extract complex nonlinear relationships from vast amounts of data for reconstruction.The source code of this method is included in Data Availability Statement.

Materials
Four offshore areas were selected for this study along the east coasts of North America and Asia.They are as follows: area1, the East China Sea (ECS, 26-34 • N, 120.5-127.5  W), and area4, the Florida peninsula (Florida, 24-32 • N, 84-76 • W).Only data from area1, area3, and area4 are used in training.Area2 is used solely as a test set to evaluate the models' performance in unfamiliar areas.All four areas are on the east side of continents (Eurasian and North America) in the northern hemisphere at middle latitudes.They have similarities as well as differences.It helps to find the limitation of the model's generalization ability.

Chl-a
The satellite observed Chl-a concentration data used in this research are from the GlobColour project.Daily Chl-a data were merged from multiple sensors (Modis, VIIRSN, Meris) through weighted averages.Ten years of data from four regions were selected from 2011-2020.The resolution of the grids is around 4 km.Grid shapes are as follows: area1 (193 × 169), area2 (121 × 241), area3 (193 × 193), and area4 (169 × 193).Figure 1 shows the mean Chl-a values of the four regions from 2011 to 2020.The four images all cover offshore regions where Chl-a concentration variations are dominated by terrigenous nutrient supply [2].High Chl-a values occur over a large area of the Yangtze River estuary in area1, along the mainland coast of the South China Sea and on the western side of Taiwan Island in area2, along the East Coast in area3, and along the Florida Peninsula in area4.All these high Chl-a values are in low water depth regions.Missing rates of all these areas are calculated at each pixel and shown in Figure 2. The majority of the pixels have missing rates higher than 50%.The Missing rate in the East China Sea and the South China Sea is higher than that in the East Coast and Florida, especially in the Coastal region where the data loss rate is close to 100%.This difference may be due to differences in satellite coverage and cloud density between the two continents.To make the results more reliable, images with valid pixels less than 2% are not used, referring to the previous study [28].Finally, 3277 images in area1, 3639 in area3, and 3565 in area4 were selected as a bulk dataset.The bulk dataset was then divided into a training set, a validation set, and a test set in the ratio of approximately 8:1:1.Images in area2 are only for testing and 10% (321) of valid images were selected randomly as a separate test set.

Bathymetry
The General Bathymetric Chart of the Oceans (GEBCO) project provides gridded bathymetric datasets, in meters, on a 15-arc-second interval grid.GEBCO releases a new global grid every year.In this research, the GEBCO_2022 Grid was used, shown in Figure 3. Bathymetry data are interpolated to match the grid points of the Chl-a data.Figures 1  and 3 show a clear correlation between Chl-a concentration and seafloor depth.High Chl-a values are mainly distributed near continental shelves and islands where the depth is less than 100 m.The topography adjacent to these shallow waters is mainly gentle plains surrounded by continental shelves, except for the eastern side of Taiwan island where the elevation drops from mountains to an ocean basin sharply.The cliff topography, different from other plains with river estuary, results in the low Chl-a on the eastern coast of Taiwan Island shown in Figure 1 [32].

Workflow
Figure 4 illustrates the workflow of how each model is trained.Firstly, Chl-a images from the training set are masked artificially and more data is masked out.This research creates an artificial mask dataset, containing 100 masks for each region.Each mask M ij is extracted from the missing position of a real daily Chl-a image, where 1 denotes the pixel is missing and 0 denotes the pixel is valid.The masking process is performed by multiplying the original Chl-a matrix by (1 − M ij ).The masked 2D Chl-a matrix is combined with other auxiliary variables to create a 3D input matrix.The variables used in the input matrix are shown in Table 1.The model uses the 3D input matrix to generate a 2D Chl-a reconstruction.Initially, all parameters in the model are assigned random values and the reconstruction is poor.The reconstruction (Chl-a rec ) and true Chl-a observation (Chl-a obs ) are taken as the input of the cost function given in the equation below: where J ŷij is the cost, ŷij is the reconstruction value, y ij is the satellite observation value, V ij is the flag that denotes whether each pixel has a valid observed value, i and j are the row number and the column number in the matrix, and N valid is the number of valid pixels.Once the cost function J ŷij is defined, the next step is to calculate its derivatives with respect to the model parameters.This involves applying the chain rule to compute the gradient vector, which contains the partial derivatives of J ŷij with respect to each parameter.The optimizer then uses this gradient to update the parameters iteratively, following the gradient descent algorithm, which aims to minimize J(y) by adjusting the model's weights and biases.Adam optimizer [33] was used in this research because it has an adaptive learning rate and hence converges fast.Standard parameters were used for Adam, with the learning rate α = 0.001, the exponential decay rate for the first moment estimates β1 = 0.9, the second-moment estimates β2 = 0.999, and the regularization parameter = 10 −8 .Longitude (scaled linearly between −1 and 1) 10 (optional) Latitude (scaled linearly between −1 and 1) 11 (optional) Elevation (scaled to the power of 1/5 and linearly between −1 and 1) Throughout the training process, the models are evaluated with the validation set, and current parameters are saved as files.The parameters with the lowest root mean squared error (RMSE) in validation through the training phase are regarded as the best parameters for the model and are evaluated with the test sets later.
Table 1 lists all the input variables, each of which is represented by a 2-D matrix corresponding to a single channel in the overall input matrix.To facilitate easier extraction of data variations, all variables are normalized to the same scale and a similar distribution [34].Variable 1 is the Chl-a satellite observation which is masked by a mask chosen randomly from the mask dataset.Since Chl-a values in nature follow a log-normal distribution approximately, this variable is normalized logarithmically.In the input matrix, all the invalid pixels, including the missing pixels in the original Chl-a matrix and the pixels masked out, are set to 0. To let the model know these pixels are invalid pixels rather than pixels where the observed ln(Chl-a) is 0, variable 2 is introduced.Variable 2 represents the valid flag of each pixel, in which pixels with valid Chl-a data are set to 1, and invalid pixels are set to 0. Similarly, variables 3 and 5 correspond to images of the previous and next days, respectively, after being applied with random masks.The flags for the two days are represented by variables 4 and 6.Variables 7,8 are cosine and sine of the day of the year multiplied by 2π 366 .Cosine and sine functions are used to provide seasonal information.In these two functions, the outputs of day 0 and day 365 are close.This is close to the real nature.
In previous studies, longitude and latitude were used to provide spatial context, which was effective for reconstructing data in the same area as the training area [28,29].This approach helped the model fit the distribution of each coordinate.However, using coordinates for spatial context may lead to overfitting, which could harm the model's ability to reconstruct data in areas outside the coordinate range of the training areas.As opposed to coordinates, bathymetry has a broadly valid impact on Chl-a concentration.Adding bathymetry data as input could increase the performance of the model in all areas.
To evaluate the effect of the two spatial context inputs, models with three input schemes are trained and tested.The input schemes are as follows: no auxiliary spatial information, coordinates input, and bathymetry input.Coordinate inputs are scaled linearly to the range of −1 to 1 (90 . The elevation of bathymetry input is normalized to the power of 1/5 and scaled linearly, to amplify the variance in elevations of small absolute values (coastal area).This is because a 10 m variance in ocean basin or mountain may not have an impact on Chl-a in adjacent water bodies while the same variance in low elevation could outline an estuary and has a significant impact on nearby Chl-a concentration.

Model Architecture
The architecture of the U-net used in this research is shown in Figure 5.It is a "U" shaped network consisting of an encoder on the left side and a decoder on the right side.The input matrix is shaped as (C, M, N).C is the channel number, representing the number of variables.M and N represent the grid number in latitude and longitude directions respectively.The arrow in the figure denotes the flow of data through the network.
To begin, the input is fed into a convolutional layer and passed through an activation function.The convolutional layer utilized for feature extraction is composed of 16 filters with a receptive field of 3 × 3 grids and a stride of 1.In this study, the ReLU function is employed as the activation function to provide nonlinearity to the model.After going through the convolutional layer and activation function layer, the input matrix is converted into a feature map with dimensions of (16, M, N).This feature map then undergoes a max pooling operation, where the filter size is 2 × 2, reducing the dimensions of the feature map to half of its original size (16, M/2, N/2) for dimensionality reduction.
Similar operations are applied in subsequent layers, where the grid length of the feature map is halved for each max pooling layer, while the channel number is doubled for each convolutional layer.Given the original image size, the encoder comprises 4 pooling layers and 5 convolutional layers, resulting in a feature map with a shape of (256, M/16, N/16).
The decoder module is comprised of 4 upsampling layers for feature dimension recovery and 5 convolutional layers for feature extraction.Nearest interpolation is employed in the upsampling layer to double the grid length of the feature map.The number of filters in the convolutional layers in the decoder reduces by half each time, except for the last layer which has only one filter to produce a one-channel output.
To better recover lost information in downsampling, skip connections are established between the encoder and decoder.This technique fuses the feature maps at corresponding positions in the two processes, allowing the decoder to better retain high-resolution detail information contained in the higher-level feature maps during upsampling.As a result, small-scale details are recovered more accurately.Ultimately, the feature map with dimensions of (256, M/16, N/16) is transformed into an output matrix of (1, M, N) through the decoder.
This research uses coarse-to-fine architecture with two connected U-nets to stabilize training and improve model performance [35].A coarse-to-fine model consists of a coarsenet and a refine-net.The coarse-net produces a coarse reconstruction, which is concatenated with the original input to serve as the input for the refine-net.With an input matrix containing the original input and the coarse Chl-a reconstruction, the refine-net can focus on refining the coarse Chl-a reconstruction.Using two separate networks with different tasks has been found to be more effective than relying on a single network to perform a complex task of reconstruction on all scales.The coarse-to-fine architecture has been extensively tested in the computer vision domain [36] and has also been applied in remotely sensed data reconstruction in previous studies [37].However, adding a refine-net roughly doubles the number of parameters and the depth of the neural network, making it more susceptible to the vanishing gradient issue.As the depth increases, the derivatives of the cost with respect to parameters in the first few layers far from the output layer tend to be very small.To mitigate this issue, the intermediate results are included in the cost function.Thus, the cost function J in Figure 6 is calculated as the weighted sum of the cost of two networks, as shown below: where weight parameters in this research are set the same as in the previous study [37], with α = 0.3, and β = 0.7.J coarse is the cost function of the coarse net, J re f ine is the cost function of the refine net.Both of them are calculated as Equation ( 1).Although this architecture supports multiple refine-nets after the coarse-net, only one refine-net was used in this research.Each extra refined net increases the number of parameters of the whole network, hence increasing the training time and requirement on computer memory.A second refine-net has a finer intermediate reconstruction input and may improve the performance slightly, but will increase the cost of computing resources and time dramatically.The previous study also showed models with only one similar auxiliary network can achieve a close effect to those with many [38].

Training Process
RMSEs of these six models during the training process for both the training and the validation sets are shown in Figure 7.The RMSEs in this phase are calculated as follows: RMSEs are calculated using logarithmically normalized Chl-a values.Because of the property of Logarithms (ln a − ln b = ln a b ), the RMSEs calculated in this research can be broadly seen as the relative errors between the predicted Chl-a values and the actual Chl-a values, regardless of the data scale.
Models 1-3 are the single-net model with no auxiliary spatial variable, the single-net model with coordinates input, and the one with bathymetry input respectively.Models 4-6 are the three models with coarse-to-fine architecture and with the same three input schemes as models 1-3.It can be observed that the RMSEs decrease rapidly in the beginning stage and all of the curves converged at the end after 500 epochs.We select the model parameters RMSEs of these six models during the training process for both the training and the validation sets are shown in Figure 7.The RMSEs in this phase are calculated as follows: RMSEs are calculated using logarithmically normalized Chl-a values.Because of the property of Logarithms (ln  − ln  = ln ), the RMSEs calculated in this research can be broadly seen as the relative errors between the predicted Chl-a values and the actual Chla values, regardless of the data scale.

Reconstruction of the Training Areas
RMSEs of the reconstruction produced by the six models in the test set are shown in Table 2. RMSEs for the test sets are calculated as follows: where M ij is the mask applied on input and N masked is the number of valid pixels that are masked.Notably, the RMSE expression computed during the test phase differs from those calculated during the training and validation phases.Specifically, in the test phase, only pixels that possess valid observations and are masked out by a random mask are considered in the RMSE calculation.Conversely, in the previous phases, all pixels with valid values in the observations are accounted for in the RMSE computation.It is worth emphasizing that the valid pixels that are not masked out in the input should be taken into account during training, as the model will learn to use them directly as outputs.However, it is inappropriate to evaluate the model's performance based on the reconstruction quality at these pixels, as their values are already available in the input.Thus, the evaluation of the model's performance should only consider the RMSE of the reconstructed pixels that were masked out during the test phase.As Table 2 shows, the refine-net architecture has a visibly positive effect on all the models irrespective of the input schemes, across all areas.Hence, only the coarse-to-refine networks will be further tested later in this paper.Moreover, Table 2 shows that both bathymetry and coordinate inputs enhance the model performance, albeit to a lesser extent than refinement.Additionally, bathymetry outperforms coordinates slightly in three areas.Specifically, the RMSEs in area1 and area3 are substantially higher than those in area4.Furthermore, the RMSE in area1 is higher than that in area3, albeit the difference is less pronounced.The variability among areas could be attributed to differences in the data distribution or missing rates.In the previous DINCAE application on Chl-a reconstruction [29], the RMSE of reconstruction is 0.27.Without considering the difference in different areas' data characteristics, the RMSEs for the general model in this research are on a level close to DINCAE specialized for 1 specific area in the South China Sea.
RMSE between ln(Chl-a rec ) and ln(Chl-a obs ) at each pixel is calculated and shown in Figure 8.As shown, the three models have a similar RMSE spatial distribution.Overall, RMSE values in shallow water regions are higher than those in open ocean regions.Two notable high RMSE areas are the Yangtze River estuary in area1 and the region where the Gulf Stream travels through in area3.These two areas are influenced by two strong streams with high variance and are harder to infer accurately.
Figure 9 displays the reconstruction results of each model for specific days.Data from the test set in area1 on 11 June 2013, in area3 on 7 October 2020, and in area4 on 20 February 2011 was used for reconstruction.The white areas in the last column (Satellite) are missing pixels where V ij = 0.The second column (Cur) is generated by multiplying the original satellite observed Chl-a matrix by (1 − M ij ), hence the white space in the second column are invalid pixels, where V ij × (1 − M ij ) = 0.The first (Prev) and the third (Next) columns are the Chl-a of the previous day and the next, after artificial masks.The figure shows that there is no significant difference between the three models' reconstruction results and all the three models can reconstruct the Chl-a patterns reliably in the training areas.Although they tend to blur fine structures generated by short-term ocean dynamics.
Errors at each pixel are calculated as ln(Chl-a rec ) subtracted by ln(Chl-a obs ) and are shown in Figure 10.High error values occur in similar areas among the three models.In area1, high negative and positive errors occur in the south of the Yangtze River estuary, forming a pattern of a vortex.In area3, a high positive error occurs in the north.For the front in the middle of area3, reconstructions have high negative errors along the east side of the front.In area4, a high positive error of vortex shape occurs along the curve of the east side of the Florida peninsula, and a lump of water with a high Chl-a concentration on the northeast is missing in the reconstruction.All the high errors occur due to missing short-term features or blurring feature borders.Among these three models, the one with bathymetry input shows a lower error than the other two models, although the improvement is not significant.Figure 12 shows the reconstructions of the three models for an image on 27 August 2014.The input image of the next day covers the high Chl-a value near the southern coast, while none of the three days has valid input in the northern coastal area.All models reconstruct the high Chl-a value in the southern coastal area well, but only the model with bathymetry reconstructs the high Chl-a pattern in the northern coastal area successfully.The result suggests that all models can utilize valid pixels in three days to make reconstruction, but only model 3 can achieve reasonable reconstruction for areas where no spatially or temporally nearby pixels are available.Errors at each pixel are calculated and shown in Figure 13.The model without auxiliary input and the one with coordinate input both have a large negative error on the northern coastal area, due to failure to reconstruct the high Chl-a value near the coast.In contrast, the model with bathymetry input has a much lower error in the same area, although it has a large positive error farther from the coast, due to overextending the scope of high Chl-a.The same advantage due to bathymetry input occurs on the west side of Taiwan island.Furthermore, the last model has a lower error in the open ocean on the east.In summary, the reconstruction error of the model with bathymetry input is slightly lower on a global scope and significantly lower in coastal areas than the other two models.

Comparison between the Best Model and Other Methods
The main strength of the proposed model compared with the CNN-based methods discussed before is its ability to extract the common statistical relationships in multiple areas, and works for areas with no long data series or a priori information.OI, DINEOF, and the CNN-based models discussed before are not suitable in the absence of long data series and a priori information.To evaluate the effectiveness of the proposed model, it was compared with traditional interpolation methods (Linear interpolation and NN).All the methods used incomplete Chl-a images of the previous day, the current day, and the next day to reconstruct.
The results of the three methods are shown in Figure 14.Linear interpolation has large gaps in the reconstruction due to the limited range of input data.NN fills all the gaps but fails to reproduce the high Chlo-a feature outside the data spread.The best-performing model shows a better result in regions outside the data spread.The possible reason is that the CNN model is able to make inferences based on bathymetry and surrounding valid values, relying on patterns it has seen.In contrast, the other two methods estimate merely by linear combinations of surrounding values.

Conclusions
This study proposes a novel approach based on CNN for the reconstruction of gaps in remotely sensed Chl-a images.Data reconstruction techniques rely on the correlation between variables.Previously employed methods either assume this correlation through predetermined mathematical formulas or derive it from statistics.Both have their limitations: mathematical formulas oversimplify the intricate non-linear relationships between geobiological variables, while the statistics extracted by previous methods are restricted to the local area, leading to low generality and limited applicability.The principal strength of the method developed in this study lies in its ability to extract general patterns from vast amounts of data in multiple areas and to generate predictions for regions with limited prior information and no long data series.
Different architectures and auxiliary input schemes were used to improve the model performance.The results demonstrated that the coarse-to-fine architecture enhanced the model performance.Among the input schemes, bathymetry input visibly improved the model's generalization ability, while the coordinates input commonly used in previous machine-learning techniques harmed the model's generalization ability.
The best-performing model is the model with bathymetry input and coarse-to-fine architecture.It was compared with two deterministic methods of reconstructing area2 using images of 3 days.The result demonstrated that the model outperformed traditional interpolation methods, especially in areas outside the data coverage.
Despite its success, the current model has certain limitations.For instance, it fails to predict the low Chl-a concentration on the eastern coast of Taiwan Island in area2, which has a different bathymetry (steep slope with no river estuary) than other coasts in the training area.This limitation suggests that the model's generalization ability is restricted by the data range used in training.Furthermore, the model was only tested in four areas on the east side of continents in the northern hemisphere at middle latitudes, where the climate and ocean dynamics are similar.Hence, its performance may be poorer in areas with distinct characteristics.
Future research may benefit from incorporating data from wider-spread areas and adding other physical variables such as wind speed and SST to enhance the model's

Figure 1 .
Figure 1.Averaged Chl-a concentration from 2011 to 2020 using valid pixels.

Figure 2 .
Figure 2. Missing rate of each pixel for the complete dataset.

Figure 3 .
Figure 3. Seafloor or land height (above mean sea level) in meters of the four areas.

Figure 4 .Table 1 .
Figure 4. Schematic flowchart for the workflow of model training.Table 1.The total list of input variables.Variables of index 9-11 are optional in different models.IndexVariable Name 1 ln(Chl-a) of the current day masked by a random mask 2 Flag denotes whether each grid point is valid (0 or 1) 3-4 Masked ln(Chl-a) and flag of the previous day 5-6 Masked ln(Chl-a) and flag of the next day 7 cos( day o f the year 366 × 2π) 8 sin day o f the year 366 × 2π 9 (optional)Longitude (scaled linearly between −1 and 1) 10 (optional) Latitude (scaled linearly between −1 and 1) 11 (optional) Elevation (scaled to the power of 1/5 and linearly between −1 and 1)

Figure 5 .
Figure 5. U-net model architecture.Boxes of different colors represent layers of different types.Circles with 2 lines represent the concatenation operations of two matrices.The numbers below each box denote the number of layer channels.Bold black letters below the numbers denote the edge lengths of the matrix, where I is the input edge length (M, N).

Figure 6 .
Figure 6.The architecture of the coarse-to-fine network.
with the lowest validation RMSE.Parameters at epoch 370, epoch 440, epoch 335, epoch 445, epoch 440, and epoch 370 are selected for models 1-6, respectively.These models are tested later in the research to evaluate the performance of the model and the effects of different architectures and inputs.

Figure 7 .
Figure 7. RMSEs are calculated throughout the training phase for the training set and the validation set.The curves are smoothed to show the trend.(a) RMSEs for 20 random batches in the training set

Figure 7 .
Figure 7. RMSEs are calculated throughout the training phase for the training set and the validation set.The curves are smoothed to show the trend.(a) RMSEs for 20 random batches in the training set are calculated with a step of 1 epoch with a batch number of 32.(b) RMSEs for the entire validation set are calculated at a step of 5 epochs.

Figure 8 .
Figure 8. Spatial distribution of RMSE between ln(Chl-a rec ) and ln(Chl-a obs ) of models with 3 in- put schemes.

Figure 9 .
Figure 9.The reconstruction result of each model on specific days for training areas.The first three columns labeled "Prev", "Cur" and "Next" denote masked Chl-a input of the previous day, the current day, and the next day.The three columns after Chl-a inputs are the reconstruction results of three models with different input schemes.The last column is the full satellite observation of the current day.

Figure 10 .
Figure 10.Error between ln(Chl-a rec ) and ln(Chl-a obs ) of the models in the 3 areas.

4. 3 .
Reconstruction for the Area Not Used in TrainingTo further evaluate the ability of the three models to reconstruct areas outside the training areas, area2 was used to test the three models.The model without auxiliary parameters, the model with coordinate input, and the model with bathymetry input have RMSEs of 0.6190, 0.6883, and 0.5761 respectively.The RMSEs are higher than the RMSEs for training areas in Table2.As mentioned before, the RMSE of logarithmic values evaluates the relative error.A logarithmic error of 0.5761 (RMSE of the model with bathymetry input) may underestimate the Chl-a by 43.8% or overestimate the Chl-a by 77.9%, while a logarithmic error of 0.3538 (RMSE of the same model for area4) may underestimate the Chl-a by 29.8% or overestimate the Chl-a by 42.4%.The results indicate that bathymetry input has a significant improvement on model performance, while coordinate harms model performance when applied to unfamiliar areas.Figure11presents the RMSE of three models for each pixel in the test set.The model without auxiliary parameters and the model with coordinate input have high RMSE in the southwest, close to the Pearl River Estuary, on the western coast of Taiwan island, and nearby another estuary in the north.The model with bathymetry input has significantly lower RMSEs near estuaries and on the western coast of Taiwan island than the other two models.However, it has a high RMSE on the eastern side of Taiwan island, possibly due to the region's distinct landscape characteristics that differ from those of the areas used for training.While the coasts in training areas are all gentle planes with high Chla concentration, the east coast of Taiwan island is a steep mountain area close to the continental shelf with a low Chl-a concentration.Incorrectly inferring the low Chl-a value area as a high Chl-a area may lead to a 1000% overestimation.Since the error of logarithmic values evaluates the relative error and RMSE amplifies the higher errors, this false inference for the east coast of Taiwan land contributes to the overall RMSE greatly.This could be solved by incorporating such bathymetry in training data.

Figure 11 .
Figure 11.RMSE spatial distribution of the three models for the test set.

Figure 12 .
Figure 12.The reconstruction result of each model on a specific day for area2 which is not used in training.

Figure 13 .
Figure 13.Error between ln(Chl-a rec ) and ln(Chl-a obs ) of the models in area2.

Figure 14 .
Figure 14.The reconstruction result of 3D linear interpolation, 3D nearest neighbor interpolation, and the best model.

Table 2 .
RMSE between ln(Chl-a rec ) and ln(Chl-a obs ) of the models in the test set.