Application of DINCAE to Reconstruct the Gaps in Chlorophyll-a Satellite Observations in the South China Sea and West Philippine Sea

: The Data Interpolating Empirical Orthogonal Functions (DINEOF) method has demonstrated usability and accuracy for ﬁlling spatial gaps in remote sensing datasets. In this study, we conducted the reconstruction of the chlorophyll-a concentration (Chl- a ) data using a convolutional neural networks model called Data-Interpolating Convolutional Auto-Encoder (DINCAE), and we compared its performance with that of DINEOF. Furthermore, the cloud-free sea surface temperature (SST) was used as a phytoplankton dynamics predictor for the Chl- a reconstruction. Finally, four reconstruction schemes were implemented: DINCAE (Chl- a only), DINCAE (Chl- a and SST), DINEOF (Chl- a only), and DINEOF (Chl- a and SST), denoted rec1, rec2, rec3, and rec4 respectively. To quantitatively evaluate the accuracy of these reconstruction schemes, both the cross-validation and in situ data were used. The study domain was chosen to be the Northern South China Sea (SCS) and West Philippine Sea (WPS), bounded by 115–125 ◦ E and 16–24 ◦ N to test the model performance for the reconstruction of Chl- a under di ﬀ erent Chl- a controlling mechanisms. The in situ validation showed that rec1 performs best among the four reconstruction schemes, and that adding SST into the Chl- a reconstruction cannot improve the reconstruction results. However, for cross validation, adding SST can slightly improve spatial distributions of the root mean square error (RMSE) between the reconstructed data and the original data, especially over the SCS continental shelf. Furthermore, the potential of DINCAE prediction is conﬁrmed in this paper; thus, the trained DINCAE model can be re-applied to reconstruct other missing data, and more importantly, it can also be re-trained using the reconstructed data, thereby further improving reconstruction results. Another consideration is e ﬃ ciency; with similar reconstruction conditions, DINCAE is 5–10 times faster than DINEOF.


Introduction
Satellite-based monitoring of oceanic biogeochemical variables, such as chlorophyll-a concentration (Chl-a), often suffers from clouds, sun glint, thick aerosols, and other problems. The probability is only 25-30% to obtain cloud-free conditions over global oceans for near-daily coverage of MODIS measurements [1], and only ≈5% of pixels are high-quality [2]. This limitation strongly reduces the potential applications of ocean color observations. Therefore, missing data are a crucial problem when working with Chl-a satellite data (Chl-a sat ). Currently, many different methods have been developed to solve this problem by filling the missing values, ranging from simple linear interpolations and bicubic splines to the use of complicated statistical methods, such as optimal interpolation (OI) and Kriging. WPS only through Luzon Strait. There are different mechanisms controlling the Chl-a distributions in SCS and WPS, and this causes the mean chlorophyll concentration of SCS to be twice the concentration of the adjacent oligotrophic open ocean, including the upstream areas of the Kuroshio and WPS, as shown in Figure 2a; these statements are consistent with previous studies [14,30,31]. In SCS, the Chl-a distribution has distinct seasonality caused by monsoon winds [32,33], whereas in WPS, Chl-a has indistinct seasonality and variability is considered to be relatively small, with an essentially constant nutrient supply from the deep ocean throughout the year [34]. Therefore, we can test the performances of various models for the reconstruction of Chl-a under different Chl-a production mechanisms.

Study Area
The Northern South China Sea (SCS) and West Philippine Sea (WPS), bounded by 115-125°E and 16-24°N, were selected to test the model performance for the reconstruction of Chla (Figure 1b). SCS is the largest oligotrophic marginal sea in the tropical and subtropical Pacific Ocean and is connected to the WPS only through Luzon Strait. There are different mechanisms controlling the Chl-a distributions in SCS and WPS, and this causes the mean chlorophyll concentration of SCS to be twice the concentration of the adjacent oligotrophic open ocean, including the upstream areas of the Kuroshio and WPS, as shown in Figure 2a; these statements are consistent with previous studies [14,30,31]. In SCS, the Chl-a distribution has distinct seasonality caused by monsoon winds [32,33], whereas in WPS, Chl-a has indistinct seasonality and variability is considered to be relatively small, with an essentially constant nutrient supply from the deep ocean throughout the year [34]. Therefore, we can test the performances of various models for the reconstruction of Chl-a under different Chl-a production mechanisms.

Chl-a
The satellite obtained sea surface chlorophyll-a (Chl-a sat ) data used in this study are a daily merged product obtained from the GlobColour project (http://globcolour.info). The data presented are a weighted average of multiple sensors (MODIS and VIIRSN (Visible Infrared Imaging Radiometer Suite)) with a resolution of ≈4 km for a period from 1 January 2015 to 31 December 2018. These data have been developed, validated, and distributed by ACRI-ST, France. For the domain of study, the size of the images is 240 × 192 pixels. Due to cloud cover, 82% of the sea pixels are missing in this dataset. The temporal variability of total missing data presents an irregular distribution with a minimum daily spatial average of about 30-50% and a maximum daily spatial average of more than 90% (Figure 1a). Spatially, as shown in Figure 1b, regions with the highest percentages of missing data are the coastal waters of Mainland China and ocean waters to the east of Taiwan, with an average data loss higher than 85%. Regions with the lowest percentage in missing data are the coastal waters off the Western Philippine Islands, with an average data loss between 50 and 70%. In the remaining regions, the average percentage of missing data varies from 70 to 85%. In order to obtain reliable results, images with

Chl-a
The satellite obtained sea surface chlorophyll-a (Chl-asat) data used in this study are a daily merged product obtained from the GlobColour project (http://globcolour.info). The data presented are a weighted average of multiple sensors (MODIS and VIIRSN (Visible Infrared Imaging Radiometer Suite)) with a resolution of ≈4 km for a period from 1 January 2015 to 31 December 2018. These data have been developed, validated, and distributed by ACRI-ST, France. For the domain of study, the size of the images is 240 × 192 pixels. Due to cloud cover, 82% of the sea pixels are missing in this dataset. The temporal variability of total missing data presents an irregular distribution with a minimum daily spatial average of about 30-50% and a maximum daily spatial average of more than 90% (Figure 1a). Spatially, as shown in Figure  1b, regions with the highest percentages of missing data are the coastal waters of Mainland China and ocean waters to the east of Taiwan, with an average data loss higher than 85%. Regions with the lowest percentage in missing data are the coastal waters off the Western Philippine Islands, with an average data loss between 50 and 70%. In the remaining regions, the average percentage of missing data varies from 70 to 85%. In order to obtain reliable results, images with cloud cover larger than 98% were not used for the analysis, which is the same threshold as that of Alvera-Azcárate et al. [35], so the final temporal size was 1268 images (out of an initial 1461 images).

SST
This study is different from other multivariate DINEOF Chl-a reconstruction methodologies, which use gapped SST satellite images, such as Alvera-Azcárate et al. [14] and Damien Sirjacobs et al. [15]. We used cloud-free daily MW-IR SST data spanning from 1 January 2015 to 31 December 2018, which can provide more effective information for Chl-a reconstruction. This dataset was generated by Remote Sensing Systems (RSS), using multiple sources of satellite data and optimal interpolation (OI) (www.remss.com/measurements/seasurface-temperature/oisst-description/). The horizontal resolution was ≈9 km for SST, which was interpolated to the Chl-a grid (4 km).

SST
This study is different from other multivariate DINEOF Chl-a reconstruction methodologies, which use gapped SST satellite images, such as Alvera-Azcárate et al. [14] and Damien Sirjacobs et al. [15]. We used cloud-free daily MW-IR SST data spanning from 1 January 2015 to 31 December 2018, which can provide more effective information for Chl-a reconstruction. This dataset was generated by Remote Sensing Systems (RSS), using multiple sources of satellite data and optimal interpolation (OI) (www.remss.com/measurements/sea-surface-temperature/oisst-description/). The horizontal resolution was ≈9 km for SST, which was interpolated to the Chl-a grid (4 km).

Cross-Validation Data
There are two types of cross-validation datasets in this paper. One was used to optimize the reconstruction algorithm by minimizing the expected error between the cross-validation points and their reconstructed values, iteratively, denoted "test data" in Figure 3. The other dataset was considered to be the "missing data" in the whole reconstruction process, thereby considered to be independent in the evaluation of the performance of the reconstruction method, denoted "CV-data" in Figure 3. The CV-data were extracted following the methodology of Barth et al. [23]. First, the cloud masks from the first 50 images of the Chl-a time series were extracted. Second, the extracted cloud masks were "put" on the other 50 images, which were randomly selected from the rest of the Chl-a time series. In this way, a total of 286,461 measurements (0.55% of the total sea pixels and 2.57% of the total measurements) were withheld. The test data of the DINEOF methodology were extracted followed Alvera-Azcárate et al. [35], whereby the 20 cleanest images of the "Input data" in Figure 3 were covered using the clouds extracted from the rest of the images. Thus, a total of 434,834 measurements (0.83% of the total sea pixels and 4.01% of the valid pixels in the input data) were withheld. The test data for the DINCAE method are introduced in Section 3.2.
images, which were randomly selected from the rest of the Chl-a time series. In this way, a total of 286,461 measurements (0.55% of the total sea pixels and 2.57% of the total measurements) were withheld. The test data of the DINEOF methodology were extracted followed Alvera-Azcárate et al. [35], whereby the 20 cleanest images of the "Input data" in Figure 3 were covered using the clouds extracted from the rest of the images. Thus, a total of 434,834 measurements (0.83% of the total sea pixels and 4.01% of the valid pixels in the input data) were withheld. The test data for the DINCAE method are introduced in Section 3.2.  Figure 4 shows the temporal and spatial distributions of the CV-data. In terms of temporal distribution, the CV-data are randomly distributed among different days of the year. The ratio of the number of CV-data in each image to ocean pixels (41,098 points) is 0-60%. In terms of spatial distribution, CV-data are more densely distributed over the sea area near the Philippines and less densely distributed over the South China Sea continental shelf and the waters off the east coast of Taiwan, but generally cover the research area.  Figure 4 shows the temporal and spatial distributions of the CV-data. In terms of temporal distribution, the CV-data are randomly distributed among different days of the year. The ratio of the number of CV-data in each image to ocean pixels (41,098 points) is 0-60%. In terms of spatial distribution, CV-data are more densely distributed over the sea area near the Philippines and less densely distributed over the South China Sea continental shelf and the waters off the east coast of Taiwan, but generally cover the research area.

In Situ Dataset
In situ surface Chl-a data (called Chl-asitu) from 6 to 17 August 2015 were collected during the Kuroshio experiment in the summer of 2015. The total number of surface samples was 21

In Situ Dataset
In situ surface Chl-a data (called Chl-a situ ) from 6 to 17 August 2015 were collected during the Kuroshio experiment in the summer of 2015. The total number of surface samples was 21 from 15 in situ stations, the positions of which are shown by the blue circles in Figure 1b. For every in situ sample, the Chl-a satellite image with the shortest time interval (to the collection time of the in situ data sample) was selected, in order to search for the nearest grid cell relative to the location of the field measurement. In order to make the in situ validation more realistic, only cloudy satellite Chl-a matchups (20 samples) were used in the following calculations.

Methods
In this study, a total of four reconstruction schemes were implemented by combining the two reconstruction techniques and two input datasets, as shown in Table 1. The same CV-data were chosen to compare the performance of these four reconstruction schemes. The test data used to verify the reconstruction effort during the iteration were randomly generated internally by the reconstruction algorithm, shown in Figures 3 and 5.  At the data preprocessing stage, the respective time averages of Chl-asat and SSTsat are subtracted in order to obtain the Chl-a and SST anomalies. Thereafter, Chl-a and SST anomalies are scaled by the inverse of the error variance which is effectively a constant (zero for the data that are missing). The precise value of this constant is not important because this constant will be multiplied by a weight matrix which will be optimized during the network training [23]. In this study, the parameters of the input dataset in rec2 (rec1) are given in Table 2, and the total input dataset size was 13 (10) × 240 × 192 × 1124.  Additionally, Chl-a sat data were transformed to base-e logarithm (ln) values, following the approach used by Hilborn and Costa [36]. The following discussion refers to ln(mg m −3 ) unless otherwise stated. The root mean square error (RMSE) and bias used in this paper are defined as follows:

Index Var Name
Remote Sens. 2020, 12, 480 where n is the number of samples, y i is the satellite derived value, and y i is the reconstructed value.

DINEOF
Before making the calculations, the spatial and temporal means were removed from the Chl-a sat and SST sat data separately, and results were normalized; missing data were set to zero. The missing data were then calculated by an iterative procedure. For a detailed description of DINEOF, please refer to Alvera-Azcárate et al. [9] and Alvera-Azcárate et al. [16]. A filter of the temporal covariance matrix was applied to improve the temporal coherency of the results, following Alvera-Azcárate et al. [35]. This filter effectively keeps the coherency in time to ensure a smooth transition between subsequent images. The filter length was 1.1 days. In this study, DINEOF was implemented using the 3.0 Linux binary available through the University of Liège GeoHydrodynamics and Environment Research group (GHER) [37].
The optimal number of EOFs retained for the reconstruction was determined by cross validation using the "test data" in Figure 3, and the convergence threshold was set to 0.001. The expected maximum for the number of modes was 72. DINEOF has two options to reconstruct missing data; fully reconstructed images and filled images. The former were activated in this study. Therefore, all ocean color data were reconstructed at every ocean pixel (including non-missing pixels).

DINCAE
The DINCAE method was presented by Barth et al. [23]. This method can be trained using incomplete satellite observations, in order to reconstruct missing observations and to provide an error estimate for the reconstruction. The open source DINCAE toolbox based on Python is available at https://github.com/gher-ulg/dincae. The specific methodology used for the DINCAE reconstruction is shown in Figure 5. It can be divided into three phases: data preprocessing, the training phase, and the reconstruction phase.
At the data preprocessing stage, the respective time averages of Chl-a sat and SST sat are subtracted in order to obtain the Chl-a and SST anomalies. Thereafter, Chl-a and SST anomalies are scaled by the inverse of the error variance which is effectively a constant (zero for the data that are missing). The precise value of this constant is not important because this constant will be multiplied by a weight matrix which will be optimized during the network training [23]. In this study, the parameters of the input dataset in rec2 (rec1) are given in Table 2, and the total input dataset size was 13 (10) × 240 × 192 × 1124.

Index
Var Name 1 ln(Chl-a) anomalies scaled by the inverse of the error variance (zero if the data are missing) 2 Inverse of the error variance (zero if the data are missing)

3-4
Scaled ln(Chl-a) anomalies and inverse of error variance of the previous day 5- 6 Scaled ln(Chl-a) anomalies and inverse of error variance of the following day 7 Longitude (scaled linearly between −1 and 1) 8 Latitude (scaled linearly between −1 and 1) 9 Cosine of the day of the year divided by 365.25 10 Sine of the day of the year divided by 365. 25 11 SST anomalies scaled by the inverse of the error variance (zero if the data are missing) * 12 Scaled SST anomalies of the previous day * 13 Scaled SST anomalies of the following day * * used in rec2.
Remote Sens. 2020, 12, 480 8 of 21 During the training phase, the input data (see Figure 5) are used to train the DINCAE neural network iteratively. A benefit of the neural network training is that GPU (graphical processing unit) parallel computing can be applied, and the training speed is effectively accelerated. However, due to limits in GPU memory, the input dataset was randomly shuffled (over the time dimension) and then cut into minibatches that included 50 images, as an array of size 13 × 240 × 192 × 50. The complete time series was split into 26 minibatches with 50 images each and one last minibatch with only 18 images (representing a total of 1268, as mentioned before). An optimization cycle using all 26 minibatches is called an epoch. In order to prevent the minimization algorithm from being trapped in a local minimum, for each minibatch, the neural network must mark some valid data values as "missing" in order to test the training effect (test data), and the training itself uses the remaining valid data (training data) randomly. Nevertheless, the test data is eliminated in the reconstruction phase. Thus, the networks of DINCAE are trained using these training data, and the structure of DINCAE can be divided into five parts: an input layer, encoding layers, fully connected layers, decoding layers, and an output layer: The input layer is used to receive the training datasets (training phase) and test datasets (reconstruction phase).

2.
Encoding layers are equivalent to the singular value decomposition (SVD) in DINEOF, which can effectively reduce the dimensionality of the input data in order to simplify the complexity of the required computing used in the neural networks. The work of the convolution layer is to extract the features of the input data, where the number and scale of these features depend on the depth of the convolution layer, which is also called the "convolution kernel size". The work of the pooling layer is to compress the features extracted by the convolution layers, and the compression degree is determined by the type, size, and strides of the pooling layer. In this study, the size of convolution kernel was (3 × 3), and the maximum pooling was chosen, where pool size = (2,2) and strides = (2,2).

3.
Fully connected layers are equivalent to the hidden layer in the traditional feedforward neural network. Their function is to combine the extracted features nonlinearly. There were two full connection layers in this study, which were N/5 (rounded to the nearest integer) and N, where N is the number of neurons in the last pooling layer of the encoder.

4.
Decoding layers are composed of convolution layers and interpolation layers (nearest neighbor interpolation) to upsample the results. The interpolation layers are skip connected to the output of the pooling layers in order to capture small-scale structures that are lost in the encoding layers and fully connected layers [38].

5.
The output layer gives the results; that is, an array T ijk with a size of 240 × 192 × 2, including two parameters: T ij1 : Chl-a scaled by the inverse of the expected error variance; T ij2 : logarithm of the inverse of the expected error variance.
There are five encoding layers and five decoding layers in this work and their filter size is the same as that used by Barth et al. [23], hereafter referred to as the Barth scheme.
Similar to a DINEOF reconstruction (Figure 3), the CV-data in a DINCAE reconstruction are also not used during the reconstruction phase, as shown in Figure 5. In this study, the reconstruction was performed every 10 epochs to obtain the effect of reconstruction, by cross-validation. The rebuilt Chl-a anomalyĉ hl ij and the corresponding error varianceσ ij are computed as: where γ = 10 and δ = 10 −3 ln mg · L −1 −2 , the same as those used by Barth et al. [23]. The two constants are introduced to avoid a division by a value close to zero.
Overfitting is a common problem in neural network training, as it leads to degraded results. In order to avoid overfitting, there are four strategies carried out during DINCAE training: (1) A drop-out layer is introduced between the fully connected layers, and is randomly inactivated at a given rate (hereafter referred to as the dropout_rate, to be used later) of neurons in the fully connected layers. (2) Gaussian-distributed noise is added to the input data with a zero mean and a given standard deviation (hereafter referred to as jitter_std, to be used later). (3) A regularization technique is added to the Adam optimizer [39]. The basic idea of regularization is to add a penalty term to the loss function to punish the large weight and reduce the complexity of the model and prevent overfitting. In this study, the regularization parameter is set by , where L2 is the regularization term, β L2 is the weight for L2, and the value is given as L2_beta in the following discussion. Other standard parameters for the Adam optimizer are the learning rate α = 0.001 and the exponential decay rates for the first moment estimates β 1 = 0.9 and the second moment estimates, β 2 = 0.999.
(4) The activation function in the convolution layers uses Leaky-RELU [40] instead of the rectified linear unit (RELU) to prevent gradients from falling too fast: Note that during the reconstruction phase, drop-out and Gaussian-distributed noise are disabled.

Results
For DINEOF reconstruction, the optimal numbers of EOFs kept for reconstruction are 23 and 26 for rec3 and rec4 respectively, minimizing the expected errors to 0.35 and 0.36 ln(mg m −3 ), respectively, as shown in Figure 6. These optimal EOFs can explain 98.74% and 98.59% of the initial variance for rec3 and rec4, respectively, which are calculated as the variance of points not covered by clouds.
DINCAE is different from DINEOF. Whereas the DINCAE method is a new technique, DINEOF has proven its generality by many studies [10,[17][18][19]. Although Barth et al. [23] obtained good results in their SST reconstruction using their EOF scheme with dropout_rate = 0.3, jitter_std = 0.05, and L2_beta = 0, its robustness should be further tested in many future studies. In this study, we took the Barth scheme as a reference scheme to find the optimal combination of dropout_rate, jitter_std, and L2_beta for rec2, proceeding in a step-by-step methodology.
First, an optimal jitter_std was chosen from (0.05, 0.10, 0.15). The specific method is that we let dropout_rate = 0.3 and jitter_std = 0, the same as in the Barth scheme, but we changed jitter_std only. The RMSE between the CV-data and the corresponding reconstruction data was computed every 10 epochs. The results show that the RMSE is smallest when jitter = 0.05 (Figure 7a). We update jitter in the reference scheme, and here we kept the jitter to 0.05, and then, in a similar way, we then chose the optimal dropout_rate and L2_beta from (0.3, 0.9, 1.2) and (0, 0.01, 0.001, 0.0001), respectively. Finally, we obtained an optimal combination; that is, jitter_std = 0.05, dropout_rate = 0.9, and L2_beta = 0.001. Applying the same parameterization selection process to rec1, we obtained the same optimal combination as rec2. The results for rec1 and rec2, which use the optimal combination, are shown in Figure 7d. Both RMSE solid lines have an initial sharp decrease, and after 200 epochs they mostly stabilize, but still present some fluctuations. The dotted blue and orange lines represent the average reconstructions between epochs 200 and 1000 for rec1 and rec2 respectively. The results show that the performance of rec2 is slightly better than that of rec1.

Results
For DINEOF reconstruction, the optimal numbers of EOFs kept for reconstruction are 23 and 26 for rec3 and rec4 respectively, minimizing the expected errors to 0.35 and 0.36 ln(mg m −3 ), respectively, as shown in Figure 6. These optimal EOFs can explain 98.74% and 98.59% of the initial variance for rec3 and rec4, respectively, which are calculated as the variance of points not covered by clouds. DINCAE is different from DINEOF. Whereas the DINCAE method is a new technique, DINEOF has proven its generality by many studies [10,[17][18][19]. Although Barth et al. [23] obtained good results in their SST reconstruction using their EOF scheme with dropout_rate = 0.3, jitter_std = 0.05, and L2_beta = 0, its robustness should be further tested in many future studies. In this study, we took the Barth scheme as a reference scheme to find the optimal combination of dropout_rate, jitter_std, and L2_beta for rec2, proceeding in a step-by-step methodology.
First, an optimal jitter_std was chosen from (0.05, 0.10, 0.15). The specific method is that we let dropout_rate = 0.3 and jitter_std = 0, the same as in the Barth scheme, but we changed jitter_std only. The RMSE between the CV-data and the corresponding reconstruction data was computed every 10 epochs. The results show that the RMSE is smallest when jitter = 0.05 ( Figure  7a). We update jitter in the reference scheme, and here we kept the jitter to 0.05, and then, in a similar way, we then chose the optimal dropout_rate and L2_beta from (0.3, 0.9, 1.2) and (0, 0.01, 0.001, 0.0001), respectively. Finally, we obtained an optimal combination; that is, jitter_std = 0.05, dropout_rate = 0.9, and L2_beta = 0.001. Applying the same parameterization selection process to rec1, we obtained the same optimal combination as rec2. The results for rec1 and rec2, which use the optimal combination, are shown in Figure 7d. Both RMSE solid lines have an initial sharp decrease, and after 200 epochs they mostly stabilize, but still present some fluctuations. The dotted blue and orange lines represent the average reconstructions between epochs 200 and 1000 for rec1 and rec2 respectively. The results show that the performance of rec2 is slightly better than that of rec1.  In this study, the computing environment was an Intel Core E5-2643 V4, GeForce GTX1080Ti, with the tensorflow v1.2 network library [41]. The calculation times for the four reconstruction schemes from rec1 to rec4 were 5, 5.5, 24.2 and 61.9 h, respectively. Due to the GPU acceleration in computation, the calculation time of DINCAE (1000 epochs) was an order of magnitude smaller than that of DINEOF. As shown in Figure 5, the test data in each minibatch were randomly masked as clouds. This random selection ensures the fairness of the neural network training, but also leads to some fluctuations in the gradients, and therefore, in the weights of DINCAE. Averaging the output over different epochs can effectively eliminate these random fluctuations, which was shown by Barth et al. [23]. In this study, a total of 80 outputs of the neural network between epoch 200 and epoch 1000 (saved every 10th epoch) were averaged, and this averaged reconstruction is slightly better than any intermediate result (dotted lines in Figure 7d). In the analysis, we took those averaged fields to represent the DINCAE reconstructions.

Reconstruction Statistics Using CV-Data
The scatterplot between the CV-data (Chl-a sat ) and the corresponding Chl-a rec model results are shown in Figure 8 (Table 3), the intercepts are less than 0 and slopes are less than 1, meaning that the reconstructed Chl-a values are underestimated at high concentrations, and overestimated at low concentrations.  Spatially, the RMSE values calculated between CV-data and the corresponding reconstruction values, for each pixel time series, are shown in Figure 9. These four reconstruction schemes have similar RMSE spatial patterns, with larger RMSE values over the SCS continental shelf and coastal waters, compared to smaller RMSE values over the open ocean. It is notable that adding SST to the DINCAE reconstruction can reduce the RMSE over the shelf area (Figure 9a,b). By contrast, the RMSE over the shelf area increases when SST is added to the DINEOF reconstruction (Figure 9c,d).

Spatial Distribution of Reconstruction for a Specified Day
To further verify the ability of spatial reconstruction for these four reconstruction cases, we selected 17 May 2018, which has the most points in 50 CV-data figures to make comparisons between the spatial distributions of the satellite-derived and the modeled Chl-a. The corresponding SST distribution which was used as the auxiliary variable in rec2 and rec4 is also shown in Figure 10g. The satellite image, with more than 90% gaps over the whole study domain, was used in the reconstruction by DINCAE and DINEOF methodologies. These As mentioned in Section 2.1, the chlorophyll concentration controlling mechanisms are very different for SCS and WPS. For simplicity, we partitioned the study area into SCS and WPS by the 120-degree longitude line, in order to assess the model performance for the Chl-a reconstructions under different Chl-a production mechanisms. Table 3 shows the numbers and the mean of cross-validation points between these areas, SCS and WPS, and provides the statistical results for the four reconstruction schemes in these two sub-areas. The cross-validation points in the SCS and WPS are not much different, specifically 145,476 and 140,985 points respectively, whereas the average chlorophyll concentration for SCS is three times that of WPS (0.34 mg m −3 vs. 0.11 mg m −3 ). The comparison shows that, in general for all four reconstruction schemes, higher productivity is achieved in SCS than in WPS. Similar to the global reconstruction results shown in Figure 8, DINCAE is superior to DINEOF in these two sub-areas, producing lower RMSE and higher R. Specifically, DINCAE reconstructions are slightly improved by adding SST to the input data in SCS, as shown by comparing rec1 and rec2, producing lower RMSE values (from 0.26 to 0.25), higher R values (from 0.95 to 0.96), and biases that are nearest zero (from −0.027 to −0.019). However, the improvement from adding SST in DINCAE reconstructions disappears in WPS. For all reconstruction implementations (Table 3), the intercepts are less than 0 and slopes are less than 1, meaning that the reconstructed Chl-a values are underestimated at high concentrations, and overestimated at low concentrations.

Spatial Distribution of Reconstruction for a Specified Day
To further verify the ability of spatial reconstruction for these four reconstruction cases, we selected 17 May 2018, which has the most points in 50 CV-data figures to make comparisons between the spatial distributions of the satellite-derived and the modeled Chl-a. The corresponding SST distribution which was used as the auxiliary variable in rec2 and rec4 is also shown in Figure 10g. The satellite image, with more than 90% gaps over the whole study domain, was used in the reconstruction by DINCAE and DINEOF methodologies. These reconstructed fields were generated from the artificial cloud dataset Chl-a sat+cloud and have a reasonably accurate spatial distribution, with patterns that are also indirectly exhibited by those of Chl-a rec (see Figure 10a,c-f). As such, all four reconstruction schemes can reconstruct the high Chl-a variations (bright yellow color in Figure 10a-f) exhibited in the Taiwan Bank Upwelling and Dongshan Upwelling [12], where very strong upwelling brings cold deep water to the surface, leading to the relatively low SST shown in Figure 10g. Mesoscale features such as filaments associated with the Kuroshio near the Luzon Strait are evident, as shown in Chl-a and SST distributions (Figure 10a,g), and were also revealed in the four reconstruction methodologies (Figure 10c-f).
In order to clearly compare the spatial differences between the four modeled Chl-a and the remotely sensed Chl-a, we introduce the spatial distribution of relative error (Re err ) defined as (Chl-a rec -Chl-a sat )/Chl-a sat , shown in Figure 11. Obviously, the Re err values for rec1, rec2, and rec4 are much larger than those of rec3 near the Han River discharge area (22 • N, 115 • E). A possible reason is that the Chl-a distribution is greatly affected by runoff and human activities near the river plume, leading to low correlation coefficients (−0.15, 0.15) between Chl-a and SST, as shown in Figure 2b. DINCAE may not capture the characteristics of the Chl-a distribution in this area, or there may be overfitting. Furthermore, adding auxiliary data such as SST into the reconstruction models may not improve the results, and may even make them worse, given the near-zero correlation coefficients between Chl-a and SST in this area. For the remaining area, the performance of DINCAE is better than DINEOF, especially in the basin areas. Furthermore, all four methods perform well in terms of scatterplots and statistical results, as shown in Figure 12. Moreover, rec2 has the best overall results, producing the lowest RMSE (0.20), with intercept nearest to zero (−0.14), highest R (0.96), and slope closest to 1.0 (0.94).
the high Chl-a variations (bright yellow color in Figure 10a-f) exhibited in the Taiwan Bank Upwelling and Dongshan Upwelling [12], where very strong upwelling brings cold deep water to the surface, leading to the relatively low SST shown in Figure 10g. Mesoscale features such as filaments associated with the Kuroshio near the Luzon Strait are evident, as shown in Chl-a and SST distributions (Figure 10a,g), and were also revealed in the four reconstruction methodologies (Figure 10c-f). In order to clearly compare the spatial differences between the four modeled Chl-a and the remotely sensed Chl-a, we introduce the spatial distribution of relative error (Reerr) defined as (Chl-arec-Chl-asat)/Chl-asat, shown in Figure 11. Obviously, the Reerr values for rec1, rec2, and rec4 are much larger than those of rec3 near the Han River discharge area (22°N, 115°E). A possible reason is that the Chl-a distribution is greatly affected by runoff and human activities near the river plume, leading to low correlation coefficients (−0.15, 0.15) between Chl-a and SST, as shown in Figure 2b. DINCAE may not capture the characteristics of the Chl-a distribution in this area, or there may be overfitting. Furthermore, adding auxiliary data such as SST into the reconstruction models may not improve the results, and may even make them worse, given the near-zero correlation coefficients between Chl-a and SST in this area. For the remaining area, the performance of DINCAE is better than DINEOF, especially in the basin areas. Furthermore, all four methods perform well in terms of scatterplots and statistical results, as shown in Figure 12. Moreover, rec2 has the best overall results, producing the lowest RMSE (0.20), with intercept nearest to zero (−0.14), highest R (0.96), and slope closest to 1.0 (0.94).

Validation with In Situ Observations
A total of 20 samples were used to assess the performance of the Chl-a reconstruction algorithms. As shown in Figure 13, the Chl-arec of the four reconstruction schemes generally agree well with in situ Chl-a. Among them, the fitted line for rec1 (the blue dotted line in Figure  13) is closest to the ideal line (the grey solid line in Figure 13), with R, slope, and intercept values of 0.925, 0.81, and −0.16 ln(mg m −3 ), respectively (shown in Table 4). For these four reconstruction schemes, the slopes of the linear fit between Chl-asitu and Chl-arec are all less than 1, with negative intercepts and high correlation coefficients (R > 0.91), indicating the trend that reconstructed Chl-a values appear to be overestimated for lower values (<0.4 mg m −3 ) and underestimated for higher values (0.4-1.6 mg m −3 ); this pattern is similar to that shown by the cross validation mentioned above (Figures 8 and 12). One possible reason for this result is the smoothness of the reconstruction method, which makes the reconstructed values in the range of extreme values shift to the normal values. It is not expected that in the comparisons of the two pairs of univariate and multivariable reconstruction schemes, viz., in comparing rec1 vs. rec2 and rec3 vs. rec4, both univariate schemes perform better than the corresponding multivariable

Validation with In Situ Observations
A total of 20 samples were used to assess the performance of the Chl-a reconstruction algorithms. As shown in Figure 13, the Chl-a rec of the four reconstruction schemes generally agree well with in situ Chl-a. Among them, the fitted line for rec1 (the blue dotted line in Figure 13) is closest to the ideal line (the grey solid line in Figure 13), with R, slope, and intercept values of 0.925, 0.81, and −0.16 ln(mg m −3 ), respectively (shown in Table 4). For these four reconstruction schemes, the slopes of the linear fit between Chl-a situ and Chl-a rec are all less than 1, with negative intercepts and high correlation coefficients (R > 0.91), indicating the trend that reconstructed Chl-a values appear to be overestimated for lower values (<0.4 mg m −3 ) and underestimated for higher values (0.4-1.6 mg m −3 ); this pattern is similar to that shown by the cross validation mentioned above (Figures 8 and 12). One possible reason for this result is the smoothness of the reconstruction method, which makes the reconstructed values in the range of extreme values shift to the normal values. It is not expected that in the comparisons of the two pairs of univariate and multivariable reconstruction schemes, viz., in comparing rec1 vs. rec2 and rec3 vs. rec4, both univariate schemes perform better than the corresponding multivariable schemes, with a lower RMSE (0.

Discussion
In this paper, both CV-data and in situ data were used to assess the performances of four reconstruction schemes. Among them, the CV-data were marked as clouds and never used in the whole reconstruction process, so that those data could be considered independent. However, since the CV-data and input data were from the same dataset (see Figure 3), compared with other truly independent data, the distribution of the CV-data was obviously more similar to the input data. Therefore, the cross-validation results were expected to be unrealistically good when using these CV-data as the cross validation dataset. However, as mentioned above, the shortage of CV-data can be ignored when using the same CV-data to compare the performances of the four reconstruction schemes. For the in situ validation, when matching the fieldmeasured samples with the Chl-a image sequence, using a large time window (12 h) may lead to a slight bias in the field evaluation, but this was not expected to qualitatively alter our conclusions.
The methodology of the DINEOF cross-validation dataset selection (test data in Figure 3) is also very important. Usually, randomly-selected cross-validation points will lead to quite a high number of retained EOFs and an unrealistically low cross-validation RMSE. However, Figure 13. Comparison of the reconstructed Chl-a rec data from the four reconstruction schemes, with in situ Chl-a situ data. The dotted lines represent the fitted lines for the corresponding scatter points using the same colors, for the four schemes. The grey solid line represents the ideal line.

Discussion
In this paper, both CV-data and in situ data were used to assess the performances of four reconstruction schemes. Among them, the CV-data were marked as clouds and never used in the whole reconstruction process, so that those data could be considered independent. However, since the CV-data and input data were from the same dataset (see Figure 3), compared with other truly independent data, the distribution of the CV-data was obviously more similar to the input data. Therefore, the cross-validation results were expected to be unrealistically good when using these CV-data as the cross validation dataset. However, as mentioned above, the shortage of CV-data can be ignored when using the same CV-data to compare the performances of the four reconstruction schemes. For the in situ validation, when matching the field-measured samples with the Chl-a image sequence, using a large time window (12 h) may lead to a slight bias in the field evaluation, but this was not expected to qualitatively alter our conclusions.
The methodology of the DINEOF cross-validation dataset selection (test data in Figure 3) is also very important. Usually, randomly-selected cross-validation points will lead to quite a high number of retained EOFs and an unrealistically low cross-validation RMSE. However, using point clusters instead of random points as cross-validation data can reduce the number of retained EOFs and also improve the reconstruction results by removing high EOF modes which mostly contain noise.
The trained DINCAE, discussed in the previous sections, has been saved and is denoted rec1-2015 (and rec2-2015). The latter can be reloaded to reconstruct additional datasets that originate from the same source as trained data, but at different times, such as the dataset obtained for January-to-October 2019; this process is called DINCAE-prediction. In order to capture the capability of DINCAE-prediction, we extracted CV-data-2019 from a dataset from 2019 and used the remaining dataset, denoted input-2019, as input data for reconstruction, using the DINCAE method. The CV-data-2019 selection is the same as the test data selection described in Figure 5, but now covers the 10 cleanest images of input-2019, using the clouds extracted randomly from the rest of the images. Thus, a total of 214,121 measurements were withheld.
We used the models denoted rec1-2015/rec2-2015, trained with 900 epochs, using the dataset of the time period, 2015-2018, to reconstruct the missing Chl-a values of input-2019. The scatterplots between the CV-data-2019 and the corresponding predicted data using rec1-2015 and rec2-2015 are shown in Figure 14a

Conclusions
In this study, we successfully applied DINCAE to reconstruct a series of Chl-a satellite Both validations used CV-data and in situ data, proving that DINCAE (rec1 and rec2) performs better than DINEOF (rec3 and rec4) in Chl-a reconstructions. See Figures 8 and 13 for details. The same results were obtained by Barth et al. [23] in SST reconstructions. Therefore, it is reasonable to suggest that DINCAE is a valid generic reconstruction method like DINEOF. An unexpected result is that the performances of the two multivariate reconstructions (rec2 and rec4) are worse than those of the two univariate reconstructions (rec1 and rec3) with respect to in situ validation (see Figure 13 and Table 4). Compared with the cross-validation dataset, the number of in situ samples in this paper was too small (20 samples), and the measured time is too short, too concentrated (6 to 17 August 2015). Although the in situ validation is more convincing, the cross-validation in this study can better reflect the overall reconstruction effect of the missing dataset. In this study, we did not filter the outliers, which may slightly influence the performance of reconstruction methodologies. But this is not the key focus of this study.

Conclusions
In this study, we successfully applied DINCAE to reconstruct a series of Chl-a satellite products over the Northern South China Sea (SCS) and the West Philippine Sea (WPS). To our knowledge, this is the first time that the DINCAE technique has been used to reconstruct gaps in the Chl-a sat data. We also compared the reconstruction performances of DINCAE and DINEOF using Chl-a-only data, and Chl-a and SST data, as input data based on the same cross-validation data. In this methodology, a total of four reconstruction schemes are implemented. Both cross-validation and in situ validation show the high performances of DINEOF and DINCAE methods in Chl-a reconstruction. Furthermore, DINCAE is superior to DINEOF. A somewhat surprising result is that, in contrast to the cross-validation results, the performances of the multivariate reconstruction methods in in situ validation are not as good as those of the univariate reconstruction methods. The prediction capability of DINCAE was also tested and results were found to agree well with the CV-data-2019; see Figure 14a,b. Retraining the prediction model further improves the reconstruction results, as shown in Figure 14c,d.
It is notable that for similar results, DINCAE is 5-10 times faster than DINEOF. Both DINCAE and DINEOF have better performances in SCS than in WPS, where there are different chlorophyll growth and control mechanisms. Considering that the average SCS chlorophyll concentration is three times that of WPS, given the same sources of noise, the signal-to-noise ratio for Chl-a sat in SCS will always be higher than in WPS. Thus, the performances of these reconstruction methodologies are accordingly different.
A common problem in neural network training is overfitting of the provided observations, leading to degraded results when using the cross validation. Some technical solutions can prevent overfitting, but their parameters need to be modified to suit different datasets. Through a series of sensitivity tests, we found that L2_beta is a key parameter to prevent overfitting in DINCAE. Other parameters (dropout_rate and jitter_std) may only have small effects. Similar to Barth et al. in SST reconstruction, the epoch average improved the results, albeit not as clearly Barth et al. [23]. However, epoch-averaging is a necessary method, because there are limitations in the cross-validation data, because it can only represent a small part of the whole dataset. Therefore, an accurate determination of which epoch can provide the actual minimum error for the reconstruction result is impossible; however, the epoch-averaging method can provide a way to resolve this issue.
In the data analysis methods that can be used to reconstruct missing data, the methodologies in this paper can be used to calculate reduction dimension projections, such as the singular value decomposition in DINEOF and the auto-encoder in DINCAE. This approach will inevitably lose some small-scale information. Therefore, data smoothing occurs. However, DINCAE can capture nonlinear and mutated relationships, thereby resolving part of the problem. Thus, this may be the reason that DINCAE is better than DINEOF, in terms of RMSE and the linear correlation coefficients between the CV-data (in situ data) and the corresponding molded data. In the future, a multivariate DINCAE