Biogeochemical Model Optimization by Using Satellite-Derived Phytoplankton Functional Type Data and BGC-Argo Observations in the Northern South China Sea

Marine biogeochemical models have been widely used to understand ecosystem dynamics and biogeochemical cycles. To resolve more processes, models typically increase in complexity and require optimization of more parameters. Data assimilation is an essential tool for parameter optimization, which can reduce model uncertainty and improve model predictability. At present, model parameters are often adjusted using sporadic in-situ measurements or satellite-derived total chlorophyll-a concentration at the sea surface. However, new ocean datasets and satellite products have become available, providing a unique opportunity to further constrain ecosystem models. Biogeochemical-Argo (BGC-Argo) floats are able to observe the ocean interior continuously, and satellite phytoplankton functional type (PFT) data have the potential to optimize biogeochemical models with multiple phytoplankton species. In this study, we assess the value of assimilating BGC-Argo measurements and satellite-derived PFT data in a biogeochemical model in the northern South China Sea (SCS) by using a genetic algorithm. The assimilation of the satellite-derived PFT data was found to improve not only the modeled total chlorophyll-a concentration, but also the individual phytoplankton groups at the surface. The improvement of simulated surface diatoms provided a better representation of subsurface particulate organic carbon (POC). However, using satellite data alone did not improve vertical distributions of chlorophyll-a and POC. Instead, these distributions were improved by combining the satellite data with BGC-Argo data. As the dominant variability of phytoplankton in the northern SCS is at the seasonal timescale, we find that utilizing monthly averaged BGC-Argo profiles provides an optimal fit between model outputs and measurements in the region, better than using high-frequency measurements.


Introduction
Numerical models play a vital role in investigating complex marine ecosystem dynamics. They require multiple parameters to formulate ecological processes, but these parameters are often difficult to constrain. As the complexity of a marine model increases, so does the number of model parameters. Parameter values and their uncertainties have a great influence on model performance [1,2], and one way of constraining them is through data assimilation. Assimilating ocean remote sensing observations into the model to adjust model parameters and reduce their uncertainties can help towards a better representation of marine ecosystem dynamics [3][4][5][6][7][8][9].
Processes of growth, decay, and interaction among plankton are important in understanding marine ecosystem models and dynamics. Model parameters related to these processes are usually tuned empirically and somewhat arbitrarily. To reproduce observed data, such as the distribution of phytoplankton, and to analyze the underlying dynamics, model parameters must be reasonably estimated. Here, we developed and optimized a physical-biogeochemical model in the South China Sea (SCS) to study phytoplankton distributions and dynamics. The SCS is a large semi-enclosed marginal sea in the western Pacific. In winter, the SCS is dominated by the strong northeasterly monsoon, whereas in summer the winds reverse direction to southwesterly. The seasonal change of monsoon winds leads to variability in the upper ocean circulation [10,11]. The dominant temporal variability of biogeochemical processes in the upper ocean occurs on the seasonal timescale [12]. Ning et al. [13] reported low surface production in summer and high surface production in winter. In the northern SCS, previous studies showed a negative correlation between satellite-derived chlorophyll-a concentration (Chla) and sea surface temperature [14,15]. Recently, Geng et al. [16] demonstrated that buoyancy-flux-induced mixing controls the seasonal variability of vertical nutrient transport and phytoplankton production in the northern SCS.
The winter mixing also has the potential to shoal the subsurface chlorophyll-a maximum (SCM) and change the vertical distribution of chlorophyll-a [16,17]. The SCM is a common feature in the northern SCS, contributing significantly to the depth-integrated primary production [18]. The formation and maintenance of SCM in the SCS have been investigated with idealized models [16,17,19], which identified key factors such as detritus remineralization, zooplankton grazing, phytoplankton sinking, and phytoplankton photoacclimation in modulating the SCM. However, the relative importance of these processes is largely determined by their parameterizations in the model. Rigorously quantifying processes that influence vertical distributions of chlorophyll-a thus requires a prior optimization of the model parameters.
Satellite-derived chlorophyll-a concentration has been widely used to constrain or evaluate biogeochemical models. However, ocean color satellites only detect the near-surface layer. The SCM is generally located at ~75 m in the northern SCS [16], which is far beyond the detection depth of satellites. In addition, satellite-derived chlorophyll-a is the bulk value of the near-surface water. It can be used to constrain a biogeochemical model that simulates only one phytoplankton group. For biogeochemical models with multiple phytoplankton groups or sizes, large uncertainties exist when using the satellite-derived bulk chlorophyll-a concentration to constrain the model, as additional assumptions are required on the relative contributions of phytoplankton groups [20][21][22][23].
Recent advances in estimating phytoplankton functional type (PFT) from satellite provide an opportunity to constrain models with multiple phytoplankton groups [24,25]. The PFT models are mostly derived from in situ High Performance Liquid Chromatography (HPLC) pigment data. Pigments determined by HPLC can be found in a variety of phytoplankton taxa and size classes, which may introduce uncertainties in the PFT model. For satellite PFT data, per-pixel uncertainties are generally difficult to quantify, especially in regions that are not covered by satellite and in situ match-up datasets [25]. Additional uncertainties can also occur because of differences in the temporal and spatial scales between satellite and in-situ data. Nevertheless, assimilation of PFT data in the model has been shown to improve the simulation of phytoplankton community structure and produce a better total chlorophyll-a forecast in the North Atlantic [26,27]. Other studies have used PFT data to provide an eco-regionalization of the Mediterranean Sea [28] and to improve the simulation of PFTs in the global ocean [29]. Biogeochemical-Argo (BGC-Argo) floats are able to sample the ocean vertically and continuously. Wang et al. [9] reported that assimilating profiles sampled by a BGC-Argo float yielded significant improvements for both surface and subsurface chlorophyll-a simulation. Utilizing both satellite PFT and BGC-Argo data in multi-phytoplankton biogeochemical models may thus have great potential in adjusting model parameters.
In this study, a one-dimensional (1D) physical-biogeochemical model has been developed in the northern SCS. By using a genetic algorithm, we investigate the value of using satellite PFT and BGC-Argo data to optimize model parameters. The aim of this study is to improve the simulation of vertical distributions of phytoplankton chlorophyll-a and particulate organic carbon (POC) concentrations in the northern SCS. Due to the computational cost, parameter optimization in a three-dimensional (3D) model is challenging. However, parameters optimized by using the 1D model can provide a useful baseline for 3D modeling [9,30,31].

Model Description
A coupled physical-biogeochemical model was developed in the northern SCS. The physical model is based on the Regional Ocean Modeling System (ROMS), which represents an evolution in the family of terrain-following vertical-coordinate models [32]. The model was set up with 100 layers in the vertical direction. The Mellor-Yamada Level 2.5 turbulence closure scheme was used in the model. The 1D model only considers vertical mixing processes without advection. Although it is simplified, previous studies have shown that the 1D coupled model can simulate vertical structures of biogeochemical variables reasonably well in the SCS basin [16,17]. The biogeochemical model is based on the Carbon, Silicate, and Nitrogen Ecosystem (CoSiNE) model, which has been calibrated and configured in the SCS [33,34]. The model has 2 phytoplankton groups (pico-phytoplankton (S1, Chl1) and diatoms (S2, Chl2)), 2 zooplankton groups (micro-zooplankton (Z1) and mesozooplankton (Z2)), 2 size classes of particulate organic nitrogen (small (SPON) and large (LPON)), biogenic silica (bSi), 4 inorganic nutrients (nitrate (NO3), ammonium (NH4), phosphate (PO4), and silicate (Si(OH)4)), dissolved oxygen (DO), and carbonate variables (dissolved inorganic carbon (DIC) and total alkalinity (TALK)). In addition to the nitrogen-based biomass of phytoplankton, the chlorophyll concentration of each phytoplankton group (pico-phytoplankton and diatoms) was modeled following Geider et al. [35] by considering phytoplankton photo-acclimation with a variable chlorophyll-to-biomass ratio. Nutrients such as nitrate, ammonium, and phosphate determine the growth of phytoplankton. Silicate is the additional nutrient that controls the growth of diatoms. Microzooplankton graze on pico-phytoplankton, while mesozooplankton graze on diatoms, microzooplankton, and detritus. The mortality and aggregation of phytoplankton and zooplankton form detritus, which is remineralized into inorganic matter during sinking.
The 2 sizes of detritus were parameterized with different sinking speeds and remineralization rates. Model equations are as follows:

$$\frac{\partial C}{\partial t} = \mathrm{PHY}(C) + \mathrm{BIO}(C)$$

where C represents the concentration of a biological variable, PHY(C) represents the contribution to the concentration change due to physical processes, and BIO(C) represents biogeochemical source-minus-sink terms. The detailed equations are presented in Appendix A and the original model parameters are listed in Ma et al. [34].
The CoSiNE model has the capability of simulating 2 phytoplankton functional groups, pico-phytoplankton and diatoms, which are the 2 dominant phytoplankton groups in the northern SCS. Resolving phytoplankton functional groups in the model is important for realistic simulations of nutrient and carbon dynamics, because diatom-related aggregation and grazing processes tend to generate large particles that sink faster and remineralize deeper than those from pico-phytoplankton-related processes. As a consequence, biogeochemical models with 1 phytoplankton group may have difficulties in reasonably simulating the vertical distribution of the remineralization of particulate organic matter, which can further lead to model bias in nutrient and carbon distributions. Furthermore, the CoSiNE model has been applied and validated in the SCS in many different studies. It has been used to study mesoscale eddies [36], Kuroshio intrusion fronts [37], carbon export [34], and cross-shelf exchange [38], which shows its applicability in the SCS. Because of the model's ability to simulate two phytoplankton groups, we were able to utilize both satellite-derived PFT data and total chlorophyll-a concentration to constrain the model.
The coupled model was initialized with the climatological data from the World Ocean Atlas 2009 (WOA09) and was forced by the 6-hourly surface forcing fields from NCEP/NCAR reanalysis data, including air temperature at 2 m, surface wind components at 10 m, relative humidity at 2 m, sea level pressure, total cloud coverage, and net short- and long-wave radiation. The model was run for 4 years from January 2013 to December 2016.
The model was set up at the South East Asian Time series Study (SEATS; 18°N, 116°E) station in the northern SCS (Figure 1). The SEATS is a station that is representative of typical physical and biogeochemical conditions in the northern SCS. The SEATS station is located away from coastal upwelling regions in the SCS [39][40][41]. The major nutrient supply to the upper layers is via vertical mixing [42]. The seasonal change of monsoon winds leads to variability in the upper ocean circulation. The dominant temporal variability of biogeochemical processes in the upper ocean occurs at the seasonal timescale, and surface production stays low in summer and high in winter. In the SCS, mesoscale eddies are ubiquitous and have an important influence on biological processes, but eddies are often sporadic and do not show a clear seasonal pattern in the northern SCS [43]. Therefore, the influence of eddies is not discussed in the parameter optimization process in this study.

Sensitivity Analysis
There are more than 40 parameters related to biochemical processes in the CoSiNE model. Parameters with high sensitivity should have priority for optimization. Therefore, a sensitivity analysis was carried out first to identify the parameters that have an important influence on modeled chlorophyll-a concentration. The initial parameter values were taken from previous studies [34,44,45,46], and possible ranges for each parameter, before the optimization, were defined following the method of Hemmings et al. [47] and Kaufman et al. [31]. The upper and lower bounds for biological parameters, such as parameters related to phytoplankton growth and zooplankton grazing, were selected following the rules of previous studies [31,47]. For the parameters that are unique to the CoSiNE model, bounds were set to half and double the initial values. Fractional parameters were allowed to vary from 0.05 to 0.95.
The sensitivity analysis experiment was conducted in 2 steps. Firstly, we performed a screening sensitivity analysis. A sensitivity function, $S_{c,x}$, was used to measure the local sensitivity of model outputs to the biological parameters [45,48]. In this function, $X_{x-\%}$ is the value of a biological parameter reduced by a fixed ratio (50%) and $C_{x-\%}$ is the corresponding annual mean chlorophyll-a concentration at the sea surface. The model is considered more sensitive to parameters with larger $S_{c,x}$ values. With this approach, we identified 20 sensitive parameters by using a threshold of $S_{c,x}$ > 20%. These parameters have large impacts on phytoplankton dynamics at the SEATS station. The sensitivity of the remaining parameters was less than 20%, and the variations in phytoplankton chlorophyll-a due to these insensitive parameters were small; these parameters were therefore ignored in the following analysis.
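The screening step can be illustrated with a minimal one-at-a-time sketch. Everything here is an assumption made for illustration: `toy_surface_chl` is a hypothetical closed-form stand-in for a full model run, the parameter names are invented, and the index used (relative change of annual-mean surface chlorophyll-a, in percent, when a parameter is reduced by 50%) is one plausible form rather than the exact sensitivity function of the study.

```python
def toy_surface_chl(params):
    """Hypothetical stand-in for a model run.

    Returns a number playing the role of annual-mean surface chlorophyll-a.
    """
    return 0.2 * params["growth"] / (params["grazing"] + 0.5) + 0.01 * params["sinking"]


def screen_parameters(model, defaults, ratio=0.5, threshold=20.0):
    """Perturb each parameter downward by `ratio`, one at a time."""
    c_ref = model(defaults)
    sensitivities = {}
    for name, value in defaults.items():
        perturbed = dict(defaults)
        perturbed[name] = value * (1.0 - ratio)
        c_pert = model(perturbed)
        # Assumed index: relative change of the output, in percent.
        sensitivities[name] = abs(c_pert - c_ref) / abs(c_ref) * 100.0
    # Keep only parameters whose index exceeds the screening threshold.
    sensitive = [n for n, s in sensitivities.items() if s > threshold]
    return sensitivities, sensitive


defaults = {"growth": 2.0, "grazing": 1.0, "sinking": 1.0}
sens, selected = screen_parameters(toy_surface_chl, defaults)
print(selected)
```

In a real application, `model` would wrap a full 1D simulation, so the cost of the screening is one model run per parameter plus one reference run.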
Secondly, we performed a global sensitivity analysis on the sensitive parameters identified in the first step to further pick out the key parameters. Following the approach of Hemmings et al. [47], we used Monte Carlo sampling to obtain 500 different combinations of the 20 sensitive parameters [49,50]. The sampling ranges were the same as the parameter optimization ranges. We then conducted 500 model runs, each with a unique combination of parameter values. We computed the coefficient of determination ($r^2$) between each parameter and the simulated chlorophyll-a concentration at the sea surface to quantify the amount of variance in the outputs explained by that parameter. After that, we ranked the parameters by $r^2$ and selected the key parameters for the subsequent optimization process.
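The second step can be sketched as follows. This is a simplified illustration under stated assumptions: parameters are drawn uniformly within their bounds, $r^2$ is taken as the squared Pearson correlation between each sampled parameter and the output, and `toy_model` with its three invented parameters is a hypothetical replacement for the 500 full model runs.

```python
import numpy as np


def sample_parameters(bounds, n_samples, rng):
    """Draw uniform random parameter sets within (lower, upper) bounds."""
    lower = np.array([b[0] for b in bounds.values()])
    upper = np.array([b[1] for b in bounds.values()])
    return rng.uniform(lower, upper, size=(n_samples, len(bounds)))


def rank_by_r2(samples, outputs, names):
    """Rank parameters by squared Pearson correlation with the output."""
    r2 = {}
    for j, name in enumerate(names):
        r = np.corrcoef(samples[:, j], outputs)[0, 1]
        r2[name] = r ** 2
    return sorted(r2.items(), key=lambda kv: kv[1], reverse=True)


def toy_model(p):
    """Hypothetical stand-in for one model run (not the CoSiNE model)."""
    growth, grazing, sinking = p
    return 0.2 * growth / (grazing + 0.5) + 0.01 * sinking


rng = np.random.default_rng(0)
bounds = {"growth": (1.0, 4.0), "grazing": (0.5, 2.0), "sinking": (0.5, 2.0)}
samples = sample_parameters(bounds, 500, rng)
outputs = np.array([toy_model(p) for p in samples])
ranking = rank_by_r2(samples, outputs, list(bounds))
for name, value in ranking:
    print(f"{name}: r2 = {value:.3f}")
```

Because all parameters vary simultaneously, the ranking reflects each parameter's share of the output variance across the whole sampled range rather than its effect near the default values.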

Genetic Algorithm
A genetic algorithm (GA) is a powerful tool for solving various optimization problems. It is an iterative random search algorithm in which a randomly generated population of individuals evolves towards better solutions. Each individual represents a solution to the problem and is characterized by its fitness, which determines its chance of survival. The fitness is usually the value of the objective function in the optimization problem being solved. New individuals are built by means of crossover and mutation operators. A crossover operator produces 2 offspring by randomly combining and exchanging the elements of 2 parent individuals. Mutation adds small random changes to an individual. A genetic algorithm can reduce the risk of premature convergence by re-initializing after each convergence and creating new random individuals while maintaining the best-fit individual from the iterative process [51].
The input parameters of the GA are the population size, the crossover probability, the mutation probability, and the maximum number of generations. In this study, those 4 parameters were set to 20, 0.6, 0.1, and 1000, respectively. In this optimization algorithm, the individuals of the GA represent CoSiNE models with different parameter sets. The fitness is denoted by the cost function, which measures the difference between the model result and observations. Individuals of each generation were selected using a roulette model to generate a combination of parameters with better performance according to the pre-set crossover and mutation probabilities. The process of "survival of the fittest" implies a maximization procedure. The entire process is conducted to further improve the cost function until a stopping criterion is met. Possible stopping criteria are related to the optimal fitness value or the maximum number of generations. The cost function, F, is defined as:

$$F = \frac{1}{N}\sum_{i=1}^{N}\frac{\left(x_{\mathrm{simulated},i} - x_{\mathrm{observed},i}\right)^{2}}{\sigma^{2}}$$

where x_simulated is the simulated value, x_observed is the observed value, σ is the standard deviation of the observations, and N is the number of observations. The aim of the GA process is to search for the parameter combination that minimizes the misfit between the observations and simulations. The flow chart of the entire model parameter optimization process is shown in Figure 2.
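The optimization loop described above can be sketched in a few lines. This is an illustrative sketch, not the implementation used in the study: the cost is assumed to be the mean squared misfit normalized by the observation variance, roulette selection uses an assumed fitness of 1/(1 + F), the re-initialization step is omitted, and `toy_model` (a two-parameter exponential decay) is a hypothetical replacement for the biogeochemical model. Only 200 generations are run here to keep the example fast.

```python
import numpy as np


def cost_function(simulated, observed, sigma):
    """Mean squared misfit scaled by the observation variance (assumed form)."""
    return np.mean((simulated - observed) ** 2) / sigma ** 2


def run_ga(cost, bounds, pop_size=20, p_cross=0.6, p_mut=0.1, n_gen=200, seed=0):
    """Minimal GA: roulette selection, uniform crossover, Gaussian mutation, elitism."""
    rng = np.random.default_rng(seed)
    lower, upper = bounds[:, 0], bounds[:, 1]
    pop = rng.uniform(lower, upper, size=(pop_size, len(bounds)))
    costs = np.array([cost(ind) for ind in pop])
    history = [costs.min()]
    for _ in range(n_gen):
        best = pop[np.argmin(costs)].copy()
        # Roulette selection: a lower cost gives a larger selection probability.
        fitness = 1.0 / (1.0 + costs)
        parents = pop[rng.choice(pop_size, size=pop_size, p=fitness / fitness.sum())]
        children = parents.copy()
        # Crossover: each consecutive pair exchanges a random subset of parameters.
        for i in range(0, pop_size - 1, 2):
            if rng.random() < p_cross:
                swap = rng.random(len(bounds)) < 0.5
                children[i, swap] = parents[i + 1, swap]
                children[i + 1, swap] = parents[i, swap]
        # Mutation: small random changes, clipped to the allowed range.
        mutate = rng.random(children.shape) < p_mut
        noise = rng.normal(0.0, 0.1, children.shape) * (upper - lower)
        children = np.clip(children + mutate * noise, lower, upper)
        children[0] = best  # elitism: always keep the best individual so far
        pop = children
        costs = np.array([cost(ind) for ind in pop])
        history.append(costs.min())
    return pop[np.argmin(costs)], history


# Synthetic "observations" from a hypothetical two-parameter model.
t = np.linspace(0.0, 5.0, 30)


def toy_model(p):
    return p[0] * np.exp(-p[1] * t)


observed = toy_model(np.array([1.5, 0.4]))
sigma = observed.std()
bounds = np.array([[0.5, 3.0], [0.1, 1.0]])
best, history = run_ga(lambda p: cost_function(toy_model(p), observed, sigma), bounds)
print(best, history[0], history[-1])
```

Because the best individual is carried over unchanged, the best cost in `history` can never increase from one generation to the next, which makes the run easy to monitor against a stopping criterion.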

Data and Optimization Experiments
The observations used during the optimization process include ocean color data and BGC-Argo profiles. Ocean color-derived chlorophyll-a concentration with a horizontal resolution of 4 km was obtained from the Ocean Color Climate Change Initiative (OC-CCI) dataset provided by the European Space Agency [52]. The data combine measurements from 4 sensors: the Sea-viewing Wide Field of View Sensor (SeaWiFS), the Moderate-Resolution Imaging Spectroradiometer (MODIS), the Medium Resolution Imaging Spectroradiometer (MERIS), and the Visible Infrared Imaging Radiometer (VIIRS). Remote-sensing reflectance data from MODIS-Aqua, MERIS, and VIIRS were band-shifted to match the wavebands of SeaWiFS. The merged products were validated against in-situ observations. The uncertainties (bias and RMSD) are assigned to every pixel in the products [21]. Daily OC-CCI chlorophyll-a concentration data within a 3 × 3 pixel box around the SEATS station were used. We compared the satellite data with in-situ measurements, and the correlation was up to 0.93 at the SEATS station. These OC-CCI chlorophyll-a data were further processed into a daily PFT dataset following Lin et al. [24] and Brewin et al. [25], which contains 3 phytoplankton groups: pico-, nano-, and micro-phytoplankton. The details of the 3-component PFT model of phytoplankton size classes are given in Appendix B. As our model only includes pico-phytoplankton and diatoms, the PFT-derived nano- and micro-phytoplankton were combined to constrain the modeled diatoms in the data assimilation. From these data, the ratios of pico-phytoplankton and diatom (nano- and micro-phytoplankton) chlorophyll-a concentrations to total chlorophyll-a were about 76% and 24% for the whole year, respectively. Lin et al. [24] collected remote sensing and in situ pigment data during SCS cruises from 2006 to 2012.
From their measurements, the ratio of pico-phytoplankton chlorophyll-a to total chlorophyll-a concentration was between ~60% and ~80%, with slightly higher values in winter and lower values in summer, which is consistent with the data used in this study. With the PFT data, the GA optimization calculated the cost function for each phytoplankton group separately and then minimized the total weighted F value at each generation.
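The mapping from three satellite size classes to the two model groups, and the group-wise cost, can be sketched as below. All numbers are synthetic and for illustration only; the equal weights `w1` and `w2`, the function names, and the variance-normalized form of the misfit are assumptions, since the text does not specify how the group costs are weighted.

```python
import numpy as np


def merge_size_classes(pico, nano, micro):
    """Map three satellite size classes onto the two model groups."""
    diatom_like = nano + micro  # nano + micro constrain the modeled diatoms
    total = pico + diatom_like
    return pico, diatom_like, pico / total, diatom_like / total


def group_cost(simulated, observed):
    """Assumed misfit: mean squared difference scaled by observation variance."""
    return np.mean((simulated - observed) ** 2) / observed.var()


def total_weighted_cost(sim_s1, sim_s2, obs_s1, obs_s2, w1=0.5, w2=0.5):
    """Cost for each phytoplankton group plus their weighted sum."""
    f_s1 = group_cost(sim_s1, obs_s1)
    f_s2 = group_cost(sim_s2, obs_s2)
    return f_s1, f_s2, w1 * f_s1 + w2 * f_s2


# Synthetic surface chlorophyll-a values (mg m^-3) for three days.
pico = np.array([0.08, 0.10, 0.20])
nano = np.array([0.02, 0.02, 0.05])
micro = np.array([0.01, 0.01, 0.03])
obs_s1, obs_s2, frac_s1, frac_s2 = merge_size_classes(pico, nano, micro)

# Hypothetical model output: slight overestimate of S1, large overestimate of S2.
sim_s1, sim_s2 = 1.1 * obs_s1, 1.5 * obs_s2
f_s1, f_s2, f_total = total_weighted_cost(sim_s1, sim_s2, obs_s1, obs_s2)
print(frac_s1, frac_s2, f_s1, f_s2, f_total)
```

Keeping the two group costs separate means that an error in the partition between groups is penalized even when the modeled total chlorophyll-a matches the satellite total.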
A BGC-Argo float was deployed in the northern SCS on 27 June 2014 (Figure 1), profiling every 1 to 5 days with a vertical resolution of ~2 m above 1000 m and ~50 m from 1000 to 2000 m depth. The float always surfaced near local midnight to avoid non-photochemical quenching of in vivo fluorescence [53,54]. The float was equipped with an SBE 41CP CTD and a WETLabs MCOMS 3-in-1 optical sensor that included sensors for chlorophyll-a fluorescence and the particulate backscattering coefficient at 700 nm ($b_{bp}(700)$). The float data were processed following Xing et al. [55]. Near the float deployment time and location, water was collected for in situ calibration of the float's chlorophyll-a fluorometer [55]. Before parameter optimization, we removed abnormal outliers. The $b_{bp}(700)$ profiles were smoothed with a 5-point running median filter to remove unexpected spikes [56,57]. The particulate organic carbon (POC) concentration was calculated from measured $b_{bp}(700)$ using an empirical relationship [58] that was also validated in the SCS [59].
We conducted a series of optimization experiments to assimilate the ocean color data and float observations of chlorophyll-a. In these experiments, we applied different observation data for the calculation of F (Table 1), and each experiment was initialized from the same GA individuals. In the control (CTRL) run, the model used default model parameters without data assimilation. In experiment 1 (EXP1), only satellite data were used to optimize the model. EXP1a compared the modeled total sea surface chlorophyll-a with satellite ocean color data. EXP1b adopted ocean color PFT data to calculate the cost function for each phytoplankton group separately. In experiment 2 (EXP2), BGC-Argo profiles of chlorophyll-a from 5 m to 150 m were used. In addition, the depth-integrated chlorophyll-a between 65 m and 85 m was added to the cost function to better capture the SCM feature.
In experiment 3 (EXP3), both ocean color PFT data and float profiles of chlorophyll-a were used for optimization. In addition, we set up two more experiments based on EXP3 (EXP-S and EXP-M), in which seasonally averaged and monthly averaged BGC-Argo profiles, respectively, were assimilated to eliminate the high-frequency signal in the float data; all other settings were kept the same.

Table 1. Observation data used to calculate the cost function in each experiment.

| Experiment | Observation Data |
|---|---|
| CTRL | - |
| EXP1a | satellite sea surface chlorophyll-a |
| EXP1b | satellite-derived PFT data |
| EXP2 | BGC-Argo profiles of chlorophyll-a |
| EXP3 | PFT data and BGC-Argo profiles of chlorophyll-a |
| EXP-S | PFT data and seasonal averaged BGC-Argo profiles of chlorophyll-a |
| EXP-M | PFT data and monthly averaged BGC-Argo profiles of chlorophyll-a |
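The float preprocessing steps used in these experiments (5-point running median, depth integration between 65 m and 85 m, and monthly averaging of profiles) can be sketched as follows. This is a minimal illustration under assumptions: edges are padded by repeating the end values, the integral uses the trapezoidal rule on a regular 2 m grid, and all function names and sample values are hypothetical.

```python
import numpy as np


def running_median(values, window=5):
    """Running median; edges are padded by repeating the end values (assumption)."""
    half = window // 2
    padded = np.pad(values, half, mode="edge")
    return np.array([np.median(padded[i:i + window]) for i in range(len(values))])


def depth_integral(depth, chl, top=65.0, bottom=85.0):
    """Trapezoidal integral of a profile between two depths (inclusive)."""
    mask = (depth >= top) & (depth <= bottom)
    z, c = depth[mask], chl[mask]
    return float(np.sum(0.5 * (c[1:] + c[:-1]) * np.diff(z)))


def monthly_mean_profiles(months, profiles):
    """Average profiles of shape (n_profiles, n_depths) by month label."""
    return {int(m): profiles[months == m].mean(axis=0) for m in np.unique(months)}


# A single spike is removed by the 5-point median.
spiky = np.array([1.0, 1.0, 10.0, 1.0, 1.0])
smoothed = running_median(spiky)

# Constant 0.5 mg m^-3 over a 20 m layer integrates to 10 mg m^-2.
depth = np.arange(5.0, 151.0, 2.0)
chl = np.full_like(depth, 0.5)
scm_inventory = depth_integral(depth, chl)

# Two profiles in month 1 and one in month 2.
months = np.array([1, 1, 2])
profiles = np.array([[1.0, 1.0], [3.0, 3.0], [5.0, 5.0]])
monthly = monthly_mean_profiles(months, profiles)
print(smoothed, scm_inventory, monthly)
```

Averaging before assimilation reduces the number of observations entering the cost function and removes profile-to-profile variability that a 1D model without advection is not expected to reproduce.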

Seasonal Variation of Chlorophyll-a
Temporal variations of satellite-derived chlorophyll-a and BGC-Argo profiles are shown in Figure 3. In the surface layer, satellite-derived chlorophyll-a concentration shows clear seasonal characteristics (Figure 3a). In winter, surface chlorophyll-a concentration reaches its highest value of about 0.3 mg m⁻³, due to winter cooling and strong mixing that bring subsurface nutrients into the upper layer [60]. In spring, surface chlorophyll-a decreases gradually. Surface chlorophyll-a shows a very low concentration of about 0.1 mg m⁻³ in summer, resulting from nutrient depletion at the surface and enhanced vertical stratification [14]. Chlorophyll-a concentration stays low during the monsoon transition period.
The BGC-Argo float profile data show the same feature at the sea surface: chlorophyll-a is highest in winter and relatively low in other seasons (Figure 3b). The winter peak of chlorophyll-a from the float is slightly higher than that from the satellite data. Zhang et al. [61] found the same phenomenon by comparing BGC-Argo chlorophyll-a at 5 m depth with remote sensing data from the MODIS/Aqua instrument. This may be because the float covered a relatively wide area of the northern SCS that included regions of high phytoplankton biomass (Figure 1). There is also uncertainty in remote sensing reflectances (Rrs). Products derived from Rrs are affected by the bias to varying degrees, with chlorophyll varying by up to 25% over a year [62,63]. In addition, satellite observations may miss high chlorophyll-a peaks due to cloud cover in winter. Other than this small difference, the BGC-Argo chlorophyll-a data are in agreement with remote sensing data.
In the northern SCS, the SCM shows seasonal variation: it exists but is less pronounced in winter. A distinct SCM appears in spring and gradually deepens in summer and autumn, with a value of more than 0.6 mg m⁻³ and a depth of 65-85 m, consistent with the model result and observations of Gong et al. [17]. The SCM quickly shoals or even disappears in winter, when surface phytoplankton reaches high concentrations. The SCM in the northern SCS is affected by multiple biological processes, such as phytoplankton growth, zooplankton grazing, phytoplankton sinking, phytoplankton photo-acclimation, and detritus remineralization [17,61,64]. Parameter optimization could help the ecosystem model reproduce the SCM and better elucidate the dominant dynamics of the phytoplankton chlorophyll-a distribution.

Optimizable Parameter Selection
We first identified 20 sensitive model parameters based on the sensitivity analysis. In terms of the sensitivity of modeled surface chlorophyll-a concentration, parameters related to zooplankton grazing, phytoplankton growth, and detritus remineralization are particularly sensitive. We then conducted 500 model runs, each with a unique combination of parameter values, using the Monte Carlo sampling described in Section 2.2.
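The sampling-and-ranking step can be illustrated with a minimal sketch. It assumes uniform sampling within parameter bounds and ranks parameters by the absolute correlation between each parameter and a scalar model output. Both choices, the function names, and the toy model are assumptions made for illustration; they are not the procedure of Section 2.2.

```python
import numpy as np

def sample_parameter_sets(lower, upper, n_sets=500, seed=0):
    """Draw n_sets parameter combinations uniformly within [lower, upper]."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    return rng.uniform(lower, upper, size=(n_sets, lower.size))

def rank_sensitivity(samples, outputs):
    """Rank parameters by |Pearson correlation| with the model output."""
    s = (samples - samples.mean(axis=0)) / samples.std(axis=0)
    o = (outputs - outputs.mean()) / outputs.std()
    corr = np.abs((s * o[:, None]).mean(axis=0))
    return np.argsort(corr)[::-1], corr

# Hypothetical stand-in for the 1D model: only parameters 0 and 3 matter.
def toy_surface_chl(params):
    return 3.0 * params[:, 0] - 1.0 * params[:, 3]

samples = sample_parameter_sets(np.zeros(20), np.ones(20), n_sets=500)
order, corr = rank_sensitivity(samples, toy_surface_chl(samples))
```

In this toy setting, `order` lists the parameter indices from most to least influential. A real application would replace `toy_surface_chl` with runs of the biogeochemical model.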
Based on the analysis above, 9 key parameters (Table 2) were selected for the model setup in the northern SCS. They fall mainly into 3 categories: predation-related parameters for zooplankton, growth-related parameters for phytoplankton, and the initial slope of the P-I curve associated with phytoplankton photosynthesis. The GA optimization targets these 9 parameters and searches for the parameter combination that minimizes the misfit between the observations and simulations.
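The search for a misfit-minimizing parameter combination can be sketched with a small real-coded genetic algorithm. This is a generic illustration, not the GA configuration used in this study: tournament selection, uniform crossover, Gaussian mutation, elitism, the population size, and the quadratic toy cost are all assumptions.

```python
import numpy as np

def genetic_algorithm(cost, lower, upper, pop_size=40, n_gen=60,
                      mut_sigma=0.1, seed=0):
    """Minimize cost(params) within box bounds using a simple GA."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    span = upper - lower
    pop = rng.uniform(lower, upper, size=(pop_size, lower.size))
    costs = np.array([cost(p) for p in pop])
    history = [costs.min()]

    def tournament():
        # Pick two random individuals and keep the one with the lower cost.
        i, j = rng.integers(pop_size, size=2)
        return pop[i] if costs[i] < costs[j] else pop[j]

    for _ in range(n_gen):
        children = [pop[costs.argmin()].copy()]  # elitism: keep the best
        while len(children) < pop_size:
            a, b = tournament(), tournament()
            mask = rng.random(lower.size) < 0.5       # uniform crossover
            child = np.where(mask, a, b)
            child = child + rng.normal(0.0, mut_sigma * span)  # mutation
            children.append(np.clip(child, lower, upper))
        pop = np.array(children)
        costs = np.array([cost(p) for p in pop])
        history.append(costs.min())
    return pop[costs.argmin()], history

# Hypothetical cost: squared distance to a "true" set of 9 parameters.
target = np.linspace(0.2, 0.8, 9)
best, history = genetic_algorithm(lambda p: np.sum((p - target) ** 2),
                                  np.zeros(9), np.ones(9))
```

Because the best individual is carried over unchanged in each generation, the best cost in `history` never increases. In practice the cost function would run the 1D model and compare it with the satellite and float data.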

Optimization Results
The default parameters in the CoSiNE model have been empirically tuned against different observations in the SCS [34]. Nevertheless, compared with the CTRL run, assimilating data into the model using the GA improved the cost function in all runs (Table 3). The cost function of each experiment varied with the data assimilation settings. In EXP1, assimilation of satellite data improves the modeled sea surface chlorophyll-a and phytoplankton groups. EXP1a, which directly incorporates the satellite chlorophyll-a in the optimization, decreases the cost function by 5.3%. In Table 3, both Fs1 and Fs2 in EXP1a are reduced, suggesting that adjusting total chlorophyll-a also improves the simulated chlorophyll-a contributed by the different phytoplankton functional types. However, the vertical simulation of chlorophyll-a is not improved in EXP1a. Compared with EXP1a, EXP1b performs better in reproducing the two phytoplankton functional types, indicating that assimilating ocean color PFT data has the potential to improve the estimation of phytoplankton groups. The cost function values of the two phytoplankton functional types in EXP1b decrease by 26.8% and 49.2%, respectively. In the northern SCS, the chlorophyll-a concentration of pico-phytoplankton is typically higher than that of diatoms [65]. Although the modeled total chlorophyll-a concentration fits the ocean color data well (Figure 4), the CTRL model predicts a larger concentration of the diatom group, so that the diatom ratio is significantly higher in winter, reaching up to 50% (Figure 5). After the assimilation of PFT data, the relative ratio of diatoms in EXP1b is more consistent with observations. However, the model misses the vertical distribution pattern of chlorophyll-a, with a deeper but smaller SCM (Figure 6).
It indicates that while the assimilation of satellite data improves the model predictions of phytoplankton groups at the surface, the model predictions of vertical chlorophyll-a structure may still not be optimal compared with the CTRL run (Table 3), likely due to the lack of vertical information.
In EXP2, the modeled vertical chlorophyll-a profile matches the observations quite well (Figure 6). The cost function value of chlorophyll-a averaged over the water column between model and observations declined by 20.4% (Table 3). Compared to the CTRL and EXP1, the assimilation of vertical float observations largely improves subsurface predictions of chlorophyll-a. The magnitude and the vertical location of the SCM are more consistent with the float data. The model reproduces the seasonal variation of surface total chlorophyll-a concentration (Figure 4). However, it fails to simulate the surface phytoplankton groups, especially diatoms, whose concentration is considerably low throughout the modeling period (Figure 5). Since the chlorophyll-a of pico-phytoplankton dominates the total chlorophyll-a, the underestimation of diatoms might be overlooked if only total surface chlorophyll-a is considered. In Figure 4, each model exhibits a reasonable prediction of the surface chlorophyll-a. However, the phytoplankton groups might be misrepresented.
In EXP3, both ocean color PFT data and float profiles were taken into consideration in the GA optimization process. The cost functions for surface pico-phytoplankton, surface diatom, SCM, and vertical profiles in this run all decrease compared with the CTRL model (Table 3). Among the four optimization experiments, although not every cost function in EXP3 was the lowest, EXP3 had the best overall prediction skill. The depth of the SCM was slightly shallower than that in EXP2 and BGC-Argo (Figure 6), but the magnitude increased (Table 3). There was also a clear improvement in the fractions of the two phytoplankton groups (Figure 5). The chlorophyll-a ratio of diatoms was consistent with observations, especially in the winter period. The chlorophyll-a partitioning between pico-phytoplankton and diatoms was in better agreement with the observations. In this experiment, considering both vertical observations and PFT data provided the best optimization performance.
The monthly-averaged chlorophyll-a profiles of each experiment are displayed in Figure 7. The main differences among the experiments appear in the SCM simulation. Similar to the result in Figure 6, EXP1b shows relatively poor prediction skill for the SCM because of the lack of vertical data assimilation. In the northern SCS, the SCM appears in spring and gradually deepens in summer and autumn. The EXP1b model shows an almost unchanged SCM depth for the whole year, and the chlorophyll-a concentrations are lower than the observations. Both EXP2 and EXP3 can approximately simulate the depth of the SCM in spring, summer, and autumn after the optimization, but the magnitude of the SCM is slightly smaller than in the BGC-Argo profile.
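The two SCM diagnostics compared above, depth and magnitude, can be extracted from a profile with a short sketch. The definition used here (the largest chlorophyll-a value below a minimum depth) and the `min_depth` threshold are illustrative assumptions, not the criterion used in this study. The synthetic profile is hypothetical.

```python
import numpy as np

def find_scm(depth, chl, min_depth=10.0):
    """Return (depth, value) of the chlorophyll-a maximum below min_depth."""
    depth = np.asarray(depth, dtype=float)
    chl = np.asarray(chl, dtype=float)
    below = depth >= min_depth                 # ignore the near-surface layer
    idx = np.flatnonzero(below)[np.nanargmax(chl[below])]
    return depth[idx], chl[idx]

# Synthetic profile: background of 0.1 mg m-3 plus a peak centered at 75 m.
depth = np.arange(0.0, 201.0, 5.0)
chl = 0.1 + 0.6 * np.exp(-((depth - 75.0) / 20.0) ** 2)
scm_depth, scm_value = find_scm(depth, chl)
```

A winter profile with a surface maximum would need an additional check that the subsurface value exceeds the surface value before it is labeled as an SCM.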

Influence of Sampling Frequency of Float Data
Compared with traditional sampling from ship cruises, the advantage of BGC-Argo floats is that they sample the ocean continuously and at a relatively high frequency, returning vertical profiles associated with different spatial and temporal scales. Kaufman et al. [31] demonstrated the benefit of assimilating high-frequency data in a 1D model. A BGC-Argo float sampling frequency of 1-5 days has the potential to resolve intraseasonal and mesoscale processes in the northern SCS [59].
To examine the influence of the sampling frequency on the assimilation performance, we conducted two more experiments that used seasonally averaged (EXP-S) and monthly averaged (EXP-M) float profiles, respectively. The GA optimization settings are similar to those of EXP3; only the calculation of the cost function is changed. In EXP-S, the observations in the cost function were four seasonally averaged chlorophyll-a profiles. After the assimilation, the vertical chlorophyll-a bias decreases by 6.9%, and the SCM bias decreases by 22.7% with respect to the CTRL (Table 4). However, the misfit of seasonal mean chlorophyll-a profiles between the model and the float data shows less reduction (Figure 8), which may be because four seasonal chlorophyll-a profiles are not enough to constrain the model optimization. In EXP-M, monthly mean chlorophyll-a profiles were computed from the float data and used in the optimization. The result indicates that the SCM bias yields the largest reduction and that each item of the cost function shows considerable improvement compared to the CTRL model. The depth and magnitude of the SCM are consistent with the observed data (Figure 8), indicating that monthly mean profiles are sufficient to optimize the vertical chlorophyll-a structure while maintaining the optimal calibration of the biological parameters. This was also found in Bisson et al. [62]. EXP-M improves the vertical chlorophyll-a even more than EXP3, probably indicating that the high-frequency variability in the float data sampled every 1-5 days is not fully represented in the model, and that assimilating this variability into the 1D model may increase the model-data deviations. This is particularly clear in March. In the northern SCS, March is a transition season between summer and winter monsoons, when the seasonal influence on the vertical distribution of chlorophyll-a is not the dominant factor. Instead, local processes and/or mesoscale processes, which are not resolved by the 1D model, may affect the phytoplankton distributions measured by the BGC-Argo float. As the dominant variability of phytoplankton in the northern SCS is at the seasonal timescale, monthly float data are more suitable for data assimilation in the 1D model.
Table note (truncated in the source): Table 3; Fv-s and Fv-m represent the vertical misfit of the seasonally averaged Chla and the monthly averaged Chla, respectively.
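The averaging step that distinguishes EXP-M from the high-frequency case can be sketched as follows. The sketch assumes that all float profiles have already been interpolated onto a common depth grid and uses a root-mean-square difference as the vertical misfit. The misfit form and the function names are assumptions, since the cost function itself is defined elsewhere in the paper.

```python
import numpy as np

def monthly_mean_profiles(months, profiles):
    """Average profiles (n_profiles x n_depths) by calendar month (1-12)."""
    months = np.asarray(months)
    profiles = np.asarray(profiles, dtype=float)
    out = np.full((12, profiles.shape[1]), np.nan)
    for m in range(1, 13):
        sel = months == m
        if sel.any():                      # months without data stay NaN
            out[m - 1] = np.nanmean(profiles[sel], axis=0)
    return out

def vertical_misfit(model, obs):
    """Root-mean-square model-data difference over valid observations."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    valid = ~np.isnan(obs)
    return np.sqrt(np.mean((model[valid] - obs[valid]) ** 2))

# Three hypothetical profiles on a two-level depth grid.
obs_monthly = monthly_mean_profiles([1, 1, 2], [[1.0, 3.0],
                                                [3.0, 5.0],
                                                [10.0, 20.0]])
misfit = vertical_misfit(np.zeros((12, 2)), obs_monthly)
```

Averaging by month removes variability at timescales shorter than a month from the observations before they enter the misfit, which is the property the EXP-M experiment relies on.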

Effects of Biological Parameter on Vertical Chlorophyll-a Structure
In the experiments above, the main differences in the vertical Chla simulation are the magnitude and the depth of the SCM layer. The depth and magnitude of the SCM layer largely depend on the phytoplankton dynamics, which are influenced by biological parameters. Among the optimization parameters, gmaxs1, amaxs1, and akno3s2 are related to phytoplankton growth, while beta1, beta2, akz2, and bgamma1 are related to zooplankton growth and grazing processes. The assimilation of vertical float observations significantly improves subsurface predictions of chlorophyll-a in EXP2, EXP3, and EXP-M. EXP-M shows the best prediction skill for the vertical chlorophyll-a structure, which is consistent with the BGC-Argo profile.
The optimal parameter values of each experiment are shown in Table 5. In EXP2 and EXP3, the biological parameters change in similar ways relative to the CTRL model, but the magnitude of the SCM in EXP3 is slightly smaller than that in EXP2 (Figure 7), which might be due to the changes in gmaxs1, amaxs1, and akno3s2. Based on the Michaelis-Menten equation, the decrease of the maximum specific growth rate in EXP3 weakens the phytoplankton growth rate. The parameter amaxs1 represents the ability of pico-phytoplankton to utilize solar irradiance for photosynthesis; the smaller amaxs1 in EXP3 suggests a slightly weaker photosynthetic ability of pico-phytoplankton. Nitrogen limitation plays an important role in modulating phytoplankton growth in the deep basin of the SCS [49]. The parameter akno3s2 represents the half-saturation constant of nitrate uptake by diatoms. Thus, the smaller akno3s2 in EXP3 enhances diatom growth and increases the sea surface chlorophyll-a concentration of diatoms. In addition, EXP3 has a slightly shallower SCM layer than EXP2 because of enhanced zooplankton grazing. The parameters related to grazing rate and efficiency (beta1, beta2, and bgamma1) are higher in EXP3 than in EXP2, which increases zooplankton grazing and thus decreases the phytoplankton biomass in the upper layer in EXP3. In EXP1b, the model with a large amaxs1 value and a small grazing rate also shows a deeper SCM layer. High grazing parameter values may lead to a reduced phytoplankton peak and a shallower SCM depth [66]. Compared to EXP2 and EXP3, the ecosystem model of EXP-M has the largest gmaxs1 and amaxs1 and the smallest beta1, beta2, and bgamma1. These parameter values enhance the photosynthetic efficiency of pico-phytoplankton and reduce the zooplankton grazing rate, promoting phytoplankton biomass. Pico-phytoplankton provides the dominant contribution of chlorophyll-a in the SCS. Parameter optimization greatly increases the biomass of pico-phytoplankton and provides better chlorophyll-a simulation results. EXP-M also has a small akno3s2 value, leading to a high growth potential of diatoms. In contrast, EXP-M has a relatively smaller akz2, which changes the grazing pressure of mesozooplankton and has the potential to change phytoplankton biomass to some extent. In other words, in a highly nonlinear ecosystem model, the parameters can act synergistically on the modeled vertical chlorophyll-a structure.
The vertical chlorophyll-a structure is closely related to the process of phytoplankton growth and zooplankton grazing. Thus, relevant biological parameters can affect the depth and magnitude of the SCM. Among these parameters, the phytoplankton growth parameters tend to change the magnitude of SCM; the zooplankton grazing parameters tend to change both the magnitude and the depth of SCM. All biological parameters have nonlinear effects on the modeled vertical chlorophyll-a structure.
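The direction of the growth-parameter effects discussed in this section can be checked with a minimal sketch of Michaelis-Menten nutrient limitation. The single-nutrient form and all numerical values below are illustrative assumptions; they are not the CoSiNE growth formulation or the values in Table 5.

```python
def nutrient_limited_growth(mu_max, nutrient, half_sat):
    """Specific growth rate mu_max * N / (K + N) for one limiting nutrient."""
    return mu_max * nutrient / (half_sat + nutrient)

# Hypothetical values: nitrate of 1.0 mmol m-3 and a reference parameter set.
reference = nutrient_limited_growth(mu_max=2.0, nutrient=1.0, half_sat=1.0)

# A smaller half-saturation constant (as for akno3s2) raises the growth rate.
smaller_k = nutrient_limited_growth(mu_max=2.0, nutrient=1.0, half_sat=0.5)

# A smaller maximum growth rate (as for gmaxs1) lowers it.
smaller_mu = nutrient_limited_growth(mu_max=1.5, nutrient=1.0, half_sat=1.0)
```

The effect of the half-saturation constant is strongest when the nutrient concentration is comparable to or below that constant, which is the nitrogen-limited regime described for the deep basin of the SCS.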

Impacts on Subsurface POC and Export Flux
Chlorophyll-a is a commonly used variable in the study of marine biogeochemistry. In addition, POC is an important carbon pool that represents the biological pump transporting CO2 from the atmosphere to the ocean interior [67]. In this study, observed POC was derived from the validated bbp(700) based on an empirical relationship. Compared to the other experiments, POC at 100 m depth increases significantly in the EXP-M model and is statistically consistent with the observed data, especially in winter and spring (Figure 9). Modeled POC concentration at 100 m is comparable with previous observations at the SEATS station, which range from 1 to 2 mmol C m−3 [68]. In the model, the key parameters regulate the sinking pathway of POC by controlling the biological processes of phytoplankton and zooplankton. Parameter optimization improves the phytoplankton composition and adjusts zooplankton growth and grazing, which changes the detritus concentration. Since the model predictions fit the observations well in the vertical chlorophyll-a structure, POC concentration at 100 m in the EXP-M model is largely improved compared to the CTRL model.
Marine organisms are components and also producers of POC. They produce detritus through feeding and metabolic processes, which ultimately causes POC to sink. In in situ measurements, POC includes all the particulate (as defined by the pore size of the filter) organic carbon in the water column. In the model, POC concentration was calculated as the sum of phytoplankton biomass, zooplankton biomass, and detritus. Thus, uncertainties in modeled POC are generally larger than those in an individual modeled variable, such as chlorophyll-a concentration, because more biogeochemical dynamics are involved in the POC simulation. Compared with the chlorophyll-a simulation, modeled POC in EXP-M shows a relatively larger bias, especially in summer and fall, when surface production is considerably low in the SCS. In winter, when surface production is high, modeled POC compares well with observations. This seasonal variability is consistent with the measured POC export flux shown in previous studies. Zhou et al. [69] showed that POC export is high in winter but low during summer and fall at SEATS. Moreover, they found that the correlation between surface chlorophyll-a and POC export at 100 m was significant only when Chla > 0.11 mg m−3. This indicates that other factors (phytoplankton composition, zooplankton, etc.) may play an important role in POC generation and export during the low-production period, which remains largely unknown and requires further investigation.
Carbon export flux is estimated from detritus and phytoplankton sinking in the model. The comparison of POC export flux at 100 m depth is shown in Figure 10. In EXP-M, chlorophyll-a concentrations fit the BGC-Argo observations best both at the SCM layer and at 100 m depth. More carbon is exported from the euphotic zone due to the improved vertical profile of chlorophyll-a. The winter peak can reach about 110 mg C m−2 d−1, and the other seasons show a low value of about 40 mg C m−2 d−1, similar to the carbon export flux from sediment trap data in the central SCS [34]. The other experiments underestimate the carbon export flux. Carbon export is affected by phytoplankton composition, zooplankton, and other components of the food web [70]. Phytoplankton and zooplankton communities play a significant role in controlling POC export flux [71]. Siegel et al. [72] found that phytoplankton's contribution to total export flux was on average 12.7% and that a portion of the export was controlled by fecal matter from zooplankton grazing. The PFT data assimilation improves the plankton groups and results in a better prediction of carbon export flux. Therefore, it is important to improve the understanding of biogeochemical processes and food web dynamics in order to better predict POC export in the northern SCS.
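A sinking-based export estimate of this kind can be sketched as the sum of concentration times sinking speed over the sinking pools at the export depth. The pool names, sinking speeds, and concentrations below are hypothetical and are not taken from the CoSiNE configuration. The only fixed quantity is the molar mass of carbon, used to convert mmol C to mg C.

```python
MOLAR_MASS_C = 12.011  # g mol-1, so 1 mmol C corresponds to 12.011 mg C

def export_flux_mg_c(concentrations, sinking_speeds):
    """Export flux in mg C m-2 d-1.

    concentrations: pool name -> concentration at the export depth (mmol C m-3)
    sinking_speeds: pool name -> sinking speed (m d-1)
    """
    flux_mmol = sum(concentrations[name] * sinking_speeds[name]
                    for name in sinking_speeds)
    return flux_mmol * MOLAR_MASS_C

# Hypothetical values at 100 m depth.
flux = export_flux_mg_c(
    concentrations={"detritus": 0.5, "diatom": 0.1},
    sinking_speeds={"detritus": 10.0, "diatom": 1.0},
)
```

With this form, any change in the parameters that alters detritus or sinking phytoplankton at 100 m translates directly into a change in the export flux.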

Summary and Conclusions
Parameter optimization of marine ecosystem models provides a convenient technique for adjusting model parameters to better represent the marine ecosystem [73][74][75][76]. In this research, we implemented a genetic algorithm to optimize nine biological parameters in the CoSiNE marine ecosystem model using satellite data and BGC-Argo float observations. The application of remote sensing data improves the predictive capability of the model. The satellite-derived PFT data, which have previously been used in the North Atlantic, the Mediterranean Sea and the global ocean, allow us to better simulate phytoplankton community structure. Our experiments confirmed that they could also significantly improve the model simulation in the northern SCS. However, this advantage relies on the accuracy of the PFT data. The PFT three-component model still needs more in-situ measurements for validation and evaluation in the SCS. Our study is only a preliminary application of the PFT data to ecosystem modeling in the SCS. Utilizing the multiplatform observational data decreases the model bias in predictions of surface and vertical chlorophyll-a distributions. All model experiments show different degrees of improvement in model skill compared with the CTRL model. After the assimilation of the satellite-derived fields, the model-predicted surface chlorophyll-a concentrations for the two phytoplankton groups fit the observed values quite well, with bias reductions of 26.8% and 49.2%, respectively. The assimilation of BGC-Argo float observations reduces the misfit of vertical chlorophyll-a profiles by 11.7% to 20.3%, and the SCM bias by 4.7% to 32.1%.
The experiments show that data assimilation using both satellite PFT data and BGC-Argo observations improves the simulation of the ecosystem model. We show that monthly float data are the most suitable for optimizing the vertical structure in the 1D model. Parameter optimization provides better chlorophyll-a simulation by adjusting the fractions of phytoplankton groups. Monthly BGC-Argo data assimilation also improves the modeled subsurface POC and POC export flux by modulating the phytoplankton functional types, phytoplankton growth, and zooplankton grazing processes.
BGC-Argo floats are expanding in the global ocean, providing more opportunities and challenges for modeling, understanding, and predicting marine ecosystems. The combination of BGC-Argo observations and satellite-derived phytoplankton functional type data provides reliable support for the optimization of the model. As a 3D model requires more computational time and cost, it is difficult to optimize its parameters directly. The success of applying the genetic algorithm in the 1D marine ecosystem model provides the basis for further application in 3D ecosystem models.
Author Contributions: Conceptualization, C.S. and P.X.; methodology, P.X. and W.M.; resources, X.X., G.Q. and S.C.; writing-original draft preparation, C.S. and P.X.; writing-review and editing, P.X. and R.J.W.B. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.
The calculations of biological processes are as follows.
(1) The new production and regenerated production of S1 and S2: where $\mu1_{max}$ and $\mu2_{max}$ are the maximum growth rates of S1 and S2, $\psi$ is the NH4 inhibition parameter, $K_{NO3}$, $K_{NH4}$, $K_{PO4}$ and $K_{Si(OH)4}$ are the half-saturation constants for NO3, NH4, PO4 and Si(OH)4, and $\alpha$ is the initial slope of the P-I curve.
(2) The production of chlorophyll-a: where $\theta_{max}$ is the maximum ratio of chlorophyll-a to carbon.

Appendix B
The PFT three-component model of phytoplankton size classes was developed by Brewin et al. [25].
Total Chla concentration ($C$) is the sum of the Chla concentrations in the three size classes. Here we use the OC-CCI Chla concentration. Picoplankton and nanoplankton can be combined into a single class [77], and their combined Chla concentration ($C_{p,n}$) can be expressed as:

$$C_{p,n} = C^{m}_{p,n}\left[1 - \exp\left(-S_{p,n} C\right)\right],$$

where $C^{m}_{p,n}$ is the asymptotic maximum value for $C_{p,n}$ and $S_{p,n}$ determines the initial slope. Then the Chla of microplankton ($C_m$) can simply be calculated as:

$$C_m = C - C_{p,n}.$$

The Chla of picoplankton ($C_p$) can also be expressed in a similar form as a function of total Chla concentration:

$$C_p = C^{m}_{p}\left[1 - \exp\left(-S_{p} C\right)\right],$$

where $C^{m}_{p}$ is the asymptotic maximum value for $C_p$ and $S_p$ determines the initial slope of the curve. Then the value of $C_n$ can be calculated as follows:

$$C_n = C_{p,n} - C_p.$$

Therefore, the fractions of these size classes can be derived by the following equations:

$$F_p = \frac{C^{m}_{p}\left[1 - \exp\left(-S_{p} C\right)\right]}{C},$$

$$F_n = \frac{C^{m}_{p,n}\left[1 - \exp\left(-S_{p,n} C\right)\right] - C^{m}_{p}\left[1 - \exp\left(-S_{p} C\right)\right]}{C}, \quad \text{(A54)}$$

$$F_m = \frac{C - C^{m}_{p,n}\left[1 - \exp\left(-S_{p,n} C\right)\right]}{C},$$

where the parameter values of $C^{m}_{p,n}$, $C^{m}_{p}$, $S_{p,n}$ and $S_{p}$ in the SCS are 0.9532, 0.2563, 0.9835 and 3.5346, respectively [19].
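The size-class partitioning can be evaluated numerically with a short sketch. It assumes the saturating-exponential form of the three-component model of Brewin et al. [25], in which the combined pico- and nanoplankton Chla is $C^{m}_{p,n}[1 - \exp(-S_{p,n} C)]$ and the picoplankton Chla is $C^{m}_{p}[1 - \exp(-S_{p} C)]$, together with the SCS parameter values quoted above. The function and variable names are hypothetical.

```python
import numpy as np

# SCS parameter values quoted in the text [19].
CM_PN, CM_P, S_PN, S_P = 0.9532, 0.2563, 0.9835, 3.5346

def size_class_fractions(chl_total):
    """Return (F_p, F_n, F_m) for total Chla in mg m-3 (must be > 0)."""
    c = np.asarray(chl_total, dtype=float)
    c_pn = CM_PN * (1.0 - np.exp(-S_PN * c))   # pico + nano Chla
    c_p = CM_P * (1.0 - np.exp(-S_P * c))      # pico Chla
    c_n = c_pn - c_p                           # nano Chla
    c_m = c - c_pn                             # micro Chla
    return c_p / c, c_n / c, c_m / c

# Fractions at a low and a high total Chla concentration.
f_p, f_n, f_m = size_class_fractions([0.1, 1.0])
```

By construction the three fractions sum to one at every concentration, and the picoplankton fraction decreases as total Chla increases.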