Remote Sensing Supported Sea Surface pCO 2 Estimation and Variable Analysis in the Baltic Sea

Abstract: Marginal seas are a dynamic and, to a large extent, still uncertain component of the global carbon cycle. The large temporal and spatial variations of sea-surface partial pressure of carbon dioxide (pCO 2 ) in these areas are driven by multiple complex mechanisms. In this study, we analyzed the variable importance for sea surface pCO 2 estimation in the Baltic Sea and derived monthly pCO 2 maps for this marginal sea for the period July 2002–October 2011. We used variables obtained from remote sensing images and numerical models. The random forest algorithm was employed to construct regression models for pCO 2 estimation and to quantify the importance of the different input variables. The study found that photosynthetically available radiation (PAR) was the most important variable for pCO 2 estimation across the entire Baltic Sea, followed by sea surface temperature (SST), absorption of colored dissolved organic matter (a CDOM ), and mixed layer depth (MLD). Interestingly, Chlorophyll-a concentration (Chl-a) and the diffuse attenuation coefficient for downwelling irradiance at 490 nm (Kd_490nm) showed relatively low importance for pCO 2 estimation. This was mainly attributed to the high correlation of Chl-a and Kd_490nm with other pCO 2 -relevant variables (e.g., a CDOM ), particularly in the summer months. In addition, the variables’ importance for pCO 2 estimation varied between seasons and sub-basins. For example, the importance of a CDOM was large in the Gulf of Finland but marginal in other sub-basins. The model for pCO 2 estimation in the entire Baltic Sea explained 63% of the variation and had a root mean squared error (RMSE) of 47.8 µatm. The pCO 2 maps derived with this model displayed realistic seasonal variations and spatial features of sea surface pCO 2 in the Baltic Sea.
The spatially and seasonally varying variables’ importance for the pCO 2 estimation shed light on the heterogeneities in the biogeochemical and physical processes driving the carbon cycling in the Baltic Sea and can serve as an important basis for future pCO 2 estimation in marginal seas using remote sensing techniques. The pCO 2 maps derived in this study provided a robust benchmark for understanding the spatiotemporal patterns of CO 2 air-sea exchange in the Baltic Sea.


Introduction
Global oceans are an important sink of atmospheric CO 2 and take up approximately 30% of the global anthropogenic CO 2 emissions [1]. While the global ocean uptake of CO 2 increases at a rate proportional to the atmospheric CO 2 , substantial differences exist between oceans and marginal seas [1,2]. The changing air-sea exchange of CO 2 in marginal seas, particularly those at high latitudes, has been found to be the major source of uncertainty in estimates of ocean CO 2 uptake [3,4]. As atmospheric CO 2 is rather homogeneous globally, the sea surface partial pressure of carbon dioxide (pCO 2 ) in a marginal sea is the key component for precisely determining the direction of the air-sea exchange of CO 2 .
Therefore, deriving maps of the changing pCO 2 in marginal seas over time is critical for precise estimates of the global air-sea exchange and ocean uptake of CO 2 [2,3,5].
Generally, sea surface pCO 2 is jointly determined by biogeochemical processes, vertical and horizontal mixing of sea water, and the air-sea exchange of CO 2 [6,7]. Many sea surface variables related to these processes can be retrieved from remote sensing images. Given their vast spatial coverage, remotely sensed sea surface variables have increasingly been used in sea surface pCO 2 estimation. Remotely sensed Chlorophyll-a concentration (Chl-a) is commonly used as an indicator of biological activity in water [8]. Sea surface temperature (SST) largely determines the solubility of CO 2 in sea water and has been frequently used to estimate pCO 2 from remote sensing [9][10][11][12][13]. In addition, bacterial respiration produces CO 2 by decomposing dissolved organic matter (DOM) [14,15]. Therefore, absorption of colored dissolved organic matter (aCDOM) retrieved from remote sensing images has been used in sea surface pCO 2 estimation [16,17]. Furthermore, after [18] found from in-situ measurements that sea surface salinity (SSS) was highly related to sea surface pCO 2 , SSS derived directly from remote sensing images or from remotely sensed aCDOM was adopted to support sea surface pCO 2 estimation [16,19]. Kd_490nm, a proxy of water transparency, has been derived from remote sensing and included in sea surface pCO 2 estimation to indicate the effect of biological activity [16]. Mixed layer depth (MLD) reflects the thermal stratification between different water masses but is not retrievable with remote sensing approaches. Therefore, some studies used the MLD obtained from ocean models to support the derivation of sea surface pCO 2 maps [9,12]. Similarly, model-derived gross primary production (GPP) and net primary production (NPP) were also included to support pCO 2 estimation by approximating the biological control on pCO 2 in sea water [9,12].
Sea surface pCO 2 in many marginal seas worldwide has been estimated with various remote sensing supported approaches [9,12,16,17,20-23]. Most of these studies chose the variables based on empirical knowledge and focused on deriving pCO 2 maps with small estimation errors (e.g., RMSE). However, few studies have investigated the spatiotemporal variability of the variables' relevance to sea surface pCO 2 in marginal seas. Considering the high spatial variability in the controlling forces of sea surface pCO 2 in marginal seas, some studies divided the targeted seas into sub-basins/subsets and constructed separate models for pCO 2 retrieval in each of the sub-basins/subsets [12,22,24]. Though this strategy produced maps of good quality in the sub-basins/subsets, it provided little knowledge of the variables' relevance to the pCO 2 distribution. Furthermore, Reference [25] regarded the sea surface pCO 2 in the targeted area as a mixture of the pCO 2 controlled by different processes (e.g., vertical mixing and biological uptake) and determined each of the processes separately from different sets of variables. Despite its successful application in multiple marginal seas [10,25,26], this method was often limited to pCO 2 estimation in summer and thus fails to provide information for other seasons. Overall, considerable room remains for investigating the variables' relevance (importance) in sea surface pCO 2 estimation across time and space.
The Baltic Sea is a semi-enclosed marginal sea located in northern Europe. The carbon budget of the Baltic Sea displays considerable seasonal and interannual variability. To date, the few studies attempting to estimate sea surface pCO 2 in the Baltic Sea using remote sensing approaches, e.g., [12], have provided barely any information on the variables' relevance/importance to the pCO 2 estimate for this marginal sea. In this study, we aimed to analyze the importance of different variables for pCO 2 estimation and to derive improved monthly pCO 2 maps for the Baltic Sea from 2002 to 2011. We carried out the following steps: (1) filtering the in-situ pCO 2 data for model training and validation; (2) assessing the relative importance of the input variables for the pCO 2 estimation on different spatial and seasonal scales; and (3) deriving pCO 2 maps for the Baltic Sea.

Study Area
The Baltic Sea is located at high latitudes (55-60° N) in Europe. As solar illumination and temperature there exhibit significant seasonal changes, the Baltic Sea and the adjacent terrestrial ecosystems also undergo high seasonality. In addition, the wide latitudinal span of the Baltic Sea produces a large spatial gradient in solar illumination and in the corresponding environmental conditions, such as SST. The Baltic Sea has restricted water exchange with the open North Atlantic Ocean via the Danish straits and is a semi-enclosed marginal sea. More than 600 rivers drain a catchment totaling 1.7 million km 2 and export substantial freshwater and terrigenous substances, including organic carbon, to the Baltic Sea [27][28][29][30]. Therefore, the Baltic Sea is characterized by a high concentration of CDOM, and most of the sea appears as "brown water". With varying inputs from different rivers, the sub-basins of the Baltic Sea create highly heterogeneous biogeochemical conditions in this marginal sea. Consequently, the pCO 2 distribution in the Baltic Sea displays evident seasonality and spatial heterogeneity [31]. Upwelling, which itself shows evident seasonal and spatial patterns, occurs frequently in the Baltic Sea and brings deep water with pCO 2 of up to 2000 µatm to the sea surface [32,33]. The high concentration of nutrients brought up together with the deep water leads to cyanobacteria and phytoplankton blooms after the upwelling event, which further complicates the pCO 2 distribution in the Baltic Sea [34].
Until now, nearly all the pCO 2 related studies in the Baltic Sea have been based on in-situ measurements from ships and/or buoys, and the findings are often valid only for a limited number of sites. Therefore, analyzing the variables' relevance and obtaining reliable pCO 2 maps is critical for better understanding the carbon cycle and the air-sea exchange in the Baltic Sea [35].

Data
We chose the variables for pCO 2 estimation based on previous studies and the characteristics of the Baltic Sea. The variables SST, photosynthetically available radiation (PAR), Chl-a, Kd_490nm, and a CDOM were remotely sensed. SSS and MLD were produced by the numerical model NEMO-NORDIC together with data assimilation. In-situ pCO 2 measurements from three different sources were used to train and validate the model for pCO 2 estimation.

Remote Sensing Products
The Moderate Resolution Imaging Spectroradiometer (MODIS) on board the Aqua satellite was designed for ocean surface investigations. The sensor has mapped the earth every two days since July 2002. A MODIS image consists of 36 spectral bands covering wavelengths from approximately 0.4 to 14.4 µm. Images from MODIS Aqua have been successfully used to detect coastal water clarity [36], survey red tides [37], map lake suspended matter [38], and retrieve coastal dissolved organic carbon [39]. Variables such as Chl-a and SST, retrieved from MODIS-Aqua images with mature algorithms, have been widely used to estimate sea surface pCO 2 or simulate sea surface CO 2 flux in different oceans and marginal seas [11,16,17,40,41]. From the National Aeronautics and Space Administration (NASA) Goddard Space Flight Center (https://oceancolor.gsfc.nasa.gov/), we obtained the level-3 monthly mean MODIS products of PAR, Kd_490nm, and SST covering the period of August 2002-October 2020. All data have a spatial resolution equivalent to 4 × 4 km at the equator (Table 1).
The Medium Resolution Imaging Spectrometer (MERIS) on board the Envisat satellite was designed for ocean color observation. During its life span from 2002 to 2011, MERIS mapped the earth every 1-3 days and measured water surface radiances in 15 spectral bands from the visible to the infrared spectrum. MERIS data have been frequently used to investigate water related issues in the global ocean and marginal seas, including mapping sea algae coverage [42], detecting phytoplankton blooms [43] and cyanobacterial blooms [44], and estimating Chl-a, a CDOM , and suspended matter [45][46][47][48][49]. Most of these studies targeted European lakes and seas and demonstrated the great potential of MERIS data for investigating these waters. Specifically, [45] found that Chl-a retrieved from MERIS for the Baltic Sea had a distribution similar to that of in-situ measurements.
The MERIS data from the MERCI database (https://merisrr-merci-ds.eo.esa.int/merci) were used to retrieve Chl-a and a CDOM for the Baltic Sea with the Free University of Berlin (FUB) processor, which was developed especially for European coastal waters. Invalid pixels (i.e., land, mixtures of land and water, various cloud types, and cloud shadow) were masked out from the MERIS images before the Chl-a and a CDOM retrieval. The performance of Chl-a and a CDOM retrieved from MERIS with the FUB processor in the Baltic Sea was assessed to be excellent [49,50]. In this study, the daily Chl-a and a CDOM derived from MERIS images were aggregated monthly and resampled to 4 × 4 km. The Chl-a and a CDOM derived from the full MERIS archive span July 2002 to December 2011. A comparison of the contributions of the Chl-a products from MODIS and MERIS to pCO 2 estimation with the method employed here did not show significant differences (Figure S2).

Modeled Data
MLD and SSS are important variables for pCO 2 estimation. However, remotely sensed SSS has a much coarser resolution than other variables, such as Chl-a, and MLD is not yet obtainable from remote sensing. Alternatively, modeled MLD and SSS have been applied in many studies on sea surface pCO 2 estimation [9,12,20,51,52]. Therefore, we employed the monthly MLD and SSS produced by the NEMO-NORDIC model, a Baltic and North Sea model based on the NEMO ocean engine and a local singular evolutive interpolated Kalman (LSEIK) filter data assimilation, with a spatial resolution of 4 × 4 km [53] (Table 1). Validation of the modeled SSS against station observations demonstrated a bias smaller than 0.5 ppt and an RMSE of 0.5 ppt [53].

In-Situ Data
We used all the in-situ sea surface pCO 2 measurements available in the Baltic Sea during August 2002-November 2011 (Table 2 and Figure 1). They included the data from the Surface Ocean CO 2 Atlas (SOCAT) (2nd Version) [54], the measurements from a moored buoy at the Östergarnsholm site [55], and data from [56]. All the data in SOCAT have undergone quality control and have errors below 10 µatm [54,57]. We used pCO 2 measurements acquired from 2002 to 2011 to match the remotely sensed variables. The data from SOCAT for this period were obtained from the Finnpartner vessels, which travelled between Lübeck and Helsinki every second day [58]. The pCO 2 measurements are available every 1-2 min and appear as a series of points distributed along the ship tracks (Figure 1A).
At the Östergarnsholm site, the sea surface pCO 2 is measured by a submersible autonomous moored instrument (SAMI) mounted on a buoy moored one kilometer east of the island of Östergarnsholm in the central Baltic Sea (Figure 1A). The SAMI sensor was installed four meters below the water surface and has recorded the pCO 2 there every 30 or 60 min from May 2005 to the present [55]. The pCO 2 measurements from the Östergarnsholm site also fulfill the accuracy criterion of <10 µatm.
The pCO 2 data used by [56] filled the data gap left by the previous two data sources in the Gulf of Bothnia. The data set consisted of both manual bottle measurements from discrete stations and continuous ferry box measurements obtained with the same method as the vessel data in SOCAT (Figure 1A). The measurements were mainly from the years 2006, 2009, and 2010. More details about the data are available in [56].

Random Forest
Random forest is a tree-ensemble model in which the trees are constructed from a set of training samples [59]. Random forest has shown excellent performance in classification and regression [60,61] and has therefore been used in various fields. For example, it has been used to estimate the gross primary production of vegetation from remote sensing images [62] and to downscale coarse-resolution soil moisture and chlorophyll fluorescence data [63,64]. With respect to pCO 2 estimation from remote sensing data, [17] derived pCO 2 maps for the Gulf of Mexico with an RMSE of 31.7 µatm using a similar tree-based algorithm. In addition, [16] compared random forest with other commonly used approaches (e.g., multiple linear regression) and showed that random forest was a robust algorithm for sea surface pCO 2 estimation from remote sensing data in the Gulf of Mexico.
In this study, random forest models were trained to express the relationship between the in-situ pCO 2 measurements and the spatially and temporally co-located variables (i.e., Chl-a, a CDOM , SST, PAR, Kd_490nm, SSS, and MLD). Each random forest model contained a number of trees (known as Ntree), and at each node a number of randomly selected variables (known as Mtry) were considered as candidates for the split. Each tree was grown on a bootstrapped subset of randomly selected training samples, and at each node the relationship between the Mtry candidate variables (e.g., Chl-a and SST) and the dependent variable (i.e., pCO 2 ) was expressed in the form of a split [65]. The tree grew as the nodes were produced and connected in a cascading manner. Each decision tree was produced independently. The forest construction was finished when the number of trees reached Ntree, a user-defined number [59]. The final random forest is the set of these trees, whose predictions are combined to express the relationship between the variables in the training samples. Further details on the random forest model can be found in Breiman (2001). Each random forest model contained 500 trees (Ntree = 500) with three candidate variables per split (Mtry = 3). We used the random forest algorithm implemented in the package randomForest [66] for the open access software R [67].
Subsequently, the importance of each variable in the random forest model was extracted and analyzed. The importance of a variable X m was determined by the mean decrease in accuracy (MDA) of the random forest model when the variable X m is randomly permuted in the training samples [59]. Therefore, the importance of variable X m in a random forest model indicates its contribution/relevance to the model and the response of the corresponding variable to the pCO 2 variation in the training data set. For each variable, the importance was derived independently. Because the variables are not complementary to each other in pCO 2 estimation, the sum of the variables' importance is not a constant value (e.g., 100%) across different temporal and spatial scales.
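The permutation idea behind the MDA can be illustrated with a minimal Python sketch. It is not the randomForest implementation used in the study (which evaluates the permutation on out-of-bag samples); the function names, the toy predictor, and the toy data are hypothetical, and the sketch reports the absolute increase in mean squared error rather than a percentage.

```python
import random

def mse(predict, rows, targets):
    """Mean squared error of `predict` over the given rows."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(predict, rows, targets, col, n_repeats=20, seed=0):
    """Mean increase in MSE after shuffling column `col` across the rows."""
    rng = random.Random(seed)
    base = mse(predict, rows, targets)
    increases = []
    for _ in range(n_repeats):
        shuffled = [r[col] for r in rows]
        rng.shuffle(shuffled)
        # Rebuild every row with the shuffled value in place of the original one.
        permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
        increases.append(mse(predict, permuted, targets) - base)
    return sum(increases) / n_repeats

# Toy data (hypothetical): column 0 drives the target, column 1 is irrelevant.
rows = [[float(i), float((i * 7) % 10)] for i in range(30)]
targets = [400.0 - 2.0 * r[0] + ((i % 3) - 1) for i, r in enumerate(rows)]
predict = lambda r: 400.0 - 2.0 * r[0]  # stand-in for a trained model

importance_relevant = permutation_importance(predict, rows, targets, col=0)
importance_irrelevant = permutation_importance(predict, rows, targets, col=1)
```

Because each variable is permuted on its own, the resulting scores are independent of one another and need not add up to a fixed total.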

Filtering In-Situ Data
The diurnal differences of sea surface pCO 2 in the Baltic Sea can reach up to 40 µatm [68], and using only daytime or only nighttime data would introduce an error of 8% to 36% in monthly air-sea CO 2 fluxes [69]. Pre-analysis also found that using in-situ pCO 2 measurements from all 24 h for sea surface pCO 2 estimation would increase the uncertainty of the results by 30-60 µatm (Supplementary Materials Figure S2). Therefore, we only used the in-situ pCO 2 measurements obtained during the period when the two satellites (i.e., MODIS Aqua and MERIS) pass over the Baltic Sea, i.e., 9:00-14:00 UTC. Subsequently, the in-situ data were aggregated monthly to match the frequency of the remotely sensed and modelled variables. The variables exactly co-located with the in-situ pCO 2 measurements were extracted and used for random forest model construction and validation.
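The time-window filter and the monthly aggregation can be sketched as follows. This is an illustration under stated assumptions: the aggregation is taken to be an arithmetic mean, both ends of the 9:00-14:00 UTC window are treated as inclusive to the minute, and the function name and record layout are hypothetical. Spatial co-location with the satellite grid is not covered.

```python
from collections import defaultdict
from datetime import datetime, timezone

def monthly_means_in_window(records, start_hour=9, end_hour=14):
    """records: iterable of (timezone-aware datetime, pCO2 in µatm).

    Keeps only measurements taken between start_hour and end_hour (UTC)
    and returns {(year, month): mean pCO2}.
    """
    buckets = defaultdict(list)
    for when, pco2 in records:
        utc = when.astimezone(timezone.utc)
        minutes = utc.hour * 60 + utc.minute
        if start_hour * 60 <= minutes <= end_hour * 60:
            buckets[(utc.year, utc.month)].append(pco2)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}

# Hypothetical measurements: the night-time value is discarded.
records = [
    (datetime(2005, 5, 3, 10, 0, tzinfo=timezone.utc), 300.0),
    (datetime(2005, 5, 20, 13, 30, tzinfo=timezone.utc), 320.0),
    (datetime(2005, 5, 21, 2, 0, tzinfo=timezone.utc), 380.0),
    (datetime(2005, 6, 1, 9, 0, tzinfo=timezone.utc), 250.0),
]
monthly = monthly_means_in_window(records)
```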
Using the variables (e.g., SST) derived for months characterized by frequent upwelling can significantly affect the monthly pCO 2 estimates by introducing large biases (Figure S3). Therefore, the upwelling effect should be eliminated to the largest possible extent. To achieve this, we constructed a series of random forest models, each using the in-situ data from one month as validation data and the data from all remaining months as training data. All of these leave-one-month-out models were constructed with identical settings. Inspection of the mean absolute errors (MAE) and RMSE of these models showed that the following monthly data were dominated by upwelling (i.e., Figure S4). Nearly all of them were in fall, when upwelling prevails in the Baltic Sea [32]. In-situ pCO 2 measurements from these months were excluded from model training and validation. Sea surface pCO 2 maps were not predicted for these months, as the predictions would be unreliable.
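The leave-one-month-out screening can be sketched as below. The sketch assumes a generic `fit` callable and, to stay self-contained, uses a stand-in model that predicts the training mean; it is not a random forest, and all names and sample values are hypothetical. Months whose held-out errors stand out are candidates for exclusion.

```python
import math
from collections import defaultdict

def leave_one_month_out(samples, fit):
    """samples: list of (month_key, features, pco2); fit(train) -> predict(features).

    Returns {month_key: (MAE, RMSE)} with each month held out in turn.
    """
    by_month = defaultdict(list)
    for sample in samples:
        by_month[sample[0]].append(sample)
    scores = {}
    for held_out, validation in by_month.items():
        train = [s for s in samples if s[0] != held_out]
        predict = fit(train)
        errors = [predict(x) - y for _, x, y in validation]
        mae = sum(abs(e) for e in errors) / len(errors)
        rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
        scores[held_out] = (mae, rmse)
    return scores

def fit_mean(train):
    """Stand-in model (NOT a random forest): always predicts the training mean."""
    mean = sum(y for _, _, y in train) / len(train)
    return lambda features: mean

# Hypothetical samples: September carries an upwelling-like high pCO2 signal.
samples = [
    ("2005-05", [10.0], 300.0), ("2005-05", [11.0], 300.0),
    ("2005-06", [14.0], 300.0), ("2005-06", [15.0], 300.0),
    ("2005-09", [9.0], 600.0), ("2005-09", [8.0], 600.0),
]
scores = leave_one_month_out(samples, fit_mean)
worst_month = max(scores, key=lambda m: scores[m][1])
```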
After narrowing the time window of in-situ pCO 2 measurements down to 9:00-14:00, aggregating these in-situ pCO 2 measurements monthly, and filtering out the data from the upwelling dominated months, 10,769 in-situ pCO 2 measurements with matching variables remained, as shown in Figure S1.

Analyzing Variables' Importance for pCO 2 Estimation
We derived the variables' importance for the pCO 2 estimation on two scales: spatial and temporal. On the spatial scale, random forest models were constructed both for the overall Baltic Sea and for its sub-basins indicated in Figure 1B. In each sub-basin, a random forest model was trained with the in-situ data in the sub-basin from a randomly selected 2/3 of the months. Each model was then validated with the in-situ data in the sub-basin from the remaining 1/3 of the months. We constructed 50 random forest models in each sub-basin with the training and validation data selected in this way. In the temporal analysis of the variables' importance for the pCO 2 estimates, the in-situ measurements were divided into seasons. Specifically, February-April was defined as spring, May-July as summer, and August-October as fall. The limited availability of satellite data due to frequent and extensive cloud coverage in November, December, and January did not allow for such an analysis during these months. As in the spatial analysis, in-situ data from a randomly selected 2/3 of the months were used for training and the remaining 1/3 for validation. Fifty random forest models were constructed in each season with the training data selected in the same manner and validated with the corresponding complementary data.
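The month-wise random selection and the season definition can be sketched as follows. The rounding of the 2/3 share and the use of one seed per model are assumptions made for this illustration, and the function names are hypothetical.

```python
import random

# Season definition as stated in the text; November-January are not analyzed.
SEASONS = {2: "spring", 3: "spring", 4: "spring",
           5: "summer", 6: "summer", 7: "summer",
           8: "fall", 9: "fall", 10: "fall"}

def split_months(month_keys, train_fraction=2 / 3, seed=0):
    """Randomly assign whole (year, month) keys to training or validation."""
    rng = random.Random(seed)
    keys = sorted(set(month_keys))
    rng.shuffle(keys)
    n_train = round(len(keys) * train_fraction)
    return set(keys[:n_train]), set(keys[n_train:])

def repeated_splits(month_keys, n_models=50):
    """One independent month-wise split per model (50 in the study)."""
    return [split_months(month_keys, seed=i) for i in range(n_models)]

# Hypothetical set of months with data, here 2003-2005.
months = [(year, month) for year in range(2003, 2006) for month in SEASONS]
train_months, validation_months = split_months(months)
spring_months = [key for key in months if SEASONS[key[1]] == "spring"]
spring_splits = repeated_splits(spring_months)
```

Splitting by whole months keeps all measurements of a given month on the same side of the split.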

Constructing the Final Model for pCO 2 Estimation in the Baltic Sea
We constructed a final random forest model for pCO 2 estimation in the entire Baltic Sea. This model was trained with the in-situ pCO 2 measurements from odd months of even years (e.g., March 2002) and even months of odd years (e.g., April 2003) and validated with the remaining data. In this way, both the training and validation data covered each of the 12 months of the year and the pCO 2 relevant processes of each month. Exchanging the training and validation data yielded models with nearly the same performance (Figure S7). The monthly mean pCO 2 distribution in the entire Baltic Sea was predicted with this model.
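The alternating assignment reduces to a parity check on the sum of year and month, as this small sketch shows (the function name is hypothetical):

```python
def in_training_set(year, month):
    """True for odd months of even years and even months of odd years."""
    return (year + month) % 2 == 1

# Over any two consecutive years, each calendar month lands in both subsets.
training_months = {m for y in (2002, 2003) for m in range(1, 13) if in_training_set(y, m)}
validation_months = {m for y in (2002, 2003) for m in range(1, 13) if not in_training_set(y, m)}
```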
The Pearson correlations between the pCO 2 estimated with the above model and each of the variables were analyzed. In order to speed up the processing, the correlation was analyzed on a 0.5° × 0.5° grid. In each month, the mean pCO 2 and the mean of each targeted variable (e.g., Chl-a) in the same grid cell were derived. The Pearson correlations between pCO 2 and each of the variables in each grid cell were then obtained across the study period of 2002-2011.
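A minimal sketch of this gridded correlation is given below. The cell indexing by flooring longitude and latitude, the minimum of three months per cell, and all names are assumptions for illustration only.

```python
import math
from collections import defaultdict

def cell_of(lon, lat, size=0.5):
    """Index of the size x size degree grid cell containing the point."""
    return (math.floor(lon / size), math.floor(lat / size))

def pearson(xs, ys):
    """Pearson correlation coefficient; None if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)

def gridded_correlation(pixels, size=0.5, min_months=3):
    """pixels: iterable of (month_key, lon, lat, pco2, variable)."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for month, lon, lat, pco2, var in pixels:
        acc = sums[(cell_of(lon, lat, size), month)]
        acc[0] += pco2
        acc[1] += var
        acc[2] += 1
    # Monthly cell means, collected as paired series per grid cell.
    series = defaultdict(lambda: ([], []))
    for (cell, month), (p, v, n) in sums.items():
        series[cell][0].append(p / n)
        series[cell][1].append(v / n)
    return {cell: pearson(ps, vs) for cell, (ps, vs) in series.items()
            if len(ps) >= min_months}

# Hypothetical pixels falling into one grid cell over three months.
pixels = [
    ("2005-04", 20.1, 58.2, 390.0, 9.0), ("2005-04", 20.3, 58.4, 410.0, 11.0),
    ("2005-05", 20.1, 58.2, 300.0, 20.0),
    ("2005-06", 20.3, 58.4, 200.0, 30.0),
]
correlations = gridded_correlation(pixels)
```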

Comparing the Random Forest to Self-Organized Map (SOM) and Multiple Linear Regression (MLR) for pCO 2 Estimation in the Baltic Sea
SOM is an artificial neural network algorithm which classifies the input samples into a number of classes, based on their Euclidean distance from each other in the space determined by the variables of the input data [20,70]. Often, the number of classes (neurons) is given a priori in a grid format (e.g., 2 × 5). Each class corresponds to a neuron which contains the coefficients determining the relationship between the variables and the dependent variable in the same class, which is also called labelling the class with the dependent variable (output). In the case of sea surface pCO 2 estimation with SOM, the remotely sensed variables in the training data, such as Chl-a and SST, are used to calculate the distance between the input samples for classification. In the pCO 2 prediction with such a SOM model, each sample is assigned the pCO 2 of the class to which it is closest. A detailed description of a SOM application for sea surface pCO 2 estimation with remote sensing data is available in Telszewski et al. (2009). SOM and its variants have been widely used to estimate sea surface pCO 2 with the support of remote sensing products [11,12,20,71-74]. In this study, we used the SOM algorithm implemented in the R package kohonen [75]. We set the size of the neuron (class) grid to 25 × 20, so that the total number of classes equals the number of trees in the random forest models constructed in this study.
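The prediction step of a labelled SOM can be illustrated with a nearest-neuron lookup. This sketch covers only the lookup, not the training of the map, and it is not the kohonen implementation; the codebook, the labels, and the function names are hypothetical, and the input variables are assumed to be scaled already.

```python
import math

def best_matching_unit(codebook, features):
    """Index of the neuron whose weight vector is closest (Euclidean) to `features`."""
    return min(range(len(codebook)), key=lambda i: math.dist(codebook[i], features))

def som_predict(codebook, labels, features):
    """Return the pCO2 label of the best matching neuron."""
    return labels[best_matching_unit(codebook, features)]

# Hypothetical 3-neuron map in a 2-variable space, with pCO2 labels in µatm.
codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
labels = [200.0, 350.0, 420.0]
predicted = som_predict(codebook, labels, [0.9, 1.2])
```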
Furthermore, multiple linear regression (MLR) has been used in many studies for estimating sea surface pCO 2 in marginal seas and has produced good results [9,16]. Therefore, we compared the performance of SOM, MLR, and random forest in sea surface pCO 2 estimation in the Baltic Sea. In the comparison, the same variables were used in the three algorithms without any preselection. The random forest, SOM, and MLR models were trained with identical data and validated likewise.
Two schemes of training data selection were adopted: one using the in-situ pCO 2 measurements from a randomly selected 2/3 of the months (scheme Number 1, same as in Section 4.3) and the other using a randomly selected 2/3 of the in-situ pCO 2 measurements as training data (scheme Number 2). Scheme Number 2 was similar to the training data selection in [12]. In both schemes, the validation data were the complement of the training data.

Spatiotemporal Characteristics of Variable Importance to pCO 2 Estimation
On the entire Baltic Sea scale, PAR was the most important variable (mean importance of 66%) for the sea surface pCO 2 estimate during 2002-2011. This means that the errors of the random forest model were 66% higher when the information carried by PAR was removed (i.e., when PAR was randomly permuted) than when it was retained. PAR was followed by SST, MLD, a CDOM , and SSS with mean importance of 21%, 20%, 15%, and 14%, respectively. Chl-a and Kd_490nm showed the lowest importance of 12% and 10% (Figure 2A).
The variables' importance differed among the sub-basins of the Baltic Sea. Compared to the pCO 2 estimate in the entire Baltic Sea (Figure 2A), the importance of PAR, SST, a CDOM , SSS, and MLD for pCO 2 estimation in the Gulf of Finland (i.e., sub-basin No.2) increased by 26%, 13%, 15%, 5%, and 1% (Figure 2B). For pCO 2 estimation in this sub-basin, PAR was still the most important variable. With a mean importance of 25%, a CDOM and SST were the next most important variables, followed by SSS and MLD with respective importance of 18% and 16% (Figure 2B). The importance of Chl-a and a CDOM for the pCO 2 estimation in the southern Baltic Sea (i.e., sub-basins No. 3-4) was similar to that for the overall Baltic Sea, with slightly lower importance of SSS in sub-basin No.3 (Figure 2A). The filtering and time window narrowing down left the Gulf of Bothnia (i.e., sub-basin No.1, Figure 1
The variables' importance for pCO 2 estimation also varied on seasonal scales. For the sea surface pCO 2 estimate in the entire Baltic Sea during February-April, PAR was the most important variable with a mean importance of 56%, followed by MLD (20%), SSS (15%), SST (15%), and a CDOM (10%). Chl-a and Kd_490nm showed a mean importance of 8% (Figure 3B). From May to July, all the variables displayed a similar importance (12-14%), except Kd_490nm (7%) and MLD (5%) (Figure 3C).
The low importance of all the variables in May-July means that, during this period, removing the information of any single variable from the models did not significantly change their accuracy. In other words, during May-July, the combination of any six out of the seven variables used in the study can cover the variations of pCO 2 in the Baltic Sea well. For pCO 2 estimation in the entire Baltic Sea in the period of August-October, PAR and SST were the two most important variables with respective importance of 38% and 31% (Figure 3D), followed by MLD (16%) and SSS (12%), and the remaining variables with an importance of 10%. Chl-a and Kd_490nm showed overall low importance for the pCO 2 estimate across the Baltic Sea, regardless of the season. From November to the following January, the dense cloud cover over the Baltic Sea region allowed barely any optical images suitable for the retrieval of remotely sensed variables. The RMSEs of the 50 models were in the range of 30-80 µatm. The models trained with data from May-July showed smaller RMSEs (41 µatm) than those trained with in-situ data from February-April and August-October (52 µatm and 55 µatm) (Figure 3D).
Overall, PAR showed the highest importance for the pCO 2 estimate in the Baltic Sea across different seasons and locations. SST was the second most important variable. a CDOM was important for the pCO 2 estimate in the Gulf of Finland. MLD was important for the pCO 2 estimate in all the sub-basins of the Baltic Sea, but its importance varied seasonally. SSS was important for pCO 2 estimation in the Baltic Sea both spatially and temporally. Chl-a, which has commonly been considered the determining variable for pCO 2 , showed low importance for the pCO 2 estimate over the entire Baltic Sea and its sub-basins. Kd_490nm showed low importance for pCO 2 estimation in the Baltic Sea across different seasons and sub-basins.

pCO 2 Maps from Final Random Forest Model
The final random forest model for sea surface pCO 2 estimation for the entire Baltic Sea used all the variables, namely, PAR, Chl-a, a CDOM , SST, Kd_490nm, SSS, and MLD. Its RMSE was 47.8 µatm and its coefficient of determination (i.e., R 2 ) was 0.63 (Figure 4A). The mean error (bias) of the model was -3.26 µatm, implying a slight overall underestimate of pCO 2 . The pCO 2 predicted with this model exhibited minor overestimates for pCO 2 larger than 450 µatm and slight overestimates for pCO 2 around 200 µatm (Figure 4A). Both the estimated and observed pCO 2 values were mainly in the range of 100-500 µatm, with a few pCO 2 observations between 500 µatm and 600 µatm (Figure 4A). The variable importance in the final model was similar to that in Figure 2A. Specifically, PAR was the most important variable, followed by SST, MLD, and a CDOM . Chl-a and Kd_490nm showed the lowest importance (Figure 4B).
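For reference, validation statistics of this kind can be computed as in the sketch below. It assumes that errors are defined as predicted minus observed and that R 2 is computed as one minus the ratio of residual to total sum of squares; the study may have used a different definition (e.g., the squared Pearson correlation). Note that a signed mean error (bias) can be negative, whereas a mean absolute error cannot. The function name and sample values are hypothetical.

```python
import math

def regression_metrics(observed, predicted):
    """Bias, MAE, RMSE and R^2 for paired observed and predicted values."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    bias = sum(errors) / n                      # signed; negative = underestimate
    mae = sum(abs(e) for e in errors) / n       # non-negative by definition
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - sum(e * e for e in errors) / ss_tot
    return {"bias": bias, "mae": mae, "rmse": rmse, "r2": r2}

# Hypothetical validation pairs in µatm.
metrics = regression_metrics([200.0, 300.0, 400.0, 500.0],
                             [210.0, 290.0, 390.0, 490.0])
```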
For the period of August 2002-October 2011, pCO 2 maps covering the entire Baltic Sea were retrieved for each month except November, December, January, and February, when the remotely sensed variables were not available due to frequent cloud coverage.
Taking the year 2005 as an example (Figure 5), the sea surface pCO 2 in the Baltic Sea was in the range of 100-500 µatm. On the spatial scale, the pCO 2 maps exhibited reasonable transitions in the Baltic Sea (Figure 5). In addition, detailed features of the pCO 2 variation were also displayed in those maps. For example, in April 2005, much lower pCO 2 was present at the river mouths in the southern Baltic Sea compared to other areas. In May 2005, a strip of low pCO 2 was present in the central Baltic Proper. In September 2005, an area with pCO 2 higher than in both August and October was displayed in the southern Baltic Sea (Figure 5). The sea surface pCO 2 in the Baltic Sea exhibited significant seasonal variations (Figure 5). Generally, low (undersaturated) pCO 2 conditions of 100-300 µatm prevailed during the summer months (e.g., July), and the winter months (e.g., October) were characterized by oversaturated pCO 2 conditions of up to 500 µatm (Figure 5). The pCO 2 variation at different sites in the Baltic Sea also exhibited these characteristics (Figure 6).
The sea surface pCO 2 in the Baltic Sea also showed a significant spatial gradient that varied between months, particularly between April and September (Figure 5). In April, July, and August, the southern central Baltic Sea (excluding sub-basin No.4 in Figure 1B) often displayed pCO 2 approximately 100-150 µatm lower than the northern sub-basins (Figure 5). In May, the Gulf of Finland and the Gulf of Riga (sub-basin No.2 in Figure 1B) showed the lowest pCO 2 in the Baltic Sea, 100 µatm. In June, sea surface pCO 2 in the two narrow gulfs increased slightly, while the Gulf of Bothnia exhibited its lowest sea surface pCO 2 of the year. In September, the sea surface pCO 2 in the southern Baltic Sea increased rapidly and displayed a gradient reversed relative to that in August. In October, the pCO 2 in the entire Baltic Sea was in the range of 380-420 µatm, rather homogeneous in comparison to other months (Figures 5 and 6). In addition, different areas in the Baltic Sea showed their minimum pCO 2 at different times. While the Gulf of Finland (No.42 in Figure 6A) and the Baltic Proper (i.e., No.61 in Figure 6A) had two seasonal minima, in May and July, the Bothnia Sea (i.e., No.8 in Figure 6A) and the Bothnia Bay (No.28 in Figure 6A) showed a single seasonal minimum of 180-250 µatm in June. Furthermore, the seasonal change points of pCO 2 in the Baltic Sea varied spatially. The pCO 2 in the Bothnia Bay and Bothnia Sea started decreasing in May (Figure 6B,C), but the pCO 2 in the Baltic Proper and Gulf of Finland in the south showed this change already in April, one month earlier (Figure 6D,E). The pCO 2 in the Gulf of Bothnia (i.e., No.8 and 28 in Figure 6A) increased already in July, but the corresponding increase in the southern Baltic Sea was delayed by one month to August.
Consequently, in August, when pCO 2 in the northern Baltic was almost equal to the values in winter months (Figure 6B,C), pCO 2 in the Baltic Proper and Gulf of Finland remained at its summer level (Figure 6D,E). Furthermore, in the Gulf of Finland (i.e., No.42 in Figure 6A), significant inter-annual pCO 2 differences were present in April and August (Figure 6D), but, in the Baltic Proper (i.e., No.62, Figure 6A), this occurred in May, July, and August (Figure 6E). Across the period of 2002-2011, the estimated pCO 2 was correlated to the variables in the Baltic Sea to different degrees and in different directions, varying spatially (Figure 7). The Chl-a-pCO 2 correlation varied between −0.5 and 0.5, with generally positive correlation in the northern Baltic Sea and negative correlation in the south. The estimated pCO 2 was generally negatively correlated to the co-located a CDOM in the Baltic Sea, with correlation coefficients ranging from −1 to 0, and this correlation exhibited larger absolute coefficients than the Chl-a-pCO 2 correlation, particularly in the southern Baltic Sea. The SST-pCO 2 correlation mostly exhibited negative coefficients (i.e., from −0.5 to 0) in the Baltic Sea, with larger absolute values in the south than in the north. An exceptionally high positive SST-pCO 2 correlation, up to 0.8, was present in the westernmost part of the Baltic Sea. The PAR-pCO 2 correlation presented the largest absolute coefficients, and pCO 2 was mostly negatively correlated to PAR in the entire Baltic Sea (i.e., from −1 to −0.6), showing the same pattern as the SST-pCO 2 correlation. The Kd_490nm-pCO 2 correlation showed a pattern similar to that of Chl-a-pCO 2 , with slightly higher absolute coefficients along the southeastern coasts. SSS exhibited high positive correlation to the co-located pCO 2 in the coastal waters; overall, the coefficients ranged from 0 to 0.8 and were mostly close to 0.
MLD was positively correlated to pCO 2 in the entire Baltic Sea with large absolute coefficients (0.5-1), except in the very north and west part of the sea.
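The per-location correlations summarized above can be sketched as follows. This is a minimal illustration that assumes a Pearson correlation computed over the monthly time series at one map pixel; the type of correlation coefficient is not restated in this section, and the function name and the sample values are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equally long series.

    Pairs with a missing value (None), e.g., cloud-covered months, are skipped.
    Returns None if fewer than three valid pairs remain or a series is constant.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 3:
        return None
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)

# Hypothetical monthly series at a single pixel (not data from the study):
# PAR rises toward summer while pCO2 drops, giving a strongly negative r.
par = [10.0, 25.0, 40.0, 55.0, 45.0, 20.0]
pco2 = [450.0, 380.0, 250.0, 180.0, 220.0, 400.0]
r_par_pco2 = pearson_r(par, pco2)
```

Applying such a function to every pixel, once per variable, would yield correlation maps of the kind shown in Figure 7.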

Comparison of Random Forest and SOM
In both schemes of training and validation data selection described in Section 4.5, the majority of the validation data were in the range of 100-500 µatm. The pCO 2 estimated with random forest was in the same range as the validation data (Figure 8A,C). In contrast, the SOM model constrained the pCO 2 estimate to the range of 230-430 µatm (Figure 8A,C), particularly in scheme No.2, where the training data were randomly selected pCO 2 measurements (Figure 8C). In addition, a single pCO 2 value estimated from SOM often corresponded to a large range of observed pCO 2 , forming evident horizontal features in the cross-validation (Figure 8A,D), particularly when the prediction covered multiple months. However, such patterns were not notable in the pCO 2 estimated with random forest or MLR (Figure 8B,E).
In an example of 50 experiments where the training data were selected with scheme No.1 ( Figure 8A,B), the coefficient of determination of the random forest model prediction was 0.68, much larger than 0.58 and 0.6, the coefficient of determination of the prediction with the SOM and MLR trained with the identical pCO 2 measurements. The mean RMSE of the 50 random forest models trained with training data selected with scheme No.1 was 49 µatm, while the mean RMSE of their SOM and MLR counterparts were 55 and 62 µatm ( Figure 8C). In the case of training data selected with scheme No.2, the mean RMSE of the 50 random forest models was 24 µatm, significantly lower than 30 and 48 µatm, the respective means of RMSEs of the 50 SOM models and MLR models trained with the same sets of training data ( Figure 8F). This indicated random forest outperformed SOM in the pCO 2 estimation in the Baltic Sea.
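The two scores used in this comparison can be sketched as below. This is a minimal illustration, assuming that the coefficient of determination is computed as 1 − SS_res/SS_tot (it could also have been the squared correlation coefficient) and that the reported mean RMSE is the arithmetic mean over the repeated experiments; the function names are hypothetical.

```python
import math

def rmse(observed, estimated):
    """Root mean squared error between observed and estimated pCO2 (µatm)."""
    n = len(observed)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n)

def r_squared(observed, estimated):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def mean_rmse(experiments):
    """Average RMSE over repeated experiments.

    `experiments` is a list of (observed, estimated) pairs, one pair per
    train/validation repetition (50 repetitions in the text above).
    """
    return sum(rmse(o, e) for o, e in experiments) / len(experiments)
```

Computing these scores on identical validation sets for each algorithm is what makes the RMSE and coefficient-of-determination values directly comparable.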

Characteristics of Variable Contribution to the pCO 2 Estimate
We analyzed the importance of different variables to the pCO 2 estimation in the Baltic Sea using random forest on different spatial and temporal scales. The spatiotemporal variability in the variables' importance was evidently high, but some general patterns were visible.
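As an illustration of what "importance" can mean for a fitted regression model, the sketch below implements permutation importance: the increase in prediction error after one input variable is shuffled. This is an assumption for illustration only; the importance measure used in the study is not restated in this section, and the toy model, variable names, and values are hypothetical.

```python
import random

def mse(model, X, y):
    """Mean squared error of `model` (a function of one input row) on X, y."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each input column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between column j and the target
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:])
                      for row, v in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Hypothetical toy model: pCO2 depends on the first input (PAR-like) only.
toy_model = lambda row: 400.0 - 3.0 * row[0]
X = [[10.0, 2.0], [20.0, 5.0], [30.0, 1.0], [40.0, 7.0],
     [50.0, 3.0], [60.0, 9.0], [70.0, 4.0], [80.0, 6.0]]
y = [toy_model(row) for row in X]
importances = permutation_importance(toy_model, X, y)
```

In this toy case the second variable receives zero importance because the model never uses it. When two inputs are strongly correlated, a model can lean on either of them, which is one way a relevant variable can end up with a low importance score.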
Chl-a displayed overall low importance (small contribution) to the pCO 2 estimate across different spatial and temporal scales in the Baltic Sea (Figures 2 and 3). The Chl-a-pCO 2 correlation in the Baltic Sea was also relatively low compared to the other variables' correlations to pCO 2 (Figure 7). This was in contrast to previous findings that Chl-a was closely related to pCO 2 in global oceans [13] and marginal seas, like the Gulf of Mexico [10]. The limited importance of Chl-a is probably due to two reasons: (1) in addition to Chl-a, PAR and SST are also fundamental factors for the photosynthesis-induced biological fixation of carbon; (2) the studies that established or confirmed correlations between Chl-a and pCO 2 did not include a CDOM [13,76]. However, high correlation (r > 0.9) was found between remotely sensed Chl-a and a CDOM in the Gulf of Mexico [17] and West Florida Shelf [41]. Chl-a and a CDOM also displayed similar spatiotemporal patterns in the Baltic Sea (Figure S8). In the analysis of variables' importance, a CDOM exhibited a more pronounced response to pCO 2 variation than Chl-a (Figure 2A), as it showed higher correlation to pCO 2 than Chl-a did (Figure 7). Similarly, sea surface pCO 2 in the Gulf of Mexico is more closely related to a CDOM than to Chl-a [41]. However, despite its low importance for the sea surface pCO 2 estimate in the Baltic Sea at all the spatial and temporal scales and its generally low correlation to pCO 2 (Figures 2, 3 and 7), we still regarded Chl-a as an important variable for the pCO 2 estimation in the Baltic Sea. This is particularly the case during summer (i.e., May-July), when cyanobacteria and phytoplankton blooms often take place, take up CO 2 , and reduce the sea surface pCO 2 in the Baltic Sea [58].
The low importance of Chl-a in May-July (summer in this study) (Figure 3B) very likely arose because, during this time, the effect of omitting Chl-a from the model was compensated by variables highly correlated to Chl-a (e.g., CDOM and SST). Likewise, the other variables also exhibited low importance for the pCO 2 estimate in May-July (Figure 3B). Yet, this was the case for the Baltic Sea; its applicability to other marginal seas should be treated carefully.
Overall, PAR exhibited the highest importance for the pCO 2 estimation in the Baltic Sea across different sub-basins and in nearly every season, except summer. In addition, the PAR-pCO 2 correlation coefficients had the largest absolute values among all the variable-pCO 2 correlations (Figure 7). The high importance of PAR for pCO 2 in the Baltic Sea and its sub-basins and the high correlation of this variable to sea surface pCO 2 are attributed to the high seasonality of the sun illumination. Because the Baltic Sea is located at high latitudes of 54-66° N (Figure 1), the sun illumination in the central Baltic Sea, for example, varies from 6 h in winter to 18 h in summer. As phytoplankton photosynthesis is largely determined by the available sun illumination, it is reasonable that the seasonality of pCO 2 aligns with that of PAR. In addition, river discharge loaded with CDOM, etc. is also characterized by high seasonality and is, to a large extent, synchronized with PAR [30], as is the bacterial respiration that depends on the available organic matter. Therefore, it is reasonable that PAR exhibited high importance for sea surface pCO 2 estimation in the Baltic Sea and its sub-basins. The importance of PAR in the pCO 2 estimate in the Baltic Sea in different seasons can be attributed to the wide span of the Baltic Sea in latitude (12°) (Figure 1) and the resultant large gradient in sun illumination. On a day in spring, the sun illumination in the southern Baltic Sea is 2-3 h longer than that in the north; the same holds for fall. The gradients in PAR largely impose differences in the intensity of phytoplankton photosynthesis, the SST distribution, and, ultimately, the CO 2 uptake of sea water via primary production. In summer, when PAR and other variables displayed similar but low importance, sun illumination in the northern Baltic Sea is up to 6 h longer than in the southern Baltic Sea, displaying an even larger spatial gradient across the Baltic Sea than in other seasons.
However, owing to snowmelt, the concurrent freshwater discharge and the nutrients it carries are very high in the Baltic Sea in late spring and early summer [30], creating high spatial variability in nutrients, DOM, etc. Yet, the spatial patterns of CDOM etc. likely differ from that of PAR, depending on catchment sizes and land cover types. When all the processes determining pCO 2 take place with similarly high intensities, none of the variables exhibits prominent importance; instead, all of them jointly determine the pCO 2 in the Baltic Sea in summertime to a similar degree (importance).
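The day-length figures quoted in this discussion (roughly 6 h in winter and 18 h in summer in the central Baltic Sea, and a summer difference of up to 6 h between north and south) can be checked with a standard astronomical approximation. The sketch below is not part of the study's method: it assumes a simple cosine model of solar declination and the sunrise equation, and it ignores atmospheric refraction and twilight. The function name is hypothetical.

```python
import math

def day_length_hours(lat_deg, doy):
    """Approximate day length (hours) at latitude `lat_deg` on day of year `doy`.

    Assumptions (not from the study): solar declination follows
    -23.44 deg * cos(2*pi*(doy + 10)/365); sunrise and sunset occur when
    the geometric center of the sun crosses the horizon.
    """
    declination = math.radians(-23.44) * math.cos(2.0 * math.pi * (doy + 10) / 365.0)
    x = -math.tan(math.radians(lat_deg)) * math.tan(declination)
    x = max(-1.0, min(1.0, x))  # clamp to handle polar day and polar night
    return 24.0 / math.pi * math.acos(x)

# Central Baltic Sea (about 60 deg N), near the summer and winter solstices
summer_central = day_length_hours(60.0, 172)
winter_central = day_length_hours(60.0, 355)

# Difference between the northern (66 deg N) and southern (54 deg N) limits
# of the Baltic Sea near the summer solstice
summer_gradient = day_length_hours(66.0, 172) - day_length_hours(54.0, 172)
```

Under these assumptions, the central Baltic Sea receives about 18.5 h of daylight at the summer solstice and about 5.5 h at the winter solstice, and the north-south difference in midsummer is a little over 5 h, in line with the approximate values given in the text.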
To represent the seasonality in sea surface pCO 2 , the Julian day of the year (DOY) has frequently been used in previous studies [12,16]. However, in this study, PAR holds two advantages over DOY. Firstly, PAR is a direct measure of the solar radiation available for photosynthesis and thus has a physical meaning, while DOY is only a proxy of the seasonality. Secondly, a trigonometric conversion is often applied to DOY to approximate the seasonality correctly. Specifically, the negative cosine of DOY was used for pCO 2 estimation in waters in the northern hemisphere and the cosine of DOY for waters in the southern hemisphere [16,18]. Consequently, a trigonometric conversion of DOY assigns a spatially constant value to the entire hemisphere and overlooks the effect of the spatial gradient of sun illumination. In contrast, PAR captures the latitudinal gradient of sun illumination well and expresses its effect on photosynthesis in the water. Therefore, we suggest that future sea surface pCO 2 estimation consider using PAR instead of DOY (Figure 1).
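The DOY conversion discussed above can be sketched as follows. This is a minimal illustration, assuming that DOY is scaled to one full cosine cycle per 365-day year; the exact scaling used in the cited studies is not given here, and the function name is hypothetical.

```python
import math

def doy_seasonal_proxy(doy, northern_hemisphere=True, days_in_year=365.0):
    """Trigonometric seasonality proxy derived from the day of year.

    Returns -cos(2*pi*doy/days_in_year) for the northern hemisphere and
    +cos(2*pi*doy/days_in_year) for the southern hemisphere. The function
    takes no latitude or longitude, so every location in a hemisphere
    receives the same value on a given day.
    """
    c = math.cos(2.0 * math.pi * doy / days_in_year)
    return -c if northern_hemisphere else c

# Near the year boundary the northern proxy is at its minimum (-1) and
# about half a year later at its maximum (+1); the southern proxy mirrors it.
north_new_year = doy_seasonal_proxy(365)
north_midyear = doy_seasonal_proxy(182.5)
south_midyear = doy_seasonal_proxy(182.5, northern_hemisphere=False)
```

Because the proxy depends on the date alone, it cannot represent the difference in illumination between, for example, the southern Baltic Sea and the Bothnian Bay on the same day, which is the limitation that motivates the use of PAR.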
SST holds the same position in the pattern of variables' importance for the pCO 2 estimate in the Baltic Sea and in each of its sub-basins (Figure 2). This was probably because the seasonal amplitudes of SST in the sub-basins are of the same order, particularly as the sub-basins are relatively small and well mixed horizontally. In many cases, despite its correlation to pCO 2 being of the same order as the Chl-a-pCO 2 and Kd_490nm-pCO 2 correlations, SST showed a larger importance than Chl-a, which aligned with the prediction errors obtained by alternately omitting the variables in [17]. In the pCO 2 estimates for the Baltic Sea in different seasons, SST was more important in August-October than in other seasons (Figure 3). This was probably because, in fall, the large spatial gradient in SST in the Baltic Sea corresponded to the pCO 2 distribution to a similar degree as PAR did, but more than other variables. For example, the sea surface in the Gulf of Bothnia starts freezing already in October, which lowers the primary production, whereas the southern Baltic Sea remains open water at that time and allows biological CO 2 uptake [77].
Despite its low importance for the pCO 2 estimate for the entire Baltic Sea, a CDOM was more important for the pCO 2 estimate in the Gulf of Finland than in other sub-basins (Figure 2B). The a CDOM -pCO 2 correlation in the Baltic Sea is also relatively large, particularly at the coast and in the Gulf of Finland (Figure 7). As mentioned previously, bacterial respiration produces CO 2 by decomposing organic carbon, such as DOM [14,15]. The relatively narrow waters of the Gulf of Finland receive a large terrestrial input of DOM from the rivers, including the Neva, which drains the largest sub-catchment of the Baltic Sea, approximately 1/6 of the total Baltic Sea catchment [30]. The changes of sea surface pCO 2 in the Gulf of Finland largely responded to the changes in CDOM there. Therefore, a CDOM is important for pCO 2 estimation in the Gulf of Finland (Figure 2B) and thus in the Baltic Sea, as well. A similar mechanism very likely applies in coastal waters receiving river discharge. Moreover, this study used the a CDOM derived from MERIS images. The MERIS sensor was succeeded by the Ocean and Land Color Instrument (OLCI) sensors on the Sentinel-3 satellites in 2016. Therefore, a CDOM derived from OLCI images will likely play an equivalent role in the pCO 2 estimate in the Baltic Sea and other similar waters.
Though less important than PAR and sometimes slightly less important than SST, MLD was important for the pCO 2 estimation in the Baltic Sea and all its sub-basins (Figure 2B). pCO 2 in the Baltic Sea is strongly and positively correlated to MLD (Figure 7). This probably results from the seasonally varying amount of fresh water that is discharged by the many rivers and lies above the relatively saline and heavy water [78]. In addition, seasonal winds in the Baltic Sea might have jointly determined the high variation of MLD [32] and, consequently, the vertical mixing of sea water and pCO 2 , as well.
In this study, Kd_490nm showed low importance to the pCO 2 estimation in the Baltic Sea, regardless of season or sub-basin (Figures 2 and 3), and a relatively weak correlation to pCO 2 (i.e., from −0.7 to 0) compared to variables like PAR and a CDOM . This aligns with the previously found negative correlation between Kd_490nm and pCO 2 in the Gulf of Mexico [16]. Here, we argue that the reasons behind the low contribution of Chl-a to pCO 2 estimation very likely also apply to Kd_490nm. This argument is well supported by previous studies. Kd_490nm in the Baltic Sea was found to be a function of inherent optical properties, i.e., absorption and scattering of phytoplankton, and of the effects of illumination and viewing angle [79,80]. Furthermore, [81] observed a strong positive correlation between Kd_490nm and river discharge into the Baltic Sea, and the latter is rich in CDOM. In addition, a positive correlation of Kd_490nm to Chl-a and a CDOM was noticed in the Baltic Sea (Figure S9), and the Kd_490nm-pCO 2 and a CDOM -pCO 2 correlations also exhibited similar patterns (Figure 7).

Impact of Unbalanced In-Situ Measurements Distribution on the Model for pCO 2 Estimate
The in-situ pCO 2 measurements available in the Baltic Sea during 2002-2011 were unevenly distributed, namely, relatively sparse measurements in the north and dense measurements in the south (Figure 1). In order to ensure the participation of the in-situ data from the northern Baltic Sea, we selected in-situ data month-wise to train and validate the model for pCO 2 estimation, instead of randomly selecting from the in-situ measurements. However, this measure led to the missing determination of variables' importance for the Gulf of Bothnia due to the few months of in-situ measurements in this basin (i.e., March 2006 and September 2009). In the future, including additional in-situ pCO 2 measurements from the Gulf of Bothnia can help analyze the variables' importance for the pCO 2 estimate in that region and understand the processes controlling pCO 2 there. These additional in-situ pCO 2 measurements are also expected to improve the RMSE of pCO 2 estimate for the entire Baltic Sea.
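The difference between month-wise and purely random selection of training data can be sketched as below. This is a minimal illustration under stated assumptions: each in-situ record is represented as a dictionary with hypothetical `year` and `month` keys, and two thirds of the available months go to training, following the proportion mentioned in the discussion of Figure 8. It is not the study's actual selection code.

```python
import random
from collections import defaultdict

def split_by_month(records, train_fraction=2 / 3, seed=0):
    """Split records into training and validation sets by whole months.

    All measurements from one (year, month) go to the same set, so a month
    with only a few northern measurements is either fully available for
    training or fully held out for validation.
    """
    by_month = defaultdict(list)
    for rec in records:
        by_month[(rec["year"], rec["month"])].append(rec)
    months = sorted(by_month)
    rng = random.Random(seed)
    rng.shuffle(months)
    n_train = max(1, round(train_fraction * len(months)))
    train_months = set(months[:n_train])
    train = [r for m in months if m in train_months for r in by_month[m]]
    valid = [r for m in months if m not in train_months for r in by_month[m]]
    return train, valid

# Hypothetical records: six months with three measurements each
records = [{"year": 2005, "month": m, "pco2": 300.0 + 10 * m + k}
           for m in range(4, 10) for k in range(3)]
train, valid = split_by_month(records)
```

With a purely random selection of individual measurements, training and validation samples from the same month and cruise would often be very similar, which tends to lower the validation error without reflecting how well the model transfers to unseen months.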
Despite the unbalanced distribution of in-situ data in the Baltic Sea, the monthly pCO 2 maps were retrieved for the Baltic Sea for the period of August 2002-October 2011 (Figure 5). The RMSE of the model for pCO 2 estimation was 47.8 µatm (Figure 4), larger than 25 µatm and 31.7 µatm, the RMSEs of the models constructed by [16] and [17], respectively, for pCO 2 estimation in the Gulf of Mexico using similar tree-based regression algorithms. Still, the RMSE of 47.8 µatm is relatively small for pCO 2 estimation in the Baltic Sea, considering the following factors: (1) The pCO 2 estimation was undertaken at monthly frequency, where the in-situ data from the entire month were combined with the few days with remote sensing images; (2) The magnitudes of the seasonal changes in pCO 2 in the Baltic Sea are much larger than those in middle- or low-latitude marginal seas. For example, the pCO 2 in the Baltic Sea was in the range of 100-600 µatm (Figure 8), while, in the Gulf of Mexico, it was 200-450 µatm [16], and, in the South China Sea, it was 250-450 µatm [11]; (3) The processes controlling pCO 2 across the Baltic Sea (e.g., phytoplankton photosynthesis, bacterial respiration, and runoff) vary spatially and temporally [30,82] and thus increase the difficulty of mapping pCO 2 in the Baltic Sea with high accuracy; (4) Upwelling takes place in the Baltic Sea with frequencies varying among years and months [83] and complicates the pCO 2 processes in multiple ways [34,84]. Even though we eliminated the months dominated by upwelling, a few upwelling events might have remained in the rest of the months and increased the RMSE of the model; (5) Most importantly, the random forest model covered the processes that took place in the entire Baltic Sea in all the seasons in the period of 2002-2011. This task itself is challenging due to the above factors. All these factors rendered deriving sea surface pCO 2 in the Baltic Sea more challenging than in other marginal seas.
The random forest algorithm outperformed SOM and MLR in the sea surface pCO 2 estimation (Figure 8). We attributed this to how the three algorithms treated the variables. In random forest, a series of forests were constructed, and the most effective one was chosen for prediction [59,65]. While the variables and training samples were randomly selected for the tree construction, the best model was the one with little participation of the unimportant variables. In contrast, when the model was constructed with SOM, all the input variables had the same weights [70]. This very likely amplified the contribution of the unimportant or correlated variables and suppressed the important ones at the corresponding temporal and spatial scale, and thus caused misestimates (Figure 8A,C). The variants of SOM, such as SOMLO, probably also inherit such effects. MLR assigned weights to the input variables by determining their correlation coefficients to the dependent variable. The effect of the coefficients is very evident when the training samples are chosen across months and cover a large variation. For example, in the experiments in Figure 8A-C, the samples covered 2/3 of the months, and MLR yielded an RMSE similar to that of random forest and better than that of SOM. In contrast, in the experiment where the samples were 2/3 of the entire in-situ data set from random selection, highly similar samples from the same season or months were likely used. Given that the time window of in-situ data was narrowed down to 9:00-14:00, and the in-situ data from the months dominated by upwelling were also removed, we did not consider the effect of outliers on the modeling, and the errors produced by the models were regarded as misestimates of the models. Overall, random forest performs better than MLR and SOM regardless of the variation range of the training data.
MLR performs better than SOM when the training data cover a large variation, and SOM performs better than MLR when the training data cover a relatively small variation.

pCO 2 Maps for the Baltic Sea and Its Spatiotemporal Characteristics
In this study, we produced the monthly pCO 2 maps for the entire Baltic Sea over the period of August 2002-October 2011. These maps showed that pCO 2 across the Baltic Sea was characterized by strong seasonality, with generally high pCO 2 in winter and low pCO 2 in summer (Figures 5 and 6). The trend aligned well with that derived from in-situ data in the Baltic Sea [85]. The seasonality of pCO 2 in the Baltic Sea was similar to that in the marginal sea of the Gulf of Maine but different from the one observed in the Gulf of Mexico by [16]. In addition, the range of seasonal pCO 2 variation in the Baltic Sea (i.e., 100-500 µatm) was larger than that observed for the two marginal seas (i.e., 300-500 µatm) (Figures 5 and 6) [16]. These different seasonal variation trends and variables' importance (e.g., Kd_490nm) suggest that the processes determining the pCO 2 in the Baltic Sea are likely different from those observed in other seas, or that the same processes work at different intensities, for example, owing to the gradient in PAR.
Beyond the similar overall seasonal trend, minor differences exist in the seasonal trends of pCO 2 within the Baltic Sea. For example, the Baltic Proper and the Gulf of Finland showed pCO 2 minima in both May and July, while the Bothnia Bay and Bothnia Sea showed a single minimum in June (Figure 6). May is the time when most rivers pass their annual peak water levels [30], and, in July, the daytime is the longest of the year in the Baltic Sea, with the most sunny days. In addition, different areas in the Baltic Sea showed interannual variations in different months (Figure 6). For example, the waters in the Gulf of Finland exhibited large interannual variation in April (Figure 6D), when the large river input takes place in the sub-basin [27]. The Baltic Proper showed such variations during May-July (Figure 6E), when the primary production is high in this sub-basin and upwelling also occurs very often there [58,68]. This indicates that the dominant drivers of pCO 2 are spatially variable across the Baltic Sea. The pCO 2 maps derived from this model exhibited continuous transitions between the sub-basins of the Baltic Sea (Figure 5). Therefore, these maps are a significant improvement over those produced in previous studies by dividing the Baltic Sea into different sub-basins [12].

Conclusions
This study analyzed the variables' importance in the pCO 2 estimation for the Baltic Sea across different times and sub-basins with the support of remote sensing and derived pCO 2 maps for the Baltic Sea from August 2002 to October 2011. We found that the contributions of the variables to pCO 2 retrieval for the Baltic Sea vary both spatially and temporally and likely replicate the spatiotemporal characteristics of the driving forces. Among all the variables, PAR was the most important, followed by SST and MLD. Chl-a contributed surprisingly little to the pCO 2 estimate. The a CDOM was important for the pCO 2 estimation for the Gulf of Finland and the Gulf of Riga. The random forest model used for the pCO 2 estimate for the entire Baltic Sea had an RMSE of 47.8 µatm, an MAE of −3.26 µatm, and a coefficient of determination of 0.63. The pCO 2 maps derived in this study are among the most reliable pCO 2 fields for the Baltic Sea and can potentially support determining the role of the Baltic Sea as a sink or source of atmospheric CO 2 . Moreover, the variables' importance/relevance from this study can provide a benchmark for understanding the different drivers of pCO 2 in the Baltic Sea and how they vary in time and space.
In the Baltic Sea region, frequent clouds in November, December, and January lead to the absence of pCO 2 maps during those three months. This is inevitable considering the high-latitude location of the Baltic Sea. Derivation of sea surface pCO 2 for the Baltic Sea in wintertime needs to be achieved by combining the remote sensing supported results with additional sources of information, e.g., modeling.
Supplementary Materials: The following are available online at https://www.mdpi.com/2072-4292/13/2/259/s1, Figure S1: Spatial and temporal distributions of the in-situ data used for training and validating the pCO 2 estimate. Figure S2: Diurnal effect on the pCO 2 estimate. Figure S3: Scenarios where the upwelling affects the pCO 2 estimate from remote sensing images. Figure S4: The effect of upwelling in the pCO 2 estimate with remote sensing image. Figure S5: The monthly mean product of Chl-a derived from MODIS and MERIS images in May, July and September 2011 mapping the Baltic Sea. Figure S6: a CDOM from MODIS and MERIS in the Baltic Sea. Figure S7: The performance differences between Chl-a from MODIS and Chl-a from MERIS in the pCO 2 estimate. Figure S8: Alternative of the final model for pCO 2 estimate in the entire Baltic Sea. Figure S9: Relationship between variables in the Baltic Sea.
Author Contributions: S.Z., A.R. and P.P. designed the study. S.Z. did the data collection, analysis and manuscript preparation. Writing-review & editing, S.Z., A.R., P.P. and M.B.W. Investigation, S.Z., P.P. and M.B.W. All authors have read and agreed to the published version of the manuscript.