Assessing Phytoplankton Bloom Phenology in Upwelling-Influenced Regions Using Ocean Color Remote Sensing

: Phytoplankton bloom phenology studies are fundamental for the understanding of marine ecosystems. Mismatches between fish spawning and plankton peak biomass will become more frequent with climate change, highlighting the need for thorough phenology studies in coastal areas. This study was the first to assess phytoplankton bloom phenology in the Western Iberian Coast (WIC), a complex coastal region in SW Europe, using a multisensor long-term ocean color remote sensing dataset with daily resolution. Using surface chlorophyll a (chl- a ) and biogeophysical datasets, five phenoregions (i.e., areas with coherent phenology patterns) were defined. Oceanic phytoplankton communities were seen to form long, low-biomass spring blooms, mainly influenced by atmospheric phenomena and water column conditions. Blooms in northern waters are more akin to the classical spring bloom, while blooms in southern waters typically initiate in late autumn and terminate in late spring. Coastal phytoplankton are characterized by short, high-biomass, highly heterogeneous blooms, as nutrients, sea surface height, and horizontal water transport are essential in shaping phenology. Wind-driven upwelling and riverine input were major factors influencing bloom phenology in the coastal areas. This work is expected to contribute to the management of the WIC and other upwelling systems, particularly under the threat of climate change.


Introduction
Phytoplankton bloom phenology (i.e., the study of the annual timing and intensity of phytoplankton blooms) is key for the understanding of marine ecosystems. The development of phytoplankton blooms has important roles in the sequestration of CO2 [1,2] and may strongly influence marine food chains [3]. As a result, changes in bloom timing and intensity can have harmful consequences for the pelagic ecosystem, including mismatches between phytoplankton blooms and fish spawning, which may have severe impacts on pelagic fish communities and, consequently, on fisheries [4]. Moreover, with climate change threatening to alter bloom phenology and increase the frequency of extreme mismatches between fish reproduction and plankton peak biomass [5], changes in annual carbon sequestration budgets may occur [6]. However, even in regions such as the North Atlantic, where phenology studies have spanned over 60 years (e.g., [7][8][9]), certain aspects, such as what triggers the spring bloom initiation, remain uncertain [10]. Given its sensibility to exogenous forcing and different oceanographic regimes, phytoplankton phenology has been used as a major indicator of changes in the pelagic ecosystem [11] and is a tool to assess the ecosystem response to climate change [12,13].
Ocean color remote sensing (OCRS) has been increasingly important for the study of phytoplankton phenology. While OCRS data have their limitations (e.g., cloud coverage or lower accuracy in coastal waters due to colored dissolved organic matter (CDOM) or other seawater constituents), it enables the collection of long-term, continuous data at large spatial scales, contributing to major advances in phenology analysis. As satellite datasets become longer and both temporal and spatial resolution increase, the number of studies using OCRS to assess phenology has naturally increased (e.g., [14][15][16][17]. Most studies have a global or basin-wide focus (e.g., [14,18,19]), yet regional studies focusing on coastal regions are scarce. The higher complexity of coastal regions shapes bloom phenology, as the development of phytoplankton blooms is strongly influenced by variability patterns in environmental conditions stemming either from oceanographic phenomena (e.g., currents, upwelling, eddies; [20,21]) or anthropogenic pressure [22,23].
Several OCRS-based methods have been used to assess bloom phenology, typically focusing on the spring bloom. Most studies focused on the timing of the initiation of the spring bloom and/or the timing of its peak (i.e., when the maximum chl-a concentration is reached). While there is not a standard metric, studies usually define the initiation of the spring bloom based on thresholds set for the annual mean/median [8] or cumulative distribution of remote-sensing chl-a [24]. Ferreira et al. [25] found that either using a threshold of 15% of the cumulative distribution or a threshold of 5% above the yearly median delivers precise and comparable results, stressing that the best metric is case-dependent. Another important aspect is the temporal resolution of OCRS-based phenology studies. While a few studies have used satellite data with a weekly resolution, most use either fortnightly or monthly composites to avoid gaps (e.g., [15,17,19]). Using a temporal resolution coarser than one day may be insufficient to accurately assess changes in bloom phenology in areas where bloom-inducing conditions change rapidly [26]. Nevertheless, while daily resolution would be considered optimal for bloom phenology, it should be used carefully due to the potential existence of large, error-inducing gaps in time-series [25].
Thus, there is a lack of regional studies on phytoplankton bloom phenology based on long, continuous datasets with high temporal resolution. While OCRS-based works have become increasingly more common, most use a temporal resolution coarser than it would be optimal for bloom phenology. As climate change threatens to change bloom phenology globally, likely affecting regions differently, filling these gaps becomes increasingly crucial. This study aims to fill these research gaps by using the Western Iberian Coast (WIC), a complex coastal region located in SW Europe, as a case study. In this work, a long-term (22-years) satellite dataset with daily resolution will be used to assess phytoplankton bloom phenology. Using phenoregions (i.e., areas with coherent phenology patterns), this work seeks to answer the following questions: i) How does phytoplankton bloom phenology vary across WIC? ii) Has bloom phenology in the WIC changed over the past 20 years? and, iii) What are the main drivers of interannual changes in bloom timing and intensity along the different areas of WIC?

Study Area
The Western Iberian Coast (36°-45°N, 12°-6°W; Figure 1) is located in Southwestern Europe, encompassing the Atlantic coast of both Portugal and Spain. Oceanographically, this is considered a complex and heterogeneous region. Regional phytoplankton communities are influenced by several agents, from basin-wide agents such as the North Atlantic Oscillation (NAO) or the Atlantic Multidecadal Oscillation (AMO) pattern to mesoscale and local ones (e.g., coastal upwelling, eddies, larger river basins drainages [27][28][29][30]. Winddriven coastal upwelling is a major factor shaping biological communities during the summer [31,32], owing to WIC's placement in the northernmost sector of the Canary Current Upwelling System. Although several upwelling centers can be found, the main one is located in Northwestern WIC, coinciding with an Iberian sardine recruitment hotspot [33]. Recent studies have shown that, while drivers of phytoplankton biomass vary within the WIC, the most common include NAO, AMO, mixed layer depth (MLD), sea surface height (SSH), and wind direction (e.g., [29,30,34,35]).

Chl-a Data
Satellite-derived chl-a (mg m −3 ) was used as a proxy of phytoplankton biomass. Level 3 daily chl-a data, for the period 1997-2018, with a spatial resolution of 4x4 km, was extracted from the ESA Ocean Color-Climate Change Initiative (OC-CCI) product (version 4.2; available online at https://esa-oceancolor-cci.org/; [36]). This product integrates OCRS data from four distinct ocean color sensors: the Sea-Viewing Wide Field-of-View Sensor (SeaWiFS; 1997-2010), the medium-resolution imaging spectrometer (MERIS; 2002-2012), the moderate resolution imaging spectroradiometer (MODIS; 2002-present), and the visible infrared imaging radiometer suite (VIIRS; 2011-present). To integrate data from different sensors, data were initially atmospherically corrected using the POLYMER v4.1 algorithm (MERIS [37]) and the l2gen tool from SeaDAS v7.5 (SeaWiFS, MODIS, and VIIRS). Data were subsequently band-shifted to the main SeaWiFS bands, bias-corrected, and merged. A valuable characteristic of this dataset is that it contains information on the error and bias of its measurements, using generated optical water classes based on remote-sensing reflectance to identify the best performing algorithms for each water class (more information in [38,39]). The OC-CCI chl-a product is validated with extensive in situ data with global distribution (see [40]), exhibiting a good correlation with in situ measurements for case 1 waters (R 2 =0.73 [39]).
Nevertheless, to study coastal regions, it is essential to ensure that the algorithm has a good regional performance due to the possible influence of case 2 waters. Case 2 waters are those whose inherent optical properties (IOPs) are not predominantly shaped by phytoplankton. Typically, these coincide with coastal or inland waters with higher concentrations of CDOM and mineral particles. IOPs of case 1 waters, on the contrary, are dominated by phytoplankton, which makes it easier to derive chl-a or other ocean color variables. Within the WIC, a regional validation has already been performed for version 1 of the OC-CCI chl-a product, using in situ data from the Portuguese Coast [41]. The authors reported root-mean-square error (RMSE = 0.33), standard deviation, and coefficient of determination (R 2 =0.74) comparable to other products (e.g., MODIS OC3M). For the current study, a new validation exercise for version 4.2 of the OC-CCI product [36], using more recent in situ data from the Portuguese Coast, confirmed these results (data not shown). Among others, the determination coefficient was seen to be similar (R 2 = 0.6), and the rootmean-square and bias errors were lower, a signal of an increase in performance from the initial version of the product (n=113). Thus, the performance of the OC-CCI v4.2 chl-a product seems appropriate to evaluate regional phytoplankton bloom phenology in the WIC.

Estimating Bloom Phenology
Since the estimation of bloom phenology metrics may be highly impacted by missing data [42], the chl-a dataset was preprocessed using two methodologies. First, a 3-step gapfilling algorithm, as implemented by Racault et al. [42], was applied. This algorithm consists of sequentially filling gaps with a 3-pixel-size window using the mean value of its neighbors along the three dimensions of the dataset (i.e., longitudinally, latitudinally, and over time; see [42] for more details). Second, a 3-week centered moving mean was used to smooth the chl-a signal, helping fill the remaining gaps that could yet be present in the time-series, following [25]. While there is the risk that these procedures may slightly alter the chl-a signal linked to short enrichment events (e.g., upwelling events), it is necessary to prevent phenology metrics from being skewed by anomalous biomass spikes that may occur during such events. To evaluate the mean phenology metrics between 1998 and 2018 across WIC, an average of the chl-a concentration for each day of the year during 1998-2018 was calculated for each pixel. Data corresponding to 29 February was discarded since it only occurs in leap years and would be severely under-represented. Thus, for each pixel, a vector containing the mean annual cycle (n=365) was obtained.
The metrics BAmp, BInit, BTerm, BDur, BPeak, and BArea, were only calculated for the main bloom of the year (i.e., where the maximum chl-a concentration was identified). Blooms were identified using two criteria [24,43]: 1) chl-a must surpass a threshold of 5% of the annual chl-a median, and 2) this condition must be maintained for a minimum of 15 days.  While this approach does exclude blooms shorter than two weeks, this approach helps reduce noise and delivers robust results [25]. Moreover, it should be better suited for studying blooms across the year than algorithms that focus on the cumulative distribution of chl-a (e.g., [24,44]), which is better suited for identifying the spring bloom. Bloom initiation (BInit) and bloom termination (BTerm) dates were defined as the day of the year before and after the bloom, respectively (i.e., when chl-a biomass fell below 5% of the annual median). Bloom duration (BDur) was calculated as the difference in days between termination and initiation of the bloom, while the amplitude (BAmp) corresponds to the difference between the yearly maximum and mean. BArea was estimated by numerically integrating the area of the graphical representation of chl-a biomass during the period of the bloom, using Simpson's rule, and used as a measure of the magnitude of the bloom in terms of phytoplankton biomass. The YArea was calculated exactly as BArea, but for the entire yearly cycle. Note that, since these two methods integrate chl-a data, they maintain their units and may yield a wide range of chl-a concentration depending on the chl-a and duration of the bloom (10-1000 mg m −3 ).

Regional Analysis of Bloom Phenology
Subsequently, to account for its spatial heterogeneity, the WIC was partitioned into phenoregions (i.e., regions with coherent phenology) using the phenology metrics calculated in section 2.3. This approach has already been successfully applied (e.g., [15,45]) and enables further analysis of the annual chl-a cycles within each phenoregion. First, autocorrelated phenology metrics were discarded from the analysis (R 2 >0.75). Second, an agglomerative hierarchical clustering analysis was performed using the following phenology metrics calculated for each pixel: yearly maximum, BPeak, BInit, BTerm, BDur, BArea, and BFreq. Data were normalized before analysis. Euclidean distance was used as the distance metric between pixels and Ward's method [46] as the linkage criterion. Ward's method is considered as one of the most robust hierarchical clustering methods and has the advantage of being able to separate clusters surrounded by noise [47], a valuable feature due to the complexity of WIC.
Finally, for each one of the defined phenoregions, chl-a was spatially averaged and used to estimate the same eleven phenology metrics for each year (n=21). To account for blooms that span over two calendar years (e.g., a bloom that starts in November and ends in January), a 1-year temporal window spanning 6 months before the BPeak to 6 months after was used. Metrics were subsequently calculated for this window, following [43]. Whenever this was not possible (e.g., at the beginning and end of the dataset), the corresponding year was excluded from the analysis. For each of the calculated metrics, a linear trend was calculated for the period 1998-2019 by estimating the slope of the least-squares fit.

Environmental Data Products
To investigate the drivers of phytoplankton bloom phenology for each of the phenoregions identified, a comprehensive suite of environmental variables were gathered ( Table 2): sea surface temperature (SST; °C), mixed layer depth (MLD; m), salinity (SAL; unitless), sea surface height, a proxy of heat storage as warmer waters expand and exhibit higher SSH (SSH; m), the zonal (V) and meridional (U) component of surface seawater direction (m.s −1 ), nitrate (NO3 − ; µM), ammonium (NH4 + ; µM), phosphate (PO4 3− ; µM), silicon (Si; µM) and the euphotic zone depth (Zeu; m). SST (daily 4 km spatial resolution) was extracted from the ODYSSEA Level 4 reprocessed SST product over the European North West Shelf/Iberia Biscay Irish Sea [48]. Model-based MLD, SAL, SSH, U, and V were acquired from the Copernicus Marine Environmental Monitoring Service (CMEMS) North Atlantic Iberian Biscay Irish Regional Seas Ocean Physics Reanalysis Product [49], with a temporal and spatial resolution of 1 day and ≈8 km, respectively. NO3−, NH4+, PO3−4, Si, and Zeu modeled data were retrieved from CMEM's North Atlantic Iberian Biscay Irish Regional Seas Ocean Biogeochemistry Non-Assimilative Hindcast product [50], again with a temporal and spatial resolution of 1 day and ≈8 km. Dissolved inorganic nitrogen (DIN; µM) was calculated as the sum of NO3 − and NH4 + . All these products are available online at CMEMS (http:/marine.copernicus.eu/). Each product is specific for the North-West Atlantic region, which encompasses the Iberian West coast. Plus, it is internally validated using data from various sources (e.g., remote sensing, modeled, and in situ data) prior to distribution, ensuring higher quality. Yearly datasets on the NAO, AMO, and multivariate El Niño/Southern Oscillation Index (MEI) spanning 1998-2018 were also acquired from NOAA (available online at https://www.esrl.noaa.gov/psd/data/climateindices/).

Random Forest Analysis
Random forest (RF) models [51] were used to identify and evaluate the main drivers of each metric of bloom phenology. RF is one of the most popular analysis tools in ecology for identifying predictors of a given response (e.g., [52][53][54][55][56]), mainly due to its suite of advantages: 1) high flexibility and capability to detect nonlinear links between the predictors and the response variable; 2) high robustness to overfitting; 3) ability to successfully cope with outliers, high-dimensional data and collinearity among predictors; 4) consistent and easily understandable results [52,57,58].
RF is based on the aggregation of multiple decision trees. Each regression tree is built using a random sample from the input dataset. Along the tree, a random selection of predictors is available at each decision node, where a splitting predictor is chosen based on a criterion until a final prediction is made at the end of the tree. An important component of the RF algorithm is the use of out-of-bag observations (i.e., a random subset of observations not used for training the algorithm [51]) error to internally validate the trees.
For each determined phenoregion. 10 RF models were performed, one for each phenology metric. The number of trees for the RF was set to 500, as in [55] and [56]. The model performance was assessed by calculating the coefficient of determination (R 2 ) and the root-mean-square error (RMSE). Drop-column importance, a method that assesses the importance (i.e., explanatory power) of each predictor by comparing the performance gained or lost when dropping each predictor to the performance of a baseline model with all predictors [59], was used. A summary of the workflow of the methodology is presented in Figure 3.  Table 1 for more information on the phenological metrics.

Phytoplankton biomass off Western Iberia
Chl-a concentration off Western Iberia ( Figure 1) range from <0.5 mg m −3 (low productive oceanic regions) to >3 mg m −3 (productive coastal patches). The most productive areas are located in the northern and central sections of the WIC and the Gulf of Cádiz (SW Spain). Moreover, for offshore waters, there is a gradient from low to high biomass towards northern latitudes on oceanic water, a consequence of the stronger spring blooms observed in northern waters.
Seasonality in phytoplankton biomass in the WIC (see Supplementary Figure S1) is evident, particularly in offshore waters. Overall, spring is the most productive season of the year, as mean oceanic chl-a concentration increase to over 1 mg m −3 . Onshore-wise, a clear chl-a maximum in spring can also be seen in the waters enclosed by the Gulf of Cádiz (>3 mg m −3 ). The decrease in oceanic chl-a from spring to summer is clear, while biomass increases near upwelling centers (e.g., NW Portugal and Spain). Autumn exhibits a similar pattern to summer, although results suggest oceanic phytoplankton biomass slightly increases in the northernmost latitudes. Winter chl-a biomass reveals a transition towards spring, as oceanic chl-a concentration increases, while chl-a near upwelling zones diminishes.

Bloom Phenology Metrics
Estimated bloom phenology metrics along WIC highlight its complexity ( Figure 4). Results show that yearly maximum chl-a concentration exceeds 1 mg m −3 in most oceanic waters and 2 mg m −3 in coastal waters ( Figure 4a). As with mean chl-a concentration, Bloom amplitude (BAmp; Figure 4b) is also higher towards northern latitudes offshore. Coastal waters also tend to have higher BAmp, reaching amplitudes over 2 mg m −3 in several biomass-rich areas (e.g., NW coast, Gulf of Cádiz). While the bloom peak date (BPeak; Figure 4c) is rather homogeneous offshore (March-April), blooms in higher latitudes (>41°N) tend to peak later (April). Coastal-wise, there is more variability. Blooms in some regions (e.g., Gulf of Cádiz) appear to peak during the early spring, while blooms in upwelling centers peak later.
Bloom timing was also seen to vary across latitude and between oceanic and coastal waters ( Figure 5). Offshore, bloom initiation (BInit; Figure 5a) ranged between November-February, with spring blooms off SW Iberia typically starting and ending (Figure 5b) earlier, i.e., compared to a typical North Atlantic spring bloom, and lasting longer (2 months; Figure 5c) than blooms north of 39°N. Onshore, clear differences were observed along the West and South coast, with Sagres acting as a transition zone. While blooms along the eastern sector of the South coast (8-6°W) initiate between December-February, blooms off the remaining coast appear to start later (June-August). Overall, coastal blooms typically lasted under 80 days, generally terminating between October-November. Bloom area (BArea; Figure 5d) was higher onshore, in some areas exceeding 300 mg m −3 . The less productive blooms (<50 mg m −3 ) appear to be located in the SW oceanic waters and in a coastal patch along the continental shelf break off the Western coast.  Overall, the most productive areas are coastal, mostly overlapping with the areas characterized by higher YArea (>500 mg m −3 ) and BFreq (>3 blooms year −1 ; Figure 5e). Offshore, the annual cumulative production is, as expected, significantly lower (50-200 mg m −3 ). Bloom frequency (BFreq; Figure 5f) was also seen to be higher onshore (2-5 blooms year −1 ). Blooms were especially frequent in NW Iberia, with several pixels being characterized by 4-5 blooms year −1 . Again, a distinction in bloom phenology between oceanic phytoplankton communities can be seen, as communities below 39°N, excluding those in the Gulf of Cádiz, are characterized by a single yearly bloom, while communities further north already display an additional bloom apart from the spring bloom.

Regional Patterns of Bloom Phenology
Five phenoregions were defined in the WIC, based on the dendrogram extracted from the hierarchical clustering analysis (Figure 6a; Supplemental Figure S2). Oceanic waters in the WIC were separated into two phenoregions: one limited to waters north of 39°N (OcN; Figure 6b) and another limited to waters in the southern part of WIC, from 11°W to 6°W (OcSW; Figure 6c). A transitional region, encompassing waters of both coastal and oceanic characteristics along the continental shelf margin of the Western coast (CoMa; Figure 6d), was also defined. Finally, two coastal phenoregions were also established. CoUp (Figure 6e) included pixels along of two main upwelling centers-Center and NW Iberia and Sagres, while CoBa ( Figure 6f) appears to be limited to the Gulf of Cádiz and small areas within the vicinities of major river basins (e.g., Sado, Tagus, Ria de Aveiro, Douro, Ría de Vigo, and Ría de Pontevedra). For each phenoregion identified, statistical metrics and intercorrelation results are presented in Table 3. Yearly bloom information ( Figure 7) and bloom phenology metrics anomalies ( Figure 8) for each phenoregion are also exhibited.    Table  3. ***, **, and * correspond to the degree of statistical significance (p-value < 0.1, p-value < 0.05, p-value < 0.01, respectively).
Oceanic North (OcN) was the largest identified phenoregion. Being a largely oceanic region, OcN displays low productivity (mean chl-a of 0.34 mg m −3 ). Blooms occurring in this region are clearly defined as spring blooms, typically spanning February-May. On average, two blooms occur per year. Linear trend analysis results indicate that blooms are starting (p-value<0.1) and peaking (p-value<0.05) later over the past 22 years. Nevertheless, no significant decline was observed in bloom duration. Correlation-wise, later BInit was negatively correlated with bloom duration. In addition, blooms with a longer duration and with a later BTerm were associated with higher bloom area.
Oceanic Southwest (OcSW), the other purely oceanic phenoregion delineated, was the least productive one, with low mean chl-a (0.25 mg m −3 ). While OcSW also exhibits spring blooms, these initiate much earlier (early December) than blooms observed in OcN. Despite its much longer blooms, their total biomass is also low (50.88 mg m −3 ). Plus, there is usually only one bloom per year, with a mean duration of 5 months. Contrary to OcN, blooms in OcSW with later peaks were linked with earlier bloom termination dates. At the same time, the spring blooms starting in December were seen to match longer blooms and higher biomass.
Coastal Continental Margin (CoMa) shared characteristics with the oceanic and coastal phenoregions identified, exhibiting intermediary productivity. In terms of timing, blooms were seen to frequently initiate in February, peak in early-spring and end in May. Blooms were more frequent (~3 blooms year −1 ) and of higher biomass than in the oceanic phenoregions (65.16 mg m −3 ). Over the dataset period, rising trends were detected for the BPeak, indicating blooms have been peaking increasingly later in the year (p-value<0.1). Years with high maximum chl-a were linked with fewer, high biomass blooms.
Coastal Upwelling (CoUp) is a productive and variable coastal region (mean chl-a concentration of 1.11 mg m −3 ). The main bloom of the year is typically short (47 days), starting in late August, peaking in early October and ending in late October. Blooms in CoUp were frequent and exhibited higher biomass (66.61 mg m -3 ). A trend can be seen towards earlier bloom initiation dates (−6.88 days year −1 ; p-value<0.1). CoUp, along with CoMa, were the only regions where BArea was not correlated to mean or YArea, suggesting that the biomass associated with the main bloom is independent of the yearly productivity.
Coastal River Basins (CoBa) was the most productive phenoregion delineated. Blooms in this region started in January/February and usually ended in April/March. Blooms in CoBa were also the richest in biomass in the WIC (228.66 mg m −3 ). Nevertheless, similarly to CoUp, the biomass amassed by these blooms were only a small fraction of the YArea. Trend-wise, over the past 20 years, statistically significant declines were identified for mean chl-a (p-value < 0.01), YArea (p-value < 0.05) and BDur (p-value < 0.01). Nevertheless, it should be noted these results should be carefully regarded as they may be largely influenced by a recent sequence of negative anomalies in mean chl-a, YArea, and BDur (2012-2018) that followed three years with high biomass (2009-2011). Correlation analysis results show that earlier blooms (i.e., lower BInit) corresponded to higher bloom area, higher mean chl-a, and yearly cumulative area. Table 3. Phenoregions statistical metrics and intercorrelation results calculated for phenology metrics (see Table 1 for full names and description). For each phenoregion, mean, minimum (Min), maximum (Max), standard deviation (Stdev), 10th and 90th percentiles (P10 and P90), and mode are presented for the phenology metrics for the 1998-2018 period. The linear trend (Trend) is the slope of the least-squares regression for each metric. Correlation results between phenology metrics are shown in red (positive correlation) or blue (negative correlation). Only statistically significant correlations and trends are displayed. *, ** and *** equals p-value < 0.1, p-value < 0.05 and p-value < 0.01.

Drivers of Bloom Phenology
The explanatory power of the random forest models (RF) ranged between 43% and 87% (as seen by the calculated in-sample R 2 ), depending on the region and the metrics considered (Table 4). Table 4 also contains information on each metric's response to the main predictors (i.e., with above-average importance) identified by the RF. The partial response of each metric to an increase in its main predictors (bold) is represented as +, − or 0, based on the partial dependence plots (Supplemental Figure 3). For a given metric and predictor, if above-average values of the predictor lead to a constant increase or decrease of the metric, the response is represented as + or −, respectively. Else, if no constant response is observed, the response is considered neutral (0). For further clarification on the effect of each main predictor, consult Supplemental Figure S3. Table 4. Random forest models result for each phenoregion. For each phenology metric, the coefficient of determination (R 2 ), the root-mean-squared error (RMSE) and the predictors are presented in order of importance. The partial response of each metric to an increase in its main predictors (*) is also represented as plus (increase), minus (decrease) or 0 (neutral), following the order of the predictors. DIN, Si, MLD, and AMO were identified as the main drivers of phytoplankton phenology in the OcN phenoregion. Higher DIN concentration was associated with increases in the mean chl-a and YArea, as well as with later spring blooms. Si was seen to negatively influence max chl-a and bloom biomass. Deeper mixed layers were associated with blooms with later peak dates. AMO was linked to bloom termination, duration, and frequency.

OcN
Regarding the OcSW region, DIN, MEI, and Si were the most common drivers singled out by the models. DIN was seen to be instrumental towards years with higher max chl-a and biomass (YArea). Higher values of MEI also led to higher mean chl-a and earlier blooms. Finally, higher Si concentration was associated with lower biomass blooms with earlier peaks and fewer blooms.
Random forests for the CoMa phenoregion were characterized by the sea surface height (SSH) and DIN as major predictors. Higher SSH was seen to be associated with blooms with later peaks and end dates, as well as higher biomass blooms and overall higher YArea. DIN was linked to higher chl-a and BAmp and blooms with earlier termination. Results suggest that bloom initiation, however, was shaped by the MLD.
CoUp had no clear predominant predictors, with salinity (Sal), MEI, SSH, and the meridional water transport (Uo) being among the most common. Higher salinity was linked to years characterized by fewer blooms and higher biomass overall. MEI was seen to contribute to higher max chl-a, BAmp, and earlier blooms (i.e., starting earlier). SSH was the main factor behind bloom peak and termination, with lower SSH contributing to earlier peak and termination dates. Higher Uo (i.e., westward transport) was associated with longer blooms, while lower Uo (eastward transport) led to higher-biomass blooms.
Phenology in CoBa was also very heterogeneous in terms of model results. Nevertheless, Sal, Si, and Uo were among the most frequent predictors. Surface waters with higher salinity were associated with higher mean and max chl-a, as well as with higher yearly biomass. Higher Si concentration was, on the other hand, linked to lower BAmp and earlier blooms. Moreover, the RF models for CoBa suggest eastward water transport was also instrumental, contributing to lower overall chl-a mean, yet higher biomass, longer blooms.

Phytoplankton Bloom Phenology in the WIC
Phytoplankton in oceanic waters off Western Iberia are clearly defined by a spring bloom. However, there is a difference between the spring blooms north of 40°N and the spring blooms south of 38°N. While both situations are variants of the North Atlantic spring bloom, the first is more akin to the classical spring blooms [7,60], and its duration is in line with previous studies [8] for the same latitudes. Moreover, a fall bloom typically follows, benefiting from the increase in the MLD and the consequent enclosing of nutrients [61]. South of 38°N, however, phenology was characterized by much longer, yet lower biomass blooms and a single bloom per year.
WIC's role as a transition region between North Atlantic temperate and subtropical waters has already been pointed out for phytoplankton biomass [30]. This study not only corroborates those findings but also confirms that this transition also shapes phytoplankton bloom phenology. Moreover, basin-wide studies have shown differences in bloom initiation across WIC (e.g., [8,15,19,62]). Furthermore, trend-analysis performed in this work suggest blooms north of 40° appear to be initiating and peaking almost a month later. On the contrary, [19] observed a trend towards earlier bloom initiation off NW Iberia. Nevertheless, this study sought to evaluate phenology at a basin-wide scale. Thus, a much lower spatial (100x100 km) and temporal resolution (8 days) were used by those authors, which may have contributed to the observed differences.
Bloom phenology in the WIC was seen to vary even more along the coast as three separate phenoregions were clearly defined. A clear transition region (CoMa) was identified along the continental shelf margin, exhibiting intermediate productivity (comparing to the coastal and oceanic phenoregions) and a typical spring bloom, like its adjacent oceanic waters. These results corroborate the findings of [63] and [64], which also observed a similar transition in the slope area. Moreover, [63] found that the slope area off NW Iberia was more similar to the oceanic area than to the shelf area in terms of the chl-a annual cycle.
Near the main upwelling centers (NW Iberia and Sagres; CoUp), phytoplankton was characterized by frequent, irregular high-biomass blooms. The strongest blooms occur during the late summer and early autumn, matching the seasonal upwelling observed off NW Iberia [32,65] and previous phenology studies [66,67]. The high-frequency of blooms and their short duration is a sign of the pulsed input of nutrients and turbulence that characterize wind-driven coastal upwelling-downwelling cycles, leading to high variability in bloom timing. It is also important to consider the observed shift towards earlier blooms. While it is difficult to know what is behind this shift, the increase in upwelling intensity off NW Iberia [65,68], which is contributing to an increase in overall phytoplankton biomass [30], may also be leading to the establishment of the summer bloom, allowing for earlier main blooms.
CoBa included pixels close to several large river basins, more specifically the Tagus, Sado, Ria de Aveiro, and, overall, the Gulf of Cádiz systems, which share a similar signature in terms of bloom phenology. This region's short and frequent blooms, as well as its high variability in bloom timing, suggests bloom phenology is regulated by pulses of favorable environmental conditions. This is expected, as river discharges (and its nutrient input) are a major factor in regulating phytoplankton biomass in the Gulf of Cádiz [29,69] and off river mouths. Despite its variability, most blooms were seen to initiate in midwinter, which is typically when river discharges are at their highest volume [70] and terminate in spring. This, along with the high chl-a, could be a sign that this phenoregion may be characterized as case 2 waters. In case 2 waters, CDOM, sediment interference, and particle resuspension absorb sunlight in the blue/green bands, inducing errors in the chl-a signal [71]. Plus, the presence of suspended particulate matter (SPM) may also difficult the atmospheric correction procedure of remote sensing imagery [72]. Thus, care should be taken regarding a possible chl-a overestimation caused by CDOM and SPM. While this may affect all coastal pixels, pixels belonging to CoBa are the most likely to be overestimated due to their proximity to river basins. This phenoregion was also the one that changed more over the past 20 years, as it became less productive and with higher bloom duration. This could be a consequence of the already observed significant reduction in river discharges [73]. If river discharges continue to decrease as projected (estimate decline of 61-92% by 2050 [73]), nutrient availability should decrease, leading to overall lower biomass. It is, however, unclear what are the consequences of this shift for other phenology metrics, as bloom area and frequency remained unchanged despite being correlated with bloom duration.

Drivers of Bloom Phenology
Understanding what drives phytoplankton phenology is particularly important in complex coastal regions, where water column conditions are not as stable as in the open ocean, and a suite of agents work together to shape phytoplankton. Offshore, bloom phenology north of 38°N (OcN) was mainly seen to be driven by changes in the DIN, Si, MLD, and AMO. As mentioned above, OcN displayed a typical temperate North Atlantic spring bloom. Thus, it makes sense that nutrients, basin-wide agents (AMO), and MLD, a known major factor in bloom regulation in the NE Atlantic [74], have significant roles in phenology. While the onset of the spring bloom is still not completely understood [10], DIN concentration was here identified as the main factor behind the bloom initiation. While this result does not mean that DIN is a factor in bloom initiation, it is likely that years with higher DIN concentration allow for faster growth and, therefore, earlier bloom initiation dates as calculated here using the 15% of the mean threshold. Moreover, the fact that higher Si concentration is associated with lower max chl-a, bloom amplitude, and bloom area might be due to high Si consumption on more productive years by diatoms, the main phytoplankton group behind the spring bloom. Higher (warmer) values of AMO correspond to higher surface temperature, surface pressure, and precipitation in Western Europe [75]. Due to the large temporal scale of AMO, it becomes difficult to conclude how higher AMO could lead to earlier termination, shorter blooms, and higher bloom frequency. Nevertheless, it is possible that the more unstable weather and higher precipitation in this region could become disadvantageous for phytoplankton growth, therefore interrupting a bloom and contributing to its earlier termination and, eventually, the onset of a new bloom (thus, the higher bloom frequency).
The similarity between OcSW and OcN is reflected by the two shared main predictors identified by the RF models: DIN and Si. Nevertheless, the way these predictors interact with the bloom phenology metrics is different. DIN here is related to the chl-a max, BAmp, and yearly biomass, while Si is linked to bloom peak date, bloom biomass, and yearly biomass. In OcSW, PAR is higher; seawater is typically warmer, scarce in macronutrients, and with a shallower mixed layer [30]. While MLD and AMO are important for bloom phenology in the northern waters, the same is not true in OcSW. With typically lower MLDs, deep nutrient cycling is not as effective, resulting in lower nutrient availability at the surface [8] and contributing to lower biomass. This corroborates the results here presented, which suggest DIN is the main factor regulation max chl-a and BAmp. Another major driver of bloom phenology in OcSW was MEI. OcSW, in agreement with [15], was also the only phenoregion that had MEI as one of its major predictors. Positive MEI in the temperate North Atlantic has been linked, among others, to intensified winds and lower phytoplankton growth [14]. Here, on the contrary, it was associated with a higher mean chl-a. It is possible that higher MEI could be promoting more intense winds and, consequently, higher mixing. Since waters in the OcSW are more stratified and poorer in macronutrients, this mixing could increase surface nutrient availability, increasing phytoplankton biomass. These results corroborate how important vertical mixing and nutrient availability is for the highly oligotrophic waters off the SW Iberia [14,29,64].
Bloom phenology in CoMa was seen to be regulated by DIN and SSH. DIN again appears as the main nutrient available for phytoplankton communities, mirroring the two oceanic phenoregions. SSH allows to map warmer/colder areas and mesoscale processes (e.g., upwelling, eddies), making it likely that SST also has a role in this region's bloom phenology, although this variable was not signaled by the RF models. Previous studies [76][77][78] have shown that this area is characterized by eddies and currents. These phenomena contribute to deeper mixed layers, fueling nutrient cycling and entrainment, essentially sustaining phytoplankton [21]. Corroborating these studies, the RFs showed that higher average SSH was linked to higher bloom and overall productivity, as well as longer blooms. This is also in line with previous studies that found a strong coupling between phytoplankton biomass and distribution and small-scale oceanographic features in NW Iberia [35].
CoUp and CoBa were two highly heterogenous phenoregions in terms of main predictors, which shared several predictors. For CoUp, as a phenoregion characterized by coastal upwelling, it is logical that phytoplankton bloom phenology should be modulated by factors associated with upwelling. The fact that westward water transport (typical of upwelling off the NW Iberia) was linked to a higher bloom area corroborates this hypothesis. Plus, the association of MEI and higher max chl-a and BAmp might be a consequence of the intensified winds as a result of positive MEI in the temperate North Atlantic, which would lead to more intense upwelling in this region. Higher salinity, being linked to higher phytoplankton biomass and higher bloom frequency, could be a proxy of the upwelling of more dense (higher salinity) waters, providing nutrients essential for phytoplankton growth. Coastal upwelling intensity in the WIC is expected to increase in the future, as in other eastern boundary upwelling systems (e.g., [79,80]), even though recent studies have contradicted this trend in the WIC [81]. While higher upwelling intensity should promote higher nutrient availability, the expected increased wind intensity will also cause deeper water column mixing and light limitation, leading to lower phytoplankton biomass in some regions [79]. Apart from biomass, bloom phenology may also be impacted by the predicted changes in upwelling [20]. For instance, in the northern California Current, the spring bloom timing is expected to change, possibly causing mismatches throughout the food web [82]. With upwelling-related drivers being so important for bloom phenology in the WIC, it is essential to continue monitoring this region.
On CoBa, like CoUp, higher salinity was linked to higher biomass. Since this region's productivity is highly dependent on river discharges and its nutrient input, it would be expected that lower, not higher; salinity would be the factor behind years with higher biomass. Bloom initiation was seen to occur early with lower NAO, Si, and the euphotic zone depth (Zeu). While higher Si and deeper Zeu provide typical favorable conditions for bloom growth, higher (positive) NAO index values are linked with lower river flow in the Iberian Peninsula [83], which leads to lower riverine nutrient input. Thus, CoBa appears to be a phenoregion for which the RF results exhibit several contradictions, which is expected since it was the phenoregion with the worst performance, likely due to the inherent variability of the region.
Overall, drivers of productivity in each of the phenoregions were successfully identified, exhibiting the importance of these studies and potential for the use of RF models for studying phytoplankton ecology. Nevertheless, interannual variations in bloom initiation were more difficult to assess (lower model performances for BPeak, Binit, and BTerm). Likely that the length of the dataset used (21 years) may not be enough to offset the high variability in bloom timing in coastal areas. Henson et al. [6] suggested that 30-60 years of data may be required to disentangle natural and anthropogenic signals in phytoplankton. Plus, while CoUp and CoBa exhibit coherent phenological patterns, they include small areas that likely have small environmental differences (e.g., different upwelling sites, different rivers, etc.). Thus, the predictors for these coastal phenoregions should be interpreted carefully. To improve on these results, newer studies should focus on smaller specific areas and use higher spatial resolution (1 km or 300 m).

Remote Sensing as a Tool for Assessing Bloom Phenology in Coastal Regions
While ocean color remote sensing has been extensively used to study phytoplankton bloom phenology, there are a few caveats that should be considered. First, the collection of continuous, high-quality ocean color remote sensing data only started in 1998. Thus, only recently have +20 year datasets been available, which may be insufficient to fully characterize bloom phenology in a given region. Second, it is essential that spatial and temporal resolution match the region of interest and the question at hand. For instance, global phenology studies allow for coarser spatial and temporal resolution than complex coastal regions, such as WIC. Otherwise, studies may overlook relevant phenology patterns, such as smaller yet intense blooms or shifts in bloom timing.
This study used 21 years' worth of data (1998-2018) with a moderate (4x4 km) spatial resolution and a high (1 day) temporal resolution. To avoid temporal gaps in the dataset, a combination of two gap-filling methodologies from previous studies [25,42] was successfully applied. To the authors' best knowledge, this is the first phytoplankton phenology studies using daily data. Results were promising as this not only helped discern bloom timing with higher detail but also identify significant trends in bloom timing, which otherwise might not have been possible. Plus, the additional resolution allowed the detection of subtle differences in bloom timing and duration between phenoregions, highlighting the importance of using a daily resolution to assess phenology in coastal regions. This difference in information gained between daily and weekly resolution may be essential for accurate environmental management of coastal areas. Water-quality management, fisheries, or aquaculture may especially benefit from this. Nevertheless, it is important to be careful when OCRS in optically complex waters since the increase in uncertainty and variability may lead to contradictory results, as happened in this study for the most sensible phenology metrics in coastal phenoregions.
Due to the projected continuity of operational ocean color missions [84], ocean color datasets will not only increase their temporal span but also allow for phytoplankton phenology studies with higher temporal and spatial resolution. Furthermore, the launch of hyperspectral sensors, such as the Ocean Color Instrument (OCI) aboard NASA-PACE, will enable improved assessment of phytoplankton groups, paving the way for high-resolution group-specific phenology studies at a global scale. In the end, ocean color datasets are expected to enable the disentangling of interannual and multidecadal variability in phenology patterns, allowing scientists to discern the impacts of climate change on its variability [84].

Final Considerations
Oceanic phytoplankton communities form typically long, low-biomass spring blooms, mainly influenced by atmospheric phenomena and water column conditions. Nevertheless, a clear difference exists between northern and southern waters in the WIC, as blooms in northern waters are more akin to the classical spring bloom and appear to have been starting and peaking later over the past 20 years. Coastal phytoplankton, however, are characterized by short, high-biomass, highly heterogeneous blooms, as nutrients, sea surface height and horizontal transport had major roles in shaping phenology. Winddriven upwelling and riverine input were seen to be major factors affecting bloom phenology in the coastal areas.
Changes in phytoplankton bloom phenology should be considered in coastal management and monitored due to their possible consequences to the functioning of the ecosystem. This work is expected to contribute to the understanding of phenology in the WIC. WIC is located amid the northern zone of the Canary Upwelling System (one of the four eastern boundary upwelling systems in the world [85]). These systems are responsible for over 20% of the world's fisheries production, and phytoplankton bloom phenology is intertwined with fish recruitment. Mismatches between the seasonal timings of fish reproduction and the peak biomass of plankton typically lead to poor fish recruitment [4], and extreme mismatches (>30 days) in the WIC will likely become more frequent in the future due to climate change [5]. Therefore, future studies should focus on the possible implications of changes in bloom phenology on these resources as a trophic mismatch or bottomup phenomena may negatively impact stock biomass and recruitment. Moreover, this study can be helpful for future bloom phenology studies across the other eastern boundary upwelling systems, such as the California, Benguela and Peru upwelling systems, as they all share several oceanographical features.
The five phenoregions here described and their associated phenology metrics and drivers should be useful for the assessment of phytoplankton in the EU Marine Strategy Framework Directive (Directive 2008/56/EC) subregion "Bay of Biscay and the Iberian Coast" and for the implementation of the EU Water Framework Directive (Directive 2000/60/EC) [86]. It also complements other works in the WIC using ocean color remote sensing to aid the management of regional fisheries and aquaculture activities (e.g., [87][88][89][90]).

Supplementary Materials:
The following are available online at www.mdpi.com/2072-4292/13/4/675/s1, Figure S1: Seasonal chlorophyll-a in the WIC, Figure S2: Phenoregions hierarchical clustering dendrogram, Figure S3: random forest partial dependency plots. Funding: A. Ferreira received a grant from the Mar2020 -Programa Operacional Mar2020, under the AQUIMAR project (MAR-02-01-01-FEAMP-0107) and a PhD grant (SFRH/BD/144586/2019) from Fundação para a Ciência e a Tecnologia (FCT). A.C. Brito was funded by FCT through the Scientific Employment Stimulus Programme (CEECIND/00095/2017). This work benefited from the Infrastructure CoastNet (http://geoportal.coastnet.pt), funded by Foundation for Science and Technology (FCT) and the European Regional Development Fund (FEDER), through LISBOA2020 and ALENTEJO2020 regional operational programmes, in the framework of the National Roadmap of Research Infrastructures of strategic relevance (PINFRA/22128/2016). This study also received further support from FCT through MARE's strategic programme (UID/MAR/04292/2019). In addition, this work was also funded by the European Union's Horizon 2020 Research and Innovation Programme under grant agreement N 810139: Project Portugal Twinning for Innovation and Excellence in Marine Science and Earth Observation -PORTWIMS.
Institutional Review Board Statement: Not applicable.