Phytoplankton cell size not only influences physiology, metabolic rates, and the marine food web, but also serves as an indicator of the functional roles of phytoplankton in ecological and biogeochemical processes. Several algorithms have therefore been developed to infer the synoptic distribution of phytoplankton cell size, expressed as phytoplankton size classes (PSCs), in surface ocean waters by means of remotely sensed variables. Using the NASA bio-Optical Marine Algorithm Data set (NOMAD) high performance liquid chromatography (HPLC) database and satellite match-ups, this study compared the effectiveness of modeling techniques, including partial least squares (PLS), artificial neural networks (ANN), support vector machines (SVM), and random forests (RF), and of feature selection techniques, including the genetic algorithm (GA), the successive projection algorithm (SPA), and recursive feature elimination based on support vector machines (SVM-RFE), for inferring PSCs from remote sensing data. Results showed that: (1) SVM-RFE outperformed the other feature selection techniques in selecting sensitive features; (2) RF performed better than PLS, ANN, and SVM in calibrating PSC retrieval models; (3) the machine learning techniques performed better than the chlorophyll-a based three-component method; (4) sea surface temperature, wind stress, and spectral curvature derived from the remote sensing reflectance at 490, 510, and 555 nm were among the features most sensitive to PSCs; and (5) the combination of SVM-RFE feature selection and random forests regression is recommended for inferring PSCs. This study demonstrates the effectiveness of machine learning techniques in selecting sensitive features and calibrating models for estimating PSCs from remote sensing data.
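The recommended combination of SVM-RFE feature selection followed by random forests regression can be sketched as below. This is only a minimal illustration, assuming scikit-learn as the toolkit; the helper name `fit_psc_model`, the hyperparameters, and the synthetic data are hypothetical and do not reproduce the study's NOMAD match-up data, features, or settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def fit_psc_model(X, y, n_features=3, seed=0):
    """Fit an SVM-RFE selector followed by a random forest regressor.

    Hypothetical helper: n_features and n_estimators are illustrative
    choices, not values taken from the study.
    """
    # RFE repeatedly fits a linear SVR and drops the feature with the
    # smallest absolute coefficient until n_features remain.
    selector = RFE(SVR(kernel="linear"), n_features_to_select=n_features, step=1)
    model = make_pipeline(
        StandardScaler(),  # put features on a common scale before the SVR
        selector,
        RandomForestRegressor(n_estimators=200, random_state=seed),
    )
    model.fit(X, y)
    return model


# Synthetic stand-in data (assumption): 10 candidate predictors, of which
# only the first three carry signal about the target size fraction.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = 0.5 * X[:, 0] - 0.3 * X[:, 1] + 0.2 * X[:, 2] + rng.normal(scale=0.05, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = fit_psc_model(X_train, y_train)

selected = model.named_steps["rfe"].support_  # boolean mask of kept features
r2 = model.score(X_test, y_test)              # R^2 on the held-out split
print("selected feature indices:", np.flatnonzero(selected))
print("held-out R^2:", round(r2, 3))
```

Placing the selector inside the pipeline means feature selection is refit on each training split, which avoids leaking held-out samples into the selection step when the pipeline is cross-validated.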
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.