Machine Learning Models for Approximating Downward Short-Wave Radiation Flux over the Ocean from All-Sky Optical Imagery Based on DASIO Dataset

Abstract: Downward short-wave (SW) solar radiation is the only essential energy source powering the atmospheric dynamics, ocean dynamics, biochemical processes, and so forth on our planet. Clouds are the main factor limiting the SW flux over the land and the ocean. Accurate meteorological measurements of the SW flux require expensive equipment, namely pyranometers. For cases where gold-standard measurement quality is not needed, we propose estimating incoming SW radiation flux using all-sky optical RGB imagery, which is assumed to encapsulate the whole information about the downward SW flux. We used the DASIO all-sky imagery dataset with corresponding SW downward radiation flux measurements registered by an accurate pyranometer. The dataset has been collected in various regions of the World Ocean during several marine campaigns from 2014 to 2021, and it will be updated. We demonstrate the capabilities of several machine learning models in this problem, namely multilinear regression, Random Forests, Gradient Boosting and convolutional neural networks (CNN). We also applied the inverse target frequency (ITF) re-weighting of the training subset in an attempt to improve the SW flux approximation quality. We found that the CNN is capable of approximating downward SW solar radiation with higher accuracy compared to existing empirical parameterizations and known algorithms based on machine learning methods for estimating downward SW flux using remote sensing (MODIS) imagery. The estimates of downward SW radiation flux using all-sky imagery may be of particular use when a fast assessment of the radiative budget of a site is needed.


Introduction
Solar radiation is the main source of energy on Earth [1]. It is also of great significance for biogeochemical, physical, ecological, and hydrological processes [2,3]. Cloud cover, in turn, is the main physical factor limiting the downward solar radiation flux [4][5][6]. Cloud cover during the day reduces the influx of solar radiation to the Earth's surface, and significantly weakens its outgoing long-wave radiation at night due to backscattering [7]. This entails corresponding changes in other meteorological quantities. The functioning of agriculture, transport, aviation, resorts, alternative energy enterprises, and other sectors of the economy, in one way or another, depends on the amount and shape of clouds.
There are two options for flux estimation in modern models of climate and weather forecasts. The first is physics-based modeling of radiation transfer through a two-phase medium (clouds), which includes modeling of multiple scattering, taking into account the microphysics of cloud water drops [8] and aerosols. This option is extremely computationally expensive. Alternatively, one may use parameterizations, which are simplified schemes for approximating environmental variables using only routinely observed cloud properties, such as Total Cloud Cover (TCC), cloud types, and cloud cover per height layer. The existing parameterizations are empirical and were proposed years and decades ago based on observations and expert-based assumptions [9,10]. As a result, they may not take into account the entire variety of cloud situations occurring in nature, which may lead to a reduced quality of approximation of downward SW solar radiation flux.
Our goals were to get computationally cheaper estimations of downward solar radiation flux and to study flux dependence on structural characteristics of clouds. The aim of this study was to improve the accuracy of existing parameterizations of downward SW radiation flux. In this study, we assess the capability of machine learning models in the scenario of statistical approximation of radiation flux from all-sky optical imagery. We solve the problem using various machine learning (ML) models with the assumption that an all-sky photo contains complete information about the downward SW radiation.
There are a number of studies on the forecasting of downward SW radiation using advanced statistical models, namely, machine learning models [3,11,12]. Most of them deal with the time-series of SW radiation flux measured directly by an instrument (radiometer); thus, this approach cannot be applied when a low-cost assessment package is needed. There are also a number of ML methods published for estimating other useful properties of the clouds, for example, total cloud cover [13,14] or cloud types [15][16][17].
There are a number of studies demonstrating the capabilities of machine learning methods in estimating the SW flux from remote sensing data, for example, MODIS [3,18] or GMS-5 [19]. There are also studies demonstrating the links between properties of cloud cover and surface solar irradiance [20,21]. However, these studies are not focused on approximating the flux directly from all-sky imagery. Rather, in these studies, all-sky imagery is commonly used for assessing some semantically meaningful properties of cloud cover that are then used to categorize the events of solar irradiance measurements.
To the best of our knowledge, there is only one study demonstrating the capabilities of statistical modeling in the problem of the estimation of downward SW radiation flux [22]. In that study, though, the statistical relation is demonstrated between semantically rich meteorological features (solar zenith angle, surface albedo, hemispherical effective cloud fraction, ground altitude and atmospheric visibility) and the SW radiation flux. In contrast, we model the statistical relations between the raw all-sky imagery and the SW radiation flux. We do not propose to infer any semantically significant features of the all-sky visual scene. The only semantically meaningful feature we propose to use is the sun altitude, which we compute using the position, date, and time of observations. The rest of the paper is organized as follows: in Section 2, we describe the dataset that we used in our study; in Section 3, we introduce the methods we exploited in our study for approximating the SW downward radiation flux; in Section 4, we present and discuss the results of our study. In Section 6, we summarize the paper and present the outlook for further study.

Data
In this section, we present source data for our study. The problem we tackle is to map all-sky imagery to net downward SW radiation flux using state-of-the-art statistical models (also known as machine learning models). We used a high-resolution fish-eye cloud-camera «SAIL cloud v.2» [14], also known as SAILCOP (which stands for "Sea-Air Interactions Laboratory Clouds Optical Package") [13], to collect all-sky images, and a Kipp and Zonen CNR-1 net radiometer (Kipp and Zonen, Delft, The Netherlands) to measure net downward SW flux. In Figure 1, we present the equipment used to collect the data. The net radiometer Kipp and Zonen CNR-1 is a tool used for the measurements of incoming and outgoing net solar (also known as short-wave, SW hereafter) and far-infrared (also known as long-wave, LW hereafter) radiation in various weather conditions, including rough seas. The CNR-1 net radiometer is equipped with four separate sensors: two for downward fluxes (SW and LW components), and two for outgoing radiation (SW and LW components). The CNR-1 design is such that both the upward-facing and downward-facing sensors measure the energy that is received from the whole hemispheres, upper and lower, thus having a 180-degree field of view. The output is expressed in watts per square meter; thus, one may use the measurements as is, without any transformations. The spectral range covers both the SW radiation, meaning wavelengths from 300 to 3000 nm, and the LW radiation, meaning wavelengths from 4.5 to 42 µm. SW net radiation is measured by two pyranometers, one for measuring incoming radiation from the sky, and the other, which faces downward, for measuring the reflected SW radiation. LW radiation is measured by two pyrgeometers, one for measuring the LW radiation from the sky, and the other from the sea surface. We use the CNR-1 net radiometer in the four Separate Components Mode (4SCM) [23]. According to the user manual [23], the nonlinearity of the measurements of both SW and LW sensors is ±2.5%. In recent studies, the CNR-1 net radiometers were compared to high-standard reference radiation instruments measuring individual SW and LW downward and upward flux components [24]. It was shown that the CNR-1 radiometer demonstrates quite a high measurement quality, commonly characterized by root-mean-square errors below 14 W m⁻². In our study, we used the measurements as is, without any corrections. In our marine missions, though, the CNR-1 net radiometer was mounted close to the ship's side; thus, the reflected SW and LW radiation was strongly influenced by the reflection and self-irradiance of the hull. For this reason, we did not use the outgoing radiation measurements.
The fish-eye cloud-camera SAILCOP was developed and assembled in the Sea-Air Interactions Laboratory, Shirshov Institute of Oceanology, Russian Academy of Sciences, Moscow, Russia. It was first presented in 2016 [14]. It was designed following the concept of all-sky digital optical imagers presented in 1998 by Long et al. [25]. The concept was then adopted in various recent studies [26][27][28][29][30][31][32][33][34]. The term "cloud-camera", which we use here, is a synonym for "all-sky camera", "all-sky imager", "whole sky camera", "total sky imager", and many others similar to those mentioned in the studies referenced above. The main function of these packages is to register the visual image of the visible hemisphere of the sky-dome using a ground-based optical fish-eye camera. In our marine expeditions, our all-sky camera was mounted onboard a ship, and was directed upwards when the ship was not rolling. An all-sky camera commonly has a 180-degree field of view; thus, an image taken by it presents the whole visible part of the sky. The common purpose of an all-sky imager is to register the sky with visible clouds in order to automatically retrieve properties of clouds that are historically assessed by a human observer, for example, total cloud cover [13,31,34] or cloud types [17,35,36,37,38,39]. In our optical package, we used an all-sky fish-eye optical camera, Vivotek FE8171V [40,41]. One may examine its complete characteristics in the Data Sheet [41] or User's Manual [40]. Here, we emphasize its main properties in Table 1. The camera was operated by a software package developed in our Sea-Air Interactions Laboratory of the Shirshov Institute of Oceanology, Russian Academy of Sciences. The software runs on a personal computer collecting the imagery and concurrent data. The concurrent data were acquired by an extra mini-computer equipped with a GPS device and a positioning sensor (see the box under the camera in Figure 1b). The concurrent data include NMEA sentences from the GPS device, 50 Hz three-dimensional accelerometer measurements, 50 Hz three-dimensional gyroscope measurements, and additional service readings. The software on the operating personal computer requests imagery from the optical camera once during a 10-s period if the camera is horizontal according to the accelerometer. The communication of the operating personal computer with the camera and mini-computer was established using a high-speed TCP/IP connection over an Ethernet cable, which was also used to provide a power supply to both the camera and the mini-computer using PoE (Power over Ethernet) technology. SAILCOP includes two identical optical cameras, each equipped with its own extra mini-computer, GPS device, and positioning sensor. We mounted them apart from each other and measured the distance between them. The software on the operating personal computer requests imagery from both optical cameras simultaneously. This way, we always acquire two images of the same sky-dome taken from two different points 15 to 35 m apart, depending on the ship and mounting scheme.
Due to a common misconception that can be observed during various discussions about all-sky imagery, we describe here the sense of all-sky images. They are the hemispheric photographs of the upper visible hemisphere taken from the ground, from the sea surface, or from the board of a ship using a fish-eye optical camera directed upwards, or employing a hemispherical mirror with a narrow-angle camera [25] (see an example of an all-sky image in Figure 1c). In an all-sky image, one may usually observe the blue sky partially covered with (commonly) white clouds of various degrees of translucency. An all-sky camera is commonly used to assess cloud features; thus, it is usually mounted apart from large structures in order to prevent them from obscuring a substantial fraction of a visible sky-dome. In the case of marine expeditions, one cannot place the cameras far enough from high structures of the ship; thus, we use a mask covering the parts of the ship in an image (see black regions in Figure 1c).
Table 1. Key features of the Vivotek FE8171V fish-eye camera [41], the main component of our optical package, SAILCOP.
The source data we used in our study was the Dataset of All-sky Imagery over the Ocean (DASIO) [13], which we collected in marine expeditions starting from 2014 using the equipment we presented above. The regions covered in these missions include the Indian and Atlantic oceans, the Mediterranean Sea, and the Arctic Ocean. In this dataset, the exhaustive set of cloud types is present. DASIO contains over 1,500,000 images of the sky-dome over the ocean, accompanied by downward SW radiation flux measurements. SW solar flux was averaged over 10 s intervals, and the all-sky images were registered every 20 s. The viewing angle of the Kipp&Zonen CNR-1 sensors was 180° in both vertical planes. The viewing angle of the cloud-camera was similar. Photos taken from the fish-eye cloud-camera had a high enough resolution to resolve fine cloud structural details (1920 × 1920 px). The white balance and brightness of photos were adjusted automatically for the most comfortable visual experience.

In our study, we employed a subset of DASIO. The size of the training subset was more than 1,000,000, and the size of the test subset was more than 350,000 images (see Table 2). In other words, the ratio of the volumes of test and training subsets is 1:3. A particular sampling strategy was involved when we split the dataset into training and testing subsets. Since the period of image acquisition is 20 s, the visual scene of the sky-dome does not change substantially between subsequent images. Thus, two subsequent all-sky images are strongly correlated. In this case, subsequent images may be considered identical with small perturbations. Since training and testing subsets should not include identical examples, one needs to sample subsequent images in such a way that it prevents them from systematically falling into both the training and test sets. The issue of strongly correlated examples being massively included into training and testing subsets may arise in the case of random per-image sampling. In order to avoid this issue, we applied temporal block-folded sampling. To be precise, we applied random sampling using hours of observations instead of objects (images) themselves. In the ML approach, one also needs to split the dataset into training and testing subsets in a way that would preserve the statistical characteristics in both of them. In the case of the sampling strategy we exploited in our study, our training and testing subsets have the same statistical characteristics. In order to demonstrate this, we present the distributions of target value (SW flux) in Figure 2b for both training and testing subsets. One may clearly see that the distributions are close to each other. In Figure 3, we present the map of the missions that were included in the DASIO subset we employed in this study. One may observe that the tracks of the missions are not continuous, since we limited the set of examples based on local sun elevation; that is, we excluded the examples of the DASIO dataset with a sun elevation lower than 5°. Thus, during the nighttime, there were no data. In Table 3, we also provide a brief summary of the research missions contributing to the DASIO subset used in this study. Figure 1c also demonstrates a mask we applied to each photo, which filters out visual objects that are not related to the subject of our study. In addition, when training our ML models, we used only the data acquired during daylight hours. In particular, we selected the images taken when the sun altitude exceeded 5° and the radiation flux exceeded 5 W/m².
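The hour-wise sampling and the daylight filter described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names `daylight_mask` and `hourly_block_split` are hypothetical, and the use of scikit-learn's `GroupShuffleSplit` with hour identifiers as groups is an assumption about one way to implement the temporal block-folded sampling.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit


def daylight_mask(sun_elevation_deg, sw_flux, min_elevation=5.0, min_flux=5.0):
    """Keep examples with sun elevation > 5 degrees and SW flux > 5 W/m^2."""
    sun_elevation_deg = np.asarray(sun_elevation_deg, dtype=float)
    sw_flux = np.asarray(sw_flux, dtype=float)
    return (sun_elevation_deg > min_elevation) & (sw_flux > min_flux)


def hourly_block_split(timestamps, test_size=0.25, seed=0):
    """Split example indices so that every hour of observations falls
    entirely into either the training or the test subset."""
    timestamps = np.asarray(timestamps, dtype="datetime64[s]")
    # Truncate each timestamp to its hour; the integer hour id is the group.
    hour_id = timestamps.astype("datetime64[h]").astype(np.int64)
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    placeholder = np.zeros((len(hour_id), 1))  # features are not needed for splitting
    train_idx, test_idx = next(splitter.split(placeholder, groups=hour_id))
    return train_idx, test_idx, hour_id


# Synthetic example: one image every 20 s over 10 hours.
stamps = np.datetime64("2021-07-01T06:00:00") + np.arange(1800) * np.timedelta64(20, "s")
train_idx, test_idx, hours = hourly_block_split(stamps)
```

With `test_size=0.25`, about one quarter of the hours go to the test subset, which matches the 1:3 test-to-train ratio only approximately, since hours contain different numbers of valid images.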
We state the problem as follows: for each observation of the whole sky registered in an all-sky image, one needs to approximate the value of the short-wave radiation flux, which is supervised in the form of CNR-1 measurements. In terms of the machine learning (ML) approach, it is a regression task with a scalar target value. We used mean squared error as a loss function for the ML models exploited in this study. We also characterized the quality of the solutions using a mean absolute error (MAE) measure.

Inverse Target-Frequency (ITF) Re-Weighting of Training Subset
In the target value distribution (see Figure 4), one may notice a strong predominance of data points with low SW flux. Thus, the dataset is strongly imbalanced with respect to the target value. This kind of issue may cause reduced approximation quality [42,43]. In our study, we chose to exploit the approach of weighting the data space (following the terminology of [43]). In order to improve the approximation skills of our models, we balanced the training dataset using inverse-frequency re-weighting. We named it inverse target frequency (ITF) re-weighting. To be precise, we made the weights w_i of individual examples of the training dataset inversely proportional to the frequency of target values, where i enumerates inter-percentile intervals from 0-th to 99-th; d_i are the inter-percentile intervals of the empirical target value distribution, and N_p = 100 is the number of inter-percentile intervals. Here, the lower the target frequency, the greater the inter-percentile interval d_i; thus, the greater the weights w_i of the examples. In order to illustrate the approach we propose, we present the percentiles as vertical red lines in the cumulative distribution function (CDF) plot in Figure 4, so one may notice the uneven inter-percentile distances.
In addition to the ITF re-weighting, we also propose a scheme for controlling the re-weighting strength using the α coefficient. The closer α gets to 1, the stronger the re-weighting that is applied. In the case where α = 0, there is no re-weighting, meaning that every weight equals 1. Given the form of the weights w_i and of their α-controlled counterparts, one may notice that their expected value is exactly 1.0.
Coefficient α is a hyperparameter of our re-weighting scheme, which is optimized during the hyperparameter optimization stage.
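A minimal sketch of ITF re-weighting is given below. The explicit formulas in it are assumptions chosen only to satisfy the properties stated above: the base weight of interval i is taken as w_i = N_p · d_i / Σ_j d_j (proportional to the inter-percentile interval, with mean 1), and the α-controlled weight is taken as the linear blend α · w_i + (1 − α), which gives unit weights at α = 0 and keeps the expected value at 1. The function name `itf_weights` is hypothetical.

```python
import numpy as np


def itf_weights(target, n_intervals=100, alpha=1.0):
    """Per-example weights from inter-percentile interval widths (assumed form)."""
    target = np.asarray(target, dtype=float)
    # Percentile edges of the empirical target distribution: 0%, 1%, ..., 100%.
    edges = np.percentile(target, np.linspace(0.0, 100.0, n_intervals + 1))
    d = np.diff(edges)                 # inter-percentile intervals d_i
    w_interval = d / d.mean()          # assumed: w_i = N_p * d_i / sum_j d_j
    # Index of the inter-percentile interval each example falls into.
    idx = np.searchsorted(edges, target, side="right") - 1
    idx = np.clip(idx, 0, n_intervals - 1)
    w = w_interval[idx]
    # Assumed linear blend controlling the re-weighting strength.
    return alpha * w + (1.0 - alpha)


# Synthetic, strongly skewed target: low values are frequent, high values rare.
rng = np.random.default_rng(0)
flux = rng.exponential(scale=200.0, size=10_000)
weights = itf_weights(flux, alpha=1.0)
```

Such weights could then be passed as `sample_weight` to estimators that accept it; rare high-flux examples receive larger weights than frequent low-flux ones.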
In order to demonstrate the effect of ITF re-weighting, we present the resulting histogram in Figure 2a. In this histogram, we show the frequencies for inter-percentile ranges of target value (SW flux) scaled in accordance with the ITF re-weighting scheme. One may notice that the bars of the histogram have uneven widths. This is expected behavior, since we demonstrate the resulting distribution for the set of inter-percentile intervals that are uneven (see Figure 4b). One may clearly see that the effective frequencies of various inter-percentile ranges are close to each other. Thus, one may consider the dataset balanced with respect to the target value.

Feature Engineering
An arbitrary optical digital image may be considered an array of size W × H × C, where W and H are its width and height in pixels, and C is the number of channels; C = 3 for a regular RGB image. Here, RGB stands for the red, green and blue components of the color of a pixel in the RGB color model [44]. When composing the feature space for an image, we collected various statistics of each color channel (R, G, B), excluding masked pixels. A mask is the black-and-white binary picture obscuring constructions of a ship visible in an all-sky image. These constructions are irrelevant in our problem. One may observe an example of a mask (black part of the image) in Figure 1c. For each color channel of an all-sky image, we collected a set of statistics as real-valued features of the feature space. There are various color models [44], including one that is particularly useful in cloud detection when using optical imagery, namely the HSV color model. Here, H, S, and V stand for Hue, Saturation, and Value. The latter is strongly correlated with brightness and intensity calculated in other color models. Since these characteristics of pixels are useful in cloud detection, segmentation, and classification problems [32,45], we decided to include the same statistics of the HSV channels into the feature space as well. Additionally, since downward SW radiation flux is strongly dependent on the sun altitude [9], we included this feature into the feature space.
Using these statistics (27 in total, including 21 percentiles) computed for all of the six color channels (R, G, B, H, S, V), as well as sun elevation, we engineered a 163-dimensional (6 × 27 + 1) real-valued feature space for all-sky images in our study. The feature engineering step was only performed when we employed classic machine learning models (see Section 3.2).
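The construction of the 163-dimensional feature vector can be sketched as follows. The concrete choice of statistics here is an assumption made only to reproduce the stated count of 27 per channel: six summary statistics (mean, standard deviation, minimum, maximum, skewness, excess kurtosis) and 21 evenly spaced interior percentiles. The function names are hypothetical, and the six channels are assumed to be already stacked into one array (the HSV channels could be obtained, for example, with OpenCV's `cv2.cvtColor`).

```python
import numpy as np

# Assumed percentile levels: 21 evenly spaced interior points of [0, 100].
PERCENTILES = np.linspace(0.0, 100.0, 23)[1:-1]


def channel_stats(values):
    """27 statistics of one channel computed over unmasked pixels (assumed set)."""
    mean = values.mean()
    std = values.std()
    z = (values - mean) / std if std > 0 else np.zeros_like(values)
    summary = [mean, std, values.min(), values.max(),
               (z ** 3).mean(),          # skewness
               (z ** 4).mean() - 3.0]    # excess kurtosis
    return np.concatenate([summary, np.percentile(values, PERCENTILES)])


def build_features(channels, mask, sun_elevation_deg):
    """channels: (H, W, 6) float array stacking R, G, B, H, S, V;
    mask: (H, W) boolean array, True where the pixel shows the sky."""
    per_channel = [channel_stats(channels[..., k][mask])
                   for k in range(channels.shape[-1])]
    return np.concatenate(per_channel + [np.array([sun_elevation_deg], dtype=float)])


# Synthetic example: random six-channel image with the lower rows masked out.
rng = np.random.default_rng(0)
image = rng.random((64, 64, 6))
sky = np.ones((64, 64), dtype=bool)
sky[48:, :] = False
features = build_features(image, sky, sun_elevation_deg=30.0)  # shape (163,)
```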

Machine Learning Methods
In our study, we used two approaches: the classic approach, and the so-called end-to-end approach with the convolutional neural network employed.

Classic Models
Within the classic approach, we examined the following ML models: multilinear regression and non-parametric ensemble models, that is, Random Forests (RF) [46] and Gradient Boosting (GB) [47][48][49]. Training and inference of the classic ML models in our study were performed using scikit-learn [50] implementations of these models.
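A minimal sketch of fitting the three classic models with scikit-learn is shown below. The hyperparameter values, the helper names, and the synthetic data are illustrative assumptions; in the study, hyperparameters were selected during a separate optimization stage. The optional `sample_weight` argument is where per-example weights such as ITF weights could be supplied.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression


def fit_classic_models(X_train, y_train, sample_weight=None, seed=0):
    """Fit multilinear regression, Random Forest and Gradient Boosting."""
    models = {
        "multilinear": LinearRegression(),
        # Illustrative (not tuned) ensemble sizes and tree depths.
        "random_forest": RandomForestRegressor(
            n_estimators=50, max_depth=8, random_state=seed),
        "gradient_boosting": GradientBoostingRegressor(
            n_estimators=50, max_depth=3, random_state=seed),
    }
    for model in models.values():
        model.fit(X_train, y_train, sample_weight=sample_weight)
    return models


def rmse(y_true, y_pred):
    """Root mean squared error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Synthetic stand-in for the engineered feature space.
rng = np.random.default_rng(0)
X = rng.random((300, 10))
y = 500.0 * X[:, 0] + 50.0 * rng.standard_normal(300)
fitted = fit_classic_models(X[:200], y[:200])
scores = {name: rmse(y[200:], m.predict(X[200:])) for name, m in fitted.items()}
```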

Convolutional Neural Network
Within the end-to-end approach, we did not compute any of the expert-designed features described in Section 3.1. In contrast, we applied a convolutional neural network (CNN) [51] directly to the images.
Prior to the processing of an image by our CNN, we preprocessed the image. First, we resized the image to 512 × 512 px using the "nearest neighbor" interpolation method. Then, we applied strong alterations of average brightness. We altered the brightness in order to encourage the CNN to learn the dependency of SW flux on the cloud spatial structure, rather than average brightness, average blue saturation, or other simple statistics of the image. We also added spatially correlated Gaussian noise to each image in order to prevent the CNN from learning the dependency of SW flux on simple aggregated statistics of the channels (e.g., mean, variance). These augmentations are also meant to increase the generalization ability of our CNN. Within this end-to-end neural-network-based approach, we used the sun altitude feature, as we did in the case of classic machine learning models.
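The preprocessing steps described above can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than the implementation used in the study: the brightness range, the noise amplitude, and the way spatial correlation is produced (Gaussian noise drawn on a coarse grid and repeated over blocks of pixels) are all hypothetical choices, as are the function names.

```python
import numpy as np


def resize_nearest(image, size=512):
    """Nearest-neighbor resize of an (H, W, C) image to (size, size, C)."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return image[rows[:, None], cols[None, :]]


def augment(image, rng, brightness_range=(0.5, 1.5), noise_std=0.05, cell=16):
    """Random brightness change plus spatially correlated Gaussian noise.

    image: float array (H, W, C) in [0, 1]; H and W must be divisible by `cell`.
    """
    h, w, c = image.shape
    gain = rng.uniform(*brightness_range)            # global brightness factor
    # Noise is constant within each cell x cell block, hence spatially correlated.
    coarse = rng.normal(0.0, noise_std, size=(h // cell, w // cell, c))
    noise = np.repeat(np.repeat(coarse, cell, axis=0), cell, axis=1)
    return np.clip(image * gain + noise, 0.0, 1.0)


# Synthetic example standing in for a 1920 x 1920 all-sky photo.
rng = np.random.default_rng(0)
photo = rng.random((1920, 1920, 3))
small = resize_nearest(photo, size=512)
augmented = augment(small, rng)
```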
The structure of the CNN exploited in our study is shown in Figure 5. As one may see in this figure, the input example is an all-sky RGB image resized to the resolution of 512 × 512 px. In order to speed up the training process and improve the quality of the approximation, we employed the transfer learning approach [52]. That is, a version of the ResNet50 [53] network pre-trained on the ImageNet [54] dataset was used. The output of the ResNet50 convolutional sub-network is a 2048-dimensional vector. We concatenated the sun altitude to this vector; thus, the resulting vector is 2049-dimensional. This 2049-dimensional vector is then processed by a fully connected sub-network. The structure of this sub-network is presented in Figure 5. The output of this sub-network is a real scalar value approximating SW flux.
When training our CNN, we used the Adam stochastic optimization algorithm [55]. Training and inference of our CNN were implemented in the Python programming language [56] using PyTorch [57], OpenCV [58] for Python, and other high-level computational libraries for Python.
Both ensemble models (RF and GB) exploited in our study have hyperparameters besides the α re-weighting coefficient we presented above. Among them are the number of ensemble members in RF and GB, the maximum depth of the trees of the ensemble, and so forth. The CNN is also characterized by a number of hyperparameters: its depth, the width of fully connected layers in the fully connected sub-network, the hyperparameters of the Adam optimization procedure, and also the magnitude of data augmentation transformations. We employed the Optuna framework [59] for hyperparameter optimization (HPO). During the HPO stage, the quality of each model initialized with a sampled hyperparameter set is assessed within the K-fold cross-validation (CV) approach with K = 5. Due to strongly correlated examples (all-sky images) that are close in the temporal domain, we ensured the independence of training and validation CV subsets using a Group K-fold cross-validation approach where groups are hourly subsets of all-sky images. In the case of RF and GB models, we assessed the mean RMSE measure, as well as its uncertainty, within the Group K-fold CV approach.
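The Group K-fold evaluation with hourly groups can be sketched with scikit-learn as follows. The helper name `group_cv_rmse`, the model settings, and the synthetic data are assumptions for illustration; in an HPO loop, a function of this kind would be called once per sampled hyperparameter set, and the fold-wise standard deviation is used here as one simple uncertainty measure.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold, cross_val_score


def group_cv_rmse(model, X, y, groups, n_splits=5):
    """Mean and standard deviation of RMSE over Group K-fold CV folds."""
    cv = GroupKFold(n_splits=n_splits)
    scores = cross_val_score(
        model, X, y, groups=groups, cv=cv,
        scoring="neg_root_mean_squared_error",
    )
    fold_rmse = -scores                      # scikit-learn returns negated RMSE
    return float(fold_rmse.mean()), float(fold_rmse.std(ddof=1))


# Synthetic example: 20 "hours" with 10 images each.
rng = np.random.default_rng(0)
X = rng.random((200, 5))
y = 400.0 * X[:, 0] + 20.0 * rng.standard_normal(200)
hour_groups = np.repeat(np.arange(20), 10)
mean_rmse, std_rmse = group_cv_rmse(
    RandomForestRegressor(n_estimators=30, random_state=0), X, y, hour_groups)
```

Because all examples of one hour share a group label, no hour contributes images to both the training and the validation part of any fold.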

Results
In this section, we present the results of our study. To assess the quality of our models, we used the root mean square error (RMSE) measure. In order to estimate the uncertainty of the quality measures, we trained and evaluated each model several times (typically, 5-7) and estimated the 95% confidence interval, assuming the RMSE is a normally distributed random variable. Additionally, the visual representation of the results is given in the form of value mapping diagrams (Figure 6), where the correspondence between approximated and measured flux values is presented in the form of point density. In Figure 7, we present the error histograms for the models involved in our study.
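A confidence interval of this kind can be computed from a handful of repeated runs as sketched below. The use of Student's t quantile (appropriate for a normal variable with unknown variance and a small number of runs) is an assumption, as the exact estimator is not specified in the text; the function name is hypothetical and the input values are placeholders, not results of the study.

```python
import numpy as np
from scipy import stats


def rmse_confidence_interval(rmse_runs, confidence=0.95):
    """Mean RMSE and half-width of its confidence interval over repeated runs."""
    runs = np.asarray(rmse_runs, dtype=float)
    n = runs.size
    mean = runs.mean()
    sem = runs.std(ddof=1) / np.sqrt(n)       # standard error of the mean
    t_quantile = stats.t.ppf(0.5 + confidence / 2.0, df=n - 1)
    return float(mean), float(t_quantile * sem)


# Placeholder values standing in for the RMSE of five independent training runs.
mean_rmse, half_width = rmse_confidence_interval([1.0, 2.0, 3.0, 4.0, 5.0])
```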
In Figures 6 and 8, one may see that the models generally underestimate high fluxes and overestimate low fluxes. It is also clear that the multilinear model approximates the flux worse than other models, which is supported by the RMSE measures in Table 4 and quantile-quantile plots in Figure 8. The results of the CNN are the best among all models in terms of formal RMSE measures, as well as approximated-to-measured value-mapping diagrams. In our study, we built and trained four ML models to approximate the downward short-wave radiation flux. We found that the quality of the CNN, which was built within the end-to-end approach, is the best compared to other ML models. As we mentioned in Section 1, there are no previously published papers demonstrating any methods for approximating downward SW flux using all-sky imagery. Thus, the only approaches we may compare with are the ones that propose estimating downward SW flux using complementary data (e.g., geoposition, date and time, properties of clouds), also known as parameterizations. In this study, we compared the quality of our models with existing SW radiation parameterizations known from the literature [9,10] and existing algorithms based on machine learning for estimating downward SW flux using remote sensing (MODIS) imagery [3]. In Table 4, we present the quality of our models assessed after the hyperparameter optimization. We also provide RMSE estimates of the parameterizations [9,10] and an ML-based algorithm applied to MODIS imagery [3] as a reference. One may observe that parameterization errors strongly depend on the amount of cloudiness: the higher the total cloud cover (TCC), the higher the parameterization error. We have provided the error range in brackets for parameterizations known from the literature.
In Figure 7, we also demonstrate error distributions for each of the ML models of our study. In the CNN error distribution (Figure 7d), one may see that the neural network is prone to underestimate the SW flux slightly. Additionally, it is clear that error distribution tails are pretty heavy for both the RF and GB models, and are light for the CNN. These features of error distribution for our models are also in agreement with the variance of errors that have been presented in Table 4 in the form of RMSE (taking into account that the errors are zero-centered; thus, RMSE is the square root of variance in this case).
Table 4. Quality metrics of ML models exploited in this study, parameterizations of SW radiation known from the literature [9,10], and an algorithm based on machine learning for estimating downward SW flux using remote sensing (MODIS) imagery [3]. The best model along with its quality metric is highlighted using bold font.

Discussion
One may observe that the ITF re-weighting did not make any difference in terms of the RMSE quality measure. Neither Random Forests nor Gradient Boosting for Regression models demonstrated any performance improvement due to ITF re-weighting. It is a common belief in the machine learning community that in order to improve the performance of an ML model in a problem characterized by a strongly imbalanced dataset, one needs to re-weight it, bringing the distribution of target variables close to a uniform distribution. Alternatively, one needs to apply a sampling strategy that has an equivalent effect in the case of mini-batch training, such as when training artificial neural networks. In this study, we applied proper re-weighting that brings the effective distribution of downward SW flux to a uniform distribution. We present here the results of machine learning models with ITF re-weighting applied in order to demonstrate a perfect case of a strongly imbalanced dataset where a proper hyperparameter-optimized re-weighting does not improve ML models' performance.
One may also note that the models we present demonstrate some issues. Multilinear regression is a fast model; however, it has the worst quality. RF and GBR demonstrate comparable quality and are relatively fast in their inference times. At the same time, one may note non-smooth error distribution in diagrams in Figure 6b-e. We suppose that the regular drops in point density may be explained by the decision-tree-based nature of these two ensemble models. One may also notice the outliers in these diagrams that may be of interest in forthcoming studies. In this study, we did not filter the outliers comprehensively; thus, there may be irrelevant examples in the dataset that represent photographs of birds, operators cleaning the glass dome of SAILCOP cameras, and so forth.
There are limitations to the approach we used in our study for approximating downward SW flux from all-sky RGB optical imagery. We found that our CNN is capable of approximating SW flux by relying on the spatial structure of clouds present in an all-sky image. We even encouraged our CNN to learn this link by applying heavy image augmentations described in Section 3.2. However, in the presence of fog or haze, it is most probable that most clouds will be present in a corresponding all-sky image; thus, the method exploiting our CNN may deliver SW flux estimates with certain errors. The degree of uncertainty imposed by particular meteorological conditions, including the presence of fog, haze, and strong aerosol pollution, is to be assessed in forthcoming studies.

Conclusions
In this study, we presented an approach for the approximation of short-wave solar radiation flux over the ocean from all-sky optical imagery using state-of-the-art machine learning algorithms, including multilinear regression, Random Forest, Gradient Boosting, and Convolutional Neural Networks. We trained our models using the data of the DASIO dataset [13]. We assessed the quality of our models in terms of root mean squared error (RMSE), approximated versus measured flux diagrams, error histograms, and quantile-quantile plots. The results allowed us to conclude that one may estimate downward SW radiation flux directly from all-sky imagery, taking some well-known uncertainty into account. We also demonstrated that our CNN trained with strong data augmentations is capable of estimating downward SW radiation flux, mostly based on the visible structure of clouds. At the same time, the CNN has been shown to be superior in terms of flux RMSE compared to other ML models in our study.
Our method of flux estimation may be especially useful in the tasks of low-cost monitoring of downward SW flux. From a practical point of view, one may use a low-cost all-sky camera instead of a high-grade radiometer in order to assess the radiative regime of a region. In our study, we demonstrated that a low-cost optical package accompanied by a trained ML algorithm may provide SW flux estimates of reasonable quality. These estimates may be useful for planning the positions of solar power plants, predicting power plant generation, and so forth.
In our study, we demonstrated that the SW flux may be estimated by an ML model with a reasonable quality using all-sky imagery and sun elevation only. At the same time, there are a number of studies presenting the methods for retrieving various cloud properties from all-sky images [13,14,16,25,37-39]. Thus, one may use these methods for assessing the properties of clouds and downward SW radiation based on an all-sky image, and hence, train an ML model linking the properties of clouds to SW radiation flux. One may also assess the same cloud properties from atmospheric models. Thus, there is a way to use an ML model to estimate downward SW flux based on modeled atmospheric data containing characteristics of clouds. This method of estimating SW flux in an atmospheric model may significantly reduce the computational load of its radiation subroutine.
Our results suggest that there are outliers in the DASIO dataset that may be filtered in forthcoming studies. The results also suggest that hyperparameter optimization of our

Figure 1. Equipment we used to collect the data, and an example of all-sky optical imagery over the ocean: (a) radiometer Kipp and Zonen CNR-1; (b) cloud-camera SAILCOP [13]; (c) an all-sky photo with its mask covering the structures of the ship.

Figure 2. (a) Effective distribution of target values (SW radiation flux) in the train subset as a result of ITF re-weighting. One may clearly see that re-weighted frequencies of various ranges are close to each other (bars have almost the same height); thus, one may consider the dataset balanced. (b) Distribution of target value (SW radiation flux) in training and test subsets without ITF re-weighting. One may clearly see that the distributions are close.
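The balancing effect described in this caption can be reproduced with a minimal sketch of inverse target frequency weighting. The equal-width binning, the number of bins, the normalization to unit mean weight, and the function name `itf_weights` are assumptions made for illustration; the exact ITF procedure of our study may differ in these details.

```python
import numpy as np


def itf_weights(targets, n_bins=20):
    """Per-sample weights inversely proportional to the target-bin frequency.

    Equal-width bins and n_bins=20 are illustrative assumptions.
    """
    targets = np.asarray(targets, dtype=float)
    edges = np.linspace(targets.min(), targets.max(), n_bins + 1)
    # Using only the interior edges yields bin indices in 0 .. n_bins - 1.
    bin_idx = np.digitize(targets, edges[1:-1])
    counts = np.bincount(bin_idx, minlength=n_bins)
    # Every sample falls into a non-empty bin, so the division is safe.
    weights = 1.0 / counts[bin_idx]
    # Normalize so that the mean weight equals one (an assumed convention).
    return weights / weights.mean()


if __name__ == "__main__":
    # Synthetic, strongly imbalanced targets (hypothetical, in W/m^2).
    rng = np.random.default_rng(0)
    flux = np.concatenate([rng.uniform(5.0, 200.0, 900),
                           rng.uniform(200.0, 1000.0, 100)])
    weights = itf_weights(flux, n_bins=10)
    # The weighted histogram is flat over the non-empty bins.
    effective, _ = np.histogram(flux, bins=10, weights=weights)
    print(np.round(effective, 2))
```

Weights of this kind can be passed to a training routine that accepts per-sample weights; the `sample_weight` argument of the `fit` methods of scikit-learn regressors is one such interface.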

Figure 3. The map of marine missions contributing to the subset of the DASIO collection used in this study. The points represent the positions of a ship each hour during the corresponding expedition. The tracks are discontinuous due to the sampling strategy of our study: the images were not taken during the nighttime (when the sun's altitude is lower than 5°).

Figure 4. Dataset target (SW radiation flux) distribution (histogram) and approximated cumulative distribution function (CDF, right panel). One may clearly see the imbalance of the dataset with respect to the target value. We present the percentiles in the CDF figure using vertical red lines in order to demonstrate the inter-percentile distances.

Figure 5. Architecture of the CNN we exploited in our study. The numbers denote the shapes of input data or activation maps.

Figure 7.
Figure 6. Value mapping diagrams for: (a) Multilinear Regression as a baseline; (b) Random Forests without ITF re-weighting; (c) Random Forests using ITF re-weighting of the train subset; (d) Gradient Boosting for Regression without ITF re-weighting; (e) Gradient Boosting for Regression using ITF re-weighting of the train subset; (f) convolutional neural network (no re-weighting). Here, density colormaps are logarithmic for presentation purposes. Each diagram has been provided with a diagonal dashed line representing an ideal model approximating SW flux without any errors.

Figure 8. Quantile-quantile plots for: (a) Multilinear Regression as a baseline; (b) Random Forests without ITF re-weighting; (c) Random Forests using ITF re-weighting of the train subset; (d) Gradient Boosting for Regression without ITF re-weighting; (e) Gradient Boosting for Regression using ITF re-weighting of the train subset; (f) convolutional neural network (no re-weighting). Each diagram has been provided with a diagonal dashed line representing an ideal model mapping the distributions without any errors.

Table 2. Quantitative summary of the dataset in our study.

Table 3. Scientific missions resulting in the DASIO collection of all-sky imagery over the ocean with the corresponding expert records of meteorological parameters.