The State-of-the-Art Progress in Cloud Detection, Identification, and Tracking Approaches: A Systematic Review

A cloud is a visible mass of water droplets or ice crystals suspended in the atmosphere. It is visible from the ground and can remain at a variable height for some time. Clouds are very important because their interaction with the rest of the atmosphere has a decisive influence on weather, for instance by occluding sunlight or by bringing rain. Weather denotes the behavior of the atmosphere and is a determining factor in several human activities, such as agriculture or energy capture. Cloud detection is therefore an important process, for which several methods have been investigated and published in the literature. The aim of this paper is to review some of these proposals. The papers analyzed and discussed can, in general, be classified into three types. The first is devoted to the analysis and explanation of clouds and their types, and to existing imaging systems. The second deals with cloud detection, for which diverse methods have been analyzed, i.e., those based on satellite images and those based on images from cameras located on Earth. The last part is devoted to cloud forecast and tracking. Cloud detection from both imaging systems relies on thresholding techniques and a few machine-learning algorithms. To compute the cloud motion vectors for cloud tracking, correlation-based methods are commonly used. A few machine-learning methods for cloud tracking are also available in the literature and are discussed in this paper too.


Introduction
Prediction of atmospheric conditions at a particular location and time is referred to as weather forecasting. Different locations have different weather conditions, and when these are known in advance, appropriate actions can be taken. For example, weather forecasting helps farmers plan their farming operations, and citizens receive warnings related to storms, heavy rainfall, and floods [1]. Clouds also play a vital role in climate change. Clouds cool the Earth by reflecting solar energy back into space, and they warm the planet by absorbing longwave terrestrial radiation. This phenomenon is studied with the help of satellite images of clouds. Weather and clouds are interrelated; hence, the study of clouds is an important element of weather forecasting. Forecasts of solar irradiance, wind speed, and rainfall for a period greater than a few hours are generally carried out with the help of Numerical Weather Prediction (NWP) models. These models are based on information about observation sources and their quality, atmospheric motion, model parameters, initial and boundary conditions, and the validation and verification process [2]. Various atmospheric parameters such as wind, rainfall, solar irradiance, temperature, and pressure are predicted with the help of NWP models. NWP predictions may contain errors if precise model parameters or initial conditions are not available.
Cloud movement prediction has one more dimension, which is related to power generation using photovoltaic (PV) systems. The main reason for fluctuations in the PV output is the movement of clouds. Weather forecasting agencies provide information about the global coverage of clouds for an entire day. However, if clouds obscure the solar panels even for a few minutes, the PV output is affected substantially. Tracking cloud movement for solar irradiance prediction is therefore an important area.
Cloud tracking and the acquisition of related information are easily possible with the help of a high-resolution radar system; however, as far as PV power generation is concerned, this does not seem a viable and economical solution. Therefore, to avoid fluctuations in PV systems, there is a strong need for cloud tracking systems that rely neither on satellite images nor on radar.
Depending on the forecast time scale, the approach for solar irradiance prediction differs [3]. The temporal horizon for solar irradiance prediction can be 6 h to several days ahead, a couple of hours ahead (intra-day), or within the hour (intra-hour). NWP models are generally used for longer forecast times, i.e., up to several days ahead. For longer forecast times, satellite images are generally used as they cover a larger geographical area. The spatial resolution of NWP is about 100 km². The expected solar irradiance over such a large spatial scale can be predicted by the NWP models. Currently, in the case of the Geostationary Operational Environmental Satellite (GOES), accurate forecasts up to 6 h ahead are available at a spatial resolution of 1 km². The spatial and temporal resolution of NWP is coarse; hence, NWP becomes inadequate when higher temporal resolution is needed. For higher temporal resolution, i.e., predictions over a couple of hours or intra-hour prediction, sky images captured by a ground-based camera system are used instead of satellite images. Figure 1 shows the preferred prediction methods as a function of forecast time.
Various methods [4][5][6] are being used by researchers for weather prediction and solar irradiance prediction; however, in this study, we report these applications in the context of cloud images. Our main objective is to provide a systematic review of cloud detection, analysis, and cloud tracking methods for various applications related to cloud data. We have considered cloud images captured by satellite and ground-based camera systems. The structure of this paper is as follows: Section 2 reviews clouds and imaging systems, including details of the types of clouds and of satellite and total-sky imaging systems. Several cloud detection techniques are then reviewed in detail in Section 3.
Section 4 is dedicated to a systematic review of cloud forecast and tracking methods. Finally, the paper is concluded in Section 5.

Types of Clouds
Clouds have always been considered to be a sign of the weather. Therefore, if we want to predict the weather, we must understand the effect of clouds on weather change. Weather change and cloud movements are interrelated: variations in clouds affect the weather and vice versa. The formation, evolution, and motion of clouds are determined by cloud dynamics. Even small clouds move around due to the wind. Variations in the characteristics of cloud dynamics for different meteorological situations lead to different types of clouds, and different types of clouds are responsible for different weather conditions.
Clouds are the main factors in weather variations. Different types of clouds imply different weather conditions (e.g., clear sky, cloudy, windy, rainy). Different names are given to clouds depending on their shape, size, and presence in the troposphere. Some clouds are found near the troposphere, while a few of them are near the Earth's surface. Therefore, depending on the distance of the cloud to the ground, the clouds are categorized as high-level, mid-level, and low-level clouds. One more classification of clouds is also available based on the characteristics of the clouds such as puffy, multilayer, convective clouds, etc.
High-level clouds are formed at a height of about 16,000-43,000 feet above sea level. They mostly contain ice crystals. These clouds are found on the top of a mountain or at an altitude of a jet airplane. High-level clouds do not block the sunlight hence are not considered for analysis of solar irradiance. Cirrus clouds are thin and scattered clouds and are categorized under high-level ones. These clouds do not block the sunlight. The formation of middle-level clouds occurs at a height of 6500 to 23,000 feet above sea level. Cumulus clouds are scattered, puffy and they appear at mid-level. Mid-level clouds contain water and sometimes ice. These clouds sometimes block sunlight. Low-level clouds are formed below the height of 6500 feet. They mostly contain water but can also contain ice crystals. Most of the sky is covered by these bluish-gray clouds and they block the sunlight and can provoke precipitation and wind.
Apart from the above-mentioned clouds, there are a few events that occur in a combination of clouds and result in thunderstorms and large precipitations for a longer duration. One of these types is a Mesoscale Convective System (MCS), where several thunderstorms are merged and act as a single system. An MCS covers a large area of almost hundreds of kilometers and normally lasts for more than 12 h. Expansive stratiform regions and active convective towers in MCSs differentiate it from isolated thunderstorms. The MCSs can occur in temperate latitudes and tropical ones.

Satellite Imaging
A non-invasive technique of measuring and collecting information about an object is known as remote sensing. Remote sensing is a practical solution for monitoring the Earth's surface for various weather forecasting applications. With the development of remote sensing imaging technology, the quality of satellite cloud images has improved. The information in these images is required for weather forecasting and for the prediction of severe climate conditions. Similar to a digital camera, a multispectral sensor in a satellite imager uses charge-coupled device (CCD) sensor arrays to record electromagnetic energy. However, satellite images are not limited to the three visible wavebands (red, green, and blue); instead, they record data from various bands over a broader range of wavelengths. Hence, satellite images are referred to as multispectral images. Most satellite sensors operate in the visible and near-infrared (NIR) regions, while some have shortwave infrared (SWIR) and thermal infrared (TIR) capabilities. Satellite images used for cloud detection generally use visible and TIR band data.
A visible channel of satellite imageries captures images by collecting sunlight reflections from the clouds; hence they are captured only during the daytime. These images represent clouds by white, ground by gray, and water by a dark color. With the help of looping pictures, clouds and snow are differentiated. The requirement of day and nighttime information is fulfilled by infrared imagery of a satellite. The infrared sensors on the satellite identify the clouds based on the heat radiations from the clouds. These sensors compare the heat radiations from the Earth's surface and the clouds. Clouds are comparatively colder than the land area and water on the Earth's surface and hence easily identified.
In recent years, researchers have used images from the following satellites for cloud detection and tracking: the Geostationary Operational Environmental Satellites (GOES); Kalpana from the Indian Space Research Organization (ISRO); MODIS; the geostationary meteorological satellite Meteosat-10 of the Meteosat Second Generation (MSG) series; and Himawari 8. These satellites capture and observe cloud behavior from far above the Earth's surface. The data collected from the various satellites, the areas or locations covered, and other related information are presented in the remainder of this section.
In [7], cloud tracking experiments were conducted on Hurricane Luis, which formed as a tropical depression on 28 August 1995. The images were captured by GOES-8 and GOES-9 satellites. However, in [8], GOES-12 imager's data captured from the central and eastern CONUS and adjacent oceanic regions are used. The dataset used in [7] consists of the GOES-M weather satellite images for 2008 and the GOES-N weather satellite images for the years 2011 and 2012. These years were selected because, in these years, the U.S. experienced more severe thunderstorm activities than it does in a typical year.
For the study in [12], high-definition images obtained from INSAT-3D satellites were used. The dataset consisted of images of the heavy rain in Mumbai, India, on 28-29 August 2017, and cyclonic data sets for 16 May 2018, 19-20 September 2018, and 14 October 2018.
The Meteosat Second Generation (MSG) satellite takes pictures of the Earth every 15 min in 11 spectral channels with a 3 km resolution. In [13], these satellite images were used along with images taken every minute by a sky camera (TSI-880). In [14], the MSG satellite images were used on their own. Zaher and Ghenam [15] also used images taken by MSG satellites along with images from a ground-based camera system. Leese et al. [16] and Smith and Phillips [17] used images taken by the Applications Technology Satellites (ATS-I and ATS-III). For the experiments in [18,19], Himawari-8 IR enhanced satellite images were used; these are provided by the Indonesian Agency for Meteorology, Climatology, and Geophysics and updated every hour. The images provide information about the clouds over Indonesia, categorized by their temperature and height. In [20], whirlwind incident data from Indonesia in 2016 and cloud image data from the High-resolution Cloud Analysis Information (HCAI) satellite were used. These data display cloud appearance over Indonesia by type.

Total-Sky Imager
For short-term (intra-hour) prediction and tracking, the spatial and temporal resolution offered by satellite images is inadequate. The forecasting gap of satellite imagery is filled by sky imagers mounted on the ground. The Total-Sky Imager (TSI) [21] by Yankee Environmental Systems is a complete solution for a ground-based sky imaging system. The TSI is an automatic sky imager widely used for real-time cloud image acquisition, processing, and display. It is based on solid-state CCD technology; it captures RGB images and stores them in JPEG file format. Its onboard processor computes the cloud coverage of the sky image captured by the onboard camera. A sample TSI image is shown in Figure 2a in [21]. In the TSI, a CCD imager faces downward onto a rotating hemispherical mirror. To protect the imager optics from intense direct normal sunlight, a shadow band is placed on the hemispherical mirror. This shadow band blocks the sunlight, but it appears on the captured image as a thick black band, as shown in Figure 2b. The TSI system has some built-in preprocessing algorithms to calculate fractional cloud cover. Apart from built-in solutions such as the TSI, some researchers use fisheye lens cameras to build a ground-based sky imaging system. Details of various ground-based imagers, their locations, and other related information are presented below.
In [22], the data were collected from the National Renewable Energy Laboratory (NREL) website [23]. The images were captured by a TSI located at latitude 39.742 degrees and longitude 105.18 degrees. For the cloud tracking experiments in [24][25][26], the data were collected using TSI instruments. In [24,26], the TSI-880 instrument was used to collect the data at an interval of 1 min. The TSI used in [26] is a Yankee Environmental Systems TSI-880 instrument located at the University of California Merced's solar observatory station. In [27], the data were taken from the cirrus cloud dataset, which consists of 181 images (total-sky images and common camera images). The total-sky images were acquired by a ground-based total-sky cloud imager (TCI) [27].
In [26,28], the data were measured with an Eppley Normal Incidence Pyrheliometer (NIP) mounted on an Eppley SMT tracker at the University of California Merced's solar observatory, at a sampling interval of 30 s. For the study in [28], however, 1-min averaged Direct Normal Irradiance (DNI) values based on the 30 s measurements were used. Images in [26] were taken at 1 min intervals at the same station using Yankee Environmental Systems' TSI-880 instrument. In [29], an all-sky imager developed by GFAT (Atmospheric Physics Group of the University of Granada) was used to collect data for cloud detection. The images were taken at an interval of 5 min. The instrument consists of a modified scientific CCD camera with a modified lens that provides a field of view (FOV) of 185 degrees. It provides full-color images with three channels, centered on red, blue, and green, respectively. In [30], the camera used is an Axis M1114 network camera. Images were captured every 10 s for 10 h using a wide-angle lens. Unlike [29], a two-camera setup is used in [30]. In [31], the main equipment consists of a digital camera with a fisheye lens with FOV > 180 degrees. The study was performed on two datasets. One is the Kiel dataset provided by Kiel University, Germany, in which the images are taken every 15 s. The second is the IapCAS dataset, in which the camera is set to capture images every 10 s. In [32], the dataset consists of data collected every 5 s over 15 days. The sky images were acquired with a 5 MP C-mount camera equipped with a fisheye lens (FOV: 182 degrees). The dataset used in [33] for cloud motion prediction consists of images captured every 30 s by the UC San Diego Sky Imager (USI or UCSD imager), as studied by Yang et al. (2014) [34].
In [35], the datasets used for solar irradiance forecasting were collected during the High Definition of Clouds and Precipitation for advancing Climate Prediction (HD(CP)2) measurement campaign HOPE in 2013. A sky imager developed at the GEOMAR Helmholtz Centre for Ocean Research [36] was used for continuous sky observations. It was part of the LACROS supersite within the HOPE measurement campaign [37]. The digital CCD camera was equipped with a fisheye lens (FOV: 183 degrees) and had a sampling interval of 15 s.
In [38], a sky imager consisting of a 4 MP color camera equipped with a fisheye lens was used to capture images every 20 s. In [39], a whole-sky imager called WAHRSIS (Wide-Angle High-Resolution Sky imaging system) was used. It consisted of a digital single-lens reflex (DSLR) camera with a fisheye lens controlled by an onboard computer. The images were taken every 2 min.
In [40], the image data were recorded by an automatic sky imaging system. It consists of two sky imagers in true-color image format recorded at 1-minute intervals. In [41], the instruments used consisted of a camera with a resolution of 1936 × 1296, and an irradiance sensor to collect ground truth simultaneously at a frequency of one sample per second.
In [42], ground-based sky images were taken using a regular setup consisting of a camera with a fisheye lens. The images were taken at an interval of 10 s. In [43], sky images were obtained at regular intervals of 5 s with a Nikon D60 camera. Similarly, in [44], ground-based sky images were used. In [45], the sky imager is designed around a Raspberry Pi single-board computer (SBC), with a fully programmable high-resolution Pi camera housed in a durable all-weather enclosure.

Cloud Detection
Cloud images captured by different imageries represent raw data that requires preprocessing for various weather forecasting applications such as a study of the Earth's surface, weather forecast, or solar irradiance prediction. In the case of preprocessing the important step is cloud detection and separation of cloud from its background. In some applications, it is also essential to identify the type of cloud along with cloud detection. However, in some cases only a binary decision such as whether it is a cloud or not is enough. Depending on the application, the imagery for cloud images changes and based on the type of the image the detection techniques will also change. We have categorized the cloud detection methods based on the satellite images and ground-based camera system.

Cloud Detection from Satellite Images
Similar to a digital camera, a multispectral sensor in a satellite imager uses CCD sensor arrays to record electromagnetic (EM) energy. However, satellite images are not limited to the three visible wavebands (red, green, and blue); instead, they record data from various bands over a broader range of wavelengths. Hence, satellite images are referred to as multispectral images. Most satellite sensors operate in the visible and NIR regions, while some have SWIR and TIR capabilities. Satellite images used for cloud detection generally use visible and TIR band data.
Satellite images are generally captured by the satellite and then sent to ground-based data centers for processing and analysis. Because of the high computational cost involved, the images are usually not processed or analyzed onboard. Srivastava et al. [46] proposed a kernel approach for an onboard clustering algorithm. They designed a kernel method for onboard cloud detection using MODIS satellite images. Once a suitable kernel is designed, it can be applied to both clustering and classification tasks. They used unlabeled MODIS data for the kernel clustering algorithm. The objective of the kernel clustering algorithm is to find a membership function and centroids that minimize the cost function. Srivastava [46] proposed a probabilistic kernel to perform clustering in the feature space. Because of the large size of the kernel matrix, these methods can be computation and memory intensive.
A self-adjusting cloud movement prediction method using images from Kalpana-1 (an Indian meteorological satellite) is proposed in [9]. The images from the infrared channel are used for the cloud analysis. For cloud movement prediction, the first task is to detect clouds in the image. In [9], the mean, standard deviation, busyness, and entropy are first computed for each pixel and assembled into a feature vector. Using this feature vector, a K-means clustering [47]-based image segmentation is carried out. K-means clustering is a popular unsupervised algorithm; it partitions the feature matrix into k clusters such that each pixel belongs to exactly one cluster. Every cluster has a cluster center, known as the centroid, which is the mean of all the pixels belonging to the cluster. The clouds are detected from the segmented image. Next, for cloud motion prediction, the Centre of Mass (CoM) of each cluster is computed. Using the displacement of the CoM, the cloud clusters in consecutive images are tracked and their positions are predicted every 30 min.
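The CoM-based tracking step can be illustrated with a minimal sketch. It assumes that a binary cluster mask is already available for two consecutive frames, and it uses a simple linear extrapolation of the CoM displacement; the extrapolation rule and the function names are illustrative assumptions, not necessarily the scheme used in [9].

```python
def centre_of_mass(mask):
    """Return the (row, col) centre of mass of the nonzero pixels in a 2-D mask."""
    row_sum = col_sum = count = 0
    for r, line in enumerate(mask):
        for c, is_cloud in enumerate(line):
            if is_cloud:
                row_sum += r
                col_sum += c
                count += 1
    if count == 0:
        raise ValueError("mask contains no cloud pixels")
    return row_sum / count, col_sum / count


def predict_next_position(mask_prev, mask_curr):
    """Extrapolate the CoM one time step ahead (assumed constant displacement)."""
    r0, c0 = centre_of_mass(mask_prev)
    r1, c1 = centre_of_mass(mask_curr)
    return r1 + (r1 - r0), c1 + (c1 - c0)


# Toy example: a 2x2 cloud cluster moving one pixel to the right per frame.
frame_a = [[0, 0, 0, 0, 0],
           [1, 1, 0, 0, 0],
           [1, 1, 0, 0, 0]]
frame_b = [[0, 0, 0, 0, 0],
           [0, 1, 1, 0, 0],
           [0, 1, 1, 0, 0]]
predicted = predict_next_position(frame_a, frame_b)
print(predicted)
```

With real imagery, one mask per cluster would be produced by the segmentation step, and the time step would correspond to the interval between consecutive satellite images.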
In an object tracking scenario, the object to be tracked is always encapsulated in a standard geometric shape such as a rectangle, known as a bounding box [48,49]. However, a cloud formation results in an irregular shape and standard geometric shapes are not suitable for its encapsulation as it will include non-cluster pixels. In [10], an irregular encapsulating structure is designed for cloud clusters. They have used TIR images captured by Kalpana-1 for this analysis. Before cloud cluster extraction, segmentation is carried out using the same approach as in [9]. Using CoM cold cloud clusters are extracted. Connected pixels in each cluster are identified for tracking, and morphological operations such as erosion and dilation are carried out to obtain the final cluster to be tracked. The shape of the cluster to be tracked is extracted from the four connected neighbors at the boundary. The irregular shape extracted by the proposed method exactly fits the cloud cluster and it does not contain non-cluster elements. Every cluster has a unique structure that binds the pixels in the cluster in a single unit.
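The idea of an irregular encapsulating structure can be sketched as follows. This is a minimal illustration that assumes a binary cloud mask is already available; it gathers one 4-connected cluster by flood fill and then keeps the pixels that touch the outside through a 4-neighbor. The morphological erosion and dilation steps of [10] are omitted, and all names are hypothetical.

```python
from collections import deque

# Offsets of the four connected neighbours (up, down, left, right).
NEIGHBOURS_4 = ((-1, 0), (1, 0), (0, -1), (0, 1))


def connected_cluster(mask, seed):
    """Collect the 4-connected set of cloud pixels that contains `seed`."""
    n_rows, n_cols = len(mask), len(mask[0])
    if not mask[seed[0]][seed[1]]:
        return set()
    cluster, queue = {seed}, deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in NEIGHBOURS_4:
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < n_rows and 0 <= nc < n_cols
            if inside and mask[nr][nc] and (nr, nc) not in cluster:
                cluster.add((nr, nc))
                queue.append((nr, nc))
    return cluster


def boundary(cluster):
    """Pixels of the cluster with at least one 4-neighbour outside the cluster."""
    return {(r, c) for r, c in cluster
            if any((r + dr, c + dc) not in cluster for dr, dc in NEIGHBOURS_4)}


mask = [[0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 1]]
cluster = connected_cluster(mask, (1, 2))
outline = boundary(cluster)
print(sorted(outline))
```

Unlike a bounding box, the outline follows the cluster exactly: the isolated pixel at (3, 4), which touches the cluster only diagonally, is not included.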
Meteosat-10, of the Meteosat Second Generation (MSG) series, is a geostationary meteorological satellite whose sensor has 12 channels from the visible to the thermal infrared spectrum. Its visible channel acquires broadband images every 15 min, with high spatial and temporal resolution. In [14], clouds are detected using the images from the visible channels of Meteosat-10. A cloud index, ranging from 0 to 1, is calculated to differentiate clouds from the clear sky. In [14], the Heliosat-II algorithm [50] is used to compute the cloud index, which is calculated from the reflectance measured by the sensor for the pixels of interest and for clear-sky pixels. The cloud index determines whether a pixel corresponds to a cloudy or a clear sky.
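A cloud index of this kind can be sketched as a normalized reflectance. The form below, which also requires a reference reflectance for fully overcast pixels and clips the result to the range 0 to 1, is an assumption made for illustration; the text above only states that the index is derived from the measured and clear-sky reflectances, and the exact procedure is given in [50].

```python
def cloud_index(reflectance, clear_sky_reflectance, cloud_reflectance):
    """Normalized cloud index in [0, 1] (0 = clear sky, 1 = fully overcast).

    Assumed form: n = (rho - rho_clear) / (rho_cloud - rho_clear),
    where rho_cloud is a hypothetical reference value for bright clouds.
    """
    if cloud_reflectance <= clear_sky_reflectance:
        raise ValueError("cloud reflectance must exceed clear-sky reflectance")
    n = (reflectance - clear_sky_reflectance) / (cloud_reflectance - clear_sky_reflectance)
    return min(1.0, max(0.0, n))  # clip to the stated 0..1 range


# Illustrative values only.
print(cloud_index(0.5, 0.25, 0.75))
print(cloud_index(0.1, 0.25, 0.75))
```

A pixel would then be labeled cloudy or clear by comparing the index with an application-specific threshold.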
Nailaussada et al. [19] used images from the Himawari 8 geostationary weather satellite, operated by the Japan Meteorological Agency, for satellite image segmentation. They proposed a clustering-based segmentation for cloud detection. For clustering, Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and a modified version of K-means (Meng Hee Heng K-means) are used. Experiments on satellite images of different cloud types show that DBSCAN separates the clusters such that the number of cloud groups is easily identified, whereas Meng Hee Heng K-means separates the clusters based on some definite area. The authors therefore consider DBSCAN more favorable for cloud detection than Meng Hee Heng K-means.
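To make the density-based idea concrete, the following is a minimal sketch of the standard DBSCAN procedure on 2-D points. It assumes the input is a list of cloud-pixel coordinates and uses illustrative values for the radius `eps` and the density threshold `min_pts`; the features and parameters actually used in [19] are not stated here.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    unvisited, noise = None, -1
    labels = [unvisited] * len(points)

    def region_query(i):
        # Indices of all points within distance eps of point i (including i).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not unvisited:
            continue
        neighbours = region_query(i)
        if len(neighbours) < min_pts:
            labels[i] = noise  # may later be relabelled as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster_id  # border point: joins, but is not expanded
            if labels[j] is not unvisited:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(j)
            if len(j_neighbours) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbours)
    return labels


# Two compact groups of cloud pixels and one isolated pixel.
pixels = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
labels = dbscan(pixels, eps=1.5, min_pts=3)
print(labels)
```

The number of clusters is not fixed in advance, which is consistent with the observation above that the number of cloud groups is easy to read off from the DBSCAN result.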
Himawari 8 satellite images are also used in [18], but in this case the images were taken several hours before a whirlwind event in a specific area. As in [19], the density-based DBSCAN algorithm and the Meng Hee Heng K-means algorithm are used for clustering, and the results of both algorithms are analyzed. The centroids of the clusters serve as the features of the whirlwind that characterize the movement of a cloud. For the whirlwind case, DBSCAN gave better segmentation results than Meng Hee Heng K-means. The results confirmed that the information extracted by DBSCAN is more complete and more suitable for whirlwind incidents than that extracted by Meng Hee Heng K-means.
Cloud images captured from the infrared and visible channels of various geostationary satellites are used for cloud detection. To detect clouds in the image, various methods such as K-means clustering, segmentation, and the cloud index are used. Among these methods, clustering is the most popular and provides the desired results. It is clear from Table 1 that the authors do not use a common dataset. Each author has collected their own data and performed the experiments on it. As a result, a quantitative comparison of the performance of the various methods is not available.

Cloud Detection from Ground-Based Camera Images
Cloud analysis using geostationary satellite images does not provide the resolution required for various cloud-based applications. One more problem with satellite images is the presence of multiple cloud layers. A satellite imager can capture only the higher-layer clouds, whereas the lower-layer clouds remain hidden. Therefore, for cloud analysis such as detection or prediction, only higher-level clouds are taken into account and the information about the lower cloud layers remains inaccessible.
With the advances in CCD cameras, various ground-based sky imaging systems have been developed. These systems can provide the higher spatial and temporal resolution needed for highly localized cloud analysis. Various types of in-house and customized sky imaging systems have been developed [29,51]. Images captured by ground-based sky imagers are mostly recorded as red-green-blue (RGB) color images. For these sky imagers, numerous algorithms have been developed for cloud detection and tracking [29,31,51,52]. Among these methods, the most popular one is based on thresholding the red/blue ratio image. In RGB color images, cloud pixels appear brighter than the background, which implies that cloud pixels have higher red (R) intensity values than sky pixels. Therefore, cloud and sky pixels can be distinguished by automatic cloud-identification methods that apply either a fixed or an adaptive threshold.
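A fixed red/blue ratio threshold can be sketched in a few lines. The example below assumes an image stored as nested lists of (R, G, B) tuples and uses a threshold of 0.6 purely for illustration; suitable values depend on the camera and are not taken from the works reviewed here.

```python
def is_cloud_pixel(pixel, threshold=0.6):
    """Classify a pixel as cloud when its red/blue ratio exceeds the threshold.

    The test R > threshold * B is equivalent to R / B > threshold for B > 0
    and avoids a division by zero for black pixels (e.g., the shadow band).
    """
    red, _green, blue = pixel
    return red > threshold * blue


def cloud_mask(image, threshold=0.6):
    """Return a 2-D boolean mask (True = cloud) for an RGB image."""
    return [[is_cloud_pixel(p, threshold) for p in row] for row in image]


# Left column: blue sky; right column: bright, nearly gray cloud.
image = [[(60, 110, 200), (230, 230, 235)],
         [(70, 120, 210), (200, 205, 215)]]
mask = cloud_mask(image)
cloud_cover = sum(map(sum, mask)) / (len(mask) * len(mask[0]))
print(mask, cloud_cover)
```

An adaptive scheme would replace the constant `threshold` with a value derived from the image itself, for example from the histogram of the ratio image.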
An in-house sky imaging system using a Nikon COOLPIX5000 camera is designed in [51] for cloud cover estimation. The image resolution is selectable from 2560 × 1920, 1600 × 1200, 1280 × 960, 1024 × 768, and 640 × 480. The imaging system provides the images in uncompressed TIFF (RAW) and compressed JPEG formats. A fixed thresholding technique designed by Seiz et al. [52] used a simple red-blue ratio (RBR) to extract features from RGB images. In an RGB image, the blue-sky areas have high pixel values in the blue channel and low pixel values in the red channel. After preprocessing, Ref. [51] used the red and blue channels of the sky image to separate clouds from the background sky. Instead of using only the RBR, a normalized sky index (1) has been proposed using information from the red and blue channels:
$$\text{Sky Index} = \frac{DN_{Blue} - DN_{Red}}{DN_{Blue} + DN_{Red}} \qquad (1)$$
where DN_Blue is the pixel value of the blue channel and DN_Red is the pixel value of the red channel. A fixed hard threshold of 0.23 is selected to differentiate between sky and cloud: a sky index lower than 0.23 corresponds to a cloud pixel. Cazorla et al. [29] designed an in-house sky imager based on a CCD camera for cloud cover detection and classification. Their system captures images every 5 min. Instead of the fixed thresholding method, Ref. [29] used neural networks for real-time cloud classification and cloud cover estimation. Three classes, i.e., clear sky, opaque cloud, and thin cloud, are used for cloud detection with the neural network. For cloud classification using a multilayer perceptron (MLP), various parameters are first extracted from the image. For each pixel, using a 3 × 3 window, 18 parameters are extracted from its eight neighbors. These parameters include the pixel values of each channel and their corresponding means and variances. Using these parameters, a neural network with an input layer of 18 perceptrons and an output layer of 3 perceptrons has been built. To evaluate the proposed scheme, 1000 samples were labeled with their corresponding classes. These samples were randomly divided into training and test sets. Resilient backpropagation was used as the training algorithm.
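Returning to the fixed sky-index threshold of [51] described at the start of this paragraph, a minimal sketch is given below. It assumes the index takes the usual normalized-difference form, (DN_Blue - DN_Red) / (DN_Blue + DN_Red); the 0.23 threshold is the value quoted above, while the function names and the handling of black pixels are illustrative choices.

```python
def sky_index(dn_red, dn_blue):
    """Normalized difference of the blue and red digital numbers (assumed form)."""
    total = dn_blue + dn_red
    if total == 0:
        # Undefined for black pixels; such pixels should be masked out beforehand.
        return 0.0
    return (dn_blue - dn_red) / total


def is_cloud(dn_red, dn_blue, threshold=0.23):
    """A sky index below the threshold is taken to indicate a cloud pixel."""
    return sky_index(dn_red, dn_blue) < threshold


# Blue sky: strong blue, weak red. Cloud: red and blue nearly equal.
print(sky_index(60, 200), is_cloud(60, 200))
print(sky_index(230, 235), is_cloud(230, 235))
```

Because the index is normalized by the sum of the two channels, it is bounded between -1 and 1 for non-negative pixel values, which makes a single fixed threshold easier to interpret than one on the raw ratio.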
Fixed threshold-based clustering and segmentation techniques are not suitable for all types of clouds. Different types of clouds will require different thresholds, hence an adaptive thresholding scheme is essential for robust cloud detection. As seen earlier, segmentation is a natural choice for cloud detection. Earlier cloud detection methods employed clustering-based segmentation for cloud detection. Liu et al. [31] proposed a superpixel segmentation (SPS) method for cloud detection.
The SPS algorithm divides an image into irregular image blocks, called superpixels, using various image features. In [31], superpixels are generated using contour and texture cues. According to the shape, size, and location of the cloud, the SPS algorithm adaptively divides the cloud image into a series of superpixels. Next, a local threshold is selected for each superpixel such that the variance between cloud and clear sky is maximized. According to the authors, these local thresholds are suitable for all kinds of cloud images. Finally, the cloud is detected by comparing the (R-B) feature image with the resulting threshold matrix. A set of 500 images from the Kiel dataset and 300 images from the IapCAS dataset are used to evaluate the performance of the proposed SPS algorithm. Compared with the fixed threshold, global threshold, and local threshold interpolation algorithms, the experimental results demonstrate that the SPS algorithm achieves the highest performance for cloud detection.
Among the various types of clouds, cirrus clouds are short, detached, hair-like clouds, which appear whiter than any other cloud in the sky. Because they are white and wispy, and because of the nonuniform background of ground-based cloud images, it is difficult to distinguish cirrus clouds from their background. Cirrus cloud detection using the background subtraction adaptive threshold (BSAT) method has been proposed in [27]. BSAT tries to eliminate the effect of sunlight illumination by estimating the background. A total-sky imager developed by the Institute of Atmospheric Sounding at the Chinese Academy of Meteorological Sciences is used for collecting cloud images. First, a red-to-blue channel operation followed by morphological operations is used to estimate the nonuniform illumination background. After background subtraction, adaptive thresholding using the Otsu method [53] and binarization are carried out to separate cloud and sky pixels. The experiments showed that the BSAT method is robust for a variety of cirrus clouds and that it outperforms the fixed threshold (FT) and adaptive threshold (AT) methods in cirrus cloud detection.
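The combination of background subtraction and Otsu thresholding can be sketched on a one-dimensional strip of integer feature values. In this illustration the background is simply given as a list; estimating it with morphological operations, as BSAT does, is not shown, and all names and values are hypothetical.

```python
def otsu_threshold(values, levels=256):
    """Return the threshold t that maximizes the between-class variance.

    Class 0 holds values <= t and class 1 holds values > t.
    """
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # class 0 still empty
        if w1 == 0:
            break     # class 1 empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def detect_after_background_subtraction(feature, background):
    """Subtract the background, then binarize the residual with Otsu's threshold."""
    residual = [max(0, f - b) for f, b in zip(feature, background)]
    t = otsu_threshold(residual)
    return [r > t for r in residual]


# A brightness gradient (background) with two thin cloud pixels on top of it.
feature = [20, 30, 90, 50, 60, 130]
background = [20, 30, 40, 50, 60, 70]
print(detect_after_background_subtraction(feature, background))
```

In this toy strip, the raw values of the cloud at index 2 (90) and of the bright background at index 4 (60) are close, whereas after subtraction the two cloud pixels stand out clearly from a flat residual.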
Total-Sky Imager-based short-term forecasting of DNI is proposed in [26]. Cloud detection is essential before forecasting. Red-blue channel-based methods are commonly used for cloud detection. In [26], an adaptive thresholding scheme described by Li et al. [54] is used for cloud identification. The adaptive thresholding scheme detects cloud pixels such that the cross-entropy between the normalized R/B ratio image and the segmented binary image is minimized.
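The minimum cross-entropy criterion can be illustrated with a short sketch. The code below performs an exhaustive search over histogram bin edges for the threshold that minimizes a Li-style cross-entropy cost; it assumes strictly positive input values (such as R/B ratios), and the function name, the bin count, and the brute-force search are illustrative assumptions rather than the implementation used in [26].

```python
import numpy as np

def min_cross_entropy_threshold(values, n_bins=256):
    """Return the bin edge that minimises a Li-style cross-entropy cost.

    Hypothetical helper: assumes strictly positive values (e.g. R/B ratios).
    """
    values = np.asarray(values, dtype=float).ravel()
    hist, edges = np.histogram(values, bins=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    best_t, best_cost = None, np.inf
    for k in range(1, n_bins):
        h_lo, h_hi = hist[:k], hist[k:]
        if h_lo.sum() == 0 or h_hi.sum() == 0:
            continue  # one class would be empty
        s_lo = (h_lo * centers[:k]).sum()   # first moment of the lower class
        s_hi = (h_hi * centers[k:]).sum()   # first moment of the upper class
        m_lo = s_lo / h_lo.sum()            # class means
        m_hi = s_hi / h_hi.sum()
        if m_lo <= 0 or m_hi <= 0:
            continue  # the logarithm is undefined for non-positive means
        cost = -s_lo * np.log(m_lo) - s_hi * np.log(m_hi)
        if cost < best_cost:
            best_cost, best_t = cost, edges[k]
    return best_t
```

Pixels whose ratio exceeds the returned threshold would then be labeled as cloud, and the remaining pixels as sky.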
Comparison of RBR threshold and a classifier for cloud detection and forecast is provided in [32]. Images were captured using a C-mount camera equipped with a fisheye lens. To train a classifier, 160,000 image patches are randomly extracted from sky images. These patches are clouds, sky, or a mixture of both. During the training phase, cloud texture is learned using a Restricted Boltzmann Machine. Next, for cloud detection, images are convolved with these learned filters. For cloud identification, a Random Forest classifier is used, which is trained using 350,000 manually labeled pixels from several sky images. A total of 150,000 manually labeled sample pixels are used to evaluate the proposed scheme. Although the cloud classification rate of the learning-based method is better than the RBR threshold, the computation time is too high.
An RGB (Red-Green-Blue)-based thresholding model is commonly used for cloud detection. Radovan et al. [55] instead proposed using the HSV color space for cloud segmentation. The HSV model consists of three parameters: hue, saturation, and value. Hue defines the color component, saturation the amount of color, and value the brightness level. First, the images captured by the sky camera are converted from RGB space to HSV space. Next, morphological operations such as erosion and dilation are applied to the binarized HSV image. Finally, contours corresponding to the edges of the clouds are identified using the Suzuki contour tracing algorithm [56]. Only the outer contours that represent clouds were further used for cloud movement prediction.
Various color models have been proposed in the literature for cloud detection, but comparative analyses of them are scarce. Many authors used RBR extracted from the RGB color space [26,31,32,52], whereas some used the normalized sky index [51,57]. A few publications have reported the use of the HSV color space for cloud detection [55,56]. In [58], a comparative analysis of various color models was presented to identify the color channels with the best cloud detection results. For the comparative study, the authors used an image database [54] that contains 32 sky/cloud images with segmentation ground truth. The color spaces used for the study include RGB, HSV, YIQ, and CIE Lab. The red-blue color components are R/B, R-B, and (B-R)/(B+R). Along with the color spaces and color components, a chroma component is also used. Similarity among these features is analyzed using Principal Component Analysis (PCA) [59]. PCA is also used to identify the color components with maximum variance. In cloud detection, the bimodality of a color component plays a major role. To quantify the bimodality of each color component, Pearson's Bimodality Index (PBI) is used. A PBI value close to 1 indicates a highly bimodal distribution. Experimental results indicate that, out of the 16 color channels, S, R, and R/B exhibit the most bimodal distributions; hence, segmentation of images using these color channels works better than with the others. Three color channels, i.e., R/B, (B-R)/(B+R), and S, have a relatively higher contribution to the first principal component axis. Finally, by combining the red-blue components R/B and (B-R)/(B+R) with saturation S, the cloud detection result surpasses the current state-of-the-art cloud detection algorithm in terms of precision, recall, and F-score.
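As an illustration of the bimodality measure, the following sketch computes a bimodality index in its commonly given form, kurtosis minus squared skewness, which equals 1 for a perfectly two-valued distribution and is about 3 for a Gaussian. It is assumed here that this form matches the PBI used in [58]; the function name is hypothetical.

```python
import numpy as np

def pearson_bimodality_index(channel):
    """Kurtosis minus squared skewness of the values in `channel`.

    Assumed form of the PBI; values near 1 indicate strong bimodality.
    """
    x = np.asarray(channel, dtype=float).ravel()
    d = x - x.mean()
    m2 = np.mean(d ** 2)          # variance
    m3 = np.mean(d ** 3)          # third central moment
    m4 = np.mean(d ** 4)          # fourth central moment
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2       # non-excess kurtosis
    return kurtosis - skewness ** 2
```

Applied to each candidate color channel of a sky image, such an index allows the channels to be ranked by how cleanly their histograms separate into a cloud mode and a sky mode.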
A color-feature and thresholding scheme is used in [22] for cloud identification. The images required for experimentation are collected from a Total-Sky Imager developed by Yankee Environmental Systems. Cloud texture is extracted from the color information; next, the Otsu algorithm [53] is applied to select the optimal threshold for image segmentation.
Instead of relying on traditional red-green-blue (RGB) channel information for cloud detection, a learning-based approach using a Support Vector Machine (SVM) [60] has been proposed in [25]. The cloud detection and tracking system consists of a network of three total-sky imagers for 3D reconstruction.
RBR and red-blue difference are two popular techniques used for cloud detection, but these methods are not suitable for all environmental variations. A classification algorithm using the logistic sigmoidal function is used for cloud detection in [41]. Chang et al. used two features: the first is the difference between the blue channel and the minimum of the red and green channels, and the second is the blue-red ratio, used instead of RBR.
Using a Raspberry Pi and a high-resolution Pi camera, an in-house sky imaging system is designed in [45] for cloud motion forecasting. For cloud detection, first the RBR is calculated for each image. Discontinuities in the edges around the cloud boundaries are filled by median filtering. Finally, thresholding is carried out to distinguish the clear sky from clouds.
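The three steps above (RBR computation, median filtering, thresholding) can be sketched as follows. The 3 x 3 window, the threshold of 0.8, and the function name are illustrative assumptions and are not values reported in [45].

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rbr_cloud_mask(rgb, threshold=0.8, eps=1e-6):
    """Boolean cloud mask from an H x W x 3 RGB image (True = cloud).

    Hypothetical helper: threshold and window size are assumed values.
    """
    rgb = np.asarray(rgb, dtype=float)
    rbr = rgb[..., 0] / (rgb[..., 2] + eps)        # red-to-blue ratio
    padded = np.pad(rbr, 1, mode="edge")           # replicate the border
    windows = sliding_window_view(padded, (3, 3))  # shape H x W x 3 x 3
    smoothed = np.median(windows, axis=(-2, -1))   # 3 x 3 median filter
    return smoothed > threshold                    # clouds have high RBR
```

Clear sky scatters more blue than red light, so its RBR is low, whereas clouds scatter both channels similarly and yield an RBR close to 1.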
An EKO sky camera is used in [61] to collect RGB cloud images for solar power forecasting. Initially for cloud detection, RGB images obtained from the sky camera are converted to grayscale using (2).
Next, the binary image displaying clouds by white pixels and sky as black is obtained using Otsu's thresholding method. This method tries to minimize the intra-class variance of the thresholded cloud and sky pixels.
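Otsu's criterion can be illustrated with a short sketch. Minimizing the intra-class variance is equivalent to maximizing the between-class variance, which is the quantity searched for below. The function name and the bin count are illustrative assumptions, and the input is assumed to be an already converted grayscale image.

```python
import numpy as np

def otsu_threshold(gray, n_bins=256):
    """Threshold that maximises the between-class variance (Otsu)."""
    values = np.asarray(gray, dtype=float).ravel()
    hist, edges = np.histogram(values, bins=n_bins)
    p = hist / hist.sum()                          # bin probabilities
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(p)[:-1]                         # weight of the lower class
    w1 = 1.0 - w0                                  # weight of the upper class
    cum_mean = np.cumsum(p * centers)
    total_mean = cum_mean[-1]
    numerator = (total_mean * w0 - cum_mean[:-1]) ** 2
    denominator = w0 * w1
    between = np.zeros_like(numerator)
    valid = denominator > 0                        # skip empty classes
    between[valid] = numerator[valid] / denominator[valid]
    k = int(np.argmax(between))
    return edges[k + 1]                            # upper edge of split bin
```

A binary image with clouds in white is then obtained as `gray > otsu_threshold(gray)`.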
Using a fisheye lens camera, a ground-based sky imager was designed in [42] that captures images every 10 s. For cloud detection, a masking-based approach is used. Initially, a mask corresponding to foreground objects is created using an RGB color threshold. Next, the mask is applied to the raw cloud image. For cloud detection, RBR and the difference between the blue and red components are considered as features. The segmentation results of both methods are similar, so the authors use the popular RBR for cloud detection.
An in-house sky imaging system using a Nikon D60 camera fitted with a Sigma 4.5 mm circular fisheye lens is designed in [43] for short-term cloud tracking. For cloud segmentation, the authors used the normalized ratio based on the RGB color space, i.e., (B-R)/(B+R). For final cloud identification, a threshold was determined that minimizes the cross-entropy between cloud pixels and sky pixels. In [43], the threshold was set to 0.33; pixels with a value lower than the threshold are classified as cloud, whereas the remaining pixels are classified as sky.
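The decision rule of [43] can be sketched in a few lines. The threshold of 0.33 is the value reported above; the function name and the small constant that guards against division by zero are assumptions added for illustration.

```python
import numpy as np

def normalized_br_cloud_mask(rgb, threshold=0.33, eps=1e-6):
    """Boolean cloud mask from the normalized ratio (B-R)/(B+R)."""
    rgb = np.asarray(rgb, dtype=float)
    red, blue = rgb[..., 0], rgb[..., 2]
    ratio = (blue - red) / (blue + red + eps)  # eps avoids 0/0 on black pixels
    return ratio < threshold                   # low ratio means cloud
```

For example, a blue-sky pixel with R = 50 and B = 200 gives a ratio of 0.6 and is classified as sky, whereas a whitish pixel with R = 220 and B = 230 gives about 0.02 and is classified as cloud.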
As mentioned earlier, RBR and red-blue difference are frequently used methods for cloud detection. These methods make use of only the R and B channel information, whereas the green channel remains unused. Crisosto et al. [57] incorporated green channel information along with red and blue by calculating the Haze Index. For improved cloud identification, both the Sky Index and the Haze Index are used. The equations for the Sky Index and the Haze Index are given in (1) and (3), respectively.
Cloud detection from ground-based cameras involves various image acquisition systems, such as the Nikon COOLPIX5000 camera, CCD-based in-house sky imagers, and total-sky imagers. The resolutions of the devices used for cloud detection are listed in Table 1. In most cases the images are color images; therefore, many authors have used RBR for cloud detection. Some authors explored different color spaces and computed the sky index accordingly. As far as cloud detection from ground-based camera systems is concerned, RBR thresholding is the most popular and effective approach. Table 1 summarizes the cloud detection methods, respective imaging systems, and detection algorithms surveyed in this paper. In cloud detection, the expected output is a segmented image in which the cloud area is separated from the background. The cloud detection results obtained with the different methods presented in [27] are shown in Figure 3. It is clear from Figure 3 that, in most cases, visual results are used to judge performance. One more reason behind this is that cloud images cover a very large area, and at this scale it is difficult to achieve high precision.

Correlation-Based Approaches
Manual determination of cloud motion vectors (CMV) from ATS pictures involves viewing the loop movies to fix cloud elements' position at the beginning and end of the picture sequence.
Fujita et al. [62] were the first to design a cloud motion prediction method using satellite images obtained from the geosynchronous meteorological Application Technology Satellite (ATS). Large-scale cloud motion prediction was achieved by computing the cross-correlation between two images covering the same area but taken at different times.
One of the early approaches for cloud motion prediction is presented by Leese et al. [16]. For cloud motion prediction, images are obtained from the geosynchronous Application Technology Satellites ATS-I and ATS-III. The ATS images are digital images of 64 × 64 pixels, with every pixel represented by one of 64 gray levels.
Manual techniques used for tracking cloud motion vectors to estimate wind fields are tedious and time-consuming. Leese et al. [16] presented two separate methods for computing cloud motion vectors between a pair of satellite images covering the same area but taken at different times. In the first method, they used a local area cross-correlation between a pair of images for cloud motion prediction. Correlation coefficients are computed for different displacements. The displacement that gives the maximum correlation coefficient is used to compute the speed and direction of the cloud motion over the time interval between the two images. To reduce the time required for computing the cross-correlation, a fast Fourier transform (FFT)-based cross-correlation was further tested. The FFT-based method was able to provide finer displacement results than the manual techniques.
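The local area cross-correlation search can be sketched as follows: a block from the first image is compared with displaced candidates in the second image, and the displacement with the highest correlation coefficient is retained. This is a direct (non-FFT) search; the function name, the block position arguments, and the search radius are illustrative assumptions, not details taken from [16].

```python
import numpy as np

def best_displacement(block, next_image, top, left, max_shift):
    """Displacement (dy, dx) of `block` that maximises the correlation
    coefficient in `next_image`; (top, left) is the block's original corner."""
    h, w = block.shape
    a = block - block.mean()
    best, best_r = (0, 0), -np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > next_image.shape[0] or x + w > next_image.shape[1]:
                continue  # candidate falls outside the image
            cand = next_image[y:y + h, x:x + w]
            b = cand - cand.mean()
            denom = np.sqrt((a * a).sum() * (b * b).sum())
            if denom == 0:
                continue  # flat patch, correlation undefined
            r = (a * b).sum() / denom
            if r > best_r:
                best_r, best = r, (dy, dx)
    return best, best_r
```

Multiplying the displacement by the ground size of a pixel and dividing by the time between the two images gives the cloud speed, and the direction follows from the sign and ratio of the two components.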
To further reduce the computational time, a binary matching technique was proposed. In the binary matching method, grayscale satellite images are first segmented into two parts: events and non-events. Events correspond to the cloud edges, whereas non-events denote the remaining part of the image. The cross-correlation and binary matching techniques provided similar experimental results in terms of cloud displacements. The automated technique in [16] was compared with the existing manual method for calculating cloud motion vectors. The directions of the motion vectors obtained by the two methods do not differ much. Compared to the manual technique, the automated method provided better resolution of the displacement vectors for single-layer cloud patterns. However, the same technique fails to provide good results for multi-layered cloud images.
Apart from Leese et al. [16], Smith and Phillips [17] also proposed a local cross-correlation-based technique for automatic cloud tracking. Although the cross-correlation-based methods give results similar to those obtained from the manual techniques, these methods fail to track mesoscale atmospheric phenomena [63].
Template matching is used for establishing the correspondence between various positions in a pair of images. A two-dimensional correlation surface is obtained by evaluating the goodness of match for every position (small patches) using the concordance correlation coefficient (CCC). The position corresponding to the highest CCC and the respective displacement vector are the outcome of the well-known maximum cross-correlation (MCC) method. In MCC, the basic assumption is that the correlation surface has a single peak, i.e., the correlation surface is unimodal [64]. However, for time-varying, low-contrast images affected by atmospheric noise, the correlation surface turns out to be multimodal, and its peaks are not necessarily related to the best match position. This problem is addressed by relaxation labeling: instead of selecting the single position with the highest CCC, multiple positions on the correlation surface that have a CCC value above a predetermined threshold are considered. Hence, for each template, displacement vectors are generated from the corresponding multiple candidate match positions. Next, the most appropriate displacement vector for each template is determined using probabilistic relaxation. The relaxation labeling algorithm uses low-level contextual information to extract significant image features [65]. In [64], a multichannel correlation labeling is proposed for computing cloud motion vectors from the cloud images of MSG imagery. Initially, the candidate motion vectors for each template from both channels are identified using template matching. Next, a locally smooth motion field is produced by applying relaxation labeling to the candidate motion vectors identified by template matching. Because of the predefined threshold, some templates may not have any candidate vector, and the relaxation algorithm may produce inconsistencies.
In [64], a conditional vector median filter is proposed for the removal of such inconsistencies. The correlation relaxation labeling proposed by Evans et al. in [64] was evaluated on single-channel and multichannel images from MSG imagery. They showed an improvement in the results with multichannel data compared to single-channel data. Because this technique generates a dense and locally consistent motion field, it is suitable for NWP.
Correlation-based approaches for cloud motion prediction are found for both types of images, i.e., satellite images and ground-based images. Initial works on cloud tracking were based on correlation coefficients, and many researchers have presented various modifications of the correlation-based approach. A few approaches that used TIR images applied temperature-based thresholding for cloud tracking.

Local Rigid Motion Models
It is known that cloud elements undergo continuous local deformations; hence, the local rigid motion models [62,64] fail to track mesoscale atmospheric phenomena. Therefore, to track moving objects, the motion of polygon-shaped objects was studied in [66]. The authors used the vertices of a polygon as features to calculate the displacement of the cloud motion vector. Cloud motion estimation can be divided into local and non-local rigid motion estimation. In general, tracking the motion of a nonrigid body is a challenging task. In the case of cloud structure estimation and tracking, the difficulty increases further due to the complexity of cloud imagery and its orthographic projections. Nonlinear phenomena involved in cloud formation and unpredictable weather conditions further intensify the problem.
Zhou et al. [7] performed a cloud motion analysis using cloud images captured by meteorological satellites (GOES-8 and GOES-9). The imager has five multispectral channels of high spatial resolution and provides very high dynamic range radiance measurements with 10-bit precision. However, for the experimental analysis in [7], only images from the visible channel were used. Zhou et al. [7] proposed a hierarchical algorithm for tracking dense cloud structure and nonrigid motion from a sequence of 2D cloud images. With 2D images, it is difficult to recover the 3D structure because depth and correspondence information are unavailable. In [7], however, local and global analyses were presented to recover dense depth values and nonrigid motion from a sequence of 2D satellite cloud images. For the local analysis, cloud images are segmented evenly into small regions. These small local regions follow similar nonrigid motion over a small time interval; therefore, a similar affine motion model was defined for each segmented cloud region. Next, an iterative algorithm was proposed that combined the local analysis with global cloud motion constraints. To summarize, affine models were used to capture local information, and the global constraints were based on the dynamics of fluid flows and smooth cloud movements over short durations. Experimental analysis of the suitability of affine transformations for capturing local cloud motion had already been presented in [63,67,68].
Villa et al. [69] presented a forecasting algorithm using thermal infrared images obtained from a geostationary satellite. The algorithm can predict the morphological characteristics of MCSs and the radiative pattern for up to 120 min. Generally, the forecasting methods involve three steps: cloud detection, cloud tracking, and finally forecasting. Deep convection penetrates the upper troposphere; therefore, these clouds are generally found above 9 to 10 km. The temperature of convective clouds is generally different from that of the background. Therefore, temperature-based thresholding generally provides better cloud detection. However, various studies in this area suggested different threshold values [69]. The conclusion from these studies was that the brightness temperature threshold should be selected based on the region where the convection is observed, and that a brightness temperature below 245 K generally identifies MCSs satisfactorily. For MCS detection, Villa et al. [69] selected a brightness temperature threshold of 235 K. As mentioned earlier, MCS size and temperature are related to each other. Through experimental analysis, the minimum MCS size required for detection and tracking was also identified.
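Temperature-based detection with a minimum size criterion can be sketched as below. The 235 K threshold is the value selected in [69]; the minimum cluster size of four pixels, the 4-connectivity, and the function name are hypothetical choices made only for illustration.

```python
import numpy as np
from collections import deque

def detect_cold_clusters(bt_kelvin, threshold_k=235.0, min_pixels=4):
    """Connected groups of pixels colder than `threshold_k`.

    Returns a list of clusters, each a list of (row, col) tuples.
    `min_pixels` is an assumed size threshold, not a value from [69].
    """
    cold = np.asarray(bt_kelvin, dtype=float) < threshold_k
    seen = np.zeros(cold.shape, dtype=bool)
    rows, cols = cold.shape
    clusters = []
    for r0, c0 in zip(*np.nonzero(cold)):
        r0, c0 = int(r0), int(c0)
        if seen[r0, c0]:
            continue
        queue, pixels = deque([(r0, c0)]), []
        seen[r0, c0] = True
        while queue:                                   # breadth-first flood fill
            r, c = queue.popleft()
            pixels.append((r, c))
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and cold[rr, cc] and not seen[rr, cc]:
                    seen[rr, cc] = True
                    queue.append((rr, cc))
        if len(pixels) >= min_pixels:                  # discard small clusters
            clusters.append(pixels)
    return clusters
```

Each returned cluster can then be passed to a tracking stage, for example by following its centroid from one image to the next.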

Convective Cloud Tracking
Cloud movement and distribution of atmospheric winds are strongly related to each other. Hence, as an alternative to expensive options such as Doppler radar, balloon radiosondes, Doppler sodar, or Doppler LiDAR, Porter et al. [70] proposed a cost-effective approach using ground-based stereo cameras. Cloud positions in images captured by two calibrated cameras are tracked to forecast the wind fields and the system is named the Stereo-Cloud Imaging Velocimetry (S-CIV) system.
Cloud position and range are determined from the images captured from the stereo cameras simultaneously. These images are georectified by calculating the rotation matrix from the two coordinate systems viz. Earth-relative and camera-relative coordinate systems. In [70], similar cloud features from the georectified images of both cameras were identified using spatial correlation. Using the azimuth and zenith angles of cloud features and an iterative search, the range and location of the cloud feature from the cameras were calculated. Once the position and range of the cloud at a particular time instance are known, a new image is captured from the first camera after 10 s anticipating that the cloud height does not change in this period. Using the approach discussed earlier, the new position and range of cloud were determined and after comparing it with the position at the previous instant the wind speed and direction were calculated. A comparison of the wind fields obtained from the proposed scheme and those from the balloon radiosonde as a function of height is also presented and it showed promising results. Since the images from the single camera were used for the wind speed calculation, the errors related to the absolute angular calibration were not available. However, the use of ground-based cameras leads to larger errors for higher elevation or larger zenith angles. For higher level and multi-level clouds, the error may increase further.
Convective clouds are associated with extremely low temperatures. In TIR images, low temperature is therefore an important factor that assists in the detection and tracking of convective clouds. Temperature-based convective cloud motion prediction is based on the displacement of the CoM of cloud clusters. In [9-11], Goswami et al. presented CoM-based methods for tracking convective cloud motion. As explained earlier in the cloud detection section, in [9-11] the clouds are segmented and detected using the temperature and the K-means clustering algorithm. A large number of clusters is extracted, and the CoM is calculated for each cluster. In [9], the displacement of the CoM in the next image was predicted from the coordinates of the CoM in the previous two images. Once the prediction was available, a modification of the prediction was also presented in [9], based on the average of the predicted coordinates of the CoM in the previous two images and the ground truth of the previously predicted coordinates. However, in the case of abrupt cloud cluster movements, this approach results in a large prediction error. Hence, an adjusted mean of the actual and predicted coordinates is presented as the new position of the CoM of the cloud. This method provided predictions up to 30 min ahead.
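The idea of predicting the next CoM from the two previous ones can be illustrated with a constant-velocity extrapolation. This is a simplified sketch and not the adjusted mean-based scheme of [9], whose exact weighting is not reproduced here; both function names are hypothetical.

```python
import numpy as np

def center_of_mass(mask):
    """Unweighted (row, col) centroid of the True pixels of a cluster mask."""
    rows, cols = np.nonzero(mask)
    return np.array([rows.mean(), cols.mean()])

def extrapolate_com(com_prev, com_curr):
    """Predict the next CoM by repeating the last observed displacement
    (constant-velocity assumption)."""
    com_prev = np.asarray(com_prev, dtype=float)
    com_curr = np.asarray(com_curr, dtype=float)
    return com_curr + (com_curr - com_prev)
```

For a cluster whose CoM moves from (3, 4) to (4, 6), the sketch predicts (5, 8) for the next image; abrupt changes in cluster motion violate the constant-velocity assumption, which is the situation the adjusted mean is meant to handle.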
In the case of convective cloud detection from TIR images, temperature plays an important role. The pixel intensities of TIR images denote the temperature. In [10], cloud segmentation was done using statistical parameters of pixels and K-means clustering. Next, the clouds to be tracked were identified using connected component labeling on the cloudy pixels extracted from the segmented regions. Once the objects to be tracked were identified, the next step was to predict their movement. The Adjusted Mean-Based Prediction (AMBP) model in [9] was used to predict the displacement of the CoM in subsequent images, and cloud clusters can be tracked by observing this displacement. The AMBP model is modified in [10] according to the temperature of the neighborhood of the cluster. This was done by considering a square area around the cluster; after excluding the cluster itself, the square area was subdivided into four sectors. The mean temperature of these sectors decided whether to use the AMBP model or the actual value of the CoM. This temperature-dependent AMBP is denoted the Temperature Induced Mean-based Prediction (TIMP) model. Introducing the temperature of the cluster neighborhood reduced the error in the prediction of the next CoM.
The TIMP model in [10] was suitable for a single large cloud cluster. However, in the case of convective clouds multiple cloud clusters are present which need to be tracked. The modification in TIMP for multiple cloud tracking was presented in [11]. In the case of multiple clouds, correspondence of clouds was maintained in the consecutive images. All the clusters in images were paired using a cluster encapsulating structure [10]. After cluster encapsulation, the TIMP was used for each cluster to predict its movement.
For cloud movement prediction aimed at estimating drops in electric power generation, weather satellite images and radar images are not suitable. A short-term cloud movement prediction scheme in [30] computed the three-dimensional position of clouds, along with their direction and speed, using a ground-based camera arrangement. As an initial stage, the cloud images were classified into three classes (thick, thin, and unidentified cloud) based on the image pixel intensities. The cloud type, whether convective or stratus, was also decided in [30] using Hough features. The use of stereovision for 3D measurements is a popular method based on the camera triangulation principle. Instead of relying on a single image for cloud movement, the 3D position of the cloud and the corresponding movement were computed using stereovision. The centroid of the cloud was tracked with reference to the sun's position using two cameras located 1.2 m apart. The cloud images were taken every 10 s for 10 hours. Using the direct correlation between cloud movement and wind, the wind speed and direction were computed.

CMV Prediction for Solar Irradiance Forecast
Forecasting the output power of solar systems is required for the proper operation of the power grid and for the optimal management of the energy fluxes within the solar systems. Before forecasting the output of the solar systems, it is essential to predict the solar irradiance. Global solar radiation forecasting can be performed by several methods.

Correlation-Based Approaches
Cross-correlation and phase correlation are two popular methods used for cloud motion detection in short-term solar energy production forecasting. The variability of solar energy can be predicted by computing cloud motion vectors. The first intra-hour solar irradiance forecast using ground-based sky cameras was presented by Chow et al. [71]. They used a Total-Sky Imager to collect sky images for solar irradiance forecasting. Cloud detection and segmentation were carried out by thresholding the R/B value. For segmentation, in addition to thresholding, the pixels are also compared with a clear-sky model. To predict the cloud movement, first the cloud speed was calculated using a cross-correlation-based block-matching scheme; next, the cloud mask was displaced according to the computed cloud speed. For the evaluation of the proposed scheme, a four-day dataset was used. They also computed matching errors between future images and current images translated in the direction of the velocity fields. Depending on the cloud speed and the field of view, the proposed scheme was found useful for forecasting horizons of up to 15-20 min.
Huang et al. [72] investigated and modified the traditional cross-correlation and phase correlation for cloud motion detection. For the analysis of the robustness of motion vector, three different types of clouds, i.e., multilayer cloud with small and large sky coverage and a thick and dark cloud were used. The cloud images were captured using the TSI system. Phase correlation (PC) and cross-correlation (CC) are two popular methods used for the detection of cloud motion vectors; however, both are based on the unrealistic assumption of no cloud shape change. Initial investigations revealed that compared to the phase correlation, the cross-correlation is simple and robust against noise; however, it requires more time for computations.
Both PC and CC are template matching algorithms, and their performance generally depends on the size of the template. Block size, overlap area, and search window size are the major parameters that affect the performance of motion vector algorithms. A block size that is too small increases the computational burden and may not find appropriate motion vectors, whereas a block that is too big may result in many motion vectors and in boundary effects. Similarly, a search window that is too large captures spurious motion vectors. To address these issues, Huang et al. [72] proposed three approaches for the PC and CC algorithms. In the first approach, they proposed a bigger block size for the image center, while areas near the edges are encoded with a smaller block size. This arrangement results in more motion vectors near the edges, and it relieves the boundary effects. The second approach helped eliminate the boundary effect and spurious motion vectors. It used local mean values to fill the boundary area and the shadow band area of the TSI imager. The algorithm of Westerweel and Scarano [73] was used as the third approach. To improve the performance, they combined all three methods. The experimental results showed that the phase correlation approach was unable to discriminate multilayer clouds present in the same block, whereas cross-correlation can easily detect motion vectors in a multilayer cloud. Additionally, unlike cross-correlation, the phase correlation method is not able to track motion vectors in dark clouds. The major advantage of phase correlation compared to cross-correlation is its low computational complexity: it is O(N²) for phase correlation and O(M²N²) for cross-correlation, where N is the block size and M is the search window size.
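For comparison with the direct search used in cross-correlation, phase correlation between two equally sized blocks can be sketched as follows. The sketch assumes a pure circular translation between the blocks; the function name and the small regularization constant are illustrative assumptions and are not taken from [72].

```python
import numpy as np

def phase_correlation_shift(block_a, block_b, eps=1e-12):
    """Integer shift (dy, dx) such that block_b is approximately block_a
    circularly shifted by (dy, dx)."""
    fa = np.fft.fft2(block_a)
    fb = np.fft.fft2(block_b)
    cross = fb * np.conj(fa)
    cross /= np.abs(cross) + eps           # keep only the phase information
    surface = np.fft.ifft2(cross).real     # correlation surface with one peak
    peak = np.unravel_index(np.argmax(surface), surface.shape)
    # Peaks beyond half the block size correspond to negative shifts.
    return tuple(int(p) if p <= n // 2 else int(p) - n
                 for p, n in zip(peak, surface.shape))
```

Because the whole correlation surface is obtained from a pair of Fourier transforms, no explicit loop over candidate displacements is needed, which is the source of the lower computational cost. The single sharp peak also illustrates why two cloud layers moving differently within the same block are hard to separate.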
It is well known that the lifetime of a cumulus cloud is only a few minutes. Long-term or day-ahead cloud motion prediction is therefore not effective in the case of cumulus clouds. Meteorological data provide predictions for a relatively large area and for a longer time. Additionally, the use of radar turns out to be an expensive and impractical solution. A cost-effective solution to these problems is a short-term solar irradiation prediction model, as proposed in [74].
Hemispheric images of the whole sky are captured using a TSI. Next, motion vectors between these images are detected, and finally these motion vectors are used to predict the future location of the clouds in the sky. Using the future cloud location, the solar radiation level can be calculated. Loss of information due to the holding arm and shadow band of the TSI, real-time short-term prediction, cloud shape change, and multi-layered clouds are some of the challenges addressed in [74]. It is a three-step process starting with preprocessing of the TSI image, followed by cloud motion estimation and finally solar radiation estimation. Preprocessing involves mapping the TSI image space into a flat space. This stage does not recover the information lost due to the holding arm and shadow band. As a result, no vectors are detected inside the empty area and false vectors are detected at its edge. The effect of the holding arm and shadow band is minimized by filling in the missing information. As a solution, Huang et al. [74] proposed two rounds of motion detection and two rounds of motion estimation. For real-time and robust cloud motion tracking, a fast cross-correlation algorithm was used. In [72], the same authors showed the effectiveness of cross-correlation for cloud motion tracking. For cloud motion tracking, a first rough motion estimation was carried out on the initial detection, followed by mean filtering to fill in the missing information. Next, the second round of estimation was carried out on the second detection results. These refinements worked on single-layer and multilayer clouds. For motion vector prediction, more than two preceding frames were considered. Using backtracking, a series of motion vectors was traced and then used to build a sequential prediction model. To remove spurious motion vectors and to adjust motion vectors, local and global wind patterns were used. Finally, for the prediction of short-term solar irradiance, a linear prediction model was proposed.
Experimental analysis and results showed an improvement in the tracking and in the estimation of the solar irradiance.
For short-term solar irradiance forecasting, the authors of [24] used a total-sky camera and collected images for different sky conditions such as cloudless, partly cloudy, and overcast. The images were taken at an interval of 1 min. Along with PSI, additional information was collected using a pyranometer and a pyrheliometer. A pyranometer measures the solar energy coming into the system, whereas a pyrheliometer measures direct beam irradiance at normal incidence. In the case of satellite images, multilayer clouds are the major problem: when modeling such images of multilayer clouds, only the upper layer is taken into consideration, while the information in the lower layer remains unused. The basic steps for solar irradiance forecasting are image preprocessing, cloud detection, cloud motion vector estimation, and finally forecasting the irradiance using prediction algorithms. After preprocessing the sky images, the cloud motion vectors were obtained in [24] using the methodology in [75]. To obtain the cloud motion vectors, the spherical sky image was divided into various sectors; each sector represents a different type of cloud movement, such as slow, intermediate, or fast movement. The cloud motion vectors for each sector are calculated using the popular maximum cross-correlation method. Cloud detection was carried out on the last captured image using the RGB and hue-saturation-value (HSV) color spaces and thresholding. The solar irradiance value for each pixel was calculated for the last captured image. For this calculation, the image was divided into spherical regions according to the pixel position relative to the sun. The radius of each spherical region is determined using radiometric data. Next, each pixel in these areas was checked, and it was determined using correlation whether it corresponds to a cloudless or an overcast sky. Note that each spherical area corresponded to a different solar radiation component.
Using the correlation values in each area, a polynomial function was designed for each area. After assigning a solar irradiance value to each pixel, the next task was to determine the future pixel movement. To do this, cloud motion vectors were applied to the most recent image. Using the solar irradiance at each pixel and the cloud motion vectors of the most recent image, the short-term and mid-term solar irradiance forecasts were obtained. Depending on the estimated pixel-level solar irradiance in the image, the forecast was classified into three categories: cloudless, partially cloudy, and overcast. The forecasting interval ranged from 1 to 180 min.
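The maximum cross-correlation step mentioned above can be illustrated with a minimal sketch. It is not the implementation of [24] or [75]: it assumes rectangular blocks instead of sky-image sectors, grayscale frames stored as nested lists, and an exhaustive search over integer shifts. The function names and the synthetic frames are hypothetical.

```python
import math

def ncc(a, b):
    """Normalized cross-correlation of two equally sized 2-D patches."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den > 0 else 0.0

def patch(img, top, left, size):
    """Extract a size x size block whose upper-left corner is (top, left)."""
    return [row[left:left + size] for row in img[top:top + size]]

def motion_vector(prev, curr, top, left, size, max_shift):
    """Return ((dy, dx), score) for the shift that maximizes the correlation
    between a block of the previous frame and the shifted block of the current frame."""
    ref = patch(prev, top, left, size)
    h, w = len(curr), len(curr[0])
    best, best_score = (0, 0), -2.0
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + size > h or l + size > w:
                continue  # shifted block would leave the image
            score = ncc(ref, patch(curr, t, l, size))
            if score > best_score:
                best, best_score = (dy, dx), score
    return best, best_score

def make_frame(top, left, size=12):
    """Synthetic frame (assumption): dark sky with one bright 3 x 3 cloud blob."""
    frame = [[0.0] * size for _ in range(size)]
    blob = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    for r in range(3):
        for c in range(3):
            frame[top + r][left + c] = float(blob[r][c])
    return frame

prev_frame = make_frame(4, 3)
curr_frame = make_frame(5, 5)  # blob moved 1 row down and 2 columns right
vector, score = motion_vector(prev_frame, curr_frame, top=3, left=2, size=5, max_shift=3)
print(vector, round(score, 3))
```

In practice the search would be run once per sector, and the shift divided by the time between frames gives the cloud velocity in pixels per unit time.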
One more technique for short-term solar irradiance prediction using a ground-based camera was presented in [42]. The images were captured using a fisheye lens camera every 10 s. The objective of the solar irradiance prediction was to determine the time taken by clouds to reach the sun position on the image after a few minutes. The first step was cloud detection, which was done using the red-to-blue ratio-based thresholding algorithm. It is quite possible that the sun, being the brightest part of the image, may be interpreted as a cloud. Therefore, the sun position on the image was first identified using the zenith angle, azimuth angle, camera orientation, camera properties, date, and time, and that part of the image was removed. After cloud detection, the next step was to identify the correspondence between detected clouds in each frame. To find this correspondence, a complete cloud was used instead of cloud features. The correspondence between the clouds in the image frames was established using fast cross-correlation.
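The red-to-blue ratio thresholding and sun masking described above can be sketched as follows. This is an illustration under stated assumptions, not the code of [42]: the sun position is assumed to be already available in pixel coordinates, the sun region is approximated by a disk, and the threshold of 0.8 is a hypothetical value that would need tuning for a real camera.

```python
import math

def detect_clouds(rgb, sun_rc, sun_radius, rbr_threshold=0.8):
    """Classify each pixel by its red-to-blue ratio (RBR).

    rgb: rows of (R, G, B) tuples.
    Returns rows of 1 (cloud), 0 (clear sky), or None (masked sun region).
    Clear sky is blue-dominant (low RBR); clouds are closer to white (RBR near 1).
    """
    mask = []
    for r, row in enumerate(rgb):
        out = []
        for c, (red, _green, blue) in enumerate(row):
            # Remove the sun disk first so it is not mistaken for a cloud.
            if math.hypot(r - sun_rc[0], c - sun_rc[1]) <= sun_radius:
                out.append(None)
                continue
            ratio = red / blue if blue > 0 else float("inf")
            out.append(1 if ratio > rbr_threshold else 0)
        mask.append(out)
    return mask

# Hypothetical pixel values for a tiny 3 x 3 test image.
sky, cloud, sun = (60, 120, 220), (200, 205, 210), (255, 255, 250)
image = [[sun, sky, sky],
         [sky, cloud, cloud],
         [sky, sky, cloud]]
mask = detect_clouds(image, sun_rc=(0, 0), sun_radius=0.5)
print(mask)
```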
Next, to determine cloud deformations and cloud motion vectors, cloud features were first computed using the Harris feature detection algorithm [76], and an array of feature point locations was formed for each frame. The cloud motion vectors were then found using a gradient-based optical flow algorithm [77]. First, optical flow velocity vectors for every feature point in each feature were calculated; then, using the mean velocity of each feature point and a recent sky image, the corresponding features in a frame appearing after 3 min were calculated. As mentioned earlier, the sun position is already known, so the sun occlusion period was calculated using the sun position in the frame and the motion velocity of the feature points. This information provides the period for which PV power is not available. Using this algorithm, the cloud positions are predicted 3 min ahead.
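The final step, estimating when a tracked cloud feature reaches the sun, reduces to simple geometry once a velocity is available. The sketch below is an assumption-laden illustration rather than the procedure of [42]: it treats the sun as a disk, assumes every feature point moves in a straight line at one constant mean velocity, and uses hypothetical function names and values.

```python
import math

def time_to_sun(point, velocity, sun, sun_radius):
    """Seconds until a point moving at constant velocity enters the sun disk.

    point, sun: (x, y) in pixels; velocity: (vx, vy) in pixels per second.
    Returns None if the straight-line path never reaches the disk.
    """
    px, py = point[0] - sun[0], point[1] - sun[1]
    vx, vy = velocity
    # Solve |p + v t|^2 = r^2, i.e. a t^2 + b t + c = 0.
    a = vx * vx + vy * vy
    b = 2 * (px * vx + py * vy)
    c = px * px + py * py - sun_radius ** 2
    if c <= 0:
        return 0.0  # the point is already over the sun
    if a == 0:
        return None  # the point is not moving
    disc = b * b - 4 * a * c
    if disc < 0:
        return None  # the path misses the sun disk
    t = (-b - math.sqrt(disc)) / (2 * a)  # first intersection with the disk
    return t if t >= 0 else None  # negative time: moving away from the sun

def earliest_occlusion(points, velocity, sun, sun_radius):
    """Earliest arrival time among all feature points, or None."""
    times = [time_to_sun(p, velocity, sun, sun_radius) for p in points]
    times = [t for t in times if t is not None]
    return min(times) if times else None

features = [(100.0, 0.0), (70.0, 0.0), (0.0, 50.0)]  # hypothetical feature points
first_hit = earliest_occlusion(features, velocity=(-3.0, 0.0), sun=(0.0, 0.0), sun_radius=10.0)
print(first_hit)
```

The same quadratic also yields the exit time (the larger root), so the difference between the two roots gives an estimate of how long a single point stays in front of the sun.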
Cros et al. [14] also used optical flow and PC to extract motion vectors from images obtained from the Meteosat-10 satellite. These motion vectors were further used to extrapolate future cloud patterns from current cloud index maps. Chow et al. [33] used a variational optical flow (VOF) technique to compute the motion vectors and compared its accuracy with the cross-correlation method and the image persistence method. They concluded that the VOF forecast outperforms both of the other forecasts.
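Extrapolating a cloud map with motion vectors can be illustrated with a deliberately simplified sketch. It assumes a single integer motion vector for the whole map, whereas optical flow methods such as those in [14] and [33] produce a dense field with one vector per pixel. The function name and the sample map are hypothetical.

```python
def advect(cloud_map, dy, dx, steps=1, fill=0.0):
    """Shift a 2-D cloud index map by (dy, dx) pixels per step.

    Cells that move outside the map are dropped; cells with no source
    value are set to `fill` (assumed to mean clear sky).
    """
    h, w = len(cloud_map), len(cloud_map[0])
    out = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            nr, nc = r + dy * steps, c + dx * steps
            if 0 <= nr < h and 0 <= nc < w:
                out[nr][nc] = cloud_map[r][c]
    return out

current = [[1.0, 0.0, 0.0, 0.0],
           [0.0, 0.0, 0.0, 0.0],
           [0.0, 0.0, 0.0, 0.0]]
forecast = advect(current, dy=1, dx=1, steps=2)  # two time steps ahead
print(forecast)
```

Image persistence, the baseline mentioned above, corresponds to calling `advect` with a zero motion vector, so the forecast map equals the current one.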
A continuous forecast of cloud motion for up to 10 min was presented in [32] using the images acquired from a ground-based camera. The sky images were captured every 5 s using a C-mount camera equipped with a fisheye lens. The size of the captured images was 2592 × 1944 pixels. The proposed forecast was expressed as the probability of the sun being occluded by clouds in the next 10 min. Cloud images acquired from the ground-based camera were first processed for segmentation and cloud speed (motion) estimation. The cloud segmentation was done using color and texture information in the image. Color information was extracted using the R/B threshold scheme. For textural information, a representation-learning algorithm was used to learn the textural filters. Next, using color and texture information as a feature vector, a Random Forest was trained for cloud detection. Although the result of this method is better than that of the R/B threshold, its computational complexity is very high, hence it is not preferred for cloud detection. For cloud speed estimation, a nonrigid registration technique was adopted in [32]. The registration technique assigned a displacement vector to each pixel. After cloud detection, a two-stage forecast was proposed by Bernecker et al. [32], where the preliminary forecast was based on the displacement vectors obtained from nonrigid registration and further refinement of the principal movement vector was achieved with the help of the Kalman filter [78]. Using a Kalman filter, it is possible to obtain a continuous forecast up to 3 min. In Table 2 a summary of cloud tracking methods and respective imaging systems is presented. Figure 4 shows the result of prediction of cloud motion vectors presented in [26].
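The idea of refining a noisy movement vector with a Kalman filter can be illustrated with a minimal sketch. It is not the filter design of [32]: it assumes a scalar random-walk state model applied to one component of the movement vector, and the noise parameters and measurements are hypothetical.

```python
class ScalarKalman:
    """1-D Kalman filter with a random-walk (constant value) state model."""

    def __init__(self, x0, p0=1.0, q=0.01, r=0.5):
        self.x = x0  # state estimate (e.g. horizontal displacement per frame)
        self.p = p0  # variance of the estimate
        self.q = q   # process noise variance (assumed)
        self.r = r   # measurement noise variance (assumed)

    def update(self, z):
        """Fuse one noisy measurement z and return the new estimate."""
        self.p += self.q                 # predict: value unchanged, uncertainty grows
        k = self.p / (self.p + self.r)   # Kalman gain
        self.x += k * (z - self.x)       # correct toward the measurement
        self.p *= (1 - k)                # uncertainty shrinks after the update
        return self.x

# Hypothetical noisy per-frame displacements (pixels) around a true value of 2.0.
measurements = [2.4, 1.7, 2.2, 1.9, 2.3, 1.8, 2.1, 1.6, 2.0]
kf = ScalarKalman(x0=measurements[0])
smoothed = [kf.update(z) for z in measurements[1:]]
print([round(v, 3) for v in smoothed])
```

A second filter instance would handle the vertical component. A constant-velocity model with a two-element state would be the natural next step if the cloud speed itself changes over time.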

Discussion
This paper surveys works on cloud detection. An introduction has been presented about types of clouds and imaging systems, followed by an analysis of research that has been carried out on cloud detection. The final part is devoted to diverse techniques of cloud forecast and tracking. In Section 2 various imaging systems used for cloud detection and tracking are highlighted. It was observed that for short-term prediction of solar irradiance, researchers rely either on TSI or on designing their own ground camera system to capture cloud images. All the cloud detection and tracking algorithms in the literature need to be evaluated on the same data. Hence, the development of a common cloud image database is essential.
For cloud detection, the input images are captured by either a satellite imager or a ground camera system. In both cases, the final objective is to detect and segment the cloud from the background. Cloud, being a carrier of water, has a very low temperature. Hence, clouds can easily be detected in images captured by the infrared channels of the satellite using temperature-based thresholding. For images captured from the visible channel of the satellite and for images captured by ground camera systems, the thresholding scheme generally provides acceptable results for cloud detection. Cloud detection techniques such as global thresholding, adaptive thresholding, K-means clustering followed by segmentation, DBSCAN, and RBR mainly rely on the pixel values in the image. The cloud and the background can be easily separated using these methods. However, if the image has multi-layered clouds, cirrus clouds, or dark clouds, then the earlier mentioned schemes do not provide the desired performance. In such cases, semantic segmentation and region growing segmentation would be better options to achieve accurate cloud detection. The recent advancements in machine learning and convolutional neural networks for image segmentation can also lead to improvements in cloud detection. In the literature on cloud detection, visual results are generally used to judge the accuracy of the cloud detection methods. However, for benchmarking, a quantitative evaluation of state-of-the-art methods is required. Image quality assessment methods can also provide a quantitative analysis of the segmented image.
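As an illustration of what a quantitative evaluation could look like, the sketch below compares a predicted binary cloud mask with a manually labeled reference mask using pixel-wise accuracy, precision, recall, and intersection over union (IoU). These are standard segmentation metrics chosen here as an assumption; the reviewed papers do not prescribe them, and the masks are hypothetical.

```python
def segmentation_scores(pred, truth):
    """Pixel-wise scores for binary masks (1 = cloud, 0 = sky)."""
    tp = fp = fn = tn = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            if p and t:
                tp += 1  # cloud correctly detected
            elif p and not t:
                fp += 1  # sky labeled as cloud
            elif not p and t:
                fn += 1  # cloud missed
            else:
                tn += 1  # sky correctly detected
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "iou": tp / (tp + fp + fn) if tp + fp + fn else 1.0,
    }

predicted = [[1, 1, 0],
             [0, 1, 0]]
reference = [[1, 0, 0],
             [0, 1, 1]]
scores = segmentation_scores(predicted, reference)
print(scores)
```

Reporting such scores on a shared image database would allow thresholding, clustering, and learning-based detectors to be compared on equal terms.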
After cloud detection, the next stage is cloud tracking. As far as cloud tracking from satellite images is concerned, various methods such as stereovision, AMBP, TIMP, optical flow, and block-matching techniques are used. However, in the case of images from the ground camera system, correlation-based approaches are the most popular. In a few cases, the Kalman filter is also used for cloud tracking. In these approaches, images containing cloud motion vectors are shown to support the claimed accuracy. Visual results in terms of trajectories of the cloud motion are not available in the literature. Object tracking is a very well-developed area of computer vision [79,80]. Designing deep learning approaches for cloud tracking and irradiance prediction is a viable future direction [81]. Cloud tracking is very important for intra-hour prediction, where both tracking accuracy and prediction accuracy are necessary. Deep learning algorithms can offer high prediction speed and accuracy, which would help PV users take appropriate action in time.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: