Integrating the Continuous Wavelet Transform and a Convolutional Neural Network to Identify Vineyard Using Time Series Satellite Images

Grape is an economic crop of great importance and is widely cultivated in China. With the development of remote sensing, abundant data sources strongly guarantee that researchers can identify crop types and map their spatial distributions. However, to date, only a few studies have been conducted to identify vineyards using satellite image data. In this study, a vineyard is identified using satellite images, and a new approach is proposed that integrates the continuous wavelet transform (CWT) and a convolutional neural network (CNN). Specifically, the original time series of the normalized difference vegetation index (NDVI), enhanced vegetation index (EVI), and green chlorophyll vegetation index (GCVI) are reconstructed by applying an iterated Savitzky-Golay (S-G) method to form a daily time series for a full year; then, the CWT is applied to three reconstructed time series to generate corresponding scalograms; and finally, CNN technology is used to identify vineyards based on the stacked scalograms. In addition to our approach, a traditional and common approach that uses a random forest (RF) to identify crop types based on multi-temporal images is selected as the control group. The experimental results demonstrated the following: (i) the proposed approach was comprehensively superior to the RF approach; it improved the overall accuracy by 9.87% (up to 89.66%); (ii) the CWT had a stable and effective influence on the reconstructed time series, and the scalograms fully represented the unique time-related frequency pattern of each of the planting conditions; and (iii) the convolution and max pooling processing of the CNN captured the unique and subtle distribution patterns of the scalograms to distinguish vineyards from other crops. Additionally, the proposed approach is considered as able to be applied to other practical scenarios, such as using time series data to identify crop types, map landcover/land use, and is recommended to be tested in future practical applications.


Introduction
Over the last decade, grape cultivation in China has rapidly progressed in terms of the cultivation area, yield, and quality, in addition to management technology.According to the statistics of the Ministry of Agriculture, by the end of 2015, the area of grape cultivation in China had reached 799,000 Ha, production was 13.669 million tonnes, and wine production was 1.14 million tonnes [1].China's table grape production increased by 591% between 2000 and 2014, which was much faster than the overall global growth of 71% over the same period.The share of the country in world production rose accordingly from 8% in 2000 to 34% in 2014, which made China the world's largest table grape producer, with 9 million tones.The production of wine grapes reached 180,000 tonnes in 2014 and has doubled over the past 15 years (+112% compared with 2000) [2].Therefore, timely and accurate information about the cultivation area and spatial distribution of vineyards would provide strong support for precisely estimating its production and predicting market performance, in addition to adjusting and optimizing the planting area at a regional or country level.
Remote sensing (RS) is the ideal technique for obtaining such information because it is fast, objective, and has a wide observation range.However, previous studies using RS data have mainly focused on mapping grape varieties [3,4], retrieving physical and biochemical parameters (e.g., leaf area index and chlorophyll) [5][6][7][8][9][10], evaluating the growing state (e.g., evaporation, water stress, and soil moisture) [9,[11][12][13], and detecting disease [14], in addition to estimating the quality [15] and production [16] of grapes.Research on vineyard identification is quite rare, and most studies have typically been based on the data with very high spatial resolution (e.g., unmanned aerial vehicle images, UAV images) [17][18][19][20][21][22].Among these vineyard identification studies, textural features caused by unique planting approaches are clearly reflected in UAV images, and thus contribute the most to the vineyard identification results, whereas spectral features contribute less because of the similarity, in terms of the phenological phase and canopy reflectance, with other crops [23].This may be the reason that few studies have been conducted to identify vineyards using single-/multi-phased satellite images in which the spatial resolution is greater than that of UAV images.Within the existing research of vineyard recognition based on satellite imaging, the features used are basically employed from the literature on similar crop identification/detection studies.Spectrums in the range of visible to short wave infrared are the basic features that are commonly employed by previous studies, which justified broad-band multispectral remote sensing imagery of high spatial resolution shows potential applications for vineyard identification [3,9].Vegetation indices derived from the spectral bands are another import feature component, which are designed to characterize different physiological or biochemical features of vineyards' canopy.For example, the normalized difference vegetation index (NDVI) was used to indicate the condition of vine leaf area index (LAI) [8], the perpendicular vegetation Index (PVI) and ratio vegetation index (RVI) were adopted to present the canopy density of the vineyard [24], and band ratios and indices like red-edge/blue and the modified soil-adjusted vegetation index (MSAVI) were used to depict the difference between vineyards and other classes [25].Texture features are often associated with very high spatial resolution, so they are widely used in UAV imagery.Since vineyard mapping using UAV images is too costly to be applied on a large scale, dense time series RS data is expected to be the solution for large-scale and accurate vineyard mapping if the subtle differences, both in spectrum and phenological phase, between vineyards and other crops over the entire growing season/year are fully considered and exploited.Therefore, a study on identifying vineyards using time series satellite images is necessary and valuable.
With the increasing success of RS techniques and the increasing number of satellites in orbit, a huge volume of RS data with a high spatial and temporal resolution is freely accessible to users all over the world, which satisfies the prerequisite for a variety of applications and enables new applications.Abundant data sources benefit the construction of time series RS data, which can fully depict phenological features of vegetation; thus, it is important to distinguish crop type from each other.To date, time series RS data have been successfully applied to many practical applications, such as land cover classification [26,27], change detection [28,29], crop type classification [30], crop phenology detection [31,32], and crop monitoring [33][34][35] because they can provide informative features for subsequent classification/regression algorithms.
In the context of crop type mapping, many approaches have exploited time series RS data, and can be roughly divided into three categories: First, time series data have directly served as the input of the traditional supervised classification method (e.g., random forest (RF)), where time series data are regarded as independent features.Their temporal information contributes nothing to the classification result in these approaches [36].Second, deep learning approaches, such as convolutional neural networks (CNNs), have been applied to identify crop types by considering original long time series data as the only input [37][38][39][40][41]. Within such approaches, time series images are directly stacked, then CNNs automatically construct convolutional filters of three dimensions to extract the features to serve its class prediction layer.Third, the Fourier transform and wavelet transform (WT) have been used to perform frequency analysis and extract phenological features to complete the task of crop type identification because time series RS data contemporaneously contains temporal and amplitude information [32,42].That is, detecting sowing and harvesting date, peak point of growth, rising and falling rate of vegetation index, then treating them as the input vector of the classification algorithm.
Feeding directly stacked image data into a traditional machine learning method and CNN ignores temporal dependencies and only uses the amplitude information of the time series.However, different crops theoretically have unique temporal profiles because they have different growth cycles.Therefore, research on extracting crop phenological features and analyzing frequency characteristics, in addition to modeling growth cycles, has been conducted to pursue more reliable and accurate recognition results.Among such studies, the Fourier transform and WT are the two most common approaches that are adopted to model crop seasonality and then extract phenological features to help the consequent identification task.Because of the complex planting structure/crop rotation and parcel size, in addition to the similarity of the spectrums between different crops, the fitted growth cycle model or the extracted phenological features may not be able to fully represent the differences between crops, and thus fail to accurately identify crop types.
As aforementioned, the researches of vineyard recognition based on satellite image failed to exploit the temporal information contained in the time series of satellite images, thus, our motivation is to propose a new approach that can make not only full use of the amplitude information of each image but also the temporal information contained in whole time series, verifying both kinds of information are contributory to the identification task.The objective of this study is to integrate the advantages of the wavelet in the analysis of time series data, the superiorities of the CNN in the image recognition field, and the strengths of time series satellite image data to distinguish vineyard from other classes and made a substantial advance in mapping vineyard distribution at a regional level.The sub-objectives are as follows: (i) to verify that applying the continuous WT (CWT) to time series RS data is a stable and effective approach to obtain more information about vineyards, particularly in the frequency domain; and (ii) to confirm that feeding the scalograms generated by the CWT into the CNN is a feasible solution for identifying vineyards, and is also better than directly feeding the original time series into a traditional supervised classification method.
The remainder of the paper is organized as follows: The dataset and study area are introduced in Section 2, and the details of the proposed approach are illustrated in Section 3. Following the result reported in Section 4, a discussion and conclusions are presented in Sections 5 and 6, respectively.

Dataset and Study Area
Shaanxi province belongs to the arid/semi-arid region of the Loess Plateau, which is one of the seven major grape-producing regions in China.The study area is located in the central Shaanxi province and spans an area of approximately 1000 km 2 .Throughout the entire study area, there are seven planting conditions, in general, that appear in cropland over a full year (Table 1): crop rotations of spring corn to vegetables, winter wheat to corn, vegetables to vegetables, vineyard, peach trees, greenhouse, and other forest.Note that the forest mentioned in this paper is a collection of trees planted on cropland, including sophora japonica, apple trees, persimmon trees, pine trees, and other trees except peach trees because they have a small planting area over the study region.Among the seven planting conditions, the rotation of winter wheat to corn takes up more than 60% of the planting area of the study region; vegetables and fruit trees rank second and third, respectively.Because it is extremely difficult to recognize the types of crops planted in greenhouses without the support of other data, the study only considers vines planted in normal cropland without any materials covering its canopy, and further crop type recognition is not performed for the greenhouse category.
A field survey was conducted on June 26, 2018, and 581 geo-tagged samples were collected (shown in the right panel of Figure 1).Geo-tagged samples were used to identify the unique features of each crop type.Training/testing data for each type of crop were acquired manually and independently from the Sentinel-2 image based on the analysis of ground samples and the images (many small polygons were randomly drawn to select sample pixels using ENVI software); the details are listed in Table 1.
The training/testing datasets were mainly defined in pixels because the identification process was conducted at the pixel level.details are listed in Table 1.The training/testing datasets were mainly defined in pixels because the identification process was conducted at the pixel level.As a valuable data resource for vegetation monitoring, the Sentinel-2 satellite constellation has had a revisit cycle of five days since early 2018, which makes it possible to build dense and consistent time series data all over the world to mitigate the problems of cloud and cloud shadow contamination [43].Despite its excellent revisit ability, Sentinel-2 also performs well in spatial and spectral resolutions, including three narrow bands for cloud screening and atmospheric correction at 60 m, and three red-edge bands and two shortwave infrared bands that provide key information about vegetation at 20 m, in addition to four classical bands (i.e., blue, green, red, and near infrared bands) at 10 m.The wide swath of Sentinel-2, coupled with its five-day revisit cycle, creates the opportunity to address the challenges that remain for precise mapping and monitoring in the agricultural field.
Sentinel-2 was the only satellite data source used in the study, and the study area was fully covered by the tile of 49SBU in the Worldwide Reference System (WRS-2).The Level-1C products of Sentinel-2A/B provided by the European Space Agency were downloaded from Sentinel Hub (https://apps.sentinel-hub.com/eo-browser/) for the period of January 1-December 31, 2018.Finally, 33 scenes of Sentinel-2 images that had a cloud percentage lower than 50% or for which the spatial distribution of clouds did not obscure the study area were acquired to build an original image time series (as shown in Figure 2).Atmospheric correction was performed using the Sen2Cor tool, which is a processor for Sentinel-2 Level-2A product generation and formatting.Following atmospheric correction, three commonly used vegetation indices (VIs) were calculated for each of the original time  As a valuable data resource for vegetation monitoring, the Sentinel-2 satellite constellation has had a revisit cycle of five days since early 2018, which makes it possible to build dense and consistent time series data all over the world to mitigate the problems of cloud and cloud shadow contamination [43].Despite its excellent revisit ability, Sentinel-2 also performs well in spatial and spectral resolutions, including three narrow bands for cloud screening and atmospheric correction at 60 m, and three red-edge bands and two shortwave infrared bands that provide key information about vegetation at 20 m, in addition to four classical bands (i.e., blue, green, red, and near infrared bands) at 10 m.The wide swath of Sentinel-2, coupled with its five-day revisit cycle, creates the opportunity to address the challenges that remain for precise mapping and monitoring in the agricultural field.
Sentinel-2 was the only satellite data source used in the study, and the study area was fully covered by the tile of 49SBU in the Worldwide Reference System (WRS-2).The Level-1C products of Sentinel-2A/B provided by the European Space Agency were downloaded from Sentinel Hub (https://apps.sentinel-hub.com/eo-browser/) for the period of January 1-December 31, 2018.Finally, 33 scenes of Sentinel-2 images that had a cloud percentage lower than 50% or for which the spatial distribution of clouds did not obscure the study area were acquired to build an original image time series (as shown in Figure 2).Atmospheric correction was performed using the Sen2Cor tool, which is a processor for Sentinel-2 Level-2A product generation and formatting.Following atmospheric correction, three commonly used vegetation indices (VIs) were calculated for each of the original time series because they have been confirmed to be highly related to the growth cycle and physical features of crops.The three VIs are normalized difference vegetation index (NDVI) [44], green chlorophyll vegetation index (GCVI) [45], and enhanced vegetation index (EVI) [46], and their formulas are as follows: where Nir, Red, Green, Blue is the spectral reflectance of the band, respectively.NDVI is usually interpreted as a useful indicator of LAI and photosynthetic capacity of vegetation; however, a saturation phenomenon appeared at high leaf area biomass.The development of GCVI and EVI is largely aimed at solving the saturation problem in NDVI.Additionally, GCVI has been found to have the most linear relationship with leaf area index (LAI) when compared with other VIs, and EVI is capable of reducing the influence of some atmospheric effects by including blue bands and is proportional to the vegetation biomass.These three VIs were selected to express the variation trend of each kind of vegetation among a full year in the aspects of photosynthetic capacity, vegetation biomass, and LAI, and then reveal the difference between a vineyard and other classes.
Remote Sens. 2019, 11, x FOR PEER REVIEW 5 of 22 vegetation index (GCVI) [45], and enhanced vegetation index (EVI) [46], and their formulas are as follows: . * ( ) where Nir, Red, Green, Blue is the spectral reflectance of the band, respectively.NDVI is usually interpreted as a useful indicator of LAI and photosynthetic capacity of vegetation; however, a saturation phenomenon appeared at high leaf area biomass.The development of GCVI and EVI is largely aimed at solving the saturation problem in NDVI.Additionally, GCVI has been found to have the most linear relationship with leaf area index (LAI) when compared with other VIs, and EVI is capable of reducing the influence of some atmospheric effects by including blue bands and is proportional to the vegetation biomass.These three VIs were selected to express the variation trend of each kind of vegetation among a full year in the aspects of photosynthetic capacity, vegetation biomass, and LAI, and then reveal the difference between a vineyard and other classes.

Methods
As time series data can be readily built, it is beneficial for characterizing potential patterns of crops among their growing cycles, which are important for addressing the existing challenges in crop identification or completing a more complicated identification task, for example, identifying vineyards.In this study, the approach proposed to identify vineyards with time series RS data has three major parts (Figure 3): reconstructing daily time series for three VIs, applying the CWT to the reconstructed time series of the VI to generate the corresponding scalogram, and using a CNN-based classifier to identify vineyard by considering the scalograms as input.Applying the CWT to time series data allows the full exploration of the frequency information of different crops during the growing cycle in the frequency domain while simultaneously maintaining temporal information, and the scalogram is the direct carrier of these two types of information.The deep CNN is used to detect the specific pattern hidden in the scalograms to achieve the goal of identifying vineyards.

Methods
As time series data can be readily built, it is beneficial for characterizing potential patterns of crops among their growing cycles, which are important for addressing the existing challenges in crop identification or completing a more complicated identification task, for example, identifying vineyards.In this study, the approach proposed to identify vineyards with time series RS data has three major parts (Figure 3): reconstructing daily time series for three VIs, applying the CWT to the reconstructed time series of the VI to generate the corresponding scalogram, and using a CNN-based classifier to identify vineyard by considering the scalograms as input.Applying the CWT to time series data allows the full exploration of the frequency information of different crops during the growing cycle in the frequency domain while simultaneously maintaining temporal information, and the scalogram is the direct carrier of these two types of information.The deep CNN is used to detect the specific pattern hidden in the scalograms to achieve the goal of identifying vineyards.

Iterated Savitzky-Golay Filter-Based VI Reconstruction
Since VI products have been widely used in the RS community as time series data, many more differences in the growing patterns of different crops have been expressed, and such patterns are significantly important for distinguishing crops from each other.However, because of both the satellite observation plan and the influence of clouds and poor atmospheric conditions, it is unrealistic to build a daily RS data series (e.g., NDVI time series) without interpolation or any reconstruction process.Applying a reconstruction approach to the original observation time series data is required in some operational practice for two reasons: denoising and filling missing values.By contrast, to facilitate the consequent CWT analysis, our approach requires that the input of the CWT is a daily time series for a full year.Therefore, three original VIs are reconstructed by applying an iterated Savitzky-Golay (S-G) filter-based method [47], which was proposed to provide high-quality NDVI time series data, to meet the requirement of the CWT analysis.The S-G filter method was originally proposed by Savitzky and Golay in 1964 [48], which has widely been used in data smoothing and denoising.It is a filtering method based on local polynomial least squares fitting in the time domain, and its biggest advantage is that it can ensure the shape and width of the signal while filtering out noise.Within the reconstruction approach, those positions that have an NDVI increase greater than 0.3 during 20 days are rejected and replaced by a null value before the iterated reconstruction process because such an increase cannot be caused by natural vegetation changes.As different pixels have different situations, in the most severe case, only 3 samples were set to null after applying this threshold.In addition to the reconstruction of the NDVI time series, the positions that are first replaced by a null value in the original NDVI time series are also replaced by a null value when the EVI and GCVI time series are reconstructed.During the reconstruction process, the width of the smoothing window and the degree of the smoothing polynomial are the two most important parameters; they determine the smooth extent of the reconstruction result.The wider the smoothing window, the smoother the result at sharp peaks, and the smaller the degree of the smoothing polynomial, the smoother the result, but bias may be introduced.Finally, we set the width of the smoothing window to 31 and the degree of the smoothing polynomial to 3 to obtain the optimal reconstruction result.

Continuous Wavelet Transformation
The WT has been proven to be a useful tool in the study of time series and has been applied successfully numerous times in an extraordinary range of fields because of its ability to extract various components (e.g., seasonal, trend, and abrupt components) for time series [49].The WT can

Iterated Savitzky-Golay Filter-Based VI Reconstruction
Since VI products have been widely used in the RS community as time series data, many more differences in the growing patterns of different crops have been expressed, and such patterns are significantly important for distinguishing crops from each other.However, because of both the satellite observation plan and the influence of clouds and poor atmospheric conditions, it is unrealistic to build a daily RS data series (e.g., NDVI time series) without interpolation or any reconstruction process.Applying a reconstruction approach to the original observation time series data is required in some operational practice for two reasons: denoising and filling missing values.By contrast, to facilitate the consequent CWT analysis, our approach requires that the input of the CWT is a daily time series for a full year.Therefore, three original VIs are reconstructed by applying an iterated Savitzky-Golay (S-G) filter-based method [47], which was proposed to provide high-quality NDVI time series data, to meet the requirement of the CWT analysis.The S-G filter method was originally proposed by Savitzky and Golay in 1964 [48], which has widely been used in data smoothing and denoising.It is a filtering method based on local polynomial least squares fitting in the time domain, and its biggest advantage is that it can ensure the shape and width of the signal while filtering out noise.Within the reconstruction approach, those positions that have an NDVI increase greater than 0.3 during 20 days are rejected and replaced by a null value before the iterated reconstruction process because such an increase cannot be caused by natural vegetation changes.As different pixels have different situations, in the most severe case, only 3 samples were set to null after applying this threshold.In addition to the reconstruction of the NDVI time series, the positions that are first replaced by a null value in the original NDVI time series are also replaced by a null value when the EVI and GCVI time series are reconstructed.During the reconstruction process, the width of the smoothing window and the degree of the smoothing polynomial are the two most important parameters; they determine the smooth extent of the reconstruction result.The wider the smoothing window, the smoother the result at sharp peaks, and the smaller the degree of the smoothing polynomial, the smoother the result, but bias may be introduced.Finally, we set the width of the smoothing window to 31 and the degree of the smoothing polynomial to 3 to obtain the optimal reconstruction result.

Continuous Wavelet Transformation
The WT has been proven to be a useful tool in the study of time series and has been applied successfully numerous times in an extraordinary range of fields because of its ability to extract various components (e.g., seasonal, trend, and abrupt components) for time series [49].The WT can decompose a signal directly according to the frequency, and represent it in the frequency domain distribution state in the time domain.The Fourier transform is not localized in time, whereas a wavelet is localized in time, which allows the WT to obtain time information in addition to frequency information [49].Thus, it is a more powerful transformation for time series analysis.
A wavelet function (or wavelet, for short), is a function ϕ ∈ R with zero average (i.e., R φ = 0), normalized (i.e., φ = 1), and centered in the neighborhood of t = 0 [50].Scaling ϕ by a positive quantity s, and translating it by u ∈ R generates a family of time-frequency atoms (i.e., the mother wavelet), φ u,s : [51] Given the orignal signal f ∈ R, the CWT of f at time u and scale s is defined as and it provides the frequency component (or details) of f that correspond to scale s and time location u.
which represents the energy of Wf at scale s.Clearly, Φ(s) ≥ 0 for all scale s, and if Φ(s) > 0 we say that signal f has details at scale s.Thus, the scalogram is a time-scale representation of the original signal and allows the detection of the most representative scales (or frequencies) of a signal, that is, the scales that contribute the most to the total energy of the signal.Because the term frequency is reserved for the Fourier transform, the WT is typically expressed in scales, but it is possible to convert scales to frequencies using the following equation [52]: where f a is the frequency, f c is the central frequency of the mother wavelet, and s is the scaling factor.Many families of wavelets exist, and they differ from each other because, for each family, a different trade-off has been made regarding how compact and smooth the wavelet appears.For example, four of the most common continuous mother wavelets were tested in this research [53]: Mexican hat wavelet (mexh Shannon wavelet (shan where B is the bandwidth and C is the center frequency, and Morlet wavelet (morl After comparing their influence on the scalogram, we finally used the "Morlet" as the mother wavelet because it extracts features with equal variance in time and frequency, which ensure the time-frequency resolution can be adapted to different signals of interest and provide a guarantee of extracting temporal features.The scale varies from one to a maximum scale, and the specific maximum scale is determined by calculating the entropy (Equation ( 12)) of the scalograms [54]: where H is the entropy of X, and p i is the probability of the ith class in X.

CNN-Based Identification
Over the last few years, CNNs have made great advances in many fields, such as reducing noise, super-resolution reconstruction [55], pan sharpening [56], image segmentation [57,58], object detection [59], change detection [60,61], and classification [62,63].In these studies, the convolutional operation was applied in both the x and y dimensions to detect the potential architecture or patterns hidden in data.Similarly, in this study, a CNN is used to detect the distinctive characteristics of vineyards in both the time and scale dimensions in scalogram data because it has the following strengths: i) extreme versatility that allows it to approximate any type of linear or nonlinear transformation, including scaling or hard thresholding; ii) it is not necessary to design handcrafted filters, they are automatically learned by algorithms; and iii) high-speed processing because of parallel computing.In this study, a new CNN similar to LeNet [64], which combines small convolutional kernels with maximum or average pooling to learn high-level features, is built to identify vineyards.The convolution and pooling operations are conducted both on the frequency/scale and time axis in our experiment because the scalogram contains time-related frequency patterns of each planting condition.Specifically, Table 2 lists the CNN layers in order.For example, the first convolutional layer represents a convolutional kernel with three input channels, 12 output channels, and a size of 5; and the first pooling layer represents a max pooling kernel of a size of 4 and strides of 2. The activation functions for all convolutional layers and the first dense layer is an ReLU, whereas it is softmax for the last dense layer.The output of the last convolutional layer is concatenated into one vector, then fed into the fully connected layer with 100 units.Finally, there is a softmax layer with two units, which indicate non-vineyard and vineyard.
To test an unknown pixel, the CNN takes its three VIs' (i.e., NDVI, EVI, and GCVI) scalograms, concatenated as a three-dimensional array, as the input, and the maximum unit of the last softmax layer is the result.For each scalogram, the size of x dimension is equal to the length of the reconstructed time series (i.e., 365), and the size of y dimension is dependent on the value of maximum scale of CWT (i.e., 200 in this research), which is determined by calculating the optimal entropy of scalogram.As three stacked scalograms are directly fed into the CNN without splitting them into several small patches, the input size of CNN is 365 × 200 × 3. The construction of CNN is implemented with the assistance of Keras, and TensorFlow is used as the backend of Keras in the research.

Experiment Settings and Accuracy Assessment
In addition to the proposed approach, RF [65], one of the most commonly used and effective methods for mapping land cover and crop types using RS data, was selected to identify vineyards under the same circumstances to serve as a control group, because such applications often exploit multi-phased images while RF is not able to capture temporal information contained in multi-phased images.The original time series of three VIs were directly concatenated as one vector, and then the fitted RF classifier was applied to perform binary classification.The RF algorithm is implemented with the assistance of the Scikit-learn library, which is a machine learning library in Python.Within the fitted RF model, 50 trees were used, the maximum depth of each tree is automatically determined by the algorithm, and the maximum number of the features considered to be used when splitting each node during the construction of a tree is set to the square root of the number of the total features (i.e., 10 in this research), other parameters were set to default values.The input of RF is a vector of 99 elements (33 timestamps × 3 indices).Note that the same training and testing datasets listed in Table 1 were used to train the RF classifier and test its performance.Finally, confusion matrices that include the producer's accuracy (Pro.'s Acc.), user's accuracy (Usr.'s Acc.), and overall accuracy (OA) were used to assess the identification performance.

Reconstructed Time Series of the Three VIs
Noise and oscillations remained in the original time series of NDVI, EVI, and GCVI, which prevented them from illustrating the general and real growing trend of different crops over a full year.An appropriate reconstruction approach could rebuild a daily time series while removing noise and filling missing values to reveal the general trends of different crops and identify the differences among them.Figure 4 shows examples of the reconstruction results of the three VIs for seven typical crops or crop rotations that were summarized in Section 2 during a full year throughout the entire study region.Generally, they all achieved good reconstructed results in terms of R 2 (as shown in Table 3).For crop rotations of spring corn to vegetables and winter wheat to corn (Figure 4a,f), the reconstructed results exactly depicted the phenological characteristics of the two crops and exhibited a stable development trend in the NDVI, EVI, and GCVI time series.The group of peach trees, forest, and vineyards (Figure 4b,c,e) all had unimodal development curves in the three VIs because there was no rotation or changes in the middle year.By contrast, they had different rising/decline rates and smoothness of peaks because of their unique features, such as flowering and deciduous times, planting structure and density, and plant morphology.Greenhouse had relatively complex development curves for all three VIs' reconstructed results over a full year (Figure 4d), and the reasons can roughly be summarized in three categories: (i) the insulation materials that covered the outside of the greenhouse were different; (ii) numerous types of crops were planted in the greenhouse, and each crop had a different canopy reflectance; (iii) the greenhouse opposed the faster and unfixed phenological phase because it provided a superior temperature and humidity environment.The developing curve of vegetables planted on normal cropland generally had three peaks because vegetable farmers in our study region typically grow three seasons of vegetables, and the three reconstructed time series successfully characterized this feature (Figure 4g).Generally, the iterated S-G reconstruction approach produced the projected daily time series data of the three VIs for all planting conditions except the greenhouse, but the substandard reconstructed results of the greenhouse only demonstrated that its reconstruction was subject to the aforementioned factors rather than the unsuitableness of the reconstruction approach.
From the view of the overall development trend of the reconstructed time series of the seven planting conditions (Figure 4), it was possible to distinguish the vineyard from crop rotations of winter wheat to corn and spring corn to vegetables, greenhouse, and vegetables according to the number of growing peaks during the entire year; the vineyard had only growing peak, but the other four planting conditions had more than one.Although peach trees, forest, and the vineyard all had only one growing peak during the entire year, some differences still existed in the reconstructed time series of the VIs.For example, the vineyard developed slower than peach trees and forest during March and April in the NDVI case, and in the EVI case, the vineyard had the flattest development trend, and its values were lower than peach trees and forest after June.Both the development trend and the variations of the VIs values over the entire year made it possible to distinguish the vineyard from peach trees and forest.

Scalograms
To determine the optimal maximum value of the scale, the maximum scale was varied from 1 to 500 to explore the relationship between the maximum scale and the entropy of the scalogram for different planting conditions.Figure 5 shows how the scalograms' entropy relates to the increase of the maximum scale in different cases.Clearly, either three VIs or seven planting conditions demonstrate a uniform trend; that is, as in maximum scale increased, the entropy of the scalogram increased accordingly.However, an entropy saturation phenomenon appeared gradually in three VIs and seven planting conditions when the maximum scale value was greater than 200, particularly in the NDVI case.Therefore, the scales were set to the range of 1-200 in the study, and the consequent CWTs were performed based on this scale setting.

Scalograms
To determine the optimal maximum value of the scale, the maximum scale was varied from 1 to 500 to explore the relationship between the maximum scale and the entropy of the scalogram for different planting conditions.Figure 5 shows how the scalograms' entropy relates to the increase of the maximum scale in different cases.Clearly, either three VIs or seven planting conditions demonstrate a uniform trend; that is, as in maximum scale increased, the entropy of the scalogram increased accordingly.However, an entropy saturation phenomenon appeared gradually in three VIs and seven planting conditions when the maximum scale value was greater than 200, particularly in the NDVI case.Therefore, the scales were set to the range of 1-200 in the study, and the consequent CWTs were performed based on this scale setting.Once the mother wavelet and scales were determined, applying the specific CWT to the reconstructed time series yielded the corresponding scalogram that represented the time series data in both the frequency and time domain.As the scalogram examples in Figure 6 show, in the two-dimensional feature space, that is, frequency and time, each planting condition had a different signature in the three VIs' cases.For example, the crop rotations of spring corn to vegetables (Figure 6a) and winter wheat to corn (Figure 6f) differed in both the coefficient distribution pattern and its magnitude; particularly in the GCVI case, they had great dissimilarities.Similarly, noticeable differences in all three scalograms easily distinguished the vineyard (Figure 6e) from peach trees (Figure 6b) and other forest (Figure 6c), even though many subtle differences still existed among the latter two classes, such as the magnitude difference in the EVI scalograms and the difference in the distribution shape in the GCVI scalograms.For the greenhouse (Figure 6d) and vegetables (Figure 6g), both the magnitude difference of the coefficient and the difference in the distribution pattern in all three scalograms provided sufficient evidence to distinguish them from each other.From the view of distinguishing the vineyard from the other six classes, although NDVI scalograms could not provide an obvious visual difference to distinguish vineyard from crop rotation of spring corn to vegetables, greenhouse, and vegetables, the distribution shape and magnitude differences in the EVI and GCVI scalograms excluded the rotation of spring corn to vegetables, greenhouse, and vegetables.To summarize, the magnitude differences and distribution patterns in all three types of scalograms ensured that time-related frequency features provided sufficient detail to the CNN classifier to successfully complete the vineyard identification task.Once the mother wavelet and scales were determined, applying the specific CWT to the reconstructed time series yielded the corresponding scalogram that represented the time series data in both the frequency and time domain.As the scalogram examples in Figure 6 show, in the two-dimensional feature space, that is, frequency and time, each planting condition had a different signature in the three VIs' cases.For example, the crop rotations of spring corn to vegetables (Figure 6a) and winter wheat to corn (Figure 6f) differed in both the coefficient distribution pattern and its

CNN Classification Results
Considering the stacked scalograms of NDVI, EVI, and GCVI as the input of the CNN, with the learning processes of three convolution layers, three max pooling layers, and a dense layer, the last softmax layer of the CNN indicated the identification result.If the value of the first unit was greater than that of the second unit, then this denoted a non-vineyard; conversely, it denoted a vineyard.Figure 7b represents the identification results of the proposed approach, from which we determined that, in the entire study region, most vines were located in the northwest, with less planting in the northeast and southeast.Figure 7c illustrates the identification results of RF, which share a similar trend with the CNN in the overall distribution pattern.However, RF misclassified many non-vineyards as vineyards and omitted some vineyards that had high similarity with other classes (e.g., peach tree), and the result was that its total planting acreage of the vineyard was more than that of the CNN by 29.8% (Figure 7d).Furthermore, the confusion matrices shown in Table 4 quantitatively characterized the difference between the CNN and RF in vineyard identification.The overall accuracy of the CNN was 89.66%, which was 9.87% higher than that of RF (79.78%).In addition to the overall accuracy, the CNN was also superior to RF in terms of producer's accuracy and user's accuracy.For example, the producer's accuracy of vineyard was 92.90% for the CNN and 60.60% for RF, which was in accordance with RF's higher omission rate and further proved that the CNN had a better capability than RF to distinguish the vineyard from other classes, even if it was very similar to other classes.Note that the user's accuracy of the vineyard was far lower than that of the non-vineyard for both the CNN and RF.This is largely because of the imbalance of the testing sample size (5500 pixels for non-vineyard and 1000 pixels for vineyard).Although the user's accuracies of vineyard and non-vineyard seemed to be less comparable, the huge difference of 21% in the user's accuracy of vineyard between the CNN and RF verified that RF misclassified more non-vineyards as vineyards than the CNN.To summarize, both the spatial distribution map and confusion matrices proved that the approach that combined the WT and CNN was comprehensively superior to the approach that combined RF and the original time series in terms of identifying the vineyard using the Sentinel-2 time series.

Variation of the Entropy of the Time Series Data
In this study, Sentinel-2 images with a cloud-cover-percentage lower than 50% were collected, and NDVI, EVI, and GCVI were then calculated as the original time series.Based on the iterated S-G approach, three time series were reconstructed and then served as the input of the CWT, which produced scalograms to present the time-related frequency information of the reconstructed time series.Therefore, exploring how the reconstruction process and CWT affects the amount of information and whether they are class sensitive is necessary for the further application of the proposed approach.The information entropy was used as the only criterion throughout the assessment process.
For each VI, the average entropies of each class in the original, reconstructed, and scalogram data were calculated, as shown in Figure 8.Clearly, either the three VIs or seven types of planting conditions indicated a uniform trend of the entropy variation across the original, reconstructed, and scalogram data; that is, the entropy value was approximately 5, 8.5, and 15.5 for the original, reconstructed, and scalogram data, respectively.Thus, the reconstruction process and CWT approximately increased the information amount by 70% and 210%, respectively.Most importantly, such a variation of the entropy was not sensitive to the class, and the trend remained the same over the three VIs.The reason for this phenomenon is largely because the three VIs were calculated based on the fixed and same image collection, and the reconstruction approach was not sensitive to the type of VI.This means that once the original time series is fixed, the reconstruction approach and CWT will exert similar effects across different classes and VIs.

Variation of the Entropy of the Time Series Data
In this study, Sentinel-2 images with a cloud-cover-percentage lower than 50% were collected, and NDVI, EVI, and GCVI were then calculated as the original time series.Based on the iterated S-G approach, three time series were reconstructed and then served as the input of the CWT, which produced scalograms to present the time-related frequency information of the reconstructed time series.Therefore, exploring how the reconstruction process and CWT affects the amount of information and whether they are class sensitive is necessary for the further application of the proposed approach.The information entropy was used as the only criterion throughout the assessment process.
For each VI, the average entropies of each class in the original, reconstructed, and scalogram data were calculated, as shown in Figure 8.Clearly, either the three VIs or seven types of planting conditions indicated a uniform trend of the entropy variation across the original, reconstructed, and scalogram data; that is, the entropy value was approximately 5, 8.5, and 15.5 for the original, reconstructed, and scalogram data, respectively.Thus, the reconstruction process and CWT approximately increased the information amount by 70% and 210%, respectively.Most importantly, such a variation of the entropy was not sensitive to the class, and the trend remained the same over the three VIs.The reason for this phenomenon is largely because the three VIs were calculated based on the fixed and same image collection, and the reconstruction approach was not sensitive to the type of VI.This means that once the original time series is fixed, the reconstruction approach and CWT will exert similar effects across different classes and VIs.

How the Mother Wavelet and Scale Range Influence the Entropy
As mentioned in Section 3.2, there are many families (types) of mother wavelets.They differ from each other because each type of wavelet has a different shape, smoothness, compactness, and is useful for different purposes.To assess the influence of different mother wavelets on the reconstructed NDVI, EVI, and GCVI time series, For each mother wavelet, the scales varied from one to a maximum scale, and the information entropy continued to be used as the criteria to assess the influence.Figure 9 illustrates the results of all four mother wavelets in the cases of the three VI and seven planting conditions; note that the maximum scale was iterated from 2 to 502 with an interval of 25.

How the Mother Wavelet and Scale Range Influence the Entropy
As mentioned in Section 3.2, there are many families (types) of mother wavelets.They differ from each other because each type of wavelet has a different shape, smoothness, compactness, and is useful for different purposes.To assess the influence of different mother wavelets on the reconstructed NDVI, EVI, and GCVI time series.
For each mother wavelet, the scales varied from one to a maximum scale, and the information entropy continued to be used as the criteria to assess the influence.Figure 9 illustrates the results of all four mother wavelets in the cases of the three VI and seven planting conditions; note that the maximum scale was iterated from 2 to 502 with an interval of 25.

How the Mother Wavelet and Scale Range Influence the Entropy
As mentioned in Section 3.2, there are many families (types) of mother wavelets.They differ from each other because each type of wavelet has a different shape, smoothness, compactness, and is useful for different purposes.To assess the influence of different mother wavelets on the reconstructed NDVI, EVI, and GCVI time series, For each mother wavelet, the scales varied from one to a maximum scale, and the information entropy continued to be used as the criteria to assess the influence.Figure 9 illustrates the results of all four mother wavelets in the cases of the three VI and seven planting conditions; note that the maximum scale was iterated from 2 to 502 with an interval of 25.Clearly, with the increase of the maximum scale, all the mother wavelets performed similarly in all cases; that is, the saturation phenomenon in the entropy also appeared in the gaus, shan, and morl cases when the maximum scale was greater than 200.Mexh and shan had a similar speed to gaus and morl in reaching the saturation point, whereas their entropies were slightly lower than that of the latter two, particularly in the NDVI case.However, the differences between the four mother wavelets were not significant, particularly in the EVI and GCVI cases.From the view of planting conditions, for all three VIs, the four mother wavelets also demonstrated similar and uniform trends in different planting conditions, which further proved that the mother wavelet was not sensitive to the class.Therefore, it was safe to set Morlet as the mother wavelet to apply the CWT to the reconstructed time series.

Difference Between the Proposed Approach and the Traditional Approach
Because the experimental results shown in Section 4.3 proved that using the CWT and a CNN achieved better performance than using RF and the original time series in identifying the vineyard with the Sentinel-2 time series, it is meaningful to discuss the difference between these two approaches and the corresponding reasons for the purpose of further study and application.Feeding RF with directly stacked original, multi-temporal images results in two adverse outcomes: (i) the direct stacking process of multi-temporal images lost time information and RF considered each feature as independent rather than time-related; and (ii) the same or similar amplitude of features at different times enhanced the redundancy problem, which had a negative impact on the RF classifier.Additionally, the similarity between the "spectrums" of the original time series confused RF in terms of distinguishing vineyards from other crops.For example, vineyards with a higher canopy cover were similar to peach trees, so RF misclassified them as non-vineyard.Conversely, applying the CWT to the reconstructed time series in the time-frequency domain better observed the local characteristics of the original time series and simultaneously observed the time and frequency information.As a result of the CWT, the scalogram represented time-related frequency (scale) information and formed a unique distribution pattern for each type of planting condition (Figure 6).The convolution and max pooling process of the CNN captured the difference of the distribution pattern in the scalogram and then distinguished the vineyard from other classes.Compared with the traditional approach, the proposed approach has two advantages: (i) time information that remained in the time series of the VIs was reserved and used in the identification process; (ii) the excellent ability of the CNN in image identification was fully exploited.Therefore, as confirmed by this study, the proposed approach was comprehensively superior to the traditional approach in terms of identifying vineyards using the Sentinel-2 time series.

Difference Between Proposed Approach and Long Short-Term Memory Networks
A Long Short-Term Memory (LSTM) network is a special type of Recurrent Neural Network (RNN) and has been widely used to analyze time series data due to their ability to capture long-term dependencies.Despite the abundant successful applications in other time series scenarios like voice recognition, machine translation, RNNs have also achieved well performance in the remote sensing community, for example, land cover classification [66][67][68], crop identification [69], change detection [70].Since we have already reconstructed the original time series of three VIs so as to obtain three stable full-year time series in the proposed approach, which also makes it possible to apply an LSTM model to identify vineyards based on these three reconstructed time series.Therefore, an LSTM model, the structure of which was summarized in Table 5, was established to test the difference between itself and the proposed approach.
The construction environment, deep learning framework, and data used for training/testing of the LSTM model are consistent with the CNN approach.Note that three full-year time series vegetation indices were simultaneously used, so, the input shape of the LSTM was 356 × 3 (i.e., 365 days × 3 VIs), and only the last neural of LSTM returned a sequence to serve as the input of last dense layer; it only has one output neural and was designed to do binary classification (i.e., Vineyard or Non-Vineyard).The LSTM model achieved an overall accuracy of 85.28%, and the detail of its identification map is illustrated in Figure 10.LSTM has a similar performance in terms of OA with respect to the approach of CWT+CNN, however, the fact that the user's accuracy (51.24%) is lower than that of the CNN approach (60.72%) indicated that LSTM has misclassified more non-vineyard pixels into the vineyard class, which is consistent with the final distribution map (Figure 4b, Figure 10).In general, the performance of LSTM is slightly inferior to the CWT+CNN but much better than the RF model.It demonstrated that both CW+CNN and LSTM had captured time-dependent features of long time series data.Regarding the difference in identification accuracy between CWT+CNN and LSTM, the potential reasons may include two aspects: (i) CWT and LSTM have totally different principles in extracting temporal information in long time series data; (ii) CNN and LSTM have different structures and different numbers of trainable parameters, but there is no doubt that both methods captured and made full use of the time-dependent information in the long time series data in our vineyard identification research.The LSTM model achieved an overall accuracy of 85.28%, and the detail of its identification map is illustrated in Figure 10.LSTM has a similar performance in terms of OA with respect to the approach of CWT+CNN, however, the fact that the user's accuracy (51.24%) is lower than that of the CNN approach (60.72%) indicated that LSTM has misclassified more non-vineyard pixels into the vineyard class, which is consistent with the final distribution map (Figure 4b, Figure 10).In general, the performance of LSTM is slightly inferior to the CWT+CNN but much better than the RF model.It demonstrated that both CW+CNN and LSTM had captured time-dependent features of long time series data.Regarding the difference in identification accuracy between CWT+CNN and LSTM, the potential reasons may include two aspects: i) CWT and LSTM have totally different principles in extracting temporal information in long time series data; ii) CNN and LSTM have different structures and different numbers of trainable parameters, but there is no doubt that both methods captured and made full use of the time-dependent information in the long time series data in our vineyard identification research.

Applicability in Other Practical Scenarios
The cost of building a time series of RS data has decreased because an increasing number of satellites have been launched into space, and the volume of usable data has largely increased.The time series observation is more capable of characterizing the features of the changes of land surface over a long time, which provides researchers with the opportunity to conduct numerous studies, such as phenological phase detection, crop growing state monitoring, land use/landcover change detection, and crop type identification.In this study, the time series of three VIs derived from Sentinel-2 data were used to identify vineyards based on the combination of the CWT and a CNN.For the CWT, we observed that the scalogram, which was the result of applying the CWT to a reconstructed time series, contained more abundant information than the original and reconstructed time series because it simultaneously reserved the frequency and time information.More importantly, the CWT was not sensitive to the mother wavelet and ground classes when it was

Applicability in Other Practical Scenarios
The cost of building a time series of RS data has decreased because an increasing number of satellites have been launched into space, and the volume of usable data has largely increased.The time series observation is more capable of characterizing the features of the changes of land surface over a long time, which provides researchers with the opportunity to conduct numerous studies, such as phenological phase detection, crop growing state monitoring, land use/landcover change detection, and crop type identification.In this study, the time series of three VIs derived from Sentinel-2 data were used to identify vineyards based on the combination of the CWT and a CNN.For the CWT, we observed that the scalogram, which was the result of applying the CWT to a reconstructed time series, contained more abundant information than the original and reconstructed time series because it simultaneously reserved the frequency and time information.More importantly, the CWT was not sensitive to the mother wavelet and ground classes when it was applied to NDVI, EVI, and GCVI time series, which implies that the CWT is stable and can be used in other frequency analysis scenarios.
In addition to the CWT, the CNN was another important part of the proposed approach.As the CNN has achieved tremendous success in the fields of object detection, image classification, and handwriting recognition, it has become a common, effective solution to the image classification task because of its unique properties.In this study, the CNN also achieved a better identification result than RF.All the previous studies conducted to classify images using CNN proves that the CNN is suitable for image classification.Therefore, the proposed approach is considered to be applicable to other applications based on time series RS data (e.g., crop type classification, mapping land use/landcover).One drawback of the proposed approach is that the computational cost is higher than that of using the original time series data and a traditional supervised method (e.g., RF) to perform the classification task.The two classification processes were conducted on a Think Station P710 with 120 GB of memory and 2 Intel(R) Xeon(R) CPU E5-2640 (2.4 GHz, 40 cores); RF consumed 1.2 min, and our approach consumed 5.6 min (only including generating scalograms and classification with CNN).The time series construction of the whole study region consumed 87 min.However, with the overall improvement of computational power and the decline of computational cost, it appears that the relatively higher computational cost of the complex solution will not be a problem in the future.

Conclusions
Grape, either table grape or wine grape, is widely cultivated in China because of its high economic value.In this study, an approach was designed to identify a vineyard using time series RS data, including original time series reconstruction, the CWT of the reconstructed time series, and training a CNN classifier.First, three VIs (i.e., NDVI, EVI, and GCVI) were derived from the original Sentinel-2 image collection to form the original time series; second, an iterated S-G method that focused on removing outliers and filling missing values was applied to the original time series to yield a daily time series over a full year; third, the CWT was applied to each previous reconstructed time series to generate the corresponding scalogram; and finally, three scalograms were stacked together to serve as the input of the CNN classifier to complete the identification task.The results demonstrated the following: (i) the CWT had stable performance on the reconstructed time series, which provided the CNN classifier with more time-related frequency information, and was not sensitive to the mother wavelet and ground classes; (ii) the CNN classifier with scalograms as input achieved a better vineyard identification result than RF, with an overall accuracy of 89.66% (overall accuracy was 79.78%), thereby making great progress by improving the overall accuracy by 9.87%; and (iii) the CNN captured the unique and subtle features that hide in the scalogram to distinguish the vineyard from other classes, and the combination of the CWT and CNN has more accurate performance than RF in vineyard identification using time series satellite images.
To the best of our knowledge, a few studies have been conducted to identify vineyards using the time series of satellite image data; thus, this is the first time that the proposed approach has been applied to vineyard identification.The proposed approach fully explored the frequency information of time series data by applying the CWT, and then used the advantages of the CNN in image recognition to capture those distinguishing features in scalograms.Since the result has shown that applying CWT to a time series of vegetation indices, in terms of the effectiveness in extracting information of time series, is insensitive to the type of vegetation index, surface category, and type of mother wavelet, the information entropy is stably increased after applying CWT to time series.Furthermore, just as the CNN has achieved good results in classifying scalogram images in our research, many previous studies on the classification of images using CNN have also proved that CNN has unique advantages for image recognition.Therefore, the approach is considered to be able to be applied to many other practical scenarios, and further researches on testing the performance of the proposed approach in applications like classifying other crops (e.g., corn, soybean, and rice) and mapping land use/landcover based on time series images are recommended and appreciated.

Figure 1 .
Figure 1.Geolocation of the study area.

Figure 1 .
Figure 1.Geolocation of the study area.

Figure 2 .
Figure 2. Cloud percentage and acquisition date of the original images.

Figure 2 .
Figure 2. Cloud percentage and acquisition date of the original images.

Figure 3 .
Figure 3. Technical flow chart of the proposed approach.

Figure 3 .
Figure 3. Technical flow chart of the proposed approach.
Remote Sens. 2019, 11, x FOR PEER REVIEW 10 of 22 series of the VIs.For example, the vineyard developed slower than peach trees and forest during March and April in the NDVI case, and in the EVI case, the vineyard had the flattest development trend, and its values were lower than peach trees and forest after June.Both the development trend and the variations of the VIs values over the entire year made it possible to distinguish the vineyard from peach trees and forest.

Figure 7 .
Figure 7. Vineyard identification comparison: (a) original image, (b) result of the CNN, (c) result of RF, and (d) difference between CNN and RF.

Figure 7 .
Figure 7. Vineyard identification comparison: (a) original image, (b) result of the CNN, (c) result of RF, and (d) difference between CNN and RF.

Figure 10 .
Figure 10.(a) Vineyard identification result of the LSTM model, (b) difference between CNN and LSTM.

Figure 10 .
Figure 10.(a) Vineyard identification result of the LSTM model, (b) difference between CNN and LSTM.

Table 1 .
Summary of planting conditions and the training/testing datasets.

Table 1 .
Summary of planting conditions and the training/testing datasets.

Table 3 .
Reconstruction accuracy of example time series in Figure 4 (R 2 ).

Table 3 .
Reconstruction accuracy of example time series in Figure 4 (R 2 ).

Table 4 .
Confusion matrices for the CNN and random forest (RF).