Using a Hidden Markov Model for Improving the Spatial-Temporal Consistency of Time Series Land Cover Classification

Time series land cover maps play a key role in monitoring the dynamic change of land use. To obtain classification maps with better spatial-temporal consistency and classification accuracy, this study used an algorithm that incorporated information from spatial and temporal neighboring observations in a hidden Markov model (HMM) to improve the time series land cover maps initially produced by a support vector machine (SVM). To investigate the effects of different initial distributions and transition probability matrices on the classification of the HMM, we designed different experimental schemes with different input elements to verify this algorithm with Landsat and HJ satellite images. In addition, we introduced spatial weights into the HMM to make effective use of spatial information. The experimental results showed that the HMM considered that spatial weights could eliminate the vast majority of illogical land cover transition that may occur in previous pixel-wise classification, and that this model had obvious advantages in spatial-temporal consistency and classification accuracy over some existing classification models.


Introduction
The mapping of regional land cover is of great help to local construction and plan [1,2].In particular, time series land cover maps make great contributions to monitoring the dynamic change of land use and acquiring the long-term land cover information [3,4].Increasing demand for dynamic land cover information (i.e., annual dynamic land cover changes) has raised the demand for the quality of time series land cover products.However, there is great inconsistency in existing land cover products, which has brought many difficulties in practical applications [5][6][7].Inconsistent spatial-temporal land cover change is a common problem in the production of land cover maps, and the spatial-temporal consistency index is one of the commonly-used evaluation indices of time series land cover products [8].To develop a new classification strategy, the time dimension and spatial relationship of the derived land cover type should be taken into account in the production strategy or incorporates to the classification model.How to adapt a new classification strategy that incorporates the spatial-temporal consistency to the production of time series land cover products is very difficult at present.
The existing time series land cover products have many inconsistent pixels in different periods, which lead to a large number of illogical land cover transitions.Issues with image quality difference, classification algorithm, and category definition can contribute to this inconsistency [9].Time series land cover maps are usually produced independently at each time point.In fact, the classification methods that consider time series information are superior to the classification methods based on a single time point [10,11].However, studies on land cover mapping that consider spatial-temporal land cover change consistency are still limited.In some studies, experiments have been carried out by adding the change information or the rule of land cover change to the classification model or the post-processing model-in which, however, the land cover products were still produced independently for each period [12,13].Cai & Wang et al., considered "illogical land cover transition" into Maximum A Posterior-Markov Random Field (MAP-MRF) model which was proposed by Kasetkasem et al. [14], and improved the MODIS Collection 5 product with the spatial-temporal context information [15,16].Based on their research, Wehmann & Liu proposed a Spatial-Temporal Markov Kernel Method to be used in time series images classification [17].However, the above method cannot optimize the entire time series simultaneously, but only process the classification of each time point sequentially, and then get the final result after multiple iterations.Multiple parameters estimate that this greatly reduces the efficiency.In addition, none of the existing methods have quantified the transformation relationship between the land cover types according to the characteristics of the study area, or distinguished the transformation rate of different land cover types.
To consider the spatial-temporal consistency in the time series land cover mapping, in this study, we proposed a method to incorporate temporal and spatial information from neighboring observations into a hidden Markov model (HMM) to produce land cover maps from time series satellite images simultaneously.The temporal consistency, spatial continuity, and initial classification accuracy of each period were taken into account in the model.Section 2 briefly introduces the basic principles of HMM followed by the step-by-step HMM-based classification approach in Section 3. In Section 4, we designed different experimental schemes to compare and analyze the experimental results.In addition, we introduced spatial weight into HMM to make effective use of spatial information.Finally, we concluded that the HMM that considered spatial weight had obvious advantages in spatial-temporal consistency and classification accuracy.

Hidden Makov Models (HMMs)
HMMs are statistical learning models that can be used in a sequence labeling problem.It describes the process of randomly generating an unobservable state sequences from a hidden Markov chain and generating an observation from each state to produce an observable sequence [18].In the classification problem of time series remote sensing images, the state sequences represent the labels assigned to each pixel in the land cover map.The observation sequence represents the multispectral and multi-temporal reflectance associated with those pixels [19].
Unlike the conventional remote sensing classifiers, HMMs further consider the temporal information of each pixel, obtain the most possible sequence of pixels, and thus improve the temporal consistency and accuracy of the classification result.Formally, we identify the set of possible labels (e.g., water, cropland, forest, etc.) as Ω = {ω 1 , ω 2 , , ω K } (where K is the number of classes), and later the land cover label assigned to a pixel in each year as Y = {y 1 , y 2 , . . . ,y n }, (where n is the number of years in the time series), and X = {x 1 , x 2 , . . . ,x n } are the spectral measurements of pixel in each year, where x t is the vector of measurements associated with year t.Thus, the state sequence Y constitutes a hidden Markov chain, and the observed sequence X is only related to the corresponding state sequence, assuming that the land cover label of the pixel is only related to the last year's label.The dependence between Y and X constitutes a hidden Markov model.
Figure 1 provides a schematic of the HMM used in this paper.y t represents the land cover label of a single pixel in year t, and meets y t ∈ Ω. x t represents the remote sensing spectral measurements related to y t .The goal of HMMs is to infer the most likely state for each y t by giving the observed variables X.The hidden variable Y constitutes a first-order Markov chain in which the state of y t may transform into a different state of y t+1 with a probability defined by a transition matrix, where the probability that a pixel in state i will transform into state j is defined as P(ω j ω i ) .The probability of observing the measurement vector x t , after determining the true land cover label ω t , is given by P(x t |ω t ) .There are K possibilities of the land cover labels for each pixel, so P(x t |ω t ) is a K element vector of probabilities, where K is the number of land cover labels.
The probability of observing the measurement vector , after determining the true land cover label , is given by P( | ).There are K possibilities of the land cover labels for each pixel, so P( | ) is a K element vector of probabilities, where K is the number of land cover labels.In this paper, the initial probability distribution of the pixels π = P( | ) can be derived from the initial classification results of the SVM classifier [20,21].The observation probability of the pixels can be calculated from the probability output of the SVM classifier through the Bayesian theorem.The transition probability matrix is determined by the actual situation of the study area.After determining the three elements of HMMs: the initial probability distribution, the observation probability, and the transition probability matrix, we can estimate the joint probability of spectral measurement sequence of individual pixel by Equation (1): Salberg and Trier considered the utility of HMM algorithms in the context of forest classification and concluded by finding that the mostly likely state sequence (Viterbi algorithm) is the most appropriate for change detection and producing forest maps for specific times [22].Following this guidance, we used the Viterbi algorithm for this paper to determine the most likely state sequence = { , , … , } for each pixel in the study area, which was the most likely land cover labels for the n years of the pixel.

Data and Study Area
To map land cover of study area from 2007 to 2015, Landsat image data were mainly used, and Huan Jing Charge-coupled Device (HJ-CCD) image data were utilized as a supplement.Considering the image quality and acquisition time, we selected Landsat 5 image data from 2007 to 2011 and Landsat 8 image data from 2013 to 2015, the paths and rows of which range from 120 to 123, and 39 to 41, respectively.Due to the lack of enough Landsat data in 2012, we selected Huan Jing Chargecoupled Device (HJ-CCD) data in 2012 as a supplement.The coverage of a HJ-CCD image is larger than a Landsat image, and 33 images which cover the study area were chosen.In addition, the cloud cover of these images is not more than 3%.
Because the cloud cover in some Landsat images was high, we used a minimum synthesis method to obtain the cloud free time series.This means that if any pixel within the scope of the study area was covered by more than an image, we should choose the smallest band value of these for the pixel.
The study area is Poyang Lake Ecological Economic Region, located between 27 degrees and 31 degrees North and between 114 degrees and 118 degrees East.It is located in the middle and lower reaches of the Yangtze River south bank, north of Jiangxi Province, China, including 38 counties (cities, districts) and the whole of Poyang Lake.Its area is 5.12 million square kilometers.More details are presented in Figure 2. The study area was mainly covered by water, cropland, forest, artificial cover, and bare land [23], so we choose the 5 labels as the set of possible labels in our study.In this paper, the initial probability distribution of the pixels π = P(ω 1 |ω 0 ) can be derived from the initial classification results of the SVM classifier [20,21].The observation probability of the pixels can be calculated from the probability output of the SVM classifier through the Bayesian theorem.The transition probability matrix is determined by the actual situation of the study area.After determining the three elements of HMMs: the initial probability distribution, the observation probability, and the transition probability matrix, we can estimate the joint probability of spectral measurement sequence of individual pixel by Equation (1): P(x 1 , x 2 , . . . ,x n |Ω)P(Ω) = n ∏ t=1 P(ω t |ω t−1 )P(x t |ω t ) Salberg and Trier considered the utility of HMM algorithms in the context of forest classification and concluded by finding that the mostly likely state sequence (Viterbi algorithm) is the most appropriate for change detection and producing forest maps for specific times [22].Following this guidance, we used the Viterbi algorithm for this paper to determine the most likely state sequence Y = {y 1 , y 2 , . . . ,y n } for each pixel in the study area, which was the most likely land cover labels for the n years of the pixel.

Data and Study Area
To map land cover of study area from 2007 to 2015, Landsat image data were mainly used, and Huan Jing Charge-coupled Device (HJ-CCD) image data were utilized as a supplement.Considering the image quality and acquisition time, we selected Landsat 5 image data from 2007 to 2011 and Landsat 8 image data from 2013 to 2015, the paths and rows of which range from 120 to 123, and 39 to 41, respectively.Due to the lack of enough Landsat data in 2012, we selected Huan Jing Charge-coupled Device (HJ-CCD) data in 2012 as a supplement.The coverage of a HJ-CCD image is larger than a Landsat image, and 33 images which cover the study area were chosen.In addition, the cloud cover of these images is not more than 3%.
Because the cloud cover in some Landsat images was high, we used a minimum synthesis method to obtain the cloud free time series.This means that if any pixel within the scope of the study area was covered by more than an image, we should choose the smallest band value of these for the pixel.
The study area is Poyang Lake Ecological Economic Region, located between 27 degrees and 31 degrees North and between 114 degrees and 118 degrees East.It is located in the middle and lower reaches of the Yangtze River south bank, north of Jiangxi Province, China, including 38 counties (cities, districts) and the whole of Poyang Lake.Its area is 5.12 million square kilometers.More details are presented in Figure 2. The study area was mainly covered by water, cropland, forest, artificial cover, and bare land [23], so we choose the 5 labels as the set of possible labels in our study.The reference data were mainly collected by artificial visual interpretation from the images in Google Earth and field survey in 2016.Eventually, we selected 7,308 locations and recorded their ground information in term of class label, longitude, and latitude.The number of locations which were covered by water, cropland, forest, artificial cover, and bare land are (in order): 2972, 951, 1939, 1085 and 361.The locations of the samples are basically the same from 2007 to 2015.In addition, we culled a few locations where the land cover labels changed.In this paper, we used 5-fold cross validation for accuracy evaluation [24].The sample dataset D was randomly split into 5 mutually exclusive subsets , , , , of approximately equal size.The entire experiment including training, classification, and testing would be carried out 5 times.Each time we selected ∖ , ∈ {1, 2, 3, 4, 5} as training samples and as testing samples.

Methods
After obtaining the original remote sensing image of the study area, we used minimum value compositing for the same coverage area of the images in the same year, and then proceeded as follows: 1.In the first step, we used an SVM classifier (type: C-SVM, kernel function: radial basis function, penalty parameter C: 100, Gamma in kernel function: 0.01) to process the time series remote sensing images from 2007 to 2015 independently, and obtained the initial classification products containing classification result and probability vectors, which represented the estimated prior probability for each class in each year.2. We could get the probability distribution of each class in each year by analyzing classification result of the SVM classifier, and calculated the observation probability vector for each pixel in each year by combining the probability vectors from the SVM classification.3. Initial probability distribution of each class could be estimated by the above statistical results, and the transition probability matrix was determined according to the actual situation of the study area and the existing study results.4.After determining the three elements of HMM, the joint probability distribution of the spectral measurement vector of each pixel was calculated by the Viterbi algorithm.Finally, we chose the state sequence which made the joint probability to the maximum as the land cover labels for the n years of the pixel.5. Considering the spatial continuity of each pixel in the classification results and giving each class a spatial weight value W by the spatial weight model, step 4 would be repeated to obtain the final classification products.
Figure 3 shows the process details of this methodology in this paper.The reference data were mainly collected by artificial visual interpretation from the images in Google Earth and field survey in 2016.Eventually, we selected 7,308 locations and recorded their ground information in term of class label, longitude, and latitude.The number of locations which were covered by water, cropland, forest, artificial cover, and bare land are (in order): 2972, 951, 1939, 1085 and 361.The locations of the samples are basically the same from 2007 to 2015.In addition, we culled a few locations where the land cover labels changed.In this paper, we used 5-fold cross validation for accuracy evaluation [24].The sample dataset D was randomly split into 5 mutually exclusive subsets D 1 , D 2 , D 3 , D 4 , D 5 of approximately equal size.The entire experiment including training, classification, and testing would be carried out 5 times.Each time we selected D D t , t ∈ {1, 2, 3, 4, 5} as training samples and D t as testing samples.

Methods
After obtaining the original remote sensing image of the study area, we used minimum value compositing for the same coverage area of the images in the same year, and then proceeded as follows: 1.
In the first step, we used an SVM classifier (type: C-SVM, kernel function: radial basis function, penalty parameter C: 100, Gamma in kernel function: 0.01) to process the time series remote sensing images from 2007 to 2015 independently, and obtained the initial classification products containing classification result and probability vectors, which represented the estimated prior probability for each class in each year.2.
We could get the probability distribution of each class in each year by analyzing classification result of the SVM classifier, and calculated the observation probability vector for each pixel in each year by combining the probability vectors from the SVM classification.

3.
Initial probability distribution of each class could be estimated by the above statistical results, and the transition probability matrix was determined according to the actual situation of the study area and the existing study results.4.
After determining the three elements of HMM, the joint probability distribution of the spectral measurement vector of each pixel was calculated by the Viterbi algorithm.Finally, we chose the state sequence which made the joint probability to the maximum as the land cover labels for the n years of the pixel.

5.
Considering the spatial continuity of each pixel in the classification results and giving each class a spatial weight value W by the spatial weight model, step 4 would be repeated to obtain the final classification products.
Figure 3 shows the process details of this methodology in this paper.

Initial Distribution and Observation Probability
Based on the initial classification results produced by the SVM, we selected 3 initial distributions: the average class probability distribution from 2007 to 2015 marked as Ini_1, uniform distribution marked as Ini_2, and the class probability distribution in 2007 marked as Ini_3.More details are shown in Table 1.The observation probability is obtained from the Bayesian formula like Equation (2): Among them, ( | ) is the probability vector from the SVM classification, ( ) is the proportion of each class in year t, and ( ) is a constant value for the pixel.Thus, we use to approximate the observation probability P( | ) in general.

Transition Probability Matrix
The transition probability matrix determines the probability that a pixel transfers from a class to another class between two years, which mainly related to the objective conditions such as the geographical location of the study area, the development plan, and the climate environment [25,26].

Initial Distribution and Observation Probability
Based on the initial classification results produced by the SVM, we selected 3 initial distributions: the average class probability distribution from 2007 to 2015 marked as Ini_1, uniform distribution marked as Ini_2, and the class probability distribution in 2007 marked as Ini_3.More details are shown in Table 1.The observation probability is obtained from the Bayesian formula like Equation (2): Among them, P(ω t |x) is the probability vector from the SVM classification, P(ω t ) is the proportion of each class in year t, and P(x) is a constant value for the pixel.Thus, we use P(ω t |x) P(ω t ) to approximate the observation probability P(x|ω t ) in general.

Transition Probability Matrix
The transition probability matrix determines the probability that a pixel transfers from a class to another class between two years, which mainly related to the objective conditions such as the geographical location of the study area, the development plan, and the climate environment [25,26].For example, the Poyang Lake Ecological Economic Zone has a large number of rivers changes easily between water bodies to bare land.Due to the expansion of the city, the change from bare land to artificial cover is relatively reasonable between two years.Changes from forest to water and from artificial cover to cropland are unlikely to happen between two years.
Generally, the transition probability matrix can vary from pixel to pixel and over time.Therefore, it is particularly difficult to obtain the true transition probability matrix for each year of the study area.In this paper, we restricted our transition matrix to a simplified form: a constant transition probability applied to all pixels from 2007 to 2015.Based on the existing study results [27,28], we focused on the complexity of transition between any two classes, and divided the probability of transition into 6 levels: 99%, 90%, 6%, 3%, <1%, and 0.01%.Then, we gave the probability values of different sizes for each case.For reference, Mingxia Zhu's research results indicate that 3.28% of water, 2.28% of cropland, 0.70% of forest, 1.73% of artificial cover, 7.52% of sand, and 11.11% of beach in Poyang Lake Ecological Economic Region changed between 2008 and 2013 [29].Another simplified approach to the transition probability matrix is to define the transition probability for each class to itself as 90%, like the study of Abercrombie S P & Friedl M A [30], and to other K-1 classes as 10%/(K-1).More details are shown in Table 2.Because the joint probability in the HMM is the product form, the transition probability cannot be 0, or it will seriously interfere with the result and affect the accuracy of classification.A transition probability less than 1% in Table 2 was defined as an illogical transition between two years, which may include: water forest, water→artificial cover, cropland→forest, cropland→artificial cover, forest→water, forest→cropland, forest→artificial cover, forest→bare land, artificial cover→water, artificial cover→cropland, artificial cover→forest, artificial cover→bare land, bare land→cropland, bare land→forest.The proportion of illogic transition was used as one of the important indicators to evaluate the classification results.

Spatial Weight Analysis
Besides the time information of the remote sensing image that was taken into account in the HMM, we further considered the spatial distribution of land cover of each pixel.According to local binary pattern (LBP) theory [31,32], we added a corresponding spatial weight w to each pixel in each year, which could reduce illogical phenomena in the classification process to improve the classification accuracy.
The spatial weight W is related to the number of pixels that are consistent with the object class around the target pixel and whether it is adjacent to the surrounding pixels.In this paper, we adopted 8-neighbourhood, as shown in Figure 4: A1-A8 are 8 pixels around the target pixel A. The number of pixels that are consistent in the class of target pixel A is defined as N A .The number of land cover label change in the chain A1→A2→A3→ . . .→A7→A8→A1 is defined as V A , which depends on whether the land cover classes of the surrounding pixels are distributed continuously.Therefore, the spatial weight W should satisfy the following relationship: . The corresponding spatial weight model is: where α, β are model parameters and α > 0. In this paper, we assumed that the spatial weight model in each year keep independent in the parameters.
ISPRS Int.J. Geo-Inf.2017, 6, 292 7 of 14 the land cover classes of the surrounding pixels are distributed continuously.Therefore, the spatial weight W should satisfy the following relationship: ∝ .The corresponding spatial weight model is: where , are model parameters and > 0. In this paper, we assumed that the spatial weight model in each year keep independent in the parameters.The model parameters for each year were obtained through the training data.Then, we calculated the value of spatial weight w of each pixel and used it to update the joint probability of each Markov chain.Finally, the HMM was applied again to get the final classification products.In this paper, the spatial distribution with a ≤ was defined as an unreasonable distribution.The proportion of unreasonable distribution was considered as one of the important indicators to evaluate the classification results.

Method for Spatial Weight Parameter Estimation
In order to make effective use of spatial information to optimize classification accuracy, we proposed a simple and fast method for estimation the model parameters automatically: a search method of variant step length.Let us consider for a set of M training samples _ ( = 1,2, … , ) from 2007 to 2015.
In the second step, the nine sets of optional parameter combinations were substituted into the spatial weight model to compute the spatial weight W of each training sample in each year.After that, we brought the spatial weight W into the HMM to obtain the classification results of each parameter combination.
The third step was to compute the classification accuracy of nine sets of optional parameter combinations by means of the training sample set.Then, we chose the parameter combination with the highest classification accuracy as the new iterative initial value and the new step length, and were reduced by half.The fourth step was to judge whether the iteration was over.If the step length was less than the predetermined value , the parameter combination at this point was the optimal parameter combination, otherwise return to the first step.The model parameters for each year were obtained through the training data.Then, we calculated the value of spatial weight w of each pixel and used it to update the joint probability of each Markov chain.Finally, the HMM was applied again to get the final classification products.In this paper, the spatial distribution with a N A V A ≤ 1 2 was defined as an unreasonable distribution.The proportion of unreasonable distribution was considered as one of the important indicators to evaluate the classification results.

Method for Spatial Weight Parameter Estimation
In order to make effective use of spatial information to optimize classification accuracy, we proposed a simple and fast method for estimation the model parameters automatically: a search method of variant step length.Let us consider for a set of M training samples S_i(i = 1, 2, . . ., M) from 2007 to 2015.
The first step of the method consisted in determining the iterative initial value of α and β, and the corresponding initial step length H α and H β .Then, we found the nine sets of optional parameter combinations In the second step, the nine sets of optional parameter combinations were substituted into the spatial weight model to compute the spatial weight W of each training sample S i in each year.After that, we brought the spatial weight W into the HMM to obtain the classification results of each parameter combination.
The third step was to compute the classification accuracy of nine sets of optional parameter combinations by means of the training sample set.Then, we chose the parameter combination with the highest classification accuracy as the new iterative initial value and the new step length, H α and H β were reduced by half.
The fourth step was to judge whether the iteration was over.If the step length was less than the predetermined value H θ , the parameter combination at this point was the optimal parameter combination, otherwise return to the first step.

Experimental Schemes
HMM has three elements: initial distribution, transition probability matrix, and observation probability.To compare and analyze the influence of different elements of the HMM on the classification results, we designed different experimental plans, as shown in Table 3.There are 3 initial distributions in Table 1, and 2 transition probability matrices in Table 2.The combinations between these two elements form Plan 2-Plan 7. Plan 1 was used as a reference without any post-processing.In Plan 8, we considered the spatial relationship of each pixel for each year based on the classification results of Plan 2. The spatial weight model parameters α and β in each year were obtained by 54 iterations, and the optimal parameters were: α = (0.20, 0.17, 0.22, 0.16, 0.19, 0.19, 0.20, 0.20, 0.17) T , β = (1.88, 1.93, 1.86, 1.89, 1.87, 1.91, 1.85, 1.88, 1.90) T .

Evaluation of Land Cover Maps
Figure 5 shows the classification results of the SVM classifier and post-processing with the HMM for the time series remote sensing images from 2007 to 2015.Due to limited space, only the results of Plan 1 and Plan 8 are shown here.We can see that many pixels of the study area changed in their land cover labels within one year, which is not likely to happen in reality.Especially in the southwest of the study area, where geographical environment is complex and a large area of illogical transitions is more common.It is proved that the classification results of SVM classifier cannot ensure the temporal consistency of the land cover label with many pixels with illogical transitions, resulting in the relatively low overall accuracy of classification results.In comparison, the classification results of Figure 5b post-processed with the HMM significantly reduced the illogical transitions.We can see that most of land cover labels showed good temporal consistency and the classification accuracy improved.In addition, the HMM in Plan 8 considered spatial weight, so the classification results also have a good performance in spatial continuity.Figure 6 shows the number of year-to-year changes in land cover labels at each pixel from 2007 to 2015 from 8 plans in a 100% stacked bar chart.In the maps produced by SVM classifier, only 18.91% of pixels can remain the same land cover label from 2007 to 2015, and 23.36% of pixels changed more than 5 times from 2007 to 2015-these pixels change their labels almost every year.The results confirm that the SVM classifier cannot guarantee the temporal consistency of the land cover label.The maps of Plan 2-8 show a low level of change from 2007 to 2015, which have been processed by using the HMM.The percent of pixels that remained unchanged in land cover label rose to 73% (Plan 2, Plan 3, Plan 4, and Plan 8), and 55% (Plan 5, Plan 6, and Plan 7).Most importantly, the percent of pixels that change more than 5 times sharply reduced to about 0.03% (Plan 2, Plan 3, Plan 4, and Plan 8), and 0.16% (Plan 5, Plan 6, and Plan 7).The results indicate that, when the HMM is used, most of the pixels change no more than once, which appeared to be close to the actual land cover change in the study area.We can also find that the performance of applying the transition probability matrix TRANS_1 (Plan 2, Plan 3, Plan 4, and Plan 8) is better than TRANS_2 (Plan 5, Plan 6, and Plan 7), and the difference of the initial distribution has little effect on the classification result.The classification result of adding the spatial information (Plan 8) is slightly better than that without any spatial constraint in terms of temporal consistency.Figure 6 shows the number of year-to-year changes in land cover labels at each pixel from 2007 to 2015 from 8 plans in a 100% stacked bar chart.In the maps produced by SVM classifier, only 18.91% of pixels can remain the same land cover label from 2007 to 2015, and 23.36% of pixels changed more than 5 times from 2007 to 2015-these pixels change their labels almost every year.The results confirm that the SVM classifier cannot guarantee the temporal consistency of the land cover label.The maps of Plan 2-8 show a low level of change from 2007 to 2015, which have been processed by using the HMM.The percent of pixels that remained unchanged in land cover label rose to 73% (Plan 2, Plan 3, Plan 4, and Plan 8), and 55% (Plan 5, Plan 6, and Plan 7).Most importantly, the percent of pixels that change more than 5 times sharply reduced to about 0.03% (Plan 2, Plan 3, Plan 4, and Plan 8), and 0.16% (Plan 5, Plan 6, and Plan 7).The results indicate that, when the HMM is used, most of the pixels change no more than once, which appeared to be close to the actual land cover change in the study area.We can also find that the performance of applying the transition probability matrix TRANS_1 (Plan 2, Plan 3, Plan 4, and Plan 8) is better than TRANS_2 (Plan 5, Plan 6, and Plan 7), and the difference of the initial distribution has little effect on the classification result.The classification result of adding the spatial information (Plan 8) is slightly better than that without any spatial constraint in terms of temporal consistency.

Illogical Transition
The proportion of the illogical transitions can be used to evaluate the classification results as an important indicator.Figure 7 shows the distribution of illogical transition of Plan1, Plan 2, and Plan 8, in which the gray value represents the frequency of illogical transition at each pixel in 8 years.Figure 8 shows the proportion of illogical transition from 2007 to 2015 in a 100% stacked bar chart.We can see that the frequency of illogical transition in Plan 1 is significantly higher than those in Plan 2 and Plan 8.The results post-processed with HMM can greatly reduce the illogical transition, compared to the results produced by SVM classifier.As can be seen from Figure 8, the proportion of illogical transition from 2007 to 2015 in Plan 1 is 29.44%, which reduce to about 7.7% in Plan 5, Plan 6, and Plan 7 and reduce to about 2.6% in Plan 2, Plan 3, and Plan 4, and 2.4% in Plan 8.These results suggest that the HMM can remove some of the pixels with illogical transitions, thereby reducing illogical transitions of land cover label in the time series, and it can greatly improve the accuracy of the classification results.

Illogical Transition
The proportion of the illogical transitions can be used to evaluate the classification results as an important indicator.Figure 7 shows the distribution of illogical transition of Plan1, Plan 2, and Plan 8, in which the gray value represents the frequency of illogical transition at each pixel in 8 years.Figure 8 shows the proportion of illogical transition from 2007 to 2015 in a 100% stacked bar chart.We can see that the frequency of illogical transition in Plan 1 is significantly higher than those in Plan 2 and Plan 8.The results post-processed with HMM can greatly reduce the illogical transition, compared to the results produced by SVM classifier.As can be seen from Figure 8, the proportion of illogical transition from 2007 to 2015 in Plan 1 is 29.44%, which reduce to about 7.7% in Plan 5, Plan 6, and Plan 7 and reduce to about 2.6% in Plan 2, Plan 3, and Plan 4, and 2.4% in Plan 8.These results suggest that the HMM can remove some of the pixels with illogical transitions, thereby reducing illogical transitions of land cover label in the time series, and it can greatly improve the accuracy of the classification results.

Illogical Transition
The proportion of the illogical transitions can be used to evaluate the classification results as an important indicator.Figure 7 shows the distribution of illogical transition of Plan1, Plan 2, and Plan 8, in which the gray value represents the frequency of illogical transition at each pixel in 8 years.Figure 8 shows the proportion of illogical transition from 2007 to 2015 in a 100% stacked bar chart.We can see that the frequency of illogical transition in Plan 1 is significantly higher than those in Plan 2 and Plan 8.The results post-processed with HMM can greatly reduce the illogical transition, compared to the results produced by SVM classifier.As can be seen from Figure 8, the proportion of illogical transition from 2007 to 2015 in Plan 1 is 29.44%, which reduce to about 7.7% in Plan 5, Plan 6, and Plan 7 and reduce to about 2.6% in Plan 2, Plan 3, and Plan 4, and 2.4% in Plan 8.These results suggest that the HMM can remove some of the pixels with illogical transitions, thereby reducing illogical transitions of land cover label in the time series, and it can greatly improve the accuracy of the classification results.

Spatial Continuity and Accuracy Evaluation
The spatial continuity of the pixels can be used to evaluate the classification results.The HMM generally only considers the temporal information of each pixel in the time series remote sensing images, without considering the spatial relationship between the pixel and surrounding pixels.Therefore, HMM can only improve the spatial continuity by correcting the false land cover labels of the pixels.When we used the HMM that considered the spatial weight to post-process the classification results, the spatial continuity can be improved.Figure 9 shows the proportion of unreasonable spatial distribution in all plans from 2007 to 2015 as a histogram.
As can be seen from Figure 9, the proportion of unreasonable spatial distribution in Plan 1-7 is about 4.5%, which drops to 2.92% in Plan 8.This confirms the above conclusion-the HMM that considered spatial weight has a good performance in improving the spatial continuity of pixels.Finally, Table 4 shows the average classification accuracy of Plan 1-8.Because these data were manually collected and subjective, the accuracies are all high (>90%).However, the classification results post-processed with HMM (Plan 2-Plan 8) are higher than those produced by SVM classifier (Plan 1).The transition probability matrix defined by actual situation has a better performance than a common simplified process (Plan 2, Plan 3, Plan 4, and Plan 8 are all higher than Plan 5, Plan 6, and Plan 7), which is consistent with our previous conclusions.In short, HMM can use the temporal information of time series images to greatly reduce the illogical transitions in the classification process and improve the temporal consistency of the classification results.When we bring in the spatial weight, HMM can also improve the spatial continuity of the pixels to get a better classification result, with a higher than average classification accuracy of 97.3%.

Spatial Continuity and Accuracy Evaluation
The spatial continuity of the pixels can be used to evaluate the classification results.The HMM generally only considers the temporal information of each pixel in the time series remote sensing images, without considering the spatial relationship between the pixel and surrounding pixels.Therefore, HMM can only improve the spatial continuity by correcting the false land cover labels of the pixels.When we used the HMM that considered the spatial weight to post-process the classification results, the spatial continuity can be improved.Figure 9 shows the proportion of unreasonable spatial distribution in all plans from 2007 to 2015 as a histogram.
As can be seen from Figure 9, the proportion of unreasonable spatial distribution in Plan 1-7 is about 4.5%, which drops to 2.92% in Plan 8.This confirms the above conclusion-the HMM that considered spatial weight has a good performance in improving the spatial continuity of pixels.

Spatial Continuity and Accuracy Evaluation
The spatial continuity of the pixels can be used to evaluate the classification results.The HMM generally only considers the temporal information of each pixel in the time series remote sensing images, without considering the spatial relationship between the pixel and surrounding pixels.Therefore, HMM can only improve the spatial continuity by correcting the false land cover labels of the pixels.When we used the HMM that considered the spatial weight to post-process the classification results, the spatial continuity can be improved.Figure 9 shows the proportion of unreasonable spatial distribution in all plans from 2007 to 2015 as a histogram.
As can be seen from Figure 9, the proportion of unreasonable spatial distribution in Plan 1-7 is about 4.5%, which drops to 2.92% in Plan 8.This confirms the above conclusion-the HMM that considered spatial weight has a good performance in improving the spatial continuity of pixels.Finally, Table 4 shows the average classification accuracy of Plan 1-8.Because these data were manually collected and subjective, the accuracies are all high (>90%).However, the classification results post-processed with HMM (Plan 2-Plan 8) are higher than those produced by SVM classifier (Plan 1).The transition probability matrix defined by actual situation has a better performance than a common simplified process (Plan 2, Plan 3, Plan 4, and Plan 8 are all higher than Plan 5, Plan 6, and Plan 7), which is consistent with our previous conclusions.In short, HMM can use the temporal information of time series images to greatly reduce the illogical transitions in the classification process and improve the temporal consistency of the classification results.When we bring in the spatial weight, HMM can also improve the spatial continuity of the pixels to get a better classification result, with a higher than average classification accuracy of 97.3%.Finally, Table 4 shows the average classification accuracy of Plan 1-8.Because these data were manually collected and subjective, the accuracies are all high (>90%).However, the classification results post-processed with HMM (Plan 2-Plan 8) are higher than those produced by SVM classifier (Plan 1).The transition probability matrix defined by actual situation has a better performance than a common simplified process (Plan 2, Plan 3, Plan 4, and Plan 8 are all higher than Plan 5, Plan 6, and Plan 7), which is consistent with our previous conclusions.In short, HMM can use the temporal information of time series images to greatly reduce the illogical transitions in the classification process and improve the temporal consistency of the classification results.When we bring in the spatial weight, HMM can also improve the spatial continuity of the pixels to get a better classification result, with a higher than average classification accuracy of 97.3%.

Conclusions
In this study, we considered the time series remote sensing images of Poyang Lake Ecological Economic Region from 2007 to 2015 as study subjects.By investigating the geographical environment of the study area and analyzing the common illogical transitions in the images, we obtained the transition probability matrix with expert knowledge.To improve the quality of the classification results produced by common remote sensing classifiers (SVM), we applied HMM that considered temporal and spatial information to timing annotation.After comparing the experimental results, we found that the methods significantly reduced the frequency of illogical transitions and effectively improved the spatial continuity of the time series land cover maps.The improved results of the HMM with spatial weight showed its ability in enhancing the quality of the time series regional land cover maps.In the past, most land cover maps based on remote sensing images had focused on single dates.With the emergence of more time series images, the time series land cover products and land cover change products should take advantage of temporal information from satellite data [33][34][35].The methodology we described in this paper can provide a simple and effective approach for time series land cover classification.
The HMM was employed as the main method of analysis in this study, which contributes to the understanding of land cover change in the study area and thus provides information for city planners.Furthermore, our application also demonstrates the great potential of the directed graph model in time series land cover classification [15,36].In order to meet the needs of higher temporal resolution, it is worthwhile to consider seasonal factors in our future studies.In addition, more statistical models for timing annotation problems are worthy of further study.

Figure 6 .
Figure 6.100% stacked bar chart showing the number of label changes for pixels, 2007-2015.

Figure 6 .
Figure 6.100% stacked bar chart showing the number of label changes for pixels, 2007-2015.

Figure 6 .
Figure 6.100% stacked bar chart showing the number of label changes for pixels, 2007-2015.

Table 1 .
Initial probability distribution of 5 land cover classes.

Table 1 .
Initial probability distribution of 5 land cover classes.

Table 2 .
Transition probability matrix.(a) defined by actual situation, marked as TRANS_1; (b) a simplified process, marked as TRANS_2.

Table 4 .
Average classification accuracy of all plans by 5-fold cross validation.