Indication of the Two Linear Correlation Methods Between Vegetation Index and Climatic Factors: An Example in the Three River-Headwater Region of China During 2000–2016

The within-growing-season correlation (WGSC) and the inter-growing-season correlation (IGSC) are two widely used linear correlation methods for relating a vegetation index to climatic factors such as temperature and precipitation. The WGSC method calculates the linear correlation coefficient between the vegetation index and a climatic factor using every month of all the growing seasons; for instance, if both the vegetation index and temperature have 204 monthly values (12 months × 17 years) for 2000–2016, all 204 pairs are used to obtain the WGSC. The IGSC method calculates the linear correlation coefficient using the same month of each growing season across all the years; for example, only 17 pairs of vegetation index and temperature values for 2000–2016 are used to obtain the IGSC. What is the difference between the results of the two methods, and why does it arise? Which method is more suitable for analyzing the relationship between the vegetation index and climatic conditions? To clarify the difference between the two methods and to further explore the relationship between the vegetation index and climatic factors, we collected moderate resolution imaging spectroradiometer (MODIS) 13A1 normalized difference vegetation index (NDVI) data and meteorological data (temperature and precipitation) for 2000–2016, and calculated the WGSC and IGSC between NDVI and the climatic factors in the Three River-Headwater Region of China. The results showed that: (1) the correlation coefficient of WGSC depended on how many years of data were included and increased as more years were added, whereas the correlation coefficients of IGSC were relatively independent of the amount of data; (2) the WGSC showed a pseudo linear correlation between NDVI and climatic conditions caused by the accumulation of data, while the IGSC can more accurately indicate the impact of climatic factors on vegetation because it does not rely on the amount of data.


Introduction
As an important part of terrestrial ecosystems, vegetation is not only an indicator of global and regional environmental change but also plays an important role in regulating regional and global climate [1–4]. A comprehensive understanding of changes in vegetation growth and their responses to climate change is important for predicting future trends in vegetation growth, regional environmental conditions, and ecosystem evolution [5–7]. The normalized difference vegetation index (NDVI) is a good indicator of the state of vegetation growth and is closely related to ground biomass [8–10]. It has been widely used to study vegetation activity and growth [11–14] and, in many studies, to analyze the relationship between vegetation and climate change [15–21].
The within-growing-season correlation (WGSC) and the inter-growing-season correlation (IGSC) are two widely used methods for exploring the relationship between a vegetation index and climatic factors [16,17,21–26]. The WGSC is the correlation coefficient between the vegetation index and a climatic factor calculated over all months of all growing seasons, while the IGSC is the correlation coefficient calculated for the same month of each growing season. For instance, for the entire year, NDVI and temperature each have 204 monthly values (12 months × 17 years) during 2000-2016, and all of them are used to obtain the WGSC. In contrast, only 17 pairs of NDVI and temperature values during 2000-2016 are used to obtain the IGSC for a given month. What is the difference between the results of the two methods, and why does this difference arise? Which method (WGSC or IGSC) is more suitable for exploring the relationship between NDVI and climatic conditions? These questions require further analysis.
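The difference in how the two methods select their samples can be sketched in a few lines of Python. This is a minimal illustration and not the processing chain used in the study: the data are synthetic, the growing season is assumed to be May to September, and the names `pearson_r`, `wgsc`, and `igsc` are hypothetical helpers introduced here. Both synthetic series share a seasonal cycle, while their year-to-year anomalies are drawn independently of each other.

```python
import math
import random

def pearson_r(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

YEARS = range(2000, 2017)   # 17 years
MONTHS = [5, 6, 7, 8, 9]    # assumed growing season, May to September

# Synthetic data (an assumption for illustration, not the study's data):
# a shared seasonal cycle plus independent interannual noise.
rng = random.Random(0)
season_t = {5: 2.0, 6: 6.0, 7: 9.0, 8: 8.0, 9: 4.0}        # temperature, deg C
season_v = {5: 0.20, 6: 0.35, 7: 0.50, 8: 0.45, 9: 0.30}   # NDVI
temp = {(y, m): season_t[m] + rng.gauss(0, 1.0) for y in YEARS for m in MONTHS}
ndvi = {(y, m): season_v[m] + rng.gauss(0, 0.03) for y in YEARS for m in MONTHS}

def wgsc(ndvi, temp, years, months):
    """Pool every growing-season month of every year into one sample."""
    keys = [(y, m) for y in years for m in months]
    return pearson_r([ndvi[k] for k in keys], [temp[k] for k in keys])

def igsc(ndvi, temp, years, month):
    """Use only one calendar month, one pair per year."""
    keys = [(y, month) for y in years]
    return pearson_r([ndvi[k] for k in keys], [temp[k] for k in keys])

r_wgsc = wgsc(ndvi, temp, YEARS, MONTHS)                     # 85 pairs
r_igsc = {m: igsc(ndvi, temp, YEARS, m) for m in MONTHS}     # 17 pairs each

print(f"WGSC (85 pairs): R = {r_wgsc:.2f}")
for m, r in r_igsc.items():
    print(f"IGSC month {m} (17 pairs): R = {r:.2f}")
```

In this synthetic setup the pooled coefficient is high mainly because both series rise and fall with the season, whereas the per-month coefficients reflect only the interannual anomalies.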
To answer the above questions, we took the Three River-Headwater Region as a case study, collected moderate resolution imaging spectroradiometer (MODIS) 13A1 NDVI data and meteorological data for 2000-2016, and calculated the WGSC and IGSC between NDVI and the climatic factors (temperature and precipitation). The main aims are (1) to explore and compare the relationships between NDVI and climatic factors given by the linear correlation coefficient (R) at various monthly time scales and (2) to select the method that is more suitable for analyzing the NDVI-climate relationship.

Study Area
The Three River-Headwater Region (31°39′-36°12′ N, 89°45′-102°23′ E) is located in the south of the Qinghai-Tibet Plateau and is the birthplace of the Yangtze River, the Yellow River, and the Lancang River. The annual average temperature ranges from −5.6 to 3.8 °C, and the annual precipitation ranges from 262.2 to 772.8 mm. The climate of the region is dominated by the continental climate of the plateau, with obvious alpine characteristics. Precipitation is unevenly distributed within the year and concentrated in summer, so that rain and heat occur in the same period, and it generally decreases from southeast to northwest. The source area of the Yellow River is located in the east, with an elevation of about 4000 m; the central and western parts are the source area of the Yangtze River, with an elevation of more than 4500 m; the central and southern parts are the source area of the Lancang River, with an average elevation of about 4400 m.

Data Sources
MODIS 13A1 (moderate resolution imaging spectroradiometer) data were used in this study. The 16-day, 500-m gridded level-3 product in the Sinusoidal projection for 2000 to 2016 was downloaded from NASA (www.earthdata.nasa.gov). Radiometric correction, geometric correction, and image enhancement were conducted on the dataset. We then used the MODIS Reprojection Tool (MRT) to convert the format and projection of the NDVI data. The maximum value composite method was applied to synthesize monthly data in order to minimize the effects of cloud and monthly phenology [27]. Finally, we mosaicked the images and clipped them to the study-area boundary in batch using ArcGIS with Python, which produced the growing-season NDVI image time series for the study area. Because most of the Three River-Headwater Region lies at high altitude, NDVI in most of the region begins to increase when the daily average temperature rises above 0 °C in early May, reaches its maximum in July, and then gradually declines. The vegetation withers and stops growing at the end of September. The growing season in the study area is therefore May to September [28,29].
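The maximum value composite step can be sketched as follows. This is a minimal illustration under stated assumptions, not the ArcGIS-Python workflow used in the study: each 16-day composite is assigned to the calendar month of its start date, grids are plain nested lists with `None` marking missing pixels, and `monthly_mvc` is a hypothetical helper name. The dates and values in the example are made up.

```python
from collections import defaultdict
from datetime import date

def monthly_mvc(composites):
    """Per-pixel monthly maximum of 16-day NDVI grids.

    composites : list of (start_date, grid) pairs, where grid is a list of
                 rows and missing pixels are None. All grids share one shape.
    Returns a dict mapping (year, month) to the composited grid.
    """
    groups = defaultdict(list)
    for start, grid in composites:
        # Assumption: a composite belongs to the month of its start date.
        groups[(start.year, start.month)].append(grid)

    monthly = {}
    for key, grids in groups.items():
        n_rows, n_cols = len(grids[0]), len(grids[0][0])
        out = []
        for r in range(n_rows):
            row = []
            for c in range(n_cols):
                valid = [g[r][c] for g in grids if g[r][c] is not None]
                # Keep the highest NDVI seen in the month; cloud-affected
                # observations tend to have lower NDVI and are discarded.
                row.append(max(valid) if valid else None)
            out.append(row)
        monthly[key] = out
    return monthly

# Toy 2 x 2 example with made-up values
composites = [
    (date(2016, 7, 11), [[0.41, None], [0.30, 0.52]]),
    (date(2016, 7, 27), [[0.45, 0.38], [0.28, None]]),
    (date(2016, 8, 12), [[0.40, 0.36], [0.27, 0.50]]),
]
mvc = monthly_mvc(composites)
print(mvc[(2016, 7)])  # [[0.45, 0.38], [0.3, 0.52]]
```

Taking the per-pixel maximum rather than the mean is what lets the composite suppress cloud contamination, since a single clear observation in the month is enough to set the value.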
The vegetation type and distribution map (1:4,000,000) was obtained from the National Cryosphere Desert Data Center [30]. The vegetation types in the region were classed as alpine plant, typical steppe, alpine steppe, and meadow steppe, as shown in Figure 1. Because alpine plant covers only a very small part of the study area and no meteorological data were available for it, this study analyzed only typical steppe, alpine steppe, and meadow steppe (Figure 1). The alpine steppe is mainly distributed in the northwest and southeast of the source region and consists of clumps of hardy grasses. The alpine steppe covers almost two thirds of the source area and is composed of cold-tolerant herbaceous plants. The typical steppe is scattered in the south of the Lancang River source area and the east of the Yellow River source area and is mainly composed of cold-tolerant mesophilic herbaceous plants. The monthly average temperature (T) and monthly precipitation (P) in the vegetation growing season during 2000-2016 were obtained for three meteorological stations (Wudaoliang, Nangqian, and Henan) in the Three River-Headwater Region from the Meteorological Information Center of the China Meteorological Administration.

Calculation of NDVI Around the Weather Station
To avoid interference from urban buildings and human activity on the vegetation in the vicinity of each site, the mean NDVI was calculated within a radius of 25 km around each weather station [31,32]. According to the vegetation type map, the vegetation types near the Wudaoliang, Nangqian, and Henan meteorological stations are mainly alpine steppe, meadow steppe, and typical steppe.
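The station-centred averaging can be sketched as below. This is an illustrative simplification and not the GIS procedure used in the study: it assumes a square grid of 500-m pixels in a projected coordinate system, measures distance between pixel centres, marks missing pixels with `None`, and uses the hypothetical helper name `buffer_mean`.

```python
import math

def buffer_mean(grid, station_rc, radius_m=25_000.0, pixel_m=500.0):
    """Mean of valid pixels whose centres lie within radius_m of the station.

    grid       : list of rows of NDVI values, with None marking missing pixels
    station_rc : (row, col) of the pixel that contains the station
    Returns None when no valid pixel falls inside the circle.
    """
    r0, c0 = station_rc
    reach = int(radius_m // pixel_m)  # half-width of the search window, in pixels
    total, count = 0.0, 0
    for r in range(max(0, r0 - reach), min(len(grid), r0 + reach + 1)):
        for c in range(max(0, c0 - reach), min(len(grid[r]), c0 + reach + 1)):
            value = grid[r][c]
            if value is None:
                continue
            # Keep only pixels inside the circle, not the whole square window.
            if math.hypot(r - r0, c - c0) * pixel_m <= radius_m:
                total += value
                count += 1
    return total / count if count else None

# Toy 5 x 5 grid: 0.5 inside a 1-km circle around the centre, 0.9 outside it
toy = [[0.5 if (r - 2) ** 2 + (c - 2) ** 2 <= 4 else 0.9 for c in range(5)]
       for r in range(5)]
toy_mean = buffer_mean(toy, (2, 2), radius_m=1000.0, pixel_m=500.0)
print(toy_mean)  # 0.5
```

With the default arguments the window spans 50 pixels on each side of the station, which corresponds to the 25-km radius at 500-m resolution.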

Pearson Correlation Analysis
Pearson correlation analysis was applied to obtain the linear correlation coefficient between variables (Formula (1)):

$$R = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^{2}}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^{2}}} \qquad (1)$$

The range of R is from −1 to 1. If R > 0, there is a positive correlation, while if R < 0, there is a negative one. If R is equal or close to 0, there is no linear correlation between the two variables. In the formula, n is the number of samples; $x_i$ is NDVI in the $i$th month; $y_i$ is the climatic factor in the $i$th month; and $\bar{x}$ and $\bar{y}$ represent the means of x and y.
The significance of R was tested with a t test, using the following formula:

$$t = \frac{R\sqrt{n-2}}{\sqrt{1-R^{2}}}$$

where t has n − 2 degrees of freedom. In this study, we analyzed R between NDVI and the climatic factors (temperature and precipitation) using two methods, WGSC and IGSC. Both were calculated at different time scales (differing in how many years of data were included), as shown in Table 1.
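The significance test can be sketched in Python as follows. The sketch assumes the usual form of the t test for a Pearson coefficient, t = R√(n − 2)/√(1 − R²) with n − 2 degrees of freedom, and a two-tailed test; the source does not state whether the test was one- or two-tailed. The critical t values for 15 degrees of freedom are taken from standard t tables, and the function names are hypothetical.

```python
import math

def t_statistic(r, n):
    """t statistic for a Pearson coefficient r computed from n pairs."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)  # perfect correlation
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

def critical_r(t_crit, n):
    """Smallest |r| that reaches the critical t value for sample size n."""
    return t_crit / math.sqrt(t_crit ** 2 + (n - 2))

# Two-tailed critical t values for 15 degrees of freedom (n = 17 pairs),
# from standard t tables; an assumption about the test actually used.
T_CRIT_DF15 = {0.10: 1.753, 0.05: 2.131}

def is_significant_n17(r, alpha):
    """Two-tailed significance of r for the 17-pair (one month, 17 years) case."""
    return abs(t_statistic(r, 17)) >= T_CRIT_DF15[alpha]

for alpha, t_crit in T_CRIT_DF15.items():
    print(f"alpha = {alpha}: |R| must reach about {critical_r(t_crit, 17):.3f}")
# alpha = 0.1: |R| must reach about 0.412
# alpha = 0.05: |R| must reach about 0.482
```

Because the threshold on |R| shrinks as n grows, the same R can be non-significant with 17 pairs and significant with a pooled sample several times larger, which is worth keeping in mind when significance is compared across samples of different sizes.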
A Lilliefors test for normality of the meteorological data was conducted in MATLAB, and the results are shown in Table 2. The precipitation and temperature data were found to be normally distributed.

WGSC Between NDVI and Climatic Factors
R between NDVI and monthly mean temperature (Table 3) and R between NDVI and monthly precipitation (Table 4) were calculated at different time scales.
Table 3. Within-growing-season R between normalized difference vegetation index (NDVI) and monthly mean temperature in different periods from 2000 to 2016.
As the time series lengthened, R between NDVI and monthly mean temperature generally increased; that is, the longer the time series, the higher the correlation coefficient. When the time series was short (2016 and 2014-2016), R between alpine steppe NDVI and monthly mean temperature was lower than that of the other two types of grassland. Table 4 shows that meadow steppe NDVI and monthly precipitation were significantly positively correlated in the periods 2000-2016, 2006-2016, and 2010-2016, and that alpine steppe NDVI and monthly precipitation were significantly positively correlated in the period 2000-2016. R between typical steppe NDVI and monthly precipitation was no higher than that of the other two types of steppe. All three types of grassland showed an increase in R with increasing duration, similar to the results in Table 3.
Comparing the two sets of R values, R between NDVI and monthly precipitation was lower than that between NDVI and monthly mean temperature, which suggests that vegetation growth in the region is more sensitive to temperature than to precipitation. The reason is that the Three River-Headwater Region lies in the high-altitude hinterland, where the climate is cold all year [33].
Tables 3 and 4 both showed that the correlation coefficients between NDVI and the two climatic factors increased gradually as the time interval lengthened. Figures 2 and 3 show the fitted lines for the three types of steppe NDVI at different time scales in WGSC. As the time series lengthened, the number of samples included in the calculation increased, and the R values kept increasing and gradually approached 1.

IGSC Between NDVI and Climatic Factors
Table 5 shows R between NDVI and the monthly mean temperature of each month (May, June, July, August, and September) in the inter-growing-season analysis during 2000-2016. Compared with R between NDVI and the climatic factors in WGSC during 2000-2016 (Table 3), R in IGSC was smaller. Temperature was significantly correlated with NDVI of the meadow steppe and alpine steppe in May and with NDVI of the typical steppe in June. These results indicate that temperature in May could promote the growth of the meadow steppe and alpine steppe, and that temperature in June could benefit the growth of the typical steppe. In the early stage of vegetation growth (May) in the Three River-Headwater Region, temperature had a significant impact on NDVI, while it had little impact at the end of the growing season. Table 6 shows R between NDVI and monthly precipitation in the inter-growing-season analysis during 2000-2016. In Tables 5 and 6, * and ** denote significance at p < 0.1 and p < 0.05, respectively.
In July, precipitation was significantly positively correlated with NDVI of the meadow steppe and the typical steppe, while the relationship between precipitation and the alpine steppe was not significant in any month of the growing season. This indicates that the meadow steppe and typical steppe were sensitive to precipitation in the middle of the growing season (July for both, and also August for the meadow steppe), because precipitation taken up by the roots promotes the growth of grass and thereby increases the NDVI value.

IGSC Between NDVI and Climate at Different Time Scales
The changes in R between NDVI and climatic conditions (temperature and precipitation) in IGSC were analyzed at various time scales (Figure 4). As the duration increased, R between NDVI and precipitation fluctuated only slightly for all vegetation types, while R between NDVI and temperature showed no consistent trend.

Discussion
The growth of vegetation is mainly driven by water and heat conditions. Moderate temperature and adequate precipitation usually lead to a high vegetation index, and the repetition of this seasonal pattern over the years produces a high correlation [31]. The linear fits of NDVI in the within-growing-season analysis show that, as the duration lengthens, the number of samples in the calculation increases and R increases continuously, gradually approaching 1. However, most previous studies have not considered the synchronization of precipitation and temperature when analyzing the NDVI-climate relationship, so their results need further verification [31,32,34,35]. The purpose of a linear correlation analysis between NDVI and climatic conditions is to use a month of meteorological data to predict crop biomass or a vegetation index. For example, a correlation coefficient of 0.8 means that a climatic factor such as temperature or precipitation explains about 64% of the variance in the vegetation index (R² = 0.64) in a linear model. In principle, the correlation coefficient should not change systematically with the sample size. The WGSC may therefore indicate a pseudo correlation between NDVI and climatic factors, which may not reasonably explain the correlation between NDVI and climatic conditions.
The IGSC between NDVI and climatic conditions was analyzed separately at different time scales. This method can reduce the influence of the synchronization of precipitation and temperature, avoid producing highly repetitive results, and eliminate the false correlation.
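One way to see how pooling months can inflate the correlation is the following decomposition, offered as a sketch and not as a derivation from the original analysis. The notation is introduced here for illustration: $x_{m,j}$ and $y_{m,j}$ are NDVI and a climatic factor in month $m$ of year $j$, with $M$ months and $N$ years; $\bar{x}_m$ and $\bar{y}_m$ are the means of month $m$ over all years; and $\bar{x}$ and $\bar{y}$ are the means over all months and years. The cross-product sum in the numerator of the pooled coefficient then splits exactly into two parts:

$$\sum_{m=1}^{M}\sum_{j=1}^{N}(x_{m,j}-\bar{x})(y_{m,j}-\bar{y}) = \sum_{m=1}^{M}\sum_{j=1}^{N}(x_{m,j}-\bar{x}_m)(y_{m,j}-\bar{y}_m) + N\sum_{m=1}^{M}(\bar{x}_m-\bar{x})(\bar{y}_m-\bar{y})$$

The first term collects the year-to-year co-variation within each calendar month, which is the quantity IGSC examines one month at a time. The second term depends only on the mean seasonal cycles of the two variables. When NDVI, temperature, and precipitation all peak in summer, the second term can dominate the pooled numerator even if the interannual anomalies are unrelated, so a high WGSC need not imply a strong interannual response.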

Conclusions
In this study, we took the Three River-Headwater Region as the study area and analyzed the linear correlations between NDVI and climatic conditions (monthly mean temperature and monthly precipitation) using two different methods (WGSC and IGSC). We explored the variation and trends of both WGSC and IGSC between NDVI and climatic conditions at various monthly time scales. The main conclusions are as follows: (1) The relationship between NDVI and climate differed between WGSC and IGSC. For WGSC, R between NDVI and temperature or precipitation increased as the duration lengthened. (2) The correlation coefficients of WGSC depend more strongly on the duration length and increase with the number of growing seasons used in the calculation, whereas the correlation coefficients of IGSC are relatively independent of the amount of data included. (3) Because rainfall and temperature are synchronized within a year, the WGSC is a pseudo linear correlation between NDVI and climatic conditions caused by the accumulation of samples, and it may not truly indicate the influence of precipitation and temperature on vegetation growth. Separate analyses at different time scales showed that the IGSC can eliminate the impact of the synchronization of precipitation and temperature. Thus, the results obtained by IGSC may explain the relation between NDVI and climatic factors more reasonably.
There are many challenges in exploring the relationship between vegetation and climate because of the complexity of the vegetation response to the environment. These findings may be helpful in choosing a more reasonable linear correlation method for studying the NDVI-climate relationship. The purpose of the study was to clarify the difference between the two widely used linear correlation analysis methods for the vegetation index and climatic factors. We therefore considered only the single-factor linear correlations of NDVI with precipitation and with temperature for the time being; further research on multiple factors and the interactions among them is necessary.