Monitoring the Spatio-Temporal Dynamics of Shale Oil/Gas Development with Landsat Time Series: Case Studies in the USA

Shale oil/gas extraction has expanded rapidly in the last two decades due to the rising energy prices and the advancement of technologies. Its development can have huge impacts on and, at the same time, is also deeply affected by energy markets, especially in an era with high economic uncertainty. Understanding and monitoring shale oil/gas development over large regions are critical for both energy policies and environmental protection. However, there are currently no applicable methods to track the spatio-temporal dynamics of shale oil/gas development. To fill this gap, we propose a new NDVI Trajectroy Matching algorithm to track shale oil/gas development using the annual Landsat NDVI composite time series from 2000 to 2020. The results reveal that our algorithm can accurately extract the location and time of shale oil/gas exploitation in Eagle Ford and Three Forks, with an accuracy of 83.80% and 81.40%, respectively. In the Eagle Ford area, accuracy for all disturbance year detection was greater than 66.67%, with the best in 2011 and 2019 at 90.00%. The lowest accuracy in the Three Forks area was 63.33% in 2002, while the highest accuracy was 93.33% in 2019. In conclusion, the algorithm can effectively track shale oil/gas development with considerable accuracy and simplicity. We believe that the algorithm has enormous potential for other applications, such as built-up regions, forests, farmlands, and water body expansion and contraction involving vegetation damage.


Introduction
In 2020, the Financial Times reported that oil prices in the United States crashed into the negative territory for the first time in history as the evaporation of demand caused by COVID-19 left the world awash with oil and short of storage [1]. Since the industrial revolution, energy has become a material guarantee for the survival and development of human society [2]. It is an essential foundation for maintaining lighting, transportation, catering, heating, cooling, and automated management systems [3,4].
Energy is also the most fundamental driving force for national development and economic growth [5,6]. Reserves and the development and utilization of energy are essential elements for the healthy and stable development of a country [7]. After more than two decades of development, the shale gas revolution, which began in the 1990s, has changed the pattern of energy use in the United States and around the world [8]. Shale oil/gas well platforms are the carrier of shale gas extraction. Thereby, obtaining datasets of shale oil/gas well platforms is the basis for analyzing energy developments in the United States, especially in current special epidemic period.
Currently, datasets on shale oil/gas extraction are often obtained from government statistics and field visits [9]. These datasets often give only a general range of wells and mines, lacking a long time series of data for shale oil/gas monitoring [10,11]. It is difficult to continuously obtain a spatial and temporal distribution of shale oil/gas development map. Optical remote sensing images, as massive datasets for systematic exploration of the Earth from space during the daytime, have considerable advantages in land cover and land use surveys and change monitoring [12,13]. The local land cover and ecological environment are often artificially changed during the process of shale oil/gas development [14]. Therefore, acquiring land cover through a long-term remote sensing time series to monitor land disturbance provides the most important and practical method for monitoring shale oil/gas development [15].
Seto et al., investigated land disturbance studies using multitemporal optical remote sensing to analyze global urban land expansion, and presented a meta-analysis of 326 studies to map urban land conversion [16]. Gong et al. mapped Global Annual Impervious Areas (GAIA) from 1985 to 2018 using the entire archive of 30 m resolution Landsat images on the Google Earth Engine platform to understand the process of urbanization and land use/cover change as well as the impacts on the environment and biodiversity [17].
Recently, Liu et al., also mapped global annual urban dynamics (GAUD) from 1985 to 2015 at a 30 m resolution using numerous surface reflectance data from Landsat to provide basic information of global environmental change and contribute to applications related to climate mitigation and urban planning for sustainable development [18].
Although the above techniques are easier to implement, they may fail to provide comprehensive information regarding changes [32]. It was found that using feature space transformation operation [33,34] can provide additional textural and spatial features to solve the problem of the homologous spectrum and heterogeneous spectrum brought by spectral interpretation and reduce "salt and pepper effects" [35].
Due to the complexity of geographic space, it is difficult to adaptively select the segmentation scale, especially for long-term sequence land disturbance. In addition, the time scale is also one of the important factors of land disturbance [44,45], and some scientists have designed different algorithms to remove the effect of seasonal characteristics of plants on long time series [46][47][48][49][50]. There are methods that combine the position and time point of disturbances, like Breaks For Additive Seasonal and Trend (BFAST) [44], Landsat-based detection of Trends in Disturbance and Recovery (LandTrendr) [45,51,52].
For example, the LandTrendr method uses both trajectory segmentation and temporal segmentation to capture both abrupt and slow change phenomena and the broad features of the trajectory in the Landsat time series. While it is often necessary to simulate the possibility of changes in all disturbance values throughout time, such as continuous rise, continuous fall, rise and fall back and forth, and more, it may waste a great deal of computing time in certain cases.
Vegetation Indices (VI) play a vital role in remote sensing analysis. They are pretty easy to analyze and essential for both quantitative and qualitative measurements of vegetation cover and properties of soil [32,34,53]. Furthermore, they can reduce the impacts of topographic effects [54] and illumination by using NDVI, which is more suitable for long-term sequence land disturbance monitoring. However, to date, there are no refined spatio-temporal shale oil/gas maps, and shale oil/gas extraction's spatial and temporal characteristics have not been widely discussed.
Existing disturbance monitoring algorithms, which tend to focus only on the spatial and temporal variability of the features, rarely consider the quantitative characteristics of the disturbance. In this paper, we focus on developing a method to pinpoint shale oil/gas platform construction location and time over large regions, with the 30 m resolution Landsat time series. Our proposed method is based on the observed fact that the construction of shale oil/gas platforms often causes long-term damages to vegetation at the sites, and their sizes are large enough to be captured with Landsat time series. Although high-resolution images are much better to detect shale oil/gas platforms, especially with the help of deep learning technologies, they are often very expensive to cover large regions, and historical archives are generally not available in most cases.

Study Areas
Shale oil/gas plays are widely distributed in the United States (Figure 1a). The United Nations Conference on Trade and Development (UNCTAD) reported that the United States shale gas reserves are the fourth largest in the world (17.7 trillion cubic meters) [55]. In 2000, shale gas provided only 1% of natural gas production. By 2010, however, it grew rapidly to over 20%, mainly due to energy mining technology advancements [56]. The main shale play areas in Figure 1a represent the major shale plays in the U.S. The location information is published by the Energy Information Administration (EIA) [57]. In this study, we selected two typical shale oil/gas production areas in the United States as the study area: Eagle Ford ( Figure 1b) and Three Forks (Figure 1c). Eagle Ford shale (discovered in 2008) is a late Cretaceous sedimentary rock stratum, located below most parts of southern Texas, covering an area of 3000 square miles [58]. It is composed of an organic-rich marine shale found in the outcrop. This hydrocarbonproducing formation rich in oil and gas extends 400 miles east of Texas from the Texas and Mexico border in Weber and Maverick counties. The shale gas play is 50 miles wide, with an average thickness of 250 feet and a depth of 4000 to 12,000 feet [59]. Shale contains a large amount of carbonate, which makes shale fragile and easier to carry out hydraulic fracturing to produce oil or natural gas.
The Three Forks play has emerged as a significant exploration and development target for the Williston Basin oil and gas industry. The play's primary acreage is located in western North Dakota. Since the discovery of the Parshall oil field, the state's daily oil output has climbed from approximately 90,000 B/D (1990)(1991)(1992)(1993)(1994)(1995)(1996)(1997)(1998)(1999)(2000)(2001)(2002)(2003)(2004)(2005) to more than 1.4 million B/D in 2019 [60]. While recent global events have delayed development and production, over 15,400 horizontal wells have been drilled and finished within the Bakken and Three Forks Formations to date, producing over 3.3 billion barrels of oil from North Dakota alone [61]. Figure 2 shows the flow chart for this study, which is divided into three parts: source data, processing steps, and result evaluation. We first retrieve the Landsat-7 and Landsat-8 data, then apply a multi-temporal rolling strategy together with the mixed NDVI compositing method to generate a 20-year NDVI composite time series for each pixel. Then, based on the NDVI value's distributions of vegetation and the shale oil/gas well platforms obtained from samples, we simulate the NDVI curves with changes occurred in each year from 2000 to 2020, by generating Gaussion random numbers. The Manhattan distances are then calculated between the 20-year NDVI composite time series and the simulated change template pixel by pixel. The year corresponding to the smallest distance is the year in which the shale oil/gas well platform was built. Simultaneously, we use a distance threshold to effectively reduce the false alarm rate. If a calculated minimum distance is larger than that threshold, that pixel will be labeled as an other type of change and will not be considered relative to shale oil/gas development. Finally, the shale oil/gas platform's area attribute is used to Remote Sens. 2022, 14, 1236 5 of 22 eliminate some isolated pixels and large patches. The visual interpretation is completed with high-resolution Google Earth historical images.

Data
Landsat-7 and Landsat-8 satellite images were used in this study. Landsat-7 is the world's longest-running optical remote sensing satellite collecting imagery at the 30 m scale. It was launched in 1999 and is equipped with the Enhanced Thematic Mapper Plus (ETM+). ETM+ detects reflected solar radiation and emitted thermal radiation passively, using eight bands of sensors that span the infrared to the visible light spectrum. Landsat-7 has collected a vast amount of high-quality remote sensing image data since its launch and is now the longest-running land satellite in terms of observation time.
Due to the unexpected failure of the Landsat-7 ETM+ on-board scan line corrector (SLC) in May 2003, which resulted in data gaps and 22% pixels loss in the obtained images [62]. However, the ETM+ data are commonly used for land cover change analysis due to their long time archive [63][64][65]. NASA launched the Landsat-8 satellite in 2013, which has the exact spatial resolution and spectral characteristics of the Landsat-7 satellite. The Operational Land Imager (OLI) and the Thermal Infrared Sensor (TIRS) are aboard the Landsat-8 satellite. Landsat-7 and Landsat-8 are sun-synchronous polar-orbiting satellites with a 16-day equatorial repetition period, and both acquire images at a spatial resolution of 30 m [66].
We accessed and processed satellite data on Google Earth Engine (GEE). GEE is a cloud-based platform that allows users to access and use a wide range of remote sensing data, including a substantial number of preprocessing, calibration, and correction products. Google also provides the computing power for planetary-scale geospatial analysis [67]. We used GEE to retrieve Landsat-7 and Landsat-8 data for this study (USGS Landsat-7 and Landsat-8 Collection 1 Tier 1 TOA Reflectance). The Landsat TOA Tier 1 collection on GEE comprises data with calibrated reflectance values of the highest geometric and radiometric quality [68], making it appropriate for time series analysis [69].
Simultaneously, the TOA spectral features accurately match ground characteristics and efficiently capture ground disturbance phenomena. We also used high-resolution historical images from Google Earth to validate our classification results [52]. Google Earth has a variety of high-resolution satellite images, and the data is regularly updated, making it ideal for evaluating change detection results. Table 1 shows the specifics of the data we use.

Data Preparation
All Landsat-7 and Landsat-8 datasets were released with preprocessing metadata about cloud coverage for each scene, as well as cloud and cloud shadow information for each pixel. The Quality Assessment (QA) Bands [70] in Landsat is a bitmask that provides information about pixel-level clouds and cloud shadows. We mask all pixels with clouds or cloud shadows based on the information in the QA Bands. It is worth noting that low-quality pixels in Landsat-7/ETM+ images frequently lack entire spectral bands, particularly around the image's periphery. Before final use, these parts must be recognized and eliminated.
We generate a low-quality ETM + pixel mask by examining bands 1-5 and 7 of a Landsat-7 image using a simple and effective method [71]. The gap generated by SLC-OFF is also included in this mask. If the product of the corresponding positions of the six bands equals zero, the pixel is of low quality and should be removed. Following these two mask-processing steps, the remaining pixels constitute the images in the following compositing process.
Using the method of Kennedy et al. [51], we applied the harmonizationRoy function on the filtered image collections to coordinate the Landsat-8/OLI images with the Landsat-7/ETM+, ensuring that the spectral parameters of the Landsat sensor remain consistent. We utilized the TOA spectral transformation coefficient [72] in this method, and the band conversion encompasses six bands: blue (B1), green (B2), red (B3), near-infrared (B4), short wave infrared 1 (B5), and short wave infrared 2 (B7). The band selection is based on Landsat-7. Simultaneously, to ensure consistency, we renamed the band corresponding to Landsat-7 in Landsat-8 [73].

Annual NDVI Composite
Since Landsat-7/ETM+ was made available to the public for free in 2008 [74], a vast number of 30 m resolution optical remote sensing images have been collected into GEE. However, their use has been severely limited because of cloud and cloud shadow pollution and a lack of algorithms to evaluate them. As a result, for a long time, most applications had been restricted to small areas that can be covered with single imagery [16]. Researchers have established two primary ways to deal with clouds in optical remote sensing images: (1) direct but arduous mapping efforts and (2) picking only high-quality clear pixels in partially polluted images through a multi-temporal image compositing process according to specific criteria.
The automatic cloud classification approach based on a single Landsat image achieves excellent accuracy in detecting clouds and cloud shadows [75][76][77][78], and those based on multitemporal images can achieve better results [79][80][81]. However, supervision classification algorithms are generally time-consuming and labor-intensive, making them unsuitable for widespread use. In the beginning, satellite image compositing was intended to pick up pixels without cloud or cloud shadow pollution pixel by pixel for the Advanced Very High Resolution Radiometer (AVHRR) [82].
This method seeks to produce synthetic products of clear image pixels from different times, even if no one clear image is available in a given period. Consequently, it may use the data from all incompletely polluted images. The Maximum NDVI Value approach and the maximum apparent temperature method are the most widely employed compositing methods [82]. Both are based on the coarse resolution AVHRR images, and clouds and cloud shadows affect the vegetation surface's NDVI value and perceived temperature.
The Maximum NDVI Value method works well over vegetation but fails over water and bare land surfaces because the NDVI value of clouds can be greater than that of bare land and water surfaces [83]. The maximum apparent temperature technique assumes that the surface remains stable over a short time. However, the earth's surface may change over a long time, and the surface may also vary significantly over a more extended period. Even on a gloomy summer day, the temperature of pixels may be higher than on a clear winter day [71]. Landsat's revisit duration is 16 days, which indicates that more data is required to gather sufficient observations from Landsat. Thus, the synthesis period will be long.
In this research, we utilized the nominal rolling compositing method of multi-temporal data to obtain adequate data to collect clean pixels, as shown in Figure 3. We used three years of Landsat-7 images from 2000 to 2013, including images from the current and the immediate following two years, regardless of their cloud coverage to ensure that there are enough numbers of observations for image compositing. Using the rolling method, we obtained 13 years of annual synthetic Landsat-7 images from 2000 to 2013. We used the previous harmonization approach described in Section 2.3.2 to harmonize Landsat-7 with Landsat-8 from 2014 to 2020. Similarly, we collected Landsat-7 and Landsat-8 mosaics using a two-year rolling method from 2014 to 2020, considering that the number of observations now significantly increased. Finally, from 2000 to 2020, we generated 20 annual Landsat NDVI composites from a high-quality NDVI time-series free of cloud and shadow contamination.
Although having certain drawbacks, the Maximum NDVI Value method is straightforward, simple, and quick, allowing NDVI compositing at a broad scale, even at the global scale, and can ensure high-quality data from the growing season. In this study, we adopted the mixed NDVI composite approach proposed by Zhang et al. [71], adapted from the Maximum NDVI Value method and generated NDVI composites on GEE using various treatments for vegetation, barren land, and water.
We first established two thresholds for distinguishing vegetation (maximum NDVI > 0.4) and water bodies (minimum NDVI < −0.3) from others. Then, we generated a composite with median NDVI and replaced all median NDVI values more than 0.4 with the Maximum NDVI value in the NDVI image collection and all values less than −0.3 with the minimum NDVI value. This method successfully mitigates the effects of some seasonal and transitory variations in feature categories. In addition, we multiplied each pixel's final NDVI value by 10,000 and transform it to an int16 integer to reduce file storage space. Finally, we obtained an annual mixed NDVI composite.

Shale Oil/Gas Platforms Appearance in Remote Sensing Images
The infrastructure for extracting, processing, and temporarily storing shale oil/gas is the shale oil/gas well platform. This is the carrier of surface shale oil/gas well building activity. In general, the development of a shale oil/gas well platform is often accompanied by the destruction of vegetation. Then, the concrete structure of artificial buildings is created in the damaged region. At the same time, it is challenging to extract shale oil/gas well platforms due to their poor image characteristics, broad background vegetation coverage, and the influence of noise in Landsat images.
The majority of shale oil/gas well platforms are concrete structures with substantially lower NDVI values than vegetation, which is represented in the brightness difference between the shale oil/gas well platform and the vegetation background. At the same time, the NDVI value will drop dramatically once a piece of vegetated land is disturbed. As seen in Figure 4, the location was covered with vegetation from 2000 to 2013, and the NDVI value remained stable around 0.6 but dropped significantly below 0.2 in 2014 when the vegetation was destroyed for a shale oil/gas well platform. From 2014 through 2020, the location's NDVI value remained stable at roughly 0.15. Land disturbance usually occurs while a shale oil/gas well platform is built. As a result, the change in NDVI value in a long time series can be effectively used to detect this occurrence. We discover that stable vegetation land type value will only fluctuate a little around 0.6 in the long-term NDVI sequence. The disruption of the land type caused by building a shale oil/gas well platform is frequently a quick occurrence. Once shale oil/gas extraction begins at a specific location, the NDVI value will plummet to around 0.15 in the following year.
As a result of the statistics of NDVI values of various land types, we estimate that the average NDVI value of vegetation types is about 0.6 with a variance of 0.05, but the average NDVI value of shale oil/gas well platforms is about 0.15 with a variance of 0.05. Then, as shown in Figure 5, we mimic the converting process of vegetation to shale oil/gas well platform in different years by generating Gaussian random numbers, with the consideration of the above discussed NDVI distribution patterns for different land types.

285
A distance is a simple similarity measure, which is to calculate the distance of an 286 unknown category vector to a known category vector [84]. The smaller the distance, the 287 more similar the two vectors are. The basic principle of the minimum distance method is 288 that in an N-dimensional space, the minimum distance method first calculates the mean 289 of each dimension of each known class X A and similarly calculates the mean of another 290 class X B . We use µ A ,µ B to represent their mean values. Then for a feature vector x to 291 be classified, we only need to calculate the distance D(x,µ A ) and D(x,µ B ) to X A and X B 292 respectively.Then find the smallest values in D(x,µ A ) and D(x,µ B ). If the former is small, x 293 belongs to class A. Otherwise, it belongs to class B. We use Manhattan distance to measure 294 the distance between two n-dimensional vectors as Equation 1,the X 1 is (a 1 , a 2 , ..., a n ) ,X 2 is 295 Version February 23, 2022 submitted to Journal Not Specified 9 of 21 changed dramatically in the corresponding year. Figure 5(u) denotes a stable shale oil/gas 281 platform, with a mean of 0.15 and a variance of 0.05. Figure 5(v) denotes stable vegetation, 282 with a mean of 0.60 and a variance of 0.05. And Figure 5(w) denotes stable water, with a 283 mean of -0.45 and a variance of 0.05. 284 Figure 5. The 24 Gaussian random number templates, (a) to (t) The construction of shale oil/gas well platform in the year of curve mutation point, (u) stable shale oil/gas platform, (v) stable vegetation, and (w) stable Water.

Model matching with the minimum Manhattan distance 285
A distance is a simple similarity measure, which is to calculate the distance of an 286 unknown category vector to a known category vector [84]. The smaller the distance, the 287 more similar the two vectors are. The basic principle of the minimum distance method is 288 that in an N-dimensional space, the minimum distance method first calculates the mean 289 of each dimension of each known class X A and similarly calculates the mean of another 290 class X B . We use µ A ,µ B to represent their mean values. Then for a feature vector x to 291 be classified, we only need to calculate the distance D(x,µ A ) and D(x,µ B ) to X A and X B 292 respectively.Then find the smallest values in D(x,µ A ) and D(x,µ B ). If the former is small, x 293 belongs to class A. Otherwise, it belongs to class B. We use Manhattan distance to measure 294 the distance between two n-dimensional vectors as Equation 1,the X 1 is (a 1 , a 2 , ..., a n ) ,X 2 is 295 Figure 5. The 24 Gaussian random number templates, (a-t) the construction of a shale oil/gas well platform in the year of the curve mutation point, (u) a stable shale oil/gas platform, (v) stable vegetation, and (w) stable water.

Model Matching with the Minimum Manhattan Distance
A distance is a simple similarity measure, which is to calculate the distance of an unknown category vector to a known category vector [84]. The smaller the distance, the more similar the two vectors are. The basic principle of the minimum distance method is that, in an N-dimensional space, the minimum distance method first calculates the mean of each dimension of each known class X A and similarly calculates the mean of another class X B . We use µ A ,µ B to represent their mean values.
Then, for a feature vector x to be classified, we only need to calculate the distance D(x, µ A ) and D(x, µ B ) to X A and X B , respectively. Then, we find the smallest values in D(x, µ A ) and D(x, µ B ). If the former is small, x belongs to class A. Otherwise, it belongs to class B. We use the Manhattan distance to measure the distance between two n-dimensional vectors as Equation (1), the X 1 is (a 1 , a 2 , . . . , a n ), X 2 is (b 1 , b 2 , . . . , b n ). The Manhattan distance has the advantages of being quick to calculate and simple to implement.
We use the NDVI images from 2000 to 2020 as our vector to be classified, and a total of 24 random number templates simulated above as our known classes. The length of the NDVI time series for a specific pixel is 21, consistent with the length of NDVI trajectory templates, we set from 2000 to 2020 ( Figure 6). We match the unknown pixel with random number templates, find the minimum Manhattan distance, and label it with the corresponding class category. After assigning the label, we can easily pinpoint the time when the change occurred since we already know that each NDVI trajectory has a fixed occurrence time point.  Due to the complexity and diversity of landscapes, we discovered that calculating the location and built year of a shale oil/gas well platform directly through template matching will result in a significant false alarm rate. This indicates that many pixels that are not shale oil/gas well platforms will be labeled as shale oil/gas well platforms. For example, the construction of a shale oil/gas well mining platform is frequently accompanied by the construction of roads and pipelines, which has an impact on the accuracy. Let us suppose that a pixel changed over time, and the change is vegetation conversion to a shale oil/gas well platform.
In that case, the minimum distance calculated from the random number templates we set must be less than that calculated from pixels with other kinds of changes. As a result, we explore establishing a minimum distance criterion to reduce false alarms. The minimum distance obtained by pixels with other changes is greater than this threshold if the location is a land disturbance caused by an existing shale oil/gas well platform. The estimated minimum distance is then compared to that threshold. A label assigned will be kept if the corresponding minimum distance is smaller than that threshold. Otherwise, we classify it into other classes, significantly reducing the false alarm rate.
We determine the threshold with random simulation. We randomly select 500 shale oil/gas well platform sample points and 500 sample points with other disturbances. Then, we retrieve the minimum Manhattan distance for each of sample to the NDVI trajectory templates. A random simulation experiment will utilize the average minimum Manhattan distances of the 500 points with the disturbance caused by the construction of the shale oil/gas well platform and the 500 points with other disturbances. We conducted 5000 random simulation experiments to acquire the minimum distance distribution of each category.
Disturbances caused by different factors can be distinguished extremely effectively in the event of a large number of random simulations. It is easy to find that the threshold is about 18,400 in Eagle Ford ( Figure 7a) and 16,200 in Three Forks (Figure 7b). It is worth noting that our thresholds are for a specific disturbance related to shale oil/gas well platform construction. Some pixels in the classification process of vegetation and water bodies have a minimum distance greater than the defined threshold and are thus classified as other categories.
Then, we use GDAL to vectorize the binarized image and calculate the area. We use the geometric attribute of the area to delete any portions that are too large or too small. To determine the best-reserved shale oil/gas well area range, we first sort the vector data by area, then define four intervals, randomly sample 100 vector spots within these four intervals, and perform visual interpretation to determine the correct number of selected areas. Table 2 displays the findings of our verification.
We can see that the accuracy of the shale oil/gas well platform is very low between (0, 3000] and (60,000, 125,426], whereas (3000, 30,000] and (30,000, 60,000] are almost shale oil/gas well platforms, and thus we delete regions with an area less than 3000 and greater than 80,000 to remove isolated pixels and other large artificial buildings. We found some pixels from different years in the same shale oil/gas well platform. A simple and effective method is used to solve this problem. We take a vote on the pixels in each shale oil/gas well vector spot, calculate the number of pixels with the most pixels from the same year in that patch, and determine the year of that pixel as the year of this shale oil/gas well.

Multi-Year Disturbance Detection Map
Two specific shale oil/gas well regions in the United States were studied in this research: Eagle Ford ( Figure 8) and Three Forks (Figure 9), a-d in Figures 8 and 9 represent details of shale oil/gas platform extraction. The land disturbance cause by the development of a shale oil/gas well platform from 2000 to 2020 was identified together with the construction period and locations of the shale oil/gas well platforms.

Classification Accuracy
To verify our algorithm's accuracy, we randomly picked 500 vector spots from Eagle Ford and Three Forks. As we treat a shale oil/gas well platform as a whole entity, we can directly check the class accuracy of the vector spots. We verified the results using Google Earth high-resolution images and careful visual inspection. If both our chosen sample point and the feature in Google Earth are shale gas well platforms, the point is correctly classified; otherwise, it is incorrectly classified. Table 3 shows that the accuracy in Eagle Ford was 83.80% and in Three Forks was 81.40%.

Disturbance Year Accuracy
We also verified the disturbance year accuracy of shale oil/gas well platforms. This was accomplished by identifying vector spots with actual shale oil/gas platforms based on visual interpretation of the shale oil/gas multi-year disturbance results. For the Eagle Ford area, we chose 30 vector spots in which the actual landscape was a shale gas well platform each year from 2000 to 2020. We compared the disturbance time of each year with the right sample year by year using Google Earth high-resolution historical images.
The best accuracy was 90.00% in 2011 and 2019, while the lowest was 66.67% in 2001 (Table 4). At the same time, the overall accuracy for all the 20 years was 81.00%. The same verification method was used in Three Forks. The best accuracy was 93.33% in 2019, the lowest accuracy was 63.33% in 2002, and the overall accuracy for all the 20 years was 78.99% (Table 5). Here, we also adopted a confidence interval of ±1 years in the comparison in view of the uncertainty in visual interpretation of the change year [85,86].
When assessing the reliability of our maps using a fuzzy accuracy assessment classification, the accuracy increased moderately. In Eagle Ford, when the previous year was included in the accuracy evaluation, the lowest accuracy was 73.

Comparison with Other Algorithms
There are many algorithms that consider different land types characteristics to detect land disturbances, like Breaks For BFAST [44] and LandTrendr (LT) [45]. However, the long-term sequence quantitative and digitized disturbance phenomenon simulation was lacking. The LT algorithm is data-intensive and heavily reliant on random accesses to the time series. The input for each pixel is a spectral band or an indexed annual time series, plus the date of each observation, and computing the vertices entails generating multiple models that include different groupings of input values, which is computationally complex [51].
The NDVI Trajectory Matching algorithm can successfully locate shale oil/gas wells and pinpoint the construction time with a simple minimum distance method that is easy to understand and implement with light computation. We accurately captured the land disturbance characteristics of shale oil/gas before and after development. This is because, once the construction from vegetation to the shale oil/gas well platform occurs, the sudden change of NDVI value in that year is very clear, forming a strong change signal that can easily be captured by the Landsat NDVI time series.
What must be mentioned here is that the proposed method relies on vegetation change signals. That is, if a shale oil/gas well was constructed on a piece of bare land without vegetation, the proposed algorithm would not work as expected. The algorithm can effectively extract the area of the shale oil/gas well platform and simplify the disturbance scene of the LT method. The rolling NDVI compositing method in Section 2.3.3 can also help to suppress some inter-annual disturbances, such as fallow lands.
From the results of our assessment, we can see that the classification accuracy of shale oil/gas well platform extracted by this algorithm was 83.8% in Eagle Ford and 81.4% in Three Forks, which are slightly lower than the accuracy reported by certain current supervised learning algorithms, such as machine learning and deep learning [39]. However, our algorithm does not require a great deal of preliminary sampling work, which is often time-consuming and labor-intensive work.
Supervised learning methods, such as random forest or neural networks, also require a large amount of data for training and require multiple bands of data added to the training to achieve better results. In comparison, we use only one band of NDVI, which greatly reduces the data storage space. The proposed algorithm can achieve a fast calculation speed when using the Manhattan distance. Of course, it should be mentioned that the generalization of our algorithm is not as good as the current deep learning methods.

Influence of Different Vegetation Growth Conditions
We can see that the classification accuracy and mean disturbance year accuracy were higher in Eagle Ford at low latitude than in Three Forks at high latitude. Figure 1a shows the two areas' relative positions. Simultaneously, when we conduct a random simulation experiment, we discover that the minimum distance threshold of the two regions is also different. This is because the distribution of NDVI value of vegetation that we set is the same for these two regions; however, the vegetation ecological environment of the two regions is different due to the different geographical settings.
Compared with the Eagle Ford area at low latitude, the land in the Three Forks area at high latitude is relatively barren with a low overall NDVI value. When a shale gas well mining platform is built, the NDVI curve from vegetation to shale gas platform in Eagle Ford changes dramatically and is easily detected. In contrast, the NDVI curve in Three Forks changes less sharply, resulting in some false detection.
As a result of comparing the two research areas, we can see those different geographical locations have different vegetation ecological environments, which are difficult to describe using a unified random number template. Therefore, establishing an appropriate random number template according to the specific study area is an issue worthy of attention.

Limitations of the Shale Oil/Gas Research
In this paper, we monitored the activities of shale oil/gas well platform construction. When vegetation is destroyed and a new shale oil/gas platform is built, the signal of NDVI change is well captured by our algorithm. However, we can only interpret this as an indirect indication of oil/gas production activities, as all we know about the shale oil/gas well platform extracted through the algorithm is that a shale gas well platform's construction has occurred at this location. We cannot determine whether the shale oil/gas platform is still in operation or has been abandoned. It is difficult to monitor this process directly using remote sensing images.
Of course, if a shale oil/gas platform is in operation, waste gas might be burned to generate a flame. In such a case, night light remote sensing images may be a good choice; however, the current night light image resolutions are relatively low, making it difficult to detect shale oil/gas platforms. Furthermore, waste gas is not burned at every oil/gas extraction site. If we want to know whether a shale gas platform is in operation, we may need to conduct field investigations or gather additional data, which is a challenge and must be addressed in the future.

Conclusions
Shale oil/gas, as a cleaner energy, is now widely used in the United States and around the world. Its development can have huge impacts on and, at the same time, is also deeply affected by energy markets, especially in an era with high economic uncertainty. Furthermore, its development also inevitably causes negative impacts on local environments. Understanding and monitoring shale oil/gas development over large regions is critical for energy policies and environmental protection. However, there are currently no applicable methods to track the spatio-temporal dynamics of shale oil/gas development.
To fill this gap, we designed a new NDVI Trajectory Matching algorithm to detect shale oil/gas development using the annual Landsat NDVI composite time series from 2000 to 2020. Compared with other methods, our proposed algorithm takes into account both speed and accuracy, greatly simplifies the calculation, and effectively extracts the location and construction time of shale oil/gas well platforms in two regions in the U.S.A. The reason behind the success of the proposed method is that the NDVI signal change caused by the damage of vegetation by the development of shale oil/gas platforms is quite strong and can be easily captured by the Landsat NDVI time series at the 30 m scale.
Remote sensing imagery at the Landsat scale have been well archived and are continuing to grow due to new satellite sensors, such as the Landsat-9 and the Sentinel-2 series, to cover the entire global land surface. These publicly open and free of charge data, together with powerful cloud-based computation infrastructures, such as the GEE, all provide huge potential for the proposed method to be applied to a much broader extent and allow data about shale oil/gas development to be collected in a more economical and objective way.
Although the proposed method relies on the land cover change signal due to vegetation disturbance and post-processing is required to reduce false alarm, it can also be applied to detect similar changes involving vegetation changes, such as urban expansion to agricultural lands. That is to say, the NDVI Trajectory Matching algorithm has the potential to be applied to other perturbation phenomena of land features, such as built-up areas, forests, farmlands, and water body expansion and contraction if there are vegetation disturbances occurring.