1. Introduction
The volume of data obtained from satellites is increasing rapidly because of the development of satellite sensors and communication technologies. The geometric, spectral, and radiometric resolutions of satellite sensors are improving in tandem, and the size of scenes is growing accordingly. As the speed of communication between satellites and ground stations improves, the volume of data received by ground stations grows. Moreover, nanosatellites (or cubesats) can capture more and more images with larger coverage and shorter revisit cycles, based on dense constellations and low altitudes. For example, the PlanetScope mission involves more than 200 satellites, less than 4 m GSD, 400 square kilometer coverage, and a 90-min revisit cycle [1]. Because of advancing technologies and the decreasing prices of storage media, there has been a steady rise in the volume of satellite images that can be stored in data warehouses. The development of computing power has improved the conditions under which very large satellite images can be processed. However, it is not practical to process large satellite images via hardware improvements alone: corresponding software technologies should be developed and optimized for satellite image processing. Moreover, the advent of new and complex processing algorithms does not necessarily reduce the processing times of larger satellite images, hindering their practical utilization. Therefore, there is an evident and emerging need for research on reducing processing times.
Parallel processing is currently used for high-volume data processing and computationally intensive problem solving in a variety of fields, such as engineering, mathematics, and science. Parallel processing (or parallel computing) is a computing technology that divides complex problems into smaller parts and processes their data simultaneously on several computers. The field of spatial informatics usually deals with large volumes of data, and the complexity of its processing methods is increasing considerably. As such, opportunities for using parallel processing in geographic information systems [2,3], remote sensing [4,5], and laser scanning [6,7] have steadily emerged. For example, in the field of remote sensing, parallelization was implemented in an optical real-time adaptive spectral identification system whose purpose was to analyze the endmembers of hyperspectral images [8]. In order to monitor and track wildfires in (near) real time, another study focused on parallelizing classification algorithms based on automated morphological and self-organizing maps for hyperspectral images [9]. Endmember extraction (or pixel purity index) calculations for hyperspectral images were parallelized, and performance comparisons were performed on parallel processing hardware using various platforms [10,11,12]. These studies share a common aspect: they process hyperspectral images and parallelize various algorithms for classification and recognition.
Efforts have also been made to develop parallel processing techniques suitable for new and affordable hardware. Starting in the early 2010s, there has been a trend toward parallel processing using graphics processing units (GPUs). Based on the compute unified device architecture (CUDA) of NVIDIA, GPUs were applied to modulation transfer function compensation, cubic convolution interpolation, and discrete and wavelet inverse transformations [13]. CUDA was also used to apply the mean shift clustering algorithm to high-resolution remote sensing images [14]. Some researchers have developed GPU-based image mosaicking [15] and Sobel filtering [16]. However, GPU-based approaches are not considered ideal for very large image processing because of the limited capacity of GPU memory.
More recently, interest in cloud computing has grown rapidly, and there has been increasing use of methods that allow large satellite images to be processed on high-performance cloud servers. Cloud computing technology is a good candidate for processing large amounts of remote sensing data when the cost of high-performance computing implementations and technical usability are considered [17]. For example, maximum likelihood classification and Mahalanobis distance clustering algorithms have been implemented in a cloud computing environment [18]. However, the basic purpose of cloud computing is to make high-performance computing resources available to large numbers of clients through the Internet. When large satellite images are uploaded to cloud servers, the uploading process itself can lead to inefficiencies, and the cloud resources cannot be reserved exclusively for large projects. As an alternative, distributed computing decomposes data and processes them simultaneously. This concept can correspond to parallel processing; however, it is a loosely coupled method in which each node processes the data independently. Thus, it differs from tightly coupled parallel processing, which is based on computing nodes that share memory [5].
On the other hand, unsupervised classification is a preprocessing step used to investigate the land cover distribution of satellite images without prior knowledge. Additionally, it is utilized to generate training data for supervised classification. Recently, the trend in satellite image classification has shifted to deep learning, and a wide variety of pre-trained models are available for application to various land covers and sensors [19], with studies being conducted to transfer and augment pre-trained models [20]. However, training a new model for new objects and sensors still requires a large quantity of training data, and its acquisition cannot completely dispense with human resources. Therefore, it is expected that, for the time being, unsupervised classification will be applied to semi-automated training data collection in deep learning approaches.
Because unsupervised classification is generally assumed to be a preprocessing task, and in some cases is repeatedly performed using various parameters on multiple images, it is strongly expected to require as little processing time as possible in a fully automatic method. However, computing-intensive methods are still likely to be required to attain acceptable results. For example, k-means clustering, a de facto standard among unsupervised classification algorithms, is implemented in prevalent remote sensing software such as ENVI, Erdas Imagine, and SNAP. It is also implemented through extensions or plugins in GIS software such as QGIS and ArcGIS. Because of its simplicity and efficiency in clustering large datasets, the k-means clustering algorithm is still popular and widely used in relevant studies, in both native and modified forms [21,22,23,24,25,26,27,28,29,30]. K-means clustering is composed of multiple loops and may take hours depending on the volume of images and the input parameters. Moreover, when it is applied to multiple images, the resulting classes are not compatible among the images because the method lacks pre-trained class signatures. Therefore, improvements are needed to increase processing speed and to apply the method to series of images acquired from recent satellite missions.
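For reference, the native k-means loop discussed above can be sketched in a few lines. The following is a minimal pure-Python illustration on scalar toy data (not the implementation evaluated in this study), using a change-ratio termination condition analogous to the percentage-based condition applied in the experiments:

```python
import random

def kmc(pixels, k, change_ratio=0.01, seed=0):
    """Minimal k-means clustering (kMC) sketch for scalar pixel values.

    Iterates until fewer than `change_ratio` of the pixels switch class,
    mirroring the percentage-based termination condition in the text.
    """
    random.seed(seed)
    centers = random.sample(pixels, k)  # initial class centers
    labels = [-1] * len(pixels)
    while True:
        changed = 0
        for i, v in enumerate(pixels):
            # assign each pixel to the nearest class center
            c = min(range(k), key=lambda j: abs(v - centers[j]))
            if c != labels[i]:
                labels[i] = c
                changed += 1
        # update each center to the mean of its member pixels
        for j in range(k):
            members = [v for v, l in zip(pixels, labels) if l == j]
            if members:  # an empty (canceled) class keeps its old center
                centers[j] = sum(members) / len(members)
        if changed / len(pixels) < change_ratio:
            return centers, labels
```

Run separately per image, such a loop produces one independent signature set per image, which is exactly why class IDs from different images are not comparable.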
This study aims at two objectives: (1) performing unsupervised classification that acquires consistent and compatible results from a series of satellite images and (2) improving computational efficiency by implementing parallel processing. To these ends, the study improves the k-means clustering algorithm in two steps. First, it classifies a series of images acquired by the PlanetScope mission and produces a single set of class signatures so that the resulting images share consistent class IDs. The results generated by the proposed algorithm were compared with those obtained from the native version. Next, the algorithm is parallelized using the multiple cores of a CPU and a GPU. The parallelism consists of concurrent processing of the main classification routines across many computational units, followed by further optimization based on parallel reduction utilizing a GPU. Performance is compared between a sequential code (without parallel processing), a multi-core code, a GPU code, and a GPU code with parallel reduction.
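The first objective can be illustrated schematically: instead of clustering each image separately, pixels from all images are pooled and clustered jointly, so that a single set of class centers (signatures) is produced and every image is labeled against it. The sketch below is a simplified scalar reading of that idea, not the paper's actual implementation; the deterministic initialization and the `iikmc` signature are our own assumptions for illustration:

```python
def iikmc(images, k, change_ratio=0.01):
    """Sketch: one shared signature set for a series of images.

    `images` is a list of pixel lists. Pooling the pixels and clustering
    them jointly yields a single set of class centers, so the class IDs
    are consistent across all input images.
    """
    pooled = sorted(v for img in images for v in img)
    # deterministic initialization: spread centers over the value range
    centers = [pooled[(2 * j + 1) * len(pooled) // (2 * k)] for j in range(k)]
    labels = [-1] * len(pooled)
    while True:
        changed = 0
        for i, v in enumerate(pooled):
            c = min(range(k), key=lambda j: abs(v - centers[j]))
            if c != labels[i]:
                labels[i], changed = c, changed + 1
        for j in range(k):
            members = [v for v, l in zip(pooled, labels) if l == j]
            if members:
                centers[j] = sum(members) / len(members)
        if changed / len(pooled) < change_ratio:
            break
    # classify each image against the shared centers -> consistent IDs
    return centers, [[min(range(k), key=lambda j: abs(v - centers[j]))
                      for v in img] for img in images]
```

Because every image is labeled against the same centers, class 0 in one result denotes the same spectral class as class 0 in any other result.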
3. Evaluation and Discussion
3.1. Evaluation Method
A series of adjacent PlanetScope images was processed using kMC and IIkMC, and the classification results were compared. IIkMC was implemented as a sequential code (without parallel processing), an intra-node parallel code, a GPU-based parallel code, and a GPU-based parallel code using parallel reduction. The performance of the parallel codes was compared to that of the sequential variant.
A total of twelve images acquired over two days, from 30 October 2018 to 31 October 2018, were used (Figure 7). Each image is composed of four multi-spectral bands (red, green, blue, and infrared), and the file type is 16-bit GeoTIFF. The dimension of each image is approximately 8.2 k × 3.9 k pixels, and the average size as uncompressed TIFF is 244 MB. The computing system has an AMD Ryzen 9 CPU consisting of 16 cores for intra-node parallel processing and an NVIDIA RTX 3090 Ti graphics card for GPU-based parallel processing (Table 1). The number of classes (k) is eight, and the iteration termination condition is 1.0% (Table 2).
3.2. Evaluation of Clustering Results
The results of processing each image separately using kMC are shown in Table 3 and Figure 8. The same color map was applied to each result so that the same class ID is displayed in the same color. The number of iterations differed greatly, and image 3 was processed after only three iterations, resulting in incomplete segmentation. With a smaller termination condition, it would have produced different results, but it should be remarked that 1.0% was not an unreasonable condition. One class was canceled in image 9, and two classes were canceled in image 11, which may have occurred accidentally due to the spectral characteristics of the initial and updated class centers. This problem might be solved by applying a class-splitting step to the classification routine, which was not considered in this study. The results of images 1 and 7 to 12 appear to have similar class signatures that are compatible among themselves; images 3, 4, and 6 also might, but the rest do not.
The result of concurrently processing the 12 images using IIkMC is shown in Table 4 and Figure 9. The result was derived after 55 iterations, and the class signatures appeared to be compatible throughout the results. From the enlarged image of Figure 7a in Figure A1 and the corresponding results in Figure A2 and Figure A3, it can be clearly seen that IIkMC has better class compatibility among the results. From the enlarged image of Figure 7b in Figure A4 and the corresponding results in Figure A5 and Figure A6, it could be confirmed that a complete result was obtained for image 2, which was not properly segmented by kMC. From the inspection, it was also confirmed that none of the classes were canceled in the results of images 9 and 11.
However, from Figure A3 and Figure A6, it could be remarked that over-segmentation might have occurred in the results of IIkMC. A possible reason for this is that accepting the spectral features of multiple images could have broadened the spectral distribution and, as a result, decreased the separability among classes. Separability can be better understood by inspecting the distribution of the center and standard deviation of each class in the spectral domain. In the results of kMC, the standard deviation starts in the mid-100s (Table A1), while in IIkMC, it starts in the late 200s (Table A2). A larger standard deviation can imply lower separability. Therefore, it will be necessary to reduce the spectral distribution by applying histogram matching among images before clustering.
Nevertheless, the standard deviation of kMC varies greatly across images and bands, and the largest values exceed 3000. In comparison, the standard deviation of IIkMC does not exceed the early 2000s. Therefore, it can be confirmed that the result of IIkMC is more stable than that of kMC, even from the viewpoint of separability. Moreover, since it is challenging to identify a set of representative class signatures from kMC (i.e., class centers and standard deviations are widely distributed among the images), IIkMC is considered the better solution.
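The histogram matching proposed as a remedy could take a simple quantile-mapping form. The sketch below (scalar values, single band, no interpolation; illustrative only, not part of this study's implementation) replaces each source pixel value with the reference value at the same rank:

```python
def histogram_match(source, reference):
    """Map `source` pixel values onto the distribution of `reference`.

    A basic quantile-mapping sketch: each source value is replaced by
    the reference value occupying the same relative rank (quantile).
    """
    src_sorted = sorted(source)
    ref_sorted = sorted(reference)
    n, m = len(src_sorted), len(ref_sorted)
    # rank of each source value within its own image
    rank = {v: i for i, v in enumerate(src_sorted)}
    return [ref_sorted[min(m - 1, rank[v] * m // n)] for v in source]
```

Applying such a mapping band by band before clustering would narrow the pooled spectral distribution, which is the effect the text argues is needed to improve separability.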
3.3. Evaluation of Parallel Processing Performance
Parallel processing performance is typically evaluated using two indices, speedup and efficiency, both of which are defined in terms of the processing time versus the number of cores or computing nodes involved (Equation (2)). In the ideal case, the speedup equals the number of cores or computing nodes, and the efficiency should be 1.0. This indicates that the increase in execution speed is exactly proportional to the number of cores or computing nodes involved. However, according to Amdahl's law [41], this is not achievable in reality. Reasons for this include unequal workloads among cores and nodes, as well as unparallelizable procedures. In addition, in some cases, the efficiency might exceed 1.0; this is called super-linearity. It can occur when processing an exceptional amount of data or when the complexity is beyond the ability of a single core or node [42]. Speedup $S$ and efficiency $E$ are defined as:

$$S = \frac{T_s}{T_p}, \qquad E = \frac{S}{p} = \frac{T_s}{p \, T_p} \quad (2)$$

where $p$ is the number of cores (or computing nodes), and $T_s$ and $T_p$ denote the execution times for sequential and parallel processing, respectively.
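As a quick numeric check, Equation (2) can be evaluated directly. The short sketch below applies it to the intra-node timings reported in Table 5 (5765.18 s sequential, 449.27 s on 16 cores):

```python
def speedup(t_seq, t_par):
    # S = T_s / T_p (Equation (2))
    return t_seq / t_par

def efficiency(t_seq, t_par, p):
    # E = S / p = T_s / (p * T_p)
    return speedup(t_seq, t_par) / p

# intra-node case from Table 5: 16 CPU cores
print(round(speedup(5765.18, 449.27), 2))         # ≈ 12.83
print(round(efficiency(5765.18, 449.27, 16), 2))  # ≈ 0.80
```

The efficiency of roughly 0.8 is consistent with Amdahl's law: residual sequential portions and unequal workloads keep the speedup below the ideal factor of 16.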
The execution times were measured for the sequential code, the intra-node code, the GPU-based code, and the GPU-based code with parallel reduction for IIkMC. To calculate the speedup and efficiency of the intra-node code, $T_s$ was set as the execution time of the sequential code, and $T_p$ was set as that of the intra-node code using all cores. However, since it is ambiguous how to define $p$ for the GPU-based code, its efficiency is not presented (Table 5).
The results are listed in Table 5. Processing with the sequential code was very burdensome and required 5765.18 s (=1.6 h) to process the 12 images (=2.86 GB). In contrast, the intra-node code took 449.27 s (=7.5 min). This is about 13 times faster, meaning that the task can be finished within an acceptable time. The GPU-based code processed the data 25.53 times faster than the sequential code and 2.00 times faster than the intra-node code. Furthermore, the GPU-based code with parallel reduction processed the data 1.53 times faster than the GPU-based code. In this study, sequential addressing, a moderately optimized parallel reduction approach, was applied. With more optimized reduction approaches, the processing speed would be further improved. Additionally, it can be predicted that the speed could be maximized by using multiple GPUs.
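The sequential addressing scheme mentioned above can be illustrated with a small host-side simulation (plain Python standing in for the CUDA kernel; in the real kernel, each iteration of the inner loop runs as a separate thread). At every step, the first `stride` "threads" each add the element `stride` positions away, halving the active range, which keeps memory accesses contiguous and avoids shared-memory bank conflicts:

```python
def reduce_sum(values):
    """Simulate a GPU parallel reduction with sequential addressing."""
    n = 1
    while n < len(values):
        n *= 2
    buf = list(values) + [0.0] * (n - len(values))  # pad with the identity
    stride = n // 2
    while stride > 0:
        # on the GPU, this inner loop runs concurrently, one thread per i;
        # sequential addressing: thread i combines buf[i] and buf[i + stride]
        for i in range(stride):
            buf[i] += buf[i + stride]
        stride //= 2
    return buf[0]
```

The reduction finishes in log2(n) steps instead of n-1 sequential additions, which is where the reported 1.53-fold improvement over the plain GPU code plausibly originates.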
Finally, this study verified whether the results obtained using the sequential, intra-node, and GPU-based codes (with and without parallel reduction) matched. The class centers of each band and the classified number of pixels were compared among the results. No discrepancy was detected, confirming that the results matched completely.
3.4. Discussion
Applying kMC and IIkMC to the 12 PlanetScope images revealed quite different results. In the results of kMC, using 12 consecutively acquired images that might constitute similar land cover, the number of iterations differed significantly, some images were not clustered appropriately, and some classes were canceled during iteration. Handling the class signatures of kMC proved challenging because choosing a representative set among the images would take considerable work. On the other hand, IIkMC conducted stable clustering while assigning consistent classes to all the images. Although the standard deviations were estimated to be slightly larger, IIkMC was expected to provide a comprehensive result by delivering a single set of class signatures for all images.
Because of its efficiency, kMC is still widely used for the unsupervised classification of satellite images. Nevertheless, as the capacity or quantity of images increases, the processing time of IIkMC increases rapidly. Intra-node parallel processing and GPU-based parallel processing succeeded in making IIkMC several times faster than sequential processing. In addition, the performance is further improved by the introduction of parallel reduction and is expected to be enhanced by optimizing the parallel reduction and utilizing multiple GPUs. Therefore, it will be possible to process data acquired over a more extensive range of areas, including series of hundreds to thousands of satellite and drone images.
4. Conclusions
In this study, IIkMC was introduced for the unsupervised classification of satellite images acquired in series. The results can be summarized in three points as follows:
- derived a single set of class signatures for multiple satellite images;
- performed complete classification of all images, granting consistent class IDs;
- implemented intra-node and GPU-based parallel processing.
A single set of class signatures was derived, and each image was consistently classified so that the classification results were compatible between the images. In addition, some images that could not be correctly segmented by kMC were completely processed. However, IIkMC also revealed the possibility that the separability among classes could be weakened because the wider spectral features of multiple images were merged into each class. Nevertheless, it should be remarked that IIkMC provides more reliable results, potentially facilitating the overall inspection of multiple images and training data generation for supervised classification or deep learning.
kMC is a computationally intensive algorithm, making it burdensome to apply to large images, which is also the case for IIkMC. With IIkMC parallelized using the multiple cores of a CPU and a GPU, the processing speed could be dramatically improved. Thus, large volumes of satellite images acquired over an extensive range of areas can plausibly be inspected prior to intensive analysis or supervised classification.
For better performance of IIkMC, future work aims to improve class center management and enhance class separability. The former includes class center initialization along with class separation and integration based on data characteristics, similar to ISODATA clustering. The latter will be achieved by normalizing the radiometric distributions among the images. The GPU-based approaches will be improved by implementing further optimized parallel reduction, employing the high-performance operations introduced in cuBLAS [43], and modifying the parallel code to utilize multiple GPUs.