Performance Evaluation of Parallel Structure from Motion (SfM) Processing with Public Cloud Computing and an On-Premise Cluster System for UAS Images in Agriculture

Abstract: Thanks to sensor developments, unmanned aircraft systems (UAS) are among the most promising modern technologies for collecting imagery datasets that can be used to develop agricultural applications. UAS imagery datasets can grow exponentially due to the ultrafine spatial and high temporal resolution capabilities of UAS and sensors. One of the main obstacles to processing UAS data is the intensive computational resource requirement. Structure from motion (SfM) is the most popular algorithm for generating 3D point clouds, orthomosaic images, and digital elevation models (DEMs) in agricultural applications. Recently, the SfM algorithm has been implemented in parallel computing to process big UAS data faster for certain applications. This study evaluated the performance of parallel SfM processing on public cloud computing and on-premises cluster systems. UAS datasets collected over cropping fields were used for performance evaluation. We used multiple computing nodes and centralized network storage with different network environments for the SfM workflow. In single-node processing, the instance with the most computing power in the cloud computing system performed approximately 20 and 35 percent faster than the most powerful machine in the on-premises cluster for the large and small datasets, respectively. The parallel processing results showed that the cloud-based system performed better in the speed-up and efficiency metrics for scalability, although the absolute processing time was shorter in the on-premises cluster. The experimental results also showed that the public cloud computing system could be a good alternative computing environment for UAS data processing in agricultural applications.


Introduction
In recent years, unmanned aircraft systems (UAS) have been actively utilized in agricultural applications to develop high-throughput phenotyping (HTP) systems [1,2]. UAS, often called drones, can collect high-spatiotemporal-resolution imagery data over agricultural fields. UAS data can be processed to visualize agricultural fields and analyzed to develop advanced agricultural applications [3]. Once UAS data are collected in the field, the data need to be processed to extract phenotypic information. The structure from motion (SfM) algorithm is the most popular algorithm used to turn numerous UAS images with significant overlaps into measurable geospatial data products, such as 3D point clouds, digital elevation models (DEMs), and orthomosaic images, using the triangulation concept in photogrammetry. The geospatial data products generated from the SfM process are then used to generate georeferenced phenotypic information [4][5][6][7].
As hundreds of images are easily taken during each UAS flight mission, UAS data collection campaigns usually result in huge imagery datasets. Although computing power has increased rapidly in recent years, processing massive amounts of UAS data is still challenging, as the computational resource requirements grow exponentially with the number of UAS images [8,9]. SfM processing can take many hours or even days for big UAS data collected at a fine spatiotemporal resolution. To overcome this hurdle, high-performance computing (HPC) capabilities can be adopted to parallelize the SfM process and reduce the computation time.
For the parallel processing of SfM, a cluster system with independent computers and common storage can be employed. Although cluster systems have benefits such as high performance, fault tolerance, and scalability, users must invest significant resources, including labor, hardware, and software, to construct and maintain the system locally [10]. In recent years, as commercial cloud computing services have developed rapidly and become inexpensive, cloud computing systems can serve as an effective alternative to cluster computing.
To evaluate the potential of cloud computing systems for processing UAS data from agricultural fields, the performance of SfM processing must be examined. Therefore, two UAS datasets from two different fields were tested in public cloud computing and on-premises cluster systems. The main objectives of this study were to: (1) compare the performance of single-node processing with different computing power and storage options and (2) test parallel processing in public cloud-based and on-premises cluster systems. For the experiments, high-quality RGB imagery was collected using a UAS platform and then processed with SfM software in various environments. The processing time was measured and used to compare the performance of the cloud-based and on-premises cluster systems.

UAS Datasets
In this study, two UAS missions were designed to collect a small dataset (2.3 GB over 4 acres) and a large dataset (12.5 GB over 220 acres) for performance comparison tests. The field for the small dataset was located in the research farm managed by the Texas A&M AgriLife Research and Extension Center at Corpus Christi. Corn was planted in this field. The large dataset covered cotton and sorghum plants in a commercial field in Driscoll, TX. RGB images were collected with a DJI Phantom 4 RTK (DJI, Shenzhen, China) for both fields. The onboard camera (FC6310R) was equipped with a 20-megapixel CMOS sensor with a resolution of 5,472 × 3,648, an 8.8-mm focal length, and an 84° field of view (FOV). The flight parameters, such as the flight altitude and overlap, were determined by the field size and flight time (Table 1). One and four UAS flights were conducted to collect 293 and 1,557 images for the small and large datasets, respectively. As we used the same UAS platform and sensor, the total volume of raw images was directly proportional to the number of images.
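As a quick check of that proportionality, the short sketch below divides the reported dataset volumes by the reported image counts. It assumes decimal units (1 GB = 1000 MB), which the text does not specify, and the function name is illustrative only.

```python
# Reported dataset volumes and image counts from this section.
DATASETS = {
    "small": {"size_gb": 2.3, "images": 293},
    "large": {"size_gb": 12.5, "images": 1557},
}


def mb_per_image(size_gb, images, mb_per_gb=1000.0):
    """Average raw image size in MB (assumption: 1 GB = 1000 MB)."""
    return size_gb * mb_per_gb / images


if __name__ == "__main__":
    for name, d in DATASETS.items():
        avg = mb_per_image(d["size_gb"], d["images"])
        print(f"{name}: {avg:.2f} MB per image")
```

Under this assumption, both datasets come out at roughly 8 MB per image, which is consistent with the volume scaling with the image count.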

Cluster Systems for Processing
Two cluster systems were adopted to test the performance of processing UAS images with the SfM algorithm. The AgriLife Local Cluster (ALC) is an isolated on-premises cluster constructed at the Texas A&M AgriLife Research & Extension Center at Corpus Christi. The ALC consists of five workstations (nodes) and network-attached storage (NAS). All nodes and the NAS are interconnected through a network switch with gigabit Ethernet ports (Figure 1a). The NAS was connected to the network switch through four gigabit LAN ports; aggregating multiple network interfaces increases the bandwidth and provides failover to maintain network connections. All nodes and the NAS could communicate internally regardless of public internet connections. Each node in the ALC has different hardware, such as the CPU, RAM, and graphics card (Table 2). The details of the CPU and GPU specifications are shown in Appendix A (Tables A1 and A2).
An Oracle Cloud Cluster (OCC) was built with various combinations of four compute shapes and two storage options in the Oracle Cloud Infrastructure (OCI) (Table 2). Although all shapes employ the same CPU, the number of CPUs, the CPU/GPU memory, and the network bandwidth differ for each shape (Table A3) [11]. Two storage options, file storage and block volume, were tested in this study. The block volume was used as the local storage of the node, while file storage worked as a network drive in the OCC [12,13]. For a multi-node cluster system, nodes and network storage were set up in the OCI and connected through the public internet (Figure 1b). The OCC was simple and easy to build in the OCI for parallel processing, but the network speed through the public internet was the main factor affecting the processing time.

Structure from Motion (SfM) Processing
Although various SfM software programs are available, such as Agisoft Metashape, Pix4D, and OpenDroneMap, as well as image mosaicking services such as DroneDeploy, Agisoft Metashape (1.6.3.10732, 64 bit) was used to process the raw UAS images. Agisoft Metashape provides network (parallel) processing using multiple nodes as well as stand-alone processing. In this study, Metashape was run in batch processing mode to avoid manual work in the processing pipeline (Table 3). Although Metashape provides many parameters that the user can adjust, the default or recommended options were used for all of the experiments.
In the Align Photos step, Metashape estimated the camera position at the time of image capture, defined by the interior and exterior orientation parameters [14]. Interior orientation (IO) parameters included the camera focal length, the coordinates of the image principal point, and the lens distortion coefficients. Exterior orientation (EO) parameters defined the position and orientation of the images. EO consisted of three translation components (X, Y, and Z) and three Euler rotation angles (yaw, roll, and pitch). The UAS platform used in this study was equipped with an RTK GPS system for measuring the initial EO parameters at image capture. IO and EO parameters can be calculated by Metashape using aerotriangulation with tie points and bundle block adjustment based on the collinearity equations [15]. This step produced the estimated IO and EO parameters and a sparse point cloud containing the triangulated positions of matched image points. Depth maps were then calculated for the overlapping image pairs using dense stereo image matching, considering the updated IO and EO parameters from the previous step. In Metashape, the depth maps are transformed into partial dense point clouds, which are then merged into a final dense point cloud. For every point in the final dense point cloud, a confidence value, which is the number of contributing depth maps, and color information sampled from the images are stored.
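To make the role of the IO and EO parameters concrete, the following minimal sketch projects a ground point into image coordinates with the collinearity equations. It is an illustration under stated assumptions, not Metashape's implementation: the rotation order (yaw about Z, then pitch about Y, then roll about X), the camera frame with its z-axis pointing away from the ground, and the function names are all assumed here, and lens distortion is ignored.

```python
import math


def rotation_matrix(yaw, pitch, roll):
    """Camera-to-world rotation R = Rz(yaw) * Ry(pitch) * Rx(roll), radians.

    The angle order is an assumed convention for illustration only.
    """
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def project(point, camera_pos, angles, focal_mm, principal_point=(0.0, 0.0)):
    """Collinearity equations: object point -> image coordinates (mm).

    EO = camera_pos (X, Y, Z) and angles (yaw, pitch, roll).
    IO = focal_mm and principal_point; distortion is not modeled.
    """
    R = rotation_matrix(*angles)
    d = [p - c for p, c in zip(point, camera_pos)]
    # Express the ray in the camera frame: R^T * (P - C).
    u, v, w = (sum(R[k][i] * d[k] for k in range(3)) for i in range(3))
    x0, y0 = principal_point
    return (x0 - focal_mm * u / w, y0 - focal_mm * v / w)


if __name__ == "__main__":
    # Hypothetical nadir camera 100 m above a point offset 10 m in X,
    # using the 8.8-mm focal length reported for the onboard camera.
    print(project((10.0, 0.0, 0.0), (0.0, 0.0, 100.0), (0.0, 0.0, 0.0), 8.8))
```

Bundle block adjustment refines the IO and EO values so that projections like this one agree with the measured tie-point positions across all overlapping images.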
In this study, the DEM was rasterized from the dense point cloud, with a height value stored for every cell of a regular grid, and then used to build the orthomosaic. The orthomosaic, a combined image created by seamlessly merging the raw images, was projected onto the ground surface with the selected projection. As file saving is conducted by a single node, the DEM and orthomosaic images were also exported to compare the performance of the different storage options.
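The rasterization idea can be sketched in a few lines: points are binned into regular grid cells and one height is kept per cell. The sketch below stores the mean height, which is an assumption made for illustration; the text does not state how Metashape aggregates multiple points in a cell, and the function name is hypothetical.

```python
import math


def rasterize_dem(points, cell_size, nodata=None):
    """Bin (x, y, z) points into a regular grid; store mean z per cell.

    Row 0 corresponds to the smallest y, column 0 to the smallest x.
    Cells without any point receive `nodata`.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    n_cols = int(math.floor((max(xs) - min_x) / cell_size)) + 1
    n_rows = int(math.floor((max(ys) - min_y) / cell_size)) + 1

    sums = [[0.0] * n_cols for _ in range(n_rows)]
    counts = [[0] * n_cols for _ in range(n_rows)]
    for x, y, z in points:
        col = int((x - min_x) // cell_size)
        row = int((y - min_y) // cell_size)
        sums[row][col] += z
        counts[row][col] += 1

    return [
        [sums[r][c] / counts[r][c] if counts[r][c] else nodata
         for c in range(n_cols)]
        for r in range(n_rows)
    ]


if __name__ == "__main__":
    # Four made-up points on a 1-unit grid.
    cloud = [(0.0, 0.0, 1.0), (0.5, 0.5, 3.0), (1.5, 0.0, 5.0), (0.0, 1.5, 7.0)]
    for row in rasterize_dem(cloud, cell_size=1.0):
        print(row)
```

A production DEM would also carry a georeferenced origin and would typically interpolate empty cells, both of which are omitted here.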

Performance Testing
The performance experiments were conducted using single and multiple nodes for the small and large datasets. In single-node processing, two different storage environments were tested in each cluster environment. For the local and network storage options, all UAS data and processing products were stored on the local hard drive and the network drive, respectively, in both cluster systems. Because of the speed of disk I/O (input and output) and the network, local storage was expected to be faster. Three workstations (M1, M2, and M3) in the ALC and four VMs (2.1, 3.1, 3.2, and 3.4) in the OCC were selected for single-node processing to compare performance across different levels of computing power. For multi-node processing, the datasets were processed using the network processing mode in Metashape. The processing began with one node, and then additional nodes were added, up to five and six nodes in the ALC and OCC, respectively.
All processes were conducted continuously in a batch process without manual work. Processing time was measured as a criterion of performance. All experiments were repeated three times and the average processing time was used for comparison.
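The measurement procedure described above (an unattended batch of steps, repeated three times and averaged) can be sketched with a small timing harness. This is a generic illustration, not the tooling used in the study: the step callables are placeholders standing in for the actual SfM steps, and the function name is hypothetical.

```python
import time
from statistics import mean


def time_pipeline(steps, repeats=3):
    """Run (name, callable) steps in order `repeats` times.

    Returns the average wall-clock seconds per step, plus the average
    total under the key "total".
    """
    per_step = {name: [] for name, _ in steps}
    totals = []
    for _ in range(repeats):
        run_total = 0.0
        for name, func in steps:
            start = time.perf_counter()
            func()  # run one processing step without manual intervention
            elapsed = time.perf_counter() - start
            per_step[name].append(elapsed)
            run_total += elapsed
        totals.append(run_total)
    averages = {name: mean(times) for name, times in per_step.items()}
    averages["total"] = mean(totals)
    return averages


if __name__ == "__main__":
    # Placeholder steps; real steps would invoke the SfM software.
    demo_steps = [
        ("Align Photos", lambda: time.sleep(0.02)),
        ("Build DEM", lambda: time.sleep(0.01)),
    ]
    for name, seconds in time_pipeline(demo_steps).items():
        print(f"{name}: {seconds:.3f} s")
```

Recording per-step times alongside the total makes it possible to see which steps dominate under each storage option.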
The speed-up and efficiency, the principal measurements of parallelization efficiency, were calculated from the total computation time in multi-node processing. The speed-up ($S_N$) was defined as the ratio of the time required to execute the computational workload on a single node to the time required for the same task on $N$ processors (Equation (1)) [16]:
$$S_N = \frac{T_1}{T_N} \qquad (1)$$
where $T_1$ is the execution time on a single processor and $T_N$ is the execution time on $N$ processors.
Efficiency was defined as the ratio of speed-up to the number of processors (Equation (2)) [17]:
$$E_N = \frac{S_N}{N} \qquad (2)$$
where $E_N$ is the efficiency on $N$ processors, $S_N$ is the speed-up on $N$ processors, and $N$ is the number of processors. Efficiency can be used to measure the fraction of time for which each node is usefully utilized.
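The two metrics are simple to compute from measured processing times. The sketch below takes speed-up as the single-node time divided by the N-node time and efficiency as speed-up divided by N, as defined above. The timing values in the example are made up for illustration and are not results from this study; the function names are hypothetical.

```python
def speed_up(t1, tn):
    """Speed-up: single-node time divided by N-node time."""
    return t1 / tn


def efficiency(t1, tn, n):
    """Efficiency: speed-up divided by the number of nodes."""
    return speed_up(t1, tn) / n


def scaling_table(times_by_nodes):
    """Map {n_nodes: seconds} to {n_nodes: (speed_up, efficiency)}.

    The input must contain the single-node time under key 1.
    """
    t1 = times_by_nodes[1]
    return {
        n: (speed_up(t1, t), efficiency(t1, t, n))
        for n, t in sorted(times_by_nodes.items())
    }


if __name__ == "__main__":
    # Hypothetical processing times in minutes, NOT measured values.
    example = {1: 100.0, 2: 60.0, 4: 40.0}
    for n, (s, e) in scaling_table(example).items():
        print(f"N={n}: speed-up={s:.2f}, efficiency={e:.2f}")
```

An ideal system would show a speed-up equal to N and an efficiency of 1 for every N; the gap to those values reflects time lost to communication, synchronization, and idling.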

Computing Power of Single-Node
To show the computing power of each node in the ALC and OCC, benchmark scores were measured in the different environments. In Figure 2, the items on the X-axis are abbreviations for each node. The first letter indicates the ALC (A) or OCC (O), and the third letter indicates local (L) or network (N) storage. The second term shows the machine ID (e.g., M1, M2) in the ALC or the shape (e.g., 2.1, 3.1) in the OCC. Single-core and multi-core power, indicating the overall performance of the main processor, was measured with GeekBench 5 and V-Ray. GPU performance was also tested with GeekBench 5 (OpenCL).
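The labeling scheme can be decoded mechanically, as in the sketch below for labels such as AWL-M1 or OWN-3.1 that appear in the results. The parser is a hypothetical helper written for illustration; the meaning of the middle letter is not explained in the text, so it is returned unchanged rather than interpreted.

```python
CLUSTERS = {"A": "ALC", "O": "OCC"}
STORAGE = {"L": "local", "N": "network"}


def parse_node_label(label):
    """Decode a node label such as 'AWL-M1' into its components.

    First letter: cluster (A = ALC, O = OCC).
    Third letter: storage (L = local, N = network).
    Term after the hyphen: machine ID (ALC) or shape (OCC).
    """
    prefix, sep, node_id = label.partition("-")
    if not sep or len(prefix) != 3 or not node_id:
        raise ValueError(f"unexpected label format: {label!r}")
    if prefix[0] not in CLUSTERS or prefix[2] not in STORAGE:
        raise ValueError(f"unknown cluster or storage code in {label!r}")
    return {
        "cluster": CLUSTERS[prefix[0]],
        "storage": STORAGE[prefix[2]],
        "node": node_id,
        "middle": prefix[1],  # not defined in the text; kept as-is
    }


if __name__ == "__main__":
    for label in ("AWL-M1", "OWN-3.1"):
        print(label, parse_node_label(label))
```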
In the ALC, all nodes showed different performance because each node was equipped with different hardware. Although the single-core power of M1 and M2 was higher than that of the others, M3 was the highest in the multi-core test because it had the largest number of cores (threads). GPU power was strongly related to the specification of the graphics card. M1 was the highest, while M3, M4, and M5 were similar in the GPU test.
The single-core power of the nodes in the OCC was lower than that of M1 and M2 due to the CPU frequency, but the multi-core power of the OCC was higher. Because VM.GPU2.1 had more CPU cores, more memory, and a faster network bandwidth, it showed better multi-core performance than VM.GPU3.1, and performance similar to that of VM.GPU3.2, which was equipped with the same number of OCPUs. The VM.GPU3 series was equipped with a GPU more than twice as powerful as that of VM.GPU2.1 and more than four times as powerful as those of all nodes in the ALC.
Based on the benchmark tests applied to each single node in the ALC and OCC, we tested which hardware components were more influential in SfM processing and the potential of cloud-based clusters for UAS data processing.

Single-Node Processing
The performance of SfM processing on a single node was tested in experiments based on: (1) hardware specifications; (2) storage options; and (3) the UAS data size (Figure 3). Within the same cluster system, the node with the more powerful GPU performed better in processing UAS data. For example, AWN/AWL-M1 was approximately 40 percent faster than AWN/AWL-M3 when using the small dataset, even though M3 had a higher score on the multi-core benchmark. In the OCC, the VM.GPU3 series was also faster than VM.GPU2.1 when using the small dataset. Moreover, the results of large dataset processing showed that multi-core capability is another factor in processing speed. For example, the GeekBench 5 and V-Ray scores showed a linear increase with the number of GPU cores for the VM.GPU3 series shapes (Figure 2). Despite their higher GPU performance, AWL/AWN-M2 and OWL/OWN-3.1 were slower than the other nodes when using the large dataset due to their lower CPU power. This implies that GPU and multi-core capabilities are highly influential hardware specifications in single-node processing.
In single-node processing, the processing time was critically affected by the storage option, especially in the OCC. Local storage, which is block storage, is generally faster than network storage. In the ALC, there was less than a 10 percent difference between local and network storage, since the network speed through the router was as fast as local disk I/O. However, the network storage (file storage option) in the OCC made SfM processing take twice as long. As Metashape must communicate with the storage throughout the entire process, the disk I/O speed mainly affected the processing time of each step. The disk I/O-intensive steps, such as Build DEM, Build Orthomosaic, and Export DEM/Orthomosaic, took significantly longer with network storage in the OCC. These steps occupied approximately 45 to 55 percent of the entire processing time.
The results of the comparison between the ALC and OCC in single-node processing demonstrated that a cloud computing system could provide more performance gain with the appropriate virtual machine shape and storage architecture. For example, OWL-3.4 performed approximately 20 and 35 percent faster than AWL-M1, which is the fastest machine in the ALC, for the large and small datasets, respectively.

Performance of Parallel Processing in Cluster Systems
Multi-node processing was tested by increasing the number of nodes, starting from a single node. As the number of GPUs was limited to six in the OCC, the shapes with a single GPU, VM.GPU2.1 and VM.GPU3.1, were selected, and the performance of multi-node processing was compared with that of the ALC. SfM processing was conducted by Metashape in exactly the same way as in single-node processing, but only Align Photos, Build Dense Cloud, Build DEM, and Build Orthomosaic were considered in the comparison because Export DEM/Orthomosaic was still processed on a single node. Figure 4 shows the absolute processing time in the different cluster environments. As the nodes of the ALC were connected internally through the router in the isolated network, the ALC performed faster than the clusters in the OCC. The network speed of the OCC could affect the disk I/O and the communication between nodes during parallel processing. Nevertheless, the processing time in the OCC decreased more rapidly as each node was added. The decreasing slopes of both cluster systems converged at five nodes. As mentioned in Section 3.2, the multi-node cluster with VM.GPU2.1 performed faster than that with VM.GPU3.1 for the large dataset. Although the OCC took more than twice as long in multi-node processing due to network speed, the results showed that cloud-based clusters could process UAS data with SfM software more efficiently as nodes were added.
To compare how efficient the ALC and OCC were in multi-node processing, speed-up and efficiency were calculated from the processing time (Figures 5 and 6). Speed-up is defined as the ratio of the time taken to process data on a single node to the time required to perform the same work on multiple nodes. In an ideal case, parallel processing would have a linear speed-up, following the 1-to-1 line, which means that the speed of execution increases in proportion to the number of nodes. Generally, the real speed-up is lower than the number of nodes, which means the slope is lower than 1, and the closer to the 1-to-1 line, the better. In this study, cloud-based clusters showed approximately 15 to 25 percent better speed-up in SfM processing for both the small and large datasets. Since the nodes in the ALC were not uniform, its speed-up fluctuated more, while the speed-up values of the clusters in the OCC increased gradually. Regardless of the dataset, the clusters with VM.GPU2.1 and VM.GPU3.1 had almost the same speed-up values (Figure 5).
Efficiency is a performance metric estimating how well the nodes are utilized in processing data, compared to how much effort is wasted in communication and synchronization. Nodes usually spend part of the time idling or communicating rather than computing. Therefore, efficiency is lower than 1 in a real case and decreases as more nodes are added. Figure 6 shows the efficiency against the number of nodes for the small and large datasets. Similar to the speed-up results, the clusters in the OCC showed better performance than the ALC and more stable efficiency with additional nodes. In particular, higher speed-up and efficiency were measured in multi-node processing for the large dataset. These results imply that the cloud-based cluster could provide a better and more stable system for SfM processing of UAS data. If users adopt the appropriate number of nodes and shapes in the OCC, they could construct a more efficient and stable cluster system than an on-premises cluster.

Conclusions
In this study, cloud computing and local cluster systems with various options were tested to compare the performance of SfM processing using UAS images collected in agricultural fields. Two UAS datasets were collected over agricultural fields and processed with SfM software, Agisoft Metashape, in different computing environments. The performance of local machines and clusters was compared with that of cloud computing systems. Although the local machines and cluster processed the UAS datasets faster because of the network speed and disk I/O, cloud-based clusters showed better speed-up and efficiency in parallel processing. The experiments demonstrated that cloud computing could provide more stable and efficient systems for processing massive UAS image sets when the user adopts the proper number and specification of nodes. In addition, cloud computing offers the flexibility to add instances more efficiently without having to worry about maintaining security or increasing capacity. In the future, we will apply the cloud computing and cluster systems to process huge datasets for various applications, such as forest fire monitoring, coastal monitoring, and environmental change detection, in real or near-real time.

Conflicts of Interest:
The authors declare no conflicts of interest.

Appendix A
Specifications of the CPU and GPU equipped in each node are shown in Tables A1 and A2. A shape is a template that determines the number of OCPUs, amount of memory, and other resources that are allocated to an instance in the OCI. In this study, GPU shapes for virtual machines were adopted. An OCPU is defined as the CPU capacity equivalent of one physical core of an Intel Xeon processor with hyper-threading enabled, or one physical core of an Oracle SPARC processor. The previous generation VM shape, VM.GPU2.1, is not currently available.