Article

A Distributed Data Management and Service Framework for Heterogeneous Remote Sensing Observations

1 School of Architecture and Urban Planning, Guangdong University of Technology, Guangzhou 510090, China
2 State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS), Wuhan University, Wuhan 430079, China
3 Oriental Space Port Research Institute, Yantai 265100, China
4 National Engineering Research Center for Geographic Information System, School of Geography and Information Engineering, China University of Geosciences (Wuhan), Wuhan 430074, China
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(24), 4009; https://doi.org/10.3390/rs17244009
Submission received: 2 October 2025 / Revised: 4 December 2025 / Accepted: 10 December 2025 / Published: 12 December 2025

Highlights

What are the main findings?
  • We present DDMS, a distributed data management and service framework that consolidates heterogeneous remote sensing data sources, including optical imagery and InSAR point clouds, into a unified system for scalable and efficient management.
  • The framework introduces an integrated storage model combining distributed file systems, NoSQL, and relational databases, alongside a parallel computing model, enabling optimized performance for large-scale image processing and real-time data access.
What are the implications of the main findings?
  • DDMS significantly enhances the scalability and efficiency of remote sensing data management, providing a flexible solution for real-time service delivery in applications that require high-volume, diverse datasets such as disaster monitoring, environmental analysis, and urban development.
  • By incorporating elastic parallelism and modular design, DDMS supports dynamic, large-scale geospatial data processing, reducing latency, improving service responsiveness, and ensuring robust performance across varying workloads and data sizes.

Abstract

Remote sensing imagery is a fundamental data source in spatial information science and is widely used in earth observation and geospatial applications. The explosive growth of such data poses significant challenges for online management and service, particularly in terms of storage scalability, processing efficiency, and real-time accessibility. To overcome these limitations, we propose DDMS, a distributed data management and service framework for heterogeneous remote sensing data that structures its functionality around three core components: storage, computing, and service. In this framework, a distributed integrated storage model is constructed by integrating file systems with database technologies to support heterogeneous data management, and a parallel computing model is designed to optimize large-scale image processing. To verify the effectiveness of the proposed framework, a prototype system was implemented and evaluated with experiments on representative datasets, covering both optical and InSAR images. Results show that DDMS can flexibly adapt to heterogeneous remote sensing data and storage backends while maintaining efficient data management and stable service performance. Stress tests further confirm its scalability and consistent responsiveness under varying workloads. DDMS provides a practical and extensible solution for large-scale online management and real-time service of remote sensing images. By enhancing modularity, scalability, and service responsiveness, the framework supports both research and practical applications that depend on massive earth observation data.

1. Introduction

With the rapid advancement of Earth observation technologies, remote sensing imagery has become a fundamental source of spatiotemporal big data [1,2]. Owing to their rich spectral and spatial content, remote sensing data are widely used in geological surveys [3], meteorological monitoring [4], and disaster assessment [5,6]. Recent statistics indicate that global Earth observation archives have already exceeded 807 PB, with an annual increment of more than 100 PB [7]. At the same time, the rise of large-scale remote sensing models has substantially intensified the demand for massive and heterogeneous datasets [8,9]. The construction of such models requires multi-source observations with broad spatial coverage, long temporal continuity, and multimodal integration [10], which places unprecedented pressure on data storage and management. Remote sensing data are inherently characterized by storage intensiveness and computational complexity [11]. Processing terabyte- to petabyte-scale datasets is infeasible in single-machine environments due to hardware limitations and unacceptable time consumption. These constraints underscore the need for distributed architectures and advanced data management strategies to meet the demands of large-scale computation. Efficient management and service provision have thus become not only essential for conventional remote sensing applications but also indispensable prerequisites for the scalability of large remote sensing models [11,12].
Traditional offline management approaches, which typically rely on local storage and isolated processing, are limited by their constrained scalability, inefficient resource utilization, and restricted data sharing [13,14]. These approaches cannot accommodate the growing demands of massive image datasets in either research or practical applications. With the evolution of application scenarios, requirements for efficient data access and scalable processing have become increasingly prominent. The performance requirements for image management systems are increasing accordingly, with real-time processing emerging as a defining trend for next-generation remote sensing applications [15,16]. Currently, large-scale Earth observation data are predominantly delivered through commercial or public–commercial platforms such as NOAA, Maxar, EOS, and ESA [17,18,19,20]. While these platforms provide high-resolution and frequently updated global products, their system architectures and performance details remain largely proprietary, limiting opportunities for transparent technical comparison. In addition, their cloud-centric service models restrict autonomous local data management, which can be incompatible with scenarios requiring data sovereignty or on-premises deployment. In response to these growing demands, researchers have increasingly investigated high-performance computing (HPC) strategies for remote sensing data management [21,22,23]. By leveraging parallel computing, HPC can enhance computational throughput and improve the efficiency of large-scale data analysis. However, such approaches are less effective in I/O-intensive tasks and remain limited in addressing dynamic and elastic resource demands required for real-time online services [24,25,26]. In addition, HPC-based solutions typically entail high implementation complexity, involving intricate configurations and significant maintenance costs, which further restrict their practical deployment at scale [27,28].
Big data technologies have yielded substantial progress in meeting the storage and processing demands of large-scale geospatial information [29,30]. Distributed file systems enable elastic and fault-tolerant storage, supporting scalable repositories for massive datasets [31,32,33,34]. At the same time, distributed computing frameworks such as MapReduce [35,36] and Spark [37,38] have significantly advanced the execution of complex processing tasks through cluster-based parallelization and efficient workload scheduling. Virtualization techniques further facilitate system deployment by decoupling hardware and software environments, improving both portability and resource utilization across heterogeneous infrastructures. Building on these advances, many geospatial big data platforms have been developed [39,40,41,42,43]. However, these systems are primarily oriented toward linear and polygonal vector data and therefore lack dedicated mechanisms for managing and providing services for large-scale remote sensing imagery. In addition, parallel processing approaches based on MPI and GPU acceleration [44,45] have been proposed for remote sensing data computation. While such approaches can deliver high throughput for specific tasks, they often neglect storage management, impose stringent hardware requirements, and exhibit limited fault tolerance, constraining their broader applicability [46,47].
Despite the application of big data technologies in remote sensing data management, existing solutions still present significant limitations. First, no widely recognized and generalizable framework for online management and service can be seamlessly adapted to heterogeneous environments [48]. Second, many systems remain dependent on single, tightly coupled storage architectures, which constrain scalability and limit their capacity to accommodate diverse datasets [49]. Third, the optimization of service mechanisms remains inadequate, particularly in terms of dynamic updating strategies and efficient resource scheduling. Moreover, current frameworks lack a unified structure that can support multiple types of image services within a single platform. Collectively, these limitations highlight a fundamental research challenge: designing a framework that not only meets the essential requirements of large-scale remote sensing data management and service provision but also reduces implementation complexity while ensuring adaptability across diverse storage and computing infrastructures. Addressing this challenge is crucial for advancing toward unified, scalable, and efficient systems for managing remote sensing data.
To address the limitations, this study introduces DDMS (Distributed Data Management and Service Framework), a platform-independent framework designed to support the management and real-time service of large-scale remote sensing imagery. DDMS formalizes two fundamental models: an integrated storage model that accommodates heterogeneous data repositories, and a distributed parallel processing model tailored to the processing requirements of remote sensing data. The framework adopts a loosely coupled and component-oriented design, enabling interoperability between diverse storage infrastructures and computing environments. By integrating distributed storage with parallel computing, DDMS improves the efficiency of data access, processing, and service delivery while alleviating the implementation complexity that has constrained earlier approaches. Furthermore, the abstraction of storage and computation modules provides a unified operational interface, which facilitates system extensibility and reduces dependence on specific backend configurations. Through these design principles, DDMS establishes a scalable and adaptable foundation for advancing online management and service architectures in the domain of massive remote sensing data. The framework particularly targets open deployment scenarios such as government geospatial platforms and academic research environments, where autonomous data control, flexible workflow customization, and continuous data integration are essential requirements that cannot be fully satisfied by current commercial service platforms.

2. DDMS Management and Service Architecture

2.1. The Overall Architecture of DDMS

The core challenge of remote sensing data management and service lies in addressing storage and processing demands [50], while simultaneously optimizing functional mechanisms in accordance with practical application requirements to ensure efficient data management and service delivery. Therefore, the DDMS framework provides a unified access interface that abstracts heterogeneous storage and computing systems, simplifying data management and service processes and enabling online management and real-time service of remote sensing data. The overall architecture of DDMS is organized into five parts, as illustrated in Figure 1.
The DDMS framework is designed to support online management and service of heterogeneous remote sensing data by providing unified operational interfaces for external storage, computing platforms, and application portals. As illustrated in Figure 1, the architecture is organized into five interconnected components. At the foundation lies a distributed cluster environment, which relies on virtualization and distributed technologies to construct scalable computing and storage resources for higher-level applications. On this basis, DDMS establishes an integrated storage model to accommodate heterogeneous data structures, including imagery, metadata, and tiles, each of which differs significantly in format and organizational requirements. Above the storage layer, a distributed parallel processing model provides elastic parallel processing capabilities tailored to remote sensing tasks, supporting operations such as multi-resolution pyramid construction and tile-based service updates through optimized transformation and action operators. Building upon these foundations, the management and service solution layer delivers practical functions such as metadata standardization, image query and retrieval, hierarchical modeling, segmentation, and the release and update of services, linking the infrastructure with user-oriented applications. At the top of the architecture, an external service portal integrates all underlying modules and exposes interfaces for data management, service construction and maintenance, online monitoring, TMS and WMTS service access, as well as distributed raster algebra. Within this architecture, the integrated storage model and distributed parallel processing model constitute the core modules, jointly abstracting storage and computation to enable efficient management and large-scale processing of remote sensing data in distributed environments. Figure 2 presents the UML class diagram of the DDMS framework, which formally depicts the structural organization of its core modules and the logical relationships among components that enable integrated management and service of heterogeneous remote sensing data.

2.2. Remote Sensing Data Integrated Storage Model

Efficient management of remote sensing data requires not only differentiated storage strategies tailored to the structural characteristics of various data types but also their integration into a unified framework to support the coordinated management of heterogeneous and multi-source datasets. Remote sensing imagery represents a typical form of large-scale unstructured data, usually exceeding the gigabyte level. Previous studies have demonstrated that file-based organization is more suitable for storing and managing such data [51,52]. Given the limitations of conventional file systems in terms of storage capacity and scalability, distributed file systems provide a superior solution. Moreover, distributed storage of imagery can be seamlessly combined with distributed computing, enabling block-level parallel processing and thereby improving image processing efficiency [49]. In contrast, remote sensing metadata is relatively small in scale, typically organized as key–value pairs with a highly uniform structure, making it well-suited for standardized extraction and processing. Although both relational and NoSQL databases are applicable, relational databases are generally more appropriate due to their higher efficiency in data filtering and querying, combined with the modest overall volume of metadata [53,54].
Beyond raw imagery and metadata, service construction processes generate massive volumes of image tiles. While each tile is small in size, the overall quantity is extremely large and inherently unstructured. To balance storage capacity with response efficiency, distributed NoSQL databases are preferable, as they can exploit their key–value structures and scalability to manage tile data effectively [55,56]. In addition to imagery, vectorized datasets such as InSAR point clouds also require efficient storage and retrieval. Within DDMS, raw point cloud data are stored in the distributed file system (DFS) due to their large scale and unstructured characteristics, whereas the tiled and indexed representations of point clouds are organized in NoSQL databases. Spatial indexing strategies (e.g., Z-order or Hilbert curves [57]) are employed to encode and partition these tiles, enabling efficient spatiotemporal retrieval and visualization. To accommodate such heterogeneous requirements, the DDMS framework introduces an integrated hybrid storage model (Figure 3). In this model, remote sensing data are abstracted into three object classes (Figure 4): the RSMetadataObject, which manages parsing, normalization, and warehousing of metadata; the RSObject, which handles raw remote sensing data including imagery and point clouds, supporting operations such as upload, download, pyramid construction, and vector slicing; and the RSTileObject, which organizes tile data and provides interfaces for layer read/write, indexing, and spatial encoding. Through this abstraction, heterogeneous datasets can be mapped to appropriate storage backends—DFS for large-scale imagery and raw point clouds, relational databases for metadata, and NoSQL systems for image and point cloud tiles—establishing a loosely coupled storage mechanism that enhances the efficiency, scalability, and interoperability of multi-source remote sensing data management.
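To make this abstraction concrete, the following Python sketch outlines the three object classes as abstract interfaces; the class names follow the description above, while the method signatures and backend handles are illustrative assumptions rather than the actual DDMS API.

from abc import ABC, abstractmethod

class RSMetadataObject(ABC):
    """Parses, normalizes, and warehouses metadata in the relational backend."""

    @abstractmethod
    def parse(self, raw_header: bytes) -> dict:
        """Extract and normalize key–value metadata from a raw image header."""

    @abstractmethod
    def warehouse(self, record: dict) -> str:
        """Insert a normalized record into the relational store; return its identifier."""

class RSObject(ABC):
    """Handles raw imagery and point clouds stored in the distributed file system."""

    @abstractmethod
    def upload(self, local_path: str, dfs_path: str) -> None:
        """Write a raw dataset into the DFS."""

    @abstractmethod
    def build_pyramid(self, dfs_path: str, levels: int) -> str:
        """Trigger multi-resolution pyramid construction or vector slicing; return a version id."""

class RSTileObject(ABC):
    """Organizes tile data and spatial keys in the NoSQL backend."""

    @abstractmethod
    def write_tile(self, key: str, payload: bytes) -> None:
        """Store an encoded tile under its spatial key."""

    @abstractmethod
    def read_tile(self, key: str) -> bytes:
        """Retrieve an encoded tile by its spatial key."""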
To ensure consistency across the hybrid storage backends, DDMS adopts a versioned management strategy. Each update of an image or point cloud creates a new version identifier that links metadata records, raw files in the distributed file system, and tile entries in the NoSQL cluster. New data are always written as a separate version, and the previous version remains unchanged until the new version has been fully generated and validated. The logical layer only exposes versions whose status is marked as active, so partially updated data never becomes visible to external services.

2.3. Remote Sensing Data Distributed Processing Model

In addition to storage capacity, the efficiency of data computation is a major factor constraining the management and service of remote sensing datasets. Traditional single-node systems cannot fully utilize available computing resources, and their performance quickly reaches a bottleneck beyond which further acceleration becomes difficult. High-performance solutions based on MPI or GPU acceleration can enhance computational capability. Still, their complex programming requirements and reliance on specialized hardware restrict their applicability in large-scale operational environments [47]. Distributed computing offers a compelling alternative, as it leverages resource schedulers to utilize the collective capacity of commodity cluster nodes without depending on dedicated devices, thereby broadening its applicability. Remote sensing data analysis is typically grounded in raster algebra for imagery and spatial algebra for point clouds, where pixel or point values within a defined spatial domain are processed to support visualization and quantitative analysis. To improve computational efficiency, the DDMS framework develops a Distributed Parallel Processing (DPP) model on top of its integrated storage system (Figure 5). Through distributed interfaces, both imagery and point clouds can be read in parallel. For imagery, if the required data are not already stored in the NoSQL cluster, they are retrieved from the distributed file system, partitioned into blocks, and processed concurrently across nodes using distributed operators to construct an elastic dataset represented as key–value pairs of spatial extent and corresponding pixel values. For vector data such as InSAR point clouds, raw datasets are also stored in DFS, while tiled and indexed subsets are retrieved from NoSQL clusters to construct elastic distributed point datasets, enabling efficient parallel computation.
Once constructed, elastic datasets undergo repartitioning to further increase parallelism, with the number of partitions determined by data size and cluster capacity, typically ranging from hundreds to thousands. A distributed resource manager assigns partitions to cluster nodes using fair scheduling policies and executes algebraic operations within predefined spatial ranges. Similar to traditional map algebra, four categories of operations are supported: pixel-wise operations for independent cell-based calculations, neighborhood-based operations such as convolution filtering, region-based operations that integrate vector-defined masks to guide parallel processing, and map-wide operations that apply uniform calculations across the entire dataset. For vector data such as InSAR point clouds, analogous operators support local deformation estimation, neighborhood-based coherence filtering, region-based aggregation within geophysical zones, and global reference frame adjustments. The DPP model supports these operations through transformation and action operators. Transformations, which constitute the core of distributed parallel computing, include functions such as map, flatMap, and mapPartition for tile- or block-level computations, reduce and fold for distributed aggregation, and partition or repartition for data layout optimization. Auxiliary operators, such as persist, cache, and count, are implemented to improve efficiency by retaining intermediate results in memory. Action operators are then applied to collect or output the final results, thereby enabling the efficient large-scale processing of both imagery and vector data, such as InSAR point clouds, within the DDMS framework.
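As an illustration of this processing pattern, the PySpark sketch below builds an elastic dataset of (spatial key, pixel block) pairs, repartitions it, applies a pixel-wise transformation, caches the intermediate result, and finishes with a map-wide aggregation action. The synthetic block generator, partition count, and stretch factor are illustrative assumptions; in DDMS the pairs would be read from the distributed file system or the NoSQL tile store.

import numpy as np
from pyspark import SparkContext

sc = SparkContext(appName="ddms-dpp-sketch")

# Synthetic elastic dataset: key = (zoom, col, row), value = pixel block.
blocks = [((12, x, y), np.random.randint(0, 255, (256, 256), dtype=np.uint8))
          for x in range(8) for y in range(8)]
rdd = sc.parallelize(blocks)

# Repartition to raise parallelism; the partition count would normally be
# derived from the data size and cluster capacity.
rdd = rdd.repartition(16)

# Pixel-wise (local) operation: a simple linear stretch applied per block.
stretched = rdd.mapValues(lambda px: np.clip(px.astype(np.float32) * 1.2, 0, 255))

# Retain intermediate results that later stages reuse.
stretched.persist()

# Map-wide aggregation (action): global mean pixel value across all blocks.
total, count = stretched.map(lambda kv: (float(kv[1].sum()), kv[1].size)) \
                        .reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]))
print("global mean:", total / count)

sc.stop()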

3. DDMS Application Design: Optical Image Service Management and Large-Scale InSAR Data Visualization

3.1. Optical Image Service Online Management

Efficient management and timely updating of optical remote sensing image services are fundamental to advanced geospatial applications. With the growing demand for near-real-time monitoring in fields such as urban transformation, agricultural supervision, and disaster response, the ability to refresh imagery services with minimal latency has become a critical requirement [58,59]. Traditional offline approaches, which mosaic historical and newly acquired scenes into a composite before release, entail heavy computational overheads, generate redundant tile datasets, and introduce considerable delays. These drawbacks not only waste storage and processing resources but also compromise the responsiveness of applications that require high temporal resolution. To address these challenges, the DDMS implements a tile-level incremental updating strategy tailored for online service management. Instead of full-scale mosaicking, DDMS combines its distributed integrated storage model with a distributed parallel processing model to update tile services dynamically. Incoming optical images undergo preprocessing steps—including radiometric normalization, reprojection, and pyramid segmentation—within the distributed cluster, and each dataset independently produces a multi-resolution tile service without modifying previously archived data. This modular and non-invasive design preserves data integrity while enabling rapid service refresh.
For incremental tile updates, DDMS follows a three-stage workflow. First, the system creates a new version entry in the metadata store and records the intended coverage and service level. Second, the distributed parallel processing model generates all pyramid tiles for this version and writes them to the NoSQL backend, followed by automatic integrity checks on tile counts, spatial coverage, and key levels. Third, only if all checks succeed does the logical layer switch the active mapping from the previous version to the new one and flag the previous version as historical. If any failure occurs in the second stage, the switch is not performed, and the previous version remains active, which prevents clients from observing half-updated services.
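The Python sketch below condenses this three-stage workflow, assuming illustrative helper methods (create_version, generate_pyramid_tiles, verify_tiles, switch_active, mark) on the metadata and tile stores; it is not the DDMS implementation itself, but it shows how the active mapping is switched only after validation succeeds.

def incremental_update(service_id: str, image_path: str, metadata_store, tile_store):
    # Stage 1: register a new version with its intended coverage and service level.
    version = metadata_store.create_version(service_id, source=image_path,
                                            status="building")

    # Stage 2: generate all pyramid tiles for this version, then verify tile
    # counts, spatial coverage, and pyramid levels against the manifest.
    manifest = tile_store.generate_pyramid_tiles(image_path, version)
    if not tile_store.verify_tiles(version, manifest):
        metadata_store.mark(version, status="failed")
        return None  # the previous version stays active; no partial state is exposed

    # Stage 3: atomically flip the logical mapping to the validated version and
    # flag the previous one as historical.
    metadata_store.switch_active(service_id, new_version=version)
    metadata_store.mark(version, status="active")
    return version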
During online access, as shown in Figure 6, tiles corresponding to the requested spatial extent are retrieved, stitched in parallel, and visualized in real time. Non-overlapping areas are displayed directly from storage, while overlapping regions are fused through distributed operators using interpolation schemes such as nearest neighbor, bilinear, or cubic convolution. This workflow (Algorithm 1) minimizes redundant computation, accelerates update cycles, and achieves a practical balance between accuracy and timeliness. The updating mechanism is supported by modular service components: ImgPyramider performs hierarchical tile generation; RSLogicalLayer establishes logical mappings that unify standard and updated services under a single access model; and RSSpatialMosaic provides distributed resampling and stitching functions. Together, these modules ensure the scalability and interoperability of service operations.
Beyond updating, DDMS also emphasizes interoperability and standardization. While the tile map service (TMS) is the default format, a proxy-based transformation module enables on-demand conversion into an OGC-compliant Web Map Tile Service (WMTS). This includes the automatic generation of capability documents that describe service metadata, supported coordinate systems, and access endpoints, thereby facilitating integration with heterogeneous client systems. By combining incremental updating, distributed parallelism, and standards-based interoperability, the DDMS framework surpasses conventional methods to deliver real-time, scalable, and efficient service management. Although its tile-based approach may sacrifice some fusion accuracy compared with offline mosaics, it substantially reduces latency and computational costs. Most importantly, it ensures rapid access to the latest optical imagery, enabling high-timeliness applications and reinforcing the role of remote sensing in dynamic, data-driven decision-making.
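A minimal sketch of the index translation such a proxy performs is given below: TMS counts tile rows from the bottom of the grid while WMTS counts TileRow from the top, so only the vertical index needs to be flipped at a given zoom level. The endpoint path and function names are illustrative assumptions rather than the DDMS proxy interface.

def wmts_to_tms(tile_matrix: int, tile_row: int, tile_col: int):
    """Map a WMTS (TileMatrix, TileRow, TileCol) request to internal TMS (z, x, y)."""
    z = tile_matrix
    x = tile_col
    y = (2 ** z) - 1 - tile_row  # flip the vertical index between the two schemes
    return z, x, y

def tms_tile_url(base: str, layer: str, z: int, x: int, y: int) -> str:
    """Build the internal TMS tile address resolved by the proxy."""
    return f"{base}/tms/1.0.0/{layer}/{z}/{x}/{y}.png"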
Algorithm 1 Parallel update of the remote sensing image tile service
Input: HTTP update request R
Output: Resulting tile(s) T
1: if isValidFormat(R) then
2:   forwardRequest(R)
3: end if
4: (z, x, y) ← parseTileRequest(R)
5: targetBox ← computeSpatialBox(z, x, y)
6: InfoList ← ∅
7: for each sourceImage S in updateCandidates do // parallelizable
8:   (zs, xs, ys) ← parseImageHeader(S)
9:   box_s ← computeSpatialBox(zs, xs, ys)
10:    if intersects(box_s, targetBox) then
11:     InfoList ← InfoList ∪ {retrieveImageInfo(box_s)}
12:    end if
13: end for
14: TileSet ← ∅
15: for each info in InfoList do // parallelizable
16:    tile ← readTileFromStore(info, NoSQLHandler)
17:    TileSet ← TileSet ∪ {tile}
18: end for
19: if |TileSet| = 0 then
20:    return null
21: else if |TileSet| = 1 then
22:    return head(TileSet)
23: else
24:    T ← TileSet.reduce { (A, B) ⇒
25:      for each band b in B do
26:        for r = 1 to rows(b) do
27:         for c = 1 to cols(b) do
28:          A[b][r][c] ← mergeResample(A[b][r][c], B[b][r][c], method)
29:         end for
30:        end for
31:      end for
32:      A
33:    }
34:    return T
35: end if
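For reference, the reduce step at the end of Algorithm 1 can be sketched in Python as follows, assuming each retrieved tile is a numpy array of shape (bands, rows, cols) already aligned to the target grid; the per-pixel mergeResample call is replaced here by a simple rule in which valid (non-zero) pixels of the later tile overwrite the earlier one.

from functools import reduce
import numpy as np

def merge_tiles(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Overlay tile b onto tile a wherever b holds valid data (nodata assumed to be 0)."""
    valid = b != 0
    out = a.copy()
    out[valid] = b[valid]
    return out

def fuse(tile_set):
    """Equivalent of the reduce over TileSet in Algorithm 1."""
    tiles = list(tile_set)
    if not tiles:
        return None
    return reduce(merge_tiles, tiles)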

3.2. Large-Scale InSAR Point Cloud Visualization

The visualization of large-scale InSAR point cloud data has long posed significant challenges in remote sensing data processing [44,60]. Traditional single-machine approaches can handle visualization tasks to a limited extent; however, because they rely on a single computing node, their processing speed degrades rapidly as data volumes and system complexity grow. With the continuous growth of data scale, these methods are increasingly unable to meet the requirements of efficient and near-real-time visualization. Accordingly, distributed computing and hybrid storage strategies have been explored as potential solutions, aiming to improve the efficiency of large-scale point cloud processing while reducing latency and resource consumption. Standardized design principles and optimized storage architectures have been highlighted as promising directions for supporting the visualization of dense InSAR point cloud datasets in large-area ground monitoring scenarios.
Within our prototype system, as shown in Figure 7, the construction of InSAR point pyramids plays a crucial role. Pyramid construction is not only a method for multi-level processing of remote sensing imagery but is also applicable to vectorized InSAR point cloud data. Through multi-resolution segmentation, massive data can be divided into tiles of varying granularities, which are stored in a pyramid structure. Each level of the pyramid represents a different resolution, enabling efficient data access and real-time rendering. In this process, simplifying the point cloud data is a core step of pyramid construction. Through simplification algorithms, we denoise, compress, and hierarchically process the point cloud data, enabling each resolution level to load and display the data quickly with minimal computational resources. The choice of simplification method depends not only on the spatial distribution characteristics of the data but also on the requirements of practical applications, such as real-time demands in disaster monitoring or urban change detection.
The pyramid construction process involves spatially dividing the original InSAR point cloud data to create a multi-level tile structure. Let the point cloud data set be denoted as $P = \{p_1, p_2, \ldots, p_n\}$, where each point $p_i$ has spatial coordinates $(x_i, y_i)$ and attribute information $a_i$ (such as reflectivity, velocity, etc.). The pyramid construction decomposes the data into tiles of different resolutions, where each tile $T_z$ represents the spatial data at resolution level $z$. Each tile can be expressed as follows:
$T_z = \{\, p_i \mid p_i \in P,\; d(p_i, p_j) < \delta \,\}$
where $d(p_i, p_j)$ is the spatial distance between points $p_i$ and $p_j$, and $\delta$ is the threshold, representing the maximum distance at which points are considered to belong to the same tile. By dividing the data into different resolution levels, each level contains data corresponding to a specific resolution.
In processing the data within each tile, we apply a simplification method that considers both spatial distance and attribute differences. Let the spatial distance between points $p_i$ and $p_j$ be $d(p_i, p_j)$. If $d(p_i, p_j) < \delta$, we consider these points to be overlapping and eligible for simplification. The simplification process not only relies on spatial characteristics but also considers attribute differences. Let the attributes of points $p_i$ and $p_j$ be $a_i$ and $a_j$, respectively. When the attribute difference $|a_i - a_j| < \epsilon$, we consider these points to be similar in attributes and eligible for merging. The coordinates of the merged point $(x_m, y_m)$ and the attribute $a_m$ can be calculated as:
$x_m = (x_i + x_j)/2$
$y_m = (y_i + y_j)/2$
$a_m = (a_i + a_j)/2$
This spatial and attribute-based simplification method significantly reduces redundant points while preserving key spatial features and attribute information, thus reducing computational load. The simplified point cloud data is then encoded and stored using spatial indexing techniques. To improve data access efficiency, space-filling curve encoding methods are applied to the simplified data. Given the spatial extent $R_z$ of the point cloud data at resolution level $z$, the spatial index for each tile is implemented via an encoding function $\mathrm{encode}(R_z) = \mathrm{Geohash}(R_z)$.
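To illustrate the two steps above, the following Python sketch merges point pairs that satisfy both the distance and attribute thresholds and derives an interleaved Z-order key that stands in for the Geohash-style encoding; the thresholds, the (x, y, a) point layout, and the key function are illustrative assumptions.

import numpy as np

def simplify(points: np.ndarray, delta: float, eps: float) -> np.ndarray:
    """points: array of rows (x, y, a). Merge pairs closer than delta whose
    attribute difference is below eps, averaging coordinates and attributes."""
    out, used = [], np.zeros(len(points), dtype=bool)
    for i in range(len(points)):
        if used[i]:
            continue
        p = points[i].copy()
        for j in range(i + 1, len(points)):
            if used[j]:
                continue
            q = points[j]
            if np.hypot(p[0] - q[0], p[1] - q[1]) < delta and abs(p[2] - q[2]) < eps:
                p = (p + q) / 2.0  # x_m, y_m, a_m as defined above
                used[j] = True
        out.append(p)
    return np.array(out)

def zorder_key(x: float, y: float, extent, bits: int = 16) -> int:
    """Interleave normalized x/y bits into a single spatial key for a tile."""
    xmin, ymin, xmax, ymax = extent
    xi = int((x - xmin) / (xmax - xmin) * ((1 << bits) - 1))
    yi = int((y - ymin) / (ymax - ymin) * ((1 << bits) - 1))
    key = 0
    for b in range(bits):
        key |= ((xi >> b) & 1) << (2 * b) | ((yi >> b) & 1) << (2 * b + 1)
    return key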
Spatial indexing not only accelerates data retrieval but also facilitates the effective distribution of data within distributed storage systems. The simplified point cloud data, along with spatial indexing, is stored in a distributed NoSQL database and processed in parallel using distributed computing frameworks. Parallel computation enables the data within different tiles to be processed on multiple computing nodes, facilitating efficient data processing and visualization. Each layer of the pyramid stores data as tiles, with tile size and resolution adjustable. The tile size at different resolution levels needs to be adjusted to ensure that the computational load for each tile is manageable. Let the tile size at level $k$ be $W_k \times H_k$; then at level $k+1$, the tile size is adjusted as follows:
$W_{k+1} = W_k/2, \quad H_{k+1} = H_k/2$
The construction of tiles at each level requires simplification and spatial adjustment to ensure that data from different levels seamlessly connect and meet real-time rendering requirements. With the support of distributed storage and computing architecture, large-scale data can be loaded and rendered rapidly, maintaining a balance between real-time performance and accuracy.

4. Experiments and Discussion

4.1. Experimental Setting

To evaluate the effectiveness of the DDMS distributed framework for managing and servicing remote sensing data, a private cloud environment consisting of six nodes was deployed on a virtualization platform. Within this environment, one application server acted as the master node, coordinating with five distributed computing nodes to verify system efficiency and reliability. The cluster configuration was as follows: (1) one application server node equipped with four virtual CPUs (VCPUs), 16 GB of memory, and 500 GB of disk storage; and (2) five distributed computing nodes, each configured with six VCPUs, 12 GB of memory, and 1024 GB of disk storage. In our experiments, the optical dataset consists of three-band RGB remote sensing imagery with a spatial resolution of approximately 1 m. The InSAR dataset is stored as discrete point observations with tens of millions of points per scene.
In terms of system architecture, raw remote sensing datasets, including both optical imagery and InSAR point clouds, were stored in the Hadoop Distributed File System (HDFS). At the same time, Spark was adopted as the distributed computing engine. In this configuration, the storage and computing layers of DDMS remain loosely coupled with the underlying distributed systems. While the current prototype implementation is based on HDFS and Spark, the framework design does not rely on platform-specific features and can be extended to larger clusters or alternative distributed infrastructures when additional resources become available. This provides architectural support for horizontal scalability beyond the present deployment.
For tile data storage, three representative distributed NoSQL databases were incorporated: Accumulo [61], HBase [62], and Redis [63] Cluster. Accumulo, as a column-oriented store, provides efficient support for large-scale data ingestion and retrieval. HBase offers high scalability and throughput, well-suited for hybrid real-time and batch data scenarios. Redis Cluster, functioning as a distributed key–value store, enables rapid caching and high-speed access. In our deployment, Redis Cluster was configured in a pure in-memory operating mode without persistence enabled. These backends were selected because they represent three typical storage strategies widely adopted in remote sensing data services. Accumulo and HBase provide persistent and scalable disk-based storage suitable for managing massive tile repositories and long-term archiving. Redis Cluster adopts an in-memory architecture that enables rapid access and high concurrency, reflecting practical scenarios where latency-sensitive applications avoid disk I/O by caching hot tiles in memory to improve online visualization and monitoring performance. The integration of these heterogeneous backends allowed for a comprehensive verification of the DDMS framework’s compatibility and performance across diverse NoSQL architectures and application demands.
The experimental evaluation primarily focused on testing the compatibility, processing efficiency, and transaction capabilities of the DDMS for online management and real-time service of heterogeneous data. To ensure the effective execution of distributed workloads, the Spark runtime environment was configured with the following settings: driver memory set to 2 GB, executor memory to 4 GB, and executor cores to 4. Comparative experiments across the different NoSQL backends demonstrated that the framework exhibited strong flexibility and stability when handling large-scale datasets, confirming its performance advantages for scalable remote sensing data management and service delivery. It is important to clarify that direct benchmarking against commercial Earth observation platforms is not included, as their system architectures and performance metrics are not publicly accessible. Therefore, the evaluation focuses on feasibility in open deployment environments consistent with the intended application scenarios of DDMS.
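For reference, the Spark executor settings listed above can be expressed in a PySpark session builder as sketched below; the application name is illustrative, and driver memory is normally supplied at submission time (for example, spark-submit --driver-memory 2g) rather than inside the application code.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("ddms-prototype")            # illustrative name
         .config("spark.executor.memory", "4g")
         .config("spark.executor.cores", "4")
         .getOrCreate())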

4.2. Experiments on Storage Capability

To evaluate the compatibility of the DDMS system with different storage backends, this study conducted distributed experiments using Accumulo, HBase, and Redis Cluster as backend databases (Figure 8), focusing on pyramid-based partitioning and storage. Remote sensing images of varying sizes were processed in parallel under different partition numbers, and the average execution times were recorded. To ensure experimental consistency, the spectral configuration and spatial resolution of the imagery were kept constant across datasets of different sizes. The results demonstrate that the DDMS framework can support distributed acceleration for hierarchical pyramid construction and integrate flexibly with various types of NoSQL databases for tile storage.
As shown in Figure 8, the Redis Cluster backend consistently demonstrated the highest processing efficiency, outperforming both Accumulo and HBase in pyramid slicing and tile storage. Specifically, when the partition numbers were set to 60, 120, and 180, the average acceleration ratios of Redis Cluster compared with Accumulo were 3.5, 2.7, and 2.0, respectively; compared with HBase, the ratios were 2.8, 2.2, and 1.6. These results indicate that, given sufficient memory capacity, distributed cache-based key–value stores achieve superior performance in large-scale tasks involving point cloud and image tile storage. HBase, as a distributed columnar database, delivered intermediate performance, faster than Accumulo but slightly slower than Redis Cluster. With the increase in partition numbers, HBase and Accumulo demonstrated a stable upward performance trend, indicating stronger scalability for large-scale data ingestion. As shown in Figure 8b,c, when the dataset size expanded from approximately 2.5 GB to 10 GB, the pyramid construction time for both systems decreased, suggesting that they can accommodate growing data volumes. In contrast, Redis Cluster consistently maintained high performance and stability across different partition settings, with only a slight increase in latency observed when the number of partitions became excessively large.
In the experiments on InSAR data, as shown in Figure 9, the overall processing time was higher than for optical imagery. This is mainly because point cloud simplification and hierarchical modeling require extensive spatial proximity computations and attribute fusion operations, resulting in significantly higher computational complexity than the rasterization of two-dimensional imagery. The differences among storage backends were also more pronounced. Redis Cluster continued to demonstrate strong parallel acceleration, though its advantage in memory access was somewhat reduced in the point cloud scenario. HBase exhibited a more balanced performance, with its strengths in batch data management and range queries enabling efficiency close to that of Redis. Accumulo remained less efficient, and although its performance improved with increased data partitions, the gains were limited, lagging behind the other two databases. Across different partition scales, both Redis and HBase showed good stability, with processing time exhibiting quasi-linear growth as input data volume increased, indicating that they effectively leveraged distributed computing architectures to mitigate the impact of I/O overhead. As shown in Figure 9b,c, the variation in processing time remained relatively small across different dataset sizes, consistently within 1500 s, further demonstrating their ability to maintain high processing efficiency under increasing workloads.
The experimental results show that, with increasing input data volumes, all three backends exhibited quasi-linear growth in execution time; however, Redis Cluster and HBase demonstrated greater stability than Accumulo. Under Redis Cluster and HBase configurations, the variation in processing time caused by different partition settings remained limited, indicating that their parallel computation frameworks and storage architectures effectively mitigated I/O fluctuations induced by partition changes. As shown by the blue bars in Figure 9, for typical-scale InSAR data, the Redis-based backend was able to keep the pyramid construction time within approximately 300–1200 s. This suggests that Redis Cluster is particularly well-suited for real-time visualization and monitoring scenarios that require low latency. In contrast, HBase offers a more balanced solution in terms of stability and scalability, making it well-suited for hybrid workloads that involve both batch and streaming tasks.

4.3. Experiments on Image Service Construction Performance

The processing time of remote sensing imagery is a key metric for evaluating the performance of DDMS. To assess its effectiveness, comparative experiments were conducted between the DDMS prototype system and the ArcGIS Server system, focusing on the execution time of image pyramid construction. Additionally, further experiments were designed to investigate how the end-to-end processing time of the DDMS prototype system changes as the input data size increases. For this purpose, imagery datasets with sizes of approximately 0.6 GB, 1 GB, 1.5 GB, 2 GB, 2.2 GB, 2.5 GB, and 3 GB were selected. Both the DDMS prototype and ArcGIS Server were deployed under comparable hardware conditions, ensuring equivalent total CPU frequency and memory capacity. It should be clarified that ArcGIS Server was deployed as a single-machine Personal Edition instance, consistent with commonly adopted configurations in operational remote sensing service systems. Therefore, this comparison does not constitute a distributed-to-distributed benchmark but instead uses ArcGIS Server as a baseline reference reflecting typical single-node deployments in similar system settings. The experimental results, as illustrated in Figure 10, demonstrate that under identical configurations, the DDMS framework consistently requires less time for hierarchical pyramid tiling than the ArcGIS Server system. Furthermore, the calculated relative speedup of DDMS exceeded a factor of two across all tested dataset sizes, highlighting its efficiency advantage in pyramid-based image processing tasks.
Since the ArcGIS Server personal edition is unable to process imagery of larger sizes, direct comparison between the two systems could not be extended to higher data volumes. Therefore, further experiments were conducted solely on the DDMS prototype system to evaluate its capability in large-scale pyramid construction. A total of sixteen remote sensing images, with sizes ranging from approximately 1 GB to 100 GB, were used as inputs to evaluate the scalability and performance of the DDMS framework systematically, and the results are shown in Figure 11. As illustrated, the processing time of DDMS increases in an approximately linear fashion with the growth of input size, indicating that the system exhibits good stability in the distributed processing of large-scale remote sensing imagery.
To provide a more quantitative measure, an indicator T was defined as the average time consumed per unit of pyramid construction:
$T = \mathrm{ProcessingTime} / \mathrm{ImageSize}$
where ProcessingTime refers to the end-to-end time required for pyramid construction, measured from execution initiated in the system management interface to completion of the entire workflow, and ImageSize denotes the size of the input image. Based on this definition, Figure 11 can be transformed into the performance curve shown in Figure 12. The curve reveals that T decreases initially and then increases gradually as the input size grows. This pattern aligns with the characteristics of distributed parallel computing: for smaller datasets, the computational workload is limited, and overheads such as data partitioning, task scheduling, and result communication dominate, leading to higher unit processing costs. With increasing data size, parallelism is better utilized, resulting in lower and more stable unit processing time. As the dataset size continues to grow, the benefits of increased parallel utilization reach a balance with the overhead of distributed task scheduling, data shuffling, and inter-node communication. Under the current cluster configuration, the workload around approximately 18 GB becomes sufficiently large to fully exploit parallel execution across nodes, while coordination overhead remains relatively low, which results in the lowest observed unit processing time. As the workload continues to increase beyond this point, queueing delay and inter-node synchronization overhead gradually accumulate and lead to an increase in unit processing cost. This observation aligns with reported behaviors in distributed large-scale raster computation, where processing efficiency typically reaches a peak once parallel resources are adequately engaged before eventually declining due to increasing distributed system overhead [64,65].

4.4. Experiments on Stress Testing for Service Responsiveness

Service response capability is a critical indicator for evaluating the stability and practicality of distributed prototype systems. To assess the transaction-level responsiveness of the DDMS framework with different storage backends, we conducted experiments based on imagery and InSAR point cloud services generated by the previously introduced online updating method. The results are shown in Figure 13, Figure 14, Figure 15 and Figure 16. Figure 13 presents the average response time for 1000 simulated tile service requests, while Figure 14 illustrates the results of a stress test simulating 100 concurrent users per second over 1000 repeated requests. It should be noted that in these tests, Redis Cluster operated entirely in memory without persistent storage enabled, and therefore the results mainly reflect memory-dominated access scenarios.
As shown in Figure 13, under low-concurrency conditions, the response latency of tile services with Redis Cluster as the storage backend was consistently lower than that of services using HBase and Accumulo. This performance advantage can be attributed to the architectural characteristics of Redis Cluster, which employs an in-memory, non-overlapping sharding strategy and dynamically retrieves relevant data during access, thereby reducing request delays. Furthermore, Redis Cluster nodes communicate through the Gossip protocol, maintaining global consistency across the cluster and avoiding metadata bottlenecks that commonly arise in systems such as Accumulo and HBase, where operational metadata is centralized. By contrast, systems based on HBase and Accumulo, although slightly slower in response speed, do not show significant performance degradation and, due to their lower dependence on high-performance memory, still demonstrate strong adaptability in conventional application environments.
The stress testing results are illustrated in Figure 14. Redis Cluster demonstrated superior concurrency performance compared with HBase and Accumulo, maintaining greater stability and higher throughput under high-load conditions with 1000 simulated concurrent requests per second. These results highlight the response efficiency of Redis Cluster–based caching backends, making them particularly suitable for latency-sensitive online services. In contrast, Accumulo, while providing fine-grained control and strong applicability for geospatial data storage [61], exhibited relatively higher response latency, reaching approximately 1500 ms as shown in Figure 14b.
Comparable experiments were also conducted for InSAR data services, and the results exhibited performance trends consistent with those observed in the imagery experiments. As shown in Figure 15, Redis Cluster achieved the lowest average response time, followed by HBase and then Accumulo. Although the differences among the three storage backends were less pronounced than in the imagery case, Redis Cluster still maintained a clear performance advantage. This indicates that architectural features, such as in-memory sharding and efficient data retrieval, remain effective even when handling vectorized InSAR tiles. Under stress testing conditions with 1000 simulated concurrent requests per second (Figure 16), Redis Cluster again delivered the fastest responses, while HBase and Accumulo showed slightly higher latency. In contrast with imagery services, the increase in response latency under heavy load was less significant for InSAR services. This can be attributed to the relatively minor data volume of InSAR vector tiles compared with imagery tiles, which reduces computational and I/O overhead during service requests. Consequently, although the performance differences among Redis, HBase, and Accumulo narrowed under this scenario, as shown in Figure 16b, Redis Cluster still achieved the best results, with its response latency concentrated around 500 ms and remaining lower than the other backends. This underscores its suitability for large-scale and real-time InSAR applications where stability and responsiveness are essential. At the same time, systems based on HBase and Accumulo, despite exhibiting slightly slower response speeds, demonstrated strong capacity, indicating their ability to support conventional high-concurrency application requirements.

5. Conclusions

This study develops the DDMS framework for distributed online management and service of remote sensing data. The framework couples an integrated storage model, which combines RDBMS, DFS, and NoSQL backends, with a distributed parallel processing model, enabling efficient handling of heterogeneous imagery and InSAR point cloud data. To address the timeliness requirements of online applications, an incremental tile processing mechanism is introduced, which refreshes services in a non-invasive and loosely coupled manner, thereby reducing latency while maintaining data consistency. In addition, a unified proxy-based service format is implemented to ensure interoperability across heterogeneous client systems. A prototype system was developed to evaluate the framework’s performance under various NoSQL storage backends. Experimental results show that DDMS can flexibly adapt to different backends, with Redis Cluster achieving the fastest response and HBase and Accumulo offering a balanced trade-off between scalability and efficiency. Across large-scale pyramid construction tasks, the framework demonstrated low latency, stable throughput, and strong compatibility with various storage systems. The DDMS framework offers a scalable and generalizable approach for managing and servicing large-scale heterogeneous remote sensing data online. Future work will expand its storage and computing adaptability, enrich its online processing capabilities for diverse applications, and further validate the elasticity of the framework in larger-scale distributed environments to demonstrate its scalability beyond the current experimental setting.

Author Contributions

Conceptualization, H.C. and J.Z.; methodology, H.C. and J.Z.; software, J.Z.; validation, H.C. and J.Z.; resources, J.Z., J.G. and L.X.; data curation, Y.C.; writing—original draft preparation, H.C.; writing—review and editing, Z.L. and K.Q.; visualization, H.C.; supervision, J.Z. and H.W.; project administration, J.Z. and J.G.; funding acquisition, Z.L. and H.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China (Grant No. 42301527) and in part by the Open Research Fund of the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University (Grant No. 24R07).

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgments

All authors would like to thank the anonymous reviewers. Thanks are due to Meng Li for linguistic assistance. Thanks are due to Xiaoliang He for coding assistance.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Figure 1. The architecture of DDMS.
Figure 2. The UML class diagram of DDMS.
Figure 3. The data integrated storage model in DDMS.
Figure 4. The UML class diagram of the data integrated storage model in DDMS.
Figure 5. The distributed parallel processing model in DDMS.
Figure 6. The workflow for the online image service updates.
Figure 7. The workflow of InSAR data visualization.
Figure 8. Image pyramid construction time consumption test.
Figure 9. InSAR data pyramid construction time consumption test.
Figure 10. Pyramid modeling performance comparison.
Figure 11. Image pyramiding performance of the prototype system.
Figure 12. Performance indicator–based evaluation of pyramid construction.
Figure 13. Assessment of image tile retrieval performance (KDE denotes kernel density estimation).
Figure 14. Performance assessment of queries in stress-test scenarios for images (KDE denotes kernel density estimation).
Figure 15. Assessment of points retrieval performance (KDE denotes kernel density estimation).
Figure 16. Performance assessment of queries in stress-test scenarios for point data (KDE denotes kernel density estimation).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
