2.1. The Principle of Image Elevation Datum Conversion
If the geodetic height of a point is known, its orthometric height and normal height follow from the elevation relationship formulas:
$$H = H_{g} + N, \qquad H = H_{\gamma} + \zeta$$
In these equations, $H$ is the geodetic height, $H_{g}$ is the orthometric height, $H_{\gamma}$ is the normal height, $N$ is the geoid height (geoid undulation), and $\zeta$ is the elevation anomaly. To show the relationship between the geodetic height, normal height, and orthometric height more intuitively, the elevation systems are illustrated in Figure 1.
The calculation of elevation anomalies using a geoid model fundamentally relies on mathematically modeling the Earth’s gravity field, transforming gravity signals into the geometric undulations of the geoid, and subsequently establishing the conversion relationship between geodetic height and normal height. In this study, the geoid model is fitted based on the SGG-UGM-2 gravity field model. Compared with the widely adopted EGM2008 (which also extends to degree 2159), SGG-UGM-2 substantially improves the reliability of short-wavelength components by integrating the latest GOCE satellite gravity gradient data, denser ground gravity observations, and refined terrain reduction techniques. This improved ability to capture high-frequency signals benefits the precise conversion of DSMs (Digital Surface Models) in two ways:
Reduction of omission errors: High-resolution DSMs contain abundant short-wavelength terrain variations. If the spectral resolution of the reference gravity field model is insufficient, a substantial portion of short-wavelength signals in elevation anomalies will be lost, resulting in omission errors and inducing systematic spatial aliasing in the datum transformation.
Improvement of local fitting numerical stability: The superior fidelity of SGG-UGM-2 in the short-wavelength band ensures that calculated elevation anomalies correlate more strongly with actual topographic relief within local regions. Consequently, during the subsequent local quadric fitting process, residuals between model predictions and observed values are smaller and more evenly distributed. This stabilizes the least squares estimation, reduces the sensitivity of the fitted surface to outliers, and helps the elevation datum transformation preserve high spatial consistency and accuracy even in regions of complex topography.
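The global geoid can be fitted based on the disturbing potential. The fully normalized spherical harmonic expansion used by the global potential coefficient models released in recent years generally takes the form:
$$T(r,\theta,\lambda) = \frac{GM}{r}\sum_{n=2}^{N_{\max}}\left(\frac{a}{r}\right)^{n}\sum_{m=0}^{n}\left(\bar{C}_{nm}\cos m\lambda + \bar{S}_{nm}\sin m\lambda\right)\bar{P}_{nm}(\cos\theta)$$
where $r$ is the distance from the calculated point to the center of the ellipsoid; $\lambda$ is the longitude of the calculated point; $\theta$ is the geocentric colatitude; $a$ is the semi-major axis of the ellipsoid; $GM$ is the geocentric gravitational constant; $\bar{C}_{nm}$ and $\bar{S}_{nm}$ are the fully normalized Stokes coefficients (potential coefficients), with the zonal terms of the normal field removed; $\bar{P}_{nm}(\cos\theta)$ is the fully normalized associated Legendre function; $n$ denotes the degree and $m$ the order of the spherical harmonic expansion; and $N_{\max}$ is the maximum degree of the model. According to the Bruns formula, the geoid height or elevation anomaly can be expressed as [10]:
$$\zeta = \frac{T}{\gamma}$$
where $\gamma$ is the normal gravity value. Substituting the disturbing potential into the Bruns formula gives the spherical harmonic expansion of the elevation anomaly at any point:
$$\zeta(r,\theta,\lambda) = \frac{GM}{r\gamma}\sum_{n=2}^{N_{\max}}\left(\frac{a}{r}\right)^{n}\sum_{m=0}^{n}\left(\bar{C}_{nm}\cos m\lambda + \bar{S}_{nm}\sin m\lambda\right)\bar{P}_{nm}(\cos\theta)$$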
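The spherical harmonic synthesis of the elevation anomaly described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function names, the dictionary layout of the coefficients, and the sample coefficient value are hypothetical, and the plain forward recursion used for the Legendre functions is adequate for low degrees only. An evaluation up to the full degree of SGG-UGM-2 would need a numerically scaled recursion to avoid underflow near the poles.

```python
import math

def normalized_legendre(nmax, t):
    """Fully normalized associated Legendre functions P[n][m] at t = cos(theta).

    Standard forward column recursion; suitable for low degrees only.
    """
    u = math.sqrt(max(0.0, 1.0 - t * t))  # sin(theta), non-negative for a colatitude
    P = [[0.0] * (nmax + 1) for _ in range(nmax + 1)]
    P[0][0] = 1.0
    for m in range(1, nmax + 1):  # sectoral terms P[m][m]
        factor = math.sqrt(3.0) if m == 1 else math.sqrt((2 * m + 1) / (2 * m))
        P[m][m] = factor * u * P[m - 1][m - 1]
    for m in range(nmax + 1):  # remaining terms, column by column
        for n in range(m + 1, nmax + 1):
            a = math.sqrt((2 * n - 1) * (2 * n + 1) / ((n - m) * (n + m)))
            if n == m + 1:
                P[n][m] = a * t * P[n - 1][m]
            else:
                b = math.sqrt((2 * n + 1) * (n + m - 1) * (n - m - 1)
                              / ((n - m) * (n + m) * (2 * n - 3)))
                P[n][m] = a * t * P[n - 1][m] - b * P[n - 2][m]
    return P

def elevation_anomaly(r, theta, lam, coeffs, gm, a, gamma, nmax):
    """zeta = GM/(r*gamma) * sum_n (a/r)^n sum_m (C cos(m lam) + S sin(m lam)) P_nm.

    coeffs maps (n, m) -> (C_nm, S_nm); missing entries count as zero.
    The coefficients are assumed to already refer to the disturbing potential.
    """
    P = normalized_legendre(nmax, math.cos(theta))
    total = 0.0
    for n in range(2, nmax + 1):
        inner = 0.0
        for m in range(n + 1):
            c, s = coeffs.get((n, m), (0.0, 0.0))
            inner += (c * math.cos(m * lam) + s * math.sin(m * lam)) * P[n][m]
        total += (a / r) ** n * inner
    return gm / (r * gamma) * total

if __name__ == "__main__":
    # Illustrative values only; the coefficient is NOT taken from SGG-UGM-2.
    demo_coeffs = {(2, 0): (1.0e-6, 0.0)}
    zeta = elevation_anomaly(6378137.0, math.radians(60.0), math.radians(110.0),
                             demo_coeffs, 3.986004418e14, 6378137.0, 9.8, 4)
    print(f"elevation anomaly: {zeta:.3f} m")
```

Looping over degree in the outer sum mirrors the formula directly; a production implementation would usually reorder the sums by order $m$ so that the trigonometric terms are computed once per order.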
Based on the interpolation principle of the quadratic surface moving fitting method, an elevation datum conversion model can be constructed as follows:
According to the principle of interpolation, the plane coordinates of the point P to be determined are solved.
Taking the point P as the center and R as the search radius (Figure 2), the surrounding data points are searched. Since the number of quadratic surface coefficients is 6, the number of selected data points must be at least 6. A data point $P_i$ is selected if its distance $d_i$ to P satisfies $d_i \le R$. If the selected points are insufficient, R is increased until at least 6 points are selected.
After selecting a sufficient number of sampling points, the coordinate system is adjusted for computational convenience by translating the origin to the location of the point to be determined. Using the identified sampling points in combination with the quadratic surface fitting equation, the fitted surface can be expressed as:
$$\zeta(x, y) = a_0 + a_1 x + a_2 y + a_3 x^2 + a_4 xy + a_5 y^2$$
In the formula, $(x, y)$ are the coordinates of a sampling point relative to the translated origin, and $a_0, a_1, \dots, a_5$ are the fitting coefficients corresponding to the undetermined point P. Compared to linear interpolation, the quadric surface can better capture the fine gravity field signals of the SGG-UGM-2 model in the medium- and short-wavelength bands, effectively reducing the omission error in the datum transformation. In the algorithm implementation, besides requiring more sampling points than the six unknown coefficients (to ensure redundant observations), the algorithm introduces a multi-quadrant spatial distribution verification mechanism (as shown in
Figure 2). The program divides the search area into four quadrants, mandating the presence of sampling points in each quadrant. This constraint ensures that the unknown point P remains within the geometric envelope of the control points, effectively mitigating the risk of rank deficiency and leverage effect in solving the normal equations, thereby ensuring the geometric robustness of the surface fitting.
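The point selection and four-quadrant check described above might look like the following sketch. The function names, the radius growth factor, and the upper radius limit are hypothetical. The sketch also assumes that a failed quadrant check is handled the same way as too few points, namely by enlarging the search radius; the text does not specify this.

```python
import math

def covers_four_quadrants(selected, px, py):
    """True if the selected points occupy all four quadrants around (px, py)."""
    quadrants = set()
    for x, y, _ in selected:
        quadrants.add((x - px >= 0, y - py >= 0))
    return len(quadrants) == 4

def select_points(points, px, py, radius, min_points=6, growth=1.5, max_radius=None):
    """Pick data points within `radius` of (px, py), enlarging the radius as needed.

    points: iterable of (x, y, value) tuples.
    min_points: at least the six quadric coefficients; use 7 or more for redundancy.
    Returns (selected_points, radius_used).
    """
    points = list(points)
    if max_radius is None:
        max_radius = radius * 64  # hypothetical safety limit
    while radius <= max_radius:
        selected = [(x, y, v) for x, y, v in points
                    if math.hypot(x - px, y - py) <= radius]
        if len(selected) >= min_points and covers_four_quadrants(selected, px, py):
            return selected, radius
        radius *= growth  # too few points or poor geometry: widen the search
    raise ValueError("no well-distributed point set found within max_radius")
```

Raising an error instead of silently fitting a poorly distributed point set keeps an ill-conditioned fit from going unnoticed.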
If there are m control points in the measurement area, the error equation is given by Formula (6):
$$V = XA - L \qquad (6)$$
In the formula, $V$ is the residual vector, $A$ is the fitting coefficient vector, $L$ is the observation vector of the known points, and $X$ is the $m \times 6$ design matrix, where each row contains the terms of the quadratic polynomial evaluated at the coordinates of one data point.
In the calculation process, the weight of each sampling point is very important. Here, the matrix $P$ denotes the weights of the data points (distinct from the point P to be determined). These weights do not directly reflect the observation accuracy of the neighboring points found by the search; rather, they express how strongly each data point should influence the point to be determined. Therefore, the weight should be determined by the distance $d_i$ between the data point and the point to be determined. The commonly used weight forms are:
$$p_i = \frac{1}{d_i^{2}}, \qquad p_i = \left(\frac{R - d_i}{d_i}\right)^{2}$$
where $R$ is the radius of the search circle and $d_i$ is the distance from the point P to the $i$-th data point. In this paper, the point P to be solved is set as the origin of the local coordinate system. For the $i$-th data point $(x_i, y_i)$ within the search radius $R$, the relative distance $d_i$ is defined as:
$$d_i = \sqrt{x_i^{2} + y_i^{2}}$$
This study also conducted a parameter perturbation test. Because the quasi-geoid is an extremely smooth surface, the experiments show that when the search radius $R$ varies within the range of 10 km to 30 km, or the weighting power is switched between 1 and 2, the resulting elevation anomalies fluctuate only at the millimeter level. This indicates that the algorithm is robust to parameter settings and that the conversion accuracy does not depend on specific parameter tuning.
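Finally, according to the principle of least squares, the quadratic surface coefficients are obtained from the weighted normal equations:
$$A = \left(X^{T} P X\right)^{-1} X^{T} P L$$
Because the origin of the local coordinate system is the point to be determined, the interpolated value at that point is the constant term of the fitted surface.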
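As a minimal sketch of the weighted least squares step, the following Python code fits a quadric surface to nearby data points and returns its six coefficients. Several things here are assumptions made for illustration: the function names, the inverse-distance weight with a configurable power, the small `eps` guard against zero distance, and the use of a hand-written Gaussian elimination in place of a linear algebra library. Coordinates are assumed to be expressed in a modest numeric range (for example kilometres) so that the normal matrix stays well conditioned.

```python
import math

def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("normal matrix is singular (poor point geometry)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def fit_quadric(points, px, py, power=2, eps=1e-6):
    """Weighted fit of z = a0 + a1*x + a2*y + a3*x^2 + a4*x*y + a5*y^2.

    points: iterable of (x, y, z); coordinates are shifted so (px, py) is the origin.
    Weights are 1 / d**power, with d the distance to (px, py).
    Returns [a0, ..., a5]; a0 is the interpolated value at (px, py).
    """
    normal = [[0.0] * 6 for _ in range(6)]  # accumulates X^T P X
    rhs = [0.0] * 6                         # accumulates X^T P L
    for x, y, z in points:
        dx, dy = x - px, y - py
        w = 1.0 / max(math.hypot(dx, dy), eps) ** power
        row = [1.0, dx, dy, dx * dx, dx * dy, dy * dy]
        for i in range(6):
            rhs[i] += w * row[i] * z
            for j in range(6):
                normal[i][j] += w * row[i] * row[j]
    return solve_linear(normal, rhs)
```

Accumulating the normal equations point by point avoids storing the full design matrix, which matters when the fit is repeated for every pixel of a DSM.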
2.2. MPI Parallel Conversion Algorithm Design
This study proposes the HDC (High-Performance Benchmark Conversion)–MPI algorithm, a fast conversion method based on MPI parallel technology, designed to improve the efficiency of batch image elevation datum conversion and to provide a scalable solution for large-scale image processing. The HDC-MPI algorithm operates on a distributed memory model and leverages the Message Passing Interface (MPI) to process DSM images concurrently. Unlike shared-memory parallelization approaches such as OpenMP, MPI allows the algorithm to overcome the memory limitations of a single physical node, facilitating the processing of large-scale DSM datasets that exceed the capacity of individual machines. Because the high-resolution DSM images can be processed independently of one another, process-level parallelism with MPI scales naturally with the number of images.
Traditional single-threaded methods often fail to complete such tasks within a reasonable time frame when dealing with large-scale image datasets. In this context, MPI parallel technology significantly improves conversion efficiency, particularly when processing large volumes of multi-image data.
MPI lets the program distribute work across processes at each stage and exchange data between them efficiently. In the elevation datum conversion of DSM images, each image is an independent task, and this inherent parallelism makes MPI well suited to the problem and allows the algorithm to exploit the parallel capabilities of modern computing systems.
2.2.1. Communication Model Design
The collective communication model allows all processes within a communicator to participate in data transmission and coordination simultaneously, serving as a core paradigm for efficient data exchange in parallel computing. Compared with traditional point-to-point communication, collective communication leverages highly optimized collective operations to synchronize tasks among multiple processes, significantly reducing communication latency in large-scale parallel environments [11].
In the proposed algorithm, the standard MPI_Bcast function is employed to distribute the global gravity field model parameters and control information from the root process to all participating processes. Within this parallel architecture, the root process handles global data initialization and resource management, while the worker processes execute the specific DSM elevation transformation tasks in parallel.
By invoking MPI_Bcast, the root process delivers data to the entire process group through a single collective call, utilizing the tree-based topology optimized within the MPI implementation. This approach avoids the linear time complexity associated with sequential point-to-point transmissions from the root. It should be noted that the correctness of this mechanism relies on all processes in the communicator invoking collective operations in a strictly consistent order. With consistent buffer management and communication ordering, each process holds the broadcast data once its MPI_Bcast call returns, although the MPI standard does not guarantee that the call acts as a barrier. This reduces the risk of logical errors that typically arise from manually managing complex point-to-point communication handles. During the data-loading phase of global gravity field models and large-scale DSM imagery, this mechanism ensures high consistency of data states throughout the parallel framework.
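To make the difference between a tree-based broadcast and sequential point-to-point sends concrete, the following pure-Python sketch builds the send schedule of a binomial tree. It is an illustration only: the function name is hypothetical, no MPI calls are made, and the broadcast algorithm an MPI library actually selects is implementation-dependent and may differ from this one.

```python
def binomial_broadcast_schedule(size, root=0):
    """Return the rounds of a binomial-tree broadcast.

    Each round is a list of (sender_rank, receiver_rank) pairs that can run
    concurrently. In every round, each process that already holds the data
    forwards it to one process that does not.
    """
    rounds = []
    have = 1  # relative ranks 0 .. have-1 already hold the data
    while have < size:
        sends = []
        for rel in range(have):
            dst = rel + have
            if dst < size:
                # Convert ranks relative to the root back to real ranks.
                sends.append(((rel + root) % size, (dst + root) % size))
        rounds.append(sends)
        have *= 2
    return rounds

if __name__ == "__main__":
    for step, sends in enumerate(binomial_broadcast_schedule(8), start=1):
        print(f"round {step}: {sends}")
```

The number of holders doubles each round, so the number of rounds grows with the base-2 logarithm of the process count, whereas a root that sends to every process in turn needs one step per receiver.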
2.2.2. Load Balancing Design
Load balancing is commonly classified into two categories: static load balancing and dynamic load balancing. Static load balancing achieves a uniform distribution of workload by pre-dividing tasks and assigning them to computing processes prior to program execution. This approach is characterized by its implementation simplicity and minimal communication overhead, making it particularly well-suited for scenarios in which the task scale is fixed and the computational intensity is predictable [12].
In the context of processing large volumes of standardized remote sensing products—such as DSM tiles with consistent framing and uniform pixel sizes—the computational load associated with each scene is highly uniform. Under these conditions, static allocation can provide substantial parallel speedup while minimizing MPI communication overhead, thereby achieving efficient utilization of computational resources.
Given the variations in DSM imagery specifications across different task scenarios—such as the number of pixel rows/columns and file sizes for a single scene—as well as the total volume of images to be processed, the algorithm must exhibit high flexibility. To ensure that the HDC-MPI algorithm can adapt to production tasks of various scales, this study employs a static load balancing strategy based on data decomposition.
In a serial algorithm, images are typically processed in batches using a for loop, where images are read sequentially and then processed for elevation datum conversion. To improve efficiency, this paper proposes replacing the for loop of the serial algorithm with an MPI parallel algorithm. However, unlike the serial algorithm, which processes images sequentially without considering the overall task load, the parallel algorithm must account for the load balancing problem—specifically, how to distribute images effectively across different processes.
To address this, a modified MPI parallel algorithm is designed, suitable for handling the image processing tasks in parallel. This algorithm ensures that tasks are evenly distributed, taking into consideration the number and size of images. The algorithm can be expressed in the following formula:
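Let the total number of image tasks to be processed be $M$ and the total number of MPI processes launched be $N$. Define the rank of the process currently executing the task as $i$ (where $0 \le i < N$). First, calculate the quotient $q$ and the remainder $r$ of the task division:
$$q = \left\lfloor \frac{M}{N} \right\rfloor, \qquad r = M \bmod N \qquad (16)$$
The number of tasks $n_i$ assigned to process $i$ is then:
$$n_i = \begin{cases} q + 1, & r \neq 0 \text{ and } i < r \\ q, & \text{otherwise} \end{cases} \qquad (17)$$
In the formulas, $q$ represents the quotient of the total image task volume divided by the total number of processes, $r$ is the remainder, and $i$ is the rank of the currently executing process. Process ranks are 0-based.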
Two scenarios are considered for static load allocation based on the relationship between the total number of image tasks and the total number of processes.
Scenario 1: The total task volume is not an exact multiple of the number of processes. For example, when processing 10 image scenes using 4 processes, we have $q = 2$ and $r = 2$. For the first two processes ($i = 0$ and $i = 1$), the conditions $r \neq 0$ and $i < r$ are both satisfied, so each is assigned $q + 1 = 3$ tasks. For the remaining processes ($i = 2$ and $i = 3$), the condition $i < r$ is not satisfied, so each is assigned $q = 2$ tasks. The resulting load distribution queue is $[3, 3, 2, 2]$.
Scenario 2: The total task volume is exactly divisible by the number of processes. For instance, when processing 10 image scenes using 5 processes, $q = 2$ and $r = 0$. Since no process satisfies both $r \neq 0$ and $i < r$, the allocation reduces to an even split. Consequently, each process is assigned 2 tasks, resulting in a final load distribution queue of $[2, 2, 2, 2, 2]$.
This static allocation strategy ensures balanced workload distribution across processes, minimizing idle time and MPI communication overhead, even when the total task volume is not evenly divisible among the available processes.
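The two scenarios above can be reproduced with a short Python sketch of the quotient-and-remainder rule. The function names are hypothetical, and the start-offset expression in `task_range` is derived here on the assumption that each process takes one contiguous block of tasks in rank order.

```python
def task_count(total_tasks, num_procs, rank):
    """Number of tasks for `rank`: q + 1 for the first r ranks, q for the rest."""
    q, r = divmod(total_tasks, num_procs)
    return q + 1 if (r != 0 and rank < r) else q

def task_range(total_tasks, num_procs, rank):
    """Half-open index range [start, stop) of the contiguous block for `rank`."""
    q, r = divmod(total_tasks, num_procs)
    # Ranks below `rank` hold q tasks each, plus one extra for each of the
    # first min(rank, r) ranks.
    start = rank * q + min(rank, r)
    return start, start + task_count(total_tasks, num_procs, rank)

if __name__ == "__main__":
    print([task_count(10, 4, p) for p in range(4)])  # prints [3, 3, 2, 2]
    print([task_count(10, 5, p) for p in range(5)])  # prints [2, 2, 2, 2, 2]
    print([task_range(10, 4, p) for p in range(4)])  # prints [(0, 3), (3, 6), (6, 8), (8, 10)]
```

Because every process can evaluate these expressions from the task count, the process count, and its own rank, no communication is needed to agree on the partition.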
Figure 3 shows the image task allocation flowchart of the HDC-MPI algorithm.
2.2.3. Core Algorithm Code
The specific implementation of the static task allocation strategy is detailed in Algorithm 1. The pseudo-code illustrates the global image scanning and static task allocation process during the data preprocessing stage of the HDC-MPI algorithm. This process is structured into three core phases, designed to maximize parallel efficiency while minimizing idle time between processes:
Input and Broadcast Mechanism (Phase 1): To mitigate I/O bottlenecks and resource contention caused by multiple processes simultaneously accessing the underlying file system in large-scale parallel environments, the algorithm employs a single-point input strategy. Specifically, only the root process is responsible for receiving or reading the initial image file path. This path is then broadcast to all worker processes within the MPI communicator using MPI’s collective communication mechanism (MPI_Bcast). This ensures that all processes work from a consistent data context and reduces redundant I/O operations.
Global File Analysis (Phase 2): Once the unified path is obtained, each process performs directory traversal in parallel, automatically filtering out invalid entries such as hidden or system files. During this step, a complete queue of pending image files is dynamically constructed. The system also determines the total number of tasks, M, corresponding to the valid images that require elevation datum conversion.
Static Load Balancing and Task Partitioning (Phase 3): This phase is critical for achieving a high parallel speedup ratio. To map the M processing tasks efficiently onto MPI processes, the algorithm implements a refined static data decomposition strategy, ensuring that workload is balanced across processes while minimizing inter-process communication overhead.
This structured preprocessing workflow enables the HDC-MPI algorithm to handle large-scale imagery efficiently, providing a foundation for high-throughput parallel elevation datum conversion.
Algorithm 1: MPI-based Global Image Scanning and Static Task Allocation (pseudo-code image not reproduced)
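The scanning and partitioning phases described above can be approximated by the following Python sketch. It is not the pseudo-code of Algorithm 1. The function names and the accepted file extensions are hypothetical, and `rank` and `size` are passed in as plain integers so the example runs without MPI; in an MPI program they would come from the communicator (for example `comm.Get_rank()` and `comm.Get_size()` in mpi4py).

```python
import os

def list_valid_images(directory, extensions=(".tif", ".tiff")):
    """Return the sorted paths of image files, skipping hidden files and folders.

    Sorting matters: every process must see the files in the same order,
    otherwise the index-based partition below would overlap or leave gaps.
    """
    paths = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if name.startswith(".") or not os.path.isfile(path):
            continue  # hidden entry, subdirectory, or other non-file
        if name.lower().endswith(extensions):
            paths.append(path)
    return paths

def images_for_rank(directory, rank, size):
    """Contiguous block of image paths assigned to process `rank` out of `size`."""
    files = list_valid_images(directory)
    q, r = divmod(len(files), size)
    start = rank * q + min(rank, r)
    count = q + 1 if rank < r else q
    return files[start:start + count]

if __name__ == "__main__":
    # Simulate four processes scanning the current directory.
    for rank in range(4):
        print(rank, images_for_rank(".", rank, 4))
```

Letting each process derive its own block from the shared path keeps the partition free of point-to-point messages, at the cost of every process listing the directory once.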
2.3. HDC-MPI Parallel Conversion Workflow
The HDC-MPI algorithm implements a high-performance parallel elevation datum conversion workflow for large-scale DSM images. Its core components include:
Global Image Scanning and Task Allocation: The root process scans the input directory, filters out invalid files, and broadcasts a list of valid DSM images to all worker processes. A static load balancing strategy is used to allocate tasks, which takes into account the total number of images and the number of processes (Formulas (16) and (17)). Each process receives a contiguous block of images, ensuring balanced workload distribution and minimal inter-process communication overhead.
Model Solving and Quadratic Surface Interpolation in Parallel Subtasks: In the independent parallel loop of each process, the algorithm first calculates the elevation anomalies of control points based on the SGG-UGM-2 global gravity field model (up to degree 2190). Subsequently, for each pixel in the DSM image, elevation anomaly interpolation is performed using a quadratic surface model fitted to adjacent control points.
I/O and Memory Optimization: To avoid file system contention, the initial reading of the global model parameters is executed by the root process alone, and the parameters are then broadcast to the other processes. In the image processing stage, each process independently reads the image blocks it is responsible for using the file handles allocated to it. By managing matrix operations through local memory pools and limiting redundant data copies, the algorithm ensures high throughput when processing high-resolution and large-volume DSM images.
Synchronization and Result Collection: The algorithm uses MPI’s collective communication operations to ensure that all worker processes synchronize their states during the computation cycle. To avoid network communication and memory bottlenecks caused by massive image data being returned to the root process, each worker process independently writes the processed DSM image blocks to the designated storage location after completing the datum conversion, ensuring that the final dataset remains spatially and numerically consistent.
2.4. Technical Route
This study addresses the low conversion efficiency of elevation datum transformation for batch DSM images. To solve this problem, an efficient batch DSM image elevation datum conversion method based on MPI technology, named HDC-MPI, is proposed. The primary objective is to significantly enhance the throughput and efficiency of batch processing. The overall workflow is shown in Figure 4, and the specific research contents are organized as follows:
- (1) Study of the fast batch image elevation datum conversion algorithm
A fast HDC-MPI image elevation datum conversion algorithm based on MPI is designed, focusing on the following aspects:
Analyze whether MPI parallel technology can improve the efficiency of batch image elevation datum conversion.
Investigate the impact of image quantity and image size on memory usage, and assess whether these factors influence MPI execution efficiency.
Analyze the effect of having identical versus different image sizes within an image set on conversion efficiency.
- (2) Study on the impact of terrain on elevation datum conversion
Four terrain types are selected, including plain, hilly, mountainous, and high-mountain regions. For each terrain type, two groups of test examples with different image sizes are designed. The differences between these groups mainly lie in pixel count and memory consumption. The speedup ratio is adopted as the evaluation metric to analyze the influence of terrain factors on the performance of the MPI-based fast conversion algorithm.
- (3) Study on the correctness of image conversion results
ArcGIS 10.8 software is employed to verify the correctness of the parallel conversion results:
Using the profile analysis function, the elevation results produced by the serial and parallel algorithms are compared along identical profiles to verify the correctness of the parallel conversion.
Contour lines are generated from both serial and parallel results, and contour maps are compared to further assess the accuracy and consistency of the parallel conversion results.