Distributed Geoscience Algorithm Integration Based on OWS Specifications: A Case Study of the Extraction of a River Network

To understand and solve various natural environmental problems, geoscience research activities are becoming increasingly dependent on the integration of knowledge, data, and algorithms from scientists at different institutes and with multiple perspectives. However, the facilitation of these integrations remains a challenge because such scientific activities require gathering numerous geoscience researchers to provide data, knowledge, algorithms, and tools from different institutes and geographically distributed locations. The pivotal issue that needs to be addressed is the identification of a method to effectively combine geoscience algorithms in a distributed environment to promote cooperation. To address this issue, in this paper, a scheme for building a distributed geoscience algorithm integration based on the Open Geospatial Consortium web service (OWS) specifications is proposed. The architecture of the geoscience algorithm integration, algorithm service management mechanism, XML description method for algorithm integration, and integrated model execution strategy are designed and implemented. The experiment implements the integration of geoscience algorithms in a distributed cloud environment and evaluates the feasibility and efficiency of the integrated geoscience model. The proposed method provides a theoretical basis and practical guidance for promoting the integration of distributed geoscience algorithms; this approach can help to aggregate the distributed geoscience capabilities to address natural challenges.


Introduction
With the development of large and global geographical environment problems, fully and effectively coordinating globally distributed domain experts and fully utilizing distributed knowledge,

Literature Review
As early as 1983, Blanning proposed the problem of managing various types of geoscience models and algorithms in the form of a document and proposed the concept of the model library, which managed models and algorithms with a defined model library query language (MQL) [4]. However, due to the restrictions of network technologies, traditional geoscience algorithms mostly exist in single-machine environments and cannot be obtained through a network. Moreover, because of the different programming languages and operating environments, most of the algorithms are not interoperable. There are barriers to aggregating distributed geoscience algorithms to build the models that are required to solve complex geographic problems [5], because the various algorithms and data were developed with different languages and interfaces; hence, the existing algorithms need to be rewritten to meet the integration requirements. If various geoscience algorithms can be obtained and aggregated into a model in real time via the Internet and are no longer limited by platform, hardware, and software, emergency response times to earthquakes, landslides, floods, hurricanes, and other disasters can be shortened. To this end, it is necessary to study geoscience algorithm integration technologies so that all users can access these distributed geoscience algorithms quickly and easily, or even integrate the distributed algorithms dynamically.
Driven by network technology, web service platforms have become a new solution for the integration of network applications. Web service technologies have been widely used to construct distributed, modular applications and service-oriented applications. Accordingly, with the application of geographical information technology in various fields, the technology is moving from a closed (tightly coupled standalone) system to an open and loosely coupled service. Users can access and use geographic data, geoprocessing services, and mapping services on demand via the Internet [6]. Service-oriented applications have become the direction of new geoscience developments. Geospatial application activities are moving from a professional field to a networked, socialized, and popular service that is being accepted by various domain experts and even nonprofessionals [7]. An algorithm service deployment strategy for sharing geoanalysis algorithms was proposed and implemented to provide a collaboration-oriented method that allowed modeling participants to work together and integrate algorithms and computational resources across an open web environment [8]. A virtual workflow system can provide an efficient Graphical User Interface (GUI) that users can utilize to integrate distributed scientific collaborative services and execute them on grid resources [9].
With continuous advancements in the sharing, exchanging, and use of spatial data, the sharing and interoperability of processing functions have received increased attention. A web service provides an open platform for the sharing of spatial information and geoprocessing functions. It is critical to implement the sharing of Earth observation data and geospatial analysis algorithms [10]. By accessing web service resources, all geoprocessing functions from algorithm publishers can be provided to algorithm users through the Internet. The emergence of common web services also enables geospatial data sharing and algorithm interoperability, and, when compared with other distributed architectures, geoscience algorithm sharing based on a common service-oriented architecture (SOA) has obvious advantages. However, there is no standard scheme for the integration of common web service-based geospatial algorithms; hence, this method can only be used to share geospatial data and geoscience algorithms within a limited scope and community. Moreover, it is difficult to implement the automatic discovery and integration of widely accepted geoscience algorithms.
To meet the requirements of distributed geoscience data and algorithm sharing, the International Organization for Standardization/Technical Committee (ISO/TC 211) and the Open Geospatial Consortium (OGC) formulated a series of geographical data services and processing service standards, including a Web Map Service (WMS), Web Feature Service (WFS), Web Coverage Service (WCS), and Web Processing Service (WPS), to standardize data transmission and processing interfaces.
The WPS interface is proposed to address the deficiencies of web services in solving functional interoperability and the increasing demand for network-based spatial data processing. "Processing" can be an algorithm, computation, or model for processing spatial data. The WPS interface standard provides rules for describing the inputs and outputs of geospatial processing services and geographic computing in a standard way, making it easier for users to publish geospatial processing services and to discover and bind these services. WPS also defines how the client invokes the processing service and how the output of the processing service is handled. The implementation of this standard allows any geospatial processing service, regardless of its source, to be encapsulated and integrated into existing workflows using standard interfaces. The WPS standard defines a general process model, which is designed to provide interoperability descriptions for geographic processing and computing and to support service discovery in distributed environments.
These geospatial service specifications have been widely adopted in geographical model building [11][12][13][14][15][16]. By using web-based maps, geospatial data, geoprocessing services, and sensor web services, researchers can efficiently use geographic information resources to support spatial decision-making and geoscience applications [17][18][19][20][21][22]. Di demonstrated that the framework based on the OGC web service (OWS) facilitates interoperability between Earth observation data and geoprocessing modeling [23]. Specialists have also conducted geographic processing research that is based on grid technology and a cloud computing environment [24][25][26][27]. WPS specifications are utilized to process geoscience data on different computing backends and platforms [28]. A new method of flexible service chaining using the standard Business Process Model and Notation (BPMN) has been proposed to access a centralized repository of processes and services to form a reusable workflow [29]. Nativi developed the GEO model web initiative of environmental model access and interoperability, in which the basic principles and technical challenges of implementing a model web are revealed [30]. Castronova designed a generic OpenMI component that wraps OGC WPS modeling services, and the model services can be leveraged and reused within multiple workflow environments and decision support systems; this approach can advance the work in SOAs for environmental modeling [16]. To address the emerging issue of integrating data sharing and computing e-infrastructures for multidisciplinary applications, a business process broker (BPB) was designed to take a formal description of a scientific business process and translate it into an executable process, and this method has been applied in satellite image mosaicking [31].
Countries and organizations have conducted geospatial information service projects, such as the U.S. EarthCube program by the National Science Foundation (NSF), the Infrastructure for Spatial Information in Europe (INSPIRE), and China's TIANDITU. Geoscience services that are based on distributed information infrastructures have been developed in recent years, including Spatial Data and Information Infrastructure, e-Science, and Cyberinfrastructure [31][32][33][34]. The goal is to improve the access, sharing, visualization, and analysis of all forms of geoscience data and related resources. In recent years, the virtual geographic environment was introduced as a new geoscience algorithm sharing and interaction technology, and it plays an important role in the managing and sharing of geographical knowledge and multiscale environmental change monitoring applications [5,[35][36][37][38]]. Commercial organizations and enterprises have built cloud-based geocomputation services for spatial analysis, mapping, and spatial processing [39,40].
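As a rough illustration of how a client might address a WPS, the sketch below builds key-value-pair (KVP) request URLs for the DescribeProcess and Execute operations of WPS 1.0.0. The endpoint, the process identifier `FillPits`, and the input name `dem` are hypothetical placeholders, not names taken from this paper, and a real service may require XML POST requests or additional parameters.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def wps_request_url(endpoint, request, identifier=None, data_inputs=None,
                    version="1.0.0"):
    """Build a KVP GET URL for a WPS operation (sketch, WPS 1.0.0 style)."""
    params = {"service": "WPS", "version": version, "request": request}
    if identifier is not None:
        params["identifier"] = identifier
    if data_inputs:
        # WPS 1.0.0 KVP encoding lists inputs as name=value pairs joined by ';'
        params["DataInputs"] = ";".join(
            f"{name}={value}" for name, value in data_inputs.items()
        )
    return f"{endpoint}?{urlencode(params)}"


ENDPOINT = "http://example.org/wps"  # hypothetical endpoint

describe_url = wps_request_url(ENDPOINT, "DescribeProcess", identifier="FillPits")
execute_url = wps_request_url(
    ENDPOINT,
    "Execute",
    identifier="FillPits",                        # hypothetical process name
    data_inputs={"dem": "http://example.org/dem.tif"},  # hypothetical input
)

print(describe_url)
print(execute_url)
```

DescribeProcess returns the declared inputs and outputs of a process, which is the information a client needs before it can assemble a matching Execute request.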
All of these prior studies provide excellent examples of standard-based geoscience data processing and they can be seen as the source of inspiration for this study.

The Architecture of Geoscience Algorithm Integration
To implement geoscience algorithm integration in the distributed environment, it is necessary to understand the algorithm architecture, and this paper presents the architecture of geoscience algorithm integration, as shown in Figure 1. The architecture consists of four main components: the geoscience service (GS) provider, GS registry center, algorithm integration module, and geospatial resources.
(1) GS provider: The GS providers provide two services: a spatial data service and a geoprocessing service. The geospatial data are provided by the WCS, WFS, and WMS, and the geoprocessing functionalities are provided by the WPSs. A simple geoprocessing algorithm can be encapsulated in a WPS, which can also integrate with other geoprocessing algorithm services to form complex geoscience algorithms. This greatly improves the reusability and flexibility of distributed geoscience analysis and decision making.
(2) Service registry center: The duty of the service registry center is to provide services for GS registration and lookup. After the GS provider releases the GSs to the registry center via Catalog Service for the Web (CSW) interfaces, the GSs can be searched via the Internet and then invoked via URLs as web services.
(3) Algorithm integration module: The geoscience algorithm integration module is responsible for finding and binding the predefined GSs according to the XML-based model script in the algorithm base, executing and monitoring the integrated algorithms, and returning the results to the user. The algorithm can be a single GS or a combination of multiple GSs. The algorithm integration module executes the integrated geoscience model via the model execution engine.
(4) Geospatial resources: Geospatial resources represent the geoprocessing tools, geospatial data, and computing and storage platforms. Geoprocessing tools can be published as a WPS. Data resources include geographic vector and raster data and remote sensing data, which can be published as the WCS, WFS, and WMS. The geospatial resources also include a physical geoprocessing server or grid/cloud computing resources.
ISPRS Int. J. Geo-Inf. 2019, 8, x FOR PEER REVIEW

Geoscience Service Management Mechanism
Numerous GSs appear in distributed environments; thus, a mechanism is needed to assist domain experts with accurately and efficiently finding the required GSs from a large set of available GSs. The service management mechanism is the facility that guarantees the integration of the distributed geoscience algorithms. The registration and discovery of GSs can be implemented by establishing a registry center for the services. Distributed GSs can be divided into four categories: portrayal service, data service, processing service, and registration service. The portrayal service is used to depict the visualization of geographic information that is presented to the user. The data service (e.g., WFS, WCS) is responsible for providing the spatial data using a service interface. The processing service (e.g., WPS) provides spatial data analysis functions to achieve value-added information. The registration service records the above three services. In this paper, a GS classification system is designed, as shown in Table 1.

Table 1 shows the general classification of the geoscience data services and processing services, which can be effectively used to manage the services and facilitate the registration and lookup of the services. The classification system provides strong support for GS registration and discovery in distributed geoscience algorithm service integration. Distributed GSs and GS management mechanisms constitute the foundation of distributed GS integration.
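The registration and lookup idea can be sketched with a small in-memory registry keyed by the four service categories named above. This is only an illustration under stated assumptions: the class names, fields, and example URLs are hypothetical, the assignment of WMS to the portrayal category follows common OGC usage rather than Table 1, and a real deployment would use a CSW catalog instead of a Python dictionary.

```python
from dataclasses import dataclass

# The four GS categories described in the text
CATEGORIES = {"portrayal", "data", "processing", "registration"}


@dataclass(frozen=True)
class ServiceRecord:
    name: str
    category: str       # one of CATEGORIES
    service_type: str   # e.g. "WMS", "WFS", "WCS", "WPS", "CSW"
    url: str
    keywords: tuple = ()


class Registry:
    """Hypothetical stand-in for the GS registry center."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        if record.category not in CATEGORIES:
            raise ValueError(f"unknown category: {record.category}")
        self._records[record.name] = record

    def find(self, category=None, keyword=None):
        """Return records matching the optional category and keyword filters."""
        return [
            r for r in self._records.values()
            if (category is None or r.category == category)
            and (keyword is None or keyword in r.keywords)
        ]


registry = Registry()
registry.register(ServiceRecord("basemap", "portrayal", "WMS",
                                "http://example.org/wms", ("map",)))
registry.register(ServiceRecord("dem", "data", "WCS",
                                "http://example.org/wcs", ("dem", "elevation")))
registry.register(ServiceRecord("fill_pits", "processing", "WPS",
                                "http://example.org/wps", ("hydrology",)))

print([r.name for r in registry.find(category="processing")])
print([r.url for r in registry.find(keyword="dem")])
```

Filtering by category first keeps lookups cheap when many services are registered, which is the practical benefit of a classification system for discovery.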

XML Description of the Algorithm Integration
In the proposed method, the integrated models are described via Business Process Execution Language (BPEL) XML specifications. Unlike the methods that are proposed in [30][31][32], this paper utilizes BPEL XML to describe the integrated geoscience models. BPEL is an OASIS standard executable language for business processes with web services and is widely adopted by the scientific community and industry, which can help the method to become widely accepted.
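To give a feel for what such a description looks like, the following sketch generates a bare BPEL 2.0 skeleton in which a sequence invokes one service per processing step. It is not the authors' model file: the step names, partner link names, and target namespace are hypothetical, and a deployable process would also need partner link, variable, and receive/reply declarations that are omitted here.

```python
import xml.etree.ElementTree as ET

# Namespace of WS-BPEL 2.0 executable processes
BPEL_NS = "http://docs.oasis-open.org/wsbpel/2.0/process/executable"


def build_process(name, steps):
    """Return a <process> element whose <sequence> invokes each step in order."""
    ET.register_namespace("bpel", BPEL_NS)
    process = ET.Element(
        f"{{{BPEL_NS}}}process",
        {"name": name, "targetNamespace": "http://example.org/geo/models"},
    )
    sequence = ET.SubElement(process, f"{{{BPEL_NS}}}sequence")
    for step in steps:
        ET.SubElement(
            sequence,
            f"{{{BPEL_NS}}}invoke",
            # Hypothetical partner link naming; "Execute" mirrors the WPS operation
            {"name": step, "partnerLink": f"{step}PL", "operation": "Execute"},
        )
    return process


STEPS = ["FillPits", "FlowDirection", "FlowAccumulation", "Threshold", "ToVector"]
process = build_process("RiverNetworkExtraction", STEPS)
print(ET.tostring(process, encoding="unicode"))
```

Because the description is plain XML, it can be stored in a model base, exchanged between institutes, and handed to any engine that understands BPEL.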
1. Filling pits: This step is used for data preprocessing. Because of data errors, the original DEM data contain noise and pits. With such pits, the D8 flow direction algorithm produces a river network that is partly broken, which contradicts the rules of river formation. The small defects in the data are removed by filling the pits in the DEM data, as shown in Figure 3.
2. Calculating flow direction: After filling, no pixel is lower than all eight of the pixels around it; thus, the water on each pixel will flow toward a neighboring pixel with a lower value. This process is utilized to form the eight flow directions. The grid flow is calculated by using the D8 algorithm, which directs the flow from each pixel toward its steepest downhill neighbor. As shown in Figure 4, the eight directions are encoded with the values 1, 2, 4, 8, 16, 32, 64, and 128.
3. Calculating flow accumulation: To form a river network by rainwater, each grid cell is given a water drop. The flow accumulation calculation creates a grid that records the number of water droplets accumulated by each pixel.
4. Thresholding flow accumulation: A threshold for the number of water droplets is chosen; a pixel is considered part of the river network if its number of water droplets is greater than the threshold value. The binarization algorithm is used to set the values of the pixels in the river network to 1 and the other values to 0.
5. Converting the data format: The grid of the river network is converted to vector format to facilitate data editing and analysis.
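Steps 2 to 4 can be sketched in plain Python on a tiny grid. This is a minimal illustration, not the implementation used in the experiment: it assumes the DEM has already been pit-filled, it assigns the eight codes to directions following the common ESRI convention (1 = east, proceeding clockwise), which the text does not confirm, and it leaves flat areas undirected.

```python
import math

# (row offset, column offset) -> direction code; ESRI-style layout (assumption)
D8 = {(0, 1): 1, (1, 1): 2, (1, 0): 4, (1, -1): 8,
      (0, -1): 16, (-1, -1): 32, (-1, 0): 64, (-1, 1): 128}


def flow_direction(dem):
    """Code of the steepest downhill neighbor; 0 if no neighbor is lower."""
    rows, cols = len(dem), len(dem[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best = 0.0
            for (dr, dc), code in D8.items():
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    # Drop divided by distance (diagonals are sqrt(2) away)
                    slope = (dem[r][c] - dem[nr][nc]) / math.hypot(dr, dc)
                    if slope > best:
                        best, out[r][c] = slope, code
    return out


def flow_accumulation(dem, direction):
    """Give each cell one drop and pass drops downstream, highest cells first."""
    rows, cols = len(dem), len(dem[0])
    offset = {code: d for d, code in D8.items()}
    acc = [[1] * cols for _ in range(rows)]
    cells = sorted(((dem[r][c], r, c) for r in range(rows) for c in range(cols)),
                   reverse=True)
    for _, r, c in cells:
        code = direction[r][c]
        if code:
            dr, dc = offset[code]
            acc[r + dr][c + dc] += acc[r][c]
    return acc


def threshold(acc, minimum):
    """Binarize: 1 where the drop count exceeds the threshold, else 0."""
    return [[1 if v > minimum else 0 for v in row] for row in acc]


dem = [[9, 8, 9],
       [7, 6, 7],
       [5, 4, 5]]  # a small valley draining toward the bottom center
direction = flow_direction(dem)
accumulation = flow_accumulation(dem, direction)
river = threshold(accumulation, 3)
for row in river:
    print(row)
```

Processing cells from highest to lowest guarantees that every upstream contribution has been added to a cell before that cell passes its total downstream, because D8 only ever routes water to a strictly lower neighbor.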

Integrated Model Execution Strategy
The original model XML description file can be defined by the users, who are usually experts, and the model base contains predefined XML-based integrated geoscience models, such as global climate change models and hydrology models. These predefined models can be used directly to solve specific problems. The original integrated model XML description files are created by referring to the BPEL specification; hence, an XML document can be converted into a standard BPEL format and executed via a BPEL engine. In general, when users need to execute an integrated geoscience model, the following steps are carried out, as shown in Figure 5.
i. Search the system model base and determine whether there is a predefined integrated model; if yes, then skip to step iii; if no, go to step ii.
ii. Create a model, and submit it after completion. In this step, the user can build the integrated model according to Section 3.3 and then submit it to the model base.
iii. Select the required model XML description document and submit it to the model execution engine.
iv. Execute the integrated model. During the execution of the integrated geoscience model, the model integration module will send the XML document to the model execution engine, which finishes execution via the BPEL engine.
v. Acquire the result. A URL link is returned after a complete process is executed by the BPEL engine, and the service user can obtain the results of the geoprocessing through the URL link.
In the above steps, all algorithm services and models are based on the OGC WPS specification. Therefore, the module has the advantages of convenient interactions and executions that are independent of the OS and the execution platform.
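The control flow of steps i to v can be summarized in a few lines. The sketch below is a simplification under stated assumptions: the model base is a plain dictionary, and `fake_engine` is a hypothetical stand-in for the BPEL engine that returns a made-up result URL instead of executing anything.

```python
def run_integrated_model(name, model_base, execute, build_model=None):
    """Follow steps i-v: look up or create the model, execute it, return a URL."""
    document = model_base.get(name)            # step i: search the model base
    if document is None:
        if build_model is None:
            raise LookupError(f"no model named {name!r} and no builder supplied")
        document = build_model(name)           # step ii: create the model ...
        model_base[name] = document            # ... and submit it to the base
    # steps iii-v: submit the document to the engine, execute, return result URL
    return execute(document)


def fake_engine(document):
    """Hypothetical stand-in for the BPEL engine; nothing is executed."""
    return "http://example.org/results/output.gml"


model_base = {"river_network": "<process name='river_network'/>"}

# A predefined model is found, so step ii is skipped
url_existing = run_integrated_model("river_network", model_base, fake_engine)

# A missing model is built first and stored for later reuse
url_new = run_integrated_model(
    "watershed", model_base, fake_engine,
    build_model=lambda n: f"<process name='{n}'/>",
)
print(url_existing, url_new, sorted(model_base))
```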

Experiment Description
In the experiment, all of the geoscience algorithms (e.g., filling pits, calculating flow direction, calculating flow accumulation, thresholding flow accumulation, and data format conversion) are shared via the WPS, and the DEM data and the vector spatial data are shared via the WCS and WFS. The DEM data from ASTER GDEM V2 were selected for the experiment; these data were developed jointly by the METI of Japan and NASA of the United States. These data are accessible to the public and have a spatial resolution of 30 m. The dataset was provided by the Geospatial Data Cloud site, Computer Network Information Center, Chinese Academy of Sciences (http://www.gscloud.cn). The DEM data cover the southern part of the Loess Plateau of China. Figure 6 shows the location of the study area. The area has a typical Loess Plateau landform and millions of gullies.
The test environment is built on QingCloud, which is a commercial cloud service vendor in China. Four server host virtual machines (VMs) are launched in Beijing, Shanghai, Guangzhou, and Hong Kong, as shown in Figure 7a. Each VM is equipped with four 2.2-GHz virtual CPUs, 8 GB of RAM, and a 20 Mb/s network connection. To test the feasibility and performance of the proposed method around the world, a second test environment is built on Alibaba Cloud, which is a cloud service that is available around the world. Four server host VMs are launched in London (UK), Silicon Valley (USA), Beijing (CHN), and Sydney (AUS), as shown in Figure 7b. Each VM is equipped with four 2.5-GHz virtual CPUs, 8 GB of RAM, and 40 GB of storage.
In the distributed environment, spatial data and algorithms can be located on the same node or distributed to different nodes; in practical applications, a model will integrate the algorithms of distributed servers via WPSs. Therefore, we designed three test schemes that distribute the geoscience algorithms and geospatial data on different nodes to simulate various geographical distributions of the geoscience algorithms and data. All data are published as WCS and WFS via GeoServer 2.13.3, which allows considerable flexibility in map creation and data sharing by using OGC standards. The geoscience algorithms are published as WPSs via 52°North, which is open-source software for managing and publishing WPSs. The model execution engine is built via Eclipse and the BPEL 2.0 Library, and the integrated geoscience model is executed via Apache ODE. To compare with the traditional single-machine-based method, in Test 4, the same model is built via the ArcGIS Model Builder and executed on a single machine. The four test schemes are as follows:
1. Test 1: No data transmission; the data and processing methods are on the same cloud node. This scenario tests the proposed geoscience algorithm integration method on a single node, and it can be used by distributed users.
2. Test 2: Only partial data (i.e., DEM data) are acquired by distributed transmission, and the other required data and processing methods are on one cloud node. This scenario tests the execution of the proposed geoscience algorithm integration method between two organizations, and it can be used by distributed users.
3. Test 3: Full data transmission; the data and all algorithms are on different cloud nodes. In this situation, the data and geoscience algorithms are built by distributed users; this scenario tests the proposed geoscience algorithm integration method over a wide area, and it can be used by distributed users.
4. Test 4: The same workflow is built via ArcGIS Model Builder and executed on a single machine. Such a workflow is usually configured by the users of one organization or institute, and it can only be used on a single machine.
The four different approaches are compared and analyzed using different data volumes and different network bandwidths (e.g., 1 Mbps, 5 Mbps, 10 Mbps, and 20 Mbps). The sizes of the datasets are shown in Table 2.
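A back-of-the-envelope calculation shows why bandwidth matters for the schemes that move data between nodes. The sketch below computes the ideal time for a single transfer, assuming 1 MB equals 8 megabits and ignoring protocol overhead, latency, and the repeated transfers of intermediate results that a fully distributed scheme would incur; the dataset sizes used are those reported with the river network results (2 MB, 20 MB, 100 MB, and 500 MB). These figures are illustrative and are not measurements from the experiment.

```python
def transfer_seconds(size_mb, bandwidth_mbps):
    """Ideal time to move size_mb megabytes over a bandwidth_mbps link."""
    return size_mb * 8 / bandwidth_mbps


SIZES_MB = [2, 20, 100, 500]
BANDWIDTHS_MBPS = [1, 5, 10, 20]

for size in SIZES_MB:
    times = [transfer_seconds(size, bw) for bw in BANDWIDTHS_MBPS]
    print(f"{size:>4} MB:", ", ".join(f"{t:g} s" for t in times))
```

Under these assumptions, the 500 MB dataset alone needs more than an hour on a 1 Mbps link but a little over three minutes at 20 Mbps.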

River Network Extraction Results of Geoscience Algorithm Integration
The integrated model takes DEM data as the input and returns the river network in GML format after processing. The first three test schemes can execute the integrated geoscience model and obtain the results successfully. After the execution of the integrated geoscience algorithms, a GML-based result is produced, and the vector data are created by parsing this GML result. In Test 4, the model is executed on the ArcGIS platform, and the result is in shapefile format. The vector data are then overlaid on an ESRI web map, and the final results are shown in Figure 8a-d, corresponding to the DEM data sizes of 2 MB, 20 MB, 100 MB, and 500 MB, respectively. The vector lines are the extracted river network in the study area, reflecting the spatial extent covered by each data size, and the background is an ESRI web map.
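Parsing the GML result into vector geometry can be sketched with the standard library. The sample document below is invented for illustration: the `river` feature element and the coordinates are hypothetical, and the sketch assumes GML 3 style two-dimensional `gml:posList` coordinates, whereas the actual output schema depends on the WPS implementation.

```python
import xml.etree.ElementTree as ET

GML = "{http://www.opengis.net/gml}"  # GML namespace in Clark notation

SAMPLE_GML = """\
<gml:FeatureCollection xmlns:gml="http://www.opengis.net/gml">
  <gml:featureMember>
    <river>
      <gml:LineString>
        <gml:posList>109.0 35.0 109.1 35.1 109.2 35.1</gml:posList>
      </gml:LineString>
    </river>
  </gml:featureMember>
  <gml:featureMember>
    <river>
      <gml:LineString>
        <gml:posList>109.5 35.4 109.6 35.5</gml:posList>
      </gml:LineString>
    </river>
  </gml:featureMember>
</gml:FeatureCollection>
"""


def parse_linestrings(gml_text):
    """Return each gml:posList as a list of (x, y) coordinate tuples."""
    root = ET.fromstring(gml_text)
    lines = []
    for pos_list in root.iter(f"{GML}posList"):
        values = [float(v) for v in pos_list.text.split()]
        # Pair consecutive values: even positions are x, odd positions are y
        lines.append(list(zip(values[0::2], values[1::2])))
    return lines


river_lines = parse_linestrings(SAMPLE_GML)
print(len(river_lines), "line(s);", river_lines[0])
```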
Library, and the integrated geoscience model is executed via Apache ODE.To compare with the traditional single-machine-based method, in Test 4, the same model is built via the ArcGIS Model Builder and executed on a machine.The four test schemes are as follows: 1. Test 1: No data transmission, and the data and processing methods are in the same cloud node; this scenario tests the proposed geoscience algorithm integration method in the single node and it can be used by the distributed users.2. Test 2: Only partial data (i.e., DEM data) are acquired by distributed transmission, and the other required data and processing methods are on one cloud node.This test scenario tests the execution of the proposed geoscience algorithm integration method between two organizations and can be used by the distributed users.3. Test 3: Full data transmission, and the data and all algorithms are on different cloud nodes.In this test situation, the data and geoscience algorithms are built by distributed users; this scenario tests the proposed geoscience algorithm integration method over a wide area and it can be used by the distributed users.4. Test 4: Building the same workflow via ArcGIS Model Builder and executing it on a single machine.The test is usually configured by the users of one organization or institute and it can only be used on a single machine.
The four different approaches are compared and analyzed using different data volumes and different network bandwidths (e.g., 1 Mbps, 5 Mbps, 10 Mbps, and 20 Mbps).The sizes of the datasets are shown in Table 2.

River Network Extraction Results of Geoscience Algorithm Integration
The integrated model utilizes DEM data as the input data and it returns the results of the river network in GML format after processing.The first three test schemes can execute the integrated geoscience model and obtain the results successfully.After the execution of the integrated geoscience algorithms, a GML-based result is produced.Through parsing the GML results, the vector data result is created.In Test 4, the model is executed on the ArcGIS platform, resulting in a shapefile format.The vector data are then overlapped with an ESRI web map, and the final results are shown in Figure 8a-d
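The GML parsing step mentioned above can be illustrated with a minimal Python sketch. It is not the implementation used in the experiment: the sample document, the `River` element name, and the function `parse_river_lines` are hypothetical, and the sketch assumes that the river lines are encoded as GML 2-style `gml:LineString` elements with a `gml:coordinates` child. The schema actually returned by the services is not given here.

```python
import xml.etree.ElementTree as ET

# Namespace used by GML 2 / GML 3.1 documents.
GML_NS = "http://www.opengis.net/gml"

# Hypothetical sample result; the real service output may use a different schema.
SAMPLE_GML = """<gml:FeatureCollection xmlns:gml="http://www.opengis.net/gml">
  <gml:featureMember>
    <River>
      <gml:LineString srsName="EPSG:4326">
        <gml:coordinates>110.10,34.20 110.15,34.25 110.20,34.31</gml:coordinates>
      </gml:LineString>
    </River>
  </gml:featureMember>
  <gml:featureMember>
    <River>
      <gml:LineString srsName="EPSG:4326">
        <gml:coordinates>110.15,34.25 110.18,34.40</gml:coordinates>
      </gml:LineString>
    </River>
  </gml:featureMember>
</gml:FeatureCollection>"""


def parse_river_lines(gml_text):
    """Return each gml:LineString as a list of (x, y) float tuples."""
    root = ET.fromstring(gml_text)
    lines = []
    # Walk every LineString in the document, regardless of nesting depth.
    for line in root.iter("{%s}LineString" % GML_NS):
        coords = line.find("{%s}coordinates" % GML_NS)
        if coords is None or not coords.text:
            continue  # skip geometries without a coordinate list
        # GML 2 coordinates: tuples separated by spaces, values by commas.
        points = [tuple(float(v) for v in pair.split(","))
                  for pair in coords.text.split()]
        lines.append(points)
    return lines


river_lines = parse_river_lines(SAMPLE_GML)
for i, pts in enumerate(river_lines, start=1):
    print("river segment %d: %d vertices" % (i, len(pts)))
```

The resulting coordinate lists could then be handed to any vector library or web map client for display; a GML 3 result that uses `gml:posList` would need a different coordinate-splitting rule.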

Discussion
This experiment demonstrates that it is feasible to integrate distributed geoscience algorithms based on OGC web service specifications, which can help scientists to use distributed geoscience algorithms and data to solve geographic problems. All of the test schemes obtain the river network extraction result, demonstrating that the proposed geoscience algorithm integration method works under various distributions of algorithms and data, which improves its applicability to the diverse conditions of the real world. To analyze the quality of the proposed method under different conditions, we collected the performance data of the different test schemes, as shown in Figure 9. Figure 9a shows that as the dataset size increases, the execution time required for each test also increases, because larger input datasets require additional processing time. However, the rate of increase in Test 3 is significantly higher than the rates in the other two tests, particularly when the network bandwidth is only 1 Mbps: because the data and all algorithms are on different nodes, the volume of data transferred in the distributed environment in Test 3 is larger than that in Tests 1 and 2.
Thus, when the distributed datasets are large, the transmission of data between nodes connected via a low-speed network can be a time-consuming task that increases the execution time of the integrated geoscience model. This result demonstrates that the quality of distributed geoscience algorithm integration can be affected by the size of the processed data, the amount of distributed data transferred, and the speed of the network. Figure 9b shows the performance of the experiment around the world, which is very similar to that of the experiment within China. Thus, the proposed method can be applied not only in one country but also around the world, with similar performance.
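To give a feel for how transmission alone scales with data size and bandwidth, the following Python sketch computes an idealized lower bound on transfer time for the DEM sizes and bandwidths reported for the experiment. It is a back-of-the-envelope illustration, not a reproduction of the measured results in Figure 9: it assumes 1 MB = 8 Mb, ignores latency, protocol overhead, and processing time, and the function name `transfer_seconds` and the `hops` parameter (the number of times the dataset crosses the network) are hypothetical.

```python
# DEM sizes (MB) and network bandwidths (Mbps) reported for the experiment.
DEM_SIZES_MB = [2, 20, 100, 500]
BANDWIDTHS_MBPS = [1, 5, 10, 20]


def transfer_seconds(size_mb, bandwidth_mbps, hops=1):
    """Idealized time to move `size_mb` megabytes across the network `hops` times.

    Assumes 1 MB = 8 Mb and a fully used link; real transfers will be slower.
    """
    return size_mb * 8 * hops / bandwidth_mbps


# Print a small table: one row per DEM size, one column per bandwidth.
print("size (MB) | " + " | ".join("%2d Mbps" % bw for bw in BANDWIDTHS_MBPS))
for size in DEM_SIZES_MB:
    row = ["%7.1f" % transfer_seconds(size, bw) for bw in BANDWIDTHS_MBPS]
    print("%9d | " % size + " | ".join(row))
```

Under these assumptions, a single transfer of the 500 MB dataset over a 1 Mbps link already takes 4000 s, while the same transfer over 20 Mbps takes 200 s. This is consistent with the observation that the execution time grows fastest when all data must be moved between nodes over a slow network.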
In Figure 9, the execution time of Test 4, which represents the traditional method of geoscience algorithm integration and execution, shows that the performance of the traditional method is more stable than that of the proposed method. This is because all data and processing are hosted on one machine; hence, no data transmission is involved. Moreover, no distributed web service is invoked in the model. All of these factors contribute to the rapid and stable execution of Test 4. However, compared with the traditional process in a standalone environment, the proposed method offers remote access, interoperability, and distributed storage of data and algorithms. Moreover, by utilizing OGC specifications, the barriers of different data formats and interfaces can be removed. In contrast, the traditional method can only encapsulate the geoscience algorithms in an isolated environment that cannot be accessed remotely via the Internet, making it difficult to achieve distributed integration and interoperability.
Both experiments demonstrate the feasibility of the proposed method, but it is also clear that its performance is affected by data transmission. As a result, future efforts should focus on enhancing the efficiency and reliability of geoscience algorithm integration when both the algorithms and data are highly dispersed in the network environment, which is critical particularly when the network is unstable and slow.

Conclusions
Developments in comprehensive research on Earth systems have led to increased demands for geoscience data and algorithms. To overcome the defects of geoscience analysis and decision-making models in the local area network environment and the heterogeneous implementation of algorithms, this paper provides a method for geoscience algorithm integration in a distributed environment. The interfaces of the OGC OWS standard specifications are used to solve the problem of interoperability in distributed algorithm integration. A river network extraction experiment is used to demonstrate the feasibility of the proposed method. This study can help to promote the development of a distributed, seamless information environment for scientific Earth system research, support the wide and deep sharing, integration, and diffusion of geoscience resources, and contribute to the realization and application of Open Science.
Based on the test results of the experiment, we can conclude that the distribution of data and geoscience algorithms, the network capability, and the size of the processed dataset affect the efficiency of the integrated geoscience model. Data transmission in a distributed environment is a major challenge that can impact the performance of the distributed integrated geoscience model, and this challenge becomes more serious when a large volume of distributed data is involved. The experiment also shows that large datasets pose great challenges to the processing capability of the distributed computing resources. With the increasing complexity and spatial extent of geoscience issues, these challenges for algorithm integration will become more serious in the future.
To obtain a high-quality integrated geoscience model, more efforts will be dedicated to optimizing the performance and the reliability of distributed geoscience algorithm integration, particularly when both the algorithms and data are dispersed in the distributed environment. Ongoing research will

Figure 1. The architecture of distributed geoscience algorithm integration.

Figure 2. Distributed geoscience algorithm integration XML script for river network extraction.

Figure 6. The location and area of DEM data.

Figure 7. Distribution of the server host VMs on the cloud. (a) Virtual machines (VMs) on the QingCloud; (b) VMs on the Alibaba Cloud.

Figure 8. The result of the river network extraction. (a) River networks extracted from DEM1; (b) River networks extracted from DEM2; (c) River networks extracted from DEM3; (d) River networks extracted from DEM4.

Figure 9. Execution time of the integrated geoscience model. (a) Execution time of the integrated geoscience model within China; (b) Execution time of the integrated geoscience model around the world.

Table 1. Distributed geoscience service classification system.

Table 2. The DEM data parameters.