A Comprehensive Study of Geochemical Data Storage Performance Based on Different Management Methods

Abstract: The spatial calculation of vector data is crucial for geochemical analysis in geological big data. However, large volumes of geochemical data make for inefficient management. Therefore, this study proposed a shapefile storage method based on MongoDB in GeoJSON form (SSMG) and a shapefile storage method based on PostgreSQL with open location code (OLC) geocoding (SSPOG) to solve the problem of low efficiency of electronic form management. The SSMG method consists of a JSONification tier and a cloud storage tier, while the SSPOG method consists of a geocoding tier, an extension tier, and a storage tier. Using MongoDB and PostgreSQL as databases, this study achieved two different types of high-throughput and high-efficiency methods for geochemical data storage and retrieval. Xinjiang, the largest province in China, was selected as the study area in which to test the proposed methods. Using geochemical data from shapefiles as a data source, several experiments were performed to improve geochemical data storage efficiency and achieve efficient retrieval. The SSMG and SSPOG methods can be applied to improve geochemical data storage using different architectures, and their efficiency is evaluated through time consumed and data compression ratio (DCR), in order to better support geological big data. The purpose of this study was to build a storage method that improves the speed of geochemical data insertion and retrieval by using big data technology, helping to solve the problem of geochemical data preprocessing efficiently and to provide support for geochemical analysis.


Introduction
Geochemical mapping plays an important role in both mineral exploration and environmental studies [1]. Geochemical data are complex, regional, and spatial in character. The traditional data management model cannot reflect the correlation characteristics of geochemical data, let alone efficiently preprocess the original geochemical sampling point data. Due to the complexity of geochemical data, it is difficult to ensure the integrity of the data in electronic form [2]. At the same time, the main geochemical data types are floating-point-based and consume a lot of computer resources. Moreover, the growing amount of geochemical data makes the correlation analysis between elements increasingly complicated. It is difficult to meet the needs of scientific research by using only for geochemical data. Distributed database centers for geological big data need PB-level data centers to store and analyze complete geochemical data. Consequently, the above database technologies have the following limitations in terms of data storage capacity: (1) the inability to create spatial indices due to the lack of a spatial extension; (2) difficulty in storing geochemical data based on traditional data structures; and (3) failure to achieve a distributed database architecture via sharding of spatial data [24].
Cloud computing technology, NoSQL, and distributed database cluster technology may bring new solutions to overcome these problems for geological big data [25,26]. The establishment of geochemical databases in big data environments aims at innovating data storage structures and spatial index methods to store and analyze data efficiently at minimum cost. Therefore, in this paper, two methods are proposed to address the disadvantages of large-scale geochemical data storage, especially in geochemical data analysis for geological big data. These two storage methods provide a compact data structure, better use of storage space, and efficient retrieval speed: one is based on the PostgreSQL hexadecimal stream, and the other improves the GeoJSON storage mode based on MongoDB. This study implements a storage method based on MongoDB in GeoJSON form (SSMG), and a storage method based on PostgreSQL with open location code (OLC) geocoding (SSPOG), in order to achieve efficient retrieval and data compression. To test these methods, we used geochemical data and basic geological data from Xinjiang, in shapefile format. Moreover, the data compression ratio (DCR) was used to evaluate the storage efficiency of the SSMG method and the SSPOG method. In order to test the performance of the two methods accurately, we compared both the speed of storage and the data compression of the two methods. Finally, conclusions and future directions are discussed.

Datasets and Environment
In this research work, geochemical data for Xinjiang, in shapefile format, were selected to test the proposed SSMG and SSPOG storage methods. Xinjiang, the study area, is located in the northwest of China, in the center of Eurasia, covering more than 1.66 million square kilometers, accounting for about 1/6 of China's total territory, and has abundant mineral resources (Figure 1). The establishment of a geochemical database provides data support for the evaluation of mineral resources, groundwater pollution monitoring, and ecological monitoring and evaluation. Geochemical surveys, at home and abroad, along with national geochemical data, have been applied in the investigation of mineral resources for decades. Because Xinjiang has abundant mineral resources, the establishment of a geochemical database for the region is of great significance.
Shapefile data are often used as a data source for experiments [27]. This experiment was designed to test the performance of the SSMG and SSPOG storage methods using geochemical data. Shapefile is a vector graphics format that can save the location of spatial elements and related attributes, but it cannot store the topological information of geographical data. At present, many free or commercial programs can read shapefile data. The main file stores the location data of spatial features, but cannot store the attribute data of these features at the same time. Therefore, a shapefile is accompanied by a two-dimensional table file that stores attribute information for each spatial feature. A complete ESRI shapefile consists of a main file (.shp), an index file (.shx), and a table file (.dbf). The main file is composed of a fixed-length header and variable-length records; it is mainly used to keep spatial feature records. The index file contains a 100-byte header and 8-byte fixed-length records, recording the location of each spatial feature in the main file. The table file contains the attributes of each spatial feature in the shapefile. The corresponding relationship between the table file and the spatial feature records in the main file is established by the index file. Therefore, shapefile data are adopted for the storage of geochemical data. Because the SSMG method is a storage mechanism based on the MongoDB database, shapefile data inserted into the database take a complete document form. The SSPOG method is based on the PostgreSQL database, whose tables are similar in form to the table file in a shapefile, but SSPOG converts the shapefile spatial information into hexadecimal code and stores it in the database.
To explain the differences between SSMG and SSPOG, the time consumed by MongoDB and PostgreSQL operations was recorded. Therefore, PostgreSQL and MongoDB were deployed in a single-machine environment, and database visualization software (such as PremiumSoft's Navicat Premium) was deployed to observe the result data. In addition, ArcGIS and QGIS were used to display the result maps. The configuration details of each platform are shown in Table 1.
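The index-file layout described above (a 100-byte header followed by 8-byte fixed-length records) can be illustrated with a short Python sketch. It assumes, following the ESRI shapefile specification, that each index record holds two big-endian 32-bit integers (record offset and content length, both counted in 16-bit words). The function name `read_shx_records` and the synthetic index bytes are hypothetical and serve only as an illustration.

```python
import struct

SHX_HEADER_BYTES = 100  # fixed-length header of the .shx index file
SHX_RECORD_BYTES = 8    # one fixed-length index record per spatial feature


def read_shx_records(shx_bytes):
    """Return (byte_offset, byte_length) of each feature record in the .shp file."""
    body = shx_bytes[SHX_HEADER_BYTES:]
    if len(body) % SHX_RECORD_BYTES != 0:
        raise ValueError("index body is not a whole number of 8-byte records")
    records = []
    for pos in range(0, len(body), SHX_RECORD_BYTES):
        # Both values are big-endian int32 and are expressed in 16-bit words.
        offset_words, length_words = struct.unpack(
            ">ii", body[pos:pos + SHX_RECORD_BYTES]
        )
        records.append((offset_words * 2, length_words * 2))
    return records


# Synthetic index for two point features (illustrative values, not real data):
# an empty 100-byte header followed by two index records.
fake_shx = (
    bytes(SHX_HEADER_BYTES)
    + struct.pack(">ii", 50, 10)
    + struct.pack(">ii", 64, 10)
)
print(read_shx_records(fake_shx))
```

Because the records are fixed-length, the number of features can also be derived directly from the file size as `(size - 100) // 8`.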

Experimental Design
In our experiment, we tested the SSMG and SSPOG methods with geochemical data in shapefile format. The SSMG method contains two processes (JSONification and cloud storage), while the SSPOG method contains three processes: geocoding, extension, and data storage. Based on the methodology detailed in Sections 2.3 and 2.4, Python was used to insert geochemical data into the different databases in two ways. In addition, the geochemical data were stored in the database according to the table structure described in Section 2.5. As shown in Section 3.1, the two storage methods are evaluated by the DCR criterion. Section 3.2 describes the application of geochemical data in the SSMG and SSPOG methods. Section 3.3 compares the data storage performance of the two methods through a variety of evaluation criteria and statistical methods.
Remote Sens. 2021, 13, 3208

SSMG Method
The big data technology group includes three parts: distributed database, parallel computing, and data mining. MongoDB, HBase, Neo4j, and Redis are all popular databases today. MongoDB has the ability to process massive data efficiently [28], supports embedded document objects and array objects [29], and has an automatic sharding mechanism [30]. In addition, MongoDB can provide a high-performance, high-availability solution for storing unstructured data. MongoDB stores data in document form. Each document consists of multiple keys and their corresponding values, supports arrays and nested documents, and can store complex data types. When spatial data are stored in MongoDB, each spatial object is transformed into a JSON object by using the GeoJSON format for spatial data expression, and the spatial and non-spatial attributes of spatial objects are stored in <key,value> mode. Finally, spatial data are serialized into JSON files and stored on disk. GeoJSON defines the following geometric types: Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection. Attributes contain geometric objects and additional information, as well as attribute sets [31]. Compared with the XML data format, GeoJSON supports multiple server-side languages, and is easy for clients to access and extract, thus reducing the amount of code development on both the server and client sides.
The characteristics of shapefile data stored in GeoJSON are different from relational database storage mechanisms, integrating spatial information and attribute information to ensure consistency [32]. MongoDB was chosen as the container for storing GeoJSON because it is not only a NoSQL distributed database with good performance [33], but also has more advantages in storing document data. In addition, using MongoDB can achieve compatibility with other software. The proposed SSMG method illustrates how to store geochemical data in the form of GeoJSON in the document-type database MongoDB (Figure 2). This method consists of two tiers: JSONification and cloud storage.
As a core part of SSMG, the JSONification tier is used to convert geological vector data to GeoJSON format data. The GDAL/OGR spatial database conversion interface is used to process tasks by this tier. The Geospatial Data Abstraction Library (GDAL) is a conversion interface developed by the Open Source Geospatial Foundation (OSGeo) under the Massachusetts Institute of Technology X/MIT license agreement. The OGR Simple Features Library (OGR) is a part of GDAL, which mainly provides support for vector data, including 84 different types of vector data.
The OGR interface treats the shapefile dataset as a whole, and a single shapefile in the dataset as one of the layers. Under the polygon specification, the read driver reads the outer ring clockwise and the inner ring counterclockwise. If the topological relationship of the shapefile is damaged, the configuration option OGR_ORGANIZE_POLYGONS can be reset to complete the analysis of the topological relationship of the original data. The GeoJSON driver supports reading and writing data in GeoJSON format, as well as GeoJSON output from map services such as GeoServer or CartoWeb. The GeoJSON driver maps five types of element objects (Point, LineString, Polygon, GeometryCollection, and Feature) to new OGRFeature objects. According to the GeoJSON specification, the members of "properties" are the attributes of element objects, so every such member is converted into an OGRField and inserted into the corresponding OGRFeature object. In this way, the JSONification tier stores geological vector data in GeoJSON format.
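The outcome of the JSONification tier can be sketched with the Python standard library alone. The snippet below is not the GDAL/OGR interface used in this study; it is a minimal stand-in that builds the kind of GeoJSON Feature a sampling point would become. The function name `point_feature_to_geojson`, the field names (`sample_id`, `Cu_ppm`), and the coordinate values are hypothetical.

```python
import json


def point_feature_to_geojson(lon, lat, attributes):
    """Build a GeoJSON Feature dict for one sampling point."""
    return {
        "type": "Feature",
        # GeoJSON stores coordinates in longitude, latitude order.
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        # Non-spatial attributes go into the "properties" member.
        "properties": dict(attributes),
    }


# Illustrative values only; they are not taken from the experimental dataset.
sample = point_feature_to_geojson(
    87.6, 43.8, {"sample_id": "XJ-0001", "Cu_ppm": 23.5}
)
document = json.dumps(sample, sort_keys=True)
print(document)
```

A document of this shape keeps spatial and attribute information together, which is what allows it to be stored as a single record in a document database.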
The cloud storage tier refers to a distributed database cluster. When more data are stored in the database, a single database cannot meet the storage requirements, nor can it provide acceptable read/write throughput. A distributed database enables the database system to store more data by partitioning the data across multiple servers. Client users do not need to know whether the data are split, nor which server holds a given shard. The data sharding task is performed by a route process, which records the storage location of all data and the correspondence between data and shards. The JSONification tier turns the geochemical data into documents, while the cloud storage tier groups the documents into blocks, each consisting of a specified range of keys. The cloud storage tier records the amount of inserted data in each data block, and once the split threshold is reached, the collection of the target database is split. To the client, it is as if it were connected to an ordinary database process. When data are requested, the route process looks up the location of the target data, collects the data, and returns them to the client. On account of their fast access speed, superior performance, and easy expansion, distributed databases are well suited to geochemical data, providing an easy and fast storage environment.
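The block-splitting behaviour described above can be illustrated with a deliberately simplified Python sketch. It is not MongoDB's sharding implementation: here a block is a sorted list of keys, the split threshold is counted in documents (an assumption made for illustration), and the function name `insert_with_split` is hypothetical.

```python
def insert_with_split(chunks, key, max_docs):
    """Insert a key into the block owning its range; split the block if it overflows.

    `chunks` is a list of sorted key lists, ordered by key range.
    """
    # Route the key: first block whose largest key is >= key, else the last block.
    target = len(chunks) - 1
    for i, chunk in enumerate(chunks):
        if chunk and key <= chunk[-1]:
            target = i
            break
    chunk = chunks[target]
    chunk.append(key)
    chunk.sort()
    # Once the threshold is exceeded, divide the block into two smaller blocks.
    if len(chunk) > max_docs:
        mid = len(chunk) // 2
        chunks[target:target + 1] = [chunk[:mid], chunk[mid:]]
    return chunks


chunks = [[]]
for k in [5, 1, 9, 3, 7, 2]:
    insert_with_split(chunks, k, max_docs=4)
print(chunks)  # [[1, 2, 3], [5, 7, 9]]
```

The caller only ever passes a key; which block receives it, and whether a split happens, stays hidden behind the routing step, mirroring the role of the route process.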

SSPOG Method
PostgreSQL is an open-source object-relational database management system, which supports the management of geospatial data. Moreover, some fundamental geometric types have been defined in PostgreSQL. The architecture of the proposed SSPOG method is shown in Figure 3. The SSPOG method innovatively uses OLC geocoding as the geographic index of vector data, follows the Simple Feature for Structured Query Language (SFS) [34] model to extend geometry objects under Open Geospatial Consortium (OGC) specifications, and stores unstructured geographic data in a spatial database in the form of two-dimensional relational tables. This method consists of three tiers: geocoding, extension, and storage.
The purpose of the geocoding tier is to convert longitude and latitude in the WGS84 coordinate system to OLC. The input is a large number of longitude and latitude coordinates (LLCs), while the output is a simpler OLC. In the geocoding tier, the conversion interface is transmitted through a dedicated algorithmic reference table supported by the Google Maps spatial engine. The algorithm is authorized to execute under Apache License 2.0. Characters that are not easily confused in more than 30 languages are selected as the OLC code. Meanwhile, each geographic code describes an area bounded by two longitudes and two latitudes, as determined by its southwest corner and size. According to the user's request, the geocoding length that meets the required accuracy is determined in the geocoding tier.
As the geocoding length continues to expand, the target area becomes more precise. When the encoding is extended to 11 characters, the mapping to the Earth's surface can accurately describe the geographical entity with a precision of 3 m. Compared with LLC, OLC coding takes up less space, and is generated by open-source algorithms. OLC coding can identify any part of the Earth, which is an appropriate solution to improve the processing speed and positional identification accuracy of coding.
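To make the conversion from LLC to OLC concrete, the following Python sketch encodes a WGS84 coordinate as a 10-digit code. It is a simplified illustration written for this text rather than the reference library: it covers only the five base-20 digit pairs, omits the grid refinement used for an 11th digit, and the function name `olc_encode10` is hypothetical.

```python
import math

OLC_ALPHABET = "23456789CFGHJMPQRVWX"  # the 20 symbols used by OLC
PAIR_PRECISION = 8000  # 20**3 steps per degree after five digit pairs


def olc_encode10(lat, lon):
    """Encode WGS84 latitude/longitude as a 10-digit OLC with its '+' separator."""
    # Shift to non-negative ranges and convert to integer steps of 1/8000 degree.
    lat_units = int(math.floor((lat + 90.0) * PAIR_PRECISION))
    lat_units = min(max(lat_units, 0), 180 * PAIR_PRECISION - 1)
    lon_units = int(math.floor(((lon + 180.0) % 360.0) * PAIR_PRECISION))
    pairs = []
    for _ in range(5):
        # Peel off base-20 digits, least significant pair first.
        lat_units, lat_digit = divmod(lat_units, 20)
        lon_units, lon_digit = divmod(lon_units, 20)
        pairs.append(OLC_ALPHABET[lat_digit] + OLC_ALPHABET[lon_digit])
    code = "".join(reversed(pairs))
    # The separator is placed after the eighth digit.
    return code[:8] + "+" + code[8:]


print(olc_encode10(47.365590, 8.524997))  # 8FVC9G8F+6X
```

Each additional pair divides the cell by 20 in both directions, which is why longer codes describe smaller areas.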
The extension tier is designed to implement the mapping of geochemical data to geographic entity objects. In order to follow the SFS model specification under OGC, two sets are used to track and report geometries in the database. One set uses the spatial reference identifier (SRID) to define all known spatial reference systems in the database. The SRID corresponds to a spatial reference system based on a specific ellipsoid, and can be used for planar or spherical mapping. The extension tier supports the input and output of geological vector data in a variety of formats, including well-known text (WKT), well-known binary (WKB), extended well-known text (EWKT), extended well-known binary (EWKB), and other format types. Among them, EWKT and EWKB extend WKT and WKB with three-dimensional coordinates and an embedded SRID, while three-dimensional representations are formally defined by the Structured Query Language (SQL)-Multimedia Part 3 (SQL/MM) specification. Following the SFS specification, geochemical data can be processed in a standard way.
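The binary formats listed above can be illustrated by encoding a single point by hand. The sketch below assumes the little-endian WKB layout (byte order, geometry type, two doubles) and the PostGIS EWKB convention of flagging an embedded SRID with 0x20000000; the function name `point_to_ewkb_hex` is hypothetical, and the snippet is not the extension tier's own code.

```python
import struct

WKB_POINT = 1                # geometry type code for a point
EWKB_SRID_FLAG = 0x20000000  # PostGIS flag: an SRID follows the type field


def point_to_ewkb_hex(lon, lat, srid=None):
    """Return the hexadecimal (E)WKB string of a 2D point, little-endian."""
    if srid is None:
        # Plain WKB: byte order (1 = little-endian), type, x, y.
        raw = struct.pack("<BIdd", 1, WKB_POINT, lon, lat)
    else:
        # EWKB: the type carries the SRID flag and the SRID precedes x, y.
        raw = struct.pack(
            "<BIIdd", 1, WKB_POINT | EWKB_SRID_FLAG, srid, lon, lat
        )
    return raw.hex().upper()


print(point_to_ewkb_hex(1.0, 2.0))
print(point_to_ewkb_hex(0.0, 0.0, srid=4326))
```

A hexadecimal string of this form is what a geometry column typically displays when queried without a conversion function.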
The storage tier carries out the storage of all types of geochemical data. Following the model specification of the extension tier, POINT, LINE, POLYGON, POLYGON with a hole, and COLLECTION are used to map geographic entities on the Earth. There are many types of geological data. The client may create geological databases on different topics according to different geological disciplines, including geochemical databases, basic geological databases, and geotectonic databases. Therefore, the storage tier builds different databases according to metadata tables of different topics. Requests for geochemical data are sent through a dedicated job submission interface, which converts the shapefile into a form suitable for insertion into spatial databases in geometric or geographic formats.
Because longitude and latitude require large storage space, and are stored in the form of point features in the database, the efficiency of geochemical data processing is affected. The proposed SSPOG method uses OLC geocoding to describe the common surface elements in geological research to meter-level accuracy with 10-12 characters, which improves the efficiency of geochemical data processing and makes it possible to obtain the location of the target feature quickly and accurately. Because the SSPOG method is based on PostgreSQL, a relational database with pluggable type extensions and functional extensions, the spatial and attribute information of geochemical data can be managed together in a relational database. Through the extension of geometry objects under the OpenGIS protocol, spatial information is inserted into the database in hexadecimal form. PostgreSQL distributed extension technologies, such as Citus, Greenplum, and PL/Proxy, are appropriate choices to support the distributed management of big data.

Design for Data Tables in SSMG and SSPOG to Store Geochemical Data
A relational database divides a dataset into several parts and stores them in the corresponding tables. When the data need to be used, they are joined back together. For example, when different remote sensing data cover a study area, a table describing remote sensing data information is designed according to the third normal form [35]. A single table can be used to store remote sensing images of different time series, and the required data are read through the associations between tables when displaying available remote sensing data. The geochemical data storage mechanism of SSMG is quite different from this mode. Since its storage unit is a document that supports arrays and nested documents, SSMG can directly describe all attribute information of geochemical data with a document data structure (Figure 4). Each field in the entity represents a type of information in the SSMG method, and does not take the form of a table. The association function of a relational database is not necessarily its advantage, but a necessary condition for it to work. Because of its rich document characteristics, the SSMG method does not require every document to have the same structure, and supports many heterogeneous data scenarios very well. To some extent, association is a pseudo-requirement, which can be avoided by reasonable modeling.
Inheriting the advantages of the geospatial object-relational model, the storage of geospatial set elements in SSPOG conforms to the OGC description and definition of geographic elements. The structure of the SSPOG method table is mainly divided into two parts. One is a set of traditional structured attribute columns, which meets all the requirements of traditional relational database normal forms. The other is the spatial information column, which stores geometric objects in hexadecimal form. Each spatial data record in SSPOG stores a spatial feature, and all tables are integrated into a dataset with the same spatial reference system.

Data Compression Ratio (DCR)
In order to achieve large-scale geochemical data storage, the SSMG and SSPOG methods are used to store unstructured data. There are great differences between the two methods proposed in this research. The former transforms the spatial information and attribute information of the shapefile into GeoJSON format and stores them in the database. The latter extends the spatial information of the shapefile following the OGC protocol, and stores it in the database in the form of two-dimensional tables. The increase or decrease in space occupied when data are inserted into the database is one of the important evaluation criteria for a data organization mode, and efficient storage of data is also pursued in the era of big data. Therefore, a new criterion for evaluating data storage mechanisms, DCR, is proposed in this study. In order to analyze the change in the space occupied under the two methods, firstly, the space occupied by the shapefile-encoded experimental data stored on a Windows file system was recorded, which was used as the standard control group for the experiment. Secondly, the experimental data were stored in different databases using SSMG and SSPOG. Thirdly, the amount of space taken up by the experimental data in each database was recorded. Finally, the DCR values of the different methods were calculated according to (1). The size of DCR represents the efficiency of data storage.
where R is the DCR of the database, D_T is the space occupied by the experimental group data, and D_0 is the space occupied by the control group data.
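As a minimal sketch of how the DCR could be computed from the two measured sizes, the snippet below assumes that (1) takes the common form of a simple ratio, R = D_T / D_0, expressed as a percentage. This form is an assumption based on the definitions of R, D_T, and D_0 above; the function name `data_compression_ratio` and the example sizes are hypothetical.

```python
def data_compression_ratio(stored_bytes, original_bytes):
    """Return the DCR in percent, ASSUMING R = D_T / D_0 * 100.

    stored_bytes   -- D_T, space occupied by the data inside the database
    original_bytes -- D_0, space occupied by the shapefile control group
    """
    if original_bytes <= 0:
        raise ValueError("the control group size D_0 must be positive")
    return 100.0 * stored_bytes / original_bytes


# Hypothetical sizes for illustration only: a 100 MB shapefile stored in 22 MB.
print(data_compression_ratio(22, 100))  # 22.0
```

Under this assumed form, a smaller value means that the database needs less space than the original shapefile.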

Geochemical Data Storage and Data Presentation
This study measured the time needed to reconstruct geochemical data into a GeoJSON structure and store it in a two-dimensional table structure. In addition, the time consumed to retrieve data based on the SSMG and SSPOG methods and their corresponding DCR were also measured. The experiment consisted of two steps: storing geochemical data, and mapping them. When using SSMG to store geochemical data, the efficiency of its storage function was evaluated. Three steps were performed in sequence: (1) Clients obtain all the information of geochemical data from the data source by inheriting the GetLayer operation of the GDAL/OGR spatial feature library, and shapefile data are reconstructed into GeoJSON form via the Feature.ExportToJson function. This contains the original data with all the spatial information and attribute information. (2) Clients register data into the MongoDB cluster through the metadata tables already designed in the system to provide data foundation for geological data analysis. (3) At this point, MongoDB divides the documents registered in the database into blocks. When block data reach a threshold, MongoDB divides them into two smaller blocks. Finally, geochemical data are inserted into MongoDB in the form of GeoJSON.
Similarly, when using the SSPOG method to store geochemical data, the efficiency of its storage function was also evaluated. Three steps were performed in sequence: (1) Clients use the OLC encoding function to encode the shapefile data of the research area, so that each spatial feature can be accurately described by OLC. (2) The SSPOG method follows the SFS model specification under OGC to extend shapefile data to geometry objects, describing the spatial information of data in the form of hexadecimal characters. (3) Through the specific model, the structured attribute information and the extended spatial information are uniformly stored in the two-dimensional table structure, so that clients can analyze spatial data with SQL.
In the process of displaying geochemical data, the data were retrieved from the database through the application interface provided by the database, and were displayed via graphical software. Based on the different element content values in the geochemical data, the original data were symbolized and displayed to obtain the final display results. The results showed the contents of different geochemical elements based on shapefile data (Figure 5). Geochemical data contain information about element content in most of the regions. If this kind of data can be used quickly and efficiently, it can provide effective data support for geological big data.

Performance Evaluation
The experiment compared the storage efficiency of SSMG with that of SSPOG when storing different numbers of features. The SSMG and SSPOG methods are based on open-source servers; the databases of SSMG and SSPOG were MongoDB and PostgreSQL, respectively. The experiments for SSMG and SSPOG were carried out in the same hardware environment. Because computer performance can be affected by other processes, the average of three repeated experiments was taken. For shapefile data with 129,419, 239,344, and 421,897 features, the time consumed by the SSMG method was approximately 515, 955, and 1646 s, respectively. Meanwhile, the time consumed by the SSPOG method was approximately 165, 293, and 509 s, respectively (Figure 6). When storing 453,988 features, the SSMG method took approximately 1727 s, while SSPOG took approximately 550 s. Overall, the SSPOG method was approximately three times more efficient than the SSMG method.
The time consumption growth trend of the SSMG and SSPOG methods was linear with respect to the number of features (Figure 7). The slope of SSMG was approximately 0.0038 s/row, while the slope of SSPOG was approximately 0.0012 s/row. The SSPOG method is much more efficient than the SSMG method when storing large quantities of geochemical data.
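The reported slopes can be checked against the four measurements given above. The following short Python sketch fits an ordinary least-squares line to the (number of features, seconds) pairs for each method. The fitting procedure is an assumption, since the text does not state how the slopes were obtained, and the helper name `ols_slope` is hypothetical.

```python
def ols_slope(xs, ys):
    """Return the ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Feature counts and approximate storage times (s) as reported in the text.
features = [129_419, 239_344, 421_897, 453_988]
ssmg_seconds = [515, 955, 1646, 1727]
sspog_seconds = [165, 293, 509, 550]

ssmg_slope = ols_slope(features, ssmg_seconds)
sspog_slope = ols_slope(features, sspog_seconds)

print(f"SSMG : {ssmg_slope:.4f} s/row")
print(f"SSPOG: {sspog_slope:.4f} s/row")
print(f"ratio: {ssmg_slope / sspog_slope:.1f}")
```

Under this assumption, the fitted slopes round to the values stated in the text, and their ratio is consistent with the roughly threefold difference in storage time.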
In the same way, this experiment also compared the DCR of the SSMG method with that of the SSPOG method when storing different numbers of features. For shapefile data with 129,419, 239,344, and 421,897 features, the DCR of SSMG was approximately 22.40%, 22.37%, and 21.43%, respectively, whereas for the SSPOG method it was approximately 53.39%, 53.67%, and 52.07%, respectively (Figure 8). The DCR of SSMG tends toward ~22%, while the DCR of the SSPOG method tends toward ~53%. Overall, the DCR of SSMG is less than half that of the SSPOG method.
In conclusion, the SSPOG method was more efficient when storing different numbers of features. As the number of features increased, SSPOG continued to consume less time than SSMG. Compared with document management systems, the SSMG and SSPOG methods provide new ways to store geochemical data and support higher storage capacity. Compared with SSMG, SSPOG provides more efficient storage (Figures 6 and 8). Meanwhile, according to the DCR index, SSPOG provides better data compression capability relative to capacity. However, in terms of retrieval, the SSMG method is better than the SSPOG method.
Figure 8. DCR of geochemical data using two methods with different numbers of features.
Table 2 shows the retrieval performance under the different methods. For 129,719 features, the time consumed differed depending on the storage and retrieval method. Using the collection query method (CQM), the time consumed by the SSMG method was 220 milliseconds. For the same task, the time consumed by the SSPOG method was 2450 milliseconds (Figure 9). Overall, the SSMG method was approximately 10 times faster than the SSPOG method in retrieval.

Discussion
In this experiment, the geochemical data were stored and accessed using the SSMG and SSPOG methods. In the performance evaluation stage, the SSPOG method consumed less storage time than document-based methods such as SSMG. Relational databases are structurally compact and less redundant than document databases. Shapefile data are fundamentally organized as traditional attribute tables, and the SSPOG method stores the spatial information of geochemical data as structured data in a relational database after spatial extension. Therefore, the SSPOG method has more advantages than SSMG in terms of saving and compressing data. However, the SSMG method helps to solve the problem of geochemical data storage for retrieval. The <key,value> storage mode of the document database eliminates the close coupling between different data found in a relational database, and allows target data to be acquired directly from the database. Therefore, the SSMG method performs better in terms of retrieval. The advantages of SSMG and SSPOG, based on a comparison of the experimental results, are as follows:
(1) The SSPOG method efficiently stores geochemical data in shapefile format. It can store different types of geographic elements, such as point, polyline, and polygon, in different ways. This storage method enables data of the same type to be invoked to extract multisource data information in geological big data analysis functions. Meanwhile, OLC enables SSPOG to save considerable space and locate target features more accurately, as described in Section 2.2. In terms of storage efficiency and speed, merging two floating-point fields into one character field is an innovation for traditional spatial data storage. As the amount of geochemical data increases, the time consumed by SSPOG also increases, but at a lower rate than that of SSMG (approximately 0.0012 s/row versus 0.0038 s/row). For these reasons, the SSPOG method improves the efficiency of storing geochemical data.
(2) The SSMG method innovates the storage form of geochemical data and improves retrieval efficiency. Because geological data are described with increasing accuracy and complexity, efficient retrieval from large-scale data is difficult to implement. SSMG expresses the vector format of geochemical data in the form of <key,value>, which breaks through the complex relationships between attributes in relational databases. As mentioned in the conclusion of the performance evaluation, retrieval with this storage method is much faster than retrieval in a relational database. Because the geochemical data are stored in GeoJSON format, this vector data storage method supports a two-dimensional spherical spatial index and solves the application problem of location-based service (LBS), so it is suitable for large-scale retrieval research. Meanwhile, the clustering technology of MongoDB enables a vector dataset to be segmented and stored on different data nodes, which provides a technological foundation for the distributed analysis and calculation of geochemical data.
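The <key,value> document form described in item (2) can be illustrated with a minimal Python sketch that converts one sampling point into a GeoJSON Feature. The helper name `to_geojson_feature`, the coordinates, and the attribute names (`sample_id`, `Cu`) are hypothetical and chosen only for illustration. The comments at the end assume that the pymongo driver is used to insert and index the documents, which the text does not specify.

```python
import json


def to_geojson_feature(lon: float, lat: float, properties: dict) -> dict:
    """Build a GeoJSON Feature for a sampling point (longitude first)."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": dict(properties),  # copy so the caller's dict is untouched
    }


# Illustrative values only, not taken from the study's dataset.
feature = to_geojson_feature(87.6168, 43.8256, {"sample_id": "S0001", "Cu": 23.5})
document = json.dumps(feature)
print(document)

# Assuming a pymongo collection object named `collection`, the document could
# then be stored and given a spherical spatial index roughly as follows:
#   collection.insert_one(feature)
#   collection.create_index([("geometry", "2dsphere")])
```

Each attribute becomes a key in the `properties` sub-document, so a query can address a target field directly without joining separate attribute tables.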
Challenges still remain in terms of data storage structure and database organization; more efficient storage methods of geochemical data can be established to achieve geological big data storage. Future work will focus on the following: (1) The OLC unique coding and matching technology of vector features' locations and geometric features can solve the problem of unified coding of elements in geochemical data. Through the uniform coding of geological entities, the matching of geological spatial features can be converted into document format via coding matching, which can improve the matching efficiency of geological data. (2) Storing a large amount of geochemical data in different clusters can make full use of idle computer resources, and improve the data availability and performance of large database retrieval servers. Therefore, database cluster sharding technology will be the focus of our next work.

Conclusions and Future Work
This study implemented unstructured spatial data storage methods to improve the storage efficiency of vector data and to apply shapefile data in the retrieval of geochemical data. Our experiment demonstrated that the SSPOG and SSMG methods achieved creative geochemical data storage and retrieval at a large scale. The two methods showed different performance in storing and retrieving geochemical data. In terms of storage performance, the efficiency of geochemical data storage in SSPOG can be approximately three times that of SSMG. The SSPOG method showed the advantage of the compact data structure of the relational database through spatial extension under the OGC standard. In terms of data compression, through the DCR index proposed in this paper, the efficiency of data compression in SSMG was better than that of SSPOG. Meanwhile, the retrieval performance of SSMG was better than that of SSPOG; that is to say, the SSMG method was able to complete real-time geological retrieval tasks with excellent performance when storing geochemical data at a large scale. Because the SSMG method uses a document structure to store geochemical data, it obtains a looser structure, so it performs better in terms of data compression and retrieval. In fact, 90% of the time consumed in storing geochemical data with SSMG is spent on the documentation process; inserting the resulting documents into the database takes only a short time. Therefore, documented vector data have more advantages in optimizing storage space and retrieval.
Compared with the traditional retrieval of original geochemical data, the two geochemical data management models based on big data technology proposed in this paper show effective improvement. It takes less than 1 s to find the target data from 460,000 records, an efficiency that cannot be achieved by the traditional management model for original geochemical data. On the basis of these management models, abnormal values in massive geochemical data can be quickly found and processed. At the same time, the core of geochemical big data analysis is to retrieve the target data from massive data for processing and analysis, and the methods proposed in this paper can provide efficient technological support. In addition, the SSPOG and SSMG methods have their own advantages and disadvantages in terms of storage and retrieval performance. Under different conditions, different methods can be selected.
At present, the focus of our research is on improving spatial data storage performance and retrieval by range index attributes. In future work, the spatial index will be the focus of our research. In the two methods proposed in this paper, the use of a spatial index can increase the accuracy of data retrieval, and in different application scenarios can also improve the efficiency of data retrieval.