1. Introduction
Vector geographic data are among the most important production materials in the information society [1]. Ensuring their security is therefore essential for the development of the geographic information system (GIS) industry. As a frontier technology for information security, digital watermarking plays a critical role in the copyright protection and content authentication of vector geographic data [2,3,4,5,6]. For copyright protection in particular, zero watermarking has gained increasing attention. It is a watermarking technique that does not modify the host data at all [7]. Zero watermarking constructs a watermark by quantifying characteristics of the data and then registers the watermark, together with additional information, in a third-party intellectual property rights (IPR) repository. Therefore, compared with traditional embedding watermarking [8], zero watermarking causes no loss of data accuracy and can be applied to vector geographic data with high-precision requirements. It also balances the trade-off between the invisibility and the robustness of watermarking [9]. Robustness refers to the ability to detect the watermark information in watermarked data after the data have been attacked [10,11]. However, vector geographic data are exposed to many kinds of attacks, such as geometrical attacks, vertex attacks, and object attacks [12,13]. These attacks damage the data in different ways and thereby disrupt the synchronization of the watermark information, which places higher demands on the robustness of zero watermarking. How to improve robustness is thus a hotspot in current research on zero watermarking for vector geographic data.
Existing zero watermarking methods for vector geographic data can be divided into two types. The first type is based on attribute characteristics [14,15,16,17]. It quantifies the descriptive information of vector geographic data (such as what, why, and when) to construct a zero watermark. For example, researchers have selected element coding [14], stroke width [15], color [16], and map symbols [17] as the attribute characteristic. Numeric attributes can be quantified directly into watermark information, whereas text attributes must first be converted into numbers, for example by encoding or statistics. Methods of this type are generally highly robust and fully resist geometrical attacks, including rotation, scaling, and translation (RST) attacks, because RST attacks change only the coordinates, not the attributes. However, they place high demands on the integrity of the attributes: only vector geographic data whose attributes meet certain conditions can be watermarked. Moreover, the attribute information of vector geographic data differs across production stages and application scenarios. Therefore, this type of method has significant application limitations.
The second type is based on spatial characteristics [8,9,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32]. It quantifies the coordinates of vector geographic data to construct the zero watermark, and the coordinates can be used in two ways. One is to use the coordinates directly. For example, reference [21] compares the coordinates of vertices to obtain a Boolean sequence and then quantizes the sequence into a watermark containing only zeros and ones, and references [9,18,24] count the vertices that satisfy certain spatial location conditions. The other is to use the coordinates indirectly. For example, researchers have employed angles [8,19,20,29,30,31,32], distances [25,26], distance ratios [22,27], and topology [28] to construct watermarks. Unlike attribute characteristics, spatial characteristics are fundamental to all vector geographic data, so methods based on them avoid the application limitations of the first type. Besides, most of them can resist common geometrical attacks. However, the watermark synchronization of these methods is strongly coupled with the coordinates, so they have difficulty resisting attacks that cause significant distortion, such as non-uniform scaling and projection transformation attacks.
The above analysis shows that methods based on attribute characteristics are robust to coordinate-related attacks but have many limitations in practical applications. Methods based on spatial characteristics overcome these limitations and can resist common geometrical attacks, but they have difficulty resisting attacks that significantly distort the data. Therefore, finding characteristics that can fully resist data deformation is a critical scientific problem.
To resolve this problem, we propose a zero watermarking algorithm for vector geographic data based on the number of neighboring features. It shifts the target of the statistics from vertices to features and fully exploits the topological relationships between features. The remainder of this paper is organized as follows. Section 2 gives the basic ideas and details of the algorithm. The experimental design is described in Section 3. Section 4 then provides the corresponding experimental results and analyses, and Section 5 offers some discussion. Finally, Section 6 draws conclusions.
2. Methodology
The proposed method is based on the number of neighboring features of vector geographic data; specifically, polygon data are used as the watermarking target in this paper. A polygon has first-order neighbors, that is, the polygons directly adjacent to it, and second-order neighbors, that is, the neighbors of its neighbors. The key to the method is how the number of neighboring features is used. First, the number of first-order neighboring features (referred to as NFNF hereafter) and the number of second-order neighboring features (referred to as NSNF hereafter) are counted for every feature. Then, NFNF is quantized into the watermark bit, and NSNF is quantized into the watermark index. The watermark index represents the mapping between a feature and a position in the watermark. Finally, the watermark is constructed by combining the watermark indices and the watermark bits.
Figure 1 shows the basic idea of the algorithm. The final watermark can be encrypted with a secret key to enhance the security of the method; for example, reference [28] uses a logistic mapping model to encrypt the watermark information. That is, the proposed method can be combined with an encryption method. Since this paper focuses on the procedure for constructing the zero watermark, we do not integrate any encryption operations into the method.
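As an illustration of how an encryption step could be attached to the constructed watermark, the following Python sketch XORs a binary watermark with a keystream generated by the logistic map $x_{k+1} = \mu x_k (1 - x_k)$. This is one common chaotic-encryption pattern and is not necessarily the scheme of reference [28]; the function names and the default key values `x0 = 0.37` and `mu = 3.99` are hypothetical.

```python
from typing import List


def logistic_keystream(length: int, x0: float = 0.37, mu: float = 3.99) -> List[int]:
    """Binary keystream from the logistic map x <- mu * x * (1 - x), thresholded at 0.5.

    x0 (in (0, 1)) and mu (near 4, the chaotic regime) act as the secret key.
    The default values are hypothetical placeholders.
    """
    bits, x = [], x0
    for _ in range(length):
        x = mu * x * (1 - x)
        bits.append(1 if x >= 0.5 else 0)
    return bits


def xor_with_key(watermark: List[int], x0: float = 0.37, mu: float = 3.99) -> List[int]:
    """XOR a binary watermark with the keystream; applying it twice restores the input."""
    key = logistic_keystream(len(watermark), x0, mu)
    return [w ^ k for w, k in zip(watermark, key)]


if __name__ == "__main__":
    watermark = [1, 0, 1, 1, 0, 0, 1, 0]
    encrypted = xor_with_key(watermark)
    print(xor_with_key(encrypted) == watermark)  # True
```

Because XOR is its own inverse, the same key decrypts the registered watermark, so generation and extraction stay symmetric as long as both sides use the same `(x0, mu)`.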
2.1. Neighboring Features
The neighboring relationships between polygons can be divided into three types: (1) overlapping neighbors, polygons whose areas overlap wholly or partly; (2) edge neighbors, polygons that share or touch along a boundary; and (3) node neighbors, polygons that touch at a point (https://desktop.arcgis.com/en/arcmap/latest/tools/analysis-toolbox/how-polygonneighbors-analysis-works.htm). Figure 2 illustrates these three types of neighboring relationships using polygon A and polygon B. In the proposed method, two features are regarded as neighbors if they have any of these three relationships.
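Treating all three relationships as "neighbors" is equivalent to asking whether the two closed polygons share at least one point. The Python sketch below implements that test from scratch, assuming simple polygons given as exterior rings without holes and exact coordinate comparisons; the function names are hypothetical. In practice, a GIS predicate such as Shapely's `intersects` or the ArcGIS Polygon Neighbors tool would normally be used instead.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Ring = List[Point]  # simple polygon exterior ring, first vertex not repeated at the end


def _orient(p: Point, q: Point, r: Point) -> float:
    """Cross product of (q - p) and (r - p): >0 left turn, <0 right turn, 0 collinear."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def _within_box(p: Point, q: Point, r: Point) -> bool:
    """For r collinear with p-q, True if r lies on the closed segment p-q."""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_touch(a: Point, b: Point, c: Point, d: Point) -> bool:
    """True if closed segments a-b and c-d share at least one point."""
    o1, o2 = _orient(a, b, c), _orient(a, b, d)
    o3, o4 = _orient(c, d, a), _orient(c, d, b)
    if o1 * o2 < 0 and o3 * o4 < 0:
        return True  # proper crossing
    # Touching or collinear overlap: some endpoint lies on the other segment.
    return ((o1 == 0 and _within_box(a, b, c)) or (o2 == 0 and _within_box(a, b, d))
            or (o3 == 0 and _within_box(c, d, a)) or (o4 == 0 and _within_box(c, d, b)))


def _strictly_inside(pt: Point, ring: Ring) -> bool:
    """Even-odd ray casting; only called when pt is known not to lie on the ring."""
    x, y = pt
    inside = False
    for i in range(len(ring)):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % len(ring)]
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def are_neighbors(p: Ring, q: Ring) -> bool:
    """Overlapping, edge, or node neighbors: the closed polygons share any point."""
    edges_p = [(p[i], p[(i + 1) % len(p)]) for i in range(len(p))]
    edges_q = [(q[i], q[(i + 1) % len(q)]) for i in range(len(q))]
    if any(segments_touch(a, b, c, d) for a, b in edges_p for c, d in edges_q):
        return True  # boundaries meet: edge or node neighbors, or partial overlap
    # Boundaries are disjoint: neighbors only if one polygon lies inside the other.
    return _strictly_inside(p[0], q) or _strictly_inside(q[0], p)


def square(x: float, y: float, s: float = 1.0) -> Ring:
    """Hypothetical helper: axis-aligned square with lower-left corner (x, y)."""
    return [(x, y), (x + s, y), (x + s, y + s), (x, y + s)]


if __name__ == "__main__":
    a = square(0, 0)
    print(are_neighbors(a, square(1, 0)))           # True  (edge neighbors)
    print(are_neighbors(a, square(1, 1)))           # True  (node neighbors)
    print(are_neighbors(a, square(0.5, 0.5)))       # True  (partial overlap)
    print(are_neighbors(a, square(0.25, 0.25, 0.5)))  # True  (full overlap)
    print(are_neighbors(a, square(3, 3)))           # False (disjoint)
```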
Figure 3 demonstrates the statistics of neighboring features based on these three neighboring relationships. Figure 3a shows the NFNF values, where the number in each polygon is its NFNF. Similarly, Figure 3b shows the NSNF values. The NSNF of a polygon is the sum of the NFNFs of all its neighbors and is calculated by the following equation:

$$\mathrm{NSNF} = \sum_{i=1}^{n} n_i \tag{1}$$

where $n$ is the NFNF of the polygon and $n_i$ is the NFNF of its $i$-th neighbor.
Take polygon A in the upper left corner of Figure 3 as an example. Figure 3a shows that polygon A has two neighbors, B and C, so its NFNF is 2. Polygons B and C have five and four neighbors, respectively, so the NSNF of polygon A is 5 + 4 = 9, as shown in Figure 3b. Because every neighbor of a polygon counts that polygon among its own neighbors, each term of the sum is at least 1, so the NSNF of a polygon is never smaller than its NFNF and is usually considerably larger. Since NSNF therefore takes a wider range of values, using it to derive the watermark index can make the distribution of watermark indices more uniform.
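The counting step can be sketched in a few lines of Python once neighbor pairs are known (for example, from a polygon-neighbors analysis). The adjacency below is a hypothetical graph chosen only so that polygon A reproduces the numbers of the worked example (two neighbors, B with five neighbors, C with four); it is not the actual data of Figure 3, and the function name is illustrative.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def count_neighbors(features: List[Hashable],
                    pairs: Iterable[Tuple[Hashable, Hashable]]
                    ) -> Tuple[Dict[Hashable, int], Dict[Hashable, int]]:
    """Return (NFNF, NSNF) for every feature given its unordered neighbor pairs."""
    adj: Dict[Hashable, Set[Hashable]] = {f: set() for f in features}
    for a, b in pairs:
        if a != b:  # a feature is never its own neighbor
            adj[a].add(b)
            adj[b].add(a)
    nfnf = {f: len(adj[f]) for f in features}                   # number of neighbors
    nsnf = {f: sum(nfnf[g] for g in adj[f]) for f in features}  # sum of neighbors' NFNFs
    return nfnf, nsnf


if __name__ == "__main__":
    features = ["A", "B", "C", "D", "E", "F", "G"]
    pairs = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("B", "E"),
             ("B", "F"), ("C", "D"), ("C", "G")]
    nfnf, nsnf = count_neighbors(features, pairs)
    print(nfnf["A"], nsnf["A"])  # 2 9
```

Features with no neighbors get NFNF = NSNF = 0, which is why they are initialized explicitly rather than discovered from the pair list.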
2.2. The Determination of the Watermark Bit
In this paper, the watermark is a binary sequence of fixed length [27], so every watermark bit is either zero or one. The watermark bit of a polygon is determined by quantizing its NFNF, where quantization maps input values from a large set to output values in a smaller set. The NFNF is quantized by the following equation:

$$w = \mathrm{Mod}(\mathrm{NFNF}, 2) \tag{2}$$

where $w$ is the watermark bit of the polygon and Mod is the modulo operation.
2.3. The Determination of the Watermark Index
Similar to the watermark bit, the watermark index is obtained by quantizing the NSNF. Let the length of the watermark be N; the watermark index of a polygon then ranges from 1 to N and is calculated by the following equation:

$$k = \mathrm{Rand}(\mathrm{NSNF}, N) \tag{3}$$

where $k$ is the watermark index of the polygon and Rand is a random number generator that produces uniformly distributed pseudorandom integers between 1 and N. The first parameter of Rand is the seed (start value) of the random number generator.
The watermark index of a polygon gives the position of its watermark bit in the final zero watermark. Multiple polygons may have the same watermark index; in that case, the majority voting mechanism [12] is employed to determine the watermark bit at that index. For a given watermark index, if the number of watermark bits equal to 1 is larger than the number equal to 0, the bit is recorded as 1; otherwise, it is recorded as 0.
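The quantization and voting steps of Sections 2.2 and 2.3 can be sketched as follows. This sketch assumes that the bit is the parity Mod(NFNF, 2) and substitutes Python's seeded `random.Random` for the paper's Rand; any deterministic seeded generator works, provided generation and extraction use the same one, but the indices it produces will differ from those of another generator. The function names are hypothetical.

```python
import random
from typing import List, Sequence


def watermark_bit(nfnf: int) -> int:
    """Quantize NFNF to one bit (assumed form: Mod(NFNF, 2))."""
    return nfnf % 2


def watermark_index(nsnf: int, n: int) -> int:
    """Map NSNF to an index in 1..n, using NSNF as the PRNG seed (stand-in for Rand)."""
    return random.Random(nsnf).randint(1, n)


def majority_vote(bits: Sequence[int], indices: Sequence[int], n: int) -> List[int]:
    """Combine per-polygon bits into an n-bit watermark by majority voting.

    Position j becomes 1 only if strictly more polygons vote 1 than 0 there;
    ties and positions that receive no votes stay 0.
    """
    tally = [0] * (n + 1)  # tally[j] = (#ones - #zeros) at 1-based index j
    for b, k in zip(bits, indices):
        tally[k] += 1 if b == 1 else -1
    return [1 if tally[j] > 0 else 0 for j in range(1, n + 1)]


if __name__ == "__main__":
    print(watermark_bit(2))  # 0  (polygon A in Figure 3 has NFNF = 2)
    print(majority_vote([1, 1, 0, 0, 1, 0], [1, 1, 1, 2, 2, 3], 4))  # [1, 0, 0, 0]
```

In the usage example, index 1 receives two 1s and one 0, index 2 is a tie, index 3 receives a single 0, and index 4 receives no votes, so only position 1 is set.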
2.4. Watermark Generation and Extraction
In the proposed method, watermark generation and watermark extraction are exactly the same process but happen at different times. Watermark generation occurs in the initial stage of the algorithm to produce the watermark. Watermark extraction is executed when copyright infringement needs to be confirmed: it generates a watermark from the suspicious data and identifies it by comparing it with the watermark registered in the IPR repository.
Denote the polygon data as $P = \{P_i \mid i = 1, 2, \ldots, M\}$, where $P_i$ is the polygon in the $i$-th storage position and $M$ is the total number of polygons. First, count the NFNF and NSNF of each $P_i$. Then, obtain the watermark bit and the watermark index of $P_i$ according to Equations (2) and (3), which yields the set of watermark bits and the set of watermark indices. Next, introduce a set whose elements are all 0 and reassign its values by Equation (4). Finally, construct the zero watermark by Equation (5).
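Putting the steps together, the sketch below runs the whole construction from a list of features and their neighbor pairs: count NFNF and NSNF, quantize them, start from an all-zero tally, and apply majority voting. It rests on the same assumptions as above (parity for the bit, Python's seeded `random.Random` in place of Rand), and the hypothetical adjacency is not the paper's experimental data. Because generation and extraction are the same procedure, the demo "extracts" from a copy that stores the polygons in a different order but has the same topology.

```python
import random
from typing import Hashable, Iterable, List, Sequence, Tuple


def zero_watermark(features: Sequence[Hashable],
                   neighbor_pairs: Iterable[Tuple[Hashable, Hashable]],
                   n: int) -> List[int]:
    """Build an n-bit zero watermark from polygon adjacency (sketch of Section 2)."""
    # Adjacency from unordered neighbor pairs (overlap, shared edge, or shared node).
    adj = {f: set() for f in features}
    for a, b in neighbor_pairs:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    nfnf = {f: len(adj[f]) for f in features}
    nsnf = {f: sum(nfnf[g] for g in adj[f]) for f in features}

    # All-zero tally; tally[j] holds (#bit-1 votes - #bit-0 votes) for index j (1-based).
    tally = [0] * (n + 1)
    for f in features:
        bit = nfnf[f] % 2                         # assumed parity quantization of NFNF
        k = random.Random(nsnf[f]).randint(1, n)  # assumed seeded stand-in for Rand
        tally[k] += 1 if bit == 1 else -1
    # Majority vote: 1 only where ones strictly outnumber zeros.
    return [1 if tally[j] > 0 else 0 for j in range(1, n + 1)]


if __name__ == "__main__":
    feats = ["A", "B", "C", "D", "E", "F", "G"]
    pairs = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("B", "E"),
             ("B", "F"), ("C", "D"), ("C", "G")]
    registered = zero_watermark(feats, pairs, 16)  # generation
    # Extraction: same procedure on a suspicious copy with reversed storage order.
    extracted = zero_watermark(feats[::-1], [(b, a) for a, b in pairs], 16)
    print(registered == extracted)  # True
```

The result does not depend on storage order because the tally is a sum over features, which is one reason topology-based statistics keep the watermark synchronized.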
5. Discussions
The experimental results above suggest that the proposed method is highly robust to various attacks, especially attacks that cause significant distortion, such as non-uniform scaling, simplification, and projection transformation attacks. To better understand the characteristics of the proposed method, we discuss it further from three perspectives.
5.1. Local Characteristics
NFNF and NSNF are the foundation on which the proposed method constructs the zero watermark. The former, based on neighbors, determines the watermark bit; the latter, based on neighbors of neighbors, determines the watermark index. Both therefore reflect local characteristics of the vector geographic data, which is why the proposed method can resist object attacks. When a feature is added to or deleted from the data, only the local region around it is affected, while the remaining parts stay intact. The watermark information in these unattacked parts is therefore preserved and can be detected successfully.
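The locality argument can be made concrete: deleting a feature changes the NFNF only of its direct neighbors, and the NSNF only of features within two hops of it. The Python sketch below checks this on a hypothetical strip of eight polygons, each adjacent only to the next; the data and function names are illustrative, not taken from the experiments.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Pair = Tuple[Hashable, Hashable]


def neighbor_counts(features: List[Hashable], pairs: Iterable[Pair]
                    ) -> Tuple[Dict[Hashable, int], Dict[Hashable, int]]:
    """Return (NFNF, NSNF) dictionaries for the given features and neighbor pairs."""
    adj: Dict[Hashable, Set[Hashable]] = {f: set() for f in features}
    for a, b in pairs:
        adj[a].add(b)
        adj[b].add(a)
    nfnf = {f: len(adj[f]) for f in features}
    nsnf = {f: sum(nfnf[g] for g in adj[f]) for f in features}
    return nfnf, nsnf


def changed_after_deletion(features: List[Hashable], pairs: List[Pair],
                           removed: Hashable) -> Set[Hashable]:
    """Features whose NFNF or NSNF (hence possibly bit or index) change when one is deleted."""
    nfnf0, nsnf0 = neighbor_counts(features, pairs)
    kept = [f for f in features if f != removed]
    kept_pairs = [(a, b) for a, b in pairs if removed not in (a, b)]
    nfnf1, nsnf1 = neighbor_counts(kept, kept_pairs)
    return {f for f in kept if nfnf0[f] != nfnf1[f] or nsnf0[f] != nsnf1[f]}


if __name__ == "__main__":
    chain = list(range(1, 9))                        # polygons 1..8 in a strip
    chain_pairs = [(i, i + 1) for i in range(1, 8)]  # each touches the next one
    print(changed_after_deletion(chain, chain_pairs, removed=1))  # {2, 3}
    print(changed_after_deletion(chain, chain_pairs, removed=4))  # {2, 3, 5, 6}
```

Features 4 to 8 keep their values when polygon 1 is deleted, so the watermark bits they contribute survive the object attack.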
Meanwhile, NFNF and NSNF are statistics of features, not vertices. Compared with vertex-based characteristics, they are less local, and this coarser granularity is exactly what is needed. This is the core reason why the proposed method can resist vertex attacks: the interpolation and simplification of vertices do not affect the neighboring relationships between features in vector geographic data. This is verified by Figure 13 and Figure 14, in which the NC value of the proposed method remains 1.00 as vertices are added and deleted. Thus, the two local characteristics, NFNF and NSNF, enhance the robustness of the proposed algorithm.
5.2. Applied to Polyline Data
The algorithm in this paper is designed for polygon data because it is based on the number of neighboring polygons. However, with a few simple modifications, the method can also be applied to polyline data. The key idea is to replace adjacency with intersection as the topological relationship used when counting neighboring features in polyline data. The main procedure of the modified algorithm is as follows: (1) analogous to NFNF and NSNF, the number of first-order intersecting features (denoted by NFIF) and the number of second-order intersecting features (denoted by NSIF) are counted for every polyline feature; (2) NFIF is quantized into the watermark bit, and NSIF is quantized into the watermark index; (3) the zero watermark is constructed by combining the watermark bits and the watermark indices according to the majority voting mechanism.
Figure 18 demonstrates NFIF and NSIF for a sample polyline dataset containing twelve polyline features rendered in different colors. Two features that have one or more points in common are regarded as intersecting features. Figure 18a shows the NFIF values, and Figure 18b shows the NSIF values; the label near each feature gives its NFIF or NSIF value. For example, polyline A in the upper left corner has an NFIF of 1 and an NSIF of 3.
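For polylines, the only change is the neighbor test: two polylines intersect if any pair of their segments shares a point. The Python sketch below computes NFIF and NSIF this way; the four hypothetical polylines are arranged so that polyline A touches only B, and B intersects A, C, and D, which reproduces the values of polyline A in the example (NFIF 1, NSIF 3) without being the data of Figure 18. The function names are illustrative.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Polyline = List[Point]


def _orient(p: Point, q: Point, r: Point) -> float:
    """Cross product of (q - p) and (r - p): >0 left turn, <0 right turn, 0 collinear."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def _within_box(p: Point, q: Point, r: Point) -> bool:
    """For r collinear with p-q, True if r lies on the closed segment p-q."""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_touch(a: Point, b: Point, c: Point, d: Point) -> bool:
    """True if closed segments a-b and c-d share at least one point."""
    o1, o2 = _orient(a, b, c), _orient(a, b, d)
    o3, o4 = _orient(c, d, a), _orient(c, d, b)
    if o1 * o2 < 0 and o3 * o4 < 0:
        return True
    return ((o1 == 0 and _within_box(a, b, c)) or (o2 == 0 and _within_box(a, b, d))
            or (o3 == 0 and _within_box(c, d, a)) or (o4 == 0 and _within_box(c, d, b)))


def polylines_intersect(u: Polyline, v: Polyline) -> bool:
    """True if the two polylines have one or more points in common."""
    return any(segments_touch(u[i], u[i + 1], v[j], v[j + 1])
               for i in range(len(u) - 1) for j in range(len(v) - 1))


def nfif_nsif(lines: Dict[str, Polyline]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """First- and second-order intersecting-feature counts for every polyline."""
    names = list(lines)
    adj = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if polylines_intersect(lines[a], lines[b]):
                adj[a].add(b)
                adj[b].add(a)
    nfif = {name: len(adj[name]) for name in names}
    nsif = {name: sum(nfif[m] for m in adj[name]) for name in names}
    return nfif, nsif


if __name__ == "__main__":
    lines = {"A": [(0, 0), (2, 0)], "B": [(2, 0), (2, 4)],
             "C": [(1, 1), (3, 1)], "D": [(1, 3), (3, 3)]}
    nfif, nsif = nfif_nsif(lines)
    print(nfif["A"], nsif["A"])  # 1 3
```

The resulting counts can then be fed to the same bit quantization, index mapping, and majority voting used for polygons.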
Furthermore, unlike polyline and polygon data, point data have no comparable intersecting or neighboring relationships between features. Therefore, the proposed method is not suitable for point data. Converting points into polylines or polygons, for example by constructing a Voronoi diagram from the points, appears to be a feasible way to make the method applicable. This will be investigated further in our future work.
5.3. The Watermark Uniqueness
Watermark uniqueness is one of the most important criteria in zero watermarking. It is determined by the characteristics of the data used to construct the watermark: a good zero watermarking algorithm must ensure that watermarks constructed from different data differ substantially. To verify the watermark uniqueness of the proposed method, we selected six test datasets, all administrative division maps, denoted Data 1–6, as shown in Figure 19. We first constructed watermarks from the six test datasets using the proposed method and then calculated the NC value between each of these six watermarks and the watermark generated from the experimental data in Section 3.1. NC values below the threshold indicate that the method has good watermark uniqueness; otherwise, the method does not meet this requirement. Because the watermark length may affect watermark uniqueness, we repeated the experiment with watermark lengths of 32, 64, 128, and 256, producing four groups of results with six NC values each, as shown in Figure 20.
Figure 20 shows that all NC values are below the threshold for the six test datasets under all four watermark lengths, with a rough trend that the longer the watermark, the lower the NC value. Specifically, the maximum NC value is 0.71, and most values are concentrated between 0.55 and 0.65, all below the threshold of 0.75. This demonstrates that the proposed method meets the requirement of watermark uniqueness.
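The uniqueness test reduces to computing NC between two binary watermarks and comparing it with the threshold of 0.75. This section does not restate the NC formula, so the Python sketch below assumes one definition that is common for binary watermarks: the fraction of positions at which the two watermarks agree (an XNOR-based NC). The paper's own definition may differ, and the function names are hypothetical.

```python
from typing import Sequence


def normalized_correlation(w1: Sequence[int], w2: Sequence[int]) -> float:
    """Fraction of positions where two equal-length binary watermarks agree.

    Assumed XNOR-based NC: 1.0 for identical watermarks, about 0.5 for
    unrelated random ones.
    """
    if len(w1) != len(w2) or len(w1) == 0:
        raise ValueError("watermarks must be non-empty and of equal length")
    return sum(1 for a, b in zip(w1, w2) if a == b) / len(w1)


def passes_uniqueness(registered: Sequence[int], other: Sequence[int],
                      threshold: float = 0.75) -> bool:
    """Different data should yield an NC below the threshold."""
    return normalized_correlation(registered, other) < threshold


if __name__ == "__main__":
    registered = [1, 0, 1, 1]
    print(normalized_correlation(registered, [1, 0, 1, 1]))  # 1.0
    print(normalized_correlation(registered, [0, 0, 1, 0]))  # 0.5
    print(passes_uniqueness(registered, [0, 0, 1, 0]))       # True
```

Under this assumed definition, unrelated watermarks tend toward an NC of about 0.5 as the length grows, which is consistent with the downward trend observed for longer watermarks.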
6. Conclusions
In zero watermarking for vector geographic data, resisting attacks that significantly distort the data is a challenging problem. The key is to find characteristics that are not affected by data deformation. In this paper, two local characteristics, NFNF and NSNF, are introduced and quantized into the watermark bit and the watermark index, respectively. In particular, NSNF, the number of second-order neighboring features, is introduced into watermarking for vector geographic data for the first time. Further, NFNF and NSNF make full use of the topological and statistical characteristics of the data, which form the foundation of the proposed method. Experimental results show that, compared with other algorithms, this method is robust and can fully resist attacks with significant distortion, such as non-uniform scaling, simplification, and projection transformation attacks. The method is a new exploration toward improving the robustness of zero watermarking for vector geographic data. Moreover, the combination of topological and statistical characteristics may offer ideas for future watermarking research. However, this method is not suitable for point data. Exploring methods for converting point data into polyline or polygon data will be the focus of our future research.