A Watermark-Based In-Situ Access Control Model for Image Big Data

When large images are used for big data analysis, they impose new challenges in protecting image privacy. For example, a geographic image may consist of several sensitive areas or layers. When it is uploaded to servers, the image will be accessed by diverse subjects. Traditional access control methods regulate access privileges for a single image as a whole, and their access control strategies are stored on servers, which leads to two shortcomings: (1) fine-grained access control is not guaranteed for areas/layers in a single image that need to remain secret from different roles; and (2) access control policies that are stored on servers suffer from multiple attacks (e.g., transferring attacks). In this paper, we propose a novel watermark-based access control model in which access control policies are associated with the objects being accessed (called an in-situ model). The proposed model integrates access control policies as watermarks within images, without relying on the availability of servers or network connections. Access control over images is still maintained even when images are redistributed to further subjects. Therefore, access control policies can be delivered together with the big data of images. Moreover, we propose a hierarchical key-role-area model for fine-grained encryption, especially for large images such as geographic maps. Extensive analysis justifies the security and performance of the proposed model.


Introduction
The development of deep learning enables the analysis of massive amounts of image data. During such analysis, how to process image data while protecting images from leakage and exposure is a major challenge. Traditional access control policies may become invalid when images are stored again on different servers. For example, 4G and upcoming 5G technologies enable smartphone users to share their images easily. When an image is uploaded to a service provider (e.g., Facebook), a user can set access privileges so that the image can be accessed only by "friends" or by the public. However, when the accumulated images are redistributed to other parties for further analysis (e.g., the Facebook-Cambridge Analytica scandal [1]), the access control policies that were stored on the original servers are lost. Thus, desirable protection solutions should integrate access control policies with the image itself. In other words, even if the image is redistributed, the access control policies remain attached to it (in situ). In addition, a simple "yes" or "no" access control decision on an entire image does not work well. For example, when a photo is taken with a smartphone, additional information such as location (latitude/longitude), map, date, and time is also included, and different privileges should be attached to that information. Geographic or remote sensing images are another example: a geographic image may consist of several areas/layers. Thus, differentiating access control for various areas/layers requires fine-grained and flexible access control policies.
Recently, centrally regulated access control models (e.g., [2,3]) have been intensively studied. However, they are not suitable for image data sharing and redistribution for the following reasons. First, distributed data can be accessed in only two modes: "yes" for all or "no" for all. Data that cannot be accessed publicly cannot be distributed, and once data is distributed, it can be accessed by all accessors. Second, for data that must be under access control (classified data), control policies are difficult to define and change, especially when the data volume is large. For example, if different areas in a single image have different access policies, we must set up separate regulations on the central control servers. Third, classified data can be accessed only when remote policy conformance servers are available. The accessibility of the data thus relies on the availability of networks and on the workload of central control servers, which limits the convenience of remote access. Fourth, regulating access to a large volume of data causes a large delay. Each time accessors request images, they must first fetch access control policies from servers; in big data scenarios, serving these requests places a heavy burden on servers and increases access delay. Finally, once data is distributed, its control domain changes, so the original management authority may no longer be able to control the data.
Therefore, with the development of big data sharing and redistribution, traditional access control models based on central conformance should be improved to cater to the new requirements.
In this paper, we design a novel access control model in which access control is conducted by specific clients and access policies are carried together with the access objects themselves. Our proposed access control model has the following advantages. First, access control policies are attached to image data: regardless of how many times the data are redistributed, the access control policies remain incorporated with the data. Second, access control is fine-grained. For large images (e.g., geographic or remote sensing images), control strategies must be specific to different partial areas instead of the entire image; in other words, different parts of one image must conform to different access privileges. Third, accessing classified data does not rely on remote servers or available network connections. The control flow is lighter because regulation is enforced at clients (we also call this in-situ control).
Based on the above observations and analysis, we propose a new access control model for big image data sharing and redistribution. The major contributions of this paper are as follows:
1. We propose a watermark-based access control model that allows the objects being accessed to be integrated with their access control strategies.
2. We propose a hierarchical key-role-area access control model for large images such as geographic maps and remote sensing images. We also propose a hierarchical key generation method that guarantees fine-grained access privileges.
The rest of the paper is organized as follows: Section 2 surveys related work. Section 3 formulates the research problems and challenges. Section 4 elaborates on the proposed models. Extensive analysis of the proposed scheme is presented in Section 5, and we conclude the paper in Section 6.

Related Work
The topic of watermarks has been explored for decades. With powerful software and personal computers, unauthorized copying and distribution of digital content, such as e-books, videos, and digital images, has become widespread. To address this problem, watermarks are usually used to verify and protect copyrights [4][5][6]. In these methods, both fragile and robust watermarks serve as a legal label rather than as a control technique. Additionally, many methods have been proposed to detect the modification of images [7,8], but they are unable to identify the modifier or prevent such modifications.
In recent years, several watermark schemes have been put forward for access control. Watermarks for hierarchical access control and for protecting the content of visual medical information were proposed in [9]; however, the original images are not encrypted in this scheme. A removable, visible watermarking scheme combining block truncation coding and a chaotic map is proposed in [10], which can be applied to copyright notification and access control in mobile communication. Its authors proposed two-stage watermarks that blur the original images until visitors pass access control, so that only authorized visitors obtain clear images; however, it is not a hierarchical access control scheme. A. Phadikar proposed a data hiding scheme for access control and error concealment in digital images [11], as well as a data hiding method that integrates access control and authentication in a single platform, especially for cover images [12]. In these schemes, encrypted digital images are displayed in lower quality before the watermarks are read. To summarize, the schemes above display images in lower-quality formats before visitors obtain permissions, and the access control strategies themselves are still not coded in the watermarks.
Quality access control is also used with audio watermarks. K. Datta et al. proposed a combination of encryption and audio watermarking for the safe distribution of audio content over public networks, whereby only authorized users can access the high-quality content, while other users can access only low-quality content [13]. Watermarks can also be used in video files to identify pirates; they can be extracted at the decoder and used to determine whether the video content is watermarked [14]. We stress that our proposed scheme for integrating access control policies as watermarks can also be applied to audio or video files, although we concentrate on images in this paper.
Geologic mapping and the design of geologic (thematic) maps are currently supported by Geographic Information Systems (GIS). To achieve high efficiency and to allow the exchange of a common structured framework, agencies and individuals have designed map data models to support their mapping processes. File-based geo-databases are much more accessible, but they still suffer from a number of administrative limitations [15]. A new access control mechanism that combines trust and role-based access control models is presented in [16]. J. Kim proposed a multi-layer access control model for GIS mobile web services [17]. The objective of such spatially aware access control models is to regulate access to protected objects based on position information. M. Kirkpatrick proposed role-based access control with spatial constraints [18]. F. Ma et al. proposed a fine-grained access control model for spatial data in a grid environment based on role-based access control [19]. Furthermore, a multi-granularity spatial access control model was proposed, which considers more types of policy rule conflicts than single-granularity models [20]; the model can enforce strong and efficient access control in large-scale environments. However, none of these access control strategies are encoded into watermarks, and access control still relies on servers.
In recent years, Quick Response (QR) codes have become popular due to their efficiency and security. They are widely used on mobile phones (e.g., in instant messaging, user login, and mobile payment applications). QR codes can not only store large amounts of information but also provide error correction [21]. In addition, QR codes have a high recognition rate, and many algorithm libraries are available for them [22]. For these reasons, we chose the QR code as a case study for our model.

System Model
Figure 1 depicts the traditional access control model, which includes four entities: servers, accessors, images, and an access control unit. The access control unit is located with the servers. The traditional access control process includes four steps: (1) accessors request to fetch some data (e.g., images) from servers; (2) servers query the access control unit for access control strategies to determine the corresponding accessible objects (e.g., images); (3) the access control unit regulates access privileges and determines the accessible objects accordingly; (4) servers return the accessible objects to accessors according to their designated privileges. In big data publication scenarios, we move the access control unit to clients so as to provide persistent control. We change the access control process as follows: (1) servers incorporate access control strategies into images as watermarks; (2) accessors request to fetch some data (e.g., images), and servers publish image big data to accessors; (3) the access control unit in the client parses the access control strategies in the watermarks to determine access to objects in images; (4) the access control unit regulates access privileges and returns accessible objects to accessors. Note that the method of embedding access control policies is independent of the above architecture: watermarks or other associated tags are workable as long as they can reveal access control policies. In most cases, invisible watermarks may be preferred.
Access control policies are embedded with big data, and thus the access control unit is moved to clients for persistent control, regardless of how many times the data are re-distributed.Additionally, access control can be accomplished without assuming the availability of servers and networking connections, which also mitigates the workload of servers and shortens the access delay.

Transferring Attack
Existing access control models invite the transferring attack.In a transferring attack, if accessor "A" can access image "P", then accessor "A" can transfer image "P" to others, such as accessor "B".Thus, accessor "B" can easily gain the access privileges of accessor "A".
To tackle this attack, we propose a watermark-based access control model in which access policies are embedded with objects and the access control unit is moved from servers to clients.
Besides, transferring attacks are not accountable. That is, if many accessors can access the same objects, it is impossible to trace a leak back to the original leaking accessor; in other words, the provenance of the leakage is lost. To provide provenance, we can also rely on watermarks that reveal the identity of originators or leakers.
Proposition 1. For persistent access control, access control policies need to be associated with accessible objects, and the objects can be accessed only after the policies are parsed at clients. Additionally, the objects need to return to an inaccessible status after the allotted time of authorized access.
Proof. If objects do not return to an inaccessible status after being accessed, others can also access those objects when they are transferred to them.
If access control policies are not associated with accessible objects, clients will not be able to enforce access policies.
Proposition 2. For the provenance of distributed data, data must carry the identification information of originators.
Proof. If data do not carry any identification information of the originators, the provenance of who distributed the data cannot be determined.
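Proposition 2 only requires that distributed data carry the originator's identity. As a minimal sketch, assuming the watermark can carry a short text payload (how it is embedded, and how robust it is, are out of scope here), a server could issue each copy with a record naming the accessor and bind that record with an HMAC, so a leaked copy's record cannot be silently rewritten. HMAC-SHA256, `SERVER_KEY`, `provenance_payload`, and `trace` are illustrative assumptions, not part of the proposed model.

```python
import hashlib
import hmac
import json

SERVER_KEY = b"hypothetical-server-secret"  # assumed secret known only to the publishing server

def _canonical(record: dict) -> bytes:
    # Canonical JSON so that issuing and tracing hash identical bytes.
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode()

def provenance_payload(accessor_id: str, image_id: str) -> str:
    """Watermark payload recording which accessor this copy of an image was issued to."""
    record = {"accessor": accessor_id, "image": image_id}
    tag = hmac.new(SERVER_KEY, _canonical(record), hashlib.sha256).hexdigest()
    return json.dumps({"record": record, "tag": tag})

def trace(payload: str):
    """Return the accessor named in a leaked copy's payload, or None if it was altered."""
    data = json.loads(payload)
    expected = hmac.new(SERVER_KEY, _canonical(data["record"]), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, data["tag"]):
        return None
    return data["record"]["accessor"]
```

Because only the server holds `SERVER_KEY`, an accessor who edits the record before leaking a copy produces a payload that `trace` rejects.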

Distributed Denial of Service (DDoS) Attack
Traditional access control models rely on the availability of servers and access control units, which can be damaged by distributed denial of service (DDoS) attacks. If servers or access control units cannot be reached, access processes or services will be terminated. It is much easier to keep clients available than servers, so access control that is migrated to clients will be more scalable and durable.

Coarse Access
In traditional access control models, servers are confronted with a large volume of data and access requests, so fine-grained access control is difficult to provide under such workloads. Access control is not fine-grained if it applies to an entire image instead of to a specific area or layer in the image, especially for large images such as geographic maps or remote sensing images. Traditional models may have to provide fine-grained access through extra control, which further increases server overhead.

Physical Copy Attack
In image big data distribution, the most difficult attack to defend against is the physical copy attack, in which images are copied by physical means such as screen capture or photographing the display with an external camera. After accessors gain access to images, those images are fully displayed and out of (access) control. This attack must be tackled, especially if certain areas or layers in images must remain confidential. It cannot be prevented by access control, because access control is a proactive defense applied before the event, whereas this attack happens after authorized access. However, the attack can be traced back by watermark-based schemes for provenance, which is a reactive defense after the event.
Proposition 3. Physical copy attacks cannot be prevented by any access control scheme, but they can be traced back to the source of the image leak, which is called provenance. Provenance can be achieved only through watermarks associated with images.
Proof. As images can be uncovered and viewed by authorized accessors, physical copy attacks such as screen capture and photo shooting are also possible.
The provenance can be achieved by embedding watermarks in images, as watermarks are also carried by images during and after physical copy attack.
Only when some watermarks associated with the identity of originators are embedded with uncovered images can the provenance of originators who exposed the images be accomplished from leaked images.

Design Goals
We list the design goals as follows:
1. Design a novel access control flow that migrates the control unit from servers to clients.
2. Design a watermark-based access control model that provides fine-grained access control for various areas or layers in a single image.
3. Defend against the attacks that traditional access control models are exposed to, and provide a tailored design for big data sharing and redistribution of large images.

Remark 1. Images can be downloaded only from servers that embed access policies into images via watermarks.
Images can be viewed only via particular client tools, such as an image browser that can extract watermarks, parse watermark semantics into policies, and enforce access control policies before viewing. The contents of watermarks can be recognized by the corresponding clients.
Accessors first register their roles on servers, and their roles can be affirmed by client tools before viewing images.
The client tool can transparently decrypt images by asking for the correct keys.After accessors view their corresponding partial areas, those areas are encrypted again by client tools transparently.
If a hard copy of images is obtained by screen capture or photo shooting, watermarks in images can facilitate the trace back to the accessor who was the last authorized viewer.

Basic Settings
We first describe a concrete process to explain our scheme, which consists of three steps:
1. Accessor registration. Accessors register for data access on servers and are assigned one or more roles by the servers.
2. Data publication. Servers that are data publishers or distributors embed access control policies via watermarks in data such as images. The data is then published, with certain areas or layers encrypted by secret keys related to the control policies.
3. Client conformance. Accessors request images via particular client tools, such as image browsers. Client tools ask accessors to present their roles and secret keys, enforce the control policies parsed from the watermarks embedded in the images, and decrypt the corresponding areas or layers using the corresponding secret keys.
Obviously, data publication and client conformance are critical in the design.Next, we propose a hierarchical encryption model as a concrete scheme.

Hierarchical Key-Role-Area Access Control Model
The encryption (and decryption) of various areas in a single image can be conducted using the following hierarchical model, denoted HKRAGraph.

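1. Hierarchical Keys
(a) KEY ::= ⟨l, c⟩, where l ∈ N is a key level and c ∈ N is a key column. Keys are classified into different levels; in other words, a key has two metrics: its key level, denoted l, and its key column, denoted c.
(b) K2L : k ∈ KEY → l ∈ N, where KEY is a set of keys and l is a natural number representing a key level. K2L(·) is an onto function that need not be one-to-one; that is, multiple keys may map to one level. We denote a key k ∈ KEY with level l as k[l, ·]. If multiple keys map to the same level l, we distinguish them as k[l, c], c ∈ N.
(c) K2C : k ∈ KEY → c ∈ N, where c is a natural number representing a key column. K2C(·) is an onto function that need not be one-to-one; that is, multiple keys may map to one column index. We denote a key k ∈ KEY with column index c as k[·, c]. If multiple keys map to the same column c, we distinguish them as k[l, c], l ∈ N.
(d) g : KEY → KEY, with g(k[l, c]) = k[l + 1, c], is a one-way function: it is computationally infeasible to obtain x from g(x), where x ∈ KEY.
(e) k[j, c] can be computed from any k[l, c] with l < j as k[j, c] = g^{j−l}(k[l, c]).
Simply speaking, a key with a larger key level can be derived from any key with a smaller key level in the same key column. If accessors possess a key of a smaller level, they can derive all keys with larger key levels in the same column. Thus, a smaller-level key can decrypt data encrypted under any larger-level key in the same column, but not vice versa.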
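The key hierarchy within one column can be sketched in a few lines of Python. This assumes SHA-256 as the one-way function g(·), which the model does not prescribe, and uses hypothetical helpers `derive` (for k[j, c] = g^{j−l}(k[l, c])) and `build_column`; it illustrates the direction of derivation rather than an implementation from the paper.

```python
import hashlib
import secrets

def g(key: bytes) -> bytes:
    """Assumed one-way step g: k[l, c] -> k[l + 1, c] (SHA-256)."""
    return hashlib.sha256(key).digest()

def derive(key: bytes, from_level: int, to_level: int) -> bytes:
    """Return k[to_level, c] = g^(to_level - from_level)(k[from_level, c])."""
    if to_level < from_level:
        # Moving to a smaller level would require inverting g.
        raise ValueError("keys can only be derived toward larger levels")
    for _ in range(to_level - from_level):
        key = g(key)
    return key

def build_column(root: bytes, depth: int) -> dict[int, bytes]:
    """Keys k[1, c] .. k[depth, c] of one column, starting from a secret root at level 1."""
    column = {1: root}
    for level in range(2, depth + 1):
        column[level] = g(column[level - 1])
    return column

column = build_column(secrets.token_bytes(32), depth=4)
assert derive(column[2], 2, 4) == column[4]  # a level-2 key reaches level 4
```

A holder of k[2, c] recomputes k[3, c] and k[4, c] on demand, whereas obtaining k[1, c] from k[2, c] would require inverting SHA-256.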

2. Hierarchical Roles
(a) ROLE ::= ⟨l, c, u⟩, where l is a key level, c is a key column, and u is an identification that distinguishes multiple roles for the same key. As multiple roles may map to the same key k[l, c], multiple identifications (i.e., values of u) are required to distinguish them.
(b) R2K : r ∈ ROLE → k ∈ KEY, where ROLE is a set of roles and KEY is a set of keys. R2K(·) is an onto function that need not be one-to-one; that is, multiple roles may map to one key. We denote the roles r ∈ ROLE that map to the same key k[l, c] as r[l, c, u], l, c, u ∈ N. Regarding privileges for images, the main one is "read". A role with a smaller (higher) level can access all objects that can be accessed by roles with larger (lower) levels. Each role is mapped to a key.
(c) R2L : r ∈ ROLE → l ∈ N, where l is a natural number representing a key level. Note that ∀r ∈ ROLE, R2L(r) ⇐ K2L(R2K(r)). That is, roles are also hierarchically classified into different levels.
(d) R2C : r ∈ ROLE → c ∈ N, where c is a natural number representing a column number. Note that ∀r ∈ ROLE, R2C(r) ⇐ K2C(R2K(r)). This function returns the key column of a role, which is used to guarantee the derivation relationship between keys.
(e) R2U : r ∈ ROLE → u ∈ N, where u is a natural number that distinguishes roles associated with the same key. Note that ∀r_1, r_2 ∈ ROLE with r_1 ≠ r_2, if R2K(r_1) = R2K(r_2), then R2U(r_1) ≠ R2U(r_2).
The model proposed above is illustrated in Figure 3.
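One possible encoding of the role mappings is sketched below, assuming a key is identified by its ⟨l, c⟩ pair. The functions mirror R2K, K2L, K2C, R2L, R2C, and R2U in lowercase; `dominates` is a hypothetical helper expressing that a role at a smaller (higher) level in the same column can read whatever roles at larger levels can read.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Role:
    level: int   # l: key level (smaller = higher privilege)
    column: int  # c: key column
    uid: int     # u: distinguishes roles that share the same key

def r2k(r: Role) -> tuple[int, int]:
    """R2K: the key k[l, c] of a role, identified here by its (level, column) pair."""
    return (r.level, r.column)

def k2l(k: tuple[int, int]) -> int:
    return k[0]

def k2c(k: tuple[int, int]) -> int:
    return k[1]

def r2l(r: Role) -> int:
    return k2l(r2k(r))  # R2L(r) = K2L(R2K(r))

def r2c(r: Role) -> int:
    return k2c(r2k(r))  # R2C(r) = K2C(R2K(r))

def r2u(r: Role) -> int:
    return r.uid

def dominates(reader: Role, owner: Role) -> bool:
    """True if reader may read owner's objects: same column and a strictly smaller
    level, or the very same role. Roles sharing a key are kept distinct because
    their areas may differ."""
    if r2c(reader) != r2c(owner):
        return False
    return r2l(reader) < r2l(owner) or reader == owner
```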

3. Differentiate Areas by Roles
(a) AREA ::= ⟨l, c, u, i⟩, where l is a key level, c is a column number, u is an identification that distinguishes multiple roles for the same key, and i is an identification that distinguishes multiple areas for the same role. Note that ⋂_{l,c,u,i} a[l, c, u, i] = ∅.
(b) A2R : a ∈ AREA → r ∈ ROLE is a function that need not be one-to-one; that is, multiple areas may be assigned to one role. Whereas r is a tuple of three elements, a is a tuple of four elements.
(c) A2K : a ∈ AREA → k ∈ KEY is a function that need not be one-to-one. Note that ∀a ∈ AREA, A2K(a) ⇐ R2K(A2R(a)).
Remark 2. Note that AREA can also be replaced by LAYER, since a geographic image may contain multiple layers.
An area a ∈ AREA can have any shape (e.g., a circle or a rectangle); the shape is independent of the design in this paper. Details of areas can be embedded in watermarks, e.g., a single point location together with the lengths of two rectangle edges. Areas for different roles can overlap. For different roles with the same R2K, the areas may differ, and the area information for one role may not be available to the other role.
If we relax R2K from a function to an arbitrary mapping, then one role may map to multiple keys.
The proposed access control model is illustrated in Figure 4.
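Areas can be modeled in the same style. The sketch below assumes the rectangular representation mentioned in Remark 2 (a corner point plus two edge lengths); the field names and the `overlaps` helper are hypothetical, and keys and roles are represented as plain tuples.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Area:
    level: int   # l: key level
    column: int  # c: key column
    uid: int     # u: role identifier within the key
    idx: int     # i: distinguishes areas assigned to the same role
    x: int       # hypothetical geometry: top-left corner (x, y) ...
    y: int
    w: int       # ... plus two rectangle edge lengths
    h: int

def a2r(a: Area) -> tuple[int, int, int]:
    """A2R: the role <l, c, u> to which an area is assigned."""
    return (a.level, a.column, a.uid)

def a2k(a: Area) -> tuple[int, int]:
    """A2K(a) = R2K(A2R(a)): drop u to obtain the key <l, c>."""
    l, c, _u = a2r(a)
    return (l, c)

def overlaps(a: Area, b: Area) -> bool:
    """Rectangle intersection test; areas of different roles are allowed to overlap."""
    return a.x < b.x + b.w and b.x < a.x + a.w and a.y < b.y + b.h and b.y < a.y + a.h
```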

Image Publication
Images can be processed before publication as follows:
1. Servers select an image to publish. The corresponding areas (a ∈ AREA) in this image are split according to security concerns and assigned to different roles. Areas are layered into different security levels, such that roles that can access a higher security level (with a smaller key level) are also able to access lower security levels (with larger key levels). Servers formulate access control strategies as ACL ::= ⟨ROLE, AREA⟩, where ∀a ∈ AREA, ∃r = A2R(a) ∈ ROLE.
2. Servers code the access control strategies into watermarks and embed them into the published images. For example, QR codes can be used as watermarks, with the strategies coded into the QR codes.
3. Servers maintain a table for the image, TBL ::= ⟨a ∈ AREA, f(A2K(a))⟩, and encrypt specific areas in the image with the corresponding keys. For example, servers encrypt a with f(A2K(a)), where f(·) is a one-way function. f(A2K(a)) instead of A2K(a) is stored for better confidentiality. A2K(·) is initialized by servers in HKRAGraph.
4. ∀a ∈ AREA in this image, a is encrypted with f(A2K(a)); note that all K2C(A2K(a)) are identical.
5. ∀a_1, a_2 ∈ AREA in an image, A2C(a_1) = A2C(a_2), where A2C(a) = K2C(A2K(a)). Simply speaking, all areas in one image must be encrypted with keys in the same column.
6. ∀a_1, a_2 ∈ AREA in an image, if A2L(a_1) = A2L(a_2), where A2L(a) = K2L(A2K(a)), then A2K(a_1) = A2K(a_2), because A2C(a_1) = A2C(a_2).
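The publication steps can be sketched as a toy under several assumptions: images are 2-D lists of 8-bit grayscale pixels, areas are rectangles, f(·) is SHA-256 with a domain-separation prefix, and the ACL is serialized as compact JSON (the text that would then be coded into a QR watermark; QR generation itself is omitted). Area encryption is an XOR with a SHA-256 counter-mode keystream, which is for illustration only; a real system would use a vetted cipher such as AES-CTR. `publish`, `xor_area`, and `keystream` are hypothetical names.

```python
import hashlib
import json

def f(key: bytes) -> bytes:
    # Assumed f(.): SHA-256 with a domain-separation prefix (the model leaves f unspecified).
    return hashlib.sha256(b"f|" + key).digest()

def keystream(seed: bytes, n: int) -> bytes:
    # Toy SHA-256 counter-mode keystream for illustration only; not a vetted cipher.
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])

def xor_area(img, rect, seed):
    """XOR the 8-bit pixels inside rect = (x, y, w, h) with the keystream for seed."""
    x, y, w, h = rect
    ks = keystream(seed, w * h)
    for r in range(h):
        for col in range(w):
            img[y + r][x + col] ^= ks[r * w + col]

def publish(img, areas, keys):
    """Encrypt areas of img in place; return (ACL payload, TBL).

    areas: {area_id: ((l, c, u), (x, y, w, h))}, i.e. A2R(a) plus the area geometry
    keys:  {(l, c): key bytes} for the single key column used by this image
    """
    if len({role[1] for role, _ in areas.values()}) != 1:
        raise ValueError("all areas in one image must use keys from the same column")
    acl = [{"area": aid, "role": list(role), "rect": list(rect)}
           for aid, (role, rect) in sorted(areas.items())]
    tbl = {}
    for aid, (role, rect) in areas.items():
        l, c, _u = role
        tbl[aid] = f(keys[(l, c)])  # TBL stores f(A2K(a)) rather than A2K(a)
        # Mixing in the area id avoids reusing one keystream for two areas with the same key.
        xor_area(img, rect, tbl[aid] + aid.encode())
    payload = json.dumps(acl, separators=(",", ":"))  # text to be coded into the QR watermark
    return payload, tbl
```

Because XOR masks commute, overlapping areas (allowed by Remark 2) can be decrypted in any order; with a cipher that does not commute in this way, the client would need to undo overlapping encryptions in reverse order.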

Client Conformance
Client conformance for access control can be processed as follows:
1. Accessors request images via a particular client tool (e.g., an image browser).
2. The browser prompts for and obtains a secret key k′ and a role r′ corresponding to the accessor.
3. The browser extracts the QR code and obtains the access control strategies (i.e., ACL ::= ⟨ROLE, AREA⟩). All a ∈ ACL.AREA for r′ ∈ ACL.ROLE are obtained, that is, all a with A2R(a) = r′.
4. The browser computes f(k′) and decrypts all areas a for r′. Note that the key is not stored in the browser; f(k′) is computed temporarily by the browser and destroyed after browsing.
5. Let k[l, c] = k′. For all j > l, the browser calculates k[j, c] ⇐ g^{j−l}(k[l, c]) and decrypts the remaining areas at lower security levels, that is, each a ∈ ACL.AREA with A2K(a) = k[j, c], using f(k[j, c]).
6. The browser displays all decrypted areas a to the accessor.
7. When the accessor closes the browser, the browsed image returns to its original encrypted status.
Remark 3. Servers maintain consistency with client tools for the function f(·) (i.e., they use the same f(·)). As long as this consistency is maintained, f(·) can be updated regularly to provide forward security. Alternatively, an extra pairwise key (e.g., key_p) between servers and client tools can be introduced into f(·) as f(·, ·) (e.g., f(·, key_p)). We stress that client tools do not store accessor keys locally or permanently. Instead, decryption keys for encrypted areas in images are computed temporarily upon browsing.
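The client side can be sketched with the same toy assumptions as a server that uses SHA-256 for f(·) and g(·), an XOR keystream per area, a JSON ACL, and area identifiers mixed into the keystream seed. The hypothetical `view` function decrypts a temporary copy only, so the published image itself stays encrypted, mirroring step 7; the other helper names are likewise illustrative.

```python
import copy
import hashlib
import json

def g(key: bytes) -> bytes:
    # Assumed one-way g(.): k[l, c] -> k[l + 1, c].
    return hashlib.sha256(key).digest()

def f(key: bytes) -> bytes:
    # Assumed f(.), kept identical to the server's (Remark 3).
    return hashlib.sha256(b"f|" + key).digest()

def keystream(seed: bytes, n: int) -> bytes:
    # Toy SHA-256 counter-mode keystream, matching the assumed server side; not a vetted cipher.
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])

def xor_area(img, rect, seed):
    """XOR the 8-bit pixels inside rect = (x, y, w, h) with the keystream for seed."""
    x, y, w, h = rect
    ks = keystream(seed, w * h)
    for r in range(h):
        for col in range(w):
            img[y + r][x + col] ^= ks[r * w + col]

def view(published, payload, role, key):
    """Decrypt, in a temporary copy, every area that role r' = (l, c, u) holding k' may see."""
    l, c, u = role
    shown = copy.deepcopy(published)  # the published image is never modified
    for entry in json.loads(payload):
        al, ac, au = entry["role"]
        if ac != c or al < l:
            continue  # other column, or a higher security level: key not derivable
        if al == l and au != u:
            continue  # same key, but the area belongs to another role at this level
        k = key
        for _ in range(al - l):
            k = g(k)  # k[al, c] = g^(al - l)(k[l, c])
        seed = f(k) + entry["area"].encode()  # computed on the fly, never stored
        xor_area(shown, entry["rect"], seed)
    return shown
```

A role at level 1 sees its own areas and every level-2 area in its column, while a level-2 role sees only its own areas; a peer role sharing the level-1 key is still refused the other role's level-1 areas by the client policy check.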

Case Study
There is a trend toward incorporating multiple maps of one location into a single map as multiple layers. For a clearer explanation, we separate a combined multi-layer map into three individual maps. In this case study, three maps of Shanghai are displayed in Figure 5 [23]: a remote sensing image, a geologic map, and a city planning map. These three maps describe three aspects of the same location. A combined map can present various aspects of one location in one map through multiple layers, which facilitates fast linkage to relevant information within an area. The security levels of roles and the corresponding layers are embedded into the maps as watermarks, so access control strategies can be obtained from distributed maps without consulting servers. Accessors present their roles to a dedicated client tool such as an image browser, and the client tool determines which areas can be accessed by the presented roles.
In one map, accessible areas are encrypted by the corresponding keys (e.g., a_i is encrypted by k_i, i = 1, 2, ..., n). Only someone who presents the correct key (e.g., k_i) can view the corresponding encrypted area (a_i). We also provide hierarchical access through hierarchical encryption of areas: keys at lower security levels can be derived from keys at higher security levels (e.g., k_{i+1} can be derived from k_i). Thus, areas for roles at lower security levels can also be decrypted and viewed by roles at higher security levels. When an accessor requests an image, the image browser prompts the accessor to present their key (e.g., k_i), computes f(k_i), and uses it to decrypt the corresponding areas.
In combinative maps, one area consists of multiple aspects presented in layers.For example, geology, remote sensing, and city planning are three layers of a single city, Shanghai.Some accessors may only be able to access one layer among them.Accessors present their roles and keys to reveal corresponding layers.

Security Analysis
Defending Against Transferring Attack. Images are encrypted by designated keys related to the corresponding roles or accessor identifications, and accessors must present the correct keys to enable client tools to decrypt images for browsing. Encrypted images cannot be decrypted without the keys, even if the images are transferred to others. Moreover, encrypted images can be decrypted and displayed only in client tools, and they return to their original encrypted status after browsing.
The control unit migrates to client tools, and it maintains control even when images are redistributed. The control policies are associated with images as watermarks, which specify what areas can be viewed by given roles. Decryption can occur only upon browsing, and the encrypted areas return to confidential status after the images are browsed in the client tools. That is, the encrypted areas (layers) are transparently decrypted and ephemerally displayed upon browsing.
Defending Against DDoS Attacks. As access control logic is embedded in watermarks together with images, client tools can enforce access policies without consulting servers or relying on network connections. Thus, DDoS attacks on servers and network connections cannot disrupt access control.
Defending Against Coarse Access. Our model can differentiate access privileges for various areas in a single image; similarly, access control for various layers in a single area can be applied iteratively.
Defending Against Physical Copy Attack. As visible watermarks (such as QR codes) or invisible watermarks are incorporated into images, anyone who obtains physical copies of images by screen capture or external camera shooting can be traced back through the watermarks. The roles and identifications can be revealed from the decrypted areas in captured images and the control policies in the watermarks.
Proposition 4. It is computationally hard to compute k_i from k_j if k_j = g(k_i), where g(·) is a one-way function.
Proof. Straightforward. We use a one-way function to derive keys at lower security levels from keys at higher security levels. As the function is one-way, the derivation of keys is also one-way; that is, it is hard to compute x from g(x).

Performance Analysis
Computation Cost. The major computations in the scheme are as follows: encoding and decoding watermarks, encrypting and decrypting areas in images, and computing the one-way function. However,

Figure 1. Existing traditional access control model.

Figure 2. Existing traditional access control model.

Figure 3. Hierarchical key and role model.

Figure 5. A combinative map of Shanghai with multiple layers. The first is a remote sensing image, the second is a geologic map, and the third is a city planning map.