3.1. Preliminary Knowledge and Notations
The CP-ABE method [12] mainly consists of the following steps:
Step 1: System initialization Setup($1^\lambda$). Let G and $G_T$ be two cyclic groups of large prime order q, and let g be a generator of G. Let $e: G \times G \to G_T$ be a bilinear map. The hash function $H: \{0, 1\}^* \to G$ maps a user's identity to an element of G; SHA-1 is used as the hash function. The authorization center A (the access control server or an intra-region sub-policy server) manages a set of attributes L and, for each attribute $i \in L$, randomly chooses $a_i, b_i \in Z_q$, where $Z_q = \{0, 1, 2, \ldots, q-1\}$. The private key of A is $SK = \{a_i, b_i : i \in L\}$, and its public key is $PK = \{e(g, g)^{a_i}, g^{b_i} : i \in L\}$.
Step 2: Key generation and distribution. Node u receives a set of attributes I(u) from the authorization center, together with a private key $sk_{i,u} = g^{a_i} H(u)^{b_i}$ for each $i \in I(u)$. All keys are delivered to the target node through a secure channel, e.g., encrypted using SSH or the target node's public key, so that only the node holding the corresponding private key can decrypt them.
Step 3: Sender encryption Encrypt(M, R, π). The sender determines the access tree, extracts the LSSS (Linear Secret Sharing Scheme) access structure (R, π), and computes the ciphertext from the input message M, the access matrix R, and the mapping π from the rows of R to attributes. First, the sender chooses a random secret $s \in Z_q$ and a random vector $v = (s, y_2, \ldots, y_n) \in Z_q^n$ whose first entry is s, where n is the number of columns of R. The sender then computes $\lambda_x = R_x \cdot v$, where $R_x$ is the x-th row of R. The sender also chooses a random vector $w \in Z_q^n$ whose first entry is 0 and computes $w_x = R_x \cdot w$. Further, for each row $R_x$ of R, the sender chooses a random number $\rho_x \in Z_q$ and then computes the following parameters:
where π(x) maps row $R_x$ to attribute i. Finally, the sender assembles the ciphertext $C = \langle R, \pi, C_0, \{C_{1,x}, C_{2,x}, C_{3,x}\}_{\forall x} \rangle$ and sends it to the receiver together with the access matrix R.
Step 4: Decryption algorithm Decrypt(C, $\{sk_{i,u}\}$). The receiver u takes as input the ciphertext C, its keys $\{sk_{i,u}\}$, and the group G, and outputs the message M. It first obtains the access matrix R and the mapping π from C and then performs the following steps:
First, u computes the set of shared attributes $\{\pi(x) : x \in X\} \cap I(u)$, i.e., the intersection of its assigned attributes and the attributes referenced by the access matrix, where X is the set of rows of R. Then, using the rows that correspond to shared attributes, it checks whether there exists a subset X' of the rows of R whose linear combination equals (1, 0, …, 0). If no such subset exists, decryption is not feasible and Decrypt(C, $\{sk_{i,u}\}$) = NULL. Otherwise, the receiver computes constants $k_x \in Z_q$ satisfying $\sum_{x \in X'} k_x R_x = (1, 0, \ldots, 0)$; $K_x$ denotes the vector of constants $k_x$, $x \in X'$. Finally, the decryption computation proceeds as follows:
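The LSSS part of Steps 3 and 4 can be sketched in Python as below: the sender's shares $\lambda_x = R_x \cdot v$ and $w_x = R_x \cdot w$, and the receiver's search for constants $k_x$ with $\sum_{x \in X'} k_x R_x = (1, 0, \ldots, 0)$ by Gaussian elimination over $Z_q$. As assumptions, the modulus is the toy prime q = 2^61 - 1, the policy "PLC1 AND write" is encoded with R = [[1, 1], [0, -1]] (one common LSSS encoding of an AND gate), and the function names are hypothetical; the pairing-based parameters $C_0, C_{1,x}, C_{2,x}, C_{3,x}$ are not modeled. Because $\lambda_x = R_x \cdot v$, any such $k_x$ give $\sum_x k_x \lambda_x = (1, 0, \ldots, 0) \cdot v = s$, and likewise $\sum_x k_x w_x = 0$.

```python
import secrets

Q = 2**61 - 1  # toy prime group order q (assumption)


def _dot(row, vec):
    """R_x . v computed in Z_q (negative matrix entries are reduced mod Q)."""
    return sum(r * x for r, x in zip(row, vec)) % Q


def lsss_shares(R, s):
    """Sender side: lambda_x = R_x . v and w_x = R_x . w for every row R_x."""
    n = len(R[0])  # number of columns of R
    v = [s] + [secrets.randbelow(Q) for _ in range(n - 1)]  # first entry is s
    w = [0] + [secrets.randbelow(Q) for _ in range(n - 1)]  # first entry is 0
    return [_dot(row, v) for row in R], [_dot(row, w) for row in R]


def reconstruction_constants(rows):
    """Receiver side: k with sum_x k[x] * rows[x] == (1, 0, ..., 0) mod Q, or None."""
    m, n = len(rows), len(rows[0])
    # Column x of the linear system is row R_x; the right-hand side is (1, 0, ..., 0).
    aug = [[rows[x][j] % Q for x in range(m)] + [1 if j == 0 else 0] for j in range(n)]
    pivots, r = [], 0
    for c in range(m):
        p = next((i for i in range(r, n) if aug[i][c]), None)
        if p is None:
            continue  # no pivot: k[c] is a free variable, fixed to 0 below
        aug[r], aug[p] = aug[p], aug[r]
        inv = pow(aug[r][c], -1, Q)  # modular inverse (Python 3.8+)
        aug[r] = [val * inv % Q for val in aug[r]]
        for i in range(n):
            if i != r and aug[i][c]:
                f = aug[i][c]
                aug[i] = [(a - f * b) % Q for a, b in zip(aug[i], aug[r])]
        pivots.append(c)
        r += 1
        if r == n:
            break
    # An all-zero row with a non-zero right-hand side: the attributes are insufficient.
    if any(aug[i][m] for i in range(r, n)):
        return None
    k = [0] * m
    for i, c in enumerate(pivots):
        k[c] = aug[i][m]
    return k


R = [[1, 1], [0, -1]]  # row 0 -> "PLC1", row 1 -> "write"
s = secrets.randbelow(Q)
lambdas, ws = lsss_shares(R, s)
k = reconstruction_constants(R)  # receiver holds both attributes
print(k)                                                    # [1, 1]
print(sum(kx * lx for kx, lx in zip(k, lambdas)) % Q == s)  # True
print(reconstruction_constants(R[:1]))                      # None: only "PLC1" is held
```

Returning None corresponds to the Decrypt = NULL case; free variables are set to 0 because any solution of the linear system suffices.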
Trusted computing [23] is an active immunity method; its basic functions are shown in Figure 4.
Integrity measurement: a variable x (e.g., an environment value, attribute value, or other specific value) has an integrity value f(x). Hash computation h(·) is a commonly used integrity calculation method, and f(x) is stored in a trusted module. The trusted module (TPM) recomputes the integrity value of x at a given frequency (f2 in this article; f2 is an important parameter) to obtain f'(x) and determines whether x has been tampered with by checking whether f(x) = f'(x).
Identity authentication: authentication is implemented using TPM integrity metrics and remote attestation [23]. The TPM reports its AIK (Attestation Identity Key) certificate and AIK-based signatures of the integrity metric values (h(x1), h(x2), …, h(xn)) to neighboring nodes, which verify the certificate and signatures to complete the authentication.
Confidential storage: the TPM stores the integrity measurement values in the PCRs (Platform Configuration Registers) protected by the storage root key (SRK). The TPM is physically tamper-resistant and uses nonvolatile memory to protect the EK (Endorsement Key), the SRK, and the PCRs. It uses a protected storage area rooted at the SRK to protect platform data other than the EK, and it provides a sealed storage function that protects data from unauthorized operations.
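The integrity-measurement function described above, f(x) =? f'(x), can be sketched as follows. This assumes SHA-256 as h(·) and uses an ordinary dictionary as a stand-in for the reference values the TPM would hold; the class and method names and the sample program text are hypothetical. In a deployment, `check` would be invoked f2 times per unit time.

```python
import hashlib


class IntegrityMonitor:
    """Stores reference integrity values f(x) and re-measures x to detect tampering."""

    def __init__(self):
        # Stand-in for reference values kept inside the trusted module.
        self.reference = {}

    @staticmethod
    def measure(x: bytes) -> str:
        # h(.) is assumed to be SHA-256 here.
        return hashlib.sha256(x).hexdigest()

    def enroll(self, name: str, x: bytes) -> None:
        self.reference[name] = self.measure(x)  # f(x)

    def check(self, name: str, x: bytes) -> bool:
        return self.measure(x) == self.reference[name]  # f(x) == f'(x)?


monitor = IntegrityMonitor()
monitor.enroll("mixing_program", b"OPEN V1; WAIT 30; CLOSE V1")
print(monitor.check("mixing_program", b"OPEN V1; WAIT 30; CLOSE V1"))  # True
print(monitor.check("mixing_program", b"OPEN V1; WAIT 30; OPEN V2"))   # False
```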
The notations used in this paper are shown in Table 2.
3.2. Attribute Modeling and Non-Compliance Identification Incorporating Trusted Mechanisms
Attribute modeling is the key to attribute-based access control (including the CP-ABE method); its task is to identify sets of attributes and formulate authorization relationships over them.
3.2.1. Generalized Attributes
Generic attributes generally include identity attributes, operation attributes, environment attributes, etc. The access attribute set is Attr_Set = <ID, Act_Set, Env_set>, where ID denotes the set of node identities, such as {PLC1, PLC2, control-server1, data-server2, …}. Act_Set denotes the set of operations on control instructions, commonly including read, write, and relay, i.e., {read, write, relay}. Env_set denotes the set of environment attributes; common ones include the node's function, function ∈ {monitoring, field control}, and other restrictions.
3.2.2. Attributes of Business Relevance
Taking the oil and gas separation operation of the process control system as an example, the processes of gas heating and pressurization, mixing, reacting, and exhausting are in strict sequence within a cycle Tn.
Each process occupies its own period; e.g., after the “mixing” period is completed, control is handed over to the “reacting” period. Env_denote = {Nh, Nm, Nr} denotes the identifiers of the processes. When the PLC finishes controlling the current process, it sends a handover signal to the control server, which starts or loads the control program for the next business process and at the same time notifies all currently connected nodes to update their identifiers.
Within a single process period, the operating status of the equipment shall be limited to the scope defined by the control program for that process. For example, pressurization (belonging to Nh) is not permitted within the mixing period marked Nm, to avoid a gas explosion.
There are two modes in industrial scenarios (as shown in Figure 1). When a single controller controls a single business process, devices within the data transmission range of that process only need to mark the current business process Nxe ∈ Env_denote. When a single controller controls multiple business processes, the process markers on the devices must be switched forward promptly following the timing relation Nh → Nm → Nr → Nh within the process cycle Tn.
The business process identifier is added to Env_set as one of the environmental constraints. The receiving node uses a monitoring mechanism to recognize whether the current operation falls within the current process and accordingly allows the operation or raises a warning.
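The process-marker switching and the allow/warn check can be sketched as a small state machine. This assumes only the three identifiers listed in Env_denote and a hypothetical rule that an operation is allowed only when it belongs to the current process; the class and method names are illustrative.

```python
from enum import Enum


class Process(Enum):
    NH = "Nh"  # gas heating and pressurization
    NM = "Nm"  # mixing
    NR = "Nr"  # reacting


# Timing relation Nh -> Nm -> Nr -> Nh within the process cycle Tn.
NEXT = {Process.NH: Process.NM, Process.NM: Process.NR, Process.NR: Process.NH}


class ProcessMarker:
    """Current business-process identifier Nxe held by a device."""

    def __init__(self, start: Process = Process.NH):
        self.current = start

    def handover(self) -> Process:
        """Advance the marker when the control server signals a process handover."""
        self.current = NEXT[self.current]
        return self.current

    def check(self, operation_process: Process) -> str:
        """Allow operations belonging to the current process; warn on anything else."""
        return "allow" if operation_process is self.current else "warn"


marker = ProcessMarker()
print(marker.check(Process.NH))  # allow: pressurization during heating/pressurization
marker.handover()
print(marker.check(Process.NH))  # warn: pressurization during mixing (Nm)
```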
Another business correlation is spatial location. The monitoring system is directly connected to the physical site. The closer a node is to the PLC, the greater the threat that operations violating the security policy (especially write operations) pose to physical field devices. Control servers and PLCs must be protected against tampering with critical data that could cause erroneous control (e.g., maliciously closing valves).
The business relevance constraints are all reflected in the environment attributes. We expand the set of environment attributes to Env_set = {function, Nxe, Lx}, where Lx ∈ {1, 0} denotes relevance to the field operation: 1 means that the operation is relevant to the field operation and 0 that it is not. Considering spatial location, data operations sent to the control server and PLC are assigned Lx = 1. The relevance constraint prevents tampering with data at the control server and PLC by formulating a preemptive operation policy, e.g., one that allows authorized nodes to further manipulate the data only if the integrity metrics of the control program and critical instructions pass and the operation behavior conforms to the predefined operation flow of the control code.
3.2.3. Identification of Business Relevance Anomalies
TPM and access control module collaborate to identify anomalies in business-related behaviors.
As mentioned earlier, Lx = 1 indicates that the operation is directly related to the control business. Each process (gas heating and pressurization, mixing, reacting, and exhausting) has a set of control programs, and each set prescribes a strict time sequence within the period, such as opening, closing, increasing, or decreasing valves. The TPM saves the integrity metrics of the control programs and verifies their integrity at a given frequency (f2). An anomaly implies that data on the control servers and PLCs may have been tampered with; the TPM then interrupts the operation process and submits it to the detection server, which analyzes the logical relationships in the code. At the same time, the access control module monitors in real time the execution of each group of time-sequenced control programs within the process. Once it recognizes that critical control instructions violate the business timing, such as reading or writing program segments that do not belong to the current business segment, the module issues an early warning and interrupts the operation process.
Integrity metrics and compliance identification are also reflected in the attribute model. Let e1 denote the TPM's integrity warning for control instructions, e2 the TPM's integrity warning for the static control program, and e3 the TPM's warning for control instructions that violate the predefined operation timing of the program segments within the process period; each of e1, e2, and e3 equals 1 when there is no warning. The overall warning indication is E = e1 · e2 · e3, so E = 1 only when none of the three warnings is raised. When E = 1, access to the data proceeds normally; otherwise, access is blocked. E is also an environment variable, so the supplemented environment attribute set is Env_set_new = {function, Nxe, Lx, E}.
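The warning indication E, the extended set Env_set_new, and a Permit/Deny match against a policy of the form <ID, Act_Set, Env_set> can be sketched as below. This illustrates the attribute semantics only (in the scheme, enforcement happens through CP-ABE decryption); the class and function names and the tuple layout of the policy are hypothetical, and e1, e2, e3 are the binary warning flags defined above (1 = no warning).

```python
from dataclasses import dataclass


@dataclass
class EnvAttributes:
    function: str   # e.g. "field control" or "monitoring"
    nxe: str        # current business-process identifier, e.g. "Nh"
    lx: int         # 1 = relevant to field operation, 0 = not relevant
    e1: int = 1     # no integrity warning on control instructions
    e2: int = 1     # no integrity warning on the static control program
    e3: int = 1     # no timing-violation warning

    @property
    def e(self) -> int:
        """E = e1 * e2 * e3: equals 1 only when no warning is raised."""
        return self.e1 * self.e2 * self.e3

    def env_set_new(self) -> dict:
        """Env_set_new = {function, Nxe, Lx, E}."""
        return {"function": self.function, "Nxe": self.nxe, "Lx": self.lx, "E": self.e}


def evaluate(policy, node_id: str, action: str, env: dict) -> str:
    """policy = (allowed_id, allowed_actions, required_env); Permit only on a full match."""
    allowed_id, allowed_actions, required_env = policy
    if node_id != allowed_id or action not in allowed_actions:
        return "Deny"
    if any(env.get(name) != value for name, value in required_env.items()):
        return "Deny"
    return "Permit"


policy = ("PLC1", {"read", "write"}, {"function": "field control", "Nxe": "Nh", "Lx": 1, "E": 1})
normal = EnvAttributes("field control", "Nh", 1)
tampered = EnvAttributes("field control", "Nh", 1, e2=0)  # static-program integrity warning
print(evaluate(policy, "PLC1", "write", normal.env_set_new()))    # Permit
print(evaluate(policy, "PLC1", "write", tampered.env_set_new()))  # Deny (E = 0)
```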
3.3. Identity Authentication and Data Transfer Incorporating Trusted Mechanisms
Without loss of generality, the transfer of a control command between two nodes is shown in Figure 5. Each node deploys an access control module that implements the access functions and communication interactions. When the data flow initiator is ready to send data, the resident access control module first authenticates the identities of the data initiator and the connected network nodes, and only valid identities are added to the routing list, which prevents data from being sent to revoked nodes. The initiator then formulates an attribute-based access control policy (CP-ABE method) for the data. The access control module of the receiver performs attribute matching, and only a node whose attributes satisfy the policy defined by the sender can obtain the decryption key and decrypt the data. An intermediate node continues to forward the data.
Because the nodes of an industrial control system are stable, authentication need not be performed at every interaction. Authentication is required when the industrial control network is first established, when a new node joins, or when a node removed from the access control list by privilege revocation re-initiates a connection request. To prevent tampering by the operating system, authentication is implemented using TPM integrity metrics and remote attestation [23]. The TPM reports its AIK (Attestation Identity Key) certificate and AIK-based signatures of the integrity metric values (h(x1), h(x2), …, h(xn)) to neighboring nodes, which verify the certificate and signatures to complete the authentication. Data transmission starts after all nodes have passed authentication in this way.
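The report-and-verify exchange can be sketched as below, under simplifying assumptions: HMAC-SHA256 with a key known to the verifier stands in for the AIK signature (a real TPM signs with an asymmetric AIK whose certificate the neighbor checks), and a fresh nonce, which the text does not mention, is added for freshness. All function names and the measured component strings are hypothetical.

```python
import hashlib
import hmac
import secrets


def integrity_metrics(components):
    """(h(x1), ..., h(xn)) over the measured components, with h assumed to be SHA-256."""
    return [hashlib.sha256(c).digest() for c in components]


def report(aik_key: bytes, nonce: bytes, metrics) -> bytes:
    """Stand-in for the AIK-based signature over the nonce and the metric values."""
    return hmac.new(aik_key, nonce + b"".join(metrics), hashlib.sha256).digest()


def verify_neighbor(aik_key, nonce, metrics, signature, expected_metrics) -> bool:
    """The neighbor checks the signature and compares metrics with known-good values."""
    sig_ok = hmac.compare_digest(signature, report(aik_key, nonce, metrics))
    return sig_ok and metrics == expected_metrics


aik_key = secrets.token_bytes(32)
golden = integrity_metrics([b"firmware v1.2", b"control program v3"])
nonce = secrets.token_bytes(16)
sig = report(aik_key, nonce, golden)
print(verify_neighbor(aik_key, nonce, golden, sig, golden))  # True
```

A tampered component changes its hash, so verification fails even when the signature over the altered values is itself valid.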
Data transmission is combined with the Modbus protocol [5]: node a transmits data Md {C, h(data), Tv, (R, π)} to the target node d. C is the ciphertext encapsulated by the CP-ABE algorithm, in which the plaintext is $M = M_K(data)$. $M_K(data)$ denotes the data encapsulated by the Modbus protocol and encrypted with the session key K shared by node a and node d using a symmetric encryption method (e.g., DES or AES). The key update frequency f1 is either fixed or updated on demand; at the maximum frequency, a new key is used for every message.
h(data) is the hash value of the data. $T_v = h(K) \oplus T_{mod}$ denotes the authentication sequence number, where h(K) is the hash value of the session key K and $T_{mod}$ is the checksum value carried by the Modbus protocol. (R, π) is the LSSS access structure formulated by node a. Node d uses its access control privileges to extract the Modbus protocol data and computes the data hash value to verify data integrity. When an integrity verification exception or a synchronization exception occurs, node d verifies Tv. If the verification fails, the data are retransmitted. If the verification fails again, the request is re-initiated and identity verification is performed, and the session key in Tv must be updated synchronously. The same method is used to transfer data from node d to node a.

Note that, considering the high-availability requirement for instruction transmission in industrial control networks, the sender encrypts the transmitted data end to end using a set of shared session keys negotiated in the authentication phase, $M_K(data) = En_K(data)$, where K is the encryption key. The core TPM functions used here are sealed storage and remote attestation. The TPM stores the negotiated key K and its generation parameters, the public key, and the decrypted commands and their integrity metric values in the PCR isolation zone to prevent unauthorized access by the host operating system.
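The construction and checking of Md can be sketched as follows. Assumptions: h(·) is SHA-256, Tmod is the CRC-16 carried by Modbus RTU frames, and the 16-bit checksum is XORed into the low-order bits of the 256-bit h(K); the CP-ABE ciphertext C is treated as opaque bytes, and all function names are hypothetical.

```python
import hashlib


def modbus_crc16(frame: bytes) -> int:
    """Modbus RTU CRC-16 (reflected polynomial 0xA001, initial value 0xFFFF)."""
    crc = 0xFFFF
    for byte in frame:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def make_tv(session_key: bytes, frame: bytes) -> int:
    """Tv = h(K) XOR Tmod."""
    return int.from_bytes(h(session_key), "big") ^ modbus_crc16(frame)


def build_md(ciphertext: bytes, frame: bytes, session_key: bytes, access_structure):
    """Md {C, h(data), Tv, (R, pi)} sent from node a to node d."""
    return {"C": ciphertext, "h_data": h(frame),
            "Tv": make_tv(session_key, frame), "R_pi": access_structure}


def check_md(md, frame: bytes, session_key: bytes):
    """Node d: verify data integrity, then recover Tmod from Tv and compare."""
    data_ok = md["h_data"] == h(frame)
    t_mod = md["Tv"] ^ int.from_bytes(h(session_key), "big")
    return data_ok, t_mod == modbus_crc16(frame)


frame = bytes([0x01, 0x03, 0x00, 0x00, 0x00, 0x0A])  # read 10 holding registers from slave 1
md = build_md(b"<CP-ABE ciphertext>", frame, b"K" * 16, ([[1, 1], [0, -1]], ["PLC1", "write"]))
print(check_md(md, frame, b"K" * 16))  # (True, True)
```

A wrong session key leaves the high-order bits of the recovered value non-zero, so the Tv check fails and triggers the retransmission path described above.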
3.4. Access Control Policy Optimization Under Availability Constraints
We deploy an access control server on the monitoring network bus and deploy an access control module integrated with a TPM on each node of the industrial control network. The access control server monitors node and network status and adjusts and optimizes the access control policy when the deployed policy affects the availability of the industrial control system.
3.4.1. Availability Constraint Problem
Time delay is a key factor in the availability constraints on control instructions and control functions. Industrial control systems require control cycles and data acquisition to complete within an acceptable delay; the delay of a control cycle includes the cumulative intra-node access control processing delay and the cumulative inter-node transmission delay. Let the set of industrial network nodes through which business data flow in a single control cycle be N1 = {N1, N2, …, Nm1}, where m1 is the number of nodes. For the i-th node, let ta(i) be the computation time of the deployed access control and tb(i) the original business processing time, so that the total computation time is tcal(i) = ta(i) + tb(i). When a command is transmitted from the source node to the target node, let tc(i) be the transmission time added by access control and td(i) the original service transmission time, so that the total transmission time is tcom(i) = tc(i) + td(i). A single control cycle (e.g., control instruction transmission, processing, and feedback) consists of multiple processes at multiple nodes, so the delay of a single control cycle is $t = \sum_{i \in N_1} \left( t_{cal}(i) + t_{com}(i) \right)$, where N1 is the set of nodes involved in the control cycle. An access control policy can be deployed to a target node when t < tmax, where tmax is the maximum delay allowed for a control cycle.
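The cycle-delay model is a plain sum over the nodes in N1, sketched here with hypothetical per-node timings; the class and function names and the millisecond unit are assumptions.

```python
from dataclasses import dataclass


@dataclass
class NodeTiming:
    """Per-node times for one control cycle (units assumed to be milliseconds)."""
    ta: float  # access control computation time
    tb: float  # original business processing time
    tc: float  # transmission time added by access control
    td: float  # original service transmission time

    @property
    def t_cal(self) -> float:
        return self.ta + self.tb

    @property
    def t_com(self) -> float:
        return self.tc + self.td


def cycle_delay(nodes) -> float:
    """t = sum over nodes in N1 of (t_cal(i) + t_com(i))."""
    return sum(n.t_cal + n.t_com for n in nodes)


def deployable(nodes, t_max: float) -> bool:
    """The policy may be deployed only if t < t_max."""
    return cycle_delay(nodes) < t_max


path = [NodeTiming(1.0, 2.0, 0.5, 1.5), NodeTiming(0.5, 1.0, 0.25, 0.75)]
print(cycle_delay(path), deployable(path, 10.0))  # 7.5 True
```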
3.4.2. Access Control Policy Optimization
The policy optimization includes the following three aspects: optimization of policy deployment location and intensity, optimization of session key update frequency, and optimization of the frequency of integrity metrics of TPMs on target programs or key attributes.
TPMs are deployed in network nodes in parallel, but some PLC devices have limited energy and are not suitable for hosting TPM chips. In this case, authentication at the beginning of PLC access is performed by the control server (host computer) as a proxy. The PLC uses the host computer's AIK key to sign its integrity metric values (h(x1), h(x2), …, h(xn)) and periodically reports its platform integrity to the host computer; the host computer verifies the signature and compares the integrity values with the initially pre-stored values to determine whether they have been tampered with. The PLC directly loads the attribute-based access control set <{ID}, {Act_Set}, {function, Nxe, Lx, E}> to enforce access control over its control program and the control instructions passing through it. The PLC does not participate in CP-ABE encryption or decryption; the host computer passes the control commands stored in the TPM to the PLC in symmetrically encrypted form to avoid leakage and tampering on the fieldbus.
Optimization of key update frequency f1: f1 is defined as the number of key updates per unit time. Let the number of command interactions per unit time be nt. Using a new key for every message gives the best protection but makes the delay constraint difficult to satisfy, whereas leaving the key unchanged for a long time invites information tampering and leakage. We therefore update the key when anomalies are detected. Assuming that the average number of detected anomalies is nrec (nrec < nt), updating the key before a threat is detected guarantees the protection effect. Therefore, we establish the optimization constraint rule as follows:
We deploy the complete access control mechanism, randomly select an instruction transmission path, and compute Equation (4). If the delay constraint is satisfied, the optimal solution of Equation (3) is one key per message, f1 = nt. If Equation (4) is not satisfied, we need to find a lower bound on f1 that still ensures protection effectiveness.
Industrial control systems are more stable than the Internet, so we consider the worst case of Internet packet loss and assume that every packet loss can be detected. The average number of anomalies nrec follows a Pareto distribution with CDF $F(n) = 1 - n^{-\alpha}$ ($n \geq 1$), where α is the Pareto shape parameter; this fits the empirical distribution of historical anomaly counts. The complementary CDF of nrec is $P(n_{rec} > n) = n^{-\alpha}$; assuming that a tail probability $\varepsilon = n^{-\alpha}$ is negligible, the worst-case average number of anomalies is $n_{rec} = \varepsilon^{-1/\alpha}$. Since f1 must exceed nrec and the number of instruction interactions is an integer, min f1 = nrec + 1 under the delay constraint. ε is chosen according to system preference; for α = 2 and ε = 0.01, nrec = 10 and f1 = 11. If Equation (4) is still not satisfied when f1 takes its lower bound, we optimize the DH key negotiation process as follows: a set of negotiated keys {K1, K2, …, Ktn} is generated when communication is established, and when an exception occurs, a previously unused key is selected at random (via a random index) from the key list for instruction transmission. This avoids repeating the negotiation process and further reduces the delay. To resist exhaustive attacks, we set a forced update period Tmax, after which a new set of negotiated keys is generated for the next batch of instruction transmissions during gaps in instruction transmission.
Optimization of TPM monitoring frequency f2: f2 is defined as the number of times the target object is monitored per unit time. The monitoring process includes reading the target program, computing its hash value, retrieving the stored hash value from memory for comparison, and reporting the result. Owing to the dual architecture of trusted computing, the additional delay of TPM monitoring of the target program comes mainly from interaction with the interface program, which is negligible compared with the encryption operations. From the standpoint of defense, f2 should be as large as possible, and its lower bound must cover the anomalies, i.e., nrec + 1 < f2 < fnor, where fnor is the regular monitoring frequency of the TPM. When Equation (4) is not satisfied but the delay is close to the set limit, f2 can be reduced (e.g., by 50%, provided it remains greater than nrec + 1) to lower the delay.
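The two frequency rules can be sketched together. The f1 bound assumes the Pareto form P(nrec > n) = (n_min / n)^α with n_min = 1, which reproduces f1 = 11 for α = 2 and ε = 0.01; the f2 helper applies the 50% reduction and returns None when the reduced value would no longer exceed nrec + 1. The function names and the n_min parameter are hypothetical.

```python
import math
from typing import Optional


def f1_lower_bound(alpha: float, eps: float, n_min: float = 1.0) -> int:
    """min f1 = n_rec + 1, where P(n_rec > n) = (n_min / n) ** alpha = eps."""
    worst = n_min * eps ** (-1.0 / alpha)
    n_rec = math.ceil(round(worst, 9))  # whole anomalies; round() absorbs float noise
    return n_rec + 1


def reduce_f2(f2: float, n_rec: int, f_nor: float, factor: float = 0.5) -> Optional[float]:
    """Reduce the TPM monitoring frequency while keeping n_rec + 1 < f2 < f_nor."""
    lower = n_rec + 1
    if not lower < f2 < f_nor:
        raise ValueError("current f2 is outside the admissible range")
    reduced = f2 * factor
    return reduced if reduced > lower else None  # None: reduction would lose coverage


n_rec = f1_lower_bound(alpha=2, eps=0.01) - 1
print(f1_lower_bound(alpha=2, eps=0.01))      # 11
print(reduce_f2(40, n_rec=n_rec, f_nor=100))  # 20.0
print(reduce_f2(20, n_rec=n_rec, f_nor=100))  # None (10.0 is not above n_rec + 1)
```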
3.5. Access Control Implementation Steps
The steps of the proposed access control method integrated with trusted mechanisms are as follows:
Step 1: Initialization. The access control server records the node identity (ID) when it connects to the access control module of each node in the domain and extracts the access attribute set Attr_Set = <{ID}, {Act_Set}, {Env_set}>. Node u receives a set of attributes I(u) from the access control server and a private key $sk_{i,u} = g^{a_i} H(u)^{b_i}$ for each $i \in I(u)$ (see Section 3.1). All keys are delivered to the nodes using SSH encryption during the system initialization phase. Each node pair negotiates a set of session keys using the Diffie-Hellman (DH) method; the related parameters are also transmitted to each node using SSH encryption, and the key K and its related parameters are stored in the PCR of the TPM.
Step 2: Access control policy formulation. The message sender formulates an access control policy based on the access attributes required for the data. For example, policyi = <PLCi ∈ ID, write/read, {field control, Nh, Lx = 1, E = 1}> is an access control policy stating that the PLC node with identity i can read or write critical data in the field control operation of gas heating and pressurization. Access that satisfies the policy conditions is granted (Permit); otherwise, it is denied (Deny). The message sender extracts the LSSS access structure (R, π) from the formulated policy, where R is the access matrix and π maps the rows of R to attributes.
Step 3: Trusted platform startup. The node starts the TPM platform and measures the integrity of the node's environment. The values used for authentication are the integrity metrics (h(x1), h(x2), …, h(xn)), covering the version number of the platform control program, the code function division parameters, and the attributes associated with the control code {function, Nxe, Lx, E}. The node stores these integrity metrics in the PCRs of the TPM platform to realize isolated management of sensitive data.
Step 4: Data sending. The resident access control engine first authenticates the identities of the data initiator and the connected network nodes using the TPM and adds valid identities to the routing list. The message sender encrypts the instructions to be transmitted with a negotiated shared key, $M_K(data) = En_K(data)$. The sender then computes the ciphertext from the access structure (R, π) using the Encrypt(K, R, π) algorithm in Section 3.1, assembles the ciphertext $C = \langle R, \pi, C_0, \{C_{1,x}, C_{2,x}, C_{3,x}\}_{\forall x} \rangle$, and sends the Modbus-encapsulated data Md {C, h(data), Tv, (R, π)} to the receiver. The PLC sends and receives data according to the optimized scheme (see Section 3.4.2).
Step 5: Data reception. When the receiver is the target node, its resident access control engine first authenticates the identities of the data initiator and the previous-hop node on the first connection, and the connection is closed if authentication fails. After authentication, the receiver extracts the access structure (R, π) and checks whether its own attributes and attribute values (including the values derived from the TPM's integrity measurements of the instruction data and control program, and from the check of the control instruction's expected operation) satisfy the access control policy defined by the access tree; if not, it interrupts the transmission. If they do, the receiver executes the decryption algorithm Decrypt(C, $\{sk_{i,u}\}$) to obtain authorized access to the key K in the PCR. The receiver then recovers $data = De_K(M_K(data))$ using the symmetric key K, verifies the integrity of data and K against Md, and raises an alert if tampering is detected. When data flow through an intermediate node, there are two cases. In the first, the data need to be processed; for example, the host computer combines instructions from the workstation with data from the database and then sends the result to the PLC. In this case, the host computer has the attribute privileges to decrypt and process the data in the TPM isolation zone and re-encrypt them before sending them to the PLC. In the second case, the data do not need to be processed and are forwarded directly to the next node as directed by the routing table.
Step 6: Access control policy optimization. The access control server monitors node and network status and computes the delay of each access control process. If the deployed policy affects availability, the server optimizes the access control policy by adjusting the session key update frequency and the frequency of TPM integrity measurement of target programs or key attributes. The server randomly combines these strategies until the availability requirement is met and continues to observe abnormal conditions of the control program.
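Step 6 can be sketched as a search over the adjustments from Section 3.4.2. This sketch tries the combinations in a fixed order rather than randomly, and the delay function passed in is a hypothetical linear cost model, not a measured one; all names are assumptions.

```python
from typing import Callable, Optional, Tuple


def optimize_policy(f1: float, f2: float, f1_min: float, f2_floor: float,
                    delay: Callable[[float, float], float],
                    t_max: float) -> Optional[Tuple[float, float]]:
    """Return the first (f1, f2) setting whose cycle delay is below t_max, or None."""
    candidates = [
        (f1, f2),          # current policy
        (f1_min, f2),      # key update frequency lowered to its bound n_rec + 1
        (f1, f2 / 2),      # TPM monitoring frequency halved
        (f1_min, f2 / 2),  # both adjustments
    ]
    for c1, c2 in candidates:
        if c2 <= f2_floor:  # f2 must stay above n_rec + 1
            continue
        if delay(c1, c2) < t_max:
            return c1, c2
    return None  # fall back to other measures, e.g. the pre-negotiated key pool


def toy_delay(f1: float, f2: float) -> float:
    """Hypothetical cost model: base delay plus per-update and per-check costs."""
    return 10 + 0.5 * f1 + 0.1 * f2


print(optimize_policy(100, 40, 11, 11, toy_delay, t_max=30))  # (11, 40)
```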