However, in the risk assessment process, these 30 indicators do not contribute equal weight. Therefore, it is necessary to assess these indicator weights so that decision makers can allocate resources and formulate strategies accordingly. Moreover, when neural networks are subsequently employed for data security risk evaluation, these indicator weights will also impact the accuracy and rigor of the data assessment results.
Commonly used methods for determining subjective indicator weights include the Analytic Hierarchy Process (AHP), the Delphi method, and others. The Delphi method generally provides only a series of indicators without a systematic hierarchical decomposition, which may make it difficult for experts to form an overall understanding. Moreover, although the Delphi method gathers consensus through multiple rounds of anonymous questionnaires, it lacks a means to quantitatively verify the consistency of expert opinions. In contrast, the AHP decomposes a complex problem into a hierarchy of levels: experts only need to perform pairwise comparisons among indicators within the same level, and consistency tests ensure the logical reliability of their judgments.
Commonly used methods for determining objective indicator weights include the Entropy Weight Method (EWM) and the Criteria Importance Through Intercriteria Correlation (CRITIC) method. The CRITIC method, based on standard deviation and correlation, accounts for both the variability of indicators and the correlations between them, making it suitable for scenarios where indicators are strongly correlated. In contrast, the EWM calculates weights based solely on the distribution characteristics of the data, making it more suitable for situations with weakly correlated indicators: it ensures that risk factors with highly dispersed data receive higher weights, without interference from other indicators and independently of the absolute magnitudes of the values.
Therefore, the overall approach of this study is to assign subjective weights to each indicator using the AHP based on expert evaluations and to derive objective weights using the EWM based on the actual data distribution. These subjective and objective weights are then combined into comprehensive weights through weighted fusion, thereby constructing a complete evaluation model.
4.3.1. Determining Subjective Weights Using the Analytic Hierarchy Process
The present scheme employs the Analytic Hierarchy Process [37] to determine the subjective weights of the indicators. The first step of the AHP is to set up a hierarchical structure model, which generally contains the goal layer, the indicator layer, and the alternative layer, as shown in Figure 12.
In this scheme, two key considerations are taken into account: first, the impact of the different stages of the data lifecycle on the risk assessment results varies across scenarios, and second, large matrix operations pose computational and storage challenges. We therefore adopt a phased approach, determining the weights stage by stage and synthesizing them in the final step. This method not only reflects the distinct impact of each lifecycle stage on the risk assessment results but also confines higher-order matrix operations and their storage demands to the lower levels. The process of determining the subjective indicator weights based on the AHP is illustrated in Figure 13.
The details are as follows.
1. Construction of the pairwise comparison matrix
(1) Expert scoring
Assume there are $K$ experts in the relevant field. Each expert performs pairwise comparisons of the indicators for each period $t$ (where $t = 1, 2, \ldots, 6$, corresponding respectively to the stages of data collection, transmission, storage, processing, exchange, and destruction) according to the pairwise comparison scale of the AHP, thereby constructing the pairwise comparison matrix (judgment matrix). The AHP pairwise comparison scale is defined in Table 7.
$K$ experts, drawing on domain knowledge and other relevant criteria, construct the pairwise judgment matrices according to the pairwise comparison scale of the AHP. The indicator judgment matrix of each period is as follows:
$$A_t^{(k)} = \left(a_{ij}^{(k)}\right)_{5 \times 5}, \quad k = 1, 2, \ldots, K,$$
where $A_t^{(k)}$ denotes the judgment matrix obtained by the $k$-th expert from pairwise comparisons of the five indicators within the $t$-th period.
(2) Construction of the Pairwise Comparison Matrix
The final pairwise comparison matrix $A_t$ for period $t$ is obtained by integrating all $K$ experts' judgment matrices $A_t^{(k)}$. The integration process employs the geometric mean method. Specifically, suppose that the $K$ experts provide scale values for a given element $a_{ij}$ of the judgment matrix as $a_{ij}^{(1)}, a_{ij}^{(2)}, \ldots, a_{ij}^{(K)}$ (where $1 \le i, j \le 5$); then, the geometric mean of that element is given by
$$\bar{a}_{ij} = \left(\prod_{k=1}^{K} a_{ij}^{(k)}\right)^{\frac{1}{K}}.$$
Then, the final judgment matrix $A_t$ of the $t$-th period is
$$A_t = \left(\bar{a}_{ij}\right)_{5 \times 5},$$
where $t = 1, 2, \ldots, 6$. At this point, there are six judgment matrices in total, one for each of the six stages of the entire data lifecycle.
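The geometric-mean aggregation above can be sketched in Python. This is an illustrative example only: for brevity it uses two hypothetical experts and 3 × 3 matrices rather than the scheme's 5 × 5 judgment matrices.

```python
import numpy as np

def aggregate_judgments(expert_matrices):
    """Element-wise geometric mean of K reciprocal judgment matrices."""
    stacked = np.stack(expert_matrices)                    # shape (K, n, n)
    return np.prod(stacked, axis=0) ** (1.0 / len(expert_matrices))

# Two hypothetical experts comparing three indicators (n = 3 for brevity).
A1 = np.array([[1, 3, 5], [1/3, 1, 2], [1/5, 1/2, 1]])
A2 = np.array([[1, 2, 4], [1/2, 1, 3], [1/4, 1/3, 1]])
A = aggregate_judgments([A1, A2])
# The aggregated matrix remains reciprocal: A[j, i] == 1 / A[i, j].
```

A useful property of the geometric mean (unlike the arithmetic mean) is that it preserves the reciprocity $\bar{a}_{ji} = 1/\bar{a}_{ij}$ of the aggregated judgment matrix.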
2. Calculation of the weight vector
For the indicator judgment matrix $A_t$ of the $t$-th stage, the steps to calculate the weight vector are as follows.
(1) Construction of the normalized matrix
At the $t$-th stage, each element of the normalized matrix $B_t$ is computed by dividing the corresponding element $\bar{a}_{ij}$ of the original judgment matrix $A_t$ by the sum of its column. Specifically, the sum of each column $j$ is calculated as follows:
$$S_j = \sum_{i=1}^{n} \bar{a}_{ij}.$$
The computation formula for each element $b_{ij}$ of the normalized matrix $B_t$ is
$$b_{ij} = \frac{\bar{a}_{ij}}{S_j}.$$
The normalized matrix for the $t$-th stage is represented as
$$B_t = \left(b_{ij}\right)_{n \times n}.$$
(2) Calculation of the weight vector
For the $t$-th stage, the weight $w_i^t$ of the $i$-th indicator is the average of the elements in the $i$-th row of the normalized matrix $B_t$, calculated as follows:
$$w_i^t = \frac{1}{n} \sum_{j=1}^{n} b_{ij},$$
where $n$ is the order of the judgment matrix, i.e., there are $n$ indicators. In this case, $n = 5$. Then, the weight vector $W_t$ for the $t$-th cycle indicators is given by
$$W_t = \left(w_1^t, w_2^t, \ldots, w_n^t\right)^{\mathrm{T}}.$$
It satisfies the condition $\sum_{i=1}^{n} w_i^t = 1$.
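The column-normalization and row-averaging steps above can be sketched as follows, again using a hypothetical 3 × 3 judgment matrix rather than the scheme's 5 × 5 matrices:

```python
import numpy as np

def ahp_weights(A):
    """Normalize each column of A, then average the rows to get the weights."""
    B = A / A.sum(axis=0)   # b_ij = a_ij / S_j, column-normalized matrix
    return B.mean(axis=1)   # w_i = mean of the i-th row of B

A = np.array([[1, 3, 5], [1/3, 1, 2], [1/5, 1/2, 1]])  # hypothetical matrix
w = ahp_weights(A)
# w sums to 1; more dominant indicators receive larger weights.
```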
3. Consistency check
When constructing the judgment matrix, it is possible to make logical errors, so a consistency check is required to assess whether the matrix exhibits any inconsistencies. The steps for the consistency check are as follows.
(1) Calculate the maximum eigenvalue $\lambda_{\max}$ of the indicator judgment matrix for the $t$-th stage using the following formula:
$$\lambda_{\max} = \frac{1}{n} \sum_{i=1}^{n} \frac{\left(A_t W_t\right)_i}{w_i^t}.$$
(2) Calculate the consistency index $CI$ using the following formula:
$$CI = \frac{\lambda_{\max} - n}{n - 1}.$$
At this point, if $CI = 0$, the matrix is completely consistent; if $CI$ is close to 0, the consistency is satisfactory. The larger $CI$ is, the more severe the inconsistency.
(3) Obtain the $RI$ value by consulting the table.
$RI$ is the random consistency index, and it is related to the order of the judgment matrix. In general, as the order of the matrix increases, the probability of random consistency deviation also increases. Its values are provided in Table 8:
(4) Calculate the consistency ratio $CR$. Considering that deviations in consistency may be due to random factors, when testing whether the judgment matrix exhibits acceptable consistency, the consistency index $CI$ must be compared with the random consistency index $RI$. The test coefficient $CR$ is calculated as follows:
$$CR = \frac{CI}{RI}.$$
If $CR < 0.1$, the judgment matrix is considered to have passed the consistency check; otherwise, it does not exhibit satisfactory consistency. If the matrix fails the consistency check, its pairwise judgments should be reviewed for logical issues and the judgment matrix re-entered for further analysis.
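The full consistency check can be sketched as follows; the `RI` table holds the standard random consistency indices for matrix orders 1 through 9, and the judgment matrix is hypothetical:

```python
import numpy as np

# Standard random consistency indices for orders 1-9.
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def consistency_ratio(A, w):
    """Return CR = CI / RI for judgment matrix A and its weight vector w."""
    n = len(w)
    lam_max = np.mean((A @ w) / w)   # average of (A_t W_t)_i / w_i^t
    ci = (lam_max - n) / (n - 1)     # consistency index
    return ci / RI[n]                # consistency ratio

A = np.array([[1, 3, 5], [1/3, 1, 2], [1/5, 1/2, 1]])
w = (A / A.sum(axis=0)).mean(axis=1)  # weight vector from the previous step
cr = consistency_ratio(A, w)
assert cr < 0.1  # this hypothetical matrix passes the consistency check
```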
4. Calculation of the final entire lifecycle subjective indicator weights
Suppose the weights assigned by experts to the six stages are as follows:
$$\theta = \left(\theta_1, \theta_2, \ldots, \theta_6\right).$$
Here, $\theta_t$ corresponds to the weight of the $t$-th stage from data collection to destruction, and the stage weights satisfy the condition $\sum_{t=1}^{6} \theta_t = 1$.
For each stage, the indicator weight vector calculated using the AHP is
$$W_t = \left(w_1^t, w_2^t, \ldots, w_5^t\right)^{\mathrm{T}}.$$
Finally, the subjective weight $\alpha_i^t$ of the $i$-th indicator in the $t$-th stage of the entire data lifecycle is computed as
$$\alpha_i^t = \theta_t \cdot w_i^t.$$
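The final synthesis step can be sketched as follows; the stage weights and per-stage indicator weights are hypothetical placeholders, not values elicited in the study:

```python
import numpy as np

# Hypothetical stage weights theta_t (sum to 1) and per-stage AHP indicator
# weights w_i^t (6 stages x 5 indicators, each row sums to 1).
theta = np.array([0.25, 0.20, 0.20, 0.15, 0.10, 0.10])
W = np.full((6, 5), 0.2)

alpha = theta[:, None] * W   # alpha[t, i] = theta_t * w_i^t
# The 30 resulting subjective weights sum to 1 across the whole lifecycle.
```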
4.3.2. Determining Objective Weights Using Entropy Weight Method
This scheme utilizes the Entropy Weight Method [38] to determine the objective indicator weights. The specific process is shown in Figure 14.
Based on the risk assessment indicators of the entire data lifecycle established in this scheme, we collect and quantify $N$ samples of data security risk assessment data. The original data matrix $X$ is shown below:
$$X = \left(x_{ij}\right)_{N \times n},$$
where $N$ is the number of data samples; $n$ is the number of assessment indicators; and $x_{ij}$ represents the value of the $i$-th sample for the $j$-th data security risk assessment indicator, with $1 \le i \le N$ and $1 \le j \le n$.
1. Data normalization
Due to the differing value ranges among the risk assessment indicators, normalization is required to constrain their values to the interval $[0, 1]$. A commonly used min–max normalization method is as follows:
(1) For extremely large indicators (i.e., higher values are better),
$$\tilde{x}_{ij} = \frac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}},$$
where $\max_i x_{ij}$ is the maximum value of the $j$-th indicator and $\min_i x_{ij}$ is the minimum value of the $j$-th indicator.
(2) For extremely small indicators (i.e., lower values are better),
$$\tilde{x}_{ij} = \frac{\max_i x_{ij} - x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}.$$
Then, the normalized matrix $\tilde{X}$ is given by
$$\tilde{X} = \left(\tilde{x}_{ij}\right)_{N \times n}.$$
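The two normalization formulas can be sketched as follows, using a small hypothetical sample matrix with one benefit-type ("extremely large") and one cost-type ("extremely small") indicator:

```python
import numpy as np

def min_max_normalize(X, benefit):
    """Min-max normalize columns of X; flip columns where benefit is False."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    Z = (X - lo) / (hi - lo)                 # benefit-type formula
    Z[:, ~benefit] = 1.0 - Z[:, ~benefit]    # cost-type formula (flipped)
    return Z

X = np.array([[2.0, 30.0], [4.0, 10.0], [6.0, 20.0]])  # 3 samples, 2 indicators
benefit = np.array([True, False])  # column 0 benefit-type, column 1 cost-type
Z = min_max_normalize(X, benefit)
# Every normalized value now lies in [0, 1].
```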
2. Calculation of the probability matrix
For each element $\tilde{x}_{ij}$ of matrix $\tilde{X}$, calculate the probability distribution $p_{ij}$ for each indicator. The formula for $p_{ij}$ is given by
$$p_{ij} = \frac{\tilde{x}_{ij}}{\sum_{i=1}^{N} \tilde{x}_{ij}},$$
where $p_{ij}$ represents the contribution rate of the $i$-th sample to the $j$-th data security risk assessment indicator. The probability matrix $P$ is given by
$$P = \left(p_{ij}\right)_{N \times n}.$$
3. Calculation of information entropy
The information entropy $e_j$ of the $j$-th data security risk assessment indicator is calculated by the following formula:
$$e_j = -\frac{1}{\ln N} \sum_{i=1}^{N} p_{ij} \ln p_{ij},$$
where the normalization coefficient is defined as $\frac{1}{\ln N}$, ensuring that $e_j$ lies within the interval $[0, 1]$. Note that if $p_{ij} = 0$, then $p_{ij} \ln p_{ij}$ is defined to be 0. The information entropy $e_j$ reflects the data distribution of the indicator.
(1) If $e_j$ is large, the data distribution of the indicator is relatively uniform and provides less information, so the weight should be lower.
(2) If $e_j$ is small, the values of the indicator vary more and provide more information, so the weight should be higher.
4. Calculation of the entropy weight
Based on $e_j$, the information utility value $d_j$ is calculated as follows:
$$d_j = 1 - e_j,$$
where $d_j$ represents the information contribution of the $j$-th data security risk assessment indicator, i.e., the importance of that indicator.
Eventually, the entropy weight (the objective weight of the indicator) $\beta_j$ is calculated as follows:
$$\beta_j = \frac{d_j}{\sum_{j=1}^{n} d_j}.$$
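Steps 2–4 of the EWM (probability matrix, information entropy with the $p_{ij} \ln p_{ij} := 0$ convention for $p_{ij} = 0$, and entropy weights) can be sketched together as follows, using a hypothetical normalized matrix:

```python
import numpy as np

def entropy_weights(Z):
    """Entropy weights beta_j from a normalized N x n data matrix Z."""
    N = Z.shape[0]
    P = Z / Z.sum(axis=0)                                 # p_ij, contribution rates
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(P > 0, P * np.log(P), 0.0)       # define 0 * ln(0) = 0
    e = -plogp.sum(axis=0) / np.log(N)                    # entropy e_j in [0, 1]
    d = 1.0 - e                                           # information utility d_j
    return d / d.sum()                                    # entropy weights beta_j

Z = np.array([[0.0, 0.5], [0.5, 0.5], [1.0, 0.5]])        # hypothetical normalized data
beta = entropy_weights(Z)
# The perfectly uniform second column carries no information (e_j = 1),
# so its entropy weight is 0.
```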
4.3.3. Determining Comprehensive Weights of Indicators
Assume that for a certain security risk assessment indicator $j$ in the entire data lifecycle, the subjective weight obtained using the AHP is $\alpha_j$, and the objective weight obtained using the EWM is $\beta_j$. Then, the composite weight $\omega_j$ for that indicator under the AHP–EWM combined weighting method is given by
$$\omega_j = \gamma \alpha_j + (1 - \gamma) \beta_j,$$
where $\gamma \in [0, 1]$ represents the balancing coefficient between the subjective and objective weights.
To obtain the optimal balancing coefficient $\gamma$, we adopt the least squares method. Its main principle is to minimize the sum of squared deviations between the composite weight $\omega_j$ and both the subjective weight $\alpha_j$ and the objective weight $\beta_j$, thereby obtaining the value of $\gamma$. This secures an optimal compromise in the composite weight between subjective preferences and objective data. The specific solution steps are as follows.
(1) Construction of the Objective Function
To minimize the squared error between the composite weight $\omega_j$ and both the subjective weight $\alpha_j$ and the objective weight $\beta_j$, the objective function is defined as
$$\min F(\gamma) = \sum_{j=1}^{n} \left[ \left(\omega_j - \alpha_j\right)^2 + \left(\omega_j - \beta_j\right)^2 \right],$$
where $n$ is the total number of indicators in the evaluation system.
(2) Differentiation with Respect to $\gamma$
Take the derivative of the objective function with respect to $\gamma$ and set it to zero; the minimum of the squared error is attained at this stationary point. Substituting $\omega_j = \gamma \alpha_j + (1 - \gamma) \beta_j$ into $F(\gamma)$ gives
$$\frac{\mathrm{d}F}{\mathrm{d}\gamma} = \sum_{j=1}^{n} (4\gamma - 2)\left(\alpha_j - \beta_j\right)^2 = 0,$$
so $\gamma = 0.5$; that is, the optimal balancing coefficient is 0.5, and the subjective and objective weights are weighted equally. Thus, the composite weight using the combined AHP–Entropy Weight method is
$$\omega_j = \frac{\alpha_j + \beta_j}{2}.$$
This weight calculation method is applicable in scenarios where the subjective and objective weights are equally important. In practical applications, different values of $\gamma$ may be set based on the actual situation to emphasize either the subjective or the objective weight. Finally, the complete set of entire data lifecycle risk assessment indicator weights is obtained as follows:
$$\Omega = \left(\omega_1, \omega_2, \ldots, \omega_{30}\right).$$
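The combined weighting can be sketched as follows; the subjective and objective weight vectors are hypothetical placeholders:

```python
import numpy as np

def composite_weights(alpha, beta, gamma=0.5):
    """Fuse subjective (AHP) and objective (EWM) weights with coefficient gamma."""
    return gamma * alpha + (1.0 - gamma) * beta   # omega_j

alpha = np.array([0.5, 0.3, 0.2])  # hypothetical subjective (AHP) weights
beta = np.array([0.4, 0.4, 0.2])   # hypothetical objective (EWM) weights
omega = composite_weights(alpha, beta)
# With the least-squares optimum gamma = 0.5, omega is simply the element-wise
# average of alpha and beta, so it also sums to 1.
```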