A Multi-Scene Automatic Classification and Grading Method for Structured Sensitive Data Based on Privacy Preferences
Abstract
1. Introduction
- A composite sensitivity scoring framework incorporating a privacy preference matrix is proposed, enabling scenario-aware sensitivity quantification for structured data attributes.
- A hybrid classification strategy is designed that integrates domain knowledge, k-means clustering, and association rule mining to improve the precision of sensitive attribute identification.
- A hierarchical grading mechanism built on sensitivity-weighted mutual information is developed, supporting fine-grained stratification of sensitive attributes.
2. Related Works
3. Notations
4. Methods
4.1. Composite Sensitivity Calculation Integrated with Privacy Preference Matrix
- For numerical attributes: Compute the mean of the non-null values of the target attribute, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, where $n$ denotes the number of non-null records. Then select the non-null value with the smallest deviation from this mean, $x^{*} = \arg\min_{x_i} \lvert x_i - \bar{x} \rvert$, to populate the null entries of the attribute.
- For categorical attributes: Determine the mode of non-null values in the target attribute (i.e., the category with the highest frequency of occurrence), and utilize this mode directly for filling null records.
- Input: The original dataset containing 50 structured attribute fields (including vehicle basic information, performance parameters, configuration details, travel records, etc.) and the privacy preference survey results from 100 users across three scenarios (regulatory supervision, operational services, and marketing). These results are presented in the form of a preference matrix, reflecting the sensitivity weights of different attributes in each scenario.
- Output: A table of attribute sensitivity scores after composite sensitivity quantification (including the objective information entropy value of each attribute, the subjective weight of collective preferences, and the integrated composite sensitivity value) and dynamic sensitivity threshold parameters adapted to the three scenarios (for scenario-specific calls in subsequent classification modules).
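The two null-filling rules described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `fill_numerical` and `fill_categorical` are hypothetical, and null entries are assumed to be represented by `None`.

```python
from collections import Counter
from statistics import mean


def fill_numerical(values):
    """Fill None entries with the observed value closest to the mean of the non-null values."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to impute from
    avg = mean(observed)
    # Choose the existing value with the smallest absolute deviation from the mean.
    fill = min(observed, key=lambda v: abs(v - avg))
    return [fill if v is None else v for v in values]


def fill_categorical(values):
    """Fill None entries with the mode (most frequent category) of the non-null values."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


speeds = fill_numerical([10, None, 20, 60])
tires = fill_categorical(["Summer", None, "Winter", "Summer"])
```

Filling with an observed value rather than the mean itself keeps every imputed entry inside the attribute's actual value domain, which matters when entropy is later computed over distinct values.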
4.1.1. Construction of Privacy Preference Matrix
- Level A (Non-sensitive): Publicly accessible information (e.g., gender, age group), assigned a weight of 0.
- Level B (Slightly Sensitive): Information indirectly inferring identity (e.g., driver’s license type, occupation type), assigned a weight of 0.25.
- Level C (Moderately Sensitive): Information directly inferring identity (e.g., driving experience, driver-related information), assigned a weight of 0.5.
- Level D (Highly Sensitive): Information explicitly identifying individuals (e.g., mobile phone number, email address), assigned a weight of 0.75.
- Level E (Extremely Sensitive): Information involving core privacy (e.g., biometric data, location information), assigned a weight of 1.0.
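As an illustration of how the five levels could be turned into matrix entries, the sketch below maps each survey answer to its level weight and averages the weights per attribute and scenario. The tuple layout of `responses` and the function name `build_preference_matrix` are assumptions made for this example; only the level weights come from the list above.

```python
# Level weights as listed above (A = non-sensitive ... E = extremely sensitive).
LEVEL_WEIGHTS = {"A": 0.0, "B": 0.25, "C": 0.5, "D": 0.75, "E": 1.0}


def build_preference_matrix(responses):
    """Average the level weights per (attribute, scenario).

    responses: iterable of (attribute, scenario, level) tuples, one per user answer
    (a hypothetical input layout chosen for this sketch).
    """
    totals = {}
    for attribute, scenario, level in responses:
        running_sum, count = totals.get((attribute, scenario), (0.0, 0))
        totals[(attribute, scenario)] = (running_sum + LEVEL_WEIGHTS[level], count + 1)
    matrix = {}
    for (attribute, scenario), (running_sum, count) in totals.items():
        matrix.setdefault(attribute, {})[scenario] = running_sum / count
    return matrix


matrix = build_preference_matrix([
    ("Trip Origin", "Regulatory", "E"),
    ("Trip Origin", "Regulatory", "D"),
    ("Color", "Marketing", "A"),
])
```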
4.1.2. Composite Sensitivity Calculation
Algorithm 1: Sensitivity Calculation Algorithm Integrating the Privacy Preference Matrix |
Require: the structured dataset (dataset), the user preference information list (user_preference_list), and the predefined weights (entropy_weight, preference_weight) |
Read the structured dataset, the user preference information list, and the predefined weights. Define cleaned_data = clean_data(dataset) |
privacy_preference_matrix = {} |
sensitivity_score = {...} # privacy attribute sensitivity score dictionary |
scenario_score = {...} # scenario acceptability score dictionary |
for user_preference in user_preference_list: |
    for attribute, scenario, preference in user_preference: # attribute-sensitivity and scenario-acceptability answers |
        score = sensitivity_score[preference] if preference is a sensitivity level else scenario_score[preference] |
        privacy_preference_matrix.setdefault(attribute, {}).setdefault(scenario, []).append(score) |
for attribute, scenario_scores in privacy_preference_matrix.items(): |
    for scenario, scores in scenario_scores.items(): |
        average_score = sum(scores)/len(scores) if scores else 0 |
        privacy_preference_matrix[attribute][scenario] = average_score |
adjusted_sensitivity = {} |
for attribute, attribute_values in cleaned_data.items(): |
    entropy = calculate_entropy(attribute_values) |
    max_entropy = log2(len(set(attribute_values))) |
    sv_entropy = entropy/max_entropy if max_entropy > 0 else 0 |
    sv_preference = privacy_preference_matrix.get(attribute, {}).get("Comprehensive scenario", 0) |
    adjusted_sensitivity[attribute] = entropy_weight × sv_entropy + preference_weight × sv_preference |
    Output adjusted_sensitivity[attribute] |
Output: adjusted_sensitivity |
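A runnable sketch of the entropy and weighting steps of Algorithm 1 is given below. It is illustrative only: the equal weights of 0.5 are an assumption (the predefined weights are not stated in this section), `columns` is a hypothetical mapping from attribute name to its list of values, and `preference` is assumed to hold one averaged preference value per attribute.

```python
import math
from collections import Counter


def normalized_entropy(values):
    """Shannon entropy of a column divided by its maximum, log2(number of distinct values)."""
    counts = Counter(values)
    if len(counts) <= 1:
        return 0.0  # a constant column carries no information; also avoids dividing by log2(1) = 0
    n = len(values)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return entropy / math.log2(len(counts))


def composite_sensitivity(columns, preference, entropy_weight=0.5, preference_weight=0.5):
    """Weighted sum of the objective (entropy) and subjective (preference) sensitivity.

    The default weights are placeholders for this sketch, not values from the paper.
    """
    scores = {}
    for attribute, values in columns.items():
        sv_entropy = normalized_entropy(values)
        sv_preference = preference.get(attribute, 0.0)
        scores[attribute] = entropy_weight * sv_entropy + preference_weight * sv_preference
    return scores


scores = composite_sensitivity(
    {"Color": ["Black", "White", "Black", "White"], "Seating_Capacity": [7, 7, 7, 7]},
    {"Color": 0.2, "Seating_Capacity": 0.6},
)
```

In this toy input, `Color` is uniformly distributed over two values, so its normalized entropy is 1, while the constant `Seating_Capacity` column contributes only through its preference value.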
4.2. Sensitive Data Classification Based on Clustering Feature Mining
4.2.1. Domain Knowledge-Driven Initial Classification Using K-Means
Algorithm 2: Preliminary Sensitive Data Clustering based on K-Means |
Require: Structured data sensitivity set (adjusted_sensitivity), privacy preference matrix (privacy_matrix), domain, maximum iterations (max_iter), displacement threshold (threshold). |
Retrieve the structured data sensitivity set, privacy preference matrix, domain, the number of iterations, and displacement threshold, and define cleaned_data = clean_data(adjusted_sensitivity) |
privacy_attribute = analyze_privacy(privacy_matrix, domain) # Scenario analysis to obtain privacy attribute values. |
center1 = select_center_by_privacy(cleaned_data, privacy_attribute) |
center2 = random_select_center(cleaned_data) |
for iter = 1 to max_iter |
    cluster1 = [] |
    cluster2 = [] |
    for each sample in cleaned_data |
        distance1 = calculate_distance(sample, center1) |
        distance2 = calculate_distance(sample, center2) |
        if distance1 < distance2 |
            append sample to cluster1 |
        else |
            append sample to cluster2 |
    new_center1 = calculate_mean(cluster1) |
    new_center2 = calculate_mean(cluster2) |
    displacement1 = calculate_displacement(center1, new_center1) |
    displacement2 = calculate_displacement(center2, new_center2) |
    if displacement1 < threshold and displacement2 < threshold |
        break |
    center1 = new_center1 |
    center2 = new_center2 |
sensitive_data = cluster1 |
suspected_sensitive_data = cluster2 |
Output: sensitive data clusters (sensitive_data) and suspected sensitive data clusters (suspected_sensitive_data) |
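The two-center procedure of Algorithm 2 can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions: each sample is a tuple of numeric features (for example, one sensitivity value per scenario), the privacy-seeded center is passed in directly as `center1`, and the guard that keeps an old center when a cluster becomes empty is an addition not present in the pseudocode.

```python
import math
import random


def centroid(cluster):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(axis) / len(cluster) for axis in zip(*cluster))


def two_means(points, center1, center2=None, max_iter=100, threshold=1e-6, seed=0):
    """Two-cluster k-means: center1 is seeded from domain knowledge, center2 at random."""
    if center2 is None:
        candidates = [p for p in points if p != center1] or list(points)
        center2 = random.Random(seed).choice(candidates)
    cluster1, cluster2 = [], []
    for _ in range(max_iter):
        cluster1, cluster2 = [], []
        for p in points:
            # Assign each sample to the nearer center (Euclidean distance).
            if math.dist(p, center1) < math.dist(p, center2):
                cluster1.append(p)
            else:
                cluster2.append(p)
        # Keep the previous center if a cluster is empty (assumption of this sketch).
        new1 = centroid(cluster1) if cluster1 else center1
        new2 = centroid(cluster2) if cluster2 else center2
        moved1, moved2 = math.dist(center1, new1), math.dist(center2, new2)
        center1, center2 = new1, new2
        if moved1 < threshold and moved2 < threshold:
            break  # both centers have stopped moving
    return cluster1, cluster2


points = [(0.83,), (0.82,), (0.80,), (0.20,), (0.22,), (0.21,)]
sensitive_data, suspected_sensitive_data = two_means(points, center1=(0.83,), center2=(0.20,))
```

Seeding `center1` from a highly sensitive attribute ties the first cluster to the "sensitive" label, so the output clusters do not need to be relabeled after convergence.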
4.2.2. Classification Correction Based on the Association Rules
Algorithm 3: Cluster Optimization based on FP-Growth |
Require: Preliminarily clustered sensitive data (PreSensitive_data), preliminarily clustered suspected sensitive data (PreSuspected_sensitive_data), minimum support (min_support), confidence threshold (confidence_threshold). |
Merge the sensitive data and suspected sensitive data into a single dataset, i.e., combined_data = PreSensitive_data + PreSuspected_sensitive_data |
item_frequency = count_item_frequency(combined_data) |
sorted_items = sort_items_by_frequency(item_frequency) |
root = create_root() |
for each transaction in combined_data |
    sorted_transaction = sort_transaction(transaction, sorted_items) |
    insert_transaction(root, sorted_transaction) |
frequent_itemsets = [] |
for each leaf in root.leaves |
    path = get_path_from_root(leaf) |
    local_itemsets = find_frequent_itemsets(path, item_frequency, min_support) |
    frequent_itemsets = union(frequent_itemsets, local_itemsets) |
sensitive_set = [] |
nonsensitive_set = [] |
for each itemset in frequent_itemsets |
    for each subset in generate_subsets(itemset) |
        antecedent = subset, consequent = itemset - subset |
        support_AB = calculate_support(antecedent + consequent) |
        support_A = calculate_support(antecedent) |
        confidence = support_AB/support_A |
        if confidence >= confidence_threshold |
            append itemset to sensitive_set |
        else |
            append itemset to nonsensitive_set |
Output: updated sensitive dataset (UpdSensitive_set = sensitive_set) and updated non-sensitive dataset (UpdNonsensitive_set = nonsensitive_set) |
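The support and confidence test at the end of Algorithm 3 can be illustrated with the sketch below. Note the substitutions: frequent itemsets are enumerated by brute force rather than with an FP-tree, which is only practical for small inputs, and an itemset is assigned by its best rule confidence, which is one possible reading of the pseudocode. Transactions are assumed to be sets of attribute names, `min_support` is assumed to be greater than 0, and all function names are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force stand-in for FP-Growth: all itemsets of size 2..max_size above min_support."""
    items = sorted(set().union(*transactions))
    found = []
    for size in range(2, max_size + 1):
        for combo in combinations(items, size):
            if support(combo, transactions) >= min_support:
                found.append(frozenset(combo))
    return found


def split_by_confidence(transactions, min_support, confidence_threshold):
    """Assign each frequent itemset by the best confidence over its rules A -> (itemset - A)."""
    sensitive, nonsensitive = [], []
    for itemset in frequent_itemsets(transactions, min_support):
        support_ab = support(itemset, transactions)
        best = 0.0
        for k in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), k):
                best = max(best, support_ab / support(antecedent, transactions))
        (sensitive if best >= confidence_threshold else nonsensitive).append(itemset)
    return sensitive, nonsensitive


transactions = [
    {"Origin", "Destination", "Start_Time"},
    {"Origin", "Destination"},
    {"Origin", "Destination"},
    {"Color", "Start_Time"},
]
sensitive_set, nonsensitive_set = split_by_confidence(transactions, 0.5, 0.9)
loose_sensitive, loose_nonsensitive = split_by_confidence(transactions, 0.25, 0.9)
```

With a minimum support of 0.5, only the pair {Origin, Destination} is frequent, and the rule Destination -> Origin holds in every transaction that contains Destination, so the pair is kept as sensitive.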
4.3. Sensitive Data Classification Based on Mutual Information Matrix
Algorithm 4: The sensitive data classification algorithm based on the mutual information matrix |
Require: Structured sensitive dataset (sensitive_set), privacy preference matrix (privacy_matrix) |
num_attributes = len(sensitive_set.columns) # Number of attributes in the structured sensitive dataset |
for i in range(num_attributes): # Construct the mutual information matrix |
    for j in range(num_attributes): |
        mutual_info = 0 |
        for each value pair (x, y) of attributes i and j: |
            joint_prob = joint_count(x, y)/len(sensitive_set) |
            mutual_info += joint_prob × log(joint_prob/(p_x × p_y)) |
        mutual_information_matrix[i][j] = mutual_info |
num_attributes = len(mutual_information_matrix) |
clusters = [[i] for i in range(num_attributes)] |
sensitivity_values = get_sensitivity_values(privacy_matrix) |
target_cluster_num = get_level_sensitivity_values(sensitivity_values) |
while len(clusters) > target_cluster_num: |
    max_similarity = -1 |
    for each pair of clusters (c1, c2) with sizes size1 and size2: |
        similarity = 0 |
        for a in clusters[c1], b in clusters[c2]: |
            similarity += mutual_information_matrix[a][b] |
        similarity = similarity/(size1 × size2) |
        if similarity > max_similarity: max_similarity = similarity; merge_cluster1, merge_cluster2 = c1, c2 |
    new_cluster = clusters[merge_cluster1] + clusters[merge_cluster2] |
    replace clusters[merge_cluster1] and clusters[merge_cluster2] with new_cluster |
groups = get_cluster_by_new_cluster(clusters) |
for group, level in zip(groups, level_criteria): |
    group_sensitivity = sum(sensitivity_values[attribute] for attribute in group) |
    avg_sensitivities[group] = group_sensitivity/len(group) |
    classifications[group] = level |
Output: attribute grouping, along with the average sensitivity, avg_sensitivities, and the data classification result, classifications. |
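The main steps of Algorithm 4 can be sketched as below: a pairwise mutual information matrix over discrete columns, average-linkage merging until a target number of groups remains, and the mean sensitivity of each group. This is an illustrative sketch rather than the authors' code: base-2 logarithms are an assumption (the pseudocode writes only `log`), the mapping from average sensitivity to a grade is omitted because the level criteria are not given in this section, and all names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equally long discrete columns."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def mi_matrix(columns):
    """Square matrix of pairwise mutual information, in the key order of columns."""
    names = list(columns)
    return [[mutual_information(columns[a], columns[b]) for b in names] for a in names]


def agglomerate(matrix, target_cluster_num):
    """Merge the two clusters with the highest average pairwise MI until the target count is reached."""
    target_cluster_num = max(1, target_cluster_num)
    clusters = [[i] for i in range(len(matrix))]
    while len(clusters) > target_cluster_num:
        best, pair = -1.0, None  # MI is non-negative, so the first pair always wins
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(matrix[a][b] for a in clusters[i] for b in clusters[j])
                similarity = total / (len(clusters[i]) * len(clusters[j]))
                if similarity > best:
                    best, pair = similarity, (i, j)
        i, j = pair
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


def average_sensitivity(clusters, names, sensitivity):
    """Return (attribute names, mean sensitivity) for each cluster."""
    return [
        ([names[i] for i in c], sum(sensitivity[names[i]] for i in c) / len(c))
        for c in clusters
    ]


columns = {
    "Origin": ["a", "a", "b", "b"],
    "Destination": ["x", "x", "y", "y"],
    "Color": ["r", "g", "r", "g"],
}
matrix = mi_matrix(columns)
clusters = agglomerate(matrix, 2)
groups = average_sensitivity(
    clusters, list(columns), {"Origin": 0.8, "Destination": 0.6, "Color": 0.2}
)
```

In this toy input, `Origin` fully determines `Destination`, so the two attributes share one bit of information and are merged first, while `Color` is independent of both and stays in its own group.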
5. Experimental Results and Analysis
5.1. Dataset
5.1.1. Vehicle Management Dataset
5.1.2. Financial Guarantee Dataset
5.2. Experimental Results
5.2.1. Composite Sensitivity Calculation Incorporating Privacy Preferences
- Experimental results under the vehicle management dataset
- Experimental results under the Financial Guarantee Dataset
5.2.2. Classification Results of Structured Sensitive Data
- Experimental results under the vehicle management dataset
- Experimental results under the Financial Guarantee Dataset
5.2.3. Graded Classification Results of Structured Sensitive Data
- Experimental results under the vehicle management dataset
- Experimental results under the Financial Guarantee Dataset
5.2.4. Comparison with Existing Methods
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
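Category | Attribute | A (0) | B (0.25) | C (0.5) | D (0.75) | E (1) |
---|---|---|---|---|---|---|
Vehicle Basic Information | Vehicle Identifier | | | | | |
Vehicle Basic Information | Vehicle Type | | | | | |
Vehicle Basic Information | Manufacturer | | | | | |
Vehicle Basic Information | Model | | | | | |
Vehicle Basic Information | Year of Production | | | | | |
Vehicle Basic Information | Color | | | | | |
Vehicle Performance Information | Engine Displacement | | | | | |
Vehicle Performance Information | Fuel Type | | | | | |
Vehicle Performance Information | Fuel Efficiency | | | | | |
Vehicle Performance Information | Transmission Type | | | | | |
Vehicle Performance Information | Mileage | | | | | |
Vehicle Performance Information | Driving Speed | | | | | |
Vehicle Performance Information | Acceleration | | | | | |
Vehicle Performance Information | Brake Response | | | | | |
Vehicle Performance Information | Steering Sensitivity | | | | | |
Vehicle Performance Information | Tire Type | | | | | |
Vehicle Performance Information | Tire Pressure | | | | | |
Vehicle Performance Information | Suspension Type | | | | | |
Vehicle Performance Information | Number of Airbags | | | | | |
Vehicle Performance Information | Seat Capacity | | | | | |
Vehicle Configuration Information | Location Enabled | | | | | |
Vehicle Configuration Information | Bluetooth Enabled | | | | | |
Vehicle Configuration Information | Number of USB Ports | | | | | |
Vehicle Configuration Information | Entertainment System | | | | | |
Vehicle Configuration Information | Air Conditioning Control System | | | | | |
Vehicle Configuration Information | Navigation System | | | | | |
Vehicle Configuration Information | Traffic Alarm | | | | | |
Vehicle Configuration Information | Parking Sensors | | | | | |
Vehicle Configuration Information | Reversing Camera | | | | | |
Vehicle Configuration Information | Lane Departure Warning | | | | | |
Vehicle Configuration Information | Collision Warning | | | | | |
Vehicle Configuration Information | Cruise Control | | | | | |
Travel Information | Trip Start Time | | | | | |
Travel Information | Trip End Time | | | | | |
Travel Information | Trip Duration | | | | | |
Travel Information | Distance Traveled | | | | | |
Travel Information | Trip Origin | | | | | |
Travel Information | Trip Destination | | | | | |
Travel Information | Route Type | | | | | |
Travel Information | Traffic Conditions | | | | | |
Travel Information | Weather Conditions | | | | | |
Travel Information | Road Type | | | | | |
Travel Information | Toll Amount | | | | | |
Travel Information | Parking Fees | | | | | |
Travel Information | Maintenance Fees | | | | | |
Travel Information | Insurance Costs | | | | | |
Driver Information | Driver Age | | | | | |
Driver Information | Driver Gender | | | | | |
Driver Information | Driver License Type | | | | | |
Driver Information | Driving Experience | | | | | |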
Application Scenario | Completely Acceptable (0) | Somewhat Acceptable (0.25) | Neutral (0.5) | Somewhat Unacceptable (0.75) | Completely Unacceptable (1) |
---|---|---|---|---|---|
Government Regulatory | |||||
Operational Usage | |||||
Marketing Usage |
Data Attributes | Regulatory | Operational | Marketing | |||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
A | B | C | D | E | A | B | C | D | E | A | B | C | D | E | ||
Vehicle Basic Information | Vehicle Identifier | √ | √ | √ | ||||||||||||
Vehicle Type | √ | √ | √ | |||||||||||||
Manufacturer | √ | √ | √ | |||||||||||||
Model | √ | √ | √ | |||||||||||||
Year of Production | √ | √ | √ | |||||||||||||
Color | √ | √ | √ | |||||||||||||
Vehicle Performance Information | Engine Displacement | √ | √ | √ | ||||||||||||
Fuel Type | √ | √ | √ | |||||||||||||
Fuel Efficiency | √ | √ | √ | |||||||||||||
Transmission Type | √ | √ | √ | |||||||||||||
Mileage | √ | √ | √ | |||||||||||||
Driving Speed | √ | √ | √ | |||||||||||||
Acceleration | √ | √ | √ | |||||||||||||
Brake Response | √ | √ | √ | |||||||||||||
Steering Sensitivity | √ | √ | √ | |||||||||||||
Tire Type | √ | √ | √ | |||||||||||||
Tire Pressure | √ | √ | √ | |||||||||||||
Suspension Type | √ | √ | √ | |||||||||||||
Number of Airbags | √ | √ | √ | |||||||||||||
Seat Capacity | √ | √ | √ | |||||||||||||
Vehicle Configuration Information | Location Enabled | √ | √ | √ | ||||||||||||
Bluetooth Enabled | √ | √ | √ | |||||||||||||
Number of USB Ports | √ | √ | √ | |||||||||||||
Entertainment System | √ | √ | √ | |||||||||||||
Air Conditioning Control System | √ | √ | √ | |||||||||||||
Navigation System | √ | √ | √ | |||||||||||||
Traffic Alarm | √ | √ | √ | |||||||||||||
Parking Sensors | √ | √ | √ | |||||||||||||
Reversing Camera | √ | √ | √ | |||||||||||||
Lane Departure Warning | √ | √ | √ | |||||||||||||
Collision Warning | √ | √ | √ | |||||||||||||
Cruise Control | √ | √ | √ | |||||||||||||
Travel Information | Trip Start Time | √ | √ | √ | ||||||||||||
Trip End Time | √ | √ | √ | |||||||||||||
Trip Duration | √ | √ | √ | |||||||||||||
Distance Traveled | √ | √ | √ | |||||||||||||
Trip Origin | √ | √ | √ | |||||||||||||
Trip Destination | √ | √ | √ | |||||||||||||
Route Type | √ | √ | √ | |||||||||||||
Traffic Conditions | √ | √ | √ | |||||||||||||
Weather Conditions | √ | √ | √ | |||||||||||||
Road Type | √ | √ | √ | |||||||||||||
Toll Amount | √ | √ | √ | |||||||||||||
Parking Fees | √ | √ | √ | |||||||||||||
Maintenance Fees | √ | √ | √ | |||||||||||||
Insurance Costs | √ | √ | √ | |||||||||||||
Driver Information | Driver Age | √ | √ | √ | ||||||||||||
Driver Gender | √ | √ | √ | |||||||||||||
Driver License Type | √ | √ | √ | |||||||||||||
Driving Experience | √ | √ | √ |
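Category | Attribute | Regulatory | Operational | Marketing |
---|---|---|---|---|
Vehicle Basic Information | Vehicle Identifier | 0.83 | 0.23 | 0.23 |
Vehicle Basic Information | Vehicle Type | 0.22 | 0.22 | 0.60 |
Vehicle Basic Information | Manufacturer | 0.20 | 0.22 | 0.63 |
Vehicle Basic Information | Model | 0.22 | 0.23 | 0.62 |
Vehicle Basic Information | Year of Production | 0.22 | 0.22 | 0.62 |
Vehicle Basic Information | Color | 0.13 | 0.12 | 0.63 |
Vehicle Performance Information | Engine Displacement | 0.22 | 0.23 | 0.63 |
Vehicle Performance Information | Fuel Type | 0.21 | 0.23 | 0.63 |
Vehicle Performance Information | Fuel Efficiency | 0.22 | 0.61 | 0.62 |
Vehicle Performance Information | Transmission Type | 0.25 | 0.21 | 0.63 |
Vehicle Performance Information | Mileage | 0.22 | 0.62 | 0.23 |
Vehicle Performance Information | Driving Speed | 0.61 | 0.23 | 0.21 |
Vehicle Performance Information | Acceleration | 0.22 | 0.22 | 0.20 |
Vehicle Performance Information | Brake Response | 0.24 | 0.23 | 0.23 |
Vehicle Performance Information | Steering Sensitivity | 0.22 | 0.20 | 0.23 |
Vehicle Performance Information | Tire Type | 0.22 | 0.21 | 0.22 |
Vehicle Performance Information | Tire Pressure | 0.23 | 0.23 | 0.22 |
Vehicle Performance Information | Suspension Type | 0.22 | 0.23 | 0.22 |
Vehicle Performance Information | Number of Airbags | 0.23 | 0.22 | 0.23 |
Vehicle Performance Information | Seat Capacity | 0.23 | 0.22 | 0.62 |
Vehicle Configuration Information | Location Enabled | 0.83 | 0.62 | 0.22 |
Vehicle Configuration Information | Bluetooth Enabled | 0.22 | 0.21 | 0.61 |
Vehicle Configuration Information | Number of USB Ports | 0.12 | 0.11 | 0.63 |
Vehicle Configuration Information | Entertainment System | 0.12 | 0.12 | 0.62 |
Vehicle Configuration Information | Air Conditioning Control System | 0.18 | 0.12 | 0.63 |
Vehicle Configuration Information | Navigation System | 0.81 | 0.62 | 0.21 |
Vehicle Configuration Information | Traffic Alarm | 0.21 | 0.21 | 0.22 |
Vehicle Configuration Information | Parking Sensors | 0.22 | 0.23 | 0.23 |
Vehicle Configuration Information | Reversing Camera | 0.23 | 0.23 | 0.23 |
Vehicle Configuration Information | Lane Departure Warning | 0.22 | 0.22 | 0.23 |
Vehicle Configuration Information | Collision Warning | 0.22 | 0.23 | 0.23 |
Vehicle Configuration Information | Cruise Control | 0.21 | 0.21 | 0.23 |
Travel Information | Trip Start Time | 0.62 | 0.23 | 0.23 |
Travel Information | Trip End Time | 0.62 | 0.23 | 0.23 |
Travel Information | Trip Duration | 0.62 | 0.62 | 0.22 |
Travel Information | Distance Traveled | 0.62 | 0.63 | 0.22 |
Travel Information | Trip Origin | 0.83 | 0.63 | 0.23 |
Travel Information | Trip Destination | 0.82 | 0.21 | 0.22 |
Travel Information | Route Type | 0.22 | 0.23 | 0.21 |
Travel Information | Traffic Conditions | 0.23 | 0.22 | 0.22 |
Travel Information | Weather Conditions | 0.21 | 0.22 | 0.24 |
Travel Information | Road Type | 0.22 | 0.22 | 0.23 |
Travel Information | Toll Amount | 0.22 | 0.23 | 0.22 |
Travel Information | Parking Fees | 0.23 | 0.22 | 0.22 |
Travel Information | Maintenance Fees | 0.22 | 0.63 | 0.23 |
Travel Information | Insurance Costs | 0.21 | 0.63 | 0.22 |
Driver Information | Driver Age | 0.62 | 0.21 | 0.63 |
Driver Information | Driver Gender | 0.23 | 0.23 | 0.62 |
Driver Information | Driver License Type | 0.82 | 0.22 | 0.22 |
Driver Information | Driving Experience | 0.61 | 0.23 | 0.22 |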
Category | Attribute | ID | Value |
---|---|---|---|
Vehicle Basic Information | Vehicle Identifier | Vehicle_ID | 1/2/3/4/5/6/7… |
Vehicle Basic Information | Vehicle Type | Vehicle_Type | Hatchback/Sedan/SUV/… |
Vehicle Basic Information | Manufacturer | Manufacturer | Chevrolet/Honda/Ford/… |
Vehicle Basic Information | Model | Model | Corolla/Malibu/F-150/… |
Vehicle Basic Information | Year of Production | Year | 2020/2018/2020/… |
Vehicle Basic Information | Color | Color | Black/White/Blue/… |
Vehicle Performance Information | Engine Displacement | Engine_Size | 3.0/5.0/1.5/… |
Vehicle Performance Information | Fuel Type | Fuel_Type | Hybrid/Gasoline/Diesel/… |
Vehicle Performance Information | Fuel Efficiency | Fuel_Efficiency | 19.9/49.2/33/… |
Vehicle Performance Information | Transmission Type | Transmission | Manual/Manual/Automatic/… |
Vehicle Performance Information | Mileage | Mileage | 72,272/43,702/18,400/… |
Vehicle Performance Information | Driving Speed | Speed | 34/83/79/… |
Vehicle Performance Information | Acceleration | Acceleration | 16.97/12.24/12.81/… |
Vehicle Performance Information | Brake Response | Brake_Response | 1.105/1.744/0.797/… |
Vehicle Performance Information | Steering Sensitivity | Steering_Responsiveness | 0.71/0.96/1.74/… |
Vehicle Performance Information | Tire Type | Tire_Type | Winter/Summer/Summer/… |
Vehicle Performance Information | Tire Pressure | Tire_Pressure | 34.89/37.45/32.65/… |
Vehicle Performance Information | Suspension Type | Suspension_Type | Independent/Non-Independent/… |
Vehicle Performance Information | Number of Airbags | Airbag_Count | 5/7/6/… |
Vehicle Performance Information | Seat Capacity | Seating_Capacity | 7/7/7/… |
Vehicle Configuration Information | Location Enabled | Location_Enabled | TRUE/TRUE/TRUE/… |
Vehicle Configuration Information | Bluetooth Enabled | Bluetooth_Enabled | TRUE/FALSE/TRUE/… |
Vehicle Configuration Information | Number of USB Ports | USB_Ports | 4/3/3/… |
Vehicle Configuration Information | Entertainment System | Entertainment_System | Automatic/Manual/Manual/… |
Vehicle Configuration Information | Air Conditioning Control System | Climate_Control | Automatic/Automatic/Manual |
Vehicle Configuration Information | Navigation System | Navigation_System | FALSE/TRUE/TRUE/… |
Vehicle Configuration Information | Traffic Alarm | Traffic_Alert | FALSE/FALSE/FALSE/… |
Vehicle Configuration Information | Parking Sensors | Parking_Sensor | FALSE/FALSE/TRUE/… |
Vehicle Configuration Information | Reversing Camera | Backup_Camera | TRUE/TRUE/TRUE/… |
Vehicle Configuration Information | Lane Departure Warning | Lane_Departure_Warning | FALSE/FALSE/FALSE/… |
Vehicle Configuration Information | Collision Warning | Collision_Warning | FALSE/TRUE/FALSE/… |
Vehicle Configuration Information | Cruise Control | Adaptive_Cruise_Control | TRUE/TRUE/FALSE/… |
Travel Information | Trip Start Time | Start_Time | 9:05:00/10:53/07:38/… |
Travel Information | Trip End Time | End_Time | 11:03/12:10/09:26/… |
Travel Information | Trip Duration | Trip_Duration | 118/165/77/… |
Travel Information | Distance Traveled | Distance_Traveled | 289/93/171/… |
Travel Information | Trip Origin | Origin | Hefei High-Tech Zone, Anhui … |
Travel Information | Trip Destination | Destination | Binhu Square, Hefei, Anhui … |
Travel Information | Route Type | Route_Type | Highway/Mixed/Highway/… |
Travel Information | Traffic Conditions | Traffic_Condition | Moderate/Light/Heavy/… |
Travel Information | Weather Conditions | Weather_Condition | Snowy/Rainy/Sunny/… |
Travel Information | Road Type | Road_Type | Paved/Paved/Unpaved/… |
Travel Information | Toll Amount | Toll_Amount | 17.5/0/19/… |
Travel Information | Parking Fees | Parking_Fee | 10.8/4.6/4.4/… |
Travel Information | Maintenance Fees | Maintenance_Cost | 965/733/269/… |
Travel Information | Insurance Costs | Insurance_Cost | 915/1,263/1,226/… |
Driver Information | Driver Age | Driver_Age | 39/41/22/… |
Driver Information | Driver Gender | Driver_Gender | Male/Female/Female/… |
Driver Information | Driver License Type | Driver_License_Type | Class B/Class B/Class A/… |
Driver Information | Driving Experience | Driver_Experience | 10/8/3 |
Category | Attribute | ID | Value |
---|---|---|---|
Subject Information Category | Project Code | Project_Code | 1/2/3/4/5… |
Subject Information Category | Loan Principal | Loan_Principal | AACo., Ltd./BBCo., Ltd./… |
Subject Information Category | Applicant | Applican_Person | Tom/WilsonBrown/… |
Subject Information Category | ID Card Number | ID_Card | 340321…/34088…/34242… |
Subject Information Category | Certificate Number | ID_Certificate | 340321…/34088…/34242… |
Contract and Voucher Information Category | Loan Contract Number | ID_Loan Contract | 0130…/3402…/1953… |
Contract and Voucher Information Category | Receipt Number | ID_Receipt | 0130…/3402…/1953… |
Contract and Voucher Information Category | Contract Start Date | Contract_Start_Date | 1 August 2025/25 July 2025/… |
Contract and Voucher Information Category | Contract End Date | Contract_End_Date | 3 August 2026/26 July 2027/… |
Core Loan Information Category | Disbursing Bank | Disbursing_BankName | BankA/BankB/BankC/… |
Core Loan Information Category | Loan Amount (RMB) | Loan_Amount | 500,000/80,000/20,000/… |
Core Loan Information Category | Loan Balance (RMB) | Loan_Balance | 500,000/80,000/20,000/… |
Core Loan Information Category | Entered Amount (RMB) | Entered_Amount | 100,000/20,000/10,500/… |
Core Loan Information Category | Overdue Amount (RMB) | Overdue_Amount | 2000/3000/1000/… |
Core Loan Information Category | Loan Start Time | Loan_Start_Time | 1 August 2025/25 July 2025/… |
Core Loan Information Category | Loan End Time | Loan_End_Time | 3 August 2026/26 July 2027/… |
Core Loan Information Category | Credit Limit (RMB) | Credit_Limit | 500,000/80,000/20,000/… |
Core Loan Information Category | Credit Handler | Credit_Handler | Tom/WilsonBrown/… |
Guarantee-related Information Category | Guarantee Amount (RMB) | Guarantee_Amount | 500,000/80,000/20,000/… |
Guarantee-related Information Category | Guarantee Rate (%) | Guarantee_Rate | 0.8/0.6/05/… |
Guarantee-related Information Category | Premium Payment | Premium_Payment | Paid/UnPaid |
Guarantee-related Information Category | First Guarantee or Not | Is_First_Guarantee | YES/NO/NO/… |
Guarantee-related Information Category | Consistency of Counter-Guarantee Measures | Is_Consistency | YES/YES/NO/… |
Project Status and Type Category | Loan Form | Loan_Form | short/long/mid/… |
Project Status and Type Category | Project Type | Project_Type | Agriculture/service industry/… |
Project Status and Type Category | In-Guarantee Status | In-Guarantee_Status | Normal/abnormal/… |
Project Status and Type Category | Product Label | Product_Label | Conventional guarantee/… |
Project Status and Type Category | Risk Resolution Project or Not | Is_Risk_Resolution | NO/NO/YES/… |
Project Status and Type Category | Bank Pre-litigation Project or Not | Is_Bank_Pre-litigation | NO/NO/YES/… |
Geographical Information Category | City | City | Anqing/Fuyang/Nanjing/… |
Geographical Information Category | District | District | Yingshang/Funan/Chizhou/… |
Geographical Information Category | Township | Township | Huanggang/Dayang/Luoling/ |
Management and Time Information Category | Project Manager | Project_Manager | Tom/WilsonBrown/… |
Management and Time Information Category | Creation Time | Creation_Time | 1 August 2025/25 July 2025/… |
Management and Time Information Category | Data Acquisition | Data_Acquisition | Offline/Online/… |
Industry Classification Category | Industry Level 1 | Industry_Level _1 | Agriculture, forestry, animal husbandry and fishery |
Industry Classification Category | Industry Level 2 | Industry_Level_2 | Forestry/…. |
Industry Classification Category | Industry Level 3 | Industry_Level_3 | seedling cultivation/… |
Industry Classification Category | Industry Level 4 | Industry_Level_4 | seedling cultivation/… |
Attribute | Marketing | Regulatory | Operational |
---|---|---|---|
Vehicle_ID | 0.64 | 0.16 | 0.64 |
Vehicle_Type | 0.32 | 0.64 | 0.64 |
Manufacturer | 0.31 | 0.64 | 0.64 |
Model | 0.32 | 0.64 | 0.64 |
Year | 0.32 | 0.64 | 0.64 |
Color | 0.34 | 0.72 | 0.72 |
Engine_Size | 0.32 | 0.64 | 0.64 |
Fuel_Type | 0.32 | 0.64 | 0.64 |
Fuel_Efficiency | 0.36 | 0.64 | 0.32 |
Transmission | 0.32 | 0.64 | 0.64 |
Mileage | 0.64 | 0.64 | 0.32 |
Speed | 0.64 | 0.32 | 0.64 |
Acceleration | 0.64 | 0.64 | 0.64 |
Brake_Response | 0.64 | 0.64 | 0.64 |
Steering_Responsiveness | 0.64 | 0.64 | 0.64 |
Tire_Type | 0.64 | 0.64 | 0.64 |
Tire_Pressure | 0.64 | 0.64 | 0.64 |
Suspensi…_Type | 0.64 | 0.64 | 0.64 |
Airbag_Count | 0.64 | 0.64 | 0.64 |
Seating_Capacity | 0.32 | 0.64 | 0.64 |
Location_Enabled | 0.64 | 0.16 | 0.32 |
Bluet…_Enabled | 0.32 | 0.64 | 0.64 |
USB_Ports | 0.32 | 0.72 | 0.72 |
Entertai…_System | 0.32 | 0.72 | 0.72 |
Climate_Control | 0.32 | 0.72 | 0.72 |
Navigati.._System | 0.68 | 0.19 | 0.36 |
Traffic_Alert | 0.73 | 0.73 | 0.73 |
Parking_Sensor | 0.64 | 0.64 | 0.64 |
Backup_Camera | 0.64 | 0.64 | 0.64 |
Lane_Departu…_Warning | 0.64 | 0.64 | 0.64 |
Collisi…_Warning | 0.64 | 0.64 | 0.64 |
Adapt…_Control | 0.64 | 0.64 | 0.64 |
Start_Time | 0.64 | 0.32 | 0.64 |
End_Time | 0.64 | 0.32 | 0.64 |
Trip_Duration | 0.65 | 0.33 | 0.33 |
Distance_Traveled | 0.64 | 0.32 | 0.32 |
Origin | 0.64 | 0.16 | 0.64 |
Destination | 0.64 | 0.16 | 0.64 |
Route_Type | 0.64 | 0.64 | 0.64 |
Traffic_Condition | 0.64 | 0.64 | 0.64 |
Weat…_Condition | 0.64 | 0.64 | 0.64 |
Road_Type | 0.64 | 0.64 | 0.64 |
Toll_Amount | 0.75 | 0.75 | 0.75 |
Parking_Fee | 0.64 | 0.64 | 0.64 |
Maintenance_Cost | 0.64 | 0.64 | 0.32 |
Insurance_Cost | 0.64 | 0.64 | 0.32 |
Driver_Age | 0.32 | 0.32 | 0.64 |
Driver_Gender | 0.32 | 0.64 | 0.64 |
Driver_L…._Type | 0.64 | 0.16 | 0.64 |
Driver_Experience | 0.64 | 0.32 | 0.64 |

Scenario | Project_Code | Loan_Principal | Applican_Person | ID_Card |
---|---|---|---|---|
Marketing | 0.4 | 0.24 | 0.24 | 0.08 |
Regulatory | 0.64 | 0.24 | 0.24 | 0.08 |
Operational | 0.32 | 0.16 | 0.16 | 0.08 |

Scenario | ID_Certificate | ID_Loan Contract | ID_Receipt | Contract_Start_Date |
---|---|---|---|---|
Marketing | 0.16 | 0.4 | 0.48 | 0.49 |
Regulatory | 0.08 | 0.56 | 0.56 | 0.65 |
Operational | 0.08 | 0.24 | 0.32 | 0.33 |

Scenario | Contract_End_Date | Disbursing_BankName | Loan_Amount | Loan_Balance |
---|---|---|---|---|
Marketing | 0.49 | 0.72 | 0.4 | 0.4 |
Regulatory | 0.65 | 0.72 | 0.56 | 0.56 |
Operational | 0.33 | 0.64 | 0.32 | 0.32 |

Scenario | Entered_Amount | Overdue_Amount | Loan_Start_Time | Loan_End_Time |
---|---|---|---|---|
Marketing | 0.59 | 0.67 | 0.57 | 0.57 |
Regulatory | 0.75 | 0.59 | 0.65 | 0.65 |
Operational | 0.51 | 0.35 | 0.33 | 0.33 |

Scenario | Credit_Limit | Credit_Handler | Guarantee_Amount | Guarantee_Rate |
---|---|---|---|---|
Marketing | 0.32 | 0.43 | 0.4 | 0.42 |
Regulatory | 0.56 | 0.59 | 0.56 | 0.74 |
Operational | 0.24 | 0.27 | 0.32 | 0.49 |

Scenario | Premium_Payment | Is_First_Guarantee | Is_Consistency | Loan_Form |
---|---|---|---|---|
Marketing | 0.4 | 0.39 | 0.68 | 0.48 |
Regulatory | 0.56 | 0.7 | 0.76 | 0.4 |
Operational | 0.32 | 0.47 | 0.52 | 0.24 |

Scenario | Project_Type | In-Guarantee_Status | Product_Label | Is_Risk_Resolution |
---|---|---|---|---|
Marketing | 0.41 | 0.47 | 0.32 | 0.76 |
Regulatory | 0.81 | 0.71 | 0.8 | 0.52 |
Operational | 0.65 | 0.55 | 0.48 | 0.36 |

Scenario | Is_Bank_Pre-litigation | City | District | Township |
---|---|---|---|---|
Marketing | 0.56 | 0.34 | 0.35 | 0.41 |
Regulatory | 0.32 | 0.74 | 0.75 | 0.73 |
Operational | 0.16 | 0.58 | 0.59 | 0.57 |

Scenario | Project_Manager | Creation_Time | Data_Acquisition | Industry_Level_1 |
---|---|---|---|---|
Marketing | 0.42 | 0.56 | 0.4 | 0.37 |
Regulatory | 0.58 | 0.72 | 0.64 | 0.85 |
Operational | 0.26 | 0.56 | 0.48 | 0.61 |

Scenario | Industry_Level_2 | Industry_Level_3 | Industry_Level_4 |
---|---|---|---|
Marketing | 0.33 | 0.39 | 0.38 |
Regulatory | 0.81 | 0.79 | 0.78 |
Operational | 0.57 | 0.55 | 0.54 |

Scenario | Group | Attributes | Average Sensitivity | Grading Level |
---|---|---|---|---|
Marketing | G1 | ‘Transmission’ | 0.67995 | Level 3 |
Marketing | G2 | ‘Manufacturer’, ‘Model’, ‘USB_Ports’, ‘Climate_Control’, ‘Fuel_Efficiency’, ‘Engine_Size’, ‘Year’, ‘Vehicle_Type’, ‘Driver_Gender’, ‘Color’, ‘Fuel_Type’, ‘Bluetooth_Enabled’, ‘Seating_Capacity’, ‘Entertainment_System’, ‘Driver_Age’ | 0.67991 | Level 3 |
Regulatory | G1 | ‘Vehicle_ID’ | 0.84000 | Level 4 |
Regulatory | G2 | ‘Location_Enabled’ | 0.83994 | Level 4 |
Regulatory | G3 | ‘Destination’ | 0.83993 | Level 4 |
Regulatory | G4 | ‘Origin’ | 0.83992 | Level 4 |
Regulatory | G5 | ‘Driver_License_Type’ | 0.83991 | Level 4 |
Regulatory | G6 | ‘Navigation_System’ | 0.80351 | Level 4 |
Regulatory | G7 | ‘Driver_Experience’, ‘Driver_Age’, ‘Start_Time’, ‘Trip_Duration’, ‘Distance_Traveled’, ‘Speed’, ‘End_Time’, ‘Vehicle_Type’ | 0.60663 | Level 3 |
Operational | G1 | ‘Distance_Traveled’, ‘Trip_Duration’, ‘Mileage’, ‘Fuel_Efficiency’, ‘Maintenance_Cost’, ‘Insurance_Cost’, ‘Location_Enabled’ | 0.678383 | Level 3 |
Operational | G2 | ‘Navigation_System’ | 0.643513 | Level 3 |

Scenario | Group | Attributes | Average Sensitivity | Grading Level |
---|---|---|---|---|
Marketing | G1 | ‘ID_Card’ | 0.91823 | Level 4 |
Marketing | G2 | ‘ID_Certificate’ | 0.83859 | Level 4 |
Marketing | G3 | ‘Loan_Principal’ | 0.75847 | Level 4 |
Marketing | G4 | ‘Applican_Person’ | 0.75806 | Level 4 |
Marketing | G5 | ‘Product_Label’ | 0.68011 | Level 3 |
Marketing | G6 | ‘Credit_Limit’ | 0.67759 | Level 3 |
Marketing | G7 | ‘Industry_Level_2’, ‘Industry_Level_1’ | 0.65201 | Level 3 |
Marketing | G8 | ‘Project_Type’ | 0.59498 | Level 3 |
Marketing | G9 | ‘Guarantee_Amount’, ‘Loan_End_Time’, ‘Loan_Amount’, ‘Contract_End_Date’, ‘Entered_Amount’, ‘City’, ‘Project_Code’, ‘ID_Receipt’, ‘Loan_Form’, ‘Loan_Start_Time’, ‘Is_First_Guarantee’, ‘Loan_Balance’, ‘District’, ‘Township’, ‘Guarantee_Rate’, ‘Creation_Time’, ‘Project_Manager’, ‘ID_Loan Contract’, ‘In-Guarantee_Status’, ‘Credit_Handler’, ‘Contract_Start_Date’, ‘Industry_Level_3’, ‘Industry_Level_4’ | 0.55432 | Level 3 |
Marketing | G10 | ‘Data_Acquisition’, ‘Is_Bank_Pre-litigation’, ‘Premium_Payment’ | 0.54667 | Level 3 |
Regulatory | G1 | ‘ID_Certificate’ | 0.91859 | Level 4 |
Regulatory | G2 | ‘ID_Card’ | 0.91823 | Level 4 |
Regulatory | G3 | ‘Loan_Principal’ | 0.75847 | Level 4 |
Regulatory | G4 | ‘Applican_Person’ | 0.75806 | Level 4 |
Regulatory | G5 | ‘Is_Bank_Pre-litigation’ | 0.67999 | Level 3 |
Regulatory | G6 | ‘Guarantee_Amount’, ‘Overdue_Amount’, ‘Credit_Limit’, ‘Loan_Amount’, ‘Loan_Balance’, ‘Credit_Handler’, ‘Loan_Form’, ‘Is_Risk_Resolution’, ‘ID_Receipt’, ‘ID_Loan Contract’, ‘Project_Manager’ | 0.45028 | Level 2 |
Regulatory | G7 | ‘Premium_Payment’ | 0.44000 | Level 2 |
Operational | G1 | ‘ID_Certificate’ | 0.91859 | Level 4 |
Operational | G2 | ‘ID_Card’ | 0.91823 | Level 4 |
Operational | G3 | ‘Is_Bank_Pre-litigation’ | 0.8400 | Level 4 |
Operational | G4 | ‘Loan_Principal’ | 0.83847 | Level 4 |
Operational | G5 | ‘Applican_Person’ | 0.83806 | Level 4 |
Operational | G6 | ‘Loan_Form’ | 0.75999 | Level 4 |
Operational | G7 | ‘ID_Loan Contract’ | 0.75873 | Level 4 |
Operational | G8 | ‘Credit_Limit’ | 0.75759 | Level 4 |
Operational | G9 | ‘Project_Manager’ | 0.74036 | Level 3 |
Operational | G10 | ‘Credit_Handler’ | 0.73389 | Level 3 |
Operational | G11 | ‘Loan_Amount’ | 0.67839 | Level 3 |
Operational | G12 | ‘Guarantee_Amount’ | 0.67767 | Level 3 |
Operational | G13 | ‘Loan_Balance’ | 0.67729 | Level 3 |
Operational | G14 | ‘Overdue_Amount’ | 0.645523 | Level 3 |
Operational | G15 | ‘Is_Risk_Resolution’ | 0.64359 | Level 3 |
Operational | G16 | ‘Data_Acquisition’, ‘Premium_Payment’ | 0.6 | Level 3 |
Operational | G17 | ‘Loan_Start_Time’, ‘Contract_Start_Date’, ‘ID_Receipt’, ‘Is_Consistency’, ‘Guarantee_Rate’, ‘City’, ‘Industry_Level_2’, ‘Project_Code’, ‘Product_Label’, ‘In-Guarantee_Status’, ‘Loan_End_Time’, ‘District’, ‘Township’, ‘Creation_Time’, ‘Industry_Level_4’, ‘Entered_Amount’, ‘Industry_Level_3’, ‘Contract_End_Date’, ‘Is_First_Guarantee’ | 0.52818 | Level 3 |
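
The grading levels reported in the table above are consistent with equal-width cut points at 0.25, 0.50, and 0.75 on the average group sensitivity, although this section does not state the thresholds and the method may use scenario-specific values. The following is a minimal sketch under that assumption; `ASSUMED_CUTS` and `grade_level` are hypothetical names introduced only for illustration, and the sample rows are copied from the table.

```python
from bisect import bisect_right

# ASSUMPTION: equal-width cut points on [0, 1]. They reproduce the levels in
# the table above but are not confirmed by the source.
ASSUMED_CUTS = (0.25, 0.50, 0.75)


def grade_level(avg_sensitivity: float, cuts=ASSUMED_CUTS) -> int:
    """Map an average group sensitivity in [0, 1] to a grading level 1..4."""
    if not 0.0 <= avg_sensitivity <= 1.0:
        raise ValueError("average sensitivity must lie in [0, 1]")
    # bisect_right counts how many cut points are <= the value,
    # so a value exactly on a cut point falls into the higher level.
    return bisect_right(cuts, avg_sensitivity) + 1


# A few (group, average sensitivity, reported level) rows from the table.
SAMPLE_ROWS = [
    ("ID_Card", 0.91823, 4),
    ("Credit_Limit", 0.75759, 4),
    ("Project_Manager", 0.74036, 3),
    ("Premium_Payment", 0.44000, 2),
]

for name, score, reported in SAMPLE_ROWS:
    print(f"{name}: computed Level {grade_level(score)}, reported Level {reported}")
```

With fixed cut points, groups whose averages differ only slightly (for example 0.75759 and 0.74036) land on different levels, so the choice of thresholds matters most for scores near a boundary.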
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Li, Y.; Wu, Z.; Li, J.; Xie, L. A Multi-Scene Automatic Classification and Grading Method for Structured Sensitive Data Based on Privacy Preferences. Future Internet 2025, 17, 384. https://doi.org/10.3390/fi17090384