1. Introduction
Computer technology has been extensively applied to various aspects of daily life, such as security surveillance, search engines, and autonomous driving. However, the inherent uncertainties and dual-use nature of these technologies have exposed the inadequacies of modern ethical standards, giving rise to a series of computer engineering ethics issues, including privacy leaks, algorithmic price discrimination, and algorithmic bias, which pose significant challenges to data privacy and confidentiality [1,2]. The emergence of these issues not only threatens individual rights but also undermines public trust in technological systems, highlighting the urgency of integrating ethical considerations into technological development. In computer engineering ethics research, case analysis serves as a crucial method for uncovering the essence of these problems and exploring potential solutions [3,4]. However, ethical cases often involve highly sensitive data, including personally identifiable information, behavioral motivations, and the complex factors underlying ethical decision-making. Traditional information extraction methods, which rely heavily on direct access to raw data, increase the risk of data exposure and hinder the sustainable development of the technology. It is therefore of paramount importance to construct a framework capable of efficiently extracting data value while robustly protecting privacy.
Using Large Language Models (LLMs) for information extraction represents an advanced application of machine learning techniques in natural language processing. Leveraging machine learning methodologies and frameworks, LLMs can understand and generate human language, producing structured outputs without storing the raw data and thereby significantly reducing privacy risks. Moreover, LLMs exhibit robust Few-Shot and even Zero-Shot learning capabilities, allowing them to rapidly adapt to new tasks with minimal examples or prompts. This reduces reliance on large-scale annotated datasets and further enhances privacy protection. Additionally, prompt engineering techniques can further improve the extraction performance of these models, providing more refined technical support for information extraction tasks. The extracted results can be used to construct knowledge graphs, which organize and represent data graphically, making the information more intuitive and easier to comprehend. Furthermore, intelligent question answering systems can be implemented by integrating LLMs with knowledge graphs through the LangChain framework, offering users in-depth information support: instant answers anytime and anywhere, along with personalized responses and recommendations. Such integration brings significant benefits across multiple dimensions, including efficiency, cost-effectiveness, user experience, service quality, and data analysis.
Currently, research on applying LLMs to privacy-preserving information extraction in ethical cases is still in its infancy, and this article innovatively explores this topic. The main contributions of this article are as follows.
We have collected and organized different types of ethical case datasets, which are stored in their respective local systems. Under the framework of federated learning, models are trained using local data, and only model parameter updates are uploaded to a central server for aggregation, achieving a balance between data privacy protection and shared data analysis. The construction of these datasets provides a solid data foundation for model training and application, supporting subsequent intelligent analysis and decision-making;
We propose an information extraction model for computer engineering ethical cases based on ChatGLM-LtMP. This model is capable of efficiently completing extraction tasks through Few-Shot Prompting, reducing data dependency and minimizing the risk of privacy leakage;
We employed the Least-to-Most (LtM) Prompting method to construct the prompt templates, guiding the model to extract only a portion of the case information at each step, thereby reducing the risk of exposure to sensitive data. Moreover, we optimized the model parameters with the P-Tuning v2 technique to further enhance the model’s performance. Extensive information extraction experiments were conducted on both general and ethics case datasets constructed in this paper, with a comparative analysis against current mainstream models. The results indicate that the ChatGLM-LtMP model outperforms existing methods across multiple metrics;
Additionally, we integrated the Neo4j graph database to construct a knowledge graph that stores only structured information, thus avoiding the storage of raw text from ethics cases and enhancing data security. Furthermore, by leveraging the LangChain framework, we implemented intelligent question answering for cases, setting dynamic prompt restrictions to limit the data accessible, which simultaneously protects privacy and improves the real-time performance and accuracy of the question and answer service. This provides intelligent support for research in the field of computer engineering ethics.
The rest of the paper is organized as follows. Section 2 discusses related work. Section 3 describes the construction of the computer engineering ethics case dataset. Section 4 proposes the ChatGLM-LtMP model. Section 5 presents the information extraction experiments. Section 6 constructs the intelligent question answering system. Section 7 concludes the paper.
3. Constructing Datasets
The scope of ethical cases is broad, and with the advancement of computer and artificial intelligence technologies, the rate of updates is accelerating. To enhance the efficiency of dataset construction and updates, the authors of this paper have taken targeted measures by conducting independent data collection for specific categories of ethical cases. Furthermore, federated learning technology has been employed to ensure data security and privacy protection, enabling secure data sharing among multiple parties. This approach has improved the quality and diversity of the dataset without exposing the original data.
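Concretely, the aggregation step of such a setup can be illustrated with a minimal FedAvg-style sketch; weighting each client's parameters by its local dataset size is standard practice, and the PyTorch state-dict interface here is an assumption for illustration rather than the exact implementation used.

```python
import copy
import torch

def federated_average(client_states, client_sizes):
    """Aggregate client updates: average parameters weighted by local data size."""
    total = float(sum(client_sizes))
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = sum(
            state[key].float() * (n / total)
            for state, n in zip(client_states, client_sizes)
        )
    return avg

# Each round: institutions train locally on their private cases and upload
# model.state_dict(); only these parameter tensors reach the central server,
# while the raw case texts never leave the local system.
```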
In the data acquisition stage, we draw on the vast amount of resource data on the Internet, crawling case information from several authoritative official websites. In addition, we obtain cases from open academic journals and relevant industry reports to supplement the dataset. The dataset not only includes the main text of the cases but also provides rich contextual information, such as case background, outcome analysis, and social impact, which facilitates in-depth research. A wide range of case types, such as privacy protection, data security, and intellectual property rights, is covered to ensure the diversity and representativeness of the data. The data used complies with the requirements for academic research and does not involve commercial use or infringement. The workflow for web data crawling is outlined as follows: first, the URL list of the target pages is obtained, and each page URL is processed one by one; then, the list of case URLs is parsed, extracted, and stored; finally, the case text is extracted from each case URL and written to a local file, until all URLs are processed.
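A condensed sketch of this three-step crawl loop is shown below, assuming the requests and BeautifulSoup libraries; the listing URL pattern, the a.case-link selector, and the output filename are hypothetical placeholders for the actual sites used.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical listing pages; the real sources are the official sites above.
PAGE_URLS = ["https://example.org/ethics-cases?page=%d" % i for i in range(1, 11)]

def collect_case_urls(page_url):
    """Step 2: parse one listing page and return the case-detail URLs on it."""
    soup = BeautifulSoup(requests.get(page_url, timeout=10).text, "html.parser")
    return [a["href"] for a in soup.select("a.case-link")]  # hypothetical selector

def crawl():
    with open("cases.txt", "w", encoding="utf-8") as out:
        for page_url in PAGE_URLS:                            # step 1: page URLs
            for case_url in collect_case_urls(page_url):      # step 2: case URLs
                soup = BeautifulSoup(requests.get(case_url, timeout=10).text,
                                     "html.parser")
                out.write(soup.get_text(" ", strip=True) + "\n")  # step 3: save text

if __name__ == "__main__":
    crawl()
```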
During the data processing phase, we implemented a three-stage progressive cleaning strategy to systematically enhance data quality and compliance. First Stage: Manual Review. Manual screening was performed to remove blank texts and case-irrelevant information (e.g., off-topic comments, duplicate paragraphs, and malformed content). Second Stage: Automated Cleaning via Canopy-K-means Clustering [29]. The Canopy-K-means algorithm, a hybrid method combining coarse-grained pre-clustering (Canopy) with fine-grained precise clustering (K-means), was adopted for noise reduction and anomaly detection in large-scale, high-dimensional data. The workflow proceeded as follows. First, unstructured ethics case texts were encoded into dense vectors using a pre-trained model to generate numerical features suitable for clustering. Second, the data was partitioned using a loose threshold T1 (identifying potentially similar texts) and a tight threshold T2 (excluding highly overlapping candidate cluster centers). For instance, texts related to “autonomous vehicle liability determination” were grouped into Canopy clusters encompassing both technical and legal dimensions. Then, the number of cluster centers K was set proportionally to the square root of the dataset size to avoid overfitting (K too large) or cluster impurity (K too small). K-means++ initialized the centroids, followed by iterative optimization of the objective function; small clusters and edge points were removed as anomalies. Finally, texts with identical MD5 hashes or semantic similarity exceeding a threshold were deduplicated (retaining the earliest occurrence), and outliers (e.g., semantically irrelevant or malformed texts) were flagged for manual verification. This stage reduced duplicate texts by 7%, achieved a 23% anomaly removal rate, and improved efficiency by 40% compared to single-round manual cleaning. Third Stage: Expert Validation. Domain experts reviewed the cleaned data to verify privacy compliance and ensure alignment with the research objectives. The final corpus comprised 2632 validated text entries.
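A condensed sketch of this second stage is given below, assuming sentence-transformers for vectorization and scikit-learn for K-means; the encoder name, the toy input texts, and the thresholds t1/t2 are illustrative, not the exact settings used in our pipeline.

```python
import hashlib
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed encoder library
from sklearn.cluster import KMeans

def dedup_md5(texts):
    """Remove exact duplicates, keeping the earliest occurrence."""
    seen, kept = set(), []
    for t in texts:
        h = hashlib.md5(t.encode("utf-8")).hexdigest()
        if h not in seen:
            seen.add(h)
            kept.append(t)
    return kept

def canopy(vectors, t1=0.6, t2=0.3):
    """Coarse pre-clustering with cosine distance: points within the loose
    threshold t1 join a canopy; points within the tight threshold t2 of a
    center are excluded as future candidate centers."""
    canopies, candidates = [], list(range(len(vectors)))
    while candidates:
        c = candidates.pop(0)
        dists = 1 - vectors[candidates] @ vectors[c]
        canopies.append([c] + [i for i, d in zip(candidates, dists) if d < t1])
        candidates = [i for i, d in zip(candidates, dists) if d >= t2]
    return canopies

texts = dedup_md5([
    "An autonomous vehicle struck a pedestrian; liability is disputed.",
    "An autonomous vehicle struck a pedestrian; liability is disputed.",  # duplicate
    "A data broker sold user location histories without consent.",
    "A facial recognition system misidentified a suspect.",
])
encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
vecs = encoder.encode(texts, normalize_embeddings=True)
coarse = canopy(vecs)                                # candidate groupings
k = max(2, int(np.sqrt(len(texts))))                 # K ~ sqrt(dataset size)
labels = KMeans(n_clusters=k, init="k-means++", n_init=10).fit_predict(vecs)
# Small clusters and edge points would then be flagged for manual review.
```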
Building on insights from domain experts and the requirements of specific application scenarios, we conducted an in-depth analysis of privacy-related information in the case studies. Based on this analysis, 11 entity types and 10 relationship types between entities were defined. The entity types are: the core content of the event (Event); the specific time at which the event occurred (Time); the geographical location of the event, encompassing privacy information of individuals or organizations such as residence, workplace, or movement paths (Address); the full names or titles of the individuals involved (Name); the organizations involved (Organization); the commercial companies involved (Company); the government agencies involved (Government); technological factors (Technology); information related to the legal obligations and rights of individuals or organizations (Legal); ethical judgments and codes of conduct related to individuals or organizations (Ethical); and monetary amounts associated with personal financial situations or business transaction payments (Amount). On this foundation, we defined the relationship types between entities: Event-Time, Event-Location, Involved-Person, Involved-Organization, Involved-Company, Involved-Government, Technology-Factor, Legal-Aspect, Ethical-Aspect, and Involved-Amount. To ensure annotation quality, three annotators independently labeled the 2632 text entries using the Label Studio tool. Inter-annotator agreement was validated through Cohen’s Kappa consistency test. A random subset (10%) of annotated samples was reviewed by domain experts to correct mislabeled instances (e.g., “data collection protocol” erroneously labeled as Technology instead of Legal). Subsequently, the Isolation Forest algorithm was employed to detect anomalous annotations (e.g., relationship types conflicting with the contextual semantics), resulting in the removal of 1.2% of low-quality samples. Ultimately, 10,604 high-confidence entity-relationship pairs were retained for analysis, as summarized in Table 1.
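For illustration, both quality checks can be run with scikit-learn; the label ids and feature matrix below are toy stand-ins for the real annotation data, and the contamination rate mirrors the 1.2% removal rate reported above.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score
from sklearn.ensemble import IsolationForest

# Inter-annotator agreement between two annotators on the same samples
# (labels encoded as entity-type ids; toy values for illustration).
annotator_a = [0, 1, 2, 1, 0, 2, 1]
annotator_b = [0, 1, 2, 0, 0, 2, 1]
kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.3f}")

# Anomaly detection over annotation feature vectors (e.g., embeddings of the
# annotated span concatenated with its relationship type id).
features = np.random.RandomState(0).rand(500, 16)   # stand-in for real features
iso = IsolationForest(contamination=0.012, random_state=0).fit(features)
flags = iso.predict(features)                       # -1 marks anomalous annotations
print(f"flagged {(flags == -1).sum()} of {len(features)} annotations for removal")
```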
To analyze the frequency distribution of entity-relationship types, a histogram was generated with a bin width of 5, as illustrated in Figure 1.
Analysis of the data in Table 1 and Figure 1 reveals a significant distribution bias in entity-relationship types within the dataset. The entity-relationship pairs Technology-Factor and Event-Time exhibit the highest frequencies, indicating that technical factors and temporal information dominate the dataset. This reflects the domain-specific characteristics of computer engineering ethics cases, which are typically driven by computer-related technologies and associated with fixed occurrence times. In contrast, Involved-Government and Involved-Organization appear least frequently, suggesting that events involving government or organizational participation are rare in the dataset. Such imbalance may lead to model bias toward high-frequency categories, insufficient generalization for low-frequency categories, and distortion of evaluation metrics.
To enhance the model’s understanding of low-frequency domain knowledge, we introduce a specialized corpus related to computer engineering ethics for knowledge augmentation. The newly added corpus consists of 12 authoritative documents (totaling 21,247 words), covering the ACM Code of Ethics, GDPR provisions, and other professional literature. The processing workflow is as follows: initially, the raw documents are manually reviewed to eliminate irrelevant expressions; subsequently, regular expressions are employed to remove extraneous content (such as headers and footers) from the files, and OCR is performed on the PDF documents to ensure text integrity; following this, the data annotation process described earlier is applied to annotate 106 entity-relationship pairs; finally, the preprocessed corpus knowledge is integrated with the existing dataset for subsequent model training.
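As a sketch, the header/footer stripping step might be written as follows; the patterns are purely illustrative and would need to be tailored to each source document.

```python
import re

# Illustrative patterns for recurring headers/footers and page numbers;
# real documents require source-specific rules.
BOILERPLATE = [
    re.compile(r"^\s*Page \d+ of \d+\s*$", re.MULTILINE),   # page footers
    re.compile(r"^\s*ACM Code of Ethics.*$", re.MULTILINE), # running header
    re.compile(r"-\n"),                                     # hyphenated line breaks
]

def clean_document(text: str) -> str:
    """Strip boilerplate and collapse the blank-line runs it leaves behind."""
    for pattern in BOILERPLATE:
        text = pattern.sub("", text)
    return re.sub(r"\n{3,}", "\n\n", text).strip()
```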
5. Experimental Results and Discussion of Information Extraction
5.1. Experimental Setup
The experiments were conducted on the Windows 11 × 64 operating system with an Intel Core i9 CPU at a clock frequency of 3.91 GHz. The GPU was an NVIDIA RTX 4090 with 24 GB of video memory. The programming language used was Python 3.11.5. During model training, key parameters such as pre_seq_len and learning_rate were adaptively adjusted; the experimental parameter settings are shown in Table 2.
This study employs Precision (P), Recall (R), and F1 Score as evaluation metrics for the information extraction experiments. A higher F1 Score indicates better model performance [31].
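For reference, the three metrics follow the standard definitions in terms of true positives (TP), false positives (FP), and false negatives (FN):

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F1 = \frac{2 \times P \times R}{P + R}
```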
5.2. Comparative Experiments Under Different Fine-Tuning Methods
To investigate the impact of Zero-Shot-LtM and Few-Shot-LtM on the performance of the ChatGLM-LtMP model, we conducted evaluation experiments on the same ethics case dataset using Zero-Shot-LtM, One-Shot-LtM, Three-Shot-LtM, Five-Shot-LtM, and Few-Shot-LtM (with sample sizes ranging from 6 to 9) [32]. For the Few-Shot-LtM setting, we recorded the highest F1 Score achieved by the model during testing. The experimental results are presented in Table 3.
By analyzing the data, we observe that the model’s F1 Score exhibits an overall increasing trend from Zero-Shot-LtM to Few-Shot-LtM. For instance, the F1 Score for “Involved-Person” increases from 87.09% to 95.33%, representing an improvement of 8.24%, and the F1 Score for “Involved-Government” rises from 82.36% to 91.61%, marking an increase of 9.25%. However, it is noteworthy that as the number of samples increases, the rate of improvement in the F1 Score begins to slow. This phenomenon suggests that, under the current model architecture and dataset conditions, the model can effectively learn and extract most entity-relationships by providing a limited number of annotated samples.
Furthermore, we observe that the F1 Scores for the entity types “Legal-Aspect” and “Ethical-Aspect” under Few-Shot-LtM are relatively low. This phenomenon can be attributed to two main reasons. First, the tail entities associated with these categories are highly abstract and complex. Second, accurately identifying these entity-relationships requires the model to possess specialized knowledge in the legal and ethical domains as well as strong language comprehension and reasoning capabilities, which significantly increases the task’s difficulty. However, it is worth noting that Few-Shot-LtM improves the F1 Score on these two entity-relationship types by 8.26% and 8.04%, respectively, a substantial gain. This finding demonstrates that the characteristics of Few-Shot-LtM hold significant reference value for entity-relationship extraction tasks in resource-constrained or emerging domains. As computer technology advances rapidly and engineering ethics cases continue to evolve, this model offers an effective way to learn complex entity-relationships. Its advantage lies in capturing and recognizing new, complex relationships with limited data, opening up new AI applications in the field.
Furthermore, understanding the relationship between sample size and performance saturation is critical for evaluating the model’s learning capabilities. To this end, we extended our experiments to explore performance variations across different sample sizes, ranging from Few-Shot-LtM (6–9 samples) to Few-Shot-LtM (10–30 samples), and analyzed the saturation points. The experimental results revealed that for most entity-relationship types, performance saturated at approximately 17 samples, with marginal performance gains (<0.05%) observed beyond this point. For instance, the F1 Score for Involved-Person increased by an average of 0.21% between 10–17 samples but only by 0.03% between 18–30 samples. Low-frequency entities exhibited slightly higher saturation points, around 21 samples. For example, the F1 Score for Involved-Government improved by 0.15% between 10–21 samples and by 0.04% between 22–30 samples. These findings suggest that, under the current model architecture and task settings, 17–21 samples represent a reasonable stopping point, balancing near-optimal performance and minimized annotation costs. This insight provides valuable guidance for sample selection in practical applications and lays the groundwork for future research, such as sample optimization in low-resource scenarios.
5.3. Comparative Experiments on Generic Datasets
To validate the effectiveness of the improved ChatGLM-LtMP model in entity-relationship extraction tasks, this study selected three widely used datasets for comparative experiments: ACE2005, DuIE2.0, and Chinese Literature Text [33,34]. The ACE2005 dataset, released by the Linguistic Data Consortium, is a benchmark dataset in the field of entity-relationship extraction. It contains a large volume of English and Chinese data, providing a robust foundation for model evaluation. The corpus of the DuIE2.0 dataset is drawn from Baidu Encyclopedia, Baidu Information Feed, and Baidu Tieba, covering both written and spoken expression; it thus comprehensively tests a model’s ability to extract relationships in real-world scenarios and is similar in expression characteristics to the ethics case dataset constructed in this study. The Chinese Literature Text dataset annotates entities such as characters, time, and locations, along with their interrelationships, sharing certain entity-relationship categories with the ethics case dataset. These datasets exhibit similarities to the dataset constructed in this study across multiple dimensions, making them suitable benchmarks for comparative experiments.
In this study, five entity-relationship recognition models, which are widely used, highly representative, and general-purpose, are selected for the comparison experiments. The experimental results are detailed in Table 4.
BERT + Multi-head [35]: A joint entity-relationship extraction model that integrates the multi-head attention mechanism of the Transformer with the pre-trained capabilities of BERT. It employs a single linear-layer classifier and dropout to evaluate multi-task learning performance, making it suitable for multi-relationship problems.
CasRel-BERT [36]: An entity-relationship extraction model based on BERT. It models relationships as functions that map subjects to objects while identifying all possible relationships for a given subject and their corresponding objects. This approach significantly enhances the model’s ability to extract relationships in complex scenarios.
RIFRE [34]: A representation iterative fusion method based on heterogeneous graph neural networks. It models entities and relationships within a graph structure and iteratively fuses the two types of semantic information using a message-passing mechanism. This yields node representations better suited to relationship extraction tasks, improving model performance.
OneRel [37]: A model that treats joint extraction as a fine-grained triplet classification problem. It consists of a score-based classifier and a relation-specific tagging strategy, providing consistent performance gains in complex scenarios involving overlapping patterns and multiple triplets.
PL-Marker-BERT [38]: A model designed explicitly for Chinese natural language processing tasks. It enhances the model’s understanding of Chinese text by introducing word boundary markers into the BERT architecture.
By analyzing the data in Table 4, we observe that the ChatGLM-LtMP model demonstrates outstanding performance across all three datasets: ACE2005, DuIE2.0, and Chinese Literature Text. It achieves the highest Precision, Recall, and F1 Score among all models, indicating strong generalization capability and practical value in entity-relationship extraction tasks. Furthermore, all models perform best on the Chinese Literature Text dataset, which has the simplest text structure, while their performance is relatively weaker on the DuIE2.0 dataset, which features the most complex text structure. This phenomenon reveals the challenges of, and room for improvement in, processing complex text, and highlights the importance of optimizing models for specific tasks and dataset characteristics. It also points to a direction for further optimization of our model: adding more complex real-world samples to the training data and applying targeted optimization to further improve generalization and robustness, so that the model can serve a broader range of application scenarios.
5.4. Comprehensive Performance Assessment Experiment
5.4.1. Comprehensive Performance Comparison Between ChatGLM-LtMP and Baseline Models
To further validate the performance of the improved ChatGLM-LtMP model in the task of entity-relationship extraction for computer engineering ethics cases, this paper conducts comparative experiments on the same ethics case dataset. The OneRel and PL-Marker-BERT models, which exhibited the best performance in Section 5.3, are selected as the benchmark models. Additionally, two large language models, Qwen-7B-Chat [39] and Baichuan-7B [40], are introduced. Qwen-7B-Chat is a 7-billion-parameter model from Alibaba Cloud’s Tongyi Qianwen large model series, which has undergone extensive pre-training to understand and generate natural language. Baichuan-7B is an open-source large-scale pre-trained model developed by Baichuan Intelligent, achieving state-of-the-art results among models of the same size on multiple authoritative Chinese, English, and multilingual benchmarks. The experimental results are presented in Table 5.
By analyzing the data in Table 5, we observe that the ChatGLM-LtMP model achieves a precision of 94.38%, a recall of 93.06%, and an F1 Score of 93.71%, all the highest among the compared models. Specifically, it outperforms the Baichuan-7B model by 4.04%, 3.78%, and 3.9% in precision, recall, and F1 Score, respectively, and surpasses the Qwen-7B-Chat model by 6.3%, 5.94%, and 5.59% on the same metrics. Compared to the OneRel model, the ChatGLM-LtMP model shows even more substantial gains, with improvements of 14.03%, 13.88%, and 13.95% in precision, recall, and F1 Score, respectively, marking the most notable optimization over the baseline models. Furthermore, by comparing the data in Table 4 and Table 5, we find that the performance of our model on the ethics case dataset is significantly better than on the general datasets, with noticeable increases in precision, recall, and F1 Score.

To evaluate the generalization capability of the ChatGLM-LtMP model, we conducted a 5-fold cross-validation experiment. In each fold, the model was trained on 80% of the data and validated on the remaining 20%. We rigorously monitored the validation loss trajectory and implemented an early stopping strategy, halting training when no significant decrease in validation loss was observed for five consecutive epochs, thereby mitigating overfitting risks. The average cross-validated F1 Score reached 92.86%, close to the test set F1 Score of 93.71%, providing robust evidence against overfitting. Furthermore, the smooth convergence of the training loss curve shows that the model effectively learned the data features during training, with no observable oscillation or divergence, ensuring reliable prediction accuracy and robustness. A cross-lingual performance analysis revealed an 11% discrepancy between Chinese and English validation results; given that the dataset predominantly comprises Chinese texts with limited English content, this performance level satisfies the current research requirements, while future extensions to broader multilingual scenarios will necessitate targeted optimizations. Collectively, these findings demonstrate that the ChatGLM-LtMP model achieves state-of-the-art effectiveness in entity-relationship extraction for computer engineering ethics cases, significantly outperforming existing mainstream methods, and holds substantial promise as a tool for advancing ethical case studies and practical applications.
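The cross-validation protocol can be sketched as follows, using scikit-learn's KFold for splitting; train_one_epoch and evaluate are hypothetical stand-ins for the actual fine-tuning and scoring code, and the patience value mirrors the five-epoch criterion described above.

```python
import numpy as np
from sklearn.model_selection import KFold

def cross_validate(samples, build_model, max_epochs=50, patience=5):
    """5-fold CV with early stopping once validation loss stops improving."""
    fold_scores = []
    kf = KFold(n_splits=5, shuffle=True, random_state=42)
    for train_idx, val_idx in kf.split(samples):
        model = build_model()
        best_loss, best_f1, stale = float("inf"), 0.0, 0
        for _ in range(max_epochs):
            train_one_epoch(model, [samples[i] for i in train_idx])           # hypothetical
            val_loss, val_f1 = evaluate(model, [samples[i] for i in val_idx]) # hypothetical
            if val_loss < best_loss - 1e-4:    # a "significant" decrease
                best_loss, best_f1, stale = val_loss, val_f1, 0
            else:
                stale += 1
                if stale >= patience:          # five flat epochs -> stop
                    break
        fold_scores.append(best_f1)
    return float(np.mean(fold_scores))
```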
5.4.2. Performance Comparison of the Model Across Different Entity Relation Categories
To further analyze the performance of different models in extracting the ten types of entity-relationships, this study compares the results of the Qwen-7B-Chat, Baichuan-7B, and ChatGLM-LtMP models, as illustrated in Figure 5.
By analyzing Figure 5, we observe that the “Event-Time” category achieves high performance across all models, with F1 Scores of 96.06%, 96.44%, and 97.87%, respectively. Similarly, the “Involved-Amount” category also demonstrates strong performance, with F1 Scores reaching 93.65%, 94.56%, and 97.91%. Further analysis of the dataset samples reveals that these two entity types possess clear and objective textual features, and their tail entities are often numerical; this characteristic enables the models to perform the extraction tasks more accurately. In contrast, extraction performance for the “Legal-Aspect” category is relatively lower, with F1 Scores of 77.14%, 81.17%, and 87.5%, and performance for the “Ethical-Aspect” category is also suboptimal, with corresponding scores of 79.69%, 82.45%, and 88.34%. Despite this, the ChatGLM-LtMP model proposed in this study shows significant improvements in F1 Score for both “Ethical-Aspect” and “Legal-Aspect”, with increases of more than 5% over the Qwen-7B-Chat and Baichuan-7B models. This result further validates the proposed model’s relative advantages and practical value in extraction tasks for computer ethics cases.
To determine whether the performance difference between ChatGLM-LtMP and Baichuan-7B is statistically significant, we conducted a paired-sample McNemar test, ensuring the consistency and integrity of the data throughout. The null hypothesis was defined as follows: there is no statistically significant difference in performance between ChatGLM-LtMP and Baichuan-7B, i.e., the two models perform comparably. As shown in Table 6, the McNemar test yielded a p-value of 0.042, which is below the significance level of 0.05, leading to rejection of the null hypothesis. This indicates a statistically significant performance difference between the two models; thus, the experimental results support the conclusion that ChatGLM-LtMP outperforms Baichuan-7B. Additionally, to further evaluate the models’ generalization capabilities, we designed a Zero-Shot experiment to test their performance on unseen ethics cases. The results show that ChatGLM-LtMP achieved an F1 Score of 81.32% in the Zero-Shot scenario, significantly higher than Baichuan-7B’s 78.65%. This suggests that ChatGLM-LtMP not only excels on known data but also exhibits strong generalization on unseen ethics cases.
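A sketch of the significance test, assuming the statsmodels implementation; the 2×2 contingency counts (per-sample agreements and disagreements between the two models) are placeholders for the values underlying Table 6.

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Rows: ChatGLM-LtMP correct / wrong; columns: Baichuan-7B correct / wrong.
# The off-diagonal disagreement cells drive the test; counts are placeholders.
table = np.array([[812, 47],
                  [ 29, 112]])
result = mcnemar(table, exact=False, correction=True)  # chi-square variant
print(f"statistic={result.statistic:.3f}, p-value={result.pvalue:.3f}")
# p < 0.05 rejects the null hypothesis of equal performance.
```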
The aforementioned analysis elucidates the model’s superior performance in conventional data scenarios; however, practical deployment necessitates addressing challenges posed by noise contamination and unseen terminologies. To establish a holistic performance evaluation framework, this study subsequently investigates the model’s robustness under aberrant data environments.
5.4.3. Anomaly Detection and Analysis
To gain deeper insights into the weaknesses of the ChatGLM-LtMP model on specific data types or tasks, and to evaluate its sensitivity to anomalous and privacy-related information, we conducted targeted anomaly detection and analysis. In the test set, 15% annotation errors (e.g., “Technology” entities mislabeled as “Legal”) and 15% domain-specific unknown terms (e.g., “Algorithmic Accountability”) were randomly introduced. The error-correction and adaptation capabilities of ChatGLM-LtMP were systematically observed and compared with the baseline models Qwen-7B-Chat and Baichuan-7B. As summarized in Table 7, this experiment provides critical insights for optimizing the model to ensure robustness and reliability in real-world applications.
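The perturbation step can be sketched as follows; the label set and unknown-term list are illustrative, and the two noise rates match the 15% figures above.

```python
import random

LABELS = ["Technology", "Legal", "Ethical", "Event", "Name"]       # illustrative subset
UNKNOWN_TERMS = ["Algorithmic Accountability", "Model Collapse"]   # illustrative terms

def perturb(test_set, label_noise=0.15, term_noise=0.15, seed=42):
    """Flip 15% of labels and splice unseen terms into 15% of texts."""
    rng = random.Random(seed)
    noisy = []
    for text, label in test_set:
        if rng.random() < label_noise:   # mislabel, e.g., Technology -> Legal
            label = rng.choice([l for l in LABELS if l != label])
        if rng.random() < term_noise:    # inject a domain-unknown term
            text = text + " " + rng.choice(UNKNOWN_TERMS)
        noisy.append((text, label))
    return noisy
```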
The results demonstrate that all of these models exhibit a certain sensitivity to anomalous information. Among the three, ChatGLM-LtMP showed the smallest performance degradation (8.89%) when handling anomalous data, indicating superior robustness. It was followed by Qwen-7B-Chat, with a performance drop of 10.56%, while Baichuan-7B exhibited the largest decline (11.12%), performing worst on anomalous data. These findings highlight the advantages of the proposed model and underscore the importance of optimizing models to address their weaknesses. To further investigate these weaknesses, we manually analyzed 200 error samples and categorized them into six types based on their common characteristics, as summarized in Table 8.
Analyzing the above results, we can identify the following key observations: Firstly, the bias in pre-trained data is a major weakness of the model. Due to insufficient coverage of domain-specific knowledge in the pre-trained data, the model exhibits limited understanding of rare terminologies, with the most prominent errors occurring in the recognition of “Ethical-Aspect” and “Legal-Aspect”. These two categories involve highly specialized domain knowledge, and the imbalanced distribution of texts in the dataset fails to cover all relevant content, leading to a higher probability of model errors. Additionally, the recognition errors for “Involved-Government” are also notable. Further analysis of the dataset samples reveals that the limited volume of data for this entity-relationship results in insufficient model generalization. To address these issues, our preliminary work has already incorporated domain-specific corpora to enhance the representation of domain-specific terminologies. However, experiments indicate that manually constructed domain knowledge bases still face limitations, such as insufficient timeliness due to their dynamic update nature and high costs associated with manual annotation and maintenance. Therefore, we propose a three-phase optimization strategy: (1) constructing a real-time update framework based on dynamic crawling and semantic parsing, focusing on authoritative texts such as EU digital acts and IEEE ethical standards to enable incremental updates to the knowledge base; (2) introducing differential privacy techniques during the data collection phase, adding Laplace noise to sensitive information to ensure anonymized processing of personal data in compliance with the “privacy by design” principle; and (3) designing a dual-channel review mechanism of “machine pre-screening and domain expert verification”, employing an attention-weight-based credibility scoring system to filter out low-quality or conflicting entries.
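For phase (2), the Laplace mechanism adds noise whose scale equals sensitivity/ε; a minimal sketch for a numeric field such as a transaction amount is shown below, with the ε and sensitivity values chosen purely for illustration.

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng=None):
    """Differentially private release: add Laplace(0, sensitivity/epsilon) noise."""
    rng = rng or np.random.default_rng()
    return value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# e.g., anonymizing a payment amount before it enters the shared corpus
private_amount = laplace_mechanism(125000.0, sensitivity=1000.0, epsilon=1.0)
```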
Secondly, data ambiguity and insufficient contextual information have also impacted model performance. Although our prior work optimized model behavior through a two-stage disambiguation framework integrating semantic representation and contextual reasoning, further improvements remain achievable. The entity-relationships with the highest error rates are Event-Time, Involved-Person, and Involved-Company, which is attributed to two main factors: (1) non-standardized temporal expressions (e.g., ambiguous date formats such as “02/03”, which cannot be disambiguated between 3 February and 2 March), and (2) naming conflicts in which corporate and personal names exhibit high similarity (e.g., Ford, Dell, and Tesla), leading to recognition confusion.
Finally, annotation errors significantly degraded model performance. Limited domain expertise among annotators resulted in annotation inconsistencies, particularly for specialized terminologies. We have rectified erroneous labels and developed detailed annotation guidelines to ensure consistency. Notably, errors attributable to intrinsic model limitations accounted for a minimal proportion of total errors. Consequently, our optimization efforts focus on the aforementioned three aspects, with plans to implement targeted test sets designed to systematically monitor model improvement across these dimensions.
5.5. Ablation Experiment
To validate the impact of Least-to-Most Prompting and P-Tuning v2 on the overall performance of the model, this study conducted ablation experiments on the ChatGLM-LtMP model using the same dataset [41]. The experiments involved removing LtM and P-Tuning v2 individually to observe their effects on model performance. The results of these experiments are presented in Table 9.
The results show that when Least-to-Most Prompting is added individually, the F1 Score increases from 87.33% to 90.65%, a gain of 3.32%. When P-Tuning v2 is applied individually, the F1 Score rises from 87.33% to 88.81%, an improvement of 1.48%. When both methods are combined, the F1 Score increases by 6.38%, achieving optimal model performance. This finding indicates that LtMP has a more significant optimization effect compared to P-Tuning v2, while the combination of LtMP and P-Tuning v2 delivers the best performance.
Given that the performance improvement from P-Tuning v2 alone is not substantial, this study employs another widely used strategy, LoRA [42], in a comparative experiment to further validate the impact of different parameter fine-tuning strategies on the model. In the experiment, four distinct ethics case test sets were constructed from the dataset. While keeping the LtM strategy unchanged, the model was fine-tuned using both the LoRA and P-Tuning v2 methods. The experimental results are shown in Figure 6. The results reveal that across all four ethics case test sets, the F1 Score achieved with LoRA is consistently lower than that with P-Tuning v2. P-Tuning v2 enhances task adaptability by introducing continuous trainable prompt vectors, which directly adjust the semantic representation of the input space without modifying the model weights. This characteristic enables greater flexibility in tasks requiring multi-level reasoning and domain knowledge integration, making it particularly suitable for extracting ethical and legal relationships in the ethics cases studied in this paper. In contrast, LoRA updates weight matrices through low-rank decomposition. While parameter-efficient, LoRA exhibits insufficient sensitivity to local semantics, especially when handling low-frequency entities (e.g., Involved-Government), as it struggles to capture domain-specific contextual dependencies. These results confirm that, for entity-relationship extraction tasks in computer engineering ethics cases, P-Tuning v2 is the more suitable fine-tuning approach.
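The two settings can be contrasted using the Hugging Face peft library, where P-Tuning v2 corresponds to deep prefix tuning (analogous to ChatGLM's pre_seq_len); the base model name and hyperparameters below are illustrative, not our exact configuration.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, PrefixTuningConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("THUDM/chatglm3-6b",
                                            trust_remote_code=True)

# LoRA: low-rank updates to the attention weight matrices.
lora = LoraConfig(task_type="CAUSAL_LM", r=8, lora_alpha=32,
                  lora_dropout=0.1, target_modules=["query_key_value"])

# P-Tuning v2 (prefix tuning): trainable continuous prompt vectors injected
# at every layer; the base model weights stay frozen.
ptuning_v2 = PrefixTuningConfig(task_type="CAUSAL_LM", num_virtual_tokens=128)

model = get_peft_model(base, ptuning_v2)   # swap in `lora` for the LoRA run
model.print_trainable_parameters()
```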
7. Conclusions
In this study, an innovative computer engineering ethics case dataset was constructed, enabling cross-institutional data sharing through federated learning while preserving privacy during machine learning on sensitive information. We propose ChatGLM-LtMP, a model that integrates Least-to-Most Prompting and P-Tuning v2. This framework performs efficient Zero-Shot and Few-Shot information extraction, delivering performance that significantly outperforms the baseline models. Furthermore, a knowledge graph was implemented using the Neo4j graph database, coupled with the LangChain framework to enable intelligent question-answering functionality. These advancements provide robust technical support for ethical case analysis and application development.
However, the current dataset exhibits scenario coverage bias, with insufficient representation of rapidly evolving domains such as metaverse ethics and quantum computing ethics. Additionally, the model lacks multimodal data processing capabilities for images, audio, and video. Future work will focus on expanding and refining the dataset while prioritizing the development of multimodal interaction technologies. By integrating diverse data sources, we aim to achieve more natural and efficient human-computer interaction experiences, thereby addressing the demands of increasingly heterogeneous application scenarios.