Moral Judgment with a Large Language Model-Based Agent
Abstract
1. Introduction
- MoralAgent is proposed as an agentic approach that uses LLMs for moral judgment. We systematically survey and synthesize existing theories of moral judgment and design a four-step analysis plan. In addition, dynamic prompt templates, a set of moral principles, and a memory module are designed to support the analysis.
- An information-processing method is designed. The memory module summarizes and stores the result of each step of the analysis plan, and the variable fields of the next prompt template are then automatically filled from this stored memory. The two components work together so that the steps of the analysis plan proceed in order and important information is passed forward efficiently (see the sketch after this list).
- Experiments show that MoralAgent outperforms previous related methods, and its moral judgment process is illustrated with a worked sample.
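The interplay between the memory module and the dynamic prompt templates can be pictured with a minimal Python sketch. Everything named below (MemoryModule, TEMPLATES, call_llm, run_analysis_plan, and the template wording) is a hypothetical illustration rather than the paper's implementation; the LLM call is stubbed so that only the described flow is shown, namely that each step's summarized result is stored in memory and then fills the variable fields of the next prompt template.

```python
# Minimal sketch of the memory module / dynamic prompt template interaction.
# All names and template wordings here are hypothetical illustrations.

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; returns a placeholder summary."""
    return f"<summary of: {prompt[:40]}...>"

# Dynamic prompt templates for the four-step analysis plan.
# Variable fields ({scenario}, {memory}) are filled from the memory module.
TEMPLATES = {
    "key_information_extraction":
        "Scenario: {scenario}\nExtract the key information (who, what, where, when, why, how).",
    "preliminary_analysis":
        "Known so far:\n{memory}\nGive a preliminary moral analysis of the action.",
    "multidimensional_analysis":
        "Known so far:\n{memory}\nAnalyze the action along multiple moral dimensions (intent, consequences, norms).",
    "comprehensive_decision_making":
        "Known so far:\n{memory}\nCombine the analyses above and judge whether the action is morally acceptable.",
}

class MemoryModule:
    """Summarizes and stores the result of each analysis step."""
    def __init__(self):
        self.records: list[tuple[str, str]] = []

    def store(self, step: str, result: str) -> None:
        self.records.append((step, result))

    def recall(self) -> str:
        # Concatenated summaries become the {memory} field of the next template.
        return "\n".join(f"[{step}] {result}" for step, result in self.records)

def run_analysis_plan(scenario: str) -> str:
    memory = MemoryModule()
    for step, template in TEMPLATES.items():
        prompt = template.format(scenario=scenario, memory=memory.recall())
        result = call_llm(prompt)   # one step of the analysis plan
        memory.store(step, result)  # summarized result feeds the next step
    return memory.records[-1][1]    # the final step yields the moral judgment

print(run_analysis_plan("A person cuts in line at a hospital to reach a dying relative."))
```

In MoralAgent itself, each of the four steps (Sections 3.3.1–3.3.4) uses its own carefully designed prompt template together with the formulated moral principles; the sketch only shows how stored summaries propagate into the next template's variable fields.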
2. Related Works
2.1. Theoretical Research on Human Moral Judgment
2.2. Research on Moral Judgment in LLMs
2.3. Research on LLM Agents
3. Methodology
3.1. Task Definition
3.2. Design of the MoralAgent Method
3.2.1. Design of the Analysis Process
3.2.2. Design of the Memory Module
3.2.3. Formulation of the Moral Principles
3.3. Execution Process of the MoralAgent
3.3.1. Key Information Extraction
3.3.2. Preliminary Analysis
3.3.3. Multidimensional Analysis
3.3.4. Comprehensive Decision-Making
4. Experimental Design
4.1. Introduction to the Dataset
4.2. Baseline Methods
1. Simple Baseline Methods. These include Random Baseline and Always No. Random Baseline chooses an answer at random in each moral scenario, while Always No judges the action as inappropriate in every scenario.
2. BERT Family of Models. Fine-tuned pretrained encoder models: BERT-base and BERT-large [41], RoBERTa-large [42], and ALBERT-xxlarge [43].
3. Delphi Family of Models [44]. Delphi is trained on a dataset of 1.7 million moral judgments, while Delphi++ is trained on top of Delphi with an additional 200,000 examples.
4. GPT Family of Models. Generative models used via prompting: GPT-3 [45], text-davinci-002 [46], and GPT-3.5-turbo-instruct.
5. Chain of Thought (CoT). The CoT method [47] adds “Let’s think step by step” to the instruction to guide the model through step-by-step reasoning. MORALCoT [9], grounded in contractualism, evaluates whether an action violates moral principles. Self-Ask [48] prompts the model to generate and answer relevant sub-questions sequentially before making a moral judgment. ECMoral [10] proposes an “emotion-cognition” collaborative reasoning method, and Auto-ECMoral [10] is its automatic variant, which guides the LLM to generate and answer the reasoning questions itself. (Rough prompt sketches for the CoT and Self-Ask styles follow this list.)
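For concreteness, the prompting-based baselines differ mainly in how the instruction is wrapped around the scenario. The helpers below (build_cot_prompt, build_self_ask_prompt) are hypothetical and only approximate the cited prompting styles; the exact wording used in [47,48] and in the paper's experiments may differ.

```python
# Rough sketch of how two prompting baselines wrap a moral scenario.
# The exact prompts in the cited papers differ; these helpers are illustrative only.

def build_cot_prompt(scenario: str) -> str:
    """Zero-shot CoT style [47]: append a step-by-step cue to the instruction."""
    return (
        f"Scenario: {scenario}\n"
        "Is the action in this scenario morally acceptable?\n"
        "Let's think step by step."
    )

def build_self_ask_prompt(scenario: str) -> str:
    """Self-Ask style [48]: have the model pose and answer sub-questions first."""
    return (
        f"Scenario: {scenario}\n"
        "Before answering, generate the follow-up questions needed to judge the action, "
        "answer each of them in turn, and only then give the final moral judgment."
    )

scenario = "Someone cuts in line at the airport because their flight leaves in 20 minutes."
print(build_cot_prompt(scenario))
print(build_self_ask_prompt(scenario))
```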
4.3. Experimental Metrics
5. Analysis of Experimental Results
5.1. Comparative Experiment
5.2. Ablation Study
5.3. Generalization Ability of the MoralAgent
5.4. Sample Analysis
6. Conclusions
6.1. Summary of This Study
6.2. Limitations and Future Directions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| LLMs | Large Language Models |
| LLM | Large Language Model |
| AI | Artificial Intelligence |
| MM | Memory Module |
| CoT | Chain of Thought |
| TP | True Positive |
| TN | True Negative |
| FP | False Positive |
| FN | False Negative |
References
- Sheng, E.; Chang, K.-W.; Natarajan, P.; Peng, N. Towards Controllable Biases in Language Generation. arXiv 2020, arXiv:2005.00268. [Google Scholar] [CrossRef]
- Cheng, P.; Hao, W.; Yuan, S.; Si, S.; Carin, L. FairFil: Contrastive Neural Debiasing Method for Pretrained Text Encoders. arXiv 2021, arXiv:2103.06413. [Google Scholar] [CrossRef]
- Berg, H.; Hall, S.M.; Bhalgat, Y.; Yang, W.; Kirk, H.R.; Shtedritski, A.; Bain, M. A Prompt Array Keeps the Bias Away: Debiasing Vision-Language Models with Adversarial Learning. arXiv 2022, arXiv:2203.11933. [Google Scholar] [CrossRef]
- Qian, J.; Dong, L.; Shen, Y.; Wei, F.; Chen, W. Controllable Natural Language Generation with Contrastive Prefixes. arXiv 2022, arXiv:2202.13257. [Google Scholar] [CrossRef]
- Wang, Y.; Kordi, Y.; Mishra, S.; Liu, A.; Smith, N.A.; Khashabi, D.; Hajishirzi, H. Self-Instruct: Aligning Language Models with Self-Generated Instructions. arXiv 2023, arXiv:2212.10560. [Google Scholar] [CrossRef]
- A Public and Large-Scale Expert Information Fusion Method and Its Application: Mining Public Opinion via Sentiment Analysis and Measuring Public Dynamic Reliability. Information Fusion 2022, 78, 71–85. [CrossRef]
- Sun, Z.; Shen, Y.; Zhou, Q.; Zhang, H.; Chen, Z.; Cox, D.; Yang, Y.; Gan, C. Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision. Adv. Neural Inf. Process. Syst. 2023, 36, 2511–2565. [Google Scholar]
- Liu, H.; Sferrazza, C.; Abbeel, P. Chain of Hindsight Aligns Language Models with Feedback. arXiv 2023, arXiv:2302.02676. [Google Scholar] [CrossRef]
- Jin, Z.; Levine, S.; Gonzalez Adauto, F.; Kamal, O.; Sap, M.; Sachan, M.; Mihalcea, R.; Tenenbaum, J.; Schölkopf, B. When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment. Adv. Neural Inf. Process. Syst. 2022, 35, 28458–28473. [Google Scholar]
- Wu, D.; Zhao, Y.; Qin, B. A Joint Emotion-Cognition Based Approach for Moral Judgement. Available online: https://crad.ict.ac.cn/article/doi/10.7544/issn1000-1239.202330812 (accessed on 16 June 2025).
- Hua, W.; Fan, L.; Li, L.; Mei, K.; Ji, J.; Ge, Y.; Hemphill, L.; Zhang, Y. War and Peace (WarAgent): Large Language Model-Based Multi-Agent Simulation of World Wars. arXiv 2024, arXiv:2311.17227. [Google Scholar] [CrossRef]
- Yang, J.; Fu, J.; Zhang, W.; Cao, W.; Liu, L.; Peng, H. MoE-AGIQA: Mixture-of-Experts Boosted Visual Perception-Driven and Semantic-Aware Quality Assessment for AI-Generated Images. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 17–18 June 2024; pp. 6395–6404. [Google Scholar] [CrossRef]
- Yang, J.; Fu, J.; Zhang, Z.; Liu, L.; Li, Q.; Zhang, W.; Cao, W. Align-IQA: Aligning Image Quality Assessment Models with Diverse Human Preferences via Customizable Guidance. In Proceedings of the MM ‘24: The 32nd ACM International Conference on Multimedia, Melbourne, VIC, Australia, 28 October–1 November 2024. [Google Scholar]
- Dong, L.; Jiang, F.; Peng, Y.; Wang, K.; Yang, K.; Pan, C.; Schober, R. LAMBO: Large AI Model Empowered Edge Intelligence. IEEE Commun. Mag. 2025, 63, 88–94. [Google Scholar] [CrossRef]
- Li, X.; Deng, R.; Wei, J.; Wu, X.; Chen, J.; Yi, C.; Cai, J.; Niyato, D.; Shen, X. AIGC-Driven Real-Time Interactive 4D Traffic Scene Generation in Vehicular Networks. IEEE Netw. 2025, early access. [Google Scholar] [CrossRef]
- Shaver, K.G. The Attribution of Blame: Causality, Responsibility, and Blameworthiness; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012. [Google Scholar]
- Weiner, B. Judgments of Responsibility: A Foundation for a Theory of Social Conduct; Guilford Press: New York, NY, USA, 1995. [Google Scholar]
- Cushman, F. Crime and Punishment: Distinguishing the Roles of Causal and Intentional Analyses in Moral Judgment. Cognition 2008, 108, 353–380. [Google Scholar] [CrossRef] [PubMed]
- Knobe, J. Person as Scientist, Person as Moralist. Behav. Brain Sci. 2010, 33, 315–329. [Google Scholar] [CrossRef]
- Haidt, J.; Hersh, M.A. Sexual Morality: The Cultures and Emotions of Conservatives and Liberals. J. Appl. Soc. Psychol. 2001, 31, 191–221. [Google Scholar] [CrossRef]
- Greene, J.D. The Secret Joke of Kant’s Soul. Moral Psychol. 2008, 3, 35–79. [Google Scholar]
- Malle, B.F.; Guglielmo, S.; Monroe, A.E. A Theory of Blame. Psychol. Inq. 2014, 25, 147–186. [Google Scholar] [CrossRef]
- Garrigan, B.; Adlam, A.L.; Langdon, P.E. Moral Decision-Making and Moral Development: Toward an Integrative Framework. Dev. Rev. 2018, 49, 80–100. [Google Scholar] [CrossRef]
- Crick, N.R.; Dodge, K.A. A Review and Reformulation of Social Information-Processing Mechanisms in Children’s Social Adjustment. Psychol. Bull. 1994, 115, 74–101. [Google Scholar] [CrossRef]
- Van Bavel, J.; FeldmanHall, O.; Mende-Siedlecki, P. The Neuroscience of Moral Cognition: From Dual Processes to Dynamic Systems. Curr. Opin. Psychol. 2015, in press. [Google Scholar] [CrossRef]
- Yang, Z.; Yi, X.; Li, P.; Liu, Y.; Xie, X. Unified Detoxifying and Debiasing in Language Generation via Inference-Time Adaptive Optimization. arXiv 2023, arXiv:2210.04492. [Google Scholar] [CrossRef]
- Lu, K.; Mardziel, P.; Wu, F.; Amancharla, P.; Datta, A. Gender Bias in Neural Natural Language Processing. In Logic, Language, and Security; Nigam, V., Ban Kirigin, T., Talcott, C., Guttman, J., Kuznetsov, S., Thau Loo, B., Okada, M., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2020; Volume 12300, pp. 189–202. [Google Scholar] [CrossRef]
- Ganguli, D.; Askell, A.; Schiefer, N.; Liao, T.I.; Lukošiūtė, K.; Chen, A.; Goldie, A.; Mirhoseini, A.; Olsson, C.; Hernandez, D.; et al. The Capacity for Moral Self-Correction in Large Language Models. arXiv 2023, arXiv:2302.07459. [Google Scholar] [CrossRef]
- Saunders, W.; Yeh, C.; Wu, J.; Bills, S.; Ouyang, L.; Ward, J.; Leike, J. Self-Critiquing Models for Assisting Human Evaluators. arXiv 2022, arXiv:2206.05802. [Google Scholar] [CrossRef]
- Jiang, W.; Wang, Y.; Jiang, Y.; Chen, J.; Xu, Y.; Tan, L. Research on Mobile Internet Mobile Agent System Dynamic Trust Model for Cloud Computing. China Commun. 2019, 16, 174–194. [Google Scholar] [CrossRef]
- Jiang, W.; Liu, W.; Xia, H.; Xu, Y.; Cao, D.; Liang, G. Research on Intelligent Mobile Commerce Transaction Security Mechanisms Based on Mobile Agent. CMC 2020, 65, 2543–2555. [Google Scholar] [CrossRef]
- Bang, Y.; Cahyawijaya, S.; Lee, N.; Dai, W.; Su, D.; Wilie, B.; Lovenia, H.; Ji, Z.; Yu, T.; Chung, W.; et al. A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity. arXiv 2023, arXiv:2302.04023. [Google Scholar] [CrossRef]
- Zhu, G.; Cai, C.; Pan, B.; Wang, P. A Multi-Agent Linguistic-Style Large Group Decision-Making Method Considering Public Expectations. Int. J. Comput. Intell. Syst. 2021, 14, 188. [Google Scholar] [CrossRef]
- Jiang, F.; Dong, L.; Wang, K.; Yang, K.; Pan, C. Distributed Resource Scheduling for Large-Scale MEC Systems: A Multiagent Ensemble Deep Reinforcement Learning with Imitation Acceleration. IEEE Internet Things J. 2022, 9, 6597–6610. [Google Scholar] [CrossRef]
- Xi, Z.; Chen, W.; Guo, X.; He, W.; Ding, Y.; Hong, B.; Zhang, M.; Wang, J.; Jin, S.; Zhou, E.; et al. The Rise and Potential of Large Language Model Based Agents: A Survey. Sci. China Inf. Sci. 2025, 68, 121101. [Google Scholar] [CrossRef]
- Bran, A.M.; Cox, S.; Schilter, O.; Baldassari, C.; White, A.D.; Schwaller, P. ChemCrow: Augmenting Large-Language Models with Chemistry Tools. arXiv 2023, arXiv:2304.05376. [Google Scholar] [CrossRef]
- Swan, M.; Kido, T.; Roland, E.; dos Santos, R.P. Math Agents: Computational Infrastructure, Mathematical Embedding, and Genomics. arXiv 2023, arXiv:2307.02502. [Google Scholar] [CrossRef]
- Kiley Hamlin, J.; Wynn, K.; Bloom, P. Three-month-olds Show a Negativity Bias in Their Social Evaluations. Dev. Sci. 2010, 13, 923–929. [Google Scholar] [CrossRef]
- Hamborg, F.; Breitinger, C.; Schubotz, M.; Lachnit, S.; Gipp, B. Extraction of Main Event Descriptors from News Articles by Answering the Journalistic Five W and One H Questions. In Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries, Fort Worth, TX, USA, 3–7 June 2018; pp. 339–340. [Google Scholar] [CrossRef]
- Jin, P.; Mu, L.; Zheng, L.; Zhao, J.; Yue, L. News Feature Extraction for Events on Social Network Platforms. In Proceedings of the 26th International Conference on World Wide Web Companion—WWW ’17 Companion, Perth, Australia, 3–7 April 2017; pp. 69–78. [Google Scholar] [CrossRef]
- Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Volume 1 (Long and Short Papers), pp. 4171–4186. [Google Scholar]
- Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019, arXiv:1907.11692. [Google Scholar] [CrossRef]
- Lan, Z.; Chen, M.; Goodman, S.; Gimpel, K.; Sharma, P.; Soricut, R. ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations. arXiv 2020, arXiv:1909.11942. [Google Scholar] [CrossRef]
- Jiang, L.; Hwang, J.D.; Bhagavatula, C.; Bras, R.L.; Liang, J.; Dodge, J.; Sakaguchi, K.; Forbes, M.; Borchardt, J.; Gabriel, S.; et al. Can Machines Learn Morality? The Delphi Experiment. arXiv 2022, arXiv:2110.07574. [Google Scholar] [CrossRef]
- Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language Models Are Few-Shot Learners. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: New York, NY, USA, 2020; Volume 33, pp. 1877–1901. [Google Scholar]
- Ouyang, L.; Wu, J.; Jiang, X.; Almeida, D.; Wainwright, C.; Mishkin, P.; Zhang, C.; Agarwal, S.; Slama, K.; Ray, A. Training Language Models to Follow Instructions with Human Feedback. Adv. Neural Inf. Process. Syst. 2022, 35, 27730–27744. [Google Scholar]
- Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Xia, F.; Chi, E.; Le, Q.V.; Zhou, D. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Adv. Neural Inf. Process. Syst. 2022, 35, 24824–24837. [Google Scholar]
- Press, O.; Zhang, M.; Min, S.; Schmidt, L.; Smith, N.A.; Lewis, M. Measuring and Narrowing the Compositionality Gap in Language Models. arXiv 2023, arXiv:2210.03350. [Google Scholar] [CrossRef]
| Dataset | Number of Scenes |
|---|---|
| Do not cut in line (Line) | 66 |
| Do not damage others’ property (Prop) | 54 |
| No cannonballing into the pool (Cann) | 28 |
| Total | 148 |
| Group | Model/Method | F1 (↑) | Accuracy (↑) | Cons | MAE (↓) | CE (↓) | Line F1 (↑) | Prop F1 (↑) | Cann F1 (↑) |
|---|---|---|---|---|---|---|---|---|---|
| Simple Baseline Methods | Random Baseline | 49.37 (4.50) | 48.82 (4.56) | 40.08 (2.85) | 0.35 (0.02) | 1.00 (0.09) | 44.88 (7.34) | 57.55 (10.34) | 48.36 (1.67) |
| | Always No | 45.99 (0.00) | 60.81 (0.00) | 100.00 (0.00) | 0.258 (0.00) | 0.70 (0.00) | 33.33 (0.00) | 70.60 (0.00) | 33.33 (0.00) |
| BERT Family of Models | BERT-base | 45.28 (6.41) | 48.87 (10.52) | 64.16 (21.36) | 0.26 (0.02) | 0.82 (0.19) | 40.81 (8.93) | 51.65 (22.04) | 43.51 (11.12) |
| | BERT-large | 52.49 (1.95) | 56.53 (2.73) | 69.61 (16.79) | 0.27 (0.01) | 0.71 (0.01) | 42.53 (2.72) | 62.46 (6.46) | 45.46 (7.20) |
| | RoBERTa-large | 23.76 (2.02) | 39.64 (0.78) | 0.75 (0.65) | 0.30 (0.01) | 0.76 (0.02) | 34.96 (3.42) | 6.89 (0.00) | 38.32 (4.32) |
| | ALBERT-xxlarge | 22.07 (0.00) | 39.19 (0.00) | 0.00 (0.00) | 0.46 (0.00) | 1.41 (0.04) | 33.33 (0.00) | 6.89 (0.00) | 33.33 (0.00) |
| Delphi Family of Models | Delphi | 48.51 (0.42) | 61.26 (0.78) | 97.70 (1.99) | 0.42 (0.01) | 2.92 (0.23) | 33.33 (0.00) | 70.60 (0.00) | 44.29 (2.78) |
| | Delphi++ | 58.27 (0.00) | 62.16 (0.00) | 76.79 (0.00) | 0.34 (0.00) | 1.34 (0.00) | 36.61 (0.00) | 70.60 (0.00) | 40.81 (0.00) |
| GPT Family of Models | GPT3 | 52.32 (3.14) | 58.95 (3.72) | 80.67 (15.50) | 0.27 (0.02) | 0.72 (0.03) | 36.53 (3.70) | 72.58 (6.01) | 41.20 (7.54) |
| | text-davinci-002 | 53.94 (5.48) | 64.36 (2.43) | 98.52 (1.91) | 0.38 (0.04) | 1.59 (0.43) | 42.40 (7.17) | 70.00 (0.00) | 50.48 (11.67) |
| | GPT-3.5-turbo-instruct | 53.13 (6.27) | 62.84 (3.41) | 70.45 (10.10) | 0.39 (0.03) | 1.57 (0.37) | 37.66 (6.12) | 48.01 (3.04) | 65.75 (7.32) |
| Chain of Thought | CoT | 62.02 (4.68) | 62.84 (6.02) | 58.46 (17.50) | 0.40 (0.02) | 4.87 (0.73) | 54.40 (4.30) | 72.50 (11.11) | 59.57 (5.07) |
| | MORALCoT | 64.47 (5.31) | 66.05 (4.43) | 66.96 (2.11) | 0.38 (0.02) | 3.20 (0.30) | 62.10 (5.13) | 70.68 (5.14) | 54.04 (1.43) |
| | Self-Ask | 53.58 (2.46) | 62.84 (1.23) | 93.62 (1.14) | 0.40 (0.02) | 4.57 (0.85) | 42.50 (4.26) | 72.44 (2.68) | 46.90 (1.20) |
| | ECMoral | 71.98 (1.76) | 72.13 (1.50) | 50.16 (12.87) | 0.29 (0.02) | 1.78 (0.27) | 66.24 (3.90) | 85.56 (8.03) | 53.95 (4.44) |
| | Auto-ECMoral | 67.70 (2.14) | 68.58 (2.79) | 59.53 (19.75) | 0.31 (0.01) | 1.75 (0.37) | 59.46 (2.94) | 81.30 (5.69) | 55.26 (4.94) |
| MoralAgent | | 76.11 (1.60) | 81.76 (1.54) | 53.26 (8.87) | 0.29 (0.02) | 1.77 (0.29) | 75.76 (3.23) | 80.00 (5.78) | 74.07 (3.92) |
| Ablation Component | F1 | Line | Prop | Cann |
|---|---|---|---|---|
| MoralAgent | 76.11 | 75.76 | 80.00 | 74.07 |
| Ablation of Key Information Extraction | 66.67 | 61.54 | 73.68 | 74.07 |
| Ablation of Preliminary Analysis | 73.21 | 73.85 | 76.19 | 69.23 |
| Ablation of Multidimensional Analysis | 59.65 | 59.70 | 55.56 | 62.07 |
| Ablation of Comprehensive Decision-Making | 61.11 | 56.25 | 66.67 | 69.23 |
| Method | F1 | Line | Prop | Cann |
|---|---|---|---|---|
| GPT-4o | 76.64 | 71.43 | 85.71 | 80.00 |
| MORALCoT | 76.92 | 76.17 | 75.68 | 75.68 |
| ECMoral | 77.78 | 76.19 | 86.96 | 75.68 |
| MoralAgent | 81.16 | 77.65 | 85.71 | 87.50 |
| Method | F1 | Accuracy |
|---|---|---|
| GPT-3.5-turbo-instruct | 61.41 | 60.23 |
| MORALCoT | 51.33 | 57.52 |
| ECMoral | 63.89 | 61.17 |
| MoralAgent | 68.67 | 63.32 |