Article

AcademiCraft: Transforming Writing Assistance for English for Academic Purposes with Multi-Agent System Innovations

Graduate School of Information, Production and Systems, Waseda University, Fukuoka 8080135, Japan
* Author to whom correspondence should be addressed.
Information 2025, 16(4), 254; https://doi.org/10.3390/info16040254
Submission received: 10 February 2025 / Revised: 6 March 2025 / Accepted: 13 March 2025 / Published: 21 March 2025
(This article belongs to the Special Issue Semantic Web and Language Models)

Abstract
In the realm of academic English writing, both native (L1) and non-native (L2) speakers face significant challenges due to the complex linguistic structures and conventions required. Existing writing assistance tools, while useful for grammar correction and text enhancement, often fall short by providing only corrected output without explanations, and they typically operate as opaque, proprietary systems. This study introduces AcademiCraft, a novel EAP writing assistant built upon a multi-agent system (MAS) and advanced large language models (LLMs), integrating state-of-the-art natural language processing (NLP) and linguistic research. AcademiCraft distinguishes itself by correcting and refining academic texts and offering detailed explanations for its revisions, thereby deepening the user’s understanding of writing conventions. The tool’s open and replicable framework contrasts sharply with the closed nature of most commercial products. Through a series of rigorous scientific benchmarks, AcademiCraft has been shown to outperform leading commercial writing assistance tools in several key areas, including its ability to handle complex syntactic structures, improve coherence and cohesion in academic texts, and provide contextually accurate word choices, all while maintaining high levels of grammatical precision. This paper details the methodologies and underlying principles of AcademiCraft and presents a comparative analysis of its performance, demonstrating its superior capability in supporting both L1 and L2 writers in mastering the complexities of academic English writing.

Graphical Abstract

1. Introduction

English for Academic Purposes (EAP) writing presents a formidable challenge, demanding a high degree of linguistic proficiency and academic rigor. This challenge is not only prevalent among L2 English speakers but also affects L1 writers who must navigate complex linguistic phenomena [1] to produce clear, coherent, and academically appropriate texts. The complexities inherent in EAP stem from its strict adherence to grammatical norms, precise vocabulary usage, and the construction of logically structured arguments, which are essential for academic communication and often pose significant hurdles in learning and mastering academic writing skills.
In response to these challenges, various EAP writing assistance tools have been developed to aid writers in improving their text quality. These tools typically focus on grammatical error correction (GEC) and text polishing to enhance readability and adherence to academic standards. However, while these tools automate the correction process, they often fall short in several key areas. Most notably, current EAP writing assistance systems tend to deliver corrected outputs without providing detailed explanations for the revisions made. This lack of transparency can hinder writers’ linguistic development, as understanding the rationale behind specific corrections is crucial for improving writing skills.
Moreover, the proprietary nature of the technology behind most commercial EAP writing tools obscures their operational principles, which limits users’ and researchers’ ability to assess and learn from the technology. Without access to the underlying algorithms and methodologies, it is difficult for users to critically evaluate the suggestions made by these tools or for educational technologists to integrate these tools effectively into broader educational contexts.
Recognizing these gaps, our research incorporates the latest advances in NLP and integrates substantial linguistic insights to develop AcademiCraft, an innovative EAP writing assistant. What sets AcademiCraft apart is its dual focus—not only does it provide essential GEC and text enhancements, but it also offers detailed explanations for each revision. This functionality is crafted to deepen users’ understanding of common writing errors and the most effective correction strategies, thus aligning with the educational objectives of EAP instruction.
By marrying NLP technology with thorough linguistic analysis, AcademiCraft strives to be a transparent, educational tool that does more than just improve text quality—it aims to enhance the writer’s skill and understanding. This approach serves the dual purpose of producing superior academic texts and fostering essential writing skills for sustained academic achievement. In the subsequent sections, we will explore in detail the methodologies, operational principles, strengths, and limitations of AcademiCraft, showcasing how it supports both L1 and L2 writers in mastering the complexities of academic English writing.
The primary contributions of this work are threefold:
  • We introduce AcademiCraft, a novel MAS-based EAP writing assistant that not only corrects and enhances texts but also provides detailed explanations for revisions, fostering a deeper understanding of academic writing conventions.
  • We develop an innovative approach that integrates LLM-based agents for different aspects of text revision, including grammar correction, sentence-level enhancement, paragraph coherence, and chapter-level organization, providing comprehensive writing support.
  • Through extensive empirical evaluation, we demonstrate AcademiCraft’s superior performance compared to leading commercial tools across multiple metrics, including grammatical accuracy, text coherence, and academic language use.

2. Related Work

2.1. EAP

EAP is a specialized branch of English language education aimed at preparing learners for the linguistic demands of higher education and professional academic settings. EAP programs are designed to develop critical language skills tailored to the specific needs of academic contexts, focusing not only on general language proficiency but also on the mastery of academic conventions and genres [2].
In EAP, academic writing is a central skill [3] that encompasses understanding and producing various types of academic texts, such as research papers, theses, dissertations, and conference presentations. These texts demand a high level of formality, precision, and adherence to the conventions of academic discourse. The challenges in EAP writing can be extensive, ranging from structuring arguments coherently to employing appropriate academic tone and style [4,5,6,7].
For L1 speakers, the primary challenge often involves honing their ability to convey complex ideas succinctly and formally [8], transitioning from everyday communicative styles to the specialized registers of academic English. They must learn to deploy advanced syntactic structures [9] and sophisticated vocabulary [10] that accurately convey nuanced arguments and evidence critical engagement with scholarly materials.
For L2 speakers, EAP writing introduces additional layers of complexity. L2 learners must overcome linguistic barriers that include mastering a new vocabulary set, often filled with abstract and discipline-specific terms, and grappling with syntactic structures that are unfamiliar or markedly different from those in their native language [11]. Issues such as verb tense consistency [12], the use of articles, and preposition usage can significantly affect the clarity and professionalism of their academic writing [13].
Furthermore, L2 learners are required to understand and apply rhetorical strategies that may not align with those from their cultural or linguistic backgrounds [14]. This includes developing a sense of audience awareness [15,16,17] and adopting an argumentative structure that supports the norms of English academic writing, which typically emphasizes a clear thesis, logical progression of ideas, and a strong conclusion [18,19].
Overall, mastering EAP writing is a demanding task that necessitates a deep understanding of linguistic, cultural, and disciplinary norms, alongside a continuous refinement of language skills to meet the exacting standards of academic communication.

2.2. EAP Writing Assistance

In the field of EAP, various commercial products have been developed to assist learners in improving their academic writing. This section provides an overview of several notable tools designed to aid users in refining their writing through advanced linguistic technologies.
Pitaya (https://www.mypitaya.com (accessed on 1 July 2024)) offers a feature known as “Rewrite”, which enables users to input academic texts and receive alternative versions with enhanced clarity and style. The tool uses advanced algorithms to restructure sentences and suggest more academically appropriate vocabulary, thereby improving the overall readability and formal tone required in academic writing.
Wordvice AI (https://wordvice.ai/tools/paraphrasing (accessed on 1 July 2024)) features an “AI Paraphraser” designed to help writers rephrase their sentences without losing the original meaning. This tool is particularly useful for academics looking to avoid plagiarism by expressing ideas in new ways, ensuring that their writing maintains a high level of originality and adheres to academic integrity standards.
QuillBot (https://quillbot.com/paraphrasing-tool (accessed on 1 July 2024)) includes a “Paraphrasing Tool” that assists users in rewriting texts to achieve better clarity, variety, and academic tone. By providing synonyms and altering sentence structures, this tool helps users refine their drafts and enhance the persuasive power of their arguments.
Effidit (https://effidit.qq.com/demo (accessed on 1 July 2024)) [20] provides a “Text Polishing” service, which includes a Rewriting feature that focuses on enhancing the linguistic quality of academic texts. This service aims to improve the coherence and cohesion of academic writing, making it more understandable and appropriate for scholarly communications.
DeepL Write (https://www.deepl.com/en/write (accessed on 1 July 2024)) is a comprehensive tool that supports various aspects of writing improvement. It offers suggestions for grammar corrections, style enhancements, and vocabulary substitutions, making it a versatile assistant for non-native speakers striving to meet the rigorous demands of academic writing in English.

Common Challenges

Despite the utility of these tools, there are common issues that limit their effectiveness. Firstly, these products typically provide users with revised results without explaining the rationale behind the changes. This lack of transparency can be problematic for learners who wish to understand the principles of effective academic writing deeply. As [21] points out, in tasks like GEC within EAP writing aids, it is crucial to provide users with the reasons for revisions, not merely the revised outcomes. Secondly, the proprietary nature of these technologies poses a significant barrier to academic research and improvement. Since the algorithms and methodologies are not openly shared, it is challenging for researchers to analyze, critique, or enhance these tools, which stifles innovation and limits educational progress in EAP writing assistance.

2.3. Prompt Engineering

Prompt engineering is a crucial technique in the development and utilization of language models, particularly for tasks related to EAP writing assistance. It involves the design and refinement of input prompts to guide language models in generating desired outputs effectively. The importance of prompt engineering lies in its ability to influence the behavior of language models, ensuring that the generated text meets specific criteria and quality standards. Formally, prompt engineering can be defined as follows:
$$P^{*} = \arg\max_{P} \sum_{i=1}^{n} \mathbb{E}_{M}\left[ Q(O_i \mid P, C_i) \right]$$
where $P$ represents the prompt, $O_i$ the $i$-th output, $C_i$ the corresponding context, and $Q$ the quality of the output as determined by the language model $M$.
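To make the optimization above concrete, the following minimal sketch selects, from a candidate set, the prompt with the highest mean output quality across contexts. The `generate` and `quality` callables are hypothetical stand-ins for the language model $M$ and the quality judge $Q$; they are assumptions for illustration, not part of AcademiCraft.

```python
import statistics

def best_prompt(prompts, contexts, generate, quality):
    """Return the prompt maximizing mean output quality over all contexts.

    `generate(prompt, context)` stands in for sampling an output O_i from
    the language model M; `quality(output, context)` stands in for Q.
    Both are caller-supplied assumptions in this sketch.
    """
    def score(prompt):
        outputs = [generate(prompt, c) for c in contexts]
        return statistics.mean(quality(o, c) for o, c in zip(outputs, contexts))

    # arg max over the candidate prompt set
    return max(prompts, key=score)
```

In practice the candidate set would be produced by the evolutionary or reinforcement-learning search strategies cited above, with `best_prompt` playing the role of the selection step.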
The development of prompt engineering has evolved significantly with the advancement of LLMs. Initially, prompts were simple and often manually crafted. However, as the complexity of LLMs increased, so did the sophistication of prompt engineering techniques. Modern approaches utilize various strategies, including evolutionary algorithms [22,23,24] and reinforcement learning [25,26,27,28], to optimize prompts for specific tasks.
In the context of English GEC, prompt engineering plays a pivotal role. EAP writing heavily relies on precise grammar usage, making GEC an essential component. Studies such as [29,30] have demonstrated the effectiveness of prompt engineering in enhancing the performance of LLMs for GEC tasks. Ref. [31] also highlights the critical impact of prompt design on the quality of grammar corrections provided by language models. Furthermore, research by [32] underscores the significant influence of prompt engineering on GEC outcomes, emphasizing its importance in achieving high-quality grammatical corrections in EAP writing.
Prompt engineering is also applied in sentence-level revisions (SentRev) [33], which is another crucial aspect of EAP writing assistance. SentRev aims to elevate subpar academic sentences to publishable standards, addressing issues related to clarity, coherence, and formal tone. According to [33], prompt engineering significantly impacts the feedback quality provided by LLMs for SentRev tasks. The study reveals that even when prompts convey the same meaning, the quality of the feedback can vary greatly across different languages. Moreover, within the same language, different prompt designs can lead to substantial variations in feedback quality, highlighting the nuanced nature of prompt engineering in EAP applications.

2.4. AI Agent

AI agents are autonomous entities that observe and act upon an environment to achieve specific goals. These agents are programmed with algorithms that allow them to perceive their surroundings, make decisions, and execute actions. AI agents can be broadly classified into reactive agents, which respond to stimuli without internal state management, and cognitive agents, which maintain an internal state and utilize past experiences to inform current decisions. Within the field of NLP, AI agents have been applied to various tasks, including machine translation [34], sentiment analysis [35], and text summarization [36,37].
A significant advancement in AI agent technology is the development of MAS. These systems involve multiple interacting agents, each with distinct capabilities and objectives, working collaboratively or competitively to achieve complex goals. The superiority of MAS lies in its ability to handle tasks that are too complex for a single agent by distributing the workload and leveraging the diverse strengths of individual agents.
In the context of NLP, MAS can provide enhanced problem-solving capabilities through distributed processing, parallel execution of tasks, and dynamic adaptation to changing environments. For instance, in machine translation, multiple agents can focus on different linguistic components such as syntax, semantics, and context, thereby improving the overall translation quality [38]. Similarly, in sentiment analysis, different agents can analyze various aspects of a text, including emotional tone, intent, and contextual nuances, resulting in more accurate interpretations [39].
The potential of multi-agent systems in EAP writing assistance is particularly promising. EAP writing involves various complex tasks, such as grammar checking, style enhancement, coherence improvement, and content validation. By employing a multi-agent approach, these systems can offer more comprehensive and nuanced assistance. Each agent within the system can specialize in a specific aspect of writing, providing targeted support that addresses the diverse needs of EAP learners.
Furthermore, multi-agent systems can dynamically interact to provide holistic feedback, addressing multiple facets of writing simultaneously [40]. This adaptability is crucial in EAP contexts, where the requirements and standards can vary significantly across disciplines and educational levels. The ability of multi-agent systems to learn and evolve based on user interactions and feedback further enhances their potential [41], making them an invaluable tool in improving the academic writing process for both non-native and native English speakers.

3. Objectives

The objective of this study is to develop an innovative EAP writing assistance system, AcademiCraft, using multi-agent system (MAS) technology. The system aims to enhance the quality of EAP drafts, elevating them to, or beyond, the standard of existing commercial EAP writing tools. Additionally, it seeks to provide users with explanations for suggested revisions, addressing the complex linguistic phenomena that typically make such enhancement challenging.
AcademiCraft contributes to the field of EAP writing assistance systems through its multi-level granular analysis approach, operating at sentence, paragraph, and chapter levels. This hierarchical framework ensures comprehensive coverage of both linguistic and structural elements essential to academic writing. Unlike traditional tools applying universal enhancement methods, our system recognizes the distinct requirements of different textual components.
The specialized agent architecture of AcademiCraft employs nine dedicated agents, each addressing specific aspects of EAP writing. This division creates a system capable of handling complex linguistic phenomena through collaborative intelligence, thereby overcoming limitations commonly found in monolithic approaches used by commercial tools. The collaborative nature of these agents allows for sophisticated analysis that considers multiple linguistic dimensions simultaneously.
A primary objective of AcademiCraft involves providing transparent explanations for each suggested revision. While current commercial tools frequently operate as “black boxes”, our system enhances user understanding through detailed justifications based on grammar correction rationales, Academic Phrasebank applications, use of hedges and boosters, passive voice implementation, paragraph structure analysis, and move analysis for chapter-level organization. This transparency supports the pedagogical value of the system.
AcademiCraft specifically targets academic writing conventions by incorporating established linguistic frameworks such as lexical bundles, the Academic Phrasebank, and Swales’ CARS model. This focus differentiates our system from general-purpose writing tools and addresses the specific needs of academic writers. Furthermore, we aim to develop a system that exceeds the performance of existing commercial tools in accuracy of grammar correction, appropriateness of academic language suggestions, structural improvements at all text organization levels, and user comprehension of suggested changes.
Current commercial EAP writing tools often treat writing assistance systems as a sequence-to-sequence task, relying heavily on large parallel training datasets. This approach reflects a common critique of traditional deep learning methods—developers struggle to explain the reasoning behind their models’ outputs. Existing tools typically cannot adequately explain suggested revisions, have limited coverage of complex linguistic phenomena specific to academic writing, inadequately consider hierarchical text structures, and insufficiently integrate established academic writing frameworks.
Recent advancements in MAS technology offer new potential for breakthroughs in this domain, prompting our exploration of its capabilities for transforming EAP writing assistance systems. By leveraging the collaborative intelligence of multiple specialized agents, AcademiCraft aims to overcome these limitations and provide a more effective, transparent, and educational writing assistance experience tailored specifically for academic contexts.

4. Approaches

This chapter provides a detailed exposition of our methodological approach. Utilizing Coze (https://www.coze.com), we developed a MAS for EAP writing assistance, which we have termed AcademiCraft. The architectural schema of the system is depicted in Figure 1. AcademiCraft divides the task of EAP writing assistance into multiple granular subtasks via nine distinct agents. These agents collaboratively transform a user-submitted draft of EAP text into a manuscript that meets publication standards, providing comprehensive and justified revisions. The following discussion will detail the specific functions of each agent within the system.
The arrangement agent is responsible for managing and operating the entire MAS. It also plays a crucial role in interacting with users. Upon receiving a user’s EAP draft, the arrangement agent distributes the document to the GEC agent, followed by the execution of the other agents in sequence. The final output includes a revised EAP text along with justifications for the revisions made, which are then presented to the user.
The GEC agent is designed for GEC in the EAP draft. Solutions such as those proposed in [42,43] integrate GEC directly into the drafting process. However, this integration of additional variables may potentially diminish the quality of either the GEC or the draft revisions. Ref. [44]’s research has demonstrated the exceptional performance of LLMs on the GEC task. Therefore, utilizing an LLM-based agent for GEC is evidently a prudent choice. Our GEC agent is developed in accordance with [45]’s design principles to ensure the quality of GEC.
The node switch agent is designed to classify EAP texts and direct them to different nodes based on their type. Our approach categorizes the grammar-corrected EAP drafts into three distinct textual dimensions: sentence, paragraph, and chapter. This differentiation is crucial because the focus points of academic texts vary significantly depending on their structure [46,47]. For clarity, we define these textual dimensions as follows:
  • Sentence refers to a single, standalone sentence that requires revision primarily focused on grammatical accuracy, academic phraseology, and lexical precision.
  • Paragraph encompasses text consisting of two or more sentences forming a cohesive unit with a central idea, requiring analysis of internal coherence, logical flow, and collective academic tone.
  • Chapter refers to multiple paragraphs organized together, requiring higher-level structural assessment including moves analysis and cross-paragraph coherence.
By segmenting the EAP drafts into these distinct granularity levels, our system employs a divide-and-conquer strategy, ensuring efficient processing and enhancement of academic writing. It is important to note that these three dimensions operate independently, with no direct communication between their respective agents. The node switch agent serves as a classifier that determines which specific revision path is appropriate based on the input text’s length and structure, then directs the text exclusively to either the Sentence, Paragraph, or Chapter processing track.
Each textual dimension is processed by its dedicated agents: Sentence-level texts are handled solely by the sentence revision agent and evaluated by the sentence evaluation agent; paragraph-level texts are processed exclusively by the paragraph revision agent and assessed by the paragraph evaluation agent; chapter-level texts are managed only by the chapter revision agent with evaluation from the chapter evaluation agent.
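As a rough illustration of the routing just described, the sketch below classifies an input by surface structure and maps it to its dedicated agent track. The real node switch agent is LLM-based; the length-and-structure heuristics and the `PIPELINES` table here are simplifying assumptions.

```python
import re

# Assumed mapping from text category to its dedicated, mutually exclusive track.
PIPELINES = {
    "sentence": ["GEC agent", "sentence revision agent", "sentence evaluation agent"],
    "paragraph": ["GEC agent", "paragraph revision agent", "paragraph evaluation agent"],
    "chapter": ["GEC agent", "chapter revision agent", "chapter evaluation agent"],
}

def classify(text: str) -> str:
    """Heuristic stand-in for the node switch agent's classifier."""
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    if len(paragraphs) > 1:
        return "chapter"  # multiple paragraphs -> chapter-level track
    sentences = re.split(r"(?<=[.!?])\s+", paragraphs[0].strip())
    return "paragraph" if len(sentences) > 1 else "sentence"

def route(text: str) -> list:
    """Return the agent sequence the text would be directed through."""
    return PIPELINES[classify(text)]
```

Note that, matching the design above, a text enters exactly one track; the three tracks share no state.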
The revision process follows a single-pass approach rather than an iterative cycle. Each text is processed exactly once through its respective dimension-specific agents, with no repeated revisions or feedback loops across different dimensions. This design decision was made to maintain system stability and ensure that the final output remains recognizable to users. Our preliminary experiments indicated that multiple revision cycles could potentially lead to text that increasingly diverges from the author’s original intent and style, which might reduce user acceptance of the suggested changes.
Although the system does not implement multiple revision cycles, we have carefully designed each dimension-specific agent to consider all relevant aspects of academic writing appropriate to its level. For instance, the chapter revision agent performs comprehensive analysis by incorporating both micro-level linguistic features and macro-level organizational structures in a single pass. This approach ensures effective enhancement while preserving the author’s voice and maintaining a reasonable boundary on the extent of revisions.
The arrangement agent manages the overall workflow, ensuring that each text is properly classified, directed to the appropriate dimension-specific agents, and that the evaluation results are properly formatted for user presentation. This clearly delineated approach allows for specialized processing at each textual dimension while maintaining system modularity, which facilitates both effective performance and clear evaluation across the three distinct textual dimensions, as demonstrated in our experimental results.
The sentence revision agent is designed for sentence-level revision. For EAP writing, whether for L1 or L2 users, the core focus is to grasp the academic writing structure of sentences [48,49,50]. Central to this is the use of formulaic language (FL). Utilizing FL in sentence-level EAP writing is akin to a beginner swimmer using a flotation device—making the process considerably easier [51]. Writers can simply apply EAP sentence formulas to construct proper academic sentences.
Among the components of FL, lexical bundles (LBs) are particularly effective. This concept, first introduced by [52], refers to recurrent sequences of words that commonly appear together in academic texts and serve specific communicative functions. LBs are typically multi-word expressions that help in structuring arguments, presenting evidence, and framing research findings in a way that is conventional in academic writing.
For example, LBs can include phrases such as the following: “as shown in Figure”, “the results of the study” and “it is important to note that”. These phrases are not only common in academic texts but also help writers convey their ideas more clearly and effectively by using standardized expressions that are widely recognized and understood within the academic community.
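A minimal sketch of contiguous N-gram matching against a bundle inventory follows; the `phrasebank` set is a tiny hypothetical sample. Matching of this contiguous kind is precisely the limitation of the earlier Seq2seq approach: it cannot capture non-contiguous bundles.

```python
def find_lexical_bundles(sentence, phrasebank, max_n=5):
    """Locate known lexical bundles in a sentence via contiguous n-gram matching.

    `phrasebank` is an assumed set of lowercase multi-word bundles.
    Longer n-grams are tried first so the most specific bundle wins.
    """
    tokens = sentence.lower().rstrip(".").split()
    hits = []
    for n in range(max_n, 1, -1):
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            if gram in phrasebank:
                hits.append(gram)
    return hits
```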
In a previous study [53], LBs were employed in Seq2seq models for sentence-level EAP writing assistance systems. Although this model showed performance improvements over the earlier solution [54], the improvements were marginal. This limited enhancement was due to the direct substitution of unsuitable phrases with EAP LBs through N-gram matching. This approach resulted in minimal revisions to the draft sentence structure and failed to identify and replace many non-contiguous LBs.
The core idea behind our sentence revision agent is to match sentences against the Academic Phrasebank, ensuring that the entire draft sentence fully incorporates LBs from the Academic Phrasebank, rather than merely performing partial structure retrieval and substitution. Recent LLMs possess extensive background knowledge, making them capable of revising EAP sentence drafts as proficiently as an expert in EAP linguistics.
The paragraph revision agent is designed to enhance paragraph quality by focusing on overall structure rather than individual sentences. In EAP writing, paragraph analysis is essential, encompassing several key aspects:
First, the main idea or argument of the paragraph must be clearly identified and presented, ensuring that the central idea is prominent and unambiguous [13]. The agent then conducts a thorough analysis of the paragraph’s structure, including the topic sentence, supporting sentences, and concluding sentence, to ensure logical progression and clarity. Improving the coherence and logical flow between sentences is critical, so the agent assesses and revises the paragraph to enhance these aspects.
Accuracy, academic tone, and clarity of expression are essential in EAP writing [55]. The agent reviews the language use, ensuring that sentences are formal and precise, utilizing structures from the Academic Phrasebank [56] to enhance academic rigor. The Academic Phrasebank is a comprehensive resource developed at the University of Manchester, providing a collection of phrases and sentences commonly used in academic writing. It helps writers construct well-structured arguments and present their ideas clearly and formally. Hedges and boosters are crucial in academic writing for presenting arguments with appropriate caution or emphasis [57,58,59,60]. The agent identifies and adjusts the use of these linguistic tools to align with the academic tone and purpose of the paragraph.
Finally, the passive voice is significant in EAP for maintaining objectivity and focusing on the action rather than the actor [61]. According to [62], the use of hedges, boosters, and even passive voice are interconnected, further highlighting the linguistic complexity of EAP and the superiority of our fine-grained approach. The agent identifies passive constructions and modifies them as necessary to suit the academic context.
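As an illustration of how hedge and booster usage might be surfaced programmatically, the sketch below counts occurrences from small illustrative word lists. Real inventories (e.g., Hyland's) are far larger, and AcademiCraft's analysis is LLM-based, so both the lists and the counting heuristic are assumptions.

```python
# Small illustrative word lists; real hedge/booster inventories are far larger.
HEDGES = {"may", "might", "could", "possibly", "perhaps", "suggest", "appear"}
BOOSTERS = {"clearly", "certainly", "definitely", "must", "demonstrate", "obviously"}

def hedge_booster_profile(text):
    """Count hedges and boosters in a paragraph as a rough tonal profile."""
    tokens = [w.strip(".,;:").lower() for w in text.split()]
    return {
        "hedges": sum(t in HEDGES for t in tokens),
        "boosters": sum(t in BOOSTERS for t in tokens),
    }
```

A revision agent could use such a profile to flag paragraphs whose claims are stated with too little caution, or too little emphasis, for their rhetorical role.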
The chapter revision agent is designed for comprehensive chapter-level revisions, going beyond mere paragraph adjustments. This process begins with a thorough analysis and restructuring of the entire chapter, followed by detailed paragraph revisions. Specifically, the chapter revision agent first conducts a ‘moves’ analysis [63] on the EAP draft. Moves analysis, in this context, refers to identifying the functional components of the text, such as introducing the topic, establishing a territory, and creating a research space, in line with Swales’ CARS (Create a Research Space) model [64].
The CARS model is instrumental in this process. It includes steps for identifying and analyzing the moves within the text. These moves involve establishing a territory, establishing a niche, and occupying the niche. By incorporating the CARS model, the chapter revision agent ensures a structured and methodical approach to text analysis.
Upon completing the moves analysis, the chapter revision agent proceeds with paragraph analysis. However, this is not a straightforward paragraph revision; it incorporates an additional element—evidence. This element is unique to EAP chapter writing and is critical in ensuring that the overall structure and syntactical style of the chapter adhere to academic standards.
The sentence evaluation agent is designed to compare a user’s draft sentence with its revised version, providing feedback on the reasons for the revisions. Specifically, the sentence evaluation agent will produce an evaluation result that details the reasons for the revisions based on two main aspects: (1) grammar and (2) use of LBs. To ensure the structural stability of the output, we employ a few-shot learning approach, which enhances the reliability and stability of the output while also increasing its interpretability [65]. Upon completing its task, the sentence evaluation agent transmits the results back to the arrangement agent, which then delivers the final output to the user.
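A few-shot prompt for structured evaluation output could be assembled along the following lines; the instruction wording and the worked example are illustrative assumptions rather than AcademiCraft's actual prompts.

```python
# One hypothetical demonstration pair; a real prompt would include several.
FEW_SHOT_EXAMPLES = [
    {
        "draft": "The result show a increase.",
        "revised": "The results show an increase.",
        "feedback": "(1) Grammar: subject-verb agreement ('results show'); "
                    "article choice ('an increase'). (2) Lexical bundles: none needed.",
    },
]

def build_evaluation_prompt(draft, revised):
    """Assemble a few-shot prompt for sentence-level revision feedback.

    The fixed demonstrations constrain the model to the two-part
    (grammar / lexical bundles) output structure described above.
    """
    lines = ["Explain the revisions in terms of (1) grammar and (2) lexical bundles.\n"]
    for ex in FEW_SHOT_EXAMPLES:
        lines.append(
            f"Draft: {ex['draft']}\nRevised: {ex['revised']}\nFeedback: {ex['feedback']}\n"
        )
    lines.append(f"Draft: {draft}\nRevised: {revised}\nFeedback:")
    return "\n".join(lines)
```

Anchoring the output format with demonstrations in this way is what gives the few-shot approach its structural stability relative to zero-shot prompting.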
The paragraph evaluation agent is designed to compare users’ draft paragraphs with their revised versions, providing feedback on the reasons for revisions. Unlike the sentence evaluation agent, it assesses and explains revisions along two distinct dimensions: the grammatical level and the content level. The grammatical level encompasses four aspects: (1) grammar, (2) use of LBs, (3) use of hedges and boosters, and (4) use of passive voice. The content level focuses on paragraph analysis.
The chapter evaluation agent is designed to compare users’ chapter drafts with their revised versions, providing feedback on the reasons for revisions. Building upon the capabilities of the sentence evaluation agent, the chapter evaluation agent incorporates the evaluation and explanation of moves analysis.
For detailed settings of each agent, please refer to Appendix A.

4.1. Agent Interaction and Workflow Formalization

The multi-agent architecture of AcademiCraft can be represented as a directed workflow graph G = ( V , E ) , where vertices V = { v 1 , v 2 , , v 9 } represent the nine distinct agents, and edges, E, represent the information flow between agents. The arrangement agent ( v 1 ) serves as both the source and sink of this graph, coordinating the overall information flow.
For a given input text, T, the processing can be formalized as a sequential transformation function:
f ( T ) = ( f n ∘ f n − 1 ∘ ⋯ ∘ f 1 ) ( T )
where f i represents the transformation function of agent i. The specific agents involved in the transformation depend on the text classification performed by the node switch agent. Let C ( T ) ∈ { S , P , C } be the classification function that maps input text to one of three categories: sentence (S), paragraph (P), or chapter (C).
The transformation path can then be defined as follows:
f ( T ) = f SE ∘ f SR ∘ f GEC ( T ) , if C ( T ) = S ; f PE ∘ f PR ∘ f GEC ( T ) , if C ( T ) = P ; f CE ∘ f CR ∘ f GEC ( T ) , if C ( T ) = C
where f GEC represents the grammar error correction agent; f SR , f PR , and f CR represent the sentence, paragraph, and chapter revision agents respectively; and f SE , f PE , and f CE represent the corresponding evaluation agents.
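Assuming placeholder agent functions, the routing described by this piecewise definition can be sketched in Python; each identity function stands in for an LLM-backed agent, and the classification heuristic is a toy illustration, not the node switch agent’s actual logic.

```python
def f_gec(text):
    """Grammar error correction agent (placeholder identity function)."""
    return text

def make_pipeline(revise, evaluate):
    """Compose evaluate . revise . f_gec, mirroring f_E o f_R o f_GEC."""
    def pipeline(text):
        return evaluate(revise(f_gec(text)))
    return pipeline

def classify(text):
    """Toy stand-in for the node switch agent's classification C(T)."""
    if "\n\n" in text:
        return "C"  # a blank line separates paragraphs -> chapter
    if text.count(".") > 1:
        return "P"  # several sentences -> paragraph
    return "S"      # single sentence

# One pipeline per category: sentence, paragraph, chapter agents.
PIPELINES = {
    "S": make_pipeline(lambda t: t, lambda t: t),
    "P": make_pipeline(lambda t: t, lambda t: t),
    "C": make_pipeline(lambda t: t, lambda t: t),
}

def process(text):
    """Route the input through the pipeline selected by its classification."""
    return PIPELINES[classify(text)](text)
```

Only the composition and routing structure is illustrated here; in AcademiCraft each stage is a prompted LLM call coordinated by the arrangement agent.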

4.2. Computational Analysis

The computational approach of AcademiCraft is influenced by the sequential nature of agent operations. For a text with n tokens, the processing can be conceptually divided into three main phases:
T ( n ) = T GEC ( n ) + T Rev ( n ) + T Eval ( n )
where T GEC ( n ) represents the grammar correction phase, T Rev ( n ) represents the revision process, and T Eval ( n ) represents the evaluation phase.
The complexity varies across different textual dimensions. Sentence-level processing involves primarily local operations. Paragraph-level processing requires analyzing relationships between sentences, incorporating coherence measures and discourse structure. Chapter-level processing encompasses more global features, including moves analysis and cross-paragraph coherence assessment.
Each agent contributes specialized processing capabilities to the overall system, with carefully designed prompts that target specific aspects of academic writing. This modular design allows for focused enhancement at each textual level while maintaining an integrated approach to writing assistance that spans from local grammatical concerns to global organizational structures.

5. Experiments

This section presents a comprehensive evaluation of AcademiCraft’s performance through a series of systematic experiments. Our evaluation framework encompasses three key dimensions: text granularity (sentence, paragraph, and chapter levels), writer proficiency (L1 and L2 English writers), and multiple performance metrics (including grammatical accuracy, textual coherence, and academic language use). We first introduce our baseline systems and evaluation datasets, followed by detailed descriptions of our evaluation metrics. Subsequently, we present quantitative results comparing AcademiCraft with leading commercial writing assistance systems across various aspects. We conclude this section with an in-depth analysis and discussion of the experimental findings, highlighting both the strengths and potential areas for improvement in our approach. This experimental evaluation aims to answer several critical questions:
  • How effective is AcademiCraft at improving text quality across different granularities compared to existing commercial tools?
  • Does AcademiCraft provide consistent performance benefits for both L1 and L2 English writers?
  • What are the specific strengths and limitations of our MAS-based approach in different aspects of academic writing assistance?

5.1. Baselines

We selected several mainstream commercial EAP writing assistance products currently available on the market, as introduced in the related work, to serve as baselines. To better illustrate their characteristics, we list several key features in Table 1.

5.2. Data

To evaluate the performance of different products, we selected various test data across three textual dimensions: sentences, paragraphs, and complete texts. Specifically, for sentence revision, we utilized the LOCNESS [66] and TCNAEC [67] corpora. LOCNESS is a corpus of native English essays consisting of A-level essays from British pupils, essays from British university students, and essays from American university students. In contrast, TCNAEC comprises draft sentences in academic English written by non-native English speakers from ten different countries. We randomly selected several sentences according to method x.
For paragraph and full-text revisions, we used the LOCNESS and ICNALE [68] corpora. ICNALE includes over 10,000 topic-based speeches and essays written by university students from ten non-English-speaking countries. We focused exclusively on the essay portion of ICNALE and selected several test data points using the same method.

5.3. Metrics

5.3.1. Sentence Revision

  • Grammatical error rate (GER) measures the ratio of the number of grammatical errors to the total number of words in a text, thereby assessing its grammatical accuracy. This metric is calculated using language-tool-python (https://github.com/jxmorris12/language_tool_python (accessed on 1 July 2024)). The formula for the calculation is as follows:
    GER = Number of Grammatical Errors / Total Number of Words
  • The proportion of grammatical error types is used to evaluate the relative frequency of various types of grammatical errors in a text. This metric reflects the distribution of different error types by counting the instances of each type and calculating their proportion of the total number of grammatical errors. It is also calculated using language-tool-python.
  • GLUE [69] is used to evaluate the semantic similarity between the draft and the revised draft. This metric employs the all-MiniLM-L6-v2 [70] model from the SentenceTransformers [71] library. Specifically, the draft and revised draft texts are encoded into vectors using the model, and then the cosine similarity between these vectors is calculated. The formula is as follows:
    GLUE = cos ( v draft , v revised )
    where v draft and v revised represent the vector representations of the draft and revised draft, respectively.
  • Flesch reading ease (FRE) [72] is used to evaluate the readability of a text. This metric is calculated using the NLTK (https://github.com/nltk/nltk (accessed on 1 July 2024)). The calculation formula is as follows:
    FRE = 206.835 − 1.015 × ( Total Words / Total Sentences ) − 84.6 × ( Total Syllables / Total Words )
    The FRE score ranges from 0 to 100, with higher values indicating easier readability. The specific segments are interpreted as follows:
    0–30: Extremely difficult text, suitable for highly specialized academic articles.
    31–50: Very difficult text, suitable for academic papers.
    51–60: Moderately difficult text, suitable for articles at the undergraduate level and above.
    61–70: Fairly easy-to-read text, suitable for high school-level articles.
    71–80: Easy text, suitable for middle school-level articles.
    81–100: Very easy text, suitable for elementary school-level articles.
  • The Gunning fog index (GFI) [73] is used to measure the readability of a text, particularly evaluating the difficulty level for readers to comprehend the material. This metric determines the text’s complexity by calculating the proportion of sentence length and complex words. A higher value indicates a more difficult text to read. Thus, in academic writing, a high GFI value typically signifies more in-depth and specialized content. This metric is calculated using the NLTK. The specific calculation formula is as follows:
    GFI = 0.4 × ( ( Total Words / Total Sentences ) + 100 × ( Complex Words / Total Words ) )
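The sentence-level metrics above translate directly into code. The sketch below takes the underlying counts and vectors as given inputs; in the evaluation itself they are produced by language-tool-python, NLTK, and the all-MiniLM-L6-v2 sentence encoder, which are not reproduced here.

```python
import math

def ger(num_errors: int, num_words: int) -> float:
    """Grammatical error rate: errors per word."""
    return num_errors / num_words

def cosine(u, v):
    """Cosine similarity used by the GLUE metric (encoder vectors assumed given)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """FRE: higher scores indicate easier readability."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """GFI: higher scores indicate more difficult text."""
    return 0.4 * (words / sentences + 100 * complex_words / words)
```

For example, a 100-word, 5-sentence text with 150 syllables and 10 complex words scores roughly FRE 59.6 (moderately difficult) and GFI 12.0.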

5.3.2. Paragraph Revision

In addition to the evaluation metrics used for sentence revision, paragraph revision employs a broader range of metrics to better match the characteristics of paragraph texts. The following sections will provide a detailed explanation of these metrics.
  • TL Freq FW Log (TFFL) is used to evaluate the frequency of academic vocabulary in the text. This metric reflects the academic level of the text by calculating the logarithmic frequency of specific frequent words (such as academic words). The calculation of this metric is based on the TAALES [74] tool, which extracts the frequency of academic vocabulary in the text and takes the logarithm to better quantify the use of academic vocabulary. The specific calculation formula is as follows:
    TFFL = log ( ( ∑ i = 1 n Freq i ) / n )
  • BNC-written bigram proportion (BWBP) is used to evaluate the use of bigrams in the text. This metric reflects the natural fluency and linguistic authenticity of the text by calculating the proportion of bigrams found in the written section of the BNC. The calculation of this metric is based on extracting bigrams from the text and comparing them with those in the BNC-written corpus to determine their usage proportion. This metric is calculated using the TAALES. The specific calculation formula is as follows:
    BWBP = ( ∑ i = 1 n I ( Bigram i ∈ BNC ) ) / n
  • BNC-written trigram proportion (BWTP) is used to evaluate the use of trigrams in the text. This metric reflects the natural fluency and linguistic authenticity of the text by calculating the proportion of trigrams found in the written section of the BNC. This metric is calculated using the TAALES. The specific calculation formula is as follows:
    BWTP = ( ∑ i = 1 n I ( Trigram i ∈ BNC ) ) / n
  • Syntactic overlap of sentence noun phrases (SOSNPs) is used to evaluate the repetition degree of noun phrases between adjacent sentences within a single draft, measuring the coherence of the text. By calculating the frequency and overlap of noun phrases in adjacent sentences, this metric reflects whether the author maintains consistent themes and coherent discourse between sentences. This metric is calculated using the TAACO [75]. The specific calculation formula is as follows:
    SOSNP = ( ∑ i = 1 n − 1 Count ( NounPhrase i ∩ NounPhrase i + 1 ) ) / ( ∑ i = 1 n − 1 Count ( NounPhrase i ) )
  • Syntactic overlap of sentence verb phrases (SOSVPs) is used to evaluate the repetition degree of verb phrases between adjacent sentences within a single draft, measuring the coherence of the text. By calculating the frequency and overlap of verb phrases in adjacent sentences, this metric reflects whether the author maintains consistent action or state descriptions between sentences, thereby enhancing the coherence of the text. This metric is calculated using the TAACO. The specific calculation formula is as follows:
    SOSVP = ( ∑ i = 1 n − 1 Count ( VerbPhrase i ∩ VerbPhrase i + 1 ) ) / ( ∑ i = 1 n − 1 Count ( VerbPhrase i ) )
  • Latent semantic analysis full sentence similarity (LSAFSS) is used to evaluate the semantic similarity between sentences within a single draft. This metric measures the degree of similarity in a semantic space using latent semantic analysis (LSA) [76], thereby reflecting the coherence and consistency of the text. Specifically, LSAFSS maps each sentence into a semantic space and calculates the cosine similarity between adjacent sentences in this space. This metric is calculated using the TAACO. The specific calculation formula is as follows:
    LSAFSS = ( ∑ i = 1 n − 1 cos ( S i , S i + 1 ) ) / ( n − 1 )
    where S i represents the vector representation of the i-th sentence in the semantic space, and n is the total number of sentences in the draft.
  • Latent Dirichlet allocation full sentence similarity (LDAFSS) is used to evaluate the topic similarity between sentences within a single draft. This metric measures the degree of similarity in a topic space using latent Dirichlet allocation (LDA) [77], thereby reflecting the coherence and consistency of the text. Specifically, LDAFSS maps each sentence into a topic space and calculates the cosine similarity between adjacent sentences in this space. This metric is calculated using the TAACO. The specific calculation formula is as follows:
    LDAFSS = ( ∑ i = 1 n − 1 cos ( T i , T i + 1 ) ) / ( n − 1 )
    where T i represents the vector representation of the i-th sentence in the topic space, and n is the total number of sentences in the draft.
  • Word2Vec Full sentence similarity (W2VFSS) is used to evaluate the word vector similarity between sentences within a single draft. This metric measures the degree of similarity in a word vector space using the Word2Vec [78], thereby reflecting the coherence and consistency of the text. Specifically, W2VFSS maps each sentence into the word vector space and calculates the cosine similarity between adjacent sentences in this space. This metric is calculated using the TAACO. The specific calculation formula is as follows:
    W2VFSS = ( ∑ i = 1 n − 1 cos ( W i , W i + 1 ) ) / ( n − 1 )
    where W i represents the vector representation of the i-th sentence in the word vector space, and n is the total number of sentences in the draft.
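Two of these paragraph-level metrics can be sketched with toy inputs standing in for the TAALES/TAACO pipelines: an n-gram proportion in the spirit of BWBP/BWTP (with a small reference set in place of the BNC-written corpus), and the mean adjacent-sentence similarity underlying LSAFSS, LDAFSS, and W2VFSS (with sentence vectors assumed given).

```python
import math

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def reference_ngram_proportion(tokens, reference, n=2):
    """Fraction of the text's n-grams attested in a reference set (cf. BWBP/BWTP)."""
    grams = ngrams(tokens, n)
    return sum(1 for g in grams if g in reference) / len(grams)

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def adjacent_sentence_similarity(sentence_vectors):
    """Mean cosine of adjacent sentence vectors (cf. LSAFSS/LDAFSS/W2VFSS)."""
    pairs = zip(sentence_vectors, sentence_vectors[1:])
    sims = [cosine(a, b) for a, b in pairs]
    return sum(sims) / len(sims)
```

The same adjacent-pair averaging applies whether the vectors come from an LSA semantic space, an LDA topic space, or Word2Vec embeddings; only the representation changes.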

5.3.3. Chapter Revision

  • Syntactic overlap of paragraph noun phrases (SOPNPs) is used to evaluate the repetition degree of noun phrases within a single draft, measuring the internal consistency and coherence of the text. By calculating the frequency and overlap of noun phrases within the text, this metric reflects whether the author maintains a high level of consistency and logical coherence within paragraphs. This metric is calculated using the TAACO. The specific calculation formula is as follows:
    SOPNP = ( ∑ i = 1 n − 1 Count ( NounPhrase i ∩ NounPhrase i + 1 ) ) / ( ∑ i = 1 n − 1 Count ( NounPhrase i ) )
  • Syntactic overlap of paragraph verb phrases (SOPVPs) is used to evaluate the repetition degree of verb phrases within a single draft, measuring the internal consistency and coherence of the text. By calculating the frequency and overlap of verb phrases within the text, this metric reflects whether the author maintains consistent actions or states descriptions within paragraphs, thereby enhancing the coherence of the text. This metric is calculated using the TAACO. The specific calculation formula is as follows:
    SOPVP = ( ∑ i = 1 n − 1 Count ( VerbPhrase i ∩ VerbPhrase i + 1 ) ) / ( ∑ i = 1 n − 1 Count ( VerbPhrase i ) )
  • Latent semantic analysis full paragraph similarity (LSAFPS) is used to evaluate the semantic similarity within paragraphs of a single draft. LSAFPS measures the overall coherence and consistency of the text by comparing the semantic similarity between every pair of sentences within a paragraph. This metric is crucial for understanding the thematic coherence within paragraphs and its contribution to the overall structure of the text. This metric is calculated using the TAACO. The specific calculation formula is as follows:
    LSAFPS = ( 2 / ( N ( N − 1 ) ) ) ∑ 1 ≤ i < j ≤ N cos ( d i , d j )
    where d i represents the vector representation of the i-th sentence in the semantic space, and N is the total number of sentences in the paragraph.
  • Latent Dirichlet allocation full paragraph similarity (LDAFPS) is used to evaluate the thematic similarity within paragraphs of a single draft. LDAFPS measures the overall coherence and consistency of the text by comparing the topic distribution between every pair of sentences within a paragraph. This metric is crucial for understanding the thematic consistency within paragraphs and its contribution to the overall structure of the text. This metric is calculated using the TAACO. The specific calculation formula is as follows:
    LDAFPS = ( 2 / ( N ( N − 1 ) ) ) ∑ 1 ≤ i < j ≤ N JS ( t i , t j )
    where t i represents the topic distribution of the i-th sentence, and N is the total number of sentences in the paragraph.
  • Word2Vec full paragraph similarity (W2VFPS) is used to evaluate the similarity within paragraphs of a single draft based on word vector representations. W2VFPS measures the overall coherence and consistency of the text by comparing the average word vectors between every pair of sentences within a paragraph. This metric is crucial for understanding the semantic consistency within paragraphs and its contribution to the overall structure of the text. This metric is calculated using the TAACO. The specific calculation formula is as follows:
    W2VFPS = ( 2 / ( N ( N − 1 ) ) ) ∑ 1 ≤ i < j ≤ N cos ( w i , w j )
    where w i represents the average word vector of the i-th sentence, and N is the total number of sentences in the paragraph.
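The chapter-level similarity metrics (LSAFPS, LDAFPS, W2VFPS) average a similarity over every pair of sentence representations rather than adjacent pairs only. A sketch with toy vectors standing in for TAACO’s representations:

```python
import math
from itertools import combinations

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def pairwise_mean_similarity(vectors, sim=cosine):
    """Average similarity over all N(N-1)/2 sentence pairs (cf. LSAFPS/W2VFPS)."""
    pairs = list(combinations(vectors, 2))
    return sum(sim(a, b) for a, b in pairs) / len(pairs)
```

Swapping `sim` for a Jensen-Shannon divergence over topic distributions would give the LDAFPS variant, where lower values indicate more similar topics.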

5.4. Discussion on Metrics

We opted not to employ LLMs-based methods for evaluation. Despite some studies highlighting the considerable potential of these methods over traditional approaches [79,80,81], they are fraught with uncertainties [82,83]. Our objective in this work is to minimize uncertainties as much as possible during the evaluation process. Additionally, metrics such as BLEU [84] and Perplexity [85] have been widely used; however, the studies by [86,87] have revealed that these metrics do not align well with human evaluation characteristics, leading us to exclude them from our evaluation framework.

6. Results

Based on our scientific experiments and evaluations, we obtained notable results from our benchmark assessments. It is worth highlighting that Effidit consistently underperformed across all tests. Specifically, in sentence and paragraph revision tasks, Effidit’s output was highly inconsistent. For the same sentence or paragraph input, Effidit would occasionally indicate that the input was perfect, sometimes provide a revised version, and at other times, inexplicably translate the input into Chinese. This inconsistency suggests that Effidit’s system architecture may incorporate a pre-trained machine translation model. Furthermore, for passage-level revisions, Effidit’s limitations on the number of input attempts made testing infeasible. Although freely available, Effidit is nonetheless a disappointing commercial product.
In contrast, the other tools demonstrated their respective strengths in various aspects. The following sections will present the results in detail and provide a thorough analysis.

6.1. Examples

We will first present the performance of our tool, AcademiCraft. Figure 2 illustrates an example of sentence revision using AcademiCraft. Due to text length constraints, examples of paragraph and chapter revisions are provided in Appendix B in textual form.

6.2. GEC Evaluation Results

6.2.1. GER Score

For the GER scores of different tools, please refer to Table 2.

Analysis

Sentence-level analysis: In the L1 EAP corpus LOCNESS, AcademiCraft achieved a GER of 0, demonstrating complete mastery over L1 EAP texts. For the L2 EAP corpus TCNAEC, the GER was 1.53, slightly higher than mainstream tools such as QuillBot Premium and DeepL Write Pro, but superior to Draft and Pitaya Pro. This indicates AcademiCraft’s effective correction capabilities in L2 EAP writing, albeit with room for improvement.
Paragraph-level analysis: AcademiCraft exhibited a GER of 0.06 in the LOCNESS corpus, showcasing its ability to maintain grammatical accuracy at the paragraph level. In the ICNALE corpus, the GER increased to 0.14, which, while higher than that of DeepL Write Pro, was lower than other competing tools, confirming its efficacy in processing L2 paragraphs.
Chapter-level analysis: In the LOCNESS corpus, AcademiCraft recorded a GER of 0.90, outperforming other tools and demonstrating strong capability in maintaining grammatical accuracy across extensive texts. In the ICNALE corpus, a GER of 0 was achieved, indicating that AcademiCraft is highly effective in correcting complex structures in L2 EAP texts.
Comprehensive evaluation: AcademiCraft demonstrated low GER across various text levels and different corpora, particularly excelling in sentence and chapter levels for both L1 and L2 EAP texts. These findings reveal AcademiCraft’s potential in grammatical correction, especially in its handling of long texts and L2 EAP texts. Further algorithmic enhancements, particularly for L2 EAP writing as observed in the TCNAEC corpus, could improve its capacity to process complex and diverse texts.

6.2.2. The Proportion of Grammatical Error Types

Analysis

Sentence revisions in the LOCNESS Corpus: The radar charts illustrate the distribution of different grammatical error types corrected by various tools in sentence revisions within the LOCNESS corpus. As shown in Figure 3, the detailed breakdown reveals nuanced differences in error correction capabilities. AcademiCraft shows no errors, indicating its high efficiency in sentence-level corrections for L1 EAP texts. Tools like Draft and Pitaya Pro predominantly corrected errors related to possessive apostrophes and morphological rules, while Wordvice AI Premium focused solely on possessive apostrophes. These findings highlight the specialization of each tool in addressing specific error types.
Sentence revisions in the TCNAEC corpus: As shown in Figure 4, in the TCNAEC corpus, AcademiCraft demonstrates a balanced correction of morphological rules and infinitive errors. Tools like Draft and Pitaya Pro also corrected similar error types, with Pitaya Pro notably addressing a higher proportion of morphological errors. This suggests that while these tools are effective in correcting specific grammatical errors in L2 EAP texts, AcademiCraft maintains a more comprehensive approach.
Paragraph revisions in the LOCNESS corpus: As shown in Figure 5, AcademiCraft addresses a range of error types at the paragraph level, including specific cases and compound sentences. Draft and DeepL Write Pro show varied distributions, focusing on different grammatical aspects like uppercase sentence starts and specific case rules. Wordvice AI Premium again focuses solely on morphological rules. The diverse error type corrections by AcademiCraft indicate its robustness in handling paragraph-level grammatical accuracy.
Paragraph revisions in the ICNALE corpus: As shown in Figure 6, in the ICNALE corpus, AcademiCraft and other tools continue to exhibit diverse correction capabilities. AcademiCraft balances corrections among morphological rules and specific grammatical constructs, whereas other tools like QuillBot Premium and Wordvice AI Premium primarily focus on a single error type, morphological rules. This consistency in correction diversity further supports AcademiCraft’s adaptability to different L2 EAP texts.
Chapter revisions in the LOCNESS corpus: As shown in Figure 7, AcademiCraft again shows no errors at the chapter level, indicating exceptional performance in maintaining grammatical accuracy across longer texts. Other tools, such as Draft and Pitaya Pro, focus on possessive apostrophes and compound sentences. The error-free performance of AcademiCraft underscores its effectiveness in comprehensive text revisions.
Chapter revisions in the ICNALE corpus: As shown in Figure 8, in the ICNALE corpus, both AcademiCraft and Wordvice AI Premium show no errors, suggesting a high level of efficiency in managing grammatical accuracy for L2 EAP chapter revisions. Tools like Draft and Pitaya Pro show varied distributions, addressing multiple error types. This consistency of error-free results by AcademiCraft highlights its reliability across extensive texts.
Comprehensive evaluation: The analysis across different text levels and corpora demonstrates AcademiCraft’s superior performance in grammatical corrections. Its ability to address a wide range of grammatical errors, especially in L2 EAP texts, sets it apart from other tools that tend to specialize in specific error types. The tool’s consistent performance across sentences, paragraphs, and chapters indicates its robustness and adaptability, making it a highly effective solution for enhancing grammatical accuracy in academic writing. Further improvements could focus on maintaining this high level of performance across even more diverse and complex textual structures.

6.3. Evaluation Results of Statistical Metrics

Analysis

As detailed in Table 3, the evaluation results provide a comprehensive overview of the statistical metrics across different writing assistance systems and corpora.
Paragraph-level analysis: In the LOCNESS corpus, TFFL values indicate that the writing assistance systems performed similarly, ranging from 4.66 to 4.77. Draft and AcademiCraft scored slightly lower (4.66), while DeepL Write Pro scored the highest (4.77). However, a higher TFFL does not necessarily indicate better text quality, as it may merely reflect changes in word frequency distribution [88].
BWBP and BWTP measure the proportion of bigrams and trigrams, respectively. Draft has a notably higher BWBP (0.54) than the other tools, while AcademiCraft (0.43) and DeepL Write Pro (0.53) have relatively lower scores. This suggests that Draft may contain more common phrase combinations, which does not inherently imply better language quality. Conversely, lower BWBP and BWTP values might indicate a richer vocabulary use [89].
For syntactic overlap metrics (SOSNP, SOSVP), DeepL Write Pro shows higher values in SOSNP (0.56) and SOSVP (0.46), indicating more consistent syntactic structures in its revisions. In contrast, AcademiCraft’s performance in SOSNP (0.82) and SOSVP (0.26) is relatively lower, potentially indicating more varied syntactic structures post-revision and slightly weaker syntactic consistency.
In the ICNALE corpus, the TFFL distribution is similar to that of LOCNESS, with DeepL Write Pro (4.75) again performing best, and AcademiCraft (4.65) close behind. Higher TFFL values might indicate a more concentrated word frequency distribution for L2 learners, but this does not necessarily correlate with higher text quality. For syntactic overlap metrics (SOSNP, SOSVP), DeepL Write Pro once more demonstrates higher scores (1.22 and 0.45), suggesting higher consistency in L2 corpus syntactic structures, while AcademiCraft performs moderately (0.92 and 0.18).
Chapter-level analysis: In the LOCNESS corpus, TFFL values range from 4.68 to 4.79, with Draft (4.68) and AcademiCraft (4.69) performing slightly lower and DeepL Write Pro (4.79) the highest. The BWBP and BWTP distribution shows Draft scoring higher in BWBP (0.41), indicating more common bigram usage, but this does not necessarily signify quality. AcademiCraft’s scores in BWBP (0.37) and BWTP (0.11) are moderate.
Syntactic overlap metrics (SOPNP, SOPVP) indicate that DeepL Write Pro excels in SOPNP (8.78) and SOPVP (6.71), showing high consistency in paragraph-level syntactic structures. AcademiCraft also performs well in SOPNP (7.59) and SOPVP (5.47), indicating good consistency in paragraph-level syntax, though slightly inferior to DeepL Write Pro.
In the ICNALE corpus, TFFL values are concentrated, with Draft (4.57) and AcademiCraft (4.59) on the lower end and DeepL Write Pro (4.74) leading. The BWBP and BWTP distribution is similar to the Paragraph level, with Draft performing higher in BWBP (0.47) and AcademiCraft (0.36) being moderate. For syntactic overlap metrics, AcademiCraft shows good performance in SOPNP (6.49) and SOPVP (2.85), although it is still slightly behind DeepL Write Pro (4.34 and 2.44).
Comprehensive evaluation: Overall, Draft shows higher values in several metrics, but these do not necessarily indicate better text quality, as higher word frequency features and bigram proportions may merely reflect common vocabulary usage rather than superior writing. AcademiCraft performs well across multiple metrics, especially in TFFL and syntactic overlap, although it has room for improvement in some areas. Its overall stable performance indicates it is a competitive EAP writing assistance system.
In comparing L1 (LOCNESS) and L2 (ICNALE) corpora, it is evident that L2 learners’ texts face more challenges in vocabulary frequency and syntactic consistency [90]. AcademiCraft shows good revision effects in both corpora, particularly in syntactic overlap and paragraph-level consistency, indicating its potential as an EAP writing assistance system.
While AcademiCraft is slightly behind DeepL Write Pro in certain metrics, its overall performance meets the needs of EAP writing, especially in balancing vocabulary richness and syntactic consistency [8]. Future improvements can focus on enhancing syntactic consistency and improving the revision effects for L2 learners to further enhance the overall quality and effectiveness of the writing assistance system.

6.4. Evaluation Results of Performance Metrics

Analysis

As shown in Table 4, the detailed performance metrics provide insights into the various writing assistance tools’ capabilities.
Sentence level: In the sentence level evaluation of the LOCNESS corpus, AcademiCraft demonstrated a significant reduction in text complexity, evidenced by its FRE score of 13.60 and GFI score of 21.60. The GLUE score of 0.74, while slightly lower than other tools (e.g., QuillBot Premium at 0.86 and DeepL Write Pro at 0.82), still maintained a high level of semantic similarity. Compared to other tools, AcademiCraft achieved a favorable balance between reducing text complexity and maintaining semantic similarity [91,92], which is crucial in EAP writing revisions.
For the TCNAEC corpus, AcademiCraft again showcased its strength in controlling text complexity with a GFI of 22.09 and FRE of 11.37. The GLUE score of 0.84, comparable to other tools (e.g., QuillBot Premium at 0.91 and DeepL Write Pro at 0.88), indicated excellent performance in semantic consistency. This performance is particularly relevant for both L1 and L2 students, as it helps maintain the original semantic integrity while simplifying the text [93,94].
Paragraph level: At the paragraph level, AcademiCraft’s performance in the LOCNESS corpus highlighted its capability to enhance text readability, with an FRE of 35.83 and GFI of 17.66, outperforming Wordvice AI Premium and DeepL Write Pro. The GLUE score of 0.90 further confirmed its superiority in semantic consistency. Additionally, the LSAFSS and W2VFSS scores of 0.51 and 0.88, respectively, indicated stable performance in sentence-level and paragraph-level semantic similarity. These results are particularly important for L1 students who often require more assistance in simplifying and understanding complex academic texts.
In the ICNALE corpus, AcademiCraft continued to excel with an FRE of 25.03 and GFI of 18.74, demonstrating effectiveness in handling complex texts. The GLUE score of 0.90 was on par with other tools (e.g., QuillBot Premium at 0.91). The LSAFSS and W2VFSS scores of 0.50 and 0.82, respectively, showed consistent performance in paragraph-level semantic similarity. This is crucial for L2 students who need to ensure text simplification and semantic consistency while learning and using academic English [95].
Chapter level: At the chapter level for the LOCNESS corpus, AcademiCraft exhibited outstanding performance with an FRE of 41.48 and GFI of 16.35, highlighting its significant advantage in enhancing text readability. The GLUE score of 0.90 indicated excellent performance in semantic consistency. Scores for LSAFPS and W2VFSS were 0.55 and 0.91, respectively, further affirming its superior ability to maintain semantic consistency across paragraphs. These metrics are particularly vital for L1 students dealing with extensive academic texts, aiding in better understanding and grasping complex content.
In the ICNALE corpus, AcademiCraft continued to demonstrate its advantages with an FRE of 33.32 and GFI of 16.79, underscoring its effectiveness in managing complex academic texts. The GLUE score of 0.86 confirmed its reliability in semantic consistency. The LSAFPS and W2VFSS scores of 0.62 and 0.82, respectively, further highlighted its stable performance in paragraph-level semantic similarity. This is highly beneficial for L2 students, assisting them in maintaining academic content integrity and consistency while simplifying the text.
Comprehensive evaluation: In summary, AcademiCraft consistently exhibited exceptional capabilities in enhancing text readability and maintaining semantic consistency across different levels and corpora. Compared to other EAP writing assistance systems, AcademiCraft achieved an optimal balance between reducing text complexity and preserving semantic similarity. These results validate its effectiveness in text revision for both L1 and L2 students, further establishing AcademiCraft as a leading tool in academic writing assistance.
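The readability figures discussed throughout this section follow the standard Flesch Reading Ease and Gunning Fog formulas. The sketch below implements both with a deliberately crude vowel-group syllable counter; it is a minimal illustration of how such scores are derived, not the exact tooling used in the evaluation.

```python
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: one syllable per contiguous vowel group.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Gunning Fog Index) for a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    words_per_sentence = len(words) / len(sentences)
    fre = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables / len(words)
    gfi = 0.4 * (words_per_sentence + 100 * complex_words / len(words))
    return round(fre, 2), round(gfi, 2)
```

Lower FRE and higher GFI both indicate denser text, which is why the two metrics are reported together when judging how far a revision reduces complexity.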

6.5. Discussion

This study systematically evaluated the performance of AcademiCraft across different corpora and text levels. Our findings demonstrate that AcademiCraft offers significant advantages over existing commercial tools, particularly in its comprehensive approach to academic writing assistance and the explainability of its revisions. While quantitative metrics show competitive performance, the system’s true strength lies in its multi-dimensional analysis capabilities and transparent feedback mechanisms.
At the sentence level, AcademiCraft achieved a zero-error rate in the L1 EAP corpus (LOCNESS), showcasing its exceptional ability to handle academic English writing by native speakers. Although tools like QuillBot Premium and DeepL Write Pro showed marginally lower error rates in the L2 EAP corpus (TCNAEC), AcademiCraft significantly outperformed them in maintaining semantic similarity between original and revised texts, as evidenced by superior GLUE scores. This balance between correction and preservation of original meaning represents a critical advantage in academic writing contexts, where precision of expression is paramount.
The multi-dimensional analysis approach of AcademiCraft demonstrates particular strengths when handling larger text structures. In both paragraph and chapter levels, our system shows remarkably consistent performance across readability metrics (FRE and GFI) while simultaneously improving academic language markers like TFFL and bigram/trigram proportions. Unlike commercial tools that often focus primarily on sentence-level corrections, AcademiCraft maintains a holistic view of text coherence, as demonstrated by consistently higher SOSNP, LSAFSS, and W2VFSS scores across both L1 and L2 corpora.
A distinctive feature of AcademiCraft is its ability to provide explanations for suggested revisions—an innovation largely absent in commercial tools. While our current evaluation focused on automated metrics for textual improvements, we acknowledge that the educational value of these explanations requires separate human evaluation. The generated explanations vary in length and detail across different textual dimensions, which may present different challenges and benefits for L1 versus L2 users. Some explanations offer comprehensive linguistic rationales that could enhance deeper understanding, while others provide concise feedback that might be more immediately actionable but potentially less instructive for non-native writers.
The formulation of standardized evaluation criteria for explanation quality presents significant challenges, including the need to account for varying user proficiency levels, learning objectives, and disciplinary contexts. This complexity prevented us from conducting formal human evaluation of explanation quality in the current study. Future research should address this limitation by developing robust frameworks for assessing explanation clarity, relevance, educational value, and efficiency for different user groups. Such evaluation would provide valuable insights into optimizing the balance between comprehensiveness and conciseness in revision explanations.
Looking toward future improvements, we identify several promising directions. First, enhancing the performance on L2 texts represents a clear opportunity, particularly for addressing complex sentence structures common in academic writing by non-native speakers. Second, incorporating more sophisticated discourse analysis techniques could further strengthen chapter-level revisions. Third, exploring adaptive approaches that customize both revision strategies and explanation depth based on detected writer proficiency levels could improve overall effectiveness and educational impact. Finally, a systematic investigation of the pedagogical efficacy of revision explanations across different academic contexts and user groups would significantly contribute to understanding the educational potential of such systems.
In conclusion, despite comparable performance on some isolated metrics, AcademiCraft demonstrates clear advantages in its comprehensive approach to academic writing assistance, consistent performance across textual dimensions, and innovative provision of revision explanations. While the current evaluation establishes its technical capabilities through automated metrics, realizing the full educational potential of such systems will require dedicated studies on how different user groups interact with and learn from the explanatory components. The development of AcademiCraft represents an important step toward writing assistance tools that not only improve texts but potentially enhance writers’ understanding of academic discourse conventions.

7. Conclusions

In this study, we developed and evaluated AcademiCraft, an English academic writing assistance system based on a MAS, which surpassed mainstream commercial products on several key performance metrics, particularly in GEC and EAP text polishing. AcademiCraft stands out by providing detailed revision explanations, a feature not available in current commercial offerings. Its open and replicable technological framework contrasts sharply with the proprietary nature of most commercial tools, facilitating transparency and allowing researchers and developers to test, adjust, and enhance the technology. Our analyses across various text levels (sentence, paragraph, and chapter) demonstrate AcademiCraft’s robust capabilities in maintaining grammatical accuracy and textual consistency, especially in complex L2 EAP contexts. The results underscore AcademiCraft’s significant potential in the field of academic writing assistance. Future work will focus on refining its algorithms to process a broader range of EAP texts more efficiently, thereby strengthening its position among academic writing support tools.

Author Contributions

Conceptualization, Z.D.; methodology, Z.D.; software, Z.D.; validation, Z.D. and K.H.; formal analysis, Z.D.; investigation, Z.D.; resources, Z.D.; data curation, Z.D.; writing—original draft preparation, Z.D.; writing—review and editing, K.H.; visualization, Z.D.; supervision, K.H.; project administration, Z.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Information, Production and Systems Research Center, Waseda University; Future Robotics Organization, Waseda University; Humanoid Robotics Institute, Waseda University under the Humanoid Project; and Waseda University Grant for Special Research Projects, grant number 2024C-518. The APC was funded by Waseda University.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets used during the study are all publicly available as open-source datasets.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A. The Detailed Settings of Each Agent

Appendix A.1. Arrangement Agent

Persona & Prompt
# Role
You are an expert academic English text revision engineer with profound English text rewriting skills.
## Skills
### Grammar Error Correction:
- Check and correct grammatical errors in the input text.
### Execute Sentence, Paragraph, or Chapter Revision:
- Enhance clarity, coherence, and academic quality.
- Ensure that the original meaning of the text is preserved.
### Content Evaluation:
- Automatically evaluate and compare the original EAP text with the revised text according to specified requirements, and output the evaluation results.
## Process Flow
### Grammar Error Correction:
- Perform a comprehensive grammar error correction on the input.
### Content Revision
- Select one of the following revision types based on the grammar-corrected EAP text: sentence revision, paragraph revision, or chapter revision. Perform the selected revision and output the revised text to the next node.
- When the input EAP text is a sentence, jump to the “Sentence Revision Node”.
- When the input EAP text is a paragraph (a single block of text without carriage returns between punctuation marks), jump to the “Paragraph Revision Node”.
- When the input EAP text is a chapter (several paragraphs, where a new paragraph is indicated by a carriage return between two punctuation marks), jump to the “Chapter Revision Node”.
### Content Evaluation:
- Use the appropriate evaluators based on the content type:
1. English Sentence: Use the “Sentence Evaluation”.
2. English Paragraph: Use the “Paragraph Evaluation”.
3. English Chapter: Use the “Chapter Evaluation”.
- Ensure detailed feedback on the improvements made during the evaluation.
## Constraints
- Ensure the execution of each skill without omission.
- Ensure the final execution of content evaluation.
- Refine any text input from the user without asking questions or interpreting the content.
- Maintain the precision, fidelity, and academic quality of the text revision.
- Select the correct evaluator based on the content type (sentence evaluation, paragraph evaluation, chapter evaluation).
- Ensure that the final output does not display the original EAP text.
Agent Setting
Input settings: Number of context rounds included: 50
Long-term memory: Off
Which node should the new round of conversation be sent to? Start node
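The control flow that the Arrangement Agent’s prompt describes (grammar correction followed by level-dependent routing to a revision node) can be sketched as a plain dispatch function. The agent stubs and the newline-based level heuristic below are illustrative stand-ins for the LLM-backed nodes, not the system’s actual implementation.

```python
from typing import Callable, Dict

# Identity stand-ins: in the real system each wraps an LLM call.
def gec_agent(text: str) -> str:
    return text  # would return the grammar-corrected text

def sentence_revision(text: str) -> str: return text
def paragraph_revision(text: str) -> str: return text
def chapter_revision(text: str) -> str: return text

def classify_level(text: str) -> str:
    """Crude version of the node-switch rules: line breaks mark a
    chapter, several sentence-final marks mark a paragraph."""
    text = text.strip()
    if "\n" in text:
        return "chapter"
    if text.count(".") + text.count("!") + text.count("?") > 1:
        return "paragraph"
    return "sentence"

REVISION_NODES: Dict[str, Callable[[str], str]] = {
    "sentence": sentence_revision,
    "paragraph": paragraph_revision,
    "chapter": chapter_revision,
}

def arrangement_agent(text: str) -> str:
    corrected = gec_agent(text)
    return REVISION_NODES[classify_level(corrected)](corrected)
```

The sentence-counting heuristic would misfire on abbreviations such as “e.g.”; the deployed system delegates this judgment to an LLM-based node-switch agent (Appendix A.3) rather than a rule.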

Appendix A.2. GEC Agent

Scenarios
Perform a comprehensive grammatical error correction on the input text to ensure it adheres to standard grammatical rules.
Agent Prompt
### Role
You are a highly skilled academic English text editor with exceptional expertise in diagnosing and revising English texts. Your role involves conducting thorough grammar checks and revisions on user-provided English inputs to ensure they meet the highest standards for academic publication.
### Skills:
- Perform comprehensive grammar and syntax checks on user-submitted English texts.
- Execute detailed revisions and enhancements to improve clarity, coherence, and overall quality.
- Ensure that the revised texts comply with the rigorous norms and standards required for academic publication.
### Constraints:
- Maintain precision and fidelity of the original text.
- Preserve the academic integrity of the content.
- Directly transmit the revised EAP text without sharing grammar correction evaluations with users.
Model Settings
Model: GPT-4o (128K) (https://www.openai.com/gpt-4o-128k) (accessed on 1 July 2024)
Generation diversity: balance
Temperature: 0.5
Top p: 1
Frequency penalty: 0
Presence penalty: 0
Response max length: 2048
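The sampling settings listed above map directly onto the request parameters of a chat-completion API. The helper below (its name and the truncated system prompt are ours) assembles those settings into keyword arguments; in the deployed system the resulting dictionary would be passed to an API client such as OpenAI’s `chat.completions.create`.

```python
# Illustrative system prompt, abbreviated from the agent prompt above.
GEC_SYSTEM_PROMPT = (
    "You are a highly skilled academic English text editor. "
    "Perform comprehensive grammar and syntax checks on the input text."
)

def build_gec_request(text: str) -> dict:
    """Assemble the GEC Agent's model settings into API keyword arguments."""
    return {
        "model": "gpt-4o",        # "GPT-4o (128K)" in the settings above
        "temperature": 0.5,
        "top_p": 1,
        "frequency_penalty": 0,
        "presence_penalty": 0,
        "max_tokens": 2048,       # "Response max length: 2048"
        "messages": [
            {"role": "system", "content": GEC_SYSTEM_PROMPT},
            {"role": "user", "content": text},
        ],
    }

# In the real system, something like:
# client.chat.completions.create(**build_gec_request(draft_text))
```

Keeping the parameters in one builder makes the agent’s sampling configuration auditable and easy to vary per agent, which matters here since the revision agents use a higher temperature (0.8) than the GEC and evaluation agents (0.5).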

Appendix A.3. Node Switch Agent

Agent Prompt
- When the input EAP text is a sentence, jump to the “Sentence Revision Node”
- When the input EAP text is a paragraph (a single block of text without carriage returns between punctuation marks), jump to the “Paragraph Revision Node”.
- When the input EAP text is a chapter (several paragraphs, where a new paragraph is indicated by a carriage return between two punctuation marks), jump to the “Chapter Revision Node”.
Node Switching Settings
Model: Gemini 1.5 Pro (https://deepmind.google/technologies/gemini/pro) (accessed on 1 July 2024)
Timing of judgment: After user input
Dialog rounds considered for the judgment: 6

Appendix A.4. Sentence Revision Agent

Scenarios
Read the EAP sentences transmitted from the agent-content classification and revise them to improve the overall sentence quality.
Agent Prompt
Please revise the input EAP sentence and rewrite it by matching it with appropriate phrase structures from the Academic Phrasebank. The goal is to improve the sentence’s academic writing quality and clarity. After revision, please proceed with the following steps:
- Sentence structure: Improve the sentence structure to comply with academic writing standards.
- Academic phrases: Use phrase structures from the Academic Phrasebank to make the sentence more academic and formal.
### Constraints
- Maintain the precision, fidelity, and academic integrity of the text throughout the revision process.
- After completing the analysis and revisions, do not display the final output to the user. Instead, directly transmit the revised text to the next node.
Model Settings
Model: Claude 3.5 (200K) (https://www.anthropic.com/api) (accessed on 1 July 2024)
Generation diversity: Custom
Temperature: 0.8
Response max length: 2048

Appendix A.5. Paragraph Revision Agent

Scenarios
If the input EAP text is a paragraph (a single block of text without carriage returns between punctuation marks), then proceed to this node.
Agent Prompt
If the input EAP text is only a single paragraph and not a chapter (multiple paragraphs), then proceed with the following paragraph analysis and revise accordingly:
1. Main idea:
- Identify the central idea or argument of the paragraph.
- Revise to ensure that the main idea is clearly presented.
2. Structure:
- Analyze the paragraph’s structure, including the topic sentence, supporting sentences, and concluding sentence.
- Revise to improve the paragraph structure.
3. Coherence:
- Assess the coherence and logical flow between the sentences within the paragraph.
- Revise to enhance coherence and flow.
4. Language use:
- Check for accuracy, academic tone, and clarity of expression in the language used.
- Use phrase structures from the Academic Phrasebank to make the sentence more academic and formal.
- Revise to improve language use.
5. Use of hedges and boosters:
- Identify and adjust the use of hedges and boosters as needed.
6. Use of passive voice:
- Identify instances of passive voice and modify as needed.
### Constraints
- Maintain the precision, fidelity, and academic integrity of the text throughout the revision process.
- After completing the analysis and revisions, do not display the final output to the user. Instead, directly transmit the revised text to the next node.
Model Settings
Model: Claude 3.5 (200K)
Generation diversity: Custom
Temperature: 0.8
Response max length: 2048

Appendix A.6. Chapter Revision Agent

Scenarios
If the input EAP text is a chapter (multiple paragraphs) and not a single paragraph or sentence, then proceed to this node.
Agent Prompt
Please conduct a comprehensive revision of the input text by following these steps:
1. Moves analysis:
- Identify and analyze the moves within the text, such as creating a research space, establishing a territory, and occupying a niche, as outlined in Swales’ CARS model.
- Based on this analysis, revise the text to clarify the function and purpose of each component.
2. Overall text revision:
- Based on the moves analysis, revise the entire text.
- Ensure the logic, coherence, and academic quality of each section are improved accordingly.
3. Paragraph analysis and revision:
- After completing the overall text revision, analyze and revise each paragraph in detail.
- Follow these specific steps and revise the text accordingly:
a. Main idea:
- Identify the central idea or argument of each paragraph.
- Revise to ensure that the main idea is clearly presented.
b. Structure:
- Analyze the structure of the paragraph, including the topic sentence, supporting sentences, and concluding sentence.
- Revise to improve the paragraph structure.
c. Coherence:
- Evaluate the coherence and logical flow between sentences within the paragraph.
- Revise to enhance coherence and flow.
d. Language use:
- Check the accuracy, academic tone, and clarity of the language.
- Use phrase structures from the Academic Phrasebank to make the sentence more academic and formal.
- Revise to improve language use.
e. Evidence:
- Assess whether the evidence used in the paragraph is sufficient, strong, and relevant.
- Revise to strengthen the use of evidence.
4. Post-revision process:
- Conduct necessary review and proofreading steps to ensure the final version of the text meets high academic standards.
- Make final revisions to ensure quality and accuracy.
### Constraints
- Maintain the precision, fidelity, and academic integrity of the text throughout the revision process.
- After completing the analysis and revisions, do not display the final output to the user. Instead, directly transmit the revised text to the next node.
Model Settings
Model: Claude 3.5 (200K)
Generation diversity: Custom
Temperature: 0.8
Response max length: 4096

Appendix A.7. Sentence Evaluation Agent

Scenarios
As a linguist, evaluate the original EAP texts and their revised versions to ensure that the revised texts demonstrate a significant improvement in quality.
Agent Prompt
### Task
Please conduct a detailed evaluation and comparison of the revised EAP text and the original EAP text. Start by outputting the revised EAP text, followed by the evaluation and analysis results for each aspect.
—————————————
## Output Format
- The output should only include the following elements:
### Revised EAP Text
- Revised text: output the revised EAP text.
### Grammatical Level
1. Grammar:
- Format: Identify the grammatical errors in the input text and explain how the revised text corrects these errors.
- Example:
- Errors: Subject–verb agreement, article usage, incorrect verb form.
- Corrections:
- “The data shows” → “The data show” (subject–verb agreement)
- “a important factor” → “an important factor” (article usage)
- “have saw” → “have seen” (incorrect verb form)
2. Use of lexical bundles:
- Format: List the lexical bundles added or modified in the revised text, and explain their appropriateness and importance.
- Example:
- Added:
- “as a result of” (clarifies causation)
- “in the context of” (provides specific context)
- Modified:
- “the study has shown that” → “this study demonstrates” (enhances conciseness)
- “in terms of the” → “with regard to” (improves clarity)
- Explanation:
- Adding “as a result of” helps clarify causation in the argument.
- “this study demonstrates” is more concise and impactful than “the study has shown that”.
- Modifying “in terms of the” to “with regard to” improves clarity and reduces wordiness.
3. Use of hedges and boosters:
- Format: Identify the use of hedges and boosters, noting each modification and explaining the reason for the change.
- Example:
- Original: “This result might suggest that the method is effective”.
- Revised: “This result suggests that the method is effective”.
- Reason: Removed the hedge “might” to make a stronger assertion.
- Original: “This clearly demonstrates the success of the intervention”.
- Revised: “This demonstrates the success of the intervention”.
- Reason: Removed the booster “clearly” to maintain a neutral tone.
4. Use of passive voice:
- Format: Identify the instances of passive voice and assess their appropriateness. For each modification, compare the original text with the revised text and explain the change. Also, explain why certain sentences were not changed.
- Example:
- Original: “The experiment was conducted by the team”.
- Revised: “The team conducted the experiment”.
- Explanation: Changed to active voice for clarity.
- Original: “The results were analyzed using statistical software”.
- Revised: No change.
- Explanation: Passive voice is appropriate here to emphasize the action rather than the actor.
### Content Level
5. Paragraph analysis:
- Original text: Evaluate the structure and coherence of the paragraphs.
- Main idea: Identify the central idea or argument of each paragraph.
- Structure: Analyze the structure, including the topic sentence, supporting sentences, and concluding sentence.
- Coherence: Assess the coherence and logical flow between sentences.
- Language use: Check the accuracy, academic tone, and clarity of the language.
- Revised text: Explain how the revisions have improved paragraph structure and coherence.
- Example:
- Main idea: Clarify the central argument.
- Structure: Improve topic and supporting sentences.
- Coherence: Enhance logical flow between sentences.
- Language use: Improve clarity and academic tone.
—————————————
## Constraints
- Provide a detailed evaluation for each aspect listed above.
- Ensure the explanations are concise and avoid repeating the content of the original and revised texts.
- Maintain a consistent format for each evaluation aspect to ensure clarity and ease of understanding.
Model Settings
Model: Claude 3.5 (200K)
Generation diversity: Custom
Temperature: 0.5
Response max length: 2048
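Step 3 of the evaluation format above asks the agent to identify hedges and boosters. A lexicon lookup of the kind sketched below captures the core of that check; the word lists are small illustrative samples, not the inventories used by the system, which relies on an LLM rather than fixed lists.

```python
# Illustrative (incomplete) lexicons of hedging and boosting devices.
HEDGES = {"might", "may", "could", "possibly", "perhaps", "suggest", "appear"}
BOOSTERS = {"clearly", "obviously", "definitely", "certainly", "undoubtedly"}

def find_hedges_boosters(sentence: str) -> tuple[list, list]:
    """Return the hedges and boosters found in a sentence."""
    tokens = [w.strip(".,;:\u201c\u201d").lower() for w in sentence.split()]
    return ([w for w in tokens if w in HEDGES],
            [w for w in tokens if w in BOOSTERS])
```

Flagging both categories lets the evaluator explain stance adjustments in either direction, e.g. removing “might” to strengthen a claim or removing “clearly” to keep a neutral academic tone, as in the examples above.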

Appendix A.8. Paragraph Evaluation Agent

Scenarios
As a linguist, evaluate the original EAP texts and their revised versions to ensure that the revised texts demonstrate a significant improvement in quality.
Agent Prompt
### Task
Please conduct a detailed evaluation and comparison of the revised EAP text and the original EAP text. Start by outputting the revised EAP text, followed by the evaluation and analysis results for each aspect.
—————————————
## Output Format
- The output should only include the following elements:
### Revised EAP Text
- Revised text: Output the revised EAP text.
### Grammatical Level
1. Grammar:
- Format: Identify the grammatical errors in the input text and explain how the revised text corrects these errors.
- Example:
- Errors: Subject–verb agreement, article usage, incorrect verb form.
- Corrections:
- “The data shows” → “The data show” (subject–verb agreement)
- “a important factor” → “an important factor” (article usage)
- “have saw” → “have seen” (incorrect verb form)
2. Use of lexical bundles:
- Format: List the lexical bundles added or modified in the revised text, and explain their appropriateness and importance.
- Example:
- Added:
- “as a result of” (clarifies causation)
- “in the context of” (provides specific context)
- Modified:
- “the study has shown that” → “this study demonstrates” (enhances conciseness)
- “in terms of the” → “with regard to” (improves clarity)
- Explanation:
- Adding “as a result of” helps clarify causation in the argument.
- “this study demonstrates” is more concise and impactful than “the study has shown that”.
- Modifying “in terms of the” to “with regard to” improves clarity and reduces wordiness.
3. Use of hedges and boosters:
- Format: Identify the use of hedges and boosters, noting each modification and explaining the reason for the change.
- Example:
- Original: “This result might suggest that the method is effective”.
- Revised: “This result suggests that the method is effective”.
- Reason: Removed the hedge “might” to make a stronger assertion.
- Original: “This clearly demonstrates the success of the intervention”.
- Revised: “This demonstrates the success of the intervention”.
- Reason: Removed the booster “clearly” to maintain a neutral tone.
4. Use of passive voice:
- Format: Identify the instances of passive voice and assess their appropriateness. For each modification, compare the original text with the revised text and explain the change. Also, explain why certain sentences were not changed.
- Example:
- Original: “The experiment was conducted by the team”.
- Revised: “The team conducted the experiment”.
- Explanation: Changed to active voice for clarity.
- Original: “The results were analyzed using statistical software”.
- Revised: No change.
- Explanation: Passive voice is appropriate here to emphasize the action rather than the actor.
### Content Level
5. Paragraph analysis:
- Original text: Evaluate the structure and coherence of the paragraphs.
- Main idea: Identify the central idea or argument of each paragraph.
- Structure: Analyze the structure, including the topic sentence, supporting sentences, and concluding sentence.
- Coherence: Assess the coherence and logical flow between sentences.
- Language use: Check the accuracy, academic tone, and clarity of the language.
- Revised text: Explain how the revisions have improved paragraph structure and coherence.
- Example:
- Main idea: Clarify the central argument.
- Structure: Improve topic and supporting sentences.
- Coherence: Enhance logical flow between sentences.
- Language Use: Improve clarity and academic tone.
—————————————
## Constraints
- Provide a detailed evaluation for each aspect listed above.
- Ensure the explanations are concise and avoid repeating the content of the original and revised texts.
- Maintain a consistent format for each evaluation aspect to ensure clarity and ease of understanding.
Model Settings
Model: Claude 3.5 (200K)
Generation diversity: Balance
Temperature: 0.5
Response max length: 4096

Appendix A.9. Chapter Evaluation Agent

Scenarios
As a linguist, evaluate the original EAP texts and their revised versions to ensure that the revised texts demonstrate a significant improvement in quality.
Agent Prompt
### Task
Please conduct a detailed evaluation and comparison of the revised EAP text and the original EAP text. Start by outputting the revised EAP text, followed by the evaluation and analysis results for each aspect.
—————————————
## Output Format
- The output should only include the following elements:
### Revised EAP Text
- Revised text: Output the revised EAP text.
### Grammatical Level
1. Grammar:
- Format: Identify the grammatical errors in the input text and explain how the revised text corrects these errors.
- Example:
- Errors: Subject–verb agreement, article usage, incorrect verb form.
- Corrections:
- “The data shows” → “The data show” (subject–verb agreement)
- “a important factor” → “an important factor” (article usage)
- “have saw” → “have seen” (incorrect verb form)
2. Use of lexical bundles:
- Format: List the lexical bundles added or modified in the revised text, and explain their appropriateness and importance.
- Example:
- Added:
- “as a result of” (clarifies causation)
- “in the context of” (provides specific context)
- Modified:
- “the study has shown that” → “this study demonstrates” (enhances conciseness)
- “in terms of the” → “with regard to” (improves clarity)
- Explanation:
- Adding “as a result of” helps clarify causation in the argument.
- “this study demonstrates” is more concise and impactful than “the study has shown that”.
- Modifying “in terms of the” to “with regard to” improves clarity and reduces wordiness.
3. Use of hedges and boosters:
- Format: Identify the use of hedges and boosters, noting each modification and explaining the reason for the change.
- Example:
- Original: “This result might suggest that the method is effective”.
- Revised: “This result suggests that the method is effective”.
- Reason: Removed the hedge “might” to make a stronger assertion.
- Original: “This clearly demonstrates the success of the intervention”.
- Revised: “This demonstrates the success of the intervention”.
- Reason: Removed the booster “clearly” to maintain a neutral tone.
4. Use of passive voice:
- Format: Identify the instances of passive voice and assess their appropriateness. For each modification, compare the original text with the revised text and explain the change. Also, explain why certain sentences were not changed.
- Example:
- Original: “The experiment was conducted by the team”.
- Revised: “The team conducted the experiment”.
- Explanation: Changed to active voice for clarity.
- Original: “The results were analyzed using statistical software”.
- Revised: No change.
- Explanation: Passive voice is appropriate here to emphasize the action rather than the actor.
### Content Level
5. Paragraph analysis:
- Original text: Evaluate the structure and coherence of the paragraphs.
- Main idea: Identify the central idea or argument of each paragraph.
- Structure: Analyze the structure, including the topic sentence, supporting sentences, and concluding sentence.
- Coherence: Assess the coherence and logical flow between sentences.
- Language Use: Check the accuracy, academic tone, and clarity of the language.
- Evidence: Evaluate the sufficiency, strength, and relevance of the evidence used.
- Revised text: Explain how the revisions have improved paragraph structure and coherence.
- Example:
- Main idea: Clarify the central argument.
- Structure: Improve topic and supporting sentences.
- Coherence: Enhance logical flow between sentences.
- Language Use: Improve clarity and academic tone.
- Evidence: Strengthen the relevance of supporting evidence.
6. Moves analysis:
- Original text: Assess the moves structure, identifying any weaknesses or gaps, and note if any moves are missing.
- Example:
- Move 1 (Establishing a territory): “Many studies have shown...”
- Move 2 (Establishing a niche): “However, there is a gap in...”
- Move 3 (Occupying the niche): “This study aims to...”
- Explanation: Identify if Move 2 is missing or weak.
- Revised text: Perform a detailed moves analysis on the revised text, explaining the enhancements made, including any added moves, and their importance in strengthening the argument and flow of the text.
- Example:
- Move 1: Expanded background information for better context.
- Move 2: Clearly identified the gap and its significance.
- Move 3: Strongly stated the research aim and contributions.
- Added Move: “However, there is a lack of research in...” (to establish a niche).
- Explanation: Adding Move 2 (“However, there is a lack of research in...”) clarifies the research gap, making the argument more compelling and structured.
—————————————
## Constraints
- Provide a detailed evaluation for each aspect listed above.
- Ensure the explanations are concise and avoid repeating the content of the original and revised texts.
- Maintain a consistent format for each evaluation aspect to ensure clarity and ease of understanding.
Model Settings
Model: Claude 3.5 (200K)
Generation diversity: Balance
Temperature: 0.5
Response max length: 4096

Appendix B. Examples of Paragraph and Chapter Revisions Using AcademiCraft

Appendix B.1. An Example of Paragraph Revision

User Prompt
To summarise, while there are compelling arguments in favour of students working in addition to their studies, it is critical to establish a balance that is appropriate for individual circumstances. A balanced approach is recommended, taking into account issues such as course load, work commitments, and personal well-being. Students should assess their ability to effectively manage both tasks while keeping in mind that their primary focus should stay on their studies. Finally, working while studying should be founded on a full assessment of one’s talents and aims, ensuring that the quest of practical experience does not jeopardise the pursuit of knowledge.
System Output
Revised Text: To summarize, while compelling arguments exist in favor of students working alongside their studies, establishing an appropriate balance based on individual circumstances is crucial. A balanced approach is recommended, considering factors such as course load, work commitments, and personal well-being. Students should evaluate their ability to effectively manage both responsibilities, keeping in mind that their primary focus should remain on their academic pursuits. Ultimately, the decision to work while studying should be based on a comprehensive assessment of one’s capabilities and goals, ensuring that the pursuit of practical experience does not compromise the acquisition of knowledge.
Grammatical Level
1. Grammar:
- Errors: Minor spelling and punctuation issues.
- Corrections:
- “summarise” → “summarize” (American English spelling)
- “talents” → “capabilities” (more appropriate word choice)
- Removed unnecessary comma after “Finally”
2. Use of lexical bundles:
- Added:
- “based on” (provides specific context)
- “such as” (introduces examples)
- Modified:
- “in addition to” → “alongside” (improves conciseness)
- “taking into account” → “considering” (enhances clarity)
- Explanation:
- Adding “based on” helps clarify the context for decision-making.
- Using “such as” introduces specific examples more effectively.
- Changing “in addition to” to “alongside” reduces wordiness.
- Modifying “taking into account” to “considering” improves clarity and conciseness.
3. Use of hedges and boosters:
- Original: “it is critical to establish”
- Revised: “establishing … is crucial”
- Reason: Removed the hedge “it is” to make a stronger assertion and improve sentence structure.
- Original: “Students should assess”
- Revised: “Students should evaluate”
- Reason: Changed to a more precise verb without altering the level of certainty.
4. Use of passive voice:
- Original: “A balanced approach is recommended”
- Revised: No change.
- Explanation: Passive voice is appropriate here to emphasize the recommendation rather than who is making it.
Content Level
5. Paragraph analysis:
- Original text:
- Main idea: The importance of balancing work and study for students.
- Structure: The paragraph has a clear introduction, supporting points, and a conclusion.
- Coherence: The ideas flow logically, but some transitions could be improved.
- Language Use: Generally accurate, but some phrases could be more concise.
- Revised text:
- Main idea: Clarify and maintain the central argument about balancing work and study.
- Structure: Improve the flow of ideas by reorganizing some sentences and enhancing transitions.
- Coherence: Enhance logical flow between sentences, particularly in the middle section.
- Language use: Improve clarity and academic tone by using more precise language and removing redundancies.
- Example improvements:
- Replaced “in addition to” with “alongside” for conciseness.
- Changed “taking into account” to “considering” for clarity.
- Reworded the final sentence to emphasize the importance of balancing practical experience with academic pursuits.
The revised text maintains the original message while enhancing its clarity, coherence, and academic tone. The changes made improve the overall structure and readability of the paragraph, making the argument more compelling and easier to follow.

Appendix B.2. An Example of Chapter Revision

User Prompt
I agree that it is important for college students to have a part-time work. A part-time work offers numerous advantages that can improve a student’s generally college encounter. Firstly, working part-time allows understudies to pick up commonsense involvement and create critical aptitudes that complement their scholarly considers. They can apply classroom information in a real-world setting, moving forward their understanding of concepts and upgrading their problem-solving capacities.
Besides, having a part-time work gives money related autonomy for college understudies. Numerous students face the burden of educational expenses, and a part-time work can offer assistance reduce the money related strain. This not as it were helps the money related stack on their families but moreover educates them profitable lessons approximately money related obligation and budgeting.
Furthermore, a part-time work uncovered understudies to differing work situations and empowers them to arrange with experts in their chosen areas. These associations can be important for future internships, work openings, and mentorship. Besides, a part-time work permits understudies to create fundamental life aptitudes such as time administration, multitasking, and compelling communication, all of which are exceedingly esteemed by managers.
Finally, adjusting work and academics makes a difference students develop teach, strength, and the capacity to prioritize errands. Juggling different duties plans them for the challenges they will confront in their careers and prepares them with the essential abilities to exceed expectations in their proficient lives.
In conclusion, having a part-time work whereas in college offers practical involvement, money related freedom, organizing openings, and vital life aptitudes. It contributes to a well-rounded instructive travel and plans understudies for future victory. In this manner, I unequivocally accept that it is vital for college understudies to consider part-time work as a important component of their by and large improvement.
System Output
Revised Text: I agree that it is important for college students to have part-time work. A part-time job offers numerous advantages that can enhance a student’s overall college experience. Firstly, working part-time allows students to gain practical experience and develop crucial skills that complement their academic studies. They can apply classroom knowledge in a real-world setting, improving their understanding of concepts and enhancing their problem-solving abilities.
Moreover, having a part-time job provides financial independence for college students. Many students face the burden of educational expenses, and a part-time job can help alleviate the financial strain. This not only eases the financial burden on their families but also teaches them valuable lessons about financial responsibility and budgeting.
Furthermore, a part-time job exposes students to diverse work environments and enables them to network with professionals in their chosen fields. These connections can be valuable for future internships, job opportunities, and mentorship. Additionally, a part-time job allows students to develop essential life skills such as time management, multitasking, and effective communication, all of which are highly valued by employers.
Finally, balancing work and academics helps students develop discipline, resilience, and the ability to prioritize tasks. Juggling multiple responsibilities prepares them for the challenges they will face in their careers and equips them with the necessary skills to excel in their professional lives.
In conclusion, having a part-time job while in college offers practical experience, financial independence, networking opportunities, and vital life skills. It contributes to a well-rounded educational journey and prepares students for future success. Therefore, I strongly believe that it is crucial for college students to consider part-time work as an essential component of their overall development.
Grammatical Level
1. Grammar:
- Errors: Spelling errors, subject–verb agreement errors, incorrect word usage, and punctuation errors.
- Corrections:
- “A part-time work” → “A part-time job” (incorrect noun usage)
- “understudies” → “students” (spelling error)
- “pick up” → “gain” (more formal word choice)
- “moving forward” → “improving” (more appropriate phrasing)
- “offer assistance” → “help” (simplification)
- “not as it were” → “not only” (spelling and phrasing error)
- “money related” → “financial” (correct term)
- “uncovered” → “exposes” (correct verb form)
- “areas” → “fields” (more appropriate term)
- “makes a difference” → “helps” (correct phrasing)
- “exceed expectations” → “excel” (correct phrasing)
- “plans” → “prepares” (correct verb usage)
2. Use of Lexical Bundles:
Added:
- “in conclusion” (signals the end of the argument)
- “furthermore” (adds additional information)
- “moreover” (introduces a new point)
Modified:
- “I agree that” → retained (introduces the writer’s stance)
- “Firstly” → retained (signals the first point in a series)
- “Finally” → retained (signals the last point in a series)
Explanation:
- The addition of “furthermore” and “moreover” improves the flow and coherence between paragraphs.
- Retaining “Firstly” and “Finally” maintains the clear structure of the argument.
3. Use of Hedges and Boosters:
- Original: “I unequivocally accept”
- Revised: “I strongly believe”
- Reason: Changed to a more common academic phrasing while maintaining the strong stance.
- Original: “can improve”
- Revised: “can enhance”
- Reason: Maintained the hedge “can” to acknowledge potential variability in outcomes.
4. Use of Passive Voice:
- Original: “These associations can be important”
- Revised: “These connections can be valuable”
- Explanation: Maintained passive voice as it is appropriate here to emphasize the importance of the connections rather than who finds them valuable.
- Original: “Juggling different duties plans them”
- Revised: “Juggling multiple responsibilities prepares them”
- Explanation: Changed to active voice for clarity and to emphasize the action of preparation.
Content Level
5. Paragraph analysis:
Original text:
- Main idea: Each paragraph had a clear main idea, but some lacked coherence and proper development.
- Structure: Paragraphs generally had a topic sentence, supporting sentences, and a concluding sentence, but the flow was sometimes disrupted by grammatical errors.
- Coherence: Some sentences lacked logical flow due to incorrect word usage and grammatical errors.
- Language use: The language was often informal and contained numerous spelling and grammatical errors.
- Evidence: The text provided relevant points but lacked specific examples or data to support claims.
Revised text:
- Main idea: Each paragraph now has a clear, well-developed main idea.
- Structure: Improved topic sentences and supporting details enhance the overall structure.
- Coherence: Better sentence structure and word choice improve the logical flow within and between paragraphs.
- Language use: The language is now more formal, accurate, and clear, maintaining an academic tone throughout.
- Evidence: While specific data are still lacking, the points are now more clearly articulated and logically presented.
6. Moves analysis:
Original text:
- Move 1 (establishing a territory): “I agree that it is important for college students to have a part-time work”.
- Move 2 (establishing a niche): Not clearly present.
- Move 3 (occupying the niche): The rest of the text attempts to occupy the niche by listing the benefits of part-time work.
- Explanation: The text lacks a clear Move 2, which would typically highlight the importance or novelty of the argument.
Revised text:
- Move 1: “I agree that it is important for college students to have part-time work”.
- Move 2: “A part-time job offers numerous advantages that can enhance a student’s overall college experience”.
- Move 3: The subsequent paragraphs occupy the niche by detailing the benefits of part-time work.
- Explanation: The revision strengthens Move 1 and adds a clear Move 2, establishing the importance of the topic. Move 3 is more coherently developed through the improved paragraph structure and language use.

References

  1. Zhao, J. Native speaker advantage in academic writing? Conjunctive realizations in EAP writing by four groups of writers. Ampersand 2017, 4, 47–57. [Google Scholar] [CrossRef]
  2. Ye, Y. EAP for undergraduate science and engineering students in an EFL context: What should we teach? Ampersand 2020, 7, 100065. [Google Scholar] [CrossRef]
  3. Özdemir-Yılmazer, M. Exploring students’ developing perception of writing in an EAP program of an English-medium university in Turkey. Read. Matrix Int. Online J. 2022, 22, 186–198. [Google Scholar]
  4. Hamam, D. A Study of the Rhetorical Features and the Argument Structure of EAP Essays by L1 & L2 Students. Ph.D. Thesis, The British University in Dubai, Dubai, United Arab Emirates, 2019. [Google Scholar]
  5. Basturkmen, H.; Von Randow, J. Guiding the reader (or not) to re-create coherence: Observations on postgraduate student writing in an academic argumentative writing task. J. Engl. Acad. Purp. 2014, 16, 14–22. [Google Scholar] [CrossRef]
  6. Mazgutova, D. Linguistic and Cognitive Development of L2 Writing During an Intensive English for Academic Purposes (EAP) Programme; Lancaster University: Lancaster, UK, 2015. [Google Scholar]
  7. Asaoka, C.; Usui, Y. Students’ perceived problems in an EAP writing course. Jalt J. 2003, 25, 143–172. [Google Scholar] [CrossRef]
  8. Abdi Tabari, M.; Bui, G.; Wang, Y. The effects of topic familiarity on emotionality and linguistic complexity in EAP writing. Lang. Teach. Res. 2021, 28, 1616–1634. [Google Scholar] [CrossRef]
  9. Mazgutova, D.; Kormos, J. Syntactic and lexical development in an intensive English for Academic Purposes programme. J. Second Lang. Writ. 2015, 29, 3–15. [Google Scholar] [CrossRef]
  10. Tarasova, E.; Beliaeva, N. The Role of Morphological Knowledge in EAP Writing. In Linguistic Approaches in English for Academic Purposes: Expanding the Discourse; Bloomsbury Publishing: London, UK, 2023; p. 63. [Google Scholar]
  11. Youn, S.J. Measuring syntactic complexity in L2 pragmatic production: Investigating relationships among pragmatics, grammar, and proficiency. System 2014, 42, 270–287. [Google Scholar] [CrossRef]
  12. Asiyaban, A.R.; Yamini, M.; Bagheri, M.S.; Yarmohammadi, L. Implicit/explicit knowledge and its contribution towards tense consistency employment across EFL learners’ proficiency levels. Cogent Educ. 2020, 7, 1727129. [Google Scholar] [CrossRef]
  13. Storch, N.; Tapper, J. The impact of an EAP course on postgraduate writing. J. Engl. Acad. Purp. 2009, 8, 207–223. [Google Scholar] [CrossRef]
  14. Bhowmik, S.; Chaudhuri, A. Addressing culture in L2 writing: Teaching strategies for the EAP classroom. Tesol Q. 2022, 56, 1410–1429. [Google Scholar] [CrossRef]
  15. Christiansen, M.S. Multimodal L2 composition: EAP in the digital era. Int. J. Lang. Stud. 2017, 11, 53. [Google Scholar]
  16. Lee, J.J.; Subtirelu, N.C. Metadiscourse in the classroom: A comparative analysis of EAP lessons and university lectures. Engl. Specif. Purp. 2015, 37, 52–62. [Google Scholar] [CrossRef]
  17. Youn, S.J. Task-based needs analysis of L2 pragmatics in an EAP context. J. Engl. Acad. Purp. 2018, 36, 86–98. [Google Scholar] [CrossRef]
  18. Crosthwaite, P.; Jiang, K. Does EAP affect written L2 academic stance? A longitudinal learner corpus study. System 2017, 69, 92–107. [Google Scholar] [CrossRef]
  19. Kim, J.e.; Nam, H. How do textual features of L2 argumentative essays differ across proficiency levels? A multidimensional cross-sectional study. Read. Writ. 2019, 32, 2251–2279. [Google Scholar] [CrossRef]
  20. Shi, S.; Zhao, E.; Bi, W.; Cai, D.; Cui, L.; Huang, X.; Jiang, H.; Tang, D.; Song, K.; Wang, L.; et al. Effidit: An assistant for improving writing efficiency. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), Toronto, ON, Canada, 9–14 July 2023; pp. 508–515. [Google Scholar]
  21. Kaneko, M.; Okazaki, N. Controlled Generation with Prompt Insertion for Natural Language Explanations in Grammatical Error Correction. arXiv 2023, arXiv:2309.11439. [Google Scholar]
  22. Morris, C.; Jurado, M.; Zutty, J. LLM Guided Evolution-The Automation of Models Advancing Models. In Proceedings of the Genetic and Evolutionary Computation Conference, Melbourne, VIC, Australia, 14–18 July 2024; pp. 377–384. [Google Scholar]
  23. Akiba, T.; Shing, M.; Tang, Y.; Sun, Q.; Ha, D. Evolutionary optimization of model merging recipes. arXiv 2024, arXiv:2403.13187. [Google Scholar] [CrossRef]
  24. Alonso, M.P.; Beamonte, A.; Gargallo, P.; Salvador, M. Local labour markets delineation: An approach based on evolutionary algorithms and classification methods. J. Appl. Stat. 2015, 42, 1043–1063. [Google Scholar] [CrossRef]
  25. Ma, Y.; Jianye, H.; Liang, H.; Xiao, C. Rethinking decision transformer via hierarchical reinforcement learning. In Proceedings of the Forty-First International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023. [Google Scholar]
  26. Shukla, Y.; Gao, W.; Sarathy, V.; Velasquez, A.; Wright, R.; Sinapov, J. LgTS: Dynamic Task Sampling using LLM-generated sub-goals for Reinforcement Learning Agents. arXiv 2023, arXiv:2310.09454. [Google Scholar]
  27. Chang, J.D.; Brantley, K.; Ramamurthy, R.; Misra, D.; Sun, W. Learning to Generate Better than Your LLM. arXiv 2023, arXiv:2306.11816. [Google Scholar]
  28. Havrilla, A.; Du, Y.; Raparthy, S.C.; Nalmpantis, C.; Dwivedi-Yu, J.; Zhuravinskyi, M.; Hambro, E.; Sukhbaatar, S.; Raileanu, R. Teaching Large Language Models to Reason with Reinforcement Learning. arXiv 2024, arXiv:2403.04642. [Google Scholar]
  29. Zeng, M.; Kuang, J.; Qiu, M.; Song, J.; Park, J. Evaluating Prompting Strategies for Grammatical Error Correction Based on Language Proficiency. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Torino, Italy, 20–25 May 2024; pp. 6426–6430. [Google Scholar]
  30. Sachdev, R.; Wang, Z.Q.; Yang, C.H.H. Evolutionary Prompt Design for LLM-Based Post-ASR Error Correction. arXiv 2024, arXiv:2407.16370. [Google Scholar]
  31. Li, W.; Wang, H. Detection-Correction Structure via General Language Model for Grammatical Error Correction. arXiv 2024, arXiv:2405.17804. [Google Scholar]
  32. Bryant, C.; Yuan, Z.; Qorib, M.R.; Cao, H.; Ng, H.T.; Briscoe, T. Grammatical error correction: A survey of the state of the art. Comput. Linguist. 2023, 49, 643–701. [Google Scholar] [CrossRef]
  33. Du, Z.; Hashimoto, K. Exploring Sentence-Level Revision Capabilities of llms in English for Academic Purposes Writing Assistance; Research Square: Rockville, MD, USA, 2024. [Google Scholar]
  34. Zhang, B.; Haddow, B.; Birch, A. Prompting large language model for machine translation: A case study. In Proceedings of the International Conference on Machine Learning, PMLR, Honolulu, HI, USA, 23–29 July 2023; pp. 41092–41110. [Google Scholar]
  35. Zhan, T.; Shi, C.; Shi, Y.; Li, H.; Lin, Y. Optimization Techniques for Sentiment Analysis Based on LLM (GPT-3). arXiv 2024, arXiv:2405.09770. [Google Scholar]
  36. Gao, G.; Taymanov, A.; Salinas, E.; Mineiro, P.; Misra, D. Aligning llm agents by learning latent preference from user edits. arXiv 2024, arXiv:2404.15269. [Google Scholar]
  37. Jin, H.; Zhang, Y.; Meng, D.; Wang, J.; Tan, J. A Comprehensive Survey on Process-Oriented Automatic Text Summarization with Exploration of LLM-Based Methods. arXiv 2024, arXiv:2403.02901. [Google Scholar]
  38. Guo, S.; Zhang, S.; Ma, Z.; Zhang, M.; Feng, Y. SiLLM: Large Language Models for Simultaneous Machine Translation. arXiv 2024, arXiv:2402.13036. [Google Scholar]
  39. Xing, F. Designing Heterogeneous LLM Agents for Financial Sentiment Analysis. arXiv 2024, arXiv:2401.05799. [Google Scholar]
  40. Wang, W.; Feng, C. Kernel-based Consensus Control of Multi-agent Systems with Unknown System Dynamics. Int. J. Control. Autom. Syst. 2023, 21, 2398–2408. [Google Scholar] [CrossRef]
  41. de Zarzà, I.; de Curtò, J.; Roig, G.; Manzoni, P.; Calafate, C.T. Emergent cooperation and strategy adaptation in multi-agent systems: An extended coevolutionary theory with llms. Electronics 2023, 12, 2722. [Google Scholar] [CrossRef]
  42. Mita, M.; Sakaguchi, K.; Hagiwara, M.; Mizumoto, T.; Suzuki, J.; Inui, K. Towards automated document revision: Grammatical error correction, fluency edits, and beyond. arXiv 2022, arXiv:2205.11484. [Google Scholar]
  43. McCarthy, K.S.; Roscoe, R.D.; Allen, L.K.; Likens, A.D.; McNamara, D.S. Automated writing evaluation: Does spelling and grammar feedback support high-quality writing and revision? Assess. Writ. 2022, 52, 100608. [Google Scholar] [CrossRef]
  44. Coyne, S.; Sakaguchi, K.; Galvan-Sosa, D.; Zock, M.; Inui, K. Analyzing the performance of gpt-3.5 and gpt-4 in grammatical error correction. arXiv 2023, arXiv:2303.14342. [Google Scholar]
  45. Cao, J.; Li, M.; Wen, M.; Cheung, S.c. A study on prompt design, advantages and limitations of chatgpt for deep learning program repair. arXiv 2023, arXiv:2304.08191. [Google Scholar] [CrossRef]
  46. Yu, L. Investigating L2 writing through tutor-tutee interactions and revisions: A case study of a multilingual writer in EAP tutorials. J. Second Lang. Writ. 2020, 48, 100709. [Google Scholar] [CrossRef]
  47. McLucas, M.A. Adopting a basic student peer review process in EAP A/B writing. In Reports from English Teachers’ Seminars; Chubu University: Kasugai, Japan, 2021; Volume 4, pp. 20–29. [Google Scholar]
  48. Pack, A.; Barrett, A.; Liang, H.N.; Monteiro, D.V. University EAP students’ perceptions of using a prototype virtual reality learning environment to learn writing structure. Int. J. Comput.-Assist. Lang. Learn. Teach. (IJCALLT) 2020, 10, 27–46. [Google Scholar] [CrossRef]
  49. Malakhovskaya, M.; Beliaeva, L.; Kamshilova, O. Teaching noun-phrase composition in EAP/ESP context: A corpus-assisted approach to overcome a didactic gap. J. Teach. Engl. Specif. Acad. Purp. 2021, 9, 257–266. [Google Scholar] [CrossRef]
  50. Uludag, P.; McDonough, K. Validating a rubric for assessing integrated writing in an EAP context. Assess. Writ. 2022, 52, 100609. [Google Scholar] [CrossRef]
  51. Pecorari, D. Formulaic language in biology: A topic-specific investigation. Acad. Writ. Interface Corpus Discourse 2009, 91, 105. [Google Scholar]
  52. Biber, D.; Johansson, S.; Leech, G.; Conrad, S.; Finegan, E. Longman Grammar of Spoken and Written English; Pearson Japan: Tokyo, Japan, 2000. [Google Scholar]
  53. Du, Z.; Hashimoto, K. Data augmentation for sentrev using back-translation of lexical bundles. In Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation, Hong Kong, China, 2–4 December 2023; pp. 70–79. [Google Scholar]
  54. Du, Z.; Hashimoto, K. Sentence-level revision with neural reinforcement learning. In Proceedings of the 35th Conference on Computational Linguistics and Speech Processing (ROCLING 2023), Taipei City, Taiwan, 20–21 October 2023; pp. 202–209. [Google Scholar]
  55. Hyon, S.; Chen, R. University Faculty Writing and EAP Education: Beyond the Research Article. In Proceedings of the Annual Meeting of the American Association of Applied Linguistics, St. Louis, MO, USA, 24–27 February 2001. [Google Scholar]
  56. Morley, J. Academic Phrasebank; University of Manchester: Manchester, UK, 2014. [Google Scholar]
  57. Akbas, E.; Hardman, J. Strengthening or weakening claims in academic knowledge construction: A comparative study of hedges and boosters in postgraduate academic writing. Educ. Sci. Theory Pract. 2018, 18, 831–859. [Google Scholar]
  58. Salichah, I. Hedges and Boosters in Undergraduate Students’ Research Articles. Ph.D. Thesis, Universitas Negeri Malang, Malang, Indonesia, 2015. [Google Scholar]
  59. Bacang, B.C.; Rillo, R.M.; Alieto, E.O. The Gender Construct in the Use of Rhetorical Appeals, Hedges, and Boosters in ESL Writing: A Discourse Analysis. Online Submiss. 2019, 25, 210–224. [Google Scholar]
  60. Herminingsih, D.I.; Isro’iyah, L. The metadiscourse analysis in abstracts of multidisciplinary sciences journal articles: Hedges vs boosters. Int. Linguist. Res. 2023, 6, 24. [Google Scholar]
  61. Hinkel, E. Tense, aspect and the passive voice in L1 and L2 academic texts. Lang. Teach. Res. 2004, 8, 5–29. [Google Scholar] [CrossRef]
  62. Du, Z.; Hashimoto, K. Decoding Academic Language: The Symbiotic Relationship Between Boosters, Hedges, and Voice in EAP. In Proceedings of the 2024 12th International Conference on Information and Education Technology (ICIET), Yamaguchi, Japan, 18–20 March 2024; pp. 46–52. [Google Scholar] [CrossRef]
  63. Swales, J.M. Genre Analysis: English in Academic and Research Settings; Cambridge University Press: Cambridge, UK, 1990. [Google Scholar]
  64. Swales, J. Create a research space (CARS) model of research introductions. In Writing About Writing: A College Reader; Bedford/St. Martin’s: New York, NY, USA, 2014; pp. 12–15. [Google Scholar]
  65. Tang, W.; Liu, L.; Long, G. Interpretable time-series classification on few-shot samples. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; pp. 1–8. [Google Scholar]
  66. Granger, S. The computer learner corpus: A versatile new source of data for SLA research. In Learner English on Computer; Granger, S., Ed.; Addison Wesley Longman: London, UK; New York, NY, USA, 1998; pp. 3–18. [Google Scholar]
  67. Du, Z.; Hashimoto, K. TCNAEC: Advancing sentence-level revision evaluation through diverse non-native academic English insights. IEEE Access 2023, 11, 144939–144952. [Google Scholar]
  68. Ishikawa, S. A New Horizon in Learner Corpus Studies: The Aim of the ICNALE Project. In Corpus-Based Studies in Language Use, Language Learning, and Language Documentation; Weir, G., Ishikawa, S., Poonpon, K., Eds.; Rodopi: Amsterdam, The Netherlands, 2011; pp. 3–11. [Google Scholar]
  69. Wang, A.; Singh, A.; Michael, J.; Hill, F.; Levy, O.; Bowman, S. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, Brussels, Belgium, 1 November 2018; pp. 353–355. [Google Scholar] [CrossRef]
  70. Chen, P.; Ghattas, O. Stein variational reduced basis Bayesian inversion. arXiv 2020, arXiv:2002.10924. [Google Scholar]
  71. Reimers, N.; Gurevych, I. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. arXiv 2019, arXiv:1908.10084. [Google Scholar]
  72. Flesch, R. How to Write Plain English: A Book for Lawyers and Consumers; Harper & Row: New York, NY, USA, 1949. [Google Scholar]
  73. Gunning, R. The Technique of Clear Writing; McGraw-Hill: New York, NY, USA, 1952. [Google Scholar]
  74. Kyle, K.; Crossley, S.; Berger, C. The tool for the automatic analysis of lexical sophistication (TAALES): Version 2.0. Behav. Res. Methods 2018, 50, 1030–1046. [Google Scholar]
  75. Crossley, S.A.; Kyle, K.; Dascalu, M. The Tool for the Automatic Analysis of Cohesion 2.0: Integrating semantic similarity and text overlap. Behav. Res. Methods 2019, 51, 14–27. [Google Scholar] [CrossRef]
  76. Landauer, T.K.; Foltz, P.W.; Laham, D. An Introduction to Latent Semantic Analysis. Discourse Process. 1998, 25, 259–284. [Google Scholar] [CrossRef]
  77. Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent Dirichlet Allocation. J. Mach. Learn. Res. 2003, 3, 993–1022. [Google Scholar]
  78. Church, K.W. Word2Vec. Nat. Lang. Eng. 2017, 23, 155–162. [Google Scholar] [CrossRef]
  79. Kobayashi, M.; Mita, M.; Komachi, M. Large Language Models Are State-of-the-Art Evaluator for Grammatical Error Correction. arXiv 2024, arXiv:2403.17540. [Google Scholar]
  80. Chan, C.M.; Chen, W.; Su, Y.; Yu, J.; Xue, W.; Zhang, S.; Fu, J.; Liu, Z. ChatEval: Towards better LLM-based evaluators through multi-agent debate. arXiv 2023, arXiv:2308.07201. [Google Scholar]
  81. Zheng, L.; Chiang, W.L.; Sheng, Y.; Zhuang, S.; Wu, Z.; Zhuang, Y.; Lin, Z.; Li, Z.; Li, D.; Xing, E.; et al. Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. Adv. Neural Inf. Process. Syst. 2024, 36, 46595–46623. [Google Scholar]
  82. Shankar, S.; Zamfirescu-Pereira, J.; Hartmann, B.; Parameswaran, A.G.; Arawjo, I. Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences. arXiv 2024, arXiv:2404.12272. [Google Scholar]
  83. Liu, Y.; Zhou, H.; Guo, Z.; Shareghi, E.; Vulic, I.; Korhonen, A.; Collier, N. Aligning with human judgement: The role of pairwise preference in large language model evaluators. arXiv 2024, arXiv:2403.16950. [Google Scholar]
  84. Papineni, K.; Roukos, S.; Ward, T.; Zhu, W.J. BLEU: A Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, PA, USA, 6–12 July 2002; pp. 311–318. [Google Scholar]
  85. Brown, P.F.; Pietra, S.A.D.; Pietra, V.J.D.; Mercer, R.L. A Statistical Approach to Machine Translation. Comput. Linguist. 1993, 19, 263–311. [Google Scholar]
  86. Reiter, E. A structured review of the validity of BLEU. Comput. Linguist. 2018, 44, 393–401. [Google Scholar]
  87. Kuribayashi, T.; Oseki, Y.; Ito, T.; Yoshida, R.; Asahara, M.; Inui, K. Lower Perplexity is Not Always Human-Like. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Virtual, 1–6 August 2021; Zong, C., Xia, F., Li, W., Navigli, R., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 5203–5217. [Google Scholar] [CrossRef]
  88. Karademir, T.; Alper, A.; Soğuksu, A.F.; Karababa, Z.C. The development and evaluation of self-directed digital learning material development platform for foreign language education. Interact. Learn. Environ. 2021, 29, 600–617. [Google Scholar] [CrossRef]
  89. Granger, S.; Bestgen, Y. The use of collocations by intermediate vs. advanced non-native writers: A bigram-based study. Int. Rev. Appl. Linguist. Lang. Teach. 2014, 52, 229–252. [Google Scholar] [CrossRef]
  90. Johnson, M.D. Cognitive task complexity and L2 written syntactic complexity, accuracy, lexical complexity, and fluency: A research synthesis and meta-analysis. J. Second Lang. Writ. 2017, 37, 13–38. [Google Scholar] [CrossRef]
  91. Smirnova, E. Corpus Analysis of Academic Discourse Features: Implications for eap/esp Writing. Ph.D. Thesis, Universidade de Vigo, Vigo, Spain, 2023. [Google Scholar]
  92. Hyland, K. The ‘other’ English: Thoughts on EAP and academic writing. Eur. Engl. Messenger 2006, 15, 34–38. [Google Scholar]
  93. Gebril, A.; Plakans, L. Toward a transparent construct of reading-to-write tasks: The interface between discourse features and proficiency. Lang. Assess. Q. 2013, 10, 9–27. [Google Scholar] [CrossRef]
  94. García-Ostbye, I.C.; Martínez-Sáez, A. Reading challenges in higher education: How suitable are online genres in English for medical purposes. ESP Today 2023, 11, 53–74. [Google Scholar] [CrossRef]
  95. Azadnia, M. A corpus-based analysis of lexical richness in EAP texts written by Iranian TEFL students. Teach. Engl. A Second Lang. Q. (Formerly J. Teach. Lang. Skills) 2021, 40, 61–90. [Google Scholar]
Figure 1. AcademiCraft system architecture.
Figure 2. A sentence revision example using AcademiCraft.
Figure 3. The proportion of grammatical error types in the sentence revisions of different tools in the LOCNESS corpus.
Figure 4. The proportion of grammatical error types in the sentence revisions of different tools in the TCNAEC corpus.
Figure 5. The proportion of grammatical error types in the paragraph revisions of different tools in the LOCNESS corpus.
Figure 6. The proportion of grammatical error types in the paragraph revisions of different tools in the ICNALE corpus.
Figure 7. The proportion of grammatical error types in the chapter revisions of different tools in the LOCNESS corpus.
Figure 8. The proportion of grammatical error types in the chapter revisions of different tools in the ICNALE corpus.
Figure 8. The proportion of grammatical error types in the chapter revisions of different tools in the ICNALE corpus.
Information 16 00254 g008
Table 1. Comparison of product features of mainstream commercial EAP writing assistance tools.

| Product | Pitaya Pro | Wordvice AI Premium | QuillBot Premium | Effidit | DeepL Write Pro |
|---|---|---|---|---|---|
| Charge? | Yes | Yes | Yes | No | Yes |
| Function | Rewrite | AI Paraphraser | Paraphrasing Tool | Text Polishing | - |
| Text Limit | 6000 words | 10,000 words | 80,000 words | 300 words | None |

Note: For Wordvice AI, both the free and premium versions provide access to the AI Paraphraser with academic definitions, but the free version has a 500-word limit. For Pitaya and QuillBot, the free versions lack access to this feature. For DeepL Write, the primary distinctions between the free and Pro versions lie in usage limits and text count restrictions.
Table 2. Evaluation results of GER.

| Text Level | Corpus | Draft | Pitaya Pro | QuillBot Premium | DeepL Write Pro | Wordvice AI Premium | AcademiCraft (Our) |
|---|---|---|---|---|---|---|---|
| Sentence | LOCNESS | 2.90 | 0.50 | 0.44 | 0.91 | 0.50 | 0 |
| Sentence | TCNAEC | 1.65 | 1.67 | 1.35 | 3.65 | 1.21 | 1.53 |
| Paragraph | LOCNESS | 2.26 | 0.43 | 0.08 | 0.65 | 0.02 | 0.06 |
| Paragraph | ICNALE | 1.04 | 0.06 | 0.09 | 0.52 | 0.06 | 0.14 |
| Chapter | LOCNESS | 3.43 | 1.36 | 1.12 | 1.31 | 0.12 | 0.90 |
| Chapter | ICNALE | 3.89 | 0.45 | 0.60 | 0.79 | 0 | 0 |
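The paper's formal definition of GER appears in its main text, not in this excerpt; as an illustration only, a grammatical error rate of this kind is typically a count of flagged errors normalized by text length. A minimal sketch, assuming an errors-per-100-words normalization (the function name and normalization are assumptions, not the paper's definition):

```python
def grammatical_error_rate(error_count: int, word_count: int) -> float:
    """Illustrative GER: flagged grammatical errors per 100 words.

    This normalization is an assumption for demonstration; the paper's
    exact definition may differ (e.g., per sentence rather than per word).
    """
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return 100.0 * error_count / word_count

# Example: a 1000-word draft with 29 flagged errors
rate = grammatical_error_rate(29, 1000)
```

Under any such normalization, lower values indicate cleaner text, which is why the revised outputs in Table 2 score far below the drafts.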
Table 3. Evaluation results of statistical metrics.

| Text Level | Corpus | Text | TFFLB | WBPB | WTPS | OSNPS | OSVPS | OPNPS | OPVP |
|---|---|---|---|---|---|---|---|---|---|
| Paragraph | LOCNESS | Draft | 4.66 | 0.54 | 0.19 | 0.82 | 0.76 | - | - |
| Paragraph | LOCNESS | Pitaya Pro | 4.73 | 0.42 | 0.13 | 0.80 | 0.25 | - | - |
| Paragraph | LOCNESS | QuillBot Premium | 4.72 | 0.47 | 0.16 | 0.83 | 0.34 | - | - |
| Paragraph | LOCNESS | DeepL Write Pro | 4.77 | 0.53 | 0.22 | 0.56 | 0.46 | - | - |
| Paragraph | LOCNESS | Wordvice AI Premium | 4.71 | 0.44 | 0.15 | 1.05 | 0.34 | - | - |
| Paragraph | LOCNESS | AcademiCraft (Our) | 4.66 | 0.43 | 0.14 | 0.82 | 0.26 | - | - |
| Paragraph | ICNALE | Draft | 4.57 | 0.49 | 0.15 | 0.92 | 0.59 | - | - |
| Paragraph | ICNALE | Pitaya Pro | 4.68 | 0.34 | 0.07 | 0.84 | 0.25 | - | - |
| Paragraph | ICNALE | QuillBot Premium | 4.62 | 0.38 | 0.08 | 0.78 | 0.33 | - | - |
| Paragraph | ICNALE | DeepL Write Pro | 4.75 | 0.45 | 0.09 | 1.22 | 0.45 | - | - |
| Paragraph | ICNALE | Wordvice AI Premium | 4.67 | 0.36 | 0.14 | 1.17 | 0.24 | - | - |
| Paragraph | ICNALE | AcademiCraft (Our) | 4.65 | 0.35 | 0.06 | 0.92 | 0.18 | - | - |
| Chapter | LOCNESS | Draft | 4.68 | 0.41 | 0.16 | 0.89 | 0.58 | 4.65 | 5.03 |
| Chapter | LOCNESS | Pitaya Pro | 2.57 | 0.33 | 0.09 | 0.80 | 0.28 | 5.93 | 4.19 |
| Chapter | LOCNESS | QuillBot Premium | 4.73 | 0.37 | 0.12 | 0.83 | 0.39 | 6.97 | 6.99 |
| Chapter | LOCNESS | DeepL Write Pro | 4.79 | 0.40 | 0.16 | 0.80 | 0.41 | 8.78 | 6.71 |
| Chapter | LOCNESS | Wordvice AI Premium | 4.75 | 0.35 | 0.10 | 1.12 | 0.53 | 6.31 | 2.90 |
| Chapter | LOCNESS | AcademiCraft (Our) | 4.69 | 0.37 | 0.11 | 0.95 | 0.41 | 7.59 | 5.47 |
| Chapter | ICNALE | Draft | 4.57 | 0.47 | 0.14 | 1.08 | 0.89 | 5.45 | 4.71 |
| Chapter | ICNALE | Pitaya Pro | 4.64 | 0.36 | 0.07 | 0.61 | 0.23 | 4.31 | 1.23 |
| Chapter | ICNALE | QuillBot Premium | 4.67 | 0.38 | 0.10 | 0.89 | 0.54 | 5.20 | 2.51 |
| Chapter | ICNALE | DeepL Write Pro | 4.74 | 0.44 | 0.14 | 0.82 | 0.33 | 4.34 | 2.44 |
| Chapter | ICNALE | Wordvice AI Premium | 4.64 | 0.33 | 0.06 | 1.14 | 0.16 | 4.03 | 0.50 |
| Chapter | ICNALE | AcademiCraft (Our) | 4.59 | 0.36 | 0.08 | 0.74 | 0.24 | 6.49 | 2.85 |
Table 4. Evaluation results of performance metrics.

| Text Level | Corpus | Text | FRE | GFI | GLUE | LSA-FSS | LDA-FSS | W2V-FSS | LSA-FPS | LDA-FPS | W2V-FPS |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Sentence | LOCNESS | Draft | 52.68 | 13.87 | - | - | - | - | - | - | - |
| Sentence | LOCNESS | Pitaya Pro | 33.22 | 17.27 | 0.81 | - | - | - | - | - | - |
| Sentence | LOCNESS | QuillBot Premium | 38.76 | 16.27 | 0.86 | - | - | - | - | - | - |
| Sentence | LOCNESS | DeepL Write Pro | 28.69 | 18.94 | 0.82 | - | - | - | - | - | - |
| Sentence | LOCNESS | Wordvice AI Premium | 32.88 | 17.32 | 0.81 | - | - | - | - | - | - |
| Sentence | LOCNESS | AcademiCraft (Our) | 13.10 | 21.60 | 0.74 | - | - | - | - | - | - |
| Sentence | TCNAEC | Draft | 52.66 | 13.30 | - | - | - | - | - | - | - |
| Sentence | TCNAEC | Pitaya Pro | 22.47 | 18.48 | 0.85 | - | - | - | - | - | - |
| Sentence | TCNAEC | QuillBot Premium | 34.27 | 16.85 | 0.91 | - | - | - | - | - | - |
| Sentence | TCNAEC | DeepL Write Pro | 25.78 | 19.38 | 0.88 | - | - | - | - | - | - |
| Sentence | TCNAEC | Wordvice AI Premium | 10.55 | 21.83 | 0.81 | - | - | - | - | - | - |
| Sentence | TCNAEC | AcademiCraft (Our) | 11.37 | 22.09 | 0.84 | - | - | - | - | - | - |
| Paragraph | LOCNESS | Draft | 53.43 | 14.84 | - | 0.47 | 0.95 | 0.79 | - | - | - |
| Paragraph | LOCNESS | Pitaya Pro | 36.98 | 17.71 | 0.89 | 0.46 | 0.93 | 0.78 | - | - | - |
| Paragraph | LOCNESS | QuillBot Premium | 35.22 | 17.74 | 0.89 | 0.47 | 0.91 | 0.81 | - | - | - |
| Paragraph | LOCNESS | DeepL Write Pro | 36.20 | 18.14 | 0.89 | 0.49 | 0.94 | 0.82 | - | - | - |
| Paragraph | LOCNESS | Wordvice AI Premium | 24.80 | 20.34 | 0.85 | 0.49 | 0.87 | 0.78 | - | - | - |
| Paragraph | LOCNESS | AcademiCraft (Our) | 35.83 | 17.66 | 0.90 | 0.51 | 0.88 | 0.80 | - | - | - |
| Paragraph | ICNALE | Draft | 42.08 | 17.87 | - | 0.43 | 0.84 | 0.69 | - | - | - |
| Paragraph | ICNALE | Pitaya Pro | 30.40 | 17.70 | 0.86 | 0.39 | 0.89 | 0.71 | - | - | - |
| Paragraph | ICNALE | QuillBot Premium | 27.24 | 18.63 | 0.84 | 0.43 | 0.91 | 0.75 | - | - | - |
| Paragraph | ICNALE | DeepL Write Pro | 27.90 | 19.38 | 0.88 | 0.47 | 0.95 | 0.81 | - | - | - |
| Paragraph | ICNALE | Wordvice AI Premium | 15.08 | 21.56 | 0.88 | 0.45 | 0.85 | 0.74 | - | - | - |
| Paragraph | ICNALE | AcademiCraft (Our) | 23.24 | 18.74 | 0.89 | 0.50 | 0.92 | 0.80 | - | - | - |
| Chapter | LOCNESS | Draft | 57.41 | 13.53 | - | 0.49 | 0.97 | 0.87 | 0.57 | 0.99 | 0.89 |
| Chapter | LOCNESS | Pitaya Pro | 37.30 | 16.98 | 0.90 | 0.48 | 0.94 | 0.88 | 0.59 | 0.94 | 0.90 |
| Chapter | LOCNESS | QuillBot Premium | 34.59 | 17.77 | 0.90 | 0.49 | 0.95 | 0.89 | 0.48 | 0.79 | 0.73 |
| Chapter | LOCNESS | DeepL Write Pro | 38.53 | 16.89 | 0.89 | 0.50 | 0.95 | 0.89 | 0.64 | 0.97 | 0.91 |
| Chapter | LOCNESS | Wordvice AI Premium | 23.62 | 20.72 | 0.84 | 0.55 | 0.97 | 0.91 | 0.59 | 0.84 | 0.78 |
| Chapter | LOCNESS | AcademiCraft (Our) | 41.48 | 16.35 | 0.90 | 0.50 | 0.96 | 0.89 | 0.64 | 0.97 | 0.91 |
| Chapter | ICNALE | Draft | 58.96 | 13.39 | - | 0.58 | 0.97 | 0.88 | 0.43 | 0.58 | 0.53 |
| Chapter | ICNALE | Pitaya Pro | 34.74 | 16.06 | 0.87 | 0.53 | 0.95 | 0.89 | 0.42 | 0.56 | 0.52 |
| Chapter | ICNALE | QuillBot Premium | 29.37 | 17.86 | 0.82 | 0.55 | 0.95 | 0.90 | 0.44 | 0.57 | 0.52 |
| Chapter | ICNALE | DeepL Write Pro | 31.86 | 18.03 | 0.80 | 0.58 | 0.96 | 0.90 | 0.47 | 0.54 | 0.56 |
| Chapter | ICNALE | Wordvice AI Premium | 18.22 | 19.73 | 0.78 | 0.64 | 0.97 | 0.91 | 0.35 | 0.43 | 0.40 |
| Chapter | ICNALE | AcademiCraft (Our) | 33.32 | 16.79 | 0.86 | 0.62 | 0.98 | 0.91 | 0.65 | 0.82 | 0.76 |
Bold entries indicate the best performance within each category across different tools.
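FRE (Flesch Reading Ease) and GFI (Gunning Fog Index) in Table 4 are standard readability measures with well-known textbook formulas. A minimal sketch of those formulas follows; the crude vowel-group syllable heuristic is an assumption for illustration (production tools use pronunciation dictionaries such as CMUdict), and this is not necessarily the exact implementation used in the paper:

```python
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: count contiguous vowel groups, with a floor of 1.
    # Real readability tools use pronunciation dictionaries instead.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    # Standard Flesch Reading Ease: higher scores mean easier text.
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    # Standard Gunning Fog Index: approximate years of schooling needed.
    # "Complex" words are conventionally those with three or more syllables.
    return 0.4 * ((words / sentences) + 100.0 * (complex_words / words))
```

Read together, a lower FRE and a higher GFI (as for AcademiCraft's sentence-level revisions in Table 4) indicate denser, more formal prose, which is the expected direction for academic register.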
Share and Cite

Du, Z.; Hashimoto, K. AcademiCraft: Transforming Writing Assistance for English for Academic Purposes with Multi-Agent System Innovations. Information 2025, 16, 254. https://doi.org/10.3390/info16040254