Article

Leveraging Explainable AI for LLM Text Attribution: Differentiating Human-Written and Multiple LLM-Generated Text

1 AI and Data Science Department, Arab American University, Jenin P.O. Box 240, Palestine
2 AI Program, Columbia University, New York, NY 10027, USA
3 IoT and Cybersecurity Lab, Eastern Michigan University, Ypsilanti, MI 48197, USA
4 iSTAR Lab, Texas A&M University, College Station, TX 77840, USA
* Authors to whom correspondence should be addressed.
Information 2025, 16(9), 767; https://doi.org/10.3390/info16090767
Submission received: 18 June 2025 / Revised: 26 July 2025 / Accepted: 7 August 2025 / Published: 4 September 2025
(This article belongs to the Special Issue Generative AI Transformations in Industrial and Societal Applications)

Abstract

The rapid development of generative AI Large Language Models (LLMs) has raised concerns about distinguishing content produced by generative AI from content written by humans. One concern arises when students rely heavily on such tools in ways that can hinder the development of their writing or coding skills; related plagiarism issues also apply. This study supports efforts to detect and identify textual content generated with LLM tools. We hypothesize that LLM-generated text is detectable by machine learning (ML) and investigate ML models that can recognize and differentiate between texts generated by humans and by multiple LLM tools. We used a dataset of student-written text alongside LLM-written text. We leveraged several ML and Deep Learning (DL) algorithms, such as Random Forest (RF) and Recurrent Neural Networks (RNNs), and utilized Explainable Artificial Intelligence (XAI) to understand the features that matter most for attribution. Our method is divided into (1) binary classification, differentiating human-written from AI-generated text, and (2) multi-classification, differentiating human-written text from text generated by five different LLM tools (ChatGPT, LLaMA, Google Bard, Claude, and Perplexity). Results show high accuracy in both binary and multi-classification. Our model outperformed GPTZero (78.3%), with an accuracy of 98.5%. Notably, GPTZero was unable to recognize about 4.2% of the observations, whereas our model recognized the complete test dataset. XAI results showed that understanding feature importance across different classes enables detailed author/source profiles, aiding attribution and supporting plagiarism detection by highlighting unique stylistic and structural elements, thereby enabling robust verification of content originality.

1. Introduction

The recent development of generative AI models is rapidly altering our communication patterns. These models are extensively utilized in many fields, including the arts, healthcare, education, and content creation. The issue at hand is the growing risk of plagiarism in academic contexts, especially as more and more students turn to LLM tools such as ChatGPT, Google Bard, and Claude to produce academic writing [1]. Plagiarism, the unethical appropriation of another person’s work without due credit, has long been a problem in educational institutions [2], and the introduction of sophisticated language models exacerbates it. One major concern is that students’ writing skills may deteriorate because of heavy reliance on AI tools for assignments and essays, impeding the crucial process of developing their expressive capabilities.
In addition to compromising the integrity of the educational system, this reliance on AI-generated content makes it difficult for teachers to evaluate students’ academic progress and true talents fairly. Therefore, the pressing problem requires a thorough investigation and a calculated response to deal with the complex effects of unrestricted plagiarism made possible by sophisticated language models [3,4].
The importance of this paper lies in its contribution to the evolving field of textual content attribution in a world where human-authored and AI-generated texts, especially from multiple large language models (LLMs), are increasingly indistinguishable. As AI systems like ChatGPT, LLaMA, Google Bard, Claude, and Perplexity [5,6,7,8,9] become more prevalent, ensuring the authenticity and trustworthiness of content has become a major concern in various domains, such as cybersecurity, academic integrity, and business operations [10,11].
This paper addresses a critical need to reliably differentiate between human-written and AI-generated content, which has significant implications for maintaining ethical standards, ensuring information security, and fostering accountability in an AI-driven world. By proposing an effective model that outperforms generalized systems like GPTZero, this research offers a solution to the growing challenge of content attribution, paving the way for more accurate detection and classification of text originating from diverse sources—whether human or multiple AI systems.
In a world where multiple LLMs are generating content for varied purposes [12,13], from automated reports to news articles, the ability to trace the origin of the text is crucial [14]. This study’s approach of utilizing machine learning algorithms and Explainable AI (XAI) provides not only high accuracy but also transparency in classification, making it valuable for ensuring the integrity of digital content across industries. As AI continues to shape communication and content creation, the findings of this paper will play a vital role in shaping how we verify and attribute textual content in a multi-LLM, generative AI environment.
The rest of this study is structured as follows: Section 2 offers a thorough analysis of relevant research in the area of LLM-generated text identification using ML. Our proposed approach is explained in Section 3. We assess its performance and present the findings in Section 4 and discuss them in Section 5. Finally, we conclude the paper in Section 6.

2. Related Work

This study addresses the growing challenge of textual content attribution in a world where AI systems, such as LLMs, are increasingly used to generate content. By testing and comparing ML and DL algorithms, we aim to demonstrate how a fine-tuned, targeted approach can outperform more generalized models in accurately classifying text while ensuring transparency in the decision-making process through XAI techniques. Specifically, we consider two tasks: multi-classification, in which we distinguish between text written by humans and text generated by five different LLM tools (ChatGPT, LLaMA, Google Bard, Perplexity, and Claude); and binary classification, in which we distinguish between LLM-generated text in general and human-written text.

2.1. Advancements in Text Attribution

These studies explore creative approaches to text attribution, exposing a dynamic evolution in methods that combine linguistic analysis, deep learning, and machine learning. The various methods highlight a dedication to improving accuracy and flexibility in the area of plagiarism detection.
Changing the words or word order in a statement to create a different version is known as paraphrasing or rephrasing, and identifying paraphrases is a difficult problem in NLP. Using SVM, logistic regression, and RNN models, the study reported in [15] sought to identify instances of paraphrase plagiarism; RNN performed best, with 80% accuracy. Using four well-known models (Bag of Words (BOW), Latent Semantic Analysis (LSA), Support Vector Machine (SVM), and Stylometry), the study reported in [16] aimed to provide a unified method for plagiarism detection. Using 25 books by different writers, the study analyzed data based on how frequently the Most Common Words (MCWs) were used; the increased weighting approach of the adjusted LSA performed better than the conventional LSA method. An additional study [17] presents an approach to detect cross-language plagiarism using machine learning and natural language processing. The procedure unfolds as follows: textual input, translation detection, online search, and report production; most documents with electronic input can be processed this way. Findings demonstrate that the system can locate Spanish materials plagiarized online from English sources, whether translated by humans or by machines: the system identified the source of plagiarism in 56% of cases, rising to 67% for machine translation. With the primary goal of identifying plagiarism in source code, the study reported in [18] proposes a plagiarism detector that is insensitive to variations in program statement order or identifiers. The proposed methodology, which integrates abstract syntax tree components and sequence alignment, was compared with simulation-based plagiarism detection, and the authors showed how their approach effectively identifies instances of plagiarism. Another study [19] uses the word2vec model and deep learning features to detect plagiarism in Arabic texts. This approach evaluates the semantic similarity between Arabic words using cosine similarity, which provides a highly accurate way to compare vectors. The similarity measures illustrate how even minor textual modifications, like swapping out a word or shifting the order of verbs and nouns, can produce a similarity value of 99%, making it possible to identify plagiarism even when the wording is modified or synonyms are substituted.

2.2. Leveraging LLMs for Text Detection

The use of LLMs for text detection has been the subject of the following studies, which highlight how they can improve textual content recognition accuracy and contextual understanding. This section examines relevant papers that explore the use of LLMs to improve text detection capabilities.
A review of existing LLM-generated text detection methods is provided by two surveys [20,21]. The first survey [20] aimed to improve the control and regulation of language generation models while offering a summary of current LLM-generated text detection methods; it also highlights important directions for future work, such as the creation of thorough assessment criteria and the risks posed by open-source LLMs. The second survey [21] gathers the most recent findings in this field and emphasizes the urgent necessity of supporting detector research. It delves into widely used datasets, explaining their shortcomings and future development needs, and examines different LLM-generated text recognition paradigms, illuminating issues such as data ambiguity, possible attacks, and out-of-distribution problems. It concludes by indicating promising avenues for further investigation into LLM-generated text detection to advance the application of responsible artificial intelligence (AI), giving novices a thorough introduction to the topic while providing a useful update for seasoned researchers.
To enable a comparative analysis, the study in [22] introduces a novel dataset comprising human-written and LLM-generated texts across several genres, such as stories, poems, essays, and Python code. The results demonstrate how well various machine learning models can differentiate between AI-generated and human-written text, with the best results observed in binary classification tasks. However, there are problems with categorizing GPT-generated text, especially in narrative composition, which highlights the intricacy of multiclass assignments involving many LLMs. The dataset provides a basis for further research on this rapidly evolving subject, and these insights have significant implications for AI text identification. An additional study [23] aims to educate the public on the effectiveness of LLM-generated text detectors and their use in upholding academic integrity. According to its findings, GPTKit is best for minimizing false positives, GLTR is the most robust, and Copyleaks is the most accurate detector; GPTZero’s false positives, on the other hand, raise certain issues. The study highlights the detectors’ shortcomings with respect to code, non-English languages, and paraphrased content, underscoring the need for continual improvement to offer a complete remedy for maintaining academic integrity, and the authors propose ways to improve detector usability, including simplifying API integration, providing clear documentation, and supporting widely used languages. Finally, the study in [24] investigates a novel LLM-based strategy for data race detection that combines fine-tuning and prompt engineering methods. After DataRaceBench was used to create the dedicated DRB-ML dataset, fine-grained labels describing data race pairs, related variables, line numbers, and read/write information were added. By assessing exemplar LLMs and fine-tuning publicly available ones with DRB-ML, the study highlights the practicality of LLMs for data race identification, although they are not as effective as more conventional methods, especially in providing specific details on the variable pairs that cause data races.

3. Methodology

Generally, data preparation and feature selection play important roles in simplifying subsequent tasks such as classification, leading to improved classification rates. This study proposes a framework for detecting texts written by LLMs, as shown in Figure 1, which includes four main phases: data preprocessing, feature selection, machine learning modeling, and detection and classification. The following sections describe each step of this framework.

3.1. Dataset Collation and Generation

The dataset has 600 observations and was compiled in November 2023. It includes 300 human-written observations extracted from the Kaggle dataset for the detection of texts written by LLMs [25]. This dataset has the following features:
  • id—A unique identifier for each essay.
  • prompt_id—Identifies the prompt for which the essay was produced. Two prompts are available: “Car-free cities”, coded “0”; and “Does the electoral college work?”, coded “1”.
  • text—The essay text itself.
  • generated—Whether the essay was written by an LLM (“1”) or by a student (“0”).
The authors produced another 300 essays from five different LLMs (ChatGPT, LLaMA, Google Bard, Claude, and Perplexity) by asking each LLM to generate 30 essays for each of the two essay subjects mentioned above. Each entry has three text-based attributes: “text”, “category”, and “subcategory”. By bridging the gap between the academic and corporate worlds, this research project contributes to the fields of LLMs and plagiarism detection while also advancing NLP.
To clarify, our methodology for generating the LLM-authored documents was designed to mirror the human-written dataset as closely as possible. Specifically, we used the same prompts and instructions originally given to the student participants to generate comparable outputs from each of the five LLMs (ChatGPT, LLaMA, Bard, Claude, and Perplexity). The prompts were carefully controlled across all sources to maintain topical consistency. This approach ensured that both human- and machine-generated texts addressed the same subject matter, lexical domains, and rhetorical expectations.
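As a rough illustration of this assembly step, the following sketch shows how the human-written Kaggle subset and the LLM-generated essays could be combined into a single labeled table; file names, column values, and the sampling call are assumptions made for illustration, not the exact artifacts used in this work.

```python
# Illustrative sketch only: file names and column values are assumed,
# not the authors' exact artifacts.
import pandas as pd

# 300 human-written essays from the Kaggle "LLM - Detect AI Generated Text" data
human = pd.read_csv("train_essays.csv")          # columns: id, prompt_id, text, generated
human = human[human["generated"] == 0].sample(300, random_state=42)
human = human[["text"]].assign(category="human", subcategory="human")

# 300 essays generated from the same two prompts by the five LLM tools
llm = pd.read_csv("llm_generated_essays.csv")    # assumed columns: text, subcategory
llm = llm.assign(category="LLM")

dataset = pd.concat([human, llm], ignore_index=True)
print(dataset["category"].value_counts())        # expected: human 300, LLM 300
```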

3.2. Data Preprocessing and Feature Selection

A preliminary check was conducted during the data preprocessing phase to make sure no empty observations were present. The text data was then cleaned using common preparation techniques for NLP tasks, including stop-word removal, lemmatization, punctuation removal, and tokenization [26]. Putting the text data into a clean, structured format suited for classification and other NLP tasks prepared the dataset for later analysis and model creation. The word cloud for the two classes, “human” and “LLM”, is displayed in Figure 2, and Table 1 lists the word frequencies for the two classes as counts and percentages. After the preprocessing stages, the next phase involved converting the category labels into numerical representations, which is required by machine learning algorithms that take numerical inputs. The “category” feature, which distinguishes between “human” and “LLM” content, was encoded into numerical labels, with the “human” class represented by 0 and the “LLM” class represented by 1.
Table 1. Top 10 words for ‘human’ and ‘LLM’ classes (counts and percentages of total tokens).
Human Texts                     LLM Texts
Word        Count   %           Word        Count   %
car         2464    2.72        system      337     2.28
vote        2163    2.39        electoral   319     2.15
people      1360    1.50        vote        317     2.14
state       1121    1.24        college     303     2.05
Electoral   958     1.06        state       225     1.52
would       901     0.99        popular     182     1.23
college     884     0.97        car         178     1.20
not         767     0.85        ensure      177     1.20
electoral   750     0.83        would       161     1.09
use         656     0.72        dear        150     1.01
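A minimal sketch of the preprocessing and label-encoding steps described above is shown below, assuming the combined data sits in a pandas DataFrame named dataset with “text” and “category” columns; NLTK is one possible toolkit, since the study does not name a specific library.

```python
# Minimal preprocessing sketch (assumed NLTK toolkit; 'dataset' is the combined
# DataFrame with 'text' and 'category' columns).
import string
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("punkt"); nltk.download("stopwords"); nltk.download("wordnet")

stop_words = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def preprocess(text: str) -> str:
    tokens = nltk.word_tokenize(text.lower())                          # tokenization
    tokens = [t for t in tokens
              if t not in string.punctuation and t not in stop_words]  # punctuation/stop-word removal
    return " ".join(lemmatizer.lemmatize(t) for t in tokens)           # lemmatization

dataset["clean_text"] = dataset["text"].apply(preprocess)
dataset["label"] = (dataset["category"] == "LLM").astype(int)          # human = 0, LLM = 1
```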
The dataset was then separated into training and testing subsets using an 80/20 split, with 80% of the data used for training and 20% for testing. This division played a critical role in model evaluation by measuring performance on unseen data. Finally, a TF-IDF (Term Frequency–Inverse Document Frequency) vectorizer was used to convert the text data into a machine learning-friendly format [27]. By transforming the text data into a matrix of numerical features, this approach captured the significance of words within each document while accounting for their frequency across the entire dataset. With “0” denoting the “human” class and “1” denoting the “LLM” class, the resulting TF-IDF vectors served as the basis for training machine learning models on this dataset, allowing classifiers to be built to differentiate between human- and LLM-generated texts. Figure 3 displays the top 10 words with the highest TF-IDF weights for the “human” and “LLM” classes.
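The split and vectorization steps can be expressed as follows; this is a sketch under the assumptions above, and the stratified split is our illustrative choice rather than something stated in the paper.

```python
# Sketch of the 80/20 split and TF-IDF vectorization (scikit-learn);
# stratification is an illustrative choice.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

X_train_text, X_test_text, y_train, y_test = train_test_split(
    dataset["clean_text"], dataset["label"],
    test_size=0.20, random_state=42, stratify=dataset["label"])

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(X_train_text)   # fit on the training split only
X_test = vectorizer.transform(X_test_text)
```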

3.3. Classification Algorithms

In the field of ML and classification, training models to detect and classify texts authored by LLMs is an interesting and significant task. This technology allows computers to identify patterns in data and act on those patterns, here using the RF and XGBoost algorithms [21] and RNNs [22], which makes precise identification of the relevant text easier.
RF is a robust ensemble learning method well known for its performance in both classification and regression. Built from a group of decision trees, it performs exceptionally well at managing complex datasets and reducing overfitting. Random Forest is a highly favored option in numerous fields due to its adaptability and resilience, yielding precise and dependable predictions, as well as valuable insights via feature importance analysis. The study also explored the rapidly changing neural network landscape, utilizing the power of RNNs with a focus on analyzing LLM-generated texts [23]. These deep learning approaches automate the extraction of complex patterns from LLM texts.
XGBoost: Extreme Gradient Boosting, or XGBoost, is an ML technique that is notable for its great efficiency and scalability. The XGBoost algorithm, which is part of the gradient boosting family, performs exceptionally well and accurately in predictive modeling applications. XGBoost is a preferred option in many industries, including finance and healthcare, due to its capacity to manage complicated relationships in data, regularization techniques, and parallel processing. Acknowledged for its swiftness and efficiency, XGBoost has established itself as a mainstay in both practical and competitive ML scenarios.
RNNs: RNNs are neural networks specifically constructed for sequential data processing. Because they have internal memory, they can retain knowledge about earlier inputs, which makes them appropriate for tasks like time-series analysis and natural language processing. Even though they are good at capturing temporal dependencies, classic RNNs have problems with things like vanishing gradients. To overcome these constraints, more sophisticated topologies have been developed, such as Gated Recurrent Units (GRUs) and Long Short-Term Memory (LSTM) networks, which improve RNNs’ ability to represent and learn intricate sequential patterns.
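A compact sketch of how the three classifiers could be trained and scored on the TF-IDF features from the previous step is given below; hyperparameters are illustrative defaults rather than the tuned values used in the study, and the Keras LSTM operates on padded integer token sequences rather than the TF-IDF matrix.

```python
# Illustrative training sketch; hyperparameters are defaults, not the study's tuned values.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from xgboost import XGBClassifier

rf = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)
xgb = XGBClassifier(n_estimators=200, eval_metric="logloss").fit(X_train, y_train)
for name, model in [("RF", rf), ("XGBoost", xgb)]:
    print(name)
    print(classification_report(y_test, model.predict(X_test)))

# A simple recurrent baseline (Keras LSTM) on padded integer sequences.
import tensorflow as tf

keras_tok = tf.keras.preprocessing.text.Tokenizer(num_words=20000)
keras_tok.fit_on_texts(X_train_text)
pad = tf.keras.preprocessing.sequence.pad_sequences
seq_train = pad(keras_tok.texts_to_sequences(X_train_text), maxlen=400)
seq_test = pad(keras_tok.texts_to_sequences(X_test_text), maxlen=400)

rnn = tf.keras.Sequential([
    tf.keras.layers.Embedding(20000, 64),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
rnn.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
rnn.fit(seq_train, y_train.values, epochs=5,
        validation_data=(seq_test, y_test.values))
```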
Our text detection framework employs a two-stage classification process. Initially, binary classification distinguishes between the two classes of human-written and LLM-generated text. Subsequently, multi-classification refines the identification by categorizing text into six classes (human plus the five LLM tools), ensuring a comprehensive and nuanced approach to text detection.

3.3.1. Binary Classification

We added a column named “category” to the dataset that indicates two classes: “human” to indicate that the text is written by a human and “LLM” to indicate that the text is generated by one of the LLMs that we used (ChatGPT, LLaMA, Google Bard, Claude, or Perplexity).

3.3.2. Multi-Classification

We added a column named “subcategory” to the dataset that indicates six classes: “human” to indicate that the text is written by a human, and five further classes—“chatgpt”, “LLaMA”, “Google Bard”, “Claude”, and “Perplexity”—to indicate which of the mentioned LLMs generated the text.

3.4. Explainable Artificial Intelligence (XAI)

In recent years, artificial intelligence has advanced significantly, sparking interest in previously understudied areas. As AI advances, the focus has shifted from model performance alone to requiring experts to examine algorithmic decision-making processes and the logic behind AI models’ outputs. Modern ML algorithms, especially “deep learning” ones using black-box techniques, have become more powerful and complex, making it difficult to understand how they behave and why specific outcomes were achieved or mistakes were made; explainable artificial intelligence (XAI) systems are therefore becoming more and more necessary. Understanding these models’ behavior is as crucial as their outputs, allowing users to develop the proper level of trust and reliance [24,25].
In the field of XAI, Local Interpretable Model-agnostic Explanations (LIME) is a crucial instrument that provides a way to understand how sophisticated ML models make decisions. Because LIME, developed by Ribeiro et al. in 2016 [26,27], is built on a model-agnostic premise, it can offer visible and interpretable insights into the predictions of different black-box models. LIME generates locally faithful approximations through perturbed samples around individual instances, enabling users to understand the reasoning behind individual predictions. Its interpretability-enhancing capabilities and adaptability have led to LIME’s widespread adoption across domains, where it is a valuable resource for researchers and practitioners seeking transparency in the decision-making processes of complex ML algorithms.
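A minimal sketch of producing such a local explanation with the lime package is shown below, assuming the TF-IDF vectorizer and RF classifier from the earlier sketches; the pipeline wrapper lets LIME perturb raw text directly.

```python
# Minimal LIME sketch for one test document (binary case); assumes 'vectorizer',
# 'rf', and 'X_test_text' from the earlier sketches.
from lime.lime_text import LimeTextExplainer
from sklearn.pipeline import make_pipeline

pipeline = make_pipeline(vectorizer, rf)            # raw text -> TF-IDF -> RF probabilities
explainer = LimeTextExplainer(class_names=["human", "LLM"])

sample = X_test_text.iloc[0]
explanation = explainer.explain_instance(sample, pipeline.predict_proba, num_features=10)
print(explanation.as_list())                        # top 10 words and their local weights
```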

4. Experimental Results and Discussion

This section assesses and examines the performance of the different ML algorithms. The experiments were run on an 11th-generation Intel(R) Core(TM) i5-1135G7 @ 2.40 GHz processor with 16.0 GB of RAM and a 64-bit operating system, using Jupyter Notebook to program in Python.
In the first stage of our work, we focused on using the previously mentioned algorithms to distinguish texts written by LLMs from texts written by humans, which is a binary classification task. The accuracy results for the three algorithms ranged from 94% to 98%, as shown in Table 2. Next, we decided to distinguish between texts written by humans and texts generated by different LLMs, which is a multi-classification task. The accuracy results ranged from 71% to 97% and are shown in Table 3.
The results demonstrate that, for binary classification, the accuracy of distinguishing between texts written by humans and by LLMs is very high: human-written text exhibits a profound awareness of context, draws from individual experiences and cultural nuances, and demonstrates creativity, emotional intelligence, and moral judgment, whereas LLMs produce replies based on patterns learned from enormous datasets and lack actual comprehension [28]. For multi-classification, however, because there are five classes of LLMs that may display similar characteristics in text composition, the accuracy tends to be lower than in binary classification.

4.1. Model Evaluation

4.1.1. Binary Classification

In the binary classification, we used three different ML and NN techniques (RF, XGBoost, and RNN), as shown in Table 2. Using RF, XGBoost, and RNN, we obtained accuracies ranging from 94% to 98%, with precision and recall in about the same range. This shows that the three algorithms correctly detected most of the observations and successfully distinguished between texts written by humans and texts generated by LLMs.
Table 2. Binary classification accuracy results.
Algorithm   Accuracy   Precision   Recall   F1 Score
RF          97%        96%         95%      96%
XGBoost     98%        97%         98%      98%
RNN         94%        93%         94%      94%

4.1.2. Multi-Classification

As shown in Table 3 and Figure 4, the multi-classification task aims to differentiate human-written text from texts produced by five distinct LLMs: ChatGPT, LLaMA, Google Bard, Claude, and Perplexity. We present the ROC curves and conclusions from the confusion matrices. For the “human” class, all three algorithms achieved a 100% TP rate, and RF and XGBoost produced strong TP rates for the remaining classes. RNN, however, showed a 12.5% TP rate for “claude”, indicating that it confused this class with “human”, “chatgpt”, and “Bard”, and a 62.5% TP rate for “llama”, indicating that it was difficult to separate from “human”, “chatgpt”, and “perplexity”. These observations point out strong points and possible confusions in the classification outcomes.
Table 3. Multi-classification accuracy results.
Algorithm   Accuracy   Precision   Recall   F1 Score
RF          97%        93%         94%      93%
XGBoost     94%        90%         90%      89%
RNN         88%        90%         72%      74%
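The per-class TP rates quoted above are simply the diagonal of a row-normalized confusion matrix; a sketch is shown below, assuming a multi-class RF model rf_multi and string labels y_test_multi from a run analogous to the binary sketches (these names are ours, used only for illustration).

```python
# Sketch: per-class TP rates from a normalized confusion matrix.
# 'rf_multi', 'X_test_multi', and 'y_test_multi' are assumed to come from a
# multi-class run analogous to the binary sketches above.
import numpy as np
from sklearn.metrics import confusion_matrix

classes = ["human", "chatgpt", "Bard", "claude", "llama", "perplexity"]
cm = confusion_matrix(y_test_multi, rf_multi.predict(X_test_multi),
                      labels=classes, normalize="true")
for cls, tp_rate in zip(classes, np.diag(cm)):
    print(f"{cls}: {tp_rate:.1%}")                  # diagonal = per-class recall (TP rate)
```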

4.2. Explainable AI

In this part, we examine the behavior of the RF algorithm used in this study for multi-classification detection. We employ LIME for XAI in our investigation. By providing insights into the reasons behind the model’s conclusions in the context of plagiarism detection, this technique improves transparency.
Figure 5 shows the top 10 important features for the “chatgpt”, “Bard”, “claude”, “llama”, “perplexity”, and “human” classes in the RF model, as generated by LIME. These visualizations clarify the precise words that have a major influence on the model’s predictions in each class using bar plots. These visualizations, which take advantage of LIME’s local interpretability, help to clarify the decision-making process of the black-box model by providing succinct insights into the critical characteristics driving classification results for various text categories.
The bar charts offer insights into the most critical features for predicting each class in the RF model, highlighting the unique and shared features that are important across the different classes of “Bard”, “chatgpt”, “claude”, “llama”, “perplexity”, and “human”. These insights reveal significant differences and similarities in feature importance, reflecting the distinct characteristics and commonalities among the classes. The top features for “Bard” include terms related to systematic and structural elements such as “focus”, “transportation”, “several”, “vote”, “car”, “range”, “ensure”, “without”, “concern”, and “system”. The emphasis on “system” suggests a focus on organized, methodological aspects. For “chatgpt”, the critical features include “good”, “find”, “embrace”, “consider”, “trust”, “citizen”, “urban”, “limit”, “letter”, and “usage”. These features highlight a mixture of qualitative assessments (e.g., “good” and “trust”) and operational terms (e.g., “usage” and “urban”), suggesting a balance between subjective evaluation and practical application in the “chatgpt” context. The important features for “claude” include words like “city”, “national”, “excessive”, “people”, “thank”, “sincerely”, “thank”, “vote”, “rational”, and “please”. This class appears to prioritize social and formal interaction elements, with a notable emphasis on politeness and civic engagement (e.g., “please”, “vote”, and “thank”). “Llama” is characterized by features such as “election”, “usage”, “process”, “equal”, “limit”, “voice”, “alternative”, “also”, “ensure”, and “sincerely”. These features suggest a focus on procedural and democratic elements, reflecting themes of equality, participation, and alternative approaches. The top features for “perplexity” include “reduce”, “system”, “pressure”, “ensure”, “help”, “drive”, “less”, “Senator”, “individual”, and “dear”. This class seems to emphasize reduction and efficiency (e.g., “reduce” and “less”), systemic approaches, and individual importance, indicating a concern with optimization and personal significance. For “human”, the critical features include “do”, “thing”, “much”, “many”, “say”, “way”, “get”, “go”, “people”, and “not”. These features reflect a focus on action and communication, with a high frequency of common verbs and pronouns, indicating practical, everyday human activities and interactions.
However, across all classes, certain themes, such as procedural terms (“ensure” and “system”) and evaluative terms (“consider”, “trust”, and “good”) appear frequently. Words like “ensure” and “usage” are common in multiple classes, indicating their broad relevance and importance in predicting various outcomes. The “Bard” and “claude” classes have unique terms that focus on structure and civility, respectively, highlighting their distinct contexts. “Chatgpt” and “llama” include more operational and procedural terms, reflecting their applications in practical and democratic settings. “Perplexity” and “human” classes emphasize efficiency and everyday human activities, respectively, indicating their specific focuses on optimization and practical action.
Table 4 and Figure 6 explain the selected text instance using LIME. The output provides insight into the local decision-making process of the model and consists of the instance text, true label, predicted label, and a visual representation of the top ten features impacting the prediction. This graphic, which has its basis in LIME explanations, aids in the interpretation of the data by making clear the specific terms that significantly influence the expected result.
The analysis of feature importance across different classes in the RF model can be highly practical in detecting plagiarism. By understanding the key features that distinguish various classes, we can develop robust models to identify and flag potentially plagiarized content. Each chart provides insights into the most critical features for predicting each class, such as “ensure”, “system”, and “trust” for Bard or “good”, “find”, and “embrace” for ChatGPT. These terms might be consistently used by specific LLM tools, making them important markers of their writing style.
By analyzing the most important features for different classes, we can create detailed profiles of each LLM. These profiles can help in identifying stylistic and structural elements unique to specific sources. When new content is analyzed, the model can compare it against these established profiles to detect discrepancies or similarities that might indicate plagiarism. This method not only highlights direct copying but also more subtle forms of plagiarism where the content has been paraphrased or lightly edited. Thus, the detailed understanding of feature importance aids in the creation of a nuanced and effective plagiarism detection system.
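One simple way to operationalize such source profiles is sketched below: average the TF-IDF vectors of each class into a centroid and score new content by cosine similarity against each centroid. This illustrates the profiling concept under the same assumed variable names as the earlier sketches; it is not the exact procedure used in this paper.

```python
# Conceptual sketch of per-source "profiles" as TF-IDF centroids; assumes the
# multi-class training matrix X_train and string labels y_train_multi.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

profiles = {}
for cls in ["human", "chatgpt", "Bard", "claude", "llama", "perplexity"]:
    mask = (y_train_multi == cls).to_numpy()
    profiles[cls] = np.asarray(X_train[mask].mean(axis=0))     # class centroid

new_vec = vectorizer.transform(["text whose source we want to attribute ..."])
scores = {cls: cosine_similarity(new_vec, centroid)[0, 0]
          for cls, centroid in profiles.items()}
print(max(scores, key=scores.get), scores)                     # closest profile wins
```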
Table 4. The prediction probabilities for a specific instance.
Instance Text (preprocessed): “dear Senator write express strong support continue use Electoral College presidential election process proud citizen great nation believe crucial maintain principle fairness equality upon democracy found Electoral College ensure small state voice election process promote coalitionbuilde national unity serve vital check tyranny majority implore uphold integrity democratic system reject attempt abolish Electoral College sincerely”
True Label: llama
Predicted Label: llama
Prediction Probabilities: shown as a bar chart (see Figure 6)

4.3. Evaluation

This section addresses one of the most challenging tasks: comparing our model’s accuracy to standards set by the industry. We benchmarked against GPTZero [29,30] in our comparison. We are confident that our prototype can be developed further and is scalable for commercial use, even though it was trained on a smaller dataset than industry-level norms.
GPTZero is an AI detection application created by Edward Tian, an undergraduate student at Princeton University, to identify artificially generated text, especially text from large language models. Introduced in January 2023 to address worries about AI-driven academic plagiarism, GPTZero has received praise but has also drawn criticism for producing false positives, particularly in situations where academic integrity is at stake. The program uses burstiness and perplexity metrics to identify passages created by bots [31]: perplexity measures text randomness and unusual construction relative to a language model, whereas burstiness examines variation in phrase patterns; human text typically shows greater diversity than AI-generated content. GPTZero has been utilized in multiple investigations, including a study comparing GPTZero and ChatGPT in assessing fabricated queries and medical articles [32], which found that GPTZero had low false-positive and high false-negative rates. A second analysis of more than a million tweets and academic papers looked at opinions regarding ChatGPT’s capacity for plagiarism [33] and contrasted it with the lack of interest in GPTZero, which is intended to prevent AI-driven plagiarism; sophisticated natural language processing techniques were used to determine the difficulties and possibilities for both models, providing information for further conversational AI research. The classes used in GPTZero can be seen in Table 5.
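For intuition only, these two signals can be approximated as follows: perplexity under an off-the-shelf language model (GPT-2 here) and burstiness as variation in sentence length. This is a simplified approximation of the idea, not GPTZero’s implementation.

```python
# Rough approximation of perplexity and burstiness; illustrative only, not GPTZero's code.
import math
import numpy as np
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = lm(ids, labels=ids).loss             # mean token cross-entropy
    return math.exp(loss.item())

def burstiness(text: str) -> float:
    sentences = [s for s in text.split(".") if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    return float(np.std(lengths))                   # crude proxy: sentence-length variation

sample = "The Electoral College ensures smaller states retain a voice in elections."
print(perplexity(sample), burstiness(sample))
```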
In the binary classification, we used GPTZero to recognize text. To do this, we used the test dataset, which makes up 20% of the original dataset used for binary classification. A summary of the detection results can be seen in Table 6. While GPTZero showed about 78.3% accuracy and failed to identify every occurrence, our algorithms showed a higher accuracy of about 97.5%.
Table 5. GPTZero classes.
Class Name        GPTZero Message                                                                                                              AI Text Percentage
Human             “This text is most likely to be written by a human”                                                                          0–10%
Different Result  “Our ensemble of detectors predicts different results for this text. Please enter more text for more precise predictions.”   11–39%
Mix               “This text is likely to be a mix of human and AI text”                                                                       40–88%
AI                “This text is likely to be written by AI”                                                                                    89–100%
Not Recognized    “Try typing in some more text (>250 characters) so we can give you accurate results”                                         Total text is less than 250 characters
Table 6. Comparison with the GPTZero tool.
Model       Class    Human   AI   Mix   Different Result   Not Recognized   Accuracy
GPTZero     Human    59      1    0     8                  0                78.3%
GPTZero     LLMs     5       35   7     0                  5
Our Model   Human    66      2    -     -                  -                97.5%
Our Model   LLMs     1       51   -     -                  -

5. Discussion

It is important to note that LLMs are usually trained on vast corpora of human-generated text, and as a result, they often mimic human-like patterns, styles, and structures. This makes the task of distinguishing between human-written and LLM-generated text inherently challenging and highlights the importance of rigorous experimental design and interpretation. Nonetheless, the core objective of our study is not to claim a universal, topic-agnostic ability to distinguish between all human and all machine writing but, rather, to explore whether, under controlled conditions with identical prompts, constrained domains, and consistent stylistic expectations, it is possible to detect subtle statistical and linguistic differences that are characteristic of current LLMs. These distinctions might not be obvious to human readers, but through machine learning models and explainable AI (XAI), we can uncover traceable and interpretable markers in syntax, structure, or token distribution that signal LLM authorship.
Practically, this research contributes to the growing field of textual attribution, with relevance to academic integrity, plagiarism detection, and misinformation auditing. Our experimental setting, by using the same writing tasks for both LLMs and humans, controls for topic bias and highlights differences that emerge specifically from the generative process. The high performance of our models does not imply a fundamental separation between “human” and “machine” writing in general but, rather, shows that detectable patterns exist within this generation context, and those patterns can be modeled. Conceptually, the value of our study lies in (1) demonstrating that multi-class attribution (e.g., distinguishing which LLM wrote the text)—not just binary human/AI detection—is feasible; (2) showing how explainable AI (e.g., LIME) can surface interpretable markers of different LLMs, potentially enabling tools for educators and publishers to audit content sources; and (3) providing a foundation for future models that are less reliant on large, labeled datasets and more focused on style, structure, and attribution transparency.
In future work, we aim to explore domain transfer and adversarial scenarios where style convergence is even greater, as well as examine how prompt engineering or fine-tuning may reduce detectability.

6. Conclusions

In this study, we aimed to address the growing challenge of textual content attribution in a world where LLMs are increasingly used to generate content. We compared ML and DL algorithms, demonstrating how a fine-tuned, targeted approach can outperform more generalized models in accurately classifying text while ensuring transparency in the decision-making process through XAI techniques. This was achieved by investigating how to identify these texts using ML and XAI techniques in two settings. For multi-classification, we differentiated between five different LLM tools (ChatGPT, LLaMA, Google Bard, Claude, and Perplexity) and human-written text; RF achieved the best result, with 97% accuracy. For binary classification, we distinguished between LLM-generated and human-written text; XGBoost achieved the highest accuracy of 98%, with RF and RNN close behind. Our model outperformed GPTZero (78.3%) with 98.5% accuracy. Notably, GPTZero was unable to recognize about 4.2% of the observations, whereas our model recognized the complete test dataset.
XAI showed that there are critical features for predicting each class, such as “ensure”, “system”, and “trust” for Bard or “good”, “find”, and “embrace” for ChatGPT. These terms might be consistently used by specific LLM tools, making them important markers of their writing style. Analyzing feature importance across various classes provides valuable insights for detecting plagiarism by identifying distinct stylistic and structural elements unique to different authors. This approach not only helps in solving the problem of textual content attribution and spotting direct copying but also in recognizing more subtle forms of plagiarism, thereby ensuring thorough and accurate verification of content originality.
As part of future work, we plan to explore domain adaptation techniques to assess the robustness of our models across new topics without significantly expanding the dataset. We also intend to investigate prompt sensitivity and syntactic perturbations to understand how LLM attribution models behave under varying writing conditions.

Author Contributions

Conceptualization, A.A.N., H.I.A., O.D., and E.H.; Formal analysis, A.A.N.; Funding acquisition, O.D.; Investigation, H.I.A.; Methodology, A.A.N., H.I.A., and O.D.; Project administration, H.I.A. and E.H.; Resources, O.D.; Software, A.A.N.; Supervision, H.I.A. and O.D.; Validation, A.A.N. and H.I.A.; Visualization, A.A.N.; Writing—original draft, A.A.N.; Writing—review and editing, H.I.A. and E.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data is available upon request.

Acknowledgments

Publication made possible in part by support from Eastern Michigan University’s Faculty Open Access Publishing Fund, administered by the Associate Provost and Vice President for Graduate Studies and Research, with assistance from the EMU Library.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Hadi, M.U.; Qureshi, R.; Shah, A.; Irfan, M.; Zafar, A.; Shaikh, M.B.; Akhtar, N.; Wu, J.; Mirjalili, S. A survey on large language models: Applications, challenges, limitations, and practical usage. Authorea Prepr. 2023. [Google Scholar] [CrossRef]
  2. Awasthi, S. Plagiarism and academic misconduct: A systematic review. Desidoc J. Libr. Inf. Technol. 2019, 39, 94–100. [Google Scholar] [CrossRef]
  3. Tami, M.; Ashqar, H.I.; Elhenawy, M. Automated Question Generation for Science Tests in Arabic Language Using NLP Techniques. arXiv 2024, arXiv:2406.08520. [Google Scholar] [CrossRef]
  4. Sammoudi, M.; Habaybeh, A.; Ashqar, H.I.; Elhenawy, M. Question-Answering (QA) Model for a Personalized Learning Assistant for Arabic Language. arXiv 2024, arXiv:2406.08519. [Google Scholar] [CrossRef]
  5. OpenAI. Introducing ChatGPT. 2023. Available online: https://openai.com/blog/chatgpt (accessed on 2 February 2024).
  6. Jayaseelan, N. Llama 2, A New Intelligent Open Source Language Model. 2024. Available online: https://www.e2enetworks.com/blog/llama-2-the-new-open-source-language-model (accessed on 2 February 2024).
  7. Team, S. Google Bard: Uses, Limitations, and Tips for More Helpful Answers. 2024. Available online: https://www.semrush.com/blog/google-bard/? (accessed on 2 February 2024).
  8. Anthropic. Introducing Claude. 2024. Available online: https://www.anthropic.com/index/introducing-claude (accessed on 2 February 2024).
  9. Perplexity. Introducing PPLX Online LLMs. 2024. Available online: https://blog.perplexity.ai/blog/introducing-pplx-online-llms (accessed on 2 February 2024).
  10. Radwan, A.; Amarneh, M.; Alawneh, H.; Ashqar, H.I.; AlSobeh, A.; Magableh, A.A.A.R. Predictive analytics in mental health leveraging llm embeddings and machine learning models for social media analysis. Int. J. Web Serv. Res. (IJWSR) 2024, 21, 1–22. [Google Scholar] [CrossRef]
  11. Masri, S.; Raddad, Y.; Khandaqji, F.; Ashqar, H.I.; Elhenawy, M. Transformer Models in Education: Summarizing Science Textbooks with AraBART, MT5, AraT5, and mBART. arXiv 2024, arXiv:2406.07692. [Google Scholar]
  12. Jaradat, S.; Nayak, R.; Paz, A.; Ashqar, H.I.; Elhenawy, M. Multitask Learning for Crash Analysis: A Fine-Tuned LLM Framework Using Twitter Data. Smart Cities 2024, 7, 2422–2465. [Google Scholar] [CrossRef]
  13. Kim, J.K.; Chua, M.; Rickard, M.; Lorenzo, A. ChatGPT and large language model (LLM) chatbots: The current state of acceptability and a proposal for guidelines on utilization in academic medicine. J. Pediatr. Urol. 2023, 19, 598–604. [Google Scholar] [CrossRef] [PubMed]
  14. Nam, D.; Macvean, A.; Hellendoorn, V.; Vasilescu, B.; Myers, B. Using an llm to help with code understanding. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, New York, NY, USA, 14–20 April 2024; pp. 1–13. [Google Scholar]
  15. Hunt, E.; Janamsetty, R.; Kinares, C.; Koh, C.; Sanchez, A.; Zhan, F.; Ozdemir, M.; Waseem, S.; Yolcu, O.; Dahal, B.; et al. Machine learning models for paraphrase identification and its applications on plagiarism detection. In Proceedings of the 2019 IEEE International Conference on Big Knowledge (ICBK), Beijing, China, 10–11 November 2019; pp. 97–104. [Google Scholar]
  16. AlSallal, M.; Iqbal, R.; Amin, S.; James, A.; Palade, V. An integrated machine learning approach for extrinsic plagiarism detection. In Proceedings of the 2016 9th International Conference on Developments in eSystems Engineering (DeSE), Liverpool, UK, 31 August–2 September 2016; pp. 203–208. [Google Scholar]
  17. Anguita, A.; Beghelli, A.; Creixell, W. Automatic cross-language plagiarism detection. In Proceedings of the 2011 7th International Conference on Natural Language Processing and Knowledge Engineering, Tokushima, Japan, 27–29 November 2011; pp. 173–176. [Google Scholar]
  18. Kikuchi, H.; Goto, T.; Wakatsuki, M.; Nishino, T. A source code plagiarism detecting method using alignment with abstract syntax tree elements. In Proceedings of the 15th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), Las Vegas, NV, USA, 30 June–2 July 2014; pp. 1–6. [Google Scholar]
  19. Suleiman, D.; Awajan, A.; Al-Madi, N. Deep learning based technique for plagiarism detection in Arabic texts. In Proceedings of the 2017 International Conference on New Trends in Computing Sciences (ICTCS), Amman, Jordan, 11–13 October 2017; pp. 216–222. [Google Scholar]
  20. Tang, R.; Chuang, Y.N.; Hu, X. The science of detecting LLM-generated text. Commun. ACM 2024, 67, 50–59. [Google Scholar] [CrossRef]
  21. Wu, J.; Yang, S.; Zhan, R.; Yuan, Y.; Wong, D.F.; Chao, L.S. A survey on LLM-generated text detection: Necessity, methods, and future directions. arXiv 2023, arXiv:2310.14724. [Google Scholar]
  22. Hayawi, K.; Shahriar, S.; Mathew, S.S. The imitation game: Detecting human and ai-generated texts in the era of large language models. arXiv 2023, arXiv:2307.12166. [Google Scholar]
  23. Orenstrakh, M.S.; Karnalim, O.; Suarez, C.A.; Liut, M. Detecting llm-generated text in computing education: A comparative study for chatgpt cases. arXiv 2023, arXiv:2307.07411. [Google Scholar]
  24. Chen, L.; Ding, X.; Emani, M.; Vanderbruggen, T.; Lin, P.H.; Liao, C. Data race detection using large language models. In Proceedings of the SC’23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Denver, CO, USA, 12–17 November 2023; pp. 215–223. [Google Scholar]
  25. Lab, T.L.A. LLM—Detect AI Generated Text | Kaggle. 2024. Available online: https://www.kaggle.com/competitions/llm-detect-ai-generated-text/data (accessed on 2 February 2024).
  26. Tabassum, A.; Patil, R.R. A survey on text pre-processing & feature extraction techniques in natural language processing. Int. Res. J. Eng. Technol. (IRJET) 2020, 7, 4864–4867. [Google Scholar]
  27. Abubakar, H.D.; Umar, M.; Bakale, M.A. Sentiment classification: Review of text vectorization methods: Bag of words, Tf-Idf, Word2vec and Doc2vec. Slu J. Sci. Technol. 2022, 4, 27–33. [Google Scholar] [CrossRef]
  28. Bender, E.; Friedman, B. Data statements for NLP: Toward Mitigating System Bias and Enabling Better Science. Trans. Assoc. Comput. Linguist. 2018, 6, 587–604. [Google Scholar] [CrossRef]
  29. GPTZero. GPTZero|The Trusted AI Detector for ChatGPT, GPT-4, and More. 2024. Available online: https://gptzero.me/ (accessed on 2 February 2024).
  30. Svrluga, S. Princeton Student Creates GPTZero Tool to Detect ChatGPT-Generated Text. 2024. Available online: https://www.washingtonpost.com/education/2023/01/12/gptzero-chatgpt-detector-ai/ (accessed on 2 February 2024).
  31. Wikipedia. GPTZero. 2024. Available online: https://en.wikipedia.org/wiki/GPTZero (accessed on 2 February 2024).
  32. Habibzadeh, F. GPTZero performance in identifying artificial intelligence-generated medical texts: A preliminary study. J. Korean Med. Sci. 2023, 38, 1516083870. [Google Scholar] [CrossRef] [PubMed]
  33. Heumann, M.; Kraschewski, T.; Breitner, M.H. ChatGPT and GPTZero in Research and Social Media: A Sentiment-and Topic-Based Analysis. Available online: https://aisel.aisnet.org/amcis2023/sig_hci/sig_hci/6 (accessed on 6 July 2025).
Figure 1. The general workflow of the proposed ML model for detecting and classifying texts written by LLMs.
Figure 2. Word cloud for (a) ‘human’ and (b) ‘LLM’ classes.
Figure 3. Top 10 words with maximum TF-IDF weights for (a) ‘human’ and (b) ‘LLM’ classes.
Figure 4. Confusion matrices for multi-classification in percentage for (a) RF, (b) XGBoost, and (c) RNN.
Figure 5. The top 10 important features for the (a) “Bard”, (b) “chatgpt”, (c) “claude”, (d) “llama”, (e) “perplexity”, and (f) “human” classes in the RF model.
Figure 6. The top 10 important features for the (a) “Bard”, (b) “chatgpt”, (c) “claude”, (d) “llama”, (e) “perplexity”, and (f) “human” classes in the RF model for a specific instance.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
