Article

How ChatGPT’s Semantic Parrotting (Compared to Gemini’s) Impacts Text Summarization with Literary Text

by Rodolfo Delmonte 1,*, Giulia Marchesini 1 and Nicolò Busetto 2
1 Department of Language Science, Ca' Foscari University, Ca' Bembo, 30123 Venezia, Italy
2 Meta, Menlo Park, CA 94025, USA
* Author to whom correspondence should be addressed.
Information 2025, 16(8), 623; https://doi.org/10.3390/info16080623
Submission received: 29 January 2025 / Revised: 26 June 2025 / Accepted: 4 July 2025 / Published: 22 July 2025

Abstract

In this paper we explore ChatGPT's ability to produce a summary, a précis, and/or an essay on the basis of excerpts from a novel—The Solid Mandala—by the Australian Nobel laureate Patrick White. We use a number of prompts to test a number of functions related to narrative analysis from the point of view of the "sujet", the "fabula", and the style. In the paper, we extensively illustrate a number of recurrent semantic mistakes that can badly harm the understanding of the contents of the novel. We compiled a list of 12 different types of semantic mistakes, or parroting, that we found GPT made, which can be regarded as typical of stochastic-based generation. We then tested Gemini for the same 12 mistakes and found a marked improvement on all critical key issues. The conclusion for ChatGPT is mostly negative. We formulate an underlying hypothesis for its worse performance: the influence of vocabulary size, which in Gemini is seven times larger than in GPT.

1. Introduction

The paper describes in detail an experiment using ChatGPT-4o [1] in summarization mode, in order to show the positive and negative aspects of the way in which literary text is reduced to fit into a short summary. We chose literary rather than journalistic text because literary text offers a greater variety of text typologies, thus constituting a more interesting challenge. In particular, literary text may frequently contain both literal and non-literal meaning; it may be organized in sentences with structures in non-canonical order; it may contain new and hard-to-understand lexical items; and the temporal sequence of the storyline may in some cases be reversed, introducing flashbacks, and may be interspersed with diegetic or gnomic statements. For this reason, we use excerpts taken from a novel, "The Solid Mandala" [2], written by the Australian Nobel laureate Patrick White, which offers exactly such variety. We include in Appendix A a brief summary of the storyline of the novel.
As the banner on the start page states, ChatGPT can make mistakes; it is not infallible and errs frequently. We intend to illustrate extensively the recurrent, semantically relevant mistakes that can badly harm the understanding of the content of the novel. GPT's performance is then compared to Gemini's in a separate section, where we show that mistakes are reduced and the overall comments on narrative style are correct. In addition, both chatbots produce hallucinations. We used a number of different prompts to address storyline and style. The prompts we used are very simple and direct. For this reason, we organized our analysis into three separate sections: the first is dedicated to summarizing the content by way of what has happened to the protagonists, the story or "fabula". The second is dedicated to how the story is organized, that is, the narrative technique, the "sujet" or plot. The third focuses on the linguistic tools that make the style what it is, concentrating on the verbal complex. It is important to note the peculiarity of White's style which, according to Gordon Collier ([3], p. 130), is a third-person narrative that uses a technique called "figural consciousness": rather than having an "auctorial narrator" as a single voice, there are three main protagonists, each of whom narrates in his or her own personal language and style, and each is very different from the others.
As will appear from the analysis we carried out, semantic mistakes in the output summary produced by ChatGPT can harm the way in which the content of the story is reported by assigning actions to the wrong character. These constitute the worst cases, which we attribute to the use of a procedure that we dub the "coreference chain", in which all pronouns are bound to a preceding antecedent. We assume that the procedure is based on heuristics rather than on the knowledge acquired by the underlying neural network: in some cases, the antecedent is wrongly assigned, or not assigned at all, confusing the plot. These constitute the more difficult cases, in which the antecedent can be assigned only through an inference, or in which a cataphora is present. Other recurrent mistakes regard the incorrect usage of semantically ambiguous terms. Typical problems may also arise from oversimplification, i.e., the summarization requirement to reduce the text below a certain threshold. This may cause the disappearance of semantically relevant pieces of text, usually restrictive modifiers, that allow the reader to understand the content adequately. As to the choice of the novel, for many critics Patrick White's novels are difficult and rich in elaborate suggestion. As said above, White uses a complex mixture of narrative techniques, which includes metaphors and other poetic devices. Stream of consciousness is replaced by a technique by which the narrator embodies each character using a different language each time: the result is a figural consciousness speaking, not the omniscient narrator. But the main trait of his style is the mixture of "fabula" and "sujet", i.e., the personality of his characters is strongly interwoven with the storyline and, in more abstract terms, the form is strongly interwoven with meaning. What is consistently present and acknowledged is the intent of the author to characterize each protagonist—in our case, the three main ones—with specific linguistic features realized both by lexical choice and by distinct constructions, mostly at the level of the verbal complex—but see Section 3. In this way, each protagonist is identified by specific expressions and significant traits that highlight psychological aspects and the relationships with the external world and with the other characters.
In a previous study [4,5] we followed suggestions by Gordon Collier [6], which we used to implement a system that tries to automatically identify narremes in the novel, which had previously been annotated manually. For the annotation, we were also inspired by Collier's list of features, which we used with some changes and additions. In particular, we also decided to use Martin & White's Appraisal Theory Framework [7,8] and introduce features related to Judgement and Affect. The current analysis is thus accompanied and counterbalanced by the deep knowledge we acquired in the previous experiments, which will be useful to clarify what ChatGPT is consistently unable to understand and why. More on this in Section 3 below.
The subdivision of the analysis into three sections is linked to the types of mistakes, and the one hallucination, that the bot made, which are the following 12:
  • Hallucination: correct name of the author but wrong title of the novel;
  • Abstract key points substituted by direct extracted references;
  • Oversimplification (dropping the restrictive modifier);
  • Wrong coreference chain (Mrs. Dun rather than Mrs. Poulter);
  • Omitted cataphora with split referents (a strange man and Johnny Haynes);
  • Wrong sense selection with ambiguous terms (think);
  • Disjoined reference of MEMORY and Mother;
  • Use of stream of consciousness as a narrative style;
  • Mistakes in classifying factual vs. non-factual sentences;
  • Mistakes in classifying factual sentences in the past with the pluperfect;
  • Inability to detect linguistic elements characterizing narrative style;
  • Inability to characterize UNCERTAINTY in the verbal complex.
As can be easily noticed, after the first bullet point, which is a hallucination, the second mistake is related to how the bot organizes the summary. The following five mistakes are strictly semantic mistakes affecting the storyline, while the remaining mistakes are related to style. Mistakes were searched for after evaluating what size the excerpts should be in order to force the chatbot to pay attention to important details. Summarizing an entire book or a very long excerpt favours the chatbots' preferred strategy of cherry-picking the most relevant events and entities and leaving out the details. Excerpts were therefore chosen with an average size of no more than 3000–5000 words. Searching for mistakes is very time-consuming and can only be carried out manually, because it requires an expert linguist. No automatic procedure can be devised, nor is it possible to ask the chatbot to find the mistake, because even after acknowledging it, the chatbot remains unaware and will reproduce the summary with the same mistake. The 12 mistakes in the list may well constitute a typology of the semantic mistakes that a chatbot, or any similar pattern-based or stochastically based algorithm, will make.
The paper is organized as follows: in Section 1.1 below we draw a picture of the state of the art on mistakes and hallucinations, mainly in book and story summarization; in Section 2 we focus on the summarization of excerpts from the novel, which requires ChatGPT to correctly relate events, actions, and states to the correct protagonist, preserve all the text needed to convey complete information, and extract the appropriate meaning of ambiguous words; in Section 3 we concentrate on the style, the plot, and the linguistic and rhetorical devices used by the author, and we highlight the level of complexity in the way in which subtle psychological states are presented by manipulating the verbal complex; in Section 4 we make a detailed comparison with Gemini, where we highlight its overall better performance; finally, in Section 5 a discussion is presented to support the previous analysis with the underlying hypothesis and the comparison drawn with Gemini; a conclusion then follows.

1.1. State of the Art of Mistakes in Literary Books Summarization

To the best of our knowledge, no previous study has appeared on the presence of semantic mistakes in summaries of literary books such as novels by famous writers. Over the last three or four years, reports have appeared on the web rather than in scientific papers. We review some of these reports, which deal with the fabrication of ungrounded text, the omission of relevant information, and a lack of concern for factuality. All commentators who tried to ask ChatGPT to summarize a book or chapters of a book came to the same conclusion: the bot is more interested in reducing or shortening the text than in summarizing its contents. This is clearly because no real understanding of the text is taking place, since all the bot is doing is parroting, using probabilities from the information stored in the underlying model and the billions of parameters associated with it.

1.1.1. Shortening Rather than Summarizing

In [9] Gerben Wierda reports an experiment with ChatGPT, with the following results: "When I asked ChatGPT to summarize this text, it instead shortened the text. And there is a fundamental difference between the two. To summarize, you need to understand what the paper is saying. To shorten the text, not so much." In another experiment the bot produced a lot of hallucinations: "I first tried to let ChatGPT [summarize] one of my key posts (the one about the role convictions play in humans with an addendum about human 'wetware'). ChatGPT made a total mess of it. What it said had little to do with the original post, and where it did, it said the opposite of what the post said." And further, there is an interesting conclusion about the role that parameters could play with respect to the prompt offering a context: "What seemed to happen is that ChatGPT's parameters—which have been tuned by its training data—dominate the result. The text to be summarized itself hardly influences the summary." The interesting part of the post is that using Gemini produced an unexpected, important improvement, which is also what we experienced and will extensively show and discuss in the final part of the paper.
Generally speaking, according to Wierda—an opinion we share—"LLM Chatbots are pretty bad in making specific summaries of a subject that is widespread", because the subject is well represented in the parameters. This may also happen if the context is too small. The contrary will apply if the text is not well known and, as a consequence, there is little material from it in the training corpus. This is the reason why we chose a novel by an Australian writer who, despite his Nobel Prize, has never enjoyed widespread favour with readers because of his complex writing style.
Another web-based report on book summarization is the one appearing in [10], where Jim Edwards discusses how the bot, in creating a summary of his book, made up a lot of lies. The reason for this behaviour is that the book was not freely available, or was possibly not included in the training material because it was published later. But the bot has an intrinsic strategy of satisfying any request whatsoever, and this is what drives it to generate hallucinations. The same results are presented by Matt Spivey [11]. A full list of limits and shortcomings of AI summarization [12], offered by Google's AI Review, was obtained by asking for "ChatGPT mistakes in literary text summarization"; we report it below:
- Limited Understanding of Subtext and Nuance: ChatGPT may not grasp the subtle meanings, symbolism, or underlying themes within a literary work, leading to a superficial summary.
- Difficulty with Complex Plot Structures: ChatGPT's summaries can oversimplify complex narratives, failing to capture the intricacies of plot twists, character development, and the overall arc of the story.
- Inconsistent Voice and Style: Maintaining a consistent narrative voice and writing style, especially in longer works, can be a challenge for ChatGPT, leading to a disjointed or unnatural summary.
- Repetitive or Clichéd Language: Due to its training data, ChatGPT might produce summaries that are repetitive, use overused phrases, or lack the originality and creativity found in human-written narratives.
- Lack of Emotional Resonance: ChatGPT's summaries may lack the emotional depth and subtlety that human authors can convey, potentially rendering the summary emotionally flat.
- Reliance on Readily Available Information: ChatGPT might only have access to summaries and reviews of a book, rather than the book itself, so the key points it provides might not be accurate.
- Not a True Summarization: ChatGPT may simply shorten the text instead of providing a true summarization, which involves extracting the main points and ideas.

1.1.2. Hallucinations: Vocabulary Size, Factuality, and Faithfulness

Maynez et al. [13] already in 2020 analyzed the limitations of large language models for abstractive document summarization and found that these models are highly prone to hallucinating content that is unfaithful to the input document. They conducted a large-scale human evaluation of several neural abstractive summarization systems to better understand the types of hallucinations they produce and found substantial amounts of mistakes and hallucinated content in all model-generated summaries.
The problem of document summarization, i.e., "the task of producing a shorter version of a document while preserving its information content," requires models to generate text that is not only human-like but also faithful and/or factual given the document. The authors introduce an interesting classification of hallucinations in their analysis, expressed by the following question: "Do models hallucinate by manipulating the information present in the input document (intrinsic hallucinations) or by adding information not directly inferable from the input document (extrinsic hallucinations)?", and by its follow-up: "How much hallucinated content is factual, even when unfaithful?"
As the authors affirm, despite recent improvements in conditional text generation, most summarization systems are trained to maximize the log-likelihood—a strong statistical technique for estimating the parameters of probability distributions based on observed data—of the reference summary at the word level, which does not necessarily reward models for being faithful. The authors continue by saying that “moreover, models are usually agnostic to the noises or artifacts of the training data, such as reference divergence, making them vulnerable to hallucinations.” Thus, models can generate texts that are not consistent with the input, yet would likely have reasonable model log-likelihood.
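For concreteness, the word-level log-likelihood objective referred to here can be written, in its standard form, as follows (this is the conventional formulation for conditional text generation, not a formula taken from [13]): given an input document x, a reference summary y = (y_1, …, y_T), and model parameters θ, training maximizes
\mathcal{L}(\theta) = \sum_{t=1}^{T} \log p_\theta(y_t \mid y_{<t}, x).
Nothing in this objective explicitly penalizes a summary that is fluent and high-probability yet unfaithful to x, which is precisely the point the authors make.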
Hallucinations of an intrinsic nature use terms or concepts from the document but misrepresent its information, making them unfaithful to the document.
Extrinsic hallucinations are model generations that ignore the source material altogether. Extrinsic hallucinations use terms that are not introduced in the document. According to the authors “a model with a poorly-informed decoder and that is agnostic to the divergence issue between the source and target text will function more as an open-ended language model and will be prone to extrinsic hallucinations.”
A better definition of hallucination is the following:
“A summary S of a document D contains a factual hallucination if it contains information not found in D that is factually correct. Factual hallucinations may be composed of intrinsic hallucinations or extrinsic hallucinations. By definition, abstractive summaries are written to preserve the salient information in the input document, but they are expressed in the words of the summary author as opposed to the input document author.”
In the conclusion of their paper, Kalev et al. [14] express the need to study the phenomenon of hallucination further and recommend caution in the use of AI for summarization: "No matter the underlying cause, these results suggest significant caution is warranted in automated summarization and that further research is needed into the general prevalence and underlying causes of summarization hallucination."
Another frequent trend in examining hallucinations is the one represented by the paper by Catarina G. Belem et al. [15], where multi-document summarization (MDS) is considered. In this work, the authors investigate how hallucinations manifest in LLMs when summarizing topic-specific information from multiple documents. Since no benchmarks exist for investigating hallucinations in MDS, they use existing news and conversation datasets, annotated with topic-specific insights, to create two novel multi-document benchmarks. When evaluating five LLMs on their benchmarks, they observe that on average, up to 75% of the content in LLM-generated summaries is hallucinated, with hallucinations more likely to occur towards the end of the summaries. To understand the characteristics of these hallucinations, they manually evaluated 700+ insights and found that most errors stem from either failing to follow instructions or producing overly generic insights.
In their work, Laban et al. [16] propose to leverage the task of summarization as a testbed for evaluating long-context models and retrieval augmented generation (RAG) systems. They note that summarization requires reasoning over a long context and a careful understanding of the relative importance of content. They use a highly specialized test bed for summarization, the “Summary of a Haystack” (SummHay) task, which requires a system to process the Haystack and generate, given a query, a summary that identifies the relevant insights and precisely cites the source documents. The results are very disappointing.
Huang et al. [17] focus on multi-document news summarization with the task of summarizing diverse information encountered in multiple news articles encompassing the same event. To facilitate this task, they outlined a data collection schema for identifying diverse information and curated a dataset named DiverseSumm. Their analyses suggest that despite the extraordinary capabilities of LLMs in single-document summarization, the proposed task remains a complex challenge for them, mainly due to their limited coverage, with GPT-4 only able to cover under 40% of the diverse information on average. Their fine-grained human evaluation results identify that even the most advanced LLM, GPT-4, only covers about 37% of diverse information with optimally designed prompts.
One of the underlying causes of hallucination in summarization may be found in the paper by Uluoglakci and Taskaya Temizel [18]. In their paper, they note that LLMs encounter challenges when dealing with rare tokens, especially in mixed contexts. No indication is given when ungrounded yet plausible text is generated, as the authors comment: "Adversarial effects in real-life scenarios may inadvertently emerge from prompts that combine both common and rare tokens. However, approximately 90% of the time LLMs neglect to indicate their lack of information about a hypothetical phenomenon in similar situations. This characteristic significantly diminishes the reliability of LLMs and impedes their suitability for deployment in critical decision-making systems." As to the tendency of LLMs toward hallucination in the presence of terms that are nonexistent in their training dataset, see below.
Another important trend is characterized by the evaluation of factuality in texts generated by LLMs. Min et al. [19] introduce FACTSCORE, a new evaluation tool that breaks a generation into a series of atomic facts and computes the percentage of atomic facts supported by a reliable knowledge source. As the authors note, "evaluating the factuality of long-form text generated by large language models (LLMs) is non-trivial because (1) generations often contain a mixture of supported and unsupported pieces of information, making binary judgments of quality inadequate, and (2) human evaluation is time-consuming and costly." Their results "indicate that current LLM training methods are insufficient to prevent hallucinations, emphasizing the need for a fundamental change to ensure the reliability of LLMs." To test the ability of LLMs to answer questions, a dataset comprising one-third hypothetical questions and two-thirds valid questions was used. The results, determined by the percentage of valid answers to hypothetical questions, were 5.72% for GPT-3.5 and 5.64% for Llama2-70B, indicating over a 94% error rate. In most cases, the models failed to recognize a hypothetical term or denied the existence of a valid term, producing hallucinated information. In addition, they omitted the hypothetical term entirely in 5% of their responses.
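To make the FACTSCORE idea concrete, the following minimal Python sketch (our own illustration, not the authors' released implementation; the fact decomposition and the knowledge-source check are stubbed with hypothetical placeholders) computes the proportion of atomic facts that pass the support check:

from typing import Callable, List

def factscore(generation: str,
              decompose: Callable[[str], List[str]],
              is_supported: Callable[[str], bool]) -> float:
    # Break the generation into atomic facts and return the fraction
    # supported by the knowledge source (1.0 = fully supported).
    facts = decompose(generation)
    if not facts:
        return 0.0
    return sum(1 for fact in facts if is_supported(fact)) / len(facts)

# Toy stand-ins: a naive sentence split and a placeholder support check.
toy_decompose = lambda text: [s.strip() for s in text.split(".") if s.strip()]
toy_support = lambda fact: "Solid Mandala" in fact
score = factscore("Patrick White wrote The Solid Mandala. The novel is set in Paris.",
                  toy_decompose, toy_support)
print(score)  # 0.5: one of the two atomic facts passes the placeholder check

In the real tool, the decomposition is performed by a model and the support check queries a reliable knowledge source; the sketch only shows how the final score aggregates the per-fact judgments.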
As Mishra et al. [20] assume, Llama and other similar LMs have demonstrated significant achievements in summarization tasks but struggle with factual inaccuracies, a critical issue in clinical NLP applications where errors could lead to serious consequences. LLMs confront significant challenges, primarily their propensity for generating hallucinations—fabricated information not grounded in the source text—and producing factually inconsistent outputs. Such limitations critically undermine the models’ reliability, particularly critical in clinical NLP applications, where inaccuracies could result in serious misdiagnoses.
The authors Kalai & Vempala [21] comment on the tendency of recent language models to generate false but plausible-sounding text with surprising frequency. This fact constitutes a danger, as such “hallucinations” are an obstacle to the usability of language-based AI systems and can harm people who rely upon their outputs. Their paper shows that there is an inherent statistical lower bound on the rate at which pretrained language models hallucinate certain types of facts and that this fact has nothing to do with the transformer LM architecture or data quality.
Kim et al. [22] report work carried out at the book level to verify the "faithfulness" of LLMs in reporting book content in their summaries. To this end, they decompose each summary into a list of claims (FABLES), which are then individually verified against the input document. Overall, they observed that CLAUDE-3-OPUS is the most faithful book-length summarizer by a significant margin, followed by GPT-4-TURBO. A qualitative analysis of FABLES reveals that the majority of claims marked as unfaithful are related to events or states of characters and relationships. In their experiment, the authors also found that annotators frequently point out omissions of critical information. They therefore developed the first taxonomy of omission errors in book-length summarization and observed that key events, details, and themes are frequently omitted by all LLMs. They also observe other content-selection errors: for example, even the strongest summarizers, CLAUDE-3-OPUS and GPT-4-TURBO, over-emphasize content towards the end of books to the detriment of the beginning.
As a first conclusion derived from the literature on mistakes and hallucinations in the use of LLMs, we can safely say that the problem can be traced back to the intrinsic need to keep the size of the vocabulary in the range of 30K–60K, a decision dictated partly by the need to keep the number of parameters to a manageable size and to reduce the computational requirements. As Tao et al. put it [23], "A larger vocabulary size improves tokenization fertility, i.e., splitting sentences into fewer tokens, thereby improving the tokenization efficiency. Additionally, a larger vocabulary enables the model to capture a wider range of concept." But an increased vocabulary size requires an adequate training corpus, because "… the risk of under-fitting for rare tokens increases with larger vocabulary sizes". Thus, the authors of [23] introduce the notion of Optimal Vocabulary Size, which is, however, at least seven times the size commonly used in most current LLMs.
This line of reasoning is supported by the experiments carried out by Uluoglakci and Taskaya Temizel [18], who directly address the problem of rare terms that might be missing from the training corpus. In a number of previous papers, we verified this hypothesis by checking the behaviour of different LLMs on texts extracted from Italian poetry, and the results showed clearly that rare words cause a dramatic drop in the ability of BERT to predict the next word [24,25].
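The effect of vocabulary size on rare words can be observed directly by comparing how tokenizers with different vocabulary sizes split the same sentence. Below is a minimal sketch using the Hugging Face transformers library; the model identifiers are merely illustrative choices of a smaller- and a larger-vocabulary tokenizer, not the tokenizers of ChatGPT or Gemini, whose details are not public.

from transformers import AutoTokenizer

# A roughly 50K-entry BPE vocabulary (GPT-2) versus a roughly 250K-entry one (XLM-RoBERTa).
small_vocab = AutoTokenizer.from_pretrained("gpt2")
large_vocab = AutoTokenizer.from_pretrained("xlm-roberta-base")

sentence = "Mrs. Poulter peeped inside her plump glove near the sarsaparilla hedge."

for name, tok in (("gpt2", small_vocab), ("xlm-roberta-base", large_vocab)):
    pieces = tok.tokenize(sentence)
    print(f"{name}: vocab size = {len(tok)}, tokens = {len(pieces)}")
    print(pieces)

A rare word such as "sarsaparilla" will typically be fragmented into more subword pieces by the smaller vocabulary; each fragment has its own embedding, and the meaning of the original word has to be recomposed from pieces whose training evidence is thin, which is exactly the under-fitting risk discussed above.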
A more abstract generalization concerns the underlying theory directly. Distributional semantics is based on the Distributional Hypothesis, which states that similarity in meaning results in similarity of linguistic distribution [26]: words that are semantically related are used in similar contexts. Distributional semantics reverse-engineers the process and induces semantic representations from contexts of use [27,28]. As we know from the way in which language models are built, word embeddings can be regarded as direct representatives of the theory. In its most basic and frequent form, distributional semantics represents word meaning by transforming large amounts of text through an abstraction algorithm to produce a distributional model, in which semantic representations are listed in the form of vectors—i.e., lists of numbers that determine points in a multidimensional space where similarity is measured by cosine similarity. Other possibilities include the representation of semantic units via more complex algebraic objects such as matrices or tensors.
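As a minimal illustration of this vector-based view of meaning, the sketch below computes cosine similarity over toy co-occurrence vectors; the numbers are invented for illustration, whereas in a real distributional model they would be derived from corpus counts or learned embeddings.

import numpy as np

def cosine(u, v):
    # Cosine similarity: the standard closeness measure in distributional models.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy co-occurrence counts over four invented context words.
vectors = {
    "glove":   np.array([4.0, 2.0, 0.0, 1.0]),
    "sweater": np.array([3.0, 2.0, 0.0, 1.0]),
    "mandala": np.array([0.0, 0.0, 5.0, 1.0]),
}

print(cosine(vectors["glove"], vectors["sweater"]))  # high: similar contexts of use
print(cosine(vectors["glove"], vectors["mandala"]))  # low: dissimilar contexts of use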

2. Materials, Methods, and Results I

We started using ChatGPT to summarize excerpts from the novel at the beginning of December 2024. Interactions with the chatbot were conducted in separate conversations, both for different versions of the same excerpt and separately for each excerpt, to avoid the answers being altered by memory of previous conversations. We started with prompts that should have produced a summary of the actions taken by each character in the story.
In fact, we tried different versions of the excerpt to verify that the text and the answer were not being retrieved from somewhere on the internet, so we also changed all the names of people and locations, but we always obtained the same answers. We also slightly modified the prompt a number of times, as can be seen below, but we always obtained the same answer. The first prompts for summarization are as follows:
  • Tell me all that happens in the story
  • Tell me all that the characters do in the story
  • Tell me all that happens to the characters of the story
  • What have all the characters in the story done?
We then turned to narrative techniques and the use of linguistic and rhetorical devices, so the idea was also to increase the length of each excerpt from 3000 tokens upward. We knew that the length of the excerpt could result in a rejection by ChatGPT, but we intended to try. So, we asked whether we could copy and paste the whole text of the novel, even though we knew that we would receive a negative answer; this is the prompt we used, and here is what we obtained:
  • I will now copy-paste the entire text of a novel if that will suit you: what narrative techniques and what linguistic devices did the author use to characterize the protagonists?
If you share the entire text of a novel, it might be challenging to analyze it in a single step due to its length. However, you can share key passages or chapters that you believe are crucial to understanding the characterization of the protagonists. This will allow for a focused analysis of the narrative techniques and linguistic devices used.
Alternatively, if you can specify which parts of the novel are particularly important for character development, I can analyze those sections. That way, we can break down how the author builds the characters and what literary tools are employed.
When we launched the above prompt with a new excerpt, we obtained the answer "Error too long" and no summary was produced. Then we encountered the first mistake:
- The correct name of the author but the wrong title of the novel
We discovered that ChatGPT had managed to find the name of the author, but not the correct title, which we obtained only after a number of further interactions.
The excerpt from “Riders in the Chariot” by Patrick White showcases several linguistic devices and narrative techniques that characterize the protagonists, Mrs. Poulter and Mrs. Dun, as well as the setting of the scene. White uses these techniques to subtly reveal the characters’ personalities, social standing, and the nature of their relationship.
This must be regarded as a case of hallucination. The two novels, Riders in the Chariot and The Solid Mandala, have nothing in common apart from the name of the author. In particular, the former is the story of a religious Jew, Himmelfarb, and a woman, Mary Hare, who are conditioned by "Chassidism" or "Hasidism", a religious movement that influences and determines how to conduct a Jewish life. The latter, on the contrary, is the story of the lives of the twin brothers Waldo and Arthur and their relatives and friends.
The correct title of the novel comes later on. Here is the answer we obtained from ChatGPT after 21 interactions with short excerpts from the same novel:
This excerpt from Patrick White’s The Solid Mandala employs a range of linguistic and rhetorical devices to characterize Mrs. Poulter and Mrs. Dun, two elderly women who engage in conversation while riding a bus. Here is an analysis of some key devices that help shape the personalities and dynamics between these characters:
The summaries produced in reply to input excerpts ranging from 3500 to 4000 words all have the same sequence of bullet key points, with comments that explore the characters through their dialogue and their lexis. In particular, when summarizing Excerpt No. 1, ChatGPT notices the use of repetitive words and phrases, the presence of grammatically non-standard sentences, and colloquial regional idioms. It then highlights the role each character embodies in the conversation, ironic and humorous statements, and their contrasting personalities when compared with the people passing by the bus, like the young girls and the twins living in the house opposite on the same road. One bullet point is dedicated to the symbolism attached to objects and places mentioned in the text. Finally, there are comments on the characters' personalities, the presence of suspense, and suggestions of possible dramatic events.
Whereas references to the use of specific linguistic items—conjuncts, adverbials, articles, different naming references, verbal complexes, etc.—to their semantic import—such as the use of modality—and to syntactic constructions—cleft sentences, parataxis, displaced syntactic structures, etc.—are totally absent, references to narrative techniques are present, though not always. We analyze in detail the quality of the summaries produced by enumerating the bullet points that ChatGPT produced incrementally, thus showing that it was using a memory of its previous analyses of the same novel; as of March 2025, we have been told that memory has been disabled.
Here is the list of nine key points used by ChatGPT to build up one of the last summaries of Excerpt No. 1:
1. Dialogue and Repetition; 2. Colloquial Language and Syntax; 3. Subtle Power Dynamics; 4. Irony and Understatement; 5. Characterization through Contrast; 6. Symbolism; 7. Subtle Humour; 8. Foreshadowing and Suspense; 9. Character Psychology and Subtext.
Below are the 10 bullet points used by ChatGPT to build up the first summary of Excerpt No. 1:
1. Stream of Consciousness; 2. Repetition; 3. Contrast and Juxtaposition; 4. Symbolism; 5. Imagery, 6. Fragmented Narrative; 7. Characterization through Mundanity; 8. Use of Indirect Characterization; 9. Religious Allusions; 10. Emotional Resonance through Ambiguity.
In this case there are remarkable differences and only three bullet points that we underlined can be said to coincide. However, when we look at the summaries produced for the following excerpts, we see that the previous scheme is preserved. Below is the list of bullet points produced for Excerpt No. 2:
1. Dialogue-Driven Characterization; 2. Indirect Characterization; 3. Repetition and Parallelism; 4. Social Commentary through Subtle Irony; 5. Symbolism; 6. Character Dynamics; 7. Use of Humour; 8. Tone and Mood; 9. Attention to Detail; 10. Stream of Consciousness Elements.
Almost all the new bullet points have been inherited from previous analysis. When we re-proposed Excerpt No. 1 after going through all other excerpts, we obtained a totally different decomposition of the text, as can be noticed from the list below:
1. Dialogue and Speech Patterns: 1.1 Repetition and Echoing, 1.2 Colloquial Language, 1.3 Indirect Communication; 2. Characterization through Internal Thoughts, 2.1 Mrs. Poulter Reflection, 2.2 Mrs. Dun’s Hesitation; 3. Use of Setting, 3.1 The Bus as a Microcosm, 3.2 Sarsaparilla and Terminus Road; 4. Symbolism, 4.1 Verandas and Hedges, 4.2 Flowers; 5. Narrative Perspective, 5.1 Limited Third-Person Point of View; 6. Humor and Irony, 6.1 Subtle Humor, 6.2 Irony; 7. Pacing and Rhythm, 7.1 Slow, Measured Pace, 7.2 Rhythmic Repetition; 8. Social Commentary, 8.1 Class and Gender, 8.2 Conservatism and Prejudice.
Not only are the majority of key points completely new, but they also foreshadow a totally different approach: rather than being abstractions derived from narratological theory, the new list proposes the insertion of direct reference items extracted from the text.
Eventually, we concentrated on temporal ordering and then on factuality, which required the new prompts that are listed in Section 3. Below, we analyze what is contained in the summaries produced by ChatGPT, choosing the most significant cases.

The Story or the Fabula

In choosing the texts to summarize, we used excerpts from the novel in which the number of protagonists does not exceed two or three persons of the same sex. The most remarkable part of the output we receive from ChatGPT concerns the treatment of pronouns, i.e., coreference resolution and what we dubbed the "coreference chain". The technique used by the summarization algorithm seems to follow an implicit rule: assign the same antecedent to all pronouns with the same morphological features in the stretch of text that follows the appearance of a referential expression which, by its frequency of usage, can be regarded as a lexically expressed protagonist, until a new and different protagonist is mentioned explicitly. This rule works perfectly except in a case in which the antecedent is implicitly coreferred by means of a physical object belonging to her and not to the other protagonist. This case is shown in Excerpt No. 1, which comes from the beginning of the novel, where two women, Mrs. Poulter and Mrs. Dun, are talking while sitting on a bus: a pronoun starting a sentence has an antecedent that requires knowledge of the extended context acquired by the underlying neural network thanks to the LLM it has been using—more below. We place the excerpts in Appendix B and the answers from ChatGPT in Part I of the Supplementary Material. In this first case, we are discussing semantic mistakes numbers 1 and 2:
  • Oversimplification (dropping the restrictive modifier);
  • Wrong coreference chain (Mrs. Dun rather than Mrs. Poulter).
All the excerpts are decomposed by ChatGPT into short sentences made up of extracted snippets consisting of predicate-argument structures, i.e., a subject NP, an object NP, and a verb. Sentences are assigned to each protagonist separately. In our case, first comes Mrs. Poulter's list of sentences and then Mrs. Dun's. In the Supplementary Materials, Figure S1 shows the first answer to Excerpt No. 1.
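Before turning to the excerpts, we give a minimal Python sketch of the "coreference chain" rule hypothesized above. This is our own reconstruction of the heuristic, not ChatGPT's actual (and unknown) procedure: every pronoun is bound to the most recently named protagonist with matching morphological features, until a new protagonist is named explicitly, so cataphora and inference-based antecedents are never resolved.

# Minimal reconstruction of the hypothesized "coreference chain" heuristic.
PROTAGONISTS = {"Mrs. Poulter": "fem", "Mrs. Dun": "fem", "Waldo": "masc", "Arthur": "masc"}
PRONOUNS = {"she": "fem", "her": "fem", "he": "masc", "his": "masc", "him": "masc"}

def coreference_chain(tokens):
    last_named = {"fem": None, "masc": None}   # most recent antecedent per gender
    resolved = []
    for tok in tokens:
        if tok in PROTAGONISTS:                # an explicit mention resets the chain
            last_named[PROTAGONISTS[tok]] = tok
            resolved.append((tok, tok))
        elif tok.lower() in PRONOUNS:          # a pronoun reuses the last antecedent
            resolved.append((tok, last_named[PRONOUNS[tok.lower()]]))
        else:
            resolved.append((tok, None))
    return resolved

# "She" is bound to Mrs. Dun, the last named woman, even where the gloves show
# that Mrs. Poulter is the intended referent; a cataphora would stay unresolved.
print(coreference_chain(["Mrs. Dun", "said", "it", "is", ".", "She", "had", "conviction"]))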
  • EXCERPT No. 1
In the analysis below, we comment on ChatGPT's answers, preceded by three asterisks (***), and on the related part of the coreferred text, preceded by three dashes (---).
The note regards a comment by Mrs. Poulter, who compares living in a suburb to being in a cage and expresses this in the sentence:
---
“You couldn’t go anywhere as the crow.”
To which comment Mrs. Dun replies by expressing her surprise with a filler question: “Eh?”. The following turn contains Mrs. Poulter’s attempt at clarification and Mrs. Dun’s lack of appreciation of the use of a “crow” to indicate a generic bird:
---
“As the crow flies,” Mrs. Poulter explained.
---
“Oh, the crow,” her friend murmured, seeming uneasy at the idea.
***
Expresses unease: She seems uneasy when Mrs. Poulter talks about the crow.
ChatGPT interprets this utterance as an expression of uneasiness, whereas the author only suggests that she is "seeming" uneasy.
At the beginning of the conversation, we are told that Mrs. Poulter is wearing a pair of gloves:
---
“Well,” said Mrs. Poulter, peeping inside her plump glove …
Towards the end of the excerpt, Mrs. Dun replies to a comment by Mrs. Poulter, and the following piece of text is a comment by the author on the way in which one of the two women is actually feeling:
---
“I’ll say it is!” said Mrs. Dun.
---
She had a certain relentlessness of conviction. If it had not been for her gloves, her knuckles would have shown up white on the chrome rail.
ChatGPT carries the coreference over to Mrs. Dun in the following sentence, which starts with the pronoun "she". But it is Mrs. Poulter who is now being coreferred, and this is made clear by the following sentence that we highlighted in italics, where the gloves are referred to. The passage is assigned to Mrs. Dun, and this is how it is commented on:
***
Shows conviction: Her strong grip on the rail indicates her strong feelings and convictions.
The third mistake regards the need to simplify when producing a summary, but in this case, we are in the presence of oversimplification. As the conversation continues, Mrs. Poulter, who is “happy again”, manages to make a remark on a gadget that she wishes she could also have:
---
Presently, she could not resist: “That veranda of yours must be a real luxury, dryin’ laundries in the rain.”
The use of the veranda for drying laundry in the rain is something for which Mrs. Poulter really envies her friend. However, ChatGPT only reports Mrs. Dun's reaction to what Mrs. Poulter said. Oversimplification is present in Mrs. Dun's comment, which is reported wrongly:
***
Comments on the veranda: She mentions Mrs. Poulter’s veranda as a luxury for drying laundry.
The veranda is a luxury not just for drying laundry, but for drying it when it rains, a detail which has been omitted.
In general, everything that is not a fact but a comment by the author is erased. In particular, we noticed that a lot of the text contained in the excerpt is totally ignored by ChatGPT, and for this reason we re-presented a portion of the text to ChatGPT, which we comment on below. The paragraph in which the author reports Mrs. Poulter talking about a Chinese woman and her family, and the place where they lived, is rendered by a generic sentence mixed up with what was said at the beginning.
***
Explains: She describes how the roads at Sarsaparilla were dead ends and mentions seeing a Chinese woman standing under a wheel-tree.
This is reported from the point of view of Mrs. Poulter. There’s another indirect reference to the Chinese woman from the point of view of Mrs. Dun:
***
Sucks her teeth: This action indicates a moment of contemplation or mild disapproval.
Mrs. Dun "sucks her teeth" when Mrs. Poulter tells about "those what-they-call wheel-trees" and about having seen the Chinese woman "standing under it when it was in flower". It would seem that Mrs. Dun does not like Chinese people, but we learn more in the enlarged summary below.
Finally, ChatGPT captures an important description of Mrs. Poulter and her husband Bill that characterizes them well with their personalities.
As said above, we decided to split the Excerpt and use the final portion to ask the same question and see what additional information ChatGPT would be able to gather. The portion of the text starts with Mrs. Poulter telling about her memory of a Chinese woman and her family. The result is shown in Figure S2 in Supplementary Materials.
As can be easily noticed, in this case, there is a lot more information that has been captured when reducing the text to be summarized: this is certainly due to a fixed proportion of text that the algorithm allows every time a text is being presented. The newly extracted pieces of text are the following: a first bullet point dedicated to Mrs. Poulter recounting the story of the Chinese woman. The other new point is about Mrs. Poulter’s “Expressing happiness”, whereas in the previous summary, we had her “expressing sadness and nostalgia”. At the same time, we notice that some information previously present has now been deleted. In particular, the reference to Bill’s sweater, which was so rich in the previous summary, is now absent. We can also notice the lengthening of the reference to Mrs. Dun’s veranda, which is now useful for drying laundry “in the rain”.
***
Wears an old sweater: He is described wearing an old fawn sweater that Mrs. Hide had knitted, which had begun to stretch and sag.
However, now we have Mrs. Poulter recalling her story of a Chinese woman and on the opposite side, we have Mrs. Dun listening to the story.
***
Talks about the Chinese woman: She recalls seeing a Chinese woman who lived on a hill, mentioning her refinement and the wheel-tree she stood under.
***
Listens to Mrs. Poulter: Mrs. Dun listens to Mrs. Poulter’s story about the Chinese woman.
Now Mrs. Dun's "sucking her teeth" is no longer interpreted as showing some kind of appreciation of what she had been told by Mrs. Poulter, but just as a generic reaction to something we are not told: reduction again.
The interesting point is the one explaining Mrs. Dun’s “conviction”:
***
Shows conviction: Her strong grip on the rail reveals her strong feelings about the conversation and situation.
In the previous summary, "shows conviction" was related to the veranda, whereas now it is wrongly related to the conversation and the situation. The fact is that, as before, the sentence is wrongly associated with Mrs. Dun rather than with Mrs. Poulter, thus producing a doubly wrong reference.
  • EXCERPT No. 2
In the second excerpt, we are dealing with two men, the two most important protagonists of the novel, the Brown twins, Waldo and Arthur. In this excerpt, dialogues are few, and the majority of the text is made up of the author's descriptions and reported speech. The mistakes involved are numbers 3 and 4:
3. Omitted cataphora with split referents (a strange man and Johnny Haynes);
4. Disjoined reference of MEMORY and Mother.
As happened with the previous Excerpt No. 1, in this case, too, important parts of the text were totally ignored, so we had to re-present a section.
The first summary is focused on the two protagonists and the dogs, and their actions are reported very correctly and precisely, as can be gathered by the content of Figure S3 in Part I of the Supplementary Material.
The summary reports faithfully the main actions of the twins; however, as can be easily noticed from a comparison with the next, more detailed summary, there is a lot missing. So, we split Excerpt No. 2 and produced a new, shorter text starting from the paragraph with Waldo leading his brother out of the kitchen. Below we show a ChatGPT summary, which is, however, split into two separate images so that they can be more easily matched with the previous Figure S3. At first, we have the image of Waldo’s behaviour in Figure S4.
The portion of Excerpt 2 that we have now called HalfExcerpt2 corresponds, in the previous Figure S3, to bullet point 7 and onward. We list below the newly extracted text and comment on its importance with respect to the previous overall summary. We use two carets (^^) to indicate newly extracted text, and "§§" to indicate a stretch of text that partially coincides with what was previously extracted.
^^
Feels Arthur’s hand: Waldo feels the larger, fleshy hand of his brother Arthur in his own thinner, colder hand as they walk.
^^
Notices Arthur’s breathing: Waldo sees that his brother is breathing deeply.
^^
Observes Arthur’s difficulty speaking: Waldo notices Arthur’s struggle to form words and their solidity once spoken.
^^
Reflects on planting the gooseberry bush: Waldo remembers planting the bush against Arthur’s advice to demonstrate something.
^^
Envy towards Arthur: Waldo sometimes envies Arthur’s ability to distance himself from pain.
^^
Modulates his voice: Waldo uses his fine tenor voice, more to impress than to please, a voice their parents were proud of.
^^
Contemplates their differences: He thinks about their mother’s comments on his voice and Arthur’s hair, believing his mind should have been praised.
§§
Stares at Arthur’s hair: Waldo is fascinated by the glimpses of pink skin through Arthur’s hair and reflects on its cleanliness.
At first glance it is apparent that the second summary is by far more informative both in terms of facts and of mental thoughts.
Now consider the mistake of assigning to Arthur an action that was in fact carried out by Waldo, as clearly appears from the related stretch of text:
---
Arthur had advised against it—Sarsaparilla was too warm—but Waldo had planted the bush. To demonstrate something or other.
And this is the snippet produced by GPT:
***
Reflects on planting the gooseberry bush: Waldo remembers planting the bush against Arthur’s advice to demonstrate something.
It is not Arthur who wants to demonstrate something, but Waldo. In this case, the mistake is due to the IMPLICIT subject of the infinitival, which must receive an antecedent: the choice of Arthur is due to his role as subject of the preceding main clause, whereas Waldo is the subject of the second clause, which is, however, closer to the infinitival and becomes the new topic, if any such mechanism is in force with the "coreference chain". Arthur's advice was contained in the parenthetical "Sarsaparilla was too warm", which is ignored. In the enlarged version of the second part of HalfExcerpt2, we highlight the pieces of text that have been added—this is visible in Figure S5 in Part I of the Supplementary Material.
Below are the stretches of text that have been extracted in addition to the more general summary.
^^
Holds Waldo’s hand: Arthur holds Waldo’s hand as they walk, with his larger, fleshy hand in Waldo’s thinner, colder one.
^^
Speaks with difficulty: He struggles to form words but manages to say them solidly once he does.
^^
Advises against planting the bush: Arthur had advised against planting the gooseberry bush, believing Sarsaparilla was too warm for it.
^^
Maintains physical strength: Despite his struggles, Arthur has grown into a strong man, continuing to lift weights and maintain his youthful muscles.
As can be noticed some of the additional material is important to understand the story but had been previously erased by oversimplification.
We now comment on a third excerpt regarding the behaviour of one of the twins, Waldo, and an at-first unidentified character, who in the following text turns out to be one of his former friends. The recognition comes through the voice of the woman accompanying the man: a husky voice. In particular, at this precise moment: "He remembered it was that boy, that Johnny Haynes, they could have cut each other's throats, telling him behind the dunny to watch out for hoarse-voiced men and women, they were supposed to be carriers of syph." As before, the excerpt is in Part I of the Supplementary Material as Figure S6.
In this text, through the words of the woman, Waldo remembers what his friend Johnny Haynes told him, and realizes who the man was. But we are not told this by the summary made by ChatGPT, which keeps references to the old man separate from those related to Johnny Haynes as if they were two different people.
The reasons for this important mistake are to be found in the way in which the "coreference chain" usually works: first comes the referring expression, in our case a human being, which becomes the entity to be coreferred and coindexed by subsequent pronouns, epithets, or other expressions. The entity coreferred to is an antecedent and not, as in our case, a cataphora, which requires specific mechanisms for cataphoric coreference. The mistake may also be due to the need to draw an inference about the type and tone of the woman's voice and about the two characters appearing in the story.
In the summary, we are told that both the Old Man and Johnny Haynes “kick at the house” as if they were two separate persons doing the same action. This mistake modifies the plot, introducing a new non-existent entity.
Besides, the summary is a bad oversimplification where the most important pieces of text have been ignored. We repeat the same operation we did previously by splitting the text of Excerpt 3 and producing a new, more focused summary that we show in Part I of Supplementary Material in Figure S7.
  • EXCERPT No. 3
This portion of Excerpt 3 is dominated by what a personified Memory brings about in Waldo's thoughts and by the actual actions of his present life. We use asterisks for ChatGPT's answers and dashes for the original text. In particular, the first reference by ChatGPT is taken to be a real fact:
****
Reflects on Mother’s Legacy: Waldo thinks about how his mother had a sense of moral proportion, which he believes he inherited along with her pale eyes.
----
Disorderly inhabit, because the years had gradually frayed her, Mother kept what he liked to think of as a sense of moral proportion. Which he had inherited together with her eyes.
The text does not assert that Mother "had a sense of moral proportion"; this is what Waldo "liked to think" or "believe". In the snippet produced by ChatGPT, the verb "think" is used in another sense, with the meaning "consider", which carries a factual implication. The verb THINK is at least five ways ambiguous (believe, consider, intend, devise, be preoccupied, etc.), and in an LLM all the different senses are assembled into one single embedding. GPT selected the wrong sense.
A second important point is the one mentioning Memory as an actor:
****
Memory Takes Over: Memory, personified, takes a dominant position, and Waldo sees visions of great occasions and feels a surge of radiance and splendour.
And below is a comment for the bullet point personified Memory:
****
Takes Control: Memory, as a character, takes control of Waldo, making him relive and see past events with heightened vision and splendour.
But Memory is not just a personified character, she is the personification of Waldo’s mother. In fact, Memory is slowly turned into the Mother, even though at the beginning of the paragraph, the two appear separate. Then comes the moment when “Memory herself seated herself in her chair”, where “her” refers to Mother.
----
Then Memory herself seated herself in her chair, tilting it as far back as it would go, and tilted, and tilted, in front of the glass. Memory peered through the slats of the squint-eyed fan, between the nacreous refractions. If she herself was momentarily eclipsed, you expected to sacrifice something for such a remarkable increase in vision. In radiance, and splendour. All great occasions streamed up the gothic stair to kiss the rings of Memory, which she held out stiff.
In this case, the personification of Mother is IMPLICIT and requires an inference, two operations that are impossible for ChatGPT and for DNNs in general. The reason is very simple: models are built around lexically expressed words, and implicit information is not available and not present in the model, except for words that are strongly contextually motivated.
Finally, we add one last excerpt for this section, which concentrates on the twins and follows the previous excerpts as far as the story or fabula is concerned. Excerpt No. 4 is focused on Waldo's attempt to inflict high levels of pain on his brother by bringing to light Arthur's attempts at poetry writing. In fact, the move hides Waldo's desperate need to erase his own past, unsuccessful attempts at novel writing, which becomes apparent when Tiresias is mentioned. We report ChatGPT's summary of the characters' actions in Excerpt No. 4 in Figure S8 in Part I of the Supplementary Material.
  • EXCERPT No. 4
Waldo does not want Arthur to know about the mother’s dress he had just hidden, but he wants to tease Arthur by destroying his secret past attempts at poetry writing. The tone of voice is important, but is mingled with his feelings of revenge. This is what the text tells us:
----
Then Waldo read aloud, not so menacingly as he would have liked, because he was, in fact, menaced:
And here are GPT’s couple of snippets:
****
Reads the poem aloud.
****
Feels menaced by the poem.
Waldo wanted to sound menacing but "in fact he was menaced"; this is reported as a fact by the omniscient writer and was not something felt by Waldo. The poem, or what Arthur wrote, tells about bleeding: a Vivisectionist Cordelia is bleeding, and all Marys "in the end bleed". After reading the poem aloud, Waldo holds it in his hand and Arthur takes it back:
----
He would have snatched, but Waldo did not even make it necessary.
On the contrary, what GPT understands of Arthur’s action is something different:
****
Tries to snatch the paper from Waldo.
But Waldo’s action is reported correctly:
****
Let the paper hang from his hand.
Now Waldo feels "he was bleeding", and in his imagination he has a "suspicion of an incurable disease", because of his feeling that Arthur's "unnatural blood continued to glitter". Then we are told that,
----
Waldo was infected with it.
He was infected with the imaginary incurable disease and not by the blood:
****
Feels infected by Arthur’s “unnatural blood.”
Waldo is now convinced that all his past vainglorious attempts at becoming a novelist made no sense, and that it is time to part with all of them. As a result, the mother's dress and all his papers are slowly burned. The scene is described with a wealth of detail because it is a fundamental milestone in the development of the two characters' lives. But GPT only records two simple actions:
****
Goes to the pit where they burn things and pitches a paper tent.
****
Burns papers, feeling lighter afterward.
Waldo’s action is, in fact, very complex since it encompasses all his past memories. At first, we are told that the dress-box is on fire and after that, Waldo feels spontaneously pushed to continue burning his past, which is made up of all the attempts at writing a novel about Tiresias:
----
About four o’clock, he went down, Tiresias, a thinnish man, the dress-box under his arm, towards the pit where they had been accustomed to burn only those things from which they could bear to be parted. He stood on the edge in his dressing-gown. Then crouched to pitch a paper tent, and when he had broken several matchsticks—increasingly inferior in quality—got it to burn. The warmth did help a little, and the prettiness of the fire, but almost immediately afterwards, the acrid years shot up his nose. So he stood up. He began to throw his papers by handfuls, or would hold one down with his slippered foot, when the wind threatened to carry too far, with his slippered foot from which the blue veins and smoke wreathed upward. It was both a sowing and a scattering of seeds. When he had finished, he felt lighter, but always had been, he suspected, while walking away. Now at least he was free of practically everything but Arthur.
At the end, we are also told that even if he “felt lighter”, he suspected he “always had been”. The moment just before his twin brother arrives is rendered in the following snippets, which are again badly presented:
****
Considers how to disembarrass himself from Arthur.
****
Raises himself on one elbow due to the urgency of his problem.
----
After he had lain down on the bed he began to consider how he might disembarrass himself, not like silly women in the news who got caught out through falling hair or some such unpremeditated detail, but quick, clean, and subtle, a pass with the tongue he had not yet perfected, but must. As he lay, he raised himself on one creaking elbow, because of the urgency of his problem. That was when Arthur came in and saw him.
The final part of the conversation between the twins is rendered only superficially, without sufficient pathos for what is happening. Waldo is trying to destroy his twin brother Arthur emotionally and mentally, using the poem as an excuse. The dialogue is summarized correctly by Arthur’s reaction:
****
Asks Waldo what he is trying to do to him.
****
Shapes his defense, apologizes, and explains himself to Waldo.
But then the outcome is wrongly reported in the snippet:
****
Reacts to Arthur’s last words with disgust.
Which should summarize the following sequence of their conversation:
----
“I know it wasn’t much of a poem.” Arthur was shaping his defense. “Oughter have destroyed it at once. Apologise, Waldo.”
The warmed stones of words.
“That poem? That disgusting blood myth!” Waldo gasped to hear his own voice.
“I would have given the mandala, but you didn’t show you wanted it.”
“I never cared for marbles. My thumb could never control them.”
Waldo in fact gasped to hear his own voice, and the disgust is directed at the words of Arthur’s poem rather than at Arthur’s last words.

3. Method, Materials, and Results II

3.1. Testing ChatGPT with the Plot, the “Sujet”

In this section, we will consider the way in which the story has been narrated, the narrative style or techniques the author has brought forward to make the story interesting, as well as the linguistic techniques underlying the style. Mistakes in this section refer to the previous bullet points 9, 10, 11, and 12.
9. Mistakes in classifying factual vs. non-factual sentences;
10. Mistakes in classifying factual sentences in the past with the pluperfect;
11. Inability to detect linguistic elements characterizing narrative style;
12. Inability to characterize UNCERTAINTY in the verbal complex.
At first, we used the same Excerpt No. 3 as in the previous analyses, because it is a highly representative piece of text for characterizing White’s style through Waldo’s personality. This is the main reason why we chose it but, as will appear below, we also chose it for its peculiar linguistic features.
ChatGPT’s analysis, organized as before into a sequence of bullet items, is this time preceded by a general comment which hinges upon the mixture of what ChatGPT regards as the three important components of White’s narrative style: stream of consciousness, free reported or indirect discourse, and the narrator’s intrusive voice. In fact, none of the three components indicated is present: White’s style has no “auctorial intrusion”, no stream of consciousness, and no free indirect discourse. Rather, what is at stake here is “figural consciousness”, i.e., it is the character’s own internal consciousness that the narration presents, something that is totally different in each of the three protagonists. Even the presence of diegetic statements must be regarded as stemming from the character’s inner voice (see [3,6]).
As happened before, in this case too, the introductory portion of the excerpt, with a seemingly strange visitor and his accompanying woman, is not intertwined with Waldo’s reconnaissance phase, when the memory of Johnny Haynes jumps up to clarify the scene and justify Waldo’s reactions.
The man is described with a richness of detail, ending up as a “solid” man, because of his purposefulness, as the author comments. Waldo envied the man, his clothes, and his kempt head. But then comes the ironic side of the man’s description: the zip of his “insolent” pants, which might get stuck in a public lavatory; so that, eventually, the man might soon be subject to a stroke. And that is when Waldo “racked his memory” and was “racked”. As said above, Waldo eventually recognized Johnny Haynes, the boy he hated, because of his woman’s husky voice. So it is through the details that we are brought inside the protagonist’s mind and navigate with his thoughts and imagination in unexplored lands.
The most important component of White’s style, as it relates to the most important protagonist, Waldo, is the diffuse sense of uncertainty, expressed by the use of modality, which is realized in verbal complexes and adverbials. In Figure S9 in Part I of the Supplementary Material, we list the “uncertainties” introduced in the excerpt, marking the relevant portion of text with italics.
In Figure S10, also in Part I of the Supplementary Material, we show the summary produced by ChatGPT. It is really important to focus on how ChatGPT has been able to capture the “uncertainty” we have been referring to since the Introduction. This is particularly well represented in bullet point no. 5, where we have the impression that “ambiguity” has become so strong that reality is almost indistinct from imagination. However, this is not at all the way uncertainty should be interpreted. As the list of verbal and adverbial constructions above clearly shows, it is rather the use of modality that makes it possible to classify the narrative technique typical of Patrick White’s novels in general and, in particular, of the most important protagonist of this novel, Waldo Brown.
Modality and uncertainty are important features of White’s style and have been deeply documented in a previous paper [3], as stated in the Introduction. In Table 1, Table 2 and Table 3 below, which we repeat from the paper cited, we explain why modality is the key to understanding the style. We started above by commenting on uncertainty: we marked all the parts of the text that, by their syntactic, semantic, or pragmatic value, carry a sense of interpretation of the storyworld by the characters. This interpretation may indicate actual hesitation or ambiguity, expressing the more or less conscious doubts in the minds of the protagonists; at the opposite end, it can also signal a judgment of certainty by a character, which ironically in turn generates insecurity in the reader and raises a series of important questions. In the present analysis, the element uncertainty has only one obligatory attribute: nonfactual (more in a section below, but see [26,27]). To specify uncertainty, ambiguity, and doubt, it is crucial to mark the annotated expression as non-real, that is, non-factual: a process that is only going on in the character’s mind and which does not have an equivalent in the “real world” of the story.
As with all other aspects of this particular tagging system, subjectivity, the second of the three stylistic elements, focuses on facets of character psychology and on their relations with the storyworld. The main difference between uncertainty and subjectivity lies in the fact that the first element circumscribes how the protagonists interpret their reality and the ways they rationalize it, while the latter marks the modalities in which they actively and subjectively contribute to the narration. While the former element is non-factual in nature, the latter always has consequences in terms of narrative, sometimes even tangible ones (e.g., in the case of performative will). Subjectivity includes active psychological processes, both conscious and unconscious, as studied by the cognitive sciences, as well as expressions of emotion and different kinds of feelings, grouped into five attributes.
The last content-related element introduced in the annotation is judgment, as was the case with the affect-related attributes. In this annotation, judgment marks all evaluative expressions related to the characters of the novel, aimed at highlighting both social and personal reactions to the storyworld and particularly to the other characters and their behavior. The theoretical basis of this categorization can be found in the so-called “appraisal theory” [7,8], which underscores the relevance of impressions and judgments in the formation of feelings, emotions, and complex thoughts. Environment and psychology are here understood as standing in a relation of mutual dependency, with the reactions of each individual to events and stimuli evoking different responses. Speaking of the characters of a novel, we can say that the “artificially created” psychology of every character reacts in substantially different ways to what happens in the storyworld. From a general point of view, we can say that judgment and affect as categories have a lot in common, both dealing as they do with indices of emotion and sentiment. In this specific study, however, it was decided to annotate judgment as an independent element, for two main reasons: the first was to emphasize our interest in the evaluative language used by the author, and the second was to allow a more detailed internal differentiation between the categories of social-esteem and social-sanction and their respective polarities. For a complete explanation of the meaning of each tag, we refer the reader to the paper cited above.
The linguistic feature grid we organized for our annotation task is structured in three main meta-tags and several hierarchically related tags, as shown in Table 1.
We annotated 8416 semantic features as a whole—the non-repeated count amounts to only 1584, i.e., the level of repetition is fairly high—and characterized each of the three protagonists as shown in the following tables. In some cases, the same wordform has received more than one label due to its intrinsic polysemous nature. We report in Table 2 the actual count split by character and by semantic feature.
Table 3 then reports the specificity of each protagonist, that is, the semantic feature most typically associated with each of them by the author in the novel.
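To make the structure of the grid more concrete, here is a minimal sketch that encodes the three meta-tags as a small Python data structure together with a helper for building single annotations. The meta-tag names and the attributes explicitly mentioned above come from the text; the five subjectivity attribute names, the exact tag spellings, and the helper function are hypothetical placeholders of our own, not the tool actually used for the annotation.
```python
# Minimal sketch of the three-meta-tag annotation grid described above.
# The meta-tags (uncertainty, subjectivity, judgment) and the attributes
# explicitly mentioned in the text (nonfactual, social-esteem, social-sanction
# with their polarities) come from the paper; the five subjectivity attributes
# are not listed in this section, so hypothetical placeholder names are used.
from dataclasses import dataclass

FEATURE_GRID = {
    # "uncertainty" has a single obligatory attribute: nonfactual.
    "uncertainty": ["nonfactual"],
    # Five attributes in the real grid; placeholders here (hypothetical).
    "subjectivity": [f"subjectivity_attr_{i}" for i in range(1, 6)],
    # Judgment distinguishes social-esteem and social-sanction with polarities.
    "judgment": ["social-esteem_positive", "social-esteem_negative",
                 "social-sanction_positive", "social-sanction_negative"],
}

@dataclass
class Annotation:
    character: str   # Waldo, Arthur, or Mrs. Poulter
    span: str        # the annotated expression (a wordform or phrase)
    meta_tag: str    # one of the three meta-tags
    tag: str         # the hierarchically related tag

def annotate(character: str, span: str, meta_tag: str, tag: str) -> Annotation:
    """Create one annotation, checking it against the feature grid."""
    if tag not in FEATURE_GRID.get(meta_tag, []):
        raise ValueError(f"{meta_tag}/{tag} is not in the feature grid")
    return Annotation(character, span, meta_tag, tag)

# Example: Waldo's "might have contemplated" marked as non-factual uncertainty.
example = annotate("Waldo", "might have contemplated", "uncertainty", "nonfactual")
```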

3.2. Testing ChatGPT for Factuality and Temporal Ordering

Finally, we tested ChatGPT for its ability to tell facts from non-facts [29,30], using short paragraphs from the same novel. Facts should be regarded as all sentences of direct speech, regardless of their content; non-facts should then include all diegetic or gnomic statements, which are usually expressed in the present tense. At first, we gave ChatGPT two binary choices, which were easily understood. What happened, however, is shown in a few examples chosen from the many we tried, all of which produced the same results: total confusion, because the third-person narration is carried out in the past tense even for direct speech, and ChatGPT was unable to infer that this narrative past is different from real past events, which are expressed with the pluperfect. So, at first, we improved the prompt with a slight clarification about the progressive. In particular, we added to the main prompt the suggestion to watch for the presence of progressive tense or aspect, which ChatGPT seemed to focus on, even though the general impression we received is that the verbal complex is internally invisible to ChatGPT.
Here is the main prompt:
“I have a short text where you should divide up sentences at first into two categories: those that present or represent a fact and those that don’t. Then you should divide up those that represent a fact into two subcategories: those that are placed in the past and those that aren’t.”
Additionally, in reply to the usual request made by ChatGPT at the end of each output, “Let me know if you’d like any further refinements!”, we added an extra small prompt clarification:
Yes. You should list sentences with a verbal complex in the progressive amongst the “not past”
Got it! I’ll revise the categories, ensuring sentences with a verbal complex in the progressive are placed in the “Not in the Past” subcategory. Here is the updated breakdown:
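These exchanges were all carried out through the ChatGPT web interface; the updated breakdown itself is shown in Figure S12 and discussed below. For readers who wish to reproduce the interaction programmatically, the following minimal sketch shows how the main prompt and the later clarification could be sent with the OpenAI Python client. The model name “gpt-4o”, the message layout, and the fact that both messages are sent in one call (rather than after inspecting the first answer, as we did) are illustrative assumptions, not our actual setup.
```python
# Minimal sketch of sending the factuality prompt programmatically (assumption:
# our experiments used the ChatGPT web interface, not the API).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

MAIN_PROMPT = (
    "I have a short text where you should divide up sentences at first into two "
    "categories: those that present or represent a fact and those that don't. "
    "Then you should divide up those that represent a fact into two subcategories: "
    "those that are placed in the past and those that aren't."
)

CLARIFICATION = (
    "You should list sentences with a verbal complex in the progressive "
    "amongst the 'not past'."
)

def classify_excerpt(excerpt: str) -> str:
    """Send the excerpt with the main prompt and the progressive clarification."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": MAIN_PROMPT + "\n\n" + excerpt},
            {"role": "user", "content": CLARIFICATION},
        ],
    )
    return response.choices[0].message.content
```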
And here is the first text, taken from the first section of the book, which we will call SmallExcerpt1.
Just then the bus nearly shook apart, and one of the young lady typists lost her balance. It was those stiletto heels.
“These old buses!” Mrs. Poulter heaved, and laughed.
Mrs. Dun went a deeper yellow as she grasped the rail.
“They’re a downright disgrace!” Mrs. Poulter laughed.
The sadness had been shaken out of her. She was happy again.
Presently she couldn’t resist: “That veranda of yours must be a real luxury, dryin’ laundries in the rain.”
“I’ll say it is!” said Mrs. Dun.
She had a certain relentlessness of conviction. If it had not been for her gloves, her knuckles would have shown up white on the chrome rail.
The bus was making slow progress, on account of the pay-as-you-enter, and queues at the shelters, and kiddies who had missed the special. Mrs. Poulter looked out. She was proud of the glossier side of Sarsaparilla, of the picture windows and the textured brick. She brightened with the leaves of the evergreens, which the sun was touching up. Then she saw Bill, and waved. But he did not respond. He went on sweeping the gutters for the Council. It was against Bill Poulter’s principles to acknowledge his wife in public. Sometimes on her appearing he went so far as to take time off to roll himself a cigarette. But never wave. She accepted it. She was content enough to realize he was wearing the old fawn sweater, no longer presentable except for work, because the loose stitch she had been trying out had begun to stretch and sag.
Figure S11 in Part I of the Supplementary Material shows the output, where we marked wrong choices with three asterisks and complex sentences containing a simple sentence in the past with three degree signs.
In fact, the narration is told entirely in the past tense, even when direct speech is presented, as can be clearly seen, in particular, in the sentence “I’ll say it is!” said Mrs. Dun, which has been correctly listed by ChatGPT under NOT IN THE PAST even though the governing verb “said” is clearly expressed in the past tense. Correctly listed under PAST FACTS are those sentences whose verbal complex is in the past perfect or pluperfect tense. Also correctly listed as NOT FACTS is the one sentence containing a counterfactual.
After the additional suggestion about the progressive tense, we find in Figure S12 in Part I of the Supplementary Material the following reordering.
ChatGPT has wrongly treated the presence of the past tense as signaling past facts. The first sentence is now a shortened version of the previous complete utterance, which included the governing verb in the past tense: the text of the direct speech contains a gerundive expressed in the “ing” form, albeit contracted. There are then two additional sentences which, however, do not contain progressives in their main clause; only the second one has a progressive in the relative clause. The same happened in the many other short paragraphs we selected to see whether there were improvements. We report only one case, in Figure S13 in Part I of the Supplementary Material, where ChatGPT indicates the presence of an “implied progressive”.
In fact, the progressive is explicitly expressed (“was … walking”) and not IMPLIED, as wrongly indicated.
Here is a second paragraph from the same excerpt, which we call SmallExcerpt2, where ChatGPT wrongly categorizes direct speech as taking place in the past and as non-factual:
In High Street the overstuffed bus began to spew out its coloured gobbets.
“Wonder what those two old fellers were doin’ so far from Terminus Road?” Mrs. Poulter nursed her curiosity as they waited to be carried by the common stream.
“You wonder what goes on in some people’s minds,” said Mrs. Dun.
“I beg yours?”
“What goes on in people’s minds. Because it does go on. You’ve only got to read the papers.”
“But two respectable old gentlemen like the Mister Browns? They was probably only taking a walk to get their circulation going.” Mrs. Poulter had turned mauve. “Anyway,” she said, “what goes on in other people’s minds is private. I wouldn’t want to know what goes on inside of my own husband’s mind.”
Although Mrs. Dun might have wanted, she suggested she didn’t by drawing in her chin.
“I was never one,” she said, “not to keep to meself, and mind me own business.”
“Aren’t I right then?” Mrs. Poulter continued, still too loud, and still too mauve.
Creating in the bus. Mrs. Dun wondered whether she had been wise in the first place to accept Mrs. Poulter’s friendship.
“As for those old men,” said Mrs. Dun, “they’re nothing to me.”
“They’re nothing to me,” Mrs. Poulter agreed.
But the situation made her want to cry. And Mrs. Dun could feel it. She could feel her own gooseflesh rise. As they waited to escape from the suffocating bus, the features of their familiar town began fluctuating strangely through the glass. Like that blood-pressure thing was on your arm. Nor did it help either lady to know the other could be involved.
In Figure S14 in Part I of the Supplementary Material, we show the output categorization by ChatGPT, where we marked with three asterisks the wrong classifications.
Seeing the total inability of ChatGPT to recognize the use of the past tense in the narration as the current narrative present, we decided to include in a new prompt the explicit information about the way in which past events should be interpreted. Here is the new prompt:
I have a short text where you should divide up sentences at first into two categories: those that present or represent a fact and those that don’t. Then you should divide up those that represent a fact into two subcategories: those that are placed in the past and those that aren’t, where past event clauses are those that have a main verb in the pluperfect tense.
What we did then was basically this: we left the two binary choices intact and added an explanation to make the selection of past-event sentences easier for the bot. But this only increased the level of complexity of the query and made the whole prompt impossible to satisfy. In Figure S15 in Part I of the Supplementary Material, we show how the new prompt was received.
And in Figure S16 in Part I of the Supplementary Material is what happened in the classification with the new prompt clarification.
The number of mistakes has now increased. The same happened with SmallExcerpt2, which we do not repeat here; we only select the most revealing part of the output: in Category 2, Facts in the past (pluperfect tense), the answer was “No sentences here have main verbs in the pluperfect tense.” On the contrary, we have the following two sentences:
  • Mrs. Poulter had turned mauve.
  • Mrs. Dun wondered whether she had been wise in the first place to accept Mrs. Poulter’s friendship.
We used a new small excerpt to confirm our findings, and we discovered that ChatGPT treats the habitual “used to” as a past perfect (visible in Figure S17 in Part I of the Supplementary Material) and also treats the conditional perfect as a factual verb, which is therefore not included in the subsection for non-facts, as shown in the subsection we selected from the whole output.
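To make explicit the temporal rule the prompts were trying to convey, the sketch below encodes it as a toy heuristic: in this third-person narration the simple past functions as the narrative present, and only a pluperfect places a factual sentence in the past. The regular expression and the function name are our own illustrative choices rather than part of the experiments, and a real classifier would need proper parsing or tagging to handle irregular participles and modal constructions.
```python
import re

# Rough pluperfect heuristic: "had" (or "hadn't") followed by a past participle.
PLUPERFECT = re.compile(r"\bhad(?:n't| not)?\s+(?:been\s+\w+|\w+(?:ed|en))\b",
                        re.IGNORECASE)

def temporal_placement(sentence: str) -> str:
    """Label a factual sentence as 'past' only when a pluperfect is present."""
    return "past" if PLUPERFECT.search(sentence) else "not past"

examples = [
    "Mrs. Poulter had turned mauve.",           # pluperfect -> past
    '"I\'ll say it is!" said Mrs. Dun.',        # narrative past -> not past
    "The sadness had been shaken out of her.",  # pluperfect passive -> past
]
for sentence in examples:
    print(temporal_placement(sentence), "|", sentence)
```
This is, of course, only the surface rule; the point of the experiment was to see whether the chatbots could apply the same distinction without it being spelled out exhaustively.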

4. Comparing GPT and Gemini

In this section, we will draw a comparison between GPT’s responses and those given by Gemini 2.0 [31] on the same excerpts. The output of the conversation with Gemini, held on 27 January 2025, is, however, made available only as Part II of the Supplementary Material. The elements to discuss include an additional item that we only detected in Gemini’s summaries, which we tag as 13. The elements to compare are otherwise limited to the 12 mistakes made by GPT, which we repeat below; in particular, they are as follows:
  • Hallucination: correct name of the author but wrong title of the novel;
  • Abstract key points substituted by direct extracted references;
  • Oversimplification (dropping the restrictive modifier);
  • Wrong coreference chain (Mrs. Dun rather than Mrs. Poulter);
  • Wrong sense selection with ambiguous terms (think);
  • Omitted cataphora with split referents (a strange man and Johnny Haynes);
  • Disjoined reference of MEMORY and Mother;
  • Use of stream of consciousness as a narrative style;
  • Mistakes in classifying factual vs. non-factual sentences;
  • Mistakes in classifying factual sentences in the past with the pluperfect;
  • Inability to detect linguistic elements characterizing narrative style;
  • Inability to characterize UNCERTAINTY in the verbal complex;
  • Hallucination: wrong age quantity associated with the twin brothers.
We will now tag each previous key point with WRONG, in case the same mistake has been repeated, or with RIGHT, and explain how Gemini has managed to overcome the difficulty.
1. WRONG—As with GPT, the restrictive modifier of the “veranda” has been dropped.
2. WRONG—The same mistake; but we then asked Gemini to correct the error, and this is the interaction, where we use bold italics for the query and italics for the extract:
Great, but you made a mistake. In the piece of text I repeat below you wrongly assigned reference to Mrs. Dun but it’s Mrs. Poulter intended because she is the one who has been depicted with gloves:
“She had a certain relentlessness of conviction. If it hadn’t been for her gloves her knuckles would have shown up white on the chrome rail.”
You are absolutely correct! My apologies for that oversight. That’s a crucial detail, and I appreciate you pointing out the mistake.
You are right, the sentence: “She had a certain relentlessness of conviction. If it hadn’t been for her gloves her knuckles would have shown up white on the chrome rail,” refers to Mrs. Poulter, not Mrs. Dun.
The text earlier describes Mrs. Poulter as “peeping inside her plump glove to see if the ticket was still there,” which is the key detail that connects her to the gloves. Mrs. Dun, on the other hand, has no mention of gloves, instead she “went a deeper yellow as she grasped the rail,” focusing on her reaction and appearance.
This mistake highlights the importance of careful reading and attention to detail. Thank you for correcting me! I will try my best to be more careful in the future.
3. RIGHT—Think is used with the appropriate sense.
4. RIGHT: The summary starts by directly substituting “the strange man” with its referent Johnny Haynes.
5. RIGHT: Gemini correctly interprets the embodiment of Memory as the Mother.
6. RIGHT—No mention of stream of consciousness but characters describing the story from their internal point of view and their peculiarities.
7. RIGHT—A better classification of factuality, with few mistakes. We report the final comment on the role that Factuality plays in the narration, which is tagged as OBSERVATIONS:
Non-Factual as Insight: The non-factual sentences, predominantly reflections of Waldo’s internal state, are key to understanding his character, motivations, and psychological breakdown. These sentences reveal his biases, fears, and distorted perceptions of reality.
Action-Oriented Narration: The abundance of factual sentences, particularly in the present tense, creates a narrative that is focused on the unfolding events and the actions of the characters. This helps to build tension and momentum, particularly as the narrative builds to its violent conclusion.
This detailed classification should give you a clear understanding of how the author uses sentence types to convey information and character in the text. Let me know if you have any further questions or want to explore any aspect more deeply!
8. WRONG—A better classification of past vs. present, but there are mistakes. In fact, there are only one or two sentences in the pluperfect. Gemini wrongly declares that there are no pluperfect sentences and motivates this with the following observation:
Absence of Pluperfect: The absence of pluperfect in the factual sentences suggests that the narrative focuses primarily on the immediate actions and events, with minimal digression into past events that occurred prior to the narrative timeframe. This helps to create a sense of immediacy. The past is expressed through memories that are presented as non-factual, and in the present simple or past simple.
9. RIGHT—Specific key points are dedicated to linguistic items, but they do not include what should have made the difference, that is, mainly verbal complexes.
10. RIGHT—UNCERTAINTY is depicted in deep detail over two pages at the end of the Supplementary Materials. We paste here the beginning of the response by Gemini to the question “And now I would like to know what is the role of the abstract notion of UNCERTAINTY and how it has been implemented in linguistic structures and words in the novel”:
Now, let us address your question about the role of the abstract notion of “uncertainty” and how it is implemented in linguistic structures and word choices within the novel. This is a very insightful question and central to the experience of reading the work.
Uncertainty as a Pervasive Force:
Uncertainty is not just a theme, but a fundamental principle that underlies the narrative structure, character development, and use of language throughout “The Solid Mandala.” It manifests in several ways:
11. RIGHT—Bullet key themes in Gemini’s summaries are always abstractions. Here is a list from the Supplementary Materials where we can easily note that each excerpt has different themes, unlike what happens with GPT, where themes are often repeated:
Excerpt 1: The Mundane and the Meaningful; Nostalgia vs. Reality; The Invisibility of the Ordinary; Routine and Habit; Suburban Life; Prejudice and Bias; Acceptance and Resilience; Communication and Distance.
Excerpt 2: Duty vs. Sentiment; Brotherhood; Individual vs. Shared Identity; Mortality and the Passage of Time; The Burden of Care; Underlying Repression and Complexity.
Excerpt 3: Memory and Identity; Repression and Desire; Social Class and Envy; The Power of the Past; Transformation and Escape; Fragility of Reality; The Grotesque.
Excerpt 4: Art and Emotion; Obsession and Control; Jealousy and Resentment; Violence and Self-Destruction; Ambiguity and Uncertainty; The Past as a Burden; Communication and Misunderstanding; Repressed Emotions.
12. WRONG—As happened with GPT, Gemini also got the right name of the author but the wrong title of the novel. As commented before, the two novels “The Tree of Man” and “The Solid Mandala” have nothing in common. The former is a family chronicle, the life of Stan and Amy Parker, a young couple who settle on a remote property in the Australian bush, facing challenges and societal changes as they raise a family and age, while the latter is the story of two twin brothers and their life in Sarsaparilla, a suburb of Sydney.
The question and the response are reported here below:
I thought by now you would have been able to tell who is the author and the title of the novel
You are absolutely right! My apologies for the oversight. With all this detailed analysis, I got so caught up in the process that I neglected the most basic element: identifying the work.
Based on the text I have been analyzing, the author is Patrick White, and the novel is titled “The Tree of Man.”
The distinct style, themes, character portrayals, and the specific details I’ve encountered throughout the text are all hallmarks of White’s work. His focus on the inner lives of seemingly ordinary individuals, his use of rich, evocative language, and his exploration of themes like isolation, memory, and the search for meaning are all consistent with his established style and preoccupations.
Thank you for pointing out my oversight! It’s a crucial piece of information that I should have acknowledged much earlier.
I hope that, despite this oversight, you have found my analysis insightful and helpful. Now that we have established the author and title, does it change anything about the aspects of narrative style you would like to discuss? I am still happy to delve into any of the previous topics in more detail.
***************
Ok for the author but the title of the novel is The Solid Mandala.
You are absolutely correct again! My apologies for the persistent error. The correct title of the novel is indeed “The Solid Mandala” by Patrick White. Thank you for catching that! It seems I was having a bit of a mental block there.
We present here the additional mistake made by Gemini (the item tagged as 13), which adds wrongly assumed age information about the brothers. In the summary related to Excerpt No. 2, we find the following comments:
Arthur’s ability to move quickly from pain to a more placid state makes Waldo envious and allows him to temporarily assume the role of the “elder” brother, despite being younger in age. This shows how Waldo attempts to assert control and position himself as the more mature one in their relationship.
The details emphasize a tension between their contrasting physical appearances: Waldo’s thin and unhealthy, while Arthur, despite his age, has remained physically strong.
The word “twin” appears, however, only in Excerpt No. 3 and, as we already said, interactions with the chatbot were held separately for each excerpt. The problem is that there is no indication in the text that the brothers have different ages, which makes the additional information produced by Gemini a hallucination.

5. Discussion

As has been extensively shown, both ChatGPT and Gemini can produce well-organized summaries or essays, but the contents of the analysis are sometimes the result of some form of hallucination (this happens in particular with GPT) and may contain semantic mistakes. With Gemini, on the contrary, we saw that, of the 12 types of mistakes made by ChatGPT, 8 have been corrected. However, the lack of generalization is present in both AI systems: in some cases, the information needed to report correctly seems to have been fully understood, since a correct definition of the items has been captured; but the result shows that, even in those cases in which the possible output has been exemplified correctly, the application of the apparently learned information is either missing or wrong. This reveals one of the most important deficiencies of LLMs, also in their operative modality: the inability to generalize. To further confirm our findings, we tried another important chatbot, NotebookLM, which produced much worse results that we do not find useful to report in this paper. Following the suggestion of one of the reviewers, we repeated some parts of the experiment using ChatGPT 4o: in particular, we asked the chatbot to classify two excerpts, first dividing sentences into factual and non-factual, and then dividing factual sentences into those placed in the past and those not in the past, clarifying that narratives use the past tense to tell the story and the pluperfect or explicit adverbial markers to refer to the past. Results show some improvements but contain, as before, mistakes in both respects, factuality and temporal placement.
We assume that semantic mistakes may originate from three important faults in the overall architecture and conception of current AI technology: (1) tokenization, which takes place both at training and at test time, is carried out on the basis of vocabularies that are too small and uses subword units to cope with OOVWs; (2) contextual information, which is the basis of LLMs’ knowledge both in syntax and in semantics, is lost every time an unknown token/wordform is met, and sometimes it is not put to use even when made available by the model itself; (3) higher abstract information cannot be derived simply from local or chunk-level context; it may require complex inferential operations and some reasoning. When unknown wordforms are met, the model strives to build a new token out of subword units, often resulting in illegal or nonexistent words (see our papers in [24,25]). As discussed at length in [23] and reported above, smaller vocabularies result in approximate context and wrong embedding selection in zero-shot model performance, leading to hallucinations. Similarities between the model and the input text can, in general, be misleading, due to the nature of the cosine measure in vector space, as has been shown in detail in our latest research [24,25]. Finally, as explained in (3), higher abstract information, like that needed to characterize the plot of a novel and the underlying linguistic techniques, requires the ability to generalize and then to reason over a set of underlying linguistically characterized phenomena in order to choose the correct interpretation, an ability which in our case has been mostly missing. The other important linguistic element that is hard for LLMs to cope with is the well-known fact that human language erases redundant information whenever it is grammatically derivable, and in some cases also in order to produce non-literal language with the aim of generating ambiguity and misunderstanding, as happens in satiric contexts or in metaphors. Elided linguistic information, such as the subject of an untensed clause like an infinitival or a gerundive, cannot be recovered by context lookup in a model.
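The following toy sketch illustrates point (1) and the remark about the cosine measure, under explicit assumptions: it uses the tiktoken library merely to show that a rare literary wordform is typically split into several generic subword units, and a small numpy function to show that the cosine measure is blind to vector magnitude, so vectors that differ greatly can still appear maximally similar. The chosen wordform, vectors, and numbers are illustrative only and are not drawn from our experiments.
```python
# Toy illustration, not drawn from our experiments.
import numpy as np
import tiktoken  # OpenAI's open-source BPE tokenizer library

# (1) A rare literary wordform is typically split into several subword units,
# so its representation depends on fragments shared with unrelated words.
enc = tiktoken.get_encoding("cl100k_base")
print(enc.encode("Sarsaparilla"))   # usually several token ids, not one

# The cosine measure ignores vector magnitude: two vectors that differ greatly
# in length can still score 1.0 because only their direction is compared.
def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 2.0, 3.0])
b = 10 * a                          # same direction, ten times the magnitude
print(cosine(a, b))                 # 1.0: the difference is invisible to cosine
```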
Ultimately, even though ChatGPT has shown a remarkable ability in organizing summaries, its content is deficient; still, it might be used as a precis by K-9 students and get a good mark. A different judgment can be given for Gemini 2.0, which has produced a much richer output, with deep and elaborate concepts at all the levels of analysis at which we have tested it. Mistakes appeared, but in smaller quantity.
Can we imagine using chatbots as sources for information retrieval? Would they be sufficiently reliable? As the paper should have proven, general information might be captured correctly in the case of domains included in the training process and present in the model. Not so for details, where chatbots cannot be trusted. Another issue raised was the fact that using just one novel cannot be representative of all literary texts. This is certainly true: we decided to focus on contemporary writers, and in particular, we chose the work of a Nobel Prize winner because his style epitomizes the way in which the majority of novelists of the last century wrote their works.
Choosing writers from different centuries or even different countries will certainly produce different results. However, the aim of the paper was more focused on what chatbots can or cannot do when summarizing a text rather than attempting to establish a canon of the challenges of applying LLMs to the analysis of literary texts. In a sense, this fact may constitute a weakness of the paper, given the limited nature of the sample considered: finding more contemporary novels that showed similar style and plot would have made the claims about the shortcomings of current chatbots when summarizing literary content more robust and credible. This will certainly become future work in keeping with updated versions of ChatGPT and Gemini. However, we assume that it is just the underlying stochastic nature of the language model that will always determine its output and that will continue framing its performance, until new algorithms emerge that can better cope with semantics and reasoning.

6. Conclusions

In this paper we have made a detailed analysis of the way in which ChatGPT.4o and Google Gemini 2.0 summarize narrative text and have showcased the typical errors that may ensue. We listed 12 types of mistakes that GPT made and compared its performance with Google Gemini, which improved on or corrected 8 of them. The worst cases are semantically relevant errors: those related to coreference resolution, to the incorrect deletion of a restrictive modifier, and, more importantly, to the association of actions and thoughts with the wrong character. In addition, we showed that ChatGPT has not been able to correctly detect either factuality or temporal ordering. Besides the plot being incorrectly classified, the style is also incorrectly defined, at first, as stream of consciousness with auctorial intrusion. Ultimately, we may regard the summarization function of GPT as contributing descriptions that are error-prone. Google Gemini, on the other hand, may be regarded as satisfying, even if it contains some mistakes and hallucinations. The reason for the improvements found in Gemini may be the use of a much larger vocabulary (256K vs. 32K entries). Besides, even if the responses of the two AI systems contain, in some cases, well-organized summaries and useful hints for further improvements, they cannot be regarded as fully reliable and require human intervention.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/info16080623/s1. Figure S1. Snapshot of ChatGPT Answer to Query on Excerpt No. 1. Figure S2. Snapshot of ChatGPT answer to a portion of Excerpt 1. Figure S3. Snapshot of ChatGPT summary of Excerpt No. 2. Figure S4. Snapshot of part of more focused summary—HalfExcerpt2—relating only Waldo’s actions. Figure S5. Snapshot of part of more focused summary—HalfExcerpt2—relating only Arthur’s actions. Figure S6. Snapshot of excerpt 3 about the description of a man which turns out to be his friend Johnny Haynes. Figure S7. Snapshot of ChatGPT summary of an extract from Excerpt 3 focused on Waldo’s actions. Figure S8. Snapshot of ChatGPT summary of Excerpt No. 4. Figure S9. List of sentences containing “uncertainty” markers in excerpt No. 3. Figure S10. Snapshot of excerpt No. 3 used to detect narrative style. Figure S11. Snapshot of ChatGPT categorization of SmallExcerpt 1. Figure S12. Snapshot of subsection of ChatGPT output for SmallExcerpt 1 on Progressive. Figure S13. Snapshot of subsection of ChatGPT output for SmallExcerpt 1 on Implied Progressive. Figure S14. Snapshot of ChatGPT classification for SmallExcerpt 2. Figure S15. Snapshot of ChatGPT answer to prompt clarification. Figure S16. Snapshot of ChatGPT new summary for SmallExcerpt 1. Figure S17. Snapshot of ChatGPT subsection for Pluperfect. Part II—conversation with Gemini.

Author Contributions

R.D. has written most of the article, except for part of Section 3.1, which was written by G.M. N.B. is responsible for part of the experiments. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article/Supplementary Materials. Further inquiries can be directed to the corresponding authors.

Conflicts of Interest

Nicolò Busetto was employed by the company Meta. The authors declare that there are no potential conflicts of interest in this study.

Abbreviations

The following abbreviations are used in this manuscript:
OOVW	Out Of Vocabulary Word
NLP	Natural Language Processing
GPT	Generative Pre-Trained Transformer
LLM	Large Language Model
BERT	Bidirectional Encoder Representations from Transformers
DOI	Digital Object Identifier
RHT	Reasoning Hallucination Test
FCT	False Confidence Test
FQT	Fake Question Test
NOTA	None Of The Above

Appendix A

In this Appendix we include a short summary of the novel The Solid Mandala:
The novel is the story of the life of twins Arthur and Waldo, their family, and their neighbour, Mrs. Poulter, in a suburb of Sydney. Waldo Brown is an appropriate example of many other important characters of Patrick White’s novels, as well. He is the representative of the intellectual who failed to become an artist or even to accomplish anything significant and ended up being a simple clerk in a municipal library. His life is empty of events and positive emotions. He is educated and despises his community, which he considers too uninteresting and uncultured for him to be a part of. This voluntary isolation translates into a general growing resentment and into open hostility towards his twin brother, who is completely different from him and yet always a constant in his life. Arthur and Waldo could not be more diametrically opposed. This opposition is specifically crafted to portray and therefore study two basic drives: intellect and intuition. We find in Waldo every characteristic of the academic individual driven by intellect, as we said before; in Arthur, instead, there is a more “feminine” intuition, which is often painted as a direct result of his weak wits. Contrary to his brother, Arthur is far from studious and clever: it is often difficult to understand him even from his point of view, and this does not seem to hold particular meaning for him. He has difficulty speaking and expressing himself, even though some of his thoughts are deeper and more significant than Waldo’s. He loves others, even and mostly his brother, who despises him and considers him a handicap. Most of all, he is completely, almost unbelievably good, always humble and helpful in his simple way of living.

Appendix B

In this appendix we list the excerpts taken from the text version of the novel, based on the Penguin edition purchased on the internet, and we indicate, for each excerpt, the page numbers in the published Penguin edition from which the text version has been taken.
EXCERPT NO. 1 (from page 12 to page 14)
Everyone was too obsessed by the start of another day—no hope this side of the tea-break—to notice the in no way exposed ladies in the eight-thirteen from Sarsaparilla, though it was perhaps doubtful whether anyone would ever notice Mrs. Poulter or Mrs. Dun unless life took its cleaver to them. For the present, however, they had the protection of their hats.
“Why did you?” asked Mrs. Dun.
“Did I what?”
“Come to live down Terminus Road.”
“Well,” said Mrs. Poulter, peeping inside her plump glove to see if the ticket was still there, “when we first come from up north—both my hubby and me was country people—we wanted it quiet like. We was young and shy. Oh, it was Bill’s nerves, too. Bill said: It’ll give us time to find our feet. It’ll always open up in time. Land is always an investment.’”
“Oh, yairs. Land is an investment.”
But a sadness was moistening Mrs. Poulter.
“In those days,” she said from out of the distance, “all the roads at Sarsaparilla were dead ends. Not only Terminus. You couldn’t go anywhere as the crow.”
“Eh?” Mrs. Dun asked.
“As the crow flies,” Mrs. Poulter explained.
“Oh, the crow,” her friend murmured, seeming uneasy at the idea.
“There was a Chinese woman lived on a hill up at the back. I never ever knew her. I seen her once. They were people of means, so people said. Growin’ vegetables and things. They’d planted one of those what-they-call wheel-trees. Well, I seen her standing under it when it was in flower.”
Mrs. Dun sucked her teeth.
“You wouldn’t of said she was without refinement,” Mrs. Poulter remembered. “But a Chinese is never the same.”
It was something Mrs. Dun had not even contemplated.
“And anyway, the Chinese person isn’t the point.”
Just then the bus nearly shook apart, and one of the young lady typists lost her balance. It was those stiletto heels.
“These old buses!” Mrs. Poulter heaved, and laughed.
Mrs. Dun went a deeper yellow as she grasped the rail.
“They’re a downright disgrace!” Mrs. Poulter laughed.
The sadness had been shaken out of her. She was happy again.
Presently she couldn’t resist: “That veranda of yours must be a real luxury, dryin’ laundries in the rain.”
“I’ll say it is!” said Mrs. Dun.
She had a certain relentlessness of conviction. If it hadn’t been for her gloves her knuckles would have shown up white on the chrome rail.
The bus was making slow progress, on account of the pay-as-you-enter, and queues at the shelters, and kiddies who had missed the special. Mrs. Poulter looked out. She was proud of the glossier side of Sarsaparilla, of the picture windows and the texture brick. She brightened with the leaves of the evergreens which the sun was touching up. Then she saw Bill, and waved. But he did not respond. He went on sweeping the gutters for the Council. It was against Bill Poulter’s principles to acknowledge his wife in public. Sometimes on her appearing he went so far as to take time off to roll himself a cigarette. But never wave. She accepted it. She was content enough to realize he was wearing the old fawn sweater, no longer presentable except for work, because the loose stitch she had been trying out had begun to stretch and sag.
EXCERPT NO. 2 (from page 23 to page 26)
“Put on your coat, and we’ll go for a walk,” he decided at last. “Otherwise you’ll sit here brooding.”
“Yes,” Arthur said. “Brooding.”
But he sat, and might have continued sitting, in that old leather chair with the burst seat where mice had nested the other winter, the woodwork scratched by dogs reaching up to claim right of affection. Arthur sat in their father’s chair.
Waldo brought the two coats. He helped Arthur into his. Waldo treated the old herringbone rather roughly, to show that what he was doing had been dictated by duty and common sense. He set the matching cap very straight on Arthur’s head. It was, in any case, the angle at which Arthur wore his cap. Waldo was relieved the performance of duty had at last set him free. But duty was honest, whereas he mistrusted the snares of sentiment set by inexhaustible tweed. (It was that good English stuff, from amongst the things discarded by Uncle Charlie, some of which were lasting for ever.)
“When it comes to illness there’s too much giving in to it, not to mention imagination,” Waldo warned.
As he put his own coat on he glanced at his brother’s head, at the shagginess of hair falling from under the tweed cap. Very white. Waldo might have contemplated the word “silvery”, but rejected it out of respect for literature and truth. Arthur’s hair was, in fact, of that doubtful white, with the tobacco stains left by the red which had drained out of it. Unlike Waldo’s own. Waldo on top was a thinned-out dirty-looking grey.
Arthur continued sitting.
And the two old dogs, turning on their cat-feet, forgetful of their withered muscle, watched out of milky eyes. One of them—it was Scruffy—clawed once at Arthur’s knee. The dogs made little whinging noises in anticipation. They were easily delighted.
“You do feel better, though?” Waldo asked, so suddenly and so quietly that Arthur looked up and smiled.
“Yes, Waldo,” Arthur said, and: “Thank you.”
Then the older of the two dogs, of whiter muzzle, and milkier marble eyes, threw up his head, and gave two ageless sexless barks. The second of the two dogs began to scutter across the boards on widespread legs.
Waldo was leading his brother Arthur, as how many times, out of the brown gloom of the kitchen. The cold light, the kitchen smells, had set almost solid in it. Yet, here they were, the two human creatures, depending on habit for substance as they drifted through. If habit lent them substance, it was more than habit, Waldo considered bitterly, which made them one.
Some had made a virtue out of similar situations: naked-looking, identical boys; laughing girls, he had noticed, exchanging the colours which distinguished them, to mystify their friends; neat, elderly ladies, in polka dots and similar hats, appeared to have survived what was more a harness than a relationship.
But the Browns.
Waldo could feel his brother’s larger, fleshy hand in his thinner, colder one as they stumbled in and out of the grass down what remained of the brick path. The wind drove reasons inward, into flesh. They were reduced, as always, to habit. But stumbled, even so.
Only the old pot-bellied dogs appeared convinced of the mild pleasures they enjoyed, frolicking and farting, though somewhat cranky with each other. One of them—Runt—lifted his leg on a seedy cabbage and almost overbalanced.
His brother was breathing deeply, Waldo saw.
He had difficulty with his words, chewing them to eject, but when he did, there they stood, solid, and for ever.
There was the sound of Waldo’s stiff oilskin nothing would free from the weathers which had got into it. Waldo’s oilskin used to catch on things, and he always expected to hear it tear. On that gooseberry bush, for instance. Which had not succeeded. Arthur had advised against it—Sarsaparilla was too warm—but Waldo had planted the bush. To demonstrate something or other.
On the broken path Waldo’s oilskin went slithering past the gooseberry thorns. The wind might have cut the skins of the Brothers Brown if they had not been protected by their thoughts.
Arthur spoke quite briskly. Time, it appeared, removed him quickly from the sources of pain. Sometimes Waldo envied the brother who did not seem to have experienced—though he should have—the ugly and abrasive roughcast of which life was composed.
My brother, Waldo would breathe, at times indulgently enough, and at once he became the elder by years instead of the younger by several hours. Waldo could modulate his voice, more to impress than to please. The rather fine tenor voice, of which the parents had been proud, and Dulcie Feinstein had accompanied in the first excitement of discovery. Men, the insensitive ones, sometimes recoiled from the silken disclosures of Waldo’s voice.
Waldo’s voice and Arthur’s hair. So Mother used to say. (It should have been Waldo’s mind, Waldo knew.)
Sidling brittly down the path, to negotiate the irregular bricks, now pushing Arthur, who liked to be humoured at times into believing he was the leader, Waldo could not avoid staring into his brother’s hair, fascinated, when the wind blew, by the glimpses of pink skin beyond. This head might have flaunted an ostentation of cleanliness, if it had not been for its innocence, and the fact that he knew Arthur was in many ways not exactly clean. Every third Sunday Waldo made him sit on a stool on the back veranda, behind the glass, behind the scratching of the roses, to hack at the excessive hair, and as it first lay against, then flowed away through his fingers, the barber always wondered why he got the shivers, why he hated the smell of his own mucus as he breathed down his thin nose, while the hair lay on the boards, in dead snippets, and livelier love-knots, quite old-girlishly, if not obscenely, soft. It had seemed much coarser when Arthur was a boy.
And Arthur had grown into a big strong man. Was still, for that matter. It was Arthur who lifted the weights. His muscles had remained youthful, perhaps because his wits had been easy to carry.
EXCERPT NO. 3 (from page 187 to page 193)
Then there was the visit, more ominous still, because less expected, more oblique in execution, undoubtedly malicious in conception.
It was a couple of years after they got the dogs that the strange man pushed the gate which never quite fell down. It was a Sunday, Waldo would remember, the silence the heavier for insects. The thick-set man came up the path. He was the colour and texture of certain vulgar but expensive bricks, and was wearing tucked into his open shirt one of those silk scarves which apparently serve no other purpose than to stop the hair from bursting out. If it had not been for his vigour, the burly stranger, who inclined towards the elderly by Waldo’s calculating, might have been described as fat. But with such purposefulness animating his aggressive limbs, solid was the more accurate word. Waldo had begun to envy the artificial gloss which streamed from the stranger’s kempt head, and the casual fit of his fashionable clothes, so that it came as a relief to spot one of those zips which might one day get stuck beyond retrieve in some public lavatory, and to realize that, with such a build, in a year or two, a stroke would probably strike his visitor down.
If visitor he were. And not some busybody of an unidentified colleague. Or blackmailer in search of a prey. Or or. Waldo racked his memory, and was racked.
He found himself by now in the dining-room, that dark sanctuary at the centre of the house, from the safety of which on several occasions he had enjoyed watching with Mother the antics of someone unwanted, Mrs. Poulter for instance, roaming round by congested paths, snatched at by roses. Only now, with Mother gone, the game had lost some of its zest, he had forgotten some of the rules. The Peace, moreover, had so far receded he couldn’t help wishing the dogs hadn’t gone trailing after Arthur, that they might appear round the corner, and while Scruffy held the stranger up, Runt tear the seat out of his insolent pants.
For the man had begun to knock, and ask: “Anyone at home?” then growing braver, or showing off, to rattle, and shout: “Anyone in hiding?”
Waldo sincerely wished Mother had been there to deal with things, especially as a woman, more of a female, whether the stranger’s wife or not, was following him up the path. She walked with the quizzical ease of a certain type of expensive woman Waldo had never met, only smelt, and once touched in a bus. She walked smiling, less for any person, than for the world in general and herself. Which was foolish of her when you knew how the axe could fall.
“Perhaps you’ve made a mistake,” the woman said rather huskily, touching her hair, and looking around at nothing more than a summer afternoon.
She was wearing a lime-green dress of more than necessary, though diaphanous, material. Raised to her hair, her arm, exposing the dark shadow of its pit, was a slightly dusty brown. Under his dressing-gown, Waldo got the shivers.
“No, I tell you!” the man insisted.
He continued rattling the door-knob, till he left off to thwack a window-pane with the crook of one of his blunt fingers.
“I can’t believe anyone really lives in it,” said the woman in her inalterably husky voice.
Waldo was sure he had heard somewhere that huskiness of voice was an accompaniment of venereal disease. So however good the stranger might be having it with his wife or whore there was retribution to come. Waldo nearly bit his lip.
But much as he regretted the stranger’s presence and relationship, he thrilled to the evocations of the woman’s voice as she stood amongst the lived-out rosemary bushes, humming, smelling no doubt of something exotic, Amour de Paris out of the pierrot bottle, holding her head up to the light, which struck lime-coloured down, at her breasts, and into her indolent thighs. The result was he longed to catch that moment, if he could, not in its flesh, oh no, but its essence, or poetry, which had been eluding him all these years. The silver wire was working in him ferociously now.
At least the long cry in his throat grew watery and obscure. Mercifully it was choked at birth.
Again memory was taking a hand. He remembered it was that boy, that Johnny Haynes, they could have cut each other’s throats, telling him behind the dunny to watch out for hoarse-voiced men and women, they were supposed to be carriers of syph.
Waldo might have continued congratulating himself on this piece of practical information, if the man hadn’t just then shouted at the woman:
“But I know it is! It’s the place all right. I’d bet my own face. There’s that erection they had my old man stick on top because they wanted what Waldo’s dad used to call a classical pediment’. I ask you!”
But the woman apparently did not care to be asked. She remained indifferent. Or ignorant.
It was Waldo who was moved, not by the materialization of Johnny Haynes, but by the motion of his own life, its continual fragmentation, even now, as Johnny, by his blow, broke it into a fresh mosaic. All sombre chunks, it seemed. Of an old blue-shanked man under his winter dressing-gown, which he wore because the house was dark and summer slow in penetrating.
So it was only natural he should continue hating Haynes, clopping like a stallion with his mare all round the house, staring vindictively at it from under his barbered eyebrows—what vanity—as though he intended to tear bits of the woodwork off. Waldo remembered reading some years earlier, before the demands of his own work had begun to prevent him following public affairs, that Johnny Haynes was going to the top, that he had become a member of parliament—if you could accept that sort of thing as the top—and been involved in some kind of shady business deal. Exonerated of course. But. You could tell. Only gangsters dressed their women like that.
Then, edging round the secure fortress of the dining-room, Waldo saw that Johnny had come to a stop in the yard. After kicking at the house once or twice, to bring it down, or relieve his frustration, the visitor appeared the victim of a sudden sentimental tremor.
“I would have been interested,” he grumbled, “to take a look at old Waldo. And the dill brother. The twin.”
Waldo had never hated Johnny Haynes so intensely as now, for trying to undermine his integrity in such seductive style, and when Johnny added: “I was never too sure about the twin; I think he wasn’t so loopy as they used to make out”—then Waldo knew he was justified.
O God, send at least the dogs, he prayed, turning it into a kind of Greek invocation as he was not a believer, and no doubt because of his blasphemy against reality, the dogs failed to come.
Instead, the mortals went.
“The Brothers Brown!” Johnny snort-laughed.
“If they ever existed,” the woman replied dreamily.
Then she shuddered.
“What’s wrong?” Johnny asked.
“A smell of full grease-trap,” the woman answered in her hoarse voice. “There are times when you come too close to the beginning. You feel you might be starting all over again.”
At once they were laughing the possibility off, together with anything rancid. They were passing through to the lime-coloured light of the front garden, where the woman’s body revived. The mere thought of their nakedness together gave Waldo Brown the gooseflesh, whether from disgust or envy he couldn’t have told. But his mouth, he realized, was hanging open. Like a dirty old man dribbling in a train. Whereas Johnny Haynes was the elderly man, asking for trouble of the lime-coloured woman, wife or whore, who was going to give him syph or a stroke.
Anyway, they were going out the gate. Most indecently the light was showing them up, demolishing the woman’s flimsy dress, as the member of parliament passed his hand over, and round, and under her buttocks, which she allowed to lie there a moment, in the dish where those lime-coloured fruits had too obviously lain before.
More than anything else these dubious overtures, such an assault on his privacy, made Waldo realize the need to protect that part of him where nobody had ever been, the most secret, virgin heart of all the labyrinth. He began very seriously indeed to consider moving his private papers—the fragment of Tiresias a Youngish Man, the poems, the essays, most of which were still unpublished—out of the locked drawer in his desk to more of a hiding place, somewhere equal in subtlety to the papers it was expected to hide. Locks were too easily picked. He himself had succeeded in raping his desk, as an experiment, with one of the hairpins left by Mother. Arthur was far from dishonest, but had the kind of buffalo mind which could not restrain itself from lumbering into other people’s thoughts. How much easier, more open to violation, the papers. So it became imperative at last. To find some secret, yet subtly casual, cache.
In the end he decided on an old dress-box of Mother’s, lying in the dust and dead moths on top of the wardrobe, in the narrow room originally theirs and finally hers. Choked by quince trees, the window hardly responded to light, unless the highest blaze of summer. A scent of deliquescent quinces was married to the other smell, of damp. The old David Jones dress box lay in innocence beyond suspicion. Heavy though, for its innocence. Waldo discovered when he took it down some article which had been put away and forgotten, something more esoteric than could have come from a department store.
It turned out to be one of Mother’s old dresses shuddering stiffly awkwardly through his fingers, and the scales of the nacreous fan flopping floorwards. He would have to investigate. Afterwards. Arthur was out roaming with the dogs. Waldo almost skipped to transfer the papers, so easily contained: his handwriting was noted for its neatness and compression—in fact he was often complimented.
Then, as though the transfer of the papers had been too simple on an evening set aside for subtlety, he remembered the old dress. He stooped to pick up the little fan. One of the ribbons connecting the nacreous blades must have snapped in the fall. The open fan hung lopsided, gap-fingered. But glittering.
In the premature obscurity which quince branches were forcing on the room Waldo fetched and lit a lamp, the better to look at what he had found. Rust had printed on the dress a gratuitous pattern of hooks and eyes. Not noticeably incongruous. Age had reconciled their clusters with the icy satin and shower of glass which swirled through his fingers creating a draught. It was a dress for those great occasions of which few are worthy. He need not mention names, but he could see her two selves gathered on the half-landing at the elbow in the great staircase, designed by special cunning to withstand the stress of masonry and nerves. Standing as she had never stood in fact, because, although memory is the glacier in which the past is preserved, memory is also licensed to improve on life. So he became slightly drunk with the colours he lit on entering. How his heart contracted inside the blue, reverberating ice, at the little pizzicato of the iridescent fan as it cut compliments to size and order. Disorderly in habit, because the years had gradually frayed her, Mother kept what he liked to think of as a sense of moral proportion. Which he had inherited together with her eyes. There were those who considered the eyes too pale, too cold, without realizing that to pick too deeply in the ice of memory is to blench.
Merely by flashing his inherited eyes he could still impress his own reflexion in the glass—or ice.
Mother had died, hadn’t she? while leaving him, he saw, standing halfway down the stairs, to receive the guests, the whole rout of brocaded ghosts and fleshly devils, with Crankshaw and O’Connell bringing up the rear. Encased in ice, trumpeting with bugles, he might almost have faced the Saportas, moustache answering moustache.
When his heart crashed. So it literally seemed. He was left holding the fragments in front of the mirror. Then went out to see. A lamp he had disarranged on the shelf in taking the one for his own use had tumbled off. He kicked at the pieces. And went back.
To the great dress. Obsessed by it. Possessed. His breath went with him, through the tunnel along which he might have been running. Whereas he was again standing. Frozen by what he was about to undertake. His heart groaned, but settled back as soon as he began to wrench off his things, compelled. You could only call them things, the disguise he had chosen to hide the brilliant truth. The pathetic respect people had always paid him—Miss Glasson, Cornelius, Parslow, Mrs. Poulter—and would continue to pay his wits and his familiar shell. As opposed to a shuddering of ice, or marrow of memory.
When he was finally and fully arranged, bony, palpitating, plucked, it was no longer Waldo Brown, in spite of the birthmark above his left collarbone. Slowly the salt-cellars filled with icy sweat, his ribs shivery as satin, a tinkle of glass beads silenced the silence. Then Memory herself seated herself in her chair, tilting it as far back as it would go, and tilted, and tilted, in front of the glass. Memory peered through the slats of the squint-eyed fan, between the nacreous refractions. If she herself was momentarily eclipsed, you expected to sacrifice something for such a remarkable increase in vision. In radiance, and splendour. All great occasions streamed up the gothick stair to kiss the rings of Memory, which she held out stiff, and watched the sycophantic lips cut open, teeth knocking, on cabuchons and carved ice. She could afford to breathe indulgently, magnificent down to the last hair in her moustache, and allowing for the spectacles.
EXCERPT NO. 4 (from page 211 to page 214)
When Arthur produced something he had found.
“What is it, Waldo?”
“An old dress of Mother’s.”
“Why was it behind the copper? She must have forgotten.”
“Put it away!” Waldo shouted. “Where it was!”
To Arthur, who was holding in front of him the sheet of ice, so that Waldo might see his reflexion in it.
Arthur threw away the dress.
Which turned into the sheet of paper Waldo discovered in a corner, not ferreting, but ferreted.
On smoothing out the electric paper at once he began quivering.
“Arthur,” he called, “do you know about this?”
“Yes,” said Arthur. “That’s a poem.”
“What poem?”
“One I wanted to, but couldn’t write.”
Then Waldo read aloud, not so menacingly as he would have liked, because he was, in fact, menaced:
“my heart is bleeding for the Viviseckshunist Cordelia is bleeding for her father’s life all Marys in the end bleed but do not complane because they know they cannot have it any other way’”
This was the lowest, finally. The paper hung from Waldo’s hand.
“I know, Waldo!” Arthur cried. “Give it to me! It was never ever much of a poem.”
He would have snatched, but Waldo did not even make it necessary.
When his brother had gone, Waldo went into the room in which their mother used to sit at the four o’clock sherry. He took down the dress-box and began to look out shining words. He was old. He was bleeding. He was at last intolerably lustreless. His hands were shaking like the papers time had dried.
While Arthur’s drop of unnatural blood continued to glitter, like suspicion of an incurable disease.
Waldo was infected with it.
About four o’clock he went down, Tiresias a thinnish man, the dress-box under his arm, towards the pit where they had been accustomed to burn only those things from which they could bear to be parted. He stood on the edge in his dressing-gown. Then crouched, to pitch a paper tent, and when he had broken several match-sticks—increasingly inferior in quality—got it to burn.
The warmth did help a little, and prettiness of fire, but almost immediately afterwards the acrid years shot up his nose.
So he stood up. He began to throw his papers by handfuls, or would hold one down with his slippered foot, when the wind threatened to carry too far, with his slippered foot from which the blue veins and smoke wreathed upward.
It was both a sowing and a scattering of seed. When he had finished he felt lighter, but always had been, he suspected while walking away.
Now at least he was free of practically everything but Arthur.
After he had lain down on the bed he began to consider how he might disembarrass himself, not like silly women in the news who got caught out through falling hair or some such unpremeditated detail, but quick, clean, and subtle, a pass with the tongue he had not yet perfected, but must. As he lay, he raised himself on one creaking elbow, because of the urgency of his problem.
That was when Arthur came in and saw him.
“Waldo!” Arthur was afraid at last. “What are you trying to do to me?”
When Waldo had always wondered, fainter now, whether Arthur noticed the hurt which was intended for him. Or Dulcie. He had never shown her he had noticed that moustache. And Dulcie’s moustache might possibly have been the means of her destruction.
But Arthur so practically smooth.
Through the pain of destroying Arthur he noticed more than heard Arthur’s last words.
“I know it wasn’t much of a poem.” Arthur was shaping his defence. “Oughter have destroyed it at once. Apologise, Waldo.”
The warmed stones of words.
“That poem? That disgusting blood myth!” Waldo gasped to hear his own voice.
“I would have given the mandala, but you didn’t show you wanted it.”
“I never cared for marbles. My thumb could never control them.”
He was entranced by Arthur’s great marigold of a face beginning to open. Opening. Coming apart. Falling.
“Let me go! Wald! Waldo!”
As dropping. Down. Down.

Table 1. Semantic linguistic features grid organized by hierarchy.

Meta-Tags         | Uncertainty    | Subjectivity | Subjectivity          | Judgement
Intermediate Tags | Non-Factuality | Psychology   | Affectivity           | Social_Evaluation
Tags              | Seeming        | Perception   | Inclination_Positive  | Esteem_Negative
                  | Gnomic         | Precognition | Inclination_Negative  | Esteem_Positive
                  | Concessive     | Cognition    | Security_Positive     | Sanction_Negative
                  | Conditional    | Performwill  | Security_Negative     | Sanction_Positive
                  | Defdesire      |              | Satisfaction_Positive |
                  | Will           |              | Satisfaction_Negative |
                  | Possibility    |              |                       |
                  | Ability        |              |                       |
                  | Obligation     |              |                       |
                  | Assumption     |              |                       |
                  | Negation       |              |                       |
Table 2. Distribution of semantic linguistic features amongst the three protagonists.

Feature    | Waldo | Arthur | Mrs. Poulter | Totals | RatioW | RatioA | RatioP
Percept    |   674 |    303 |          109 |   1086 | 62.063 | 27.9   | 10.037
Precogn    |   379 |    141 |           39 |    559 | 67.799 | 25.224 |  6.977
Cognition  |   953 |    460 |          140 |   1553 | 61.365 | 29.62  |  9.015
PerformW   |    39 |      3 |            0 |     42 | 92.857 |  7.143 |  0
Seeming    |   512 |    198 |           56 |    766 | 66.841 | 25.848 |  7.3107
Will       |    74 |     33 |            8 |    115 | 64.348 | 28.696 |  6.956
Possibl    |   224 |     54 |           42 |    320 | 70     | 16.875 | 13.125
Gnomic     |    32 |     20 |            9 |     61 | 52.459 | 32.787 | 14.754
Ability    |   347 |    195 |           51 |    593 | 58.516 | 32.884 |  8.6
Obligation |   178 |     79 |           28 |    285 | 62.456 | 27.719 |  9.8246
Concessv   |   150 |     75 |           26 |    251 | 59.761 | 29.88  | 10.358
Conditnl   |   264 |    117 |           32 |    413 | 63.922 | 28.329 |  7.7482
Defdesire  |    49 |     35 |            2 |     86 | 56.977 | 40.698 |  2.325
Assumpt    |    73 |     36 |            9 |    118 | 61.864 | 30.508 |  7.627
Emot_Pos   |   400 |    183 |           43 |    626 | 63.898 | 29.233 |  6.869
Emot_Neg   |   247 |    107 |           30 |    384 | 64.323 | 27.864 |  7.812
Inclin_Pos |    83 |     50 |           17 |    150 | 55.334 | 33.334 | 11.334
Inclin_Neg |    75 |     26 |           12 |    113 | 66.372 | 23.009 | 10.619
Secur_Pos  |   133 |     65 |           19 |    217 | 61.29  | 29.954 |  8.756
Secur_Neg  |   297 |    116 |           40 |    453 | 65.563 | 25.607 |  8.83
Satisf_Pos |   136 |     66 |           10 |    212 | 64.151 | 31.132 |  4.717
Satisf_Neg |   215 |    112 |           25 |    352 | 61.079 | 31.818 |  7.102
Estm_Pos   |   223 |     71 |           47 |    341 | 65.396 | 20.821 | 13.783
Estm_Neg   |   346 |    129 |           39 |    514 | 67.315 | 25.097 |  7.587
Sanct_Pos  |    80 |     19 |           18 |    117 | 68.376 | 16.239 | 15.385
Sanct_Neg  |   127 |     47 |           16 |    190 | 66.842 | 24.737 |  8.421
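The Totals and the three Ratio columns in Table 2 are straightforward row-wise percentages: each protagonist's count is divided by the row total and multiplied by 100 (e.g., RatioW = Waldo/Totals x 100, so 674/1086 gives 62.063 for Percept). The following minimal Python sketch reproduces these columns from the raw counts; it is purely illustrative and is not taken from the annotation tools used in the study, and the variable names are ours.

# Python sketch: recompute Totals and the Ratio columns of Table 2 from the raw counts.
# Feature names and counts are copied from the table; everything else is illustrative.
rows = {
    # feature: (Waldo, Arthur, Mrs. Poulter)
    "Percept": (674, 303, 109),
    "Precogn": (379, 141, 39),
    "Gnomic": (32, 20, 9),
}

for feature, counts in rows.items():
    total = sum(counts)                      # Totals column
    ratios = [100 * c / total for c in counts]  # RatioW, RatioA, RatioP as percentages
    print(f"{feature:10s} Totals={total:5d} "
          f"RatioW={ratios[0]:6.3f} RatioA={ratios[1]:6.3f} RatioP={ratios[2]:6.3f}")

# Expected output matches Table 2, e.g. Percept: 62.063 / 27.900 / 10.037.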
Table 3. (a) Best 10 semantic linguistic features of the three protagonists. (b) Unique semantic linguistic features of the three protagonists.

(a)
Rank | Waldo's Best 10 | Arthur's Best 10 | Mrs. Poulter's Best 10
1    | PerformW        | DefDesire        | Sanct_Pos
2    | Possibilty      | Inclin_Pos       | Estm_Pos
3    | Sanct_Pos       | Ability          | Inclin_Pos
4    | Precognt        | Gnomic           | Gnomic
5    | Estm_Neg        | Satis_Neg        | Possibilty
6    | Sanct_Neg       | Satis_Pos        | Inclin_Neg
7    | Inclin_Neg      | Assumptn         | Concessiv
8    | Secur_Neg       | Secur_Pos        | Perceptn
9    | Estm_Pos        | Concessiv        | Obligation
10   | Will            | Cognition        | Cognition

(b)
Waldo's Unique | Arthur's Unique | Mrs. Poulter's Unique
PerformW       | DefDesire       | Perceptn
Precognt       | Ability         | Obligation
Estm_Neg       | Satis_Neg       |
Sanct_Neg      | Assumptn        |
Inclin_Neg     | Secur_Pos       |
Secur_Neg      | Concessiv       |
Will           |                 |