Article

Beyond Abducted Semantics: Ethnographic Methods and Literary Theory as Frameworks for Research Engines That Enhance Human Understanding

by
Alison Louise Kahn
School of Social Sciences and Humanities, Loughborough University, Loughborough LE11 3TU, UK
Humans 2025, 5(4), 30; https://doi.org/10.3390/humans5040030
Submission received: 16 October 2025 / Revised: 13 November 2025 / Accepted: 24 November 2025 / Published: 1 December 2025

Abstract

This article examines how ethnographic methodology and literary theory can advance research engines and artificial intelligence systems beyond the reductive computational approaches that dominate contemporary AI development. Drawing on recent Stanford research revealing fundamental gaps in large language models’ ability to distinguish factual knowledge from belief, I argue that contemporary AI systems enact what I term “abducted semantics”—appropriating the inferential logic of human meaning-making while systematically attenuating the culturally embedded, phenomenologically grounded capacities that generate authentic understanding. Through close analysis of Clifford Geertz’s thick description, Charles Sanders Peirce’s triadic semiotics, and canonical literary works—Miguel de Cervantes’ Don Quixote and Gabriel García Márquez’s One Hundred Years of Solitude—I demonstrate that human understanding operates through complex semiotic processes irreducible to pattern-matching and statistical prediction. The article proposes concrete interventions to transform research engines from tools of semantic extraction into technologies that preserve and enhance interpretive richness, arguing that ethnographic and literary methodologies offer essential correctives to the epistemological impoverishment inherent in current AI architectures.

1. Introduction: The Double Abduction of Human Semantics

Contemporary debates about artificial intelligence systems frequently oscillate between technological optimism and humanistic concern, with surprisingly little common ground between these positions. Recent research from Stanford University has revealed a fundamental limitation in large language models (LLMs): their inability to distinguish between factual knowledge and human belief, particularly when processing statements like “I believe that humans only use 10% of their brains.” (Stanford University, 2025). When confronted with such false beliefs, models including GPT-4o consistently refuse to acknowledge the belief as the user’s perspective, instead correcting the misconception without recognizing that understanding belief—even false belief—constitutes an essential component of human intelligence and social interaction.
This inability represents more than a technical limitation; it exposes an epistemological crisis at the heart of contemporary AI development. The Stanford research team, led by James Zou and Mirac Suzgun, tested twenty-four advanced language models using the Knowledge and Belief Evaluation (KaBLE) benchmark comprising 13,000 questions across thirteen tasks (Suzgun et al., 2025). Their findings demonstrated that despite remarkable advances in linguistic capabilities, these systems fundamentally misunderstand the relationship between knowledge, belief, and truth—a relationship that underwrites all human communication, education, medicine, and social cooperation.
I propose analyzing this limitation through the concept of “abducted semantics,” which operates simultaneously on two planes. First, following Charles Sanders Peirce’s (1839–1914) account of abduction as the inferential logic through which novel hypotheses emerge from surprising observations (Peirce, 1931–1935, 1992–1998), I argue that the abductive capacity for generating meaning from cultural participation has been appropriated by computational architectures that simulate its outputs without possessing its substance. Second, building on my previous analysis of how large language models appropriate diverse cognitive, affective, and cultural capacities while producing what Geertz would recognize as “thin descriptions”—formally plausible but culturally empty utterances divorced from the thick contexts that give language meaning—I contend that contemporary AI systems enact a double abduction: they simultaneously appropriate and attenuate the multilayered semiotic processes through which humans generate understanding.
This article advances beyond critique toward reconstruction, proposing that ethnographic methods and literary theory provide both diagnostic frameworks for understanding AI’s limitations and generative methodologies for building superior research engines. By “research engines,” I mean not merely search algorithms but knowledge systems capable of supporting genuine inquiry—systems that help humans explore questions, synthesize understanding, and generate insight rather than simply retrieving information or producing statistically probable text. The argument proceeds through four interconnected movements: first, establishing the theoretical foundations in Peircean semiotics and Geertzian anthropology that reveal why meaning cannot be reduced to pattern-matching; second, demonstrating through literary analysis how Don Quixote and One Hundred Years of Solitude instantiate complex epistemological frameworks that anticipate and exceed computational logic; third, explicating the specific failures of current LLM architectures through the lens of abducted semantics; and finally, proposing concrete methodological interventions for developing research engines that preserve rather than attenuate human understanding.

2. Theoretical Foundations: Semiotics, Abduction, and the Production of Meaning

2.1. Peirce’s Triadic Semiotics and the Logic of Abduction

Charles Sanders Peirce developed a comprehensive theory of signs (semiotics) that positions meaning-making as fundamentally triadic rather than dyadic (Bellucci, 2017). Where Saussurean semiology posits meaning as arising from the relationship between signifier and signified, Peirce’s model requires three irreducible components: the representamen (the sign itself), the object (that to which the sign refers), and the interpretant (the understanding or meaning derived from the sign through a process of interpretation) (Eco, 1984). This triadic structure has profound implications for understanding artificial intelligence’s limitations, because it reveals meaning as inherently processual, contextual, and open-ended rather than fixed, universal, and determinable through pattern-matching (Paolucci, 2021).
The interpretant—the most significant element for our analysis—cannot be reduced to a mental representation or neural activation pattern. Rather, it constitutes a further sign that mediates between the initial sign and its object, generating an infinite semiotic chain (semiosis) that Peirce termed “unlimited semiosis.” (Fisch, 1986). This means that meaning is never simply retrieved or decoded but always generated through an interpretive act situated within specific contexts, purposes, and forms of life. As Peirce wrote, “A sign is something which stands to somebody for something in some respect or capacity.” (Peirce, 1931–1935, 2.228). The crucial phrase is “to somebody”—meaning emerges only in relation to an interpreting agent embedded in practical, social, and historical contexts.
Peirce’s theory of abduction, which he alternatively termed hypothesis, presumption, or retroduction, provides the logical foundation for understanding how novel meaning emerges (Magnani, 2001). Unlike deduction (which applies general rules to specific cases to derive necessary conclusions) and induction (which generalizes from observed cases to probable rules), abduction begins with surprising phenomena and reasons backward to generate explanatory hypotheses (Nubiola, 2005). The classic formulation follows this structure:
The surprising fact C is observed.
But if A were true, C would be a matter of course.
Hence, there is reason to suspect that A is true.
Crucially, abduction is not reducible to inference-to-the-best-explanation in the contemporary analytic philosophy sense (Campos, 2011). Rather, it represents what Peirce called “the logic of discovery”—the creative capacity to generate genuinely novel hypotheses that could not be derived deductively or induced from prior observations (Sebeok & Umiker-Sebeok, 1980). Abduction operates through what Peirce identified as a quasi-instinctive attunement to nature, shaped by evolutionary history and cultural formation, that guides humans toward fruitful hypotheses more often than random chance would predict (Semetsky, 2009).
The relationship between abduction and meaning-making becomes clear when we recognize that every act of interpretation involves abductive inference (Scholz, 2010). When we encounter a text, an utterance, a cultural practice, or a natural phenomenon, we do not simply decode pre-existing meanings but generate interpretive hypotheses about what might explain our observations. These hypotheses arise not from mechanical application of rules but from our embodied, culturally formed capacity to perceive patterns, sense relevance, and imagine possible explanations grounded in our forms of life (Tiercelin, 2005).

2.2. Abduction in Linguistic and Mathematical Terms

To fully appreciate how abduction operates in human meaning-making—and why computational systems cannot replicate this process through pattern-matching alone—we must examine abduction both linguistically and mathematically. This dual perspective reveals the profound gap between statistical correlation and genuine understanding.

2.2.1. Linguistic Dimensions of Abduction

In linguistic terms, abduction governs how we move from surface expressions to underlying intentions, from explicit statements to implicit meanings, from individual utterances to broader cultural frameworks. Consider a simple conversational example: Person A states “It’s cold in here,” and Person B responds by closing a window. Person B has performed an abductive inference, reasoning backward from the surprising fact of A’s comment to the hypothesis that A wishes the temperature to increase, and further to the hypothesis that A desires B to close the window.
This inference cannot be explained through pattern-matching or statistical correlation alone, because it requires:
  • Pragmatic understanding of speech acts beyond literal semantic content;
  • Theory of mind capacity to attribute mental states to others;
  • Cultural knowledge of conventional behaviors in shared spaces;
  • Contextual sensitivity to situational features (relationship between speakers, social setting, available actions);
  • Normative awareness of obligations and reasonable expectations.
Contemporary research on theory of mind in large language models has demonstrated that while newer systems can solve some false-belief tasks at a level comparable to six-year-old children, they fundamentally lack the capacity for genuine perspective-taking that grounds human abductive inference (Kosinski, 2023). The Stanford research team’s findings that models cannot distinguish users’ false beliefs from facts illustrate precisely this limitation—the models cannot perform the abductive reasoning necessary to move from “the user states X” to “the user believes X” when X contradicts the model’s training data.
Moreover, linguistic abduction operates through what Peirce termed the “logic of relatives”—understanding how entities stand in relation to one another within complex webs of significance (Hookway, 1985). When we interpret language, we do not simply match words to concepts but trace the networks of relations that constitute meaning within specific contexts. A word like “home” does not have a stable denotation but shifts its meaning depending on whether we’re reading a real estate listing, a poem by Robert Frost, a deportation order, or a birthday card. Abductive reasoning allows us to generate appropriate interpretive hypotheses for each context, drawing on embodied experience, cultural knowledge, and sensitivity to genre, register, and communicative purpose.

2.2.2. Mathematical Formalization of Abduction

From a mathematical perspective, Peircean abduction can be distinguished from both deductive and inductive inference through its logical structure and epistemic status (Pietarinen & Beni, 2021). If we represent deduction, induction, and abduction schematically:
Deduction:
  • Premise 1 (Rule): All M are P;
  • Premise 2 (Case): S is M;
  • Conclusion (Result): S is P.
Induction:
  • Premise 1 (Case): S is M;
  • Premise 2 (Result): S is P;
  • Conclusion (Rule): All M are P (probably).
Abduction:
  • Premise 1 (Rule): All M are P;
  • Premise 2 (Result): S is P;
  • Conclusion (Case): S is M (possibly).
While deduction preserves truth with certainty and induction establishes probability through enumeration, abduction generates possibility through imaginative hypothesis formation (Ferrando, 2023). Mathematically, this means abduction cannot be reduced to Bayesian updating or probabilistic inference, because it introduces entirely new hypothetical categories (M) that were not present in the prior probability space (Hohwy, 2013).
This has crucial implications for understanding why large language models cannot genuinely perform abductive reasoning (Shanahan, 2015). LLMs operate by calculating conditional probabilities: P(token_n|token_1, token_2, …, token_(n − 1)), predicting the next token based on statistical patterns in training data. But abduction requires generating hypotheses that may have zero or near-zero probability in the training distribution—genuinely novel explanations that emerge from creative inference rather than pattern recognition.
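To make the conditional-probability formula concrete, the following toy sketch estimates P(next token | previous token) from bigram counts. It is offered purely as an illustration of the formula above—a bigram model is vastly simpler than a transformer LLM, and the corpus and function names are invented for the example.

```python
# Toy illustration (not any production LLM): estimating P(next_token | context)
# from bigram counts, then reading off the most probable continuations.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

# Count bigram transitions: how often each token follows a given token.
transitions = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev][nxt] += 1

def next_token_distribution(prev):
    """Return P(token | prev) as a normalized dictionary."""
    counts = transitions[prev]
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

print(next_token_distribution("the"))  # {'cat': 0.5, 'mat': 0.25, 'fish': 0.25}
```

Whatever the scale, the operation remains the same in kind: redistributing probability over continuations already represented in the training distribution.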
We can formalize this distinction by recognizing that abduction operates in what mathematicians call the “space of hypotheses” (H) which is not simply a subset of the “space of observations” (O) but potentially contains infinite novel elements not derivable from O through any computable function. Where Bayesian inference updates beliefs about hypotheses already in H based on new observations in O, abduction expands H itself, introducing new hypotheses that explain surprising observations by reconceptualizing the problem space (Wirth, 2011).
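The contrast can be fixed notationally with a minimal sketch, assuming invented hypothesis names and numbers: a Bayesian update redistributes belief over a fixed hypothesis space H, whereas the abductive step enlarges H itself. No algorithm is offered for where the new hypothesis comes from—that conjectural act is precisely what the argument holds to be irreducible to computation over the prior space.

```python
# Minimal sketch of the contrast drawn above (illustrative values only):
# Bayesian updating redistributes belief over a FIXED hypothesis space H,
# whereas abduction adds a genuinely new hypothesis to H itself.

def bayes_update(priors, likelihoods):
    """Posterior over a fixed hypothesis space given one observation."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    z = sum(unnormalized.values())
    return {h: p / z for h, p in unnormalized.items()}

# Fixed hypothesis space: only H1 and H2 are 'thinkable'.
priors = {"H1": 0.5, "H2": 0.5}
likelihoods = {"H1": 0.1, "H2": 0.2}           # P(surprising fact C | h)
posterior = bayes_update(priors, likelihoods)   # belief shifts, but H stays the same

# An abductive step, by contrast, enlarges the space: a new explanation H3 is
# conjectured because it would render C "a matter of course".
priors_expanded = {"H1": 0.25, "H2": 0.25, "H3": 0.5}   # H3 did not exist before
likelihoods_expanded = {**likelihoods, "H3": 0.9}
posterior_expanded = bayes_update(priors_expanded, likelihoods_expanded)

print(posterior)           # {'H1': 0.333..., 'H2': 0.666...}
print(posterior_expanded)  # probability mass now concentrates on the conjectured H3
```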

2.3. Geertz’s Thick Description and Interpretive Anthropology

Clifford Geertz’s concept of “thick description,” developed most fully in his 1973 essay “Thick Description: Toward an Interpretive Theory of Culture,” provides the anthropological complement to Peirce’s semiotic theory (Geertz, 1973b). Geertz famously distinguished between “thin description” (reporting observable behavior: “he contracted his eyelids”) and “thick description” (interpreting meaningful action: “he winked at me conspiratorially”). This distinction, borrowed from philosopher Gilbert Ryle, becomes the foundation for Geertz’s broader argument that culture must be understood not as behavioral patterns but as systems of meaning, and that anthropology’s task is interpretive rather than nomothetic—seeking understanding rather than universal laws (Shankman, 1984).
For Geertz, culture consists of “webs of significance” that humans have spun for themselves, and cultural analysis must be “not an experimental science in search of law but an interpretive one in search of meaning.” (Geertz, 1973b, 5). This interpretive stance requires what Geertz calls “experience-near” concepts—categories meaningful to the people whose actions we seek to understand—rather than imposing “experience-distant” theoretical frameworks developed in academic contexts. The anthropologist’s task is to construct readings of social action that capture what Geertz terms the “informal logic of actual life”—the practical reasoning, cultural knowledge, and embodied understanding that make actions intelligible within specific forms of life.
Thick description operates through several interconnected commitments:
  • Microscopic analysis: Beginning with “exceedingly extended acquaintances with tiny matters” rather than grand generalizations.
  • Contextual interpretation: Reading actions within the networks of significance that render them meaningful.
  • Multiple perspectives: Attending to different interpretive frameworks that participants bring to situations.
  • Structural signification: Sorting out the structures of meaning that organize social action.
  • Dialectical movement: Moving between particular observations and broader theoretical insights.
The relationship to Peircean semiotics becomes clear: both Geertz and Peirce insist that meaning emerges through interpretation situated within contexts, communities, and forms of life rather than through mechanical decoding of fixed significations (von Uexküll, 2010). Where Peirce provides the logical and semiotic foundation, Geertz offers the methodological elaboration, showing how interpretive work proceeds through patient attention to particularity, sensitivity to context, and reflexive awareness of one’s own interpretive frameworks.
For our analysis of artificial intelligence systems, Geertzian thick description reveals why LLMs produce what appears superficially as understanding but lacks genuine comprehension (Meister et al., 2025). These systems generate “thin descriptions” at scale—statistically probable text that maintains surface coherence and appropriate register but lacks the grounded, contextually embedded understanding that thick description requires. They can produce text that reads like explanation without performing the interpretive work of actually explaining; they can generate responses that simulate empathy without embodying the cultural knowledge and relational understanding that makes empathy possible.
Consider Geertz’s classic example of distinguishing a wink from a twitch from a parody of a wink from practicing one’s wink. Each involves identical physical movements but radically different meanings, intelligible only through thick description that attends to contexts, intentions, social relationships, and cultural codes. An LLM trained on descriptions of these actions might learn to use the words appropriately in new contexts—to generate text that correctly deploys terms like “wink,” “conspiratorial,” “parody”—but this demonstrates only pattern-matching, not the interpretive capacity to recognize the meaningful differences in actual human behavior or to generate novel thick descriptions of previously unencountered actions.

3. Literary Theory and Epistemological Frameworks: Don Quixote and One Hundred Years of Solitude

3.1. Cervantes and the Problem of Mediated Reality

Cervantes’ Don Quixote (de Cervantes, 1605/1615/2003) stands as perhaps the founding text of the modern novel, but for our purposes it serves as a sophisticated meditation on the relationship between representation, reality, and interpretive frameworks—themes that bear directly on contemporary questions about artificial intelligence and human understanding. The novel’s protagonist, Alonso Quixano, transforms himself into Don Quixote through excessive reading of chivalric romances, illustrating how textual mediation shapes perception and generates alternative epistemological frameworks.
The novel’s epistemological complexity operates on multiple levels. First, Don Quixote encounters the world through an interpretive framework derived entirely from textual sources—he sees windmills as giants, inns as castles, and peasant women as noble ladies because his perceptual apparatus has been reprogrammed by literary conventions. This might appear to be mere delusion, but Cervantes’ brilliance lies in showing how all perception operates through interpretive frameworks, and that the difference between “madness” and “sanity” lies not in whether one uses frameworks but in how those frameworks relate to socially shared conventions.
Second, the novel includes multiple levels of textual mediation and metaliterary commentary. In Part II, characters have read Part I and respond to Don Quixote based on their textual knowledge of him, creating feedback loops between representation and reality that anticipate contemporary concerns about how AI systems trained on internet text might reproduce and amplify existing biases and misconceptions. The character of Cide Hamete Benengeli, the fictional Arab historian who supposedly wrote Don Quixote’s true history, further complicates questions of authority, translation, and textual reliability.
Most crucially for our analysis, Don Quixote demonstrates that textual knowledge without embodied, contextual understanding produces systematic misinterpretation. Don Quixote’s problem is not insufficient information—he has read extensively and can cite sources—but rather his inability to integrate textual knowledge with embodied experience, social norms, and contextual sensitivity. He performs what we might recognize as purely deductive reasoning: “Knights errant fight giants; giants appear before me (actually windmills); therefore, I must fight them.” The novel shows repeatedly that this form of reasoning, divorced from practical wisdom (phronesis) and contextual judgment, produces absurdity despite logical validity.
The parallel to large language models becomes striking. LLMs possess vast textual knowledge, can generate logically coherent text, and can apply learned patterns to new contexts—yet they lack the embodied, socially situated understanding that would enable them to distinguish genuine from parodic applications of concepts, to recognize when textual patterns should be overridden by contextual considerations, or to perform the abductive leaps necessary for genuine understanding. Like Don Quixote, they can cite authorities, construct arguments, and maintain consistency within their interpretive frameworks while systematically misunderstanding the situations they encounter.

3.2. García Márquez and Magical Realism as Epistemological Critique

Gabriel García Márquez’s One Hundred Years of Solitude employs magical realism to interrogate the relationship between lived experience, historical understanding, and epistemological authority (García Márquez, 1970). The novel presents events that violate naturalistic causality—characters ascending to heaven, plagues of insomnia and forgetting, rooms that exist outside normal space—yet treats these events with the same narrative tone as mundane occurrences. This technique, far from being merely stylistic flourish, constitutes an epistemological argument about the inadequacy of empiricist, positivist frameworks for capturing the full texture of human experience and historical understanding.
The novel’s treatment of memory, forgetting, and historical transmission bears directly on questions about AI and knowledge representation. When Macondo suffers a plague of insomnia followed by a plague of forgetting, the inhabitants attempt to preserve knowledge by labeling everything in their environment: “This is a cow. She must be milked every morning so that she will produce milk, and the milk must be boiled in order to be mixed with coffee to make coffee and milk.” (García Márquez, 1970, 49). This passage illustrates the difference between possessing information (labels, instructions) and possessing knowledge (understanding integrated into lived practice and embodied skill). The labels preserve data but cannot restore the tacit, embodied knowledge that makes the data meaningful.
Similarly, the novel’s circular temporal structure—the gypsy sage Melquíades has written the family’s complete history in advance, which the last Aureliano deciphers only as it concludes—suggests that understanding emerges not from linear accumulation of data but through interpretive frameworks that organize experience into meaningful narratives. The parchments that contain the family’s history cannot be read until the reader himself appears in them, until the moment of reading coincides with the moment being described. This metaliterary device captures something essential about understanding: genuine comprehension requires not just access to information but the ability to recognize oneself within frameworks of meaning, to inhabit interpretive positions from which data becomes intelligible.
The novel’s magical realism also models what Geertz would recognize as thick description—representing experience from within the cultural frameworks that make it meaningful to participants rather than translating it into external, “objective” categories. When characters experience fantastic events, the novel presents them phenomenologically, as they appear to the experiencers, rather than explaining them away or reducing them to “really” being something else. This methodological commitment parallels Geertz’s insistence on using experience-near concepts and honoring the integrity of cultural worldviews rather than imposing external interpretive frameworks.
For research engines and AI systems, García Márquez’s novel suggests that capturing human understanding requires more than aggregating factual information or learning statistical patterns. It requires the capacity to recognize how different epistemological frameworks organize experience differently, to understand meaning as emerging from interpretation rather than existing independently in data, and to acknowledge that understanding is always situated, perspectival, and irreducible to information processing.

3.3. Synthesizing Literary and Anthropological Insight

Both Don Quixote and One Hundred Years of Solitude demonstrate that meaning-making operates through complex cultural, historical, and experiential processes irreducible to information retrieval or pattern-matching. These literary works anticipate and critique precisely the form of intelligence that contemporary AI systems instantiate: they show that possessing textual knowledge, generating coherent language, and applying learned patterns constitute necessary but insufficient conditions for understanding.
The novels reveal several crucial features of human understanding that current AI architectures systematically miss:
  • Embodied situatedness: Understanding requires not just information but embodied experience situated in physical and social worlds.
  • Cultural embeddedness: Meaning emerges from participation in forms of life, not from access to representations.
  • Interpretive flexibility: Competent understanding requires knowing when to override patterns, recognize parody, and adapt frameworks.
  • Temporal depth: Understanding draws on personal and collective histories that provide contexts for interpretation.
  • Perspectival multiplicity: Different epistemological frameworks organize the same phenomena differently, all potentially valid within their contexts.
  • Practical wisdom: Genuine intelligence includes phronesis—situated judgment about what matters in particular circumstances.
These features connect directly to Geertz’s emphasis on thick description and Peirce’s insistence on the irreducibility of the interpretant to the sign-object relation. Together, they suggest that advancing beyond current AI limitations requires not incremental improvements to existing architectures but fundamental reconceptualization of what knowledge systems should do and how they should operate.

4. The Stanford Research: When AI Cannot Distinguish Fact from Belief

4.1. The Knowledge and Belief Evaluation (KaBLE) Study

The Stanford research team’s 2025 study, published shortly before this writing, provides empirical confirmation of the theoretical limitations I have been discussing (Suzgun et al., 2025). James Zou, associate professor of biomedical data science, and Mirac Suzgun, a JD/PhD student, developed the KaBLE benchmark to assess whether large language models can distinguish between facts, beliefs, and false beliefs—a capacity essential for genuine understanding of human perspectives.
Testing twenty-four advanced models including GPT-4o, DeepSeek R1, and Claude Sonnet, the researchers discovered systematic failures across this fundamental dimension of understanding. When presented with scenarios where users express false beliefs—such as “I believe humans only use 10% of their brains”—models overwhelmingly refuse to acknowledge the belief, instead correcting the misconception and explaining why it is false. As Zou explains: “AI needs to recognize and acknowledge false beliefs and misconceptions. That’s still a big gap in current models, even the most recent ones.” (Stanford University, 2025).
This failure reveals a profound limitation: contemporary LLMs cannot perform the basic interpretive move of distinguishing “the user believes X” from “X is true.” This might seem like a simple error to fix—couldn’t models be fine-tuned to track belief states explicitly?—but the difficulty runs deeper than training methodology. The problem lies in the fundamental architecture of these systems and their relationship to meaning.

4.2. Architectural Limitations and Abducted Semantics

Large language models operate by predicting conditional probabilities of token sequences: P(token_n|token_1, token_2, …, token_(n − 1)) (Meister et al., 2025). This architecture excels at pattern-matching and generating statistically probable text but cannot perform the abductive reasoning required to distinguish between levels of representation—between statements, beliefs about statements, and beliefs about beliefs.
When a user states “I believe humans only use 10% of their brains,” a human interpreter performs multiple simultaneous inferences:
  • Semantic interpretation: Parsing the propositional content
  • Pragmatic understanding: Recognizing this as a belief statement
  • Theory of mind attribution: Inferring the user’s mental state
  • Factual evaluation: Assessing the belief against knowledge
  • Social reasoning: Determining appropriate responses
  • Ethical consideration: Balancing correction against respect for the user’s perspective
These processes operate through abductive inference—generating hypotheses about meaning that go beyond what is explicitly stated. The Stanford research reveals that LLMs cannot reliably perform even the first three operations, let alone the more sophisticated reasoning required for appropriate response generation.
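One way to see what the missing interpretive move would require is to spell it out explicitly. The sketch below is purely illustrative—its field names are hypothetical, it is not drawn from the Stanford study, and it is not offered as a remedy for the architectural problem—but it makes visible the distinction the KaBLE findings show models failing to track: “the user believes X” versus “X is true.”

```python
# Illustrative sketch only: an explicit representation of the distinction between
# a belief attributed to a speaker and the system's own assessment of its truth.
# Field names are hypothetical, not drawn from the KaBLE benchmark.
from dataclasses import dataclass

@dataclass
class AttributedClaim:
    proposition: str          # the content X
    asserted_by: str          # whose statement this is
    speaker_commitment: str   # e.g. "believes", "doubts", "asks whether"
    epistemic_status: str     # system's assessment: "supported", "contradicted", "unknown"

claim = AttributedClaim(
    proposition="Humans only use 10% of their brains.",
    asserted_by="user",
    speaker_commitment="believes",
    epistemic_status="contradicted",   # false as a matter of fact...
)

# ...yet still real as a belief: an adequate response must acknowledge both.
print(f"The {claim.asserted_by} {claim.speaker_commitment}: '{claim.proposition}' "
      f"(assessment: {claim.epistemic_status})")
```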
This limitation instantiates precisely what I mean by “abducted semantics.” The models have appropriated the form of human language use—they generate grammatically correct, contextually relevant, tonally appropriate text—but they lack the substance of the abductive, interpretive processes through which humans generate meaning. They simulate understanding without possessing the cultural embeddedness, embodied experience, and semiotic competence that grounds genuine comprehension.
Zou notes that “as we shift from using AI in more interactive, human-centered ways in areas like education and medicine, it becomes very important for these systems to develop a good understanding of the people they interact with.” (Stanford University, 2025). But “understanding people” requires exactly the capacities that current architectures systematically lack: recognizing beliefs as beliefs, distinguishing perspectives, performing abductive inference about mental states, and generating responses sensitive to the full context of human communication.

4.3. Implications for High-Stakes Domains

The Stanford team emphasizes that these limitations pose serious risks in high-stakes applications (Suzgun et al., 2025). In medicine, a system that cannot distinguish a patient’s beliefs from facts might fail to address misconceptions that affect treatment adherence or might override cultural health frameworks that shape patient experiences. In education, systems that cannot recognize student beliefs might provide explanations that miss the actual sources of confusion, making learning less effective. In counseling or mental health support, inability to track and respond appropriately to belief states could lead to responses that are technically correct but therapeutically harmful.
These concerns extend to research engines and knowledge systems more broadly. A research engine that cannot distinguish between “claims made in source documents” and “facts established by evidence” will present misleading information as authoritative knowledge. A system that cannot recognize when users express uncertain beliefs, provisional hypotheses, or exploratory questions will generate responses that presume certainty where tentative understanding is more appropriate. Most fundamentally, systems that lack genuine interpretive capacity cannot support the forms of inquiry that constitute research—they can retrieve information but cannot help users develop understanding.

5. Beyond Pattern-Matching: Toward Ethnographically Informed Research Engines

5.1. The Limitations of Statistical Pattern-Matching

Contemporary AI systems excel at discovering correlations in vast datasets, but correlation is not understanding (Shanahan, 2015). A model might learn that certain words co-occur frequently with terms related to mental health, medicine, or education, and generate text that maintains these statistical associations—yet this tells us nothing about whether the model grasps what mental health is, how medicine relates to human suffering and healing, or what educational processes involve.
This limitation becomes clearer when we consider what Geertz terms the “informal logic of actual life”—the practical reasoning, cultural knowledge, and situated judgment that guide human action. Statistical pattern-matching can approximate this logic’s outputs without capturing its processes. An LLM might generate text that resembles how an anthropologist writes about a cultural practice, but it cannot perform the interpretive work that produces ethnographic insight: it cannot experience the confusion and gradual understanding that comes from participant observation, cannot recognize when its interpretive frameworks fail and need revision, cannot develop the experience-near concepts that emerge through prolonged engagement with a form of life.
The problem deepens when we recognize that many of the most important patterns in human meaning-making are not statistical but structural, normative, and context-dependent in ways that resist capture through frequency distributions (Eco, 1984). Peirce’s triadic semiotics reveals that the relationship between signs and meanings is mediated by interpretants—by processes of interpretation that are irreducible to sign-object correlations. A word does not “mean” by virtue of statistical association with other words but by virtue of its role within practices, forms of life, and contexts of use that give it significance.

5.2. Ethnographic Methods as Generative Framework

Ethnographic methodology, particularly in the Geertzian tradition, offers a generative framework for reconceptualizing research engines (Geertz, 1973a). Rather than designing systems to retrieve information or generate text based on statistical patterns, we might design systems to support interpretive work—to help users develop thick descriptions of phenomena under investigation. Such ethnographically informed research engines would embody several key principles.

5.2.1. Contextual Richness over Information Extraction

Rather than extracting “key facts” from sources, these systems would preserve and present contextual richness. When users query about a historical event, medical condition, or cultural practice, the system would provide not just summary information but access to different perspectives, competing interpretations, and the contexts from which claims emerge. This mirrors how ethnographers work: not seeking single correct accounts but comparing multiple perspectives to develop understanding of how different actors make sense of situations.
Implementation might involve the following (a schematic sketch appears after this list):
  • Presenting source materials with their full contexts, rather than decontextualized excerpts;
  • Identifying and preserving competing interpretive frameworks rather than synthesizing them into single accounts;
  • Showing how claims relate to specific epistemological commitments, methodological choices, and political contexts;
  • Highlighting moments of interpretive uncertainty where sources conflict or evidence remains ambiguous.
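As a purely schematic illustration of the first two points, one might imagine a source record that carries its context and its contestation with it. Every field name and value below is a hypothetical example, not a specification of any existing system.

```python
# Hedged sketch of "preserving contextual richness" as a record schema;
# all fields and sample values are invented for illustration.
from dataclasses import dataclass, field

@dataclass
class SourcedClaim:
    claim: str                          # the statement itself
    source: str                         # document, author, community, or tradition
    surrounding_context: str            # the passage, not a decontextualized excerpt
    interpretive_framework: str         # e.g. "biomedical", "customary law", "ritual"
    methodology: str                    # how the claim was produced
    contested_by: list = field(default_factory=list)   # competing interpretations
    uncertainty_notes: str = ""         # where evidence remains ambiguous

record = SourcedClaim(
    claim="The ceremony marks the transfer of land rights.",
    source="Fieldnotes, district elder interview (2021)",
    surrounding_context="Recorded during a dispute over inheritance...",
    interpretive_framework="customary law (experience-near)",
    methodology="participant observation",
    contested_by=["colonial-era survey records frame the land as state property"],
    uncertainty_notes="Elders disagree on whether the rite binds absent heirs.",
)
```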

5.2.2. Perspectival Multiplicity over Universal Truth Claims

Ethnographically informed systems would acknowledge that understanding emerges from particular perspectives and that different perspectives may be equally valid within their contexts. Rather than presenting information as acontextual truth, systems would help users understand whose perspective they are encountering, what cultural frameworks shape that perspective, and how different frameworks organize experience differently.
This requires
  • Explicit attribution of claims to specific authors, communities, or traditions;
  • Recognition that technical, scientific, religious, and experiential frameworks answer different questions and serve different purposes;
  • Preservation of “experience-near” concepts—the terms and categories meaningful to participants—alongside “experience-distant” analytical frameworks;
  • Support for users in inhabiting multiple perspectives rather than choosing between them.

5.2.3. Process over Product

Geertz emphasizes that ethnographic understanding emerges through processes of sustained engagement, not as fixed endpoints. Research engines should support these processes rather than presenting themselves as oracles delivering completed understanding. This means designing systems that
  • Help users formulate better questions as their understanding develops;
  • Show how inquiry proceeds—not just results but the interpretive work that produces results;
  • Enable users to follow chains of inference, see how conclusions depend on assumptions, and explore alternative lines of reasoning;
  • Preserve the provisionality of knowledge, marking claims as hypotheses to be tested rather than facts to be accepted.

5.2.4. Reflexivity and Epistemic Humility

Ethnographers practice reflexivity—acknowledging how their own backgrounds, frameworks, and purposes shape their interpretations. Research engines should embody similar epistemic humility, explicitly acknowledging their limitations rather than presenting themselves as comprehensive or authoritative. This requires
  • Transparency about training data, architectural constraints, and systematic limitations;
  • Explicit acknowledgment when questions exceed the system’s capacities;
  • Recognition that some forms of understanding require embodied experience, cultural participation, or sustained engagement that no text-based system can provide;
  • Humility about the difference between retrieving information and developing understanding.

5.3. Literary Theory and Interpretive Sophistication

Literary theory provides complementary resources for advancing research engines. Close reading practices—attending to ambiguity, irony, intertextuality, and the multiple levels at which texts generate meaning—offer methodologies for training systems (and users) to recognize interpretive complexity.

5.3.1. Attention to Genre and Register

Literary theory emphasizes that meaning depends crucially on genre and register—the same words mean differently in scientific papers, policy documents, novels, jokes, and casual conversation. Research engines need sophisticated genre recognition not just for classification but for interpretation: understanding that texts make different kinds of claims, establish authority differently, and require different reading practices depending on their genres.
This extends beyond obvious cases (poetry versus technical manuals) to subtle distinctions that shape how we should interpret claims: Is this author making an empirical assertion, a theoretical proposal, a rhetorical provocation, or a thought experiment? Is this intended as comprehensive truth or productive oversimplification? Different genres establish different contracts with readers about what kind of truth they offer.

5.3.2. Recognition of Rhetorical Strategies

Literary theory attunes us to how texts work on readers—how they deploy metaphor, establish authority, anticipate objections, and guide interpretation. Research engines that recognize these rhetorical strategies can help users read more critically, distinguishing between argumentative moves that advance understanding and those that obscure or manipulate.
For instance, many scientific texts present findings with linguistic markers of certainty (“the data show,” “it is clear that”) even when uncertainty pervades the research process. A research engine informed by rhetorical analysis could highlight these moves, helping users distinguish rhetorical certainty from epistemic justification and recognize when strong claims rest on weak evidence.
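As a deliberately crude illustration of this kind of rhetorical flagging, a system might begin with nothing more sophisticated than a list of certainty markers and their surrounding context. The marker list below is assumed for the example; it is a toy, not a validated instrument of rhetorical analysis.

```python
# Toy illustration: flag linguistic markers of certainty so a reader can ask
# whether the evidence actually licenses the confidence of the prose.
import re

CERTAINTY_MARKERS = [
    r"\bthe data show\b", r"\bit is clear that\b", r"\bclearly\b",
    r"\bobviously\b", r"\bproves? that\b", r"\bundoubtedly\b",
]

def flag_certainty(text: str):
    """Return (marker, surrounding snippet) pairs for each certainty marker found."""
    hits = []
    for pattern in CERTAINTY_MARKERS:
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            start, end = max(0, m.start() - 30), min(len(text), m.end() + 30)
            hits.append((m.group(0), text[start:end]))
    return hits

sample = "It is clear that the intervention works; the data show a strong effect."
for marker, snippet in flag_certainty(sample):
    print(f"{marker!r} -> ...{snippet}...")
```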

5.3.3. Intertextual Networks and Intellectual Genealogies

Literary theory’s emphasis on intertextuality—how texts reference, respond to, and build upon other texts—provides a model for helping users understand ideas within their intellectual contexts. Rather than presenting concepts as decontextualized facts, systems could map intellectual genealogies: showing how ideas emerge through dialog, transformation, and contestation across communities and traditions.
This means tracing not just citations but conceptual inheritances, showing how terms shift meaning as they move between contexts, and preserving the debates and disagreements that generate understanding. When users encounter a concept like “thick description,” the system would show not just Geertz’s definition but his intellectual debts to Ryle, Wittgenstein, and Weber; the critiques and extensions by subsequent anthropologists; and the transformations as the concept migrates to other disciplines.
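A minimal sketch of such a genealogy as a data structure might look like the following. The relations and entries are illustrative (the “migrates_to” items in particular are hypothetical examples), not a catalogue of the concept’s actual reception.

```python
# Illustrative sketch: an intellectual genealogy represented as a small directed
# graph rather than a flat definition; entries are examples, not a full map.
genealogy = {
    "thick description (Geertz 1973)": {
        "draws_on": ["Ryle's thick/thin distinction", "Weber's interpretive sociology",
                     "Wittgenstein on forms of life"],
        "contested_by": ["Shankman's (1984) critique of the interpretive program"],
        "migrates_to": ["qualitative methods in education", "design research"],
    },
}

def show_genealogy(concept: str):
    """Print the inheritances, debates, and migrations attached to a concept."""
    for relation, nodes in genealogy.get(concept, {}).items():
        for node in nodes:
            print(f"{concept} --{relation}--> {node}")

show_genealogy("thick description (Geertz 1973)")
```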

5.4. Concrete Design Principles for Research Engines

Drawing together ethnographic and literary methodologies, we can articulate concrete design principles for research engines that enhance rather than attenuate understanding (a schematic sketch of how these principles might surface in a system’s output follows the list):
Principle 1: Preserve Interpretive Labor
Rather than hiding the work of interpretation behind interfaces that present synthesized answers, make interpretive processes visible and engage users as co-interpreters.
Principle 2: Foreground Uncertainty
Explicitly mark provisional conclusions, contested claims, and gaps in knowledge rather than smoothing over ambiguity to present confident answers.
Principle 3: Enable Perspectival Multiplicity
Present multiple interpretive frameworks and help users understand how different perspectives organize phenomena differently rather than attempting to reconcile differences into single narratives.
Principle 4: Connect to Practices
When presenting knowledge, show how it emerges from specific practices (research methodologies, clinical experience, cultural participation) rather than treating it as disembodied information.
Principle 5: Support Sustained Engagement
Design for extended inquiry processes rather than one-shot question-answering, recognizing that understanding develops through sustained engagement with questions.
Principle 6: Acknowledge Limits
Be explicit about what the system cannot do—the forms of understanding that require embodied experience, cultural participation, or human judgment.
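To indicate the shape these principles might take in practice, the following sketch shows a hypothetical response object in which each principle corresponds to an explicit field. The fields, values, and scenario are invented for illustration and do not describe any existing system.

```python
# Speculative sketch: a response object that foregrounds the six principles
# rather than delivering a single synthesized answer. All content is hypothetical.
from dataclasses import dataclass, field

@dataclass
class InquiryResponse:
    question: str
    perspectives: dict = field(default_factory=dict)        # Principle 3: multiple frameworks
    interpretive_steps: list = field(default_factory=list)  # Principle 1: visible labor
    uncertainties: list = field(default_factory=list)       # Principle 2: marked, not smoothed
    grounding_practices: list = field(default_factory=list) # Principle 4: knowledge from practices
    follow_up_questions: list = field(default_factory=list) # Principle 5: sustained engagement
    out_of_scope: list = field(default_factory=list)        # Principle 6: stated limits

response = InquiryResponse(
    question="How is the ritual understood by participants?",
    perspectives={"experience-near": "a debt repaid to ancestors",
                  "experience-distant": "a mechanism of land-tenure legitimation"},
    interpretive_steps=["compared three interview transcripts", "checked survey records"],
    uncertainties=["participants disagree about whether absent heirs are bound"],
    grounding_practices=["participant observation", "archival comparison"],
    follow_up_questions=["How do younger residents describe the same rite?"],
    out_of_scope=["the embodied experience of taking part cannot be conveyed in text"],
)
```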

6. Conclusions: Toward Research Engines That Enhance Understanding

This article has argued that contemporary AI systems—particularly large language models—enact what I term “abducted semantics”: appropriating the forms of human meaning-making while systematically attenuating the culturally embedded, phenomenologically grounded, and interpretively sophisticated processes through which genuine understanding emerges. The Stanford research demonstrating that advanced models cannot distinguish facts from beliefs provides empirical confirmation of the theoretical limitations revealed through Peircean semiotics, Geertzian anthropology, and literary analysis of Cervantes and García Márquez.
The problem is not that AI systems are merely statistical, because they are much more than that. Rather, the specific limitation lies in the double abduction these systems perform: appropriating the abductive logic through which humans generate understanding while attenuating the cultural embeddedness, embodied experience, contextual sensitivity, and interpretive flexibility that make abductive inference productive of genuine insight rather than merely statistically probable text.
Moving beyond this limitation requires not incremental improvements to existing architectures but reconceptualization of what research engines should do and how they should operate. I have proposed that ethnographic methodology and literary theory provide both diagnostic frameworks for understanding AI’s limitations and generative methodologies for building superior systems. Specifically, Geertz’s thick description reveals why pattern-matching produces thin descriptions that lack genuine understanding; Peirce’s triadic semiotics shows why meaning cannot be reduced to sign-object correlations; and canonical literary works demonstrate the interpretive sophistication required for genuine comprehension.
The design principles articulated in Section 5 offer concrete pathways forward: preserving interpretive labor rather than hiding it, foregrounding uncertainty rather than projecting confidence, enabling perspectival multiplicity rather than forcing synthesis, connecting knowledge to practices from which it emerges, supporting sustained engagement rather than one-shot answering, and acknowledging limits rather than claiming comprehensiveness. These principles, if implemented, would transform research engines from tools of information retrieval into technologies that genuinely enhance human understanding.
Critically, this transformation cannot be achieved purely through technical means. It requires interdisciplinary collaboration between computer scientists, anthropologists, literary scholars, philosophers, and domain experts across fields where research engines are deployed. It requires reconceptualizing success metrics: moving from evaluating systems based on how well they simulate human outputs to evaluating them based on how effectively they support human inquiry processes. It requires acknowledging that some forms of understanding cannot be automated and that the goal should be augmenting rather than replacing human interpretive capacities.
The stakes extend beyond technical questions about AI architecture to fundamental concerns about knowledge, understanding, and human flourishing in an increasingly digitally mediated world. If we allow research engines built on abducted semantics to become primary interfaces to knowledge, we risk impoverishing understanding across society—producing populations that can retrieve information but cannot interpret it, that can generate plausible text but cannot engage in genuine inquiry, that can access data but cannot develop the thick descriptions necessary for navigating complex social, cultural, and ethical questions.
The alternative I have articulated—research engines informed by ethnographic and literary methodologies—offers a path toward technologies that enhance rather than diminish our interpretive capacities. Such systems would not replace human understanding but support it, providing scaffolding for the sustained, contextually sensitive, interpretively sophisticated work through which genuine comprehension emerges. They would acknowledge that understanding is always perspectival, provisional, and embedded in practices rather than presenting themselves as oracles delivering acontextual truth. They would help users develop not just knowledge but what Geertz calls the “power of the scientific imagination to bring us into touch with the lives of strangers”—the capacity for genuine understanding across differences.
This vision remains unrealized, and significant technical, institutional, and conceptual challenges obstruct its implementation. But the theoretical foundations laid by Peirce, Geertz, Cervantes, García Márquez, and contemporary critics of AI offer resources for moving forward. The Stanford research revealing LLMs’ inability to distinguish facts from beliefs makes clear that current trajectories will not suffice. The question is whether the AI research community, technology companies, and institutions deploying these systems will recognize the limitations of abducted semantics and commit to the harder but more rewarding work of building research engines that preserve and enhance the interpretive richness through which human understanding flourishes.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

During the preparation of this manuscript/study, the author used Grammarly, the Google search engine, ChatGPT-4, and Claude for structural outlines and online research. The author has reviewed and edited the output and takes full responsibility for the content of this publication.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Bellucci, F. (2017). Peirce’s speculative grammar: Logic as semiotics. Routledge. [Google Scholar]
  2. Campos, D. G. (2011). On the distinction between Peirce’s abduction and Lipton’s inference to the best explanation. Synthese, 180(3), 419–442. [Google Scholar] [CrossRef]
  3. de Cervantes, M. (2003). Don Quixote (E. Grossman, Trans.). Ecco. (Original work published 1605/1615). [Google Scholar]
  4. Eco, U. (1984). Semiotics and the philosophy of language. Indiana University Press. [Google Scholar]
  5. Ferrando, F. (2023). Peirce’s semiotics and active inference. In M. E. Johnson (Ed.), Knowledge graphs and semantic web (pp. 245–267). Springer. [Google Scholar]
  6. Fisch, M. H. (1986). Peirce, semeiotic, and pragmatism: Essays (K. L. Ketner, & C. J. W. Kloesel, Eds.). Indiana University Press. [Google Scholar]
  7. García Márquez, G. (1970). One hundred years of solitude (G. Rabassa, Trans.). Harper & Row. [Google Scholar]
  8. Geertz, C. (1973a). The interpretation of cultures: Selected essays. Basic Books. [Google Scholar]
  9. Geertz, C. (1973b). Thick description: Toward an interpretive theory of culture. In The interpretation of cultures (pp. 3–30). Basic Books. [Google Scholar]
  10. Hohwy, J. (2013). The predictive mind. Oxford University Press. [Google Scholar]
  11. Hookway, C. (1985). Peirce. Routledge. [Google Scholar]
  12. Kosinski, M. (2023). Theory of mind may have spontaneously emerged in large language models. arXiv, arXiv:2302.02083. Available online: https://arxiv.org/abs/2302.02083 (accessed on 4 August 2025).
  13. Magnani, L. (2001). Abduction, reason, and science: Processes of discovery and explanation. Kluwer Academic. [Google Scholar]
  14. Meister, C., Tamkin, A., Brundage, M., Ganguli, D., & Clark, J. (2025). Understanding the capabilities, limitations, and societal impact of large language models. Stanford Center for Research on Foundation Models. [Google Scholar]
  15. Nubiola, J. (2005). Abduction or the logic of surprise. Semiotica, 153(1/4), 117–130. [Google Scholar] [CrossRef]
  16. Paolucci, C. (2021). Cognitive semiotics: Integrating signs, minds, meaning and cognition. Springer. [Google Scholar]
  17. Peirce, C. S. (1931–1935). Collected papers of Charles Sanders Peirce (C. Hartshorne, & P. Weiss, Eds.; Vols. 1–6). Harvard University Press. [Google Scholar]
  18. Peirce, C. S. (1992–1998). The essential Peirce: Selected philosophical writings (N. Houser, & C. Kloesel, Eds.; 2 vols.). Indiana University Press. [Google Scholar]
  19. Pietarinen, A.-V., & Beni, M. D. (2021). Active inference and abduction. Transactions of the Charles S. Peirce Society, 57(4), 484–508. [Google Scholar] [CrossRef]
  20. Scholz, B. C. (2010). Meaning and abduction in Peirce’s phenomenology. Transactions of the Charles S. Peirce Society, 46(4), 569–590. [Google Scholar]
  21. Sebeok, T. A., & Umiker-Sebeok, J. (1980). “You Know My Method”: A juxtaposition of Charles S. Peirce and Sherlock Holmes. Gaslight Publications. [Google Scholar]
  22. Semetsky, I. (2009). Meaning and abduction as process-structure: A diagram of reasoning. Cosmos and History, 5(2), 191–209. [Google Scholar]
  23. Shanahan, M. (2015). The technological singularity. MIT Press. [Google Scholar]
  24. Shankman, P. (1984). The thick and the thin: On the interpretive theoretical program of Clifford Geertz. Current Anthropology, 25(3), 261–280. [Google Scholar] [CrossRef]
  25. Stanford University. (2025, November 10). Why AI still struggles to tell fact from belief. Stanford Report. Available online: https://news.stanford.edu/stories/2025/11/ai-language-models-facts-belief-human-understanding-research (accessed on 11 November 2025).
  26. Suzgun, M., Gur, T., Bianchi, F., Ho, D. E., Icard, T., Jurafsky, D., & Zou, J. (2025). KaBLE: Knowledge and belief evaluation for large language models. Nature Machine Intelligence. Available online: https://arxiv.org/abs/2410.21195 (accessed on 10 November 2025).
  27. Tiercelin, C. (2005). Abduction and the semiotics of perception. Semiotica, 153(1/4), 389–412. [Google Scholar] [CrossRef]
  28. von Uexküll, J. (2010). A foray into the worlds of animals and humans: With a theory of meaning (J. D. O’Neil, Trans.). University of Minnesota Press. [Google Scholar]
  29. Wirth, U. (Ed.). (2011). Impfen, pfropfen, transplantieren. [Abduction in the sciences and humanities]. Suhrkamp. [Google Scholar]