Article

Seeing the Message but Not the Machine: Digital Skepticism and AI Discernment in Online Information Environments

by Lersak Phothong, Anupong Sukprasert *, Nattakarn Shutimarrungson and Mehtabhorn Obthong

Mahasarakham Business School, Mahasarakham University, Mahasarakham 44150, Thailand

* Author to whom correspondence should be addressed.
Information 2026, 17(3), 295; https://doi.org/10.3390/info17030295
Submission received: 5 January 2026 / Revised: 15 March 2026 / Accepted: 17 March 2026 / Published: 18 March 2026

Abstract

Artificial intelligence (AI) increasingly mediates how information is generated, ranked, and circulated in digital environments. However, it remains unclear under what conditions users explicitly articulate recognition of AI involvement in routine news-related discourse. This study examines how digital skepticism and AI-related discernment are expressed in naturally occurring social media discussions. Using an exploratory observational design, 6065 user-generated comments from 305 news-related Reddit threads were analyzed through a rule-based framework distinguishing general skepticism, structural suspicion, and explicit AI-related discernment. Within the sampled corpus, generalized digital skepticism is proportionally more visible than explicit attribution to AI-generated or synthetically produced content. Explicit AI-related attribution is unevenly distributed across discourse contexts, appearing more frequently in technology-oriented communities and remaining limited in mainstream news-related discussions. Differences across score-based visibility contexts do not correspond to a consistently higher representation of explicit AI attribution. These findings indicate a distributional difference between generalized skepticism and publicly articulated recognition of AI mediation. Rather than measuring levels of awareness, the results illuminate the contextual and linguistic conditions under which AI involvement becomes explicitly named in public interaction. By focusing on observable discourse rather than self-reported attitudes, the study provides a corpus-bound account of when AI mediation becomes discursively articulated in algorithmically mediated environments.

1. Introduction

Digital platforms have become central infrastructures through which users encounter and interpret information. In these environments, visibility is influenced by algorithmic ranking, personalization, and engagement-based feedback mechanisms in addition to editorial gatekeeping. A substantial body of research indicates that such arrangements are associated with patterns of attention allocation, the prominence of emotionally resonant content, and selective exposure, often operating outside users’ focal awareness [1,2]. As a result, information encounters increasingly occur within computationally mediated systems embedded in everyday online interaction.
Within algorithmically mediated environments, misinformation is not adequately captured by a simple true–false distinction. Contemporary scholarship instead conceptualizes misinformation as a contextual and bounded phenomenon situated within uncertainty, contested expertise, and evolving evidentiary standards [3]. Under such conditions, users frequently evaluate information without definitive verification, relying on interpretive judgment and situational cues. Responses to questionable content are therefore more appropriately understood as practices of uncertainty navigation rather than binary accuracy assessments.
Research in communication and information science indicates that users do not consistently engage in systematic verification when encountering online information. Credibility judgments are commonly guided by heuristic cues such as plausibility, narrative coherence, popularity, and social endorsement [4]. These heuristics can be understood as adaptive responses to information overload and operate within platform architectures that emphasize engagement metrics. As algorithmic curation influences content prominence, heuristic evaluation may become more visible in online discourse.
The increasing integration of AI-generated and AI-mediated content introduces additional complexity to these dynamics. Advances in foundation models and generative systems enable large-scale production of fluent text that can resemble human-authored content, often without explicit markers of authorship or intent [5]. These developments complicate assumptions about human authorship and content provenance, contributing to information environments in which production processes may not be readily transparent [6]. In such contexts, discernment may involve evaluation not only of factual claims but also of structural cues and potential automated mediation.
Recent scholarship suggests that disinformation in algorithmically mediated environments operates not only through factual inaccuracy but also through aesthetic and structural features associated with emotional resonance and circulation [7]. Algorithmic systems organize visibility and repetition, while automation does not always become explicitly articulated in user-facing interaction. This opacity has prompted ongoing discussion regarding transparency, accountability, and users’ ability to interpret computational mediation within information environments [8].
Skepticism toward online information does not necessarily entail explicit articulation of algorithmic or AI-mediated processes. Despite extensive research on misinformation, algorithmic curation, and AI ethics, systematic discourse-level examinations of how AI-related discernment is expressed in naturally occurring public interaction remain comparatively less developed than survey- or experiment-based approaches. Much of the literature relies on self-reported attitudes or experimentally elicited judgments. While valuable, these approaches provide limited visibility into how skepticism and attribution emerge spontaneously in routine discourse, where doubt unfolds without analytical prompting [9].
This limitation is particularly salient in news-related online environments, which play a central role in public sense-making and are characterized by high engagement and rapid circulation of content. It is therefore analytically relevant to examine whether such environments are associated with explicit articulation of AI mediation or whether AI-related attribution remains lexically unexpressed in routine discourse.
To investigate this question, the present study examines how digital skepticism and AI-related discernment are articulated in naturally occurring Reddit discussions. Rather than measuring self-reported awareness, the study analyzes observable discourse using a three-level framework capturing (L1) general skepticism toward content, (L2) structural suspicion regarding framing or coordination, and (L3) explicit attribution of AI-generated or synthetically produced content (i.e., AI-mediated provenance), distinguished from background ranking or recommendation. By analyzing comment-level expressions and aggregating them across discussion contexts, the study provides a corpus-bound account of how users articulate evaluative orientations within algorithmically mediated environments.

2. Literature Review

2.1. Algorithmic and AI-Mediated Information Environments

Contemporary information environments are increasingly mediated by algorithmic systems and AI-driven infrastructures that influence what becomes visible, salient, and socially reinforced. Rather than functioning as purely neutral intermediaries, digital platforms organize attention through ranking mechanisms, personalization, and engagement-based feedback loops. Critical scholarship characterizes these systems as socially embedded infrastructures associated with particular value orientations and operational logics that may not be explicitly articulated in routine interaction [10].
From a platform governance perspective, algorithmic systems extend beyond information distribution. They are implicated in shaping public discourse through ongoing processes of moderation, amplification, and removal [11]. Algorithmic mediation can be understood as a pervasive structural feature of contemporary information environments rather than an isolated intervention [2,12]. In such environments, popularity and engagement metrics operate as heuristic cues associated with perceptions of credibility and relevance [13,14]. Within this framework, algorithmic systems influence interpretation not through overt persuasion but by structuring conditions under which information is encountered and evaluated. This infrastructural role complicates conventional assumptions about authorship, responsibility, and accountability in contemporary information ecosystems.
Research suggests that algorithmic curation may be associated with more homophilic exposure patterns, sometimes described as “filter bubbles” or “echo chambers,” potentially limiting exposure to divergent perspectives [1]. Early discussions framed personalization primarily as a technological risk [15]. More recent analyses situate these dynamics within broader socio-technical arrangements combining platform design, user behavior, and incentive structures. Such arrangements may operate without explicit markers signaling the presence or influence of automated mediation.
The rise of AI-assisted and AI-generated content introduces additional complexity to these environments. Disinformation can operate through aesthetic and structural features—such as narrative coherence, emotional resonance, and stylistic plausibility—rather than through easily identifiable factual inaccuracies [7]. In these contexts, AI systems are implicated in producing and circulating content that resembles human-authored material, complicating discursive articulation of informational origins and production mechanisms.
From an information science perspective, these developments highlight a potential divergence between the structural centrality of algorithmic and AI mediation and their experiential articulation in everyday interaction. Algorithmic processes are associated with patterns of visibility and interpretation while often remaining backgrounded in routine engagement. This observation motivates a closer examination of how skepticism, suspicion, and explicit attribution are articulated in practice, complementing research that has primarily focused on cognitive awareness and attitudinal measures.

2.2. Misinformation, Uncertainty, and Skepticism in Digital Information Environments

Research on misinformation increasingly moves beyond binary distinctions between true and false information, emphasizing instead uncertainty, contested expertise, and evolving evidentiary standards in shaping how misinformation is defined and experienced [3,16]. From this perspective, misinformation is understood as contextually bounded: what counts as misleading is often evaluated relative to the best available expert knowledge at a given moment rather than against an absolute standard of falsity [3]. This framing shifts analytical attention away from static accuracy judgments toward how individuals interpret and respond to ambiguous or contested claims in real-world environments.
From an information behavior standpoint, an important distinction exists between ignorance and misperception. Scholarship further differentiates belief acceptance from attribution, demonstrating that individuals may reject inaccurate information without recognizing or articulating the source or mechanism of deception [17,18]. Nyhan and Reifler [17] show that uninformed individuals frequently acknowledge uncertainty, whereas misinformed individuals may hold incorrect beliefs with high confidence. This distinction complicates interpretations of skepticism in public discourse. Expressions of doubt do not necessarily indicate reflective or elaborated explanatory reasoning; they may coexist with motivated reasoning, incomplete explanatory models, or provisional assumptions.
A substantial body of experimental research documents the persistence of false beliefs even after explicit corrections, commonly referred to as the continued influence effect [17,19,20]. Syntheses of this literature identify cognitive mechanisms associated with resistance to correction, including mental model coherence, familiarity effects, retrieval failure, and reactance [19]. These mechanisms help explain why individuals may rely on narrative coherence and plausibility, particularly when corrective information disrupts existing explanatory frameworks [20]. Rather than assuming uniform irrationality, this body of research highlights the stability of cognitive structures under conditions of uncertainty.
These cognitive dynamics unfold within digitally mediated environments in which heuristic processing is common. When encountering information online, users rarely engage in systematic fact-checking. Instead, they often rely on affective, familiarity-based, and socially validated cues such as emotional resonance, perceived consensus, and source familiarity [20]. Experimental evidence suggests that susceptibility to misinformation may be linked more closely to inattention than to ideological commitment, as users do not consistently activate reflective accuracy judgments at the moment of exposure [21]. Social media contexts have been associated with greater emphasis on engagement-oriented considerations relative to explicit truth evaluation, which may correspond with lower observable accuracy discernment [20].
Within social media environments, these tendencies are frequently described as interacting with platform architectures that prioritize engagement, repetition, and social endorsement. Policy-oriented analyses characterize contemporary disinformation as operating at the level of information ecosystems, involving interactions among platform design, amplification mechanisms, and audience practices rather than relying solely on isolated false claims [22]. Repeated exposure has been associated with increased familiarity and perceived plausibility, which may facilitate circulation even in the absence of firm belief. Misinformation may therefore circulate not only because users endorse it, but also because it aligns with social identity, emotional expression, or performative norms within online communities.
Under these conditions, skepticism toward online content can take multiple forms. At a general level, users may express doubt or cynicism without articulating specific justificatory reasoning. At a structural level, skepticism may target institutional credibility, media bias, political intent, or perceived hidden agendas. However, explicit recognition of algorithmic or AI-mediated production does not consistently appear in mainstream discourse. Questionable content may instead be interpreted through predominantly human-centered frames, with intent attributed to journalists, political actors, or ideological groups rather than to computational systems.
Recent syntheses describe a divergence between reported public concern about misinformation and publicly articulated understanding of algorithmic mechanisms. Although survey-based research indicates high levels of perceived exposure and distrust toward platforms, understanding of algorithmic mechanisms associated with content visibility appears uneven [9]. This divergence suggests that skepticism may be more readily articulated than explicit technological discernment, particularly when such discernment requires acknowledgment of AI systems, automation, or recommender infrastructures. Within this distinction, AI-related discernment becomes analytically salient for examining contemporary information behavior at the discourse level.

2.3. AI Awareness, Algorithmic Literacy, and Discernment

Recent scholarship indicates that skepticism toward online information does not automatically translate into explicit awareness of algorithmic or AI-mediated processes [23]. In information science and communication research, this distinction is examined through concepts such as algorithmic awareness, algorithmic literacy, and AI literacy. These constructs refer to users’ recognition and understanding of how computational systems are involved in shaping information environments.
Algorithmic literacy is commonly conceptualized as a multidimensional construct encompassing both awareness of algorithmic presence and knowledge of how such systems operate. Dogruel et al. [24] distinguish between awareness—the recognition that algorithms are involved in ranking or personalization—and knowledge, which entails understanding their principles and possible consequences. Their validation of an algorithm literacy scale suggests that awareness alone may not be sufficient for informed evaluation or adaptive behavior. This distinction is analytically relevant for interpreting discursive responses to AI-mediated content, as explicit attribution requires not only recognition but also interpretive framing competence.
Research on AI literacy reinforces this differentiation. Long and Magerko [25] conceptualize AI literacy as a set of competencies including recognizing AI, understanding its capabilities and limitations, reasoning about its outputs, and evaluating its social and ethical implications. Recognition is treated as a foundational yet nontrivial competency that must be learned and practiced. Subsequent extensions highlight generative AI literacy, emphasizing the increasing difficulty of identifying AI-generated content when it is seamlessly embedded within everyday information flows [26]. These developments suggest that discursive recognition may depend on contextual cues rather than on generalized familiarity alone.
Despite expanding public discussion of AI risks and platform governance, empirical studies report variation between general concern and explicit discernment. Awareness of personalization mechanisms differs across platforms, user groups, and media systems. Cross-national research indicates that many users remain only partially aware of algorithmic curation, particularly where such systems are normalized as default features of news and social media consumption [27]. This uneven distribution of awareness provides context for understanding why explicit AI attribution may not consistently appear in mainstream discourse, even when skepticism is expressed.
The literature also proposes structural reasons for divergence between diffuse awareness and explicit discursive attribution. Algorithmic systems are frequently experienced as background infrastructures rather than as foreground objects of reflection [23,26]. Cox [26] notes that algorithmic literacy research often addresses AI embedded in opaque infrastructures, whereas AI literacy discussions more often presume identifiable systems as focal objects. Studies of perceived algorithmic fairness further indicate that users may evaluate automated systems through surface cues of transparency or justice without detailed knowledge of underlying mechanisms [28]. Under such conditions, generalized awareness may not readily translate into explicit discursive articulation.
Research on explainability further clarifies conditions under which awareness becomes discursively actionable. Shin [29] finds that explainability and causability are associated with variations in user trust through both heuristic and systematic processing routes. In the absence of explanatory cues, users may rely more heavily on surface heuristics. Such reliance can sustain general skepticism without necessarily enabling explicit reasoning about algorithmic mediation. Similarly, studies of algorithmic fairness suggest that perceptions of transparency and justice vary depending on how algorithmic processes are communicated and framed [28].
Importantly, increased awareness does not uniformly reduce skepticism and may, in some contexts, intensify critical scrutiny. Research on algorithm awareness and system acceptance suggests that awareness can coincide with both increased perceived usefulness and heightened sensitivity to bias or opacity [30]. This ambivalence indicates that explicit AI-related discernment should not be interpreted as a direct indicator of acceptance or rejection. Rather, it reflects a higher-order interpretive orientation shaped by literacy, contextual cues, and the situational visibility of algorithmic mediation—conditions that may not be consistently present in routine discourse.
While automated detection research focuses on whether AI-generated content can be technically identified, the present study adopts a complementary perspective. It examines whether AI mediation becomes explicitly recognized and named in public discourse. Accordingly, AI-related discernment is conceptualized as explicit discursive attribution of AI involvement in content generation rather than as generalized awareness of background processes such as ranking or recommendation. The study does not evaluate technical detectability, as detectability and discursive attribution operate at analytically distinct levels.

2.4. Positioning the Present Study

Existing research provides substantial insight into misinformation diffusion, heuristic credibility judgments, and variation in algorithmic awareness. Diffusion studies indicate that misleading content often circulates through human judgment and social dynamics rather than through automated agents alone [31]. Experimental research further demonstrates that credibility assessments are highly sensitive to contextual and surface cues [32]. Taken together, these findings suggest that information evaluation in digital environments is influenced by platform context and communicative cues, often without systematic consideration of content provenance.
Survey-based and measurement-oriented research further indicates that public awareness of algorithmic and AI-mediated processes is unevenly distributed. Cross-national evidence points to variation in algorithmic awareness across demographic groups, platforms, and media systems [27]. Distinctions between awareness of algorithmic presence and knowledge of algorithmic functioning suggest that recognition alone may not correspond with critical evaluation or adaptive behavior [24]. Research on AI literacy similarly emphasizes that recognizing AI involvement is a nontrivial competence that cannot be presumed in routine encounters [25].
Despite these advances, an important conceptual distinction warrants clarification: the distinction between awareness as a cognitive construct and attribution as a discursive act. Survey and experimental research typically measure reported knowledge of algorithms or AI systems, capturing awareness as an internal competence or belief state. Such measures are essential for understanding literacy, variation, and demographic differences. However, they do not indicate whether recognition of AI mediation becomes publicly articulated in everyday interaction.
Discursive attribution operates at a distinct analytical level from cognitive awareness. Whereas literacy constructs refer to internal knowledge or belief states, discursive attribution concerns observable communicative performance—specifically, whether AI involvement is explicitly invoked in public interaction. A user may possess AI literacy yet interpret content within a predominantly human-authored frame during routine discourse. Conversely, explicit attribution may emerge situationally without reflecting stable or generalized literacy. Distinguishing these levels enables the present study to complement awareness research by examining when and how AI recognition becomes discursively visible in naturalistic settings.
From an information behavior perspective, users may question credibility or intent while continuing to interpret content within a human-authored frame [33]. Computational mediation is frequently experienced as background infrastructure rather than as a focal object of reflection [23]. Under such conditions, skepticism does not necessarily correspond with explicit attribution of technological mediation. Heuristic and more systematic forms of processing may therefore coexist with limited discursive recognition of AI involvement.
The present study addresses this distinction by focusing on observable discursive behavior rather than on declared awareness or hypothetical judgments. Instead of examining whether users report understanding algorithms or AI systems, the study analyzes how skepticism and discernment are articulated within public discussions as part of routine interaction. It operationalizes three analytically distinct orientations at the comment level: (L1) general skepticism toward content claims, (L2) structural suspicion concerning framing or coordination, and (L3) explicit attribution of AI-generated content, distinguished from background algorithmic ranking or recommendation. By analyzing how these orientations appear and co-occur across threads and contexts, the study provides a corpus-bound account of how uncertainty and attribution are publicly enacted in algorithmically mediated environments.
By adopting this discourse-centered perspective, the study complements diffusion, misinformation, and literacy research by specifying how skepticism and AI recognition function as situated communicative practices rather than solely as internal competencies. The contribution does not revise or challenge prior findings; instead, it analytically differentiates cognitive awareness from publicly articulated attribution within contemporary digital environments. In doing so, the study advances a complementary line of inquiry into the conditions under which AI recognition becomes discursively expressed in routine interaction.

3. Methodology

3.1. Research Design

This study employs an exploratory, observational design to examine how digital skepticism and AI-related discernment are expressed in naturally occurring online discourse. The analytical objective is to identify observable patterns of information behavior in public discussion contexts rather than to test predefined hypotheses, estimate population-level prevalence, or model causal relationships. The design situates information behavior within algorithmically mediated environments, consistent with prior scholarship emphasizing ecologically grounded analysis [2,9].
A discourse-level orientation enables the identification of publicly articulated evaluative judgments that self-report instruments may not fully capture. Whereas surveys measure stated awareness, this approach examines enacted information behavior as it appears in observable communication. Quantitative summaries are used to support structured within-corpus comparison rather than inferential generalization beyond the sampled dataset.
All quantitative comparisons are treated as corpus-bound descriptive summaries under conditions of sparse base rates and fixed-cap sampling. Because comment-level observations are nested within discussion threads, assumptions required for classical statistical generalization—such as independence and random sampling—are not invoked. Interpretation therefore remains anchored in descriptive pattern comparison at the level of observable discourse.
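As a rough illustration of what such corpus-bound descriptive summaries involve, the following Python sketch computes per-context label proportions from comment-level records. The field names (`context`, `label`) and record structure are hypothetical assumptions for illustration and do not reproduce the study’s actual pipeline; no inferential statistics are applied, consistent with the design described above.

```python
from collections import defaultdict

def context_proportions(labeled_comments):
    """Per-context share of comments carrying each label (descriptive only).

    labeled_comments: iterable of dicts with 'context' and 'label' keys,
    where 'label' may be None for comments showing no skepticism markers.
    Returns {context: {label: proportion_of_all_comments_in_context}}.
    """
    totals = defaultdict(int)
    counts = defaultdict(lambda: defaultdict(int))
    for comment in labeled_comments:
        totals[comment["context"]] += 1          # denominator: all comments in context
        if comment["label"] is not None:
            counts[comment["context"]][comment["label"]] += 1
    return {ctx: {lab: n / totals[ctx] for lab, n in labs.items()}
            for ctx, labs in counts.items()}
```

Because comments are nested within threads, these proportions describe only the sampled corpus; they carry no claim of independence or random sampling.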
User-generated comments are analyzed as situated expressions of interpretive judgment within platform-mediated communicative practices. Rather than inferring latent attitudes or cognitive states, the analysis focuses on how skepticism and AI attribution are articulated—or remain absent—through identifiable linguistic signals occurring within platform-mediated infrastructures [14,23,34].
Accordingly, the unit of analysis is expressed discourse at the comment level. The study examines explicit linguistic indicators of doubt, suspicion, and technological attribution as they appear in responses to information encountered within algorithmically mediated environments. This design enables systematic comparison of discursive forms without relying on survey prompts, experimental manipulation, or assumptions about internal psychological states.

3.2. Data Source and Context

The empirical analysis draws on publicly available discussions from Reddit, a large-scale social media platform organized around topic-centered communities known as subreddits. Reddit supports extended text-based interaction, threaded discussion, and community-specific norms that facilitate explicit evaluation, contestation, and collective sense-making. These structural characteristics provide an analytically tractable setting for examining discursive expressions of digital skepticism and AI-related discernment [14,23].
The study focuses on news-related public discourse, defined as discussions referencing current events, public affairs, or informational content presented in a journalistic or quasi-journalistic style. Such contexts are analytically relevant because users routinely evaluate credibility, relevance, and intent under conditions of uncertainty. Exposure and ranking within Reddit are mediated through algorithmic and voting-based mechanisms, while personalization and infrastructural processes typically remain backgrounded within user-facing interaction.
Reddit’s topic-based structure enables structured comparison across discourse contexts with distinct orientations, including technology-focused and mainstream news communities. The platform’s visibility and voting mechanisms provide contextual signals associated with variation in how skepticism and AI-related attribution are articulated within the sampled corpus. Subreddit selection criteria and sampling procedures are detailed in Section 3.3.
Because discussions are public by default, naturally occurring information behavior can be observed without requiring inference about users’ identities or motivations [34]. The use of Reddit does not aim to achieve population-level generalization. Rather, the platform serves as an analytically bounded environment for examining how skepticism, suspicion, and technological attribution become publicly articulated within a large-scale, algorithmically mediated information ecosystem.

3.3. Data Collection Procedure

Data were collected from Reddit using the platform’s official application programming interface (API) in read-only mode. The collection strategy was designed to capture naturally occurring, news-related public discourse rather than responses elicited through surveys, prompts, or experimental manipulation. This approach enables observation of how skepticism and AI-related discernment are articulated in routine online interaction.
Discussion threads were retrieved from public subreddits that regularly host news-related or information-oriented content. Each thread was preserved as a coherent discursive context at the data-harvesting stage. The procedure did not involve post hoc filtering based on stance or classification outcome. The final dataset comprises 305 discussion threads and 6065 comments collected between August 2010 and August 2025.
Although the corpus spans 2010–2025, time is not modeled as a continuous sequence. Earlier years are retained as contextual background rather than treated as evidence of temporal progression. Because explicit AI-related attribution depends on historically situated lexical availability, segmented comparison is limited to analytically bounded periods (2019–2021; 2023–2025), reported in Section 4.5. This design avoids conflating the emergence of generative AI terminology with observable discursive articulation under differing historical conditions.
For analytical comparability across discourse contexts, a standardized extraction procedure was applied. Up to 20 comments were retrieved per discussion thread based on the API-returned “best/top” ordering; threads with fewer than 20 comments were included in full. This fixed-cap procedure supports bounded comparison across heterogeneous participation levels without privileging high-activity threads.
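As an illustration, the fixed-cap extraction step can be sketched as follows, assuming comments arrive already ranked by the API’s “best/top” ordering (function and variable names are illustrative and do not reproduce the study’s actual script):

```python
def cap_comments(ranked_comments, cap=20):
    """Return up to `cap` comments from an already-ranked list.

    Threads with fewer than `cap` comments are kept in full, so the cap
    bounds, but does not equalize, per-thread contribution.
    """
    return ranked_comments[:cap]


# Illustrative threads: one high-activity, one sparse.
busy_thread = [f"comment_{i}" for i in range(57)]
sparse_thread = ["only", "three", "comments"]

assert len(cap_comments(busy_thread)) == 20
assert cap_comments(sparse_thread) == sparse_thread
```

Because the cap is applied uniformly, high-activity threads contribute at most twenty comments each, which is what keeps cross-thread comparison bounded.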
Only comment text and limited thread-level contextual metadata were retained for analysis. No usernames, profile information, or personally identifiable attributes were collected or stored. Table 1 summarizes the dataset characteristics and units of analysis.

Sampling Strategy and Thread Selection

The selection of discussion threads followed a purposive, criteria-based approach designed to identify news-related public discourse in which credibility assessment and interpretive judgment are commonly articulated. The design prioritizes analytical relevance and transparency rather than statistical representativeness, consistent with discourse-analytic approaches to information behavior.
Threads were eligible for inclusion if they met three criteria: (1) the initiating post referenced a current event, public issue, or informational claim presented in a journalistic or quasi-journalistic style; (2) the thread generated substantive user discussion beyond minimal reactions; and (3) the discussion was publicly accessible without registration or membership restrictions.
The term news-related discourse refers to topical orientation rather than source origin. Threads included content originating from professional news outlets as well as informational posts functioning as news-like references within Reddit communities, such as summaries of reports, blog excerpts, or reposted media content.
Thread identification was conducted through systematic keyword searches combined with subreddit-specific browsing, followed by screening against predefined inclusion criteria. Keywords functioned as structured retrieval anchors to identify candidate discussions involving public events, informational claims, and controversies in which credibility evaluation was textually present. Representative terms included event-related markers (e.g., “report,” “investigation,” “leak,” “breaking”), media-referential cues (e.g., “video,” “image,” “source,” “evidence”), and manipulation-related triggers (e.g., “misinformation,” “fake,” “manipulated,” “AI-generated,” “deepfake”).
Keywords served exclusively to retrieve candidate threads. Final inclusion was determined through qualitative screening against predefined eligibility criteria rather than by keyword presence. Once threads met inclusion requirements, all comments within the standardized fixed-cap extraction were retained without additional content-based filtering. Keywords did not determine classification outcomes or subsequent analytical categorization.
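The retrieval step can be sketched as a simple case-insensitive match against the representative terms listed above; this is a minimal illustration only, since the full operational keyword list is not reproduced here and final inclusion was determined by manual screening rather than keyword presence:

```python
import re

# Representative retrieval anchors from the three groups described above
# (event-related, media-referential, manipulation-related); illustrative only.
RETRIEVAL_TERMS = [
    "report", "investigation", "leak", "breaking",
    "video", "image", "source", "evidence",
    "misinformation", "fake", "manipulated", "AI-generated", "deepfake",
]
PATTERN = re.compile("|".join(re.escape(t) for t in RETRIEVAL_TERMS),
                     re.IGNORECASE)

def is_candidate(post_title: str) -> bool:
    """Flag a thread as a retrieval candidate only; final inclusion
    remains a qualitative screening decision, not a keyword match."""
    return PATTERN.search(post_title) is not None

assert is_candidate("Breaking: leaked report on data misuse")
assert not is_candidate("Weekly open discussion thread")
```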
The final corpus includes both sparse and highly active threads. Variation in participation volume was retained as an empirical feature of news-related discourse rather than treated as sampling noise.
Five subreddits—r/Futurology, r/technology, r/news, r/worldnews, and r/politics—were selected to enable structured comparison between technology-oriented and mainstream news contexts. This comparison does not assume equivalence between communities but examines how discursive recognition of AI mediation is articulated across distinct informational environments within the sampled corpus.
The keyword strategy was aligned with the conceptual distinction between content-level skepticism and mediation-level attribution developed in Section 2.1 and Section 2.3. Event-oriented and media-artifact terms were included to retrieve contexts in which evaluative discourse concerning authenticity and evidence was textually articulated, while manipulation- and AI-related triggers were included to identify contexts in which explicit attribution to automated or synthetic processes might be textually expressed. This alignment ensures coherence between theoretical framing and empirical identification without predetermining analytical outcomes.

3.4. Data Preparation and Units of Analysis

Data preparation enabled systematic analysis while preserving the natural structure of online discourse. All collected comments were screened to confirm the presence of textual content and essential contextual metadata. No normalization, sentiment transformation, or linguistic modification was applied beyond the minimal preprocessing required to associate comments with their corresponding discussion threads and discourse contexts.
The primary unit of analysis is the individual comment. Comments represent the smallest observable unit through which users articulate evaluative judgments, express skepticism, or explicitly attribute information to AI-mediated processes. This operationalization captures explicitly articulated discursive moments consistent with the rule-constrained coding framework. Treating comments as the primary analytical unit enables fine-grained identification of discursive expressions without presupposing coherence or intent at the user or thread level, consistent with discourse-oriented approaches to information behavior [2,23].
Comment-level classifications were aggregated at the thread level, which serves as the secondary unit of analysis. Aggregation summarizes the distribution of classified comment types within each discussion thread and enables examination of how skepticism and AI-related discernment manifest collectively within shared discourse environments. This two-level structure supports cross-context comparison while retaining sensitivity to individual contributions.
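As a schematic illustration of this two-level structure, comment-level classifications can be aggregated into thread-level summaries of presence and proportional representation (data and function names are illustrative):

```python
def aggregate_thread(comment_labels):
    """Summarize a thread's comment-level classifications as per-category
    presence and proportional representation within the thread."""
    n = len(comment_labels)
    summary = {}
    for cat in ("L1", "L2", "L3"):
        count = sum(labels[cat] for labels in comment_labels)
        summary[cat] = {"present": count > 0, "proportion": count / n}
    return summary

# Toy thread of four comments; one comment carries both L1 and L3.
thread = [
    {"L1": True,  "L2": False, "L3": False},
    {"L1": False, "L2": False, "L3": False},
    {"L1": True,  "L2": False, "L3": True},
    {"L1": False, "L2": False, "L3": False},
]
summary = aggregate_thread(thread)
assert summary["L1"] == {"present": True, "proportion": 0.5}
assert summary["L2"]["present"] is False
```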
The analysis does not include user-level modeling, network analysis, or temporal sequence modeling. User identifiers were not retained, and multiple contributions by the same individual were treated as independent discursive expressions rather than repeated measures of user-level cognition. This design choice is consistent with the study’s focus on expressed information behavior as articulated in text, not on modeling individual trajectories or interaction dynamics.
Engagement was operationalized at the thread level as a contextual indicator of score-based visibility. Because the dataset was constructed using a fixed cap (up to 20 comments per thread), engagement is not treated as a continuous measure of participation intensity. Instead, visibility contexts were distinguished using the thread-level mean comment score, which serves as a platform-specific proxy for collective valuation within Reddit’s voting and ranking infrastructure. It does not capture individual attention, persuasion, or interpretive impact.
The study does not conduct inferential hypothesis testing for engagement effects or population-level estimation. All quantitative summaries are interpreted as corpus-bound descriptive comparisons under fixed-cap sampling and sparse base-rate conditions. Because comments are nested within threads and are not assumed to constitute independent observations, observed percentage differences are understood as contextual distributional patterns rather than inferential effects.
Conceptually, visibility is treated as a platform-level condition tied to ranking and collective valuation rather than as a proxy for individual comprehension or awareness. On Reddit, score-based signals primarily reflect amplification and exposure dynamics operating within algorithmic ordering mechanisms. In this study, visibility situates discourse environments under different score-based conditions while remaining analytically agnostic with respect to individual-level cognition. This treatment aligns with platform-oriented perspectives that conceptualize engagement as an infrastructural feature of information circulation.

3.5. Analytical Framework and Coding Procedure

The analysis focuses on observable information behavior as articulated in public discourse. No inference is extended beyond explicitly expressed textual cues.
The L1–L3 framework is not a psychological scale, developmental sequence, or validated measurement instrument. It is an analytically constructed typology designed to capture distinct orientations toward information and its production as articulated in text. This approach aligns with discourse-analytic and practice-oriented perspectives in information science, which emphasize what users do in communicative settings rather than what they possess as internal cognitive states [2,23].
The framework is informed by three strands of prior research. First, scholarship on digital and informational skepticism shows how users express doubt and credibility assessment without necessarily articulating underlying mechanisms of information production. Second, research on misinformation and media manipulation documents discursive practices that question structure, framing, or authenticity—such as claims of staging or propaganda—without explicit reference to AI-based generation. Third, studies of algorithmic mediation demonstrate that automated systems operate within information environments without being consistently verbalized in everyday discourse [2,23].
The present study applies three analytically defined categories for discourse-level analysis. These categories differentiate how skepticism and attribution are expressed linguistically. They do not infer degrees of knowledge, competence, or literacy. The framework is designed for interpretive clarity and analytical transparency rather than psychometric validation.
Three analytically distinct but non-mutually exclusive categories were applied; they are defined as follows.
General skepticism (L1) refers to expressions of doubt directed at the credibility, accuracy, or authenticity of information without reference to content-generation mechanisms.
Structural suspicion (L2) captures skepticism oriented toward the form, framing, or construction of content—such as references to staged, scripted, misleading, or propagandistic formats—without explicit attribution to AI.
Explicit AI-related discernment (L3) is operationalized narrowly and captures only direct attribution to AI-generated or synthetically produced content (e.g., AI-generated text, synthetic narration, deepfakes). References to algorithmic ranking, curation, or recommendation without attribution to content generation are not classified as L3.
Analytical separation between L1 and L2 distinguishes skepticism directed at content claims from suspicion directed at content construction. Overlap between L1 and L2 is expected, as these orientations frequently co-occur in naturalistic communication. Such overlap reflects layered discursive evaluation rather than category ambiguity.
Prior research indicates that AI-generated content can often be identified through computational methods without corresponding human recognition [35,36,37,38]. Technical detectability and discursive attribution operate at distinct analytical levels. For this reason, L3 is restricted to explicit verbal attribution to AI in order to avoid conflating computational detection with observable recognition.
The L1–L3 categories are treated as coexisting analytical lenses rather than stages along a continuum. Table 2 summarizes the operational definitions and illustrative indicators for these categories. A single comment may be classified into more than one category when multiple orientations are explicitly expressed. This reflects the layered nature of discursive evaluation.
Classification relied on a transparent, rule-based coding scheme grounded in explicit linguistic cues and constrained contextual usage observed in Reddit discourse. Coding was based on keyword patterns and phrase-level indicators rather than probabilistic inference or machine-learning classification. This approach is designed to support interpretability and reproducibility by allowing direct inspection of how classifications are derived. The finalized rule set is documented in Appendix A.
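The rule-based, multi-label logic can be sketched as below. The cue patterns shown are illustrative stand-ins only; the study’s finalized rule set is documented in its Appendix A and is not reproduced here:

```python
import re

# Illustrative cue patterns for each category; NOT the study's actual rules.
CUES = {
    "L1": [r"\bI doubt\b", r"\bnot credible\b", r"\bunreliable\b",
           r"\bfake news\b"],
    "L2": [r"\bstaged\b", r"\bscripted\b", r"\bpropaganda\b"],
    "L3": [r"\bAI[- ]generated\b", r"\bdeepfake\b",
           r"\bwritten by (an? )?AI\b", r"\bsynthetic (text|narration|video)\b"],
}
COMPILED = {label: [re.compile(p, re.IGNORECASE) for p in pats]
            for label, pats in CUES.items()}

def classify(comment: str) -> dict:
    """Multi-label, rule-based classification: a comment may satisfy
    several categories, and most comments satisfy none."""
    return {label: any(p.search(comment) for p in pats)
            for label, pats in COMPILED.items()}

labels = classify("Honestly I doubt this; it looks AI-generated.")
assert labels == {"L1": True, "L2": False, "L3": True}
assert classify("Great article, thanks for sharing.") == {
    "L1": False, "L2": False, "L3": False}
```

Because every classification reduces to an inspectable pattern match, any individual label can be traced back to the specific cue that triggered it.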

3.5.1. Coding Procedure and Reliability Assessment

Coding was implemented using a hybrid procedure combining deterministic rule-based automation with human reliability validation and adjudicated ground-truth benchmarking.
The automated rule-based classifier served as the primary coding mechanism applied to the full dataset of 6065 comments. Human coding was conducted on a validation subset to assess interpretive consistency, evaluate classifier performance, and establish an adjudicated benchmark for comparison.
All comments were processed using a Python (version 3.12.12) script that applied predefined keyword and pattern-matching rules derived directly from the finalized codebook (Appendix A). The rule set was fully specified prior to validation and was not modified thereafter. This ensured consistent and reproducible classification under transparent, inspection-ready coding logic.
To assess reliability, a validation subset of 300 comments (approximately 5% of the dataset) was independently double-coded by two trained human coders. Each indicator (L1, L2, L3) was coded as a binary variable (TRUE/FALSE), and multiple classifications were permitted within a single comment. Coders were instructed to rely exclusively on explicit textual cues as defined in the rulebook and were prohibited from inferring author intent beyond observable discourse.
Within the adjudicated validation subset (n = 300), 144 comments were positive for L1, 23 for L2, and 27 for L3. Intercoder reliability was assessed separately for each indicator using Cohen’s kappa (κ), which is appropriate for binary nominal variables [39,40]. Agreement levels were κ = 0.72 for L1, κ = 0.47 for L2, and κ = 0.76 for L3, corresponding to commonly used interpretive benchmarks of substantial agreement for L1 and L3 and moderate agreement for L2.
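For reference, Cohen’s kappa for two binary label sequences can be computed directly from observed and chance-expected agreement; the toy data below are illustrative and unrelated to the validation subset:

```python
def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two binary (True/False) label sequences:
    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is the agreement expected by chance from the marginal rates."""
    n = len(coder_a)
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    pa_true = sum(coder_a) / n
    pb_true = sum(coder_b) / n
    p_e = pa_true * pb_true + (1 - pa_true) * (1 - pb_true)
    return (p_o - p_e) / (1 - p_e)

# Toy example: 10 comments, coders disagree on exactly one.
a = [True, True, False, False, False, True, False, False, False, False]
b = [True, True, False, False, False, False, False, False, False, False]
print(round(cohens_kappa(a, b), 3))  # → 0.737
```

Note that kappa penalizes chance agreement: with sparse positive labels, high raw agreement can still yield only moderate kappa, which is one reason sparse categories such as L2 tend to show lower values.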
Disagreements were resolved through structured adjudication. Discrepant cases were jointly reviewed using the predefined rulebook, and final labels were determined by consensus strictly according to operational criteria. Automated classifier outputs were not visible during adjudication to prevent circular confirmation. The resulting consensus labels constituted the adjudicated ground truth for classifier evaluation. This benchmark reflects rule-constrained human interpretation aligned with theoretical definitions rather than independent reinterpretation.
Classifier performance was evaluated by comparing automated labels against the adjudicated ground truth. The classifier achieved high precision across all indicators (L1 = 0.956; L2 = 1.000; L3 = 1.000). Recall remained strong for L1 (0.903) and L3 (0.852), while L2 recall was lower (0.696). Corresponding F1-scores were 0.929 (L1), 0.821 (L2), and 0.920 (L3). Table 3 reports the full confusion matrix and performance metrics. No false positives were observed for L2 or L3 within the validation subset, indicating that any under-detection in these categories takes the form of conservative false negatives under the specified rule configuration.
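The reported F1-scores follow from the stated precision and recall values via the harmonic mean, F1 = 2PR / (P + R); a minimal check:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall: F1 = 2PR / (P + R)."""
    return 2 * precision * recall / (precision + recall)

# Precision/recall pairs as reported for the validation subset.
reported = {"L1": (0.956, 0.903), "L2": (1.000, 0.696), "L3": (1.000, 0.852)}
for label, (p, r) in reported.items():
    print(label, f"{f1(p, r):.3f}")
# prints: L1 0.929, L2 0.821, L3 0.920
```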
The comparatively lower recall for L2 is consistent with the boundary-sensitive and context-dependent nature of structural suspicion as a discourse-level construct. Unlike L3, which depends on explicit lexical attribution, L2 captures orientations toward coordinated or systemic manipulation that are often indirectly framed and lack stable lexical markers. The classifier was specified to minimize false positives, prioritizing precision over sensitivity. Under sparse base-rate conditions, minor interpretive differences can influence proportional estimates; L2 prevalence in the full dataset is best understood as a conservative lower-bound estimate of explicitly articulated structural suspicion.
Importantly, no rule modifications, threshold adjustments, or recalibration were introduced following validation. The full dataset was not re-coded after evaluation. Reported metrics therefore reflect the predefined rule-based system as specified, without post hoc optimization.

3.5.2. Temporal Context and Analytical Scope

The dataset spans multiple years during which public discourse surrounding artificial intelligence and algorithmic mediation underwent substantial transformation. To address temporal heterogeneity, the analysis adopts a coarse-grained temporal framing rather than modeling fine-grained dynamics or assuming continuity over time.
Two periods are distinguished: an early period (2019–2021) and a recent period (2023–2025). This segmentation captures broad differences in the sociotechnical context of information production and interpretation, particularly across periods associated with differing levels of public salience of generative AI systems. The intervening year (2022) is treated as transitional and excluded from comparative analysis. Methodologically, this exclusion limits boundary ambiguity associated with partial technological diffusion and uneven public recognition of generative AI during a phase of rapid adoption and shifting media attention. During such transitional phases, interpretive frames may coexist without stable lexical markers, increasing ambiguity in discourse-level attribution. Omitting 2022 from direct comparison therefore supports analytical clarity and sharpens the differentiation between pre-diffusion and post-diffusion contexts.
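This segmentation reduces to a simple lookup in which the transitional year and contextual-background years map to no comparison period (function name illustrative):

```python
def comparison_period(year: int):
    """Map a comment's year to the segmented comparison design:
    'early' (2019-2021), 'recent' (2023-2025), or None for years excluded
    from direct comparison (transitional 2022 and contextual-background
    years outside both windows)."""
    if 2019 <= year <= 2021:
        return "early"
    if 2023 <= year <= 2025:
        return "recent"
    return None

assert comparison_period(2020) == "early"
assert comparison_period(2022) is None   # transitional year excluded
assert comparison_period(2024) == "recent"
assert comparison_period(2014) is None   # contextual background only
```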
This framing does not constitute a longitudinal or time-series design and does not assume linear trends, developmental trajectories, or causal temporal effects. Time is treated as a contextual condition associated with differences in the discursive availability and recognizability of AI as an interpretive frame rather than as an explanatory variable. The comparison is therefore contextual rather than longitudinal. It assesses whether explicit AI-related attribution is discursively available under different historical conditions, not whether individual recognition increases over time.
This approach recognizes that aggregating discourse across extended time spans may obscure contextual variation. By treating time as a conditioning factor rather than a causal mechanism, the study preserves analytical validity while avoiding overinterpretation of sparse temporal signals. Discourse from earlier years functions as a contextual baseline rather than as evidence of stable cross-era patterns of AI-related recognition.

3.6. Analytical Procedures

The analysis followed a structured, non-inferential procedure to describe how skepticism and AI-related discernment are articulated within public online discourse. The process consisted of three stages: comment-level classification, thread-level aggregation, and contextual contrast. This sequential structure aligns the analytical framework with the defined units of analysis across stages. Figure 1 provides an overview of the analytical workflow.
  • Stage 1: Comment-Level Classification
Each comment was classified according to the framework described in Section 3.5 using the predefined rule-based coding scheme. Comments were evaluated for explicit linguistic cues corresponding to General Skepticism (L1), Structural Suspicion (L2), and Explicit AI-Related Discernment (L3). Because categories are non-mutually exclusive, a single comment could receive multiple classifications when more than one orientation was expressed. This approach follows discourse-oriented perspectives that prioritize observable evaluative expressions rather than latent psychological inference [4,23].
  • Stage 2: Thread-Level Aggregation
Classified comments were aggregated at the thread level to examine how skepticism and AI-related discernment are articulated collectively within shared discourse contexts. For each discussion thread, L1, L2, and L3 classifications were summarized descriptively in terms of their presence and proportional representation within the thread. These summaries constitute a corpus-bound account of how discursive orientations appear across contexts without implying statistical generalization or effect estimation. The aggregation supports structured cross-context contrast while preserving sensitivity to variation in observable discursive expression [2].
  • Stage 3: Contextual Comparison
Thread-level summaries were examined across discourse contexts, including technology-oriented and mainstream news subreddits, as well as across score-based visibility conditions. Engagement indicators were used descriptively to provide contextual reference for discourse environments rather than as predictors of individual behavior. The analysis examined contextual differences in the observed presence and co-occurrence of L1–L3 classifications across settings without fitting causal or inferential statistical models.
Figure 1 illustrates the sequential analytical workflow. Data were collected via the official Reddit API, followed by comment-level rule-based classification into L1 (general skepticism), L2 (structural suspicion), and L3 (explicit AI-related discernment). Classified comments were then aggregated at the thread level to summarize discursive patterns within shared contexts. These summaries enabled descriptive cross-context contrast without inferential modeling.

4. Results

This section reports descriptive patterns derived from the comment-level classification and thread-level aggregation procedures outlined in Section 3.6. As the study adopts a discourse-analytic and observational design, the focus is on identifying when and where skepticism and AI-related attribution become discursively visible within this sampled corpus rather than on estimating causal effects or population prevalence.

4.1. Distribution of Digital Skepticism and AI Discernment

The analysis includes 6065 comments drawn from 305 discussion threads within news-related Reddit discourse. Each comment was evaluated for the presence of three categories: General Skepticism (L1), Structural Suspicion (L2), and Explicit AI-Related Discernment (L3). Because classifications are non-mutually exclusive, a single comment could receive multiple labels.
Across the full dataset, General Skepticism (L1) was identified in 2.24% of comments. Structural Suspicion (L2) appeared in 0.26%, and Explicit AI-Related Discernment (L3) in 0.38%. In proportional terms within this corpus, instances of L1 were more frequent than instances of L2 or L3. Direct attribution of content to AI mediation or synthetic generation was observed infrequently in the sampled dataset under the applied coding criteria.
These proportions provide a descriptive reference point for subsequent corpus-bound comparisons across score-based visibility conditions and subreddit-level discourse contexts.

4.2. Digital Skepticism and AI Discernment by Score-Based Visibility Context

Visibility contexts were defined at the thread level using a median split of the thread-level mean comment score (median = 18.75). Threads with a mean score ≥ 18.75 were categorized as higher-score (higher-visibility), and those below this threshold as lower-score (lower-visibility). This operational division was employed to enable structured descriptive comparison under the fixed-cap sampling design and does not imply categorical platform strata. The distribution of L1–L3 classifications across visibility contexts is summarized in Table 4.
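The median-split assignment can be sketched as follows; the scores below are synthetic illustrations, whereas the reported median of 18.75 derives from the study’s own corpus:

```python
from statistics import median

def split_by_visibility(thread_mean_scores):
    """Median-split threads into higher-score (>= median) and
    lower-score (< median) visibility contexts."""
    m = median(thread_mean_scores.values())
    higher = {t for t, s in thread_mean_scores.items() if s >= m}
    lower = set(thread_mean_scores) - higher
    return m, higher, lower

# Illustrative thread-level mean comment scores (not the study's data).
scores = {"t1": 4.2, "t2": 18.75, "t3": 51.0, "t4": 12.4}
m, higher, lower = split_by_visibility(scores)
assert higher == {"t2", "t3"}
assert lower == {"t1", "t4"}
```

The split is a thread-level partition only; it carries no assumption about individual comment scores within a thread.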
Across contexts, all three categories were observed in both higher- and lower-score threads. In absolute terms, 48 instances of general skepticism (L1), 6 of structural suspicion (L2), and 10 of explicit AI-related discernment (L3) were identified in higher-score threads (N = 3040 comments), compared with 88 (L1), 10 (L2), and 13 (L3) in lower-score threads (N = 3025 comments).
In proportional terms, L1 appears at a rate of 2.91% in lower-score threads and 1.58% in higher-score threads. For L2, proportional representation is 0.33% in lower-score threads and 0.20% in higher-score threads. For L3, proportional representation is 0.43% and 0.33%, respectively. These numerical contrasts are presented descriptively and should be interpreted in conjunction with the small underlying absolute counts.
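These percentages follow directly from the reported counts and context denominators; a minimal check:

```python
def pct(count, total):
    """Proportional representation of a category, as a percentage string."""
    return f"{100 * count / total:.2f}%"

# Counts and comment totals as reported for the two visibility contexts.
lower_n, higher_n = 3025, 3040
assert pct(88, lower_n) == "2.91%" and pct(48, higher_n) == "1.58%"   # L1
assert pct(10, lower_n) == "0.33%" and pct(6, higher_n) == "0.20%"    # L2
assert pct(13, lower_n) == "0.43%" and pct(10, higher_n) == "0.33%"   # L3
```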
Given the bounded comment volume per thread, sparse base rates, and nested data structure, these patterns are interpreted strictly as corpus-bound descriptive contrasts rather than as indicators of substantive magnitude or statistical effect.
No assumptions of independence, population estimation, engagement causality, or ranking effects are invoked. Visibility-based contrasts therefore describe bounded differences observed within this sampled corpus rather than generalized platform-level dynamics.

4.3. Digital Skepticism and AI Discernment Across Discourse Contexts

To examine contextual contrasts within the sampled corpus, comment-level proportions were compared across subreddits representing technology-oriented and mainstream news-oriented discussions.
Explicit AI-related discernment (L3) was observed in technology-oriented contexts within this dataset, including 16 instances in r/Futurology and 5 in r/technology. Two instances were identified in r/politics, and none were identified in r/news or r/worldnews under the applied coding criteria. Given the low base rates under conservative rule specification, these numerical contrasts are interpreted descriptively and in relation to the underlying absolute counts rather than as indicators of substantive magnitude.
General skepticism (L1) was observed across all subreddits, with proportional representation ranging from 1.11% (r/technology) to 3.33% (r/Futurology). Structural suspicion (L2) was observed in all contexts except r/Futurology. In r/worldnews, nine instances (1.13%) were identified within the sampled corpus. Given the sparse base rates, this value is treated as a corpus-bound observation rather than as evidence of contextual elevation. Because L2 and L3 occur infrequently, these patterns represent limited observable articulations within each discourse environment rather than estimates of underlying cognitive prevalence.
Taken descriptively, explicit AI attribution appears in comparatively greater proportional representation within technology-oriented discussions in this corpus, whereas general skepticism is observed across both technology and mainstream news contexts. Structural suspicion is present at differing proportional levels across discourse contexts; however, the design does not support inference regarding broader platform-level dynamics, ranking mechanisms, or engagement effects.
Figure 2 visualizes comment-level proportions of L1 (general skepticism), L2 (structural suspicion), and L3 (explicit AI-related discernment) across subreddits. The visualization summarizes corpus-bound descriptive contrasts and should be interpreted in conjunction with the raw counts reported in Table 5.

4.4. Consistency Across Levels of Analysis

To assess descriptive consistency across analytical levels, comment-level and thread-level summaries were examined in parallel. Subreddits in which Explicit AI-Related Discernment (L3) was present at the thread level within this dataset also showed corresponding comment-level instances of L3. Similarly, contexts with minimal or no thread-level L3 showed comparably low comment-level frequencies.
A comparable descriptive correspondence was observed for General Skepticism (L1) and Structural Suspicion (L2). No notable cross-level inconsistencies were identified within the sampled corpus.

4.5. Temporal Comparison of Digital Skepticism and AI Discernment

To address temporal heterogeneity, a segmented comparison was conducted between an early period (2019–2021) and a recent period (2023–2025). The comparison is descriptive and does not model longitudinal change. Instead, it contrasts observable proportions of skepticism and AI-related attribution across two sociotechnical contexts separated by the widespread public diffusion of generative AI tools.
At the comment level, General Skepticism (L1) was observed at 1.43% in the early period and 1.98% in the recent period. Structural Suspicion (L2) was observed at 0.95% in the early period and 0.15% in the recent period. Explicit AI-Related Discernment (L3) was not observed in the early period (0 instances) and was identified in 20 comments (0.59%) in the recent period.
These numerical contrasts are interpreted within the constraints of the fixed-cap sampling design. Because base rates are low and comments are nested within threads, small absolute differences may appear proportionally amplified when expressed as percentages. The segmented framing does not assume temporal continuity, causal sequencing, developmental progression, or independent observations across periods.
Thread-level summaries display comparable proportional contrasts across periods, indicating descriptive cross-level correspondence within the segmented corpus. The proportional distribution of L1–L3 classifications across the early and recent periods is summarized in Table 6.
Accordingly, the segmented comparison reflects bounded descriptive contrasts between two observed discourse contexts rather than evidence of modeled temporal change.

4.6. Low Prevalence of Discursive Skepticism and AI Attribution

As reported in Section 4.1, the overall proportion of classified indicators is low within the sampled corpus. General Skepticism (L1) was identified in 2.24% of comments, Structural Suspicion (L2) in 0.26%, and Explicit AI-Related Discernment (L3) in 0.38%. These values are reported under the fixed-cap sampling design implemented to ensure cross-thread comparability.
Classification relied strictly on explicit linguistic cues and direct textual attribution. Comments were coded only when they contained unambiguous expressions corresponding to the operational definitions of L1–L3. Accordingly, the reported proportions represent explicitly articulated discursive expressions rather than inferred attitudes, competencies, or latent orientations.
Validation results (Section 3.5.1) indicate that the rule-based classifier was specified under a precision-oriented configuration. In the validation subset, no false positives were observed for L2, while recall was lower relative to L1 and L3. This pattern is consistent with the boundary-sensitive character of structural suspicion, which depends on explicit reference to coordination, systemic manipulation, or structured intent. Under this specification, ambiguous or implicitly framed suspicion was not classified as structural suspicion unless it met predefined operational criteria.
For example, statements such as “the media always pushes a narrative” or “this feels orchestrated somehow” may express generalized suspicion but do not necessarily include explicit structural attribution. Under the rule-constrained coding framework, such comments were evaluated strictly against predefined criteria and were not coded as L2 when structural attribution remained implicit.
This approach maintains conceptual separation between generalized skepticism (L1), structurally oriented suspicion (L2), and explicit AI attribution (L3). The conservative specification is intended to support interpretive transparency and to limit inflation under sparse base-rate conditions.
Similarly, the L3 category demonstrated high precision in validation. Given the sparse base rates, proportional contrasts should be interpreted cautiously and in relation to underlying absolute counts. The low observed proportion of L3 therefore reflects the limited frequency of explicit lexical attribution under the applied coding criteria within the sampled corpus rather than absence of potential background recognition.

5. Discussion

Throughout this discussion, discursive silence regarding AI mediation is not interpreted as evidence of cognitive unawareness. Rather, it is treated as a function of discursive and infrastructural conditions that shape what forms of recognition become publicly articulated within everyday online interaction.

5.1. Interpreting the Observed Distributional Difference Between Digital Skepticism and AI Discernment

Within the sampled corpus and under the conservative rule-based specification adopted in this study, a distributional difference is observed between generalized digital skepticism and explicit AI-related discernment. Instances of L1 are numerically more common than instances of L3 within the applied coding framework. Across the collected discussions, explicit AI attribution does not consistently appear alongside everyday expressions of skepticism in news-related contexts. Rather, AI mediation becomes discursively expressed under specific lexical and contextual conditions identifiable within the dataset.
This distributional pattern should not be interpreted as an absence of awareness of AI technologies. Rather, it reflects the analytical distinction between implicit background recognition and explicit discursive attribution. The present study examines only what becomes textually articulated and lexically identifiable within publicly observable discourse. Expressions of skepticism toward credibility appear throughout the sampled corpus, whereas explicit technological attribution appears in a smaller number of comments under the applied criteria.
The observed proportion of explicit AI-related discernment must also be understood in relation to the operational logic of the coding framework. The rule-based classification prioritizes explicit lexical anchoring and minimizes interpretive expansion beyond clearly defined criteria. Consequently, discursive expressions that implicitly gesture toward automated mediation without employing recognizable AI-related terminology fall outside the L3 category. The reported proportions therefore represent conservative, discourse-bound estimates of explicit attribution rather than comprehensive indicators of latent awareness.
The inclusion of Structural Suspicion (L2) as a distinct analytical category preserves conceptual differentiation between diffuse credibility questioning (L1) and direct technological attribution (L3). L2 captures discourse oriented toward staging, coordination, or systemic manipulation without invoking AI explicitly. This intermediate distinction allows analytical differentiation without reducing discursive orientations to a binary skepticism-versus-AI framework.
Importantly, L2 is not conceptualized as a stable prevalence construct but as an analytic device that maintains theoretical granularity within the typology. Given its boundary-sensitive character and sparse base rate, empirical patterns involving L2 function as illustrative within-corpus manifestations rather than proportionally robust comparative indicators.
Taken together, the findings document a corpus-bound distributional difference under rule-constrained coding conditions, reflecting observable patterns of public articulation within platform-specific and linguistically situated contexts rather than supporting inference about underlying levels of awareness or recognition.

5.2. Visibility and Contextual Conditions of Discursive AI Attribution

Across score-based visibility contexts, explicit AI-related discernment is not consistently more prevalent in higher-score threads within this dataset. Comment-level counts indicate that L1, L2, and L3 instances are present in both higher- and lower-score contexts under the fixed-cap sampling design. Given the sparse base rates and nested observations, these differences are interpreted strictly as within-corpus distributional contrasts rather than as intensity- or engagement-based effects.
Within this sampled corpus, score-based visibility is treated descriptively as a contextual feature associated with the API-returned ordering of comments. Under the fixed-cap procedure, many higher-score threads include numerous short alignment or stance-taking responses focused on the substantive claim of the initiating post. Comments raising questions about provenance, mediation, or technological production are also observed but may appear lower in the retrieved ordering. These observations describe the distribution of comment types within the sampled and ordered dataset; they do not establish how ranking mechanisms shape discourse across the broader platform.
Illustrative excerpt (anonymized, corpus-derived and lightly paraphrased): In one higher-score thread, a top-ranked comment expressed strong stance-taking toward the substantive claim (e.g., “This is completely wrong—anyone can see that”), followed by brief alignment replies such as “Exactly” or “Well said.” Within the same discussion, a lower-ranked comment questioned the source (“Where is this information coming from?”) and suggested possible editing or manipulation. Explicit AI-related attribution was not present unless AI mediation had already been foregrounded in the initiating post. This illustration reflects the relative positioning of comment types within the retrieved ordering under the applied sampling procedure and should not be interpreted as evidence of causal ranking effects.
Importantly, these contrasts are limited to observable ordering within the sampled corpus and do not constitute behavioral claims about user cognition, motivation, or platform-wide amplification dynamics.
Explicit AI-related discernment is also unevenly distributed across subreddit contexts within this dataset. L3 instances were observed primarily in Futurology (16) and Technology (5), with two instances in Politics and none in News or Worldnews under the applied coding criteria. These counts indicate contextual variation within the corpus but do not imply broader prevalence patterns beyond the sampled threads.
Research on algorithmic awareness suggests that recognition of computational mediation may vary across topical settings and discursive communities [27]. In the present corpus, AI-related terminology appears more frequently in technology-oriented discussions, where AI systems are thematically salient. In mainstream news-related contexts included in the sample, evaluative attention more often centers on political meaning, institutional credibility, or actor intent, while explicit references to AI mediation appear less frequently articulated.
Taken together, these findings indicate that explicit AI attribution is discursively observable where AI terminology is lexically available and thematically foregrounded within the sampled corpus. Visibility and subreddit context are therefore treated as descriptive conditions associated with variation in observable articulation, not as explanatory mechanisms or engagement effects.

5.3. Implications for Information Quality and Digital Citizenship

The limited discursive visibility of explicit AI attribution within the sampled corpus invites reflection on how information quality is conceptualized in AI-mediated environments. As AI systems increasingly participate in content creation, summarization, and narration, the present findings indicate that evaluative practices observed in everyday discourse are frequently oriented toward surface-level credibility cues. Within this dataset, technological conditions of content production are articulated explicitly in a relatively small number of comments, even when skepticism is expressed.
This distributional pattern highlights the distinction between content-level evaluation and mediation-level attribution in publicly observable discourse. It suggests that frameworks addressing information quality may benefit from attending not only to accuracy and credibility claims but also to the discursive availability of provenance and synthetic-generation terminology in routine interaction. The present analysis does not assess the effectiveness of such frameworks but clarifies how mediation is—or is not—lexically articulated within this corpus.
These observations are also relevant to discussions of digital citizenship in AI-mediated environments. Rather than interpreting information evaluation solely as an individual cognitive competence, the findings underscore the role of discursive context in shaping what becomes publicly articulated. In the sampled discussions, explicit technological attribution appears contingent on whether AI terminology is thematically foregrounded and lexically accessible within a given community.
Scholarship on AI governance and information ethics emphasizes that interpretability and contestability are conditioned by institutional and platform-level factors, including transparency cues that render mediation more discussable [9]. The present findings do not test these structural dynamics but indicate that variation in discursive articulation may be analytically relevant to future empirical examination of how mediation becomes publicly named in everyday information evaluation.
In this sense, information quality may be examined not only in terms of content accuracy but also in relation to the discursive legibility of mediation processes within public interaction. The current study contributes to this line of inquiry by documenting when explicit AI attribution becomes observable under conservative, rule-based coding within a bounded corpus.

5.4. Contributions to Information Science

This study contributes to information science in four related respects. First, it empirically differentiates generalized skepticism toward informational content from explicit attribution to AI-mediated or AI-generated provenance within naturally occurring online discourse. By analyzing comment-level expressions rather than self-reported attitudes, the findings document that critical engagement with content and explicit technological attribution are not uniformly co-articulated within the sampled corpus. This distinction provides analytical clarity regarding how evaluative practices are publicly articulated at the discourse level under AI-mediated conditions.
Second, the study contributes methodologically to information behavior research by operationalizing discernment as an observable discursive phenomenon. The rule-based, discourse-oriented framework offers a transparent and reproducible basis for identifying evaluative orientations as they are textually expressed, thereby complementing attitudinal or survey-based measurement with analysis grounded in naturally occurring communication.
Third, the findings document contextual variation in the distribution of explicit AI attribution within the sampled corpus. L3 instances were observed primarily in technology-oriented subreddits included in the dataset and were less frequently observed in the mainstream news-related communities sampled under the present design. This contextual contrast illustrates how interpretive frames are situated within community-specific vocabularies and topical orientations as they become discursively articulated.
Fourth, by examining the discursive visibility of AI mediation in everyday interaction, the study extends existing survey- and experiment-based approaches to algorithmic awareness and AI literacy. Rather than measuring perceived understanding or prompted recognition, the present analysis specifies when AI mediation becomes explicitly named in routine discourse under conservative coding criteria. In doing so, it offers a discourse-analytic perspective on the relationship between awareness, articulation, and public expression within AI-mediated information environments.

6. Conclusions

This study examined how digital skepticism and AI-related discernment become discursively visible in news-related online discussions within algorithmically mediated information environments. Using naturally occurring Reddit discourse and a rule-based discourse-level analytical framework, the analysis focused on observable expressions of information evaluation rather than on self-reported attitudes or experimentally elicited judgments.
The findings should not be interpreted as evidence regarding levels of cognitive awareness. Instead, they clarify the communicative conditions under which AI mediation becomes publicly articulated within everyday online interaction.
Three corpus-bound observations can be identified. First, digital skepticism toward credibility and authenticity (L1) appears across routine news discourse, whereas explicit attribution of content to AI mediation or synthetic generation (L3) appears in a smaller number of comments under the conservative rule-based specification. Second, comparison across score-based visibility contexts shows no consistent pattern in the distribution of skepticism or AI attribution within the sampled dataset. Under fixed-cap comment sampling and nested thread structure, observed differences between visibility contexts are interpreted as bounded distributional contrasts rather than engagement effects. Third, L3 instances were observed in technology-oriented communities included in the dataset and were less frequently observed in the mainstream news-related communities sampled under the present design.
By foregrounding publicly expressed discourse, the study documents patterns of discursive articulation within AI-mediated information environments. The results indicate that AI-related discernment can be analytically treated as a distinct discursive orientation situated within community-specific vocabularies and topical contexts, rather than as a direct extension of generalized skepticism.
The findings also highlight the analytical relevance of provenance and mediation cues in news-related environments. When explicit AI attribution is infrequently articulated in observable discourse, mediation processes may remain unnamed within routine interaction under existing communicative conditions. This observation does not prescribe specific interventions but identifies disclosure and provenance signaling as variables warranting further empirical investigation in future research.
Disclosure mechanisms should therefore be examined as contextual affordances rather than assumed corrective tools. Prior research indicates that labeling practices exhibit heterogeneous associations depending on institutional trust, community norms, and interpretive expectations. Future research may explore how different forms of AI disclosure correspond with variation in discursive articulation across contexts and visibility conditions.
Further extensions of this discourse-oriented approach may include alternative operationalizations of visibility beyond voting-based proxies, cross-platform comparisons of discursive AI attribution, and mixed-method designs integrating qualitative close analysis with computational techniques to refine understanding of when and how AI mediation becomes lexically articulated in public discourse.

7. Limitations and Future Research

7.1. Limitations

This study has several limitations that define the scope of interpretation and indicate directions for future research. These constraints arise from the sampling strategy, the discourse-level analytical focus, the characteristics of the coding framework, the temporal structure of the dataset, and the platform-specific context.
First, the analysis is based on a purposive, non-exhaustive sample of 305 news-related discussion threads drawn from publicly available Reddit discourse. The sampling strategy supports focused examination of analytically relevant contexts but does not aim to provide statistical representativeness of the full universe of news-related discussions on the platform. The dataset includes only articulated comments from active participants; individuals who consume or evaluate content without commenting remain unobserved. Because participation reflects a subset of more vocal users and individual posters were not modeled, the findings should not be interpreted as generalizable to passive audiences or to population-level constructs such as AI awareness, skepticism, or literacy.
Second, the study relies on observational discourse data and does not support causal inference. The analytical design describes patterns of expressed skepticism and AI-related discernment as they appear within naturally mediated environments rather than modeling their determinants or inferring underlying cognitive states. The findings therefore pertain to publicly articulated discourse, not to individual belief, competence, or intention.
Third, the analytical framework captures skepticism and AI-related discernment only when these orientations are explicitly articulated in text. Implicit interpretive reasoning or background recognition that does not surface in comment-level discourse lies outside the scope of analysis. This limitation is particularly relevant for structurally oriented discourse (L2), which often depends on contextual framing and situational cues rather than stable or consistently identifiable lexical markers. Within the validation subset, L2 exhibited moderate intercoder agreement (κ = 0.47) and lower recall relative to L1 and L3, which is consistent with the boundary-sensitive and context-dependent character of structural suspicion as a discourse-level category.
Fourth, although the dataset spans multiple years, the study does not implement a longitudinal or time-series design. Temporal heterogeneity is addressed through segmented comparison across two sociotechnical contexts (2019–2021; 2023–2025), used to assess discursive availability under differing public conditions rather than to model change over time. Observed differences are treated as contextual differentiation rather than as developmental progression or temporal causation.
Fifth, the analysis focuses on English-language discourse on a single platform. Reddit represents a prominent but specific segment of the contemporary information ecosystem, characterized by distinctive participation norms, visibility mechanisms, and community structures. Patterns of digital skepticism and AI-related discernment may differ across platforms, languages, or cultural contexts. Extending analysis to additional platforms or multilingual settings would enable systematic comparison of how platform architecture and sociocultural conditions are associated with variation in the discursive visibility of AI mediation.
These limitations suggest several directions for future research. Longitudinal or event-centered designs could examine how explicit AI-related attribution is articulated in relation to technological milestones, media controversies, or regulatory interventions. Cross-platform and cross-linguistic analyses would help assess the contextual stability of observed distributional patterns across media systems. Finally, mixed-method approaches integrating discourse analysis with surveys, qualitative interviews, or experiments could bridge the distinction between discursive articulation and individual-level awareness, supporting closer examination of how AI mediation is recognized, negotiated, and verbalized in contemporary information environments.
Among these possibilities, computational analysis of large-scale discourse and controlled experimental investigation of AI disclosure mechanisms represent particularly promising directions for extending the present framework.

7.2. Computational and Experimental Research Directions

A promising direction for future research involves extending the discursive framework developed in this study toward computational and experimental investigation of AI-related discernment in large-scale online discourse. The rule-based coding scheme distinguishing general skepticism (L1), structural suspicion (L2), and explicit AI-related attribution (L3) provides an analytically interpretable foundation that can support both large-scale computational analysis and controlled experimental designs examining how users interpret technologically mediated information.
One potential data-analytic design would involve constructing a large annotated corpus of online discussions drawn from public social media platforms such as Reddit and other discussion forums where news and informational content are actively debated. A subset of comments could be manually coded according to the operational indicators documented in Appendix A, thereby creating labeled training data representing the three discursive categories (L1–L3). Such a dataset could then be used to train supervised natural language processing models capable of detecting discursive patterns of skepticism and AI attribution within user-generated text. Recent advances in transformer-based language models demonstrate that contextual language representations can support highly accurate classification of complex linguistic patterns in large textual corpora [41].
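As a deliberately simplified illustration of such a supervised pipeline, the sketch below fits a minimal pure-Python naive Bayes baseline for one indicator (L3) on a handful of invented toy examples standing in for manually coded comments. This is a sketch under stated assumptions only: an actual study of this kind would require a substantially larger annotated corpus and would more plausibly use transformer-based models, as discussed above.

```python
import math
from collections import Counter

def tokenize(text):
    """Crude whitespace tokenizer with basic punctuation stripping."""
    return [t.strip(".,?!\"'").lower() for t in text.split()]

class BinaryNB:
    """Minimal multinomial naive Bayes for one binary discourse indicator."""

    def fit(self, texts, labels, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing constant
        self.counts = {True: Counter(), False: Counter()}
        self.docs = Counter(labels)              # class priors from label counts
        for text, y in zip(texts, labels):
            self.counts[y].update(tokenize(text))
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        return self

    def predict(self, text):
        scores = {}
        for y in (True, False):
            total = sum(self.counts[y].values())
            logp = math.log(self.docs[y] / sum(self.docs.values()))
            for tok in tokenize(text):
                # Smoothed per-token likelihood under class y
                logp += math.log(
                    (self.counts[y][tok] + self.alpha)
                    / (total + self.alpha * len(self.vocab)))
            scores[y] = logp
        return scores[True] > scores[False]

# Invented toy examples labeled for L3 (explicit AI attribution).
train = [
    ("pretty sure this was generated by ai", True),
    ("another deepfake video spreading online", True),
    ("sounds like an ai generated voice", True),
    ("is there any source for this claim", False),
    ("this whole thing looks staged to me", False),
    ("i do not believe this actually happened", False),
]
texts, l3_labels = zip(*train)
model = BinaryNB().fit(list(texts), list(l3_labels))
print(model.predict("this clip was generated by ai"))  # True on this toy data
```

Because the L1–L3 indicators are non-mutually exclusive binary variables, one such model would be fit per indicator rather than forcing a single multi-class decision.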
Scaling the analytical framework in this way would enable researchers to examine how AI-related discernment becomes discursively visible across large digital communication environments and across different platform contexts. In particular, such analysis could investigate how platform-level mechanisms—including voting systems, ranking signals, and visibility dynamics—shape the circulation and interpretation of informational content in social discussion systems. Prior research shows that Reddit users function simultaneously as information consumers and curators through their voting behavior, which strongly influences which content becomes visible and receives further engagement [42]. Incorporating these platform dynamics into large-scale discourse analysis would therefore provide valuable insight into how collective curation processes shape the visibility of skepticism, suspicion, and technological attribution in online conversations.
In addition to computational approaches, future studies could employ controlled experimental designs to examine how interface-level signals influence discursive attribution of AI-generated content. For example, participants could be exposed to identical informational content under different labeling conditions—such as “AI-generated,” “human-written,” or unlabeled presentation—and their evaluative responses could then be analyzed using the L1–L3 framework developed in this study. Such designs would allow researchers to investigate whether disclosure cues increase the likelihood of credibility-oriented skepticism, structural suspicion, or explicit technological attribution. Prior human–computer interaction research suggests that users often reason about algorithmic processes indirectly and that invisible algorithmic mediation can shape how people interpret digital information environments [43]. Experimental approaches of this kind would therefore provide a complementary method for investigating whether the relatively sparse articulation of AI attribution observed in naturalistic discourse reflects limited recognition, limited vocabulary for expressing technological mediation, or the absence of salient disclosure signals.
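A minimal analysis sketch for such an experiment might compare the proportion of responses coded TRUE on L3 across labeling conditions. The cell counts below are entirely hypothetical, and the Wilson score interval is simply one standard choice for binomial proportions at sparse base rates.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """95% Wilson score interval for a binomial proportion."""
    if n == 0:
        return (0.0, 0.0)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))) / denom
    return (centre - half, centre + half)

# Hypothetical counts (invented for illustration): participants per
# labeling condition whose responses were coded TRUE on L3, out of n = 80.
conditions = {
    "AI-generated label": (34, 80),
    "human-written label": (6, 80),
    "unlabeled": (9, 80),
}

for name, (k, n) in conditions.items():
    lo, hi = wilson_interval(k, n)
    print(f"{name}: {k}/{n} = {k/n:.2f}  95% CI [{lo:.2f}, {hi:.2f}]")
```

The same tabulation could be repeated for L1 and L2, since the three indicators are coded independently.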
The framework developed in this study therefore provides a bridge between discourse-level interpretive analysis and emerging computational approaches for detecting AI-related discernment in large-scale information environments.

Author Contributions

Conceptualization, L.P.; methodology, L.P. and A.S.; software, A.S.; validation, A.S. and M.O.; formal analysis, L.P.; investigation, L.P. and N.S.; resources, L.P. and N.S.; data curation, L.P.; writing—original draft preparation, L.P. and N.S.; writing—review and editing, L.P., A.S. and M.O.; visualization, M.O.; supervision, L.P.; project administration, L.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research project was financially supported by Mahasarakham University. No grant number is applicable.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved under an exemption category by the Institutional Review Board of Mahasarakham University (MSU IRB) (Approval No. 038–033/2026; Date of Approval: 20 January 2026).

Informed Consent Statement

Not applicable. This study involved the analysis of publicly available online data and did not include direct interaction with human participants, intervention, or the collection of personally identifiable information.

Data Availability Statement

The data analyzed in this study were obtained from publicly available online discussions on Reddit. Due to platform terms of service and ethical considerations, the raw data cannot be shared publicly. Aggregated data and analytical procedures are available from the corresponding author upon reasonable request.

Acknowledgments

The authors acknowledge the role of publicly accessible online platforms and user-generated content in enabling this research. This study relied exclusively on publicly available data and did not involve direct contact with individuals or the collection of any personally identifiable information. All data were analyzed in aggregate form and handled in accordance with established ethical principles for research using public online data, with careful consideration given to privacy, contextual integrity, and responsible data use.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Codebook and Rule-Based Indicators for Digital Skepticism and AI-Related Discernment

This appendix documents the operational codebook and rule-based indicators used to classify observable expressions of digital skepticism and AI-related discernment in news-related online discourse. The coding scheme consists of three analytically distinct, non-mutually exclusive binary indicators: General Skepticism (L1), Structural Suspicion (L2), and Explicit AI-Related Discernment (L3). Each indicator captures a specific discursive orientation toward information and its production as expressed in user comments.
All indicators were coded independently as binary variables (TRUE/FALSE). A single comment may therefore receive multiple TRUE classifications when more than one operational criterion is satisfied.

Appendix A.1. General Skepticism

Indicator: L1_general_skepticism (L1)
  • Conceptual Definition
General skepticism refers to explicit expressions of doubt, questioning, or requests for verification concerning the credibility, accuracy, or authenticity of informational content. This category captures evaluative reactions directed toward the truthfulness of claims without attributing fabrication, manipulation, or technological mediation.
  • Operational Criteria
Code TRUE if a comment includes one or more of the following:
  • Explicit requests for sources, evidence, or verification
  • Direct questioning of factual accuracy or plausibility
  • Statements expressing disbelief or uncertainty about whether the content is true
Code FALSE if a comment:
  • Accepts the content without questioning credibility
  • Expresses emotional reaction or opinion without evaluative doubt
  • Criticizes actors, events, or topics without addressing truthfulness
  • Illustrative Examples (Anonymized)
Included (TRUE):
  • “Is there any reliable source confirming this?”
  • “I’m not convinced this actually happened.”
  • “Do we have proof for these claims?”
Excluded (FALSE):
  • “This is shocking.”
  • “I don’t like what they’re doing.”
  • “That politician is terrible.”

Appendix A.2. Structural Suspicion

Indicator: L2_structural_suspicion (L2)
  • Conceptual Definition
Structural suspicion captures comments expressing skepticism toward the construction, presentation, or framing of content rather than toward its factual claims alone. This includes judgments that content appears staged, scripted, manipulated, misleading, or artificially arranged, without explicitly attributing such production to artificial intelligence or automated systems.
This category differs from L1 by targeting the form or construction of content rather than its factual accuracy.
  • Operational Criteria
Code TRUE if a comment includes one or more of the following:
  • Claims that content is staged, scripted, rehearsed, or artificially arranged
  • Statements that content “doesn’t feel real,” “looks fake,” or “seems constructed”
  • Allegations of misleading or fabricated format without reference to AI
Code FALSE if a comment:
  • Only questions factual accuracy (L1 only)
  • Explicitly attributes content to AI or synthetic generation (L3)
  • Criticizes bias or ideology without alleging fabrication or staging
  • Illustrative Examples (Anonymized)
Included (TRUE):
  • “This whole thing looks staged.”
  • “It feels scripted, like it was set up.”
  • “The video doesn’t seem authentic at all.”
Excluded (FALSE):
  • “Can someone verify this?” (L1 only)
  • “This is clearly AI-generated.” (L3)
  • “The media is biased.”

Appendix A.3. Explicit AI-Related Discernment

Indicator: L3_explicit_AI_discernment (L3)
  • Conceptual Definition
Explicit AI-related discernment refers to comments that directly attribute content to artificial intelligence or automated generation. This category captures explicit recognition of AI mediation through identifiable linguistic markers, including references to AI-generated content, deepfakes, synthetic media, or AI-based narration.
Only explicit attribution is coded; recognition is not assumed unless clearly articulated in the comment.
  • Operational Criteria
Code TRUE if a comment includes one or more of the following:
  • Explicit mention of “AI,” “artificial intelligence,” “deepfake,” or “synthetic”
  • Direct reference to AI-generated voice, video, image, or narration
  • Clear attribution of content to automated or algorithmic generation
Code FALSE if a comment:
  • Expresses doubt or suspicion without mentioning AI
  • Uses metaphorical or colloquial language unrelated to AI mediation
  • Refers to technology in general without attributing content creation to AI
  • Illustrative Examples (Anonymized)
Included (TRUE):
  • “This sounds like an AI-generated voice.”
  • “Another deepfake spreading online.”
  • “Pretty sure this video was made by AI.”
Excluded (FALSE):
  • “This looks fake.” (L2)
  • “Technology is scary these days.”
  • “I don’t trust this video.” (L1 or L2)

Appendix A.4. Coding Rules and Implementation Notes

  • Each indicator (L1, L2, L3) was coded as an independent binary variable (TRUE/FALSE).
  • Indicators are non-mutually exclusive; a single comment may satisfy multiple criteria.
  • Coding prioritized explicit textual cues rather than inferred intent, tone, or assumed author knowledge.
  • The identical rule set was implemented in a deterministic Python-based classifier and applied consistently in both automated coding and human reliability validation.
  • Illustrative examples above are complemented by the complete lexical pattern specification reported in Appendix A.5.

Appendix A.5. Full Rule-Based Lexical Patterns Implemented in the Classifier

This subsection reports the complete lexical patterns implemented in the Python classifier and applied to the full dataset. Patterns were implemented using case-insensitive phrase matching with boundary and contextual constraints to minimize partial or spurious matches.
Table A1. Full Rule-Based Lexical Patterns Implemented in the Automated Classifier.

L1 General skepticism
  • Inclusion patterns: “is this true”, “any source”, “any evidence”, “proof?”, “I don’t buy this”, “I don’t believe this”, “sounds fake”, “no evidence”, “doesn’t add up”, “hard to believe”, “can anyone verify”
  • Exclusion constraints: exclude metaphorical usage; exclude emotional reactions without credibility questioning; exclude criticism of actors/events without addressing accuracy
  • Notes: targets explicit doubt or requests for verification

L2 Structural suspicion
  • Inclusion patterns: “staged”, “scripted”, “set up”, “manufactured”, “doesn’t feel real”, “looks fake”, “seems constructed”, “coordinated”, “propaganda”, “manipulated”
  • Exclusion constraints: exclude explicit AI references (L3); exclude generic bias/ideology claims without fabrication
  • Notes: targets suspicion toward content construction or presentation

L3 Explicit AI-related discernment
  • Inclusion patterns: “AI-generated”, “generated by AI”, “written by AI”, “made by AI”, “deepfake”, “synthetic”, “synthetic voice”, “AI voiceover”, “AI narration”, “artificial intelligence generated”
  • Exclusion constraints: exclude metaphorical/colloquial uses; exclude general technology references without attribution
  • Notes: captures explicit attribution to AI or automated generation
For L1, classification relied on lexical cues expressing doubt toward informational claims.
For L2, rules targeted references to content construction or presentation without explicit AI attribution.
For L3, classification required direct attribution to AI-mediated or synthetic content generation.
All rules were implemented through deterministic keyword and phrase matching with boundary and contextual constraints designed to minimize false positives.
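One plausible implementation of this deterministic matching, including Table A1's exclusion constraint that L2 must not fire on explicit AI references, is sketched below. The pattern subsets and function names are illustrative assumptions; this is not the published classifier's actual code.

```python
import re

# Illustrative subsets of the Table A1 patterns, with word-boundary constraints.
L2_PATTERNS = [r"\bstaged\b", r"\blooks fake\b", r"\bmanipulated\b"]
L3_PATTERNS = [r"\bai-generated\b", r"\bgenerated by ai\b", r"\bdeepfake\b",
               r"\bsynthetic voice\b", r"\bai voiceover\b"]

def matches_any(patterns, text):
    """Case-insensitive matching is achieved by lowercasing text upstream;
    \b boundaries prevent partial matches (e.g., 'staged' inside 'upstaged' still
    matches a boundary, but 'restaged' does not)."""
    return any(re.search(p, text) for p in patterns)

def classify_l2(text: str) -> bool:
    """L2 fires on suspicion toward content construction or presentation,
    but Table A1 excludes comments that explicitly attribute content to AI."""
    lowered = text.lower()
    if matches_any(L3_PATTERNS, lowered):  # exclusion constraint: defer to L3
        return False
    return matches_any(L2_PATTERNS, lowered)

print(classify_l2("This whole thing looks staged."))   # True
print(classify_l2("Staged? No, that's a deepfake."))   # False: explicit AI attribution
```

Checking the L3 exclusion before the L2 inclusion patterns is one straightforward way to keep structural suspicion and explicit AI attribution distinct at the rule level, consistent with the constraint stated in Table A1.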

Figure 1. Analytical Workflow of the Study.
Figure 2. Distribution of Digital Skepticism and AI Discernment across Discourse Contexts.
Table 1. Overview of the Dataset.

Data source: Reddit (public subreddits)
Data access method: Official Reddit API (read-only)
Discourse type: News-related public online discussions
Platform characteristics: Topic-based communities, threaded comments, engagement-based visibility
Number of discussion threads: 305
Number of comments: 6065
Primary unit of analysis: Comment-level textual content
Secondary unit of analysis: Thread-level aggregation
Engagement indicators: Thread-level mean comment score (visibility/valuation proxy under fixed-cap sampling)
Contextual identifiers: Subreddit affiliation
Language: English
Study design: Exploratory, observational
Time frame: August 2010–August 2025
Table 2. Operational Definitions of Digital Skepticism and AI Discernment.

Category | Code | Operational Definition | Example Indicators *
General Skepticism | L1 | Expressions of doubt about credibility, accuracy, or authenticity without reference to AI or content-generation mechanisms | Requests for sources; questioning plausibility; expressions of disbelief
Structural Suspicion | L2 | Suspicion toward structure, presentation, or framing without explicit attribution to AI | References to scripted, staged, misleading, or propagandistic formats
Explicit AI Discernment | L3 | Direct attribution of content to AI mediation or generation | Mentions of AI-generated text, synthetic narration, AI voiceovers, deepfakes
* Example indicators are illustrative; classification was performed using a rule-based keyword and pattern-matching procedure.
Table 3. Automated Classifier Performance Against Adjudicated Ground Truth (n = 300).

Label | TP | FP | FN | TN | Precision | Recall | F1-Score | Support
L1 | 130 | 6 | 14 | 150 | 0.956 | 0.903 | 0.929 | 144
L2 | 16 | 0 | 7 | 277 | 1.000 | 0.696 | 0.821 | 23
L3 | 23 | 0 | 4 | 273 | 1.000 | 0.852 | 0.920 | 27
Note: Support refers to the number of positive instances in the adjudicated ground-truth subset. Metrics are calculated against consensus-based adjudicated labels under a precision-oriented rule specification.
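As a check on the table, the reported L1 metrics follow arithmetically from the confusion counts (TP = 130, FP = 6, FN = 14, TN = 150, summing to the 300 adjudicated comments):

```python
# L1 confusion counts from Table 3 (n = 300 adjudicated comments)
tp, fp, fn, tn = 130, 6, 14, 150

precision = tp / (tp + fp)                          # 130 / 136
recall = tp / (tp + fn)                             # 130 / 144
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
support = tp + fn                                   # positive instances in ground truth

print(round(precision, 3), round(recall, 3), round(f1, 3), support)
# → 0.956 0.903 0.929 144
```

The same arithmetic applied to the L2 and L3 rows reproduces their reported values; a precision of 1.000 for those labels reflects zero false positives under the precision-oriented rule specification.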
Table 4. Raw comment counts (n) and percentages contextualizing low base-rate prevalence.

Visibility Context | Threads | Comments (N) | L1 n (%) | L2 n (%) | L3 n (%)
Higher-score threads | 152 | 3040 | 48 (1.58) | 6 (0.20) | 10 (0.33)
Lower-score threads | 153 | 3025 | 88 (2.91) | 10 (0.33) | 13 (0.43)
Note. Visibility contexts were defined using a median split of the thread-level mean comment score (median = 18.75). Percentages are calculated at the comment level within each context. Raw counts are provided to contextualize sparse prevalence under fixed-cap sampling. Comments are nested within threads and are not assumed to constitute independent observations.
Table 5. Raw comment counts (n) and percentages contextualizing low base-rate prevalence under the fixed-cap sampling design.

Subreddit | Threads | Comments (N) | L1 n (%) | L2 n (%) | L3 n (%)
Futurology | 62 | 1227 | 41 (3.33) | 0 (0.00) | 16 (1.30)
Technology | 45 | 896 | 10 (1.11) | 1 (0.11) | 5 (0.56)
Politics | 101 | 2009 | 47 (2.33) | 3 (0.15) | 2 (0.10)
Worldnews | 40 | 795 | 11 (1.38) | 9 (1.13) | 0 (0.00)
News | 57 | 1138 | 27 (2.37) | 3 (0.26) | 0 (0.00)
Note. Percentages are calculated at the comment level within each subreddit. Raw comment counts (n) are reported to contextualize low base-rate prevalence under fixed-cap sampling. “Threads” refers to sampled discussion threads; comment volume was capped at approximately 20 comments per thread.
Table 6. Temporal Comparison of Digital Skepticism and AI-Related Discernment.

Category | Early Period (2019–2021) % | Recent Period (2023–2025) % | Absolute Difference (%) (Recent − Early)
L1 General skepticism | 1.43 | 1.98 | 0.55
L2 Structural suspicion | 0.95 | 0.15 | −0.81
L3 Explicit AI-related discernment | 0.00 | 0.59 | 0.59
Note. Percentages represent comment-level proportions within each period under fixed-cap sampling. Observed contrasts are interpreted descriptively within the sampled corpus and do not constitute longitudinal modeling, causal inference, or population-level estimation.

Share and Cite

MDPI and ACS Style

Phothong, L.; Sukprasert, A.; Shutimarrungson, N.; Obthong, M. Seeing the Message but Not the Machine: Digital Skepticism and AI Discernment in Online Information Environments. Information 2026, 17, 295. https://doi.org/10.3390/info17030295


