When Accountability Goes Public: Televised Oversight and Systemic Governance in Urban China

Zhang, Hong; Ju, Yifei; Zheng, Lei

doi:10.3390/systems14060615

by Hong Zhang, Yifei Ju and Lei Zheng *

School of International Relations and Public Affairs, Fudan University, Shanghai 200433, China

* Author to whom correspondence should be addressed.

Systems 2026, 14(6), 615; https://doi.org/10.3390/systems14060615

Submission received: 27 April 2026 / Revised: 18 May 2026 / Accepted: 25 May 2026 / Published: 28 May 2026

(This article belongs to the Special Issue Systemic Governance in Smart Cities: Rethinking Urban Complexity)

Abstract

Televised accountability can be understood as a socio-technical urban governance arrangement in which media exposure, evidentiary inputs, staged interaction, and bureaucratic response are tied together in a structured oversight process. Drawing on Nanning’s long-running “Commitment to the People—TV Accountability” program, the analysis covers 73 episodes broadcast between 2014 and 2023, including 327 issue chains and 3675 official responses. Rather than collapsing episodes into aggregate cases, the design preserves issue chains and response sequence. Pre-response pressure is measured through factual specificity, emotional intensity, accountability directness, and evidence type, while official outputs are coded as acknowledgement, deflection, generic commitment, and specific commitment. Row-level logit models indicate that emotional intensity is associated with a broader immediate visible response: higher odds of acknowledgement, deflection, and generic commitment, but lower odds of specific commitment. Accountability directness is associated with higher odds of acknowledgement and deflection. Non-clip evidence is associated with specific commitment, whereas negative exposure clips are associated with lower odds of generic commitment. Terminal unit-chain models and sequence analyses further show that visible concession emerges early, whereas checkable commitment appears later and in fewer chains. Rather than serving as a simple transparency device, televised oversight operates here as a systemic accountability interface. It tends to channel official response toward low-cost, publicly legible adjustment while making high-cost commitment more selective, evidence-dependent, and stage-dependent. The findings speak to ongoing discussion of socio-technical governance, smart-city governance, and public-value production in data-mediated urban accountability settings.

Keywords:

systemic governance; smart-city governance; socio-technical systems; televised accountability; urban accountability; public accountability; public value

1. Introduction

Urban governance increasingly operates through interfaces that combine media exposure, citizen complaint, administrative response, and public evaluation. Public value scholarship asks whether public action is valuable, publicly authorized, and operationally supported, rather than merely whether it generates visible managerial output [1,2,3,4,5,6]. Accountability scholarship adds a second boundary condition: accountability is not simply any form of responsiveness, but a relation in which an actor is called to explain and justify conduct before a forum that can question, judge, and sometimes impose consequences [7,8,9,10]. The question for urban governance is therefore not only whether officials respond, but what sort of accountability-relevant public value a governance interface can produce.

Televised accountability in China differs from both conventional town-hall meetings and ordinary interview programs. In the Nanning program examined here, local governance problems are organized into issue chains built around citizen complaints, investigative reporting, and program-prepared evidence. A typical chain begins with host introduction, background narration, complaint material, investigative clips, or documentary evidence. Responsible officials from municipal departments, districts, townships, development zones, or other public organizations then answer in the studio before hosts, cameras, audience members, and supervisory authorities. During the exchange, hosts may replay evidence, ask follow-up questions, name responsible units, and move the discussion toward a commitment segment in which officials are expected to state what will be done. Later review episodes or post-program reporting may revisit previously exposed problems and give the format a visible follow-up component [11,12,13].

Research on smart cities and data-rich urban governance similarly emphasizes that public value is shaped through coupled relationships among technologies, organizations, and civic actors rather than through isolated tools alone [14,15,16,17,18]. Chinese televised accountability programs are one such interface. They do not merely publicize failure after the fact. They assemble evidence, allocate attention, identify responsibility, and press officials to respond before viewers, hosts, and supervising authorities within the same staged sequence [19,20,21,22]. In many cities, this kind of broadcast arena coexists with online inquiry channels, government hotlines, platform-based complaint handling, and other digital responsiveness arrangements, but television remains distinctive because it synchronizes issue selection, evidentiary display, official reply, and public evaluation within one visible sequence [23,24,25,26]. The empirical case is Nanning’s long-running “Commitment to the People—TV Accountability” program. The question is therefore not only how officials react when accountability goes public, but how a publicly staged oversight process reorganizes response inside an urban governance system.

Existing research shows why this setting matters, but it has not fully examined the internal response process inside the accountability scene. Chinese studies often treat televised accountability as media supervision, mediated governance, or a channel that translates public demands into administrative action [19,20,23,27]. Other work focuses on later outcomes such as rectification, satisfaction, or agenda pressure. What remains less clear is how different informational inputs, pressure cues, and program stages are converted into immediate official response within the oversight interface itself.

The broader literature on blame avoidance helps explain why this gap matters. Political and administrative actors manage blame rather than simply absorb it [28]. Transparency and publicity can intensify strategic adaptation instead of guaranteeing substantive correction [29,30,31]. In a televised setting, once silence and procedural delay become harder to use, officials may turn to faster and more interactional forms of defense, including partial acknowledgement, technical explanation, responsibility dilution, symbolic commitment, or combinations of these moves. What looks like a simple public reply is therefore better understood as a system output shaped by staged visibility, evidentiary form, responsibility targeting, and sequence position.

This setting is especially useful because televised accountability is structured rather than spontaneous. Hosts intervene, clips are prepared in advance, turn order is managed, and later program stages explicitly invite commitment and evaluation [32,33,34]. The scene is public, but it is also curated. Treating it as an ordinary deliberative forum therefore misses its defining feature: pressure is staged, sequenced, and socially amplified. In urban-governance terms, this makes televised accountability a useful site for examining how mediated visibility shapes official response.

Public confrontation also adds an interactional-processing layer: it combines concentrated visibility and explicit judgment with limited control over the timing of reply. Research on stress, threat rigidity, and decision pressure suggests that such conditions can narrow information processing and encourage short-horizon defensive adjustment rather than careful problem solving [35,36,37,38]. Because transcripts do not reveal such internal states directly, the analysis focuses on observable response patterns rather than on direct measurement of officials’ psychological states.

The analysis centers on official response strategies within that staged setting, using Nanning’s program as a focused city-level case rather than a random national sample. The design is not fully exogenous, and it does not directly estimate governance improvement after broadcast. The question is narrower and more system-oriented: how are different forms of pre-response pressure associated with the current official reply, and what does the resulting sequence reveal about the public value and accountability outputs this oversight interface tends to organize?

The article links concept, method, and empirical substance in three ways. Conceptually, it recasts televised accountability as a socio-technical governance interface in which public pressure is assembled through media production, evidentiary presentation, and program sequence. This framing connects public value to accountability by asking whether visible answerability is associated mainly with legible response or with more checkable commitments that can support later verification and learning. Methodologically, the analysis uses a response-level design based on sequential transcripts and pressure windows, making it possible to analyze the individual official reply rather than the episode as a whole. Empirically, the Nanning case shows that officials rarely rely on a single response line. They assemble response packages, and the distinction between generic and specific commitment turns out to be crucial for understanding when public oversight is linked to symbolic accommodation and when it is linked to more checkable commitments. More broadly, visible pressure is associated with low-cost, publicly legible adjustment more readily than high-cost commitment, a point with wider relevance for systemic governance, urban accountability, and public value in data-mediated oversight settings.

The article proceeds in four steps. Section 2 develops the theoretical framing and research focus. Section 3 describes the Nanning corpus, the response-chain design, the measurement strategy, and the model specifications. Section 4 reports the descriptive, model-based, robustness, stage, and sequence results. Section 5 discusses the implications for systemic accountability, public value, practical design, limitations, and future research, before the conclusion summarizes the argument.

2. Theory and Research Focus

2.1. Televised Accountability, Mediated Governance, and the Missing Systemic Mechanism

Chinese research on televised accountability has gradually moved from treating it as a media phenomenon or quasi-deliberative experiment [39] to treating it as part of a broader process of mediated governance [22]. Recent work on media co-governance extends this view by showing how media organizations actively structure the accountability scene [21]. This literature shows that televised accountability turns administrative problems into publicly visible events and that the production logic of the program shapes how those events are framed, timed, and morally evaluated [19,20,22,40]. Related work on online political inquiry, willingness to respond, and pressure-based responsiveness reaches a similar conclusion: official receptivity exists, but it is organized and filtered rather than fully open [23,24,27,41,42].

Less developed in this literature is the internal conversion process inside the televised scene itself. Much of the existing work either evaluates the significance of the program as a whole or asks whether publicity produces rectification later. Both questions are important, but neither directly captures how information inputs, stage progression, and responsibility cues are translated into immediate official response. In a setting where the host can intensify pressure, introduce a clip, demand an on-site statement, or move the program into a commitment stage, the internal structure of official response becomes a central object of analysis rather than a residual detail. The missing mechanism is therefore not only micro-interaction in a narrow sense, but the way a governance interface converts staged pressure into different kinds of public output.

2.2. Public Value, Accountability, and Systemic Response Under Visibility

Public value theory provides a demanding standard for judging public-sector response. Moore’s original formulation linked public value creation to the alignment of valuable public purposes, authorization, and operational capacity [1]. Later work broadened this view by emphasizing contested values, the public sphere, co-production, and networked governance rather than a purely manager-centered account [2,6]. Bozeman’s work is especially relevant here because it separates public values and public interest from narrower efficiency or output criteria and develops the idea of public-value failure: visible activity can still fall short if public values are not articulated, aggregated, or sustained over time [3,4]. Hartley and colleagues add that empirical public value research must specify which conception of public value is being used and examine stakeholders, processes, and outcomes rather than treating public value as a self-evident label [5]. This article therefore does not treat an on-air response itself as public value. It asks whether the accountability interface produces responses specific enough to support later verification, coordination, and learning.

Accountability theory provides the parallel conceptual boundary. Bovens defines accountability as a relationship between an actor and a forum in which the actor must explain and justify conduct, the forum can question and judge, and the actor may face consequences [7]. Mulgan warns that accountability is often stretched to cover responsiveness, control, transparency, and public dialogue in ways that blur its core meaning of being called to account [8]. Schillemans further shows that accountability arrangements can operate in the shadow of hierarchy, where horizontal or public-facing forums matter partly because they interact with traditional hierarchical authority [9]. Bovens, Schillemans, and ’t Hart also caution that the effectiveness of accountability arrangements should be judged through more than visible answer-giving, including democratic, constitutional, and learning functions [10].

These distinctions matter for televised accountability. The program creates a visible forum in which officials are called to respond, but it does not by itself reveal whether downstream rectification, sanction, or organizational learning occurs. The analysis therefore focuses on the public output of answerability inside the interface: acknowledgement, deflection, generic commitment, and specific commitment. Specific commitment is treated as the most checkable response category, not as proof of full accountability success. This keeps the argument narrower than a claim about final governance effectiveness while still showing how an urban oversight interface organizes accountability-relevant outputs.

The wider literatures on public administration and political communication suggest that official response under accountability pressure is rarely the same as direct problem solving. Political and administrative actors manage blame rather than simply absorb it [28]. Public exposure can intensify strategic responsiveness and defensive adaptation rather than guarantee high-quality compliance [29,30,31]. In China, field and survey experiments show that public-facing receptivity is conditional rather than universal [43,44]. Research on constituency service, consultative authoritarianism, differentiated local response, and digital interaction makes the same point in different ways: responsiveness exists, but it is selective, filtered, and institutionally shaped [25,45,46,47,48,49,50,51,52]. When officials are confronted on air, pressure is also public, reputational, and time compressed. Work on threat rigidity and decision pressure suggests that actors under such conditions often simplify information processing and fall back on defensive routines [35,37,38]. Here, blame avoidance is treated not only as a strategic political logic, but also as a set of observable response choices consistent with short-horizon adjustment under intense public scrutiny.

2.3. From Structural Blame Avoidance to Response Packages

The starting point is straightforward: under public exposure, officials do not merely choose between admission and denial. Comparative work on blame management and discursive response shows that public officials often use layered combinations of acceptance, justification, responsibility dilution, and symbolic accommodation rather than one pure strategy [30,31,53]. Hansson’s discursive framework states this point directly, and later semiotic work shows that acceptance, distancing, justification, and repositioning can coexist within the same speech act [53,54]. More recent work on impoliteness and mixed signals reaches a similar conclusion about the layered character of public response [55,56].

The televised accountability arena intensifies this logic. Many familiar bureaucratic defenses, such as long procedural delay, silent non-response, or purely internal escalation, are compressed by the public format. Yet strategic response does not disappear. It changes form. Public exposure can compress delay while still leaving room for symbolic adaptation, reputational self-protection, and calibrated ambiguity [29,31]. In China, media commercialization and networked publicity have repeatedly created incentives for rapid official response without guaranteeing substantive resolution [57,58]. Officials must respond immediately, but they can still vary the type of response they provide. A short statement can acknowledge a problem, stress institutional constraints, and promise future action in highly noncommittal terms. The public scene does not remove blame avoidance. It reorganizes the available response options.

2.4. Televised Accountability as a Socio-Technical Governance Interface

The Chinese televised accountability format can be understood as theatrical public accountability and, more specifically, as a socio-technical governance interface. “Theatrical” does not imply falsity. It identifies a form of public accountability in which pressure is staged, timed, and assembled through media production and program sequencing. “Interface” indicates that the format is more than a media genre. It is an organized point of connection across subsystems that are often studied separately, including media institutions, citizen complaint, evidentiary presentation, bureaucratic hierarchy, and follow-up supervision. This formulation builds on Chinese work that treats televised accountability as a form of mediated governance rather than spontaneous participation [20,21,22,40]. It also fits broader scholarship on mediatization, media events, and public communication under staged visibility [32,33,34]. Read alongside smart-city and open-data scholarship, it treats urban oversight as a governance arrangement in which information flows, organizational boundaries, accountability claims, and public-value claims are continuously reworked rather than fixed in advance [15,18,59,60]. The link to smart-city governance follows from this interface logic: urban intelligence is not only a matter of sensors, platforms, or dashboards, but also of how complaint signals are selected, narrated, and converted into accountable public response across visible interfaces [14,16,17]. The concept characterizes the accountability format; the empirical tests are based on the Nanning case.

Pressure is organized rather than simply expressed. Hosts do not merely relay citizen complaints; they curate issues, introduce clips, choose moments of escalation, and frame failures in ways that heighten visibility and responsibility. That pattern is consistent with accounts of staged public attention and the structuring force of media logic [32,34]. Chinese work on media co-governance and research on media influence under Party supervision point in the same direction: media institutions can assemble attention, frame failure, and amplify responsibility [21,57,58]. Response is structured by sequence as well. Issue introduction, clip playback, transitional confrontation, commitment, and audience evaluation carry different communicative expectations, which fits phase-based accounts of mediatized politics [33]. This view also resonates with work on the networked public sphere [61]. Official speech is therefore constrained by simultaneous exposure to the host, the camera, the studio audience, and the prospect of later administrative follow-up. That is what makes the setting useful for studying systemic governance under visibility: the interface collects dispersed complaints, reorganizes them into a shared sequence, and translates them into publicly observable response.

This perspective also sets a narrower inferential boundary. Pressure is not treated as randomly assigned, nor is the televised program treated as an ordinary public sphere in which responses emerge organically. That distinction matters because participatory and deliberative channels in China are typically bounded, managed, and selectively opened rather than fully open [46,51,62]. The central question is which stable associations appear between different kinds of pre-response pressure and different kinds of official response within this structured accountability interface, and what those associations reveal about the public output it tends to produce.

2.5. Pressure Dimensions and Differential Expectations

Because pressure in this arena is assembled rather than uniform, it should not be reduced to one undifferentiated index. Based on the structure of the transcript data, the analysis distinguishes four pressure dimensions. This follows a broader view in which blame, responsiveness, and publicity operate through different communicative cues rather than through one homogeneous dose of pressure [27,28,29,53].

Factual specificity refers to the degree to which the pressure window contains verifiable factual anchors such as numbers, dates, places, named units, process details, or documentary references. When factual specificity is high, officials should have less room to blur the problem boundary because detailed anchors narrow interpretive flexibility and increase the later checkability of response [29,53].

Emotional intensity captures the overall force of dissatisfaction, condemnation, urgency, or moral pressure in the pressure window. Emotional pressure may not alter the objective content of the problem, but it can raise the immediate need to respond publicly, especially in settings where condemnation and impoliteness are part of the pressure script [55]. Such pressure windows also plausibly intensify short-horizon decision pressure, because negative judgment is public, reputational risk is salient, and response time is short [35,36,38].

Evidentiary exposure refers to whether the pressure window includes evidence resources such as general documentary or investigative evidence, explicit negative exposure clips, or rectification follow-up clips. Different evidence forms may not work in the same way because they are embedded in different program contexts and publicity logics [34,57,63].

Accountability directness refers to the extent to which pressure directly points to a responsible unit, official role, regulatory failure, or an immediate demand for explanation or action. Its primary contribution is to heighten responsibility visibility rather than merely issue salience [28,30].

The expectation, then, is not that “more pressure” linearly improves governance. Emotional intensity is more likely to be associated with a broad response package; factual specificity and evidentiary exposure are more likely to constrain evasive maneuvering; and accountability directness is more likely to coincide with visible response while also provoking defensive boundary work.

2.6. Temporal Blame Displacement and the Meaning of Commitment

Commitment is especially easy to misread in public accountability settings. At first glance, any on-air promise of rectification may look like accountability success. Yet from the perspective of blame avoidance, commitment varies in cost and specificity [30,31]. A statement such as “we will look into this immediately” or “we will coordinate after the program” may function less as concrete compliance than as a way to defer the current crisis to an uncertain future. Earlier work on transparency already suggested that highly visible accountability settings can redirect rather than eliminate strategic adaptation [29]. Research on quasi-democratic institutions and activism in China reaches a similar conclusion: weak institutions and low-cost concessions can absorb pressure politically even when implementation remains thin [63]. The later blame-game perspective sharpens that point by showing how actors shift risk across time and responsibility boundaries [30].

This is why the analysis distinguishes generic commitment from specific commitment. Generic commitment conveys a reformist posture but lacks clear time, agent, or verifiable target. Specific commitment approaches a higher-cost statement because it attaches the official’s public account to future-oriented action elements that can later be checked. In accountability terms, it gives the forum a more determinate object for questioning, judgment, possible sanction, and learning, whereas generic commitment keeps the account elastic [7,8,9,10]. In public-value terms, this distinction matters because public value theory links valuable public purposes to authorization and operational capacity [1,5], while public-value-failure work warns that visible activity can still remain disconnected from public values over time [3,4]. In a system where officials often operate through mixed signals and bounded concessions, that distinction is analytically important rather than merely semantic [56]. If a pressure dimension mainly increases generic commitment but not specific commitment, the implication is not necessarily that accountability works better. It may instead show that officials absorb present pressure through temporally displaced language.

2.7. Research Focus

The article brings four strands of research into a single response-level analysis. Public value research motivates the focus on specificity and checkability: a visible oversight interface is more valuable when it can support later verification, coordination, and learning [1,3,5]. Accountability research clarifies why the setting should be treated as a staged actor–forum relation embedded in hierarchy rather than as a generic instance of responsiveness [7,8,9]. Work on televised accountability and mediated governance identifies the format as an organized governance process rather than a simple media event [19,20,21,22]. Blame-avoidance research explains why institutional risk management appears in observable discursive choices [28,30,53]. In systems terms, the analysis examines how a city-level accountability interface converts complaint signals, evidentiary inputs, and staged confrontation into differentiated public response outputs.

The empirical design rests on three linked choices. The analytical unit is not the full episode or a single illustrative case, but the single official response located inside a chain-specific sequential context. Pressure is disaggregated into factual specificity, emotional intensity, evidentiary exposure, and accountability directness instead of being collapsed into one index. Response strategies are modeled as non-mutually exclusive components of a response package rather than mutually exclusive categories, which aligns better with discursive work on layered government communication and with text-as-data approaches that preserve multi-label structure [54,64].
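The multi-label structure described above can be illustrated with a minimal sketch. It assumes each coded response is stored as four binary indicators, one per response category named in this article; the class name `ResponsePackage` and the helper `label_counts` are hypothetical and are not taken from the authors' materials.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ResponsePackage:
    """One official reply coded on four non-exclusive indicators (hypothetical layout)."""
    acknowledgement: bool = False
    deflection: bool = False
    generic_commitment: bool = False
    specific_commitment: bool = False

    def components(self):
        # Names of the response components present in this reply.
        return [f.name for f in fields(self) if getattr(self, f.name)]


def label_counts(packages):
    """Count how often each component appears across a list of replies."""
    counts = {f.name: 0 for f in fields(ResponsePackage)}
    for package in packages:
        for name in package.components():
            counts[name] += 1
    return counts


# Invented example replies, for illustration only.
replies = [
    ResponsePackage(acknowledgement=True, generic_commitment=True),
    ResponsePackage(deflection=True),
    ResponsePackage(acknowledgement=True, deflection=True, specific_commitment=True),
]
counts = label_counts(replies)
```

Because the indicators are not mutually exclusive, the label totals can exceed the number of replies, which is consistent with treating each response type as a separate binary outcome rather than as one level of a single categorical variable.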

These design choices bring four questions into focus. First, how are different forms of pre-response pressure associated with official response strategies in a staged accountability setting? Second, do emotional intensity, factual specificity, evidentiary exposure, and accountability directness operate differently across response types? Third, does generic commitment function as a form of temporal blame displacement? Fourth, is specific commitment an immediate correlate of pre-response pressure, or is it better understood as a later-stage product of the program’s structured sequence?

Figure 1 condenses the argument. The left side contains observable pressure inputs in the televised scene. The middle stage identifies the interface mechanisms through which those inputs are translated into public response. The right side contains the observable response package analyzed in the data. The empirical tests examine the association between pressure inputs and response outputs, while the middle layer clarifies the theoretical process through which staged oversight reorganizes official response.

3. Materials and Methods

3.1. Data Source and Sample Construction

The empirical material comes from a sequential corpus of Nanning’s long-running “Commitment to the People—TV Accountability” program from 2014 to 2023. Public reporting indicates that the program began in March 2014, combined live questioning with explicit post-program rectification pressure, and later added review episodes to track whether previously exposed problems were actually rectified [11,12,13]. The same reporting also points to a stable and institutionalized format rather than episodic experimentation: within the first year there were already 14 live episodes and 53 participating departments, while later retrospective reports described repeated cycles of exposure, follow-up, and review [12,13]. These features make Nanning a useful case. It captures a mature and problem-focused accountability arena with enough temporal stability to observe repeated response patterns under a routinized televised format. Because the program is organized around municipal departments, district-level actors, and concrete local problems, it also provides a useful window into how an urban accountability interface processes city-level governance failures under public visibility. The evidence should still be read as a focused city-level case rather than a random sample of all Chinese televised accountability programs. The analytical value lies in the stabilized accountability format, not in statistical representativeness of every regional governance ecology.

The publicly accessible transcript archive is not complete: some earlier episodes have been removed from official webpages, the latest retrievable transcript material available for this study ends in 2023, and several episode records contain severe transcript gaps. After excluding unavailable or seriously incomplete records, the 73 analyzed episodes contain 15,162 speaking turns across hosts, officials, citizens, commentators, clips, and narration. Before response-level coding, speaker roles, organizational units, and turn order were standardized across the sequential transcripts.

Among these turns, 4561 are spoken by identifiable officials. Because the analysis focuses on public-sector actors who can credibly bear and articulate commitment, it excludes clearly non-governmental market actors, enterprise representatives, and ambiguous speaker records. After restricting the data to eligible official responses with complete coded variables, the final analytical sample contains 3675 valid official responses from 73 episodes, 327 issue chains, and 840 unit-chain observations.

The analytical sample is not identical to the full transcript universe. It includes the official responses that meet the actor, chain, and variable-completeness criteria described above. Quantitative inference in what follows is therefore based on the effective response-level sample rather than on an undifferentiated treatment of all program turns.

3.2. Analytical Unit, Chains, and Pressure Windows

Analysis proceeds at the level of the single official response, with each response interpreted within a sequential context. The data structure follows three levels: episode, chain, and turn. An episode is a program installment, a chain is an issue-specific accountability exchange, and a turn is an individual speaking turn.

Chain identifiers were constructed from physical and programmatic boundaries visible in the transcripts. Recurrent host markers such as “let us watch the clip”, “next we focus on”, “welcome back”, “commitment time”, and “please vote” provide repeatable boundary cues. The segmentation used these recurring markers, with review of ambiguous boundaries where needed. The goal was not to reconstruct the finest semantic topic boundary, but to obtain a stable and reproducible chain structure for defining pressure windows.
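The marker-based segmentation described above can be illustrated with a minimal sketch. It assumes a hypothetical transcript layout in which each turn is a dictionary with `role` and `text` fields, and it treats every listed host marker as opening a new segment. Both choices are illustrative assumptions: the original transcripts are in Chinese, and the study's actual rule, including its review of ambiguous boundaries, may treat some markers as stage cues inside a chain.

```python
from typing import Dict, List

# English renderings of the host markers quoted in the text; matching on the
# real transcripts would use the corresponding Chinese phrases.
BOUNDARY_MARKERS = (
    "let us watch the clip",
    "next we focus on",
    "welcome back",
    "commitment time",
    "please vote",
)


def assign_chain_ids(turns: List[Dict[str, str]]) -> List[int]:
    """Return one segment id per turn, opening a new segment at each host marker.

    Assumption (not from the source): every marker spoken by the host opens
    a new segment, and no manual review step is modeled.
    """
    chain_ids: List[int] = []
    current = 0
    for index, turn in enumerate(turns):
        text = turn["text"].lower()
        is_boundary = turn["role"] == "host" and any(
            marker in text for marker in BOUNDARY_MARKERS
        )
        # The first turn always belongs to segment 0, even if it is a marker.
        if is_boundary and index > 0:
            current += 1
        chain_ids.append(current)
    return chain_ids


# Hypothetical five-turn excerpt used only to exercise the function.
example_turns = [
    {"role": "host", "text": "Tonight we look at drainage."},
    {"role": "official", "text": "We are aware of the issue."},
    {"role": "host", "text": "Next we focus on street lighting."},
    {"role": "citizen", "text": "The lamps have been dark for months."},
    {"role": "official", "text": "We will check."},
]
example_chain_ids = assign_chain_ids(example_turns)
```

In this toy excerpt the third turn contains a marker, so the last three turns fall into a second segment. A rule of this kind is reproducible by construction, which is the property the segmentation prioritizes over fine-grained topical boundaries.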

For response-level modeling, each official response is paired with a pressure window. Within the same chain, the pressure window includes all non-official turns that occur after the previous official turn and before the current official turn. If an official response occurs at the beginning of a chain, the window starts from the chain’s first turn. This rule preserves temporal order, avoids contaminating the current pressure measure with earlier same-unit official speech, and gives each response its nearest external pressure environment.
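The windowing rule can be sketched as follows. The example assumes the turns of a single chain arrive in broadcast order as dictionaries with a hypothetical `role` field, where `"official"` marks an official turn; these names are illustrative and not taken from the study's data files.

```python
from typing import Any, Dict, List


def pressure_windows(chain_turns: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Pair each official turn in one chain with its preceding non-official turns.

    A window holds the positions of the non-official turns that come after the
    previous official turn (or from the chain's first turn) and before the
    current official turn.
    """
    windows: List[Dict[str, Any]] = []
    pending: List[int] = []  # non-official positions since the last official turn
    for position, turn in enumerate(chain_turns):
        if turn["role"] == "official":
            windows.append({"response": position, "window": list(pending)})
            pending = []  # the next window starts after this official turn
        else:
            pending.append(position)
    return windows


# Hypothetical chain: two non-official turns, two consecutive official turns,
# one citizen turn, and a final official turn.
example_chain = [
    {"role": "host"},
    {"role": "clip"},
    {"role": "official"},
    {"role": "official"},
    {"role": "citizen"},
    {"role": "official"},
]
example_windows = pressure_windows(example_chain)
```

Under a literal reading of the rule, an official turn that directly follows another official turn receives an empty window, as the second official turn does here. How such cases were scored in the study is not stated, so the sketch leaves the window empty rather than guessing.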

3.3. Text Coding and Measurement

The text variables are based on interpretive content coding rather than keyword counts alone. For response outcomes, the primary text is the current official reply; pressure-window text, previous same-unit text, episode, chain, and stage provide context only when the local meaning is ambiguous. The measurement scheme records a broader set of response dimensions and uses the theoretically relevant dimensions as the analytical variables described below. This design follows standard guidance in content analysis, which stresses explicit categories, clear decision rules, and documented agreement checks when interpretive judgments are converted into analyzable variables [65,66].

Initial classification used the ‘qwen3.6-plus’ model release through the Alibaba Cloud DashScope interface (Alibaba Cloud, Hangzhou, China), followed by researcher review of ambiguous or substantively important cases before analysis. The same model was used for the response outcomes and the pressure variables under task-specific prompts. At the time of coding, ‘qwen3.6-plus’ was the current release in the Qwen Plus line on DashScope. It was selected because the source material consists of Chinese television transcripts with colloquial compression, administrative terminology, and context-dependent references, making a high-capacity Chinese-language model appropriate for this material.

This approach follows a broader text-as-data view in public administration and political methodology, which argues that automated text coding should be paired with concept-specific checks and close reading rather than treated as sufficient on its own [64,67]. Recent work on LLM-based annotation further suggests that model-assisted coding can perform well for short texts and well-specified codebooks, but that performance remains task-dependent and requires independent assessment [68,69,70,71]. Other methodological work points to risks of validity drift, replicability loss, and opaque prompt dependence, which reinforces the need to report coding rules and coder-agreement results rather than rely on model classifications alone [72,73].

The prompts did not ask for free-form interpretation. They specified the categories, inclusion and exclusion rules, boundary cases, and required categorical answers. This design was intended to reduce the risk that the model would generate unsupported summaries or infer unobserved motives. It does not remove model-related risks altogether. A Chinese-language model trained in a Chinese information environment may still reproduce genre-specific assumptions about administrative speech, miss irony or pragmatic ambiguity, or classify formulaic official language too confidently. For that reason, the model supported classification, while researcher judgment remained central for difficult cases and for the agreement checks reported below. The final analytical variables should therefore be understood as model-assisted content codes followed by targeted researcher review and independent human agreement checks, not as unverified LLM outputs.

3.4. Outcome Variables

The response scheme records multiple non-mutually exclusive dimensions. The main models focus on four core strategies while retaining additional categories for descriptive context and for distinguishing routine bureaucratic talk from the absence of meaningful content.

The selection of the four core outcomes is theory-driven rather than prevalence-driven. They capture the main strategic contrast examined here: whether public pressure is associated with visible acknowledgement, displacement of responsibility, vague future-oriented settlement, or checkable commitment. Some additional categories are empirically common, especially routine informational response, but they do not map as directly onto this central contrast between concession, defense, and commitment. Because official replies often combine several speech functions, the outcome dimensions are coded as separate binary indicators rather than as one mutually exclusive typology. A single response can therefore contain acknowledgement together with deflection, one form of commitment, or routine bureaucratic talk. The only deliberate exception is the commitment pair: if a response satisfies the higher threshold for specific commitment, it is not also counted as generic commitment. This rule keeps vague promise-making analytically separate from more checkable commitment. The models estimate whether each pressure cue is associated with the presence of a given response component.

The first core outcome is the acknowledgement indicator, which equals 1 when the response explicitly acknowledges the existence of a problem or responsibility. This includes statements of acceptance, criticism-taking, apology, shame, or self-reproach, but it does not require full acceptance of all responsibility.

The second is the deflection family indicator. It equals 1 when the response weakens immediate responsibility through boundary claims, historical legacy, procedural constraints, coordination difficulties, objective conditions, or outward responsibility shifting. The coding logic does not equate this category with crude blame dumping. Its core criterion is whether the response functionally reduces the focal unit’s immediate responsibility burden.

The third is the generic-commitment indicator. It records future-oriented commitment language that contains a promise to address the issue but does not meet the higher threshold for verifiable specificity. Responses that already qualify as specific commitment are therefore not counted again as generic commitment.

Although both deflection and generic commitment can serve defensive purposes, they are not treated as the same speech function. Deflection operates through spatial or causal displacement, that is, it reduces the focal unit’s immediate responsibility burden by contesting where responsibility lies or why the failure occurred. Generic commitment operates through temporal displacement. It accepts the need to respond, but defers resolution into an underspecified future. Separating the two therefore distinguishes between contesting present responsibility and delaying concrete settlement.

The fourth is the specific-commitment indicator. Under the final coding rule, a response must contain explicit commitment language and satisfy at least two of four action-specificity elements: a clear time element, a clear action, a clear responsible actor, or a clear target or object of rectification. This is the outcome closest to a high-cost, trackable commitment. Consistent with the theoretical argument above, it is treated as an accountability-relevant public output because it gives later actors more material for verification, not because the coding itself proves that the promised remedy was completed.
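The final combination step of the commitment coding can be written as a short decision rule. The sketch below assumes that the five boolean judgments (commitment language plus the four specificity elements) have already been made by the coder; the function name and argument names are hypothetical, and the interpretive judgments themselves are not automated here.

```python
from typing import Dict


def classify_commitment(
    has_commitment_language: bool,
    has_time: bool,
    has_action: bool,
    has_actor: bool,
    has_target: bool,
) -> Dict[str, int]:
    """Combine coded judgments into the two commitment indicators.

    Specific commitment: explicit commitment language plus at least two of
    the four action-specificity elements. Generic commitment: commitment
    language that falls short of that threshold. The two never co-occur.
    """
    n_elements = sum([has_time, has_action, has_actor, has_target])
    specific = int(has_commitment_language and n_elements >= 2)
    generic = int(has_commitment_language and not specific)
    return {"generic_commitment": generic, "specific_commitment": specific}


# A promise with a deadline and a named action clears the threshold.
example_specific = classify_commitment(True, True, True, False, False)
# A promise with only a deadline stays generic.
example_generic = classify_commitment(True, True, False, False, False)
# Specific details without any commitment language count as neither.
example_neither = classify_commitment(False, True, True, True, True)
```

Writing the rule this way makes the deliberate exclusivity of the commitment pair explicit: a response is counted once, at the highest threshold it satisfies.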

In addition to these four core outcomes, the broader response scheme includes categories such as routine informational response, effort claiming, policy or technical explanation, low-information response, and sanction threat. These dimensions reduce the size of the residual category and make the surrounding bureaucratic speech environment more interpretable. The main models do not center on these categories because they do not bear as directly on the article’s core contrast between concession, defense, and commitment. Sanction threat is retained as an additional outcome. Classic blame-avoidance theory can plausibly treat downward punishment as a form of deflection, because responsibility is shifted onto subordinates or lower-level implementers. In the present setting, however, punitive signaling serves a dual function. It can deflect immediate blame downward, but it can also perform visible rectification, political alignment, and administrative resolve before superiors, hosts, and viewers. For that reason, sanction threat is modeled separately rather than subsumed completely within the broader deflection family.

3.5. Pre-Response Pressure Variables

The main models use pressure measures built from rubrics rather than simple dictionary-based counts. Each pressure variable is scored from observable textual cues specified in advance, such as factual anchors, direct responsibility signals, or the tone of the pressure window. This avoids treating repeated keywords as equivalent to substantive pressure and follows the broader text-as-data principle that automated measures are strongest when coding rules are tightly aligned with the substantive concept [64,67]. This strategy is also consistent with content-analysis guidance on making latent constructs traceable to observable indicators and with recent recommendations for LLM-assisted coding under fixed rubrics [65,72,73].

The factual-specificity measure is a 0–3 variable constructed from five factual anchors: numerical anchors, time anchors, place or organizational anchors, process-detail anchors, and documentary-evidence anchors. Higher values indicate that the pressure window contains more verifiable and concretely locatable factual content.

The emotional-intensity measure is also coded from 0 to 3. Unlike simple word-count approaches, it evaluates the overall semantic intensity and moral pressure of the pressure window under a fixed ordinal rubric. It is therefore closer to the overall pressure tone of the accountability scene than to a simple tally of negative words, which matters because the relevant construct is pragmatic pressure rather than mere lexical negativity [55,67,73].

The accountability-directness measure is a 0–3 variable composed from three direct-accountability signals: explicit naming or pointing to a responsible actor, attribution of failure or neglect, and a demand for immediate on-site explanation or action. Higher values indicate that the pressure is more directly aimed at a responsibility bearer rather than framed as a generic problem.

The evidence dimension is modeled not as a single binary exposure variable but through three separate indicators: general non-clip evidence, clearly negative exposure clips, and rectification follow-up clips. This disaggregation is necessary because different evidence forms embed different program logics and do not operate as a single undifferentiated treatment.

3.6. Controls and Structural Variables

The models include four sets of controls. Program stage captures whether the response occurs during issue introduction, clip presentation, transition, commitment, or voting. This is a substantive structural control rather than a routine background variable because commitment patterns vary sharply across stages. Unit type controls for whether the actor is a bureau or functional department, a district or township government, a development-zone body, a party-discipline actor, or another public organization. Calendar year absorbs temporal heterogeneity across 2014–2023.

A final control, and the most important one for addressing omitted problem seriousness, is coded problem severity. This chain-level ordinal measure ranges from 0 to 3 and is coded from clip and host-background text only, with official response content excluded. The score is based on five observable seriousness cues: public-safety or health risk, economic or property loss, affected population scale, repeated exposure or long-term unresolved failure, and explicit illegality or rule violation. These cues guide the ordinal judgement; they are not mechanically added into a five-point count. Each chain score is then assigned to the corresponding response observations. The measure is therefore best understood as a source-separated severity score rather than an estimate of absolute ground-truth harm. The clips and host narratives remain media texts, and media framing can dramatize problem presentation. This design reduces same-text contamination and limits direct conceptual overlap with the pressure variables in the main regressions [64,65,67].

3.7. Intercoder Agreement

An independent second coder assessed agreement on 400 response-level observations and 100 chain-level severity cases. The choice of Cohen’s κ for nominal indicators and linearly weighted κ for ordered scales follows standard guidance in content analysis and annotation research, where agreement measures are expected to account for chance agreement and where weighted coefficients are preferable when disagreements are ordinal rather than purely nominal [66,74]. Both pressure and response variables were included in this check, so the final variables are assessed through human agreement rather than only through the model used for initial classification.

For the four core response outcomes, Cohen’s κ was 0.929 for acknowledgement, 0.873 for deflection, 0.923 for generic commitment, and 0.734 for specific commitment. For the ordinal response-level pressure variables, linearly weighted κ was 0.943 for factual specificity, 0.942 for emotional intensity, and 0.884 for accountability directness; clip valence reached a Cohen’s κ of 0.970. For the independently coded severity control, kappas for the five seriousness cues ranged from 0.696 to 0.888, and the resulting 0–3 problem-severity score reached a linearly weighted κ of 0.760. Table A1 summarizes these results, and Appendix B gives additional measurement details.
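For readers less familiar with the two agreement coefficients, the following is a minimal pure-Python sketch of Cohen’s κ and its linearly weighted variant. It is a textbook formulation written for illustration, not the software used in the study, and it assumes that categories are coded as integers from 0 to `n_categories - 1`. The toy ratings are invented.

```python
from typing import Sequence


def cohen_kappa(
    coder_a: Sequence[int],
    coder_b: Sequence[int],
    n_categories: int,
    linear: bool = False,
) -> float:
    """Cohen's kappa for two coders; linear=True gives linearly weighted kappa.

    Computed as 1 - (observed disagreement / chance-expected disagreement),
    with disagreement weights |i - j| / (n_categories - 1) in the linear case
    and 0/1 weights otherwise.
    """
    n = len(coder_a)
    if n == 0 or n != len(coder_b) or n_categories < 2:
        raise ValueError("need two equally long, non-empty codings and >= 2 categories")
    # Marginal category proportions for each coder.
    p_a = [sum(1 for x in coder_a if x == k) / n for k in range(n_categories)]
    p_b = [sum(1 for x in coder_b if x == k) / n for k in range(n_categories)]

    def weight(i: int, j: int) -> float:
        if linear:
            return abs(i - j) / (n_categories - 1)
        return 0.0 if i == j else 1.0

    observed = sum(weight(a, b) for a, b in zip(coder_a, coder_b)) / n
    expected = sum(
        p_a[i] * p_b[j] * weight(i, j)
        for i in range(n_categories)
        for j in range(n_categories)
    )
    if expected == 0:
        raise ValueError("kappa is undefined when both coders use a single category")
    return 1.0 - observed / expected


# Invented 0-3 ratings with one near miss (0 versus 1) out of four items.
toy_a = [0, 1, 2, 3]
toy_b = [1, 1, 2, 3]
toy_unweighted = cohen_kappa(toy_a, toy_b, n_categories=4)
toy_weighted = cohen_kappa(toy_a, toy_b, n_categories=4, linear=True)
```

In the toy data the single disagreement is only one scale point wide, so the weighted coefficient is higher than the unweighted one. That is the reason weighted coefficients are preferred for the ordinal 0–3 measures: a one-point disagreement is penalized less than a three-point disagreement.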

3.8. Model Strategy

The quantitative design combines three complementary components.

The baseline specification works at the row level. Using single official responses as observations, separate binary logit models are estimated for acknowledgement, deflection, generic commitment, and specific commitment. A sanction-threat indicator is reported separately as an additional outcome that captures downward punitive language rather than one of the core strategic dimensions. Standard errors are clustered at the episode level.

Parallel binary models are used rather than forcing the response package into a mutually exclusive multinomial outcome because the central phenomenon is the co-occurrence of acknowledgement, defense, and commitment. This specification also fits the multi-label structure of the coded text, which preserves overlapping response components rather than collapsing them into a single label [54,64]. Although the latent errors across strategy dimensions may be correlated, parallel logits are retained as the baseline specification because they map directly onto the odds of each interactional strategy and avoid imposing a stronger joint-normal-error structure on non-mutually exclusive speech acts at this stage of the analysis. Because the televised setting is highly interactive, however, the row-level models should be read as estimating contemporaneous associations between the current pressure window and the current official reply rather than clean one-way causal effects. It is entirely plausible that evasive or defensive behavior in turn t − 1 provokes stronger emotional intensity or greater accountability directness in turn t. That concern is addressed only partially through chain fixed effects, historical-maximum specifications, sequence analysis, and an additional lagged-feedback diagnostic reported in Table A2. Those devices reduce but do not eliminate bidirectional feedback.

The second component is a unit-chain terminal model in which the observation is the final response of the same unit within the same chain. For each unit-chain, historical-maximum pressure measures are constructed prior to the terminal response, including the maximum factual specificity, emotional intensity, accountability directness, and evidence indicators experienced earlier in the chain, as well as the number of responses by that unit within the chain. This model addresses a narrower question than full step-by-step causal transition modeling: whether the highest pressure encountered earlier in the chain relates to the strategic character of the final statement.
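The construction of the unit-chain terminal observations can be sketched as a simple aggregation. The field names below are hypothetical, the evidence indicators are omitted for brevity, and one reading of the text is assumed and should be treated as such: the historical maximum is taken over the pressure windows attached to the unit's own responses in the chain, including the window that immediately precedes the terminal response.

```python
from typing import Any, Dict, List, Tuple

PRESSURE_KEYS = (
    "factual_specificity",
    "emotional_intensity",
    "accountability_directness",
)


def terminal_rows(responses: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Collapse ordered official responses into one row per (chain, unit).

    Each input row carries 'chain', 'unit', 'turn', and the pressure scores of
    its own pressure window. Assumption: maxima run over all of the unit's
    windows in the chain, the terminal response's window included.
    """
    grouped: Dict[Tuple[str, str], List[Dict[str, Any]]] = {}
    for row in sorted(responses, key=lambda r: (r["chain"], r["turn"])):
        grouped.setdefault((row["chain"], row["unit"]), []).append(row)

    out: List[Dict[str, Any]] = []
    for (chain, unit), rows in grouped.items():
        record: Dict[str, Any] = {
            "chain": chain,
            "unit": unit,
            "terminal_turn": rows[-1]["turn"],  # last response by this unit
            "n_responses": len(rows),
        }
        for key in PRESSURE_KEYS:
            record["max_" + key] = max(r[key] for r in rows)
        out.append(record)
    return out


# Invented chain with two units; scores follow the 0-3 scales.
example_responses = [
    {"chain": "c1", "unit": "water_bureau", "turn": 3,
     "factual_specificity": 2, "emotional_intensity": 1, "accountability_directness": 0},
    {"chain": "c1", "unit": "district", "turn": 5,
     "factual_specificity": 1, "emotional_intensity": 2, "accountability_directness": 3},
    {"chain": "c1", "unit": "water_bureau", "turn": 7,
     "factual_specificity": 1, "emotional_intensity": 3, "accountability_directness": 1},
    {"chain": "c1", "unit": "water_bureau", "turn": 12,
     "factual_specificity": 0, "emotional_intensity": 2, "accountability_directness": 2},
]
example_terminal = terminal_rows(example_responses)
```

In the invented chain, the water bureau speaks three times, so its terminal row records three responses and a maximum emotional intensity of 3 even though the window before its last reply scored only 2. This is the sense in which the terminal model asks about peak rather than immediate pressure.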

The third component is a sequence analysis. Because the data preserve within-chain order, the analysis also examines adjacent state transitions and estimates discrete-time first-event hazards for two events: first visible concession, defined as acknowledgement or specific commitment, and first specific commitment. Here, visible concession refers to observable concessionary language, not evidence that a substantive remedy has already occurred.

Studies of e-government responsiveness show that actual contact attempts can reveal whether public interfaces deliver timely and usable responses, rather than merely whether institutions have adopted digital channels [75,76]. Adaptive-governance research likewise frames responsiveness as a temporal balancing problem between adaptation, stability, and accountability [77]. Methodologically, discrete-time event-history models provide standard tools for analyzing the timing of transitions in ordered response sequences [78,79]. Sequence analysis complements them by foregrounding path dependence in ordered interaction [80]. In this study, they illuminate timing and path dependence without forcing inference to rest on sparse global transition cells alone.
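The data layout behind a discrete-time first-event hazard can be sketched as follows. Each ordered response sequence is expanded into one row per period at risk, ending at the first event; sequences without the event are treated as right-censored. The sketch computes only the raw (covariate-free) hazard, and the choice of sequence unit and period index is an assumption, since the text does not specify them at this level of detail.

```python
from typing import Dict, List


def person_period_rows(event_flags: List[bool]) -> List[Dict[str, int]]:
    """Expand one ordered response sequence into risk rows up to the first event.

    event_flags[i] is True when response i shows the event of interest
    (for example, specific commitment). Rows stop at the first event.
    """
    rows: List[Dict[str, int]] = []
    for period, flag in enumerate(event_flags, start=1):
        rows.append({"period": period, "event": int(flag)})
        if flag:
            break  # after the first event the sequence is no longer at risk
    return rows


def empirical_hazard(sequences: List[List[bool]]) -> Dict[int, float]:
    """Share of at-risk sequences that experience their first event in each period."""
    at_risk: Dict[int, int] = {}
    events: Dict[int, int] = {}
    for sequence in sequences:
        for row in person_period_rows(sequence):
            at_risk[row["period"]] = at_risk.get(row["period"], 0) + 1
            events[row["period"]] = events.get(row["period"], 0) + row["event"]
    return {period: events[period] / at_risk[period] for period in sorted(at_risk)}


# Four invented sequences; True marks a response containing the event.
example_sequences = [
    [False, True, False],
    [True],
    [False, False],
    [False, True],
]
example_hazard = empirical_hazard(example_sequences)
```

In a model-based version, the stacked person-period rows would serve as observations in a binary regression of `event` on period and covariates, which is the standard way to estimate discrete-time hazards. The sketch stops short of that step.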

Finally, an additional lagged-feedback diagnostic reverses the direction of the row-level question. Using the same adjacent-transition sample, it estimates whether official response at turn t − 1 predicts pressure at turn t, conditional on lagged pressure, current stage, unit type, year, problem severity, and within-chain turn order. The data consist of uneven interaction sequences rather than balanced panel waves, so this diagnostic is not presented as a full cross-lagged panel model or a causal identification strategy. Its results are reported with the robustness checks and in Table A2.
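The adjacent-transition layout used by this diagnostic can be illustrated with a short sketch. It assumes that "adjacent" means consecutive official responses within the same chain and uses hypothetical field names; only one response indicator and one pressure score are carried through to keep the example compact.

```python
from typing import Any, Dict, List


def adjacent_pairs(chain_responses: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Pair each official response (t - 1) with the next one (t) in one chain.

    The lagged response indicator and lagged pressure come from t - 1, while
    the outcome is the pressure score of the window preceding response t.
    """
    ordered = sorted(chain_responses, key=lambda r: r["turn"])
    pairs: List[Dict[str, Any]] = []
    for order, (previous, current) in enumerate(zip(ordered, ordered[1:]), start=2):
        pairs.append({
            "lag_deflection": previous["deflection"],
            "lag_directness": previous["accountability_directness"],
            "directness": current["accountability_directness"],
            "turn_order": order,  # position of response t within the chain
        })
    return pairs


# Invented chain of three official responses.
example_chain_responses = [
    {"turn": 4, "deflection": 1, "accountability_directness": 1},
    {"turn": 9, "deflection": 0, "accountability_directness": 3},
    {"turn": 15, "deflection": 0, "accountability_directness": 2},
]
example_pairs = adjacent_pairs(example_chain_responses)
```

A chain with three official responses yields two transitions, so chains of different length contribute unequal numbers of rows. This is the unevenness that keeps the diagnostic from being treated as a balanced cross-lagged panel.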

3.9. Robustness Strategy

The robustness checks address three concerns. Row-level models are re-estimated on a pre-commitment subsample restricted to the earlier stages of the program, testing whether the main findings are merely artifacts of the highly scripted commitment stage. The clustering level is then varied from episodes to chains, and chain fixed effects are introduced in the row-level models to absorb time-invariant chain heterogeneity and reduce sensitivity to omitted problem-level characteristics. This fixed-effects specification is deliberately conservative because it absorbs stable within-chain differences in political sensitivity, problem complexity, and underlying scandal seriousness before the pressure variables are evaluated. Together with the severity control and the sequence analysis, these checks assess whether the core patterns are reducible to one modeling choice.

4. Results

4.1. Descriptive Overview of the Response Structure

The empirical analysis uses 3675 valid official responses drawn from 73 episodes, 327 issue chains, and 840 unit-chain observations. In raw frequency terms, acknowledgement of problem or responsibility is the most common core response dimension, appearing 1496 times and accounting for 40.7% of the sample. Routine informational response is also frequent, accounting for 26.3% of the sample; generic commitment accounts for 25.5%; and deflection or shared-responsibility language appears in 22.1%. By contrast, specific commitment accounts for only 7.6%, and sanction-threat language for 8.2%. The descriptive pattern is clear: the most common responses in the televised accountability arena are not high-cost, verifiable concessions, but lower- to medium-cost moves such as acknowledgement, explanation, routine information, and generic commitment.

The responses also show a pronounced composite structure. Figure 2 maps the non-empty co-occurrence patterns across the four core response dimensions. Frequent combinations include not only single-strategy cases but also acknowledgement plus generic commitment, acknowledgement plus deflection, and acknowledgement plus deflection plus generic commitment. Acknowledgement co-occurs with generic commitment 445 times and with deflection 344 times. Although acknowledgement-only replies remain the largest non-empty core pattern, the broader structure is plainly combinatory rather than mutually exclusive. This is direct evidence against a simple admit-versus-deny reading of official behavior. The accountability scene is better understood as a setting in which multiple rhetorical resources are jointly mobilized.

The additional response categories also clarify the residual category. If one looks only at the five reported outcome dimensions, 1104 responses have none of those labels, or about 30.0% of the sample. Once routine informational response, low-information response, effort claiming, and related categories are taken into account, 1006 of those cases receive at least one additional substantive classification. Only 98 responses remain unclassified within the broader response scheme, or 2.7% of the total sample. The data therefore do not contain a large residual category with no interpretable content. Instead, the bureaucratic speech environment contains a substantial layer of technical and routine talk that is kept analytically separate from the more theoretically consequential strategies of acknowledgement, deflection, and commitment.

4.2. Row-Level Main Models

The row-level binary logit models examine how the pre-response pressure variables are associated with the four core response outcomes while controlling for stage, unit type, year, and coded problem severity. Table 1 reports the main coefficients in odds-ratio form for the four core outcomes and the separately reported sanction-threat outcome, while Figure 3 visualizes the same patterns for the four core outcomes.

Emotional intensity provides the most consistent pressure-response pattern in the row-level specification. Higher emotional intensity is associated with higher odds of acknowledgement (OR = 1.931, p < 0.001), deflection (OR = 1.164, p = 0.014), generic commitment (OR = 1.230, p < 0.001), and sanction-threat language (OR = 1.316, p = 0.031). It is simultaneously associated with lower odds of specific commitment (OR = 0.718, p = 0.003). That contrast is central to the argument. Emotional escalation does not linearly coincide with more high-cost concession. Instead, it is linked to more immediate public response and a broader mix of strategies deployed on air.
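As a reading aid, an odds ratio can be converted into a percentage change in the odds. The conversion below is plain arithmetic on two of the reported estimates. It assumes that the 0–3 emotional-intensity score enters the model as a single numeric term, so that each odds ratio refers to a one-point increase; the text does not state this explicitly.

```latex
\[
\%\Delta\,\text{odds} = (\mathrm{OR} - 1) \times 100
\]
\[
(1.931 - 1) \times 100 = 93.1\% \quad \text{(acknowledgement)}, \qquad
(0.718 - 1) \times 100 = -28.2\% \quad \text{(specific commitment)}
\]
```

Under that assumption, a one-point increase in emotional intensity corresponds to roughly 93% higher odds of acknowledgement and roughly 28% lower odds of specific commitment. These are changes in odds, not in probabilities.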

Accountability directness shows a different but equally important pattern. It is associated with higher odds of acknowledgement (OR = 1.123, p = 0.008) and deflection (OR = 1.259, p < 0.001), while being associated with lower odds of generic commitment (OR = 0.853, p = 0.006). One clear implication is that concession and defense can coexist under highly visible pressure. Direct naming and responsibility targeting do not map onto simple submission. They often correspond to an acknowledge-and-defend package in which acceptance and boundary work move together.

The evidence split reveals marked internal heterogeneity. Non-clip general evidence is associated with lower odds of deflection (OR = 0.511, p = 0.034) and higher odds of specific commitment (OR = 1.828, p = 0.017). Negative exposure clips are associated with lower odds of deflection (OR = 0.661, p = 0.026) and generic commitment (OR = 0.585, p = 0.012). Rectification follow-up clips are associated with lower odds of acknowledgement (OR = 0.323, p = 0.034). These patterns underscore why evidence cannot be treated as a simple binary category. Different evidentiary forms map onto different parts of the official response space.

Factual specificity also has a more focused pattern than an undifferentiated pressure interpretation would imply. It is not robustly associated with higher odds of acknowledgement or specific commitment, but it is associated with lower odds of deflection (OR = 0.830, p = 0.003) and sanction-threat language (OR = 0.829, p = 0.011). When the problem is already more concrete, more locatable, and more verifiable, the space for boundary-based reinterpretation and symbolic downward punishment talk appears to shrink.

The severity control does not eliminate the core findings. It is significantly positive only in the deflection model (OR = 1.126, p = 0.023), which suggests that more severe problems can themselves be associated with defensive talk. The larger pattern remains intact: emotional intensity, accountability directness, and evidence type each have distinct and nontrivial relationships with response structure beyond a simple projection of issue seriousness.

4.3. Unit-Chain Terminal Models

The unit-chain terminal model captures a limited but substantively meaningful form of chain dynamics. The observation here is the final response of the same unit within the same issue chain, while the key predictors are historical maximum pressure values encountered earlier in that unit-chain. The purpose is not to estimate full step-by-step causal transitions, but to examine whether accumulated peak pressure leaves a trace in the strategic character of the unit’s final on-air statement. Table 2 reports the corresponding odds ratios, and Figure 4 provides a compact visual summary.

This model points to a more selective and less linear dynamic than a simple accumulation story would predict. Historical maximum emotional intensity is associated with higher odds of terminal acknowledgement (OR = 1.662, p = 0.001). Historical maximum accountability directness is associated with higher odds of terminal deflection (OR = 1.655, p = 0.006). These findings complement the row-level results by suggesting that emotional pressure leaves behind a stronger trace of public acceptance, while accumulated accountability directness is linked to boundary maintenance and responsibility dilution by the end of the chain.

There is, however, little support for the idea that accumulated pressure is reliably associated with terminal specific commitment. The key historical maximum pressure indicators do not yield a stable positive association with terminal specific commitment, and terminal generic commitment is likewise not robustly linked to the core historical pressure terms. Even when pressure is summarized as the highest level previously encountered in the chain, the final response does not follow a simple pattern in which more pressure corresponds to more concrete concession.

The estimates also show that the more times the same unit speaks within the chain, the less likely the final statement is to contain acknowledgement language (OR = 0.925, p = 0.003) or deflection language (OR = 0.935, p = 0.005). Longer interaction is therefore not associated with a more comprehensive final statement. One plausible explanation is that final statements become increasingly absorbed into closure scripts, commitment-stage expectations, and end-of-chain formulae as the chain progresses.

The separately reported sanction-threat model shows a similar pattern. Historical maximum emotional intensity (OR = 1.898, p = 0.023) and accountability directness (OR = 1.933, p = 0.015) are both positively associated with terminal sanction-threat language, whereas historical maximum factual specificity is negatively associated with it (OR = 0.504, p < 0.001). This pattern is consistent with the view that intense and directly targeted pressure can sometimes be redirected into downward punitive signaling rather than translated into higher-cost commitment.

Figure 5 compresses the same estimates into a cross-model association grid. The comparison is useful because it makes the contrast between immediate and accumulated pressure visually explicit. Emotional intensity is the most consistent positive correlate of acknowledgement in both panels, while accountability directness is the clearest stable correlate of deflection. By contrast, the terminal panel is visibly sparser than the row-level panel, reinforcing the claim that accumulated pressure leaves a narrower and more selective trace than immediate on-air confrontation.

4.4. Stage Heterogeneity

Stage-level descriptive patterns reinforce the claim that not all response types occupy the same place in the program sequence. In the commitment stage, generic commitment reaches 53.2% and specific commitment 30.8%. In the issue-exposition stages, by contrast, specific commitment appears in only 4.9% of responses. Earlier stages are instead dominated by acknowledgement (41.2%), deflection (22.6%), generic commitment (22.7%), and routine informational talk (27.5%). The program thus exhibits strong process dependence: early and middle stages are primarily devoted to pressure input, explanation, defense, and provisional stance-taking, whereas commitment production is concentrated in later stages.

This is particularly important for interpreting specific commitment. If specific commitment is treated as a response form fully parallel to acknowledgement and deflection, one risks overstating its status as an immediate product of pressure. The stage distribution suggests a more cautious reading. Specific commitment behaves partly like a structured output of the program’s later stages rather than a pure contemporaneous reaction to the immediately preceding pressure window. In that sense, the late emergence of specific commitment reflects not only short-horizon strategic adjustment under pressure but also the dramaturgy of televised accountability itself, in which checkable resolutions are often reserved for the concluding commitment segment. Figure 6 visualizes this stage-specific response structure directly.

4.5. Robustness Checks

The robustness analysis asks whether the core findings depend on one specific stage mix, clustering choice, or chain-level structure. Figure A1 summarizes several of the most consequential comparisons.

When the row-level models are re-estimated on a pre-commitment subsample restricted to the issue-introduction, clip, and transition stages, the positive association between emotional intensity and acknowledgement not only survives but becomes stronger (OR = 2.003, p < 0.001). The positive association between accountability directness and deflection also remains stable (OR = 1.239, p < 0.001). These results indicate that the main findings are not simply artifacts of the later commitment stage.
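To give a sense of scale for an odds ratio such as OR = 2.003, the following minimal sketch converts an odds ratio into an implied probability. The 30% baseline probability is a hypothetical value chosen only for illustration and is not reported in the article; the function name `apply_odds_ratio` is likewise an assumption of this sketch.

```python
def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Return the probability implied by multiplying the baseline odds by odds_ratio."""
    if not 0.0 < baseline_prob < 1.0:
        raise ValueError("baseline_prob must lie strictly between 0 and 1")
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Hypothetical baseline: 30% of responses contain acknowledgement.
# Only the odds ratio (2.003) comes from the text above.
implied = apply_odds_ratio(0.30, 2.003)
print(f"implied probability: {implied:.3f}")
```

Under that assumed baseline, doubling the odds moves the probability from 0.30 to roughly 0.46, which is why an odds ratio near 2 should not be read as a doubling of the probability itself.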

The same holds when the clustering level changes from episode to chain. Emotional intensity remains positively associated with acknowledgement, accountability directness remains positively associated with deflection, and the negative association between negative exposure clips and generic commitment remains significant. The substantive story therefore does not depend on one clustering rule.

The core theoretical relationships also survive when chain fixed effects are introduced. Emotional intensity remains positively associated with acknowledgement (OR = 1.890, p < 0.001), and accountability directness remains positively associated with deflection (OR = 1.174, p = 0.005). This is a conservative within-chain test because the fixed effects absorb all time-invariant issue-level features, including stable differences in political sensitivity, problem complexity, and baseline scandal seriousness. This specification cannot remove all omitted-variable concerns, but it shows that the article’s central claims are not driven entirely by between-chain heterogeneity.

A final diagnostic addresses possible reverse feedback from official response to subsequent pressure. Table A2 shows that prior deflection is associated with higher subsequent accountability directness (coefficient = 0.102, p < 0.05), prior acknowledgement is associated with higher subsequent emotional intensity (coefficient = 0.117, p < 0.001), and prior generic and specific commitment are associated with higher subsequent factual specificity (coefficients = 0.186 and 0.243, both p < 0.001). Some weaker negative associations also appear between prior deflection and later exposure or review clips. These patterns indicate that interactional feedback is present: later pressure can partly respond to earlier official behavior. At the same time, the feedback is selective rather than uniform, and the diagnostic does not overturn the main interpretation of structured pressure-response associations. It is therefore best read as a caution against one-way causal claims, not as a replacement for the main models.
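A lagged-feedback diagnostic of this kind requires pairing each turn's pressure with the same unit's previous response. The sketch below shows one way such a lag could be built. The field names (`chain_id`, `unit`, `turn`, `deflection`, `directness`) and the toy rows are hypothetical, and the article does not describe its data layout, so this illustrates the idea rather than the authors' procedure.

```python
from collections import defaultdict


def add_lagged_response(rows, response_key="deflection"):
    """Attach the previous turn's response indicator within each unit-chain.

    The first turn of every unit-chain is dropped because it has no prior response.
    """
    by_chain = defaultdict(list)
    for row in rows:
        by_chain[(row["chain_id"], row["unit"])].append(row)

    lagged = []
    for group in by_chain.values():
        group.sort(key=lambda r: r["turn"])  # restore temporal order
        previous = None
        for row in group:
            if previous is not None:
                lagged.append({**row, f"prior_{response_key}": previous[response_key]})
            previous = row
    return lagged


# Toy rows (hypothetical values, deliberately out of order).
toy_rows = [
    {"chain_id": 1, "unit": "u", "turn": 2, "deflection": 0, "directness": 3},
    {"chain_id": 1, "unit": "u", "turn": 1, "deflection": 1, "directness": 1},
    {"chain_id": 2, "unit": "v", "turn": 1, "deflection": 0, "directness": 2},
]
toy_lagged = add_lagged_response(toy_rows)
print(toy_lagged)
```

Each output row can then serve as one observation in a model of subsequent pressure (here `directness`) on the prior response (`prior_deflection`).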

4.6. Sequential Patterns and Discrete-Time Event Histories

Because the data preserve repeated response order within issue chains, the analysis also examines unit-chains with at least two responses by the same unit. This yields 447 repeated unit-chains and 2835 adjacent transitions.

The most frequent adjacent transitions are 0 to 0, A to A, 0 to A, A to 0, 0 to D, 0 to G, D to 0, and D to D. Here, 0 denotes none of the four core strategies, A acknowledgement, D deflection, G generic commitment, and S specific commitment. The dominance of these transitions shows that repeated official responses do not simply follow a monotonic escalation path. Officials move back and forth between routine response, acknowledgement, and defense rather than progressing mechanically toward ever stronger concession.

Figure 7 adds a fuller row-normalized view of those state changes. Two points stand out. First, the diagonal cells for none, acknowledgement, deflection, and generic commitment remain relatively dark, which indicates meaningful short-run persistence. Second, many off-diagonal moves flow back into the none or acknowledgement states rather than toward specific commitment. The sequential process therefore looks less like steady escalation and more like repeated repositioning under pressure.
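The counting behind a row-normalized transition view can be sketched as follows. The state labels follow the text (0, A, D, G, S), but the toy sequences are invented, and the sketch assumes each response has already been reduced to a single state, a simplification the article does not spell out.

```python
from collections import Counter

STATES = ["0", "A", "D", "G", "S"]


def row_normalized_transitions(sequences):
    """Count adjacent state transitions and normalize each origin row to sum to 1."""
    counts = Counter()
    for seq in sequences:
        for prev, curr in zip(seq, seq[1:]):
            counts[(prev, curr)] += 1

    matrix = {}
    for prev in STATES:
        row_total = sum(counts[(prev, curr)] for curr in STATES)
        matrix[prev] = {
            curr: (counts[(prev, curr)] / row_total if row_total else 0.0)
            for curr in STATES
        }
    return counts, matrix


# Toy unit-chains (hypothetical); each list is one unit's ordered responses.
toy_sequences = [["0", "A", "A", "0"], ["0", "D", "0", "G"]]
toy_counts, toy_matrix = row_normalized_transitions(toy_sequences)
print(toy_matrix["A"])
```

Row normalization answers the question "given the previous state, where does the next response go?", which is why dark diagonal cells indicate short-run persistence.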

The path-specific evidence is also suggestive. When the previous response contains deflection and the current pressure does not include a negative exposure clip, the current-turn visible-concession rate is 38.7%. When the previous response contains deflection and the current pressure includes a negative exposure clip, the rate rises to 63.6%, a difference of 24.9 percentage points. The cell with a negative exposure clip is small, but the contrast illustrates why sequence matters for interpreting the same pressure cue. Table 3 reports this path-specific contrast together with the discrete-time event-history models.

The sequential evidence becomes especially informative in discrete-time event-history form. For first visible concession, defined as acknowledgement or specific commitment, 385 of the 447 eligible unit-chains experience the event, an event rate of 86.1%. The median event turn is 1 and the mean is 1.86. Emotional intensity is associated with a higher hazard of first visible concession (OR = 2.088, p < 0.001), while the unit’s order within the chain is significantly negative (OR = 0.895, p = 0.007). Visible concession therefore tends to occur very early, and stronger emotional pressure is linked to faster occurrence.

The pattern is different for first specific commitment. Only 140 of the 447 eligible unit-chains experience the event, an event rate of 31.3%. The median event turn is 5 and the mean is 6.12. Emotional intensity is associated with a lower hazard of first specific commitment (OR = 0.661, p = 0.002), while the unit’s order within the chain is significantly positive (OR = 1.047, p = 0.007). Within-chain turn order is, however, highly collinear with programmatic stage progression, so this positive later-turn baseline is interpreted as partly capturing the institutional shift into the program’s concluding commitment phase rather than as a pure temporal clock. This pattern indicates that the kind of pressure most closely linked to immediate public concession is not the same as the kind of pressure linked to higher-cost specific commitment. Figure 8 adds a descriptive view of this timing pattern by combining chain reach at each turn with the cumulative incidence of the two first-event outcomes. Figure 9 presents the corresponding survival curves for the two first-event outcomes.
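Discrete-time event-history models are commonly estimated on person-period data, in which each unit-chain contributes one row per turn until its first event. The sketch below illustrates that expansion and reproduces the two reported event rates from the counts given in the text. The data layout, the field names, and the toy chains are assumptions for illustration, not the authors' code.

```python
def to_person_period(chains):
    """Expand unit-chains into one row per turn at risk, stopping at the first event."""
    rows = []
    for chain in chains:
        for turn, has_event in enumerate(chain["events"], start=1):
            rows.append(
                {"chain_id": chain["chain_id"], "turn": turn, "event": int(has_event)}
            )
            if has_event:
                break  # later turns are no longer at risk of a *first* event
    return rows


def event_rate_percent(n_events, n_eligible):
    """Share of eligible unit-chains that ever experience the event, in percent."""
    return round(100 * n_events / n_eligible, 1)


# Toy chains (hypothetical): c1 has its first event at turn 2, c2 never does.
toy_chains = [
    {"chain_id": "c1", "events": [False, True, True]},
    {"chain_id": "c2", "events": [False, False]},
]
toy_person_period = to_person_period(toy_chains)

# Counts taken from the text: 385 and 140 events among 447 eligible unit-chains.
concession_rate = event_rate_percent(385, 447)
specific_rate = event_rate_percent(140, 447)
print(concession_rate, specific_rate)
```

A logit fitted to the `event` column of such rows, with turn order and pressure measures as predictors, yields hazard odds ratios of the kind reported above.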

5. Discussion

The results speak to public accountability and to wider debates on systemic governance in data-mediated urban settings. Smart-governance and public-value research has emphasized that accountability depends on the configuration of information flows, organizations, and civic-facing interfaces [15,16,18]. The Nanning evidence adds a response-level view of that configuration. What matters here is not simply whether an official sounds cooperative, but what kind of public response is produced by a staged oversight interface under visibility, time constraint, and reputational threat. Chinese research on televised and mediated governance similarly treats these formats as organized governance processes rather than as simple acts of disclosure [20,21,22]. In that sense, the main finding is that public pressure redistributes response across acknowledgement, defense, and commitment instead of converting directly into substantive correction.

5.1. Systemic Pressure and Visible Adjustment

The clearest result is that emotional intensity does not simply make officials more compliant. Classic work on blame avoidance explains why public officials have incentives to combine concession with responsibility management [28,30], and discursive accounts further show that acceptance, justification, distancing, and responsibility shifting can appear in the same public response [53,54]. The Nanning models show this mixed pattern directly. Emotional intensity is associated with higher odds of acknowledgement, deflection, and generic commitment while being associated with lower odds of specific commitment. An additional pattern for sanction-threat language points in the same direction: visible pressure can be partly absorbed by redirecting punitive signals toward subordinate actors. Emotional escalation therefore corresponds to more immediate strategic activity rather than moving officials in one uniform direction.

A plausible interpretation is that highly emotional pressure compresses the decision space and increases the value of immediate encounter management. Research on social-evaluative threat, coping, threat rigidity, and decision-making under stress provides a useful interpretive lens for this pattern [35,36,37,38]. In the present data, acknowledgement can reduce immediate interpersonal tension, deflection can protect the self or organization, and generic commitment can defer resolution to a safer future point. Specific commitment, by contrast, requires more precise forward planning and a more checkable public pledge, which helps explain why it is less common under acute emotional pressure.

The sequence evidence reinforces this interpretation. Work on media events and mediatized political communication treats public performance as structured sequence rather than free exchange [32,33,34]. In the Nanning setting, emotional intensity is associated with a higher hazard of first visible concession but a lower hazard of first specific commitment. Emotional pressure is therefore more closely tied to on-air concession in the sense of visible acknowledgement or surface accommodation. It does not automatically correspond to high-cost, checkable commitments.

The results also clarify the double-edged nature of accountability directness. Direct responsibility targeting is associated with acknowledgement and deflection at the same time, and in the terminal model its cumulative maximum value remains associated with final deflection. When a host or clip explicitly identifies responsibility, officials face stronger response pressure. But the modal response is not simple submission. It is a mixed package in which acknowledgement coexists with boundary work, coordination claims, or shared-responsibility language.

In this case, highly visible naming does not collapse bureaucratic defense into pure confession. Comparative work on blame games and boundary work is useful here because it treats public responsibility claims as relational and contested rather than automatic [31,56]. Direct responsibility targeting raises the interpersonal and reputational stakes of the moment, which can make a defensive response package more likely even when some acknowledgement is unavoidable. The televised scene therefore reveals not a clean shift from denial to honesty, but a more complex coexistence of concession and defense under intense public visibility.

5.2. Constraint, Evidence, and the Limits of High-Cost Commitment

If emotional intensity mainly corresponds to response activation, factual specificity and evidence appear to work more by narrowing the available maneuvering space. Work on transparency and blame avoidance has long warned that visibility can coexist with strategic responsibility management [29,30]. The present findings specify one condition under which that room narrows: factual specificity is associated with lower odds of deflection, negative exposure clips are associated with lower odds of deflection and generic commitment, and non-clip evidence is associated with higher odds of specific commitment. These results also support the interpretation of generic commitment as a form of temporal blame displacement. Vague future-oriented promises are most useful when immediate pressure can still be absorbed rhetorically; once a problem is framed in more concrete, verifiable, and documentable terms, that room shrinks.

The evidence split is therefore essential. A negative exposure clip, a rectification review clip, and general documentary evidence all count as evidence in a broad sense, but they do not work in the same way. Negative clips seem especially effective at limiting low-information verbal maneuvering. General non-clip evidence is more closely associated with specific commitment, likely because it is often tied to named targets, documents, data records, dates, lists, or procedural traces that create a more stable factual anchor. A video clip can be powerful and morally vivid, but it remains a staged, selected, and formatted presentation of the problem, a point consistent with work on media events and mediatized political communication [32,34]. That form of evidence may leave officials more room to dispute representativeness than documents, records, or named procedural traces do. Documentary or data-based evidence is less easily reframed as a visual impression alone, and this interpretation fits wider data-governance work that treats information flows as structuring accountability and public value arrangements [16,18]. It can narrow the reply space by making the object of rectification easier to name and the later promise easier to check. Treating all evidence as one binary input would flatten the heterogeneity that matters most in this setting.

The timing evidence also helps explain the row-level results. First visible concession usually occurs early, with a median turn of 1. Specific commitment appears much later, with a median turn of 5. The two events therefore occupy different parts of the interaction: one is immediate and publicly visible, whereas the other is rarer, later, and more dependent on continued program progression.

This timing pattern also helps explain why emotional intensity is associated with higher odds of acknowledgement and generic commitment but lower odds of specific commitment. The literature on mediatization emphasizes that political communication is shaped by staged formats, phases, and media logics [32,33]. In this case, emotional escalation is linked to a quicker shift into visible response, but not directly into high-cost commitment. Specific commitment depends much more heavily on later-stage interaction, stage-specific expectations, and, in some cases, harder evidence. High-cost commitment is therefore not simply a delayed by-product of pressure. It is also an institutional product of the televised sequence, where the program format helps determine when checkable commitments become speakable.

This does not mean that pressure is irrelevant for specific commitment. Non-clip evidence is positively associated with it. The broader pattern, however, suggests that specific commitment is not mainly an immediate function of preceding pressure. It is a later-stage structured output shaped by program sequence, response turn order, and only some pressure types. In this setting, specific commitment is more valuable than generic commitment, but it is not equivalent to accountability success.

This interpretation also aligns with systems-oriented work on smart-city governance. Studies of smart public services and big-data-assisted urban governance emphasize how performance depends on the configuration of services, data, and stakeholder relations across the wider interface [26,81]. In systemic-governance terms, visibility is therefore not a neutral transparency input. It is a structuring force that can increase legible response without guaranteeing higher-value response.

These results do not directly measure officials’ internal states. They show a narrower pattern: observed responses in a high-visibility bureaucratic setting are consistent with short-horizon adjustment under public pressure, and this account helps explain why emotional escalation broadens visible response while failing to produce more frequent high-cost commitment.

5.3. Implications for Systemic Accountability and Public Value in Smart-City Governance

Televised accountability is better understood as a governance interface than as a simple transparency device. Smart-city governance research makes a similar move when it analyzes the relationships among technology, organizational arrangements, public information, and civic-facing interfaces [14,15,59,60]. Public value governance scholarship likewise treats public value as a product of institutional arrangements and multi-actor problem solving, not managerial output alone [2,6]. Televised accountability links media production, evidentiary presentation, bureaucratic hierarchy, and public evaluation in one staged sequence. Within that interface, the observed pattern is not one uniform response to pressure, but a redistribution of output across different response components. Emotional pressure is associated with broader visible response, but not in a way that reliably corresponds to the most substantively costly output. This helps clarify how systemic accountability can generate legibility and movement without necessarily generating high-cost commitment.

The accountability implication is that televised oversight strengthens answerability more clearly than it proves accountability effectiveness. Bovens’s actor–forum formulation is helpful here: officials are made to answer before a forum, but the forum is composite, with hosts, viewers, municipal superiors, and later administrative follow-up all partly present [7]. That structure explains why visibility can produce immediate acknowledgement and defense together. Mulgan’s warning against treating accountability as a loose synonym for responsiveness is also important [8]. The Nanning results show responsiveness under public questioning, but they do not show that every response produces sanction, correction, or learning. Evaluated against the democratic, constitutional, and learning functions of accountability, the program’s strongest observed effect is to make official answer-giving visible and sequentially traceable; its weaker point is the selective and delayed production of checkable commitment [9,10].

Informational architecture matters as well. Accountability and blame management need to be analyzed together with the arrangements that make response publicly visible. Factual specificity and evidentiary form do not simply add more pressure. They change the structure of the reply space itself. Concrete documentary inputs narrow room for rhetorical maneuvering, whereas some forms of exposure mainly force immediate visible adjustment. This bears directly on current debate over urban accountability, because oversight quality depends not only on whether exposure occurs, but also on how information is curated, staged, and sequenced. The same point carries over to smart-city governance. Research on big data, open government data, and data-driven accountability links public value to the movement of information across organizational boundaries [16,17,18,26]. The Nanning results suggest that complaint signals need to be converted into replies specific enough to support later verification, coordination, and learning.

The public value implication is equally specific. Moore’s account directs attention to whether public action connects a value proposition to authorization and operational capacity, while Bozeman’s work warns that visible activity can still coexist with public-value failure when public values are weakly articulated, aggregated, or sustained over time [1,3,4]. If visible accountability is mainly associated with quick acknowledgement, mixed defense, and generic commitment, then system performance cannot be judged solely by whether officials respond on air. A more demanding standard asks whether the oversight interface produces responses that are specific enough to support later verification, follow-up, and learning. On that criterion, the Nanning case suggests both the value and the limit of public exposure: it is linked to greater legibility and movement, but it does not by itself guarantee higher-quality commitment.

The distinction between symbolic accommodation and checkable commitment therefore has broader implications for mediated accountability arrangements. For public value creation, a generic promise can preserve the appearance of action while leaving the link among public purpose, authorization, and operational capacity underspecified. A specific commitment does not prove that public value has been created, but it gives later actors a more determinate object around which implementation, verification, and operational adjustment can be organized [1,3,5]. For governance legitimacy, the televised interface can demonstrate that grievances have been heard and that officials are publicly answerable, a function that is especially important in managed participatory settings [46,52,58,62]. Yet legitimacy grounded only in visible responsiveness remains fragile if response does not become checkable. For institutional learning, the key issue is whether the public account creates a traceable object for later questioning, correction, and feedback. Specific commitment is analytically important because it is the response form most likely to leave such an object, even though the present study cannot verify whether the object is later acted on [9,10,77]. This distinction is therefore not only a measurement issue; it identifies a core systems-governance problem: whether public visibility can be converted into accountable follow-up, institutional learning, and public value.

Sequence and feedback deserve equal attention. The distinction between early visible concession and later specific commitment matters because it separates two governance problems. One concerns how the interface manages the immediate encounter under public scrutiny. The other concerns when it yields a more checkable and costly response mode. The timing of accountability output is therefore part of the governance process rather than a secondary detail. Adaptive governance research stresses learning and adjustment over time [77], while sequence analysis gives a vocabulary for identifying path-dependent ordering in social processes [80]. The Nanning sequence results fit that logic: immediate concession is often part of managing the public moment, whereas specific commitment is more likely to emerge later, under stronger factual and evidentiary constraint and under stage-specific expectations. This connects the findings to systemic governance in complex urban settings: evidence flows, staged interaction, hierarchical pressure, and public evaluation interact over time, so the quality of accountability depends on the response pathway that the interface activates rather than on visibility alone.

This interpretation also clarifies the balance between theatrical accountability and authoritarian responsiveness. The televised format can work as a partial safety valve: by drawing grievances into a managed media sequence, it channels social pressure into acknowledgement, explanation, and symbolic movement. Work on Chinese media politics shows that public communication may reduce or intensify pressure depending on how agenda control and publicity interact [58]. But the format is not merely a safety valve, because it can also place concrete failures, named agencies, and public commitments into a visible sequence. At the same time, it does not suspend the hierarchical and managed character of Chinese participatory channels. Studies of authoritarian deliberation and consultative authoritarianism show that controlled voice and consultation may coexist with concentrated agenda control, while research on authoritarian responsiveness and conditional receptivity shows that public-facing response is selective and institutionally filtered [43,44,46,62]. The Nanning evidence fits that mixed picture. The interface can absorb social discontent, but it can also create some moments in which evidentiary constraint makes more checkable commitment possible. Its systemic significance therefore lies less in either pure performance or full democratic accountability than in the way staged publicity reorganizes responsibility signals inside an urban administrative hierarchy.

5.4. Practical Implications

For organizers of televised accountability and related public-evaluation formats, the results separate two design goals that are often conflated. Emotional escalation can coincide with immediate visible response. It is much less useful for producing checkable commitment and may even work against it. More concrete factual anchors, documentable evidence, and continued follow-up appear more useful when the aim is to anchor response in higher-cost commitment. Contemporary urban oversight systems therefore need more than visibility technologies and exposure mechanisms alone. They also need evidentiary discipline and follow-up pathways if they are to generate public value beyond symbolic adjustment.

For public organizations, the practical implication is equally clear. In high-visibility accountability settings, officials are likely to default toward low-commitment and reputationally defensive language unless they enter the interaction with enough factual preparation, clearer responsibility mapping, and a realistic basis for committing to specific action. Response protocols built around those needs may reduce the drift from public concession to vague future promises.

5.5. Limitations and Future Research Directions

Several limits remain. The analysis identifies structured associations in a highly organized accountability setting rather than strong exogenous causal effects. This boundary is important because pressure and response are produced inside the same unfolding interaction. Hosts may intensify questioning after evasive answers, officials may anticipate later commitment stages, and program editors may select evidence in ways related to issue seriousness. The pressure-window design preserves temporal order at the turn level, and the models add severity controls, stage controls, chain fixed effects, historical-maximum specifications, sequence analysis, and an additional lagged-feedback diagnostic. In that diagnostic, some prior response components predict subsequent pressure: prior deflection is associated with higher subsequent accountability directness, prior acknowledgement is associated with higher subsequent emotional intensity, and prior generic or specific commitment is associated with higher subsequent factual specificity. These patterns confirm that interactional feedback is present rather than negligible. At the same time, the diagnostic models have limited explanatory power and do not overturn the main interpretation that the results should be read as structured pressure-response associations. A full cross-lagged panel design, an experimental vignette, or a setting with quasi-random variation in evidence presentation would be needed for stronger causal claims.

Because the empirical material comes from Nanning, the evidence represents a mature city-level case rather than a direct estimate of all Chinese televised accountability programs. The value of the case lies in its routinized and institutionalized format, which keeps sustained response patterns observable, but future multi-city and multi-province work is still needed to test how far these response packages travel across different regional governance ecologies. Nor does the study cover the full ecology of urban complaint handling, which in many cities also includes hotlines, online inquiry systems, platform complaint channels, and internal administrative dashboards. The coded problem-severity control partially adjusts for problem seriousness but does not eliminate omitted-variable concerns.

The intercoder agreement results further show that agreement is generally high but not equally strong across all constructs: specific commitment remains harder to identify than acknowledgement or generic commitment, and the chain-level severity score still contains some one-step disagreements. The sequence analysis makes better use of temporal structure, but some high-pressure cells remain too small to support a full global state-transition model. Specific commitment is also not equivalent to downstream rectification, so the analysis does not estimate ultimate governance effectiveness. A commitment that names an action, actor, time, or target is more checkable than a vague promise, but it can still be delayed, diluted, or unfulfilled after the broadcast. For that reason, the article treats specific commitment as a meaningful proxy for the quality of public answerability inside the program, not as a direct measure of completed governance improvement.

A separate measurement limitation concerns model-assisted text coding. Fixed prompts, deterministic settings, targeted researcher review, and independent agreement checks reduce the risk that the LLM generates unsupported interpretations, but they do not make automated classification neutral. The model may still overread formulaic administrative speech, miss pragmatic ambiguity, or reproduce genre-specific assumptions about Chinese official language. The reported agreement statistics are therefore best understood as evidence that the final variables are sufficiently reliable for the present analysis, not as evidence that model-assisted coding removes interpretive judgement [64,67,72,73]. Future work could compare multiple models and use larger fully human-coded validation sets.

Future work can strengthen the micro-mechanism side of the design more directly. One route would be to combine transcript analysis with expert surveys, coder-based ratings of perceived threat, or experiments that manipulate emotional pressure, evidence form, and direct responsibility cues. Another would be to compare public officials with other professional groups who also operate under social evaluation and time pressure. A third would be to extend the analysis across regions and program formats to see whether the same pressure-response pathway appears in less institutionalized accountability settings. It would also be valuable to connect televised accountability to broader urban data and complaint infrastructures and to collect follow-up evidence on whether checkable commitments are actually implemented after broadcast.

6. Conclusions

Using a sequential corpus from Nanning’s long-running televised accountability program, this study offers a response-level account of televised oversight as a systemic accountability interface. Emotional intensity emerges as the most stable pressure dimension, but it is mainly linked to immediate and mixed public response rather than high-cost commitment. Factual specificity and evidence types work more by reorganizing the available rhetorical space. Accountability directness has a double-edged pattern, with acknowledgement and defense moving together. Sequence analysis further shows that visible concession is heavily front-loaded, whereas specific commitment is delayed and confined to a smaller subset of chains. An additional finding is that emotional intensity is also associated with more sanction-threat language, which is more consistent with downward punitive signaling than with substantive correction.

Taken together, the findings support a view of televised accountability as a socio-technical governance arrangement in which media staging, evidentiary presentation, and bureaucratic response are tightly coupled. Official response in that setting is not a simple choice between admitting and denying, nor does stronger pressure linearly produce better governance. The observed pattern is a redistribution of acknowledgement, deflection, and commitment across a highly structured and mediatized sequence. Generic commitment often carries the logic of temporal blame displacement, whereas specific commitment is more likely to emerge as a later-stage and more institutionally structured output. The Nanning evidence does not by itself exhaust all regional variation, but it does show the value of treating televised oversight as a structured urban-governance process rather than only as a media event or a governance slogan.

The emphasis therefore shifts from whether televised accountability works in the abstract to how a public oversight interface organizes response under visibility. For smart-city governance, the main implication is that visible data, complaint inputs, and exposure mechanisms cannot be evaluated apart from the response pathways they activate. For public value and accountability research, the point is parallel: answerability becomes more visible, but higher-quality public value depends on whether the interface can convert that visibility into checkable commitments, follow-up, and learning. Future work can refine the boundary of specific commitment, improve chain-level severity measurement, connect televised accountability to broader complaint and data infrastructures, and extend the sequential models. Even within its current limits, the findings show that understanding urban accountability requires close attention to how information inputs, staged interaction, and feedback timing jointly shape the public outputs of governance.

Author Contributions

Conceptualization, H.Z.; methodology, H.Z. and Y.J.; data curation, H.Z. and Y.J.; formal analysis, H.Z.; writing—original draft preparation, H.Z.; writing—review and editing, H.Z., Y.J. and L.Z.; visualization, H.Z.; supervision, L.Z.; funding acquisition, L.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China [grant number 72434003].

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Restrictions apply to the availability of the complete raw transcript corpus. The raw corpus was derived from third-party televised program materials, contains identifiable names, organizations, and contextual references, and is also being used in an ongoing related research project; therefore, it is not publicly shared in full at this stage. Requests to access additional restricted materials for verification purposes should be directed to the corresponding author and will be considered subject to applicable copyright, licensing, de-identification, and ongoing-research constraints.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

GLM	Generalized linear model
LLM	Large language model
OR	Odds ratio

Appendix A. Intercoder Agreement

Table A1. Intercoder agreement for the main text measures.

Agreement Sample	Variable	Agreement Statistic
Response ( $n = 400$ )	Acknowledgement	Cohen’s $κ = 0.929$
Response ( $n = 400$ )	Deflection	Cohen’s $κ = 0.873$
Response ( $n = 400$ )	Generic commitment	Cohen’s $κ = 0.923$
Response ( $n = 400$ )	Specific commitment	Cohen’s $κ = 0.734$
Response ( $n = 400$ )	Factual specificity	Linearly weighted $κ = 0.943$
Response ( $n = 400$ )	Emotional intensity	Linearly weighted $κ = 0.942$
Response ( $n = 400$ )	Accountability directness	Linearly weighted $κ = 0.884$
Response ( $n = 400$ )	Clip valence	Cohen’s $κ = 0.970$
Severity ( $n = 100$ )	Safety or life risk	Cohen’s $κ = 0.888$
Severity ( $n = 100$ )	Economic or property loss	Cohen’s $κ = 0.696$
Severity ( $n = 100$ )	Affected population scale	Cohen’s $κ = 0.727$
Severity ( $n = 100$ )	Repeat exposure or unresolved history	Cohen’s $κ = 0.802$
Severity ( $n = 100$ )	Clear illegality	Cohen’s $κ = 0.857$
Severity ( $n = 100$ )	Coded problem-severity score (0–3)	Linearly weighted $κ = 0.760$

Notes: The second coder independently coded a random subsample of 400 official responses and a random subsample of 100 issue chains using only introductory problem material. The coded problem-severity score is a 0–3 ordinal chain-level measure based on the five seriousness cues listed above; the cues informed the ordinal judgement and were not mechanically summed.
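
The two agreement statistics in Table A1 can be illustrated with a minimal sketch. It assumes two coders' labels are held in parallel lists; the function name `cohen_kappa` and its parameters are hypothetical and are not taken from the authors' code. Unweighted κ treats every disagreement as equal, while the linearly weighted version scales each disagreement by the distance between ordinal levels.

```python
def cohen_kappa(coder_a, coder_b, categories=None, weights=None):
    """Cohen's kappa for two coders; weights="linear" gives linearly weighted kappa.

    Illustrative helper only. For ordinal scales, pass the full ordered scale
    in `categories` so that unused levels still count toward the distances.
    """
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must rate the same non-empty set of items")
    cats = list(categories) if categories is not None else sorted(set(coder_a) | set(coder_b))
    index = {c: i for i, c in enumerate(cats)}
    k, n = len(cats), len(coder_a)

    # Joint proportion table: rows are coder A's labels, columns are coder B's.
    joint = [[0.0] * k for _ in range(k)]
    for a, b in zip(coder_a, coder_b):
        joint[index[a]][index[b]] += 1.0 / n
    row = [sum(joint[i]) for i in range(k)]
    col = [sum(joint[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        if weights == "linear":
            return abs(i - j) / (k - 1) if k > 1 else 0.0
        return 0.0 if i == j else 1.0

    observed = sum(disagreement(i, j) * joint[i][j] for i in range(k) for j in range(k))
    expected = sum(disagreement(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if expected == 0.0:
        return 1.0  # both coders used one and the same category throughout
    return 1.0 - observed / expected


# Toy binary codes (for example, acknowledgement present or absent).
binary_kappa = cohen_kappa([0, 0, 1, 1], [0, 0, 1, 0])

# Toy 0-3 ordinal codes (for example, a severity score) with linear weights.
ordinal_kappa = cohen_kappa([0, 1, 2, 3], [1, 1, 2, 3],
                            categories=[0, 1, 2, 3], weights="linear")
```

The toy inputs are invented for illustration and do not reproduce any value in Table A1.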

Appendix B. Text Coding and Problem-Severity Measurement

For the response outcomes, classification was based on the text of the current official response. Contextual information such as the immediately preceding pressure-window text, previous same-unit text, episode, chain, and stage was used only to clarify the interactional boundary and reduce local ambiguity. It was not treated as a substitute for the response text itself.

LLM classification was constrained by fixed prompts. The prompts defined the categories, positive and negative criteria, exclusion rules, and required categorical answers rather than open-ended summaries. For outcome measures, responses were judged on specific speech functions that were then mapped to the core variables. For pressure measures, the model identified factual anchors and ordinal cue levels under fixed rubrics. Both response and pressure measures were classified with the ‘qwen3.6-plus’ model release through the Alibaba Cloud DashScope interface (Alibaba Cloud, Hangzhou, China) using deterministic settings (temperature = 0). At the time of coding, ‘qwen3.6-plus’ was the current release in the Qwen Plus line on DashScope. The source material consists of Chinese television transcripts with colloquial compression, administrative terminology, and context-dependent references, so a high-capacity Qwen model was appropriate for Chinese-language classification. This LLM classification was paired with structured prompting, targeted researcher review, and subsequent agreement checks.

Researcher review focused on the boundaries most likely to affect substantive inference. These included responses initially classified as specific commitment or sanction threat, generic-versus-specific commitment boundary cases, deflection-family cases, and routine-informational or low-information cases. The intercoder agreement results reported in Table A1 then evaluated both the response outcomes and the pressure variables.

The coded problem-severity control was measured separately at the chain level from clip and host-background text only, with official response content excluded. The final score is a 0–3 ordinal assessment based on five seriousness cues: public-safety or health risk, economic or property loss, affected population scale, repeated exposure or long-term unresolved failure, and explicit illegality or rule violation. The cues informed the final severity judgement but were not mechanically summed. This separation prevents problem seriousness from being inferred from the same official language later used to construct the dependent variables. The measure is therefore source-separated rather than ground-truth severity, because the clip and host narration are themselves media framings that may dramatize the issue.

Appendix C. Additional Lagged-Feedback Diagnostic

Table A2. Additional lagged-feedback diagnostic: prior official response and subsequent pressure.

Current Pressure Outcome	Prior Acknowledgement	Prior Deflection	Prior Generic Commitment	Prior Specific Commitment
Factual specificity	−0.017 (0.036)	0.098 * (0.041)	0.186 *** (0.042)	0.243 *** (0.069)
Emotional intensity	0.117 *** (0.029)	−0.020 (0.032)	0.029 (0.036)	0.107 (0.067)
Accountability directness	0.058 (0.040)	0.102 * (0.045)	0.043 (0.049)	0.107 (0.075)
Non-clip evidence	−0.002 (0.005)	−0.004 (0.006)	0.008 (0.008)	0.003 (0.012)
Negative exposure clip	0.003 (0.006)	−0.014 * (0.006)	0.003 (0.008)	−0.009 (0.014)
Rectification follow-up clip	−0.002 (0.002)	−0.002 * (0.001)	0.002 (0.003)	−0.001 (0.001)

Notes: Entries are coefficients with unit-chain-clustered standard errors in parentheses from additional diagnostic linear models estimated on 2835 adjacent transitions within 447 repeated unit-chains. Each row uses the current pressure variable at turn $t$ as the outcome and prior official-response indicators at turn $t - 1$ as focal predictors. Models also control for lagged pressure variables, current stage, unit type, year, coded problem severity, and within-chain turn order. This diagnostic is used to probe short-run feedback from official response to subsequent pressure; it is not a full cross-lagged panel model or a causal identification strategy. * $p < 0.05$, *** $p < 0.001$.
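
The lag structure behind this diagnostic can be sketched as follows. The sketch assumes each official response is a record carrying a unit-chain identifier and a within-chain turn number; all field names (`unit_chain`, `turn`, `emotional_intensity`, `acknowledgement`) are hypothetical. It only pairs turn $t - 1$ with turn $t$ inside the same unit-chain and does not estimate the models or add the controls described in the notes.

```python
from itertools import groupby
from operator import itemgetter


def adjacent_transitions(rows):
    """Return (previous, current) response pairs within each unit-chain."""
    ordered = sorted(rows, key=itemgetter("unit_chain", "turn"))
    pairs = []
    for _, group in groupby(ordered, key=itemgetter("unit_chain")):
        turns = list(group)
        # zip(turns, turns[1:]) never crosses a unit-chain boundary.
        pairs.extend(zip(turns, turns[1:]))
    return pairs


def lagged_design(rows, outcome, predictors):
    """Current pressure at turn t as outcome, prior response indicators as predictors."""
    design = []
    for prev, curr in adjacent_transitions(rows):
        record = {"y": curr[outcome]}
        for name in predictors:
            record["prior_" + name] = prev[name]
        design.append(record)
    return design


# Invented toy data: chain A has three turns, chain B has one.
responses = [
    {"unit_chain": "A", "turn": 1, "emotional_intensity": 1, "acknowledgement": 0},
    {"unit_chain": "A", "turn": 2, "emotional_intensity": 2, "acknowledgement": 1},
    {"unit_chain": "A", "turn": 3, "emotional_intensity": 3, "acknowledgement": 0},
    {"unit_chain": "B", "turn": 1, "emotional_intensity": 0, "acknowledgement": 1},
]
design = lagged_design(responses, "emotional_intensity", ["acknowledgement"])
```

Unit-chains with a single response contribute no transitions, which is why the diagnostic is restricted to repeated unit-chains.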

Appendix D. Additional Robustness Figure

Figure A1. Robustness comparison across core pressure paths and evidence-sensitive paths under alternative clustering and fixed-effects specifications.

Figure 1. Conceptual bridge from televised oversight inputs to observable response outputs.

Figure 2. UpSet view of non-empty co-occurrence patterns across the four core response dimensions. The upper bars report the frequency of each observed strategy combination, and the left bars report the marginal frequency of each individual strategy.

Figure 3. Row-level odds ratios for the four core response models. Points report odds ratios, and horizontal intervals report 95% confidence intervals.

Figure 4. Terminal unit-chain odds ratios for the four core response outcomes using historical maximum pressure before the final response. Omitted estimates are not plotted.

Figure 5. Compact cross-model summary of pressure-response associations. Cell color shows signed log odds ratios, and cell entries report odds ratios with conventional significance markers for the four core outcomes. Significance markers denote * $p < 0.05$, ** $p < 0.01$, and *** $p < 0.001$. Cells marked “n.a.” are omitted because of model separation or collinearity constraints.

Figure 6. Stage-specific response structure. Cell entries report within-stage percentages for the main response dimensions, showing the concentration of generic and specific commitment in the commitment stage and the wider spread of acknowledgement, deflection, and routine informational response in earlier stages.

Figure 7. Row-normalized adjacent strategy transition matrix for repeated unit-chain sequences. Row labels include the number of outgoing transitions from each prior state. None denotes responses with none of the four core strategies, and mixed states are aggregated when they fall outside the most common acknowledgement, deflection, commitment, or simple mixed categories.
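
A row-normalized transition matrix of the kind summarized in Figure 7 can be sketched as below. The sketch assumes each unit-chain has already been reduced to an ordered list of strategy states; the state labels and the function name are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter, defaultdict


def row_normalized_transitions(sequences):
    """Share of each next state among all outgoing transitions from a prior state."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, curr in zip(seq, seq[1:]):
            counts[prev][curr] += 1
    return {
        prev: {curr: c / sum(nxt.values()) for curr, c in nxt.items()}
        for prev, nxt in counts.items()
    }


# Invented toy sequences for two unit-chains.
chains = [
    ["deflection", "acknowledgement", "none"],
    ["deflection", "deflection"],
]
matrix = row_normalized_transitions(chains)
```

Each row sums to one, so cells are comparable within a prior state but not across prior states with very different numbers of outgoing transitions.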

Figure 8. Turn-level chain reach and cumulative event timing. Bars show how many repeated unit-chains remain observable at each within-unit turn. Lines report the cumulative share of unit-chains that have already reached first visible concession or first specific commitment by that turn.

Figure 9. Descriptive survival curves for first visible concession and first specific commitment within repeated unit-chain sequences. The vertical axis reports the share of unit-chains that have not yet experienced the event by each turn.

Table 1. Row-level models for the four core outcomes and sanction-threat language: odds ratios and 95% confidence intervals.

	Acknowledgement	Deflection	Generic Commitment	Specific Commitment	Sanction Threat
Factual specificity	0.914 [0.816, 1.023]	0.830 ** [0.734, 0.938]	0.962 [0.852, 1.086]	1.080 [0.877, 1.329]	0.829 * [0.718, 0.957]
Emotional intensity	1.931 *** [1.698, 2.195]	1.164 * [1.031, 1.315]	1.230 *** [1.095, 1.380]	0.718 ** [0.575, 0.897]	1.316 * [1.025, 1.689]
Accountability directness	1.123 ** [1.031, 1.223]	1.259 *** [1.144, 1.385]	0.853 ** [0.761, 0.956]	1.000 [0.835, 1.198]	1.150 [0.993, 1.332]
Non-clip general evidence	1.030 [0.671, 1.579]	0.511 * [0.274, 0.952]	0.698 [0.430, 1.134]	1.828 * [1.115, 2.998]	1.350 [0.752, 2.426]
Negative exposure clip	1.132 [0.804, 1.592]	0.661 * [0.460, 0.952]	0.585 * [0.385, 0.887]	0.483 [0.208, 1.121]	0.816 [0.470, 1.416]
Rectification follow-up clip	0.323 * [0.114, 0.916]	0.700 [0.304, 1.615]	0.496 [0.120, 2.053]	0.475 [0.057, 3.957]	–
Coded problem severity	0.988 [0.901, 1.084]	1.126 * [1.016, 1.247]	1.030 [0.949, 1.117]	0.899 [0.758, 1.066]	1.074 [0.900, 1.282]
Observations	3675	3675	3670	3670	3643

Notes: Entries are odds ratios with 95% confidence intervals from row-level binary logit models using episode-clustered standard errors. All models include the full control set; only key pressure terms and the severity control are displayed here. The baseline categories for the categorical controls are the commitment stage, bureau or functional department for unit type, and 2014 for year. Sanction threat is reported separately as an additional outcome. Observation counts vary slightly across models because the estimation sample differs modestly across outcome definitions and because outcome-specific separation or model constraints prevent estimation in parts of the sample. * $p < 0.05$, ** $p < 0.01$, *** $p < 0.001$. ‘–’ indicates omission due to model separation or collinearity constraints.
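
For readers who want to move between logit coefficients and the odds ratios reported in Table 1, a minimal sketch follows. It assumes Wald-type intervals that are symmetric on the log-odds scale, which the notes do not state explicitly; the function names are hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% normal quantile


def odds_ratio_ci(beta, se, z=Z_95):
    """Odds ratio and confidence bounds from a logit coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coef_from_or_ci(odds_ratio, lower, upper, z=Z_95):
    """Approximate coefficient and standard error implied by a reported OR and CI."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return beta, se


# Emotional intensity -> acknowledgement, as printed in Table 1: 1.931 [1.698, 2.195].
beta, se = coef_from_or_ci(1.931, 1.698, 2.195)
recovered = odds_ratio_ci(beta, se)
```

Under that assumption the implied coefficient is roughly 0.66 on the log-odds scale with a standard error of roughly 0.07, and the recovered bounds match the printed interval up to rounding.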

Table 2. Terminal unit-chain models for the four core outcomes and sanction-threat language: odds ratios and 95% confidence intervals.

	Terminal Acknowledgement	Terminal Deflection	Terminal Generic Commitment	Terminal Specific Commitment	Terminal Sanction Threat
Maximum prior factual specificity	0.778 [0.574, 1.055]	0.918 [0.663, 1.271]	0.846 [0.662, 1.080]	1.357 [0.912, 2.020]	0.504 *** [0.339, 0.748]
Maximum prior emotional intensity	1.662 *** [1.238, 2.233]	1.211 [0.841, 1.742]	1.197 [0.909, 1.576]	1.240 [0.774, 1.986]	1.898 * [1.162, 3.101]
Maximum prior accountability directness	1.255 [0.906, 1.737]	1.655 ** [1.152, 2.378]	1.221 [0.925, 1.611]	0.912 [0.624, 1.331]	1.933 ** [1.173, 3.188]
Maximum prior non-clip evidence	0.994 [0.598, 1.652]	1.156 [0.680, 1.967]	0.726 [0.441, 1.195]	0.949 [0.537, 1.677]	0.944 [0.445, 2.002]
Maximum prior negative clip	0.876 [0.531, 1.444]	0.874 [0.497, 1.537]	1.086 [0.671, 1.756]	0.864 [0.457, 1.635]	1.911 [0.934, 3.909]
Maximum prior rectification clip	0.358 [0.066, 1.949]	0.442 [0.050, 3.897]	0.535 [0.206, 1.384]	–	1.008 [0.108, 9.426]
Coded problem severity	1.138 [0.941, 1.378]	1.063 [0.856, 1.319]	0.991 [0.854, 1.150]	0.872 [0.693, 1.096]	0.870 [0.676, 1.119]
Response count within unit-chain	0.925 ** [0.878, 0.974]	0.935 ** [0.892, 0.979]	1.017 [0.977, 1.058]	0.992 [0.927, 1.062]	0.948 [0.891, 1.007]
Observations	840	837	838	813	820

Notes: Entries are odds ratios with 95% confidence intervals from terminal unit-chain binary logit models. Predictors are historical maximum pressure values prior to the unit’s final response within a chain. The baseline categories for the categorical controls are the commitment stage, bureau or functional department for unit type, and 2014 for year. Sanction threat is reported separately as an additional outcome. * $p < 0.05$, ** $p < 0.01$, *** $p < 0.001$. ‘–’ indicates omission due to model separation or collinearity constraints.

Table 3. Sequential patterns: selected transition rates and discrete-time hazards. Panel (A). Selected path-specific transition rates. Panel (B). Discrete-time event-history models.

(A)
Scenario	Adjacent transitions	Visible concession rate	Specific commitment rate
Previous-turn deflection + current window without negative clip exposure	656	38.7%	7.0%
Previous-turn deflection + current window with negative clip exposure	11	63.6%	0.0%
(B)
	First visible concession	First specific commitment
Factual specificity	0.948 [0.749, 1.199]	1.014 [0.753, 1.365]
Emotional intensity	2.088 *** [1.625, 2.683]	0.661 ** [0.507, 0.861]
Accountability directness	1.086 [0.886, 1.331]	1.000 [0.792, 1.262]
Non-clip general evidence	0.653 [0.317, 1.347]	1.181 [0.373, 3.741]
Negative exposure clip	0.954 [0.576, 1.582]	0.733 [0.282, 1.910]
Rectification follow-up clip	1.063 [0.249, 4.534]	1.955 [0.228, 16.762]
Coded problem severity	0.894 [0.781, 1.023]	0.833 ^† [0.683, 1.017]
Turn order within unit-chain	0.895 ** [0.825, 0.971]	1.047 ** [1.012, 1.082]
Event count/eligible unit-chains	385/447	140/447

Notes: Panel A reports descriptive transition rates for the small path-specific comparison highlighted in the text. Panel B reports odds ratios and 95% confidence intervals from discrete-time hazard models estimated on repeated unit-chain sequences. ^† $p < 0.10$, ** $p < 0.01$, *** $p < 0.001$.
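
The data layout usually required for discrete-time hazard models like those in Panel B can be sketched as a person-period expansion. The sketch assumes each unit-chain is represented by a list of 0/1 flags marking whether the event (for example, first specific commitment) occurs at each turn; identifiers and function names are hypothetical, and the authors' exact risk-set definition may differ.

```python
def person_period(sequences):
    """Expand unit-chains into one row per turn at risk, stopping at the first event."""
    rows = []
    for chain_id, flags in sequences.items():
        for turn, flag in enumerate(flags, start=1):
            rows.append((chain_id, turn, int(bool(flag))))
            if flag:
                break  # the chain leaves the risk set once the event has occurred
    return rows


def hazard_by_turn(rows):
    """Raw discrete-time hazard: events at a turn divided by chains still at risk."""
    at_risk, events = {}, {}
    for _, turn, event in rows:
        at_risk[turn] = at_risk.get(turn, 0) + 1
        events[turn] = events.get(turn, 0) + event
    return {turn: events[turn] / at_risk[turn] for turn in sorted(at_risk)}


# Invented toy unit-chains: "a" has the event at turn 2, "b" never, "c" at turn 1.
toy = {"a": [0, 1, 0], "b": [0, 0, 0], "c": [1]}
rows = person_period(toy)
hazards = hazard_by_turn(rows)
```

A binary logit fitted to such rows, with turn order and the pressure variables as covariates, is the standard way to estimate a discrete-time hazard model; the raw hazards computed here are only the descriptive counterpart.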

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhang, H.; Ju, Y.; Zheng, L. When Accountability Goes Public: Televised Oversight and Systemic Governance in Urban China. Systems 2026, 14, 615. https://doi.org/10.3390/systems14060615
