Designing for Trust, Progress, and Dignity: A Conceptual Framework for Reliability, Responsiveness, and Relational Quality in AI-Enabled Service Systems

Colgate, Mark; Colgate, Orla

doi:10.3390/info17050443

Open Access Review

by Mark Colgate ¹,*,† and Orla Colgate ²,†

¹ Gustavson School of Business, University of Victoria, 3800 Finnerty Road, Victoria, BC V8P 5C2, Canada

² School of Education and Leadership, City University, Vancouver Island Technology Park, 4464 Markham St., Victoria, BC V8Z 7X8, Canada

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Information 2026, 17(5), 443; https://doi.org/10.3390/info17050443

Submission received: 29 March 2026 / Revised: 25 April 2026 / Accepted: 1 May 2026 / Published: 4 May 2026

Abstract

AI is now embedded in frontline service at scale, yet the design frameworks managers reach for were built around human agents and do not translate cleanly to systems that generate rather than retrieve, that automate rather than augment. This paper argues that three design challenges sit at the heart of the problem, though they are rarely treated as a connected set. Generative AI can produce fluent, confident outputs that are simply wrong, which is a qualitatively different kind of reliability failure from anything SERVQUAL was designed to address. AI can reply instantly while leaving the customer no closer to resolution, exposing a gap between speed and what we might call felt responsiveness. And it faces an awkward relational tension. Overclaiming warmth triggers distrust, yet there are genuine service contexts in which the non-human nature of the system is a feature rather than a liability. The RRR Design Framework developed here extends established service quality dimensions to the AI context, organising fifteen prescriptive design principles around reliability, responsiveness, and relational quality, each reconceptualised for AI-mediated service. The principles follow a prevent-and-recover logic within each dimension and are tied together by a single strategic proposition, which is to automate to protect relationships. Four empirically testable propositions are derived from the framework, each operationalised with measurable constructs, moderating conditions, and falsifiable null cases. The framework is most applicable to hybrid human-AI frontline systems where customers are actively working toward a resolution.

Keywords: artificial intelligence; service design; customer experience; human-AI interaction; service quality; chatbots; service robots; trust calibration

1. Introduction

Large service organisations now use AI in customer-facing operations as a matter of course. Chatbots field insurance queries. Recommendation engines shape what customers see before they have quite finished forming a preference. Voice assistants handle first-contact triage in healthcare. Service robots deliver room orders in hotels. The question for most managers is no longer whether AI belongs in frontline service. That debate has largely settled. The real question is how to design it in a way that does not quietly erode what customers actually care about [1,2].

Service quality research is a reasonable starting point. Parasuraman et al. [3] gave the field reliability, responsiveness, and empathy as its core dimensions, and those dimensions have held up well; they capture something stable about what customers need from service encounters. The difficulty is that SERVQUAL was built for human-delivered service, and human-delivered service fails in different ways than AI-mediated service does. Knowing that reliability matters tells you almost nothing about what to do when a generative chatbot produces a confidently wrong answer. Knowing that responsiveness is important does not tell you how to hand off from a bot to a human agent without forcing the customer to start from scratch. And knowing that relational quality shapes experience still leaves open the thorny question of whether a chatbot should claim to understand how a customer feels, and if so, how much.

Three design challenges sit at the core of the problem. They are worth naming as a set precisely because they are not the same challenge wearing different clothes. First, AI outputs can be fluent and wrong simultaneously, a combination that older service quality frameworks had no reason to anticipate. Hannigan et al. [4] call this botshit, content that sounds plausible, reads smoothly, and happens to be fabricated. Second, AI can respond instantly while producing no actual progress for the customer. Speed and responsiveness turn out to measure different things. Third, and in some ways the most awkward, overclaiming emotional understanding or relational warmth triggers distrust when customers notice the mismatch [5,6], yet there are service contexts where the non-human nature of the system is precisely what customers want [7].

This paper develops a conceptual framework that extends and reconceptualises established service quality dimensions for AI-mediated service, rather than reporting an empirical test. It draws on recent work across service research, consumer psychology, human–computer interaction, and AI-enabled service design. Three contributions follow from this reconceptualisation. First, we try to be precise about what reliability, responsiveness, and relational quality actually require when AI rather than a human is doing the work. The definitions shift more than one might expect. Second, we identify three AI-specific failure modes that the service quality literature has not, to our knowledge, treated as a connected set, namely plausible error (fluent but wrong outputs), the illusion of responsiveness (fast replies that produce no real progress), and relational overclaim (claimed warmth that exceeds what the system can actually deliver). Third, we translate these into fifteen prescriptive design principles, organised as the RRR Design Framework, which extends established service quality categories to the AI context. The gap this reconceptualisation addresses is straightforward. Existing work has done a reasonable job of identifying what matters in AI-enabled service, but has offered less practical guidance on how to design against those findings.

A brief word on why these three dimensions rather than others. One could organise AI service design around trust, transparency, fairness, effort, control, or explainability, and any of those choices would yield a defensible framework. We chose reliability, responsiveness, and relational quality deliberately, though we hold that choice with some humility. Each maps onto a customer need that seems stable across service contexts. These are the need to trust that the system will do what it says, the need to feel genuine progress toward resolution rather than just activity, and the need to be treated as an adult with intelligence and dignity. Each also corresponds to a distinct failure type that calls for a different design response, which is what gives the framework its practical usefulness.

There is also the question of why other SERVQUAL dimensions, particularly tangibles and assurance, are not given the same treatment. The answer is not that they are irrelevant. Tangible elements of service, such as a clean hotel room or a well-designed mobile interface, remain important regardless of whether the service agent is human or artificial. But they are not transformed by AI mediation in the way that reliability, responsiveness, and relational quality are. The bathroom is still either clean or dirty, whether a robot delivered the towels or a person did. By contrast, the nature of reliability failure changes fundamentally when the service agent can generate fluent, confident, and fabricated content. The nature of responsiveness changes when a system can reply in milliseconds without producing any genuine progress. And the nature of relational quality changes when the service agent can simulate emotional understanding it does not possess. The RRR framework focuses on the dimensions that are most disrupted by AI, not on those that are most important in absolute terms.

Sections 2, 3, and 4 take up each dimension in turn. Section 5 integrates the principles into the full framework. Section 6 draws out implications. Section 7 concludes.

2. Reliability in AI-Enabled Service

Reliability has the strongest claim to being the most fundamental dimension of service quality. Parasuraman et al. [3] ranked it first among the SERVQUAL dimensions, and the logic is intuitive enough. A service that cannot be counted on is not much of a service. In conventional settings, reliability means doing what was promised. The flight departs on schedule, the charge matches the order, and the delivery arrives as described. The design response is largely about process control, meaning standardising procedures, training staff, monitoring outcomes, and addressing failures when they occur.

AI changes this in a way that process control alone cannot fix. A generative chatbot responding to a query is not pulling a verified answer from a database. It is producing text from statistical patterns that are quite good at sounding correct, regardless of whether the answer is. This is botshit territory, as Hannigan et al. [4] put it. A travel chatbot can invent a cancellation policy that does not exist. A healthcare bot can deliver medically inaccurate guidance in prose that reads as though it came from a clinician. In most cases, customers cannot tell the difference.

This is a genuinely different category of reliability failure, not merely a harder version of the same problem. In traditional service, errors tend to be visible. The package is at the wrong address, the charge is for the wrong amount, and the booking is missing. In an AI-mediated service, the error can be invisible because the wrong answer arrives with exactly the same fluency and apparent confidence as the right one. Ferraro et al. [8] note that this fluency can work against customers. A smooth, well-structured response is harder to distrust than a hesitant one. The implication for design shifts accordingly. Rather than trying to prevent errors (which is difficult to guarantee in generative systems), the priority becomes making the system’s fallibility visible before the customer acts on a wrong answer.

These failures can be classified into several distinct categories, and the distinctions matter for design. Hallucination is the production of content that has no basis in the system’s training data or retrieved sources, as when a chatbot invents a cancellation policy that does not exist. Confident confabulation is closely related but distinct. The system produces an answer that sounds authoritative and draws on real patterns in its training data, but applies them incorrectly to the specific case. Context drift occurs when the system loses track of the conversation’s purpose over multiple turns, gradually shifting from the customer’s actual question to an adjacent but different topic. Inappropriate confidence occurs when the system presents uncertain or probabilistic information with the same assertive framing it uses for verified facts. Capability boundary violation occurs when the system attempts tasks that exceed its actual design scope, such as a general customer service bot attempting to provide medical or legal guidance. Data leakage occurs when the system inadvertently reveals information from other customer interactions or from its training data that should not be accessible in the current context. Each of these failures calls for a somewhat different design response, which is why the reliability dimension includes multiple complementary principles rather than a single solution.

Three design principles address this directly. The first, calibrated uncertainty communication, is probably the most counterintuitive. The organisational instinct is to project confidence, but the evidence points the other way. AI systems should signal their confidence level explicitly, making a genuine distinction between responses that draw on verified information and those that are inferred or generated. A system that says “this appears to be covered based on what I have, but I’d recommend confirming with your plan administrator” is, paradoxically, more trustworthy than one that asserts “your policy covers this procedure.” Lee and See [9] established the underlying logic. Appropriate trust in automation depends on users having honest information about when and how the system is likely to be wrong.
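A minimal sketch of how calibrated uncertainty communication might look in code. Everything named here is our own illustrative assumption rather than something specified by the cited studies: the `Answer` record, the `confidence` score (how it would be estimated is left open), and the two thresholds, which would need empirical tuning in any real deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Answer:
    text: str          # the system's draft answer
    confidence: float  # hypothetical score in [0, 1]; estimation is out of scope
    verified: bool     # True if the answer rests on a verified source


# Illustrative thresholds only (assumptions, not values from the literature).
HIGH_CONFIDENCE = 0.9
LOW_CONFIDENCE = 0.5


def frame_answer(answer: Answer) -> str:
    """Wrap a draft answer in wording that matches how far it can be trusted."""
    if not 0.0 <= answer.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Verified and high confidence: state the answer plainly.
    if answer.verified and answer.confidence >= HIGH_CONFIDENCE:
        return answer.text
    # Inferred or moderately confident: hedge and suggest confirmation.
    if answer.confidence >= LOW_CONFIDENCE:
        return (f"Based on what I have, {answer.text} "
                "I'd recommend confirming this before you act on it.")
    # Low confidence: say so rather than guess.
    return ("I can't answer this reliably. "
            "Let me connect you with someone who can.")
```

The design choice worth noting is that an unverified answer is never stated plainly, however high its score, which keeps the distinction between verified and inferred content visible to the customer.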

The second principle, conservative promise design, extends this to the claims the system makes more broadly. When uncertainty is high, AI systems should err toward narrower commitments rather than confident assertions. Dietvorst et al. [10] documented what they called algorithm aversion, where people abandon algorithms after seeing them err far more readily than they abandon humans who make equivalent mistakes. A system that overpromises and occasionally fails pays a disproportionate trust penalty. The complementary finding from Logg et al. [11] is that people are quite willing to follow algorithmic guidance when it is framed honestly. Expressed uncertainty, it turns out, protects trust rather than undermining it.

The third principle is retrieval-generation transparency. Many deployed AI systems combine retrieved information from verified sources with generated content, and customers have no way to distinguish between them. A financial advice system that labels market data as ‘retrieved’ and investment suggestions as ‘generated’ gives customers what they need to calibrate how much weight to put on each part of the response, which is a matter of respect as much as a design choice.
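One way retrieval-generation transparency could be represented is to carry a provenance tag on every segment of a response and render it alongside the text. The sketch below is illustrative only. The `Segment` type, the `source` label, and the bracketed display format are assumptions of ours, and the sample sentences in the usage note are invented.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Provenance(Enum):
    RETRIEVED = "retrieved"  # taken from a verified source
    GENERATED = "generated"  # produced by the model


@dataclass(frozen=True)
class Segment:
    text: str
    provenance: Provenance
    source: Optional[str] = None  # hypothetical source label for retrieved text


def render(segments: list[Segment]) -> str:
    """Render a response with one provenance label per segment."""
    lines = []
    for seg in segments:
        label = seg.provenance.value
        # Name the source only when the text was actually retrieved.
        if seg.provenance is Provenance.RETRIEVED and seg.source:
            label = f"{label} from {seg.source}"
        lines.append(f"[{label}] {seg.text}")
    return "\n".join(lines)
```

Calling `render` on a retrieved market figure followed by a generated suggestion would yield two lines, the first prefixed `[retrieved from ...]` and the second `[generated]`, so the customer can weigh each part differently.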

Work on AI transparency and explainability reinforces why these reliability principles matter. Doshi-Velez and Kim [12] argue that interpretability in machine learning systems is not a single property but a context-dependent requirement, and that the need for interpretability is greatest when the system’s decisions carry consequences that users cannot easily verify on their own. Service is exactly that kind of context. A customer receiving a generated insurance coverage answer cannot verify the output independently and is therefore dependent on the system being transparent about its own limitations. Liao and Vaughan [13] extend this argument to large language models specifically, noting that transparency for LLMs demands a human-centred perspective because different stakeholders seek different kinds of understanding in different contexts. For service design, this means that the form of uncertainty communication probably needs to be adapted to what the customer actually needs to know in that moment, not simply what the system is technically capable of reporting. Wachter, Mittelstadt, and Russell [14] make a complementary point from the regulatory side, arguing that meaningful explanations of automated decisions should focus on what would need to change to produce a different outcome rather than on exposing the full internal logic of the system. For service designers, the takeaway is simple. When a system cannot resolve a query, it should tell the customer what information or action would move things forward, not offer a technical account of why it failed.

Two further principles complete the reliability dimension. The first is graceful failure acknowledgment, which means saying clearly when the system cannot answer reliably and routing to someone who can. The second is output verification loops, which route high-stakes responses through human or automated review before they reach the customer.
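These two principles can be read together as a routing decision made before any response is released. The following is a rough sketch under our own assumptions: the `Route` names, the `high_stakes` flag, and the `min_confidence` default are hypothetical, and a real system would decide what counts as high-stakes from its own domain rules.

```python
from enum import Enum


class Route(Enum):
    SEND = "send to customer"
    REVIEW = "hold for human or automated review"
    HANDOFF = "acknowledge the limit and hand off"


def route_response(confidence: float, high_stakes: bool,
                   min_confidence: float = 0.5) -> Route:
    """Decide what happens to a draft response before the customer sees it."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Graceful failure acknowledgment: do not guess below the floor.
    if confidence < min_confidence:
        return Route.HANDOFF
    # Output verification loop: high-stakes answers are reviewed first.
    if high_stakes:
        return Route.REVIEW
    return Route.SEND
```

The ordering is deliberate. A low-confidence answer is handed off even in a high-stakes context, on the assumption that reviewing a response the system itself does not trust adds delay without adding reliability.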

The overarching design goal is not to make AI seem infallible, and that ship has probably already sailed, given how publicly AI errors have accumulated. It is to make fallibility legible, so customers can make informed decisions about when to act on what the system tells them. A system that flags its uncertainty and routes to a human when needed is, in any trust-relevant sense, more reliable than one that delivers the wrong answer with perfect confidence.

3. Responsiveness in AI-Enabled Service

Speed is the most visible thing AI brings to service. A chatbot responds in milliseconds. A recommendation engine surfaces results before the customer has quite finished deciding what they want. By the metrics most organisations track, AI looks maximally responsive, and in a narrow sense, it is.

But speed and responsiveness are not the same thing, and conflating them is one of the more consequential errors in current AI service deployments. Responsiveness, as customers actually experience it, means genuine progress toward what they came for. A system that replies instantly with “I’m sorry, I didn’t understand that. Could you rephrase?” is fast and completely unresponsive. A chatbot that returns the same FAQ answer regardless of how a question is framed is quick but fails on every attempt. Response time and resolution movement are different measurements of different things.

Dixon et al. [15] made the case for effort reduction as a loyalty driver using data from over 75,000 customer interactions, a finding that holds up and bears directly on what follows. Rawson et al. [16] extended it to the full customer journey, finding that cumulative effort erodes satisfaction even when individual touchpoints seem fine, which is roughly what happens when a customer endures a perfectly polite sequence of AI interactions that collectively produce nothing. We call the AI version of this the illusion of responsiveness. The system replies promptly, apologises warmly, and routes the customer in a circle, transferring between modules, re-asking for information already provided, or answering the surface question while missing what the customer needs.

The first design principle for this dimension is context preservation across channels and handoffs. It sounds obvious, but it is remarkably hard to find in practice. When a customer moves from chatbot to phone, or from one AI module to another, the system should carry the full interaction history forward. Ferraro et al. [8] name what they call the connected yet isolated paradox, where generative AI systems present as seamless but treat every channel transition as a blank slate. Having to re-explain the problem from scratch is among the highest-effort experiences in service [15], and the frustration is compounded when the customer knows perfectly well that the information existed and was simply lost.

The second principle is visible progress signalling, which means making explicit to the customer what is actually happening and what comes next, rather than just indicating that something is happening. “Your request is being processed” is almost useless. “I’ve identified this as a billing dispute on your March statement and I’m now checking the transaction records, which usually takes about 30 seconds” is not. Zierau et al. [17] found that voice bots produce meaningfully better customer experiences when interactions feel purposeful and directional. Circular exchanges and unexplained pauses, by contrast, reliably worsen outcomes.
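As a small illustration, a progress message of this kind can be assembled from three pieces of state: what the system has identified, what it is doing now, and how long that step usually takes. The function and parameter names below are hypothetical, chosen only to mirror the example in the text.

```python
from typing import Optional


def progress_message(identified: str, current_step: str,
                     eta_seconds: Optional[int] = None) -> str:
    """Say what was understood, what is happening now, and how long it may take."""
    message = f"I've identified this as {identified} and I'm now {current_step}"
    # Mention a duration only when the system actually has an estimate.
    if eta_seconds is not None:
        message += f", which usually takes about {eta_seconds} seconds"
    return message + "."
```

Requiring `identified` and `current_step` as arguments means the function cannot produce a contentless "your request is being processed" message, which is the point of the principle.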

Mariani, Hashemi, and Wirtz [18], in a systematic review of research on AI-powered conversational agents, identify a recurring pattern, namely that technically capable systems that fail to maintain conversational context across turns produce user experiences that feel fragmented and effortful, regardless of how quickly individual responses arrive. That is the architectural version of the illusion of responsiveness. Conversational coherence across the full interaction, not response latency at any single turn, is what drives the customer’s sense of genuine progress.

The third principle is low-friction escalation. There will always be cases that the automated system cannot resolve, and when that happens, there must be a clear, low-effort path to a human agent, one that carries the full interaction history so the agent picks up where the bot left off rather than starting blind. Larivière et al. [19] argued that technology in service should augment rather than replace human capability, and escalation design is probably where that argument faces its most direct practical test.
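Context preservation and low-friction escalation both come down to what travels with the customer at the moment of handoff. The sketch below shows one possible shape for that handoff packet. The `Conversation` structure, its field names, and the packet keys are our own assumptions for illustration, not a description of any particular platform.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    speaker: str  # "customer" or "bot"
    text: str


@dataclass
class Conversation:
    customer_id: str
    channel: str
    turns: list[Turn] = field(default_factory=list)
    known_facts: dict[str, str] = field(default_factory=dict)


def escalate(conv: Conversation, reason: str, new_channel: str) -> dict:
    """Build a handoff packet so the human agent does not start blind."""
    return {
        "customer_id": conv.customer_id,
        "from_channel": conv.channel,
        "to_channel": new_channel,
        "reason": reason,
        # Facts already collected, so the customer is not asked for them again.
        "known_facts": dict(conv.known_facts),
        # Full interaction history, in order.
        "transcript": [(t.speaker, t.text) for t in conv.turns],
    }
```

The packet copies the collected facts and the full transcript rather than a summary, which trades a longer read for the agent against the risk of the customer having to re-explain the problem.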

Ozuem et al. [20] conducted 47 in-depth interviews across four countries and found that customers navigating chatbot-led recovery experience a distinctive emotional journey shaped by their expectations of the chatbot, the scale of the service lapse, and whether the recovery process maintains what the authors call contextual coherence. When coherence breaks down, when customers are asked to re-explain problems or are routed through modules that do not share information, the recovery effort itself becomes a second service failure. Haupt et al. [21] add a useful design distinction. Chatbots that focus on concrete solutions outperform those that lead with empathetic language during service recovery, which fits the progress-first logic of the responsiveness principles proposed here.

Two further principles round out this dimension. The first is resolution-oriented metrics, which means measuring felt progress and actual resolution rates rather than deflection rates and handle times. The second is loop detection with intervention, in which the system recognises when a customer is going in circles and escalates proactively, before the frustration becomes the dominant experience.
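Both principles can be sketched in a few lines. The loop detector below uses a deliberately crude heuristic, counting repeated bot replies after normalising case and whitespace, and the `max_repeats` threshold is an assumption. A production system would likely need something more robust, such as semantic similarity between turns. The resolution rate is simply the share of conversations that ended with the customer's need resolved, however an organisation chooses to record that outcome.

```python
from collections import Counter


def normalise(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different replies match."""
    return " ".join(text.lower().split())


def is_looping(bot_replies: list[str], max_repeats: int = 2) -> bool:
    """True once any bot reply has been given more than max_repeats times."""
    counts = Counter(normalise(reply) for reply in bot_replies)
    return any(n > max_repeats for n in counts.values())


def resolution_rate(outcomes: list[bool]) -> float:
    """Share of conversations in which the customer's need was resolved."""
    if not outcomes:
        raise ValueError("no conversations to score")
    return sum(outcomes) / len(outcomes)
```

In use, `is_looping` would be checked after each bot turn and, when it returns true, trigger escalation rather than another reply.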

The underlying challenge is to build systems that optimise for whether the customer’s actual need moved toward resolution, not for whether the system responded promptly. That is a harder metric to construct, but it is the right one, and getting there requires rethinking measurement alongside system architecture rather than treating them as separate problems.

4. Relational Quality in AI-Enabled Service

The default instinct in AI service design is to add warmth. Make the chatbot friendlier. Give the voice assistant a more empathetic tone. Program the service robot to nod. Huang and Rust [22] chart an aspirational arc from emotion recognition through empathetic response to genuine emotional connection in their analysis of what they call feeling AI for customer care. The aspiration is not unreasonable. The distance between that aspiration and what current systems can actually deliver is also not small.

Read carefully, though, the evidence points somewhere quite different. Relational quality in AI-mediated service turns out to be less about warmth and more about something closer to dignity, meaning whether the system treats the customer as a person with intelligence, autonomy, and a legitimate claim to honest engagement. Three threads in the literature support this reading, and they reinforce each other.

The first is that making AI systems more human-like is riskier than it looks. Zhang et al. [6], drawing on a review of 118 studies on humanlike service robots, find that anthropomorphic features increase engagement in some contexts and provoke distrust in others. The effect is genuinely context-dependent. Kätsyri et al. [5] locate the mechanism more precisely. The uncanny valley effect is driven not by human-likeness itself but by perceptual mismatch, when features that are human and features that are not sit together inconsistently. A chatbot that says “I completely understand how frustrating this must be for you” but then cannot actually understand or resolve the problem creates exactly that kind of mismatch. The relational failure is not insufficient warmth. It is relational overclaim, which means presenting more than the system can deliver.

Zhang, Liang, and Wu [23] provide further empirical grounding through a qualitative study of user coping behaviours following chatbot-induced service failures. Their thematic analysis identifies ‘fake humanity’ as one of the most negatively received chatbot behaviours, occurring when systems project emotional understanding they cannot deliver. Participants described feeling manipulated rather than supported, which is precisely the relational overclaim dynamic that the calibrated anthropomorphism principle is designed to prevent.

The second thread concerns agency. Puntoni et al. [24] identify loss of personal control as among the most consequential costs of AI interaction, a finding that is easy to underweight when the system is technically producing acceptable outcomes. But when a system makes decisions without explaining them, restricts options without justification, or funnels customers into predetermined pathways, it undermines their sense of control even when the outcome is objectively fine. Liu-Thompkins et al. [25] add a related complication around artificial empathy. Recognising what a customer feels, without genuinely engaging with that feeling, tends to read as manipulative rather than caring. The design aim here is not to simulate caring but to behave in ways that respect autonomy, a more modest goal that is also considerably more achievable.

The third thread is perhaps the most counterintuitive, and it is worth dwelling on. There are service situations where the non-human nature of the AI is not a liability but a genuine asset. Holthöwer and van Doorn [7], across five studies, find that customers feel meaningfully less judged by robots than by humans in embarrassing service encounters, including buying sensitive products, disclosing financial difficulty, and discussing health problems. The freedom from social evaluation is real and valuable. Hermann et al. [26] extend this to vulnerable consumers more broadly, finding that AI can lower barriers to service access by removing the social anxiety that sometimes prevents people from asking for help at all. Critically, this benefit depends on the system not claiming to be human. It is the acknowledged non-humanness that creates the psychological safety, so designing it away is precisely the wrong move.

Strategic Non-Humanness as a Design Principle

Strategic non-humanness deserves more space than it received above, precisely because it pushes against what most people in AI service design take for granted, namely that more human-likeness is generally better.

The mechanism is straightforward. In service encounters involving embarrassment, stigma, or social sensitivity, a significant portion of the customer’s cognitive load goes to managing the social dynamics of the interaction rather than engaging with the service itself. Disclosing financial difficulty to a human advisor involves anticipating judgment, managing self-presentation, and processing the advisor’s verbal and nonverbal reactions. An AI system that is transparently non-human removes that layer of social processing, freeing the customer to focus on the substantive task.

The design implication is more specific than just being honest about being AI. It is that the non-human identity should be foregrounded as a feature of the service rather than apologised for or concealed. A health screening tool that opens with “Because I am not a person, nothing you share here involves social judgment” creates a different customer experience than one that opens with “I am a virtual assistant. How can I help?” The first frames the AI nature as part of the value proposition. The second treats it as a limitation that the customer is asked to accept. The same evidence on social evaluation that justifies the principle also constrains how it should be operationalised.

The boundary conditions matter. Strategic non-humanness is likely to be most beneficial when the encounter involves disclosure of stigmatised information, when the customer has a high need for social evaluation avoidance, and when the service task is primarily informational or transactional rather than deeply relational. It is likely to be least beneficial, and potentially counterproductive, when customers are in acute emotional distress and need a genuine human connection, or when the service requires complex negotiation that draws on social intelligence. The critical design requirement is that the non-human identity is presented honestly and as a feature of the service, not as a limitation that the system is trying to conceal.

Five design principles follow. Transparency about AI identity means disclosing clearly at the outset that the customer is not talking to a person. Calibrated anthropomorphism means matching humanlike features to actual system capabilities, never presenting relational warmth that the system cannot back up. Preserved agency means keeping options visible, explaining the system’s reasoning, and giving customers a genuine ability to override automated decisions, not merely the appearance of one. Strategic non-humanness means actively framing the AI nature of the agent as a feature rather than concealing it, particularly in encounters involving embarrassment, stigma, or sensitivity. Dignity-preserving defaults means designing interactions to treat customers as capable adults, rather than defaulting to patronising scripts and over-simplified assumptions about what people can handle.

The standard for relational quality in AI-mediated service is not whether the system makes customers feel warmly cared for. It is whether the system is honest enough about what it is, and candid enough about what it can do, to deserve their trust. That is a different bar. In some ways, it is a higher one.

Sections 2, 3, and 4 have argued, separately, that each of the three SERVQUAL-era dimensions needs to be understood differently when AI is in the service loop. Table 1 consolidates that argument by setting the assumptions of pre-AI service design alongside the corresponding moves required in AI-enabled service systems. The contrast is not that the older categories are wrong. It is that they were specified for a kind of failure mode that AI does not have, and are silent on the kinds of failure modes AI introduces. The table is designed to make the gap visible at a glance, and to make explicit what the RRR framework asks designers to do differently.

5. The RRR Design Framework

The preceding three sections establish that reliability, responsiveness, and relational quality each need to be understood differently when AI is in the service loop. This section pulls the principles together. Table 2 presents all fifteen design principles with their associated design rules and illustrative applications. Figure 1 provides a visual representation of the framework’s structure.

Each principle in Table 2 derives from a specific conceptual claim developed in Sections 2, 3, and 4, and it is worth making those connections explicit.

The logic behind the reliability principles runs as follows. Calibrated uncertainty communication grows out of the finding that generative AI produces fluent outputs regardless of accuracy [4], combined with Lee and See’s [9] evidence that appropriate trust depends on honest information about system limitations. Conservative promise design takes its rationale from Dietvorst et al.’s [10] demonstration that algorithm aversion is triggered disproportionately by confident errors, and from Logg et al.’s [11] complementary finding that expressed uncertainty protects rather than undermines trust. Retrieval-generation transparency addresses the specific architectural feature of modern AI systems that blend verified retrieved content with generated content, a distinction that matters because customers cannot otherwise calibrate the weight they should place on different parts of a response. Graceful failure acknowledgment is grounded in the broader argument that making fallibility legible is more protective of trust than projecting false confidence, and in Doshi-Velez and Kim’s [12] position that the need for interpretability is greatest when users cannot independently verify outputs. Output verification loops address high-stakes contexts where the cost of a plausible-but-wrong answer extends beyond a poor service experience, as in healthcare triage or financial advice.

In the responsiveness dimension, context preservation across channels responds to the specific problem Ferraro et al. [8] identified as the connected-yet-isolated paradox, reinforced by Dixon et al.’s [15] finding that re-explaining a problem from scratch is among the highest-effort service experiences. The rationale for visible progress signalling is grounded in the distinction between speed and felt responsiveness established in Section 3, and in Zierau et al.’s [17] evidence that purposeful, directional interactions produce meaningfully better outcomes than rapid but circular ones. Low-friction escalation draws on Larivière et al.’s [19] argument that technology in service should augment rather than replace human capability, applied to the specific moment when automated resolution fails. Resolution-oriented metrics follow from the observation that organisations currently optimise for deflection rates and handle time, neither of which measures whether the customer’s actual need moved toward resolution. Loop detection and intervention responds to the illusion-of-responsiveness concept, recognising that circular exchanges are among the most damaging forms of unresponsiveness precisely because they consume effort without producing progress.

The relational quality principles have a somewhat different character. Transparency about AI identity draws on the evidence about relational overclaim [6] and on the finding that acknowledged non-humanness is what creates the psychological safety benefits documented by Holthöwer and van Doorn [7]. Calibrated anthropomorphism is rooted in Kätsyri et al.’s [5] perceptual mismatch mechanism and the uncanny valley evidence, which shows that the relational failure is not insufficient warmth but the gap between claimed and actual relational capacity. Preserved agency responds to Puntoni et al.’s [24] identification of loss of personal control as a primary cost of AI interaction and Liu-Thompkins et al.’s [25] finding that artificial empathy reads as manipulative when it lacks genuine engagement. Strategic non-humanness, the principle whose derivation is most open to debate, builds on Holthöwer and van Doorn’s [7] five-study demonstration that customers feel less judged by robots in embarrassing encounters and Hermann et al.’s [26] extension to vulnerable consumers, with the critical insight that this benefit depends on the system not claiming to be human. Dignity-preserving defaults are grounded in the overarching argument in Section 4 that relational quality is better understood through the lens of dignity (treating customers as capable adults with intelligence and autonomy) than through the lens of warmth.

Within each dimension, the principles are organised around a prevent-and-recover structure that we think is more useful than a purely preventive framing. Some AI failures are genuinely hard to prevent, and recovery design tends to be where organisations systematically underinvest. In the reliability dimension, calibrated uncertainty communication and conservative promise design are primarily preventive, aimed at reducing trust failures before they occur. Graceful failure acknowledgment and output verification loops are restorative, addressing failures that have already happened.

In responsiveness, context preservation and visible progress signalling are preventive, while loop detection and low-friction escalation are recovery mechanisms for stalled journeys. In relational quality, transparency about AI identity and calibrated anthropomorphism work against relational overclaim before it occurs, while preserved agency and dignity-preserving defaults protect customers once interactions become more sensitive or difficult. The prevent-and-recover structure is what makes the fifteen principles feel like a system rather than a checklist. The principles are presented in Table 2, while Table 3 contrasts service attributes in human-led frontline systems with those in AI-enabled frontline systems.
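The loop-detection recovery mechanism described above can be illustrated with a minimal sketch. It is not drawn from the framework itself: the `Turn` structure, the intent labels, and the `window` and `max_repeats` thresholds are all hypothetical, and a production system would need a far richer notion of "no progress" than repeated intents.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str  # "customer" or "assistant"
    intent: str   # hypothetical intent label from an upstream classifier


def detect_loop(turns: list[Turn], window: int = 6, max_repeats: int = 3) -> bool:
    """Return True when the customer has raised the same intent at least
    `max_repeats` times within their last `window` turns."""
    recent = [t.intent for t in turns if t.speaker == "customer"][-window:]
    if not recent:
        return False
    _, count = Counter(recent).most_common(1)[0]
    return count >= max_repeats


def next_action(turns: list[Turn]) -> str:
    # Recovery step: offer a human handoff instead of another automated reply.
    return "offer_human_handoff" if detect_loop(turns) else "continue_automated"
```

The point of the sketch is the pairing: detection on its own changes nothing for the customer unless it triggers an intervention such as low-friction escalation.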

The Framework in Practice Through Illustrative Scenarios

Two scenarios illustrate how the framework operates as an integrated system rather than a collection of independent principles.

Scenario 1. Telecommunications billing dispute. A customer contacts a telecom provider’s AI assistant to dispute an unexpected charge on their monthly bill. The system identifies the query as a billing dispute and retrieves the relevant transaction records (retrieval-generation transparency). It reports what it found: “I can see a charge of $47.50 on your March statement linked to an international roaming add-on activated on 12 March. I am reasonably confident this is the source of the disputed charge, but I want to flag that I cannot confirm whether you personally activated this add-on or whether it was triggered automatically” (calibrated uncertainty communication). It then explains what it is doing next: “I am now checking whether this add-on was auto-activated by a network event, which typically takes about 20 seconds” (visible progress signalling). The system finds that the add-on was indeed triggered by a network-side roaming event rather than a customer action. It applies a credit and confirms, “I have applied a $47.50 credit to your next bill. If this does not resolve the issue or if the charge reappears, you can reach a billing specialist directly through this chat without re-explaining the situation” (low-friction escalation with context preservation). Throughout the interaction, the system identifies itself as an AI assistant and avoids simulating emotional understanding of the customer’s frustration (transparency about AI identity, calibrated anthropomorphism). Instead, it focuses on demonstrating competence and genuine progress toward resolution.

What makes this work is how the three dimensions reinforce each other. Reliability operates through accurate retrieval and transparent uncertainty, responsiveness through visible progress, relational quality through honest identity and competence over simulated empathy.

Scenario 2. Health screening in a sensitive context. A patient uses a hospital’s online screening tool to report symptoms that may be related to a sexually transmitted infection. The system opens by identifying itself, “I am an AI health screening assistant. I am not a clinician, and I will not make a diagnosis. My role is to help you describe your symptoms accurately so that the clinical team can prepare for your appointment” (transparency about AI identity, conservative promise design). It then adds, “Because I am not a person, nothing you share here involves social judgment. Many patients find it easier to describe sensitive symptoms to an automated system, and the information you provide will go directly to your care team” (strategic non-humanness). The system guides the patient through a structured symptom checklist, explaining at each step why the question is being asked (preserved agency). When the patient describes a symptom combination that the system’s confidence scoring flags as ambiguous, it says, “Based on what you have described, there are several possible explanations. I do not have enough information to narrow this down reliably, so I am flagging this for priority clinical review rather than suggesting a specific condition” (calibrated uncertainty communication, graceful failure acknowledgment). The completed screening summary is routed through a clinician review queue before being added to the patient’s file (output verification loops).

This scenario demonstrates the relational quality principles carrying the most weight. The non-human nature of the system is positioned as a feature rather than a limitation, and the reliability principles operate in the background to ensure that no unverified clinical suggestion reaches the patient.

Figure 2 maps both scenarios as step-by-step process flows, with each step colour-coded by which RRR dimension it primarily addresses. The diagram makes visible how the same governing proposition produces different principle combinations depending on the design challenge in front of the customer. Scenario 1 leans on reliability and responsiveness, with relational quality operating as a quiet background constraint. Scenario 2 leads with relational quality, then returns to reliability when the system reaches the limit of what it can confidently say. Together, they show the framework operating as a working design tool rather than as a static taxonomy.

The framework is held together by a strategic design proposition, which is to automate to protect relationships. This deliberately inverts how automation tends to be framed. Most organisations approach AI deployment as a cost-reduction mechanism, adding relational elements afterwards to soften the experience. What we are proposing instead is to treat the customer relationship as the governing design objective, asking for each deployment decision whether the AI is being used in ways that protect and strengthen that relationship or quietly erode it.

To move this from description to prescription, we propose a simple diagnostic. For any specific service touchpoint being considered for AI deployment, two questions should be asked. Does automation in this touchpoint preserve or enhance the customer’s sense that their problem is genuinely moving toward resolution? And does it preserve or enhance the customer’s sense of being treated as a person with intelligence and autonomy? If the answer to both is yes, the touchpoint is a candidate for automation. If the answer to either is no, the touchpoint should either retain human involvement or the AI interaction should be redesigned until both conditions are met. This is not a comprehensive implementation methodology, but it provides a minimum viability test that is concrete enough to apply to individual deployment decisions.
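The two-question diagnostic can be written down as a small decision rule. The sketch below is only an illustration of the logic in the preceding paragraph: the class name, field names, and return labels are hypothetical, and the two boolean judgments would in practice come from design review or customer research rather than from code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TouchpointAssessment:
    name: str
    preserves_progress: bool  # Q1: sense that the problem is moving toward resolution
    preserves_dignity: bool   # Q2: sense of being treated as a person with intelligence and autonomy


def automation_decision(assessment: TouchpointAssessment) -> str:
    """Apply the minimum viability test: both answers must be yes."""
    if assessment.preserves_progress and assessment.preserves_dignity:
        return "candidate_for_automation"
    # A single "no" is enough to block automation in its current form.
    return "retain_human_or_redesign"
```

For example, `automation_decision(TouchpointAssessment("bereavement account closure", True, False))` returns `"retain_human_or_redesign"`, because one negative answer is sufficient.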

Under this logic, AI takes on tasks where it can be genuinely reliable. These include retrieving verified information, processing routine transactions, and providing structured first-pass triage. And it is designed from the outset to recognise the limits of that reliability and hand off to human judgment when those limits are reached.

Larivière et al. [27] offer useful support for this logic. Their analysis of AI-mediated customer experience shows that outcome quality depends heavily on whether the type of AI capability deployed matches what the customer needs in that interaction. When the match is poor, experience quality drops, sometimes quite sharply. The design implication is to be honest about what each AI modality can and cannot do, and to resist deploying it in situations where its limitations are likely to become the dominant feature of the customer’s experience.

The framework also speaks to the practical challenge of hybrid orchestration, specifically how AI and human agents work together within a single service system, which turns out to be considerably more complicated than technology-first framing tends to acknowledge. Getting it right requires not only technological integration (shared customer data, coherent handoff protocols, unified case management) but deliberate organisational choices about where human presence genuinely adds something and where it does not. Wirtz et al. [2] made the basic point that service robots and human employees will coexist rather than one replacing the other. The harder design question is what good coexistence actually looks like from the customer’s side. The escalation, context-preservation, and loop-detection principles offer architectural guidance here. The relational quality principles, particularly strategic non-humanness, address the more counterintuitive question of when not to insert a human agent.

The dependency between dimensions is more than additive. When reliability is weak, relational quality signals change meaning. A warm chatbot greeting in a system that regularly fabricates answers is not read as friendly. It is read as evasive. A personalised follow-up from a system that cannot preserve context across channels is not read as caring. It is read as theatre. This interpretive dynamic, where an unreliable foundation recolours the meaning of everything built on top of it, is what makes the sequencing of design investment matter. It also explains why the prevent-and-recover structure within each dimension matters practically.

Recovery design in the relational quality dimension only works when reliability and responsiveness have not already eroded the customer’s willingness to take the system’s gestures at face value. The implication for designers is that investing in relational quality features (empathetic tone, personalisation, continuity of context) without first securing reliable outputs and genuine responsiveness is likely to produce diminishing or even negative returns, because those features will be filtered through a deficit of credibility rather than building on a foundation of trust.

The fifteen principles are designed to work together, and the interactions between dimensions matter. Reliability without responsiveness gives you a system that answers accurately but cannot resolve anything. Responsiveness without relational quality gives you a system that resolves things efficiently but leaves customers feeling processed rather than helped. Relational quality without reliability gives you a system that is warm and untrustworthy, which may be the most damaging combination of all. The framework also offers a rough diagnosis when things go wrong. The question is whether the failure is primarily a trust problem, a progress problem, or a dignity problem. Each answer points to a different set of design responses, which is more useful than a generic mandate to improve the customer experience.

6. Discussion and Implications

The RRR Design Framework is designed for hybrid human-AI frontline service systems where customers are actively seeking resolution, guidance, or support. This includes chatbots, voice assistants, triage systems, and service robots operating in contexts such as telecommunications, financial services, healthcare screening, and retail support. The framework translates less directly to fully automated back-end processing where customers are not present as interactants, or to purely self-service systems where no human fallback exists by design. An important boundary condition arises in high-vulnerability, high-distress contexts such as crisis counselling, bereavement support, or acute mental health intervention. In these settings, customers may enter the encounter already depleted or highly vigilant, and relational safety may need to be established before the customer can engage productively with reliable information or responsive progress. The dependency between dimensions still holds in these settings (a caring system that delivers wrong information will still lose trust), but the sequence in which design attention is allocated may need to shift, with relational attunement coming first so that customers can stay engaged long enough for reliability and responsiveness to take effect. Within its scope, the framework’s three dimensions are not equally weighted in every situation. Reliability principles carry the most weight in high-stakes domains. Responsiveness principles matter most in complex multi-touchpoint journeys. Relational quality principles are most consequential when the encounter is emotionally charged or involves vulnerability.

6.1. Theoretical Implications

Three things follow from this work that are worth spelling out for service quality theory. The first contribution is showing that the foundational dimensions identified by Parasuraman et al. [3], reliability, responsiveness, and relational quality, remain relevant for AI-mediated service but require substantial reconceptualisation. The customer needs underlying these dimensions are stable, but the design requirements for meeting them shift fundamentally when AI mediates the encounter. This is a more specific claim than the general observation that AI changes service. It identifies where the change bites and what kind of theoretical work is needed in response.

A second contribution is the identification of three AI-specific failure modes (plausible error, the illusion of responsiveness, and relational overclaim) treated as a connected set rather than independent problems. Existing work has addressed each of these individually. What we add is the observation that they interact. A system that is reliable but unresponsive gives accurate answers that do not resolve anything, while a system that is responsive but relationally overclaiming efficiently processes customers while eroding their dignity. The failure mode taxonomy provides a diagnostic structure that existing frameworks lack.

The third contribution, which we suspect is the most consequential, is to push back on the assumption (prevalent in both practice and parts of the literature) that adding warmth and anthropomorphic features is the default path to better AI service experiences. The evidence on strategic non-humanness and relational overclaim points the other way. The design challenge is not to make AI more human-like but to make it more honest about what it is, and that honesty often produces better relational outcomes than simulated empathy.

6.2. Managerial Implications

For service managers, the most immediate implication is a shift in posture. The dominant mode of AI deployment right now is essentially reactive: introduce the technology to reduce cost and accelerate response times, then troubleshoot the customer experience problems that emerge. The framework suggests an alternative, which is designing proactively against known failure modes before deployment, rather than discovering them through customer complaints after the fact.

The ‘automate to protect relationships’ proposition offers a practical decision heuristic. For each AI touchpoint, managers can apply two diagnostic questions. First, does automation in this touchpoint preserve or enhance the customer’s sense of progress toward resolution? Second, does it preserve or enhance the customer’s sense of dignity and honest engagement? If the answer to both is yes, automate. If the answer to either is no, retain human involvement or redesign the AI interaction.

There is also a measurement implication that organisations tend to resist. Deflection rates and average handle time need to give way, at least partly, to resolution rates, customer re-contact frequency within 48 h, and some measure of trust calibration, which is harder to operationalise but more honest about what is at stake.
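To make the measurement shift concrete, the sketch below computes a resolution rate and a 48-hour re-contact rate from a list of contact records. The paper does not define these metrics formally, so the definitions here are assumptions: a contact counts as re-contacted if the same customer opens another contact within the window, and the `resolved` flag is a hypothetical field that an organisation would have to populate from its own case data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Contact:
    customer_id: str
    opened_at: datetime
    resolved: bool  # hypothetical flag: the customer's need was actually met


def resolution_rate(contacts: list[Contact]) -> float:
    """Share of contacts in which the customer's need was resolved."""
    if not contacts:
        return 0.0
    return sum(c.resolved for c in contacts) / len(contacts)


def recontact_rate(contacts: list[Contact],
                   window: timedelta = timedelta(hours=48)) -> float:
    """Share of contacts followed by another contact from the same
    customer within `window` (assumed definition)."""
    if not contacts:
        return 0.0
    by_customer: dict[str, list[datetime]] = {}
    for c in sorted(contacts, key=lambda c: c.opened_at):
        by_customer.setdefault(c.customer_id, []).append(c.opened_at)
    followed = 0
    for times in by_customer.values():
        # Compare each contact with the same customer's next contact.
        for earlier, later in zip(times, times[1:]):
            if later - earlier <= window:
                followed += 1
    return followed / len(contacts)
```

Unlike deflection rate or handle time, both quantities are computed from what happened to the customer's problem rather than from how quickly the interaction ended.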

One further measurement challenge deserves mention. When relational quality deteriorates, the most common customer response is not complaint but quiet disappearance. Customers who feel dismissed, processed, or stripped of agency tend not to escalate. They simply stop returning, reduce their engagement, or move to a competitor without explanation. This silent churn is the most costly form of relational failure precisely because it is invisible to standard satisfaction measurement.

Dissatisfied customers who feel their dignity was not respected are the least likely to respond to feedback surveys, which means that the organisations with the worst relational problems are also the ones least likely to detect them through conventional channels. Designing dignity-preserving defaults is therefore not only a matter of ethical service design. It is a retention mechanism that addresses a class of attrition that deflection rates, handle times, and even NPS scores systematically undercount.

For AI system designers, the framework translates customer experience requirements into technical specifications. Calibrated uncertainty communication requires confidence scoring in model outputs and threshold-based routing of responses that fall below acceptable confidence levels. Context preservation requires shared state management across service channels with consistent data schemas. Low-friction escalation requires integration between AI orchestration layers and human agent platforms, including automated case summarisation, so agents are not starting blind. Retrieval-generation transparency requires architectures that track the provenance of each element of a response, which is increasingly feasible with retrieval-augmented generation but is rarely prioritised when the design brief is focused on throughput.
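A minimal sketch of two of these specifications, provenance tracking and threshold-based routing, is given below. Everything in it is an illustrative assumption rather than a reference design: the `Segment` structure, the idea that the serving stack exposes a per-segment confidence score in [0, 1], and the threshold values 0.8 and 0.5 are all hypothetical and would need calibration against real outcomes.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    RETRIEVED = "retrieved"  # taken from a verified record or document
    GENERATED = "generated"  # produced by the language model


@dataclass
class Segment:
    text: str
    provenance: Provenance
    confidence: float  # hypothetical score in [0, 1]


def route(segments: list[Segment],
          deliver_at: float = 0.8,
          hedge_at: float = 0.5) -> str:
    """Route on the weakest generated segment; retrieved content is
    treated as verified for the purposes of this sketch."""
    generated = [s.confidence for s in segments
                 if s.provenance is Provenance.GENERATED]
    floor = min(generated, default=1.0)
    if floor >= deliver_at:
        return "deliver"
    if floor >= hedge_at:
        return "deliver_with_uncertainty_notice"
    return "escalate_to_human"


def render(segments: list[Segment]) -> str:
    # Label every segment so the customer can see what was looked up
    # and what was generated.
    return "\n".join(f"[{s.provenance.value}] {s.text}" for s in segments)
```

Routing on the minimum rather than the average is a deliberately conservative choice: a single low-confidence generated claim is enough to trigger a hedge or a handoff, which is in keeping with conservative promise design.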

6.3. Propositions

Four propositions follow from the framework that we think are amenable to empirical testing, though anyone who has tried to test propositions derived from conceptual work will know they often look crisper on paper than they turn out to be in the field. We present each with suggested measurement approaches, moderating conditions, and a falsifiable null case.

Proposition 1.

Calibrated uncertainty communication preserves customer trust more effectively than confident but occasionally wrong responses, particularly in high-ambiguity service tasks where customers cannot easily verify the answer themselves.

Trust can be measured using established scales such as the trust in automation scale developed by Lee and See [9] or adapted versions of interpersonal trust measures used in service research. Calibrated uncertainty communication can be operationalised as a design treatment in which AI responses include explicit confidence indicators compared with a control condition in which responses are delivered with uniform confidence. The moderating condition is task ambiguity. The effect should be strongest for genuinely uncertain queries (insurance coverage, medical triage) and weaker for routine factual queries (store hours, order tracking), where the answer is easily checked. The null case would be that calibrated uncertainty communication has no effect on trust, or reduces trust, because customers interpret expressed uncertainty as a signal of incompetence. An experimental design using scenario-based vignettes with randomised uncertainty framing, measured across high- and low-ambiguity service tasks, would provide an appropriate initial test.

Proposition 2.

Visible progress signalling increases perceived responsiveness more than raw response speed, especially when the issue is complex and the customer is uncertain about what will happen next.

Perceived responsiveness can be measured using items adapted from SERVQUAL’s responsiveness dimension, supplemented with items capturing felt progress (for example, “I felt my issue was genuinely moving toward resolution”). Visible progress signalling can be operationalised as AI responses that include explicit status updates, next-step descriptions, and estimated timeframes, compared with a control condition providing equivalent speed but only generic acknowledgments. The moderating condition is issue complexity. Progress signalling should matter most for multi-step problems and less for simple single-step queries. The null case would be that progress signalling has no effect on perceived responsiveness, or that customers find it patronising. A between-subjects design comparing signalling versus no-signalling across simple and complex scenarios would be appropriate.

Proposition 3.

Transparency about AI identity improves perceived relational quality when customers are sensitive to relational overclaim, but may have smaller effects in routine low-stakes interactions where customers are primarily task-focused.

Perceived relational quality can be measured through items capturing perceived honesty, respect for intelligence, and a sense of dignity. The moderating condition is the interaction of stakes and emotional charge. The transparency effect should be strongest in contexts where customers have heightened sensitivity to authenticity (complaint handling, financial difficulty, health concerns) and weaker in routine transactions. The null case would be that AI identity disclosure has no effect on relational quality, or reduces it because customers associate AI with inferior service. A 2 × 2 design crossing AI disclosure with interaction context would provide a clean test.
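As a hedged illustration of how participants might be allocated in such a 2 × 2 between-subjects design, the sketch below performs balanced random assignment to the four cells. The level names (`ai_disclosed`, `not_disclosed`, `sensitive`, `routine`) are assumptions chosen for illustration, and the sketch covers allocation only, not stimulus design or analysis.

```python
import itertools
import random

# Hypothetical factor levels for the two crossed factors.
DISCLOSURE = ("ai_disclosed", "not_disclosed")
CONTEXT = ("sensitive", "routine")


def assign_conditions(participant_ids, seed=0):
    """Balanced random assignment to the four cells of a 2 x 2 design."""
    cells = list(itertools.product(DISCLOSURE, CONTEXT))
    rng = random.Random(seed)  # seeded so the allocation is reproducible
    ids = list(participant_ids)
    rng.shuffle(ids)
    # Deal shuffled participants round-robin so cell sizes differ by at most one.
    return {pid: cells[i % len(cells)] for i, pid in enumerate(ids)}
```

With 40 participants, each of the four cells receives exactly 10; the shuffle randomises who lands in which cell, while the round-robin deal keeps the cells balanced.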

Proposition 4.

Strategic non-humanness, framing the AI nature of the agent explicitly, enhances service experience in encounters involving embarrassment, stigma, or social sensitivity.

Service experience can be measured through scales capturing comfort, willingness to disclose information, and perceived freedom from social judgment. Strategic non-humanness can be operationalised as a design treatment in which the system explicitly frames its non-human nature as a feature compared with a neutral AI disclosure condition and a human agent control condition. The moderating condition is encounter sensitivity. The benefit should be strongest for stigmatised topics and weaker or absent in non-sensitive routine interactions. The null case would be that strategic non-humanness framing has no effect, or backfires because customers perceive the framing as manipulative. This proposition builds directly on Holthöwer and van Doorn’s [7] experimental paradigm.

These propositions can in principle be tested across service contexts, customer segments, and AI modalities. Effect sizes are likely to vary, and negative or null findings would be just as theoretically informative as confirmatory ones.

6.4. Future Research Directions

The framework opens up questions that extend beyond the propositions above.

The interactions between dimensions deserve dedicated investigation. Ferraro et al. [8] identify paradoxes in generative AI services that suggest the tensions between dimensions may matter as much as the dimensions themselves. For instance, increasing reliability through explicit uncertainty communication may slow the interaction and reduce perceived responsiveness. Understanding these trade-offs empirically would strengthen the framework’s practical usefulness.

The framework currently treats context as a boundary condition rather than a design variable. Future work should examine how the relative weight of the three dimensions shifts across service contexts. Reliability principles are likely most critical in high-stakes domains such as healthcare and financial advice, responsiveness principles in complex multi-touchpoint journeys, and relational quality principles in emotionally charged or stigma-sensitive encounters.

Developing quantitative measures for the three dimensions as reconceptualised here is also a priority. Existing service quality measurement traditions (SERVQUAL and its variants) provide a starting point, but the specific constructs identified in the framework, particularly calibrated uncertainty communication, felt responsiveness (as distinct from speed), and relational overclaim, will require new scale development and validation.

Customer AI literacy is likely to moderate the effects predicted by the framework. A customer who is familiar with how generative AI works may have different expectations about uncertainty and different thresholds for relational overclaim than one encountering an AI-mediated service for the first time. How AI literacy interacts with the design principles proposed here is an open and consequential question.

The strategic non-humanness concept deserves considerably more theoretical development than this paper has been able to provide. The finding that non-human identity can be a design asset rather than a liability in specific service contexts is among the most counterintuitive results in the recent literature, and the boundary conditions, moderating mechanisms, and long-term effects remain largely unexplored.

There is also a broader theoretical implication worth naming, even if it cannot be fully resolved here. Parasuraman et al. [3] developed SERVQUAL for human-delivered service, and the categories of reliability, responsiveness, and relational quality that we build on are ultimately theirs. We think those categories remain useful. They are parsimonious, they are managerially legible, and they map onto customer needs that do not change just because the service agent is artificial. What changes substantially is what those categories actually require of a designer when AI mediates the encounter. The needs are stable. The design requirements for meeting them are not, and that gap is where most of the hard work in AI service design currently sits.

7. Conclusions

AI-enabled service is now an operational reality at scale, not a future scenario, not a pilot programme in select markets. But the design logic that most organisations have followed has been engineering logic, focused on optimising for speed, cost reduction, and deflection rates, and worrying about the customer experience later. The evidence reviewed here suggests that customers’ underlying needs are more stable than the technology serving them. They want to trust that the system is being straight with them. They want to feel that their problem is genuinely moving toward resolution. And they want to be treated as people with intelligence and autonomy, not as units of demand to be processed as efficiently as possible. Those three needs map onto reliability, responsiveness, and relational quality, each of which requires a different design approach when AI is the service agent.

The RRR Design Framework organises fifteen prescriptive design principles around those needs. In reliability, the key moves are signalling confidence levels honestly and narrowing claims when uncertainty is high, which runs against most organisations’ instinct to project confidence at all times. In responsiveness, the priorities are preserving context across channels and detecting when customers have gotten stuck in loops before the frustration becomes irreversible. In relational quality, the work is being transparent about what the system actually is and genuinely preserving the customer’s ability to understand and override what it does.

The proposition tying all fifteen together is to automate to protect relationships. It is a simple idea, but it cuts against how most AI service deployment is currently justified and measured. Systems that fail on trust, progress, or dignity tend eventually to fail on the efficiency metrics they were built to optimise. Customers disengage, re-contact at higher rates, or find a way around the system entirely to reach a human. The systems most likely to hold up over time will be those that are reliable enough to be trusted, responsive enough to produce genuine progress, and honest enough to treat the people they serve with the respect they are owed.

Author Contributions

Conceptualization, M.C. and O.C.; writing, original draft preparation, M.C. and O.C.; writing, review and editing, M.C. and O.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable. This article is a conceptual paper and does not involve human participants or identifiable personal data.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analysed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Huang, M.-H.; Rust, R.T. Artificial intelligence in service. J. Serv. Res. 2018, 21, 155–172.
Wirtz, J.; Patterson, P.G.; Kunz, W.H.; Gruber, T.; Lu, V.N.; Paluch, S.; Martins, A. Brave new world: Service robots in the frontline. J. Serv. Manag. 2018, 29, 907–931.
Parasuraman, A.; Zeithaml, V.A.; Berry, L.L. SERVQUAL: A multiple-item scale for measuring consumer perceptions of service quality. J. Retail. 1988, 64, 12–40.
Hannigan, T.R.; McCarthy, I.P.; Spicer, A. Beware of botshit: How to manage the epistemic risks of generative chatbots. Bus. Horiz. 2024, 67, 471–486.
Kätsyri, J.; Förger, K.; Mäkäräinen, M.; Takala, T. A review of empirical evidence on different uncanny valley hypotheses: Support for perceptual mismatch as one road to the valley of eeriness. Front. Psychol. 2015, 6, 390.
Zhang, W.; Slade, E.L.; Pantano, E. Humanlike service robots: A systematic literature review and research agenda. Psychol. Mark. 2024, 41, 3157–3181.
Holthöwer, J.; van Doorn, J. Robots do not judge: Service robots can alleviate embarrassment in service encounters. J. Acad. Mark. Sci. 2023, 51, 767–784.
Ferraro, C.; Demsar, V.; Sands, S.; Restrepo, M.; Campbell, C. The paradoxes of generative AI-enabled customer service: A guide for managers. Bus. Horiz. 2024, 67, 549–559.
Lee, J.D.; See, K.A. Trust in automation: Designing for appropriate reliance. Hum. Factors 2004, 46, 50–80.
Dietvorst, B.J.; Simmons, J.P.; Massey, C. Algorithm aversion: People erroneously avoid algorithms after seeing them err. J. Exp. Psychol. Gen. 2015, 144, 114–126.
Logg, J.M.; Minson, J.A.; Moore, D.A. Algorithm appreciation: People prefer algorithmic to human judgment. Organ. Behav. Hum. Decis. Process. 2019, 151, 90–103.
Doshi-Velez, F.; Kim, B. Towards a rigorous science of interpretable machine learning. arXiv 2017, arXiv:1702.08608.
Liao, Q.V.; Wortman Vaughan, J. AI transparency in the age of LLMs: A human-centered research roadmap. Harv. Data Sci. Rev. 2024.
Wachter, S.; Mittelstadt, B.; Russell, C. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harv. J. Law Technol. 2017, 31, 841–887.
Dixon, M.; Freeman, K.; Toman, N. Stop trying to delight your customers. Harv. Bus. Rev. 2010, 88, 116–122.
Rawson, A.; Duncan, E.; Jones, C. The truth about customer experience. Harv. Bus. Rev. 2013, 91, 90–98.
Zierau, N.; Hildebrand, C.; Bergner, A.; Busquet, F.; Schmitt, A.; Leimeister, J.M. Voice bots on the frontline: Voice-based interfaces enhance flow-like consumer experiences and boost service outcomes. J. Acad. Mark. Sci. 2023, 51, 823–842.
Mariani, M.M.; Hashemi, N.; Wirtz, J. Artificial intelligence empowered conversational agents: A systematic literature review and research agenda. J. Bus. Res. 2023, 161, 113838.
Larivière, B.; Bowen, D.; Andreassen, T.W.; Kunz, W.; Sirianni, N.J.; Voss, C.; Wünderlich, N.V.; De Keyser, A. “Service Encounter 2.0”: An investigation into the roles of technology, employees and customers. J. Bus. Res. 2017, 79, 238–246.
Ozuem, W.; Ranfagni, S.; Willis, M.; Salvietti, G.; Howell, K. Exploring the relationship between chatbots, service failure recovery and customer loyalty: A frustration-aggression perspective. Psychol. Mark. 2024, 41, 2253–2273.
Haupt, M.; Rozumowski, A.; Freidank, J.; Haas, A. Seeking empathy or suggesting a solution? Effects of chatbot messages on service failure recovery. Electron. Mark. 2023, 33, 56.
Huang, M.-H.; Rust, R.T. The caring machine: Feeling AI for customer care. J. Mark. 2024, 88, 1–23.
Zhang, R.W.; Liang, X.; Wu, S.-H. When chatbots fail: Exploring user coping following a chatbots-induced service failure. Inf. Technol. People 2024, 37, 175–195.
Puntoni, S.; Reczek, R.W.; Giesler, M.; Botti, S. Consumers and artificial intelligence: An experiential perspective. J. Mark. 2021, 85, 131–151.
Liu-Thompkins, Y.; Okazaki, S.; Li, H. Artificial empathy in marketing interactions: Bridging the human-AI gap in affective and social customer experience. J. Acad. Mark. Sci. 2022, 50, 1198–1218.
Hermann, E.; Yalcin Williams, G.; Puntoni, S. Deploying artificial intelligence in services to AID vulnerable consumers. J. Acad. Mark. Sci. 2024, 52, 1431–1451.
Larivière, B.; Verleye, K.; De Keyser, A.; Koerten, K.; Schmidt, A.L. The service robot customer experience (SR-CX): A matter of AI intelligences and customer service goals. J. Serv. Res. 2025, 28, 35–56.

Figure 1. The RRR Design Framework: Fifteen prescriptive design principles organised by dimension and prevent-and-recover logic.

Figure 2. The two illustrative scenarios are mapped as step-by-step process flows, with each step colour-coded by RRR dimension to show how principles combine differently across design challenges.

Table 1. Existing service design assumptions and the corresponding RRR framework moves for AI-enabled service.

Design Issue	Existing Approach (SERVQUAL-Era Assumptions)	RRR Framework Move (AI-Enabled Service)
Nature of error	Errors are visible and bounded. Wrong address, wrong charge, missing booking. Reliability means reducing the rate of these visible errors.	Errors can be plausible, fluent, and invisible. Reliability means making fallibility legible to the customer through calibrated uncertainty communication and conservative promise design.
Confidence projection	Project confidence. Hesitation reads as incompetence. Trust is built through consistent assured delivery.	Signal uncertainty honestly. Hesitation reads as competence when the alternative is fluent fabrication. Trust is built through legible fallibility.
Speed and responsiveness	Speed approximates responsiveness. Faster reply times improve customer experience. Response time is the headline metric.	Speed and felt responsiveness are different things. Resolution movement is the headline metric. Visible progress signalling and loop detection address the gap.
Channel handoffs	Each channel is a separate touchpoint. Handoffs are operational handovers within a single human team.	Channels and AI modules are part of one customer journey. Context preservation and low-friction escalation prevent re-explanation effort.
Empathy and warmth	Empathy is a relational virtue. More warmth, more personalisation, more emotional attunement strengthen the customer relationship.	Claimed warmth that exceeds capability erodes trust. Calibrated anthropomorphism and dignity-preserving defaults replace simulated empathy.
Identity disclosure	Not applicable. The agent is human and identity is given.	Identity is a design choice. AI identity transparency is a baseline. Strategic non-humanness uses it as a value proposition in stigma-sensitive contexts.
Recovery design	Recovery is reactive. Apologise, refund, retain. Service recovery research focuses on what to do after a customer complains.	Recovery is architected into the system. Each dimension carries explicit prevent and recover principles, since AI failures often go unreported by the customer.
Governing logic	Automate to reduce cost. Add relational elements afterwards to soften the experience.	Automate to protect relationships. Customer relationship is the governing design objective. Cost reduction is downstream of that.
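
The "conservative promise design" move in the reliability rows above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `Claim` type, the `phrase_claim` function, and the 0.9 threshold are all hypothetical, and the sketch assumes the system exposes a calibrated confidence score, which the paper does not specify.

```python
from dataclasses import dataclass

# Hypothetical cut-off; the paper gives no numeric threshold.
CONFIDENCE_THRESHOLD = 0.9


@dataclass
class Claim:
    assertive_text: str  # wording used only when the claim is well supported
    hedged_text: str     # narrower wording used under uncertainty
    confidence: float    # assumed to be a calibrated score in [0, 1]
    verified: bool       # True if backed by a retrieved, verified source


def phrase_claim(claim: Claim, threshold: float = CONFIDENCE_THRESHOLD) -> str:
    """Use the assertive wording only for verified, high-confidence claims."""
    if claim.verified and claim.confidence >= threshold:
        return claim.assertive_text
    # Under-promise: fall back to the narrower claim in every other case.
    return claim.hedged_text


coverage = Claim("This is covered.", "This may be covered.", 0.6, False)
print(phrase_claim(coverage))
```

The design choice worth noting is the default: the hedged wording is the fallback, so a missing or low confidence signal narrows the promise rather than widening it.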

Table 2. The RRR design framework: Fifteen design principles for AI-enabled service systems.

Dimension	Design Principle	Design Rule	Illustrative Application
Reliability	Calibrated uncertainty communication	Signal confidence levels explicitly; distinguish verified from inferred content	Travel bot flags ‘verified policy’ vs. ‘based on typical practice’
	Conservative promise design	Under-promise relative to probabilistic outputs; narrow claims when uncertainty is high	Insurance bot says “this may be covered” rather than “this is covered” when confidence is below threshold
	Retrieval-generation transparency	Show which parts of a response come from verified sources and which are generated	Financial bot labels market data (retrieved) separately from investment suggestions (generated)
	Graceful failure acknowledgment	When the system cannot answer reliably, say so clearly and route to a qualified source	“I do not have enough information to answer accurately. Let me connect you with a specialist.”
	Output verification loops	Build human or automated review into high-stakes responses before delivery	Medical triage bot routes flagged symptom assessments through clinician review before sending
Responsiveness	Context preservation across channels	Carry the full interaction history across all channels and agent handoffs	Customer moving from chat to phone finds the agent already has the full case summary
	Visible progress signalling	Communicate explicitly what step is underway and what comes next	“I have verified your identity and located the order. Checking warehouse status now. About 30 seconds.”
	Low-friction escalation	Provide clear pathways to human agents that carry all accumulated context	One-click option that passes the full case summary and interaction history to the human agent
	Resolution-oriented metrics	Measure felt progress and actual resolution rates, not response speed or deflection	Track issue-resolved-in-single-session and customer re-contact within 48 h
	Loop detection and intervention	Identify when customers are cycling through unhelpful responses and intervene proactively	After two unsuccessful attempts on the same issue, automatically escalate with full context
Relational quality	Transparency about AI identity	Clearly disclose the AI nature of the interaction at the outset	“I am an AI assistant. I can help with common questions and connect you to a person for complex issues.”
	Calibrated anthropomorphism	Match humanlike features to actual system capabilities; never claim relational capacity the system cannot deliver	Voice assistant uses natural language but does not claim to feel or understand emotions
	Preserved agency	Keep options visible, explain AI reasoning, and allow customers to override automated decisions	Recommendation engine shows why it suggested this and offers to show different options
	Strategic non-humanness	In contexts involving embarrassment, vulnerability, or stigma, use the non-human nature of the agent as a design feature	Health screening bot framed explicitly as a non-judgmental AI for sensitive symptom disclosure
	Dignity-preserving defaults	Design interactions to respect customer intelligence and autonomy rather than assuming incompetence	System avoids patronising scripts, does not repeat obvious instructions, and adapts to demonstrated expertise
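
The loop detection and low-friction escalation rows in Table 2 lend themselves to a short sketch. The code below is one possible reading under stated assumptions: the `Session` type, the `record_turn` function, and the shape of the handoff payload are hypothetical, and the limit of two failed attempts is taken from the illustrative application in the table rather than from any tested threshold.

```python
from dataclasses import dataclass, field

# Taken from the Table 2 example: escalate after two unsuccessful attempts.
MAX_FAILED_ATTEMPTS = 2


@dataclass
class Session:
    customer_id: str
    history: list[str] = field(default_factory=list)               # full interaction log
    failed_attempts: dict[str, int] = field(default_factory=dict)  # per-issue counter


def record_turn(session: Session, issue: str, message: str, resolved: bool):
    """Log a turn; return a handoff payload once the customer is looping."""
    session.history.append(message)
    if resolved:
        session.failed_attempts.pop(issue, None)
        return None
    session.failed_attempts[issue] = session.failed_attempts.get(issue, 0) + 1
    if session.failed_attempts[issue] >= MAX_FAILED_ATTEMPTS:
        # Escalate with the accumulated context so the customer
        # does not have to re-explain the problem to a human agent.
        return {
            "customer_id": session.customer_id,
            "issue": issue,
            "history": list(session.history),
        }
    return None


session = Session("customer-001")
record_turn(session, "refund", "Where is my refund?", resolved=False)
handoff = record_turn(session, "refund", "That did not answer my question.", resolved=False)
print(handoff)
```

Returning the history inside the handoff payload is what links loop detection to context preservation: the escalation and the case summary travel together.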

Table 3. Contrasting service attributes in human-led and AI-enabled frontline systems.

Service Attribute	Human-Led	AI-Enabled
Typical reliability failure	Visible errors (wrong charge, missed appointment)	Invisible errors (fluent but fabricated content)
Trust calibration	Customers read nonverbal cues, tone, hesitation	Customers have no natural cues for AI confidence levels
Responsiveness signal	Speed of human reply and demonstrated effort	Speed of system reply (which may mask lack of progress)
Context continuity	Depends on individual memory and case notes	Depends on system architecture and data integration
Escalation path	Ask for a manager or senior colleague	Often unclear, may require restarting the interaction
Relational register	Natural warmth, empathy, social intelligence	Simulated warmth risks relational overclaim
Social evaluation	Present in every human interaction	Absent in transparent AI interactions (a potential asset)
Failure attribution	Attributed to individual or organisation	Often attributed to ‘the system’ or ‘technology’
Recovery expectations	Apology, acknowledgment, compensation	Functional resolution, honest acknowledgment of limits
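
Because speed of reply may mask a lack of progress, the resolution-oriented metrics named in Table 2 (issue resolved in a single session, re-contact within 48 h) can be computed from a simple contact log. The sketch below is illustrative only: the `Contact` record, the `resolution_metrics` function, and the rule that a case counts as resolved in a single session only when no re-contact follows within the window are assumptions, not definitions from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta

RECONTACT_WINDOW = timedelta(hours=48)  # window named in Table 2


@dataclass
class Contact:
    customer_id: str
    issue: str
    started: datetime
    resolved: bool  # whether the session was closed as resolved


def resolution_metrics(contacts, window=RECONTACT_WINDOW):
    """Return single-session resolution and re-contact rates per case."""
    cases = defaultdict(list)
    for contact in contacts:
        cases[(contact.customer_id, contact.issue)].append(contact)
    if not cases:
        return {"single_session_resolution_rate": 0.0, "recontact_rate": 0.0}

    single_session = 0
    recontacted = 0
    for group in cases.values():
        group.sort(key=lambda c: c.started)
        first = group[0]
        came_back = len(group) > 1 and group[1].started - first.started <= window
        if came_back:
            recontacted += 1
        elif first.resolved:
            # Counted only when the customer did not return within the window.
            single_session += 1

    total = len(cases)
    return {
        "single_session_resolution_rate": single_session / total,
        "recontact_rate": recontacted / total,
    }


t0 = datetime(2026, 1, 5, 9, 0)
log = [
    Contact("a", "billing", t0, True),
    Contact("b", "delivery", t0, True),
    Contact("b", "delivery", t0 + timedelta(hours=5), True),
    Contact("c", "login", t0, False),
]
print(resolution_metrics(log))
```

Note that customer "b" is marked resolved in the first session yet returns five hours later, so the case counts as a re-contact rather than a resolution, which is the gap between closed tickets and felt progress that the framework highlights.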
