1. Introduction
Large service organisations now use AI in customer-facing operations as a matter of course. Chatbots field insurance queries. Recommendation engines shape what customers see before they have quite finished forming a preference. Voice assistants handle first-contact triage in healthcare. Service robots deliver room orders in hotels. The question for most managers is no longer whether AI belongs in frontline service. That debate has largely settled. The real question is how to design it in a way that does not quietly erode what customers actually care about [1, 2].
Service quality research is a reasonable starting point. Parasuraman et al. [3] gave the field reliability, responsiveness, and empathy as its core dimensions, and those dimensions have held up well; they capture something stable about what customers need from service encounters. The difficulty is that SERVQUAL was built for human-delivered service, and human-delivered service fails in different ways than AI-mediated service does. Knowing that reliability matters tells you almost nothing about what to do when a generative chatbot produces a confidently wrong answer. Knowing that responsiveness is important does not tell you how to hand off from a bot to a human agent without forcing the customer to start from scratch. And knowing that relational quality shapes experience still leaves open the thorny question of whether a chatbot should claim to understand how a customer feels, and if so, how much.
Three design challenges sit at the core of the problem. They are worth naming as a set precisely because they are not the same challenge wearing different clothes. First, AI outputs can be fluent and wrong simultaneously, a combination that older service quality frameworks had no reason to anticipate. Hannigan et al. [4] call this botshit, content that sounds plausible, reads smoothly, and happens to be fabricated. Second, AI can respond instantly while producing no actual progress for the customer. Speed and responsiveness turn out to measure different things. Third, and in some ways the most awkward, overclaiming emotional understanding or relational warmth triggers distrust when customers notice the mismatch [5, 6], yet there are service contexts where the non-human nature of the system is precisely what customers want [7].
This paper develops a conceptual framework that extends and reconceptualises established service quality dimensions for AI-mediated service, rather than reporting an empirical test. It draws on recent work across service research, consumer psychology, human–computer interaction, and AI-enabled service design. Three contributions follow from this reconceptualisation. First, we try to be precise about what reliability, responsiveness, and relational quality actually require when AI rather than a human is doing the work. The definitions shift more than one might expect. Second, we identify three AI-specific failure modes that the service quality literature has not, to our knowledge, treated as a connected set, namely plausible error (fluent but wrong outputs), the illusion of responsiveness (fast replies that produce no real progress), and relational overclaim (claimed warmth that exceeds what the system can actually deliver). Third, we translate these into fifteen prescriptive design principles, organised as the RRR Design Framework, which extends established service quality categories to the AI context. The gap this reconceptualisation addresses is straightforward. Existing work has done a reasonable job of identifying what matters in AI-enabled service, but has offered less practical guidance on how to design against those findings.
A brief word on why these three dimensions rather than others. One could organise AI service design around trust, transparency, fairness, effort, control, or explainability, and any of those choices would yield a defensible framework. We chose reliability, responsiveness, and relational quality deliberately, though we hold that choice with some humility. Each maps onto a customer need that seems stable across service contexts. These are the need to trust that the system will do what it says, the need to feel genuine progress toward resolution rather than just activity, and the need to be treated as an adult with intelligence and dignity. Each also corresponds to a distinct failure type that calls for a different design response, which is what gives the framework its practical usefulness.
There is also the question of why other SERVQUAL dimensions, particularly tangibles and assurance, are not given the same treatment. The answer is not that they are irrelevant. Tangible elements of service, such as a clean hotel room or a well-designed mobile interface, remain important regardless of whether the service agent is human or artificial. But they are not transformed by AI mediation in the way that reliability, responsiveness, and relational quality are. The bathroom is still either clean or dirty, whether a robot delivered the towels or a person did. By contrast, the nature of reliability failure changes fundamentally when the service agent can generate fluent, confident, and fabricated content. The nature of responsiveness changes when a system can reply in milliseconds without producing any genuine progress. And the nature of relational quality changes when the service agent can simulate emotional understanding it does not possess. The RRR framework focuses on the dimensions that are most disrupted by AI, not on those that are most important in absolute terms.
2. Reliability in AI-Enabled Service
Reliability has the strongest claim to being the most fundamental dimension of service quality. Parasuraman et al. [3] ranked it first among the SERVQUAL dimensions, and the logic is intuitive enough. A service that cannot be counted on is not much of a service. In conventional settings, reliability means doing what was promised. The flight departs on schedule, the charge matches the order, and the delivery arrives as described. The design response is largely about process control, meaning standardising procedures, training staff, monitoring outcomes, and addressing failures when they occur.
AI changes this in a way that process control alone cannot fix. A generative chatbot responding to a query is not pulling a verified answer from a database. It is producing text from statistical patterns that are quite good at sounding correct, regardless of whether the answer is. This is botshit territory, as Hannigan et al. [4] put it. A travel chatbot can invent a cancellation policy that does not exist. A healthcare bot can deliver medically inaccurate guidance in prose that reads as though it came from a clinician. In most cases, customers cannot tell the difference.
This is a genuinely different category of reliability failure, not merely a harder version of the same problem. In traditional service, errors tend to be visible. The package is at the wrong address, the charge is for the wrong amount, and the booking is missing. In an AI-mediated service, the error can be invisible because the wrong answer arrives with exactly the same fluency and apparent confidence as the right one. Ferraro et al. [
8] note that this fluency can work against customers. A smooth, well-structured response is harder to distrust than a hesitant one. The implication for design shifts accordingly. Rather than trying to prevent errors (which is difficult to guarantee in generative systems), the priority becomes making the system’s fallibility visible before the customer acts on a wrong answer.
These failures can be classified into several distinct categories, and the distinctions matter for design. Hallucination is the production of content that has no basis in the system’s training data or retrieved sources, as when a chatbot invents a cancellation policy that does not exist. Confident confabulation is closely related but distinct. The system produces an answer that sounds authoritative and draws on real patterns in its training data, but applies them incorrectly to the specific case. Context drift occurs when the system loses track of the conversation’s purpose over multiple turns, gradually shifting from the customer’s actual question to an adjacent but different topic. Inappropriate confidence occurs when the system presents uncertain or probabilistic information with the same assertive framing it uses for verified facts. Capability boundary violation occurs when the system attempts tasks that exceed its actual design scope, such as a general customer service bot attempting to provide medical or legal guidance. Data leakage occurs when the system inadvertently reveals information from other customer interactions or from its training data that should not be accessible in the current context. Each of these failures calls for a somewhat different design response, which is why the reliability dimension includes multiple complementary principles rather than a single solution.
Three design principles address this directly. The first, calibrated uncertainty communication, is probably the most counterintuitive. The organisational instinct is to project confidence, but the evidence points the other way. AI systems should signal their confidence level explicitly, making a genuine distinction between responses that draw on verified information and those that are inferred or generated. A system that says “this appears to be covered based on what I have, but I’d recommend confirming with your plan administrator” is, paradoxically, more trustworthy than one that asserts “your policy covers this procedure.” Lee and See [
9] established the underlying logic. Appropriate trust in automation depends on users having honest information about when and how the system is likely to be wrong.
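One way to picture calibrated uncertainty communication is as a framing step applied to a draft answer before it is shown to the customer. The sketch below is illustrative only. The `DraftAnswer` type, the `frame_answer` function, the confidence score, and both thresholds are hypothetical, and how a deployed system would estimate confidence is outside the scope of this paper.

```python
from dataclasses import dataclass


@dataclass
class DraftAnswer:
    text: str          # candidate answer content
    confidence: float  # hypothetical score in [0, 1]; its estimation is assumed, not specified
    verified: bool     # True if the content comes from a verified source


def frame_answer(draft: DraftAnswer, high: float = 0.85, low: float = 0.5) -> str:
    """Wrap a draft answer in wording that matches how far it can be trusted."""
    if draft.verified and draft.confidence >= high:
        # Verified and high confidence: state the answer plainly.
        return f"Based on verified records: {draft.text}"
    if draft.confidence >= low:
        # Inferred or generated content: hedge and suggest confirmation.
        return (
            f"This appears to be the case based on what I have: {draft.text} "
            "I'd recommend confirming before you act on it."
        )
    # Below the lower threshold: do not assert the answer at all.
    return "I can't answer this reliably. I can connect you with someone who can confirm it."
```

The thresholds are placeholders. In practice they would need to be set per context, with lower tolerance for assertive framing where the consequences of a wrong answer are higher.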
The second principle, conservative promise design, extends this to the claims the system makes more broadly. When uncertainty is high, AI systems should err toward narrower commitments rather than confident assertions. Dietvorst et al. [
10] documented what they called algorithm aversion, where people abandon algorithms after seeing them err far more readily than they abandon humans who make equivalent mistakes. A system that overpromises and occasionally fails pays a disproportionate trust penalty. The complementary finding from Logg et al. [
11] is that people are quite willing to follow algorithmic guidance when it is framed honestly. Expressed uncertainty, it turns out, protects trust rather than undermining it.
The third principle is retrieval-generation transparency. Many deployed AI systems combine retrieved information from verified sources with generated content, and customers have no way to distinguish between them. A financial advice system that labels market data as ‘retrieved’ and investment suggestions as ‘generated’ gives customers what they need to calibrate how much weight to put on each part of the response, which is a matter of respect as much as a design choice.
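A minimal sketch of retrieval-generation transparency follows, assuming the system can already tell which parts of a response were retrieved and which were generated. The `Provenance`, `Segment`, and `render` names are hypothetical, and the bracketed label format is only one possible presentation.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    RETRIEVED = "retrieved"  # taken from a verified source
    GENERATED = "generated"  # produced by the model


@dataclass
class Segment:
    text: str
    provenance: Provenance


def render(segments: list[Segment]) -> str:
    """Prefix every segment with its provenance label so the customer can see it."""
    return "\n".join(f"[{s.provenance.value}] {s.text}" for s in segments)
```

The design choice worth noting is that the label travels with each segment rather than with the response as a whole, since a single reply will often mix both kinds of content.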
Work on AI transparency and explainability reinforces why these reliability principles matter. Doshi-Velez and Kim [
12] argue that interpretability in machine learning systems is not a single property but a context-dependent requirement, and that the need for interpretability is greatest when the system’s decisions carry consequences that users cannot easily verify on their own. Service is exactly that kind of context. A customer receiving a generated insurance coverage answer cannot verify the output independently and is therefore dependent on the system being transparent about its own limitations. Liao and Vaughan [
13] extend this argument to large language models specifically, noting that transparency for LLMs demands a human-centred perspective because different stakeholders seek different kinds of understanding in different contexts. For service design, this means that the form of uncertainty communication probably needs to be adapted to what the customer actually needs to know in that moment, not simply what the system is technically capable of reporting. Wachter, Mittelstadt, and Russell [
14] make a complementary point from the regulatory side, arguing that meaningful explanations of automated decisions should focus on what would need to change to produce a different outcome rather than on exposing the full internal logic of the system. For service designers, the takeaway is simple. When a system cannot resolve a query, it should tell the customer what information or action would move things forward, not offer a technical account of why it failed.
Two further principles complete the reliability dimension. Graceful failure acknowledgment means saying clearly when the system cannot answer reliably and routing to someone who can. Output verification loops route high-stakes responses through human or automated review before they reach the customer.
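Graceful failure acknowledgment and output verification loops can be read together as a routing decision made before any response is released. The sketch below assumes a hypothetical confidence score and a hypothetical `high_stakes` flag set elsewhere in the system. The `Route` values and the default threshold are illustrative, not prescribed.

```python
from enum import Enum


class Route(Enum):
    SEND = "send to customer"
    REVIEW = "hold for human or automated review"
    HANDOFF = "acknowledge the limit and route to a human"


def route_response(confidence: float, high_stakes: bool, min_confidence: float = 0.5) -> Route:
    """Decide what happens to a draft response before the customer sees it."""
    if confidence < min_confidence:
        # Graceful failure acknowledgment: say so and route to someone who can answer.
        return Route.HANDOFF
    if high_stakes:
        # Output verification loop: review before release.
        return Route.REVIEW
    return Route.SEND
```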
The overarching design goal is not to make AI seem infallible, and that ship has probably already sailed, given how publicly AI errors have accumulated. It is to make fallibility legible, so customers can make informed decisions about when to act on what the system tells them. A system that flags its uncertainty and routes to a human when needed is, in any trust-relevant sense, more reliable than one that delivers the wrong answer with perfect confidence.
3. Responsiveness in AI-Enabled Service
Speed is the most visible thing AI brings to service. A chatbot responds in milliseconds. A recommendation engine surfaces results before the customer has quite finished deciding what they want. By the metrics most organisations track, AI looks maximally responsive, and in a narrow sense, it is.
But speed and responsiveness are not the same thing, and conflating them is one of the more consequential errors in current AI service deployments. Responsiveness, as customers actually experience it, means genuine progress toward what they came for. A system that replies instantly with “I’m sorry, I didn’t understand that. Could you rephrase?” is fast and completely unresponsive. A chatbot that returns the same FAQ answer regardless of how a question is framed is quick but fails on every attempt. Response time and resolution movement are different measurements of different things.
Dixon et al. [
15] made the case for effort reduction as a loyalty driver using data from over 75,000 customer interactions, a finding that holds up and bears directly on what follows. Rawson et al. [
16] extended it to the full customer journey, finding that cumulative effort erodes satisfaction even when individual touchpoints seem fine, which is roughly what happens when a customer endures a perfectly polite sequence of AI interactions that collectively produce nothing. We call the AI version of this the illusion of responsiveness. The system replies promptly, apologises warmly, and routes the customer in a circle, transferring between modules, re-asking for information already provided, or answering the surface question while missing what the customer needs.
The first design principle for this dimension is context preservation across channels and handoffs. It sounds obvious, but it is remarkably hard to find in practice. When a customer moves from chatbot to phone, or from one AI module to another, the system should carry the full interaction history forward. Ferraro et al. [
8] name what they call the connected yet isolated paradox, where generative AI systems present as seamless but treat every channel transition as a blank slate. Having to re-explain the problem from scratch is among the highest-effort experiences in service [
15], and the frustration is compounded when the customer knows perfectly well that the information existed and was simply lost.
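Context preservation can be sketched as a conversation record that is copied forward, rather than recreated, whenever the channel changes. The `Turn` and `Conversation` types and the `hand_off` method are hypothetical names chosen for illustration. A real deployment would also need to address storage, consent, and access control, none of which is shown here.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    speaker: str  # "customer" or "assistant"
    text: str


@dataclass
class Conversation:
    customer_id: str
    channel: str
    issue_summary: str = ""
    turns: list[Turn] = field(default_factory=list)

    def hand_off(self, new_channel: str) -> "Conversation":
        """Return a conversation on the new channel with the history carried across."""
        return Conversation(
            customer_id=self.customer_id,
            channel=new_channel,
            issue_summary=self.issue_summary,
            turns=list(self.turns),  # copy so later turns on one channel do not alter the other
        )
```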
The second principle is visible progress signalling, which means making explicit to the customer what is actually happening and what comes next, rather than just indicating that something is happening. “Your request is being processed” is almost useless. “I’ve identified this as a billing dispute on your March statement and I’m now checking the transaction records, which usually takes about 30 seconds” is not. Zierau et al. [17] found that voice bots produce meaningfully better customer experiences when interactions feel purposeful and directional. Circular exchanges and unexplained pauses, by contrast, reliably worsen outcomes.
Mariani, Hashemi, and Wirtz [
18], in a systematic review of research on AI-powered conversational agents, identify a recurring pattern, namely that technically capable systems that fail to maintain conversational context across turns produce user experiences that feel fragmented and effortful, regardless of how quickly individual responses arrive. That is the architectural version of the illusion of responsiveness. Conversational coherence across the full interaction, not response latency at any single turn, is what drives the customer’s sense of genuine progress.
The third principle is low-friction escalation. There will always be cases that the automated system cannot resolve, and when that happens, there must be a clear, low-effort path to a human agent, one that carries the full interaction history so the agent picks up where the bot left off rather than starting blind. Larivière et al. [
19] argued that technology in service should augment rather than replace human capability, and escalation design is probably where that argument faces its most direct practical test.
Ozuem et al. [
20] conducted 47 in-depth interviews across four countries and found that customers navigating chatbot-led recovery experience a distinctive emotional journey shaped by their expectations of the chatbot, the scale of the service lapse, and whether the recovery process maintains what the authors call contextual coherence. When coherence breaks down, when customers are asked to re-explain problems or are routed through modules that do not share information, the recovery effort itself becomes a second service failure. Haupt et al. [
21] add a useful design distinction. Chatbots that focus on concrete solutions outperform those that lead with empathetic language during service recovery, which fits the progress-first logic of the responsiveness principles proposed here.
Two further principles round out this dimension. Resolution-oriented metrics measure felt progress and actual resolution rates rather than deflection rates and handle times. Loop detection with intervention means the system recognises when a customer is going in circles and escalates proactively, before the frustration becomes the dominant experience.
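Both principles can be illustrated with very simple logic. The sketch below treats a loop as the same assistant reply being repeated more than a set number of times, and treats resolution rate as the share of conversations marked resolved. The function names, the repetition threshold, and the idea that resolution is recorded as a boolean are all assumptions made for illustration. Detecting loops in deployed systems would likely need a richer signal than exact repetition.

```python
from collections import Counter


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so trivially different replies compare equal."""
    return " ".join(text.lower().split())


def is_looping(assistant_replies: list[str], max_repeats: int = 2) -> bool:
    """True once any assistant reply has been given more than max_repeats times."""
    counts = Counter(normalise(r) for r in assistant_replies)
    return any(n > max_repeats for n in counts.values())


def resolution_rate(outcomes: list[bool]) -> float:
    """Share of conversations in which the customer's need was actually resolved."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0
```

When `is_looping` returns true, the intervention the principle calls for is a proactive escalation, not another attempt at the same reply.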
The underlying challenge is to build systems that optimise for whether the customer’s actual need moved toward resolution, not for whether the system responded promptly. That is a harder metric to construct, but it is the right one, and getting there requires rethinking measurement alongside system architecture rather than treating them as separate problems.
4. Relational Quality in AI-Enabled Service
The default instinct in AI service design is to add warmth. Make the chatbot friendlier. Give the voice assistant a more empathetic tone. Program the service robot to nod. Huang and Rust [
22] chart an aspirational arc from emotion recognition through empathetic response to genuine emotional connection in their analysis of what they call feeling AI for customer care. The aspiration is not unreasonable. The distance between that aspiration and what current systems can actually deliver is also not small.
Read carefully, though, the evidence points somewhere quite different. Relational quality in AI-mediated service turns out to be less about warmth and more about something closer to dignity, meaning whether the system treats the customer as a person with intelligence, autonomy, and a legitimate claim to honest engagement. Three threads in the literature support this reading, and they reinforce each other.
The first is that making AI systems more human-like is riskier than it looks. Zhang et al. [
6], drawing on a review of 118 studies on humanlike service robots, find that anthropomorphic features increase engagement in some contexts and provoke distrust in others. The effect is genuinely context-dependent. Kätsyri et al. [
5] locate the mechanism more precisely. The uncanny valley effect is driven not by human-likeness itself but by perceptual mismatch, when features that are human and features that are not sit together inconsistently. A chatbot that says “I completely understand how frustrating this must be for you” but then cannot actually understand or resolve the problem creates exactly that kind of mismatch. The relational failure is not insufficient warmth. It is relational overclaim, which means presenting more than the system can deliver.
Zhang, Liang, and Wu [
23] provide further empirical grounding through a qualitative study of user coping behaviours following chatbot-induced service failures. Their thematic analysis identifies ‘fake humanity’ as one of the most negatively received chatbot behaviours, occurring when systems project emotional understanding they cannot deliver. Participants described feeling manipulated rather than supported, which is precisely the relational overclaim dynamic that the calibrated anthropomorphism principle is designed to prevent.
The second thread concerns agency. Puntoni et al. [
24] identify loss of personal control as among the most consequential costs of AI interaction, a finding that is easy to underweight when the system is technically producing acceptable outcomes. But when a system makes decisions without explaining them, restricts options without justification, or funnels customers into predetermined pathways, it undermines their sense of control even when the outcome is objectively fine. Liu-Thompkins et al. [
25] add a related complication around artificial empathy. Recognising what a customer feels, without genuinely engaging with that feeling, tends to read as manipulative rather than caring. The design aim here is not to simulate caring but to behave in ways that respect autonomy, a more modest goal that is also considerably more achievable.
The third thread is perhaps the most counterintuitive, and it is worth dwelling on. There are service situations where the non-human nature of the AI is not a liability but a genuine asset. Holthöwer and van Doorn [
7], across five studies, find that customers feel meaningfully less judged by robots than by humans in embarrassing service encounters, including buying sensitive products, disclosing financial difficulty, and discussing health problems. The freedom from social evaluation is real and valuable. Hermann et al. [
26] extend this to vulnerable consumers more broadly, finding that AI can lower barriers to service access by removing the social anxiety that sometimes prevents people from asking for help at all. Critically, this benefit depends on the system not claiming to be human. It is the acknowledged non-humanness that creates the psychological safety, so designing it away is precisely the wrong move.
Strategic Non-Humanness as a Design Principle
Strategic non-humanness deserves more space than it received above, precisely because it pushes against what most people in AI service design take for granted, namely that more human-likeness is generally better.
The mechanism is straightforward. In service encounters involving embarrassment, stigma, or social sensitivity, a significant portion of the customer’s cognitive load goes to managing the social dynamics of the interaction rather than engaging with the service itself. Disclosing financial difficulty to a human advisor involves anticipating judgment, managing self-presentation, and processing the advisor’s verbal and nonverbal reactions. An AI system that is transparently non-human removes that layer of social processing, freeing the customer to focus on the substantive task.
The design implication is more specific than just being honest about being AI. It is that the non-human identity should be foregrounded as a feature of the service rather than apologised for or concealed. A health screening tool that opens with “Because I am not a person, nothing you share here involves social judgment” creates a different customer experience than one that opens with “I am a virtual assistant. How can I help?” The first frames the AI nature as part of the value proposition. The second treats it as a limitation that the customer is asked to accept. The same evidence on social evaluation that justifies the principle also constrains how it should be operationalised.
The boundary conditions matter. Strategic non-humanness is likely to be most beneficial when the encounter involves disclosure of stigmatised information, when the customer has a high need for social evaluation avoidance, and when the service task is primarily informational or transactional rather than deeply relational. It is likely to be least beneficial, and potentially counterproductive, when customers are in acute emotional distress and need a genuine human connection, or when the service requires complex negotiation that draws on social intelligence. The critical design requirement is that the non-human identity is presented honestly and as a feature of the service, not as a limitation that the system is trying to conceal.
Five design principles follow. Transparency about AI identity means disclosing clearly at the outset that the customer is not talking to a person. Calibrated anthropomorphism means matching humanlike features to actual system capabilities, never presenting relational warmth that the system cannot back up. Preserved agency means keeping options visible, explaining the system’s reasoning, and giving customers a genuine ability to override automated decisions, not merely the appearance of one. Strategic non-humanness means actively framing the AI nature of the agent as a feature rather than concealing it, particularly in encounters involving embarrassment, stigma, or sensitivity. Dignity-preserving defaults means designing interactions to treat customers as capable adults, rather than defaulting to patronising scripts and over-simplified assumptions about what people can handle.
The standard for relational quality in AI-mediated service is not whether the system makes customers feel warmly cared for. It is whether the system is honest enough about what it is, and candid enough about what it can do, to deserve their trust. That is a different bar. In some ways, it is a higher one.
Section 2, Section 3 and Section 4 have argued, separately, that each of the three SERVQUAL-era dimensions needs to be understood differently when AI is in the service loop.
Table 1 consolidates that argument by setting the assumptions of pre-AI service design alongside the corresponding moves required in AI-enabled service systems. The contrast is not that the older categories are wrong. It is that they were specified for a kind of failure mode that AI does not have, and are silent on the kinds of failure modes AI introduces. The table is designed to make the gap visible at a glance, and to make explicit what the RRR framework asks designers to do differently.
5. The RRR Design Framework
The preceding three sections establish that reliability, responsiveness, and relational quality each need to be understood differently when AI is in the service loop. This section pulls the principles together.
Table 2 presents all fifteen design principles with their associated design rules and illustrative applications.
Figure 1 provides a visual representation of the framework’s structure.
The logic behind the reliability principles runs as follows. Calibrated uncertainty communication grows out of the finding that generative AI produces fluent outputs regardless of accuracy [
4], combined with Lee and See’s [
9] evidence that appropriate trust depends on honest information about system limitations. Conservative promise design takes its rationale from Dietvorst et al.’s [
10] demonstration that algorithm aversion is triggered disproportionately by confident errors, and from Logg et al.’s [
11] complementary finding that expressed uncertainty protects rather than undermines trust. Retrieval-generation transparency addresses the specific architectural feature of modern AI systems that blend verified retrieved content with generated content, a distinction that matters because customers cannot otherwise calibrate the weight they should place on different parts of a response. Graceful failure acknowledgment is grounded in the broader argument that making fallibility legible is more protective of trust than projecting false confidence, and in Doshi-Velez and Kim’s [
12] position that the need for interpretability is greatest when users cannot independently verify outputs. Output verification loops address high-stakes contexts where the cost of a plausible-but-wrong answer extends beyond a poor service experience, as in healthcare triage or financial advice.
In the responsiveness dimension, context preservation across channels responds to the specific problem Ferraro et al. [
8] identified as the connected-yet-isolated paradox, reinforced by Dixon et al.’s [
15] finding that re-explaining a problem from scratch is among the highest-effort service experiences. The rationale for visible progress signalling is grounded in the distinction between speed and felt responsiveness established in
Section 3, and in Zierau et al.’s [
17] evidence that purposeful, directional interactions produce meaningfully better outcomes than rapid but circular ones. Low-friction escalation draws on Larivière et al.’s [
19] argument that technology in service should augment rather than replace human capability, applied to the specific moment when automated resolution fails. Resolution-oriented metrics follow from the observation that organisations currently optimise for deflection rates and handle time, neither of which measures whether the customer’s actual need moved toward resolution. Loop detection and intervention respond to the illusion of responsiveness concept, recognising that circular exchanges are among the most damaging forms of unresponsiveness precisely because they consume effort without producing progress.
The relational quality principles have a somewhat different character. Transparency about AI identity draws on the evidence about relational overclaim [
6] and on the finding that acknowledged non-humanness is what creates the psychological safety benefits documented by Holthöwer and van Doorn [
7]. Calibrated anthropomorphism is rooted in Kätsyri et al.’s [
5] perceptual mismatch mechanism and the uncanny valley evidence, which shows that the relational failure is not insufficient warmth but the gap between claimed and actual relational capacity. Preserved agency responds to Puntoni et al.’s [
24] identification of loss of personal control as a primary cost of AI interaction and Liu-Thompkins et al.’s [
25] finding that artificial empathy reads as manipulative when it lacks genuine engagement. Strategic non-humanness, the principle whose derivation is most open to debate, builds on Holthöwer and van Doorn’s [
7] five-study demonstration that customers feel less judged by robots in embarrassing encounters and Hermann et al.’s [
26] extension to vulnerable consumers, with the critical insight that this benefit depends on the system not claiming to be human. Dignity-preserving defaults are grounded in the overarching argument in
Section 4 that relational quality is better understood through the lens of dignity (treating customers as capable adults with intelligence and autonomy) than through the lens of warmth.
Within each dimension, the principles are organised around a prevent-and-recover structure that we think is more useful than a purely preventive framing. Some AI failures are genuinely hard to prevent, and recovery design tends to be where organisations systematically underinvest. In the reliability dimension, calibrated uncertainty communication and conservative promise design are primarily preventive, aimed at reducing trust failures before they occur. Graceful failure acknowledgment and output verification loops are restorative, addressing failures that have already happened.
In responsiveness, context preservation and visible progress signalling are preventive, while loop detection and low-friction escalation are recovery mechanisms for stalled journeys. In relational quality, transparency about AI identity and calibrated anthropomorphism work against relational overclaim before it occurs, while preserved agency and dignity-preserving defaults protect customers once interactions become more sensitive or difficult. The prevent-and-recover structure is what makes the fifteen principles feel like a system rather than a checklist. The principles are presented in
Table 2, while
Table 3 contrasts service attributes in human-led frontline systems with those in AI-enabled frontline systems.
The Framework in Practice Through Illustrative Scenarios
Two scenarios illustrate how the framework operates as an integrated system rather than a collection of independent principles.
Scenario 1. Telecommunications billing dispute. A customer contacts a telecom provider’s AI assistant to dispute an unexpected charge on their monthly bill. The system identifies the query as a billing dispute and retrieves the relevant transaction records (retrieval-generation transparency). It reports what it found: “I can see a charge of $47.50 on your March statement linked to an international roaming add-on activated on 12 March. I am reasonably confident this is the source of the disputed charge, but I want to flag that I cannot confirm whether you personally activated this add-on or whether it was triggered automatically” (calibrated uncertainty communication). It then explains what it is doing next: “I am now checking whether this add-on was auto-activated by a network event, which typically takes about 20 seconds” (visible progress signalling). The system finds that the add-on was indeed triggered by a network-side roaming event rather than a customer action. It applies a credit and confirms: “I have applied a $47.50 credit to your next bill. If this does not resolve the issue or if the charge reappears, you can reach a billing specialist directly through this chat without re-explaining the situation” (low-friction escalation with context preservation). Throughout the interaction, the system identifies itself as an AI assistant and avoids simulating emotional understanding of the customer’s frustration (transparency about AI identity, calibrated anthropomorphism). Instead, it focuses on demonstrating competence and genuine progress toward resolution.
What makes this work is how the three dimensions reinforce each other. Reliability operates through accurate retrieval and transparent uncertainty, responsiveness through visible progress, and relational quality through honest identity and competence rather than simulated empathy.
Scenario 2. Health screening in a sensitive context. A patient uses a hospital’s online screening tool to report symptoms that may be related to a sexually transmitted infection. The system opens by identifying itself, “I am an AI health screening assistant. I am not a clinician, and I will not make a diagnosis. My role is to help you describe your symptoms accurately so that the clinical team can prepare for your appointment” (transparency about AI identity, conservative promise design). It then adds, “Because I am not a person, nothing you share here involves social judgment. Many patients find it easier to describe sensitive symptoms to an automated system, and the information you provide will go directly to your care team” (strategic non-humanness). The system guides the patient through a structured symptom checklist, explaining at each step why the question is being asked (preserved agency). When the patient describes a symptom combination that the system’s confidence scoring flags as ambiguous, it says, “Based on what you have described, there are several possible explanations. I do not have enough information to narrow this down reliably, so I am flagging this for priority clinical review rather than suggesting a specific condition” (calibrated uncertainty communication, graceful failure acknowledgment). The completed screening summary is routed through a clinician review queue before being added to the patient’s file (output verification loops).
This scenario demonstrates the relational quality principles carrying the most weight. The non-human nature of the system is positioned as a feature rather than a limitation, and the reliability principles operate in the background to ensure that no unverified clinical suggestion reaches the patient.
Figure 2 maps both scenarios as step-by-step process flows, with each step colour-coded by which RRR dimension it primarily addresses. The diagram makes visible how the same governing proposition produces different principle combinations depending on the design challenge in front of the customer. Scenario 1 leans on reliability and responsiveness, with relational quality operating as a quiet background constraint. Scenario 2 leads with relational quality, then returns to reliability when the system reaches the limit of what it can confidently say. Together, they show the framework operating as a working design tool rather than as a static taxonomy.
The framework is held together by a strategic design proposition, which is to automate to protect relationships. This deliberately inverts how automation tends to be framed. Most organisations approach AI deployment as a cost-reduction mechanism, adding relational elements afterwards to soften the experience. What we are proposing instead is to treat the customer relationship as the governing design objective, asking for each deployment decision whether the AI is being used in ways that protect and strengthen that relationship or quietly erode it.
To move this from description to prescription, we propose a simple diagnostic. For any specific service touchpoint being considered for AI deployment, two questions should be asked. Does automation in this touchpoint preserve or enhance the customer’s sense that their problem is genuinely moving toward resolution? And does it preserve or enhance the customer’s sense of being treated as a person with intelligence and autonomy? If the answer to both is yes, the touchpoint is a candidate for automation. If the answer to either is no, the touchpoint should either retain human involvement or the AI interaction should be redesigned until both conditions are met. This is not a comprehensive implementation methodology, but it provides a minimum viability test that is concrete enough to apply to individual deployment decisions.
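A minimal sketch of how this two-question diagnostic might be encoded for a touchpoint review is given below. The class name `TouchpointAssessment`, its field names, and the example touchpoints are hypothetical and introduced only for illustration; the answers to the two questions are assumed to come from human judgment and are not computed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TouchpointAssessment:
    """Reviewer answers to the two diagnostic questions (hypothetical structure)."""
    name: str
    # Q1: does automation preserve or enhance the sense of genuine progress?
    preserves_progress: bool
    # Q2: does it preserve or enhance the sense of being treated as a person
    # with intelligence and autonomy?
    preserves_dignity: bool


def automation_decision(assessment: TouchpointAssessment) -> str:
    """Apply the minimum viability test: both answers must be yes."""
    if assessment.preserves_progress and assessment.preserves_dignity:
        return "candidate for automation"
    return "retain human involvement or redesign"


# Illustrative touchpoints; the yes/no answers are invented for the example.
examples = [
    TouchpointAssessment("order status lookup", True, True),
    TouchpointAssessment("complaint about a refused claim", True, False),
]
for item in examples:
    print(f"{item.name}: {automation_decision(item)}")
```

The sketch deliberately returns only two outcomes, mirroring the diagnostic as stated; choosing between retaining a human and redesigning the AI interaction remains a managerial decision.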
Under this logic, AI takes on tasks where it can be genuinely reliable, such as retrieving verified information, processing routine transactions, and providing structured first-pass triage. It is also designed from the outset to recognise the limits of that reliability and hand off to human judgment when those limits are reached.
Larivière et al. [
27] offer useful support for this logic. Their analysis of AI-mediated customer experience shows that outcome quality depends heavily on whether the type of AI capability deployed matches what the customer needs in that interaction. When the match is poor, experience quality drops, sometimes quite sharply. The practical implication is to be honest about what each AI modality can and cannot do, and to resist deploying it in situations where its limitations are likely to become the dominant feature of the customer’s experience.
The framework also speaks to the practical challenge of hybrid orchestration, specifically how AI and human agents work together within a single service system, which turns out to be considerably more complicated than technology-first framing tends to acknowledge. Getting it right requires not only technological integration (shared customer data, coherent handoff protocols, unified case management) but deliberate organisational choices about where human presence genuinely adds something and where it does not. Wirtz et al. [
2] made the basic point that service robots and human employees will coexist rather than one replacing the other. The harder design question is what good coexistence actually looks like from the customer’s side. The escalation, context-preservation, and loop-detection principles offer architectural guidance here. The relational quality principles, particularly strategic non-humanness, address the more counterintuitive question of when not to insert a human agent.
The dependency between dimensions is more than additive. When reliability is weak, relational quality signals change meaning. A warm chatbot greeting in a system that regularly fabricates answers is not read as friendly. It is read as evasive. A personalised follow-up from a system that cannot preserve context across channels is not read as caring. It is read as theatre. This interpretive dynamic, where an unreliable foundation recolours the meaning of everything built on top of it, is what makes the sequencing of design investment matter. It also explains why the prevent-and-recover structure within each dimension matters practically.
Recovery design in the relational quality dimension only works when reliability and responsiveness have not already eroded the customer’s willingness to take the system’s gestures at face value. The implication for designers is that investing in relational quality features (empathetic tone, personalisation, continuity of context) without first securing reliable outputs and genuine responsiveness is likely to produce diminishing or even negative returns, because those features will be filtered through a deficit of credibility rather than building on a foundation of trust.
The fifteen principles are designed to work together, and the interactions between dimensions matter. Reliability without responsiveness gives you a system that answers accurately but cannot resolve anything. Responsiveness without relational quality gives you a system that resolves things efficiently but leaves customers feeling processed rather than helped. Relational quality without reliability gives you a system that is warm and untrustworthy, which may be the most damaging combination of all. The framework also offers a rough diagnosis when things go wrong. The question is whether the failure is primarily a trust problem, a progress problem, or a dignity problem. Each answer points to a different set of design responses, which is more useful than a generic mandate to improve the customer experience.
6. Discussion and Implications
The RRR Design Framework is designed for hybrid human-AI frontline service systems where customers are actively seeking resolution, guidance, or support. This includes chatbots, voice assistants, triage systems, and service robots operating in contexts such as telecommunications, financial services, healthcare screening, and retail support. The framework translates less directly to fully automated back-end processing where customers are not present as interactants, or to purely self-service systems where no human fallback exists by design. An important boundary condition arises in high-vulnerability, high-distress contexts such as crisis counselling, bereavement support, or acute mental health intervention. In these settings, customers may enter the encounter already depleted or highly vigilant, and relational safety may need to be established before the customer can engage productively with reliable information or responsive progress. The dependency between dimensions still holds in these settings (a caring system that delivers wrong information will still lose trust), but the sequence in which the design attention is allocated may need to shift, with relational attunement coming first so that customers can stay engaged long enough for reliability and responsiveness to take effect. Within its scope, the framework’s three dimensions are not equally weighted in every situation. Reliability principles carry the most weight in high-stakes domains. Responsiveness principles matter most in complex multi-touchpoint journeys. Relational quality principles are most consequential when the encounter is emotionally charged or involves vulnerability.
6.2. Managerial Implications
For service managers, the most immediate implication is a shift in posture. The dominant mode of AI deployment right now is essentially reactive: introduce the technology to reduce cost and accelerate response times, then troubleshoot the customer experience problems that emerge. The framework suggests an alternative, which is designing proactively against known failure modes before deployment, rather than discovering them through customer complaints after the fact.
The ‘automate to protect relationships’ proposition offers a practical decision heuristic. For each AI touchpoint, managers can apply two diagnostic questions. First, does automation in this touchpoint preserve or enhance the customer’s sense of progress toward resolution? Second, does it preserve or enhance the customer’s sense of dignity and honest engagement? If the answer to both is yes, automate. If the answer to either is no, retain human involvement or redesign the AI interaction.
There is also a measurement implication that organisations tend to resist. Deflection rates and average handle time need to give way, at least partly, to resolution rates, customer re-contact frequency within 48 h, and some measure of trust calibration, which is harder to operationalise but more honest about what is at stake.
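As one possible operationalisation of re-contact frequency, the sketch below computes the share of contacts that are followed by another contact from the same customer within 48 hours. The input format (pairs of customer identifier and timestamp) and the function name `recontact_rate` are assumptions made for illustration. The sketch also does not check whether the follow-up concerns the same issue, which a production metric would probably need to do.

```python
from datetime import datetime, timedelta


def recontact_rate(contacts, window=timedelta(hours=48)):
    """Share of contacts followed by a same-customer contact within `window`.

    `contacts` is an iterable of (customer_id, timestamp) pairs (assumed format).
    """
    by_customer = {}
    for customer_id, timestamp in contacts:
        by_customer.setdefault(customer_id, []).append(timestamp)

    total = 0
    followed = 0
    for timestamps in by_customer.values():
        timestamps.sort()
        for i, current in enumerate(timestamps):
            total += 1
            # Only the next contact matters: if it falls inside the window,
            # the current contact did not settle the customer's need.
            if i + 1 < len(timestamps) and timestamps[i + 1] - current <= window:
                followed += 1
    return followed / total if total else 0.0


# Invented example log: customer "a" returns after 24 hours, "b" after 4 days.
log = [
    ("a", datetime(2024, 3, 1, 9, 0)),
    ("a", datetime(2024, 3, 2, 9, 0)),
    ("b", datetime(2024, 3, 1, 9, 0)),
    ("b", datetime(2024, 3, 5, 9, 0)),
]
print(recontact_rate(log))
```

In this invented log, one of the four contacts is followed by a re-contact inside the window, so the function returns 0.25. A high deflection rate paired with a high value on this measure would suggest that contacts are being closed without being resolved.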
One further measurement challenge deserves mention. When relational quality deteriorates, the most common customer response is not complaint but quiet disappearance. Customers who feel dismissed, processed, or stripped of agency tend not to escalate. They simply stop returning, reduce their engagement, or move to a competitor without explanation. This silent churn is the most costly form of relational failure precisely because it is invisible to standard satisfaction measurement.
Dissatisfied customers who feel their dignity was not respected are the least likely to respond to feedback surveys, which means that the organisations with the worst relational problems are also the ones least likely to detect them through conventional channels. Designing dignity-preserving defaults is therefore not only a matter of ethical service design. It is a retention mechanism that addresses a class of attrition that deflection rates, handle times, and even NPS scores systematically undercount.
For AI system designers, the framework translates customer experience requirements into technical specifications. Calibrated uncertainty communication requires confidence scoring in model outputs and threshold-based routing of responses that fall below acceptable confidence levels. Context preservation requires shared state management across service channels with consistent data schemas. Low-friction escalation requires integration between AI orchestration layers and human agent platforms, including automated case summarisation, so agents are not starting blind. Retrieval-generation transparency requires architectures that track the provenance of each element of a response, which is increasingly feasible with retrieval-augmented generation but is rarely prioritised when the design brief is focused on throughput.
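The threshold-based routing described above can be sketched in a few lines. This is an illustration under stated assumptions and does not reproduce any specific vendor architecture: the `DraftResponse` structure, the threshold values (0.85 and 0.60), and the rule that a response with no recorded sources is always escalated are hypothetical choices, and the confidence score is assumed to be already calibrated.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DraftResponse:
    """A candidate AI response before it reaches the customer (hypothetical)."""
    text: str
    confidence: float  # assumed to be a calibrated score in [0, 1]
    # Provenance: identifiers of the retrieved records the text relies on.
    sources: List[str] = field(default_factory=list)


def route(draft: DraftResponse,
          answer_threshold: float = 0.85,
          hedge_threshold: float = 0.60) -> str:
    """Decide how a draft is delivered; thresholds are illustrative only."""
    # Assumption: content with no traceable source is never sent unreviewed.
    if not draft.sources:
        return "escalate"
    if draft.confidence >= answer_threshold:
        return "answer"
    if draft.confidence >= hedge_threshold:
        return "answer_with_uncertainty"
    return "escalate"


def handoff_summary(draft: DraftResponse, customer_query: str) -> dict:
    """Build a minimal case summary so the human agent is not starting blind."""
    return {
        "customer_query": customer_query,
        "draft_text": draft.text,
        "confidence": draft.confidence,
        "sources": list(draft.sources),
    }


draft = DraftResponse("The charge relates to a roaming add-on.", 0.72, ["txn-001"])
print(route(draft))
```

Separating `route` from `handoff_summary` reflects the distinction the text draws between deciding when to escalate and ensuring that context survives the escalation.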
6.3. Propositions
Four propositions follow from the framework that we think are amenable to empirical testing, though anyone who has tried to test propositions derived from conceptual work will know they often look crisper on paper than they turn out to be in the field. We present each with suggested measurement approaches, moderating conditions, and a falsifiable null case.
Proposition 1.
Calibrated uncertainty communication preserves customer trust more effectively than confident but occasionally wrong responses, particularly in high-ambiguity service tasks where customers cannot easily verify the answer themselves.
Trust can be measured using established scales such as the trust in automation scale developed by Lee and See [
9] or adapted versions of interpersonal trust measures used in service research. Calibrated uncertainty communication can be operationalised as a design treatment in which AI responses include explicit confidence indicators compared with a control condition in which responses are delivered with uniform confidence. The moderating condition is task ambiguity. The effect should be strongest for genuinely uncertain queries (insurance coverage, medical triage) and weaker for routine factual queries (store hours, order tracking), where the answer is easily checked. The null case would be that calibrated uncertainty communication has no effect on trust, or reduces trust, because customers interpret expressed uncertainty as a signal of incompetence. An experimental design using scenario-based vignettes with randomised uncertainty framing, measured across high- and low-ambiguity service tasks, would provide an appropriate initial test.
Proposition 2.
Visible progress signalling increases perceived responsiveness more than raw response speed, especially when the issue is complex and the customer is uncertain about what will happen next.
Perceived responsiveness can be measured using items adapted from SERVQUAL’s responsiveness dimension, supplemented with items capturing felt progress (for example, “I felt my issue was genuinely moving toward resolution”). Visible progress signalling can be operationalised as AI responses that include explicit status updates, next-step descriptions, and estimated timeframes, compared with a control condition providing equivalent speed but only generic acknowledgments. The moderating condition is issue complexity. Progress signalling should matter most for multi-step problems and less for simple single-step queries. The null case would be that progress signalling has no effect on perceived responsiveness, or that customers find it patronising. A between-subjects design comparing signalling versus no-signalling across simple and complex scenarios would be appropriate.
Proposition 3.
Transparency about AI identity improves perceived relational quality when customers are sensitive to relational overclaim, but may have smaller effects in routine low-stakes interactions where customers are primarily task-focused.
Perceived relational quality can be measured through items capturing perceived honesty, respect for intelligence, and a sense of dignity. The moderating condition is the interaction of stakes and emotional charge. The transparency effect should be strongest in contexts where customers have heightened sensitivity to authenticity (complaint handling, financial difficulty, health concerns) and weaker in routine transactions. The null case would be that AI identity disclosure has no effect on relational quality, or reduces it because customers associate AI with inferior service. A 2 × 2 design crossing AI disclosure with interaction context would provide a clean test.
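For readers planning such a study, balanced random assignment to the four cells of a 2 × 2 design can be sketched as follows. The factor names and level labels are hypothetical placeholders, and the fixed seed is used only to make the illustration reproducible.

```python
import itertools
import random


def assign_conditions(participant_ids, factors, seed=0):
    """Randomly assign participants to fully crossed cells in balanced fashion.

    `factors` maps each factor name to its list of levels (assumed labels).
    """
    names = list(factors.keys())
    cells = list(itertools.product(*factors.values()))
    ids = list(participant_ids)
    # Shuffle participants, then deal them out across cells in turn so that
    # cell sizes differ by at most one.
    random.Random(seed).shuffle(ids)
    return {
        pid: dict(zip(names, cells[i % len(cells)]))
        for i, pid in enumerate(ids)
    }


design = {
    "ai_disclosure": ["disclosed", "not_disclosed"],
    "interaction_context": ["sensitive", "routine"],
}
assignment = assign_conditions(range(8), design)
for pid in sorted(assignment):
    print(pid, assignment[pid])
```

With eight participants, each of the four cells receives exactly two; which participant lands in which cell depends on the seed.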
Proposition 4.
Strategic non-humanness, framing the AI nature of the agent explicitly, enhances service experience in encounters involving embarrassment, stigma, or social sensitivity.
Service experience can be measured through scales capturing comfort, willingness to disclose information, and perceived freedom from social judgment. Strategic non-humanness can be operationalised as a design treatment in which the system explicitly frames its non-human nature as a feature, compared with a neutral AI disclosure condition and a human agent control condition. The moderating condition is encounter sensitivity. The benefit should be strongest for stigmatised topics and weaker or absent in non-sensitive routine interactions. The null case would be that strategic non-humanness framing has no effect, or backfires because customers perceive the framing as manipulative. This proposition builds directly on Holthöwer and van Doorn’s [
7] experimental paradigm.
These propositions can in principle be tested across service contexts, customer segments, and AI modalities. Effect sizes are likely to vary, and negative or null findings would be just as theoretically informative as confirmatory ones.