1. Introduction
User-centered design (UCD) is an iterative design approach that places end-users at the core of the development process [
1]. Designers traditionally gather user needs, create prototypes, collect feedback, and refine solutions in iterations. While the definition of UCD has shifted since its conception, its philosophy still focuses on incorporating the perspective of the user throughout the design process to understand the cognitive factors present in a given interaction [
2]. This design process is often carried out in an iterative fashion, where designers consult with the users between each iteration in the form of a user study until the desired usability objectives are met [
3]. During the user study, multiple design decisions are made by interdisciplinary teams to create the final design based on the needs of the user.
Building on the iterative nature of UCD, agile methodologies naturally emerged as a way to streamline rapid prototyping and foster continuous user engagement [
4]. Tools such as Figma—a cloud-based design tool that facilitates real-time collaboration and rapid prototyping—and Sketch—a vector-based design tool widely used for UI/UX design and prototyping—have democratized this process by offering intuitive platforms to craft high-fidelity prototypes that mimic interactive behaviors and visualize potential designs. These advancements have simplified the design process, making it more accessible to a broader audience and facilitating a more inclusive approach to product development [
5].
Generative AI is increasingly integrated into this process, although its adoption varies across contexts. UX professionals use AI primarily for individual tasks like content drafting, with limited team-wide integration or formal policies [
6]. Designers’ perceptions of AI range from seeing it as a threat to viewing it as a creative partner, with metacognitive skills playing a key role in successful adoption [
7]. Tools like PromptInfuser demonstrate how LLMs can accelerate prototyping by binding UI elements to AI-generated content [
8]. Meanwhile, multimodal AI systems such as ProtoDreamer bridge physical and digital prototyping, enhancing creativity and iteration speed [
9]. In education, text-to-image models support early-stage ideation, encouraging divergent thinking but requiring further refinement by designers [
10]. Looking ahead, conceptual frameworks envision AI as an interactive collaborator that dynamically adapts to designer inputs, negotiates ideas, and sustains creative motivation [
11].
However, the integration of generative AI through live-prototyping represents a further leap towards democratizing design. This approach not only accelerates feedback loops by enabling real-time modifications during user studies but also embodies a more inclusive design philosophy [
12]. By empowering designers to iterate on components instantly based on user input, generative AI in live-prototyping bridges the gap between expert knowledge and user experience, lowering the barrier to entry and allowing diverse perspectives to contribute to the design process. Generative AI’s ability to integrate seamlessly into both the testing process and the application itself enables functionalities such as dynamic interface adaptation, real-time content generation, and instantaneous feedback analysis [
13,
14]. These capabilities closely mirror the traditional Wizard-of-Oz (WOZ) technique, where human operators emulate system responses to test partially implemented components [
15]. However, while WOZ simulations are subject to human limitations like response variability, fatigue, and scalability challenges, AI-driven prototyping offers consistent, scalable, and immediate simulations. This technological shift not only addresses the inherent constraints of WOZ but also amplifies the iterative nature of UCD.
Our paper aims to explore the potential of generative AI in assisting live-prototyping within user studies. We aim to answer the following research question: “How can generative artificial intelligence (AI) be utilized to enable live-prototyping within user studies to facilitate immediate user feedback integration and validation?” To answer this question, we present a conceptual framework for live-prototyping in user-centered design. Our conceptual framework democratizes the use of generative AI through component simulation to facilitate user feedback validation in live-prototyping. To illustrate this, we conducted a case study where designers were tasked to perform a live-prototyping study using a prototype that leverages multiple generative AI tools in iterating a language practice tool that adheres to Intelligent Computer-Assisted Language Learning (ICALL) principles [
16].
This prototype was initially developed in a pre-study aimed at gathering detailed insights, comments, and feedback from users. In our current investigation, we simulated a live-prototyping scenario in which these collected user inputs were analyzed and integrated by generative AI. This simulation allowed us to observe how prototypers might leverage AI to iteratively refine their designs in real time. Furthermore, semi-structured interviews were conducted with participants to capture their perspectives on the conceptual framework and its potential application across various design contexts.
2. Review of Related Literature
2.1. Application of Generative AI in User-Centered Design
The evolution of user-centered design (UCD) is rooted in its fundamental principle of actively engaging users throughout the design process [
1]. As a multidisciplinary approach, UCD emphasizes a deep understanding of user needs through iterative research and evaluation, ensuring that final design solutions are closely aligned with user expectations and requirements [
2]. This methodology relies on multiple cycles of user testing and feedback to progressively refine designs. While this iterative nature is central to UCD’s effectiveness in crafting highly personalized user experiences, it also introduces significant trade-offs such as time constraints and financial costs [
17]. These inherent challenges have prompted ongoing exploration into more efficient alternatives that maintain UCD’s core user-centric philosophy while enhancing its practicality.
To address these challenges, recent research has increasingly explored the integration of generative AI within UCD. Generative AI has shown great promise in the prototyping of both low-fidelity and high-fidelity design concepts. Petridis et al. introduced PromptInfuser, a plugin for Figma that lets designers link interface elements to LLM prompts, creating semi-functional prototypes where, for example, a user’s text input in a mock interface is sent to an LLM and the output is rendered on the UI. In a study with 14 designers, PromptInfuser allowed simultaneous iteration of the interface and the AI prompt, helping designers quickly catch UI–prompt incompatibilities and refine their solutions [
8]. An earlier exploratory study found that integrating LLMs into prototyping significantly speeds up the creation of functional prototypes and allows user testing of those prototypes much earlier in the design cycle. Tight integration of generative AI into prototyping could enable designers to iterate faster and to evaluate AI-driven functionality with users at a very early stage, which was previously difficult. Additionally, text-to-image models (e.g., Stable Diffusion, Midjourney) can also produce UI mockups or style inspirations from written prompts. A study by Lin et al. found that such image generators serve best as a brainstorming aid rather than as tools for final polished designs [
10]. In their experiments, the AI-generated images acted as a “catalyst for brainstorming” more than as ready-to-use design assets [
10], highlighting that generative visuals are most valuable in low-fidelity concept generation, where they can inspire human designers to refine or redesign the outputs to meet usability and branding requirements.
After prototyping, generative AI can also assist in collecting user feedback and implementing changes to the design. One specific use case employs LLMs to analyze qualitative user research data [
18,
19]. Tankelevitch et al. highlight AI’s ability to support user-centered research through metacognitive assistance, identifying help with analyzing, clustering, and summarizing user feedback as a key area for improvement in AI-UX collaborations. By automating the tedious parts of feedback analysis (such as transcribing sessions or grouping similar responses), AI allows researchers to focus on deeper insights, though this may require a deliberate separation of tasks [
19]. While specific academic studies on this are just emerging, the idea aligns with industry observations: UX professionals often use AI for writing and summarization tasks [
6]. Early explorations have looked at “human–AI collaborative analysis” for UX evaluation, seeking interfaces that let human researchers and AI jointly cluster and interpret user feedback. Chung et al.’s study on generative AI in the design thinking process emphasized AI’s role in problem-solving and idea generation [
20]. Similar to the findings of Tankelevitch et al., Chung et al. also note the need for structured frameworks to integrate AI-generated insights while preserving human-centered decision-making [
19,
20].
When implementing feedback, generative AI’s ability to rapidly generate new design alternatives becomes valuable. AI-driven design frameworks have been proposed to incorporate AI-generated UI prototypes in testing phases, enabling A/B comparisons between human-created and AI-generated designs [
21]. This interaction could facilitate a more efficient iteration loop where, instead of starting redesigns from scratch, the human designer curates and improves outputs from the AI, which serves as a “creative assistant” during refinement. However, Vacanti et al. highlight that generative AI models suffer from significant limitations, particularly their opacity and lack of interpretability. Designers may struggle to understand the reasoning behind AI-generated outputs, leading to challenges in refining and justifying design choices [
22]. Additionally, AI-generated content can reflect biases embedded in the training data, potentially introducing unintended ethical and inclusivity issues in the design. The lack of precise control over AI-generated outputs further complicates its integration into UX workflows, requiring iterative adjustments and manual oversight to align with user needs.
Comparative studies further illustrate the transformative impact of AI-enhanced methodologies on the UCD landscape. For instance, research by Ray demonstrates that integrating generative AI in rapid prototyping enables real-time design modifications and more agile incorporation of user feedback, significantly reducing iteration cycles [
13]. Similarly, Bilgram and Laarmann highlight that AI-driven tools democratize the prototyping process, allowing non-technical stakeholders to participate in early-stage design and innovation actively [
23]. These advancements underscore AI’s potential to foster a more dynamic, inclusive, and efficient design workflow. However, as studies suggest, managing algorithmic biases and ensuring strong human oversight remain crucial challenges in AI-enhanced UCD, necessitating further research into best practices for mitigating these risks [
24].
2.2. Existing Frameworks That Utilize Generative AI
Generative AI is increasingly shaping user-centered design (UCD) by enabling dynamic content creation and rapid prototyping. Designers and researchers are actively investigating how generative AI can integrate into the iterative UCD process while focusing on users’ needs. One such approach is the DesignFusion framework proposed by Chen et al., which incorporates multiple generative models into the conceptual design phase. This framework employs large language models (LLMs) and text-to-image models to generate structured design ideas, ensuring a comprehensive exploration of design possibilities [
25]. To enhance the generative process, it decomposes design tasks into subcomponents using multi-step reasoning aligned with design attributes, providing a systematic approach to idea generation. Additionally, an interactive software toolkit allows designers to intervene at each stage, offering control over AI-generated outputs. By incorporating a mixed-initiative workflow, where AI suggests options and designers refine them, the framework fosters an iterative loop that supports real-time modifications. The transparency of this process further mitigates the “black box” nature of generative models [
26], ensuring greater designer oversight and trust in AI-driven solutions.
The AI-UCD framework proposed by Siricharoien offers a comprehensive methodology for embedding AI within established UCD principles. This framework seeks to systematically align AI capabilities with user experience design by emphasizing rigorous user research, ethical AI implementation, and ongoing validation. It is structured around nine key steps: assessing user needs, defining AI integration objectives, ideating AI-enhanced solutions, designing AI-driven interfaces, implementing AI functionalities, iterating based on user feedback, providing user training and support, continuously monitoring AI performance, and adapting AI behavior based on user interactions [
14]. To ensure AI-driven features remain user-centered, the framework also introduces the AI-UCD Validation Model, which consists of five structured validation steps: user testing and feedback collection, analytics-driven evaluation, ethical auditing, iterative refinement, and compliance benchmarking against industry standards. This structured approach ensures that AI components evolve based on empirical data and user insights, allowing for sustained adaptability and alignment with user needs.
While both frameworks integrate generative AI into user-centered design, they differ in scope and emphasis. DesignFusion primarily focuses on the conceptual design phase, leveraging AI to enhance ideation and provide structured design outputs through multi-step reasoning [
25]. Its approach prioritizes transparency and designer intervention to enable controlled AI-assisted creativity. In contrast, the AI-UCD framework offers a broader lifecycle-oriented methodology that extends beyond design ideation to encompass implementation, validation, and continuous adaptation [
14]. It emphasizes structured validation, ethical considerations, and long-term usability, ensuring AI integration remains aligned with user needs over time. Despite their differences, both frameworks underscore the potential of generative AI in supporting designers through iterative feedback loops and structured methodologies, demonstrating AI’s growing role in enhancing UCD practices.
2.3. Application in Wizard-of-Oz
The Wizard-of-Oz (WOZ) technique has long served as a cornerstone in early usability engineering, wherein human “wizards” simulate system functionalities that have yet to be implemented, thereby providing a flexible and dynamic means of prototyping [
15]. This method’s historical significance lies in its capacity to capture a comprehensive spectrum of user–system interactions, offering researchers invaluable insights into how users engage with emerging technologies even before a system is fully realized. This approach offers distinct advantages over conventional prototyping methods. As Bernsen et al. explain, WOZ avoids the oversimplification of complex task domains, allowing for an authentic simulation of system behavior that can capture subtle nuances in user interactions [
15]. Such detailed data collection is crucial for understanding usability in its full complexity, making WOZ an indispensable tool for iterative design and evaluation. However, Simpson et al. highlight several challenges inherent in employing human operators, including the complexities of managing intricate interfaces, the distractions that arise during multitasking, and the reliance on manual inputs such as keyboard hotkeys [
27]. These issues can introduce errors and inconsistencies, potentially undermining the fidelity and scalability of the simulated interactions.
To address these challenges, recent developments have sought to augment the conventional WOZ framework with AI-assisted methodologies. Shin et al. introduce the Apprentice of Oz system, a human-in-the-loop approach designed to alleviate the cognitive burden on human wizards by automating specific subtasks [
28]. By decomposing the wizard’s role into smaller, more manageable components, this integrated system enhances both the consistency and reliability of the simulation, thereby addressing many of the limitations associated with manual control. The versatility of the WOZ method is further demonstrated through its application in domain-specific contexts. Li et al. illustrate this through a dataset that leverages the WOZ technique to simulate realistic human–robot dialogues within industrial robotics and manufacturing settings [
29]. The creation of such high-quality, annotated datasets is pivotal for training task-oriented dialogue systems, underscoring the method’s capacity to yield insights that are both domain-relevant and practically applicable.
Further expanding the scope of WOZ research, studies have begun to examine the dynamics of human–AI teamwork in collaborative settings. Zheng et al. explore the integration of AI as an equal participant in group decision-making processes by granting it equal voting rights alongside human counterparts [
30]. Their findings reveal that, although AI contributes with commendable consistency and objectivity, it still encounters difficulties adapting to dynamic discussions and contributing progressively. Innovations that enhance the believability and scalability of WOZ-inspired systems have also emerged. Schecter et al. propose the Vero method, which fuses traditional WOZ techniques with modern technological solutions such as video conferencing and animation [
31]. This approach facilitates the simulation of autonomous AI behavior in a controlled, yet realistic, environment and thereby renders human–AI teamwork research more accessible and scalable. The Vero method effectively addresses several scalability and authenticity issues that have long challenged manual WOZ simulations.
2.4. Implementation of Generative AI
An emerging technology that could prove useful in WOZ simulation is generative AI. This technology has been used to great effect in recent years in various applications that attempt to mimic human behavior [
32,
33,
34] and automation of tasks [
35,
36,
37]. At a glance, it might seem that this trend skews towards the idea of replacing humans with AI. However, there has been a significant push towards the idea of labor augmentation [
35].
It should be noted that automation and labor augmentation are not mutually exclusive; automating certain tasks can lead to enhanced labor in other areas. Santiago et al. [
38] proposed the idea of utilizing multiple generative AI instances for collaborative storytelling and task automation in Dungeons & Dragons to reduce the cognitive load on dungeon masters. The concept of utilizing AI as a companion to aid in tasks extends beyond these applications as well. Sun et al. [
39] conducted a study to design a machine learning (ML) prototyping toolkit, along with a practical method for incorporating it into the design process. They created ML-Rapid to convert diverse, attractive ML applications into user-friendly modules that enhance designers’ understanding of ML, fostering innovation in ML-integrated product design.
The democratization of AI in the prototyping space has been steadily gaining popularity. Pioneering tools such as PromptMaker utilize natural language prompts with LLMs to democratize ML prototyping, making it accessible to a wider audience [
12]. This approach significantly broadens the spectrum of professionals who can engage in the prototyping process, fostering a more inclusive and collaborative environment. However, there are still challenges in the AI prototyping space, particularly the unpredictability of AI behavior and the complexity of outputs [
40]. These complexities call for innovative methodologies and necessitate a paradigm shift towards more adaptive and intuitive AI prototyping strategies. PromptInfuser exemplifies the potential of LLM integration into UI mock-ups for swift prototyping of functional applications [
8]. This innovation allows for rapid iteration on prototypes without requiring extensive technical knowledge, further democratizing the prototyping landscape by embedding AI capabilities directly into the design process.
By synthesizing these insights, we aim to democratize AI in prototyping by facilitating live, real-time modifications during user studies. One approach worth exploring is the use of multiple LLMs and other AI instances to perform various tasks such as component simulation and process automation. This approach could make prototyping more efficient and effective, with an emphasis on brainstorming and ideation, while embodying the inclusivity, adaptability, and rapid-prototyping capabilities presented in the aforementioned works and striving for a prototyping environment that is more accessible and responsive to user feedback.
2.5. Limitations of Generative AI
As generative AI systems become increasingly embedded in interactive technologies, they offer new design possibilities but introduce considerable complexity. Their integration into user-centered design (UCD) processes reveals a range of limitations that require careful management. Generative models often produce inaccurate or biased outputs, stemming from the nature of their training data and from architectural constraints [
41,
42]. These limitations could potentially have cascading effects on usability and system reliability. Users frequently encounter unpredictable behavior, limited transparency, and constrained control, which can disrupt familiar interaction patterns and design assumptions [
24,
43]. Moreover, integrating these models into the UCD workflow complicates established practices such as prototyping, testing, and iterative refinement [
24,
44]. This systemic shift affects all phases of the design cycle, prompting concerns about design fixation, ethical risk, and the reliability of user feedback [
43,
45].
The technical limitations of generative AI are foundational to these broader challenges. Models frequently generate outputs that appear plausible but are factually incorrect or incoherent, a problem known as hallucination [
41,
42]. Biases embedded in training datasets can also lead to the reproduction of harmful stereotypes or the marginalization of underrepresented groups [
42,
46]. Additionally, the stochastic nature of these models limits their controllability, reducing their ability to produce consistent results across varying contexts [
24]. These technical constraints are further compounded by high computational demands during training and inference, which can restrict deployment options, especially in environments with limited resources [
42].
These foundational issues directly affect usability and user experience. Because generative AI systems often yield varied responses to identical inputs, users may struggle to form accurate mental models of the system’s behavior [
24]. This unpredictability undermines key UX principles such as consistency and reliability. Additionally, effective use of generative systems often depends on prompt engineering or other specialized input strategies, which can increase cognitive load and reduce user autonomy [
24]. The inherent opacity of most generative models also limits explainability, which can affect user trust. Users may over-rely on or reject system outputs without a clear rationale, complicating engagement and feedback [
43]. Addressing these concerns requires new UX strategies focusing on transparency, manageable control, and accessible methods for correcting or verifying outputs [
43,
44].
Beyond the user experience, integrating generative AI into development workflows presents significant process disruptions. Traditional prototyping techniques like Wizard-of-Oz simulations are often inadequate for representing generative behavior, making early-stage user testing less effective [
44]. Iterative refinement becomes resource-intensive, as altering model behavior often involves prompt tuning or retraining, which are not only technically demanding but also unpredictable in outcome [
24,
44]. Quality assurance is further complicated by the broad and dynamic output space of generative systems, necessitating continuous monitoring rather than one-time validation [
24]. Infrastructure constraints, including high computation costs and reliance on third-party APIs, raise concerns about scalability and long-term maintainability. These pressures demand more interdisciplinary collaboration and a shift toward ongoing design stewardship throughout the product lifecycle [
24,
44].
These workflow disruptions also manifest in the structure and execution of the UCD process itself. In the ideation phase, AI-generated suggestions can lead to design fixation, narrowing creative exploration and reinforcing existing biases [
45]. During prototyping, the need to simulate generative capabilities with high fidelity often conflicts with the speed and flexibility expected in early design stages [
44]. Testing is complicated by the variability of AI outputs; inconsistent responses to the same stimuli can distort usability assessments and hinder the drawing of generalizable insights [
43]. Once deployed, systems require active monitoring, as user interaction may shift model behavior over time, leading to emergent issues that traditional UCD processes are not equipped to handle [
24]. These conditions suggest the need for new frameworks that embed AI-specific considerations into every stage of design to ensure outcomes that are not only functional but also equitable and sustainable.
3. Conceptual Framework and Methods
While the UCD process includes users for better design and evaluation, it often leads to multiple user study sessions, which require considerable time and effort. This process can be seen as time-consuming and costly, especially when turning user feedback into components. To address this, we propose the idea of live-prototyping, which allows designers to make changes to the prototype immediately to evaluate components before investing time and resources in a full implementation. To facilitate live-prototyping, we propose a conceptual framework, as seen in
Figure 1.
To make live changes to the prototype, designers need to be able to modify it during user testing. To achieve this, our conceptual framework requires that the necessary elements of the prototype be built with generative AI, allowing components to be dynamically created and modified. These AI-driven components can be manipulated using different prompts, changing their behavior and interaction. The prompting process varies based on the generative AI being used: in the case of ChatGPT, such instructions utilize clear text [
47], while Stable Diffusion requires more complex prompting through a dashboard.
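As an illustration, a component whose behavior is defined by such a clear-text prompt could be re-prompted during a session as sketched below. This is a minimal, hypothetical example assuming the OpenAI Python SDK; the class, prompts, and scenario are illustrative and do not reproduce our implementation.

```python
# Minimal sketch (not our actual implementation): a prototype component whose
# behavior is a swappable clear-text prompt, assuming the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


class PromptedComponent:
    """A simulated component defined entirely by its system prompt."""

    def __init__(self, system_prompt: str, model: str = "gpt-4"):
        self.system_prompt = system_prompt
        self.model = model

    def respond(self, user_text: str) -> str:
        completion = client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": self.system_prompt},
                {"role": "user", "content": user_text},
            ],
        )
        return completion.choices[0].message.content


# Initial behavior: a conversation partner for a cafe roleplay.
partner = PromptedComponent("You are a friendly barista. Speak simple German (A2).")
print(partner.respond("Ich möchte einen Kaffee, bitte."))

# Live change requested by the participant: simpler language, shorter answers.
partner.system_prompt = ("You are a friendly barista. Speak very simple German (A1) "
                         "and answer in at most two short sentences.")
print(partner.respond("Was kostet der Kaffee?"))
```

Because the component’s behavior lives entirely in the prompt, the designer can redefine it mid-session without modifying application code.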
Our conceptual framework employs one or more generative AI models to simulate components of the prototype. These components represent features that users interact with during testing. If participants request any changes to the prototype, the designer can immediately validate these changes by restructuring simulated components to better align with the participants’ needs. For this, our conceptual framework proposes that prototypes utilized in user studies should comprise two interfaces: one directed toward the participant and a control interface for the prototype designer, which is not visible to the participant. This control interface allows designers to adjust how AI models generate components through prompts and how components interact with each other.
The concept of simulating components of a prototype is similar to the Wizard-of-Oz (WOZ) simulation technique. However, the reliance on a human “wizard” to simulate digital interactions presents limitations, particularly in tasks where digital prototyping demands more sophisticated solutions. Generative AI emerges as a viable alternative in this context. To facilitate the generation of more complex components, our framework encourages the use of multiple AI models, where the output of one model becomes the input of another. This method of linking multiple AI models is akin to having a collection of Wizards of Oz at the designer’s disposal. This approach could allow for the immediate and seamless integration of user feedback into the prototype, potentially reducing the need for creating high-fidelity components during the prototyping phase.
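The chaining of models can be illustrated with a short sketch in which the output of a language model becomes the input of an image generator. The example below is a hypothetical sketch assuming the OpenAI Python SDK and the Hugging Face diffusers library; the function names, prompts, and scenario are illustrative only.

```python
# Minimal sketch of chaining two generative models: an LLM turns the chosen
# scenario into an image prompt, whose output then becomes the input of a
# Stable Diffusion pipeline. Assumes the OpenAI SDK and Hugging Face diffusers.
from openai import OpenAI
from diffusers import StableDiffusionPipeline

client = OpenAI()
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")


def describe_scene(scenario: str) -> str:
    """First 'wizard': the LLM writes a one-sentence visual description."""
    completion = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Write a one-sentence visual description of the scene, "
                        "suitable as a text-to-image prompt."},
            {"role": "user", "content": scenario},
        ],
    )
    return completion.choices[0].message.content


def render_scene(scenario: str):
    """Second 'wizard': the LLM output is fed to the image generator."""
    return pipe(describe_scene(scenario)).images[0]


render_scene("Ordering food at a busy Berlin restaurant").save("scenario.png")
```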
To demonstrate our conceptual framework, we built a prototype according to the principles outlined in
Figure 1. We then invited HCI practitioners with prototyping skills to participate in a study where they could actively engage in live-prototyping and reflect on the framework’s underlying ideas. In our study, they were tasked with leading a co-design session with an end-user, where they would redesign and adjust our prototype in real time based on the user’s feedback. To establish a controlled environment focusing on the live prototype, one of our researchers played the role of the end-user. This end-user voiced feedback during the co-design session, representing realistic concerns that the HCI practitioners were asked to address through real-time changes. To evaluate our prototype, we asked the HCI practitioners open-ended questions, encouraging them to express their views freely, discuss potential challenges of live-prototyping, and consider new possibilities. Topics included the role of AI in component simulation and other aspects relevant to live-prototyping. Finally, we performed a thematic analysis of the interview data to identify key issues and opportunities emerging from applying our conceptual framework.
For our conceptual framework prototype’s context, we designed a second-language acquisition (SLA) application, with the end-user representing a language learner. We decided on this context due to the wide variety of possible feedback a user can give for this setting. According to Cook, language learners have distinct personalities and biases that influence their language learning process, uniquely positioning the context as an ideal scenario to test our prototype [
48]. The initial design of the SLA prototype and the end-user’s feedback were informed by a preliminary study with actual second-language learners. To identify realistic end-user feedback, we observed these prototyping sessions and took detailed notes to document the process. After the session, we conducted semi-structured interviews to explore participants’ attitudes, experiences, and reflections on the SLA prototype.
3.1. Preliminary Study and Prototype Development
To establish a foundation for the live-prototyping study, we needed to develop a modifiable prototype design while gaining a clear understanding of users’ needs for live modifications. Thus, a preliminary study was conducted, focusing on the initial design of an AI-powered SLA tool. At this stage, the prototype primarily served as an interface linked to generative AI features, including a conversational agent powered by GPT-4 and a scenario-based image generator utilizing Stable Diffusion; live modification capabilities had not yet been implemented. The prototype was developed using Pygame for the user interface, with a supporting background pipeline that leveraged generative AI to create a personalized language learning experience. Users interacted with the system through two input modalities: text and voice.
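A simplified sketch of this kind of background pipeline is given below. It is illustrative rather than the actual study code: it assumes the OpenAI Python SDK, reduces the Pygame interface and speech input/output to placeholders, and uses hypothetical names and prompts.

```python
# Simplified, illustrative sketch of the background conversation pipeline:
# GPT-4 keeps a roleplay conversation going; the Pygame interface and speech
# input/output are placeholders, not the study implementation.
from openai import OpenAI

client = OpenAI()


class LanguagePartner:
    """Holds the dialogue history for one roleplay scenario."""

    def __init__(self, scenario: str, level: str = "A2"):
        self.messages = [{
            "role": "system",
            "content": (f"You are a conversation partner in this scenario: {scenario}. "
                        f"Reply in German at CEFR level {level}."),
        }]

    def turn(self, user_text: str) -> str:
        self.messages.append({"role": "user", "content": user_text})
        completion = client.chat.completions.create(model="gpt-4",
                                                     messages=self.messages)
        reply = completion.choices[0].message.content
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def transcribe_voice(audio_path: str) -> str:
    # Placeholder for the voice modality: a speech-to-text model would run here.
    raise NotImplementedError


partner = LanguagePartner("Buying a train ticket at the station")
print(partner.turn("Hallo, ich brauche eine Fahrkarte nach München."))
# The Pygame interface would render partner.messages and play the reply as audio.
```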
Five non-native German speakers, aged 22 to 27, were recruited to assess the usability and effectiveness of the prototype in its early form. These individuals came from diverse academic and cultural backgrounds and had been learning German for the past six months, with proficiency levels ranging from basic to intermediate (A1 to B1). Over three weeks, three iterative feedback sessions were conducted, each lasting approximately one hour. The primary objective of these sessions was to gain deeper insights into language learners’ needs and refine the participant-facing interface accordingly (see conceptual framework model,
Figure 1).
Each session combined semi-structured interviews with observational methods, ensuring a comprehensive collection of user feedback on desired features and design requirements. The findings from this preliminary study informed the development of a more contextually grounded and realistic environment for the subsequent live-prototyping study. At this stage, the focus remained solely on refining the participant-facing interface for language learners. In contrast, the control interface for designers—an essential component of the conceptual framework—was developed and integrated later for the live-prototyping phase with HCI practitioners.
During prototype use, learners initiated a conversation by selecting a provided scenario and were presented with a corresponding image generated using a latent diffusion model. The conversation was displayed in a text field, with responses also delivered through audio output. The participant interface of the prototype is illustrated in
Figure 2. Although the user needs identified during this preliminary study were not directly integrated into the prototype at this stage, they were translated into end-user requests. These requests were later voiced by a researcher simulating an SLA learner in the main study involving HCI practitioners, ensuring that the live modification process could be evaluated in a controlled yet dynamic manner.
3.2. Preliminary Study Results
In addition to evaluating interface usability, participants identified several critical needs to enhance the prototype’s effectiveness for language learning. Their feedback primarily focused on interaction structure, personalization, and content relevance. A recurring suggestion was the inclusion of a more structured introduction, with prompts for names, conversation roles, and roleplay objectives to establish more precise expectations. Personalization also emerged as a significant concern, as participants expressed interest in customizing the identity and characteristics of their conversation partners. Comprehension challenges were frequently noted when encountering unfamiliar phrases, prompting requests for in situ translations and the ability to input responses in their native language. Furthermore, participants emphasized the importance of adjusting AI-generated responses to match their level of proficiency to ensure that interactions remained accessible and appropriately challenging.
Another key area of concern was the quality of AI-generated images, which occasionally did not align with the context of the conversation. The participants suggested implementing on-demand image updates and refining prompt formulation to prevent inaccuracies or inappropriate content. Sustaining engaging conversations also proved challenging, as some participants struggled to keep a consistent dialogue going. They recommended that the AI introduce new topics or pose follow-up questions to facilitate a more dynamic exchange. Others noted that the AI’s responses were sometimes overly lengthy, leading to a perceived dominance in the conversation. Participants suggested shorter, more concise responses to encourage more natural and reciprocal interactions. Additional requests included real-time feedback on language accuracy and the ability to reference and discuss AI-generated images within roleplay scenarios.
Rather than implementing immediate modifications to the prototype, the user needs were integrated into “end-user requests” during the co-design sessions. Specifically, these requests were articulated by a researcher acting as a simulated end-user, ensuring that human–computer interaction (HCI) practitioners engaged with realistic user concerns. This approach allowed for real-time adjustments to the prototype in direct response to participant feedback.
Table 1 summarizes the key participant needs alongside the resulting end-user requests, which were incorporated into the main study to simulate authentic user interactions.
3.3. Conceptual Framework Prototype
Building on the insights gained from the preliminary study, we further developed our SLA prototype to support live-prototyping in alignment with our conceptual framework. At this stage, none of the voiced feedback from the preliminary study had been integrated into the prototype, as we intended for participants to modify the prototype live based on feedback provided during the co-design session. It is important to note that the participant groups differed across the two phases: the preliminary study involved SLA learners, whereas the main study engaged HCI practitioners with prototyping expertise. In the main study, one of our researchers additionally assumed the role of an SLA learner, voicing user needs—displayed in the “End-User Requests” column of
Table 1—that the HCI practitioners were expected to address and implement in real time.
To align with our conceptual framework, a separate control interface for the prototype was added, as seen in
Figure 3. A control interface places the designer between the participant and generative AI instances to tailor the behavior of the AI to the participant’s needs. This can be achieved while the participant interacts with their prototype user interface, as seen in
Figure 2. The added control interface enables live changes to the conversational agent: the designer can directly manipulate the behavior of the generative AI instances through clear-text prompting and can simulate components by chaining multiple generative AI instances together. Since the components are created as AI-generated elements, they can be re-generated to perform various tasks, such as providing a translation feature by prompting an instance of GPT-4 with the user’s input. This maintains the illusion that the SLA prototype has fully implemented components, while the generative AI plays a role similar to the Wizard-of-Oz, simulating a language learning application. As in the framework diagram in
Figure 1, our implementation uses two interfaces: the language learning interface, through which the user communicates with the AI, and the control interface on the designer’s side.
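The sketch below illustrates how such a control interface could re-generate a component mid-session, simulating a requested translation feature by chaining a second GPT-4 call onto the dialogue output. It is a hypothetical, minimal example assuming the OpenAI Python SDK; the prompts and function names are illustrative and do not correspond to our actual implementation.

```python
# Illustrative sketch (not our actual implementation) of a designer-side control:
# the component's behavior is a clear-text prompt that can be edited live, and a
# requested translation feature is simulated by chaining a second GPT-4 call
# onto the dialogue output. Assumes the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI()


def ask(system_prompt: str, user_text: str) -> str:
    completion = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "system", "content": system_prompt},
                  {"role": "user", "content": user_text}],
    )
    return completion.choices[0].message.content


# Behavior of the conversational agent, editable from the control interface.
partner_prompt = "You are a waiter in a German restaurant. Speak simple German."


def dialogue_turn(user_text: str, translate: bool) -> str:
    reply = ask(partner_prompt, user_text)
    if translate:
        # Second generative AI instance chained onto the first one's output.
        translation = ask("Translate the following German text into English.", reply)
        reply = f"{reply}\n({translation})"
    return reply


# The designer toggles the simulated translation feature as soon as it is requested.
print(dialogue_turn("Was empfehlen Sie heute?", translate=True))
```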
3.4. Participants
Recognizing prototyping experts as the main target group for our conceptual framework, we set experience in prototyping as the primary participation criterion. With our framework’s focus on component simulation through generative AI, our main study invitations expressed a preference for participants with a basic understanding of AI prompting.
We recruited nine participants, aged 22 to 30, including seven women and two men, all with academic backgrounds and diverse cultural origins. In line with our participation criteria, individuals who self-identified as having experience in prototyping were selected as participants. Furthermore, all participants possessed at least a foundational understanding of AI tools, primarily from experimenting with applications like ChatGPT, Stable Diffusion, and other accessible AI tools.
3.5. Study Procedure
Each of our live-prototyping study sessions was conducted with two researchers and one participant. We began by welcoming our participants and obtaining their informed consent. We then introduced our proposed conceptual framework together with the language practice tool. All sessions followed a consistent structure—starting with an introduction of both the language learning and control interfaces, moving on to a role-play scenario where participants engaged in live-prototyping with one of the researchers acting as a language learner, and concluding with a semi-structured interview.
We first presented the two interfaces that the participants would be using. We introduced the language learning interface by explaining its approach to scenario-based language practice through conversations with an AI partner. We then demonstrated interactions using an example scenario while addressing any questions the participants had. After introducing the language learners’ perspective, we detailed the functions of the elements on the control interface, laying the groundwork for the next segment: a role-play scenario of a live-prototyping study.
During this scenario, one of our researchers took the role of a B1 English language learner who was already familiar with the language practice tool. Our “language learner” then provided feedback on potential modifications that could be implemented, based on actual feedback we received during the preliminary study. Meanwhile, our participants took on the role of the designer, conducting a live-prototyping study. They then incorporated the user feedback into the language practice tool via the control interface. If a participant needed assistance, another researcher was available to support them. Once our language learner had no further feedback, we proceeded to the semi-structured interview.
4. Results
The thematic analysis of this study identified five key themes that illustrate both the complexities and opportunities of integrating generative AI into live-prototyping user studies. These themes include (1) the framework’s adaptability in generating meaningful insights, (2) its effectiveness in facilitating stakeholder communication, (3) the cost-efficiency of using generative AI for component simulations, (4) the planning and execution challenges of live-prototyping, and (5) the necessity of addressing potential biases and unintended AI outputs. Together, these themes provide a comprehensive perspective on the role of AI in enhancing prototyping methodologies.
All nine participants expressed confidence in the framework’s capacity to generate valuable insights, highlighting its ability to streamline iterative feedback. Several participants emphasized that live-prototyping allows for more dynamic and personalized interaction with users, which potentially could reduce the need for multiple iterations of the study. One participant noted, “Live-prototyping allows for more personalized interaction, … potentially reducing the need for multiple iterations”. This perspective highlights the benefits of the framework over traditional prototyping methods, where feedback is usually incorporated over multiple iterations. Another participant echoed this sentiment, stating, “The framework could make my prototyping process more efficient for instant feedback and changes, leading to more valuable insights”. Beyond efficiency, the reflective nature of the framework was perceived as broadening the design space. One participant remarked, “I gained more insights from this rather than a static approach”, suggesting that real-time adaptability fosters deeper exploration of design possibilities. Some participants further speculated that incorporating this framework, alongside direct user observations, could reduce the overall number of iterations required in user studies. These findings highlight the potential of the framework to improve efficiency and insight generation within prototyping processes.
In addition to its adaptability, several participants viewed the conceptual framework as an effective communication tool. One participant stated, “It serves as a communication tool, resolving misunderstandings early on and improving handoffs to developers by leveraging it as an early proof of concept”. This suggests that the framework could potentially mitigate misalignment between stakeholders, particularly in early-stage design discussions. Others emphasized its utility in professional settings, noting that “working with professionals, it could be used for live changes, enhancing collaboration”. Furthermore, participants emphasized the ability of the framework to facilitate direct, iterative user engagement. One participant explained, “It can make the user give an input, sitting side-by-side, then iteratively work on the feedback”. However, concerns were raised regarding potential over-reliance on individual user input; as one participant cautioned, “This might cause the prototyper to focus too much on one user instead of taking in all feedback”. These insights suggest that while the framework holds promise for enhancing early-stage communication and collaboration, careful consideration is needed to ensure the balanced integration of diverse user perspectives.
The study revealed that the participants believed that generative AI could be a cost-effective solution to simulate design components, potentially reducing the need for full implementation in early prototyping stages. One participant observed, “Using (generative) AI for feature simulations could offer a cost-effective solution, giving a greater sense of control over the AI’s behavior”. This perception suggests that AI-powered simulations may provide a viable alternative to resource-intensive development efforts. Additionally, participants appreciated the role of designers as intermediaries between users and AI systems, which allowed them to tailor simulations to specific prototyping goals. One participant remarked, “In terms of the prototyper being a mediator between the user and AI, it is useful”. However, concerns emerged regarding the potential for inconsistencies in AI-generated results across different study participants, which could complicate the comparability of user tests. These findings indicate that while AI-driven simulations could potentially offer notable cost and efficiency benefits, ensuring consistency in AI outputs would remain a key challenge.
Despite its advantages, live-prototyping introduces additional complexities in study planning and execution. One participant acknowledged this challenge, stating, “Preparing for live-prototyping studies might take longer, but the flexibility during testing means less time invested in preparation.” This suggests a trade-off between the initial time investment required for study setup and the adaptability gained during the testing phase. Another participant reinforced this point, explaining, “Since prototyping is more personalized, you would need a lot of time planning and developing.” All participants noted that live-prototyping requires increased spontaneity and creative problem-solving from designers. One participant mentioned, “Conducting studies will become more complex, requiring spontaneity and creative skills from the designer.” This complexity extends to data analysis, with some participants raising concerns about the need for alternative data structuring methods. Additionally, participants speculated that the real-time implementation of certain components might create slowdowns in study sessions, potentially testing users’ patience. To mitigate such disruptions, some participants suggested that a deeper understanding of AI chaining and prompting could improve study fluidity.
Finally, participants raised concerns regarding potential biases in AI-generated outputs and the overall research process. One participant warned, “You may be too influenced by the singular opinion of a user, which could cause tunnel vision”, indicating that an overemphasis on individual feedback may inadvertently narrow the scope of inquiry. Another participant cautioned that “Live-prototyping can lead to bias from the prototyper as to how they want certain features”, highlighting the risk of designer preferences shaping research outcomes. Beyond individual biases, participants also identified potential methodological concerns. One participant observed, “The ease of addressing issues with AI tools might introduce bias into the research process”, suggesting that the convenience of AI-assisted modifications could unintentionally skew study findings. These concerns underline the importance of implementing robust methodological safeguards to mitigate bias and ensure the integrity of AI-assisted live-prototyping studies.
The findings of this study reveal both the strengths and challenges of integrating generative AI into live-prototyping user studies. Participants highlighted the framework’s ability to generate meaningful insights efficiently, facilitate clearer communication among stakeholders, and provide a cost-effective alternative to fully implemented prototypes. However, they also identified important challenges, including the increased complexity of planning live-prototyping studies and the potential for biases in both AI-generated outputs and the research process. While the framework enhances adaptability and real-time engagement, its effectiveness depends on careful methodological considerations to balance user input, maintain consistency in AI behavior, and mitigate unintended influences on design decisions. Addressing these factors will be crucial for refining AI-assisted prototyping approaches and ensuring their reliability in future studies.
5. Discussion
5.1. General Perceptions During Framework Application
Traditional user-centered design (UCD) methodologies often struggle to capture the immediacy and depth of user feedback, necessitating multiple iterations before arriving at a refined solution. By integrating generative AI into live-prototyping, our conceptual framework could enable designers to address explicit user requests in real time and uncover underlying motivations, referred to as meta-feedback. This deeper insight is particularly valuable for understanding user needs beyond surface-level preferences. Several participants observed that live-prototyping tends to foster more personalized interactions and may streamline the design process by reducing the number of required iterations, suggesting that real-time adaptability broadens the design space and elicits more nuanced feedback than traditional static approaches. These findings align with prior research emphasizing the importance of capturing user context and latent needs in AI-supported UCD processes [
14] and highlight the role of interactive systems in aligning with users’ mental models to reveal deeper design requirements [
24].
One of the main advantages of our framework is its potential to reduce the number of iterations in prototyping studies. In a previous study, Subramonyam et al. demonstrated that balancing iterative flexibility with technical precision is crucial in AI-integrated design workflows, while Siricharoien suggests that incorporating live feedback could enhance prototyping efficiency [
14,
49]. Similarly, some participants indicated that the framework could improve efficiency by offering instant feedback and facilitating rapid adjustments that might lead to more valuable insights. However, while this efficiency may accelerate the design process, it could also introduce new demands. The semi-structured nature of live-prototyping requires rigorous planning and a high level of technical expertise in AI prompting and chaining. Other participants indicated that preparing for live-prototyping studies might need more time than traditional approaches, implying that greater investment in initial planning, along with creative problem-solving during execution, could be necessary.
Beyond its impact on iteration cycles, the framework demonstrates utility in early-stage design exploration. The ability to generate and refine ideas in real time fosters an iterative, dynamic brainstorming environment that supports rapid ideation, as several studies have observed [
23]. Participants generally noted that live-prototyping offers a more insightful and exploratory design process than static approaches, a sentiment also echoed by Bilgram and Laarmann [
23]. This indicates that by facilitating immediate feedback loops, the framework can strengthen the dialogue between user insights and design refinement, which could lead to richer conceptualization. Moreover, Ray has highlighted that such real-time methods accelerate early-stage innovation by reducing the gap between conceptual ideas and prototype realization [
13].
Another critical benefit of the framework is its potential role in improving stakeholder communication. Some participants suggested that live-prototyping could foster a more effective exchange between designers, developers, and collaborators by potentially addressing misunderstandings early in the design process. A few participants indicated that live-prototyping might serve as a valuable communication tool that could facilitate early resolution of misunderstandings and promote smoother transitions to development. Furthermore, the visibility of real-time design iterations might allow stakeholders to engage in a more transparent and iterative decision-making process, which could help reduce misalignment and promote a shared understanding of design goals. For instance, Subramonyam et al. have highlighted how collaborative prototyping environments can enhance cross-functional teamwork [
49]. Evidence also suggests that dynamic feedback mechanisms may support more coherent stakeholder alignment [
24]. These insights from prior work suggest that integrating real-time feedback into design processes could yield meaningful improvements in collaborative practices.
In addition to its functional advantages, the framework may offer a cost-benefit by using AI-generated component simulations instead of fully implemented prototypes. Several participants expressed that generative AI holds promise as a cost-effective alternative that enhances designers’ control over the AI’s behavior. This observation indicates that AI-assisted simulations may help reduce the financial and time constraints traditionally associated with prototype development.
5.2. Considerations for Generative AI and Real-Time Prototyping
Besides the advantages outlined previously, participants also identified challenges related to the application of generative AI within the UCD process. Inconsistencies in AI-driven simulations may complicate the comparability of user tests, while the interactive nature of live-prototyping raises concerns about over-reliance on individual user feedback. One participant warned that this could “cause tunnel vision”, where the designer becomes overly influenced by the perspective of a single user. Additionally, the ease of modifying designs using AI tools may inadvertently introduce biases into the research process, as designers may unconsciously be influenced by AI outputs. These concerns reinforce the need for structured methodologies to ensure reliability and mitigate bias in AI-assisted prototyping, echoing previous discussions on iterative human oversight in AI-integrated workflows [
50].
In general, owing to our usage of generative AI for the simulation of components during real-time prototyping, studies performed using our proposed framework ought to take into consideration potential biases of generative AI models and hallucinations in their outputs [
41,
42]. Model biases could influence how participants perceive prototypes and the feedback they provide, while such biases may also sway prototypers’ decisions during the prototyping process [
44,
51]. For prototypers, these limitations require both a proactive attitude in spotting potential biases from generative AI models and a mindful use of simulated components to gain insights. Generative AI outputs must be viewed as demonstrations of ideas rather than as indicative of the final product, as their purpose is to explore the design space in a collaborative environment. Prototypers should be aware of this and take measures to avoid attachment to exploratory features, both on their own part and on the part of study participants.
During real-time prototyping, the usage of our framework requires prototypers to consider the possibility of potential generative AI failures that may impact the prototyping process. Such failures can either be technical, such as a temporary unavailability of generative AI models, or general failures of generative AI models, such as unexpected outputs leading to unintended functionality. Should such failures occur, the advantages of real-time prototyping, such as exploring new components or testing iterations of existing components, may be negatively impacted.
5.3. Comparison with Existing Frameworks
The AI-of-Oz framework aligns conceptually with the DesignFusion framework [25] and Siricharoen’s AI-UCD framework [14] in utilizing generative AI to augment UCD processes. All three frameworks leverage generative AI’s capability to dynamically support the design and evaluation phases and to promote iterative feedback and real-time adaptability. However, the key differences lie in the frameworks’ goals, the extent of live interactivity they support, and their validation processes.
The DesignFusion framework primarily targets the early conceptual design stage, integrating multiple generative AI models to explore design ideas and conceptual alternatives. Its distinctive feature is the structured decomposition of design tasks into sub-components, facilitated by multi-step generative reasoning [25]. While this framework significantly empowers designers by generating diverse conceptual artifacts, it does not explicitly support immediate prototype adjustments during real-time interactions with end-users. DesignFusion thus excels in structured ideation and exploration but lacks the direct user-testing adaptability featured prominently in our framework.
Conversely, the AI-UCD framework emphasizes a comprehensive integration of AI throughout all stages of the UCD process. It systematically covers user research, AI integration objectives, interface design, iterative feedback loops, and rigorous validation procedures with ethical audits and performance benchmarking [14]. Unlike the DesignFusion framework, this framework places significant importance on continuous validation and ethical considerations, underscoring user satisfaction and accessibility. However, the framework’s process does not inherently facilitate real-time dynamic modifications during user interactions. Instead, it relies on a structured iterative cycle where user feedback informs subsequent design refinements rather than immediate adjustments within the same testing session.
The AI-of-Oz framework merges elements of existing frameworks by applying generative AI within live-prototyping to facilitate immediate adjustments during user testing. It extends the Wizard-of-Oz prototyping concept by replacing the human wizard entirely with generative AI models, allowing designers to dynamically alter prototype components instantly in reaction to user feedback. This approach enables in-session iterations through a dual-interface design: one for participants and a hidden interface for designers to allow real-time control over generative prompts and outcomes. The framework emphasizes adaptability during live user interactions more directly than the existing frameworks, making it particularly useful for exploring user responses to immediate interface adjustments or new feature explorations within a single user-testing session.
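As a rough illustration of this dual-interface idea, the sketch below models a shared session in which the hidden designer view rewrites the prompt behind a component while the participant view simply renders whatever the current prompt produces. All names are hypothetical, and `call_llm` again stands in for an arbitrary generative model rather than the implementation used in our study.

```python
# Minimal sketch (hypothetical): shared session state bridging the hidden designer
# interface (writes prompts) and the participant interface (renders components).
from dataclasses import dataclass, field

def call_llm(prompt: str) -> str:
    """Placeholder for the generative model that simulates a component."""
    return f"<simulated component for: {prompt}>"

@dataclass
class LiveSession:
    prompts: dict = field(default_factory=dict)  # component id -> current prompt

    def update_prompt(self, component_id: str, prompt: str) -> None:
        """Designer-facing (hidden) interface: adjust a prompt in reaction to feedback."""
        self.prompts[component_id] = prompt

    def render(self, component_id: str) -> str:
        """Participant-facing interface: render the component from its current prompt."""
        return call_llm(self.prompts.get(component_id, "an empty placeholder"))

# Example: the designer rewords an exercise widget mid-session, and the participant's
# next interaction immediately reflects the change.
session = LiveSession()
session.update_prompt("exercise_panel", "Suggest three beginner-level vocabulary exercises.")
print(session.render("exercise_panel"))
```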
5.4. Limitations of the Study
Despite the valuable insights gained from our study, several limitations need to be acknowledged, as they may affect the generalizability and robustness of our findings. Firstly, this study was conceived as an exploratory investigation into the conceptual potential of our framework, not as a controlled user evaluation or performance benchmark. We aimed to collect early, in-depth reflections from prototyping experts on perceived advantages and challenges rather than to measure the framework’s effectiveness in a statistically meaningful way. Consequently, our study relied exclusively on qualitative research methods and did not employ quantitative approaches. Although qualitative methods provide rich insights into participants’ perceptions, the absence of quantitative data limits our ability to assess the framework’s impact with objective metrics. For instance, usability measurements or cognitive load assessments could have offered a more structured understanding of its practical value.
Secondly, our sample was relatively small, comprising only nine participants, which restricts the extent to which broader conclusions can be drawn about the framework’s applicability. In addition, our study predominantly recruited participants from an academic background, resulting in limited representation of industry contexts. This imbalance in participant demographics may further constrain the generalizability of our findings.
Thirdly, our study aimed to explore a conceptual framework rather than to test a fully developed system. As such, participants interacted with a predefined prototype that they had not personally designed. This unfamiliarity may have influenced their engagement and feedback, potentially introducing bias in their evaluation. Additionally, due to practical constraints, the study was conducted in a simulated environment rather than a real-world prototyping context, and the language learning scenario may not have been equally relevant to all participants.
Lastly, while the prototype used for our study incorporated generative AI, it is important to note that our framework does not inherently require generative AI to be used in the application that is being prototyped. This mismatch between the framework’s conceptual flexibility and the specific implementation presented may have shaped participants’ perceptions and evaluations in unintended ways.
6. Conclusions and Future Research
This study introduced a conceptual framework that integrates live-prototyping with generative artificial intelligence (AI) to explore new possibilities for enhancing the user-centered design (UCD) process. By enabling designers to modify prototypes in real time through AI prompting and chaining, the framework aims to facilitate more immediate user feedback without the need for high-fidelity component development. Traditional UCD workflows often require multiple iterative cycles, which can be time-consuming and resource-intensive. Live-prototyping may offer a way to address these challenges by enabling instant adjustments that facilitate early evaluation of design elements and foster a more dynamic, iterative design process. While conceptually inspired by the Wizard-of-Oz (WoZ) technique, our framework distinguishes itself by leveraging AI-generated components to support more flexible, adaptive prototyping. To investigate the viability of this concept, we conducted an exploratory qualitative case study with nine experienced prototypers who engaged in a simulated live-prototyping session based on real feedback gathered in a preliminary user study. Semi-structured interviews were then used to explore their experiences, attitudes, and interpretations of the framework. A thematic analysis of these interviews revealed both promising opportunities and important challenges.
Findings suggest that the framework may support early-stage ideation and improve communication between designers and stakeholders. Participants appreciated the potential for quicker feedback loops, increased transparency, and fast component simulation using generative AI. However, they also highlighted several challenges, such as the steep learning curve associated with prompt design and AI chaining, as well as concerns around the reliability, interpretability, and consistency of AI-generated outputs. It is important to emphasize that our study does not provide empirical evidence that AI-assisted live-prototyping is faster, more effective, or more insightful than traditional prototyping methods. These claims were discussed as perceived potential based on qualitative feedback, not as statistically validated outcomes. Due to the small sample size and the qualitative nature of the study, no generalizations should be drawn regarding performance improvements, iteration speed, or usability benefits. Future work should include quantitative studies with larger and more diverse participant samples to evaluate the practical impact of AI-assisted live-prototyping. This could include comparative study designs that contrast workflows in terms of the number of iterations needed, the time required for revisions, and usability or cognitive load measures. Quantitative data collection would also help gauge user perceptions of effectiveness, usability, and collaboration. These efforts are essential for substantiating the framework’s benefits with robust empirical evidence.
Despite these challenges, participants expressed strong interest in a dedicated AI live-prototyping toolbox that would provide intuitive access to a wide range of AI tools for prompting and chaining. Such a system could significantly lower the barrier to entry for novice users, allowing them to leverage existing and emerging AI technologies to enhance their prototyping processes. Furthermore, as generative AI continues to evolve, future advancements may enable more sophisticated component simulations, expanding the possibilities of AI-assisted prototyping. Investigating the development of standardized toolkits and frameworks tailored for live-prototyping could further refine this approach and improve its accessibility for designers with varying levels of technical expertise. Future investigations could focus on refining AI chaining strategies, improving the interpretability of AI-generated outputs, and mitigating potential biases introduced by both designers and AI models. Additionally, expanding the application domains of live-prototyping beyond traditional UCD contexts could unlock new opportunities for real-time, AI-enhanced user studies. By building upon the core principles introduced in this research, future studies can contribute to the broader goal of making prototyping more adaptive, efficient, and responsive to user needs.
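To make the notion of prompt chaining in such a toolbox more concrete, the sketch below shows one way chained prompt steps could be composed, with each step consuming the previous step’s output. This is an assumption-laden illustration: `chain_prompts` is a hypothetical helper, and `call_llm` again stands in for any generative model rather than a specific toolbox API.

```python
# Hypothetical sketch of prompt chaining: each step is a template that consumes the
# previous step's output, so a component can be generated and refined in stages.
from typing import Callable, List

def chain_prompts(steps: List[str], call_llm: Callable[[str], str], seed: str) -> str:
    """Run prompt templates in order; '{previous}' is filled with the last output."""
    output = seed
    for template in steps:
        output = call_llm(template.format(previous=output))
    return output

# Example chain: draft the content of a feedback dialog, then constrain its tone.
steps = [
    "Draft the text of a feedback dialog about: {previous}",
    "Rewrite the following dialog text in an encouraging tone: {previous}",
]
# result = chain_prompts(steps, call_llm=my_model, seed="a mispronounced vocabulary word")
```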
We encourage researchers, designers, and industry professionals to further explore the integration of live-prototyping within their respective methodologies. By refining and extending this framework, future innovations can drive more effective, user-centric design workflows. We look forward to the advancements that will emerge as AI-powered live-prototyping becomes an integral part of the design process.