Article

Do Novices Struggle with AI Web Design? An Eye-Tracking Study of Full-Site Generation Tools

Chen Chu, Jianan Zhao and Zhanxun Dong
1 School of Design, Shanghai Jiao Tong University, Shanghai 200240, China
2 Fashion & Design College, Donghua University, Shanghai 200051, China
* Author to whom correspondence should be addressed.
The authors contributed equally to this manuscript.
Multimodal Technol. Interact. 2025, 9(9), 85; https://doi.org/10.3390/mti9090085
Submission received: 11 July 2025 / Revised: 11 August 2025 / Accepted: 19 August 2025 / Published: 22 August 2025

Abstract

AI-powered full-site web generation tools promise to democratize website creation for novice users. However, their actual usability and accessibility for novice users remain insufficiently studied. This study examines interaction barriers faced by novice users when using Wix ADI to complete three tasks: Task 1 (onboarding), Task 2 (template customization), and Task 3 (product page creation). Twelve participants with no web design background were recruited to perform these tasks while their behavior was recorded via screen capture and eye-tracking (Tobii Glasses 2), supplemented by post-task interviews. Task completion rates declined significantly in Task 2 (66.67%) and 3 (33.33%). Help-seeking behaviors increased significantly, particularly during template customization and product page creation. Eye-tracking data indicated elevated cognitive load in later tasks, with fixation count and saccade count peaking in Task 2 and pupil diameter peaking in Task 3. Qualitative feedback identified core challenges such as interface ambiguity, limited transparency in AI control, and disrupted task logic. These findings reveal a gap between AI tool affordances and novice user needs, underscoring the importance of interface clarity, editable transparency, and adaptive guidance. As full-site generators increasingly target general users, lowering barriers for novice audiences is essential for equitable access to web creation.

1. Introduction

The rise of AI-driven web design platforms, especially full-site web generation AI, promises to democratize website creation by enabling novice users with no professional web design or development education to build full-featured sites. Systems such as Wix ADI and Bookmark's AiDA [1,2] aim to simplify the traditionally complex process of web development by automating the generation of site structure, layout, and content through template-based or conversational workflows [3]. By abstracting technical decisions and minimizing the need for manual customization, such platforms aspire to empower individuals without formal training in design or programming to create professional-grade websites [4].
Despite technological advances in automation and layout optimization, the human–computer interaction aspects of full-site web generation AI remain underexplored. While recent studies have focused on the functional and architectural capabilities of these systems [5], far fewer have examined their usability for novice end-users without design expertise. Emerging user reports suggest that while these tools may lower technical barriers, they often introduce new usability challenges, such as confusion over opaque automation processes and frustration when AI-generated outputs diverge from user intentions [6]. This highlights the need for closer investigation of the usability and cognitive demands these systems place on novice users.
Eye-tracking has proven to be a valuable tool for diagnosing interface usability issues by capturing visual attention, identifying friction points, and assessing cognitive load [7]. For instance, fixation patterns and saccade frequency have been used to characterize user behaviors ranging from information processing to higher-level cognitive strategies [8]. Changes in pupil size have also been linked to cognitive processes, fatigue, and variations in brightness [9]. Moreover, a systematic review of eye-tracking in usability research confirms its capacity to uncover subtle user difficulties that are often missed by traditional surveys or clickstream analytics [10].
However, there is a notable lack of eye-tracking studies focused on AI-powered, full-site website generation systems. This leaves a critical gap at the intersection of human–computer interaction, AI usability, and design accessibility. Given that novice users often lack the domain knowledge to navigate advanced customization workflows, it is particularly important to understand how they visually process and interact with AI-generated interfaces [11]. Insights into their gaze behavior can help surface hidden friction points and inform the development of more transparent, supportive, and user-friendly design environments [12].
This study aims to address this gap by (a) identifying the interaction barriers encountered by novice users when using a full-site web generation AI system; (b) employing eye-tracking metrics—specifically fixation frequency, saccade count, and pupil diameter—in combination with help-seeking behavior logs and post-task interviews to triangulate areas of user difficulty; and (c) proposing design recommendations to enhance the usability and accessibility of AI-assisted web development platforms. We test two hypotheses: (H1) novice users will encounter barriers with website creation using the full-site web generation AI system; and (H2) elevated eye-tracking metrics will co-occur with help-seeking events or task errors more often than non-elevated epochs.

2. Materials and Methods

2.1. Participant Recruitment

Twelve participants were recruited via online campus channels at a university. All participants were recent graduates with no formal education or professional experience in web design or development. Eligibility criteria included: (a) no prior experience in web or UI/UX design, (b) basic digital literacy and familiarity with generative AI tools, and (c) no use of eyeglasses during the session (participants with transparent contact lenses were permitted) to ensure compatibility with the eye-tracking system.

2.2. Materials

Eye movement data were collected using Tobii Glasses 2, which record eye movements including fixations, saccades, and pupil diameter. The glasses were calibrated for each participant using a single-point on-device procedure at a 1–1.5 m distance, following the manufacturer's guidelines. Calibration was repeated when drift was observed (>1.5° visual angle) or after any headset adjustment. Behavioral data, including screen activity and help-seeking events, were captured via screen recording software and annotated in real time by the researchers.
To situate Wix ADI and assess generalizability, we reviewed the user flow and surface controls of eight full-site AI web generation systems (Wix ADI, Framer AI, TeleportHQ, Durable, Zyro, Jimdo Dolphin, Tencent Cloud, and Bookmark AiDA). While all support template-driven generation, platforms vary in the following: (a) conversational scaffolding depth, (b) transparency and editability of AI decisions, and (c) granularity of visual control. Despite interface differences, most systems share a similar workflow: users begin by answering questions to define the website’s purpose, audience, and visual style; select from AI-suggested templates; and then customize content before publishing. Wix ADI was selected as the experimental platform due to its high user base (over 250 million users [13]) and integration of conversational AI for layout generation [14]. Screenshots and interface elements from Wix were used solely for interaction testing under fair use for academic research.

2.3. Procedure

All participants completed a demographic and pre-screening questionnaire prior to the study, which collected information on gender, age, academic major, internship experience, and prior exposure to generative AI tools (e.g., ChatGPT 4o, Midjourney). Written informed consent was obtained from all participants in accordance with approval from the Ethics Committee of Donghua University (SRSR202507070065).
The experiment took place from 1 June to 12 July 2025. Prior to data collection, we drafted a pre-analysis plan specifying the primary outcome measures, statistical models, and qualitative procedures. We also conducted a small pilot to validate task clarity, timing, Tobii calibration stability, and recording reliability, and used pilot feedback to refine the instructions and annotation templates.
Data collection of the experiment was conducted individually in a quiet, dedicated lab office. Upon arrival, participants were given a printed design brief instructing them to create a three-page promotional website for a fictional smartphone product launch using the Wix ADI platform. A high-resolution digital image of the phone was also provided for use during the design task. The first five minutes of the session were allocated to experiment briefing and equipment setup. A researcher explained the overall process, assisted with calibrating the Tobii Glasses 2 eye-tracking system, and instructed participants that they could view the brief for each task only once before proceeding, encouraging memorization of task goals. Participants were informed that the total duration for completing all tasks was capped at 45 min but that they could finish earlier if they felt they had completed or could no longer proceed with the task.
The design process was structured into three sequential tasks, mirroring the standard user flow of full-site web generation tools:
Task 1 (onboarding): Complete the AI-driven onboarding questionnaire to automatically generate a homepage. This included naming the website, defining its function, specifying the target audience, and outlining any desired design features.
Task 2 (template customization):
Task 2.1: Select a website template from AI-generated options.
Task 2.2: Modify textual content to suit the product launch context.
Task 2.3: Adjust design elements such as layout, color scheme, and typography.
Task 3 (product page creation):
Task 3.1: Add a mock product purchasing page to the website.
Task 3.2: Input the provided product name and price.
Task 3.3: Upload the given product image into the interface.
Two researchers were present throughout the session. One researcher managed eye-tracking calibration, technical monitoring, and timekeeping, while the other annotated participant behavior, particularly help-seeking instances. Participants were permitted to ask for clarification about the task flow, but researchers did not offer any assistance related to the functionality or operation of the Wix platform, to preserve ecological validity.
After completing the design tasks, each participant took part in a semi-structured interview. Interviews were audio-recorded for transcription and qualitative analysis. As compensation for their time and effort, each participant received an RMB 50 voucher (approximately USD 7).

2.4. Outcome Measure

This study utilized a multi-modal analytical approach integrating task performance, physiological eye-tracking data, help-seeking behavior logs, and qualitative interview feedback to comprehensively evaluate user interaction with the AI-based website generator.
Task Performance: Participants' interaction with the AI-based website generator was evaluated using two task-based outcome measures: (1) task completion rate, the proportion of subtasks successfully completed for each design goal; and (2) task completion time, the time (in minutes) required to complete each task from initiation to submission.
Eye-Tracking Metrics: Gaze data were captured and analyzed across defined areas of interest (AOIs) within the interface. Key metrics included (1) fixation count: the total number of fixations, serving as an indicator of visual attention and potential disorientation; (2) saccade count: the number of rapid eye movements, interpreted as evidence of visual search and scanning behavior; and (3) mean pupil diameter (mm): continuously recorded as an indirect physiological measure of cognitive load. Prior studies suggest that increased saccade frequency and pupil dilation correlate with higher task difficulty [15,16], while elevated fixation counts often indicate increased cognitive load caused by confusion or disorientation [17].
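As an illustration of how such per-task metrics could be derived, the sketch below aggregates an exported gaze-event table into fixation count, saccade count, and mean pupil diameter per participant and task. This is a minimal Python/pandas example, not the Tobii Pro Lab pipeline used in the study; the column names (participant, task, event_type, pupil_mm) are assumptions for illustration.

```python
import pandas as pd

# Minimal sketch: per-participant, per-task eye-tracking summaries from a
# gaze-event export. Column names are illustrative, not actual Tobii fields.

def task_metrics(events: pd.DataFrame) -> pd.DataFrame:
    """Fixation count, saccade count, and mean pupil diameter per participant and task."""
    keys = ["participant", "task"]
    fixations = events[events["event_type"] == "fixation"]
    saccades = events[events["event_type"] == "saccade"]
    summary = (
        fixations.groupby(keys)
        .agg(fixation_count=("event_type", "size"),
             mean_pupil_mm=("pupil_mm", "mean"))
        .join(saccades.groupby(keys).size().rename("saccade_count"), how="outer")
        .fillna({"saccade_count": 0})
        .reset_index()
    )
    return summary

# Tiny synthetic example.
events = pd.DataFrame({
    "participant": [1, 1, 1, 1, 1],
    "task": ["Task 1", "Task 1", "Task 1", "Task 2", "Task 2"],
    "event_type": ["fixation", "saccade", "fixation", "fixation", "saccade"],
    "pupil_mm": [3.6, None, 3.7, 3.9, None],
})
print(task_metrics(events))
```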
Help-Seeking Behaviors: Instances of verbal requests for clarification or assistance were logged and timestamped. We operationalized help-seeking as explicit verbal requests for clarification or assistance about task flow or interface operations (e.g., “Where do I change the price?”). Expressions of frustration without a direct request (e.g., sighs, “This is annoying”) were not counted as help-seeking but noted as affective markers in qualitative memos. Each occurrence was treated as a marker of user difficulty or barrier at specific points in the interaction.
Qualitative Feedback: Post-task semi-structured interviews assessed participants' responses to the question "Which aspects of the process do you expect AI to improve?" All interviews were audio-recorded, transcribed verbatim, and subjected to thematic qualitative analysis to identify recurrent patterns and barriers.

2.5. Data Analysis

All data analyses were conducted using IBM SPSS (v28.0) and Tobii Pro Lab. Descriptive statistics, including means and standard deviations, were calculated for task performance metrics, eye-tracking measures, and help-seeking behaviors. The Shapiro–Wilk test was used to assess normality and guide the choice of parametric or non-parametric tests.
Eye-tracking data were segmented into 30 s epochs aligned to participant-specific task timings to account for individual variation in task duration. Epochs of increased cognitive load were identified by flagging those in which any standardized gaze metric (number of fixation starts, average fixation pupil diameter, or number of saccade starts) exceeded one standard deviation above the participant's mean, indicating heightened visual disorientation, cognitive load, or search effort. These epochs were reviewed alongside screen recordings to provide qualitative context. User difficulty was also measured through help-seeking behaviors, quantified by counting participant requests for assistance during tasks, which enabled correlation with eye-tracking data and task difficulty. To address individual differences in task progression, eye-tracking epochs were realigned to the three task phases using participant-specific timing data; epochs falling outside task boundaries were excluded, and missing data (<15% per phase) were imputed using task-specific linear interpolation. Repeated-measures ANOVA evaluated phase effects (Task 1/2/3), with Greenhouse–Geisser correction applied for sphericity violations (Mauchly's p < 0.05). We report F, df, and p values. Assumptions were checked via the Shapiro–Wilk (normality) and Mauchly's (sphericity) tests, and pairwise comparisons used Bonferroni adjustment. Findings are interpreted cautiously given the small sample size.
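To make the epoch-level flagging and phase comparison described above concrete, the sketch below standardizes each gaze metric within participants, flags 30 s epochs exceeding one standard deviation above the participant's mean, checks normality, and runs a repeated-measures ANOVA with Greenhouse–Geisser correction. The reported analysis was run in IBM SPSS and Tobii Pro Lab; this Python sketch only approximates those steps, and the pingouin dependency as well as the column names (participant, task, epoch, fixations, saccades, pupil_mm) are assumptions for illustration.

```python
import pandas as pd
from scipy.stats import shapiro
import pingouin as pg  # assumed dependency for the GG-corrected repeated-measures ANOVA

METRICS = ["fixations", "saccades", "pupil_mm"]

def flag_elevated_epochs(epochs: pd.DataFrame) -> pd.DataFrame:
    """Flag 30 s epochs in which any standardized metric exceeds +1 SD of the
    participant's own mean (candidate moments of disorientation or high load)."""
    def zscore(s: pd.Series) -> pd.Series:
        return (s - s.mean()) / s.std(ddof=1)

    flagged = epochs.copy()
    for m in METRICS:
        z = flagged.groupby("participant")[m].transform(zscore)
        flagged[f"{m}_elevated"] = z > 1.0
    flagged["elevated"] = flagged[[f"{m}_elevated" for m in METRICS]].any(axis=1)
    return flagged

def phase_comparison(epochs: pd.DataFrame, metric: str):
    """Per-task participant means, Shapiro-Wilk normality check, and a
    Greenhouse-Geisser-corrected repeated-measures ANOVA across Task 1/2/3."""
    per_task = epochs.groupby(["participant", "task"], as_index=False)[metric].mean()
    normality = (
        per_task.groupby("task")[metric]
        .apply(lambda s: shapiro(s).pvalue)
        .rename("shapiro_p")
        .reset_index()
    )
    anova = pg.rm_anova(dv=metric, within="task", subject="participant",
                        data=per_task, correction=True)
    # Bonferroni-adjusted pairwise follow-ups could use
    # pg.pairwise_tests(dv=metric, within="task", subject="participant",
    #                   data=per_task, padjust="bonf").
    return normality, anova
```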
Finally, post-task interviews were coded using inductive thematic analysis; audio was transcribed, and transcripts were verified against the recordings prior to coding. Two coders analyzed the transcripts independently and identified common themes such as interface confusion, AI-control expectations, and workflow interruptions. Disagreements were resolved through discussion; an adjudicator was available but not required.

3. Results

Twelve participants (mean age = 23.08 years, SD = 2.57) were recruited from a university setting via an online questionnaire. Of the participants, 58% were female. All were final-year undergraduate or graduate students preparing to enter the workforce. None had prior experience with website design or development courses, nor had they worked in related professional roles. However, all reported experience using generative AI tools, with seven indicating regular use and five indicating occasional use. Participants came from a variety of academic backgrounds, including business, public administration, materials science, marketing, and physics.

3.1. Task Performance

Participants completed three major tasks during the experiment, with Tasks 2 and 3 further subdivided into subtasks. The average time to complete all three major tasks was 25.33 min (SD = 7.12) (see Table 1).
Task 1 involved interaction with the AI assistant to generate a basic website framework, including the site name, target users, objectives, and personalized requirements. All participants completed this task successfully.
Task 2 consisted of three subtasks: selecting a template, editing content, and customizing visual elements. Although all participants successfully chose a template, only 66.67% (8 out of 12) completed all three subtasks; the remaining participants encountered difficulties with content editing (three participants) or visual element customization (three participants).
Task 3 required participants to simulate an online product page by creating a mock buying section, changing the product name and price, and replacing a product image using provided assets. Only 33.33% (4 out of 12) completed all subtasks. The most common challenge was editing product images (seven participants), followed by editing product names and prices (six participants).
Performance metrics showed that successful participants tended to spend more time on Tasks 2 and 3 compared to those who failed. For Task 2, successful participants averaged 14.04 min (SD = 7.16), compared to 9.50 min (SD = 4.02) for those who did not complete the task. Similarly, in Task 3 successful participants averaged 10.19 min (SD = 5.07), whereas unsuccessful ones averaged 5.07 min (SD = 2.38). However, increased time did not consistently predict success.

3.2. User Barriers

User barriers were identified from the eye-tracking metrics, help-seeking behaviors, and qualitative feedback.

3.2.1. Eye-Tracking Metrics

Three eye-tracking indicators—fixation count, average pupil size, and saccade count—were analyzed across all tasks (see Table 2).
Fixation count increased from Task 1 (M = 67.40, SD = 8.30) to Task 2 (M = 70.70, SD = 7.50), suggesting heightened visual engagement during the more complex editing and customization phases. A slight decrease in Task 3 (M = 68.10, SD = 6.80) may indicate either user fatigue or increasing familiarity with the interface.
Pupil diameter steadily rose from Task 1 (M = 3.57 mm, SD = 0.62) to Task 2 (M = 3.73 mm, SD = 0.60), peaking in Task 3 (M = 3.95 mm, SD = 0.65). This pattern suggests increasing cognitive load, especially during product image editing in Task 3.
Saccade count followed a similar trajectory, increasing from Task 1 (M = 37.80, SD = 15.20) to Task 2 (M = 40.90, SD = 14.50) and then slightly decreasing in Task 3 (M = 38.40, SD = 13.80), reflecting active visual search behavior during interface navigation.
In terms of individual differences in task progression, as shown in Table 3, eye-tracking metrics exhibited significant phase-dependent dynamics (n = 12). Repeated-measures ANOVA revealed a strong main effect of task phase on fixation count (F (2, 22) = 16.83, p < 0.001, η2 = 0.605), with pairwise comparisons confirming elevated fixations during Task 2 (M = 68.9, SD = 14.3) relative to both Task 1 (M = 63.2, SD = 12.1; p = 0.002) and Task 3 (M = 59.1, SD = 16.8; p = 0.001). This pattern signifies sustained attentional engagement during Task 2. Pupil diameter similarly varied across phases (GG-corrected F (1.3, 14.3) = 12.74, p < 0.001, η2 = 0.537), peaking in Task 3 (M = 4.22, SD = 0.62) and significantly exceeding Task 1 (M = 3.81, SD = 0.51; p < 0.001) and Task 2 (M = 3.97, SD = 0.57; p = 0.003), indicating maximal cognitive load during Task 3. Saccade frequency showed a quadratic trend (F (2, 22) = 9.61, p < 0.001, η2 = 0.466), with Task 2 eliciting the highest counts (M = 41.2, SD = 12.6) compared to Task 1 (M = 33.7, SD = 8.9; p = 0.004) and Task 3 (M = 35.4, SD = 15.2; p = 0.009), reflecting exploratory visual search during Task 2. These results demonstrate that eye-movement patterns are intrinsically linked to task structure rather than absolute time, with Task 2 demanding focused attention and Task 3 imposing peak cognitive load despite shorter durations.
The number of elevated eye-tracking metrics (exceeding one standard deviation above the mean) and help-seeking behaviors are illustrated in Figure 1. Elevated fixation counts were observed in Task 1 and Task 2, appearing in nine participants in each, indicating high visual attention during early stages. Within Task 2, subphases 2.1 (template selection) and 2.3 (visual element editing) showed particularly frequent fixation increases (seven and nine instances), reflecting users’ intensive focus during these complex adjustments. Task 3 exhibited slightly fewer elevated fixations (appeared in seven participants), distributed relatively evenly across its subphases. Pupil diameter, a proxy for cognitive load, peaked notably in Task 3 (appeared in 10 participants), especially subphase 3.3 (seven instances) which involved detailed product image editing. This suggests that participants experienced greater mental effort in these later, detail-oriented tasks. Task 2 also showed elevated pupil size, mainly in subphase 2.3 (eight instances), aligning with visual customization demands. Saccade counts were highest in Task 2 (appeared in 11 participants), consistent with active visual scanning during template and content editing. Subphases 2.1 to 2.3 all displayed elevated saccades, underscoring the complexity and navigational demands of this phase. Task 3 showed a moderate number of elevated saccades (nine instances), balanced across subphases.
The data indicate that Tasks 2 and 3 impose greater visual and cognitive demands, evidenced by elevated eye-tracking metrics and increased help-seeking. These findings highlight critical phases where user support and interface improvements would be most beneficial to reduce cognitive overload and enhance task success.

3.2.2. Help-Seeking Behaviors

Help-seeking behaviors were observed across all three major tasks, with a total of 32 instances recorded among the 12 participants. Despite being informed that assistance would not be provided, participants made an average of 2.67 help requests (SD = 1.50). These requests were most frequent during Task 2 (n = 15), followed by Task 3 (n = 12) and Task 1 (n = 5); five requests related specifically to adding product information (see Table 4).
The single most frequent type of help request occurred during Task 3, in adding a mock buying page (25.00% of all requests), followed within that task by editing the product name and price (9.37%), suggesting that participants encountered particular difficulty adding product purchasing information. In Task 2, template selection and design element editing each accounted for 18.75% of help requests, indicating challenges in navigating and customizing visual components. Fewer requests were observed for content editing (6.25%) and product picture editing (6.25%). Additionally, 15.63% of help requests related to Task 1, particularly to answering questions during initial interaction with the AI assistant. These findings indicate that help-seeking was most prevalent in later-stage, detail-oriented tasks, reflecting increasing task complexity and interface demands.

3.2.3. Qualitative Insights from Post-Task Interviews

Post-task interviews were analyzed using inductive thematic analysis, conducted by two independent coders. Five major themes emerged: interface confusion, AI control expectations, workflow interruptions, design specificity, and collaborative roles of humans and AI.
Participant quotes were used to illustrate each theme.
(1) Interface Confusion (nine participants)
Many participants reported challenges in navigating the web editing interface. Complaints focused on limited element options, outdated aesthetics, and unclear visual cues. One participant commented:
“I think the current style looks a bit outdated, the color combinations are not appealing, and the interface is too complicated.”
Another noted:
“Some icons are not clear enough, and the page could be more visual.”
These remarks highlight the need for a more intuitive design system with better visual clarity and customization flexibility.
(2) AI Control Expectations (eight participants)
Participants expressed specific expectations for the AI’s performance, especially in understanding user input and offering task-relevant suggestions. As one participant noted:
“The AI should better understand human text descriptions and handle detailed or lengthy inputs more effectively.”
Participants also hoped for iterative feedback loops where the AI could continue refining results based on user edits:
“After viewing the initial generation, I want to return to the AI chat to adjust or supplement details—make several rounds of improvements.”
(3) Workflow Interruptions (seven participants)
Workflow disruptions were common, especially when the system failed to harmonize new elements with existing designs. For example:
“Different selected elements couldn’t match the overall style automatically—like shapes or colors. It’s hard to adjust.”
In addition, content input fields were perceived as too abstract or lacking proper feedback:
“Some keywords didn’t trigger relevant responses. The content input system felt disconnected.”
These issues created friction in user experience and reduced creative efficiency.
(4) Design Specificity (six participants)
Participants emphasized that certain aspects of design still required human expertise, especially those demanding aesthetic judgment or professional tone. As one participant put it:
“Designers are needed to finalize the layout based on users’ browsing habits and to enhance brand coherence.”
Another explained:
“The product introduction should be improved by designers. Typography is too limited with system fonts.”
This theme reinforces the idea that AI can support but not replace high-level design thinking.
(5) Human–AI Collaboration (five participants)
Several responses reflected a vision of AI–human collaboration, where AI handles structure and automation while humans oversee creativity and refinement. For example:
“The AI can take care of structural generation and creative direction, but text needs polishing, and unnecessary elements should be removed by the designer.”
Another participant advocated for more integration with design resources:
“There should be more links or cooperation with other design platforms—so we can access richer materials.”
These findings underscore that while participants found AI-generated design support useful, especially in early-stage ideation, they also identified significant gaps in interface usability, contextual understanding, and visual coherence. Human designers remain essential for refining layout, ensuring brand fit, and resolving ambiguity—pointing to a future where AI assists, but does not replace, professional design judgment.

4. Discussion

Full-site AI web generation tools promise to support web design by simplifying the traditionally complex process of web development [4]. However, the usability of full-site web generation AI for novice users remains underexplored [18]. This study aimed to fill that gap by identifying the specific interaction barriers encountered by novice users during a multi-stage website-building process on an AI-assisted platform. Two hypotheses guided the investigation: (H1) novice users will encounter barriers when completing complex design tasks, and (H2) eye-tracking techniques can effectively identify moments of user difficulty. The study combined task performance, eye-tracking, help-seeking behavior, and qualitative feedback for comprehensive insight.
Findings confirmed that while all participants completed the initial onboarding task successfully, performance significantly declined for more complex tasks, with completion rates dropping to 66.67% for template customization and 33.33% for adding a mock buying interface. This supports H1, indicating that novice users face a number of barriers particularly in tasks requiring precise content and visual element manipulation. Interaction barriers included unclear interface elements, mismatch between AI-generated content and user expectations, and workflow disruptions when system responses failed to reflect user intent in real time.
The eye-tracking data provided further insights. Metrics such as fixation count, saccade frequency, and pupil diameter fluctuated across tasks, reflecting changes in cognitive load [19]. Fixation and saccade rates peaked during template customization (Task 2), as did help-seeking behaviors, while pupil diameter was highest during product page creation (Task 3). However, these variations in cognitive effort did not consistently correspond to task success or failure. Notably, Task 1, despite being completed by all participants, still produced elevated fixation counts in nine participants, suggesting that high cognitive load does not necessarily indicate user breakdown [20]. Rather, the eye-tracking metrics in this study functioned better as indicators of cognitive intensity than as markers of usability failure. Thus, while partially validating H2, the results suggest that eye-tracking alone is insufficient to identify interaction barriers without contextual behavioral or performance data [21].
Complementary behavioral evidence reinforced these findings. Participants exhibited an average of 2.67 help-seeking behaviors, most commonly during product editing and layout adjustment tasks. These requests often centered on understanding system logic or locating specific design tools [22]. Consistent with previous research, our findings map onto design frameworks that emphasize mixed-initiative and co-adaptive systems. In line with guidelines for human–AI interaction [23], thematic analysis of post-task interviews identified five recurrent user-friction patterns: interface ambiguity, unmet expectations of AI assistance, confusion from misaligned layout logic, reliance on aesthetic judgment beyond users' comfort zone, and a clear preference for guided human–AI collaboration. Together, these data illustrate that current AI tools, while competent at scaffolding basic structure, fall short in supporting novice users through complex design decisions and semantic content integration.
These findings point to several directions for improving the usability of AI-based web generation tools [24]. First, interface clarity should be enhanced: unclear icons and complicated layouts led to confusion and errors, especially during template modification and product editing. Second, the onboarding AI's decision-making process should offer more guidance, with explanations of design choices such as site naming, target users, and other requirements. Third, systems could offer tiered interaction difficulty, enabling users to switch between "guided" and "advanced" modes. Finally, real-time, context-sensitive assistance [25] could reduce reliance on external help-seeking and support users in completing tasks independently. Together, these improvements can make such tools more approachable and empowering for novice users.
This study expands prior HCI literature [26] by shifting focus from professional or semi-professional design users to novice end-users engaging with AI tools. While earlier studies reported only minor usability gaps between AI-assisted and human-designed workflows in professional settings [23], our findings reveal that novice users encounter substantial barriers when directly interacting with AI systems across entire workflows. These discrepancies highlight the critical distinction between AI as a creative assistant for experts versus a replacement for experts in novice-facing contexts. As full-site AI tools increasingly target general users, their interface logic, design flexibility, and feedback mechanisms must adapt accordingly to bridge the gap between user intention and system behavior.
In general, our contribution shifts the lens from expert workflows—where AI often performs as a creative assistant—to novice-facing, end-to-end interaction, where collaboration must emphasize legibility, adjustable autonomy, and safe, teachable refinement. As full-site generators target general users, systems must ensure appropriate human–AI collaboration to translate automation into usable, trustworthy outcomes.

Limitations and Future Directions

While this study provides valuable insights into the barriers novice users face when interacting with full-site web generation AI tools, several limitations should be acknowledged. First, the sample (n = 12) comprised university students and may not represent users with lower digital literacy, older adults, or professionals outside academia; generalizability is therefore limited. As digital literacy moderates tool adoption and error recovery, results should be interpreted as indicative rather than population-level estimates. Second, the study focused exclusively on a single platform (Wix ADI), and future work should include comparative evaluations across multiple AI-driven design tools to isolate platform-specific versus generalizable usability issues. Additionally, while eye-tracking and help-seeking behaviors provided rich indicators of user difficulty, integrating real-time think-aloud protocols or screen-capture-assisted retrospective interviews may further elucidate decision-making processes and frustration points. Future research should also explore adaptive support mechanisms, such as intelligent agents that dynamically respond to cognitive load or visual confusion, to enhance user assistance during complex editing tasks.

5. Conclusions

This study demonstrates that novice users encounter clear and measurable barriers when interacting with AI-driven full-site web generation tools, particularly in complex tasks requiring customization, content manipulation, and visual judgment. Task performance, help-seeking behaviors, and qualitative feedback confirm these usability challenges. Eye-tracking metrics—while effective in revealing cognitive workload—did not consistently align with task success or failure, indicating that their role is better suited to assessing user effort than identifying specific breakdowns. Framed within human–AI collaboration, we recommend co-adaptive, mixed-initiative designs including intuitive interfaces, clear system feedback, personalized interaction difficulty, and real-time assistance.

Author Contributions

Conceptualization, C.C. and J.Z.; Methodology, J.Z.; Software, C.C.; Validation, J.Z. and Z.D.; Formal Analysis, Z.D.; Investigation, J.Z.; Resources, C.C.; Data Curation, C.C.; Writing—Original Draft Preparation, J.Z.; Writing—Review and Editing, C.C. and J.Z.; Visualization, J.Z.; Supervision, Z.D.; Project Administration, J.Z.; Funding Acquisition, Z.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board of Donghua University (protocol code SRSY202507070065, approved on 7 July 2025).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data are available upon reasonable request from the corresponding author.

Acknowledgments

We thank all participants of this research.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. McLean, D. Wix ADI Review 2025: Is It Really That Powerful? Available online: https://www.elegantthemes.com/blog/business/wix-adi-review (accessed on 10 July 2025).
  2. George, J. Using Bookmark’s AIDA to Build Your Website in 2 Minutes. Available online: https://www.sitepoint.com/using-bookmarks-aida-to-build-your-website-in-2-minutes/ (accessed on 10 July 2025).
  3. Oswal, S.K.; Oswal, H.K. Examining the Accessibility of Generative AI Website Builder Tools for Blind and Low Vision Users: 21 Best Practices for Designers and Developers. In Proceedings of the 2024 IEEE International Professional Communication Conference (ProComm), Pittsburgh, PA, USA, 14–17 July 2024; pp. 121–128. [Google Scholar]
  4. Calò, T.; De Russis, L. Advancing Code Generation from Visual Designs through Transformer-Based Architectures and Specialized Datasets. Proc. ACM Hum.-Comput. Interact. 2025, 9, 1–37. [Google Scholar] [CrossRef]
  5. Kaluarachchi, T.; Wickramasinghe, M. A Systematic Literature Review on Automatic Website Generation. J. Comput. Lang. 2023, 75, 101202. [Google Scholar] [CrossRef]
  6. Booyse, D.; Scheepers, C.B. Barriers to Adopting Automated Organisational Decision-Making through the Use of Artificial Intelligence. Manag. Res. Rev. 2024, 47, 64–85. [Google Scholar] [CrossRef]
  7. Hansen, D.W.; Ji, Q. In the Eye of the Beholder: A Survey of Models for Eyes and Gaze. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 478–500. [Google Scholar] [CrossRef] [PubMed]
  8. Zardari, B.A.; Hussain, Z.; Arain, A.A.; Rizvi, W.H.; Vighio, M.S. QUEST E-Learning Portal: Applying Heuristic Evaluation, Usability Testing and Eye Tracking. Univers. Access Inf. Soc. 2021, 20, 531–543. [Google Scholar] [CrossRef]
  9. Zagermann, J.; Pfeil, U.; Reiterer, H. Measuring Cognitive Load Using Eye Tracking Technology in Visual Computing. In Proceedings of the Sixth Workshop on Beyond Time and Errors on Novel Evaluation Methods for Visualization, Baltimore, MD, USA, 24 October 2016; pp. 78–85. [Google Scholar]
  10. Novák, J.Š.; Masner, J.; Benda, P.; Šimek, P.; Merunka, V. Eye Tracking, Usability, and User Experience: A Systematic Review. Int. J. Hum.-Comput. Interact. 2024, 40, 4484–4500. [Google Scholar] [CrossRef]
  11. Zheng, Q.; Chen, M.; Park, H.; Xu, Z.; Huang, Y. Evaluating Non-AI Experts’ Interaction with AI: A Case Study In Library Context. In Proceedings of the CHI 2025: CHI Conference on Human Factors in Computing Systems, Yokohama, Japan, 26 April–1 May 2025; pp. 1–20. [Google Scholar]
  12. Birhane, A.; Isaac, W.; Prabhakaran, V.; Diaz, M.; Elish, M.C.; Gabriel, I.; Mohamed, S. Power to the People? Opportunities and Challenges for Participatory AI. In Proceedings of the EAAMO ‘22: Equity and Access in Algorithms, Mechanisms, and Optimization 2022, Arlington, VA, USA, 6–9 October 2022; pp. 1–8. [Google Scholar]
  13. WIX Logo. Wix.Com (WIX) Company Profile. Available online: https://www.financecharts.com/stocks/WIX/profile (accessed on 10 July 2025).
  14. Salehi Fadardi, M.; Salehi Fadardi, J.; Mahjoob, M.; Doosti, H. Post-Saccadic Eye Movement Indices Under Cognitive Load: A Path Analysis to Determine Visual Performance. J. Ophthalmic Vis. Res. 2022, 17, 397–404. [Google Scholar] [CrossRef]
  15. de Greef, T.; Lafeber, H.; Oostendorp, H.; Lindenberg, J. Eye Movement as Indicators of Mental Workload to Trigger Adaptive Automation. Neuroergon. Oper. Neurosci. 2009, 5638, 219–228. [Google Scholar]
  16. Wang, X.; Chen, D.; Xie, T.; Zhang, W. Predicting Women’s Intentions to Screen for Breast Cancer Based on the Health Belief Model and the Theory of Planned Behavior. J. Obstet. Gynaecol. Res. 2019, 45, 2440–2451. [Google Scholar] [CrossRef]
  17. Wilcox, R.R.; Keselman, H.J. Repeated Measures One-way ANOVA Based on a Modified One-step M-estimator. J. Math. Stat. Psychol. 2003, 56, 15–25. [Google Scholar] [CrossRef] [PubMed]
  18. Kitsara, I. Artificial Intelligence and the Digital Divide: From an Innovation Perspective. In Platforms and Artificial Intelligence: The Next Generation of Competences; Springer: Berlin/Heidelberg, Germany, 2022; pp. 245–265. [Google Scholar]
  19. Mahanama, B.; Jayawardana, Y.; Rengarajan, S.; Jayawardena, G.; Chukoskie, L.; Snider, J.; Jayarathna, S. Eye Movement and Pupil Measures: A Review. Front. Comput. Sci. 2022, 3, 733531. [Google Scholar] [CrossRef]
  20. Ehmke, C.; Wilson, S. Identifying Web Usability Problems from Eyetracking Data. In BCS-HCI ‘07: Proceedings of the 21st British HCI Group Annual Conference on People and Computers: HCI … But Not as We Know It—Volume 1, University of Lancaster, Lancaster, UK, 3–7 September 2007; BCS Learning & Development Ltd.: Swindon, UK, 2007. [Google Scholar]
  21. Van der Wel, P.; Van Steenbergen, H. Pupil Dilation as an Index of Effort in Cognitive Control Tasks: A Review. Psychon. Bull. Rev. 2018, 25, 2005–2015. [Google Scholar] [CrossRef] [PubMed]
  22. Zhao, J.; Zhu, D.; Chang, F.; Han, T. Rehab-Diary: Enhancing Recovery Identity with an Online Support Group for Middle Aged and Older Ovarian Cancer Patients. Proc. ACM Hum.-Comput. Interact. 2024, 8, 1–25. [Google Scholar] [CrossRef]
  23. Amershi, S.; Weld, D.; Vorvoreanu, M.; Fourney, A.; Nushi, B.; Collisson, P.; Suh, J.; Iqbal, S.; Bennett, P.N.; Inkpen, K.; et al. Guidelines for Human-AI Interaction. In Proceedings of the CHI ‘19: CHI Conference on Human Factors in Computing Systems, Glasgow, UK, 4–9 May 2019; pp. 1–13. [Google Scholar]
  24. Namoun, A.; Alrehaili, A.; Nisa, Z.U.; Almoamari, H.; Tufail, A. Predicting the Usability of Mobile Applications Using AI Tools: The Rise of Large User Interface Models, Opportunities, and Challenges. Procedia Comput. Sci. 2024, 238, 671–682. [Google Scholar] [CrossRef]
  25. Cao, S.; Huang, C.M. Understanding User Reliance on AI in Assisted Decision-Making. Proc. ACM Hum.-Comput. Interact. 2022, 6, 1–23. [Google Scholar] [CrossRef]
  26. Jiménez-Crespo, M.A. Human-Centered AI and the Future of Translation Technologies: What Professionals Think about Control and Autonomy in the AI Era. Information 2025, 16, 387. [Google Scholar] [CrossRef]
Figure 1. Frequency of elevated eye-tracking metrics (mean + 1 SD) and help-seeking behavior across task subphases.
Table 1. Task performance of the participants.
| No. | Task 1 Length (min) | Q&A | Task 2 Length (min) | Template Selection | Content Editing | Visual Elements Customization | Task 2 Completion | Task 3 Length (min) | New Buying Page | Name and Price Editing | Product Picture Editing | Task 3 Completion |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 4.13 | succeed | 13.37 | succeed | succeed | succeed | succeed | 4.15 | succeed | failed | failed | failed |
| 2 | 5.12 | succeed | 6.27 | succeed | failed | failed | failed | 17.54 | succeed | succeed | succeed | succeed |
| 3 | 4.05 | succeed | 11.52 | succeed | failed | failed | failed | 5.08 | succeed | succeed | failed | failed |
| 4 | 5.15 | succeed | 15.02 | succeed | succeed | succeed | succeed | 6.27 | succeed | failed | succeed | failed |
| 5 | 7.20 | succeed | 27.11 | succeed | succeed | succeed | succeed | 4.50 | succeed | failed | failed | failed |
| 6 | 4.13 | succeed | 22.48 | succeed | succeed | succeed | succeed | 4.43 | failed | failed | failed | failed |
| 7 | 3.09 | succeed | 14.32 | succeed | succeed | failed | failed | 7.15 | succeed | succeed | succeed | succeed |
| 8 | 2.06 | succeed | 18.50 | succeed | succeed | succeed | succeed | 9.05 | succeed | succeed | succeed | succeed |
| 9 | 3.38 | succeed | 10.28 | succeed | succeed | succeed | succeed | 5.40 | succeed | failed | failed | failed |
| 10 | 2.44 | succeed | 13.38 | succeed | succeed | succeed | succeed | 10.08 | succeed | succeed | failed | failed |
| 11 | 6.44 | succeed | 3.38 | succeed | succeed | succeed | succeed | 7.04 | succeed | succeed | succeed | succeed |
| 12 | 3.15 | succeed | 6.32 | succeed | failed | succeed | failed | 3.09 | failed | failed | failed | failed |
Table 2. Eye-tracking metrics for fixation count, pupil size, and saccade count across three tasks by participant.
Values are presented as mean (SD).
| Metric | Task | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P10 | P11 | P12 | Ave |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Fixation no. | Task 1 | 66.10 (11.00) | 66.00 (9.90) | 72.30 (10.10) | 68.50 (8.70) | 70.10 (9.20) | 74.20 (11.30) | 58.60 (12.50) | 63.40 (10.50) | 69.70 (9.80) | 65.80 (10.20) | 71.50 (11.00) | 55.30 (13.20) | 67.40 (8.30) |
| Fixation no. | Task 2 | 70.20 (8.90) | 68.40 (7.30) | 75.60 (9.40) | 71.30 (6.50) | 73.50 (8.10) | 77.80 (10.20) | 62.30 (11.10) | 66.70 (9.30) | 72.90 (8.50) | 69.10 (9.00) | 74.80 (10.10) | 59.10 (12.00) | 70.70 (7.50) |
| Fixation no. | Task 3 | 68.50 (5.10) | 65.10 (6.20) | 71.20 (7.80) | 67.90 (5.40) | 70.40 (6.70) | 74.50 (8.90) | 59.80 (9.70) | 64.10 (7.60) | 70.30 (6.90) | 66.50 (7.20) | 72.10 (8.50) | 56.70 (10.40) | 68.10 (6.80) |
| Ave pupil size (mm) | Task 1 | 3.76 (0.31) | 3.81 (0.29) | 2.85 (0.18) | 4.25 (0.42) | 4.51 (0.45) | 4.18 (0.40) | 3.63 (0.33) | 4.33 (0.43) | 3.12 (0.25) | 2.73 (0.21) | 2.66 (0.20) | 3.13 (0.26) | 3.57 (0.62) |
| Ave pupil size (mm) | Task 2 | 3.92 (0.28) | 3.97 (0.26) | 3.01 (0.15) | 4.41 (0.39) | 4.67 (0.42) | 4.34 (0.37) | 3.79 (0.30) | 4.49 (0.40) | 3.28 (0.22) | 2.89 (0.18) | 2.82 (0.17) | 3.29 (0.23) | 3.73 (0.60) |
| Ave pupil size (mm) | Task 3 | 4.15 (0.35) | 4.20 (0.33) | 3.22 (0.21) | 4.63 (0.45) | 4.89 (0.48) | 4.56 (0.43) | 4.01 (0.36) | 4.71 (0.46) | 3.50 (0.28) | 3.11 (0.24) | 3.04 (0.23) | 3.51 (0.29) | 3.95 (0.65) |
| Saccade no. | Task 1 | 32.00 (5.10) | 49.00 (7.80) | 55.00 (8.90) | 17.00 (3.20) | 46.00 (7.40) | 40.00 (6.40) | 0.00 (0.00) | 46.00 (7.40) | 47.00 (7.60) | 55.00 (8.90) | 44.00 (7.10) | 22.00 (4.30) | 37.80 (15.20) |
| Saccade no. | Task 2 | 35.20 (4.70) | 52.30 (7.20) | 58.10 (8.30) | 20.10 (2.80) | 49.10 (6.80) | 43.10 (5.80) | 3.10 (1.20) | 49.10 (6.80) | 50.10 (7.00) | 58.10 (8.30) | 47.10 (6.50) | 25.10 (3.90) | 40.90 (14.50) |
| Saccade no. | Task 3 | 33.10 (4.10) | 49.80 (6.50) | 55.60 (7.60) | 18.00 (2.40) | 46.60 (6.10) | 40.60 (5.10) | 1.50 (0.70) | 46.60 (6.10) | 47.60 (6.30) | 55.60 (7.60) | 44.60 (5.80) | 23.00 (3.40) | 38.40 (13.80) |
Table 3. Repeated-measures ANOVA results for eye-tracking metrics across tasks.
| Metric | F | df | p | η2 | Significant Contrasts (Bonferroni Adjusted) |
|---|---|---|---|---|---|
| Fixation Count | 16.83 | 2, 22 | <0.001 | 0.605 | Task 2 > Task 1 (p = 0.002), Task 2 > Task 3 (p = 0.001) |
| Pupil Diameter | 12.74 | 1.3, 14.3 | <0.001 | 0.537 | Task 3 > Task 1 (p < 0.001), Task 3 > Task 2 (p = 0.003) |
| Saccade Count | 9.61 | 2, 22 | <0.001 | 0.466 | Task 2 > Task 1 (p = 0.004), Task 2 > Task 3 (p = 0.009) |
Table 4. Frequency and distribution of help-seeking behaviors across tasks and subtasks.
| Subtask (task) | Share of Help Requests |
|---|---|
| Answering questions (Task 1) | 15.63% |
| Template choosing (Task 2) | 18.75% |
| Content editing (Task 2) | 6.25% |
| Design element editing (Task 2) | 18.75% |
| Add a mock buying page (Task 3) | 25.00% |
| Add product name and price (Task 3) | 9.37% |
| Add product picture (Task 3) | 6.25% |
Help-seeking instances per participant (32 in total): P1 = 5, P2 = 2, P3 = 1, P4 = 1, P5 = 2, P6 = 2, P7 = 5, P8 = 4, P9 = 4, P10 = 2, P11 = 3, P12 = 1.
