Interactive Experience Design for the Historic Centre of Macau: A Serious Game-Based Study

Zhao, Pengcheng; Wang, Pohsun; Lu, Yi; Lu, Yao; Wang, Zi

doi:10.3390/buildings16020323

by Pengcheng Zhao ¹, Pohsun Wang ¹,*, Yi Lu ², Yao Lu ¹ and Zi Wang ³

¹ Faculty of Innovation and Design, City University of Macau, Macau 999078, China

² College of Art and Design, Beijing University of Technology, Beijing 100124, China

³ College of Humanities, Xi’an Shiyou University, Xi’an 710065, China

* Author to whom correspondence should be addressed.

Buildings 2026, 16(2), 323; https://doi.org/10.3390/buildings16020323

Submission received: 9 November 2025 / Revised: 22 December 2025 / Accepted: 7 January 2026 / Published: 12 January 2026

(This article belongs to the Special Issue New Challenges in Digital City Planning)

Abstract

With the advancement of digital technology, serious games have become an essential tool for disseminating cultural heritage and educating the public about it. However, systematic empirical research remains limited with respect to how visual elements influence users’ cognitive and emotional engagement through interactive behaviors. Using the “Macau Historic Centre Science Popularization System” as a case study, this mixed-methods study investigates the mechanisms by which visual elements affect user experience and learning outcomes in digital interactive environments. Eye-tracking data, behavioral logs, questionnaires, and semi-structured interviews from 30 participants were collected to examine the impact of visual elements on cognitive resource allocation and emotional engagement. The results indicate that the game intervention significantly enhanced participants’ retention and comprehension of cultural knowledge. Eye-tracking data showed that props, text boxes, historic buildings, and the architectural light and shadow shows (as incentive feedback elements) had the highest total fixation duration (TFD) and fixation count (FC). Active-interaction visual elements showed a stronger association with emotional arousal and were more likely to elicit high-arousal experiences than passive-interaction elements. The FC of the architectural light and shadow shows was positively correlated with positive emotions, immersion, and a sense of accomplishment. Interview findings revealed users’ subjective experiences regarding visual design and narrative immersion. This study proposes an integrated analytical framework linking “visual elements–interaction behaviors–cognition–emotion.” By combining eye-tracking and information dynamics analysis, it enables multidimensional measurement of users’ cognitive processes and emotional responses, providing empirical evidence to inform visual design, interaction mechanisms, and incentive strategies in serious games for cultural heritage.

Keywords:

cultural heritage; serious games; eye tracking; interaction design; visual elements

1. Introduction

Cultural heritage represents humanity’s valuable cultural legacy, encompassing diverse forms such as architecture, archeological sites, and cultural landscapes. It serves as a testament to history and artistry while functioning as a key vehicle for community identity and continuity [1]. As sustainable development gains global consensus, the preservation and revitalization of cultural heritage have transcended cultural boundaries to become an important pathway for advancing multidimensional economic, social, and environmental progress, thereby promoting inclusive and sustainable regional growth [2]. As a representative region for Sino-Western cultural exchange, Macau’s Historic Centre was inscribed on the World Heritage List in 2005. Encompassing 22 historic buildings, including A-Ma Temple and the Ruins of St. Paul’s, as well as 8 squares, it holds unique cultural and historical value. Reinterpreting Macau’s cultural heritage through modern design language, while harmonizing traditional cultural identity with contemporary aesthetic and functional demands, can inject new vitality into its preservation and dissemination [3].

Currently, education-oriented serious games serve as effective tools for promoting cultural heritage. Through their “learning by doing” design mechanisms, they integrate explicit learning objectives into gaming experiences, thereby enhancing user engagement and knowledge retention [4,5,6,7]. When combined with technologies such as virtual reality (VR), augmented reality (AR), and 3D digitalization, serious games can create immersive learning environments [8,9,10,11]. For instance, integrating AR technology into Macau’s cultural festivals allows visitors to scan QR codes and participate in virtual games that blend local architecture with Portuguese symbols, providing novel opportunities for cultural exploration [12]. Through 3D spatial reconstruction, temporal narratives, and role-playing simulations, serious games deepen users’ understanding of historical contexts and archeological methodologies [13,14]. However, serious games focusing on cultural heritage continue to face challenges, including limited user retention, lack of long-term feedback, and insufficient iterative optimization. Game design must balance functionality with playability while ensuring the accuracy, diversity, and cognitive appropriateness of the cultural knowledge presented [15].

Eye-tracking technology provides an objective and precise research method for revealing cognitive processing by recording users’ gaze and saccade behaviors within visual scenes [16,17]. Based on principles of infrared imaging, this technology extracts multidimensional metrics, including fixation duration, fixation count, saccade paths, and pupil diameter, thereby quantifying users’ attention allocation, cognitive load, and information processing [18]. In recent years, eye-tracking technology has been widely applied in psychology, human–computer interaction, landscape visual assessment, and educational technology [19,20]. In serious games research, eye-tracking data are frequently used to evaluate the effectiveness of visual interface elements, thereby optimizing designs to enhance learning outcomes and user experience [21]. Studies have shown that dynamic media and image panels sustain user attention longer than static labels and that video game graphic design effectively captures target users’ attention [22]. In special education, eye-tracking studies involving children with autism spectrum disorder have deepened understanding of their cognitive processes, informing the design of joint attention interventions [23]. In resource management simulation games, metrics such as eye movement frequency, saccade rate, and pupil diameter effectively predict task difficulty and user performance [24]. Furthermore, dynamic information indicators such as transition entropy and stationary entropy have been applied to analyze fixation sequence switching patterns, assess attention distribution across areas of interest (AOIs), and reveal users’ visual cognitive strategies and their relationships with emotional arousal [20,25].
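As a rough illustration of the two information-dynamics indicators mentioned above, the sketch below computes stationary entropy and transition entropy from a sequence of fixated AOIs. It is a minimal example rather than the procedure used in the cited studies: the function name `gaze_entropies` and the AOI labels are hypothetical, self-transitions are counted, and the stationary distribution is approximated by the share of transitions leaving each AOI instead of being solved from the transition matrix.

```python
import math
from collections import Counter, defaultdict


def gaze_entropies(aoi_sequence):
    """Return (stationary_entropy, transition_entropy) in bits."""
    # Count first-order transitions between consecutive fixations.
    transitions = defaultdict(Counter)
    for src, dst in zip(aoi_sequence, aoi_sequence[1:]):
        transitions[src][dst] += 1

    # Assumption: approximate the stationary distribution by the
    # proportion of transitions that start in each AOI.
    total = sum(sum(row.values()) for row in transitions.values())
    pi = {src: sum(row.values()) / total for src, row in transitions.items()}

    # Stationary entropy: how evenly attention is spread across AOIs.
    stationary = -sum(p * math.log2(p) for p in pi.values())

    # Transition entropy: weighted uncertainty of the next AOI given the current one.
    transition = 0.0
    for src, row in transitions.items():
        row_total = sum(row.values())
        row_entropy = -sum(
            (n / row_total) * math.log2(n / row_total) for n in row.values()
        )
        transition += pi[src] * row_entropy
    return stationary, transition
```

For a gaze path that strictly alternates between two AOIs, attention is evenly split (stationary entropy of 1 bit) while the next AOI is fully predictable (transition entropy of 0 bits).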

From a cognitive psychology perspective, how attention translates into emotional experiences and learning outcomes has been systematically examined across multiple theoretical frameworks. Cognitive Load Theory (CLT) and the Cognitive Theory of Multimedia Learning (CTML) indicate that learners have limited working-memory resources, and that different forms of visual and textual information can affect learning outcomes and psychological experience by shaping intrinsic load, extraneous load, and germane load [26,27]. Flow theory and immersion research further suggest that people are more likely to enter a highly engaged and enjoyable state when task goals are clear, feedback is timely, and interaction is smooth [28]. The embodied cognition perspective also proposes that first-person spatial exploration and embodied interaction with the environment can strengthen contextual understanding and emotional immersion [29,30]. Building on these theories, this study examines how visual elements with different levels of interaction dominance shape users’ emotional experiences and learning outcomes in cultural heritage serious games through attention allocation, cognitive load, and the continuity of immersion.

Although previous studies have made preliminary explorations into the role of visual elements in serious games, several limitations remain. First, most research has not systematically distinguished between active and passive interactive visual elements from the perspective of interaction dominance, making it difficult to reveal their differential effects on users’ cognitive processes and emotional responses [31]. Second, research methods predominantly rely on single data sources, lacking integrated multimodal analytical frameworks that combine eye-tracking, behavioral, and subjective data. This limitation hinders the comprehensive capture and interpretation of visual attention dynamics [32]. Furthermore, empirical studies focusing on serious games for cultural heritage remain limited, particularly those grounded in specific cultural contexts. Consequently, providing precise references for gamified learning design within particular cultural settings remains a challenge [10,15].

Therefore, this study uses the “Macau Historic Centre Science Popularization System” as an experimental platform to construct and empirically test an interaction-dominance-based analytical framework linking “visual elements–interactive behaviors–cognitive and emotional experiences–learning outcomes,” systematically examining the distinct influence mechanisms of active and passive visual elements. The innovation of this study is reflected across three dimensions: (1) Theoretically, it develops an integrated analytical framework from an “interaction dominance” perspective that systematically incorporates both active-interaction and passive-presentation visual elements, thereby elucidating the mechanism through which visual elements influence learning outcomes via differentiated behavioral pathways and affective experiences. (2) Methodologically, it integrates multimodal data, including eye-tracking metrics, information dynamism indicators, behavioral logs, the GEQ game experience scale, pre- and post-test learning assessments, and semi-structured interviews. This approach captures the full process from visual presentation to behavioral response, subjective experience, and learning outcomes, going beyond existing studies that rely on single questionnaires or static eye-tracking metrics. (3) Practically, using the World Heritage site of Macau Historic Centre as an empirical setting, it proposes critical design trade-offs among “high emotional arousal, cognitive load, and interaction complexity,” and provides directly applicable empirical evidence for designing digital guided tours, narratives, and spatial experiences for similar cultural heritage sites.

This study proposes the following research questions and hypotheses (Figure 1):

RQ1: Can serious games about cultural heritage significantly improve users’ learning performance in terms of cultural knowledge retention and comprehension?

H1.

Users participating in serious games about cultural heritage will demonstrate significantly improved cultural knowledge retention and comprehension compared to pre-test levels, as reflected in higher scores on knowledge performance assessment questionnaires.

RQ2: Are there significant differences in eye-tracking metrics (TFD, AFD, FC, TFF, FFD, VC) across different categories of visual elements? Which elements receive the highest user attention?

H2.

Significant differences exist in eye-tracking metrics such as total fixation duration (TFD) and fixation count (FC) across categories of visual elements; users show the highest visual engagement with props, text boxes, architectural light and shadow shows, and historic buildings.

RQ3: In games, do player-driven, active-interaction visual elements (NPCs, menus, characters, props, etc.) elicit greater emotional arousal potential than environment-driven, passive-interaction visual elements (historic buildings, architectural light and shadow shows, text boxes) in terms of information dynamism?

H3.

Active-interaction visual elements, whose information dynamism is measured by transition entropy and stationary entropy, enhance emotional arousal significantly more than passive-interaction visual elements.

RQ4: Do different visual elements exert varying effects on active and passive interaction behaviors? What are the specific correlations and degrees of influence?

H4.

NPCs and characters significantly affect active interaction behaviors, whereas architectural light and shadow shows and text boxes have a more pronounced influence on passive interaction behaviors.

2. Materials and Methods

2.1. Project Description

This study utilizes the self-developed “Macau Historic Centre Science Popularization System” to construct an experimental platform. Focusing on the World Cultural Heritage site of Macau’s Historic Centre, the platform selects two representative structures—St. Dominic’s Church and the Ruins of St. Paul’s—and employs serious games to enhance participants’ cognitive abilities, comprehension, emotional engagement, and participation in cultural heritage [33]. Developed using the Unity 2022 LTS engine (Unity Technologies, San Francisco, CA, USA), the system applies 3D modeling technology to digitally reconstruct historic buildings, accurately reproducing their architectural forms and spatial characteristics. Built upon an “exploration–learning–reward” educational framework, the game enables participants to assume the role of “cultural explorers” from a first-person perspective, conducting autonomous discovery within the virtual environment [34].

During the exploration phase, participants select target buildings via the map interface to enter a 3D virtual environment where they explore reconstructed historical sites. The system employs visual cues to help users identify interactive elements and spatial hotspots. The learning phase uses multimodal interaction design to facilitate knowledge acquisition: approaching non-player characters (NPCs) triggers dialogue interactions in which NPCs provide architectural and historical context through audio narration with text annotations, introducing environmental objectives and task requirements [35]. Simultaneously, the system displays text and image information windows for participants to deepen their understanding of the cultural context [36]. Additionally, participants must complete levels such as matching terms related to the Rose Chapel and assembling the Ruins of St. Paul’s puzzle, using interaction to enhance comprehension of architectural features and cultural significance [37,38]. In the reward phase, upon completing all learning tasks, the system activates a light and shadow spectacle on the building’s facade. By integrating dynamic lighting, particle effects, and soundscapes, it presents cultural imagery of Macau, offering emotional incentives and reinforcing learning achievement feedback [39]. Through its structured phased design, this platform provides a reliable experimental environment for researching the educational efficacy of serious games in cultural heritage education.

To ensure clarity and consistency in task instructions and core knowledge delivery in the experimental setting, this system adopts targeted designs for information presentation. First, it avoids relying on color alone to encode critical information. All task objectives and status feedback integrate multiple cues, including graphics, text, and spatial positioning, thereby reducing potential effects associated with individual differences in color perception. Second, during the learning phase, NPC voice explanations are fully accompanied by synchronized text boxes. This ensures information is accessible through a stable visual reading path supported by redundant multimodal cues. These designs aim to mitigate the potential impact of individual differences in momentary auditory attention and related factors on task execution. Consequently, they establish a more controlled baseline for subsequent analyses during data collection.

2.2. Experimental Preparation

2.2.1. Photograph Selection

The visual stimulus materials for this experiment were derived from screenshots captured during the operation of the “Macau Historic Centre Science Popularization System.” A total of 80 original images were obtained, all captured under uniform specifications: 300 dpi resolution, 1920 × 1080 pixels, 16:9 aspect ratio, and sRGB color space. This ensured consistent visual presentation quality. These images covered seven key interactive phases of the game: system login, narrative introduction, map navigation, free exploration, mission levels (term-matching and puzzle games), NPC dialogue, and the facade light show reward segment. Four experts were invited to evaluate the images based on the following criteria: (1) scene completeness, i.e., whether it accurately represents a specific game interaction phase; (2) clarity and visibility of key visual elements; (3) avoidance of visual overlap or excessive clutter to ensure interpretability of eye-tracking data.

Through expert evaluation, 16 representative images were ultimately selected as the official experimental stimuli. These images ensure comprehensive coverage of core interactive scenarios and visual diversity within serious games, specifically including 2 system interface images, 6 exploration scene images, 5 task interaction images, and 3 reward sequence images. This structured set of visual stimuli provides a reliable experimental foundation for subsequent research analyzing attention allocation across different visual elements [18]. Note that these images serve solely as spatial reference templates for defining areas of interest (AOIs) during postprocessing in the eye-tracking analysis software (Tobii Pro Lab v. 1.207, Tobii AB, Stockholm, Sweden). They are not static stimuli presented to participants during the experiment. All experimental data in this study were derived from participants’ complete real-time gameplay captured in subsequent procedures.

2.2.2. Participants

This study recruited 30 participants through online and offline channels, comprising 12 males (40%) and 18 females (60%). Participants ranged in age from 18 to 45 years (M = 26.67, SD = 5.50), with the following age distribution: 9 participants (30%) aged 18–25, 15 (50%) aged 26–35, and 6 (20%) aged 36–45. Participants’ prior visits to the Historic Centre of Macau ranged from once to five or more times (M = 2.27, SD = 0.83), indicating a foundational understanding of the cultural heritage site and thus reflecting key characteristics of the target visitor group to some extent. Video game usage frequency ranged from “daily” to “rarely or almost never” (M = 3.40, SD = 1.52), with an overall distribution slightly skewed toward low-to-moderate users, reflecting the authentic experiences of a broader audience. To ensure data validity, all participants had normal or corrected-to-normal vision, exhibited no color blindness or color weakness, and had no prior exposure to the serious game system used in this study. All participants took part voluntarily and signed informed consent forms. The informed consent form specified that this study would collect eye-tracking data and records of interactive behaviors during the experiment, that semi-structured interviews would be audio-recorded and transcribed, and that all data would be used solely for scientific analysis. It also stated that participants could withdraw from the experiment at any stage without penalty and could refuse to answer any questions they were unwilling to address. To protect privacy, researchers managed personal identifiers separately from research data, and data analysis and results presentation used de-identified codes exclusively.

The sample size (N = 30) for this study was primarily constrained by the requirements of the multimodal experimental design. Each participant was required to complete the entire process, including pre-tests, eye-tracking game experiences, questionnaires, post-tests, and in-depth interviews. Given the stringent environmental and calibration demands of the eye-tracking experiments, coupled with the high time costs associated with the full process, recruiting a large sample presented significant operational challenges. Furthermore, according to established conventions in Human–Computer Interaction (HCI) and eye-tracking research, N = 30 represents a common and feasible sample size for comparable studies. In addition, a post hoc sensitivity analysis indicates that, at a significance level of α = 0.05 and a statistical power of 0.80, this sample size is sufficient to detect effect sizes of d ≥ 0.53. Because the effect size actually observed in this study (d = 2.00) far exceeds this threshold, the sample provides adequate statistical power for the primary outcome variable.
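The order of magnitude of the sensitivity threshold can be checked with a short calculation. The sketch below is only an approximation under stated assumptions: it treats the primary comparison as a two-sided paired (pre-test versus post-test) test, which the text does not specify, and it uses the normal approximation d ≈ (z₁₋α/₂ + z_power) / √N instead of the noncentral t distribution. The function name `min_detectable_d` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_d(n, alpha=0.05, power=0.80):
    """Normal-approximation minimum detectable effect size (two-sided, paired)."""
    z = NormalDist()  # standard normal distribution
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / sqrt(n)


print(round(min_detectable_d(30), 2))  # prints 0.51
```

The approximation gives about 0.51 for N = 30, slightly below the reported 0.53; a t-based calculation is somewhat more conservative for small samples, so the two figures are consistent.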

2.3. Experimental Procedure

This experiment was conducted in the laboratory of the School of Innovation and Design at City University of Macau. To control environmental interference, the experiment was carried out in a soundproof room with curtains closed to block natural light. The room was equipped with constant artificial lighting using top-down diffuse illumination, and screen brightness was uniformly set to maintain stable lighting conditions, preventing environmental variations from affecting pupil measurement data. Visual stimuli were presented on a 24.5-inch ThinkVision LCD monitor (Lenovo, Beijing, China; resolution 1920 × 1080 pixels, refresh rate 100 Hz). The experiment ran on a Lenovo ThinkStation P920 workstation (Lenovo, Beijing, China) equipped with an Intel Xeon Scalable-series processor, an NVIDIA Quadro professional graphics card, and high-capacity ECC memory, ensuring stable game performance. Eye-tracking data were collected using the Tobii Pro Glasses 3 wearable eye tracker (Tobii AB, Stockholm, Sweden) (Figure 2) at a sampling rate of 50 Hz [21].

Each participant completed the experiment individually. Before the experiment began, the experimenter provided a detailed explanation of the purpose, procedures, and precautions. During this explanation, participants were explicitly informed that data collection included eye-tracking recordings and synchronized scene videos, interaction logs, and audio recordings of semi-structured interviews for subsequent transcription. Participants were informed that they could withdraw at any stage without penalty and could refuse to answer any questions they did not wish to address. Research findings were reported only through statistical summaries and anonymous quotations, without personally identifiable information. Following the experiment, raw data and transcripts were stored using de-identified codes, with access restricted to the research team as needed for academic analysis and review. In addition, eye-tracking data and the corresponding screen-interaction footage were synchronously recorded by the scene camera of Tobii Pro Glasses 3 on the same timeline, enabling frame-level alignment between gaze behavior and mouse operations. Behavioral coding was conducted via video playback and recorded only valid clicks on active-interaction visual elements (e.g., props, NPCs, dialogue boxes, function icons). Clicks on passive-interaction elements (e.g., historic buildings, text boxes, architectural light and shadow shows) were treated as task-irrelevant exploratory behaviors or accidental operations and were not counted as valid behavioral data.

Participants first completed a pre-test questionnaire covering demographic information, cultural heritage awareness, gaming experience, and cultural interests. Subsequently, experimenters guided them to sit in comfortable chairs positioned 60–65 cm from the screen, fitted them with the eye-tracking device, and performed calibration. Calibration was considered successful when the average error reported by the eye-tracking analysis software was ≤0.5°. If the initial calibration failed to meet this criterion, recalibration was performed 1–2 times; if multiple attempts still failed to reach the threshold, the participant’s eye-tracking data were excluded from subsequent analyses. After passing calibration, on-site verification was conducted by combining the software’s real-time average error feedback with the stability of the fixation trajectory. If verification indicated an average error exceeding 0.5° or signs of systematic drift due to headset displacement, recalibration was performed immediately. After successful calibration, participants were given 60 s to freely operate the keyboard and mouse to familiarize themselves with the game interface and interaction methods.
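The calibration acceptance rule described above can be written as a small decision function. This is an illustrative sketch only: the function and constant names are hypothetical, and the limit of three attempts is an interpretation of "recalibration was performed 1–2 times" (one initial calibration plus up to two recalibrations).

```python
MAX_ERROR_DEG = 0.5  # acceptance threshold stated in the procedure
MAX_ATTEMPTS = 3     # assumption: initial calibration plus up to two recalibrations


def calibration_outcome(errors_deg):
    """Decide whether a participant's eye-tracking data are kept.

    errors_deg: average error (degrees) reported after each attempt, in order.
    Returns ("accepted", attempt_number) or ("excluded", None).
    """
    for attempt, error in enumerate(errors_deg[:MAX_ATTEMPTS], start=1):
        if error <= MAX_ERROR_DEG:
            return "accepted", attempt
    # No attempt within the allowed number met the threshold.
    return "excluded", None
```

For example, `calibration_outcome([0.8, 0.4])` accepts the participant on the second attempt, whereas three consecutive errors above 0.5° lead to exclusion.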

During the formal experimental phase, participants wore an eye tracker while experiencing the “Macau Historic Centre Science Popularization System,” lasting approximately 15–20 min. The game unfolded from a first-person perspective, encompassing elements such as exploration, learning tasks, and game rewards. The system simultaneously recorded eye-tracking data and interaction logs. Immediately after the experiment, participants completed a post-test questionnaire to assess learning performance, emotional engagement, and willingness to interact. This was followed by a semi-structured interview in which researchers asked questions about the gaming experience, visual design, and effectiveness of cultural communication (Figure 3). All interviews were audio-recorded and transcribed for subsequent analysis [40]. Throughout the experiment, environmental noise was consistently maintained at a low level (below normal conversational volume) to minimize external interference.

Interaction Logic and Task Flow

This study employed a single-session, dynamic real-time gameplay paradigm. To ensure all participants received visual stimuli within a consistent contextual framework, the game design adopted a strictly linear narrative structure. All participants completed the exploration, learning, and reward phases in a fixed order. This journey progressed from St. Dominic’s Church to the Ruins of St. Paul’s, culminating in the architectural light and shadow shows. The specific sequence is illustrated in Figure 4. By following a standardized narrative path, this design avoided the cognitive fragmentation that scene randomization could cause, thus eliminating the need for scene counterbalancing.

Within this linear framework, in-game interactions were categorized into two types based on user control to facilitate subsequent analysis. The first type is active interaction, referring to actions that participants must perform to advance tasks, such as clicking on NPCs or completing puzzles, where both initiation and feedback are user driven. The second type is passive interaction, referring to stimuli automatically presented by the system and primarily received through visual attention, such as watching architectural animations, reading information boxes, or viewing architectural light and shadow shows. Attending to these stimuli is optional and does not require users to perform active operations. The validity of this classification was confirmed through triangulation with behavioral logs and eye-tracking patterns: active-interaction visual elements correlate with explicit click events and shorter Time to First Fixation (TFF), while passive-interaction visual elements induce longer Total Fixation Duration (TFD) without requiring mandatory actions.

2.4. Eye-Tracking Experiment

2.4.1. Defining AOI

The Area of Interest (AOI) is a crucial research method for identifying specific spatial regions of visual stimuli and examining the relationship between eye-tracking behavior and the visual environment [41]. In this study, AOIs were defined based on screen pixel boundaries using Tobii Pro Lab software. Visual elements within the game interface were systematically categorized into ten functional attributes: menu icons, function icons, text boxes, historic buildings, spatial navigation, characters, NPCs, dialogue boxes, props, and architectural light and shadow shows [42,43,44]. To ensure comparability of AOI boundaries across participants, all AOIs were drawn and annotated on 1920 × 1080 scene video frames using pixel-based boundaries.

For recurring static visual elements (such as menu icons, function icons, and text boxes), template matching was employed. AOI templates were predefined on representative interface screenshots captured during gameplay. During video analysis, when a video frame matched a screenshot interface, the software automatically applied the template AOI to the corresponding time interval. Manual correction was then conducted to ensure temporal accuracy. For dynamic visual elements such as architectural light and shadow shows, overall player visual attention during these events was analyzed by manually drawing fixed AOIs based on the video frames corresponding to each event [45].

To further analyze visual behavior characteristics under different interaction modes, AOIs were classified into two major types based on user interaction with visual elements. Active-interaction AOIs included menu icons, function icons, spatial navigation, characters, dialogue boxes, props, and NPCs, all of which required deliberate actions such as mouse clicks or task triggers. Passive-interaction AOIs encompassed historic buildings, architectural light and shadow shows, and text boxes, where content presentation was driven by the game environment and information reception occurred primarily through visual cues without active user intervention [46]. To account for potential eye-tracking measurement biases, AOI boundaries were drawn slightly larger than the actual target elements, with a minimum spacing of 20 pixels between AOIs to prevent misallocation of fixations. In summary, all AOI boundaries were defined at the pixel level. Static elements were fixed using templates, whereas dynamic elements used fixed regions covering their full presentation area. Specific boundary examples are provided in Figure 5 and Supplementary Materials S1.
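The two geometric rules above (enlarging each AOI slightly and keeping at least 20 pixels between AOIs) can be checked programmatically. The following is a minimal sketch under stated assumptions: AOIs are treated as axis-aligned rectangles on the 1920 × 1080 frame, the padding margin is a hypothetical parameter because the text only says "slightly larger", and the class and function names are illustrative rather than part of Tobii Pro Lab.

```python
import math
from dataclasses import dataclass
from itertools import combinations

FRAME_W, FRAME_H = 1920, 1080  # scene video frame size used for AOI drawing
MIN_GAP_PX = 20                # minimum spacing between AOIs stated in the text


@dataclass(frozen=True)
class AOI:
    name: str
    left: int
    top: int
    right: int
    bottom: int


def pad(aoi, margin):
    """Enlarge an AOI by `margin` pixels per side, clamped to the frame."""
    return AOI(
        aoi.name,
        max(0, aoi.left - margin),
        max(0, aoi.top - margin),
        min(FRAME_W, aoi.right + margin),
        min(FRAME_H, aoi.bottom + margin),
    )


def gap(a, b):
    """Shortest distance in pixels between two rectangles (0 if they overlap)."""
    dx = max(a.left - b.right, b.left - a.right, 0)
    dy = max(a.top - b.bottom, b.top - a.bottom, 0)
    return math.hypot(dx, dy)


def spacing_violations(aois):
    """Return the name pairs of AOIs that are closer than MIN_GAP_PX."""
    return [(a.name, b.name) for a, b in combinations(aois, 2)
            if gap(a, b) < MIN_GAP_PX]
```

Running `spacing_violations` on the padded rectangles before annotation would flag pairs whose enlarged boundaries have drifted too close together, which is where fixations are most likely to be assigned to the wrong element.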

2.4.2. Eye-Tracking Metrics

To analyze users’ attention allocation patterns toward various visual elements during gameplay, this study selected six core eye-tracking metrics from gaze behavior for quantitative analysis. All raw eye-tracking data were preprocessed in Tobii Pro Lab (v. 1.207). Fixations and saccades were classified using the built-in I-VT (Velocity-Threshold) algorithm with default parameter settings to ensure standardized processing and algorithmic robustness. Blink events, identified as temporary loss of pupil imaging, and other signal-loss segments were treated as invalid and removed from the fixation sequences without data interpolation. Off-screen fixations were identified based on gaze-point mapping results, and records with gaze points outside the screen display area were excluded from AOI statistics. Accordingly, all AOI-based eye-tracking metrics were computed from the resulting valid fixation sequences. Metrics were extracted from predefined AOIs (Table 1). These metrics include Total Fixation Duration (TFD), Fixation Count (FC), Time to First Fixation (TFF), First Fixation Duration (FFD), Average Fixation Duration (AFD), and Visit Count (VC). Collectively, these metrics evaluate the intensity of user attention, depth of processing, and exploration patterns of visual elements [47]. In addition, because this study focuses on the total amount of attention captured by visual elements in natural gameplay, and because element size (e.g., large-scale historic buildings versus small UI icons) is an intentional design attribute, eye-tracking metrics were not normalized by AOI area in order to preserve ecologically grounded viewing behavior.
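To make the six metrics concrete, the sketch below derives them from a cleaned fixation sequence. It is an illustration, not Tobii Pro Lab's implementation: the record layout `(aoi_name, onset_ms, duration_ms)` and the function name `aoi_metrics` are hypothetical, fixations outside every AOI are marked with `None`, TFF is taken as the onset of the first fixation on an AOI relative to the start of the sequence, and a visit is counted each time gaze enters an AOI from elsewhere.

```python
def aoi_metrics(fixations):
    """Compute TFD, FC, TFF, FFD, AFD and VC per AOI.

    fixations: list of (aoi_name, onset_ms, duration_ms) sorted by onset;
               aoi_name is None when the fixation hits no AOI.
    """
    metrics = {}
    previous = None
    for aoi, onset, duration in fixations:
        if aoi is not None:
            # The first fixation on an AOI fixes its TFF and FFD.
            m = metrics.setdefault(
                aoi, {"TFD": 0, "FC": 0, "TFF": onset, "FFD": duration, "VC": 0}
            )
            m["TFD"] += duration  # total fixation duration
            m["FC"] += 1          # fixation count
            if aoi != previous:
                m["VC"] += 1      # a new visit starts when gaze enters the AOI
        previous = aoi
    for m in metrics.values():
        m["AFD"] = m["TFD"] / m["FC"]  # average fixation duration
    return metrics
```

Because the metrics are not normalized by AOI area, TFD and FC computed this way reflect the absolute amount of attention an element receives, in line with the rationale given above.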

2.5. Measurement Instruments

2.5.1. Learning Performance Questionnaire

This study employed a self-developed Learning Performance Assessment Questionnaire, administered before and after the experimental intervention, to evaluate the effects of serious games on knowledge acquisition in cultural heritage education [48]. The questionnaire assessed participants’ learning outcomes across two dimensions: “knowledge retention” and “knowledge comprehension.” Knowledge retention measures users’ ability to retain and retrieve key cultural heritage information, reflecting both short-term mastery and the long-term impact of educational content. Long-term effectiveness, however, requires further validation through subsequent longitudinal follow-up studies. Existing research indicates that contextualized learning and immersive experiences enhance knowledge retention [49]. Knowledge comprehension reflects the ability to interpret and infer cultural significance, architectural features, and historical context [10].

The questionnaire consisted of 10 multiple-choice questions (Table 2), with five assessing knowledge retention and five evaluating knowledge comprehension. The content covered historical events, architectural styles, cultural symbols, and spatial structures featured in the game. All questions were developed based on core knowledge elements from the Macau Historic Centre Science Popularization System. Each correct answer earned 1 point, while incorrect or uncertain responses scored 0, yielding a total score ranging from 0 to 10. In this study, learning performance measurement focused on knowledge retention and knowledge comprehension only. No independent task was designed to assess transfer learning; therefore, transfer was not included among the outcome variables. In addition, to ensure that both assessments targeted the same knowledge points and to maintain pre-test and post-test comparability, identical test forms were administered at both time points, and no parallel versions were used. The questionnaire served as both a pre-test to assess baseline knowledge levels before the game intervention and as a post-test to evaluate learning outcomes after gameplay.

To validate the quality of the measurement tool, this study conducted item analysis and reliability testing on the questionnaire data, with detailed results summarized in Supplementary Materials S2. Despite constraints related to the sample size (N = 30) and the limited number of items (10 items), the internal consistency of this self-developed assessment remained within an acceptable range for classroom-based tests. Item analysis further indicated good overall discrimination, with an average discrimination index ($D_{pre}$ = 0.49) and an average point-biserial correlation coefficient ($r_{pb}$ = 0.27). Moreover, the mean item difficulty (P) increased from 0.39 in the pre-test to 0.67 in the post-test, indicating that the instrument was sensitive to knowledge gains attributable to the instructional intervention (instructional sensitivity).
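The item statistics named above can be illustrated with a short Python sketch. It is not the authors' analysis script. It assumes dichotomous (0/1) item scores, an upper/lower-group discrimination index based on the top and bottom 27% of total scores (a common convention that the text does not confirm), and an uncorrected point-biserial correlation in which the item is not removed from the total. All function names and data are hypothetical.

```python
from statistics import mean, pstdev


def item_difficulty(item_scores):
    """Difficulty index P: proportion of participants answering correctly."""
    return mean(item_scores)


def discrimination_index(item_scores, total_scores, fraction=0.27):
    """D = P(upper group) - P(lower group); the 27% split is an assumption."""
    n_group = max(1, round(len(total_scores) * fraction))
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    lower = [item_scores[i] for i in order[:n_group]]
    upper = [item_scores[i] for i in order[-n_group:]]
    return mean(upper) - mean(lower)


def point_biserial(item_scores, total_scores):
    """Pearson correlation between a 0/1 item and the total score.

    Undefined (division by zero) if every participant gives the same answer.
    """
    mx, my = mean(item_scores), mean(total_scores)
    cov = mean([(x - mx) * (y - my) for x, y in zip(item_scores, total_scores)])
    return cov / (pstdev(item_scores) * pstdev(total_scores))


# Invented scores for six participants, not study data.
item = [1, 1, 1, 0, 0, 0]
totals = [9, 8, 7, 4, 3, 2]

p_value_index = item_difficulty(item)
d_index = discrimination_index(item, totals)
r_pb = point_biserial(item, totals)
```

In this toy case every high scorer answers the item correctly and every low scorer does not, so the discrimination index reaches its maximum of 1.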

2.5.2. Game Experience Questionnaire

The Game Experience Questionnaire (GEQ) employed in this study was adapted from an existing evaluation framework to assess participants’ overall experiences in a serious game environment [50]. The scale measures two dimensions: Task Engagement and Emotional Engagement. Task Engagement reflects users’ behavioral investment and perceived goal attainment during gameplay, while Emotional Engagement focuses on emotional responses and subjective feelings elicited during interaction [51]. This framework is well-suited for analyzing the overall experiential structure of cultural heritage serious games, specifically what users did and how they felt.

The scale comprises nine assessment dimensions. Task Engagement encompasses four sub-dimensions: boredom, accomplishment, competence, and fatigue. Competence measures users’ subjective evaluation of their ability to complete tasks and sense of control. Boredom assesses the degree of interest loss during task performance. Accomplishment reflects self-affirmation and goal attainment experienced after task completion. Fatigue measures the level of physical and mental exhaustion during sustained task performance [10,52]. Emotional Engagement comprises five sub-dimensions: positive emotions, negative emotions, tension, immersion, and flow experience. Flow experience characterizes the subjective state of sustained focus, diminished time perception, and effortless action during task progression. Positive emotions reflect pleasant experiences such as enjoyment and interest, while negative emotions reflect distressing experiences such as frustration and disappointment. Tension describes a state of high stress and heightened alertness, while immersion assesses the user’s level of engagement and loss of self-awareness within the game [53]. The instrument uses a five-point Likert scale (1 = Strongly Disagree, 5 = Strongly Agree) and comprises 27 items in total, with each sub-dimension consisting of 3 items. Cronbach’s α for each sub-dimension in this study ranged from 0.738 to 0.870, indicating good internal consistency reliability. Furthermore, given the sample size (N = 30), we did not conduct confirmatory validation at the subscale level for the nine subscales. Instead, we focused on assessing the construct validity of the two higher-order dimensions: Task Engagement and Emotional Engagement. Analysis results show that the KMO values for both higher-order subscales meet the suitability criteria for factor analysis (KMO > 0.6), and Bartlett’s sphericity test is significant (p < 0.001), indicating the statistical robustness of the scale’s higher-order factor structure. Relevant statistical results are detailed in Supplementary Materials S3.
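For readers unfamiliar with the reliability coefficient reported above, a minimal Python sketch of Cronbach's α for one three-item sub-dimension is given below. It uses the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of the summed score) with sample variances. The function name and the response matrix are hypothetical and do not come from the study data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-participant item-score rows."""
    k = len(responses[0])                       # number of items
    items = list(zip(*responses))               # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Invented Likert responses (rows = participants, columns = 3 items).
responses = [
    [4, 5, 4],
    [3, 3, 4],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 3],
]
alpha = cronbach_alpha(responses)
```

With only three items per sub-dimension, α is sensitive to each item's variance, which is one reason values are usually interpreted together with the item-level statistics.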

In defining the theoretical framework of the affective dimension, this study adopts the two-dimensional valence–arousal model. This model posits that emotions can be described by two mutually orthogonal dimensions: pleasantness–unpleasantness (valence) and high–low activation (arousal). Specifically, the positive and negative emotion subscales in the GEQ directly correspond to the valence dimension of emotion, while items such as tension and fatigue reflect high-arousal negative emotional states (high arousal with negative valence). Therefore, the term “emotional arousal” in this study specifically refers to the level of emotional activation represented by indicators like tension and fatigue. The positive or negative direction (valence) of emotions is independently assessed by the positive/negative emotion scales. This approach avoids conceptual and operational overlap between the two dimensions.

2.6. Data Analysis

2.6.1. Quantitative Analysis

Quantitative methods were used to analyze eye-tracking data, learning performance assessment results, and GEQ data to test research hypotheses (H1–H4). To ensure data quality, invalid eye-tracking samples with sampling rates below 85% were excluded prior to formal statistical analysis. All subsequent statistical analyses were conducted using IBM SPSS Statistics 24 software. Click counts on active-interaction elements, obtained through video coding, were analyzed alongside the corresponding eye-tracking metrics, learning performance scores, and experience questionnaire responses for subsequent correlation analyses and hypothesis testing. In addition to the sampling-rate threshold, eye-tracking data were excluded if calibration failed to meet standards, repeated calibrations did not achieve an average error ≤ 0.5°, prolonged or frequent signal loss prevented stable fixation sequences, or system errors caused misalignment between eye-tracking data and interaction logs.

(1) The Shapiro–Wilk test assessed the normality of the pre- and post-test differences in total knowledge scores. Data were considered normally distributed if p > 0.05; otherwise, they were deemed non-normally distributed [54]. For normally distributed data, a paired samples t-test analyzed pre- and post-intervention differences, with Cohen’s d calculated as the effect size. The 95% confidence interval for the mean difference was also reported to indicate estimation precision. For the knowledge retention and knowledge comprehension sub-dimensions, which did not meet normality assumptions, the Wilcoxon signed-rank test was used to analyze the data and test H1 [55].

(2) Given the within-subjects design employed in this study, a one-way repeated-measures ANOVA was conducted to examine differences across 10 visual-element categories on six eye-tracking metrics (TFD, AFD, FC, TFF, FFD, VC). Shapiro–Wilk tests were used to assess distributional assumptions. The results indicated no severe deviations from normality at the AOI-type level; although some metrics (e.g., FFD) showed slight skewness in a few AOIs, repeated-measures ANOVA is considered robust to modest normality violations at N = 30 [56]. Sphericity was evaluated using Mauchly’s Test of Sphericity; when sphericity was violated (p < 0.05), Greenhouse–Geisser corrections were applied, and the adjusted degrees of freedom and F values were reported accordingly [57]. Post hoc comparisons used Bonferroni-corrected pairwise tests to control Type I error. Effect sizes (η_p²) were reported, and 95% confidence intervals for marginal means were provided to support interpretation, thereby testing H2.

(3) Spearman’s correlation analysis examined the relationship between the information dynamics (transition entropy $H_t$ and stationary entropy $H_s$) of active- and passive-interaction visual elements and the GEQ emotional arousal dimension to test H3 [58]. Furthermore, we investigated the correlation between the emotional dimensions of game experience (emotional arousal and emotional engagement) and knowledge acquisition (pre-test to post-test improvement in learning performance). Specifically, emotional arousal was operationally represented by the composite scores of the tension and fatigue subscales, while emotional engagement was assessed using its corresponding higher-order dimension score, calculated as the mean of relevant item scores, for correlation testing. To further characterize the dynamics of visual attention during interaction, this study introduces information dynamics metrics. Transition entropy ($H_t$) and stationary entropy ($H_s$) were computed for AOI gaze sequences, with definitions and calculation methods detailed in Section 2.6.2.

Given the small sample size and the exploratory nature of the correlation tests, this study reports uncorrected p-values to identify potential associations and suggest directions for future validation. Results are annotated and presented using p < 0.05 as the significance threshold and p < 0.10 as the threshold for trend-level evidence. Future studies will incorporate multiple-comparison control procedures for correlation tests based on expanded samples. Additionally, we conducted sensitivity checks using the Benjamini–Hochberg FDR (BH-FDR) for all correlation tests as a robustness reference to assess the impact of multiple comparisons on the interpretation of results. However, the primary objective of this manuscript remains the exploratory identification of potential associations.

(4) Spearman correlation analysis was also employed. This method does not require data to follow a normal distribution and is suitable for the behavioral frequency data (click counts, eye-tracking metrics) and GEQ data in this study. The analysis examined the correlation strength (r) and significance (p) between different visual elements and emotional engagement to test hypothesis H4 [59].
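The rank-based correlation used in steps (3) and (4) can be sketched in a few lines of Python. The version below assigns average ranks to tied values and then computes a Pearson correlation on the ranks, which is the usual definition of Spearman's coefficient in the presence of ties. It returns only the coefficient; significance testing, as performed in SPSS for this study, is not reproduced. Function names and data are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the block of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Invented click counts and subscale means for five participants.
clicks = [3, 7, 7, 10, 12]
immersion = [4.5, 3.0, 3.5, 2.0, 2.5]
rho = spearman(clicks, immersion)
```

Tied click counts are common in behavioral frequency data, which is why the average-rank treatment matters here.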

Given the partially exploratory nature of this study and the absence of preregistration, the analytic strategy combined confirmatory and exploratory approaches. Tests for H1 and H2 were theory-driven confirmatory analyses, whereas the information-entropy and correlation analyses for H3 and H4 were implemented as exploratory analyses to identify potential association patterns among variables.
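As a worked illustration of the paired comparison in step (1), the sketch below computes the paired t statistic and one common paired-samples effect size, d_z = mean difference / SD of differences, which satisfies d_z = |t| / √n. The text does not state which variant of Cohen's d was used; treating it as d_z is an assumption, although the values reported in Section 3.1 (t(29) = −10.958, d = 2.001) are numerically consistent with it. The function name and the example scores are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t_and_dz(pre, post):
    """Return (t, d_z) for paired scores, with differences taken as post - pre."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    m, s = mean(diffs), stdev(diffs)       # sample SD of the differences
    t_stat = m / (s / math.sqrt(n))
    d_z = m / s
    return t_stat, d_z


# Invented pre/post scores for six participants, not study data.
pre = [3, 4, 2, 5, 4, 3]
post = [6, 7, 6, 8, 6, 7]
t_stat, d_z = paired_t_and_dz(pre, post)

# Consistency check against the reported statistics: |t| / sqrt(n) with n = 30.
reported_dz = 10.958 / math.sqrt(30)
```

The sign of t depends only on the direction of subtraction; a negative t arises when differences are taken as pre-test minus post-test.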

2.6.2. Definition and Calculation Methodology of Information Dynamics Indicators

This study employs Transition Entropy ($H_t$) and Stationary Entropy ($H_s$) as core metrics for quantifying the dynamics of visual attention, both calculated within a discrete Markov chain framework based on a single AOI transition sequence [25]. Given that “Transfer Entropy” in information theory typically describes directed information flow between time series, to avoid conceptual confusion, this study uniformly uses “Transition Entropy” to denote the state-switching entropy metric employed herein. Specifically, $H_t$ characterizes the randomness and exploratory nature of gaze shifts between AOIs, reflecting the intensity of dynamic fluctuations in visual attention. $H_s$ is calculated based on the limiting stationary distribution estimated from the transition matrix, depicting the long-term uniformity of attention allocation across AOIs. Together, they complementarily characterize the information dynamics of the interaction process in terms of transition complexity and steady-state allocation structure.

The $H_t$ and $H_s$ metrics in this study are derived from first-order Markov chain models constructed using discrete AOI transition sequences, thereby calculating discrete entropy indicators. Consequently, they do not involve embedding dimensions, delay parameters, or kernel function settings inherent to continuous estimation frameworks. The state space is defined as core visual elements (AOIs) within the game interface, with a lag step set to Lag = 1, thereby constructing a first-order Markov chain based on single-step transitions between adjacent states. The transition matrix is constructed using maximum likelihood estimation, with uniform transition smoothing applied to state rows lacking observed transitions to ensure a well-defined matrix and facilitate stationary distribution estimation.
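The first-order Markov construction described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: it assumes the widely used gaze-entropy definitions, in which stationary entropy is the Shannon entropy of the stationary distribution π and transition entropy is the π-weighted average of the row entropies of the transition matrix, both in bits and without normalization. The stationary distribution is approximated by iterating a "lazy" version of the chain, which has the same stationary distribution but avoids oscillation for periodic sequences. All names and the example sequence are hypothetical.

```python
import math


def transition_matrix(sequence, states):
    """Maximum likelihood transition matrix with uniform rows for unseen states."""
    index = {s: i for i, s in enumerate(states)}
    n = len(states)
    counts = [[0] * n for _ in range(n)]
    for a, b in zip(sequence, sequence[1:]):      # lag-1 transitions
        counts[index[a]][index[b]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            matrix.append([1.0 / n] * n)          # uniform transition smoothing
        else:
            matrix.append([c / total for c in row])
    return matrix


def stationary_distribution(matrix, iterations=10_000, tol=1e-12):
    """Approximate pi with pi <- 0.5 * pi + 0.5 * pi P (lazy power iteration)."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        stepped = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        new = [0.5 * p + 0.5 * q for p, q in zip(pi, stepped)]
        if max(abs(a - b) for a, b in zip(pi, new)) < tol:
            return new
        pi = new
    return pi


def shannon_entropy(probs):
    """Shannon entropy in bits, skipping zero-probability entries."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def stationary_entropy(pi):
    return shannon_entropy(pi)


def transition_entropy(matrix, pi):
    return sum(pi[i] * shannon_entropy(matrix[i]) for i in range(len(matrix)))


# Invented AOI sequence, not study data.
states = ["props", "text", "building"]
sequence = ["props", "text", "props", "building", "text", "props"]
P = transition_matrix(sequence, states)
pi = stationary_distribution(P)
h_t = transition_entropy(P, pi)
h_s = stationary_entropy(pi)
```

In this example only the "props" row is non-deterministic, so the transition entropy comes entirely from the share of time the chain spends on props. Whether consecutive fixations on the same AOI count as self-transitions is a design choice that this sketch leaves to the input sequence.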

Regarding significance and robustness assessment, the primary inference objective is to compare information dynamics indicators and their relationships with experience and learning variables across different interaction conditions. Thus, correlation and difference tests aligned with the research design are employed for inference. Simultaneously, to mitigate the impact of random fluctuations in the transition sequence on the interpretation of entropy metrics, we introduce a robustness framework based on surrogate testing. This involves randomizing AOI transition sequences to construct a null distribution, which is used to assess the degree to which observed entropy metrics deviate from randomized baselines. Given this paper’s mechanism-oriented focus and small sample size, surrogate testing serves as a supplementary robustness validation method.
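A surrogate test of the kind outlined above might look like the following sketch. It makes several simplifying assumptions that the text does not confirm: surrogates are generated by randomly permuting the AOI sequence (which preserves how often each AOI occurs but destroys transition order), the entropy is a conditional entropy weighted by empirical source-state frequencies rather than by the stationary distribution, and the test is one-sided toward lower entropy. Function names, the seed, and the example sequence are hypothetical.

```python
import math
import random
from collections import Counter


def conditional_entropy(sequence):
    """H = -sum p(a, b) * log2 p(b | a) over observed lag-1 transitions (bits)."""
    pairs = Counter(zip(sequence, sequence[1:]))
    sources = Counter(a for a, _ in pairs.elements())
    total = sum(pairs.values())
    h = 0.0
    for (a, _b), count in pairs.items():
        h -= (count / total) * math.log2(count / sources[a])
    return h


def surrogate_test(sequence, n_surrogates=500, seed=0):
    """Compare the observed entropy with a shuffled-sequence null distribution."""
    rng = random.Random(seed)                 # fixed seed for reproducibility
    observed = conditional_entropy(sequence)
    shuffled = list(sequence)
    null = []
    for _ in range(n_surrogates):
        rng.shuffle(shuffled)
        null.append(conditional_entropy(shuffled))
    # One-sided p-value: share of surrogates at least as ordered as the data,
    # with the +1 correction so the estimate is never exactly zero.
    at_or_below = sum(1 for h in null if h <= observed)
    p_lower = (at_or_below + 1) / (n_surrogates + 1)
    return observed, null, p_lower


# Invented, perfectly alternating AOI sequence, not study data.
sequence = ["props", "text"] * 20
observed, null, p_lower = surrogate_test(sequence)
```

A strictly alternating sequence has zero transition entropy, so it falls far below the shuffled baseline; real gaze sequences would typically sit closer to the null distribution.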

2.6.3. Qualitative Analysis: Thematic Analysis of Interviews

To ensure privacy protection and traceability of interview data, all interview recordings were de-identified during transcription, removing or replacing information that could indicate personal identity. Participants are represented in the text using coded identifiers. When citing direct quotes, researchers retain only content relevant to the research questions, avoiding the presentation of identifiable personal details. Given the cultural heritage context of the research subjects, thematic analysis coding and theme development adhere to principles of respect and non-judgment. Expressions potentially involving cultural sensitivities, such as religious symbols, colonial historical memory, and local identity, are analyzed only when spontaneously mentioned by participants and relevant to the experience mechanism. Their experiential meanings are presented using neutral language.

Two researchers conducted thematic analysis to systematically examine semi-structured interview data from 30 participants. The analysis process followed the six stages proposed by Braun & Clarke (2006): (1) familiarizing with the data; (2) generating initial codes; (3) searching for themes; (4) reviewing and refining themes; (5) defining and naming themes; (6) producing the analysis report [60]. Coding was facilitated using NVivo 12 software. This study employed a hybrid approach combining deductive and inductive coding. Research questions and theoretical frameworks served as the initial analytical framework and sensitizing concepts to guide coding focus. Concurrently, open coding of transcripts facilitated the induction of specific subthemes and expressive dimensions. Two researchers first conducted independent open coding, followed by discussion, comparison, and integration to form the final codebook. To ensure reporting consistency, 66 randomly selected interview semantic units (Reference Points) underwent two-coder review. The calculated Cohen’s κ was 0.77, indicating strong coding agreement. Discrepancies were resolved through discussion and incorporated into the final framework. This process yielded four core themes, twelve subthemes, and 33 code entries. Results presentation featured representative quotations, cross-referenced with corresponding subsections and quantitative findings for complementary interpretation and validation.
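The inter-coder agreement statistic reported above can be illustrated with a short Python sketch of Cohen's κ for two coders assigning one code per semantic unit: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected from each coder's marginal code frequencies. The code labels and assignments below are invented for illustration and are unrelated to the 66 units reviewed in this study.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two equally long lists of categorical codes."""
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement from the product of the two coders' marginal proportions.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical codes: V = visual, I = interaction, E = emotion, L = learning.
coder_a = ["V", "V", "V", "I", "I", "E", "E", "L", "L", "V"]
coder_b = ["V", "V", "I", "I", "I", "E", "E", "L", "V", "V"]
kappa = cohens_kappa(coder_a, coder_b)
```

Here the coders agree on 8 of 10 units, but κ is lower than 0.80 because part of that agreement would be expected by chance.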

3. Results

First, consistency tests were conducted on the GEQ scale. Cronbach’s α for all dimensions exceeded 0.70 (Table 3). The results indicate that under the experimental conditions, each questionnaire factor demonstrated high internal consistency [61], satisfying the requirements for experimental analysis.

3.1. Effect on Academic Performance Improvement (RQ1)

To evaluate the effectiveness of cultural heritage serious games in enhancing user learning outcomes, this study analyzed the pre- and post-intervention knowledge test scores of 30 participants. The results indicate that participants’ total knowledge test scores, as well as scores for knowledge retention and comprehension, showed significant improvement following the game intervention compared to pre-intervention levels.

Descriptive statistics (Table 4) show that the post-intervention total knowledge test score (M = 6.73, SD = 2.03) was significantly higher than the pre-intervention score (M = 3.93, SD = 2.08). For knowledge retention, post-intervention scores (M = 3.40, SD = 1.22) showed a significant improvement compared to pre-intervention scores (M = 1.53, SD = 0.97). For knowledge comprehension, post-intervention scores (M = 3.33, SD = 1.03) also showed a significant increase compared to pre-intervention scores (M = 2.40, SD = 1.38).

To further examine the significance of these differences, this study employed the Shapiro–Wilk test to assess the normality of the pre- and post-test score differences. The results indicated that the total knowledge test score differences followed a normal distribution (W = 0.964, p = 0.380). Therefore, a paired samples t-test was conducted. The t-test results revealed a significant difference in total knowledge test scores before and after the intervention (t(29) = −10.958, p < 0.001), with an extremely large effect size (Cohen’s d = 2.001). The differences in knowledge retention (W = 0.898, p = 0.008) and knowledge comprehension (W = 0.899, p = 0.008) did not follow a normal distribution; hence, they were analyzed using the Wilcoxon signed-rank test. The results showed that both knowledge retention scores (z = 4.765, p < 0.001) and knowledge comprehension scores (z = 4.128, p < 0.001) were significantly higher post-intervention than pre-intervention (Table 5). In summary, the findings indicate that participation in cultural heritage serious games significantly enhances users’ learning performance in cultural knowledge retention and comprehension, supporting research hypothesis H1. Item analysis and internal consistency test results for the assessment are summarized in Supplementary Materials S2.

3.2. Differences in Eye-Tracking Behavior Across Visual Elements (RQ2)

To investigate users’ attention allocation patterns toward different categories of visual elements during gameplay, this study examined ten key visual elements: menu icons, function icons, text boxes, historic buildings, spatial navigation, characters, NPCs, dialogue boxes, props, and architectural light and shadow shows. A systematic analysis was conducted based on six eye-tracking metrics: TFD, FC, TFF, FFD, AFD, and VC.

Descriptive statistics (Figure 6) revealed significant differences in gaze behavior across visual elements under multiple eye-tracking metrics. For TFD, props (M = 48.40, SD = 27.55) and text boxes (M = 47.38, SD = 25.76) exhibited the highest means, while menu icons (M = 0.59, SD = 0.84) had the lowest mean. FC showed a similar trend, with text boxes (M = 172.07, SD = 79.70), props (M = 146.80, SD = 84.01), and architectural light and shadow shows (M = 81.17, SD = 59.55) exceeding other elements. For TFF, architectural light and shadow shows (M = 296.74, SD = 114.05) required the longest time, whereas menu icons (M = 17.81, SD = 32.63) were captured most rapidly.

Shapiro–Wilk normality tests, together with histogram and Q–Q plot inspections, indicated no overall severe deviation from normality at the aggregated “subject × AOI type” level, with only mild skewness observed for a small number of AOIs on a few metrics. Mauchly’s sphericity tests showed that all metrics violated the sphericity assumption (p < 0.05); therefore, results were reported with Greenhouse–Geisser corrections. Repeated-measures ANOVA results (Table 6) revealed significant main effects of visual elements on all six eye-tracking metrics, with F values ranging from 2.853 to 71.397 (p ≤ 0.023) and partial η_p² values ranging from 0.090 to 0.711. Bonferroni-corrected post hoc comparisons further indicated that props, text boxes, historic buildings, and architectural light and shadow shows had significantly higher mean values for TFD and FC than most other elements, suggesting that these four categories primarily carried information and conveyed context throughout the experience. Architectural light and shadow shows also showed significantly higher TFF means than other elements, consistent with their role as reward-based feedback at the end of the sequence. Spatial navigation and function icons displayed lower mean values for TFD and FC than the four categories above, while menu icons and NPCs ranked lowest overall in gaze duration and fixation frequency. These results support H2, indicating systematic differences in eye-tracking metrics across visual element categories, with the highest visual attention directed toward props, text boxes, architectural light and shadow shows, and historic buildings.

3.3. Eye-Tracking Heat Map Distribution and Analysis

This study utilized the Tobii Pro Lab analysis platform to overlay eye-tracking data from 30 participants, generating a comprehensive visual attention heatmap. The heatmap uses a color gradient to indicate the concentration of gaze points, with red representing the longest fixation duration and highest attention intensity. Yellow and green denote areas receiving secondary attention, while transparent or colorless regions indicate areas that did not receive significant visual attention.

During the system interface and navigation phase, AOIs such as menu icons (a), function icons (b), and spatial navigation (e) showed relatively dispersed visual attention distribution in the heatmap, with overall low color intensity dominated by green tones. This indicates that users engaged in goal-oriented rapid scanning and target localization during this phase. This finding is consistent with the quantitative analysis, in which the mean TFD and FC values for these elements were at their lowest levels, demonstrating that the interaction design is efficient and minimally disruptive in guiding functionality.

During the exploration and mission execution phases, significant yellow-to-orange hotspots appeared in the heatmap across AOIs, including historic buildings (d), characters (f), NPCs (g), dialogue boxes (h), and props (i1–i3). Props (i1–i3), serving as core carriers of mission objectives, formed sustained and concentrated red hotspots. Historic buildings, representing the cultural heritage core within the virtual environment, elicited moderate gaze intensity, reflecting users’ active observation of architectural forms and details. Text boxes (c) generated high-density linear hotspots during information reading, indicating sequential reading behavior. In contrast, narrative elements such as NPCs, characters, and dialogue boxes elicited moderate gaze intensity, predominantly appearing in light yellow to green hues. This indicates that while they serve narrative and interactive functions, their visual appeal remains lower than that of high-feedback tasks and dynamic content. The spatial distribution of heatmap patterns explains the significant differences in gaze metrics across various visual elements. These findings further validate the quantitative analysis results at the spatial distribution level, providing intuitive visual support for H2 and H4.

During the reward phase (j) of the architectural light and shadow shows, the heatmap revealed a highly concentrated red core hotspot that nearly covered the animation display area. The results indicate that as a passive interactive visual reward element, the architectural light and shadow shows possess significant advantages in visual appeal and sustaining user attention. This finding corroborates the “visual impact” and “emotional motivation” experiences reflected in user interviews, providing further support for Hypothesis H4. It demonstrates that dynamic visual effects can effectively promote positive user emotions, immersion, and a sense of accomplishment. In summary, the eye-tracking heatmap reveals that user attention is spatially concentrated on task-relevant, information-dense, and dynamically responsive visual objects (e.g., props, text boxes, architectural light and shadow shows, and historic buildings), while functional and navigational elements exhibit attention allocation patterns at the periphery of visual cognition (Figure 7 and Supplementary Materials S4).

3.4. The Relationship Between Information Dynamism and Emotional Arousal (RQ3)

To examine the relationship between information dynamics and emotional arousal of visual elements across different interaction types and to assess interaction-type differences [58], this study analyzed the relationships between the transition entropy and stationary entropy of active and passive interaction visual elements, respectively, and each affective dimension of the GEQ using Spearman’s correlation coefficient [25]. Given that the correlation analysis is based on uncorrected p-values and is intended for exploratory identification of potential associations, the results presented herein do not support causal inference. As this study is exploratory and some visual element categories have small sample sizes (N ≈ 20–30), statistical power is limited. To maximize the identification of potentially important effects, this report retains results reaching the trend-level threshold (p < 0.10) as preliminary findings requiring further validation in future large-sample studies (Table 7).

Among active interactive visual elements, both transition entropy (r = 0.442, p < 0.05) and stationary entropy (r = 0.500, p < 0.05) of NPCs showed significant positive correlations with negative emotions, suggesting that increased visual complexity of NPCs may trigger stronger negative emotional responses in players. Furthermore, character transition entropy exhibited a significant negative correlation with flow experience (r = −0.372, p < 0.05), indicating that dynamic changes in this element may partially disrupt players’ immersive experiences. In addition to significant correlations, several marginally significant trends were observed. The transition entropy and stationary entropy of NPCs showed marginally significant positive correlations with boredom (r = 0.355, p = 0.082 < 0.10; r = 0.391, p = 0.053 < 0.10, respectively), while NPC stationary entropy showed a marginally significant positive correlation with fatigue (r = 0.375, p = 0.065 < 0.10). The transition entropy of function icons showed a marginally significant negative correlation with fatigue (r = −0.362, p = 0.054 < 0.10), while their stationary entropy exhibited a marginally significant positive correlation with competence (r = 0.314, p = 0.097 < 0.10). The transition entropy and stationary entropy of menu icons both showed marginally significant negative correlations with positive emotion (r = −0.402, p = 0.079 < 0.10; r = −0.440, p = 0.052 < 0.10, respectively). The transition entropy of menu icons also exhibited a marginally significant negative correlation with tension (r = −0.435, p = 0.055 < 0.10). Among passive interactive visual elements, only the stationary entropy of historic buildings showed a significant positive correlation with tension (r = 0.433, p < 0.05). The analysis results indicate that the dynamic information of active interactive elements is more strongly associated with emotional arousal than that of passive interactive elements, supporting Hypothesis H3 [62]. Consequently, the conclusions presented in this section primarily serve to propose mechanistic insights and directions for subsequent validation. Their robustness requires further testing with larger samples and appropriate multiple-comparison control procedures.
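To show what a multiple-comparison control procedure of the kind mentioned above involves, the following Python sketch implements the Benjamini–Hochberg step-up adjustment, the procedure named in Section 2.6.1 as a sensitivity check. The p-values in the example are invented for illustration and are not the values in Table 7. The function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order of the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical uncorrected p-values from four correlation tests.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
```

In this toy case two tests that are nominally significant at 0.05 move just above that threshold after adjustment, which illustrates why uncorrected exploratory correlations call for cautious interpretation.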

3.5. The Impact of Visual Elements on Interactive Behavior and User Experience (RQ4)

Using Spearman correlation analysis, this study examined the relationships between active interaction clicks, passive interaction eye-tracking metrics, and each dimension of the GEQ scale. The analysis (Table 8) revealed that, regarding active interaction behaviors, the interactive mechanism linking NPCs and dialogue boxes showed significant negative correlations with multiple dimensions of user experience. Specifically, NPC click counts showed significant negative correlations with positive emotion (r = −0.582, p < 0.01), immersion (r = −0.390, p < 0.05), sense of accomplishment (r = −0.423, p < 0.05), and tension (r = −0.374, p < 0.05). Clicks on the dialogue box showed significant negative correlations with immersion (r = −0.440, p < 0.05) and positive emotions (r = −0.405, p < 0.05). This indicates that while this interactive design aids task progression, it disrupts narrative flow, thereby negatively affecting emotional immersion and positive emotions [63]. In contrast, character clicks showed a significant positive correlation with sense of accomplishment (r = 0.455, p < 0.05), and character AFD also correlated positively with immersion (r = 0.361, p < 0.05). This suggests that, as a core interactive object, characters effectively enhance players’ goal attainment and immersive experience. However, character FC showed a significant negative correlation with competence (r = −0.463, p < 0.05), suggesting the need to optimize its visual feedback mechanism.

Eye-tracking analysis of active interactive elements (Table 9) further revealed that the AFD of function icons showed significant positive correlations with perceived fatigue (r = 0.403, p < 0.05) and perceived tension (r = 0.466, p < 0.01). In contrast, the AFD of FC showed a significant positive correlation with negative affect (r = 0.378, p < 0.05). The AFD of menu icons was negatively correlated with sense of accomplishment (r = −0.451, p < 0.05), while the AFD of spatial navigation was negatively correlated with negative emotions (r = −0.400, p < 0.05), indicating that clear navigation design helps reduce users’ negative experiences. The TFF of props showed a significant negative correlation with competence (r = −0.383, p < 0.05). These findings indicate that active interaction involves cognitive load, necessitating improved user experience through optimized information hierarchy and enhanced interactive feedback.

At the passive interaction level (Table 9), environmental narrative elements were associated with stronger emotional engagement. The FC of architectural light and shadow shows was significantly and positively correlated with positive emotions (r = 0.526, p < 0.01), sense of accomplishment (r = 0.375, p < 0.05), and immersion (r = 0.381, p < 0.05), suggesting that these shows promote users’ emotional engagement through atmosphere creation. Text box FC was also significantly and positively correlated with immersion (r = 0.417, p < 0.05), sense of accomplishment (r = 0.378, p < 0.05), and positive emotion (r = 0.363, p < 0.05). At the same time, text box FC showed a significant positive correlation with boredom (r = 0.378, p < 0.05), suggesting that information presentation needs to be optimized to enhance content appeal.

The findings indicate that active and passive interactive elements exert distinct effects on user experience through different behavioral patterns. These results support Hypothesis H4. Specifically, NPCs, dialogue boxes, and characters, as active interactive elements, significantly influence user experience, with their impact depending on the fluidity of the interaction process. Conversely, architectural light and shadow shows and text boxes, as passive interactive elements, significantly enhance users’ positive emotional engagement through visual gaze behavior.

3.6. Thematic Analysis of Interview Results

Through thematic analysis of interview transcripts from 30 participants, this study identified four core themes: learning performance, attention to visual elements, interactive experience, and emotional and engagement experience. The qualitative findings complemented the quantitative data, providing additional support for the research hypotheses.

Regarding learning outcomes, 21 respondents (70%) affirmed the game’s positive influence in facilitating the internalization of cultural heritage knowledge, noting that interactive tasks effectively directed their attention to architectural details and cultural symbols. For example, P8 stated, “Through some activities in the game, I was able to examine the ornamental features of these buildings in greater detail.” This finding aligns with Hypothesis H1, as the post-test knowledge scores significantly exceeded pre-test scores (p < 0.001). Additionally, 29 respondents (96.7%) reported that the game inspired their interest in revisiting sites in person. As P23 noted: “After playing this game, I notice more details and want to revisit the actual locations.” However, text overload was widely mentioned. Seventeen respondents (56.7%) indicated that lengthy text blocks caused fatigue and boredom, leading them to feel disconnected from visual elements. P16 commented, “Text should be broken up and integrated with scenes; otherwise, you forget the preceding content when viewing images.” This finding corresponds with quantitative analysis, which showed that text boxes had the highest FC and a positive correlation with “boredom.”

Regarding visual element engagement, respondent feedback indicated that environment-driven passive interaction elements, such as architectural light and shadow shows, effectively captured visual attention and evoked positive emotions. Twenty-one respondents (70%) described the architectural light and shadow shows as “stunning” and “eye-opening” (P15). P30 remarked, “The projection segment was incredibly impactful, significantly enhancing my sense of participation.” This aligns with quantitative findings that revealed a strong positive correlation between the architectural light and shadow show’s FC scores and positive emotions/immersion, supporting H4. In contrast, while active interaction elements like NPCs played a crucial role in task progression, 18 respondents (60%) noted deficiencies in interface guidance. These included unclear task icon meanings, illogical NPC positioning and orientation that caused spatial confusion (P17), and the absence of a more explicit mini-map navigation system (P23).

Regarding the interactive experience, users generally expressed high satisfaction with the interaction mechanisms, finding designs such as first-person exploration and location-triggered tasks intuitive and user-friendly (P14). However, 13 respondents (43.3%) observed that the current depth of interaction remained insufficient. They expressed a desire to move beyond shallow “drag-and-drop/right-or-wrong” interaction patterns, to include contextual explanations for cultural symbols, and to enhance the interactivity of historic buildings rather than treating them merely as static backdrops (P11, P30). P8 further suggested that “NPCs should allow diverse interactions with each character, akin to the open world of The Legend of Zelda.” This indicates that embedding interactive behaviors within narrative and cultural contexts is crucial for enhancing immersion and the depth of learning.

In terms of emotional engagement and participatory experience, most participants acknowledged the unique value of virtual experiences. Twenty-one respondents (70%) believed that the game provided perspectives and immersive environments difficult to achieve in real life. P5 remarked, “The real-life Ruins of St. Paul’s is packed with people, but the game version is empty—it instantly made me feel immersed.” The architectural light and shadow shows were identified as a core feedback mechanism. However, 10 participants (33.3%) suggested strengthening their connection to historical and cultural themes and introducing more diverse reward formats, such as virtual badges, to sustain a sense of achievement (P11, P18).

The thematic analysis deepened the understanding of the research questions from users’ subjective perspectives. The qualitative findings triangulated effectively with the quantitative results, revealing underlying causal mechanisms and the logic of user experience. These results provide empirical evidence and practical directions for optimizing visual-interaction-affective design in cultural heritage serious games.

4. Discussion

4.1. Enhancing Cultural Heritage Learning Through Serious Games

This study employed a pre- and post-test comparative analysis and found that users’ scores on cultural knowledge retention and comprehension tests significantly improved (p < 0.001) after experiencing the cultural heritage serious game, validating Hypothesis H1. The results indicate that the serious game intervention based on the “Macau Historic Centre Science Popularization System” effectively enhances learning outcomes in cultural heritage knowledge. This finding aligns with the educational value of serious games, which use gamification mechanisms to strengthen learning motivation and knowledge retention [64].
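The pre- and post-test comparison can be sketched as follows. This is a minimal illustration that assumes a paired design summarized with a paired t statistic and the paired-samples effect size d_z (one common variant of Cohen's d). The test the authors actually applied is reported elsewhere in the article. The function name and the scores below are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_summary(pre, post):
    """Summarize paired pre/post scores.

    Returns the mean gain, the paired t statistic, and d_z
    (mean of the differences divided by their standard deviation).
    """
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    t_stat = mean_diff / (sd_diff / sqrt(n))
    d_z = mean_diff / sd_diff
    return {"n": n, "mean_diff": mean_diff, "t": t_stat, "d_z": d_z}


# Hypothetical knowledge-test scores for six participants (illustrative only).
pre_scores = [5, 6, 4, 7, 5, 6]
post_scores = [8, 9, 6, 9, 8, 7]
summary = paired_summary(pre_scores, post_scores)
print(summary)
```

Reporting d_z next to the test statistic matches the article's stated preference for effect sizes over p-values alone at small sample sizes.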

The game adopts a progressive learning framework of “exploration–learning–reward,” integrating knowledge of historical architectural features and cultural symbols into first-person exploration, NPC dialogue interactions, and contextualized missions. This design provides users with an immersive learning experience and supports them in actively constructing knowledge through interaction, aligning with the “learning by doing” principle emphasized in experiential learning theory. Interview data further corroborate this conclusion. User feedback indicates that the game “helps notice architectural details previously overlooked” (P8) and “sparks interest in visiting sites in person” (P23). Thus, the game not only enhances knowledge retention but also fosters emotional engagement and motivation. This demonstrates that serious games can convey knowledge content and promote active learning behaviors through immersive experiences [10].

4.2. The Influence of Visual Elements on Interactive Behavior and Cognitive Resource Allocation

Eye-tracking data analysis revealed significant differences in user gaze behavior across different categories of visual elements (p < 0.001), supporting Hypothesis H2. Props, text boxes, architectural light and shadow shows, and historic buildings exhibited significantly higher TFD and FC values than other interface elements, indicating that these components more effectively captured users’ visual attention [65].

As the core vehicle for task objectives, props command significantly longer fixation durations than visual elements such as menu icons, reflecting users’ prioritization of functional components. Text boxes, which serve as primary carriers of cultural information, also attracted considerable attention, indicating that users allocate greater cognitive resources to textual reading and semantic processing [66]. Furthermore, the significant difference in First Fixation Duration (FFD) (p = 0.023) reveals users’ cognitive processing strategies across different visual hierarchies. Users exhibited extremely brief first fixations (0.16 s) on persistent UI elements such as menu icons, indicating strong intuitive recognition and low extraneous load. In contrast, spatial navigation (0.38 s) and character elements (0.32 s) elicited longer initial fixations, reflecting users’ need for immediate spatial orientation and semantic decoding when encountering narrative scene elements. This differentiated pattern of “UI rapid recognition” versus “scene deep processing” confirms that the interface design effectively established a visual hierarchy between functional and content layers. Architectural light and shadow shows, functioning as dynamic reward-based content [67], substantially extended user fixation duration through their visual appeal.
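The fixation metrics discussed above can be derived from a chronological list of fixations labeled by area of interest (AOI). The sketch below is a minimal illustration. It assumes TFD is the sum of fixation durations on an AOI, FC is the number of fixations, AFD is the mean fixation duration (an assumed expansion of the abbreviation), and FFD is the duration of the first fixation on that AOI. The function name, AOI labels, and durations are hypothetical.

```python
from collections import defaultdict


def aoi_metrics(fixations):
    """Compute per-AOI fixation metrics.

    fixations: list of (aoi_label, duration_in_seconds) in chronological order.
    """
    durations = defaultdict(list)
    for aoi, duration in fixations:
        durations[aoi].append(duration)
    return {
        aoi: {
            "TFD": sum(ds),            # total fixation duration
            "FC": len(ds),             # fixation count
            "AFD": sum(ds) / len(ds),  # average fixation duration (assumed meaning)
            "FFD": ds[0],              # first fixation duration
        }
        for aoi, ds in durations.items()
    }


# Hypothetical fixation stream for one scene (illustrative only).
stream = [
    ("prop", 0.30),
    ("menu_icon", 0.16),
    ("prop", 0.50),
    ("text_box", 0.40),
    ("prop", 0.40),
]
for aoi, m in aoi_metrics(stream).items():
    print(aoi, m)
```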

Based on interview feedback, users generally affirmed the visual appeal of architectural light and shadow shows, describing them as “impressive” (P15), but also noted that overly dense textual information can be “fatiguing” (P16). This aligns with cognitive load theory’s perspective that extraneous load impairs emotional engagement and learning motivation. Serious game design should therefore prioritize optimizing the information structure and presentation of high-attention elements to balance cognitive load and user experience. From the perspectives of cognitive load theory and the cognitive theory of multimedia learning, text boxes with high information density impose excessive extraneous load on limited working memory resources. This shifts attentional resources away from processing the cultural content itself toward managing textual form and interface structure, more readily inducing fatigue and boredom [26,27]. To optimize text processing, we recommend a “layered presentation” strategy that decomposes dense text into visually prominent core points and expandable details, combined with a “multimodal redundancy” design that provides brief voice narrations for key content so that the auditory channel shares cognitive load with the visual channel. In contrast, passive visual elements such as architectural light and shadow shows enhance emotional arousal and interest through atmospheric immersion and emotional engagement without significantly increasing intrinsic load, thereby indirectly boosting learning motivation and memory retention [26,28].

This reveals a critical trade-off in serious game visual design. Highly interactive, task-driven elements such as props effectively capture visual attention and promote deep cognitive processing but carry higher cognitive load and fatigue risks. Conversely, high-quality, narrative-driven passive elements such as architectural light and shadow shows transfer information less efficiently yet provide essential emotional buffering, immersive experiences, and memory consolidation. Designers should therefore not rely solely on any single type of element. Instead, they should establish a dynamic and rhythmic narrative balance between high-cognitive-load interactive tasks and high-arousal experiential scenarios, addressing both cognitive and emotional needs.

4.3. Differential Impact of Interactivity on Emotional Experience

This study analyzed the impact of active and passive interaction elements on users’ emotional experiences [68]. The findings provided exploratory evidence and mechanism clues supporting Hypotheses H3 and H4, suggesting potential differences in their emotional arousal mechanisms [62]. The dynamic nature of information in active interaction elements exerted a more pronounced effect on emotional arousal than that in passive interaction elements [69].

Active interactive elements play a critical role in advancing task progression, yet their design quality directly influences user experience. The analysis of NPC elements showed a significant correlation between their information dynamism and negative emotions. Specifically, NPC transition entropy exhibited a significant positive correlation with negative emotions (r = 0.442, p < 0.05). Transition entropy and stationary entropy both demonstrated marginally significant positive correlations with boredom (r = 0.355, p = 0.082; r = 0.391, p = 0.053), while stationary entropy showed a marginally significant positive correlation with fatigue (r = 0.375, p = 0.065). These findings indicate that NPC interaction processes characterized by insufficient fluidity and feedback delays may cause users to experience operational interruptions and cognitive resistance [63]. User comments in interviews, such as “NPC dialogues feel slightly lengthy” (P11) and “Unreasonable NPC positioning and orientation design leads to spatial cognitive confusion” (P17), further reflect this issue. These representative quotes suggest that high-frequency interactions may be accompanied by experiential friction and spatial orientation costs. Poorly designed dynamic interactions may necessitate more frequent visual searches and repeated confirmations, thereby increasing subjective stress and correlating with higher levels of negative emotional experiences. This provides a qualitative, mechanism-level explanation for the positive association between NPC information dynamism metrics and negative emotions.

From a flow theory perspective, excessively long and poorly paced dialogues, along with interactions that frequently interrupt exploration of the main storyline, disrupt the equilibrium of “clear goals—immediate feedback—smooth control,” thereby diminishing immersive experiences [28]. From an embodied cognition perspective, NPC spatial positioning and orientation that misalign with the player’s embodied movement paths and viewpoint disrupt the environmental spatial reference system, increasing cognitive load and disorientation [29,30]. Together, these accounts may explain the underlying mechanism by which NPC interactions more readily induce tension and frustration.

Analysis of character elements revealed a significant positive correlation between click frequency and sense of accomplishment (r = 0.455, p < 0.05), while AFD also exhibited a significant positive correlation with immersion (r = 0.361, p < 0.05), demonstrating their role in goal attainment and narrative engagement. However, FC showed a significant negative correlation with competence (r = −0.463, p < 0.05), while transition entropy demonstrated a significant negative correlation with flow experience (r = −0.372, p < 0.05). This indicates that character interactions require optimization in terms of visual feedback clarity and operational fluidity [34]. When information dynamics are excessively high without clear feedback guidance, it easily disrupts users’ flow state [28].

Analysis of function icons further reveals the complexity of how visual elements influence user experience, with information dynamism metrics and eye-tracking metrics exhibiting distinct patterns of impact. Regarding information dynamics, transition entropy showed a marginally significant negative correlation with fatigue (r = −0.362, p = 0.054), while stationary entropy exhibited a marginally significant positive correlation with competence (r = 0.314, p = 0.097). However, in eye-tracking metrics, AFD showed significant positive correlations with fatigue (r = 0.403, p < 0.05) and tension (r = 0.466, p < 0.01), while FC also correlated significantly with negative emotions (r = 0.378, p < 0.05). Therefore, function icon design must balance recognition efficiency with visual guidance. This requires enhancing icon intuitiveness to reduce cognitive load while refining information architecture to provide clear visual exploration pathways.

In contrast, passive interactive elements such as architectural light and shadow shows, as well as text boxes, demonstrated more positive emotional enhancement effects. The fixation count (FC) for architectural light and shadow shows was positively correlated with positive emotions, immersion, and a sense of accomplishment, indicating that, as highly arousing visual reward elements, they effectively strengthened users’ emotional engagement [70,71]. While text boxes enhance immersion and a sense of accomplishment [72], when text density is excessively high, their fixation count (FC) was positively correlated with boredom (p < 0.05), suggesting a need to optimize information presentation [66]. This finding aligns with cognitive load theory, which posits that excessive extraneous load can induce fatigue and boredom [26,27].

Therefore, serious game design should balance both types of interaction elements. First, for active interaction design, given the significant correlation between NPCs’ high information dynamism and negative emotions, the core design focus should be on reducing cognitive friction during interactions. Specifically, implementing a “low-friction interaction redesign” strategy is recommended: abandoning modal pop-ups that disrupt task flow and flow states in favor of non-modal, persistent dialogue bubbles for feedback. Simultaneously, optimizing NPCs’ spatial positioning and viewpoint orientation maintains cognitive consistency within the virtual environment. Thus, active interactions must prioritize operational fluidity and timely feedback to prevent negative emotions arising from interaction barriers. Second, passive interaction design should fully leverage its strengths in atmosphere creation and emotional motivation while reasonably controlling information load. This complements active interaction, synergistically enhancing the overall experience’s coherence and enjoyment.

4.4. Limitations and Future Work

This study has certain limitations. It did not include a delayed post-test one to two weeks later to assess delayed retention. Both the pre-test and post-test were completed within the same experimental session with an interval of approximately 30 min, so the results primarily reflect immediate learning gains. Future research could add delayed retention assessments and, building on them, transfer tasks. Regarding sample adequacy, while N = 30 aligns with typical eye-tracking experiment sample sizes, the complex experimental design involving both eye-tracking data collection and interviews prevented large-scale sampling and rigorous a priori power analysis. Consequently, interpretations rely primarily on effect size measures such as Cohen’s d and confidence intervals, rather than p-values alone. Future research may explore streamlining experimental procedures to increase sample size, thereby validating the robustness of this study’s conclusions.

This study has limitations regarding the system’s accessibility and universal design. The current game prototype was developed primarily for experimental purposes, prioritizing variable control and standardized information presentation within the experimental context. Consequently, the system does not yet integrate configurable accessibility features such as subtitle toggles, color schemes optimized for users with color vision deficiencies, or font and interface scaling controls adaptable to diverse visual needs. Future practice-oriented research should reference established accessibility design standards and best practices to systematically implement features like synchronized subtitles, optimized color contrast, and adjustable interface elements. These should be tested and validated across broader, more diverse user groups to enhance the fairness and universal value of digital cultural heritage experiences.

Nevertheless, this study ensured internal validity through specific sample controls. First, regarding sample composition, while participants were primarily recruited from a university setting (N = 30), their demographic characteristics exhibited internal diversity rather than representing a homogeneous group of young students. Participants ranged in age from 18 to 45, with mature individuals aged 26 and above constituting 70% of the sample; occupational backgrounds encompassed educators (26.7%), corporate employees (16.7%), and graduate students (56.6%). This cohort represents “knowledge-based users” possessing high digital literacy and cultural curiosity, constituting the core target audience for serious games on cultural heritage and immersive cultural tourism. Additionally, participants’ gaming habits spanned the full spectrum from “daily gamers” (20%) to “rarely play” (37%). Consequently, the study’s conclusions possess high ecological validity in explaining the cognitive and emotional patterns of this core user group.

Second, the study controlled for the “cultural familiarity” variable in its design, as all participants had prior experience visiting the Historic Centre of Macau in person. This control minimized confounding interference from “cultural unfamiliarity,” ensuring the experiment focused on testing the impact of the “visual-interaction mechanism” itself on cognition and emotion. While this may somewhat limit the direct generalizability of conclusions to groups without relevant background knowledge, it provides targeted empirical evidence for digital design strategies aimed at enhancing “post-visit experiences” or stimulating “revisit intentions.” Furthermore, while the serious game materials used in this study focused on Macau, the interaction mechanisms revealed here are grounded in general principles of human cognitive load and affective arousal. Examples include the way poorly designed active interaction elements (e.g., NPCs) can induce cognitive friction, whereas passive visual rewards (e.g., architectural light and shadow shows) effectively drive positive emotions. Consequently, these design principles are likely to apply across contexts and can inform digital design for other cultural heritage sites, including museums and archeological sites.

Statistical validation also presents limitations. Constrained by the sample size (N = 30) for high-precision eye-tracking experiments, current analyses such as correlation and analyses of variance sufficiently reveal significant associations between variables and meet standard statistical power requirements. However, the sample size remains insufficient to support complex multivariate mediation models like structural equation modeling (SEM) for rigorously quantifying mediating or causal effects across the full “visual–interaction–emotion–learning” pathway. Furthermore, this study was not formally preregistered. Consequently, the findings related to H3–H4 primarily serve as a basis for hypothesis generation and require further validation in future preregistered studies with larger samples.
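Because the H3 and H4 findings rest on many pairwise correlations, one way a follow-up study might tighten inference is to adjust the family of p-values for multiple comparisons. The sketch below shows the Holm step-down adjustment as one possible procedure. It is an assumption for illustration, not a procedure the authors report using, and the p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (rank k = 0, 1, ...) is scaled by (m - k).
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical family of correlation p-values (illustrative only).
raw_p = [0.004, 0.030, 0.045, 0.012]
for raw, adj in zip(raw_p, holm_adjust(raw_p)):
    print(f"raw p = {raw:.3f} -> Holm-adjusted p = {adj:.3f}")
```

Holm controls the family-wise error rate without assuming independence between tests. A false discovery rate procedure would be a less conservative alternative when the family of tests is large.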

Regarding the application of information dynamics metrics, interactive game data may exhibit temporal non-stationarity, potentially affecting entropy metric estimates based on global transition matrices. This study employs an intra-trial local stationarity assumption, calculating entropy values per trial and avoiding cross-trial mixed estimation to mitigate bias risks from global non-stationarity. Basic sequence quality control thresholds are also established, such as excluding trials with insufficient effective transitions from calculations, to mitigate estimation biases caused by short sequences. Future research may further incorporate methods like sliding-window entropy analysis or dynamic Markov models to characterize attention transfer structures at finer temporal granularity, thereby enhancing the depth of processing and interpretability for non-stationary data.
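The per-trial entropy estimation described above can be sketched as follows. This is a minimal illustration under stated assumptions. It uses a common formulation of gaze transition entropy based on a first-order Markov model, with the stationary distribution approximated by the empirical share of each AOI as a transition source. Self-transitions are kept, and the threshold `MIN_TRANSITIONS` and all names are hypothetical, since the study's exact threshold and preprocessing are not given here.

```python
from collections import Counter
from math import log2

MIN_TRANSITIONS = 5  # hypothetical quality-control threshold


def trial_entropies(aoi_sequence, min_transitions=MIN_TRANSITIONS):
    """Estimate stationary and transition entropy (in bits) for one trial.

    aoi_sequence: chronological list of AOI labels, one per fixation.
    Returns None when the trial has too few transitions to estimate from.
    """
    pairs = list(zip(aoi_sequence, aoi_sequence[1:]))
    if len(pairs) < min_transitions:
        return None  # excluded: sequence too short for a stable estimate

    source_counts = Counter(src for src, _ in pairs)
    pair_counts = Counter(pairs)
    total = len(pairs)

    # Empirical approximation of the stationary distribution over AOIs.
    stationary = {src: count / total for src, count in source_counts.items()}
    h_stationary = -sum(p * log2(p) for p in stationary.values())

    # Transition entropy: conditional entropy of the next AOI given the current one.
    h_transition = 0.0
    for (src, _dst), count in pair_counts.items():
        p_ij = count / source_counts[src]
        h_transition -= stationary[src] * p_ij * log2(p_ij)

    return {"stationary_entropy": h_stationary, "transition_entropy": h_transition}


# Entropy is computed per trial and never pooled across trials.
trials = {
    "trial_1": ["npc", "dialogue", "npc", "dialogue", "npc", "dialogue", "npc"],
    "trial_2": ["npc", "prop", "npc"],  # too short, excluded
}
results = {name: trial_entropies(seq) for name, seq in trials.items()}
print(results)
```

Computing the estimate separately for each trial mirrors the local stationarity assumption in the text. A sliding-window variant would apply the same function to overlapping sub-sequences within a trial.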

Future research may explore the following avenues. First, expand sample sizes and employ structural equation modeling for path and mediation analyses to formally test the specific effects of attention and emotion within the influence mechanism. Second, broaden participant diversity by including users with varying educational backgrounds, age groups, and cultural contexts to validate the applicability of current findings across broader audiences. Additionally, longitudinal experiments could track the long-term impact of serious games on cultural heritage learning. Concurrently, integrating multimodal data such as EEG and heart rate could provide a more comprehensive assessment of user experience. Finally, the theoretical framework should be validated and extended across more diverse cultural contexts and game mechanics. Exploring adaptive game design based on real-time user behavior data could enable personalized learning support.

5. Conclusions

This study employed a mixed-methods approach to systematically investigate how visual elements in cultural heritage serious games influence user interaction behaviors, cognitive resource allocation, and emotional experiences. Using the “Macau Historic Centre Science Popularization System” as an empirical platform, and integrating eye-tracking, behavioral log, questionnaire, and interview data, the following key conclusions were drawn:

Regarding cultural heritage learning performance, this study confirmed the effectiveness of serious games in cultural heritage education. Following the game intervention, users demonstrated significant improvements in cultural knowledge retention and comprehension (p < 0.001), confirming that game designs based on the “exploration–learning–reward” framework promote knowledge internalization and emotional engagement, supporting Hypothesis H1. Interview findings further indicate that the game not only enhanced short-term learning outcomes but also stimulated users’ long-term interest in exploring cultural heritage.

Regarding visual attention allocation, the study identified significant differences in how various visual elements capture user attention. Eye-tracking data revealed that four types of elements—prop objects conveying core task information and cultural narratives, text boxes, historic buildings, and reward-based dynamic feedback elements such as architectural light and shadow shows—exhibited significantly higher TFD and FC values than other interface elements (p < 0.001). Although the effect size for first fixation duration (FFD) was smaller (η_p² = 0.09), it remained statistically significant (p < 0.05), further revealing that users differentiated between function icons and narrative elements during the initial stages of visual processing. Overall results indicate that functionally relevant, information-dense, and dynamically presented visual elements more readily capture users’ cognitive resources, supporting Hypothesis H2.

Regarding the impact of interaction types on emotional experiences, this study reveals distinct mechanisms through which active and passive interaction elements influence user emotional responses. Results indicate that the dynamic nature of information in active interaction elements exhibits an association pattern with emotional arousal consistent with Hypothesis H3, providing exploratory trend evidence and mechanistic clues for H3. This association requires further validation with larger samples and appropriate multiple-comparison control procedures. However, when the interaction design involves complex processes or delayed feedback, users experience cognitive friction that negatively affects immersion. For instance, certain NPC dialogue interactions, characterized by excessive hierarchical levels and unclear guidance, caused operational interruptions and frustration, partially offsetting the positive effects of emotional arousal. Therefore, although active interaction elements play a key role in eliciting emotional responses, their overall user experience outcomes depend heavily on the fluidity and intuitiveness of interaction design.

In contrast, passive interaction elements demonstrated significant advantages in promoting positive emotional experiences. Architectural light and shadow shows and text boxes reveal differentiated effects consistent with Hypothesis H4 across experiential metrics such as positive emotions, immersion, and sense of accomplishment, providing exploratory evidence for H4. The robustness of these findings requires further validation through larger samples and more rigorous multiple-comparison controls. While text boxes aided cognitive immersion, their high information density correlated positively with frustration and boredom, indicating the need to optimize information presentation. These findings suggest that in serious game design, atmosphere creation and environmental storytelling are as important as functional operations. Optimization should focus on simplifying active interaction paths to minimize frustration while enhancing the visual expressiveness and emotional motivation of passive interaction elements. This approach can achieve a balanced integration of functionality and emotional engagement in the user experience.

At both theoretical and practical levels, this study integrates perspectives from cognitive load theory, flow theory, and embodied cognition. By combining eye-tracking data, information dynamics analysis, and user reports, it constructs a “visual elements–interaction behaviors–cognition–emotion” analytical framework, providing a multidimensional methodology and empirical foundation for serious games research in cultural heritage preservation. Methodologically, this study integrates traditional eye-tracking metrics with information dynamics indicators based on transition entropy, pre- and post-test learning performance, the GEQ game experience questionnaire, and semi-structured interview data. This enables multimodal, process-oriented measurement of user experience. Compared to existing research, which often relies on single questionnaires or static eye-tracking metrics, it systematically reveals the connections from visual presentation and behavioral responses to subjective experience and learning outcomes. Practically, using the World Heritage site of the Historic Centre of Macau as a case study, findings indicate that architectural light and shadow shows serve as passive visual elements with high emotional arousal and low interaction burden. Meanwhile, NPC interactions and text box designs require careful balancing between information density, interaction complexity, and cognitive load. Based on these findings, it is recommended that serious game design simplify active interaction processes, enhance the emotional motivation of passive elements, and adopt multimodal approaches to reduce textual cognitive load. These findings provide evidence-based, concrete guidance for the visual and interactive design of serious games and digital guide systems in similar heritage contexts.

Furthermore, this study offers significant implications for cultural heritage institutions’ digital policy development. First, in exhibition planning, a “digital-first” strategy can be adopted, positioning serious games as “cognitive scaffolding” for physical exhibitions. Leveraging online experiences to pre-construct visitors’ knowledge schemas enhances the depth of understanding and immersion during offline visits. Second, in content development and procurement, evaluation criteria centered on “cognitive usability” should be established, prioritizing cognitive load management and interaction fluidity. This approach effectively mitigates experience fragmentation and user attrition caused by poorly designed digital content in public cultural facilities.

This study’s limitations include a limited sample size, a relatively homogeneous participant background, and the absence of physiological indicator data. Future research could address these by expanding the sample size, conducting cross-cultural comparisons, and integrating physiological measurement methods such as electroencephalography (EEG) or galvanic skin response (GSR) to further explore the neural mechanisms underlying users’ cognitive and emotional responses. Furthermore, adaptive game mechanisms could be developed to dynamically adjust visual and interactive designs based on real-time user behavior data, thereby achieving personalized learning experiences.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/buildings16020323/s1, Supplementary Materials S1: The complete set of experimental stimuli, including additional scenes not presented in Figure 5. Supplementary Materials S2: Item analysis statistics and internal consistency reliability of the knowledge test. Supplementary Materials S3: Results of the KMO and Bartlett’s sphericity test for GEQ construct validity. Supplementary Materials S4: The complete heatmap dataset for all experimental scenes, supporting Figure 7.

Author Contributions

Conceptualization, P.Z.; Methodology, P.Z.; Software, Y.L. (Yao Lu); Validation, Z.W.; Formal analysis, Y.L. (Yi Lu); Investigation, P.Z., Y.L. (Yao Lu) and Z.W.; Writing—original draft, P.Z.; Writing—review & editing, P.W.; Supervision, Y.L. (Yi Lu); Project administration, P.W.; Funding acquisition, P.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding. The APC was funded by the authors.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Human Research Ethics Committee for Non-Clinical Faculties, City University of Macau (Reference No. 202505/13/1159; approval date: 13 May 2025).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study. The requirement for signed written informed consent was waived by the Ethics Committee due to the anonymous nature of the survey.

Data Availability Statement

Supporting data used in this study (including aggregated eye-tracking metric summaries and learning performance outcomes) are available from the corresponding author upon reasonable academic request. When providing data, we will share only de-identified or aggregated-level information and will not include any personally identifiable data. Given that the raw data contain video and behavioral records that involve participant privacy, they are not publicly available via data repositories at this time. The code and parameter configurations used for the information dynamics analyses are also available from the corresponding author upon reasonable academic request. Definitions of the relevant metrics and the calculation methodology are provided in Section 2.6.2.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Conceptual Framework of Visual Elements, Interaction, Cognition, and Affect.

Figure 2. Tobii Pro Glasses 3.

Figure 3. Overview of the experimental procedure.

Figure 4. Flowchart of the serious game’s task timeline and interaction process, distinguishing actions as active (mandatory) or passive (optional).

Figure 5. AOI of elements. All AOIs were manually drawn within the Tobii Pro Lab software based on the pixel boundaries of visual elements.

Figure 6. Viewers’ visual behavior data for different visual elements. (a) TFD, (b) AFD, (c) FC, (d) TFF, (e) FFD, and (f) VC. For the post hoc pairwise comparisons showing significant differences among specific visual elements (repeated-measures ANOVA with Bonferroni correction), see Table 6. The circles and asterisks in the boxplots represent outliers and extreme outliers, respectively, defined as data points located beyond 1.5 and 3 times the interquartile range (IQR) from the box edges.
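The outlier convention described in the caption of Figure 6 can be sketched as follows. This is a minimal illustration, not the authors' plotting code: the function name is hypothetical, and the quartile method (Python's "inclusive" quantiles) is an assumption that may differ slightly from the hinges used by the statistics package that produced the boxplots.

```python
import statistics


def classify_outliers(values):
    """Label each value 'normal', 'outlier' (> 1.5 IQR) or 'extreme' (> 3 IQR).

    Distances are measured from the nearest box edge (Q1 or Q3), as in the
    Figure 6 caption. The quartile method is an assumption.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    labels = []
    for v in values:
        # Distance beyond the box; 0 for values inside the box.
        distance = max(q1 - v, v - q3, 0.0)
        if distance > 3 * iqr:
            labels.append("extreme")   # drawn as an asterisk
        elif distance > 1.5 * iqr:
            labels.append("outlier")   # drawn as a circle
        else:
            labels.append("normal")
    return labels


# Hypothetical fixation durations (s); the last two lie far above the box.
example = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 40]
labels = classify_outliers(example)
```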

Figure 7. Distribution of visual attention heatmaps across Areas of Interest (AOIs). The color gradient indicates the concentration of gaze points: red represents the highest attention intensity, yellow and green denote secondary attention, and uncolored regions indicate no significant attention. The letter indices (a–j) correspond to the variable identifiers used in the statistical analysis in Table 6 for cross-referencing; (i1–i3) denote the different interactive props that serve as core carriers of mission objectives.

Table 1. The meaning of eye movement metrics.

Metrics	Indicator	Abbreviation	Basic Significance
Fixation metrics	Total fixation duration (s)	TFD	A higher TFD value indicates greater attention to the AOI and a greater investment in cognitive resources.
	Fixation count (no. of fixations)	FC	A higher FC value suggests more frequent visual sampling of the AOI, indicating greater information load or higher processing difficulty.
	Time to first fixation (s)	TFF	A shorter TFF indicates faster movement of the eyes to the AOI, suggesting stronger visual salience of the element.
	First fixation duration (s)	FFD	A longer FFD reflects greater initial processing depth of the AOI, or greater inherent attraction of the element.
	Average fixation duration (s)	AFD	A higher AFD value implies that the AOI conveys more complex information, requiring sustained cognitive processing.
	Visit count (no. of visits)	VC	A higher VC value indicates more frequent re-entry of the subject’s gaze into the AOI, reflecting the element’s reference value or attractiveness during visual exploration.
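
To make the definitions in Table 1 concrete, the sketch below derives all six metrics for one AOI from a time-ordered list of fixations. It is an illustration under stated assumptions: the `Fixation` record and `aoi_metrics` function are hypothetical names, times are measured from stimulus onset, and a visit is counted each time gaze enters the AOI from elsewhere. The eye-tracking software used in the study may apply additional filtering.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fixation:
    start: float           # seconds from stimulus onset (assumed reference point)
    duration: float        # seconds
    aoi: Optional[str]     # AOI label hit by the fixation, or None


def aoi_metrics(fixations, aoi):
    """Compute TFD, FC, TFF, FFD, AFD and VC for one AOI.

    `fixations` must be sorted by start time. Returns None if the AOI
    was never fixated.
    """
    on_aoi = [f for f in fixations if f.aoi == aoi]
    if not on_aoi:
        return None
    tfd = sum(f.duration for f in on_aoi)   # total fixation duration
    fc = len(on_aoi)                        # fixation count
    visits = 0
    previous = None
    for f in fixations:
        # A new visit starts whenever gaze enters the AOI from elsewhere.
        if f.aoi == aoi and previous != aoi:
            visits += 1
        previous = f.aoi
    return {
        "TFD": tfd,
        "FC": fc,
        "TFF": on_aoi[0].start,       # time to first fixation
        "FFD": on_aoi[0].duration,    # first fixation duration
        "AFD": tfd / fc,              # average fixation duration
        "VC": visits,                 # visit count
    }


# Hypothetical gaze record: two separate visits to the "prop" AOI.
record = [
    Fixation(0.0, 0.20, "menu"),
    Fixation(0.3, 0.25, "prop"),
    Fixation(0.6, 0.35, "prop"),
    Fixation(1.0, 0.20, None),
    Fixation(1.3, 0.30, "prop"),
]
prop = aoi_metrics(record, "prop")
```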

Table 2. Learning performance assessment questionnaire.

Assessment Factors	Closed-Ended Questions
Knowledge Retention	Q1. In what year was the Historic Centre of Macau inscribed as a World Heritage site?
	Q2. Approximately how many years spans the construction period of the Historic Centre of Macau from its earliest structures to the present day?
	Q3. What historical structure preceded the Ruins of St. Paul’s?
	Q4. Which of the following was not a name used by the Basilica of Our Lady of the Rosary?
	Q5. When does the Our Lady of Fatima Procession take place annually at St. Dominic’s Church?
Knowledge Comprehension	Q6. Which of the following buildings is not located within the Historic Centre of Macau?
	Q7. What core cultural elements does the historical axis of the Historic Centre of Macau primarily connect?
	Q8. Which architectural style do the Ruins of St. Paul’s belong to?
	Q9. Which of the following elements does not appear in the carvings of the Ruins of St. Paul’s?
	Q10. What is the primary color used on the exterior facade of the Basilica of Our Lady of the Rosary?

Table 3. Reliability Analysis of the Game Experience Questionnaire Scales.

Factor	Single-Choice Questions	Cronbach’s α
Engagement		0.931
Task Engagement		0.851
Boredom	Q1. I felt bored during the game. Q2. The task design was monotonous and repetitive. Q3. I find this game unappealing, and it does not spark my interest.	0.777
Achievement	Q4. I felt a sense of accomplishment after completing tasks. Q5. I felt proud of my in-game performance. Q6. The game rewards enhanced my satisfaction.	0.847
Competence	Q7. I felt skilled. Q8. I achieved the game goals quickly. Q9. I felt competent.	0.832
Fatigue	Q10. My energy decreased during the game. Q11. I felt tired after prolonged play. Q12. I felt weary after task completion.	0.808
Emotional Engagement		0.901
Immersion	Q13. I was fully immersed in the game. Q14. The game’s story was interesting. Q15. The game offered rich exploration.	0.787
Nervousness	Q16. The game made me feel annoyed. Q17. The task difficulty caused anxiety. Q18. The game pressure made me irritable.	0.870
Flow	Q19. I lost track of time while playing. Q20. Playing felt natural and effortless. Q21. I was unaware of my surroundings.	0.842
Positive Emotion	Q22. The game made me feel happy and excited. Q23. The game was interesting. Q24. I felt satisfied after playing.	0.833
Negative Emotion	Q25. My mind wandered during the game. Q26. The game was tedious and boring. Q27. I felt frustrated after the game ended.	0.738
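
The Cronbach’s α values in Table 3 follow the standard internal-consistency formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores), where k is the number of items in a factor. A minimal sketch is given below; the function name and the response data are invented for illustration and do not reproduce the study’s dataset.

```python
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha for one scale.

    `items` is a list of item-score lists: one inner list per question,
    each ordered by the same respondents.
    """
    k = len(items)
    sum_item_variances = sum(statistics.variance(scores) for scores in items)
    # Total score per respondent across all items of the scale.
    totals = [sum(responses) for responses in zip(*items)]
    return k / (k - 1) * (1 - sum_item_variances / statistics.variance(totals))


# Hypothetical 5-point responses of five participants to a three-item factor.
q1 = [1, 2, 3, 4, 5]
q2 = [2, 2, 3, 4, 4]
q3 = [1, 3, 3, 5, 5]
alpha = cronbach_alpha([q1, q2, q3])
```

Values above roughly 0.7 are conventionally read as acceptable reliability, which is the threshold all factors in Table 3 exceed.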

Table 4. Descriptive Statistics of Learning Performance (N = 30).

Factor	Pre-Test M (SD)	Post-Test M (SD)
Total Score	3.93 (2.08)	6.73 (2.03)
Knowledge Retention	1.53 (0.97)	3.40 (1.22)
Knowledge Comprehension	2.40 (1.38)	3.33 (1.03)

Table 5. Results of Significance Tests for Pre-Post Differences in Learning Performance.

Factor	Statistical Method	Statistic
Total Score	Paired-samples t-test	t(29) = −10.958
Knowledge Retention	Wilcoxon signed-rank test	z = 4.765
Knowledge Comprehension	Wilcoxon signed-rank test	z = 4.128

** indicates significant differences at the 0.01 level (p < 0.01).
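
As a worked illustration of the paired-samples t-test reported for the total score, the sketch below computes the statistic from per-participant scores and converts a reported t value into a standardized effect size (Cohen’s d_z = |t|/√n). The function names and the pre-minus-post sign convention (inferred from the negative t in Table 5) are assumptions; d_z is derived here for illustration and is not a value reported in the paper.

```python
import math
import statistics


def paired_t(pre, post):
    """Paired-samples t statistic and degrees of freedom.

    Differences are taken as pre minus post, so an improvement after the
    intervention yields a negative t (assumed convention).
    """
    diffs = [a - b for a, b in zip(pre, post)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


def cohens_dz(t, n):
    """Effect size for paired designs, recovered from t and sample size."""
    return abs(t) / math.sqrt(n)


# Hypothetical scores for five participants (not the study data).
t_value, df = paired_t([3, 4, 2, 5, 4], [6, 7, 4, 8, 8])

# Applied to the reported t(29) = -10.958 with N = 30: roughly 2.0.
dz_total_score = cohens_dz(-10.958, 30)
```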

Table 6. Results of one-way repeated-measures ANOVA for eye-tracking indicators across visual elements.

	TFD (s)	AFD (s)	FC	TFF (s)	FFD (s)	VC
F	55.493	4.809	71.397	67.550	2.853	56.602
df_GG	3.29	4.32	3.03	4.34	4.35	3.28
η_p²	0.657	0.142	0.711	0.700	0.090	0.661
p-value	<0.001	0.001	<0.001	<0.001	0.023	<0.001
Visual elements (AOIs)
(a) Menu Icon	0.59 (0.84) ^bcdefhij	0.17 (0.15) ^bdefhi	2.07 (2.08) ^bcdefhij	17.81 (32.63) ^bcdefghij	0.16 (0.14) ^bcdefi	2.00 (2.00) ^bcdefhij
(b) Function Icon	13.41 (7.59) ^acdegi	0.31 (0.13) ^a	42.60 (27.93) ^acdefgij	112.36 (44.43) ^acdefhij	0.28 (0.11) ^a	32.23 (18.72) ^aefgi
(c) Text Box	47.38 (25.76) ^abdefghj	0.27 (0.09) ^i	172.07 (79.75) ^abdefghj	143.18 (47.09) ^abfhij	0.27 (0.12) ^a	52.60 (31.54) ^aefghi
(d) Historic Building	21.34 (13.97) ^abcefgi	0.31 (0.13) ^a	61.93 (35.05) ^abcefgi	153.02 (50.13) ^abfij	0.28 (0.13) ^a	35.60 (17.16) ^aefghi
(e) Spatial Navigation	2.43 (2.23) ^abcdfhij	0.37 (0.26) ^a	6.27 (5.66) ^abcdfhij	169.53 (73.06) ^abj	0.38 (0.28) ^a	5.83 (5.07) ^abcdfhij
(f) Character	8.73 (6.21) ^acdeghij	0.35 (0.15) ^a	24.20 (17.54) ^abcdeghij	203.32 (69.07) ^abcdgj	0.32 (0.14) ^a	21.27 (14.05) ^abcdegij
(g) NPC	2.12 (2.32) ^bcdfhij	0.26 (0.18)	5.63 (5.28) ^bcdfhij	151.16 (90.13) ^afj	0.31 (0.36)	5.03 (4.50) ^bcdfhij
(h) Dialogue Box	14.85 (10.38) ^acefgi	0.29 (0.09) ^a	50.60 (34.09) ^acefgi	178.49 (63.85) ^abcj	0.27 (0.21)	20.97 (16.39) ^acdegij
(i) Prop	48.40 (27.55) ^abdefghj	0.32 (0.09) ^ac	146.80 (84.01) ^abdefghj	187.62 (50.31) ^abcdj	0.31 (0.09) ^a	92.20 (46.79) ^abcdefghj
(j) Architectural Light and Shadow Show	24.84 (18.61) ^acefgi	0.32 (0.26)	81.17 (59.55) ^abcefgi	296.74 (114.05) ^abcdefghi	0.26 (0.27)	43.03 (31.32) ^aefghi

Since none of the indicator data satisfied the sphericity assumption (Mauchly’s sphericity test p < 0.05), all F values and degrees of freedom (df_GG) are reported based on the Greenhouse–Geisser correction. Superscript letters following means in the table denote elements showing significant differences (p < 0.05) in pairwise post hoc comparisons adjusted using Bonferroni (a: menu icon; b: function icon; c: text box; d: historic building; e: spatial navigation; f: character; g: NPC; h: dialogue box; i: prop; j: architectural light and shadow show).
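
The quantities in Table 6 are linked by simple arithmetic, which the sketch below reproduces. It assumes a one-way repeated-measures design with complete data from all 30 participants across the 10 AOIs; the function names are hypothetical. Under that assumption the Greenhouse–Geisser epsilon is df_GG/(k − 1), the error degrees of freedom are df_GG × (n − 1), and partial eta squared reduces to F/(F + n − 1) because the correction factor cancels.

```python
from math import comb

N_PARTICIPANTS = 30   # N = 30, as in Table 4
N_ELEMENTS = 10       # AOIs (a) to (j) in Table 6


def gg_epsilon(df_gg, k=N_ELEMENTS):
    """Greenhouse-Geisser epsilon implied by a corrected effect df."""
    return df_gg / (k - 1)


def partial_eta_squared(f_value, n=N_PARTICIPANTS):
    """eta_p^2 = F*df1 / (F*df1 + df2), with df2 = df1 * (n - 1)."""
    return f_value / (f_value + (n - 1))


def bonferroni_alpha(alpha=0.05, k=N_ELEMENTS):
    """Per-comparison alpha when all k*(k-1)/2 pairs are tested."""
    return alpha / comb(k, 2)


eps_tfd = gg_epsilon(3.29)              # about 0.37 for TFD
eta_tfd = partial_eta_squared(55.493)   # about 0.657, matching Table 6
alpha_pairwise = bonferroni_alpha()     # 0.05 / 45 pairwise comparisons
```

Statistical packages often report Bonferroni-adjusted p-values (the raw p multiplied by the number of comparisons) rather than a lowered alpha; the two views are equivalent.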

Table 7. Correlation analysis between eye movement entropy and GEQ dimensions.

Interaction Type	Visual Elements	Evaluation Factor	Transition Entropy	Stationary Entropy
Active	NPC	Boredom	0.355 ⁺	0.391 ⁺
		Fatigue	0.375 ⁺	0.309
		Negative Emotion	0.442 *	0.500 *
	Character	Flow	−0.372 *	−0.329 ⁺
	Function Icon	Fatigue	−0.362 ⁺	0.070
		Competence	0.041	0.314 ⁺
	Menu Icon	Nervousness	−0.435 ⁺	−0.216
		Positive Emotion	−0.402 ⁺	−0.440 ⁺
Passive	Historic Building	Nervousness	0.181	0.433 *

This table reports uncorrected p-values and is intended to identify potential associations and suggest directions for subsequent validation: * p < 0.05 indicates significant evidence, and ⁺ p < 0.10 indicates a trend. The correlation analysis is exploratory. Future studies with larger samples and greater statistical power will apply multiple-comparison control (e.g., the Benjamini–Hochberg FDR procedure) to assess robustness.
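
Two ideas behind Table 7 and its note can be sketched in code: the entropy measures computed from an AOI fixation sequence, and the Benjamini–Hochberg procedure mentioned for future work. The sketch follows the general form of gaze transition entropy in Krejtz et al. (weighted row entropy of the AOI transition matrix) and approximates the stationary distribution by the share of fixations per AOI. That approximation, the function names, and the example data are assumptions; Section 2.6.2 remains the authoritative definition for this study.

```python
import math
from collections import Counter


def gaze_entropies(aoi_sequence):
    """Return (transition_entropy, stationary_entropy) in bits.

    `aoi_sequence` is a non-empty, time-ordered list of AOI labels.
    """
    total = len(aoi_sequence)
    # Stationary distribution approximated by fixation shares (assumption).
    pi = {aoi: c / total for aoi, c in Counter(aoi_sequence).items()}
    stationary = -sum(p * math.log2(p) for p in pi.values())

    pair_counts = Counter(zip(aoi_sequence, aoi_sequence[1:]))
    row_totals = Counter()
    for (src, _), count in pair_counts.items():
        row_totals[src] += count
    transition = 0.0
    for (src, _), count in pair_counts.items():
        p_ij = count / row_totals[src]   # P(next AOI = j | current AOI = i)
        transition -= pi[src] * p_ij * math.log2(p_ij)
    return transition, stationary


def benjamini_hochberg(p_values, q=0.05):
    """Flag hypotheses rejected at false discovery rate q (step-up rule)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank           # keep the largest rank that passes
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# A strictly alternating scan path is fully predictable: transition entropy 0.
h_t, h_s = gaze_entropies(list("ABABABAB"))
# Hypothetical p-values: only the smallest survives FDR control at q = 0.05.
flags = benjamini_hochberg([0.04, 0.001, 0.03, 0.5])
```

Higher transition entropy indicates less predictable switching between AOIs, while higher stationary entropy indicates attention spread more evenly across them.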

Table 8. Correlation Analysis of Active Interaction Visual Elements Click Behavior with GEQ.

Visual Elements	Evaluation Factor	Achievement	Immersion	Nervousness	Positive Emotion
Character	Click Behavior	0.455 *	0.319	0.154	0.171
NPC	Click Behavior	−0.423 *	−0.390 *	−0.374 *	−0.582 **
Dialogue Box	Click Behavior	−0.326	−0.440 *	−0.189	−0.405 *

** indicates p < 0.01, meaning the correlation is significant at the 0.01 level; * indicates p < 0.05, meaning the correlation is significant at the 0.05 level.

Table 9. Correlation Analysis of Visual Elements Interaction Behavior with User Experience.

Type	Visual Elements	Evaluation Factor	TFD	AFD	FC	TFF	FFD	VC
Active	Menu Icon	Achievement	−0.119	−0.451 *	0.016	−0.232	−0.415	0.048
	Function Icon	Fatigue	0.009	0.403 *	−0.126	−0.02	0.192	−0.215
		Nervousness	0.1095	0.466 **	−0.053	0.0116	0.349	−0.0813
		Negative Emotion	0.340	0.033	0.378 *	0.337	0.032	0.285
	Spatial Navigation	Negative Emotion	−0.184	−0.400 *	−0.002	0.118	−0.333	0.002
	Character	Competence	−0.329	0.310	−0.463 **	−0.407 *	0.129	−0.409 *
		Immersion	0.146	0.361 *	0.012	0.079	0.256	0.020
	NPC	Competence	−0.394	−0.335	−0.367	−0.357	−0.132	−0.347
		Flow	0.344	0.346	0.227	0.292	0.201	0.271
	Dialogue Box	Nervousness	−0.353	−0.029	−0.423 *	0.148	0.031	−0.351
	Prop	Competence	−0.029	0.124	−0.226	−0.383 *	0.108	−0.211
Passive	Text Box	Boredom	0.313	−0.063	0.378 *	0.243	−0.006	0.189
		Achievement	0.266	−0.250	0.378 *	0.219	−0.154	0.272
		Immersion	0.325	−0.152	0.417 *	0.175	−0.113	0.306
		Flow	0.284	0.199	0.283	0.023	0.367 *	0.025
		Positive Emotion	0.272	−0.258	0.363 *	0.262	−0.075	0.189
	Architectural Light and Shadow Show	Achievement	0.257	−0.165	0.375 *	0.022	−0.194	0.317
		Competence	−0.171	0.143	−0.205	−0.466 **	0.038	−0.349
		Immersion	0.298	−0.185	0.381 *	0.011	−0.283	0.312
		Positive Emotion	0.393 *	−0.277	0.526 **	−0.034	−0.359	0.443 *

** indicates p < 0.01, meaning the correlation is significant at the 0.01 level; * indicates p < 0.05, meaning the correlation is significant at the 0.05 level.
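
For readers who want to reproduce coefficients of the kind shown in Tables 8 and 9, the sketch below computes a rank-based (Spearman) correlation and the t statistic used to test a correlation against zero. Whether a given coefficient in the tables is Pearson or Spearman is not stated in this part of the article, so the choice here is an assumption, as are the function names and the sample size of 30 used in the example.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho is Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def correlation_t(r, n):
    """t statistic with n - 2 degrees of freedom for testing r against 0."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical fixation counts and questionnaire scores for five participants.
rho = spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
# r = 0.526 (light show FC vs. Positive Emotion), assuming n = 30.
t_light_show = correlation_t(0.526, 30)
```

With 28 degrees of freedom the two-tailed critical values are about 2.05 (p < 0.05) and 2.76 (p < 0.01), so a t of roughly 3.27 is consistent with the ** marker on that cell.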

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

