Article

The Art Nouveau Path: Longitudinal Analysis of Students’ Perceptions of Sustainability Competence Development Through a Mobile Augmented Reality Game

by João Ferreira-Santos * and Lúcia Pombo *
CIDTFF—Research Centre on Didactics and Technology in Education of Trainers, Department of Education and Psychology, University of Aveiro, 3810-193 Aveiro, Portugal
* Authors to whom correspondence should be addressed.
Computers 2026, 15(2), 86; https://doi.org/10.3390/computers15020086
Submission received: 23 December 2025 / Revised: 23 January 2026 / Accepted: 27 January 2026 / Published: 1 February 2026

Abstract

This paper presents a repeated cross-sectional longitudinal (trend) analysis of students’ self-perceived sustainability competence development across three waves surrounding participation in the Art Nouveau Path, a heritage-based mobile augmented reality game implemented in Aveiro, Portugal, and designed to foster sustainability competences. In total, 1094 questionnaires were collected using a GreenComp-grounded instrument adapted from the GreenComp-based Questionnaire (GCQuest) to this context (25 items; 6-point Likert). Data were gathered at three stages: pre-intervention (S1-PRE; N = 221), immediately post-intervention (S2-POST; N = 439; n = 438 retained for scale scoring after applying a predefined completeness criterion), and follow-up (S3-FU; N = 434). Because responses were anonymous, waves were treated as independent samples rather than within-student trajectories. The Embodying Sustainability Values domain score and item-level response distributions were compared across waves using ordinal-appropriate non-parametric group comparisons, effect-size estimation, and descriptive threshold indicators. Results indicate an improvement from pre-intervention to post-intervention, followed by partial attenuation at follow-up while remaining above pre-intervention levels. Mean scores increased from 3.70 (S1-PRE) to 4.64 (S2-POST) and then declined to 4.13 (S3-FU). Findings, while exploratory, suggest that this heritage-based augmented reality game may have enhanced perceived sustainability competences. A structured program of follow-up activities is proposed to help sustain gains.

1. Introduction

Education for Sustainable Development (ESD) calls for learning experiences that help students develop not only knowledge, but also the values, attitudes, and dispositions required to support sustainability transitions [1,2]. Within this agenda, cultural and built heritage is increasingly understood as more than an object of conservation: it functions as a situated, value-laden resource through which learners can debate what societies choose to preserve, for whom, and with what responsibilities across generations [3,4,5]. In historic urban landscapes, everyday architectural traces can make sustainability tensions tangible, anchoring discussions about stewardship, identity, social cohesion, and the public value of place.
In parallel, mobile augmented reality (AR) has been adopted to connect digital content to real-world contexts and to support outdoor, inquiry-oriented learning. Evidence syntheses indicate that AR can support motivation and engagement and can enable multimodal interaction when activities occur in authentic settings such as cities, museums, and heritage sites [6,7]. When combined with game mechanics, Mobile Augmented Reality Games (MARGs) can structure learning as place-based quests and collaborative challenges, making interaction design and attention orchestration between the screen and the environment central to the educational value of these interventions.
To support competence-oriented ESD, the European Commission’s Competence Framework for Sustainability, known as GreenComp, provides a shared reference for articulating sustainability competences across educational levels. GreenComp defines twelve interrelated competences organized into four competence areas: Embodying Sustainability Values, Embracing Complexity in Sustainability, Envisioning Sustainable Futures, and Acting for Sustainability [8]. While GreenComp serves as a policy and curriculum reference, two challenges remain salient for empirical work in authentic contexts: embedding competence descriptors into concrete learning tasks and assessing competence development in ways that remain meaningful under real educational limitations, including the frequent need for anonymous participation [9].
This paper focuses on the Art Nouveau Path, a heritage-based MARG implemented in Aveiro, Portugal, and developed within the EduCITY digital teaching and learning ecosystem (DTLE) [10,11,12]. The MARG guides students through a curated urban path and uses georeferenced points of interest (POI), AR visual overlays, and question-driven challenges to connect built heritage interpretation with sustainability themes. Prior work has reported the game’s design rationale, its alignment with GreenComp, and its validation with teachers, alongside pre-intervention student diagnostics on sustainability awareness and interest in learning through Art Nouveau heritage. Collectively, this body of work supports the feasibility and pedagogical plausibility of using heritage-based mobile AR to advance ESD aims in situ. However, a key evaluation gap persists in the broader AR-in-education literature: evidence is frequently derived from short-term evaluations, making it difficult to judge whether perceived competence-related benefits persist beyond the immediate experience, and such benefits may be confounded by novelty effects [6,7].
In cultural heritage AR, recent syntheses similarly note that evaluation is frequently framed around immediate experience and diverse outcomes [6,7], while the methodological basis for design and evaluation remains uneven, reinforcing the need for more explicit and robust evaluation approaches in authentic heritage contexts [13]. This limitation is especially consequential for the value-related aspects of sustainability competences, where durable change is likely to require reinforcement through continued reflection and opportunities to act.
This study addresses this gap using a repeated cross-sectional longitudinal (trend) design focused on students’ self-perceived sustainability competences. Repeated cross-sectional designs are commonly used to estimate population-level change over time when individual linkage is infeasible, and they provide a pragmatic alternative to panel designs in school contexts where anonymity, attrition, and respondent conditioning can threaten inference [14,15,16,17]. In this study, self-perceived competences were assessed using a GreenComp-grounded questionnaire adapted from the GreenComp-based Questionnaire (GCQuest) [18] to the Art Nouveau Path context.
Data were collected at three stages surrounding the intervention: pre-intervention (S1-PRE; N = 221), immediate post-intervention (S2-POST; N = 439 collected; n = 438 retained for scale-based analyses after applying a predefined completeness criterion), and follow-up (S3-FU; N = 434), yielding 1094 questionnaires overall (1093 retained for scale-based analyses). Because participation was anonymous, waves were treated as independent samples, enabling cohort-level comparisons across time while avoiding claims about within-student trajectories. This analysis focuses, in line with GCQuest [18], on the GreenComp competence area ‘Embodying Sustainability Values’, which foregrounds valuing sustainability, reflecting on responsibility, and aligning intentions and actions with sustainability principles [8].
By examining domain scores, item-level trajectories, and threshold-based patterns (for example, shifts in the proportion of students reaching higher perceived competence bands), this study clarifies which value-related perceptions are most responsive immediately after the experience and which appear more sustained over time. Findings aim to inform both the curricular integration of heritage-based mobile AR experiences in ESD and the methodological discussion on how competence-oriented outcomes can be studied under realistic educational context and constraints using repeated cross-sectional datasets [14,15,19].
The significance of this paper is threefold. First, the three-wave trend design extends evaluation beyond immediate post-activity measurement, clarifying whether perceived sustainability values show attenuation or persistence at follow-up in a school-based heritage MARG. Second, the repeated cross-sectional approach provides a transparent evaluation strategy under anonymity and logistical constraints, supporting cohort-level monitoring without within-student linkage. Third, combining domain-level, item-level, and threshold-based indicators makes value-related response patterns more legible for curricular integration and for designing reinforcement opportunities that support consolidation over time.
Accordingly, this study addresses the following research questions (RQ):
RQ1. How do students’ perceived sustainability competences in the GreenComp competence area Embodying Sustainability Values evolve across the three waves surrounding the Art Nouveau Path, from pre-intervention (S1-PRE) to immediate post-intervention (S2-POST) and follow-up (S3-FU)?
RQ2. How does the proportion of students reaching higher perceived competence bands (for example, scores ≥4.0 and ≥4.5 on the six-point scale) in Embodying Sustainability Values change between pre-intervention, post-intervention, and follow-up?
RQ3. Which GCQuest items within Embodying Sustainability Values show the largest and most sustained changes between waves, and what item-level response patterns emerge when comparing pre-intervention, post-intervention, and follow-up?
This paper is organized into six sections. Following the Introduction, Section 2 presents the theoretical framework, Section 3 describes the materials and methods, Section 4 reports the results, Section 5 discusses the findings in relation to the research questions and prior work, and Section 6 concludes with implications, limitations, and directions for future research.

2. Theoretical Framework

This broader research, and this study in particular, sits at the intersection of digitally mediated heritage education, MARG-based learning, and competence-oriented evaluation in ESD and in Education for Sustainability (EfS). The framework clarifies (i) why built heritage is a credible context for value-oriented sustainability learning, (ii) why MARGs are an appropriate intervention format in authentic urban settings, and (iii) why a repeated cross-sectional self-report design is a methodologically transparent strategy for examining persistence and attenuation when anonymous participation prevents within-student linkage.

2.1. Cultural Heritage Education and Digital Mediation

Cultural heritage education is currently understood not as a static compilation of monuments and knowledge, but as a lived, negotiated resource that can help communities and learners construct significance [20]. This entails approaching heritage not only through a conservation lens but also as a shared space of interpretation, conversation, and assessment that weighs today’s necessities against tomorrow’s commitments [21,22]. Policy and heritage research have similarly emphasized the social value of cultural heritage and its role in supporting citizenship, inclusion, and public responsibility, particularly when heritage is approached as a shared reference for societal choices and intergenerational care [3]. In urban contexts, built heritage provides tangible, place-based anchors that can connect identity and memory with stewardship, public value, and responsibility across time [23,24].
This orientation is especially relevant to ESD because sustainability dilemmas are inherently normative and involve decisions about what should be sustained, for whom, and under which value assumptions. A heritage lens can make these dilemmas concrete by setting them in everyday environments where learners can observe traces of past choices and debate their implications for present and future urban life [5]. It also aligns with values-based approaches to heritage management, which foreground how values are articulated, contested, and negotiated in real decision contexts [25].
Digital mediation extends heritage learning ecologies by adding interpretive layers that can be aligned with curricular goals and learners’ situated actions. In outdoor heritage learning, mobile technologies can scaffold attention to architectural details, provide contextual narratives, and prompt learners to observe, compare, and discuss features that might otherwise remain unnoticed. Digital mediation is not merely informational: in sustainability-oriented designs, it can be structured to elicit reflection on responsibility, trade-offs, and collective decision-making by inviting learners to link local heritage to broader sustainability themes [3,5]. This is particularly relevant in historic urban landscapes, where the negotiation between conservation, tourism, economic development, and climate adaptation often surfaces value tensions that are visible to learners in situ [23,26].
AR can embed interpretive prompts directly in place, linking digital representations to embodied presence and supporting place-based inquiry and dialog [27]. When aligned with coherent pedagogy and facilitation, AR can help connect observation and interpretation to value-oriented dimensions of ESD that are difficult to address through classroom-only approaches.

2.2. AR and Mobile Game-Based Learning

AR has been widely discussed as an educational medium because it overlays digital content onto real environments, enabling learners to interact with information that is spatially and contextually anchored. Evidence syntheses report that AR is frequently associated with affordances for multimodal and inquiry-oriented activities in authentic contexts such as outdoor environments, museums, and heritage sites [6,7]. These affordances are relevant for heritage learning, where noticing and interpreting physical features in context is central to meaning-making [28,29].
At the same time, research consistently emphasizes that educational value depends on design quality and orchestration. AR experiences can introduce extraneous cognitive load through split attention, interface complexity, or novelty effects, potentially shifting learners’ effort toward operating the technology rather than engaging with the intended conceptual or reflective task. In outdoor settings, orchestration demands are amplified by navigation, time constraints, weather, group coordination, and teacher supervision, making scaffolding and goal clarity critical to avoid experience without learning [30].
The combination of AR content and Game-Based Learning (GBL) elements such as quests, feedback, rules, collaboration, and narrative progression underpins the value of MARGs as educational tools [31,32]. In heritage and sustainability contexts, this format can transform urban space into an interactive learning environment where learners move, observe, negotiate meanings with peers, and solve challenges in situ. In competence-oriented designs, game mechanics should function as purposeful learning structures rather than motivational wrappers, eliciting processes such as evidence-based interpretation, collaborative justification, and value-oriented reasoning. When properly designed, MARGs can enhance learning outcomes; this depends on how interaction is designed with respect to elements such as navigation cues, information density, flow and aims, feedback content and timing, and the division of attention between the screen and the physical environment [33].
Notably, AR and MARG studies in heritage contexts often emphasize immediate engagement and user experience, while offering limited evidence about whether competence-related perceptions persist beyond the intervention moment [7,34]. In cultural heritage AR specifically, methodological reviews argue for greater clarity and consistency in design and evaluation approaches, motivating designs that extend beyond immediate post-activity impressions [13].

2.3. Educational Data Mining and Learning Analytics as an Evaluation Lens

Educational Data Mining (EDM) and Learning Analytics (LA) provide complementary lenses for understanding learning in technology-mediated environments and for supporting iterative improvement of digital learning designs. In GBL, these approaches are frequently associated with digital trace data and with game LA, which combine visual and data mining techniques to better understand player learning and improve serious games [34,35,36]. In the present study, EDM/LA are discussed as background but are not applied as trace-based analytic methods; the empirical analyses rely on repeated cross-sectional questionnaire data.
More broadly, LA has explored evaluation approaches that remain anchored in learning constructs and that report evidence transparently and meaningfully rather than relying on novelty or engagement alone [37]. Recent systematic reviews have also mapped out how analytics can support evaluation and design refinement in serious games and GBL implementations in formal education [38,39].
Regarding this study, the most relevant contribution of the analytics literature is methodological rather than trace-based. A competence-oriented evaluation stance requires evidence aligned with a competence model, measurable at scale, and sensitive to temporal dynamics, so repeated measurement can differentiate immediate post-intervention salience effects from more sustained shifts. In settings where anonymity prevents within-participant linkage, repeated cross-sectional designs are a recognized strategy for estimating population-level change over time and are often preferred when attrition or respondent conditioning make panel designs impractical [14,15]. Comparative methodological work also highlights that design choices can shape trend conclusions, underscoring the need for transparent reporting and careful interpretation [40,41].
In practical terms, this motivates distribution-aware analyses of Likert-type outcomes, effect-size reporting, threshold-based analyses, and item-level pattern inspection across waves, which map directly onto RQ1 to RQ3 [15,19]. On this basis, the analytics perspective in the present study is mobilized as an evaluation stance, justifying repeated measurement and transparent reporting.

2.4. Measuring Sustainability Competences Through Perceived Competence

Considering the relevance of sustainability competences including affective and normative domains [8], self-report instruments remain common in education research, particularly when scalable measurement across cohorts is required. In the Art Nouveau Path context, perceived competence is treated as a meaningful outcome, reflecting learners’ self-assessed readiness, value orientation, and intention to align choices with sustainability principles [42]. However, self-reports also introduce known constraints, including social desirability and shifting reference frames. These constraints strengthen the need for careful interpretation and for study designs that avoid overclaims about individual development when anonymity prevents within-student linkage [43]. Repeated cross-sectional designs offer a pragmatic approach in such settings by supporting trend-focused inference while respecting ethical and practical constraints, as illustrated in large-scale school survey protocols that deliberately use repeated anonymous self-report waves [14,16,17]. In the broader research project, multi-method and multi-informant (e.g., teachers) approaches were employed to mitigate these constraints [10,11,12,29].
Beyond its policy framing, recent GreenComp-aligned work has increasingly focused on operational translation and use across diverse learning settings. For example, learning-outcome formulations have been proposed to render GreenComp competences assessable and curriculum-facing [9]. In parallel, uptake evidence has been documented across formal and non-formal contexts, including competence-oriented implementation case studies and the development of a GreenComp-based “conversational game” resource explicitly designed for playful learning discussions [44,45]. In secondary education, GreenComp has also been used as a conceptual basis for validated assessment development, illustrating the wider shift from framework adoption to measurable evaluation instruments [46]. This emerging strand is particularly relevant for outdoor and gamified learning, where competence frameworks support explicit alignment between situated tasks, value-laden prompts, and evaluation choices under authentic orchestration constraints.
In the present study, sustainability competence development is examined through students’ self-perceptions using a GreenComp-grounded questionnaire adapted from GCQuest [18] to the Art Nouveau Path context. Consistent with the manuscript’s analytical focus and research questions, results are reported for the competence area Embodying Sustainability Values (ESV), capturing perceived valuing of sustainability, sense of responsibility, and intentions aligned with sustainability principles [8]. ESV is particularly pertinent in heritage-based interventions because heritage interpretation is inherently value-laden and invites reflection on responsibility and care, making it an appropriate target for examining persistence and attenuation across pre-intervention (S1-PRE), post-intervention (S2-POST), and follow-up waves (S3-FU).

2.5. Synthesis

Together, these strands clarify the rationale for examining a heritage-based MARG through repeated cross-sectional perceived-competence data. Heritage education and digital mediation justify built heritage as a value-laden learning context aligned with ESD, AR and MARG research motivates the intervention format while highlighting orchestration constraints that can affect learning quality, analytics perspectives motivate quantitative evaluation beyond immediate post-activity impressions, and GreenComp provides the competence model and measurement focus. This positioning aligns the present study with GreenComp-based efforts to move from framework adoption to operational, assessable and context-sensitive implementation, including playful and game-based formats [9,44,45].
Methodologically, repeated cross-sectional designs are a well-established strategy for estimating population-level change across waves when anonymity prevents individual linkage, and they are particularly appropriate in school-based research where ethical and practical constraints make panel tracking infeasible [14,15,16,17]. This integrated framework directly supports the study’s gap and research questions by justifying why ESV is an appropriate analytical focus for a heritage-based ESD intervention and why a three-wave trend design is a transparent strategy for examining persistence and attenuation across pre-intervention (S1-PRE), post-intervention (S2-POST), and follow-up (S3-FU) [19,41].

3. Materials and Methods

3.1. Research Design and Study Procedures

This study reports the quantitative survey component of a broader Design-Based Research case study [47,48,49,50] centered on the Art Nouveau Path, a heritage-based MARG implemented in Aveiro, Portugal, within the EduCITY DTLE. The present work isolates a three-wave questionnaire dataset to examine how students’ perceived sustainability competences vary across measurement moments surrounding participation in the intervention, addressing the persistence and attenuation gap identified in this study.
Methodologically, a repeated cross-sectional (trend) design was adopted. Data were collected at three measurement waves aligned with the same intervention format: pre-intervention prior to participation (S1-PRE), immediately after the game session (S2-POST), and a later follow-up moment (S3-FU). Considering that questionnaires were anonymous and administered in educational settings without any individual identifier, responses could not be linked across waves. Accordingly, the three waves were treated as independent samples, supporting cohort-level comparisons over time while avoiding claims about within-student developmental trajectories [14,15,19].
Data collection followed a sequential procedure aligned with the intervention timeline. At pre-intervention (S1-PRE), students completed the questionnaire before participating in the Art Nouveau Path session. The outdoor activity then took place, typically in small collaborative groups (3 to 4 members) moving between POIs and completing place-based tasks on EduCITY Project mobile devices; immediately after the session, students completed the post-intervention questionnaire (S2-POST). The intervention consists of the outdoor Art Nouveau Path gameplay session and occurs between S1-PRE and S2-POST (Figure 1). At follow-up (S3-FU), the questionnaire was administered in class to capture medium-term patterns in perceived competences, approximately six to eight weeks after participation. Given anonymous administration and school scheduling constraints, the design involves partially overlapping cohorts rather than individually matched observations. Across waves, administration followed standard anonymous survey practices in educational settings [14,16,17].
An overview of the three-wave repeated cross-sectional design and the intervention-aligned measurement moments used in this study is presented in Figure 1.
Prior works detail the intervention rationale, design decisions, and ecosystem integration [10,11,12,29]. The present manuscript focuses on three-wave questionnaire evidence to address persistence and attenuation beyond immediate post-intervention measures; the intervention setting is summarized below.

3.2. Context and Intervention Setting

Data were collected with students during the implementation of the Art Nouveau Path, delivered as a location-based outdoor activity in Aveiro, Portugal, through the EduCITY DTLE [10,11,12,29]. The intervention was structured as a curated urban path comprising eight georeferenced points of interest (POIs) associated with Aveiro’s Art Nouveau built heritage. At each POI, students engage with place-based prompts and challenge items delivered via mobile devices, combining in situ observation of architectural features with digital interpretive content, optional AR, and quiz-driven tasks. Across the eight POIs, the MARG comprises 36 quiz items (coded P1.1 to P8.2) delivered through multimodal resources (for example, archival photographs, short videos, and AR overlays anchored to monuments and facades), with response submission via the app. Students typically completed the activity in teacher-formed collaborative groups (three to four members), producing one group response per item; items were designed to mix observation-based and conceptually demanding prompts rather than to follow a strict difficulty progression. For the purposes of this paper, the intervention is described at a functional level to contextualize why data collection occurred in authentic outdoor school conditions and why anonymous participation and independent-wave sampling were required. Full descriptions of game interaction design decisions and example task instances are available in prior publications [10,11,12,29].

3.3. Participants

Participants were students recruited through school-based implementations of the Art Nouveau Path in Aveiro, Portugal, within the broader EduCITY project, via the Municipal Educational Action Program of Aveiro (PAEMA, 2024/2025 edition) [51]. Participation was voluntary, resulting in a convenience sample.
The target population, defined by the MARG’s curricular alignment, comprised lower and upper secondary school students (grades 7 to 12), with an approximate age range of 13 to 18 years. During the on-site implementation of the Art Nouveau Path sessions (S2-POST wave), 439 students participated in the intervention session and completed the post-intervention questionnaire. They were distributed across 19 classes and six grade levels (7th: N = 19; 8th: N = 135; 9th: N = 156; 10th: N = 37; 11th: N = 20; 12th: N = 72), mainly from urban and peri-urban schools. No data on gender or socio-economic background were collected. Questionnaires were administered anonymously and did not capture respondent-level age, grade, or school identifiers.
For contextual completeness regarding the intervention setting, students typically completed the outdoor game session in collaborative groups of three to four members, as organized by accompanying teachers.

3.4. Data Entry, Questionnaire Waves, and Analytical Samples

Across the three questionnaire waves (S1-PRE, S2-POST, S3-FU), a total of 1094 questionnaires were collected, namely, pre-intervention, prior to gameplay (S1-PRE; N = 221), immediate post-intervention (S2-POST; N = 439 collected), and follow-up after participation (S3-FU; N = 434). Because no respondent-level covariates were recorded, analyses are unadjusted (that is, no covariate control).
To ensure interpretable and stable domain-level scores, a completeness criterion was applied: respondents were included in scale-based analyses if they provided valid responses for at least 20 of the 25 Likert-type items. Applying this criterion led to the exclusion of one respondent in S2-POST and none in S1-PRE or S3-FU, yielding final analytic sample sizes of S1-PRE = 221, S2-POST = 438, and S3-FU = 434, for a total analytic dataset of 1093 responses. The excluded questionnaire had 7 unanswered Likert-type items.
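For concreteness, the completeness criterion can be expressed in a few lines of code. The sketch below is in Python for readability only (the authors report executing the actual pipeline in R); the data layout, with missing answers represented as None, is an assumption for illustration, not the study’s actual coding scheme.

```python
# Illustrative sketch (hypothetical data layout) of the predefined
# completeness criterion: a respondent is retained for scale-based
# analyses if at least 20 of the 25 Likert-type items carry a valid
# response, i.e., an integer in 1..6; missing entries are None here,
# mirroring the "NA" coding used during data entry.

N_ITEMS = 25
MIN_VALID = 20

def n_valid(responses):
    """Count responses that fall within the 1-6 Likert range."""
    return sum(1 for r in responses if isinstance(r, int) and 1 <= r <= 6)

def meets_completeness(responses):
    """True if the questionnaire qualifies for scale-based analyses."""
    assert len(responses) == N_ITEMS
    return n_valid(responses) >= MIN_VALID

# The single excluded S2-POST questionnaire had 7 unanswered items,
# leaving 18 valid responses, below the 20-item threshold.
fully_answered = [4] * N_ITEMS
seven_missing = [5] * 18 + [None] * 7
print(meets_completeness(fully_answered))  # True
print(meets_completeness(seven_missing))   # False
```

Under this rule, a questionnaire with up to five skipped items still contributes to domain scoring, which keeps the analytic samples close to the collected samples.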
Prior to coding and data entry, the authors predefined the exclusion criterion, namely, the completeness rule (at least 20 of 25 valid item responses) that defines the analytic sample. All paper questionnaires (N = 1094) were coded and entered into spreadsheets by the first author. Missing entries were coded as “Not Answered” (NA). The first author screened the datasets, cross-checking the Likert-scale range (1 to 6) using spreadsheet tools such as “Find” and the “ISBLANK” function. The second author performed a quality-control analysis on a random subsample of approximately 5% of questionnaires per wave. The full analysis pipeline was executed in R by the first author, and key numerical outputs were independently replicated in MATLAB (version R2025b) by an external researcher as a computational verification step.

3.5. Instruments and Measures

3.5.1. GreenComp-Based Perceived Competence Questionnaire (S1-PRE, S2-POST, S3-FU)

Students’ perceived sustainability competences were assessed using a GreenComp-grounded questionnaire adapted from the GCQuest to the Art Nouveau Path context. The analyses use the 25-item Likert block, rated on a 6-point scale (1 to 6). A 6-point format was used to avoid a neutral midpoint and to increase discrimination across perceived competence levels in school-based self-report measurement [52,53,54].
To match each measurement moment while preserving conceptual equivalence, the questionnaire used wave-specific stems: pre-intervention (S1-PRE): “In my daily life, I try to…”, post-intervention (S2-POST): “This activity allowed me to…”, and follow-up (S3-FU): “Since the activity, in my daily life I try to…”. Across waves, item cores were kept as consistent as possible so that between-wave differences could be interpreted as trend shifts rather than artifacts of item meaning changes. Because stems differ by design, part of the between-wave variation may reflect framing and demand characteristics, including higher social desirability immediately post-intervention (S2-POST); therefore, results are interpreted as cohort-level trend evidence rather than within-student change.
Consistent with this work’s focus and RQs, the adapted 25-item instrument is used to capture students’ self-perceived competence within the GreenComp competence area ESV [8] in the Art Nouveau Path context. This manuscript does not aim to provide a full psychometric re-validation of the adapted version for each wave; instead, it reports wave-specific internal consistency as a dataset-level quality check and interprets between-wave differences as repeated cross-sectional trend evidence. Prior work within the EduCITY Project reports factorial validity evidence for GCQuest data using ordinal-appropriate Structural Equation Modeling (SEM) estimation [55].

3.5.2. The ESV Score as the Measure Used in This Study

Consistent with this work’s RQ, the 25 Likert-type items were used to compute an ESV domain score for each respondent. The ESV score was computed as the arithmetic mean across available item responses, with higher values indicating higher perceived alignment with sustainability values. Domain scores were computed only for respondents meeting the predefined completeness criterion.
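As an illustration of this scoring rule (a minimal Python sketch, not the study's R implementation), the ESV domain score is simply the arithmetic mean of the 25 item responses, returned only when the completeness criterion is met:

```python
import numpy as np

def esv_score(responses, n_items=25):
    """ESV domain score: mean of the 25 Likert items (1-6).
    Returns NaN unless the predefined completeness criterion is met."""
    responses = np.asarray(responses, dtype=float)
    if len(responses) != n_items or np.isnan(responses).any():
        return float("nan")
    return float(responses.mean())

print(esv_score([4] * 20 + [5] * 5))         # 4.2
print(esv_score([4] * 24 + [float("nan")]))  # nan (incomplete record)
```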

3.5.3. Derived Indicators for Threshold-Based Analyses

To support RQ2, threshold-based indicators were derived from the 1–6 domain score to summarize shifts in the proportion of students positioned in higher perceived competence bands across waves. The first cut point (≥4.0) was defined to represent performance above the scale midpoint (3.5 on a 1–6 scale) and to align with an agreement band in typical agree–disagree response formats. A second, more stringent cut point (≥4.5) was defined as a high-agreement benchmark to describe stronger endorsement patterns. These thresholds are reported as descriptive prevalence indicators. To quantify between-wave differences in these prevalence distributions, chi-square tests are also reported with Cramer’s V as the effect size. Importantly, the thresholds are not interpreted as categorical evidence of achieved competence; inferential conclusions about between-wave differences rely primarily on ordinal-appropriate tests and effect sizes applied to the continuous domain score, with threshold indicators used to support interpretability and communication.
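The prevalence comparison described here can be sketched as follows, using the 4.0-threshold counts later reported in Section 4.3 (64/221, 388/438, 326/434); the study's own computations were run in R, so this Python version is illustrative only:

```python
import numpy as np
from scipy.stats import chi2_contingency

def cramers_v(table):
    """Chi-square test on an r x c prevalence table plus Cramer's V."""
    table = np.asarray(table)
    chi2, p, dof, _ = chi2_contingency(table)
    n = table.sum()
    k = min(table.shape) - 1                     # min(r - 1, c - 1)
    return chi2, p, float(np.sqrt(chi2 / (n * k)))

# Counts at/above vs. below the 4.0 cut point per wave (Section 4.3):
# S1-PRE 64/221, S2-POST 388/438, S3-FU 326/434.
table = [[64, 221 - 64], [388, 438 - 388], [326, 434 - 326]]
chi2, p, v = cramers_v(table)
print(round(v, 2))  # 0.49, matching the reported effect size
```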

3.5.4. The GCQuest Validation Context

For completeness, the development and validation evidence reported for the GCQuest [18] within the broader EduCITY Project is summarized. The GCQuest data collection tool [56] was developed within the EduCITY Project to support the assessment of the GreenComp competence area ESV [8] and is openly available in English [18].
The instrument development was grounded in the GreenComp framework [8] and focused on ESV by operationalizing three competences, “Valuing Sustainability”, “Supporting Fairness”, and “Promoting Nature”, structured through Knowledge, Skills, and Attitudes (KSAs). The GCQuest includes open-ended prompts and a Likert block, and it was administered in Portuguese using the official EU translation of GreenComp [8] to ensure linguistic and conceptual consistency with the framework. For the Art Nouveau Path implementation, item wording and prompts were contextualized to reflect the intervention themes and learner experience, supporting content relevance in the school-based setting.
Evidence supporting the factorial structure has been reported through a second-order Confirmatory Factor Analysis (CFA) within a SEM framework in JASP version 0.19.3 [57], using the Diagonally Weighted Least Squares (DWLS) estimator appropriate for ordinal Likert-type data. The model specified KSA constructs as first-order factors loading onto a second-order factor representing ESV. Overall model fit was good (Comparative Fit Index (CFI) = 0.945; Tucker–Lewis Index (TLI) = 0.939; Standardized Root Mean Square Residual (SRMR) = 0.049; Root Mean Square Error of Approximation (RMSEA) = 0.077), with statistically significant factor loadings (p < 0.001). A documented workflow and technical materials are available through GCQuest resources, including the SEM technical note, as presented in previous work [29]. This prior validation provides context for the use of the 25-item block in the present trend analyses, which focus on between-wave comparisons and wave-specific internal consistency indicators.

3.6. Data Processing and Scoring

Questionnaire data were screened prior to analysis to confirm valid response ranges (1 to 6), identify missing values, and apply the predefined completeness criterion. Missing responses were treated as missing and were not imputed. All items were coded such that higher values indicated higher perceived alignment with sustainability values. Scores were computed as defined below.

3.7. Statistical Analysis

Analyses followed the repeated cross-sectional structure of the dataset. Descriptive statistics were computed for the ESV domain score and for each item by wave using distribution-aware summaries appropriate for Likert-type outcomes, prioritizing the median and Interquartile Range (IQR) by wave. Means were retained as the operational definition of the domain score (mean of Likert items Q1–Q25) and for descriptive figures. Inferential conclusions rely primarily on ordinal-robust non-parametric tests and effect sizes; mean-based heteroscedasticity-robust inference is reported as a sensitivity analysis to corroborate robustness.
Internal consistency of the 25-item scale was assessed within each wave (Cronbach’s alpha, complemented by McDonald’s omega). These indices were computed for the 25-item ESV composite and are reported as wave-specific dataset-level quality checks, not as reliability evidence for separate competence-specific subscales in this work. For domain-level comparisons across the three independent samples (RQ1), an omnibus non-parametric comparison was conducted (Kruskal–Wallis), followed by adjusted post hoc pairwise comparisons when warranted (Dunn tests with Holm correction). Effect sizes were computed alongside p-values (epsilon-squared for omnibus effects; rank-biserial correlation for pairwise contrasts).
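A minimal Python sketch of these wave-level checks and ordinal comparisons is given below (the study used R; Dunn tests as such are not shown, so Holm-adjusted pairwise Mann–Whitney contrasts would stand in for the same pairwise logic, and the rank-biserial sign convention shown is one common choice that may differ from the paper's):

```python
import numpy as np
from scipy.stats import kruskal, mannwhitneyu

def cronbach_alpha(items):
    """Cronbach's alpha for a (respondents x k items) response matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

def kw_epsilon_squared(*groups):
    """Kruskal-Wallis H with the epsilon-squared effect size H / (n - 1)."""
    h, p = kruskal(*groups)
    n = sum(len(g) for g in groups)
    return h, p, h / (n - 1)

def rank_biserial(x, y):
    """Rank-biserial correlation derived from the Mann-Whitney U statistic
    (one common convention; the sign depends on group order)."""
    u, _ = mannwhitneyu(x, y, alternative="two-sided")
    return 1 - 2 * u / (len(x) * len(y))

# Toy groups mimicking three independent waves (not the study data).
rng = np.random.default_rng(2)
a = rng.normal(3.7, 0.5, 221)
b = rng.normal(4.6, 0.5, 438)
c = rng.normal(4.1, 0.4, 434)
h, p, eps2 = kw_epsilon_squared(a, b, c)
print(p < 0.001, 0 < eps2 < 1)  # True True
```

Note that epsilon-squared as H/(n − 1) reproduces the scale of the domain-level result reported in Section 4.2 (H = 428.06 with n = 1093 gives approximately 0.39).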
Because the ESV composite score is defined as the mean of 25 Likert items, we also report mean-based estimates and heteroscedasticity-robust between-wave inference as sensitivity analyses. Specifically, heteroscedasticity-robust omnibus testing (Welch ANOVA) and Holm-adjusted Welch t tests were used for pairwise contrasts, alongside effect sizes and confidence intervals. These mean-based results are reported in parallel with ordinal-robust non-parametric comparisons to corroborate robustness; substantive conclusions are drawn from the ordinal-robust comparisons under the repeated cross-sectional design.
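The Welch-based sensitivity analysis can be sketched as follows (illustrative Python, not the study's R code; the Welch ANOVA statistic is implemented directly from its standard formula, since SciPy provides only the pairwise Welch t test):

```python
import numpy as np
from scipy.stats import f, ttest_ind

def welch_anova(*groups):
    """Welch's heteroscedasticity-robust one-way ANOVA.
    Returns (F, df1, df2, p)."""
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    m = np.array([np.mean(g) for g in groups])
    v = np.array([np.var(g, ddof=1) for g in groups])
    w = n / v                               # precision weights
    mw = np.sum(w * m) / np.sum(w)          # weighted grand mean
    a = np.sum(w * (m - mw) ** 2) / (k - 1)
    tmp = np.sum((1 - w / np.sum(w)) ** 2 / (n - 1))
    b = 1 + 2 * (k - 2) / (k ** 2 - 1) * tmp
    df2 = (k ** 2 - 1) / (3 * tmp)
    return a / b, k - 1, df2, f.sf(a / b, k - 1, df2)

def holm_welch_pairs(groups):
    """Holm-adjusted Welch t tests for all pairwise group contrasts."""
    pairs = [(i, j) for i in range(len(groups))
             for j in range(i + 1, len(groups))]
    raw = [ttest_ind(groups[i], groups[j], equal_var=False).pvalue
           for i, j in pairs]
    adj, running = [None] * len(raw), 0.0
    # Holm step-down: sort p ascending, scale by (m - rank), keep monotone.
    for rank, idx in enumerate(np.argsort(raw)):
        running = max(running, (len(raw) - rank) * raw[idx])
        adj[idx] = min(1.0, running)
    return dict(zip(pairs, adj))
```

When group variances are equal, the Welch F statistic closely tracks the classic one-way ANOVA F, which provides a quick numerical sanity check of the implementation.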
In addition to the domain and item-level trend analyses, we conducted an exploratory triangulation to contextualize item trajectories using discourse-oriented features of the item prompts. Because the stem framing differs across waves (habitual day-to-day orientation in S1-PRE, activity-attribution framing in S2-POST, and persistence-since-activity framing in S3-FU), discourse coding was performed on the canonical item text excluding the stem to isolate prompt properties from wave framing.
A four-member coding panel (including the authors, an EduCITY Project researcher, and a Portuguese language teacher who supported the field implementation) independently coded all 25 items and then resolved discrepancies through two structured consensus meetings. Coding followed a closed codebook with deterministic rules anchored in the dominant modal verb of the prompt, enabling a reproducible mapping of items to KSA-oriented categories: Knowledge (knowing, being aware), Skills (being able to), and Attitudes (becoming more willing, being more concerned, affective stance such as empathy). The final KSA mapping was used to aggregate item means and deltas by category to support interpretive triangulation.
We further computed simple linguistic-complexity indicators for each item (character count and word count, computed on the item text excluding the stem) and examined descriptive associations between these indicators and item-level change magnitudes (deltas) across waves. These analyses were treated as exploratory and were used to support interpretation rather than inferential claims.
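These exploratory indicators reduce to simple counts and correlations; a hedged Python sketch with hypothetical item texts (not the actual GCQuest items) is:

```python
import numpy as np
from scipy.stats import pearsonr

def complexity(item_texts):
    """Character and word counts for item text (stem excluded)."""
    chars = np.array([len(t) for t in item_texts])
    words = np.array([len(t.split()) for t in item_texts])
    return chars, words

# Hypothetical item texts and S1->S2 deltas (NOT the actual GCQuest items).
items = ["to know local heritage",
         "to be able to discuss competing sustainability worldviews in groups",
         "to care for shared places"]
deltas = np.array([1.1, 0.6, 1.2])
chars, words = complexity(items)
r, _ = pearsonr(chars, deltas)
print(r < 0)  # True: the longest toy item shows the smallest gain
```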

3.8. Cross-Software Verification

To ensure the quality of the data analysis and results, key descriptive statistics and score computations were independently cross-checked in MATLAB (version R2025b) using the same cleaned datasets and scoring rules. This cross-check was performed by an external researcher who was not involved in the project. This step verified numerical consistency across environments rather than generating additional results. Minor differences were attributable to rounding or display conventions and did not affect the reported conclusions.

3.9. Ethical Considerations and Data Access

The study was conducted in accordance with the ethical protocols established by the University of Aveiro and in compliance with the General Data Protection Regulation (GDPR), with institutional data-protection clearance from the University of Aveiro granted on 27 November 2024, and was approved by the Ethics Committee of the same institution (protocol code 1-CE/2025, 5 February 2025).
Participation was voluntary. Informed consent was obtained from all participants. Regarding students, parental or legal-guardian consent was additionally secured in line with school-based procedures for research involving minors. Questionnaire administration was anonymous, and no personally identifiable information was collected.
Given that the datasets were collected in educational contexts involving minors and under GDPR constraints, participant-level questionnaire datasets are not publicly released. Supporting analysis materials are available via the Art Nouveau Path MARG’s Zenodo community [58]. Access to restricted datasets or additional aggregated outputs may be provided upon reasonable request, subject to ethics approval and data protection requirements.

4. Results

4.1. Data Completeness and Internal Consistency

Across the three waves, item-level data quality was high, and responses respected the expected 1 to 6 Likert range, with no out-of-range values detected. S1-PRE (N = 221) and S3-FU (N = 434) contained no missing cells. In S2-POST (N = 439), missingness was concentrated in a single record (7 missing cells across Q11 to Q17). Considering that this record did not meet the predefined completeness criterion, it was excluded. The resulting analytic sample comprised N = 221 (S1-PRE), N = 438 (S2-POST), and N = 434 (S3-FU), totaling N = 1093 responses.
Internal consistency of the 25-item Embodying Sustainability Values (ESV) composite score was acceptable to good across waves (Cronbach’s alpha range: 0.72 to 0.88; McDonald’s omega range: 0.72 to 0.88). Item diagnostics indicated that “alpha if item deleted” did not meaningfully improve the scale at any wave, supporting the use of a single composite score at each time point. Although the composite score deviated from normality in all waves (Shapiro–Wilk tests), the large sample sizes and observed variance heterogeneity (Levene and Brown-Forsythe tests) justified the use of heteroscedasticity-robust and ordinal-robust comparisons in subsequent analyses. Table 1 summarizes dataset-level quality checks and internal consistency (alpha and omega) by wave.
Table 1 indicates high item-level data quality across waves, with missingness concentrated in a single S2-POST record that was excluded by the predefined completeness rule. Internal consistency is acceptable to good (alpha and omega), supporting the use of a single ESV composite score per wave; Table 2 then reports wave-level descriptives for the ESV composite score used in subsequent between-wave comparisons.
As shown in Table 2, mean ESV scores increase markedly from S1-PRE to S2-POST and then partially attenuate at S3-FU while remaining above pre-intervention results. This pattern is examined next using distribution-aware summaries and ordinal-robust inference at the domain level.

4.2. Domain-Level Evolution of ESV

The evolution of students’ ESV composite score (defined as the mean of Q1–Q25) is summarized in Figure 2. The distribution-aware descriptives show a clear post-intervention uplift followed by partial attenuation: S1-PRE median = 3.60 [IQR 3.32–4.08], S2-POST median = 4.68 [IQR 4.44–4.88], and S3-FU median = 4.12 [IQR 4.00–4.28]. Mean scores followed the same pattern (S1-PRE: M = 3.70, SD = 0.54; S2-POST: M = 4.64, SD = 0.50; S3-FU: M = 4.13, SD = 0.36). An omnibus Kruskal–Wallis test confirmed differences across waves, H(2) = 428.06, p < 0.001, with a large effect (epsilon-squared = 0.391). Post hoc Dunn tests with Holm correction indicated that all pairwise contrasts differed (all p_Holm < 0.001), with large pairwise effect sizes (rank-biserial r = −0.78 for S1-PRE vs. S2-POST, higher in S2-POST; r = −0.49 for S1-PRE vs. S3-FU, higher in S3-FU; r = 0.64 for S2-POST vs. S3-FU, higher in S2-POST).
The distributional shift is consistent with these median-based contrasts, as presented in Figure 3.
Relative to S1-PRE, S2-POST is characterized by a marked concentration of higher values, while S3-FU shows a partial return towards intermediate values yet remains centered above pre-intervention. Sensitivity analyses (winsorisation and trimmed means) reproduced virtually identical mean-based contrasts, indicating that findings were not driven by distributional tails. Mean-based heteroscedasticity-robust inference (Welch ANOVA and Holm-adjusted Welch t tests) yielded the same qualitative conclusions and is reported as a sensitivity analysis (Table 3).

4.3. Proportions of Students Reaching Higher Competence Bands

To complement domain–score contrasts, we examined the proportion of students exceeding two pragmatic thresholds on the ESV composite score: ≥4.0 (moderate to high endorsement) and ≥4.5 (high endorsement). At the 4.0 threshold, the proportion increased from 28.96% at S1-PRE (64/221) to 88.58% at S2-POST (388/438) and remained elevated at S3-FU (75.12%, 326/434). The prevalence distribution differed strongly across waves (chi-square(2) = 259.99, p < 0.001, Cramer’s V = 0.49).
At the 4.5 threshold, the proportion increased sharply from 9.05% at S1-PRE (20/221) to 70.78% at S2-POST (310/438) but returned near pre-intervention at follow-up (9.91%, 43/434). This shift was also large at the distribution level (chi-square(2) = 436.76, p < 0.001, Cramer’s V = 0.63). At a stricter threshold of 5.0, the same pattern was visible (0.00% at S1-PRE; 16.44% at S2-POST; 2.53% at S3-FU; chi-square(2) = 82.82, p < 0.001, Cramer’s V = 0.28), reinforcing that the post-intervention surge in very high endorsement was only partially retained, as presented in Figure 4.
Table 4 reports the proportions of students exceeding the selected ESV thresholds by wave, complementing the domain–score contrasts with a prevalence-oriented view.
Table 4 indicates that the proportion of students above the 4.0 threshold rises sharply at S2-POST and remains elevated at S3-FU, whereas the stricter 4.5 threshold shows a strong post-intervention surge that largely returns near pre-intervention by follow-up. To clarify which aspects of ESV drive these shifts, the next section turns to item-level trajectories and contrasts.

4.4. Item-Level Patterns in ESV

Item-level analyses clarified which aspects of ESV were most responsive and which gains were retained over time. For each item, an omnibus Kruskal–Wallis test indicated between-wave differences (all p < 0.001). Dunn post hoc tests with Holm correction (Holm-adjusted within each 25-item family) showed that all items increased from S1-PRE to S2-POST and all items decreased from S2-POST to S3-FU (all p_Holm < 0.001). For the long-term contrast (S1-PRE vs. S3-FU), 17 of 25 items remained significantly higher at follow-up; the eight items not significant after Holm adjustment were Q2, Q3, Q9, Q10, Q13, Q23, Q24, and Q25.
The largest immediate gains from S1-PRE to S2-POST were observed in Q7 (Delta = +1.25), Q17 (Delta = +1.24), Q6 (Delta = +1.22), Q15 (Delta = +1.19), and Q5 (Delta = +1.19). The largest declines from S2-POST to S3-FU were observed in Q23 (Delta = −0.77), Q3 (Delta = −0.70), Q17 (Delta = −0.68), Q25 (Delta = −0.67), and Q5 (Delta = −0.64). Despite this partial fade-out, 24 of 25 items remained at or above their pre-intervention mean at follow-up. Only Q9 ended marginally below pre-intervention (Delta = −0.03), and this difference was negligible and statistically non-significant. To visualize immediate responsiveness at the item level, Figure 5 orders items by their mean gain from S1-PRE to S2-POST.
To support interpretation of item sensitivity, we examined item discrimination and wording-related characteristics. Corrected item-total correlations computed on the pooled sample ranged from approximately 0.31 (Q10) to 0.51 (Q6), with relatively higher correlations for items showing larger and more sustained gains (for example, Q6, Q7, Q12, Q15, Q16, and Q17) and lower correlations for items that showed weaker retention (notably Q9 and Q10, and to a lesser extent Q2). This convergence between longitudinal change patterns, wording characteristics, and item discrimination supports the interpretation that both item content and phrasing shape sensitivity to the situated, place-based learning fostered by the Art Nouveau Path. Table 5 summarizes the most salient item-level change patterns, including the largest immediate gains, the largest follow-up losses, and the items whose long-term differences are not robust after multiplicity control.
Table 5 highlights that immediate post-intervention gains are largest for a small subset of items, while follow-up losses are also concentrated in specific items, indicating heterogeneous responsiveness and retention across ESV facets.
To make the post-intervention decay more transparent, Figure 6 reorders items by their mean loss from S2-POST to S3-FU (Delta S2–S3 = S3 minus S2), thereby highlighting which perceived competences were least stable over time. This visualization complements the S1–S2 gain-oriented ordering in Figure 5 by focusing on retention rather than immediate responsiveness.
As presented in Figure 6 and summarized in Table 6, the steepest declines occurred for Q23, Q3, Q17, Q25, and Q5 (losses between −0.64 and −0.77 points), indicating that the strongest post-test endorsements were not uniformly sustained at follow-up. Conversely, the smallest S2–S3 decreases, and thus the best relative retention, were observed for Q12, Q21, Q10, Q1, and Q9 (losses between −0.25 and −0.41 points). Importantly, this does not imply that these items were unchanged over time, but rather that their post-intervention levels were comparatively more stable when students were asked, at follow-up, to report persistence since the activity.
Table 6 confirms that the steepest declines from S2-POST to S3-FU are concentrated in a subset of items, while other items show comparatively better retention. This motivates the subsequent triangulation that interprets item trajectories in relation to prompt modality and linguistic features.

4.5. Triangulation Between Item Discourse Features and S1-S2-S3 Trajectories

To contextualize the item-level trajectories, we triangulated longitudinal patterns (S1-PRE to S2-POST to S3-FU) with a discourse-oriented characterization of item prompts, focusing on verbal modality and linguistic complexity. Importantly, the response framing differs systematically by wave: S1-PRE used a habitual self-report stem (day-to-day orientation), S2-POST asked respondents to attribute change to the intervention (activity-based attribution), and S3-FU asked for persistence since the activity (practice-based persistence). This shift in stems provides a parsimonious measurement explanation for the typical pattern observed in the dataset, namely a pronounced increase at S2-POST followed by a partial decrease at S3-FU, consistent with recency and attribution effects at post-test and recalibration demands at follow-up.
Items also cluster meaningfully by prompt modality in a way that aligns with the GCQuest [18] KSA framing. Prompts using knowing and awareness verbs (for example, “to know”, “to be aware”) were mapped to Knowledge (K), prompts using capability verbs (for example, “to be able to”) were mapped to Skills (S), and prompts expressing disposition, concern, willingness, or affective stance (including empathy) were mapped to Attitudes (A). When trajectories were aggregated by these KSA categories, Skills and Knowledge items showed comparatively stronger retained gains from S1-PRE to S3-FU, whereas Attitudes items exhibited the sharpest correction at follow-up, consistent with S3-FU implicitly requiring evidence of sustained day-to-day enactment rather than immediate post-activity intention.
Finally, exploratory indicators suggest that item complexity and pre-intervention anchoring shape responsiveness. Item length (character count) and word count were negatively associated with the immediate gain from S1-PRE to S2-POST (r approximately −0.45 and −0.32, respectively), indicating that more linguistically complex items tend to show smaller post-intervention inflation. Pre-intervention item means were strongly negatively associated with change magnitudes (S1 mean versus Delta S1 to S2: r approximately −0.59; S1 mean versus Delta S1 to S3: r approximately −0.66), consistent with ceiling effects and reduced headroom for already highly endorsed items. These findings suggest that the observed item-level patterns reflect a combination of intervention-related change and systematic measurement properties linked to stem framing, modality, and linguistic complexity. Figure 7 visualizes the aggregated trajectories by KSA category.
The visualization of the aggregated trajectories by KSA category is complemented with numerical summaries in Table 7.
Table 7 shows that aggregated deltas differ by KSA-oriented prompt modality, with Skills and Knowledge items exhibiting comparatively stronger retained gains than Attitudes items. This pattern supports the interpretive claims developed next in the Discussion regarding measurement framing, prompt modality, and differential retention across ESV dimensions.

5. Discussion

5.1. Summary of the Main Findings and Linkage to the RQ

This repeated cross-sectional trend study examined students’ self-perceived sustainability competences within the GreenComp competence area ESV across three questionnaire waves: pre-intervention (S1-PRE), immediate post-intervention (S2-POST), and follow-up (S3-FU). As an exploratory, context-bounded evaluation under authentic school constraints, the study is designed to characterize cohort-level trends rather than within-student change. Overall, results indicate a pronounced increase from S1-PRE to S2-POST, followed by a partial decline at S3-FU, while remaining above pre-intervention at the domain level. Consistent with RQ1, the ESV composite score showed a marked shift in distributional summaries (S1-PRE median = 3.60 [IQR 3.32–4.08], S2-POST median = 4.68 [IQR 4.44–4.88], S3-FU median = 4.12 [IQR 4.00–4.28]). Between-wave differences were supported by an omnibus Kruskal–Wallis test with a large effect (epsilon-squared = 0.391) and Holm-adjusted post hoc contrasts (all p < 0.001). Mean-based summaries were consistent (M = 3.70, 4.64, and 4.13 for S1-PRE, S2-POST, and S3-FU, respectively) and mean-based Welch comparisons (Table 3) corroborated the same pattern as a sensitivity analysis.
Regarding RQ2, the competence-band indicators provide a complementary prevalence view of how endorsement shifts across waves. At the 4.0 threshold, the share of students meeting moderate-to-high endorsement increased sharply and remained elevated at follow-up, whereas at the more stringent 4.5 threshold, the post-intervention surge largely returned to near-pre-intervention levels by S3-FU. These shifts were associated with large between-wave differences in prevalence (chi-square with Cramer’s V), but thresholds are interpreted as descriptive indicators rather than categorical evidence of achieved competence.
Finally, addressing RQ3, item-level trajectories show that the intervention’s influence was not uniform across the 25 items. The most durable gains were concentrated in statements that align closely with the game’s place-based narrative and the responsibility to care for concrete heritage places, while more abstract items showed weaker retention.

5.2. Interpreting the Domain-Level Trajectories: Large Short-Term Gains and Partial Retention

The domain-level pattern suggests that a single, heritage-centered mobile AR experience can act as a strong short-term catalyst for students’ perceived sustainability values, particularly in the immediate aftermath of the activity. The magnitude of the post-intervention shift is consistent with the idea that outdoor, collaborative, and narrative-driven tasks can activate value-oriented reflection by grounding sustainability in tangible contexts. In the Art Nouveau Path, students are not only exposed to sustainability themes; they are repeatedly invited to observe architectural details, interpret their cultural meaning, and connect these observations to wider issues of care, responsibility, and the consequences of decisions for shared environments. Complementary analyses of the wave-specific open-ended prompts provide convergent context for the scaled trends reported here. In those analyses, references to preservation and care of the built structure increased from 28.96% (n = 64/N = 221) at pre-intervention (S1-PRE) to 61.05% (n = 268/N = 439) immediately after gameplay (S2-POST), remaining above pre-intervention at follow-up, 47.93% (n = 208/N = 434) [29]. The same open-ended dataset also showed parallel shifts toward heritage framed within sustainable urban development, 22.17% (n = 49/N = 221), 43.96% (n = 193/N = 439), and 35.94% (n = 156/N = 434), alongside a rebalancing of exclusively environmental framings, 57.92% (n = 128/N = 221), 30.98% (n = 136/N = 439), and 41.94% (n = 182/N = 434) [29]. Teacher observations corroborated this tendency, with spontaneous preservation discourse recorded in 58.33% of the T2-OBS forms (14/24) [29]. Post-game responses further evidenced place-attentive learning, including 17.20% (n = 71/N = 439) explicitly mentioning tiles and 7.30% (n = 30/N = 439) referring to the whip line in written responses [12].
At the same time, the partial decline at follow-up indicates that some of the immediate post-intervention uplift does not automatically consolidate into stable day-to-day self-perceptions. This fade-out is compatible with two non-exclusive interpretations. First, it may reflect genuine attenuation over time when learners do not encounter structured opportunities to revisit and apply the values activated during gameplay. Second, it may partly reflect measurement-related factors: the post-intervention wave explicitly attributes perceived change to the activity, whereas pre-intervention and follow-up rely more on day-to-day practice framing. Accordingly, the domain-level trend should be interpreted as evidence of strong immediate activation with partial retention, rather than as definitive proof of sustained competence change in individuals.

5.3. Interpreting Competence Bands: What Shifts in High Endorsement Do and Do Not Imply?

The competence-band results refine the interpretation of the domain-level means by showing that different “levels” of endorsement behave differently over time. Using the descriptive 4.0 threshold, the proportion of students meeting moderate-to-high endorsement increased from 28.96% at S1-PRE to 88.58% at S2-POST and remained elevated at 75.12% at S3-FU. This suggests that the experience may have a durable influence on moving many students away from low-to-moderate positions toward more affirmative self-perceptions on ESV.
However, at the more stringent 4.5 threshold, the pattern is qualitatively different: the proportion rose from 9.05% (S1-PRE) to 70.78% (S2-POST) but returned to 9.91% at S3-FU, close to pre-intervention. A similar pattern is visible at the 5.0 threshold (0.00% at S1-PRE; 16.44% at S2-POST; 2.53% at S3-FU). Together, these findings suggest that very high endorsement immediately after the activity is difficult to sustain without reinforcement. Importantly, this does not undermine the educational relevance of the intervention. Instead, it clarifies the likely mechanism: a single session can trigger strong short-term enthusiasm and confidence, while longer-term consolidation may require repeated engagement, explicit curricular integration, and opportunities to enact sustainability values beyond the game context.
Methodologically, these thresholds should be treated as descriptive indicators aimed to communicate prevalence and distributional shifts. These indicators are useful for interpretation and communication, but they should not be over-read as categorical evidence of “achieved competence”, especially given the ordinal nature of Likert-type data and the known information loss associated with binning.

5.4. Item-Level Insights and Implications for Game and Tasks Design

The item-level results indicate that the Art Nouveau Path has a differentiated impact across the 25 ESV items. While all items increased immediately after the game and then decreased at follow-up, most items remained significantly above pre-intervention at S3-FU (17 out of 25), indicating that retention was not limited to a single narrow aspect of the construct. At the same time, a small subset of items showed weaker long-term differences once multiplicity control was applied. After Holm adjustment, eight items did not show retained gains at follow-up (Q25, Q23, Q13, Q2, Q24, Q10, Q3, and Q9), returning to values statistically indistinguishable from pre-intervention. This pattern suggests that some value statements were less likely to translate into sustained perceived competences after a single session. This retention pattern is summarized at item level in Figure 8, which reports pre-intervention to follow-up effect sizes and Holm-adjusted significance.
The strongest and most durable gains were concentrated in items that closely match what the game repeatedly foregrounds, namely caring for places, recognizing cultural and environmental limits, and linking values to decisions that affect shared environments and heritage. The largest pre-intervention-to-follow-up gains were observed for Q7, Q12, Q16, and Q6, with similarly large retained effects for Q21 and Q18. These statements combine value-laden language with actionable or evaluative framing, which likely makes them more mappable to the situated experiences provided by the game. From a design perspective, this supports a clear implication: mobile AR heritage experiences may be most effective for sustainability values when they do more than present information. Accordingly, AR-based activities should be strengthened to require stance-taking, interpretation, and responsibility in relation to concrete places.
This differential retention pattern can be interpreted through a situated learning and context-based learning lens. In outdoor mobile AR, competence-relevant meanings are formed while learners perceive, discuss, and act in a real environment, rather than only as decontextualized endorsements. AR overlays and multimodal prompts can guide noticing and support embodied interaction with architectural cues, while group navigation and peer explanation provide repeated co-attention and negotiation. Under these conditions, items anchored in specific places and observable features are plausibly encoded more robustly and rehearsed more naturally during the activity, supporting stronger persistence beyond the immediate post-intervention moment [31,59].
A complementary account is provided by sense of place and place-based pedagogy. Built heritage functions as a locally meaningful referent that can re-signify urban space and activate place-mediated memory, thereby strengthening stewardship-oriented value judgements tied to concrete sites and shared environments. From this perspective, items framed around caring for places and evaluating decisions that affect heritage and public space are expected to show stronger maintenance because they remain attached to salient, visitable referents that carry perceived civic value [60].
A detailed analysis of the eight items that did not retain gains after correction for multiple comparisons (Q2, Q3, Q9, Q10, Q13, Q23, Q24, and Q25) suggests that the decline follows a consistent pattern rather than random variation. Table 8 summarizes the characteristics these items share.
Table 8 shows that three properties recur across these items: (i) higher conceptual abstraction and socio-normative framing; (ii) reliance on dialogic, evaluative, or metacognitive operations rather than in situ perceptual cues; and (iii) higher linguistic load due to multi-clause wording. This pattern suggests that durable shifts are more likely for items tightly coupled to repeated, place-anchored prompts, whereas items requiring sustained discourse practice and conceptual consolidation may require structured follow-up activities beyond a single session.
Consistent with this interpretation, items that are more abstract, conceptually dense, or dependent on specialized terminology, such as statements involving environmental justice, competing sustainability worldviews, or ontological claims about human-nature relations, tended to show smaller and less stable shifts at follow-up. This does not imply that these ideas lack pedagogical value. Instead, they may require additional scaffolding to promote long-term retention, including brief pre-briefing of key concepts, structured post-game debriefing, classroom follow-up tasks, or explicit prompts that connect in-game heritage dilemmas to higher-abstraction formulations.
This interpretation is also compatible with the discrimination pattern. Corrected item-total correlations ranged from approximately 0.31 to 0.51, with higher values observed for items that also exhibited larger and more sustained gains, notably Q6, Q7, Q12, Q15, Q16, and Q17. In contrast, among the items that did not retain gains after multiplicity control, Q9 and Q10, and to a lesser extent Q2, showed weaker retention signals alongside comparatively lower discrimination, while the remaining items in this subset showed more heterogeneous item–total behavior.
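The corrected item-total correlations cited above pair each item with the sum of the remaining items, so the item cannot inflate its own criterion. A minimal Python sketch with hypothetical response rows (illustrative only; the reported coefficients come from the full datasets, and the study's analyses were run in R):

```python
from statistics import mean

def pearson(x, y):
    """Plain Pearson correlation for two equal-length numeric lists."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def corrected_item_total(responses, item):
    """Correlate one item with the rest score (total minus the item itself),
    so the item does not inflate its own discrimination estimate."""
    item_scores = [row[item] for row in responses]
    rest_scores = [sum(row) - row[item] for row in responses]
    return pearson(item_scores, rest_scores)
```

In this form, an item that moves in step with the rest of the scale yields a correlation near 1, while an item unrelated to the rest score yields a value near 0, which is the sense in which lower values for Q9 and Q10 indicate weaker discrimination.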
Together, the convergence between wave-to-wave patterns, linguistic and conceptual demands, and item discrimination supports a practical implication for instrument use in mobile AR heritage contexts. Aligning items with a competence framework, although valuable, is not sufficient on its own. Sensitivity to change also depends on how closely each item is linked to the learning experience and to the types of judgments and actions it encourages.

6. Conclusions

6.1. Main Conclusions

Three main conclusions are presented:
(1) The post-intervention wave (S2-POST) presents a marked uplift in perceived sustainability values relative to pre-intervention (S1-PRE), accompanied by a clear upward distributional shift and a higher prevalence of students in higher endorsement bands, with moderate-to-high endorsement showing clearer maintenance at follow-up (S3-FU) than very high endorsement. In substantive terms, a short, carefully designed, place-based mobile AR experience may make sustainability values salient and strengthen students’ value-oriented self-appraisals linked to care, responsibility, and stewardship in relation to built heritage and sustainability concepts.
(2) The trajectory indicates partial attenuation over time rather than a stable plateau. At follow-up (S3-FU), ESV scores decrease relative to the immediate post-intervention measurement (S2-POST) but remain clearly above pre-intervention (S1-PRE) at the domain level. This pattern is consistent with a residual positive trace of the experience while suggesting that the highest levels of endorsement are difficult to maintain without reinforcement beyond the gameplay session.
(3) Item-level trajectories indicate heterogeneous sensitivity within ESV. Items closely aligned with concrete, place-centered forms of care and responsibility show more robust retention, whereas more abstract or conceptually dense formulations show weaker long-term differentiation. Methodologically, this reinforces the value of reporting domain-level indicators alongside item-level patterns when evaluating competence-oriented ESD interventions in authentic, technology-mediated contexts.

6.2. Limitations

These findings should be interpreted in light of several limitations: (i) Outcomes rely on self-report. The GCQuest [18] captures self-perceived values and dispositions relevant to ESV, but does not directly measure behavioral change or observable sustainability action. (ii) The design is repeated cross-sectional rather than panel-based. Because responses were anonymous and class composition varied across waves, individual students could not be tracked. The analyses therefore describe cohort-level trends and do not support inference about within-student change or intra-individual variability. (iii) The wave-specific stems differ systematically in framing, which may contribute to between-wave differences. The pre-intervention (S1-PRE) wave emphasizes day-to-day orientation, the post-intervention (S2-POST) wave invites attribution to the activity, and the follow-up (S3-FU) wave asks about persistence since participation. Part of the immediate uplift and subsequent attenuation may therefore reflect framing, recency, and demand characteristics, including social desirability inflation at S2-POST, as well as reference-shift or recalibration effects at follow-up, in addition to substantive change. (iv) The absence of a comparison group constrains attribution. Concurrent curriculum activities, school projects, or local heritage and sustainability initiatives may have influenced students’ perceptions between measurement moments, particularly between post-intervention (S2-POST) and follow-up (S3-FU). (v) Generalizability is context-bounded. The study was conducted in a single city using a specific Art Nouveau path and implementation model within the EduCITY DTLE. Transfer to other heritage typologies, age groups, or educational systems may require adaptation. (vi) Implementation took place under authentic in-the-wild conditions and collaborative gameplay, which introduces heterogeneity that cannot be fully modeled with the available data.
Variation in contextual factors such as weather, crowding, and path logistics, together with peer explanation and teacher mediation during group gameplay, may have shaped both the experience and subsequent self-appraisals. These features strengthen ecological validity, but they complicate attribution and may contribute to variability in perceived competence shifts. (vii) The interval to follow-up (S3-FU) provides only a limited window for interpreting durability. While this wave supports assessing short-term maintenance of perceived competences, it is insufficient for claims about long-term retention or for any inference about sustained behavioral change. (viii) Because the study prioritized data minimization, no socio-demographic profiling was collected, and questionnaires did not capture respondent-level age, grade, or school identifiers, which prevents moderation analyses and precludes school-level or cluster-aware analyses (for example, whether trajectories differ by background characteristics or prior interest). In addition, the threshold bands used in this paper should be interpreted as descriptive indicators of prevalence and distributional shift, not as categorical evidence of achieved competence.

6.3. Future Paths

The results motivate three complementary directions spanning method, pedagogy, and scaling.
Methodologically, future work should strengthen cross-source triangulation by linking repeated cross-sectional questionnaire trends to behavioral evidence already available in the project, including gameplay logs, spatial trajectories, and structured teacher observations, within a unified analytic framework. Where feasible, a hybrid design could retain anonymous cohort monitoring while adding a smaller consented panel subsample to estimate intra-individual change and to examine who sustains gains over time. Future implementations may consider collecting minimal, non-identifying covariates (for example, grade band and a broad school-context indicator) under appropriate ethical approval to enable stratified and cluster-aware analyses while preserving data minimization. Including comparison conditions, such as classes exposed to alternative heritage activities or standard instruction, would further strengthen interpretation.
Pedagogically, the observed partial attenuation supports implementing a structured program of follow-up activities to help sustain gains in perceived sustainability competences. A feasible model is a sequenced package combining preparation, reflection, and action: (i) pre-game preparation (classroom, 45 to 60 min): future iterations should systematize this phase as a structured briefing that introduces the local heritage context, frames sustainability value dilemmas, and sets a brief reflective prompt aligned with ESV; (ii) immediate post-game consolidation (classroom, 30 to 45 min): guided debrief, small-group discussion anchored in specific points of interest, and a short reflective artifact (written or multimodal) connecting observations to responsibility and care; (iii) short-term reinforcement (1 to 2 weeks): a micro-project in which groups adopt one visited point of interest, document its value and vulnerabilities, and propose one realistic preservation or sustainability-oriented action; and (iv) medium-term follow-up (4 to 8 weeks): a student-led dissemination or civic activity, such as a school exhibition, a digital story map, or a proposal shared with local stakeholders, accompanied by structured reflection on what was sustained in day-to-day choices.
This program is designed to convert post-intervention salience into repeated opportunities for value enactment, which is a plausible mechanism for stabilizing higher endorsement at follow-up.
Finally, future research should broaden the competence lens by replicating the longitudinal approach across additional GreenComp areas [8] and testing how value trajectories relate to systems thinking, critical reflection, and envisioning sustainable futures. Replications in other heritage settings and cities would help distinguish robust design principles from those requiring local tailoring, strengthening the cumulative evidence base for heritage-based mobile AR in ESD.

Author Contributions

Conceptualization, J.F.-S.; methodology, J.F.-S.; validation, J.F.-S. and L.P.; formal analysis, J.F.-S.; investigation, J.F.-S.; resources, J.F.-S.; data curation, J.F.-S.; writing—original draft, J.F.-S.; writing—review and editing, J.F.-S. and L.P.; visualization, J.F.-S.; supervision, L.P.; project administration, J.F.-S. and L.P. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by National Funds through the FCT—Fundação para a Ciência e a Tecnologia, I.P., under Grant Number 2023.00257.BD, with the following DOI: https://doi.org/10.54499/2023.00257.BD. The EduCITY project is funded by National Funds through the FCT—Fundação para a Ciência e a Tecnologia, I.P., under the project PTDC/CED-EDG/0197/2021.

Data Availability Statement

The datasets supporting the findings of this study are derived from the implementation of the Art Nouveau Path MARG in Aveiro, Portugal. The research datasets (student questionnaires S1-PRE, S2-POST, and S3-FU) contain sensitive information and are therefore not publicly available due to participant privacy, GDPR constraints, and ethical restrictions. These anonymized datasets can be made available from the corresponding author upon reasonable request, subject to institutional approval and data protection requirements. Non-sensitive instruments, templates, and aggregated resources (for example, questionnaires, observation templates, and documentation of the GreenComp mapping) are openly available via the Art Nouveau Path community landing page on Zenodo (https://zenodo.org/communities/artnouveaupath/records/, accessed on 11 December 2025). Publicly shared files omit sensitive fields; item-level logs and any additional restricted outputs remain available on reasonable request under the same ethical and institutional conditions.

Acknowledgments

The authors acknowledge the support of the EduCITY project research team, including assistance during field implementations, as well as the voluntary support provided for data validation. The authors also appreciate the willingness of the participants to contribute to this study. During the preparation of this manuscript, the authors used Microsoft Word, Excel, and PowerPoint (Microsoft 365) for writing and preparing tables and figures; DeepL (DeepL Translator) (version released 5 November 2025) and ChatGPT (OpenAI) (GPT-5, version released 7 August 2025) for translation and language polishing, including redundancy checking; and Julius.ai for an auxiliary plausibility check of selected descriptive summaries. Quantitative data were initially coded, screened, and preprocessed in Excel and were subsequently analyzed and visualized in R (version 4.4.1) using the tidyverse ecosystem and ggplot2 to generate publication-quality figures. Numerical data outputs and score computations were independently cross-checked in MATLAB (version R2025b) by an external researcher as a computational verification step. The authors reviewed and edited all tool outputs and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of this study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
ESD: Education for Sustainable Development
AR: Augmented Reality
MARG: Mobile Augmented Reality Game
DTLE: Digital Teaching and Learning Ecosystem
POI: Point of Interest
GCQuest: GreenComp-based Questionnaire
S1-PRE: Students’ Pre-Intervention Questionnaire
S2-POST: Students’ Post-Intervention Questionnaire
S3-FU: Students’ Follow-Up Questionnaire
RQ: Research Question
EfS: Education for Sustainability
GBL: Game-Based Learning
EDM: Educational Data Mining
LA: Learning Analytics
ESV: Embodying Sustainability Values
SEM: Structural Equation Modeling
KSA: Knowledge, Skills, and Attitudes
CFA: Confirmatory Factor Analysis
DWLS: Diagonally Weighted Least Squares
CFI: Comparative Fit Index
TLI: Tucker–Lewis Index
SRMR: Standardized Root Mean Square Residual
RMSEA: Root Mean Square Error of Approximation
IQR: Interquartile Range
GDPR: General Data Protection Regulation
SD: Standard Deviation
CI: Confidence Interval
df: Degrees of Freedom
K: Knowledge
S: Skills
A: Attitudes

References

1. UNESCO. Education for Sustainable Development: A Roadmap; UNESCO: Paris, France, 2020.
2. UN. Transforming Our World: The 2030 Agenda for Sustainable Development (A/RES/70/1); UN General Assembly; UN: New York, NY, USA, 2015.
3. Council of Europe. Council of Europe Framework Convention on the Value of Cultural Heritage for Society; Council of Europe: Strasbourg, France, 2005.
4. UNESCO. Convention Concerning the Protection of the World Cultural and Natural Heritage. 1972. Available online: https://whc.unesco.org/archive/convention-en.pdf (accessed on 23 December 2025).
5. Lerario, A. The Role of Built Heritage for Sustainable Development Goals: From Statement to Action. Heritage 2022, 5, 2444–2463.
6. Akçayır, M.; Akçayır, G. Advantages and challenges associated with augmented reality for education: A systematic review of the literature. Educ. Res. Rev. 2017, 20, 1–11.
7. Radu, I. Augmented reality in education: A meta-review and cross-media analysis. Pers. Ubiquit Comput. 2014, 18, 1533–1543.
8. Bianchi, G.; Pisiotis, U.; Cabrera, M.; Punie, Y.; Bacigalupo, M. The European Sustainability Competence Framework; Publications Office of the European Union: Luxembourg, 2022.
9. Martín-Ramos, P.; Sánchez-Hernández, E.; Fourati-Jamoussi, F.; Annelin, A.; Charatsari, C.; Ferreira-Santos, J.; Doran, P.; Naves-Sousa, D.; Eugenio-Gozalbo, M.; Giudice, L.A.L.; et al. Operationalizing the European sustainability competence framework: Development and validation of learning outcomes for GreenComp. Open Res. Eur. 2025, 5, 203.
10. Ferreira-Santos, J.; Pombo, L. The Art Nouveau Path: Promoting Sustainability Competences Through a Mobile Augmented Reality Game. Multimodal Technol. Interact. 2025, 9, 77.
11. Ferreira-Santos, J.; Pombo, L. The Art Nouveau Path: Integrating Cultural Heritage into a Mobile Augmented Reality Game to Promote Sustainability Competences Within a Digital Learning Ecosystem. Sustainability 2025, 17, 8150.
12. Ferreira-Santos, J.; Pombo, L. The Art Nouveau Path: Trajectory Analysis and Spatial Storytelling Through a Location-Based Augmented Reality Game in Urban Heritage. ISPRS Int. J. Geo-Inf. 2025, 14, 469.
13. Chatsiopoulou, A.; Michailidis, P.D. Augmented Reality in Cultural Heritage: A Narrative Review of Design, Development and Evaluation Approaches. Heritage 2025, 8, 421.
14. Brady, H.E.; Johnston, R. Repeated Cross-Sections in Survey Data. In Emerging Trends in the Social and Behavioral Sciences; Wiley: Hoboken, NJ, USA, 2015; pp. 1–18.
15. Lebo, M.J.; Weber, C. An Effective Approach to the Repeated Cross-Sectional Design. Am. J. Pol. Sci. 2015, 59, 242–258.
16. Mansfield, K.L.; Puntis, S.; Soneson, E.; Cipriani, A.; Geulayov, G.; Fazel, M. Study protocol: The OxWell school survey investigating social, emotional and behavioural factors associated with mental health and well-being. BMJ Open 2021, 11, e052717.
17. Winter, K.; Moor, I.; Markert, J.; Bilz, L.; Bucksch, J.; Dadaczynski, K.; Fischer, S.M.; Helmchen, R.M.; Kaman, A.; Möckel, J.; et al. Concept and methodology of the Health Behaviour in School-aged Children (HBSC) study—Insights into the current 2022 survey and trends in Germany. J. Health Monit. 2024, 9, 99–117.
18. Ferreira-Santos, J.; Pombo, L.; Marques, M.M. GreenComp-based Questionnaire (GCQuest). 2024. Available online: https://zenodo.org/records/14524933 (accessed on 23 December 2025). (In English)
19. Pelzer, B.; Eisinga, R.; Franses, P.H. ‘Panelizing’ Repeated Cross Sections. Qual. Quant. 2005, 39, 155–174.
20. Achille, C.; Fiorillo, F. Teaching and Learning of Cultural Heritage: Engaging Education, Professional Training, and Experimental Activities. Heritage 2022, 5, 2565–2593.
21. Smith, L. Uses of Heritage; Routledge: London, UK, 2006.
22. Choay, F. As questões do Património; Edições 70: Lisbon, Portugal, 2021.
23. Bandarin, F.; van Oers, R. The Historic Urban Landscape; Wiley: Hoboken, NJ, USA, 2012.
24. ICOMOS. Charter for the Conservation of Historic Towns and Urban Areas; ICOMOS: Paris, France, 1987.
25. Avrami, E.; Macdonald, S.; Mason, R.; Myers, D. Values in Heritage Management; The Getty Conservation Institute: Los Angeles, CA, USA, 2019; Volume 1, No. 1.
26. Kamjou, E.; Scott, M. The heritage-climate change nexus: Towards a values-based adaptive planning response for cultural landscapes. J. Environ. Plan. Manag. 2025, 68, 1–20.
27. Ard, T.; Bienkowski, M.S.; Liew, S.-L.; Sepehrband, F.; Yan, L.; Toga, A.W. Integrating Data Directly into Publications with Augmented Reality and Web-Based Technologies—Schol-AR. Sci. Data 2022, 9, 298.
28. Xu, J.; Pan, Y. The Future Museum: Integrating Augmented Reality (AR) and Virtual-text with AI-enhanced Information Systems. J. Wirel. Mob. Netw. Ubiquitous Comput. Dependable Appl. 2024, 15, 373–394.
29. Ferreira-Santos, J.; Pombo, L. The Art Nouveau Path: Valuing Urban Heritage Through Mobile Augmented Reality and Sustainability Education. Heritage 2026, 9, 4.
30. Avila-Garzon, C.; Bacca-Acosta, J.; Kinshuk; Duarte, J.; Betancourt, J. Augmented Reality in Education: An Overview of Twenty-Five Years of Research. Contemp. Educ. Technol. 2021, 13, ep302.
31. Dunleavy, M.; Dede, C. Augmented Reality Teaching and Learning. In Handbook of Research on Educational Communications and Technology, 4th ed.; Spector, J.M., Merrill, M.D., Elen, J., Bishop, M.J., Eds.; Springer: New York, NY, USA, 2014; pp. 735–745.
32. Pellas, N.; Fotaris, P.; Kazanidis, I.; Wells, D. Augmenting the learning experience in primary and secondary school education: A systematic review of recent trends in augmented reality game-based learning. Virtual Real. 2019, 23, 329–346.
33. Wang, W.-T.; Lin, Y.-L.; Lu, H.-E. Exploring the effect of improved learning performance: A mobile augmented reality learning system. Educ. Inf. Technol. 2023, 28, 7509–7541.
34. De Paolis, L.T.; Gatto, C.; Corchia, L.; De Luca, V. Usability, user experience and mental workload in a mobile Augmented Reality application for digital storytelling in cultural heritage. Virtual Real. 2023, 27, 1117–1143.
35. Alonso-Fernández, C.; Calvo-Morata, A.; Freire, M.; Martínez-Ortiz, I.; Fernández-Manjón, B. Game Learning Analytics: Blending Visual and Data Mining Techniques to Improve Serious Games and to Better Understand Player Learning. J. Learn. Anal. 2022, 9, 32–49.
36. Kim, Y.J.; Valiente, J.A.R.; Ifenthaler, D.; Harpstead, E.; Rowe, E. Analytics for Game-Based Learning. J. Learn. Anal. 2022, 9, 8–10.
37. Gašević, D.; Dawson, S.; Siemens, G. Let’s not forget: Learning analytics are about learning. TechTrends 2015, 59, 64–71.
38. Banihashem, S.K.; Dehghanzadeh, H.; Clark, D.; Noroozi, O.; Biemans, H.J.A. Learning analytics for online game-based learning: A systematic literature review. Behav. Inf. Technol. 2024, 43, 2689–2716.
39. Daoudi, I. Learning analytics for enhancing the usability of serious games in formal education: A systematic literature review and research agenda. Educ. Inf. Technol. 2022, 27, 11237–11266.
40. Butterworth, P.; Watson, N.; Wooden, M. Trends in the Prevalence of Psychological Distress Over Time: Comparing Results From Longitudinal and Repeated Cross-Sectional Surveys. Front. Psychiatry 2020, 11, 595696.
41. Ye, K.; Bilinski, A.; Lee, Y. Difference-in-differences analysis with repeated cross-sectional survey data. Health Serv. Outcomes Res. Methodol. 2025.
42. Cebrián, G.; Junyent, M.; Mulà, I. Current practices and future pathways towards competencies in education for sustainable development. Sustainability 2021, 13, 8733.
43. Lira, B.; O’Brien, J.M.; Peña, P.A.; Galla, B.M.; D’Mello, S.; Yeager, D.S.; Defnet, A.; Kautz, T.; Munkacsy, K.; Duckworth, A.L. Large studies reveal how reference bias limits policy applications of self-report measures. Sci. Rep. 2022, 12, 19189.
44. Javorka, Z.; Nieth, L.; Marinelli, E.; Sutinen, L.; Auzinger, M. GreenComp in Practice: Case Studies on the Use of the European Competence Framework Analytical Report; European Commission: Brussels, Belgium, 2024.
45. Szkola, S.; Napoli, V.; Psiotis, U.; Williquet, F. The GreenComp Game; Publications Office of the European Union: Luxembourg, 2024; Available online: https://publications.jrc.ec.europa.eu/repository/handle/JRC139070 (accessed on 23 January 2026).
46. Monterde-Miralles, N.; Cebrián, G.; Martín-Arbós, S.; Junyent, M. Sustainability Competence Vignette Questionnaire for Secondary Education Students: Design and Validation. Sustain. Dev. 2025, 33, 971–987.
47. Yin, R.K. Case Study Research Design and Methods, 5th ed.; SAGE Publications, Inc.: Thousand Oaks, CA, USA, 2015.
48. Barth, M.; Thomas, I. Synthesising case-study research—Ready for the next step? Environ. Educ. Res. 2012, 18, 751–764.
49. McKenney, S.; Reeves, T. Education Design Research. In Handbook of Research on Educational Communications and Technology, 4th ed.; Springer: New York, NY, USA, 2014; p. 29.
50. Anderson, T.; Shattuck, J. Design-Based Research. Educ. Res. 2012, 41, 16–25.
51. Municipal Educational Action Program of Aveiro 2024–2025 (PAEMA). Available online: https://tinyurl.com/PAEMAveiro (accessed on 12 November 2025).
52. Garland, R. The Mid-Point on a Rating Scale: Is it Desirable? Mark. Bull. 1991, 2, 66–70.
53. Beglar, D.; Nemoto, T. Developing Likert-scale questionnaires. In JALT2013 Conference Proceedings; JALT: Tokyo, Japan, 2014; pp. 1–8.
54. South, L.; Saffo, D.; Vitek, O.; Dunne, C.; Borkin, M.A. Effective Use of Likert Scales in Visualization Evaluations: A Systematic Review. Comput. Graph. Forum 2022, 41, 43–55.
55. Marques, M.M.; Ferreira-Santos, J.; Rodrigues, R.; Pombo, L. Mobile Augmented Reality Games Towards Smart Learning City Environments: Learning About Sustainability. Computers 2025, 14, 267.
56. Ferreira-Santos, J.; Marques, M.M.; Pombo, L. GreenComp-Based Questionnaire (GCQuest): Questionnaire Development and Validation. Preprints 2024.
57. JASP Team. JASP; JASP: Amsterdam, The Netherlands, 2025. Available online: https://jasp-stats.org/ (accessed on 12 November 2025).
58. Art Nouveau Path MARG’s Zenodo Community. Available online: https://zenodo.org/communities/artnouveaupath/records/ (accessed on 23 December 2025).
59. Lim, K.Y.T.; Habig, S. Beyond observation and interaction: Augmented Reality through the lens of constructivism and constructionism. Br. J. Educ. Technol. 2020, 51, 609–610.
60. Gruenewald, D.A. The Best of Both Worlds: A Critical Pedagogy of Place. Educ. Res. 2003, 32, 3–12.
Figure 1. Study design and data collection timeline for the repeated cross-sectional longitudinal (trend) datasets: S1-PRE (prior to February 2025; N = 221), S2-POST (February–April 2025; N = 439), and S3-FU (July 2025; N = 434); analytic N presented.
Figure 2. Mean Embodying Sustainability Values (ESV) composite score by wave (S1-PRE, S2-POST, S3-FU), with 95% confidence intervals. The ESV score is computed as the mean of Q1 to Q25 on a 1 to 6 Likert scale.
Figure 3. Distribution of ESV composite scores by wave. The plot highlights the post-intervention rightward shift at S2-POST and the partial attenuation at S3-FU. The orange lines in the boxplots indicate the mean.
Figure 4. Proportion of students above pragmatic ESV thresholds (4.0, 4.5, 5.0) by wave. Thresholds are reported as descriptive prevalence indicators to complement distribution-aware score contrasts.
Figure 5. Item-level mean gains from S1-PRE to S2-POST (Delta S1 to S2), ordered by magnitude to highlight the most responsive ESV items immediately after gameplay.
Figure 6. Item-level mean losses from S2-POST to S3-FU (Delta S2 to S3), ordered by magnitude to highlight which perceived gains show the weakest retention at follow-up.
Figure 7. Aggregated trajectories by KSA prompt modality, reporting mean deltas across waves to support interpretive triangulation of item discourse features and longitudinal patterns.
Figure 8. Item-level retention from pre-intervention (S1-PRE) to follow-up (S3-FU) across the 25 ESV items. Points represent Hedges g for the S3-FU minus S1-PRE difference (positive values indicate higher follow-up scores), with horizontal 95% confidence intervals. Items are ordered by effect size. Filled circles indicate items with retained gains at follow-up (positive mean difference and Holm-adjusted Welch t-test p < 0.05), while crosses indicate items not retained after multiplicity control. Overall, 17 of 25 items showed retained gains at follow-up; the largest effects are observed for Q7, Q12, Q16, and Q6, with similarly large effects for Q21 and Q18. Items not retained include Q2, Q3, Q9, Q10, Q13, Q23, Q24, and Q25.
Table 1. Data quality and internal consistency (ESV, 25 items).
WaveRaw Data
(N)
Analytic Data
(N)
Missing Cells (Raw Data)Out-of-RangeResponse RangeCronbach’s AlphaMcDonald’s Omega
S1-PRE221221001 to 60.720.72
S2-POST439438701 to 60.880.88
S3-FU434434001 to 60.750.76
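The Cronbach's alpha values in Table 1 follow the standard variance-based formula. As an illustrative sketch only (the respondent data below are simulated, not the study's), alpha for a 6-point Likert matrix of 25 items can be computed as:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) response matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the sum score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Simulated 6-point Likert responses (100 respondents x 25 items),
# sharing a common latent factor so the scale is internally consistent
rng = np.random.default_rng(0)
latent = rng.normal(4.0, 1.0, size=(100, 1))
noise = rng.normal(0.0, 1.0, size=(100, 25))
responses = np.clip(np.rint(latent + noise), 1, 6)

alpha = cronbach_alpha(responses)  # high (> 0.8) for these correlated items
```

McDonald's omega (also reported in Table 1) additionally requires factor-loading estimates, so it is typically obtained from a dedicated psychometrics package rather than a few lines of NumPy.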
Table 2. ESV composite score descriptives (mean of Q1–Q25).

| Wave | N | Mean | Standard Deviation (SD) | 95% Confidence Interval (CI) |
|---|---|---|---|---|
| S1-PRE | 221 | 3.70 | 0.54 | [3.63, 3.77] |
| S2-POST | 438 | 4.64 | 0.50 | [4.59, 4.68] |
| S3-FU | 434 | 4.13 | 0.36 | [4.09, 4.16] |
Table 3. Sensitivity analysis: mean-based between-wave comparisons for ESV composite score.

| Contrast | Delta (Mean B − Mean A) | 95% CI (Delta) | Welch t | Degrees of Freedom (df) | p (Holm) | Hedges g |
|---|---|---|---|---|---|---|
| S1-PRE vs. S2-POST | +0.93 | [0.85, 1.02] | −21.57 | 413.12 | <0.001 | +1.82 |
| S2-POST vs. S3-FU | −0.51 | [−0.57, −0.45] | +17.26 | 798.23 | <0.001 | −1.17 |
| S1-PRE vs. S3-FU | +0.43 | [0.35, 0.50] | −10.59 | 324.71 | <0.001 | +0.99 |
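The Table 3 statistics can be reproduced, up to rounding of the published means and SDs, from the Table 2 summaries alone. The sketch below is illustrative rather than the authors' analysis code; note that the published t values carry the opposite sign convention (A − B) to the deltas (B − A).

```python
import math

def welch_hedges(m1, s1, n1, m2, s2, n2):
    """Welch t, Welch-Satterthwaite df, and Hedges g for B minus A,
    computed from summary statistics (mean, SD, n) of two groups."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (m2 - m1) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    # Pooled SD for Cohen's d, then small-sample correction for Hedges g
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    g = ((m2 - m1) / sp) * (1 - 3 / (4 * (n1 + n2) - 9))
    return t, df, g

# S1-PRE (A) vs. S2-POST (B), using the Table 2 descriptives
t, df, g = welch_hedges(3.70, 0.54, 221, 4.64, 0.50, 438)
# t ~ 21.6, df ~ 413, g ~ 1.83, matching Table 3 in magnitude
```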
Table 4. Proportions above thresholds (competence bands).

| Threshold | S1-PRE (n/N) | S2-POST (n/N) | S3-FU (n/N) |
|---|---|---|---|
| ≥4.0 | 64/221 (28.96%) | 388/438 (88.58%) | 326/434 (75.12%) |
| ≥4.5 | 20/221 (9.05%) | 310/438 (70.78%) | 43/434 (9.91%) |
| ≥5.0 | 0/221 (0.00%) | 72/438 (16.44%) | 11/434 (2.53%) |
Table 5. Item-level highlights (means and change patterns).

| Pattern | Items | Summary (Delta) |
|---|---|---|
| Largest increases, S1-PRE to S2-POST | Q7, Q17, Q6, Q15, Q5 | +1.25, +1.24, +1.22, +1.19, +1.19 |
| Largest decreases, S2-POST to S3-FU | Q23, Q3, Q17, Q25, Q5 | −0.77, −0.70, −0.68, −0.67, −0.64 |
| Items not significant, S1-PRE vs. S3-FU (Holm) | Q2, Q3, Q9, Q10, Q13, Q23, Q24, Q25 | Long-term differences not robust after multiplicity control |
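"Holm" in Table 5 denotes Holm's step-down correction applied across the 25 item-level tests. A minimal sketch with hypothetical p-values (not the study's):

```python
def holm_adjust(pvals):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Step-down: multiply by (m - rank) and enforce monotonicity
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

# Three hypothetical raw p-values
adjusted = holm_adjust([0.01, 0.04, 0.03])
# adjusted ~ [0.03, 0.06, 0.06]: only the first survives at the 0.05 level
```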
Table 6. S2-POST to S3-FU loss highlights (Figure 6 ordering support).

| Pattern | Items | Summary (Delta S2–S3 = S3 Minus S2) |
|---|---|---|
| Largest decreases, S2-POST to S3-FU | Q23, Q3, Q17, Q25, Q5 | −0.77, −0.70, −0.68, −0.67, −0.64 |
| Smallest decreases, S2-POST to S3-FU (best retention) | Q12, Q21, Q10, Q1, Q9 | −0.25, −0.29, −0.32, −0.35, −0.41 |
Table 7. Trajectories by KSA prompt modality (mean deltas aggregated by item category).

| Item Category (per KSA) | Items (Q) | n Items | Delta S1–S2 | Delta S2–S3 | Delta S1–S3 |
|---|---|---|---|---|---|
| Knowledge (K) | Q4, Q10, Q16, Q21, Q24 | 5 | 0.85 | −0.45 | 0.40 |
| Skills (S) | Q2, Q3, Q6, Q7, Q8, Q12, Q14, Q15, Q18, Q19, Q20, Q22, Q23 | 13 | 0.98 | −0.51 | 0.47 |
| Attitudes (A) | Q1, Q5, Q9, Q11, Q13, Q17, Q25 | 7 | 0.91 | −0.55 | 0.36 |
Table 8. Shared characteristics of the eight items without retained effects at follow-up (S3-FU).

| Item | KSA Category | Content Focus | Higher-Order Demand |
|---|---|---|---|
| Q2 | S | Negotiating sustainability values and targets across viewpoints | Arguing, perspective taking, negotiating trade-offs |
| Q3 | S | Identifying actions to avoid or reduce natural resource use | Applying principles to everyday practice, transfer |
| Q9 | K | Environmental justice that includes other species and ecosystems | Systems framing, moral extension beyond humans |
| Q10 | K | Distinguishing and evaluating sustainability worldviews (anthropocentrism, technocentrism, ecocentrism) | Categorizing concepts, critical evaluation |
| Q13 | A | Articulating personal sustainability values in dialog | Self-positioning, dialogic openness |
| Q23 | K | Integrating community values, including minorities, into problem framing and decisions | Equity-oriented deliberation, inclusion reasoning |
| Q24 | K | Humans as part of nature, rejecting a strict human-ecological split | Reframing human-nature relations, ontological reconsideration |
| Q25 | A | Appraising cultural contexts through sustainability impacts | Contextual judgment, normative reasoning |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Ferreira-Santos, J.; Pombo, L. The Art Nouveau Path: Longitudinal Analysis of Students’ Perceptions of Sustainability Competence Development Through a Mobile Augmented Reality Game. Computers 2026, 15, 86. https://doi.org/10.3390/computers15020086
