Article

A New Serious Game (e-SoundWay) for Learning English Phonetics

by Alfonso Lago-Ferreiro 1, María Ángeles Gómez-González 2 and José Carlos López-Ardao 3,*

1 Department of Electronic Technology, Universidad de Vigo, 36310 Vigo, Spain
2 Department of English and German Philology, Universidad de Santiago de Compostela, 15782 Santiago de Compostela, Spain
3 atlanTTic Research Center for Telecommunications Technology, Universidad de Vigo, 36310 Vigo, Spain
* Author to whom correspondence should be addressed.
Multimodal Technol. Interact. 2025, 9(6), 54; https://doi.org/10.3390/mti9060054
Submission received: 3 April 2025 / Revised: 29 May 2025 / Accepted: 3 June 2025 / Published: 4 June 2025
(This article belongs to the Special Issue Video Games: Learning, Emotions, and Motivation)

Abstract

This paper presents the design and evaluation of e-SoundWay, a cross-platform serious game developed to improve English phonetic competence through a multimodal and narrative-driven approach. While the platform is specifically tailored to meet the needs of Spanish-speaking learners, it is adaptable for a wider range of English as a Foreign Language (EFL) users. e-SoundWay offers over 600 interactive multimedia minigames that target three core competencies: perception, production, and transcription. Learners progress along a gamified version of the Camino de Santiago, interacting with characters representing diverse English accents. A mixed-methods evaluation combining pre- and post-tests with a user experience questionnaire revealed statistically significant improvements across all domains, particularly in perception. Reduced post-test variability indicated more equitable learning outcomes. User satisfaction was high, with 64% of participants reporting satisfaction with their phonetic progress and 91% stating they would recommend the platform. These findings highlight the educational effectiveness, accessibility, and motivational value of e-SoundWay, reinforcing the role of serious games and multimodal technologies in delivering inclusive and engaging pronunciation instruction.

1. Introduction

The ability to communicate effectively in English has become a fundamental skill in today’s globalized world, where English functions as a lingua franca across academic, professional, and social domains. With more than 1.35 billion people using English as an international, second or foreign language (EIL/ESL/EFL), linguistic competence is no longer optional—it is strategic [1,2]. The growth of digital communication, global mobility, and intercultural interaction has led to the proliferation of English varieties and communicative norms, reinforcing the need for learners to navigate diverse linguistic contexts [3,4].
Within this evolving landscape, communicative competence must encompass not only grammatical and sociolinguistic knowledge but also phonetic awareness. This includes the ability to produce intelligible speech, accurately control rhythm and intonation, and recognize different English accents [5,6]. However, international assessments such as PISA and the European Survey on Language Competences consistently identify pronunciation as a major challenge for Spanish-speaking EFL learners, especially at the secondary and tertiary levels. These difficulties are often linked to cross-linguistic interference, phoneme-grapheme mismatches, and the persistence of outdated or teacher-centered methodologies [7,8].
In response to these challenges, this study introduces e-SoundWay, a cross-platform serious game aimed at developing English phonetic competence through an innovative, learner-centered, and gamified approach. While specifically designed with Spanish-speaking learners in mind, the platform’s adaptability makes it relevant for a broader EFL population. e-SoundWay leverages the power of mobile learning, interactive multimedia, and narrative-driven design to support lifelong learning and promote inclusive, accessible phonetics instruction.
Co-developed by the Electronic Equipment Engineering Group at the University of Vigo and the Scimitar linguistics team of the Discourse and Identity Group at the University of Santiago de Compostela, the tool reflects a cross-disciplinary partnership that integrates technological innovation with pedagogical and linguistic expertise. As will be further detailed in Section 2, the platform includes over 600 interactive minigames, targeting three core phonetic skills: (1) perception—the ability to distinguish phonemes, allophones, and accentual variation; (2) production—accurate articulation of English speech sounds; and (3) transcription—mastery of the International Phonetic Alphabet (IPA) for decoding and encoding speech. A custom virtual keyboard, “TecladoFonetico,” featuring 51 IPA symbols, supports transcription tasks [9].
The pedagogical design of e-SoundWay is grounded in blended and mobile learning paradigms that combine synchronous and asynchronous instruction [10,11]. Drawing from principles of Computer-Assisted Language Learning (CALL) and Mobile-Assisted Language Learning (MALL), the platform fosters learner autonomy, multimodal engagement, and digital literacy [12,13]. Importantly, it also promotes cultural awareness through a storyline centered on the Camino de Santiago, which introduces learners to diverse English accents embedded in rich communicative contexts.
Although research has consistently highlighted the benefits of Computer-Assisted Pronunciation Training (CAPT) [14,15], many existing tools remain limited in scope. Common criticisms include the lack of personalized feedback, overly simplified phonological models, and assumptions about user proficiency that do not align with real learner needs [16,17]. Moreover, issues such as low digital literacy and insufficient scaffolding contribute to high dropout rates in online phonetics instruction [18].
This study addresses these limitations by evaluating the pedagogical impact of e-SoundWay as a next-generation CAPT solution. Section 2 presents the research design, platform features, as well as the guiding research questions. Section 3 and Section 4, in turn, detail and discuss the results, while Section 5 draws together key conclusions and implications for future research and educational practice.

2. Materials and Methods

2.1. Tools for Game Creation

The serious game e-SoundWay was developed to enhance learner motivation and engagement in acquiring English phonetics, with a particular focus on segmental and suprasegmental pronunciation. Grounded in the EPSS method [19], which combines perceptual training, articulatory practice, and phonological transcription through task-based, multimodal instruction, the platform is purposefully designed for seamless integration into university-level English Phonetics and Phonology courses. At the same time, its user-friendly interface and autonomous learning features make it equally accessible to independent learners, educators, and professionals aiming to strengthen their phonetic awareness and English pronunciation skills in a flexible, engaging format. The method has been shown to improve learners’ intelligibility, self-monitoring, and transcription accuracy in empirical studies conducted in higher education contexts [20,21]. In particular, the authors in [20] report an empirical evaluation involving over 200 Spanish-speaking university students, demonstrating significant improvements in segmental accuracy, phonemic transcription, and learner motivation following the use of EPSS-based multimedia materials. These findings are further supported by the authors’ longitudinal study [21], confirming statistically significant gains in pronunciation accuracy, transcription fluency, and error detection skills among students using the EPSS Multimedia Lab. Together, these studies confirm the value of EPSS-informed digital tools for phonetics instruction and validate their adaptation to serious game formats such as e-SoundWay. Designed as a complementary digital resource to traditional classroom instruction, the game promotes self-directed, gamified learning that supports phonological awareness and active learner participation.
The e-SoundWay design philosophy is also grounded in research demonstrating the motivational and pedagogical effectiveness of serious games in second-language learning environments [22,23,24,25,26,27,28,29,30,31,32,33,34,35]. Meta-analyses such as those by Girard et al. [22] and Subhash and Cudney [23] provide compelling evidence for the effectiveness of serious games and gamified learning in higher education. Girard et al. [22] demonstrate that serious games not only improve motivation and engagement but also lead to significant gains in knowledge acquisition and skill development, outcomes that are particularly relevant for pronunciation training, where repetitive, focused practice is essential. Similarly, Subhash and Cudney [23] identify critical gamification elements, such as feedback, rewards, progress tracking, and learner autonomy, as key drivers of improved academic performance and sustained engagement. These findings underscore the pedagogical value of integrating serious games into EFL learning environments, where they can provide personalized, interactive, and motivating contexts for developing phonological competence and pronunciation accuracy.
The application was developed using the Unity engine [36], chosen for its robust multimedia capabilities and broad compatibility across operating systems (Windows, macOS, Linux) and mobile platforms (Android, iOS). Unity’s architecture enables the creation of immersive 2D environments through real-time rendering, including dynamic shadows, relief mapping, and reflections. Game logic is programmed using MonoDevelop, an open-source IDE that supports C# and other .NET languages, allowing for precise scripting and control of visual and interactive components. The development environment features four primary components:
  • Hierarchical Tree: Enables the creation of screen elements and their interdependencies, including layered objects such as buttons, text, and images.
  • Project Window: Hosts the organizational structure of assets and code modules used throughout the application.
  • Scene and Game View: Provides a live preview of the interface, including audio-visual materials and user interactions.
  • Inspector and Console: Allows developers to configure each element’s properties (e.g., size, position, behavior) and troubleshoot runtime issues.
Scene resources are unloaded between stages to optimize memory usage, except for background music, which uses a Singleton pattern to maintain continuity. Care is taken to avoid redundant Singleton instances that could degrade performance.
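To make this mechanism concrete, the following is a minimal Unity C# sketch of the background-music Singleton described above, assuming a GameObject with an attached AudioSource; the class name BackgroundMusic and its details are illustrative rather than taken from the e-SoundWay codebase.

```csharp
using UnityEngine;

// Minimal sketch of the Singleton pattern used to keep background music playing
// across stage transitions. Names are illustrative, not from the actual source.
public class BackgroundMusic : MonoBehaviour
{
    private static BackgroundMusic instance;

    void Awake()
    {
        // Destroy redundant copies so only one music object survives scene loads,
        // avoiding the duplicated-Singleton overhead mentioned above.
        if (instance != null && instance != this)
        {
            Destroy(gameObject);
            return;
        }
        instance = this;
        DontDestroyOnLoad(gameObject); // persists while other scene resources are unloaded
    }
}
```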
Additionally, a core functionality of the game is its integration of offline speech recognition, implemented via the Recognissimo Module [37]. This system converts learner speech into text, enabling real-time comparison with target phonemes. All voice processing routines are implemented as asynchronous co-routines that run in parallel with the main gameplay. This design ensures that speech recognition tasks, such as audio capture, preprocessing, and text conversion, do not interfere with interactive gameplay elements. To maintain performance stability, these routines allocate system resources dynamically and introduce controlled delays to accommodate real-time sentence processing without compromising user experience. All mini-games that require speech recognition include specialized code blocks configured to meet the technical and pedagogical requirements of each task. These blocks handle input signal cleaning, timing control, and target-specific evaluation procedures to ensure consistent, accurate performance across diverse game types as follows:
  • Language Model Provider: References the target language model used to evaluate user speech input (see Figure 1). It specifies the location of the language-specific dictionaries and phonetic resources that support the speech recognition process.
  • Audio fragment as voice source: Handles incoming audio clips, including cleaning, segmentation, and preparation for recognition tasks (Figure 2). It is specifically used in gameplay sections that require temporary audio file creation for comparing user-generated speech with predefined reference samples. By preprocessing the audio input, this module improves the accuracy and reliability of the speech recognition system.
  • Speech Recognizer: Executes the speech recognition process by integrating various program elements (Figure 3). It utilizes the target language configuration defined in the Language Model Provider and accepts input either directly from a microphone or from pre-recorded audio clips. Target words are dynamically assigned to guide the evaluation process. A dedicated function supports control and comparison algorithms, allowing the system to generate reference lists and assess learner output through tolerance thresholds and approximation-based matching.
  • External speech recognizer: Manages the execution of the speech recognition program based on the mini-game type selected (Figure 4). Within the development environment, a Boolean selector is used to configure the input format: short recordings for Listen and Repeat tasks (RECORDING1), medium-length recordings for Minimal Pairs (RECORDING2), and extended recordings for Connected Speech exercises (RECORDING3). This configuration ensures that the recognition system adapts accurately to the phonetic and temporal complexity of each task type.
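The listing below sketches how such a non-blocking recognition step can be structured as a Unity coroutine under the design described above. It does not reproduce the Recognissimo API: the ISpeechRecognizer interface, the RecognizeClip call, the tolerance value, and the similarity function are placeholders introduced for illustration only.

```csharp
using System;
using System.Collections;
using UnityEngine;

// Hypothetical wrapper around the offline recognizer; in the real game the
// Language Model Provider and Speech Recognizer components are configured
// in the Unity editor as described above.
public interface ISpeechRecognizer
{
    void RecognizeClip(AudioClip clip, Action<string> onResult);
}

public class PronunciationTask : MonoBehaviour
{
    public float tolerance = 0.8f; // illustrative similarity threshold

    // Runs in parallel with gameplay: recording and recognition never block the main loop.
    public IEnumerator EvaluateAttempt(ISpeechRecognizer recognizer, string targetWord)
    {
        // 1. Capture a short recording from the default microphone (e.g., Listen and Repeat).
        AudioClip clip = Microphone.Start(null, false, 3, 16000);
        yield return new WaitForSeconds(3f);   // controlled delay for real-time capture
        Microphone.End(null);

        // 2. Hand the clip to the recognizer and wait asynchronously for its result.
        string recognized = null;
        recognizer.RecognizeClip(clip, text => recognized = text);
        yield return new WaitUntil(() => recognized != null);

        // 3. Approximation-based matching of the recognized text against the target.
        bool success = Similarity(recognized, targetWord) >= tolerance;
        Debug.Log(success ? "Correct pronunciation" : "Try again");
    }

    // Very rough placeholder (shared-prefix ratio); the platform applies its own
    // comparison and tolerance algorithms.
    private static float Similarity(string a, string b)
    {
        a = a.Trim().ToLowerInvariant();
        b = b.Trim().ToLowerInvariant();
        int common = 0;
        int len = Mathf.Min(a.Length, b.Length);
        while (common < len && a[common] == b[common]) common++;
        return b.Length == 0 ? 0f : (float)common / b.Length;
    }
}
```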

2.2. General Structure and Game Contents

e-SoundWay delivers its phonetic content through 12 carefully designed challenge types, implemented as multimedia mini-games that develop learners’ perception, production, and transcription skills. Each mini-game aligns auditory input with visual and textual elements, creating multisensory reinforcement and fostering phonological mapping. This design draws on Mayer’s multimedia learning theory [4] and Derwing and Munro’s principles for effective pronunciation instruction [6], an approach that enhances both phoneme recognition and segmental intelligibility.
The tool adopts a microlearning framework [2], segmenting instruction into targeted, manageable “phonetic pills” that promote focused practice and cognitive retention. Empirical studies have shown that modular, feedback-driven phonetics tasks, particularly when implemented in gamified digital environments, boost learner motivation and improve long-term outcomes [13,16]. To support accessibility and reinforce phonological processing, the platform incorporates visual stimuli based on ARASAAC (Aragonese Portal of Augmentative and Alternative Communication) pictograms, which, as demonstrated by Allen et al. [38], facilitate sound-word mapping and phonemic awareness development, particularly for learners with additional processing needs.
High-quality audio recordings, produced at University College London’s Speech, Hearing & Phonetic Sciences division, ensure articulatory accuracy and pedagogical consistency. The musical soundtrack, featuring Galician groups Luar na Lubre and Resonet, provides a culturally immersive experience that enriches the phonetic input. This aligns with findings by Lems [39], who emphasizes that music rooted in learners’ cultural backgrounds fosters rhythmic sensitivity, enhances prosodic modeling, and increases engagement, key components in pronunciation training.
The user interface comprises a home screen, a progress-tracking dashboard, challenge-specific instructions, the mini-games themselves, and a Library of supplementary didactic materials. Although the total number of challenge types remains fixed at 12, the user’s interaction determines the order, frequency, and depth of engagement with each module. These design choices reflect learner-centered principles in CAPT [24] and serious game research [32,33,35], which emphasize scaffolding, adaptive feedback, learner autonomy, and sustained engagement.
Upon launching the application, users enter the general interface via the home screen. From there, they can access progress metrics, browse categorized mini-games and stage information, or consult the Library for additional learning materials. Figure 5 illustrates the overall game flow. While the number of challenge stages is generally stable, it may vary slightly based on task configuration but never exceeds 12.

2.2.1. Home Screen

The e-SoundWay platform incorporates a user authentication system to personalize the learning experience and ensure data privacy. On the home screen, users are presented with two entry options: creating a new user profile or logging in using a previously registered username and password. This structure allows for individualized learning trajectories while safeguarding player data against unauthorized access, a critical feature in educational technologies where progress tracking and personalization are essential components of learning effectiveness [1,5].
When a new user initiates the registration process, the interface prompts for a username and password. The system then verifies whether all required fields are completed. If the data are missing, the profile cannot be created. Upon successful validation, the program checks for redundancy by comparing the input to existing entries. If the user is unique, the system creates a dedicated data table on the local device. This table remains persistent and dynamically records the player’s progress throughout the game. The back-end structure also maintains a global registry to manage all users. This includes an incremental counter tracking the number of players created, and a reference map linking each username to a unique creation identifier. Such functionality supports both administrative transparency and pedagogical scalability, features increasingly emphasized in gamified learning environments [27,33]. In cases where a user attempts to log in with an existing profile, the system authenticates their identity and retrieves the corresponding data table, enabling a seamless continuation of the game and individualized pronunciation training.
This model is consistent with best practices in CALL and CAPT, where progress tracking and adaptive content delivery are recognized as key factors in maintaining learner engagement and ensuring pedagogical alignment with prior performance [24]. Additionally, studies in game-based learning environments highlight that personalization and ownership over learning data contribute significantly to learner autonomy and sustained motivation [33,34].
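As an illustration of this registration and login flow, the sketch below uses a dictionary-based registry and Unity’s PlayerPrefs as stand-ins for the per-player data tables; the storage mechanism, class names, and keys are assumptions, and password handling is omitted for brevity.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Simplified sketch of the profile-creation flow described above. The real game
// maintains persistent per-player data tables on the local device; PlayerPrefs
// and the dictionary below stand in for that storage purely for illustration.
public class UserRegistry : MonoBehaviour
{
    // Global registry: maps each username to a unique creation identifier.
    private readonly Dictionary<string, int> users = new Dictionary<string, int>();
    private int playerCounter = 0; // incremental counter of created profiles

    public bool Register(string username, string password)
    {
        // Reject incomplete forms: the profile cannot be created if data are missing.
        if (string.IsNullOrEmpty(username) || string.IsNullOrEmpty(password))
            return false;

        // Redundancy check against existing entries.
        if (users.ContainsKey(username))
            return false;

        // Assign a unique creation ID and initialize the player's persistent record.
        playerCounter++;
        users[username] = playerCounter;
        PlayerPrefs.SetInt(username + "_id", playerCounter);
        PlayerPrefs.SetInt(username + "_stage", 1); // progress starts at stage 1
        PlayerPrefs.Save();
        return true;
    }

    public bool Login(string username)
    {
        // Retrieve the existing record so play resumes where it was left off
        // (password verification omitted in this sketch).
        return PlayerPrefs.HasKey(username + "_id");
    }
}
```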

2.2.2. Main Interface

The structural design of e-SoundWay draws on the roadmap game metaphor, which, as noted by Peterson and Jabbari [40], leverages spatially oriented progression systems to enhance learner motivation, engagement, and perceived competence in EFL contexts, especially when embedded within meaningful cultural narratives. Accordingly, the user interface consists of four core interactive modules: the Global Map, the Stage Map, the Options Panel, and the Pilgrim Zone. These modules operate within a unified scene and are supported by a set of initialization routines responsible for rendering information dynamically on screen. Their integration ensures consistency in user interaction, performance tracking, and pedagogical feedback across the platform. A top navigation bar enables users to switch seamlessly between modules via button-based actuators (Figure 6). These buttons allow users to exit the game, access the current stage map (based on a stored stage ID), explore the global route (“The Way”), consult the Pilgrim Zone for individual learning analytics, retrieve additional resources from the Library, or configure settings through the Options panel. The interface adheres to principles of minimalist design in CALL environments, where streamlined navigation, visual clarity, and functional efficiency are essential to sustaining learner engagement and facilitating effective interaction [24,33].
  • Stage Map: Displays an image of a real section of the Camino de Santiago with interactive stop markers placed along the route (Figure 7). On the left side of the interface, a static frame presents information on player progress, including stop descriptions and a visual record of earned medals. Scallop shells, chosen for their cultural association with the Camino, represent achievement tiers: bronze, silver, or gold, awarded based on learner performance. The use of culturally grounded symbols aligns with findings by Lems [39], who emphasizes that familiar cultural cues increase emotional investment and deepen engagement with learning tasks. Each stage map includes interactive buttons programmed to activate animations, illuminate earned medals, and display stop-specific data such as time spent and phonetic scores. A “GO!” button becomes available once a stop is activated, allowing learners to continue their journey. Across the platform, 34 stage maps provide unique content and visual references while maintaining a consistent interaction logic.
  • The Global Map (“The Way”): includes hidden actuators that, once triggered, load associated stage data and dynamically generate the relevant interface elements. This modular design supports data-driven content delivery and simplifies stage transitions, enhancing both navigational clarity and scalability [35].
  • The Pilgrim Zone: Includes a personalized learning dashboard (see Figure 8) that provides learners with real-time feedback on their current stage, accumulated scores, unlocked content, and overall progress across three core skill areas: perception, production, and transcription. These competencies are visually represented in a dynamic 3D pyramid, which reflects the learner’s advancement along each dimension. Additional features, such as an experience bar and a usage time clock, are incorporated to promote sustained engagement, metacognitive awareness, and effective time management, in accordance with established principles of serious game design [33,34]. Consistent with other EFL-focused serious games [23,24], e-SoundWay adopts a scaffolded, performance-based learning path grounded in the EPSS model. This structure ensures a gradual increase in phonetic and phonological complexity, thereby supporting effective and individualized pronunciation development.
  • The Options Panel: Provides basic configuration tools, such as adjusting game audio levels and resetting progress. Selecting the reset option deletes all persistent data tied to the player profile, restoring the game to its default state. This functionality supports both usability testing and learner control, two features highlighted in effective CALL platform design [24,27].

2.2.3. Stop Screens

Stop screens serve as intermediary nodes between the main interface and the mini-games, offering users contextual information and navigational control within each stage. These screens are composed of a flexible set of buttons, the number and arrangement of which vary depending on the volume of content and interaction options associated with each stop. The interface supports up to nine buttons per screen. At the top, one or two arrow-shaped icons, visually inspired by the Camino de Santiago, enable learners to move sequentially between stops without returning to the main interface. This design facilitates fluid navigation and maintains cognitive continuity within stages. Additional top-level controls include sound settings and a return button to the main interface. Centrally positioned buttons (typically between two and five) provide access to individual informational modules, while a dedicated final button always launches the phonetic mini-game associated with the current stop. This consistent layout reinforces procedural fluency and supports repeated exposure, two key principles in the design of educational games for language learning [33,34].
When learners interact with any of the informational buttons, a modal window appears containing written explanations and corresponding audio narration (Figure 9b). These dual-channel elements promote phonological awareness through synchronized verbal and visual input, a technique supported by multimedia learning theory [4] and studies on pronunciation-focused CALL environments [24]. Additionally, some informational windows include interactive hyperlinks, open in the default system browser and highlighted in blue, which redirect learners to external resources such as the EPSS Multimedia Lab website (Figure 9b), thereby bridging in-game content with broader phonetic instruction. The structure of these stop screens aligns with scaffolding principles in serious game design, providing layered access to core content while encouraging learner autonomy through selective exploration.

2.2.4. The Library as a Multimodal Resource

The Library in e-SoundWay offers a dedicated space for extended, multimodal learning that reinforces the game’s phonetic training goals. It provides curated content that includes historical insights into pilgrimage traditions, cultural references linked to the Camino de Santiago, and phonetic resources anchored in the EPSS framework. Accessible through hyperlinked audiovisual modules, the Library connects learners to the EPSS Multimedia Lab, allowing them to deepen their understanding of phonetic theory and pronunciation practice beyond the game itself (Figure 10).
By bridging in-game tasks with evidence-based content and reflective learning, the Library significantly expands the pedagogical scope of CAPT. It unites task-based learning with structured phonetic theory, transforming e-SoundWay into a dynamic, multimodal environment that supports both formal instruction and autonomous learning. This integration fosters learner autonomy, metacognitive strategy development, and sustained motivation, as emphasized in research on content-rich CALL environments [16,20,22]. Moreover, by embedding culturally and linguistically relevant resources, the Library enhances emotional engagement and contextual relevance [7,39].
Technically, the Library consists of four modular, dynamically loaded objects, each combining text, images, and interactive components that activate only when selected. This design prevents scene conflicts, ensures efficient memory use, and aligns with best practices in mobile-assisted language learning, where modularity and responsiveness are essential [15,24].
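A minimal sketch of this on-demand activation pattern follows, assuming each Library module is a child GameObject toggled from a menu; the class and field names are illustrative, not taken from the actual implementation.

```csharp
using UnityEngine;

// Illustrative lazy-activation pattern for the Library's four modules: content
// objects stay inactive (and their interactive components dormant) until the
// corresponding menu button is pressed, avoiding scene conflicts and saving memory.
public class LibraryMenu : MonoBehaviour
{
    [SerializeField] private GameObject[] modules = new GameObject[4]; // text + image + interaction bundles

    // Wired to the four Library buttons in the Unity editor.
    public void ShowModule(int index)
    {
        for (int i = 0; i < modules.Length; i++)
            modules[i].SetActive(i == index); // only the selected module is active at a time
    }
}
```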

2.2.5. Scoring System and Game Progress

The scoring architecture in e-SoundWay is structured around two layers of scripting: a resident script specific to each mini-game type, and a central script that aggregates scores and calculates overall player experience. This layered design enables both localized feedback and global performance tracking, key features in formative assessment frameworks for language learning [33,34].
Each mini-game generates an individualized score based on the player’s performance, which is then evaluated against a success rate (%success). This rate is calculated using Equation (1) for most mini-game types, while Equation (2) applies to the final two game categories, reflecting their increased complexity.
%success = (success − mistakes)/number of questions, if success ≥ mistakes; otherwise %success = 0 (1)
%success = success/number of questions (2)
Upon completion of a mini-game, the player receives a final score calculated by multiplying the %success value by the game’s maximum point allocation (Equation (3)). This result is visually displayed at the end of the activity, providing immediate performance feedback (Figure 11).
Score = %success × 10,000 (3)
In addition to numeric scores, the game employs a symbolic reward system in the form of medals, bronze, silver, and gold scallop shells, symbols of the Jacobean route, assigned according to performance thresholds. Table 1 illustrates the correlation between %success, accumulated points, and the corresponding medal. The use of culturally symbolic visual rewards, such as the scallop shell from the Camino de Santiago, enhances learner engagement and reinforces thematic continuity [39,40].
Cumulative scores are displayed in the Pilgrim Zone, where learners can monitor their overall progress. These scores are calculated by summing the total points earned across mini-games. To produce the progress graph shown in Figure 8, the total achieved scores are compared against the theoretical maximum, yielding a normalized success rate between 0 and 1, as defined in Equation (4). This visual metric reinforces metacognitive reflection, allowing learners to assess their performance trajectory over time [16,24]. To ensure pedagogical consistency and minimum competence acquisition, progression within the game is gated. Learners must earn at least a bronze medal in each stage to unlock the subsequent one. This design reflects the principles of competence-based advancement and mastery learning [5], ensuring that players engage meaningfully with each stage’s phonetic objectives before continuing.
%skill = ∑ (%success of each mini-game of that skill played)/number of mini-games of that skill played (4)
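For readers who prefer code to formulas, the sketch below restates Equations (1)–(4) and the medal assignment in C#; the medal thresholds are placeholders, since the actual cut-offs are those given in Table 1.

```csharp
// Sketch of the scoring logic in Equations (1)-(4). Medal thresholds are
// placeholders; the real cut-offs are listed in Table 1.
public enum Medal { None, Bronze, Silver, Gold }

public static class Scoring
{
    public const int MaxPoints = 10000; // maximum point allocation per mini-game

    // Equation (1): standard mini-game types.
    public static float SuccessRate(int success, int mistakes, int questions)
    {
        if (success < mistakes) return 0f;
        return (success - mistakes) / (float)questions;
    }

    // Equation (2): the final two game categories ignore mistakes.
    public static float SuccessRateSimple(int success, int questions)
    {
        return success / (float)questions;
    }

    // Equation (3): final score displayed at the end of a mini-game.
    public static int Score(float successRate)
    {
        return (int)(successRate * MaxPoints);
    }

    // Equation (4): per-skill progress between 0 and 1, feeding the Pilgrim Zone
    // pyramid. successRates holds the %success of each played mini-game of that skill.
    public static float SkillProgress(float[] successRates)
    {
        if (successRates.Length == 0) return 0f;
        float sum = 0f;
        foreach (float r in successRates) sum += r;
        return sum / successRates.Length;
    }

    // Medal tiers gate progression: at least Bronze is required to unlock the next stage.
    public static Medal MedalFor(float successRate)
    {
        if (successRate >= 0.9f) return Medal.Gold;   // placeholder threshold
        if (successRate >= 0.7f) return Medal.Silver; // placeholder threshold
        if (successRate >= 0.5f) return Medal.Bronze; // placeholder threshold
        return Medal.None;
    }
}
```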

2.2.6. Structure of the Mini-Games

The e-SoundWay tool consists of 614 mini-games divided into three categories: Perception (266 mini-games), Production (154 mini-games), and Transcription (194 mini-games).

Perception

In second language phonetics, perception refers to the learner’s ability to detect, differentiate, and categorize speech sounds, particularly phonemic contrasts that may not exist in their first language [6,24]. Among perception-focused activities, identification and discrimination tasks play a central role. As noted by Strange and Shafer [41], Bradlow et al. [42], and Lengeris and Hazan [43], among others, such tasks are essential for “re-educating selective perception” in adult L2 learners and lead to measurable gains in both perceptual acuity and production accuracy, which supports the integration of structured perception training into digital pronunciation tools. e-SoundWay includes four perception mini-games that reflect this research-based approach:
  • Identification: Learners recognize and label a sound by filling in missing phonemes in partially displayed words using audio cues and the virtual phonetic keyboard (Figure 12a). This reinforces sound-to-symbol mapping and contrastive listening [19,24].
  • Odd one out: Players listen to four items and select the phonologically distinct one (A-B-A-A, B-A-A-A) (Figure 12b). Focusing on phonetic discrimination, i.e., comparing sounds and deciding whether they are the same or different, these activities build categorical perception and raise awareness of subtle L2 contrasts, often without requiring explicit labeling [6,42].
  • Variation: These mini-games train learners to recognize phonetic variants, both context-driven and accent-based, by exposing them to multiple realizations of target phonemes (Figure 12c). This fosters flexible listening and perceptual tolerance, aligning with research that stresses the need to prepare learners for phonetic diversity in real-world speech (Setter & Jenkins [44]; Munro & Derwing [45]) and supports the development of robust phonological categories and speech convergence (Bradlow et al. [42]).
  • Connected speech: Learners identify target sounds embedded in naturalistic text. By interacting with audio and receiving visual feedback, they develop an awareness of linking, assimilation, and reduction (Figure 12d). Strange and Shafer [41] and Wang and Munro [46] both advocate for such training to improve real-world listening comprehension.
These perception tasks reflect the EPSS method’s emphasis on input-rich, feedback-oriented learning [20,21]. By embedding these tasks within a gamified progression model, e-SoundWay promotes sustained attention, repeat exposure, and active engagement—factors shown to accelerate phonological development [32,33].

Production

Production alludes to the learner’s ability to articulate speech sounds accurately and intelligibly, involving the motor execution of phonological representations. Accurate production is a crucial component of oral intelligibility and is deeply intertwined with perceptual learning, as learners must first develop the ability to hear contrasts before they can reliably reproduce them [41]. The production module in e-SoundWay aims to strengthen these articulatory skills through interactive mini-games of three different kinds that provide structured, feedback-rich output practice, each targeting different aspects of segmental and suprasegmental production:
  • Listen and Repeat tasks are rooted in traditional mimicry-drill techniques, where learners attempt to replicate the target sound after auditory exposure (Figure 13a). Wang and Munro [46], among others, demonstrated that, despite their simplicity, such drills remain effective for developing articulatory precision, particularly when paired with immediate feedback and visual reinforcement. It is therefore assumed that computer-assisted listen-and-repeat tasks will improve sound contrast production and lead to measurable gains in intelligibility and learner confidence.
  • Minimal pairs: Provide contrastive practice with phonemes that are especially challenging for EFL learners. By requiring learners to differentiate and articulate pairs of words that differ in a single, often difficult phoneme, these tasks promote fine-grained phonological awareness and motor control (Figure 13b), as previously found by authors such as Bradlow et al. [42] and Lengeris and Hazan [43], who showed that explicit minimal pair training leads to improvements in both perception and production, especially when it incorporates high-variability input across speakers and contexts.
  • Connected speech mini-games focus on the natural flow of spoken English, requiring learners to articulate word sequences that exhibit phonetic processes such as assimilation, linking, and elision (Figure 13c). These features often contribute to reduced intelligibility among EFL speakers who have primarily trained on isolated words or canonical forms. Studies such as those by Jenkins [44] emphasize that connected speech awareness is critical for developing rhythm, fluency, and listener-oriented intelligibility. Training learners to perceive and produce connected speech patterns equips them with the tools to understand rapid, native-like speech and to improve their own prosodic delivery in extended utterances. Incorporating connected speech into production tasks also supports the communicative goal of pronunciation instruction, which is not only accuracy but also comprehensibility in real-world interaction [45].
As with the perception module, the production mini-games in e-SoundWay are structured in accordance with the EPSS framework [20,21], which advocates for the progressive layering of phonetic skills, beginning with perception, followed by production, and culminating in transcription. The production tasks are not only gamified to enhance learner motivation but are also carefully sequenced to allow for repeated, scaffolded practice with increasing levels of complexity. This pedagogical design fosters the internalization of articulatory patterns while simultaneously reducing the anxiety often associated with pronunciation training in traditional classroom contexts.

Transcription

Transcription refers to the ability to decode or encode the sounds of spoken language using phonemic symbols or corresponding graphemes. It plays a critical mediating role between perception and production by encouraging learners to attend closely to sound-symbol correspondences, segmental detail, and phonological structure. As Strange and Shafer [41] emphasize, transcription promotes the re-education of perceptual selectivity and the formation of stable phonemic categories in adult learners. It also reinforces the articulatory and acoustic properties of L2 phonemes through metacognitive engagement with speech. As already noted, drawing on the EPSS framework [20,21], the e-SoundWay transcription module targets the third and final skill in the perceptual-articulatory sequence. It consists of five types of mini-games that train learners in both phoneme-to-grapheme and grapheme-to-phoneme conversion. These transcription games are designed not only to reinforce previous perceptual and production learning but also to enhance learners’ ability to decode a range of phonological inputs, including those from different English accents:
  • Missing symbols: In these mini-games, learners are required to fill in missing phonemes within a set of words displayed on the screen in order to reinforce sound-symbol associations and test learners’ segmental decoding accuracy [46]. Unlike in identification tasks, here learners must select phonemic symbols from a virtual IPA keyboard to complete the word forms. The player can overwrite previously placed entries before finalizing their response using the “Check” button (Figure 14a).
  • Direct transcription mini-games require learners to transcribe full words from graphemes into phonemic script, in line with Wang and Munro’s [45] suggestion that transcription-based tasks improve learners’ sensitivity to non-native vowel contrasts and provide insight into their own production limitations. Audio support is provided for each word, and users interact with the IPA keyboard to generate phonological representations. The format mirrors authentic tasks used in pronunciation instruction to strengthen learners’ understanding of English spelling-pronunciation irregularities (Figure 14b).
  • Reverse transcription: Learners are given phonemic transcriptions and must produce the corresponding graphemic forms. The game employs a QWERTY keyboard and allows for multiple correct spellings, as one phoneme string may correspond to several orthographic variants. The evaluation system accounts for such variability, penalizing only duplicated or incorrect entries while accommodating lexical diversity (Figure 14c); a simplified sketch of this check follows the list below. As in the previous case, these mini-games are intended to strengthen learners’ decoding strategies and are especially relevant for those who struggle with the irregularities of English spelling conventions.
  • Crossword: In these interactive puzzles, players complete a crossword grid using either phonemes or graphemes, depending on the given clues. The clues may include audio, IPA symbols, or graphemic hints, and the keyboard script dynamically updates each entry and score. Evaluation is based on the correctness of full entries, though the task avoids penalizing partial errors. This game helps consolidate learners’ segmental and suprasegmental awareness across multiple word forms (Figure 15a), as suggested by authors like Setter and Jenkins [44], who note that explicit transcription activities enhance learners’ ability to monitor their own pronunciation more effectively over time.
  • Word puzzle: In these mini-games, players must locate words within a grid using graphemes, phonemes, or only audio clues. Increased difficulty is introduced in stages involving accent confrontation, where audio becomes the primary input. The grid is dynamically generated using stored lexical data, and correct matches are highlighted visually and aurally (Figure 15b). This task promotes orthographic and phonemic scanning skills, crucial for real-time decoding and listening comprehension [41,44].
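As referenced in the Reverse transcription item above, the following is an illustrative C# sketch of an answer check that accepts several orthographic variants of the same phonemic string while penalizing only duplicated or incorrect entries; the method names and scoring granularity are assumptions, not the platform’s actual implementation.

```csharp
using System;
using System.Collections.Generic;

// Illustrative check for reverse-transcription answers: one phonemic string may
// map to several acceptable spellings; duplicates and wrong spellings count as
// mistakes, and every distinct valid variant counts as a success.
public static class ReverseTranscriptionCheck
{
    public static (int success, int mistakes) Evaluate(
        IEnumerable<string> playerAnswers,
        HashSet<string> acceptedSpellings) // stored lowercase, e.g., {"practice", "practise"}
    {
        var alreadyUsed = new HashSet<string>();
        int success = 0, mistakes = 0;

        foreach (string raw in playerAnswers)
        {
            string answer = raw.Trim().ToLowerInvariant();

            if (!acceptedSpellings.Contains(answer))
                mistakes++;                  // incorrect spelling
            else if (!alreadyUsed.Add(answer))
                mistakes++;                  // duplicated entry
            else
                success++;                   // new, valid orthographic variant
        }
        return (success, mistakes);
    }
}
```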
Together, these transcription mini-games bridge the gap between perceptual training, productive control, and writing (i.e., reinforcing associative mappings across sounds, spellings, and IPA symbols) by enabling learners to analyze the structure of words in both auditory and symbolic forms. By developing explicit knowledge of the phonological system through repeated, interactive transcription practice, learners can enhance both their intelligibility and their phonological autonomy in real-world communication.

2.3. Methodology

2.3.1. Research Questions

This study was guided by two main research questions to evaluate the instructional effectiveness and learner reception of the platform:
  • To what extent does the use of e-SoundWay improve Spanish-speaking university EFL learners’ performance in phonetic perception, production, and transcription as measured by pre- and post-intervention tests?
  • How do learners perceive the pedagogical effectiveness, usability, and motivational value of the e-SoundWay platform according to post-intervention user experience questionnaires?

2.3.2. Research Design

The experiment employs a mixed-methods design to evaluate the pedagogical effectiveness and user experience of e-SoundWay. Quantitative data were collected via pre- and post-tests designed to assess learners’ phonetic perception, production, and transcription skills, following the EPSS model [20,21]. Qualitative and attitudinal data were gathered through post-intervention user experience questionnaires. Studies by Subhash and Cudney [33], Girard et al. [32], and Chen et al. [16] underscore that learner motivation, engagement, and perceived usability are key factors influencing the success of gamified educational tools in EFL contexts. Gathering learner perspectives not only offers insights into the affective dimensions of the learning experience but also supports iterative improvements in tool design and pedagogical alignment.

2.3.3. Participants

Participants consisted of Spanish-speaking university students (N = 33) enrolled in a first-year undergraduate course in English Phonetics and Phonology at the University of Santiago de Compostela in the academic year 2023/2024, which followed the EPSS method [19]. All participants had an upper-intermediate level of English (B2, CEFR) and limited prior exposure to phonetic transcription or pronunciation-focused training.
As the primary aim of this study was to evaluate the effectiveness of the e-SoundWay platform through within-subject comparison, no control group was included. This approach is consistent with similar studies in CALL and CAPT contexts (e.g., [16,45]), where pre- and post-intervention testing has been used effectively to assess learning gains among target user populations. Participation was voluntary, and all students provided informed consent.

2.3.4. Player Progression and Level System

The e-SoundWay platform features a gamified leveling system designed to monitor learner progress and foster sustained engagement across the 34-stage Camino itinerary. Learners begin at Level 1 and advance by completing tasks, achieving accuracy in phonetic transcription, and performing successfully in pronunciation and perception challenges and minigames. Progression is driven by the accumulation of points, which unlock higher levels characterized by increased linguistic complexity and greater exposure to a variety of English accents.
Learners’ development in perception, production, and transcription skills is visually represented through a dynamic 3D pyramid (see Section 2.2.2, Pilgrim Zone, Figure 8), based on the scoring system detailed in Section 2.2.5. This multimodal representation allows for continuous monitoring of individual progress across the three core phonetic competencies. Beyond its motivational function, the leveling system acts as a formative assessment tool, providing learners with immediate, visually accessible feedback on their evolving proficiency in specific phonetic categories. Moreover, it supports phonological development through a scaffolded learning trajectory that incrementally increases both cognitive and articulatory demands.

2.3.5. Instruments

Three types of diagnostic pre- and post-tests were designed, each corresponding to a specific mini-game category—perception, production, and transcription—to assess the phonetic abilities whose relevance for EFL phonetic training has been discussed above. In addition, a post-intervention questionnaire comprising 12 items was administered to collect data on learner profiles, perceived usability, motivation, task difficulty, and perceived learning outcomes. The questionnaire items were organized into two categories, as shown in Table 2 and Table 3. This research design draws on previous studies (e.g., Chen et al. [16]; Girard et al. [32]; Subhash and Cudney [33]) that highlight the importance of learner perception data in evaluating digital learning environments. These studies advocate for the integration of user-centered feedback mechanisms to better capture affective and cognitive engagement and to inform iterative design improvements.
Questions 6, 7, 8, 10, and 11 were answered by the students on a Likert scale from 1 to 5, where 1 is a very poor rating and 5 is an excellent rating. The students’ results are shown in Section 3.2.

2.3.6. Procedure

The experiment was conducted over a four-month period, consisting of one training session per week. In parallel with their EPSS-based formal instruction offered in class, learners used e-SoundWay independently, engaging in self-paced sessions outside the classroom on their own devices. This dual approach allowed them to consolidate course content through gamified, task-based practice aligned with perceptual training, articulatory exercises, and phonological transcription.
In the first week, participants completed a 50 min pre-test comprising 15 mini-games, five targeting each of the three core phonetic competencies: perception, production, and transcription. From weeks 2 to 15, learners accessed the platform autonomously once per week, progressing through scaffolded pronunciation tasks embedded in the game’s storyline. No direct teacher intervention was provided during the training period, ensuring that learners’ progress was driven by interaction with the platform’s multimodal feedback and learning tools.
In the final week, a post-test mirroring the structure of the pre-test was administered, followed by a user experience questionnaire. While the post-test assessed the same three phonetic competencies (perception, production, and transcription), it included a different set of mini-games. This approach ensured that learners were not simply recalling previous task content but were instead applying acquired skills to novel items. Using parallel but distinct tasks is consistent with sound assessment practices in language learning research, as it minimizes memory effects and enhances construct validity [4,14,24]. It is also grounded in principles of authentic assessment increasingly relevant to contemporary language education, where learners demonstrate skill transfer in varied contexts so that learning outcomes can be evaluated under conditions that reflect real-world blended and independent learning [10,27].

2.3.7. Data Analysis

To assess the effectiveness of e-SoundWay, a comprehensive set of statistical analyses was conducted. Paired-sample t-tests compared pre- and post-test scores across the three phonetic competencies—perception, production, and transcription—while Cohen’s d was calculated to determine the magnitude of observed effects. To evaluate the consistency of learning outcomes, standard deviation values were analyzed across test phases.
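For paired pre-/post-test designs, a standard formulation of Cohen’s d (assumed here, as the specific variant used is not reported) is

$$ d_z = \frac{\bar{D}}{s_D} = \frac{t}{\sqrt{n}}, $$

where $\bar{D}$ is the mean pre-/post-test difference per participant, $s_D$ the standard deviation of those differences, $t$ the paired-samples t statistic, and $n$ the number of participants.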
User Experience Questionnaire responses were also examined using frequency distributions and additional tests to identify patterns related to satisfaction, motivation, and perceived learning. Specifically, a chi-square test of independence was applied to determine whether prior exposure to serious games influenced usability perceptions, while a one-way ANOVA explored the effect of time spent on online gaming on learners’ perceived levels of challenge and engagement. Perceived difficulty across the three phonetic domains was assessed using a Friedman test, followed by Wilcoxon signed-rank tests for post hoc comparisons. Additionally, a Pearson correlation analysis examined the relationship between learners’ self-reported improvement and overall satisfaction with the platform.
Together, these statistical procedures offered a multidimensional understanding of learners’ interactions with the tool, the challenges they faced, and the influence of background factors on both user experience and learning performance. Moreover, this analytical approach aligns with established methodologies in previous research on CALL, CAPT, and serious games (e.g., Chen et al. [16]; Girard et al. [32]; Subhash & Cudney [33]).

3. Results

3.1. Pre-/Post-Tests

To address RQ1, Figure 16 presents the quantitative results of the pre- and post-tests, illustrating both average learner performance and score variability across the cohort.
Figure 16 displays the mean percentage of correct responses for each phonetic skill—perception, production, and transcription—before and after the intervention. All comparisons yielded statistically significant improvements (p < 0.001), which were further corroborated by Cohen’s d effect size calculations. These indicated large effects across all three domains, with particularly notable gains in perception [6,24,25,45].
To assess the consistency and dispersion of student performance, standard deviation values were calculated for each skill category and test phase. Figure 16 shows the standard error bar for each category. The results revealed a marked reduction in variability across the board:
  • In Perception, the standard deviation decreased from 0.0284 to 0.0070.
  • In Production, it dropped from 0.0669 to 0.0197.
  • In Transcription, it declined from 0.0444 to 0.0111.
The observed reduction in score variability, alongside improved mean performance, indicates that learners’ outcomes became more homogeneous after using e-SoundWay. Beyond facilitating overall progress, the tool appears to promote greater equity and consistency in learning, helping to mitigate disparities in individual achievement—a pedagogically significant benefit for classroom implementation. This pattern aligns with prior research on the convergence effects of serious games in language learning contexts [32,33]. In particular, the joint presence of substantial performance gains and reduced variability mirrors findings from earlier studies on pronunciation training tools, which emphasize the benefits of scaffolded and adaptive learning environments for enhancing both individual and group outcomes (Rogerson-Revell [24]; Neri et al. [25]). Taken as a whole, the current findings reinforce existing literature and provide further empirical support for the pedagogical value of game-based CAPT systems in EFL instruction.

3.2. User Experience Questionnaire

To address RQ2, it should first be noted that the User Experience Questionnaire was completed by 91% of the participants, the vast majority (also 91%) of whom were undergraduate students. Only 9% indicated other occupations, suggesting a relatively homogeneous academic cohort. Daily internet usage was high: 64% of students reported being online for between 2 and 4 h daily, 18% between 1 and 2 h, and 18% more than 4 h. Despite this, their engagement with online games was modest, with 64% spending less than 30 min daily, 27% between 30 and 60 min, and only 9% exceeding two hours. While only 21% had prior experience with serious games, this background did not appear to significantly impact their perceived usability or engagement with e-SoundWay. Table 4 summarizes our main results.
As shown in Table 4, a chi-square test of independence was conducted to examine whether prior experience with serious games was associated with learners’ perceptions of usability (Question 5: “Are the instructions to play clear enough?”). The analysis revealed no statistically significant association, χ2(2, N = 91) = 1.84, p = 0.398. This finding supports previous research by Chen et al. [16], which indicated that learners with limited exposure to serious games can still navigate such environments effectively when the interface design is intuitive.
Additionally, one-way ANOVA was conducted to assess whether daily gaming time predicted perceived challenge and enjoyment (Question 7). Results indicated a statistically significant effect of gaming frequency, F(2, 88) = 4.67, p = 0.012. Post hoc Tukey tests showed that participants who engaged in online gaming for 30–60 min per day rated e-SoundWay as significantly more engaging than those who rarely played. This suggests that moderate gaming experience may enhance appreciation for game-based instructional design, corroborating findings by Davis et al. [34] on learner profiles and engagement in gamified learning contexts.
Regarding the perceived “game-like” nature of e-SoundWay (Question 8), responses were varied. While 55% of participants described the platform as game-like, others noted aspects of monotony or perceived it as more exercise-oriented. These diverse evaluations align with Rogerson-Revell [24], who emphasized that perceptions of game-likeness often depend on factors such as interactivity, novelty, and feedback mechanisms.
In terms of perceived difficulty across the three phonetic categories, most learners identified Transcription as the most challenging. A Friedman test revealed a statistically significant difference in perceived difficulty, χ2(2) = 36.5, p < 0.001. Subsequent pairwise comparisons using Wilcoxon signed-rank tests (with Bonferroni correction) confirmed that Transcription was perceived as significantly more difficult than both Production (p < 0.01) and Perception (p < 0.001). These findings are consistent with prior research by Derwing and Munro [6] and the authors in [20], which highlighted the cognitive and metalinguistic demands of transcription tasks.
Finally, learners’ responses regarding perceived learning benefits (Q10), overall satisfaction (Q11), and willingness to recommend the platform (Q12) revealed broadly positive evaluations. A Pearson correlation analysis indicated a moderate, statistically significant relationship between perceived improvement and overall satisfaction, r = 0.52, p < 0.001. This suggests that students who felt they had improved their phonetic skills were also more satisfied with the platform, a pattern similarly observed in Girard et al. [32], among others.

4. Discussion

The results derived from the pre- and post-tests (RQ1), supported by the user experience questionnaire (RQ2), offer compelling evidence of the platform’s pedagogical value and learner acceptability. The quantitative data reveal statistically significant improvements in all three phonetic domains, i.e., perception, production, and transcription, with the former showing the most marked increase. These results confirm that scaffolded, interactive environments, especially those incorporating immediate feedback, can have a substantial impact on learners’ auditory discrimination skills [25,45]. Gains in production and transcription similarly indicate that the multimodal structure of e-SoundWay effectively supports productive phonological learning by integrating visual, acoustic, and kinesthetic cues [6,24].
In addition to performance improvements, a significant reduction in standard deviation values across all categories in the post-test indicates a narrowing of learner variability. This result suggests not only that the tool was effective but also that it supported more equitable learning outcomes, a finding consistent with prior research on serious games and performance convergence [32,33].
The results concerning learner perception data provide further insight into the usability and motivational dimensions of the tool. Most participants found the platform accessible and the instructions clear, with many highlighting the value of the immediate corrective feedback provided throughout the experience. These features have been widely recognized as crucial to user engagement and sustained attention in digital learning environments [16,24,33]. While some participants described the experience as somewhat monotonous, others specifically praised the game’s integration of accent variation, which helped them understand how phonetic differences influence meaning. This aligns with broader findings that exposure to phonetic diversity enhances learners’ perceptual tolerance and intelligibility in real-world communication [44,45].
The platform’s strong reception among users is also reflected in quantitative indicators from the user experience questionnaire. Notably, 64% of the participants reported being “satisfied” or “very satisfied” with the improvement in their level of phonetic competence after using e-SoundWay. Furthermore, 91% of respondents indicated they would recommend the tool to others for learning English pronunciation. These findings further validate the platform’s effectiveness and perceived relevance in authentic learning contexts. They also reinforce the conclusion that serious games, when appropriately designed, can significantly enhance learner motivation and engagement, factors critical to successful language acquisition [32,33].
One of the strengths of this initiative lies in its cross-disciplinary design, which brought together technologists, linguists, educators, artists, and native speakers. This collaborative approach ensured both academic rigor and pedagogical relevance, resulting in a tool that is linguistically accurate, culturally meaningful, and technologically accessible. The inclusion of accented characters guiding learners along the Camino de Santiago not only enriched the learning narrative but also promoted phonemic awareness and listening flexibility, both of which are essential in today’s global English environments [41,44,47,48].
Importantly, the scope of e-SoundWay extends beyond the traditional EFL learner. Its design and functionality make it a valuable resource for educators, speech therapists, and autonomous learners seeking to improve pronunciation. The combination of game-based challenges, cultural storytelling, and multimodal feedback reflects current trends in phonetic pedagogy and responds to calls for accessible, scalable, and contextually sensitive solutions in pronunciation instruction [7,8,29].

5. Conclusions

The findings of this study confirm the potential of e-SoundWay as an effective, engaging, and equitable tool for phonetic instruction in EFL contexts. Learners demonstrated statistically significant gains across all three core phonetic competencies, i.e., perception, production, and transcription, and performance disparities were notably reduced after the intervention. These improvements, combined with high levels of user satisfaction, underscore the tool’s instructional efficacy. Specifically, 64% of learners reported being satisfied or very satisfied with their progress in English phonetics, and 91% indicated they would recommend e-SoundWay to peers, a strong endorsement of the platform’s educational impact.
Beyond its effectiveness, the platform represents a notable innovation in the field of digital language learning. It brings together over 600 interactive multimedia minigames, real-time feedback mechanisms, and an immersive narrative experience grounded in the culturally meaningful route of the Camino de Santiago. This fusion of pedagogy, culture, and game mechanics introduces a novel model for pronunciation training that goes well beyond traditional drill-based approaches. By integrating accented characters and situational diversity, the platform promotes phonological flexibility and real-world listening comprehension, preparing learners to navigate global English interactions.
Importantly, e-SoundWay reimagines how phonetics can be taught, not as an isolated theory, but as an interactive, exploratory, and personalized learning journey. Its use of multimodal input, including visual, auditory, and kinesthetic elements, supports differentiated learning and encourages learner autonomy. The cross-disciplinary development model behind the platform, uniting technologists, linguists, educators, and native speakers, ensures that it is both pedagogically sound and adaptable across instructional settings.
By positioning phonetic training within a serious game framework, the tool responds to increasing calls for more motivational, learner-centered, and scalable educational tools. It challenges conventional limitations in pronunciation pedagogy by offering a flexible, data-informed, and culturally enriched alternative that aligns with modern language learners’ needs.
Overall, e-SoundWay demonstrates how multimodal technologies can deliver phonetic instruction that is both academically rigorous and emotionally engaging. Its novel approach holds promise not only for language students and educators but also for speech therapists and lifelong learners. Future work should explore its longitudinal impact, classroom integration, and adaptability to other L1 backgrounds, contributing to the continued evolution of pronunciation teaching in multilingual and multicultural contexts.

Author Contributions

Conceptualization, M.Á.G.-G.; methodology, M.Á.G.-G.; software, A.L.-F. and J.C.L.-A.; validation, M.Á.G.-G. and A.L.-F.; formal analysis, M.Á.G.-G. and A.L.-F.; investigation, M.Á.G.-G., A.L.-F. and J.C.L.-A.; resources, M.Á.G.-G.; writing—original draft preparation, A.L.-F.; writing—review and editing, M.Á.G.-G., A.L.-F. and J.C.L.-A.; project administration, A.L.-F. and M.Á.G.-G. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by a project funded by the Spanish Ministry of Science and Innovation (PID2019-105678RB-C21/C22); by a grant awarded under the 2021 call for “Strategic Projects Oriented towards Ecological Transition and Digital Transition” of the State Programme to Promote Scientific-Technical Research and its Transfer, State Plan for Scientific, Technical and Innovation Research 2021–2023, within the framework of the Recovery, Transformation and Resilience Plan (TED 2021-130283B-C21/C22); and by a grant for Competitive Reference Groups of the Xunta de Galicia (ED431C2023/15).

Institutional Review Board Statement

The study was approved by the COMITÉ DE ÉTICA NA INVESTIGACIÓN DA USC (protocol code USC 96/2024, approval date 3 March 2023).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors would like to thank Teresa Sánchez Roura for allowing the students of the second-year English Phonetics and Phonology course of the English Language and Literature degree at the Universidade de Santiago de Compostela to work with this educational resource and to carry out the surveys necessary to obtain the results discussed in the previous section.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Language Policy Programme, Education Policy Division, Education Department, Council of Europe. Common European Framework of Reference for Languages: Learning, Teaching, Assessment—Companion; Council of Europe Publishing: Strasbourg, France, 2020; 278p. [Google Scholar]
  2. Camacho Marti, M.; Estévez-González, V.; Esteve Mon, F.M.; Gisbert Cervera, M.M.; Lázaro Cantabrana, J.L. Capítulo XIV. Indicadores de Calidad para el uso de las TIC en los Centros Educativos. Compartir Aprendizaje. Editorial Universitat Rovira i Virgili, Tarragona, Spain. 2014. Available online: https://hdl.handle.net/20.500.11797/imarina9149374 (accessed on 3 June 2025).
  3. Salinas, J. Innovación Educativa y Uso de las TIC. Editorial Universidad Internacional de Andalucía, Andalucía, Spain, 2008-09, 15–30. Available online: http://hdl.handle.net/10334/136 (accessed on 12 May 2025).
  4. Clark, R.C.; Mayer, R.E. E-Learning and the Science of Instruction: Proven Guidelines for Consumers and Designers of Multimedia Learning, 3rd ed.; Pfeiffer: Zurich, Switzerland, 2011; 528p. [Google Scholar]
  5. Jordan, A.; Carlile, O.; Stack, A. Approaches to Learning: A Guide for Teachers; McGraw Hill Education: London, UK, 2008; 278p. [Google Scholar]
  6. Derwing, T.M.; Munro, M.J. Pronunciation Fundamentals: Evidence-based perspectives for L2 teaching and research. In Language Learning & Language Teaching; John Benjamins Publishing Company: Amsterdam, The Netherlands, 2015; 208p. [Google Scholar]
  7. Levy, M. Culture, culture learning and new technologies: Towards a pedagogical framework. Lang. Learn. Lang. Teach. 2007, 11, 104–127. [Google Scholar]
  8. Hou, Z.; Avadoust, V. A review of the methodological quality of quantitative mobile-assisted language learning research. System 2021, 100, 102568. [Google Scholar] [CrossRef]
  9. Lago Ferreiro, A.; Gómez González, M.Á.; Fragueiro Agrelo, Á.; Llamas Nistal, M. Multi-platform application for learning English phonetics: Serious Game. In Proceedings of the XV Technologies Applied to Electronics Teaching Conference, Teruel, Spain, 29 June–1 July 2022. [Google Scholar] [CrossRef]
  10. O’Malley, C.; Vavoula, G.N.; Glew, J.P.; Taylor, J.; Sharples, M.; Lefrere, P.; Lonsdale, P.; Naismith, L.; Waycott, J. Guidelines for Learning/Teaching/Tutoring in a Mobile Environment. Mobilearn Project Deliverable. 2005. Available online: https://www.researchgate.net/publication/280851673_Guidelines_for_learningteachingtutoring_in_a_mobile_environment (accessed on 1 December 2024).
  11. Hwang, W.Y.; Chen, H.S.L.; Shadiev, R.; Huang, R.Y.-M.; Chen, C.-Y. Improving English as a foreign language writing in elementary schools using mobile devices in familiar situational context. Comput. Assist. Lang. Learn. 2014, 27, 359–378. [Google Scholar] [CrossRef]
  12. Hwang, W.Y.; Chen, H.S. Users’ familiar situational contexts facilitate the practice of EFL in elementary schools with mobile devices. Comput. Assist. Lang. Learn. 2013, 26, 101–125. [Google Scholar] [CrossRef]
  13. Wang, Y.H. Integrating self-paced mobile learning into language instruction: Impact on reading comprehension and learner satisfaction. Interact. Learn. Environ. 2017, 25, 397–411. [Google Scholar] [CrossRef]
  14. Wrigglesworth, J. Using smartphones to extend interaction beyond the EFL classroom. Comput. Assist. Lang. Learn. 2019, 33, 413–434. [Google Scholar] [CrossRef]
  15. Burston, J. Twenty years of MALL project implementation: A meta-analysis of learning outcomes. ReCALL 2015, 27, 4–20. [Google Scholar] [CrossRef]
  16. Chen, C.M.; Chen, L.C.; Yang, S.M. An English vocabulary learning app with self-regulated learning mechanism to improve learning performance and motivation. Comput. Assist. Lang. Learn. 2019, 32, 237–260. [Google Scholar] [CrossRef]
  17. Dehghan, F.; Rezvani, R.; Faceli, S. Social networks and their effectiveness in learning foreign language vocabulary: A comparative study using WhatsApp. CALL-EJ 2017, 18, 1–13. [Google Scholar]
  18. Elekaei, A.; Tabrizi, H.H.; Chalak, A. Investigating the effects of EFL learners’ vocabulary gain and retention levels on their choice of memory and compensation strategies in an e-learning project. CALL-EJ 2019, 20, 1–18. [Google Scholar]
  19. Gómez González, M.Á.; Sánchez Roura, T. English Pronunciation for Speakers of Spanish: From Theory to Practice; Editorial De Gruyter Mouton: Berlin, Germany, 2016. [Google Scholar]
  20. Gómez González, M.Á.; Lago Ferreiro, A. Computer-Assisted Pronunciation Training (CAPT): An empirical evaluation of EPSS Multimedia Lab. Lang. Learn. Lang. Teach. 2024, 28, 1–44. [Google Scholar]
  21. Gómez González, M.Á.; Lago Ferreiro, A. Web-assisted instruction for teaching and learning EFL phonetics to Spanish learners: Effectiveness, perceptions and challenges. Comput. Educ. Open 2024, 7, 100214. [Google Scholar] [CrossRef]
  22. Hardisty, D.; Windeatt, S. Computer-Assisted Language Learning; Oxford University Press: Oxford, UK, 1989; 165p. [Google Scholar]
  23. Chansarian-Dehkordi, F.; Ameri-Golestan, A. Effects of mobile learning on acquisition and retention of vocabulary among Persian-speaking EFL learners. CALL-EJ 2016, 17, 43–56. [Google Scholar]
  24. Rogerson-Revell, P.M. Computer-Assisted Pronunciation Training (CAPT): Current issues and future directions. RELC J. 2021, 52, 189–205. [Google Scholar] [CrossRef]
  25. Colferai, E.; Gregory, S. Minimizing attrition in online degree courses. J. Educ. Online 2015, 12, 62–90. [Google Scholar] [CrossRef]
  26. Stracke, E. A road to understanding: A qualitative study into why learners drop out of a blended language learning (BLL) environment. ReCALL 2007, 19, 57–78. [Google Scholar] [CrossRef]
  27. Vaz de Carvalho, C.; Coelho, A. Game-Based Learning, Gamification in Education and Serious Games. Computers 2022, 11, 36. [Google Scholar] [CrossRef]
  28. Ko, M.H. Students’ reactions to using smartphones and social media for vocabulary feedback. Comput. Assist. Lang. Learn. 2019, 32, 920–944. [Google Scholar] [CrossRef]
  29. Fouz-González, J. Pronunciation instruction through twitter: The case of commonly mispronounced words. Comput. Assist. Lang. Learn. 2017, 30, 631–663. [Google Scholar] [CrossRef]
  30. Sun, Z.; Lin, C.-H.; You, J.; Shen, H.J.; Qi, S.; Luo, L. Improving the English-speaking skills of young learners through mobile social networking. Comput. Assist. Lang. Learn. 2017, 30, 304–324. [Google Scholar] [CrossRef]
  31. Maskeliūnas, R.; Kulikajevas, A.; Blažauskas, T.; Damaševičius, R.; Swacha, J. An Interactive Serious Mobile Game for Supporting the Learning of Programming in JavaScript in the Context of Eco-Friendly City Management. Computers 2020, 9, 102. [Google Scholar] [CrossRef]
  32. Girard, C.; Ecalle, J.; Magnan, A.A. Serious games as new educational tools: How effective are they? A meta-analysis of recent studies. J. Comput. Assist. Learn. 2013, 29, 207–219. [Google Scholar] [CrossRef]
  33. Subhash, S.; Cudney, E.A. Gamified learning in higher education: A systematic review of the literature. Comput. Hum. Behav. 2018, 87, 192–206. [Google Scholar] [CrossRef]
  34. Davis, K.; Sridharan, H.; Koepke, L.; Singh, S.; Boiko, R. Learning and engagement in a gamified course: Investigating the effects of student characteristics. J. Comput. Assist. Learn. 2018, 34, 492–503. [Google Scholar] [CrossRef]
  35. Da Silva, J.P.; Silveira, I.F. A Systematic Review on Open Educational Games for Programming Learning and Teaching. Int. J. Emerg. Technol. Learn. 2020, 15, 156–172. [Google Scholar] [CrossRef]
  36. Unity User Manual 2023.1. Available online: https://docs.unity3d.com/2023.1/Documentation/Manual/UnityManual.html (accessed on 5 January 2024).
  37. Recognissimo Documentation. Available online: https://bluezzzy.github.io/recognissimo-docs/ (accessed on 5 January 2024).
  38. Allen, M.L.; Hartley, C.; Cain, K. Do iPads promote symbolic understanding and word learning in children with autism? Front. Psychol. 2015, 6, 138. [Google Scholar] [CrossRef] [PubMed]
  39. Lems, K. New Ideas for Teaching English Using Songs and Music. Engl. Teach. Forum 2018, 56, 14–21. Available online: https://americanenglish.state.gov/files/ae/resource_files/etf_56_1_pg14-21_0.pdf (accessed on 3 June 2025).
  40. Peterson, M.; Jabbari, N. Digital Games in Language Learning: Case Studies and Applications; Routledge Taylor & Francis Group: Oxfordshire, UK, 2023; 202p. [Google Scholar]
  41. Strange, W.; Shafer, V.L. Speech perception in second language learners: The re-education of selective perception. In Phonology and Second Language Acquisition; John Benjamins Publishing Company: Amsterdam, The Netherlands, 2008; pp. 153–191. [Google Scholar]
  42. Bradlow, A.R.; Pisoni, D.B.; Akahane-Yamada, R.; Tohkura, Y. Training Japanese listeners to identify English /r/ and /l/: Long-term retention of learning in perception and production. Percept. Psychophys. 1997, 59, 87–100. [Google Scholar] [CrossRef]
  43. Lengeris, A.; Hazan, V. The effect of native vowel processing ability and frequency discrimination acuity on the phonetic training of English vowels for native Greek listeners. J. Acoust. Soc. Am. 2010, 128, 3757–3768. [Google Scholar] [CrossRef]
  44. Jenkins, J.; Setter, J. Teaching English pronunciation: A state of the art review. Lang. Teach. 2005, 38, 1–17. [Google Scholar] [CrossRef]
  45. Munro, M.J.; Derwing, T.M. The functional load principle in ESL pronunciation instruction: An exploratory study. System 2006, 34, 520–531. [Google Scholar] [CrossRef]
  46. Wang, Y.; Munro, M.J. Computer-based training for learning English vowel contrasts. System 2004, 32, 539–552. [Google Scholar] [CrossRef]
  47. Lively, S.E.; Logan, J.S.; Pisoni, D.B. Training Japanese listeners to identify English /r/ and /l/: II. The role of phonetic environment and talker variability in learning. J. Acoust. Soc. Am. 1993, 94, 1242–1255. [Google Scholar] [CrossRef]
  48. Pardo, J.S. On phonetic convergence during conversational interaction. J. Acoust. Soc. Am. 2006, 119, 2382–2393. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Language provider configuration panel.
Figure 2. Script configuration audio snippet as voice source.
Figure 3. Speech recognizer configuration block.
Figure 4. Script created for external speech recognition control.
Figure 5. Connection between game scenes. NOTE: The dotted dashed line between stop 2 and stop 12, as well as Mini-game 2 to stop 12, refers to the pattern that repeats between stop 2 and 12 for stops and mini-games 2 to 11.
Figure 6. Menu in main interface.
Figure 7. Stage map object image: (a) Stage of the Way of St. James that is being carried out and GO! arrow to advance. Medals achieved at that stage in the different skills; (b) stops where the student is in the stage and skills to be worked on.
Figure 8. Screenshot Pilgrim’s Zone.
Figure 9. (a) Capture of a stop; (b) example of a displayed information element.
Figure 10. Screenshots of the Library.
Figure 11. Sample score and medal on completion of mini-game.
Figure 12. Screenshot of perception mini-game: (a) identification; (b) odd one out; (c) variation; (d) connected speech.
Figure 13. Screenshot of production mini-game: (a) listen and repeat; (b) minimal pairs; (c) connected speech.
Figure 14. Screenshot of transcription mini-game: (a) missing symbols; (b) direct transcriptions; (c) reverse transcription.
Figure 15. A transcription mini-game: (a) crossword; (b) word puzzle.
Figure 16. Mean percentage of correct responses in pre-test and post-test with standard error bars.
Table 1. Scores required to earn e-SoundWay medals.

% Success | Points | Scallop Shell Medals
0% < X < 50% | 3 | None
50% ≤ X < 75% | 5 | Bronze
75% ≤ X < 90% | 7 | Silver
90% ≤ X ≤ 100% | 10 | Gold
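Expressed procedurally, the rule in Table 1 reduces to a simple threshold function. The Python sketch below is illustrative only; the game itself is implemented in Unity, and the function name is hypothetical.

def award(success_pct: float) -> tuple[int, str | None]:
    # Maps a mini-game success rate (0-100%) to the points awarded and the
    # scallop-shell medal earned, following the thresholds in Table 1.
    if not 0.0 <= success_pct <= 100.0:
        raise ValueError("success_pct must be between 0 and 100")
    if success_pct >= 90.0:
        return 10, "Gold"
    if success_pct >= 75.0:
        return 7, "Silver"
    if success_pct >= 50.0:
        return 5, "Bronze"
    return 3, None  # below 50%: points are awarded but no medal

# Example: a learner scoring 82% earns 7 points and a Silver medal.
assert award(82.0) == (7, "Silver")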
Table 2. Student profile.

Number | Question
1 | What is your occupation?
2 | How many hours a day are you online?
3 | How long do you devote to playing games online?
4 | What kind of games do you play?
Table 3. Student Satisfaction.

Number | Question
5 | Are the instructions to play clear enough?
6 | Do you find the interactivity of the game is user-friendly?
7 | Do you find it challenging enough?
8 | Does e-SoundWay feel like a game?
9 | Which was the most difficult game section?
10 | Do you think you have improved your level in English Phonetics and Phonology?
11 | How satisfied are you with e-SoundWay as a whole?
12 | Would you recommend this game to others in your situation?
Table 4. Summary of user experience questionnaire results.

Question | Summary/Distribution | Interpretive Notes
Q1: Participant type | 91% undergraduate | Homogeneous academic cohort
Q2: Daily internet use | 64% (2–4 h), 18% (1–2 h), 18% (>4 h) | High internet usage
Q3: Daily gaming time | 64% (<30 min), 27% (30–60 min), 9% (>2 h) | Low gaming engagement
Q4: Prior experience with serious games | 21% Yes, 79% No | No significant impact on Q5
Q5: Instruction clarity | 73% Clear, 18% Not always clear | Generally positive
Q6: Prior experience vs. Q5 (Chi-square) | χ²(2, N = 91) = 1.84, p = 0.398 | No significant association
Q7: Daily gaming vs. engagement (ANOVA) | F(2, 88) = 4.67, p = 0.012 (Tukey: 30–60 min > rare) | Moderate gamers more engaged
Q8: Perceived game-likeness | 55% Game-like, 27% Challenging, 9% Exercise-like | Subjective perceptions varied
Q9: Most difficult phonetic task | 64% Transcription, 18% Production, 9% Perception | Friedman χ²(2) = 36.5, p < 0.001; transcription hardest
Q10: Perceived improvement | Strongly Agree (39%), Agree (42%), Neutral (12%), Disagree (6%), No answer (1%) | Positive overall; high agreement
Q11: Overall satisfaction | Very satisfied (36%), Satisfied (45%), Neutral (13%), Dissatisfied (4%), No answer (2%) | Correlated with Q10; generally high satisfaction
Q12: Willingness to recommend | Definitely (49%), Probably (36%), Not sure (10%), Probably not (4%), No answer (1%) | The majority would recommend the tool
Q10 vs. Q11 (Pearson correlation) | r = 0.52, p < 0.001 | Moderate, significant correlation
