Peer-Review Record

From Games to Understanding: Semantrix as a Testbed for Advancing Semantics in Human–Computer Interaction with Transformers

Electronics 2025, 14(17), 3480; https://doi.org/10.3390/electronics14173480

by Javier Sevilla-Salcedo *, José Carlos Castillo Montoya, Álvaro Castro-González and Miguel A. Salichs

Reviewer 1: Hang Yu

Reviewer 2: Anonymous

Reviewer 3: Anonymous

Submission received: 24 July 2025 / Revised: 21 August 2025 / Accepted: 27 August 2025 / Published: 31 August 2025

(This article belongs to the Special Issue Feature Papers in Artificial Intelligence)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This manuscript presents Semantrix, an interactive, web-based semantic word-guessing game, as a testbed for evaluating transformer-based models in human-computer interaction. The work addresses an important gap in assessing semantic understanding in open-ended, creative user interactions, and the study is carefully designed.
Strengths:
1. Innovative and Timely Approach: The manuscript introduces a novel and well-motivated testbed that goes beyond static NLP benchmarks, enabling real-time, ecological assessment of transformer models’ semantic capabilities. The combination of user-driven gameplay and dynamic hint generation is compelling and relevant for advancing both research and practical applications in HCI and NLP.
2. Rigorous Experimental Design and Analysis: The authors employ a pre-registered 2×2 factorial study with clear hypotheses, robust methodology, and thorough statistical analysis. Both quantitative and qualitative results are well-presented, supporting the conclusions regarding the benefits of advanced semantic modeling and adaptive feedback.
Weaknesses:
1. Limited Generalizability of the Sample: The participant pool, while diverse in some respects, skews towards highly educated and technologically literate users (majority with university degrees, high self-reported tech familiarity). This may limit the generalizability of the findings to broader populations, including those less familiar with digital games or NLP applications.
2. Insufficient Detail on Minor Aspects of the User Experience: While the system architecture and experimental design are described in depth, certain aspects of the user experience (such as the clarity of the instructions, handling of ambiguous guesses, and specific examples of feedback given to users) could benefit from more concrete description or illustration.
Suggestions:
1. Discuss Generalizability and Limitations More Explicitly: Please expand your discussion of the participant sample and its potential limitations, particularly regarding educational background and digital literacy. Consider commenting on how the system might perform or be received with other user groups, and suggest directions for future research to address this.
2. Enhance Clarity with Additional User Experience Examples: Consider adding more detailed examples or screenshots of the user interface, the types of feedback/hints provided, and how ambiguous or creative guesses are handled in practice. This would help readers better appreciate the practical operation of the platform and the qualitative user experience.
3. To further strengthen the related work section, I recommend that the authors include citations to several recent and highly relevant publications from leading venues. In particular: (1) “CAMeL: Cross-Modality Adaptive Meta-Learning for Text-Based Person Retrieval,” which explores dynamic cross-modal adaptation; and (2) “A Unified Model for Haptic Experience,” which contributes valuable insights into unified modeling of multi-modal user experiences. Citing these works will enhance the positioning of your paper within the current research landscape of adaptive semantic modeling and interactive systems.

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The paper proposes Semantrix, a web-based semantic word-guessing platform that computes a semantic similarity score between words and generates dynamic, context-sensitive hints in real time. The results show that the use of transformer-based semantic modeling and adaptive hint generation improves user engagement, motivation, and enjoyment.

However, there are still some issues to be corrected/added:

Figure 2 (Main loop) should be corrected, as in its current state the flow is not clearly visible. From initialization, it is unclear where the flow goes: to asking for a new word, to the user-lost check, or to the hint? Also, try replacing the label of the first box, "Ask for a New Word", with something like "Input guess word", as you have in the text. The box "Processing embeddings" could also be renamed to "Compute semantic similarity", as you have in the text. There should also be a box where the secret word is generated/chosen (with arrows added to "Compute semantic similarity").
In Table 1, use "Sentence Transformer" instead of paraphrase-multilingual-mpnet-base-v2, as you have clearly written in the text that you will be using this term.
Line 544 (Subchapter 6.3 Participants and Recruitment): "..as women (31,74%), with 10 men (25%), and one participant referring not to disclose their gender." Correct the sentence and check whether the percentages are OK. I'm guessing 74 % women and 25 % men, but 1 participant is not 1 %. It should be 73.81 %, 23.81 % and 2.38 %.
Table 2: To improve the table, it would be good to state in the caption (or in the header) whether higher or lower values are better, and the best values could also be set in bold.
Is your application only for the English language, or can it be extended to other languages as well? For example, by using mBERT (multilingual BERT) instead of BERT.
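The corrected percentages in the comment on Line 544 follow from dividing each count by the total number of participants. A minimal check, assuming the counts quoted there (31 women, 10 men, 1 undisclosed, so 42 in total); the function name is illustrative only:

```python
def gender_percentages(counts):
    """Return each group's share of the total, in percent, rounded to 2 decimals."""
    total = sum(counts.values())
    return {group: round(100 * n / total, 2) for group, n in counts.items()}


# Counts as quoted in the reviewer's comment (assumed to sum to N = 42).
counts = {"women": 31, "men": 10, "undisclosed": 1}
shares = gender_percentages(counts)
print(shares)  # {'women': 73.81, 'men': 23.81, 'undisclosed': 2.38}
```

The three shares sum to 100 %, which the figures quoted from the manuscript (74 % and 25 % plus one participant) cannot do.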

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

This manuscript introduces Semantrix, an interactive semantic word-guessing game designed not only for entertainment but as a testbed to evaluate and enhance semantic understanding in human-computer interaction (HCI). The authors aim to assess whether Transformer-based semantic embeddings (e.g., Sentence Transformers) combined with adaptive, context-sensitive hint generation (e.g., via GPT-4) can meaningfully improve user engagement, motivation, and experience compared to more traditional approaches (e.g., Word2Vec embeddings and static feedback). To investigate this, the authors conducted a preregistered 2×2 between-subjects factorial study (N=42) that independently manipulated the semantic model and hinting mechanism. They analyzed both behavioral data (e.g., number of rounds, play time, hint use) and validated self-report questionnaires (UES-SF, FunQ, IMI) to assess user engagement, fun, and intrinsic motivation.
The manuscript presents an innovative and well-structured study that effectively combines semantic models with real-time adaptive feedback in an interactive environment. The integration of computational modeling, interface design, and human factors research is commendable, and the commitment to open science practices (e.g., deployment via Hugging Face Spaces and Gradio) adds transparency and reproducibility. However, the manuscript still has the following shortcomings, which I recommend the authors address in revision:
Methodology:
1. The sample size across conditions is highly imbalanced (e.g., Word2Vec + No Hints group only n = 5), which may affect statistical power and the comparability of results. This issue should be more thoroughly discussed in the Limitations section.
2. Effect sizes (e.g., partial η², Cohen’s d) are not reported; including them would improve the interpretability of the findings and facilitate meta-analytic comparison.
3. While the EMA-based hint-triggering strategy is reasonable, the paper does not compare it with alternative approaches (e.g., rule-based triggers, time-based triggers, user-initiated hints). A brief comparison would strengthen the justification for this method.
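For context on the EMA-based trigger mentioned in point 3, the following is a minimal sketch of how such a trigger could work. It is not the authors' implementation: the class name, the smoothing factor `alpha`, the `threshold`, and the `warmup` count are all hypothetical, and the sketch assumes that a hint fires when the exponential moving average (EMA) of recent guess similarity scores falls below a threshold.

```python
class EmaHintTrigger:
    """Hypothetical EMA-based hint trigger (illustrative, not from the paper)."""

    def __init__(self, alpha=0.3, threshold=0.4, warmup=3):
        self.alpha = alpha          # weight of the newest score (assumed value)
        self.threshold = threshold  # EMA level below which a hint fires (assumed)
        self.warmup = warmup        # minimum guesses before any hint (assumed)
        self.ema = None
        self.guesses = 0

    def update(self, score):
        """Record a similarity score in [0, 1]; return True if a hint is due."""
        if self.ema is None:
            self.ema = score
        else:
            self.ema = self.alpha * score + (1 - self.alpha) * self.ema
        self.guesses += 1
        return self.guesses >= self.warmup and self.ema < self.threshold


struggling = EmaHintTrigger()
print([struggling.update(s) for s in (0.2, 0.25, 0.2)])  # [False, False, True]
```

A rule-based alternative of the kind the reviewer lists would replace the EMA with a fixed condition, for example "hint after k consecutive low-scoring guesses", which reacts to streaks but ignores how close earlier guesses were.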
Analyses:
1. The main results table (Table 2) should include confidence intervals or standard errors, particularly for the self-report metrics, to improve interpretability.
2. Non-significant trends (e.g., the absence of a main effect of model type on intrinsic motivation) could be discussed in greater depth to clarify their theoretical implications.
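The effect sizes requested under Methodology (point 2) and the standard errors requested under Analyses (point 1) could be computed along the following lines. This is a sketch using only the Python standard library; the function names are illustrative, and the scores are made-up placeholders, not data from the study.

```python
from math import sqrt
from statistics import mean, stdev, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def standard_error(values):
    """Standard error of the mean: sample standard deviation over sqrt(n)."""
    return stdev(values) / sqrt(len(values))


# Placeholder questionnaire scores for two conditions (NOT study data).
condition_a = [2, 4, 6]
condition_b = [1, 3, 5]
print(cohens_d(condition_a, condition_b))  # 0.5
print(round(standard_error(condition_a), 4))  # 1.1547
```

With cells as small as n = 5, a small-sample correction such as Hedges' g is commonly reported alongside or instead of Cohen's d.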

This manuscript presents a timely, methodologically sound, and highly relevant study at the intersection of NLP, HCI, and educational game design. With minor clarifications and elaborations as outlined above, it will make a strong contribution to the field.

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

I appreciate your thorough revisions and am satisfied that all my comments have been addressed.
