Article
Peer-Review Record

Replay-Based Domain Incremental Learning for Cross-User Gesture Recognition in Robot Task Allocation

Electronics 2025, 14(19), 3946; https://doi.org/10.3390/electronics14193946
by Kanchon Kanti Podder 1, Pritom Dutta 2 and Jian Zhang 2,*
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3: Anonymous
Submission received: 8 September 2025 / Revised: 29 September 2025 / Accepted: 2 October 2025 / Published: 6 October 2025
(This article belongs to the Special Issue Coordination and Communication of Multi-Robot Systems)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This paper proposes a memory-efficient replay-based domain incremental learning (DIL) framework, ReDIaL, that adapts to sequential domain shifts while minimizing catastrophic forgetting. The proposed approach employs a frozen encoder to create a stable latent space and a clustering-based exemplar replay strategy to retain compact, representative samples from prior domains under strict memory constraints.

  • To highlight the main contributions of this paper, the authors should compare the proposed algorithm with existing methods.
  • How is the approximately-equal sign in Eq. (6) defined?
  • How can one determine the parameters $B$ and $\eta$ to optimize the performance of the proposed method?
  • Consider presenting the comparison results as a bar chart instead of Table 3, so that the advantages are easier to see.
  • The potential applications of the proposed method should be presented in the conclusion section. Could the proposed method be applied to the research titled "Impedance Learning for Human-Guided Robots in Contact with Unknown Environments"? Please discuss this.
  • There are typos and grammatical errors. Please correct them in the revised version.

Author Response

Please see the attachment.

 

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This paper presents ReDIaL, a replay-based domain-incremental learning (DIL) approach to cross-user gesture recognition in human–robot interaction (HRI). It integrates a frozen encoder to establish a fixed latent space, clustering-based exemplar selection (saved as latent embeddings) under a memory limit, and balanced replay between existing-domain data and saved exemplars.

One in-house dataset serves as the source domain; the 20 subject-as-domain targets come from NATOPS (6 classes). The authors report 97.34% union accuracy, higher than pooled fine-tuning (91.87%), incremental fine-tuning (80.92%), and vanilla Experience Replay (94.20%), and close to the joint-training upper bound (98.18%).

The problem is practical and timely for real-world deployments, and the method and results are good. To reach a publishable level of rigour, the paper requires a more explicit explanation of the methodology, a more in-depth statistical treatment (uncertainty, order robustness), ablations that isolate the contribution of each component, and a tighter structure.

 

Introduction

The introduction presents gesture-based HRI in noisy/safety-critical environments and identifies domain shift and catastrophic forgetting as deployment bottlenecks. It places the work in a subject-as-domain DIL setting and enumerates the contributions.

  • State the precise gap earlier. Tighten the progression from general HRI to subject-as-domain DIL with a fixed label space.
  • After the two context paragraphs, state the scenario, its scope, and the two objectives (adaptation + retention).
  • Contributions should map more closely to the method sections. Provide 3–4 contributions that map one-to-one to Sections 4.x (frozen encoder → 4.1; cross-modal model → 4.2; clustering replay → 4.3.2; balanced replay schedule → 4.3.3; evaluation protocol → 5.x).
  • Paper roadmap. Include a one-sentence overview of the paper's structure at the end of the section.

 

Literature Review

The review covers gesture-based HRI, continual learning/DIL, and rehearsal approaches.

  • Depth on DIL & CL best practices. The review lacks an explicit description of Backward/Forward Transfer, order sensitivity, and recommended reporting (mean±SD across orders).
  • Insert: a brief summary paragraph on standard CL metrics (avg. acc., BWT/FWT, forgetting) and on why different domain orders matter.
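
To make the metrics named above concrete, the following minimal sketch computes them from an accuracy matrix. It assumes the commonly used definitions in which `R[i][j]` is the accuracy on domain `j` after training through domain `i`, and `baseline[j]` is the accuracy of a reference model that has not seen domain `j`. The function name, the toy matrix, and the baseline values are hypothetical illustrations, not results from the manuscript, and the manuscript's own definitions of A, H, and F may differ.

```python
def cl_metrics(R, baseline):
    """Continual-learning summary metrics from an accuracy matrix.

    R[i][j]     : accuracy (proportion in [0, 1]) on domain j after training
                  through domain i (assumed layout).
    baseline[j] : accuracy on domain j of a reference model that never saw
                  it (assumed to be supplied by the caller).
    """
    T = len(R)
    if T < 2:
        raise ValueError("at least two domains are required")
    final = R[T - 1]

    # Average accuracy over all domains after the last training stage.
    avg_acc = sum(final) / T
    # Backward transfer: change on earlier domains since they were learned.
    bwt = sum(final[j] - R[j][j] for j in range(T - 1)) / (T - 1)
    # Forward transfer: accuracy on a domain just before training on it,
    # relative to the reference baseline.
    fwt = sum(R[j - 1][j] - baseline[j] for j in range(1, T)) / (T - 1)
    # Forgetting: best past accuracy on a domain minus its final accuracy.
    forgetting = sum(
        max(R[i][j] for i in range(j, T - 1)) - final[j] for j in range(T - 1)
    ) / (T - 1)

    return {"avg_acc": avg_acc, "bwt": bwt, "fwt": fwt, "forgetting": forgetting}


# Toy three-domain example (made-up numbers, for illustration only).
toy_R = [
    [0.90, 0.50, 0.40],
    [0.85, 0.92, 0.55],
    [0.80, 0.90, 0.95],
]
toy_baseline = [0.25, 0.25, 0.25]
toy_metrics = cl_metrics(toy_R, toy_baseline)
```

Repeating this computation for several fixed domain orders and summarising each metric as mean±SD would give the order-robustness reporting requested above.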

 

  • Exemplar selection landscape. A justification for clustering is given, but comparisons with herding (iCaRL), k-center/k-medoids, reservoir sampling, and class-balanced ER (ER-ACE) are missing.
  • Add one comparative paragraph: boundary-seeking (herding/gradient-matching) vs order-agnostic (reservoir) vs coverage-seeking (k-medoids), and why the latter is appropriate for fixed per-class budgets.
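
The contrast between order-agnostic and coverage-seeking selection can be sketched as follows. This is an illustrative sketch only: it uses reservoir sampling and greedy k-center (farthest-first) selection as simple stand-ins, it is not the clustering procedure of the manuscript, and the function names and toy embeddings are hypothetical.

```python
import math
import random


def reservoir_sample(stream, k, rng):
    """Order-agnostic selection: uniform sample of k items from a stream."""
    buffer = []
    for n, item in enumerate(stream):
        if n < k:
            buffer.append(item)
        else:
            j = rng.randint(0, n)  # inclusive on both ends
            if j < k:
                buffer[j] = item
    return buffer


def k_center_greedy(points, k, start=0):
    """Coverage-seeking selection: farthest-first traversal.

    Returns indices of up to k points; each new pick is the point farthest
    from everything chosen so far (Euclidean distance).
    """
    chosen = [start]
    dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < min(k, len(points)):
        nxt = max(range(len(points)), key=dist.__getitem__)
        if dist[nxt] == 0.0:  # only duplicates of chosen points remain
            break
        chosen.append(nxt)
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return chosen


# Toy 2-D "embeddings" for one class: two tight pairs and one outlier.
toy_points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
coverage_idx = k_center_greedy(toy_points, k=3)
random_pick = reservoir_sample(toy_points, k=3, rng=random.Random(0))
```

With a fixed per-class budget, the coverage-seeking pick takes one point from each region of the toy set, whereas the reservoir pick depends only on chance and may spend two slots on the same tight pair.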

 

  • Multimodal & skeleton literature. The posture stream and transformer encoders warrant references to skeleton-based action recognition (e.g., ST-GCN family) and recent video transformers; also cite the keypoint extractor (OpenPose/MediaPipe/HRNet). Add 2–3 references and a rationale sentence (skeletons are robust to background/appearance changes; RGB supplies context).
  • Cross-subject domain shift evidence. Cite reference works/datasets that report cross-subject performance drops to motivate the "subject-as-domain" framing.
  • Privacy nuance. Cite feature/embedding inversion and membership inference literature and hint at mitigations (quantization, encryption, projection).

 

Methodology

Section 4 handles latent embedding through a frozen encoder, cross-modal recognition model, exemplar selection based on clustering, and balanced replay.

  • Architecture details. Report the number of layers/heads, dropout, activation, normalization, query count, and hidden size.
  • Posture stream details. Detector (name/version), joint set, confidence thresholds, missing joints handling, temporal alignment with RGB, skeleton rendering pipeline (resolution, line thickness/colors), and ensure skeletons are generated from the same RGB stream (no leakage).
  • Exemplar selection clarity. State the per-class budgets explicitly, and indicate the clustering algorithm (k-medoids vs k-means), distance metric (L2 on normalized embeddings), initialization, maximum iterations, and seeds. Indicate whether memory scales linearly with the number of domains or whether a global threshold triggers pruning (and how).
  • Replay batching. Indicate class-stratified 1:1 mixing: μ/∣Y∣ from exemplars and from current domain; batch size B=2μ.
  • Optimization schedule & early stopping. Specify the steps/epochs per domain, which validation set is monitored (current domain or combined), when the LR scheduler is restarted, and the precise early-stopping criterion.
  • Baseline parity. For ER, is the memory equalized in bytes or in number of items? If ER stores raw clips, specify the total MB and provide iso-memory comparisons; if ER has been modified to use latent storage, state so clearly.
  • Uncertainty estimation. Report 95% binomial CIs (Clopper–Pearson or Wilson) per domain and for the union/pooled sets. For differences between methods on a shared test set, use a paired bootstrap (BCa) over samples to obtain CIs for Δaccuracy.
  • Paired significance tests. Apply McNemar's test per domain (paired errors) and a Mantel–Haenszel stratified analysis across domains. Report effect sizes (e.g., Cohen's h for union/pooled accuracies). Include ≥5 random subject orders (fixed across methods). Report mean±SD (or median+IQR) for terminal union accuracy, A, H, and mean forgetting F. Compare ReDIaL and ER across orders with the Wilcoxon signed-rank test and report Cliff's delta.
  • CL metrics completeness. Include Backward Transfer (BWT) and Forward Transfer (FWT) with CIs across orders; compute all means on proportions and convert to % only for reporting.
  • Small N & ceiling effects. Some target scores are 100% with N=24 in each domain, so the CIs will be wide. State the CIs explicitly and warn of ceiling effects; optionally use a domain-level (subject) bootstrap to respect within-domain dependence.
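
As a hedged illustration of the interval and paired-test reporting requested in the points above, the sketch below implements a Wilson score interval and an exact (binomial) McNemar test using only the standard library. The function names are hypothetical, the discordant counts in the example are made up, and the authors may prefer an established statistics package for the final analysis.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard-normal quantile


def wilson_interval(correct, n, z=Z_95):
    """Wilson score interval for a binomial proportion correct/n."""
    if n <= 0 or not 0 <= correct <= n:
        raise ValueError("need n > 0 and 0 <= correct <= n")
    p = correct / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: samples method A classifies correctly and method B misclassifies.
    c: samples method B classifies correctly and method A misclassifies.
    Under the null hypothesis the discordant pairs split 50/50.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# A perfect score on a 24-sample domain, as in the small-N case above.
lo_24, hi_24 = wilson_interval(24, 24)
# Hypothetical discordant counts between two methods on one domain.
p_toy = mcnemar_exact(0, 5)
```

Under these assumptions, 24 correct out of 24 gives a Wilson lower bound of roughly 86%, which shows why a 100% score on such a small domain should be read with caution.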

 

Discussion

The discussion captures the benefits, foregrounds the stability–plasticity trade-off, and demonstrates the usefulness of latent rehearsal.

 

  • Threats to validity. 6-class vocabulary, single pair of datasets, very small per-domain test sets, dependency on a fixed encoder, skeletons derived from the same RGB stream (not an independent modality), and order effects; discuss each with its implications and planned remediation.
  • Privacy nuance. Add a brief threat model for embeddings (inversion/membership inference) and lightweight mitigations appropriate for on-robot adaptation (8-bit quantization, subspace projection, at-rest encryption), with a note on the accuracy–efficiency trade-off.
  • Generalization path. Outline pathways to larger gesture vocabularies, cross-dataset comparisons, adaptive encoder fine-tuning, and better-informed exemplar selection criteria (uncertainty/gradient-based coresets).

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The article presents a gesture-recognition technique for robot task allocation. The proposed method uses a fixed encoder to establish a stable latent space and implements a clustering-based sample replay technique to preserve compact, representative samples from previous domains within stringent memory limits. The introduction gives a detailed description of the work and highlights its significant contributions. The second section surveys the most important recent literature. The third section presents the problem statement for incremental gesture learning within a subject-as-domain framework. The fourth section describes the proposed approach, including the algorithms. The fifth section describes the experimental setup. The sixth section reports the results of the study. The seventh section provides the discussion, and the eighth section gives the concluding remarks. The article contains all the information required for a well-structured manuscript, but future research directions are missing. It is therefore recommended to accept this article for publication after future research directions are added to the discussion section.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

This paper is well revised and can be accepted.

Reviewer 2 Report

Comments and Suggestions for Authors

The authors have addressed all my comments and concerns.
