Article
Peer-Review Record

Enhancing Retrieval-Oriented Twin-Tower Models with Advanced Interaction and Ranking-Optimized Loss Functions

Electronics 2025, 14(9), 1796; https://doi.org/10.3390/electronics14091796
by Ganglong Duan, Shanshan Xie * and Yutong Du
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4:
Submission received: 25 March 2025 / Revised: 24 April 2025 / Accepted: 25 April 2025 / Published: 28 April 2025
(This article belongs to the Section Computer Science & Engineering)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This paper presents an optimized twin-tower model for text retrieval that aims to address limitations in traditional models through improved feature interaction and loss function design. Here's a critical review of the paper:

 

Strengths:

 
  1. Novelty and Innovation:
  • The introduction of an early interaction layer using cross-attention mechanisms is a significant innovation.
  • The ranking-optimized loss function is a novel approach that addresses overfitting and metric dependency issues.
 
  2. Performance Improvements:
  • The model demonstrates substantial improvements in Top-K accuracy metrics across multiple datasets (NQ, TQA, WQ).
  • It achieves these gains while maintaining low latency (17 ms retrieval time), balancing effectiveness and efficiency.

  3. Comprehensive Evaluation:
  • The authors conduct extensive experiments on multiple benchmark datasets.
  • Ablation studies effectively demonstrate the contribution of individual components to overall performance.

  4. Practical Implications:
  • The model maintains compatibility with vectorized calling mechanisms, making it more practical for real-world applications compared to some other advanced models.
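
The early-interaction idea praised above (a cross-attention layer that lets document tokens attend to a query-side representation) can be illustrated with a toy NumPy sketch. The dimensions, the single "simulated query" vector, and the residual fusion below are illustrative assumptions, not the authors' actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    # scaled dot-product attention: each query row attends over the key rows
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ values

rng = np.random.default_rng(0)
doc_tokens = rng.normal(size=(5, 8))   # 5 document token embeddings, dim 8
sim_query = rng.normal(size=(1, 8))    # a "simulated query" representation

# early interaction: document tokens attend to the simulated query, and the
# result is fused back into the document features (residual add)
fused = doc_tokens + cross_attention(doc_tokens, sim_query, sim_query)
print(fused.shape)  # (5, 8)
```

Because the fused representation is still computed entirely on the document side, document embeddings can be precomputed and indexed, which is what preserves compatibility with vectorized retrieval.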
 

Weaknesses and Areas for Improvement:

 
  1. Literature Review:
  • The related work section could be more comprehensive and better organized.
  • Some references seem tangential to the core topic of text retrieval.
 
  2. Methodological Details:
  • While the paper describes the general approach, more technical details about the implementation would be beneficial.
  • The exact configuration of the cross-attention mechanism and loss function could be explained more thoroughly.

  3. Comparison with State-of-the-Art:
  • Although the model outperforms several benchmarks, a more detailed comparison with recent advancements in the field would strengthen the paper.
  • The performance gap between this model and ColBERT on the NQ dataset deserves more discussion.

  4. Generalization:
  • The paper primarily focuses on question-answering datasets. Testing on other types of text retrieval tasks would enhance the generalizability claims.

  5. Reproducibility:
  • While the datasets are publicly available, more information about hyperparameters, training procedures, and code availability would improve reproducibility.

  6. Theoretical Analysis:
  • A deeper theoretical analysis of why the proposed modifications work so effectively would add scientific rigor to the paper.

  7. Future Work:
  • The discussion of potential future directions is somewhat limited. More concrete suggestions for building upon this work would be valuable.

  8. Writing and Presentation:
  • Some sections could benefit from more concise writing and better organization.
  • Figures and tables are referenced but not always clearly explained in the text.
 

Overall, this paper makes a meaningful contribution to the field of text retrieval by addressing key limitations in traditional twin-tower models. The proposed innovations show promise and achieve impressive results. However, the paper could be strengthened with more detailed methodology, broader comparisons, and a more thorough theoretical analysis. With some revisions and expansions, particularly in the areas of reproducibility and generalization, this work could become a more impactful contribution to the field.

Comments on the Quality of English Language

The English needs to be improved.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

Authors, congratulations on your work. This manuscript contributes to neural text retrieval by enhancing twin-tower models with early cross-attention interaction and a ranking-optimized loss function. The approach addresses key limitations of existing models and is supported by robust experiments on multiple datasets.

However, the manuscript would benefit from improved clarity in technical writing, a more concise and structured presentation of the related work, and an improved presentation of results and figures.

I reviewed the entire manuscript and provided specific recommendations throughout the text. I hope my suggestions assist the authors in further strengthening this work.

1 - I suggest briefly presenting a flowchart in the introduction that shows the stages of the work. A flowchart of the steps enriches the paper and helps the reader understand the method developed.

2 - Some descriptions are overly repetitive, particularly in the opening paragraphs of Sections 1, 3.1.2, and 3.1.3. A careful revision is recommended to eliminate redundancies and improve the overall clarity and flow of the text, especially in the Introduction and the early interaction module description.

3 -  In the Introduction, clearly describe the originality of the work. What is novel about the proposed retrieval architecture? Justify the differential of the enhanced twin-tower model compared to other retrieval models such as DPR, ANCE, and ColBERT.

4 - In Section 2, the “Related Work” section is overly long and includes extensive explanations of well-established methods such as TF-IDF, BM25, and RAG, which dilute the focus of the literature review. I recommend condensing this section by reducing verbosity and concentrating on prior works that are directly comparable to the proposed approach.

5 - In the results section, although the reported improvements are promising, the manuscript lacks statistical analysis to validate them (e.g., significance testing, confidence intervals, or t-tests). I suggest including such an analysis to reinforce the robustness and reliability of the performance gains over baseline models.

6 - It’s unclear whether the code or trained model will be publicly available. I suggest adding a link to a repository with the code and/or experimental data (if it is publicly available).

7 - The manuscript lacks a “Limitations and Future Work” subsection at the end of the conclusion, which is typically expected in MDPI-applied research articles. I suggest adding a final paragraph or subsection discussing the study's main limitations and outlining potential directions for future work.

8 - Regarding the References section, several entries are incomplete, improperly formatted, or duplicated (e.g., repeated authors or missing journal/conference details). I recommend thoroughly reviewing all references to ensure compliance with IEEE/MDPI formatting standards, including full author names, complete titles, venue, volume, issue, page range, DOI, and publication year.
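
The significance testing suggested in point 5 could be carried out, for instance, with a paired t-test over per-run accuracies. The sketch below is a minimal self-contained illustration; the accuracy numbers are made up for demonstration and are not results from the paper:

```python
import math

def paired_t_statistic(a, b):
    # t statistic for paired samples (H0: the mean per-pair difference is 0)
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n)

# hypothetical per-run Top-20 accuracies (illustrative numbers only)
proposed = [0.81, 0.79, 0.83, 0.80, 0.82]
baseline = [0.78, 0.77, 0.80, 0.78, 0.79]
t = paired_t_statistic(proposed, baseline)
print(round(t, 3))
# a large |t| at n - 1 = 4 degrees of freedom indicates the gain is unlikely
# to be noise; the p-value would then come from the t distribution
```

In practice a library routine (e.g., `scipy.stats.ttest_rel`) would also return the p-value directly.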

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The paper proposes an optimised twin-tower model architecture to address the shortcomings of the traditional twin-tower model in feature interaction and loss function design. Its core innovations are summarised below.

Strong Points:

  1. The approach improves the model's ability to understand the semantic relationship between the query and the document by introducing an early interaction mechanism that generates a simulated query representation at the document side and fuses it into the document features.
  2. The paper introduces a novel loss function designed to optimize the relative similarity ranking between positive and negative sample pairs.
  3. Comprehensive experiments were conducted on several standard datasets such as NQ, TQA, WQ, etc., and Top-K accuracy significantly outperformed benchmark models such as BM25, DPR, ANCE and ColBERT.
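
The ranking-optimised loss described in point 2 is, in spirit, a pairwise margin loss over positive and negative sample pairs. The sketch below is a generic illustration under that assumption; the cosine similarity, the `margin` value, and all names are not taken from the paper:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def pairwise_ranking_loss(query, positive, negatives, margin=0.2):
    # hinge loss: penalise any negative whose similarity to the query comes
    # within `margin` of (or exceeds) the positive's similarity
    s_pos = cosine(query, positive)
    return sum(max(0.0, margin - (s_pos - cosine(query, n))) for n in negatives)

rng = np.random.default_rng(1)
query = rng.normal(size=16)
positive = query + 0.1 * rng.normal(size=16)          # near the query
negatives = [rng.normal(size=16) for _ in range(3)]   # unrelated vectors
loss = pairwise_ranking_loss(query, positive, negatives)
print(loss >= 0.0)  # True; zero once every negative is margin-separated
```

Optimising relative ordering rather than absolute scores is what ties such a loss to ranking metrics like Top-K accuracy.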

Weaknesses:

  1. The symbols in Equation (12) are vaguely defined; it is suggested to supplement the physical meaning of each variable in the equation and the basis for the parameter settings.
  2. I suggest adjusting and standardising the format of the figures throughout the text; e.g., the top part of Figure 3 is incomplete.
  3. Some of the terminology is repetitive (e.g., the paragraph comparing the ‘twin-tower model’ with the ‘cross-encoder’); redundancies need to be streamlined to improve readability.
  4. The paper has insufficient references to the theoretical underpinnings of cross-attention mechanisms and loss function design, and lacks analysis relating them to classical theory; it is recommended that papers [1] and [2] be added. The paper also does not adequately discuss the model's capabilities on multimodal (e.g., mixed text-and-image) or graph-structured data, which limits the generalisability of the application scenarios; citations to recent semantic-driven approaches in heterogeneous networks are suggested.

 

[1] Ashish Vaswani, Noam Shazeer, Niki Parmar, et al. Attention Is All You Need. Advances in Neural Information Processing Systems, 2017, 30.

[2] Yi Tay, Mostafa Dehghani, Dara Bahri, et al. Efficient Transformers: A Survey. ACM Computing Surveys, 2022, 55(6): 1-28.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors
  1. The manuscript should have a section describing state-of-the-art techniques. This section should also include a tabular summary, so that it is easy to identify what is missing in the literature and how this paper addresses it. It can be derived from content already in the introduction section.
  2. How does the cross-attention layer avoid introducing significant computational overhead compared to late interaction models like ColBERT?
  3. The results, especially the comparisons between the proposed algorithms, should be discussed in more detail. What are the insights? Why can the proposed strategy/mechanism achieve good results? At present, the analysis is largely a summary of the results in the tables and figures.
  4. What specific ranking loss is used, and how is it optimized during training?
    Is it adaptive to class imbalance or noise in hard negatives?
  5. Recent papers on this topic should be cited, and the literature section revised to incorporate recent ideas; for example, adding the following to the related-work section would improve the paper's quality: (Energy Efficient Real-time Tasks Scheduling on High Performance Edge-Computing Systems using Genetic Algorithm).
  6. Explain the motivation behind using this method in the introduction. Why did the existing schemes fail? Has no prior study tried to address this aspect? If one has, this has to be mentioned.
  7. Some improvements are required to the clarity of the diagrams. More specifically, Fig. 1 needs to be provided in good quality and at high resolution.
  8. What is the trade-off between early interaction strength and embedding independence?
    Does introducing early cross-attention reduce the modularity of the twin towers?

  9. The technical details are not very intelligible, so please provide stronger technical details in the main methodology. The time consumed in the training procedure of the proposed method and the compared algorithms could be listed.

Comments on the Quality of English Language

English should be improved.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

It still needs more clarification of the terminologies and approach.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

After reviewing the revised version of the manuscript, I confirm that the authors have adequately addressed all my previous comments and suggestions. The manuscript has been significantly improved in clarity, scientific contribution, and overall quality.

Therefore, I recommend the manuscript for publication in Electronics.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The authors have well addressed my concerns and I accordingly suggest an accept.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

No more comments.

Comments on the Quality of English Language

The English could be improved to more clearly express the research.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
