Retrieving Memory Content from a Cognitive Architecture by Impressions from Language Models for Use in a Social Robot
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This paper presents an innovative approach that combines a cognitive architecture (ACT-R) with large language models (LLMs) and vision language models (VLMs) to endow social robots with human-like memory and recall capabilities, enhancing the contextual relevance and credibility of human-robot interaction. Through ACT-R's declarative memory, the robot stores real-time perceptual data (such as conversation keywords and visual features) and uses its procedural memory for association-based memory retrieval. The system can dynamically invoke personalized memory content to enrich LLM-generated responses, reducing "hallucination" issues. Experiments demonstrate two application scenarios: 1) text-based dialogue for train station information retrieval; and 2) memory triggering based on visual impressions, validating the framework's effectiveness in enhancing the LLM's contextual understanding and response accuracy. Generally speaking, this paper is well motivated and easy to follow, and I do not have any major concerns. Instead, I would like to offer some suggestions.
- Please reorganize the Methods section to clarify the workflow, for instance, how keywords are processed by ACT-R.
- Further discuss how the ACT-R integration improves over RAG.
- Use quantitative metrics such as precision, recall, and response time.
- Conduct an ablation study.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The article presents an innovative integration of ACT-R cognitive models with social robots using LLMs/VLMs, but it would benefit from deeper experimental validation and a more critical discussion of limitations.
- In the related work section, the authors review many sources but lack a critical synthesis that highlights the research gap.
- The methods are technically detailed; however, no quantitative evaluation metrics or performance benchmarks are presented.
- The discussion correctly identifies limitations, but the latency impact needs deeper quantitative assessment.
- Only 4 references from 2025 are used; total references are 52, which shows good breadth, but the inclusion of more recent empirical studies would strengthen the argument.
- In the Discussion section, “We testet this system …” should read “tested” (p. 12, l. 439)
- “we used it’s remote interface” misuses the apostrophe; it should be “its” (p. 6, l. 224)
- Reference 35 redundantly states “Proceedings of the Proceedings of the 17th ICAART”; delete the second “Proceedings”
- The authors are kindly encouraged to conduct thorough proofreading to correct minor typographical and grammatical inconsistencies throughout the manuscript.
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 3 Report
Comments and Suggestions for Authors
The work is complete and meaningful, concisely and consistently written, and cannot be criticised for any major shortcomings. The only thing that bothered me was the somewhat modest methodological part, from which it is difficult to draw a uniform conclusion and a direction for further research. I therefore suggest that the whole, especially the empirical part, be written up a little more systematically, since such an approach is very accessible to computer scientists (e.g., in the form of a thought pattern, diagram, or flow chart). For example, you mention two examples in various places, in the first case only briefly (l. 123, "employed remained the same in both cases") and later (l. 279) in more detail: are these the same or different experiments?
A few minor comments:
- Explain in a little more detail the phrasing "of each image in three keywords" (l. 126) and "human question or problem was also expressed in three keywords" (l. 127). Why three keywords? Is this related to processing time or processing power?
- It would be interesting to add whether the system works in real time or what the lag is.
- Please explain how you combined the different "programming languages": you mention LISP, C++, Python, ... and then the LLM and VLM?
- "401-402 In general, there are far more options available for a programmatic implementation of cognitive models with ACT-R or similar frameworks in combination with social robots" - Please clarify this claim a bit.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
The authors should correct the duplicated DOI prefix in reference 14; it currently reads "https://doi.org/https://doi.org/..." and should be revised to a single valid DOI link.
Author Response
Comments 1: The authors should correct the duplicated DOI prefix in reference 14; it currently reads "https://doi.org/https://doi.org/..." and should be revised to a single valid DOI link.
Response 1: Thank you very much. We have corrected the duplicated DOI prefix; it should be all right now.