by
  • Juan Manuel Tejada-Triviño,
  • Elvira Castillo-Fernández and
  • Pedro García-Teodoro
  • et al.

Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Journal: Electronics (ISSN 2079-9292)

Manuscript ID: electronics-3927417

Title: RDBAlert: An AI-driven Automated Tool for Effective Identification of Victims' Personal Information in Ransomware Data Breaches

Authors: José Manuel Tejeda Triviño, Elvira Castillo Fernández, Pedro García Teodoro, José Antonio Gómez Hernández *

The article discusses RDBAlert, an AI-based application that uses multimodal analysis to detect ransomware leaks and gather personal data. The manuscript provides a means for victims and organizations to identify compromised information and minimize the impact of ransomware. After reviewing the manuscript, my specific comments are below:

  1. Abstract:
  • State explicitly how RDBAlert is novel compared with existing tools.
  • The abstract lacks quantifiable performance metrics and numerical results (e.g., precision, recall, or accuracy, reported as percentages).
  • The keywords are too limited.

  2. Correct the numbering of the sections; e.g., the Introduction should be Section 1, not Section 0.

  3. Introduction:
  • There is no distinct statement of the research gap. Specify what previous works fail to achieve and how RDBAlert addresses this gap.
  • Many references are industry reports. Include more peer-reviewed academic studies on ransomware data leaks and breach detection.
  • Outline your contributions and summarize them as bullet points.

  4. Related Work: Table 1 does not critically compare existing approaches with RDBAlert in terms of accuracy, scalability, and applicability.

  5. Methodology:
  • Explain how crawler efficiency, dark web coverage, and ethical considerations are handled technically, and add algorithmic detail.
  • Add training details for the modules that use YOLOv11 and MiniCPM (hyperparameters, dataset splits, and preprocessing steps).
  • Add an error analysis; it is unclear whether bias or errors in OCR/LLM-based PII extraction were measured systematically.
  • Scalability tests for Elasticsearch/Kibana (e.g., query latency on TB-scale datasets) are missing.
  • The methodology should clarify how the system avoids misuse (e.g., downloading illicit material) and ensures GDPR compliance. It should also include a flow diagram of the entire pipeline, covering the input, analysis, validation, and reporting stages.

  6. Results:
  • Add a quantitative comparison with ground truth. For example, how many detections were true positives, and how many were false positives or false negatives?
  • Provide more evidence for the accuracy claims (such as the 99.54% reported for MiniCPM); they lack statistical validation (confidence intervals and cross-validation).
  • Figures 9-17 do not provide scientific information.
  • Analyze trends, temporal patterns, or associations with attack groups.

  7. Conclusion: According to the paper, RDBAlert is effective, but the paper does not establish any performance standards. Please include clear criteria for success.

  8. There are a few minor grammar and typo issues.
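
To illustrate the kind of statistical validation requested under Results, the following is a minimal sketch of a 95% Wilson score interval for a reported accuracy. The function name `wilson_interval` and the counts are hypothetical and chosen only for illustration; they are not taken from the manuscript, and the authors may prefer a different interval or a cross-validated estimate.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical counts (NOT from the manuscript): 95 correct extractions out of 100.
correct, total = 95, 100
low, high = wilson_interval(correct, total)
print(f"accuracy = {correct / total:.2%}, 95% CI = [{low:.2%}, {high:.2%}]")
```

Reporting the interval alongside the point estimate shows how strongly the accuracy figure depends on the size of the evaluation set.
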
Comments on the Quality of English Language

There are a few minor grammar and typo issues.

Author Response

Please see the attached file with the responses to the comments.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The manuscript presents a technically interesting approach for automating the identification of personally identifiable information (PII) within ransomware data leaks using an AI-driven architecture. The topic is timely, and the overall structure of the paper is clear. However, several important issues require further attention before the work can be considered for publication. These concerns mainly relate to ethical implications, model evaluation transparency, and clarity in presentation. Addressing the following points would substantially improve the paper’s academic rigor, readability, and responsible research framing:

  1. The authors should discuss the potential dual-use risks of the proposed RDBAlert system — while it aims to protect victims and support incident response teams, it could also unintentionally provide malicious actors with an organized and easily accessible source of information about whose data has been leaked and where. A dedicated discussion on possible misuse scenarios and the corresponding technical, ethical, and legal safeguards would significantly strengthen the paper’s responsibility and impact assessment.
  2. The paper briefly states that the YOLOv11-based detector achieved full detection of PII-containing images at a 0.60 confidence threshold, yet no quantitative metrics (e.g., precision, recall, mAP) or dataset details are provided. For reproducibility and scientific transparency, the authors should include a comprehensive evaluation of the model’s performance, training dataset description, and comparative baselines.
  3. Figure captions should be more descriptive, explaining what is shown and why it is significant.
  4. Several sections (especially 3.1–3.3) contain redundant or overly descriptive text; the manuscript could be streamlined for clarity.
  5. Typographical and stylistic polishing is needed in sections 2 and 5 (e.g., consistent use of “data breach” vs. “data leak”).
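
As an illustration of the metrics requested in point 2, the sketch below computes precision, recall, and F1 for a detector at several confidence thresholds, including the 0.60 value mentioned in the manuscript. The scores, labels, and the helper `metrics_at_threshold` are hypothetical and serve only to show the calculation; they do not reproduce the authors' data or evaluation code.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    precision: float
    recall: float
    f1: float


def metrics_at_threshold(scores, labels, threshold):
    """Count score >= threshold as a positive prediction; labels are 1 (PII) or 0."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Metrics(precision, recall, f1)


# Hypothetical detector confidences and ground-truth labels (illustrative only).
scores = [0.95, 0.80, 0.65, 0.55, 0.40, 0.30]
labels = [1, 1, 0, 1, 0, 0]

for t in (0.40, 0.60, 0.80):
    m = metrics_at_threshold(scores, labels, t)
    print(f"threshold={t:.2f} precision={m.precision:.2f} recall={m.recall:.2f} f1={m.f1:.2f}")
```

Tabulating these values over a range of thresholds makes clear what a single operating point such as 0.60 trades off between missed detections and false alarms.
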

Author Response

Please see the attached file with the responses to the comments.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

Summary:

This paper introduces RDBAlert, an AI-driven tool designed to automate the detection and extraction of Personally Identifiable Information from ransomware data leak repositories. The authors demonstrate RDBAlert's effectiveness by applying it to five real-world ransomware leak datasets, presenting statistical analysis of the extracted PII. However, the manuscript has several weaknesses regarding quantitative evaluation, baseline comparisons, and security that must be addressed.

Comments:

1. Lack of Quantitative Performance Evaluation: The paper reports the performance of individual components, such as the 99.54% accuracy for MiniCPM in detecting ID numbers, but lacks a comprehensive quantitative evaluation of the entire RDBAlert system. Moreover, standard information retrieval metrics such as Precision, Recall, and F1-score for overall PII detection are absent.

2. Absence of Direct Comparison with Baselines: While the paper lists several existing data breach analysis platforms in Table 1, it does not perform a direct, quantitative performance comparison against any of them. Stating that existing tools have "limitations" is insufficient to prove the technical superiority of RDBAlert. A direct comparison of PII detection performance (e.g., accuracy, processing speed) against at least one existing tool on the same dataset is necessary.

3. Inadequate Analysis of Computational Cost and Scalability: The paper claims the system is scalable but provides no concrete analysis of the associated computational costs. There is no information on the average wall-clock time required to process a large-scale dataset or the necessary computing resources (CPU/GPU specifications, memory usage).

4. Lack of Explanation on Security and Ethical Considerations: RDBAlert is a system that collects and processes vast amounts of highly sensitive, stolen PII, yet the paper provides almost no detail on the security architecture of the tool itself. A description of the mechanisms for securely storing, managing, and controlling access to the collected leak data is needed.

5. Lack of an Ablation Study: The system is composed of a pipeline of several AI components, but the paper does not isolate the contribution of each component to the overall performance. An analysis comparing the proposed YOLO+MiniCPM pipeline against a simpler baseline, such as using a single traditional OCR tool like Tesseract, is needed.

Questions: 
1. Please report the Precision, Recall, and F1-score for the end-to-end RDBAlert system on PII detection.
2. Provide a quantitative performance comparison against at least one of the existing platforms listed in Table 1, using an identical dataset.
3. State the wall-clock time, GPU/CPU specifications, and memory usage required to process a large-scale dataset.
4. Describe in detail the security architecture of the RDBAlert system itself, including policies for secure data storage and access control of the collected PII.
5. Add an ablation study that compares the performance of your proposed AI pipeline (YOLO+MiniCPM) with that of a single OCR tool like Tesseract.
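
For question 3, one possible way to collect the requested figures is sketched below using only the Python standard library. The wrapper `profile` and the workload `process_batch` are hypothetical stand-ins, not part of RDBAlert. Note that `tracemalloc` tracks only memory allocated by the Python interpreter, so GPU memory and native-library allocations would have to be measured separately.

```python
import time
import tracemalloc


def profile(func, *args, **kwargs):
    """Run func and return (result, wall-clock seconds, peak Python memory in bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak


def process_batch(lines):
    """Hypothetical stand-in for a processing stage: count lines containing '@'."""
    return sum(1 for line in lines if "@" in line)


# Synthetic input for illustration only; a real run would use the leak dataset.
sample = [f"user{i}@example.org" if i % 2 else f"record {i}" for i in range(10_000)]
hits, seconds, peak_bytes = profile(process_batch, sample)
print(f"hits={hits} wall_clock={seconds:.4f}s peak_python_memory={peak_bytes / 1024:.1f} KiB")
```

Reporting such measurements together with the hardware specification would let readers judge the scalability claim.
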

Author Response

Please see the attached file with the responses to the comments.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

I would like to thank the authors for their careful and thorough revisions. All my previous comments have been addressed thoughtfully, and the manuscript has improved substantially in both clarity and depth. 

I appreciate the authors’ effort and responsiveness throughout the review process. The paper, in its current form, meets the publication standards of the journal.

I recommend acceptance of the manuscript.