Article
Peer-Review Record

Multi-Head Hierarchical Attention Framework with Multi-Level Learning Optimization Strategy for Legal Text Recognition

Electronics 2025, 14(10), 1946; https://doi.org/10.3390/electronics14101946
by Ke Zhang 1, Yufei Tu 2, Jun Lu 1,3, Zhongliang Ai 4,*, Zhonglin Liu 5, Licai Wang 1 and Xuelin Liu 6
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 31 March 2025 / Revised: 6 May 2025 / Accepted: 9 May 2025 / Published: 10 May 2025
(This article belongs to the Special Issue Image Processing Based on Convolution Neural Network: 2nd Edition)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

With the rapid growth of legal text data and the rising demand for its intelligent processing, multi-label legal text recognition is becoming increasingly important. This article proposes a multi-head hierarchical attention framework for multi-label legal text recognition. Comparative experiments on the CAIL2021, AAPD, and WOS datasets show that the proposed method outperforms several established methods.
After a careful reading, I have the following questions and comments:
1) The timeliness of some comparative experiments is insufficient. In the comparative experiments on the legal text dataset CAIL 2021, some of the comparison methods are relatively old and may not fully represent the latest progress in this field. It is suggested to supplement the comparison with more recent related work.
2) The calculation of the experimental results on the CAIL dataset is questionable: accuracy and precision appear to have been confused, and a careful re-check is recommended. Also, the CAIL 2021 Top-1 figures on the official website are not consistent with those in the paper; please verify which version of the data was used in the experiments.
3) Experiments demonstrating accelerated convergence are lacking. The paper emphasizes that the proposed algorithm can "accelerate the convergence"; corresponding experiments should be added to substantiate this claim.
4) The English writing needs improvement. Some passages are unclear or imprecise, and there are occasional errors in verb forms.
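On the second point above, subset accuracy and (micro-averaged) precision are computed quite differently on multi-label data and are easy to conflate; a minimal sketch with invented toy labels (not the paper's data or results) illustrates the distinction:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score

# Toy multi-label indicators (3 samples, 4 labels); values are illustrative only.
y_true = np.array([[1, 0, 0, 0],
                   [1, 1, 0, 0],
                   [0, 0, 1, 0]])
y_pred = np.array([[1, 0, 0, 1],   # one spurious extra label on sample 1
                   [1, 1, 0, 0],
                   [0, 0, 1, 0]])

# Subset accuracy: a sample counts only if EVERY label matches exactly.
acc = accuracy_score(y_true, y_pred)

# Micro precision: TP / (TP + FP) pooled over all label decisions.
prec = precision_score(y_true, y_pred, average="micro")

print(acc, prec)  # 2/3 vs 4/5: the two metrics diverge on the same predictions
```

Reporting which of the two is actually tabulated would resolve the ambiguity the reviewer points out.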

Comments on the Quality of English Language

The English writing needs improvement. Some passages are unclear or imprecise, and there are occasional errors in verb forms.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This paper proposes a novel deep learning framework for hierarchical multi-label legal text classification, combining a multi-head attention mechanism with a hierarchical prototype network. The authors introduce a multi-level learning optimization strategy to accelerate convergence and improve accuracy. The model is evaluated on a legal dataset (CAIL2021) and two general datasets (WOS and AAPD), showing strong empirical results, particularly in the legal domain. While the paper presents promising findings, several issues merit closer scrutiny, especially concerning the model architecture, the dataset design and reporting, and the interpretation of the experimental results.

 

  1. Model Architecture

The authors design a model that uses a fixed number of three multi-head attention layers in the feature extraction module, corresponding to the three-level hierarchy in the CAIL2021 dataset. However, the paper does not justify this architectural choice or evaluate how it generalizes to datasets with a different number of label levels, such as WOS and AAPD, which have only two levels. It remains unclear whether the model dynamically adjusts the attention depth based on the dataset or simply reuses the same three layers, potentially introducing redundant or misaligned feature representations. This rigid coupling between model structure and data-specific hierarchy depth raises concerns about the model’s flexibility and applicability to datasets with more or fewer levels.

A key component of the hierarchical module involves hard-coded binary transition matrices (A and B) to map prototype representations across label levels. These matrices are constructed based on predefined parent-child label relationships. The authors do not provide justification for this design choice, nor do they consider any learnable or soft alternatives. While this method ensures structural consistency, it severely limits the model’s ability to adapt to label noise, evolving taxonomies, or semantically ambiguous relationships between labels. This hard-coded structure could reduce the model’s robustness and generalizability to real-world legal data, where label hierarchies are often incomplete or context-dependent.
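To make the contrast concrete, here is a minimal sketch of a hard-coded binary transition matrix versus a learnable "soft" alternative; the hierarchy, shapes, and mean-aggregation rule are hypothetical illustrations, not the paper's actual design:

```python
import numpy as np

# Hypothetical hierarchy: 2 level-1 parents, 4 level-2 children.
# A[i, j] = 1 iff child label j belongs to parent label i (hard-coded).
A = np.array([[1., 1., 0., 0.],
              [0., 0., 1., 1.]])

rng = np.random.default_rng(0)
child_protos = rng.standard_normal((4, 8))  # 4 child prototypes, dimension 8

# Hard-coded aggregation: each parent prototype is the mean of its children.
parent_hard = (A @ child_protos) / A.sum(axis=1, keepdims=True)

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Soft alternative: treat A as logits of a trainable mixing matrix, so the
# parent-child weights can adapt to label noise or ambiguous relationships.
A_soft = A.copy()  # would be a learnable parameter, initialized from A
parent_soft = softmax(A_soft, axis=1) @ child_protos

print(parent_hard.shape, parent_soft.shape)  # both (2, 8)
```

Initializing the soft matrix from the known hierarchy preserves structural priors while allowing deviation where the taxonomy is imperfect.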

 

  2. Data

While the paper uses a mix of legal and general datasets—CAIL2021, WOS, and AAPD—there are important issues regarding dataset suitability and transparency. CAIL2021 is a strong choice for legal text classification and provides a three-level hierarchical label structure. However, WOS and AAPD are general-purpose datasets that lack the same semantic and structural complexity found in legal texts. The authors do not clarify whether these general datasets are suitable for testing the model’s hierarchical reasoning capabilities, especially since they contain only two label levels and no deep semantic label relationships.

The paper frequently mentions the challenge of imbalanced label distributions, particularly in legal data, but does not provide any quantitative analysis or visualization of label frequency distributions. This is a missed opportunity, as demonstrating performance on rare labels is critical for validating the model’s robustness in real-world legal settings. Additionally, there is no discussion of head vs. tail label performance, nor is there a per-level breakdown of results (e.g., F1-score for level-1 vs. level-3 labels), which would help in understanding how the model performs across the label hierarchy.
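One way to make such an analysis concrete is to compute per-label support and split the labels into head and tail groups; the data below is randomly generated for illustration only:

```python
import numpy as np

# Hypothetical multi-label matrix: rows = documents, columns = labels, with
# per-label frequencies decaying from common (0.5) to rare (0.01).
rng = np.random.default_rng(1)
Y = (rng.random((1000, 20)) < np.linspace(0.5, 0.01, 20)).astype(int)

freq = Y.sum(axis=0)                 # per-label document frequency
order = np.argsort(freq)[::-1]
head = order[: len(order) // 2]      # frequent ("head") labels
tail = order[len(order) // 2:]       # rare ("tail") labels

# Reporting support per group (and F1 per group/level) quantifies the long tail.
print("head support:", int(freq[head].sum()), "tail support:", int(freq[tail].sum()))
```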

The authors mention that data preprocessing plays a role in their multi-level optimization strategy, but they do not explain what specific steps are involved. For example, it is unclear how long legal documents are segmented, whether any truncation occurs, or how the tokenization aligns with the Sentence-BERT encoder. Given the complexity of legal texts, preprocessing decisions can significantly affect input representation and downstream model performance, and thus deserve more detailed explanation.

Another concern is the lack of information about training stability and reproducibility. The dataset splits (e.g., 8:1:1 for CAIL2021) are applied once, and results are reported from a single run. There is no reporting of variance, confidence intervals, or statistical significance testing. Without such analysis, it is difficult to assess whether the performance improvements are robust or due to random initialization and data splits.
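Reporting variance is straightforward once multiple runs exist; a minimal sketch with invented scores (not the paper's results) using a normal-approximation confidence interval:

```python
import statistics

# Hypothetical Micro-F1 scores from repeated runs with different seeds/splits.
runs = [0.842, 0.839, 0.851, 0.845, 0.837]

mean = statistics.mean(runs)
std = statistics.stdev(runs)                 # sample standard deviation

# Simple 95% confidence half-width over n runs (normal approximation).
half_width = 1.96 * std / len(runs) ** 0.5

print(f"{mean:.3f} ± {half_width:.3f}")
```

Even five seeded runs per configuration would show whether the reported gains exceed run-to-run noise.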

 

  3. Experimental Results

The authors claim that their model is “significantly superior to mainstream methods in both legal and general scenarios”, but this statement is not fully supported by the reported results. In Table 2, the model clearly outperforms all baselines on the CAIL2021 legal dataset in terms of Micro-F1 and Recall, validating its strength in domain-specific tasks. However, the general datasets present a different picture. In Table 3, while the proposed method achieves the highest Micro-F1 on the WOS dataset, it does not achieve the highest Macro-F1. Competing models such as HiSR, HALB, and MLCL-KNN report higher Macro-F1 scores, indicating stronger performance on rare or less frequent labels.

On the AAPD dataset, the proposed model does not achieve the best performance on either Micro-F1 or Macro-F1. The MLCL model performs better on both metrics, suggesting that the proposed method does not outperform all baselines in the general case. These inconsistencies undermine the claim of consistent superiority and suggest that while the model is effective, its advantages are not universal across all evaluation scenarios.

The evaluation metrics themselves are inconsistently reported. For CAIL2021, the authors include Accuracy, Recall, and Micro-F1, but do not report Macro-F1, which is particularly important in legal tasks characterized by long-tail distributions. For WOS and AAPD, only Micro-F1 and Macro-F1 are reported, without Recall or Accuracy. This inconsistency makes it difficult to compare performance fairly across datasets and may mask areas where the model underperforms.
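The gap between the two averages is exactly why Macro-F1 matters under long-tail distributions; in this toy sketch (invented labels, not the paper's data), a single missed rare label barely moves Micro-F1 but cuts Macro-F1 by a third:

```python
import numpy as np
from sklearn.metrics import f1_score

# Imbalanced toy setup: label 0 appears 8 times, label 1 three times, label 2 once.
y_true = np.array([[1, 0, 0]] * 8 + [[0, 1, 0]] * 3 + [[0, 0, 1]] * 1)
y_pred = np.array([[1, 0, 0]] * 8 + [[0, 1, 0]] * 3 + [[0, 0, 0]] * 1)  # misses label 2

micro = f1_score(y_true, y_pred, average="micro")  # pooled over all decisions
macro = f1_score(y_true, y_pred, average="macro")  # unweighted mean over labels

print(micro, macro)  # ~0.957 vs ~0.667: the rare-label miss dominates Macro-F1
```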

Finally, while the authors claim their optimization strategy improves convergence, they provide no evidence of training efficiency. Although they mention using two RTX-TITAN 24GB GPUs, they do not report training time, GPU hours, or memory usage. The model’s complexity suggests higher computational cost compared to baselines like BERT or HiMatch, yet no comparisons are made. Without metrics on runtime or resource consumption, the claim of improved efficiency remains unsubstantiated.
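Such measurements are cheap to collect; a minimal sketch of per-epoch wall-clock and host-memory reporting, with a placeholder function standing in for the real training loop:

```python
import time
import tracemalloc

def train_one_epoch():
    # Hypothetical stand-in for one training epoch; replace with the real loop.
    return sum(i * i for i in range(100_000))

tracemalloc.start()
t0 = time.perf_counter()
train_one_epoch()
elapsed = time.perf_counter() - t0
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# Reporting these per epoch (plus total GPU hours against the baselines)
# would substantiate the efficiency claim.
print(f"epoch time: {elapsed:.3f}s, peak host memory: {peak / 1e6:.1f} MB")
```

GPU-side numbers would come from the framework's own utilities (e.g. a CUDA memory profiler), which this host-only sketch does not cover.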

Regarding the language, the authors need to carefully proofread the manuscript for typographical and formatting issues. A thorough language check would improve clarity and presentation. For example, the word “learning” is repeatedly misspelled as “Iearning” (with a capital “I”) in both the title and body. Additionally, inconsistent punctuation appears in some quoted phrases, such as mismatched quotation marks in the introduction. These issues are minor but noticeable, and should be corrected prior to publication.

Comments on the Quality of English Language

The manuscript needs moderate revision to enhance the language.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

This paper, "Multi-Head Hierarchical Attention Framework with Multi-Level Learning Optimization Strategy for Legal Text Recognition", presents a multi-head hierarchical attention framework with a multi-level learning optimization strategy for multi-label recognition in legal texts. This is interesting research that could be applied in different research areas. In general, this is a very well-written paper, but there are still some major issues that need to be addressed before it can be considered for publication.

The paper does not provide strong theoretical innovation. The components are well known in the community, and their integration is incremental rather than groundbreaking. Could you add more explanation of the innovation and its practical usability?

As presented in the results, the model outperforms state-of-the-art benchmarks, but the paper lacks a detailed interpretation of why the proposed method performs better. Please add more explanation.

The dataset used is relatively small. Can the dataset be upscaled or otherwise expanded? A small dataset raises concerns about scalability and generalisation.

Could this model be easily applied to other datasets?

There are also a few language mistakes in the paper; I therefore recommend proofreading by a native speaker.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The article proposes a multi-head hierarchical attention framework with a multi-level learning optimization strategy for the legal text recognition task. The proposed framework comprises a feature extraction module and a hierarchical module, which extract multi-level semantic representations of the text and obtain multi-label category information. The proposed strategy balances the learning needs of the multi-level semantic representations and the multi-label category information, effectively accelerating the convergence of framework training. The paper carries out a large number of experiments, and the proposed multi-label recognition method performs well both on legal texts and in other fields.

Based on the manuscript, I offer the following comments and suggestions in the hope of benefiting the authors.

(1) Consider re-editing the text on Lines 36-52 and Figure 1 in Chapter 1. The text does not seem to match Figure 1 well when explaining the legal text classification task. Also, make sure that no Chinese punctuation is used, even in the figures.

(2) Check the correctness and formatting of the formulas. For example, in "3.1 Problem Description", "the total number of labels at that level" should probably be "|Ci|" rather than "Ci". Likewise, on Lines 258-267 in "3.7 Hierarchical Module", matrices are usually set in bold, while matrix elements need not be.

(3) Pay attention to details. In Figure 2, "distance" appears to be misspelled as "distacne". The middle section of the figure is also somewhat cluttered; consider tidying it up.

Comments on the Quality of English Language

Keep refining the writing. When double quotation marks are used in this paper, the punctuation of the surrounding sentence is often incorrect. On line 32, the conjunction "and" is not appropriate. "sentence-BERT" and "Sentence-BERT" both appear; please unify them.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

I appreciate the authors’ responses to my previous comments and the revisions incorporated to address them. The additional information provided enhances the paper’s integrity and transparency, resolving the majority of my concerns and suggestions. At this stage, I do not have further questions or comments.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
