Chinese Text Readability Assessment Based on the Integration of Visualized Part-of-Speech Information with Linguistic Features
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This contribution discusses a study on assessing Chinese text readability using machine learning techniques, specifically Support Vector Machine (SVM) and Bidirectional Long Short-Term Memory (Bi-LSTM) networks, which integrates four categories of linguistic features (type-token ratio, average sentence length, total word count, and vocabulary difficulty level) for the readability assessment.
The findings indicate that the proposed POS information combined with the linguistic features works well with the SVM, and that its performance matches that of more complex architectures such as the Bi-LSTM network in Chinese readability assessment. Bi-LSTM achieved 83.87% accuracy, while SVM reached up to 86.45% when combining POS spectrum and linguistic features. SVM demonstrated faster execution time, making it suitable for scenarios requiring quick responses.
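As an illustration of three of the linguistic features named above, the following is a minimal sketch, not the authors' implementation. It assumes the text has already been segmented into word tokens (the sample sentences and function names are hypothetical), and it leaves out the vocabulary difficulty level, which would need a graded word list.

```python
# Hypothetical pre-segmented text: each sentence is a list of word tokens.
# Real Chinese text would first need a word segmenter, which is not shown here.
sentences = [
    ["我", "喜歡", "閱讀"],
    ["我", "喜歡", "音樂", "和", "閱讀"],
]


def type_token_ratio(sents):
    """Number of distinct tokens divided by the total number of tokens."""
    tokens = [tok for sent in sents for tok in sent]
    return len(set(tokens)) / len(tokens)


def average_sentence_length(sents):
    """Mean number of tokens per sentence."""
    return sum(len(sent) for sent in sents) / len(sents)


def total_words(sents):
    """Total number of tokens in the text."""
    return sum(len(sent) for sent in sents)


features = {
    "type_token_ratio": type_token_ratio(sentences),
    "average_sentence_length": average_sentence_length(sentences),
    "total_words": total_words(sentences),
}
print(features)
```

A feature vector of this kind could then be concatenated with other inputs before classification; how the study actually combines its features is not specified in this record.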
The article presents an interesting way of applying modern approaches to existing problems. On a general level, I have only two comments/questions:
- In the Abstract, the authors state: "The article proposed the conceptual analogy between Chinese readability assessment and music's rhythm and tempo patterns, the syntactic structures of the Chinese sentences could be transformed to an image". This interesting thesis, however, is not explained in more detail later in the paper. I suggest you update this.
- In the article, the authors tackle the "narrow problem of Chinese text". Could the presented approach be applied to other similar "fonts", hieroglyphs, or even sign language for the deaf? For readers who are not directly involved with your specific problem, an additional explanation of the possible uses of the proposed approach would be very interesting.
I am also attaching a few smaller, specific comments:
- Line 45 - Sherman [3] L. A. - Add a reference next to the author and use only the last name.
- Line 47 - "many scholars" - Never use general terms (many, a lot, ...) without stating WHO; references are needed, at least 2-3.
- Figure 3: what do the colors mean? Each image must be self-contained (all the information necessary to understand the image must be on it)! This also applies to Figures 6, 7, and 8!
- "Taiwan Benchmarks for the Chinese Language" is used to teach Chinese as a second/foreign language - why did you use this one? Perhaps add a brief description.
- CKIP Toolkit, PyTorch framework, etc. - Before using abbreviations, explain them briefly. Probably not all readers are from your field.
- Figure 4 - It would probably be good for general understanding to show Figure 4 with an example.
- Line 317 - "Clearly, the cost of the training/testing model will be effectively depressed using the SVM architecture". This is a scientific article that needs proof. Confirm this thesis!
- Chapter 3.2 probably needs additional explanation. As I mentioned before, every image needs to be clear enough to be understood without further explanation. Figures 6, 7, and 8 are not: what are the numbers in the small squares, what do the 4 boxes (predict label) represent in Figures 7 and 8, etc.?
The quality of the English language looks fine.
Author Response
Comments 1: This contribution discusses a study on assessing Chinese text readability using machine learning techniques, specifically Support Vector Machine (SVM) and Bidirectional Long Short-Term Memory (Bi-LSTM) networks, which integrates four categories of linguistic features (type-token ratio, average sentence length, total word count, and vocabulary difficulty level) for the readability assessment.
The findings indicate that the proposed POS information combined with the linguistic features works well with the SVM, and that its performance matches that of more complex architectures such as the Bi-LSTM network in Chinese readability assessment. Bi-LSTM achieved 83.87% accuracy, while SVM reached up to 86.45% when combining POS spectrum and linguistic features. SVM demonstrated faster execution time, making it suitable for scenarios requiring quick responses.
The article presents an interesting way of applying modern approaches to existing problems. On a general level, I have only two comments/questions:
Response 1: Thank you for the valuable comment. The proposed paper transforms the problem of Chinese text classification into a high-dimensional space in the form of image processing, and then applies neural networks to perform the classification. This technique draws inspiration from concepts related to musical composition, such as rhythm, style, and degrees of difficulty.
Comments 2: In the Abstract, the authors state: "The article proposed the conceptual analogy between Chinese readability assessment and music's rhythm and tempo patterns, the syntactic structures of the Chinese sentences could be transformed to an image". This interesting thesis, however, is not explained in more detail later in the paper. I suggest you update this.
Response 2: Thank you for the valuable suggestion. The conceptual analogy is described in Section 2.3.1.
Comments 3: In the article, the authors tackle the "narrow problem of Chinese text". Could the presented approach be applied to other similar "fonts", hieroglyphs, or even sign language for the deaf? For readers who are not directly involved with your specific problem, an additional explanation of the possible uses of the proposed approach would be very interesting.
Response 3: Thank you for the comments. We think that the proposed concept might be useful for languages with a similar structure, e.g., Korean and Japanese. However, more evidence is needed to confirm this. Because the structure and part-of-speech (POS) patterns of sentences resemble the rhythmic patterns found in musical composition, we propose that digital image processing techniques can be used to capture this rhythmicity and employ it for text-level classification.
Comments 4: Line 45 - Sherman [3] L. A. - Add a reference next to the author and use only the last name.
Response 4: Thank you; we have corrected this.
Comments 5: Line 47 - "many scholars" - Never use general terms (many, a lot, ...) without stating WHO; references are needed, at least 2-3.
Response 5: Thank you for your comment. We have added to the reference list five recently published papers suggested by MDPI journals.
Hu, T.; Chen, Z.; Ge, J.; Yang, Z.; Xu, J. A Chinese Few-Shot Text Classification Method Utilizing Improved Prompt Learning and Unlabeled Data. Appl. Sci. 2023, 13, 3334. doi: 10.3390/app13053334
Liu, H.; Ye, Z.; Zhao, H.; Yang, Y. Chinese Text De-Colloquialization Technique Based on Back-Translation Strategy and End-to-End Learning. Appl. Sci. 2023, 13, 10818. doi: 10.3390/app131910818
Guo, S.; Huang, Y.; Huang, B.; Yang, L.; Zhou, C. CWSXLNet: A Sentiment Analysis Model Based on Chinese Word Segmentation Information Enhancement. Appl. Sci. 2023, 13, 4056. doi: 10.3390/app13064056
Kostadimas, D.; Kermanidis, K.L.; Andronikos, T. Exploring the Effectiveness of Shallow and L2 Learner-Suitable Textual Features for Supervised and Unsupervised Sentence-Based Readability Assessment. Appl. Sci. 2024, 14, 7997. doi: 10.3390/app14177997
Liu, Y.; Li, S.; Deng, Y.; Hao, S.; Wang, L. SSuieBERT: Domain Adaptation Model for Chinese Space Science Text Mining and Information Extraction. Electronics 2024, 13, 2949. doi: 10.3390/electronics13152949
Comments 6: Figure 3: what do the colors mean? Each image must be self-contained (all the information necessary to understand the image must be on it)! This also applies to Figures 6, 7, and 8!
Response 6: Thank you for your comments. The different colors represent different POS tags. The X-axis represents the sequence of characters in the Chinese text, and the Y-axis indicates the 26 POS categories listed in Table 2, which comprise 14 major POS tags, 11 punctuation types, and 1 whitespace category.
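The axis layout described in Response 6 can be pictured with a minimal sketch. It is an illustration under stated assumptions, not the authors' code: the five-tag inventory `POS_CATEGORIES`, the per-character tags, and the one-hot encoding (a single marked cell per column) are all hypothetical stand-ins for the 26 categories of Table 2 and the colored image in the paper.

```python
# Hypothetical, shortened tag inventory; the paper's Table 2 lists 26 categories.
POS_CATEGORIES = ["N", "V", "ADV", "PUNCT", "SPACE"]


def pos_spectrum(char_tags, categories=POS_CATEGORIES):
    """Return a len(categories) x len(char_tags) grid of 0/1 values.

    Column j describes the j-th character (X-axis); the single 1 in that
    column sits on the row of the character's POS category (Y-axis).
    """
    row_of = {tag: i for i, tag in enumerate(categories)}
    grid = [[0] * len(char_tags) for _ in categories]
    for col, tag in enumerate(char_tags):
        grid[row_of[tag]][col] = 1
    return grid


# Hypothetical per-character tags for a six-character snippet.
char_tags = ["N", "V", "V", "N", "N", "PUNCT"]
grid = pos_spectrum(char_tags)

# Text rendering of the grid: one row per category, one column per character.
for tag, row in zip(POS_CATEGORIES, grid):
    print(f"{tag:>5} {''.join('#' if v else '.' for v in row)}")
```

In a rendered figure, each row could be given its own color, which is one way the colors mentioned in the response might map onto POS tags.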
Comments 7: "Taiwan Benchmarks for the Chinese Language" is used to teach Chinese as a second/foreign language - why did you use this one? Perhaps add a brief description.
Response 7: Thank you for your comments. The Taiwan Benchmarks for the Chinese Language (TBCL) is used because it is a widely adopted set of teaching guidelines and proficiency reference standards in the field of Chinese language education in Taiwan.
Comments 8: CKIP Toolkit, PyTorch framework, etc. - Before using abbreviations, explain them briefly. Probably not all readers are from your field.
Response 8: Thank you for your suggestion. We have addressed this in the article.
CKIP: Chinese Knowledge and Information Processing
PyTorch framework: https://pytorch.org/
Comments 9: Figure 4 - It would probably be good for general understanding to show Figure 4 with an example.
Response 9: The first four blocks in Figure 4 are illustrated in Figure 3. The last processing step computes the corresponding discrete cosine transform (DCT); we omitted an illustration of this step because of the huge number of coefficients. We thank the reviewer for the suggestion.
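To give a rough idea of what the DCT step produces, here is a minimal sketch of an unnormalized one-dimensional DCT-II applied to a single row of 0/1 values. Which DCT variant the paper uses, and whether it is applied in one or two dimensions, is not stated in this record, so the variant, the function name `dct2`, and the sample row are assumptions made for illustration.

```python
import math


def dct2(signal):
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi / N * (n + 0.5) * k)."""
    n_samples = len(signal)
    return [
        sum(
            x * math.cos(math.pi / n_samples * (n + 0.5) * k)
            for n, x in enumerate(signal)
        )
        for k in range(n_samples)
    ]


# Hypothetical row of a POS spectrum: 1 where the category occurs, else 0.
row = [1, 0, 0, 1, 1, 0, 1, 1]
coeffs = dct2(row)

# coeffs[0] is the plain sum of the row; higher indices capture faster
# alternation between occupied and empty positions.
print([round(c, 3) for c in coeffs])
```

The transform returns as many coefficients as there are input positions, which is consistent with the authors' remark that a full illustration would involve a very large number of values.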
Comments 10: Line 317 - "Clearly, the cost of the training/testing model will be effectively depressed using the SVM architecture". This is a scientific article that needs proof. Confirm this thesis!
Response 10: Thank you for your comments. Because the structure of the LSTM is more complex than that of the SVM, we indicated that the SVM architecture can effectively reduce the cost of training/testing the model. For clarity, we have added a sentence to the paper.
Comments 11: Chapter 3.2 probably needs additional explanation. As I mentioned before, every image needs to be clear enough to be understood without further explanation. Figures 6, 7, and 8 are not: what are the numbers in the small squares, what do the 4 boxes (predict label) represent in Figures 7 and 8, etc.?
Response 11: Thank you for your comments. As the reviewer suggested, we have added more explanation to the article. Basically, the confusion matrix is one of the most popular evaluation tools. The numbers in the confusion matrix represent the counts of correct and incorrect predictions made by a classification model, categorized by the actual and predicted text readability levels.
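How the cell counts described in Response 11 arise can be shown with a minimal sketch. The level names and the six labeled examples below are invented for illustration and are not taken from the paper's dataset or figures; rows index the actual level and columns the predicted level.

```python
# Hypothetical readability levels and labels, for illustration only.
LEVELS = ["Basic", "Intermediate", "Advanced"]


def confusion_matrix(actual, predicted, labels=LEVELS):
    """matrix[i][j] counts texts whose actual level is i and predicted level is j."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Share of predictions on the diagonal, i.e. correct predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


actual = ["Basic", "Basic", "Intermediate", "Intermediate", "Advanced", "Advanced"]
predicted = ["Basic", "Intermediate", "Intermediate", "Intermediate", "Advanced", "Basic"]

matrix = confusion_matrix(actual, predicted)
for label, row in zip(LEVELS, matrix):
    print(f"{label:>12} {row}")
print("accuracy:", round(accuracy(matrix), 4))
```

Diagonal cells hold the correct predictions, and every off-diagonal cell records one specific kind of misclassification, which is the information a reader needs to interpret the numbers in the small squares.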
Reviewer 2 Report
Comments and Suggestions for Authors
What is the main research question addressed by the study?
The main research question addressed by the study is how to perform Chinese text readability assessment. The article proposes integrating visualized Part-of-Speech information with traditional linguistic features to tackle this challenge. Two approaches are proposed and compared: the use of Support Vector Machines, integrating common linguistic features and visual POS spectrograms, and the use of Bi-LSTM to process word embeddings.
Is the topic original or relevant to the field? Does it address a specific gap in the field?
The topic is highly relevant, as Chinese text readability assessment plays a significant role in Chinese language education. The study addresses a critical gap derived from the intrinsic properties of the Chinese language, which differs substantially from alphabetic languages.
What does it contribute to the subject area compared to other published material?
Innovation in Integrated Features: The study incorporates POS spectrum information along with traditional linguistic features. The visualization of POS label distribution provides temporal information about the text, compensating for the lack of sequential modeling ability in SVM.
Improved Performance: The combination of linguistic features and POS spectrum in the SVM model achieved a classification accuracy of 86.45%. This represents a significant improvement over the 72.92% accuracy reported in Yao-Ting Sung’s study.
What specific methodological improvements should the authors consider?
Model selection and comparison:
The rationale for selecting SVM as the primary classifier is not sufficiently explained. To enhance methodological rigor, the authors should justify why SVM was chosen over other well-established machine-learning models such as Multilayer Perceptrons, Extreme Learning Machines, or Random Forests. Ideally, the study should include additional benchmarking experiments with these alternative classifiers using the same feature sets, allowing for a more robust model selection process. If expanding the experimental scope is not feasible due to data or computational limitations, the manuscript would benefit from a clearer justification supported by prior research or by the specific characteristics of the dataset.
Baseline selection, modern alternatives, and classical readability methods:
The choice of Bi-LSTM as the deep learning baseline also requires further clarification. Considering the advances in transformer-based architectures (e.g., BERT derivatives, RoBERTa, or Chinese-specific pretrained models), the authors should evaluate at least one transformer-based approach from the literature on their dataset to provide a more contemporary and competitive comparison. Even a lightweight or distilled transformer would help situate the proposed method within current state-of-the-art practices.
In addition, it is not clear why the authors did not include at least one classical readability formula as a baseline, particularly given their relevance in the history of readability assessment. Applying a traditional metric—even if imperfect for Chinese—would allow readers to contextualize the performance gains offered by machine learning and multimodal approaches.
If neither transformer-based baselines nor classical formulas can be incorporated, the authors should offer a convincing justification addressing constraints such as dataset suitability, computational resources, or the interpretability goals of the study, explaining why Bi-LSTM and the proposed SVM-based approach were deemed the most appropriate choices for the current investigation.
Evaluator agreement:
Although the dataset was curated by experts in the field, the study’s methodological soundness would be strengthened by reporting inter-rater agreement measures—such as Cohen’s Kappa—for the classification of texts into Advanced, Intermediate, and Basic levels. Including these reliability coefficients would provide clearer evidence of the consistency and validity of the expert annotations on which the dataset is based.
Are the conclusions consistent with the evidence and arguments presented?
Yes, the conclusions are consistent with the evidence and arguments presented in the Results and Discussion section.
Are the references appropriate?
Yes, the references appear appropriate for the scope of the study.
Any additional comments regarding the tables and figures?
Tables and figures are ok.
I wish the research team the best of luck as they continue refining and advancing this promising line of work.
Author Response
Comments 1: What is the main research question addressed by the study?
The main research question addressed by the study is how to perform Chinese text readability assessment. The article proposes integrating visualized Part-of-Speech information with traditional linguistic features to tackle this challenge. Two approaches are proposed and compared: the use of Support Vector Machines, integrating common linguistic features and visual POS spectrograms, and the use of Bi-LSTM to process word embeddings.
Response 1: Thank you for the valuable comment. The proposed paper transforms the problem of Chinese text classification into a high-dimensional space in the form of image processing, and then applies neural networks to perform the classification. This technique draws inspiration from concepts related to musical composition, such as rhythm, style, and degrees of difficulty.
Comments 2: Is the topic original or relevant to the field? Does it address a specific gap in the field?
The topic is highly relevant, as Chinese text readability assessment plays a significant role in Chinese language education. The study addresses a critical gap derived from the intrinsic properties of the Chinese language, which differs substantially from alphabetic languages.
Response 2: Thank you for your positive comments.
Comments 3: What does it contribute to the subject area compared to other published material?
Innovation in Integrated Features: The study incorporates POS spectrum information along with traditional linguistic features. The visualization of POS label distribution provides temporal information about the text, compensating for the lack of sequential modeling ability in SVM.
Response 3: Thank you for your positive comments.
Comments 4: Improved Performance: The combination of linguistic features and POS spectrum in the SVM model achieved a classification accuracy of 86.45%. This represents a significant improvement over the 72.92% accuracy reported in Yao-Ting Sung’s study.
Response 4: Thank you for your positive comments.
Comments 5:
What specific methodological improvements should the authors consider?
Model selection and comparison:
The rationale for selecting SVM as the primary classifier is not sufficiently explained. To enhance methodological rigor, the authors should justify why SVM was chosen over other well-established machine-learning models such as Multilayer Perceptrons, Extreme Learning Machines, or Random Forests. Ideally, the study should include additional benchmarking experiments with these alternative classifiers using the same feature sets, allowing for a more robust model selection process. If expanding the experimental scope is not feasible due to data or computational limitations, the manuscript would benefit from a clearer justification supported by prior research or by the specific characteristics of the dataset.
Response 5: Thank you for your positive comments. There are two major reasons for using the SVM for Chinese text grading.
1. The amount of data and the number of classification levels may lead to overfitting. In fact, we tried several multilayer perceptrons for the classification, but the results did not meet expectations, so we do not report them in this article.
2. To allow a quantitative comparison with a previous study, namely Yao-Ting Sung et al. (2012), "Investigating Chinese Text Readability: Linguistic Features, Modeling, and Validation," https://doi.org/10.6129/CJP.20120621, the SVM is adopted in our study. Because the paper by Sung et al. used a comprehensive series of features with an SVM to classify Chinese text readability, we think that the comparison is valuable and significant. The result of the proposed paper is better than that reported by Sung et al., which suggests that the proposed conceptual analogy between Chinese readability assessment and music's rhythm and tempo patterns works well.
Comments 6: Baseline selection, modern alternatives, and classical readability methods:
The choice of Bi-LSTM as the deep learning baseline also requires further clarification. Considering the advances in transformer-based architectures (e.g., BERT derivatives, RoBERTa, or Chinese-specific pretrained models), the authors should evaluate at least one transformer-based approach from the literature on their dataset to provide a more contemporary and competitive comparison. Even a lightweight or distilled transformer would help situate the proposed method within current state-of-the-art practices.
In addition, it is not clear why the authors did not include at least one classical readability formula as a baseline, particularly given their relevance in the history of readability assessment. Applying a traditional metric—even if imperfect for Chinese—would allow readers to contextualize the performance gains offered by machine learning and multimodal approaches.
If neither transformer-based baselines nor classical formulas can be incorporated, the authors should offer a convincing justification addressing constraints such as dataset suitability, computational resources, or the interpretability goals of the study, explaining why Bi-LSTM and the proposed SVM-based approach were deemed the most appropriate choices for the current investigation.
Response 6: Thank you for your comments. The main idea we propose is to map textual features—analogous to rhythmic patterns—into an image space, and then process them using SVM and LSTM models. While we recognize that transformer-based architectures have become widely used in large language models in recent years, incorporating them may obscure the significance and feasibility of the analogy we aim to highlight. For this reason, the paper does not include more recent neural network techniques and instead adopts LSTM, which has been traditionally and successfully applied, as a point of comparison. In addition, factors such as the amount of available data and the number of classification levels may potentially lead to overfitting, which is another consideration in our approach.
Comments 7: Evaluator agreement:
Although the dataset was curated by experts in the field, the study’s methodological soundness would be strengthened by reporting inter-rater agreement measures—such as Cohen’s Kappa—for the classification of texts into Advanced, Intermediate, and Basic levels. Including these reliability coefficients would provide clearer evidence of the consistency and validity of the expert annotations on which the dataset is based.
Response 7: Thank you for your comments. All of the data were manually evaluated by Chinese language experts, so we consider the collected Chinese texts to have reference value.
Comments 8: Are the conclusions consistent with the evidence and arguments presented?
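For readers unfamiliar with the agreement measure named in Comments 7, the following is a minimal sketch of Cohen's Kappa for two raters. The ratings are invented for illustration; no inter-rater data are reported in this record, and the function name is hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa: (p_o - p_e) / (1 - p_e) for two raters on the same items.

    p_o is the observed agreement and p_e the agreement expected by chance
    from each rater's label frequencies. Undefined when p_e equals 1.
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical level assignments by two experts for six texts.
rater_a = ["Basic", "Basic", "Intermediate", "Intermediate", "Advanced", "Advanced"]
rater_b = ["Basic", "Basic", "Intermediate", "Advanced", "Advanced", "Advanced"]

kappa = cohens_kappa(rater_a, rater_b)
print(round(kappa, 4))
```

Here the raters agree on five of six texts (observed agreement 5/6) and chance agreement is 1/3, so the coefficient is 0.75. Reporting a value of this kind would quantify how consistently the experts assigned levels.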
Yes, the conclusions are consistent with the evidence and arguments presented in the Results and Discussion section.
Response 8: Thank you for your positive comments.
Comments 9: Are the references appropriate?
Yes, the references appear appropriate for the scope of the study.
Response 9: Thank you for your positive comments.
Comments 10:
Any additional comments regarding the tables and figures?
Tables and figures are ok.
Response 10: Thank you for your positive comments.

