Peer-Review Record

The Classification of Log Decay Classes and an Analysis of Their Physical and Chemical Characteristics Based on Artificial Neural Networks and K-Means Clustering

Forests 2023, 14(4), 852; https://doi.org/10.3390/f14040852

by Wen Wen¹, Wenjun Zhang¹, Shirong He², Haitao Hu³, Hailiang Qiao³, Xiao Wang¹, Nan Rao¹ and Jie Yuan¹,⁴,*

Reviewer 1: Anonymous

Reviewer 2: Silvano Piazza

Submission received: 25 February 2023 / Revised: 14 April 2023 / Accepted: 19 April 2023 / Published: 21 April 2023

(This article belongs to the Special Issue Forest Succession and Leaf Litter Decomposition)

Round 1

Reviewer 1 Report

I proceeded to analyze the manuscript entitled:

The classification of log decay classes and the analysis of their physical and chemical characteristics based on artificial neural networks and K-means clustering

Wen Wen, Wenjun Zhang, Shirong He, Haitao Hu, Hailiang Qiao, Xiao Wang, Nan Rao and Jie Yuan

The manuscript deals with an artificial neural network model (ANN) based procedure to determine the decay classes of logs from four species (Pinus tabulaeformis, Larix principis-ruprechtii, Betula albosinensis and Quercus aliena var. acuteserrata). The physicochemical characteristics of the log samples were determined by laboratory analysis. The hardness was then used as a clustering factor to quantify the decay levels of the log via K-means clustering analysis. An ANN model was used to predict the hardness values of the log samples with different levels of decay at different moisture contents. The authors report that the prediction of the hardness of the decayed log by the ANN was very effective and that the highly significant variability in dry matter content, basic density and some basic chemical element contents between the log samples that were classified into different decay grades confirmed the reliability of the clustering results.

The introduction provides a comprehensive and detailed overview of the importance of logs in forest ecosystems. It explains that logs play a crucial role in maintaining the ground, conserving biodiversity, and facilitating the carbon and nutrient cycling process. Overall, the introduction is well-written and clearly establishes the context and background information of the research topic.

I believe that the work and the results are interesting. The figures are suggestive and support the statements in part. The figures are presented in good graphical quality. The article is well written, using good English (I am not a native English speaker though). I believe that the references are suggestive and reveal that the authors are very well aware of what has been published on the topic.

I have several observations:

Lines 151-154: Give citation on The National Renewable Energy Laboratory (NREL) method or explain the method in the manuscript.

Caption of Figure 1: You wrote: “The artificial neural network structure. The implied layer factors included moisture content, DMC, basic density, acid-insoluble lignin, acid-soluble lignin, cellulose content, hemicellulose content, glucose content and xylose content as the inputs, with hardness value as the output. The number of implied layers was set to 17, with 184 sets of data (175 sets as training data (95%) and 9 sets as test data (5%)).”

I believe that these details should be placed in the body of the paper not in the figure caption.

Moreover, you mentioned 13 inputs and the first four were varieties of trees. Explain in the manuscript how you transform the variety of trees as input: 1 for the type of tree and 0 for the other three inputs?

Explain in manuscript why you chose the ANN structure to be the one in Figure 1. Did you try other structures, with different number of neurons? Are 60 neurons necessary in the hidden layer 2? Give details on the selection.

You mentioned (175 sets as training data (95%) and 9 sets as test data (5%)). Training an ANN presumes training, validation and test stages. Explain in manuscript why validation was not necessary.

Lines 190-192: give details in the manuscript on the normalization. Explain how you calculate TH and TMC.

Lines 194-195: explain in manuscript why you selected 60 sets of data when you had the whole set of data.

Lines 202-204: write in the manuscript what were the values you found for the kurtosis and skewness. Were they different for samples of each tree variant?

Line 250-254: “The hardness value of the level 1 cluster center was 41.57 N/mm², the hardness value of the level 2 cluster center was 32.17 N/mm², the hardness value of the level 3 cluster center was 22.50 N/mm², the hardness value of the level 4 cluster center was 14.36 N/mm² and the hardness value of the level 5 cluster center was 7.64 N/mm².”

You mentioned normalized hardness as input for ANN but you are discussing measured values. Explain in manuscript why these values are mentioned for the centers of the five cluster levels and not the normalized values. Explain in manuscript why the hardness does not depend on the variety of tree the log comes from.

Lines 321-324: “Wang et al. used lignin content and cellulose content as indicators to determine the decay grade of P. koraiensis and Juglans mandshurica Maxim. and explain the mechanisms of lignin, cellulose and hemicellulose in log decomposition[39].” Please verify and rephrase.

Author Response

Lines 151-154: Give citation on The National Renewable Energy Laboratory (NREL) method or explain the method in the manuscript.

Response: We have added the cited references for the NREL methods.

Caption of Figure 1: You wrote: “The artificial neural network structure. The implied layer factors included moisture content, DMC, basic density, acid-insoluble lignin, acid-soluble lignin, cellulose content, hemicellulose content, glucose content and xylose content as the inputs, with hardness value as the output. The number of implied layers was set to 17, with 184 sets of data (175 sets as training data (95%) and 9 sets as test data (5%)).”

I believe that these details should be placed in the body of the paper not in the figure caption.

Response: We have placed these details in the body of the paper.

Moreover, you mentioned 13 inputs and the first four were varieties of trees. Explain in the manuscript how you transform the variety of trees as input: 1 for the type of tree and 0 for the other three inputs?

Response: To encode tree species, we employed one-hot encoding. A categorical feature that can take m possible values is converted into m binary features, so that each binary feature corresponds to one specific category value and indicates whether a sample belongs to it.

For example:

Pinus tabulaeformis	1	0	0	0
Larix principis-ruprechtii	0	1	0	0
Betula albosinensis	0	0	1	0
Quercus aliena var. acuteserrata	0	0	0	1
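
The encoding in the table above can be sketched in a few lines of Python. This is only an illustration, not the authors' code; the helper name `one_hot` and the column order are assumptions chosen to match the table.

```python
# Minimal one-hot encoding sketch for the four species named in the response.
# The helper name `one_hot` and the column order are illustrative assumptions.
SPECIES = [
    "Pinus tabulaeformis",
    "Larix principis-ruprechtii",
    "Betula albosinensis",
    "Quercus aliena var. acuteserrata",
]


def one_hot(species: str) -> list[int]:
    """Return a binary vector with a single 1 at the index of `species`."""
    if species not in SPECIES:
        raise ValueError(f"unknown species: {species!r}")
    return [1 if name == species else 0 for name in SPECIES]


if __name__ == "__main__":
    for name in SPECIES:
        print(name, one_hot(name))
```

Each sample then contributes four binary inputs for species in addition to its measured indicators.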

Explain in manuscript why you chose the ANN structure to be the one in Figure 1. Did you try other structures, with different number of neurons? Are 60 neurons necessary in the hidden layer 2? Give details on the selection.

Response: We tried various ANN structures, including those with four, five, or more hidden layers. However, we did not want to use too many layers because this would increase the computational burden while yielding only slight improvements in RMSE, R², and MAPE. Thus, we chose an ANN structure with three hidden layers.

After selecting the three-layer ANN structure, we began determining the number of neurons in each hidden layer. We aimed to minimize the number of neurons to reduce the computational load while ensuring that RMSE and other metrics remained acceptable. Therefore, we experimented with a series of ANN structures, such as (10, 20, 10), (20, 40, 20), and (20, 30, 15), until we obtained satisfactory results with a (30, 60, 30) structure.

In fact, we should have studied the effects of the number of hidden layers and neurons on RMSE and other metrics. However, since this is not our area of expertise, we only used ANN as a tool for predicting data.
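
To give a sense of how the candidate structures mentioned above compare in size, the number of trainable parameters of a fully connected network can be counted directly. The sketch below assumes plain dense layers with one bias per neuron, 13 inputs and a single hardness output; the helper name is hypothetical and the authors' actual implementation may differ.

```python
# Parameter count for fully connected networks, assuming dense layers with
# one bias per neuron. Input size 13 and output size 1 follow the review
# record; the helper name is a hypothetical choice for this sketch.
def count_dense_parameters(layer_sizes: list[int]) -> int:
    """Count the weights and biases of a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total


if __name__ == "__main__":
    hidden_candidates = [(10, 20, 10), (20, 40, 20), (20, 30, 15), (30, 60, 30)]
    for hidden in hidden_candidates:
        sizes = [13, *hidden, 1]
        print(hidden, count_dense_parameters(sizes))
```

Comparing these counts with the 175 training samples is one simple way to judge how much capacity each structure adds.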

You mentioned (175 sets as training data (95%) and 9 sets as test data (5%)). Training an ANN presumes training, validation and test stages. Explain in manuscript why validation was not necessary.

Response: The validation set serves the purpose of testing the model during training and stopping the training process when optimal parameters are achieved. Essentially, the goal of validation is to obtain better parameters that guarantee a good ANN structure. After completion of training, the model is tested on a distinct test set.

Due to challenges associated with data collection, our dataset is relatively small. In order to enhance the size of our training set, we opted not to use a validation set. Instead, we manually experimented with multiple ANN structures until we obtained a suitable one (as depicted in Figure 1).
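
With a dataset of this size, one common way to keep most samples for training while still checking how sensitive the results are to the partition is to repeat the random 175/9 split several times. The sketch below only generates index partitions; it is not the procedure the authors report, and the seed and the number of repeats are arbitrary assumptions.

```python
import random


def repeated_splits(n_samples: int, n_test: int, n_repeats: int, seed: int = 0):
    """Yield (train_indices, test_indices) for repeated random hold-out splits."""
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    indices = list(range(n_samples))
    for _ in range(n_repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield sorted(shuffled[n_test:]), sorted(shuffled[:n_test])


if __name__ == "__main__":
    for train_idx, test_idx in repeated_splits(184, 9, n_repeats=5):
        # A model would be fitted on train_idx and scored on test_idx here.
        print(len(train_idx), len(test_idx), test_idx)
```

Reporting the spread of RMSE or R² across such repeats would show whether a single split is representative.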

Lines 190-192: give details in the manuscript on the normalization. Explain how you calculate TH and TMC.

Response: We apologize for any methodological errors in the manuscript. The diverse dimensions and units of each indicator complicate the data analysis results. To address indicator comparability issues concerning dimensionality, we chose to normalize the data. Additionally, we normalized the data beforehand to hasten model convergence. However, in finalizing our hardness data clustering analysis, it was not essential to consider dimensionality factors. Therefore, we opted to use the raw data to best present our experimental results.

Lines 194-195: explain in manuscript why you selected 60 sets of data when you had the whole set of data.

Response: As a result of the clustering process, there was considerable variation in the number of samples per tree species across different decay levels. To prevent indeterminate effects of a particular species on kurtosis and skewness due to varying numbers, we structured the data to accommodate for the unique amounts of data associated with each tree species.

Lines 202-204: write in the manuscript what were the values you found for the kurtosis and skewness. Were they different for samples of each tree variant?

Response: We have written out values for kurtosis and skewness. The study set the criteria that absolute kurtosis values less than 10 and absolute skewness values less than 3 are acceptable for a normal distribution.

Indicators	Skewness	Kurtosis
dry matter content	0.284	-1.212
basic density	0.521	-0.689
acid-soluble lignin	0.532	0.609
acid-insoluble lignin	0.031	-1.088
cellulose	0.128	-0.051
hemicellulose	0.479	-0.828
glucose	0.129	-0.055
xylose	0.242	-0.288
hardness	0.124	-1.192
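
The acceptance criteria stated above (absolute skewness below 3 and absolute kurtosis below 10) can be checked as sketched here. The code uses plain moment-based estimators of skewness and excess kurtosis; statistical packages often apply small-sample corrections, so their values may differ slightly from these. The function names and the sample values are hypothetical.

```python
# Moment-based skewness and excess kurtosis with the thresholds quoted in the
# response. Function names and the demo data are assumptions for illustration.
def skewness(values):
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def excess_kurtosis(values):
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0  # 0 for a normal distribution


def within_criteria(values, max_abs_skew=3.0, max_abs_kurt=10.0):
    """Apply the |skewness| < 3 and |kurtosis| < 10 screening rule."""
    return (abs(skewness(values)) < max_abs_skew
            and abs(excess_kurtosis(values)) < max_abs_kurt)


if __name__ == "__main__":
    sample = [7.2, 9.8, 14.1, 15.3, 21.9, 23.4, 31.0, 33.2, 40.6, 42.1]
    print(skewness(sample), excess_kurtosis(sample), within_criteria(sample))
```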

Although the number of samples differs between tree species, we conducted an analysis of variance (ANOVA) to assess hardness differences between the tree species. Based on this analysis, we found no significant differences between the selected samples.

Line 250-254: “The hardness value of the level 1 cluster center was 41.57 N/mm², the hardness value of the level 2 cluster center was 32.17 N/mm², the hardness value of the level 3 cluster center was 22.50 N/mm², the hardness value of the level 4 cluster center was 14.36 N/mm² and the hardness value of the level 5 cluster center was 7.64 N/mm².”

Response: As previously noted in Question 6, we apologize for our error. In analyzing inter-species variability of hardness values in our selected species, no significant differences were detected. Therefore, we did not include the tree species factor in our cluster analysis. Results outlining hardness variability among different tree species are provided. While hardness differences between tree species cannot be overlooked, our Discussion section explains that the number of indicators considered in our study was low, and subsequent studies could expand on these to improve result robustness.

Lines 321-324: “Wang et al. used lignin content and cellulose content as indicators to determine the decay grade of P. koraiensis and Juglans mandshurica Maxim. and explain the mechanisms of lignin, cellulose and hemicellulose in log decomposition[39].” Please verify and rephrase.

Response: We have modified this sentence. We changed the original sentence “Wang et al. used lignin content and cellulose content as indicators to determine the decay grade of P. koraiensis and Juglans mandshurica Maxim. and explain the mechanisms of lignin, cellulose and hemicellulose in log decomposition [39].” to “Wang et al. employed lignin and cellulose content as indicators to assess the decay level of Pinus koraiensis and Juglans mandshurica Maxim., and elucidated the mechanisms underlying changes in lignin, cellulose, and hemicellulose content during log decomposition”.

Author Response File: Author Response.docx

Reviewer 2 Report

The manuscript of Wen and colleagues presents an interesting report examining current analyses of wood mass loss and proposes new methods based on machine learning techniques.
The manuscript is well written, and I would recommend publication once one major issue and some minor issues are addressed.

Minor issues:

Abstract

The acronym of artificial neural network ANN is not presented in line 20 before use in line 27.

Materials and methods

1. Please authors may include an explanation of how samples were chosen.
2. Please authors may provide table of information about samples. For example, the age of logs may be important (as authors suggest like in the cited Oberle et al line 329)
3. Please add a reference line in 151 & 152
4. Please authors may specify which apparatus they used in line 162

Results

1. I would suggest changing the colors to figure 2. I.e: train data -> blue, test data -> red, all data -> blue & red and/or change shapes
2. Add comment on why the test data gets divided in two subsets
3. Figure 3 needs to be revised: a) Remove center of clustering. It is not needed; b) calculate and add p-value of cluster robustness, c) add number of samples.
4. Please add the label for color scale in figure 4 (correlation?)
5. Please add jitter and statistical support in figure 5

Additional issues

1. Please, I strongly encourage the authors to provide as supplementary file the data used in the model, and not under request.
2. Please, I strongly encourage authors to provide as supplementary file the manuscript code.

These two points are extremely important in order to build up an open perspective in science publications and it will allow other groups in the future use this data in a metadata analysis or improve your ANN model.

3. Please provide a better explanation of role of wetting the log at 2, 4, 6, 8, 10 hours

Major issue

Even if it is very interesting, I found the ANN description/usage both in methods and in results too superficial and circumstantial. For example, I found it very strange not to see a table of variables’ importance. This could be a very interesting finding “per se” and also a useful tool to build a better model. Since the data available in this project is not very big (with fewer than 200 samples), the threshold value of which data is to be used for training and which for validation is crucial. So how about trying different values? Moreover, it is not clear to me how many times the model was tested with random selection of samples between training/validation. In other words, I would like to see something like a Confusion matrix, or Precision-Recall PR curve, or ROC curves under different parameters used in the ANN. Finally in line 234 and 234 is written “The training set R² values were higher than the test set R² values, indicating that the model was not overfitted” when in the table the R² value of the training set is lower (0.9925) than the test set (0.9964).

Author Response

The acronym of artificial neural network ANN is not presented in line 20 before use in line 27.

Response: We have added the abbreviation ANN for artificial neural network in line 20.

Please authors may include an explanation of how samples were chosen.

Response: We have provided additional details on sample collection. For larger, mildly decayed logs, we utilized a hacksaw to cut 5 cm thick discs. For heavily decayed logs, we extracted partial samples using a knife and packed them into an aluminum box of known volume.

Please authors may provide table of information about samples. For example, the age of logs may be important (as authors suggest like in the cited Oberle et al line 329)

Response: Our experiments differ from those conducted by Oberle et al. We obtained random log samples from the understory exhibiting various degrees of decay, comprising incomplete sapwood and core wood. This made it challenging to determine the logs' ages and decay time. In contrast, Oberle et al. gauged changes in physicochemical properties of 5-10 cm live stems during decay under field conditions over a period of one to three years.

Please add a reference line in 151 & 152

Response: We have added the reference "Determination of structural carbohydrates and lignin in biomass".

Please authors may specify which apparatus they used in line 162

Response: We have specified the instrument we used as "The Universal Mechanical Experiment Instruments."

Results

I would suggest changing the colors to figure 2. I.e: train data -> blue, test data -> red, all data -> blue & red and/or change shapes

Response: We have changed the colors of figure 2.

Add comment on why the test data gets divided in two subsets

Response: We did not divide the data into two test sets. We divided the whole dataset into a training set, a test set, and a portion used as prediction data.

Figure 3 needs to be revised: a) Remove center of clustering. It is not needed; b) calculate and add p-value of cluster robustness, c) add number of samples.

Response: We have removed the clustering centers in Figure 3 and added the number of clustered samples (184).

Please add the label for color scale in figure 4 (correlation?)

Response: We have added the label for the color scale in Figure 4: “Red labels indicate positive correlations and blue labels indicate negative correlations.”

Please add jitter and statistical support in figure 5

Response: We have added jitter and statistical support in figure 5.

Additional issues

Please, I strongly encourage the authors to provide as supplementary file the data used in the model, and not under request.

Please, I strongly encourage authors to provide as supplementary file the manuscript code.

These two points are extremely important in order to build up an open perspective in science publications and it will allow other groups in the future use this data in a metadata analysis or improve your ANN model.

Response: Our data is valuable, but we are unable to share it with you due to privacy and confidentiality concerns. However, if other groups wish to replicate our experiment, we can provide the min.pt file and original code file, which are expertly commented and use open-source libraries. These libraries require minimal knowledge about machine learning for operation. Furthermore, those interested in our data can collaborate with us.

Please provide a better explanation of role of wetting the log at 2, 4, 6, 8, 10 hours

Response: Hardness is significantly affected by wood moisture content, and the logs' moisture levels fluctuate strongly with field conditions. To account for this, we sprayed the logs with distilled water under a nozzle for 2-10 hours.

Major issue

Even if it is very interesting, I found the ANN description/usage both in methods and in results too superficial and circumstantial. For example, I found it very strange not to see a table of variables’ importance. This could be a very interesting finding “per se” and also a useful tool to build a better model. Since the data available in this project is not very big (with fewer than 200 samples), the threshold value of which data is to be used for training and which for validation is crucial. So how about trying different values? Moreover, it is not clear to me how many times the model was tested with random selection of samples between training/validation. In other words, I would like to see something like a Confusion matrix, or Precision-Recall PR curve, or ROC curves under different parameters used in the ANN. Finally in line 234 and 234 is written “The training set R² values were higher than the test set R² values, indicating that the model was not overfitted” when in the table the R² value of the training set is lower (0.9925) than the test set (0.9964).

Response: The Confusion matrix, Precision-recall curve, and ROC curve are commonly used metrics for evaluating classification models. However, as the Artificial Neural Network (ANN) utilized in this article is intended for prediction tasks, these curves cannot be presented. Evaluation metrics for prediction models often include MSE, RMSE, R², and other indicators that were considered in this study. We apologize for the incorrect representation in our manuscript. We have modified this sentence. Changed the original sentence “The training set R² values were higher than the test set R² values, indicating that the model was not overfitted” to “The testing set R² values were higher than the training set R² values, indicating that the model was not overfitted. In both the training and test sets”.
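
The regression metrics named in this response can be computed as in the following sketch. It uses the standard definitions of MSE, RMSE, R² and MAPE; the function name and the sample values are hypothetical and are not taken from the manuscript.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, R² and MAPE (%) for paired observations.

    MAPE divides by the observed values, so y_true must not contain zeros.
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "mse": ss_res / n,
        "rmse": math.sqrt(ss_res / n),
        "r2": 1.0 - ss_res / ss_tot,
        "mape": 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
    }


if __name__ == "__main__":
    # Hypothetical hardness values (N/mm²), for illustration only.
    observed = [41.0, 32.5, 22.0, 14.8, 7.9]
    predicted = [40.2, 33.1, 21.4, 15.5, 8.3]
    print(regression_metrics(observed, predicted))
```

Computing the same metrics separately on the training and test sets is what allows the two R² values to be compared.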

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Comment to the authors

I proceeded to re-analyze the manuscript entitled:

The classification of log decay classes and the analysis of their physical and chemical characteristics based on artificial neural networks and K-means clustering

Wen Wen, Wenjun Zhang, Shirong He, Haitao Hu, Hailiang Qiao, Xiao Wang, Nan Rao and Jie Yuan

You put a considerable effort in improving the manuscript. I still have some comments:

As a general observation, it makes sense to publish a work if you give sufficient details for other people to verify, use the results you publish and, eventually, continue the research. For this reason I requested that the additional clarification I asked should be mentioned IN THE MANUSCRIPT, while you mentioned some of them in the cover letter only.

-Line 155: “acid-soluble lignin, acid-insoluble lignin” - please verify here and, as a rule, recheck the manuscript for small typing errors.

-Lines 181-183: “The number of implied layers was set to 17, with 184 sets of data (175 sets as training data (95%) and 9 sets as test data (5%))”

Is it 17 layers or is it inputs? You mention that you had 13 inputs. Figure 1 indicates 3 hidden layers, the input layer and the output layer, not 17. It is mandatory to clarify this IN YOUR MANUSCRIPT, not only in the answer letter.

Issue from my previous Comment for authors remains in part:

“Line 250-254 (of the previous version of the manuscript): “The hardness value of the level 1 cluster center was 41.57 N/mm², the hardness value of the level 2 cluster center was 32.17 N/mm², the hardness value of the level 3 cluster center was 22.50 N/mm², the hardness value of the level 4 cluster center was 14.36 N/mm² and the hardness value of the level 5 cluster center was 7.64 N/mm².”

You mentioned normalized hardness as input for ANN but you are discussing measured values. Explain in manuscript why these values are mentioned for the centers of the five cluster levels and not the normalized values. Explain in manuscript why the hardness does not depend on the variety of tree the log comes from.”

This is contradictory: you mentioned in your manuscript using normalized values and you use non-normalized values for clustering. Clarify IN YOUR MANUSCRIPT and explain exactly which input data is normalized and which is not.

Verify that eq. (3) and (4) you introduced are accurate. This is not a normalization but simply a decrease of the values way lower than 1. Is this the way you actually did normalization? A typical normalization would change values to be in range [0,1]. Explain this in manuscript.

Author Response

Line 155: “acid-soluble lignin, acid-insoluble lignin” - please verify here and, as a rule, recheck the manuscript for small typing errors.

Response: We have cross-checked and verified in many English-language references that the terminology “acid-soluble lignin, acid-insoluble lignin” is correct.

Lines 181-183: “The number of implied layers was set to 17, with 184 sets of data (175 sets as training data (95%) and 9 sets as test data (5%))”

Response: Our neural network structure consists of one input layer, three hidden layers, and one output layer. The input layer comprises 13 neurons corresponding to 13 feature inputs, including varieties of trees 1, varieties of trees 2, varieties of trees 3, varieties of trees 4, moisture content, DMC concentration, basic density, acid-insoluble lignin content, acid-soluble lignin content, cellulose content, hemicellulose content, glucose content, and xylose content. We also added this explanation in the manuscript.

Issue from my previous Comment for authors remains in part:

“Line 250-254 (of the previous version of the manuscript): “The hardness value of the level 1 cluster center was 41.57 N/mm², the hardness value of the level 2 cluster center was 32.17 N/mm², the hardness value of the level 3 cluster center was 22.50 N/mm², the hardness value of the level 4 cluster center was 14.36 N/mm² and the hardness value of the level 5 cluster center was 7.64 N/mm².”

You mentioned normalized hardness as input for ANN but you are discussing measured values. Explain in manuscript why these values are mentioned for the centers of the five cluster levels and not the normalized values. Explain in manuscript why the hardness does not depend on the variety of tree the log comes from.”

Response: We apologize for our mistake and have clarified in the manuscript that non-normalized data were used in the K-means clustering analysis. Normalized data are used in the artificial neural network process to speed up model convergence and to mitigate the errors caused by the varying dimensions of each index, an approach that has been extensively discussed in the literature on data normalization methods for addressing dimensional effects. However, since hardness is the only indicator used during cluster analysis, normalizing the data is unnecessary, and using raw data provides a more intuitive representation of the hardness clustering results. In our manuscript, we added that “due to the limited indicators considered in the clustering process, we did not take the dimensional effect on experimental results into account and used raw data to visually present the clustering results.” Furthermore, we explained why tree species were not considered for hardness clustering: “We analyzed the differences in hardness values among different tree species and found no significant variations among the selected tree species (as shown in Table 2). Thus, the tree species factor is not considered in this study.” In our discussion, we also emphasized the need for developing a better method for determining decaying log hardness that considers the differences in tree species across various climatic zones, thereby enhancing current quantitative hardness classification systems.
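
Clustering on a single raw indicator, as described in this response, reduces to one-dimensional K-means. The sketch below is a plain implementation of Lloyd's algorithm on scalar values with evenly spaced starting centers; it is an illustration under these assumptions, not the authors' software or settings, and the hardness values are hypothetical.

```python
# One-dimensional K-means (Lloyd's algorithm) on raw, non-normalized values.
# The initialization scheme and the demo data are assumptions for illustration.
def kmeans_1d(values, k, max_iter=100):
    """Return (centers, labels) after clustering scalar values into k groups."""
    lo, hi = min(values), max(values)
    # Evenly spaced starting centers across the observed range.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]

    def nearest(v):
        return min(range(k), key=lambda i: abs(v - centers[i]))

    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for v in values:
            groups[nearest(v)].append(v)
        # An empty group keeps its previous center.
        updated = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    labels = [nearest(v) for v in values]
    return centers, labels


if __name__ == "__main__":
    # Hypothetical hardness measurements in N/mm².
    hardness = [42.0, 40.5, 41.0, 33.0, 31.5, 32.0, 23.0, 22.0, 21.5,
                14.0, 15.0, 14.5, 8.0, 7.0, 7.5]
    centers, labels = kmeans_1d(hardness, k=5)
    # Number the levels from hardest (level 1) to softest (level 5).
    for level, center in enumerate(sorted(centers, reverse=True), start=1):
        print(f"level {level}: center {center:.2f} N/mm²")
```

Because only one variable is involved, the cluster centers stay in the original units, which is why raw values can be reported directly.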

Verify that eq. (3) and (4) you introduced are accurate. This is not a normalization but simply a decrease of the values way lower than 1. Is this the way you actually did normalization? A typical normalization would change values to be in range [0,1]. Explain this in manuscript.

Response: We utilized the summation normalization method and cited “Property analysis of linear dimensionless methods” in our manuscript as a reference for the eq. (3) and (4).
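
The difference between the two schemes discussed in this exchange can be illustrated numerically. The sketch assumes that "summation normalization" means dividing each value by the sum of all values of that indicator; eq. (3) and (4) are not reproduced in this record, so this reading is an assumption. The function names and input values are hypothetical.

```python
# Summation normalization versus min-max scaling. The definition used for
# "summation normalization" is an assumed reading, not taken from the paper.
def sum_normalize(values):
    """Divide each value by the total, so the results sum to 1."""
    total = sum(values)
    if total == 0:
        raise ValueError("sum of values must be non-zero")
    return [v / total for v in values]


def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("values must not all be equal")
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    hardness = [41.0, 32.5, 22.0, 14.8, 7.9]  # hypothetical, N/mm²
    print(sum_normalize(hardness))
    print(min_max_normalize(hardness))
```

Under this reading, positive values divided by their sum over many samples all become much smaller than 1, whereas min-max scaling spreads them over the full [0,1] range.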

Author Response File: Author Response.docx

Reviewer 2 Report

No more suggestions. I suggest publication.

Author Response

No more suggestions. I suggest publication.
