Article
Peer-Review Record

Estimating Fibrosity Scores of Plant-Based Meat Products from Images: A Deep Neural Network Approach

by Abdullah Aljishi 1, Shirin Sheikhizadeh 2, Sanjoy Das 1,* and Sajid Alavi 2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 17 December 2025 / Revised: 26 January 2026 / Accepted: 4 February 2026 / Published: 12 February 2026

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

In this work, a deep neural network (DNN) model was proposed to provide quantitative estimates of fibrosity. The results suggest that, with only a reasonably limited amount of data and appropriate augmentation, the DNN could be trained to provide estimates with a high degree of accuracy. I think the manuscript is within the scope of Foods and can be considered for publication after minor revision. The following comments should be addressed.

Comment 1: The study used pre-processing methods such as image threshold segmentation and rotation enhancement, which are reasonable. However, it does not explain why the resolution of 6032 × 6032 was chosen as the input size. An explanation of the basis for selecting the image resolution should be added.

Comment 2: The results show the differences in individual expert ratings and comprehensive ratings of the model, indicating that the model can capture expert rating habits. However, an in-depth analysis of the reasons for the differences in ratings is lacking.

Comment 3: The generalization ability and robustness of the model should be further validated on a dedicated set of newly collected samples.

Comment 4: It is easy for deep learning models to overfit. How do the authors ensure that the model does not overfit?

Comments on the Quality of English Language

English can be improved.

Author Response

On behalf of my colleagues, I would like to express our gratitude to the esteemed reviewer for his/her comments. In response, I have made several additions/modifications to the revised article. We hope that these improvements fully satisfy the reviewer’s concerns. The following passages provide details of each comment by Reviewer-1, and the accompanying changes that I have made to the revised article.

 

Comment 1: The study used pre-processing methods such as image threshold segmentation and rotation enhancement, which are reasonable. However, it does not explain why the resolution of 6032 × 6032 was chosen as the input size. An explanation of the basis for selecting the image resolution should be added.

 

I regret having overlooked this detail earlier. The paragraph below was inserted into the revised version of this article (page 6, lines 200-205):

“The preprocessed images’ horizontal and vertical sizes of  pixels, which was ~25% that of the largest raw image, was small enough to serve as DNN inputs, while also retaining all textural features. For comparison, in another application also involving plant-based meat analogues (Mishra et al., 2025) the input images to a ResNet-18 were of size  - an order of magnitude smaller than the present ones.”

 

Comment 2: The results show the differences in individual expert ratings and comprehensive ratings of the model, indicating that the model can capture expert rating habits. However, an in-depth analysis of the reasons for the differences in ratings is lacking.

 

I fully concur with the reviewer’s observation. However, such an in-depth analysis of human perception is outside the scope of this research. My colleague (Dr. Alavi) and I believe that handpicking a few specific textural features a priori may not capture the entire underlying basis of an individual subject’s assessment. This rationale is addressed in the revised article (pages 20-21, lines 715-721):

“Human scores were used only for the DNN’s training and evaluation; considering the possibility that some deeper aspects of human assessment may be dauntingly complex for this research (Auer et al, 2025), their underlying perceptual basis remains outside the scope of this study. This is unlike the approach taken in (Ma et al., 2024), where computer vision algorithms were applied to obtain a set of prespecified textural attributes, which were correlated with human visual inspection. Instead of selecting a priori only some features for investigation, a holistic approach has been adopted here.”

 

Moreover, excluding such an in-depth analysis has an advantage: The proposed neural network is customizable to fit individual preferences as well as for other plant-based meats. I have added the following sentence in this version of our article (page 21, lines 721-722):

“Only limited fine-tuning with additional data is needed to customize a DNN for other plant-based meat analogues as well as for other desired textural features.”

 

Comment 3: The generalization ability and robustness of the model should be further validated on a dedicated set of newly collected samples.

 

We had stated previously that in each experiment, the relevant dataset ( , , or ) was divided into (disjoint) training and testing sets. I have attempted to make this amply clear at several places in this revised article, such as the following statement (page 12, lines 441-444):

“Samples were drawn randomly from some dataset  and divided in the standard ratio of 85:15 into two a training set  and a test set , Whence, , , , and .”
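A random, disjoint 85:15 split of the kind described in this statement can be sketched as follows. This is only an illustration: the function name, the fixed seed, and the dataset size of 100 are assumptions and do not come from the article.

```python
import random


def split_dataset(samples, train_fraction=0.85, seed=0):
    """Shuffle a copy of `samples` and split it into disjoint train/test lists."""
    rng = random.Random(seed)      # fixed seed is an assumption, for repeatability only
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical dataset of 100 sample indices.
train_set, test_set = split_dataset(range(100))
```

Because the two lists are slices of one shuffled sequence, no sample can appear in both the training set and the test set.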

 

Another statement clarifying our approach is (page 14, lines 512-514):

“Accordingly, the relevant dataset in an experiment is . As described earlier (Section 2),  was divided randomly into a training dataset , and a test dataset .”

 

Lastly, the figure legends (Figures 6,7,8) of the revised article also have statements along the same lines.

 

 

Comment 4: It is easy for deep learning models to overfit. How do the authors ensure that the model does not overfit?

 

The software package includes a variety of up-to-date training options, many of which were used in this research. These are listed below.

 

Regularization. Minimizing only the sum squared error is mathematically equivalent to maximum likelihood (ML) estimation. It can be easily shown that adding an L2 regularization term is equivalent to maximum a posteriori (MAP) estimation where the trainable parameters are assumed to follow Gaussian priors. Graduate-level machine learning textbooks routinely provide formal proofs. In this research, L2 regularization was used during training. The revised article contains this statement (page 13, lines 469-470):

“To improve generalization and prevent overfitting, regularization techniques were employed.”
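The equivalence between L2 regularization and MAP estimation mentioned above can be sketched in a few lines. The notation is introduced here for illustration only and is not taken from the article: a dataset $\mathcal{D} = \{(x_n, y_n)\}_{n=1}^{N}$, a network $f_{\theta}$, Gaussian noise of variance $\sigma^{2}$ on the targets, and independent zero-mean Gaussian priors of variance $\tau^{2}$ on the parameters $\theta$.

```latex
\begin{align*}
\hat{\theta}_{\mathrm{MAP}}
  &= \arg\max_{\theta}\; \bigl[\log p(\mathcal{D}\mid\theta) + \log p(\theta)\bigr] \\
  &= \arg\max_{\theta}\; \Bigl[-\frac{1}{2\sigma^{2}}\sum_{n=1}^{N}\bigl(y_n - f_{\theta}(x_n)\bigr)^{2}
       - \frac{1}{2\tau^{2}}\lVert\theta\rVert_{2}^{2}\Bigr] \\
  &= \arg\min_{\theta}\; \Bigl[\sum_{n=1}^{N}\bigl(y_n - f_{\theta}(x_n)\bigr)^{2}
       + \lambda\,\lVert\theta\rVert_{2}^{2}\Bigr],
  \qquad \lambda = \frac{\sigma^{2}}{\tau^{2}}.
\end{align*}
```

Dropping the prior term (equivalently, letting $\tau \to \infty$ so that $\lambda \to 0$) recovers the ML estimate, which minimizes the sum squared error alone.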

 

Dropout. It is widely believed that dropout is an indirect form of regularization. A mathematical proof (with a few simplifications) was provided in the original paper (Srivastava, Hinton, Krizhevsky, Sutskever, Salakhutdinov, JMLR:15, 2014). The revised paper shows that dropout was used in this research (Page 13, line 473):

“A dropout rate of 0.5 was applied to the FC layers….”
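To make the quoted rate concrete, the following is a minimal pure-Python sketch of "inverted" dropout, the variant in which surviving activations are rescaled during training. It is not the authors' code; the function name and the example activations are hypothetical.

```python
import random


def dropout(activations, rate=0.5, training=True, rng=random):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return list(activations)   # at evaluation time the layer is the identity
    keep = 1.0 - rate
    # Survivors are divided by `keep` so the expected activation is unchanged.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Hypothetical activations of a fully connected layer.
example = dropout([1.0, 2.0, 3.0, 4.0], rate=0.5, rng=random.Random(0))
```

With a rate of 0.5, each output is either zero or twice its input during training, and the layer passes inputs through unchanged at evaluation time.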

 

Early Stopping. This is perhaps the most direct and “common sense” method to prevent overfitting. The revised paper shows that it was used in this research (page 13, lines 476-478):

“Early stopping was implemented with a patience of 20 epochs, ensuring that training halted once the performance began to plateau.”
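The stopping rule quoted above can be sketched as a small helper. The patience of 20 and the cap of 5,000 epochs come from the revised article; the class and function names, and the choice of validation loss as the monitored quantity, are assumptions made for illustration.

```python
class EarlyStopping:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience=20, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def epochs_run(val_losses, patience=20, max_epochs=5000):
    """Return how many epochs would run for a given sequence of validation losses."""
    stopper = EarlyStopping(patience)
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if stopper.step(loss):
            return epoch
    return min(len(val_losses), max_epochs)
```

In a real training loop, `step` would be called once per epoch with the measured validation loss, and the best weights seen so far would normally be saved at each improvement.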

 

I also wish to draw the reviewer’s attention to the figure below, where the synthetic images are shown with their ranks based on the DNN’s estimated fibrosity scores, from 1 (highest fibrosity score) to 30 (lowest fibrosity score). Visual inspection shows a trend that we believe is surprisingly consistent with human observation. This consistency would not have been possible if the DNN had been overtrained to “memorize” the training dataset.
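The ranking convention described here (rank 1 for the highest estimated score) can be sketched as follows. The scores are made-up placeholder values, not results from the article, and the function name is hypothetical.

```python
def rank_by_score(scores):
    """Map each image index to its rank, where rank 1 is the highest score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {index: rank for rank, index in enumerate(ordered, start=1)}


# Placeholder scores for three synthetic images (illustrative values only).
scores = {1: 0.42, 2: 0.91, 3: 0.17}
ranks = rank_by_score(scores)
```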

 

 

 

 

 

Miscellaneous comments

 

The English could be improved to more clearly express the research.

 

I fully concur with the reviewer’s observation. The current version of the article has been proof-read thoroughly. Several improvements have been made throughout the article, beginning with the abstract and up to the concluding section.

 

Figures and tables can be improved.

 

The following improvements were made:

  • Figure 7 (which had two plots) of the previous version was split into Figures 7 and 8. Appropriate changes were made to the figure legends, and to the main text.
  • Figures 1 – 3, 5 – 8 were enlarged. Uploaded separately are larger images in *.bmp format (previously they were in *.png format).
  • Rows/columns in Tables 1, 2 have been shaded. Only numerical results are shown in white background, while descriptors have light blue backgrounds.
  • Note that the entries inside small white boxes in Figure 3 are redundant – these values are also provided in Table 1 (columns 1, 3).
  • It is unrealistic to expect any other improvement in Figure 4. All elements in it are meticulously colored and positioned for maximum clarity. In contrast, practically all other published research papers in deep neural networks only provide PyTorch-generated figures. Please see the figures below:

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The authors of the paper "Estimating Fibrosity Scores of Plant-Based Meat Products from Images: A Deep Neural Network Approach" propose a deep learning–based model to estimate fibrosity scores of extruded plant-based meat products directly from macro-scale images. A modified ResNet-18 architecture is trained using image data labelled by two human experts, and its performance is evaluated using statistical accuracy metrics. The study further explores model explainability by testing the trained network on synthetically generated images to assess whether the learned features are related to relevant structural characteristics such as porosity and fibrous patterns.
The paper addresses an interesting and relevant problem. However, several issues should be addressed to further strengthen the manuscript:

1. The manuscript does not clearly articulate the scientific novelty or the specific research gap addressed by the proposed approach. Although related work on computer vision–based fibrosity assessment is cited, it remains unclear how the proposed method advances beyond existing tools such as Fiberlyzer or recent machine learning–based fibrosity estimation approaches.
2. The very limited size of the dataset raises serious concerns regarding the robustness and generalizability of the proposed deep learning model. The authors should explicitly discuss the limitations imposed by the small dataset and provide a stronger justification for adopting a deep neural network rather than simpler or more data-efficient regression models.
3. Important implementation details required for reproducibility are missing, including the training–validation split strategy, loss functions, optimisation algorithms, learning rates, number of epochs, and stopping criteria.
4. The results section lacks sufficient depth and critical analysis. Performance evaluation is limited, and no comparison is provided with alternative models or baselines. In addition, statistical significance, variability, and uncertainty measures are not reported.
5. The process of generating synthetic images requires a more detailed explanation. In particular, the rationale behind their design, their relationship to real fibrous structures, and their impact on training or evaluation outcomes should be clearly justified and quantitatively assessed.
6. Given the small size of the training dataset, I suggest that the authors consider using pretrained deep learning models and fine-tuning them for the regression task. Prior studies have shown that transfer learning often yields superior performance compared to training models from scratch in data-scarce scenarios.
7. Adding a table of nomenclature is strongly recommended to improve the readability of the manuscript.
8. The manuscript does not fully adhere to the journal’s reference formatting and citation style.

Author Response

On behalf of my colleagues, I would like to express our gratitude to the esteemed reviewer for the observations about the earlier manuscript. In response, I have made several additions/modifications to this revision, and added several new references that were necessary. The last two suggestions were not implemented, but justifications are given.

We hope that with these improvements, the revised article fully satisfies the reviewer’s concerns. Details of each comment by Reviewer-2, and the accompanying changes made to the revised article are in the passages below.

 

  1. The manuscript does not clearly articulate the scientific novelty or the specific research gap addressed by the proposed approach. Although related work on computer vision–based fibrosity assessment is cited, it remains unclear how the proposed method advances beyond existing tools such as Fiberlyzer or recent machine learning–based fibrosity estimation approaches.

My colleague (Dr. Alavi) and I believe that the Fiberlyzer tool is not an established standard. It applies a few handpicked textural features from the images. Some of those features (see Figure 2 (right): shape analysis in that paper) were not applied in our research. On the other hand, the paper on Fiberlyzer does not address some other features that were examined directly in our research (e.g. orientation, coverage, number of air cells). Hence, these are two different approaches with dissimilar objectives. The revised article includes a statement to this effect (page 20, lines 715-721):

“Human scores were used only for the DNN’s training and evaluation; considering the possibility that some deeper aspects of human assessment may be dauntingly complex for this research (Auer et al, 2025), their underlying perceptual basis remains outside the scope of this study. This is unlike the approach taken in (Ma et al., 2024), where computer vision algorithms were applied to obtain a set of prespecified textural attributes, which were correlated with human visual inspection. Instead of selecting a priori only some features for investigation, a holistic approach has been adopted here.”

 

Moreover, there is an advantage in our approach: The proposed neural network is customizable to fit individual preferences as well as for other plant-based meats. I have added the following sentence in this version of our article (page 21, lines 721-722):

“Only limited fine-tuning with additional data is needed to customize a DNN for other plant-based meat analogues as well as for other desired textural features.”

 

  2. The very limited size of the dataset raises serious concerns regarding the robustness and generalizability of the proposed deep learning model. The authors should explicitly discuss the limitations imposed by the small dataset and provide a stronger justification for adopting a deep neural network rather than simpler or more data-efficient regression models.

My response to the previous comment #1 provides the justification for using a DNN in this research. We did not select only a prespecified subset of textural features to be able to use classical computer vision algorithms. On the contrary, we took the view that not all determinants of fibrosities are easily identifiable for standard algorithms to be used. Thus, such simple regression models were inapplicable for our research goals.

We have been forthright about the somewhat limited amount of collected data – statements to this effect were present even in the previous version of this article. However, we have shown that appropriate spatial preprocessing and data augmentation techniques can be used to circumvent this limitation. The revised article has the following statement (page 20, lines 679-683):

“In spite of limited image samples and prior human scores, the DNN could be trained for this purpose, whose accuracy is reflected through multiple statistical performance metrics. This task was accomplished using a suitable ResNet-18 layout with an additional layer, combined with appropriate spatial image preprocessing, data enhancement, and transfer learning.”

The limitations of this research have been addressed in the revised article (page 20, lines 699-706):

“Needless to say, this research is not without limitations. Although it highlights the feasibility of using such DNNs to assess the granularities of extruded plant-based meat products from camera images, sans human intervention, all real images used here were obtained solely by the present team. An in-depth analysis of human assessment would have been possible by collecting subject scores from a larger group of human experts. The DNN’s estimates were interpreted through visual observations. Quantifying the matrix and cell properties in the synthetic images would have allowed for more mathematically rigorous interpretation analysis.”

 

Explainability analysis with synthetic images is a fundamental aspect of this research. This is clarified in the revised manuscript (page 20, lines 693-698):

“Lastly, the outcome of the experiment with synthetic images is noteworthy. In the authors’ views, the DNN’s estimated granularity scores followed a remarkably consistent pattern that was amenable to simple, straightforward interpretation in terms of features of the input images. The study strongly suggests that the DNN’s estimation scheme was based on the extent of coverage provided to the food matrix by the air cells contained in it, the number of them present and their elongations”

 

I also wish to draw the reviewer’s attention to the figure below, where the synthetic images are shown with their ranks based on the DNN’s estimated fibrosity scores, from 1 (highest fibrosity score) to 30 (lowest fibrosity score). Visual inspection shows a trend that we believe is surprisingly consistent with human observation.

 

 

 

 

  3. Important implementation details required for reproducibility are missing, including the training–validation split strategy, loss functions, optimisation algorithms, learning rates, number of epochs, and stopping criteria.

The revised version of this article contains all these implementation details (page 13, lines 453-479):

“Details of the training algorithm are not provided here, as they are standardized aspects that are built-in within PyTorch (Paszke et al., 2019) and the Torchvision package (Marcel and Rodriguez, 2010).

…….

…….

The DNN was trained using the ADAM optimizer (Marcel and Rodriguez, 2010; Paszke et al., 2019). To improve generalization and prevent overfitting, various regularization techniques were employed. Although secondary aspects of training are not addressed in this article, the associated training parameters were as follows. The learning rate was kept at . A dropout rate of 0.5 was applied to the FC layers, and the weight decay (L2 regularization) was set to 0.001. Additionally, a learning rate scheduler (ReduceLROnPlateau) was applied to reduce the learning rate by a factor of 0.5 whenever the validation loss would not decrease for three consecutive epochs. Early stopping was implemented with a patience of 20 epochs, ensuring that training halted once the performance began to plateau. The DNN was trained for up to 5,000 epochs, with batch sizes of 8 and 32 for the training and validation datasets.”
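The learning rate schedule described in this passage can be illustrated with a simplified pure-Python stand-in. It follows the wording of the quoted text (halve the rate after three consecutive epochs without a decrease in validation loss) and is not the library scheduler itself, whose exact counting and threshold rules may differ. The function name is hypothetical, and `initial_lr` is a placeholder because the value is not reproduced in this record.

```python
def plateau_schedule(val_losses, initial_lr, factor=0.5, patience=3):
    """Return the learning rate in effect after each epoch's validation loss."""
    lr = initial_lr            # placeholder: the article's value is not shown here
    best = float("inf")
    stalled = 0                # consecutive epochs without a new best loss
    history = []
    for loss in val_losses:
        if loss < best:
            best = loss
            stalled = 0
        else:
            stalled += 1
            if stalled >= patience:
                lr *= factor   # reduce the rate and start counting again
                stalled = 0
        history.append(lr)
    return history


# Illustrative loss sequence with a three-epoch plateau.
lrs = plateau_schedule([1.0, 0.9, 0.9, 0.9, 0.9, 0.8], initial_lr=1.0)
```

Here the rate stays at its initial value for four epochs, is halved once the loss has failed to improve for three epochs in a row, and remains halved when the loss improves again.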

 

  4. The results section lacks sufficient depth and critical analysis. Performance evaluation is limited, and no comparison is provided with alternative models or baselines. In addition, statistical significance, variability, and uncertainty measures are not reported.

As argued above, there are no other models that can be used for comparison. Statistical metrics have been provided in Tables 1 and 2. In Table 1, we provide average absolute errors, mean squared errors, correlation coefficients, coefficients of determination, and the slopes and intercepts of linear regression. In Table 2, we provide the means, medians, ranges, and standard deviations of 30 outputs each. I believe that these are enough to establish the efficacy of this research.
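For readers unfamiliar with the metrics named for Table 1, the sketch below computes them from paired human scores and model estimates. It makes two assumptions that the record does not confirm: the coefficient of determination is taken as 1 - SS_res/SS_tot, and the regression line fits the estimates against the human scores. The function name and sample numbers are hypothetical.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, Pearson r, R^2, and the least-squares slope and intercept."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    s_tt = sum((t - mean_t) ** 2 for t in y_true)
    s_pp = sum((p - mean_p) ** 2 for p in y_pred)
    s_tp = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    slope = s_tp / s_tt                    # estimates regressed on human scores
    return {
        "mae": mae,
        "mse": mse,
        "r": s_tp / sqrt(s_tt * s_pp),
        "r2": 1.0 - (mse * n) / s_tt,      # assumed definition: 1 - SS_res / SS_tot
        "slope": slope,
        "intercept": mean_p - slope * mean_t,
    }


# Made-up scores in which every estimate is exactly one unit too high.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
```

The example shows why several metrics are reported together: a constant offset leaves the correlation at 1 while the absolute error, the intercept, and this definition of the coefficient of determination all reveal the bias.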


  5. The process of generating synthetic images requires a more detailed explanation. In particular, the rationale behind their design, their relationship to real fibrous structures, and their impact on training or evaluation outcomes should be clearly justified and quantitatively assessed.

The revised version incorporates the following explanation (page 7, lines 251-268):

“The immediate purpose of synthesizing additional images was to ensure that the trained DNN was free of inductive bias (Shah and Sureja, 2025), i.e. that its output estimates were independent of any extraneous features in the real image samples. Inductive bias in DNNs, where they learn to pick artificial cues from their training datasets, has been long identified as a problem in supervised learning tasks (Wang and Wu, 2023; Wehrli et al. 2022; Shah and Sureja, 2025). Although bias in homogeneous DNNs has been extensively studied (Vardi, 2023), it is not well understood in the context of heterogeneous DNNs, including ResNets.

More broadly, synthetic images would allow the DNN’s output estimation to be more interpretable (explainable). Explainable AI is a topic of significant interest (Ibrahim and Shafiq, 2023; Sharma et al., 2024; Kalasampath et al., 2025). Explainable AI methods have been explored in image processing applications (Bennetot et al., 2024; Cheng et al., 2025).

To ensure that the DNN was not sensitive to irrelevant image features, and to render its estimation more interpretable, a total of 30 synthetic images were created. Each image was assigned a unique index number between 1 and 30. Based on their shapes, the synthetic images fell under the following four categories: (i) “large circle” (LC), (ii) “box” (BO), (iii) “ellipse” (EL), and (iv) “small circle” (SC).”


  6. Given the small size of the training dataset, I suggest that the authors consider using pretrained deep learning models and fine-tuning them for the regression task. Prior studies have shown that transfer learning often yields superior performance compared to training models from scratch in data-scarce scenarios.

This research does indeed use a pre-trained network and transfer learning to train only the relevant layers. The revised article states as much in several places (page 3, lines 118-122):

“Using data collected for this investigation, a few layers of the original ResNet-18 were retrained for regression. This technique, called transfer learning, is used to curtail the needed training time (Razavi et al, 2024; Senapati et al., 2023). Recent research reports the use of transfer learning for a similar application (Ma et al., 25a).”

This is mentioned elsewhere in the revised article (page 20, lines 679-683):

“In spite of limited image samples and prior human scores, the DNN could be trained for this purpose, whose accuracy is reflected through multiple statistical performance metrics. This task was accomplished using a suitable ResNet-18 layout with an additional layer, combined with appropriate spatial image preprocessing, data enhancement, and transfer learning.”

Again, in this version (page 21, lines 721-728):

“Analysis of the DNN’s scores with synthetic image inputs illustrates that an undue amount of experimental data is not needed to elicit high performance accuracy. This task can be achieved by selecting a suitable layout (e.g. the extended ResNet layout proposed here), and appropriate data preprocessing, augmentation, and transfer learning steps.”


  7. Adding a table of nomenclature is strongly recommended to improve the readability of the manuscript.

The journal format only provides space for a list of abbreviations. We chose not to include one, because only two abbreviations (DNN, ResNet) are used multiple times. That, in my view, is too few to warrant a separate list.


  8. The manuscript does not fully adhere to the journal’s reference formatting and citation style.

Since 2026, MDPI has allowed free-format submissions. This article (except the references) was formatted according to MDPI specifications for the reviewers’ convenience.

 

Miscellaneous comment:

 

Figures and tables can be improved.

 

The following improvements were made:

  • Figure 7 (which had two plots) of the previous version was split into Figures 7 and 8. Appropriate changes were made to the figure legends, and to the main text.
  • Figures 1 – 3, 5 – 8 were enlarged. Uploaded separately are larger images in *.bmp format (previously they were in *.png format).
  • Rows/columns in Tables 1, 2 have been shaded. Only numerical results are shown in white background, while descriptors have light blue backgrounds.
  • Note that the entries inside small white boxes in Figure 3 are redundant – these values are also provided in Table 1 (columns 1, 3).
  • It is unrealistic to expect any other improvement in Figure 4. All elements in it are meticulously colored and positioned for maximum clarity. In contrast, practically all other published research papers in deep neural networks only provide PyTorch-generated figures. Please see the figures below:

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

I appreciate the authors' thoughtful responses and the revisions made to the manuscript.
The majority of my concerns have been adequately addressed, and I recommend that the paper be accepted as is.
