Article
Peer-Review Record

AI-Based Smart Monitoring Framework for Livestock Farms

Appl. Sci. 2025, 15(10), 5638; https://doi.org/10.3390/app15105638
by Moonsun Shin 1,*, Seonmin Hwang 2,* and Byungcheol Kim 3
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 29 March 2025 / Revised: 4 May 2025 / Accepted: 14 May 2025 / Published: 18 May 2025
(This article belongs to the Special Issue Future Information & Communication Engineering 2024)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This paper focuses on the RT-DETR deep learning object detection model and proposes an AI-based intelligent livestock monitoring framework to identify lesions and abnormal behaviors in cattle, with comparative analyses against YOLO-series models. The research topic holds practical application value, and the methodology demonstrates a degree of cutting-edge relevance, intended for smart livestock farm monitoring and management scenarios, aligning with current trends in agricultural intelligence. However, significant issues remain in terms of research completeness, scientific rigor, and standardization. Major revisions or re-review are recommended prior to reconsideration for publication.

Specific Issues and Revision Recommendations:

Unclear Research Objectives and Innovation (Lines 10-28): While the paper proposes a smart livestock framework, it inadequately addresses the limitations of existing research and fails to highlight the practical differences and technical innovations of RT-DETR compared to YOLO methods.

Recommendation: Explicitly state the core innovations in the Introduction, e.g., "first application of RT-DETR to livestock behavior recognition" or "resolving the low accuracy and high redundancy of traditional YOLO models in behavior recognition," to strengthen the argument for academic value.

Incomplete Methodological Details (Lines 239-306): The RT-DETR model structure is described too briefly, lacking explanations of key modules (e.g., CCFF, AIFI) and justification for parameter selection.

Recommendation: Add:

Detailed network parameter settings;

Functional mechanisms of key modules (e.g., principles and advantages of AIFI and CCFF);

Theoretical or experimental basis for hyperparameters (learning rate, batch size, optimizer);

Layer-wise dimensions and parameter annotations in network diagrams to improve reproducibility.
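As an illustration of the kind of reproducibility-oriented hyperparameter documentation requested above, a settings block might look like the following sketch (all values are hypothetical placeholders, not the paper's actual settings):

```python
# Hypothetical training-settings record for reproducibility (illustrative values only)
HYPERPARAMS = {
    "optimizer": "AdamW",      # chosen optimizer (hypothetical)
    "learning_rate": 1e-4,     # initial learning rate
    "batch_size": 16,          # images per batch
    "epochs": 100,             # training epochs
    "weight_decay": 1e-4,      # L2 regularization strength
    "input_size": (640, 640),  # network input resolution (H, W)
}

def describe(hp):
    """Render the settings as one line per key, suitable for a methods section."""
    return "\n".join(f"{k}: {v}" for k, v in hp.items())
```

Publishing such a record alongside the network diagrams would let readers reproduce the training configuration exactly.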

Ambiguous Dataset Description and Errors (Line 250, Table 1): Table 1 states 13,193 images, while Line 252 claims 8,142 images. Line 317 erroneously refers to "PCB images" (presumably a typo).

Recommendation:

Clarify data sources and quantities;

Align text and table data;

Correct "PCB images" to "cattle images" or "livestock images."

Poor Quality and Labeling of Figures/Tables: Font types and sizes in all figures and tables are inconsistent.

Ensure that all figure and table captions are correctly formatted.

Figure 1: Missing image source attribution. If sourced from national livestock research institutions, cite appropriately.

Figure 6: Unclear annotations. Add labels for backbone, encoder, decoder, and data flow.

 

Figure 9: Blurry structure diagram lacking parameter details. Provide a clearer schematic with layer-wise parameters (e.g., convolution/Transformer specs). Repeated period in Line 323: "Figure 9.. Network structure diagram".

Caption of Figure 15 in Line 366: formatting issue.

Figure 16: Missing detection boxes and class labels. Include annotated detection results with confidence scores.

 

Table 2: Misleading title "Training Dataset" includes validation/test sets. Rename to "Dataset Partition."

Superficial Experimental Analysis (Lines 383-427): Results are presented graphically but lack depth. No discussion on why RT-DETR outperforms YOLO, or on metric fluctuations/false detection causes.

Recommendation:

Add quantitative metrics (FLOPs, inference time);

Analyze RT-DETR’s advantages in detecting small lesions or limbs;

Discuss accuracy variations across behaviors/viewing angles to strengthen validity.
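One of the metrics requested above, inference time, can be measured with a simple wall-clock harness such as the sketch below; the `infer` callable is a purely illustrative stand-in for any detector's forward pass:

```python
import time

def mean_latency_ms(infer, batch, n_warmup=3, n_runs=20):
    """Average wall-clock latency of one inference call, in milliseconds."""
    for _ in range(n_warmup):            # warm-up iterations are excluded from timing
        infer(batch)
    t0 = time.perf_counter()
    for _ in range(n_runs):
        infer(batch)
    return (time.perf_counter() - t0) / n_runs * 1000.0

# dummy stand-in for a detector's forward pass (hypothetical)
def dummy_infer(batch):
    return [x * 2 for x in batch]
```

Reporting such averaged timings, together with FLOPs, would make the RT-DETR vs. YOLO comparison quantitative rather than purely graphical.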

Language and Formatting Errors:

Grammar: Line 198: "the We apply" → "we apply";

Redundancy: Revise repetitive phrasing, e.g., "we confirmed that the RT-DETR shows high performance…";

In Line 351: formatting issue.

Author Response

Response to Reviewer 1 Comments

To the Reviewers and Editor:

We are very pleased to have received your valuable comments regarding our manuscript entitled “AI-based Smart Monitoring Framework for Livestock Farms”.  

In the revision step, we have revised our manuscript according to the editor’s and reviewers’ comments, and our revisions are provided below.

Point 1: Unclear Research Objectives and Innovation (Lines 10-28): While the paper proposes a smart livestock framework, it inadequately addresses the limitations of existing research and fails to highlight the practical differences and technical innovations of RT-DETR compared to YOLO methods.

Response 1: We revised the Introduction section so that the objectives and innovations of our research are clearly stated (Lines 10-28).

 

Point 2: Incomplete Methodological Details (Lines 239-306): The RT-DETR model structure is described too briefly, lacking explanations of key modules (e.g., CCFF, AIFI) and justification for parameter selection.

 

Response 2: We revised our manuscript (Lines 239-306) and added explanations of the key modules.

Point 3: Detailed network parameter settings; Functional mechanisms of key modules (e.g., principles and advantages of AIFI and CCFF); Theoretical or experimental basis for hyperparameters (learning rate, batch size, optimizer); Layer-wise dimensions and parameter annotations in network diagrams to improve reproducibility. Ambiguous Dataset Description and Errors (Line 250, Table 1): Table 1 states 13,193 images, while Line 252 claims 8,142 images. Line 317 erroneously refers to "PCB images" (presumably a typo).

Response 3: We added descriptions of the functional mechanisms of the main modules, the network parameter settings, and the layer-wise dimensions and parameters of the network diagram. We also corrected the image counts and typos in the table and text.

Point 4: Align text and table data; Correct "PCB images" to "cattle images" or "livestock images."

Response 4: We revised and corrected the erroneous text and table data.

Point 5: Poor Quality and Labeling of Figures/Tables. Font types and sizes in all figures and tables are inconsistent. Ensure that all figure and table captions are correctly formatted.

Response 5: We have revised and reorganized the figures and tables you mentioned.

Point 6: Superficial Experimental Analysis (Lines 383-427): Results are presented graphically but lack depth. No discussion on why RT-DETR outperforms YOLO, or on metric fluctuations/false detection causes.

Response 6: We have revised the discussion of the graphs presented in the experimental results.

Point 7: Language and Formatting Errors

Response 7: We have corrected all the issues pointed out regarding Language and Formatting Errors.

 

 

Regarding the quality of English, we had our paper edited by an expert again.

We believe that the topic of our paper is appropriate for the special issue "Future Information & Communication Engineering 2024".

We greatly appreciate your helpful comments, which we feel have improved our manuscript.

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

First, I would like to congratulate the authors on their work and say that I find it very interesting.
There are several issues I would like to comment on.
The "Data Inspection" section needs to be revised and better explained. In the images shown, it's difficult to get an idea of the lesions considered. Almost nothing is visible. Nor is there any explanation as to what they mean when they talk about "many contaminants" and what type of lesion can be inferred from that. When they talk about detecting immobile cows, they only mention detecting them from the front or rear, but not in any other position. Why is this?
Regarding the results, when they explain Figure 10, the confusion matrix, they don't give any explanation for the result shown for the background. In complex classification systems, and this one appears to be one, especially given the poor resolution shown in the images, it is very difficult to achieve success rates of 100%.
On the other hand, it is also striking that they give the memory of the equipment used in KB.
There's an error on line 398 ("YOLOv8.Recall"), and I am not sure whether the Saxon genitive, which appears several times in the text, is used correctly.

Author Response

Response to Reviewer 2 Comments

To the Reviewers and Editor:

We are very pleased to have received your valuable comments regarding our manuscript entitled “AI-based Smart Monitoring Framework for Livestock Farms”.  

In the revision step, we have revised our manuscript according to the editor’s and reviewers’ comments, and our revisions are provided below.

Point 1: The "Data Inspection" section needs to be revised and better explained. In the images shown, it's difficult to get an idea of the lesions considered. Almost nothing is visible. Nor is there any explanation as to what they mean when they talk about "many contaminants" and what type of lesion can be inferred from that. When they talk about detecting immobile cows, they only mention detecting them from the front or rear, but not in any other position. Why is this?

Response 1: We revised the "Data Inspection" section and added a detailed explanation of the four types of lesions we analyzed. At present, the dataset includes only postural data obtained under the current CCTV recording setup. In future work, we plan to augment the dataset by incorporating data captured from multiple angles to improve the robustness and generalizability of the model.

Point 2: Regarding the results, when they explain Figure 10, the confusion matrix, they don't give any explanation for the result shown for the background. In complex classification systems, and this one appears to be one, especially given the poor resolution shown in the images, it is very difficult to achieve success rates of 100%.

Response 2: We have revised our manuscript and added a detailed description of the results shown in the confusion matrix in Figure 10. In Figure 10, the "background" label indicates cases where an object was present but failed to be detected. A supplementary confusion matrix summarizing the number of detected and undetected instances is included. Specifically, in case01, the model achieved 154 true positive detections and 2 false negatives. The reference to the "background" class was deemed insignificant for the analysis and has been excluded from the discussion.

Point 3: There's an error on line 398: "YOLOv8.Recall" and I am not sure if it is correct to use the Saxon genitive, as it appears several times in the text.

Response 3: There was a problem with spacing in line 398. We carefully reviewed and corrected those parts again.

 Regarding the quality of English, we had our paper edited by an expert again.

We believe that the topic of our paper is appropriate for the special issue "Future Information & Communication Engineering 2024".

We greatly appreciate your helpful comments, which we feel have improved our manuscript.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors have revised the manuscript to the standard required for publication; it is now recommended for acceptance in its current form.

Author Response

To the Reviewers and Editor:

We are very pleased to have received your valuable comments regarding our manuscript entitled “AI-based Smart Monitoring Framework for Livestock Farms”.

We greatly appreciate your helpful review comments, which we feel have improved our manuscript.

We believe that the topic of our paper is appropriate for the special issue "Future Information & Communication Engineering 2024".

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

I think the "Data Inspection" section has been well explained; at least now we can distinguish, relatively clearly, in the images what type of injury the authors are referring to.
However, in the explanation of the confusion matrix in Figure 10, the authors say: "The 'background' label indicates cases where an object was present but failed to be detected." I still don't understand this explanation; according to the confusion matrix, what I understand is that the entire background is labeled as belonging to class 1. They also say they have included a supplementary confusion matrix, but I haven't been able to find it in the text.
On the other hand, in the previous review I forgot to ask how they performed the training, that is, the adjustment of the 32,814,296 parameters of the model: whether they trained directly on the images from their training sets (training from scratch) or retrained a previously trained model (fine-tuning) using transfer learning.

Author Response

To the Reviewers and Editor:

We are very pleased to have received your valuable comments regarding our manuscript entitled “AI-based Smart Monitoring Framework for Livestock Farms”.

Point 1: The "Data Inspection" section has been well explained; at least now we can distinguish, relatively clearly, in the images what type of injury the authors are referring to.

Response 1: Thank you for your kind comments.

Point 2: However, in the explanation of the confusion matrix in Figure 10, the authors say: "The 'background' label indicates cases where an object was present but failed to be detected." I still don't understand this explanation; according to the confusion matrix, what I understand is that the entire background is labeled as belonging to class 1. They also say they have included a supplementary confusion matrix, but I haven't been able to find it in the text.

Response 2: I agree with the reviewer's comments. We updated Figure 10 and revised lines 343-346 as follows: in Figure 10, the "background" column indicates cases where an object was not present but was detected. A confusion matrix summarizing the number of detected and undetected instances is included. Specifically, in case01, the model achieved 154 true positive detections and 2 false positives.
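As a hedged illustration of how the quoted counts translate into standard detection metrics, the sketch below computes precision and recall from raw totals; the false-negative count is set to zero purely as an assumption for this example, since only true positives and false positives are quoted above:

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from raw detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# figures quoted for case01: 154 true positives, 2 false positives;
# false negatives set to 0 here purely as an illustrative assumption
p, r = precision_recall(tp=154, fp=2, fn=0)
```

Under these assumed counts, precision comes out to 154/156 (about 0.987), which is consistent with a near-perfect but not 100% detection rate.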

Point 3: On the other hand, in the previous review, I forgot to ask about how they performed the training, that is, the adjustment of the 32,814,296 parameters of the model. Whether they did it directly with the images from their training sets (training from scratch) or, if they retrained a previously trained model (fine-tuning), using transfer learning.

Response 3: We trained the model on our dataset using transfer learning (fine-tuning a pretrained model rather than training from scratch).
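The distinction the reviewer raises, training from scratch versus transfer learning, can be sketched with a deliberately tiny toy model in which a "pretrained" backbone weight is frozen and only a newly initialized head weight is fitted by gradient descent; all numbers here are hypothetical and a real RT-DETR fine-tune would of course update far more parameters:

```python
# Toy transfer-learning sketch: the backbone weight is reused frozen,
# only the new head weight is adjusted on the target data.

w_backbone = 2.0   # "pretrained" weight, frozen during fine-tuning
w_head = 0.1       # new task head, randomly initialized

data = [(1.0, 4.0), (2.0, 8.0)]   # toy targets consistent with w_head = 2.0
lr = 0.01
for _ in range(500):
    for x, y in data:
        feat = w_backbone * x                     # frozen backbone feature
        grad = 2.0 * (w_head * feat - y) * feat   # dMSE/dw_head
        w_head -= lr * grad                       # backbone receives no update
```

After training, `w_head` converges to roughly 2.0 while `w_backbone` is untouched, which is the essential mechanics of fine-tuning a pretrained detector on a new dataset.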

We greatly appreciate your helpful review comments, which we feel have improved our manuscript.

We believe that the topic of our paper is appropriate for the special issue "Future Information & Communication Engineering 2024".

Author Response File: Author Response.docx
