Open Access, Editor’s Choice, Article

Peer-Review Record

An Attention Mechanism-Improved YOLOv7 Object Detection Algorithm for Hemp Duck Count Estimation

Agriculture 2022, 12(10), 1659; https://doi.org/10.3390/agriculture12101659

by Kailin Jiang^1,†, Tianyu Xie², Rui Yan², Xi Wen², Danyang Li^2,†, Hongbo Jiang², Ning Jiang³, Ling Feng², Xuliang Duan² and Jianjun Wang^1,*

Reviewer 1:

Miha Lavric

Reviewer 2:

Hamdy Soliman

Reviewer 3:

Suresh Merugu

Submission received: 18 August 2022 / Revised: 3 October 2022 / Accepted: 3 October 2022 / Published: 10 October 2022

(This article belongs to the Special Issue Internet and Computers for Agriculture)

Round 1

Reviewer 1 Report

The paper showcases incremental improvements of detecting ducks, which can have significant economic implications for the industry when applied and further refined.

The experimental set-up is sound and clear. The paper does need some form and text corrections; please see below. If those are addressed, the paper will be suitable for publication.

Rephrase/correct sentence line 64/65/66 "At present, in the hemp ducks farming industry, a lot of counting work by hand, artificial mechanical counting laborious and laborious." Perhaps the authors meant:

"At present, in the hemp ducks farming industry, a lot of counting is done manually or by artificial mechanical, both being very laborious." ???

Correct sentence line 72/73:

"With the development of technology, monitoring equipment plays a huge role in the farm." to "With the development of technology, monitoring equipment plays a huge role on the farm."

Line 99

"results" to "results."

The authors state that "In summary, the main contributions of this study are:

1. The dataset used in this study is the first release, and we constructed a new large-scale hemp ducks image dataset, which contains 1500 hemp ducks object detection whole-body frame labeling and head frame labeling.

2. This study constructs a comprehensive working baseline, including: hemp ducks identification, hemp ducks object detection, and hemp ducks image counting, to realize the intelligent breeding of hemp ducks.

3. This project model is introducing CBAM module to build CBAM-YOLOv7 algorithm."

Correct lines 237 to 239

"CBAM [22] is a lightweight Attention module, which can perform Attention in the channel and spatial dimensions. It is composed of channel Attention module (CAM) and spatial Attention module (SAM) CAM can make the network pay more attention to the foreground of the image and the meaningful area, while SAM can make the network pay more attention to the position rich in context information in the whole picture [23,24]."

to

"CBAM [22] is a lightweight attention module, which can perform attention in the channel and spatial dimensions. It is composed of channel attention module (CAM) and spatial attention module (SAM). CAM can make the network pay more attention to the foreground of the image and the meaningful area, while SAM can make the network pay more attention to the position rich in context information in the whole picture [23,24]."

Clean-up/rephrase lines 245 to 250, perhaps in shorter sentences, to explain in which places of the detection pipeline the insertion of CBAM might worsen the total performance and where it would improve it.

Correct line 255 (add spaces between words and abbreviations)

"subjected to a global max pooling(GMP) and a global average pooling(GAP), and two"

to

"subjected to a global max pooling (GMP) and a global average pooling (GAP), and two"

Add period at end of line 260.

Line 334 and 335 - Not exact. Change to:

"As can be seen from Table 2, YOLOv7 is overall better than other detection algorithms tested, leading in terms of precision, F1-score, mAP@0.5 and a close second in terms of Recall, mAP@0.5:0.95 and detection speed."

Line 352: "Calculate pressure." What is the meaning of this sentence here?

Line 355: "Algorithms are flat." What is the purpose of this sentence here?

Line 368: "conduct experiment." What is the purpose of this sentence here?

Line 384/385: "After multiple convolutions nuclei, merged into a single feature, so that small objects cannot be identified." Correct language / rephrase.

Line 385/386: Correct "Therefore, Headannotation’s method is not suitable for the task of count estimation on the hemp Duck dataset." to "Therefore, the Head annotation method is not suitable for the task of count estimation on the hemp Duck dataset."

Line 391/392

Correct "The prediction boxes in the left result picture all locate the whole body of the hemp duck, and the prediction box in the right result picture only locates the head of the hemp duck." to "The prediction boxes in the left panel of Figure 11 all locate the whole body of the hemp duck, and the prediction boxes in the right panel only locate the head of the hemp duck."

Line 402 Delete "Table 5" word.

Line 407 to 409 Rephrase "As can be seen from Table 5, under the action of different Tricks, each group of experiments obtained different experimental results. The data show that the 4th set of experiments is the best." to

"As can be seen from Table 5, overall the simultaneous use of Mosaic and MixUp fares better than using just one Trick or none."

Line 421: Delete "very important impact"

Line 425: Delete sentence "An important..."

Line 432: Delete "for the problem"

Line 435 to 438: Rephrase, shorter sentences, improve language

Line 471 to 475: Shorter sentences.

Line 484: "can be collected personally"? Are you referring to the possibility that the data is open to be made available upon request?

Author Response

Response to Reviewer 1 Comments

Point 1: Rephrase/correct sentence line 64/65/66 "At present, in the hemp ducks farming industry, a lot of counting work by hand, artificial mechanical counting laborious and laborious." Perhaps the authors meant: "At present, in the hemp ducks farming industry, a lot of counting is done manually or by artificial mechanical, both being very laborious." ???

Response 1: We are terribly sorry; this is our mistake. Thank you very much for pointing it out. We greatly appreciate your suggested modification.

Point 2: Correct sentence line 72/73: "With the development of technology, monitoring equipment plays a huge role in the farm." to "With the development of technology, monitoring equipment plays a huge role on the farm."

Response 2: We are sorry; this is again our mistake. We appreciate your correction and have revised the manuscript accordingly!

Point 3: Line 99 "results" to "results." The authors state that "In summary, the main contributions of this study are:

1. The dataset used in this study is the first release, and we constructed a new large-scale hemp ducks image dataset, which contains 1500 hemp ducks object detection whole-body frame labeling and head frame labeling.
2. This study constructs a comprehensive working baseline, including: hemp ducks identification, hemp ducks object detection, and hemp ducks image counting, to realize the intelligent breeding of hemp ducks.
3. This project model is introducing CBAM module to build CBAM-YOLOv7 algorithm."

Response 3: This part states the main contributions of our paper. 1. Our dataset can be made public to provide data support for the hemp duck breeding industry. 2. Our workflow can help make the duck breeding industry more intelligent. 3. We improve on the YOLOv7 algorithm to improve the detection effect.

Point 4: Correct lines 237 to 239 "CBAM [22] is a lightweight Attention module, which can perform Attention in the channel and spatial dimensions. It is composed of channel Attention module (CAM) and spatial Attention module (SAM) CAM can make the network pay more attention to the foreground of the image and the meaningful area, while SAM can make the network pay more attention to the position rich in context information in the whole picture [23,24]." To "CBAM [22] is a lightweight attention module, which can perform attention in the channel and spatial dimensions. It is composed of channel attention module (CAM) and spatial attention module (SAM). CAM can make the network pay more attention to the foreground of the image and the meaningful area, while SAM can make the network pay more attention to the position rich in context information in the whole picture [23,24]."

Response 4: Sorry again, this is still our fault. We appreciate your correction and have revised the manuscript accordingly.

Point 5: Correct line 255 (add spaces between words and abbreviations) "subjected to a global max pooling(GMP) and a global average pooling(GAP), and two" to "subjected to a global max pooling (GMP) and a global average pooling (GAP), and two"

Response 5: Sorry again, this is still our fault. We appreciate your correction and have revised the manuscript accordingly!

Point 6: Add period at end of line 260.

Response 6: Sorry again, this is still our fault. We appreciate your correction and have revised the manuscript accordingly!

Point 7: Line 334 and 335 - Not exact. Change to: "As can be seen from Table 2, YOLOv7 is overall better than other detection algorithms tested, leading in terms of precision, F1-score, mAP@0.5 and a close second in terms of Recall, mAP@0.5:0.95 and detection speed."

Response 7: Sorry again, this is still our fault. We appreciate your correction and have revised the manuscript accordingly!

Point 8: Line 352: "Calculate pressure." What is the meaning of this sentence here?

Response 8: We are terribly sorry; this is our translation error. Our intended meaning is that the experimental results in Table 3 show that the SE-YOLOv7 and ECA-YOLOv7 algorithms are not only inferior to the original YOLOv7 but also increase the number of model parameters and the computational load. We are very sorry for the confusion caused by our errors in expression and translation. Thank you very much for your correction!

Point 9: Line 355: "Algorithms are flat." What is the purpose of this sentence here?

Response 9: We are terribly sorry; this is our translation error. Our intended meaning is that, according to the results in Table 3, the SE-YOLOv7 and ECA-YOLOv7 models are of the same order of magnitude in terms of FLOPS. Thank you very much for your correction. We have revised the content in the manuscript.

Point 10: Line 368: "conduct experiment." What is the purpose of this sentence here?

Response 10: I'm very sorry. This is our mistake in writing and translation. We have deleted "conduct experiment." from the manuscript. Thank you very much for your modification.

Point 11: Line 384/385: "After multiple convolutions nuclei, merged into a single feature, so that small objects cannot be identified." Correct language / rephrase.

Response 11: Thank you for your correction. We made some mistakes in the wording of this sentence. We have changed it to "After multiple convolution operations, only one feature may be generated, which leads to the failure of pixel recognition."

Point 12: Line 385/386: Correct "Therefore, Headannotation’s method is not suitable for the task of count estimation on the hemp Duck dataset." to "Therefore, the Head annotation method is not suitable for the task of count estimation on the hemp Duck dataset."

Response 12: I'm really sorry. This is our writing mistake. Thank you very much for your modification. We have revised the content in the manuscript!

Point 13: Line 391/392 Correct "The prediction boxes in the left result picture all locate the whole body of the hemp duck, and the prediction box in the right result picture only locates the head of the hemp duck." to "The prediction boxes in the left panel of Figure 11 all locate the whole body of the hemp duck, and the prediction boxes in the right panel only locate the head of the hemp duck."

Response 13: Thank you very much for your modification. We have revised the content in the manuscript.

Point 14: Line 402 Delete "Table 5" word.

Response 14: Sorry, this is our mistake. Thank you very much for your modification. We have revised the content in the manuscript.

Point 15: Line 407 to 409 Rephrase "As can be seen from Table 5, under the action of different Tricks, each group of experiments obtained different experimental results. The data show that the 4th set of experiments is the best." to "As can be seen from Table 5, overall the simultaneous use of Mosaic and MixUp fares better than using just one Trick or none."

Response 15: Thank you very much for your modification. We have revised the content in the manuscript.

Point 16: Line 421: Delete "very important impact"

Response 16: Sorry, this is our mistake. Thank you very much for your modification. We have revised the content in the manuscript.

Point 17: Line 425: Delete sentence "An important..."

Response 17: Sorry, this is our mistake. Thank you very much for your modification. We have revised the content in the manuscript.

Point 18: Line 432: Delete "for the problem"

Response 18: Sorry, this is our mistake. Thank you very much for your modification. We have revised the content in the manuscript.

Point 19: Line 435 to 438: Rephrase, shorter sentences, improve language

Response 19: Sorry, this is our mistake. Thank you very much for your modification. We have revised the content in the manuscript.

Point 20: Line 471 to 475: Shorter sentences.

Response 20: Sorry, this is our mistake. Thank you very much for your modification. We have revised the content in the manuscript.

Point 21: Line 484: "can be collected personally"? Are you referring to the possibility that the data is open to be made available upon request?

Response 21: Since the dataset was personally collected and processed by the team members, we are able and willing to release it, and we have added the method of obtaining the dataset at the end of the manuscript.

Author Response File: Author Response.docx

Reviewer 2 Report

Please read my attached file.

Comments for author File: Comments.pdf

Author Response

Response to Reviewer 2 Comments

Point 1: Line #16-19 Comments: “…this paper proposes an attention mechanism improved YOLOv7 algorithm CBAM-YOLOv7, adding three CBAM modules to the backbone network of YOLOv7 to improve the ability of the network to extract features, and introducing SE and ECA modules for comparison experiments.” I am confused about the EXACT contribution of the paper and what is the EXACT name(s) of such new improvement that should include “SE” and ECA.

Response 1: We apologize for the confusion. Let us explain our contributions to this manuscript.

1. We collected a large number of images and videos of hemp ducks from the original waterfowl farm in Ya'an City, Sichuan Province, China, and manually screened out redundant data to select 1500 images. We annotated both the whole body and the head of the ducks, constructing a new large-scale hemp duck object detection dataset. It provides data support for the intelligent development of the hemp duck breeding industry.
2. Our study constructed a comprehensive working baseline, including duck object detection and recognition, and duck image counting. We also carried out experiments to analyze the advantages and disadvantages of whole-body labeling and head labeling. To a certain extent, we have realized the intelligent development of the hemp duck industry, and the approach can be extended to chicken, goose and other poultry breeding industries.
3. We improved the YOLOv7 algorithm. We introduced three attention mechanism modules (CBAM, ECA and SE) and conducted a comparative test. The final experimental results show that the CBAM-YOLOv7 algorithm has the best detection effect.

We hope this explanation resolves your concern.

Next is the answer to the second question: I'm very sorry about the statement of "SE and ECA modules". This is a mistake in our writing and translation work. Thank you for your suggestions. We have revised them all to "SE-YOLOv7 and ECA-YOLOv7" in the manuscript.

We have completed the modification according to your suggestions. If there are still deficiencies, we will continue to modify. Thank you for your comments.

Point 2: Line #18-19 Comments：acronyms CBAM, SE, ECA

Response 2: Thank you very much for your suggestion. We have revised the acronyms in the manuscript according to your requirements.

Point 3: Line # 19-22 Comments: A long sentence of confusing results comparison, needs to be rewritten in a much clearer way identifying what is compared to what and what is already done and what is compared to the paper contribution. I am very confused especially after reading the result section! No need for micro % comparison details, you might state “slight improvement” instead of writing 0.5, 1.15, 0.6, 0.93 edge of one mechanism (state clearly) over others (state clearly) peer approaches, by “our approach” (if I am correct) showed a slight improvement over peer’s works.

Response 3: We are very sorry that our writing and translation have caused you confusion. We have modified this part to “The experimental results show that CBAM-YOLOv7 has higher precision, and the recall, mAP@0.5 and mAP@0.5:0.95 are slightly improved. The evaluation index values of CBAM-YOLOv7 improved more than those of SE-YOLOv7 and ECA-YOLOv7.”

We have completed the modification according to your suggestions. If there are still deficiencies, we will continue to modify. Thank you for your comments.

Point 4: Line # 20 Comments：“…. algorithm 1.04% …...”, You need “,” in between.

Response 4: I'm very sorry, this is our mistake. We have modified this content in the manuscript.

Point 5: Line # 26 Comments：what is “sisal” ?!

Response 5: I'm very sorry, this is our mistake. We have changed "sisal" to "hemp" in the manuscript.

Point 6: Line # 27 Comments：“…. intelligent farming industry.” Very general huge claim for the entire industry that is a bit exaggerating, lower the tone to the focus of the paper “smart reliable automated duck count”.

Response 6: I'm very sorry, this is our mistake. Our statement is too exaggerated and broad, and we have revised the content in the manuscript.

Point 7: Line # 51-59 Comments：Very long sentence, you might easily make it at least three sentences.

Response 7: Dear reviewer! We have revised it to "As countries around the world pay attention to the ecological environment, the development of waterfowl farming has been subject to certain restrictions and regulations. Many areas have been developed and restricted, and the spatial range suitable for farming hemp ducks has been shrinking [2]. At present, farming is developing in the direction of intensification and ecology. Large scale farming and higher rearing density will have a greater impact on the temperature, humidity, ventilation, harmful gases, dust, and microbial content of poultry houses. It indirectly has a series of adverse effects on the intake, growth performance, and animal welfare of birds. For example, unreasonable duck flock density will lead to poor living conditions, thus causing physiological diseases such as body abrasions, skin damage, and fractures. And considering animal behavior like pecking and fighting in the same kind, the unreasonable density will have a negative impact on the efficiency and economy of the livestock and poultry industry [3, 4]."

Point 8: Line # 90-94 Comments： Split into two sentences.

Response 8: Dear reviewer! We have modified it in the manuscript to "For example, the R-CNN proposed by Girshick et al. in 2014 introduced a two-stage detection method for the first time. This method uses deep convolutional networks to obtain excellent target detection accuracy, but its many redundant operations greatly increase the space and time costs, and it is difficult to deploy in actual duck farms [9, 10]. Law et al. proposed a single-stage (one-stage) detection method, CornerNet, and a new pooling method: corner pooling. However, the method based on key points often encounters a large number of incorrect object bounding boxes, which limits its performance and cannot meet the high performance requirements of the duck breeding model [11]".

Point 9: Line # 98 Comments: “ … processing time, ….” Remove “time”; it is understood and seems redundant!

Response 9: I'm very sorry, this is our mistake. Our wording is too redundant and complex, and we have revised the content in the manuscript.

Point 10: Line # 104 Comments：the acronym “SSD” spell it out.

Response 10: I'm very sorry, this is our mistake. We have added "Single Shot MultiBox Detector" to the manuscript.

Point 11: Line # 104-110 Comments： Another long sentence, divide to two or three, you stated that you are utilizing YOLOv7 detection in your model, but I did not see it mentioned before (above)!

Response 11: We are very sorry; this is our mistake. We should not have said here that we use the YOLOv7 algorithm for detection. We have modified it in the manuscript as follows: "At present, the main single-stage target detection algorithms include the YOLO series, Single Shot MultiBox Detector (SSD), RetinaNet, etc. In this paper, we will transfer and apply the idea of crowd counting based on CNN to the problem of counting ducks [13,14]."

Point 12: Line # 111 Comments：“…. a object ….” Use “an” instead of “a’.

Response 12: I'm very sorry, this is our mistake. We have revised it in the manuscript.

Point 13: Line # 112-114 Comments：You need citations to some references where “object counting” is utilized/explained.

Response 13: Dear reviewer! For this problem, we added 4 relevant references.

Point 14: Line # 116-117 Comments：The term “dataset” is used twice, are they the same dataset? Make it clear to the readers, same or different!

Response 14: We are very sorry; this is our mistake. Our statement was too redundant, which caused confusion. In the manuscript, we have modified it to "We have built a new large-scale dataset of hemp duck images and named it the "Hemp Duck Dataset". The Hemp Duck Dataset contains 1500 whole-body frame and head frame labels for duck target detection. The Hemp Duck Dataset is released for the first time by the team. We will make it public and give the access method at the end of the article."

Point 15: Line # 147-150 Comments：Good to admit some difficulty in the process, but “…...two objects are too close together, it is highly likely that prediction of object A will be completely filtered out….” Vague! What is object A? Please rewrite in a clearer way and remove the confusion.

Response 15: We are very sorry; this is our mistake. Our statement was too vague, which caused confusion.

In the manuscript, we have modified it to "We will name two different ducks hemp duck A and hemp duck B. When hemp duck A and hemp duck B are too close, the prediction box of hemp duck A may be eliminated by the screening of non-maximum suppression. Therefore, it is a challenge to accurately estimate the number of ducks in the dense Hemp Duck Dataset with occlusion."
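The failure mode described in this response can be illustrated with a minimal sketch of greedy non-maximum suppression. This is not the authors' implementation: the box coordinates, the scores and the 0.5 threshold below are hypothetical values chosen only to show how the lower-scoring of two heavily overlapping boxes is discarded.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score and keep a box only if
    it does not overlap an already kept box by more than iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical detections: duck A and duck B stand close together,
# duck C stands apart from both.
boxes = [(0, 0, 10, 10),    # duck A
         (2, 0, 12, 10),    # duck B, overlaps A with IoU = 80 / 120
         (30, 30, 40, 40)]  # duck C
scores = [0.8, 0.9, 0.7]
kept = nms(boxes, scores)   # indices of the surviving boxes
print(len(kept), "ducks counted from", len(boxes), "detections")
```

With these values the box for duck A is suppressed by the higher-scoring box for duck B, so the count is one short, which is the occlusion problem the response refers to.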

Point 16: Line # 149-150 Comments：“….obscured hemp ducks’ dataset and for achieving accurate counts of individual hemp ducks.” A bit vague, please rewrite.

Response 16: I'm very sorry, this is our mistake. Our statement is too vague, which brings you confusion. We have revised the content in the manuscript.

Point 17: Line # 212 Comments: “3. (1)The batch……” What is (1)? Does it refer to something before?

Response 17: Dear reviewer, "The batch" here should be read as "The batch normalization layer". The phrase is a single unit and cannot be read separately. The batch normalization (BN) layer can accelerate the convergence of the network.

In the manuscript, we changed "The batch normalization layer" to "The Batch Normalization Layer".

Point 18: Line # 218 Comments: “[20] (3) EMA …” What is (3)? Does it refer to something before?

Response 18: Dear reviewer, (3) here is the third part of this section and belongs to the same level as (1) and (2) above.

We are very sorry for the confusion caused by our typesetting problem. We have revised the typesetting in the manuscript.

Point 19: Line # 229 Comments：“…….fields [???].” missing citation a reference for your claim of the Attention mechanism.

Response 19: Dear reviewer! For this problem, we added a relevant reference.

Point 20: Line # 245-250 Comments：A very long sentence, make two or three sentences.

Response 20: Dear reviewer! For this problem, we have modified it to "In the spatial attention module, the feature map in the previous step is used as the input. After GMP and GAP, two feature maps of size H × W × 1 are obtained. Then Concat operation is performed. After dimensionality reduction of feature map, spatial attention feature is generated by sigmoid activation. Finally, the spatial attention feature is multiplied with the input feature map to obtain the final feature map [22]."
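The spatial attention steps quoted in Response 20 can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the authors' code: feature maps are nested lists of shape C × H × W, the "dimensionality reduction" is assumed to be a single k × k convolution over the two pooled maps (the CBAM paper uses a 7 × 7 kernel), and the kernel weights here are hypothetical, whereas in a trained network they are learned. A real implementation would use a tensor library.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def spatial_attention(feature, kernel):
    """feature: C x H x W nested lists.
    kernel: 2 x k x k weights (k odd), one k x k slice per pooled map."""
    C, H, W = len(feature), len(feature[0]), len(feature[0][0])
    k = len(kernel[0])
    pad = k // 2

    # Max and average pooling along the channel axis give two H x W maps.
    max_map = [[max(feature[c][i][j] for c in range(C)) for j in range(W)]
               for i in range(H)]
    avg_map = [[sum(feature[c][i][j] for c in range(C)) / C for j in range(W)]
               for i in range(H)]
    pooled = [max_map, avg_map]  # the "Concat" step: 2 x H x W

    # A k x k convolution with zero padding reduces 2 channels to 1,
    # followed by a sigmoid to obtain the spatial attention map.
    attn = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            s = 0.0
            for m in range(2):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < H and 0 <= x < W:
                            s += kernel[m][di][dj] * pooled[m][y][x]
            attn[i][j] = sigmoid(s)

    # Every channel of the input is rescaled by the same H x W map.
    out = [[[feature[c][i][j] * attn[i][j] for j in range(W)]
            for i in range(H)] for c in range(C)]
    return out, attn


# Hypothetical 2 x 2 x 2 input and an all-zero 3 x 3 kernel.
feature = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
zero_kernel = [[[0.0] * 3 for _ in range(3)] for _ in range(2)]
out, attn = spatial_attention(feature, zero_kernel)
```

With an all-zero kernel every attention value is sigmoid(0) = 0.5, so the output is simply the input halved; non-zero weights make the map emphasize some positions over others.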

Point 21: Line # 243 Comments: “CBAM attention mechanism is added to YOLOV7 network structure [13,22]” Then what is the contribution of the paper if it is already added by others [13,22]?

Response 21: Dear reviewer! Before submitting the paper, we surveyed related work in the field. To our knowledge, no existing work adds the CBAM module to the YOLOv7 algorithm. Moreover, even if some peers have done this work, they have not applied it to the Hemp Duck Dataset for count estimation. Our contribution is to provide a new idea and direction for the intelligent development of the duck breeding industry.

Point 22: Line # 246 Comments：“………, if add attention mechanism to the backbone network part….”English, please rewrite.

Response 22: We are very sorry; this is our mistake. We have modified the content in the manuscript to "Once we add the attention mechanism to the backbone network, the attention mechanism module will destroy some of the original weights of the backbone network. This will lead to errors in the prediction results of the network. In this regard, we choose to add the attention mechanism to the enhanced feature extraction network, without destroying the original features extracted by the network."

Point 23: Line # 248-249 Comments：“…may lead to network prediction effect…”English, please rewrite.

Response 23: I'm very sorry, this is our mistake. We have revised the content in the manuscript to "This will lead to errors in the prediction results of the network."

Point 24: Line # 250 Comments： “…features, It does not destroy..” “It” typo.

Response 24: I'm very sorry, this is our mistake. We have revised the content in the manuscript.

Point 25: Line # 256 Comments: “The two feature maps were fed into a two-layer neural network respectively. Then, the output features were added based on element-wise, and the final channel attention feature was generated after sigmoid activation operation.” You need to expand and clarify the vagueness in “two-layer neural network respectively”; what type of two-layer NN? “Sigmoidal activation” needs to be clarified for non-NN experts.

Response 25: We are very sorry; this is our mistake. We have modified the content in the manuscript to "Send the two feature maps to a two-layer multilayer perceptron (MLP). The number of neurons in the first layer of the MLP is C/r (r is the reduction rate), and the activation function is ReLU; the number of neurons in the second layer is C. The weights of this two-layer network are shared between the two feature maps. Then, the output features are added element-wise, and the final channel attention feature is generated through sigmoid activation. Finally, the channel attention feature is multiplied with the original input feature map to obtain the input feature of the spatial attention module [10]."
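The channel attention computation described in Response 25 can be sketched in plain Python as below. This is an illustration only: the weight matrices w1 (shape C/r × C) and w2 (shape C × C/r) are hypothetical hand-picked values, biases are omitted for brevity, and the sigmoid is the logistic function 1 / (1 + e^(-x)), which maps any real number into (0, 1) so that each channel receives a weight between 0 and 1.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def shared_mlp(v, w1, w2):
    """Two-layer MLP: C -> C/r with ReLU, then C/r -> C (no biases)."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(feature, w1, w2):
    """feature: C x H x W nested lists; returns (rescaled feature, weights)."""
    # Global max pooling and global average pooling: one value per channel.
    gmp = [max(max(row) for row in ch) for ch in feature]
    gap = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
           for ch in feature]

    # The same MLP (shared weights) processes both pooled vectors.
    a = shared_mlp(gmp, w1, w2)
    b = shared_mlp(gap, w1, w2)

    # Element-wise sum, then sigmoid, gives one weight per channel.
    weights = [sigmoid(x + y) for x, y in zip(a, b)]

    # Each channel of the input is multiplied by its weight.
    out = [[[v * weights[c] for v in row] for row in ch]
           for c, ch in enumerate(feature)]
    return out, weights


# Hypothetical example with C = 2 channels and reduction rate r = 2.
feature = [[[1.0, 1.0], [1.0, 1.0]],   # channel 0
           [[0.0, 0.0], [0.0, 0.0]]]   # channel 1
w1 = [[1.0, 0.0]]        # C/r x C = 1 x 2
w2 = [[1.0], [-1.0]]     # C x C/r = 2 x 1
out, weights = channel_attention(feature, w1, w2)
```

In this toy case the summed MLP outputs are 2 and -2, so channel 0 receives a weight above 0.5 and channel 1 a weight below 0.5.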

Point 26: Line # 271&285 Comments： SE [25] & ECA [26] are already there, the authors just borrowed them to replace CBAM!

Response 26: We are very sorry; this is our mistake. Our statement was incorrect. In order to evaluate the effect of the CBAM-YOLOv7 algorithm, we used SE and ECA modules in place of the CBAM modules for comparison experiments.

Point 27: Line # 315 Comments： “In Pascal VOC 2008[27], the threshold value of IOU is set to 0.5.” What is IOU?

Response 27: Dear reviewer! In object detection, there is a commonly used indicator, IoU (Intersection over Union), which is often used to measure the accuracy of location information of prediction results in target detection tasks.
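For readers unfamiliar with the metric, IoU is the area of overlap between a predicted box and a ground-truth box divided by the area of their union. The sketch below assumes boxes are given as (x1, y1, x2, y2) corner coordinates; the example boxes are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical prediction and ground-truth boxes.
prediction = (0, 0, 2, 2)
ground_truth = (1, 1, 3, 3)
score = iou(prediction, ground_truth)  # overlap 1, union 7
is_match = score >= 0.5                # the Pascal VOC threshold quoted above
```

Under the 0.5 threshold mentioned in the reviewer's quote, this prediction would not count as a correct detection, because the two boxes share only one seventh of their combined area.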

Point 28: Line # Table 2 Comments: What are the YOLOv4s & YOLOv5s ???!!! You need to justify such small edge of slight improvement in some performance (not all).

Response 28: Dear reviewer! YOLOv5 has four structures: YOLOv5s, YOLOv5m, YOLOv5l and YOLOv5x. The four structures differ in their depth_multiple and width_multiple parameters. For YOLOv5s, the parameters are depth_multiple = 0.33 and width_multiple = 0.50. YOLOv4s is consistent with YOLOv5s.

Regarding the improvement in several of the metrics: in the field of deep learning, small improvements can also bring huge benefits. However, we have simplified the text as you requested. We have revised it to "For example, the recall rate of the YOLOv7 algorithm is 15.6% higher than that of YOLOv4. The remaining indicators are basically superior to other target detection algorithms. Finally, we choose YOLOv7 as the target detection algorithm used in the experiment."
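The role of the depth_multiple and width_multiple parameters named in Response 28 can be sketched as follows. The scaling rule below is an assumption modelled on the public YOLOv5 code (block repeats are multiplied by depth_multiple and rounded, channel counts are multiplied by width_multiple and rounded up to a multiple of 8); the function name scale_layer and the base values 9 and 512 are hypothetical.

```python
import math


def make_divisible(x, divisor=8):
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def scale_layer(repeats, channels, depth_multiple, width_multiple):
    """Scale one layer's repeat count and output channels.

    Assumed rule: repeats > 1 are scaled by depth_multiple (at least 1),
    channels are scaled by width_multiple and kept divisible by 8.
    """
    n = max(round(repeats * depth_multiple), 1) if repeats > 1 else repeats
    c = make_divisible(channels * width_multiple, 8)
    return n, c


# A hypothetical base layer with 9 repeated blocks and 512 output channels,
# scaled with the YOLOv5s values quoted in the response.
small = scale_layer(9, 512, depth_multiple=0.33, width_multiple=0.50)
print(small)
```

Under this assumed rule the "s" variant keeps roughly a third of the repeated blocks and half of the channels of the base configuration, which is why it is the smallest and fastest of the four structures.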

Point 29: Line # Table 3 Comments： Also in Table 3 shows CBAM with higher performance than SE & ECA, how does this help the cause of the paper?!

Response 29: Dear reviewer! The data results in Table 3 show that the detection effect of CBAM-YOLOv7 is better than that of SE-YOLOv7 and ECA-YOLOv7. This shows that the choice of adding CBAM module to YOLOv7 is correct. And the effect of CBAM-YOLOv7 is better than YOLOv7, which proves that our improvement is meaningful.

Author Response File: Author Response.docx

Reviewer 3 Report

The authors have presented a paper on "An Attention Mechanism Improved YOLOv7 Object Detection Algorithm for Hemp Duck Count Estimation".

In summary, the main contributions from the authors are:

1. The dataset used in their study is the first release, and they constructed a new large scale hemp ducks image dataset, which contains 1500 hemp ducks object detection whole-body frame labeling and head frame labeling.

2. Their study constructed a comprehensive working baseline, including: hemp ducks identification, hemp ducks object detection, and hemp ducks image counting, to realize the intelligent breeding of hemp ducks.

3. Their model introduces the CBAM module to build the CBAM-YOLOv7 algorithm.

Author Response

Response to Reviewer 3 Comments

Point 1: "An Attention Mechanism Improved YOLOv7 Object Detection Algorithm for Hemp Duck Count Estimation". In summary, the main contributions from the authors are:

1. The dataset used in their study is the first release, and they constructed a new large scale hemp ducks image dataset, which contains 1500 hemp ducks object detection whole-body frame labeling and head frame labeling.
2. Their study constructed a comprehensive working baseline, including: hemp ducks identification, hemp ducks object detection, and hemp ducks image counting, to realize the intelligent breeding of hemp ducks.
3. Their model introduces the CBAM module to build the CBAM-YOLOv7 algorithm.

Response 1: We apologize for the confusion. Let us explain our contributions to this manuscript.

1. We collected a large number of images and videos of hemp ducks from the original waterfowl farm in Ya'an City, Sichuan Province, China, and manually screened out redundant data to select 1500 images. We annotated both the whole body and the head of the ducks, constructing a new large-scale hemp duck object detection dataset. It provides data support for the intelligent development of the hemp duck breeding industry.
2. Our study constructed a comprehensive working baseline, including duck object detection and recognition, and duck image counting. We also carried out experiments to analyze the advantages and disadvantages of whole-body labeling and head labeling. To a certain extent, we have realized the intelligent development of the hemp duck industry, and the approach can be extended to chicken, goose and other poultry breeding industries.
3. We improved the YOLOv7 algorithm. We introduced three attention mechanism modules (CBAM, ECA and SE) and conducted a comparative test. The final experimental results show that the CBAM-YOLOv7 algorithm has the best detection effect.

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

The authors addressed all of my line-by-line comments!

Author Response

Thank you for your suggestions on the revision of our manuscript.
