Article

AC R-CNN: Pixelwise Instance Segmentation Model for Agrocybe cylindracea Cap

Hua Yin, Shenglan Yang, Wenhao Cheng, Quan Wei, Yinglong Wang and Yilu Xu
1 School of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang 330045, China
2 School of Software, Jiangxi Agricultural University, Nanchang 330045, China
* Authors to whom correspondence should be addressed.
Agronomy 2024, 14(1), 77; https://doi.org/10.3390/agronomy14010077
Submission received: 27 November 2023 / Revised: 26 December 2023 / Accepted: 26 December 2023 / Published: 28 December 2023
(This article belongs to the Section Precision and Digital Agriculture)

Abstract

The popularity of Agrocybe cylindracea is increasing due to its unique flavor and nutritional value. The cap is key to many aspects of the growth process of Agrocybe cylindracea, and high-throughput observation of cap traits in greenhouses by machine vision is a future development trend of smart agriculture. Nevertheless, segmentation of the Agrocybe cylindracea cap is extremely challenging due to its similarity in color to the rest of the mushroom and the occurrence of mutual occlusion, presenting a major obstacle to the effective application of automation technology. To address this issue, we propose an improved instance segmentation network called Agrocybe cylindracea R-CNN (AC R-CNN) based on the Mask R-CNN model. AC R-CNN incorporates hybrid dilated convolution (HDC) and attention modules into the feature extraction backbone network to enhance the segmentation of adhering mushroom caps and to focus on the objects to be segmented. Furthermore, the Mask Branch module is replaced with PointRend to improve the network’s segmentation accuracy at the edges of the mushroom caps. These modifications effectively address the original algorithm’s inability to segment adhering Agrocybe cylindracea caps and its low accuracy in edge segmentation. The experimental results demonstrate that AC R-CNN outperforms the original Mask R-CNN: the average precision (AP) improves by 12.1 percentage points and the F1 score by 13.7 percentage points. AC R-CNN also outperforms other networks such as Mask Scoring R-CNN and BlendMask. The findings of this study can therefore meet the high-precision segmentation requirements of Agrocybe cylindracea caps and lay a theoretical foundation for the development of subsequent intelligent phenotyping devices and harvesting robots.

1. Introduction

Agrocybe cylindracea, also known as chaxingu, is highly regarded for its nutritional richness, delicious taste, strong aroma, and unique flavor [1,2]. It has been found to possess certain anti-cancer properties [3], anti-tumor effects [4,5,6], and the ability to improve chronic diseases [7,8]. Consequently, it has gained popularity among many consumers. Furthermore, recent studies have shown that polysaccharides can be extracted from the discarded Agrocybe cylindracea substrate, which has significant commercial value [9]. As a result, an increasing number of businesses are now cultivating Agrocybe cylindracea, leading to the establishment of large-scale Agrocybe cylindracea production areas, primarily in Jiangxi province [10].
With increasing market demand, improving product quality and yield has become a key issue for the Agrocybe cylindracea industry [11]. Agronomic traits require attention throughout the cultivation of Agrocybe cylindracea, and the cap is central to many of them. By observing the distribution, width, and thickness of each cap, mushroom farmers can judge the quality of the mushrooms and adjust the environmental conditions of the greenhouse more appropriately. At present, however, automated phenotyping equipment has difficulty segmenting Agrocybe cylindracea caps because of their similarity in color to the rest of the mushroom and their mutual occlusion, so manual observation is employed instead. Obviously, this method is time-consuming. Therefore, to meet the demands of automated phenotyping technology, detecting and segmenting Agrocybe cylindracea caps in a non-destructive and rapid manner has become a challenging problem that urgently needs to be solved.
In the field of computer vision, distinguishing individual instances of the same class in an image falls into the category of instance segmentation. With advances in computer hardware and deep learning, an increasing number of instance segmentation networks have been proposed, leading to more accurate results. Fang et al. [12] proposed InstaBoost, a simple and efficient method for augmenting existing instance mask datasets; incorporating it improved the mean average precision (mAP) of an instance segmentation network by 2.2% on the COCO dataset. In most segmentation tasks, the confidence of instance classification is used as the quality score for masks; however, the intersection over union (IOU) between predicted mask instances and the ground truth often correlates weakly with classification confidence. To address this issue, Huang et al. [13] introduced Mask Scoring R-CNN, which learns to predict the quality of masks. By integrating instance features with the corresponding predicted masks and refining the mask IOU, this method reached an accuracy of 39.6% on the COCO dataset. Moreover, Bolya et al. [14] proposed YOLACT, a fully convolutional model that improves real-time segmentation speed without substantial loss of accuracy. It adds a mask branch to a one-stage object detection algorithm and splits instance segmentation into two parallel branches without any ROI pooling operation; the prototype masks and per-instance mask coefficients are then linearly combined to produce the output, achieving an AP of 29.8% at 33.5 FPS on the COCO dataset. Chen et al. [15] combined top-down and bottom-up ideas on the anchor-free detector FCOS, adding a bottom module to extract low-level detail features and predict instance-level attention features. Their BlendMask model draws on the fusion methods of FCIS and YOLACT and ultimately reaches an accuracy of 41.3% on COCO. Moreover, Fang et al. [12] proposed QueryInst, a multi-stage end-to-end network that treats instances of interest as learnable queries, exploiting the one-to-one correspondence of an object query across stages and between the mask ROI features and the object within a stage. This method achieved better results on the COCO, CityScapes, and YouTube-VIS datasets; both its box AP and mask AP on COCO are two percentage points higher than those of HTC [16].
Mask R-CNN is a classic instance segmentation network proposed in 2017 [17]. It extends the Faster R-CNN [18] network by adding a parallel mask prediction branch alongside bounding box detection. It has been widely applied in various scenarios, and its performance and robustness have been well validated. In agricultural scenarios, existing studies have demonstrated that satisfactory results can be obtained by improving Mask R-CNN. For instance, researchers proposed an improved convolutional neural network based on Mask R-CNN that achieved precise segmentation of cucumbers [19]; they improved the region proposal network (RPN) module and the aspect ratios of the anchor boxes, reaching an F1 score of 89.47%. To enhance the real-time performance of strawberry segmentation, Pérez-Borrero et al. [20] improved Mask R-CNN by removing the original classifier and bounding box regressor and replacing the non-maximum suppression algorithm with a new region grouping and filtering algorithm, significantly reducing the inference time and providing a foundation for an automatic strawberry harvesting system. Because ripe green tomatoes are frequently occluded by leaves and other tomatoes, Zu et al. [21] applied bilinear interpolation (ROI Align) within the region of interest (ROI) to calculate the target area, pooling the region corresponding to the coordinates of each pre-selected box on the feature map to a fixed size; as a result, both the mask and the box achieved an F1 score of 92.0%. Wang and He [22] applied different image enhancement processes to an apple dataset and added an attention module to the Mask R-CNN backbone network to enhance its feature extraction capability, realizing the segmentation of apples in complex backgrounds with an mAP of 91.7%. To accurately recognize cherry tomatoes, Xu et al. [23] made several improvements to Mask R-CNN: first, RGB and depth data were fused by improving the input layer; next, a corresponding region generation network was constructed to represent the overall constraints between fruits and stems, reducing the misidentification of branches; finally, a multiclass prediction network was used to decouple the pixel-level predictions of tomatoes and stems. The improved Mask R-CNN achieved an accuracy of 93.76% for fruit recognition. Furthermore, Cong et al. [24] incorporated an attention mechanism into the backbone network of Mask R-CNN to enhance feature extraction; the F1 score achieved on bell pepper segmentation reached 98.8%.
The best-known research in this field has focused on applying instance segmentation networks to fruits and vegetables such as apples, tomatoes, and cucumbers [25,26,27,28,29,30,31,32], mainly addressing the segmentation of fruit targets from dense foliage. In contrast, segmenting Agrocybe cylindracea caps often requires overcoming mutual occlusion, which many existing algorithms do not adequately address. In this study, we propose an improved Mask R-CNN called AC R-CNN, which segments Agrocybe cylindracea caps in real-world situations under challenging conditions such as dense growth, similar colors, and mutual occlusion. This paper is organized as follows. First, the source of the experimental data is introduced. Then, the improvements to the network in three aspects (focusing on the segmentation objects, enhancing the distinction of overlapping caps, and improving the edge segmentation of caps) are described. Finally, the effectiveness of AC R-CNN is analyzed via ablation, comparative, and generalization experiments.

2. Materials and Methods

2.1. Data Acquisition

The cap data of Agrocybe cylindracea were obtained from the training base of the School of Biological Science and Engineering, Jiangxi Agricultural University. To improve the generalization ability of the proposed model, 8 varieties were selected: 4 wild strains (JAUCC 0727, JAUCC 0532, JAUCC 2133, and JAUCC 2135) and 4 common strains (JAUCC 1847, JAUCC 1920, JAUCC 1852, and JAUCC 2110), all provided by the Jiangxi Agricultural University Edible and Medicinal Fungi Technology Engineering Research Center (JAUCC). Twenty mushroom bags were prepared for each variety according to the mushrooming time, and images of the Agrocybe cylindracea caps were collected from September to November 2022.
Data acquisition components were constructed in the mushroom greenhouse, as shown in Figure 1. We fixed the holder of the camera (Redmi K20 Pro, Xiaomi Technology Co., Ltd., Beijing, China) 40 cm from the mushroom bag and ensured that the camera was at the same height as the mushroom caps. A total of 3885 caps were collected over the whole cycle, in JPEG images with a resolution of 3000 × 4000.
The images were annotated with LabelMe (4.5.13). In total, 3367 caps from the different varieties were used as the training set, while the remaining 518 caps were used as the test set. The composition of the dataset is shown in Table 1.
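To make this preparation step concrete, the following is a minimal sketch of converting one LabelMe annotation file into COCO-style records, as required for training (see Section 2.3). The function name, the single "cap" category, and the box-area stand-in are illustrative assumptions, not the conversion script used in this study.

```python
import json

def labelme_to_coco(labelme_path, image_id=1, category_id=1):
    """Convert one LabelMe JSON file into a COCO image record plus
    polygon annotations (illustrative helper, assumed names)."""
    with open(labelme_path) as f:
        lm = json.load(f)
    image = {
        "id": image_id,
        "file_name": lm["imagePath"],
        "height": lm["imageHeight"],
        "width": lm["imageWidth"],
    }
    annotations = []
    for ann_id, shape in enumerate(lm["shapes"], start=1):
        xs = [p[0] for p in shape["points"]]
        ys = [p[1] for p in shape["points"]]
        poly = [v for point in shape["points"] for v in point]  # flatten [[x, y], ...]
        annotations.append({
            "id": ann_id,
            "image_id": image_id,
            "category_id": category_id,
            "segmentation": [poly],
            "bbox": [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)],
            "area": (max(xs) - min(xs)) * (max(ys) - min(ys)),  # box area as a rough stand-in
            "iscrowd": 0,
        })
    return image, annotations
```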

2.2. AC R-CNN Model

Mask R-CNN is a popular instance segmentation network that effectively detects objects and outputs high-quality instance segmentation masks. The main process involves extracting feature maps from input images using the ResNet [33] backbone network and a feature pyramid network (FPN). These feature maps are then used to set ROIs for binary (foreground/background) classification in the RPN. After aligning the original image and the feature maps, regression and mask generation are performed on these ROIs. However, the similar colors of the cap and the rest of the mushroom, together with mutual occlusion between Agrocybe cylindracea caps, prevent the original Mask R-CNN from achieving high accuracy on our dataset. In addition, owing to limitations of Mask R-CNN itself, it segments object edges poorly and cannot focus on the objects that need to be segmented, so it fails to meet practical requirements. To solve these problems, we improved Mask R-CNN in the following three steps:
First, we replaced the pooling layer in the ResNet backbone with a dilated convolutional layer, which helps to preserve more details in the extracted features to differentiate between two occluded mushroom caps. Secondly, we added an attention module to the ResNet backbone, enabling the network to focus more on the cap that needed to be segmented. Finally, we introduced the PointRend module into the mask prediction, which enhances the fine-grained edge segmentation of the cap predicted masks.
The improved network is called AC R-CNN (Agrocybe cylindracea R-CNN), and its overall structure is shown in Figure 2.

2.2.1. Backbone Improvements

(1) Add dilated convolution module
Mask R-CNN uses ResNet and FPN as the backbone network to extract features from the input image. During feature extraction, ResNet applies pooling operations to the input features; these operations lose some feature information, so restoration during upsampling is less effective [24]. When segmenting individual Agrocybe cylindracea caps, the original Mask R-CNN focuses mainly on local details. However, occlusion between caps makes them hard to distinguish locally, so the network has difficulty separating individual caps even though they can be distinguished from their overall contours (as shown in Figure 3). This is one reason for the lower accuracy of the original network. A larger receptive field would help, but directly increasing the size of the convolutional kernels would significantly increase computational complexity. Therefore, we need to include more information within the same convolutional region, combining the global contour features of the mushroom caps with the local features of individual caps to differentiate between two occluded caps.
Dilated convolution increases the receptive field without changing the size of the convolutional kernel by inserting gaps between the elements of the kernel. However, traditional dilated convolution suffers from disjointed receptive fields (the gridding effect), which may overlook detailed features in some cases. To address this problem, Wang et al. [34] proposed hybrid dilated convolution (HDC), which applies multiple convolutional kernels with different dilation factors consecutively, as shown in Figure 4. For a sequence of $n$ consecutive convolutional kernels of size $K \times K$ with dilation factors $[r_1, \ldots, r_n]$, the set of dilation factors satisfies the HDC condition when $M_2 \le K$, allowing the output layer to cover a non-hole rectangular region in the underlying layers. $M_i$ is calculated by Equation (1):

$$M_i = \max\big[\, M_{i+1} - 2r_i,\; 2r_i - M_{i+1},\; r_i \,\big], \quad i = 1, \ldots, n-1 \tag{1}$$

where $M_i$ denotes the maximum distance between two non-zero elements in the $i$th layer, $r_i$ denotes the dilation factor of the $i$th layer, $n$ denotes the last layer, and $M_n = r_n$. In this experiment, a set of $3 \times 3$ convolutional kernels with dilation factors $[1, 2, 2]$ is used; by Equation (1), $M_2 = 2 \le K = 3$, so the HDC condition is satisfied. With these coefficients, the receptive field increases to 11 × 11, as shown in Figure 5: Figure 5a shows the receptive field of three consecutive convolutions with dilation factor 1, and Figure 5b shows the receptive field after using HDC with dilation factors [1, 2, 2].
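As a quick check of Equation (1), the short helper below (illustrative, not part of the original implementation, including the [2, 4, 8] counter-example) computes $M_2$ for a candidate set of dilation factors and tests the HDC condition $M_2 \le K$:

```python
def hdc_m2(rates):
    """Apply Equation (1) from i = n-1 down to i = 2, starting from
    M_n = r_n, and return M_2 (the maximum gap between non-zero
    elements seen by layer 2)."""
    m = rates[-1]                        # M_n = r_n
    for r in reversed(rates[1:-1]):      # i = n-1, ..., 2
        m = max(m - 2 * r, 2 * r - m, r)
    return m

K = 3  # kernel size
for rates in ([1, 2, 2], [2, 4, 8]):
    m2 = hdc_m2(rates)
    verdict = "satisfies" if m2 <= K else "violates"
    print(f"dilation factors {rates}: M2 = {m2}, {verdict} M2 <= K")
# dilation factors [1, 2, 2]: M2 = 2, satisfies M2 <= K
# dilation factors [2, 4, 8]: M2 = 4, violates M2 <= K
```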
(2) Add attention module
During training, the complex and intricate nature of the input features often makes it challenging for the network to focus on important and relevant information. Inspired by human cognition, in which attention is selectively directed towards specific regions of interest while irrelevant or less important information is ignored, attention mechanisms have been introduced into deep learning as modules. Attention modules allow models to focus on relevant parts of the data and to adaptively adjust the weights and attention levels of different regions, thereby improving model performance. Commonly used attention mechanisms include CBAM [35], SE [36], and CA [37]. Incorporating these mechanisms can enhance network performance, but it can also increase computational complexity and slow the network down. The ECA (efficient channel attention) mechanism, proposed by Wang et al. [38], computes attention weights only in the channel dimension, resulting in lower computational complexity and faster runtime, and it exhibits a certain level of robustness; experimental results have shown that its performance is comparable to other attention mechanisms. Therefore, this work adopts the ECA mechanism and incorporates it into the bottleneck module of ResNet, as shown in Figure 6. This allows the network to attend to regions of interest without compromising the original network structure.
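For concreteness, the following is a minimal PyTorch sketch of an ECA block of the kind adopted here, following Wang et al. [38]; the kernel size of 3 and the demo tensor shapes are illustrative assumptions rather than values from this study:

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient channel attention: a 1D convolution slides across the
    pooled channel descriptors, with no dimension reduction."""
    def __init__(self, k_size: int = 3):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size,
                              padding=(k_size - 1) // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        y = self.avg_pool(x)                              # (B, C, 1, 1)
        y = self.conv(y.squeeze(-1).transpose(-1, -2))    # 1D conv over channels
        y = self.sigmoid(y.transpose(-1, -2).unsqueeze(-1))
        return x * y                                      # channel-wise reweighting

x = torch.randn(2, 256, 32, 32)    # e.g., a bottleneck feature map
print(ECA()(x).shape)              # torch.Size([2, 256, 32, 32])
```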

2.2.2. Add PointRend Module

Mask R-CNN obtains mask information via a fully convolutional network (FCN). FCN is a typical end-to-end network that processes images through convolution and pooling operations and then, via deconvolution, upsamples the feature map to the size of the input image; each pixel is then classified to obtain the corresponding mask information. However, the convolution and pooling operations are lossy: they easily discard object edge information, which cannot be recovered during upsampling, resulting in low prediction accuracy of Mask R-CNN at object boundaries. This problem is especially evident in the segmentation of Agrocybe cylindracea, as the cap and stem have similar colors. Although adding dilated convolutions increases the receptive field and improves segmentation accuracy, it does not help with edge details. To address this issue, the PointRend module is introduced, as shown in Figure 7.
As can be seen in Figure 7, PointRend draws on two inputs: fine-grained features and a coarse prediction. The fine-grained features come from the 4×-downsampled ResNet feature map, while the coarse prediction comes from the detection head. Overall, the PointRend module refines edge segmentation through iterative upsampling. First, the coarse mask is upsampled by 2× bilinear interpolation, and the N most uncertain points (those whose classification probabilities are close to 0.5) are selected. Then, for each uncertain point, a feature vector is assembled from the fine-grained features and the coarse prediction at its coordinates. Next, a multi-layer perceptron (MLP) predicts the classification of each point, and the coarse prediction is updated as the new mask. Finally, these steps are repeated until the resolution of the mask meets our requirements. The pipeline is shown in Figure 8.
Since the PointRend module segments object edges more finely, replacing the original Mask Branch module in Mask R-CNN with PointRend as the mask generator gives the network a better effect when segmenting the edges of the mushroom caps.
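The sketch below illustrates the point selection and point-wise feature extraction described above in simplified form; the tensor shapes are assumptions, and a 1 × 1 convolution stands in for the shared MLP head, so it should be read as a schematic rather than the PointRend implementation used here:

```python
import torch
import torch.nn.functional as F

def uncertain_point_coords(mask_logits, num_points):
    """Select the num_points pixels whose foreground probability is closest
    to 0.5 and return normalized (x, y) coordinates for grid_sample."""
    b, _, h, w = mask_logits.shape
    uncertainty = -(mask_logits.sigmoid() - 0.5).abs()          # peaks at p = 0.5
    idx = uncertainty.view(b, -1).topk(num_points, dim=1).indices
    ys = torch.div(idx, w, rounding_mode="floor").float() / (h - 1) * 2 - 1
    xs = (idx % w).float() / (w - 1) * 2 - 1
    return torch.stack([xs, ys], dim=-1).unsqueeze(1)           # (B, 1, N, 2)

coarse = torch.randn(2, 1, 28, 28)       # coarse mask logits from the mask head
fine = torch.randn(2, 256, 112, 112)     # fine-grained backbone features
coords = uncertain_point_coords(coarse, num_points=196)

# Concatenate fine-grained and coarse features at each selected point, then
# reclassify those points; a 1x1 convolution plays the role of the MLP.
point_feats = torch.cat([
    F.grid_sample(fine, coords, align_corners=True),     # (B, 256, 1, N)
    F.grid_sample(coarse, coords, align_corners=True),   # (B, 1, 1, N)
], dim=1)
point_logits = torch.nn.Conv2d(257, 1, kernel_size=1)(point_feats)
print(point_logits.shape)                # torch.Size([2, 1, 1, 196])
```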

2.3. Model Training and Evaluation

The whole experiment was performed using the PyTorch 1.10.0 framework with Python 3.7. The training computer had an AMD EPYC 7351P CPU and an NVIDIA GeForce RTX 3090 Ti GPU and ran Ubuntu 20.04.5. The labeled mushroom cap dataset was converted to COCO format and used to train AC R-CNN with a batch size of 4, a learning rate of 0.0025, and a learning momentum of 0.9.
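A minimal sketch of these optimizer settings is shown below; the placeholder model and loss are illustrative stand-ins, since the full AC R-CNN training loop is not reproduced here:

```python
import torch
import torch.nn as nn

# Reported settings: SGD with lr = 0.0025 and momentum = 0.9, batch size 4.
model = nn.Conv2d(3, 8, kernel_size=3, padding=1)   # stand-in for AC R-CNN
optimizer = torch.optim.SGD(model.parameters(), lr=0.0025, momentum=0.9)

images = torch.randn(4, 3, 64, 64)                  # one batch of 4 images
targets = torch.randn(4, 8, 64, 64)                 # dummy training targets
loss = nn.functional.mse_loss(model(images), targets)  # stand-in loss
optimizer.zero_grad()
loss.backward()
optimizer.step()
```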
The average precision (AP) and F1 score were used to evaluate the effect of mushroom cap segmentation. The calculation equations are shown in (2)–(5).
$$P = \frac{TP}{TP + FP} \times 100\% \tag{2}$$

$$R = \frac{TP}{TP + FN} \times 100\% \tag{3}$$

$$F1 = \frac{2 \times P \times R}{P + R} \times 100\% \tag{4}$$

$$AP = \int_0^1 P(r)\, \mathrm{d}r \tag{5}$$
where AP is the area enclosed by the PR curve and the x-axis, r denotes recall (the integration variable), P denotes precision, R denotes recall, TP is the number of samples predicted positive that are actually positive, FP is the number of samples predicted positive that are actually negative, and FN is the number of samples predicted negative that are actually positive.
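As a worked example of Equations (2)–(4), the snippet below computes the three scores from hypothetical TP/FP/FN counts, chosen only so that the result lands near the F1 of 0.886 reported in Section 3.1:

```python
def segmentation_scores(tp: int, fp: int, fn: int):
    p = tp / (tp + fp)            # precision, Equation (2)
    r = tp / (tp + fn)            # recall, Equation (3)
    f1 = 2 * p * r / (p + r)      # F1 score, Equation (4)
    return p, r, f1

# Hypothetical counts for illustration only.
p, r, f1 = segmentation_scores(tp=443, fp=52, fn=62)
print(f"P = {p:.3f}, R = {r:.3f}, F1 = {f1:.3f}")  # P = 0.895, R = 0.877, F1 = 0.886
```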

3. Results

3.1. Segmentation Effect

To evaluate the real performance of the AC R-CNN model, segmentation was performed on the caps in the test set. The results show that the AC R-CNN model achieved an AP50 of 0.883 and an F1 score of 0.886; the learning curves are shown in Figure 9. The segmentation results, as shown in Figure 10, accurately outline each mushroom cap. Despite occlusions between mushroom caps, the model can separate them effectively. The AC R-CNN model demonstrates good segmentation performance for mushroom caps with varying quantities and densities, laying a solid foundation for subsequent mushroom cap measurements.

3.2. Comparison with State of the Art

To verify the effectiveness of AC R-CNN, AP50, AP75, F1 score, and run time were used as indexes (AP50 and AP75 are segmentation accuracies at IOU thresholds of 0.5 and 0.75, respectively), and AC R-CNN was compared with several popular networks. Table 2 shows that our method is optimal in AP50, AP75, and F1 score; owing to the improvements to ResNet and the addition of the PointRend module, AC R-CNN improves these indexes by 12.1, 19.6, and 13.7 percentage points, respectively, compared with the original Mask R-CNN. In terms of prediction time per image, although our method is not the fastest, it is only 22 ms slower than the fastest method (Mask Scoring R-CNN), which satisfies most scenarios. These results demonstrate the effectiveness of AC R-CNN.

3.3. Ablation Experiment

To verify the effectiveness of each improvement to Mask R-CNN for segmenting Agrocybe cylindracea caps, ablation experiments were carried out in the same experimental environment with the same parameters; the results are shown in Table 3. The original Mask R-CNN performs poorly, with an AP50 of only 0.762. After adding the PointRend module, the AP50 improved from 0.762 to 0.818, a 5.6 percentage point increase. With the further addition of the HDC module, the AP50 increased from 0.818 to 0.852, a 3.4 percentage point improvement. Notably, adding the ECA module increased the AP50 by a further 3.1 percentage points while improving the AP75 by 5.4 percentage points. We conclude from the table that each module added in AC R-CNN plays a specific role in enhancing the overall performance of the original Mask R-CNN.

3.3.1. HDC Module Effect

In this work, the HDC module is introduced to address caps that are difficult to distinguish from local information alone, because multiple caps of similar color adhere to one another. The HDC module provides a larger field of view, allowing different mushroom caps to be distinguished by their overall outlines. To verify whether the HDC module can effectively segment mushroom caps, it was added to Mask R-CNN for training, and its effect was compared with the original network. The results are shown in Figure 11.
Figure 11 shows that the original Mask R-CNN struggles to differentiate between adhered mushroom caps because of their similar colors and textures, so its segmentation performance for adhered caps is poor. With the addition of the HDC module, Mask R-CNN achieves better segmentation results, effectively separating the adhered mushroom caps.

3.3.2. Attention Module Effect

Attention modules enhance the feature extraction capability of the network. By incorporating them into the feature extraction module, the network can focus more on regions of interest, thereby improving segmentation accuracy. Woo et al. [35] proposed the convolutional block attention module (CBAM), which combines a spatial attention mechanism with a channel attention mechanism. Hu et al. [36] introduced the squeeze-and-excitation (SE) attention module, which focuses on the relationships between channels to learn the importance of each channel automatically. Wang et al. [38] improved the SE module to capture cross-channel interactions effectively without dimension reduction. Hou et al. [37] proposed the coordinate attention (CA) module for efficient mobile network design; it considers both channel information and direction-related positional information. To determine the most suitable attention mechanism for mushroom cap segmentation, these common attention mechanisms were compared within the mushroom cap instance segmentation model, as shown in Table 4, which demonstrates that incorporating the ECA module into Mask R-CNN yields the best results. To investigate the impact of the attention module on the network, the heat maps of the original network and the network with the ECA module were compared, as shown in Figure 12. Compared with the original network, the network with the ECA module attends to the entire mushroom structure and places greater emphasis on the caps; this focus enables the network to concentrate on the mushroom caps and ultimately improves segmentation accuracy.

3.3.3. PointRend Effect

The original Mask R-CNN can segment the overall contour of mushroom caps but may not segment fine details accurately, e.g., object edges. To evaluate the effectiveness of the PointRend module for object edge segmentation, it was incorporated into the original Mask R-CNN and trained. The results are shown in Figure 13.
In Figure 13, the original Mask R-CNN exhibits uneven and incomplete segmentation of mushroom cap edges. However, after incorporating the PointRend module, the network achieves a smoother and more complete segmentation of mushroom caps. This validates the effectiveness of the PointRend module in enhancing edge segmentation of mushroom caps.

4. Discussion

4.1. Discussion of Dilated Convolutional Layer Structure

HDC was proposed to solve the gridding effect caused by dilated convolution; it expands the field of view without losing detail. Figure 4 shows one of the possible combinations of dilated convolutions under the HDC condition: the result of each dilated convolution is fed as the input to the next, and the output of the final convolution is taken as the output (a cascade). An alternative combination convolves the input feature map with a set of kernels of different dilation factors in parallel and sums the convolution results to obtain the output; this structure is shown in Figure 14.
The convolutional layers with different dilation factors in the HDC module are separable provided Equation (1) is satisfied, and their results can be combined using either the structure in Figure 4 (method 1) or the structure in Figure 14 (method 2). To determine which dilation factors and which combination method achieve better results on our dataset, several commonly used sets of dilation factors were compared. The experimental results in Table 5 show that the accuracy of methods 1 and 2 varies with the dilation factors, and that method 1 with dilation factors 1-2-2 performs best. Therefore, we chose the HDC configuration 1-2-2 and adopted method 1 for the dilation operation.
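For concreteness, the sketch below contrasts the two arrangements in minimal assumed form (it is not the implementation used in the experiments): method 1 cascades the dilated convolutions, while method 2 applies them to the same input in parallel and sums the results.

```python
import torch
import torch.nn as nn

class HDCSequential(nn.Module):
    """Method 1: each dilated convolution feeds the next (Figure 4)."""
    def __init__(self, channels: int, rates=(1, 2, 2)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=r, dilation=r) for r in rates
        )

    def forward(self, x):
        for conv in self.convs:
            x = conv(x)
        return x

class HDCParallel(nn.Module):
    """Method 2: dilated convolutions run in parallel and are summed (Figure 14)."""
    def __init__(self, channels: int, rates=(1, 2, 2)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=r, dilation=r) for r in rates
        )

    def forward(self, x):
        return sum(conv(x) for conv in self.convs)

x = torch.randn(1, 64, 56, 56)
print(HDCSequential(64)(x).shape, HDCParallel(64)(x).shape)
# torch.Size([1, 64, 56, 56]) torch.Size([1, 64, 56, 56])
```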

4.2. Model Deficiencies

Compared with traditional CNN methods, the AC R-CNN model greatly improves the overall segmentation of the mushroom cap, especially at the edges and for adhered mushrooms. Although the modified network achieves good segmentation results for most cap images, some cases still segment poorly. These cases fall into four major categories, as shown in Figure 15. Figure 15a shows a detection failure due to a small target. Figure 15b shows a cap whose surface is wet because of excessive humidity in the greenhouse; light reflection makes the color of some areas differ from their surroundings, preventing segmentation of the cap. Figure 15c shows a cap divided into two parts by the stalk, combined with a small target. Figure 15d shows a case of misidentification: when the cap grows slowly, the gill tissue on the underside of the cap gradually falls off and remains on the stalk, and because of its color and appearance the stalk can then resemble a cap and be misidentified.

5. Conclusions

High-throughput acquisition of crop phenotypes is a future development trend of smart agriculture. To address the automated segmentation of Agrocybe cylindracea caps, the main obstacle to developing automated phenotyping equipment, this research proposes an improved Mask R-CNN (AC R-CNN). Compared with Mask R-CNN and similar networks, AC R-CNN shows significant improvement in segmentation accuracy and overall performance: the average precision (AP50) reaches 88.3%, an increase of 12.1 percentage points over the original Mask R-CNN.
This method effectively segments Agrocybe cylindracea caps; however, some limitations remain. Only one side of the mushroom bag could be observed, so the information obtained may be incomplete; smaller caps may not be detected by the algorithm; and segmentation accuracy degrades for damp, reflective caps. In the future, we will continue to improve the image acquisition setup and investigate new algorithms to handle these special cases and further enhance segmentation accuracy.

Author Contributions

H.Y.: Conceptualization, Writing—review and editing, Project administration, Supervision; S.Y.: Data curation, Methodology, Formal analysis, Writing—original draft; W.C.: Methodology, Formal analysis, Software; Q.W.: Software; Y.W.: Supervision, Writing—review and editing; Y.X.: Supervision, Writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the National Natural Science Foundation of China (62362039, 62166020), Natural Science Foundation of Jiangxi Province (20224BAB206089), Innovation and Entrepreneurship Training Program for College Students project (X202310410263), and Graduate Student Innovation Fund project (YC2023-S366).

Data Availability Statement

Because the data used in this study were self-collected and the dataset is still being improved, the dataset is unavailable at present.

Acknowledgments

Special thanks to the reviewers for their valuable comments.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Liu, Q.M.; Song, H.Y.; Chen, R.X.; Chen, M.; Zhai, Z.; Zhou, J.; Gao, Y.; Hu, D. Species concept of Cyclocybe chaxingu, an edible mushroom cultivated in China. Mycosystema 2021, 40, 981–991. [Google Scholar] [CrossRef]
  2. Qian, L.; Abudureheman, B.; Xue, S.; Heng, Z.; Jianlin, Z.; Rui, T. Research Progress of Agrocybe aegerita. Mod. Food 2023, 29, 42–44. [Google Scholar] [CrossRef]
  3. Shon, Y.H.; Nam, K.S. Antimutagenicity and induction of anticarcinogenic phase II enzymes by basidiomycetes. J. Ethnopharmacol. 2001, 77, 103–109. [Google Scholar] [CrossRef]
  4. Chen, Y.; Jiang, S.; Jin, Y.; Yin, Y.; Yu, G.; Lan, X.; Cui, M.; Liang, Y.; Wong, B.H.C.; Guo, L.; et al. Purification and characterization of an antitumor protein with deoxyribonuclease activity from edible mushroom Agrocybe aegerita. Mol. Nutr. Food Res. 2012, 56, 1729–1738. [Google Scholar] [CrossRef] [PubMed]
  5. Chien, R.C.; Tsai, S.Y.; Lai, E.Y.; Mau, J.L. Antiproliferative Activities of Hot Water Extracts from Culinary-Medicinal Mushrooms, Ganoderma tsugae and Agrocybe cylindracea (Higher Basidiomycetes) on Cancer Cells. Int. J. Med. Mushrooms 2015, 17, 453–462. [Google Scholar] [CrossRef] [PubMed]
  6. Yin, H.; Yi, W.; Hu, D. Computer vision and machine learning applied in the mushroom industry: A critical review. Comput. Electron. Agric. 2022, 198, 107015. [Google Scholar] [CrossRef]
  7. Lee, B.R.; Lee, Y.P.; Kim, D.W.; Song, H.Y.; Yoo, K.; Won, M.H.; Kang, T.; Lee, K.J.; Kim, K.H.; Joo, J.H.; et al. Amelioration of streptozotocin-induced diabetes by Agrocybe chaxingu polysaccharide. Mol. Cells 2010, 29, 349–354. [Google Scholar] [CrossRef] [PubMed]
  8. Nath, M.; Barh, A.; Sharma, A.; Verma, P.; Bairwa, R.K.; Kamal, S.; Sharma, V.P.; Annepu, S.K.; Sharma, K.; Bhatt, D.; et al. Identification of Eight High Yielding Strains via Morpho-Molecular Characterization of Thirty-Three Wild Strains of Calocybe indica. Foods 2023, 12, 2119. [Google Scholar] [CrossRef]
  9. Wang, Y.; Zhang, T.; Huang, X.; Yin, J.; Nie, S. Heteroglycans from the fruiting bodies of Agrocybe cylindracea: Fractionation, physicochemical properties and structural characterization. Food Hydrocolloid 2021, 114, 106568. [Google Scholar] [CrossRef]
  10. Zhang, J.; Chen, X.; Mu, X.; Hu, M.; Wang, J.; Huang, X.; Nie, S. Protective effects of flavonoids isolated from Agrocybe aegirita on dextran sodium sulfate-induced colitis. Efood 2021, 2, 288–295. [Google Scholar] [CrossRef]
  11. Zhu, H. Breeding of Excellent Strains of Cyclocybe cylindracea and Their High-yielding Cultivation. Master’s Thesis, Jiangxi Agricultural University, Nanchang, China, 2022. [Google Scholar]
  12. Fang, Y.; Yang, S.; Wang, X.; Li, Y.; Fang, C.; Shan, Y.; Feng, B.; Liu, W. Instances as Queries. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada, 11–17 October 2021; pp. 6910–6919. [Google Scholar]
  13. Huang, Z.; Huang, L.; Gong, Y.; Huang, C.; Wang, X. Mask Scoring R-CNN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 6409–6418. [Google Scholar]
  14. Bolya, D.; Zhou, C.; Xiao, F.; Lee, Y.J. YOLACT: Real-Time Instance Segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9157–9166. [Google Scholar] [CrossRef]
  15. Chen, H.; Sun, K.; Tian, Z.; Shen, C.; Huang, Y.; Yan, Y. BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 8573–8581. [Google Scholar]
  16. Chen, K.; Pang, J.; Wang, J.; Xiong, Y.; Li, X.; Sun, S.; Feng, W.; Liu, Z.; Shi, J.; Ouyang, W.; et al. Hybrid Task Cascade for Instance Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 4974–4983. [Google Scholar]
  17. He, K.; Gkioxari, G.; Dollar, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
  18. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
  19. Liu, X.; Zhao, D.; Jia, W.; Ji, W.; Ruan, C.; Sun, Y. Cucumber Fruits Detection in Greenhouses Based on Instance Segmentation. IEEE Access 2019, 7, 139635–139642. [Google Scholar] [CrossRef]
  20. Pérez-Borrero, I.; Marín-Santos, D.; Gegúndez-Arias, M.E.; Cortés-Ancos, E. A fast and accurate deep learning method for strawberry instance segmentation. Comput. Electron. Agric. 2020, 178, 105736. [Google Scholar] [CrossRef]
  21. Zu, L.; Zhao, Y.; Liu, J.; Su, F.; Zhang, Y.; Liu, P. Detection and Segmentation of Mature Green Tomatoes Based on Mask R-CNN with Automatic Image Acquisition Approach. Sensors 2021, 21, 7842. [Google Scholar] [CrossRef] [PubMed]
  22. Wang, D.; He, D. Fusion of Mask RCNN and attention mechanism for instance segmentation of apples under complex background. Comput. Electron. Agric. 2022, 196, 106864. [Google Scholar] [CrossRef]
  23. Xu, P.; Fang, N.; Liu, N.; Lin, F.; Yang, S.; Ning, J. Visual recognition of cherry tomatoes in plant factory based on improved deep instance segmentation. Comput. Electron. Agric. 2022, 197, 106991. [Google Scholar] [CrossRef]
  24. Cong, P.; Li, S.; Zhou, J.; Lv, K.; Feng, H. Research on Instance Segmentation Algorithm of Greenhouse Sweet Pepper Detection Based on Improved Mask RCNN. Agronomy 2023, 13, 196. [Google Scholar] [CrossRef]
  25. Zhang, H.; Tang, C.; Sun, X.; Fu, L. A Refined Apple Binocular Positioning Method with Segmentation-Based Deep Learning for Robotic Picking. Agronomy 2023, 13, 1469. [Google Scholar] [CrossRef]
  26. Wang, C.; Yang, G.; Huang, Y.; Liu, Y.; Zhang, Y. A transformer-based mask R-CNN for tomato detection and segmentation. J. Intell. Fuzzy Syst. 2023, 44, 8585–8595. [Google Scholar] [CrossRef]
  27. Li, Y.; Wang, Y.; Xu, D.; Zhang, J.; Wen, J. An Improved Mask RCNN Model for Segmentation of ‘Kyoho’ (Vitis labruscana) Grape Bunch and Detection of Its Maturity Level. Agriculture 2023, 13, 914. [Google Scholar] [CrossRef]
  28. López-Barrios, J.D.; Escobedo Cabello, J.A.; Gómez-Espinosa, A.; Montoya-Cavero, L. Green Sweet Pepper Fruit and Peduncle Detection Using Mask R-CNN in Greenhouses. Appl. Sci. 2023, 13, 6296. [Google Scholar] [CrossRef]
  29. Chen, Y.; Li, X.; Jia, M.; Li, J.; Hu, T.; Luo, J. Instance Segmentation and Number Counting of Grape Berry Images Based on Deep Learning. Appl. Sci. 2023, 13, 6751. [Google Scholar] [CrossRef]
  30. Yeh, J.; Lin, K.; Lin, C.; Kang, J. Intelligent Mango Fruit Grade Classification Using AlexNet-SPP With Mask R-CNN-Based Segmentation Algorithm. IEEE Trans. Agrifood Electron. 2023, 1, 41–49. [Google Scholar] [CrossRef]
  31. Mu, X.; He, L.; Heinemann, P.; Schupp, J.; Karkee, M. Mask R-CNN based apple flower detection and king flower identification for precision pollination. Smart Agric. Technol. 2023, 4, 100151. [Google Scholar] [CrossRef]
  32. Shen, R.; Zhen, T.; Li, Z. Segmentation of Unsound Wheat Kernels Based on Improved Mask RCNN. Sensors 2023, 23, 3379. [Google Scholar] [CrossRef]
  33. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778. [Google Scholar]
  34. Wang, P.; Chen, P.; Yuan, Y.; Liu, D.; Huang, Z.; Hou, X.; Cottrell, G. Understanding Convolution for Semantic Segmentation. In Proceedings of the WACV 2018: IEEE Winter Conference on Applications of Computer Vision, Lake Tahoe, CA, USA, 12–15 March 2018; pp. 1451–1460. [Google Scholar]
  35. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  36. Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
  37. Hou, Q.; Zhou, D.; Feng, J. Coordinate Attention for Efficient Mobile Network Design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 13713–13722. [Google Scholar]
  38. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11534–11542. [Google Scholar]
Figure 1. Schematic diagram of mushroom data acquisition.
Figure 2. Structure of AC R-CNN.
Figure 3. Enlarged image of overlap cap.
Figure 4. Flowchart of expansion convolution execution.
Figure 5. Receptive fields of different expansion coefficients. (a) shows the range of view size before using HDC. (b) shows the range of view size after using HDC.
Figure 6. Add ECA module to the bottleneck of ResNet.
Figure 7. Schematic diagram of PointRend module.
Figure 8. Sampling process on PointRend module.
Figure 9. Learning curves.
Figure 10. Segmentation effect of AC R-CNN model. Different colors represent different caps, the same color represents one cap, and the colors are generated randomly.
Figure 11. Impact of HDC module. Different colors represent different caps, the same color represents one cap, and the colors are generated randomly.
Figure 12. Heat map of model based on attention mechanism.
Figure 13. Impact of PointRend module. Different colors represent different caps, the same color represents one cap, and the colors are generated randomly.
Figure 14. Alternative architecture of expansion convolution.
Figure 15. Deficiencies of AC R-CNN. (a) shows a detection failure due to the small target. (b) shows that light reflection leads to the inability to divide the caps. (c) shows that the mushroom cap is divided into two parts by the stalk. (d) shows a case of misidentification.
Table 1. Classes and number of images in our dataset.

| Dataset | Occluded Caps | Unoccluded Caps | Total |
| --- | --- | --- | --- |
| Training set | 2165 | 1202 | 3367 |
| Test set | 327 | 191 | 518 |
Table 2. Results of different instance segmentation methods. Bold represents the best result.

| Method | AP50 | AP75 | F1 | Run Time (s) |
| --- | --- | --- | --- | --- |
| Mask R-CNN | 0.762 | 0.585 | 0.749 | 1.578 |
| Mask Scoring R-CNN | 0.763 | 0.605 | 0.751 | **1.483** |
| YOLACT | 0.748 | 0.382 | 0.757 | 1.518 |
| InstaBoost | 0.710 | 0.517 | 0.684 | 1.743 |
| QueryInst | 0.735 | 0.585 | 0.855 | 1.732 |
| BlendMask | 0.742 | 0.452 | 0.762 | 1.485 |
| AC R-CNN (Ours) | **0.883** | **0.781** | **0.886** | 1.505 |
Table 3. The results of the ablation experiment. – means the module is not used; √ means it is used. Bold represents the best result.

| PointRend | HDC | ECA | AP50 | AP75 |
| --- | --- | --- | --- | --- |
| – | – | – | 0.762 | 0.585 |
| √ | – | – | 0.818 | 0.649 |
| √ | √ | – | 0.852 | 0.727 |
| √ | √ | √ | **0.883** | **0.781** |
Table 4. The effect of different attention modules. Bold represents the best result.

| Attention Module | AP50 | AP75 | F1 |
| --- | --- | --- | --- |
| CBAM | 0.746 | 0.527 | 0.782 |
| SE | 0.778 | 0.631 | 0.805 |
| CA | 0.769 | 0.627 | 0.787 |
| ECA | **0.793** | **0.703** | **0.811** |
Table 5. Effect of choosing different expansion coefficients. Bold represents the best result.

| Expansion Coefficients | Method 1 AP50 | Method 2 AP50 |
| --- | --- | --- |
| 1-2-2 | **0.811** | 0.803 |
| 1-2-5 | 0.771 | 0.783 |
| 1-2-2-1 | 0.809 | 0.805 |
| 1-2-2-1-2-2 | 0.806 | 0.796 |
| 1-2-5-1-2-5 | 0.782 | 0.792 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
