Identification of Oil Tea (Camellia oleifera C.Abel) Cultivars Using EfficientNet-B4 CNN Model with Attention Mechanism

Cultivar identification is a basic task in oil tea (Camellia oleifera C.Abel) breeding, quality analysis, and adjustment of the industrial structure. However, because the differences in texture, shape, and color among different cultivars of oil tea are usually inconspicuous and subtle, the identification of oil tea cultivars can be a significant challenge. The main goal of this study is to propose an automatic and accurate method for identifying oil tea cultivars. In this study, a new deep learning model, called EfficientNet-B4-CBAM, is built to identify oil tea cultivars. First, 4725 images containing four cultivars were collected to build an oil tea cultivar identification dataset. EfficientNet-B4 was selected as the basic model of oil tea cultivar identification, and the Convolutional Block Attention Module (CBAM) was integrated into EfficientNet-B4 to build EfficientNet-B4-CBAM, thereby improving the model's ability to focus on the fruit areas and to express the information they contain. Finally, the cultivar identification capability of EfficientNet-B4-CBAM was tested on the testing dataset and compared with InceptionV3, VGG16, ResNet50, EfficientNet-B4, and EfficientNet-B4-SE. The experimental results showed that the EfficientNet-B4-CBAM model achieves an overall accuracy of 97.02% and a kappa coefficient of 0.96, higher than those of the other methods used in the comparative experiments. In addition, gradient-weighted class activation mapping (Grad-CAM) visualization also showed that EfficientNet-B4-CBAM pays more attention to the fruit areas that play a key role in cultivar identification. This study provides new effective strategies and a theoretical basis for the application of deep learning technology in the identification of oil tea cultivars and provides technical support for the automatic identification and non-destructive testing of oil tea cultivars.


Introduction
Oil tea (Camellia oleifera C.Abel), which is one of the four major woody oil plants in the world [1,2], has high nutritional and economic value globally, and its major producing areas include China, Japan, South Korea, and Vietnam [3,4]. Among these numerous producing areas, China, its country of origin, has the largest cultivar group of oil tea. With an increase in the number of new cultivars of oil tea, even professional researchers can become confused when identifying them, which has had a negative impact on cultivar breeding and industrial structure adjustment of oil tea [5][6][7]. Owing to the numerous cultivars and their high phenotypic similarity, many difficulties and challenges have been encountered in the cultivar identification of oil tea [8][9][10]. Therefore, it is extremely challenging to identify oil tea cultivars scientifically and accurately [11][12][13].
In previous studies, molecular markers [14,15], hyperspectral images [16,17], and traditional computer vision technologies [18,19] have been used to solve the cultivar identification problem. Motivated by convolutional neural networks and the attention mechanism, this study presents a completely new and more effective convolutional neural network model, EfficientNet-B4-CBAM, for oil tea cultivar identification. As far as we know, this is the first application of the EfficientNet model and the CBAM module in the identification of oil tea cultivars. The specific contributions and innovations are summarized as follows: (1) By formulating oil tea cultivar identification as a fine-grained image classification problem, this study presents EfficientNet-B4-CBAM to achieve an adaptive refinement of the feature channels and space.
(2) An oil tea cultivar identification dataset was constructed, which contains a total of 4725 images with four cultivars of oil tea. The oil tea cultivar identification dataset was collected using a smartphone and then calibrated by human experts. (3) Based on the oil tea cultivar identification dataset, extensive comparative experiments were conducted on an EfficientNet-B4-CBAM against VGG16 [41], InceptionV3 [42], ResNet50 [43], EfficientNet-B4 [44], and EfficientNet-B4-SE [45]. The results show that the proposed EfficientNet-B4-CBAM is superior to the other methods in comparative experiments, proving the effectiveness of embedding a CBAM module.

Study Area
The study area is located in the state-owned Camellia oleifera Forestry Farm of Huang Lawn, Shaoyang City, Hunan Province, China. The altitude is 470 m above sea level. According to the Köppen climate classification, the study area has a humid subtropical climate, with abundant precipitation, simultaneous rain and heat, and four distinct seasons. The average annual temperature is 17.2 °C, and the average rainfall is 1361.6 mm per year. The state-owned Camellia oleifera Forestry Farm of Huang Lawn is the only state-owned farm named after oil tea in China, with 52.67 ha of a national key oil tea forestry seed base and 133.33 ha of a special oil tea industrial park.

Data Acquisition and Dataset Construction
The most representative oil tea cultivars in Hunan Province, i.e., Xianglin 210, Huashuo, Huaxin, and Huajin, were selected as the research objects. The images used in this research were taken using an iPhone XR from 8:00 to 17:00 from 4 September to 7 November 2020. To cover the diversity in lighting, shadow, and background, the oil tea images were captured under different weather conditions. The data acquisition process used a minimum resolution of 72 dpi, and the resulting images were saved in the Joint Photographic Experts Group (JPEG) format with a pixel resolution of 3024 × 4032. After data acquisition, 4725 images were obtained. Sample images of the four oil tea cultivars are shown in Figure 1. The 4725 raw experimental images collected from the four oil tea cultivars were used to build a high-quality oil tea cultivar identification dataset. As shown in Table 1, the oil tea cultivar identification dataset was divided into three parts: one for training (60% of the dataset, 2845 samples), one for validation (20% of the dataset, 940 samples), and one for testing (20% of the dataset, 940 samples). The training dataset was used to fit the model. The validation dataset was used to adjust the hyperparameters of the model and to select the best model. The testing dataset was used to evaluate the performance of the best model.
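As a rough illustration of the split described above, the following sketch partitions a list of hypothetical image filenames into training, validation, and testing subsets. The function name and filenames are ours; the paper's exact per-subset counts (2845/940/940) reflect the authors' own rounding of the 60/20/20 ratio.

```python
import random

def split_dataset(samples, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle a list of samples and split it into train/val/test subsets."""
    rng = random.Random(seed)
    shuffled = samples[:]              # copy so the input list is untouched
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

# 4725 images split roughly 60/20/20, as in the dataset construction above
images = [f"img_{i:04d}.jpg" for i in range(4725)]
train, val, test = split_dataset(images)
```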

EfficientNet-B4-CBAM Model
EfficientNet is a highly accurate network obtained through a machine search [46]. It uses a simple and effective compound coefficient to uniformly scale the width, depth, and resolution of the network [47]. In addition, compared to other CNN models that achieve similar accuracy on the ImageNet dataset, EfficientNet is much smaller. To accurately identify different oil tea cultivars, a new deep learning model, called EfficientNet-B4-CBAM, is built by combining the EfficientNet-B4 model and the CBAM module. Figure 2 illustrates the network structure of the EfficientNet-B4-CBAM model.
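The compound scaling idea can be made concrete numerically. The base coefficients below (α = 1.2, β = 1.1, γ = 1.15) are the grid-searched values reported in the original EfficientNet paper, chosen so that α · β² · γ² ≈ 2; the helper function and the choice of φ = 4 for illustration are ours.

```python
# Compound scaling sketch: one coefficient phi jointly scales network
# depth, width, and input resolution.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # grid-searched base coefficients

def compound_scale(phi):
    """Return (depth, width, resolution) multipliers for coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi

# Illustrative only: phi = 4 in the neighborhood of the B4 variant
depth_mult, width_mult, res_mult = compound_scale(4)
```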
As shown in Figure 2, the EfficientNet-B4-CBAM model is mainly composed of an EfficientNet-B4 model and a CBAM module. In the EfficientNet-B4-CBAM model, the EfficientNet-B4 model was responsible for extracting oil tea features, whereas the CBAM module was responsible for refining the extracted oil tea features [48]. The EfficientNet-B4 model comprises mostly mobile inverted bottleneck convolution (MBConv) blocks, with a three-channel image with a pixel resolution of 380 × 380 as input and an identification result as the output. Figure 3 shows an illustration of MBConv6.

Figure 2. Framework of the proposed EfficientNet-B4-CBAM for oil tea cultivar identification. Input denotes the input image of oil tea; N × N × C represents the size of the feature map, where N × N is the 2D map size and C is the number of channels; Conv denotes pointwise convolution; k n × n represents a convolution kernel of n × n; MBConv represents mobile inverted bottleneck convolution; MBConv1 × n indicates n MBConv1 modules; MBConv6 × n indicates n MBConv6 modules; CBAM denotes convolutional block attention module; AdaptiveAvgPool2d represents adaptive average pooling; Fully connected represents a fully connected layer; Output represents the output layer.

Figure 3. The MBConv6 (k5 × 5) fundamental structure. MBConv represents mobile inverted bottleneck convolution; Conv denotes pointwise convolution; BN denotes batch normalization; Swish denotes the swish activation function; DWConv represents depthwise convolution; SE Module represents the squeeze-and-excitation module; FC represents a fully connected layer; H × W × F represents the tensor shape (height, width, depth).
Pointwise convolution, depthwise convolution, and a squeeze-and-excitation (SE) module are the three primary components of MBConv. When receiving the feature map, MBConv first executes a 1 × 1 pointwise convolution on it and then changes the channel dimension of the input feature map according to the expansion ratio. Next, a 5 × 5 depthwise convolution is applied, followed by the introduction of an SE module to boost the expressiveness of the model. Subsequently, a 1 × 1 pointwise convolution is used to return the feature map to its original channel dimension. Finally, drop connect is executed, and a skip connection to the input is applied.
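The SE step inside MBConv can be sketched in a few lines of NumPy. This is an illustrative stand-in, not the paper's implementation: the two weight matrices are random placeholders for trained fully connected layers, and the function name is ours.

```python
import numpy as np

def se_module(feature_map, reduction=4, seed=0):
    """Squeeze-and-excitation sketch on an (H, W, C) feature map.

    Squeeze: global average pooling yields one descriptor per channel.
    Excite:  two FC layers (reduce C, then restore it) with ReLU and sigmoid.
    Scale:   each channel of the input is reweighted by its learned gate.
    """
    rng = np.random.default_rng(seed)       # random stand-in for trained weights
    h, w, c = feature_map.shape
    squeezed = feature_map.mean(axis=(0, 1))            # (C,)
    w1 = rng.standard_normal((c, c // reduction))
    w2 = rng.standard_normal((c // reduction, c))
    hidden = np.maximum(squeezed @ w1, 0)               # ReLU
    gates = 1.0 / (1.0 + np.exp(-(hidden @ w2)))        # sigmoid, in (0, 1)
    return feature_map * gates                          # broadcast over H, W

x = np.ones((4, 4, 8))
y = se_module(x)
```

Because the sigmoid gates lie strictly between 0 and 1, the output is a per-channel attenuation of the input, which is exactly the "reweighting" role the SE module plays inside MBConv.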
As can be seen in Figure 4, the CBAM module mainly consists of a channel attention module and a spatial attention module. As shown in Figure 4a, when the feature map F of the input oil tea image is fed into the CBAM module, the CBAM module first sends F to the channel attention module for processing, resulting in the channel attention map Mc of the input oil tea image. The refined oil tea image feature map F′ needed by the spatial attention module is then obtained by multiplying Mc with F. Next, F′ is fed into the spatial attention module, yielding the spatial attention map Ms of the oil tea image, and F′ is multiplied by Ms to produce the final feature map F″ of the oil tea image.
As shown in Figure 4b, the channel attention module compresses the feature map F of the oil tea image in the spatial dimension to produce a one-dimensional channel attention vector Mc. Average pooling and max pooling are used to aggregate the spatial information of the feature map, and the two resulting channel descriptors are passed through a shared multilayer perceptron, whose outputs are combined to produce the channel weights.
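The channel attention computation just described can be sketched in NumPy. As with the SE sketch, the MLP weights are random placeholders for trained parameters, and the names are ours; note that, unlike SE, CBAM aggregates spatial information with both average pooling and max pooling through a shared MLP.

```python
import numpy as np

def channel_attention(f, reduction=4, seed=1):
    """CBAM channel attention sketch for an (H, W, C) feature map F.

    Avg pooling and max pooling over the spatial dims each give a (C,)
    descriptor; a shared two-layer MLP processes both, and the two outputs
    are summed before the sigmoid to produce the channel weights Mc.
    """
    rng = np.random.default_rng(seed)       # random stand-in for trained weights
    h, w, c = f.shape
    avg_desc = f.mean(axis=(0, 1))                      # (C,)
    max_desc = f.max(axis=(0, 1))                       # (C,)
    w1 = rng.standard_normal((c, c // reduction))
    w2 = rng.standard_normal((c // reduction, c))
    mlp = lambda d: np.maximum(d @ w1, 0) @ w2          # shared MLP
    mc = 1.0 / (1.0 + np.exp(-(mlp(avg_desc) + mlp(max_desc))))
    return mc                                           # (C,) channel weights

f = np.random.default_rng(2).random((8, 8, 16))
f_refined = f * channel_attention(f)                    # F' = Mc applied to F
```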
As shown in Figure 4c, the spatial attention module compresses the refined feature map F′ output by the channel attention module along the channel dimension and then generates the spatial attention map Ms after a sigmoid activation. Max pooling is used to extract the maximum value along the channel axis, and average pooling is used to extract the average value along the channel axis.
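The spatial attention branch can be sketched the same way. The 7 × 7 kernel size is CBAM's default from the original paper; the kernel weights are random placeholders, and the naive convolution loop is purely for illustration.

```python
import numpy as np

def spatial_attention(f_refined, k=7, seed=3):
    """CBAM spatial attention sketch on a refined (H, W, C) feature map F'.

    Avg pooling and max pooling are taken ALONG the channel axis, stacked
    into a 2-channel map, convolved with a k x k kernel (same-padding),
    and passed through a sigmoid to give the spatial map Ms.
    """
    rng = np.random.default_rng(seed)       # random stand-in for trained kernel
    avg_map = f_refined.mean(axis=2)                    # (H, W)
    max_map = f_refined.max(axis=2)                     # (H, W)
    stacked = np.stack([avg_map, max_map], axis=2)      # (H, W, 2)
    kernel = rng.standard_normal((k, k, 2))
    pad = k // 2
    padded = np.pad(stacked, ((pad, pad), (pad, pad), (0, 0)))
    h, w = avg_map.shape
    conv = np.empty((h, w))
    for i in range(h):                      # naive same-padded convolution
        for j in range(w):
            conv[i, j] = np.sum(padded[i:i + k, j:j + k] * kernel)
    return 1.0 / (1.0 + np.exp(-conv))                  # Ms, (H, W)

fp = np.random.default_rng(4).random((8, 8, 16))
ms = spatial_attention(fp)
f_final = fp * ms[:, :, None]               # F'' = Ms applied to F'
```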

Evaluation Indicators
In this study, six quantitative criteria are used to evaluate the oil tea cultivar identification results. The accuracy (ACC), precision (P), recall (R), F1-score, overall accuracy (OA), and kappa coefficient (Kc) are used to assess and compare the performance of the cultivar identification, as shown in Equations (1)–(6), where TP, FP, TN, and FN denote the numbers of true positives, false positives, true negatives, and false negatives, respectively.
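All six indicators reduce to standard formulas over the confusion matrix: ACC = (TP + TN)/N, P = TP/(TP + FP), R = TP/(TP + FN), F1 = 2PR/(P + R), OA = trace/N, and Kc = (po − pe)/(1 − pe). The following sketch (our own helper names, not the authors' code) computes them:

```python
import numpy as np

def per_class_metrics(cm, cls):
    """Accuracy, precision, recall, F1 for one class of a confusion
    matrix cm, where cm[i, j] counts true class i predicted as class j."""
    tp = cm[cls, cls]
    fn = cm[cls].sum() - tp          # row total minus diagonal
    fp = cm[:, cls].sum() - tp       # column total minus diagonal
    tn = cm.sum() - tp - fn - fp
    acc = (tp + tn) / cm.sum()
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f1 = 2 * p * r / (p + r)
    return acc, p, r, f1

def overall_accuracy(cm):
    return np.trace(cm) / cm.sum()

def kappa(cm):
    """Cohen's kappa: (po - pe) / (1 - pe), where pe is the chance
    agreement implied by the row and column marginals."""
    n = cm.sum()
    po = np.trace(cm) / n
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2
    return (po - pe) / (1 - pe)
```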

Experimental Environment Configuration
All CNN models in this study were implemented in Python using the TensorFlow and Keras frameworks, and all confusion matrices and heat maps were exported using Matplotlib. The training and testing of the convolutional neural networks were conducted on a PC with a 3.7-GHz Intel i7-8700K CPU and an NVIDIA GeForce GTX 1080 Ti graphics processing unit.
The input image size and training epochs of the EfficientNet-B4-CBAM model were 380 × 380 pixels and 30 epochs, respectively. To accelerate the model training and improve the model performance, transfer learning was exploited, and the initial freeze layer was set to 12. The learning rate and batch size were set to 0.0001 and 8, respectively.
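The transfer-learning setup above can be illustrated with a minimal stand-in for a framework layer object; in Keras, the analogous operation is setting `trainable = False` on the first layers of the pretrained backbone before compiling the model. The `Layer` class and the backbone length of 32 here are purely illustrative.

```python
# Sketch of freezing the first 12 layers of a pretrained backbone, as in
# the paper's initial-freeze-layer setting. Layer is a minimal stand-in
# for a framework layer object (Keras layers expose the same attribute
# name, `trainable`).
class Layer:
    def __init__(self, name):
        self.name = name
        self.trainable = True

FREEZE_UP_TO = 12                      # initial freeze depth from the paper

layers = [Layer(f"block_{i}") for i in range(32)]   # hypothetical backbone
for layer in layers[:FREEZE_UP_TO]:
    layer.trainable = False            # frozen: weights not updated

trainable_count = sum(l.trainable for l in layers)
```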

Analysis Results of Cultivar Identification Using EfficientNet-B4-CBAM
In this section, the cultivar identification performance of the EfficientNet-B4-CBAM model was assessed using images from the testing dataset. Figure 5 shows the confusion matrix of EfficientNet-B4-CBAM for the identification of oil tea cultivars in the testing dataset. From the confusion matrix, it can be seen that the misidentification of oil tea cultivars mainly occurs between Huashuo and Huajin. After observing the oil tea dataset, we found that there were two main reasons: (1) there are extremely small and subtle differences between different oil tea cultivars, and (2) leaf shading and illumination influence the identification of the oil tea cultivars.

In Table 2, we report the quantitative evaluation results of EfficientNet-B4-CBAM on the testing dataset. As shown in Table 2, the proposed EfficientNet-B4-CBAM has a remarkable effect on the identification of oil tea cultivars in the testing dataset. From the perspective of single-cultivar identification, the accuracy of EfficientNet-B4-CBAM for identifying the four oil tea cultivars was higher than 97.87%, the precision was higher than 95.53%, the recall was higher than 94.36%, and the F1-score was higher than 95.34%. Taking Xianglin 210 as an example, the accuracy, precision, recall, and F1-score of EfficientNet-B4-CBAM for Xianglin 210 identification were 99.04%, 98.11%, 98.48%, and 98.29%, respectively. The identification results of a single oil tea cultivar showed that EfficientNet-B4-CBAM could accurately identify each cultivar in the testing dataset. From the perspective of overall cultivar identification, we found that the overall accuracy of EfficientNet-B4-CBAM on the testing dataset was 97.02%, and the kappa coefficient was 0.96, indicating that EfficientNet-B4-CBAM has a stable ability to identify all oil tea cultivars.
Moreover, we found that the EfficientNet-B4-CBAM model also made some misjudgments when identifying oil tea cultivars. For example, EfficientNet-B4-CBAM identified Xianglin 210, Huashuo, Huajin, and Huaxin with FPs of 5, 13, 7, and 3, respectively, and FNs of 4, 7, 11, and 6, respectively. This was because the differences between some cultivars were extremely small and subtle [49]. For instance, the difference between Xianglin 210 and Huashuo was only the rind color of the fruit. In addition, between Huajin and Huashuo, the only difference was the shape of the fruit.
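Per-class FPs and FNs of this kind fall directly out of the confusion matrix: for class j, FP is the column total minus the diagonal entry, and FN is the row total minus the diagonal entry. The matrix below is illustrative only, not the paper's actual test results.

```python
import numpy as np

# Hypothetical 4-class confusion matrix: rows = true cultivar,
# columns = predicted cultivar. Counts are made up for illustration.
cm = np.array([
    [60,  2,  1,  0],
    [ 1, 55,  3,  1],
    [ 2,  1, 57,  1],
    [ 0,  1,  0, 62],
])

fp = cm.sum(axis=0) - np.diag(cm)   # predicted as class j but actually wrong
fn = cm.sum(axis=1) - np.diag(cm)   # true class i missed by the model
```

Every misclassified sample is simultaneously one class's FN and another class's FP, so the two vectors always share the same total, just as the paper's FP and FN counts each sum to 28.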

Comparison of Cultivar Identification Results with Different Models
To evaluate the cultivar identification performance of the EfficientNet-B4-CBAM model, it was compared with other CNN models on the testing dataset. The cultivar identification results of these models are presented in the form of confusion matrices, as shown in Figure 6. As Figure 6 indicates, the cultivar misidentification of EfficientNet-B4-CBAM on the testing dataset was far lower than that of the other models in the comparative experiment. Comparing the confusion matrices of VGG16, InceptionV3, ResNet50, and EfficientNet-B4, it can be seen that the cultivar misidentification of EfficientNet-B4 is lower, which proves that it is reasonable to choose EfficientNet-B4 as the base model in this study. In addition, after adding the CBAM module to the EfficientNet-B4 model, the numbers of correctly identified Xianglin 210, Huashuo, Huajin, and Huaxin samples increased by 11, 1, 6, and 35, respectively. The test results prove that the CBAM module can effectively improve the oil tea cultivar identification ability of the EfficientNet-B4 model [50].
To comprehensively assess the EfficientNet-B4-CBAM, we used the accuracy, precision, recall, F1-score, overall accuracy, and kappa coefficient as evaluation indicators to quantitatively evaluate all methods in the comparative experiment. A comparative evaluation of several state-of-the-art identification methods on the testing dataset is presented in Table 3. It can be clearly seen from Table 3 that, compared with InceptionV3, VGG16, and ResNet50, EfficientNet-B4 has a better ability to identify oil tea cultivars. The overall identification accuracy of EfficientNet-B4 for the oil tea cultivars was 91.38%, which was 22.34%, 19.36%, and 5.85% higher than that of InceptionV3, VGG16, and ResNet50, respectively. The kappa coefficient of EfficientNet-B4 was 0.88, which was 0.29, 0.26, and 0.07 higher than that of InceptionV3, VGG16, and ResNet50, respectively.
By comparing the experimental data of EfficientNet-B4, EfficientNet-B4-SE, and EfficientNet-B4-CBAM listed in Table 3, we found that adding an attention mechanism to EfficientNet-B4 can significantly improve the ability of oil tea cultivar identification. When no attention mechanism was added, the overall accuracy and kappa coefficient of EfficientNet-B4 for oil tea cultivar identification on the testing dataset were 91.38% and 0.88, respectively. When the SE module was added to EfficientNet-B4 (EfficientNet-B4-SE in Table 3), the overall accuracy and kappa coefficient of oil tea cultivar identification increased by 1.60% and 0.02, respectively. When the CBAM module was added to EfficientNet-B4 (EfficientNet-B4-CBAM in Table 3), the overall accuracy and kappa coefficient of oil tea cultivar identification increased by 5.64% and 0.08, respectively. From the cultivar identification results of EfficientNet-B4, EfficientNet-B4-SE, and EfficientNet-B4-CBAM, we discovered that the performance improvement of EfficientNet-B4 when using the CBAM module is superior to that of the SE module, which may be associated with the fact that the spatial attention module of the CBAM module can locate the key information more accurately.
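The reported kappa coefficient can be sanity-checked under the simplifying assumption (ours, not the paper's) of four perfectly balanced classes, for which the chance agreement pe is 0.25:

```python
# Kappa sanity check: kappa = (po - pe) / (1 - pe).
po = 0.9702          # reported overall accuracy of EfficientNet-B4-CBAM
pe = 0.25            # chance agreement assuming 4 balanced classes
kappa = (po - pe) / (1 - pe)
```

The result rounds to 0.96, consistent with the kappa coefficient reported for EfficientNet-B4-CBAM on the testing dataset.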
In summary, after a comprehensive comparison of different models on the testing dataset, we found that the EfficientNet-B4-CBAM proposed in this paper can accurately identify oil tea cultivars under natural conditions. Compared with other models used in comparative experiments, EfficientNet-B4-CBAM has obvious advantages for most evaluation indicators.

Visual Analysis of Cultivar Identification Results
To investigate why EfficientNet-B4-CBAM outperforms other CNN models, Grad-CAM [51] is adopted to visualize the cultivar identification results of oil tea. As shown in Figure 7, it is apparent that when identifying the cultivars of oil tea, EfficientNet-B4 focused on the areas of the fruit and background, whereas EfficientNet-B4-CBAM paid substantial attention to the areas of the fruit but paid little attention to the background areas. According to the experience of human experts, when identifying oil tea cultivars, the fruit areas of oil tea images can often provide key information for cultivar identification, whereas the background areas of the oil tea image generally interfere with cultivar identification [52].
By comparing the heat maps shown in Figure 7, it can be seen that after the CBAM module was added to EfficientNet-B4, the attention of the model focused more on the fruit areas. Based on this phenomenon, we can determine that when identifying oil tea cultivars, the CBAM module can not only determine the location of key information but also improve the expression of information in key areas, thereby improving the identification of oil tea cultivars based on the use of EfficientNet-B4. The heat map analysis experiment proved the capability of the EfficientNet-B4-CBAM model to identify oil tea cultivars from a visual perspective.
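The Grad-CAM computation behind these heat maps reduces to a few array operations: each channel of a convolutional layer's activations is weighted by the spatial average of the class-score gradient with respect to that channel, the weighted channels are summed, and a ReLU keeps only positive evidence. In this sketch, random arrays stand in for real activations and gradients.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM sketch: activations and gradients are both (H, W, C),
    taken at a convolutional layer for the predicted class score."""
    weights = gradients.mean(axis=(0, 1))       # (C,) per-channel importance
    cam = (activations * weights).sum(axis=2)   # weighted sum over channels
    cam = np.maximum(cam, 0)                    # ReLU: positive evidence only
    if cam.max() > 0:
        cam = cam / cam.max()                   # normalize to [0, 1] for display
    return cam

rng = np.random.default_rng(5)
acts = rng.random((12, 12, 32))                 # stand-in activations
grads = rng.standard_normal((12, 12, 32))       # stand-in gradients
heatmap = grad_cam(acts, grads)
```

Upsampled to the input resolution and overlaid on the image, such a map is exactly the kind of fruit-area heat map discussed above.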

Conclusions
There are significant differences in the yield, oil content, and camellia oil quality among different oil tea cultivars, and the identification of oil tea cultivars has an important impact on cultivar breeding and an adjustment of the industrial structure. In this study, four typical oil tea cultivars were identified using computer vision technology based on deep learning, and an oil tea cultivar identification model, EfficientNet-B4-CBAM, was proposed.
When identifying the four typical oil tea cultivars in the testing dataset, the overall accuracy and kappa coefficient of the proposed EfficientNet-B4-CBAM were 97.02% and 0.96, respectively. Compared with the other CNN models used in the comparative experiment, EfficientNet-B4-CBAM has obvious advantages in all evaluation indicators. The experimental results of the visual analysis show that the proposed EfficientNet-B4-CBAM can not only accurately locate fruit areas but can also fully express the information of such areas when identifying the cultivars of oil tea.
This research can provide more advanced technical options for the identification of oil tea cultivars and lay a foundation for further research on image-based non-destructive recognition of oil tea cultivars. We suggest that future work further enrich the oil tea cultivar identification dataset, providing a sufficient data basis for research on oil tea cultivar identification algorithms. In future research, we will optimize the speed of the EfficientNet-B4-CBAM model and attempt to deploy it on a mobile phone.