Multiscale Local Feature Fusion: Marine Microalgae Classification for Few-Shot Learning

Abstract: In the marine ecological environment, marine microalgae are important photosynthetic autotrophic organisms that carry out photosynthesis and absorb carbon dioxide. With the increasingly serious eutrophication of water bodies, under certain environmental conditions the rapid propagation of some algae gradually forms harmful blooms, which damage the water environment. Therefore, identifying beneficial and harmful algae quickly and accurately has become the key to solving the problem. There are more than 30,000 species of marine microalgae in the world, but sample data are scarce and the characteristics are not obvious; many algae are similar in shape and difficult to distinguish. Few-shot learning is very challenging: it aims to give a deep learning model excellent recognition ability by training on very few labeled samples. Meanwhile, few-shot classification methods based on metric learning have attracted considerable attention. In this paper, in order to make full use of image features and improve the generalization ability of the model, a multiscale local feature fusion algorithm is proposed to classify marine microalgae with few shots. First, the input image is gridded and processed at multiple scales, and then it is sent to the category traversal module (CTM) for feature extraction to obtain local features. A local feature fusion module based on the SE-NET self-attention mechanism is designed to obtain locally enhanced features containing global information and to improve the generalization ability of the model. Classification is realized by calculating the distance between the feature vector of each query set sample and the prototypes of the support set. Under the 5-way 1-shot and 5-way 5-shot settings, the classification accuracy of the proposed method is improved by 6.08% and 5.5%, respectively. It provides a new idea for microalgae identification and a new opportunity for the sustainable development of new energy.


Introduction
Marine microalgae are primitive, tiny photosynthetic organisms that release oxygen by converting solar energy and synthesizing organic matter; this process of photosynthetic oxygen release is an important chemical reaction on Earth. Marine microalgae are tiny and can only be seen clearly with the help of a microscope. Approaches based on computer vision and deep learning have frequently been used for object detection [1][2][3], and such methods have also been used to identify algae [4,5]. At present, few species of marine microalgae have been recorded; however, the organic matter synthesized by marine microalgae accounts for half of all photosynthetic products on Earth every year.
When encountering something new, humans can usually learn it quickly from a small quantity of data. For example, a few pictures are enough to recognize plants that have never been seen before. The deep learning algorithms used in modern AI systems, by contrast, often require a large quantity of training data to achieve a usable model, resulting in a high training cost. Additionally, manually labeling a large quantity of data is very time-consuming and labor-intensive. Therefore, ideally, AI systems should have the same fast learning ability as human beings, so that deep learning models achieve excellent recognition ability with only a very small number of labeled samples for each class. This is the goal of the few-shot classification task [6]. There are more than 30,000 species of marine microalgae worldwide, and most of them require human recognition in microscopic images. Because the algae in microscopic images are small and densely distributed, most algae with similar morphology are analyzed and identified using cell detection techniques, which takes a long time. In this study, a few-shot learning algorithm is therefore used for the classification and identification of marine microalgae microscopic images, in order to distinguish algae with close morphology accurately from little image data.
In recent years, most methods using meta-learning have obtained good results in few-shot learning, and metric-based methods, which classify by measuring the distance between samples in the feature space, have become mainstream among meta-learning few-shot classification algorithms. The Siamese neural network proposed by Koch et al. [7] (2015) extracts features from two images separately with the same network structure; if the feature information of the two images is very close, they are likely to belong to the same class, and otherwise they belong to different classes. The matching network (MN) proposed by Vinyals et al. [8] (2016) performs few-shot classification by matching, and introduces the idea of the nearest neighbor algorithm to address the overfitting that arises because deep learning algorithms cannot fully optimize their parameters under few-shot conditions. It uses a neural network with an attention mechanism and a memory module to map the sample feature information, which addresses the over-reliance of the ordinary nearest neighbor algorithm on its metric function. Snell et al. [9] (2017) proposed the prototypical network (PN), which uses an embedding in which the feature points of each category cluster around a single prototype representation. The input is mapped to a high-dimensional metric space through a simple four-layer neural network, and, for each category, the mean of its high-dimensional feature vectors is taken as the prototype. Finally, the squared Euclidean distance between the feature vector of the test sample and each prototype is calculated and transformed into probability values using softmax to predict the category of that sample. Sung et al. [10] (2018) proposed the relation network (RN) to learn a transferable deep metric capable of comparing the relationship between images. The network is divided into two stages: the first stage is an embedding module for extracting feature information, and the second stage is a relation module that outputs the similarity score between two images to determine whether they are from the same category. Li et al. [11] (2019) proposed the deep nearest neighbor neural network (DN4), which compares an image with a category by comparing local features. Zhang et al. [12] (2020) used the Earth Mover's Distance (EMD) as a distance metric: linear programming finds the best match between the individual blocks of two images, different weights are assigned to the blocks at different locations, and the similarity between query set and support set images is expressed by the cost of the best match. Qi Wang et al. [13] (2021) studied a multi-attention network model, which uses semantic embeddings of class tags to generate attention maps and uses these attention maps to create image features for one-shot learning. Unlike in other models, the attention generators in this model can be extended to new classes. Xie et al. [14] (2022) proposed a deep Brownian distance covariance (DeepBDC) method for few-shot classification, which quantifies the dependence between two random variables by the Euclidean distance between their joint characteristic function and the product of the marginals. Wei et al. [15] proposed a prototype network based on mixed attention to solve the problem of relation classification in a few-shot environment. In this model, an instance-level attention mechanism is designed to select key sample instances in the support set, so as to alleviate the influence of noisy samples on the model. A feature-level attention mechanism is also designed to highlight important feature dimensions in the feature space, so as to alleviate the problem of feature sparsity.
The above methods based on metric learning are simple, effective, and widely popular. However, they use image-level global features. When data samples are sparse, these global features cannot effectively characterize the category distribution, which affects the final classification results to a certain extent. To effectively utilize the locally distinguishable information of images and improve the feature representation and generalization ability of the model, this paper proposes a few-shot classification method based on gridded multiscale local feature fusion. First, the input algae images are divided into gridded multiscale blocks and then fed into the category traversal module (CTM) to obtain local features. To address the problem that the category traversal module is difficult to learn and generalize, a local feature fusion module based on the SE-NET architecture is designed to learn the relationship between the multiple local features of each image and obtain a better metric feature representation. Finally, extensive experiments are conducted on a microscopic marine microalgae image dataset to verify the effectiveness of the proposed algorithm. The main contributions of this work are as follows:

• The collection of microscopic images of marine microalgae;
• The training of multiscale local feature fusion, CTM, SE-NET, and a metric module on the marine microalgae dataset;
• The verification of the feasibility of the method by ablation experiments;
• A comparative analysis of the accuracy and performance of all the models.

Category Traversal Module
A category traversal module (CTM) [16] can be added directly to the network of the original algorithm as a plug-and-play module. Relative to the original algorithm, CTM extracts "intraclass commonality" and "interclass uniqueness" in a targeted manner.
The category traversal module consists of two parts: a concentrator for extracting the common features within marine microalgae classes, and a projector for extracting the unique features between marine microalgae classes. First, the feature information of the marine microalgae support set and query set is obtained as $f_\theta(S)$ and $f_\theta(Q)$. Then, the feature information of the support set is input into the concentrator. Next, the convolutional neural network in the concentrator performs dimensional compression on the input features. Finally, the feature maps of the samples of each marine microalgae class are averaged, and the result is output as $o$. The process is represented as follows: $f_\theta(S): (NK, m_1, d_1, d_1) \rightarrow o: (N, m_2, d_2, d_2)$. Here, $NK$ denotes the input of $N$ marine microalgae categories with $K$ samples in each category, for a total of $NK$ samples; $m_1$ denotes the number of channels of the feature map, and $d_1$ denotes the size of the feature map. After the concentrator, the number of feature maps is compressed to $N$ because the sample feature maps are averaged for each category. The number of channels is compressed to $m_2$, and the size of the feature map is compressed to $d_2$. This process removes the differences between the instances and extracts the features common to the instances of each category.
Next, the output $o$ of the concentrator enters the projector, and the feature maps corresponding to each category are concatenated along the channel direction. Then, the feature maps are compressed using a convolutional neural network, and, finally, the mask map $p$ corresponding to each feature map is obtained using a softmax layer over the channel dimension of the feature map. In this process, after the output $o$ of the concentrator in Equation (2) is obtained, the feature map size is $d_2$. The CNN in the projector then compresses the feature maps to size $d_3$. This process extracts the unique features between classes. Finally, the initial feature information $f_\theta(S)$ and $f_\theta(Q)$ of the marine microalgae is obtained through the category traversal module.

SE-NET Feature Fusion
The extracted initial marine microalgae feature information, $f_\theta(S)$ and $f_\theta(Q)$, is fused by the self-attentive feature fusion module. The initial feature vectors are concatenated and fused according to their weights to generate a more accurate feature representation of the marine microalgae image. Cao et al. [17] improved the efficiency of the model in processing feature information by extracting important feature information on an individual marine microalgae basis and by fusing the information as a whole. The fused feature representation enhances the association between the detailed features and the global features of a sample. The self-attentive mechanism can reduce the dependence on external information by capturing the internal correlation of the global feature information of marine microalgae.

Squeeze Operation
In the convolution process, each individual feature convolution covers a local spatial receptive field and carries no channel information. The concept of channel features only arises when the sum operation is performed. The purpose of the squeeze operation is to open up the correlation between channels, which otherwise arises only through the sum operation, and thus to enhance the channel data correlation. The squeeze operation converts each marine microalgae feature map into a scalar through global pooling. Wang et al. [18] calculated the sum of the pixel values of the N × M feature map and averaged them to obtain the global average pooling result, which compresses the N × M feature map into a scalar global description feature. The original N × M × S feature map is thus transformed into a 1 × 1 × S feature map, and the number of channels remains unchanged.

Excitation Operation
The global description of each feature map is obtained through the squeeze operation, which opens the connection between channels, and the excitation operation is used to obtain the relationship between channels. Specifically, the corresponding channel weights are obtained by two full-connection operations and the sigmoid activation function, with the ReLU function applied between the two full-connection operations. W1 is the parameter of the first full-connection operation, the purpose of which is dimensionality reduction, and it has a dimensionality reduction coefficient r as a hyperparameter. W2 is the parameter of the second full-connection operation, which restores the dimensionality to the input dimension. After the two full connections, a weight normalization operation is performed on each feature map using the sigmoid activation function.
Finally, the output weight matrix of the sigmoid activation function is multiplied with the original feature map to obtain the weighted output feature map, which applies a weight coefficient to each feature map in the channel dimension. This makes the channel features of the feature map more expressive, amplifying the effective features and suppressing the ineffective ones. After the SE block is embedded in a layer, the received marine microalgae feature map is first globally average-pooled so that the size of the feature map becomes 1 × 1, while the number of channels remains unchanged.
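The squeeze, excitation, and scaling steps described above can be sketched in plain Python as follows. This is a minimal illustration that assumes the standard SE form, in which the channel weights are sigmoid(W2 · ReLU(W1 · z)) without bias terms; the function name `se_block`, the toy weights, and the 2 × 2 input are assumptions made for the example and are not the configuration used in the experiments.

```python
import math

def se_block(feature_map, w1, w2):
    """Reweight the channels of a feature map with squeeze and excitation.

    feature_map: list of C channels, each a list of rows (N x M values).
    w1: (C // r) x C weights of the first fully connected layer (reduction).
    w2: C x (C // r) weights of the second fully connected layer (restoration).
    """
    # Squeeze: global average pooling turns each N x M channel into one scalar.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excitation, step 1: reduce the dimensionality, then apply ReLU.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    # Excitation, step 2: restore the dimensionality, then apply sigmoid.
    s = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    # Scale: multiply every value of channel c by its weight s[c].
    scaled = [[[s_c * v for v in row] for row in ch] for ch, s_c in zip(feature_map, s)]
    return z, s, scaled

# Toy input: C = 2 channels of size 2 x 2, reduction ratio r = 2 (hypothetical weights).
fmap = [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 0.0], [0.0, 0.0]]]
w1 = [[0.5, 0.5]]     # 1 x 2: reduces 2 channels to 1 hidden unit
w2 = [[1.0], [-1.0]]  # 2 x 1: restores 1 hidden unit to 2 channel weights
z, s, out = se_block(fmap, w1, w2)
```

In this toy case the first channel receives the larger weight, so its values are attenuated less than those of the second channel; in a trained network W1 and W2 are learned rather than fixed by hand.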

Problem Definition
For a few-shot marine microalgae classification task, there exists a support set S, which consists of N different classes of algal images with K labeled samples per class. A query set Q consists of images from the same N classes as the support set S, with q unlabeled samples per class. The purpose of few-shot learning is to enable the model to classify each unlabeled sample in Q according to S. This setup is called the N-way K-shot few-shot classification task [19].
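As an illustration of this setup, the sketch below draws one N-way K-shot episode from a labeled pool. The function name `sample_episode`, the toy dataset, and the choice of 15 query samples per class are assumptions made for the example; the query labels are kept only so that predictions can be scored afterwards, while the model itself sees the query samples as unlabeled.

```python
import random

def sample_episode(dataset, n_way, k_shot, q_query, rng):
    """Draw one episode from `dataset`, a dict mapping class label -> list of samples."""
    # Choose the N classes of this episode.
    classes = rng.sample(sorted(dataset), n_way)
    support, query = [], []
    for label in classes:
        # Draw K + q distinct samples so support and query never overlap.
        picked = rng.sample(dataset[label], k_shot + q_query)
        support += [(x, label) for x in picked[:k_shot]]
        query += [(x, label) for x in picked[k_shot:]]
    return support, query

# Toy pool: 10 hypothetical classes with 20 image identifiers each.
toy = {f"class_{c}": [f"img_{c}_{i}" for i in range(20)] for c in range(10)}
support, query = sample_episode(toy, n_way=5, k_shot=1, q_query=15, rng=random.Random(0))
```

With these settings the support set holds 5 samples (one per class) and the query set holds 75 samples drawn from the same five classes.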
The entire marine microalgae dataset is divided into three subdatasets according to categories, namely, the training dataset, the validation dataset, and the testing dataset, where the training set has the largest number of sample categories. The training and validation datasets are used in training, and the test dataset is used to test the classification performance of the best model generated in the training phase. There is no intersection of the categories in the three subdatasets.
The proposed few-shot classification model based on gridded multiscale local feature fusion is a neural network architecture that can be trained end-to-end. The general framework is shown in Figure 1, which is a schematic diagram of the 5-way 1-shot few-shot classification task for marine microalgae. The proposed method consists of three main modules: a CTM local feature extraction module, a local feature fusion module based on the SE-NET architecture, and a metric module.

Backbone
The backbone network selected is ResNet, which is modified from the VGG19 network by adding residual units through a shortcut mechanism [20]. Compared with VGG19, ResNet directly uses convolutions with stride = 2 for downsampling and replaces the fully connected layers with a global average pooling layer. An important ResNet design principle is that the number of feature channels doubles when the feature map size is halved, which maintains the complexity of the network layers. Compared with plain networks, ResNet adds a shortcut connection between every two layers, which results in residual learning.
The ResNet-12 network is composed of four residual blocks of depth 3, each containing three convolutional layers with 3 × 3 kernels. The number of channels in the four stages of the network is [64, 160, 320, 640], the number of residual blocks per stage is [1, 1, 1, 1], and a 2 × 2 max pooling layer is applied at the end of each block [21].

Feature Extraction Module
The category traversal module (CTM) is chosen as the feature extraction module. The main CTM structure is shown in Figure 2, where the concentrator extracts the intraclass common features of marine microalgae, and the projector extracts the interclass unique features. CTM can extract intraclass common features and interclass unique features of marine microalgae in a targeted manner; both kinds of features in the marine microalgae dataset are illustrated in Figure 3. In Figure 3, each category of the marine microalgae support set (4-shot) contains four samples. The texture features are the same across the four samples of a category, while the shape features differ, indicating that the texture features are shared by such samples and are the common features within the marine microalgae category.
The marine microalgae support set (1-shot) contains five sample categories. Each category has only one sample, and each sample contains two features: a texture feature and a shape feature. When a query sample is given, its distance to the first sample is 2, because both the texture and the shape features differ, while its distance to each of the other four samples is 1, because only one of the two features differs. Following the traditional metric learning approach, the query sample might be randomly assigned to one of these four categories. By carefully observing the five samples in the support set, it is found that the texture features differ among categories, while a shape feature is shared by different categories. This indicates that the texture features are the fundamental features for distinguishing the categories; such features, whose attributes differ among categories, are the unique features among marine microalgae classes.

Self-Attentive Feature Fusion Module
The CTM feature extractor extracts the global feature information of the marine microalgae image indiscriminately, so the result may be affected by background interference and by the different sizes and morphologies of the target objects. To better represent the important feature information in the image, an SE-NET self-attentive feature fusion module is used to fuse the important feature information of the marine microalgae image [22]. A microalgae image sample generates two sets of feature information after passing through the CTM feature extractor. The SE-NET self-attentive feature fusion module fuses the extracted important feature information at the two different scales, and, finally, these feature vectors are concatenated to generate a more accurate feature representation of the image. By fusing the microalgae image information as a whole, the efficiency of the model in processing the feature information is improved.

Metrics Module
The distance measure chosen for the metric module is the squared Euclidean distance, which is a Bregman divergence. For a Bregman divergence, the point in the feature space with the minimum average distance to a series of marine microalgae feature points is the mean of those points. In [23], the squared Euclidean distance was used as the distance measure and performed better than the cosine distance; moreover, with a Bregman divergence as the distance measure, few-shot classification can be cast as a linear classification model.
The model is trained with episode-based mini-batch gradient descent. For the loss function, the distance d between the training sample and each prototype is negated and transformed into a probability value p using the softmax function, and the negative logarithm of the probability of the correct category is taken as the loss. The goal is to bring the samples closer and closer to the prototype of the correct category, and farther and farther from the prototypes of the other categories. The specific form of the loss function is: $L = -\log p(y = k \mid x)$, with $p(y = k \mid x) = \exp(-d(x, c_k)) / \sum_{k'} \exp(-d(x, c_{k'}))$, where $c_k$ denotes the prototype of category $k$ and $k$ is the correct category of the sample $x$.
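The metric module and loss described above can be sketched as follows. This is a minimal plain-Python illustration operating on already-extracted feature vectors; the function names and the two-dimensional toy features are assumptions made for the example, and the loss of an episode is taken here as the mean over its query samples.

```python
import math

def squared_euclidean(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def prototypes(support):
    """support: dict label -> list of feature vectors; returns label -> mean vector."""
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in support.items()
    }

def class_probabilities(query_vec, protos):
    """Softmax over the negated distances from the query to every prototype."""
    logits = {label: -squared_euclidean(query_vec, c) for label, c in protos.items()}
    top = max(logits.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(v - top) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

def episode_loss(queries, protos):
    """queries: list of (feature vector, true label); mean negative log-probability."""
    return sum(-math.log(class_probabilities(x, protos)[y]) for x, y in queries) / len(queries)

# Toy 2-way 2-shot support set with hypothetical 2-D features.
support = {"A": [[0.0, 0.0], [0.0, 2.0]], "B": [[4.0, 0.0], [4.0, 2.0]]}
protos = prototypes(support)  # class means: A -> [0.0, 1.0], B -> [4.0, 1.0]
probs = class_probabilities([1.0, 1.0], protos)
loss = episode_loss([([1.0, 1.0], "A")], protos)
```

The query at [1.0, 1.0] lies at squared distance 1 from prototype A and 9 from prototype B, so almost all of the probability mass goes to A and the loss for the label A is close to zero.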

Evaluation Method
The evaluation method used in this paper is accuracy with a confidence interval, which measures the uncertainty of the estimated value; the accuracy calculated in this way is more realistic and reliable. The overall evaluation index is calculated as follows. First, the classification accuracy is calculated as $\mathrm{acc} = T / \mathrm{all}$, where $T$ represents the number of correctly classified samples and $\mathrm{all}$ represents the total number of samples. The mean accuracy $\mathrm{avg}$ is then obtained by averaging multiple accuracy values.
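A minimal sketch of this evaluation is given below. The interval formula is not shown in this section, so the example assumes the common 95% normal-approximation interval, avg ± 1.96 · std / √n, computed over per-episode accuracies; the function name and the four toy accuracy values are likewise assumptions.

```python
import math

def mean_accuracy_with_ci(accuracies, z=1.96):
    """Return the mean accuracy and the half-width of its confidence interval."""
    n = len(accuracies)
    avg = sum(accuracies) / n
    # Sample standard deviation of the per-episode accuracies.
    std = math.sqrt(sum((a - avg) ** 2 for a in accuracies) / (n - 1))
    # Assumed interval: normal approximation with z = 1.96 (95% level).
    half_width = z * std / math.sqrt(n)
    return avg, half_width

# Hypothetical accuracies of four test episodes (each is T / all for that episode).
episode_acc = [0.60, 0.70, 0.80, 0.70]
avg, half = mean_accuracy_with_ci(episode_acc)
```

The result would be reported as avg ± half; with the thousands of test episodes used in practice, the interval becomes much narrower than in this four-episode toy case.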

Image Dataset
Microalgae images from the Heshi Reef sea area in Dalian city (Liaoning province, China) were used in this study. The images were collected by taking seawater samples and placing them under the microscope of an optical microscopic imaging system. The specific steps are as follows.
Step 1: An appropriate sample of seawater is taken and placed under the microscope of the optical microscopy imaging system.
Step 2: The identified target in the water is magnified and imaged at a clear resolution, and then converted into an image signal by a high-speed charge-coupled device (CCD) imaging device.
Step 3: The image signal is sent to the computer through the image acquisition card to obtain the image information.
The microscopic image acquisition process of marine microalgae is shown in Figure 4. Based on the marine microalgae microscopic image samples collected in the laboratory, the collected microscopic images of algae cells were expanded to provide sample support for the subsequent microalgae recognition and classification learning algorithm. In most cases, because of the acquisition tools or methods used, the acquired original image is restricted by various conditions and often cannot fully reflect all the information of the imaged target. Some samples of the finally obtained marine microalgae microscopic image dataset are shown in Figure 5. The acquisition process of marine microalgae microscopic images is complicated. To avoid overfitting, it is necessary to expand the marine microalgae microscopic image data.
In practice, it is difficult for researchers to collect enough relevant data for some tasks, and sorting, screening, and labeling the image data after collection also consume considerable mental and physical effort. To obtain more training data, researchers often use data augmentation. This method makes certain changes to the original image so that the result differs from the original at the pixel level while the subject to be predicted in the image does not change; it can therefore enrich the dataset and effectively avoid overfitting.
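The kind of label-preserving change described above can be sketched on a tiny grayscale image stored as a list of rows. The three helper functions below (horizontal flip, 90° rotation, brightness scaling) are illustrative assumptions and not the augmentation code used to build the dataset.

```python
def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]

def rotate_90_clockwise(img):
    """Rotate the image by 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def adjust_brightness(img, factor):
    """Scale every pixel by `factor` and clip the result to the 0..255 range."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in img]

# Toy 2 x 2 grayscale image with hypothetical pixel values.
img = [[10, 20], [30, 40]]
augmented = [flip_horizontal(img), rotate_90_clockwise(img), adjust_brightness(img, 1.5)]
```

Each output differs from the original at the pixel level, yet it would keep the class label of the original image.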
The collected marine microalgae image data were expanded by using methods such as flipping, rotation, splicing, brightness adjustment, and noise addition. The data augmentation effect is shown in Figure 6. In most cases, the acquired image is affected by particles because of the acquisition tools or methods used in the image acquisition process. The original image is thus degraded by particles under various conditions and often cannot fully reflect all the information of the imaged target [24,25]. Therefore, it must be processed in the early stage of visual information processing.
Using computer technology, image processing operations such as gray correction and noise filtering are carried out on the original image. Unnecessary image information is suppressed by particle suppression on the image data, and the image information useful for subsequent processing is enhanced. The dataset contains 150 classes of marine microalgae images with 100 samples per class, totaling 15,000 images. The main dataset information is shown in Table 1. The overall marine microalgae microscopic image recognition framework is shown in Figure 7.

Experimental Setup and Method
In this experiment, marine microalgae microscopic images are used as the dataset, and a few-shot learning recognition algorithm is used to identify marine microalgae images. The process is shown below.
Training process: the weights of the backbone network in the model are pre-trained, and then fine-tuned together with the other modules in the model to carry out the N-way K-shot few-shot classification task. There are 100 epochs in total, with 100 and 500 episodes in each epoch for training and validation, respectively. In the test phase, 2000 episode tasks are randomly selected from the test set to evaluate the average classification accuracy. For an episode, N classes are selected, and then the support and query samples are selected for each selected class. The model is trained once with the support and query samples of the selected classes.
Two scenarios commonly used in few-shot learning were selected for the experiment: 5-way 1-shot and 5-way 5-shot. The 5-way 1-shot scenario is more difficult to train because of the small number of support set samples. The backbone network in the model is initialized with pre-trained weights, and then fine-tuned together with the other modules in the model. In particular, the episodic training strategy is adopted to carry out the N-way K-shot few-shot classification task. The SGD (stochastic gradient descent) optimizer is used to optimize the network. Training runs for 100 epochs in total, and the learning rate, starting from its initial value, decays by half after every 10 epochs. Each epoch has 100 and 300 episodes for training and validation, respectively.
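The step decay described above can be written as a small function. The initial learning rate is not given in this section, so the value 0.1 below is purely a placeholder assumption, as is the function name; epochs are counted from 0.

```python
def learning_rate(epoch, initial_lr, step=10, factor=0.5):
    """Learning rate for a zero-based epoch index, halved after every `step` epochs."""
    return initial_lr * factor ** (epoch // step)

# Schedule for the 100 training epochs, with a placeholder initial value of 0.1.
schedule = [learning_rate(e, initial_lr=0.1) for e in range(100)]
```

Under this schedule the rate stays constant within each block of 10 epochs and has been halved nine times by the final block.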
In the test phase, 1000 episode tasks are randomly selected from the test set to evaluate the average classification accuracy. In the validation and testing stages, whether the input is a whole image or a local image block, it is first resized to 84 × 84 and then input into the backbone network to extract features.
In Table 2, the proposed method P(1, 2, 3) indicates that the original image is divided into a total of 14 local blocks from 1 × 1, 2 × 2, and 3 × 3 grids as inputs. The proposed method shows excellent performance on the marine microalgae classification dataset. On the 5-way 1-shot and 5-way 5-shot few-shot classification tasks, the results of the proposed method improved by 6.08% and 5.5% over the second-best results, respectively.
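The block count for P(1, 2, 3) follows from 1 + 4 + 9 = 14, which the sketch below reproduces by splitting an image into 1 × 1, 2 × 2, and 3 × 3 grids. The function name `grid_blocks` and the synthetic 84 × 84 image are assumptions made for the example, and the resizing of every block to 84 × 84 before the backbone is omitted.

```python
def grid_blocks(image, scales=(1, 2, 3)):
    """Split a 2-D image (list of rows) into an s x s grid for every scale s."""
    height, width = len(image), len(image[0])
    blocks = []
    for s in scales:
        for i in range(s):
            for j in range(s):
                # Integer boundaries of grid cell (i, j) at scale s.
                top, bottom = i * height // s, (i + 1) * height // s
                left, right = j * width // s, (j + 1) * width // s
                blocks.append([row[left:right] for row in image[top:bottom]])
    return blocks

# Synthetic 84 x 84 image whose pixel value encodes its position.
image = [[r * 84 + c for c in range(84)] for r in range(84)]
blocks = grid_blocks(image)
```

The first block is the whole image, the next four are the 42 × 42 quadrants, and the last nine are 28 × 28 cells, so the blocks of each scale tile the image completely.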

Ablation Experiments
To verify the effect of gridded multiscale local features on the classification accuracy of the model and to select the optimal hyperparameters, the following ablation experiments were designed on the marine microalgae dataset using different numbers of feature scales.
From the experimental results, it can be concluded that P(1, 2, 3) scale features give the best final recognition effect on the marine microalgae dataset. Single-scale features capture only partial feature information about the image, which is one-sided. When multiple scales are used for feature extraction, however, both the detailed features and the global feature information of the image are obtained, which yields a more accurate multiscale representation of the image and better improves the classification accuracy of the model.
Although the chunking process can effectively utilize the local image features, when the number of chunks is too large, a complete target may be separated into multiple local chunks, and the integrity of the target is destroyed, which adversely affects the classification. Therefore, the number of chunks needs to be traded off according to the classification results, and more chunks are not necessarily better.

Grid-Based Partial Chunking Experiments
To verify the rationality of the gridded local chunking approach, it is compared with a local chunking approach based on random cropping; the recognition results under the same number of local image blocks are shown in Figure 8.
The grid chunking method performs better than random cropping for the same number of local blocks. This is because, when the image is grid chunked, the local block images together contain the information of the whole image, and the local feature fusion module then gives each local feature global information. In contrast, the random cropping method only takes some local blocks at random from the original image and may not obtain the complete global information even through feature fusion, so its classification performance is weaker than that of the grid chunking method.

Conclusions
In recent years, accurate identification of marine microalgae has provided new opportunities for sustainable photosynthetic oxygen release processes. This study addresses the classification and identification of marine algae based on few-shot learning. In this paper, an algorithm for the few-shot classification of marine microalgae based on gridded multiscale local feature fusion has been proposed. In few-shot image recognition, the gridded multiscale local features are obtained by the category traversal module so that the local features of algae images are fully utilized, and the enhanced features containing global information are obtained by the local feature fusion module based on the SE-NET architecture, which improves the feature extraction ability and generalization ability of the model. Compared with various classical networks, the proposed method has higher classification accuracy on the marine microalgae dataset, and the ablation experiments also verify the effectiveness of the proposed method. In future work, the development of marine microalgae classification technology will be promoted by enriching the marine microalgae dataset and strengthening model training and learning. For marine microalgae microscopic image data, how to use deep neural networks instead of shallow networks to mine the hidden feature information of microalgae images is one of the problems to be solved.

Figure 5. Sample of data set.

Figure 8. The 5-way 1-shot and 5-way 5-shot classification results on the marine microalgae dataset using grid chunking and random cropping.