A Systematic Evaluation: Fine-Grained CNN vs. Traditional CNN Classifiers

To make the best use of the underlying minute and subtle differences, fine-grained classifiers collect information about inter-class variations. The task is very challenging because of the small differences in color, viewpoint, and structure between entities of the same class. Classification becomes even more difficult because the viewpoint differences within a class can resemble the differences from other classes. In this work, we investigate the performance of landmark general CNN classifiers, which presented top-notch results on large-scale classification datasets, on fine-grained datasets and compare them against state-of-the-art fine-grained classifiers. We pose two specific questions: (i) Do the general CNN classifiers achieve comparable results to fine-grained classifiers? (ii) Do general CNN classifiers require any specific information to improve upon the fine-grained ones? Throughout this work, we train the general CNN classifiers without introducing any aspect that is specific to fine-grained datasets. We show an extensive evaluation on six datasets to determine whether the fine-grained classifiers are able to elevate the baseline in their experiments.


Introduction
Fine-grain visual classification (FGVC) refers to the task of distinguishing the categories of the same class. Fine-grain classification is different from traditional classification as the former models intra-class variance while the latter is about the inter-class difference. Examples of naturally occurring fine-grain classes include birds (Wah et al., 2011; Van Horn et al., 2015), dogs (Khosla et al., 2011), flowers (Nilsback and Zisserman, 2008), vegetables (Hou et al., 2017), and plants (Wegner et al., 2016), while human-made categories include aeroplanes (Maji et al., 2013), cars (Krause et al., 2013), and food (Chen et al., 2009). Fine-grain classification is helpful in numerous computer vision and image processing applications such as image captioning (Aafaq et al., 2019), machine teaching (Spivak, 2012), and instance segmentation (Liu et al., 2018).
Fine-grain visual classification is a challenging problem as there are minute and subtle differences within the species of the same classes, e.g. a crow and a raven, compared to traditional classification, where the difference between the classes is quite visible, e.g. a lion and an elephant. Fine-grained visual classification of species or objects of any category is a herculean task for human beings and usually requires extensive domain knowledge to identify the species or objects correctly.
As mentioned earlier, fine-grained classification in image space must cope with high intra-class variance and low inter-class variance. We provide a few sample images from the dog and bird datasets in Figure 1 to highlight the difficulty of the problem. The examples in the figure show images with the same viewpoint. The colors are also roughly similar. Although the visual variation between classes is very limited, all of these belong to different dog and bird categories. In Figure 2, we provide more examples of the same categories. Here, the differences in viewpoint and color are prominent. The visual variation is more significant than in the images of Figure 1, but these belong to the same class.
Many approaches have been proposed to tackle the problem of fine-grained classification; for example, earlier works converged on part detection to model the intra-class variations. Next, the algorithms exploited three-dimensional representations to handle multiple poses and viewpoints and achieve state-of-the-art results. Recently, with the advent of CNNs, most methods have exploited the modeling capacity of CNNs as a component or as a whole.
This paper aims to investigate the capability of traditional CNN networks compared to specially designed fine-grained CNN classifiers. We strive to answer whether current general CNN classifiers can achieve comparable performance to fine-grained ones. To show the competitiveness, we employ several fine-grained datasets and report top-1 accuracy for both classifier types. These experiments provide a proper place for general classifiers in fine-grained performance charts and serve as baselines for future comparisons for FGVC problems.
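Since top-1 accuracy is the metric reported for both classifier types, a minimal sketch of how it can be computed may be useful. The function name `top1_accuracy` and the toy scores and labels below are hypothetical and chosen only for illustration; they are not taken from the experiments.

```python
def top1_accuracy(scores, labels):
    """Fraction of samples whose highest-scoring class equals the true label."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest class score is the top-1 prediction.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


# Hypothetical class scores for four images and three classes.
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.2, 0.6],
    [0.4, 0.5, 0.1],
]
labels = [1, 0, 2, 0]
accuracy = top1_accuracy(scores, labels)
print(f"top-1 accuracy: {accuracy:.2f}")
```

Only the single most confident prediction per image counts, so the last toy sample, where the true class receives the second-highest score, is scored as an error.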

Fig. 1 The difference between classes (inter-class variation) is limited for various classes. Panels: Malamute, Husky, Eskimo (dogs); Crow, Raven, Jackdaw (birds).
Fig. 2 The intra-class variation is usually high due to pose, lighting, and color.
This paper is organized as follows: Section 2 presents related work about fine-grain classification networks. Section 3 introduces the traditional state-of-the-art algorithms which will be compared against fine-grain classifiers. Section 4 shows the experimental settings and datasets for evaluation. Section 5 offers a comparative evaluation between the traditional classifiers and fine-grain classifiers; finally, Section 6 concludes the paper.

Fine-Grain Classifiers
Fine-grained visual classification is an important and well-studied problem. Fine-grain visual classification aims to differentiate between subclasses of the same category, in contrast to the traditional classification problem, where discriminative features are learned to distinguish between classes. Some of the challenges in this domain are the following: i) The categories are highly correlated, i.e. there are only small differences and a small inter-category variance to discriminate between subcategories. ii) Similarly, the intra-category variation can be significant due to different viewpoints and poses. Many algorithms, such as (Angelova et al., 2013; Chai et al., 2012; Deng et al., 2013; Gavves et al., 2013; Zhang et al., 2016a; Huang et al., 2016; Zhang et al., 2016b), have been presented to achieve the desired results. In this section, we highlight the recent approaches. The FGVC research can be divided into the following main branches, reviewed in the paragraphs below.
Part-Based FGVC Algorithms. The part-based category of algorithms relies on the distinguishing features of the objects to improve the accuracy of visual recognition, which includes (Yang et al., 2012; Chai et al., 2013; Berg and Belhumeur, 2013; Zhang et al., 2013, 2014; Xiao et al., 2015). These FGVC methods (Zheng et al., 2017; Wang et al., 2015) aim to learn the distinct features present in different parts of the object, e.g. the differences present in the beak and tail of the bird species. Similarly, the part-based approaches normalize the variation present due to poses and viewpoints. Many works (Parkhi et al., 2011, 2012; Wah et al., 2011) assume the availability of bounding boxes at the object level and the part level in all the images during training as well as testing. To achieve higher accuracy, (Berg and Belhumeur, 2013; Liu et al., 2012; Xie et al., 2013) employed both object-level and part-level annotations. These assumptions restrict the applicability of the algorithms to larger datasets. A reasonable alternative setting would be the availability of a bounding box around the object of interest. Recently, Chai et al. (2013) applied simultaneous segmentation and detection to enhance the performance of segmentation and object part localization. Similarly, a supervised method is proposed in Gavves et al. (2013), which locates the training images similar to a test image using KNN. The object part locations from the selected training images are regressed to the test image.
Bounding Boxes Based Methods. The succeeding supervised methods take advantage of the annotated data during the training phase while requiring no knowledge during the testing phase, and learn on both object-level and object-parts level annotation in the training phase only. Such an approach is presented in Krause et al. (2015), where only object-level annotations are given during training while no supervision is provided at the object-parts level. Similarly, the Spatial Transformer Network (STCNN) (Jaderberg et al., 2015) handles data representation and outputs the location of vital regions. Furthermore, recent approaches focused on removing the limitation of previous works, aiming for conditions where the information about the object-parts location is not required either in the training or testing phase. These FGVC methods are suitable for deployment on a large scale and help the advancement of research in this direction.
Attention Models. Recently, attention-based algorithms have been employed in FGVC, where the focus is on distinguishing parts via an attention mechanism. Using attention, Xiao et al. (2015) presented two attention models to learn appropriate patches for a particular object and determine the discriminative object parts using a deep CNN. The fundamental idea is to cluster the last CNN feature maps into groups. The object patches and object parts are obtained from the activations of these clustered feature maps. Xiao et al. (2015) needs the model to be trained on the category of interest, while we only require the general trained CNN. Similarly, DTRAM (Li et al., 2017) learns to end the attention process for each image after a fixed number of steps. Several methods are proposed to take advantage of object parts. However, the most popular one is the deformable part model (DPM) (Felzenszwalb et al., 2008), which learns the constellation relative to the bounding box with Support Vector Machines (SVM). Simon and Rodner (2015) improved upon Simon et al. (2014) and employed DPM to localize the parts using the constellation provided by DPM (Felzenszwalb et al., 2008). The Navigator-Teacher-Scrutinizer Network (NTSNet) (Yang et al., 2018) uses informative regions in images without employing any annotations. Another teacher-student network was proposed recently as the Trilinear Attention Sampling Network (TASN) (Zheng et al., 2019), which is composed of a trilinear attention module, an attention-based sampler, and a feature distiller.
No Bounding Boxes Methods. Contrary to utilizing the bounding box annotations, current state-of-the-art fine-grain visual categorization methods avoid incorporating the bounding boxes during the testing and training phases altogether. Zhang et al. (2014) and Lin et al. (2015) used a two-stage network for object and object-part detection and classification, employing R-CNN and Bilinear CNN, respectively. Part Stacked CNN (Huang et al., 2016) adopts the same two-stage strategy as Lin et al. (2015) and Zhang et al. (2014); however, the difference lies in the stacking of the object parts at the end for classification. Fu et al. (2017) proposed the multiple-scale RACNN to acquire distinguishing attention and region feature representations. Moreover, HIHCA (Cai et al., 2017) incorporated higher-order hierarchical convolutional activations via a kernel scheme.
Distance Metric Learning Methods. An alternative to part-based algorithms are distance learning algorithms, which aim to cluster the data points/objects of the same category together while moving different types away from each other. Bromley et al. (1994) trained Siamese networks using deep metrics for signature verification and, in this context, set the trend in this direction. Recently, Qian et al. (2015) employed a multi-stage framework that accepts pre-computed feature maps and learns the distance metric for classification. The pre-computed features can be extracted from DeCAF (Donahue et al., 2014), as these features are discriminative and can be used in many tasks for classification. Dubey et al. (2018) employs pairwise confusion (PC) via traditional classifiers.
Feature Representations Based Methods. These methods utilize the features from CNN methods to capture the global information. Many works, including Branson et al. (2014), Krause et al. (2015), Xiao et al. (2015), and Zhang et al. (2014), utilized the feature representations of a CNN and em-
Feature Integration Algorithms. Recently, feature integration methods take the features from different layers of the same CNN model and combine them. This technique is becoming popular and is adopted by several approaches. The intuition behind feature integration is to take advantage of the global semantic information captured by fully connected layers and the instance-level information preserved by convolutional layers (Babenko and Lempitsky, 2015). Long et al. (2015) merged the features from intermediate and high-level convolutional activations in their convolutional network to exploit both low-level details and high-level semantics for image segmentation. Similarly, for localization and segmentation, Hariharan et al. (2015) concatenated the feature maps of convolutional layers at a pixel as a vector to form a descriptor. Likewise, for edge detection, Xie and Tu (2015) added several feature maps from the lower convolutional layers to guide the CNN and predict edges at different scales.

Traditional Networks
To make the paper self-contained, we briefly describe the basic building blocks of the modern state-of-the-art traditional CNN architectures. These architectures can be broadly categorized into plain, residual, densely connected, inception, and split-attention networks. We review the most prominent and pioneering traditional networks in each of these categories and then adapt these models for the fine-grained classification task. The five architectures we investigate are VGG (Simonyan and Zisserman, 2015), ResNet (He et al., 2016), DenseNet (Huang et al., 2017), Inception (Szegedy et al., 2016), and ResNeSt (Zhang et al., 2020).

Plain Network
Pioneering CNN architectures such as VGG (Simonyan and Zisserman, 2015) and AlexNet follow a single path, i.e. without any skip connections. The success of AlexNet (Krizhevsky et al., 2012) inspired VGG. These networks rely on smaller convolutional filters because a sequence of smaller convolutional filters covers the same receptive field as a larger convolutional filter. For example, four stacked 3×3 convolutional layers have the same receptive field as two 5×5 convolutional layers in sequence. Moreover, the stack of smaller filters has fewer parameters than the larger ones. The basic building block of the VGG (Simonyan and Zisserman, 2015) architecture is shown in Figure 3.
VGG (Simonyan and Zisserman, 2015) has many variants; we use the 19-layer convolutional network, which has shown promising results on ImageNet. As mentioned earlier, the block structure of VGG is planar (without any skip connection), and the number of feature channels is increased from 64 to 512.
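The receptive-field comparison can be checked with a few lines of arithmetic. The sketch below assumes stride-1 convolutions, ignores biases, and uses a hypothetical width of 64 channels for every layer; the helper names are ours and are chosen only for illustration.

```python
def stacked_receptive_field(kernel_sizes):
    """Receptive field of stride-1 convolutions applied in sequence."""
    field = 1
    for k in kernel_sizes:
        field += k - 1  # each k x k layer widens the field by k - 1 pixels
    return field


def stacked_weight_count(kernel_sizes, channels):
    """Weights (biases ignored) when every layer maps `channels` to `channels`."""
    return sum(k * k * channels * channels for k in kernel_sizes)


small_stack = [3, 3, 3, 3]  # four 3x3 layers
large_stack = [5, 5]        # two 5x5 layers
channels = 64               # hypothetical layer width

rf_small = stacked_receptive_field(small_stack)
rf_large = stacked_receptive_field(large_stack)
w_small = stacked_weight_count(small_stack, channels)
w_large = stacked_weight_count(large_stack, channels)

print(rf_small, rf_large)  # both stacks cover a 9x9 window
print(w_small, w_large)    # 36*C^2 versus 50*C^2 weights
```

Under these assumptions both stacks see a 9×9 window, while the 3×3 stack needs 36C² weights against 50C² for the 5×5 stack and additionally interleaves more non-linearities.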

Residual Network
To solve the vanishing gradients problem, the residual network employed elements of the network with skip connections known as identity shortcuts, as shown in Figure 4. The pioneering research in this direction is ResNet (He et al., 2016).
The identity shortcuts help to propagate the gradient signal back without being diminished. The identity shortcuts theoretically "skip" over all layers and reach the network's initial layers, learning the task at hand. Because of the summation of features at the end of each module, ResNet (He et al., 2016) learns only an offset, and therefore, it does not require the learning of the full features. The identity shortcuts allow for successful and robust training of much deeper architectures than previously possible. We compare the ResNet50 and ResNet152 variants with fine-grained classifiers due to their successful classification results.
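The offset learning described above can be written compactly. As a sketch in our own notation (the symbols x, y, F, and W are assumptions introduced for illustration and do not appear in the original text), a residual block with input x and a residual branch F with weights W computes:

```latex
y = x + \mathcal{F}(x; W),
\qquad
\frac{\partial y}{\partial x} = I + \frac{\partial \mathcal{F}(x; W)}{\partial x}
```

The identity term I in the Jacobian is what lets the gradient pass through the block even when the contribution of the residual branch is small.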

Dense Network
Building upon the success of ResNet (He et al., 2016), DenseNet (Huang et al., 2017) concatenates the outputs of the convolutional layers in each module, replacing the expensive element-wise addition and retaining both the current features and those from the previous layers through skip connections. Furthermore, there is always a path for information from the last layer backward to deal with the gradient diminishing problem. Moreover, to improve computational efficiency, DenseNet (Huang et al., 2017) utilizes 1×1 convolutional layers to reduce the number of input feature maps before each 3×3 convolutional layer. Transition layers are applied to compress the number of channels that result from the concatenation operations. The building block of DenseNet (Huang et al., 2017) is shown in Figure 5.
The performance of DenseNet on ILSVRC is comparable with ResNet. However, it has significantly fewer parameters, thus requiring fewer computations, e.g. DenseNet with 201 convolutional layers and 20 million parameters produces a validation error comparable to that of a ResNet with 101 convolutional layers and 40 million parameters. Therefore, we consider DenseNet a suitable candidate for fine-grained classification.
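To make the effect of concatenation on the channel count concrete, the following sketch tracks how many feature maps each layer of a dense block receives and how a transition layer compresses the result. The values used here (64 input channels, 6 layers, 32 new feature maps per layer, and a compression factor of 0.5) are illustrative assumptions, and the function names are hypothetical.

```python
def dense_block_channels(in_channels, num_layers, growth_rate):
    """Return the input width seen by each layer and the block's output width."""
    layer_inputs = []
    channels = in_channels
    for _ in range(num_layers):
        layer_inputs.append(channels)
        # Concatenation appends `growth_rate` new feature maps to all earlier ones.
        channels += growth_rate
    return layer_inputs, channels


def transition_channels(channels, compression=0.5):
    """Channel count after a transition layer with the given compression factor."""
    return int(channels * compression)


# Assumed, illustrative configuration.
layer_inputs, block_out = dense_block_channels(in_channels=64, num_layers=6, growth_rate=32)
compressed = transition_channels(block_out, compression=0.5)

print(layer_inputs)  # input width grows linearly with depth inside the block
print(block_out, compressed)
```

Because the input width grows with every layer, the 1×1 convolutions and the transition layers mentioned above are what keep the computation manageable.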

Inception Network
Here, we present Inception-v3 (Szegedy et al., 2016), which utilizes label smoothing as a regularizer together with factorized 7×7 convolutions. Similarly, to propagate label information into the deepest parts of the network, Inception-v3 (Szegedy et al., 2016) employs an auxiliary classifier with batch normalization in its side-head layers. Figure 6 shows the block of the Inception-v3 (Szegedy et al., 2016) architecture that is used on the 8×8 grids of the coarsest level to promote high-dimensional representations.

Split-Attention Network
Lastly, we present the split-attention network in Figure 7, which employs attention and residual blocks and is called ResNeSt (Zhang et al., 2020), an extension of ResNet. The cardinal group representations are then concatenated along the channel dimension. The final output of a split-attention block is produced using a shortcut connection similar to standard residual blocks, provided that the input and output feature maps have the same shape. Moreover, to align the outputs of blocks having a stride, an appropriate transformation is applied to the shortcut connection, e.g. the transformation can be a convolution, a strided convolution, or a convolution with pooling.

Experimental Settings
The Stochastic Gradient Descent (SGD) (Bottou, 2010) optimizer with a decay rate of 1e-4 is used. We choose a batch size of 32 and an initial learning rate of 0.01 for 200 epochs, where the learning rate is decreased by a factor of 0.1 after every 30 epochs for all datasets. The networks are fine-tuned from ImageNet (Deng et al., 2009) training weights. For each dataset, the last fully-connected layer is re-mapped from 1k outputs to the number of classes in that dataset.
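The learning rate schedule can be sketched as follows. We assume here that the reduction by 0.1 is multiplicative, i.e. a step decay that multiplies the rate by 0.1 every 30 epochs; the function name `step_lr` and the zero-based epoch indexing are also our own assumptions.

```python
def step_lr(epoch, base_lr=0.01, step_size=30, gamma=0.1):
    """Learning rate at a zero-based epoch under an assumed step decay."""
    # The rate is multiplied by `gamma` once for every completed block of
    # `step_size` epochs.
    return base_lr * gamma ** (epoch // step_size)


# Print the rate at each epoch where the schedule changes, over 200 epochs.
for epoch in range(0, 200, 30):
    print(f"epoch {epoch:3d}: lr = {step_lr(epoch):.1e}")
```

Under this reading, the rate changes six times within the 200 training epochs, so the last epochs run with a very small step size.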

Datasets
This section provides the details of the six most prominent fine-grain datasets used for evaluation and comparison against the current state-of-the-art algorithms.
-Birds: We compare on two bird datasets. The first, Caltech-UCSD Birds-200-2011, abbreviated as CUB (Wah et al., 2011), is composed of 11,788 photographs of 200 categories, which are further divided into 5,994 training and 5,794 testing images. The second dataset for fine-grained bird classification is North American Birds, generally known as NABirds (Van Horn et al., 2015), which is the largest in this comparison. NABirds (Van Horn et al., 2015) has 555 species found in North America with 48,562 photographs.

Performance on CUB Dataset
We present the comparisons on the CUB dataset (Wah et al., 2011) in Table 2. The best performer on this dataset is DenseNet, which is unsurprising because the model concatenates the feature maps from preceding layers to preserve details. The worst performing among the traditional classifiers is NasNet (Zoph et al., 2018), possibly because its design is more inclined towards a specific dataset (i.e. ImageNet (Deng et al., 2009)). The ResNet models perform relatively better than NasNet, which shows that networks with shortcut connections outperform those with multi-scale representations for fine-grained classification. DenseNet offers higher accuracy than ResNet
Table 3 Experimental results on FGVC Aircraft (Maji et al., 2013) and Cars (Krause et al., 2013).
The fine-grained classification literature considers CUB-200-2011 (Wah et al., 2011) as a standard benchmark for evaluation; therefore, image-level labels, bounding boxes, and different types of annotations are employed to extract the best results on this dataset. Similarly, multi-branch networks focusing on various parts of images and multiple objective functions are combined for optimization. On the contrary, the traditional classifiers (Huang et al., 2017; He et al., 2016) use a single loss without any extra information or any other annotations. The best-performing fine-grained classifiers for CUB (Wah et al., 2011) are DCL ResNet50 (Chen et al., 2019), TASN (Zheng et al., 2019), and NTSNet (Yang et al., 2018), where a gain of merely 0.1% and 0.2% over DenseNet (Huang et al., 2017) is recorded for DCL (Chen et al., 2019) and TASN (Zheng et al., 2019), respectively. Furthermore, NTSNet (Yang et al., 2018) lags by a margin of 0.2%. The improvement over DenseNet is insignificant, keeping in mind the different computationally expensive tactics employed by fine-grained classifiers to learn the distinguishable features.

Quantitative analysis on Aircraft and Cars
In Table 3, the performances of fine-grained classifiers are shown on the Cars and Aircraft datasets. Here, we also observe that the performance of the traditional classifiers is better than that of the fine-grained classifiers. DenseNet161 has an improvement of about 1.5% and 3% on Aircraft (Maji et al., 2013) compared to the best-performing NTSNet (Yang et al., 2018) and MACNN (Zheng et al., 2017), respectively. Similarly, an improvement of 0.6% and 1.4% is recorded against NTSNet (Yang et al., 2018) and DTRAM (Li et al., 2017) on the Cars dataset, respectively. Fine-grain classifiers such as Yang et al. (2018), Li et al. (2017), and Zheng et al. (2019) fail to achieve the same accuracy as the traditional classifiers, although the former employ more image-specific information for learning.

Comparison on Stanford Dogs
Stanford Dogs (Khosla et al., 2011) is another challenging dataset, for which the performance is compared in Table 4. Here, we utilize ResNet and DenseNet from the traditional classifiers. The performance of ResNet with 152 layers is similar to that of DenseNet with 161 layers; both achieved 85.2% accuracy, which is 1.4% higher than PC-DenseCNN (Dubey et al., 2018), the best-performing method among the fine-grained classifiers. This experiment suggests that incorporating traditional classifiers in the fine-grained ones requires more insight than just utilizing them in the framework. It is also worth mentioning that some of the fine-grained classifiers employ a large amount of data from other sources in addition to the Stanford Dogs training data.

Results of Flower dataset
The accuracy of DenseNet on the Flower dataset (Nilsback and Zisserman, 2008) is 98.1%, which is around 5.5% higher than the second-best performing state-of-the-art method (PC-ResCNN (Dubey et al., 2018)) in Table 4. Similarly, the other traditional classifiers also outperform the fine-grained ones by a significant margin. It should also be noted that the performance on this dataset is approaching saturation.

Performance on NABirds
Relatively few methods have reported their results on this dataset. However, for the sake of completeness, we provide comparisons on the NABirds (Van Horn et al., 2015) dataset. Again, the leading performance on NABirds is achieved by DenseNet161, followed by ResNet152. The third-best performer is a fine-grain classifier, i.e. PC-DenseCNN (Dubey et al., 2018), which internally employs DenseNet161 and lags behind by 3.5%. This shows the superior performance of the traditional CNN classifiers against state-of-the-art fine-grained CNN classifiers.

Ablation studies
Fine-tune vs. Scratch: Here, we present two strategies for training traditional CNN classification networks, i.e. fine-tuning the ImageNet (Deng et al., 2009) weights and training from scratch (randomly initializing the weights), for the Cars dataset. The accuracy of each strategy is given in Table 5. ResNet50 achieves higher accuracy when fine-tuned than when randomly initialized. Similarly, ResNet152 performed better when fine-tuned; however, it fails when trained from scratch, possibly because of the large number of parameters and the small amount of training data.
Backbones Improvement Over Standalone Classifiers: Some fine-grain state-of-the-art methods use ResNet50 as the backbone and achieve an accuracy higher than the standalone ResNet50. To be precise, Table 6 shows the backbones used by state-of-the-art methods in their algorithms. One can observe that many algorithms employ the same backbone more than once, increasing the overhead and doubling or tripling the number of parameters.
Besides utilizing traditional classifiers as backbones, state-of-the-art fine-grain methods rely on specialized techniques to extract fine-grain features, hence adding more parameters and computation. Therefore, the improvement achieved by the state-of-the-art fine-grain methods comes at the cost of extra considerations and additional parameters, while a traditional classifier like DenseNet does not require such tricks to achieve the same accuracy.
Parameters, FLOPs, and Performance: In Table 7, we provide comparisons in terms of the number of parameters, FLOPs, and performance on ImageNet for the traditional classifiers that have been employed in our experiments. ResNet50 (He et al., 2016) has approximately the same number of parameters as DenseNet161 (Huang et al., 2017), but the performance of DenseNet161 (Huang et al., 2017) is much higher than that of ResNet50 (He et al., 2016). It should also be noted that DenseNet169 and DenseNet201 have fewer parameters but higher performance on ImageNet; hence, we argue that the backbones in the fine-grain methods should be updated to appropriate ones as suggested by our experimental analysis.

Conclusion
In this paper, we provided comparisons between state-of-the-art traditional CNN classifiers and fine-grained CNN classifiers. It has been shown that conventional models achieve state-of-the-art performance on fine-grained classification datasets and outperform the fine-grained CNN classifiers. Therefore, it is necessary to update the baselines for comparisons. It is also important to note that the performance increase is due to the initial weights trained on the ImageNet (Deng et al., 2009)

Fig. 3 Basic building block of the VGG (Simonyan and Zisserman, 2015), where no skip connections are used.

Fig. 5 Basic block of the DenseNet (Huang et al., 2017), where each layer receives connections from the previous layers of the block.

Fig. 6 Basic building block of the Inception-v3 (Szegedy et al., 2016), where many paths are used for feature extraction and then concatenated.

Fig. 7 Basic building block of the ResNeSt (Zhang et al., 2020), where different paths are used for feature extraction and then concatenated.

Table 1
Visual categorization datasets to evaluate the proposed method.
ployed in many tasks such as object detection (Girshick et al., 2014), understanding (Zeiler and Fergus, 2014), and recognition (Sharif Razavian et al., 2014). CNN captures the global information directly, as opposed to the traditional descriptors that capture local information and require manual engineering to encode a global representation. Destruction and Construction Learning (DCL) (Chen et al., 2019) takes advantage of a standard classification network and emphasizes discriminative local details. The model then reconstructs the semantic correlation among local regions. Zeiler and Fergus (2014) illustrated the reconstruction of the original image from the activations of the fifth max-pooling layer. Max-pooling ensures invariance to small-scale translation and rotation; however, global spatial information might achieve robustness to larger-scale deformations. Gong et al. (2014) combined the features from fully connected layers using VLAD pooling to capture global information. Similarly, Cimpoi et al. (2015) pooled the features from convolutional layers instead of fully connected layers for texture recognition, based on the idea that the convolutional layers are transferable and are not domain-specific. Following in the footsteps of Cimpoi et al. (2015) and Gong et al. (2014), PDFR by Zhang et al. (2016a) encoded the CNN filter responses employing a picking strategy via the combination of Fisher Vectors. However, considering feature encoding as an isolated element is not an optimum choice for convolutional neural networks.
has 196 classes with different make, model, and year. It has a total of 16,185 car photographs, split into 8,144 training images and 8,041 testing images, i.e. roughly 50% for each.
-Aeroplanes: A total of 10,200 images of 102 variants, with 100 images for each, are present in the fine-grained visual classification of aircraft dataset, i.e. FGVC-Aircraft (Maji et al., 2013). Airplanes are an alternative to the objects usually considered for fine-grained categorization, such as birds and pets.
-Flowers: The number of classes in the flower (Nilsback and Zisserman, 2008) dataset is 102. There are 2,040 training images and 6,149 testing images. Furthermore, there are significant variations within categories, while there are similarities with other categories.

Table 2
Comparison of the state-of-the-art fine-grain classification methods on the CUB (Wah et al., 2011) dataset.

Table 6
The comparison of backbones and number of parameters in fine-grain methods regarding classification accuracy on the CUB dataset. The input to all methods is 448 × 448.

Table 7
Comparison of the traditional classifiers on ImageNet (Deng et al., 2009) in terms of number of parameters, FLOPs, and accuracy.
datasets. Furthermore, we have established that the DenseNet161 model achieves new state-of-the-art results for all datasets, outperforming the fine-grained classifiers by a significant margin.
IEEE/CVF International Conference on Computer Vision (ICCV)
Zheng H, Fu J, Zha ZJ, Luo J (2019) Looking for the devil in the details: Learning trilinear attention sampling network for fine-grained image recognition. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Zoph B, Vasudevan V, Shlens J, Le QV (2018) Learning transferable architectures for scalable image recognition. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)