Abstract
This paper addresses the low accuracy of fine-grained vehicle classification, which stems from the subtle feature differences between visually similar vehicle classes. It introduces TS-Net, a progressive fine-grained vehicle recognition network built on multi-scale convolutions. TS-Net uses a jigsaw puzzle generator to create images at multiple granularity levels, so that training proceeds progressively from fine to coarse granularity. The network integrates multi-attention modules to localize regions of interest and refines feature extraction with multi-scale convolutions arranged in a split-transform-fuse strategy. Together, these components enable cross-granularity feature learning and improve fine-grained vehicle classification. On the Stanford Cars dataset, TS-Net achieves a classification accuracy of 95.68%. Ablation studies confirm the effectiveness of the progressive training strategy and show that the individual modules work together to raise accuracy.
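The jigsaw puzzle generator mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common formulation in which an image is cut into an n x n grid of patches that are then randomly permuted, with larger n giving finer granularity. The function names `jigsaw_shuffle` and `progressive_inputs`, the grid sizes (8, 4, 2, 1), and the use of a plain 2D list as the image are all illustrative assumptions, not details taken from TS-Net.

```python
import random


def jigsaw_shuffle(image, n, rng=None):
    """Cut a 2D image (list of rows) into an n x n grid and shuffle the patches.

    Hypothetical helper: a sketch of a jigsaw generator, not TS-Net's own code.
    """
    if rng is None:
        rng = random.Random()
    height, width = len(image), len(image[0])
    if height % n or width % n:
        raise ValueError("image height and width must be divisible by n")
    ph, pw = height // n, width // n

    # Collect patches in row-major order; each patch is a list of row slices.
    patches = [
        [row[c * pw:(c + 1) * pw] for row in image[r * ph:(r + 1) * ph]]
        for r in range(n)
        for c in range(n)
    ]
    rng.shuffle(patches)

    # Stitch the shuffled patches back into an image of the original size.
    shuffled = []
    for r in range(n):
        for y in range(ph):
            new_row = []
            for c in range(n):
                new_row.extend(patches[r * n + c][y])
            shuffled.append(new_row)
    return shuffled


def progressive_inputs(image, grid_sizes=(8, 4, 2, 1), seed=0):
    """Return one shuffled image per granularity level, finest first.

    The grid sizes are assumed values chosen for illustration; n = 1 leaves
    the image intact and stands for the coarsest (whole-object) level.
    """
    rng = random.Random(seed)
    return [jigsaw_shuffle(image, n, rng) for n in grid_sizes]
```

In a progressive training loop of this kind, each level's shuffled image would typically be fed to a different stage of the network, so that early stages see only local texture while the final stage sees the whole vehicle.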