The single shot multi-box detector (SSD) exhibits low accuracy in small-object detection; this is because it does not consider the scale contextual information between its layers, and the shallow layers lack adequate semantic information. To improve the accuracy of the original SSD, this paper proposes a new single shot multi-box detector using trident feature and squeeze and extraction feature fusion (SSD-TSEFFM); this detector employs the trident network and the squeeze and excitation feature fusion module. Furthermore, a trident feature module (TFM) is developed, inspired by the trident network, to consider the scale contextual information. The use of this module makes the proposed model robust to scale changes owing to the application of dilated convolution. Further, the squeeze and excitation block feature fusion module (SEFFM) is used to provide more semantic information to the model. The SSD-TSEFFM is compared with the faster regions with convolution neural network features (RCNN) (2015), SSD (2016), and DF-SSD (2020) on the PASCAL VOC 2007 and 2012 datasets. The experimental results demonstrate the high accuracy of the proposed model in small-object detection, in addition to a good overall accuracy. The SSD-TSEFFM achieved 80.4% mAP and 80.2% mAP on the 2007 and 2012 datasets, respectively. This indicates an average improvement of approximately 2% over other models.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited