Conventional single-stage object detectors efficiently detect objects of various sizes using a feature pyramid network. However, because they aggregate feature maps in an overly simple manner, they suffer performance degradation caused by information loss. To solve this problem, this paper proposes a new framework for single-stage object detection. The proposed aggregation scheme introduces two independent modules that extract global and local information. First, the global information extractor uses a non-local neural network (NLNN) so that each feature vector reflects information from the entire image. Next, the local information extractor aggregates the feature maps more effectively through an improved bi-directional network. By providing improved feature maps to the detection heads, the proposed method achieves better performance than existing single-stage object detection methods. For example, the proposed method achieves 1.6% higher average precision (AP) than the efficient featurized image pyramid network (EFIPNet) on the Microsoft Common Objects in Context (MS COCO) dataset.
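To illustrate how a non-local operation lets each feature vector reflect the entire image, the following is a minimal sketch and not the implementation proposed in this paper. It assumes a simplified form in which the learned embeddings of a non-local block are replaced by identity mappings, and the function name `non_local` and the toy feature map are hypothetical. Each position is updated with a softmax-weighted sum over all positions, followed by a residual connection.

```python
import math


def non_local(features):
    """Simplified non-local operation (assumption: identity embeddings).

    features: list of N feature vectors, one per spatial position,
              each a list of C floats.
    Returns N vectors; each is the input vector plus a softmax-weighted
    sum of the feature vectors at all positions.
    """
    outputs = []
    for x_i in features:
        # Dot-product similarity between position i and every position j.
        scores = [sum(a * b for a, b in zip(x_i, x_j)) for x_j in features]
        # Numerically stable softmax over all positions.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Global context for position i: weighted sum over all positions.
        context = [
            sum(w * x_j[c] for w, x_j in zip(weights, features))
            for c in range(len(x_i))
        ]
        # Residual connection keeps the original local feature.
        outputs.append([a + b for a, b in zip(x_i, context)])
    return outputs


# Toy feature map: 3 spatial positions, 2 channels (hypothetical values).
feature_map = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
refined = non_local(feature_map)
```

Because the attention weights span every position, the output at one location depends on the whole feature map, unlike a convolution with a fixed local receptive field. A practical block would add learned projections and operate on tensors instead of Python lists.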
This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.