Open Access Article

Mask-Refined R-CNN: A Network for Refining Object Details in Instance Segmentation

by Yiqing Zhang 1,2, Jun Chu 1,2,*, Lu Leng 1,2,3,* and Jun Miao 1,4
1
Key Laboratory of Jiangxi Province for Image Processing and Pattern Recognition, Nanchang Hangkong University, Nanchang 330063, China
2
School of Software, Nanchang Hangkong University, Nanchang 330063, China
3
School of Electrical and Electronic Engineering, College of Engineering, Yonsei University, Seoul 120749, Korea
4
School of Aeronautical Manufacturing Engineering, Nanchang Hangkong University, Nanchang 330063, China
*
Authors to whom correspondence should be addressed.
Sensors 2020, 20(4), 1010; https://doi.org/10.3390/s20041010
Received: 15 January 2020 / Revised: 11 February 2020 / Accepted: 12 February 2020 / Published: 13 February 2020
(This article belongs to the Special Issue Visual Sensor Networks for Object Detection and Tracking)
With the rapid development of flexible vision sensors and visual sensor networks, computer vision tasks such as object detection and tracking are entering a new phase, and the more challenging comprehensive task of instance segmentation can accordingly develop rapidly. Most state-of-the-art network frameworks for instance segmentation are based on Mask R-CNN (mask region-based convolutional neural network). However, the experimental results confirm that Mask R-CNN does not always successfully predict instance details. The scale-invariant fully convolutional network structure of Mask R-CNN ignores the difference in spatial information between receptive fields of different sizes. A large-scale receptive field focuses more on detailed information, whereas a small-scale receptive field focuses more on semantic information. Consequently, the network cannot consider the relationships between the pixels at the object edge, and these pixels are misclassified. To overcome this problem, Mask-Refined R-CNN (MR R-CNN) is proposed, in which the stride of ROIAlign (region of interest align) is adjusted. In addition, the original fully convolutional layer is replaced with a new semantic segmentation layer that realizes feature fusion by constructing a feature pyramid network and summing the forward and backward transmissions of feature maps of the same resolution. The segmentation accuracy is substantially improved by combining the feature layers that focus on global and detailed information. The experimental results on the COCO (Common Objects in Context) and Cityscapes datasets demonstrate that the segmentation accuracy of MR R-CNN is about 2% higher than that of Mask R-CNN with the same backbone. The average precision on large instances reaches 56.6%, which is higher than that of all state-of-the-art methods. In addition, the proposed method incurs a low time cost and is easy to implement.
The experiments on the Cityscapes dataset also show that the proposed method has strong generalization ability.
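The fusion step described in the abstract — combining a coarse, semantically rich feature map with a fine, detail-rich one by upsampling the coarser map and summing element-wise, as in feature pyramid networks — can be sketched roughly as follows. This is a pure-Python toy with hypothetical helper names, not the authors' implementation:

```python
# Toy sketch of FPN-style feature fusion: a low-resolution ("semantic")
# feature map is upsampled to match a high-resolution ("detail") map,
# then the two are summed element-wise. Helper names are illustrative.

def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D feature map (list of lists)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in (0, 1)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))                   # duplicate each row
    return out

def fuse(high_res, low_res):
    """Element-wise sum of a high-res map with the upsampled low-res map."""
    up = upsample2x(low_res)
    return [[a + b for a, b in zip(r1, r2)]
            for r1, r2 in zip(high_res, up)]

# Example: a 4x4 detail map fused with a 2x2 semantic map.
high = [[1, 1, 1, 1] for _ in range(4)]
low = [[10, 20],
       [30, 40]]
fused = fuse(high, low)  # each quadrant inherits the coarse value, plus detail
```

In the real network this summation happens per channel on learned feature tensors (typically with a 1x1 convolution before the add and bilinear interpolation for upsampling); the toy above only illustrates the resolution-matching and element-wise addition.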
Keywords: instance segmentation; multi-scale feature fusion; Mask-Refined R-CNN; ROIAlign adjustment
MDPI and ACS Style

Zhang, Y.; Chu, J.; Leng, L.; Miao, J. Mask-Refined R-CNN: A Network for Refining Object Details in Instance Segmentation. Sensors 2020, 20, 1010.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.
