This paper uses deep learning to overcome the limitations imposed by camera perspective, image background, uneven crowd density distribution and pedestrian occlusion, so that crowd density can be estimated accurately. We propose a new neural network, the Deep Scale-Adaptive Convolutional Neural Network (DSA-CNN), which converts a single crowd image directly into a density map for crowd counting. For a crowd image of any size and resolution, our algorithm outputs the corresponding density map end to end and then estimates the number of people in the image from that map. DSA-CNN consists of two parts: a seven-layer CNN and DSA modules. To make the method robust to the camera perspective effect, DSA-CNN uses filters of several different sizes in the network and combines their outputs. To reduce the depth (number of channels) of the data and thereby speed up training, the DSA module uses 1 × 1 filters. To validate the proposed model, we conducted comparative experiments on four popular public datasets (ShanghaiTech, UCF_CC_50, WorldExpo’10 and UCSD), comparing the proposed method in terms of MAE and MSE with well-known algorithms such as MCNN, Switching-CNN, CSRNet, CP-CNN and Cascaded-MTL. Experimental results show that the proposed method performs excellently. In addition, we found that the proposed model is easy to train, which further increases its usability.
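The abstract names three concrete mechanisms: parallel filters of different sizes whose outputs are combined, a 1 × 1 filter that reduces data depth, and a count obtained from a predicted density map and scored with MAE and MSE. The NumPy sketch below illustrates these ideas under stated assumptions; it is not the DSA module itself. The kernel sizes (3, 5, 7), the channel counts, the ordering (1 × 1 reduction before the multi-scale branches, Inception-style), concatenation as the combination rule, and all function names are hypothetical. It also assumes the common crowd-counting convention in which "MSE" is reported as the square root of the mean squared count error.

```python
import numpy as np


def conv2d_same(x, w):
    """Naive zero-padded 'same' convolution (cross-correlation, as in CNN libraries).

    x: input of shape (C_in, H, W); w: weights of shape (C_out, C_in, k, k), k odd.
    Returns an array of shape (C_out, H, W).
    """
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zero padding keeps H and W
    _, h, wd = x.shape
    out = np.zeros((c_out, h, wd))
    for i in range(h):
        for j in range(wd):
            patch = xp[:, i:i + k, j:j + k]  # (C_in, k, k) window
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out


def relu(x):
    return np.maximum(x, 0.0)


def scale_adaptive_block(x, reduce_w, branch_ws):
    """Hypothetical multi-scale block (assumed design, not the paper's exact DSA module).

    A 1x1 convolution first shrinks the channel depth; parallel branches with
    different kernel sizes then run on the reduced tensor, and their outputs
    are concatenated along the channel axis.
    """
    reduced = relu(conv2d_same(x, reduce_w))
    branches = [relu(conv2d_same(reduced, w)) for w in branch_ws]
    return np.concatenate(branches, axis=0)


def count_from_density(density_map):
    """The estimated crowd count is the integral (sum) of the density map."""
    return float(np.sum(density_map))


def mae_mse(pred_counts, true_counts):
    """MAE and the root-mean-square 'MSE' commonly reported in crowd counting (assumed)."""
    err = np.asarray(pred_counts, dtype=float) - np.asarray(true_counts, dtype=float)
    mae = float(np.mean(np.abs(err)))
    mse = float(np.sqrt(np.mean(err ** 2)))
    return mae, mse


# Usage with random weights: 16 input channels reduced to 4, then 3 branches of 2 channels.
rng = np.random.default_rng(0)
x = rng.random((16, 8, 8))
reduce_w = rng.standard_normal((4, 16, 1, 1))                       # 1x1 filters
branch_ws = [rng.standard_normal((2, 4, k, k)) for k in (3, 5, 7)]  # assumed kernel sizes
y = scale_adaptive_block(x, reduce_w, branch_ws)
print(y.shape)                                     # (6, 8, 8)
print(count_from_density(np.full((4, 4), 0.25)))  # 4.0
print(mae_mse([10, 20], [12, 18]))                 # (2.0, 2.0)
```

The 1 × 1 reduction pays off because the cost of each k × k branch grows with its input channel count: shrinking 16 channels to 4 before the branches cuts their multiply count by a factor of four. Summing the density map yields a count because, in the usual construction of ground-truth density maps, each annotated head contributes a kernel of unit mass; that construction is assumed here rather than taken from the paper.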
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.