Sensors
  • Article
  • Open Access

3 September 2023

Parking Lot Occupancy Detection with Improved MobileNetV3

1 Department of Computer Engineering, Gachon University, Seongnam-si 13120, Republic of Korea
2 Department of Artificial Intelligence, Tashkent State University of Economics, Tashkent 100066, Uzbekistan
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Computer Vision for Smart Cities

Abstract

In recent years, parking lot management systems have attracted significant research attention, particularly regarding the application of deep learning techniques, and numerous deep learning approaches to parking lot occupancy detection have emerged. This study addresses a critical aspect of parking lot management systems: accurately determining whether specific parking spaces are occupied. We propose an optimized MobileNetV3 model with custom architectural enhancements, trained on the CNRPark-EXT and PKLot datasets. The model processes individual parking space patches extracted from real-time video feeds and classifies each patch as occupied or available. Our architectural modifications include the integration of a convolutional block attention module in place of the native attention module and the adoption of blueprint separable convolutions instead of the traditional depth-wise separable convolutions. The enhanced MobileNetV3 achieves an area under the ROC curve (AUC) of 0.99 in most experiments on the PKLot dataset, indicating strong discriminative power in this binary classification task. Compared with the CarNet and mAlexNet models, representative of previous state-of-the-art solutions, the proposed model performs favorably: on the combined CNRPark-EXT and PKLot datasets, it attains an average accuracy of 98.01%, versus 97.03% for CarNet. Beyond achieving high accuracy and precision comparable to those of previous models, the proposed model shows promise for real-time applications.
This work contributes to the advancement of parking lot occupancy detection by offering a robust and efficient solution with implications for urban mobility and resource optimization.

1. Introduction

Parking has become increasingly problematic as the number of cars on the road has grown, particularly in urban areas. There is therefore a strong demand for effective parking lot management systems that can address these problems in real time. With limited parking availability and an ever-growing number of vehicles, traditional parking management approaches are proving inadequate for ensuring optimal space utilization and reducing congestion. Deep learning techniques, particularly convolutional neural networks (CNNs), have gained attention for their potential to transform parking management. These methods promise accurate occupancy detection, which is fundamental for making informed decisions about space allocation, traffic flow optimization, and overall urban planning.
Several studies have proposed deep learning techniques for parking lot management, with a specific focus on three types of problems: automatic parking space position detection, individual parking space classification, and vehicle detection and counting [1]. The motivation behind developing a model for parking lot occupancy detection is to address the need for efficient management of parking spaces. By accurately determining the occupancy status of parking lots, it becomes possible to optimize parking resource utilization, enhance traffic management, and improve the overall parking experience for users.
However, successfully integrating deep learning into parking management requires a thorough understanding of the challenges specific to this domain. Parking scenarios introduce complexities such as varying lighting conditions, diverse vehicle types, occlusions, and the need for real-time response. These challenges demand tailored solutions that function reliably across a range of conditions, providing accurate occupancy detection while accommodating the dynamic nature of parking environments. Existing methods for parking lot occupancy detection often rely on conventional computer vision techniques or shallow machine learning models, which struggle to achieve high accuracy in complex parking scenarios and cannot reliably handle several of the challenges listed above.
In this paper, we address these challenges by proposing an enhanced MobileNetV3 architecture tailored to parking lot occupancy detection. By leveraging the architectural efficiency of MobileNetV3 [2] and introducing domain-specific modifications, we aim to mitigate the complexities inherent in parking management scenarios. Although MobileNetV3 has demonstrated significant efficiency gains and high accuracy in various computer vision tasks, applying it to parking lot occupancy detection poses unique challenges. In this context, the original MobileNetV3 has difficulty handling varying lighting conditions, dealing with occlusions, and distinguishing between different vehicle types. These difficulties stem from the specific characteristics of parking lot images, including complex backgrounds, varying perspectives, and the need to accurately identify small, partially occluded objects. We address these limitations with three modifications to the MobileNetV3 architecture: the use of the Leaky-ReLU6 [3] activation function in the shallow part of the model, the replacement of the squeeze-and-excitation module [4] with the convolutional block attention module [5], and the replacement of depth-wise separable convolutions with blueprint separable convolutions [6]. We treat the automatic detection of vacant spaces as a binary classification problem and train and test the improved model on the widely used CNRPark-EXT [7] and PKLot [8] parking datasets. Each incoming frame of the real-time video feed is divided into individual parking space patches, which the model classifies as vacant or occupied.
The proposed model exhibits superior performance compared to previous state-of-the-art models in terms of accuracy and precision and demonstrates its capability to function in real time.
The industrial significance of our approach lies in its practical applications within the rapidly growing field of smart cities and intelligent transportation systems. Our modified MobileNetV3 model addresses key challenges in parking management, contributing to reduced congestion, improved user experiences, and optimized parking resource utilization. With real-time and accurate parking occupancy detection, cities can implement responsive parking guidance systems, enabling drivers to quickly locate available parking spots.
The main contributions of this study are as follows:
  • Novel model outperforming state-of-the-art models: We propose and develop a model that outperforms existing state-of-the-art models in terms of accuracy and AUC score in most of the evaluated scenarios. Importantly, this performance is achieved while retaining real-time functionality, which makes the model suitable for practical applications.
  • Enhancements to the MobileNetV3 architecture: We improve the performance of MobileNetV3 through a series of targeted modifications. First, we adopt the Leaky-ReLU6 activation function [3] in the shallow part of the network, which contributes to improved accuracy and precision. Second, we replace the squeeze-and-excitation (SE) module with a Convolutional Block Attention Module (CBAM), which refines the model’s ability to focus on salient features. Third, we replace the depth-wise separable convolution blocks with blueprint separable convolutions, yielding an architecture that is more efficient and effective for parking management tasks.
  • Improved generalization and small object detection: These architectural modifications enable the model to better identify the essential aspects of an image, attend to small objects within it, and generalize better across parking lots, which together improve performance in parking lot occupancy detection.
  • Practical significance: The contributions outlined above have direct implications for real-world parking management. The model’s high accuracy, combined with its capacity for real-time operation, could substantially improve parking lot occupancy detection. By homing in on crucial image regions and effectively detecting small objects, the model can help optimize parking resource utilization, alleviate traffic congestion, and improve the efficiency of parking management systems.
The remainder of this paper is organized as follows. Section 2 reviews the literature on the MobileNet family of models and on parking space classification. Section 3 describes the datasets used in the experiments. Section 4 presents the proposed parking management approach, and Section 5 presents the experimental results and analyses. Section 6 summarizes the research findings and suggests directions for future work.

3. Datasets Used for Experiments

We used the CNRPark-EXT and PKLot datasets in our experiments. Table 2 summarizes their main features.
Table 2. Main features of CNRPark-EXT [7] and PKLot [8] datasets.

3.1. CNRPark-EXT Dataset

Amato et al. [7] developed the CNRPark-EXT dataset by extending the CNRPark dataset [21]. CNRPark-EXT is a comprehensive dataset designed for visual occupancy detection in parking lots. Figure 7 shows some examples from different camera perspectives and environmental conditions.
Figure 7. CNRPark-EXT dataset samples: (a–c) taken under three different weather conditions.
CNRPark-EXT contains approximately 150,000 labeled images (patches) of vacant and occupied parking spaces, taken from a parking lot with 164 parking spaces. It extends the original CNRPark dataset, which consists of 12,000 images collected by two cameras on different days in July 2015. The additional CNRPark-EXT subset, collected from November 2015 to February 2016, significantly expands the dataset with images captured by nine cameras with different perspectives and angles of view. It covers diverse scenarios, including different lighting conditions, partial occlusions (due to obstacles such as trees, lampposts, and other cars), and partial or global shadows on cars. Together, the nine cameras capture the parking spaces from a wide range of angles.

3.2. PKLot Dataset

The PKLot dataset, developed by Almeida et al. [8], is a large collection designed specifically for parking lot classification. It comprises 12,417 images of parking lots and 695,899 images of segmented parking spaces. The images were captured under various weather conditions, including sunny, cloudy, and rainy days, which helps models trained on them become robust to weather variations, and at different times of day, covering the diverse lighting conditions that a real-world parking detection system would encounter. The dataset was acquired from the parking lots of two Brazilian universities, the Federal University of Parana (UFPR) and the Pontifical Catholic University of Parana (PUCPR), both located in Curitiba, Brazil, and is divided into three subsets: UFPR04, UFPR05, and PUCPR. Previous studies have found that UFPR04 is slightly more challenging than the other two subsets because it contains images with different obstacles and ground patterns. The images show parking lots with delimited spaces, both occupied and empty, allowing comprehensive classification experiments. Figure 8 shows examples from all three camera viewpoints under different weather conditions.
Figure 8. PKLot dataset samples: (a–c) show examples of different parking lots and weather conditions.

4. Proposed Method

This section describes the development of the deep learning-based parking lot occupancy detection system and its components. We use the LeakyReLU6 activation function in the shallow part of the model, replace the SE block with a convolutional block attention module, and replace the depth-wise separable convolution layers with blueprint separable convolutions. The occupancy detection workflow with an already trained model is outlined in Algorithm 1.
Algorithm 1. Pseudocode for parking lot occupancy detection process.
  • Input: images from the streaming camera
  • Input: manually entered parking space locations
  • Set the classification threshold T
  • While the video stream has not stopped, for each frame:
    • Divide the frame into patches according to the manually predefined locations
    • Resize the patches
    • For each patch:
      i. Feed it to the trained model
      ii. Obtain its classification result
      iii. If the classification result is higher than threshold T, mark the space as occupied; otherwise, mark it as vacant
      iv. Draw a bounding box around the patch in the frame, in red if it is occupied and in green otherwise
    • End for
    • Show the frame with the bounding boxes drawn over it
  • End while
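The loop structure of Algorithm 1 can be sketched in Python as follows. This is a minimal, dependency-free illustration rather than the authors' implementation: frames are assumed to be grayscale images stored as nested lists, parking spaces are hypothetical `(x, y, width, height)` boxes, `model` is any callable returning an occupancy score, nearest-neighbour resizing stands in for whatever resizing was actually used, the default threshold of 0.5 is an assumption, and drawing/display is reduced to returning the box colour.

```python
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

Box = Tuple[int, int, int, int]     # hypothetical format: (x, y, width, height) in pixels
Frame = List[List[float]]           # grayscale frame: rows of pixel values
Result = Tuple[Box, str, Tuple[int, int, int]]

RED, GREEN = (255, 0, 0), (0, 255, 0)   # RGB colours for occupied / vacant


def crop(frame: Frame, box: Box) -> Frame:
    """Cut one parking-space patch out of the frame."""
    x, y, w, h = box
    return [row[x:x + w] for row in frame[y:y + h]]


def resize_nearest(patch: Frame, size: int) -> Frame:
    """Nearest-neighbour resize to size x size (stand-in for the 224 x 224 resize)."""
    h, w = len(patch), len(patch[0])
    return [[patch[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def classify_frame(frame: Frame, spaces: Sequence[Box],
                   model: Callable[[Frame], float],
                   threshold: float = 0.5, size: int = 224) -> List[Result]:
    """Inner loop of Algorithm 1: classify every predefined space in one frame."""
    results = []
    for box in spaces:
        patch = resize_nearest(crop(frame, box), size)
        score = model(patch)                 # classification result
        occupied = score > threshold         # higher than T -> occupied
        results.append((box, "occupied" if occupied else "vacant",
                        RED if occupied else GREEN))
    return results


def run_stream(frames: Iterable[Frame], spaces: Sequence[Box],
               model: Callable[[Frame], float],
               threshold: float = 0.5, size: int = 224) -> Iterator[List[Result]]:
    """Outer loop: keep processing frames until the stream stops."""
    for frame in frames:
        yield classify_frame(frame, spaces, model, threshold, size)


def mean_brightness(patch: Frame) -> float:
    """Dummy stand-in for the trained model, used only for the toy example."""
    return sum(map(sum, patch)) / (len(patch) * len(patch[0]))


# Toy usage: the left half of an 8 x 8 frame is dark (empty), the right half bright.
frame = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
spaces = [(0, 0, 4, 8), (4, 0, 4, 8)]
for box, status, colour in next(run_stream([frame], spaces, mean_brightness, size=4)):
    print(box, status, colour)
```

Using a generator for the outer loop mirrors the "while the stream has not stopped" condition: iteration simply ends when the frame source is exhausted.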

4.1. LeakyReLU6 Activation Function for the Shallow Part of the Network

Activation functions are an important component of deep learning models: they introduce non-linearity into the network, allowing it to learn more complex and abstract features from the input data. The authors of MobileNetV3 used ReLU6 as part of the h-swish activation function, h-swish(x) = x · ReLU6(x + 3)/6. ReLU6 is widely used in neural networks because it is computationally efficient and helps mitigate the vanishing gradient problem.
$$\mathrm{ReLU6}(x) = \min(\max(0, x), 6)$$
However, ReLU6 outputs zero (with zero gradient) for all negative inputs, so the corresponding neurons become inactive, which can result in inaccurate feature extraction. To address this limitation, we use the Leaky-ReLU6 activation function, which combines the leaky-ReLU concept with ReLU6 to form an activation function with three segments.
When x < 0, the input is multiplied by a small slope a, which prevents the neuron from ‘dying’ and allows more effective feature extraction in the shallow layers. When 0 ≤ x < 6, the function grows linearly, and for x ≥ 6 it is clipped at 6.
$$\text{Leaky-ReLU6}(x) = \min(6, \max(ax, x)), \qquad 0 < a < 1$$
Using Leaky-ReLU6 in the shallow part of MobileNetV3 can improve the accuracy of image feature extraction, particularly for negative input values. The parameter a is a hyperparameter that can be tuned manually across training runs to find the best-performing value, which is then used in subsequent tests.
In our experiments, we tested values of a in the range [0.0001, 0.1]; a = 0.001 gave the best performance.
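The activations discussed in this subsection can be written as plain Python functions. This is an illustrative sketch, not the authors' code: `a = 0.001` is the best value reported above, and the three-segment reading (ax for x < 0, x for 0 ≤ x < 6, 6 for x ≥ 6) holds only for 0 < a < 1. The `h_swish` helper is MobileNetV3's hard-swish, included to show where ReLU6 appears in the original model.

```python
def relu6(x: float) -> float:
    """ReLU6: zero for negative inputs, identity up to 6, clipped at 6."""
    return min(max(0.0, x), 6.0)


def leaky_relu6(x: float, a: float = 0.001) -> float:
    """Leaky-ReLU6: slope a for x < 0 (assumes 0 < a < 1), identity up to 6, clipped at 6."""
    return min(6.0, max(a * x, x))


def h_swish(x: float) -> float:
    """MobileNetV3 hard-swish, which uses ReLU6 internally."""
    return x * relu6(x + 3.0) / 6.0


for x in (-2.0, 3.0, 10.0):
    print(f"x={x:5.1f}  relu6={relu6(x):.4f}  leaky_relu6={leaky_relu6(x):.4f}")
```

The only difference between the first two functions is the negative branch: ReLU6 maps every negative input to exactly zero, whereas Leaky-ReLU6 keeps a small, non-zero response (and gradient) there.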

4.2. CBAM Attention Mechanism

In computer vision, an attention mechanism focuses on the regions of an image that are most relevant to a given task. It is inspired by human attention, which tends to concentrate on the most informative or interesting parts of a scene. A model with an attention mechanism learns to assign importance weights to different parts of an image and then selectively combines the weighted features to make a prediction. This can improve the accuracy and efficiency of a model because it can concentrate on the most important details while suppressing unimportant or distracting ones. Attention mechanisms have been shown to improve the performance of vision models in several tasks, including image classification, object detection, and image captioning.
The attention module in MobileNetV3 is the squeeze-and-excitation (SE) module, which comprises two operations: squeeze and excitation. In the squeeze operation, global average pooling is applied to the feature maps from the previous convolutional layer to produce a 1D vector of channel-wise statistics. In the excitation operation, this vector is passed through two fully connected layers followed by a gating function, producing a vector of channel-wise importance scores. This vector is then multiplied with the original feature maps to produce the attended feature maps, which emphasize informative channels and suppress less informative ones.
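To make the squeeze and excitation steps concrete, here is a minimal pure-Python sketch on a C × H × W feature map stored as nested lists. The weight matrices `W1` (C/r × C) and `W2` (C × C/r) are hypothetical inputs, biases are omitted for brevity, and the gate is the hard-sigmoid, ReLU6(x + 3)/6, that MobileNetV3 uses in its SE blocks.

```python
from typing import List

Tensor = List[List[List[float]]]   # C x H x W feature map
Matrix = List[List[float]]


def hard_sigmoid(v: float) -> float:
    """Piecewise-linear sigmoid approximation: ReLU6(v + 3) / 6."""
    return min(max(v + 3.0, 0.0), 6.0) / 6.0


def matvec(W: Matrix, x: List[float]) -> List[float]:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def squeeze_excite(F: Tensor, W1: Matrix, W2: Matrix) -> Tensor:
    # Squeeze: global average pooling gives one statistic per channel.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in F]
    # Excitation: FC (C -> C/r) + ReLU, then FC (C/r -> C) + hard-sigmoid gate.
    hidden = [max(0.0, h) for h in matvec(W1, z)]
    s = [hard_sigmoid(v) for v in matvec(W2, hidden)]
    # Rescale every channel by its importance score.
    return [[[p * s[c] for p in row] for row in ch] for c, ch in enumerate(F)]


F = [[[1.0, 2.0], [3.0, 4.0]], [[2.0, 2.0], [2.0, 2.0]]]   # toy 2 x 2 x 2 input
print(squeeze_excite(F, W1=[[1.0, 0.0]], W2=[[6.0], [-6.0]]))
```

Note that each channel receives a single scalar weight: the SE module has no way to emphasize one spatial location over another, which is the limitation discussed next.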
The SE module is designed to adaptively adjust the channel-wise importance of feature maps, which enhances the discriminability of features and boosts the performance of object-detection tasks. It has been shown to perform well in a range of computer vision tasks such as semantic segmentation, object detection, and image classification. However, the SE module concentrates solely on the channel dimension of the feature map while overlooking the spatial dimension of the target data. In contrast, the convolution block attention module (CBAM) creates an attention map in both the channel and spatial dimensions and conducts element-wise multiplication operations between the attention map and input feature map in the corresponding dimensions. This results in a more comprehensive and accurate extraction of the target features.
In addition to the global average pooling used by the SE module, the CBAM channel attention module uses a parallel global max-pooling branch; combining these different pooling operations enables the extraction of more comprehensive high-level features. Within the bottleneck structure of the parking space classification model, the input is first expanded in the channel dimension and then processed by a depth-wise convolution, yielding feature F. This feature is fed into the CBAM channel attention module to derive the channel attention map, which is multiplied with F to obtain the refined feature F′. F′ is then fed into the spatial attention module to produce the spatial attention map, and the final feature F″ is obtained by multiplying F′ by the spatial attention map, followed by a linear point-wise convolution. Figure 9 shows a schematic diagram of the MobileNetV3 bottleneck structure with an integrated CBAM module.
Figure 9. Structure of the MobileNetV3 bottleneck layer after adding CBAM. Dwise: depth-wise convolution.
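The channel-then-spatial refinement F → F′ → F″ described above can be sketched in pure Python as follows. The sketch follows the general CBAM formulation (a shared MLP over average- and max-pooled channel descriptors, then a K × K convolution over the channel-wise average and max maps, each followed by a sigmoid). Weight values, the small kernel size used in the example (CBAM commonly uses 7 × 7), and the omission of biases are assumptions for illustration; the expansion and point-wise projection layers of the bottleneck are not shown.

```python
import math
from typing import List

Tensor = List[List[List[float]]]   # C x H x W feature map
Matrix = List[List[float]]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def shared_mlp(x: List[float], W1: Matrix, W2: Matrix) -> List[float]:
    """Two-layer MLP (C -> C/r -> C) with ReLU, shared by both pooling branches."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in W1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in W2]


def channel_attention(F: Tensor, W1: Matrix, W2: Matrix) -> List[float]:
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in F]   # global average pooling
    mx = [max(map(max, ch)) for ch in F]                              # global max pooling
    return [sigmoid(a + m) for a, m in zip(shared_mlp(avg, W1, W2), shared_mlp(mx, W1, W2))]


def spatial_attention(F: Tensor, kernel: Tensor) -> Matrix:
    """kernel is 2 x K x K: one K x K slice for the average map, one for the max map."""
    C, H, W = len(F), len(F[0]), len(F[0][0])
    avg = [[sum(F[c][i][j] for c in range(C)) / C for j in range(W)] for i in range(H)]
    mx = [[max(F[c][i][j] for c in range(C)) for j in range(W)] for i in range(H)]
    K = len(kernel[0])
    pad = K // 2                                   # zero padding keeps the H x W size
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            acc = 0.0
            for d, pooled in enumerate((avg, mx)):
                for u in range(K):
                    for v in range(K):
                        y, x = i + u - pad, j + v - pad
                        if 0 <= y < H and 0 <= x < W:
                            acc += kernel[d][u][v] * pooled[y][x]
            out[i][j] = sigmoid(acc)
    return out


def cbam(F: Tensor, W1: Matrix, W2: Matrix, kernel: Tensor) -> Tensor:
    mc = channel_attention(F, W1, W2)
    F1 = [[[p * mc[c] for p in row] for row in ch] for c, ch in enumerate(F)]   # F'
    ms = spatial_attention(F1, kernel)
    return [[[p * ms[i][j] for j, p in enumerate(row)] for i, row in enumerate(ch)]
            for ch in F1]                                                      # F''


F = [[[1.0] * 3 for _ in range(3)] for _ in range(2)]          # toy 2 x 3 x 3 input
zero_kernel = [[[0.0] * 3 for _ in range(3)] for _ in range(2)]
print(cbam(F, W1=[[0.0, 0.0]], W2=[[0.0], [0.0]], kernel=zero_kernel)[0])
```

With all-zero weights both attention maps equal sigmoid(0) = 0.5, so every value is scaled by 0.25; with trained weights, the channel map re-weights channels (as SE does) and the spatial map additionally re-weights individual positions.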

4.3. Blueprint Separable Convolutions to Replace Depth-Wise Separable Convolutions

As discussed in Section 2, MobileNetV3 uses depth-wise separable convolutions to reduce the number of parameters and the computational complexity while maintaining accuracy. Standard convolutional layers have a large number of parameters, which can lead to slow inference and high memory usage. In MobileNetV3, depth-wise separable convolutions, together with other optimizations such as SE blocks and hard-swish activation functions, yield a highly efficient and accurate architecture for mobile and embedded devices. However, Haase and Amthor [6] quantitatively analyzed the kernel weights of trained models and found that depth-wise separable convolutions implicitly rely on correlations between kernels (cross-kernel correlations). Their proposed approach, blueprint separable convolutions, instead exploits intra-kernel correlations, which allows a more effective separation of standard convolutions.
Blueprint separable convolutions (BSConv), introduced by Haase and Amthor [6], aim to improve on depth-wise separable convolutions by exploiting the correlations within CNN kernels along their depth dimension. A filter with M × K × K weights can often be represented by a single K × K template (the blueprint) and M parameters that distribute the template along the depth dimension; this observation motivated the creation of blueprint separable convolutions. Each filter $F^{(n)}$ is represented by a blueprint $B^{(n)}$ and weights $w_{n,1}, \ldots, w_{n,M}$ via
$$F^{(n)}_{m,:,:} = w_{n,m} \cdot B^{(n)}$$
where $m \in \{1, \ldots, M\}$ indexes the kernels within a filter (one per input channel) and $n \in \{1, \ldots, N\}$ indexes the filters in a layer. Figure 10 illustrates blueprint separable convolutions and their differences from standard convolutions. In this formulation, each filter is represented by a single two-dimensional blueprint kernel that is distributed along the depth axis by a weight vector. Although this formulation strongly constrains the filter kernels, the authors showed experimentally that CNNs trained with blueprint separable convolutions achieve the same or even higher quality than their standard counterparts.
Figure 10. Blueprint separable convolution.
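Because convolution is linear, the blueprint formulation factorizes into two cheaper operations. Writing $X_m$ for input channel m, $Y_n$ for output channel n, and $\star$ for 2D convolution (biases omitted), the output of one blueprint separable filter is:

$$
Y_n = \sum_{m=1}^{M} F^{(n)}_{m,:,:} \star X_m
    = \sum_{m=1}^{M} \left(w_{n,m} B^{(n)}\right) \star X_m
    = B^{(n)} \star \left(\sum_{m=1}^{M} w_{n,m} X_m\right)
$$

The bracketed sum is a point-wise (1 × 1) convolution with weights $w_{n,m}$, and the outer operation is a depth-wise convolution that applies one K × K blueprint per output channel. This is the sense in which blueprint separable convolutions can be implemented as a point-wise convolution followed by a depth-wise convolution.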
Compared with a standard convolution layer, which has M × N × K² free parameters, a blueprint separable convolution has only N × K² parameters for the blueprints and M × N parameters for the weights. The authors proposed two versions of blueprint separable convolutions: unconstrained blueprint separable convolutions (BSConv-U) and subspace blueprint separable convolutions (BSConv-S).
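The parameter counts above can be compared numerically. The helpers below are hypothetical, count weights only (biases ignored), and use the document's notation: M input channels, N filters, and K × K kernels.

```python
def standard_conv_params(M: int, N: int, K: int) -> int:
    """Standard convolution: N filters, each with M kernels of size K x K."""
    return M * N * K * K


def bsconv_params(M: int, N: int, K: int) -> int:
    """Blueprint separable convolution: N blueprints (K x K) plus N x M depth weights."""
    return N * K * K + M * N


for M, N, K in [(64, 64, 3), (128, 256, 3)]:
    std, bs = standard_conv_params(M, N, K), bsconv_params(M, N, K)
    print(f"M={M}, N={N}, K={K}: standard={std}, BSConv={bs}, ratio={std / bs:.1f}")
```

For M = N = 64 and K = 3, a standard layer has 36,864 weights versus 4,672 for BSConv, roughly 7.9 times fewer. The ratio equals M·K²/(K² + M), so the savings approach a factor of K² as the number of input channels grows.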
Compared with DSConv, BSConv-U applies the depth-wise and point-wise convolution layers in the opposite order (point-wise first), which promotes intra-kernel correlations over cross-kernel correlations. BSConv-U is also simpler to formulate and compute than BSConv-S, which makes it more suitable for practical implementation.
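The point-wise-then-depth-wise ordering of BSConv-U can be checked numerically against the blueprint formula. The pure-Python sketch below builds full filters via F(n)_m = w[n][m] · B(n), applies a standard convolution, and compares the result with a point-wise weighting followed by one depth-wise K × K convolution per blueprint. Toy sizes, random values, and 'valid' padding are assumptions; biases, normalization, and activations are omitted, and this verifies the identity rather than providing an efficient implementation.

```python
import random
from typing import List

Map = List[List[float]]          # one H x W channel
Stack = List[Map]                # several channels


def conv2d_valid(x: Map, k: Map) -> Map:
    """2D cross-correlation with 'valid' padding, as computed by CNN layers."""
    K = len(k)
    H, W = len(x) - K + 1, len(x[0]) - K + 1
    return [[sum(k[u][v] * x[i + u][j + v] for u in range(K) for v in range(K))
             for j in range(W)] for i in range(H)]


def add(a: Map, b: Map) -> Map:
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a: Map, s: float) -> Map:
    return [[p * s for p in row] for row in a]


def standard_conv(X: Stack, filters: List[Stack]) -> Stack:
    """filters[n][m] is the K x K kernel linking input channel m to output channel n."""
    outputs = []
    for f in filters:
        acc = conv2d_valid(X[0], f[0])
        for m in range(1, len(X)):
            acc = add(acc, conv2d_valid(X[m], f[m]))
        outputs.append(acc)
    return outputs


def bsconv_u(X: Stack, blueprints: Stack, w: List[List[float]]) -> Stack:
    """Point-wise step (weights w[n][m]) first, then one depth-wise blueprint per output."""
    outputs = []
    for n, B in enumerate(blueprints):
        mixed = scale(X[0], w[n][0])
        for m in range(1, len(X)):
            mixed = add(mixed, scale(X[m], w[n][m]))
        outputs.append(conv2d_valid(mixed, B))
    return outputs


rng = random.Random(0)
M, N, K, H = 3, 2, 3, 6          # toy sizes: input channels, filters, kernel, image side
X = [[[rng.uniform(-1, 1) for _ in range(H)] for _ in range(H)] for _ in range(M)]
blueprints = [[[rng.uniform(-1, 1) for _ in range(K)] for _ in range(K)] for _ in range(N)]
w = [[rng.uniform(-1, 1) for _ in range(M)] for _ in range(N)]

# Full filters implied by the blueprint formula: F(n)_m = w[n][m] * B(n)
filters = [[scale(blueprints[n], w[n][m]) for m in range(M)] for n in range(N)]

reference = standard_conv(X, filters)
out = bsconv_u(X, blueprints, w)
max_err = max(abs(a - b) for rn, on in zip(reference, out)
              for ra, ro in zip(rn, on) for a, b in zip(ra, ro))
print(f"max |standard - BSConv-U| = {max_err:.2e}")
```

The two outputs agree up to floating-point rounding, while the BSConv-U path performs only one K × K convolution per output channel instead of M.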
Reversing the order of the layers is not expected to significantly affect the middle flow of the network, because it already alternates point-wise and depth-wise convolutions. However, the entry flow is affected, because the feature maps produced by the initial regular convolution can be used more fully by the depth-wise convolution when a point-wise distribution precedes it.
Overall, the improvements in the architecture of the proposed model helped prevent the model from overfitting, decreased the inference time, and improved accuracy.

4.4. Implementation Details

The proposed classification model was trained on a personal computer with an 8-core 3.70 GHz CPU, 32 GB of memory, and an Nvidia GeForce RTX 3060 GPU. Training and testing used two commonly used parking lot datasets, PKLot and CNRPark-EXT. For CNRPark-EXT, we used the predefined training, validation, and testing subsets: the training subset contains 104,493 patches from the CNRPark and CNRPark-EXT training subsets; the validation subset contains 21,231 patches from both the CNRPark and CNRPark-EXT datasets; and the testing subset contains 31,825 patches from the CNRPark-EXT testing subset. From the PKLot dataset, we used the PUCPR (424,269 patches), UFPR04 (105,845 patches), and UFPR05 (165,785 patches) subsets alternately as training and testing subsets. The key training parameters were 500 epochs, a batch size of 64 images, and an input image size of 224 × 224. We employed the Adam optimizer, which combines the benefits of the adaptive gradient algorithm (AdaGrad) and root mean square propagation (RMSProp), with an initial learning rate of 0.0001, a weight decay of 0.0005, and a momentum of 0.99.
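The hyperparameters above can be collected in one place, and a single Adam update shows how the learning rate, weight decay, and momentum interact. This is a hedged sketch: reading the stated momentum of 0.99 as Adam's first-moment coefficient β₁, the values of β₂ and ε, and applying weight decay as an L2 term added to the gradient are assumptions, since the text does not specify them; `TrainConfig` and `adam_step` are hypothetical names. The patch counts are copied from the text: the PKLot subsets sum to 695,899, matching the figure given in Section 3.2, and the CNRPark-EXT splits sum to 157,549, consistent with the approximately 150,000 patches mentioned in Section 3.1.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TrainConfig:
    epochs: int = 500
    batch_size: int = 64
    input_size: int = 224        # patches resized to 224 x 224
    lr: float = 1e-4             # initial learning rate
    weight_decay: float = 5e-4
    beta1: float = 0.99          # assumption: the stated momentum read as Adam's beta1
    beta2: float = 0.999         # assumption: common default, not given in the text
    eps: float = 1e-8            # assumption: common default, not given in the text


# Patch counts copied from the text.
PKLOT_SUBSETS = {"PUCPR": 424_269, "UFPR04": 105_845, "UFPR05": 165_785}
CNRPARK_SPLITS = {"train": 104_493, "val": 21_231, "test": 31_825}


def adam_step(params: List[float], grads: List[float], state: Dict,
              cfg: TrainConfig = TrainConfig()) -> List[float]:
    """One Adam update; weight decay is added to the gradient as an L2 term (assumption)."""
    state["t"] += 1
    t = state["t"]
    for i, (p, g) in enumerate(zip(params, grads)):
        g = g + cfg.weight_decay * p
        state["m"][i] = cfg.beta1 * state["m"][i] + (1 - cfg.beta1) * g       # momentum-like 1st moment
        state["v"][i] = cfg.beta2 * state["v"][i] + (1 - cfg.beta2) * g * g   # RMSProp-like 2nd moment
        m_hat = state["m"][i] / (1 - cfg.beta1 ** t)                          # bias correction
        v_hat = state["v"][i] / (1 - cfg.beta2 ** t)
        params[i] = p - cfg.lr * m_hat / (math.sqrt(v_hat) + cfg.eps)
    return params


print(sum(PKLOT_SUBSETS.values()), sum(CNRPARK_SPLITS.values()))
state = {"t": 0, "m": [0.0], "v": [0.0]}
print(adam_step([1.0], [2.0], state))
```

On the first step the bias-corrected update is close to lr · sign(gradient), which is why Adam's step size is governed mainly by the learning rate rather than by the gradient's magnitude.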
Using five-fold cross-validation, we divided the dataset into five parts, using 80% for training and the remaining 20% for validation in each fold during the training phase. The data were shuffled at every epoch. The trained model also performed well when tested on a held-out set of images not seen during training.
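The text does not specify how the five folds were formed. A minimal sketch, assuming the sample indices are shuffled once and dealt into five near-equal folds, with the training order reshuffled at each epoch using a per-epoch seed, could look like this (`k_fold_splits` and `epoch_order` are hypothetical helpers):

```python
import random
from typing import List, Tuple


def k_fold_splits(n_samples: int, k: int = 5,
                  seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, validation) index splits; each validation fold holds ~1/k of the data."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)                # shuffle once before splitting
    folds = [idx[i::k] for i in range(k)]           # k disjoint, near-equal folds
    splits = []
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, folds[i]))            # ~80% train / ~20% validation for k = 5
    return splits


def epoch_order(train: List[int], epoch: int, seed: int = 0) -> List[int]:
    """Reshuffle the training indices at the start of every epoch."""
    order = list(train)
    random.Random(seed + epoch).shuffle(order)
    return order


for fold, (train, val) in enumerate(k_fold_splits(100)):
    print(f"fold {fold}: {len(train)} train / {len(val)} validation")
```

Every sample appears in exactly one validation fold, so averaging the five validation scores uses all of the data for evaluation once.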
We used accuracy and AUC score as the main metrics in this work; they are defined as follows:
  • Accuracy: used to evaluate the performance of the classification task. It is the number of correct predictions divided by the total number of samples, with a best possible value of 1.0:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FN + FP}$$
where TP, FN, FP, and TN represent the number of true positives, false negatives, false positives, and true negatives, respectively.
  • AUC score: a metric commonly used to evaluate binary classification models. The receiver operating characteristic (ROC) curve illustrates the trade-off between the true positive rate (sensitivity) and the false positive rate (1 − specificity) at different probability thresholds. The AUC is the area under the ROC curve, a single value ranging from 0 to 1. In our work, the AUC score measures the model’s ability to distinguish between occupied and vacant parking spaces.
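Both metrics can be computed directly from labels and model scores. The sketch below is an illustration rather than the authors' evaluation code: label 1 = occupied is an assumed convention, predictions use the strict "higher than T" rule from Algorithm 1, and the AUC is computed through its rank interpretation, the probability that a randomly chosen occupied patch scores higher than a randomly chosen vacant one, with ties counted as one half.

```python
from typing import Sequence, Tuple


def confusion_counts(labels: Sequence[int], scores: Sequence[float],
                     threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN); a patch is predicted occupied if score > threshold."""
    tp = tn = fp = fn = 0
    for y, s in zip(labels, scores):
        pred = 1 if s > threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 0 and y == 0:
            tn += 1
        elif pred == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    return (tp + tn) / (tp + tn + fn + fp)


def auc_score(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.4, 0.1]
tp, tn, fp, fn = confusion_counts(labels, scores)
print(accuracy(tp, tn, fp, fn), auc_score(labels, scores))
```

Unlike accuracy, the AUC does not depend on the threshold T: in this toy example accuracy is 4/6 at T = 0.5, while the AUC of 8/9 reflects that only one occupied/vacant pair is ranked in the wrong order.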

5. Experimental Results and Analysis

In this section, we compare the results of our proposed model with those of other classification models developed for parking lot classification, such as mAlexNet and CarNet, in terms of classification accuracy and AUC score. The experiments show that, in most of the evaluated scenarios, our modified MobileNetV3 model achieves higher classification accuracy than the other models and correctly classifies more vacant and occupied parking spaces.
To check whether our model was learning relevant features and attending to the right parts of the image, we visualized what it learned during training using GradCAM [22] and feature visualization [23]. GradCAM highlights the image regions that contribute most to the model’s prediction. Figure 11 shows samples from this process.
Figure 11. Left: busy (occupied) parking space patches; middle: GradCAM visualizations; right: features extracted by the first layer of our trained model.
Figure 12 shows a sample parking lot classification result obtained with our proposed model. In this example, all parking spaces are correctly classified as busy or vacant.
Figure 12. Sample parking lot visualization result with our proposed model.
As an ablation study, we trained the original MobileNetV3 model from scratch on the PKLot and CNRPark-EXT datasets and tested it on both datasets. We applied the same procedure to four variants: MobileNetV3 with the proposed LeakyReLU6 activation function, MobileNetV3 with its SE mechanism replaced by the CBAM attention mechanism, MobileNetV3 with its depth-wise separable convolutions replaced by blueprint separable convolutions, and MobileNetV3 with all of the above modifications. The goal of these experiments was to determine which modification brought the largest increase in accuracy and best improved the model’s generalization to different parking areas. The results are summarized in Table 3.
Table 3. Performance results of all modified versions of the MobileNetV3 model on the PUCPR, UFPR04, and UFPR05 subsets of the PKLot [8] dataset. Bold data shows the highest score for that experiment.
Table 3 shows that the original MobileNetV3 model achieved nearly 100% accuracy when trained and tested on the same subset of the PKLot dataset. However, when it was trained on one subset and tested on another, its accuracy dropped, which indicates that it overfit the training subset. When the model was trained on the UFPR05 subset and tested on the other two subsets, it performed poorly, achieving accuracy rates of 87.80% on PUCPR and 88.25% on UFPR04. Changing the activation function of its shallow part, changing its attention mechanism, and replacing depth-wise separable convolutions with blueprint separable convolutions helped the model avoid overfitting and achieve high accuracy in all training and testing combinations.
Substituting LeakyReLU6 for the ReLU6 activation function reduced overfitting by approximately 2% in the scenarios where training and testing used the same subset. Introducing the CBAM module in place of the SE module increased accuracy from 87.80% to 92.64% in the UFPR05/PUCPR (training/testing) case and from 88.25% to 91.78% in the UFPR05/UFPR04 case. Replacing DSConv with BSConv yielded the largest accuracy improvement of the three architectural modifications. When training and testing on the same subset, its accuracy was close to that of the original MobileNetV3 while overfitting was mitigated, and in the UFPR05/PUCPR and UFPR05/UFPR04 cases, accuracy improved by approximately 6% and 5%, respectively. The best classification results were achieved when all modifications were applied together, as expected given their individual effects on the model’s performance.
Figure 13 presents the learning curves of five different models in Table 3 for training on the PUCPR subset of the PKLot dataset.
Figure 13. Training accuracy of the five models on the PUCPR subset of the PKLot dataset. x-axis: epoch (500 in total); y-axis: accuracy.
Figure 13 shows that after the final epoch, the training accuracies of the original MobileNetV3 and our proposed approach (MobileNetV3 with all modifications) were 99.95% and 99.9%, respectively. The comparison also shows that, of the three architectural changes, replacing DSConv with BSConv had the largest effect on classification performance. As noted above, the original MobileNetV3 overfits the training data, which explains why its training accuracy is slightly higher than that of the proposed model.
We then compared the results of our best model with those of other models developed or fine-tuned with transfer learning, such as AlexNet, mAlexNet, CarNet, VGG16 [24], VGG19 [24], and others, on the PKLot dataset. A comparison of the results is presented in Table 4.
Table 4. Classification results comparison of our best model with mAlexNet, CarNet, VGG16, and other models on PUCPR, UFPR04, UFPR05 subsets of PKLot [8] dataset. Bold data shows the highest score for that experiment.
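| Model | Train | Test: PUCPR | Test: UFPR04 | Test: UFPR05 |
|---|---|---|---|---|
| Our solution: modified MobileNetV3 | PUCPR | 99.90% | 98.20% | 95.15% |
| | UFPR04 | 98.85% | 99.68% | 98.38% |
| | UFPR05 | 95.06% | 96.34% | 99.20% |
| CarNet [16] | PUCPR | 98.80% | 94.40% | 97.70% |
| | UFPR04 | 98.30% | 95.60% | 97.60% |
| | UFPR05 | 98.40% | 95.20% | 97.50% |
| mAlexNet [7] | PUCPR | 99.90% | 98.03% | 96% |
| | UFPR04 | 98.27% | 99.54% | 93.29% |
| | UFPR05 | 92.72% | 93.69% | 99.49% |
| AlexNet [14] | PUCPR | 98.60% | 88.80% | 83.40% |
| | UFPR04 | 89.50% | 98.20% | 87.60% |
| | UFPR05 | 88.20% | 87.30% | 98% |
| VGG16 [24] | PUCPR | 88.20% | 94.20% | 90.80% |
| | UFPR04 | 89.70% | 95.30% | 90% |
| | UFPR05 | 90.50% | 94.90% | 91.80% |
| VGG19 [24] | PUCPR | 81.50% | 93.80% | 94.60% |
| | UFPR04 | 80.40% | 92.30% | 91.90% |
| | UFPR05 | 88.80% | 95.10% | 95.90% |
| Xception [25] | PUCPR | 96.30% | 92.50% | 93.30% |
| | UFPR04 | 94% | 94.60% | 93.40% |
| | UFPR05 | 95.70% | 90.90% | 91.20% |
| Inception V3 [26] | PUCPR | 90.80% | 91.10% | 94.20% |
| | UFPR04 | 91.70% | 95.20% | 92.40% |
| | UFPR05 | 94.30% | 92.90% | 93.70% |
| ResNet50 [27] | PUCPR | 88.20% | 94.20% | 94.10% |
| | UFPR04 | 89.70% | 95.30% | 93.30% |
| | UFPR05 | 90.50% | 94.90% | 95.50% |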
The results in Table 4 show that our approach achieved the highest accuracy in six of the nine experimental scenarios. Each scenario is written as training subset/testing subset. These scenarios are PUCPR/PUCPR (99.9%, tied with mAlexNet), PUCPR/UFPR04 (98.2%), UFPR04/PUCPR (98.85%), UFPR04/UFPR04 (99.68%), UFPR04/UFPR05 (98.38%), and UFPR05/UFPR04 (96.34%). CarNet [16] outperformed our model in the UFPR05/PUCPR and PUCPR/UFPR05 scenarios, with accuracies of 98.4% versus 95.06% and 97.7% versus 95.15%, respectively. In the UFPR05/UFPR05 scenario, mAlexNet [7] achieved the highest accuracy (99.49%), compared with 99.2% for our model. Overall, these results support the usefulness of the proposed modifications to the original MobileNetV3 model.
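The six-of-nine count can be checked directly from Table 4. The snippet below transcribes the table's accuracies and reports, for each train/test pair, the model or models with the highest value. Ties are credited to every model involved, so PUCPR/PUCPR counts for both the modified MobileNetV3 and mAlexNet. The dictionary layout and function names are illustrative choices.

```python
# Accuracies (%) transcribed from Table 4.
# Rows: training subset; columns: testing subset (same order as SUBSETS).
SUBSETS = ("PUCPR", "UFPR04", "UFPR05")

TABLE4 = {
    "modified MobileNetV3": [[99.90, 98.20, 95.15], [98.85, 99.68, 98.38], [95.06, 96.34, 99.20]],
    "CarNet": [[98.80, 94.40, 97.70], [98.30, 95.60, 97.60], [98.40, 95.20, 97.50]],
    "mAlexNet": [[99.90, 98.03, 96.00], [98.27, 99.54, 93.29], [92.72, 93.69, 99.49]],
    "AlexNet": [[98.60, 88.80, 83.40], [89.50, 98.20, 87.60], [88.20, 87.30, 98.00]],
    "VGG16": [[88.20, 94.20, 90.80], [89.70, 95.30, 90.00], [90.50, 94.90, 91.80]],
    "VGG19": [[81.50, 93.80, 94.60], [80.40, 92.30, 91.90], [88.80, 95.10, 95.90]],
    "Xception": [[96.30, 92.50, 93.30], [94.00, 94.60, 93.40], [95.70, 90.90, 91.20]],
    "Inception V3": [[90.80, 91.10, 94.20], [91.70, 95.20, 92.40], [94.30, 92.90, 93.70]],
    "ResNet50": [[88.20, 94.20, 94.10], [89.70, 95.30, 93.30], [90.50, 94.90, 95.50]],
}


def best_models(table):
    """Map each (train, test) pair to the list of models tied for the top accuracy."""
    winners = {}
    for i, train in enumerate(SUBSETS):
        for j, test in enumerate(SUBSETS):
            top = max(rows[i][j] for rows in table.values())
            winners[(train, test)] = [m for m, rows in table.items() if rows[i][j] == top]
    return winners


def count_wins(winners):
    """Count, per model, the scenarios in which it is (jointly) best."""
    counts = {}
    for models in winners.values():
        for m in models:
            counts[m] = counts.get(m, 0) + 1
    return counts
```

Running `count_wins(best_models(TABLE4))` credits six scenarios to the modified MobileNetV3, two to CarNet, and two to mAlexNet. One of mAlexNet's two is the PUCPR/PUCPR tie.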
We subsequently repeated the experiments using the CNRPark-EXT dataset. First, we trained five models on the training subset of the CNRPark-EXT and tested them on the testing subset of the dataset: original MobileNetV3, MobileNetV3 with the LeakyReLU6 activation function, MobileNetV3 with the CBAM module, MobileNetV3 with BSConv, and MobileNetV3 with all architecture modifications. The results of these experiments are presented in Table 5.
Table 5. Performance results of all modified versions of MobileNetV3 on CNRPark-EXT [7] dataset. Bold data shows the highest score for that experiment.
The original MobileNetV3 architecture achieved accuracies of 94.95%, 90.13%, and 93.53% on the training, validation, and testing subsets of the dataset, respectively. Replacing the activation function with LeakyReLU6 yielded a modest gain of approximately 0.5% in accuracy. Replacing the SE module with CBAM improved accuracy by about 2%. Substituting blueprint separable convolutions (BSConv) for depth-wise separable convolutions (DSConv) yielded the largest gain, about 2.5%.
Figure 14 shows the training process for the five different models in Table 5 on the training subset of the CNRPark-EXT dataset.
Figure 14. Training curves of the five model variants on the training subset of the CNRPark-EXT dataset. x-axis: epoch (500 epochs in total); y-axis: accuracy.
Figure 14 shows that, as expected, the architectural changes increased the model's accuracy. On this dataset, the largest gains came from replacing the SE module with CBAM and replacing DSConv with BSConv.
After evaluating the individual modifications, we compared our best model with CarNet, AlexNet, and ResNet50 on the CNRPark-EXT dataset. A comparison of the results is presented in Table 6. Our model achieved the highest accuracy on two of the three subsets (training and testing). On the validation subset, it reached 97.73%, slightly below AlexNet's 97.91%. The previous state-of-the-art model, CarNet, achieved 97.91% on the training subset and 90.05% and 97.24% on the validation and test subsets, respectively.
Table 6. Classification results comparison of our best model with CarNet, AlexNet, and ResNet50 on CNRPark-EXT [7] dataset. Bold data shows the highest score for that experiment.
Finally, we compared our best model with CarNet, AlexNet, and mAlexNet in cross-dataset experiments that combine the CNRPark-EXT and PKLot datasets. The test results are provided in Table 7.
Table 7. Comparison of results of our model with CarNet and mAlexNet on combinations of PKLot [8] and CNRPark-EXT [7]. Bold data shows the highest score for that experiment.
CarNet, which was designed specifically for this task, achieved an average accuracy of 97.03% over the three experiments. AlexNet, a general-purpose architecture, obtained 94.07%. mAlexNet, however, averaged only 88.69%, which indicates that it generalizes poorly when trained on one full dataset and tested on the other, or vice versa. Our model achieved the highest average accuracy, 98.01%, across the three combinations. This indicates that it generalizes well and learns features that transfer between the two datasets.
Table 8 compares the AUC scores of our proposed model with those of other state-of-the-art models. It also includes the model proposed in [8], which we refer to as PKLot for convenience. There were nine experiments on different subsets of the PKLot dataset. Our model achieved the highest AUC score in five of them and the PKLot approach in three. CarNet achieved the highest AUC score in the remaining one, trained on the PUCPR subset and tested on the UFPR05 subset.
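The AUC metric equals the probability that the classifier assigns a higher score to a randomly chosen occupied patch than to a randomly chosen empty one. A minimal, dependency-free sketch of that definition follows. The function name and the convention that label 1 means "occupied" are illustrative assumptions; this is not the evaluation code used to produce Table 8.

```python
def roc_auc(scores, labels):
    """ROC AUC via the Mann-Whitney formulation: the fraction of
    (positive, negative) pairs in which the positive (occupied, label 1)
    patch scores higher than the negative (empty, label 0) patch.
    Ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])` gives 0.75 because three of the four positive/negative pairs are ordered correctly. The pairwise loop is quadratic. For test sets the size of PKLot, a rank-based implementation such as scikit-learn's `roc_auc_score` is the practical choice.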
Table 8. Comparison of AUC scores of modified MobileNetV3 with CarNet, PKLot [8], and mAlexNet on 3 subsets of PKLot [8] dataset. Bold data shows the highest score for that experiment.
Our trained model occupies about 10 MB of memory, considerably less than large models such as VGG16 and AlexNet. The modified version of mAlexNet proposed in [15] requires only about 10 KB, but its accuracy is lower than that of mAlexNet. mAlexNet itself, proposed by Amato et al. [7], requires about 129 KB. Thus, although our model is larger than both mAlexNet and the modified mAlexNet, it achieves higher accuracy and AUC scores, as shown in the experiments above.
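Under a rough rule of thumb, each float32 weight occupies 4 bytes. About 10 MB therefore corresponds to roughly 2.5 million parameters, and 129 KB to roughly 32,000. The helpers below encode that back-of-the-envelope conversion. They assume dense float32 storage and ignore serialization overhead and non-parameter buffers such as batch-norm statistics, so the result is an estimate, not the measured parameter count of our model.

```python
def model_size_mb(num_params: int, bytes_per_param: int = 4) -> float:
    """Approximate size in MB (10**6 bytes) of dense weights.
    bytes_per_param = 4 assumes float32; use 2 for float16 or 1 for int8."""
    return num_params * bytes_per_param / 1e6


def params_for_size_mb(size_mb: float, bytes_per_param: int = 4) -> float:
    """Inverse estimate: how many parameters fit in size_mb at this precision."""
    return size_mb * 1e6 / bytes_per_param
```

The same arithmetic shows why quantization is a common lever on edge devices. Storing the same 2.5 million weights as int8 would take about 2.5 MB.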
We also compared the average runtime of our proposed model with those of other models. We randomly selected 1000 images of size 224 × 224 from each of the CNRPark-EXT and PKLot datasets. We then ran each model in PyTorch on the same machine used for training, without GPU acceleration. Table 9 summarizes the runtime analysis.
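A measurement of this kind can be sketched as a small timing harness. It samples images with a fixed seed so that every model sees the same subset, runs a few untimed warm-up calls, and then averages wall-clock time per image. The function name, the warm-up count, and timing one image per call (batch size 1) are assumptions, not details reported for Table 9. With PyTorch on CPU, `predict` would typically wrap a model that has been put in `model.eval()` mode and runs inside `torch.no_grad()`.

```python
import random
import time
from typing import Any, Callable, Sequence


def average_runtime_ms(predict: Callable[[Any], object], images: Sequence[Any],
                       n_samples: int = 1000, warmup: int = 10, seed: int = 0) -> float:
    """Mean per-image inference time in milliseconds over a random sample."""
    if not images:
        raise ValueError("need at least one image")
    rng = random.Random(seed)  # fixed seed: every model is timed on the same sample
    sample = rng.sample(list(images), min(n_samples, len(images)))
    for img in sample[:warmup]:
        predict(img)  # warm-up calls are not timed (lazy initialization, caches)
    start = time.perf_counter()
    for img in sample:
        predict(img)  # one image per call, i.e. batch size 1 (an assumption)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / len(sample)
```

Dividing two such averages gives speed ratios of the kind reported in Table 9. Both runs must use the same machine, thread settings, and image sample.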
Table 9. Average runtime of our model, AlexNet, mAlexNet, and custom mAlexNet [28] on subsets of PKLot [8] and CNRPark-EXT [7].
Although our model is about 6.7 times slower than both mAlexNet and the custom mAlexNet, it is about three times faster than AlexNet. This makes it fast enough for real-world applications.
Overall, the improved MobileNetV3 is a robust model when trained on one dataset and tested on another. We expect the approach to be applicable to real-world scenarios.

6. Conclusions and Future Work

In this study, we developed a parking lot occupancy detection approach based on a deep CNN classification model, MobileNetV3, with several architectural modifications that increase its robustness and accuracy. The model was trained on two well-known parking lot datasets: PKLot and CNRPark-EXT. The incoming video stream is processed frame by frame, and each frame is split into parking-space patches. The modified MobileNetV3 model then classifies each patch as either occupied by a car or empty. The classification results are overlaid on the frames as bounding boxes drawn around each parking space. We compared the qualitative and quantitative performance of the proposed system with that of other established classification models. The experimental results show that the enhanced MobileNetV3 achieves high accuracy and outperforms the compared models in most experiments. It also runs about three times faster than AlexNet, although it is slower than the much smaller mAlexNet variants. The resulting parking-space classification model is efficient and can be applied to real-world scenarios on mobile devices, resource-constrained edge devices, and cameras.
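The per-frame loop described above can be sketched as follows. The sketch assumes that parking-space bounding boxes are labeled in advance and that the classifier returns an occupancy probability for a patch. The names `crop`, `classify_frame`, and `summarize`, the (x, y, width, height) box format, the nested-list frame, and the 0.5 threshold are all hypothetical. A real pipeline would also resize each patch to the model's 224 × 224 input and apply the training-time normalization.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # hypothetical format: (x, y, width, height) in pixels
Frame = List[List[int]]          # grayscale frame stored as rows of pixel values


def crop(frame: Frame, box: Box) -> Frame:
    """Cut one parking-space patch out of the frame."""
    x, y, w, h = box
    return [row[x:x + w] for row in frame[y:y + h]]


def classify_frame(frame: Frame, spaces: Dict[str, Box],
                   classifier: Callable[[Frame], float],
                   threshold: float = 0.5) -> Dict[str, bool]:
    """Return {space_id: True if occupied} from per-patch occupancy probabilities."""
    return {sid: classifier(crop(frame, box)) >= threshold
            for sid, box in spaces.items()}


def summarize(status: Dict[str, bool]) -> Tuple[int, int]:
    """Count (occupied, free) parking spaces in one frame."""
    occupied = sum(status.values())
    return occupied, len(status) - occupied
```

Any callable can stand in for the classifier when testing the plumbing, for example the mean patch brightness scaled to [0, 1]. In deployment it would be the trained network's occupancy probability. The same `status` dictionary can drive the colored bounding boxes drawn on each frame.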
The main contributions of this work are provided below:
  • An optimized deep learning model was developed to classify parking spaces as empty or occupied. In the proposed model, the activation function in the shallow part of the network, which requires significant computation, is replaced by a new activation function that requires less computation. The squeeze-and-excitation attention mechanism used in the original MobileNetV3 is replaced by a more effective one, the convolutional block attention mechanism. Moreover, depth-wise separable convolutions implicitly rely on cross-kernel correlations, whereas trained kernels are dominated by intra-kernel correlations [6]. Blueprint separable convolutions are therefore used instead; they require less computation because they have fewer parameters.
  • Using the improved MobileNetV3 model, the parking lot occupancy detection approach can accurately count free and occupied parking spaces under the varied weather, lighting, and shadow conditions represented in the datasets.
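The two factorizations differ in the order of their steps [6]. DSConv filters each input channel spatially with a k × k depth-wise kernel and then mixes channels with a 1 × 1 point-wise convolution. The unconstrained blueprint separable convolution (BSConv-U) does the reverse: it first mixes channels point-wise, then applies one k × k depth-wise "blueprint" per output channel. The counting sketch below shows BSConv-U for illustration. Which BSConv variant applies here is an assumption, bias terms are omitted, and the function names are illustrative.

```python
def standard_conv_params(m: int, n: int, k: int) -> int:
    """Weights of a regular k x k convolution from m input to n output channels."""
    return k * k * m * n


def dsconv_params(m: int, n: int, k: int) -> int:
    """Depth-wise separable: k x k depth-wise on the m input channels,
    then a 1 x 1 point-wise convolution m -> n."""
    return k * k * m + m * n


def bsconv_u_params(m: int, n: int, k: int) -> int:
    """BSConv-U: 1 x 1 point-wise convolution m -> n, then k x k depth-wise
    on the n output channels (one blueprint per output channel)."""
    return m * n + k * k * n
```

For a 3 × 3 layer mapping 32 to 16 channels, a regular convolution needs 4608 weights, DSConv needs 800, and BSConv-U needs 656. With 16 input and 32 output channels, the last two counts swap. Both factorizations are far cheaper than a regular convolution, and the saving relative to DSConv depends on the channel ratio of each layer where the substitution is made.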
Although our model is robust and sufficiently fast for real-world applications, it still has shortcomings. It can misclassify images captured under certain weather conditions, images containing only part of a car, images with unusual parking configurations, images with partial occlusion, and images containing unseen objects.
In future work, we plan to explore new methods and modifications that improve the accuracy of the classification model. We also aim to reduce its runtime on mobile and edge devices and to handle the above-mentioned cases in which the model currently fails. Furthermore, we plan to develop a smart camera that runs the proposed system to detect parking lot occupancy, improving its efficiency and reducing its resource consumption.
Our research can also be extended by integrating it into a decentralized smart camera system [7]. Incorporating the improved MobileNetV3 into such a system could significantly enhance its efficiency, responsiveness, and intelligence. In addition, we are working on automatic parking space detection to replace the manual labeling of parking spaces.

Author Contributions

Conceptualization, Y.Y.; methodology, Y.Y. and M.M.; software, Y.Y. and A.B.A.; validation, Y.Y. and R.N.; formal analysis, Y.Y. and M.M.; investigation, Y.Y. and J.C.; resources, Y.Y. and A.B.A.; data curation, Y.Y. and R.N.; writing—original draft preparation, Y.Y.; writing—review and editing, J.C., Y.Y., and M.M.; visualization, R.N. and A.B.A.; supervision, J.C.; project administration, J.C.; funding acquisition, J.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partly supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2022-0-00838, An intelligent AI-BOX-based shared parking control/induction system capable of recognizing multinational indoor/street vehicle information (vehicle number/classification, etc.)) and by the Gachon University research fund of 2022 (GCU-202208830001).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data used in this study is openly available and can be accessed from the following sources: [7,8].

Acknowledgments

Thanks to our families and colleagues who supported us with encouragement.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Almeida, P.R.; Alves, J.H.; Parpinelli, R.S.; Barddal, J.P. A Systematic Review on Computer Vision-Based Parking Lot Management Applied on Public Datasets. Expert Syst. Appl. 2022, 198, 116731. [Google Scholar] [CrossRef]
  2. Howard, A.; Sandler, M.; Chu, G.; Chen, L.-C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324. [Google Scholar]
  3. Zhang, Y.; Chen, X. Lightweight Semantic Segmentation Algorithm Based on MobileNetV3 Network. In Proceedings of the 2020 International Conference on Intelligent Computing, Automation and Systems (ICICAS), Chongqing, China, 11–13 December 2020; pp. 429–433. [Google Scholar]
  4. Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-Excitation Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
  5. Jia, L.; Wang, Y.; Zang, Y.; Li, Q.; Leng, H.; Xiao, Z.; Long, W.; Jiang, L. MobileNetV3 with CBAM for Bamboo Stick Counting. IEEE Access 2022, 10, 53963–53971. [Google Scholar] [CrossRef]
  6. Haase, D.; Amthor, M. Rethinking Depthwise Separable Convolutions: How Intra-Kernel Correlations Lead to Improved MobileNets. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 14588–14597. [Google Scholar]
  7. Amato, G.; Carrara, F.; Falchi, F.; Gennaro, C.; Meghini, C.; Vairo, C. Deep Learning for Decentralized Parking Lot Occupancy Detection. Expert Syst. Appl. 2017, 72, 327–334. [Google Scholar] [CrossRef]
  8. Almeida, P.R.; Oliveira, L.S.; Britto, A.S., Jr.; Silva, E.J., Jr.; Koerich, A.L. PKLot—A Robust Dataset for Parking Lot Classification. Expert Syst. Appl. 2015, 42, 4937–4949. [Google Scholar] [CrossRef]
  9. Howard, A.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
  10. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar]
  11. Al-Kharusi, H.; Al-Bahadly, I. Intelligent parking management system based on image processing. World J. Eng. Technol. 2014, 2, 55–67. [Google Scholar] [CrossRef]
  12. Ahrnbom, M.; Astrom, K.; Nilsson, M. Fast classification of empty and occupied parking spaces using integral channel features. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 1609–1615. [Google Scholar]
  13. Razavian, A.S.; Azizpour, H.; Sullivan, J.; Carlsson, S. CNN features off-the-shelf: An astounding baseline for recognition. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Columbus, OH, USA, 23–28 June 2014; pp. 512–519. [Google Scholar]
  14. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2012; pp. 1097–1105. [Google Scholar]
  15. Nguyen, T.; Tran, T.; Mai, T.; Le, H.; Le, C.; Pham, D.; Phung, K.H. An Adaptive Vision-based Outdoor Car Parking Lot Monitoring System. In Proceedings of the 2020 IEEE Eighth International Conference on Communications and Electronics (ICCE), Phu Quoc Island, Vietnam, 13–15 January 2021; pp. 445–450. [Google Scholar]
  16. Nurullayev, S.; Lee, S.-W. Generalized Parking Occupancy Analysis Based on Dilated Convolutional Neural Network. Sensors 2019, 19, 277. [Google Scholar] [CrossRef] [PubMed]
  17. Xiao, A.; Doshi, D.; Wang, L.; Gorantla, H.; Heitzmann, T.; Groth, P. Parking Spot Classification based on surround view camera system. arXiv 2023, arXiv:2310.12997. [Google Scholar]
  18. Grbić, R.; Koch, B. Automatic Vision-Based Parking Slot Detection and Occupancy Classification. Expert Syst. Appl. 2023, 225, 120147. [Google Scholar] [CrossRef]
  19. Duong, T.L.; Le, V.D.; Bui, T.C.; To, H.T. Towards an Error-free Deep Occupancy Detector for Smart Camera Parking System. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Cham, Switzerland, 2022; pp. 163–178. [Google Scholar]
  20. Martynova, A.; Kuznetsov, M.; Porvatov, V.; Tishin, V.; Kuznetsov, A.; Semenova, N.; Kuznetsova, K. Revising Deep Learning Methods in Parking Lot Occupancy Detection. arXiv 2023, arXiv:2306.04288. [Google Scholar]
  21. Amato, G.; Carrara, F.; Falchi, F.; Gennaro, C.; Vairo, C. Car Parking Occupancy Detection using Smart Camera Networks and Deep Learning. In Proceedings of the 2016 IEEE Symposium on Computers and Communication (ISCC), Messina, Italy, 27–30 June 2016; Institute of Electrical and Electronics Engineers Inc.: New York, NY, USA, 2016; Volume 2016, pp. 1212–1217. [Google Scholar]
  22. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar]
  23. Nguyen, A.; Yosinski, J.; Clune, J. Understanding Neural Networks via Feature Visualization: A Survey. In Explainable AI: Interpreting, Explaining and Visualizing Deep Learning; Samek, W., Montavon, G., Vedaldi, A., Hansen, L., Müller, K.R., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2019; Volume 11700, pp. 55–76. [Google Scholar]
  24. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  25. Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258. [Google Scholar]
  26. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 2818–2826. [Google Scholar]
  27. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778. [Google Scholar]
  28. Satyanath, G.; Sahoo, J.K.; Roul, R.K. Smart Parking Space Detection under Hazy Conditions using Convolutional Neural Networks: A Novel Approach. Multimed. Tools Appl. 2023, 82, 15415–15438. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
