1. Introduction
Finding parking for a growing vehicle population is a complex challenge in urban areas, especially downtown during rush hour. Parking shortages are a multifaceted issue intertwined with urban planning, transportation engineering, and human behavior. The growing density of vehicles in urban areas has caused a spatial mismatch between the demand for parking and the available supply, which makes the search for a parking space difficult in certain areas. This mismatch has prompted a large number of scientific studies aimed at understanding the spatial, temporal, and behavioral dynamics of parking space use, as well as at finding available parking locations quickly and in real time.
Because of the widespread availability of cars, traffic congestion caused by the increasing volume of traffic is a global issue [1,2]. Moreover, finding parking spaces for an ever-growing number of vehicles exacerbates this issue. Drivers seek available spots close to their destinations to minimize walking distances, but this search for parking can cause traffic congestion and decrease vehicle speed; some studies have shown that almost 30% of urban traffic is due to drivers searching for parking spots [
3]. Traffic congestion and limited parking are becoming a worldwide problem. Despite the restrictions on CO₂ emissions, the number of vehicles on the roads is predicted to rise rather than fall, with conventional cars being replaced by electric cars [4], implying that unless steps are taken to address these issues, they will only worsen with time. Therefore, allocating parking spaces is a critical concern for large- and medium-sized cities, where traffic is heavy and the demand for parking keeps growing. Finding parking spaces is not only time-consuming but also stressful for drivers, especially in busy urban areas [
5]. The issue of parking allocation has been a topic of active research, with several proposals emerging in various areas such as optimizing parking allocation, intelligent parking, and automatic detection of vacant spaces. However, one area of opportunity that has gained significant attention is the prediction of parking spaces, which is an essential aspect of an all-encompassing solution to the parking allocation problem. Accurately predicting the availability of parking spaces can significantly improve the efficiency of parking space utilization, reduce traffic congestion, and save time for drivers. This approach requires the use of advanced technologies such as machine learning and artificial intelligence to analyze real-time data on parking lot occupancy and traffic patterns. Several innovative solutions have been proposed to predict parking space availability, such as using sensors, cameras, and mobile applications. These technologies provide real-time information to drivers on parking availability and guide them to the nearest available parking spot. Moreover, the data gathered from these systems can be analyzed to generate insights into parking usage patterns, which can be used to optimize parking lot allocation and enhance parking space utilization.
For instance, in ref. [
6], the authors used a graph-convolutional neural network followed by a long short-term memory layer to extract features of traffic flow data and predict parking occupancy. Their work demonstrates better predictions for business areas than recreational locations. Another study [
7] predicts the intent and trajectory of vehicles using a transformer model combined with convolutional neural networks, trained on video data of people driving in parking lots and traffic scenarios. Ref. [8] proposes a deep learning architecture that combines a gated recurrent unit (GRU) with a graph convolutional network to extract both temporal and spatial correlation information. The GRU captures temporal information, while the graph convolutional network is incorporated into the GRU cell to extract spatial correlations. The research by ref. [
3] compares various prediction methods for parking spaces, including deep learning, standard machine learning, and classical methods. The methods examined are long short-term memory as the deep learning method, seasonal autoregressive integrated moving average as the classical method, and ensemble-based decision trees as the standard machine learning method. The study found that the ensemble method and the long short-term memory generally produced predictions with lower errors than the classical autoregressive-based method. The study conducted by ref. [
9] compared the predictive performance of various models including linear regression, support vector machine, neural networks, and autoregressive integrated moving average. It was found that the support vector machine model offered the most stable and accurate predictions among the models reviewed.
Predicting parking space availability is a vital area of research that can significantly improve the efficiency of parking space utilization and reduce traffic congestion. As technology advances, more innovative solutions are likely to emerge that will transform the way we manage parking allocation and enhance the overall driving experience.
The studies reviewed above demonstrate that deep learning techniques, particularly the transformer, are commonly employed. However, these techniques require substantial computational power, and therefore cloud computing may be necessary. Nonetheless, the high computational cost and the increasing number of prediction requests may result in latency problems, even with the use of cloud computing.
In this work, we propose a fog computing-based system that predicts parking time for each vehicle and parking lot availability based on parking history records. The system incorporates a machine learning algorithm in the fog node, which reduces response time, and a cloud module that provides data analytics on parking statistics. By utilizing this system, cities can optimize their parking spaces, reduce traffic congestion, and enhance the efficiency of daily life.
This paper presents several contributions. First, a fog computing platform is developed to minimize response delays in the system. Second, a prediction infrastructure based on the AdaBoost algorithm, which has a low computational cost, is proposed for integration into a fog node; this model can be regarded as a generalization of the autoregressive model. Third, a mobile application is created to act as the user interface for the system. Finally, various machine learning techniques are compared to determine the most suitable method for parking space prediction. Through these contributions, the paper provides a comprehensive framework that enhances the efficiency and accuracy of parking space prediction.
The remainder of this paper is structured as follows. Section 2 provides an introduction to the theory of fog computing and the AdaBoost algorithm. Section 3 presents the proposed fog computing system, which includes the fog node and the user interface, implemented as a mobile application, together with the metrics used for evaluation. Section 4 describes the experiments and presents the results obtained from the prediction and latency calculations. Finally, Section 5 summarizes the conclusions drawn from the research.
  3. Methods
This section outlines the architecture of a proposed fog system designed to manage and predict parking space availability. The architecture consists of three layers, as shown in 
Figure 1. The first layer is the IoT layer, which includes the parking lots themselves; these provide real-time updates on their space availability. Users in their vehicles or on their mobile devices can receive predictions of the number of parking spaces that will be available at a specific time, helping them to plan their parking ahead of time. The second layer is the fog layer, which receives information requests from the users and the parking data. This layer incorporates a predictor of the number of parking spaces that will be available in the future, which uses an ML algorithm based on AdaBoost to make predictions from historical data and real-time updates. Finally, the cloud layer receives data from the parking lots, stores them, and generates statistics on their usage. Additionally, based on the performance of the predictor, this layer is responsible for training or retraining the algorithm to improve its accuracy over time. The main types of fog agents considered are fog gateways, fog nodes/servers, and fog storage [
20]. With respect to the communication protocols supported between the fog layer and the IoT layer, the main protocols are TCP/IP and 5G or 6G, as the primary interaction with users will be through mobile applications. However, for communication between the parking lot and the fog node, the protocols must be compatible with the sensor network in the parking lot. Sensor networks can consist of individual space sensors or cameras covering a large area. The links can be wireless, such as Zigbee or Bluetooth, or wired, such as I2C, and can carry protocols like TCP/IP and MQTT, among others. Based on the investigation in ref. [
21], which explores the relationship between the optimal number of fog nodes and the path loss coefficient to account for the impact of shadowing and fading, we can estimate the required number of fog nodes. One of its experiments suggests that, on average, 1.52 fog nodes are needed for 262 end devices, with certain conditions given in ref. [
21]. Therefore, for example, for parking lots with 1000 spaces, we recommend using approximately six fog nodes to ensure adequate coverage and performance.
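The estimate of six fog nodes can be reproduced with a short calculation. The sketch below assumes that the ratio reported in ref. [21] scales linearly with the number of end devices, that each parking space counts as one end device, and that the result is rounded up; the function name and these assumptions are ours and are given only for illustration.

```python
import math


def estimate_fog_nodes(end_devices: int,
                       nodes_per_group: float = 1.52,
                       devices_per_group: int = 262) -> int:
    """Scale the reported ratio (1.52 fog nodes per 262 end devices) linearly.

    Linear scaling and rounding up are illustrative assumptions.
    """
    raw = end_devices * nodes_per_group / devices_per_group
    return math.ceil(raw)


# 1000 * 1.52 / 262 is about 5.80, which rounds up to 6 fog nodes.
print(estimate_fog_nodes(1000))
```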
Overall, this proposed fog system for parking space management and prediction has the potential to significantly improve parking space utilization, reduce traffic congestion, and save time for drivers. By utilizing advanced technologies, such as IoT, fog computing, and ML, this system can provide real-time updates and accurate predictions to users, helping to streamline the parking experience and enhance overall driving efficiency.
  3.1. Module for Occupancy Prediction
This study proposes a fog module that utilizes a machine learning regressor to predict the availability of parking spaces over time. This fog module has the potential to significantly enhance the efficiency of parking space management in cities, resulting in reduced traffic congestion and improved accessibility for drivers. The fog module consists of two subsystems: the user interface management subsystem and the prediction subsystem. Additionally, it incorporates fog storage to save recent historical data on parking usage for the last seven days. These subsystems are depicted in 
Figure 2. The user interface management subsystem receives data from the user application. The prediction subsystem receives prediction requests from the user interface management subsystem and uses recent historical data to return predictions. The prediction subsystem uses the AdaBoost regressor to forecast parking space availability for a given date. Parking data are also transferred to the cloud for storage and statistical management via the cloud analytics module. This module determines the need to retrain the regressor based on the error rate.
In the present investigation, we propose using past parking occupancy information for prediction, which is represented as a sequence of data over time, $\{x_n\}_{n=1}^{N}$, where $x_n$ is the parking occupancy at time slot $n$. In addition, we incorporate features, $\mathbf{f}_n$, linked to each component of the sequence, $x_n$, which expands the dataset as $\{(\mathbf{f}_n, x_n)\}_{n=1}^{N}$. Moreover, we consider the traditional autoregressive model of order $I$ as a base model, which is given by

$$x_n = \sum_{i=1}^{I} a_i\, x_{n-i} + \epsilon_n. \tag{2}$$

From this, we also propose to use the AdaBoost algorithm as a generalization of the classic model, similar to the neural autoregressive model of [22], and we express this as follows,

$$x_n = F(\mathbf{f}_n) + \epsilon_n, \tag{3}$$

where $\epsilon_n$ is white noise with zero mean and variance $\sigma^2$, and $a_i$ in (2) are the model coefficients. In (3), the features are the samples taken backwards according to the order of the model $I$, $\mathbf{f}_n$ is the set with elements $x_{n-i}$, $i = 1, \dots, I$, and $F$ is a function that combines the output of $R$ regressors according to the AdaBoost regression algorithm. The formula for calculating the output, $F(\mathbf{f}_n)$, is as in [18]

$$F(\mathbf{f}_n) = \inf\left\{ y : \sum_{t:\, h_t(\mathbf{f}_n) \le y} \log\frac{1}{\beta_t} \ge \frac{1}{2} \sum_{t=1}^{R} \log\frac{1}{\beta_t} \right\}, \tag{4}$$

where $h_t$ are weak regressors, and each $\beta_t$ depends on the error, $\bar{L}_t$, of regressor $t$, which are calculated in the training phase.
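To illustrate how the outputs of the weak regressors can be combined, the following minimal sketch computes a weighted median, assuming the AdaBoost.R2 formulation in which regressor $t$ receives the weight $\log(1/\beta_t)$ with $0 < \beta_t < 1$. The function name and the numerical values are hypothetical and are not taken from the experiments.

```python
import math


def weighted_median_output(predictions, betas):
    """Combine weak-regressor outputs by a weighted median.

    Assumes AdaBoost.R2-style weights log(1 / beta_t), with 0 < beta_t < 1.
    Returns the smallest prediction whose cumulative weight reaches
    half of the total weight.
    """
    weights = [math.log(1.0 / b) for b in betas]
    half_total = 0.5 * sum(weights)
    running = 0.0
    for prediction, weight in sorted(zip(predictions, weights)):
        running += weight
        if running >= half_total:
            return prediction
    raise ValueError("no predictions supplied")


# Hypothetical outputs of three weak regressors and their beta values.
print(weighted_median_output([100, 140, 110], [0.2, 0.8, 0.5]))
```

A small $\beta_t$ (a low training error) yields a large weight, so in this example the first regressor dominates the combination.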
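In this work, we also propose to calculate the output of the regressor using Tukey’s biweight location estimator [23]. The median, on which Equation (4) is based, is a resistant estimate, but it has only moderate robustness of efficiency. The biweight location estimator, in contrast, is resistant and has robust efficiency. The formula for the biweight is given by

$$T = \frac{\sum_{t=1}^{R} w_t\, y_t}{\sum_{t=1}^{R} w_t}, \tag{5}$$

where $y_t = h_t(\mathbf{f}_n)$ is the output of weak regressor $t$,

$$w_t = \begin{cases} \left(1 - u_t^2\right)^2, & |u_t| < 1, \\ 0, & \text{otherwise}, \end{cases} \qquad u_t = \frac{y_t - T}{cS},$$

and $S$ is the median of the set $\{|y_t - T|\}_{t=1}^{R}$, and $c$ is a parameter of the algorithm. Note that the value $T$ of the estimator is computed iteratively.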
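The iterative computation of the biweight can be sketched as follows. This is a minimal illustration under stated assumptions: the iteration starts from the median, the scale is recomputed at every step as the median absolute deviation from the current estimate, and $c = 6$ is used as a common textbook choice. The value of $c$, the stopping rule, and the function name are our assumptions and not necessarily the settings used in the experiments.

```python
import statistics


def biweight_location(values, c=6.0, tol=1e-9, max_iter=50):
    """Tukey's biweight location estimate, computed by fixed-point iteration.

    c, tol, and max_iter are illustrative defaults (assumptions).
    """
    estimate = statistics.median(values)
    for _ in range(max_iter):
        # Scale S: median of the absolute deviations from the current estimate.
        scale = statistics.median([abs(v - estimate) for v in values])
        if scale == 0:
            return estimate  # at least half of the values equal the estimate
        numerator = denominator = 0.0
        for v in values:
            u = (v - estimate) / (c * scale)
            if abs(u) < 1:  # points with |u| >= 1 receive zero weight
                w = (1 - u * u) ** 2
                numerator += w * v
                denominator += w
        if denominator == 0:
            return estimate
        new_estimate = numerator / denominator
        if abs(new_estimate - estimate) < tol:
            return new_estimate
        estimate = new_estimate
    return estimate


# Hypothetical weak-regressor outputs with one gross outlier.
outputs = [100, 102, 98, 101, 99, 500]
print(biweight_location(outputs))
```

In this example the outlier at 500 receives zero weight, so the estimate stays close to 100, whereas the arithmetic mean would be pulled to about 167.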
As far as we are aware, no previous research has combined the biweight with the AdaBoost regressor, particularly in the context of parking prediction. Figure 3 shows the scheme for training the AdaBoost algorithm, which uses windows of past samples as input and compares the output of the regressor with the one-step-ahead sample.
To evaluate the effectiveness of the method, we conducted experiments using a parking space database and predicted availability for the next day. In this case, we chose a seventh-order model because the parking data varied weekly. The Adaboost estimator was configured with  decision tree regressors, a learning rate of , and a quadratic loss function for each boosting iteration; these settings achieve the most precise forecast of parking space availability over time.
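The windowing used for a seventh-order model can be illustrated with a short sketch. It assumes a plain list of daily occupancy values; the helper name and the sample values are hypothetical.

```python
def make_lag_windows(series, order=7):
    """Build (features, targets) pairs for an autoregressive-style model.

    Each feature row holds the `order` samples that precede its target,
    oldest first; the target is the one-step-ahead sample.
    """
    features, targets = [], []
    for n in range(order, len(series)):
        features.append(list(series[n - order:n]))
        targets.append(series[n])
    return features, targets


# Made-up daily occupancy values, for illustration only.
occupancy = [120, 135, 150, 148, 140, 90, 60, 118, 133, 152]
X, y = make_lag_windows(occupancy, order=7)
print(len(X), X[0], y[0])
```

The resulting rows could then be passed to a regressor that exposes a `fit(X, y)` interface, such as scikit-learn's `AdaBoostRegressor`.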
  3.2. User Interface
The user interface (Figure 4) receives data from the user through a mobile application. In this application, users are asked for the location of their destination and the time frame of the prediction; for now, only three locations are implemented: Parking one, Parking two, and Parking three (Figure 5a). The application shows the available parking lots at the selected destination and offers, in real time, the number of spaces available in each parking lot.
To this end, an Android mobile application was designed to establish communication with the fog node for parking space availability prediction (
Figure 4). In addition, the application can save the location of the parked vehicle to facilitate its retrieval. To obtain predictions of parking space availability, users need to select a “Choose place/time” button in the app, to access a list of parking lot options. After selecting the desired parking lot, a second list appears for selecting the time period for which the prediction is requested. Once the user has made these selections, the information is sent to the fog node using Google’s Firebase service [
24,
25], which generates and sends the prediction back to the application for display.
Figure 4 provides an example of the user interface of the application, where the current location of the mobile device is represented by a blue globe on the map. To access the parking space availability prediction server, the user needs to tap on a “Choose Place/time” button located at the bottom of the application. This action will display two consecutive selection lists. The first list allows the user to choose one of the three available parking lots in the predictor, as shown in 
Figure 5a. Once the user selects the parking lot of interest, a second list will appear, showing the available time frames, as depicted in 
Figure 5. After selecting both the parking lot and the desired time frame, the options are sent to the fog node, and the button that was initially labeled as “Choose place/time” displays the predicted parking lot availability percentage on the right side of the button, as illustrated in the example shown in 
Figure 6. Algorithm 2 shows the procedure executed by the app, while 
Figure 7 illustrates the sequence diagram depicting a user’s prediction request initiated by clicking the “Choose place/time” button. The diagram shows the flow of information, including parking lot preferences and time inputs sent to Google’s Firebase for generating a prediction, which is then presented on the app’s graphical user interface (GUI).
        
| Algorithm 2 Parking space availability prediction procedure |
| --- |
| 1: Initialization: choose parking lot from the displayed menu. |
| 2: Choose time from the displayed menu. |
| 3: Selected location and time are sent to the fog node via Google’s Firebase. |
| 4: The fog node sends the calculated parking space availability prediction to the app via Google’s Firebase. |
| 5: The parking space availability prediction is displayed in the app. |
 An additional feature of the application is to save a location through the “Save location” button (
Figure 8a), which can be used to save the location of the parking space where the vehicle has been left; this location will appear with a red balloon on the application map (
Figure 8) and can continue to be seen while the mobile device is moving (
Figure 9a,b), until the location is deleted by pressing the “Clear location” button (
Figure 8c). If the location is not deleted, it can be shown centered on the map by pressing the “Show location” button (
Figure 8b); the map centered on the saved location is shown in 
Figure 9c. Another very useful functionality of the application is to show the route to reach the saved location; when pressing the “Trace route” button (
Figure 8d), on the map, a convenient walking path is displayed from the mobile device’s current location to the saved parking space location where the vehicle was left (
Figure 10).
Finally, the application has an alignment button (Figure 11). By default, the application constantly centers the map on the current location. When the alignment button is pressed, the icon changes (Figure 11a), and the map remains wherever the user manually positions it. Pressing the button again restores the original behavior of centering on the current location, and the icon returns to the one shown in Figure 11b.
  3.3. Metrics
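We used several metrics to evaluate the performance of the predictor. We assume a dataset of $N$ observations, and the metrics are described below. The $R^2$ score, or coefficient of determination, is the proportion of the variation in the dependent variable that is predictable from the independent variable [26], and can be calculated as

$$R^2 = 1 - \frac{\sum_{n=1}^{N} \left(x_n - \hat{x}_n\right)^2}{\sum_{n=1}^{N} \left(x_n - \bar{x}\right)^2},$$

where $\hat{x}_n$ represents the prediction for data point $x_n$, $\bar{x}$ denotes the mean of all data points, and the summation extends over $N$, the total number of observations.

The maximum error (MaxError) is simply the maximum of the absolute residuals, defined as

$$\mathrm{MaxError} = \max_{1 \le n \le N} \left|x_n - \hat{x}_n\right|.$$

The mean absolute error (MAE) measures the errors between the observed data and the predictions,

$$\mathrm{MAE} = \frac{1}{N} \sum_{n=1}^{N} \left|x_n - \hat{x}_n\right|.$$

The median absolute error (MedianAE) is

$$\mathrm{MedianAE} = \operatorname{median}\left(\left|x_1 - \hat{x}_1\right|, \dots, \left|x_N - \hat{x}_N\right|\right).$$

The root mean square error (RMSE) is

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{n=1}^{N} \left(x_n - \hat{x}_n\right)^2}.$$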
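As a worked illustration, the five metrics can be computed directly from their standard definitions. The sketch below uses only the Python standard library; the function name and the sample values are hypothetical and are not taken from the experiments.

```python
import math
import statistics


def regression_metrics(actual, predicted):
    """Compute R2, MaxError, MAE, MedianAE, and RMSE from their definitions."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    abs_residuals = [abs(r) for r in residuals]
    n = len(actual)
    mean_actual = sum(actual) / n
    ss_res = sum(r * r for r in residuals)                # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    return {
        "r2": 1 - ss_res / ss_tot,
        "max_error": max(abs_residuals),
        "mae": sum(abs_residuals) / n,
        "median_ae": statistics.median(abs_residuals),
        "rmse": math.sqrt(ss_res / n),
    }


# Made-up occupancy values and predictions.
metrics = regression_metrics([100, 120, 140, 160], [110, 120, 130, 165])
print(metrics)
```

For these values the residuals are -10, 0, 10, and -5, which gives an $R^2$ of 0.8875, a MaxError of 10, an MAE of 6.25, a MedianAE of 7.5, and an RMSE of 7.5.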
  4. Results
For this work, we used the database of ref. [27] for training and testing of the subsystem. The database reports the capacity and occupancy of each parking lot and consists of the following fields: parking lot system, capacity, occupancy, date, and time. The database contains 35,717 entries across 30 parking lots. The period of data collection for most parking lots is from 4 October 2016 to 19 December 2016; however, some parking lots have a shorter interval. The total number of data points collected per parking lot is shown in Figure 12a. In the database, parking data were acquired every 30 min; however, we use the data per day, as shown in
Figure 12b.
For the prediction of parking spaces, we used several ML algorithms: AdaBoost, XGB, lasso, random forest, K-neighbors, support vector machine, and kernel ridge regression. We denote by TAdaBoost the AdaBoost algorithm modified by the use of Equation (5). All algorithms were trained using a split of the data: 80% for training and 20% for testing. We used 315 days of the database, and thus 63 days were used for testing. Table 1 shows the values used for the most common parameters in each algorithm. For a complete list of the parameters and their values, please see the default values in the scikit-learn library documentation [28].
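The sizes of the two partitions follow directly from the 80/20 split. The sketch below assumes that the split is chronological, with the last 20% of the days held out; whether the samples were shuffled before splitting is not stated here, so the ordering is an assumption, and the function name is hypothetical.

```python
def chronological_split(samples, test_fraction=0.2):
    """Hold out the last `test_fraction` of the samples for testing.

    Keeping the time order is an illustrative assumption.
    """
    n_test = round(len(samples) * test_fraction)
    split = len(samples) - n_test
    return samples[:split], samples[split:]


days = list(range(315))  # placeholder indices for the 315 days
train_days, test_days = chronological_split(days)
print(len(train_days), len(test_days))
```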
The evaluation of the algorithms using various metrics is presented in 
Table 2. According to the results, the TAdaBoost algorithm performs better than all other methods in terms of R2 score, MAE, and RMSE. However, it has a higher MaxError than the other methods. AdaBoost has the best MedianAE, with 10.5, which is 3.099 units lower than the next best method, TAdaBoost. However, TAdaBoost has a consistently low error for most of the predictions.
The curves in 
Figure 13 display a portion of the occupancy data series obtained from the database; only the best three methods are shown. It is apparent that the TAdaBoost algorithm fits the actual occupancy curve more accurately. The data show a cyclic pattern that deviates around days 25–30, where all methods demonstrate poor performance. Also, on day 26, most methods predict an increase that the actual data do not show, but the TAdaBoost and the AdaBoost algorithms do not make this error and follow the actual trend.
Figure 14 shows boxplots of the four best methods, computed from the errors between the predictions and the actual measurements. Most methods have a median error of zero; the TAdaBoost method exhibits less variance but more outliers than the second best, AdaBoost.
We simulated a network in which a variable number of cellular phones send requests to the server, once with fog computing and once with cloud computing only. The overall end-to-end delay of the system is approximated as the sum of the nodal processing delay, $d_{\text{proc}}$, queuing delay, $d_{\text{queue}}$, serialization delay, $d_{\text{ser}}$, and propagation delay, $d_{\text{prop}}$ [29]. Advancements in hardware and software have decreased the processing and serialization delays to microseconds, and the propagation delay is around 5 microseconds per kilometer. The queuing delay can be optimized using quality of service (QoS) techniques for prioritized data.
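The delay model can be sketched as a simple sum of its four components. In the example below, only the propagation figure of 5 microseconds per kilometer comes from the text; the distances and the processing, queuing, and serialization values are hypothetical placeholders, not the parameters of the simulation.

```python
PROPAGATION_US_PER_KM = 5.0  # propagation delay of about 5 microseconds per km


def end_to_end_delay_us(distance_km, processing_us, queuing_us, serialization_us):
    """Sum the four delay components, all in microseconds."""
    propagation_us = PROPAGATION_US_PER_KM * distance_km
    return processing_us + queuing_us + serialization_us + propagation_us


# Hypothetical values: a fog node 5 km away versus a cloud server 1000 km away.
fog_delay = end_to_end_delay_us(distance_km=5, processing_us=10,
                                queuing_us=200, serialization_us=10)
cloud_delay = end_to_end_delay_us(distance_km=1000, processing_us=10,
                                  queuing_us=200, serialization_us=10)
print(fog_delay, cloud_delay)
```

With identical processing, queuing, and serialization terms, the difference between the two cases comes entirely from the propagation term, which grows linearly with distance.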
In the simulation, the nodes’ distance, , was dynamically changed within a 10 km diameter to more realistically simulate mobile devices moving with respect to fixed fog servers. The latency calculations were performed using the following parameter values: , , and , where r is a random variable between 0 and 1; note that  was not considered in this work.
Figure 15 displays the delay curve, which shows that fog computing reduces the overall delay. The reduction is not significant with fewer than 100 nodes, and with fewer than 50 mobile phones the delay is negligible (less than a second). However, as the number of requests increases, the delay with fog computing is almost half of that with cloud-only computing.
Finally, with regard to study design, we assess both the internal and external validity of our study. Internal validity, in our context, focuses on two primary threats. The first is the suitability of the dataset for training the algorithms; to address it, we chose a database containing 30 parking lots to provide the training data. The second is the comparison of the selected machine learning algorithms; to address it, we opted for the most successful techniques commonly employed in the literature. External validity, in the context of our study, concerns the generalization of the results. There is a risk that the machine learning model may not perform effectively when applied to a parking lot not included in the database. However, we have implemented two strategies to mitigate this risk. Firstly, we selected machine learning algorithms known for their ability to recognize significant patterns in data, especially over extended periods, such as in time series analysis. These algorithms typically offer better generalization than more traditional methods like simple regression. Consequently, we anticipate that other parking lots will exhibit patterns similar to those on which the algorithm was trained. Secondly, even if generalization to other parking lots proves challenging, our fog nodes incorporate a retraining module. This module allows the algorithm to adapt to specific parking lots when the error level becomes significant, ensuring continued performance.