1. Introduction
E-commerce has transformed the way companies operate, customer behavior and needs, and business transactions [1]. A recent report by Lipsman predicts that global e-commerce sales of retail goods will reach $6.5 trillion by 2023, indicating a significant increase in online shopping trips. The market share of e-commerce in the total retail market is expected to grow from 10.4% to 22% (with sales totaling approximately $2.4 trillion) from 2017 to 2023. This trend has a substantial impact on transportation management and logistics, affecting both delivery times and costs for customers [2]. One of the primary challenges remains the choice between online and offline shopping trips. For instance, in Korea, major retail companies such as Shinsegae Group (i.e., E-mart, Shinsegae Department Store) and Lotte Group (i.e., Lotte Mart, Lotte Department Store), which dominate the retail industry, face fierce competition in online shopping and have created offline stores. However, even though Lotte Mart (an offline store) and Lotte Mart Mall (an online store) operate in the same product sales category, they compete strongly in terms of product sales [3]. Online product buyers often require fast delivery, so their orders should be processed as soon as they are placed. All the operations required to deliver these orders to customers are compressed into a short time frame, encompassing order processing, pickup, long-distance shipping, and delivery [1]. In online shopping, orders vary in location based on the customers’ preferences within the logistics system. The most important issue is the delivery and travel time for products to reach the customers who are the furthest away, which sometimes causes customer dissatisfaction stemming from possible delays or failure to meet expected delivery times.
This issue poses numerous challenges for choosing between online and offline shopping trips. Online shopping offers several advantages for customers, such as saving time, reducing travel expenses, accessing product discounts, 24/7 availability, avoiding queues, escaping crowds in the stores, easily accessing the list of desired products with their specifications, and making informed purchasing decisions. However, online shopping also has some disadvantages, such as delayed delivery, the inability to physically inspect the product, delivery costs, and the lengthy process of returning the product, which often lead many buyers to prefer offline shopping at the nearest physical store. Naturally, both types of shopping, online and offline, require the evaluation of shopping trips. Therefore, it is necessary to predict the type of shopping trip that can be made by the customer (offline trips) or the companies offering online products (online trips) by measuring the factors influencing the customer’s choice between online and offline shopping. This demonstrates the importance of, and the need for, conducting this research using methods based on artificial intelligence and machine learning.
The travel decision-making process is of great importance in transportation planning, and its application, informed by effective information and detailed analysis, can serve as a predictive indicator for future development. The significance of travel production within transportation demand management has led to extensive studies for different travel purposes. Daily trips undertaken by residents serve purposes such as work, shopping, education, and recreation. According to the studies conducted in Tehran, approximately 15% to 18% of trips are for shopping purposes [4], which shows the importance of this type of trip, second only to work trips, for the studies in question. By identifying and prioritizing the factors influencing the occurrence of online and offline shopping trips, we can significantly contribute to the reduction of transportation costs, pollutant emissions, and urban traffic congestion, and to enhanced user satisfaction and sustainable development. According to surveys conducted in 24 countries including India, China, the USA, Germany, and Japan, on average only 10% of daily shopping trips are made for online shopping and 90% are made for offline shopping [4].
Tehran is the most populous city and the capital of Iran, with a population of over 9 million people. According to the 2018 estimate by the United Nations, it ranks as the 34th most populous city in the world and the most populous city in West Asia. Tehran metropolis is the second most populated metropolis in the Middle East. Due to the specific style of modern and traditional life in Tehran, the types of shopping trips in this city are diverse [5].
Table 1 shows the number of online and offline shopping trips in one day in Tehran [4].
Due to their high computational efficiency, deep learning algorithms are widely used in various fields of urban transportation, such as traffic monitoring, accident avoidance, traffic intersections, autonomous vehicles, and intelligent transportation systems, and have attracted the attention of experts and specialists in this field [6].
The type of shopping trip is an important factor that varies depending on whether the shopping is conducted online or offline. In the context of online shopping, the logistics system plays a crucial role in the product supply chain of Internet companies. The logistics system encompasses the transfer, movement, processing, and access to logistics information, facilitating the integration of transportation, ordering, manufacturing processes, order changes, production scheduling, logistics planning, and warehousing operations. In the case of offline shopping, the choice of travel mode and the transportation system used by customers have economic and environmental implications. Therefore, estimating the type of travel is the main topic of this paper. Due to the large volume of data associated with shopping trips in both online and offline shopping, it is necessary to adopt new methods based on artificial intelligence technologies and machine learning algorithms. The main objective of this paper is to use a machine learning approach, specifically deep learning, to develop a travel prediction model for online and offline shopping in Tehran, after identifying the factors that influence trip creation. The research questions that guide this paper are as follows:
What are the main factors that influence the type of shopping trip in Tehran?
How can a deep learning approach improve the accuracy and reliability of estimating the type of shopping trip in Tehran?
How does the proposed model compare with other methods for estimating the type of shopping trip in Tehran?
This paper is structured as follows. Section 2 provides a review of the literature and previous research on estimating the type of shopping trip. Section 3 describes the research method, which comprises a deep learning approach and its associated steps. Section 4 presents the results obtained by applying the proposed method and compares them with other methods. Section 5 discusses the findings, and Section 6 concludes the paper by summarizing the main contributions and findings and suggesting future work directions.
This paper makes the following contributions:
A cost-effective method for accurately estimating the type of shopping trip.
An effective alternative method for transportation decision-making and urban traffic resource allocation.
An integration of decision models and machine learning methods to improve the travel type estimation system.
A CNN-based approach to estimating shopping trips.
3. Methodology
This research uses machine learning methods for analysis. Feature selection is performed with supervised machine learning algorithms, namely deep networks. Figure 1 shows the general procedure of the proposed method as a flowchart.
In the first step, we collected 500 questionnaires on online shopping trips via text message and 500 questionnaires on offline shopping trips, which were placed in shopping centers in districts 2 and 5 of the Tehran metropolis, with the number of questionnaires determined from the data frequency and calculations using Cochran’s formula. The statistical population of this research consisted of 1000 active e-commerce users residing in districts 2 and 5 of Tehran who had successfully placed orders in the last 20 days of 2021 in online and offline services; we therefore used purposive sampling. All of these questionnaires included age, gender, marital status, car ownership, delivery cost, delivery time, product price, income, employment status, and level of education as the factors affecting the shopping trip. The questionnaire and additional information about it are available in Appendix A. Since the deep learning method requires numerical data, we converted the values obtained from the questionnaires into quantitative values using the coding in Table 3, so that they could be used as input for the deep network.
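The sample-size step can be illustrated with a minimal sketch of Cochran's formula with the finite-population correction. The paper does not report the population size, confidence level, or margin of error it used, so the default values below (z = 1.96, p = 0.5, a 5% margin) and the function name `cochran_sample_size` are assumptions chosen only for illustration.

```python
import math


def cochran_sample_size(population, z=1.96, p=0.5, margin=0.05):
    """Minimum sample size from Cochran's formula.

    All defaults are illustrative assumptions, not values from the paper:
    z is the normal quantile for the confidence level, p the assumed
    proportion, and margin the tolerated sampling error.
    """
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Finite-population correction.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)
```

Under these assumed defaults, the required minimum stays below 400 even for very large populations, so a survey of 1000 respondents would exceed it.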
In this paper, we used Cronbach’s alpha, given in Equation (1), to estimate the reliability of the questionnaire [18]. We distributed 100 questionnaires to a random sample in a pre-test and calculated the Cronbach’s alpha coefficient using the SPSS 22 software.
$$\alpha = \frac{K}{K-1}\left(1 - \frac{\sum_{i=1}^{K} \sigma_i^2}{\sigma_T^2}\right) \qquad (1)$$
where $\alpha$ is Cronbach’s alpha coefficient, $K$ is the number of questionnaire questions, $\sigma_i^2$ is the variance of the i-th question, and $\sigma_T^2$ is the total variance of the test. The alpha coefficient indicates the extent to which the questions are consistent and the respondents answered them with accuracy and knowledge. For research purposes, a reliability between 0.6 and 0.8 is considered appropriate. A questionnaire is reliable when the value of Cronbach’s alpha is greater than 0.7, and the closer this value is to 1, the higher the reliability of the questionnaire [19]. The reliability results for the questionnaire are provided in Table 4.
Table 4 shows that the Cronbach’s alpha coefficient for all indicators is above 0.7, indicating the reliability of the questionnaire.
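As a hedged illustration of how the coefficient is computed (the paper itself used SPSS 22), the sketch below implements the standard Cronbach's alpha definition: K/(K-1) times one minus the ratio of the summed item variances to the variance of the total scores. The function name `cronbach_alpha` and the toy responses in the usage note are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents.

    Each respondent is a list of K numeric item scores. Sample variances
    are used for both the items and the total score.
    """
    k = len(responses[0])
    # Transpose so that each tuple holds all answers to one question.
    items = list(zip(*responses))
    item_variance_sum = sum(variance(column) for column in items)
    # Variance of each respondent's total score across all questions.
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)
```

For example, `cronbach_alpha([[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 3, 4]])` evaluates to 0.875, which would pass the 0.7 threshold described above.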
After converting the qualitative data into quantitative data, we organized the data sets for input into the deep neural network. Then, we performed data preprocessing and determined the deep network architecture. We prepared the data for both the training and testing stages and presented the results. We sorted and labeled the data, and finally, we examined it. Table 5 shows a sample of the labeled data.
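The conversion and labeling step can be sketched as a simple lookup from questionnaire answers to integer codes. The code book below is hypothetical: the actual coding is the one in the paper's Table 3, which is not reproduced here, and the field names, category names, and label values are assumptions made for illustration.

```python
# Hypothetical code book; the paper's real coding is given in its Table 3.
CODEBOOK = {
    "gender": {"female": 0, "male": 1},
    "marital_status": {"single": 0, "married": 1},
    "car_ownership": {"no": 0, "yes": 1},
}

# Hypothetical class labels for the two trip types.
LABELS = {"offline": 0, "online": 1}


def encode_response(response):
    """Turn one questionnaire response (a dict of strings) into
    a numeric feature vector and a class label."""
    features = [CODEBOOK[field][response[field]] for field in CODEBOOK]
    label = LABELS[response["trip_type"]]
    return features, label
```

A response such as `{"gender": "male", "marital_status": "single", "car_ownership": "no", "trip_type": "online"}` would become the pair `([1, 0, 0], 1)` under this assumed coding.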
4. Results
The first step is to present the descriptive statistics of the statistical population. Table 6 shows this information.
Next, we discuss the estimation of shopping-oriented trip modes using deep learning. In this research, we use a convolutional neural network (CNN) as the main algorithm. Within the fully connected layer, we obtain the feature vector (using the activation command in MATLAB R2022a) and use it as a deep feature. We use the stochastic gradient descent (SGD) algorithm to train the CNN. Table 7 shows the parameters used for the SGD algorithm. We also set the number of training epochs to 40 in the network.
The model’s performance depends on how the data are divided into training and testing sets. In preprocessing, we used 70% of the data for training and 30% for testing. Because the system randomly selects the training and testing data in each run, the results may differ slightly between runs, but the difference is insignificant. The reported results are the best outcomes of 15 runs of the neural network and are therefore directly related to the data selection and the system implementation. Upon executing the CNN model, we found that it achieved an F1-score of 96.15 for estimating the shopping trip mode.
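The evaluation protocol described above (a random 70/30 split, repeated over 15 runs) can be sketched as follows. This is an illustrative outline rather than the authors' MATLAB implementation: `evaluate` is a hypothetical callback that trains a model on the training rows and returns a score on the test rows, and using the run index as the random seed is an assumption.

```python
import random


def split_70_30(rows, seed):
    """Shuffle a copy of the rows and cut it into 70% train, 30% test."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = len(shuffled) * 7 // 10
    return shuffled[:cut], shuffled[cut:]


def repeated_runs(rows, evaluate, runs=15):
    """Run the split-and-evaluate cycle several times.

    evaluate(train, test) is a hypothetical callback returning a score.
    Returns the best score, the mean score, and all individual scores.
    """
    scores = []
    for seed in range(runs):
        train, test = split_70_30(rows, seed)
        scores.append(evaluate(train, test))
    return max(scores), sum(scores) / len(scores), scores
```

Returning the mean alongside the best score makes it easy to check how much the run-to-run variation actually amounts to.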
Figure 2 displays the correlation plot of the proposed CNN network, showcasing the distribution of training and testing data. As seen, the correlation of the results was R = 0.91934.
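The paper does not state how R is computed. Assuming it is the Pearson correlation coefficient between the network outputs and the target labels, as is common in regression-style plots, a minimal sketch is given below; the function name `pearson_r` and the toy values in the usage note are hypothetical.

```python
import math


def pearson_r(targets, outputs):
    """Pearson correlation between target values and model outputs."""
    n = len(targets)
    mean_t = sum(targets) / n
    mean_o = sum(outputs) / n
    # Sum of co-deviations and the two sums of squared deviations.
    covariance = sum((t - mean_t) * (o - mean_o) for t, o in zip(targets, outputs))
    spread_t = math.sqrt(sum((t - mean_t) ** 2 for t in targets))
    spread_o = math.sqrt(sum((o - mean_o) ** 2 for o in outputs))
    return covariance / (spread_t * spread_o)
```

For binary targets such as `[0, 1, 0, 1]` and outputs such as `[0.1, 0.9, 0.2, 0.8]`, the coefficient is close to but below 1, reflecting small deviations of the outputs from the labels.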
In the next step, we compared the results obtained by the deep learning algorithm with those obtained by other models, including K-nearest neighbor (KNN), decision tree (DT), multi-layer perceptron (MLP) neural network, and long short-term memory (LSTM). For the comparison, we used accuracy, precision, recall, F1-score, mean squared error (MSE), and root mean squared error (RMSE), as given in Equations (2) to (7) [20,21]. Table 8 and Figure 3 show the results of this comparison.
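The six comparison measures can be computed from the true and predicted labels as in the sketch below. It uses the standard textbook definitions for a binary problem, which are assumed to match the paper's Equations (2) to (7); treating label 1 as the positive class and computing MSE on the 0/1 labels are assumptions made for illustration.

```python
import math


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, F1, MSE, and RMSE for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    mse = sum((t - p) ** 2 for t, p in pairs) / len(pairs)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mse": mse,
        "rmse": math.sqrt(mse),
    }
```

Calling the function once per model on the same test split gives directly comparable rows for a table such as Table 8.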
The results in Table 8 and Figure 3 show that the deep model (CNN) is more efficient than the DT, MLP, KNN, and LSTM algorithms for estimating both online and offline shopping trips.
5. Discussion
In today’s economic world, accurate and timely information is invaluable for business owners, investors, creditors, and other stakeholders when making financial decisions. With the development of technology, simple models for predicting customer behavior and shopping trip modes have become available to all industries and manufacturing companies. Simple yet powerful tools for predicting shopping trips can help owners prevent bankruptcy and take the necessary measures to improve the company’s condition based on whether customers purchase. Predicting customer behavior and shopping trip modes is of paramount importance for decision-making in industry, given the effects and consequences of this phenomenon at both the micro and macro levels of society. Various tools and models exist, each differing in its method or predictor variables. Any type of shopping trip, whether online or offline, also relies on a transportation or logistics system: a private car or public transport for offline shopping, and a logistics fleet that delivers the product for online shopping. The shopping trip mode therefore matters for both types of shopping. The logistics system, which encompasses the transmission, movement, processing, and access to logistics information for the seamless integration of transportation, ordering, manufacturing processes, order changes, production scheduling, logistics planning, and warehousing operations, is the most important part of a company’s supply chain. In offline shopping trips, on the other hand, the customers’ choice of travel mode and transportation system involves environmental as well as economic issues. For these reasons, estimating the type of travel is the main topic of this paper.
Due to the large amount of data related to shopping-oriented trips that exist in both online and offline shopping, it is necessary to adopt new methods based on artificial intelligence and computing technologies.
6. Conclusions and Future Work
In this research, we used machine learning techniques, specifically deep learning, to evaluate the data. Considering the data frequency in districts 2 and 5 of the Tehran metropolis and calculations based on Cochran’s formula, we distributed 1500 questionnaires to the people of these districts. We collected 1000 questionnaires from 1000 active e-commerce users living in districts 2 and 5 of Tehran who had successful orders in online and offline services in the last 20 days of 2021. The descriptive statistics of the respondents showed that the largest share of people in the statistical population were single men aged 18–35 years who did not own a car, held a bachelor’s degree, and had an income level of 10–15. Most of these people were full-time employees. Based on a review of the literature and consultation with experts, we used age, gender, marital status, car ownership, delivery cost, delivery time, product price, income, employment status, and education level as indicators affecting the type of shopping trip. After determining the optimal architecture of the deep network, we evaluated the results and estimated the travel mode. To compare the proposed method with other methods, we used the MLP, LSTM, DT, and KNN algorithms. The results showed that the deep model had the best performance, with an accuracy of 95.73%. The LSTM algorithm ranked second, with an accuracy of 94.04%. The proposed approach therefore improves on the accuracy of LSTM in estimating shopping-based trips by 1.69 percentage points.
In future work, we will try to use new meta-heuristic approaches such as the gray wolf algorithm to tune the hyperparameters of the CNN algorithm. We will also perform feature dimension reduction together with the feature selection carried out by the CNN model to improve accuracy and run time.