Application of Machine Learning Techniques to Predict the Price of Pre-Owned Cars in Bangladesh
Abstract
1. Introduction
1.1. Used Car Market in Bangladesh
1.2. Applicability of Machine Learning in Predicting the Price of a Pre-Owned Car
1.3. Research Goal
- To collect a dataset of pre-owned cars and to identify the prominent features that can be used to predict their price.
- To reliably predict the price of pre-owned cars using machine learning approaches.
- To deploy the model as a web application on a local machine so that it can later be made available to end users.
2. Related Work
3. Research Methodology
3.1. Data Acquisition
3.2. Pre-Processing
3.3. Exploratory Data Analysis
- If transmission is considered, cars with automatic transmission have a higher average price than cars with manual transmission.
- If body_type is considered, SUV has the highest average price. It is followed by MPV, Saloon, Estate, and Hatchback.
- If fuel_type is considered, Hybrid cars have the highest average price.
- Newer cars have a higher price for every body_type.
- Within each body_type, cars with automatic transmission have a higher price than cars with manual transmission.
- Some outliers are present for Saloon and SUV cars.
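Group-wise comparisons like the ones listed above can be reproduced with pandas `groupby`. The following is a minimal sketch, not the authors' notebook: the column names follow the variable table of this article, while the rows are invented purely for illustration.

```python
import pandas as pd

# Toy rows for illustration only; the values are invented, not taken from the dataset.
cars = pd.DataFrame({
    "transmission": ["Automatic", "Manual", "Automatic", "Manual", "Automatic", "Manual"],
    "body_type": ["SUV", "SUV", "Saloon", "Saloon", "Hatchback", "Hatchback"],
    "price": [4_000_000, 2_500_000, 1_800_000, 900_000, 1_000_000, 600_000],
})

# Average price per transmission type, highest first.
avg_by_transmission = (
    cars.groupby("transmission")["price"].mean().sort_values(ascending=False)
)

# Average price per body type, highest first.
avg_by_body = cars.groupby("body_type")["price"].mean().sort_values(ascending=False)

# Average price for every (body_type, transmission) pair as a small pivot table.
avg_by_body_and_transmission = cars.pivot_table(
    index="body_type", columns="transmission", values="price", aggfunc="mean"
)

print(avg_by_transmission)
print(avg_by_body)
print(avg_by_body_and_transmission)
```

Sorting the grouped means makes the ranking of categories directly readable, and the pivot table places the automatic and manual averages side by side for each body type.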
3.4. Further Data Cleaning and Removing Outliers
- Step 2: All data outside the range are treated as outliers.
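The step that defines the range is not reproduced in this section, so the sketch below assumes one common choice, the interquartile-range (IQR) fences with a multiplier of 1.5. Both the rule and the helper name `remove_outliers_iqr` are assumptions made for illustration; the rows are invented.

```python
import pandas as pd


def remove_outliers_iqr(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Keep only rows whose `column` value lies inside [Q1 - k*IQR, Q3 + k*IQR].

    The IQR rule is an assumption here; the article's own range may differ.
    """
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # `between` is inclusive on both ends by default.
    return df[df[column].between(lower, upper)]


# Invented prices (in taka) with one extreme value.
prices = pd.DataFrame({"price": [500_000, 600_000, 650_000, 700_000, 800_000, 9_000_000]})
cleaned = remove_outliers_iqr(prices, "price")
print(len(prices), "->", len(cleaned))
```

Any other definition of the range can be substituted by changing how `lower` and `upper` are computed; the filtering step stays the same.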
3.5. Data Encoding
3.6. Feature Selection
Algorithm 1. Algorithm for finding correlated features.
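The body of Algorithm 1 is not available in this section. As a rough illustration of how correlated features are often found, the sketch below flags every feature whose absolute Pearson correlation with an earlier feature exceeds a threshold. The threshold of 0.8, the function name, and the `car_age` column are hypothetical choices for this example and are not taken from the article.

```python
import pandas as pd


def find_correlated_features(features: pd.DataFrame, threshold: float = 0.8) -> set:
    """Return columns whose |Pearson r| with an earlier column exceeds `threshold`."""
    corr = features.corr(method="pearson").abs()
    correlated = set()
    columns = corr.columns
    for i in range(len(columns)):
        for j in range(i):  # compare only with columns that come earlier
            if corr.iloc[i, j] > threshold:
                correlated.add(columns[i])
    return correlated


# Invented rows; `car_age` is a hypothetical column derived from model_year.
toy = pd.DataFrame({
    "model_year": [2010, 2012, 2014, 2016, 2018],
    "car_age": [11, 9, 7, 5, 3],
    "engine_capacity": [1500, 1300, 2000, 1500, 1800],
})
redundant = find_correlated_features(toy)
print(redundant)
```

Only the later column of each highly correlated pair is flagged, so one representative of every correlated group is always retained.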
3.7. Data Splitting
3.8. Data Scaling
3.9. Regressors Used
3.9.1. Linear Regression
3.9.2. LASSO Regression
3.9.3. Decision Tree
3.9.4. Random Forest
3.9.5. Extreme Gradient Boosting
4. Results
Investigation of the XGBoost Model on Price Estimation
5. Deployment of the Model
- The XGBoost model is trained on the pre-processed data using the fine-tuned hyperparameters. This model and the fitted min–max scaler are then saved.
- The inputs collected from the HTML web page are passed to the Python Flask API via a POST request.
- The saved model and the fitted min–max scaler are loaded.
- Within the API, the inputs are processed and scaled with the loaded min–max scaler.
- The processed inputs are passed to the loaded model, which generates a prediction that is displayed on another HTML web page.
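The steps above can be sketched as follows. This is a simplified illustration under several assumptions, not the deployed application: a scikit-learn `GradientBoostingRegressor` stands in for the tuned XGBoost model so that the example depends only on scikit-learn, the feature subset in `FEATURES`, the `/predict` route, the pickle file layout, and the training rows are all hypothetical.

```python
import pickle

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor  # stand-in for the XGBoost model
from sklearn.preprocessing import MinMaxScaler

# Hypothetical subset of the input fields; the real form may contain more.
FEATURES = ["model_year", "engine_capacity", "kilometers_run"]


def train_and_save(path: str) -> None:
    """Step 1: fit the scaler and the model, then save both to disk."""
    # Invented training rows, for illustration only.
    X = np.array(
        [[2010, 1500, 90000], [2015, 1500, 50000], [2018, 2000, 20000], [2012, 1300, 70000]],
        dtype=float,
    )
    y = np.array([900_000, 1_500_000, 3_000_000, 1_000_000], dtype=float)
    scaler = MinMaxScaler().fit(X)
    model = GradientBoostingRegressor(random_state=0).fit(scaler.transform(X), y)
    with open(path, "wb") as fh:
        pickle.dump({"scaler": scaler, "model": model}, fh)


def predict_price(path: str, form) -> float:
    """Steps 3-5: load the saved objects, scale the inputs, and predict."""
    with open(path, "rb") as fh:
        saved = pickle.load(fh)
    row = np.array([[float(form[name]) for name in FEATURES]])
    scaled = saved["scaler"].transform(row)
    return float(saved["model"].predict(scaled)[0])


def create_app(path: str):
    """Step 2: a Flask app whose POST route receives the HTML form inputs."""
    from flask import Flask, request  # imported lazily; the helpers above do not need Flask

    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        price = predict_price(path, request.form)
        return f"<p>Estimated price: {price:,.0f} taka</p>"

    return app
```

Saving the fitted scaler together with the model matters because the inputs at prediction time must be scaled with the same minimum and maximum values that were learned from the training data.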
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Variable Name | Description |
---|---|
car_name | Name of the car |
brand | Car brand |
car_model | Model of the car |
model_year | Model year |
transmission | Transmission (automatic or manual) |
body_type | Body type |
fuel_type | Fuel type |
engine_capacity | Capacity of the engine (in cc) |
kilometers_run | Kilometers run by the car |
price | Price of the car (in taka) |
Feature Name | Unique Values |
---|---|
car_name | 1108 |
brand | 26 |
car_model | 123 |
model_year | 35 |
transmission | 2 |
body_type | 7 |
fuel_type | 24 |
engine_capacity | 51 |
kilometers_run | 640 |
Body_Type | Value Counts |
---|---|
Saloon | 606 |
MPV | 193 |
SUV/4 × 4 | 183 |
Estate | 77 |
Hatchback | 76 |
Convertible | 2 |
Fuel_Type | Value Counts |
---|---|
CNG, Octane | 391 |
Octane | 246 |
Petrol, Octane | 116 |
Hybrid, Octane | 93 |
Petrol, Hybrid, Octane | 62 |
Hybrid | 55 |
Petrol, CNG, Octane | 52 |
Petrol, CNG | 28 |
Diesel | 24 |
Octane, LPG | 22 |
Petrol | 21 |
CNG | 6 |
Octane, Other fuel type | 4 |
Petrol, Octane, LPG | 3 |
Petrol, Other fuel type | 2 |
LPG | 1 |
Petrol, LPG | 1 |
Petrol, Hybrid, Octane, LPG | 1 |
Petrol, CNG, Octane, LPG | 1 |
CNG, Hybrid | 1 |
Petrol, Hybrid | 1 |
Diesel, Petrol | 1 |
Hybrid, Octane, LPG | 1 |
Petrol, Octane, Other fuel type | 1 |
Fuel_Type | Description | Value Counts |
---|---|---|
CNG and Oil | cars that run on both CNG and Oil | 477 |
Oil | cars that run on oil | 411 |
Hybrid | Hybrid cars | 210 |
LPG and Oil | cars that run on both LPG and oil | 25 |
Model | Parameter Space | Best Parameters |
---|---|---|
Linear Regression | normalize: True, False | normalize: False |
Lasso Regression | alpha: 1, 2 selection: random, cyclic | alpha: 2 selection: random |
Decision Tree | criterion: mse, friedman_mse max_depth: 1–21 splitter: best, random | criterion: mse max_depth: 9 splitter: best |
Random Forest | criterion: mse, friedman_mse, mae n_estimators: 0, 5, 10, … 100 | criterion: friedman_mse n_estimators: 60 |
XGBoost | colsample_bytree: 0–1 criterion: mse, friedman_mse, mae eta: 0.1–0.01 max_depth: 1–6 n_estimators: 0, 5, 10, … 100 | colsample_bytree: 0.8 criterion: mse eta: 0.1 max_depth: 5 n_estimators: 95 |
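A parameter space like the one tabulated above is typically explored with an exhaustive grid search. The sketch below is one possible setup using scikit-learn's `GridSearchCV` for the decision tree row only. The 5-fold cross-validation, the R² scoring, and the synthetic data are assumptions; the `criterion` entry is left out because its accepted names differ between scikit-learn versions.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(60, 3))
y = 2.0 * X[:, 0] + X[:, 1] ** 2 + rng.normal(0, 0.05, size=60)

# Search space mirroring the max_depth and splitter entries of the decision tree row.
param_grid = {
    "max_depth": list(range(1, 22)),
    "splitter": ["best", "random"],
}

search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid,
    cv=5,          # assumed number of folds
    scoring="r2",  # assumed selection metric
)
search.fit(X, y)
best_params = search.best_params_
print(best_params)
```

The same pattern applies to the other regressors by swapping the estimator and the dictionary of candidate values.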
Model | Score (%) | Log RMSE (%) | Log MAE (%) |
---|---|---|---|
Linear Reg. | |||
Lasso Reg. | |||
Decision Tree | |||
Random Forest | |||
XGBoost | 91.32 |
Absolute Percentage Difference | Number of Cars | Number of Overestimation | Number of Underestimation |
---|---|---|---|
0–4 | 30 | 14 | 16 |
4–8 | 35 | 17 | 18 |
8–12 | 30 | 8 | 22 |
12–13 | 5 | 2 | 3 |
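A breakdown of this kind can be computed from paired actual and predicted prices. The sketch below assumes the absolute percentage difference is defined as |predicted − actual| / actual × 100 and that each bin includes its lower edge only; both the definition and the function name `summarize_errors` are assumptions, and the numbers are invented.

```python
import numpy as np


def summarize_errors(actual, predicted, bin_edges=(0, 4, 8, 12, 13)):
    """Count cars, overestimations, and underestimations per percentage-difference bin."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    pct_diff = np.abs(predicted - actual) / actual * 100
    over = predicted > actual
    under = predicted < actual
    rows = []
    for low, high in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (pct_diff >= low) & (pct_diff < high)  # lower edge inclusive
        rows.append({
            "range": f"{low}-{high}",
            "cars": int(in_bin.sum()),
            "overestimated": int((in_bin & over).sum()),
            "underestimated": int((in_bin & under).sum()),
        })
    return rows


# Invented values: differences of 2%, 5%, 10%, and 12.5%.
summary = summarize_errors(actual=[100, 100, 100, 100], predicted=[102, 95, 110, 87.5])
for row in summary:
    print(row)
```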
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Amik, F.R.; Lanard, A.; Ismat, A.; Momen, S. Application of Machine Learning Techniques to Predict the Price of Pre-Owned Cars in Bangladesh. Information 2021, 12, 514. https://doi.org/10.3390/info12120514