Advancing Real-Estate Forecasting: A Novel Approach Using Kolmogorov–Arnold Networks
Abstract
1. Introduction
1.1. Residential Real-Estate Domain: Motivation and Challenges
1.2. The Specific Case of the House Price Prediction Problem
1.3. Scientific Background and the Proposed Approach
1.4. Contribution
- Scientific contribution: This study expands the methodological landscape in regression by presenting Kolmogorov–Arnold networks (KANs) as a robust alternative to traditional models like gradient boosting, ensemble methods, and neural networks such as MLPs, GRUs, and LSTMs. Gradient boosting algorithms like XGBoost and LightGBM, while effective on structured datasets, struggle to capture intricate, nonlinear relationships in high-dimensional data and face scalability and interpretability challenges. Similarly, MLPs require extensive tuning to handle complex, high-dimensional data effectively, while GRUs and LSTMs, optimized for sequential data, are less suitable for non-sequential regression tasks and introduce significant computational overhead. KANs address these limitations by leveraging the Kolmogorov–Arnold representation theorem, which decomposes multivariate functions into sums and compositions of univariate functions. This approach allows KANs to model complex nonlinear relationships efficiently while maintaining interpretability, scalability, and adaptability to high-dimensional data. The spline-based architecture further enhances their ability to generalize across diverse regression tasks, offering an advantage over both traditional and modern machine learning models. By matching or outperforming these methods in accuracy and computational efficiency, KANs demonstrate their potential as a powerful tool for tackling complex regression problems across various domains.
- Economic and social impact: By improving prediction accuracy in housing price estimation, this research contributes to a more transparent and informed real-estate market. Enhanced predictive capabilities help stakeholders make better decisions, optimize investments, and mitigate risks. Furthermore, the insights gained from the study can lead to the development of smarter urban planning and housing policies, ultimately benefiting society by fostering equitable and sustainable growth in the housing sector.
- Broader applicability across industries: While the focus is on house price estimation, the versatile nature of KANs offers potential applications beyond real estate. KANs can be applied in areas such as financial modeling, healthcare predictions, environmental analysis, and supply chain optimization. The study underscores the adaptability of this approach, setting a precedent for its application in diverse regression and prediction tasks.
2. Literature Review and State of the Art
2.1. KAN-Based Approaches
2.2. Literature Review on House Price Prediction
3. Materials and Methods
3.1. Problem Formulation and Proposed Model Description
- Independence: The input features are assumed to be conditionally independent given the target variable.
- Sufficient data: The dataset is assumed to be sufficiently large and diverse to capture the underlying patterns in housing prices.
- Feature engineering: The input features are assumed to be preprocessed (e.g., normalized) to ensure compatibility with the neural network.
- Input Layer: The input layer accepts a vector X of size n, where n corresponds to the number of features. Each feature is standard-scaled to ensure consistent scaling across all inputs.
- Intermediate Layers:
- Three KAN layers are sequentially stacked, each performing a decomposition of the input into univariate components using splines.
- Each KAN layer introduces nonlinear transformations, enabling the model to capture intricate feature interactions.
- The splines are defined by a set number of knots and a specified order, which control the granularity and smoothness of the approximation. The number of knots determines how many polynomial pieces are joined together, while the grid range and epsilon define the bounds and resolution of the decomposition.
- Fully connected layers refine the outputs of the KAN layers.
- Activation functions introduce nonlinearity and enhance the model’s expressiveness.
- Output Layer: The output layer consists of a single neuron with a linear activation function, producing the predicted price.
- Loss Function: The Mean Squared Error (MSE) is used as the loss function.
- Regularization: Dropout together with L1 and L2 regularization is applied to prevent overfitting.
- Optimization Algorithm: The model is trained using the Adam optimizer, which combines the benefits of momentum and adaptive learning rates to ensure efficient convergence.
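As a concrete illustration of the layered decomposition described above, the following is a minimal NumPy sketch of a KAN-style forward pass. It uses piecewise-linear interpolation in place of the higher-order B-splines used in practice, and all dimensions and initializations are illustrative assumptions rather than the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

class KANLayer:
    """Minimal KAN-style layer sketch: each input/output edge carries a
    learnable univariate function, here a piecewise-linear spline sampled
    on a fixed grid. Knot heights are randomly initialized for illustration."""

    def __init__(self, in_dim, out_dim, num_knots=7, grid_range=(-2.0, 2.0)):
        self.grid = np.linspace(*grid_range, num_knots)   # shared knot positions
        # one set of knot heights per (input, output) edge
        self.heights = rng.normal(scale=0.1, size=(in_dim, out_dim, num_knots))

    def forward(self, x):
        # x: (batch, in_dim); output_j = sum_i phi_ij(x_i)
        batch, in_dim = x.shape
        out = np.zeros((batch, self.heights.shape[1]))
        for i in range(in_dim):
            for j in range(self.heights.shape[1]):
                out[:, j] += np.interp(x[:, i], self.grid, self.heights[i, j])
        return out

# Three stacked KAN layers followed by a single linear output neuron,
# mirroring the structure described above (widths are illustrative).
layers = [KANLayer(8, 16), KANLayer(16, 8), KANLayer(8, 4)]
w_out, b_out = rng.normal(size=4), 0.0

x = rng.normal(size=(5, 8))        # batch of 5 standard-scaled samples
h = x
for layer in layers:
    h = layer.forward(h)
price = h @ w_out + b_out          # linear output neuron
print(price.shape)                 # (5,)
```

In a trainable version the knot heights would be optimized by gradient descent (e.g., with Adam, as stated above); only the forward computation is sketched here.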
3.2. Research Methodology
1. Literature Review
2. Data Collection and Preprocessing
3. Model Development and Validation
   - MLP: each neuron computes h = f(Wx + b), where:
     - h is the output of the neuron;
     - W is the weight matrix;
     - x is the input vector;
     - b is the bias vector;
     - f is the activation function (e.g., ReLU or sigmoid).
   - LSTM
   - GRU
   - Regression-Based Models
   - Boosting-Based Models
4. Integration and Testing—Experiments
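The MLP building block above computes h = f(Wx + b); as a minimal sketch (layer sizes and weights are illustrative):

```python
import numpy as np

def relu(z):
    """ReLU activation f(z) = max(0, z)."""
    return np.maximum(0.0, z)

# h = f(Wx + b): one dense layer with a ReLU activation
rng = np.random.default_rng(1)
W = rng.normal(size=(4, 3))   # weight matrix (4 neurons, 3 inputs)
x = rng.normal(size=3)        # input vector
b = np.zeros(4)               # bias vector

h = relu(W @ x + b)           # output of the layer
print(h.shape)                # (4,)
```

A full MLP stacks several such layers, with the activation omitted on the final regression output.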
4. Results
- (a) Exclusion of feature extraction models: No feature extraction methods, such as autoencoders or Boltzmann machines, were applied beforehand.
- (b) Model optimization: The most effective configuration (number of layers, neurons, optimizer, etc.) for each model was identified through hyperparameter tuning, a critical aspect of model development and validation. Hyperparameters define both the structural and operational characteristics of a model and significantly influence its performance. The tuning process combined manual trial-and-error with grid search and involved several key steps. For the learning rate, a range of values (e.g., 0.001 and 0.01) was tested to balance precision in convergence against training time: smaller learning rates can converge more precisely but train more slowly, while larger rates risk overshooting the optimal solution. The architecture was refined iteratively, starting from a simple structure and gradually adding layers and neurons while monitoring validation performance to determine the optimal depth and width. Several activation functions were evaluated for their impact on convergence speed and overall accuracy, with the final choice made on empirical performance. Additional parameters, including batch size and dropout rate, were varied systematically to enhance robustness and mitigate overfitting, with their effects on validation performance monitored to ensure stability and generalization. The optimal hyperparameter settings, summarized in Table 1, were selected based on the best validation performance across multiple runs, ensuring that the model generalizes effectively to unseen data. Table 1 below shows the optimal values for the KAN-based model on the datasets.
4.1. Dataset A—Greece Listings
- location_name (the municipality of the house; categorical feature; 73 unique values; most common: Athens, 21%).
- location_region (the region of the house; categorical feature; possible values: Attiki/Thessaloniki; most common: Attiki, 94%).
- res_type (the type of the property; possible values: building, apartment, etc.; five unique values).
- res_address (secondary location attribute; this stands for the exact neighborhood of the property; categorical feature; 987 unique values).
- res_price (advertised price for the property, in euros. This is the feature that the model has to predict. The mean value is 367,000).
- res_sqr (square meters of the property. The mean value is 169).
- construction_year (the construction year of the property).
- levels (levels for the property, e.g., 1st floor, 2nd floor, etc.).
- bedrooms (the number of bedrooms. The mean is 2.58).
- bathrooms (the number of bathrooms. The mean is 1.48).
- status (the current status for the property; categorical feature; possible values such as ‘good’, ‘renovated’, etc.; eight unique values).
- energyclass (categorical feature; the energy class is from the lowest level (H) to the highest (A+) in the following order: H, Ζ, Ε, Δ, Γ, Β, Β+, A, and A+. There are also three possible values for the energy class: Non-effective, Excluded, and Pending).
- auto_heating (autonomous heating: 1 for Yes, 0 for No).
- solar (solar water heater: 1 for Yes, 0 for No).
- cooling (cooling: 1 for Yes, 0 for No).
- safe_door (safety door: 1 for Yes, 0 for No).
- gas (1 for Yes, 0 for No).
- fireplace (1 for Yes, 0 for No).
- furniture (1 for Yes, 0 for No).
- student (Is it appropriate for students? 1 for Yes, 0 for No).
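The feature list above mixes numeric, binary, and categorical attributes. A minimal sketch of the standard scaling and one-hot encoding steps mentioned in Section 3.1, using toy stand-ins for a few of the listed columns (the synthetic values are assumptions for illustration only):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy stand-ins for three of the numeric features listed above.
res_sqr = rng.normal(169, 60, size=100)        # square meters
bedrooms = rng.integers(1, 6, size=100)
bathrooms = rng.integers(1, 4, size=100)

def standard_scale(col):
    """Standard scaling (zero mean, unit variance), as applied to each input."""
    return (col - col.mean()) / col.std()

X_num = np.column_stack([standard_scale(c.astype(float))
                         for c in (res_sqr, bedrooms, bathrooms)])

# One-hot encoding for a categorical feature such as location_region.
regions = np.array(["Attiki", "Thessaloniki", "Attiki", "Attiki"])
categories = np.unique(regions)                       # sorted unique labels
X_cat = (regions[:, None] == categories).astype(float)
print(X_cat[0])   # [1. 0.]
```

Binary attributes such as solar or fireplace are already 0/1 and need no further encoding.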
4.2. Dataset B—California Housing
- longitude: Represents the longitude coordinate of the district.
- latitude: Represents the latitude coordinate of the district.
- housing_median_age: Represents the median age of the houses in the district. The mean value is 28.64.
- total_rooms: Represents the total number of rooms in the district. The mean is 2643.66.
- total_bedrooms: Represents the total number of bedrooms in the district. The mean is 538.43.
- population: Represents the total population of the district. The mean is 1425.48.
- households: Represents the total number of households in the district. The mean is 499.54.
- median_income: Represents the median income of the households in the district, expressed in tens of thousands of US dollars. The mean value is 3.87.
- median_house_value: Represents the median house value in the district. The mean is 206,855.
- ocean_proximity: Represents the proximity of the district to the ocean. It contains five different categories.
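For reference, the error metrics used to compare models in this section (MAE, RMSE, SMAPE) can be computed as below. Note that SMAPE has several variants; the percentage form with the summed-absolute-values denominator shown here is an assumption, and the paper's exact formula may differ.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def smape(y_true, y_pred):
    """Symmetric Mean Absolute Percentage Error, in percent."""
    return 100 * np.mean(2 * np.abs(y_pred - y_true)
                         / (np.abs(y_true) + np.abs(y_pred)))

# Illustrative predictions on three hypothetical listings.
y_true = np.array([200_000.0, 350_000.0, 125_000.0])
y_pred = np.array([210_000.0, 330_000.0, 130_000.0])
print(round(mae(y_true, y_pred)))    # 11667
```

MAE and RMSE are in the target's units (euros or dollars here), while SMAPE is scale-free, which is why the tables report it as a percentage.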
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Feth, M. Proptech: The real estate industry in transition. In The Routledge Handbook of FinTech; Taylor & Francis Group: Abingdon, UK, 2021; pp. 385–391.
- Siniak, N.; Kauko, T.; Shavrov, S.; Marina, N. The impact of proptech on real estate industry growth. IOP Conf. Ser. Mater. Sci. Eng. 2020, 869, 062041.
- He, X.; Lin, Z.; Liu, Y. Volatility and Liquidity in the Real Estate Market. J. Real Estate Res. 2018, 40, 523–550.
- Braesemann, F.; Baum, A. PropTech: Turning Real Estate Into a Data-Driven Market? SSRN Electron. J. 2020, 1–22.
- Chiang, M.-C.; Sing, T.F.; Wang, L. Interactions Between Housing Market and Stock Market in the United States: A Markov Switching Approach. J. Real Estate Res. 2020, 42, 552–571.
- Adamczyk, T.; Bieda, A. The applicability of time series analysis in real estate valuation. Geomat. Environ. Eng. 2015, 9, 15–25.
- Mora-Garcia, R.-T.; Cespedes-Lopez, M.-F.; Perez-Sanchez, V.R. Housing Price Prediction Using Machine Learning Algorithms in COVID-19 Times. Land 2022, 11, 2100.
- Kundu, A.; Sarkar, A.; Sadhu, A. KANQAS: Kolmogorov-Arnold Network for Quantum Architecture Search. EPJ Quantum Technol. 2024, 11, 76.
- Liu, Z.; Wang, Y.; Vaidya, S.; Ruehle, F.; Halverson, J.; Soljačić, M.; Hou, T.Y.; Tegmark, M. KAN: Kolmogorov-Arnold Networks. arXiv 2024, arXiv:2404.19756.
- Moradi, M.; Panahi, S.; Bollt, E.; Lai, Y.C. Kolmogorov-Arnold Network Autoencoders. arXiv 2024, arXiv:2410.02077.
- Jiang, B.; Wang, Y.; Wang, Q.; Geng, H. A Hybrid KAN-ANN based Model for Interpretable and Enhanced Short-term Load Forecasting. Authorea Prepr. 2024.
- Yang, X.; Wang, X. Kolmogorov-Arnold Transformer. arXiv 2024, arXiv:2409.10594.
- Hollósi, J.; Ballagi, Á.; Kovács, G.; Fischer, S.; Nagy, V. Detection of Bus Driver Mobile Phone Usage Using Kolmogorov-Arnold Networks. Computers 2024, 13, 218.
- Danish, M.U.; Grolinger, K. Kolmogorov–Arnold recurrent network for short term load forecasting across diverse consumers. Energy Rep. 2025, 13, 713–727.
- Gao, Y.; Hu, Z.; Chen, W.-A.; Liu, M.; Ruan, Y. A revolutionary neural network architecture with interpretability and flexibility based on Kolmogorov–Arnold for solar radiation and temperature forecasting. Appl. Energy 2025, 378, 124844.
- Limsombunchai, V.; Gan, C.; Lee, M. House Price Prediction: Hedonic Price Model vs. Artificial Neural Network. Am. J. Appl. Sci. 2004, 1, 193–201.
- Afonso, B.; Melo, L.; Oliveira, W.; Sousa, S.; Berton, L. Housing Prices Prediction with a Deep Learning and Random Forest Ensemble. In Proceedings of the Encontro Nacional de Inteligência Artificial e Computacional (ENIAC), Rio Grande, Brazil, 24 September 2020; pp. 389–400.
- Nouriani, A.; Lemke, L. Vision-based housing price estimation using interior, exterior & satellite images. Intell. Syst. Appl. 2022, 14, 200081.
- Kim, J.; Lee, Y.; Lee, M.-H.; Hong, S.-Y. A Comparative Study of Machine Learning and Spatial Interpolation Methods for Predicting House Prices. Sustainability 2022, 14, 9056.
- Joshi, H.; Swarndeep, S. A Comparative Study on House Price Prediction using Machine Learning. Int. Res. J. Eng. Technol. 2022, 9, 782–788.
- Ragb, H.; Muntaser, A.; Jera, E.; Saide, A.; Elwarfalli, I. Hybrid GRU-LSTM Recurrent Neural Network-Based Model for Real Estate Price Prediction. TechRxiv 2023.
- Dobrovolska, O.; Fenenko, N. Forecasting Trends in the Real Estate Market: Analysis of Relevant Determinants. Financ. Mark. Inst. Risks 2024, 8, 227–253.
- Rey, D.; Neuhäuser, M. Wilcoxon-Signed-Rank Test. In International Encyclopedia of Statistical Science; Springer: Berlin/Heidelberg, Germany, 2011; pp. 1658–1659.
Table 1. Optimal hyperparameter values for the KAN-based model on the datasets.

Parameter | Optimal Values: Dataset A | Optimal Values: Dataset B
---|---|---
Model architecture | 256-128-64-32-1 | 50-40-30-20-1
Optimizer | Adam | Adam
Activation function | ReLU | ReLU
Number of knots | 7 | 7
Spline layer | 9 | 9
Learning rate | 0.001 | 0.01
Batch size | 8 | 8
Model Name | MAE | RMSE | SMAPE (%) |
---|---|---|---|
KAN | 35,861 | 52,158 | 16.53 |
GRU–LSTM | 36,972 | 53,763 | 16.92 |
CatBoost | 37,615 | 54,918 | 17.14 |
XGBoost | 38,157 | 55,711 | 17.41 |
LGB | 38,961 | 55,906 | 17.98 |
MLP | 39,123 | 57,034 | 18.07 |
Random forest | 46,124 | 67,341 | 21.16 |
Linear regression | 46,451 | 68,176 | 21.65 |
Model Name | MAE | RMSE | SMAPE (%) |
---|---|---|---|
XGBoost | 30,625 | 42,477 | 15.81 |
LGB | 30,792 | 42,709 | 15.93 |
KAN | 31,961 | 44,191 | 16.45 |
Random forest | 32,774 | 45,458 | 16.87 |
GRU–LSTM | 33,174 | 46,012 | 16.98 |
MLP | 35,723 | 49,548 | 17.14 |
CatBoost | 36,345 | 50,411 | 18.67 |
Linear regression | 40,452 | 56,107 | 20.26 |
Model Name | MAE | RMSE | SMAPE (%) |
---|---|---|---|
KAN | 34,716 | 53,378 | 17.74 |
XGBoost | 36,102 | 55,533 | 18.31 |
Model Compared | p-Value | Performance Improvement |
---|---|---|
CatBoost | 0.03 (Significant) | 9.19% |
GRU–LSTM | 0.003 (Significant) | 3.28% |
LGB | 0.502 | 2.40% |
Linear regression | 0.005 (Significant) | 27.98% |
MLP | 0.003 (Significant) | 9.31% |
Random forest | 0.0312 (Significant) | 15.62% |
XGBoost | 0.6875 | 1.10% |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Viktoratos, I.; Tsadiras, A. Advancing Real-Estate Forecasting: A Novel Approach Using Kolmogorov–Arnold Networks. Algorithms 2025, 18, 93. https://doi.org/10.3390/a18020093