Intelligent Feature Selection Ensemble Model for Price Prediction in Real Estate Markets
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The manuscript effectively emphasizes the importance of accurate real estate price forecasting, referencing historical crises (2007–2008) and contemporary economic needs. It evaluates multiple ensemble models with and without feature selection, offering a broad view of how each algorithm performs in high-dimensional settings, and it systematically compares RFE, Random Forest–based selection, and Boruta, shedding light on the trade-offs between accuracy and computational savings. The paper employs well-established regression metrics (MAE, MSE, RMSE, R²), occasionally supplemented by MAPE and Explained Variance, providing detailed insights into model performance. The conclusions section rightly points to further avenues for research (diverse datasets, hyperparameter optimization, hybrid ensemble methods), underlining awareness of the field’s ongoing developments. However, several issues need to be considered:
- Repetitive references to the importance of real estate and repeated citations of the same conceptual points dilute the core message.
- The literature review, while wide-ranging, could be restructured to compare and contrast studies more cohesively, highlighting gaps more directly.
- The manuscript notes which models degrade or improve in accuracy after feature selection but offers little in-depth explanation of why certain algorithms are more sensitive to dimension reduction.
- The results do not explicitly connect improvements in prediction error to tangible outcomes for real estate professionals, policymakers, or financial institutions.
- Future studies are encouraged to explore automated tuning, but the current manuscript does not clarify whether or how hyperparameters were optimized. This can be critical for ensemble performance.
- In the Literature Review, please group studies by key themes (e.g., boosting vs. bagging, importance of feature selection) and specifically highlight their limitations, so the manuscript’s novelty stands out more clearly.
- Incorporate short discussions or case scenarios on how improved accuracy could benefit end-users like banks or urban planners. More explicit discussion of economic or business implications would help practitioners appreciate the full utility of the findings.
- If possible, include approximate training times or computational resource usage to substantiate claims of “improved scalability.”
- Demonstrate how fewer features can help domain experts improve transparency and interpretability in real estate valuations.
- In addition to generic references to hyperparameter tuning, specify which methods might be most fruitful.
- For hybrid ensembles, propose a few candidate architectures (e.g., stacking XGBoost with Random Forest, or ensembling bagged and boosted models) and explain how they might be systematically compared (see the sketch after this list).
- Although dimensionality reduction is mentioned as an advantage for interpretability and efficiency, concrete examples of how these factors translate into real-world benefits are not clearly shown.
- The figures are not presently clear or of acceptable quality. Furthermore, I advise the authors to ensure consistency and clarity throughout the paper by carefully re-checking every symbol and its meaning in all equations.
- The manuscript does not clearly demonstrate how it significantly advances existing knowledge. The literature review is broad and includes a large number of references that are only tangentially relevant. This weakens the scientific focus and originality of the manuscript.
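Regarding the hybrid-ensemble suggestion above, one candidate architecture would be to stack a bagged and a boosted learner under a simple meta-learner. The following is a minimal sketch only, assuming a scikit-learn/XGBoost workflow in which X and y are the preprocessed Ames Housing feature matrix and sale prices; all hyperparameter values are illustrative, not the authors' settings.

```python
# Sketch: stacked hybrid ensemble combining a bagged (Random Forest) and a
# boosted (XGBoost) base learner via a Ridge meta-learner on out-of-fold predictions.
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor

stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=300, random_state=42)),
        ("xgb", XGBRegressor(n_estimators=300, learning_rate=0.05, random_state=42)),
    ],
    final_estimator=Ridge(alpha=1.0),
    cv=5,  # out-of-fold predictions feed the meta-learner
)

# X, y: preprocessed Ames Housing features and sale prices (placeholders)
scores = cross_val_score(stack, X, y, cv=5, scoring="neg_mean_absolute_error")
print("CV MAE:", -scores.mean())
```

Competing hybrids (e.g., a plain average of bagged and boosted predictions) could be compared against this stack using the same cross-validation protocol and error metrics.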
Overall, this manuscript makes a valuable contribution to real estate price modeling by illustrating how various ensemble methods perform under different feature-selection frameworks. By refining the structure (especially in the abstract, introduction, and literature review) and expanding on interpretability, runtime data, and domain-specific impacts, the paper would become even more compelling for both academic and professional audiences.
Comments on the Quality of English Language
The manuscript requires improvements in writing style, structure, and formatting.
Author Response
Comments in attached file
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
I only have a few minor comments and clarifications that I believe will help improve the quality of the paper.
[Section 4, Tables 1 & 2]
It would be helpful to explain how the “best” model was selected. Although Stacking tends to show the lowest MAE and highest R², it’s worth discussing the role of model complexity and how such choices translate to real-world applications.
[Section 4]
The drop in performance with RFE-based feature selection across most models needs further explanation. Were the excluded features too weak or perhaps not relevant enough? A brief interpretation of this could provide more clarity.
Boruta appears to outperform both RFE and RF in terms of maintaining accuracy, as shown in Table 3. Even though Boruta performs better, the paper doesn’t explain why. A quick note on what might give it an edge in this case would be helpful.
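A short diagnostic could make this comparison concrete by listing which features RFE discards but Boruta retains. The sketch below is illustrative only; it assumes the BorutaPy package, a pandas training matrix X_train with target y_train, and a feature_names list, all of which are placeholders rather than the authors' actual pipeline.

```python
# Sketch: compare the feature subsets kept by RFE and Boruta (illustrative).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from boruta import BorutaPy

rf = RandomForestRegressor(n_estimators=200, random_state=42)

# RFE keeps a fixed number of features ranked by the estimator.
rfe = RFE(rf, n_features_to_select=30).fit(X_train, y_train)
rfe_kept = set(np.array(feature_names)[rfe.support_])

# Boruta keeps every feature that beats its shuffled "shadow" copy.
boruta = BorutaPy(rf, n_estimators="auto", random_state=42)
boruta.fit(X_train.values, y_train.values)
boruta_kept = set(np.array(feature_names)[boruta.support_])

print("Kept by Boruta but dropped by RFE:", boruta_kept - rfe_kept)
```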
[Model Training]
The training section doesn’t mention whether hyperparameters were tuned or defaults were used. It would help to briefly clarify which approach was taken and why.
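If tuning was in fact performed, even a brief randomized search would be worth reporting. The sketch below shows the kind of procedure that would resolve the ambiguity; the parameter grid and model are hypothetical examples, not the paper's settings.

```python
# Sketch: randomized hyperparameter search for a boosted model
# (grid values are illustrative, not taken from the manuscript).
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV

param_dist = {
    "n_estimators": [200, 400, 800],
    "learning_rate": [0.01, 0.05, 0.1],
    "max_depth": [2, 3, 4],
}
search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=42),
    param_distributions=param_dist,
    n_iter=20,
    cv=5,
    scoring="neg_root_mean_squared_error",
    random_state=42,
)
search.fit(X_train, y_train)  # X_train, y_train: training split (placeholders)
print(search.best_params_, -search.best_score_)
```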
[Section 2: Literature Review]
The literature review is broad but could benefit from more focus. Please include a concise summary table listing past works, the algorithms used, datasets, and performance metrics for comparison with your results.
[Page 6–7, Selection Methods]
If these formulas influenced decisions in the analysis, a short explanation of how they were used would be useful.
[Page 16, Conclusions]
The conclusions feel mostly descriptive. Adding a few concrete insights—like which models offer better interpretability or are more practical in low-resource environments—would make this section more useful.
Parts of the text, such as Paragraphs 2–4 on Page 3, include unnecessary detail. Streamlining these sections and cutting repetition would help maintain reader interest and improve flow.
[Page 5–6, Pre-processing]
The authors mention normalization using several techniques (StandardScaler, MinMaxScaler, PowerTransformer). Which was ultimately applied in the models? Please clarify and justify your choice.
The authors mention that “no new data were created or analyzed,” yet the study uses the Ames Housing dataset. Please revise the Data Availability Statement to reflect this publicly available dataset and provide the direct link.
[Page 13–14, Discussion of Trade-Offs]
The trade-off between accuracy and computational efficiency is discussed, but no concrete runtime or resource comparisons are given. Please consider adding a table or comment on actual runtime or memory usage for full vs. reduced models.
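A lightweight wall-clock measurement would suffice. The sketch below is one way to obtain such numbers; the model choice and the selected_cols feature list are placeholders, not the authors' configuration.

```python
# Sketch: wall-clock training time for full vs. reduced feature sets.
import time
from sklearn.ensemble import RandomForestRegressor

def train_time(X, y):
    model = RandomForestRegressor(n_estimators=300, random_state=42)
    start = time.perf_counter()
    model.fit(X, y)
    return time.perf_counter() - start

# selected_cols: features kept by RFE / RF importance / Boruta (placeholder)
t_full = train_time(X_train, y_train)
t_reduced = train_time(X_train[selected_cols], y_train)
print(f"full: {t_full:.1f}s  reduced: {t_reduced:.1f}s")
```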
[Ensemble Methods Description]
The paper provides mathematical expressions for each ensemble algorithm, but lacks implementation details. Please specify the base learners used (e.g., decision trees of what depth?) and whether homogeneous or heterogeneous learners were applied in stacking.
The feature selection step appears sequential and independent of model training. Consider discussing the potential benefit of embedding feature selection within model tuning (e.g., via recursive CV or pipeline optimization).
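One way to do this with standard tooling is to make feature selection a pipeline step, so it is re-fitted inside every cross-validation fold and tuned jointly with the model. The sketch below assumes scikit-learn pipelines and an illustrative parameter grid; it is not a description of the authors' implementation.

```python
# Sketch: feature selection embedded in a tuned pipeline, so it is
# re-fitted within each CV fold rather than applied once up front.
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_selection import RFECV
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", RFECV(GradientBoostingRegressor(random_state=42), step=5, cv=3)),
    ("model", GradientBoostingRegressor(random_state=42)),
])

grid = GridSearchCV(
    pipe,
    param_grid={"model__learning_rate": [0.05, 0.1], "model__n_estimators": [300, 600]},
    cv=5,
    scoring="neg_mean_absolute_error",
)
grid.fit(X_train, y_train)  # X_train, y_train: training split (placeholders)
```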
[Tables 1–3]
The models with fewer features (RFE, RF, Boruta) are compared using error metrics, but statistical tests (e.g., paired t-tests or Wilcoxon signed-rank tests) to assess significance of performance differences are not included. Please consider including such comparisons.
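For example, per-observation absolute errors on the held-out set from the full-feature and reduced-feature models can be compared with a paired test, as in the sketch below (variable names such as pred_full and pred_boruta are placeholders).

```python
# Sketch: Wilcoxon signed-rank test on paired absolute test-set errors
# of the full-feature model vs. a reduced-feature model.
import numpy as np
from scipy.stats import wilcoxon

err_full = np.abs(y_test - pred_full)       # predictions using all features
err_reduced = np.abs(y_test - pred_boruta)  # predictions using the Boruta subset

stat, p_value = wilcoxon(err_full, err_reduced)
print(f"Wilcoxon statistic={stat:.1f}, p={p_value:.4f}")
```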
[Model Evaluation]
MAPE is listed as one of the metrics but is not reported in the results tables. Please either include MAPE in the results or remove it from the methodology section for consistency.
Authors mention a 70/30 split, but don’t specify whether the feature selection and scaling were fitted only on the training data and then applied to the test set. Please clarify to ensure no data leakage occurred.
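For clarity, the leakage-free order of operations is: split first, then fit the scaler and the feature selector on the training portion only and apply the fitted transformers to the test portion. A minimal sketch with placeholder names:

```python
# Sketch: fit scaling and feature selection on the training split only,
# then apply the fitted transformers to the test split (no data leakage).
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)

scaler = StandardScaler().fit(X_train)             # fitted on train only
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

selector = RFE(RandomForestRegressor(random_state=42), n_features_to_select=30)
selector.fit(X_train_s, y_train)                   # fitted on train only
X_train_sel, X_test_sel = selector.transform(X_train_s), selector.transform(X_test_s)
```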
Author Response
Comments in attached file
Author Response File: Author Response.pdf