Maximizing Profitability and Occupancy: An Optimal Pricing Strategy for Airbnb Hosts Using Regression Techniques and Natural Language Processing
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The article is good from a research point of view; however, I suggest a few points as follows.
-The abstract should be expanded to provide a more comprehensive overview of the research, including the objectives, methodology, key findings, and implications.
-Recent research articles from 2024 should be included to ensure the study is up to date with the latest findings and developments in the field.
-The research scope should be clearly defined and elaborated upon, ensuring it is precise and relevant to the study's objectives.
-A comparative study table should be added to present a clear and concise comparison of key aspects, findings, or methodologies from various sources or studies.
Author Response
Thank you for all your comments. We have updated the paper accordingly; the revised draft can be checked in the attached Word document. Thank you.
Author Response File: Author Response.docx
Reviewer 2 Report
Comments and Suggestions for Authors
1. There are many Airbnb datasets available, such as New York City. Why was Rome, Italy alone chosen? Data from multiple cities would improve the research study.
2. Figure 3 is too general and is not required.
3. Introduce an architectural diagram.
4. Also, the algorithm can be presented.
5. Why were only three methods (XGB, NN, SVR) chosen? Also, these methods are not equal to compare: XGB is an ensemble method, but a NN serves a different purpose. How does NLP augment these methods?
6. The Results section is too weak and does not present the novelty. Useful predictions for price and utilisation are needed.
Comments on the Quality of English Language
Minor editing is required.
Author Response
1. There are many Airbnb datasets available, such as New York City. Why was Rome, Italy alone chosen? Data from multiple cities would improve the research study.
1. Thank you for your input. We made sure to add the explanation behind this choice in the first paragraph, where we introduce the dataset.
2. Figure 3. is general, not required
2. Figure 3 is the word cloud based on the frequency of the words in the scraped reviews used throughout the research. We believe it is beneficial to keep it in the paper.
3. Introduce architectural diagram
3. Thank you for your comment. This sounds like a great idea. We added the workflow chart.
4. Also, algorithm can be presented.
4. We are not sure we understood this point correctly: whether you are referring to the lines of code or to the algorithms themselves. We believe that the algorithms have been thoroughly explained throughout the paper.
5. Why were only three methods (XGB, NN, SVR) chosen? Also, these methods are not equal to compare: XGB is an ensemble method, but a NN serves a different purpose. How does NLP augment these methods?
5. We understand your concern. We have tried to explain this better in Table 1.
6. The Results section is too weak and does not present the novelty. Useful predictions for price and utilisation are needed.
6. The Results section showcases the use of advanced machine learning models, including XGBoost, Neural Networks, and Support Vector Regression, to predict Airbnb prices. These models were not only selected for their robustness but also optimized through hyperparameter tuning, demonstrating a high level of technical rigour. The comparison of different models, as seen in the results (R² and MAE values), provides clear evidence of the effectiveness of these techniques, with Neural Networks performing the best (R² = 0.81, MAE = 0.18). This multi-model approach underscores the novelty of applying a range of cutting-edge algorithms to a complex, real-world problem.
Moreover, integrating NLP techniques, particularly Aspect-Based Sentiment Analysis (ABSA), to analyze guest reviews adds a unique dimension to the pricing strategy. This method goes beyond traditional numerical data analysis by incorporating qualitative data, offering a more comprehensive view of the factors influencing pricing. The use of ABSA to understand sentiments towards specific aspects of a listing, such as host behaviour and property location, exemplifies the innovative approach taken in this research. Such an approach allowed us to highlight the importance of various features in determining property prices, such as location, number of bedrooms, and guest reviews.
By identifying key predictors, the study provides actionable insights for Airbnb hosts on where to invest to maximize profitability and occupancy. For example, properties in "Centro Storico" or with a high number of positive reviews are shown to significantly impact price, which is critical information for hosts looking to optimize their listings. Using Mean Absolute Error (MAE) and R² as performance metrics ensures that the predictions are accurate and reliable. Discussing these metrics in the Results section helps validate the usefulness of the predictions in real-world applications, such as setting dynamic pricing strategies.
Therefore, we think the paper presents substantial empirical findings, supported by rigorous model validation. Combining different models and the detailed analysis of their performance against the test set strengthens the section. Accordingly, the major novelty of our contribution lies in the dual approach of combining machine learning with NLP to capture both quantitative and qualitative aspects of Airbnb listings. This holistic approach provides a deeper understanding of the factors that drive price and utilization, which is novel and practically valuable for Airbnb hosts.
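As a concrete illustration of how the two reported metrics are computed, a minimal scikit-learn sketch follows; the prices and predictions here are made-up illustrative values, not the paper's actual model outputs:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Hypothetical (normalized) true prices and model predictions --
# illustrative values only, not the paper's actual outputs.
y_true = np.array([0.30, 0.55, 0.72, 0.41, 0.90])
y_pred = np.array([0.35, 0.50, 0.70, 0.45, 0.85])

mae = mean_absolute_error(y_true, y_pred)   # average absolute deviation
r2 = r2_score(y_true, y_pred)               # fraction of variance explained

print(f"MAE = {mae:.3f}, R² = {r2:.3f}")
```

Lower MAE and higher R² indicate better fit; the paper reports these on the held-out test set, which is what makes them evidence of generalization rather than memorization.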
Reviewer 3 Report
Comments and Suggestions for Authors
The paper presents a well-researched and innovative approach to optimizing Airbnb pricing using advanced machine learning and NLP techniques. The paper applies diverse ML and DL algorithms (XGB, SVR, NN) and compares their performance for price prediction. It also integrates NLP techniques to mine insights from reviews. It addresses relevant and timely issues, provides a solid theoretical foundation, and discusses practical implications, making it a valuable contribution to the field.
To further strengthen the paper, the following areas should be addressed:
1. Comparison of Feature Selection Methods: The paper should elaborate on how Forward-Feature Selection outperformed PCA in this study. More details should be provided on the comparison between Forward-Feature Selection and PCA, including the number of features selected by each method, the criteria used in Forward Selection, and a quantitative comparison of model performance with features selected by each method.
2. Detailed Data Description: To enhance the reproducibility and credibility of the research, more information about the Airbnb dataset should be provided, such as the period covered by the data, specific data features, and the process of data acquisition.
3. Neural Network Architecture Justification: The paper mentions the use of a neural network composed of two fully connected dense layers, each with 64 neurons, and a final layer with a single neuron. This architecture should be justified and compared with other architectures used for similar tasks in the literature, if possible, to provide context and support for the chosen design.
4. Literature Support for Sentiment Analysis: The paper should include a literature review to support the claim that sentiment analysis has gained significant attention in the deep learning community, which can help situate the paper within the broader context of current research trends.
Comments on the Quality of English Language
The authors use appropriate academic language and terminology throughout the paper, demonstrating a good command of the subject matter. The sentences are generally well-formed, and the ideas are expressed clearly. While the overall language quality is good, moderate English editing could help enhance the clarity and readability of the paper.
Author Response
Response 1 -> Feature Selection vs PCA
Thank you for pointing this out. Here is the revised version; we made sure to add the comparison between the two methods we used:
The dataset, consisting of approximately 75 columns, required a meticulous evaluation to enhance its predictive capability. Irrelevant columns such as hostName, listingUrl, and scrapeId were eliminated, while the remaining features underwent rigorous assessment using various feature selection techniques. Among these, Forward Feature Selection (FFS) significantly outperformed Principal Component Analysis (PCA), leading to a 27 percent improvement in the R² metric.
Forward Feature Selection works by incrementally adding features that improve model performance, using the ANOVA F-statistic to identify features with the strongest relationship to the target variable. In our case, this method selected the top 10 features, which were directly relevant to the prediction task. Principal Component Analysis, by contrast, is a dimensionality reduction technique that transforms the original features into a set of linearly uncorrelated principal components. We reduced the data to 4 principal components, designed to capture the maximum variance in the dataset.
Despite PCA's strength in reducing dimensionality and minimizing the risk of overfitting, FFS proved more effective for our purposes. When applied to models such as XGBoost and SVR, the features selected by FFS resulted in a lower MAE compared to those derived from PCA, underscoring FFS's superiority in preserving features with direct predictive relevance.
In addition to curating the most productive features, we introduced new attributes that showed significant predictive power. These include activeDaysHost, which measures the duration of host activity; amenitiesTotal, which counts the total number of amenities in a listing; and verificationsTotal, representing the number of verifications completed by the host. Given the high dimensionality of our dataset, we were careful to avoid the "Curse of Dimensionality" and the risk of overfitting, ensuring that our model remained both robust and generalizable.
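The two selection strategies compared above can be sketched as follows. This is a simplified stand-in on synthetic data: `SelectKBest` with the F-statistic approximates the paper's scoring criterion in one shot rather than incrementally, and the feature count, component count, and data are only placeholders matching the numbers quoted in the response.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.decomposition import PCA

# Synthetic stand-in for the preprocessed listing features (not the real data)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 30))                     # 30 candidate features
y = 3.0 * X[:, 0] - 2.0 * X[:, 5] + rng.normal(scale=0.1, size=500)

# F-statistic-based selection of the top 10 features, as in the paper
selector = SelectKBest(score_func=f_regression, k=10)
X_sel = selector.fit_transform(X, y)

# PCA reduction to 4 principal components, the alternative that was compared
pca = PCA(n_components=4)
X_pca = pca.fit_transform(X)

print(X_sel.shape, X_pca.shape)
```

The key contrast the response draws is visible here: the selector keeps 10 of the original, interpretable columns, while PCA outputs 4 new components that mix all inputs, which is why FFS-selected features retained more direct predictive relevance for XGBoost and SVR.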
Response 2 -> Detailed Data Description:
Thank you for your comment. We have improved it as follows:
This paper focuses on analyzing the public Airbnb dataset for Rome, Italy, obtained from Inside Airbnb. The dataset used in this study was last scraped in May 2023, providing a recent snapshot of the short-term rental market in the city. Initially comprising approximately 75 columns, the dataset underwent a meticulous evaluation and preprocessing to enhance its predictive capability. Our data preparation process involved several key steps: (i) removing listings with improper or incomplete information, (ii) eliminating records with missing values, (iii) converting categorical features to one-hot vectors for machine learning compatibility, and (iv) removing irrelevant and uninformative features. Specifically, columns such as hostName, listingUrl, and scrapeId were eliminated, while the remaining features underwent rigorous assessment using various feature selection techniques. Further, we normalized the dataset and removed anomalies to ensure data quality. The refined dataset was then split into training and test sets to facilitate model development and evaluation. This comprehensive data preparation process ensures the reliability and relevance of our analysis, contributing to the reproducibility and credibility of our research findings in the context of Airbnb pricing optimization for Rome's short-term rental market.
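The preparation steps described here could be outlined roughly as below. This is a hypothetical pandas/scikit-learn sketch on a toy frame; the column names and split ratio are illustrative assumptions, not the dataset's actual schema.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Toy stand-in for the raw listings table (columns are illustrative)
df = pd.DataFrame({
    "price": [80.0, 120.0, None, 95.0, 200.0, 60.0],
    "bedrooms": [1, 2, 1, 2, 3, 1],
    "roomType": ["Entire", "Entire", "Private", "Entire", "Entire", "Private"],
    "scrapeId": [1, 2, 3, 4, 5, 6],  # irrelevant identifier
})

df = df.drop(columns=["scrapeId"])             # (iv) drop uninformative columns
df = df.dropna()                               # (ii) remove rows with missing values
df = pd.get_dummies(df, columns=["roomType"])  # (iii) one-hot categorical features

X = df.drop(columns=["price"])
y = df["price"]

# Train/test split, then normalization fitted on the training portion only
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

Fitting the scaler on the training set alone, then applying it to the test set, is the standard way to keep the held-out evaluation free of leakage.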
Response 3: Thank you for your comment. We have tried to explain the reasoning behind this NN structure here:
In our implementation, we employed a neural network architecture composed of two fully connected Dense layers, each containing 64 neurons, followed by a final layer with a single neuron tailored to our regression task. This architecture was chosen after extensive experimentation with various network configurations. Notably, this relatively simple structure outperformed more complex architectures, including those with additional layers or more neurons per layer. The superior performance of this leaner network can be attributed to the moderate complexity of our dataset, which does not require an overly sophisticated model to capture its underlying patterns. Our chosen architecture strikes an optimal balance: it is capable of modeling the non-linear relationships within the data while avoiding overfitting, a common issue with larger networks on datasets of this scale.
We determined an optimal learning rate of 0.078 and applied a Dropout rate of 0.4 to further enhance generalization. The model was trained for 50 epochs, a number we found to provide sufficient iterations for learning without risking overfitting. This configuration allowed the model to effectively capture the nuances of Airbnb pricing patterns while maintaining robust performance on unseen data. Through careful monitoring of training progress and validation performance, we confirmed that this architecture achieved strong predictive capabilities while preserving excellent generalization, outperforming both simpler linear models and more complex neural networks.
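The described architecture might look like the following Keras sketch. Only the layer sizes, the dropout rate of 0.4, the learning rate of 0.078, and the 50 training epochs are stated in the response; the Dropout placement, ReLU activation, Adam optimizer, MAE loss, and input width are our assumptions.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_features = 10  # e.g. the features retained after forward selection (assumed)

# Two fully connected Dense layers of 64 neurons, then one output neuron.
# Dropout placement and ReLU/Adam/MAE are assumptions, not stated in the text.
model = keras.Sequential([
    layers.Input(shape=(n_features,)),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.4),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.4),
    layers.Dense(1),  # single-neuron regression head
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.078), loss="mae")

# Training would then run for the reported 50 epochs, e.g.:
# model.fit(X_train, y_train, validation_split=0.1, epochs=50)

preds = model.predict(np.zeros((3, n_features)), verbose=0)
print(preds.shape)
```

With a 10-feature input, this network has fewer than 5,000 trainable parameters, which illustrates why such a lean configuration resists overfitting on a dataset of this scale.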
Response 4 -> Latest literature citation for NLP with DL
In recent years, sentiment analysis has garnered significant attention within the deep learning community, revolutionizing the approach to handling complex textual data. Unlike traditional rule-based or machine learning approaches, deep learning methods have demonstrated remarkable capabilities in automatically learning features from large-scale datasets, substantially enhancing the accuracy and efficiency of sentiment classification and analysis \cite{sentiment2024}. The adoption of deep learning for sentiment analysis offers considerable advantages, particularly in terms of precision and performance. Neural network models excel at identifying intricate patterns and relationships in data, often surpassing the accuracy of traditional machine learning methods. The layered approach in deep learning neural network structures, especially in the hidden layers, sets them apart from conventional machine learning methods: while traditional approaches rely on manual feature definition and extraction or on feature selection techniques, deep learning algorithms automatically learn and extract relevant features \cite{dang2020sentiment}. Furthermore, these models exhibit versatility across various domains and languages, adapting effectively to different data types and scales, which is particularly valuable for analyzing sentiment on platforms like social media \cite{sentiment2024_2}. This capability has led to a surge in research exploring various deep learning architectures for sentiment analysis, and the growing body of literature in this field underscores the potential of deep learning in advancing sentiment analysis techniques and their applications across diverse domains.
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
The paper has been revised.