Publicly available datasets in which images and tabular attributes of real estate listings co-exist are hard to find. For this paper, the listings were gathered from a Swiss real estate platform that hosts property advertisements in Switzerland and its neighboring cross-border cities. The announcements are published by private individuals and real estate agencies and span several listing categories. Most notably, an announcement may concern the sale or rental of different property types, such as flats or houses. Among these, flat rentals had the highest number of announcements. While the platform and its structured property features were translated and made available in four languages (English, French, German, and Italian), the unstructured description of a property often remained in the original language of publication.
Our dataset, named SRED (Swiss Real Estate Dataset), consists of 17,758 flat rentals scraped in English during February and March 2021. On the platform, the price of a listing may refer to a rental price or a sales price depending on the type of listing. Since SRED contains only rentals, the term price refers to the rental price throughout this paper.
2.1. Tabular Data
SRED contains 12 structured and unstructured features, some of which may be relevant only for flats. The structured features that are suitable for both property types (flat or house) are the price, living space (m²), number of bathrooms, number of rooms, location (longitude and latitude), year of construction, and advertiser (private contact or agency details). Additionally, there are common unstructured features in the form of a title for the announcement, attached property images, and a description of the advertisement. Some features are only meaningful for a given property type. For instance, a flat may have a feature indicating the floor on which it is located.
While the primary aim of this paper is to use all predictive features, the heterogeneity in the available data played an essential role in the final selection of adequate modeling features. Some scraped features were frequently missing due to inconsistency in HTML tags and were often misplaced by the user. The chosen consistent and informative features include the living space, number of bedrooms, and location. The literature also supports this choice of variables [9
The inclusion criteria for a listing are of two types. Firstly, the methodology dictates that listings must (i) report the exact address to obtain location features, (ii) have a year of construction < 2020, since those ≥ 2020 were assumed to be under construction, and (iii) have at least four property images, since this is central to the modeling process. Secondly, listings must (iv) have a surface of at least 18 m², (v) have a rental price between CHF 200 and 7500, and (vi) be located in Switzerland or its cross-border cities ( longitude and latitude ).
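The six criteria can be read as a single filter predicate. The following is a minimal sketch rather than the code used to build SRED: the `Listing` fields and the `is_included` function are hypothetical names, an exact address is approximated by the presence of coordinates, a missing year of construction is assumed to fail criterion (ii), the surface is assumed to be in m², and the geographic bounds are left as a parameter because they are not stated here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Listing:
    # Hypothetical record layout; field names are illustrative, not SRED's schema.
    price_chf: float
    living_space_m2: float
    year_built: Optional[int]
    longitude: Optional[float]  # None when no exact address was reported
    latitude: Optional[float]
    n_images: int


# (min_lon, max_lon, min_lat, max_lat), supplied by the caller.
BoundingBox = Tuple[float, float, float, float]


def is_included(listing: Listing, bbox: BoundingBox) -> bool:
    """Return True if the listing passes criteria (i) to (vi)."""
    min_lon, max_lon, min_lat, max_lat = bbox
    # (i) exact address, approximated here by the presence of coordinates
    if listing.longitude is None or listing.latitude is None:
        return False
    # (ii) built before 2020 (assumption: a missing year fails the test)
    if listing.year_built is None or listing.year_built >= 2020:
        return False
    # (iii) at least four property images
    if listing.n_images < 4:
        return False
    # (iv) surface of at least 18 (assumed m²)
    if listing.living_space_m2 < 18:
        return False
    # (v) rental price between CHF 200 and 7500 (assumed inclusive)
    if not 200 <= listing.price_chf <= 7500:
        return False
    # (vi) inside the geographic bounds
    return (min_lon <= listing.longitude <= max_lon
            and min_lat <= listing.latitude <= max_lat)


# Placeholder bounds for illustration only, not the values used in the paper.
EXAMPLE_BBOX: BoundingBox = (5.0, 11.0, 45.0, 48.5)
example = Listing(price_chf=1500.0, living_space_m2=60.0, year_built=1995,
                  longitude=8.54, latitude=47.37, n_images=6)
print(is_included(example, EXAMPLE_BBOX))
```

Keeping the bounds as an argument avoids hard-coding values that the text does not provide.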
2.2. Image Data
Each listing in SRED contains a set of images attached by the advertiser. The real estate platform assumes that these images provide visual information about the property, for instance, showcasing the listing’s interior and exterior. However, some images do not directly represent a property, such as images of realtors’ logos, pets, nearby forests, and waterfalls. These images should be removed from the dataset to obtain cross-comparable images across all listings. For this purpose, we designed a three-stage process built on three classification models, under the assumption that most of the resulting images were photographs. Before image pre-processing, each SRED listing had a mean of 8.42 images and a median of 8.
In the first step, images representing logos, mock-ups, the layout schemes of a rental unit or its affiliated buildings, and all images not showing the property’s appearance were removed. Indeed, a logo could bring information such as the name and contact details of the commercial realtor, which may also be provided in the tabular feature, and thus carry redundant information. Removing those prevents a pricing model from learning shortcuts on incomparable listings. Further, we included other irrelevant images (e.g., pets and persons) within the irrelevant category. In the first stage of our process, we used a binary classification model to filter such images out from our dataset.
The second type of images considered irrelevant included mostly outdoor images where the building of the flat was not identifiable. Such images primarily relate to the property’s exterior, for instance, photos taken of or from gardens, balconies, and, more generally, outdoor shots that did not include the property building. The second pre-processing stage was designed to remove such outdoor images through another binary classifier.
In the last step, the remaining relevant images were classified into one of seven categories: bathroom, bedroom, kitchen, living room, dining room, interior (miscellaneous), and exterior. To this aim, we used the publicly available datasets from Poursaeed et al. [13]. There are four key remarks with regard to these images. First, the interior class mainly represents miscellaneous and, in our case, potentially unwanted images, such as stairs, corridors, elevators, and unusual images that are not strictly related to the property. Second, the kitchen category contains open kitchens (no walls or barriers separate the kitchen from the living or dining rooms), as found in many SRED listings. Third, distinguishing between living and dining rooms may not be relevant, particularly for studios or small properties. Lastly, the exterior class has the most images, which display various architectural styles. This variety helps the classification model to identify different styles of buildings. Since the dataset provided by [13] is not specific to one region while SRED is specific to Switzerland, it may be argued that some classes, such as exterior images, could benefit from images that primarily show Swiss architectural designs. In practice, such architectural styles did not seem to influence the correct identification of exterior images, as the other classes differed vastly.
We summarize our three-stage image pre-processing in Figure 1. This process was applied to each image in our dataset.
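The three stages can be sketched as a short cascade applied to one image at a time. This is an illustrative outline only: the function `preprocess_image` and the three classifier callables (`is_relevant`, `shows_property`, `classify_room`) are hypothetical stand-ins for the trained models, whose architectures are not described in this section.

```python
from typing import Callable, Optional, TypeVar

Image = TypeVar("Image")

# The seven categories of the third-stage classifier, as listed in the text.
ROOM_TYPES = (
    "bathroom", "bedroom", "kitchen", "living room",
    "dining room", "interior", "exterior",
)


def preprocess_image(
    image: Image,
    is_relevant: Callable[[Image], bool],
    shows_property: Callable[[Image], bool],
    classify_room: Callable[[Image], str],
) -> Optional[str]:
    """Return the room type of an image, or None if it is filtered out."""
    # Stage 1: drop logos, mock-ups, layout schemes, pets, persons, etc.
    if not is_relevant(image):
        return None
    # Stage 2: drop outdoor shots in which the building is not identifiable.
    if not shows_property(image):
        return None
    # Stage 3: assign one of the seven room types.
    label = classify_room(image)
    if label not in ROOM_TYPES:
        raise ValueError(f"unexpected room type: {label!r}")
    return label
```

Passing the classifiers as arguments keeps the cascade independent of any particular model or framework, so each stage can be replaced or evaluated separately.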
To train our first two binary models, two individual annotators labeled images from randomly selected SRED listings. For the first classification task, 15,110 images were labeled, of which 96% belonged to the class of relevant images. For the second classification model, there were 14,549 images, including 92% relevant images. Examples of annotated irrelevant images for these two stages are shown in Figure 2. Additionally, it should be noted that the irrelevant images of the first model were not carried over to the second set of annotations. The images in both cases were randomly split in 95:5 proportions for training and testing. In terms of performance on the test set, the first classification model reached a balanced accuracy of 97.5%, while the second model achieved a balanced accuracy of 94.5% for removing irrelevant exterior images. Attaining a high specificity in both cases (99.8% and 99.2%, respectively) meant that, on average, very few relevant images were removed.
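To make these metrics concrete, the sketch below computes balanced accuracy and specificity from confusion counts. It assumes the standard definition of balanced accuracy as the mean of sensitivity and specificity, and it takes "irrelevant" as the positive class, which is consistent with high specificity meaning that few relevant images are removed. The implied sensitivities are derived here from the reported figures and are not stated in the text. The helper names are illustrative.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of irrelevant (positive) images that are correctly removed."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of relevant (negative) images that are correctly kept."""
    return tn / (tn + fp)


def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity and specificity (standard definition)."""
    return 0.5 * (sensitivity(tp, fn) + specificity(tn, fp))


def accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    return (tp + tn) / (tp + fn + tn + fp)


def implied_sensitivity(bal_acc: float, spec: float) -> float:
    """Invert balanced accuracy = (sensitivity + specificity) / 2."""
    return 2.0 * bal_acc - spec


# With 96% relevant images, a model that keeps every image looks good on
# plain accuracy but not on balanced accuracy (toy counts per 100 images).
keep_all = dict(tp=0, fn=4, tn=96, fp=0)
print(accuracy(**keep_all), balanced_accuracy(**keep_all))

# Sensitivities implied by the reported balanced accuracy and specificity.
print(f"{implied_sensitivity(0.975, 0.998):.1%}")  # first model
print(f"{implied_sensitivity(0.945, 0.992):.1%}")  # second model
```

Because both annotated sets are dominated by relevant images, balanced accuracy is the more informative summary: it weights the small irrelevant class equally with the large relevant one.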
We trained the classification model of the third stage on the combination of the dataset by Poursaeed et al. [13] and images from SRED. The original dataset used by Poursaeed et al. [13] had 145,994 images of 224 × 224 pixels. We removed duplicated images from their dataset and added 2352 labeled images from SRED, yielding a dataset of 148,342 images. The SRED annotations were performed only for the three categories of bathroom, exterior, and kitchen, where the newly annotated images were meant to capture Swiss architectural idiosyncrasies.
The images from the exterior category of [13] were incompatible with those found in SRED, because the exterior category in [13] consisted mostly of images taken from the frontal angle of the building. As a result, SRED images showing a balcony or the building from a side angle could be mistaken for another category of interest (e.g., bathroom), since such views were not included in the training set. Moreover, images taken from the balcony of a property differed vastly from frontal views of the building and would have deserved a dedicated category. However, as there were too few images of balconies, such images were eventually included in the exterior category. This inclusion did not pose significant challenges for answering our research question, since we assumed that some irrelevant balcony images had been filtered out during the second stage of image processing, that is, when deploying our second classification model for removing irrelevant exterior images. For this task, 95% of the data was randomly assigned to the training set, while the remaining 5% was used as the test set. This multi-class model reached 89.5% accuracy on the test set. The class distributions and performances are shown in Table 1. When considering the balanced accuracy, the best-performing classes are exterior and bathroom, while the weakest performances are attained for living room and dining room.
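A per-class balanced accuracy for a multi-class model can be obtained by treating each class in turn as the positive class (one-vs-rest). The sketch below assumes this reading; the text does not state how the per-class values in Table 1 were computed, and the toy confusion matrix is invented for illustration.

```python
from typing import Dict, List, Sequence


def per_class_balanced_accuracy(
    confusion: List[List[int]], labels: Sequence[str]
) -> Dict[str, float]:
    """One-vs-rest balanced accuracy per class.

    confusion[i][j] counts images of true class i predicted as class j.
    Every class is assumed to appear at least once in the test set.
    """
    total = sum(sum(row) for row in confusion)
    scores = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                 # class k predicted as another class
        fp = sum(row[k] for row in confusion) - tp  # other classes predicted as k
        tn = total - tp - fn - fp
        recall = tp / (tp + fn)
        spec = tn / (tn + fp)
        scores[label] = 0.5 * (recall + spec)
    return scores


# Toy three-class example (invented counts, not results from the paper).
toy = [
    [8, 2, 0],   # true living room
    [1, 9, 0],   # true dining room
    [0, 0, 10],  # true exterior
]
print(per_class_balanced_accuracy(toy, ["living room", "dining room", "exterior"]))
```

In the toy matrix, the two classes that are confused with each other score lower than the class that is never confused, mirroring the pattern described for living and dining rooms.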
To cross-compare the images from the listings, it was necessary to have a homogeneous set of room types among them. As certain room types were not present in all listings, the choice of room types depended on what kind of rooms a human appraiser (either a professional or a renter) would consider predominant when comparing properties. Additionally, the choice of room types was constrained by the available images from SRED, where certain classes may be found more frequently among the listings than others. Ahmed and Moustafa [14] used frontal images of houses (in our case labeled ‘exterior’), bedrooms, kitchens, and bathrooms for estimating property prices. After running the model and classifying the room types in SRED, we found the most relevant and frequent classes to be the property’s exterior, living room, kitchen, and bathroom. An example of each room type is depicted in Figure 3. It may also be argued that the appearance of certain room types, such as the kitchen and bathroom, may differ, while other room types, such as the bedroom and living room, may be similar. Therefore, in selecting relevant room types, we sought a balance between the frequency and diversity of the room types. Furthermore, the methodology section explains why we selected four room types.
To avoid having several images of one room type per listing, we kept, for each type, the image with the highest probability of belonging to that class, and removed the atypical room types. There were some listings where the probability of two images belonging to the same class was very similar or the same (indicating possible duplicates), and for such cases, we selected the image with the lowest probability for its most likely alternative.
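This selection rule can be sketched as follows for the images of a single listing. The function name, the input layout (an image identifier paired with its class probabilities), and the tolerance `tie_tol` that decides when two probabilities count as "very similar" are assumptions made for illustration; the text does not specify a threshold.

```python
from typing import Dict, List, Tuple

# The four room types retained for modeling, as stated in the text.
SELECTED_ROOM_TYPES = ("exterior", "living room", "kitchen", "bathroom")


def select_representatives(
    images: List[Tuple[str, Dict[str, float]]],
    tie_tol: float = 1e-3,  # hypothetical threshold for "very similar"
) -> Dict[str, str]:
    """Pick one image id per selected room type for a single listing."""
    best: Dict[str, Tuple[str, float, float]] = {}
    for image_id, probs in images:
        top = max(probs, key=probs.get)
        if top not in SELECTED_ROOM_TYPES:
            continue  # atypical room types are dropped
        top_p = probs[top]
        # Probability of the most likely alternative class.
        runner_up = max((p for c, p in probs.items() if c != top), default=0.0)
        current = best.get(top)
        if current is None:
            best[top] = (image_id, top_p, runner_up)
            continue
        _, cur_p, cur_runner_up = current
        if abs(top_p - cur_p) <= tie_tol:
            # Near-tie: prefer the image whose alternative is least likely.
            if runner_up < cur_runner_up:
                best[top] = (image_id, top_p, runner_up)
        elif top_p > cur_p:
            best[top] = (image_id, top_p, runner_up)
    return {room: image_id for room, (image_id, _, _) in best.items()}


listing_images = [
    ("img_a", {"kitchen": 0.9000, "dining room": 0.08, "interior": 0.0200}),
    ("img_b", {"kitchen": 0.9005, "dining room": 0.05, "interior": 0.0495}),
    ("img_c", {"bathroom": 0.99, "interior": 0.01}),
    ("img_d", {"bedroom": 0.95, "interior": 0.05}),
]
print(select_representatives(listing_images))
```

In this toy listing, `img_a` and `img_b` are near-ties for the kitchen, so the one with the less likely alternative class is kept, while the bedroom image is discarded because bedroom is not among the four selected types.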