The Airbnb listing data presented in the previous Section are aggregated for the City of Athens. However, as will become evident in the forthcoming analysis, the expansion and prevalence of the phenomenon are extremely localized, in the sense that certain parts, or neighborhoods, of the city exhibit high concentrations of STR activity, while others remain relatively “underexploited”. This is characteristically presented in Figure 1, which depicts the density of Airbnb listings for the neighborhoods of Athens over the examined time period (2015 through 2019).
More specifically, Figure 1a showcases the density of activity in the first data dump of 17th July 2015 in the form of a heat map. Even though the central neighborhoods accumulate most listings, the overall distribution appears to be homogeneous, as no large differences are observed between the center and the periphery. A totally different picture is presented in Figure 1b (the density heat map of the last data dump of 19th November 2019); in this case, the concentration of Airbnb listings in the central neighborhoods is two orders of magnitude higher than in the periphery. Additionally, STR activity seems to be “moving” to certain northern, southern and south-eastern neighborhoods.
In essence, the aim of this work is not only to showcase the active neighborhoods, but also to interpret the phenomenon and capture its dynamics in a systematic and consistent way. To this end, the analysis that follows quantifies the evolution of STRs by examining various aspects of the available data and mapping them into the geolocation-driven ontology.
5.1. Keyword Extraction
User reviews constitute one of the most straightforward ways of studying STR prevalence. Even though taste and preference have an indisputably personal character when evaluating a lodging, reviews may also convey more general information that falls well beyond the scope of a specific stay. This is vividly illustrated in Figure 2, where an actual review from the dataset is displayed.
A detailed analysis of the content of this specific review leads to some interesting observations. For the most part, the user in question is evaluating the property (yellow marking) and the interaction with the host (blue marking). Nevertheless, certain remarks about the neighborhood are also present: the availability of shops and restaurants (orange marking) and public transportation (green marking) nearby, as well as a landmark, the President Hotel (purple marking).
In general, review content may be grouped into any of the five categories appearing in the legend of Figure 2, that is:
Landmarks or Attractions
Even though the last two categories naturally vary between accommodations and hosts, the first three characterize the neighborhoods and constitute an indication of why certain areas are more popular than others. Therefore, applying a keyword extraction methodology to review data quantifies user preference with respect to the aforementioned categories of interest.
Keyword extraction has been discussed in Section 2, where it was argued that the keyword extraction methodology of choice in this work is RAKE [36], a domain-independent algorithm that counts term appearance and co-occurrence frequencies, excluding special words such as conjunctions and prepositions. Initially, all reviews pertaining to properties within a specific Athens neighborhood are concatenated into a single document. Then the document is split into a list of words and the stopwords (the most common words and prepositions, such as “and”, “the”, etc.) are removed, yielding a list known as content words.
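The preprocessing step described above may be sketched as follows. This is a minimal illustration rather than the implementation used in this work: the stopword list, the tokenization rule and the function name `content_words` are assumptions introduced here for the example.

```python
import re

# Illustrative stopword list (an assumption; the actual list is not specified).
STOPWORDS = {"and", "the", "a", "an", "of", "to", "in", "is", "was",
             "it", "for", "on", "with", "at", "from"}

def content_words(reviews):
    """Concatenate a neighborhood's reviews and return its content words."""
    # All reviews of one neighborhood form a single document.
    document = " ".join(reviews).lower()
    # Split the document into words (runs of letters, digits or underscores).
    words = re.findall(r"\w+", document)
    # Remove stopwords; what remains is the list of content words.
    return [w for w in words if w not in STOPWORDS]

# Hypothetical reviews for a single neighborhood.
reviews = [
    "The apartment is close to the metro station.",
    "Great shops and restaurants in the area.",
]
words = content_words(reviews)
```

In practice, a fuller stopword list and a tokenizer that handles apostrophes and multilingual text would likely be needed.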
In the following step, a square symmetric matrix of content word co-occurrences $W$ is created, whose element $w_{ij}$ designates the number of times word $i$ co-occurs with word $j$ in a phrase, with the maximum considered phrase length being a hyper-parameter of the approach. Once $W$ has been computed, the score $s_i$ of content word $i$ equals the ratio of its degree $\deg(i)$, i.e., the sum of the number of co-occurrences $i$ has with any other content word in the text (Equation (1)), over its frequency $\mathrm{freq}(i)$ in the text (Equation (2)):

$$\deg(i) = \sum_{j \neq i} w_{ij} \tag{1}$$

$$s_i = \frac{\deg(i)}{\mathrm{freq}(i)} \tag{2}$$
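A minimal sketch of the word scoring just described is given below. It assumes that the text has already been split into phrases of content words, and it follows the description above in summing only the co-occurrences of a word with other words. The original RAKE formulation also counts a word's own occurrences in its degree, so the two variants may rank words differently. The function and variable names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations

def word_scores(phrases):
    """Score each content word as degree / frequency.

    `phrases` is a list of phrases, each given as a list of content words.
    """
    freq = Counter()             # number of occurrences of each word
    cooc = defaultdict(Counter)  # cooc[i][j]: co-occurrences of i with j (symmetric)
    for phrase in phrases:
        freq.update(phrase)
        # Visit every ordered pair of distinct positions within the phrase.
        for a, b in permutations(range(len(phrase)), 2):
            if phrase[a] != phrase[b]:
                cooc[phrase[a]][phrase[b]] += 1
    # Degree: total co-occurrences of a word with any other word.
    degree = {w: sum(cooc[w].values()) for w in freq}
    return {w: degree[w] / freq[w] for w in freq}

# Hypothetical phrases, for illustration only.
phrases = [["metro", "station"], ["major", "metro", "station"], ["shops"]]
scores = word_scores(phrases)
```

With this variant, a word that only ever appears alone receives a score of zero, which is one reason an implementation may prefer the original RAKE definition of degree.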
Content words may be viewed as phrases of length 1. The score of longer phrases (up to the maximum phrase length) spotted in the list of content words is obtained by summing the scores of the individual words they comprise. Finally, in order to filter out rare words and phrases, a minimum frequency is defined for any non-stopword to be included in the list of content words (and for any phrase up to the maximum length to be considered), which constitutes the second hyper-parameter of the approach.
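The phrase scoring and frequency filtering may be illustrated with the sketch below. The parameter names `max_len` and `min_freq`, the word scores and the example values are all hypothetical and are not the values used in this work.

```python
from collections import Counter

def score_phrases(phrases, word_score, max_len, min_freq):
    """Sum word scores per phrase, keeping only sufficiently frequent phrases.

    `max_len` and `min_freq` stand for the two hyper-parameters of the approach.
    """
    # Count each distinct phrase that does not exceed the maximum length.
    counts = Counter(tuple(p) for p in phrases if len(p) <= max_len)
    return {
        " ".join(p): sum(word_score[w] for w in p)
        for p, n in counts.items()
        if n >= min_freq  # drop rare phrases
    }

# Hypothetical word scores and phrases, for illustration only.
word_score = {"metro": 1.5, "station": 1.5, "major": 2.0, "shops": 0.0}
phrases = [["metro", "station"], ["metro", "station"],
           ["major", "metro", "station"], ["shops"]]
ranked = score_phrases(phrases, word_score, max_len=3, min_freq=2)
```

Here only “metro station” survives the filter, since the other phrases occur once.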
After a thorough experimentation procedure, the optimal values of the two hyper-parameters have been determined. Table 4 below summarizes the 20 most frequently extracted keywords for the neighborhood of Emporiko Trigono-Plaka, which is the most popular among the City of Athens neighborhoods, as it concentrates the largest number of listings and the most user reviews.
A closer examination of the extracted keywords for the Plaka neighborhood reveals patterns similar to those discussed when analyzing Figure 2. Among the most popular keywords are those related to nearby attractions or landmarks (e.g., the National Archeological Museum of Athens), to public transportation (e.g., “x95 bus stop” and “major metro stations”) and to the availability of shops (e.g., “main shopping strip”). Of course, the most popular keywords need not be the same for all neighborhoods; in fact, the observed similarities and dissimilarities between different neighborhoods are going to be among the key elements of the created ontology for STRs in the City of Athens.
5.2. Ghost Hotels
The main focus when studying STR impact on housing markets is on entire home or apartment listings which, as already discussed in Section 4.1, account for the overwhelming majority of the available properties in the City of Athens. The reason is that these listings are no longer available to house long-term tenants, thereby intensifying the housing crisis that has been witnessed in Athens in recent years. Private or shared rooms, on the other hand, are generally not regarded as a contributing factor to this phenomenon, as they are viewed as properties that do not affect housing availability for regular tenants.
Nevertheless, a detailed inspection of private or shared rooms reveals that the above assumption is not always valid. Table 5 groups these categories of rooms on a per-host basis, for the first and last data dumps of the dataset (Section 4). For example, on 17th July 2015, 387 private or shared rooms were available on Airbnb for the City of Athens, 228 of which were listed by hosts having exactly one listing on the platform and the rest by hosts having more listings. This last group of listings is generally referred to as “ghost hotels”, because a single host pretends to possess multiple small properties while, in fact, s/he owns a larger one, split into individual rooms, much like an ordinary hotel.
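The per-host grouping may be sketched as follows. The record layout, the room-type labels and the function name are assumptions made for this illustration, and the sketch counts only private or shared rooms per host, which is one possible reading of the grouping described above.

```python
from collections import Counter

# Hypothetical records: (listing_id, host_id, room_type).
listings = [
    (1, "h1", "Private room"),
    (2, "h1", "Private room"),
    (3, "h1", "Shared room"),
    (4, "h2", "Private room"),
    (5, "h3", "Entire home/apt"),
    (6, "h3", "Entire home/apt"),
]

# Assumed labels for the room categories of interest.
ROOM_TYPES = {"Private room", "Shared room"}

def split_rooms_by_host(listings):
    """Return (rooms of single-room hosts, rooms of multi-room hosts)."""
    rooms_per_host = Counter(
        host for _, host, room_type in listings if room_type in ROOM_TYPES
    )
    single = sum(n for n in rooms_per_host.values() if n == 1)
    # Rooms whose host lists several rooms are ghost hotel candidates.
    multi = sum(n for n in rooms_per_host.values() if n > 1)
    return single, multi

single, multi = split_rooms_by_host(listings)
```

Entire homes and apartments are ignored here, so host `h3` does not contribute to either count.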
In reality, these cases constitute an unregulated form of hotel operation and have been repeatedly labeled as unfair competition by both registered hotel owners and tourism-related authorities [47]. Monitoring the existence and expansion of ghost hotels is of great importance, as it illustrates the dynamics of STRs and the development of tourism within the city. Indeed, Table 5 also portrays a more than three-fold increase in relevant entries between the first and last data dumps, while ghost hotel listings have witnessed a 6-fold increase, from 97 to 789.
Figure 3 illustrates ghost-hotel expansion in the City of Athens between 2015 and 2019 (first and last data dumps). It is very similar to Figure 1, in the sense that most ghost hotels exist in the central neighborhoods and not in the periphery. However, ghost hotel prevalence does not follow the same patterns as entire homes and apartments, since it does not seem to move to southern, northern and south-eastern areas. An exception to this rule is the rather sharp density increase around the Central Railway Station of Athens (Stathmos Larissis) and in the Kypseli neighborhood.
5.3. Ontology Creation
Following the previous analysis and case study description, the effect of user reviews is considered to be optimally captured within a given spatial resolution, namely the subdivision based on the municipal boundaries of the City of Athens, Greece. These variations are associated with special classes related to visitor comments; therefore, the identified variables depend on their distinctive spatial identifier (i.e., polygon vertices and area names). The variables are classified into particular fields by inductive reasoning, taking into consideration user opinion and the specific characteristics of each accommodation. The proposed knowledge model may be expressed formally using basic elements of semantic interpretation, such as concepts, relations between concepts, and topics, which result in the ontology structure depicted in Figure 4.
In particular, each review comment is considered to be part of a qualitative assessment category (i.e., “Landmark”, “Transportation”, “Shop”) in addition to specific quantitative ones (i.e., “Price” and “Amenity”) that are captured after each lease. Furthermore, each location consists of sub-classes (including the aforementioned classes), which are subsequently connected according to their respective statistical importance among all areas of interest. In order to define, extract, and use the underlying knowledge of a set of concepts, we rely on the semantics of their relations, as the latter are expressed by the so-called “is-related” relation. In other words, the existence of an edge in the graph indicates an existing relation, whereas the absence of an edge indicates that no relationship exists between the two concepts.
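The graph view of the “is-related” relation may be sketched as below. The `Ontology` class, its method names, the symmetric treatment of edges and the `Location` node are assumptions made for this illustration; only the five category names come from the text.

```python
from collections import defaultdict

class Ontology:
    """Concepts as nodes; an undirected edge encodes the is-related relation."""

    def __init__(self):
        self._edges = defaultdict(set)

    def relate(self, a, b):
        # Assumption: is-related is treated as symmetric.
        self._edges[a].add(b)
        self._edges[b].add(a)

    def is_related(self, a, b):
        # The absence of an edge means that no relationship exists.
        return b in self._edges.get(a, set())

onto = Ontology()
# A hypothetical location node linked to the five classes named in the text.
for cls in ("Landmark", "Transportation", "Shop", "Price", "Amenity"):
    onto.relate("Location", cls)
```

For instance, `onto.is_related("Location", "Shop")` holds, whereas `onto.is_related("Shop", "Price")` does not, because no edge connects these two classes directly.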
Since relations among real-life concepts are often uncertain (or a matter of degree), the approach followed herein may be extended to include a formal methodology and mathematical notation based on fuzzy relational algebra [48]. Still, as depicted in Figure 5, the proposed model is quite flexible and can be adjusted to the required research framework, i.e., one or more of its sub-classes may be altered accordingly. Therefore, in the presented approach, the classification of natural language text through automated statistical and non-statistical procedures is split according to the type of service provided by the owners of the listed properties.
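As a rough illustration of the fuzzy extension mentioned above, relations may carry a degree in [0, 1] and be chained through max-min composition, a standard operation of fuzzy relational algebra. The concept names, the degrees and the function name below are hypothetical, and this sketch is not part of the presented model.

```python
def compose(r, s):
    """Max-min composition of two fuzzy relations given as {(a, b): degree}."""
    out = {}
    for (a, b), d1 in r.items():
        for (b2, c), d2 in s.items():
            if b == b2:
                # The strength of a path is its weakest link;
                # the strongest path between a and c is kept.
                out[(a, c)] = max(out.get((a, c), 0.0), min(d1, d2))
    return out

# Hypothetical degrees of relation, for illustration only.
r = {("Plaka", "Landmark"): 0.9, ("Plaka", "Shop"): 0.6}
s = {("Landmark", "Popularity"): 0.7, ("Shop", "Popularity"): 0.8}
composed = compose(r, s)
```

In this example the path through “Landmark” has strength 0.7 and the path through “Shop” has strength 0.6, so the composed degree is 0.7.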