Neighborhoods are vaguely defined, localized regions that share similar characteristics. They are most often defined, delineated and named by the citizens that inhabit them rather than municipal government or commercial agencies. The names of these neighborhoods play an important role as a basis for community and sociodemographic identity, geographic communication and historical context. In this work, we take a data-driven approach to identifying neighborhood names based on the geospatial properties of user-contributed rental listings. Through a random forest ensemble learning model applied to a set of spatial statistics for all n-grams in listing descriptions, we show that neighborhood names can be uniquely identified within urban settings. We train a model based on data from Washington, DC, and test it on listings in Seattle, WA, and Montréal, QC. The results indicate that a model trained on housing data from one city can successfully identify neighborhood names in another. In addition, our approach identifies less common neighborhood names and suggestions of alternative or potentially new names in each city. These findings represent a first step in the process of urban neighborhood identification and delineation.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited