# Combining Design Patterns and Topic Modeling to Discover Regions That Support Particular Functionality

## Abstract


## 1. Introduction

- A novel framework for discovering functional regions that combines results based on patterns and LDA topic modeling in three different ways: mutual evaluation to identify cases of significant agreement or disagreement; using pattern-based knowledge to adjust topic probabilities; and using topic probabilities to adjust pattern-based results.
- A discussion, in the context of GIS, of the benefits of combining the interpretability offered by knowledge-based techniques with the transferability and scalability of data-driven methodologies.
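The first contribution, mutual evaluation, compares the two approaches' per-region outputs to find cases of significant agreement or disagreement. The sketch below shows one possible way to label regions. The `RegionScores` record, the `high` and `gap` thresholds and the three labels are hypothetical illustrations, not the procedure defined in the paper.

```python
from dataclasses import dataclass


@dataclass
class RegionScores:
    # Hypothetical per-region record holding both approaches' outputs.
    region_id: str
    pattern_conf: float  # pattern-based confidence, assumed in [0, 1]
    topic_prob: float    # LDA "shopping plaza" topic probability, assumed in [0, 1]


def mutual_evaluation(regions, high=0.5, gap=0.5):
    """Label each region as 'agree', 'disagree' or 'inconclusive'.

    high: hypothetical threshold above which both scores count as high (agreement).
    gap:  hypothetical minimum absolute difference treated as disagreement.
    """
    labels = {}
    for r in regions:
        if r.pattern_conf >= high and r.topic_prob >= high:
            labels[r.region_id] = "agree"
        elif abs(r.pattern_conf - r.topic_prob) >= gap:
            labels[r.region_id] = "disagree"
        else:
            labels[r.region_id] = "inconclusive"
    return labels


labels = mutual_evaluation([
    RegionScores("A", 0.8, 0.7),  # both high
    RegionScores("B", 0.9, 0.1),  # large gap between the approaches
    RegionScores("C", 0.3, 0.2),  # both low, small gap
])
```

Regions flagged as "disagree" are the ones where one approach could be used to question or adjust the other.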

## 2. Related Work and Critical Analysis

#### 2.1. Knowledge-Based Approaches

##### 2.1.1. Discovering Functional Regions Using Composition Patterns

##### 2.1.2. Critical Analysis of the Pattern-Based Approach

#### 2.2. Data-Driven Approaches

…$^{2}$ regions in New York and London to find clusters containing similar distributions of place types. Zhou and Zhang [26] similarly combined Twitter and Foursquare data to extract the spatial distributions of common human activities (e.g., food and restaurants, shops and services, outdoor and recreation) and to determine major hotspots. Finally, Zhi et al. [27] used a large dataset of 15 million social media check-ins over a year to detect functional regions. They extracted spatiotemporal structures that potentially represent associations between functional regions and human activities, and then used these associations to discover functional regions in the city of Shanghai.

##### 2.2.1. Functional Region Extraction from POI and Human Activity Data

##### 2.2.2. Critical Analysis of the Topic Modeling Approach

## 3. Methodology

#### 3.1. Mutual Evaluation

#### 3.2. Data to Knowledge Fusion

#### 3.3. Knowledge to Data Fusion

## 4. Demonstration and Results

#### 4.1. Study Area and Data

#### 4.2. Results Using Individual Approaches

…$^{2}$). Figure 2 presents the results of each individual approach on the same map. Darker hues indicate a higher probability of the region being a “shopping plaza”, with red and gray colours denoting the results of the pattern-based and topic modeling approaches, respectively. Figure 3 presents the results of a primitive integration process that does not follow any of the methodologies proposed in Section 3: It simply retains only those results of the two approaches that overlap and score higher than 50%. A pie chart is also provided, showing how each category of sub-functions within the pattern contributes to the confidence value.
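The primitive integration used for Figure 3 keeps only overlapping results whose scores both exceed 50%. A minimal sketch follows, assuming each result is a region with a score in [0, 1]. Representing regions as axis-aligned bounding boxes (`Box`) is a simplifying assumption; real region geometries would need a proper polygon intersection test.

```python
from typing import NamedTuple


class Box(NamedTuple):
    # Hypothetical stand-in for a region geometry: an axis-aligned bounding box.
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def overlaps(a: Box, b: Box) -> bool:
    # Two boxes overlap when their extents overlap on both axes
    # (boxes that merely touch along an edge do not count).
    return (a.min_x < b.max_x and b.min_x < a.max_x
            and a.min_y < b.max_y and b.min_y < a.max_y)


def primitive_integration(pattern_results, topic_results, threshold=0.5):
    """Keep overlapping (pattern, topic) region pairs whose scores both exceed threshold."""
    kept = []
    for p_box, p_score in pattern_results:
        if p_score <= threshold:
            continue
        for t_box, t_score in topic_results:
            if t_score > threshold and overlaps(p_box, t_box):
                kept.append((p_box, t_box, p_score, t_score))
    return kept


pattern = [(Box(0, 0, 2, 2), 0.8), (Box(10, 10, 12, 12), 0.9)]
topic = [(Box(1, 1, 3, 3), 0.6), (Box(10, 10, 11, 11), 0.4)]
result = primitive_integration(pattern, topic)
```

Here only the first pair survives. The second pattern-based region overlaps a topic region whose probability (0.4) falls below the threshold.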

#### 4.3. Results of Mutual Evaluation

#### 4.4. Results of Data to Knowledge and Knowledge to Data Fusion

#### 4.5. Overall Results

## 5. Discussion

- are highly functional, with the knowledge-based aspect also explaining which particular functions contribute most to this;
- are popular, based on the inclusion of social media information exploited by the data-driven aspect;
- are homogeneous both in terms of the POIs included and the way they are spatially organized.

## 6. Conclusions

## Appendix A

**Table A1.** Top-15 ranked point of interest (POI) types for the “shopping plaza” topic in [12].

Category | Probability | Category | Probability |
---|---|---|---|
shopping mall | 0.207709 | bistro | 0.000105 |
accessories store | 0.056738 | dumpling restaurant | 0.000096 |
chocolate shop | 0.013896 | korean restaurant | 0.000090 |
shoe store | 0.000288 | german restaurant | 0.000080 |
breakfast spot | 0.000282 | herbs & spices store | 0.000079 |
gaming cafe | 0.000196 | airport terminal | 0.000078 |
optical shop | 0.000180 | outlet store | 0.000076 |
post office | 0.000114 | | |
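If the values in Table A1 are read as the topic's per-type probabilities, ranking the POI types and summing their probabilities takes only a few lines. The dictionary below simply transcribes the table. `top_k` is an illustrative helper and is not part of the original work.

```python
# Probabilities of the "shopping plaza" topic, transcribed from Table A1.
SHOPPING_PLAZA_TOPIC = {
    "shopping mall": 0.207709, "accessories store": 0.056738,
    "chocolate shop": 0.013896, "shoe store": 0.000288,
    "breakfast spot": 0.000282, "gaming cafe": 0.000196,
    "optical shop": 0.000180, "post office": 0.000114,
    "bistro": 0.000105, "dumpling restaurant": 0.000096,
    "korean restaurant": 0.000090, "german restaurant": 0.000080,
    "herbs & spices store": 0.000079, "airport terminal": 0.000078,
    "outlet store": 0.000076,
}


def top_k(topic, k):
    """Return the k (POI type, probability) pairs with the highest probability."""
    return sorted(topic.items(), key=lambda item: item[1], reverse=True)[:k]


top3 = top_k(SHOPPING_PLAZA_TOPIC, 3)
mass_top3 = sum(p for _, p in top3)                 # about 0.278343
mass_top15 = sum(SHOPPING_PLAZA_TOPIC.values())     # about 0.280007
```

The three leading types (shopping mall, accessories store, chocolate shop) carry almost all of the probability assigned to the fifteen listed types.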

Variable | Component | Filter |
---|---|---|
${C}_{S}$ | Shop | $Type\_Filter(“Shop”)$ |
${C}_{A}$ | Amenity | $Type\_Filter(“Amenity”)$ |
${C}_{F}$ | Facilities | ${C}_{S}\cup {C}_{A}$ |
${C}_{WP}$ | Walkable plaza | $Type\_Filter(“Surface”)\cap Prop\_Filter(“walkable”,“true”)$ |
${C}_{H}$ | Motorway | $Type\_Filter(“Road”)\cap Prop\_Filter(“pedestrians”,“false”)$ |
${C}_{Sr}$ | Service Road | ${C}_{H}\cap Prop\_Filter(“pedestrians”,“true”)$ |
${C}_{W}$ | Walkable | ${C}_{WP}\cup {C}_{Sr}$ |
${C}_{P}$ | Parking place | ${C}_{A}\cap Prop\_Filter(“service”,“parking”)$ |
${C}_{B}$ | Transportation node | ${C}_{A}\cap Prop\_Filter(“service”,“transportation”)$ |
${C}_{An}$ | Anchor Store | ${C}_{S}\cap Prop\_Filter(“goods”,“various”)$ |
${C}_{M}$ | Mall | ${C}_{S}\cap Prop\_Filter(“goods”,“various”)\cap Prop\_Filter(“service”,“various”)$ |
${C}_{At}$ | Attractors | ${C}_{M}\cap {C}_{An}$ |
${C}_{Sb}$ | Basic Shop | ${C}_{S}\cap Prop\_Filter(“goods”,“basic”)$ |
${C}_{Se}$ | Special Shop | ${C}_{S}\cap Prop\_Filter(“goods”,“special”)$ |
${C}_{Su}$ | Uncommon Shop | ${C}_{F}\cap (Prop\_Filter(“goods”,“uncommon”)\cup Prop\_Filter(“services”,“uncommon”))$ |
${C}_{As}$ | Food court | ${C}_{A}\cap Prop\_Filter(“service”,“sustenance”)$ |
${C}_{Ae}$ | Entertainment | ${C}_{A}\cap Prop\_Filter(“service”,“entertainment”)$ |
${C}_{Al}$ | Luxury services | ${C}_{A}\cap Prop\_Filter(“service”,“health\&beauty”)$ |
${C}_{Av}$ | Aesthetics | ${C}_{A}\cap Prop\_Filter(“service”,“visually\ pleasing”)$ |
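The components above are built from two filters combined with set intersection and union. The following sketch reproduces a few rows under the assumption that each filter returns the set of matching feature identifiers. The `Feature` record and the `type_filter` / `prop_filter` functions are hypothetical names that mirror $Type\_Filter$ and $Prop\_Filter$, and the feature list is toy data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    # Hypothetical feature record: an id, a type and (key, value) properties.
    fid: int
    ftype: str
    props: tuple = ()


def type_filter(features, ftype):
    """Sketch of Type_Filter: ids of features of the given type."""
    return {f.fid for f in features if f.ftype == ftype}


def prop_filter(features, key, value):
    """Sketch of Prop_Filter: ids of features whose property `key` equals `value`."""
    return {f.fid for f in features if (key, value) in f.props}


features = [
    Feature(1, "Shop", (("goods", "basic"),)),
    Feature(2, "Shop", (("goods", "special"),)),
    Feature(3, "Amenity", (("service", "parking"),)),
    Feature(4, "Amenity", (("service", "sustenance"),)),
    Feature(5, "Road", (("pedestrians", "false"),)),
]

C_S = type_filter(features, "Shop")
C_A = type_filter(features, "Amenity")
C_F = C_S | C_A                                          # C_S ∪ C_A
C_P = C_A & prop_filter(features, "service", "parking")  # parking places
C_Sb = C_S & prop_filter(features, "goods", "basic")     # basic shops
C_H = type_filter(features, "Road") & prop_filter(features, "pedestrians", "false")
```

Because components are plain sets, derived components such as $C_{F}$ or $C_{W}$ reuse earlier results directly with `|` and `&`.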

Functional Implications

Functions ($\mathcal{F}$) | Logical Formula |
---|---|
${F}_{W}({C}_{Sb},{C}_{At},{C}_{W},{C}_{Sr})$ (Walkability) | $Occurrence({C}_{W},\mathbb{N})\wedge ((Occurrence({C}_{Sb},[5,\infty ))\wedge Proximity({C}_{Sb},{C}_{Sb},(0,500m])\wedge S\_Relation({C}_{W},{C}_{Sb},[\mathit{intersects}]))\vee (Occurrence({C}_{At},[1,\infty ))\wedge S\_Relation({C}_{W},{C}_{At},[\mathit{intersects}])))$ |
${F}_{SE}({C}_{At},{C}_{Sb},{C}_{W})$ (Shopping Experience) | ${F}_{W}\wedge ((Occurrence({C}_{Sb},[5,\infty ))\wedge S\_Relation({C}_{W},{C}_{Sb},[\mathit{intersects}]))\vee (Occurrence({C}_{At},[1,\infty ))\wedge S\_Relation({C}_{W},{C}_{At},[\mathit{contains}])))$ |
${F}_{SV}({C}_{Sb})$ (Shopping Variety) | ${F}_{SE}\wedge Occurrence({C}_{Sb},[5,\infty ))$ |
${F}_{AT}({C}_{At})$ (Sh. Attractiveness) | ${F}_{SE}\wedge Occurrence({C}_{At},[1,\infty ))$ |
${F}_{SD}({C}_{Sb},{C}_{Se})$ (Sh. Orientation) | ${F}_{SE}\wedge Correlation({C}_{Sb},{C}_{Se},[2,\infty ))$ |
${F}_{SG}({C}_{Se})$ (Special Goods) | ${F}_{SE}\wedge Occurrence({C}_{Se},\mathbb{N})$ |
${F}_{CC}({C}_{Sb},{C}_{At},{C}_{Su},{C}_{W})$ (Compatible Components) | ${F}_{SE}\wedge Occurrence({C}_{Su},\mathbb{N})\wedge (Correlation({C}_{Sb}\cup {C}_{At},{C}_{Su},[5,\infty ))\vee Proximity({C}_{W},{C}_{Su},[500m,\infty )))$ |
${F}_{SO}({C}_{S},{C}_{A})$ (Shopping Opportunities) | ${F}_{SE}\wedge Occurrence({C}_{A},\mathbb{N})\wedge Correlation({C}_{S},{C}_{A},[2,\infty ))$ |
${F}_{L}({C}_{As})$ (Leisure) | ${F}_{SO}\wedge Occurrence({C}_{As},\mathbb{N})$ |
${F}_{E}({C}_{Ae})$ (Entertainment) | ${F}_{SO}\wedge Occurrence({C}_{Ae},\mathbb{N})$ |
${F}_{LS}({C}_{Al})$ (Luxury Services) | ${F}_{SO}\wedge Occurrence({C}_{Al},\mathbb{N})$ |
${F}_{Resupply}({C}_{W},{C}_{H})$ | ${F}_{SE}\wedge Occurrence({C}_{H},\mathbb{N})\wedge Proximity({C}_{W},{C}_{H},[0,1000m])$ |
${F}_{AD}({C}_{W},{C}_{P})$ (Access to Drivers) | ${F}_{W}\wedge Occurrence({C}_{P},[1,\infty ))\wedge (S\_Relation({C}_{W},{C}_{P},[\mathit{intersects}])\vee Proximity({C}_{W},{C}_{P},[0,200m]))$ |
${F}_{AN}({C}_{W},{C}_{B})$ (Access to Non-drivers) | ${F}_{W}\wedge Occurrence({C}_{B},[1,\infty ))\wedge (S\_Relation({C}_{W},{C}_{B},[\mathit{intersects}])\vee Proximity({C}_{W},{C}_{B},[0,200m]))$ |
${F}_{WS}({C}_{H},{C}_{W})$ (Walking Safety) | ${F}_{W}\wedge Occurrence({C}_{H},\mathbb{N})\wedge S\_Relation({C}_{W},{C}_{H},[\mathit{disjoint}])$ |
${F}_{WO}({C}_{S},{C}_{A})$ (Well-Organized) | ${F}_{SE}\wedge Occurrence({C}_{A},\mathbb{N})\wedge S\_Configuration({C}_{S},{C}_{A},[\mathit{clustered}])$ |
${F}_{VP}({C}_{Av},{C}_{W})$ (Visually Pleasing) | ${F}_{W}\wedge Occurrence({C}_{Av},\mathbb{N})\wedge (S\_Relation({C}_{W},{C}_{Av},[\mathit{intersects}])\vee Proximity({C}_{W},{C}_{Av},[0,200m]))$ |
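The functions combine predicates such as $Occurrence$ and $Proximity$ with logical connectives. The sketch below evaluates only the basic-shop clause of ${F}_{W}$, $Occurrence({C}_{Sb},[5,\infty))\wedge Proximity({C}_{Sb},{C}_{Sb},(0,500m])$, and deliberately leaves out the $S\_Relation$ term. It makes several assumptions. Features are treated as points with metre coordinates. $Proximity$ is read as "every feature of the first component has some feature of the second within the distance interval", which is a hypothetical interpretation. The function names are illustrative only.

```python
import math


def occurrence(component, lo, hi=math.inf):
    """Occurrence(C, [lo, hi)): the number of features in C lies in [lo, hi)."""
    return lo <= len(component) < hi


def proximity(c1, c2, lo, hi):
    """Hypothetical reading of Proximity(C1, C2, (lo, hi]): every point of C1 has at
    least one point of C2 at a Euclidean distance d with lo < d <= hi (metres).
    The open lower bound means a point never counts as its own neighbour when lo = 0."""
    return all(any(lo < math.dist(p, q) <= hi for q in c2) for p in c1)


def basic_shop_clause(basic_shops):
    # Occurrence(C_Sb, [5, inf)) AND Proximity(C_Sb, C_Sb, (0, 500 m]);
    # the S_Relation(C_W, C_Sb, [intersects]) term of F_W is deliberately omitted.
    return occurrence(basic_shops, 5) and proximity(basic_shops, basic_shops, 0, 500)


clustered = [(100.0 * i, 0.0) for i in range(5)]   # five shops 100 m apart
sparse = [(1000.0 * i, 0.0) for i in range(5)]     # five shops 1 km apart
```

The clustered shops satisfy the clause. Four shops fail the occurrence bound, and the sparse layout fails the 500 m proximity bound.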

Scoring Function

$${F}_{SE}\ast {F}_{W}\ast ({F}_{SD}+{F}_{SO}+{F}_{SA}+{F}_{SG}+{F}_{L}+{F}_{E}+{F}_{LS}+{F}_{AD}+{F}_{AN}+{F}_{R}+{F}_{WS}+{F}_{VP}+{F}_{WO})\ast \mathit{error}$$
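In the scoring function, ${F}_{SE}$ and ${F}_{W}$ act as multiplicative gates on the sum of the remaining functions. The sketch below assumes each function evaluates to 0 or 1 and treats $\mathit{error}$ as a plain multiplicative factor. Neither assumption is stated in the appendix. Term names are kept exactly as they appear in the formula, including ${F}_{SA}$ and ${F}_{R}$. `pattern_score` is a hypothetical helper name.

```python
# Summed terms of the scoring function, named exactly as in the formula.
SUMMED = ["F_SD", "F_SO", "F_SA", "F_SG", "F_L", "F_E", "F_LS",
          "F_AD", "F_AN", "F_R", "F_WS", "F_VP", "F_WO"]


def pattern_score(values, error=1.0):
    """F_SE * F_W * (F_SD + ... + F_WO) * error.

    `values` maps function names to truth values (True/False or 1/0); missing
    names count as 0 (assumption). F_SE and F_W act as gates: if either is 0,
    the score is 0. `error` is treated as a plain multiplicative factor (assumption).
    """
    gate = int(values.get("F_SE", 0)) * int(values.get("F_W", 0))
    total = sum(int(values.get(name, 0)) for name in SUMMED)
    return gate * total * error


region = {"F_SE": True, "F_W": True, "F_SO": True, "F_L": True, "F_AD": True}
```

For `region`, three summed functions hold, so the score is 3.0. Setting `F_W` to `False` drives the score to 0 regardless of the other terms.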

**Figure 1.** Overview of the proposed framework fusing knowledge-based and data-driven approaches. Latent Dirichlet allocation (LDA).

**Figure 8.** Results combining data-to-knowledge and knowledge-to-data fusion in the Denver metropolitan area.

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

