Article

Geospatial Semantics Analysis of the Qinghai–Tibetan Plateau Based on Microblog Short Texts

1 State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, CAS, Beijing 100101, China
2 College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100049, China
* Author to whom correspondence should be addressed.
ISPRS Int. J. Geo-Inf. 2021, 10(10), 682; https://doi.org/10.3390/ijgi10100682
Submission received: 13 July 2021 / Revised: 29 September 2021 / Accepted: 5 October 2021 / Published: 10 October 2021
(This article belongs to the Special Issue Geovisualization and Social Media)

Abstract

Place descriptions record qualitative information about places and their spatial relationships; thus, the geospatial semantics of a place can be extracted from place descriptions. In this study, geotagged microblog short texts posted in 2017 in the Tibet Autonomous Region and Qinghai Province were used to extract the place semantics of the Qinghai–Tibetan Plateau (QTP). ERNIE, a knowledge-enhanced language representation model, was employed to extract thematic topics from the microblog short texts, which were then geolocated and used to analyze the place semantics of the QTP. Because tourists publish a large number of microblogs in both Qinghai and Tibet, we separated the texts into four datasets according to the user, i.e., local users in Tibet, tourists in Tibet, local users in Qinghai, and tourists in Qinghai, to explore the place semantics of the QTP from different perspectives. The results revealed clear spatial variability in the thematic topics. Tibet is characterized by travel- and scenery-related language, whereas Qinghai is characterized by emotion-, work-, and beauty salon-related language. Human cognition of place semantics differs between local residents and tourists, and the difference is greater in Tibet than in Qinghai. Weibo texts also indicate that local residents and tourists are concerned with different aspects of the same thematic topics. The cities on the QTP can be classified into three groups according to their geospatial semantic components, i.e., tourism-focused, life-focused, and religion-focused cities.

1. Introduction

Semantics refers to the meaning of expressions in a language and includes realistic semantics and cognitive semantics. In cognitive semantics, the meanings of language expressions are related to human cognitive ability [1]. When referring to space, semantics deals with the meaning of spatial language [2]; this is an interdisciplinary research area combining Geographic Information Science (GIScience), cognitive science, artificial intelligence (AI), and the Semantic Web [3,4]. In linguistics, spatial semantics typically concerns how languages structure space and schematize spatial relations from perceptual representations and world knowledge, which are products of spatial cognition [5,6]. In geography, geospatial semantics analyzes the meaning of digital referents at the geographic scale and involves the concepts of geographical entities and ontology; its purpose is to support the semantic interoperability of geo-referenced information [3,4].
Egenhofer distinguished three types of geospatial semantics: the semantics of geospatial entity classes, of spatial predicates, and of geospatial names [7], all of which have been widely researched [8,9,10,11,12,13,14,15,16]. Beyond the classes, names, and spatial relations of geographical entities, the entities themselves also carry meaning; this is known as place semantics [17].
Place semantics derive from the natural attributes of geographical entities and the human activities surrounding them, and can be captured through human descriptions and human–place interactions. Research on spatial cognition from a semantics perspective can help us better capture the meaning of a place, thereby enriching place attributes in computational models [18]. Moreover, place semantics are tied to human experience, and people may perceive the same place differently. Numerous studies have investigated the ways in which different cultures categorize geographic features [11,19], revealing significant differences in how cultures perceive space–time and spatial relations [20,21,22,23]. Research on people’s perceptions of a place can therefore help provide services better tailored to their needs. However, relatively little research has addressed the cognition of semantic information related to geographical spaces.
Place descriptions in sources such as news articles, local chronicles, social media texts, and travel diaries are a means of communicating spatial information and a form of mental representation of human spatial cognition [24,25]. By exploring human mobility and activities via place descriptions, we can extract rich semantic information about places [26,27]. The growing popularity of social media platforms such as Twitter (US) and Sina Weibo (China), whose georeferenced posts record the public’s perception of locations, has created a valuable source of place descriptions. Social media texts can therefore be used to study semantic cognition in relation to geographical space.
The Qinghai–Tibetan Plateau (QTP) is a unique region in the world. This large area contains various landscapes, leading to rich geospatial semantics. Its high altitude and poor accessibility have caused long-term geographical, social, and cultural isolation, effectively preserving the unique culture, customs, and lifestyle of the indigenous people. Moreover, the various landscapes and special ethnic customs attract a large number of tourists each year. Thus, studying the geospatial semantic characteristics and variability in the QTP, as well as differences in the geospatial semantic cognition between locals and tourists, can reveal the emotions and perceptions of different groups and improve our understanding of the culture and religion of indigenous peoples.
This study analyzes spatial cognition in the QTP from a semantics perspective using social media texts. There are two main research questions: (1) Do locals and tourists in the QTP exhibit different cognition in relation to place semantics? (2) Is there spatial variability in the place semantics related to the QTP? Therefore, the objective of this study is to investigate place semantics related to the QTP according to different groups of people and spatial locations, and classify cities in the QTP according to their place semantics. The remainder of the paper is organized as follows. Section 2 reviews related work and background knowledge. Section 3 explains the study area and data. Section 4 describes the methodologies and the framework. Section 5 presents the analytical results and discussion. Section 6 summarizes the main contributions of this study and highlights future research directions.

2. Related Work

2.1. Geospatial Semantics

Kuhn defined geospatial semantics as “understanding GIS contents and capturing this understanding in formal theories” [4]. This definition expresses the fact that geospatial semantics involves the human cognition and formal modeling of geographic concepts. There is a large body of literature on geospatial cognition, ranging from behavioral geography and mental representation to linguistic descriptions of geospaces [28,29,30,31,32]. Research on the formal modeling of geographic concepts includes geographical ontology, digital gazetteers, geographical information retrieval, and linked data [33,34,35,36,37,38,39].
Recent work on geospatial semantics has focused on eliciting semantic information from semi-structured and unstructured resources [17,34,39]. The semantics of a place or a geospatial entity not only originate from its natural attributes, but also from human activities in the place. Cai et al. [40] used geospatial semantics to represent the meaning of a place, which is related to the functions provided by the place as well as human activities within the place. A place may provide many functions where people can engage in various activities; thus, it can have multiple meanings, which can be inferred from human mobility and activities [41]. Increasing amounts of crowdsourced big data, such as mobile data, smart car data, social media data, and points of interest (POIs), reveal patterns of human mobility and activities. These data can therefore be used to extract the multiple activity-related semantics of a place. For example, Gao et al. [42] identified urban functional regions using POIs and user check-in data on social media, Wang et al. [43] detected the geospatial semantics of urban regions based on POI categories, and Tu et al. [44] and Cai et al. [40] interpreted dynamic urban functions and spatial semantics through human activities via mobile phone and positioning data.
Place descriptions depict the qualitative characteristics of geographic locations from multiple perspectives, and can be a rich source of geospatial semantics. For example, Hu et al. [26] extracted the place semantics of cities from news articles and Huang [27] categorized geographic features using text documents. Geographically referenced social media texts provide a vast and valuable source of place descriptions; thus, they have previously been used to extract the semantic information of places. Steiger et al. [45] analyzed the spatiotemporal and semantic characteristics of georeferenced Tweets and found that the extracted spatiotemporal and semantic clusters of Tweets indicated the human activity patterns and urban structure. Moreover, Chen et al. [46] extracted and analyzed the hidden semantics of regions from georeferenced social media data using the Latent Semantic Analysis method. Furthermore, Lansley and Longley extracted topics of geo-tagged Tweets posted in London, UK, and found clear spatial and temporal variations in topics and attitudes [47]. Georeferenced social media data are increasingly used to study spatial regions and human activities, as well as geospatial semantics, due to their characteristics of large volume, easy acquisition and timeliness [41,46,47].

2.2. Natural Language Processing Model

Latent Dirichlet allocation (LDA) is the most widely used method for extracting topics from corpora [42,45,48]. LDA is a generative document model that assumes documents exhibit a joint probability distribution over thematic topics and words [49]. LDA works well for longer texts but struggles with short texts such as Weibos and Tweets because of data sparsity and less focused topics. To overcome this problem, short texts may be grouped into longer pseudo-documents according to individual users or locations [46,50]. However, grouping is not always suitable, as it may be important to determine the thematic topic of each individual short text. Some scholars have addressed data sparsity by drawing on external knowledge or combining LDA with other models, such as word2vec [51]. Although these methods somewhat improve the accuracy of the results, static word embeddings consider neither polysemy nor context.
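The grouping workaround described above can be sketched as follows. This is a minimal illustration, not the procedure used in [46,50]; the field names `user_id`, `region`, and `tokens` are hypothetical. Each group becomes one long pseudo-document for LDA, which also makes the trade-off visible: a topic is then assigned to the whole group rather than to each individual post.

```python
from collections import defaultdict


def build_pseudo_documents(posts, key):
    """Concatenate the tokens of all short posts that share the same value of `key`.

    posts: iterable of dicts holding a "tokens" list and grouping fields
           (hypothetical field names such as "user_id" or "region").
    Returns {group value: list of tokens}, i.e., one pseudo-document per group.
    """
    grouped = defaultdict(list)
    for post in posts:
        grouped[post[key]].extend(post["tokens"])
    return dict(grouped)


posts = [
    {"user_id": "u1", "region": "Lhasa", "tokens": ["布达拉宫", "旅行"]},
    {"user_id": "u2", "region": "Lhasa", "tokens": ["寺庙", "转经"]},
    {"user_id": "u1", "region": "Xining", "tokens": ["加班"]},
]
by_user = build_pseudo_documents(posts, "user_id")
by_region = build_pseudo_documents(posts, "region")
```

Grouping by user and grouping by region give different pseudo-documents from the same posts, so the choice of key shapes which topics LDA can find.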
Recent attempts to solve this problem have used neural networks and a series of pre-trained natural language processing models, such as ELMo (Embeddings from Language Models), GPT (Generative Pre-Training), and BERT (Bidirectional Encoder Representations from Transformers) [52,53,54]. By producing context-aware representations, most of them built on attention mechanisms, these models have greatly improved the results of natural language processing. For example, BERT uses a “masked language model” that masks a certain percentage of the words in a sentence and learns to predict those masked words [52]. Building on these models, ERNIE (Enhanced Representation through kNowledge IntEgration), a knowledge-enhanced language representation model, was proposed [55]. ERNIE is a deep learning model for constructing language representations. Its architecture uses a bidirectional multilayer Transformer as the basic encoder, whose self-attention mechanism captures the contextual information of each word. Whereas BERT randomly masks individual words in the input sentences, ERNIE adopts knowledge-masking strategies at the phrase and entity levels and learns the prior knowledge of phrases and entities during training. Thus, knowledge and longer semantic dependencies can be learned, such as the relationships between entities, the properties of an entity, and the type of an event. In this way, ERNIE learns the semantic relationships between entities and concepts, which greatly enhances its general semantic representation ability. ERNIE was developed by the Chinese company Baidu and has outperformed state-of-the-art models on Chinese language processing tasks [55]. As such, ERNIE has been employed for various Chinese tasks such as natural language inference, semantic similarity calculation, named entity recognition, sentiment analysis, and question answering.
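To make the difference between the two masking strategies concrete, the sketch below contrasts BERT-style random token masking with ERNIE-style phrase/entity masking on a character-tokenized Chinese sentence. It is a simplified illustration under stated assumptions: the entity spans are supplied by hand (hypothetical), and BERT's practice of replacing some selected tokens with random or unchanged tokens instead of `[MASK]` is omitted.

```python
import random

MASK = "[MASK]"


def random_token_mask(tokens, rate, rng):
    """Basic-level masking: every token is masked independently with probability `rate`."""
    return [MASK if rng.random() < rate else tok for tok in tokens]


def span_mask(tokens, spans, rate, rng):
    """Phrase/entity-level masking: each span is masked as a whole or left intact.

    spans: (start, end) index pairs, end exclusive, marking phrases or entities.
    Here they are given by hand; ERNIE derives them from phrase and entity knowledge.
    """
    out = list(tokens)
    for start, end in spans:
        if rng.random() < rate:
            out[start:end] = [MASK] * (end - start)
    return out


chars = list("青海湖位于青海省")  # "Qinghai Lake is located in Qinghai Province"
entities = [(0, 3), (5, 8)]      # 青海湖 (Qinghai Lake), 青海省 (Qinghai Province)
rng = random.Random(0)
print(random_token_mask(chars, 0.15, rng))
print(span_mask(chars, entities, 0.5, rng))
```

With random masking, a model can often recover a masked character from its neighbours inside the same entity; masking the whole entity forces it to rely on the wider context, which is the intuition behind ERNIE's knowledge masking.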
Considering its ability to process Chinese language, ERNIE was adopted in this study to process Weibo social media texts.

3. Study Area and Data

The QTP is a unique part of the world with an average elevation of over 4500 m. It is the highest plateau in the world and is known as both “the roof of the world” and “the third pole”. The uplift of the terrain and surrounding ranges has created a geographically isolated region whose unique climate and physical environment have fostered not only splendid natural landscapes but also distinct ethnic cultures. Therefore, studying the semantic characteristics of the QTP can help us better understand the culture and sentiments of its indigenous people. Tibet and Qinghai are two provincial administrative units of China that lie entirely within the QTP and together occupy most of the plateau; therefore, these two units were chosen as the study area (Figure 1). Furthermore, tourists attracted by the beautiful scenery and exotic customs generate numerous geotagged Weibos, from which their feelings and perceptions of the QTP can be extracted and compared with those of indigenous peoples.
In this study, we obtained 1,279,455 geotagged Weibo posts, of which 419,157 were posted in Tibet and 860,298 in Qinghai. After data cleaning (tokenization and removal of emojis, duplicate posts, and stop words), 333,420 posts from Tibet and 710,475 posts from Qinghai remained (Table 1). Figure 1 displays the spatial distribution of these Weibo posts. As Qinghai and Tibet differ in both their physical and social environments, we separated the Weibo texts into four datasets according to the type and origin of the user: local users in Tibet, tourists in Tibet, local users in Qinghai, and tourists in Qinghai. This classification allowed us to compare the semantic differences between the two provinces and the cognitive differences between indigenous people and tourists.
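The cleaning steps listed above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the emoji code-point ranges, the pattern for bracketed Weibo emoticon codes such as `[哈哈]`, and the stop-word list are assumptions. The tokenizer is passed in as a function; a Chinese word segmenter (for example, jieba's `lcut`) would normally be used, while the demo uses whitespace-separated text so that it runs without extra packages.

```python
import re

# Assumed patterns: common emoji code-point ranges and Weibo-style emoticon codes like "[哈哈]".
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\U00002600-\U000027BF]")
EMOTICON_RE = re.compile(r"\[[^\[\]]{1,8}\]")


def clean_posts(texts, tokenize, stop_words):
    """Strip emojis, drop duplicate posts and stop words, and tokenize each post.

    Posts that are empty after cleaning are discarded.
    """
    seen = set()
    cleaned = []
    for text in texts:
        text = EMOTICON_RE.sub("", EMOJI_RE.sub("", text)).strip()
        if not text or text in seen:  # skip empty posts and exact duplicates
            continue
        seen.add(text)
        tokens = [t for t in tokenize(text) if t.strip() and t not in stop_words]
        if tokens:
            cleaned.append(tokens)
    return cleaned


raw = ["今天 去 布达拉宫 😀", "今天 去 布达拉宫 😀", "的 了", "[哈哈] 青海湖 很 美"]
result = clean_posts(raw, str.split, stop_words={"的", "了", "很", "去"})
```

Here the duplicate post and the post consisting only of stop words are dropped, leaving two token lists.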

4. Research Framework and Methodology

Each geotagged Weibo post has a thematic topic that refers to some semantic aspect of a place. The thematic topics of multiple Weibos in a specific region therefore represent the semantic structure of the region and reflect people’s knowledge and perception of it. First, we extracted the topic of each Weibo, which was taken as a semantic description of the place where the Weibo was posted. Then, we analyzed the spatial distributions of the different semantic descriptions. Finally, geospatial semantics were compared at the city level, and cities were grouped according to their semantic similarity (Figure 2).

4.1. Weibo Thematic Topic Extraction

The natural language processing model ERNIE 1.0, proposed by Baidu, was used for thematic topic extraction. The experiment was conducted with Python 3.7, the Baidu AI Studio platform, and ArcGIS 10.2. The model takes Weibo embeddings as input and outputs a probability distribution over topics.
ERNIE is a pre-trained deep learning natural language model that can fulfill many natural language tasks including topic prediction and text classification. However, to improve the accuracy of topic prediction for our datasets, we employed a small number of annotated Weibos to fine-tune the ERNIE model. There are three steps required to extract the Weibo thematic topics (Figure 2). First, we chose and annotated a training set to fine-tune the model by selecting 1200 Weibos from Tibet and labeling their topics manually. According to the hot topic tags of Sina Weibo, 39 thematic topics were finally determined and used to label the 1200 Weibos. Then, the training set was used to fine-tune the model parameters, and cross entropy was used as the loss function to evaluate the result of the model:
$$H(p, q) = -\sum_{x} \left[ p(x) \log q(x) + \left(1 - p(x)\right) \log\left(1 - q(x)\right) \right]$$
where p is the probability distribution of the expected topic output and q is the probability distribution of the actual (predicted) topic output. The smaller the cross entropy, the closer the two probability distributions. Owing to the limited amount of labeled data, 10-fold cross-validation was adopted to train and test the model. During fine-tuning, the weight decay was set to 0.1, the learning rate to 5 × 10⁻⁵, and the batch size to 64. The loss function converged after approximately 50 epochs, and the overall accuracy of the model reached a maximum of approximately 78%. Finally, the fine-tuned model was used to classify the Weibo texts and identify the thematic topic of each Weibo.
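For reference, the cross-entropy loss defined in this section can be computed directly. The sketch below evaluates H(p, q) in its per-topic binary form, with p as the expected topic distribution (a one-hot label here) and q as the model output. The clipping constant `eps` is an added assumption that keeps the logarithms finite, and the snippet is an illustration, not the training code used with ERNIE.

```python
import math


def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_x [p(x) log q(x) + (1 - p(x)) log(1 - q(x))].

    p: expected probability of each topic (e.g., a one-hot label vector).
    q: predicted probability of each topic.
    eps clips q into (0, 1) so that neither logarithm diverges.
    """
    total = 0.0
    for px, qx in zip(p, q):
        qx = min(max(qx, eps), 1.0 - eps)
        total += px * math.log(qx) + (1.0 - px) * math.log(1.0 - qx)
    return -total


label = [1.0, 0.0, 0.0]  # the Weibo belongs to the first of three topics
close = cross_entropy(label, [0.8, 0.1, 0.1])  # about 0.434
far = cross_entropy(label, [0.2, 0.4, 0.4])    # about 2.631
```

The prediction that puts more probability on the labeled topic yields the smaller loss, matching the interpretation that a smaller cross entropy means closer distributions.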

4.2. Spatial Distribution of Place Semantics

Because the QTP covers a large area and Weibos are unevenly distributed, the different thematic topics were concentrated in specific areas. It is therefore difficult to judge the importance of a semantic description in a given region, or to compare the spatial distributions of different semantic descriptions, from the raw number of Weibos containing each topic. Instead, the proportion of Weibos containing a given topic within a given area indicates the strength of that topic there. If there are n topics and m regions, the proportion of Weibos containing topic t in region r can be expressed as follows:
$$p_t^r = \frac{N_t}{\sum_{i=1}^{n} N_i}$$
where $N_t$ is the number of Weibos with topic t in region r, and the denominator sums these counts over all n topics, i.e., it is the total number of Weibos in region r. If topic t is more significant in region r than in any other region, then $p_t^r > p_t^j$ for all $1 \le j \le m$, $j \ne r$. We therefore used the distribution of each topic’s proportion to analyze the distribution of semantic descriptions.
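The proportion $p_t^r$ and the significance condition can be sketched in a few lines. This minimal illustration assumes each Weibo has already been reduced to a (region, topic) pair; the region names and counts below are invented for the demo and are not results of this study. The example also shows why proportions rather than counts are used: Xining has more religion-related Weibos than Lhasa in absolute terms, but a much smaller share.

```python
from collections import Counter, defaultdict


def topic_proportions(records):
    """Return {region: {topic: p_t^r}} from (region, topic) pairs, one pair per Weibo."""
    counts = defaultdict(Counter)
    for region, topic in records:
        counts[region][topic] += 1
    return {
        region: {topic: n / sum(c.values()) for topic, n in c.items()}
        for region, c in counts.items()
    }


def most_significant_region(props, topic):
    """Region r with p_t^r > p_t^j for every other region j, or None if the maximum is tied."""
    values = {region: p.get(topic, 0.0) for region, p in props.items()}
    best = max(values, key=values.get)
    if sum(1 for v in values.values() if v == values[best]) > 1:
        return None
    return best


records = (
    [("Lhasa", "religion")] * 2 + [("Lhasa", "travel")]
    + [("Xining", "religion")] * 3 + [("Xining", "work")] * 9
)
props = topic_proportions(records)
```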
Specifically, we divided the study area into 20 km × 20 km grid cells and calculated the proportion of each topic in each cell. Spatial interpolation was then conducted on the topic proportions to identify and compare the continuous spatial distributions of the different thematic topics.
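The gridding step can be sketched as follows, assuming the Weibo coordinates have already been projected to metres. The interpolation method is not named in the text (the analysis was carried out in ArcGIS 10.2), so the inverse distance weighting (IDW) function here is only one plausible choice shown for illustration; the point coordinates are made up.

```python
import math
from collections import Counter

CELL_M = 20_000  # 20 km x 20 km cells; coordinates assumed to be in projected metres


def grid_topic_proportion(points, topic, cell=CELL_M):
    """Share of Weibos with `topic` in each grid cell.

    points: iterable of (x, y, topic) tuples.
    Returns {(col, row): proportion} for every cell containing at least one Weibo.
    """
    totals, hits = Counter(), Counter()
    for x, y, t in points:
        key = (math.floor(x / cell), math.floor(y / cell))
        totals[key] += 1
        if t == topic:
            hits[key] += 1
    return {key: hits[key] / n for key, n in totals.items()}


def cell_centres(proportions, cell=CELL_M):
    """Turn {(col, row): value} into (x, y, value) samples located at cell centres."""
    return [((c + 0.5) * cell, (r + 0.5) * cell, v) for (c, r), v in proportions.items()]


def idw(samples, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (sx, sy, value) samples."""
    num = den = 0.0
    for sx, sy, v in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return v  # exactly on a sample point
        w = d ** -power
        num += w * v
        den += w
    return num / den


points = [(1_000, 1_000, "travel"), (5_000, 2_000, "food"), (25_000, 1_000, "travel")]
travel = grid_topic_proportion(points, "travel")
samples = cell_centres(travel)
midpoint = idw(samples, 20_000, 10_000)
```

Halfway between the two cell centres, the IDW estimate is the average of the two cell proportions (0.5 and 1.0).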

4.3. Geospatial Semantic Differences and Clustering

According to the proportion of each thematic topic in region r, the geospatial semantics of the region can be represented by a vector composed of the proportion of each topic:
$$\mathrm{geoSem}_r = \left[p_1^r, p_2^r, \ldots, p_n^r\right]$$
Then, the semantic similarity between two regions, r1 and r2, can be measured by the angle between their vectors, i.e., by cosine similarity:
$$\mathrm{similar}(r_1, r_2) = \frac{\mathrm{geoSem}_{r_1} \cdot \mathrm{geoSem}_{r_2}}{\lVert \mathrm{geoSem}_{r_1} \rVert \, \lVert \mathrm{geoSem}_{r_2} \rVert}$$
where · denotes the dot product and ‖·‖ denotes the Euclidean norm (magnitude) of a vector.
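As a minimal sketch, the similarity formula can be computed from two geoSem vectors that list topic proportions in the same topic order. The three-topic vectors below are invented for illustration. Because all proportions are non-negative, the similarity lies between 0 and 1, and vectors that differ only by a scale factor score 1.

```python
import math


def cosine_similarity(a, b):
    """similar(r1, r2): dot product divided by the product of the Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Topic order (assumed): travel, religion, work
geo_sem_a = [0.6, 0.3, 0.1]
geo_sem_b = [0.5, 0.4, 0.1]
geo_sem_c = [0.0, 0.1, 0.9]
print(cosine_similarity(geo_sem_a, geo_sem_b))  # close to 1: similar semantic structures
print(cosine_similarity(geo_sem_a, geo_sem_c))  # much lower: different semantic structures
```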
Taking prefecture-level cities as units, we represented the geospatial semantics of each city as a vector composed of the proportions of the thematic topics. Chi-square statistics were used to test for significant differences between the geospatial semantics of different cities (Figure 3). Cosine similarity was used to estimate the geospatial semantic similarity between cities, and cities with similar semantic structures were grouped by hierarchical clustering based on this similarity. Hierarchical clustering is an unsupervised classification method that groups similar cities into clusters according to their pairwise similarity and generates a tree (dendrogram) indicating the hierarchy of the clusters. The resulting semantic patterns in the QTP were then displayed.
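The linkage criterion used for the hierarchical clustering is not stated, so the following is only a sketch under an assumed average linkage on cosine distance (1 − similarity), written in plain Python so that the merging logic is visible. In practice, SciPy's `scipy.cluster.hierarchy.linkage` (with `method="average"`, `metric="cosine"`) and `fcluster` would build and cut the same kind of dendrogram. The city vectors are invented toy values, not the study's data.

```python
import math


def cosine_similarity(a, b):
    """Dot product divided by the product of the Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def agglomerative_clusters(vectors, k):
    """Average-linkage agglomerative clustering on cosine distance, stopped at k clusters.

    vectors: {city name: geoSem vector}. Returns the clusters as sorted lists of names.
    """
    names = list(vectors)
    dist = {(a, b): 1.0 - cosine_similarity(vectors[a], vectors[b]) for a in names for b in names}
    clusters = [[name] for name in names]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean pairwise distance between members of the two clusters
                pairs = [dist[(a, b)] for a in clusters[i] for b in clusters[j]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair of clusters
        del clusters[j]                          # j > i, so index i is unaffected
    return sorted(sorted(c) for c in clusters)


# Toy geoSem vectors (topic order assumed: travel, religion, beauty salon)
cities = {
    "Nyingchi": [0.8, 0.1, 0.1],
    "Shannan": [0.7, 0.2, 0.1],
    "Yushu": [0.1, 0.8, 0.1],
    "Golog": [0.2, 0.7, 0.1],
    "Changdu": [0.1, 0.1, 0.8],
}
groups = agglomerative_clusters(cities, k=3)
```

Stopping at k clusters corresponds to cutting the dendrogram at the level where k branches remain.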

5. Results and Discussion

5.1. Weibo Thematic Topics in the QTP

The topic of each Weibo in the four datasets was extracted using ERNIE. The results show that Weibos related to life and emotional expression make up the majority of all four datasets. Based on the topic counts across the four datasets, the 20 topics with the largest numbers of Weibos were selected for further analysis.
Figure 3 shows the distribution and standardized residuals of the top 20 topics in the four datasets. The Chi-square statistics imply significant differences in the distribution of topics between the four datasets. From the perspective of local users, thematic topics related to travel, scenery, food, religion, and photography are more common in Tibet than in Qinghai, whereas topics related to beauty salons, work, and emotional expression are significantly more common in Qinghai than in Tibet. From the perspective of tourists, thematic topics related to life and emotional expression are more common in Qinghai, whereas travel, scenery, religion and food topics are more common in Tibet, indicating that tourism resources are more attractive to tourists in Tibet than in Qinghai. Spatially, there is a significant difference between the topics of concern for tourists and residents in Tibet. In Tibet, the Weibos of tourists are more related to travel, beauty salons, religion and toponyms but less related to life, emotional expression, food, festivals, and photography than the Weibos of residents. In Qinghai, the Weibos of tourists are more related to life, emotional expression, music, film and TV but less related to beauty salons and work than the Weibos of residents. However, there is little difference in the proportion of travel, scenery, and religion topics between tourists and residents, which again indicates less interest in tourism resources in Qinghai.
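The Chi-square test and standardized residuals referred to above can be illustrated with a small contingency table whose rows are datasets and whose columns are topics. This is a sketch, not the authors' code: the counts are invented, and the residual is computed as (O − E)/√E, one common definition of a standardized residual; which variant was plotted is not stated. A positive residual marks a topic that is over-represented in a dataset. SciPy's `scipy.stats.chi2_contingency` returns the same statistic together with a p-value for tables larger than 2 × 2 (or with `correction=False`).

```python
import math


def chi_square_with_residuals(table):
    """Pearson Chi-square statistic, degrees of freedom, and (O - E) / sqrt(E) residuals.

    table: rows (e.g., datasets or cities) of observed counts per column (e.g., topics).
    Expected counts are derived from the row and column totals under independence.
    """
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    chi2 = 0.0
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (observed - expected) ** 2 / expected
            res_row.append((observed - expected) / math.sqrt(expected))
        residuals.append(res_row)
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof, residuals


# Invented counts: rows = residents, tourists; columns = travel, life
counts = [[30, 10], [10, 30]]
chi2, dof, residuals = chi_square_with_residuals(counts)
```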
Thus, travel, scenery, religion, and food are more important topics in Tibet than in Qinghai, whereas emotional expression, work, and beauty salons are less important, which reflects the fact that Tibet has stronger tourism characteristics than Qinghai. The relatively low level of modernization in Tibet, together with cultural, historical, and geographical factors, leads to greater lifestyle differences between Tibet and other provinces, and therefore a stronger perception of tourism among Weibo users in Tibet. The geographical location, culture, and lifestyle of Qinghai are very close to those of other inland provinces in China, which may explain the stronger perception of lifestyle-related topics and the lower emphasis on travel-related topics in Qinghai.

5.2. Spatial Distribution of Place Semantics

Travel, scenery, religion, food, emotions, and work are the main topics extracted from Weibos in the QTP. Figure 4 and Figure 5 show the spatial distributions of these topics for residents and tourists, respectively. Each topic represents a form of geospatial semantics; therefore, the spatial distribution of different topics reflects the distribution of geospatial semantics. Figure 6 displays the continuous spatial distribution of these topics after interpolation, exhibiting the spatial variations in the perception strength of different topics for residents and tourists (left and middle columns, respectively). The right column indicates the difference between the two, reflecting the difference in semantic cognition between residents and tourists. The green color indicates that residents feel more strongly than tourists about a topic, whereas the red color indicates that tourists feel more strongly than residents about a topic.
According to the spatial distributions of travel-related semantics, tourists have a stronger perception of travel than residents across the QTP, especially along the road network. For residents, travel-related semantics are more scattered, except for an obvious hot spot at the northern margin of the Tsaidam Basin in Qinghai (Figure 6a); most Weibos in this region are travel advertisements, indicating a strong desire to develop tourism. Regarding scenery-related semantics, residents’ perception of scenery is strongest close to large residential areas, whereas that of tourists is strongest along the road network, especially along the Qinghai–Tibet Railway and the national highway between Shigatse and Ali (Figure 6b). Regarding food-related semantics, the hot spots for residents are predominantly distributed in Qinghai and the eastern part of Tibet, with most Weibos related to special local products, such as wolfberry, cordyceps sinensis, and dried yak meat. Food-related hot spots are more scattered for tourists (Figure 6c). Weibo texts indicate that the food-related topics of most concern to tourists in Qinghai and eastern Tibet are special local products, such as cordyceps sinensis and dried yak meat, whereas those in western Tibet are more general food topics owing to the lack of restaurants along the road. The hot spots of emotion-related semantics for both residents and tourists are typically in densely populated areas such as eastern Qinghai and southern Tibet; however, residents have a much stronger perception of emotion-related semantics than tourists (Figure 6d). Hot spots of work-related semantics are found in urban areas for residents, whereas for tourists they are found only in Naqu and Haixi, where the Weibo texts reveal that the publishers are migrant workers from other provinces (Figure 6e).
Regarding religion-related semantics, obvious hot spots occur for both residents and tourists in southern Yushu and northern Changdu, with stronger perception among local residents, indicating a strong religious environment in these areas (Figure 6f).
In general, place semantics exhibit clear spatial variations, as do tourists’ and residents’ perceptions of these semantics. Tourists perceive travel- and scenery-related semantics more strongly, whereas residents perceive emotion- and religion-related semantics more strongly. The spatial variation of semantics can reveal important regional characteristics. For example, people in northern Qinghai are trying to develop tourism by attracting tourists on social media platforms, western Tibet attracts tourists but does not provide good food services, and Naqu in Tibet is home to many migrant workers.

5.3. Geospatial Semantic Differences among Cities

Due to the different concerns of Weibo users, the distribution of Weibo topics is uneven in the QTP. For example, travel accounts for the majority of topics in various regions. However, the proportion of the travel-related topics differs among different regions, which reflects regional semantic structure differences.
Figure 7 shows the distribution and standardized Chi-square residuals for the top 20 topics extracted from the Tibetan resident dataset in each Tibetan city. The Chi-square statistics show significant differences in the topic distributions of different cities, which implies different semantic structures. Travel-related semantics are strongest in Nyingchi, followed by Ngari, Lhasa, and Shannan, reflecting the substantial tourist appeal of these areas. Conversely, Tibetan residents show little tourism interest in Naqu owing to its high overall altitude and harsh natural conditions. Tourism interest among Tibetan residents is lowest in Changdu, which lies in the border region between Han and Tibetan cultures and, with its crowded buildings and population, resembles cities in other provinces of China; therefore, Changdu holds little attraction for Tibetan people. In addition, Changdu is characterized by a high proportion of beauty salon-related semantics; the Weibo texts reveal many micro-businesses in Changdu selling cosmetic products via social media. The abundance of temples in Lhasa and Shigatse explains the high levels of religious semantics in these two cities. Nyingchi exhibits high levels of travel-, scenery-, and food-related semantics, and Shannan also exhibits high levels of travel-related semantics; however, the place semantics of both cities reveal low interest in religion. These two cities are located in southeastern Tibet, which boasts a good natural environment and rich natural landscape resources; thus, they are a notable attraction for Tibetan people. Moreover, Ngari is the birthplace of Tibetan culture and the original Bon religion and contains many famous mountains and holy lakes with religious significance, which are places of pilgrimage for Tibetans and Buddhists; this explains the high levels of emotion-related semantics.
As for work-related semantics, no obvious difference is observed among cities.
Figure 8 shows the distribution and standardized residuals of the top 20 topics extracted from the Tibetan tourist dataset in each Tibetan city. Chi-square statistics again show significant differences among cities. Compared with residents, tourists in Tibet exhibit different geospatial cognition. Travel-related semantics are strongest in Changdu, the opposite of the result for Tibetan residents. This is because Changdu lies at the border of the QTP and is where most tourists enter Tibet; moreover, its relatively low altitude makes it more attractive to tourists. For tourists, religion-related semantics are most common in Lhasa, Ngari, and Shigatse; however, tourists do not have strong cognition of emotion-related semantics there. This contrasts with Tibetan residents, who have strong cognition of emotion-related semantics in Shigatse and Ngari, and indicates that tourists pay more attention to the outside world, whereas residents pay more attention to their inner emotions. However, tourists have strong cognition of life- and emotion-related semantics in Naqu. Food-related semantics are high in Lhasa, whereas health-related semantics are relatively high in Shigatse; as the average altitude of Shigatse exceeds 4000 m, many tourists feel uncomfortable in this area. Similar to residents, tourists have a strong perception of travel and scenery in Nyingchi and a strong perception of scenery in Shannan and Ngari, which implies that the rich tourism resources of Nyingchi are highly attractive to tourists.
Figure 9 shows the distribution and standardized Chi-square residuals for the top 20 topics extracted from the Qinghai resident dataset in each Qinghai city. Chi-square statistics show significantly different semantic structures among Qinghai cities. Haibei, Hainan, and Haixi are regions where Qinghai residents have strong cognition of travel and scenery; these three cities contain many natural scenic spots, which are clearly preferred by residents. Huangnan, Golog, and Yushu, which contain many temples, are areas where residents have strong cognition of religion. Xining, the capital city of Qinghai, is dominated by work- and beauty salon-related semantics for residents, reflecting the everyday concerns of residents in large cities. Food-related semantics are unusually high among residents in Yushu and Haixi: many Weibos promote wolfberry and cordyceps sinensis in Haixi, and a “love lunch” activity was launched in Yushu in 2017. Moreover, Huangnan is characterized by more charity-related semantics among residents.
Figure 10 shows the distribution and adjusted standardized Chi-square residuals for the top 20 topics extracted from the Qinghai tourist dataset in each Qinghai city. Again, Chi-square statistics show significantly different semantic structures among Qinghai cities. Similar to Qinghai residents, tourists in Qinghai have stronger cognition of travel in Haibei, Hainan, and Haixi; of religion in Huangnan, Golog, and Yushu; of fashion, work, and beauty salons in Xining; and of charity in Huangnan and Yushu.
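The city-level comparisons in Figures 7–10 rest on a Chi-square test of independence and its adjusted standardized residuals. The article does not state the formula, so the sketch below assumes the standard (Haberman) definition: for a cities × topics table of post counts with expected counts E_ij = (row total × column total) / n, each residual is (O_ij − E_ij) / sqrt(E_ij (1 − row_i / n)(1 − col_j / n)). The function name `chi_square_adjusted_residuals` and the toy counts are hypothetical, for illustration only.

```python
import numpy as np


def chi_square_adjusted_residuals(counts):
    """Chi-square test of independence on a cities x topics table of post counts.

    Returns the Chi-square statistic, its degrees of freedom, and the matrix of
    adjusted standardized residuals (Haberman), one per city-topic cell.
    """
    observed = np.asarray(counts, dtype=float)
    n = observed.sum()
    row_totals = observed.sum(axis=1, keepdims=True)  # posts per city, shape (r, 1)
    col_totals = observed.sum(axis=0, keepdims=True)  # posts per topic, shape (1, c)
    expected = row_totals @ col_totals / n            # expected counts if topic is independent of city
    chi2 = ((observed - expected) ** 2 / expected).sum()
    dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
    # Variance of (O - E) under independence, adjusted for both table margins.
    variance = expected * (1 - row_totals / n) * (1 - col_totals / n)
    residuals = (observed - expected) / np.sqrt(variance)
    return chi2, dof, residuals


# Hypothetical counts: rows are cities, columns are topics (travel, religion, food).
counts = [[120, 30, 50],
          [40, 90, 20],
          [60, 25, 70]]
chi2, dof, residuals = chi_square_adjusted_residuals(counts)
print(f"chi2 = {chi2:.1f}, dof = {dof}")
print(np.round(residuals, 2))
```

Under independence these residuals are approximately standard normal, so cells with |r| > 1.96 are commonly read as significantly over- or under-represented at the 5% level; a positive residual means that a topic appears more often in a city than the table margins alone would predict.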

5.4. Geospatial Semantic Clustering

Treating the semantic structure of each city as a vector, we calculated the semantic similarity among cities using cosine similarity and then grouped the cities of the QTP using hierarchical clustering. According to the geospatial semantic cognition of residents, the cities on the QTP fall into three categories (Figure 11a). Yushu, Golog, and Huangnan in Qinghai exhibit high levels of religion-related place semantics but low levels of other semantics. Changdu exhibits high levels of beauty salon-related place semantics and therefore forms a separate category. The remaining cities exhibit particularly high levels of travel- and scenery-related place semantics. Three categories of cities are also identified according to the geospatial semantic cognition of tourists (Figure 11b). Xining forms a category by itself owing to strong work-, emotion-, and life-related place semantics but low levels of other semantics. Yushu, Golog, Huangnan, Haidong, Lhasa, and Naqu also show strong work-, emotion-, and life-related place semantics; however, because these cities also exhibit strong religion-related place semantics, they are clustered into a separate category. The remaining cities exhibit prominent travel- and scenery-related place semantics but weak life- and emotion-related place semantics. Overall, most cities on the QTP exhibit strong travel- and scenery-related place semantics, especially for tourists, whereas Yushu, Golog, and Huangnan exhibit strong religion-related place semantics.
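The clustering step can be sketched as follows. The article names cosine similarity and hierarchical clustering but not the linkage criterion or the exact city vectors, so this sketch assumes each city is represented by a vector of topic proportions and uses average linkage on cosine distance (1 − similarity), merging clusters until three remain. The function names, city labels, and toy vectors are hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors` (rows must be non-zero)."""
    x = np.asarray(vectors, dtype=float)
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)  # scale each city vector to unit length
    return unit @ unit.T


def average_linkage_clusters(vectors, n_clusters):
    """Agglomerative clustering with average linkage on cosine distance.

    Starts with one cluster per city and repeatedly merges the pair of clusters
    with the smallest mean pairwise distance until `n_clusters` remain.
    Returns a list of clusters, each a list of row indices.
    """
    distance = 1.0 - cosine_similarity_matrix(vectors)
    clusters = [[i] for i in range(len(distance))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                d = distance[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


# Hypothetical topic-proportion vectors (columns: travel, religion, work).
cities = ["City A", "City B", "City C", "City D", "City E"]
vectors = [[0.9, 0.1, 0.0],
           [0.8, 0.2, 0.0],
           [0.1, 0.9, 0.0],
           [0.0, 0.8, 0.2],
           [0.0, 0.1, 0.9]]
groups = average_linkage_clusters(vectors, n_clusters=3)
print([[cities[i] for i in g] for g in groups])
# → [['City A', 'City B'], ['City C', 'City D'], ['City E']]
```

In practice, SciPy's `scipy.cluster.hierarchy.linkage(X, method='average', metric='cosine')` followed by `fcluster(Z, t=3, criterion='maxclust')` produces the same kind of grouping without a hand-written loop; the explicit version is shown only to make the merge rule visible.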

6. Conclusions

In this study, topics extracted from geotagged Sina Weibos posted in the QTP were used to analyze geospatial semantics in the QTP. By determining the spatial distribution of Weibo topics, the characteristics and regional differences of cognition in the QTP were analyzed from a geospatial semantic perspective.
First, residents’ cognition is more focused on life, emotional expression, food, and festivals, whereas tourists’ cognition is more focused on travel, scenery, and religion. The difference between the two is greater in Tibet than in Qinghai, reflecting the greater tourist appeal of Tibet.
Second, the spatial distribution of place semantics exhibits clear variability. Tourism- and scenery-related place semantics are widely distributed, whereas religion-related place semantics are mainly distributed in Golog, Yushu, and Changdu. Hot spots of work-related place semantics are predominantly close to residential areas. Tibet exhibits stronger cognition of travel- and scenery-related semantics than Qinghai, whereas Qinghai exhibits stronger cognition of life-related semantics, such as emotion, work, and beauty salons, indicating that Tibet has stronger tourism characteristics than Qinghai. The spatial variation of place semantics can also reveal important regional characteristics, such as the number of migrant workers or food services.
Third, the cognition of geospatial semantics differs substantially between residents and tourists. For example, residents have greater cognition of travel-related semantics in Qinghai than in Tibet, whereas tourists have greater cognition of travel-related semantics in Tibet than in Qinghai. The difference between residents and tourists is greater in Tibet: residents have the lowest cognition of travel-related semantics in Changdu, whereas tourists have the highest cognition of travel-related semantics there. Residents have strong cognition of emotion-related semantics in Ngari and Shigatse, but tourists do not; instead, tourists have greater cognition of health-related semantics in Shigatse. In Qinghai, by contrast, the perspectives of residents and tourists differ less.
Fourth, Weibo texts indicate that residents and tourists have different concerns about place semantics. Regarding travel, tourists enjoy trips, whereas residents are concerned with improving local tourism attractions. As for food, tourists care about dining on their trips, whereas residents are concerned with selling special local products.
Fifth, clustering results based on semantic similarity show that the cities of the QTP can be divided into approximately three types: tourism-focused, life-focused, and religion-focused cities; however, the categories of some cities differ depending on whether the cognition of residents or of tourists is considered. Generally, most cities in the QTP have a strong focus on tourism and scenery, especially for tourists, whereas Yushu, Golog, and Huangnan have a strong focus on religion for both residents and tourists.
This research can improve our understanding of the regional characteristics of the QTP. Furthermore, a better understanding of how geospatial semantic cognition differs among groups of people can be used to improve tourism promotion and enhance the regional attractiveness of the QTP.

Author Contributions

Conceptualization, Jun Xu; methodology, Lei Hu; software, Lei Hu; formal analysis, Jun Xu and Lei Hu; investigation, Lei Hu; writing—original draft preparation, Jun Xu and Lei Hu; writing—review and editing, Jun Xu; supervision, Jun Xu. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Strategic Priority Research Program of the Chinese Academy of Sciences, Pan-Third Pole Environment Study for a Green Silk Road, grant number XDA20040401, and the National Natural Science Foundation of China, grant numbers 41771477 and 42071376.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data sharing not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Di Donato, P. Geospatial semantics: A critical review. In Computational Science and Its Applications—ICCSA 2010, Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2010; pp. 528–544.
  2. Zlatev, J. Spatial Semantics. In The Oxford Handbook of Cognitive Linguistics; Cuyckens, H., Geeraerts, D., Eds.; Oxford University Press: New York, NY, USA, 2007; pp. 318–350.
  3. Janowicz, K.; Scheider, S.; Pehle, T.; Hart, G. Geospatial semantics and linked spatiotemporal data—Past, present, and future. Semant. Web 2012, 3, 321–332.
  4. Kuhn, W. Geospatial semantics: Why, of what, and how? In Journal on Data Semantics III; Spaccapietra, S., Zimányi, E., Eds.; Springer: Berlin/Heidelberg, Germany, 2005; pp. 1–24.
  5. Talmy, L. How language structures space. In Spatial Orientation: Theory, Research and Application; Pick, H., Acredolo, L., Eds.; Plenum Press: New York, NY, USA, 1983; pp. 225–282.
  6. Herskovits, A. Language, spatial cognition, and vision. In Spatial and Temporal Reasoning; Stock, O., Ed.; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1997; pp. 155–202.
  7. Egenhofer, M.J. Toward the semantic geospatial web. In Proceedings of the 10th ACM International Symposium on Advances in Geographic Information Systems, McLean, VA, USA, 8–9 November 2002.
  8. Xu, J. Formalize natural-language spatial relations between linear objects with topologic and metric properties. Int. J. Geogr. Inf. Sci. 2007, 21, 377–395.
  9. Shariff, R.; Egenhofer, M.J.; Mark, D. Natural-language spatial relations between linear and areal objects: The topology and metric of English-language terms. Int. J. Geogr. Inf. Sci. 1998, 12, 215–246.
  10. Jones, C.B.; Purves, R.S.; Clough, P.D.; Joho, H. Modelling vague places with knowledge from the Web. Int. J. Geogr. Inf. Sci. 2008, 22, 1045–1065.
  11. Mark, D.M.; Turk, A. Landscape categories in Yindjibarndi: Ontology, environment, and language. In Spatial Information Theory, Proceedings of the International Conference on Spatial Information Theory, Lecture Notes in Computer Science, Ittingen, Switzerland, 24–28 September 2003; Kuhn, W., Worboys, M.F., Timpf, S., Eds.; Springer: Berlin/Heidelberg, Germany, 2003; pp. 28–45.
  12. Manley, E.; Filomena, G.; Mavros, P. A spatial model of cognitive distance in cities. Int. J. Geogr. Inf. Sci. 2021.
  13. Vasardani, M.; Winter, S.; Richter, K.F. Locating place names from place descriptions. Int. J. Geogr. Inf. Sci. 2013, 27, 2509–2532.
  14. Mark, D.M. Toward a theoretical framework for geographic entity types. In Spatial Information Theory: A Theoretical Basis for GIS, Lecture Notes in Computer Science; Frank, A.U., Campari, I., Eds.; Springer: Berlin, Germany, 1993; pp. 270–283.
  15. Bennett, B.; Agarwal, P. Semantic categories underlying the meaning of ‘place’. In Spatial Information Theory: 8th International Conference, COSIT 2007, Lecture Notes in Computer Science 4736; Winter, S., Duckham, M., Kulik, L., Kuipers, B., Eds.; Springer: Berlin/Heidelberg, Germany, 2007; pp. 78–95.
  16. Twaroch, F.A.; Brindley, P.; Clough, P.D.; Jones, C.B.; Pasley, R.C.; Mansbridge, S. Investigating behavioural and computational approaches for defining imprecise regions. Spat. Cogn. Comput. 2019, 19, 146–171.
  17. Hu, Y. Geospatial semantics. In Comprehensive Geographic Information Systems; Huang, B., Ed.; Elsevier: Oxford, UK, 2018; pp. 80–94.
  18. Adams, B. Finding similar places using the observation-to-generalization place model. J. Geogr. Syst. 2015, 17, 137–156.
  19. Mark, D.M.; Turk, A.G.; Stea, D. Progress on Yindjibarndi ethnophysiography. In Spatial Information Theory, Proceedings of the International Conference on Spatial Information Theory, Lecture Notes in Computer Science, Melbourne, Australia, 19–23 September 2007; Winter, S., Duckham, M., Kulik, L., Kuipers, B., Eds.; Springer: Berlin/Heidelberg, Germany, 2007; pp. 1–19.
  20. Xu, J.; Jing, Z.; Si, W.; Sun, H. Bi-linguistic study of natural-language understanding of spatial relations in Chinese and English. J. Remote Sens. 2008, 12, 362–369.
  21. Mark, D.M.; Egenhofer, M.J. Topology of prototypical spatial relations between lines and regions in English and Spanish. In Proceedings of the Auto Carto 12, Charlotte, NC, USA, 27 February–2 March 1995; pp. 245–254.
  22. Marchi Fagundes, C.K.; Stock, K.; Delazari, L.S. A cross-linguistic study of spatial location descriptions in New Zealand English and Brazilian Portuguese natural language. Trans. GIS 2021.
  23. Reid, G.; Sieber, R.; Blackned, S. Visions of time in geospatial ontologies from indigenous peoples: A case study with the Eastern Cree in Northern Quebec. Int. J. Geogr. Inf. Sci. 2020, 34, 2335–2360.
  24. Chen, H.; Vasardani, M.; Winter, S.; Tomko, M. A graph database model for knowledge extracted from place descriptions. ISPRS Int. J. Geoinf. 2018, 7, 221.
  25. Vasardani, M.; Timpf, S.; Winter, S.; Tomko, M. From descriptions to depictions: A conceptual framework. In Spatial Information Theory, Proceedings of the International Conference on Spatial Information Theory, Lecture Notes in Computer Science, Scarborough, UK, 2–6 September 2013; Tenbrink, T., Stell, J., Galton, A., Wood, Z., Eds.; Springer: Berlin, Germany, 2013; pp. 299–319.
  26. Hu, Y.; Ye, X.; Shaw, S.-L. Extracting and analyzing semantic relatedness between cities using news articles. Int. J. Geogr. Inf. Sci. 2017, 31, 2427–2451.
  27. Huang, Y. Conceptually categorizing geographic features from text based on latent semantic analysis and ontologies. Ann. GIS 2016, 22, 1–15.
  28. Golledge, R.G.; Stimson, R.J. Spatial Behaviour: A Geographic Perspective; Guilford Publications: New York, NY, USA, 1997.
  29. Gould, P.; White, R. Mental Maps, 2nd ed.; Routledge: London, UK, 1986.
  30. Burigo, M.; Coventry, K. Context affects scale selection for proximity terms. Spat. Cogn. Comput. 2010, 10, 291–312.
  31. Knauff, M. A neuro-cognitive theory of deductive relational reasoning with mental models and visual images. Spat. Cogn. Comput. 2009, 9, 109–137.
  32. Mark, D.M.; Freksa, C.; Hirtle, S.C.; Lloyd, R.; Tversky, B. Cognitive models of geographical space. Int. J. Geogr. Inf. Sci. 1999, 13, 747–774.
  33. Mark, D.M.; Smith, B.; Egenhofer, M.; Hirtle, S. Ontological foundations for geographic information science. In A Research Agenda for Geographic Information Science; McMaster, R., Usery, L., Eds.; CRC Press: Boca Raton, FL, USA, 2004; pp. 335–350.
  34. Kokla, M.; Guilbert, E. A review of geospatial semantic information modeling and elicitation approaches. ISPRS Int. J. Geoinf. 2020, 9, 146.
  35. Wang, J.; Worboys, M. Ontologies and representation spaces for sketch map interpretation. Int. J. Geogr. Inf. Sci. 2017, 31, 1697–1721.
  36. Moura, T.H.V.M.; Davis, C.A.; Fonseca, F.T. Reference data enhancement for geographic information retrieval using linked data. Trans. GIS 2017, 21, 683–700.
  37. Adams, B.; Janowicz, K. Thematic signatures for cleansing and enriching place-related linked data. Int. J. Geogr. Inf. Sci. 2015, 29, 556–579.
  38. Janowicz, K.; Keßler, C. The role of ontology in improving gazetteer interaction. Int. J. Geogr. Inf. Sci. 2008, 22, 1129–1157.
  39. Bordogna, G.; Fugazza, C.; Tagliolato Acquaviva d’Aragona, P.; Carrara, P. Implicit, formal, and powerful semantics in Geoinformation. ISPRS Int. J. Geoinf. 2021, 10, 330.
  40. Cai, L.; Xu, J.; Liu, J.; Ma, T.; Pei, T.; Zhou, C. Sensing multiple semantics of urban space from crowdsourcing positioning data. Cities 2019, 93, 31–42.
  41. Lai, J.; Lansley, G.; Haworth, J.; Cheng, T. A name-led approach to profile urban places based on geotagged Twitter data. Trans. GIS 2020, 24, 858–879.
  42. Gao, S.; Janowicz, K.; Couclelis, H. Extracting urban functional regions from points of interest and human activities on location-based social networks. Trans. GIS 2017, 21, 446–467.
  43. Wang, Y.; Gu, Y.; Dou, M.; Qiao, M. Using spatial semantics and interactions to identify urban functional regions. ISPRS Int. J. Geoinf. 2018, 7, 130.
  44. Tu, W.; Cao, J.; Yue, Y.; Shaw, S.-L.; Zhou, M.; Wang, Z.; Chang, X.; Xu, Y.; Li, Q. Coupling mobile phone and social media data: A new approach to understanding urban functions and diurnal patterns. Int. J. Geogr. Inf. Sci. 2017, 31, 2331–2358.
  45. Steiger, E.; Resch, B.; Zipf, A. Exploration of spatiotemporal and semantic clusters of Twitter data using unsupervised neural networks. Int. J. Geogr. Inf. Sci. 2016, 30, 1694–1716.
  46. Chen, Y.; Gao, Y. Extracting and analyzing latent semantic characteristics of locations using social media data. J. Geoinf. Sci. 2017, 19, 1405–1414.
  47. Lansley, G.; Longley, P.A. The geography of Twitter topics in London. Comput. Environ. Urban Syst. 2016, 58, 85–96.
  48. Liu, X.; He, J.; Yao, Y.; Zhang, J.; Liang, H.; Wang, H.; Hong, Y. Classifying urban land use by integrating remote sensing and social media data. Int. J. Geogr. Inf. Sci. 2017, 31, 1675–1696.
  49. Blei, D.; Ng, A.; Jordan, M. Latent Dirichlet allocation. J. Mach. Learn. Res. 2003, 3, 993–1022.
  50. Steyvers, M.; Smyth, P.; Rosen-Zvi, M.; Griffiths, T. Probabilistic author-topic models for information discovery. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’04, Seattle, WA, USA, 22–25 August 2004; Association for Computing Machinery: New York, NY, USA, 2004; pp. 306–315.
  51. Phan, X.-H.; Nguyen, L.; Horiguchi, S. Learning to classify short and sparse text & web with hidden topics from large-scale data collections. In Proceedings of the 17th International Conference on World Wide Web (WWW’08), Beijing, China, 21–25 April 2008; ACM: New York, NY, USA, 2008; pp. 91–100.
  52. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Volume 1, pp. 4171–4186.
  53. Peters, M.E.; Neumann, M.; Iyyer, M.; Gardner, M.; Clark, C.; Lee, K.; Zettlemoyer, L. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, LA, USA, 1–6 June 2018; Volume 1, pp. 2227–2237.
  54. Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving Language Understanding by Generative Pre-Training. 2018; Available online: https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf (accessed on 20 June 2020).
  55. Sun, Y.; Wang, S.; Li, Y.; Feng, S.; Chen, X.; Zhang, H.; Tian, X.; Zhu, D.; Tian, H.; Wu, H. ERNIE: Enhanced representation through knowledge integration. arXiv 2019, arXiv:1904.09223.
Figure 1. Study area and locations of geotagged Weibo texts.
Figure 2. Schematic of the research framework employed in this study.
Figure 3. Distribution and adjusted standardized Chi-square residuals for the top 20 topics in the four datasets (unit: %).
Figure 4. Spatial distribution of topics extracted from resident microblogs.
Figure 5. Spatial distribution of topics extracted from tourist microblogs.
Figure 6. Spatial distributions of the perception strength of different semantics for residents (left column) and tourists (middle column). The right column shows the difference between the two: green indicates that residents feel more strongly than tourists about a topic, and red indicates that tourists feel more strongly than residents about a topic.
Figure 7. Distributions and adjusted standardized Chi-square residuals for the top 20 topics in the Tibetan resident dataset by city (unit: %).
Figure 8. Distributions and adjusted standardized Chi-square residuals for the top 20 topics in the Tibetan tourist dataset by city (unit: %).
Figure 9. Distributions and adjusted standardized Chi-square residuals for the top 20 topics in the Qinghai resident dataset by city (unit: %).
Figure 10. Distributions and adjusted standardized Chi-square residuals for the top 20 topics in the Qinghai tourist dataset by city (unit: %).
Figure 11. Classification of cities in the Qinghai–Tibetan Plateau according to their semantic similarity for (a) residents and (b) tourists.
Table 1. Number of geotagged Weibo texts and users.
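| Region | User group | Number of users | Geotagged Weibos (before cleaning) | Geotagged Weibos (after cleaning) |
|---|---|---|---|---|
| Tibet | Local users | 14,317 | 151,941 | 117,895 |
| Tibet | Tourists | 43,062 | 267,719 | 215,525 |
| Qinghai | Local users | 31,948 | 361,769 | 278,021 |
| Qinghai | Tourists | 70,063 | 498,529 | 432,454 |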
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Xu, J.; Hu, L. Geospatial Semantics Analysis of the Qinghai–Tibetan Plateau Based on Microblog Short Texts. ISPRS Int. J. Geo-Inf. 2021, 10, 682. https://doi.org/10.3390/ijgi10100682


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
