Landscape Analysis of Geographical Names in Hubei Province, China

Hubei Province is the communications hub of central China, which underpins its strategic position in the country's development. Hubei Province is also well known for its diverse landforms, including mountains, hills, mounds and plains, and is called "The Province of Thousand Lakes" because of its abundant water resources. Geographical names are exclusive names given to physical or anthropogenic geographic entities at specific spatial locations and are important signs by which humans understand natural and human activities. In this study, geographic information systems (GIS) technology is adopted to establish a geodatabase of geographical names with particular characteristics in Hubei Province and to extract certain geomorphologic and environmental factors. We carry out landscape analyses of mountain-related and water-related geographical names respectively. Finally, we calculate the information entropy of the geographical names of each county to describe the diversity and inhomogeneity of place names in Hubei Province. Our study demonstrates that geographical names represent responses to the cultural landscape and physical environment. The link between geographical names and the landscape is most evident in specific landscapes, such as mountains and rivers.


Introduction
Landscape toponyms (also called geographical names or place names) are the aspect of the landscape that reflects the link between a physical space and human perception; toponyms also convey the understanding and interpretation of this space and offer a unique approach for studying landforms and the cultural heritage of particular regions. Some toponyms are derived from natural landscapes as well as human perceptions and reflect the relationships between humans and the environment, such as water resources and landforms. Other toponyms are human-centric geographical names, which merely reflect human entities, such as settlements and civil constructions. Toponyms are not only linguistic forms but also cultural and societal artefacts that offer insights into the history, settlements, habitats, water and environmental perceptions of a certain culture [1][2][3][4][5][6]. Consequently, the interpretation of toponyms and the analysis of their spatial arrangements can help us examine the spatial characteristics of the geographical name landscape.
Hubei Province has a total of 14 prefecture-level cities. Each prefecture-level city has its own administrative subdivisions that include districts, counties, autonomous counties, county-level cities and subordinate villages. There are 80 county-level administrative districts in total. In addition, Hubei Province is known for its abundant water sources and mountainous areas. As the largest province located in the middle of China, Hubei Province also has a large population. In this paper, we analyse the statistics of the toponyms of the 80 county-level administrative districts and select the 49 words that are used most frequently. We then use a query language to determine how many times these words appear in each county-level administrative district. These words are classified into five types based on their linguistic meaning (mountain, water, plain, settlement or construction). A geodatabase is then created that contains the number of words of each type occurring in every county. Through visualisation in a thematic map, which shows the distribution probability of the landscape-related words of toponyms in Hubei Province, we can more clearly determine the dimensions and density of the toponym landscape.
On this basis, we extract certain environmental factors that are related to the formation of toponyms from topographic data. Specifically, elevation and slope factors are extracted to study the correlation between mountain-related and plain-related toponyms and the landscape. In addition, we calculate the information entropy of all types of geographical names (water, plain, mountain, settlement and urban features) in each county in Hubei Province. The results show that the density of place names in the central region is much lower than in eastern and western Hubei Province.
By applying spatial analysis and statistical analysis, we identify regularities in the spatial arrangement of the geographical names in Hubei Province, from which the main conclusions of the research are drawn. Geographical names do represent responses to the cultural landscape and physical environment, and the distribution of geographical names is correlated with the landscape to some extent. However, toponyms are also an important part of the cultural heritage, which means that the formation of geographical names is a historical process and may develop with changes in people's perceptions. We calculate the information entropy of the place names of each county and find that the higher the information entropy, the more uniform the distribution of place name types.
The application of geographic information systems (GIS) technology represents a major advancement of toponymic studies because GIS enables the systematic examination of spatial patterns of geographical names as well as their association with other human and environmental factors. In this study, ArcGIS 10 software, ENVI 4.7 software, and the SPSS 19 statistical program were used as analysis tools for spatial management, data manipulation and thematic mapping.
The main goals of the research are to identify regularities in the distribution of toponyms and the relationship between geographical names and landscape types through GIS and spatial analysis. Information entropy can reflect the uncertainty and diversity of place names in Hubei Province. By comparing the information entropy of the counties, we can characterise the distribution and the types of place names.

Previous Work
Previous work has been performed on the application of GIS in toponymic studies. Wang used GIS to examine the spatial patterns of particular Tai toponyms and their relationship with terrain characteristics in South China and Southeast Asia [7]. Wang analysed the spatial characteristics of a place names landscape based on GIS technology in Guangdong Province, China [8]. Wang built a GIS database of place names in Guangxi, China, analysed how the spatial distributions of Zhuang and Han place names were linked to manmade and physical environments, and analysed how the patterns changed over time [9]. Jung studied the Korean perceptions of the environment based on an examination of the toponymic terms used in conjunction with Korean village names during the early twentieth century [10]. To establish the relationship between field names and physical spaces, Seidl analysed field names in a select area and determined if and how they can be used in the planning and management of contemporary landscapes [11]. In an analysis of geographic names in Zhongwei County based on the character frequency, origin and cultural landscape of names using quantitative and statistical methods, Jian-hua Li concluded that the distribution of the landscape toponyms was consistent with the topography [12]. Yan-bo suggested that the origin and development of each place name had a close relationship with the historical geography and cultural background [13]. A large amount of interior and exterior cultural information is available on the distributions of environments, geographies and various frontiers. O'Connor examined the landscape terminology and place names in the Chontal region in the state of Oaxaca in southern Mexico by focusing on terms from the Lowland Chontal, which is a highly endangered language spoken near the Pacific Coast [14]. In addition to the linguistic analysis, O'Connor presented a general description of the physical geography of the area and how it relates to settlement patterns and subsistence activities.
Boillat et al. deepened the search for ecosystem-like concepts in indigenous societies by highlighting the importance of place names used by Quechua indigenous farmers in the Central Bolivian Andes [15]. Čargonja et al. investigated how toponyms for which the names of the plants can be recognised mostly represent the local climazonal vegetation of an area as well as the ethno-linguistic and socio-cultural motives embedded in the lives of the people [16]. Senft studied landscape terms and place names in the Trobriand Islands (the Kaile'una subset) and presented a critical discussion of the landscape terms and the proposed typology for place names [17]. Vasardani et al. reviewed the current literature on geographic information retrieval based on place names. They focused on three interrelated research areas: (1) the use of place names in gazetteers, (2) the use of formal models to explain spatial relationships and the spatial extent of place names in linguistic place descriptions, and (3) web-harvesting and crowd-sourcing techniques for identifying place names and their spatial extension from public and volunteer sources, such as social networks and photo-sharing sites [18][19][20].
The research of Freitas et al. was designed mainly to: (i) propose an original hydrotoponymical classification based on historical hydrogeography, hydrogeology and hydrohistorical inventories; (ii) provide a dynamic assessment of the evolution of hydrotoponymy in the Porto urban region over time; and (iii) stress the importance of hydrotoponymy in improving hydrogeological conceptual modelling for old, large urbanised areas [3].
Burenhult depicted the relationship of linguistic categories of landscape terms and place names based on first-hand data [21]. Many researchers are using GIS and information technology to study the relationship of place names and landscapes in many fields, such as linguistics, anthropology, epistemology, ontology, and environmental theory [22][23][24][25][26][27][28][29].
The main difference between this study and the approaches described in the aforementioned publications is the combined use of spatial analysis of GIS data and a data mining module. By comparison, most of the previous toponymical works cited above studied geographical names through qualitative description. In this paper, however, we count the words in toponyms statistically and classify them into five types based on linguistic and semantic analysis. Quantitative analysis is another distinctive aspect of our research: through factor extraction and correlation analysis, the correlation between landscape and toponyms is studied.

Study Area
Hubei Province (Figure 1) is located in Central China (29°05′-33°20′N, 108°21′-116°07′E) and extends across the Yangtze River. Hubei Province covers an area of 185,900 km², which accounts for 1.94% of the total area of China. The province has developed Archean-Cenozoic strata and ultrabasic, basic, medium acidic, acidic and alkaline igneous rocks as well as various metamorphic rocks. The proportions of strata areas are as follows: sedimentary rocks, 61%; metamorphic rocks, 32%; and igneous rocks, 7%. The proportions of various landforms of the total provincial area are as follows: mountains, 55.5%; hills and hillocks, 24.5%; and plains and lake areas, 20%. Consisting of 14 prefecture-level cities, Hubei Province has a permanent population of 57.79 million. Hubei Province is a multi-ethnic province. Ethnic minorities account for 4.31% of the total population of the province, and the autonomous areas of ethnic minorities cover approximately 30,000 km², accounting for nearly one-sixth of the total area.
The eastern, western and northern areas of Hubei Province are surrounded by mountains, such as Wuling Mountain, Daba Mountain, Wudang Mountain, and Dabie Mountain. The Jianghan Plain crosses central and southern Hubei Province where the terrain is relatively flat. The Yangtze River flows across 26 counties and cities in Hubei Province from west to east and flows a total distance of 1041 km. The tributaries of the Yangtze River within Hubei Province include the Han River, Zhang River, Qing River, Dongjing River, Lu River, and Fu River. Among these tributaries, the Han River, which flows a distance of 858 km, is the largest tributary along the midstream of the Yangtze River. From northwest to southeast, the Han River flows across 13 counties and cities and flows into the Yangtze River at Wuhan City, the capital of Hubei Province. Furthermore, Hubei Province is called "The Province of Thousand Lakes" due to its abundant water resources. The lakes within the province have a total area of 2983.5 km². Lakes Honghu, Changhu, Liangzi and Futou each cover more than 100 km².

Data Sources
The toponyms dataset of Hubei Province that was surveyed by the Department of Civil Affairs of Hubei Province was used to conduct the statistical analysis. The dataset contains the attribute data of place names of the 80 county-level administrative districts in Hubei Province. The toponyms dataset was recorded in a geodatabase for subsequent analysis.
Vector data of the administrative districts of Hubei Province acquired from the National Fundamental Geographic Information System (NFGIS) were used for clipping features and the base map. This map of the study area is crucial for the visualisation process.
Vector data on the spatial arrangement of rivers greater than 4th order were also acquired from the NFGIS. The data were used to display the spatial arrangement of rivers during the process of visualisation and to extract water-related factors while analysing the spatial correlation.
A digital elevation model (DEM) with a spatial resolution of 30 m was downloaded from the ASTER GDEM distribution site (http://gdem.ersdac.jspacesystems.or.jp/). The data processing of the DEM involved mosaicking and clipping. The hillshade of Hubei Province was generated from the DEM. The elevation and slope factors were extracted for the analysis of the spatial correlation.

Toponyms Data
To obtain a record of the words that appear most frequently in the 80 county-level administrative districts in Hubei Province, we developed a program to obtain statistics of the first-hand data [30]. Some researchers combine place names and the ecological environment to study landscapes and social-ecological systems [31][32][33][34]. After selecting the 49 words that are used most frequently, we applied these words to each county record and obtained 80 records that show how many times these words appear in each county. After classifying these words into five types (mountain, water, plain, settlement or construction) based on their linguistic meanings, we added these five types to the fields of the geodatabase and applied the numbers of words of each type to the records of the counties. Finally, we obtained 80 records that show how many times the words of each type appear in every county. During the spatial analysis, this statistical result played an important role because the main objective of this analysis was to determine how the landforms affect the spatial arrangement of geographical names. The classification of the 49 words used most frequently in the geographical names of Hubei Province is shown in Table 1.
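
The counting and classification steps described above can be sketched as follows. This is a minimal illustration and not the program used in the study: it assumes that a "word" is a single character of a place name, and the lookup `WORD_TYPES`, the sample `TOPONYMS` and both function names are hypothetical stand-ins for the survey dataset and the 49-word list of Table 1.

```python
from collections import Counter

# Hypothetical word-to-type lookup; the study's actual 49 words are in Table 1.
WORD_TYPES = {
    "山": "mountain",
    "岭": "mountain",
    "河": "water",
    "湖": "water",
    "坪": "plain",
    "村": "settlement",
    "桥": "construction",
}

# Hypothetical toponyms keyed by county; not taken from the survey dataset.
TOPONYMS = {
    "County A": ["青山村", "石桥河", "大岭"],
    "County B": ["东湖", "沙河村", "长坪"],
}


def most_frequent_words(toponyms, top_n):
    """Return the top_n most frequent characters over all counties."""
    counts = Counter(
        ch for names in toponyms.values() for name in names for ch in name
    )
    return [word for word, _ in counts.most_common(top_n)]


def count_types_per_county(toponyms, word_types):
    """Count, for each county, how often words of each type occur."""
    table = {}
    for county, names in toponyms.items():
        type_counts = Counter()
        for name in names:
            for ch in name:
                if ch in word_types:
                    type_counts[word_types[ch]] += 1
        table[county] = dict(type_counts)
    return table


TYPE_TABLE = count_types_per_county(TOPONYMS, WORD_TYPES)
print(most_frequent_words(TOPONYMS, 2))
print(TYPE_TABLE)
```

Each entry of `TYPE_TABLE` corresponds to one county record with one field per word type, which is the shape of the records that are later joined to the county-level map.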

Creating the Geodatabase
During data processing, we classified the 49 words that were used most frequently in the geographical names into five types: mountain, water, plain, settlement and construction. We then added these five types to the fields of the geodatabase and added the numbers of words of each type to the records of the counties. Finally, we obtained 80 records showing how many times the words of the five types appeared in every county (Table 2). We then entered these data into the attribute table of the administrative district map of Hubei Province at the county level. The spatial correlation analysis was performed based on these results.

DEM Data
An image of the DEM mosaic with a spatial resolution of 30 m was generated after processing the mosaic with ENVI software. The image covered the full extent of Hubei Province, although its limits extended beyond the precise boundary of the province. We then added the image data and the vector data of the administrative districts of Hubei Province in ArcMap 10. By applying the vectors as a clip feature using a geoprocessing function, we obtained the DEM of Hubei Province with a spatial resolution of 30 m (Figure 2).

The Spatial Arrangement of Rivers
To guarantee the accuracy of the analysis, we chose a layout of rivers that are greater than 4th order; the Yangtze River is a 1st order river that flows across Hubei Province. Again, the vector data of the administrative districts of Hubei Province were added to ArcMap 10 and used as the clip feature during geoprocessing. The resulting figure (Figure 3) shows the spatial arrangement of the rivers greater than 4th order in Hubei Province. As shown in Figure 3, some counties include or are bordered by rivers, whereas others are not located near rivers. However, the resulting figure only provides a visual representation rather than a quantitative analysis. Therefore, water-related factors were extracted from the data; these factors are crucial for further analysis of the spatial correlation.

Flowchart
The flowchart for the landscape analysis of geographical names is shown in Figure 4. It presents the data processing, visualisation, hillshade analysis, regression analysis, information entropy analysis and the conclusion. Data preparation and processing were introduced above; the following sections analyse the relationship between geographical names and their natural landscape through visualisation, hillshade analysis, regression analysis and information entropy.

Visualisation
With the statistics recorded and digitised, the absolute counts of landscape-related toponyms (mountain, plain and water) in the 80 county-level administrative districts are shown in Figure 5. In general, the trends of these three types were coincident, meaning that the counts of the three types of landscape-related toponyms showed a parallel trend. However, because the total number of toponyms varied from county to county, the absolute numbers were not appropriate for comparing counties. Consequently, another figure was created in which the relative values were expressed as percentages. The relative abundances of landscape-related toponyms (mountain, plain and water) of the 80 county-level administrative districts are shown in Figure 6. This figure is much better suited for explaining differences in the quantity of the landscape-related toponyms in the 80 counties.
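
The step from absolute counts to relative values can be illustrated with a short sketch. It assumes that the denominator is the total number of toponyms in the county, which the text does not state explicitly; the function name and the sample counts are hypothetical.

```python
def relative_abundance(type_counts, total_toponyms):
    """Convert absolute counts per type into percentages of all toponyms."""
    if total_toponyms <= 0:
        raise ValueError("total_toponyms must be positive")
    return {t: 100.0 * n / total_toponyms for t, n in type_counts.items()}


# Hypothetical counts for two counties of very different size.
small = relative_abundance({"mountain": 30, "plain": 10, "water": 20}, 200)
large = relative_abundance({"mountain": 90, "plain": 30, "water": 60}, 600)
print(small)
print(large)
```

The two hypothetical counties differ by a factor of three in absolute counts but have identical percentages, which is why relative values are the more suitable basis for comparing counties.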

Spatial Analysis-Hillshade
In ArcMap 10, the "Raster Surface: Hillshade" function in ArcToolbox-3D Analyst was used to obtain the shaded relief from the DEM of Hubei Province. After several trials varying the parameters and colour grading, we obtained the results shown in Figure 7. Based on the hillshade figure, the terrain is relatively high in eastern, western and northern Hubei Province, whereas the southern and middle areas are much flatter. The elevation values were divided into seven levels that show the various terrains of Hubei Province. Moreover, the watersheds can be identified in the hillshade image where the green colour is relatively light, i.e., at locations of lower elevation. We assume that because Hubei Province features complex and varied landforms, there is a high probability that the spatial arrangement of geographical names is associated with the landforms. To test this hypothesis, we combined the processed attribute data with the data showing the general terrain of Hubei Province. To determine whether the layout of the different types of toponyms correlates with the landscape, we chose to use a thematic map, which provides more information and a more intuitive and visual representation. The relative numbers of landscape-related toponyms (mountain, plain and water) were selected to create a histogram. This chart directly and clearly shows the distribution of toponyms associated with the landscape (Figure 8). Based on the thematic map, the percentage of water-related toponyms is higher in those counties that border rivers. Moreover, in the eastern, western and northern areas where the elevation is relatively high, mountain-related toponyms generally appear more frequently.

Elevation and Slope Factors Extraction
The DEM and the vector data of the administrative districts of Hubei Province provided elevation information and were used to extract the slope information of each county in Hubei Province. We then reclassified the elevation and slope according to the classification standards listed in Table 3 [35,36]. For each county, the reclassified value covering the largest number of pixels was taken as the dominant factor. According to the landform classification standards, if the elevation-slope value was 2-2 (the minimal reclassified value is 2), then the landform was classified as a plain, and all other combinations of values were classified as mountains. If the type was a plain, then the county was assigned a value of 0. If the type was a mountain, then a value of 1 was assigned. We recorded these data in a table for further study. For each county, if words in the "plain" category appeared more frequently than those in the "mountain" category, then the county was assigned a value of 0; otherwise the county was assigned a value of 1. These data were recorded in the same table. We then used SPSS software to create cross-tables (Table 4). These results show that of the 68 counties having more mountain-related toponyms than plain-related toponyms, 40 counties have predominantly mountainous terrain. Of the 12 counties that have more plain-related toponyms than mountain-related toponyms, two counties have predominantly plains terrain. However, for the 10 counties in which the toponym-landscape value is 0-1, the average relative difference between the abundance of mountain-related words and plain-related words is 0.04, so the two types are nearly balanced in these counties. In general, the results have some explanatory value. The spatial arrangement of the geographical names related to specific landscapes generally coincides with the actual landforms, although there are exceptions. However, a geographical name conveys the regional culture and the influences of nature, history, tradition, society, and culture. Geographical names are a cultural phenomenon. The process of naming and renaming toponyms is partly contingent, so relying entirely on modern technology may not capture the actual situation of every geographical name; qualitative analyses of culture are therefore indispensable. Moreover, the formation of geographical names is a historical process and may develop with changes in people's perceptions.
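
The coding scheme and the cross-table described above can be sketched as follows. This is an illustrative reimplementation, not the SPSS procedure used in the study; the function names, the representation of a county as a list of reclassified (elevation, slope) pixel pairs and all sample values are assumptions.

```python
from collections import Counter


def dominant_class(pixel_classes):
    """Return the (elevation, slope) class pair covering the most pixels."""
    return Counter(pixel_classes).most_common(1)[0][0]


def terrain_code(pixel_classes, plain_class=(2, 2)):
    """0 for a plain-dominated county, 1 for any other (mountain) class."""
    return 0 if dominant_class(pixel_classes) == plain_class else 1


def toponym_code(plain_words, mountain_words):
    """0 if plain-related words outnumber mountain-related ones, else 1."""
    return 0 if plain_words > mountain_words else 1


def cross_table(pairs):
    """Count counties for each (toponym_code, terrain_code) combination."""
    table = {(t, k): 0 for t in (0, 1) for k in (0, 1)}
    for pair in pairs:
        table[pair] += 1
    return table


# Hypothetical counties: (reclassified pixels, plain words, mountain words).
counties = [
    ([(2, 2), (2, 2), (3, 2)], 12, 5),
    ([(3, 3), (3, 3), (4, 3), (2, 2)], 4, 20),
    ([(3, 3), (3, 3), (2, 2)], 9, 7),
]
pairs = [(toponym_code(p, m), terrain_code(px)) for px, p, m in counties]
print(cross_table(pairs))
```

The third hypothetical county falls in the (0, 1) cell: its toponyms are plain-dominated while its terrain is coded as mountainous, which is the kind of mismatch discussed in the text.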

The Area on River Systems
Before analysing the relationship between river systems and geographical names, the river systems area must be chosen. In this study, we chose only the typical area along the Yangtze River and Han River, because geographical names in the other areas are more likely to be plain-related or mountain-related. The 20 counties chosen are displayed in Figure 9. For these 20 counties, we calculated the related indices, namely the river systems density and the ratio of geographical names with water features to all geographical names, for further analysis.

Regression Analysis of Geographical Names and River Systems
Regression analysis is a statistical method used to identify the quantitative relationship between two or more interdependent variables. In geographical studies, it is often used to reveal the relationship between geographical entities or epidemiological diseases and the surrounding environment. This paper uses the method to study the relationship between geographical names and river systems in Hubei Province, with the river systems density as the independent variable and the ratio of geographical names with water features to all geographical names as the dependent variable. The two variables are calculated as follows:
$$\mathrm{RSD} = \frac{\text{river length in the county}}{\text{area of the county}}$$
$$\mathrm{RWN} = \frac{\text{number of geographical names with water features}}{\text{number of all geographical names}}$$
where RSD is the river systems density and RWN is the ratio of geographical names with water features to all geographical names. Twenty county-level administrations located along the Yangtze River or Han River were chosen in the study; the other county administrations were excluded because no large river flows through them. These 20 counties, labelled in Figure 9, were used in the regression analysis to demonstrate the relationship between geographical names with water features and river systems. First, we calculated RSD and RWN for each county with the formulas above: the river length and the area of each county were computed in ArcGIS 10 from the river systems data displayed in Figure 3, and the numbers of water-feature geographical names and of all geographical names were obtained from Table 2.
We then used SPSS to run the regression analysis. As shown in Table 5, R is 0.821 and the adjusted R square is 0.657, which indicates a strong relationship between the river systems density and the ratio of geographical names with water features to all geographical names; the regression model therefore fits the data well. Table 6 presents the analysis of variance of the regression. The Sig. value is 0.000, which is less than 0.05 and indicates that the regression is statistically significant. The regression equation between the river systems density (X) and the ratio of geographical names with water features to all geographical names (Y) follows from the coefficients of the regression analysis (Table 7). From Tables 5-7, we conclude that there is a strong, statistically significant relationship between the river systems density and the ratio of geographical names with water features to all geographical names. For Hubei Province, the regression analysis leads to the conclusion that the more rivers a county has, the more geographical names with water features it has. In some cases, geographical names reflect the features of the natural geographical landscape.
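
The regression step can be sketched in a few lines of plain Python. This is a minimal ordinary least squares illustration with one predictor, not the SPSS output of the study; the function names and all sample values are hypothetical, and RSD is assumed to be river length divided by county area.

```python
import math


def river_density(river_length_km, county_area_km2):
    """RSD: total river length divided by county area (assumed definition)."""
    return river_length_km / county_area_km2


def water_name_ratio(water_names, all_names):
    """RWN: share of water-related names among all names."""
    return water_names / all_names


def simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    r2 = r * r
    # Adjusted R square for a model with a single predictor.
    adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    return {
        "slope": slope,
        "intercept": intercept,
        "r": r,
        "r2": r2,
        "adjusted_r2": adjusted_r2,
    }


# Hypothetical counties: (river km, area km2, water names, all names).
sample = [
    (120.0, 2000.0, 150, 1000),
    (300.0, 2500.0, 260, 1000),
    (90.0, 3000.0, 90, 1000),
    (400.0, 2000.0, 380, 1000),
    (200.0, 2500.0, 170, 1000),
]
rsd = [river_density(length, area) for length, area, _, _ in sample]
rwn = [water_name_ratio(w, total) for _, _, w, total in sample]
fit = simple_linear_regression(rsd, rwn)
print(fit)
```

As a consistency check on the reported figures, applying the same adjustment to R = 0.821 with 20 counties gives 1 - (1 - 0.821²) × 19/18 ≈ 0.656, which agrees with the reported adjusted R square of 0.657 up to rounding of R.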

Information Entropy Calculation
Information entropy is a measure of information that can be used to evaluate the relationship between an object of study and its influencing factors [37]. Sukhov [38] proposed a purely statistical method for calculating the information content of a map, which takes the symbol type as the statistical object and computes the probability of each symbol appearing on the map. Neumann [39] proposed using the dual graph generated from contours and calculating the information entropy of the dual graph to measure topological information. Li [40] proposed measuring the geometric, thematic and topological information of a map by establishing a Voronoi diagram. We calculate the information entropy of all types of geographical names (water, plain, mountain, settlement and urban features) in each county in Hubei Province. The information entropy is calculated as follows:
$$H(x) = -\sum_{i=1}^{n} P(x_i) \log P(x_i)$$
where $P(x_i)$ is the probability of place names with a given feature (water, plain, mountain, settlement or urban) among all place names in each county, $n$ is the number of feature types, and $H(x)$ is the information entropy.
The formula expresses information uncertainty: the more the probability is concentrated in a single type of event, the lower the information entropy. From Table 2, we can calculate $P(x_i)$ for each feature type, for example for geographical names with water features among all geographical names in each county. Table 8 lists the information entropy of the geographical names of each county. The average information entropy is 0.61; the highest value is 0.6802 for Yingcheng County, and the lowest is 0.3827 for Hannan District in Wuhan City. Most geographical names in Hannan District have water features, whereas the various types of geographical names in Yingcheng County are distributed much more uniformly; therefore, the information entropy of Hannan District is low and that of Yingcheng County is high. Table 8 shows the diversity of geographical names, which is affected by the natural landscape, cultural relics, architecture and so on [41][42][43][44]. Sweeney used place names to interpret floodplain connectivity in the Morava River [45]. The diversity of geographical names in Hubei Province is thus reflected in the information entropy, which varies from 0.3827 to 0.6802 (Table 8): the higher the information entropy, the more uniform the distribution of place name types. Comparing the entropy of geographical names with water features and with mountain features shows that Hannan District, Qianjiang City, Yunmeng County and Jianli County, which lie in the Jianghan Plain, have many more geographical names with water features than with mountain features, while Yingshan County, Macheng City and Luotian County, which lie in the Dabie Mountains, have many more place names with mountain features than with water features. The results also show that the density of place names in the central region is much lower than in eastern and western Hubei Province. The water entropy contribution is the water entropy divided by the total entropy, and the mountain entropy contribution in Table 8 is defined in the same way. From the water and mountain entropy contributions, we can conclude that, for each county in Hubei Province, the more river systems there are, the more geographical names have water features, and the more mountains there are, the more geographical names have mountain features.
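
The entropy and entropy contribution calculations can be sketched as follows. The logarithm base is not stated in the text; base 10 is assumed here only because the reported maximum (0.6802) lies just below log10(5) ≈ 0.699, the upper bound for five types. The function names and the sample counts are hypothetical and are not values from Table 2.

```python
import math


def information_entropy(type_counts, base=10):
    """Shannon entropy of the name-type distribution of one county."""
    total = sum(type_counts.values())
    entropy = 0.0
    for count in type_counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log(p, base)
    return entropy


def entropy_contribution(type_counts, name_type, base=10):
    """Share of the total entropy contributed by one name type."""
    total_entropy = information_entropy(type_counts, base)
    if total_entropy == 0:
        return 0.0  # only one type present, nothing to apportion
    p = type_counts[name_type] / sum(type_counts.values())
    term = -p * math.log(p, base) if p > 0 else 0.0
    return term / total_entropy


# Hypothetical counts for a uniform and a water-dominated county.
uniform = {"water": 20, "plain": 20, "mountain": 20,
           "settlement": 20, "construction": 20}
skewed = {"water": 80, "plain": 5, "mountain": 5,
          "settlement": 5, "construction": 5}
print(information_entropy(uniform))
print(information_entropy(skewed))
print(entropy_contribution(skewed, "water"))
```

The uniform county reaches the maximum possible entropy for five types, while the water-dominated county scores much lower, mirroring the contrast between Yingcheng County and Hannan District described above.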

Conclusions
A geographical name is a reflection of the regional culture. Thus, geographical names are also an important part of the cultural heritage. Using statistical and GIS methods, this study directly and clearly demonstrated the correlation of geographical names at the county and landscape levels in Hubei Province. In addition, we determined that there are relationships between geographical names and landscape types. Geographical names are one of the oldest linguistic cultures in human history. The formation of geographical names is a historical process and may develop with changes in people's perceptions. Our research is therefore still reasonable and valuable, even though it cannot explain every single case.
Geographical names remain important cultural sources and spatial records of past generations. Because they are distinctly associated with the rural landscape, further research should be directed at the possibility of applying geographical names when reorganising agricultural land; this process is likely to occur in the near future.
This study analysed the landscape of place names in Hubei Province. Through GIS-based mapping, we clearly illustrated the probability distribution of landscape-related words of toponyms in Hubei Province. GIS-based mapping is becoming an increasingly important tool in landscape analysis because of its powerful spatial analysis and display functions [46]. Geographical names represent responses to the cultural landscape and physical environment. Combined with elevation and slope features, geographical names are most clearly linked to specific landscapes such as mountains. There is a strong, statistically significant relationship between the river systems density and the ratio of geographical names with water features to all geographical names. For Hubei Province, the regression analysis leads to the conclusion that the more rivers a county has, the more geographical names with water features it has. In some cases, geographical names reflect the features of the natural geographical landscape. We calculated the information entropy of the place names of each county and found that the higher the information entropy, the more uniform the distribution of place name types. Information entropy can reflect the uncertainty and diversity of place names in Hubei Province. By comparing the information entropy of the counties, we can characterise the distribution and the types of place names. The results also show that the density of place names in the central region is much lower than in eastern and western Hubei Province.

Figure 2. The DEM of Hubei Province with a spatial resolution of 30 m.

Figure 3. The spatial arrangement of rivers greater than 4th order in Hubei Province.

Figure 4. The flowchart for landscape analysis of geographical names.

Figure 8. The distribution probability of the landscape-related words of toponyms in Hubei Province.

Figure 9. The area on river systems for regression analysis.

Table 1. The classification of the 49 words.

Table 2. The numbers of words classified into five types in 80 counties.

Table 3. Classification standards for the elevation and slope.

Table 4. Cross-table of words appearing in toponyms and landscapes.

Table 5. The regression analysis model summary. Columns: R, R Square, Adjusted R Square, Std. Error of the Estimate.
a Predictors: (Constant), the river systems density.

Table 6. Regression analysis of variance.
a Dependent Variable: the ratio of geographical names with water features in all geographical names; b Predictors: (Constant), the river systems density.

Table 7. The coefficients of regression analysis.
a Dependent Variable: the ratio of geographical names with water features in all geographical names.

Table 8. Information entropy of geographical names of each county.