Scientometric Analysis for Spatial Autocorrelation-Related Research from 1991 to 2021

Spatial autocorrelation describes the interdependent relationship between the realizations or observations of a variable that is distributed across a geographical landscape, which may be divided into different units/areas according to natural or political boundaries. Researchers of Geographical Information Science (GIS) always consider spatial autocorrelation. However, spatial autocorrelation research covers a wide range of disciplines: not only GIS, but also spatial econometrics, ecology, biology, etc. Since spatial autocorrelation relates to multiple disciplines, it is difficult to gain a wide breadth of knowledge on all its applications, yet such knowledge is very important for beginners starting their research as well as for experienced scholars seeking new perspectives in their work. Scientometric analyses are conducted in this paper to achieve this end. Specifically, we employ scientometric indicators and scientometric network mapping techniques to discover influential journals, countries, institutions, and research communities; key topics and papers; and research development and trends. The conclusions are: (1) journals categorized into ecological and biological domains constitute the majority of top journals; (2) northern American countries, European countries, Australia, Brazil, and China contribute the most to spatial autocorrelation-related research; (3) eleven research communities consisting of three geographical communities and eight communities of other domains were detected; (4) hot topics include spatial autocorrelation analysis for molecular data, biodiversity, spatial heterogeneity and variability, and problems that have emerged in the rapid development of China; and (5) spatial statistics-based approaches and more intensive problem-oriented applications are, and will remain, the trend of spatial autocorrelation-related research. We also refine the results from a geographer's perspective at the end of this paper.


Introduction
Spatial autocorrelation (SA) is a concept employed by researchers in a wide range of disciplines whose datasets have locational information. The essential cause of SA may be geographical or locational proximity. As the first law of geography [1] says: "everything is related to everything else, but near things are more related than distant things"; hence, SA is a widely existing geographical characteristic. Thanks to the endeavors of pioneering quantitative geographers (e.g., the "Washington School" [2]) in the 1950s-1960s, SA drew much attention and became the central part of quantitative geography. However, the term SA first appeared in 1968 (i.e., [3,4]), before which it was called spatial association, spatial dependence, spatial interaction, etc. The seminal works of the geographer Cliff and the statistician Ord [5,6] laid the theoretical foundation of SA, which inspired its flourishing development (e.g., [7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22][23]) across the natural and social sciences. Therefore, SA is also [...]. The subsequent sections are organized as follows: Section 2 presents the datasets and scientometric methodologies employed in this paper. Section 3 provides the results and analyses, where Sections 3.1 and 3.2 present influential journals, and countries and institutions, respectively, and Section 3.3 explores SA research communities. Section 3.4 focuses on important topics and papers of SA research, and Section 3.5 analyzes the development trajectory and possible future research trends. Section 4 discusses the results and refines them from a GIS researcher's perspective. Section 5 draws some conclusions.

Datasets
The bibliographical datasets analyzed in this paper were retrieved from the Web of Science Core Collection (WOSCC) database. The detailed search conditions are listed in Table 1. Specifically, the Science Citation Index Expanded (SCI-E) and the Social Science Citation Index (SSCI) databases were used, because they are the most recognized data sources. After setting these conditions, 8461 records were obtained. We chose the terminology "spatial autocorrelation" as the keyword when researching this topic because "spatial autocorrelation" may be more generic than other expressions, such as "spatial correlation", "spatial covariogram", "spatial association", "spatial dependence", "spatial interaction", etc., which can also represent spatial autocorrelation. For example, spatial covariograms or spatial correlograms are often used in geostatistics, and spatial association and spatial dependence are chosen by some researchers of spatial econometrics, but spatial autocorrelation can cover them all. In fact, when we talk about spatial autocorrelation, we not only think of basic spatial autocorrelation statistics (global or local), but also of semivariograms in geostatistics, spatial autocorrelation-related regression models that are widely employed by spatial econometricians as well as scholars in other fields, and other methods in ecology and biology.
The overall disciplinary distribution of the top ten subjects is depicted in Figure 1, which shows that ecology accounted for more than 2000 papers and is the most active and productive discipline relating to SA research. Other highly productive disciplines include environmental sciences/studies, biodiversity conservation, geography, multidisciplinary geosciences, etc. Figure 1 also shows that the percentage of papers in the geographical domain is about 19%, which indicates that other disciplines account for around 80% of SA-related research.
In addition to the disciplinary information, the bibliographical datasets contain complete metadata, including authors, institutions, publication names, citation counts, and references. Hence, not only can the basic bibliography be extracted, but network structures can also be formed by picking the relevant information from the dataset. These items constitute the original materials for scientometric analyses. In this paper, we employed several techniques, including scientometric indexes and co-authorship/co-words/co-citation analyses, to conduct our research.

Scientometric Indexes
For the datasets, three scientometric indexes, the records (Recs), the Total Local Citation Score (TLCS), and the Total Global Citation Score (TGCS), were used to quantify the institutions as well as countries/regions. The Recs counts how often an object (e.g., an institution) appears in the dataset and thus describes the popularity of this object. The TLCS is the citation count within the 8461 papers; it quantifies the impact within SA-related domains. The TGCS is the citation count within all papers in WOSCC; it measures the global impact. It has been observed that some journals are so specialized that they have a small audience, and thus their impact factors (IF) are relatively low, yet they have good reputations among scholars within their domains; the TLCS is a metric that captures this specialization. In contrast, the TGCS is a metric that represents universality. We used HistCite [39] to report the three counts.
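To make the three indexes concrete, the following minimal sketch computes Recs, TLCS, and TGCS per institution for a toy dataset. The record layout (`id`, `institutions`, `cites`, `times_cited_wos`) and the function name `citation_scores` are assumptions made for illustration; they do not reflect HistCite's export format or its internal implementation.

```python
from collections import Counter, defaultdict

# Hypothetical toy records; field names are illustrative only.
papers = [
    {"id": "P1", "institutions": ["Univ A"], "cites": [], "times_cited_wos": 120},
    {"id": "P2", "institutions": ["Univ A", "Univ B"], "cites": ["P1"], "times_cited_wos": 40},
    {"id": "P3", "institutions": ["Univ B"], "cites": ["P1", "P2"], "times_cited_wos": 5},
]

def citation_scores(papers, key="institutions"):
    """Return {object: {"Recs", "TLCS", "TGCS"}} for the objects listed under `key`."""
    ids = {p["id"] for p in papers}
    # Local citation score of each paper: citations received from papers
    # that are themselves part of the dataset.
    lcs = Counter(ref for p in papers for ref in set(p["cites"]) if ref in ids)
    scores = defaultdict(lambda: {"Recs": 0, "TLCS": 0, "TGCS": 0})
    for p in papers:
        for obj in set(p[key]):
            entry = scores[obj]
            entry["Recs"] += 1                      # one more record for this object
            entry["TLCS"] += lcs[p["id"]]           # citations from inside the dataset
            entry["TGCS"] += p["times_cited_wos"]   # citations from the whole database
    return dict(scores)

scores = citation_scores(papers)
for name, entry in sorted(scores.items()):
    print(name, entry)
```

Passing a different `key` (for instance a hypothetical `countries` field) would rank countries/regions in the same way.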

Scientometric Network Mapping
(1) Co-authorship analysis Co-authorship analysis is often conducted to discover research/academic communities or schools by presenting cooperative relationship networks. There are three types of co-authorship analyses, i.e., author analysis, institution analysis, and country/region analysis [40]. We focus only on author analysis in this paper, aiming to identify scholars who have similar research interests. VOSviewer [41] was employed to carry out this work, with authors in the same community presented in the same color and connected by links. More specifically, each author is shown as a node, and co-authorship is expressed by links among the nodes. To visualize the co-author network and simultaneously reflect the leading scholar(s), we set the weight as "normalized citations", which determines the size of a node. In other words, a bigger node indicates that its corresponding author has been cited more frequently. Other types of weights include "links" and "total link strength", which focus on the tightness of the co-authorship, and "documents", which represents the number of papers in which an author cooperated with other scholars.
(2) Co-words analysis The co-words (also called co-keyword or keywords co-occurrence) analysis enables researchers to gain knowledge of the hot topics of their domain of interest. For beginners, it helps to form an overall impression of a research field and thus to start their research by selecting a promising topic. For (relatively) mature scholars, it is also useful for obtaining fresh ideas that may emerge from the map of co-words networks. We again used VOSviewer to carry out the visualization. Similar to the co-author networks, co-words networks consist of nodes and the links among them. To present the popularity of the words, we set the weights as "occurrences"; a bigger node indicates that a word occurs more frequently together with other words in the literature. The thickness of the link between two words is determined by the number of their co-occurrences, and thicker links signify stronger relevance. Other weights include "links" and "total link strength", which emphasize the relevance of two keywords.
(3) Co-citation analysis Co-citation means that two articles are cited together in one paper. Co-citation analysis builds co-cited relationships between articles and thereby helps researchers to find important papers, as well as the related researchers, efficiently. It is a classic method in scientometrics [42]. We employed CiteSpace [43,44] to generate co-citation networks. Two important indices that CiteSpace can compute are "betweenness centrality" and "burst". The former was first introduced by Freeman [45] to measure centrality based on the shortest paths in a graph, and is used to show the pivotal nodes for information flow in a network (CiteSpace highlights the pivotal nodes with purplish red circles). The latter was developed by Kleinberg [46], and is used to detect the burst of an event (citation, keyword, or publication); CiteSpace colors burst nodes with red circles. Another merit of CiteSpace is that it can generate the co-citation networks for different periods simultaneously, which not only helps researchers to see the development of a research domain, but also helps one to infer possible research trends.
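All three analyses above rest on the same counting step: tallying how often two items (authors, keywords, or cited references) appear together in one paper. The sketch below illustrates that step with hypothetical keyword lists. The weight names mirror those mentioned in the text ("occurrences", link weight, "total link strength"), but the code is an assumption-laden illustration with full counting, not the procedure implemented in VOSviewer or CiteSpace.

```python
from collections import Counter
from itertools import combinations

def co_occurrence(item_lists):
    """Count item occurrences and pairwise co-occurrences across documents."""
    occurrences = Counter()
    links = Counter()
    for items in item_lists:
        unique = sorted(set(items))             # each item counts once per document
        occurrences.update(unique)
        links.update(combinations(unique, 2))   # unordered pairs, alphabetically sorted
    return occurrences, links

def total_link_strength(links):
    """Sum the weights of all links attached to each item."""
    strength = Counter()
    for (a, b), weight in links.items():
        strength[a] += weight
        strength[b] += weight
    return strength

# Hypothetical keyword lists, one per paper.
keywords = [
    ["spatial autocorrelation", "genetic structure", "seed dispersal"],
    ["spatial autocorrelation", "genetic structure"],
    ["spatial autocorrelation", "species richness"],
]

occ, links = co_occurrence(keywords)
tls = total_link_strength(links)
print(occ.most_common(2))
print(links.most_common(1))
```

Feeding author lists instead of keyword lists yields co-authorship links, and feeding the reference lists of the citing papers yields co-citation links.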
Co-authorship, co-words, and co-citation analyses can be implemented by other scientometric mapping tools, not only those mentioned above. For a thorough overview of scientometric mapping tools, we refer the reader to the work of Li et al. [38]. The purpose of this paper is to gain knowledge of SA research on a macro-level, so we focus on the interpretation of the (visualized) results rather than the comparison of results output by different mapping tools.
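Betweenness centrality, used above to identify pivotal nodes, is likewise independent of any particular tool. The following sketch computes the unnormalized shortest-path betweenness of an undirected, unweighted graph with Brandes' algorithm; the edge list and node names are hypothetical, and mapping tools may report normalized values, so the numbers here are not directly comparable with CiteSpace output.

```python
from collections import deque

def build_adjacency(edges):
    """Build a symmetric adjacency mapping {node: set(neighbors)} from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def betweenness_centrality(adj):
    """Unnormalized shortest-path betweenness (Brandes' algorithm)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:                  # w is reached for the first time
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:       # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair of endpoints was counted from both sides.
    return {v: value / 2 for v, value in bc.items()}

# Hypothetical co-citation links forming a simple chain of four references.
edges = [("Ref A", "Ref B"), ("Ref B", "Ref C"), ("Ref C", "Ref D")]
bc = betweenness_centrality(build_adjacency(edges))
print(bc)
```

In this chain the two inner references lie on every shortest path between the outer ones, so they receive the highest scores, which is the sense in which a node is "pivotal" for information flow.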

Results and Analysis
SA research covers a wide range of disciplines because data with geographical or locational information have the feature of spatial autocorrelation. Additionally, SA research increased yearly over 1991-2021, which is shown in Figure 2 (i.e., Recs, the blue bars). Figure 2 also depicts the yearly changes of TLCS and TGCS for SA-related papers. Three significant peaks appear for 1993, 2007, and 2013, indicating that there were important contributions published in these years. There are also several moderate peaks between 1993 and 2007, showing that the SA methods and theories were continuously developed over these 15 years. The decreasing trend of the TLCS and TGCS after 2013 makes sense because there has not been enough time for newly published articles to be cited.
As mentioned in Section 2.1, the collected dataset contains complete metadata, which can tell a full story for each article. Therefore, compared to quantitative statistics, qualitative insights are more interesting. In this paper, we focus on influential journals, the main countries and institutions, representative research communities, hot topics and important references, and the evolution as well as possible research trends of SA research, which are discussed in the following subsections.

Influential Journals
An important element of scientific studies is the platform on which research findings are published and thus can be propagated. A formal platform can be an academic journal, which is often peer reviewed. Researchers need to know the influential journals in their research domains so that they can keep up to date with cutting-edge research. Table 2 presents the ranks of journals in terms of Recs, TLCS, and TGCS. It shows that the most productive journals relate to ecology. Specifically, Ecology, Global Ecology and Biogeography, and Ecography are the most representative journals with high TLCS values. These journals also have high TGCS values. In addition, Geographical Analysis is an outstanding geographical journal, with a TLCS of 1002 and a TGCS of 4937. Besides classic journals, open access journals such as PLoS ONE, Sustainability, and ISPRS International Journal of Geo-Information appear in the Recs rank. For a research domain, important papers are probably published in journals with a high TLCS, so scholars typically pay more attention to these journals.

Main Countries and Institutions
For a scholar, conducting research includes not only reading and publishing articles in academic journals, but also conducting academic visits and communicating their research. Hence, knowing the influential countries and institutions is necessary. Table 3 shows the ranks by Recs, TLCS, and TGCS, with the top ten countries extracted for each rank. The USA ranks first in all three aspects, indicating that it has a very large influence on SA research. Australia, Canada, and the UK also have considerable impact. China, the only Asian country listed, has the second largest number of SA publications, and middling TLCS and TGCS ranks. Table 4 presents the top ten institutions. Australian National University has the largest TLCS and TGCS values; the University of Montreal (Canada) and the Federal University of Goiás (Brazil) also have high TLCS and TGCS values. The Chinese Academy of Sciences and the University of Chinese Academy of Sciences published the most SA research papers but, of the two citation scores, rank only by TGCS, indicating that ground-breaking works may be somewhat lacking. Apart from these institutions, universities in the USA make up a large proportion of this table, which is consistent with the result in Table 3.

Representative Research Communities
The total number of authors for the 8461 papers is 27,752. To discover representative research communities, we conducted a co-authorship analysis for authors. Figure 3 shows the co-author networks of those authors who published more than five papers of SA research. Each node represents an author; the links between nodes indicate co-authorship between authors. A larger node means that the papers of the respective author have more citations; a thicker link between two nodes means more collaborations between the authors. Two types of nodes can be recognized in the figure: grey ones, whose authors have few co-authors and hardly form a community, and colored ones, whose authors have at least one co-author and indicate clusters. We focus on the colored clusters, which are labeled by authors' names, and discuss the research communities according to the geographical locations of the authors' affiliations. Of note is that the figures shown in Sections 3.3.1-3.3.4 (Figures 4-7) are zoomed-in counterparts of Figure 3, with the scores of normalized citations of the leading authors listed in each figure caption. (Figure 6b is zoomed out to fit the typesetting; in fact, the node of Svenning is slightly bigger than the node of Thuiller in (a).)
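The two quantities that shape such a co-author map, the publication threshold and the "normalized citations" node weight, can be illustrated with a small sketch. It assumes that a document's normalized citations equal its citation count divided by the mean citation count of all documents in the dataset from the same year, and that an author's score is the sum over their documents; this is our reading of the VOSviewer weight and should be checked against the tool's manual. The records and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (authors, publication year, citation count).
docs = [
    (["Author A", "Author B"], 2006, 90),
    (["Author B"], 2006, 30),
    (["Author B", "Author C"], 2012, 10),
    (["Author C"], 2012, 30),
]

def normalized_citations_by_author(docs):
    """Return ({author: number of documents}, {author: summed normalized citations})."""
    # Mean citation count per publication year across the dataset.
    by_year = defaultdict(list)
    for _, year, cites in docs:
        by_year[year].append(cites)
    year_mean = {year: sum(c) / len(c) for year, c in by_year.items()}

    n_docs = defaultdict(int)
    score = defaultdict(float)
    for authors, year, cites in docs:
        # Guard against a year in which no document was cited at all.
        norm = cites / year_mean[year] if year_mean[year] else 0.0
        for author in set(authors):
            n_docs[author] += 1
            score[author] += norm
    return dict(n_docs), dict(score)

def select_authors(n_docs, min_docs):
    """Keep authors with more than `min_docs` documents."""
    return {author for author, n in n_docs.items() if n > min_docs}

n_docs, score = normalized_citations_by_author(docs)
selected = select_authors(n_docs, min_docs=1)  # the map in the text uses more than five
print(sorted(selected), score)
```

Dividing by the yearly mean compensates for the fact that older papers have had more time to accumulate citations, so a larger node is not simply an older author.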

Research Communities in the Southern Hemisphere
Two communities can be recognized in the southern hemisphere, as shown in Figure 4. The first is the Peakall community, and the second is the Diniz-Bini-Rangel community. Peakall's GenAlEx, which is a "cross-platform package" [47] used to conduct population genetics analyses, contributes much to the TLCS and TGCS ranks of Australia and Australian National University (see Tables 3 and 4). The package provides both frequency-based and distance-based methods to explore the spatial pattern of genetic structures; not only classical statistical analyses are available, but also spatial autocorrelation analyses, including spatial heterogeneity tests for genetic structures. In recent years, Peakall's community has paid much attention to fine-scale genetic structure, which may offer new evolutionary insights that are overlooked by large-scale analyses [48][49][50].
The Diniz-Bini-Rangel community represents the Federal University of Goiás (i.e., Univ Fed Goiás in Table 4) in Brazil. The SA research output of this community relates to multiple themes, such as the geographical patterns of biodiversity and simulations [51], geographical genetics [52], and spatial statistics [53][54][55]. Compared to Peakall's community, the Diniz-Bini-Rangel community is more connected and balanced. It consists of nodes of similar sizes, which indicates that its authors were cited at similar frequencies, whereas Peakall was cited far more frequently than the other authors in Peakall's community. The authors in the Diniz-Bini-Rangel community focus on spatial analysis and modelling of species distributions and patterns within macroecology [56,57].
These two communities constitute the main forces of SA research in the southern hemisphere. However, research communities in the northern hemisphere are more diverse, so in the following discussion, they are grouped at the continent level.

Research Communities in Northern America
Figure 5 presents five communities of SA research from northern America. The Epperson community is devoted to research on geographical genetics, developing probability and distribution theories of spatial statistics [58] and simulation processes to analyze population genetic data [59]; Epperson also considered the geographical scale problem, which is similar to the modifiable areal unit problem (MAUP) [60,61] in geography and relates to the correlation among spatial statistics themselves [62]. The Jetz community (Yale University) addresses the geographical and environmental factors behind the distributions of biodiversity [63], and especially how scale dependencies act on biodiversity [64]; in their studies, spatial autocorrelation always mingles with environmental variables, which impacts species distribution or co-occurrence [65].
Scholars from the University of Montreal and the University of Toronto constitute the Legendre-Fortin community, which studies SA in the context of ecology. Legendre's SA work began with the paper published in 1993 [28], which develops a framework within which SA can be described and measured, hypothesis testing can be conducted properly, and spatial structures can be introduced to ecological models explicitly. This paper led to the citation peak in 1993 (see Figure 2). From then on, the SA works of Legendre and his team mainly emphasized exploring proper statistical methods for ecological studies [66]. Fortin's team also focused on modelling ecological processes [67,68], but from the perspective of conservation biology [69]. Sokal (1926-2012) was a pioneer who introduced SA to biology [70,71] and led studies of population genetics [27]; he collaborated extensively with authors in the Legendre-Fortin community. Another SA model-oriented research community is the Peres Neto-Dray community (Dray is from Université Claude Bernard Lyon 1, France; his co-authorship with northern American authors groups him into this community), whose members are especially interested in exploring multi-scale and multivariate problems in ecological studies [72], and problems relating to the spatial weights matrices which represent spatial structures [73][74][75].
The Griffith community is devoted to spatial statistics and geographical information science (GIS) studies relating to the domain of geography. Griffith developed the Moran Eigenvector Spatial Filtering (MESF) technique to deal with the SA latent in regression models for spatial data [76,77], and this community has developed a series of methods for solving mathematical or computational problems relating to SA (e.g., [78,79]).

Research Communities in Europe
Two communities of SA research, presented in Figure 6, were discovered by VOSviewer. The Thuiller-Kuehn community focuses on species distribution modelling [80], uncovering the relationships between species distribution (in time series) and environmental factors [81]. Kuehn's team developed an R package, "spind", to improve prediction accuracy by selecting appropriate accuracy measures [82], and to analyze lattice data at different spatial scales [83]. The Svenning community focuses on dealing with the SA latent in spatial data by means of regression models in order to conduct biodiversity studies [24].

Research Communities in Asia
The SA research forces of Asia are mainly in China. Figure 7 shows two communities: the Wang-Yang-Liu community, colored in red, and the Wang community, colored in pale blue. The common research object of the Wang-Yang-Liu community is cities. However, its members address different urban problems. Wang Shaojian and his co-workers study urban environmental pollution by spatial modelling, especially regression modelling with SA [33,84]. Yang Jun's team uses SA techniques to explore spatial factors that impact urban temperature [85,86]. Liu Yanfang's research group is interested in urbanization [34] and urban public services, such as green space [87] and medical facilities [88]. The Wang community is mainly made up of scholars from the Chinese Academy of Sciences. Wang Jinfeng and his team developed "Sandwich Spatial Sampling" [31] and released "GD" (geographical detector or GeoDetector) [32,89] to handle the spatial (stratified) heterogeneity latent in spatial/geographical datasets.
There are two further clusters of Chinese scholars in Figure 3: the blue cluster near the red community, and the green-yellow cluster near the Peakall community. We have not listed them because each of the major nodes (i.e., Wang Chao and Wang Yan) conflates different scholars whose names are spelled the same. In summary, the blue cluster consists of researchers who apply SA analysis to spatial epidemiology, including COVID-19 studies [35], and the authors in the green-yellow cluster published papers about the spatial distribution of chemical elements in soil [90,91].
We discuss 11 SA research communities in total. The Griffith community and the two Chinese communities (Figure 7) belong to the discipline of geography; the other eight communities can be grouped into the disciplines of ecology and biology. Most of these 11 communities originate methods, which is why they have great impact.

Hot Topics and Important Papers
We analyzed the research subjects of SA in Sections 3.2 and 3.3; in this part, we probe the research objects (hot topics) and important references.

Hot Topics
A keywords co-occurrence map generated by VOSviewer is shown in Figure 8 (the co-occurrences of selected keywords are also listed); it provides evidence for picking out hot topics from more than 8000 references. Because "spatial autocorrelation" is the common topic of the literature, its node is the largest and lies in the center of the co-occurrence map. Other keywords nevertheless have nodes of significant size, and we focus on these to gain knowledge of the hot topics.
The blue cluster represents spatial autocorrelation analysis for molecular data, for the word "genetic" appears frequently in this group (e.g., landscape genetics, genetic structure, genetic diversity, etc.). The possible hot research topics include exploring the sources of diversity, e.g., seed dispersal [92,93], (isolation by) distance [94,95], (genetic) differentiation [96], and developing computer programs [97] for spatial genetic data analysis. The green-yellow cluster indicates studies pertaining to biodiversity, whose related topics include (species) richness [98], and beta-diversity [99]. Studies of this type consider factors such as scale [100] and climate [101], which impact biodiversity.
The green cluster highlights spatial heterogeneity and variability, for which geostatistical methods such as the use of variograms are frequently employed [87,102]. In addition, technologies including remote sensing and lidar are used to investigate topics related to land use [103], soil [104], and vegetation [105]. In this cluster, machine learning techniques [106], e.g., random forest, are applied to SA-related studies. The red cluster addresses topics such as urbanization, carbon emission, economic growth, health and epidemiology pertaining to China [107][108][109][110]. The selection, application, and improvement of spatial models for specific research problems are frequently discussed by authors in this cluster [111][112][113]; in particular, the geographically weighted model is intensively developed and applied [114][115][116][117].
Figure 9 displays 959 papers; larger nodes indicate more influential papers. The biggest node is that of Peakall (2006) [20], who introduced GenAlEx 6 to the genetic analysis community; the popularity of this package may be attributed to its convenience, i.e., that it can be used directly in Microsoft Excel. Another computer package used by this community is PopGenReport (Adamack (2014)) [118]. Hence, the purple cluster presents the main articles about packages that can conduct genetic data analyses. The green group suggests important works about geographical or environmental factors that impact genetic structure (e.g., Vekemans (2004) [119] and Streiff (1998) [120]).
The important papers displayed in Figure 9 are in line with our reasoning: researchers prefer to cite references that present original methods or user-friendly tools for implementing data analysis in their specific domains. Two other frequently used computer packages are GeoDa [133] and spdep [134,135], which are designed to handle the spatial dependence hidden in geographical data. These references, however, are not presented in Figure 9.

Research Development and Trends
Section 3.4 discusses hot topics and important articles. However, those visualizations cover the whole time period, so papers published earlier have a greater chance of being displayed. To make the course of development of SA research from 1991 to 2021 clear, we need to disentangle this intertwined knowledge by taking the timeline into account, and CiteSpace [43,44] meets this requirement. Figure 10 shows the co-keyword (keyword co-occurrence) map and the co-citation map with the timeline divided into six periods: 1991-1996, 1997-2002, 2003-2008, 2009-2014, 2015-2020, and 2021. Table 5 presents the cluster labels of the co-keyword clusters map and the co-citation clusters map. Although CiteSpace extracted labels for the co-citation clusters of 1991-1996 and 1997-2002, we can also summarize more accurate expressions from the representative papers displayed in the respective clusters. In the first period (1991-1996, purple), Sokal's articles (e.g., [136,137]) were intensively cited; research in this period is mainly about SA analysis for biological data and consists of pioneering works introducing SA into biology. In the second period (1997-2002, blue), studies were about spatial genetic structure [138] and diversity [139], as well as the consideration of SA in ecological modelling [140]. In the third period (2003-2008, cyan), SA studies also focused on spatial genetic structure and diversity, but publications pertaining to genetic population structure were emerging (e.g., [141,142]). In the fourth period, authors studied species richness, species distribution, etc.; "land use" appeared in this period, indicating that SA methodologies were being applied to research more oriented toward people. In the fifth period, apart from the "ecological niche model" and "moran eigenvector", a large number of scholars employed SA methods to study urban problems (air pollution [143], urbanization of China [144]) and public health problems [145].
In Figure 10, a keyword's color(s) coincide with the color(s) of its year(s), e.g., "spatial autocorrelation" is colored purple, blue, cyan, light-green, yellow, and red like annual rings, indicating that it appears throughout 1991-2021. An article node is colored by the color(s) of its citation year(s), e.g., Wang JF (2016) [32] was cited in 2015-2020 and 2021. Pivotal keywords have purplish-red circles, and papers with citation bursts have red circles.
Table 5. Label titles of clusters in Figure 10. Num. of clusters: 12, 15.
There seems to be a change that appeared in 2009-2014, after which SA research in the geographical domain can be recognized against the global background, and SA studies began to cover problems pertaining to people and people's lives. It can be inferred from this trajectory that more and more humanistic studies employing SA methods will emerge in the future, so that SA may serve as a bridge connecting observed phenomena and their unobserved causes.
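The time slicing described in this section can be illustrated with a minimal sketch. It assumes only the six period boundaries stated in the text; the function names and the sample years are hypothetical and are not part of CiteSpace or of the dataset.

```python
from collections import Counter

# Period boundaries as listed in the text; 2021 forms a slice of its own.
PERIODS = [(1991, 1996), (1997, 2002), (2003, 2008),
           (2009, 2014), (2015, 2020), (2021, 2021)]


def period_of(year):
    """Return the label of the time slice that contains `year`."""
    for start, end in PERIODS:
        if start <= year <= end:
            return str(start) if start == end else f"{start}-{end}"
    raise ValueError(f"year {year} is outside 1991-2021")


def count_by_period(years):
    """Count records per time slice from an iterable of publication years."""
    return Counter(period_of(y) for y in years)


# Hypothetical publication years, for illustration only.
sample_years = [1993, 1996, 1997, 2005, 2010, 2014, 2016, 2021, 2021]
counts = count_by_period(sample_years)
```

Grouping records this way before building the keyword or citation networks is what allows each slice to be colored and compared separately.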

Merits and Shortcomings of This Paper
In our research, we employed bibliographical-data-driven methods to explore the features of SA research, an approach that has several merits. Firstly, the results give readers general information about SA studies, such as important journals, leading countries/regions, institutions, and representative authors and articles. Secondly, the visualizations developed with scientometric tools suggest the main research topics and the evolution of SA research. In short, bibliographical analysis and visualization present objective results and give researchers an overall view of SA research.
Although the results are objective, interpreting them depends on people. For example, we organized the research communities in terms of countries/regions; however, it may be better to discuss these communities in terms of disciplines or research domains. Another point that needs to be explained is the division of time periods. The period 1991-2021 may be divided in a more "representative" manner rather than being equally divided (with 2021 left as a single year). The term "representative" means that the topics in one period should differ from the topics in its neighboring periods so that the development process can be presented more clearly. Lastly, as we mentioned at the very beginning of this paper, SA research covers a wide range of disciplines, within which geography-related domains account for a small proportion, so works in geographical domains may be overlooked under such a huge base. To prevent this, we conducted co-keyword and co-citation analyses on papers within geographical domains.
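The co-keyword analysis mentioned above rests on counting how often two keywords are listed by the same record. The following is a minimal sketch of that raw counting step only; it is not the procedure implemented by CiteSpace or VOSviewer, which apply their own normalization and clustering, and the function name and sample records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(keyword_lists):
    """Count how many records list each unordered keyword pair together."""
    pair_counts = Counter()
    for keywords in keyword_lists:
        # Normalise case and drop duplicates within one record.
        unique = sorted({k.strip().lower() for k in keywords})
        pair_counts.update(combinations(unique, 2))
    return pair_counts


# Hypothetical records for illustration only; not taken from the dataset.
records = [
    ["spatial autocorrelation", "genetic structure", "seed dispersal"],
    ["spatial autocorrelation", "species richness", "scale"],
    ["Spatial Autocorrelation", "genetic structure"],
]
pairs = cooccurrence_counts(records)
```

Pairs with high counts become strongly linked nodes in a co-occurrence map; restricting `records` to one discipline, as done here for geographical domains, changes which pairs dominate.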

Refine the Results from a Geographer's Perspective
A total of 1699 records within geographical domains were extracted from the original dataset which have 8461 records. We conducted keywords co-occurrence analysis and co-citation analysis for the sub-dataset. Figure 11 shows topics highlighted in different time periods.
To interpret the co-keyword and co-citation maps more efficiently and clearly, we condense the information into Figure 12. Whereas the topics in Figure 10 concern applying SA methods to specific research objects, the topics in Figure 11 are more technical or method oriented. It can be seen from Figure 11 that research in the early 1990s often used the Monte Carlo method to simulate SA; in the middle-to-late 1990s and early 2000s, authors concentrated on local SA, which is the variation (also called spatial heterogeneity) at a finer spatial scale than the global scale, at which global SA may not be significant. Classical SA methods developed in this period were Anselin's LISA [146], Ord and Getis's local SA statistics [121,147], and Fotheringham's geographically weighted regression (GWR) [29]. Over 2003-2008, scholars considered temporal autocorrelation as well as spatial autocorrelation; related methods are thoroughly discussed in Cressie's work [148]. Constructing proper models for data with different features was one of the hot topics in 2009-2014. Many social scientists employed regression models to explore econometric problems, and model specification tests are an important part of the model-building process. The Lagrange multiplier test for SA and spatial heterogeneity developed by Anselin [149] was frequently used to conduct specification tests. In 2015-2020, GWR, Wang's geographical detector [89], and Griffith's Moran Eigenvector Spatial Filtering (MESF) [15,76] were widely applied. Moreover, intensive developments and implementations [150] of these methods also contributed to their popularity. In 2021, SA methods were used to discover spatial or spatiotemporal distributions or patterns of COVID-19 (e.g., [151,152]); in addition, algorithms such as projection pursuit [153] and fuzzy c-means [154] were improved for spatial clustering in 2021.
Figure 12 also indicates a technical evolution of SA research. From Monte Carlo simulation to adapting typical clustering algorithms for spatial clustering, SA methods and techniques evolve with research needs and the features of datasets. Three main research trends of SA techniques may be: (1) developing faster-computing methods to handle massive spatial datasets; (2) exploring more intensive model building or parameter setting schemes to deal with finer-scale datasets as well as diversified research objects; and (3) improving model diagnosing methodologies to ensure the reliability of spatial models for large, multi-source datasets.
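The distinction between global and local SA discussed in this section can be made concrete with a small numerical sketch. It uses the standard textbook forms of global Moran's I and the local Moran statistic (the best-known LISA), and assumes row-standardised rook-contiguity weights on a hypothetical 4 x 4 grid; the function names and the toy surface are illustrative and do not come from the paper or from any specific package.

```python
def rook_neighbors(nrows, ncols):
    """Map each cell index of a regular grid to its edge-sharing neighbours."""
    nbrs = {}
    for r in range(nrows):
        for c in range(ncols):
            cell = r * ncols + c
            nbrs[cell] = [
                rr * ncols + cc
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < nrows and 0 <= cc < ncols
            ]
    return nbrs


def local_moran(values, nbrs):
    """Local Moran's I_i with row-standardised weights (w_ij = 1 / #neighbours)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(zi * zi for zi in z) / n  # second moment of the deviations
    result = []
    for i in range(n):
        lag = sum(z[j] for j in nbrs[i]) / len(nbrs[i])  # spatial lag of z
        result.append(z[i] * lag / m2)
    return result


def global_moran(values, nbrs):
    """Global Moran's I; with row-standardised weights it is the mean of the I_i."""
    local = local_moran(values, nbrs)
    return sum(local) / len(local)


# Toy surface (hypothetical): the value increases from left to right.
values = [float(c) for r in range(4) for c in range(4)]
nbrs = rook_neighbors(4, 4)
local_i = local_moran(values, nbrs)
global_i = global_moran(values, nbrs)
```

Because the global statistic is an average of the local ones under this weighting, strongly positive and strongly negative local values can cancel out, which is why local SA may be present even when global SA is not significant. Significance testing (e.g., by permutation) is omitted from this sketch.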

Conclusions
This paper employs scientometric methods, i.e., scientometric indicators and scientometric network techniques (co-author/word/citation analysis), to gain an all-encompassing perspective of SA research, which covers a wide range of disciplines. Firstly, we used three indicators provided by HistCite, namely Recs, TLCS, and TGCS, to evaluate the impacts of journals, countries/regions, and institutions relating to SA research. The results indicate that most of the top journals are in ecological and biological domains, among which Geographical Analysis ranks highly in terms of TLCS and TGCS. North American countries, European countries, Australia, Brazil, and China, as well as institutions in these areas, are influential. This gives general information about SA research.
Secondly, we employed VOSviewer to conduct a co-author analysis and identified 11 SA research communities. Griffith's MESF community, Wang's GeoDetector community, and the Wang-Yang-Liu community are three groups contributing to SA research in the geographical domain. Anselin's GeoDa group was not recognized. A reason for this may be that we did not include "spatial association", which was the keyword of its seminal work [146], as a search topic.
Thirdly, we applied CiteSpace to conduct co-keyword and co-citation analyses and divided 1991-2021 into six time periods (2021 is listed as a single year) so that the evolutionary path can be clearly presented. Global (the whole dataset with 8461 records) and local (the 1699 records relating to geographical domains) analyses were both conducted, from which research trends from two different views can be inferred. The first is from the view of all the related disciplines: SA research may become more humanistic, i.e., researchers may focus more on people and the natural as well as social environment in which they live. SA models or SA analysis may better uncover the spatial patterns or key factors of the observed phenomena. The second is from the view of geography-related disciplines, for which we make a technical summary. As spatial datasets are becoming bigger and their scales finer, more efficient algorithms for computation are needed, as are more intensive spatial model building or parameter setting schemes. In addition, improving model diagnosing methodology is necessary for the reliable modelling of large spatial datasets drawn from multiple sources.
Although we have discussed SA-related research in geographical domains in the previous paragraph, it is still necessary to give an overall summary at the very end of this paper. As shown by the results of our analyses, SA-related research in geographical domains makes up only about 19% of the whole literature, in which research in ecological and biological domains accounts for the largest share. Before 2009, SA research in geographical domains can hardly be recognized against the global background, although fundamental works [4][5][6][15,29,146,147] were conducted by pioneering and later geographers. Hence, it may be after 2009 that these theoretical works of geographical domains became widely cited and deeply developed. Apart from the technical trends of SA research addressed in the above paragraph, the research trends of empirical studies should also be discussed. In fact, a very large portion of the detected research is applied research (e.g., the Wang-Yang-Liu community in Figure 7a and the co-keyword clusters in Figure 11a), which implies that SA-related methodologies are applicable to a wide range of research topics. Therefore, making geographers' work more visible and better known is important for applying SA to more domains that use data with locational information as their research objects.
A last point that needs to be mentioned is that not all SA-related research is included in the 8461 records, because no search strategy can guarantee a collection that leaves out not a single article. However, these records should cover the majority of publications. Meanwhile, this does not mean that works not included in the dataset or not mentioned in this paper are unimportant. Although we cannot guarantee that all SA papers are included in the dataset, the results of this work still have referential value for SA researchers: not only for beginners to start a research topic more efficiently, but also for (relatively) mature researchers to gain new insights into their studies.