5.1. Between-Group Variance
The test results from 78 undergraduates at CNU in Beijing, China, were used to examine the reliability of the STAT in the context of Chinese higher geography education. Cronbach’s alpha was 0.71, which we took to indicate an acceptably reliable set of items in this survey instrument.
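For readers who wish to reproduce the reliability coefficient, the following is a minimal sketch of Cronbach’s alpha computed from a respondent-by-item matrix of 0/1 scores. The function name `cronbach_alpha` and the toy responses are illustrative assumptions, not the study data or the software used in the study.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """scores: one row per respondent, one 0/1 entry per item."""
    k = len(scores[0])                          # number of items
    items = list(zip(*scores))                  # transpose to per-item columns
    item_var_sum = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Hypothetical responses (5 respondents x 4 items), not the study data.
toy = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(toy)
print(f"alpha = {alpha:.2f}")
```

The same function would apply to a 78 x 16 matrix of STAT responses; population variances are used for both the items and the total, so the ratio is unaffected by the choice of denominator.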
Each participant’s score was the number of questions answered correctly, and thus ranged from 0 to 16. An independent-samples t test was conducted to examine whether the STAT scores of the geography and geoinformation students differed significantly. The mean STAT scores of the two pedagogy groups in our study were 12.88 (the geography group, without GIS enactment) and 13.70 (the geoinformation group, with intensive GIS learning). The geography group scored lower than the geoinformation group, and the mean difference (−0.82) was marginally significant (p < 0.10, t = −1.68, SE = 0.49, 95% CI = −1.79 to 0.15). This finding is consistent with our presumption that GIS learning improves spatial reasoning abilities in human geography courses, although the 95% confidence interval includes zero and the evidence should therefore be read as tentative.
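The relationship between the reported mean difference, standard error, t value and confidence interval can be sketched as follows. The critical value of 1.99 is an assumption: it corresponds to roughly 76 degrees of freedom (78 students in two groups), whereas the exact degrees of freedom of the reported test are not stated in this section.

```python
mean_geography = 12.88        # reported group mean (no GIS enactment)
mean_geoinformation = 13.70   # reported group mean (intensive GIS learning)
se_diff = 0.49                # reported standard error of the difference

# Assumption: two-sided 95% critical value of Student's t for about 76
# degrees of freedom; the study's exact degrees of freedom are not given here.
t_crit = 1.99

diff = mean_geography - mean_geoinformation
t_stat = diff / se_diff
ci_low = diff - t_crit * se_diff
ci_high = diff + t_crit * se_diff

print(f"difference = {diff:.2f}, t = {t_stat:.2f}")
print(f"95% CI = {ci_low:.2f} to {ci_high:.2f}")
```

The results should land close to the reported values; small discrepancies are expected because the published inputs are rounded to two decimals.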
Table 3 presents the t test results for the geography and geoinformation groups on each of the 16 STAT items. The proportion of students answering each item correctly was calculated and tabulated in Table 3, ranging from 0 to 1. As verified by Lee and Bednarz, the STAT can assess how completely students understand the core concepts of spatial thinking suggested by Gersmehl et al. [45, 46, 47]. Table 2 details the components of critical spatial thinking that each STAT item type measures. Read together, Table 2 and Table 3 show how the two human geography pedagogy groups differ across these aspects of spatial reasoning. Levene’s test evaluates the assumption that the variances of the two pedagogy groups are equal. It was significant for the total score and for items #3, #7, #9, #11, #13, #14 and #15; for these, the equality-of-variance assumption is violated and we report the t value that does not assume equal variances. Among the eight STAT item types, labelled I to VIII in Table 2, only the latter four (V, VI, VII and VIII) showed significant or marginally significant mean differences between the geography and geoinformation groups. GIS learning thus appears to influence the components of spatial reasoning ability differently, and the standardized, comprehensive STAT allowed us to examine the individual skill sets and the differential effects of GIS training on them. Such an item-by-item analysis was also adopted in Lee and Bednarz’s validation of the STAT and in a 2016 empirical study on the effects of web-based GIS learning in a world geography context [10, 11].
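The t value that does not assume equal variances is commonly computed with Welch’s procedure. The following is a minimal sketch under that assumption; the function name `welch_t` and the 0/1 item scores are hypothetical and do not come from Table 3.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t test on two samples."""
    va = variance(a) / len(a)        # squared standard error of mean(a)
    vb = variance(b) / len(b)        # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical 0/1 scores on one item for the two groups, not the study data.
geography = [1, 0, 1, 0, 1, 0, 1, 0]
geoinformation = [1, 1, 1, 1, 1, 1, 1, 0]
t, df = welch_t(geography, geoinformation)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A negative t here means the first group scored lower, which matches the sign convention of the mean differences reported in this section (geography minus geoinformation).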
The mean score difference between the geography and geoinformation groups on #7, the second item of Type V, which requires comprehending and graphing positive and negative spatial associations on two maps, was marginally significant (mean difference = −0.17, p = 0.09, Cohen’s d = 0.38, small to medium effect size). However, the mean scores on #6 (also Type V) did not differ significantly. A plausible explanation is that #6 requires a simpler comparison of different layers of spatial information pertaining to the same area of the map, whereas #7 asks students to critically interpret and graph the spatial information, which is a higher-order thinking skill.
Unexpectedly, the geography students performed marginally significantly better on Type VI (#8), which requires spatial orientation in a real-world situation, 2-D to 3-D conversion and visualization of real-world images (mean difference = 0.20, p = 0.09, Cohen’s d = 0.42, medium effect size). This suggests that GIS enactment did not improve this type of spatial reasoning more effectively than traditional geography teaching. Extensive fieldwork in a human geography course may have strengthened the geography students’ imagination and orientation in a real-world 3-D situation, including how to associate spatial information on an unannotated 2-D topographic map with the real world and estimate from it. Types VI (#8) and IV (#5) are seemingly quite similar; unlike #5, however, #8 asks students to estimate the elevation of places with no labelled data points, and thus demands a higher-order spatial cognitive ability to mentally visualize, imagine and orient oneself in a real-world 3-D situation. It is therefore probable that students improve this type of spatial thinking through fieldwork in the real world rather than through GIS enactment.
The geoinformation group scored significantly higher on Type VII item #9, which requires Boolean operations such as overlaying and dissolving maps. Of the four Type VII items (#9, #10, #11, #12), only #9, which requires choosing the Boolean logic, showed a significant score difference (mean difference = −0.10, p = 0.02, Cohen’s d = 0.47, medium effect size); the other three items (#10, #11, #12), which require a unidirectional Boolean operation, showed no statistically significant mean difference between the geography and geoinformation groups.
The geoinformation group with GIS enactment scored higher on the Type VIII items (#13, #14, #15), which involve imagining and recognizing spatial data types and map symbols (e.g., point, line and polygon) and their spatial patterns from visually or verbally expressed spatial information. The mean score differences between the geography and geoinformation groups were −0.36 for item #13 (p < 0.01, Cohen’s d = 0.94, large effect size), −0.14 for item #14 (p = 0.04, Cohen’s d = 0.46, medium effect size), and −0.06 for item #15 (p = 0.08, Cohen’s d = 0.35, small to medium effect size). Although no significant difference was found for item #16, it is notable that both groups performed poorly on it, with over half of the students incorrectly comprehending and constructing the data types and symbols.
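The effect sizes quoted for the individual items can be illustrated with a short sketch of Cohen’s d based on the pooled standard deviation. This is one common definition and is assumed here, since the section does not state which variant was used; the function name and the scores are hypothetical.

```python
import math
from statistics import mean, variance

def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)

# Hypothetical 0/1 scores on one item for the two groups, not the study data.
geography = [1, 0, 1, 0, 1, 0, 1, 0]
geoinformation = [1, 1, 1, 1, 1, 1, 1, 0]
d = cohens_d(geography, geoinformation)
print(f"Cohen's d = {d:.2f} (magnitude {abs(d):.2f})")
```

The sign of d follows the order of the groups; the values reported above appear to be magnitudes, which corresponds to `abs(d)` in this sketch.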
5.2. Comparison with Lee and Bednarz’s Study
Our STAT test examined Chinese undergraduate students’ spatial thinking ability within contemporary human geography teaching in China, which follows a problem-based pedagogy oriented towards sustainability. CNU is a “Double First-Class” university in China (sorted by discipline), and its geography discipline was ranked B+ on the 2017 list of universities and disciplines to be developed under China’s “Double First-Class” (Shuang yiliu) initiative, which was released by the Chinese Ministry of Education, Ministry of Finance and National Development and Reform Commission in September 2017 [48]. CNU and its geography discipline can therefore be taken as representative of the medium to upper level of higher geography education in China.
We therefore compare the Chinese college students’ spatial thinking ability with existing STAT reports from American universities: those located in Texas, Ohio, Illinois and Oregon in Lee and Bednarz’s 2012 STAT test, and Texas State University and the University of West Georgia (UWG) in Injeong Jo, Jung Eun Hong and Kanika Verma’s 2016 STAT test [10, 11].
Table 4 summarizes and compares the STAT results from a total of seven universities in China and the USA. Following Lee and Bednarz’s analysis of the possible correlation between STAT scores and the percentage of geography majors, the better performance of the Chinese students in the STAT can be partially attributed to the apparent gap in the percentage of geography majors in each group [10]. Jo et al. explained the difference between Texas State and UWG using the same reasoning [11]. Other possible reasons merit further investigation, such as the large differences between Oriental and Western cultures in geography textbooks and pedagogies, and in social and cultural capital for STEM (the interdisciplinary and applied curriculum in science, technology, engineering and mathematics), critical thinking and spatial cognition abilities [49, 50, 51].
Figure 3 shows the percentage of participants correctly answering each of the 16 STAT items, as reported in Lee and Bednarz’s 2012 STAT test and in this study [10]. The horizontal axis of Figure 3 shows the 16 STAT items, and the vertical axis shows the percentage of participants answering each item correctly. The item-level scores for Texas State and UWG were not given in the published paper [11] and are therefore not compared in Figure 3. Univ. A had the highest percentage of geography majors and also the highest mean STAT score among the American samples. For this reason, the comparison focuses on the difference between CNU in China and Univ. A in the USA. As shown in Figure 3, the Chinese students were better at solving the ideal location choice problem based on given criteria in Type III (#4). More than 80% of the Chinese participants could perform the spatial reasoning needed to visualize, overlay and manipulate spatial objects that cannot be physically overlaid on a map, whereas only about 70% of the American participants chose the right location. Furthermore, the Chinese students executed the complex Boolean operations more successfully on all four Type VII items (#9, #10, #11, #12), displaying higher spatial cognition than their American peers in this respect. These contrasting performance profiles illustrate the advantage of the standardized STAT, which supports a multi-faceted examination of spatial thinking as a set of distinct skills. Participants from different pedagogies or demographic and cultural contexts may be good at very different spatial thinking skills. Next, we look into the correlation between the reported STAT scores and exam ranks.
5.3. Statistics Explaining the Applicability of the STAT Test in the Chinese Context
In Lee and Bednarz’s 2009 STAT test, exam grades were found to explain students’ STAT scores quite well, but this did not hold in our empirical study. Table 5 presents the correlations between STAT scores, final exam rank in the class and the students’ self-assessments of geography learning in the college and high school periods. We expected positive correlations between students’ STAT scores and both their final exam rank and their self-assessments of geography learning. This research assumption, however, was not supported by the empirical findings listed in Table 5. A moderate negative correlation was found between STAT scores and final exam rank (r = −0.31, p = 0.006), and weak negative correlations were found between the students’ STAT scores and their self-assessments of geography learning (r = −0.19, p = 0.10; r = −0.19, p = 0.10).
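The p values attached to these correlations follow from the usual t transformation of Pearson’s r. The sketch below recomputes them with the standard library only, integrating the t density numerically with Simpson’s rule. It assumes that all 78 participants contributed to each correlation, which this section does not state; the helper names are illustrative.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p value via Simpson's rule on the t density from 0 to |t|."""
    upper = abs(t)
    h = upper / steps                        # steps must be even for Simpson
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                     # P(0 < T < |t|)
    return 2 * (0.5 - area)

def correlation_p(r, n):
    """t statistic and two-sided p value for a Pearson r from n pairs."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    return t, two_sided_p(t, df)

# Assumption: n = 78 for both correlations.
t_rank, p_rank = correlation_p(-0.31, 78)
t_self, p_self = correlation_p(-0.19, 78)
print(f"rank: t = {t_rank:.2f}, p = {p_rank:.3f}")
print(f"self-assessment: t = {t_self:.2f}, p = {p_self:.2f}")
```

Under this assumption the recomputed p values should fall close to the reported 0.006 and 0.10.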
Such a mismatch between the STAT and the final exam in geography teaching deserves more attention, because it suggests that higher geography education in China may not fully embrace spatial thinking capacity as a strategic goal. The difference in outline between the Chinese and American geography education systems is visible in China’s 2017 version of the compulsory geography curriculum criteria for secondary school. These national curriculum criteria, released during the new curriculum reforms in contemporary China, highlight four types of geographic abilities as the core of geography education in China, including higher education: a) the sustainability ideal of People-Place Harmony; b) a spatially integrative knowledge system for sustainable development; c) regional cognition towards sustainability; and d) the newly proposed concept of geographic practice ability [52]. It appears that Chinese students’ spatial cognition is not taught or tested independently of problem solving for sustainable development.
The standardized STAT reported a better performance of the Chinese students in solving location choice problems and practicing Boolean operations (see Figure 3), but the test was not fully consistent with geography teaching pedagogies oriented towards sustainability in real-world situations in contemporary China, such as the extraordinarily fast urbanization and industrialization process and very challenging pollution issues. Interestingly, the mismatch between STAT scores and Chinese geography pedagogies was more prominent in the geography group (without any GIS enactment) than among their geoinformation peers, who were exposed to intensive GIS practice (see Table 5). However, the geoinformation group’s self-assessments of geography learning showed a weak negative correlation with their exam rank in the class, implying that higher student motivation in geography learning did not necessarily produce higher achievement. This result may be attributable to the geoinformation students’ stronger identification as future IT (information technology) professionals rather than as geography teachers in secondary schools.
Table 6 lists the results of a regression explaining the variability in STAT scores. The human geography pedagogical approach (GIS enactment or not), the local-nonlocal divide and ethnicity (Han or not) were significant predictors (R Square = 0.47, adjusted R Square = 0.40, F = 6.07, p < 0.01 for the regression model). This indicates that GIS teaching, being a Beijing local student and Han ethnicity were associated with higher performance on the standardized STAT. The other independent variables were not significant, indicating that several factors regarded as important in educational research (such as the gender gap, student motivation and parental social capital) were not captured by the STAT here. The limitations of applying the STAT in the Chinese context may partially explain these regression results. Despite these limitations, our preliminary assumption about the different effects of the two human geography pedagogical approaches (GIS enactment or not) was supported quite well in this test. More comparative experimental research on Oriental-Western gaps and other socio-cultural backgrounds is needed for a more solid explanation.
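The link between R Square, adjusted R Square and the overall F statistic can be sketched with the textbook formulas for a linear regression with an intercept. The sample size n = 78 and the number of predictors k = 9 are assumptions for illustration only, since this section does not state how many predictors Table 6 contains or whether any cases were excluded.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def f_statistic(r2, n, k):
    """Overall F statistic of a linear regression with an intercept."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# Reported R Square; n and k are assumed values, not taken from Table 6.
r2, n, k = 0.47, 78, 9
adj = adjusted_r2(r2, n, k)
f_value = f_statistic(r2, n, k)
print(f"adjusted R Square = {adj:.2f}, F = {f_value:.2f}")
```

With these assumed values the adjusted R Square rounds to 0.40. The F value is sensitive to the exact n and k, so the sketch is not expected to reproduce the reported F exactly.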