Linking Importance–Performance Analysis, Satisfaction, and Loyalty: A Study of Savannah, GA

Importance–performance analysis (IPA) has been widely used to examine the relationship between importance, performance, and overall satisfaction in tourism destinations. IPA implicitly assumes that attribute performance will have little impact on overall satisfaction when stated importance is low. However, this assumption is rarely tested. This study, for the first time, tested this assumption by including attributes in each IPA quadrant into a second-order structural equation model. Results indicate that attributes with lower ratings of importance in the “low priority” and “potential overkill” quadrants do not contribute to overall satisfaction, regardless of performance, while the opposite is true for attributes in the “keep up the good work” quadrant with higher ratings of importance and performance, thus confirming the validity of this assumption. This novel approach allowed us to take a fresh look at an old debate, and the results suggest stated importance may be more useful than previously thought. Theoretical, methodological, and managerial implications are discussed.


Introduction
Although it is widely agreed that sustainability conceptually consists of three dimensions (environmental, social/cultural, and economic), it is argued that "sustainability cannot simply be a 'green' or 'environmental' concern, no matter how crucial those aspects of sustainability are" [1] (p. 1). Instead, more attention should be paid to other aspects of sustainability, including economic development and social justice. The economic development of a destination largely depends on the quality of the tourism experiences it provides and the resulting satisfaction. Economically, satisfied tourists are more likely to pay more, recommend the destination to others, and visit again [2,3]. Thus, tourist satisfaction plays a crucial role in contributing to the success of a destination [4,5].
A tourism destination is usually composed of various attributes, involving accommodation, food, activities, shopping, and services, among others. It is the tourist's perceptions of these attributes in a destination that interact to form a composite image of it [6], which affects his or her overall satisfaction. In most cases, it is very unlikely that a tourist will experience all attributes in a destination during a single trip. Moreover, the attributes experienced may not be equally important [7]. For example, if a person's main purpose in visiting a destination is snorkeling, then the opportunity to observe varied and colorful marine life would be of high importance to him/her. Accordingly, the snorkeling experience would have a considerable impact on his/her overall satisfaction at the destination. In contrast, other attributes (e.g., sightseeing, hiking, etc.) that are not central to the main purpose of his/her trip may mean less in terms of their importance and performance [8]. Thus, an attribute in a destination may have different meanings for different tourists, and may not contribute equally to overall satisfaction. As a result, "determining the attributes of a product or service most important
The IPA framework was introduced in marketing research to understand customers' satisfaction by matching their perceptions of attribute importance and performance [11]. Importance and performance ratings are displayed on a two-dimensional grid and fall into one of four quadrants: "keep up the good work", "potential overkill", "low priority", and "concentrate here" (Figure 1). Another classification approach is to insert an upward 45° diagonal line (i.e., iso-rating or iso-priority line) to identify management priorities. Items on the line imply that performance equals importance, while items above the line indicate that performance fails to meet importance and thus requires improvement [12,13].
The use of the IPA has attracted criticism. One controversial issue is where the intersecting lines, or crosshairs, should be positioned [14][15][16]. Two approaches are commonly used: the scale-centered approach, which uses the scale midpoint [17][18][19][20], and the data-centered approach, which uses the average of all importance or performance ratings [21][22][23][24][25][26][27]. Of the two, the latter is more common: "the majority of researchers use the mean values of actual importance and performance ratings when specifying the thresholds" [15] (p. 45). This study followed this approach with a slight modification [28], wherein the mean of the mean differences (0, 0), instead of the mean of the original ratings, was used as the crosshairs. If Excel or SPSS is used, all points are automatically positioned in the IPA grid, with the crosshairs at (0, 0) and the iso-rating line running diagonally through the origin [28]. Data-centered methods have been used in a number of studies measuring satisfaction in the field of tourism [29][30][31][32][33].
Another controversial issue is the use of stated or absolute importance, obtained directly from tourists, vs. derived importance, obtained indirectly through statistical methods (i.e., multiple regression analysis [9,34]; partial correlation analysis [22,23]; and simple regression analysis [35][36][37]). Although derived importance has advantages over stated importance (i.e., reduced social desirability bias and fatigue bias [37,38]), this approach is not without limitations. Apart from possible multicollinearity among attributes [39], some multiple regression or partial correlation analyses may generate negative coefficients, which "violates the positive relationship between service attributes and overall satisfaction in practice" [37] (p. 456). Although this indirect approach has gained popularity among some researchers, stated importance has been found to be more robust than derived importance in predicting overall satisfaction [9,40] and brand choice [41]. Thus, for the purpose of this study, stated importance was used.
Although there are some methodological limitations with the IPA in terms of its validity and reliability, the method has been widely used to measure product and service quality because of its simplicity, intuitiveness, and effectiveness [42]. While the IPA was initially designed as a market technique to elucidate management strategies of a business, its application has extended to many areas of services industries, including hospitality and tourism [15], wherein tourists' satisfaction at the attribute level is examined along with attribute importance and performance.

Importance, Performance, and Satisfaction
Satisfaction is defined as "a judgement that a product or service feature, or the product or service itself, provides a pleasurable level of consumption-related fulfilment" [43] (p. 13). Satisfaction has been viewed as an outcome of a mental process where the perceived performance of a product or service is compared with various standards (e.g., expectations). How satisfied an individual is largely depends on whether his/her perceived performance exceeds or equals expectations, or not. An individual feels satisfied if positive disconfirmation (i.e., performance exceeds expectations) occurs, while negative disconfirmation (i.e., performance fails to meet expectations) results in dissatisfaction.
There are two methods for measuring disconfirmation: inferred measure and direct measure [44]. In an inferred measure, "the score for the measured standard of comparison is subtracted from the score for perceptions" while a direct measure directly asks the consumer to indicate on a Likert scale about his/her expectations being "better than expected" or "worse than expected" [44] (p. 121).
In most cases, expectations are measured before the purchase/consumption of a product/service, while service quality and satisfaction are measured on site [44]. However, it is not realistic to obtain people's expectations before they leave for a destination, nor is it accurate to assess their expectations on site by asking them to recall what they expected before leaving [31]. Alternatively, much of the research has chosen to measure attribute importance and performance and then examine the extent to which these two measures correlate with attribute-level satisfaction or overall satisfaction [4,9,45,46]. However, some have used perceived performance or quality of service as a priori to measure satisfaction [9,31], although these constructs are conceptually different [45].
It is suggested that importance and expectations are potential antecedents to performance [11]. In other words, attribute performance could be directly influenced by attribute importance. Some argue that importance is positively related to performance [14]. Previous studies have found that both importance and performance have significant effects on overall satisfaction. For example, one study found that both importance and performance significantly and directly contribute to satisfaction; the impact of performance, however, is more significant than that of importance [4]. Moreover, importance indirectly impacts satisfaction via its impact on performance. A study [47] measured service quality in several types of tourism businesses, finding that the mean of the performance perception scores evaluates perceived service quality as well as or better than the computed quality score of perceptions minus expectations. Similar findings have also been reported in several other studies [48,49]. Performance was also found to be a better predictor of service quality than importance and other predictors [45] (e.g., expectations, importance times expectations, importance minus performance, importance times performance, performance minus expectations, and importance times the difference between performance and expectations), although these predictors are also significantly and positively related to the overall measure of quality.
Contrary to these findings on performance being more significant than importance in predicting overall satisfaction, a study [50] in Australia involving visits to various aspects of aboriginal culture suggested that attribute importance determined satisfaction ratings. Another study [51] tested the relationship between product categories, product attributes, and satisfaction for shopping, finding that product attribute importance and satisfaction were significantly correlated for Japanese tourists visiting Hawaii. It is argued that importance and performance are interdependent and cannot be separately treated when overall satisfaction is examined, based on the rationale that "overall satisfaction will be influenced less by attribute performance when the self-stated importance of the attribute is low" [9] (p. 303). In other words, the performance of an attribute, regardless of its quality, may not affect a tourist's overall satisfaction if the tourist did not experience or did not care too much about the attribute, while the opposite would be true for attributes with high levels of importance and performance. This assumption leads to the following four hypotheses:

Hypothesis 1 (H1).
Attributes in the "keep up the good work" quadrant will significantly and positively contribute to overall satisfaction.

Hypothesis 2 (H2).
Attributes in the "concentrate here" quadrant will significantly and negatively predict overall satisfaction.

Hypothesis 3 (H3).
Attributes in the "low priority" quadrant will not contribute to overall satisfaction.

Hypothesis 4 (H4).
Attributes in the "potential overkill" quadrant will not contribute to overall satisfaction.

Satisfaction and Loyalty
Loyalty, in the marketing literature, has been defined in three dimensions: attitudinal, behavioral intention, and actual behavior [52]. Attitudinal loyalty refers to expressed liking for a destination/establishment or festival/event without overt intentions, while behavioral intention is defined by intention to revisit, recommend, and say positive things about a person's experience. In addition, willingness to pay more and likelihood to switch are also included in some studies to measure behavioral intention [53,54]. Finally, behavioral loyalty refers to the actual purchase of a product/service (e.g., proportion of nights/visitors/dollars spent at a particular brand or property, frequency of visits, and actual patronage of a destination) [52].
Satisfaction is considered a necessary step towards the formation of loyalty [55]. The relationship between satisfaction and loyalty has been measured in numerous studies on festivals [56], hospitality [53], and tourism [57]. A strong positive relationship between satisfaction and loyalty was found in a meta-analysis of festival literature [56]. This positive relationship was also found in another meta-analysis of hospitality literature [52] and in most other individual studies on tourism destinations [58]. Thus, the following hypothesis is proposed:

Hypothesis 5 (H5).
Overall satisfaction will significantly and positively predict loyalty.

Study Area
The city of Savannah, Georgia, USA (our study location), is located at 32°3'3"N, 81°6'14"W. With a total area of 202.3 km², it is the largest city in Chatham County and the fourth largest city in Georgia. In 2015, it had an estimated population of 145,674 [59].
Savannah (founded in 1733) is internationally known as the first planned city in the US. The downtown area is among the largest National Historic Landmark Districts in the country and is abundant in cultural and historical attractions. It has unique architecture and historic buildings in addition to its nature-based attractions, such as coastal islands, botanical gardens, and city parks. It is famous for its 22 nature-dominated, parklike public squares that contain woods and gardens. Savannah's city plan is organized around the squares, which are the main components of the city's urban forests.
The unique and elegant architecture, fountains, ornate ironwork, and green squares have been a major interest for over 50 million visitors in the last 10 years [60]. In 2014, more than 13.4 million people visited the city. Of those visitors, 7.6 million were overnight tourists and 5.8 million were day-trippers [61]. Savannah offers a variety of tourism attractions and visitor activities (e.g., city trolley and carriage tours around the public squares and architecture, boating along the Savannah River, dining in restaurants, shopping in antique and souvenir stores, attending guided night walks, and attending festivals and special events).

Data Collection
The study used a convenience sample of visitors walking in Rousakis Plaza/River Street, a popular attraction in the city. A convenience sampling method is used to collect a nonprobability sample and has been widely used by "almost all of the major public opinion polling groups, political polling groups, and market research organization" [62] (p. 49). Study participants (aged 18 years or older) were asked to complete an on-site questionnaire covering their trip characteristics, perceptions of tourism attributes (i.e., importance, performance, and satisfaction), overall satisfaction, and loyalty. Savannah's Park and Tree Department, the Savannah Visitor Information Center, and other project collaborators helped to develop the study instrument. Sampling took place from July 2008 to August 2009.

Measurement
Visitors' perceptions of attribute importance, performance, and satisfaction were measured by 23 items on a 7-point Likert scale (1 = the least important, 7 = the most important for importance; 1 = strongly disagree, 7 = strongly agree for performance; and 1 = extremely dissatisfied, 7 = extremely satisfied for attribute-level satisfaction and overall satisfaction). Likewise, destination loyalty (i.e., behavioral intention) was also measured on a 7-point Likert scale (1 = strongly disagree, 7 = strongly agree). Visitors were also asked to provide information on their socio-demographics (i.e., gender, age, education, occupation, and residency) and trip characteristics (i.e., lengths of stay, frequency of visits, group size, and expenditures). Finally, they were asked to provide comments in an open-ended question about the study and tourism in the city.

Data Analysis
The expectation-maximization algorithm [63] was used to replace missing data before conducting data analyses. A total of 36 of the 640 returned questionnaires were excluded due to systematically incomplete responses or suspicious response patterns. This reduced the sample size to 604 for further analysis. After the screening, the missing rate was less than 5% for all variables, which is inconsequential [64].
In addition, data skewness and kurtosis for observed variables were also tested. If absolute values of univariate skewness and univariate kurtosis for a sample are within 2 and 3, respectively, the data can be considered not to "extremely" deviate from the normal distribution [65]. The normality assessment indicated that the absolute values of all endogenous variables met the criteria, except for the four loyalty items, whose skewness values met the criteria of less than 2, but kurtosis values were greater than 3, ranging from 3.6 to 5.3, resulting in a moderate violation of normality. To correct this non-normality issue, the maximum likelihood bootstrapping technique with 10,000 bootstraps was used [66].
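The screening rule described above can be illustrated with a minimal Python sketch. It assumes that the kurtosis threshold refers to excess kurtosis (a normal distribution scores 0) and that population (biased) moment estimators are adequate; the study does not state which estimators were used. The function names and the example ratings are hypothetical and are not taken from the study data.

```python
from statistics import fmean


def skewness_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) from population central moments."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0  # assumption: "kurtosis" means excess kurtosis
    return skew, excess_kurt


def within_normality_thresholds(values, max_abs_skew=2.0, max_abs_kurt=3.0):
    """Apply the |skewness| <= 2 and |kurtosis| <= 3 rule of thumb."""
    skew, kurt = skewness_and_excess_kurtosis(values)
    return abs(skew) <= max_abs_skew and abs(kurt) <= max_abs_kurt


# Hypothetical 7-point ratings, invented for illustration only.
symmetric_item = [3, 4, 4, 5, 5, 5, 6, 6, 7]
peaked_item = [7] * 18 + [1, 1]

print(within_normality_thresholds(symmetric_item))
print(within_normality_thresholds(peaked_item))
```

Items that fail such a check are the ones for which a remedy like the bootstrapping described above would be considered.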
Data analyses consisted of four steps. First, gap analysis was conducted using paired-sample t-tests to examine the extent to which perceptions of importance and performance matched one another. Second, IPA was conducted, with attributes graphically displayed on the I-P grid. Third, factor analysis was conducted for the five items measuring loyalty and for the attributes in each quadrant. Finally, SEM was used to test the five proposed hypotheses.
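As an illustration of the paired comparison in the first step, the following Python sketch computes the t statistic for the difference between performance and importance ratings given by the same respondents. It is a sketch under stated assumptions rather than the procedure actually run for the study: the function name is hypothetical, the ratings are invented, and the p-value (which requires the t distribution) is deliberately left out.

```python
import math
from statistics import fmean, stdev


def paired_t_statistic(performance, importance):
    """Return (t, degrees of freedom) for paired performance/importance ratings."""
    diffs = [p - i for p, i in zip(performance, importance)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)  # sample SD of the differences
    return fmean(diffs) / standard_error, n - 1


# Hypothetical ratings from six respondents for one attribute (illustration only).
performance = [5, 6, 4, 5, 6, 5]
importance = [6, 7, 6, 6, 6, 7]

t_value, df = paired_t_statistic(performance, importance)
print(round(t_value, 2), df)
```

A negative t value indicates that performance is rated lower than importance for that attribute.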
For the factor analysis, principal components analysis was conducted to obtain latent variables. Varimax rotation and eigenvalues of 1.00 or more were used to identify potential factors. The appropriateness of the data for factor analysis was tested using the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy and Bartlett's test of sphericity. A cut-off point of 0.45 was used to select items for a factor [67]. Cronbach's alpha coefficients were calculated to estimate the reliability of factors. Although a Cronbach's alpha value of 0.70 [68] is commonly accepted for measuring a scale's reliability, a lower value of 0.60 is also acceptable in exploratory studies and in research in the social sciences and psychology [69,70].
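For readers unfamiliar with the reliability coefficient, Cronbach's alpha for k items is k/(k − 1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The Python sketch below applies this standard formula; the function name and the three-item example data are hypothetical and do not come from the study.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha; `items` is a list of item columns (one list per item)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_variance_sum = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical ratings from five respondents on three items (illustration only).
item_1 = [5, 6, 7, 4, 6]
item_2 = [5, 7, 7, 3, 6]
item_3 = [4, 6, 6, 4, 7]

alpha = cronbach_alpha([item_1, item_2, item_3])
print(round(alpha, 2), alpha >= 0.70)
```

The final comparison mirrors the 0.70 convention mentioned above; the 0.60 threshold for exploratory work could be substituted in the same way.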
In an SEM analysis, a model is considered acceptable if the p-value of the χ² test exceeds 0.05. However, the p-value is usually less than 0.05 because the χ² value is sensitive to sample size as well as model complexity, which would lead to a well-fitting model being rejected if judged by this criterion alone [71]. Alternatively, a χ²/df ratio as high as 5 was used to assess model fit [72].
Several other indices, such as RMSEA (Root Mean Square Error of Approximation), CFI (Comparative Fit Index), IFI (Incremental Fit Index), NFI (Normed Fit Index), RFI (Relative Fit Index), and TLI (Tucker-Lewis Index), have also been used in the literature to judge model fit. An RMSEA value ranging from 0.05 to 0.10 indicates a fair fit [73], while 0.90 is commonly used as the cut-off criterion for CFI, IFI, NFI, RFI, and TLI. However, 0.95 has also been recommended as the threshold for a better fit [74].
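To make these cut-offs concrete, the sketch below recomputes the RMSEA point estimate from the χ²/df ratio and sample size reported in this study (χ²/df = 3.36, N = 604) and checks the reported incremental indices against the 0.90 criterion. It assumes the common formula RMSEA = sqrt(max(χ² − df, 0) / (df (N − 1))); some software divides by N instead of N − 1, and the formula actually used by the authors' software is not stated. The function name is hypothetical.

```python
import math


def rmsea_from_ratio(chi2_df_ratio, n):
    """RMSEA point estimate from the chi-square/df ratio and sample size n.

    Assumes RMSEA = sqrt(max(chi2 - df, 0) / (df * (n - 1))), which reduces to
    sqrt(max(ratio - 1, 0) / (n - 1)).
    """
    return math.sqrt(max(chi2_df_ratio - 1.0, 0.0) / (n - 1))


# Values reported in the results of this study.
reported_ratio = 3.36
sample_size = 604
incremental = {"CFI": 0.93, "IFI": 0.93, "NFI": 0.91, "RFI": 0.90, "TLI": 0.92}

rmsea = rmsea_from_ratio(reported_ratio, sample_size)
ratio_ok = reported_ratio <= 5            # chi-square/df criterion
fair_rmsea = 0.05 <= rmsea <= 0.10        # "fair fit" range
all_above_090 = all(value >= 0.90 for value in incremental.values())

print(round(rmsea, 3), ratio_ok, fair_rmsea, all_above_090)
```

Under these assumptions the recomputed value agrees, after rounding, with the RMSEA of 0.063 reported in the results.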

Sample Characteristics
In total, 1219 visitors were approached, with 640 agreeing to fill out the questionnaire, resulting in a response rate of 52.5%. The majority of respondents were female (57.0%), aged 26 to 54 (47.3%), well educated, and affluent. Most respondents (66.2%) had visited the city at least once prior to the current trip. In addition, the average group size was 3.22 persons and the average length of stay was 2.77 days.
Table 1 presents the paired-sample t-tests for mean differences between performance and importance. Attribute satisfaction is also presented in the table. As shown, the mean values for importance, performance, and attribute satisfaction were 5.73, 5.60, and 5.64, respectively. The mean for overall satisfaction was 6.04. "Personal safety and security" had the highest rating on importance (M = 6.31), while the item "attractive architecture/buildings" was rated highest on performance (M = 6.28) and satisfaction (M = 6.27). In contrast, "good night-life/entertainment" was rated lowest on importance (M = 4.83), and "parking availability" was rated lowest on both performance (M = 4.14) and satisfaction (M = 4.25). The t-tests showed that 19 out of 23 pairs were significantly different, with nine items measuring safety, price, and food/service being significantly lower in performance than in importance. However, 10 items, measuring green space, variety of restaurants, shops, and accommodation, and entertainment/nightlife, had a significantly higher performance than importance.
A correlation analysis indicated that performance, importance, and attribute satisfaction are highly correlated with one another. It is worth noting that attribute performance and attribute satisfaction are highly correlated, with a coefficient of 0.985, which is close to 1.0, suggesting a near-perfect match between attribute performance and attribute satisfaction (Figure 2). As the figure shows, the two lines representing attribute performance and attribute satisfaction are aligned very well.

Importance-Performance Analysis
As mentioned in the methods section, neither the midpoint of the scale (i.e., 4.0 out of 7.0), nor the mean values of the raw scores, were used to determine the position of the crosshairs in the I-P grid. Instead, the mean (which is 0) of the mean differences between the raw scores for importance or performance and associated means was used to determine the position of the crosshairs. The results are displayed in Figure 3. If judged by the iso-rating line, all 14 items below the diagonal line perform well (i.e., items 1-4 on green space, items 5, 10 and 11 on variety of products, item 9 on friendliness of local people, items 12-14 on cultural and heritage resources, item 17 on information/welcome centers, and item 23 on good night-life/entertainment), while the nine items above the line require special attention to improve (i.e., item 6 on street cleanliness, items 7 and 8 on safety, items 15, 16, and 22 on price, item 18 on parking, and items 19 and 21 on food and restaurant service). Interestingly, these nine items are identical to the nine items whose performance is perceived significantly lower than their importance in the gap analysis.
Alternatively, if judged by the four quadrants, the four green space items (i.e., items 1-4), three safety items (i.e., items 7-9), and two heritage items (i.e., items 13 and 14) are located in the "keep up the good work" quadrant. Item 6 "street cleanliness" and two items on food and restaurant service (i.e., items 19 and 21) are in the "concentrate here" quadrant. Three items measuring the price of food/accommodation, attractions, and meals (i.e., items 15, 16 and 22), along with another three items measuring parking (item 18), variety of cuisine (item 20), and nightlife/entertainment (item 23), are positioned in the "low priority" quadrant. Finally, five items are in the "potential overkill" quadrant, including three items involving the variety of products (i.e., items 5, 10 and 11), one item regarding cultural attractions (item 12), and one regarding information/welcome centers (item 17).
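The quadrant assignment with data-centered crosshairs can be sketched in a few lines of Python. The sketch subtracts the grand mean of importance and of performance from each attribute's mean rating, so that the crosshairs fall at (0, 0), and then reads the quadrant from the signs of the two centered values. The function name, the attribute labels, and the ratings are hypothetical, and treating a value exactly on a crosshair as "high" is an assumption made only for this illustration.

```python
from statistics import fmean


def ipa_quadrants(importance, performance):
    """Assign each attribute to an IPA quadrant using mean-centered ratings.

    `importance` and `performance` map attribute name -> mean rating.
    """
    importance_mean = fmean(importance.values())
    performance_mean = fmean(performance.values())
    quadrants = {}
    for name in importance:
        centered_i = importance[name] - importance_mean    # crosshair at 0
        centered_p = performance[name] - performance_mean  # crosshair at 0
        if centered_i >= 0 and centered_p >= 0:
            quadrants[name] = "keep up the good work"
        elif centered_i >= 0:
            quadrants[name] = "concentrate here"
        elif centered_p >= 0:
            quadrants[name] = "potential overkill"
        else:
            quadrants[name] = "low priority"
    return quadrants


# Hypothetical mean ratings on a 7-point scale (not the values in Table 1).
importance = {"attribute A": 6.3, "attribute B": 6.2, "attribute C": 5.0, "attribute D": 5.1}
performance = {"attribute A": 6.1, "attribute B": 5.0, "attribute C": 4.2, "attribute D": 6.0}

quadrants = ipa_quadrants(importance, performance)
for name, quadrant in quadrants.items():
    print(name, "->", quadrant)
```

Because the crosshairs depend on the means of the attributes included, adding or removing attributes can move an item from one quadrant to another, which is one reason the positioning of the crosshairs is debated.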

Factor Analysis
A factor analysis was conducted for the five items measuring loyalty (KMO = 0.89, p < 0.001 for Bartlett's test of sphericity), generating one factor that explained 85.69% of the total variance (Cronbach's α = 0.95) (Table 2). These five items were used as observed variables to measure loyalty in the SEM analysis. A factor analysis was also conducted to confirm that items in each quadrant measure the same construct. The results show that the "keep up the good work" quadrant consists of three factors ("green space", "safety", and "heritage"), explaining 34.25%, 14.29%, and 13.63% of the total variance, respectively, with a cumulative variance of 62.18% (Table 3). In addition, two factors were found in the "low priority" quadrant (Table 4), while the "concentrate here" and "potential overkill" quadrants each have one factor (Tables 5 and 6).
Table 3. Summary results of the factor analysis of items in the "keep up the good work" quadrant.

Note: KMO = 0.80, p < 0.001. Item 9 was removed from factor 2 because its factor loadings were lower than 0.45 and its item-total correlation was less than 0.30.
Table 4. Summary results of the factor analysis of items in the "low priority" quadrant.
The existence of subscales in the two quadrants warrants the inclusion of a second-order factor in the SEM. However, factor two, which consists of two attributes (i.e., attribute 20 on cuisine and attribute 23 on night life) in the "low priority" quadrant, was excluded from the SEM for two reasons. First, the two attributes are not conceptually linked. Second, inclusion of the factor in the model resulted in a poorer fit.

Importance-Performance Quadrants and Structural Equation Modeling
In order to test H1-H5 regarding the relationships between the items in each IPA quadrant and overall satisfaction, covariance arrows were initially added to all exogenous variables. This resulted in a poor model fit. A good model fit was achieved when only the covariance links between "keep up the good work" and "concentrate here", and between "low priority" and "potential overkill", were kept in the model (Figure 4).
Figure 4 notes: (1) see Tables 2-6 for the observed variables measuring importance and performance; (2) to increase the model fit, not all items were used in the model.

Although the p-value for the model is less than 0.05, other indices (e.g., χ²/df = 3.36, RMSEA = 0.063, CFI = 0.93, IFI = 0.93, NFI = 0.91, RFI = 0.90, TLI = 0.92) indicate a good model fit. The results show that the first-order constructs ("green space", "safety", and "heritage") are all significantly and positively related to the second-order construct of "keep up the good work", which significantly and positively correlates with overall satisfaction (r = 0.43, p < 0.001, R² = 0.19). Thus, H1 is supported.
Interestingly, none of the other three constructs ("concentrate here", "low priority", and "potential overkill") was significantly related to overall satisfaction. Specifically, "concentrate here" was negatively related to overall satisfaction (r = −0.05, p > 0.05), albeit not at the 0.05 significance level. Thus, H2 is only partially supported. The regression weights for "low priority" and "potential overkill" were 0.01 and 0.00, respectively, which indicates that items in these two quadrants have little to no impact on overall satisfaction. Hence, H3 and H4 are fully supported. Finally, overall satisfaction significantly and positively contributes to loyalty (r = 0.57, p < 0.001, R² = 0.33), leading to the support of H5.
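For readers who want to see how two of the reported fit indices relate to the model chi-square, the sketch below computes the normed chi-square (χ²/df) and a commonly used point estimate of RMSEA. The input values are hypothetical and are not the study's chi-square, degrees of freedom, or sample size; note also that some software uses N rather than N − 1 in the denominator, so this is an illustration of the general formula rather than the exact computation behind the values reported above.

```python
import math

def normed_chi2(chi2, df):
    """Normed chi-square: model chi-square divided by its degrees of freedom."""
    return chi2 / df

def rmsea(chi2, df, n):
    """Common RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1))).

    Assumption: the N - 1 convention; some programs use N instead.
    """
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

# Hypothetical inputs for illustration only (not taken from the study).
example_chi2, example_df, example_n = 336.0, 100, 500
example_ratio = normed_chi2(example_chi2, example_df)
example_rmsea = rmsea(example_chi2, example_df, example_n)
```

Because RMSEA divides the excess of χ² over df by the sample size, a model can show a significant χ² and still have an acceptable RMSEA when the sample is large.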

Discussion
At least four findings from this study are worthy of discussion. First, the results from the IPA reveal that Savannah as a tourism destination performs well, with the majority of attributes (i.e., 14 out of 23) being below the iso-rating line, which means that performance exceeds importance. In addition, there were nine attributes in the "keep up the good work" quadrant as opposed to three in the "concentrate here" area, suggesting the city is highly attractive due to its green space, unique heritage, and safety. Thus, to remain competitive in the market, the city needs to maintain or enhance the quality of these important attributes, particularly the unique architectural buildings and the green space featured by the park-like public squares. It should be noted that the architectural buildings and green space attributes remain in the "keep up the good work" quadrant, even when judged by the iso-rating line. This finding not only endorses previous studies on urban forests in cities (i.e., historical attractions/architectural buildings and green space were ranked the top two most important attributes for the city in a previous study [75]; public squares were ranked the highest in scenic beauty [76]) but also reflects the real appeal of tourism attractions in the city. For instance, unique architecture, ornate ironwork, fountains, and green squares were among the top motives for people to visit the city [60]. This, in turn, corroborates the use of stated importance for the IPA. A previous study [26] also reported that safety and natural merits (e.g., nice weather, outstanding scenery, etc.) are among the most important attributes perceived by visitors to Guam.
It should be noted that results using the iso-rating line as a classification threshold are consistent with those using a gap analysis, where t-tests have shown that performance is perceived to be significantly lower than importance. This substantiates another previous study [19] which argued that the data-centered method can provide closer agreement with the results of gap analysis. It also endorses a study [77] on stakeholders' perceptions of challenges facing rural communities in the Appalachian region, where attributes above the iso-rating line (i.e., performance < importance) represented most of those identified by t-tests.
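The two classification rules discussed above can be sketched as follows. The code assigns attributes to IPA quadrants using the grand means of importance and performance as crosshairs (the data-centered method) and reports each attribute's position relative to the iso-rating line as a simple gap. The attribute names and ratings are hypothetical and chosen only for illustration, and the gap is defined here as performance minus importance, which is an assumption about sign convention.

```python
from statistics import mean

# Hypothetical attribute means on a 7-point scale: (importance, performance).
# Illustrative values only; they are not taken from the study.
ratings = {
    "green space": (6.1, 6.3),
    "safety": (6.5, 6.0),
    "parking": (5.9, 4.8),
    "night life": (4.6, 4.9),
    "shopping": (4.8, 5.9),
}

def classify(ratings):
    """Return {attribute: (quadrant, gap)} using grand means as crosshairs."""
    imp_mean = mean(imp for imp, _ in ratings.values())
    perf_mean = mean(perf for _, perf in ratings.values())
    result = {}
    for name, (imp, perf) in ratings.items():
        if imp >= imp_mean and perf >= perf_mean:
            quadrant = "keep up the good work"
        elif imp >= imp_mean:
            quadrant = "concentrate here"
        elif perf >= perf_mean:
            quadrant = "potential overkill"
        else:
            quadrant = "low priority"
        # Iso-rating line is performance == importance.
        # A negative gap means performance falls short of importance.
        gap = round(perf - imp, 2)
        result[name] = (quadrant, gap)
    return result

classified = classify(ratings)
```

In this toy data set, "safety" sits in the "keep up the good work" quadrant yet has a negative gap, which shows how the iso-rating line can flag a shortfall that the quadrant crosshairs alone would not.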
Second, the mean value of performance (M = 5.60) is closest to that of the attribute-level satisfaction (M = 5.64), and the overall satisfaction rating of 6.04 is the highest measured. It is likely that a tourist's overall experience/satisfaction is primarily affected by several main attributes, and, in this study, this could be heritage and green space, which have the highest rating of attribute-level satisfaction (i.e., architectural buildings with a mean value of 6.27 are ranked the highest, followed by historical attraction, M = 6.20, and the four green space attributes with mean values ranging from 6.10 to 6.18).
Third, there is a debate in the literature about the use of absolute vs. relative measures of importance and performance to predict overall satisfaction. Relative importance was thought, in some studies, to be more valuable to managers than absolute importance [9]. Our study partially endorses this assertion, albeit in a different manner. That is, we argue that it is the relative differences between performance and importance, and the relative differences within quadrants of performance and importance, as represented by the IPA grid, that determine overall satisfaction. Attributes in each IPA quadrant are positioned relative to the mean values of importance and performance, and the observed variables in the SEM (Figure 4) represent the differences between performance and importance of relevant attributes in each quadrant. The modeling of relative values proved to be quite effective in predicting overall satisfaction and provides strong empirical support for the importance-performance analysis, which assumes that "overall satisfaction will be influenced less by attribute performance when the self-stated importance of the attribute is low" [9] (p. 303). The model indicates that attributes with lower importance in the "low priority" and "potential overkill" quadrants make no contribution to overall satisfaction, while the opposite is true for attributes in the "keep up the good work" quadrant with higher importance and performance ratings.
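As a minimal sketch of the two kinds of relative value mentioned here, the code below computes per-respondent difference scores between performance and importance, and a relative importance score defined as an attribute's mean importance minus the average importance across attributes. The data, attribute names, and function names are hypothetical, and the direction of the difference (performance minus importance) is an assumption rather than a detail confirmed by the text.

```python
from statistics import mean

# Hypothetical 7-point ratings from three respondents (not study data).
importance = {
    "parks": [7, 6, 7],
    "squares": [6, 6, 5],
    "night life": [4, 5, 3],
}
performance = {
    "parks": [6, 7, 7],
    "squares": [7, 6, 6],
    "night life": [5, 5, 4],
}

def gap_scores(importance, performance):
    """Per-respondent difference (performance minus importance) per attribute."""
    return {
        attr: [p - i for i, p in zip(importance[attr], performance[attr])]
        for attr in importance
    }

def relative_importance(importance):
    """Mean importance of each attribute minus the grand mean across attributes."""
    means = {attr: mean(values) for attr, values in importance.items()}
    grand_mean = mean(list(means.values()))
    return {attr: round(m - grand_mean, 2) for attr, m in means.items()}

gaps = gap_scores(importance, performance)
relative = relative_importance(importance)
```

Difference scores of this kind could serve as observed variables in a model, while the relative importance scores only reorder attributes around zero without changing their spacing.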
Fourth, and lastly, this study used a second-order factor in the SEM, a technique that has rarely been used in tourism literature. According to some researchers [77,78], a second-order model can never produce a better model fit than a first-order model. However, our study shows that the opposite is true, and thus, is in favor of the use of SEM methods that include a second-order structure; this has also been tested and recommended in another study [10]. It should be noted that alternative models without the inclusion of second-order constructs have been tried, but with poorer fits. In addition to the better model fit, inclusion of a second-order factor in SEM has two additional advantages. First, it avoids the addition of the scores of manifest variables into an aggregate score, as practiced by some researchers who have treated the aggregated variables as reflective of a higher-order construct [10]. Second, it allows for the inclusion of as many observed variables as possible in the SEM and thus provides the chance to test the impact of those observed variables, some of which will have to be excluded from the model otherwise. For instance, if the second-order factor was not used in the SEM (Figure 4) that tests H1-H4, attributes 7, 8, 13, and 14 would have to be removed in order to achieve a model with acceptable fit indices.

Conclusions
A tourism destination is composed of a set of attributes which may have different meanings to different people. Tourists "have to choose which of the attractions they wish to visit and which to skip" [79] (p. 742) due to time/budget constraints and varying preferences for attributes. Therefore, an urban tourism experience rarely depends on the totality of a city's attributes, but rather on one or several major attributes [75]. To remain competitive in the tourism market, each tourism destination must have one or several attributes as its core resources and attractors [80] that distinguish it from others. This study shows that green space and architectural buildings are two such attributes, with higher ratings of importance, performance, and attribute-level satisfaction than other attributes in the city. Beyond this, the study has important theoretical, methodological, and managerial implications.
Theoretically, this study, for the first time, tested the assumption, using SEM, that performance of an attribute will not significantly predict overall satisfaction if the attribute importance is relatively low, while the opposite is true when attribute importance is high. Thus, the validity of IPA was tested and confirmed. That said, future studies need to factor the relative importance (i.e., importance ratings minus the average importance value) into the model.
Methodologically, this study introduced the use of a second-order factor into the SEM, a technique that has rarely been used in tourism literature. It was shown that SEM with the inclusion of a second-order factor not only performed better than without it, but also allowed for the test of individual manifest variables which would be otherwise aggregated or removed.
Managerially, the green elements in the city which had higher importance, performance, and satisfaction ratings, as perceived by tourists, play a vital role in attracting tourists, and thus decision makers and planners should understand the trade-offs among different land use choices. If green space is not justified for its existence in terms of economic revenue generated from tourism and other sources, it may give way to residential or commercial development [81].
This study is not without limitations. First, there is no guarantee that a tourist who responded to all attributes has actually experienced each of them. To avoid this, future research may need to add a "not applicable" column to the Likert-scale measures. Second, response biases may exist due to the use of the convenience sampling method. Since a random sample is not realistic for a street survey, future research needs to conduct surveys at other sites (e.g., public squares, restaurants/bars, shopping centers, and hotels) to reduce potential response biases. Finally, several observed variables, particularly those measuring loyalty, were not normally distributed, which may affect the statistical power of the analysis. However, the impact of this non-normality is likely to be small, given the use of maximum likelihood estimators, which are relatively robust to violations of normality assumptions [82], and the use of 10,000 bootstrap samples to correct for non-normality [66]. Having said this, future studies should increase the sample size so that the data more closely approximate a normal distribution.
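The general idea behind bootstrapping with non-normal data can be illustrated with a percentile bootstrap for a simple statistic. This is a generic sketch and not the SEM bootstrap procedure used in the study: the ratings are hypothetical, the statistic is a plain mean, and the function name and defaults are assumptions made for illustration.

```python
import random
from statistics import mean

def bootstrap_ci(values, stat=mean, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat` over `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample with replacement, recompute the statistic, and sort the results.
    estimates = sorted(stat(rng.choices(values, k=n)) for _ in range(n_boot))
    lower_index = int(round((alpha / 2) * n_boot))
    upper_index = int(round((1 - alpha / 2) * n_boot)) - 1
    return estimates[lower_index], estimates[upper_index]

# Hypothetical, left-skewed 7-point ratings (not study data).
loyalty_ratings = [7, 7, 7, 6, 7, 7, 5, 7, 6, 7, 3, 7]
ci_low, ci_high = bootstrap_ci(loyalty_ratings, n_boot=2_000)
```

Because the interval is read from the empirical distribution of resampled estimates, it does not rely on the ratings themselves being normally distributed.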
In conclusion, this study used a novel approach by including attributes in each IPA quadrant as observed variables in a second-order SEM to test the assumption that attributes with lower importance ratings will have little influence on overall satisfaction, regardless of their performance. This approach allowed us to take a fresh look at the old debate regarding the use of stated vs. derived importance methods for the prediction of overall satisfaction, and the results suggest that stated importance may be more useful than previously thought.