Other- versus Self-Referenced Social Impacts of Events: Validating a New Scale

Abstract: Publicly funded sport events are partially justified based on positive social impacts. Past research generally measured social impact for a generic and global “other” with claims such as “Events create new friendships in the community”. These other-referenced (OR) social impacts are generally higher pre-event than post-event and are inflated for both methodological and theoretical reasons. In the pre-event period of the Tokyo 2020 Olympic and Paralympic Games, we empirically tested OR items compared to self-referenced (SR) items, such as “Because of the event, I create new friends in the community” and allowed projection bias to vary between scales. Results of the experiment between an OR-Social Impact Scale (OR-SIS) and a similar SR-SIS confirmed OR-measures to be significantly higher than SR-measures. While artificially inflated OR scores may be useful for event organizers and politicians to gain support for hosting, estimates based on circumscribed self (SR) are a methodologically appropriate measurement of social impact. The present study asserts that measuring pre-event social impact through expected experiences using a self-referenced social impact scale provides a more accurate assessment of possible social impact than using an other-referenced social impact scale. A self-referenced social impact scale helps proponents of events inform host communities more realistically about how events contribute (or not). Overestimating social impact claims through measures based on perceptions of others may raise residents’ expectations; if not delivered, this could negatively impact perceived benefits. According to social exchange theory, residents form negative attitudes toward the object (i.e., the event) when the benefits are lower than the costs. Thus, accurate assessment is important for event organizers to host sustainable sport events.


Introduction
Publicly funded sport events require the demonstration of substantial and sustainable positive outcomes for host communities. From a social welfare perspective [1], an efficient allocation of resources suggests that tax-payers, as major co-payers (e.g., 80% of the Olympic Games; [2]), should receive beneficial outcomes. Nevertheless, residents' quality of life may be directly impacted by the event, be it positive or negative, before, during, and/or after the event. Research has demonstrated that hosting one-off major sporting events generally results in negative ecological impacts [3], as well as a lack of substantial economic impact [4–7] and sustainable tourism impact [8]. This has shifted the justification for hosting mega-events towards trying to demonstrate positive benefits from other features of events such as urban planning [9], or less tangible features such as social impacts [10] and sport participation impacts [11] affecting local communities and residents after the event has taken place. Kellett, Hede, and Chalip emphasized "the need for greater attention to the social value of events and the relationship between events and their host community" [12] (p. 117). Thus, measuring social impacts from events for residents is important to potentially identify if and how event-related outcomes occur and are sustained in host communities. It is equally important to accurately measure those social impacts, but there is no consensus on how to best measure social impact.
Most research to date has asked residents about their perceptions of social impact (e.g., [13–15]). As West and Kenny stated, "if researchers are interested in accuracy of perception, they need to operationally define and measure it" [16] (p. 359). Previous research has defined perception as attitudes or opinions regarding the social impact of events for a generic and global "other" with questions such as "Events create new friendships in the community". We refer to these as other-referenced (OR) to distinguish them from questions operationalized through a self-referenced (SR) lived experience, for example, "Because of the event, I created new friends in the community".
There are two concerns with OR perceptions. First, in relation to time, most research to date has found social impacts measured in terms of OR items to be higher pre-event compared to post-event [13–15,17–19]. These higher predicted pre-event OR values can be criticized as over-estimates because the post-event values are more precise measurements: the impacts being asked about have already occurred. While these pre-event perceptions of social impacts may well serve event organizers and/or local governments to gain event support (e.g., [20,21]), improving ways to measure pre-event social impact more precisely is warranted.
A second area of concern is the wording or point of reference. Questions about a global other (OR) and circumscribed self (SR) lead to different answers [16]. For instance, Fredline, Deery, and Jago [22] found consistently higher OR scores than SR scores for social impacts of tourism. Because of social projection bias [23], a respondent will report that others are more likely to hold their views, resulting in higher scores than if asked about themselves. Therefore, we suggest that the point of reference taken (i.e., circumscribed self versus a global other) can calibrate and provide a more accurate measurement of social impacts pre-event.
To account for these concerns with OR social impact scales, the purpose of this study is to re-word existing questions from OR to SR and to test each simultaneously. Whereas others have looked at the same type of questions (OR) in two different time periods pre- and post-event [13,15,17], our research design looks at different types of questions (OR vs. SR) in the same time period (pre-event), keeping time constant. By holding the time constant to the higher pre-event period, other influencing factors, such as media framing, level of excitement, etc., are also kept constant, allowing us to isolate the effect of wording in obtaining a more precise estimate of social impact. Any difference in scores will be because we re-orient respondent perceptions of social impact from others to self.
In what follows, we first elaborate on the social impact of events and the theories underpinning their measurement. We provide an overview of how past research has measured the social impact of events. We discuss the difference between OR and SR outcomes in terms of two distinct points of reference, namely: (1) words (OR vs. SR items), and (2) time (pre- vs. post-event). Next, we conduct an experiment comparing two similar social impact scales for which only the wording has been changed; one scale reflects a Self-Referenced Social Impact Scale (SR-SIS), while the other reflects an Other-Referenced Social Impact Scale (OR-SIS). From data collected in the pre-event time of the Tokyo 2020 Olympic and Paralympic Games (OPG), we validate the scales and present practical insights on the anticipated social impact of this mega event. Finally, the results allow us to advocate for future use of the SR-SIS as a measurement of social impact.

Social Impact of Sport Events
Mathieson and Wall's definition of social and cultural impacts of tourism can be easily applied to sport as the ways in which sport events contribute to "changes in value systems, individual behaviour, family relationships, collective lifestyles, safety levels, moral conduct, creative expressions, traditional ceremonies and community organisations" [24] (p. 133). Ritchie [25] described perception of social impact of sports events as enhanced local pride, a sense of community, and enthusiasm for the community among residents of a host community. Holmes and colleagues [26] distinguished between positive and negative social impacts as well as between short-term (during the event) and long-term social impacts (after the event). Examples include prestige for the host community (positive, short-term), increased traffic (negative, short-term), building community pride and community cohesion (positive, long-term), and community alienation (negative, long-term).

Theories Underpinning Scales for Social Impact of Events
Different theories have informed the development of various social impact scales (see Table 1, column 2). Some scales developed based on these theories are predominantly OR-based while others are more SR-based. Social Exchange Theory (SET) is perhaps most commonly used, positing that residents are willing to become involved in a social exchange if the perceived benefits outweigh the costs of involvement (e.g., [27]). In other words, if residents believe that there will be an overall positive impact from hosting an event locally, they will be in favour of hosting it [28]. The Theory of Reasoned Action (TRA) analyzes the interrelationships between beliefs, attitudes, intention, and behavior with the intent to predict and understand individuals' behavior [29,30]. According to Social Representation Theory (SRT), perceptions of impacts are formed through "direct experiences, social interaction and other sources of information, such as the media" [31] (p. 147). Thus, this requires some form of interaction between residents (i.e., an experience) and information sources to shape perceptions [32]. Overall, researchers utilizing SET, TRA, and SRT predominantly use individuals' perceptions of others to measure social impacts such as overall community values, expectations, and beliefs.
Community Attachment Theory (CAT) posits that OR perceptions of community residents toward hosting a sport event are largely impacted by the extent to which individuals feel connected to and involved in the community at large. Trust and reciprocity, concepts which relate to social capital, are considered to be important factors (e.g., [33,34]). Social Identity Theory (SIT) refers to values and emotional attachment associated with memberships in a particular group (i.e., connection and involvement; [35,36]). Social Anchor Theory (SAT) is used to explain psychological benefits that are sustained among group members after experiencing an event, and consists of social capital and social identity [37]. While different authors define social capital in different ways (e.g., [38]), in the context of events, social capital reflects how the event affects the residents and their relationship to the community [33]. The focus of CAT, SIT, and SAT on feelings requires the application of SR items. In constructing scales of social impact, some researchers have used one theoretical framework [39], while other researchers have integrated several [40].

Measuring Social Impact of Events
Regardless of the theories used to create previous scales, social impact is intangible in nature and therefore challenging to capture. Column 3 in Table 1 provides an overview of measurements of social impacts in the various studies, and illustrates the use of OR and/or SR items and constructs in each of these studies. Most of the studies presented in Table 1 are standalone projects and isolated cases. From the table it is evident that: (1) there is no unified, accepted social impact scale for sport events; and (2) many social impacts rely heavily on OR-based measurements of residents, or occasionally employ a mix of both OR and SR items (see Table 1).
While there is no globally accepted social impact scale, there are some common, recurring dimensions that capture social impact such as: community spirit, social cohesion, social capital, community involvement, disorder and conflict, and feelings of (un)safety (see Table 1, column 3). Community spirit, the one common dimension across all studies presented in Table 1, refers to feelings of pride and happiness instilled by an event. Some authors call it psychic income (e.g., [33,37]), or a psychological "feel-good-factor" [45]. Social cohesion represents people's perceptions as to how an event affects connectedness between individuals in the community [42,43], while social capital reflects how the event affects the residents and their relationship to the community (e.g., [33,38]). Community involvement indicates to what extent the community is involved with hosting an event and whether their input is solicited and appreciated [14,19]. Disorder and conflict gauges to what extent an event disrupts residents' daily lives [13,22]. Feelings of (un)safety enquire about people's feelings of (un)safeness [14]. These six commonly used constructs serve as a basis for this study. Sport participation can be added to this list, given its recent prominence as an acclaimed social outcome of events [11,46].

Table 1 (excerpt).
Theory (column 2): SET, SRT.
Measurement (column 3): 23 items (2) (21 OR and 2 SR); 4 factors. General benefits: "Because of the World Games I will have more recreational opportunities" (SR). Community involvement: "I support the World Games because of its vital role in our community" (SR). Negative impacts: "The World Games will result in traffic congestion" (OR).
Results (column 4, rows from different studies):
Scores vary according to segmentation group (neutral, moderately, and positive).
Effect on personal quality of life was limited, and scores on "personal impact" were consistently lower than scores on "community impact".
Constructs score well above the indifference point of 4; three constructs (social connections, diversity tolerance and value of life) score above the indifference point of 3; two constructs (trust and safety and collective action) score around or below the indifference point of 3.
All items score above the indifference point of 4, but all OR items scored above 6.0, and the SR items scored below 6.0 (except for 2 items: "I really enjoy following golf" and "supporting this cause is important to me").
Item means not comparable; participants' excitement and neighbourhood identification decreased after the event; social capital and team identification increased after the event.
Mainly experience-based.
Note. * All nine social impact of events articles published in JSM, SMR and ESMQ since 2013 are included in Table 1. Two key articles were added from Tourism Management, as well as Fredline et al.'s foundational report [22] and Balduck et al.'s article [13], which was the first social impact paper to appear in ESMQ. CAT = Community Attachment Theory (Social Capital); SAT = Social Anchor Theory; SET = Social Exchange Theory; SIT = Social Identity Theory; SRT = Social Representation Theory; TRA = Theory of Reasoned Action; OG = Olympic Games; SR = Self-Referenced; OR = Other-Referenced; (1) = number of items not provided. Background colours delineate the four distinct approaches to measuring social impact; italicized words represent items.

Point of Reference in Measuring Social Impact
The fact that social impact studies, including the ones in Table 1, use non-unified social impact scales and pertain to unique events hinders comparing actual results in numerical terms. However, from column 4 in Table 1 it is apparent that studies utilizing OR measures generally demonstrate average scores above the indifference point for positive social impact perceptions. Results for OR negative social impacts are more variable, with some scoring above the indifference point, indicating negative social impact perceptions [13,14]. Other studies show negative social impact with scores both above and below the indifference point (e.g., [41]), or solely below the indifference point [30]. Low scores of negative impact indicate a positive perception. Nevertheless, studies relying on OR perceptions overwhelmingly demonstrate positive social impact outcomes [13–15,17–19].
In contrast, studies that embedded some SR items demonstrate a range of scores above and below the indifference points (e.g., [33,35]). In cases where SR items score above the indifference point, the measures are consistently lower than OR items (e.g., [22,43,44]). Other studies show SR items scoring below the indifference point and OR measures above the indifference point (e.g., [33]). In both cases, measures with an OR orientation consistently surpass those relying on an SR orientation. Moreover, studies which mainly use SR measures report many more social impact measures below the indifference point.
To try and explain why OR estimates are consistently greater than SR estimates of social impact, we begin by noting the common phenomenon that public perception of positive event impacts remains strong despite research demonstrating that claims of positive economic impact and/or enhanced city image are not supported by the evidence [47]. This disconnect can be explained by noting that events are frequently hosted based on political grounds [48]. Politicians, senior sport administrators, and corporate leaders involved in bids and/or host committees use political discourse to emphasize positive impacts and outcomes of hosting the events, including economic, tourism, city image, social and/or sport participation outcomes [49]. Well-planned public campaigns often highlight the multiple benefits from hosting events, while neglecting the negative outcomes with the intent to gain the support of residents [50].
Similarly, Sant and Mason [49] demonstrated how public opinion is influenced through media framing, which highlights some aspects of an event over others. "Framing refers to the process by which people develop a particular conceptualization of an issue or reorient their thinking about an issue" [51] (p. 104). For instance, framing may reorient the audience's beliefs and attitude towards an event [49], thereby instilling certain expectations. For example, media emphasizing the millions of dollars an event will generate without reporting about the costs may engender a biased public opinion of positive economic impact from the event. The tendency to overstate the potential economic and social benefits of event hosting has been well documented in the literature [52] and results in public perceptions that are not always in alignment with actual facts. Media framing may influence both OR and SR expectations when estimating social impacts, especially pre-event, when the event experiences have not yet occurred.
Whereas media framing and public discourse are factors external to the individual, projection bias is internal to the survey respondent. Humans have a tendency to believe others hold similar views to themselves, a phenomenon called social projection [23]. When determining a response to an OR global statement such as "The event increases social interactions in the community", individuals will use their own preferences (which may already have been influenced by external factors) to predict what others are likely to believe. Even though SR responses may be influenced by these external factors, OR responses are even more influenced because, as Van Boven, Judd, and Sherman note, respondents "perceive others as more likely to hold that stance" [53] (pp. 84-85, emphasis added). Thus, there is a theoretical explanation why OR outcomes are higher than SR outcomes.

Same OR Question, Different Time
When using OR questions in different time periods, there is a strong tendency for OR social impacts to be higher pre-event compared to post-event [13–15,17–19]. In these studies, the post-event measures are presumed to be a better reflection of reality because the event has actually occurred and people are not guessing about the future. No studies argue that the lower post-event numbers are underestimates of the pre-event reality. In other words, time helps us better understand which estimates are closer to reality.

Same SR Question, Different Time
When using SR questions in different time periods, there is more variation in pre- and post-event results [33,35,37,45]. For example, Kavetsos and Szymanski [45] studied the feel-good effect one year before, during, and three months after the 2006 FIFA World Cup, using a four-point SR question: "On the whole, are you very satisfied, fairly satisfied, not very satisfied, or not at all satisfied with the life you lead?" They found no effect one year prior, the highest levels during, and a decrease three months after the event. In their study on the 2010 FIFA World Cup, Heere and colleagues [35] found higher SR scores post-event compared to pre-event for private and public evaluation of the event. Gibson and colleagues [33], also studying the 2010 FIFA World Cup, found lower SR scores post-event for three social capital dimensions (i.e., collective action, social connectedness, tolerance of diversity). However, the practical significance of this difference was regarded as moderate. The authors found the same SR score for two other social capital dimensions (i.e., trust and safety, and value of life). In their study on the 2012 Major League Baseball All-Star Game, held in Kansas City (MO), Oja and colleagues [37] found that some SR scores increased post-event (e.g., social capital: "Interacting with Kansas Citians makes me want to try new things"), while others decreased (e.g., neighborhood identification: "I am very interested in what others think about Kansas City"). Given this wide variation in higher, lower, or the same scores for SR items, there are no grounds to posit that SR items underestimate social impact.
We established in previous sections that OR measures are empirically and theoretically higher than SR measures and we can now establish that this is true whether the data are collected before, during, or after an event. For example, when data are collected in advance of an event, predicted perceptions of community impact on generic others (which have not occurred) will still be higher than predictions of self-experiences (even though they also have not yet occurred). While both OR and SR questions are subject to media framing, SR questions involve no projection bias.

Different Questions, Same Time
Whereas others have looked at the same type of questions in two different time periods, the only study that used both OR and SR questions in the same time period is the study by Fredline and colleagues [22] in the context of tourism. The authors found OR outcomes to be consistently greater than SR outcomes. We developed an experimental research design to further compare SR and OR questions in the same time period. The following section explains the need for this investigation.

SR Scales as an Improved Measure of Social Impact
According to Fredline and colleagues, social impacts "often have a differential effect on different members of the community" [54] (p. 23). The only way to capture these differential effects is to ask community members about their own experiences. Asking any human to evaluate a situation involves a level of subjectivity [22], but in determining levels of truth within human judgement, West and Kenny [16] make clear that circumscribed responses are more pragmatic and accurate than global ones.
Methodologically, consider that political polls obtain a representative sample of the population and then ask, "Will you vote for candidate x?" The foundation of inferential statistics is that the collective opinions of this sample can be extrapolated to the population. A poll will not ask, "Will others vote for candidate x?" because asking someone to predict the opinions of others allows more error into the measurement.
Similarly, methodologically, a proper sample of the population includes all residents (not just event attendees or volunteers). When referring to social impact experiences, we want to know how a sport event actually affects a resident's life in general, regardless of whether they associate with the event (e.g., as volunteer or attendee) or not. The internal and subjective response residents have through direct or indirect contact with the event captures the spillover or externality effect the event has on the hosting community as a whole [55]. It is the collective response of all residents responding to questions about themselves that will provide the most accurate measure of social impact.
Moving forward at a practical level, we conduct an experiment comparing two similar social impact scales for which only the wording is different; one scale uses SR items and the other uses OR items. Depending on the timing of the data collection, these items can be reworded as future (expected experiences before the event), present (actual experiences during the event), or past tense (reflecting on past experiences after the event), as is consistent with past social impact scales that use multiple time frames [13,14,32]. In the context of this study, we test social impacts pre-event. Thus, items were worded in the future tense (expected social impacts). SR items are worded in the first person using "me", "my", and "I" to rate how respondents expect that the event will affect their lives personally. For example, when measuring social cohesion using SR, residents are asked to indicate their level of agreement with "the event will strengthen my relationships in the community" measuring expected self-experience. OR items are worded in the third person, reflecting a "generic other"; in this case, the social cohesion item reads, "the event will strengthen relationships in the community", measuring predicted perception of others.
In what follows, we first investigate how the social impact of events differs when measured based on two comparable scales, the OR-SIS (OR items only) and the SR-SIS (SR items only) in a pre-event period that holds political discourse and media framing constant but allows projection bias to vary between scales. This allows us to replicate the tourism findings of Fredline et al. [22] to investigate if OR-worded items result in higher outcomes than SR-worded items in the context of events.

Context
The event selected for this study had to be significant enough to create some type of shock in the host community to attract attention from residents [56]. The selected event was the Tokyo 2020 OPG in Japan. The candidature file for this event proposed various programs to increase the social impact, including $39 million USD for cultural development, $12 million USD for Olympic education, and $60 million USD for community development. The surveys were collected two years prior to when the event was scheduled to take place. At the time of data collection, there was no mention yet of the 2020 Tokyo Olympic and Paralympic Games being postponed (official announcement made by the International Olympic Committee on 24 March 2020, https://tokyo2020.org/en/). With the OPG being the largest international sporting event, attracting attention from residents in the host region in the years leading up to this mega-event, the event offered an appropriate context for the study. Japan was awarded the OPG in 2013, and the country has spent years preparing.

Experimental Design and Survey Instruments
In practice, social impact can be measured before, during, and after events, or in multiple periods; the choice depends on the unique research purpose. The purpose of this paper is to test the wording of survey items; thus, the research design only requires that we hold the time period constant. We did so by taking advantage of the OPG, which were in the pre-event period as we began the study. In terms of experimental design, Fredline and colleagues [22] also held the time period constant and then took an approach in which they first asked respondents whether they agreed that a certain impact occurred. Subsequently, they asked respondents how they: (1) perceived this impact on the overall community (OR); and (2) experienced the impact on their personal quality of life (SR). Our approach is slightly different in that respondents were randomly assigned to two groups: Group A responded to the SR-SIS only, while Group B responded to the OR-SIS first, followed by the SR-SIS. In both surveys, basic demographic information was collected, as well as some questions related to the respondents' affinity with the Tokyo 2020 OPG (e.g., support and interest). Each of the two surveys was written in English and translated to Japanese; translation validity was checked by two native speakers. Given that the survey took place 28 months before the originally scheduled date of the OPG, the wording in both surveys was written in the future tense to capture anticipated pre-event social impact.
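The random split into two survey conditions can be sketched as follows. This is a minimal illustration, not the procedure used by the research company: the function name `assign_groups`, the seed, and the respondent IDs are hypothetical, and the equal split mirrors the group sizes reported later in the paper (n = 515 each).

```python
import random

# Instruments answered by each group, in presentation order (from the text).
INSTRUMENTS = {"A": ["SR-SIS"], "B": ["OR-SIS", "SR-SIS"]}


def assign_groups(respondent_ids, seed=2018):
    """Randomly split respondents into two equal-sized groups, A and B.

    The seed is a hypothetical value used only to make the sketch reproducible.
    """
    ids = list(respondent_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {"A": ids[:half], "B": ids[half:]}


# 1030 hypothetical respondent IDs, matching the reported sample size.
groups = assign_groups(range(1, 1031))
print(len(groups["A"]), len(groups["B"]))
```

Because only Group B sees both scales, Group A provides an SR baseline that cannot be influenced by having answered OR items first.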

Measurements
Social impact was measured using a scale developed in the context of the 2016 Rio OPG [57]. It contains 23 items, representing seven constructs described in the literature review: social cohesion (SCOH, 4 items), community spirit and feel-good factor (FGF, 3 items), social capital (SC, 4 items), community involvement with regard to the event (CI, 3 items), disorder and conflict (DC, 3 items), feelings of (un)safety (FUS, 3 items), and sport participation and physical activity (SPA, 3 items). In this scale, all items were worded in terms of "I" and "me", representing the SR-SIS. To create the OR-SIS all items were re-worded in terms of global others (see Table 2). Items from both scales were measured on a 7-point Likert scale (1, Strongly disagree to 7, Strongly agree; see also Table 2).
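The structure of the scale can be written down as a small data structure. The sketch below uses the construct abbreviations and item counts given above; scoring each construct as the mean of its items is an assumption made for illustration (the text does not state how construct scores were aggregated), and the example respondent is hypothetical.

```python
from statistics import mean

# Items per construct, as listed in the text (23 items in total).
CONSTRUCT_ITEMS = {
    "SCOH": 4,  # social cohesion
    "FGF": 3,   # community spirit and feel-good factor
    "SC": 4,    # social capital
    "CI": 3,    # community involvement
    "DC": 3,    # disorder and conflict
    "FUS": 3,   # feelings of (un)safety
    "SPA": 3,   # sport participation and physical activity
}


def construct_scores(responses):
    """Return the mean rating per construct for one respondent.

    `responses` maps each construct to its list of 7-point Likert ratings.
    Averaging items is an illustrative assumption, not the paper's stated method.
    """
    scores = {}
    for construct, n_items in CONSTRUCT_ITEMS.items():
        ratings = responses[construct]
        if len(ratings) != n_items:
            raise ValueError(f"{construct} expects {n_items} items")
        if any(not 1 <= r <= 7 for r in ratings):
            raise ValueError(f"{construct} ratings must lie between 1 and 7")
        scores[construct] = mean(ratings)
    return scores


# Hypothetical respondent, for illustration only.
example = {
    "SCOH": [4, 5, 4, 3], "FGF": [6, 5, 7], "SC": [3, 4, 4, 5],
    "CI": [2, 3, 4], "DC": [5, 5, 5], "FUS": [4, 4, 1], "SPA": [3, 3, 6],
}
print(construct_scores(example))
```

The same structure serves both the SR-SIS and the OR-SIS, since the two scales differ only in item wording.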
Demographic variables included sex, age, marital status (2 categories), occupation (8 categories), and personal annual income (6 categories). The survey also inquired how long the participants had lived in the city (number of years).
Affinity with the OPG was also measured because previous research has demonstrated that affinity for, or involvement with, an event affects social impact [37]. It is included to verify sample bias. Affinity was measured by asking two questions: "I support the Tokyo 2020 Olympic and Paralympic Games as a resident" and "Tokyo should bid for other major sporting events", both measured on a 7-point Likert scale (1 = Strongly disagree to 7 = Strongly agree) [58]. Three additional questions asked about their interest in the OPG, including "How frequently do you think about Tokyo 2020 OPG?" (1 = Not at all to 7 = Very frequently), "How interested are you in Tokyo 2020 OPG?" (1 = Not at all to 7 = Very interested), and "How important is it for you to be informed (have knowledge) about Tokyo 2020?" (1 = Not at all to 7 = Very important) [59].

Study Participants and Data Collection
Data were collected through an Internet-based survey conducted by a Japanese research company in February 2018. The commercial company was paid to collect the data and operated under their own ethical code (https://monitor.macromill.com/policy/privacy.html). This study was exempt from ethics approval because it was considered use of secondary data by the University of Ottawa's Office of Research Ethics and Integrity. Data were transmitted to the researchers without identifiers from the respondents. Stratified sampling based on demographic variables (gender and age groups) from the Population Census of Tokyo was performed to establish a representative view of the 1030 participants (successful response rate: 98.7%). Of the respondents, 49.5% were female; the average age was 42.58 years (SD = 14.42); 53.7% were employed, 63.9% earned more than 2,000,000 yen; and the average length of time living in Tokyo was 16.64 years (SD = 15.46). Among respondents, 38.2% expressed an intention to volunteer for the event. Their support for the Tokyo 2020 OPG was mediocre (M = 3.87; SD = 1.53, on a 7-point Likert), although their interest in the Tokyo 2020 OPG scored slightly higher (M = 4.35; SD = 1.81). As for the Tokyo census, 50.7% of the population was female, and the average age was 44.76 years (Tokyo Census: https://www.toukei.metro.tokyo.lg.jp/jsuikei/js-index2.htm). Although small differences were found in gender (the number of male respondents was slightly higher) and average age (about 2 years lower) comparing the census data with the case description, no significant differences were found between local residents and this sample (see Table 3). These statistically insignificant marginal differences are consistent with previous stratified samples in social impact research [33].
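One way to check a sample against census proportions is a chi-square goodness-of-fit test. The sketch below applies it to the gender split only, using the percentages reported above; it is an illustration rather than the analysis behind Table 3. The female count is rebuilt from a rounded percentage, and the function name is hypothetical.

```python
from math import erfc, sqrt


def gender_fit_to_census(n_total, share_female_sample, share_female_census):
    """Chi-square goodness-of-fit (1 df) of the sample gender split vs. the census."""
    # Count rebuilt from a rounded percentage, so it is approximate.
    observed_f = round(n_total * share_female_sample)
    observed_m = n_total - observed_f
    expected_f = n_total * share_female_census
    expected_m = n_total - expected_f
    chi2 = ((observed_f - expected_f) ** 2 / expected_f
            + (observed_m - expected_m) ** 2 / expected_m)
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


chi2, p_value = gender_fit_to_census(1030, 0.495, 0.507)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2f}")
```

Under these assumptions the gender difference falls well short of the conventional 0.05 threshold, in line with the statement that the gender split does not differ significantly from the census.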

Study Design and Data Analysis
Respondents were randomly assigned to two groups: Group A (n = 515) responded to the SR-SIS only; Group B (n = 515) responded to the OR-SIS first, followed by the SR-SIS. There was no significant difference between Groups A and B in demographic and event affinity variables, indicating no selection bias (see Table 3). Confirmatory factor analysis was performed for both the SR-SIS and the OR-SIS. Composite Reliability (CR) and Average Variance Extracted (AVE) values were computed for each construct to test convergent and discriminant validity. The comparative fit index (CFI ≥ 0.90), Tucker-Lewis index (TLI ≥ 0.90), root mean square error of approximation (RMSEA ≤ 0.08), and standardized root mean square residual (SRMR ≤ 0.08) were used as goodness-of-fit criteria [60]. Several comparative analyses were performed using chi-square tests, independent t-tests, and paired t-tests.
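For readers less familiar with CR and AVE, the sketch below shows how both are conventionally computed from the standardized factor loadings of one construct. The loadings are hypothetical and are not taken from this study, and the function names are illustrative only.

```python
def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    total = sum(loadings)
    # With standardized loadings, each item's error variance is 1 - loading^2.
    error = sum(1 - l ** 2 for l in loadings)
    return total ** 2 / (total ** 2 + error)

def average_variance_extracted(loadings):
    """AVE = mean of the squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)

# Hypothetical standardized loadings for a three-item construct.
hypothetical_loadings = [0.80, 0.75, 0.70]

cr = composite_reliability(hypothetical_loadings)
ave = average_variance_extracted(hypothetical_loadings)
print(f"CR = {cr:.3f}, AVE = {ave:.3f}")
```

Commonly cited rules of thumb treat CR of about 0.70 or higher and AVE of about 0.50 or higher as acceptable; these cut-offs are general conventions and not necessarily the exact criteria applied here.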
To begin, the samples from Group A and Group B were compared with the Tokyo city census to test for sample bias. Then, the samples from Group A and Group B were compared to eliminate any sample bias between the two sets of respondents (see above). Next, the differences between OR and SR social impact were tested in two ways: first by comparing the SR-SIS from Group A (SR-SIS-A) with the OR-SIS from Group B (OR-SIS-B) using an independent t-test between surveys; second, the Group B SR scale (SR-SIS-B) was compared with Group B OR scale (OR-SIS-B) through a paired sample t-test. Finally, the SR-SIS from Group A (SR-SIS-A) was compared with the SR-SIS from Group B (SR-SIS-B) to test for response bias and to report social impacts of the Tokyo 2020 OPG.
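The two comparisons of OR and SR scores differ in their unit of analysis: the between-group comparison treats the two samples as independent, whereas the within-group comparison works on each respondent's difference score. A minimal sketch of both test statistics follows, using only the Python standard library. The factor scores are hypothetical, and the pooled-variance (Student) form of the independent test is an assumption, since the text does not specify the variant used.

```python
import math
import statistics

def independent_t(sample_a, sample_b):
    """Student's t statistic (pooled variance) and df for two independent groups."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    se = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / se
    return t, n_a + n_b - 2

def paired_t(first, second):
    """Paired t statistic and df, computed on within-respondent differences."""
    if len(first) != len(second):
        raise ValueError("paired samples must have equal length")
    diffs = [x - y for x, y in zip(first, second)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se, len(diffs) - 1

# Hypothetical 7-point factor scores, not data from this study.
sr_sis_a = [3, 4, 2, 5, 3, 4]   # Group A, SR items
or_sis_b = [5, 4, 4, 6, 5, 5]   # Group B, OR items
sr_sis_b = [3, 4, 3, 5, 4, 4]   # Group B, SR items (same respondents as or_sis_b)

t_between, df_between = independent_t(or_sis_b, sr_sis_a)
t_within, df_within = paired_t(or_sis_b, sr_sis_b)
print(f"between groups: t({df_between}) = {t_between:.3f}")
print(f"within Group B: t({df_within}) = {t_within:.3f}")
```

A positive t indicates higher OR than SR scores. Obtaining p-values requires the t distribution, which is available in statistical packages (for example, `scipy.stats.ttest_ind` and `scipy.stats.ttest_rel`).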

Results of the CFA
The results of the global fit indexes, which assessed the proposed model's fit with the data (χ2/df = 2.42 (898), p < 0.001, CFI = 0.933, TLI = 0.923, RMSEA = 0.053, SRMR = 0.046) showed that the measurement models fit the data (see Table 2). Moreover, the computed CR and AVE values for the 14 constructs (seven OR and seven SR factors) ranged from 0.68 to 0.91 for CR and from 0.  Table 4). Thus, we compared the chi-square value of a measurement model with the correlation constrained to equal one to a baseline model without this constraint [62]. We performed a chi-square difference test for those five pairs of factors (a total of five tests in all), and every case resulted in a significant difference, suggesting that all the measures of constructs in the measurement model achieve discriminant validity. Thus, the validity and reliability of this scale were acceptable.
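The chi-square difference test for discriminant validity can be sketched as follows. Constraining one inter-factor correlation to equal one adds a single degree of freedom, so the difference in chi-square between the constrained and baseline models is evaluated against a chi-square distribution with df = 1. The chi-square values below are hypothetical, and the function name is illustrative.

```python
import math

def chi_square_difference_test(chi2_constrained, chi2_baseline):
    """Chi-square difference and p-value for one added constraint (df difference = 1)."""
    delta = chi2_constrained - chi2_baseline
    if delta < 0:
        raise ValueError("the constrained model cannot fit better than the baseline")
    # For df = 1 the chi-square survival function reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(delta / 2))
    return delta, p_value

# Hypothetical model chi-squares, not values from this study.
delta, p_value = chi_square_difference_test(chi2_constrained=2190.0, chi2_baseline=2173.2)
print(f"delta chi2 = {delta:.1f}, p = {p_value:.5f}")
```

A significant difference means that forcing the two factors to correlate perfectly worsens model fit, which supports treating them as distinct constructs.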

SR versus OR Social Impact
Independent t-tests to examine significant differences between the SR items of Group A (SR-SIS-A) and the OR items of Group B (OR-SIS-B; Table 5) showed that all scores on the OR factors are significantly higher than the scores for the SR factors, except for feelings of (un)safety. The same is true for differences in OR and SR for the same participants in Group B, as demonstrated by the paired t-test results in Table 5.
In terms of specific results, the rankings of factors according to OR and SR scores for the participants in Group B were similar. In the OR-SIS, four of the seven factors scored above the 4-point indifference level, compared to three factors in the SR-SIS. In both the OR-SIS and SR-SIS, the negative social impact factors scored above the indifference threshold, namely feelings of (un)safety, and disorder and conflict. The two other factors in the OR-SIS which scored higher than 4 were sport participation and physical activity, and community spirit (see Table 5).
Note: OR-SIS = other-referenced social impact scale; SR-SIS = self-referenced social impact scale; SCOH = social cohesion; FGF = community spirit; SC = social capital; CI = community involvement; SPA = sport participation and physical activity; DC = disorder and conflict; FUS = feelings of (un)safety.
Table 5. Comparison between the Self-Referenced Social Impact Scale (SR-SIS) and the Other-Referenced Social Impact Scale (OR-SIS), and Response Bias. Columns: self-referenced vs. other-referenced (SR-SIS-A vs. OR-SIS-B, independent t-tests; OR-SIS-B vs. SR-SIS-B, paired t-tests) and response bias (SR-SIS-A vs. SR-SIS-B, independent t-tests).
The four remaining positive social impact factors scored below the 4-point indifference level. For instance, in the SR-SIS, participants disagreed that the event will positively impact their personal sport participation or physical activity levels (MGroupA = 3.59; SD = 1.56; MGroupB = 3.63; SD = 1.58). Social capital, social cohesion, and community involvement scored low in both the SR-SIS and OR-SIS, but significantly lower in the SR-SIS. Thus, participants disagree that the event will positively impact social cohesion, social capital, and/or community involvement when this is measured through OR measures, and disagree even more when it is measured through SR measures. Overall, the results confirm that in the period roughly two years before the OPG, OR measures of perceived future social impact are consistently higher than SR measures of perceived future social impact, except for feelings of (un)safety.

Social Impact of Tokyo 2020 OPG Based on SR Measures
The SR-SIS measured the pre-event social impact of the Tokyo 2020 OPG in two different sample groups. No significant differences were found in SR items between Group A (SR-SIS-A) and Group B (SR-SIS-B), except for disorder and conflict (see Table 5). However, with an effect size of 0.8, this difference is unimportant. Thus, overall, the response bias between Group A and Group B is negligible.
In terms of specific results, amongst the conceptual categories, feelings of (un)safety scored highest, followed by disorder and conflict. Both reflect negative social impact factors. Community spirit ranked third and was the only other factor that scored around the 4-point indifference threshold. All other social impact factors scored below the indifference threshold, from 3.06 (SD = 1.34) for social cohesion to 3.63 (SD = 1.58) for sport participation and physical activity, showing little evidence for positive social impact of the Tokyo 2020 OPG based on projected SR social impact.

The SR-SIS versus OR-SIS
Scale testing confirmed that the 23 items represented seven predetermined social impact factors in both the SR-SIS and the OR-SIS, as was the case for the original social impact scale [57]. The comparison between the SR-SIS and OR-SIS allowed us to replicate past work on social impact from tourism [22] in the context of sport events. Our results confirm the theoretical prediction that OR items are higher, both in positive and negative ways. Specifically, except for feelings of (un)safety, all other OR measures of collective social impacts are significantly higher than SR measures, whether this was measured between groups (A and B), or within the same group of respondents (Group B). The order in which the scales were provided for Group B (i.e., OR first, SR second) cannot explain the differences, as the SR social impacts reported by Groups A and B were not significantly different. Thus, changing the point of reference, in terms of wording, matters when measuring social impacts of sport events.
The higher OR scores indicate that respondents consistently believed that others were more likely to experience certain potential social impacts than if asked about themselves, confirming an internal bias of social projection [23,53]. We can find no evidence or theory to explain why the SR measures would be underestimates of reality and therefore conclude that OR measures are overestimates because projection bias leads to inaccuracy in measurement. We therefore posit that the point of reference as the circumscribed self (SR) calibrates and provides a more accurate measurement of social impacts than providing a perspective of a global other (OR). As a result, based on the experimental design, the SR-SIS is an appropriate scale to use in the pre-event period and can be generalized to other one-off, mega-events similar to the OPG.
SR scores are also more accurate because they involve a more appropriate scientific survey methodology (asking each person in the sample and generalizing to the population). Each person matters. People who see zero social benefits are given a voice. They can respond zero and be averaged in with those who see medium and high social impacts. On the other hand, in OR questions, these people who do not see a personal benefit, are asked if others see a benefit. Of course, if they are paying attention to media, they will answer yes, that there are others who see a benefit. Indeed, external factors such as media framing and public discourse can be powerful tools to shape perceptions [49,63,64], and possibly affect pre-event expectations of OR social impacts more than SR. While OR scales are upwardly biased, they do grasp public opinion in more general terms, reflecting a general mood that lingers among a population which may influence event organizers and policy makers. For instance, high OR social impact scores may be useful for sport event organizers and policy makers to gain support for hosting, but the reverse can also happen. Specifically, for the Tokyo OPG, residents have been strongly influenced by the media regarding COVID-19, negatively impacting their attitudes towards the Games. A poll indicated that 51.7% of Tokyo residents no longer wanted to see Tokyo host the OPG in 2021 [65]. While these attitudes and public opinions may matter to gain support for hosting or not, they remain generic and affected by projection bias.

Practical Results for the Tokyo 2020 OPG
In terms of absolute scale values we found little evidence for perceived potential positive social impact whether measured as SR or OR items, and negative social impacts prevailed. Thus, positive social impacts measured two years prior to the originally scheduled Tokyo OPG are negligible, whether measured based on OR or SR. The only positive social impact factor which hovered just above (in the OR-SIS) or around (in the SR-SIS) the 4-point indifference threshold is community spirit. This refers to the short-lived feel-good factor which residents and the community at large experience in the context of events, confirming previous studies [37,44,45]. Except for sport participation in the OR-SIS, the remaining four positive social impact factors score below the 4-point indifference level, and significantly lower in the SR-SIS compared to the OR-SIS. In summary, two years prior to the planned occurrence of the Tokyo 2020 OPG, residents did not anticipate that the event would contribute positively to social impact, whether measured through perceptions of others, and even less so when measured through perceptions of self. This may be due to the absence of recency [66]; data were collected two years prior to the event. In previous studies where social impacts of events were prevalent, results showed the highest levels during the event [37,44,45].
The two negative social impact factors, disorder and conflict and feelings of (un)safety, showed the highest scores pre-event, regardless of how they were measured. This result shows that negative social impact perceptions and experiences prevail over positive social impacts (see also [13,14]), at least when measured two years prior to the event. Residents have no control over the occurrence of the negative social impact factors. However, people can react to aspects of conflict and disorder, and decide for themselves if they want to circumvent the inconveniences created by the event, such as deciding not to go to the city because of traffic jams. Thus, it is up to the individual how they deal with or act upon the situation. Overall, there may be an expectation that people will try to avoid the potential inconveniences leading to higher scores for both SR and OR measures of disorder and conflict. In contrast, there is little residents can do against feelings of unsafety. For example, if terrorist attacks occur, anybody can be affected, and there is very little individuals can do to avoid this threat. This may explain why we do not find a significant difference between pre-event OR and SR measures for feelings of (un)safety; if it occurs everybody will experience it, as is the case with COVID-19. The data for this study were collected pre-COVID-19 and we recognize that if repeated during COVID, the results could be very different.

Limitations and Future Research
The literature has demonstrated that pre-event OR scores are generally higher than post-event scores [13][14][15][17][18][19], indicating a potential problem with using OR questions before an event. However, in keeping time constant in the research design, both OR and SR data were collected pre-event, when social impacts were not yet experienced and respondents could only anticipate how they would experience the event and how they would react. It can be argued that asking residents to evaluate the impacts of an event based on their projected self-experience is self-reflective and subjective in nature [22]. Nevertheless, estimates of pre-event social impact are a regular occurrence in both research and practice. Whether or not we agree with the practice of predicting a possible future social impact, we know it occurs, and this research offers a solution to make the data collection as precise as possible.
Clearly, more research is warranted to understand the longitudinal and sustainable nature of experienced social impacts. Assuming manifestations of social impact experiences do occur during the event, future research will have to determine whether the social impact gap between OR and SR items remains, increases, or diminishes when data are collected during the event, and even post-event. Thus, to provide deeper insight and to further test our argument, we recommend repeating the same study design using both the SR-SIS and OR-SIS in future research during and post-event. This will allow researchers to test the actual accuracy of SR scores and to determine whether the SR-SIS is indeed a valid scale in other event time periods. This will also provide further insight into the wide variation of higher, lower, or unchanged scores for SR items found in the literature pre- and post-event [33,35,37,45]. Thus, in order to test the robustness of the scales and the findings, the data collection should be repeated during and after the Tokyo 2020 OPG to test the recency effect and to take into account that media attention [49,64] and political discourse [63] increase as the event gets closer. We also recognize that COVID-19 has drastically changed the context for the Tokyo OPG [65], which may hinder a longitudinal comparison pre-, during, and post-event.
Although Robbins and Krueger [23] reported no differences in the ordering of scales, as in SR-SIS versus OR-SIS in Group B, ideally, to provide a clearer distinction between OR and SR outcomes, data would have been collected from four groups: a group responding to the OR-SIS only; a group responding to the SR-SIS only; a group responding to the OR-SIS first, followed by the SR-SIS; and finally, a group responding to the SR-SIS followed by the OR-SIS. This expanded experimental design could have confirmed the findings of Robbins and Krueger [23] and would have controlled for any potential bias of first responding to OR items followed by SR items and vice versa. The fact that no significant differences appeared between the SR-SISs of Group A and Group B, except for disorder and conflict, alleviated some of these concerns.

Conclusions
Past research on social impact of events relies heavily on asking respondents about a generic and global "other", which we referred to as other-referenced (OR) measures of event impact on the community at large (e.g., "The event strengthens friendships/relationships in the community"; [13,14]). To obtain more accurate measures of social impact, the solution is to word scale items in terms of self-referenced (SR) social impacts, in the first person, using "me", "I", or "my" (e.g., "The event strengthens my friendships/relationships in the community"). Two years prior to the planned Tokyo 2020 OPG, we found that anticipated social impact measures of others (OR) were generally significantly higher than projected social impact measures of self (SR). Projection bias, and to some extent media framing and socio-political discourse, explain why OR measures are skewed upwards, and therefore overestimate social impacts. Moreover, methodologically, by asking each person in the sample about their expected personal experiences, generalizations to the population are a more truthful reflection of reality. We therefore posit that the point of reference as the circumscribed self (SR) calibrates and provides a more accurate measurement of social impacts when used to measure mega-events in the pre-event period. Using the SR-SIS moving forward and testing it in other time periods will generate deeper insights and strengthen the generalizability of the instrument.
Two years prior to the anticipated hosting of the Tokyo 2020 OPG, there is little evidence for expected positive social impacts, except for a short-lived anticipated enhanced community spirit (i.e., feel-good factor), and perceived negative social impacts prevail. The negligible positive social impacts can be partially explained by recency bias [66], as stronger effects may become apparent closer to the event.
The present study asserts that measuring pre-event social impact through expected individual experiences using a self-referenced social impact scale provides a more accurate assessment of possible social impact than using an other-referenced social impact scale. A self-referenced social impact scale helps proponents of events inform host communities more realistically about how events contribute socially (or not). Overestimating social impact claims through measures based on perceptions of others may raise residents' expectations; if not delivered, this could negatively impact perceived benefits. According to social exchange theory, residents form negative attitudes toward the object (i.e., the event) when the perceived benefits are lower than the perceived costs (e.g., [13]). Thus, accurate assessment is practically important for event organizers to host sustainable sport events.

Conflicts of Interest:
The authors declare no conflict of interest.