Measuring Anticipated and Episodic UX of Tasks in Social Networks

Abstract: Today, social networks are crucial commodities that allow people to share content and opinions. Beyond participation, the information shared within social networks makes them attractive, but their success also depends on a positive User eXperience (UX). Social networks must offer useful and well-designed user-tools, i.e., sets of widgets that allow interaction among users. To satisfy this requirement, Episodic User eXperience (EUX) captures the reactions of users after they have interacted with an artifact, while Anticipated User eXperience (AUX) grants designers the capacity to collect users' aspirations, assumptions, and needs in the initial development phase of an artifact. In this work, we collect the UX perceived in both periods to contrast user expectations with the experiences offered by social networks, in order to find elements that could improve the design of user-tools. We arranged a test where participants (N = 20) designed paper prototypes to solve tasks and then performed the same tasks on online social networks. Both stages were assessed with AttrakDiff, and the results were analyzed through t-tests. Our results suggest that users lean towards pragmatic aspects in their expectations of user-tools.


Introduction
The popularity of social networks has increased in recent years [1], especially due to the COVID-19 pandemic [2,3]. However, they are neither a new nor an unknown topic. Social networks have been studied by Computer Science researchers for a long time and from different angles, which is why we can find several definitions in the state-of-the-art [4][5][6][7][8]. Among them, we adopted the one by Boyd and Ellison [9]: "web-based services that allow individuals to (1) construct a public or semi-public profile within a bounded system, (2) articulate a list of other users with whom they share a connection, and (3) view and traverse their list of connections and those made by others within the system. The nature and nomenclature of these connections may vary from site to site". We chose this definition because it denotes the main elements of social networks and their interaction.
Sociability and usability are vital factors for any social network. Sociability refers to the contact and exchange of information among users, whereas usability concerns how well the technology enables those exchanges [10,11].
A result of sociability is participation. Therefore, several studies have been carried out to understand the motives of individuals to engage in a social network [12][13][14][15]. We believe that the progress of any social network heavily relies on: (1) the collaboration among its users to create content and make contributions to the community [16], and (2) the interaction of users with businesses, organizations, colleagues, family members, and friends to create together their production and consumption experience and meet their necessities [17][18][19].

Periods of measurement are essential because user responses may differ, e.g., measuring momentary UX can capture a visceral response from the user, whereas if UX is measured some time after the use of an artifact, the user may remember more positive things and suppress the negative ones [29]. In this way, a study that considers more than one period can be more enriching.
While EUX is simply the experience obtained after using a system, product, or service [30], AUX concerns the attitudes and experiences that the user assumes will occur when envisioning the use of an artifact [31]. Thus, the goal of an AUX assessment is to recognize whether a given idea offers the type of UX anticipated by developers for potential users [32]. Conducting AUX trials has proven worthwhile, even though there are not many research works on this subject [27,[33][34][35].
This paper continues and expands the research that we have previously published on this topic [36,37]. The objective of the present work was to determine whether, and in what ways, user expectations (AUX) differ from the experiences users find on social networks (EUX). Identifying these elements of contrast could help improve the design of user-tools. Thus, we propose the following hypothesis for our study: there is no significant difference in perceived UX between the prototypes imagined by the participants and actual social networks when performing the same tasks.
To confirm or refute this hypothesis, we propose a method that allows us to assess the AUX and EUX of daily tasks on social networks: sending messages, sharing multimedia, and doing searches. Our participants (N = 20) completed these tasks in two phases. In the former, they made a paper prototype with the elements they considered necessary to solve the task; once the prototype was finished, they evaluated it with the AttrakDiff questionnaire [38]. In the latter, the participants solved the task on real social networks and likewise evaluated it with AttrakDiff. Our main finding is that user expectations are mainly composed of pragmatic aspects.
The organization of this article is as follows. First, we present a brief analysis of related works (Section 2). After describing the research methodology that we followed to develop our proposal (Section 3), we explain our assessment method (Section 4). Subsequently, we report all the details of our tests (Section 5), followed by their results (Section 6). After that, we present the discussion of our results, as well as the implications and limitations of our study (Section 7). Finally, we present our conclusions and some proposals for future work (Section 8).

Related Work
In this section, we present a brief review of some outstanding works involving AUX and EUX. We classify them into two groups because this seems to be the trend in most UX work. The former group comprises researchers who study popular systems in the market and then propose theories (Section 2.1). The latter group comprises those who, after studying theoretical works, use their knowledge to propose changes in practical systems (Section 2.2). We consider our work a hybrid approach that tries to bring together the best of both paths.

From Practice to Theory
Practice is vital, as it allows collecting people's opinions and reactions. Aladwan et al. [39] designed a framework through review searches and constructed a prototype that describes user anticipations and experiences, using instructional fitness applications. The main limitation of this work is the difficulty in unraveling ambiguous user reviews.
Although qualitative evaluations are, in general, complicated to analyze precisely because they lend themselves to ambiguity, they are an indispensable resource when the investigation is about transferring real-world interactions to a virtual environment. Such is the case of Moser et al. [40], who organized workshops for children around the world. Through various types of activities, they managed to gather children's expectations and idealizations regarding games. Although they detailed the way to capture AUX, they made no comparisons, nor did they propose elements for the design of GUIs.
The works of Margetis et al. [41] and Zhang et al. [42] also fall into this area of gathering users' know-how. The former created an augmented reality (AR) system that facilitates reading and writing in books without being invasive to users. Apart from a heuristic evaluation, there is no evidence of AUX evaluation, only of EUX after testing the prototype. The latter designed a card game that encourages practice for people learning a foreign language. Even though they conducted an AUX study during the design, there is no contrast with EUX.
User expectations are also gathered when new environments are studied. For example, Kukka et al. [43] investigated the integration of Facebook content into three-dimensional applications. They created design guidelines based on the problems they identified in this kind of environment. Being a preliminary investigation, it did not compare AUX and EUX. Another example is Wurhofer et al. [44], who examined the UX of motorists. Through a study of cumulative UX, they compared expectations against the real experiences of drivers. Although this is a study of UX over time, it does not include GUIs.

From Theory to Practice
Theory is essential because it identifies and proposes elements that can be used to design and evaluate systems. Such is the case of Magin et al. [45], who described possible factors that cause a negative UX when using apps. Through a prototype app, participants' AUX and EUX were measured. They concluded that a lack of usability causes negative emotions. Similarly, Sato et al. [46] reported a series of elements used in multi-agent systems that could possibly be applied in Communities of Practice (CoP). Though the impact these elements would have on UX can be deduced, they did not evaluate UX.
The works gathered here are a sample of how AUX and EUX studies can be applied, as well as of their worth. Although these studies present elements that stand out, particularly in AUX or EUX, none of them describes which dimension (or dimensions) are more critical for one or the other period. Table 1 summarizes and compares each of the works analyzed in this section.

Research Methodology
As a guide to carry out our research, we use the Design Science Research Methodology (DSRM) process model by Peffers et al. [47]. This methodology was selected because it has been used in works within the same UX study spectrum. For example, Carey et al. [48] used it to develop and validate their interactive evaluation instrument; their goal was to improve the process for mobile service innovation. Strohmann et al. [49] followed DSRM to create recommendations for the representation and interaction design of virtual in-vehicle assistants. Lastly, Kumar et al. [50] used this methodology to design an app that provides remote students with learning support.
The DSRM iterative process consists of a research entry point and six stages [47]. The initiation point could be problem-centered, objective-centered, design-and-development-centered, or client/content-centered. The six stages of the methodology are:

1. Identify problem and motivate: define the problem and show its importance.

2. Define objectives of a solution: what would a better artifact accomplish?

3. Design and development: create the solution artifact.

4. Demonstration: find a suitable context and use the artifact to solve the problem.

5. Evaluation: observe how effective and efficient the artifact is; iterate back to design.

6. Communication: communicate the problem and the designed artifact to relevant audiences.
In our case, we selected the objective-centered initiation as the research entry point of DSRM, given that our aim was to help improve the design of user-tools. Regarding the first step, identify problem and motivate, we have already highlighted the role that UX plays in the design of user-tools within social networks. The second step, define objectives of a solution, concerns the construction of the assessment method, whose objective is to compare AUX and EUX to find elements of contrast. The third step, design and development, refers to the specification of the proposed assessment method. The fourth and fifth steps, demonstration and evaluation, are respectively the tests we prepared and the outcomes we achieved. The final step, communication, corresponds to this article itself. To refine the proposed AUX and EUX assessment method, succeeding iterations will start at the design and development step.

Assessment Method
As already described in Section 3, this segment presents stages two and three of DSRM applied to our proposal, i.e., define objectives of a solution, and design and development.

Define Objectives of a Solution
Social networks have problems in the two areas that comprise them: technological (the platform that supports them) and social (misinformation problems, lack of motive, and guidance) [51]. User-tools can help to solve the problems in these areas (see Figure 1), which are vital in a successful social network [52][53][54][55].
User-tools are groups of widgets that make up the GUI of a social network, allowing users to perform tasks and communicate with each other, e.g., friend lists, newsfeeds, chats, and publishing menus. The granularity of user-tools is dictated by activities, i.e., a specific set of widgets that allows solving a specific activity constitutes a user-tool. As mentioned, user-tools are the elements that allow interaction among users on a social network, so their design should be a primary concern. Accordingly, our work focuses on contrasting AUX and EUX, since we hope to identify which dimensions of UX carry the most weight in each period. Therefore, we introduce a six-step assessment method (see Figure 2), explained in the following subsection.

Design and Development
Here, we describe each step of our assessment method. To demonstrate how our proposal works, we take the basic case where one person uses a chat to contact another person:

• Set Goals: This step concerns the objectives that developers need to achieve, e.g., a chat must allow users to communicate effectively with each other.
• Identify Tasks: It refers to the stages that the user has to follow to attain the aforementioned objectives, e.g., a user has to recognize the receiver of the message, display the direct message option or window, compose the message, and finally send it.
• Identify User-tools: This step involves determining which user-tools are available to accomplish the previously identified tasks, e.g., avatars, user profiles, lists, buttons, commands, and text boxes.
• Assess AUX: It concerns an AUX evaluation of the prototyped artifact. This stage can be done with various tools, e.g., low-fidelity prototypes [56,57], or techniques such as the Wizard of Oz [58,59]. Nevertheless, the important thing is to stimulate the creativity of participants, so that we can obtain their idealizations and expectations. To know what aspects should be taken into account at this stage, we rely on the bases proposed by Yogasara et al. [31]:
  - Intended Use: The practical connotation of each user-tool, e.g., the functioning of a chat from the user's point of view.
  - Positive Anticipated Emotion: Agreeable feelings that the user expects to undergo as a result of the interaction with a user-tool, e.g., satisfaction after sending a message, happiness when the answer comes, or general pleasure at not receiving errors or any other type of alert.
  - Desired Product Characteristics: For this aspect, we adapted the principles suggested by Morville [60] to our case study. These principles specify that a user-tool must be worthy, functional, helpful, attractive, attainable, honest, and discoverable.
  - User Characteristics: The mental and physical faculties of users, e.g., developing a generic chat does not demand the same endeavor as developing one intended for children or for seniors, since each group has specific needs.
  - Experiential Knowledge: We need to know the background of users, because they rely on their experience to gather information and then compare and contrast, e.g., a user might ask whether the new chat is more suitable than the one provided by Facebook.
  - Favorable Existing Characteristics: The properties that users have identified in the past as positive in comparable tools, e.g., a user could think that they enjoy the chat of another platform thanks to its response time, availability, and ease of use.
• Assess EUX: This step involves conducting an EUX assessment of the developed artifact. For this step, we need at least a mid-fidelity prototype [61,62], i.e., something that lets participants experience the tool on a PC or a mobile device. However, to make the comparison of results achievable, it is vital to evaluate all the aspects taken into account for the AUX assessment, e.g., if the NASA TLX questionnaire [63] was used in the AUX assessment, it must be reapplied for EUX, taking care to measure similar parts or functionalities in both stages.
• Compare Results: Once the AUX and EUX assessments have been carried out, the results have to be contrasted, so that developers can make decisions on the design of user-tools, placing the idealizations of users side by side with reality and examining whether their propositions were implemented, e.g., comparing the NASA TLX evaluations of the prototype and of the developed chat.

Demonstration and Evaluation
This section represents steps 4 and 5 of the DSRM methodology. It details the Materials (Section 5.1) and Method (Section 5.2) that we used in our tests.

Materials
To carry out our tests, we used basic materials. For the development of prototypes, we provided stationery such as sheets of paper, pens, pencils, and markers of various colors. For the social network tests, we used a 15-inch laptop with internet access and Firefox as the web browser. For each social network, we created a new user profile.
An essential factor that can compromise the validity and reliability of a study is improvisation: choosing the wrong instrument invalidates the results, no matter how rigorous the study's methodology is [64,65]. That is why we weighed the various factors that could affect our tests. AttrakDiff, since its original proposal in 2003 [38], has been used in multiple tests to measure UX based on its pragmatic and hedonic factors [23]. Experts have used this tool in many studies, it has been tested for validity and reliability in different contexts [66][67][68][69][70][71][72][73], it has been translated into various languages [26], and it has been modified to suit the specific needs of particular experiments [74]. In addition, it is simple to answer and does not represent a burden for participants [75]. All of this made us choose AttrakDiff as a valid tool to study UX.
The full AttrakDiff questionnaire is composed of 28 semantic pairs, i.e., pairs of words that form a strong contrast with each other (e.g., good-bad). Through these semantic pairs, the questionnaire measures the following aspects [76]:

• Pragmatic Quality: It refers to the perceived quality of manipulation, i.e., effectiveness and efficiency of use.
• Hedonic Quality-Identity: It indicates to what extent users identify themselves with the artifact.
• Hedonic Quality-Stimulation: It indicates to what extent the artifact supports the users' need for novelty and stimulation.
• Attractiveness: It describes a global value of the artifact based on the perceived qualities.

The hedonic and pragmatic dimensions are independent of each other and contribute equally to the UX evaluation [23]. We used a printed version, in English, of the questionnaire available on the official website of the tool (http://attrakdiff.de/index-en.html). All participants had the same materials at their disposal.
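The scoring of the questionnaire is mechanical: 28 ratings on a one-to-seven scale are grouped into four dimensions of seven items each, and each dimension is summarized by its mean. The following minimal sketch illustrates this computation; the item-to-dimension index assignment shown here is hypothetical, since the real questionnaire fixes its own item order:

```python
# Minimal sketch of scoring one AttrakDiff response: 28 semantic-pair
# ratings on a 1-7 scale, grouped into four dimensions of 7 items each.
# The item-to-dimension assignment below is illustrative only.

DIMENSIONS = {
    "Pragmatic Quality": range(0, 7),
    "Hedonic Quality-Identity": range(7, 14),
    "Hedonic Quality-Stimulation": range(14, 21),
    "Attractiveness": range(21, 28),
}

def score_questionnaire(ratings):
    """Return the mean rating per dimension for one 28-item response."""
    if len(ratings) != 28 or not all(1 <= r <= 7 for r in ratings):
        raise ValueError("expected 28 ratings on a 1-7 scale")
    return {dim: sum(ratings[i] for i in idx) / len(idx)
            for dim, idx in DIMENSIONS.items()}

# Example: a response rating the pragmatic items high and the rest neutral.
response = [6] * 7 + [4] * 21
print(score_questionnaire(response))
# {'Pragmatic Quality': 6.0, 'Hedonic Quality-Identity': 4.0,
#  'Hedonic Quality-Stimulation': 4.0, 'Attractiveness': 4.0}
```

In our tests, each participant produced six such score vectors (one per task, for both AUX and EUX), which were then compared dimension by dimension.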

Method
Since we try to study the user-tools of social networks, and we have one independent variable with two factors, prototypes and social networks, our tests follow a basic design [77]. Moreover, since we had only one group of participants who were exposed to both factors, our tests have a within-group design [77].
Our only dependent variable is UX, of course, but since it is a latent variable and therefore cannot be measured directly [78], we use AttrakDiff, whose four dimensions help us measure the UX perceived by our participants (cf. Section 5.1).
Finally, our control variables are the test environment, since all participants were exposed to the same conditions (e.g., materials, noise and light levels, desk, chair, and room), and the characteristics of our participants (cf. Section 5.2.1). Table 2 summarizes the variables of our tests. The method for conducting our tests has been widely used by various authors in similar contexts [79][80][81][82].

Participants
We used an opportunistic sample to recruit our participants, given that they are all members of our department. All participants gave their informed consent for inclusion before they participated in the study. In addition, the study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Ethics Committee of our department.
Our testing group was composed of 20 participants (five of whom were female), with an average age of 28.15 years, a maximum of 38, and a minimum of 20. We decided to limit the age range to between 20 and 40 years to prevent our results from being biased by participants with particular needs (e.g., needing oversimplified language and instructions, or larger GUI fonts). Although we know this is a rather small sample, it is within the average for this kind of test [72].
Participants were selected for their familiarity with social networks. We think that people unfamiliar with such platforms would be constrained in their ability to perform the assigned tasks, which would have an invalidating impact on our study. Moreover, we believe that better results are obtained if participants have experience with social networks.

Procedure
We carried out the AUX and EUX assessments of user-tools in a calm environment to limit external sources of noise in our study. Each volunteer participated individually in a testing session conducted in situ by a moderator.

As the first step of our tests, participants filled out a questionnaire about their demographic information and prior contact with social networks. Afterwards, they performed the tasks and assessments.

Each session lasted around 40 min. We ran the tests over 20 days, i.e., one participant per day. All tests were conducted around 10 a.m. to encourage a similar state of mind in each participant.
The results of the aforementioned questionnaire showed that YouTube was the most used platform among our participants (100% usage). Facebook showed moderate use (47%), and Reddit was the least used (2%). Therefore, we decided to use these three platforms to assess EUX.
As stated earlier, our goal was to improve the design of user-tools through the contrast of AUX and EUX. To achieve this, we devised the following three tasks, which represent common activities within social networks. During the trial, participants had to complete each one twice: once for AUX and once for EUX:

1. Message: transmit a private message to another user.

2. Publication: share multimedia content with other users.

3. Search: look for somebody or for a certain theme.
To decide how participants would obtain the user-tools required to accomplish each task, we analyzed different options, e.g., giving them user-tools made up of paper cut-outs. Nevertheless, if we provided participants with a predetermined set of user-tools, they would be biased, i.e., we would obtain very similar results from each participant, possibly even identical prototypes, consequently limiting their feedback significantly. Thus, for the AUX assessment step, the best alternative was for each participant to create their own user-tools.
The next two steps of our method are the AUX and EUX assessments of user-tools:

• Prototype construction: First, we asked participants to imagine taking the role of a Web designer with the aim of creating a novel GUI for a social network. Afterwards, relying on their experience, they had to create three paper prototypes, corresponding to the three previously defined tasks. Participants had to draw GUI elements to solve the tasks, just as if they were designing a website GUI. In our pilot tests, we obtained prototypes similar to the one depicted in Figure 3a, so we designed a canvas to make it easier for participants to create their prototypes. Figure 3b-d show random samples of prototypes from our actual tests. When they concluded the construction and description of each prototype, participants assessed it with the AttrakDiff questionnaire. This stage therefore allowed participants to explain how they conceived the behavior of the GUI, the rationale behind their designs, and the user-tools required to accomplish each task. In this manner, we assessed the AUX of user-tools.
• Tasks using online social networks: Once the three prototypes and their assessments were concluded, we asked participants to carry out the same three tasks, but now using online social networks. Hence, on Reddit, participants transmitted a private message to another user; on Facebook, they shared multimedia; and on YouTube, they searched for somebody or for a certain topic. As in the previous stage, after finishing each task, they assessed it with the AttrakDiff questionnaire. In this way, we assessed the EUX of user-tools.
In this way, and taking into account that each participant made six evaluations, we finished with 120 questionnaires: 60 corresponding to AUX and 60 to EUX.

Results
Seven semantic pairs correspond to each dimension of AttrakDiff. The ratings go from one to seven, and the higher, the better. Table 3 contains the results from the 120 questionnaires, with the means (µ) and the standard deviations (σ) of each dimension for the three tasks.

In assessing these results, we also look at the reliability scores for the different dimensions. Table 4 shows the Cronbach's alpha values for the AttrakDiff dimensions in each task (α level = 0.05).

To contrast the results of the tests, and given that our design is within-groups with an independent variable of two factors, the statistical analysis we performed was a paired-samples t-test [83]. In this way, we determined whether there are significant differences between the means of each dimension of AttrakDiff in the AUX and EUX tests for each task (see Table 5). All statistical analyses were obtained with the R language.

Table 3 clearly shows that the paper prototypes were better evaluated than their counterpart in Reddit, and Figure 4a reveals something similar. The message prototypes were the only ones where the assessment of AUX exceeded that of EUX. This is likely because, for most participants, this was their first time using Reddit. It can also be attributed to Reddit offering a negative UX, since it was not easy for participants to transfer their previous experiences to a new platform.
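Our analysis was done in R; as an illustration only, the two statistics involved (the paired-samples t statistic and Cronbach's alpha) can be sketched in pure Python, with made-up example numbers rather than our actual data. With N = 20 participants, the paired test has 19 degrees of freedom, so |t| > 2.093 corresponds to p < 0.05 (two-tailed):

```python
from statistics import mean, stdev

def paired_t(aux, eux):
    """Paired-samples t statistic for two matched lists of scores."""
    diffs = [a - e for a, e in zip(aux, eux)]
    n = len(diffs)
    # t = mean of the differences divided by its standard error
    return mean(diffs) / (stdev(diffs) / n ** 0.5)

def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list per questionnaire item,
    each covering the same respondents in the same order."""
    k = len(items)
    sum_item_var = sum(stdev(item) ** 2 for item in items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent totals
    return k / (k - 1) * (1 - sum_item_var / stdev(totals) ** 2)

# Illustrative (not real) dimension means for five respondents:
aux = [5, 4, 6, 5, 4]
eux = [6, 5, 6, 6, 5]
print(round(paired_t(aux, eux), 2))  # -4.0 for these numbers

# Two perfectly correlated items give alpha = 1.0:
print(cronbach_alpha([[1, 2, 3], [1, 2, 3]]))
```

In practice, a statistics package (such as R's t.test with paired = TRUE, as used in our analysis) also reports the exact p-value rather than a critical-value comparison.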

Discussion
Even though participants were free to design their user-tools as they pleased, based on their experience, the real social networks gave them more satisfying experiences. Figure 4b,c show that, indeed, all the dimensions were superior in social networks, although it is interesting that the difference, while present, is not large. An intriguing observation is that the participants were quite incisive in criticizing their prototypes, i.e., they complained that they had not done a good job because they lacked the experience or knowledge necessary to design a GUI.
In general, we can say that the reliability of the data is good, since most of the dimensions obtained good results (>0.7), as can be seen in Table 4. The result that stands out the most is that of the Hedonic Quality-Identity dimension, as it did not reach good reliability in any of the tests. This could mean that AttrakDiff has a weakness in measuring the Identity dimension. Of course, we would need more evidence to verify or refute that assumption.

Table 5 shows which results of the paired-samples t-test allow us to reject our null hypothesis. The comparison between AUX and EUX for the message task was significant in the dimensions of Pragmatic Quality, Identity, and Attractiveness. For the publication task, only Identity was significant, while for the search task, Pragmatic Quality, Identity, and Attractiveness were significant. These significant dimensions indicate that, in these tests, we can refute the null hypothesis, because there is a significant difference between the UX perceived by the participants in the prototypes and in the social networks. It is interesting to note that the only dimension that was consistently not significant in any task was Stimulation.
According to Aladwan et al. [39], when users of fitness applications were physically stressed by exercise and tried to use said apps to no avail, their stress increased, as their expectations were not met. This is consistent with our findings, since it is likely that, in an altered state of mind, users will need to rely on pragmatic elements that are familiar to them. Something similar happens with the tests carried out by Kukka et al. [43], Margetis et al. [41], Wurhofer et al. [44], and Zhang et al. [42], as their participants focused on interactions that they considered safe when they found themselves in an unfamiliar environment.
Magin et al. [45] studied the possible sources of negative emotions in UX (e.g., anger, sadness, and confusion). They determined that a significant part came from instrumental elements, i.e., usability, which agrees with our findings, since users expect things like that a button is active under certain circumstances or that a selected item can be removed, i.e., practical tasks.
The work by Moser et al. [40] is interesting because the expectations they measured came from children. It seems that their imagination was more oriented to hedonic aspects, mainly self-identification, since they cared that the games reflected their personality and decisions. It is striking because it goes against our findings: perhaps the AUX perceived by children has more weight in the hedonic factors, which could indicate a future path of investigation.

Implications
An interesting result we obtained was that the Stimulation means were not significantly different, which could indicate that participants thought about basic user-tools when making their prototypes and found similarly essential elements in the social networks. We know that drawing more reliable conclusions from this will require further research. However, we could speculate that the experience and imagination of the participants are limited to the essential elements commonly found in all GUIs, i.e., they prefer to play it safe. Users look for security rather than new experiences when testing new GUIs, so Stimulation could become a more decisive factor once they are already familiar with a GUI.
Such behavior could also indicate that user expectations are more grounded in pragmatic aspects than in hedonic ones. This could have significant implications. For example, it would imply that, when creating new GUIs, designers have to pay more attention to including the basic user-tools that allow users to complete tasks efficiently, since user expectations would be mainly focused on practical aspects, e.g., users imagine a button and its action, but not how it looks.

Limitations
The results presented in this work could have been affected by the sampling of our participants. Given that each evaluation took around 40 min, recruiting a random sample would have represented a significant challenge. Our participants did not receive any kind of incentive.
Similarly, the limitations of the within-groups design make it difficult to control the effects of learning and fatigue. We tried to alleviate this by offering a comfortable and relaxed environment and by reiterating to participants that they were helping us evaluate the systems, not being evaluated themselves [77].

Conclusions and Future Work
UX evaluation is always valuable, regardless of the nature or purpose of the evaluated artifact. In this paper, we proposed a study that compares the AUX and EUX of user-tools through daily tasks on social networks. Our tests revealed that our participants built their expectations on pragmatic criteria, i.e., hedonic and attractiveness aspects were secondary while they were building their prototypes.
Our research contributes to further increasing the understanding of UX, how perceived experiences are measured, and which factors are most relevant at a certain point in an evaluation or development. As we already explained in the discussion (Section 7), our results quantitatively confirmed that AUX seems to be mainly composed of pragmatic aspects. The development of this idea could lead to improving existing evaluation methods and the creation of new ones.
As future work, we intend to replicate our tests, but this time with children. As the work by Moser et al. [40] suggests, children may build prototypes with hedonic aspects in mind, i.e., we would expect results opposite to what we found. We also consider it essential to use other questionnaires besides AttrakDiff, which would help validate our conclusions quantitatively. While this work focuses on social networks, our assessment method can be used in multiple areas. To prove this, we will use this proposal to assess a chatbot that supports the teaching-learning process in middle schools.