User Interaction Modeling and Profile Extraction in Interactive Systems: A Groupware Application Case Study †

A relevant goal in human–computer interaction is to produce applications that are easy to use and well-adjusted to their users’ needs. To address this problem it is important to know how users interact with the system. This work constitutes a methodological contribution capable of identifying the context of use in which users perform interactions with a groupware application (synchronous or asynchronous) and provides, using machine learning techniques, generative models of how users behave. Additionally, these models are transformed into a text that describes in natural language the main characteristics of the interaction of the users with the system.


Introduction
At present, a proliferation of groupware applications (social networks, shared editors, messaging services, etc.) can be observed, enabling collaboration between users The computer-supported collaborative work (CSCW) research field [1] has focused on studying how technology can effectively support these collective processes. One of the main purposes of the CSCW has been the identification of methodologies that provide a systematic approach for discovering the users' requirements, and for evaluating the degree to which these systems allow users to be aware of, modify or interact with the work of other users.
One of the most frequent characterizations of the groupware systems is based on the following two dimensions of their contexts of use [2]: (1) time, whether users collaborate synchronously (they collaborate at the same time) or asynchronously (they collaborate at different moments); and (2) space, whether users are co-located or geographically distributed. Currently, most users of smart-phones use groupware applications that allow them to collaborate synchronously and/or asynchronously while being geographically distributed. Therefore, the evaluation of these groupware applications should assess the versatility of the systems to support synchronous and asynchronous interactions and whether the system is adapted to the users' mental model so they do not have to make any effort to use the system's features.
In order to carry out these evaluations, this article describes a methodological approach (see Figure 1) that identifies the context of use in which the users perform the interactions (synchronous or asynchronous) and generates descriptive models of how these users orchestrate the interactions with the system. Finally, these models are processed to create a descriptive text in natural language of the main characteristics of the users' interaction with the system. Thus, the designer can verify whether the system offers a natural interaction experience to its users in each context of use and whether users deviate substantially from the task model used to design the system. The task models are specifications widely used in the human-computer interaction (HCI) field to describe the logical activities that have to be performed in order to reach the users' goals. Therefore, the evaluation of the system should analyze the users' behaviors to verify that it is not hard for them to follow the sequence of actions specified by the task model. Moreover, the users of the system can have different behaviors so the task model must also be flexible enough to enable the users to reach their goals in different ways. With this aim, we propose a methodology that processes log repositories with information of the user interactions and also models the behavior of the users. Then, the users with similar behaviors are grouped together and for each set of similar users, a profile (in the form of a weighted automaton) is generated. Finally, these weighted automata are transformed into descriptive text that capture the characteristics of those different user profiles, which is valuable information that allows for the design of a task model according to the users' natural behavior. With respect to our previous work [3], we introduce two new models for the users' behavior, one of them allowing for more flexibility (the user is no longer represented by a fixed-length vector), and we highlight the pros and cons of using any of the three proposed models. The experimental section is modified accordingly. Note that user modeling can be understood in a more general context (from an artificial intelligence perspective), as the process through which systems obtain information about the individual characteristics of users [4,5]. On the other hand, in the HCI field, a lot of work has been done on ontology-based models, where three aspects are considered: user, context and device [6][7][8][9] (we refer the reader to [10] for a very good chronological review of the evolution of user models).

Related Work
Suarez et al. [11] establish a classification of interactive systems and point out that specific evaluation criteria should be applied to each one, according to their features and complexity. This classification includes the groupware as a specific type of application. Among the specific features of the groupware applications, it is noteworthy that they enable not only the communication between users but also the collaborative building of artifacts in a shared workspace. The groupware applications support user interactions whose effects can be perceived by at least another member of the group or another community. Therefore, the evaluation should consider specific criteria such as the effectivity of the awareness and social support of the application. Molina et al. [12] propose a combination of techniques (questionnaires, empirical testing, heuristic evaluation, eye tracking) to analyze the awareness support of groupware applications. Neale et al. [13] propose some specific measurements to evaluate the degree to which an application intuitively enables social interactions.
These evaluations have been automated using tools such as Tatiana, which is independent of any groupware system and allows us to configure and to automate the analysis of interactions recorded in log files by means of several socio-cognitive methodologies. Moreover, Tatiana provides support to perform non-automated analysis, where collaborative activity is reproduced by video and the users introduce annotations or categorizations about the users' interactions. ProM [14] is another tool that facilitates this kind of analysis: it allows users to select different data mining algorithms in order to analyze the work-flows recorded in log files.
Thaler [15] uses the term usability mining for the evaluation process of information systems based on analyzing log traces. In this case, the goal is to derive a model of usage of the system that includes information about the users behavior (irrelevant actions, undo actions, using help function, etc.). The users' actions can be analyzed using statistical techniques that generate quantitative information. These measures are known as low-level indicators [16], as they usually do not provide an interpretation of the user's activity. The indicators whose values provide an interpretation (cognitive, technological, etc.) of the user's activity are known as high-level indicators.

Data Flow Representation
The data flow generated by the interactions of users with a computerized system can be seen as a stream of tuples D of the form (id, t, a), where id is the user's identifier in the system and t represents the instant of time when the action a took place.
More precisely, D is a sequence of the form [(id i , t i , a i )] 1≤i≤N where each (id i , t i , a i ) belongs to U × T × A, U is a finite set of user identifiers, T is a finite set of instants in which the process is observed and A is a finite set of actions. We assume that t i ≤ t i+1 , for 1 ≤ i < N − 1. The information contained in D can be segmented according to different criteria. If we are interested in the case in which D is partitioned according to the user identifiers, U := {u 1 , ..., u l }, each segment of information is of the form: That is, segment D(k) is the subsequence of D formed by those tuples (a i , t i ) in which action a i is performed by the user whose identifier is u k (the user's id information is the same for all tuples in D(k) and thus it is omitted). We call segment D(k) a trace (also path or trajectory) of user u k in the system.
For data analysis and user profiling, one might be interested in the duration of transitions between actions and not in the exact instance of time in which actions take place. In such cases, we can rewrite the trace D(k) as a new path of the form: where d i j := t i j+1 − t i j is the time elapsed between action a i j and action a i j+1 , for 1 ≤ j < N k , and thus is a continuous attribute. We can discretize these durations into a fixed number of categories, say C : = {c 1 , . . . , c L }, where each value c i represents a time interval chosen ad hoc for the system under study.

User Interaction Models
The problem of modeling the user interactions can be approached from two very different perspectives. One possibility is to identify each user with its trajectory and determine which users behave similarly using a path-similarity measure. The other option is to find a generative model that behaves in the same way as the user does in the interactive system. A priori there is no restriction on the nature of the generative model, which could be a Markov model, an automaton, a Bayesian network, etc. We explore here a Markov-like structure called weighted automaton.

Definition 1.
Weighted automaton (see [17]). Let Σ be a finite alphabet of symbols and n a positive natural number. A weighted automaton over Σ with n states is a tuple M := (in, out, {W σ } σ∈Σ ), where in and out are vectors in R n representing features of the empty prefix and of the empty suffix, respectively, and W σ is an n × n-matrix with real entries representing transition weights. In some situations we can omit in and out if they are not relevant for the problem.
Let A be the set of actions that can be performed by a user in a given interactive system. For each user u k , we build the path π(k) as in (1). In this path, actions a i j are in A and durations d i j and belong to some interval c in C, where C = {c 1 , . . . , c L } is a finite set of time intervals as explained in Section 3. We define the weighted automaton M u k := ({W k c } c∈C ) over the alphabet C with M = |A| states, in which the matrices of weights are defined as follows: that is, the number of times action a precedes action a in the trajectory π(k) of the respective user and the time elapsed between the two actions is in the time category c.
Depending on the application under study, we may be interested in the sequentiality of the user's actions (see Model a below) or in the frequency with which each action is performed (Model b). Alternatively, the user's behavior can be described by its path, which also leads to further categorization based on the number of previous actions we consider (Model 0, Model 1, Model 2, etc.).

•
Model a: we identify the user u k with a point of the affine space R LM 2 defined by, • Model b: the behavior of user u k is a point of the affine space R LM defined by, • Model p, p ≥ 0: we identify the user u k with a trace of variable length defined by, where c i j is the time interval to which d i j belongs and p ≥ 0. In particular, if p = 0, we have: Determining whether two users have a similar behavior can then be done via distance measures or similarity measures. If users are represented as paths of variable length, their similarity can be calculated using a Monte Carlo estimation of the crossed entropy between their respective traces (see Equation (7) in [18] or Equation (11) in [19]).
Moreover, to improve estimations when the available traces are too short and do not cover the entire set of available actions (along with their respective time intervals), we can use Laplace smoothing, and modify the equation as follows (see Equation (8) in [18] or Equation (12) in [19]).
The more similar traces are the smaller values of H MC we obtain, but the zero can only be reached when both traces consist of a constant number of identical elements. For this reason, H MC cannot be considered a distance function. Also, note that in general H MC (π p (k 1 ), π p (k 2 )) and H MC (π p (k 2 ), π p (k 1 )) are not equal. Nevertheless, this can be fixed by using the following formula.
For users represented as vectors of fixed length x = (x 1 , . . . , x p ) and y = (y 1 , . . . , y p ), the distance between them can be calculated using the Minkowski metric: The commonly used Euclidean distance is obtained for g = 2, the Manhattan distance (also called city block) for g = 1 and the Chebyshev distance for g = ∞.
An alternative concept to that of the distance is the similarity function. When the angle between the two vectors is a meaningful measure, one may consider the cosine measure, where ·, · is the Euclidean inner product in R p , and · is the norm induced by the inner product, or the normalized Pearson correlation: wherex denotes the average feature value of x over all dimensions. In practice, we use d cos (x, y) := 1 − cos(x, y) and d cor (x, y) := 1 − cor(x, y) in order to have d cos (x, x) = d cor (x, x) = 0 as in the case of the above mentioned distances.
Note that each of the distances introduced so far corresponds to different goals, and choosing one of them should be made accordingly. In the sequel, we outline their differences using an oversimplified example in which we have only two possible actions (M = 2 and A = {a 1 , a 2 ) and we consider only one time interval (L = 1 and C = {c 1 }). Example 1. Let π 0 (1) = [a 1 , a 2 , a 1 , a 2 ], π 0 (2) = [a 1 , a 2 , . . . , a 1 , a 2 ] and π 0 (3) = [a 1 , a 2 , a 1 , a 1 ] be the traces of three different users, of length 4, 40 and 4, respectively (the time interval is dropped for better readability). Based on the Minkowski metrics, user 3 is much more similar than user 2 to user 1 in both Model a and Model b, while the metrics that measure the angle between vectors show user 2 to be much more similar than user 3 to user 1 (the equality of Model a and Model b values for the Minkowski metrics are fortuitous). The Monte Carlo-like estimation also marks user 2 as being more similar than user 3 to user 1. Therefore, if one wants to group together those users that perform actions in a similar fashion ignoring the length of their traces, then Minkowski metrics should be avoided. In contrast, these metrics are highly recommended when the length of the trace is an important user behavior aspect for the application under study.

User Interaction Group Profiles
Next, we are interested in clustering the users into a certain non-specified quantity of representative user profiles such that users in the same group (cluster) behave more similarly to each other than to users in other clusters. To this end, we use agglomerative hierarchical clustering with three linkage criteria: single, complete and average (see [20] for more details). Given a set of l users to be clustered and a fixed number k ≤ l of desired clusters, the basic process of our hierarchical clustering is as follows.

•
Step 1. Start by assigning each user to its own cluster, so that if we have l users, in this initial stage we have l clusters, each containing just one user.

•
Step 2. Set the distances between the clusters equal the distances between the users they contain.

•
Step 3. Find the closest (most similar) pair of clusters and merge them into a single cluster, so that now we have one less cluster.

•
Step 4. Compute distances between the new cluster and each of the old clusters.

•
Step 5. Repeat steps third and fourth until users are clustered into k clusters.
Depending on the type of linkage chosen, the distance between two clusters is computed with one of the three formulas: where d(x, y) can be any of the distances previously defined. We denote by k the number of representative user profiles in an interactive system. The correct choice of k depends most of the time on the application. The optimal k will strike a balance between maximum compression of user profiles using a single cluster, and maximum accuracy by assigning each profile to its own cluster (having one cluster per user). If an appropriate value of k is not apparent from prior knowledge on the properties of the profile set, it must be somehow determined. In the literature, there are several proposals for making this decision effective (see [20] for a survey). One popular proposal is to use the so called elbow method. The intuition behind this method is that one should choose k such that adding another cluster does not give much better modeling of the data.
However, because of the myriad of different settings (users represented as traces or as vectors, different distance metrics, different number of time intervals, different linkage criteria, etc.), we face a supplementary complication in determining the best k (see Example 2).

Example 2.
Let us consider the following users (represented as vectors): and assume we want to perform hierarchical clustering using the Manhattan distance with two linkage criteria: single and complete. The hierarchy of clusters obtained for both linkage methods happens to be the same (see Figure 2a,b): (((u 1 , u 2 ), u 3 ), (u 4 , u 5 )), but if we were to plot the number of clusters k against the distance between the last two clusters merged when transitioning from the best k-clustering to the best k − 1 clustering (according to the particular linkage method chosen), one would get a contradictory situation. Namely, although we have the exact same clustering, the change in tendency is when k = 3 if single linkage is used, and k = 4 when using complete linkage (see Figure 2c,d). Therefore, we propose a uniform way of deciding the best k. Given a set {u 1 , . . . , u l } of users and a particular partition G 1 , . . . , G k of this set into k clusters, we compute the LM 2 -dimensional vectors A u 1 , . . . , A u l as per Model a (see Equation (2)) and then we evaluate the within-cluster sum of squares (WCSS) error: where q j = 1 |G j | ∑ u∈G j A u is the centroid of the points representing users in G j . Note that, in principle, Model b could also be used to represent users, but Model a offers a finer description. Moreover, partitioning-based clustering methods allow us to find groups of similar users that optimize the WCSS error. In particular, the k-means algorithm, a heuristic method commonly employed that converges quickly to a local optimum (see [21]), was used in a related study [22]. Nevertheless, there are several reasons for which hierarchical clustering is a better option in this case: in hierarchical clustering, we do not actually need the value of k; instead, the clusterings obtained for different values of k can be "visualized" (via the dendrogram) even for points in higher dimensional spaces (helping decide where to cut).
Once k is chosen and the groups of similar users are identified, the next step is to find a model that describes the profile of each group of users. Note that groups of users that contain less than 10% of the population are considered outliers (the threshold can be modified depending on the application). We propose to train a weighted automaton for each group of users (representing a cluster). We define the group profile of cluster G as the weighted automaton M G := {W G c } c∈C over alphabet C with |A| states, in which the weights matrices are defined as follows.
, for all c ∈ C and a, a ∈ A (11) Initial and final probabilities can be similarly defined, but they are not relevant in this case so we omit giving explicit formulas. According to [23], the previous automaton maximizes the likelihood of the observations.

Case Study: Collaborative Sports Betting
Our proposal was applied to a case study in which thirty users interacted with a mobile groupware application that supports sports betting. These thirty users were randomly grouped in ten groups of three members. They were requested to collaboratively make five bets. The interactions of the users with the application were stored in a log repository.
The process of creating a bet is made up of three main steps. First, a user proposes a bet to the other members of the group. This proposal includes a result of a sport event and an amount of money. Second, the members of the group use a chat tool (see Figure 3, right) to analyze the result and the stake of the bet. Third, the members of the group use a voting panel (see Figure 3-center) to accept or reject the bet that has been proposed. Figure 3, left, shows the main user interface of this application with its six panels that allows the user to: propose a new bet to the other members of the group, see the state of the bets or proposals made previously, use a chat to discuss about a proposal, and create a new group of users or see a tutorial that explains how to use the application. The appendix of this paper includes a specification of the actions supported by the application. , that is, c 1 corresponds to short actions (that take less than one minute and three quarters), and c 2 corresponds to long actions (longer than one minute and three quarters). The threshold is set such that 50% of all possible durations are short and the other 50% are long.
For each of the three linkage criteria, we generated all possible clusterings (each clustering containing k groups, with k in {2, . . . , 30}) for all the models listed in Table 1, each of them with the corresponding distances. This accounts for a total of 1131 = (5 × 3 + 5 × 3 + 3 × 3) × 29 distinct ways of grouping the same set of users, albeit many of them give the same output. In a real life setting, one should be able to choose a priory which model and which measure of similarity is the most appropriate for the application under study. Nevertheless, here we perform an a posteriori analysis, which allows us to select the best combination for our case study. First of all, we turned our attention to the linkage criteria and we observed the following: • The single-link or average-link clustering methods detect many outliers before starting to output reasonable sized groups. We report below (see Table 2) the values of k at which a second group of at least three users is identified. Table 2. The minimum number of clusters for which a second group of at least three users is identified.

Single/Complete/Average Euclidean Manhattan Chebyshev Cosine Correlation Monte Carlo
Model a never/5/9 never/4/8 14/4/6 7/5/6 7/2/6 -Model b 11/5/6 12/2/6 9/4/6 8/5/6 • Out of the total of 464 = 2 + 3 + . . . + 30 groups created for different values of k, only a very few of them were proper (having at least 2 members) in the case of average linkage and even fewer for the single linkage criteria. The number of proper groups is presented in Table 3. Therefore, we concluded that the complete-link clustering methods usually produce more compact clusters and more useful hierarchies than the other two clustering methods.
Once chosen the type of linkage, we had to determine which of the five models should be used, and which is the most appropriate measure of similarity between users. As mentioned in Section 5, we plotted the WCSS error against the size of the clustering. The obtained graphics are illustrated in Figure 4.
Inspecting the plots, we can see that for our case study, the three Minkowski type metrics are better: there is a clear steep for small values of k, followed by a smooth descent after the "elbow" point. Also, in almost all models, k = 5 seems to be an inflection point. And between Euclidean, Manhattan and Chebyshev, we choose Manhattan (with k = 5) because, in comparison with the two other distances, Manhattan provides the highest number of proper groups and the lowest k at which a second group of at least three users is identified. Note that for the chosen k, Model a and Model b give the same clusterings (this is not true though for k = 2 or k = 3).
The weighted finite automaton generated for each of the two bigger groups identified with this metric are presented in Figure 5 (the other three groups contained at most two users each, and their characteristics are described in Table 4). A description for each action is provided in the Appendix. A transition from a state a p to a state a r labeled i/x has to be interpreted as W G c i (a p , a r ) = x. Note that for a better readability, transitions with weights less than 0.1 are omitted.  These automata provide information used to generate a set of rules that enables us to build a description in natural language of the main characteristics of each user profile. Thus, this methodology allows us to build expert system that take as input log files with the users' interactions and generate as output a textual description of the main user profiles. The following five rules are applied for this purpose: • Rule 1: If |G| < 10% * l for a cluster G → The members of G can be considered as outliers. • Rule 2: If ∃a, a such that W G c 2 (a, a ) > 0.10 → The members of G have used the system in an asynchronous context. • Rule 3: If ∃a, a such that W G c 1 (a, a ) > 0.10 → The members of G have used the system in a synchronous context. • Rule 4: If ∃a ∈ A such that W G c (a, a ) = W G c (a , a) = 0, ∀c ∈ C, ∀a ∈ A → During the interaction process, the members of G never used action a of panel A . • Rule 5: If ∃A ⊆ A such that W G c (a, a ) = W G c (a , a) = 0, ∀c ∈ C, ∀a ∈ A and ∀a ∈ A → During the interaction process, the members of G never used panel A .  Table 4 includes a textual description of the five profiles generated in this case study by means of these five rules. We can observe that the application was often used in a synchronous way. Moreover, we can observe that most users never perform some of the actions supported by the voting panel and some users (groups 2 and 4) access this panel but they do not use all its actions. By simply checking the generated weighted automata, evaluators of the system can easily detect that the actions of accept and reject are very seldom used. The users of the third group accept some proposal of other partners to generate new bets in the list of active bets. However, these users use the chat to answer the proposal of bets that they reject. This illustrates how the automaton and the textual information enable designers to understand the natural user behavior and adapt the system to them. Previous work [3] identified situations in which the users only use some of the actions supported by a specific panel. This new methodological contribution enables us to generate also profiles of users who never use some panel of the interactive system. The members of this group can be considered as outliers (Rule 1). The members of this group have used the system in a synchronous context (Rule 3).

/ 1
The members of this group can be considered as outliers (Rule 1). The members of this group have used the system in an asynchronous context (Rule 2). The members of this group have used the system in a synchronous context (Rule 3). During the interaction process, the members of this group never used the voting panel (Rule 5).

/ 9
The members of this group have used the system in an asynchronous context (Rule 2). The members of this group have used the system in a synchronous context (Rule 3). During the interaction process the members of this group never used action reject of the voting panel (Rule 4).

/ 17
The members of this group have used the system in an asynchronous context (Rule 2). The members of this group have used the system in a synchronous context (Rule 3). During the interaction process the members of this group never used the voting panel (Rule 5).

/ 2
The members of this group can be considered as outliers (Rule 1). The members of this group have used the system in a synchronous context (Rule 3). During the interaction process the members of this group never used action reject of the voting panel (Rule 4). During the interaction process the members of this group never used the tutorial panel (Rule 5).
Note that there are some actions that are never used by any of the users (actions a 12 and a 13 of the proposals panel and actions a 5 − a 8 of the chat panel. These are included separately in the final report and are ignored in the description of the user profiles since they are not specific to any of the five identified profiles.

Conclusions
We have proposed a methodology for the automatic generation of user interaction models in interactive systems that combines unsupervised and supervised learning. We record the logs of the users in the system in the form of traces and, after a pre-processing phase consisting of describing actions and durations, we use these trajectories (either directly or by compressing them into feature vectors) as input to (various) agglomerative hierarchical clustering algorithms (unsupervised phase) in order to obtain groups of users by similarity of the empirical distributions of actions, durations and transitions. Once this process is completed, we model each group of users by means of a weighted automaton (supervised phase). This finite state machine is what we call a user profile. User profiles, as intended in this paper, constitute a low-level representation of user patterns in interactive systems. From the low-level representation we derive a profile in text form using rules that automatically generate a text identifying some features being representative of the group of users under consideration. As future work we plan to apply our technique to inverse software engineering, using the weighted automaton to infer a task model of the interactive system under study that may help the software engineer to improve the system's design.

Abbreviations
The following abbreviations are used in this manuscript: CSCW computer-supported collaborative work WCSS within-cluster sum of squares HCI human-computer interaction

Appendix A
The following table describes the semantics of the actions of the interactive groupware system used for the experimentation. The actions supported by the tool to generate groups of users are omitted because they were managed by the evaluators and the users of the system never performed these actions. Table A1. Actions of the groupware system.

Identifier
Panel Description of the Action a 1 Chat Access to this space a 2 Chat Send a "fre" message a 3 Chat Send a "Why..." message a 4 Chat Send a "Because" message a 5 Chat Send a "I think that" message a 6 Chat Send a "I don't agree" message a 7 Chat Send a "The beast team is" message a 8 Chat Send a "My vote will be" message a 9 Proposals Access to this space a 10 Proposals Reject a proposal a 11 Proposals Accept a proposal a 12 Proposals A proposal was accepted by the group a 13 Proposals A proposal was rejected by the group a 14 My Bets Access to this space a 15 Tutorial Access to this space a 16 New Bet See sport events a 24 New Bet Access to this space a 25 New Bet Send a proposal