Data-Driven Interaction Review of an Ed-Tech Application

Smile and Learn is an Ed-Tech company that runs a smart library with more than 100 applications, games and interactive stories, aimed at children aged two to 10 and their families. The platform gathers thousands of data points from the interaction with the system to subsequently offer reports and recommendations. Given the complexity of navigating all the content, the library implements a recommender system. The purpose of this paper is to evaluate two aspects of such a system focused on children: the influence of the order of recommendations on user exploratory behavior, and the impact of the choice of the recommendation algorithm on engagement. The assessment, based on data collected between 15 October 2018 and 1 December 2018, required the analysis of the number of clicks performed on the recommendations depending on their ordering, and an A/B/C test in which two standard recommendation algorithms were compared with a random recommendation that served as baseline. The results suggest a direct connection between the order of the recommendation and the interest raised, and the superiority of recommendations based on popularity over the other alternatives.


Introduction and Background
Smile and Learn (S&L) is a mobile application in the educational technology (Ed-Tech) field, which is aimed at children and displays a large set of educational games. As of December 2018, there is a total of 107 games, which are grouped based on Gardner's theory of multiple intelligences [1]. The games included in S&L cover the following intelligences: visual-spatial, logical-mathematical, verbal-linguistic, naturalistic, artistic, emotional-intrapersonal and group-interpersonal.
The application is intended for parents and educators to first create an account and then several profiles, one per child. When the application is started, children can select a profile (if more than one exists) with their name and customized avatar, and then the main screen is displayed.
In the main menu, whose appearance can be seen in Figure 1, the different games are grouped in so-called "worlds", with each world corresponding to an intelligence: Science (naturalistic), Spatial (visual-spatial), Multiplayer (group-interpersonal), Logic (logical-mathematical), Literacy (verbal-linguistic), Emotions (emotional-intrapersonal) and Arts (artistic). There is one additional world named after the child, which consists of a virtual village where the child must interact with characters in order to improve their wealth, requiring abilities from a combination of all intelligences.
Once a world is chosen, apps are displayed grouped in categories, with one category per line. The children can use the vertical scroll (swipe pattern) to navigate through categories, and the horizontal scroll to navigate between apps within a category. An example screenshot from the Science world is shown in Figure 2.
Because the number of games is still growing month after month, navigation through the application is becoming more tedious: it can take more time and effort to find a certain game within the application. In order to alleviate this issue, we have introduced a navigation ribbon at the bottom of the main menu which allows fast access to some chosen applications. As can be seen at the bottom of Figure 1, this navigation pattern consists of two tabs: "Recommended" and "Most played". The latter displays a list of the five apps most played by the child using the app, which provides fast access to those frequently used apps.
The former tab displays a list of up to seven apps recommended to children based on their usage behavior. The design motivation behind this recommender system is not only to enable fast access to those apps which might be of interest for a child, but also to enhance exploration: letting children discover games which they might not reach if they had to find them by navigating through the app.
For generating recommendations, there are several strategies that can be followed. Currently, two different strategies are implemented in Smile and Learn:
• Popular: this approach recommends the most popular applications among other users. The motivation behind this approach is that it is likely that there are some games that, because of their quality, design or playability, are especially enjoyable for most children, and therefore it can be safe to recommend these games to children, since there is a high chance that they will find them engaging.
• Collaborative filtering: in this approach, the historical records of usage of all children are used in order to generate recommendations. The idea behind collaborative filtering is to find children similar to the one whose recommendations are being generated, so that these recommendations consist of games that have been played by these similar children, but not by the child being targeted.
The specifics of the implementation of these algorithms are discussed in relation to a prior version of the system by Ruiz-Iniesta et al. [2].
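The two strategies described above can be illustrated with a minimal sketch. It is not the implementation used in Smile and Learn (see [2] for that): the function names, the play history represented as a set of games per child, the Jaccard similarity and the number of neighbours are all assumptions made for illustration.

```python
from collections import Counter


def recommend_popular(plays, child, k=3):
    """Rank games by the number of *other* children who played them."""
    counts = Counter()
    for other, games in plays.items():
        if other != child:
            counts.update(games)
    # Sort by popularity (descending), then by name for a deterministic order.
    ranked = sorted(counts, key=lambda g: (-counts[g], g))
    return ranked[:k]


def jaccard(a, b):
    """Similarity between two sets of played games (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_collaborative(plays, child, k=3, neighbours=2):
    """Suggest games played by the most similar children but not by `child`."""
    mine = plays.get(child, set())
    similar = sorted(
        ((jaccard(mine, games), other)
         for other, games in plays.items() if other != child),
        key=lambda pair: (-pair[0], pair[1]),
    )[:neighbours]
    scores = Counter()
    for similarity, other in similar:
        for game in plays[other] - mine:
            scores[game] += similarity
    ranked = sorted((g for g in scores if scores[g] > 0),
                    key=lambda g: (-scores[g], g))
    return ranked[:k]


# Hypothetical play history: child identifier -> set of games played.
history = {
    "ana": {"g1", "g2"},
    "ben": {"g1", "g2", "g3"},
    "cai": {"g1", "g4"},
    "dee": {"g5"},
}
print(recommend_popular(history, "ana", k=2))
print(recommend_collaborative(history, "ana"))
```

Whether the popular strategy should skip games the child has already played is not stated in the text, so this sketch leaves them in.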
In the navigation bar displayed at the bottom of the main menu, recommendations from different recommenders could be combined. It is also worth noting that, while there is a maximum of seven recommended games, on some screen sizes or formats fewer than seven are displayed and the list becomes scrollable. Based on our experience, most screens are able to show five recommendations, with the last two being hidden and accessible only upon scroll (through a swipe pattern or by pressing the side arrows; an example can be seen in Figure 1).
In this paper, we want to design a sound experiment in order to determine which recommender strategy is working best, i.e., which one is translating into a higher engagement of children with the recommended games. We are also interested in getting insights about the interaction patterns followed by children when dealing with the recommendations. With this information, we could improve the system, resulting in a more successful recommender system and an enhanced user experience.
The rest of the document is structured as follows: in Section 2 we review related work on recommender systems and interaction patterns within the context of an Ed-Tech application. In Section 3 we describe the experimental methodology, whose results are reported in Section 4. Finally, concluding remarks are presented in Section 5 along with suggested lines of future work.

Related Work
Applications of technology to educational processes (a field known under the term "Ed-Tech") are not particularly new. However, it has only been in the last decade that the availability and ubiquity of technology devices (such as smartphones or tablets) has reached a point at which these applications can be deployed in large-scale settings.
Additionally, the implementation of data science or machine learning techniques is making it possible to extend Ed-Tech far beyond the classical definition of using technology to deliver content to an audience (e.g., slides, interactive videos, etc.). Instead, novel developments are able to track the progress of individual users, detect strengths and weaknesses, and provide a customized experience to enhance the learning process, among other possibilities.
Many examples of such novel applications can be found in the literature in recent years. For example, Charleer et al. [3] presented a learning analytics dashboard aimed at improving the communication in advising sessions, helping to increase students' motivation. Käser et al. [4] have proposed an approach to student modelling to represent and predict students' knowledge and skills. Also, Al-Saleem et al. [5] and Bydžovská [6] have proposed alternative methods to predict students' academic performance, and Uneno and Miyazawa [7] introduced a system designed to scaffold learning in programming. Finally, it is worth mentioning the efforts to develop virtual worlds for education, reviewed by Nunes et al. [8], and the new possibilities for introducing gamification dynamics in the space, favored by the adoption of information technologies, discussed by Dichev and Dicheva [9].
An important subset of Ed-Tech applications are those aiming at providing a customized experience by suggesting a certain topic or learning process to users. This problem can generally be regarded as a recommendation problem. Recommender systems, the instruments used to tackle the problem, have a three-decade-long history and have been the subject of a sizable amount of research both in general [10,11] and specifically focused on education [12,13].
The range of strategies developed to assign users to items is broad. Some well-established alternatives are the recommendation of the most popular items [14], collaborative filtering [15], content-based approaches [16] and usage context-based similarity [17]. In addition to the basic possibilities, there is room for hybrid strategies, like the one currently implemented at S&L, that combine the output of several canonical strategies to generate their output.
The design and implementation of recommender systems in the education space pose a number of difficulties, as discussed by Tarus and Niu [18], and so does their evaluation [19]. Among these challenges, we can mention the selection of the right algorithm (or set of algorithms), and the design of the interface. Even though the first aspect is not settled yet, there have been valuable efforts to contribute evidence in this regard, like a recent work by Kopeinik et al. [20].
The second aspect is especially complicated when it involves children, as the intuitions of adult designers might not be correct [21,22]. Even though there have been advances in this aspect, like the contribution of Wu et al. on interface design for children [23], there is still a significant amount of work to be done.
If we focus on recommendation for children, even though we could mention some relevant works like [24], the volume is still very scarce. To illustrate this, it is worth mentioning that the first specialized workshop, KidRec, took place in 2017 [25]. As Deldjoo et al. [26] explain, recommender systems have been traditionally focused on adults and, when it comes to children, the field is still in its infancy.

Materials and Methods
In this study we are interested in comparing different recommender strategies that can be used for suggesting potential games of interest to a child. As described earlier, the two alternatives that we are interested in comparing are the popular one, which provides recommendations based on apps commonly played by most children, and collaborative filtering, whose recommendations are based on those games played by similar users, i.e., those who have a similar record of played games.
In order to get a better understanding of which strategy is working best, we will test the performance of both recommender strategies and, additionally, we will compare them against a baseline random recommender, which just recommends random alternatives.
Therefore, an A/B/C test is performed when running the recommender system. For the whole period in which the experiment is running, each child is assigned to one group (either A, B or C) at random, following a uniform distribution. When running the recommender, children receive recommendations coming from a different strategy depending on their group. Rather than generating all the recommendations (a maximum of seven) from the assigned strategy, only the first three recommendations are computed using the recommender corresponding to the child's group, and the remaining recommendations are chosen randomly.
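The assignment and the composition of the ribbon can be sketched as follows. This is only an illustration under stated assumptions: the names `assign_group` and `build_ribbon`, the in-memory dictionary that keeps the assignment stable for a child, and the rule that the random part never repeats a game already in the ribbon are hypothetical and not taken from the deployed system.

```python
import random

GROUPS = ("A", "B", "C")


def assign_group(child_id, assignments, rng):
    """Assign a child to A, B or C uniformly at random, once and for all."""
    if child_id not in assignments:
        assignments[child_id] = rng.choice(GROUPS)
    return assignments[child_id]


def build_ribbon(group_recs, catalogue, rng, from_group=3, max_items=7):
    """First `from_group` slots come from the group's recommender,
    the remaining slots are filled with random games from the catalogue."""
    ribbon = list(group_recs[:from_group])
    remaining = [game for game in catalogue if game not in ribbon]
    n_random = min(max_items - len(ribbon), len(remaining))
    ribbon += rng.sample(remaining, n_random)
    return ribbon


rng = random.Random(2018)           # fixed seed only to make the sketch repeatable
assignments = {}
group = assign_group("child-1", assignments, rng)
catalogue = [f"game-{i}" for i in range(1, 11)]   # hypothetical catalogue
ribbon = build_ribbon(["game-4", "game-7", "game-2", "game-9"], catalogue, rng)
print(group, ribbon)
```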
In all cases, some filters are applied in order to exclude recommendations which might not be suitable for a given child:
• Some games are blacklisted, meaning that they might not have enough quality to be recommended (e.g., they are in a beta stage).
• Games are filtered out if they are not available in the version of the app owned by the user (e.g., a game was introduced in version 4 and the child is using version 3).
• Games are filtered out if they are not designed for the age range of the target child (e.g., a game is aimed at children 4-6 years old, but the child is 3 years old).
It is worth mentioning that these filters are applied in all cases, even in the random recommendation strategy.
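The three filters can be expressed as a single eligibility check applied to every candidate before any strategy ranks it. The sketch below is illustrative only; the `Game` record and its field names are assumptions, not the data model of the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Game:
    name: str
    min_version: int      # first app version that ships the game
    min_age: int          # youngest intended age, in years
    max_age: int          # oldest intended age, in years
    blacklisted: bool = False


def is_eligible(game, app_version, child_age):
    """A game can be recommended only if it passes all three filters."""
    return (
        not game.blacklisted
        and game.min_version <= app_version
        and game.min_age <= child_age <= game.max_age
    )


def filter_candidates(games, app_version, child_age):
    return [g for g in games if is_eligible(g, app_version, child_age)]


# Hypothetical catalogue mirroring the examples given in the text.
games = [
    Game("beta-game", min_version=1, min_age=2, max_age=10, blacklisted=True),
    Game("new-game", min_version=4, min_age=2, max_age=10),
    Game("older-kids", min_version=1, min_age=4, max_age=6),
    Game("fits", min_version=1, min_age=2, max_age=5),
]
print([g.name for g in filter_candidates(games, app_version=3, child_age=3)])
```

Applying the check before ranking means that the random baseline draws from the same pool of eligible games as the other two strategies.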
Regarding the collaborative filtering recommender, it is based on implicit rather than explicit feedback. This decision is motivated by the fact that small children have their own interaction patterns, and might be unable to properly rate or score a game after playing it [26]. For this reason, a system based on explicit feedback could be unreliable. Instead, we base recommendations on implicit patterns of interaction, and in particular on how much and for how long a child plays a game.

Dataset
During the period of the experiment, 2018/10/15-2018/12/01, we recorded the following information, which we later use to evaluate the system and report the results:
• The recommender (popular, collaborative filtering, or random) assigned to each child.
• The recommendations generated, and also which recommender provided each of these recommendations.
• The date and time at which each recommendation is generated.
• The game usage per child, including the times at which they play and the duration for each game.
It is worth noting that, regarding the second item, recommendations might not always be generated using the desired recommender. For example, a child might be assigned the collaborative filtering strategy, but this recommender might not have enough usage information to generate useful recommendations, or all of its recommendations might be filtered out. In that case, random recommendations are provided instead.
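A minimal sketch of this fallback, which also records the recommender that actually produced each item, could look as follows. The function name, the (game, source) pair representation and the all-or-nothing fallback rule are assumptions; the text does not specify how a partially filled list is handled.

```python
import random


def recommend_with_fallback(primary_recs, primary_name, candidates, rng, k=3):
    """Return (game, source) pairs; fall back to random picks when the
    assigned recommender yields nothing that survives the filters."""
    eligible = set(candidates)
    recs = [(game, primary_name) for game in primary_recs if game in eligible][:k]
    if not recs:
        # Sorting makes the sampling reproducible for a given seed.
        picks = rng.sample(sorted(eligible), min(k, len(eligible)))
        recs = [(game, "random") for game in picks]
    return recs


rng = random.Random(1)
candidates = ["g1", "g2", "g3", "g4"]              # hypothetical eligible games
print(recommend_with_fallback(["g9", "g1"], "collaborative", candidates, rng))
print(recommend_with_fallback([], "collaborative", candidates, rng))
```

Storing the source next to each recommendation is what later allows engagement to be attributed to the recommender that really generated it, rather than to the one assigned to the child.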
With this data, we can explore the patterns of engagement of children with the different games depending on whether those games were recommended or not.

Performance metrics
The impact of the position in the ribbon of recommendations will be analyzed using click-through frequencies. That way, we will determine whether the first five (visible) items get more clicks than the last two, and whether the ordering within the visible and hidden ones matters.
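As an illustration of this analysis, the sketch below counts clicks per ribbon position and averages them over the visible and hidden slots. The input format (one 1-based position per click) and the function name are assumptions, and the sample clicks are made up solely to exercise the code.

```python
from collections import Counter


def clicks_by_position(click_positions, n_slots=7, n_visible=5):
    """Return clicks per slot plus the mean over visible and hidden slots."""
    counts = Counter(click_positions)
    per_slot = [counts.get(pos, 0) for pos in range(1, n_slots + 1)]
    visible = per_slot[:n_visible]
    hidden = per_slot[n_visible:]
    mean_visible = sum(visible) / len(visible)
    mean_hidden = sum(hidden) / len(hidden)
    return per_slot, mean_visible, mean_hidden


# Made-up click log: each entry is the ribbon position that was clicked.
per_slot, mean_visible, mean_hidden = clicks_by_position([1, 1, 2, 3, 6, 7, 7])
print(per_slot, mean_visible, mean_hidden)
```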
The assessment of the recommendation algorithms will be made according to two engagement metrics: one based on the number of games and another one based on game time.
The first one, the average number of games per user (ANG), is formally defined as:

$$\mathrm{ANG}_R = \frac{\sum_{i=1}^{\mathrm{NumUsers}_R} \mathrm{Games}_{R,i}}{\mathrm{NumUsers}_R}$$

where $\mathrm{Games}_{R,i}$ is the number of games played by user $i$ on apps recommended by algorithm $R$ (either Collaborative Filtering, Popular or Random), and $\mathrm{NumUsers}_R$ is the total number of users who were recommended the apps by algorithm $R$ and acted on it.
The second aspect of engagement to be measured is the average game time (AGT) of users who acted on the recommendations. The expression used to compute the indicator is:

$$\mathrm{AGT}_R = \frac{\sum_{i=1}^{\mathrm{NumUsers}_R} \mathrm{GameTime}_{R,i}}{\mathrm{NumUsers}_R}$$

where $\mathrm{GameTime}_{R,i}$ is the total time spent by user $i$ playing apps recommended by algorithm $R$ (either Collaborative Filtering, Popular or Random), and $\mathrm{NumUsers}_R$ is the total number of users who were recommended the apps by algorithm $R$ and used them.
We should note that, for the computation of these two metrics, we define a "game" as an event where the user interacts with the application for 10 seconds or more. Given the presence of some outliers, we also filtered out games of more than 3000 seconds and the instances where a game was played more than 60 times by the same user. These accounted for less than 0.5% of the sample.
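The two metrics and the filtering rules can be sketched in a few lines. This is a hedged illustration rather than the analysis code used in the study: the session format (user, game, seconds), the function name, and the choice to drop every session of a user-game pair that exceeds 60 plays are assumptions.

```python
from collections import Counter, defaultdict


def engagement(sessions, min_seconds=10, max_seconds=3000, max_plays=60):
    """Compute (ANG, AGT) for the sessions attributed to one recommender.

    `sessions` holds (user, game, seconds) tuples for plays of recommended apps.
    """
    # Keep only events that count as a "game" and are not duration outliers.
    valid = [(u, g, s) for u, g, s in sessions if min_seconds <= s <= max_seconds]
    # Drop user-game pairs played more often than the allowed maximum.
    plays = Counter((u, g) for u, g, _ in valid)
    valid = [(u, g, s) for u, g, s in valid if plays[(u, g)] <= max_plays]

    games_per_user = Counter()
    time_per_user = defaultdict(float)
    for user, _, seconds in valid:
        games_per_user[user] += 1
        time_per_user[user] += seconds

    num_users = len(games_per_user)   # users who acted on the recommendations
    if num_users == 0:
        return 0.0, 0.0
    ang = sum(games_per_user.values()) / num_users
    agt = sum(time_per_user.values()) / num_users
    return ang, agt


# Made-up sessions used only to exercise the function.
sample = [("u1", "g1", 20), ("u1", "g2", 5), ("u1", "g1", 40),
          ("u2", "g3", 100), ("u2", "g3", 4000)]
print(engagement(sample))
```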

Results and Discussion
As mentioned before, the first part of the study concerns the impact of the position in the recommendation ribbon on click-through. The experimental results on this aspect are reported in Table 1. There, we can see the total number of clicks over the relevant period by position for the apps that were recommended in all the possible slots. It is apparent that the first three positions attracted much more interest than the rest. The difference between the first two and the other five is especially sizable. While it is true that the 4th position received fewer clicks than the ones that follow, other than that, the results are consistent with the existence of a direct relationship between the order and the number of times that users acted on the recommendations.
If we consider visibility, the ribbon only shows five recommendations at a time. We expected that to be a relevant factor, as getting to the last two requires a supplementary effort from the user. Interestingly, even though the average number of clicks on the visible slots was 4830, higher than the 3044 average clicks on the hidden ones, the role of friction as an element that drags click-through down could be questioned. If we consider the effect of position, where apps on the left-hand side gather more interest than those on the right, and we infer the trend line, the recommendations in the 6th and 7th positions get more clicks than expected.
Regarding engagement, the recommender algorithm based on popularity was superior across metrics. As we can see in Table 2, it outperformed the other two both in terms of number of games and accumulated use time per user. Unexpectedly, the proposed collaborative filtering algorithm resulted in a slightly lower mean number of games than the random alternative. The sign of this difference, however, was the opposite for game time.
The statistical significance of the differences reported in Table 2 was assessed according to the following protocol. First, we tested the normality of the distribution of the engagement metrics with the Kolmogorov-Smirnov test with Lilliefors correction. If normality was rejected, we applied Wilcoxon's test. Otherwise, we tested for homoskedasticity using Levene's test and, based on the result, we relied either on Welch's test or on the traditional t-test. The results were analogous for both metrics. The superiority of the Popular algorithm over the other two was significant at 1%. Regarding the comparison of the baseline with the implementation of collaborative filtering, the null hypothesis of equality could not be rejected at the conventional 5% level for either the number of games or the total game time.
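The decision logic of this protocol can be summarized in a small helper that takes the p-values of the preliminary tests and returns the comparison test to use. It is a sketch under assumptions: the function name and the 0.05 threshold for the preliminary tests are hypothetical, and the p-values themselves would have to come from a statistics package (for instance, `lilliefors` in statsmodels and `levene` in SciPy), which is deliberately left out to keep the example dependency-free.

```python
def choose_test(normality_pvalues, levene_pvalue, alpha=0.05):
    """Select the two-sample test following the protocol described above.

    normality_pvalues: p-values of the normality test, one per group compared.
    levene_pvalue: p-value of Levene's test for equality of variances.
    """
    # Normality rejected for any group: use the non-parametric alternative.
    if any(p < alpha for p in normality_pvalues):
        return "wilcoxon"
    # Normal data with unequal variances: Welch's test.
    if levene_pvalue < alpha:
        return "welch"
    # Normal data with equal variances: traditional t-test.
    return "t-test"


# Illustrative p-values only; they are not results from the study.
print(choose_test([0.01, 0.40], levene_pvalue=0.30))
print(choose_test([0.20, 0.40], levene_pvalue=0.01))
print(choose_test([0.20, 0.40], levene_pvalue=0.30))
```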

Summary and Conclusions
In this paper we have described and evaluated the behavior of a recommender system in the scope of an Ed-Tech application. In particular, Smile and Learn is a company that runs a smart library (available in most relevant mobile markets) with over 100 applications, games and tales, aimed at children aged 2-10 and their families. The implemented recommender system serves the purpose of easing navigation through the library and enhancing exploration.
The two aspects of the recommender system under study were the influence of the order of recommendation on user exploratory behavior and the impact of the choice of a recommendation algorithm on engagement. In order to evaluate these aspects, we acquired data of real interaction with the application between 2018/10/15 and 2018/12/01, running an A/B/C test with three different implementations of a recommender system (including a random baseline) and measuring clicks made on recommendations.
The following conclusions are drawn from the analysis of these data. First, there is a direct association between the order in the recommendation ribbon and the number of clicks. Apps on the left gather more interest than those on the right-hand side. The friction introduced by the fact that reaching the last recommendations requires either swiping or pressing the arrows on the side does not seem to have a negative impact, which was unexpected.
A/B/C testing was used to assess two recommendation approaches, one that recommends apps based on popularity and an implementation of collaborative filtering, both compared against a random recommender used as baseline. The experiments offered two main results. The first one is that the popular algorithm beats the other two both in terms of number of games and total time spent playing the recommended contents. The second is that the current implementation of collaborative filtering does not seem to add value, as its performance could not be statistically distinguished from that of the baseline.
Future work could include thorough studies of alternatives to the current implementation of collaborative filtering or the optimization of its parameters; the study of new interface-based possibilities to draw the attention of the user to the recommended applications; or the development of recommendation strategies based on competences, aimed at fostering the development of the user according to predefined preferences established by parents and educators.

Figure 1. Screenshot of the main menu of Smile and Learn, showing the different worlds available in the application, each world corresponding to an intelligence.

Figure 2. Screenshot of a world menu (in this case, the Science world) in Smile and Learn, where apps belonging to the same world (or intelligence) are listed grouped by category.
(A) The popular recommendation strategy is used. (B) The collaborative filtering recommendation strategy is used. (C) The random recommendation strategy is used.

Table 1. Accumulated click-through by position in the recommendation ribbon and mean for the visible and invisible positions for the period 2018/10/15-2018/12/01.