Induced Emotion-Based Music Recommendation through Reinforcement Learning

Abstract: Music is widely used for mood and emotion regulation in our daily life. As a result, many research works on music information retrieval and affective human-computer interaction have been proposed to model the relationships between emotion and music. However, most of these works focus on context-sensitive recommendation that considers the listener's emotional state, and few results have been obtained in studying systems for inducing future emotional states. This paper proposes Moodify, a novel music recommendation system based on reinforcement learning (RL) capable of inducing emotions in the user to support the interaction process in several usage scenarios (e.g., games, movies, smart spaces). Given a target emotional state, and starting from the assumption that an emotional state is entirely determined by a sequence of recently played music tracks, the proposed RL method is designed to learn how to select the list of music pieces that better "match" the target emotional state. Unlike previous works in the literature, the system is conceived to induce an emotional state starting from the current emotion, instead of capturing the current emotion and suggesting songs thought to be suitable for that mood. We have deployed Moodify as a prototype web application, named MoodifyWeb. Finally, we enrolled 40 people to experiment with MoodifyWeb, employing one million music playlists from the Spotify platform. This preliminary evaluation study aimed to analyze MoodifyWeb's effectiveness and overall user satisfaction. The results showed highly rated user satisfaction, system responsiveness, and appropriateness of the recommendations (up to 4.30, 4.45, and 4.75 on a 5-point Likert scale, respectively) and that such recommendations were better than participants expected before using MoodifyWeb (6.45 on a 7-point Likert scale).


Introduction
Music is an important aspect of any human culture, being able to induce a range of intense and complex emotions both in musicians involved in composing pieces and in individuals listening to them. The digital age has brought sizeable changes to the economy and to the industrial and social spheres, with interesting advances and transformations also in the music field. With regard to the music streaming market, its size was valued at USD 29.45 billion in 2021 and is expected to expand at a compound annual growth rate (CAGR) of 14.7% from 2022 to 2030 [1]. The penetration of music streaming platforms, as well as the ubiquity of smartphones, will boost the music market growth accordingly. Furthermore, platforms that offer streaming services are gaining popularity, providing features such as song recommendations and automatic playlist personalization that support individuals by suggesting similar (and preferred) pieces. On the other hand, emotion/mood has become a fundamental criterion used by digital technologies in predicting social behaviors or conditioning people in their social interactions and work activities.
In light of this, music systems that regulate mood and emotions in our daily life are arousing particular interest. Consequently, a great deal of research has been undertaken in the affective computing community to model the relationship between music and emotion [2][3][4][5]. More generally, applications of affective computing studies can be found in education, health care, entertainment, ambient intelligence, multimedia retrieval, and music retrieval and generation. As for the specific musical context, most of these works consist of context-sensitive recommendation tools that consider the listener's emotional state. Unfortunately, few results have been obtained in the study of music systems for the induction of future emotional states, i.e., methods for influencing the emotional state of listeners and adapting the interaction with technology to their affective state. Unlike games or movies [6,7], where the study of the induction of emotional states has recently obtained interesting results, especially in educational or commercial contexts, the potential of music as a means of inducing emotions still leaves significant room for study.
Inductive systems use affective contents for the induction of emotional states, assuming that the emotions conveyed by the affective content (perceived emotions) are always consistent with the emotions brought to mind in users (induced emotions), as in games or movies [2]. There are at least two main perspectives in such systems: the user and system perspectives. Users perceive and interpret the content (perceived emotions). Systems usually provide emotional annotations that describe which emotions are expected from the users during an interaction step (intended emotions). In general, perceived and induced emotions are not considered separately in studies on affective content. However, some studies on "music emotions" have shown that in music, traditionally regarded as an art form that can make people produce emotional responses or induce their emotions naturally [8,9], the emotions perceived are not always consistent with the emotions elicited in listeners [10]. This is particularly evident in most music recommendation systems. Recommender systems have been widely studied in recent years [11,12], but they do not always lead to the best possible designs for affective recommendation systems.
Several studies showed that emotions could play a significant role in designing intelligent music recommendation systems, and while most of them focused on "recognizing" emotions induced by music [13][14][15][16][17][18], nearly no attempts have been made to model musical emotions and their changes over time in terms of a target "emotion to induce". In this paper, we explore this direction, and we focus on the problem of defining an intelligent music recommendation system that, given a future target emotional state to induce, and starting from the assumption that an emotional state is determined entirely by a sequence of music pieces recently listened to, selects the list of music pieces that better "match" it.
In order to "recommend" music for inducing a target emotional state, we exploit reinforcement learning (RL) techniques that have proven effective in music recommendation systems. The idea is to train an intelligent agent capable of recommending songs such that the user's mood changes from a given current emotional state to a desired target emotional state, based on the user's musical preferences (see Figure 1 for an overview of the proposal). The agent learns the user's preferences and the best trajectories for inducing the target emotion through a feedback-based mechanism. However, to cope with sparse- and deceptive-reward problems, we propose a novel method based on Go-Explore [19]. Go-Explore has proven particularly effective on hard-exploration problems, a category to which many real-world issues belong. We will show that this also applies to affective computing in music recommendation.
The main contributions of this paper can be summarized as follows.

•
We propose Moodify, a novel music recommendation system based on Go-Explore, which takes into account the listener's emotional state to induce a future target emotion; the main novelty is that it adopts a "look-forward recommendation", i.e., it recommends music intended to induce (in the future) a specific target emotion. Previous works in the literature have proposed methods that, based on the user's current mood, recommend music or artists to listen to, for example, by computing similarities between artists' and users' moods.

•
To analyze its effectiveness and overall user satisfaction, we have involved 40 people in testing Moodify, with one million music playlists from the Spotify platform; the results obtained show that the proposed system can bring both significant overall user satisfaction and high performance. To the best of our knowledge, this is one of the few proposals of a system that has undergone an evaluation phase of this kind.

•
The proposed method has been developed as a Web application, namely MoodifyWeb, which exploits the Spotify API for developers and JavaScript. To the best of our knowledge, this is one of the first proposals deployed as software for end-users.
The remaining parts of the paper are structured as follows. First, in Section 2, we offer an overview of the most used recommendation approaches in the literature, and we discuss some relevant works that inspired the proposed one. With respect to these points, we position our proposal, highlighting similarities and differences with previous works. Then, in Section 3, we provide the preliminary knowledge needed to understand the methods and techniques used in Moodify. In Section 4, we formalize the music recommendation system, highlighting its relation with Go-Explore. Next, Section 5 offers details on the implemented web application that revolves around the proposed music recommendation method. Section 6 is devoted to discussing the results obtained when surveying users about the usability of and satisfaction with MoodifyWeb. Lastly, in Section 7, we provide final remarks with envisioned future directions of this research.

(Figure 1. Overview of the proposal: the system asks "How do you feel?" and "How do you wish to feel?".)

Related Work
In the literature, there are many solutions for recommending music suited to the listeners' environment and, in general, in all areas that refer to IT for "well-being", e.g., gyms [20] or home settings [21]. From a musical point of view, the music generation takes place either by selecting existing music from platforms such as Spotify or YouTube or by using sophisticated generative music composition techniques [22,23]. From a technological point of view, most such systems combine Artificial Intelligence (AI) and Internet of Things (IoT) techniques to ensure intelligent musical choices that satisfy listeners [21]. Therefore, given the vastness of the topic, in this section, we limit ourselves to an analysis of the works most closely related to the one proposed in this paper, essentially classifying them into four main categories: collaborative filtering (Section 2.1), content-based filtering (Section 2.2), context-based filtering (Section 2.3), and emotion-based filtering systems (Section 2.4).

Collaborative Filtering
Collaborative filtering generates automatic predictions about users' interests by collecting preferences from a large user base. This approach adopts user ratings to recommend songs. Such systems are built on the following assumption: users who rated songs similarly in the past will continue to rate them similarly in the future [24]. Usually, clustering algorithms are employed to deliver recommendations. Ratings can be of two kinds: explicit or implicit. Examples of explicit ratings are the star-based systems that e-commerce sites use: the user has to express a rating, often based on a five-star score where the higher, the better (e.g., Trustpilot (Available online: https://it.trustpilot.com, accessed on 18 October 2022)). These ratings are explicitly provided by the users. Instead, implicit ratings can be collected by observing the user's behaviors. For instance, play counts can be employed for implicit rating: a song played thousands of times gets a higher implicit rating than others that have been listened to a dozen times. The biggest drawback of these kinds of systems is that they offer poor recommendations in the early stages. Especially for songs with very few ratings, recommendations are not very reliable [25]. This is a well-known issue in the literature, named the cold-start problem. When a new user joins the system, the recommender cannot offer effective suggestions, as the user has never interacted before and hence has not rated anything yet. Another challenge of collaborative filtering is related to human effort. In general, users are not willing to rate every item on a system that requires a lot of effort and attention to generate recommendations. Among the closest articles in the literature, there is the proposal by [26], in which association rules and music features were added to a collaborative filtering mechanism. The system considers users' preferences for different song features and uses the similarity of interests among different users to suggest music. The system has been implemented in a Web application, as we did, and the author also performed an experiment with 20 real users. The main differences are that we do not use a collaborative filtering method, as the suggestion is tailored only to the specific user; we employ an RL-based method to recommend music rather than a rule-based algorithm; and, lastly, in [26] neither the deployed Web application nor the user evaluation is described thoroughly, the latter being only a preliminary one that does not involve, for example, a confirmation-of-expectation test. In addition, the results of that evaluation are not clear.
Moodify employs an explicit rating "encapsulated" in an RL approach tailored to one user only. Every user has his/her own agent tailored, through usage, to his/her needs. Such a rating mechanism is used to define a reward function, i.e., by asking the user to evaluate how much the emotion felt at the end of each listening session is similar to the target emotion.
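A minimal sketch of how such a feedback-derived reward might be computed; the 5-point scale and the linear mapping to [-1, 1] are illustrative assumptions, not the paper's exact formulation:

```python
def reward_from_feedback(similarity_rating: int, max_rating: int = 5) -> float:
    """Map an explicit user rating -- 'how similar is the emotion you feel
    now to the target emotion?' -- onto a reward in [-1, 1].

    A mid-scale rating yields a zero reward; the extremes yield -1 and +1.
    The 5-point scale and the linear mapping are illustrative assumptions.
    """
    if not 1 <= similarity_rating <= max_rating:
        raise ValueError("rating out of range")
    return 2.0 * (similarity_rating - 1) / (max_rating - 1) - 1.0
```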

Content-Based Filtering
In the content-based filtering approach, music is recommended by exploiting the system's comparison between the items' content and a user's profile. Each item's content is represented as a set of tags. In the case of textual documents, the tags can describe the words within a document. In the case of music, the tag (in its simplest form) can be related to the genre. Several issues must be considered when implementing such a category of systems:

•

tags can either be assigned automatically or manually;

•

the tags must be generated or assigned such that both the user's profile and the items can be easily matched and compared to derive a similarity measure;

•

a learning algorithm must be chosen that learns and classifies the user's profile based on played songs (i.e., seen items) and offers recommendations based on it.
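The tag-matching step above is typically implemented as a similarity measure between tag vectors; the sketch below uses cosine similarity over bags of tags (an illustrative choice, not tied to any specific system described here):

```python
from collections import Counter
from math import sqrt

def cosine_similarity(profile_tags, item_tags):
    """Cosine similarity between a user profile and an item, each
    represented as a bag of tags (e.g., genres). Returns a value in [0, 1]:
    1.0 for identical tag distributions, 0.0 for disjoint ones."""
    p, q = Counter(profile_tags), Counter(item_tags)
    dot = sum(p[t] * q[t] for t in p)
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0
```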
To recommend music, the song's features, such as loudness, tempo, and danceability, are analyzed. Among the widely used methods to perform content-based filtering and measure similarities between the user's profile and songs are (i) clustering [27] and (ii) expectation-maximization with Monte Carlo sampling. These techniques can recommend music tracks even with very little data; thus, they solve the cold-start problem (seen in Section 2.1). The major challenge of these approaches lies in the appropriateness of the item model [28]. Another major drawback is that, since tags try to describe the songs' macro-characteristics, these approaches fail to capture crucial musical differences between songs that are similar in terms of tags.
In [29], the authors introduce MoodPlay, a system for recommending music artists based on the general mood of the artists and the self-reported mood of users. The authors proposed the method and the visual (graph-based) interface of the system. In addition, they performed an experiment with more than 200 final users. From these experiments, it emerged that mood plays a crucial role in the recommendation. The main differences from this work are that (i) we recommend songs, not artists; (ii) we base our recommendations only on the starting emotion and the target emotion; thus, we do not consider artists' general mood; (iii) our system is designed to induce a particular emotion, not to recommend a specific artist based on "the similarity" between certain user and artist moods; and (iv) the ultimate goal of [29] was more related to understanding how users perceive recommendations through visual interfaces than to generating an affective recommender system.
With respect to these kinds of systems, Moodify does not use pre-defined item content to compare with the user profile. It "dynamically" builds an intelligent agent capable of selecting the music most suited to the user's target emotional state simply by observing the choices made and the ratings assigned by the user during a training phase.

Context-Based Filtering
The context-based filtering approach takes advantage of the public perception of a music track in its suggestions. It exploits social media such as Facebook and Twitter and video platforms such as YouTube to collect information and derive insights about the public opinion of songs. Then, it recommends such music tracks accordingly to the users. This approach considers the users' listening history when collecting user data; next, it recommends similar songs based on the engagement the songs have generated on social media. The context-based technique can build a "For You" section for the user through intelligent exploitation of user preferences (i.e., the listening history) and the social media engagement of different music tracks. Another technique in this category of filtering uses the user's location to suggest appropriate music tracks. The basic idea is that listeners in the same place may like similar music, and the system suggests music tracks under this assumption. The literature offers insights into the performance of this model: it can perform only as well as the amount of social information collected allows [30], and it needs to integrate various sources and exploit a joint analysis of a massive data load to ensure good performance.
A different kind of context-based technique exploits data captured from the users, for example, from their activities, which are treated as context. In [31], the authors propose a smartphone-based mobile system to recognize human activities and recommend music accordingly. In the proposed method, a deep recurrent neural network is applied to obtain a high level of activity recognition accuracy from accelerometer signals on the smartphone. Music recommendation is performed using the relationship between recognized human activities and the music files that reflect user preference models, in order to achieve high user satisfaction. The results of comprehensive experiments with real data confirm the accuracy of the proposed activity-aware music recommendation framework. In this case, however, the authors have not developed the system as an application for end-users, and they have not evaluated their method with listeners. Conversely, in the present work, we provide insights from end-users on the deployed MoodifyWeb app. Similarly, in [21], the author proposed a framework based on deep learning and IoT architectures to build a music recommendation system, but did not provide any software or evaluation with listeners. Both the aforementioned works revolve around the recognition of emotion through different devices and the recommendation of a suitable song. Differently, with Moodify, we aim to induce emotion through a series of songs.
With respect to context-based filtering, our solution neither builds a listening history nor collects information to be used for the recommendation. Instead, the listening history is implicitly employed in the training phase to build the agent and the reward of our method. The only listening information exploited concerns the Spotify audio features of the songs listened to during the training sessions.

Emotion-Based Filtering
As explained above, music and human emotions are closely intertwined, so there is a recommendation approach that considers human emotions, namely emotion-based filtering. Different audio features of the music tracks are used to understand the emotions that they may trigger or induce. Then, music streaming sites build playlists based on human emotions and moods, tailored to a feeling that a user might experience while listening to those songs. In this field, the research on affective computing has produced a series of interesting solutions (see [32] for a recent survey on the topic related to music). We have identified various works [33][34][35][36][37][38][39][40][41][42]. Some studies identify emotions through facial expressions. Others analyzed EEG [40], physiological, and video signals. These works show that musical recommendation is generally carried out by combining physiological signals, heart and respiratory rates, and facial expressions, and, in general, AI methods (generally deep learning techniques) are used to analyze such information. Among the works on this kind of filtering, we found [43], where the authors propose an emotion-based music recommendation framework that learns the emotion of a user from the signals obtained via wearable sensors. In particular, a user's emotion is classified by a wearable computing device integrated with galvanic skin response and photoplethysmography sensors. The experimental results were obtained from 32 subjects' data. The authors evaluated several machine learning methods, such as decision trees, support vector machines, and k-nearest neighbors. The results of experiments on real data confirmed the accuracy of the proposed recommender system. With respect to [43], we deploy an RL-based recommender system for music to induce emotions in a prototype Web application, and we perform a real-world experiment with end-users to get their perceptions about Moodify.
Moodify belongs to this class. However, some novelties need to be highlighted. The first concerns the equipment needed for the recommendation. The solutions above require EEG or ECG, facial expression, or physiological information to recommend adequate songs. However, the devices needed to capture those traits for the mood analysis are not common and are quite expensive in some cases. Our idea is that Moodify can recommend music without requiring further devices or equipment. Still, Moodify can be extended with appropriate modules to consider traits such as facial expressions and EEG signals for recognizing the mood while in use. Furthermore, Moodify adopts a "look-forward recommendation", i.e., it recommends music with the aim of inducing (in the future) a specific target emotion. All the methods described, instead, adopt a "look-back recommendation", i.e., they recommend music only by using previously collected or observed information.

Summarizing Literature's Proposals
In this section, we summarize the proposals available in the literature, and we list the similarities and differences with ours. Such information is reported in Table 1, where we characterize the papers based on: (i) the type of approach (collaborative, emotion, etc.); (ii) the idea behind the proposal; (iii) whether a method is presented; (iv) whether the software is presented/available; (v) whether a user evaluation/study has been carried out. The idea behind our proposal is to induce emotion by creating a trajectory of music songs to listen to, so as to reach an indicated target emotion given a starting emotion.

Background
This section provides some basic notions necessary to understand the proposed system. We first describe the model of emotions used in this work (Section 3.1). Next, we detail the audio features provided by Spotify (Section 3.2) and the RL-based method used to define the proposed music recommendation system (Sections 3.3 and 3.4).

Models of Emotional States
Emotions are biologically based reactions essential in determining behavior [44]. Among the several models of emotions proposed in the literature, one of the most used is the circumplex model defined by Russell [45]. Such a model organizes the emotional states in terms of valence and arousal. The result is a two-dimensional space, where a pleasant-unpleasant (valence) value is represented by the horizontal axis and high-low arousal is represented by the vertical axis (see Figure 2). As proposed in [6], in this work, we have used such a model by considering emotional states organized in the following groups: pleasant-high (excited, amused, happy), pleasant-low (glad, relaxed, calm), unpleasant-high (frustrated, angry, tense), and unpleasant-low (tired, bored, depressed). We remark that in this work, we are not interested in the recognition of emotional states. As we will see in Section 4, to build the Go-Explore-based model used by the proposed recommendation system, we have used the "user feedback" regarding the emotions induced by the musical pieces used during the training phase of the model itself.
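As an illustration, the grouping above can be expressed as a simple mapping from a (valence, arousal) point of the circumplex to one of the four quadrants; the [-1, 1] normalization of the axes is our own assumption for this sketch:

```python
def emotion_quadrant(valence: float, arousal: float) -> str:
    """Map a (valence, arousal) point of Russell's circumplex model onto
    one of the four groups used in this work. Both axes are assumed to be
    normalized to [-1, 1], with 0 as the neutral midpoint."""
    if valence >= 0:
        return "pleasant-high" if arousal >= 0 else "pleasant-low"
    return "unpleasant-high" if arousal >= 0 else "unpleasant-low"
```

For instance, a point with high valence and high arousal (e.g., the region of "happy" or "excited") falls in the pleasant-high quadrant.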

Spotify
Spotify Technology S.A. (Available online: https://www.spotify.com/it/, accessed on 18 October 2022) is a media-services provider whose core business is an audio streaming platform that provides access to over 50 million music tracks (Available online: https://newsroom.spotify.com/company-info/, accessed on 18 October 2022). The platform exposes a simple API that allows developers to interface with its music repository, in which tracks are organized through specific features (Table 2):

•

Energy (e), [0, 1]: "Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy."

•

Instrumentalness, [0, 1]: "Predicts whether a track contains no vocals. "Ooh" and "aah" sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly 'vocal.' The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0."

•

Liveness, [0, 1]: "Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides a strong likelihood that the track is live."

•

Loudness (lou), [−60, 0] dB: "The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing the relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude)."

•

Speechiness (s), [0, 1]: "Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g., talk show, audiobook, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks."

•

Tempo, [30, 240] BPM: "The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, the tempo is the speed or pace of a given piece and derives directly from the average beat duration."

•

Valence (v), [0, 1]: "A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g., happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g., sad, depressed, angry)."

The Spotify API allows developers to interact with the repository in different ways, and it organizes the possible calls into several groups (Available online: https://developer.spotify.com/documentation/web-api/reference/, accessed on 18 October 2022).
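To illustrate how these features can be consumed, the hypothetical helper below interprets an audio-features record using the thresholds quoted above; the dictionary keys mirror the field names of the Web API's audio-features JSON, while the helper itself and the 0.5 valence midpoint are our own illustrative assumptions:

```python
def describe_track(features: dict) -> list[str]:
    """Produce human-readable notes from a Spotify audio-features record,
    using the thresholds quoted in Table 2 (speechiness bands at 0.33/0.66,
    instrumentalness > 0.5, liveness > 0.8). The 0.5 valence midpoint for
    'positive' vs. 'negative' is an illustrative assumption."""
    notes = []
    s = features.get("speechiness", 0.0)
    if s > 0.66:
        notes.append("probably entirely spoken words")
    elif s > 0.33:
        notes.append("may mix music and speech (e.g., rap)")
    if features.get("instrumentalness", 0.0) > 0.5:
        notes.append("likely instrumental")
    if features.get("liveness", 0.0) > 0.8:
        notes.append("likely performed live")
    notes.append("positive-sounding" if features.get("valence", 0.5) >= 0.5
                 else "negative-sounding")
    return notes
```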

Reinforcement Learning Notes
In real-world scenarios, individuals learn to make decisions based on their experience and interaction with the external environment. Such a learning process is related to the so-called law of effect: responses that produce a satisfying effect in a particular situation become more likely to occur again in that situation, and responses that produce a discomforting effect become less likely to occur again in that situation.
The first studies of this phenomenon are due to Skinner, who carried out experiments aimed at observing the behavior of individuals inside an operant conditioning chamber, consisting of an environment in which only some operations/actions (generally at most two) are possible and in which the choice of which operation to perform depends on punishment or reinforcement. These studies have given rise to RL [46], a sub-field of machine learning which focuses on "goal-directed learning from interaction". RL faces a different problem than supervised and unsupervised learning, that is, to consider an agent that acts within an environment and decides which actions to perform on the basis of rewards assigned by the environment itself. Differently, in supervised learning, the agent learns "how to map" input data (samples of the problem) to outputs, usually classes. To achieve this goal, during a learning process, the agent learns from training data and the labels expected as outputs for such data. In unsupervised learning, instead, the agent is only provided with unlabeled, unstructured input samples, in which it seeks to discover hidden structures and patterns.

The Learning Model
In RL, the agent interacts with the environment by choosing, from time to time, which actions to take in order to achieve a goal. Each agent action changes the environment's state and affects future choices within the environment itself. In order to monitor the unpredictable effects of the actions, the agent takes into consideration some crucial elements: policy, reward, and value function. The policy is a mapping of the environment's states into agent actions and essentially indicates which action is preferable to perform in correspondence with a particular state. The reward indicates how desirable it is, in the "immediate" term, for the agent to be in a specific state. In this sense, it can be intended as the "short-term" goal of the agent. The entire process is divided into a succession of actions taken by the agent over time, each corresponding to a change in the environment's state and a reward to the agent based on the action taken. The agent's main goal is to "maximize" the reward over time. The value function represents the "long-term" goal for the agent. Given a state, the corresponding value function predicts the rewards that can be obtained from it, i.e., the total amount of reward that the agent will accumulate starting from it.
The objective of an RL algorithm is to build a policy and value function that the agent will use to maximize the reward.
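The loop just described — act, observe a state change and a reward, improve the policy — can be made concrete with tabular Q-learning on a toy chain environment (a generic textbook sketch, unrelated to the music setting of this paper):

```python
import random

def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning on a toy chain: states 0..n-1, actions move
    left (0) or right (1), reward 1 only on entering the rightmost state.
    Returns the learned Q-table Q[s][a]."""
    random.seed(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy action selection (explore vs. exploit)
            a = random.randrange(2) if random.random() < eps \
                else max((0, 1), key=lambda x: q[s][x])
            s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Q-learning update: move Q[s][a] toward r + gamma * max_a' Q[s'][a']
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

q = q_learning_chain()
# After training, in every non-terminal state the learned policy
# (argmax over Q) prefers moving right, toward the rewarding state.
assert all(q[s][1] > q[s][0] for s in range(4))
```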

Markov Decision Processes
Markov Decision Processes (MDPs) can be used to provide a mathematical representation of the model described above, since they are generally used for describing "decision making" contexts in which the decision maker affects the result of the decisions. We focus on environments in which the number of actions and states is finite ("finite" MDPs).
Formally, an MDP is a tuple (S, A, P, R), where (i) S is the set of states, (ii) A is the set of actions that the agent can undertake, (iii) P gives the probability that an action a taken in some state s will result in the state s', i.e., P_a(s, s') = Pr(S_{t+1} = s' | S_t = s, A_t = a), and (iv) R_a(s, s') is the expected reward received after action a leads the environment from state s to state s'.
An MDP searches for the "best policy" for the agent. A policy function π maps a pair (s, a), where s ∈ S is a state and a ∈ A(s) is an action, to the probability π(a|s) of undertaking a when in s. π is used for estimating the expected reward V^π(s) when starting in s:

V^π(s) = E_π[ Σ_{k=0}^∞ γ^k R_{t+k+1} | S_t = s ],

where γ ∈ [0, 1) is a discount factor. We say that V^π(s) is the state-value function for π. However, for a similar function based on the agent's actions, it is necessary to condition also on the choice of the action a under the policy π. So, we define the expected reward starting from s and taking the action a following π:

Q^π(s, a) = E_π[ Σ_{k=0}^∞ γ^k R_{t+k+1} | S_t = s, A_t = a ].

The optimal Q-value function q* gives us the maximum reward obtainable from a given state-action pair with any policy π. The best policy π*, therefore, is to take the best action, as defined by q*, in each step.
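As a concrete (toy) instance of these definitions, the following sketch computes the optimal state values of a deterministic finite MDP via the Bellman optimality backup (a textbook illustration, not the paper's own algorithm):

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Exact value iteration for a finite MDP given as:
    P[s][a] -> next state (deterministic, for brevity),
    R[s][a] -> immediate reward.
    Iterates V(s) <- max_a [ R(s,a) + gamma * V(P(s,a)) ] to a fixed point
    and returns the optimal state values V*."""
    V = [0.0] * len(P)
    while True:
        V2 = [max(R[s][a] + gamma * V[P[s][a]] for a in range(len(P[s])))
              for s in range(len(P))]
        if max(abs(x - y) for x, y in zip(V, V2)) < tol:
            return V2
        V = V2

# Two states: from 0 one can stay (reward 0) or move to 1 (reward 1);
# state 1 loops on itself with reward 1, so V*(1) = 1/(1 - gamma) = 10.
V = value_iteration(P=[[0, 1], [1]], R=[[0.0, 1.0], [1.0]])
```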

Go-Explore
RL has made impressive progress in recent years, showing remarkable performance in games such as Go [47,48]. However, these results hide some of the more difficult challenges not addressed when trying to scale RL to really complex situations, such as "hard-exploration" problems: (i) sparse-reward problems, i.e., a large number of actions is necessary to obtain rewards; (ii) deceptive-reward problems, i.e., the reward function provides misleading feedback, which can lead to getting stuck in local optima.
Classic RL algorithms perform poorly on these kinds of problems, and several strategies have been proposed to overcome this issue. In this paper, we adopted the metaphor proposed by Go-Explore [19]. It works in two phases: (i) first find out how to solve a brittle (e.g., deterministic) version of the problem, and then (ii) robustify the model to be reliable even in the presence of unforeseen situations and of stochasticity in general.
Usually, the first phase focuses on the exploration of poorly visited states. To this aim, it builds an archive as follows: (1) save in the archive all interesting states visited so far, together with the trajectories to reach such states, and (2) for each state in the archive, Go (without exploration) back to such a state, and Explore from it for new interesting states. The second phase of Go-Explore "robustifies" high-performing trajectories from the archive in order to deal with the stochastic dynamics of the environment. Robustification is achieved via imitation learning [49-52], i.e., "learning how to solve a task from demonstrations". In Go-Explore, such demonstrations are produced automatically by the first phase.
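The archive loop of the first phase can be summarized by the following minimal sketch. The toy environment and the uniform cell selection are simplifying assumptions (the original algorithm weights cells by visit statistics, as discussed later for Moodify):

```python
import random

def go_explore_phase1(initial_state, step_fn, n_iterations=100, seed=0):
    """Minimal sketch of Go-Explore's first phase: maintain an archive
    mapping each discovered cell to the best trajectory found so far."""
    rng = random.Random(seed)
    archive = {initial_state: {"trajectory": [], "score": 0.0}}
    for _ in range(n_iterations):
        # Go: pick an archived cell (uniform here for simplicity).
        cell = rng.choice(list(archive))
        entry = archive[cell]
        # Explore: take one random action from that cell.
        new_cell, reward, action = step_fn(cell, rng)
        score = entry["score"] + reward
        known = archive.get(new_cell)
        # Update the archive if the cell is new or reached with a better score.
        if known is None or score > known["score"]:
            archive[new_cell] = {"trajectory": entry["trajectory"] + [action],
                                 "score": score}
    return archive

# Toy environment: states are integers; actions move +1 or -1, reward = new state.
def toy_step(state, rng):
    action = rng.choice([-1, 1])
    new_state = state + action
    return new_state, float(new_state), action

archive = go_explore_phase1(0, toy_step, n_iterations=200)
```

The trajectories stored in the archive are exactly the demonstrations that the second phase robustifies via imitation learning.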

Music Recommendation Based on Go-Explore
In this section, we propose Moodify, a music recommendation system based on Go-Explore and able to induce target emotions in the user. Table 3 summarizes the main project decisions made to exploit the Go-Explore paradigm in defining the proposed system.
Table 3. From Go-Explore to Moodify: project decisions.

Go-Explore | Moodify
The states correspond to the game's cells (e.g., pixels) | The states correspond to the emotions
The agent begins by exploring the environment without any prior knowledge about it | The agent learns during the training phase based on user feedback, without having to make any decisions
When the agent reaches a new state rewarding it with points, the algorithm stores such a state | When the agent reaches a new state rewarding it with points, the algorithm memorizes the musical features corresponding to that state
The agent continues to explore from a stored state, thus being able to progress to new states over time | The agent continues to listen from memorized music, thus being able to progress towards new emotions over time
Each time the game character dies, a negative reward is assigned to that cell | Whenever the user gives feedback that does not correspond to the desired emotion, the agent receives a negative reward

In the following, we first dwell on definitions and preliminary information (Section 4.1). Next, we formalize the problem (Section 4.2), and then we show the methodology adopted (Section 4.3), with details about the two main steps of Moodify, i.e., listen until solved (Section 4.3.2), with the related cell and state representation, selection, exploration and update, and emotion robustification (Section 4.3.3). Lastly, we dwell on the limitations of the method (Section 4.3.4).

Preliminaries and Definitions
During a preliminary analysis, we observed that, given an initial emotional state, the induction of a target emotional state through music recommendation practically never occurs through listening to a single piece of music. In fact, it is usually necessary to listen to different songs, passing through intermediate emotions. Formally, let E_s be the start emotional state and E_t be the target emotional state of a user, and let m_1, ..., m_N be the sequence of musical songs that induces E_t in the user starting from E_s. We observed that N > 1. This interesting observation justifies using an RL-based approach to define an emotion-based music recommendation system able to induce emotions.
We define m_1, ..., m_N as a musical trajectory from E_s to E_t for the user, and E_1, ..., E_N the corresponding emotional trajectory from E_s to E_t for the user, where E_k is the intermediate emotional state induced in the user after listening to m_1, ..., m_k.
Observe that each emotional state E corresponds to a specific point in the two-dimensional space of the circumplex model shown in Figure 2. So, given two emotional states E_s and E_t, we say that the distance between E_s and E_t, denoted by d(E_s, E_t), is the Euclidean distance between the point in the circumplex model corresponding to E_s and the point in the circumplex model corresponding to E_t.
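Under this definition, d(E_s, E_t) is just the Euclidean distance between the two points in the valence-arousal plane, e.g. (the coordinates below are illustrative assumptions):

```python
import math

def emotion_distance(e1, e2):
    """Euclidean distance d(E_s, E_t) between two emotional states,
    each given as (x, y) coordinates in the circumplex model."""
    return math.hypot(e1[0] - e2[0], e1[1] - e2[1])

# e.g., distance between a pleasant-high state and an unpleasant-low state.
d = emotion_distance((0.8, 0.6), (-0.4, -0.3))
```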

Problem Description
We emphasize that, given the start emotional state E_s and the target emotional state E_t of a user, Moodify will be trained to propose the "best trajectory" from E_s to E_t. In our context, the concept of "best trajectory" is related to two aspects. On the one hand, we are interested in finding the musical trajectory that allows the listener to reach an emotional state that is "as close as possible" to the chosen target emotional state (appropriateness of the recommendation). On the other hand, we are interested in reaching the emotional state in the shortest possible time, and therefore, in waiting as little as possible (responsiveness of the recommendation).
Thus, the problem faced at each request for a music recommendation can be formalized as follows: given a start emotional state E_s and a target emotional state E_t, the goal is to find the musical trajectory m_1, ..., m_N which, starting from E_s, (i) minimizes the distance between E_t and E'_t, where E'_t is the emotional state actually reached after listening to m_1, ..., m_N, and (ii) minimizes the length N of the musical trajectory m_1, ..., m_N. In the following, we propose a Go-Explore-based system to face such a problem. The idea is to recommend music according to the best policy built by such a system. As we will see in Section 6, a preliminary evaluation study has been conducted to evaluate this approach.
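One possible scalarization of the two objectives (closeness to the target emotion, trajectory length) is sketched below. The weights are illustrative assumptions, not values used by Moodify:

```python
import math

def trajectory_cost(reached_state, target_state, trajectory_length,
                    w_dist=1.0, w_len=0.1):
    """Scalarized objective for the two goals in the problem statement:
    minimize the distance d(E_t, E'_t) and the trajectory length N.
    The weights w_dist and w_len are illustrative assumptions."""
    dist = math.hypot(reached_state[0] - target_state[0],
                      reached_state[1] - target_state[1])
    return w_dist * dist + w_len * trajectory_length

# A trajectory that lands closer to the target costs less, all else being equal.
closer = trajectory_cost((0.5, 0.5), (0.5, 0.5), trajectory_length=3)
farther = trajectory_cost((0.0, 0.0), (0.5, 0.5), trajectory_length=3)
```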

The Methodology
This section provides details about the methodology followed to build Moodify. First, we will reformulate our decision-making context in terms of an MDP. Then, we will describe the main steps of the proposed Go-Explore-based approach.

Induced Emotion-Based Music Recommendation as MDP
In this section, we define the notions of state, action, reward, and transition model.

• state: one state corresponds to one specific emotion defined in the circumplex model (Section 3.1), represented as the pair (x, y), where x and y are the coordinates in the two-dimensional plane; at each request for a recommendation, the user starts with a start state, chooses a target state, and, after listening to each song, reaches a new state.
• action: the action space is the set of possible musical songs; given a current state E_s and its coordinates x and y in the circumplex model, our model recommends a song and stores (a, d, e, in, liv, lou, s, t, v), which are the acousticness, danceability, energy, instrumentalness, liveness, loudness, speechiness, tempo, and valence (see the Spotify audio features in Table 2); therefore, a recommendation is a tuple (E_s, a, d, e, in, liv, lou, s, t, v).
• reward: as also detailed in the following, in our approach we adopt a "feedback-based reward", i.e., after each listening, the user assigns a score (an integer in [0, 10]) which represents the perception of the user on "how much the emotion perceived after the listening is similar to the chosen target emotion".
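These notions can be sketched as plain data structures. This is an illustrative sketch rather than Moodify's actual implementation; the field names spell out the abbreviations a, d, e, in, liv, lou, s, t, v, since `in` is a reserved word in Python:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EmotionState:
    """A state of the MDP: a point (x, y) in the circumplex model."""
    x: float
    y: float

@dataclass(frozen=True)
class Recommendation:
    """A recommendation tuple (E_s, a, d, e, in, liv, lou, s, t, v):
    the current state plus the Spotify audio features of the suggested song."""
    state: EmotionState
    acousticness: float
    danceability: float
    energy: float
    instrumentalness: float
    liveness: float
    loudness: float
    speechiness: float
    tempo: float
    valence: float

def feedback_reward(score: int) -> int:
    """Feedback-based reward: the user's integer score in [0, 10]."""
    if not 0 <= score <= 10:
        raise ValueError("score must be an integer in [0, 10]")
    return score

# Illustrative values only.
rec = Recommendation(EmotionState(0.2, 0.8), acousticness=0.1, danceability=0.7,
                     energy=0.9, instrumentalness=0.0, liveness=0.2,
                     loudness=-6.0, speechiness=0.05, tempo=120.0, valence=0.8)
```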
We remark that the complexity of the recommendation task of the MDP described above depends on several terms. First, it depends on the number of emotions N_e described in the circumplex model. As explained in Section 5, in this preliminary work, we focus on 8 emotional states (see Figure 3). Furthermore, the complexity also depends on the domains of each Spotify feature described in Table 2, which represent the parameters that the model changes from time to time to "adapt" the trajectory. Finally, the complexity also depends on the length N of the trajectory (number of songs) m_1, ..., m_N chosen by the model to reach the target emotional state. Formally, the complexity of the described MDP is N_e · N · ∏_x range(x), where x, with x ∈ {a, d, e, in, liv, lou, s, t, v}, is a Spotify audio feature (see Table 2) and range(x) is the range of x. However, as discussed in Section 6, we have evaluated the system's responsiveness during real-experience time windows. As a result, the involved users positively rated the system's capability to recommend music in a timely manner.

Step 1: Listen until Solved
The goal is to discover high-performing trajectories in the emotion space, to be improved in Step 2. The result is an archive of different emotional states, named "cells", and trajectories to follow to reach these states.
This step is organized into several listening sessions. The goal of each session is to find high-performing trajectories for one specific target emotional state chosen by the user. Indeed, at the beginning of the session, the user declares the state he would like to reach at the end of it, i.e., the target emotion he would like to experience after the listening session. At the beginning of each session, the archive only contains the initial emotion selected by the user (start state). From there, the system repeats the following steps: (i) select a cell from the current archive, (ii) explore from that cell stochastically, i.e., recommend random music and collect feedback from the user after the listening, (iii) add new cells and their trajectories to the archive. Here, we provide details about the cell and state representation, and the reward function based on the feedback the user provides at the end of each listening.
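The per-session loop (select a cell, explore stochastically, update the archive) can be sketched as follows. Here, recommend_random_song and get_user_feedback are hypothetical helpers standing in for the Spotify recommendation and the real user feedback:

```python
import random

def listening_session(start_emotion, recommend_random_song, get_user_feedback,
                      max_listens=20, seed=0):
    """Sketch of one Moodify listening session. The two helpers are assumed:
    recommend_random_song(rng) returns a song id; get_user_feedback(song)
    returns (score in [0, 10], perceived emotion)."""
    rng = random.Random(seed)
    archive = {start_emotion: {"trajectory": [], "score": 0}}
    for _ in range(max_listens):
        cell = rng.choice(list(archive))            # (i) select a cell
        song = recommend_random_song(rng)           # (ii) explore stochastically
        score, emotion = get_user_feedback(song)    # feedback after listening
        trajectory = archive[cell]["trajectory"] + [song]
        best = archive.get(emotion)
        if best is None or score > best["score"]:   # (iii) update the archive
            archive[emotion] = {"trajectory": trajectory, "score": score}
    return archive

# Toy stand-ins: five songs; feedback derived deterministically from the song id.
def demo_song(rng):
    return f"song-{rng.randint(1, 5)}"

def demo_feedback(song):
    n = int(song.split("-")[1])
    return n * 2, ("happy" if n >= 3 else "calm")

archive = listening_session("neutral", demo_song, demo_feedback, max_listens=10)
```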

Cell and State Representation
In order to be tractable in high-dimensional state spaces like the emotional space (see Figure 2), Step 1 of Moodify reduces the dimensionality of the search space to a meaningful low-dimensional space. Our idea is to conflate emotions that are "similar" in terms of the musical features required to stimulate them in the user into the same cell representation. To this aim, in our approach, we first discretized the emotion space represented by the circumplex model (see Section 3.1) into a grid in which each cell is 8 × 8. Then, we decided that each state contains information about a specific emotion and the set of audio features that a music song should have to arouse this emotion.
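Assuming circumplex coordinates normalized to [-1, 1], the discretization of an emotional state into a grid cell could look like the following sketch (the grid resolution is a parameter, shown here as 8 purely for illustration):

```python
def emotion_to_cell(x, y, grid_size=8):
    """Map a point (x, y) of the circumplex plane, with coordinates assumed
    in [-1, 1], to a discrete grid cell. The resolution is an assumption."""
    def to_index(v):
        # Scale from [-1, 1] to [0, grid_size) and clamp to the grid.
        i = int((v + 1.0) / 2.0 * grid_size)
        return min(max(i, 0), grid_size - 1)
    return (to_index(x), to_index(y))

# Nearby emotions fall into the same cell, conflating "similar" states.
center = emotion_to_cell(0.0, 0.0)   # (4, 4)
corner = emotion_to_cell(-1.0, 1.0)  # (0, 7)
```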

Selecting and Returning to Cells
Step 1 selects a cell at each iteration. Moodify prefers cells that (i) have not been visited often, (ii) have recently been used to discover a new cell, and (iii) are expected to be near undiscovered cells. Moodify stores the sequence of musical songs that leads to a cell, to avoid repeating the exploration.
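A count-based version of the first selection criterion (preferring rarely visited cells) can be sketched as follows; the exact weighting used by Moodify is not specified here, so this formula is an assumption:

```python
import random

def select_cell(archive, visits, rng):
    """Weighted cell selection favoring rarely visited cells: each cell
    gets weight 1 / (1 + visit count), an assumed count-based heuristic."""
    cells = list(archive)
    weights = [1.0 / (1.0 + visits.get(c, 0)) for c in cells]
    return rng.choices(cells, weights=weights, k=1)[0]

rng = random.Random(0)
visits = {"A": 100, "B": 0}
picks = [select_cell({"A": None, "B": None}, visits, rng) for _ in range(1000)]
# The rarely visited cell "B" is chosen far more often than "A".
```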

Exploration from Cells
Once a cell is reached, Moodify explores the emotion perceived by the user for each of k = 100 training musical songs randomly selected from Spotify, with a 70% probability of listening to the previous music at each step. After each listening, the user selects the emotion perceived, i.e., the state reached. Then, he assigns a score to the reached state, i.e., an integer value in [0, 10] which represents "how much the emotion perceived after listening to the song is similar/close to the target emotion established at the beginning of the session". Finally, the audio features of such a state are updated. Formally, let [x, y, s_1, ..., s_9] be the start state, and let [s'_1, ..., s'_9] be the audio features of the listened song. Then, if s_i = 0, set s_i = s'_i, for each i = 1, ..., 9; otherwise, set s_i = (s_i + s'_i)/2, for each i = 1, ..., 9. Exploration terminates either when the selected limit of k training songs has been listened to or when the user stops the exploration/listening session.
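The feature-update rule above can be expressed compactly as:

```python
def update_features(state_features, song_features):
    """Update rule for the state's audio features [s_1, ..., s_9]:
    if a stored feature s_i is still 0 (unset), take the song's value s'_i;
    otherwise, average the stored and song values: (s_i + s'_i) / 2."""
    return [sj if si == 0 else (si + sj) / 2
            for si, sj in zip(state_features, song_features)]

# First feature is unset (0), so it is overwritten; the second is averaged.
updated = update_features([0, 0.5], [1.0, 0.25])  # [1.0, 0.375]
```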

How to Update the Archive
During the exploration of a cell, the archive can be updated in two cases. First, the agent explores a cell not contained in the archive. In this case, details about such a cell are added to the archive, together with some related information: (i) the full "trajectory" (both musical and emotional), in terms of a sequence of state vectors, to follow for reaching that cell from the starting state; (ii) the current environment state; (iii) the trajectory score; (iv) the trajectory length in terms of the number of listened songs. The second case is when the trajectory is "better" (i.e., has a higher score) than the one belonging to a cell already saved in the archive.

Step 2: Emotion Robustification
As a result of Step 1, Moodify has collected a set of high-performing trajectories. To make the trajectories robust to noise, Step 2 creates a policy via imitation learning. The idea is to build a policy that performs as well as the trajectories discovered during the exploration but, at the same time, is able to deal with circumstances not present in the original trajectories. As proposed in [19], to train the policy, we chose a Learning from Demonstration algorithm that proved able to improve upon its demonstrations, i.e., the Backward Algorithm [51]. It works as follows: (i) the agent starts near the last state of the trajectory and runs a standard RL algorithm (in our approach, we chose Q-Learning [46]) from such a state; (ii) when the algorithm has learned to obtain a better reward than the demonstration, the process is repeated starting from a point slightly earlier along the trajectory; (iii) if the agent is able to obtain a better score than the trajectory from the initial state, the process stops.
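A highly simplified sketch of this backward schedule is shown below; train_from and demo_return are hypothetical helpers standing in for the Q-Learning runs and for the demonstration's return from a given start point:

```python
def backward_robustify(demo_states, train_from, demo_return, max_rounds=100):
    """Simplified sketch of the Backward Algorithm: start RL near the end of
    a demonstrated trajectory and move the start point backwards once the
    learned policy matches the demonstration's return from that point.
    train_from(state) runs one round of RL and returns the learned return;
    demo_return(i) is the demonstration's return from demo_states[i]."""
    for start in reversed(range(len(demo_states))):
        for _ in range(max_rounds):
            if train_from(demo_states[start]) >= demo_return(start):
                break  # policy matches the demo from here; move the start back
        else:
            return False  # could not robustify from this start point
    return True  # the policy matches the demo from every start point

# Toy check: a learner that always achieves return 10 beats a demo worth 5.
robust = backward_robustify([0, 1, 2, 3], lambda s: 10.0, lambda i: 5.0)
```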

Limitations of the Method
At the end of the training, if E_1, ..., E_N are the emotions on which the system has been trained, then Moodify has N Q-tables available, each of which will be used when the corresponding emotion is chosen as the target emotion. The problem with this approach is that, when we face complex environments such as the emotional space described by the circumplex model shown in Figure 2, where the number of states and actions can grow, Q-tables can become unfeasible. As we will see in Section 5, in this preliminary work, we focused on only 8 emotional states. However, as also highlighted in Section 7, in future work, we plan to exploit Deep Q-learning techniques. Such techniques exploit powerful deep feedforward neural networks for computing Q-values, i.e., they use the output of such neural networks to obtain new Q-values.
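The scaling issue can be made concrete: with one Q-table per target emotion, memory grows as N_e × |S| × |A|, which motivates the planned move to Deep Q-learning. The sketch below uses arbitrary sizes for illustration:

```python
import numpy as np

def build_q_tables(target_emotions, n_states, n_actions):
    """One zero-initialized Q-table per target emotion; total memory grows
    as len(target_emotions) * n_states * n_actions entries."""
    return {e: np.zeros((n_states, n_actions)) for e in target_emotions}

# Illustrative sizes: 2 target emotions, 64 discretized states, 10 actions.
tables = build_q_tables(["happy", "calm"], n_states=64, n_actions=10)
total_entries = sum(t.size for t in tables.values())  # 2 * 64 * 10 = 1280
```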

MoodifyWeb: The Web Application
We developed a Web application, namely MoodifyWeb, which uses the method described in Section 4 to enable listening to music songs from Spotify according to target emotions selected by the user. For the development, we used Vue.js (Available online: https://vuejs.org/, accessed on 18 October 2022), a JavaScript framework, and the Spotify API for developers (Available online: https://developer.spotify.com/, accessed on 18 October 2022).

With the aim of strengthening the validation of the experiences, the confirmation of expectations test was carried out. We measured the expectation level through a 7-point Likert scale with "Strongly disagree" ("1") and "Strongly agree" ("7") as verbal anchors [58]. Responses were provided, also in this case, via MoodifyWeb. Next, we calculated the minimum, average, and maximum scores for participants' confirmation of expectations, i.e., 4.7, 6.45, and 7, respectively. Lower values mean that participants' expectations were too high, and so the "recommendation" is worse than expected; conversely, a high value suggests that participants' expectations were too low, and so the "recommendation" is better than they thought. The average value of 6.45 confirms the latter for most of the participants: Moodify recommendations were satisfying with respect to participants' emotional demands.
Lastly, in the Summary Step, we gathered suggestions about possible enhancements to the music recommendation system. Among the most interesting ones, we found: "It would be interesting to directly integrate Moodify into a plug-in for Spotify" and "It would be interesting to consider on MoodifyWeb other aspects, such as the environment in which the user is and/or the activity carried out by the user while listening to music".

Conclusions
In the digital age, emotion/mood has become a fundamental criterion used by ICT systems in predicting social behaviors or conditioning people in their social interactions and work activities. In light of this, music systems that regulate mood and emotions in our daily life are arousing particular interest. Therefore, the affective computing research community has put effort into modeling the relationship between music and emotion.
Applications of affective computing studies can be found in education, health care, entertainment, affective ambient intelligence, multimedia retrieval, and music retrieval and generation. As for the specific musical context, most of these works consist of context-sensitive recommendation tools which take into account the emotional state of the listener. Few results have been obtained in the study of music systems for the induction of emotional states, i.e., methods to influence the emotional state of listeners and adapt the interaction with technology to their affective state.
In this work, we have employed RL methods for developing Moodify, a novel music recommendation system that can induce a target emotional state in the listener. We implemented Moodify in MoodifyWeb, a Web platform delivered to end users. The results of an evaluation study carried out with potential end users proved that our system is useful and satisfactory for all participants involved.

Limitations and Future Work of the Project
There is a series of envisioned steps for the near future of Moodify, which we summarize as follows. Currently, MoodifyWeb interacts with Spotify by searching for specific music tracks and recommending them to the user, but we plan to directly "incorporate" it into the Spotify interface (e.g., as a Spotify plug-in). By doing so, the user does not have to switch between MoodifyWeb and Spotify but can use one single integrated application. Furthermore, we plan to make Moodify more extensive by considering other aspects that could influence the listener's emotion, such as the environment in which the user is or the activity carried out while listening to music. At the moment, the system is designed to consider only the starting and target emotions. It could be helpful to add more context to the recommendations, as happens, with some variations, in context-based filtering techniques. This contextual information may come from the ambient and the type of activity the user is involved in, e.g., the gym [20], where the trajectory for inducing a specific emotion could be different from the one used when at home. Of course, this kind of improvement will need extensive study and validation with final stakeholders.
Another future direction is represented by the extension of Moodify so as to collect and analyze users' behavioral information, e.g., interactions with a mobile or IoT device [20,59,60], to better tailor the recommended songs and include the implicit feedback typical of collaborative filtering mechanisms. Different studies have found a connection between emotions and the way we use smartphones [61-65]. Currently, MoodifyWeb always asks for explicit feedback from listeners. Based on the insights coming from the literature, we will extend the system so that the emotions captured from smartphone interaction provide implicit feedback. For instance, if we capture sadness through the smartphone interaction while a song is being listened to, MoodifyWeb could avoid asking for explicit feedback, apply the sad emotion to that song, and adjust the RL method's trajectory appropriately.
Finally, to face the problem of Q-Learning scaling (see Section 4.3.4), as a future development, we are going to exploit Deep Q-learning techniques, which combine Q-learning with deep learning through so-called Deep Q-networks, i.e., feedforward neural networks used for computing Q-values.

Figure 1 .
Figure 1. An overview of Moodify: the user selects the starting emotional state and the target emotional state. Such emotions are sent to the agent. The agent recommends a music track from Spotify, and the user expresses feedback through MoodifyWeb. Such feedback is, in turn, provided to the agent for refining the recommendation.

Figure 2 .
Figure 2. Examples of emotional states in the circumplex model. In this study, these states are divided into four groups: Pleasant-high, Pleasant-low, Unpleasant-high, and Unpleasant-low.
Acousticness (a): "… measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic." Range: [0, 1].
Danceability (d): "Danceability describes how suitable a track is for dancing based on a combination of musical elements, including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is the least danceable, and 1.0 is the most danceable." Range: [0, 1].

Figure 3 .
Figure 3. The emotional states selectable in MoodifyWeb.

Table 1 .
Main points of the closest articles in the literature and of this work. † = lacking details.

Table 2 .
Audio features for a music track provided by the Spotify API.

Table 5 .
Results of the Testing phase across all participants (average scores).