Automatic Association of Scents Based on Visual Content

Although olfaction can enhance the user's experience in virtual environments, it is not widely used in virtual content. This is because olfactory displays are either unaware of the content of the virtual world or are application specific. Broad context awareness can be enabled through image recognition using machine learning. Screenshots of the virtual world can be analyzed for the presence of virtual scent emitters, allowing the olfactory display to respond by generating the corresponding smells. A Convolutional Neural Network (CNN) based on the Inception model was trained for image recognition. To evaluate the model's accuracy, we trained and evaluated it on screenshots from the computer game Minecraft. The model achieved 97% accuracy, and in some cases accuracy reached 99%.


Introduction
A virtual environment is defined as the perception of an environment's contents through synthetic sensory information as if that information were not synthetic [1]. Feeling immersed in such an environment requires the user to be in a psychological state in which they feel included in it.
Many researchers have aimed to enhance the immersive experience of virtual environment users by mimicking the real environment. Since we perceive our surroundings through our senses in the real environment, the absence of any sense affects our engagement with it, and the same holds in a virtual environment. A player's sense of presence increases with the number of senses engaged during the experience [2]. Thus, tactile, olfactory, auditory and visual cues stimulate the senses as the real environment does and give the user a strong sense of presence in the virtual environment [3].
However, among the sensory cues, olfaction is the least used to enrich the user's experience of virtual environments [4]. This is mainly due to slow technical progress, especially in emitting scents when needed, and the resulting failure to integrate smells with many software applications [5]. Moreover, the absence of a simple and robust scent-emitting device hinders the use of smell compared with the visual and auditory modalities [6]. The device responsible for delivering scents to the user in such experiments is called an olfactory display. Olfactory displays are computer-controlled devices that deliver odorants to the human olfactory organ [7]. Very few olfactory displays are commercially available, and these are mainly used in cinemas, for example Olorama [8].
In the literature, several olfactory displays have been built that release scents at preset times. This method works well when the scents the user will encounter are known in advance, as in movies. A major issue with this approach is that in virtual environments such as games or virtual reality applications, it is hard to predict where the player will go or what they will do next, and hence which scent should be released. Nonetheless, the timing of scent release can be improved by associating it with the contents of the virtual environment. To the best of our knowledge, no previous study has fully implemented a content-aware solution that can be generalized. With this goal, this work develops an olfactory display that uses the Inception-V3 model, a deep learning approach to image recognition, to identify visual content and release the corresponding scents. The data used in this study were taken from the computer game Minecraft [9].
The main objective of this study is to propose a new approach for associating scents with the visual content of a virtual environment. The findings should make an important contribution to the field of virtual reality by potentially offering a more immersive experience. Additionally, this work contributes to artificial intelligence by showing that a pre-trained model such as Inception can solve multi-label classification problems like this one without developing and training a new CNN from scratch.
This paper is divided into four parts. The first part reviews the literature on different types of olfactory displays. The second part covers the methods and materials, describing the system design. The third part presents the results and discussion, and the last part gives the conclusion and future work.

Literature Review
According to the literature, there are two main types of olfactory displays: wearable devices and devices placed in the environment. Wearable devices are attached to the user's body or head [10]. This type of display ensures that scents reach the user, but the user is aware of the device, which can cause discomfort and make the experience less immersive. On the other hand, olfactory displays placed in the environment do not disturb the user, since no hardware has to be worn. However, a key limitation of this type is that the scent may not reach the user, especially if the device is far away or the scent is weakened by air movement.
Each type is subdivided into categories. External devices, also known as "placed in environment" devices, are divided by the scent generation technique they use: air cannon, natural vaporization or airflow. Wearable devices are divided by their placement on the user into "body" and "head-mounted" devices.
The paper in [11] introduces an olfactory display called inScent, a wearable device worn as a necklace that emits different scents upon receiving a mobile notification. This concept, known as scentification, uses scents to deliver information. Scents are emitted automatically in different scenarios: based on a predefined name, on the contents of a message, or on timing, as with calendar events. Scents are chosen according to the user's preference, so the cartridge is exchangeable. To generate and control the amount of scent produced, the device uses heating, followed by fan airflow to deliver the scent to the user. Heating in a wearable device can be risky, and the fan produces a loud sound that can cause discomfort for the user.
Recently, several researchers have focused on designing a fashionable wearable device known as Essence [12]. They developed a lightweight, necklace-shaped olfactory display that is comfortable enough to be worn daily and is controlled wirelessly via an Android application. The device can release scent manually, when the user pulls down a string that sends the release command, or in response to data from the smartphone such as location and time. It can also release scent based on heart rate, brain activity or electrodermal activity, and it can be triggered by another person. However, the device can release only one scent, which is impractical when different scents should be released according to the user's circumstances.
The Smelling Screen device in [13] generates scents based on the image shown on a Liquid Crystal Display (LCD), using four fans placed at the corners of the screen whose combined airflow makes the scent appear at the image's position on the screen. These devices were not operated with games, applications or movies, but rather with arbitrary images shown in different corners of the screen.
An inexpensive olfactory display was developed in [14]. The device uses an Arduino Uno microcontroller, as it is economical and capable of controlling the olfactory display, and fans to generate the airflow for scent delivery. The device was tested in experiments involving games, advertising and procedural memory. For gaming, the authors used the Unity-based Tuscany environment, which they modified by adding a bowl of oranges that released its scent once the player came near it. However, this approach is not generic, as most games are not designed to support olfactory displays. The researchers also developed their own application, presented in [15].
Researchers in [16] present an olfactory display that is simple, economical and capable of releasing eight scents based on timed events. Scent is generated by heating to vaporize essential oils, and water is used to clean the air of previous scents. Along with the olfactory display, they created software to select the list of scents and to control the fan speed and the intensity of the aromas. However, according to their experiments, the heating process is slow: it takes 6 s before the scent is released, so release is not instantaneous.
Olfactory displays have been used in many studies on synchronizing movies with scents. The commercial olfactory display "The Vortex Active" was used in [16] to synchronize a movie clip with specific scents. This synchronization-based display releases scents at set times. It is installed in the environment, uses fans to deliver the scents to the users, and can release up to four scents at a time. It is connected to the computer via USB to set the timing of scent release.
The study in [17] uses an olfactory display called Exhalia SBi4. The researchers selected six scents based on the chosen movie clips and synchronized the scents with them. The device uses fans to deliver the scents to the users and can release up to four scents at a time.
Another work [5] altered the contents of the film by adding a subtitle, namely a logo in different colors, which was used to trigger scent release. The researchers built an olfactory display called Sub Smell. A different color was used in each scene; when the movie was played, the machine identified the logo, analyzed its color and released the corresponding scent.
In [18], the authors developed a device for research purposes. It is controlled by an Arduino and has three servo motors that press the scent jars when commands are sent over a wired connection. Thanks to its design, each servo can press two jars and therefore release two scents. Additionally, the device can easily be expanded. It uses fans both to spread the scent and to extract it when it should not be present. The authors developed two applications for the device: a C#-based application that uses timers to release scents, and a Unity asset that allows games to release scents when the player is in a certain area. Although this device has the advantage of low cost, like the other olfactory displays its scent release depends on specific timing or a specific game, a problem that the proposed approach overcomes.
The literature shows that existing olfactory displays are not used in games or VR applications because it is difficult to predict which scents will be needed and in what order. The few olfactory displays used in games depend on a game built specifically for the device, in which the objects a player might collide with are set up to trigger scent release.

System Description
The system consists of electronic components and a custom C# Windows application that controls the unit. The electronic components are an Arduino Uno, fans, a Bluetooth module that connects the application to the device, and servo motors that press the scent jars to release the scents.
The C# Windows application captures images from the selected game; in this research, we selected Minecraft. The user can control when capturing occurs. Scents are released after the captured image has been classified by the Convolutional Neural Network (CNN) model. Figure 1 shows the framework of the system.

Olfactory Display
The Arduino Uno board is at the core of many projects thanks to its strong community support, multi-platform support, and easy-to-use software and hardware. It has 14 digital input/output pins, a USB connection and a power jack [19].
In this project, we developed a device similar to the olfactory display presented in [18], with one change: Bluetooth is used to transfer data instead of a wired connection. The device's electronic components are connected to and controlled by the Arduino, including three MG996R servo motors that press the jars. Each servo motor controls two jars, giving six scent jars in total; adding servo motors increases the number of scents the device can release. Only two scents can be released at the same time. The scents are ocean, fire, snow, mildew, grass and dirt. This technical limitation could easily be addressed; however, the device is intended as a proof of concept rather than a commercial product. Figure 2 shows an overview of the device. The Arduino is connected to a 2 × 15 A DC motor driver that controls two fans, which help disperse the released scents into the environment. An HC-05 Bluetooth module connected to the Arduino lets the application communicate wirelessly with the device. The whole system is powered by 3 A phone chargers. The connection schematic of the system is presented in Figure 3.
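Because each of the three servos presses one of two jars, the six scents can be addressed as (servo, side) pairs. The sketch below illustrates one way the application could encode a release request for the device. Everything in it is a hypothetical assumption for illustration: the scent-to-jar assignment (taken from the order in which the scents are listed above), the servo angles, and the one-byte command format. None of these is the device's documented firmware protocol. On the PC side, such a byte would be written to the serial link that the HC-05 module exposes.

```python
# Hypothetical assignment: scents in the order listed in the text, two per servo.
SCENTS = ["ocean", "fire", "snow", "mildew", "grass", "dirt"]
NUM_SERVOS = 3
assert len(SCENTS) == 2 * NUM_SERVOS  # each servo presses one of two jars

# Hypothetical servo angles: one direction presses jar 0, the other presses jar 1.
REST_ANGLE = 90
PRESS_ANGLE = {0: 30, 1: 150}


def jar_for_scent(scent: str) -> tuple[int, int]:
    """Return (servo_index, side); side 0/1 selects which of the servo's two jars to press."""
    index = SCENTS.index(scent)  # raises ValueError for an unknown scent
    return index // 2, index % 2


def press_angle(scent: str) -> int:
    """Angle the scent's servo would rotate to in order to press its jar (assumed values)."""
    _, side = jar_for_scent(scent)
    return PRESS_ANGLE[side]


def encode_command(scent: str) -> bytes:
    """One-byte command: high nibble = servo index, low nibble = side (illustrative format)."""
    servo, side = jar_for_scent(scent)
    return bytes([(servo << 4) | side])


def decode_command(command: bytes) -> tuple[int, int]:
    """Inverse of encode_command, as the microcontroller side might parse the byte."""
    value = command[0]
    return value >> 4, value & 0x0F
```

Adding a fourth servo under this scheme would only require appending two scent names to `SCENTS` and updating `NUM_SERVOS`, which mirrors the expandability noted above.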

The Virtual Environment
The study uses the Minecraft game to associate scents with the contents of a virtual environment. Minecraft is a game in which the player can go on many adventures in generated worlds and build whatever they imagine with the resources given to them. Players can also choose the "adventure" and "survival" modes, in which they have to defend themselves [20]. The game can be played on many platforms and has 74 million active players [21].
We selected six scents to associate with visual information from the game: grass, ocean, snow, dirt, fire, and mildew, which serves as an unpleasant scent associated with zombies. The device automatically releases a scent based on the identified visual content.

Inception Model
The automatic association relies on image recognition. For our brains, associating an image with a class is an easy task, but it remains one of the hardest problems to solve with a computer, for which an image is simply a large grid of numbers. Advances in deep convolutional neural networks have made this possible, and such networks are now the state of the art for these problems. Training a convolutional network from scratch takes a lot of time, especially since large datasets sufficient for the task are hard to find. Researchers have therefore released many pre-trained models, as in [22][23][24], trained on large datasets to classify images into many classes. Many studies have used these models, for example in style transfer [25] and skin cancer detection [26].
The Inception model is one of the CNN models used extensively in transfer learning. Normally, in a convolutional layer, the designer must decide whether to use a 1 × 1, 3 × 3 or 5 × 5 filter, followed by max pooling, and then repeat this by stacking more layers in the hope of detecting more details. However, this architecture is computationally expensive because many operations occur at every neuron, so stacking more layers to capture more detail quickly increases the cost. The Inception model uses a different architecture: instead of choosing one filter size, it applies all of them in the same layer, concatenates their outputs and passes the result to the next layer. This architecture is complicated, but it works remarkably well and achieves better performance in terms of both speed and accuracy.
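To make the idea concrete, the following is a naive Inception module written in plain Python, following the description above. It applies 1 × 1, 3 × 3 and 5 × 5 convolutions and a 3 × 3 max pooling to the same input, then concatenates the outputs along the channel axis. This is an illustrative sketch, not Inception-v3 itself. The weights are random stand-ins for learned filters, and the real Inception-v3 modules also use 1 × 1 dimension-reduction convolutions and factorize the larger filters.

```python
import random


def conv2d_same(x, kernel):
    """Stride-1 convolution with zero 'same' padding, followed by ReLU.
    x: H x W x C nested lists; kernel: k x k x C x F nested lists with odd k."""
    h, w, c = len(x), len(x[0]), len(x[0][0])
    k, f = len(kernel), len(kernel[0][0][0])
    pad = k // 2
    out = []
    for r in range(h):
        row = []
        for s in range(w):
            acc = [0.0] * f
            for i in range(k):
                for j in range(k):
                    rr, ss = r + i - pad, s + j - pad
                    if not (0 <= rr < h and 0 <= ss < w):
                        continue  # zero padding contributes nothing
                    for ch in range(c):
                        v = x[rr][ss][ch]
                        weights = kernel[i][j][ch]
                        for o in range(f):
                            acc[o] += v * weights[o]
            row.append([max(a, 0.0) for a in acc])  # ReLU
        out.append(row)
    return out


def max_pool_same(x, k=3):
    """Stride-1 k x k max pooling that keeps the spatial size (edges use in-bounds pixels only)."""
    h, w, c = len(x), len(x[0]), len(x[0][0])
    pad = k // 2
    return [
        [
            [
                max(
                    x[rr][ss][ch]
                    for rr in range(max(0, r - pad), min(h, r + pad + 1))
                    for ss in range(max(0, s - pad), min(w, s + pad + 1))
                )
                for ch in range(c)
            ]
            for s in range(w)
        ]
        for r in range(h)
    ]


def random_kernel(k, c, f, rng):
    """Random k x k x C x F weights standing in for learned filters."""
    return [[[[rng.uniform(-0.5, 0.5) for _ in range(f)] for _ in range(c)]
             for _ in range(k)] for _ in range(k)]


def naive_inception_block(x, f1, f3, f5, rng):
    """Run 1x1, 3x3, 5x5 convolutions and 3x3 max pooling on the SAME input in
    parallel, then concatenate the four results along the channel axis."""
    c = len(x[0][0])
    b1 = conv2d_same(x, random_kernel(1, c, f1, rng))
    b3 = conv2d_same(x, random_kernel(3, c, f3, rng))
    b5 = conv2d_same(x, random_kernel(5, c, f5, rng))
    bp = max_pool_same(x, 3)
    return [
        [b1[r][s] + b3[r][s] + b5[r][s] + bp[r][s] for s in range(len(x[0]))]
        for r in range(len(x))
    ]
```

Every branch uses stride 1 with "same" padding, so all four outputs keep the input's height and width. This is what makes channel-wise concatenation possible. The output depth is f1 + f3 + f5 plus the input depth: a 6 × 6 × 4 input with f1 = 8, f3 = 12 and f5 = 4 yields a 6 × 6 × 28 output.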
In this study, we applied transfer learning using Inception-v3, which Google trained to classify 1.28 million images into 1,000 classes. The model was retrained on our image data with TensorFlow [27], Google's machine learning framework, to perform multi-label image classification, in which a single image can be assigned to multiple classes.
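In this kind of transfer learning, the pre-trained network is typically kept frozen, and only a new final layer is trained on its output features (the "bottleneck" vectors). For multi-label classification, that layer can use one independent sigmoid per class, trained with binary cross-entropy. The sketch below trains such a layer with plain gradient descent in pure Python. The toy feature vectors stand in for the 2048-value bottleneck features that Inception-v3 produces per image. The learning rate and number of epochs are illustrative, not the settings used in this study.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_probs(weights, biases, features):
    """One independent sigmoid per class, so the probabilities need not sum to 1."""
    return [
        sigmoid(sum(w * x for w, x in zip(w_row, features)) + b)
        for w_row, b in zip(weights, biases)
    ]


def bce_loss(weights, biases, data):
    """Mean binary cross-entropy over all samples and classes."""
    total, count, eps = 0.0, 0, 1e-12
    for features, labels in data:
        for p, y in zip(predict_probs(weights, biases, features), labels):
            total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
            count += 1
    return total / count


def train_sigmoid_head(data, num_classes, lr=0.5, epochs=500):
    """Full-batch gradient descent on a new linear + sigmoid layer.
    data: list of (frozen feature vector, multi-hot label vector)."""
    dim = len(data[0][0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    n = len(data)
    for _ in range(epochs):
        grad_w = [[0.0] * dim for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        for features, labels in data:
            probs = predict_probs(weights, biases, features)
            for k in range(num_classes):
                err = probs[k] - labels[k]  # d(BCE)/d(logit) for a sigmoid output
                grad_b[k] += err / n
                for d in range(dim):
                    grad_w[k][d] += err * features[d] / n
        for k in range(num_classes):
            biases[k] -= lr * grad_b[k]
            for d in range(dim):
                weights[k][d] -= lr * grad_w[k][d]
    return weights, biases


# Toy usage: 3 classes; each centered feature vector mirrors which classes are present.
combos = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
data = [([2.0 * v - 1.0 for v in y], y) for y in combos]
w, b = train_sigmoid_head(data, num_classes=3)
print([round(p, 3) for p in predict_probs(w, b, [1.0, 1.0, -1.0])])
```

Unlike a softmax layer, the sigmoid outputs are scored independently. An image containing both grass and fire can therefore receive high probabilities for both classes at once.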

Data Preparation
The first step was to collect data to train the Inception model. We aimed for 200 images per category. For each of the six scent classes (grass, fire, dirt, snow, ocean and mildew, the unpleasant smell associated with zombies) and for the three two-label combinations described below, we manually collected 100 screenshots from Minecraft and then applied data augmentation to obtain another 100 images, giving a total of 1,800 images. The augmentations were applied manually to some of the images and consisted mainly of rotation, scaling and color changes. Before data augmentation, the accuracy was below 30%, which made augmentation a necessity. For testing, we used 10 images per category, 90 images in total (see Supplementary Material); the testing and training datasets did not overlap. Figure 4 illustrates some of the images used to train the model. For each image, we prepared a text file containing its labels, as proposed in [28] for classifying multi-label images. Each class was scored with an independent sigmoid output, so that one image could belong to multiple classes with different probabilities. Images containing two of the scents we wanted to release were given two labels. We covered only three such cases: (1) ocean and grass, (2) fire and grass, and (3) grass and mildew (zombie). All screenshots used for training and testing were collected manually.
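Before training, the per-image label files must be turned into multi-hot target vectors, which is the form a sigmoid output layer is trained against. The sketch below assumes a hypothetical file format with one class name per line. For example, an image containing grass and a zombie would have a file with the lines `grass` and `mildew`. The exact format used in [28] may differ.

```python
CLASSES = ["dirt", "fire", "grass", "mildew", "ocean", "snow"]


def parse_label_file(text: str) -> list[str]:
    """Read one class name per line, ignoring blank lines, case and surrounding whitespace."""
    return [line.strip().lower() for line in text.splitlines() if line.strip()]


def to_multi_hot(labels: list[str], classes: list[str] = CLASSES) -> list[int]:
    """Encode a set of labels as a 0/1 vector aligned with `classes`."""
    unknown = set(labels) - set(classes)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    present = set(labels)
    return [1 if name in present else 0 for name in classes]


# Usage: a grass-and-zombie screenshot becomes a two-hot target vector.
target = to_multi_hot(parse_label_file("grass\nmildew\n"))
```

Here `target` has ones at the positions of `grass` and `mildew` and zeros elsewhere. Rejecting unknown names early catches typos in hand-written label files before they silently become all-zero targets.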

Windows Application
To control the olfactory display, we developed an application that communicates with the device via Bluetooth. It captured a screenshot of the Minecraft game every 6 s and passed it to image recognition. Retraining the Inception-V3 model produced two files: the trained graph and a text file containing the labels. The application read these files and recognized new images using EmguTF, a .NET wrapper [29] for calling TensorFlow functions. After the screenshot was classified, the classification result was sent to the Arduino, the corresponding servo motor pressed the jar and the scent was released. The fans then rotated, spreading the scent. Figure 5 shows a screenshot of the application.
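The application's capture, classify and release cycle can be outlined as below. This is a hypothetical Python sketch, not the C#/EmguTF implementation. `capture_screenshot`, `classify` and `send_to_device` are placeholder callables supplied by the caller. The 0.9 threshold follows the minimum score for releasing a scent reported in the results. The cap of two scents reflects the device's limit on simultaneous releases.

```python
import time
from typing import Callable, Dict, List


def select_scents(scores: Dict[str, float], threshold: float = 0.9,
                  max_scents: int = 2) -> List[str]:
    """Keep labels whose sigmoid score reaches the threshold, highest score first,
    capped at the number of scents the device can release at once."""
    chosen = [label for label, s in scores.items() if s >= threshold]
    chosen.sort(key=lambda label: scores[label], reverse=True)
    return chosen[:max_scents]


def run_loop(
    capture_screenshot: Callable[[], bytes],
    classify: Callable[[bytes], Dict[str, float]],
    send_to_device: Callable[[str], None],
    interval_s: float = 6.0,
    iterations: int = 1,
    sleep: Callable[[float], None] = time.sleep,
) -> List[List[str]]:
    """Capture a frame every `interval_s` seconds, classify it and send each
    selected label (not the image) to the device. Returns what was released."""
    released = []
    for i in range(iterations):
        frame = capture_screenshot()
        scents = select_scents(classify(frame))
        for scent in scents:
            send_to_device(scent)
        released.append(scents)
        if i < iterations - 1:
            sleep(interval_s)
    return released
```

Keeping the threshold logic in `select_scents`, separate from capture and Bluetooth I/O, lets the decision rule be tested without a running game or attached hardware. The injectable `sleep` serves the same purpose for the timing.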

System Evaluation and Results
The model was retrained on a Windows 10 PC with 8 GB of RAM and an Intel Core i7 processor. We set the number of training steps to 20,000, based on the default value, and training took about an hour. To evaluate the retrained model, we calculated its accuracy on new images from the Minecraft game that had not been used for training. There were 90 test images, 10 per class. We also calculated recall, precision and F-score.
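For reference, per-class precision, recall and F-score follow the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F-score = 2 · precision · recall / (precision + recall). The minimal sketch below computes them from multi-hot ground truth and thresholded sigmoid scores. The 0.5 default cutoff is an illustrative assumption and not necessarily the one used to produce the reported figures.

```python
def threshold(scores, cutoff=0.5):
    """Turn per-class sigmoid scores into a 0/1 prediction vector."""
    return [1 if s >= cutoff else 0 for s in scores]


def per_class_metrics(y_true, y_pred, classes):
    """y_true, y_pred: lists of 0/1 vectors (one per image) aligned with `classes`.
    Returns {class: (precision, recall, f1)}; undefined ratios are reported as 0.0."""
    results = {}
    for k, name in enumerate(classes):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t[k] == 1 and p[k] == 1)
        fp = sum(1 for t, p in pairs if t[k] == 0 and p[k] == 1)
        fn = sum(1 for t, p in pairs if t[k] == 1 and p[k] == 0)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results[name] = (precision, recall, f1)
    return results
```

Computing the metrics per class, rather than only overall, shows which categories (for example, snow versus ocean) are being confused.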

Accuracy of Retrained Model
To evaluate the accuracy before integrating the model into our application, we provided the model with ten testing images for each class. The learning rate was set to 0.01 and the batch size to 100. Table 2 shows the results for each testing image. For the dirt images, the scores varied, but the model nonetheless predicted the correct label. The grass and ocean classes gave the highest scores compared with the other classes. However, dark ocean images, taken after setting the in-game time to night, gave lower scores. For the fire images, although a few testing images reached a score of 0.99, most ranged between 0.6 and 0.7. This was due to the difficulty of collecting images from the game that contained fire only, since fire usually appeared in images that also contained grass. For images containing snow, the color of the snow was similar to that of the ocean and sky under some of the game's weather conditions, which lowered the score for many of the testing images. If more training images covering all conditions were fed to the model, the accuracy would be much higher; indeed, increasing the number of training images in all classes would increase the overall accuracy of the model. Images showing zombies were associated with the mildew scent to represent an unpleasant smell. Compared with the other classes, zombie images had the lowest scores when tested in the application, mainly because of the difficulty of training the system on high-resolution images showing zombies alone.

Accuracy of the Model within the Application
After integrating the CNN model into the Windows application, we ran the application while playing the game and measured the time to recognize each captured image and its accuracy, as shown in Table 4.
As the results show, recognizing an image takes a few seconds, so the application runs in near real time. A scent was released only when the model's score was at least 90%.

Figure 1. Framework of the system.

Figure 2. Overview of the device.

Figure 4. A sample of the training data used to train the model. (a) Dirt data; (b) Fire data; (c) Fire and grass data; (d) Grass and ocean data; (e) Grass data; (f) Mildew (zombie) data; (g) Mildew and grass data; (h) Ocean data; (i) Snow data.

Figure 6. A sample of the testing images used to test the accuracy of the model in classifying new images. (a) Testing image for the dirt class; (b) Testing image for the fire class; (c) Testing image for the grass and fire classes; (d) Testing image for the grass and ocean classes; (e) Testing image for the grass class; (f) Testing image for the mildew (zombie) class; (g) Testing image for the mildew and grass classes; (h) Testing image for the ocean class; (i) Testing image for the snow class.

Table 1. Comparison of the olfactory displays covered in the literature.

Table 4. Performance of the model within the application.