Introducing Tagasaurus, an Approach to Reduce Cognitive Fatigue from Long-Term Interface Usage When Storing Descriptions and Impressions from Photographs

Abstract: Digital cameras and mobile phones have given people around the world the ability to take a large number of photos and store them on their computers. As these images serve the purpose of storing memories and bringing them to mind in the potentially far future, it is important to also store the impressions a user may have from them. Annotating these images can be a laborious process, and the work here presents an application design and functioning implementation, which is openly available now, to ease the effort of this task. It also draws inspiration from interface developments of previous applications such as the Nokia Lifeblog and the Facebook user interface. A different mode of sentiment entry is provided where users interact with slider widgets rather than select an emoticon from a set, which offers a more fine-grained value. Special attention is paid to avoiding cognitive strain by avoiding nested tool selections.


Introduction
This work presents an application which has been given the name Tagasaurus. The name is derived from the essence of the application's functionality, which gives users the ability to associate tags with their collection of images. The word 'tag' comes from the popularization of the hashtag introduced by Twitter [1], where users of the online platform provide keywords with the # character preceding them so that the users can filter their content feeds [2] based on these hashtags. It also helps users identify 'trending' topics that surround different hashtags. This paradigm has seen a great deal of usage during relatively recent political discussions [3] or even commentary about TV shows in real time between different communities [4]. Although the name focuses mainly upon 'tags', the motivation takes inspiration from other user interaction with content such as Facebook's emoticon selection [5] that is associated with descriptions but exists as independent functions outside of the text forms. As emoticons are associated with sentiment (emotion), the prevalence of 'memes' [6] provides similar sentiment information [7] that may not be possible to directly infer using automated tools reliably when not used in a big data context [8].
This application seeks to provide users a simpler avenue to enter their impressions from images they possess and store them in a user centric approach. Inspiration is taken from previous applications that gave users the ability to produce lifeblogs [9,10], which came from Nokia along with their mobile phones ("Let your phone help you automatically keep a diary of your life memories"). The ability to have memories recorded can assist when people search for memorabilia of sentimental moments [11].
This supports the concept that digital technology is part of the 'extended mind' which provides for the mind a store of memories [12,13]. Since memories can fade or change over time, having the ability to record the overall impressions from photos may help a user track transitions. This can become a major factor with age, where certain memories can provide a stepping stone for retrieving many other memories as well [14].
The new approach utilizes the new concepts of data entry explored by the large social media platforms for the purpose of relieving cognitive strain when users annotate (tag) their collections of photographs. Cognitive strain (or cognitive load) from visual user interfaces (UIs) has been shown to affect nurses [15] in their ability to carry out their duties in ICUs. This has been examined for textual outputs as well, where researchers look for ways to summarize the information for patients with limited abilities to process large amounts of information [16] from an 'ergonomic' approach. There are solutions to this issue which provide improved workflows [17], essentially filtering the user's queries through the use of the fewest required interactions, and solutions based on interface designs whose visual appearance addresses the human factors [18,19]. It is important for such strain to be as low as possible so that it does not affect the sentiment when saving annotations.
The production of 'tags' (keywords) from user description text has been explored, as has the use of emoticons as explicit sentiment declarations and of the associated 'memes' placed alongside content. Section 2 displays a novel use of the emotional element where the values are not mapped to a predefined set of emotion dimensions which have a mutually exclusive value entry (one-hot-encoded/dummy variable values), since multiple emoticons cannot be inserted into the Facebook emoticon toolbar, although users can insert the emoticons within their textual description. Taking the direction of Facebook, a separate toolbar with a widget set is provided for users to fine-tune the values for different emotions they associate with the image. The pictures that are 'memes' placed in content threads can also take the form of 'stickers' [20], giving a more general use case of associating a main content piece with other images. This work takes this more general view that an image can have a set of other images that are 'linked' to it. Given that a multitude of images exist for linking, it then becomes a question of how to do this in a simple, single user centric approach. Together, the two main explorations aim to see how an intuitive 'flat' UI can be designed to allow fine-grained emotional value representations, with the feature to explicitly link other images with a main image, and for this to exist along with the established text to tag processing for the user to see.
The other aspect we aim to include in this application is a basic form of gamification [21]. Since the annotation process of the photographs may not be an entertaining process without new information being generated or a change in the user's state/environment, some feedback may be necessary. The methodology describes metrics users will see to validate their efforts of producing a completely annotated image collection which they can use in the future to remember emotions and key concepts related to their memories.
Section 3 provides screenshots of the application demo implemented that can be found at https://github.com/mantzaris/Tagasaurus (accessed on 31 May 2021). The implementation is in ElectronJS, which is aimed at being cross platform so that most users can use it. Section 2 covers more of those details. It can be seen that the results support the objectives outlined covering a new way for users to annotate the images on their computers. This work offers a novel approach for users to more efficiently create an emotionally annotated image corpus (database) from human-based tagging [22]. This new approach can also be used in a research based setting for exploring the emotional responses to certain image stimuli [23] and their associated tags.

Background Literature
There exist many methods for tagging images; for example, the one in [24] would appeal to a traveller since it explicitly takes into account geographic and temporal information for the images stored. An approach leveraging deep learning to provide an overall automated approach to tagging is presented in [25]. As the paper notes, such methodologies require large corpora of images to learn from, and their methodology required more than 7k images to train for the single mode use case (not batch but a single image tagging). Another deep learning approach in [26] uses the sentences alongside images to assist. Given that this implementation also provides text, such an approach is the most similar, as the keywords and image stores are associated, but it is not a personalized tagging experience for the user since thousands of images are required. A pre-trained model could be used, but it would not address the specific personalized exploration of sentimental change, which is the current scope.
Andriyanov and Lutfullina [27] showed how important 'human factors' are in general as they studied traffic accidents in the context of artificial intelligence becoming more of an assistance. They correctly highlighted the need to investigate the stimuli which cause distractions and how emotions in regards to similar stimuli can change as a consequence of inner emotional states. Taking these into account can provide better driver feedback. There has been active research on using deep learning to detect and classify emotions from images of human expression as in [28] which can apply to important case studies such as driving. The work offered here can assist such studies by having a tool to more efficiently produce a ground truth training and validation dataset for the ML to learn from.
Qian et al. [29] presented a different implementation strategy for obtaining tags from a 'social' mechanism where users in a collective provide tags. The authors correctly stated that tags often are 'noisy', displaying variation about the true underlying label, and offered a solution by using the crowd intelligence to remove errors by taking a statistic of the entries, but this ignores privacy issues. Privacy is a major concern in the digital age, where damage to a user's privacy can be exploited by perpetrators; thus, it needs to be taken into consideration and 'engineered' into the applications [30]. Being hacked can be a source of anxiety [31]. This can be especially true for personal memories.
A solution which offers more privacy but still leverages the social graph is shown in [32]. The methodology takes an approach of building models of those participants in the graph which may appear in the photos in order to tag them. There is a 'Tagger' module developed alongside a 'Face Learner' module, but no modules take into account sentimental objects.
Many of the principles adopted here can be traced back to the seminal work of M. Dertouzos [33], who pioneered the necessity for applications to take the user centric approach and examined the manner in which users interface with applications in order for services to be provided to the user. This book was written shortly after the time when personal computers became common and discusses how many of the key components in human factors had not been addressed but need to be given attention. Although it does not explicitly discuss the emotional components of data storage, it emphasizes that the services would be able to encode the broader experiences in a flexible manner.
For the interface design, the book by Johnson [34] provides a humorous but at the same time very important set of messages which are made clear to UI designers based on common errors made. Notable is chapter 2 of the referenced book, which contains a section describing what a user may experience as a complicated process to access functionality. It also has a rich description and explanation of the right way toggles should be used (which Tagasaurus employs, as shown in Section 2).

Materials and Methods
From the usability inspection methods defined in the seminal work of Nielsen [35], six of the seven inspection methods were applied to this design. The design presented in [36] provides a wealth of information based on case studies with seniors on how to organize controls and widgets within the viewing pane. Most notable is the effort to spatially spread the buttons and input fields around the screen rather than increase the size of the media context by 'nesting' controls in menus. This also coincides with the structure based perception principles stated in [37]. More inspection methods will be applied in later stages of the project when a larger user base has been established, which can take time due to the organic growth approach of open source software. Figure 1 displays the wireframe model [38] layout of the UI which the user will interact with during the tagging and general annotation process. In the center of the screen there is a display pane for the image on which annotations are to be focused, and, using the Bootstrap carousel [39], user clicks on the right or left edge will 'slide' the image to the following one in the list. The widgets for each of the emotional values are shown on the top left, where the widgets allow the user to drag a slider between minimum and maximum values. To the bottom left is the set of images which the user can toggle between, producing links. These toggles establish an explicit link between images rather than attempting to infer one. The textual description area on the top right allows a user to provide free form unstructured text for their description of the image. 'Tags' are produced after a user chooses to save the annotation set, and those are shown to the user in a pane to the bottom right for inspection, although no editing options are allowed. The tags have the stop words filtered [40] and are ordered in the sequence they appear.
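The tag production step described above (stop words filtered, tags kept in order of appearance) can be sketched as follows. This is a minimal illustration and not the application's actual code: the stop-word list, the tokenization rule and the removal of repeated words are all assumptions made for the example.

```javascript
// Hypothetical stop-word list; the list used by the application is not given in the text.
const STOP_WORDS = new Set([
  "a", "an", "and", "the", "of", "in", "on", "at", "to", "with", "is", "was", "my", "our",
]);

// Turn a free-form description into tags, ordered by first appearance.
function extractTags(description) {
  // Split into lowercase runs of letters, digits and apostrophes (assumed tokenization).
  const words = description.toLowerCase().match(/[\p{L}\p{N}']+/gu) || [];
  const seen = new Set();
  const tags = [];
  for (const word of words) {
    // Skip stop words and words already emitted (deduplication is an assumption).
    if (STOP_WORDS.has(word) || seen.has(word)) continue;
    seen.add(word);
    tags.push(word);
  }
  return tags;
}

console.log(extractTags("A sunny day at the beach with my family, the beach was warm"));
// prints [ 'sunny', 'day', 'beach', 'family', 'warm' ]
```

Keeping the tags in order of appearance means the tag pane reads like a compressed version of the description, which may make it easier for a user to check the result at a glance.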
Above the main image display, a set of buttons is shown offering functions such as 'loading an image', 'delete an image', 'main menu' (return to welcome screen) and 'export' for the data to be stored in a cross compatible format. Park and Kim [5] investigated how Facebook utilizes the six emoticons to support the analysis and organization of the information feeds towards its users. Although NLP techniques have made significant progress in inferring sentiment from text, practical issues still present myriad challenges for inferring sentiment and any emotion value [41]. The potential of false positives in the case of returning to faded memories may be a problem for some users. Avoiding them would require a solution relying on big data, since statistics and ML need more than a single user's data. This is not in line with a 'life blog' type of application [9]. The importance of sentiment and how emoticons can provide clear indicators of emotional states is discussed and demonstrated in [42], showing that the approach is applicable to smaller datasets and still quite accurate.
The scope for the wireframe model can be defined in a manner similar to how UI element actions are described in [35,43]. Let $X$ be the collection of images found in the folder read by the application by default, indexed by $i$: $X = [X_1, X_2, \ldots, X_N]$. Here, the number of images is $N$, and each image is $X_i$. Each $X_i$ is associated with the state of the user inputs on the UI:

$$X_i = \{E_i, M_i, T_i\}, \qquad (1)$$

where $E_i$ is the state of the emotion value widgets for image $i$ (with $E_{ij} \in \mathbb{R}$), $M_i$ is the binary (Boolean) state of selection for the different image link associations (with $M_{il} \in \{0, 1\}$), and $T_i$ is the set of tags produced from the description (with $T_{ik} \in \mathbb{Z}^+$, since each integer corresponds to a unique tag keyword element). Algorithm 1 provides a basic understanding of the data entry process with the pseudocode.

Algorithm 1 Tagasaurus Item Addition
...
11: db_store(image, tags, emotions, imageConnectionMatrix)
12: update_tagging_score(Score_T, Score_E, Score_M, Score_H)
13: end procedure
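To make the per-image state (emotion values, link states and tags) and the final storage step more concrete, the following is a minimal sketch under stated assumptions. The emotion names, the function names `makeAnnotation` and `saveAnnotation`, and the in-memory `Map` standing in for the database are hypothetical; tags are kept as strings here rather than the integer indices used in the formal description.

```javascript
// Hypothetical emotion dimensions; the text does not list the ones used.
const EMOTIONS = ["happy", "sad", "confused"];

// Build the annotation state for one image from the current UI inputs.
function makeAnnotation(sliderValues, linkedFiles, candidateFiles, tags) {
  const emotions = {};
  for (const name of EMOTIONS) {
    // Each emotion keeps its own real value, so several can be non-zero at once
    // (unlike a one-hot emoticon choice).
    emotions[name] = name in sliderValues ? Number(sliderValues[name]) : 0;
  }
  const links = {};
  for (const file of candidateFiles) {
    links[file] = linkedFiles.includes(file); // Boolean link state per candidate image
  }
  return { emotions, links, tags: [...tags] };
}

// Stand-in for the db_store step: an in-memory Map keyed by file name (assumption).
function saveAnnotation(db, fileName, annotation) {
  db.set(fileName, annotation);
  return db;
}

const db = new Map();
const annotation = makeAnnotation(
  { happy: 0.8, sad: 0.3 },
  ["beach2.jpg"],
  ["beach2.jpg", "city.jpg"],
  ["sunny", "beach"]
);
saveAnnotation(db, "beach1.jpg", annotation);
console.log(db.get("beach1.jpg"));
```

Storing the three parts side by side in one record keeps the save operation to a single write per image, which matches the single 'save' action described for the interface.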
The application is written in JavaScript (JS) [44], which in 2016 had by far the largest number of contributions compared to any other language, and Tambad et al. [45] provided graphics showing that in 2020 JavaScript was a leading language. The book by Crockford [44] provides a good discussion of the merits of JS, noting the use of closures [46] and promises for asynchronous programming by relying on task management via the event loop. To be in line with the vision of the complete user centric computing experience motivated in [33], the ElectronJS framework [47] is used. It offers a plethora of APIs for producing UI elements (including widgets) and can be used across multiple operating systems without changing the implementation. ElectronJS uses nodeJS [48] to provide the ability for the program to work with the local filesystem.
The saving operation places an image file from the user's computer into an internal folder. The database then registers this new file with the filename as its key. If there are images in the folder that are not in the database, the program will insert those as well with default annotation values (null for the different selections). The database leverages the relatively new technology of browser storage [49], where the data are kept on the client side. This is part of the paradigm of progressive web apps (PWAs) [50], which aim to offer an alternative to native mobile applications where user interactivity in a web app should not cease when the connectivity is interrupted. To maintain user actions, local storage is necessary via the browser, and, although ElectronJS is not a 'web' app, the browser facilitates this database paradigm. There are benefits over having to use 'local storage', which presented security issues, as noted in [51], and consequent inconsistent permission problems between browsers, which frustrated developers and users. This version of Tagasaurus uses WebSQL [52] to store the file names and annotation data. The code for the implementation is available on GitHub (https://github.com/mantzaris/Tagasaurus (accessed on 31 May 2021)).
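The synchronization rule described above (any image in the folder without a database entry is inserted with null annotation values) can be sketched as follows. This is an illustration only: a `Map` stands in for the WebSQL store, and the function names and record shape are assumptions. In a nodeJS context the folder listing would typically come from `fs.promises.readdir`, but a plain array is used here to keep the example self-contained.

```javascript
// Default annotation for an image with no user input yet (null selections).
function defaultAnnotation() {
  return { emotions: null, links: null, tags: null };
}

// Insert every file from the folder listing that has no database entry yet.
// Existing entries are left untouched so saved annotations are not overwritten.
function registerMissingImages(folderFiles, db) {
  const added = [];
  for (const fileName of folderFiles) {
    if (!db.has(fileName)) {
      db.set(fileName, defaultAnnotation()); // file name acts as the key
      added.push(fileName);
    }
  }
  return added;
}

// Example: one image is already annotated, two are new in the folder.
const store = new Map([
  ["beach1.jpg", { emotions: { happy: 0.8 }, links: {}, tags: ["sunny"] }],
]);
const newlyAdded = registerMissingImages(["beach1.jpg", "city.jpg", "forest.png"], store);
console.log(newlyAdded); // prints [ 'city.jpg', 'forest.png' ]
```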

Score Metrics
In order to provide the user with feedback on the annotation progress, some basic skill metrics are calculated based upon the data stored in the database:

$$\mathrm{Score}_A = \frac{100}{N} \sum_{i=1}^{N} I(A_i \neq \emptyset), \quad A \in \{T, E, M\}, \qquad (2)$$

where $I$ is the indicator function [53]. Each of these metrics is the percentage of images which have had some user information inserted for their type of image annotation. In particular, $\mathrm{Score}_T$, $\mathrm{Score}_E$ and $\mathrm{Score}_M$ are the percentages of tagged (T), emotion stamped (E) and meme connected (M) images, respectively. The score for the overall assessment of progress is based upon the harmonic mean, $H = N / \sum_{i=1}^{N} y_i^{-1}$, as it can be used for assessment purposes [54]. Here, each $y_i$ is the cardinality of each image entry's non-empty annotation elements from Equation (1) ($y_i = |X_i|$). Using an adjustment for the cases with zero entries [55], the applied score, $\mathrm{Score}_H$, is given by Equation (3), where $n_0$ is the set of indices of the image annotation entries which are empty, $n_0 = \{i : |X_i| = 0\}$. These four values will be presented to the user in the form of feedback to reinforce progress as it continues in a type of 'gamification' (also referred to as a powermeter) [56,57].

Figure 2 shows the screen that the user first encounters when the application is started. Below the title are four skill bars, where the first three are the scores computed from Equation (2), that is, the percentages of images that have annotations of that type. The fourth skill bar is the value defined in Equation (3), which is the zero entry adjusted harmonic mean of the proportion of the three metrics included for each image. These scores are a type of motivation, giving the user a degree of feedback on their efforts to annotate their image collections. Below the skill bars is the button to begin another session of annotating photos. The overall score based on the harmonic mean is given the name 'Awesomeness Score' in an attempt to draw enthusiasm from the user before delving into the process of annotating the images.
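The four progress scores can be illustrated with a short sketch. Several choices here are assumptions rather than the application's confirmed behaviour: the rules in `filledFlags` for when an annotation type counts as filled, the record shape, and in particular the zero adjustment, which uses one common correction (the harmonic mean of the non-zero values multiplied by the fraction of non-zero values). The exact adjustment applied in the implementation may differ.

```javascript
// Assumed rules for when each annotation type counts as filled for one image.
function filledFlags(a) {
  return {
    T: Array.isArray(a.tags) && a.tags.length > 0,
    E: Boolean(a.emotions) && Object.values(a.emotions).some((v) => v !== 0),
    M: Boolean(a.links) && Object.values(a.links).some(Boolean),
  };
}

// Compute the three percentage scores and a zero-adjusted harmonic mean score.
function computeScores(annotations) {
  const n = annotations.length;
  const counts = { T: 0, E: 0, M: 0 };
  const proportions = []; // per-image proportion of the three types that are filled
  for (const a of annotations) {
    const flags = filledFlags(a);
    let filled = 0;
    for (const key of ["T", "E", "M"]) {
      if (flags[key]) {
        counts[key] += 1;
        filled += 1;
      }
    }
    proportions.push(filled / 3);
  }
  // Assumed zero adjustment: harmonic mean over non-zero entries,
  // scaled by the fraction of entries that are non-zero.
  const nonZero = proportions.filter((y) => y > 0);
  let harmonic = 0;
  if (nonZero.length > 0) {
    const plain = nonZero.length / nonZero.reduce((sum, y) => sum + 1 / y, 0);
    harmonic = plain * (nonZero.length / n);
  }
  const pct = (count) => (n === 0 ? 0 : (100 * count) / n);
  return { T: pct(counts.T), E: pct(counts.E), M: pct(counts.M), H: 100 * harmonic };
}

// Example: one fully annotated image, one with tags only, one untouched.
const scores = computeScores([
  { tags: ["sunny"], emotions: { happy: 0.5 }, links: { "b.jpg": true } },
  { tags: ["city"], emotions: null, links: null },
  { tags: null, emotions: null, links: null },
]);
console.log(scores);
```

Because the harmonic mean is dominated by small values, an overall score built this way rewards annotating every image at least partially more than it rewards fully annotating only a few, which fits the aim of encouraging a completely annotated collection.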
The use of Bootstrap and Flexbox together ensures that the window can be scaled by the user manually in order to provide a comfortable window size to interact with, while maintaining the proportions necessary for setting the widget values and entering information. The implementation can accommodate different main image view proportions since the horizontal size is kept fixed for the window as the images are scrolled through and the vertical size can vary as needed to keep the image from stretching. There are no nested controls, and users can enter more text than can be held in the description box, which then provides a scrolling feature rather than distorting the proportions of the wireframe design. Large numbers of tags/keywords also produce a scrolling feature for users to examine all the tags without changing the font size. When a user presses the button 'save', the information for the annotation object associated with that image is created and saved into the database (populating the variable defined in Equation (1)). The button 'Return to Main' brings the user back to the welcome screen of Figure 2, where the scores on the annotation progress are updated with the most recent annotations stored in the database.

Results
Of the seven usability inspection methods defined in [35], six were applied to this design: heuristic evaluations, cognitive walkthroughs, pluralistic walkthroughs, feature inspections, consistency inspections and standards inspections. As stated in the article, 'it is possible to have regular developers serve as evaluators', and the expertise/experience of the authors was applied to ensure that a 'flat' UI design was presented to the users, avoiding the requirement for nested controls.

Discussion
The major contribution of Tagasaurus is its investigation of how to produce a diary of photos and the related tags, similar to lifeblogs [9,10]. It draws inspiration from the interface of Facebook, where users are given a separate toolbar to choose from a set of emoticons to store their sentiment and where images in content threads can be placed as 'memes'. The design presented provides a set of widgets storing the degree to which an emotion is associated with the images presented and allows the images in the collection to be 'linked' to other images for organization of memories. The descriptions and impressions panels presented allow for easy storage of a summary of the image perceptions and an overall journal of life stories.
This solution integrates the elements of the standard emotion scales, memes and textual descriptions in a non-nested set of tools in an interface. Tagasaurus is an open source solution allowing users to quickly install the solution and store the information locally. The interface is web-based, allowing the application to be easily installed cross platform as well by leveraging the ElectronJS framework.
Future work will explore summarization techniques for how a user can examine the different impressions and the associations produced between different images. A network diagram would be an intuitive visual depiction for such information.