Move, hold and touch: a framework for tangible gesture interactive systems

Technology is spreading in our everyday world, and digital interaction beyond the screen, with real objects, lets us take advantage of our natural manipulative and communicative skills. Tangible gesture interaction exploits these skills by bridging two popular domains in Human-Computer Interaction: tangible interaction and gestural interaction. In this paper, we present the Tangible Gesture Interaction Framework (TGIF) for classifying and guiding works in this field. We propose a classification of gestures according to three relationships with objects: move, hold and touch. Following this classification, we analyzed previous work in the literature to obtain guidelines and common practices for designing and building new tangible gesture interactive systems. We describe four interactive systems as application examples of the TGIF guidelines and we discuss the descriptive, evaluative and generative power of TGIF.


Introduction
Since Weiser's vision of ubiquitous computing [1], many branches of Human-Computer Interaction (HCI) have tried to obtain a seamless and digitally augmented interaction with the physical world. Jacob et al. formalized the transition from the Windows, Icons, Menus, Pointer (WIMP) paradigm to the post-WIMP era with Reality-Based Interaction, which takes advantage of natural human skills to interact in the real world [2]. In this scenario, the user needs to communicate with the system (or other users) in some manner. As early as 1980, Bolt used gestures and speech as a natural modality to interact with the system [3]. While the diffusion of speech interaction has been limited by its low social acceptance [4], gestures have thrived in several application domains. Since the origins of tangible interaction, many researchers have considered gestures as a means of communication in Tangible User Interfaces (TUIs). Fitzmaurice's PhD thesis on graspable user interfaces states: "By using physical objects, we not only allow users to employ a larger expressive range of gestures and grasping behaviors but also to leverage off of a user's innate spatial reasoning skills and everyday knowledge of object manipulations" [5]. A few years later, Fishkin wrote that tangible interfaces "all share the same basic paradigm-a user uses their hands to manipulate some physical object(s) via physical gestures; a computer system detects this, alters its state, and gives feedback accordingly" [6]. Even in Radical Atoms [7], Ishii's latest vision of tangible interaction, gestures and direct manipulation are the two modalities that allow users to control the system. Even though tangible gestures, i.e., gestures with physical objects, are often considered an integral part of tangible interfaces, many tangible interaction frameworks and models have focused mostly on the objects and on their meaning (e.g., Shaer et al.'s Token & Constraint [8], Ullmer and Ishii's TUIs [9]), without analyzing in depth which forms users' gestures can assume [10]. In 2011, Hoven and Mazalek defined Tangible Gesture Interaction (TGI) for the first time as an approach that combines the communicative purpose of gestures with the manipulation of the physical world typical of tangible interaction [10]. By leveraging our ability to think and communicate through our bodies and our manipulative skills [10], TGI is an interesting approach for designing applications in many domains, especially if the application aims at facilitating collaboration and reflection.
In this paper, we deepen the investigation started by Hoven and Mazalek [10] in order to shed additional light on the opportunity of gesturing with physical artifacts for digital interaction purposes. To this end, we present the Tangible Gesture Interaction Framework (TGIF), which aims at supporting the creation of new tangible gesture interactive systems by covering three fundamental aspects: abstracting, designing and building [11]. First, by modeling how tangible gestures are performed according to three main components (move, hold and touch) and which semantic constructs can be associated with tangible gestures, the framework helps in abstracting and reflecting about TGI. Figure 1 depicts the components of a tangible gesture, i.e., a gesture performed in relation to an object. Second, by analyzing existing work in the literature, we summarize the common practices that help designers choose appropriate gesture types and object forms for their applications. Finally, the existing work in the literature is used to identify the most popular technologies and approaches for the recognition of tangible gestures. In Section 5, we present four systems as application examples of TGIF. We conclude the paper by discussing our findings and by pointing to the expected future work in this field.
Figure 1. The Tangible Gesture Interaction Framework (TGIF) syntax of tangible gestures: a gesture based on optional move, hold and touch components, related to one object.

Related Work
Hoven and Mazalek represented Tangible Gesture Interaction as the intersection of tangible interaction and gestural interaction, bridging advantages from both approaches, i.e., the affordance of manipulating the physical world from tangible interaction and the communicative intent from gestures [10]. Even if TGI has a relatively short history, it is rooted in two well-established branches of HCI. Therefore, this section first analyzes related work in the fields of tangible interaction and gestural interaction, and then specific research related to gestures with objects. Finally, we describe our contribution in relation to the state of the art.

Tangible Interaction
The birth of tangible interaction is often associated with the manifesto of Wellner et al. [12], who invited researchers, as early as 1993, to bring human activities back from the personal computer to the real world. Fitzmaurice's Graspable User Interfaces [5] was one of the first PhD theses to analyze tangible manipulation as an interface for interacting with computers. In 2000, Ullmer and Ishii presented the MCRpd model [9], which helped in framing and developing Tangible User Interfaces for almost a decade. Ullmer and Ishii's work focused on the embodiment of digital data in physical objects and described very well how users can directly manipulate those data, but it did not consider the variety of human skills that can be used for the interaction. In 2006, Hornecker and Buur [13] extended this data-centered vision of tangible interaction by including embodied interaction, bodily movement and embeddedness in real space among the characteristics of tangible interaction, besides tangibility, materiality and physical embodiment of data. Thanks to Hornecker and Buur's framework [13], tangible interaction enlarged its scope, embracing full-body interaction and spatial interaction. Two years later, through Reality-Based Interaction, Jacob et al. [2] paved the way for post-WIMP interfaces, in which the involvement of human skills is the key factor to trade off against application requirements. Tangible interaction is probably the HCI branch that is able to obtain the best trade-offs for exploiting human skills. Hoven et al. [14] recently summarized the current direction of this research field by identifying three main foundations of tangible interaction: the physical world, human skills and computing. With these broad foundations, tangible interaction is approaching the broader domain of ubiquitous computing. Nevertheless, tangible interaction retains some peculiar qualities: the interaction is generally direct, integrated and meaningful. Since Fitzmaurice's seminal work [5], several frameworks have been presented in the tangible interaction domain. Mazalek and Hoven [11] classified these frameworks according to three types: abstracting, designing and building. Tangible interaction frameworks in the abstracting category generally focus on the objects [8,9], rather than on the actions performed with the objects, i.e., gestures. Fishkin [6] made the first attempt to also consider gesture semantics in tangible interaction, by classifying the metaphors that can be associated with actions (verbs), besides those associated with objects (nouns). Frameworks in the designing category [15,16] often underline the importance of actions with a perceptual-motor centered view of tangible interaction, but do not attempt to classify the types of actions that can be performed through our perceptual-motor skills. In particular, Wensveen et al. [16] provide detailed guidelines on how to couple user actions with system feedback and feedforward, while Djajadiningrat et al. [15] suggest using "formgiving" in object design to inspire user actions.

Gestural Interaction
Gestures are one of the first modalities developed by humans to communicate, and some theories argue that speech actually originated from gestures [17]. Because of their common origin and aim, gesticulation and gestures still have great importance in human verbal interaction. For this reason, it is not surprising that Bolt's Put-That-There system [3], often considered the first gestural interface for interacting with a computer, combined pointing gestures and vocal commands to operate a Graphical User Interface. Quek et al. further investigated the possibility of combining gestural and vocal interaction in HCI, showing that gestures and speech are often generated by the same semantic intent of communicating an idea [18]. Accordingly, Quek's effort to classify gestures focused mostly on their communicative role, distinguishing between acts (mimetic or deictic) and symbols (referential or modalizing) [19]. Karam's thesis [20] is probably the most extensive effort to frame all gestural interaction. Karam classified gestures along four categories: gesture style, application domain, input (enabling technologies) and output (system responses). For the gesture style classification, Karam identified five types of gestures: deictic (pointing), gesture-speech approaches (gesticulation), manipulations, semaphores, and language-based gestures (sign language). Semaphores, which can be either static or dynamic, are considered a universe of symbols to be communicated to the machine [18], and thus have no particular intrinsic meaning. Iconic gestures and pantomimes are considered part of gesticulation, and are thus generally performed in relation to speech.
Because of the great variety of gesture types (from both a physical and a semantic point of view) and application domains, it is difficult to derive a generic description of gestures for HCI. In 1993, Baudel and Beaudoin-Lafon [21] formalized some principles for designing free-hand gestures as well as the main steps necessary to recognize gestures.
Other works focus only on particular types of gestures. Kammer et al. proposed a semiotic analysis of multi-touch gestures [22], with a grammar for describing the syntactics of gestures performed on flat surfaces. Golod et al. proposed some principles for designing gestures for micro-interactions [23], i.e., interactions outside the main task that last less than four seconds. All these studies focused on free-hand gestures, without considering physical objects as part of the interaction.

Gestures with Objects
The first attempt to theorize gestural interaction with objects can be dated to 2008, when Vaucelle and Ishii presented the definition of Gesture Object Interfaces [24]. Gesture Object Interfaces are opposed both to classical GUIs, which offer neither manipulation benefits nor gestures, and to manipulatory interfaces, which aim to exert a functional movement of the object in order to obtain a result in the real world. Gesture Object Interfaces, instead, are intended to animate objects with meaningful gestures, introducing what the authors called "an identity reinvention". For Vaucelle and Ishii, powerful metaphors associated with gestures can animate or give a new identity to objects, while interacting with physical objects facilitates spatial cognition. In 2011, Hoven and Mazalek presented the first definition and characterization of Tangible Gesture Interaction (TGI) [10]. They provided a broad review of existing systems that adopted TGI and identified several promising application domains for TGI, encouraging further research in this field.
Some recent frameworks analyze in detail only particular types of gestures with objects, and thus only a subset of human perceptual-motor skills. Wimmer's GRASP model encompasses both semantics (goal and relationship) and physicality (anatomy, setting and properties), but keeps the focus only on the way we grasp objects in the hand [25]. Wolf presented an even narrower taxonomy, which analyzes the microgestures that can be performed while grasping objects [26]. Similarly, Valdes et al. investigated only gestures performed with active tokens [27].

Proposed Contribution
The analysis of related work shows that, while gestures are often used in tangible interaction, the form that they can assume is generally not discussed. Conversely, a considerable part of the literature on gestural interaction considers only free-hand gestures. Previous studies that analyzed gestures with objects in detail focused only on particular types of gestures. Therefore, a comprehensive framework that analyzes gestures with objects for digital interaction purposes was still missing.
In this paper, we aim at extending the work of Hoven and Mazalek [10] by proposing the Tangible Gesture Interaction Framework (TGIF), which offers a complete view of Tangible Gesture Interaction. We will show how TGI builds on Hoven et al.'s [14] tangible interaction foundations, exploiting human skills to interact digitally with the physical world, and computing to recognize gestures and give feedback to the user.
TGIF aims at helping designers of interactive systems throughout the whole creation process. Indeed, while most tangible interaction frameworks found in the literature focus on only one phase of the creation of a tangible interactive system, TGIF aims at covering all three framework types (abstracting, designing and building) identified by Mazalek and Hoven [11]. For abstracting, TGIF proposes a communication model based on tangible gestures and analyzes the syntactics and semantics of tangible gestures. For designing, it illustrates the typical design process and common practices for different application domains and object affordances. Finally, for building, it shows the possible technological approaches for recognizing tangible gestures and references the most common implementations for each approach.

TGIF: Abstracting on Tangible Gestures
Since Gesture Object Interfaces, Vaucelle and Ishii have depicted tangible gestures as a language at the users' disposal: "Gestures scale like a language, have different contexts, different meanings and different results" [24]. While spoken and written languages use words, TGI uses gestures with objects as communication signs. The study of signs and the study of languages both fall into the broad discipline of semiotics, which has a long history also outside of the HCI community. For Ferdinand de Saussure, who is considered one of the fathers of linguistics, a sign can be represented through a signifier-signified pair [28]: for example, the word "tree" is associated with the concept of the tree, as universally known by English speakers. In general, however, the interpretation of a sign is a much more complex task than resolving a signifier/signified bijective function. Buhler's Organon model offers a richer vision of the sign, which now has three communicative functions (expressive, conative and referential) [29]. Buhler's triadic vision of the sign adds an important concept that the signifier-signified dyadic vision of Saussure cannot address: the sign is used to communicate between an addresser and an addressee, and the interpretation of the sign is not universal and could differ for the addresser and the addressee. Inspired by the Organon model [29], in Figure 2 we depict the communication model of tangible gesture interaction as a triadic function of the sign. In this model, the user performs a tangible gesture (expressive function), which is a sign expressed in the physical world with an associated meaning in the digital world (referential function); the computer receives and interprets this sign (conative function) and acknowledges the user with feedback. Considering the model in Figure 2, our framework focuses on two aspects: the physical phenomenon, i.e., the syntactics of tangible gestures, and its referential function, i.e., semantics. Syntactics and semantics are two of the three branches of semiotics, which studies signs in different domains, with some examples also in HCI [22,30]. In semiotics, semantics focuses on the meaning of symbols as conventionally defined, while a third branch, pragmatics, explains how these symbols are used, taking into account also the prior knowledge of the user and the context of the utterance. In tangible gestures, semantics has a relatively short history and it is difficult to find standardized conventions; thus, pragmatics generally affects most tangible gesture meanings. Because of the novelty of the field and the purpose of the article, we will discuss the two branches under the comprehensive term of semantics. The communication model of Figure 2 assumes that the users and the computer have a shared knowledge of the possible signs and of their meanings; otherwise, the communication could be ineffective. As suggested by De Souza for the broader field of HCI [30], the TGI designer has the important role of communicating this knowledge to the users. TGIF offers additional insights on this topic in Section 3.2 and in Section 4.1.2.
The Tangible Gesture Interaction Framework (TGIF) is based on Hoven and Mazalek's [10] definition of tangible gesture interaction: "the use of physical devices for facilitating, supporting, enhancing, or tracking gestures people make for digital interaction purposes. In addition, these devices meet the tangible interaction criteria". In 2011, Hoven and Mazalek [10] proposed Ullmer and Ishii's framework for Tangible User Interfaces [9] as the tangible interaction criteria for TGI. Since then, the definition of tangible interaction and the research interests of the Tangible, Embedded and Embodied Interaction (TEI) community have evolved considerably, and Hoven et al.'s [14] tangible interaction foundations (physical world, human skills and computing) offer broader and more recent tangible interaction criteria for TGI. According to the three foundations, in TGI the user must interact with real-world objects, using her or his cognitive and perceptual-motor skills for gesturing, while an underlying computation recognizes these gestures. This paper deals with the physical world and the perceptual-motor human skills in the syntax of tangible gestures (Section 3.1), and with the cognitive skills necessary to understand and remember the different meanings that can be associated with gestures (Section 3.2).

TGIF Syntax: Touch, Hold and Move
TGIF syntax describes how tangible gestures can be physically performed by the user. Tangible gestures are essentially pairs generated by the combination of the physical world, i.e., objects, and perceptual-motor human skills, i.e., gestures. In particular, TGIF syntactics analyzes gestures by decomposing them into three fundamental interaction components, move, hold and touch, applied to a physical object, as depicted in Figure 1. The choice of the move, hold and touch components is well rooted in the history of tangible interaction.
Djajadiningrat et al. presented a perceptual-motor centered approach to tangible interaction based on movement of the body as well as movement of product components [15]. Similarly, Matthews stressed the potential of using movement as a rich interaction modality [31]. Price and Rogers identified three types of physicality: physical movement, interaction with physical tools and combining artifacts [32].
Fishkin et al. considered hold as an important interaction modality in their seminal work "Squeeze me! Hold me! Tilt me!" [33]. Wimmer investigated the way we hold objects as an interaction modality in [25,34].
Wobbrock et al. [35] and Kammer et al. [22] explored touch gestures to interact with surfaces. Touch interaction is a current trend even in commercial products, with more and more devices integrating touch gestures of different kinds. Even if holding an object also implies touching it, in TGIF we consider them as two different components. Indeed, touch and hold also involve different human haptic receptors: touching an object activates tactile receptors in the skin, while holding an object is mostly related to receptors in joints, muscles and tendons [36]. Move, instead, relates more to vision and proprioception.
Move, hold and touch can be combined to obtain a rich variety of tangible gestures, as depicted in Figure 3. In the next sections, we will deepen the analysis of these various combinations with examples from the literature. To further enrich the vocabulary of gestures, the designer of a TGI system can also take into account different properties that can be associated with the basic components. Time is a property that can be considered for move, hold and touch: how long an object is moved, touched or held can be mapped to different behaviors in the application. Similarly, the amount of movement, or the number of contact points of a touch gesture, can be used as an additional degree of freedom. The amount of pressure, or force, can be considered as an additional property of hold and touch. Move is probably the component that offers the most degrees of freedom: the designer can also consider speed, direction, angle, etc.
As in languages, where words are separated by spaces in text or by pauses in speech, gestures are also delimited in time. Gesture segmentation is a common issue in Human-Computer Interaction, acknowledged since Charade [21], one of the first gestural interaction systems. Indeed, human movements and object manipulations are typically continuous and need to be separated. Golod et al. [23] delimit a "gesture phrase" with an activation and a closure, both characterized by muscular tension. A gesture phrase can be composed of one single gesture or of several microinteractions to operate incremental actions.
In TGIF, a gesture is generally delimited in time by changes in a component or in a property of one component. For example, a static posture gesture is delimited by movements before and after the gesture. When no clear delimitation can be identified in a gesture, external triggers are used, for example speech or other gestures.
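As an illustration of this delimitation rule, the following minimal Python sketch splits a stream of timestamped readings into segments whenever the move, hold or touch state changes. The `Sample` type, the `segment` function and the Boolean encoding of the components are assumptions made for this example only; they are not part of TGIF, and a real system would also have to deal with sensor noise and with component properties such as pressure or speed.

```python
from dataclasses import dataclass
from typing import List, Tuple

State = Tuple[bool, bool, bool]  # (move, hold, touch), hypothetical encoding


@dataclass(frozen=True)
class Sample:
    """One reading: which components are active at time t (in seconds)."""
    t: float
    move: bool
    hold: bool
    touch: bool

    def state(self) -> State:
        return (self.move, self.hold, self.touch)


def segment(samples: List[Sample]) -> List[Tuple[float, float, State]]:
    """Split a stream into (start, end, state) segments.

    A new segment starts whenever any component changes. Segments in
    which no component is active are treated as pauses and dropped.
    """
    segments: List[Tuple[float, float, State]] = []
    if not samples:
        return segments
    start, current = samples[0].t, samples[0].state()
    for s in samples[1:]:
        if s.state() != current:
            if any(current):
                segments.append((start, s.t, current))
            start, current = s.t, s.state()
    # Close the last open segment at the final timestamp.
    if any(current):
        segments.append((start, samples[-1].t, current))
    return segments


# Usage: the object is held still, then moved while held, then released.
stream = [
    Sample(0.0, False, False, False),
    Sample(1.0, False, True, False),
    Sample(2.0, False, True, False),
    Sample(3.0, True, True, False),
    Sample(4.0, True, True, False),
    Sample(5.0, False, False, False),
]
for start, end, state in segment(stream):
    print(start, end, state)
```

In this sketch the duration of each segment (end minus start) is also available, so it could be mapped to different application behaviors, in line with the time property discussed above.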

Move, Hold and Touch: Single Gestures
The simplest tangible gestures consist of only one basic component; they are generally deictic gestures. People often touch an object or hold it in their hands to specify which object they are referring to. Rosebud [37] shows a typical example of a hold gesture: children hold toys in front of the machine in order to access the stories associated with each toy. More degrees of freedom and expressivity can be added by considering additional properties of the components. One can touch an object in different manners: with one or more fingers, or with the whole palm. Pasquero et al. [38], for example, distinguished between touching the wristwatch face with two fingers and touching it with the whole palm. One can also hold an object while applying forces, i.e., pressure, as in the squeezing gesture of tangible video bubbles [39]. Although this means losing the important haptic feedback given by contact with the object, one can also perform gestures in the proximity of an object. In the ReachMedia application [4], users first have to hold an object to select its related media content, and then they perform free-hand move gestures to browse the content. Although move, hold and touch can occur singly in TGI, richer and more expressive gestures can be obtained by combining two or even three components. As in languages, tangible gesture components can be considered as the basic phonemes for constructing complex words. An example of the possible combinations is presented in Figure 3.

Hold + Touch
As depicted in Figure 3, by combining hold and touch we obtain grasps. A grasp is considered as the way the user holds an object in the hand, and thus the points of the object s/he touches. Grip is often used in the literature as a synonym of grasp [40]. It is worth noting that grasps are static gestures and can be compared to postures in free-hand gestures. Wimmer's GRASP model [25] offers several guidelines for designing and recognizing grasp postures, which TGI designers should follow for this type of gesture. The "Human Grasping Database" is another powerful tool for understanding all the physical forms that grasp gestures can assume [41]. Different grasps can be applied to the same object as gestures for changing the object's function, as in the Microsoft multitouch pen [40] and Taylor and Bove's Graspables [42]. Grasps are often used for mode switching [40] in TGI applications and are associated with the "verb" metaphor of Fishkin [6]: one grasps the object as an "X" in order to make it behave as an "X".

Hold + Move
This class of tangible gestures includes all dynamic gestures made by moving an object while holding it. The held object has a very important role because different objects can give different meanings to the same movement. Conversely, different movements can give different meanings to the same multifunctional object: for example, the Nintendo Wii Remote and the Sony PS Move controllers are not only physically animated by gestures, but the user can, with proper gestures in a given context, "reinvent its identity" [24]. Another example is provided by Vaucelle and Ishii, who also presented gestures performed with a doll to collect movies [24]. Gestures with a physical object in the hand were first categorized by Ferscha et al. [43]. The Smart Gesture Sticker project, instead, showed how the user could perform gestures with everyday objects by simply attaching a wireless accelerometer to them [44]. Pen rolling [45] is a particular hold + move gesture: the movement is associated with the held object but not with the hand.

Touch + Move
This category includes all dynamic gestures that the user performs by touching an object, i.e., s/he touches the object and then moves the finger or the whole hand over its surface. The user can interact with a standalone object in the environment or with a body-worn object. It is worth noting that the user generally does not need to move the object, while his/her fingers or the whole hand swipe over its surface to perform gestures. Wobbrock et al. presented an extensive taxonomy of touch gestures [35]. Although the study focused on tabletop surfaces, and thus on touch gestures that are generally performed in a GUI, it offers interesting insights for designing touch + move gestures. An example of a dynamic touch gesture on a wristwatch touchscreen can be found in [46]: the user can swipe around the bezel to select the items of a calendar. Although the prototype implied the presence of a GUI, the gesture follows the typical paradigm of round watches. In fact, the items to be selected in the calendar were arranged in a circle near the bezel, like the hours on a watch. Instead of interacting on the bezel, Perrault et al. proposed touch gestures on the wristband of the watch [47]. In TZee, touch gestures on a truncated pyramid serve as various commands for manipulating 3D digital objects [48].

Move + Hold + Touch
Tangible gestures that belong to this class are the most complex. These gestures are generally extensions of the previous classes and are obtained by simultaneously combining the three interaction components: move, touch and hold. Often, they are just a composition of two simpler combinations, i.e., a grasp and a hold + move gesture, or a grasp and a touch + move gesture. For example, the user can grasp an object in a particular way in order to activate a special modality, and then s/he can move it to perform a gesture in the air. Similarly, a grasp posture could be followed by a dynamic touch gesture on the object held in a particular way. In this case, the object is static in the hand, while the fingers move to perform touch gestures on its surface. An example of this category can be found in [40]: Song et al. combined a cylindrical grasp on the MTPen with a swipe on the surface of the pen to define a command for turning the page of a digital book. Wolf extensively analyzed these gestures in [26], proposing a large taxonomy of the best microinteractions to be associated with three different grasps. The purpose of Wolf's microgestures is interaction for secondary tasks, while grasping the object is often associated with the primary task, for example, turning the steering wheel to drive a car. An implementation of microinteractions on the steering wheel can be found in [49]. It is also possible to find gestures that do not imply a movement of the forearm but only of the object. This can be achieved by manipulating the object with the fingers, e.g., for bending an object [50]. These latter gestures often alter physical properties of the object, either permanently or temporarily. A particular preemptive grasp is generally still needed, but, in these cases, its only purpose is to allow the subsequent movement of the object.
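The combinations described in the preceding subsections can be summarized in a small lookup. The following Python sketch maps the set of active components to the name of the corresponding gesture class; the function name `classify` and the returned labels are hypothetical choices for illustration, and the sketch deliberately ignores component properties (time, pressure, speed) and sequential compositions such as a grasp followed by a touch + move gesture.

```python
def classify(move: bool, hold: bool, touch: bool) -> str:
    """Name the gesture class for a combination of active components.

    The labels follow the subsection titles of the text; the function
    itself is only an illustrative sketch, not part of the framework.
    """
    if move and hold and touch:
        return "move + hold + touch"
    if hold and touch:
        return "hold + touch (grasp)"
    if hold and move:
        return "hold + move"
    if touch and move:
        return "touch + move"
    if move:
        return "move"
    if hold:
        return "hold"
    if touch:
        return "touch"
    return "no gesture"


# Usage: a static grasp, then the same grasp with a movement in the air.
print(classify(move=False, hold=True, touch=True))
print(classify(move=True, hold=True, touch=True))
```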

Extension to More than One Object and Full Body Interaction
All the gestures described above involve only one object, which can be either handheld or a standalone object that can be touched. Gestures with more than one object can be considered too. However, when two objects are combined in the hands, the relationship between them introduces many degrees of freedom for the interaction designer. Most of the implications that arise when interacting with multiple objects are explained by the principles of Ullmer and Ishii's tangible interaction [9], according to the spatial, constructive and relational categories. Therefore, the framework proposed in this paper aims to classify tangible gestures performed with only one object. As suggested by Hoven and Mazalek [10], two-handed interaction and full-body interaction with objects should also be considered in TGI. Gestures performed with two hands on one object can be easily classified using TGIF by analyzing the gestures of the two hands together if they are symmetric, or separately if they are different. For example, the squeezing gesture on the tangible video bubbles [39] is performed with a symmetric action of the two hands. Conversely, the aforementioned combination of grasp and dynamic touch gesture for the page-turning command on the MTPen [40] could be performed with two hands: one hand makes the grasp posture while the other performs the touch gesture to flip pages. Other parts of the body could also be used. For some particular scenarios, the user could touch, hold or move an object with the mouth: the framework can be applied in the same manner as for the interaction with the hand.
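The rule for two-handed gestures on one object can be sketched as follows. In this minimal Python example, "symmetric" is approximated, as an assumption of the sketch, by the two hands activating the same set of components; the names `Components` and `gestures_to_classify` are hypothetical and only serve the illustration.

```python
from typing import FrozenSet, List

# A gesture is described here only by its active components,
# a subset of {"move", "hold", "touch"} (hypothetical encoding).
Components = FrozenSet[str]


def gestures_to_classify(left: Components, right: Components) -> List[Components]:
    """Return the gestures to analyze for a two-handed interaction.

    Symmetric gestures (same components on both hands, an assumption
    of this sketch) are analyzed together as one gesture; different
    gestures are analyzed separately, one per hand.
    """
    if left == right:
        return [left]
    return [left, right]


# Usage: a symmetric two-handed squeeze, then a grasp with one hand
# combined with a dynamic touch gesture performed by the other hand.
squeeze = gestures_to_classify(frozenset({"hold"}), frozenset({"hold"}))
page_turn = gestures_to_classify(
    frozenset({"hold", "touch"}), frozenset({"touch", "move"})
)
print(len(squeeze), len(page_turn))
```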

TGIF Semantics: Meanings of Objects and Gestures
Several semantic classifications exist for objects [6,9,16] and gestures [6,20,22], but few consider the semantic constructs that can be associated with gesture-object pairs. Semantics is an important aspect to be considered in the design of tangible gestures. This facet of TGIF deals with a particular foundation of tangible interaction: the cognitive human skills needed to understand and remember gesture meanings. Referring to Figure 2, a tangible gesture is a sign performed in the physical world that is generally associated with an action in the digital system. The symbol associated with the sign is generally represented as the digital world in Ullmer and Ishii's model [9] and as the reference in Buhler's Organon model [29]. In order to have effective communication, the knowledge of this reference should be unambiguous for both the addresser (who performs the sign, e.g., the user at the top of Figure 2) and the addressee (who interprets the sign, e.g., the computer or the other user in Figure 2). Two approaches are possible for sharing this knowledge: the system designers explicitly share the vocabulary of gestures and their relative meanings, or they try to convey this information implicitly by embodying the interaction in the gesture-object pairs. In this latter case, the objects should provide enough affordances to make the user guess the possible gestures [30,51]. Section 4.1.2 offers more insights about how to convey the possible gestures through object form. Even when the tangible gesture vocabulary is communicated explicitly, having strong associations between the physical world (gesture-object pairs) and the digital world (digital objects and actions) could help decrease the learning time and facilitate remembering the tangible gesture vocabulary. However, metaphors can break at some point and they are not always the best solution for making the user understand how the system actually works [52]. Therefore, the designer should always consider the full range of semantic constructs and choose the most appropriate physical-digital associations according to the application requirements.
In TGI, objects and gestures can have either a weak or a strong association with the referent. The object could be a multipurpose tool or an iconic object that closely resembles the referred object. Similarly, the gesture could be defined arbitrarily (semaphore) or be a movement that we typically perform in our everyday life (metaphor). Separating weak and strong references into only two distinct classes is generally not possible: following the approach of Koleva et al. [53], we instead represent the possible semantic combinations of the object-gesture pair in a two-dimensional continuum. In Figure 4, we mapped most of the tangible gesture semantic constructs that we found in the literature. As an example of each construct, we consider a tangible gesture that can be associated with a function for turning on a light. Fishkin's metaphors [6] can be easily identified in our two-dimensional representation of semantic constructs. The none metaphor occurs when both object and gesture have a different representation than in the digital world, i.e., they are both symbolic, or both arbitrarily defined by the designer. Shaking a pen (in this context a multipurpose tool) to turn on the light is an example of low coherence for both gesture and object. Fishkin's verb metaphor has a corresponding representation only for the gesture. The object can be a multipurpose tool or an object whose identity is transformed by the gesture. With identity reinvention, the real object is semantically transformed into the referenced object and behaves like it [24]. As an example of an identity reinvention gesture, one can hold a pen like a torch in order to light up a room. Conversely, noun is a metaphor associated with objects that have a strong coherence with the reference in the digital world but with an action that has low coherence, i.e., a semaphore [18,20]. For example, the user can swipe over a lamp in order to turn on the light. For gesture-object pairs that both have a coherent reference with the digital world, Fishkin distinguishes between noun+verb metaphors and full metaphors. In the case of full metaphors, there is a complete overlap between the physical system and the digital system. Indeed, with direct manipulation the user no longer has to translate a metaphor, nor to reason about similarities: the user can directly modify the state of the system by altering some parameters. The communicative intent of tangible gestures is partially lost; however, the interaction is still explicit, and we therefore included direct manipulation in our two-dimensional continuum. Deictic gestures are gestures used to identify an object or to draw attention to it. Deictic gestures generally do not have a particular meaning associated with them, although their forms (like pointing or taking an object in the hand) are common in our everyday gestural communication. Finally, embodied metaphors are metaphors based on simple concepts (embodied schemata) derived from our everyday experience [54], and they can be used to design intuitive gestures that can be associated with objects. In direct manipulation, deictic gestures and embodied metaphors, the object could be a container with low coherence or could have the same representation in the digital world. For this reason, their representation spans the whole object continuum.
It is worth noting that the reference associated with an object-gesture pair can change over time and space even in the same application, according to context information. Moreover, the interpretation of metaphors can vary considerably among users, according to their personal backgrounds. As an example, Hoven and Eggen showed how personal objects can assume particular meanings and can serve as a link to memories [55].
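The two-dimensional continuum of object and gesture coherence can be sketched as a small data structure. This is only an illustrative sketch: the `SemanticPoint` class, the `fishkin_region` function, the numeric coordinates and the 0.5 threshold are all hypothetical and do not come from Figure 4. Since the text stresses that weak and strong references form a continuum, binning into Fishkin's regions is a convenience for discussion, not a claim about where boundaries lie.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticPoint:
    """Position of a gesture-object pair in the continuum (values are assumptions)."""
    object_coherence: float   # 0.0 = multipurpose tool, 1.0 = iconic object
    gesture_coherence: float  # 0.0 = arbitrary semaphore, 1.0 = everyday metaphor


def fishkin_region(point: SemanticPoint, threshold: float = 0.5) -> str:
    """Coarsely bin a point into one of Fishkin's metaphor regions.

    The threshold is an arbitrary illustrative cut, not part of TGIF.
    """
    strong_object = point.object_coherence >= threshold
    strong_gesture = point.gesture_coherence >= threshold
    if strong_object and strong_gesture:
        return "noun+verb or full"
    if strong_gesture:
        return "verb"
    if strong_object:
        return "noun"
    return "none"


# The three light-switching examples from the text, with made-up coordinates.
examples = {
    "shake a pen": SemanticPoint(0.1, 0.1),
    "hold a pen like a torch": SemanticPoint(0.1, 0.9),
    "swipe over a lamp": SemanticPoint(0.9, 0.1),
}

for name, point in examples.items():
    print(f"{name}: {fishkin_region(point)}")
```

Direct manipulation, deictic gestures and embodied metaphors are deliberately left out of this sketch, because they span the whole object axis and cannot be represented by a single point.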

Classification of TGI Systems
Table 1 summarizes the classification of tangible gestures found in existing systems. This classification does not aim to be complete, since gestures with objects have been adopted in many systems; rather, it offers several examples from all the classes described throughout the paper. This classification of previous systems has also been used to derive common practices for designing and building TGI systems. For each system, we specified the components involved in gestures, i.e., move (M), hold (H) and touch (T), the objects used for gesturing, the application aim, and the technological approach used to recognize gestures.
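One row of such a classification can be sketched as a record with the fields listed above. The `Component` flag, the `TGISystem` class and the `label` helper are hypothetical names introduced for illustration. The example entry paraphrases how MTPen is described elsewhere in this paper (hold + touch gestures on a pen, capacitive sensing); it is not copied from Table 1.

```python
from dataclasses import dataclass
from enum import Flag, auto


class Component(Flag):
    """The three TGIF gesture components; they can be combined with `|`."""
    MOVE = auto()
    HOLD = auto()
    TOUCH = auto()


@dataclass(frozen=True)
class TGISystem:
    """One classified system: components, object, application aim, technology."""
    name: str
    components: Component
    obj: str
    application: str
    technology: str


def label(components: Component) -> str:
    """Render a component combination in the paper's notation, e.g. 'hold + move'."""
    ordered = [("hold", Component.HOLD), ("touch", Component.TOUCH), ("move", Component.MOVE)]
    return " + ".join(name for name, flag in ordered if flag in components)


# Illustrative entry, paraphrased from the description of MTPen in the text.
mtpen = TGISystem(
    name="MTPen",
    components=Component.HOLD | Component.TOUCH,
    obj="pen",
    application="content production (changing ink mode)",
    technology="capacitive sensors embedded in the object",
)

print(mtpen.name, "->", label(mtpen.components))
```

Using a flag type rather than three booleans keeps combinations such as hold + touch or touch + move as single values that can be compared and filtered.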

Engineering TGI Systems
Developing a tangible gesture interactive system requires an iterative process that encompasses both interaction and system design. This process is shown in Figure 5; the TGIF taxonomy serves as a reference framework for reasoning about all the possible gesture-object pairs and their meanings. After investigating the user requirements and the aim of the application, designers should understand which types of gesture-object pairs are suitable for the desired application. For example, gestures with components like hold and touch can facilitate the communication of emotions between users, while movement can improve the expressivity of a gesture. The objects used for tangible gestures also have an important role in the definition of the gesture components: bulky objects cannot be held in the hand, and in this case touch + move gestures are more suitable; deformable objects, like pillows, offer affordances for applying pressure, either while touching or holding them. Moreover, object affordances can facilitate the use of metaphors for associating particular system behaviors with the gesture-object pairs. Many examples exist in the literature to inspire designers and some of them are presented in this article, but exploring new gesture-object pairs and metaphors is also fundamental to improving the interaction experience. Designers should iterate several times through the choice of gesture-object pairs by testing the designed interaction with users. While the first iteration could be done with a mockup, further steps should involve working prototypes that recognize user gestures. An alternative approach for designing gestures is asking users which gestures they would perform for the different commands that could be imagined in a given application. Indeed, gesture elicitations have often been used to derive gesture taxonomies and improve the user experience of particular interfaces [27,35,50]. Morris et al. [65] recently provided guidelines to obtain unbiased gesture taxonomies from the participants in elicitation studies. Several technologies exist for recognizing the different types of tangible gestures; however, sometimes the technologies are not ready for recognizing the desired gestures with enough accuracy or power/space efficiency. Depending on the availability of technology and resources, the designed tangible gestures need to meet the implementation requirements and may need to be refined or modified during the subsequent iterations that lead to the final working system.

Designing Tangible Gesture Interaction
To facilitate the work of the designer, we derived guidelines for designing TGI systems by reviewing common practices in the literature. This section presents two aspects: the first subsection describes common practices for the most popular application domains; the second analyzes properties of everyday objects to find affordances for possible gestures.

Common Practices for Popular Application Domains
Hoven and Mazalek [10] identify several application domains for TGI: communication and collaborative applications, education, collaborative design, mobile applications, entertainment and gaming. By analyzing examples in the literature, we identified a few additional domains, i.e., communication of emotions, individual and collaborative production, and control.
In the domain of emotional design, hold and touch assume a very important role. Hold and touch are often associated with intimacy in social interactions and have a very important role in childhood [66]. Several artistic exhibitions also make use of movement to communicate emotions, often in association with music or lights [67]. Storytelling is a domain where tangible gestures are particularly interesting. Objects play an important role in stories [37,57] and can be animated by children through hold + move gestures [24]. In learning and education, objects are useful for the reification of abstract concepts: by holding and moving objects it is possible to highlight relationships between different concepts embodied by the objects [68]. In work environments, where efficiency in content production is crucial, tangible gestures offer quick, easy-to-remember shortcuts compared with traditional GUIs or dedicated buttons [40]. The MTPen [40] adopted different hold + touch gestures to change ink mode, while touch + move gestures performed while holding the pen provided additional shortcuts. In general, introducing hold + touch gestures to change the operating mode of a multipurpose tool reduces the number of interactive objects, while preserving part of the tangible affordances provided by the manipulation of a tangible tool. When the work environment is set up for collaboration, tangible gestures should be designed to be easily understood by all participants in the interaction. In this case, hold + move [67] and touch + move gestures on tabletops [48] are particularly suitable, especially if the system is persistent and the effect of each gesture is clearly represented. In gaming applications, tangible gestures have been very successful, especially with the Nintendo WiiMote and the PlayStation PSMove controllers. Movement has a key role in gaming and the names chosen for the two controllers reflect this importance. In fact, hold + move gestures are often used for these applications and they are able to give new meanings to the multipurpose controller, depending on the gaming context. As stressed by Vaucelle and Ishii [24], similarly to storytelling, these gestures are able to animate the object and to give it a new life. In contrast to work environments, in gaming applications large movements need not be avoided and fatigue could be a key element of the play. Finally, gestures are very often used to give commands and to control the behavior of digital or physical objects. Touch and touch + move gestures on remote controllers [59] or on wearable devices [38,46,47] are often used because of the little effort needed to perform them.

Object Affordances
Sometimes designers can find inspiration in everyday objects and look for the gestures that are most suitable for a given object. Affordances help users guess the interaction and make it easier for designers to communicate the available gestures to the user. We identified several affordances exploited by existing systems: constraints, deformability, moving parts, dimensions, forms, and life-like objects. Constraints are classic affordances, exploited since Shaer et al.'s TAC paradigm [8]. By limiting the interaction possibilities with surfaces or bindings, designers let the user easily guess which gestures can be performed. Tabletops are a typical constrained setup where users are likely to perform touch + move gestures on the surface around the objects or planar hold + move gestures over the tabletop [67]. Moving parts of objects are particular constraints that can guide touch + move gestures [59] or hold + move gestures on a part of the object. The form of an object, and in particular its ergonomics, instead suggests to the user how to hold and touch the object, which can be associated with different operating modes of the object. Gestures can also vary according to object dimensions: bulky objects are difficult to hold; thus, touch + move gestures are preferable, unless the objects have moving parts that can be held and moved [43]. Small objects, instead, can be easily held and moved. Deformable objects can be divided into two types: objects with form memory [62] and objects without memory [39,50]. Deformable objects often encourage the user to apply forces, either in relation to hold gestures or to touch gestures. Movement is generally involved as an effect of the applied forces [39,50,62].
A particular class of object affordances is related to the semantics of the object. The user's prior knowledge can be exploited to convey, through the object form, the gestures that can be performed with it. The object can be an everyday object or an artifact that resembles a well-known object or tool. This class of objects falls into the iconic category of Ullmer and Ishii [9], or the noun + verb category of Fishkin [6]: because the user knows how to hold and move the object in everyday use, she is likely to think that a similar gesture is also implemented in the digital interactive system. Particular objects are those that show anthropomorphic or zoomorphic affordances [69]. In this case, the user expects a life-like behavior from the object, and the designer should adopt gestures that include touch and hold components in order to obtain a greater emotional involvement of the user [60,66].
Whenever the object is not able to communicate the possible gestures through its form, the TGI designer has to communicate to the user which gestures she can perform with the object. Typically, the designer provides explicit instructions to the user through either a tutorial or a manual. Recently, Lopes et al. suggested a new vision, called Affordance++ [70], where the object instructs the user how to perform gestures with it. The authors adopted a wearable system that forces the user to perform the right gestures for each object by contracting her muscles with electrical stimuli.

Building TGI Systems
Tangible gestures can take various forms: in TGIF, we identified three main components (move, hold and touch) that can be combined and to which the designer can associate further properties (e.g., pressure, amount, speed, etc.) as additional degrees of freedom. Thus, selecting a subset of gesture types, i.e., a limited number of component and property combinations, can be useful to limit the sensors needed to detect and recognize them. In this paper, we discuss only the hardware technologies that can be adopted in a TGI system, without discussing the software (e.g., algorithms and techniques) required to segment and classify gestures.
Each year, new technologies for recognizing gestures with objects are proposed. Recently, Sato et al. demonstrated a new capacitive sensing technique to make every object, and also our body, touch and grasp sensitive [71]. Taking the opposite approach, Klompmaker et al. used depth cameras in the environment to detect finger touches on arbitrary surfaces, as well as holding, moving and releasing objects [64].
Finding the most appropriate technologies for tangible gesture recognition is not trivial and a survey of existing ones is beyond the scope of this paper. However, we propose to frame recognition systems into three main approaches: (1) embedded and embodied; (2) wearable; and (3) environmental. These approaches differ not only in the placement of sensors, but also in the point of view of the interaction: respectively, from the object, from the user and from an external point of view. Moreover, advantages and disadvantages exist for each approach: an object equipped with sensors can generally observe only the tangible gestures related to it, while a wearable system can generally recognize gestures that the user performs with any object. In the environmental approach, instead, the same sensors can observe gestures made by all users with all objects. One approach can therefore be preferred to another (or combined with it), depending on the interaction scenario and power/space requirements.
As tangible gesture interaction starts permeating our common activities and our everyday environments, gesture spotting becomes a challenging task for the system designer. Distinguishing gestures addressed to the system from common everyday activities is critical to avoid unintended responses from the system. The presence of an object to interact with generally facilitates gesture spotting, because the gesture begins only when the user comes into contact with, or into proximity to, that object. Still, a more fine-grained segmentation is needed, because several gestures can be performed together without releasing the object from the hand [49]. Most gestures can be delimited by changes in one component (presence or absence of move, hold or touch) or in the value of an attribute. However, there are some examples [49] in which this event is difficult to recognize and an external trigger is needed. This trigger can be a vocal command, a button press or an additional gesture. In order to recognize the different tangible gestures, four main tasks have been identified: (i) the identification of the object the user is interacting with; (ii) the object or hand movement tracking; (iii) the recognition of the way the user is grasping the object, i.e., hold + touch gestures; and (iv) the recognition of touch + move gestures. Finally, the designer should also include proper feedback in order to acknowledge the user's commands.
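The idea of delimiting gestures by changes in the active components can be sketched as follows. This is a minimal illustration, assuming that some upstream sensing layer (not shown and not specified in this paper) already provides, for each sample, the set of components currently detected. The `segment` function and the sample format are hypothetical; attribute-value changes and external triggers are not handled.

```python
from itertools import groupby
from typing import FrozenSet, List, Tuple

Sample = FrozenSet[str]               # e.g. frozenset({"hold", "move"})
Segment = Tuple[int, int, Sample]     # (start index, end index exclusive, components)


def segment(samples: List[Sample]) -> List[Segment]:
    """Split a stream of samples wherever the set of active components changes.

    Runs with no active component (no contact with the object) are dropped,
    so each returned segment is a candidate gesture.
    """
    segments: List[Segment] = []
    start = 0
    for components, run in groupby(samples):
        length = sum(1 for _ in run)
        if components:
            segments.append((start, start + length, components))
        start += length
    return segments


# Hypothetical stream: idle, then hold, then hold + move, then release.
stream = (
    [frozenset()] * 2
    + [frozenset({"hold"})] * 3
    + [frozenset({"hold", "move"})] * 2
    + [frozenset()]
)

for begin, end, components in segment(stream):
    print(begin, end, sorted(components))
```

In this sketch the object is never released between the hold segment and the hold + move segment, yet the two are still separated, which is the finer-grained segmentation the paragraph calls for.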
The embodied and embedded approach (EE) is probably the most common for TGI. Enhancing objects with onboard technology is a well-known practice in the field of tangible interaction. As shown by Atia et al. [44], even simple objects can be augmented with gesture recognition capabilities by just attaching a small module to them. Inertial sensors are sufficient to detect most hold + move tangible gestures [43,44]. Touch sensing, both static and dynamic, can be achieved with different techniques: capacitive sensors [40,71], pressure sensors [49], and cameras that analyze light conveyed from the surface of the object [72].
The wearable approach (W) exploits technologies embedded in worn accessories, especially those very close to the core of the interaction, e.g., the hand. The wearable approach overcomes most of the occlusion problems typical of cameras placed in the environment and allows interaction with many common objects, which do not need embedded sensors. In fact, recognizing objects is as simple as sticking an RFID tag on them and integrating an RFID reader in a glove [57] or in a wristband [4]. The detection of hold + move gestures is easy when the movement of the forearm is rigidly coupled to the movement of the object. In this case, inertial sensors allow detecting most of the possible gestures with the object [4,63]. However, detecting the movement of the object within the hand, like pen rolling, is more difficult. The movement of the object could be inferred from inertial sensors placed on the fingers, for example in a ring [73], or by analyzing the movement of tendons in the wrist [63]. The analysis of tendons is also an interesting approach to detect grasp postures and touch gestures.
The environmental (E) approach offers an external point of view for the analysis of tangible gestures, placed in the environment, which reveals important properties of tangible gestures. Seen from the environment, grasps and grips are obviously static; touch + move gestures imply a movement of the user's hand or fingers relative to the object, which, instead, is generally static; hold + move gestures involve a movement of both the object and the user's hand with respect to the environment. Thus, depending on the gesture type, a vision-based recognition system should focus on tracking the hand, the fingers, or the object. Three main trends can be identified for the environmental approach. Tabletops [48,67] are very common in tangible interaction and restrict the interaction space to a surface, which generally simplifies the recognition tasks. Systems with fixed RGB [45] and/or 3D [64] cameras in the environment are also possible, even if they can suffer from occlusion problems. Finally, as shown by Li et al. [74], gestures could be recognized by mobile cameras, which could be integrated in service robots that follow the user in the environment.

Design of Four TGI Systems
This section presents four tangible gesture interactive systems developed as practical applications of the Tangible Gesture Interaction Framework. Each system explores different gesture components and semantic constructs, following the common practices presented in Section 4.1. The examples also show different design processes: in some cases, the design of tangible gestures followed a technology-driven approach; in another case, the gestures adopted in the system were previously elicited from users. Finally, through the various systems we explored different technological approaches to recognize tangible gestures. The gesture recognition performance and the perceived user experience will not be discussed in this paper, since each project has a different application domain and they could not be compared directly. Indeed, the aim of this section is an analysis of the design choices made for each project. Additional information on each system can be found in the related papers [49,63,75-80].

WheelSense
The WheelSense project investigates gestures that can be performed on the surface of the steering wheel to interact with the In-Vehicle Infotainment System (IVIS). Three different technological approaches have been investigated to recognize gestures: sensors embedded in the steering wheel (embodied and embedded approach), sensors worn on the user's body (wearable approach) and a hybrid approach that combines the two previous approaches. Four versions of the system have been developed.
The first three versions of the WheelSense system aimed at meeting a particular safety requirement: gestures must be performed while firmly holding the steering wheel with both hands in the position suggested by the Swiss driving school manual [81]. This requirement imposes the presence of the hold component in the gestures performed on the steering wheel. The first system used a wearable approach, based on electromyography [80]. This system is able to detect hand movements and forces applied with the hand through the analysis of the forearm muscular activity. This electromyography system requires sticking electrodes on the user's forearm, which is not practical for real usage, but we are currently investigating the possibility of using electrodes integrated in smart clothes. We designed four gestures to control the IVIS that could be easily recognized by the system: index abduction to start music, hand squeeze to stop music, and wrist flexion and extension (dragging the hand upward and downward), respectively, to go to the next or to the previous song (Figure 6).
Figure 6. The four gestures for the wearable WheelSense system.
For the second system [49], we chose a similar gesture design approach with the constraint of keeping both hands on the steering wheel, but we recognized gestures with an embodied and embedded approach. Indeed, we chose to integrate pressure sensors on the surface of the steering wheel to detect gestures that imply no movement of the hand from the position required by the Swiss driving school manual. We designed four gestures to control the IVIS that could be easily recognized with pressure sensors: tapping with the index finger to play music, squeezing the whole hand to stop music, and wrist flexion and extension, respectively, to go to the next or to the previous song (Figure 7). In a third application [78], we combined the two previous systems (wearable and embedded) in order to recognize the five gestures shown in Figure 8 (the four gestures designed in the second system plus a push gesture where the user presses with the hand to pause the IVIS music). In this case, we adopted a hybrid and opportunistic approach to recognize gestures: the system exploits the information of both subsystems to increase the recognition accuracy, but it is able to work even when one of the two subsystems is missing. In this work, we also investigated the advantages and drawbacks that arise from combining two or more systems in terms of eight design parameters (Interaction area, Personalization, Consistency, Private interaction and intimate interfaces, Localized information, Localized control, Resource availability and Resource management), classifier fusion algorithms and segmentation techniques.
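The opportunistic behavior described above (use both subsystems when available, keep working when one is missing) can be illustrated with a very simple fusion rule. This sketch is an assumption made for illustration: it averages per-gesture scores from whichever subsystems report, whereas the classifier fusion algorithms actually studied in [78] are not described in this section. The `fuse` function, the score dictionaries and the gesture labels are hypothetical.

```python
from typing import Dict, Optional

Scores = Dict[str, float]  # gesture label -> classifier confidence (assumed format)


def fuse(wearable: Optional[Scores], embedded: Optional[Scores]) -> Optional[str]:
    """Return the most likely gesture from the subsystems that are present.

    A missing subsystem is passed as None. With both present, scores are
    averaged; with one present, its scores are used alone; with none,
    no decision is made.
    """
    available = [scores for scores in (wearable, embedded) if scores is not None]
    if not available:
        return None
    gestures = available[0].keys()
    fused = {g: sum(s[g] for s in available) / len(available) for g in gestures}
    return max(fused, key=fused.get)


# Made-up scores for two of the gestures named in the text.
emg_scores = {"tap": 0.6, "squeeze": 0.4}
pressure_scores = {"tap": 0.2, "squeeze": 0.8}

print(fuse(emg_scores, pressure_scores))  # both subsystems available
print(fuse(emg_scores, None))             # embedded subsystem missing
```

Both score dictionaries are assumed to cover the same gesture labels; a real system would also need to align the two subsystems in time before fusing.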
In the three previous systems, the choice of gestures was made to comply with the safety requirement to keep both hands on the steering wheel and to maximize the recognition accuracy of the system. For the fourth system, we adopted a different design approach, in order to maximize the guessability and learnability of gestures. Therefore, following a common practice in the literature, we conducted a gesture elicitation study [77], asking 40 people for six gestures that they would like to perform to interact with the IVIS. In order to obtain a large variety of gestures, we removed the constraint of keeping both hands on the steering wheel for safety reasons. Nevertheless, all 40 people had previously taken part in a simulated driving experience; thus, they were aware of the risk of interacting with an IVIS while driving. We obtained a taxonomy of 13 common gestures that can be performed on different spots of the steering wheel (Figure 9). The taxonomy includes only gestures that were elicited at least four times; participants also suggested other, less common gestures. It is worth noting that most gestures suggested by users required taking one hand off the steering wheel, thus not respecting the safety constraint imposed in the first three systems. Following the insights of the elicitation study, we designed a new user-driven system to recognize gestures performed on the surface of the steering wheel [75]. We selected the most frequent gestures suggested to interact with the IVIS for six particular commands (next/previous, select/back and volume up/down) of a menu-based head-up display. We implemented the new system following the embodied and embedded approach. In this case, we adopted capacitive sensors instead of the pressure sensors used in the previous system, because the most frequently suggested gestures were swipes and taps, which imply little pressure on the steering wheel. We integrated six electrodes on the steering wheel (three on the top, one on the bottom, one on the left side and one on the right side) to recognize six gestures: hand tap on the top/bottom of the steering wheel for the select/back commands, hand tap on the right/left side of the steering wheel for the next/previous commands and hand swipe up/down on the top of the steering wheel for the volume up/down commands. The system is shown in Figure 10. The design of the different WheelSense systems investigated a particular gesture design choice, i.e., whether or not to impose the hold component for gesturing on the steering wheel. Gestures performed while holding the steering wheel could increase driving safety, but the removal of this constraint led to a broader taxonomy of gestures, as found in the gesture elicitation study. This implies gestures that are potentially unsafe, but also the possibility of using gestures that are more familiar to the user, typically borrowed from touchscreen mobile devices. To assess the safety of gestures on the steering wheel without the hold constraint, we conducted a between-subject study with 60 participants to compare the last WheelSense system (based on user-elicited gestures) with a vocal command interface and a touchscreen-based interface integrated in the car dashboard [75]. The results showed no significant difference in driving performance among the three systems, suggesting that using gestures without the hold component could lead to a safety level comparable to the interaction modalities typically available in current commercial IVIS.
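The gesture-to-command vocabulary of the fourth WheelSense system can be written down as a lookup table. The mapping itself follows the description in the text; the identifiers (`COMMANDS`, `command_for`, the gesture and location strings) are hypothetical names chosen for this sketch, and the recognition of taps and swipes from the six capacitive electrodes is assumed to happen upstream.

```python
from typing import Dict, Optional, Tuple

# (gesture, location on the steering wheel) -> IVIS command, as described in the text.
COMMANDS: Dict[Tuple[str, str], str] = {
    ("tap", "top"): "select",
    ("tap", "bottom"): "back",
    ("tap", "right"): "next",
    ("tap", "left"): "previous",
    ("swipe_up", "top"): "volume_up",
    ("swipe_down", "top"): "volume_down",
}


def command_for(gesture: str, location: str) -> Optional[str]:
    """Return the IVIS command for a recognized gesture, or None if unmapped."""
    return COMMANDS.get((gesture, location))


print(command_for("tap", "right"))
print(command_for("swipe_up", "top"))
print(command_for("swipe_up", "left"))  # not part of the vocabulary
```

Returning `None` for unmapped pairs lets the caller ignore touches that are not part of the vocabulary instead of triggering an unintended command.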

ADA Lamp
The Anthropomorphic Display of Affection (ADA) [76] is a system designed to investigate tangible gestures as a means of emotional communication. The aim is to provide the user with tangible affordances that can facilitate the interaction with an artifact (a spherical lamp), in particular for performing gestures that carry strong emotional content. Indeed, the spherical form of the lamp and the facial expressions that the lamp can mimic (a mouth that smiles or frowns and eyes that can blink) help the user recognize the metaphor of a human head. These life-like affordances aim at facilitating the discovery and learning of the gestures that the user can perform on the lamp and that the system is able to recognize. Indeed, the gestures recognized by the lamp are those typically used in human-to-human communication: the user can caress, hug or kiss the head to convey positive emotions, but she can also slap and shake the head to convey negative emotions (Figure 11). All gestures that can be performed on the ADA lamp include the touch component. The main reason for this choice is the importance of touch in social relationships, as anticipated in Section 4.1. Two applications are envisaged for the ADA lamp: a smart companion that reacts to the gestures performed on the lamp, and a mirror of a distant person's emotions that allows communicating with him or her. Therefore, the role of gestures with a touch component is to develop an emotional closeness with the smart companion or with the distant person, respectively. To further enhance this feeling, the smart companion lamp was programmed to become sad after some time without interaction, expressing the feeling of a lack of physical contact.
During two interactive demonstrations of the ADA lamp at the IHM'14 and TEI'15 conferences, we observed several participants interacting with the lamp, which was programmed to react as a smart companion. The anthropomorphic affordances stimulated conference attendees to interact with the lamp, although we noticed that most caresses were performed on the top of the lamp, while the system was designed to recognize only caresses performed on the side, as one would perform on an adult. We suppose that this behavior is due to the position of the lamp (on a table), which is lower than an adult human head, thus ergonomically facilitating gestures on the top of the lamp (as one would do on the head of a child or a pet).

Hugginess
Similarly to the ADA lamp, one of the aims of the Hugginess system [79] is to recognize gestures that people usually perform to communicate with other people, especially those including a strong emotional component. In this case, the project focuses only on the hug gesture, but instead of gesturing on an anthropomorphic object, this system is conceived to recognize and enhance gestures performed on other people. Indeed, the Hugginess system exploits a t-shirt to recognize hug gestures and it rewards the user by exchanging contact information with the hugged person (Figure 12).
Concerning the technological approach adopted in Hugginess, even if gestures are recognized by a wearable system, the system can also be classified in the embodied and embedded category. Indeed, the object on which the hug tangible gesture is performed is actually another person, and technology is embodied and embedded in this particular "object". Similarly to the ADA lamp project, Hugginess explores gestures that imply the touch component. However, in this case there is a deeper emotional involvement, since gestures are not performed on an object, but on a real person, entering the intimate sphere of both huggers. Because people hold each other in their arms, the hug gesture stimulates a sense of reciprocal belonging. While the gesture has the digital effect of exchanging phone numbers, it also has the performative role of establishing physical contact between two people. In most cultures, a hug between two strangers could be seen as an invasive gesture, but the project aims at reminding people of the undeniable psychophysiological benefit of hugging, thanks to the release of oxytocin in the human body during hugs.

Smart Watch
The aim of this project is to explore gestures that can be performed to interact with everyday objects in the context of a smart home [63]. These objects are used as tangible cues to digital content [82] or to controllable home functionalities, and different gestures can be associated with these objects to display the associated digital content or to operate the related home functionality. The project focuses only on objects that can be held in the hand and on gestures that can be performed while holding an object. The role of the smart watch is to recognize the object taken in the hand and the gesture performed with the forearm. Therefore, this project adopts a wearable approach, where all the sensors are embedded in the smart watch and only an RFID tag is attached to each object to allow its identification.
In order to investigate user appreciation of gesturing with physical artifacts, we conducted a Wizard of Oz experiment. Eight participants interacted with four different objects linked to digital media: a seashell for videos from Florida, a snowball for photos from the winter, a plush owl for photos of animals and a CD with classical music (Figure 13, left). We asked users to wear a fake prototype of the smart watch (Figure 13, right) and to interact with a media player by performing gestures with the objects. We designed six gestures associated with six commands of the media player. To play the content of the object, users had to perform a shake gesture towards the TV. Users could browse the media list through small movements of the hand to the left or to the right, and they could raise or lower the volume with small up or down movements. Raising the hand and maintaining it in a vertical position was associated with the stop command. The aim of the study was to evaluate user satisfaction with the system and to understand whether gesturing with objects in the hand could be cumbersome for the users (rated on a 5-point Likert scale). Only two users reported the interaction as cumbersome, mostly because of the strap of the fake smart watch prototype. No significant difference was found between gesturing with or without the object in the hand. Although the six tangible gestures were the same for all four objects, we observed particular behaviors during the interaction. Two users squeezed the plush owl while keeping the hand raised for the stop command, because of the affordance offered by the plush. Moreover, we asked the participants whether they would like to define their own set of gestures, and they all answered affirmatively.
Following this insight, in order to allow users to define their own gestures, it was necessary to design a smart watch that is able to recognize a large variety of gestures with handheld objects. To this purpose, we investigated different technologies that can be used to recognize the different types of gestures [63]. Most mid-air strokes performed by moving the forearm while holding the object in the hand can be recognized through an accelerometer integrated in the smart watch. In order to recognize other gestures such as grasping, squeezing, releasing and finger movements in the hand, we investigated the integration of pressure sensors in the smart watch strap. In this case, the aim is to detect the tendon movements associated with the aforementioned gestures, which generally do not imply a movement of the whole forearm.
Since this project investigates gestures with handheld objects, all gestures imply the hold component. Nevertheless, different gestures can be performed with objects in the hand and recognized by sensors integrated in a smart watch: hold + move gestures (forearm or hand movements while holding the object) and hold + touch gestures (different ways to grasp the object).
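The gesture set of the Wizard of Oz study can be sketched as a small data structure that records, for each gesture, its TGIF components and the associated media player command. This is only an illustrative sketch: the identifiers (`Component`, `TangibleGesture`, `GESTURES`, `command_for`, `sensors_for`) and the gesture and command names are assumptions introduced here, not part of the original prototype. The pairing of components with sensors follows the description above (accelerometer for forearm movements, strap pressure sensors for grasp-related gestures).

```python
from dataclasses import dataclass
from enum import Flag, auto


class Component(Flag):
    """The three TGIF gesture components."""
    MOVE = auto()
    HOLD = auto()
    TOUCH = auto()


@dataclass(frozen=True)
class TangibleGesture:
    name: str              # hypothetical identifier for the gesture
    components: Component  # TGIF components involved in the gesture
    command: str           # hypothetical name of the media player command


HOLD_MOVE = Component.HOLD | Component.MOVE

# The six gestures described in the study; all are performed while holding an object.
GESTURES = [
    TangibleGesture("shake_towards_tv", HOLD_MOVE, "play"),
    TangibleGesture("small_move_left", HOLD_MOVE, "browse_left"),
    TangibleGesture("small_move_right", HOLD_MOVE, "browse_right"),
    TangibleGesture("small_move_up", HOLD_MOVE, "volume_up"),
    TangibleGesture("small_move_down", HOLD_MOVE, "volume_down"),
    TangibleGesture("raise_and_keep_vertical", HOLD_MOVE, "stop"),
]


def command_for(gesture_name: str) -> str:
    """Return the command associated with a recognized gesture name."""
    for gesture in GESTURES:
        if gesture.name == gesture_name:
            return gesture.command
    raise KeyError(gesture_name)


def sensors_for(components: Component) -> list[str]:
    """Map TGIF components to the sensing options discussed in the text."""
    sensors = []
    if Component.MOVE in components:
        sensors.append("accelerometer")
    if Component.TOUCH in components:
        sensors.append("strap pressure sensors")
    return sensors
```

A user-defined squeeze gesture would, under the same assumptions, be declared with `Component.HOLD | Component.TOUCH`, for which `sensors_for` returns the strap pressure sensors rather than the accelerometer.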

Discussion
TGIF aims at guiding the design and development of tangible gesture interactive systems. Tangible gestures offer interesting opportunities in several application domains and are promising for a future in which more and more everyday objects will become interactive. Although previous studies in the field of gestures with objects already exist, the TGIF framework offers a high-level overview of the large variety of tangible gestures and tangible gesture interactive systems that already exist or that can be designed. Demonstrating the usefulness of this framework is not easy without adoption by several researchers. However, we believe that TGIF is a powerful tool to show the richness of TGI and that it could be a source of inspiration for TGI designers.
According to Beaudouin-Lafon [83], a good interaction model should include three main dimensions: "(1) descriptive power: the ability to describe a significant range of existing interfaces; (2) evaluative power: the ability to help assess multiple design alternatives; and (3) generative power: the ability to help designers create new designs." We propose an evaluation of TGIF according to these three dimensions.

Descriptive Power
The abstracting section of the paper (Section 3) presented a taxonomy of gestures and analyzed them according to their syntactics and semantics. We provided several examples that show how tangible gestures can be described through the three move, hold and touch components and additional properties such as pressure, amount, speed, time, direction, etc. Section 3.2 presented a taxonomy of semantic constructs that can be associated with tangible gestures. Section 3.3 presented a classification of 23 existing systems according to the TGIF dimensions. Most gestures with objects can be described through the proposed syntactics, although some gestures that focus on object movements instead of on the user interaction with the object can have a more complex description. The large variety of possible gestures with objects implies that a simple description of the whole spectrum of tangible gestures is not possible. During the definition of the move, hold and touch taxonomy, it was important to balance the descriptive power of the framework against the simplicity of the description and its ease of understanding.
A first version of the taxonomy was evaluated with four HCI experts, who had to classify 13 gestures across seven classes: in this version, hold + move + touch gestures were divided into three subclasses and there was a class for non-TGIF gestures. We gave the experts a preliminary version of the TGIF gesture syntax of Section 3.1 without any of the examples from the literature, which were part of the gestures to be evaluated. The descriptions of the gestures to be classified were extracted from the respective articles found in the literature and reported without modifications. We calculated the inter-annotator agreement rate of the independent evaluations using the free-marginal multirater kappa, obtaining a value of 0.63. After a discussion among all the participants, the kappa coefficient increased to 0.86. All HCI experts managed to understand the model and to classify the 13 gestures in about an hour, without prior knowledge of the model. The feedback received from the four HCI experts allowed us to refine the move, hold and touch taxonomy, which has been simplified as shown in the current version of Section 3.1.
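For readers unfamiliar with the agreement measure reported above, the following minimal sketch computes a free-marginal multirater kappa, assuming the commonly used formulation in which chance agreement is fixed at 1/k for k categories. The function name and the toy ratings are illustrative assumptions; the ratings are not the data collected in our evaluation.

```python
from collections import Counter


def free_marginal_kappa(ratings, n_categories):
    """Free-marginal multirater kappa.

    ratings: one list per item, holding the category chosen by each rater
             (every item must be rated by the same number of raters).
    n_categories: number of categories available to the raters.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])

    # Count, for every item, the ordered pairs of raters that agree.
    agreeing_pairs = 0
    for item in ratings:
        counts = Counter(item)
        agreeing_pairs += sum(c * (c - 1) for c in counts.values())

    p_observed = agreeing_pairs / (n_items * n_raters * (n_raters - 1))
    p_chance = 1 / n_categories  # free marginals: all categories equally likely
    return (p_observed - p_chance) / (1 - p_chance)


# Toy example (made-up ratings): 2 items, 4 raters, 7 categories.
toy_ratings = [
    [0, 0, 0, 0],  # full agreement
    [1, 1, 1, 2],  # three raters agree, one differs
]
kappa = free_marginal_kappa(toy_ratings, n_categories=7)
```

In the setting described above, `ratings` would contain 13 lists of four labels each and `n_categories` would be 7.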

Evaluative Power
Evaluating different tangible gesture alternatives would be difficult without considering the specific application domain of the tangible gesture interactive system that the designer would like to build. Through the common practices for designing and building, we have shown how it is possible to choose among the broad range of gestures according to the application domain or the affordances offered by everyday objects. TGIF offers only an insight into existing common practices, and further exploration is needed in each application domain. In fact, the interaction designer of a new application should compare different gesture sets by conducting user evaluations. Gesture elicitation studies are a popular approach to maximize user acceptance and guessability of gestures. Successful examples have been shown by Wobbrock et al. [35] and Lahey et al. [50]. In Section 5, we showed how we applied TGIF guidelines to choose tangible gestures for four TGI applications.
A more generic evaluation of tangible gesture interactive systems could be conducted according to the qualities of tangible interaction proposed by Hoven et al. [14]. They suggest that tangible interactive systems generally have a direct representation and control, an integrated representation and control, and a meaningful representation and control. Therefore, it is possible to evaluate a TGI system according to how well it complies with these six qualities. As an example, we present the evaluation of the ADA lamp (behaving as a smart companion) according to the six tangible interaction qualities:
• Integrated control: The emotional state of the ADA lamp can be controlled by performing gestures on the surface of the lamp. Sensors are also integrated inside the lamp to recognize these gestures.
• Integrated representation: The emotional state of the lamp is represented by its facial expressions, which are generated through RGB LEDs integrated in the lamp.
• Direct control: The lamp state cannot be controlled directly through tangible gestures. Indeed, although a deterministic state machine is used to describe the lamp behavior, we deliberately avoided direct reactions to the user's gestures. The almost unpredictable reactions are intended to create a life-like behavior that should foster long-lasting interactions.
• Direct representation: The representation of the emotional state of the lamp is direct and is coded through specific facial expressions and colors.
• Meaningful control: The gestures used to control the lamp are those typically used to interact with humans. Therefore, the meaning and emotional valence of gestures are consistent with the users' social habits.
• Meaningful representation: The representation of the emotional state of the lamp through facial expressions is meaningful and can generally be understood easily by the users. However, the mapping between colors and emotions according to Plutchik's wheel of emotions is not universal, and the color mapping might not be meaningful for some users.
The analysis of the tangible qualities of the ADA lamp shows that this system complies with all quality criteria except direct control. However, in this system, the absence of direct control was designed on purpose to obtain an unpredictable, life-like behavior. Therefore, the lack of one or more tangible qualities does not necessarily indicate a bad TGI system. Moreover, it is not possible to evaluate the system with a binary (present/not present) evaluation of each of the six qualities. Sometimes a quality could be present only partially, and the evaluation of the quality could depend on the social or personal background of the user. In conclusion, the analysis of the tangible qualities of a TGI system could be useful to evaluate the properties of the system, but it does not offer an absolute measure of its overall quality.

Generative Power
There is a large variety of new tangible gestures that can be explored, and the move, hold and touch taxonomy already offers a large palette of gestures. By combining gesture components (move, hold and touch) with properties (pressure, amount, time, etc.), the designer can generate a plethora of different interactions with objects. Design principles like polymorphism and reification [84] can be applied in a given application to generate new gestures. With polymorphism, a multipurpose object can be transformed into different tools according to how it is held or moved (cf. [24,40]). This reduces the number of objects to interact with in the application. Similarly, the same gesture can be used to interact with different objects, having the same effect in the application, as seen for the smart watch illustrated in Section 5.4 (cf. the functional gesture approach of Carrino et al. [85]). New interactions can also be generated through the reification of digital concepts: indeed, it is possible to add existing objects in the real world as interactive tokens or to let the user define personalized gestures (cf. the Smart Watch application in Section 5.4).
The application examples provided in Section 5 explain how the guidelines proposed in TGIF have been used to generate four TGI systems and the respective tangible gestures. Table 2 summarizes how we explored the design space of TGIF through the four application examples. The examples are not comprehensive, since the design space of TGIF is very broad. For example, none of the systems provided as examples used the environmental approach to recognize tangible gestures. Similarly, only a few of the common practices for application domains and object affordances have been applied.
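To give a rough idea of how quickly this design space grows, the sketch below enumerates the non-empty combinations of the three components and pairs each of them with one property. It is a simplified illustration under stated assumptions: the property list is the one quoted in the Descriptive Power section, the names `component_combinations` and `gesture_candidates` are hypothetical, and the actual taxonomy of Figure 3 is richer than this enumeration.

```python
from itertools import combinations, product

COMPONENTS = ("hold", "move", "touch")
# Properties named in the paper; the list is open-ended ("etc.").
PROPERTIES = ("pressure", "amount", "speed", "time", "direction")


def component_combinations():
    """All non-empty combinations of the move, hold and touch components."""
    return [
        combo
        for size in range(1, len(COMPONENTS) + 1)
        for combo in combinations(COMPONENTS, size)
    ]


def gesture_candidates(properties=PROPERTIES):
    """Pair every component combination with one varying property."""
    return list(product(component_combinations(), properties))


if __name__ == "__main__":
    for combo, prop in gesture_candidates():
        print(" + ".join(combo), "varying", prop)
```

Even this coarse enumeration yields 7 component combinations and 35 candidate pairings, before considering the object, its affordances or the application domain.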

Limitations
TGIF aims at framing future work in this field at a high level. For this reason, further investigations are needed in each application domain. The main objective of TGIF is to demonstrate the richness of tangible gesture interaction, without offering a formal grammar for describing all the possible gestures. Such a grammar could help to formalize interactions and to build TGI systems but, because of the variety of tangible gestures that can be imagined, it would be very complex and would risk confusing novice designers who approach this field for the first time.
TGIF does not address the very specific needs of each TGI designer, but it can help designers by providing a high-level framework that points to more specific guidelines, whenever they exist for a specific type of tangible gesture, application domain or technological approach for tangible gesture recognition. Common practices are not necessarily the best choice for each application, and the designer should always explore new gestures and new technologies in order to advance the field.
Because of the several output modalities that can be implemented in a TGI system, we did not discuss feedback and feedforward for TGI in this paper. Wensveen et al. offer useful guidelines for coupling feedback to user actions, which could also be applied to TGI [16], but the variety of application domains and of possible feedback modalities makes the definition of generic guidelines on feedback for TGI applications very difficult.

Conclusions
Gesturing with objects has been rooted in HCI history since the birth of tangible interaction. Several different types of tangible gestures have been used in recent years for digital interaction purposes, but few studies have analyzed the forms that these gestures can assume. In this paper, we proposed a high-level classification of tangible gestures according to move, hold and touch, and to their semantic constructs. By reviewing existing work in the literature, we proposed guidelines and common practices to design and build new tangible gesture interactive systems. In particular, we investigated how object affordances and the application domain can lead to the design of different gesture types. We presented four application examples to demonstrate how TGIF can help in designing and building new TGI systems. Finally, we discussed the descriptive power, the evaluative power and the generative power of TGIF, showing how a TGI designer could use them.
Nevertheless, tangible gesture interaction is a very broad field and it still requires further investigation. With the broad diffusion of technology in everyday objects, tangible gesture interaction represents a promising means to leverage human skills in digital interaction, benefiting from our innate ability to manipulate objects and to understand and remember tangible gesture meanings. Within this context, TGIF aims at encouraging and guiding new work in this field, which tries to bring humans back "to the real world" [12] for digital interaction purposes.

Figure 2. The communication model of tangible gesture interaction. A user performs a tangible gesture, which is a sign with an associated meaning. The computer (and possibly the other users) interprets this sign and acknowledges the user with feedback.

Figure 3. Taxonomy of move, hold and touch combinations.

Figure 7. The four gestures for the embedded WheelSense system based on pressure sensors.

Figure 10.The six gestures for the embedded WheelSense system based on capacitive sensors: hand tap in the top/bottom/right side/left side, and hand swipe up/down.

Figure 11.The five gestures of the ADA lamp.

Figure 12.The hug gesture for the Hugginess system.

Figure 13. The four objects of the user evaluation (left) and the fake prototype (right).

Table 2. TGIF design space explored through the application examples of Section 5.