User Experience Sensor for Man–Machine Interaction Modeled as an Analogy to the Tower of Hanoi
1.1. Introduction to the Hybrid Intelligence System (HINT)
- an AI pre-programmed with specific facts and topics within a predefined domain of expert knowledge,
- the ability to learn/supplement the knowledge by inserting new facts and relations,
- assistance from a human expert in case of a lack of knowledge or any other emergency situation.
1.2. The Research Objective
1.3. Advantages and Limitations of the Proposed System
- The system, its behaviour, and its interface are intuitive enough to be used by untrained employees.
- The system supports quick employee training and efficient problem solving.
- It improves the quality of work.
- It relieves highly qualified employees of time-consuming activities (such as teaching beginners) (see part 4, chapter 13).
- It keeps an employee from feeling watched/supervised by another person, which is a considerable discomfort for some people.
- The system allows employees to make decisions independently within certain limits. An employee can choose the type and form of information that suits them best, with particular groups of employees choosing different forms of receiving virtually the same information.
- The system is able to analyze the effectiveness of the information provided and to check whether another form of information would serve a given employee better (triggered by the employee’s request to change the form, by numerous changes of the information form, or by the Avatar’s suggestion after an interaction), thereby adjusting the employee’s preferences toward preferences that increase efficiency.
- The need to build an appropriate system.
- The need for a proper space to mount the interaction hardware (screen, microphone, camera, etc.).
- The need to incorporate the documentation and manuals into the system.
- The system requires the good availability of a highly knowledgeable expert with the special predispositions needed to fill the role of an Avatar.
2. Materials and Methods
2.1. The Conception of the Proposed Hybrid Intelligence System (HINT)
- communication with the system in various languages,
- process support described by stages (e.g., custom manufacturing)—products are manufactured based on customer orders,
- production stage control,
- support in problem cases, by providing access to a human expert (the Avatar).
- backend—as a unit that processes audio and video streams into inputs of other components, generates a response to the HMI and controls the work of components dependent on it,
- Avatar console—as an interface unit for an expert supporting/supervising the process/processes,
- client—as a unit that (1) acquires physical data and forwards it to a backend, and (2) is an output interface for the response generated by backend.
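This three-unit split can be illustrated as a minimal message-flow sketch. All class and method names below are hypothetical illustrations, not the actual HINT API; the real backend processing (STT, NLP, expert network) is omitted.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    """Physical data acquired by the client (one audio/video sample)."""
    audio: bytes
    video: bytes

class Backend:
    """Processes the audio/video stream and generates the HMI response."""
    def process(self, frame: Frame) -> str:
        # Real processing is out of scope; we only show the data-flow direction.
        return "response for %d audio bytes" % len(frame.audio)

class Client:
    """Acquires physical data, forwards it, and presents the response."""
    def __init__(self, backend: Backend):
        self.backend = backend

    def interact(self, frame: Frame) -> str:
        return self.backend.process(frame)

reply = Client(Backend()).interact(Frame(audio=b"\x00" * 160, video=b""))
print(reply)  # -> "response for 160 audio bytes"
```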
- seamless switching between the autonomous conversation mode (human–AI) and the assisted conversation mode (human–specialist), giving the interlocutor the impression of an unlimited artificial intelligence technology,
- supplementing the system’s knowledge with the operator’s knowledge, by adding new facts and relations to the system,
- solving problematic situations in a manner typical of a human being, as opposed to a manner typical of information systems,
- and others not mentioned here.
- from the human interlocutor’s point of view: extending the intelligence and knowledge of the robot (similar in spirit to attempts to pass the Turing Test by helping the AI),
- from the robot’s point of view: supporting the system in the situation of absence of information necessary to continue the conversation and supplementing the system’s knowledge.
2.3. Key Technologies
- Active Beamforming—makes it possible to filter the acquired sound based on the source location in two- or three-dimensional space, using a microphone array. Using Active Beamforming within the vision system’s feedback loop provides additional information on the location of the interlocutor, and the beamforming subsystem can be reconfigured to target the acquisition at a specific area/direction,
- Speech-to-text conversion—a spectrogram of a specified time segment of the recorded sound sample can be analyzed with large recurrent neural networks [27,28]. In order to improve the recognition efficiency of the most frequently issued commands, the system has been expanded with an additional context containing quantitative frequency data generated by FFT for each of the system users, and with a subsystem enhancing important signal features,
- Natural language processing—in the above-described system, it is designed to convert text into structured data and to determine the interlocutor’s intentions, along with the preparation of data for the expert network,
- Expert Bayesian Networks—for making decisions under incomplete and uncertain input data, based on the structured context provided by NLP as well as on subsystems providing only context data (video subsystem, quality subsystem).
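The first item above admits a compact illustration: a delay-and-sum beamformer, the simplest beamforming variant, steers a microphone array toward a source by compensating per-microphone propagation delays. This is a generic sketch under simplified far-field assumptions, not the actual HINT implementation (which may use more advanced methods such as MVDR [27]).

```python
import numpy as np

def delay_and_sum(signals, mic_positions, source_dir, fs, c=343.0):
    """Minimal delay-and-sum beamformer for a 2-D microphone array.

    signals:       (n_mics, n_samples) time-domain samples, one row per mic
    mic_positions: (n_mics, 2) microphone coordinates in metres
    source_dir:    unit 2-D vector pointing from the array toward the source
    fs:            sampling rate in Hz
    c:             speed of sound in m/s
    """
    d = np.asarray(source_dir, dtype=float)
    # A plane wave from direction d reaches mics farther along d earlier,
    # so each channel is re-aligned by its relative arrival time.
    arrival = -(np.asarray(mic_positions) @ d) / c
    shifts = np.round((arrival - arrival.min()) * fs).astype(int)
    n = signals.shape[1] - int(shifts.max())
    aligned = np.stack([s[k:k + n] for s, k in zip(signals, shifts)])
    # Coherent summation reinforces the target direction and attenuates others.
    return aligned.mean(axis=0)
```

Re-steering the array then amounts to calling the function with new acquisition coordinates, which is the reconfiguration step mentioned above.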
2.4. Operation Mode 1—(Semi-) Unattended Autonomous Mode with Optional Avatar Support
2.5. Operation Mode 2—Auto-Corrected Autonomous Mode with Optional Avatar Support
- the text of the answer is not modified, but confirmed by the Avatar, causing the system to raise the confidence parameter (this particular answer will be used more frequently in similar future situations, possibly without the need to disturb the Avatar),
- the text of the answer is modified by the Avatar, and the system will use a few question–answer pairs to re-learn this particular situation, and thus to improve the knowledge base (Bayes- and Process-bases) as well as the NLP engine (NLP bases).
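The two outcomes above suggest a simple confidence-weighted answer cache. The sketch below is hypothetical: the class name, the initial confidence of 0.5, and the boost value are illustrative assumptions, not the actual HINT parameters.

```python
# Hypothetical sketch of the confidence mechanism described above:
# a confirmed answer gains confidence; a modified answer replaces the old
# one and its question-answer pair is queued for re-learning.

class AnswerCache:
    def __init__(self, boost=0.25):
        self.entries = {}          # question -> (answer, confidence)
        self.relearn_queue = []    # question-answer pairs for re-training
        self.boost = boost

    def avatar_feedback(self, question, answer, modified=None):
        _, conf = self.entries.get(question, (answer, 0.5))
        if modified is None:
            # Avatar confirmed the answer: raise the confidence parameter.
            self.entries[question] = (answer, min(1.0, conf + self.boost))
        else:
            # Avatar modified the answer: store it and queue re-learning.
            self.entries[question] = (modified, 0.5)
            self.relearn_queue.append((question, modified))

cache = AnswerCache()
cache.avatar_feedback("opening hours?", "9-17")   # confirmed once
cache.avatar_feedback("opening hours?", "9-17")   # confirmed again
print(cache.entries["opening hours?"])            # -> ('9-17', 1.0)
```

The re-learn queue corresponds to the question–answer pairs that the Bayes-, Process- and NLP bases would be retrained on.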
2.6. The Reasons Causing the Initialization of Connection with an Avatar
- Upon the Avatar’s request
- At any time, an Avatar observing the course of the conversation can take it over, replacing the AI engine output by typing the answer/text directly on a keyboard. This option is used rarely, only if a particular customer needs to be served perfectly (completely trouble-free, without delays, without missed answers, and even without universal/fuzzy answers).
- The Avatar can take over the interaction of any agent when fast communication with people is needed in an emergency (e.g., the robot’s camera captured an event such as a fire or a pickpocket, and the Avatar wants to warn the robot’s interlocutor or ask him for a specific action).
- The Avatar can take over the interaction of an agent if the values of conversation quality parameters (STT duration, confidence level, or conversation topic identification) or of technical parameters (e.g., increased transmission latency to/from cloud services) are relatively low; by taking over the conversation, the Avatar can improve the overall interaction quality.
- Initiated by NLP engine of the HINT system
- If the NLP or Speech-to-Text engine of the HINT system does not correctly recognize the topic or parameters of human speech/text, or if the expert network explicitly recommends getting help, it may suggest/request Avatar assistance.
- If a given interaction path does not meet one or more of the technical parameters, or does not provide the availability of a required service (including Speech-to-Text service delays, unacceptable ping or jitter, no internet connection, etc.).
- Initiated by the HINT knowledge base module broker
- If the NLP engine correctly recognizes the topic, subject, intention, and history of the conversation, but the knowledge base lacks a given value/topic/article/subject, the system may request Avatar assistance.
- If the NLP engine correctly recognizes the topic, subject, intention, and history of the conversation, but additional elements interfere with the interpretation (e.g., the verbs connect, fix, break), whose presence reduces the score of some indicators of the substantive quality of the conversation.
- Initiated by the HINT process stage control system
- If the stage control system described in Section 2.7 provides premises for activating the Avatar connection option
- Or, if other subsystems of the HINT system have security-incident detection implemented (e.g., movement at night, smoke, or fire), the system may trigger various alarm actions, one of which is activating the remote connection with the Avatar.
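The triggers enumerated above can be folded into a single decision function. The sketch below is a consolidation for illustration only: the parameter names and all threshold values (confidence 0.6, latency 400 ms) are assumptions, not the actual HINT settings.

```python
# Illustrative consolidation of the Avatar-connection triggers listed above.

def should_connect_avatar(avatar_request=False,
                          stt_confidence=1.0,
                          topic_recognized=True,
                          knowledge_base_hit=True,
                          latency_ms=0,
                          security_incident=False,
                          stage_alert=False):
    if avatar_request or security_incident or stage_alert:
        return True               # explicit takeover, emergency, or stage control
    if not topic_recognized or stt_confidence < 0.6:
        return True               # NLP/STT engine asks for help
    if not knowledge_base_hit:
        return True               # knowledge base broker lacks an answer
    if latency_ms > 400:
        return True               # technical parameters out of bounds
    return False

print(should_connect_avatar(stt_confidence=0.4))  # -> True (low STT confidence)
print(should_connect_avatar(latency_ms=120))      # -> False (within bounds)
```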
2.7. The Challenge
- first, it is assumed that the diameters of the rings placed on the tower are determined by and adjusted to the number of stages of the entire analyzed process.
- second, to clarify the analysis, it is assumed that the ring corresponding to the first stage of the process has the largest diameter and that the ring diameters gradually decrease, so that the rings corresponding to the initial stages of the process have larger diameters than those corresponding to the final stages.
- third, the height of an individual ring depends on and results from the duration of performing the given stage of the process.
- fourth, the colors used in the rings reflect the system user’s actions possible in the system—the various information paths (e.g., passing information using speech, text, video, or a default response).
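The four modeling assumptions above can be captured in a small data model. The names, the base diameter, and the step value are hypothetical; only the mapping (stage order to diameter, duration to height, information path to color) follows the text.

```python
from dataclasses import dataclass

# Minimal encoding of the four assumptions above: diameter reflects stage
# order (earlier stage -> larger ring), height reflects stage duration, and
# color encodes the information path chosen by the user.

@dataclass
class Ring:
    stage: int          # 1 = first stage of the analyzed process
    duration_s: float   # time spent performing this stage (ring height)
    path: str           # "speech", "text", "video" or "default" (ring color)

def build_tower(durations, paths):
    """Create one ring per process stage, in stage order."""
    return [Ring(i + 1, d, p) for i, (d, p) in enumerate(zip(durations, paths))]

def diameter(ring, base=100.0, step=10.0):
    """Earlier stages get larger diameters (base/step are arbitrary units)."""
    return base - step * (ring.stage - 1)

tower = build_tower([30.0, 12.5, 41.0], ["speech", "text", "default"])
print([diameter(r) for r in tower])  # -> [100.0, 90.0, 80.0]
```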
The System Operation Strategy Matched with the Tower of Hanoi Structure and the Impact of the Choice of System Processes on the Duration of the Stage
- the maximal acceptable process time,
- the Kronecker delta (equal to 1 or 0) specifying the correct or incorrect execution of a process step,
- the factor determining employee experience,
- an element of the set of possible actions of the system user,
- an element of the set of possible system actions when providing information,
- the changes in execution time of a stage and of the process.
- the set of possible system user actions,
- the default action when no action is taken by a user,
- the set of possible system responses,
- the set of system default responses.
- detecting the bottlenecks in the operation of subsystems that would cause a loss of fluidity of interaction, and
- detecting the moment when a human-expert resource should be attached automatically (as a result of the AI system’s decision), most likely resulting from a local inefficiency and lack of process optimization of a current stage of the process.
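The two detection goals above can be sketched as a monitor over per-stage execution times. The function names, the per-stage limits, and the `max_exceeded` margin are illustrative assumptions, not the system's actual optimization criteria.

```python
# Sketch of the two detection goals above: flag stages whose execution time
# exceeds the acceptable limit (a bottleneck) and decide when a human-expert
# resource should be attached automatically.

def find_bottlenecks(stage_times, stage_limits):
    """Return indices of stages that exceeded their acceptable time."""
    return [i for i, (t, limit) in enumerate(zip(stage_times, stage_limits))
            if t > limit]

def should_attach_expert(stage_times, stage_limits, max_exceeded=1):
    """Attach a human expert once more than max_exceeded stages overrun,
    indicating a local inefficiency of the current process stage."""
    return len(find_bottlenecks(stage_times, stage_limits)) > max_exceeded

times = [28.0, 35.0, 61.0]
limits = [30.0, 30.0, 45.0]
print(find_bottlenecks(times, limits))      # -> [1, 2]
print(should_attach_expert(times, limits))  # -> True
```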
4.1. Comparison with Other Concepts
4.2. Method Evaluation and Discussion
4.3. Future Work
Conflicts of Interest
| Abbreviation | Meaning |
|---|---|
| HINT | Hybrid Intelligence System |
| STT | Speech to Text |
| NLP | Natural Language Processing |
| NLU | Natural Language Understanding |
| VoIP | Voice over IP |
- Norman, D. The Invisible Computer: Why Good Products Can Fail, the Personal Computer is so Complex, and Information Appliances Are the Answer; The MIT Press: Cambridge, MA, USA, 1998.
- ISO. ISO FDIS 9241-210 Human-Centred Design Process for Interactive Systems; ISO: Geneva, Switzerland, 2019.
- Adikari, S.; McDonald, C.; Campbell, J. User experience in HMI: An enhanced assessment model. In Sustainable Development through Effective Man-Machine Co-Existence; IEEE: Colombo, Sri Lanka, 2010; pp. 17–19. Available online: https://pdfs.semanticscholar.org/6c09/86422e23c53eaa32c1ed4c182df0f2940116.pdf (accessed on 1 March 2020).
- ISO. ISO 9241-11—Ergonomic Requirements for Office Work with Visual Display Terminals (VDTs)—Part 11: Guidance on Usability; ISO: Geneva, Switzerland, 2018.
- Lindgaard, G. Aesthetics, visual appeal, usability, and user satisfaction: What do the user’s eyes tell the user’s brain. Aust. J. Emerg. Technol. Soc. 2007, 5, 1–16.
- Protalinski, E. Google’s Speech Recognition Technology Now Has a 4.9% Word Error Rate. 2017. Available online: https://venturebeat.com/2017/05/17/googles-speech-recognition-technology-now-has-a-4-9-word-error-rate/ (accessed on 1 September 2019).
- Google Inc. Google Cloud Platform—Cloud Speech API. Available online: https://cloud.google.com/speech/ (accessed on 31 August 2017).
- Pierce, D. Siri Finally Got its Coming Out Party. Available online: https://www.wired.com/2017/06/siri-finally-got-coming-party/ (accessed on 2 September 2017).
- Soto, P. The Major Advancements in Deep Learning in 2016; Tryo Labs: San Francisco, CA, USA, 2016. Available online: https://tryolabs.com/blog/2016/12/06/major-advancements-deep-learning-2016/ (accessed on 4 January 2020).
- Appenzeller, T. The AI revolution in science. Science 2017, 357, 16–17.
- Mao, A.; Zhang, H.; Liu, Y.; Zheng, Y.; Li, G.; Han, G. Easy and Fast Reconstruction of a 3D Avatar with an RGB-D Sensor. Sensors 2017, 17, 1113.
- Janssen, C.P.; Donker, S.F.; Brumby, D.P.; Kun, A.L. History and future of human-automation interaction. Int. J. Hum.-Comput. Stud. 2019, 131, 99–107.
- Kamar, E. Directions in Hybrid Intelligence: Complementing AI Systems with Human Intelligence. In Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI-16), New York, NY, USA, 9–15 July 2016; pp. 4070–4073.
- Hussein, A.; Abbass, H. Mixed initiative systems for human-swarm interaction: Opportunities and challenges. In Proceedings of the 2nd Annual Systems Modelling Conference (SMC), Canberra, Australia, 4 October 2018; pp. 1–8.
- Pollock, S. 6 Strategies for Effective Performance Management. Available online: https://hrdailyadvisor.blr.com/2018/01/11/6-strategies-effective-performance-management/ (accessed on 2 July 2020).
- Bevilacqua, M.; Bottani, E.; Ciarapica, F.E.; Costantino, F.; Di Donato, L.; Ferraro, A.; Mazzuto, G.; Monteriù, A.; Nardini, G.; Ortenzi, M.; et al. Digital Twin Reference Model Development to Prevent Operators’ Risk in Process Plants. Sustainability 2020, 12, 1088.
- Gibson, J.; Ivancevich, J.; Konopaske, R. Organizations: Behavior, Structure, Processes; McGraw-Hill Higher Education: New York, NY, USA, 2011.
- Stein, M.; Vincent-Höper, S.; Schümann, M.; Gregersen, S. Beyond mistreatment at the relationship level: Abusive supervision and illegitimate tasks. Int. J. Environ. Res. Public Health 2020, 17, 2722.
- Conesa-Muñoz, J.; Gonzalez-de-Soto, M.; Gonzalez-de-Santos, P.; Ribeiro, A. Distributed multi-level supervision to effectively monitor the operations of a fleet of autonomous vehicles in agricultural tasks. Sensors 2015, 15, 5402–5428.
- Google Inc. Google Cloud Platform—Google Speech-To-Text Official Documentation. Available online: https://cloud.google.com/speech-to-text (accessed on 15 April 2020).
- Google Inc. Dialogflow Documentation. Available online: https://dialogflow.com/docs (accessed on 20 May 2020).
- SoftBank Robotics. Pepper. Available online: https://www.softbankrobotics.com/us/pepper (accessed on 1 May 2020).
- Sanbot Innovation Technology Ltd. Sanbot Humanoid Webpage. Available online: http://en.sanbot.com/product/sanbot-elf (accessed on 25 May 2020).
- Chen, H.; Zhang, P.; Yan, Y. Multi-Talker MVDR Beamforming Based on Extended Complex Gaussian Mixture Model. arXiv 2019, arXiv:1910.07753.
- Fularz, M.; Kraft, M.; Schmidt, A.; Kasiński, A. The architecture of an embedded smart camera for intelligent inspection and surveillance. In International Conference on Automation; Springer: Cham, Switzerland, 2015; pp. 43–52.
- Schneider, M.; Machacek, Z.; Martinek, R.; Koziorek, J.; Jaros, R. A System for the Detection of Persons in Intelligent Buildings Using Camera Systems—A Comparative Study. Sensors 2020, 20, 3558.
- Purwins, H.; Li, B.; Virtanen, T.; Schlüter, J.; Chang, S.Y.; Sainath, T. Deep learning for audio signal processing. IEEE J. Sel. Top. Signal Process. 2019, 13, 206–219.
- Hannun, A.; Case, C.; Casper, J.; Catanzaro, B.; Diamos, G.; Elsen, E.; Ng, A.Y. Deep speech: Scaling up end-to-end speech recognition. arXiv 2014, arXiv:1412.5567.
- Ni, Z.; Mandel, M.I. ONSSEN: An open-source speech separation and enhancement library. arXiv 2019, arXiv:1911.00982.
- Zhang, L.; Wu, X.; Ding, L.; Skibniewski, M.J.; Yan, Y. Decision support analysis for safety control in complex project environments based on Bayesian Networks. Expert Syst. Appl. 2013, 40, 4273–4282.
- Beniak, R. Estimation of parameters of selected converter drives. Arch. Electr. Eng. 2012, 61, 533–565.
- Bryant, A.T.; Kang, X.; Santi, E.; Palmer, P.R.; Hudgins, J.L. Two-Step Parameter Extraction Procedure With Formal Optimization for Physics-Based Circuit Simulator IGBT and p-i-n Diode Models. IEEE Trans. Power Electron. 2006, 21, 295–309.
| | Average Number of Stages Exceeding the Limit | Average Number of Connections with Avatar | Average Avatar Involvement Time [s] | Average Total Process Time [min] | Average Process Quality (0–100%) |
|---|---|---|---|---|---|
| Before Optimization | 3.2 | 1.9 | 62 | 52 | 66 |
| After Optimization | 0.6 | 0.7 | 26 | 46 | 72 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Gardecki, A.; Podpora, M.; Beniak, R.; Klin, B.; Pochwała, S. User Experience Sensor for Man–Machine Interaction Modeled as an Analogy to the Tower of Hanoi. Sensors 2020, 20, 4074. https://doi.org/10.3390/s20154074