Risks of Deep Reinforcement Learning Applied to Fall Prevention Assist by Autonomous Mobile Robots in the Hospital

Namba, Takaaki; Yamada, Yoji

doi:10.3390/bdcc2020013

by Takaaki Namba * and Yoji Yamada

Graduate School of Engineering, Nagoya University, Furo-cho, Chikusa-ku, Nagoya City 464-8603, Japan

* Author to whom correspondence should be addressed.

Big Data Cogn. Comput. 2018, 2(2), 13; https://doi.org/10.3390/bdcc2020013

Submission received: 30 April 2018 / Revised: 2 June 2018 / Accepted: 7 June 2018 / Published: 17 June 2018

(This article belongs to the Special Issue Applied Deep Learning: Business and Industrial Applications)

Abstract

Our previous study proposed an automatic fall risk assessment and related risk reduction measures. A nursing system to reduce patient accidents was also developed, thereby reducing the caregiving load of the medical staff in hospitals. However, there are risks associated with artificial intelligence (AI) in applications such as assistant mobile robots that use deep reinforcement learning. In this paper, we discuss safety applications related to AI in fields where humans and robots coexist, especially when applying deep reinforcement learning to the control of autonomous mobile robots. First, we summarize recent related work on robot safety with AI. Second, we extract the risks linked to the use of autonomous mobile assistant robots based on deep reinforcement learning for patients in a hospital. Third, we systematize the risks of AI and propose sample risk reduction measures. The results suggest that these measures are useful in the fields of clinical and industrial safety.

Keywords:

artificial intelligence; assistive robotics; deep reinforcement learning; fall prevention; machine learning; mobile robot; risk; safety

1. Introduction

1.1. Social Background

There have been significant improvements in the accuracy of deep learning in fields such as image recognition, behavior recognition, object detection, and scene recognition. Moreover, these techniques have begun to surpass human abilities in some fields. Advances in technology will produce irreversible changes to the fundamental concepts of human life, a point known as the “singularity” [1]. The support and replacement of human work by artificial intelligence (AI) systems (and the development of systems capable of cooperating with people) are progressing in various industrial fields. However, applying AI to safety technology is difficult, as it is prohibited by existing international standards related to the safety of machinery, such as IEC 61508 ed. 2 [2]. In practice, AI learns the surrounding features and its own behavior through interactions with its surroundings. Therefore, during AI development, a communication robot sometimes learns inappropriate behaviors by interacting with its surroundings. For example, in March 2016, Microsoft’s conversational AI “Tay” started to talk about racial discrimination, sexism, and conspiracy theories, which it had learned through Twitter in technical experiments. This prompted Microsoft to immediately shut Tay down. In this way, safety is difficult to guarantee if a human being is not properly involved in the learning processes of AI. In May 2016, a Tesla Model S collided with another vehicle while on autopilot and caused a fatality. Therefore, it can be difficult to use AI as an element of safety-related parts.

1.2. Related Work

To apply AI technology to a system, there are three approaches, according to how AI and safety technology are involved. The “Three Safety Policies of Artificial Intelligence based on Robot Safety” have been proposed to achieve a systematic approach [3]. These policies regard the application of AI to non-safety-related parts, safety-related parts, and humans. We agree with this systematizing approach. On the other hand, the “Consideration of Errors and Faults Based on Safety of Machinery for Robot using Artificial Intelligence” has reported that it is appropriate to treat errors in AI functions as probabilistic faults [4]. Moreover, it has shown that it is impossible to eliminate errors in practice, because AI makes human-like errors. The same paper argues that there are four possible ways to guarantee safety. The first is analyzing trends, such as the variance and standard deviation of the learning error, and evaluating the likelihood of error. The second is duplicating the system to secure diversity. The third is reducing the possibility of errors. The fourth is correctly evaluating the error risk of AI, comparing advantages and disadvantages, and determining acceptable risk levels. In addition, it should be demonstrated that learning methods and supervised data are transparent (i.e., visible and explainable). This concerns their development and evaluation processes, disclosure of accountability, clarification of accountability, recording of learning processes, and securing of reproducibility. In that study, the recognition accuracy of AI and sensor performance were compared. However, we think that such a comparison is inappropriate, because AI also uses sensor data. Assuming that a sensor is in the foundation layer, AI is in the application layer. The accuracies with and without AI (from the viewpoint of the application) should be compared instead.
In terms of safety verification, the necessity of establishing a quantitative and analytical evaluation method has been demonstrated. Further, the examination of a safety evaluation platform for robots using AI [5] is being addressed, and an autonomous moving function using intelligence can be installed as an additional interface. In a recent study, Fujiwara et al. proposed an asymmetric classification method for the judgment of safety, which suppresses the probability of dangerous-side failures by judging uncertain cases as dangerous [6]. However, this method sacrifices the accuracy of the multi-class classification of AI in exchange for giving priority to safety. Apart from the safety perspective of applying AI, “The Japanese Society for Artificial Intelligence Ethical Guidelines” are considered important for research [7]. Google, OpenAI, Stanford University, and the University of California (Berkeley) have reported five main challenges specifically associated with the safe use of AI [8]: (1) avoiding negative side effects, including adverse effects on the surroundings, the interactions with humans and the environment, and vandalism; (2) avoiding reward hacking, which takes into consideration the measures for achieving desire and malicious hacking from the outside; (3) scalable oversight for proper and efficient feedback; (4) safe exploration to secure safety, such as during learning by simulation; (5) robustness toward distributional shifts, to manage changes in cases that are significantly different from the learning environment.

In these previous studies, there has been insufficient consideration of the development procedures specific to AI, such as training/validation/verification, and of safety over the entire AI life cycle (including the online/offline updating of AI models). Considering the entire life cycle leads to a reduction in unsupported events. Moreover, regarding the systematization of the risks and risk reduction measures of AI, a clarification of the risk factors in a human-robot coexistence environment and specific draft measures for risk reduction are required. Furthermore, no measures have been devised to achieve compatibility between estimation ability and safety for unknown/unlearned subjects, so as not to impair the flexibility and robustness of AI.

1.3. Our Objective

In this paper, we extract the risks and propose risk reduction measures for applying deep reinforcement learning to the control of an autonomous mobile robot. The robot was designed to assist in preventing patients from falling and to reduce the caregiving load on families and medical staff. The content of this paper is mainly based on the development of a cognitive and control system that uses deep learning and an autonomous mobile robot to prevent elderly people from falling, described in our previous studies [9,10]. The three factors behind patient falls are external factors (i.e., environment), internal factors of the patient (i.e., condition), and management (i.e., organization/system). Our approach was to assist in the recognition, judgment, and control of these factors using AI and robotics. Therefore, we performed automated and real-time risk assessments (and risk reduction measures) with an autonomous mobile robot, throughout the life cycle of patients in a hospital. Then, we applied both deep learning and deep reinforcement learning to detect unknown signs (such as a fall prediction) and automated the decision-making of the optimal policy in intervention assistance.

First, we extracted the fall risks of patients in each phase of their life cycle in the hospital. Second, we discussed the goal of assistance by autonomous mobile robots. Third, we proposed assist methods using deep reinforcement learning. Finally, we extracted characteristic risks to AI (especially deep learning and deep reinforcement learning), systematized the risks, and then proposed some risk reduction measures mainly in terms of cognitive and control technology.

2. Materials and Methods

2.1. Fall Risk and Phases of a Patient’s Life Cycle in the Hospital

2.1.1. Upon Arrival

Emergency transportation and pick up.

In this phase, an initial screening for outpatients and inpatients was conducted by automatic risk assessment based on deep learning, using data from cameras and laser range finders (LRFs). The assessment detects the equipment in the hospital and the number of people, then analyzes their appearance and the items they bring. It identified patients at high risk of falling and recorded the results in an electronic medical record linked with the patient ID. Thus, the medical staff and the mobile assistant robot shared the information and paid particular attention to the moderate- and high-risk patients. To judge whether a patient was moderate- or high-risk, an assessment score sheet was completed, and a judgment was made, as shown in Table 1 [11].

Loss of balance caused by the following:

Getting out of and transferring from a car
Bringing items (for example stick, wheel chair, cart, bag, umbrella, pet, and slippers)
Declining physical, mental, and emotional health or ability depending on age, multidrug administration, level of inebriation, sight and hearing impairment, and injury/physical condition
Slipping on wet floors of entrances and exits, tripping over a mat
Environments such as escalator steps, speed of the escalator, and getting on/off
Handrail positioning during movement
Floor geometric patterns or color transitions, lighting changes
Slope walking

2.1.2. Waiting for Examination and Consultation

In this phase, patients at high risk of falling were mainly monitored, and falling triggers around them were detected. Moreover, the conditions of the patient in the electronic medical record were considered. Thus, the data of current conditions and patient records were combined, and the current patient risk was judged.

Loss of balance and consciousness caused by the following:

Leaning against backless and unfixed chairs
Walking and other movements
Rising or standing up

2.1.3. Examination, Surgery, Treatment

This phase refers to the condition of the patient indicated in the electronic medical record and follow-ups.

Loss of balance and consciousness caused by the following:

Effects of changes in physical conditions
Sedation and anesthesia after surgery
Rising or standing up.

2.1.4. Rehabilitation

This phase refers to the condition of the patient indicated in the electronic medical record and follow-ups.

Loss of balance caused by the following:

Inadequate support equipment
Inappropriate care (holding both hands of a patient and guiding, the work of one person being done by two, the work of two persons being done by one, leaving a patient alone).

2.1.5. In the Hospital Room

This phase refers to the fall history of the patient and the conditions in the electronic medical record (paying particular attention to the moderate- and high-risk patients). To judge whether the patient was moderate- or high-risk, an assessment score sheet was completed, as shown in Table 1.

Loss of balance caused by the following:

Transfer between the bed, wheelchair, or stretcher
Performing activities by themselves (or without assistance), such as going to the toilet, taking a walk, or going to other areas of the hospital
Leaning on the drip stand
Forgetting to fix equipment (such as tables)
Environmental change due to differences in buildings and departments
Changes in the mental state, such as impatience, anxiety, or mental conditions
Personality traits, such as overconfidence and wariness of pressing a nurse call button

2.1.6. At Discharge

This phase refers to the condition of the patient indicated in the electronic medical record and subsequent actions.

Loss of balance and consciousness caused by the following:

Continued sedation and anesthetic effect

The other factors are as in Section 2.1.1.

2.2. Consideration on Support Target by Autonomous Mobile Robots

It is important to confirm the arrival of each patient and conduct a risk assessment for those who are considered high-risk. This includes first-time visitors to the hospitals. However, this is a burden for both patients and medical staff, as there are more than 2000 outpatients per day and more than 1000 beds for the inpatients.

We propose a nursing system that prevents patients from falling, as shown in Figure 1. When conducting the initial screening (to extract and focus attention on high-risk patients), some initiatives are considered useful for reducing the caregiving load for medical staff: using autonomous mobile robots with cameras, laser range finders (LRFs), and various sensors to detect risks, communicate with stakeholders, and assist with interventions, and using hospital facilities such as medical treatment reception machines with fixed cameras. Moreover, automatic risk analysis is considered useful for watching over patients in the absence of nurses. If it is necessary to improve these measures, warnings can be given by patrols to both nurse stations and people around hazardous equipment.

2.3. Proposal of Assist Method by Using Deep Reinforcement Learning

Conventional risk assessments and risk reduction measures are usually conducted manually, when employed in clinical safety requirement assessment or nursing planning. This type of assessment has the following limitations:

The completeness of risk extraction is dependent on the experience and capability of the medical staff
Risk assessment procedures are sometimes complex and require a specific number of person-hours depending on patient numbers. However, immediate risk assessment and reductions are required.

Therefore, we tried to select an approach that applies deep learning to risk assessment and risk reduction. However, when judging if an intervention is required, it is difficult to detect the medical situation and injury/illness of a patient from only camera images and sensor data. Thus, it is necessary to acquire information from the patient database, which is input by the medical staff as electronic medical records. Moreover, it is necessary to ascertain the operation status of the medical staff, thereby predicting the content and timing of the assistance to be provided.

We propose autonomous mobile assistant robots (using deep reinforcement learning), which select optimal actions and respond robustly to multimodal input changes in the environment and patients. Figure 2 shows a deep Q-network (DQN) model for learning fall risk reduction measures. Input data comprise sensing data from the environment and patient conditional data from the electronic medical records. First, the three layers following the input layer are convolutional and pooling layers. Next, six layers are fully connected layers, including dropout to avoid overfitting. Each activation function is a rectified linear unit (ReLU). Finally, the output layer provides an estimated Q-value for each action. Therefore, we could obtain the optimal action with a maximum Q-value.
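The final step described above, choosing the action with the maximum estimated Q-value, can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: the function names `select_action` and `td_target`, the exploration rate `epsilon`, and the discount factor `gamma` are assumptions introduced here, and the Q-values are passed in as a plain list rather than produced by the network of Figure 2.

```python
import random


def select_action(q_values, epsilon=0.0, rng=random):
    """Return an action index: random with probability epsilon, else greedy.

    q_values is assumed to be the list of estimated Q-values, one per action,
    as it would come out of the output layer.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # exploration
    # Exploitation: index of the largest estimated Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_values, gamma=0.99, done=False):
    """One-step Q-learning target: r + gamma * max over next-state Q-values."""
    if done:
        return reward  # no future value after a terminal state
    return reward + gamma * max(next_q_values)
```

With `epsilon=0.0` the selection is purely greedy, which matches the statement that the optimal action is the one with the maximum Q-value; a nonzero `epsilon` would only be used during learning.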

It is difficult to obtain the consent of the patient regarding personal information, including camera images and electronic medical charts. However, to avoid risks to patients’ health and life, the use of this information is permitted under Article 23, paragraph 1, item 2 of the personal information protection law in Japan.

3. Results

The autonomous mobile robot used in this study had a two-wheel differential drive, and learning was performed under the conditions of Table 2.

We classified the characteristic risks to deep reinforcement learning into six factors as follows: Change, Error/Gap/Delay, Design, AI independence, Human mind, and Weakness of resilience. These were systematized, as shown in Figure 3 and Table 3.

3.1. Risk from Changes

In the conventional rule-based model, control is performed by branch judgment based on the threshold value. Therefore, it cannot respond appropriately in the case of unexpected conditions.

On the other hand, deep learning can respond flexibly to conditions that have not previously been experienced, provided they are similar to learned ones, because it learns features. However, deep learning generally carries the risks of an inductive method. Deep reinforcement learning, in particular, copes poorly with changes in the environment and rules; this is related to the covariate shift of the neural network, caused by data distribution shifts between training time and estimation time. It is therefore necessary to repeat learning and updating, as shown in Figure 4. The update procedure is indicated by the red circular arrow at the lower left of Figure 4. Figure 4 also shows the AI life cycle: (1) drawing a vision, planning, and designing a concept; (2) procuring material and resources; (3) analyzing and defining the request; (4) designing AI; (5) implementing AI; (6) learning and creating the AI model; (7) validating the AI model and adjusting hyper-parameters; (8) verification and testing the AI model; (9) field variations; (10) providing applications and services; (11) disposing or recycling; in this phase, attention must be paid to security, such as tampering with the AI model; (12) operation management and maintaining all processes; if this phase is based on AI, it is thought that some external monitoring and multiplexing by different algorithms are necessary. In this life cycle of AI, when learning is performed and applications/services are provided, it is necessary to recognize both the environmental change and the rate of change, in addition to ensuring the robustness of the response to changes in the situation of the assist target. These correspond to the changes in the environment and patient behavior, changes in procedures, and changes in the physical and mental conditions of the patient at risk of falling described in Section 2.1. They are thought to correspond to a covariate shift of the neural network, caused by data distribution shifts between training and estimation times.
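As an illustration of how a shift between the training-time and estimation-time data distributions might be noticed, the following Python sketch compares the mean of one input feature at runtime against its training statistics. It is a deliberately simple proxy and not a method from this paper: the function names, the single-feature view, and the threshold of 2.0 standard deviations are all assumptions made for the example.

```python
from statistics import mean, stdev


def shift_score(train_values, runtime_values):
    """Standardized difference between the runtime mean and the training mean."""
    mu = mean(train_values)
    sigma = stdev(train_values)  # sample standard deviation of the training data
    if sigma == 0:
        raise ValueError("training values have zero variance")
    return abs(mean(runtime_values) - mu) / sigma


def needs_relearning(train_values, runtime_values, threshold=2.0):
    """Flag a possible covariate shift (hypothetical threshold)."""
    return shift_score(train_values, runtime_values) > threshold
```

A flag from such a check would be one possible trigger for the repeat-learning-and-update loop of the life cycle; it only detects a shift in the mean of a single feature, so a real system would need a richer test.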

Practical examples of this are changes in a patient’s physical condition, mental condition, ward, and medication/sedation treatments and procedures.

To reduce these risks, our policy is to recognize both change and the rate of change. We should repeat learning and update the deep reinforcement learning model. The new AI model will recognize the changed state. Moreover, it will predict situations a few seconds before they happen.

Thus, we created an updated neural network model and performed automatic (and real-time) risk assessment and risk reduction. The updated neural network responded to time series and detected change and the rate of change using a recurrent neural network with long short-term memory (RNN-LSTM) or convolutional neural network (CNN) multi-streams, as shown in Figure 5.
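The network of Figure 5 learns change and rate of change from time series. Purely to make these two quantities concrete for a sampled signal, the sketch below computes them with finite differences; this is a swapped-in, hand-written technique rather than the RNN-LSTM or CNN approach, and the function name and the fixed sampling interval `dt` are assumptions.

```python
def change_and_rate(samples, dt):
    """Return (changes, rates) for a uniformly sampled signal.

    changes[i] is the difference between consecutive samples;
    rates[i] is that difference divided by the sampling interval dt.
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    changes = [b - a for a, b in zip(samples, samples[1:])]
    rates = [c / dt for c in changes]
    return changes, rates
```

For example, a signal sampled every 0.5 s as `[1.0, 1.5, 3.0]` changes by 0.5 and then 1.5, that is, at 1.0 and 3.0 units per second.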

3.2. Risk from Errors, Gaps, and Delay

In the conventional rule-based model, the validity and optimality of the threshold depend on past statistical data or on human design skills.

However, in the DQN-based model (from the environmental and patient perception points of view), in addition to errors due to sensors, misrecognition and non-recognition of the actual environment are considered to cause collisions and falls. Further, there is also a risk of misjudgment due to insufficient learning of the relationship between input and output. However, the existence of a mismatch may not in itself be a risk when realizing functions by deep learning. This is because the model recognizes the environment robustly by deep learning and, to prevent overfitting, is not trained to a recognition rate of 100%. This is a compromise between robustness and accuracy. Moreover, when combining different deep learning frameworks and models, the aim is usually to improve the accuracy and stabilize the judgment. However, when the input data size (dimension) differs for each model, there is a risk of inconsistency and performance deterioration (due to the combination of model and input data), because resizing may not fully preserve the function, performance, and accuracy of the original model. In addition, the internal covariate shift of neural networks is caused by a distribution change due to simultaneous weight updates derived from the multilayer structure.

Furthermore, there is a risk that transmission delays accumulate from sensing and recognition through the integration of the recognition, judgment, and control processes until the result is reflected in the controlled environment. As time elapses, the state to which the control should be applied and the actual state deviate from each other (which could make the control inadequate); therefore, real-time processing of the whole series of processes is required.
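The accumulation of delay described above can be made concrete with a small Python sketch that sums the latency of each processing stage and estimates how far a person moving at constant speed has travelled in that time. The stage names, the constant-speed assumption, and the drift limit are hypothetical values chosen for illustration, not measurements from this study.

```python
def total_latency(stage_latencies_s):
    """Sum the per-stage delays (in seconds) of the processing pipeline."""
    return sum(stage_latencies_s.values())


def state_drift(speed_m_per_s, latency_s):
    """Distance a target moving at constant speed covers during the delay."""
    return speed_m_per_s * latency_s


def within_budget(stage_latencies_s, speed_m_per_s, max_drift_m):
    """True if the state deviation caused by the delay stays acceptable."""
    drift = state_drift(speed_m_per_s, total_latency(stage_latencies_s))
    return drift <= max_drift_m


# Hypothetical stage delays in seconds.
stages = {"sensing": 0.25, "recognition": 0.5, "judgment": 0.125, "control": 0.125}
```

With these example values the pipeline takes 1.0 s in total, so a person walking at 1.0 m/s has moved 1.0 m by the time the control is applied; a check of this kind could feed the real-time requirement stated above.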

These errors, gaps, and delays cause collisions with obstacles and people (due to spatiotemporal errors of recognition) when robots assist in transferring patients from beds and wheelchairs, which can cause injuries. Thus, to reduce the risk, spatiotemporal deviation must be reduced, interfaces should be matched, learning should include deviation (noise) and delay, and functional safety processes should be verified. In particular, useful measures are data augmentation, early stopping, convolution and pooling, averaging (ensemble, bagging, dropout), batch normalization (the distribution of the inputs to each layer is fixed during learning), adjustment of hyper-parameters according to learning progress, real-time validation in functional safety processes, and learning in a space–time robust simulation.

3.3. Risk from Design

There is a risk when certain factors are included in the design of AI. These factors are lack of variation in the input data, a low exploration rate, an insufficient number of learning iterations, inadequacy of the neural network (NN) structure approximating action-value functions (number of nodes, layers, and network paths, inappropriate model selection, and approximation error), overfitting, malicious interference or data rewriting from the outside, errors in the training procedure and method, and spatiotemporal errors of robot–environment interaction. In conventional machine learning, there are risks with respect to coverage, validity, and optimality, caused by humans designing decision boundaries. However, in deep reinforcement learning, there are risks depending on the design of state definitions, reward definitions, and the choice of actions. It should be remembered that AI has different thought patterns from humans, because AI lacks biological time constraints. For example, in deep reinforcement learning, AI may choose low-risk actions (irrespective of the magnitude of the positive reward), because an unbounded total reward can be obtained by collecting even a small reward indefinitely. Combining positive rewards with negative rewards is therefore also an important risk reduction measure. In addition, when the goal is far away, methods such as reward distribution are used to speed up learning, but these can also be regarded as a partial introduction of conventional rule-based logic (for example, a cost map). The result may not necessarily be the optimal solution, because there is a risk that it will be biased by the person who decided the rule.
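The reward-accumulation risk above can be illustrated numerically. The Python sketch below compares a policy that collects a small reward at every step with a policy that reaches a goal once, first without and then with a per-step negative reward. All reward values, episode lengths, and the function name are assumptions chosen so the arithmetic is easy to follow; they are not the settings used in this study.

```python
def discounted_return(rewards, gamma=1.0):
    """Sum of rewards, each weighted by gamma ** (its step index)."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Without a negative reward: a small reward per step beats reaching the goal.
loiter = [0.125] * 1000            # hypothetical low-risk action repeated
reach_goal = [0.0] * 9 + [10.0]    # hypothetical goal reward after 10 steps

# With a per-step negative reward of 0.25 added to both policies.
loiter_penalized = [0.125 - 0.25] * 1000
reach_goal_penalized = [-0.25] * 9 + [10.0 - 0.25]
```

In this example the undiscounted returns are 125.0 for loitering against 10.0 for reaching the goal, and the ordering reverses to -125.0 against 7.5 once the per-step negative reward is included, which is the effect the combination of positive and negative rewards is meant to achieve.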

Furthermore, if seeking rewards is regarded as the desire of AI (in deep reinforcement learning), then the AI is asked only to seek the means of achieving its purpose. This is equivalent to “self-transcendence,” the top level of the seven-level version of Maslow’s hierarchy of needs. However, it is realized from the beginning without any other needs, such as the need for self-actualization, the concept of guilt, or humility. There is a risk that models are generated that lack social impact items, such as multi-viewpoint thinking from multimodal inputs and empathy. Therefore, it is necessary to design rewards to prevent these occurrences.

The limit of the conventional rule-based method is that it cannot go beyond the range designed by humans, whereas the correspondence range of the neural network model in the deep reinforcement learning method is not limited by human design. There is therefore a risk that, because of design mistakes, the neural network exceeds its intended range through its own learning.

To reduce design risks, the following can be implemented: verification of the functional safety process, limiting the role of AI, adding assist functions to prevent accidents, external monitoring, and equipping the robot with emergency stop functions. In particular, these measures include minimizing the constraints, combining positive and negative rewards, designing rewards that consider social influence, clarifying the assignment of roles between AI-equipped devices and assisting persons, confirming the role coverage, installing a self-diagnosis and verification mechanism, verifying safety, and monitoring the AI robot so that it is stopped when abnormal behavior is detected.
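External monitoring with an emergency stop can be sketched as a small rule-based check placed between the learned policy and the drive. The following Python example is one possible shape of such a monitor, not the mechanism used in this study: the class `SafetyLimits`, the function `monitored_command`, and the two limits (maximum speed and minimum distance to a person) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SafetyLimits:
    max_speed_m_per_s: float       # hypothetical speed limit
    min_human_distance_m: float    # hypothetical minimum clearance to a person


def monitored_command(proposed_speed, human_distance, limits):
    """Pass the AI's speed command through, or override it with a stop.

    Returns (speed, stopped). The command is replaced by 0.0 when the
    proposed speed is too high or a person is closer than the clearance.
    """
    abnormal = (
        abs(proposed_speed) > limits.max_speed_m_per_s
        or human_distance < limits.min_human_distance_m
    )
    if abnormal:
        return 0.0, True
    return proposed_speed, False
```

Keeping the monitor independent of the learned model, with fixed thresholds, follows the idea of limiting the role of AI: the monitor does not learn, so its behavior can be verified separately.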

3.4. Risk from AI Independence

It is possible for AI to have creativity, and even if initially there was integrity in the implementation of hardware and software, there are problems such as self-repair, automatic proofreading, self-replication and automatic function expansion, and attacks against other programs. There is also a risk that the portion involving the autonomy, creativity, and personality of AI will be expanded beyond the scope of the original implementation. For example, there are risks of complementing recognition by voluntarily moving parts that are hidden, and interpolating recognition by estimating even if the environments and human conditions are not visible and detectable.

To reduce this independence risk, we should consider the following points: no biological constraints, meaninglessness of reward when AI has its own value, and not overturned due to spatiotemporal blind spots. Moreover, without human intervention, AI may predict and complement unknown information, intervene too much or too little, and attack people. Therefore, a learning curriculum should be prepared, together with a mechanism that allows humans to interrupt the AI.

3.5. Risk from Human Mind

In imitation learning, which imitates human acts, no problem seems likely to occur at first glance because the content of the learning is limited; nevertheless, it is necessary to consider the risks in the human acts themselves. Likewise, even in deep reinforcement learning, there are risks due to the learning data and its curriculum. There is also a risk that, even if there is no problem in the safety of AI itself, people intentionally (or unintentionally) misuse AI for criminal activities. For this risk, there are approaches to risk reduction that state the attitude expected of researchers, such as the ethical guidelines of the Ethics Committee of the Japanese Society for Artificial Intelligence [7].

This risk includes the possibilities of injury to patients, abuse, and crime. To reduce the risks, it is necessary to provide ethical guidelines for researchers and users, education, and legal constraints on the formulation and compliance of robot safety standards applied to AI.

3.6. Risk from Weakness of Resilience

Resilience is mainly related to the third of the following approaches: prevention, mitigation, and recovery. It is the ability to handle the unexpected. Further, it is an indicator related to the sustainability of the system. It shows the ability to absorb changes and disturbances to maintain a certain state. Moreover, it depicts the ability to mitigate the effects of physical and social system tolerance and serious harm and to recover from a crisis. Bruneau et al. described the concept of resilience as the ability to minimize any reduction in the quality of life (infrastructure) due to earthquakes; in their description, resilience consists of the following properties: robustness, redundancy, resourcefulness, and rapidity [12,13]. Moreover, these authors produced a graph of the resilience curve. Further, this concept and its properties are considered applicable to AI. Figure 6 shows a resilience curve, where the horizontal axis represents time, and the vertical axis represents the system function. The line descending to the right shows the severity and tolerance of harm. The line rising to the right shows the resilience.
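One common way to put a number on a curve of this kind is the area between full function and the actual function over the recovery period: a smaller area means a shallower drop, a faster recovery, or both. The Python sketch below computes that area with the trapezoidal rule. Treating the curve of Figure 6 this way is an assumption made for illustration, as are the function name and the 0 to 100 scale of the system function.

```python
def resilience_loss(times, function_levels, full=100.0):
    """Area between full function and the actual function (trapezoidal rule).

    times and function_levels are equally long sequences describing the
    curve; full is the level of an undisturbed system.
    """
    if len(times) != len(function_levels):
        raise ValueError("times and function_levels must have the same length")
    loss = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        gap_before = full - function_levels[i - 1]
        gap_after = full - function_levels[i]
        loss += 0.5 * (gap_before + gap_after) * dt
    return loss
```

For a hypothetical accident at time 0 that drops the function from 100 to 50, followed by a linear recovery over 4 time units (`times=[0, 0, 4]`, `function_levels=[100, 50, 100]`), the loss is 100.0; halving the recovery time halves the loss.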

In our previous study on the risk of AI [14], we systematically organized some of the risks and proposed some risk reduction measures. However, we did not consider the risks after an accident. Weakness of resilience is sometimes overlooked. When a patient is seriously injured, this risk may lead to fatal conditions, an extension of the hospitalization period, or a loss of trust in the hospital.

For the patient to recover as soon as possible, it is necessary to plan and clarify the restoration method and procedure assumed after an accident. Concisely, emergency preparedness is necessary. For example, in the case of deep reinforcement learning, we can strengthen the recovery ability by causing accidents in the simulation environment and giving more positive reward to the AI system, so that the recovery time can be shortened.

We propose four quantitative indicators of resilience for clinical safety: recovery time of a patient’s physical and mental condition (health), recovery time of the environment (environment), recovery time of the medical staff’s routine work (role), and recovery cost in the hospital (occupied period of beds for inpatients). However, there is room for discussion on which indicators should be evaluated quantitatively. These indicators are improved by the clarification of restoration methods and procedures assumed after an accident, which are a form of crisis management. For systems with stochastic uncertainty (such as robots applying AI), resilience is also necessary. In our experiment, we handled this with a safety reward. We conducted reinforcement learning by giving a positive reward for quickness of recovery after an accident and a negative reward for a delay.
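A safety reward of the kind described above could take many forms; the paper does not state the one used in the experiment. The Python sketch below shows one simple possibility, a reward that is linear in the recovery time relative to a target time. The function name, the target time, and the linear shape are all assumptions.

```python
def recovery_reward(recovery_time_s, target_time_s, scale=1.0):
    """Positive when recovery beats the target time, negative when delayed.

    The reward is 0 when recovery takes exactly the target time and grows
    linearly with the time saved (or lost) relative to that target.
    """
    if target_time_s <= 0:
        raise ValueError("target_time_s must be positive")
    return scale * (target_time_s - recovery_time_s) / target_time_s
```

With a hypothetical target of 60 s, recovering in 30 s yields +0.5 and recovering in 90 s yields -0.5, so shorter recovery after a simulated accident is reinforced and delay is penalized.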

4. Discussion

The risks of autonomous mobile assistant robot control based on deep reinforcement learning are considered to be high. We considered the risks from design, the tolerance of AI autonomy, and the human mind to be the most important. However, even if the risk is high, there is a possibility of applying AI, and the advantages and disadvantages should be compared. If the advantages outweigh the disadvantages, it is necessary to take measures against the disadvantages.

As future work, to confirm the effectiveness of this proposal, it is necessary not only to run simulations but also to compare the results obtained when nurses carry out the risk assessment and the risk reduction measures with and without robots in an actual hospital. Moreover, an analysis of the related effects on the medical staff is necessary to measure the reduction in their caregiving load.

5. Conclusions

We extracted and systematized the characteristic risks of AI. Moreover, we proposed risk reduction measures for the case in which deep reinforcement learning is applied to the control of autonomous mobile robots that assist in reducing the fall risk of patients in hospitals. Because these results are described in general terms, they suggest that these viewpoints (i.e., risk extraction and risk reduction measures applying deep learning) are useful not only in the field of clinical safety, but also in fields such as industrial safety.

Author Contributions

Conceptualization, methodology, analysis, investigation, and writing, T.N.; review, Y.Y.

Funding

This research received no external funding.

Acknowledgments

The authors thank the staff of the fall prevention working group at Nagoya University Hospital.

Conflicts of Interest

The authors declare no conflict of interest.

References

Kurzweil, R. The Singularity is Near: When Humans Transcend Biology; Viking Books: New York, NY, USA, 2005.
IEC International Electrotechnical Commission. IEC61508-ed2: Functional Safety of Electrical/Electronic/Programmable Electronic Safety-Related Systems; IEC International Electrotechnical Commission: Geneva, Switzerland, 2010.
Fujiwara, K.; Sumi, Y.; Ogure, T.; Nakabo, Y. Three Safety Policies of Artificial Intelligence based on Robot Safety. In Proceedings of the 2017 JSME Conference on Robotics and Mechatronics, 1A1-F01, Fukushima, Japan, 10–13 May 2017. (In Japanese).
Nakabo, Y.; Fujiwara, K.; Sumi, Y. Consideration of Errors and Faults Based on Safety of Machinery for Robot using Artificial Intelligence. In Proceedings of the 2017 JSME Conference on Robotics and Mechatronics, 1A1-F02, Fukushima, Japan, 10–13 May 2017. (In Japanese).
Sumi, Y.; Kim, B.; Fujiwara, K.; Nakabo, Y. Development of Safety Evaluation Platform for the Robot using Artificial Intelligence. In Proceedings of the 35th RSJ, 3J1-06, Kawagoe, Japan, 11–14 September 2017. (In Japanese).
Fujiwara, K.; Sumi, Y.; Ogure, T.; Nakabo, Y. Three Policies of AI-Safety in viewpoint of Functional Safety and Asymmetric Classification Methods for Judgement of Safety. In Proceedings of the 23rd Robotics Symposia, 3A1, Yaizu, Japan, 13–14 March 2018. (In Japanese).
The Japanese Society for Artificial Intelligence. The Ethic Committee of the Japanese Society for Artificial Intelligence, Ethical Guidelines; The Japanese Society for Artificial Intelligence: Tokyo, Japan, 2017.
Amodei, D.; Olah, C.; Steinhardt, J.; Christiano, P.; Schulman, J.; Mane, D. Concrete Problems in AI Safety. arXiv, 2016; arXiv:1606.06565v2.
Namba, T.; Yamada, Y. Risk Analysis Method Preventing the Elderly from Falling. In Proceedings of the 22nd Robotics Symposia, 3A1, Annaka, Japan, 15–16 March 2017. (In Japanese).
Namba, T.; Yamada, Y. Fall Risk Reduction for the Elderly by Using Mobile Robots Based on the Deep Reinforcement Learning. J. Robot. Netw. Artif. Life 2018, 4, 265–269.
Kobayashi, K.; Imagawa, S.; Suzuki, Y.; Nishida, Y.; Nagao, Y.; Ishiguro, N. Analysis of falls that caused serious events in hospitalized patients. Geriatr. Gerontol. Int. 2017.
Bruneau, M.; Chang, E.S.; Eguchi, T.R.; Lee, C.G.; O’Rourke, D.T.; Reinhorn, M.A.; Shinozuka, M.; Tierney, K.; Wallace, A.W.; Winterfeldt, V.D. A Framework to Quantitatively Assess and Enhance the Seismic Resilience of Communities. Earthq. Spectra 2003, 19, 733–752.
Aoki, M.; Itoi, T.; Sekimura, N. Resilience-Based Framework of engineered Systems for Continuous Safety Improvement. In Proceedings of the 12th International Conference on Structural Safety and Reliability, Vienna, Austria, 6–10 August 2017.
Namba, T.; Yamada, Y. Risk of Deep Reinforcement Learning Applied to the Control Technology for the Autonomous Mobile Robot in the Human/Robot Coexisting Environment—A Study to Assist the Prevention from Falling for the patients in the Hospital. In Proceedings of the 2018 JSME Conference on Robotics and Mechatronics, 2A2-A13, Kokura, Japan, 2–5 June 2018. (In Japanese).

Figure 1. A nursing system preventing patients from falling using deep learning.

Figure 2. Deep Q-network model for learning fall risk reduction measures.

Figure 3. Characteristic risks to deep reinforcement learning.

Figure 4. Life cycle of artificial intelligence (AI).

Figure 5. Deep neural network model for risk assessment.

Figure 6. Resilience curve.

Table 1. Assessment score sheet for the patients and target of automation.

Assessment	Yes	No	Automation (Outpatient)	Automation (Inpatient)
Past history
History of fall	1	0		○
History of syncope	1	0		○
History of convulsions	1	0		○
Impairment
Visual Impairment	1	0		○
Hearing impairment	1	0		○
Vertigo	1	0	○	○
Mobility
Wheelchair	1	0	○	○
Cane	1	0	○	○
Walker	1	0	○	○
Need assistance	1	0	○	○
Cognition Disturbance of consciousness
Restlessness	1	0	○	○
Memory disturbance	1	0		○
Decreased judgment	1	0		○
Dysuria
Incontinence	1	0		○
Frequent urination	1	0		○
Need helper	1	0		○
Go to bathroom often at night	1	0		○
Difficult to reach the toilet	1	0		○
Drug use
Sleeping pills	1	0		○
Psychotropic drugs	1	0		○
Morphine	1	0		○
Painkiller	1	0		○
Anti-Parkinson drug	1	0		○
Antihypertensive medication	1	0		○
Anticancer agents	1	0		○
Laxatives	1	0		○
Dysfunction
Muscle weakness	1	0		○
Paralysis, numbness	1	0	○	○
Dizziness	1	0	○	○
Bone malformation	1	0		○
Bone Rigidity	1	0		○
Brachybasia	1	0		○

Patients are classified into three groups: Grade 1 (low risk), Grade 2 (moderate risk), and Grade 3 (high risk) on the basis of total scores of 0–5, 6–15, and ≥ 16 (and including at least one item in each category), respectively.
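As a minimal sketch, the score thresholds in the note above could be applied as follows. The function name is hypothetical, and the additional condition about having at least one item in each category is deliberately not modeled, because its exact scope is not clear from the note.

```python
def fall_risk_grade(item_scores):
    """Return the fall risk grade (1, 2, or 3) from Table 1 item scores.

    Each item is scored 1 (Yes) or 0 (No). Only the total-score thresholds
    (0-5, 6-15, >= 16) are applied; the per-category condition mentioned in
    the table note is omitted in this sketch.
    """
    scores = list(item_scores)
    if any(s not in (0, 1) for s in scores):
        raise ValueError("each item score must be 0 or 1")
    total = sum(scores)
    if total <= 5:
        return 1  # Grade 1: low risk
    if total <= 15:
        return 2  # Grade 2: moderate risk
    return 3      # Grade 3: high risk
```

For instance, a patient with six items answered Yes falls into Grade 2 under these thresholds.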

Table 2. Conditions for deep reinforcement learning.

Definitions	Conditions
Experiment environment	Deep Q-Learning Simulator
Algorithm	ε-greedy strategy (ε = 0.9→0.01)
Reward discount rate	γ = 0.9
Learning style	Scene learning
AI platform	Chainer
Programming language	Python
State	196 dimensions
Camera/LRF sensing data	position, pose, time
Patient’s data	Fall history, medical condition, medication, aid
Management data	Operational status of nurses and robots
Action	37 actions
Fixed speed, all 32 directions	32
Stop	1
Risk reduction measures	4 (Remove, avoid, transit, accept)
Reward	Normalize each element
Achieve purpose	Positive/Negative
Intervention effect	Positive/Negative
Transfer efficiency	Positive/Negative
Safety	Positive/Negative
Processing interval	100 ms
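The exploration and discounting settings in Table 2 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' Chainer implementation: the linear annealing schedule, the `decay_steps` parameter, and all function names are hypothetical, while the values ε = 0.9→0.01, γ = 0.9, and the 37 actions come from the table.

```python
import random

N_ACTIONS = 37                  # Table 2: 32 directions + stop + 4 measures
EPS_START, EPS_END = 0.9, 0.01  # Table 2: epsilon range
GAMMA = 0.9                     # Table 2: reward discount rate


def epsilon_at(step, decay_steps):
    """Anneal epsilon linearly from EPS_START to EPS_END (assumed schedule)."""
    frac = min(max(step / decay_steps, 0.0), 1.0)
    return EPS_START + frac * (EPS_END - EPS_START)


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, else take argmax Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def td_target(reward, next_q_values, done):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    if done:
        return reward
    return reward + GAMMA * max(next_q_values)
```

In a deep Q-network, `q_values` would be the network output for the 196-dimensional state, and `td_target` would supply the regression target for the chosen action.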

Table 3. Systemization of risks and proposal of risk reduction measures.

Risk Classification Characteristic of AI	Risk Factors (Rule-Based Type)	Risk Factors (DQN-Based Type)	Risk Cases (Severity of Harm)	Construction Policy on Risk Reduction Measures	Proposed Risk Reduction Measures
Change	Completeness of branching by threshold	Inductive method exception	Leads to inappropriate behavior (Fracture by falling)	Updating DNN model, and detecting change	Automatic real-time risk assessment and risk reduction (Learning international safety standards)
Error/Gap/Delay	Validity and optimality of threshold	Difference between virtual and real	Fracture caused by colliding or falling	Reducing spatiotemporal gap or learning including deviation	Data augmentation, ensemble, learning in space–time robust simulation
Design	Completeness, validity, optimality, security	Human defined parameter, no biological restriction	Fracture caused by colliding, slipping, falling during transfer or movement	External monitoring and emergency stop function	Safety verification,
AI independence	Designer’s intention	Creativity of AI	Attacking people without human intervention	Interacting with people	Mechanism to stop braking with human interaction
Human mind	Human malice, carelessness, misuse	Human malice, carelessness, misuse	Injuries to patients by outside purpose	Enlightenment, ethical guidelines, legislation	Safety standards related to AI robotics, legal constraints
Weakness of resilience	Unexpected accident	Not assuming after accident	Extension of hospitalization period	Clarification of restoration method	Simulate recovery method and procedure; positive reward for shortening recovery time

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
