Risks of Deep Reinforcement Learning Applied to Fall Prevention Assist by Autonomous Mobile Robots in the Hospital

Our previous study proposed an automatic fall risk assessment and related risk reduction measures. A nursing system to reduce patient accidents was also developed, thereby reducing the caregiving load on the medical staff in hospitals. However, there are risks associated with artificial intelligence (AI) in applications such as assistant mobile robots that use deep reinforcement learning. In this paper, we discuss the safe application of AI in fields where humans and robots coexist, especially when applying deep reinforcement learning to the control of autonomous mobile robots. First, we summarize recent related work on robot safety with AI. Second, we extract the risks linked to the use of autonomous mobile assistant robots based on deep reinforcement learning for patients in a hospital. Third, we systematize the risks of AI and propose sample risk reduction measures. The results suggest that these measures are useful in the fields of clinical and industrial safety.


Social Background
There have been significant improvements in the accuracy of deep learning in fields such as image recognition, behavior recognition, object detection, and scene recognition. Moreover, these techniques have begun to surpass human abilities in some fields. Advances in technology will produce irreversible changes to the fundamental concepts of human life, otherwise known as the "singularity" [1]. The support and replacement of human work by artificial intelligence (AI) systems (and the development of systems capable of cooperating with people) are progressing in various industrial fields. However, applying AI to safety technology is difficult, as it is prohibited by existing international standards related to the safety of machinery, such as IEC 61508 ed. 2 [2]. In practice, AI learns the surrounding features and its own behavior through interactions with its surroundings. Therefore, during AI development, a communication robot sometimes learns inappropriate behaviors by interacting with its surroundings. For example, in March 2016, Microsoft's conversational AI "Tay" started to talk about racial discrimination, sexism, and conspiracy theories, which it had learned through Twitter during technical experiments. This prompted Microsoft to immediately shut Tay down. Thus, safety is difficult to guarantee if a human being is not properly involved in the learning processes of AI. In May 2016, a Tesla Model S collided with another vehicle while on Autopilot, causing a fatality. Therefore, it can be difficult to use AI as an element of safety-related parts.

Related Work
To apply AI technology to a system, there are three approaches, according to how AI and safety technology are involved. The "Three Safety Policies of Artificial Intelligence based on Robot Safety" have been proposed to achieve a systematic approach [3]. These policies regard the application of AI to non-safety-related parts, safety-related parts, and humans. We agree with this systematizing approach. On the other hand, the "Consideration of Errors and Faults Based on Machinery for Robot using Artificial Intelligence" has reported that it is appropriate to treat errors in AI functions as probabilistic faults [4]. Moreover, it has shown that it is impossible to eliminate errors in practice, because AI makes human-like errors. That paper argues that there are four possible ways to guarantee safety. The first is analyzing trends, such as the variance and standard deviation of the learning error, and evaluating the likelihood of error. The second is duplicating the system to secure diversity. The third is reducing the possibility of errors. The fourth is correctly evaluating the error risk of AI, comparing advantages and disadvantages, and determining acceptable risk levels. In addition, it should be demonstrated that learning methods and supervised data are transparent (i.e., visible and explainable). This concerns their development and evaluation processes, disclosure of accountability, clarification of accountability, recording of learning processes, and securing of reproducibility. In one study, the recognition accuracy of AI and sensor performance have been compared. However, we think that such a comparison is inappropriate, because AI also uses sensor data. Assuming that a sensor is in the foundation layer, AI is in the application layer. The accuracies with and without AI (from the viewpoint of the application) should be compared instead. In terms of safety verification, the necessity of establishing a quantitative and analytical evaluation method has been demonstrated. Further, a safety evaluation platform for robots using AI is being examined [5], in which an autonomous moving function using intelligence can be installed as an additional interface.
In a recent study, Fujiwara et al. proposed an asymmetric classification method for the judgment of safety, which suppresses the probability of dangerous-side failures by judging uncertainty as the dangerous side [6]. However, this method sacrifices the multi-class classification accuracy of AI in return for giving priority to safety. Apart from the safety perspective of applying AI, "The Japanese Society for Artificial Intelligence Ethical Guidelines" are considered important for research [7]. Google, OpenAI, Stanford University, and the University of California, Berkeley have reported five main challenges specifically associated with the safe use of AI [8]: (1) avoiding negative side effects, including adverse effects on the surroundings, the interactions with humans and the environment, and vandalism; (2) avoiding reward hacking, which takes into consideration the measures for achieving desire and malicious hacking from the outside; (3) scalable oversight for proper and efficient feedback; (4) safe exploration to secure safety, such as during learning by simulation; (5) robustness toward distributional shifts, to manage changes in cases that are significantly different from the learning environment.
In these previous studies, there has been insufficient consideration of the development procedures specific to AI, such as training/validation/verification and the safety of the entire AI life cycle (including the online/offline updating of AI models). Considering the entire life cycle leads to a reduction in unsupported events. Moreover, regarding the systematization of the risks and risk reduction measures of AI, a clarification of the risk factors in a human-robot coexistence environment and specific draft measures for risk reduction are required. Furthermore, no measures have been devised to achieve compatibility between estimation ability and safety for unknown/unlearned subjects, so as not to impair the flexibility and robustness of AI.
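The idea of judging uncertainty as the dangerous side can be illustrated with a minimal sketch. This is not the method of [6]; the function name `judge_safety`, the threshold value, and the two-class example are assumptions made only for illustration.

```python
from typing import Sequence


def judge_safety(class_probs: Sequence[float], safe_index: int,
                 threshold: float = 0.9) -> str:
    """Return "safe" only when the safe class is predicted with high confidence.

    Any uncertain or malformed prediction falls to the dangerous side.
    The threshold of 0.9 is a hypothetical value, not taken from the paper.
    """
    if not class_probs or not 0 <= safe_index < len(class_probs):
        return "dangerous"
    return "safe" if class_probs[safe_index] >= threshold else "dangerous"


# A confident prediction and an uncertain one (illustrative probabilities).
print(judge_safety([0.95, 0.05], safe_index=0))
print(judge_safety([0.60, 0.40], safe_index=0))
```

Raising the threshold lowers the chance of a dangerous-side failure at the cost of more false alarms, which is the accuracy trade-off noted above.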

Our Objective
In this paper, we extracted the risks and proposed risk reduction measures for applying deep reinforcement learning to the control of an autonomous mobile robot. The robot was designed to help prevent patients from falling and to reduce the caregiving load on families and medical staff. This paper is mainly based on our previous studies [9,10], in which we developed a deep-learning-based cognitive and control system that uses an autonomous mobile robot to prevent elderly people from falling. There are three factors in patient falls: external factors (i.e., environment), internal factors of the patient (i.e., condition), and management (i.e., organization/system). Our approach was to assist in the recognition, judgment, and control of these factors using AI and robotics. Therefore, we performed automated and real-time risk assessments (and risk reduction measures) with an autonomous mobile robot, throughout the life cycle of patients in a hospital. Then, we applied both deep learning and deep reinforcement learning to detect unknown signs (such as a fall prediction) and automated the decision-making of the optimal policy in intervention assistance.
First, we extracted the fall risks of patients in each phase of their life cycle in the hospital. Second, we discussed the goal of assistance by autonomous mobile robots. Third, we proposed assist methods using deep reinforcement learning. Finally, we extracted the risks characteristic of AI (especially deep learning and deep reinforcement learning), systematized the risks, and then proposed some risk reduction measures, mainly in terms of cognitive and control technology.

Upon Arrival
Emergency transportation and pick up. In this phase, an initial screening of outpatients and inpatients was conducted by automatic risk assessment based on deep learning, using data from cameras and laser range finders (LRFs). The assessment detected the equipment in the hospital and the number of people, then analyzed their appearance and the items they brought. It identified patients at high risk of falling and recorded the results in an electronic medical record linked with the patient ID. Thus, the medical staff and the mobile assistant robot shared the information and paid particular attention to the moderate- and high-risk patients. To judge whether a patient was moderate- or high-risk, an assessment score sheet was completed, and a judgement was made, as shown in Table 1 [11].
In this phase, patients at high risk of falling were mainly monitored, and falling triggers around them were detected. Moreover, the conditions of the patient in the electronic medical record were considered. Thus, the data on current conditions and patient records were combined, and the current patient risk was judged.
This phase refers to the condition of the patient indicated in the electronic medical record and follow-ups.
Loss of balance and consciousness caused by the following:

• Effects of changes in physical conditions
• Sedation and anesthesia after surgery
• Rising or standing up

Rehabilitation
This phase refers to the condition of the patient indicated in the electronic medical record and follow-ups.
Loss of balance caused by the following: Inappropriate care (holding both hands of a patient and guiding, the work of one person being done by two, the work of two persons being done by one, leaving a patient alone).

In the Hospital Room
This phase refers to the fall history of the patient and the conditions in the electronic medical record (paying particular attention to the moderate- and high-risk patients). To judge whether the patient was moderate- or high-risk, an assessment score sheet was completed, as shown in Table 1.
Loss of balance caused by the following:

• Transfer between the bed, wheelchair, or stretcher
• Performing activities by themselves (or without assistance), such as going to the toilet, taking a walk, or going to other areas of the hospital
• Leaning on the drip stand
• Forgetting to fix equipment (such as tables)
• Environmental change due to differences in buildings and departments
• Changes in the mental state, such as impatience, anxiety, or mental conditions
• Personality traits, such as overconfidence and wariness of pressing a nurse call button

At Discharge
This phase refers to the condition of the patient indicated in the electronic medical record and subsequent actions.
Loss of balance and consciousness caused by the following:

• Continued sedation and anesthetic effect
The other factors are as in Section 2.1.1.

Consideration on Support Target by Autonomous Mobile Robots
It is important to confirm the arrival of each patient and conduct a risk assessment for those who are considered high-risk. This includes first-time visitors to the hospital. However, this is a burden for both patients and medical staff, as there are more than 2000 outpatients per day and more than 1000 beds for the inpatients.
We propose a nursing system that prevents patients from falling, as shown in Figure 1. When conducting the initial screening (to extract and focus attention on high-risk patients), some initiatives are considered useful for reducing the caregiving load on medical staff: using autonomous mobile robots with cameras, laser range finders (LRFs), and various sensors to detect risks, communicate with stakeholders, and assist with interventions, and using hospital facilities such as medical treatment reception machines with fixed cameras. Moreover, automatic risk analysis is considered useful for nursing patients in the absence of nurses. If it is necessary to improve these measures, warnings can be given by patrols to both nurse stations and people around hazardous equipment.

Proposal of Assist Method by Using Deep Reinforcement Learning
Conventional risk assessments and risk reduction measures are usually conducted manually when employed in clinical safety requirement assessment or nursing planning. This type of assessment has the following limitations:

• The completeness of risk extraction is dependent on the experience and capability of the medical staff
• Risk assessment procedures are sometimes complex and require a specific number of person-hours depending on patient numbers. However, immediate risk assessment and reductions are required.
Therefore, we chose an approach that applies deep learning to risk assessment and risk reduction. However, when judging whether an intervention is required, it is difficult to detect the medical situation and the injury or illness of a patient from camera images and sensor data alone. Thus, it is necessary to acquire information from the patient database, which is input by the medical staff as electronic medical records. Moreover, it is necessary to ascertain the operation status of the medical staff, thereby predicting the content and timing of the assistance to provide.
We propose autonomous mobile assistant robots (using deep reinforcement learning), which select optimal actions and respond robustly to multimodal input changes in the environment and patients. Figure 2 shows a deep Q-network (DQN) model for learning fall risk reduction measures. The input data comprise sensing data from the environment and patient condition data from the electronic medical records. First, the three layers following the input layer are convolutional and pooling layers. Next, six layers are fully connected layers, including dropout to avoid overfitting. Each activation function is a rectified linear unit (ReLU). Finally, the output layer provides an estimated Q-value for each action. Therefore, the optimal action is obtained as the one with the maximum Q-value.
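Choosing the action with the maximum Q-value can be sketched as follows. This is a minimal illustration, not the network of Figure 2: the action names, the Q-values, and the optional exploration rate `epsilon` are assumptions introduced here.

```python
import random
from typing import Optional, Sequence

# Hypothetical action set; the paper does not list the robot's actions.
ACTIONS = ("keep_monitoring", "notify_nurse", "approach_patient")


def select_action(q_values: Sequence[float], epsilon: float = 0.0,
                  rng: Optional[random.Random] = None) -> int:
    """Pick an action index from the Q-values of the output layer.

    With probability epsilon a random action is explored; otherwise the
    action with the maximum estimated Q-value is chosen (greedy policy).
    """
    if not q_values:
        raise ValueError("q_values must not be empty")
    rng = rng or random.Random()
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Illustrative Q-values as they might come from the output layer.
q = [0.10, 0.70, 0.20]
print(ACTIONS[select_action(q)])
```

With `epsilon` set to zero the choice is purely greedy, which matches the description above; a nonzero value is only relevant during learning.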
It is difficult to obtain the consent of the patient regarding personal information, including camera images and electronic medical charts. However, to avoid risks to patients' health and life, the use of this information is permitted by Article 23, paragraph 1, item 2 of the personal information protection law in Japan.

Results
The autonomous mobile robot used in this study had a two-wheel differential drive, and learning was performed under the conditions of Table 2. We classified the risks characteristic of deep reinforcement learning into six factors as follows: Change, Error/Gap/Delay, Design, AI independence, Human mind, and Weakness of resilience. These were systematized, as shown in Figure 3 and Table 3.

Risk from Changes
In the conventional rule-based model, control is performed by branch judgment based on threshold values. Therefore, it cannot respond appropriately to unexpected conditions.
On the other hand, deep learning can respond flexibly, based on the features it has learned, if the conditions are similar, even if they have not previously been experienced. However, deep learning generally carries the risks of an inductive method. Deep reinforcement learning in particular copes poorly with changes in the environment and rules; this weakness is related to the covariate shift of the neural network, caused by the data distribution shifting between training time and estimation time. It is therefore necessary to repeat learning and updating, as shown in Figure 4, where the update procedure is indicated by the red circular arrow at the lower left. Practical examples of such changes are changes in a patient's physical condition and mental condition, transitions between wards, and medication/sedation treatments and procedures.
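As a rough illustration of when the update procedure might be triggered, the sketch below compares a feature's distribution at training time with recent live data. It is a crude proxy for covariate shift and not the authors' procedure; the function names, the sample values, and the limit of 2.0 standard deviations are assumptions.

```python
from statistics import mean, pstdev
from typing import Sequence


def shift_score(train: Sequence[float], live: Sequence[float]) -> float:
    """Distance between live and training means, in training standard deviations."""
    if not train or not live:
        raise ValueError("both samples must be non-empty")
    mu, sigma = mean(train), pstdev(train)
    if sigma == 0:
        return 0.0 if mean(live) == mu else float("inf")
    return abs(mean(live) - mu) / sigma


def needs_retraining(train: Sequence[float], live: Sequence[float],
                     limit: float = 2.0) -> bool:
    """Flag the model for re-learning when the shift exceeds a hypothetical limit."""
    return shift_score(train, live) > limit


# Illustrative values for one input feature (for example, walking speed).
train_feature = [1.0, 2.0, 3.0, 4.0, 5.0]
print(needs_retraining(train_feature, [3.0, 3.0, 3.0]))
print(needs_retraining(train_feature, [9.0, 10.0, 11.0]))
```

A single-feature mean comparison misses many kinds of shift, so a practical check would likely need to cover more features and more of the distribution.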
To reduce these risks, our policy is to recognize both change and the rate of change. Learning should be repeated and the deep reinforcement learning model updated, so that the new AI model recognizes the changed state. Moreover, it should predict situations a few seconds before they happen.
Thus, we created an updated neural network model and performed automatic (and real-time) risk assessment and risk reduction. The updated neural network handled time series and detected change and the rate of change using a recurrent neural network with long short-term memory (RNN-LSTM) or multi-stream convolutional neural networks (CNNs), as shown in Figure 5.
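The two quantities named above, change and rate of change, can be made explicit with finite differences over a sampled signal. In the paper these are detected by the RNN-LSTM or multi-stream CNN; the sketch below only shows what the quantities are, and the function name, the sampling interval `dt`, and the sample values are assumptions.

```python
from typing import List, Sequence, Tuple


def change_and_rate(samples: Sequence[float],
                    dt: float) -> Tuple[List[float], List[float]]:
    """Return per-step change and rate of change of a uniformly sampled signal.

    dt is the sampling interval in seconds (a hypothetical parameter).
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    changes = [b - a for a, b in zip(samples, samples[1:])]
    rates = [c / dt for c in changes]
    return changes, rates


# Illustrative trunk-tilt readings sampled every 0.5 s.
changes, rates = change_and_rate([1.0, 1.5, 3.0], dt=0.5)
print(changes, rates)
```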

Risk from Errors, Gaps, and Delay
In the conventional rule-based model, the validity of the model and the optimality of its thresholds depend on past statistical data or on the designer's skill.
However, in the DQN-based model (from the environmental and patient perception points of view), in addition to sensor errors, misrecognition and non-recognition of the actual environment are considered to cause collisions and falls. Further, there is also a risk of misjudgment because the relationship between input and output has been insufficiently learned. However, the existence of a mismatch may not in itself be a risk when functions are realized by deep learning. This is because a model that recognizes the environment robustly by deep learning does not achieve a recognition rate of 100%, in order to prevent overfitting. This is a compromise between robustness and accuracy. Moreover, different deep learning frameworks and models are usually combined to improve accuracy and stabilize the judgment. However, when the input data size (dimension) differs for each model, there is a risk of inconsistency and performance deterioration (due to the combination of model and input data), because resizing may not fully preserve the function, performance, and accuracy of the original model. In addition, the internal covariate shift of neural networks is caused by a distribution change due to simultaneous weight updates across the multilayer structure.
Furthermore, there is a risk that transmission delays accumulate from sensing and recognition, through the integration of recognition, judgment, and control processes, to their effect on the controlled environment. As time elapses, the state to which the control should be applied and the actual state deviate from each other (which could make the control inadequate); therefore, the whole series of processes must run in real time.
These errors, gaps, and delays can cause collisions with obstacles and people (due to spatiotemporal recognition errors) when robots assist in transferring patients from beds and wheelchairs, which can cause injuries. Thus, to reduce the risk, spatiotemporal deviation must be addressed, interfaces should be matched, learning should include deviation (noise) and delay, and functional safety processes should be verified. In particular, the following measures apply: data augmentation, early stopping, convolution and pooling, averaging (ensemble, bagging, dropout), batch normalization (the distribution of each layer is fixed during learning), adjustment of hyperparameters according to learning progress, real-time validation in functional safety processes, and learning in a spatiotemporally robust simulation.
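One way to make learning include deviation (noise) and delay is to augment recorded sensor sequences before training. The sketch below is only an illustration under assumed parameters: the function name `augment`, the Gaussian noise model, and the fixed delay in steps are not taken from the paper.

```python
import random
from typing import List, Sequence


def augment(sequence: Sequence[float], noise_std: float,
            delay_steps: int, rng: random.Random) -> List[float]:
    """Return a copy of the sequence with a fixed delay and additive Gaussian noise.

    The first delay_steps outputs repeat the first sample, as if the newest
    readings had not yet arrived. Both parameters are hypothetical.
    """
    if delay_steps < 0:
        raise ValueError("delay_steps must be non-negative")
    delayed = [sequence[max(0, i - delay_steps)] for i in range(len(sequence))]
    return [x + rng.gauss(0.0, noise_std) for x in delayed]


# Illustrative range readings, delayed by two steps with small sensor noise.
rng = random.Random(0)
print(augment([1.0, 2.0, 3.0, 4.0], noise_std=0.05, delay_steps=2, rng=rng))
```

Training on several such variants of each recorded sequence exposes the model to the kind of spatiotemporal deviation described above.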

Risk from Design
There is a risk when certain factors are included in the design of AI. These factors are a lack of variation in the input data, a low exploration rate, an insufficient number of learning iterations, an inadequate neural network (NN) structure for approximating action-value functions (number of nodes, layers, and network paths; inappropriate model selection; approximation error), overfitting, malicious interference or data rewriting from the outside, errors in the training procedure and method, and spatiotemporal errors in robot-environment interaction. In conventional machine learning, there are risks with respect to coverage, validity, and optimality, caused by humans designing decision boundaries. However, in deep reinforcement learning, there are risks depending on the design of state definitions, reward definitions, and the choice of actions. It should be remembered that AI has different thought patterns from humans, because AI lacks biological constraints in time. For example, in deep reinforcement learning, the AI may choose actions that carry less risk for itself (irrespective of the magnitude of the positive reward), because even a small reward yields an infinite total reward when accumulated over infinite time. Combining it with a negative reward is therefore an important risk reduction measure. In addition, when the goal is far, methods such as reward distribution are used to speed up learning, but these can also be regarded as a partial introduction of conventional rule-based logic (for example, a cost map). The result may not necessarily be the optimal solution, because it may be biased by the person who decided the rule.
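The remark about infinite reward can be made concrete with the standard return used in reinforcement learning. The symbols below are assumptions introduced for illustration and do not appear in the paper: r is a small constant reward per step, γ is the discount factor, and G is the accumulated return.

```latex
G \;=\; \sum_{t=0}^{\infty} \gamma^{t} r \;=\;
\begin{cases}
\infty, & \gamma = 1,\ r > 0,\\[4pt]
\dfrac{r}{1-\gamma}, & 0 \le \gamma < 1 .
\end{cases}
```

For example, with r = 0.01 and γ = 0.99 the return is G = 1, so a low-risk action that earns a tiny reward indefinitely can outweigh a single larger reward. A discount factor below one keeps the return finite, and a negative reward for idle or unsafe behavior changes the balance further.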
Furthermore, if seeking rewards is regarded as the desire of AI (in deep reinforcement learning), then the AI is asked only to seek the means of achieving its purpose. This is equivalent to "self-transcendence," the top level of Maslow's seven-level hierarchy of needs. However, it is realized from the beginning without any of the other needs, such as self-actualization, or concepts such as guilt or humility. There is a risk of generating models that lack socially relevant qualities, such as multi-viewpoint thinking from multimodal inputs and empathy. Therefore, it is necessary to design rewards to prevent this.
The limit of the conventional rule-based method is that the design range can be exceeded by human design, whereas the limit of the deep reinforcement learning method is that the correspondence range of the neural network model is not limited by human design. There is a risk that, through design mistakes, the neural network exceeds this range by its own learning.
To reduce design risks, the following can be implemented: verification of the functional safety process, limiting the role of AI, adding assist functions to prevent accidents, external monitoring, and emergency stop functions. In particular, this involves minimizing the constraints, combining positive and negative rewards, designing rewards that consider social influence, clarifying the assignment of roles between AI-equipped devices and assisting persons, confirming the role coverage, installing a self-diagnosis and verification mechanism, verifying safety, and monitoring the AI robot so that it is stopped when abnormal behavior is detected.
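External monitoring with an emergency stop can be sketched as a small supervisor placed between the learned policy and the motors. The class `SafetyLimits`, the function `supervise`, and the limit values are hypothetical and serve only to illustrate the idea of stopping on abnormal behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafetyLimits:
    max_speed: float      # largest speed magnitude treated as normal, in m/s
    min_distance: float   # closest allowed distance to a person, in m


def supervise(command_speed: float, distance_to_person: float,
              limits: SafetyLimits) -> float:
    """Return the speed actually sent to the motors; 0.0 means emergency stop.

    The learned policy proposes command_speed. This rule-based monitor sits
    outside the AI and overrides it when behavior looks abnormal.
    """
    if distance_to_person < limits.min_distance:
        return 0.0  # too close to a person: stop
    if abs(command_speed) > limits.max_speed:
        return 0.0  # abnormal command from the policy: stop
    return command_speed


# Illustrative limits and commands.
limits = SafetyLimits(max_speed=0.5, min_distance=0.5)
print(supervise(0.3, distance_to_person=2.0, limits=limits))
print(supervise(0.3, distance_to_person=0.2, limits=limits))
```

Keeping the monitor rule-based limits the role of AI to the non-safety-related part of the control loop.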

Risk from AI Independence
It is possible for AI to have creativity, and even if the implementation of hardware and software initially had integrity, problems can arise such as self-repair, automatic proofreading, self-replication, automatic function expansion, and attacks against other programs. There is also a risk that the portion involving the autonomy, creativity, and personality of AI will expand beyond the scope of the original implementation. For example, there are risks of complementing recognition by voluntarily moving parts that are hidden, and of interpolating recognition by estimation even when the environment and human conditions are not visible or detectable.
To reduce this independence risk, we should consider the following points: the absence of biological constraints, the meaninglessness of rewards when AI has its own values, and not being overturned due to spatiotemporal blind spots. Moreover, AI predicts and complements unknown information, may intervene too much or too little, and may attack people without human intervention. Therefore, a learning curriculum should be prepared that creates a mechanism to interrupt human interaction.

Risk from Human Mind
In the case of imitation learning, which imitates human acts, problems seem unlikely at first glance because the content of the learning is limited; nevertheless, it is necessary to consider the risks in the human acts themselves. Likewise, even in deep reinforcement learning, there are risks due to the learning data and curriculum. There is also a risk that, even if the AI itself is safe, people intentionally (or unintentionally) misuse AI for criminal activities. For this risk, there are risk reduction approaches that set out the attitude expected of researchers, such as the ethical guidelines of the Ethics Committee of the Japanese Society for Artificial Intelligence [7].
This risk includes the possibilities of injury to patients, abuse, and crime. To reduce the risks, it is necessary to provide ethical guidelines for researchers and users, education, and legal constraints on the formulation of, and compliance with, robot safety standards applied to AI.

Risk from Weakness of Resilience
Resilience is mainly related to the third of the following approaches: prevention, mitigation, and recovery. It is the ability to handle the unexpected. Further, it is an indicator related to the sustainability of the system. It shows the ability to absorb changes and disturbances to maintain a certain state. Moreover, it depicts the ability to mitigate the effects of physical and social system tolerance and serious harm and to recover from a crisis. Bruneau et al. described the concept of resilience as the ability to minimize any reduction in the quality of life (infrastructure) due to earthquakes, and resilience consists of the following properties: robustness, redundancy, resourcefulness, and rapidity [12,13]. Moreover, these authors produced a graph of the resilience curve. Further, this concept and its properties are considered applicable to AI. Figure 6 shows a resilience curve, where the horizontal axis represents time, and the vertical axis represents the system function. The segment falling to the right shows the severity and tolerance of harm, and the segment rising to the right shows the resilience.
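A resilience curve of this kind is often summarized by the area between full function and the curve over the recovery period. The sketch below computes that area with the trapezoidal rule; the function name, the 100% scale, and the sample points are assumptions for illustration, not values from Figure 6.

```python
from typing import Sequence


def resilience_loss(times: Sequence[float], function_level: Sequence[float],
                    full: float = 100.0) -> float:
    """Area between full function and the recovery curve (trapezoidal rule).

    times must be strictly increasing; function_level is the system function
    in percent at each time. A smaller area means a more resilient system.
    """
    if len(times) != len(function_level) or len(times) < 2:
        raise ValueError("need at least two matching samples")
    loss = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        gap_prev = full - function_level[i - 1]
        gap_now = full - function_level[i]
        loss += 0.5 * (gap_prev + gap_now) * dt
    return loss


# Illustrative curve: function drops to 50% after 1 h and recovers by 3 h.
print(resilience_loss([0.0, 1.0, 3.0], [100.0, 50.0, 100.0]))
```

Shortening the recovery time or limiting the initial drop both reduce this area, which corresponds to the rapidity and robustness properties mentioned above.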
In our previous study on the risk of AI [14], we systematically organized parts of the risks and proposed some risk reduction measures. However, we did not consider the risks after an accident. Sometimes, a weakness of resilience is not considered. When a patient is seriously injured, this risk may lead to fatal conditions, extension of the hospitalization period, or a loss of trust in the hospital.
For the patient to recover as soon as possible, it is necessary to plan and clarify the restoration method and procedure assumed after an accident. In short, emergency preparedness is necessary. For example, in the case of deep reinforcement learning, we can strengthen the recovery ability by causing accidents in the simulation environment and giving more positive reward to the AI system, so that the recovery time can be shortened.
We propose four quantitative indicators of resilience for clinical safety: recovery time of a patient's physical and mental condition (health), recovery time of the environment (environment), recovery time of the medical staff's routine work (role), and recovery cost in the hospital (occupied period of beds for inpatients). However, there is room for discussion on which indicators should be evaluated quantitatively. These indicators are improved by the clarification of restoration methods and procedures assumed after an accident, which are a form of crisis management. For systems with stochastic uncertainty (such as robots applying AI), resilience is also necessary. In our experiment, we handled this with a safety reward. We conducted reinforcement learning by giving a positive reward for quickness of recovery after an accident and a negative reward for a delay.
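The safety reward described above can be sketched as a simple function of recovery time. The paper does not give the reward values, so the function name, the target time, the bonus, and the per-second penalty below are all hypothetical.

```python
def recovery_reward(recovery_time: float, target_time: float,
                    bonus: float = 1.0,
                    penalty_per_second: float = 0.1) -> float:
    """Reward quick recovery after a simulated accident and penalize delay.

    Times are in seconds. Recovery within target_time earns the bonus;
    every second beyond it costs penalty_per_second.
    """
    if recovery_time < 0 or target_time < 0:
        raise ValueError("times must be non-negative")
    if recovery_time <= target_time:
        return bonus
    return -penalty_per_second * (recovery_time - target_time)


# Illustrative episodes: one quick recovery and one delayed recovery.
print(recovery_reward(5.0, target_time=10.0))
print(recovery_reward(20.0, target_time=10.0))
```

In training, such a term would be added to the task reward for episodes in which an accident is caused in the simulation environment.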

Discussion
The risks of controlling an autonomous mobile assistant robot by deep reinforcement learning are considered to be high. We regarded the design, the tolerance of AI autonomy, and the mind as the most important aspects. However, even if the risk is high, AI may still be applicable, and its advantages and disadvantages should be compared. If the advantages outweigh the disadvantages, measures must be taken against the disadvantages.
As future work, to confirm the effectiveness of this proposal, it is necessary not only to run simulations but also to compare, in an actual hospital, the results obtained when nurses carry out the risk assessment and the risk reduction measures with and without robots. Moreover, the related effects on the medical staff must be analyzed to measure the reduction in their caregiving load.

Conclusions
We extracted and systematized the characteristic risks of AI. Moreover, we proposed risk reduction measures for the case in which deep reinforcement learning is applied to the control of autonomous mobile robots that assist in reducing the fall risk of patients in hospitals. Because these results are described in general terms, they suggest that these viewpoints (i.e., risk extraction and risk reduction measures for applying deep learning) are useful not only in the field of clinical safety but also in fields such as industrial safety.

Figure 1. A nursing system preventing patients from falling using deep learning.

Figure 2. Deep Q-network model for learning fall risk reduction measures.

Figure 3. Characteristic risks to deep reinforcement learning.

Figure 4 also shows the AI life cycle: (1) drawing a vision, planning, and designing a concept; (2) procuring material and resources; (3) analyzing and defining the request; (4) designing AI; (5) implementing AI; (6) learning and creating the AI model; (7) validating the AI model and adjusting hyper-parameters; (8) verifying and testing the AI model; (9) field variations; (10) providing applications and services; (11) disposing or recycling, a phase in which attention must be paid to security issues such as tampering with the AI model; (12) operation management and maintenance of all processes; if this phase is itself based on AI, some external monitoring and multiplexing by different algorithms are thought to be necessary. In this AI life cycle, when learning is performed and applications and services are provided, it is necessary (in addition to responding robustly to changes in the situation of the assist target) to recognize both the environmental change and its rate of change. These correspond to the changes in the environment and patient behavior, changes in procedures, and changes in the physical and mental condition of a patient at risk of falling described in Section 2.1. They are thought to correspond to a covariate shift of the neural network, caused by a shift in the data distribution between training time and estimation time. Practical examples are changes in a patient's physical or mental condition, a patient's transfer between wards, and medication/sedation treatments and procedures.
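One simple way to notice such a shift in the input distribution is to compare feature statistics collected at training time with those observed during operation. The sketch below is a crude heuristic offered for illustration only, not a method from this study: it computes a standardized mean difference per feature, and the function names, the row layout (one numeric feature vector per assessment), and the threshold of 1.0 are all assumptions.

```python
from statistics import mean, stdev


def shift_scores(train_rows, live_rows):
    """Per-feature standardized mean difference between two samples.

    Each row is a sequence of numeric features (e.g. items of an
    assessment score sheet). Larger values suggest a larger shift.
    Each sample needs at least two rows.
    """
    scores = []
    for train_col, live_col in zip(zip(*train_rows), zip(*live_rows)):
        # Fall back to 1.0 when a feature is constant in the training data.
        spread = stdev(train_col) or 1.0
        scores.append(abs(mean(live_col) - mean(train_col)) / spread)
    return scores


def shifted_features(train_rows, live_rows, threshold=1.0):
    """Indices of features whose score exceeds the (assumed) threshold."""
    scores = shift_scores(train_rows, live_rows)
    return [i for i, s in enumerate(scores) if s > threshold]
```

A flagged feature would not prove that the model is wrong. It would only indicate that the operating conditions, for example after a ward transfer or a change in medication, differ from those seen in training and that retraining or human review may be warranted.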

Figure 5. Deep neural network model for risk assessment.

resilience curve, where the horizontal axis represents time, and the vertical axis represents the system function. The inclination to the right shows the severity and tolerance of harm. The rising line to the right shows the resilience.

Table 1. Assessment score sheet for the patients and target of automation.

Table 2. Conditions for deep reinforcement learning.

Table 3. Systemization of risks and proposal of risk reduction measures.