Assessment of the Use of Patient Vital Sign Data for Preventing Misidentification and Medical Errors

Patient misidentification is a preventable issue that contributes to medical errors. When patients are confused with each other, they can be given the wrong medication or unneeded surgeries. Unconscious, juvenile, and mentally impaired patients represent particular areas of concern, due to their potential inability to confirm their identity or the possibility that they may inadvertently respond to an incorrect patient name (in the case of juveniles and the mentally impaired). This paper evaluates the use of patient vital sign data, within an enabling artificial intelligence (AI) framework, for the purposes of patient identification. The AI technique utilized is both explainable (meaning that its decision-making process is human understandable) and defensible (meaning that its decision-making pathways cannot be altered, just optimized). It is used to identify patients based on standard vital sign data. Analysis is presented of the efficacy of this approach for catching misidentification and preventing errors.


Introduction
Medical mistakes are a significant problem. They range from systematic failures to isolated accidents and provider issues. A 2013 study estimated that medical errors cost over USD 20 billion each year and result in the deaths of 100,000 people [1].
Patient misidentification is a component of this issue and can result when providers lose track of which locations patients are assigned to, swap charts, or otherwise confuse one patient with another. Notably, a 2016 study found that the problem actually starts in patient registration [2], with patient misidentification at registration being the leading cause of misidentification, generally. Each hospital, on average, loses USD 17.4 million per year due to misidentification-attributable denied claims, and 86% of providers "have witnessed or have known of" a misidentification-attributable medical error [2].
This problem is considered to be "wicked", meaning that it is "complex and without a clear solution" and "difficult to solve" [3]. Despite procedures requiring providers to verify patients' names, dates of birth, and ID bands, time pressures, ID band issues, and chaotic conditions can still result in issues [3]. This paper proposes the use of an artificial intelligence-based solution, which utilizes patient vital sign data, for the purposes of patient identification. It assesses whether, using this approach, patients can be readily identified to a level that would be useful for preventing misidentification. It also characterizes the level of misidentification prevention accuracy that is possible using the proposed approach.
The paper continues in Section 2, with a review of prior work which provides a foundation for the work presented herein. Next, in Section 3, the system design for the test system is presented. Section 4 presents the data that has been collected and the analysis of it. Finally, the paper concludes in Section 5 and discusses areas of potential future work.

System Design
To effectively use vital sign data for patient misidentification prevention, the vital sign data must be available to be correlated with patient identity information. Two scenarios for use are illustrative of how this could be achieved (though others are also possible). These scenarios are presented in Section 3.1. Then, in Section 3.2, system operations are described. Finally, limitations are discussed in Section 3.3.

System Use Scenarios
The first scenario is a smart hospital bed which would contain or be connected to the vital sign monitoring equipment and also know the supposed identity of the patient occupying it. Under this scenario, if the bed were to detect an anomaly it could notify hospital staff, display an alert and annotate any patient identity information with a misidentification concern. Notably, in addition to detecting possible patient misidentification, the same techniques proposed herein could be useful for detecting changes in medical condition; thus, any change would need to be investigated to confirm both patient identity and condition.
The second scenario is a patient record-keeping system. This system would be conceptually similar to the smart bed; however, it would not require specialized bed hardware. Instead, the system would look for indications that new incorrect information is being loaded for a patient, through the loading of vital signs data associated with the patient. If this data indicated a potential problem, patient misidentification would be suspected, and other contemporaneously loaded information would be flagged with the concern. Additionally, any access to the record for performing a procedure or dispensing a medication would trigger a misidentification concern alert to the provider.
Under both scenarios, the system would start with a presumed patient identity and historical vital sign data for the patient. It would collect or have vital sign data provided to it. The historical and current vital sign data, along with the presumed identity, would be provided to the misidentification system. This system would assess the likelihood of misidentification, using the process described subsequently and, if needed, provides a patient misidentification alert.
If no prior data are available for the patient, the system would not be able to provide misidentification warnings until data has been collected for a period of time. Thus, when prior data are unavailable, or where it has been invalidated by a change in a patient's medical condition (i.e., where an alert has been generated, the patient identity reverified and the system instructed that the identification is correct), the system would enter an initial data collection mode. While in this mode, it could caution users that it is learning the patient and not yet able to provide misidentification alerts, thus encouraging providers to be particularly careful. This could be augmented with provider practices that include additional verification, while the system is in this mode, to ensure that the patient is correctly verified manually (and, thus, also assuring that the data that is being collected will be associated with the correct individual).
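The alert/learning-mode flow described above can be summarized in a minimal sketch. The names, the simple running-mean deviation score, and the threshold value are all illustrative assumptions; in the proposed system, the misidentification likelihood would come from the AI network described in Section 3.2.

```python
# Minimal sketch of the alert/learning-mode flow (hypothetical names;
# the real deviation score would come from the GDES network, not a
# running mean, and the threshold here is illustrative).

def deviation_score(history, sample):
    """Absolute deviation of the new sample from the historical mean."""
    mean = sum(history) / len(history)
    return abs(sample - mean)

def assess(history, sample, threshold=0.025):
    """Return an advisory status for the presumed patient identity."""
    if not history:
        history.append(sample)   # enter the initial data collection mode
        return "LEARNING"        # caution providers: no alerts yet possible
    if deviation_score(history, sample) > threshold:
        return "ALERT"           # possible misidentification; reverify
    history.append(sample)
    return "OK"
```

Note that an "ALERT" here would prompt manual reverification; if the identity is confirmed correct, the history would be reset and the system would re-enter the learning mode, as described above.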
The basic operations of the system are depicted in Figure 2.

System Operations
A key decision, for system implementation, is what data to analyze. Ideally, the system would operate using vital signs which are easily and already commonly recorded. This would allow sensors that are already needed for other purposes to also serve this application, reducing costs and the collection burden on the patient. From the variety of vital sign information available, the following four pieces of information were selected for use: heart rate, end-tidal carbon dioxide, respiratory rate, and blood pressure. These four vital signs were selected primarily because of availability: each patient within the dataset has an abundance of data for each of these vital signs, and these data were available for all patients. A key area of future work will be to identify whether other vital signs perform as well as, outperform, or underperform the ones analyzed herein. In addition to assessing performance generally, the correlation of vital sign performance with demographic characteristics also merits assessment in future work.

Each piece of vital sign information's numeric value is converted into a value between 0 and 1, to supply to the system for training. The data set contains a set of patients with vital sign data collected during surgery. Ten of these patients are used in this experiment. Each of these ten patients was selected because their procedures lasted longer than 70 min. Data used for training the system for a given patient were taken from times 30:00.00 through 39:59.99 of the patient's procedure. This ten-minute period allows for 60,000 entries for each of the four vital signs.
The system was tested using a series of trials. Trial numbers are common across the datasets that were used for analysis and refer to a specific set of steps that were used for data processing and analysis, in all instances.
For example, trial 1 is performed by supplying the network with training data from patient one to produce a single output value. This output value also lies between 0 and 1. Following this, a data sample is taken from each of the ten patients at intervals within times 1:00:00.00 and 1:09:59.99. These samples are supplied to the network, and their output values are compared with the original value generated by training the network at the beginning of the trial. For trial 1, this means that patient one will have ten minutes of data used to train the network and produce an output value. Following this, each of the ten patients will have a small amount of data supplied to the network, producing ten output values, each corresponding to one of the ten patients. Ideally, the value generated by the sample from patient one will be most similar to the original value produced by training the network. Meanwhile, the other nine patients should produce output that differs from the original value. For trials 1 through 10, patients one through ten are each used to train the network for their respective trials. This convention is followed for trials 11 through 50, with each collection of ten trials being modified slightly, as described below. Each of these 50 trials is performed three times, in order to test three different networks. Each of these three networks is run four times to explore four slightly different approaches, which are also described below.
For the experimentation presented herein, converting the raw data to a value between 0 and 1 was completed using methods that vary by trial. In trials 1 through 10 and 21 through 50, the method used was to take the value and divide it by the maximum value found in any of the patients. For example, at 30:00.00 patient one has a heart rate of 53 bpm. This was divided by the maximum heart rate of all patients, in this case 135 bpm, resulting in a value of 000.393. This approach is referred to as the comprehensive data conversion method. Trials 11 through 20 divide the particular value by the greatest value found in a given patient's own data. This is referred to as the isolated data conversion method.
Both of these approaches could be used by a system operating in the real world and would simply use the largest value recorded to date; however, this would potentially necessitate capping values at 1.000, should a higher value be detected during operations, or retraining the network. In any case, for a real-world implementation, the divisor value would need to be set, potentially based on an in situ pilot study building upon the experimentation presented herein, so as to be consistent throughout operations. Notably, the largest value of all patients also approximates using the largest reasonable value, which would not change over time.
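The two data conversion methods can be sketched as follows, assuming each patient's values for one vital sign are held in a simple list; the function names are illustrative.

```python
# Sketch of the two data conversion methods; `patients` is a list of
# per-patient value lists for one vital sign (function names assumed).

def comprehensive(patients):
    """Divide every value by the maximum found across all patients."""
    overall_max = max(max(p) for p in patients)
    return [[v / overall_max for v in p] for p in patients]

def isolated(patients):
    """Divide each patient's values by that patient's own maximum."""
    return [[v / max(p) for v in p] for p in patients]
```

Using the example above, a heart rate of 53 bpm with an all-patient maximum of 135 bpm yields 53/135, approximately 0.393, under the comprehensive method.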
Once the values are converted, they are supplied to the system. Procedurally, this is completed using a set fact (SF) command (see [45]). A 32-character globally unique identifier (GUID) is assigned to each fact and rule, which is used for identifying nodes within the network when issuing commands. For example, the command below sets the heart rate input fact to 000.393:

SF:{24da3290-e934-4a9c-84e9-6a0d856e5073}=000.393

Set fact commands are issued for each of the variables used in the given tests. Once the SF commands are issued, a training (TR) command is then issued. Training commands begin with a reference to a starting fact. The blood pressure fact is used as the starting fact, and the second GUID included in the TR command is that of the output fact.
Note that one of the SF commands is vestigial and not strictly necessary, as the initial fact included in a TR command is set by the TR command; however, all SF commands were issued for simplicity, as this approach allows the TR command issued to be changed without requiring changing the block of SF commands (and the issuance of an additional SF does not materially impact system operations). An extended discussion of this can be found in Appendix A, along with technical details regarding the system commands.
The four input values (and the starting fact value from the TR command) are drawn from the data row currently being used for training. For the output value, 000.500, which is the midpoint of the valid output range, is used in all cases. This approach makes the trained patient the middle output value, with other patients able to show a positive or negative deviation from it.
The command set (which is presented in Appendix A) of four SF commands followed by one TR command, is repeated for each row of data in the patient spreadsheets (60,000 records). Following these commands, a present (PR) command is issued.
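The per-row command set (four SF commands followed by one TR command) can be sketched as below. Only the heart rate GUID appears in the text; the other GUIDs and the exact TR command syntax are placeholders and assumptions for illustration.

```python
# Sketch of emitting the per-row command set. Only the heart rate GUID
# appears in the text; the other GUIDs and the exact TR command syntax
# are placeholders for illustration.

HEART_RATE = "{24da3290-e934-4a9c-84e9-6a0d856e5073}"
ETCO2 = "{guid-end-tidal-co2}"          # placeholder GUID
RESP_RATE = "{guid-respiratory-rate}"   # placeholder GUID
BLOOD_PRES = "{guid-blood-pressure}"    # placeholder GUID
OUTPUT = "{guid-output-fact}"           # placeholder GUID

def fmt(value):
    """Format a 0-1 value in the fixed-width 000.000 style."""
    return f"{value:07.3f}"

def training_commands(row):
    """Four SF commands plus one TR command for one data row.
    The TR command starts from the blood pressure fact and names the
    output fact, trained toward the midpoint value 000.500."""
    cmds = [f"SF:{guid}={fmt(val)}"
            for guid, val in [(HEART_RATE, row["hr"]), (ETCO2, row["etco2"]),
                              (RESP_RATE, row["rr"]), (BLOOD_PRES, row["bp"])]]
    cmds.append(f"TR:{BLOOD_PRES}>{OUTPUT}={fmt(0.5)}")  # schematic syntax
    return cmds
```

In this sketch, one call per data row reproduces the repeated four-SF-plus-one-TR pattern applied to all 60,000 training records.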
Several forms of the PR command were utilized. In the first approach, a set value (e.g., 000.500) at or near the middle of the allowable fact value range was used. The value of 000.500 was used in trials 1 through 20 of each set of data. In trials 21 through 30, the value of 000.600 was used; trials 31 through 40 used 000.700; and trials 41 through 50 used 000.550.
A second approach utilized, as input for the PR command, the blood pressure value from the last training row (so as to produce a natural output from the network). The form of this command was the same, except that the value assigned to blood pressure varied depending on the value in the final row.
A third approach was tested, which uses a fixed value (just like the first approach). The difference is that, just prior to the PR command being issued, all four facts are also set to that value.
Finally, a fourth and final approach is similar to the third, with one primary difference: the average value of each fact is calculated, and these averages are used in the set of SF commands and in the PR command.
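The four baseline-generation approaches can be summarized as follows, assuming `histories` maps each fact name to its list of training values. Feeding the blood pressure average to PR in the fourth approach is an assumption, since the PR input is supplied via the blood pressure fact.

```python
# Sketch of the four baseline (PR input) approaches. `histories` maps
# each fact name to its list of training values; using the blood
# pressure average as the PR input in approach 4 is an assumption.

def pr_inputs(approach, histories, default=0.5):
    """Return (facts_to_set, pr_value) for the chosen approach."""
    if approach == 1:                    # fixed mid-range value only
        return {}, default
    if approach == 2:                    # last blood pressure training value
        return {}, histories["bp"][-1]
    if approach == 3:                    # fixed value, all facts set to it
        return {k: default for k in histories}, default
    if approach == 4:                    # per-fact averages, also fed to PR
        avgs = {k: sum(v) / len(v) for k, v in histories.items()}
        return avgs, avgs["bp"]
    raise ValueError("approach must be 1 through 4")
```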
The results from these four different approaches to generating a baseline value for a given patient are compared herein. The output of this PR command is the value against which all other results (produced by performing PRs using data from later in the patient datasets) were compared. This is referred to as the initial training output value or the target value.

Limitations
It is important to note that this system is intended for use as an additional layer of protection against patient misidentification. It is not designed to uniquely identify patients, nor is it designed to guarantee that patients will be identified correctly in all circumstances. Rather, it is designed to augment existing patient identification and misidentification prevention methods. It will provide partial support or refutation for the presumed patient identity provided. This analysis can occur in tandem with other commonly used patient identification methods, such as verifying identity details with the patient and, in this context, it is capable of reducing patient misidentification. Further expanding the identity assessment capabilities of the proposed system with the use of additional data elements and methods is a potential topic for future work.

Data and Analysis
The data used to test this system were sourced from patient monitoring and vital sign recordings made during surgical cases where patients underwent anesthesia at the Royal Adelaide Hospital (RAH) [46]. The data are typical of biometric information that is frequently collected during hospital stays. Data were utilized from 10 patients: case01, case03, case04, case05, case06, case09, case11, case12, case13, and case14 in the RAH dataset [46]. These files correspond to patients one through ten, respectively, in the data presented in this section (e.g., patient one is associated with case01, patient two is associated with case03). Patients whose procedures were completed in less than 70 min (e.g., case02, case07, case08, and case10), and thus did not have a full set of data, were excluded. Training data were sourced from the fourth file for each patient (e.g., uq_vsd_case01_fulldata_04.csv for patient 1), which contains data collected from the times 30:00.00 to 39:59.99. For each patient, 60,000 rows of data are included covering this timeframe (however, in many cases, values are repeated so that only a handful of unique values are present).
After the system was trained, data from later in patients' surgery was used to test it. The results were generated by presenting data from the seventh file of each patient (e.g., uq_vsd_case01_fulldata_07.csv for patient 1), which covers the time from 1:00:00.00 to 1:09:59.99. For each patient, seven rows were presented for data collection: these were rows 3 (rows 1 and 2 contain header data), 10,000, 20,000, 30,000, 40,000, 50,000, and 60,000. Each of these rows was used to create a group of set fact commands (in the same way as was used for the training data) and a PR command was then run. After each run, all intermediate facts were reset (using the SF command) to the default value of 000.500 before performing the next run. Data from this experimental process, using three different network configuration designs, are presented in Sections 4.1-4.3.
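The evaluation step, in which each patient's sampled outputs are compared against the target (initial training output) value, can be sketched as follows; the names and data are illustrative.

```python
# Sketch of the evaluation step: each patient's sampled outputs are
# compared against the target (initial training output) value.

def avg_deviation(outputs, target):
    """Mean absolute deviation of one patient's sampled outputs."""
    return sum(abs(o - target) for o in outputs) / len(outputs)

def evaluate(samples, target, trained_patient):
    """Return per-patient deviations and whether the trained patient
    deviated least (the 'lowest' column in the results tables)."""
    devs = {p: avg_deviation(outs, target)
            for p, outs in samples.items()}
    lowest = min(devs, key=devs.get) == trained_patient
    return devs, lowest
```

In the actual experiments, each patient contributes seven sampled outputs (one per presented row), and the per-patient average deviation corresponds to the avg err column in the tables that follow.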


First Network Design
The first network comprises seven facts and three rules. It uses a heart rule, which gets its value by combining blood pressure and heart rate data, as well as an oxygen rule, determined by combining respiratory rate and end-tidal carbon dioxide data. These two rules provide the heart fact and oxygen fact, respectively, which are inputs to the final rule and output fact. This model is presented in Figure 3. The following table represents the results obtained using the first GDES network model, hence the header including Set 1. Trials 1 through 10 of Set 1 are presented in Table 1, and the first PR approach described in Section 3.2 is used. The average error column displays, for each trial, how much all patients tend to deviate from the initial output training fact on average. The avg err column focuses on this deviation for just the target patient, rather than the average of all patients. The lowest column displays a yes or no value, depending on whether the target patient had the lowest deviation from the initial training output of all patients. The correct at columns also have a yes or no value, representing whether or not the target patient falls within a given margin of error.
The false at column represents how many of the incorrect patients also fall within that margin of error. An example of how to read row one is as follows: In set 1, trial 1, the average deviation from the initial training output value across all ten patients is 0.037, while the average value of patient one only deviated by 0.012. This deviation, while lower than average, is not the lowest of all ten patients in trial 1; therefore, the lowest column displays N. Because the average deviation of patient one is greater than 0.01, the correct at 0.01 column also displays an N. However, the deviation is lower than 0.025, so the correct at 0.025 column (and the remaining correct at columns) displays a Y value. In trial 1, no patients deviate from the initial output value by less than 0.01, as indicated in the False at 0.01 column. Five patients deviated by less than 0.025, seven patients deviated by less than 0.05, and eight patients deviated by less than 0.10.
From these results, a ratio of correct-to-incorrect patients can be determined for each margin of error by examining the collection of all ten rows. In these ten trials, there were two correct patients that fell within a 0.01 margin of error (these being trials 6 and 9), while the trials averaged 0.8 incorrect patients also falling within this margin (with 0.8 being the average of the false at 0.01 column). This ratio is 2:0.8, or 2.5. At a 0.025 margin of error, this ratio is 2.22. At a 0.05 error margin, the ratio is 1.66. Finally, at an error margin of 0.10, this ratio is 1.52. A model performs well when a high number of correct patients fall within a given error margin while a low number of incorrect patients also fall within that margin.
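The correct-to-incorrect ratio described above can be computed as follows; each trial contributes whether the correct patient fell within the margin and how many incorrect patients also did.

```python
# Sketch of the correct-to-incorrect ratio for one margin of error.
# Each trial records whether the correct patient fell within the margin
# and how many incorrect patients also did.

def correct_to_incorrect_ratio(trials):
    """trials: list of (correct_within: bool, incorrect_within: int)."""
    correct = sum(1 for within, _ in trials if within)
    avg_incorrect = sum(n for _, n in trials) / len(trials)
    return correct / avg_incorrect
```

With the Set 1 figures above (two correct patients and an average of 0.8 incorrect patients per trial at the 0.01 margin), this yields 2/0.8 = 2.5.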
Each of the following tables highlights the highest and lowest performing collections of all trials. All remaining results are shown in Appendix B. Table 2 displays the results from the lowest performing collection of trials that use the first GDES network model, with rows one through ten corresponding to trials 11 through 20, respectively. Because these are trials 11 through 20, the isolated data conversion method is used. The third PR approach described in Section 3.2 was utilized for these ten trials. As seen in the table, the target patient often deviates from the initial training output value more than the average patient, indicating these trials do not favor the correct patient at any margin of error. This is the case in trials 13, 14, 16, 17, 18, and 19 (rows 3, 4, 6, 7, 8, and 9). No patients, correct or incorrect, fall within any listed margins of error. Table 3 displays results from the highest performing collection of trials using the first GDES model. It covers trials 21 through 30, indicating that the comprehensive data conversion method is used and that the default fact value is 000.600. In all but one trial (trial 28), the target patient deviates from the target value less than the average patient. The highest ratio of correct-to-incorrect patients is found at an error margin of 0.01. At this margin, three correct patients are included, with an average of 0.6 incorrect patients also being included. Thus, the ratio is 5. Notably, at an error margin of 0.025, there are six correct patients included, with an average of 2.1 incorrect patients per trial. While this ratio is lower, at 2.86, there are twice as many correct patients included within the margin. No other trial collections produced multiple ratios this high utilizing the first GDES network model, although other models do have higher-performing trials.

Second Network Design
The second network design, shown in Figure 4, links heart rate and respiratory rate into a single rate rule, while blood pressure and end-tidal carbon dioxide combine into another rule. These rules produce facts which serve as inputs to the final rule, which produces the output. A similar set of tests was performed with this second network. Trials 1 through 10 utilized a default weight of 0.5 and the comprehensive data conversion method. Trials 11 through 20 used a default weight of 0.5 and the isolated data conversion method. Trials 21 through 50 utilized the comprehensive data conversion method with default weights of 0.6, 0.7, and 0.55 for trials 21 through 30, 31 through 40, and 41 through 50, respectively. Table 4 displays the results of trials 1 through 10, utilizing the second GDES network model and the fourth PR approach, as described in Section 3.2. These trials are noteworthy, as at an error margin of 0.01 there are four correct patients included, with an average of 0.7 incorrect patients also included in the margin. This ratio, 4:0.7, or 5.71, is the second highest of any collection of trials. While this is noteworthy, the actual number of correct patients within the margin of error is only four, meaning that in most of the trials the correct patient did not fall within a 0.01 margin of error. Table 5 represents the lowest performing collection of trials that utilize the second GDES network model. This is trials 11 through 20, indicating the use of the isolated data conversion method. The third PR approach described in Section 3.2 is used.
In trials 11, 13, 14, 17, 18, and 19, the target patient deviated from the target value more than the average of all patients. No patients, correct or incorrect, fall within any listed margin of error. Table 6 displays the highest performing trial collection utilizing the second GDES network model. It represents trials 21 through 30, indicating the default fact value used is 000.600. The second PR approach described in Section 3.2 is used. At a margin of error of 0.01, there are three correct patients included and an average of 0.6 incorrect per trial. This is a correct-to-incorrect ratio of 5. At a 0.025 margin of error, five correct patients are included, along with an average of 1.2 incorrect patients. This ratio is 4.17.

Third Network Design
The third network design, which is depicted in Figure 5, combines the heart rate and end-tidal carbon dioxide facts with a single rule, while respiratory rate and blood pressure are combined using a second. These two rules produce facts that serve as inputs to the final rule, leading to the output fact. Notably, this network design tended to produce greater accuracy than the previous two networks, so results for this model will be explored in slightly more detail.

Table 7 is a noteworthy performer of the third GDES network model. It represents trials 1 through 10, so the default fact value used in this case is 000.500. It also uses the first PR approach described in Section 3.2. As can be concluded from the average error and avg err columns, output from the target patient is always closer to the target value than the average patient. The most notable aspect of these trials is found at the 0.025 margin of error. Across seven of the trials, the correct patient falls within the margin. Meanwhile, across the trials, an average of 1.7 incorrect patients also fall within the margin. This is a 7:1.7 correct-to-incorrect ratio, or 4.12. The high ratio, along with seven correct patients falling within the margin of error, makes this trial collection one of the most accurate.

Table 8 represents an exceptional collection of trials for several reasons. It is trials 11 through 20, indicating that the isolated data conversion method is used. It also utilizes the second PR approach. Specifically, these trials perform well at a 0.01 margin of error. While a relatively unimpressive four trials have included patients that fall within that margin, there is an average of only 0.5 incorrect patients also being included. This 4:0.5 ratio, or 8, is the highest of any collection of trials across the entirety of this experiment. Additionally, this is unexpected, considering that the isolated data conversion method generally underperforms. While this is a noteworthy ratio, the actual number of correct patients falling within the margin is not substantial in comparison to some of the other high performers, particularly those utilizing GDES model three.

Table 9 displays the lowest performing trials of set 3. In trials 13, 14, 16, 17, 18, and 19, the target patient deviated further from the target value than most other patients. This set uses the isolated data conversion method and the third PR approach. Like the corresponding lowest performers of the other GDES models, no patients, correct or incorrect, fall within any listed margin of error. The commonalities between all of the lowest performers for each model are the isolated data conversion method and the third PR approach. This indicates that these tend to produce low-performing trials, especially when used in tandem.

Table 10 displays the results of another high-performing trial collection of GDES model three. These are trials 21 through 30 and use the second PR approach. In all trials except 28, the target patient is closer to the target value than most other patients.
At an error margin of 0.01, there are two correct patients that fall within the margin, while an average of 0.4 incorrect patients are also included. This correct-to-incorrect patient ratio is then 5. Expanding to a 0.025 margin of error improves the results further: five correct patients fall within the margin of error, while an average of 0.9 incorrect patients are also included. This ratio is then 5.56. This is the third highest correct-to-incorrect patient ratio across the entire experiment, and the highest ratio of any error margin that includes five or more correct patients.

Table 11 displays the results of trials 41 through 50 of the third GDES network model and the first PR approach. Of all trials, these ten produce the most desirable outcome. At a margin of error of 0.025, nine of ten correct patients are included, with an average of 1.9 incorrect patients also falling within the margin. This ratio, 9:1.9, or 4.74, is among the very highest, and the error margin includes substantially more correct patients than any other trial collection with a comparable ratio.

Table 12 summarizes each of the high performers with a few key details, in order to compare the variables present across trials. As can be seen, the third model tends to outperform the first and second. A default value of 000.600 tends to outperform the others, as does using the actual PR value to determine the target value. Only error margins of 0.01 and 0.025 are present among the top performers. While some of these variables tend to outperform others on average, they are not strictly superior. For example, using the actual PR value tended to produce higher correct-to-incorrect patient ratios on average; however, using the single 0.5 PR value results in higher numbers of correct patients falling within a 0.025 margin of error. Ultimately, the last column in Table 12 displays the results of the top performer.
In this case, that combination of model, default fact value, and PR approach demonstrated the ability to eliminate 80% of incorrect patients, on average, per trial, while only eliminating the target patient in one trial.
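The margin-of-error accounting used throughout these results can be sketched as follows. This is a minimal illustration of the metric, not the paper's analysis code, and the error values in the usage example are hypothetical placeholders rather than values from the tables.

```python
# For a chosen error margin, count the trials in which the target
# patient's deviation from the target value falls within the margin,
# and compute the average number of incorrect patients per trial
# that also fall within it. Their quotient is the correct-to-incorrect
# ratio reported throughout the results.

def margin_stats(trials, margin):
    """trials: list of (target_error, other_patient_errors) per trial."""
    correct = sum(1 for tgt, _ in trials if tgt <= margin)
    incorrect_avg = sum(
        sum(1 for e in others if e <= margin) for _, others in trials
    ) / len(trials)
    ratio = correct / incorrect_avg if incorrect_avg else float("inf")
    return correct, incorrect_avg, ratio

# Hypothetical two-trial example: only the first trial's target
# patient falls within a 0.025 margin.
trials = [(0.01, [0.02, 0.30]), (0.05, [0.01, 0.40])]
correct, incorrect_avg, ratio = margin_stats(trials, 0.025)
```

With ten trials, a result of nine correct inclusions and an average of 1.9 incorrect inclusions would reproduce the 9:1.9, or 4.74, form of the top-performing result reported above.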

Conclusions and Future Work
This paper has presented and analyzed a prospective technology, which is designed to help medical providers recognize misidentified patients. Notably, this is not performed via specific patient identification but rather by providing support or refutation for presumed identifications. Thus, this system could be used to provide alerts indicating possible misidentification for human follow-up. It could also be potentially paired with other indicator subsystems as part of a multi-factor patient misidentification system.
To assess the efficacy of the proposed approach, several algorithms for identifying misidentified patients, using gradient descent expert systems, were evaluated in this work. These included three different network designs, multiple ways of preparing data to supply it to the network, and different ways of generating the target value for a given patient, based on their historic vital signs data.
There are several key outcomes from the analysis of the trial data presented herein. First, it was shown that the GDES network design itself clearly affects the efficacy of the system's ability to identify misidentified patients. Despite each of the three network designs utilizing the same starting fact values and the same number of facts and rules, there were clear differences in performance between the different network designs. In the case where the target value was generated with a single 0.5 input, the third network design outperformed the other two. This indicated a potential advantage associated with linking patient heart rates with end-tidal carbon dioxide and respiratory rates with blood pressure.
Second, it was shown that the comprehensive data conversion method consistently outperformed the isolated data conversion method. This is evident from the fact that trials 11 through 20 for each network design tended to be the least accurate, and these were the batches of trials that utilized the isolated conversion method.
The third area of analysis is the target value used for training. Of the values used, there was no clearly superior performer. Different networks performed better with different values.
Overall, the results obtained are quite promising. While none of the trials demonstrated the capability of positively identifying all patients by only their vital signs, many of the trials demonstrated a consistent ability to eliminate many, and in some cases the majority, of incorrect patients, allowing the system to provide an effective warning for many single-patient mix-up scenarios. In particular, the best performers, listed in Table 12, show that the GDES system can effectively rule out incorrectly identified patients in the majority of cases.
There are several other factors, which were not explored in the present study, that are key topics for potential future work. One area is assessing the level of training data that is needed. In this regard, two key considerations exist. The first is to determine what level of training is most effective. To this end, future work can focus on assessing whether using all 60,000 data records, which were used for training in this study, provides the best results. The second consideration is to assess the cost-benefit tradeoff of using less than the optimal amount of data, as operating with lower amounts of data would allow the system to provide misidentification warnings with less input data and potentially learn about a patient more promptly, before a mix-up can occur.
Another area for potential future work is the development of additional networks and their assessment. These networks could use some or all of the input data used in this study and potentially augment it with additional data types. In particular, the data analyzed could be augmented with image data, which could potentially be collected using providers' tablet computers, providing another independent source of patient validation/misidentification warning.
In conjunction with the above, a third area of potential future work is the assessment of the efficacy of using other types of vital sign data. This analysis could compare the ease and cost of collection, the amount of data required and the performance of the system, presenting a trade-off analysis that could guide real-world implementation decision-making.
Overall, this work has shown the efficacy of using GDES in a different way from previous work: network result values are compared to each other, instead of presenting multiple subjects to a single network for classification. Additionally, this work has shown the potential promise of using patient vital sign data for misidentification prevention. The data and analysis presented have demonstrated a meaningful ability to eliminate incorrect patients using common vital sign data. Based on this initial work, a number of promising areas for additional work have been identified for future exploration.

Institutional Review Board Statement:
Not applicable, as this study used pre-existing de-identified publicly available data.
Informed Consent Statement:
Not applicable, as this study used pre-existing de-identified publicly available data.

Data Availability Statement:
No new data were collected during this study.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A
This appendix provides an additional discussion of the technical implementation of the GDES system, building upon what is presented in Section 3.2. For the purposes of simplicity, GUIDs are written with the node name replacing the actual GUID value. Set fact (SF) commands are issued for each of the variables used in the given tests. Once the SF commands are issued, a training (TR) command is then issued. Training commands begin with a reference to a starting fact. The blood pressure fact is used as the starting fact, and the second GUID included in the TR command is that of the output fact; this combination of starting fact and final fact is used to train the network. Note that one of the SF commands is vestigial and not strictly necessary, as the initial fact included in a TR command is set by the TR command; however, all SF commands were issued for simplicity, as this approach allows the TR command to be changed without requiring changes to the block of SF commands (and the issuance of an additional SF does not materially impact system operations). The vestigial SF is for the blood pressure fact, which is set to the same value in lines 1 and 5 of the command text: the TR command sets the blood pressure fact to 000.459, even though it follows an SF command that has already set the blood pressure fact value to 000.459. While redundant, this is performed for simplicity and does not impact the network's output.
In this example, 000.459 is the blood pressure (NBP) value from the data row being used for training. The value of 000.500, which is the midpoint of the valid output range, is used in all cases. This approach makes the trained patient the middle output value with other patients able to show a positive or negative deviation from this patient.
These commands, four SF commands followed by one TR command, are repeated for each row of data in the patient spreadsheets (60,000 records). Following these commands, a present (PR) command is issued. Two forms of this command were utilized. For some tests, a set value (e.g., 000.500), at or near the middle of the allowable fact value range, was used. In trials 1 through 20, for each set, the value of 000.500 was used. In trials 21 through 30, the value of 000.600 was used; trials 31 through 40 used 000.700; and trials 41 through 50 used 000.550.
A second approach utilized the blood pressure value from the last training row (so as to produce a natural output from the network). The form of this command was the same, except that the value assigned to blood pressure varied. A third approach uses a set value, just like the first approach; the difference is that, just prior to the PR command being issued, all four facts are also set to that value. A fourth approach uses the average of all values for a given field as the PR command input. The results from these four different approaches to generating a baseline value for a given patient are compared. The result of this PR command is what all other results, which are based on performing PRs using data from later in the patient data sets, were compared against. It is referred to as the initial training output value or the target value.
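The per-row command sequence described above can be sketched as follows. The SF/TR mnemonics, the node-name-for-GUID convention, and the 000.XXX value formatting follow the appendix text, but the exact command syntax shown here is a hypothetical stand-in, since the verbatim commands are not reproduced in this appendix.

```python
# Sketch of the per-row training command stream: four SF commands
# followed by one TR command, with the 000.500 midpoint as the
# training target. Node names stand in for GUIDs, per the appendix
# convention; the wire syntax itself is an illustrative assumption.

def fmt(v):
    """Format a [0, 1) fact value in the 000.XXX style used in the text."""
    return f"{v:07.3f}"

def training_commands(row, target=0.5):
    """Emit the SF commands and TR command for one data row.

    `row` maps fact names (stand-ins for GUIDs) to normalized values.
    The TR command runs from the blood pressure (NBP) fact to the
    output fact; the SF for NBP is the vestigial command noted above."""
    cmds = [f"SF {name} {fmt(val)}" for name, val in row.items()]
    cmds.append(f"TR NBP {fmt(row['NBP'])} OUTPUT {fmt(target)}")
    return cmds

row = {"HR": 0.52, "RR": 0.48, "NBP": 0.459, "ETCO2": 0.51}
for cmd in training_commands(row):
    print(cmd)
```

Repeating this block for each of the 60,000 training rows, then issuing a PR command with one of the four baseline approaches, would produce the target value against which later PR outputs are compared.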

Appendix B
This appendix presents data and limited analysis collected during the trials described in Section 4 of the paper.

Table A1 presents data collected using a single 0.5 input to the PR command. Table A2 presents data collected using an actual value, from the dataset, as the PR command input. Table A3 presents data collected by supplying 0.5 values as all inputs before the PR command. Finally, Table A4 uses the average of all values for a field as the PR command input.

Tables A5-A8 contain data similar to Tables A1-A4. However, rather than averaging values from each patient, these tables present data from only the target patient. This data facilitates the assessment of the system's performance in the identification of the target patient. In these tables, row one corresponds to trial one and patient one, row two corresponds to trial two and patient two, and so forth.

Table A9 presents data collected by dividing the values in Table A5 by the values in Table A1. Each row's % err avg is calculated as the corresponding row of Table A5's avg err divided by Table A1's average error row. Likewise, % max avg and % min avg are the results of values from Table A5 being divided by corresponding values from Table A1. The bottom row contains the averages of each column. The lower the % err avg value, the closer the system output is to the target patient. A value of 100% would indicate that the target patient's margin of error is equal to the average of all patients. A value greater than 100% indicates that the target patient's margin of error is greater than the average margin of error among all patients. Likewise, a value less than 100% indicates that the target patient had a lower margin of error than the average patient in that given trial. A low value indicates that the combination of the GDES network, default weight, and method for converting data to values between 0 and 1 is effective at eliminating the incorrect patients from the pool of possible true identities.
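The Table A9 computation described above can be sketched in a few lines. The numbers below are illustrative placeholders, not values from the actual tables.

```python
# Each row's "% err avg" is the target patient's per-trial average
# error (Table A5) divided by the all-patient average error for the
# same trial (Table A1), expressed as a percentage. Values below 100%
# mean the target patient sits closer to the target value than the
# average patient in that trial.

def percent_err_avg(target_errors, all_patient_errors):
    """Row-wise ratio of target-patient error to all-patient average error."""
    return [100.0 * t / a for t, a in zip(target_errors, all_patient_errors)]

target_rows = [0.012, 0.020]   # hypothetical Table A5 "avg err" values
overall_rows = [0.037, 0.040]  # hypothetical Table A1 "average error" values
rows = percent_err_avg(target_rows, overall_rows)
column_average = sum(rows) / len(rows)  # the bottom-row average
```

The same row-wise division, applied to the maximum and minimum error columns, yields the % max avg and % min avg columns.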
Table A13 presents results from further analyzing the data. The average error and avg err columns in this table are the same values as in Tables A1 and A5. The lowest column contains a yes or no value. Yes indicates that the target patient had the lowest margin of error of all patients, while no indicates that this was not the case. Following this are four columns labeled correct at X.XX. These columns indicate whether the target patient's value falls below the value listed for the column. Corresponding to these columns are the false at X.XX columns. These columns display the number of patients, other than the target patient, that have an average margin of error below the listed value.
An example of how to interpret the first row of Table A13 is as follows: the average error is 0.037, while the avg err is 0.012. This indicates that, in trial 1, patient one had a margin of error that is lower than the average of the patients. However, because the lowest column indicates N, patient one does not have the lowest margin of error. The correct at X.XX columns in the first row read N, Y, Y, Y. This indicates that the target patient's margin of error is above 0.01, but below 0.025. The false at X.XX columns build on this by indicating that no patients had a margin of error below 0.01, but that five incorrect patients are below 0.025, and seven incorrect patients are below 0.05. Interpreting all ten rows of the table will yield a clearer picture of how effectively the first GDES network works with a default weighting value of 0.5.
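The derivation of one such row can be sketched as follows; the error values are illustrative, chosen only to reproduce the flag pattern walked through above.

```python
# Deriving one Table A13-style row from per-trial errors: whether the
# target patient has the lowest error, the "correct at X.XX" flags,
# and the "false at X.XX" counts. The margins match the table's columns.

MARGINS = (0.01, 0.025, 0.05, 0.1)

def a13_row(target_err, other_errs):
    lowest = target_err < min(other_errs)
    correct_at = [target_err < m for m in MARGINS]
    false_at = [sum(1 for e in other_errs if e < m) for m in MARGINS]
    return lowest, correct_at, false_at

# A target error of 0.012 reproduces the N, Y, Y, Y "correct at"
# pattern: above 0.01 but below 0.025.
lowest, correct_at, false_at = a13_row(0.012, [0.03, 0.04, 0.06, 0.20])
```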
In this case, at an error margin of 0.01, there are two trials for which the target patient falls within the margin, with an average of 0.8 incorrect patients also being included. This produces a correct-to-incorrect ratio of 2.5, the highest at any margin of error in Table A13. In Tables A14-A16, the highest ratios and their error margins are 2.5 at 0.025, 1.43 at 0.01, and 3.33 at 0.01, respectively. This indicates that, in this batch of trials, the fourth approach for running the PR command performs the best, in terms of this metric.
The tables in this and subsequent sections are structured in the same way as those previously described and can therefore be interpreted in the same fashion. Sets of trials that are notable are specifically discussed in this and subsequent sections.
As shown in Table A25, the % err avg average value is greater than 100%, indicating that the target patients, in general, have a greater margin of error than most other patients in the trials. Table A29 indicates that in only one trial was the target patient closest to the system output. It also indicates that, at an error margin of 0.05, only half of the trials contain the target patient, and in those cases, there are at least six (though usually seven or more) other patients that also fall within that margin. At an error margin of 0.025, the highest correct-to-incorrect patient ratio, 1.22, is attained. In Tables A30 and A31, the ratios from the other PR approaches are 1.4 at 0.05 and 1.36 at 0.05, respectively. Table 1 shows that, in that case, no patients, correct or incorrect, fell within a 0.1 margin of error. Thus, the PR approach that produces the highest ratio, in this case, is the second. It is not until the margin is increased to 0.1 that the target patient falls within the range for each trial, though in every trial there are at least eight other patients that fall within this range. Set 1, trials 11-20, represents the least effective identification method tested. It uses the first GDES network, a default weight of 0.5, and the isolated data conversion method.

Data from trials 21 to 30 are presented in Tables A32-A46. As seen in Table A40, the % err avg for set 1, trials 21 through 30, is 41.6%, indicating that the target patient had a lower margin of error than most other patients in those trials. Table A44 indicates that the target patient had the lowest margin of error in two trials. However, at a 0.05 margin, there were only two trials that did not include the target patient, with an average of 3.6 incorrectly identified patients per trial. This yields a correct-to-incorrect ratio of 2.22. At an error margin of 0.025, six correct patients fall within the margin, with an average of 2.1 incorrect patients per trial.
This is a higher ratio of 2.86, the highest of any error margin in these ten trials using the first PR approach. The highest ratios found in Tables 2, A45, and A46 are 5 at 0.01, 1.36 at 0.05, and 3.33 at 0.01, respectively. Thus, the second PR approach performs best in this case.

Tables A47-A62 present data for set 1, trials 31 through 40. In these trials, the system was tested with the first GDES network, a default weight of 0.7, and the comprehensive data conversion method. Table A55 indicates an average % err avg value of 40.4%, indicating that in most trials the target patient had a lower margin of error than the average. Table A59 shows that in three of the trials the target patient had the lowest margin of error. At an error margin of 0.05, seven of the target patients fall within that range and, in all but two of the trials, there are three or fewer incorrect patients that are also within that range. Upon expanding the margin to 0.1, there is only one additional trial that includes the target patient, while the number of incorrect patients also falling within the margin increases to seven in most of the trials. The margin of error with the highest correct-to-incorrect ratio is 0.01, with a ratio of 4. The highest ratios in Tables A60-A62 were 4 at an error margin of 0.01, 1.11 at 0.025 and at 0.05, and 3.33 at 0.01. Thus, in this case, the first and second PR approaches produce an equally strong ratio.

Trials 41-50
Tables A63-A78 present data for set 1, trials 41 through 50. These trials used the first GDES network and a default weight of 0.7. The average % err avg across the trials was 45.1%. At a margin of error of 0.05, seven target patients were within the correct range, with an average of 3.8 incorrect patients also within the margin. At 0.1, all ten target patients fall within the margin, and an average of 6.7 incorrect patients are also within the margin. In Tables A75-A78, the highest correct-to-incorrect ratios are 2.22 at an error margin of 0.01, 2.27 at 0.025, 1.11 at 0.1, and 2.16 at 0.05, respectively. This indicates that, in this case, the highest performing PR approach is the second. At an error margin of 0.025, there were five target patients that fell within the margin, with an average of 2.2 incorrectly identified patients per trial, as shown in Table A76.
The second network design, shown in Figure 4, links heart rate and respiratory rate into a single rate rule while blood pressure and end-tidal carbon dioxide combine into another rule. These rules produce facts that serve as inputs to the final rule, which produces the output.
A similar set of tests was performed with this second network. Trials 1 through 10 utilized a default weight of 0.5 and the comprehensive data conversion method. Trials 11 through 20 used a default weight of 0.5 and the isolated data conversion method. Trials 21 through 50 utilized the comprehensive data conversion method with default weights of 0.6, 0.7, and 0.55 for trials 21 through 30, 31 through 40, and 41 through 50, respectively. Tables A79-A94 present data for trials 1 through 10 for set 2. The performance for these trials is similar to the first 10 trials of set 1, with the % err avg equaling 44.57%. At an error margin of 0.05, nine target patients fell within the margin, along with an average of 4.4 false patients. As seen in Table A94, at an error margin of 0.01, there are 4 correct patients included, with an average of 0.7 incorrect patients as well. This equals a correct-to-incorrect patient ratio of 5.71, the highest found in these ten trials across all PR approaches. The remaining approaches had their best ratios of 2.05 at an error margin of 0.05, 2.5 at 0.05, and 2.86 at 0.01, as shown in Tables A91-A93, respectively.

Trials 11-20
Data for trials 11 through 20 of set 2 are presented in Tables A95-A107, A109, and ??. The % err avg, at 82.86%, is higher than in many other sets of trials. This indicates that the target patient's average margin of error is only slightly lower than the overall average error. At a 0.05 margin, five of the ten target patients fall within the range, with an average of 4.1 additional incorrect patients also having error levels within this range. Increasing the margin to 0.1 increases the number of target patients falling within the margin to nine, while also increasing the average number of erroneously identified patients to 6.8. The highest correct-to-incorrect ratio found in these trials was 3.33, which was achieved with both the first and second PR approaches at an error margin of 0.01. The fourth approach, presented in Table ??, yielded a ratio of 2.5 at an error margin of 0.025. As shown in Table 3, for the actual PR value approach, no patients, correct or incorrect, fell within even a 0.1 margin of error.

Data for trials 21 through 30, using the second GDES network, are presented in Tables A110-A124. The % err avg is 45.5%, which is quite similar to many of the previous trials. At an error margin of 0.025, five of the target patients were identified, while an average of only 1.6 incorrect patients were also incorrectly identified. Expanding the margin to 0.05 allows nine of the target patients to be identified, with an average of 2.8 incorrect patients falling within the margin. As seen in Table 4, at an error margin of 0.025, five correct patients fall within the margin while an average of 1.2 incorrect patients are identified. This leads to a correct-to-incorrect ratio of 4.17, the highest of these trials. The highest ratios from the other PR approaches, as shown in Tables A122-A124, were 3.21 at 0.05, 1.79 at 0.05, and 1.67 at 0.025, respectively.

Trials 31-40
Tables A125-A140 present the data from trials 31 through 40 for the second GDES network. Roughly similar to most other trials, the % err avg for this set was 45.71%. Additionally, similar to the previous ten trials, an error margin of 0.025 yielded five identified target patients, with 1.7 incorrect patients also identified, on average. Expanding the error margin to 0.05 yielded 8 target patients identified and an average of 3.2 incorrect patients also being identified. This is shown in Table A137. The highest correct-to-incorrect ratio is thus 2.94, at 0.025. This is a greater ratio than any found in the results of Tables A138-A140, which have ratios of 2.5 at 0.01, 1.25 at 0.025, and 0.48 at 0.025, respectively.

Tables A141-A156 present the data from trials 41 through 50 for the second GDES network. For the 0.5 PR value, a % err avg of 43.07% was observed. At a 0.025 margin of error, there are four correctly identified patients and an average of 1.8 incorrectly identified patients. Expanding this margin to 0.05 yielded nine correctly identified patients, with an average of three incorrect patients being identified per trial. Thus, at an error margin of 0.05, the ratio of correct-to-incorrect patients is 3. This is outperformed by the second PR approach, as shown in Table A154: at an error margin of 0.025, its correct-to-incorrect ratio is 3.57. Tables A155 and A156 show that, ultimately, the highest ratios found for the other approaches were 2.31 at an error margin of 0.025 and 2.65 at 0.05, respectively.
The third network design, which is depicted in Figure 5, combines the heart rate and end-tidal carbon dioxide facts with a single rule, while respiratory rate and blood pressure are combined using a second. These two rules produce facts that serve as inputs to the final rule, leading to the output fact. Notably, this network design tended to produce greater accuracy than the previous two networks.

Tables A157-A171 display the results from trials 1 through 10 with the third GDES network. A % err avg of 33.49% was produced with the single 0.5 PR value. This value is lower than those found in any other set. At an error margin of 0.025, seven patients were correctly identified, with an average of 1.7 patients also being incorrectly identified per trial. Expanding the error margin to 0.05 correctly identifies one additional patient (eight total) and increases the average number of incorrectly identified patients per trial from 1.7 to 3.9. As shown in Table A169, at an error margin of 0.025, five correct patients fall within the margin while an average of 1.1 incorrect patients are also identified. This leads to a correct-to-incorrect ratio of 4.54, the highest among all of these trials. The highest ratios and their error margins for the other techniques, presented in Tables 5, A170, and A171, are 4.12 at 0.025, 1.18 at 0.025, and 3.68 at 0.025, respectively.

Trials 11-20
Tables A172-A185 present the results from trials 11 through 20 with the third GDES network. The % err avg from the base 0.5 PR values is 87.11%. At a 0.05 margin of error, five patients fall within the margin, with an average of 5.6 incorrect patients per trial also being identified. Expanding to a 0.1 margin of error allows for seven correct patient identifications, with an average of 7.1 incorrect per trial. It is worth noting that, at an error margin of 0.01, three target patients fall within the margin, with an average of 0.6 incorrect patients per trial also falling within this margin. While this is a relatively low number of correctly identified patients, this is a high correct-to-incorrect ratio of 5. Furthermore, as seen in Table 6, at an error margin of 0.01, four correct patients are identified, with an average of 0.5 incorrect patients falling within the margin of error as well. The correct-to-incorrect ratio in this case is thus 8.00. While a relatively low number of correct patients is identified, given the low error margin, this is noteworthy, as the ratio is the highest from any batch of trials. Table 7 shows that, in that case, no patients, correct or incorrect, were identified within a 0.1 margin of error. As seen in Table A185, the only margin of error that included any correct patients was 0.1, with one correct patient falling within the margin and an average of 1.7 incorrect patients also being included, resulting in a ratio of 0.59.

Tables A186-A200 present the results from trials 21 through 30 with the third GDES network. A % err avg of 40.95% was produced by the base 0.5 PR value approach. Seven target patients fell within the 0.025 margin of error, along with an average of 1.9 incorrect patients per trial. Expanding the margin to 0.05 yielded nine correctly identified patients, with an average of four incorrect patients also being identified per trial. As shown in Table 8, at an error margin of 0.025, five correct patients are identified while an average of 0.9 incorrect patients also fall within the margin.
The correct-to-incorrect ratio, in this case, is 5.56. This is the highest ratio from any margin of error displayed in Table 8. The techniques presented in Tables A198-A200 produce best ratios of 3.68 at 0.025, 0.91 at 0.025, and 2 at 0.025, respectively.

Tables A201-A216 present the results from trials 31 through 40 with the third GDES network. A % err avg of 43.71% is produced by the base 0.5 PR technique. As shown in Table A213, at a margin of error of 0.025, six patients are correctly identified by the 0.5 PR technique, along with an average of 1.9 incorrect patient identifications per trial. This leads to a correct-to-incorrect ratio of 3.16. Table A216 shows that, at an error margin of 0.01, one correct patient falls within the margin of error while an average of 0.2 incorrect patients fall within the margin, resulting in a ratio of 5.00, the highest of any within these trials. The highest ratios from the other techniques, presented in Tables A214 and A215, are 3.68 at an error margin level of 0.05 and 1.25 at 0.025.

Data from trials 41 through 50 using the third GDES network are presented in Tables A217-A231. These trials are the most accurate of any presented. The % err avg for the base 0.5 PR technique is 29.96%, the lowest of any batch of trials. An error margin of 0.025 yields nine correctly identified patients, with an average of 1.9 incorrectly identified patients per trial. This is nearly as precise as the results of trials 11 through 20 of set 3, while identifying three times the number of correct patients. In half of these trials, the target patient was the patient with the lowest margin of error, which is a higher rate than any other batch of trials. Tables A229 and A230 also have their highest ratios at an error margin of 0.025; those ratios were 3.89 and 0.95, respectively. Table A231 presents the fourth PR approach and shows that, at an error margin of 0.01, two correct patients were identified, with an average of 0.8 incorrect patients falling within the margin.