Lessons Learned from the 787 Dreamliner Issue on Lithium-ion Battery Reliability

On 16 January 2013, all Boeing 787 Dreamliners were indefinitely grounded due to lithium-ion battery failures that had occurred in two planes. Subsequent investigations into the battery failures released through the National Transportation Safety Board (NTSB) factual report, the March 15th Boeing press conference in Japan, and the NTSB hearings in Washington D.C., never identified the root causes of the failures—a major concern for ensuring safety and meeting reliability expectations. This paper discusses the challenges to lithium-ion battery qualification, reliability assessment, and safety in light of the Boeing 787 battery failures. New assessment methods and control techniques that can improve battery reliability and safety in avionic systems are then presented.


Introduction
The Boeing 787 Dreamliner is a long-range, wide-body, twin-engine jet airliner which began commercial flights in late 2011. On 16 January 2013, all Boeing 787 Dreamliners worldwide were grounded, a move prompted by safety concerns over the lithium-ion batteries that provide on-board backup power during flight, as well as auxiliary startup power. These failures tarnished the aircraft manufacturer's reputation and caused tremendous financial losses for the airlines that were operating Dreamliners at the time, as well as Boeing and its suppliers.

The grounding was preceded by several other subsystem electrical failures. All Nippon Airways (ANA) reported that between May and December 2012 at least 10 batteries had to be returned due to abnormally low voltages or other anomalous behavior [1]. On 4 December 2012, a United Airlines flight was forced to make an emergency landing in New Orleans after experiencing electrical power issues [2]. The problem was initially considered mechanical in nature, but was found to be due to electrical arcing on the power panel motherboard. A Qatar Airways plane was grounded on 13 December 2012 with similar electrical problems [3]. A few days after that, United Airlines confirmed that another of its 787s was experiencing electrical problems [2]. Yet another incident involved a false alarm in the brake diagnostics system on 9 January 2013 [4]. While these failures raised concerns, the grounding was ultimately caused by two catastrophic battery failures that occurred within 10 days of each other in January 2013.
On 7 January 2013, a battery fire occurred in a parked 787. A mechanic noticed a power failure in the auxiliary power unit (APU), followed by flames and smoke coming from the auxiliary battery terminals. First response efforts were hindered by a melted quick release knob, but the battery fire was eventually extinguished. One firefighter was burned when the battery vented [5].
On 16 January 2013, a battery failure occurred in a 787 operated by All Nippon Airways. This failure caused the pilots to make an emergency landing at the Takamatsu Airport in Kagawa, Japan. According to All Nippon Airways Vice President Osamu Shinobe, "There was a battery alert in the cockpit and there was an odd smell detected in the cockpit and cabin, and (the pilot) decided to make an emergency landing" [6]. Japanese inspectors found that the auxiliary battery system may have been improperly wired [7], which raised further questions about whether other systems had been installed correctly.

Lithium-Ion Battery Use in Commercial Avionics
New technologies, such as lithium-ion battery systems, must first be approved by the Federal Aviation Administration (FAA) before they can be installed in a plane [8]. The FAA had previously issued 14 Code of Federal Regulations (CFR) 25.1353(c)(5) and (c)(6) to govern the installation of nickel-cadmium batteries, in response to a number of failures that accompanied the increased use of nickel-cadmium batteries in small airplanes [9]. However, the existing regulations were considered inadequate to cover all the risks posed by lithium-ion battery technology, given the problems experienced with lithium-ion batteries in other industries such as portable electronics and electric vehicles.
The Boeing 787 Dreamliner utilizes two identical lithium-ion batteries that help start the auxiliary power unit when the plane is on the ground and serve as a backup for electronic flight systems. Lithium-ion battery technology was chosen for its high energy density and long cycle life compared to other battery chemistries. Lithium-ion batteries operate by shuttling lithium ions between two electrodes to transfer charge and generate current. The two electrodes are separated by a polymer membrane to prevent internal short circuits, and an organic solvent with a lithium salt is added to the cell to provide a medium for ion transport. One of the major safety issues with lithium-ion batteries is the volatility of the organic electrolyte solution. A short circuit or high operating temperature can lead to exothermic reactions that can generate combustible gases, melt the separator, and result in thermal runaway. In the worst case scenario, the battery could catch fire or explode.
When Boeing was initially qualifying the design of their battery system in 2009, lithium cobalt oxide (LiCoO2) was the most widely used cathode chemistry for most commercial lithium-ion battery applications. This was due to its high energy density and high voltage limit compared to alternative chemistries. However, concerns have been raised about the thermal stability of LiCoO2 and its tendency to release pure oxygen when over-charged, providing an ideal environment for combustion [10]. As lithium-ion technology has matured, lithium iron phosphate (LiFePO4) cathodes have gained wide acceptance in applications such as power tools and electric vehicles due particularly to their enhanced thermal stability over LiCoO2 cathodes [11]. Batteries made with LiFePO4 cathodes operate within a lower voltage range and have slightly less charge storage capacity than batteries with LiCoO2 cathodes, but these factors should have been weighed against safety improvements when deciding on the type of lithium-ion battery to install in an aircraft.
Lithium-ion batteries are also subject to performance degradation as they undergo usage or storage. When a battery is first assembled, electro-chemical reactions result in the formation of a passivation layer on the surface of both electrodes [12]. This layer is known as the solid electrolyte interphase (SEI), and it prevents further decomposition reactions from occurring between the electrodes and the electrolyte. When a battery is charged, lithium insertion results in expansion of the anode. This expansion causes cracks in the SEI layer, leading to the formation of insoluble byproducts [13]. Over time, this mechanism increases the density and thickness of the SEI layer, which increases internal resistance and decreases the available storage capacity. This increase in internal resistance has the potential to enhance the effects of internal heat generation, which increases the risk of thermal runaway.
Battery packs incorporate many cells to meet the power and energy requirements for their target applications. The battery pack on the Dreamliner contains eight 2.5-4.025 V cells wired in series, providing a pack voltage range of 20-32.2 V. Each cell contains three electrode winding assemblies, resulting in a pack capacity of 75 Ah. In comparison, the size of a cell that powers small portable electronic devices may be in the range of 1-4 Ah. The amount of energy that can be released from an individual cell in a failure event on a Dreamliner is several times larger than what could be released in the failure of a cell phone battery. Therefore, size and pack configuration are influential in determining the potential risks. If one cell begins to heat up and enters into thermal runaway, the heat propagating from that cell can spread to adjacent cells, triggering a chain reaction.
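The pack voltage range quoted above follows directly from the series connection. The short sketch below reproduces that arithmetic; the cell count and per-cell limits come from the text, while the function and variable names are illustrative only.

```python
# Cell count and per-cell voltage limits are the values quoted in the text;
# the names used here are illustrative.
CELLS_IN_SERIES = 8
CELL_V_MIN = 2.5     # volts, lower cell limit
CELL_V_MAX = 4.025   # volts, upper cell limit


def pack_voltage_range(n_cells, v_min, v_max):
    """Return (min, max) pack voltage for n_cells identical cells in series."""
    return n_cells * v_min, n_cells * v_max


pack_min, pack_max = pack_voltage_range(CELLS_IN_SERIES, CELL_V_MIN, CELL_V_MAX)
print(f"Pack voltage range: {pack_min:.1f}-{pack_max:.1f} V")
```

Because the cells are in series, the pack voltage scales with the number of cells, whereas the pack capacity in Ah is set by the individual cell.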
Battery management systems (BMS) are incorporated into most battery pack designs to monitor the batteries and maintain safe operating conditions. If lithium-ion batteries are operated outside of a specific voltage and temperature window, degradation is accelerated and the probability of a catastrophic failure is increased. If the battery is charged above its upper voltage limit, excessive heat generation will cause the electrolyte to become unstable and undergo decomposition reactions [14]. These reactions can be further accelerated by increased temperatures. Also, overdischarge can result in the dissolution of the copper current collector, providing opportunities for stray copper particles to cause internal short circuits [15]. For these reasons, a BMS is required to prevent overcharging, overdischarging, and operation at excessively high or low temperatures. The BMS used in the Dreamliner was developed by Kanto Aircraft Instrument Co. It imposed voltage limits, temperature limits, and additional overcharge fail-safe measures to reduce the risk of battery failure under certain abuse conditions. Regardless of the protection devices, commercial battery packs can still vent gas, ignite, and even explode. For example, internal cell faults, such as the deposition of lithium at the anode, can lead to internal short circuits [16]. Common causes of internal short circuits also include manufacturing defects, such as metallic contaminants introduced during cell assembly [17], and charging at low environmental temperatures [16]. Also, mechanical shock during the handling or usage of a battery can cause the electrode to deform and puncture the separator [18].

Root Cause Analysis
After the 787 battery incidents, media outlets circulated a theory proposed by Japan's Transport Ministry investigator Hideyo Kosugi on 18 January 2013 [19] that the battery was operated at voltages exceeding the manufacturer's recommendations. However, the BMS in the Dreamliner was supposedly designed to prevent overcharging of the battery pack by establishing operational voltage limits [5]. Additionally, passive cell balancing was implemented to prevent overcharging of individual cells by equalizing the cell voltages after the pack has been charged. Flight recorder data later revealed that the battery pack voltage never exceeded the upper threshold over the last 20 minutes of flight [20].
On 21 January 2013, a US and Japanese joint investigation was launched into the cell manufacturer, GS Yuasa [21]. The investigation examined possible quality control issues such as the introduction of particle contaminants or faulty connections between the cells in the pack. By 28 January 2013, authorities had not found any serious quality control or manufacturing issues at GS Yuasa's facilities [22], or in the battery charging unit, battery monitoring unit, battery fail-safe contactor, or auxiliary power unit controller [5].
Failure analysis was performed on the battery pack using X-ray computer tomography scans, digital radiography, and disassembly inspection [5]. While the entire pack displayed varying degrees of thermal damage, the most severe damage was located on one of the sides of the pack, suggesting that thermal runaway may have originated in a single cell and then spread to the remainder of the battery pack. A protrusion on the bottom of the pack case was consistent with the ejection of a high-temperature liquid that was likely emitted when the cell casing ruptured.
Data collected by the flight data recorder (FDR) during the 7 January 2013 battery fire in Boston gave a limited account of the battery condition prior to failure [23]. The data released to the public by the National Transportation Safety Board (NTSB) spanned from approximately 20 min prior to the shutdown of the auxiliary power unit (APU) to 10 min after the shutdown. The recorder contains 363 different measurements collected from various systems on the airplane, but only two (the DC feed load current and the APU battery DC bus voltage) directly relate to the auxiliary battery. Individual cell voltages inside the battery pack, which were monitored by the BMS, were not recorded by the FDR. Because the bus voltage gives the sum total of all the cells wired in series, a drop in bus voltage cannot identify which particular cell was the first to undergo an internal short circuit. In the future, the voltage and temperature of each individual cell must be forwarded to the FDR to assist failure analysis efforts. Additionally, individual cell voltage measurements must be used in real time to isolate a faulty cell and help to prevent the spread of damage to the battery pack as a whole. Figure 1 shows the battery voltage and current data measured by the battery charger over the full length of available data. This data can be divided into three zones. In Zone I (Figure 2), the voltage remained constant at 32 V, except for a 1 V drop at 10:04:13. The current shows a discharge event and two charge events, as highlighted in the first box in Figure 1. The unchanging voltage is usually indicative of a constant voltage trickle charge. The discharge event may have been prompted by the APU, and the following charge events brought the battery back to its fully charged state. In Zone II, the voltage remained at 32 V, but the current fluctuated from 10:08:33 to 10:19:39. This behavior was not seen prior to 10:06:14.
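The limitation of recording only the bus voltage can be illustrated with a small sketch. The cell voltages below are invented for illustration and are not FDR or BMS data; the function names are hypothetical, and the 2.5 V threshold is the lower cell limit quoted elsewhere in the paper.

```python
def bus_voltage(cell_voltages):
    """Bus voltage of a series string is the sum of its cell voltages."""
    return sum(cell_voltages)


def find_low_cells(cell_voltages, v_min=2.5):
    """Return the indices of cells below v_min (possible only with per-cell data)."""
    return [i for i, v in enumerate(cell_voltages) if v < v_min]


# Two hypothetical eight-cell packs, each with a different collapsed cell.
pack_a = [4.0] * 8
pack_a[2] = 0.5   # illustrative value for a shorted cell
pack_b = [4.0] * 8
pack_b[6] = 0.5

# The bus voltage is identical, so the FDR trace alone cannot tell them apart.
print(bus_voltage(pack_a), bus_voltage(pack_b))

# Per-cell measurements identify the faulty cell in each pack.
print(find_low_cells(pack_a), find_low_cells(pack_b))
```

With only the summed value, both faults look the same; with per-cell voltages, the faulty cell index is immediately available for isolation or post-failure analysis.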
Whether this data is indicative of an impending fault cannot be known for certain without prior data, but similar current fluctuations recorded by the FDR in other parts of the plane indicate that this behavior may be normal.
Zone III, shown in Figure 3, presents the battery data from 10:20:00 to 10:22:52. At 10:21:01 the APU battery bus voltage decreased from 32 V to 31 V. At the same time, the current was negative, indicating either a discharge of the battery or a possible soft short circuit. From 10:21:04 to 10:21:07, the APU battery charging current increased to approximately 45 A for 4 s; however, the voltage decreased from 31 V to 30 V. This behavior is abnormal for batteries; usually, voltage increases or remains constant with a positive current. At 10:21:08 the battery switched to a discharge mode with −3 A current, and at 10:21:09 the APU battery bus voltage dropped to 29 V. The voltage then increased to 31 V at 10:21:10 without a change in current draw. The engine indicating and crew alerting system (EICAS) message indicated that the APU battery failed at 10:21:15. The data shows that the battery continued to operate after the APU failure alert. At 10:21:30, the battery voltage dropped from 31 V to 28 V, and at 10:21:37 the APU battery bus voltage decreased to 0 V and returned to 28 V three times, while the current fluctuated between 0 A and −5 A. The sudden drop in voltage by 4 V was an indication that a single cell inside the battery pack was the initial starting point for the short circuit. Initiation of a short circuit will cause the potential difference in the cell to rapidly drop and approach 0 V. The voltage of the battery pack in the Dreamliner is the sum of each individual cell voltage. Therefore, a short circuit in one cell of a 32 V system would result in a 2.5-4 V drop in voltage for LiCoO2 batteries (depending on the state of charge).
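The two signatures described above (a falling voltage during charging, and a step drop of roughly one cell voltage) lend themselves to simple automated checks. The following is a minimal sketch, not the logic of any actual BMS or FDR tool. The sample values are approximations of the figures quoted in the text, the 10:21:03 row is a placeholder added to provide a starting point, and all names are hypothetical.

```python
def flag_charge_with_falling_voltage(samples):
    """Return timestamps where the current is positive (charging)
    but the bus voltage is lower than in the previous sample."""
    flagged = []
    for (_, v_prev, _), (t, v_cur, i_cur) in zip(samples, samples[1:]):
        if i_cur > 0 and v_cur < v_prev:
            flagged.append(t)
    return flagged


def drop_matches_single_cell_short(drop_v, lo=2.5, hi=4.0):
    """True if a bus voltage step is in the 2.5-4 V range expected
    when one series cell collapses toward 0 V."""
    return lo <= drop_v <= hi


# (timestamp, bus voltage in V, current in A; positive current = charging)
samples = [
    ("10:21:03", 31.0, -1.0),   # placeholder starting sample
    ("10:21:04", 31.0, 45.0),
    ("10:21:07", 30.0, 45.0),
    ("10:21:08", 30.0, -3.0),
    ("10:21:09", 29.0, -3.0),
    ("10:21:10", 31.0, -3.0),
]

print(flag_charge_with_falling_voltage(samples))   # flags 10:21:07
print(drop_matches_single_cell_short(31.0 - 28.0)) # the 31 V to 28 V step
```

A practical detector would need noise filtering and knowledge of the charger's operating mode; the sketch only shows that the anomaly described in the text is detectable from the two recorded channels.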
The actual start time of the fire remains unknown. However, from the data, the time of the APU shutdown can be determined. Additionally, there is an estimated time when smoke was first detected by the crew. However, the events between are not well understood. According to the interim factual report [5], the following is known: "The flight and cabin crewmembers had deplaned by 10:20, at which time he [the maintenance manager] and the cabin cleaning crew had entered the airplane. Shortly afterward, a member of the cleaning crew reported to the maintenance manager, who was in the cockpit, 'an electrical burning smell and smoke in the aft cabin.' The maintenance manager then observed a loss of power to systems powered by the APU and realized that the APU had automatically shut down. After confirming that the airplane's electrical power systems were off, the maintenance manager turned the APU and main battery switches to the 'off' position." "According to the transcription summary for this incident, at 10:21:41, the cockpit voice recorder (CVR) recorded sounds associated with the APU shutting down; specifically, the cockpit fans stopped operating. Conversations among maintenance personnel and the turnaround coordinator about the APU shutdown began about 9 s later. At 10:24:10, the turnaround coordinator reentered the cockpit and reported smoke in the cabin." Smoke in the aft cabin had been reported by the cleaning crew by 10:21:50, which was deduced based on the information in the interim factual report. Based on the available data, anomalous events occurred as early as 10:21:04. This corresponds with the spike in APU battery current from 10:21:04 to 10:21:08. The positive current should correspond to a charging condition, and the voltage should increase accordingly. However, the battery voltage dropped from 31 to 30 V in this time period.
Based on the data and the recorded events, there are several factors that suggest there was a problem with how the battery system was integrated into the plane. Two smoke detectors in the aft electrical/electronics bay, where the APU battery is located, failed to trigger an alarm even though reports indicated that the crew reported smelling smoke at 10:21:50 before the system shut down at 10:23:10. The event log also showed that after the battery voltage drop and the shutdown of the APU, the EICAS removed the failure message, and the plane attempted to reuse the battery when restarting the APU. Additionally, the maintenance manager claimed to have manually turned off the switches for the main and APU batteries after the APU had first shut off. It is possible that the fail-safe devices in the battery management system, the charger, and the battery cells functioned properly and prevented the short circuit from becoming a catastrophic failure. Finally, the reboot of the APU by a different subsystem in the plane could have been what caused the final surge in current that led to the fire.

Risk Assessment
Boeing worked with Thales, a power conversion subcontractor, and GS Yuasa, a Japanese battery manufacturer, to develop a lithium-ion battery capable of meeting the 787's electrical and safety requirements. Boeing's reliability group assessed battery risk on two levels. The first level dealt with the severity of the potential failure events. Boeing claims that standard qualification and abuse tests were performed on the batteries to satisfy qualitative requirements and identify dangerous abuse conditions. Through this testing, Boeing determined that overcharging of the battery was the only event that would result in venting with fire [24]. Other abuse situations, such as internal short circuits, were identified as safety risks in terms of venting without fire. These tests were conducted to define operational limits and assess which events posed the greatest safety threats.
The second level of risk assessment focused on the probability of occurrence. GS Yuasa had experience with over 14,000 cells of similar (but not identical) make-up and millions of hours of operation without any venting issues [25]. Based on this past experience, Boeing's reliability group initially predicted that a cell would vent smoke without fire due to an internal short circuit once in every 10 million flight hours [25]. However, in reality, Boeing experienced two failures in roughly 50,000 flight hours. The actual failure rate was nearly three orders of magnitude greater than the original estimate, demonstrating the need for Boeing, or preferably an outside agency, to reevaluate the entire reliability assessment process. It was claimed that the initial failure rates mainly considered design risks rather than manufacturing flaws [25]. GS Yuasa explained that reassessment of the failure rates would ultimately require the determination of a root cause. However, Boeing has stated that it may never understand the root cause of the failures [24,25], and no root cause has yet been determined.
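The gap between the predicted and observed failure rates can be checked with a few lines of arithmetic. The inputs are the figures quoted above; treating two events as a point estimate of a rate is a simplification made here for illustration.

```python
import math

# Figures quoted in the text.
predicted_rate = 1 / 10_000_000   # predicted venting events per flight hour
observed_rate = 2 / 50_000        # two failures in roughly 50,000 flight hours

ratio = observed_rate / predicted_rate
orders_of_magnitude = math.log10(ratio)

print(f"Observed rate: one failure per {1 / observed_rate:,.0f} flight hours")
print(f"Observed / predicted: about {ratio:.0f}x "
      f"({orders_of_magnitude:.1f} orders of magnitude)")
```

The ratio comes out at about 400, or roughly 2.6 on a log10 scale, which rounds to three orders of magnitude. With only two observed events the statistical uncertainty is large, but the discrepancy is far too big to attribute to chance.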
Without an understanding of the root cause of failure, determination of failure rates and safety modifications is a major challenge. After the battery failures, Boeing's engineers applied a series of band-aids to the battery system to address a suite of possible causes. Some of the modifications were incorporated in an attempt to eliminate other potential problems, such as the tightening of the operational voltage range to prevent overcharge and overdischarge events. Other changes included modifications to hardware, such as increasing the space between each cell and building a containment chamber around the battery system to vent gases outside of the plane. These additions have significantly changed the original battery design and represent only a temporary fix. There is no guarantee that they are fail-proof and it is doubtful that these systems were adequately tested.

Recommendations
A key challenge for ensuring the reliability and safety of batteries is the development of meaningful standards and qualification tests. Boeing noted that each component in the battery was tested separately for more than 5000 h, including a variety of abusive tests aimed to overstress the battery beyond typical operating conditions [25]. Boeing also noted that the power system was tested for more than 25,000 h in the lab. However, the battery failures challenge the criteria and testing standards used to certify the safety of the battery packs in aerospace systems. Internal short circuit testing was included in Boeing's assessment of battery fire risk, but the test prescribed by industry standards did not accurately mimic a true internal short circuit. The test times for these systems were quite short, and the failure distributions that Boeing assumed in its analysis have not been made public (in the past, Boeing has tended to make the false assumption that the failure rates of electronic components are constant, and has used the outdated Military Handbook 217 for component predictions).
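To see why the assumed failure distribution matters, consider the Weibull hazard function, a standard reliability model in which a constant failure rate is only the special case of shape parameter 1. The sketch below is illustrative: the shape and scale values are hypothetical and do not represent Boeing's analysis or any measured battery data.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard rate h(t) = (shape / scale) * (t / scale) ** (shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


SCALE_H = 1000.0  # hypothetical characteristic life in hours

# shape < 1: early-life (defect-driven) failures; shape = 1: constant rate;
# shape > 1: wear-out failures whose rate grows with age.
for shape in (0.5, 1.0, 3.0):
    early = weibull_hazard(100.0, shape, SCALE_H)
    late = weibull_hazard(5000.0, shape, SCALE_H)
    print(f"shape={shape}: h(100 h)={early:.2e}, h(5000 h)={late:.2e}")
```

Only the shape = 1 case gives the same hazard at 100 h and 5000 h. If the true behavior is defect-driven or wear-out-driven, a short test interpreted under a constant-rate assumption can badly misestimate the risk in service.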
One industry standard battery test, the nail penetration test, uses a pointed metal rod to penetrate the exterior of a cell and short one or many layers of the cathode and anode. Maleki et al. [26] noted that the nail penetration test provides a thermal pathway for heat transfer out of the cell, reducing the chances of catastrophic thermal runaway. In a true internal short circuit, the generated heat builds up within the cell. Underwriters Laboratories [27] identified a need for additional internal short circuit testing, and noted that the only true internal short circuit test in industry standards was in the Japanese Standards Association standard JIS C8714. This test requires disassembling the cell, placing a nickel particle between layers of the cell electrodes, and applying a force to induce a short circuit; however, this does not accurately mimic an internal short circuit in an enclosed cell. Researchers at Sandia National Laboratories [28] and a joint collaboration between NASA and the National Renewable Energy Laboratory [29] developed short circuit tests that can be triggered while the cell is still intact. This is an improvement over existing tests, but all methods require either an external force or pre-heating of the cell to induce a short circuit. The development of a methodology to consistently create internal short circuits under normal aerospace operating conditions is desirable.
As the aviation industry outsources more design, manufacturing and control, new challenges are being created that must be overcome. About 70 percent of the 787 was outsourced to tier-1 suppliers around the globe, a portion of which was then outsourced to additional tiers [30]. Although a multi-tier supply chain can reduce cost and development time, the increase in complexity makes it difficult to control product quality across the entire supply chain. International standards promoting clear communication, rigorous monitoring strategies, and component reliability testing should continue to be refined. Improved oversight and enhanced quality control standards that are proactively enforced may help to improve reliability and safety. In the end, it is the responsibility of the airplane manufacturer to deliver a high quality and reliable product. Final product testing should be largely focused on evaluating risks that may arise during system integration.
Aviation safety can also be improved with innovative control strategies for batteries. In the Boeing 787, safe charging and discharging was ensured by applying voltage, current, and temperature limits. The lithium cobalt oxide cells in the Boeing 787 each have an operational voltage range from 2.5 V to 4.025 V. These fixed thresholds are designed to prevent side reactions inside the battery such as lithium deposition, dissolution of the copper current collector, breakdown of the solid electrolyte interphase layer, and decomposition of the electrolyte. However, maintaining constant operational thresholds neglects the effects of varying loading conditions, battery aging, and unit-to-unit variations, which can lead to underutilization or over-stressing of the battery [31][32][33]. For example, constant voltage limits may not adequately prevent lithium deposition after significant degradation or low-temperature-induced behavior [32].
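A fixed-threshold check of the kind described above can be sketched in a few lines. The voltage limits are the values quoted in the text; the temperature limits are hypothetical placeholders, and the class and function names are illustrative rather than taken from the 787 BMS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    v_min: float = 2.5      # V, lower cell limit quoted in the text
    v_max: float = 4.025    # V, upper cell limit quoted in the text
    t_min_c: float = 0.0    # deg C, hypothetical placeholder
    t_max_c: float = 60.0   # deg C, hypothetical placeholder


def check_cell(voltage, temp_c, limits=Limits()):
    """Return the list of fixed-threshold violations for one cell."""
    faults = []
    if voltage > limits.v_max:
        faults.append("overvoltage")
    if voltage < limits.v_min:
        faults.append("undervoltage")
    if temp_c > limits.t_max_c:
        faults.append("overtemperature")
    if temp_c < limits.t_min_c:
        faults.append("undertemperature")
    return faults


print(check_cell(4.10, 25.0))   # above the upper voltage limit
print(check_cell(4.00, 5.0))    # inside every fixed limit, so no fault is raised
```

The second call shows the limitation discussed in the text: the check depends only on the instantaneous voltage and temperature, so an aged cell charged at low temperature passes as long as it stays inside the fixed window, even if lithium deposition is occurring.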
New battery control strategies should be developed based on physics-based electrochemical models of batteries to prevent lithium deposition. Battery management systems should also be exploited to detect precursors to short circuits. Identifying conditions that could lead to an internal short circuit would assist in failure mitigation efforts. Novel sensor systems and anomaly detection to predict the onset of internal short circuits caused by lithium deposition and dendrite formation should be developed and implemented. Prediction algorithms [34][35][36][37][38][39][40][41] could further supplement these techniques to provide accurate time-to-failure estimations. Control strategies should not be limited to the cell or battery pack level. The way in which each subsystem in the plane uses the main battery and the auxiliary power unit should be examined to assure a robust system design. If a fault is detected in a particular subsystem, its fail-safe designs should not be overridden by another part of the aircraft.
The battery failures occurred during the winter, and the location of the APU in the aft bay suggests that the batteries may also have been exposed to extremely low temperatures, especially at high altitudes. Boeing addressed the temperature risks with a control strategy in their battery management system which would stop charging if a high or low temperature threshold was crossed. Further improvements could be achieved with active thermal management using techniques that have already been well established in the electronics packaging industry.

Summary and Conclusions
Passenger flights resumed on 27 April 2013, after the FAA approved Boeing's new battery design [42]. However, the new battery design focused primarily on mitigating the propagation of failure to the rest of the airplane. Actions taken by Boeing have added redundancy to help contain a fire and vent gases, but they do not solve the fundamental problems that result in battery failure. The new enclosure has been designed to eliminate all chances of fire; however, only a limited amount of testing has been performed on the new design, making it difficult to evaluate the true reliability of the new battery system.
The criteria used by the FAA to qualify Boeing's proposed solution highlight a larger issue of government agency competence in evaluating complex electronic systems. All of the component testing performed during the accident investigation was conducted at the facilities of the original equipment manufacturers (OEMs) and according to the OEMs' specifications. This could have introduced bias into the investigative conclusions. It is important that government agencies overseeing matters of public safety have the right skill set to adequately evaluate new technologies. This sentiment was echoed previously by Senator Chuck Grassley regarding the ability of the National Highway Traffic Safety Administration (NHTSA) to evaluate cases of unintended acceleration in Toyota's electronic throttle control system [43].
Even with vast improvements in chemistry and fail-safe devices, battery failure prevention has remained a challenge. In 2011, an electric vehicle taxi and bus caught fire in China [44,45]. In April 2007, Acer recalled thousands of laptop batteries that were prone to overheating and catching fire [46]. Dell, Apple, Toshiba, Lenovo, and Sony all experienced similar recalls in 2006. A fire that resulted in the crash of a cargo plane carrying batteries on 3 September 2010, led to stringent transportation restrictions [47]. Nevertheless, lithium-ion batteries provide unmatched performance in terms of energy density and have the potential to be a safe alternative to other energy storage methods. The grounding of Boeing's 787 reinforces the need for effective qualification standards not only for battery development but also for integration of battery systems into airplanes. The modifications made to the 787 battery system were made to prevent future failures from impacting the rest of the plane. While this may stop smoke from accumulating or fire from breaking out, it does not ensure a lower failure rate for battery cells. Additionally, the added weight of the enclosure surrounding the battery pack negates the energy density benefits that motivated the use of lithium-ion batteries in the first place. To prevent over-engineering and to realize the full potential of lithium-ion batteries in future applications, all failure mechanisms must be identified and understood, and BMSs must be designed to account for these vulnerabilities.