Design and Reliability Analysis of a Tunnel-Detection AUV Based on a Heterogeneous Dual CPU Hot Redundancy System

Abstract: A water conveyance tunnel is narrow and enclosed, with a complex flow field distribution. The performance of sensors used by conventional underwater vehicles, such as the Doppler log, magnetic compass, sonar, and depth gauge, is greatly degraded in the tunnel, and the sensors can even fail. Aiming at the special operating environment and operational requirements of water conveyance tunnels, this paper designs an architecture suitable for autonomous underwater vehicles (AUVs) that inspect pressurized water conveyance tunnels. The tunnel-detection AUV (called AUV-T in this paper) built on the proposed architecture can complete inspection tasks in water conveyance tunnels, and field tests have verified the effectiveness of the architecture. Since an AUV in a water conveyance tunnel cannot surface to rescue itself, we designed a heterogeneous dual-CPU (central processing unit) hot redundancy system based on dual communication lines to ensure its safety. The reliability analysis shows that the system can significantly reduce the probability of AUV failure and help ensure that the AUV can still be recovered even if it fails in the tunnel.


Introduction
Water conveyance tunnels are commonly used in water conservancy projects and play a vital role in urban domestic water transportation. However, there are many uncertain factors in the construction of water conveyance tunnels, especially long-distance and deeply buried ones. After running for a certain time, tunnels are likely to develop defects of varying severity [1]. Therefore, it is necessary to inspect water conveyance tunnels regularly. Traditional drained inspection requires stopping water delivery and emptying the tunnel, which leads to a huge workload. An autonomous underwater vehicle (AUV) can inspect a water conveyance tunnel at any time without changing its working state. This saves considerable manpower, material, and financial resources, is convenient and flexible, and does not affect the urban domestic water supply, which makes it the development direction of tunnel detection in the future.
The environment in pressurized water conveyance tunnels is narrow, enclosed, and unknown, with the characteristics of complex flow field distribution, harsh environment, low light levels, and insignificant environmental characteristics. In the water conveyance tunnels, the performance of conventional sensors, such as Doppler log, magnetic compass, sonar, and depth gauge, will be greatly affected or even fail. If the conventional design idea is adopted, it will be impossible to complete the detection task in the water conveyance tunnel. On the other hand, the enclosed environment prevents the AUV from adopting the conventional self-rescue measures of floating in case of emergency. As we all know, AUVs are high-cost, and if they break down in a tunnel and cannot be recovered they will cause a great loss. For this special operating environment, it is necessary to design an AUV with a unique working mode and ensure its operation safety to the maximum extent.
Glushko et al. [2] discussed a software architecture for controlling a vehicle actuator system; the paper introduced an additional abstraction level to reduce the complexity of controlling unconventional actuators. Tipsuwan and Hoonsuwan [3] described the design and implementation of an AUV for petroleum pipeline inspection. The AUV is controlled by a 6-degrees-of-freedom PID (proportional-integral-derivative) controller, and its software structure is based on the Robot Operating System. The AUV was still under testing when the paper was published. There is a big difference between the working environment of a pipeline inspection AUV and that of a tunnel inspection AUV: conventional sensors fail in water conveyance tunnels, so the above architecture is not applicable to the AUV-T. Pidić et al. [4] presented an AUV for the inspection of water-filled tunnels, together with a proof of concept for traversing water-filled tunnels using water flow and variable buoyancy. The data were collected in a scaled-down version of the tunnel. The AUV could only produce a simple two-dimensional height vs. time plot of the tunnel for comparison with tunnel schematics, which cannot meet actual detection needs. Jia et al. [5] designed a model predictive path tracking controller for a tunnel-tracking AUV. However, the controller was only verified in a canal experiment and is not suitable for the narrow water conveyance tunnel environment.
Freire et al. [6] achieved a reasonable level of safety and reliability by enforcing coding standards in the C language; in addition, reliability on the hardware side was provided by the ruggedness and simplicity of the components. Das et al. [7] described the control architecture required for the autonomous navigation and guidance control of AUV-150. The control architecture was presented as an ensemble of hardware and software modules. A leak detection unit was used in the AUV; if any leakage occurs, the computer shuts down the entire system. The reliability of the above-mentioned systems was only improved in theory, and the reliability of the control computer was not substantially improved at the hardware level. Wang et al. [8] designed triple-redundancy hardware and software control architectures for dynamically positioned ships and rigs. The hardware-redundant configuration included three computers connected to each other through dual networks, and the simulation was carried out on a drilling rig. Wang et al. [9] proposed a triple-redundant hardware architecture and hierarchical software for the DP control system, and conducted both a model test and a HIL (hardware-in-the-loop) test. However, the computers and dual networks in these systems are homogeneous, which does not reflect the actual situation.
In this paper, considering the environmental characteristics of water conveyance tunnels and the difficulties of inspection work, an AUV architecture for water conveyance tunnel detection and a heterogeneous dual-CPU hot redundancy system based on dual communication lines are proposed. The use of two control computers reduces the probability of AUV failure and maximizes the chance of recovering the AUV if it fails in the tunnel.
The rest of this paper is organized as follows. Chapter 2 introduces the hardware structure and architecture design of the AUV-T, highlighting the key design challenges and how they are tackled. Chapter 3 introduces the structure of the dual-CPU hot redundancy system and analyzes its reliability. Chapter 4 presents and analyzes the results of the inspection test in a water conveyance tunnel. Chapter 5 summarizes our findings.

Hardware Composition
Because it sails in a narrow tunnel, the AUV-T is bound to collide with the walls. Therefore, as shown in Figure 1, the exterior is wrapped in a carbon fiber shell, and multiple sets of collision avoidance bars have been added to the shell to protect the AUV. The main body of the AUV-T consists of three watertight compartments (a control cabin, a camera cabin, and a battery cabin), which are surrounded by annular buoyancy material, as shown in Figure 2. The AUV-T is equipped with a main thruster at the stern and with two vertical and two horizontal thrusters for the control of surge, sway, heave, pitch, and yaw motion. A high-precision depth gauge and four ranging sonars are installed at the stern, four ranging sonars are installed around the bow, and five lights and five HD cameras are arranged in a ring in the camera cabin. The cameras are mainly responsible for recording video of the tunnel wall. The brightness of the lights, which illuminate the scene for the cameras, is adjusted by a PWM (pulse width modulation) signal. In addition, a forward-looking camera is installed at the bow of the AUV to recognize the optical signals at the branch exit and assist the AUV in exiting. The navigation, fault diagnosis, and wall-following control of the AUV-T are handled by an onboard PC104 Pentium III-class computer (Computer_Ctrl, Sheng-Bo Technology Corp., Shenyang, China) located in the control cabin. A fiber-optic gyro (Fizoptika, Mosta, Malta), a TCM5 (PNI Corporation, Santa Rosa, CA, USA), and leak detection sensors (Huakongxingye, Beijing, China) are also located in the control cabin.
There are three video processing PC104 computers (Computer_Video1, Computer_Video2, and Computer_Video3, Sheng-Bo Technology Corp., Shenyang, China) in the camera cabin. Computer_Video1 is responsible for processing the data captured by circumferentially distributed cameras 1 and 2 and for communicating with Computer_Ctrl regularly according to the communication protocol to ensure the operation of the dual-CPU hot redundancy system. Computer_Video2 is responsible for processing the video data of circumferentially distributed cameras 3, 4, and 5. Computer_Video3 is responsible for processing the forward-looking camera data and communicating with Computer_Ctrl; when an exit signal is detected, it sends Computer_Ctrl the command to leave the main tunnel. In addition, a solid-state storage device for data storage is installed in the camera cabin. Other parameters of the AUV-T are shown in Table 1.

Architecture Design
The architecture of the AUV-T adopts a layered design, which is divided into three levels from the bottom to the top: execution level, task level, and management level [10], as shown in Figure 3. The management level consists of three systems: a navigation system, a mission system, and a motion control system. According to the special requirements of the tunnel detection task, the task level is divided into ten modules.
The AUV-T is equipped with ranging sonars at the bow and the stern. In order to avoid echo interference between sonars in narrow tunnels [11], the eight ranging sonars are operated in a weighted round-robin mode, so it is impossible to obtain all the sonar data at the same time. The data fusion module performs time registration on the sonar data in the same direction. First, the least squares method is used to estimate the distance in each direction for which no echo is obtained at the current moment, and then the data in that direction are fused. Through the sensors carried by the AUV-T, important data such as the width, height, depth, heading, and wall video at different positions of the water conveyance tunnel can be obtained, which provides an important reference for the inspection of the tunnel.
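The time-registration step can be sketched as follows. This is a minimal illustration, assuming a straight-line least squares fit over the most recent echoes of one sonar; the function names (`fit_line`, `estimate_range`, `fuse_same_direction`), the sample values, and the fusion by plain averaging are assumptions made for illustration and are not taken from the AUV-T software.

```python
def fit_line(times, ranges):
    """Ordinary least squares fit of range = slope * t + intercept."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two samples to fit a line")
    mean_t = sum(times) / n
    mean_r = sum(ranges) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0.0:
        raise ValueError("sample times must not all be equal")
    sxy = sum((t - mean_t) * (r - mean_r) for t, r in zip(times, ranges))
    slope = sxy / sxx
    return slope, mean_r - slope * mean_t


def estimate_range(times, ranges, t_now):
    """Estimate the range at t_now for a sonar that has no echo at this instant."""
    slope, intercept = fit_line(times, ranges)
    return slope * t_now + intercept


def fuse_same_direction(measured, estimated):
    """Fuse a fresh echo with the registered estimate (plain average, assumed)."""
    return 0.5 * (measured + estimated)


# Hypothetical samples: the bow-left sonar pinged at t = 0.0, 0.4, 0.8 s,
# and the stern-left sonar returns a fresh echo of 2.12 m at t = 1.0 s.
bow_left_at_now = estimate_range([0.0, 0.4, 0.8], [2.00, 2.04, 2.08], 1.0)
wall_distance_left = fuse_same_direction(2.12, bow_left_at_now)
```

A longer fitting window smooths noise but reacts more slowly when the wall distance changes, so the window length would need tuning against the round-robin period.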
Since the water conveyance tunnel is a reinforced concrete structure, the magnetic compass cannot work normally in the tunnel and so the fiber optic gyroscope is used to obtain the heading change of the AUV. However, the fiber optic gyroscope will have a drift problem under continuous operation [12]. The course correction module uses the fusion of magnetic compass data and ranging sonar data to generate an estimate of the heading change, thereby completing the correction of the drift of the fiber optic gyroscope.
Conventional navigation methods fail in the tunnel. Therefore, the known terrain feature data of the tunnel are combined with real-time heading, depth, and ranging sonar data to perform positioning relative to the entrance, which gives the relative position of the AUV in the tunnel.
The five circumferential cameras on the AUV-T record the tunnel wall in real time. The system then performs real-time crack detection on the video data and uses the relative positioning data to mark the position of each detected crack. When the video data need to be transmitted, the AUV can be connected via WIFI or optical fiber. The WIFI module adopts the 802.11n protocol, with a transmission speed of up to 150 Mbps. The optical transceiver and router used for optical fiber communication have a transmission speed of 1000 Mbps, which can meet the demand for high-speed data transmission.
As shown in Figure 4, obstacle avoidance in the water conveyance tunnel mainly involves two tasks. The first is avoiding the vessel meeting holes in the tunnel, since entering a meeting hole would affect the relative positioning of the AUV and could even damage it. The second is avoiding non-outlet branches. Since there are many branch exits in the water conveyance tunnel and the AUV needs to be recovered from a designated branch exit [13], it is necessary to identify the branch exits based on the positioning data [14] so that the AUV does not leave through the wrong exit, which would affect both the completion of the task and the recovery of the AUV. In order to prevent the AUV from missing the exit and to ensure that it can be successfully recovered, a light source with a specified frequency is placed at the exit. The light blinks according to a preset color and frequency: red and blue blink alternately at 1 Hz, and the frequency can be adjusted according to the actual situation. The AUV-T uses its onboard forward-looking camera to identify the light source of the specified frequency and exit the tunnel.
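One simple way to recognize such a beacon is to track the difference between the red and blue brightness of the light region in each frame and count how often its sign changes. The sketch below is an illustration only, assuming that one full red-blue cycle takes 1 s and that the per-frame red-minus-blue values have already been extracted from the forward-looking camera; the function names, the tolerance, and the frame rate are hypothetical, and the source does not describe the recognition algorithm actually used.

```python
def estimate_blink_frequency(red_minus_blue, fps):
    """Estimate the red/blue alternation frequency in Hz from sign changes."""
    if len(red_minus_blue) < 2:
        raise ValueError("need at least two frames")
    signs = [1 if value > 0 else -1 for value in red_minus_blue]
    changes = sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    duration_s = (len(red_minus_blue) - 1) / fps
    # One full red-blue cycle produces two sign changes.
    return changes / (2.0 * duration_s)


def is_exit_beacon(red_minus_blue, fps, target_hz=1.0, tolerance_hz=0.2):
    """True if the observed alternation frequency is close to the target."""
    measured = estimate_blink_frequency(red_minus_blue, fps)
    return abs(measured - target_hz) <= tolerance_hz


# Synthetic 5 s clip at 20 frames per second: 0.5 s red, then 0.5 s blue.
clip = [1.0 if (i % 20) < 10 else -1.0 for i in range(101)]
steady_lamp = [1.0] * 101  # a constant red light, which is not the beacon
```

Counting sign changes is sensitive to noise near zero, so a practical detector would likely smooth the signal or add hysteresis before counting.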
The AUV operating in the pressurized water tunnel is in a closed and confined environment. When a failure or danger occurs, the conventional self-rescue measure of releasing ballast and floating to the surface cannot be adopted [15]. In order to ensure the safety of the AUV-T in tunnels, it is necessary to study reliable fault diagnosis and self-rescue algorithms. Utilizing multiple CPUs that can communicate with each other in the AUV-T, this paper proposes a heterogeneous dual-CPU hot redundancy system based on dual lines, the details of which are introduced in Chapter 3.
In order to shoot a clear video, the AUV-T needs to maintain a fixed distance from the tunnel wall [16]. The AUV-T transforms the wall following into heading control by adjusting the distance between the heading and the wall [17]. The backstepping method is used to design an adaptive sliding mode controller to control the heading [18].
The kinematic model of heading motion, the dynamic model [19], the S-plane function, the adaptive law, and the controller are expressed in terms of the following quantities: $\psi$ is the heading angle; $r$ is the heading angular velocity; $\tau_N$ is the moment about the GZ-axis; $u$, $v$ are linear velocities; $f_N$ is the external disturbance moment; $k_1$, $k_2$, $z_1$, $z_2$, $\lambda$ are the control parameters, with $\dot{x}_2 = f(x_2) + b\tau_N + f$; $x_{2d}$ is the expected value of $x_2$; $s$ is the sliding surface function, which can be expressed as $s = k_2 z_1 + z_2$; and $\hat{f}$ is the estimated value of $f$.
There is water pressure in the water conveyance tunnel, and the depth sensor obtains the depth of the AUV from the free surface. In order to keep the vertical distance between the AUV and the wall stable, an adaptive S-plane controller is designed [20]. The distance between the AUV and the bottom of the tunnel obtained by the ranging sonar is used as the height input to the controller, so that the distance between the AUV and the bottom of the tunnel is controlled directly.
The control model of the S-plane [21] produces $\tau_Z$, the output value of the height controller, where $k_1$, $k_2$ are the controller parameters and $e$, $\dot{e}$ are the height deviation and its rate of change. $\Delta\tau_Z$ is the adaptive adjustment term, whose definition involves $\dot{e}_{\max}$, the deviation rate threshold used to limit the adjustment quantity, and $\Delta e$, the height deviation adaptive adjustment quantity. $\Delta e$ is in turn defined in terms of $\gamma$, the parameter of the low-pass filter; $e_t$, the height deviation at time $t$; and $\eta_t$, the instant elimination factor at time $t$.
The distance between the AUV and the bottom of the tunnel varies due to the gradient of the tunnel. The integral part of the adaptive adjustment term improves the response of the controller to small deviations and therefore the response of the height control. The weakening link reduces the control shock introduced by the integral term. A low-pass filter is added to smooth the output of the height controller.
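For orientation, the basic S-plane law commonly given in the literature is $u = 2/(1 + e^{-k_1 e - k_2 \dot{e}}) - 1 + \Delta u$, with normalized inputs and output. The sketch below evaluates that form for the height channel. It is a minimal illustration rather than the controller used on the AUV-T: the adaptive term is simply passed in as a number, and the output clamp, the exponent guard, and the gain values in the example are assumptions.

```python
import math


def s_plane_output(e, e_dot, k1, k2, delta_tau=0.0):
    """Basic S-plane (sigmoid) control law with an additive adjustment term.

    e and e_dot are the normalized height deviation and its rate of change.
    The sign convention of e (target minus measurement) is an assumption.
    The result is a normalized command clamped to [-1, 1].
    """
    exponent = -k1 * e - k2 * e_dot
    # Guard against overflow in math.exp for very large deviations.
    exponent = max(-60.0, min(60.0, exponent))
    u = 2.0 / (1.0 + math.exp(exponent)) - 1.0 + delta_tau
    return max(-1.0, min(1.0, u))


# Hypothetical gains: a small positive deviation gives a small positive command.
command = s_plane_output(e=0.1, e_dot=0.0, k1=3.0, k2=1.0)
```

The sigmoid shape gives a near-linear response for small deviations and saturates smoothly for large ones, which is why $k_1$ and $k_2$ mainly set how quickly the command approaches its limits.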
To meet the needs of long-duration inspection tasks, two rechargeable Li-ion batteries are installed in the AUV-T, as shown in Figure 5. Battery1 is located in the control cabin with an output voltage of 24 V and powers the sensors, cameras, LEDs, and PC104 computers. Battery2 is located in the battery cabin with a 48 V output and powers the five thrusters. Both batteries are equipped with an intelligent detection system, which reports battery voltage, current, temperature, SOC (state of charge), and other information to the control system in real time. When the SOC of either battery reaches the critical threshold, the AUV-T stops the inspection task and leaves the tunnel through the nearest exit so that it can be recovered safely.

Multi CPU Hot Redundancy System Modeling
In order to find an optimal multi-CPU hot redundancy system suitable for the AUV-T, a mathematical model of the multi-CPU hot redundancy system was established, as shown in Figure 6. System "a" consists of two CPUs, system "b" consists of three CPUs, and so on. Each system has a main CPU (number 1) and one or more standby CPUs (numbers 2 to n), and each standby CPU communicates with the main CPU through a dual-communication line (comprising a network line and a serial line). To simplify the calculation, it is assumed that every CPU has the same failure rate, α, and that the network line and the serial line have the same failure rate, β. Taking the actual situation into account, the following assumptions are also made.
(1) When a PC104 fails, the serial communication fails, or the network communication fails, the corresponding module is considered to have failed. A failed module cannot recover by itself.
(2) The working status of the system is divided into three types:
1) Normal working state ($R_1$): All modules are working normally, or only one line (network or serial) has failed in one or more dual-communication lines (that is, no dual-communication line has lost both its network line and its serial line). In this state, normal communication is still guaranteed between all CPUs, and the AUV performs tasks normally.
2) Communication failure state ($R_2$): One or more dual-communication lines have failed (both the network line and the serial line), or one of the CPUs has failed. In this state, although the AUV can accept control commands and work normally, some CPUs cannot perceive the status of each other. To ensure safety, the AUV-T should leave the tunnel through the nearest exit. After it has left the tunnel, a manual check and fault repair are needed.
3) Fatal failure state ($F$): As long as any CPU can start the nearest-exit program, the system is considered to have no fatal failure. Only when all the CPUs fail is the entire system considered to have a fatal failure. In this state, the AUV loses control and sails in an unknown state. The mission fails, and the AUV may be impossible to recover or may even be damaged.
When there are $n$ ($n \geq 2$) CPUs in the system, both the probability of the normal working state, $R_1$, and the probability of a fatal failure, $F$, can be expressed in terms of $n$, α, and β. For α = 0.1 and β = 0.01, the curve of $R_1$ against $n$ is shown in Figure 7. Increasing the number of CPUs ($n$) reduces the probability of normal working ($R_1$), but it also reduces the probability of a fatal failure of the system.
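The trade-off can be illustrated numerically. The sketch below is a reconstruction from the state definitions, not necessarily the exact expressions behind Figure 7: it assumes that failures are independent, that α and β are treated as probabilities, and that a system with $n$ CPUs has $n - 1$ dual-communication lines. Under those assumptions, $R_1 = (1-\alpha)^n (1-\beta^2)^{n-1}$ (all CPUs work and no dual line has lost both of its lines) and $F = \alpha^n$ (all CPUs have failed). The function names are hypothetical.

```python
def normal_probability(n, alpha, beta):
    """P(R1): every CPU works and no dual-communication line has lost both lines."""
    if n < 2:
        raise ValueError("the model needs at least two CPUs")
    return (1.0 - alpha) ** n * (1.0 - beta ** 2) ** (n - 1)


def fatal_probability(n, alpha):
    """P(F): all n CPUs have failed."""
    if n < 2:
        raise ValueError("the model needs at least two CPUs")
    return alpha ** n


# Values quoted in the text: alpha = 0.1, beta = 0.01.
rows = [
    (n, normal_probability(n, 0.1, 0.01), fatal_probability(n, 0.1))
    for n in range(2, 7)
]
for n, r1, f in rows:
    print(f"n={n}  R1={r1:.4f}  F={f:.1e}")
```

Under these assumptions both quantities fall as $n$ grows, which matches the qualitative statement above: each extra CPU lowers the fatal-failure probability by a factor of α but also lowers the chance that everything is working.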
The following factors are taken into account:
(1) A fatal failure of the system is a rare event.
(2) An AUV can only perform tasks normally when it is in the $R_1$ state, so increasing the probability of $R_1$ should be given priority.
(3) The AUV needs at least two CPUs (Computer_Ctrl and Computer_Video) to perform inspection tasks normally.
(4) Computer_Ctrl needs to communicate with every other CPU over both the network line and the serial line simultaneously, and each communication method requires two independent task processes. Adding CPUs therefore increases the computational load on Computer_Ctrl and further increases its failure rate.
Considering the above factors comprehensively, combined with the actual operation requirements of AUV, it is finally decided to adopt the dual CPU hot redundancy system as the optimal multi CPU hot redundancy system for AUV-T.

Modeling of Heterogeneous Dual-Line Dual-CPU Hot Redundancy System
The heterogeneous dual-line dual-CPU hot redundancy system is divided into four modules, as shown in Figure 8: two different CPUs (Computer_Ctrl and Computer_Video), one serial line, and one network line. The two CPUs communicate through both the network line and the serial line. The serial line keeps the two computers connected if there is any congestion or bottleneck on the network line. Both CPUs execute a health check routine in parallel. If an emergency occurs, such as low battery power, a breakdown of the internal Ethernet bus, a malfunction of the navigational sensors, or a program execution fault, either or both of them execute the same emergency routine.
Both CPUs run the VxWorks operating system and the same program. Each CPU receives the same input data and creates a high-priority fault diagnosis task. This task is responsible for real-time serial communication and network communication between the two PC104 computers (over TCP/IP, with Computer_Ctrl as the server and Computer_Video as the client). Computer_Video continuously updates its internal variables according to the communication content. In the initial state, the output status bit of Computer_Ctrl is true and the output status bit of Computer_Video is false; that is, the AUV executes the control commands from Computer_Ctrl. When the fault diagnosis system determines that a fault has occurred, Computer_Video sets its output status bit to true and executes the nearest-exit task from the current state.
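The takeover behavior on the standby side can be sketched as a small state holder driven by heartbeats on the two lines. This is an illustrative simplification, assuming that a line counts as failed when no message has arrived within a timeout and that the standby takes over only when both lines are silent; the class name `StandbyMonitor`, the timeout value, and the status strings are hypothetical and do not describe the VxWorks implementation.

```python
class StandbyMonitor:
    """Heartbeat bookkeeping for the standby CPU (Computer_Video)."""

    def __init__(self, timeout_s=1.0):
        self.timeout_s = timeout_s      # assumed heartbeat timeout
        self.last_seen = {"serial": 0.0, "network": 0.0}
        self.output_enabled = False     # output status bit, false initially

    def on_heartbeat(self, line, now_s):
        """Record a message from Computer_Ctrl on the given line."""
        if line not in self.last_seen:
            raise ValueError(f"unknown line: {line}")
        self.last_seen[line] = now_s

    def update(self, now_s):
        """Classify the current situation and latch the takeover decision."""
        serial_ok = now_s - self.last_seen["serial"] <= self.timeout_s
        network_ok = now_s - self.last_seen["network"] <= self.timeout_s
        if not serial_ok and not network_ok:
            # Both lines silent: the peer CPU or both lines have failed.
            self.output_enabled = True
        if self.output_enabled:
            return "takeover"           # latched, since modules do not self-recover
        if serial_ok and network_ok:
            return "normal"
        return "line_fault"             # one line lost, communication still possible
```

A short usage example with hypothetical timestamps:

```python
monitor = StandbyMonitor(timeout_s=1.0)
monitor.on_heartbeat("serial", 0.0)
monitor.on_heartbeat("network", 0.0)
print(monitor.update(0.5))   # both lines fresh
monitor.on_heartbeat("serial", 1.0)
print(monitor.update(1.8))   # network line has gone quiet
print(monitor.update(3.0))   # both lines quiet, standby takes over
```

Latching the takeover keeps the standby from handing control back if a line briefly recovers, which is consistent with the assumption that a failed module cannot recover by itself.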
The Markov model is proposed to analyze the reliability of the system under the assumption that any failure is independent. The transition probabilities are time invariant and depend only on the current state.
The interrelationship of failures in the dual-line heterogeneous dual-CPU hot redundancy system is shown in Figure 9. Because the two CPUs differ in model and workload, their failure probabilities are different. Likewise, the serial line and the network line use different communication mechanisms, so their failure probabilities are different. $\alpha_1$ is the constant failure rate of computer 1, $\alpha_2$ is the constant failure rate of computer 2, $\beta_1$ is the constant failure rate of the serial line, and $\beta_2$ is the constant failure rate of the network line. $X_i$ ($i = 1, 2, \ldots, 13$, with $f = X_{13}$) are the states of the Markov process. $X_1$ is the initial normal working state of the system. $X_2$, $X_3$ are the states of a single CPU failure. $X_4$, $X_5$ are the states of a single communication line failure. $X_6$ is the state in which both communication lines have failed. $X_7$, $X_8$, $X_{11}$, $X_{12}$ are the states in which both a computer and a line have failed. $X_9$, $X_{10}$ are the states in which both a computer and two communication lines have failed. $f$ is the state of a fatal failure of the system. Among them, $X_1$, $X_4$, $X_5$ belong to the normal working state, and $X_2$, $X_3$, $X_6$, $X_7$, $X_8$, $X_9$, $X_{10}$, $X_{11}$, $X_{12}$ belong to the communication failure state. Figure 9 shows the transition probabilities and the transition process between the states $X_i$. The blue box on the left displays in full the transition probabilities and transition process between some of the states. The state transition process in the red dashed frame on the right and upper side is similar to that on the left, so it is omitted. From this, the state transition probability matrix can be obtained, where $p$ is the conditional probability, $X = s_i$ with $s_i \in S$ is the state, and $S$ is the state space.
The n-step transition probability in the Markov chain is:

$$p_{ij}^{(n)} = P(X_{m+n} = s_j \mid X_m = s_i)$$

Furthermore, from the Chapman-Kolmogorov equation, the n-step transition matrix is:

$$P^{(n)} = P^n$$

There is no fault in the initial state; that is, the initial state distribution is:

$$p(0) = [1, 0, 0, \ldots, 0]$$

According to Equations (12)-(16), the values of $p_i(t)$ can be obtained, where $p_1(t), p_2(t), p_3(t), \ldots, p_{13}(t)$ are the probabilities that the system is in states $X_1, X_2, X_3, \ldots, X_{13}$ at time $t$. When the system is in the $X_{13}$ (i.e., $f$) state, a fatal fault is considered to have occurred; otherwise, the system is considered to be in a reliable state. Therefore, the fatal failure rate of the system at time $t$ is defined as:

$$F(t) = p_{13}(t)$$

The probability that the system is in a reliable state is:

$$R(t) = 1 - F(t) = \sum_{i=1}^{12} p_i(t)$$

The reliable state includes the normal working state and the communication failure state. The probability of the normal working state is defined as:

$$R_1(t) = p_1(t) + p_4(t) + p_5(t)$$

The probability of the communication failure state is:

$$R_2(t) = p_2(t) + p_3(t) + \sum_{i=6}^{12} p_i(t)$$

That is:

$$R_2(t) = R(t) - R_1(t)$$
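The Markov analysis can be sketched numerically as follows. This is a simplified illustration, not the model of Figure 9: it assumes one-hour time steps, treats each rate as a per-step failure probability, and lets the four modules fail independently within a step (so more than one module may fail in the same step). All function names and the numeric rates in the example are hypothetical.

```python
from itertools import product


def build_chain(alpha1, alpha2, beta1, beta2):
    """Build the 13-state chain; one step is one hour (an assumption).

    The arguments are per-step failure probabilities of CPU 1, CPU 2,
    the serial line, and the network line.
    """
    rates = (alpha1, alpha2, beta1, beta2)
    # A non-fatal state records which of the four modules still work.
    alive = [s for s in product((True, False), repeat=4) if s[0] or s[1]]
    states = alive + ["f"]  # "f" lumps every case in which both CPUs are down
    index = {s: i for i, s in enumerate(states)}
    matrix = [[0.0] * len(states) for _ in states]
    for s in alive:
        for t in product((True, False), repeat=4):
            # Modules cannot recover by themselves.
            if any(after and not before for before, after in zip(s, t)):
                continue
            p = 1.0
            for before, after, rate in zip(s, t, rates):
                if before:
                    p *= (1.0 - rate) if after else rate
            target = t if (t[0] or t[1]) else "f"
            matrix[index[s]][index[target]] += p
    matrix[index["f"]][index["f"]] = 1.0  # fatal failure is absorbing
    return states, matrix


def propagate(matrix, steps):
    """Apply the transition matrix `steps` times to the fault-free start state."""
    n = len(matrix)
    dist = [1.0] + [0.0] * (n - 1)  # index 0 is the state where everything works
    for _ in range(steps):
        dist = [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return dist


def summarize(states, dist):
    """Return (R1, R2, F) from a state distribution."""
    fatal = dist[states.index("f")]
    # Normal working: both CPUs up and at least one communication line up.
    normal = sum(p for s, p in zip(states, dist)
                 if s != "f" and s[0] and s[1] and (s[2] or s[3]))
    return normal, (1.0 - fatal) - normal, fatal


states, matrix = build_chain(1e-3, 2e-3, 1e-4, 1e-4)  # hypothetical rates
r1, r2, fatal = summarize(states, propagate(matrix, 72))
```

Repeated vector-matrix multiplication is equivalent to using the n-step matrix from the Chapman-Kolmogorov equation, and it avoids forming matrix powers explicitly.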

Reliability Analysis
In order to verify the reliability of the heterogeneous dual-line dual-CPU hot redundancy system, $\alpha_1$, $\alpha_2$, $\beta_1$, $\beta_2$ are given different values. As far as we know, there are no accurate values for the failure rates used in this paper. Therefore, the failure rates are set in accordance with empirical values and other papers [8]. To make the results conservative, we appropriately increased the failure rates in this paper. As $t$ varies, the values of $p_i(t)$, $R(t)$, and $F(t)$ change accordingly. According to Equations (16)-(20), we can calculate the probability of the system being in each state after different working hours. The results are shown in Table 2. Figure 10a-c represents three failure rate cases, from high to low.

Table 2. Probability of the system in different states.

On the other hand, the working hours of the system under different fatal failure probabilities can be calculated, as shown in Table 3 and Figure 11.

Table 3. Working hours under different fatal failure probabilities. P1, P2, and P3 are the corresponding system states under different failure rates.

It can be seen that the dual-CPU hot redundancy system can greatly improve the reliability of the system compared with a single-CPU system. Even if the CPU failure rate is as high as $10^{-2}$/h, the AUV-T can still be guaranteed to be in a reliable working condition ($R$) for more than 27 h. The high reliability of the VxWorks system helps keep the CPU failure rate very low. When the failure rate of the CPU is $10^{-3}$/h, the probability of the AUV maintaining a normal working state ($R_1$) for 72 h is 80.5%, and the probability of it maintaining a reliable working state ($R$) is 99.1%. When the failure rate of the CPU is $10^{-4}$/h, the probability of the AUV maintaining a normal working state ($R_1$) for 72 h continuously is 97.9%, and the probability of it maintaining a reliable working state ($R$) is greater than 99.9%.
In practical engineering applications, due to the limitations of the power supply carried by the AUV and the length of the tunnel, the continuous working time of the AUV for tunnel detection usually does not exceed 24 h. Therefore, the use of the dual-CPU hot redundancy system can ensure that AUV-T could complete the detection of the water conveyance tunnel in a reliable working state.
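The inverse question, how long the system can run before the fatal-failure probability exceeds a given limit, can be sketched with a simple bisection. The example below is an illustration under stated assumptions rather than a reproduction of Table 3: it uses a continuous-time model in which the two CPUs fail independently at constant rates, so that $F(t) = (1 - e^{-\alpha_1 t})(1 - e^{-\alpha_2 t})$, and the function names and numeric rates are hypothetical.

```python
import math


def fatal_probability_at(t_hours, alpha1, alpha2):
    """F(t) for two CPUs with independent, constant failure rates (per hour)."""
    return (1.0 - math.exp(-alpha1 * t_hours)) * (1.0 - math.exp(-alpha2 * t_hours))


def hours_until_fatal_probability(limit, alpha1, alpha2, t_max=1.0e6):
    """Largest t with F(t) <= limit, found by bisection (F increases with t)."""
    if not 0.0 < limit < 1.0:
        raise ValueError("limit must lie strictly between 0 and 1")
    low, high = 0.0, t_max
    if fatal_probability_at(high, alpha1, alpha2) <= limit:
        return high
    for _ in range(200):
        mid = 0.5 * (low + high)
        if fatal_probability_at(mid, alpha1, alpha2) <= limit:
            low = mid
        else:
            high = mid
    return low


def single_cpu_hours(limit, alpha):
    """Same question for one CPU, where F(t) = 1 - exp(-alpha * t)."""
    return -math.log(1.0 - limit) / alpha


# Hypothetical case: both CPUs at 1e-3 per hour, fatal-failure limit of 1%.
dual_hours = hours_until_fatal_probability(0.01, 1e-3, 1e-3)
single_hours = single_cpu_hours(0.01, 1e-3)
```

Comparing `dual_hours` with `single_hours` for the same rate and limit shows how much operating time the standby CPU buys under this simplified model.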

Discussion
In order to verify the AUV-T's ability to inspect tunnels and the reliability of the system, we used the AUV-T to conduct an automatic inspection of a water conveyance tunnel in Hangzhou, Zhejiang Province, China, in August 2019, as shown in Figure 12. Figure 12a shows the AUV-T, and Figure 12b shows the AUV-T sailing in the tunnel. The water conveyance tunnel is about 4 km long. The sailing time of the AUV-T was 90 min, and its average speed was about 0.74 m/s.
It can be seen that the AUV-T can successfully inspect the tunnel. During the selected sailing time, the AUV-T maintained a heading angle of about 75°, and the overshoot was less than 5°. At 4000 s, the tunnel direction changed, and the AUV-T automatically adjusted its heading angle after recognizing the change. Meanwhile, the distance between the AUV and the bottom of the tunnel remained stable at about 0.7 m, with a height fluctuation of less than 0.1 m. Using the controller proposed in this paper, the AUV-T maintained a stable heading and height in accordance with instructions during sailing, so as to realize the autonomous observation of the tunnel.

Conclusions
(1) This paper designed a system structure suitable for a water conveyance tunnel inspection AUV. The task layer includes ten task modules: data fusion, course correction, relative positioning, wall defect detection, autonomous obstacle avoidance, visual guidance, fault diagnosis and self-rescue, wall following, height control, and energy and power. It can adapt to the harsh environment in a water conveyance tunnel and realize the detection of the tunnel.
(2) The heterogeneous dual-line dual-CPU hot redundancy system was proposed, and its reliability was analyzed under different failure rates and different working hours.
(3) The reliability analysis showed that the dual-CPU hot redundancy system proposed in this paper can significantly improve the reliability of the system, and the probability of maintaining a reliable working state can reach more than 99%. In addition, the tunnel experiment showed that an AUV-T with the architecture designed in this paper can successfully and effectively complete an autonomous inspection of a water conveyance tunnel.
(4) In the next step, the data collected this time will be compared with known tunnel data to evaluate the detection effect and provide a basis for water conveyance tunnel detection. In the not-too-distant future, the AUV-T will be used to conduct longer-distance autonomous inspections of more water tunnels.

Conflicts of Interest:
The authors declare no conflict of interest.