AR-Assisted Guidance for Assembly and Maintenance of Avionics Equipment

The assembly and maintenance of products in the aviation industry are a crucial part of the product life cycle, and many tasks still rely on manual operations. To address the narrow operating spaces and blind areas encountered in manual assembly and maintenance, we propose an augmented reality (AR) assistant guidance method designed specifically for such scenarios. A multi-modality anti-occlusion tracking algorithm obtains the pose data of assembly parts, on which AR guidance information is displayed. We additionally propose an assembly step identification method to reduce the user's interaction burden. We developed an AR visualization assistant guidance system and designed and conducted a user evaluation experiment to measure learnability, usability, and mental effort. The results show that, compared with traditional methods, our method improves training efficiency by 128.77% and improves assembly and maintenance efficiency by 29.53% and 27.27%, respectively. It also shows significant advantages in learnability, usability, and mental effort, providing a feasible and effective solution to blind areas in assembly and maintenance in the aviation industry.


Introduction
Advanced aviation equipment such as aircraft is characterized by complex product structures, huge numbers of parts, and many assembly relationships, which make assembly and maintenance difficult [1]. Manual operations still predominate in most aircraft assembly and maintenance processes. Aircraft assembly requires operators to memorize many operation instructions, become familiar with complex operation steps and actions, and master a large number of disassembly skills. These tasks place high demands on the experience and overall competence of assembly operators, which indirectly leads to problems such as low efficiency and unstable assembly quality [2]. A large aircraft has more than one million parts, and its assembly cycle accounts for about 65% of the total manufacturing cycle [3]. Assembly efficiency and quality are therefore of great significance for improving production capacity.
The traditional manual assembly process relies on two-dimensional (2D) paper drawings and process manuals to guide workers [4]. These materials offer no intuitive guidance, and misreading drawings easily leads to errors, which reduces assembly efficiency and quality. We introduce AR technologies to assist in assembly and maintenance. By overlaying the assembly process, installation locations, specification requirements, and other assembly guidance information on the real operating environment, we aim to make assembly guidance more intuitive and safer, increase assembly efficiency, and realize AR-based visual intelligent assembly of advanced aviation equipment. Several researchers have already studied AR-assisted assembly and maintenance [5]. Szajna et al. proposed an AR-assisted manual wiring production method [6]. Boeing engineers and technicians can use AR glasses to query information about aircraft parts, quickly locate faults, and view maintenance procedures [7]. However, some serious obstacles to efficient assembly remain unsolved. A typical example is the blind-area assembly problem: in the narrow triangular area of the aircraft's landing gear and equipment bays, some avionics assembly parts cannot be seen directly, so workers rely on touch to assemble them blind, which makes assembly efficiency and quality uncontrollable. During maintenance, assembly parts are also commonly occluded by other parts such as cables [8]. One solution to these problems is to visually guide the whole assembly and maintenance process with AR-assisted assembly methods. The key to such methods is the real-time acquisition of the pose of dynamic objects through 3D tracking. This allows virtual information to be overlaid onto real objects and the assembly process and results to be recognized automatically. The primary goal is to lower the skill requirements for workers and improve assembly efficiency [9].
Targeting the common assembly and maintenance processes for avionics equipment in the equipment bay (including the assembly of bolts, cables, and other parts), this paper proposes a blind-area information perception and AR-guided assembly method based on color and depth (RGBD) data. We apply machine vision, AR, and image processing technologies to track the poses of parts in blind areas, identify the assembly stage, and guide the operator through assembly and maintenance tasks efficiently with AR. The contributions of this paper can be summarized as follows:
1. We propose a pose tracking method for dynamic assembly parts based on RGBD data, aimed at fast, accurate tracking and robust display of assembly information on the parts;
2. We propose an AR-assisted assembly guidance method for assembly and maintenance operations in visual blind areas, consisting of assembly step identification and dynamic assembly guidance;
3. We designed and conducted a user study comparing the proposed method with traditional methods in terms of assembly efficiency and user experience, which verified the feasibility of the proposed method.
The rest of this article is organized as follows. Section 2 reviews related work on 3D tracking algorithms and AR-assisted assembly guidance. Section 3 introduces the blind-area assembly method and its technical details, including the construction of the system and the design of our user study. Section 4 analyzes the experimental data and presents the results. Section 5 discusses the findings, and Section 6 presents the limitations and conclusions.

Related Works

Object 3D Tracking Algorithm
One of the core techniques of AR-assisted guidance is stable, accurate, and efficient virtual-real fusion [10]. The key to virtual-real fusion is tracking the correct poses of assembly parts, which allows virtual assembly guidance information, such as the assembly docking site and the visual blind areas of the parts, to be displayed.
Region-based methods use image statistics to model the probability that a pixel belongs to the object or to the background [11]. The pose whose projected contour best fits the resulting image segmentation is then found, which allows changes in the object's pose to be tracked.
Region-based methods allow robust tracking in many challenging scenarios. Such methods typically extract the contour as an initial step and then optimize to find the best pose [12]. Prisacariu [13] developed PWP3D, a real-time algorithm using level-set pose embedding, which laid the foundation for region-based methods. Building on PWP3D, many algorithms add further information, extend the segmentation model, or improve segmentation efficiency [14]. To combine depth-based and region-based information, Kehl [15] extended the energy function of PWP3D with the iterative closest point (ICP) algorithm. Ren [16] tightly coupled region and depth information in a probabilistic formulation using a 3D signed distance function. Li [17] then developed adaptively weighted region binding and local constraints on edge information to improve the robustness of 3D object tracking in complex situations. Stoiber proposed SRT3D [18], ICG [19], and other methods that further improve speed and robustness.
To improve robustness to partial occlusion, Zhong [20] designed a local region definition strategy and an occlusion detection strategy based on a pixel weighting function. Marinello [21] matched affinity scores computed from local feature embeddings and motion descriptors of tracked objects, improving tracking robustness under occlusion and missed detections.
While deep learning has proven very successful in 6DoF pose estimation, pure tracking methods are also evolving rapidly. Many tracking methods are inspired by pose refinement and predict the relative pose between a rendering of the object and the subsequent image [22]. For example, se(3)-TrackNet [23] proposed a data-driven neural network, trained on synthetic images, that estimates the best relative pose of an object. PoseRBPF [24] employed a Rao-Blackwellized particle filter with a latent code as the pose representation. 6-Pack [25] tracked keypoints defined relative to anchor points. Wang [26] employed a learning-based active contour model to improve tracking accuracy. However, learning-based methods struggle to meet the computing speed and hardware constraints of mobile devices, so region-based and image processing methods are currently favored for AR devices.

AR-Assisted Assembly Guidance
AR technology has been researched and applied in product design, maintenance, assembly training, and many other fields. In the assembly of the Boeing B787, AR technology assists the assembly of the inner wiring harnesses in the fuselage, solving mis-installation caused by wires and cables occluding each other [27]. This reduced the assembly error rate by 50% and shortened assembly time by 25%. AR technology is also applied in the seat assembly of the Airbus A330, where it improves assembly efficiency by 90% and lowers the skill requirements for workers [28].
Pang [29] introduced AR technology in the early design stage and developed an AR design and assembly system that can detect assembly constraints and perform collision simulation for real assembly. The system simulates the manual assembly process and improves the intuitiveness and interactivity of integrated design and assembly planning. Henderson [30] conducted an AR maintenance study in which a head-mounted display guided the repair of the turret of an LAV-25A1 armored personnel carrier, using arrows, text, labels, close-up views, and virtual models to direct the attention of maintenance personnel, significantly improving equipment maintenance in the US military. Erkoyuncu [31] developed a maintenance-oriented AR prototype system that automatically generates the AR content required for industrial maintenance operations through spatial semantic perception. The system guides maintenance personnel through inspection and maintenance work, improves their efficiency through adaptive data management, and lowers the technical skill programmers need to customize AR content for different scenarios. Ojer [32] developed a projection-based AR system that guides operators through the manual assembly of electronic components by accurately projecting assembly information, shortening training time, improving operators' ability to adapt to complex and changing production processes, and reducing assembly errors. Mura [33] used Unity 3D and the Vuforia Engine to display measurement sensor data as virtual information in an AR headset and provided guidance to assembly workers based on the measurement data, reducing operation time and the probability of worker errors. This realizes auxiliary assembly of the car body, but the system depends on accurate data from the measurement sensor, which affects the stability of triggering human-computer interaction content.
Regarding AR assembly guidance in visual blind areas, Khenak [34] used AR to guide the assembly of an occluded part in a blind insertion task. Feng [35] proposed an AR assembly guidance method for free-hand bolt assembly in a blind area. Laviola [36] developed an AR-assisted maintenance system that guides workers through manual maintenance tasks on machines in gas-fired radiant heating systems.
AR technology has been studied and applied in many fields and has achieved results in mechanical product design [37], simulation [1], assembly [38], maintenance [39], inspection [40], training [41], and other areas. However, owing to the high cost of hardware and the relatively immature underlying theory, its application scenarios remain limited and it has not been widely adopted.

Materials and Methods
This section describes the framework and technical details of the proposed system, along with the design of the user study.

System Framework
As shown in Figure 1, the proposed blind-area assembly guidance method consists of three main modules: (A) assembly pose tracking; (B) assembly step identification; and (C) AR visual guidance. Module A uses RGBD data to track the pose of the dynamic assembly. Module B determines whether the current assembly step is complete and decides whether to proceed to the next step. Module C visualizes the blind area in AR during assembly, guides the operation, and provides the necessary technical information. For different applications, such as training, assembly, and maintenance, we customized the development to improve the user experience. The details of each module are explained below.

Dynamic Assembly Pose Tracking
In this paper, a multi-modality anti-occlusion tracking method for assembly objects achieves continuous and accurate tracking through iterative optimization. The translational error of our method is less than 1 cm and the rotational error is less than 0.9 degrees, and the method runs fast at low computational cost. Inspired by Stoiber [18,19], the region-based method constructs a region modality that optimizes an object's pose by calculating the probability that sampling points on the contour edge of the target object belong to the foreground or background. We additionally use the depth information in the RGBD data from the depth camera to track the assembly parts and introduce the object's texture information for auxiliary tracking, which improves the robustness of the algorithm. Incorporating these modalities in practical applications greatly improves the stability of assembly pose tracking.
To ensure efficiency and avoid rendering the 3D model during tracking, we use a sparse viewpoint model to represent the geometry of the assembly parts. To generate the sparse viewpoint model, we place many virtual cameras at sampled viewpoints to render the assembly parts. To obtain dense and uniform viewpoints, we evenly divide each triangular face of an icosahedron into four sub-triangles. Iterating this process four times yields 4^4 = 256 triangles per face, or 20 × 256 = 5120 triangles in total.
The virtual camera viewpoints are located at the vertices of these triangles, 1 m from the center of the icosahedron (the distance depends on the size of the assembly part), with the optical axis pointing to the center. The center of the assembly part's CAD model coincides with the center of the icosahedron. We then render the CAD model from the 2562 sampled viewpoints (the vertex count of an icosahedron subdivided four times, 10 × 4^4 + 2) to build the sparse viewpoint model. For each viewpoint, we collect 200 contour points and 100 surface points of the CAD model, together with their normal vectors.
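The viewpoint counts above follow from repeated icosahedron subdivision: each pass multiplies the face count by 4, and after n passes the mesh has 10 × 4^n + 2 vertices. A minimal Python/NumPy sketch of how such viewpoints could be generated is shown below, assuming midpoint subdivision with re-projection onto the unit sphere. The function names `icosphere` and `sample_viewpoints` are hypothetical, and rendering the CAD model from each viewpoint is out of scope.

```python
import numpy as np


def _midpoint(verts, cache, a, b):
    """Index of the (cached) unit-sphere midpoint of edge (a, b)."""
    key = (min(a, b), max(a, b))
    if key not in cache:
        m = verts[a] + verts[b]
        verts.append(m / np.linalg.norm(m))  # re-project onto the unit sphere
        cache[key] = len(verts) - 1
    return cache[key]


def icosphere(subdivisions=4):
    """Unit icosphere: split every triangle into 4 sub-triangles, `subdivisions` times."""
    t = (1.0 + 5.0 ** 0.5) / 2.0
    raw = [(-1, t, 0), (1, t, 0), (-1, -t, 0), (1, -t, 0),
           (0, -1, t), (0, 1, t), (0, -1, -t), (0, 1, -t),
           (t, 0, -1), (t, 0, 1), (-t, 0, -1), (-t, 0, 1)]
    verts = [np.array(p, dtype=float) / np.linalg.norm(p) for p in raw]
    faces = [(0, 11, 5), (0, 5, 1), (0, 1, 7), (0, 7, 10), (0, 10, 11),
             (1, 5, 9), (5, 11, 4), (11, 10, 2), (10, 7, 6), (7, 1, 8),
             (3, 9, 4), (3, 4, 2), (3, 2, 6), (3, 6, 8), (3, 8, 9),
             (4, 9, 5), (2, 4, 11), (6, 2, 10), (8, 6, 7), (9, 8, 1)]
    for _ in range(subdivisions):
        cache, new_faces = {}, []
        for a, b, c in faces:
            ab = _midpoint(verts, cache, a, b)
            bc = _midpoint(verts, cache, b, c)
            ca = _midpoint(verts, cache, c, a)
            new_faces += [(a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)]
        faces = new_faces
    return np.array(verts), np.array(faces)


def sample_viewpoints(radius=1.0, subdivisions=4):
    """Camera centres on a sphere around the CAD model centre, optical axes pointing inward."""
    verts, _ = icosphere(subdivisions)
    return radius * verts, -verts  # positions, unit optical-axis directions


positions, axes = sample_viewpoints(radius=1.0)
```

Caching edge midpoints ensures that a vertex shared by neighboring triangles is created only once, which is why the vertex count stays at 2562 instead of growing with the number of triangles.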

Region Modality
Our method integrates the region, depth, and texture modalities by formulating their joint probability density function and then optimizing it. Multi-modality fusion improves the robustness of the algorithm and achieves stable, efficient assembly tracking.
The region modality calculates the 3D pose transformation matrix between two frames by maximizing the probability that the sampling points on the contour of the tracking target in the previous frame are on the contour in the current frame.
Our method calculates probabilities along correspondence lines. First, the CAD model in the pose estimated for the previous frame is projected onto the image plane to form a mask. Sampling points on the contour of this mask serve as the midpoints of the correspondence lines, and each correspondence line extends along the projected normal direction at its sampling point. Correspondence lines are 10 to 30 pixels long. To reduce computation, their length decreases as the number of optimization iterations increases; in the first few iterations, the lines are long enough to cover the displacement of the object's contour between two frames.
The advantage of using sampling points is that pixel values need only be evaluated sparsely along the correspondence lines. As shown in Figure 2, the blue points are the sampling points on the mask contour of the previous frame, the yellow lines are the correspondence lines, and the red line segments mark where the contour edge lies with high probability. Because the correspondence lines shorten as the iterations proceed, computation is reduced and speed is improved. First, we calculate the probability that each pixel along the current correspondence lines belongs to the foreground or background using normalized color histograms. Assuming that pixels are independent, this probability follows from Bayes' theorem.
Here, y is the pixel color; $m_i$ with $i \in \{f, b\}$ denotes the foreground (f) or background (b); $p(y \mid m_i)$ is the probability of observing color y in the foreground or background; and s is a user-given step size, with every s pixels along a line grouped into one segment. Next, we calculate the distance $d_s$ from the center c of the correspondence line to the estimated contour. In this calculation, n is the normal vector of the correspondence line, $\bar{n} = \max(|n_x|, |n_y|)$, θ is the pose transformation vector, ${}^{C}X$ denotes 3D points in the camera coordinate system, and π denotes the projection. Applying Bayes' theorem again gives the posterior probability that each pixel of the correspondence line belongs to the mask contour of the current frame. Assuming that this distribution is normal after the pose transformation, the posterior probability density function of the pose transformation can be obtained, where $\omega_s$ is the region in which correspondence line l is located and $\sigma_c$ is the user-defined standard deviation of the region modality.
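The per-pixel Bayesian step can be illustrated with a small sketch. It assumes RGB histograms with 16 bins per channel, equal foreground and background priors, and a segment-wise formulation in which the s pixels of a segment are independent observations, so p(m_f | ω_s) is the product of foreground likelihoods divided by the sum of the foreground and background products. The function names and the `eps` smoothing constant are hypothetical; the distance term d_s and the Gaussian posterior over the pose are not shown.

```python
import numpy as np


def _bin_index(colors, n_bins):
    """Map (N, 3) uint8 RGB colors to flat histogram bin indices."""
    idx = colors.astype(np.int64) * n_bins // 256
    return (idx[..., 0] * n_bins + idx[..., 1]) * n_bins + idx[..., 2]


def normalized_histogram(colors, n_bins=16):
    """Normalized color histogram, used as p(y | m_i) for i in {f, b}."""
    counts = np.bincount(_bin_index(colors, n_bins), minlength=n_bins ** 3)
    return counts.astype(np.float64) / max(counts.sum(), 1)


def segment_foreground_posterior(line_colors, hist_f, hist_b, s, n_bins=16, eps=1e-6):
    """p(m_f | omega_s) for consecutive segments of s pixels along one correspondence line.

    Pixels are assumed independent with equal priors, so each segment likelihood is a
    product of per-pixel likelihoods; it is evaluated in log space for stability.
    """
    bins = _bin_index(line_colors, n_bins)
    n_seg = len(bins) // s  # trailing pixels that do not fill a segment are dropped
    bins = bins[: n_seg * s].reshape(n_seg, s)
    log_f = np.log(hist_f[bins] + eps).sum(axis=1)
    log_b = np.log(hist_b[bins] + eps).sum(axis=1)
    return 1.0 / (1.0 + np.exp(log_b - log_f))  # = prod_f / (prod_f + prod_b)


# Synthetic example: reddish object on a bluish background.
rng = np.random.default_rng(0)
fg = np.clip(rng.normal([200, 40, 40], 10, (500, 3)), 0, 255).astype(np.uint8)
bg = np.clip(rng.normal([40, 40, 200], 10, (500, 3)), 0, 255).astype(np.uint8)
hist_f, hist_b = normalized_histogram(fg), normalized_histogram(bg)
line = np.vstack([fg[:12], bg[:12]])  # 24-pixel line crossing the contour
post = segment_foreground_posterior(line, hist_f, hist_b, s=2)
```

The posterior stays near 1 on the object side of the line and drops to near 0 on the background side; the transition is where the contour most likely lies.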

Depth Modality
Using the RGBD data from the depth camera, we add a depth modality to our method to improve the stability of assembly part tracking in practical applications. Our method calculates the pose transformation vector between two frames by extracting correspondence points P from the depth data that match the 3D surface points X sampled from the sparse viewpoint closest to the pose estimated in the previous frame.
Each correspondence point P is a 3D point, so the depth modality must first search for it. First, the 3D surface point X sampled from the sparse viewpoint closest to the previous frame's pose is projected into the depth image. Then, based on a user-defined radius and stride, 3D points in the vicinity of the projected point are reconstructed from the depth image. Finally, the reconstructed point nearest to X is selected as the correspondence point P, and correspondence points whose distance from X exceeds a threshold are discarded.
As shown in Figure 3, the yellow points are the correspondence points in the depth image, the blue points are the 3D surface points sampled from the CAD model, and the red line segments represent the corresponding associations of the correspondence points.
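The correspondence search can be sketched as follows, assuming a pinhole camera with intrinsic matrix K, a metric depth image in which 0 marks missing data, and a surface point X already expressed in the camera frame. The function name `find_correspondence` is hypothetical, and the default radius, stride, and 3 cm rejection threshold are illustrative assumptions rather than values reported here.

```python
import numpy as np


def find_correspondence(X_c, depth, K, radius=3, stride=1, max_dist=0.03):
    """Nearest depth-image point to model surface point X_c (camera frame, metres).

    Returns the reconstructed 3D point, or None if no valid point lies within max_dist.
    """
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    if X_c[2] <= 0:
        return None  # point behind the camera
    u0 = int(round(fx * X_c[0] / X_c[2] + cx))  # projection into the depth image
    v0 = int(round(fy * X_c[1] / X_c[2] + cy))
    h, w = depth.shape
    best, best_dist = None, np.inf
    for v in range(v0 - radius, v0 + radius + 1, stride):
        for u in range(u0 - radius, u0 + radius + 1, stride):
            if not (0 <= u < w and 0 <= v < h) or depth[v, u] <= 0:
                continue  # outside the image or invalid depth
            z = depth[v, u]
            P = np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])  # back-projection
            d = np.linalg.norm(P - X_c)
            if d < best_dist:
                best, best_dist = P, d
    return best if best_dist <= max_dist else None


K = np.array([[500.0, 0.0, 32.0], [0.0, 500.0, 32.0], [0.0, 0.0, 1.0]])
depth = np.ones((64, 64))  # flat surface 1 m in front of the camera
P_on = find_correspondence(np.array([0.01, 0.0, 1.0]), depth, K)   # lies on the surface
P_off = find_correspondence(np.array([0.01, 0.0, 1.1]), depth, K)  # 10 cm away -> rejected
```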
For the posterior probability model of the depth modality, we use a point-to-plane error metric to establish a normal distribution. That is, the distance between the 3D surface point X and the correspondence point P is projected onto the normal vector N of the corresponding 3D surface point. Given the correspondence point P, we can then calculate the probability of the pose transformation vector θ.
Here, $\sigma_d$ is the user-defined standard deviation of the depth modality, $d_Z$ is the depth of the correspondence point, and ${}^{M}X$ denotes 3D points in the model coordinate system. In practice, $\sigma_d$ is set by the user. Because the quantity and quality of depth-camera measurements degrade as the distance between the tracked object and the camera increases, $\sigma_d$ should increase with $d_Z$. This also keeps $\sigma_d$ compatible with the uncertainty of the region modality, which likewise increases with camera distance.
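A sketch of the point-to-plane likelihood term follows. Purely for illustration, it assumes that the effective standard deviation grows linearly with depth (σ = σ_d · d_Z); the text only requires that it increase with d_Z, so the scaling law, the value of `sigma_scale`, and the function names are hypothetical.

```python
import numpy as np


def point_to_plane_residuals(X, N, P):
    """r_i = N_i^T (X_i - P_i): distance from P_i to the tangent plane at X_i."""
    return np.einsum("ij,ij->i", N, X - P)


def depth_log_likelihood(X, N, P, d_z, sigma_scale=0.05):
    """Unnormalized Gaussian log-likelihood per correspondence.

    Assumption: sigma = sigma_scale * d_z, so distant (noisier) points are trusted less.
    """
    sigma = sigma_scale * d_z
    r = point_to_plane_residuals(X, N, P)
    return -0.5 * (r / sigma) ** 2


X = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])  # model surface points
N = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])  # their surface normals
P = np.array([[0.1, 0.0, 1.0], [0.0, 0.0, 1.01]])  # tangential offset vs. 1 cm normal offset
ll_near = depth_log_likelihood(X, N, P, np.array([1.0, 1.0]))
ll_far = depth_log_likelihood(X, N, P, np.array([2.0, 2.0]))
```

The tangential offset in the first pair costs nothing, which is what lets point-to-plane matching slide along flat surfaces; the same 1 cm normal offset is penalized less at 2 m than at 1 m.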

Texture Modality
RGB data can be used not only for the region modality but also for a texture modality, especially when the tracked assembly parts have distinctive surface textures. This significantly reduces drift in practical applications and further improves the stability and reliability of tracking.
To account for local object appearance, we apply a keypoint-based probability density function. As shown in Figure 4, for each image frame we first detect keypoints within the minimum axis-aligned bounding box of the tracked object's mask. Feature descriptors are then extracted at the detected keypoints and matched with the feature descriptors of the key frames. If the directional difference between the feature descriptors of the current frame and those of the existing key frames exceeds a threshold, the current frame is registered as a new key frame.
Finally, when a frame becomes a key frame, a depth image is rendered; for each keypoint falling on the contour, the corresponding 3D model point is reconstructed and stored with its feature descriptor, and only non-occluded 3D points are kept in the key frame.
Our method uses the ORB algorithm to extract keypoints. Although ORB is somewhat less accurate than SIFT, it is faster to compute and achieves a better balance between tracking quality and efficiency in AR applications.
With the 2D keypoints detected in the current frame and the matching 3D model points from the key frames, a probability density function can be established. Assuming that the reprojection error follows a normal distribution, the probability density function can be constructed over independent point pairs.
Here, $D_t$ is the texture modality data; $x'_i$ is a keypoint detected in the current frame; $x_i(\theta)$ is the corresponding 3D model point projected onto the image plane; $\sigma_t$ is a user-defined standard deviation representing the uncertainty of the texture modality relative to the other modalities; and $\rho_{\text{tukey}}$ is the Tukey error norm, which limits the influence of outliers.
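The robust texture term can be sketched with the standard Tukey biweight, ρ(r) = (c²/6)[1 − (1 − (r/c)²)³] for |r| ≤ c and c²/6 otherwise, which behaves like r²/2 near zero and saturates for outliers. The sketch assumes a pinhole projection and a log-likelihood of −Σ ρ(‖x′_i − x_i(θ)‖) / σ_t²; this scaling, the values of `c` and `sigma_t` (in pixels), and the function names are illustrative assumptions.

```python
import numpy as np


def tukey_rho(r, c):
    """Tukey biweight: ~r^2/2 for small |r|, constant c^2/6 for |r| > c."""
    r = np.abs(np.asarray(r, dtype=float))
    inside = (c ** 2 / 6.0) * (1.0 - (1.0 - (r / c) ** 2) ** 3)
    return np.where(r <= c, inside, c ** 2 / 6.0)


def project(K, X_c):
    """Pinhole projection pi(.) of (N, 3) camera-frame points to (N, 2) pixel coordinates."""
    uvw = X_c @ K.T
    return uvw[:, :2] / uvw[:, 2:3]


def texture_log_likelihood(K, X_c, keypoints, sigma_t=2.0, c=10.0):
    """Sum over matched pairs of -rho_tukey(||x'_i - x_i(theta)||) / sigma_t^2."""
    r = np.linalg.norm(keypoints - project(K, X_c), axis=1)
    return float(-np.sum(tukey_rho(r, c)) / sigma_t ** 2)


K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
X_c = np.array([[0.0, 0.0, 1.0], [0.1, 0.0, 1.0]])  # projects to (320, 240), (370, 240)
ll_exact = texture_log_likelihood(K, X_c, np.array([[320.0, 240.0], [370.0, 240.0]]))
ll_small = texture_log_likelihood(K, X_c, np.array([[321.0, 240.0], [370.0, 240.0]]))
ll_out1 = texture_log_likelihood(K, X_c, np.array([[320.0, 240.0], [1000.0, 240.0]]))
ll_out2 = texture_log_likelihood(K, X_c, np.array([[320.0, 240.0], [2000.0, 240.0]]))
```

A 1-pixel error behaves almost like a Gaussian term, while the two gross mismatches yield the same bounded penalty, so a single bad ORB match cannot dominate the pose update.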

The Method to Remove Correspondence Points and Lines under Occlusion Conditions
In practice, assembly parts are sometimes occluded by each other or by hands. If correspondence lines, points, and keypoints continue to be evaluated in occluded regions, computing resources are wasted and the pose results deteriorate. The object or camera may also move too fast, which disturbs the local sampling data by blurring the RGB data or increasing depth errors near the contour. In these cases the number of outliers increases, so the outliers should be eliminated and the confidence of the affected modality reduced.
In the region modality, our method eliminates a correspondence line if the difference between the expected depth of a sampling point on the line, computed from the model, and the actual depth measured by the depth camera exceeds 3 cm, which indicates occlusion in that area. To reduce erroneous deletions, the measured depth at a sampling point is taken as the minimum depth value in the 5 × 5 pixel neighborhood around it in the RGBD image. In the depth modality, our method eliminates sampling points whose measured depth deviates from the depth of the CAD model by more than 3 cm in absolute value. Such points are assumed to be occluded or to belong to the background and are therefore excluded from the calculation.
In the texture modality, our method similarly eliminates key points whose measured depth is significantly smaller than the expected depth, which indicates local occlusion; the threshold is set to 5 cm.
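The occlusion tests above reduce to simple depth comparisons. The sketch below is one possible reading, assuming depth images in metres with 0 marking missing pixels. The function names, keeping a line when no valid depth is available, and the sign convention of the texture test (a key point is treated as occluded when the measured depth lies more than 5 cm in front of the expected depth, where an occluder would appear) are assumptions rather than details from the paper.

```python
import numpy as np

# Thresholds quoted in the text (metres).
REGION_THRESHOLD = 0.03
DEPTH_THRESHOLD = 0.03
TEXTURE_THRESHOLD = 0.05


def neighborhood_min_depth(depth, u, v, half=2):
    """Minimum valid depth in the (2*half+1) x (2*half+1) window around pixel (u, v).

    Assumes 0 marks a missing measurement; returns None if the window has no valid depth.
    """
    h, w = depth.shape
    patch = depth[max(v - half, 0):min(v + half + 1, h), max(u - half, 0):min(u + half + 1, w)]
    valid = patch[patch > 0]
    return float(valid.min()) if valid.size else None


def region_line_valid(depth, u, v, expected_depth):
    """Keep a correspondence line unless its sampling point looks occluded."""
    measured = neighborhood_min_depth(depth, u, v)  # 5 x 5 window for half=2
    if measured is None:
        return True  # assumption: no depth evidence, so do not reject
    return abs(expected_depth - measured) <= REGION_THRESHOLD


def depth_points_valid(measured, expected):
    """Boolean mask of depth-modality sampling points to keep."""
    return np.abs(np.asarray(measured, float) - np.asarray(expected, float)) <= DEPTH_THRESHOLD


def texture_point_valid(measured_depth, expected_depth):
    """Reject a key point when the measured surface lies well in front of the model."""
    return (expected_depth - measured_depth) <= TEXTURE_THRESHOLD
```

Taking the neighborhood minimum means a sampling point that falls just outside the silhouette (onto the farther background) still picks up the object's own depth, which is how the 5 × 5 window reduces erroneous removals near contours.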

Optimization
To maximize the posterior probability over the region, depth, and texture modalities, we use Newton's method with Tikhonov regularization.
Assuming that the measurements of the three modalities are independent, the pose of the tracked assembly object can be estimated by maximizing their joint probability. The joint probability density function of the three modalities is defined as follows.
where $n_c$, $n_d$, and $n_t$ are the numbers of data points for the region, depth, and texture modalities, respectively.
To maximize the joint probability function, several iterations are performed for each pose estimate; in each iteration, a Newton step with Tikhonov regularization computes the variation vector $\hat{\theta}$, and the pose is updated.
where $H$ is the 6 × 6 Hessian matrix, $g$ is the gradient vector, $\lambda_r$ and $\lambda_t$ are the regularization parameters for rotation and translation, respectively, and $I_3$ is the 3 × 3 identity matrix.
Because the modalities are assumed to be independent, their gradients and Hessian matrices can simply be summed for the optimization.
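The update equations are not reproduced in this section. The sketch below assumes the common form of a Tikhonov-regularized Newton step for maximizing a log-likelihood, $\hat{\theta} = (-H + \mathrm{diag}(\lambda_r I_3, \lambda_t I_3))^{-1} g$, with the variation vector ordered as rotation then translation and each modality's gradient and Hessian summed. The sign convention, the ordering, and the function name are assumptions.

```python
import numpy as np


def regularized_newton_step(gradients, hessians, lambda_r, lambda_t):
    """Tikhonov-regularized Newton step for maximizing a joint log-likelihood.

    gradients: per-modality 6-vectors g (rotation first, then translation)
    hessians:  per-modality 6 x 6 Hessians H of the log-likelihood
    Independent modalities contribute additively, so g and H are summed.
    """
    g = np.sum(np.asarray(gradients, float), axis=0)
    H = np.sum(np.asarray(hessians, float), axis=0)
    # Near a maximum H is negative semi-definite; adding R keeps -H + R positive
    # definite and damps the rotation and translation components separately.
    R = np.diag([lambda_r] * 3 + [lambda_t] * 3)
    return np.linalg.solve(-H + R, g)
```

Larger λ_r or λ_t shrink the corresponding components of the step, which stabilizes tracking when the summed modalities constrain some directions only weakly.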
Finally, since $\hat{\theta}_r$ is a rotation vector in axis-angle representation, the exponential map can be used to update the pose of the tracked object.
where ${}^{C}T_{M}^{+}$ is the updated transformation matrix from the model coordinate system M to the camera coordinate system C.
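The update equation is not shown here. A minimal sketch, assuming $\hat{\theta} = [\hat{\theta}_r; \hat{\theta}_t]$ and that the variation is applied in the model frame (right-multiplication): the rotation vector is mapped to a rotation matrix with Rodrigues' formula, which is the exponential map of SO(3), and the translation part is used directly. Whether the paper right- or left-multiplies, and the function names, are assumptions.

```python
import numpy as np


def skew(w):
    """3 x 3 cross-product matrix of a 3-vector."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])


def so3_exp(theta_r):
    """Rodrigues' formula: axis-angle vector -> rotation matrix."""
    theta_r = np.asarray(theta_r, float)
    angle = np.linalg.norm(theta_r)
    K = skew(theta_r)
    if angle < 1e-9:
        return np.eye(3) + K  # first-order approximation for tiny rotations
    return (np.eye(3)
            + (np.sin(angle) / angle) * K
            + ((1.0 - np.cos(angle)) / angle**2) * (K @ K))


def update_pose(T_cm, theta):
    """Return the updated model-to-camera transform after applying theta = [theta_r, theta_t]."""
    theta = np.asarray(theta, float)
    delta = np.eye(4)
    delta[:3, :3] = so3_exp(theta[:3])
    delta[:3, 3] = theta[3:]
    return T_cm @ delta
```

Updating through the exponential map keeps the rotation part orthonormal after every iteration, which an additive update of the matrix entries would not.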
Specifically, our method performs four iterations per pose estimate, computing the variation vector $\hat{\theta}$ and updating the pose of the tracked object in each iteration. The algorithm's confidence in the current data and the range of RGBD data considered can be tuned by adjusting the standard deviations, thresholds, and other parameters.

Assembly Step Identification
Our method tracks the poses of sufficiently large assembly parts, such as the avionics equipment considered in this paper. Based on the real-time pose data of the assembly object, the system judges whether the current assembly step has been completed and whether the assembly result is correct. It then automatically guides the next assembly step until all assembly processes are completed, and takes photos as a record of completion. The overall process is shown in Figure 5.
To determine whether an assembly step has been completed, the system checks whether the tracked pose of the assembly object changes by less than a set range within a set time window (for avionics equipment, 2 cm/10 degrees within 2 s). If so, the assembly object is considered stationary, and the assembly step is considered completed. Otherwise, the assembly object is considered to be still moving, i.e., the assembly is still in progress, and the auxiliary guidance continues.
The assembly result is then checked for parts whose assembly step has been completed. The result is considered correct when the deviation between the tracked pose and the correct assembly pose of the assembly object is within a set tolerance (for avionics equipment, 3 cm/10 degrees), and the system then automatically advances to guidance for the next step.
If the tracked pose of the assembly object deviates significantly from the theoretical value, the system identifies the assembly result as an error. The erroneously assembled part and the outline of the assembly object are highlighted in red, and the correct assembly position is indicated in yellow. Guidance for the step continues until the step has been performed correctly, after which the next assembly step begins.
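The completion and correctness rules described above (2 cm/10 degrees within 2 s for completion; 3 cm/10 degrees for correctness) can be expressed as a small state machine over the stream of tracked poses. In this sketch the class and method names are hypothetical, and "stationary" is interpreted as every pose in the last 2 s lying within 2 cm/10 degrees of the newest pose, which is one possible reading of the rule rather than the paper's exact test.

```python
from collections import deque

import numpy as np


def rotation_angle_deg(R_a, R_b):
    """Geodesic angle between two rotation matrices, in degrees."""
    cos_angle = (np.trace(R_a.T @ R_b) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))


def poses_close(pose_a, pose_b, max_dist_m, max_angle_deg):
    """pose = (R, t) with R a 3 x 3 rotation matrix and t a translation in metres."""
    (R_a, t_a), (R_b, t_b) = pose_a, pose_b
    return (np.linalg.norm(np.asarray(t_a) - np.asarray(t_b)) <= max_dist_m
            and rotation_angle_deg(R_a, R_b) <= max_angle_deg)


class StepMonitor:
    """Hypothetical helper: classifies one assembly step from a stream of tracked poses."""

    def __init__(self, target_pose, window_s=2.0, still_m=0.02, still_deg=10.0,
                 correct_m=0.03, correct_deg=10.0):
        self.target_pose = target_pose
        self.window_s = window_s
        self.still = (still_m, still_deg)
        self.correct = (correct_m, correct_deg)
        self.history = deque()  # (timestamp_s, pose)

    def update(self, timestamp_s, pose):
        """Return 'in_progress', 'correct', or 'error' for the newest pose."""
        self.history.append((timestamp_s, pose))
        # Drop old samples while the remaining history still spans window_s.
        while len(self.history) > 1 and timestamp_s - self.history[1][0] >= self.window_s:
            self.history.popleft()
        if timestamp_s - self.history[0][0] < self.window_s:
            return "in_progress"  # not observed long enough to call it stationary
        if not all(poses_close(p, pose, *self.still) for _, p in self.history):
            return "in_progress"  # still moving: keep guiding
        return "correct" if poses_close(pose, self.target_pose, *self.correct) else "error"
```

An "error" result would trigger the red/yellow highlighting, while "correct" would advance the guidance to the next step.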
When the assembly process involves large-area occlusion or the assembly part leaves the camera's field of view, pose tracking may become inaccurate or even fail. Although such extreme situations do not occur in most assembly processes, we still account for them and propose a feasible and effective solution.
To handle such cases, we propose an evaluation function B as an index of confidence in the pose tracking results. B uses the Frobenius norm of the Hessian H in Equation (9), where $\alpha_n$ is the noise parameter and $\|H\|_F$ is the Frobenius norm of H. If B exceeds a threshold (for avionics equipment A, B, and C, we set $\alpha_n$ to 2.4, 2.8, and 2.9, respectively, based on their sizes), the current pose result is considered to have low confidence. In that case, the assembly steps and results are difficult to identify automatically, so human-computer interaction methods such as voice commands are used to view the correct assembly result and proceed to the next assembly step.
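The exact form of the evaluation function B is not reproduced in this section. The sketch below only shows how the Frobenius norm of the Hessian can be computed and how a threshold on B can switch the system to voice interaction. `hypothetical_evaluation` is an illustrative stand-in (the noise parameter divided by the norm, so a flat, weakly constrained likelihood yields a large B), not the paper's formula, and the separate `threshold` argument is likewise an assumption.

```python
import numpy as np


def frobenius_norm(H):
    """||H||_F = square root of the sum of squared entries."""
    return float(np.linalg.norm(H, ord="fro"))


def hypothetical_evaluation(H, alpha_n):
    """HYPOTHETICAL stand-in for B: large when the Hessian is small (weak constraints)."""
    return alpha_n / max(frobenius_norm(H), 1e-12)


def interaction_mode(H, alpha_n, threshold):
    """Fall back to voice interaction when B exceeds the threshold (low confidence)."""
    return "voice" if hypothetical_evaluation(H, alpha_n) > threshold else "automatic"
```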

Assembly Training
In the assembly training stage, we use AR technology to display assembly parts and assembly processes in a 3D manner.
For training on assembly parts, users mainly manipulate the virtual parts through gestures and other human-computer interaction methods. The semi-transparent display of the virtual parts lets users see normally invisible features, such as the backs and interiors of the parts, which facilitates faster learning.
When real assembly parts are available, the pose tracking algorithm can also track their poses in real time so that the virtual information is accurately registered on the real parts (see Figure 6a). Information that must be clear during assembly, such as holes and positioning points, is displayed directly on the real object, which can be freely moved and rotated.
For each assembly step, we explain the operational details through 3D animation and speech (see Figure 6b).

Visualization and Assembly Guidance of Blind Areas of Avionics Equipment
Using the QR code in the scene and the SLAM algorithm, we superimpose the assembly guidance information on the real world (see Figure 7a): (1) The assembly parts involved in the current assembly step are rendered semi-transparent and distinguished by different colors, and the assembly features (holes, shafts, interfaces, and joints) are highlighted in yellow (see Figure 7b). (2) A 3D assembly simulation animation (including the assembly process, tools, and actions) is played at the corresponding position on the assembly object, supplemented by guidance cues such as arrows, curves, circles, and voice prompts. (3) Easy-to-read process indicators (such as torque, number of winding turns, and measurement data) are displayed near the assembly object, and textual descriptions, such as process instructions and assembly steps, are displayed in an information box above the equipment bay.
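Registering the guidance content amounts to chaining rigid transforms. As a minimal sketch, assume that the QR code's pose in the SLAM world frame is known from marker detection, that the equipment bay's pose relative to the QR code is calibrated once, and that each part's target pose in the bay comes from the CAD model; the names `T_world_qr`, `T_qr_bay`, and `T_bay_part` are hypothetical. Composing the 4 × 4 homogeneous matrices gives the pose at which to render a virtual part in the world frame.

```python
import numpy as np


def make_transform(R, t):
    """Build a 4 x 4 homogeneous transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def part_pose_in_world(T_world_qr, T_qr_bay, T_bay_part):
    """Chain world <- QR code <- equipment bay <- part."""
    return T_world_qr @ T_qr_bay @ T_bay_part


def transform_point(T, p):
    """Apply a homogeneous transform to a 3D point."""
    return (T @ np.append(np.asarray(p, float), 1.0))[:3]
```

With SLAM tracking the headset in the world frame, content placed this way can stay anchored after the QR code leaves the view; for parts moved by hand, a tracked part pose would take the place of the fixed CAD pose in the chain.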
During assembly, we display the parts being assembled semi-transparently, with emphasis on the occluded assembly features. Visualizing the blind areas in the operator's view conveys the assembly process and locations quickly and accurately. As shown in Figure 7c, the positions of the occluded bolts in the narrow space at the back of the equipment bay are visualized.
We track the poses of the parts being assembled, synchronize the visualization with the assembly process in real time, and render the visual blind areas to achieve a "see-through" effect. As shown in Figure 7d, users can intuitively see how the trapezoidal groove on the back of the avionics equipment mates with the avionics equipment bay, as well as where the bolt holes are located.

Auxiliary Maintenance Guidance
Unlike assembly tasks, which are carried out in a specific order based on design drawings and process guidelines, aircraft maintenance tasks often involve troubleshooting systems or components, replacing damaged parts, or performing routine maintenance. Maintenance personnel may need to make decisions based on real-time failure information and may encounter unfamiliar failures. Whereas assembly sequences are designed to minimize blind areas, maintenance tasks typically involve disassembling and replacing individual components, which makes visual blind areas more likely. AR can overlay markers on damaged parts, provide workers with intuitive disassembly and assembly guidance, and display the necessary maintenance data, thereby simplifying operations and reducing error rates.
In our approach, an AR-based maintenance technical guide is provided for workers to consult. Once a fault point has been identified, the specific faulty equipment can be selected to start auxiliary maintenance guidance. After the necessary inspections, the disassembly steps are indicated by highlighting the relevant parts in yellow (see Figure 8). A 3D simulation animation is played at the corresponding positions, together with additional cues such as arrows, curves, circles, exclamation marks, and voice prompts, and technical documents are attached alongside as instructions. After disassembly, the remaining parts are repaired, replaced, and reassembled with guidance until all maintenance tasks are completed.

User Study
We designed and conducted a user study of scenarios in which there are visual blind areas during assembly.This section describes the experimental purpose, participants, experimental conditions, and procedures, as well as the relevant assumptions and data analysis methods.

Participants
We invited 30 participants (26 males and 4 females, mean age = 23.5 years, SD = 1.7) to take part in our user experiment. All were graduate students majoring in mechanical engineering with prior assembly experience.
Our experiment included two conditions (see Figure 9): ➀ traditional guidance (TG) and ➁ AR guidance (ARG). In TG, the participants were guided by an assembly manual containing graphics and text. In ARG, the participants were guided by an AR-assisted guidance system deployed on AR glasses; we used the HoloLens 2, running the Windows Holographic operating system, as the AR glasses. The software was developed in Unity3D (version 2020.3.30).

We measured the participants' familiarity with assembly on a 5-point Likert scale, ranging from 1 point (light blue) for a novice to 5 points (deep blue) for an expert, and assigned them as evenly as possible to the TG group (15 people, mean = 2.73, SD = 1.03) and the ARG group (15 people, mean = 2.80, SD = 1.22). A Likert scale was also used to measure the ARG group's familiarity with AR (mean = 2.07, SD = 1.22). Furthermore, we measured how often the participants in both groups performed assembly tasks: 53% of each group (TG and ARG) performed assembly tasks at least a few times per year. For the ARG group, we also measured the frequency of AR use: 60% used AR at least a few times per year (see Figure 10).
1.
Overview and preparation (20-30 min): Before starting the experiment, the participants read and signed an informed consent form containing written explanations of the experiment and its ethical approval. All participants remained in good condition and actively engaged with the system throughout the experimental process.
We then gave the participants a preliminary introduction to our system and experimental methodology, followed by preparation for the experiment. First, we introduced the assembly parts, explained the use of the tools, and familiarized the participants with the operation process through practice cases: assembling bolts with ratchet wrenches and installing and removing aviation cables. For the ARG group, we also taught the participants how to interact with the AR system using gestures and speech. Throughout this process, we did not disclose the specific details of the experimental tasks.

2.
Experiment (40-50 min): The experimental task involved three items of avionics equipment and one avionics equipment bay (see Figure 11). Avionics equipment A measures 200 mm × 200 mm × 300 mm; it is fixed with two M5 bolts on each of its left and right sides and has three cable interfaces on the front. Avionics equipment B measures 300 mm × 250 mm × 300 mm; it is fixed with three M5 bolts on both its front and rear sides and has five cable interfaces on the front. Avionics equipment C measures 250 mm × 250 mm × 400 mm; it is fixed with two M6 bolts at the front and trapezoidal limit slots at the rear and has four cable interfaces on the front. All cable interfaces are of different types.
We designed three tasks: ➀ Task 1: training on the overall assembly process; ➁ Task 2: performing the assembly task for avionics equipment A, B, and C, comprising bolt installation (Subtask 2-1) and cable installation (Subtask 2-2); and ➂ Task 3: performing the maintenance task for avionics equipment A and B. Before starting, the participants were instructed to balance efficiency and quality.
• Task 1: Training. Task 1 was intended to familiarize the participants with the assembly process and steps for all the avionics equipment; it ended when they believed they were familiar with the process and reported that they were confident enough to start the assembly task. In TG, we provided 2D drawings of the avionics equipment, the assembly order (AO), and the technical requirements. In ARG, we provided 3D-rendered interactive models of the avionics equipment, 3D animations of the AO, and the technical requirements.
• Task 2: Assembly. After completing the training, the participants performed the assembly tasks for the three items of avionics equipment, using assembly tools such as ratchet wrenches to assemble them according to the AO (Subtask 2-1: loading the avionics equipment into the equipment bay and tightening the bolts; Subtask 2-2: installing the cable glands). In TG, we provided 2D drawings, the AO, and the technical requirements for the avionics equipment. In ARG, we provided assembly guidance information and technical requirements rendered in 3D, with which the participants could interact through the AR glasses.
• Task 3: Maintenance. The usual maintenance process includes inspection, disassembly, replacement, and assembly. Because inspection and replacement take little time and the assembly process is similar to Task 2, we conducted experiments only on the disassembly process. After completing the assembly, the participants performed the disassembly stage of the maintenance task, using assembly tools such as ratchet wrenches to disassemble avionics equipment A and B according to the maintenance operation process (MOP). In TG, we provided 2D model drawings and the MOP. In ARG, we provided an interactive MOP and AR maintenance guidance information.
We recorded the time each participant took to complete each task in order to assess efficiency. After completing all the tasks, the participants filled out the System Usability Scale (SUS) questionnaire [42], which measures overall system usability. The SUS comprises 10 questions: questions 4 and 10 assess learnability, and the remaining 8 questions assess usability. The total SUS score (0-100) maps onto adjective ratings, where 38 corresponds to "poor", 52 to "OK", 73 to "good", and 85 to "excellent".
The participants were also asked to fill out the Subjective Mental Effort Questionnaire (SMEQ) [43], which measured the mental effort required under each experimental condition, and the NASA-TLX [44], which evaluated the participants' subjective workload in the two guidance groups. SMEQ scores are anchored as follows: 2, "Not at all hard to do"; 12, "Not very hard to do"; 24, "A bit hard to do"; 36, "Fairly hard to do"; and 56, "Rather hard to do". We used the Shapiro-Wilk test and one-way analysis of variance (ANOVA) with Bonferroni correction to analyze the completion time of each task. If $P_{SW}$, the p-value of the Shapiro-Wilk test for a group, was greater than 0.05, the data were treated as normally distributed. If $P_{BF}$, the Bonferroni-corrected p-value, was less than 0.05, the difference between the groups was considered statistically significant. Friedman and Wilcoxon signed-rank tests were used to test for significant differences between the two groups on the discrete-scale questionnaire data; if $P_{WI}$, the p-value of the Wilcoxon signed-rank test, was less than 0.05, the difference between the two groups was considered significant. Finally, we analyzed the user feedback.

3.
Review and interview (10-20 min): We videotaped all the participants performing the tasks. After the user study, we asked them to watch their videos and recall their overall experience. The interview included the following questions:

Results
We used completion times for training, assembly, and maintenance tasks, as well as the SUS, SMEQ, NASA-TLX, and user interviews to analyze differences between TG and ARG.
Figure 12a illustrates the completion times for Task 1 in the TG and ARG groups. Figure 12b details the completion times for Task 2 and its Subtasks 2-1 and 2-2, and Figure 12c shows the completion times for Task 3. We also calculated the mean (M) and standard deviation (SD) of each task's completion time to compare the TG and ARG groups (see Table 1). For all three tasks, the ARG group took significantly less time than the TG group. Specifically, the Shapiro-Wilk test showed no significant deviation from normality in the completion times of any task (TG group: $P_{SW1} = 0.896$, $P_{SW2-1} = 0.467$, $P_{SW2-2} = 0.253$, $P_{SW2} = 0.443$, $P_{SW3} = 0.897$; ARG group: $P_{SW1} = 0.543$, $P_{SW2-1} = 0.933$, $P_{SW2-2} = 0.697$, $P_{SW2} = 0.670$, $P_{SW3} = 0.823$). A one-way ANOVA then showed significant differences in completion time between the TG and ARG groups for all tasks ($F_1 = 98.86$, $P_1 < 0.01$; $F_{2-1} = 20.95$, $P_{2-1} < 0.01$; $F_{2-2} = 4.45$, $P_{2-2} < 0.05$; $F_2 = 16.38$, $P_2 < 0.01$; $F_3 = 8.27$, $P_3 < 0.01$).
As shown in Figure 12d-h, for the subjective data, we calculated the SUS questionnaire scores for the TG and ARG groups and analyzed them in two dimensions: SUS learnability (see Figure 12d) and SUS usability (see Figure 12e). Figure 12f presents the total SUS scores for the two groups, Figure 12g the SMEQ results, and Figure 12h the NASA-TLX results. We used the Friedman test to assess the significance of the SUS, SMEQ, and NASA-TLX results; all subjective measures differed significantly between the two groups.
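For readers reproducing the analysis, the F statistic of a one-way ANOVA and the Bonferroni adjustment are short computations. The sketch below is illustrative only: the toy data are not the study's, the function names are hypothetical, and obtaining p-values additionally requires the F distribution (for example, `scipy.stats.f_oneway` returns both the statistic and the p-value). With two groups of 15, the degrees of freedom are (1, 28), and the ANOVA is equivalent to a two-sided Student's t-test with F = t².

```python
def one_way_anova_f(*groups):
    """Return (F, (df_between, df_within)) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, (df_between, df_within)


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```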

Efficiency
We compared the working efficiency of TG and ARG, calculating the efficiency gain (E) of ARG relative to TG as follows:
$$E = \frac{t_{TG} - t_{ARG}}{t_{ARG}} \times 100\%$$

where $t_{TG}$ is the average completion time of the TG group and $t_{ARG}$ is the average completion time of the ARG group.
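The definition $E = (t_{TG} - t_{ARG})/t_{ARG} \times 100\%$ used in this sketch is inferred rather than quoted: it reproduces the reported pairs of time reduction and efficiency gain up to rounding (56.29% and 128.77%, 22.80% and 29.53%, 21.42% and 27.27%). The function names are hypothetical.

```python
def efficiency_gain_percent(t_tg, t_arg):
    """E: relative efficiency gain of ARG over TG, from mean completion times."""
    return (t_tg - t_arg) / t_arg * 100.0


def time_reduction_percent(t_tg, t_arg):
    """Share of the TG completion time saved by ARG."""
    return (t_tg - t_arg) / t_tg * 100.0


def gain_from_reduction(reduction_percent):
    """Convert a time reduction r into an efficiency gain r / (1 - r)."""
    r = reduction_percent / 100.0
    return r / (1.0 - r) * 100.0


for reduction in (56.29, 22.80, 21.42):
    print(f"{reduction:.2f}% less time -> {gain_from_reduction(reduction):.2f}% efficiency gain")
```

Because the denominator is the ARG time, E exceeds the time reduction and grows quickly as the reduction approaches 100%, which is why a 56.29% saving in training time corresponds to an efficiency gain above 100%.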
In the Task 1 training experiment, ARG improved efficiency by 128.77% compared with TG.
In the Task 2 assembly experiment, ARG improved efficiency by 29.53% compared with TG: by 38.18% in the Subtask 2-1 avionics equipment assembly experiment and by 16.65% in the Subtask 2-2 aviation cable assembly experiment.
In the Task 3 maintenance experiment, ARG improved efficiency by 27.27% compared with TG.
These data show that ARG was more efficient than TG in all three tasks, which demonstrates the effectiveness and feasibility of the ARG method.

Learnability
Learnability covers the time required to complete the training task (Task 1) and the two learnability questions of the SUS. Their means and standard deviations are shown in Tables 1 and 2. Bonferroni-corrected analysis showed that the TG and ARG groups differed significantly for Task 1 ($P_{BF1} < 0.001$). Compared with the TG group, the ARG group spent 56.29% less time completing the training task.
In terms of the users' subjective experience, the SUS learnability score for ARG (two questions, 0-4 points per question, 0-8 points overall) was significantly higher than that for TG ($P_{WI} < 0.001$), with a mean almost twice that of TG. This indicates that the AR guidance method is easier to learn than the traditional guidance method and can reduce the learning difficulty and time cost.

Usability
Usability includes the time required to complete the two tasks (Task 2 and Task 3) and the eight usability questions of the SUS. Their means and standard deviations are shown in Tables 1 and 2.
The assembly and maintenance tasks required the participants to perform manual operations. The completion times were compared using the Bonferroni method, which showed statistically significant differences between the TG and ARG groups for Subtask 2-1, Subtask 2-2, Task 2, and Task 3 ($P_{BF2-1} < 0.001$, $P_{BF2-2} = 0.044$, $P_{BF2} = 0.001$, $P_{BF3} = 0.008$). Compared with the TG group, the ARG group spent 22.80% and 21.42% less time on Task 2 and Task 3, respectively.
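The Bonferroni method controls the family-wise error rate when several comparisons are made at once: with m comparisons, each raw p-value is multiplied by m (capped at 1) before being compared with the significance level α, which is equivalent to testing each raw p-value against α/m. The section does not state how many comparisons were included in the correction, so the minimal Python sketch below uses hypothetical raw p-values and hypothetical helper names (`bonferroni`, `significant`); it illustrates the adjustment only.

```python
from typing import List, Sequence


def bonferroni(p_values: Sequence[float]) -> List[float]:
    """Bonferroni-adjusted p-values: multiply by the number of comparisons m, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def significant(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """True where the adjusted p-value is at or below alpha (same as raw p <= alpha / m)."""
    return [p_adj <= alpha for p_adj in bonferroni(p_values)]


# Hypothetical raw p-values from three comparisons (illustrative only, not study data).
raw = [0.004, 0.03, 0.2]
print(bonferroni(raw))
print(significant(raw))  # -> [True, False, False]
```

Here the second comparison (raw p = 0.03) would be significant on its own at α = 0.05 but not after correcting for three comparisons.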
In terms of users' subjective perceptions, the SUS usability score (8 questions, 0-4 points per question, 0-32 points overall) for the ARG group was also significantly higher than that for the TG group ($P_{WI} < 0.001$), as the mean scores confirm ($M_{ARG} = 26.00 > M_{TG} = 17.80$). Compared with traditional drawings and AO guidance, the participants rated the AR-guided approach as more usable.
The total SUS scores (0-100) are shown in Table 2. On the adjective rating scale, the total SUS score for TG corresponds to "OK," while that for ARG reaches "GOOD."
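SUS scoring can be sketched as follows. The sketch assumes the standard SUS procedure (ten items answered on a 1-5 scale; odd-numbered items contribute response - 1 and even-numbered items contribute 5 - response, giving 0-4 points each, and the summed contributions are multiplied by 2.5 for the 0-100 total), together with the commonly used two-factor split in which items 4 and 10 form the learnability subscale and the other eight items form the usability subscale. That split matches the 0-8 and 0-32 ranges quoted above, but the section does not say which items were used, so `LEARNABILITY_ITEMS` and the function names are assumptions.

```python
from typing import Sequence, Tuple

# Assumption: 1-based SUS item numbers forming the learnability subscale.
LEARNABILITY_ITEMS = (4, 10)


def item_contribution(item: int, response: int) -> int:
    """Map a 1-5 Likert response to 0-4 points.

    Odd (positively worded) items score response - 1;
    even (negatively worded) items score 5 - response.
    """
    if not 1 <= response <= 5:
        raise ValueError("SUS responses must be in 1..5")
    return response - 1 if item % 2 == 1 else 5 - response


def sus_scores(responses: Sequence[int]) -> Tuple[int, int, float]:
    """Return (learnability 0-8, usability 0-32, total SUS 0-100)."""
    if len(responses) != 10:
        raise ValueError("SUS has exactly 10 items")
    points = {i: item_contribution(i, r) for i, r in enumerate(responses, start=1)}
    learnability = sum(points[i] for i in LEARNABILITY_ITEMS)
    usability = sum(v for i, v in points.items() if i not in LEARNABILITY_ITEMS)
    total = (learnability + usability) * 2.5
    return learnability, usability, total


print(sus_scores([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # -> (8, 32, 100.0)
```

Keeping the subscale membership in one constant makes it easy to adapt the sketch if a different item split was used.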

Mental Effort
Mental effort is based on SMEQ scores (0-150) and the NASA-TLX scale. Their means and standard deviations are shown in Table 2.
For the SMEQ, there was a significant difference between the TG and ARG groups ($P_{WI} < 0.001$): TG was rated between "A bit hard to do" and "Not very hard to do," while the ARG rating was very close to "Not very hard to do." The NASA-TLX results likewise showed that ARG required significantly less mental effort than TG ($P_{WI} < 0.001$). In other words, the participants in the ARG group needed less mental effort than those in the TG group.
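The section does not say whether the raw (unweighted) or the weighted NASA-TLX score was used. Both variants are sketched below in Python under their usual definitions: the raw score is the mean of the six subscale ratings (0-100), and the weighted score uses weights obtained from 15 pairwise comparisons between subscales, so the weights sum to 15. The subscale keys, function names, and sample values are hypothetical.

```python
from typing import Mapping

SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")


def raw_tlx(ratings: Mapping[str, float]) -> float:
    """Raw (unweighted) NASA-TLX: mean of the six subscale ratings on a 0-100 scale."""
    return sum(ratings[s] for s in SUBSCALES) / len(SUBSCALES)


def weighted_tlx(ratings: Mapping[str, float], weights: Mapping[str, int]) -> float:
    """Weighted NASA-TLX: each weight counts how often that subscale was chosen
    in the 15 pairwise comparisons, so the weights must sum to 15."""
    if sum(weights[s] for s in SUBSCALES) != 15:
        raise ValueError("pairwise-comparison weights must sum to 15")
    return sum(ratings[s] * weights[s] for s in SUBSCALES) / 15


# Hypothetical ratings and weights for one participant (not study data).
ratings = {"mental": 80, "physical": 20, "temporal": 50,
           "performance": 30, "effort": 70, "frustration": 40}
weights = {"mental": 5, "physical": 0, "temporal": 3,
           "performance": 2, "effort": 4, "frustration": 1}
print(round(raw_tlx(ratings), 2), weighted_tlx(ratings, weights))  # -> 48.33 62.0
```

The weighted variant emphasizes the subscales a participant judged most relevant to the task, so it can differ noticeably from the raw mean, as in this example.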

Discussion
In this paper, we studied an AR-assisted visual guidance method for the assembly and maintenance of avionics equipment. We first proposed an assembly object pose tracking method based on RGBD data; the obtained pose data allow assembly guidance information to be superimposed on the real object. Our method also reduces the user's interaction burden through assembly step identification, improving usability and operational efficiency. Unlike existing AR guidance manual methods [45][46][47], our method can overlay AR instructions on both static and dynamic components. Experimental results show that our method is particularly well suited to assembly and maintenance tasks with visual blind areas.
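Superimposing guidance on a tracked part amounts to mapping model-space points through the estimated pose and then into the image. The Python sketch below assumes the pose is given as a rotation matrix R and translation t (model frame to camera frame) and that the camera is an ideal pinhole with intrinsics fx, fy, cx, cy. It ignores lens distortion and the calibration between the RGBD camera and the AR display, and all names and numbers are hypothetical rather than taken from the system described here.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def to_camera(R: Sequence[Sequence[float]], t: Vec3, p_model: Vec3) -> Vec3:
    """Rigid transform of a CAD-model point into the camera frame: R @ p + t."""
    x, y, z = (sum(R[i][j] * p_model[j] for j in range(3)) + t[i] for i in range(3))
    return (x, y, z)


def project(p_cam: Vec3, fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of a camera-frame point to pixel coordinates (u, v)."""
    x, y, z = p_cam
    if z <= 0:
        raise ValueError("point is behind the camera")
    return fx * x / z + cx, fy * y / z + cy


# Hypothetical example: identity rotation, part 0.5 m in front of the camera,
# guidance anchor (e.g. a bolt hole) 2 cm right of and 1 cm above the model origin
# (camera y axis points down, so "above" is negative y).
R_identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
anchor_cam = to_camera(R_identity, (0.0, 0.0, 0.5), (0.02, -0.01, 0.0))
u, v = project(anchor_cam, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
print(round(u, 3), round(v, 3))  # -> 344.0 228.0
```

In a pipeline of this kind, the projection would be repeated for every new pose estimate so that overlays stay attached to moving components.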
Then, we proposed the AR visual assistant guidance method for blind areas in assembly and compared it with the traditional guidance method in training, assembly, and maintenance experiments in an avionics equipment scenario. We collected data from the three tasks and analyzed four aspects: completion time, learnability, usability, and mental effort. The results show that the AR guidance method significantly outperforms the traditional guidance method in completion time, learnability, and usability and imposes a lower mental workload.
In terms of learnability, the ARG group significantly outperformed the TG group in both Task 1 completion time and SUS learnability scores. This is because the ARG group used AR glasses, whose 3D assembly animations and voice explanations helped the participants intuitively understand the assembly parts, processes, tools, and other information, and quickly gave them the confidence that they could complete the assembly task well and quickly. In contrast, the participants in the TG group had to understand the content and technical requirements of the assembly order from two-dimensional drawings. Because participants with different skill levels differ in how well they interpret such drawings, the TG group in Task 1 not only took a long time but also showed a large SD.
In terms of usability, ARG showed higher usability than TG. The main reason is that, in Tasks 2 and 3, the AR guidance method guided the operator through the AO or MOP step by step, so the operator did not need to work out the procedure and could concentrate on performing each operation as instructed by the system. In contrast, the traditional guidance method relies on 2D paper drawings and process manuals, so workers' attention switches back and forth between the assembly object and the manual. This continuous shifting of attention increases mental effort, which lengthens completion time and lowers operational efficiency. In addition, some participants in the TG group ($U_{TG1}$, $U_{TG3}$, $U_{TG5}$, and $U_{TG13}$) reported that, when assembling bolts in narrow spaces, it was difficult to find the bolt hole and to align the hexagon bolt sleeve at the end of the extension rod with the bolt. The participants in the ARG group did not report such problems; on the contrary, several of them mentioned the convenience of the perspective presentation in the interview ($U_{ARG3}$, $U_{ARG4}$, $U_{ARG7}$, $U_{ARG10}$, $U_{ARG12}$, and $U_{ARG13}$). The main reason is that the AR guidance method has clear advantages for visual guidance in the blind areas of assembly and maintenance work, where it can achieve a see-through (perspective-like) effect. In the task, it helped the operators better understand the assembly object and locate the bolt holes in the narrow space. It can also reduce mis-assembly caused by occlusion, thereby improving assembly efficiency and quality.
In terms of mental effort, the experimental results show that the participants in the ARG group expended less mental effort than those in the TG group. This is mainly because the participants with ARG only needed to follow the AR guidance to perform the operations, without having to understand and memorize the steps of the AO and MOP. This can be of great help to novice or unskilled operators, reducing the difficulty of the work and improving the efficiency and quality of task completion.
In addition, among the three tasks in the experiment, ARG achieved its largest efficiency improvement over TG in the training task. This suggests that the AR guidance method is well suited to training new or unskilled workers. Compared with traditional 2D paper drawings and process manuals, the 3D animations, voice explanations, and interaction helped the users complete the training task more efficiently. Furthermore, the efficiency improvement of the ARG group was more pronounced in Subtask 2-1 (38.18%) than in Subtask 2-2 (16.65%). This suggests that the AR guidance method and instruction presentation style may have different effects at different assembly stages, possibly because the assembly of aviation cable glands in Subtask 2-2 involved fewer visual blind areas than Subtask 2-1.

Limitations and Conclusions
Although the proposed AR-assisted guidance method for visual blind areas improves the efficiency of training, assembly, and maintenance tasks, it also has some limitations.
The assembly pose tracking method proposed in this paper is limited in the objects and scenarios it can handle. At present, it can only be applied to targets of a certain size and cannot track small parts such as bolts. Under severe occlusion, tracking performance degrades or tracking fails altogether, making it difficult for the system to identify the assembly steps automatically; human-computer interaction methods such as speech are then needed to move on to the next step. Moreover, the computing power of the hardware used in this experiment is limited: prolonged continuous use leads to overheating and frequency throttling, which makes the algorithm unstable. In future work, we will address this by adjusting the algorithm parameters and structure and by offloading the computation to high-performance hardware.
There is redundancy in the presentation of the AR guidance information. We provide AR assembly guidance information including translucent highlighting of assembly locations, 3D assembly animations, arrows, curves, circles, voice prompts, and text such as assembly steps. However, participants may not attend to some of this information, or it may not be presented in the best form. Redundant AR graphics may reduce the efficiency of AR guidance and increase users' mental burden. In the future, we will take the user experience into account and optimize how AR information is presented.
The task scenario in the user study was limited, and the participants were limited to graduate students with assembly experience. The assembly task involved only avionics equipment, and the space in the equipment bay is relatively narrow. For skilled workers on actual production lines, the AR guidance method may be less helpful. In future work, we will develop more AR methods applicable to engineering sites to help improve production efficiency.
In this paper, an AR-assisted guidance method for assembly in visual blind areas was developed.Firstly, we proposed an anti-occlusion tracking method for assembly parts integrating multi-modality information.Then, we proposed an assembly step identification and guidance method.The blind-area visualization and auxiliary guidance system was ap-

Figure 3. Visualization of corresponding points in depth images: (a) 3D surface points in depth image; (b) corresponding associations (depicted by red line segments) between correspondence points (shown in yellow) and 3D surface points of CAD model (shown in blue).

Figure 4. The flow chart of extracting correspondence key points.

Figure 5. The process of assembly step identification.

Figure 6. Screenshots from assembly training with AR glasses: (a) the virtual information is accurately presented on the real part; (b) operational details are trained by 3D animation and speech.

Figure 7. Presentation of 3D assembly guidance information and assembly processes: (a) overlay of virtual information on the actual equipment bay; (b) bolt holes to be assembled are highlighted in yellow; (c) bolt holes that are occluded are highlighted in yellow; (d) the occluded area can be observed through the virtual model.

Figure 8. Disassembly steps are indicated with highlighted yellow parts. The tools and parts to be used are indicated in the "Tools Reference", where "M5 bolt*1" implies using one M5 bolt.

Figure 10. The participants' experience with assembly and AR.

Figure 11. Avionics equipment installed in the equipment bay.

Figure 12. The statistical results of the experiments, with dots representing individual participant results (participants from the TG group in blue and those from the ARG group in orange) and black crosses indicating the means: (a) completion time for Task 1; (b) completion time for Task 2; (c) completion time for Task 3; (d) SUS learnability; (e) SUS usability; (f) SUS total; (g) SMEQ; (h) NASA-TLX.

• Do you have any general comments or questions about the study, task, or system?
• What information are you most lacking in visual blind area assignments?
• Do you think visual display of bolts is necessary? What about the bolt holes?
• How do you feel during the operation of "Assembly blind spot Perspective"?
• Do you think visual displays of aeronautical cable interfaces are necessary?

Table 1. Means (standard deviations) of the completion times for the tasks.

Table 2. Means (standard deviations) of the subjective experimental data.