ECG Electrode Localization: 3D DS Camera System for Use in Diverse Clinical Environments

Models of the human body representing digital twins of patients have attracted increasing interest in clinical research for the delivery of personalized diagnoses and treatments to patients. For example, noninvasive cardiac imaging models are used to localize the origin of cardiac arrhythmias and myocardial infarctions. The precise knowledge of a few hundred electrocardiogram (ECG) electrode positions is essential for their diagnostic value. Smaller positional errors are obtained when extracting the sensor positions, along with the anatomical information, for example, from X-ray Computed Tomography (CT) slices. Alternatively, the amount of ionizing radiation the patient is exposed to can be reduced by manually pointing a magnetic digitizer probe one by one to each sensor. An experienced user requires at least 15 min. to perform a precise measurement. Therefore, a 3D depth-sensing camera system was developed that can be operated under adverse lighting conditions and limited space, as encountered in clinical settings. The camera was used to record the positions of 67 electrodes attached to a patient’s chest. These deviate, on average, by 2.0 mm ±1.5 mm from manually placed markers on the individual 3D views. This demonstrates that the system provides reasonable positional precision even when operated within clinical environments.


Introduction
Models of the human body have gained increasing interest in clinical research and are essential for delivering personalized diagnoses and treatments to patients. They can be used to build a digital twin of a patient's body that can be used for planning curative interventions, predicting the outcomes of intended treatments, or the likelihood of relapses and complications. For most of these models, apart from the knowledge of the patient's exact anatomy, information about physiological processes is required such as the impedance of fibrous tissue forming an infarction scar, which largely differs from the impedance of intact myocardium.
Electrical impedance tomography (EIT) and electrical capacitance tomography (ECT) are used to measure tissue parameters such as impedance or capacitance [1][2][3]. The origin of cardiac arrhythmias or myocardial infarctions can be identified by integrating ECG recordings [4][5][6], and the functions within the human brain [7,8] can be visualized using models, including EEG recordings. All these methods require the positions of between 12 and a few hundred sensors to be exactly known. The larger the positional error, the lower the diagnostic value of the results generated by the model. Consequently, they are less suitable for treatment planning, guidance, outcome stratification, or prevention of complications and relapses.
A commonly used approach is to extract the sensor positions, along with the anatomical details, from Magnetic Resonance Imaging (MRI) stacks or X-ray Computed Tomography (CT) slices. Both approaches require special markers to be attached to the sensors, which are visible in MRI [9,10] or CT scans [11]. Identifying the sensor positions from MRI or CT scans yields the smallest positional errors compared to the true sensor position. However, this approach significantly hinders the clinical uptake and widespread use of electrical impedance tomography (EIT), electrical capacitance tomography (ECT), noninvasive imaging of cardiac electrophysiology (NICE), and other model-based approaches. With CT scans, patients have to be exposed to large amounts of ionizing radiation, which limits the use of the aforementioned methods to three applications per year. Although MRI is not bound by this limitation, it is only covered by insurance companies if it is required for obtaining a proper diagnosis and evaluating outcomes.
Given these limitations, alternative approaches that decouple the generation of the underlying anatomical models from the localization of the sensors have been tested [12][13][14]. Alternatives such as magnetic digitizer systems, e.g., the Polhemus Fastrak [12], tracked gaming controllers [13], or motion capture systems, have been used to identify the positions of electrodes relative to the patient's body. The use of photogrammetry, visual odometry, and stereoscopic approaches was already considered more than 15 years ago [15,16]. The Microsoft Kinect 3D depth-sensing (3D DS) camera was one of the first compact and affordable devices. Nowadays, modern coded light and stereo vision-based models are portable and lightweight enough to be easily attached to or even integrated within a standard tablet computer.
In the past few decades, 3D DS cameras have mainly been used in EEG-based studies to locate EEG sensors on the patient's skull [12,14,17]. All of them use the recorded EEG signals to localize brain activity or identify the focus of a seizure within the cortex. In contrast, very few studies report the use of 3D DS cameras to locate ECG sensors on the chest or even the whole torso [18][19][20]. One reason for this may be that the skull is a rigid structure that does not change its shape when the subject moves during the recording. In contrast, when recording the sensor position on the torso, the patient needs to maintain a specific posture. The instructions provided to the patient on how to achieve and maintain this posture are integral to the entire recording procedure.
In the present work, the positions of 64 ECG electrodes mounted on the torso are recorded using 3D DS camera readings only. Section 2 encompasses descriptions of the overall structure of the developed 3D DS camera-based system, method, and algorithm used for the real-time recording of the individual 3D views of the torso (Section 2.2); the postprocessing steps necessary for extracting the electrode positions (Section 2.3); and the recording protocol used and the instructions provided to each subject participating in the clinical testing (Section 2.4). In Section 3, the results obtained from the five subjects are presented, and in Section 4, these results are discussed.

Materials and Methods
The 3D depth-sensing (3D DS) camera-based measurement of electrode positions can be divided into four main steps: (i) selecting the appropriate 3D DS camera, (ii) defining an appropriate measurement protocol, (iii) recording the 3D surfaces in real time, and (iv) extracting the electrode center points.
The most important component for recording the electrode positions is the 3D camera. It can be characterized by various parameters such as the closest distance d near and the vertical ψ V and horizontal ψ H fields of view (FOV). These parameters define the volume in front of the camera in which objects must be placed to be accurately captured by the depth sensor. Based on these considerations, the Intel Realsense SR300 3D DS camera [21] was selected. Descriptions of the exact selection criteria that led to this decision can be found in Section 2.1.
The human torso represents a flexible object that offers several degrees of freedom for movement and deformation in contrast to the rather rigid skull. The position of each ECG electrode perceived by the 3D DS camera and its relative position to the other electrodes is directly affected by the movements of the patient's body. Therefore, it is essential to define an appropriate recording protocol before the first 3D data set is recorded. As large displacements may prevent the successful extraction of the electrode positions, the patient is required to actively maintain the same posture throughout the whole recording procedure. The details on how this active engagement of the patient can be achieved are described in Section 2.4. For the remaining steps, (iii) real-time recording and (iv) offline processing, Figure 1 provides an overview of the necessary substeps and their interdependence.
Figure 1. Schematic presentation of the overall approach used to extract ECG electrode positions from 3D depth-sensing camera data. It is split into a real-time and an offline processing part. The first contains methods to control the camera's white balance and exposure settings and to generate textured 3D surface meshes from the recorded depth data. During offline processing, these surfaces are aligned to extract the electrode positions within clusters of marker vertices found using texture images in the rg-chromacity color space and the 3D surfaces.
A 3D DS camera combines a depth sensor and an RGB color sensor in a single device. These two sensors simultaneously record an RGB color image Ξ rgb and a 16-bit depth image D 16 . The latter image encodes the distance d between the camera and the objects located in front of the camera.
The developed real-time recording system is intended for use in diverse clinical settings such as examination rooms in outpatient clinics or local cardiology practitioner clinics. The lighting conditions encountered depend on the pointing direction of the camera and the number of light sources, as well as their brightness and color hue. In order to properly handle these conditions, the white-balancing settings, exposure time τ, and overall gain Γ ex of the color sensor are continuously adjusted in real time. Automatic white balancing (AWB), which is described in detail in Section 2.2.1, uses Ξ rgb to estimate the color temperature K W of the dominant light source.
At the same time, a binary mask M D is generated from the depth image D 16 . This M D splits D 16 into foreground pixels representing the torso surface and objects in the background (Section 2.2.3). M D is used to generate a 3D mesh S of the imaged torso surface (Section 2.2.4) and tune the exposure time τ and global gain setting Γ ex of the color sensor. This is achieved by combining M D with the brightness information I of the color image Ξ rgb obtained during the AWB step (Section 2.2.2). The mask M D is also used to outline the patient's contours on the real-time preview screen, along with various system parameters.
When the trigger is pressed, the triangulation component (Figure 1) generates a 3D surface mesh S, which is stored along with the corresponding texture information Ξ uvs of the torso created from the RGB color image Ξ.
In the offline processing step, a pairwise iterative closest-point (ICP) algorithm is used to align the recorded surfaces S with each other. The resulting transformation matrices are used to extract the 3D positions from the color-corrected texture images Ξ uvs , which have been stored alongside each S (Section 2.3.2). In order to facilitate the steps necessary to identify the color markers attached to the electrodes, an additional color-correction step, which is described in Section 2.3.1, is conducted. The aim of this step is to ensure that the patient's skin color and marker colors are accurately represented across all the recorded texture images Ξ uvs . To achieve this, the Ξ uvs are split into a chromacity image χ rg and the corresponding intensity image I. Both are used to identify the red and blue pixels and related 3D points corresponding to each electrode marker. Details on how this is achieved can be found in Section 2.3.3.
The centers of these markers are coaligned with the centers of the electrode clips and patches. Their positions on the surface are computed by fitting a planar model (Section 2.3.4) to the extracted red and blue points. In the final labeling step (Section 2.3.5), the electrode positions are assigned to the corresponding ECG signals recorded from the patient's torso.
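The planar fit mentioned above can be illustrated with a short sketch. It is not the authors' implementation of Section 2.3.4: it assumes an ordinary least-squares plane obtained from a singular value decomposition, and the function names `fit_plane` and `project_to_plane` are hypothetical.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through 3D points; returns (centroid, unit normal)."""
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    # The right singular vector belonging to the smallest singular value
    # is the direction of least variance, i.e., the plane normal.
    _, _, vt = np.linalg.svd(pts - centroid)
    normal = vt[-1]
    return centroid, normal / np.linalg.norm(normal)

def project_to_plane(point, centroid, normal):
    """Orthogonal projection of a 3D point onto the fitted plane."""
    p = np.asarray(point, dtype=float)
    return p - np.dot(p - centroid, normal) * normal

# Hypothetical marker points (in mm) lying roughly on the plane z = 5.
marker_points = np.array([[0, 0, 5.0], [10, 0, 5.1], [0, 10, 4.9], [10, 10, 5.0]])
center, n = fit_plane(marker_points)
```

In this sketch, the centroid of the marker points serves as the estimate of the electrode center, and the normal indicates the local orientation of the marker on the surface.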
The colors of the markers vary depending on the position and orientation of the electrode clip relative to the torso and 3D DS camera. Therefore, a dedicated calibration procedure is utilized, which is outlined in Section 2.3.6, to determine the ranges of the red and blue color values that represent the electrode markers.

Selecting the Camera
The selected Intel Realsense SR300 3D DS camera [21] is used in narrow or crowded places such as examination rooms in outpatient clinics and cardiology practitioner clinics. In these places, the patient is typically seated on an examination bed or chair placed close to the wall. Consequently, the closest distance d near relative to the depth sensor at which objects may be placed has to be shorter than the shortest horizontal distance d H,min of the patient's torso to any surrounding obstacles such as walls or furniture. The horizontal ψ H and vertical ψ V FOVs determine how tall or wide the closest object can be to be fully captured in its height and width. The minimum required values for ψ H and ψ V can be approximated from the approximate height h torso of the patient's torso and the 3D DS camera's d near .
According to the datasheet [21], the depth sensor can capture objects located at distances between 20 cm and 150 cm from the camera. This range is more than sufficient to record the surface of the torso. The depth information of each object is captured using an infrared sensor in combination with a near-infrared projector [21,22]. The depth images D 16 are recorded in 4:3 image format, covering a horizontal FOV of 69 degrees and a vertical FOV of 54 degrees at a depth resolution of less than 1 mm. The color sensor of the camera generates the RGB images Ξ rgb in 16:9 format. Its horizontal FOV of 68 degrees is sufficiently well-paired with the horizontal FOV of the depth sensor. With a ψ V of 41 degrees, it covers only 2/3 of the depth sensor in height. This results in a lack of color information for the pixels close to the top and bottom edges of the depth image D 16 , which was considered when outlining the measurement protocol in Section 2.4.
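The relation between object size, distance, and required field of view can be sketched with elementary trigonometry. The sketch below is not the relationship given in the paper; it assumes a flat object centered on the optical axis and perpendicular to it, and the function names and example values are hypothetical.

```python
import math

def min_fov_deg(extent_m, distance_m):
    """Smallest full field of view (degrees) that covers an object of the
    given extent, centered on the optical axis at the given distance."""
    return math.degrees(2.0 * math.atan(extent_m / (2.0 * distance_m)))

def min_distance_m(extent_m, fov_deg):
    """Smallest distance at which an object of the given extent fits into
    the given full field of view."""
    return extent_m / (2.0 * math.tan(math.radians(fov_deg) / 2.0))

# Hypothetical example: a torso of 0.6 m height viewed from 0.6 m.
required_vertical_fov = min_fov_deg(0.6, 0.6)
# Distance needed to fit the same torso into the 54 degree vertical depth FOV.
required_distance = min_distance_m(0.6, 54.0)
```

Under these assumptions, a torso height of 0.6 m requires a distance of a little under 0.6 m for the vertical depth FOV of 54 degrees, which lies well inside the stated working range of 20 cm to 150 cm.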

Automatic White Balancing
The color sensor used by the Intel Realsense 3D DS camera offers the possibility to manually tune the color gains Γ R , Γ G , Γ B indirectly by adjusting its color temperature parameter K W . This was used to implement a custom AWB component (Figure 1), based on the algorithm proposed in [23], which can handle varying illumination conditions. The red R, green G, and blue B color channels are first linearized (gamma decompression) using a lookup table and normalized to the interval [0, 1]. The resulting linear RGB image Ξ is then converted into an RGB chromacity image χ rg and a linear grayscale image I that encodes the brightness of each pixel.
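The linearization and chromacity conversion described above can be sketched as follows. The sketch assumes the standard sRGB transfer function, 8-bit input channels, and an intensity defined as I = R + G + B; these are assumptions chosen to be consistent with the constants used in this section, and the names `LUT` and `chromaticity_and_intensity` are hypothetical.

```python
import numpy as np

def srgb_to_linear(v):
    """Standard sRGB gamma decompression for values normalized to [0, 1]."""
    v = np.asarray(v, dtype=float)
    return np.where(v <= 0.04045, v / 12.92, ((v + 0.055) / 1.055) ** 2.4)

# Lookup table with one linearized, normalized entry per 8-bit code value.
LUT = srgb_to_linear(np.arange(256) / 255.0)

def chromaticity_and_intensity(rgb8):
    """Convert an (..., 3) uint8 RGB image into chromacities and intensity."""
    linear = LUT[np.asarray(rgb8, dtype=np.uint8)]
    intensity = linear.sum(axis=-1)            # I = R + G + B, at most 3
    safe = np.where(intensity > 0, intensity, 1.0)
    chroma = linear / safe[..., None]          # r + g + b = 1 for non-black pixels
    return chroma, intensity

image = np.array([[[128, 128, 128], [255, 255, 255]]], dtype=np.uint8)
chroma, intensity = chromaticity_and_intensity(image)
```

Using a lookup table avoids evaluating the power function for every pixel of every frame, which matters for real-time processing.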
From χ rg , all pixels (r i , g i , b i ) are selected that encode shades of gray. The red r, green g, and blue b chromacity values of these pixels are located within a small area around the neutral color gamut point, which has a color temperature of 5500 K, as shown in Figure 2. The basic assumption is that these pixels most likely correspond to object surfaces of a neutral gray color. Consequently, a reddish tint in these pixels must be caused by a low K value of the predominant illumination, and a bluish cast most likely results from a light source with a large K. Overexposed pixels are excluded, as their color most likely results from the saturation of at least one of the three color channels and thus does not properly represent the skin color of the patient or the color of the illuminant. Likewise, underexposed pixels are not considered, as their color is most likely caused by camera noise rather than the light reflected by the imaged object.
For adjusting the color temperature setting K W of the 3D DS camera, only pixels (r i,W , g i,W , b i,W , I i,W ) that are located within a small area surrounding the neutral color gamut point are selected. According to Cohen [23], this point is defined by the chromacity values r = 0.363, g = 0.338, and b = 0.299. The area encloses all pixels that are located within two ellipses centered at the color gamut point. Their primary and secondary axes are defined by the standard deviations of the red σ r = 0.0723, green σ g = 0.0097, and blue σ b = 0.0749 chromacity values with respect to the neutral color gamut point, which were determined in [23]. The maximum intensity encountered is max I = 3.
The lower I min = 0.02 and upper I max = 0.98 exposure limits, as defined for each channel in [23], are linearized to I min = 0.02/12.92 and I max = ((0.98 + 0.055)/1.055)^2.4 before applying them to the overall linear intensity values I.
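A possible selection of the gray pixels is sketched below, using the chromacity values, standard deviations, and exposure limits quoted above. Several details are assumptions made for illustration only: the two ellipses are taken to span the (r, g) and (b, g) planes, a pixel has to lie inside both, and the per-channel exposure limits are scaled by max I = 3 before they are compared with I. The function name `select_gray_pixels` is hypothetical.

```python
import numpy as np

NEUTRAL = np.array([0.363, 0.338, 0.299])   # r, g, b of the neutral point
SIGMA = np.array([0.0723, 0.0097, 0.0749])  # sigma_r, sigma_g, sigma_b
I_MIN = 0.02 / 12.92                        # linearized lower limit
I_MAX = ((0.98 + 0.055) / 1.055) ** 2.4     # linearized upper limit
I_FULL = 3.0                                # max I = R + G + B

def select_gray_pixels(chroma, intensity):
    """Boolean mask of well-exposed pixels close to the neutral point."""
    d = (np.asarray(chroma, dtype=float) - NEUTRAL) / SIGMA
    in_rg = d[..., 0] ** 2 + d[..., 1] ** 2 <= 1.0   # assumed (r, g) ellipse
    in_bg = d[..., 2] ** 2 + d[..., 1] ** 2 <= 1.0   # assumed (b, g) ellipse
    intensity = np.asarray(intensity, dtype=float)
    exposed = (intensity >= I_MIN * I_FULL) & (intensity <= I_MAX * I_FULL)
    return in_rg & in_bg & exposed

chroma = np.array([[0.363, 0.338, 0.299], [0.50, 0.30, 0.20]])
mask = select_gray_pixels(chroma, np.array([1.5, 1.5]))
```

The mean chromacities r W , g W , and b W used in the subsequent steps would then be computed over the pixels for which the mask is true.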
To match K W with the color temperature K of the light source, the overall color gain Γ K of the camera is estimated. A model is used to simulate how the camera adjusts the gains Γ R and Γ B of its red and blue channels when K W is updated.
Neither the lower and upper limits for Γ R and Γ B , nor the color temperature corresponding to equal gain values Γ R = Γ B = 1, are documented for currently available 3D DS cameras. It is assumed that Γ R = Γ B = 1 corresponds to the center color temperature K̄ W = (K W,min + K W,max )/2 between the minimum K W,min and maximum K W,max values of the color sensor. On startup, K W is initialized to K W,0 = K̄ W . For the recorded color images Ξ n+1 , the corresponding K W,n+1 = γK W,n is estimated from the previous value K W,n and a scale factor γ reflecting the relative change of K W between two consecutive Ξ n . The color sensor of the camera used has a rolling shutter. Therefore, color images are only considered for estimating the scale factor γ and K W after the next exposure time interval has elapsed.
Figure 2. The elliptic area in the RGB chromacity space corresponds to the pixels encoding shades of gray [23]. The red r and green g chromacity values of the natural illumination color gamut of 5500 K define a point that is shifted slightly off the mean RGB chromacity toward yellowish colors. Standard 3D DS cameras designed for indoor use such as the Intel Realsense typically allow adjusting the gains for the red and blue channels to illumination color gamuts between, for example, 2800 K and 6500 K, as indicated on the color gamut curve. However, in real clinical settings, gamuts from 2000 K up to 10,000 K can be expected, depending on the number of light sources and shades cast by objects and people.
The system is intended for use in examination rooms in outpatient clinics or local cardiology practitioner clinics as well as in cardiology operating theaters. In these locations, the illumination can vary widely. For example, on a bright day, the area next to the window is lit by sunlight, whereas the opposite corner is illuminated by fluorescent light or tungsten bulbs embedded in the ceiling or mounted on the walls. Shades cast by objects and people also affect the illumination observed by the color sensor of the 3D DS camera. The camera has to cope with these varying illuminations and automatically adapt to the predominant K at the patient's location, independent of its viewing direction and position within the room.
All currently available 3D DS cameras that additionally record RGB color images use auto white balance (AWB) programs that are likely based on the gray world [21] or retinex algorithms [23]. Simple implementations thereof assume that the scene is illuminated by a single light source producing an evenly distributed color hue to which the camera can easily adapt [21]. In multi-illuminant settings, the AWB of the color sensor can therefore be distracted by a strong light source located far from the patient. Consequently, the camera fails to properly capture the color of the patient's skin and the electrode markers.
The goal is to minimize the distance between the average red r W , green g W , and blue b W chromacities of the selected pixels and the r, g, b of the color gamut point that corresponds to a color temperature of K = 5500 K. To achieve this, r W , g W , and b W are multiplied by the unknown intensity İ = R W + G W + B W to obtain the corresponding mean red R W , green G W , and blue B W color values. These values are scaled by γ using (5). After scaling, the updated r W , g W , and b W are computed using (2).
The unknown intensity İ does not have any impact on the result and can therefore be omitted from (6) and γ. Consequently, Γ K can be computed from r, g, b, r W , g W , and b W directly.
In Figure 2, it can be observed that the curve along which the color gamut point moves can be approximated for color temperatures K ≤ 5000 K by the line connecting the red corner (r = 1, g = 0, b = 0) and the midpoint (r = 0, g = 0.5, b = 0.5) between the blue (r = 0, g = 0, b = 1) and green (r = 0, g = 1, b = 0) corners of the chromacity space. For color temperatures K > 5000 K, the curve can be approximated by the line connecting the blue corner (r = 0, g = 0, b = 1) with the midpoint (r = 0.5, g = 0.5, b = 0) between the red (r = 1, g = 0, b = 0) and green (r = 0, g = 1, b = 0) corners. The two midpoints (r = 0.5, g = 0.5, b = 0) and (r = 0, g = 0.5, b = 0.5) correspond to the yellow y = (r + g)/2 and cyan c = (g + b)/2 chromacities, respectively. Based on the ratio y/b, the average chromacity value g W γ of the green channel scaled by γ can be expressed. The resulting expression is inserted into the quadratic Equation (8) obtained from the ratio r/c, which is then solved with respect to γ.
Along with γ, the actual error E between the neutral illumination color gamut r, g, b and r W , g W , b W ; the expected error E * after scaling r W and b W by γ; and the updated value K + W = K W Γ K are computed using (5). Based on these values, the K W setting of the 3D DS camera is updated. During testing, it was found that numerical inaccuracies can prevent the computation of the appropriate estimates for the color temperature K of the predominant illuminant. Therefore, a numerically stable test is used instead to determine whether K W has to be updated or its current value can be kept.
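The update loop described above can be illustrated with a deliberately simplified stand-in model. It is not the gain model or the quadratic Equation (8) of the paper: the sketch assumes that the red channel is scaled by γ and the blue channel by 1/γ, chooses γ so that the ratio r/b matches the neutral point, and only updates K W if the expected error E * is smaller than the actual error E. The limits of 2800 K and 6500 K are the example values from the caption of Figure 2, and all function names and the tolerance are hypothetical.

```python
import numpy as np

NEUTRAL = np.array([0.363, 0.338, 0.299])  # r, g, b at K = 5500 K

def to_chroma(rgb):
    rgb = np.asarray(rgb, dtype=float)
    return rgb / rgb.sum()

def estimate_gamma(mean_chroma):
    """Scale factor that matches r/b to the neutral point (assumed model)."""
    r_w, _, b_w = mean_chroma
    return float(np.sqrt((NEUTRAL[0] / NEUTRAL[2]) * (b_w / r_w)))

def update_color_temperature(k_w, mean_chroma, k_min=2800.0, k_max=6500.0, tol=1e-3):
    """Return (new K_W, gamma, actual error E, expected error E*)."""
    mean_chroma = np.asarray(mean_chroma, dtype=float)
    gamma = estimate_gamma(mean_chroma)
    # The intensity cancels out, so the chromacities can be scaled directly.
    scaled = to_chroma(mean_chroma * np.array([gamma, 1.0, 1.0 / gamma]))
    err = float(np.linalg.norm(mean_chroma - NEUTRAL))
    err_expected = float(np.linalg.norm(scaled - NEUTRAL))
    if err - err_expected <= tol:       # no worthwhile improvement: keep K_W
        return k_w, gamma, err, err_expected
    return float(np.clip(k_w * gamma, k_min, k_max)), gamma, err, err_expected

# Reddish gray pixels suggest a warm illuminant, so K_W is lowered.
k_new, gamma, e, e_star = update_color_temperature(4650.0, [0.45, 0.33, 0.22])
```

Comparing E and E * before touching the camera setting mirrors the idea of keeping the current K W when an update would not bring the gray pixels closer to the neutral point.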

Patient-Locked Auto-Exposure
In addition to the overall color appearance, the light sources that are present also affect the overall light intensity I, which among others, can vary depending on the viewing direction of the 3D DS camera. For example, in the case shown in Figure 3a, the camera is pointing toward a window. In Figure 3b, the camera is pointing in the opposite direction toward the door.
In order to maintain a constant illumination intensity I of the patient's torso, independent of the viewing direction and the overall brightness of all present light sources, the histogram-based auto-exposure (AE) algorithm proposed in [24] was adopted.
This algorithm is implemented in the exposure component (Figure 1). It considers only the pixels in Ξ that correspond to the patient's torso. These pixels are selected by segmenting the depth image D recorded by the 3D DS camera into a foreground object (the patient) and the remaining background using the approach outlined in Section 2.2.3. The binary mask M D obtained in this segmentation step is mapped to the color image Ξ using the texture coordinates v uvs computed from the depth image D 16 by the camera control library. All brightness values I i of all pixels covered by the mapped mask M D are considered for adjusting τ. Any other pixels and pixels that are over- or underexposed according to Equation (4) are discarded.
The algorithm proposed by Chen and Li [24] uses the histogram of the gamma-compressed grayscale image computed from Ξ. In order to avoid the computational burden required by an explicit conversion between the linear illumination image I and this gamma-compressed image, the histogram H(V) is directly computed from the linearized illumination values I i of the selected pixels. This is accomplished by maintaining a lookup table that lists the linearized bin boundary values ĥ v corresponding to the uniform boundaries h v of the grayscale histogram H(V). The histogram H(V) can then be generated for all considered I i using a left bisection search to scan this lookup table, which is considerably less computationally demanding. A further reduction can be achieved by precomputing the required differences for each bin.
To compute the values of the exposure time τ and overall gain Γ ex to be set on the camera, an overall exposure parameter is determined. The parameter τ ∆ represents the size of one τ step in milliseconds, N τ = 5 represents the number of steps to take if S(V) = 1, and τ frame = 100 ms represents the optimal exposure time for each frame. The value of τ ∆ depends on the actual step size in ms offered by the 3D DS camera.
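The histogram computation with linearized bin boundaries can be sketched as follows. The sketch assumes the sRGB transfer function, intensities normalized to [0, 1], and bins that include their upper boundary; the number of bins and the function names are hypothetical. The left bisection search is provided by `numpy.searchsorted` with `side="left"`.

```python
import numpy as np

def srgb_to_linear(v):
    v = np.asarray(v, dtype=float)
    return np.where(v <= 0.04045, v / 12.92, ((v + 0.055) / 1.055) ** 2.4)

def linearized_boundaries(n_bins):
    """Upper bin boundaries: uniform in gamma space, stored in linear space."""
    uniform = np.linspace(0.0, 1.0, n_bins + 1)[1:]
    return srgb_to_linear(uniform)

def histogram_from_linear(intensity, upper_bounds):
    """Histogram of gamma-compressed brightness from linear intensities."""
    intensity = np.asarray(intensity, dtype=float).ravel()
    # Left bisection: index of the first boundary that is >= the value.
    idx = np.searchsorted(upper_bounds, intensity, side="left")
    idx = np.minimum(idx, len(upper_bounds) - 1)  # guard against rounding at 1.0
    return np.bincount(idx, minlength=len(upper_bounds))

bounds = linearized_boundaries(4)     # computed once and reused for every frame
hist = histogram_from_linear([0.01, 0.1, 0.3, 0.9, 0.6], bounds)
```

Only the few bin boundaries are pushed through the transfer function once, whereas each pixel costs a binary search over the boundary table instead of a power-function evaluation.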

Depth Segmentation
The binary mask M D is created from the 16-bit depth images D 16 recorded by the 3D DS camera. It splits the image into the patient and any surrounding objects, obstacles, and relevant edges. This implementation was inspired by the Canny edge detection algorithm proposed in [25]. The algorithm uses two thresholds to find the edges in an image Ξ based on the gradient ∆Ξ of its corresponding grayscale image. Pixels that have a gradient value ∆x i that exceeds the upper limit are considered to be part of an edge. Pixels with a value of ∆x i between the two limits are only included in an edge if they are adjacent to an already identified edge pixel. To improve the obtained set of edges and reduce the number of edges caused by noise, the grayscale image is smoothed using a Gaussian filter.
This approach was adopted for processing depth images D 16 that contain pixels for which no valid depth value is available (D i = 0). The computation of the depth value gradient ∆D i and of the corresponding Gaussian filter weights w i is computationally too demanding to be performed in real time. Therefore, the depth gradient values are only computed for candidate edge pixels. The smoothed depth image D is filtered using an octagonal Laplace kernel to find the initial set of edge pixels d e . An octagonal kernel has the advantage that all distances between the eight-connected neighbor pixels d 8 and the central pixel d c are of equal length.
All pixels d that exhibit a sign change between opposing neighbor pixels ∇d 8 on the Laplacian image ∇D are included in the initial set of edge points d e . Pixels d e that have at least one neighbor d k,8 = 0 with an undefined depth are considered primary edge pixels d P . The actual ∆D(d e ) values of the edge pixels are then computed. All d e where ∆D(d e ) > ∆D P are marked d P , whereas any other d e are only considered if the Canny rule for minor edge pixels d M , modified for use on depth images D, holds. The upper Canny limit ∆D P is set to 1.2 cm and the minor limit ∆D M is set to 0.35 cm.
As the depth values at the boundaries of L + 4 can vary widely, an additional step is used to remove any unrelated outliers. This step is based on the observation that the boundaries of the patient's torso are well-separated from the background along the vertical direction and above the head.
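The two-threshold rule with the limits ∆D P = 1.2 cm and ∆D M = 0.35 cm can be sketched as a classic Canny hysteresis step. The sketch does not reproduce the modified rule for depth images used by the authors; it assumes that a minor edge pixel is accepted if one of its eight neighbors is already an edge pixel, and the function name `hysteresis_edges` is hypothetical.

```python
import numpy as np

DELTA_P_CM = 1.2    # upper (primary) limit from the text
DELTA_M_CM = 0.35   # lower (minor) limit from the text

def hysteresis_edges(grad_cm, candidates):
    """Keep primary edge pixels and the minor ones connected to them.

    grad_cm: 2D array of depth gradient magnitudes in cm.
    candidates: 2D boolean array of initial edge pixels d_e.
    """
    strong = candidates & (grad_cm > DELTA_P_CM)
    weak = candidates & (grad_cm > DELTA_M_CM) & ~strong
    edges = strong.copy()
    h, w = edges.shape
    while True:
        padded = np.pad(edges, 1)               # pads with False
        near = np.zeros_like(edges)
        for dy in range(3):                     # OR over the 3 x 3 neighborhood
            for dx in range(3):
                near |= padded[dy:dy + h, dx:dx + w]
        grown = edges | (weak & near)
        if np.array_equal(grown, edges):
            return edges
        edges = grown

grad = np.array([[2.0, 0.5, 0.5, 0.1, 0.5]])
edges = hysteresis_edges(grad, np.ones_like(grad, dtype=bool))
```

In this example, the two minor pixels next to the primary pixel are accepted, whereas the last pixel is rejected because it is separated from the edge by a pixel below the minor limit.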
The values D r,close and D r,far correspond to the smallest and largest depth values encountered for the mask pixels M i within each row of L + 4 , and D̄ r,close , D̄ r,far , σ(D r,close ), and σ(D r,far ) represent their means and standard deviations. Any pixel M i for which the condition in (18) does not hold is removed from L + 4 . If either the number of pixels of L + 4 is less than 200 or no appropriate values for D far or D close could be found, the current L 4 is discarded and the search for a suitable L + 4 representing the patient is attempted with the next larger L 4 . If no suitable L 4 is left, segmentation is aborted and real-time processing continues with the next set of depth and color image frames recorded by the 3D DS camera.
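The two-threshold rule used for the depth edges can be illustrated with a minimal sketch. It implements only the classic Canny rule with the limits of 1.2 cm and 0.35 cm stated above; the modification of the rule for depth images is not reproduced. The function name `hysteresis_edges` and the inputs `grad_cm` (a precomputed gradient magnitude in cm) and `valid` (a mask of pixels with defined depth) are assumptions made for illustration.

```python
import numpy as np

def hysteresis_edges(grad_cm, valid, upper=1.2, lower=0.35):
    """Two-threshold (Canny-style) edge selection on a depth-gradient image.

    grad_cm : 2D array of depth-gradient magnitudes in cm (assumed precomputed)
    valid   : 2D boolean array, False where the depth value is undefined
    """
    strong = valid & (grad_cm > upper)           # primary edge pixels
    weak = valid & (grad_cm > lower) & ~strong   # candidates for minor edge pixels
    edges = strong.copy()
    h, w = edges.shape
    while True:
        # Grow the current edge set by one pixel in all 8 directions.
        padded = np.pad(edges, 1, mode="constant", constant_values=False)
        grown = np.zeros_like(edges)
        for dy in (0, 1, 2):
            for dx in (0, 1, 2):
                grown |= padded[dy:dy + h, dx:dx + w]
        # Weak pixels join an edge only if they touch an accepted edge pixel.
        new_edges = edges | (weak & grown)
        if np.array_equal(new_edges, edges):
            return edges
        edges = new_edges
```

Weak pixels that are not connected to any strong pixel, such as isolated noise, are never added to the edge set.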

Surface Mesh Generation
The final surface mesh is generated by converting the depth image D into a corresponding point cloud P. Therein, each point v i corresponds to a specific pixel d i in D. In the case of pixels d i = 0 without a defined depth value, the origin point v i = O = (0, 0, 0) is assigned. The unique correspondence between any d i and its corresponding v i allows creating S by mapping a pre-triangulated grid G to P. Any triangle T that includes at least one v i for which d i = 0 is dropped from G.
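The mapping from the depth image to a point cloud and a grid-based mesh can be sketched as follows. The sketch assumes a simple pinhole model with hypothetical intrinsics `fx`, `fy`, `cx`, and `cy`, and an arbitrary split of each 2 × 2 pixel block into two triangles; neither detail is specified in the text.

```python
import numpy as np

def depth_to_mesh(depth, fx, fy, cx, cy):
    """Back-project a depth image and triangulate it on the pixel grid."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float64)
    # Pixels without depth (d_i = 0) end up at the origin O = (0, 0, 0).
    points = np.stack([(u - cx) * z / fx, (v - cy) * z / fy, z], axis=-1)
    points = points.reshape(-1, 3)

    # Pre-triangulated grid G: two triangles per 2x2 pixel block.
    idx = np.arange(h * w).reshape(h, w)
    a, b = idx[:-1, :-1].ravel(), idx[:-1, 1:].ravel()
    c, d = idx[1:, :-1].ravel(), idx[1:, 1:].ravel()
    tris = np.concatenate([np.stack([a, c, b], axis=1),
                           np.stack([b, c, d], axis=1)])

    # Drop every triangle that touches a pixel with undefined depth.
    valid = depth.ravel() > 0
    return points, tris[valid[tris].all(axis=1)]
```

Because each vertex index equals its pixel index, the triangle list can be filtered directly with the validity mask of the depth image.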
Before S is stored on disk using the .obj format, along with Ξ uvs and the color temperature setting K W it was recorded with, degenerate triangles T A=0 and occluded triangles T −1 that do not correspond to a valid surface patch are removed. The filtering of T −1 is facilitated by the fact that 3D DS cameras, especially those that can capture objects located a short distance from the camera, use a dedicated RGB color sensor to record Ξ uvs . This sensor is typically attached to the left or right side of the depth sensor system and thus views the imaged object from a slightly different angle. This difference in viewing angle and FOV between the depth and the color sensor is sufficiently large to identify triangles that do not represent a part of the object's real surface: for such triangles, the surface normal n −1 flips its direction between the representation of T −1 in the depth image D and in Ξ. Such a flip is not plausible, as it would mean that the color sensor captures the back side of T −1 , whereas the depth sensor captures its front side. This cannot happen because both sensors are mounted on the same support. The following approach exploits this fact by identifying triangles where the sign, and thus the direction, of the surface normal vector appears flipped in Ξ compared to D.
The pre-triangulated grid G is initialized such that the normal vector n T of each triangle T on S points toward the camera, that is, against the viewing direction Z of the camera, so that ⟨n T , Z⟩ < 0. For every valid T of the initial surface mesh S, the normal vector n uvs of its representation T uvs in Ξ uvs must also point in the −Z uvs direction. Triangles T −1 for which the signs of n and n uvs are opposite, indicated by ⟨n uvs , Z uvs ⟩ ≥ 0, likely do not represent a valid part of S and are removed.
In addition, triangles T A=0 with a degenerate representation T uvs in Ξ uvs are removed. This includes triangles with an area A uvs < 0.25 pixels, triangles whose representation T uvs has a shortest edge of less than half a pixel, and triangles that extend beyond the top and bottom corners of Ξ uvs .
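The flip and degeneracy tests in texture space can be sketched with the signed area of each projected triangle. The sketch assumes that a front-facing triangle has a positive signed area in the pixel coordinates of Ξ uvs; the actual sign depends on the winding order of G and on the image axis convention, so `front_sign` is an assumed parameter. The function name and the `uv` array are hypothetical, and the test against the image borders is omitted.

```python
import numpy as np

def valid_in_texture(uv, tris, min_area=0.25, min_edge=0.5, front_sign=1.0):
    """Flag triangles that are neither flipped nor degenerate in the texture image.

    uv   : (N, 2) pixel coordinates of the mesh vertices in the texture image
    tris : (M, 3) vertex indices of the triangles
    """
    p0, p1, p2 = uv[tris[:, 0]], uv[tris[:, 1]], uv[tris[:, 2]]
    e1, e2 = p1 - p0, p2 - p0
    # The sign of the 2D cross product encodes the direction of the projected normal.
    signed_area = 0.5 * (e1[:, 0] * e2[:, 1] - e1[:, 1] * e2[:, 0])
    shortest = np.min(np.stack([np.linalg.norm(p1 - p0, axis=1),
                                np.linalg.norm(p2 - p1, axis=1),
                                np.linalg.norm(p0 - p2, axis=1)], axis=1), axis=1)
    # Flipped triangles have the wrong sign and fail the area test as well.
    return (front_sign * signed_area >= min_area) & (shortest >= min_edge)
```

A single comparison covers both the flipped normals and the area limit, because a flipped triangle has a negative signed area under the assumed convention.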
Further, skinny triangles T ϕ<13 are discarded if they enclose at least one angle ϕ between any two of their edges e a , e b , and e c that is smaller than 13 degrees, and if the lengths |e c | and |e b | of their two longest edges e c and e b conform to the following conditions: To compute the average length |e KNN | and standard deviation σ|e KNN |, only triangles T KNN are considered that are formed by any three K-nearest neighbors v KNN located within a radius of max(|e b | * 0.9, |e a |) around the tip vertex v ϕ<13 of T ϕ<13 and the midpoint of its shortest edge e a . Additionally, any T ϕ<13 that has to be discarded according to (21) will result in the deletion of all adjacent T ϕ<13 connected to its e b or e c . In the case of any T ϕ<13 satisfying (22), only the T ϕ<13 adjacent to e c is removed. Finally, duplicate vertices v i ≡ v j encoding the same point and vertices v not referenced by any triangle are removed from the surface S, along with all small disconnected surface patches S dis .
The surface S is stored on disk in .obj format, along with the corresponding texture information Ξ uvs . Its triangle normals n T and vertex normals n v are recomputed, and a transformation is applied to all vertices and normals. The latter ensures that the z-axis points in the direction of the patient's head and the positive x-axis extends from the left to the right side of the torso. The origin point is selected such that it is located on the central viewing axis of the camera. To compute its y-component, the point cloud is divided into 3 sections along the vertical direction, roughly representing the chest, belly, and hips of the patient from top to bottom. The points within the top third are further split into 5 subgroups from right to left along the x-axis. For the rightmost and leftmost groups, the median coordinates ŷ r and ŷ l are computed. Based on these values, the final y-coordinate of the origin point is computed as y = (ŷ r + ŷ l )/2.
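The computation of the y-component of the origin can be sketched as below. The sketch assumes that the origin is placed at the midpoint of the two median coordinates, that the sections are of equal extent rather than equal point count, and that the columns of `points` are ordered x, y, z; the function name `origin_y` is hypothetical.

```python
import numpy as np

def origin_y(points, n_vertical=3, n_lateral=5):
    """Estimate the y-coordinate of the origin from the two outer chest groups.

    points : (N, 3) array with x lateral, y along the viewing axis, z toward the head
    """
    z = points[:, 2]
    # Lower limit of the top section (chest) along the vertical z-axis.
    z_cut = z.min() + (z.max() - z.min()) * (n_vertical - 1) / n_vertical
    top = points[z >= z_cut]
    # Split the chest section into equally wide groups along the x-axis.
    x_edges = np.linspace(top[:, 0].min(), top[:, 0].max(), n_lateral + 1)
    first = top[top[:, 0] <= x_edges[1]]     # outermost group at minimum x
    last = top[top[:, 0] >= x_edges[-2]]     # outermost group at maximum x
    return 0.5 * (np.median(first[:, 1]) + np.median(last[:, 1]))
```

Using medians instead of means keeps the estimate stable when a few stray background points remain in the outer groups.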
This ensures that all surfaces are located close to each other and that they partially overlap. At the same time, the actual relative shift between the surfaces and the angle at which the camera views the surface is retained as much as possible. This is crucial for the registration process described in Section 2.3.2.

Offline Processing
The electrode positions are computed using a set of at least 14 recordings of the torso surface, covering a minimum angle of approximately 270 degrees in the horizontal plane. The necessary steps, depicted in Figure 1, are presented in the following subsections. These steps include the pairwise alignment and registration of the recorded surfaces S, as described in Section 2.3.2; the extraction of the points v representing the colored electrode markers, as described in Section 2.3.3; and the fitting of a model of the marker to identify its central point, as described in Section 2.3.4. In the final step, a unique label is attached to each position, which links the individual ECG signals to the 3D position of the corresponding electrode.

Color Correction
The color sensor of the Intel Realsense SR300 camera (Intel corporation, Santa Clara, CA, USA) offers only a limited range (between K W,min = 2500 and K W,max = 6500) within which the color temperature parameter K W can be tuned using the algorithm discussed in Section 2.2.1. This range is optimized for indoor use [21,22], where typical light sources include incandescent tungsten lamps (K = 2500), fluorescent lights (K = 3800), and standardized CIE sources such as CIE55 (K = 5000) or CIE65 (K = 6500).
The space limitations encountered in clinical settings, for example, outpatient and cardiology practitioner clinics, result in more challenging illumination conditions that can vary significantly depending on factors such as the patient's seating position or the camera's direction. Specifically, individual objects and parts of the room may be shaded by other objects, for example, the electrodes on the patient's back. Shaded areas are characterized by color temperature values K > 7000, which are significantly larger than the K W,max = 6500 upper limit assumed by the color sensor. Examples of this situation are shown in Figure 4a,c.
An additional color-correction process is applied to the recorded texture images Ξ and the 3D surfaces. A virtual camera is used to simulate the recording of Ξ with a different K W setting than the actual one. This virtual camera offers an AWB range between K W,min = 2000 and K W,max = 9000. It uses the model introduced in Section 2.2.1 to adjust the gain of its red Γ R = Γ K and blue Γ B = 1/Γ K color channels.
The virtual camera internally stores a linearized and normalized representation Ξ̂ = of Ξ uvs . This representation corresponds to an image recorded with an equal gain Γ K = 1 and K W = 5500.
Its white-balancing parameter K W is initialized to the color temperature K uvs at which Ξ̂ uvs was recorded by the color sensor of the 3D DS camera.
After initialization, the color-correction approach described in Section 2.2.1 is used to adjust the K W of the virtual camera until a suitable value for K + W is found. If K + W jitters around its ideal value for at least 20 repetitions, the color correction stops when the following condition is met: In this case, K + W is set to the mean value of the last 3 minimum updates for which the difference between consecutive K + W values is less than 10. With each update of K W , a new version of Ξ uvs is created by multiplying the red color values R̂ = of Ξ̂ = by the updated Γ + K , multiplying the blue values by 1/Γ + K , and performing a left bisection search on the lookup table V̂ = V established in Section 2.2.1. Pixels that are overexposed according to (4) are not modified. Pixels that appear overexposed after scaling, exceeding the maximum value of 1 in at least one channel, are assumed to be fully saturated in all three channels. Pixels that appear underexposed, with at least one channel having a value less than 10^−8 , are assumed to be unexposed in all channels. Accordingly, all three channels of such a pixel are set to 1 when it is fully saturated and to 0 when it is unexposed. Additionally, all channels are clipped to the maximum possible value of 1 if necessary. The color-optimized version of Ξ uvs (Figure 4b,d) is then used to extract the 3D points of the electrode markers, as described in Section 2.3.3.
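The per-update rescaling of the red and blue channels, including the handling of saturated and unexposed pixels, can be sketched as follows. The gain model of Section 2.2.1 and the lookup-table step are not reproduced; the gain Γ K is passed in as the assumed parameter `gain`, the image is assumed to be linearized and normalized to [0, 1], and the function name is hypothetical. When a pixel is both saturated and unexposed after scaling, this sketch treats it as saturated, which is an arbitrary choice.

```python
import numpy as np

def apply_virtual_white_balance(lin_rgb, gain, overexposed=None, eps=1e-8):
    """Rescale the red and blue channels of a linearized RGB image.

    lin_rgb     : (H, W, 3) linearized, normalized image with values in [0, 1]
    gain        : red-channel gain; the blue channel is scaled by 1 / gain
    overexposed : optional (H, W) boolean mask of pixels to leave unmodified
    """
    out = lin_rgb.astype(np.float64).copy()
    out[..., 0] *= gain
    out[..., 2] /= gain
    saturated = (out > 1.0).any(axis=-1)   # at least one channel above 1
    unexposed = (out < eps).any(axis=-1)   # at least one channel below eps
    out[unexposed] = 0.0                   # treat as unexposed in all channels
    out[saturated] = 1.0                   # treat as saturated in all channels
    out = np.clip(out, 0.0, 1.0)
    if overexposed is not None:
        out[overexposed] = lin_rgb[overexposed]
    return out
```

The green channel is left untouched, so the overall brightness changes only through the red and blue gains.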

Surface Registration
To align the surfaces, a point-to-plane algorithm was chosen. This kind of ICP algorithm minimizes the distances l = | v S − v T | between corresponding v S and v T along the direction of the surface normals n T of S T .
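A single update of a point-to-plane alignment can be sketched with the standard small-angle linearization. This is a generic textbook formulation for already matched point pairs, not the Open3D routine used for the registration; the function name and argument names are hypothetical.

```python
import numpy as np

def point_to_plane_step(src, tgt, tgt_normals):
    """One linearized point-to-plane update for matched point pairs.

    src, tgt    : (N, 3) corresponding source and target points
    tgt_normals : (N, 3) unit normals at the target points
    Returns a 4x4 matrix that moves src toward tgt along the target normals.
    """
    # Residual after a small motion: ((s + w x s + t) - q) . n
    # which is linear in the rotation vector w and the translation t.
    A = np.hstack([np.cross(src, tgt_normals), tgt_normals])   # (N, 6)
    b = np.einsum("ij,ij->i", tgt - src, tgt_normals)          # (N,)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    wx, wy, wz = x[:3]
    T = np.eye(4)
    # Small-angle rotation I + [w]_x; adequate only for small corrections.
    T[:3, :3] += np.array([[0.0, -wz, wy], [wz, 0.0, -wx], [-wy, wx, 0.0]])
    T[:3, 3] = x[3:]
    return T
```

In an iterative scheme, the correspondences are recomputed after every update, and the small-angle rotation is usually replaced by a proper rotation matrix before it is applied.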
A precise alignment between S T and S S across all surface pairs is achieved when Equation (25) is also minimal in the reverse case with S T and S S swapped. The following simple symmetric point-to-plane approach is used by the registration component (Figure 1) to align the surfaces. It was chosen in favor of other symmetric point-to-plane algorithms such as [26], as it can be directly implemented using the unidirectional ICP functions of the Open3D library [27]. In the first step, the forward transformation matrix ℜ f is computed for the set of corresponding points (v T , v S ) ∈ C f by applying (25). In the second step, the reverse transformation ℜ r is computed for the points (v S , v T ) ∈ C r corresponding to the reversed setup. The initial ℜ 0,r is set to the inverse ℜ f ^−1 of the forward transformation. The set (v T , v S ) ∈ C f is selected from the subset of v S located within the maximum correspondence distance l c of v T . The same selection criterion is used for the reverse set (v S , v T ) ∈ C r with respect to any v S . In the final step, the optimal transformation ℜ and the new correspondence distance l c+1 are selected from ℜ f , ℜ r , l c+1,f , and l c+1,r using the following criteria:
The surfaces S recorded using the approach described in Section 2.2 are aligned such that they roughly share the same space, apart from the small rotation ∆ϕ along the horizontal direction and the relative vertical movement ∆z between the cameras. No information about their orientation in space or how much each pair overlaps is recorded.
To obtain sufficiently precise positions of the electrodes, the optimal correspondence distance l o between any (v T , v S ) should be on the order of 1 mm. Therefore, the symmetric ICP registration is repeated for each pair in multiple runs. The results obtained for ℜ and l c+1 in the previous run are used to initialize ℜ 0 and l c in the next run. If the condition in (26) for updating ℜ and l c fails, one last run is attempted with l c = l min ≈ 1 mm if l c < l min and l c−1 − l c > σl 0 holds. For the first optimization run, ℜ 0 is initialized to roughly reflect the relative rotation about the z-axis between two recorded surfaces S T and S S and their relative shift ∆z along the z-axis. The following approach is used to estimate the relative rotation angle ∆ϕ between S T and S S : The right v̂ r,S , v̂ r,T and left v̂ l,S , v̂ l,T median points define the horizontal directions of the sagittal planes with respect to S S and S T . They are computed using the same approach described in Section 2.2.4 to define the final position of the origin along the y-coordinate.
Suitable estimates for l c,max , l c,min , and σl 0 are essential for achieving a sufficiently precise alignment of S T and S S . When testing the implementation of the symmetric ICP, it was empirically found that the values for l c,max , in particular, varied significantly depending on the relative distance and angle between two consecutive surfaces. Initially, constant values were assigned to l c,max and l c,min . However, these values resulted in an insufficient alignment between the surfaces on average. Specifically, the alignment of the surfaces at the left side where the front and back sides of the torso meet was rather challenging, and in some cases, not possible at all.
In order to improve the results and ensure a proper alignment between the surfaces, the following approach is used to determine suitable estimates of l c,max , l c,min , and σl 0 for each pair of S T and S S . These estimates are computed based on the distances between the vertices v T and v′ S within the volume V T∩S′ = V T ∩ V S′ , which represents the common region of the axis-aligned bounding boxes V T and V S′ encompassing the target surface S T and the transformed source surface S′ S . The latter is obtained by applying an initial transformation ℜ 0 to the source surface S S . The transformation ℜ 0 shifts all v S such that the center of mass v̂ S of the source vertices within the common volume aligns with the center of mass v̂ T of all v T ∈ V T∩S′ . The value for l max is obtained by applying (26) to the distances between the points in the forward correspondence set (v T , v′ S ) ∈ C f,0 and the backward correspondence set (v T , v′ S ) ∈ C r,0 . Both sets are found through a KNN search [28,29], which also considers the surface normals n T and n′ S in each v T and v′ S . This approach has the advantage of considering only v T and v′ S as corresponding when their surface normals n T and n′ S are closely aligned. From the resulting C f,0 and C r,0 , any v T and v′ S are removed if their surface normals n′ S and n T deviate by more than 30 degrees, that is, if ⟨n′ S | n T ⟩ < 0.98 holds.
The estimate for l min is based on the overall mean(|v T |, |v′ S |) of the shortest neighbor distances within all v T ∈ V T∩S′ and v′ S ∈ V T∩S′ .
From the final ℜ of all consecutive pairs of S T and S S , the global alignment of each S i is determined by the cumulative transformation ℜ = ∏ i j ℜ j , starting with the identity ℜ = I for the first surface S 1 . Alternatively, the transformation of the first surface can be initialized by the horizontal camera inclination angle ϕ about the z-axis using (27). From this, the relative angle between S 1 and the x-axis of the patient's frontal plane is computed. This already provides a rough alignment of the resulting point cloud of the torso with its frontal plane.
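The accumulation of the pairwise results into global transformations can be sketched as a running matrix product. The sketch assumes that each pairwise 4 × 4 matrix maps a surface into the frame of its predecessor, so that the products are formed by right multiplication; the actual multiplication order depends on the convention of the registration and is an assumption here, as are the function and argument names.

```python
import numpy as np

def global_transforms(pairwise, first=None):
    """Chain pairwise 4x4 transformations into one global transformation per surface.

    pairwise : list of 4x4 matrices; pairwise[j] is assumed to map surface j+1
               into the frame of surface j
    first    : optional 4x4 matrix for the first surface (identity by default)
    """
    current = np.eye(4) if first is None else np.asarray(first, dtype=float)
    result = [current]
    for T in pairwise:
        current = current @ T      # accumulate along the chain of surfaces
        result.append(current)
    return result
```

Passing a rotation about the z-axis as `first` corresponds to initializing the first surface with the camera inclination angle instead of the identity.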

Electrode Marker Extraction
In the current setup, the electrodes are attached to g.LADYbird TM active electrode clips from g.tec medical engineering GmbH, Schiedlberg, Austria. These clips have a circular head, with its center aligned with the center of the electrode. The clip itself is covered with red-colored epoxy to protect the integrated electronics from water and other liquids. The circumference of the head is painted blue to model a circular electrode marker with a blue boundary and a red central disk. Figure 5 shows an example of this basic setup.
The blue boundary color (see Figure 5b) is selected such that the electrode marker can easily be detected within the RGB chromaticity space representations χ̂ rg of the surface texture images Ξ uvs . The χ̂ rg values are obtained as a byproduct of the white-balancing and light color-temperature correction approaches described in Section 2.3.1.
Each χ̂ rg is scanned for red x r = (r r , g r , b r ) and blue x b = (r b , g b , b b ) pixels that lie within one of the following two ellipses in the RGB chromaticity space.
$$\frac{\left(\delta r_r \cos\varphi_r - \delta g_r \sin\varphi_r\right)^2}{\sigma(r_r)} + \frac{\left(\delta r_r \sin\varphi_r + \delta g_r \cos\varphi_r\right)^2}{\sigma(g_r)} \le 1 \qquad (28)$$
with $\delta r_r = r_r - \bar{r}_r$ and $\delta g_r = g_r - \bar{g}_r$. The ellipse (29) for the blue pixels is defined analogously using r̄ b , ḡ b , σ(r b ), σ(g b ), and φ b . The values r̄ r , ḡ r , r̄ b , ḡ b , φ r , and φ b define the red and green coordinates of the center point of each ellipse and the angle by which it is rotated with respect to the red axis of the RGB chromaticity space. Their values are determined through the calibration procedure described in Section 2.3.6. All matching x r and x b pixels are mapped to their corresponding 3D vertices v r and v b on the torso surface S. This mapping is accomplished by computing the barycentric coordinates of each x r and x b within the representation of the surface triangle T in Ξ uvs .
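The ellipse test in (28) can be sketched for whole arrays of chromaticity values. The denominators are used exactly as they appear in (28); whether they are meant as standard deviations or as their squares is an assumption that would have to be checked against the calibration. The function and argument names are hypothetical.

```python
import numpy as np

def in_chroma_ellipse(r, g, r_mean, g_mean, s_r, s_g, phi):
    """Test whether (r, g) chromaticity values lie inside a rotated ellipse.

    r, g           : arrays of red and green chromaticity values
    r_mean, g_mean : center of the ellipse
    s_r, s_g       : denominators of the two terms, taken as written in (28)
    phi            : rotation angle in radians
    """
    dr, dg = r - r_mean, g - g_mean
    a = dr * np.cos(phi) - dg * np.sin(phi)
    b = dr * np.sin(phi) + dg * np.cos(phi)
    return a ** 2 / s_r + b ** 2 / s_g <= 1.0
```

Applied to the r and g planes of a chromaticity image, the function returns a boolean mask of candidate marker pixels for one color.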

The resulting marker point cloud P M formed by all v r and v b is filtered to retain only those v r and v b that likely correspond to a valid electrode marker, as defined by the colors of the clip head. This is achieved by a radius-based KNN search for at least one neighbor of the opposite color. The search radius ρ is set to the radius of the clip head for all v r and to the width of the blue boundary ring for all v b . If the neighborhood of radius ρ does not contain any points of the opposite color, v is removed from P M .
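The opposite-color neighborhood filter can be sketched with a brute-force distance computation. The sketch is meant for small point sets only; for full marker point clouds, a spatial index such as a KD-tree would be the more practical choice. All function and argument names are hypothetical.

```python
import numpy as np

def has_opposite_neighbor(points, others, radius):
    """Return True for each point with at least one point of `others` within `radius`."""
    if len(points) == 0 or len(others) == 0:
        return np.zeros(len(points), dtype=bool)
    # Pairwise distances between the two colors, shape (len(points), len(others)).
    d = np.linalg.norm(points[:, None, :] - others[None, :, :], axis=-1)
    return (d <= radius).any(axis=1)

def filter_marker_points(red, blue, head_radius, ring_width):
    """Keep red points near a blue point and blue points near a red point."""
    keep_red = has_opposite_neighbor(red, blue, head_radius)
    keep_blue = has_opposite_neighbor(blue, red, ring_width)
    return red[keep_red], blue[keep_blue]
```

Both masks are computed from the unfiltered sets, so the result does not depend on the order in which the two colors are processed.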
The filtered P M is split into individual clusters of v el ∈ v r ∪ v b , representing the individual electrode clips. This is accomplished by applying the HDBSCAN algorithm [30]. Its results are more robust than those of the basic DBSCAN algorithm [31], especially in the presence of groups of outliers, for example, those generated by a bluish shadow cast on the cables and electrode clips. In addition, a minimum distance split can be defined below which clusters are not split any further. In contrast to the basic DBSCAN algorithm [31], split defines a lower boundary limit rather than a strict cutting distance. In other words, less dense clusters with an average density exceeding split are not necessarily forced to split into distinct leaf clusters. The parameters of the minimum cluster size N C,min and the minimum samples X C,min = 20 are used to fine-tune and control the extraction of clusters that represent the individual electrode markers, considering the actual number of electrodes N el .
In order to simplify the subsequent processing steps, the overall point cloud P S , as well as P M , is realigned such that the frontal plane of the torso is in line with the x-z plane of the coordinate system. This is achieved by once again splitting P S into chest, belly, and hip sections. The points of the chest section are further split along the x-axis into three parts, representing the right shoulder, neck, and left shoulder. The final transformation is computed by aligning the vector between the median points of the left and right shoulders to the x-axis of the frontal plane.

Fitting Marker Model
The red points v r and blue points v b within each cluster are fitted to a planar marker model consisting of a red disk enclosed within a blue ring. Before fitting, all v r and v b are projected onto the plane Q cl , which is parallel to all v el .
This ensures that all v el are located on Q cl , which is defined by the predominant surface normal vector direction n mj among all surface normal vectors n el of the v el and by their center of mass v̂ el .
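The projection of a cluster onto its plane can be sketched as follows. The mean of the vertex normals is used here as a simple stand-in for the predominant normal direction n mj, which is an assumption; the function name is hypothetical.

```python
import numpy as np

def project_to_plane(points, normals):
    """Project cluster points onto the plane through their center of mass.

    points  : (N, 3) cluster points
    normals : (N, 3) surface normals at these points
    """
    n = normals.mean(axis=0)            # stand-in for the predominant normal
    n = n / np.linalg.norm(n)
    c = points.mean(axis=0)             # center of mass of the cluster
    offsets = (points - c) @ n          # signed distances to the plane
    return points - np.outer(offsets, n), c, n
```

The returned center and normal define the plane and can be stored together with the fitted marker center.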
The shifted v el are then fitted to the following model, which is based on the distances ρ(v el ) between the individual v el and the electrode center X on Q el . In (34), δ ρ represents the relative distances of the blue points v b from the boundary of the enclosed red disk with a radius ρ disc . From all the red points v r within a cluster, the model selects those that are within a radius ρ r < ρ disc from the current X. The model in (34) is optimized with respect to X using the L-BFGS-B algorithm provided by the SciPy minimize function. This numerically robust algorithm was selected because it can achieve satisfactory optimization results for least-squares optimization problems. Its implementation details can be found in the SciPy manual and [32,33]. For all clusters for which a minimum with respect to X could be found, X min is stored, along with n mj . Any remaining clusters for which no appropriate X min could be found are not further considered.
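The use of L-BFGS-B for locating the marker center can be illustrated with a simplified stand-in cost. The cost below is not the model in (34): it only penalizes blue points that deviate from an assumed ring radius `rho_ring` and red points that fall outside the disk radius `rho_disc`. Both radii, the 2D in-plane coordinates, and the function name are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def fit_marker_center(red_2d, blue_2d, rho_disc, rho_ring):
    """Estimate the marker center from in-plane red and blue points.

    red_2d, blue_2d : (N, 2) and (M, 2) point coordinates on the cluster plane
    rho_disc        : radius of the red disk
    rho_ring        : assumed mean radius of the blue ring
    """
    def cost(x):
        d_red = np.linalg.norm(red_2d - x, axis=1)
        d_blue = np.linalg.norm(blue_2d - x, axis=1)
        # Red points are only penalized once they leave the disk.
        red_term = np.mean(np.maximum(d_red - rho_disc, 0.0) ** 2) if len(red_2d) else 0.0
        # Blue points should lie on the ring around the center.
        blue_term = np.mean((d_blue - rho_ring) ** 2) if len(blue_2d) else 0.0
        return red_term + blue_term

    x0 = np.vstack([red_2d, blue_2d]).mean(axis=0)   # start at the cluster centroid
    res = minimize(cost, x0, method="L-BFGS-B")
    return res.x, res.fun, res.success
```

Starting from the cluster centroid keeps the search close to the marker even when part of the ring is hidden, for example by an electrode lead.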
In some cases, a clip may be split into two smaller clusters. For example, if an electrode array is carelessly attached to the torso, electrode leads can shadow the relevant parts of the clip head. This might be the case when the following condition holds with respect to the counts of v r or v b : Two neighboring clusters are considered pieces of the same marker only if at least 10 closest neighbors of any v el in the first cluster are closest to at least 85 distinct v el in the other cluster. The cylindrical model is fitted to the largest piece of the marker only. This prevents nearby image artifacts in Ξ uvs from causing misalignment of the affected electrode marker and displacing the center point from its true location.
The identified cluster centers X are triangulated using the ball-pivoting method [34,35] implemented in the Open3D library. The radii ρ 1 = x/√2 and ρ 2 = x for two distinct balls are derived from the average distance x = mean(|X 9 − X|) between each X and its 9 closest neighbors. Outliers are removed if |X 9 − X| > x + 2 * σ(|X 9 − X|) before computing x. As a final check to determine whether the X of neighboring clusters represent two pieces of the same marker, the surface connectivity between individual X is computed. Where two X satisfy |X 1 − X 2 | < 2/3 ρ 0 , the marker attached to the largest group is retained, whereas the other is removed. Ball-pivoting triangulation and the removal of small clip pieces are repeated until no more nearby groups, represented by distinct X, are found. The remaining X that are included in the resulting triangular surfaces represent the frontal and dorsal patches of the electrode grid layout proposed in [36]. Clusters that are too far away to be included in the mesh by the ball-pivoting process are considered single electrodes, similar to those used, for example, in Einthoven I, II, and III.
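The derivation of the two ball radii can be sketched as below. The sketch reads x as the mean distance from each center to its 9 nearest neighbors and removes outliers in a single pass using the mean and standard deviation of all these distances; both readings are assumptions, as is the function name.

```python
import numpy as np

def ball_radii(centers, k=9):
    """Derive two ball-pivoting radii from nearest-neighbor distances.

    centers : (N, 3) cluster centers
    k       : number of nearest neighbors considered per center
    """
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    d.sort(axis=1)
    knn = d[:, 1:k + 1].ravel()                    # skip the zero self-distance
    keep = knn <= knn.mean() + 2.0 * knn.std()     # drop unusually large distances
    x = knn[keep].mean()
    return x / np.sqrt(2.0), x
```

The two returned values could then be passed as the list of radii to a ball-pivoting routine.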
In the final step, the triangular meshes of the frontal and dorsal electrode patches are normalized. In this process, any vertical edge that intersects the horizontal line between two common neighbors of its endpoints is swapped with the edge that connects the common neighbors.

Label Assignment
Starting from the point with the smallest y-coordinate, the triangulation of the frontal patch is scanned line by line. All electrodes that can be connected along consecutive horizontal edges are joined into one row of the frontal patch [36] and stored in right-to-left order. The rows are ordered from bottom to top. After all rows of the frontal patch have been collected, the same approach is applied to collect the electrodes of the dorsal patch. Again, the electrodes are stored in right-to-left and bottom-to-top order.
On the frontal patch, the number labels for each channel are assigned in ascending order from bottom right to top left. The dorsal assignment starts at the top right and ends at the bottom left. The remaining electrode points X that have not been included within the triangulation of the frontal and dorsal patches correspond either to the three Einthoven leads I, II, and III, if they are located on the arms close to the front of the left and right shoulders and on the left hip, or to the two additional electrodes of the array that are placed frontally and dorsally close to the right side of the torso.

Calibration
The proposed method to identify the colored electrode markers requires proper calibration of the mean values r̄ r , ḡ r , r̄ b , and ḡ b ; the standard deviations σ(r r ), σ(g r ), σ(r b ), and σ(g b ); and the rotation angles φ r and φ b of the ellipses in Equations (28) and (29). In the first step, the color-corrected chromaticity representation χ̂ rg of the texture images Ξ uvs (obtained as a byproduct in Section 2.3.1) is roughly segmented. For this rough segmentation, the parameters describing the blue and red pixels of the clips are initialized with the following values: r̄ r = 0.75, σ(r r ) = 0.1, ḡ r = 0.08, σ(g r ) = 0.06, r̄ b = 0.05, σ(r b ) = 0.02, ḡ b = 0.13, σ(g b ) = 0.06, and φ r = φ b = 0.
These values were empirically identified from the chromaticity space triangle of the 3D DS camera's color sensor, generated from the pixels of all χ̂ rg . The resulting raw pixel masks M χ,raw are stored along with the corresponding χ̂ rg obtained from the data sets of at least three patients. In addition, a binary mask M I selects pixels of Ξ uvs that are properly exposed according to (4). The χ̂ rg are stored on disk in the 16-bit PNG format. They are loaded along with the corresponding M χ,raw in an image processing program such as Gimp TM or Adobe Photoshop TM for manual segmentation of the clips.
The resulting M χ , created by manually removing any pixel that does not represent a clip or electrode marker from M χ,raw , is used in combination with M I to extract the pixels that are part of the electrode clips and markers visible on each 16-bit χ̂ rg image. Any pixel that does not correspond to a clip, is over- or underexposed, or meets the condition in (3) is not further considered in the following calibration steps. From all other pixel values, a 2D heat map N H with 256 bins each for the red r and green g chromaticity values is generated and median-filtered using a 7 by 7 neighborhood.
The red and blue color shades of the electrode markers appear as distinct, Gaussian-shaped peaks P H on N H . Peaks (1) and (2) are clearly visible as bright spots on the heat map, as shown in Figure 6. A Gaussian mixture model [37,38] is used to extract the individual clusters C H that represent each peak. Each peak is described as a 2D Gaussian distribution, which can be characterized by its center point or centroid and the standard deviations along each direction with respect to this center. By fitting the individual Gaussian models to the heat map N H , the actual position, orientation, and area covered by each peak can be found. To compute the initial positions of the cluster centroids, the heat map is binarized and labeled. In this process, any 4-connected set of at least 5 bins is considered a peak if all bin counts n H conform to condition (36).
Figure 6. Chromaticity heat map of the pixels representing the electrode markers created from the texture images of three patients. The brighter the color, the higher the pixel count for the corresponding point in the RGB chromaticity space, represented by its red r and green g chromaticity values. For better readability, the RGB chromaticity values are displayed in RGB gamma-compressed form. The Gaussian peaks (dash-dotted ellipses) representing the red (1) and blue (2) pixels of the electrode markers are clearly visible. They can easily be distinguished from the peak (3) representing the color highlights and reflections. Peak (4) is caused by inappropriately chosen values for the parameters required to convert raw color sensor data to the RGB color space.
The cluster C H,r with the highest mean r r red component is used to compute σ(r r ), σ(g r ), and φ r . The values for σ(r b ), σ(g b ), and φ b are derived from C H,b for which 1 − r b − g b = max holds.
σ 1,r , σ 2,r , σ 1,b , and σ 2,b represent the first and second eigenvalues Σ of the covariance matrices cov(C H,r ) and cov(C H,b ) of C H,r and C H,b , and u 1,r and u 1,b are the corresponding first eigenvectors. These values are stored on disk, along with the centroids of C H,r and C H,b , which define the mean values r̄ r , ḡ r , r̄ b , and ḡ b to be used in the extraction step described in Section 2.3.3. The remaining clusters 3 and 4 are not further considered as they correspond to the color highlights on the clips (3) or are caused by inappropriately chosen parameters affecting the conversion of the raw sensor signals to the RGB color space (4).
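The eigen-decomposition of a cluster's covariance matrix can be sketched as follows. How the eigenvalues and the first eigenvector are converted into σ(r), σ(g), and φ is not reproduced here; taking the rotation angle as the direction of the first eigenvector relative to the red axis is an assumption, and the function name is hypothetical. The sign of an eigenvector is arbitrary, so the returned angle is only defined up to 180 degrees.

```python
import numpy as np

def ellipse_from_cluster(rg):
    """Derive ellipse parameters from the (r, g) chromaticity samples of one cluster.

    rg : (N, 2) array of red and green chromaticity values
    Returns the centroid, the first and second eigenvalue, and the axis angle.
    """
    center = rg.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(rg, rowvar=False))
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    u1 = eigvecs[:, 0]                         # first eigenvector (major axis)
    phi = np.arctan2(u1[1], u1[0])             # angle relative to the red axis
    return center, eigvals[0], eigvals[1], phi
```

For a cluster elongated along the red axis, the first eigenvalue describes the spread in r and the angle is a multiple of 180 degrees.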

Recording Protocol
The technical approach outlined in Sections 2.2 and 2.3 requires that the patient maintains the same posture throughout the recording. This is only possible if the patient is directly engaged and actively participating in the measurement.
Therefore, prior to the application of the electrodes, the patient is instructed to sit down on a chair. The height of the chair is then adjusted so the patient can comfortably sit upright throughout the recording process. The feet of the patient should rest flat on the floor and the knees should be bent by no more than 90 degrees. If the chair cannot be adjusted in height, an alternative solution is to stack multiple chairs to increase the patient's comfort and encourage them to straighten their back. To ensure optimal recordings without any obstacles, the chair should not have armrests or a backrest and be placed at least 1 meter from any furniture or other objects that can cause shadows. This ensures that the FOV of the 3D DS camera can be optimally used and the operator is able to capture a surface at least every 20 degrees.
After the electrodes have been attached to the torso, the patient is instructed to place the hands on the thighs. The fingers should point inward and the thumbs should point straight toward the hips. The optimal position of the hands is a thumb length in front of the hips. While the electrode positions are recorded, the patient is instructed to maintain a straight and upright back. Most patients are able to easily maintain this position by slightly straightening their elbows (about 120 degrees between the upper and lower arm). This helps them to move their chest and shoulders into a position that is as upright as possible. As a result, the patient is held in an isometric posture, which can easily be maintained while the electrode positions are recorded. In addition, this position facilitates the recording of electrodes placed under the left axilla, for example, the Wilson electrodes V 5 and V 6 .

Results
In the following section, the results are presented. The narrow vertical field of view of the color sensor is one of the main reasons why the 3D images of the torso are recorded in portrait mode. In a typical clinical setting, where space is limited, it is likely that the patient is seated close to furniture or walls. For proper recording of the 3D images, a space of at least 2.5 m by 2.5 m is required. This accommodates a standard chair without armrests or a backrest, with a diameter of 50 cm, and leaves at least 1 m of space on all four sides of the patient for the operator to move around while recording the images. The remaining space between the patient, the operator, and any surrounding furniture or walls should be 50 cm or less. Both sensors of the camera must be able to properly capture the dorsal part of the patient's torso at distances between 20 cm and 50 cm. This can only be achieved by cameras with FOV angles conforming to (1), such as the Intel RealSense cameras, which have wide viewing angles of ≈70 degrees for both the depth and color sensors when used in portrait recording mode. This is especially important for capturing the dorsal views of the torso.
The color sensor has a viewing ratio of 16:9 between the horizontal and vertical FOVs. This results in a vertical viewing angle of about 40 degrees, which is considerably smaller than the ≈60 degrees of the depth sensor. This can lead to a situation where, for example, ≈60 columns at the top and bottom of the depth image lack texture information. However, this is acceptable given that consecutive 3D images are recorded in portrait mode with an overlap of about two-thirds, ensuring that the texture images sufficiently overlap.
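The relation between viewing angle, recording distance, and the part of the depth image that lacks texture can be illustrated with a small calculation. The following is a minimal sketch that assumes an ideal pinhole model in which both sensors share the same optical center and axis; the function names are hypothetical and are not taken from the prototype.

```python
import math


def covered_extent(fov_deg: float, distance_m: float) -> float:
    """Length in metres seen along one image axis at the given distance
    (ideal pinhole model, an assumption of this sketch)."""
    return 2.0 * distance_m * math.tan(math.radians(fov_deg) / 2.0)


def untextured_fraction(depth_fov_deg: float, color_fov_deg: float) -> float:
    """Share of the depth image along one axis that the color image does not
    cover, assuming both sensors share one optical center and axis."""
    depth_half = math.tan(math.radians(depth_fov_deg) / 2.0)
    color_half = math.tan(math.radians(color_fov_deg) / 2.0)
    return max(0.0, 1.0 - color_half / depth_half)


# Recording distances and viewing angles quoted in the text.
for distance in (0.2, 0.5):
    wide = covered_extent(70.0, distance)
    narrow = covered_extent(40.0, distance)
    print(f"{distance:.1f} m: 70 deg covers {wide:.2f} m, 40 deg covers {narrow:.2f} m")

share = untextured_fraction(60.0, 40.0)
print(f"share of the depth axis without texture: {share:.0%}")
```

Under these assumptions, the untextured share is split evenly between the two edges of the depth image; a real camera with a baseline offset between the sensors will deviate from this.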
Thanks to the vertical nature of the patient's torso, in portrait mode, it is easy to keep the patient centered in the image while moving the camera to the next recording position. As the patient's torso covers most of the image space, only very few objects and obstacles located behind the patient are captured by the cameras, which can easily be removed before storing the 3D surface images.
Scanning always starts with the right frontal view of the torso and ends at the right dorsal side. If possible, the right lateral side of the torso can be recorded. This is not essential for extracting the electrode positions and can be omitted in standard recording procedures. It is recommended to explicitly record the right lateral torso surface when there is sufficient space to the right of the patient.
The preview image of the torso, shown in the main area (1) of the user interface shown in Figure 7, is split into a 3-by-3 grid. The center segment of this grid is used as the focus area, representing the central part of the patient's torso. The contours of the largest object containing the focus segment are highlighted in orange. As the camera points at the patient's torso, the contours highlight the boundaries of the patient's torso. The recording of a torso surface segment is initiated by pressing the trigger of the camera. The color of the contour line switches to green and the live preview freezes to indicate that the captured depth and color image have been processed and the 3D surface has been generated and stored. Once the underlying point cloud has been triangulated, occluded and degenerated triangles, as well as detached surface patches, are removed. Then, the contour is updated to mark the parts that will be stored on disk. After the 3D surface information, corresponding texture image, and meta information have been stored, the live preview is started again and the color of the contour reverts to orange. The live preview is updated at a maximum rate of 10 FPS. With the Python-based prototype, update rates between ≈4 FPS and ≈7 FPS can be realistically achieved.
The main preview area (panel 1 in Figure 7) has the same shape as the depth image. For the parts on the left and right sides that are not captured by the RGB image, the edges identified on the depth image are displayed instead. The outline of the patient's torso does not extend beyond the edges of the RGB image. In panel 2 of the preview screen (Figure 7), several recording and camera parameters, such as the frame rate in FPS, exposure time τ in ms, etc., are shown, along with the intermediate parameters computed for automatic exposure control and color correction. In panel 3, the full set of edges identified on the current depth image is displayed. The two vertical lines delineate the area of the depth image that is covered by the color image.
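The selection of the torso outline via the center segment of the 3-by-3 grid can be sketched as follows. This is an illustrative simplification, not the prototype's implementation: objects are represented by axis-aligned bounding boxes `(x0, y0, x1, y1)` rather than contours, and all function names are hypothetical.

```python
def focus_cell(width, height):
    """Return (x0, y0, x1, y1) of the center cell of a 3-by-3 grid."""
    return (width // 3, height // 3, 2 * width // 3, 2 * height // 3)


def contains(box, cell):
    """True if the bounding box fully encloses the focus cell."""
    bx0, by0, bx1, by1 = box
    cx0, cy0, cx1, cy1 = cell
    return bx0 <= cx0 and by0 <= cy0 and bx1 >= cx1 and by1 >= cy1


def area(box):
    """Area of an axis-aligned box in pixels."""
    x0, y0, x1, y1 = box
    return max(0, x1 - x0) * max(0, y1 - y0)


def pick_torso(boxes, width, height):
    """Pick the largest box that contains the focus cell, or None."""
    cell = focus_cell(width, height)
    candidates = [box for box in boxes if contains(box, cell)]
    return max(candidates, key=area, default=None)


# Hypothetical detections on a 600 x 900 portrait preview.
detections = [(0, 0, 150, 150), (150, 250, 450, 650), (100, 200, 500, 800)]
print(pick_torso(detections, 600, 900))
```

Objects that do not enclose the focus cell, such as furniture at the image border, are ignored in this sketch, which mirrors the idea that only the object the camera is pointed at is outlined.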
The prototype for real-time recording of the 3D torso surface patches, as well as for postprocessing and calibration, was implemented in Python version 3 using a recent version of NumPy and SciPy [39]. The librealsense version 2 library [40] was used to control the acquisition, convert the depth values into a point cloud, and compute the corresponding texture UV map for the RGB image. The OpenCV library [41] was used to generate the preview display, and the generation and cleaning of the 3D meshes were accomplished using the Open3D library [27]. The most computationally demanding components, the depth-edge detection (Section 2.2.3), automatic white balancing (Section 2.2.1), and patient-locked auto-exposure control (Section 2.2.2), were converted into Python-C modules using Cython [42].
In total, five male subjects between 38 and 70 years of age participated in the present study. Each subject was seated on a chair or examination bed, depending on the available space. After applying the ECG electrodes to the chest and back, the subjects were instructed to maintain the posture described in Section 2.4. The measurement of the torso surface and the recording of a 30-min-long ECG with 67 channels took about 30 min to 45 min. After each measurement, the data were analyzed and the prototype improved accordingly.
The data set recorded from the first subject turned out to be quite limited and, therefore, is not included in the presented results, as it was affected by the automatic white balancing and exposure control of the color sensor, which could not cope well with the diverse and complex lighting conditions. Further, the 3D points recorded by the depth sensor were directly transformed to match the color image captured by the color sensor. This posed several challenges related to occluded surface parts causing undesirable distortions and the introduction of noncausal surfaces. Starting with the data for the second subject, the direct mapping was replaced with the texture mapping approach, which yielded better results and allowed for the implementation of the algorithms for occlusion management and the removal of noncausal triangles, as described in Section 2.2.4.
For each patient, 12 to 15 views were recorded. Each of the views contained a 3D surface described by ≈170,000 vertices and ≈300,000 triangles. As shown in Table 1, between 7 and 21 iterations of the symmetric ICP algorithm were necessary to align the surfaces. The maximum correspondence distance between the points of the surface pairs was reduced in every iteration step, starting from 7 cm-12 cm and reaching 0.7 mm-1.2 mm. More iterations were necessary to align the surfaces joining the frontal and dorsal views on the left side of the torso. In cases where the available space around the subject was insufficient, the number of iterations required to align the surfaces was increased. In the most challenging scenario, the proper alignment of the surfaces was not possible at all. This situation was encountered in the data set recorded from subject 5, where part of the torso surface on the left side was obscured by the backrest of the chair. Among other challenges, this required an increased number of 21 iterations to align the leftmost frontal and dorsal views.
Table 1. The symmetric ICP alignment metrics obtained from the data sets of four out of five subjects participating in the study. For an average of 14 angular views, the computation of the pairwise transformation matrix was repeated between 7 and 21 times, with an average repetition rate of 9.7 repeats per surface pair. The average initial distance between corresponding points was 7 cm, the average root mean square error was 0.7 mm, and the average final correspondence distance was 1 mm.
Across all subjects, a final root mean square error between consecutive surfaces of 0.7 mm was achieved. Using the proposed approach, 12 to 15 surfaces per patient were registered within 13 min. As shown in Table 2, the extraction of the electrode marker points and the computation and labeling of the electrode positions were completed after another ≈8 min.
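The shrinking maximum correspondence distance reported for the symmetric ICP alignment can be illustrated with a simple schedule. The text only states the start and end values and that the distance is reduced in every iteration; the geometric decay used below is an assumption of this sketch, and the function name is hypothetical.

```python
def correspondence_schedule(start_m, end_m, iterations):
    """Maximum correspondence distance per ICP iteration, in metres.

    Assumes a geometric decay from start_m to end_m; the reduction rule
    actually used by the prototype is not specified in the text.
    """
    if iterations < 2:
        raise ValueError("at least two iterations are required")
    if not 0.0 < end_m < start_m:
        raise ValueError("expected 0 < end_m < start_m")
    ratio = (end_m / start_m) ** (1.0 / (iterations - 1))
    return [start_m * ratio ** k for k in range(iterations)]


# 7 cm down to 0.7 mm over 7 iterations, the smallest case reported.
for step, limit in enumerate(correspondence_schedule(0.07, 0.0007, 7)):
    print(f"iteration {step}: {limit * 1000:.2f} mm")
```

Each value would serve as the distance threshold for one registration pass, with the transformation found in one pass used as the starting point for the next.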
The recording sessions were part of a larger clinical pilot study investigating the prognostic value of index arrhythmias with respect to the outcome of pulmonary vein ablation, for which the participants provided informed consent. Apart from the 3D camera and ECG recordings, this study was based on clinical data recorded during the patients' clinical treatment. Therefore, neither CT recordings nor other independent means of recording the electrode positions relative to the torso were included. To assess the accuracy of electrode localization, the electrode positions were instead backprojected onto the individual views of the torso and marked on the corresponding color images. Examples are shown in Figures 3, 4b,c and 8.

The annotated RGB images were presented to an expert who used the cross-hair tool shown in Figure 8b to manually adjust the position of each marker. To facilitate this task, two markers were used: the green marker indicates the backprojected position and the red marker corresponds to the manually adjusted position. All positions were checked during this process and, if necessary, moved to better reflect the perceived center positions on each view. When finished, all positions were reprojected into 3D space.
For the set of corrected positions of each electrode, the mean point, as well as the mean distance and standard deviation to this mean point, were computed. The resulting values are shown in Table 3, along with the mean and standard deviations of the computed electrode positions with respect to the manually determined mean. Both sets of results were influenced by the accuracy of the registration process and the fact that no unique solution exists for the backprojection of the electrode positions onto the individual views. In addition, the mean and standard deviation of the registration errors and of the distances between the individual projections and their mean point are listed.
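The per-electrode statistics described above can be sketched in a few lines. This is a minimal illustration, not the evaluation code used in the study: positions are assumed to be `(x, y, z)` tuples in millimetres, the population standard deviation is an assumption, and the function names and sample values are hypothetical.

```python
import math
from statistics import mean, pstdev


def mean_point(points):
    """Component-wise mean of a list of (x, y, z) positions."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def deviation_stats(points, reference=None):
    """Mean and standard deviation of the Euclidean distances between the
    points and a reference (defaults to the mean point of the points)."""
    ref = mean_point(points) if reference is None else reference
    distances = [math.dist(p, ref) for p in points]
    return mean(distances), pstdev(distances)


# Hypothetical manually corrected positions of one electrode on three views.
corrected = [(10.0, 20.0, 30.0), (11.5, 19.0, 30.5), (9.0, 21.0, 29.0)]
avg, spread = deviation_stats(corrected)
print(f"manual marks vs. their mean point: {avg:.2f} mm ± {spread:.2f} mm")

# Deviation of the marks from a hypothetical computed electrode position.
computed = (10.5, 20.0, 30.0)
avg_c, spread_c = deviation_stats(corrected, reference=computed)
print(f"manual marks vs. computed position: {avg_c:.2f} mm ± {spread_c:.2f} mm")
```

Passing a reference point other than the mean yields the deviation with respect to a computed electrode position, corresponding to the second kind of value reported per subject.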
Table 3. The electrode positions computed using the proposed approach and defined by manually marking the clip center on each view deviated from each other, on average, by ≈1.9 mm ± 1.5 mm. In addition, the distance between the computed electrode position and the mean point obtained by the projection of each electrode onto each individual view, as well as the average position found by manually marking the center of the clip on each view, are provided. Along with the variation resulting from the ICP-based surface alignment, both values allow for the assessment of how well the proposed approach can approximate the true positions of the electrodes.

The corrected electrode positions deviated, on average, by 2.3 mm ± 1.4 mm from the mean point, and the computed electrode positions deviated from the mean point by 2.0 mm ± 1.5 mm. This is in accordance with the limitations posed by the backprojection, where the reprojected points deviated from the computed position by 0.9 mm ± 1.4 mm, and the ICP registration resulted in an average deviation between corresponding points of 0.6 mm ± 0.2 mm. Given the amount of data to be processed per subject, the overall time of 22 min. required to extract and align the electrode positions is notable, considering that only the computations of the symmetric ICP and the HDBSCAN algorithms are implemented as part of the native Open3D library and as Cython scripts, respectively. The rest of the implementation was carried out in Python using NumPy arrays only. In contrast, the expert required between 30 min. and 45 min. to point and place the electrode markers on the 14 views of a single data set.

Discussion
The results are promising given the fact that the torso is a far less rigid structure compared to the skull. Further, the limited space conditions and adverse environmental conditions typically found in clinical settings, e.g., outpatient and local practitioner clinics, are quite challenging. This is evident in the results shown in Table 3 for subjects 4 and 5. In both cases, nearby obstacles such as backrests or furniture limited access to the patient's left side, resulting in increased positional variations of 2.2 mm ± 1.5 mm and 2.4 mm ± 1.8 mm in relation to the mean of the manually defined electrode positions. This is compared to 1.7 mm ± 1.4 mm and 1.6 mm ± 1.5 mm for subjects 2 and 3, respectively.
These values are still in the range reported for recently proposed approaches for localizing electrodes mounted on the human body. As shown in Table 4, few studies exist that evaluate the use of 3D DS cameras [19,20] and photogrammetry methods [18] for localizing ECG electrodes on the torso. The achieved results varied between 1.16 mm and 11.8 mm, depending on the metrics and positional references used. The authors of [20] used the Hausdorff metric to compare the positions obtained from a Microsoft Kinect 3D DS camera to positions found on MRI or CT scans. On average, they achieved a positional error of 11.8 mm, which is an order of magnitude larger than the error between 1.16 mm and 2.5 mm achieved by Schulze et al. [18], Alioui et al. [19] and the present study, all of which used the Euclidean metric instead.
The majority of studies proposed methods for the localization of EEG sensors mounted on the scalp. Apart from Homölle and Oostenveld [8], the achieved average positional errors ranged from 1.5 mm [12] to 3.26 mm [14] using various reference measurements, including the mean of manually placed marks [12,14] and positional references generated using a magnetic digitizer [8,13,16] such as the Polhemus Fastrak. Comparing the positional error of 9.4 mm achieved by Homölle and Oostenveld [8] to all other results, it can be assumed that this was mainly caused by unavoidable inaccuracies when taking the magnetic digitizer measurements.
Considering that the positions of ECG electrodes mounted on the torso are directly affected by any movements, the positional error of 2.0 mm achieved in the present study is a clear indication that the active engagement and participation of the patient in the measurement is essential. The instructions on how the patient can easily maintain a posture that facilitates the recording of the electrode positions have a substantial impact on the outcome of the measurements. If the instructions are not clearly defined by the measurement protocol, or not properly understood or followed by the patient, the positional error will increase. For example, subject 4 (see Table 3) changed the position of his arms twice during the measurement. This immediately resulted in an increased positional error of 2.2 mm ± 1.5 mm.
In addition to the limited space, the lighting conditions encountered in the clinical environment, as well as tight schedules, have a direct impact on the average positional error. Varying lighting conditions, including multiple light sources with differing light temperatures, can have a negative impact on photogrammetric approaches and on 3D DS camera-based measurements of the torso surface and the electrode positions thereon. Algorithms for automatic white balancing and exposure control have been adopted to improve color constancy across multiple 3D views of the torso and maintain a constant exposure of the torso independent of the viewing direction and angle. In combination with the developed calibration method, this resulted in increased accuracy in identifying those pixels representing the color markers.
Table 4. Comparison of the proposed approach for localizing ECG electrodes on the torso using a 3D DS camera with recent developments. In contrast to the large number of publications addressing the localization of EEG electrodes, only a few could be found using 3D DS cameras. The results obtained in the present study are within the ranges found by other studies.

Time, in particular, is a very limited resource, which largely limits the routine use of magnetic digitizers within clinical environments. For precise positional measurement, the exact placement of the magnetic probe on each electrode and manual triggering of the measurement are required. An experienced user requires about 15 min. to accomplish this task. Any attempt to reduce this time can only be achieved by the less accurate placement of the probe on each electrode, which can result in increased positional errors of 7.8 mm and higher, as encountered by Clausner et al. [43].
In general, keeping the required human interactions and the number of related errors as low as possible is one key goal for establishing NICE-based tools and procedures in clinical environments. The time required to localize the electrode positions on the human torso, as well as the amount of ionizing radiation the patient is exposed to, are key factors that can either prevent or facilitate a successful uptake. Alternative approaches currently used to obtain the electrode positions include manually placing markers on, or automatically segmenting, CT and MRI scans [9,12,19] and pointing a magnetic digitizer probe to each individual electrode [8,13,16]. Manually marking each electrode requires a significant amount of time (about 45 min.), which is more than the 15 min. required for magnetic probe-based measurements. The mentioned approaches suffer from an additional bias related to the individual human perception of the electrode and marker shapes, as well as inaccuracies in the way the pointing probes are placed onto the electrode.
In contrast, the proposed 3D DS camera-based approach is not affected by these kinds of errors. When implemented on a tablet computer, the presented approach will enable clinicians to acquire the electrode positions and torso surfaces within 10 min. Therefore, average positional errors of less than 2.5 mm will be feasible even under limited spatial conditions and tight schedules. Some aspects essential for the successful clinical uptake of the presented approach still have to be addressed. On all color sensors, the raw signals recorded for red, green, and blue channels have to be converted into the RGB color space before they can be used. If the required parameters are not properly calibrated, the resulting images may show a bluish hue that cannot be corrected by any white-balancing algorithm. This was the case for subject 5 shown in Figure 3, and caused the additional peak (4) in the calibration heat map shown in Figure 6. During the preparation of future studies, it is necessary to establish an appropriate procedure for verifying and optimizing the settings for these parameters before the first measurement and at regular intervals.
Each 3D DS camera data set also provides a point cloud representation of the torso surface. This is used in current studies to build electroanatomical models for electrocardiographic noninvasive imaging methods from clinical cardiac CT slices only. Further applications of the proposed approach are currently being investigated for enhanced electrical impedance tomography.

Conclusions
In the presented work, a complete 3D DS camera-based system was developed for localizing 67 ECG electrodes identified by color markers. Issues such as varying lighting conditions, including multiple light sources with different light temperatures, and the alignment of individual 3D views were addressed. The implemented recording protocol provides precise rules on how to seat the patient and includes well-defined instructions for the patient to easily maintain a specific isometric posture while all views are recorded. The resulting active engagement and participation of the patient in the measurement helped to minimize positional errors caused by the patient moving during the measurement. In combination with the implemented symmetric ICP algorithm, average positional errors of 2.3 mm or less could be achieved for each measurement. The implemented prototype system localizes the electrodes on the torso with minimal human interaction. It can handle diverse lighting conditions and operate in narrow spaces, as encountered in clinical settings such as outpatient and local practitioner clinics.

Abbreviations
The following abbreviations are used in this manuscript: