Magnetic Field Sensors’ Calibration: Algorithms’ Overview and Comparison

The calibration of three-axis magnetic field sensors is reviewed. Seven representative algorithms for the in-situ calibration of magnetic field sensors, requiring no special piece of equipment, are reviewed. The algorithms are presented in a user-friendly, directly applicable, step-by-step form, and are compared in terms of accuracy, computational efficiency and robustness, using both real sensors' data and artificial data with known measurement distortion.


Introduction
Magnetic field sensors (magnetometers) are nowadays widely used in a plethora of commercial, industrial, marine, aerospace and military applications. Their applications include, but are not limited to, navigation and attitude estimation, geophysical surveys, archaeology, entertainment devices and consumer electronics.
In most applications, sensor calibration is essential in order to achieve the desired accuracy level. The purpose of magnetic field sensor calibration is twofold. First, as in the case of every measurement unit, calibration ensures that the measurement of the standalone sensor corresponds to the actual value of the magnetic field. To do so, calibration must compensate for all static (manufacturing imperfections, etc.) and active (temperature, humidity, etc.) phenomena affecting the accuracy of the sensor's measurement. In addition, when a magnetic sensor is embedded in a larger system, other components of the system may cause disturbances (both static and active) to the local magnetic field. Static disturbances are usually caused by magnetic and ferromagnetic materials in the vicinity of the sensor, called hard-iron distortion and soft-iron distortion respectively (more information is given in Section 2). Mechanical or electronic structures embedded in the system, such as motors and coils, can also actively distort the local magnetic field and cause significant measurement error.
This review paper focuses on algorithms correcting the dominant linear time-invariant (static) measurement errors, requiring no special piece of equipment for their application. Such algorithms are most commonly used for in-situ calibration of magnetic field sensors which are usually in chip form and embedded in larger systems. The paper presents seven representative calibration algorithms for three-axis magnetometers and compares them in terms of accuracy, robustness, computational efficiency and ease of deployment. The seven algorithms are briefly presented, to introduce all required mathematical expressions, and are summarized in an easy-to-develop, step-by-step form. For the details of the algorithms, the reader is referred to the original works.
The selection of the particular algorithms was done based on their popularity and on our attempt to present as many different calibration approaches as possible. The TWOSTEP [1] algorithm is one of the first algorithms that addressed the full calibration problem (and probably the most popular one). At a later time, Elkaim and Vasconcelos [2] proposed a geometric approach of TWOSTEP which is also very popular. At the same time, Dorveaux et al. [3] offered a non-linear formulation of the problem and treated it in an innovative, strictly iterative way. In addition, Wu and Shi [4] suggested the most complete formulation of the calibration problem, as an optimal maximum likelihood estimation one. The TWOSTEP algorithm, as well as the algorithms proposed by Vasconcelos et al. and Wu et al., consist of a first step deriving an initial solution and a second step improving it. On the other hand, Papafotis and Sotiriadis [5] recommended an iterative approach based on a twofold minimization, which was shown to be extremely effective. Furthermore, a real-time approach by Crassidis et al. [6] using the popular Kalman filter is discussed. Finally, to represent the recent trends towards Machine Learning, an Artificial Intelligence (AI) method applying Particle Swarm Optimization to the estimation problem is explored [7].
Please note that this review focuses on works for the in-situ calibration of three-axis magnetic field sensors without using any special piece of equipment or any additional sensor. Thus, several interesting works dealing with magnetometer calibration in combination with inertial sensors [8][9][10][11][12] are not included in this work.
The rest of the paper is organized as follows. First, a standard error model for three-axis magnetic field sensors is presented in Section 2. In Sections 3-9, seven representative algorithms are discussed in chronological order of publication. In Section 10, a method for generating artificial data is proposed and the algorithms are evaluated via extensive Monte Carlo simulation to quantify their performance. In addition, the algorithms are evaluated using several real sensors' measurements in order to assess their performance under real-world conditions. Finally, Section 11 summarizes our findings and provides brief comments on each algorithm. The notation used throughout the paper is presented in Table 1.

Magnetic Field Sensor's Error Sources and Measurement Model
In this section, the most important linear, time-invariant error sources of three-axis magnetic field sensors are presented. Based on them, a mathematical model relating the sensor's measurement with the actual value of the magnetic field is derived.
The total output error of a magnetic sensor is a combination of several error sources related to the sensing element itself, the instrumentation electronics, manufacturing imperfections and distortions caused by magnetic and ferromagnetic materials in the vicinity of the sensor. The linear, time-invariant error sources with the most significant contribution to the sensor's total error are listed below:
• Bias, or offset: all magnetic sensors suffer from bias, which is a constant distortion. In many cases, it is the most important defect in the sensor's overall error. A 3 × 1 vector, h_s, is used to model it.
• Scale-factor error represents the input-output gain error of the sensor. It is modeled by a 3 × 3 diagonal matrix, T_sf.
• Cross-coupling or non-orthogonality inaccuracies result from the non-ideal alignment of the sensor's axes during manufacturing and are modeled by a 3 × 3 matrix, T_cc.
• Soft-iron distortion is caused by ferromagnetic materials in the vicinity of the sensor, attached to the sensor's coordinate frame. Those materials do not generate their own magnetic field, but instead alter the existing magnetic field locally, resulting in a measurement discrepancy. This effect is modeled by a 3 × 3 matrix, T_si.
• Hard-iron distortion is due to magnetic materials attached to the sensor's coordinate frame. As a consequence of the persistent magnetic field created by those materials, the sensor's output has a constant bias. Hard-iron distortion is modeled by a 3 × 1 vector, h_hi.
• Random noise is the stochastic error in the sensor's output. It is induced by the sensor's mechanical and electrical architecture. It is modeled by a 3 × 1 vector, ε, and is most commonly assumed to be a sequence of white noise, i.e., ε ∼ N(0, σ²).
Let m be the 3 × 1 true magnetic field vector and y be the 3 × 1 measurement vector. With the aforementioned error terms in mind, a widely accepted and well-referenced measurement model for a three-axis magnetometer is the following [1,2,[4][5][6][7],13,14]:

y = T_sf T_cc (T_si m + h_hi) + h_s + ε (1)

In most applications, the exact contribution of each error term in (1) is of no concern and, thus, instead of (1), most calibration algorithms use the following compact form of (1):

y = T m + h + ε (2)

where T ≜ T_sf T_cc T_si and h ≜ T_sf T_cc h_hi + h_s. This work focuses on algorithms intended to be used with magnetic field sensors requiring no special piece of equipment. In such cases, calibration is done in the sensor's (body) coordinate frame, implying that both the measurement vector, y, and the true magnetic field vector, m, in (2) are expressed in the sensor's coordinate frame.
Note that when expensive laboratory equipment is not available, both the calibration parameters T and h in (2) and the magnetic field vector, m, are unknown. Thus, in most works, multiple measurements of the local (Earth's) magnetic field are used to derive T and h. Note that the Earth's magnetic field varies with location and time, and its value (magnitude and direction) is only approximately known from magnetic models such as the International Geomagnetic Reference Field (IGRF) model [15]. However, it is reasonable to assume that the magnitude of the magnetic field is (locally) constant during the calibration procedure. Based on this fact, most authors formulate an optimization or estimation problem to derive T and h.
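Once T and h are known, undoing the distortion in (2) is a single linear solve. A minimal sketch in Python/NumPy (the numerical values of T and h below are illustrative, not taken from any real sensor):

```python
import numpy as np

# Illustrative (made-up) calibration parameters for the model (2): y = T m + h
T = np.array([[1.10, 0.02, 0.01],
              [0.00, 0.95, 0.03],
              [0.00, 0.00, 1.05]])
h = np.array([0.20, -0.10, 0.05])

def calibrate(y, T, h):
    """Recover the magnetic field estimate m = T^{-1} (y - h)."""
    return np.linalg.solve(T, y - h)

# A unit-norm field distorted by (T, h) is recovered exactly (noise neglected)
m_true = np.array([0.6, 0.8, 0.0])
y = T @ m_true + h
m_hat = calibrate(y, T, h)
print(np.allclose(m_hat, m_true))  # True
```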

Alonso and Shuster (TWOSTEP)
The TWOSTEP algorithm [1] consists of an analytical centering approach [16,17] in its first step, while in the second step the solution is improved numerically. The authors initially solved the problem of determining the bias, h, when the attitude is unknown [18], and then extended their method to determine the matrix T as well [1].
TWOSTEP is motivated by the assumption that matrix T should not be far from a pure rotation. Therefore, by applying polar decomposition, it can be written as T = (I_{3×3} + D)^{-1} O, where O is an orthogonal matrix and D is a symmetric 3 × 3 matrix such that (I_{3×3} + D)^{-1} is positive definite. Matrix O can be absorbed into vector m since it does not alter its norm. The equivalent measurement model is

y = (I_{3×3} + D)^{-1} m̃ + h + ε (3)

where m̃ ≜ O m. Therefore, for the full calibration, D and h must be estimated. To this purpose, a set of N measurements, y_k, k = 1, 2, . . . , N, is used and the corresponding effective measurements are defined as

z_k = ||y_k||² − ||m_k||², k = 1, 2, . . . , N (4)

The effective measurements can be decomposed into a deterministic part plus an approximately Gaussian noise term, υ_k ∼ N(μ_k, σ_k²), with mean μ_k and variance σ_k² given by (5). Since D and h are unknown, the variance σ_k² is assumed to be similar to the measurement's output error variance σ². Hence μ_k and σ_k² can be assumed independent of k. To estimate D and h, Alonso and Shuster define the auxiliary quantities E ≜ D² + 2D and c ≜ (I + D)h, as well as the estimation vector θ containing the elements of the 3 × 1 vector c and the independent elements of the 3 × 3 symmetric matrix E. The TWOSTEP algorithm operates on the estimation vector θ, and thus on the auxiliary parameters E and c, and not on the actual calibration parameters D and h. The transformation from E and c back to D and h is described in (13) and (14).

Initial Estimate
For every measurement, y_k, k = 1, 2, . . . , N, the following auxiliary variables are defined:

S_k = [y_{k,1}², y_{k,2}², y_{k,3}², 2y_{k,1}y_{k,2}, 2y_{k,1}y_{k,3}, 2y_{k,2}y_{k,3}] (6a)

The centering approximation is done using weighted averages of these quantities, along with their corresponding centered values. The first estimate of θ is the centered one, θ̃, given by (8), with P̃_θθ denoting the centered covariance matrix and F̃_θθ denoting the centered Fisher information matrix.

Solution Improvement Step
The second step improves the previous estimate of vector θ, derived in (8), via the Gauss-Newton method, using the centered estimate θ̃ as the initial guess. The estimation is updated through the Gauss-Newton recursion, with v_j denoting the j-th element of vector v. At every iteration, the 3 × 3 symmetric matrix E and the 3 × 1 vector c are updated according to the current estimation vector θ_i. Alonso and Shuster define a scalar quantity, η_i, in order to establish a stopping condition for the Gauss-Newton method.
The iterations continue until η_i becomes smaller than a predetermined threshold. After sufficiently many iterations, an optimal estimate of matrix E* and of vector c* is derived. The derived solution is then used to find D* and h*. To this end, we apply the SVD [19] to the symmetric matrix E*, i.e., E* = U S U^T, where S = diag(s_1, s_2, s_3) and U ∈ O(3). Then we find the diagonal matrix W = diag(w_1, w_2, w_3) satisfying S = 2W + W². Typically, the elements of S are much smaller than unity [1], so a real solution exists with the diagonal entries of W being w_j = −1 + √(1 + s_j), j = 1, 2, 3.
Combining the above, the estimates of matrix D* and bias vector h* are given by (13) and (14), i.e., D* = U W U^T and h* = (I_{3×3} + D*)^{-1} c*, and are related to the calibration parameters T and h of the measurement model (2) through T = (I_{3×3} + D*)^{-1} (up to an orthogonal factor O) and h = h*. Summarizing, when the centered estimate is near the ground-truth value, the Gauss-Newton method typically converges rapidly. The authors verified the robustness of their method via simulations assuming either white noise or colored noise. TWOSTEP is also suitable for on-orbit calibration using IGRF data [15]. The algorithm is summarized in Algorithm 1.
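The back-transformation from the auxiliary quantities (E*, c*) to (D*, h*) can be sketched as follows. Since E is symmetric, the sketch uses the symmetric eigendecomposition, which here plays the role of the SVD of E*; the values of D and h used for the round-trip check are made up:

```python
import numpy as np

def twostep_back_transform(E, c):
    """Recover D and h from the TWOSTEP auxiliary quantities
    E = D^2 + 2D and c = (I + D) h, as in (13) and (14)."""
    s, U = np.linalg.eigh(E)            # E symmetric: E = U diag(s) U^T
    w = -1.0 + np.sqrt(1.0 + s)         # solves s_j = 2 w_j + w_j^2
    D = U @ np.diag(w) @ U.T
    h = np.linalg.solve(np.eye(3) + D, c)
    return D, h

# Round-trip check with a small, made-up symmetric D and bias h
D_true = np.array([[0.05, 0.01, 0.00],
                   [0.01, -0.03, 0.02],
                   [0.00, 0.02, 0.04]])
h_true = np.array([0.1, -0.2, 0.05])
E = D_true @ D_true + 2.0 * D_true
c = (np.eye(3) + D_true) @ h_true
D, h = twostep_back_transform(E, c)
print(np.allclose(D, D_true), np.allclose(h, h_true))  # True True
```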

Crassidis et al.
The authors of [6] were motivated by the fact that real-time applications demand real-time calibration methods. To this end, based on the problem formulation (3) established in TWOSTEP [1], Crassidis et al. formulate a real-time identification problem for the derivation of the calibration parameters D and h and solve it using the extended Kalman filter (EKF) approach. Note that the authors of [6] have proposed two more algorithms dealing with the online calibration of a magnetic field sensor in [20,21]. However, in this work we focus on the EKF-based one presented in [6], which is the most efficient and popular one.
Following the problem formulation of TWOSTEP, the bias vector h and the symmetric matrix D are sought. The estimation vector θ is defined differently and contains h and D, structured as in (16). Because the vector θ is constant, the state model is given by θ̇ = 0. The effective measurement is given by z_k = ||y_k||² − ||m_k||², as in (4), the measurement model is given by φ(θ_k), and the effective measurement noise υ_k ∼ N(μ_k, σ_k²) follows (5). At each iteration, D_k and h_k are extracted from θ_k according to (16). The propagation follows the standard EKF equations, where P_k is the covariance of the estimated parameters for h and D at step k. The matrix H(θ_k) is the linearization matrix of φ(θ_k) and is defined in terms of

S_k = [y_{k,1}², y_{k,2}², y_{k,3}², 2y_{k,1}y_{k,2}, 2y_{k,1}y_{k,3}, 2y_{k,2}y_{k,3}] (20a)

The noise variance of the measurements, σ_k², can be assumed to be constant and equal to σ², as in TWOSTEP. Given a set of N measurements, the EKF provides an optimal estimation vector θ* = θ_N, from which an optimal vector h* = h_N and a matrix D* = D_N can be extracted according to (16). Therefore, the calibration parameters of (2) are given by (21). Under certain conditions, e.g., fast-changing data, this approach of sequential calibration may have advantages in terms of computational complexity and adaptation. The algorithm is summarized in Algorithm 2.
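As an illustration of the sequential update, the sketch below runs an EKF for the bias-only special case D = 0, where the measurement function reduces to φ(h) = 2 y_kᵀ h − ||h||² and its Jacobian to H = 2(y_k − h)ᵀ (both follow directly from z_k = ||y_k||² − ||m_k||² with m_k = y_k − h). The noise and covariance values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
h_true = np.array([0.3, -0.2, 0.1])   # made-up bias to be estimated

# EKF state: bias h only (D = 0); the state model is h_dot = 0
h_est = np.zeros(3)
P = np.eye(3)              # state covariance
R_noise = 1e-4             # effective measurement noise variance
Q = 1e-5 * np.eye(3)       # small process noise keeps the filter adapting

for _ in range(200):
    m = rng.standard_normal(3)
    m /= np.linalg.norm(m)             # unit-norm field, random direction
    y = m + h_true                     # biased measurement (noise-free here)
    z = np.dot(y, y) - 1.0             # effective measurement ||y||^2 - ||m||^2
    phi = 2.0 * np.dot(y, h_est) - np.dot(h_est, h_est)   # predicted z
    H = 2.0 * (y - h_est)              # measurement Jacobian (1 x 3)
    S = H @ P @ H + R_noise            # innovation variance (scalar)
    K = P @ H / S                      # Kalman gain (3 x 1)
    h_est = h_est + K * (z - phi)
    P = P - np.outer(K, H) @ P + Q

print(h_est)   # approaches h_true
```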

Dorveaux et al.
An iterative algorithm for the calibration of magnetic field sensors, based on iterations of a least-squares problem, is introduced in [3]. At the beginning of the algorithm, the measurements lie on an ellipsoid according to (2). In each iteration, the measurements are moved from the initial ellipsoid towards the unit sphere, following a cost-function minimization procedure.
The authors in [3] use the following variation of the measurement model of (2):

m = A y + B

where A = T^{-1}, B = −T^{-1} h and the measurement noise, ε, is neglected. The algorithm begins by considering an initial estimate of the magnetic field vectors, denoted by m̂_k(0) and defined as

m̂_k(0) = y_k (24)

In every iteration, the following cost function is formulated and minimized using the least-squares method:

J(A_n, B_n) = Σ_k || A_n m̂_k(n) + B_n − m̂_k(n)/||m̂_k(n)|| ||² (25)

where n = 1, 2, . . . , N denotes the n-th iteration. Let A_n and B_n be the resulting matrices from the minimization of (25). Every iteration ends with using A_n and B_n to update the estimates of the magnetic field vectors as

m̂_k(n + 1) = A_n m̂_k(n) + B_n (26)

From (26), we can express the magnetic field estimates m̂_k(n) directly in terms of the measurement vectors y_k as

m̂_k(n) = Ã_n y_k + B̃_n (27)

where Ã_n and B̃_n are iteratively defined as

Ã_n = A_n Ã_{n−1},  B̃_n = A_n B̃_{n−1} + B_n (28)

To determine when the algorithm has reached an acceptable solution, we define the cost J_stop(A_n, B_n) of (29). The iterations stop when J_stop is sufficiently small. Note that the original manuscript does not provide an explicit condition to stop the iterations. However, it is reasonable to terminate the algorithm when the contribution of the updated A_n and B_n to the calibration parameters Ã_n and B̃_n is negligible (see (28)). The estimates m̂_k(N) derived at the N-th iteration represent the calibrated data. The derived matrices Ã_N and B̃_N are related to the calibration parameters T and h of the measurement model (2) as follows:

T = Ã_N^{-1},  h = −Ã_N^{-1} B̃_N (31)

Finally, the estimates m̂_k(N), k = 1, 2, . . . , K, derived at the N-th iteration, represent the calibrated measurement vectors. The algorithm is summarized in Algorithm 3.
Algorithm 3: Dorveaux et al. [3]
Step 1: Initialize the magnetic field estimates as m̂_k(0) = y_k.
Step 2: Minimize (25) using least squares and derive A_n and B_n.
Step 3: Use A_n and B_n to calculate m̂_k(n + 1) from (26).
Step 4: Update Ã_n and B̃_n using (28).
Step 5: Evaluate the cost function J_stop(A_n, B_n) from (29).
Step 6: Repeat steps 2-5 until J stop is sufficiently small.
Step 7: Use Ã_N and B̃_N to calculate T and h using (31).
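The loop of Steps 2-5 reduces to a stacked linear least-squares fit per iteration. The sketch below assumes the cost (25) has the form Σ_k || A m̂_k(n) + B − m̂_k(n)/||m̂_k(n)|| ||²; the arrangement of the 12 unknowns of (A, B) into one least-squares system is our own:

```python
import numpy as np

def dorveaux_calibrate(Y, n_iter=50):
    """Iterative least-squares calibration sketch after Dorveaux et al. [3].
    Y: (K, 3) array of raw measurements y_k. Returns T, h of model (2)."""
    K = Y.shape[0]
    M = Y.copy()                              # m_hat_k(0) = y_k
    A_tot, B_tot = np.eye(3), np.zeros(3)
    for _ in range(n_iter):
        U = M / np.linalg.norm(M, axis=1, keepdims=True)   # targets m/||m||
        # stack A m_hat_k + B = u_k over all k; p = [vec(A); B] (12 unknowns)
        X = np.zeros((3 * K, 12))
        for k in range(K):
            X[3 * k:3 * k + 3, :9] = np.kron(np.eye(3), M[k])
            X[3 * k:3 * k + 3, 9:] = np.eye(3)
        p, *_ = np.linalg.lstsq(X, U.ravel(), rcond=None)
        A, B = p[:9].reshape(3, 3), p[9:]
        M = M @ A.T + B                           # update step, Eq. (26)
        A_tot, B_tot = A @ A_tot, A @ B_tot + B   # composition, Eq. (28)
    return np.linalg.inv(A_tot), -np.linalg.inv(A_tot) @ B_tot   # Eq. (31)
```

As with any norm-based formulation, T is recovered only up to an orthogonal factor, so a sensible check is that the calibrated vectors T^{-1}(y_k − h) regain unit norm.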

Vasconcelos et al.
The authors of [2] consider that the magnetometer's measurements lie on an ellipsoid manifold following the measurement model (2). First, they derive an initial estimate of the calibration parameters T and h by finding the ellipsoid that best fits the given data. Then, they use the measurement model of (2) to formulate a maximum likelihood estimation problem and derive an improved estimate of the calibration parameters T and h.
From (2), the magnetic field vector is expressed as m_k = T^{-1}(y_k − h). Assuming that the magnitude of the magnetic field is constant during the calibration procedure, we can write an unconstrained optimization problem (32) to derive T and h. Here, σ_k denotes the standard deviation of the measurement noise in the k-th measurement, assumed to be the same for all three axes and equal to σ. Without loss of generality, the magnitude of the magnetic field is assumed to be equal to one. A possible relaxation of this soft assumption is provided by Springmann [22], who addresses the problem of time-varying bias. To solve (32), the authors define the cost function J(x) and then minimize it using Newton's method. The vector x is updated in every Newton iteration as

x ← x − [∇²J(x)]^{-1} ∇J(x)

where ∇J(x) is the gradient vector and ∇²J(x) is the Hessian matrix of the cost function. For both ∇J(x) and ∇²J(x), the authors in [2] provide analytical expressions, which are presented in Appendix A.1.

Initial Estimate
Solving (32) using Newton's method requires a good initial estimate of the calibration parameters, T̂ and h. Vasconcelos et al. use a previous work on nonlinear estimators for strapdown magnetometers by Foster and Elkaim [23,24] to derive one. Solving the ellipsoid equation ||m_k|| = ||T^{-1}(y_k − h)|| = 1 for every k is equivalent to solving a pseudo-linear least-squares estimation problem obtained by re-arranging the terms, where each measurement vector is written as y_k = [y_k^x, y_k^y, y_k^z]^T. The vector p and the initial estimates of the calibration parameters are then derived from explicit expressions involving auxiliary variables α_1, α_2 and α_3. One contribution of Vasconcelos et al., advancing the existing initial-step approach suggested in [23], was the derivation of these explicit and non-trivial expressions. In addition, Vasconcelos et al. state that their proposed algorithm is applicable even when the magnitude of the magnetic field is not constant during the measurement, similarly to the TWOSTEP and Crassidis et al. [6] algorithms. The algorithm is summarized in Algorithm 4.
Step 8: Split x into T̂ and h and calculate T = T̂^{-1}.

Ali et al.
The authors in [7] propose a Particle Swarm Optimization (PSO) [25]-based calibration algorithm that estimates the bias, scale and non-orthogonality factors. The main advantage of this algorithm is its simplicity of implementation, since the optimization is heuristic and does not depend on the calculation of gradients, unlike the other optimization techniques mentioned in this paper. It can be classified as an AI [26] approach.
The authors in [7] use (2) and a set of N sensor measurements to form an optimization problem for deriving the calibration parameters T and h; its cost function, J, is called the fitness value.
Function J depends on T and h, which are conveniently combined into the single vector x ∈ R¹². For a swarm of S particles, the position x_i ∈ R¹² and the velocity v_i ∈ R¹² of the i-th particle are updated using [25]

v_i^k = w v_i^{k−1} + c_1 r_{1i} (p_i − x_i^{k−1}) + c_2 r_{2i} (p_g − x_i^{k−1})
x_i^k = x_i^{k−1} + v_i^k

for i = 1, 2, . . . , S, where the superscript k denotes the new value and k − 1 the old one. Also, p_i denotes the i-th particle's best position, p_g denotes the swarm's best position, c_1 and c_2 are the acceleration coefficients, w is the inertial weight factor and r_{1i}, r_{2i} are random numbers uniformly distributed within the range [0, 1]. Typical values of these quantities are c_1 = c_2 = 2, w = 1, and the number of particles S is usually between 20 and 65. At each iteration k, each particle's fitness value J(x_i^k) is calculated and the quantities p_i and p_g are updated accordingly. The authors suggest three different stopping criteria. Specifically, the iterations stop either when the fitness value J of a particle is smaller than a predetermined threshold, or after a maximum number of iterations, or when the change of J with iterations becomes insignificant. Upon termination of the algorithm, the parameters T and h of (2) are extracted from the swarm's optimal solution p_g according to (48). Following the general concept of applying AI optimization algorithms, as introduced in [7], one can also consider using more modern versions of the standard PSO, e.g., [27][28][29]. They are typically found as built-in functions in computational suites such as MATLAB [30]. The algorithm is summarized in Algorithm 5.

Algorithm 5:
Ali et al. [7]
Step 1: Initialize x_i, v_i for i = 1, 2, . . . , S and set p_i = x_i.
Step 2: Find j = argmin_{i=1,...,S} J(p_i); set the particle bests J_i^min ← J(p_i), the global best p_g ← p_j and J_min ← J(p_j).
Step 3: for each particle i do: update v_i, x_i, p_i and p_g.
Step 4: Repeat Step 3 until an exit condition is met.
Step 5: Extract T and h from p_g using (48).
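A compact sketch of Algorithm 5, assuming a fitness of the form J(x) = Σ_k (||T⁻¹(y_k − h)|| − 1)², with x = [vec(T); h] (the exact fitness of [7] may differ), element-wise random factors r₁, r₂, and a damped inertia w = 0.7 instead of the quoted w = 1 for faster settling:

```python
import numpy as np

rng = np.random.default_rng(1)

def fitness(x, Y):
    """Assumed fitness: sum_k (||T^{-1}(y_k - h)|| - 1)^2 for x = [vec(T); h]."""
    T, h = x[:9].reshape(3, 3), x[9:]
    try:
        M = np.linalg.solve(T, (Y - h).T).T
    except np.linalg.LinAlgError:
        return np.inf               # singular particle: reject it
    r = np.linalg.norm(M, axis=1) - 1.0
    return np.dot(r, r)

def pso_calibrate(Y, S=30, iters=150, c1=2.0, c2=2.0, w=0.7):
    # particles initialised near the undistorted model T = I, h = 0
    x0 = np.concatenate([np.eye(3).ravel(), np.zeros(3)])
    X = x0 + 0.1 * rng.standard_normal((S, 12))
    V = 0.01 * rng.standard_normal((S, 12))
    P = X.copy()                                  # per-particle best positions
    Pf = np.array([fitness(p, Y) for p in P])     # and their fitness values
    g = P[np.argmin(Pf)].copy()                   # swarm best position p_g
    for _ in range(iters):
        r1 = rng.random((S, 12))
        r2 = rng.random((S, 12))
        V = w * V + c1 * r1 * (P - X) + c2 * r2 * (g - X)   # velocity update
        X = X + V                                            # position update
        F = np.array([fitness(x, Y) for x in X])
        better = F < Pf
        P[better], Pf[better] = X[better], F[better]
        g = P[np.argmin(Pf)].copy()
    return g[:9].reshape(3, 3), g[9:], Pf.min()
```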

Wu and Shi
The authors of [4] formulate the calibration of a three-axis magnetometer as a maximum likelihood estimation problem, which is solved using the Gauss-Newton method.
Starting from the measurement model of (2), Wu and Shi observed that, by considering the QR decomposition T^{-1} = QR, where Q ∈ O(3) and R ∈ U(3), (2) can be rewritten. Defining m̃ ≜ Q^T m, we observe that ||m̃|| = ||m|| since Q ∈ O(3). Also, setting T̃ ≜ R^{-1}, we have that

y = T̃ m̃ + h + ε (50)

Using the above transformation, the authors reduce the number of unknown model parameters from 12 (9 for T and 3 for h) to 9 (6 for R, since R is upper triangular, and 3 for h). Note that, using (50), the calibration procedure now aims at finding the calibration parameters T̃ and h, while the magnetic field vector m̃ is also unknown. Using a set of K measurements and (50), the authors formulate the maximum likelihood estimation problem (51). Without loss of generality, the authors constrained the magnitude of the magnetic field to be equal to one. Based on (51), the Lagrange function (52) is formulated, where λ_k, k = 1, 2, . . . , K, are positive Lagrange coefficients for the unit-norm constraint. Note that, since T̃ is an upper triangular matrix, the lower triangular elements of T̃ are excluded from x. The minimization of (52) and the estimation of x are done using the Gauss-Newton method as

x ← x − [∇²J(x)]^{-1} ∇J(x) (54)

where ∇J(x) is the Jacobian vector and ∇²J(x) is the Hessian matrix of the Lagrange function. For both ∇J(x) and ∇²J(x), the authors provide analytical expressions, which are presented in Appendix A.2.
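The norm-preserving change of variables is easy to verify numerically: m̃ = Qᵀm = R(y − h) has the same norm as m, so the unit-magnitude constraint survives and only the six entries of R and the three of h remain to be estimated. The values below are random and illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

# Any invertible T admits T^{-1} = Q R with Q orthogonal, R upper triangular
T = np.eye(3) + 0.1 * rng.standard_normal((3, 3))
h = np.array([0.2, -0.1, 0.05])
Q, R = np.linalg.qr(np.linalg.inv(T))

m = rng.standard_normal(3)
y = T @ m + h

# m_tilde = Q^T m = R (y - h) preserves the norm of m
m_tilde = R @ (y - h)
print(np.isclose(np.linalg.norm(m_tilde), np.linalg.norm(m)))  # True
```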

Initial Estimate
Solving (51) using the Gauss-Newton method requires a good initial estimate of the unknowns. To find one, the authors of [4] use the unit-magnitude constraint and the equation 1 = ||R(y_k − h)||², which, after some manipulation, is written in the pseudo-linear form (56), with the quantities Y_k defined in (55). The authors solve (56) using the least-squares method and denote the solution z_e = [vec(A_e)^T, b_e^T, c_e]^T = argmin_z ||Yz||². They derive z_e as the eigenvector of Y^T Y corresponding to its minimum (or zero) eigenvalue. Using z_e, the vector z is derived as z = αz_e, where α = 4/(b_e^T A_e^{-1} b_e − 4c_e). Extracting vec(A), b and c from z, the initial estimates of the unknowns, T̃(0), h(0), m̃_k(0) and λ_k(0), are defined as in (57), where chol(·) is the Cholesky factorization.
Finally, an alternative version of Wu and Shi's algorithm is proposed by Cao et al. in [13], where a different method for the initial estimate is presented, while the second step is identical. The algorithm is summarized in Algorithm 6.

Algorithm 6: Wu and Shi [4]
Initial Estimate
Step 1: Calculate Y_k, k = 1, 2, . . . , K from (55) and form the matrix Y.
Step 2: Find the eigenvector of Y^T Y corresponding to its minimum (or zero) eigenvalue and denote it as z_e = [vec(A_e)^T, b_e^T, c_e]^T.
Step 3: Calculate z = αz_e, where α = 4/(b_e^T A_e^{-1} b_e − 4c_e).
Step 4: Extract vec(A), b and c from z.
Step 5: Calculate an initial estimate of the unknowns using (57).

Gauss-Newton Method
Step 6: Use the initial estimates to initialize the vector x of (53).
Step 7: Update x using (54).

Papafotis and Sotiriadis (MAG.I.C.AL.)
The authors in [5] use (2) and a set of K sensor measurements to form an optimization problem (58) for deriving the calibration parameters T and h, where, without loss of generality, the magnitude of the magnetic field is constrained to be equal to one. In order to solve (58), they propose an iterative algorithm based on the solution of a linear least-squares problem. The algorithm begins by initializing the magnetic field vectors as

m_k(0) = y_k / ||y_k||

and rewriting (2) in the matrix form

Y = L G + E (60)

where Y contains the measurement vectors, L ≜ [T h], G contains the augmented field vectors [m_k^T 1]^T as columns, and E is the error matrix. In every iteration, (60) is solved for L using the least-squares method, minimizing the total squared error tr(E^T E). From the calculated L, an updated set of calibration parameters T and h is extracted from (61b). Using them, the magnetic field vectors are updated as

m_k = m̃_k / ||m̃_k||,  where m̃_k = T^{-1}(y_k − h) (63), (64)

Every iteration ends by updating the matrix G using the updated vectors m_k, k = 1, 2, . . . , K. The iterations stop when a small value of a suitable cost function is achieved. The MAG.I.C.AL. algorithm is summarized in Algorithm 7.
Step 1: Initialize the magnetic field vectors m_k and form the matrix G.
Step 2: Solve (60) for L using the least-squares method.
Step 3: Extract T and h from L using (61b).
Step 4: Update m_k using (63) and (64) and use them to update G.
Step 5: Repeat Steps 2-4 until the cost function is sufficiently small.
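The whole loop amounts to a few lines of linear algebra. The sketch below assumes the data are stacked as Y ≈ L G with L = [T h] (3 × 4) and G holding columns [m_kᵀ 1]ᵀ, which matches the structure described above:

```python
import numpy as np

def magical_calibrate(Y, n_iter=30):
    """Iterative least-squares sketch of MAG.I.C.AL. [5].
    Y: (K, 3) raw measurements. Returns T, h of model (2)."""
    K = Y.shape[0]
    M = Y / np.linalg.norm(Y, axis=1, keepdims=True)   # m_k(0) = y_k/||y_k||
    for _ in range(n_iter):
        G = np.vstack([M.T, np.ones((1, K))])          # 4 x K, columns [m_k; 1]
        # least-squares solution of Y^T = (L G)^T for L = [T h] (3 x 4)
        L, *_ = np.linalg.lstsq(G.T, Y, rcond=None)
        L = L.T
        T, h = L[:, :3], L[:, 3]
        M = np.linalg.solve(T, (Y - h).T).T            # m_tilde = T^{-1}(y - h)
        M /= np.linalg.norm(M, axis=1, keepdims=True)  # renormalise, Eqs. (63)-(64)
    return T, h
```

As with the other norm-based algorithms, T is determined only up to an orthogonal factor, while the bias h is recovered uniquely.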

Algorithm Evaluation and Comparison
In this Section, the performance of the presented algorithms is evaluated in terms of accuracy, robustness and execution speed. First, we evaluate the performance of the seven algorithms using multiple sets of synthetic data, where the calibration parameters T and h, as well as the measurement noise characteristics, are predefined and known. By doing so, we are able to accurately determine the algorithms' accuracy and robustness. Then, multiple datasets from two different low-cost magnetic field sensors are used to verify the algorithms' performance under real-world conditions.

Synthetic Data Generation
We designed a procedure to generate synthetic data effectively, in order to examine each algorithm's performance across a range of noise variances and measurement sample sizes. The authors of TWOSTEP [18] propose a typical scenario assuming the magnetic vector spins with a constant angular velocity. On the other hand, Wu and Shi [4] suggest a specific sequence of 3D rotations using Euler angles, applied on a constant, known magnetic vector m. Similarly, Papafotis and Sotiriadis [5] recommend a sequence of 12 approximate orientations. Another alternative is to use a set of random, yet normalized, vector fields, which, however, demands a reasonably large number of samples.
Because none of the described algorithms guarantees that it will function properly on an arbitrary dataset, we propose an efficient method to span SO(3), following [31], so as to provide the algorithms with substantial, non-redundant information and to compare them fairly. Extensive simulation showed that the recommended method is very effective in spanning the 3D rotation space.
Our method's effectiveness lies in distributing the points on the sphere ||m|| = 1 more evenly, by using the canonical Fibonacci lattice mapping [31,32]. Generating a Fibonacci sphere is an extremely fast and effective approximate method for evenly distributing points on a sphere. This way, SO(3) is sufficiently represented even with a small dataset. An algorithm for generating K vectors distributed on a Fibonacci sphere is presented in detail in Algorithm 8.
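A sketch of the canonical Fibonacci lattice mapping (the exact constants of the inner loop are our own choice of the standard construction):

```python
import numpy as np

def fibonacci_sphere(K, r=1.0):
    """Distribute K points almost uniformly on a sphere of radius r
    using the canonical Fibonacci lattice mapping [31,32]."""
    phi = (1.0 + np.sqrt(5.0)) / 2.0          # golden ratio
    k = np.arange(K)
    z = 1.0 - 2.0 * (k + 0.5) / K             # heights uniform in z
    theta = 2.0 * np.pi * k / phi             # golden-angle longitude steps
    rho = np.sqrt(1.0 - z ** 2)               # radius of each z-slice
    return r * np.column_stack([rho * np.cos(theta), rho * np.sin(theta), z])

m = fibonacci_sphere(300)    # 300 near-uniform unit vectors
```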
Considering K vectors, m_k, k = 1, 2, . . . , K, distributed on a Fibonacci sphere, we continue with generating the matrix T and the vector h required to calculate the corresponding measurement vectors y_k, k = 1, 2, . . . , K, according to (2). Ideally, matrix T would be the 3 × 3 identity matrix, while the bias vector h would be the 3 × 1 zero vector. A realistic model for T and h, accounting for the sensor's non-idealities, is derived by using the concept of additive perturbation

T = α(I_{3×3} + E),  h = e (66)

where α accounts for gross scaling errors, E is a 3 × 3 perturbation matrix with random, typically small, coefficients and e is a 3 × 1 perturbation bias vector with random coefficients.
Finally, a sequence of white noise ε ∼ N(0, σ²) is added and the measurement vectors y_k, k = 1, 2, . . . , K, are derived according to (2).

Algorithm 8: Generation of synthetic data
Step 1: Initialize the number of measurements K and the radius of the sphere r.
Step 2: Calculate the Golden Ratio: ϕ = (1 + √5)/2.
Step 3: for each k = 1, 2, . . . , K do: generate the k-th point of the Fibonacci sphere.
Step 4: Pick the scaling parameter, α, the perturbation matrix, E, and the perturbation vector, e.
Step 5: Calculate T and h according to (66).
Step 6: Generate a sequence of white noise: ε ∼ N(0, σ²).
Step 7: Calculate the measurement vectors y_k according to (2).
The two datasets generated using Algorithm 8 are presented in Figure 1. Note that, for visualization purposes, the scaling parameter, α, the perturbation matrix, E, and the perturbation vector, e, used to create each dataset were set to rather large values.
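Putting Steps 4-7 together, a possible end-to-end data generator looks as follows; the perturbation magnitudes are illustrative, and random directions stand in for the Fibonacci-sphere points of Steps 1-3:

```python
import numpy as np

rng = np.random.default_rng(3)
K, sigma = 300, 0.005

# unit-norm field directions (Algorithm 8 would use a Fibonacci sphere here)
m = rng.standard_normal((K, 3))
m /= np.linalg.norm(m, axis=1, keepdims=True)

# Steps 4-5: T = alpha (I + E), h = e, Eq. (66), with small random perturbations
alpha = 1.0
E = 0.05 * rng.standard_normal((3, 3))
e = 0.1 * rng.standard_normal(3)
T = alpha * (np.eye(3) + E)
h = e

# Steps 6-7: y_k = T m_k + h + eps_k, per the measurement model (2)
eps = sigma * rng.standard_normal((K, 3))
y = m @ T.T + h + eps
```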

Experiment Setup and Evaluation Criteria
To evaluate the performance of the algorithms, we used synthetic data generated by Algorithm 8 and executed a great number of Monte Carlo simulations. Each simulation consisted of 250 runs of each algorithm, while in each run the same dataset was used as input to all algorithms. Uncertainty was introduced in the generation of each dataset by considering a statistical distribution for the elements, E_ij, of the perturbation matrix, E, and the elements, e_i, of the perturbation vector, e (see (66)). Specifically, for the Monte Carlo simulations we assumed distributions for E_ij and e_i parametrized by two scalars, β and γ, the effect of which was tested using multiple Monte Carlo simulations. Note that we considered the scaling factor, α, to be close to the ideal value of α = 1. That may not be the case when real-world measurements are used; however, it is trivial, and common, to properly scale the measurements before the calibration procedure and remove gross scaling errors. In this way, the algorithms are not burdened with searching for a scaling relationship which can easily be provided by simple data preprocessing. A challenging point while setting up the experiments was to determine the number of samples in each dataset and the value of the sensor's noise variance, σ². We considered a dataset of 300 measurements a solid choice for a simulation environment, based on [4,7], while we experimentally confirmed that bigger datasets do not improve the performance of any algorithm. We also examined the performance of the presented algorithms when smaller datasets, consisting of 150 and 50 measurements, are used. As far as the noise variance, σ², is concerned, we considered a nominal value of σ = 0.005, following [2,4], while we also simulated the cases of noisier (σ = 0.05) and less noisy (σ = 0.0005) sensors.
The evaluation of the algorithms in each Monte Carlo simulation was carried out in terms of accuracy, execution speed and robustness. We used the execution speed of each algorithm as a metric of computational efficiency, defined as the inverse of the mean execution time. As a metric of robustness, we considered the percentage of datasets for which each algorithm successfully derived a meaningful solution.
The definition of an accuracy metric is a little more involved. Each algorithm takes as input the measurement vectors y_k, k = 1, 2, . . . , K, and outputs the calibration parameters T and h. The output bias vector h is compared with the true one, h_true, which was used in the data generation procedure, by defining the following cost

J_h = ‖h − h_true‖.   (69)

The calibration matrix T, on the other hand, is derived under a rotational uncertainty, and comparing it with the true one, T_true, is a more challenging task.
Consider the measurement model of (2). Noting that the true magnetic field vector in (2) is also unknown, and is derived by the calibration algorithm, the field can only be recovered up to a rotation; that is, we can write

m̂ = R^T m,   (70)

where R is an orthogonal matrix in the group O(3). Thus, taking into account the rotational invariance of the Euclidean norm, which implies that ‖R^T m‖ = ‖m‖, a calibration algorithm may output any matrix T of the form T = T_true R. A proper cost function to compare T and T_true is therefore

J_T = ‖T − T_true R‖_F,   (71)

where the matrix R is defined as the solution of the following minimization problem

R = argmin_{R′ ∈ O(3)} ‖T_true R′ − T‖_F.   (72)

The solution of (72) is given by the orthogonal Procrustes problem [33], and it is

R = U V^T,   (73)

where the matrices U and V are derived from the singular value decomposition (SVD) of the matrix T_true^T T, i.e., T_true^T T = UΣV^T, where U, V ∈ O(3) and Σ is a diagonal matrix. Using (69) and (71), we define the following cost function as a metric of accuracy

J = J_T + J_h = ‖T − T_true R‖_F + ‖h − h_true‖.   (74)

Based on the above, and given the results of a Monte Carlo simulation consisting of N executions of each algorithm, we define the following performance metrics:
• Accuracy: the mean value of the cost J, defined in (74), across all N executions with meaningful output.
• Mean execution time: the mean value of the execution time of an algorithm.
• Robustness: the percentage of datasets for which each algorithm successfully derived a meaningful solution.
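The rotation-invariant comparison of T with T_true can be sketched with NumPy as follows. This is a minimal illustration; the function name is ours, and combining the matrix and bias residuals by simple addition is an assumption consistent with the costs defined above:

```python
import numpy as np

def accuracy_cost(T, h, T_true, h_true):
    """Rotation-invariant accuracy cost comparing a calibration output
    (T, h) with the ground truth (T_true, h_true).

    The aligning rotation R solves the orthogonal Procrustes problem
    min_{R in O(3)} ||T_true R - T||_F, whose solution is R = U V^T
    with the SVD T_true^T T = U S V^T."""
    U, _, Vt = np.linalg.svd(T_true.T @ T)
    R = U @ Vt                                 # optimal aligning rotation
    J_T = np.linalg.norm(T - T_true @ R)       # Frobenius-norm residual
    J_h = np.linalg.norm(h - h_true)           # bias residual
    return J_T + J_h
```

By construction, a calibration matrix that differs from T_true only by a rotation incurs zero matrix cost, which is exactly the ambiguity the Procrustes alignment removes.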
The robustness criterion can be seen as the frequency with which an algorithm provides a better solution (T, h), in the sense of the cost function (74), than the trivial solution (I_3×3, 0_3×1), which assumes no bias and no multiplicative errors. Given the cost J_o (75), i.e., the value of (74) for the trivial solution, an execution of an algorithm is considered successful, with meaningful output, when

J ≤ δ J_o,   (76)

where δ ∈ (0, 1) is a robustness parameter. If δ is close to 1, only a little improvement with respect to J_o is sufficient; as δ gets smaller, better solutions are required. Thus, this parameter can be tuned with respect to the test's objective and the application's specifications. Given N runs of an algorithm, its robustness is denoted by RB(%) and is defined as

RB = (100/N) Σ_{i=1}^{N} U(J_i ≤ δ J_oi),   (77)

where J_i and J_oi are the values of J (74) and J_o (75), respectively, corresponding to the ith run of the algorithm, and U is a boolean function which is one if its argument is true and zero otherwise. Let M denote the number of executions with meaningful outputs. The accuracy metric is only applied to the M meaningful outputs according to the robustness test (76), since otherwise the comparison would be unfair to the least stable algorithms. The accuracy of an algorithm over a dataset is denoted by ρ and is defined as

ρ = (1/M) Σ_{i=1}^{M} J_i,   (78)

which is the mean accuracy metric value over the M executions with meaningful outputs. Similarly, the time-efficiency metric (i.e., mean execution time) is only applied to the M executions with meaningful outputs, again because otherwise the comparison would be unfair to the least stable algorithms. The mean execution time of an algorithm over a dataset is denoted by τ and is defined as

τ = (1/M) Σ_{i=1}^{M} t_i,   (79)

where t_i is the time needed for the ith run to be completed. The execution speed of an algorithm is defined as 1/τ.
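The aggregation of per-run results into the robustness, accuracy, and time metrics can be sketched as follows. The helper name and argument layout are ours; the inputs are the per-run costs and times of one Monte Carlo simulation:

```python
import numpy as np

def performance_metrics(J, J_o, t, delta=0.5):
    """Aggregate per-run results of a Monte Carlo simulation.

    J     : length-N array of accuracy costs, one per run.
    J_o   : length-N array of trivial-solution costs.
    t     : length-N array of execution times.
    delta : robustness parameter in (0, 1); smaller values demand a
            larger improvement over the trivial solution.
    Returns (RB, rho, tau, speed)."""
    J, J_o, t = np.asarray(J), np.asarray(J_o), np.asarray(t)
    meaningful = J <= delta * J_o        # robustness test: success per run
    RB = 100.0 * meaningful.mean()       # robustness, in percent
    rho = J[meaningful].mean()           # accuracy over the M successes only
    tau = t[meaningful].mean()           # mean execution time over successes
    return RB, rho, tau, 1.0 / tau       # execution speed = 1 / tau
```

Note that, as in the text, the accuracy and timing averages deliberately exclude failed runs so that unstable algorithms are not penalized twice.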

10.1.2. Baseline Evaluation
To derive a baseline evaluation of the presented algorithms, we ran a Monte Carlo simulation considering typical values for the sensor's error and noise parameters. In this simulation we neglected the effect of hard-iron and soft-iron distortions, which are, in some cases, the dominant terms of the overall error, as well as extreme cases of large manufacturing imperfections. More specifically, 250 different datasets, consisting of 300 measurements each, were generated following Algorithm 8 and considering the following distributions of the model disturbances and the measurement noise:

E_ij ∼ U[−β, β],  e_i ∼ U[−γ, γ],  β = γ = 0.05,  σ = 0.005.   (80)

The distribution ranges in (80) are based on our literature review. The selection β = γ = 0.05 corresponds to the typical case of approximately 5% distortion for the matrix T and the bias h. The measurement noise standard deviation is set to a typical value of σ = 0.005 [2,4].
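The sampling of the model disturbances for one synthetic dataset can be sketched as follows. This is a minimal illustration of the uniform perturbations described above; the full data generation procedure (Algorithm 8) also involves the measurement directions and noise, and the function name is ours:

```python
import numpy as np

def sample_disturbances(beta=0.05, gamma=0.05, rng=None):
    """Draw one random perturbation (T_true, h_true) for a synthetic dataset.

    E_ij ~ U[-beta, beta] perturbs the ideal (identity) calibration matrix,
    e_i  ~ U[-gamma, gamma] perturbs the ideal (zero) bias vector."""
    rng = np.random.default_rng() if rng is None else rng
    E = rng.uniform(-beta, beta, size=(3, 3))   # perturbation matrix E
    e = rng.uniform(-gamma, gamma, size=3)      # perturbation vector e
    T_true = np.eye(3) + E                      # calibration matrix near identity
    h_true = e                                  # bias (offset) vector
    return T_true, h_true
```

With the baseline choice β = γ = 0.05, every element of T_true stays within 5% of the identity matrix and every bias component within 0.05 of zero.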
The performance of the seven algorithms is presented in Table 2.

10.1.3. The Effect of the Offset Vector, h

Under extreme manufacturing imperfections or the effect of hard-iron distortion, the magnitude of the offset vector, h, can be much larger than in the typical case. In this section, we examine how larger values of h affect the performance of the presented algorithms. To do so, we ran six Monte Carlo simulations, each comprised of 250 different datasets generated by following Algorithm 8. The offset vector perturbation parameters e_i were simulated with gradually increasing magnitude by expanding the range of the uniform distribution U[−γ_l, γ_l], where l = 1, 2, . . . , 6 is the index of the Monte Carlo simulation, and the corresponding impact on each algorithm's robustness and accuracy was investigated. The extreme case of γ = 1 addresses the possibility of the bias being clearly comparable, or even indistinguishable, with respect to the true magnetic field vector. Therefore, as γ increases, the algorithms are driven to their limits and their functional range is identified. All the other parameters were kept nominal to ensure a fair comparison. The results of the six Monte Carlo simulations are presented in Figure 2.

10.1.4. The Effect of the Perturbation Matrix, E

Similarly, we examined how larger perturbations of the calibration matrix affect the algorithms, running six Monte Carlo simulations with the range of the uniform distribution of the elements E_ij expanded to U[−β_l, β_l], where l = 1, 2, . . . , 6 is the index of the Monte Carlo simulation. As β increases, the algorithms are driven to their limits and their functional range is identified. All the other parameters were kept nominal to ensure a fair comparison. The results of the six Monte Carlo simulations are presented in Figure 3.

10.1.5. The Effect of the Dataset Size, K

In this section, we examine how the dataset size, K, affects the algorithms' performance. In general, the diversity of the measurement directions is more crucial than their quantity; e.g., a dataset of 50 measurements with directions distributed nearly uniformly on the unit sphere is significantly more suitable for the algorithms than one with thousands of measurements all having approximately the same direction.
According to the existing literature [4,5,7], on the order of 300 measurements with directions sufficiently covering the unit sphere form an acceptable dataset for the calibration. Here we use datasets with 50, 150, and 300 measurements to test the algorithms' limits. To do so, we ran three Monte Carlo simulations, each based on 250 different datasets generated by Algorithm 8. The dataset size K varied, whereas the distributions of the model disturbances and the measurement noise were fixed to their nominal values (β = γ = 0.05, σ = 0.005) to ensure a fair comparison. The results of the three Monte Carlo simulations are presented in Figure 4. In general, the dataset size, K, does not seem to be important in terms of robustness. Accuracy is surprisingly high even with only 50 measurements, which is probably an outcome of the well-distributed measurement directions generated using the Fibonacci lattice. Furthermore, the algorithms' execution time appeared to be linear in K.
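The near-uniform distribution of measurement directions mentioned above can be generated with a Fibonacci (golden-angle) lattice; a minimal sketch:

```python
import numpy as np

def fibonacci_sphere(K):
    """Return K unit vectors nearly uniformly distributed on the unit
    sphere, constructed with the Fibonacci (golden-angle) lattice."""
    i = np.arange(K)
    phi = np.pi * (3.0 - np.sqrt(5.0)) * i     # golden-angle longitude steps
    z = 1.0 - 2.0 * (i + 0.5) / K              # latitudes uniform in z
    r = np.sqrt(1.0 - z**2)                    # radius of each latitude circle
    return np.column_stack((r * np.cos(phi), r * np.sin(phi), z))
```

Even for K = 50, the resulting directions cover the sphere well, which is consistent with the observation above that small but well-distributed datasets retain high calibration accuracy.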
10.1.6. The Effect of the Noise Variance, σ²

In this section, we examine the influence of the measurement noise variance, σ², on the algorithms' robustness and accuracy. Pure white Gaussian noise was assumed in the measurement model. We considered a nominal value of σ = 0.005, following [2,4], while we also simulated the cases of noisier (σ = 0.05) and less noisy (σ = 0.0005) sensors. With these choices, we represented the algorithms' capabilities under three different orders of magnitude of the measurement error. To do so, we ran three Monte Carlo simulations, each based on 250 different datasets generated by following Algorithm 8. All parameters except σ were set to their nominal values to ensure a fair comparison. The results of the three Monte Carlo simulations are presented in Figure 5. All algorithms appear to be robust to changes of the measurement noise level σ, in the sense of still deriving meaningful solutions. It is worth mentioning that an increase of one order of magnitude in the noise level resulted in a decrease of one order of magnitude in accuracy for most algorithms.

Algorithm Evaluation Using Real Data
In this section, the aforementioned algorithms are tested using real data. Multiple datasets captured by low-cost magnetic field sensors were used to verify the algorithms' performance under real-world conditions. In this case, the parameters T_true and h_true are not known in advance, and therefore the accuracy metric (74) cannot be used. Since the measurements took place at a specific location, a constant magnitude of the magnetic field vector, ‖m‖ = 1, was considered. As a result, a proper cost function to evaluate an algorithm's effectiveness is one penalizing the deviation of the magnitude of the estimated magnetic field vector from unity, where K is the number of measurements and k = 1, 2, . . . , K is the measurement index. The estimated magnetic field vector m̂_k, for each k, is derived from the measurement y_k using the outputs T and h of a calibration algorithm. Such a cost function is used by Wu and Shi (52), as well as by Papafotis and Sotiriadis (65).
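A norm-consistency cost of this kind can be sketched as follows. This is an illustration only: the quadratic penalty on the norm deviation and the convention m̂_k = T(y_k − h) are assumptions we make here, not necessarily the exact form of (81) or of the measurement model (2):

```python
import numpy as np

def real_data_cost(T, h, Y):
    """Norm-consistency cost for real data, where the true field is
    unknown but has constant unit magnitude.

    Y : (K, 3) array of raw measurements y_k. Assuming the calibrated
    estimate m_k = T (y_k - h), the cost penalizes deviations of
    ||m_k|| from 1, averaged over all K measurements."""
    M = (Y - h) @ T.T                       # m_k = T (y_k - h), row-wise
    norms = np.linalg.norm(M, axis=1)       # estimated field magnitudes
    return np.mean((norms - 1.0) ** 2)      # mean squared norm deviation
```

A perfectly calibrated sensor in a constant unit-magnitude field yields a cost of zero; residual hard-iron or soft-iron effects inflate it.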
To evaluate the performance of the presented algorithms, we used two off-the-shelf, low-cost magnetic field sensors, which are typically found in commercial electronic devices, such as smartphones, activity trackers, etc. More specifically, we captured a total of 30 datasets using the LSM9DS1 by STMicroelectronics and the BNO055 by Bosch Sensortec. The operation parameters of the two sensors during the experiment are presented in Table 3.
During the experiment, the two sensors were fixed on the same rigid platform, which was rotated by hand into several orientations. In Figure 6a, the mean value of the cost function (81) across all the recorded datasets is presented for every algorithm as a metric of accuracy. The robustness of each algorithm, as defined in (77), is presented in Figure 6b. Note that both Figure 6a,b are in agreement with the results obtained in Section 10.1.2, where synthetic data with typical values for the sensor's noise and measurement distortion were considered. Figure 6. Performance characteristics of the presented algorithms using multiple datasets of real data from two different commercial magnetic field sensors: (a) accuracy; (b) robustness (RB(%)).

Conclusions
To summarize, a complete and extensive study of calibration methods for low-cost magnetometers was carried out. Seven algorithms were selected for this purpose according to their popularity and performance. A standard, unified, and complete linear measurement model was used as the reference model for analyzing all calibration methods. After establishing the full calibration problem, the seven algorithms were discussed and presented in an easy-to-implement way.
In order to fairly evaluate the presented algorithms' performance, we proposed a method for optimally generating artificial magnetometer data. This method was used to execute a plethora of Monte Carlo simulations. The evaluation metrics we focused on were the robustness, accuracy, and efficiency of the algorithms. We designed several experiments to examine the behavior of the algorithms under different bias magnitudes, different non-orthogonality errors, different numbers of measurements, and various noise levels. Finally, several datasets of real magnetometer data, from two different low-cost commercial sensors, were used to verify the results obtained using the artificial data.
The following summarizes our findings regarding the studied algorithms and their possible implementation. Apart from the objective criteria that we established in Section 10 to evaluate and compare the presented algorithms (accuracy, robustness, computational efficiency), in Table 4 we also evaluate the algorithms in terms of simplicity. Simplicity is a (subjective) metric describing our personal experience in developing and testing the algorithms. It is related both to the algorithmic complexity of the algorithms (which is not an inherent disadvantage) and to the quality of their presentation in the original manuscripts. The algorithms are discussed in chronological order of publication.

PSO-based algorithm: Robust and accurate. Very high computational cost. Some prior knowledge of the search space is beneficial. At the beginning of the algorithm the unknown variables are randomized, and thus it is not always ensured that the algorithm will reach an optimal point; a couple of repetitions might be needed. Using modern PSO algorithms, which can constrain the search space and handle a few variable inequalities, increases the algorithm's performance significantly.

Wu and Shi: Difficult to implement. Characterized by high time-complexity. Exceptional accuracy, even with larger distortion. We noticed a 1% failure rate in finding an initial estimate, due to the Cholesky decomposition not being applicable.

MAG.I.C.AL: Easy to implement. Moderately time-efficient. Exceptional robustness and accuracy in both the synthetic and the real data experiments.

To conclude, in this work we tried to cover a broad range of realistic cases and to test the limits of the algorithms, noting that in real life the performance requirements differ from one application to another. In some applications, computational efficiency may be of major importance while great accuracy may not be needed; in others, a very accurate calibration is essential even if significantly more computation time is required.
Thus, there is no "perfect" algorithm appropriate for all applications; different algorithms may be more appropriate for different cases.
Author Contributions: Investigation, K.P., D.N. and P.P.S.; writing, original draft, K.P. and D.N.; writing, review and editing, K.P., D.N. and P.P.S. All authors have read and agreed to the published version of the manuscript.

This section (Section 8) presents the algebraic expressions for the gradient and Hessian of the likelihood function (51), used in descent optimization methods. For notational simplicity, T̂ and m̂ are replaced by T and m. Letting u_k = y_k − h, the Jacobian vector and Hessian matrix are, respectively, derived as follows.