Direction of Arrival Method for L-Shaped Array with RF Switch: An Embedded Implementation Perspective

This paper addresses the challenge of implementing Direction of Arrival (DOA) methods for indoor localization using Internet of Things (IoT) devices, particularly with the recent direction-finding capability of Bluetooth. DOA methods are complex numerical methods that require significant computational resources and can quickly deplete the batteries of small embedded systems typically found in IoT networks. To address this challenge, the paper presents a novel Unitary R-D Root MUSIC for L-shaped arrays that is tailor-made for such devices utilizing a switching protocol defined by Bluetooth. The solution exploits the radio communication system design to speed up execution, and its root-finding method circumvents complex arithmetic despite being used for complex polynomials. The paper carries out experiments on energy consumption, memory footprint, accuracy, and execution time in a commercial constrained embedded IoT device series without operating systems and software layers to prove the viability of the implemented solution. The results demonstrate that the solution achieves good accuracy and attains an execution time of a few milliseconds, making it a viable solution for DOA implementation in IoT devices.


Introduction
Direction of Arrival (DOA) estimation is an array signal processing application found in radars, navigation, military devices, and medical appliances [1]. Most recently, it has been applied in wireless communication systems such as Bluetooth to localize devices in indoor environments where satellite-based radio-navigation systems fail to do so. In 2015, Bluetooth released its positioning technology based on the Received Signal Strength (RSS) [2]. In that past technology, a Bluetooth Low Energy (BLE) tag transmitted radio frequency (RF) signals to receivers. The receivers could estimate the positions of such tags from RSS measurements. The positions were not precisely estimated, in fact, the tag could be anywhere inside a circular zone if trilateration was used, which is a typical RSS-based positioning method. According to Bluetooth itself, such technology has an accuracy of a few meters (1-10 m) [3], and even if more complex algorithms are employed, there are limitations on the achievable positioning accuracy. In some cases, such accuracy of a few meters is enough, but some occasions need higher accuracy, for example, machine navigation for autonomous mobile robots, drones, industrial automation, or navigation systems that guide people in indoor environments. Higher accuracy is also desirable in asset tracking, where factories track material workflow and hospitals track equipment location. In IoT networks, it is important to note that nodes are often constrained embedded systems. These systems typically consist of three distinct subsystems [8], as illustrated in Figure 2. The microcontroller or computing subsystem is responsible for controlling the node's functionalities, and runs instructions on a low-power processor that commonly operates at a few megahertz. It also has flash memory and RAM whose sizes are in order of kilobytes. Flash memory is a non-volatile memory that stores the program's instructions and constant data values, whereas RAM is a volatile memory that is used as data storage for the program while it is running. Nodes usually have a simple real-time operating system that executes all functionalities, also known as tasks, concurrently. The sensor subsystem collects data from natural phenomena in the form of analog signals via sensor readings and transforms them into digital signals using an Analog-to-Digital Converter (ADC). Finally, the communication subsystem is made up of a transceiver and normally a co-processor device that is responsible for transmitting and receiving data. While Bluetooth specifies a protocol for transmitting a constant tone for DOA purposes, the development of direction of arrival algorithms is left to implementation. However, the implementation of DOA methods in IoT devices presents a significant challenge as these devices are often battery-powered and resource-constrained embedded systems with limited computational resources as previously mentioned. In contrast, DOA methods consist of complex numerical algorithms that are resource-intensive and time-consuming, potentially leading to rapid battery depletion, unacceptable execution times, and resource starvation. Moreover, IoT devices typically run a simple real-time operating system that executes small tasks concurrently, such as data acquisition from sensors and communication with other devices. The execution of DOA methods in such a multi-threaded environment is even more challenging, and computational resource management needs to be carefully thought out.Therefore, developing DOA algorithms for IoT devices demands an innovative approach that considers the balance between resource constraints and DOA accuracy, without compromising battery life or real-time system performance.

Literature Review
Typically, research about DOA does not take into account the implementation point of view sufficiently. In many cases, research is carried out using multi-paradigm programming languages such as MATLAB, which already have a range of pre-made general-purpose numerical functions, resulting in little motivation to develop tailor-made numerical algorithms for DOA methods. However, when implementing these methods in constrained embedded systems, numerical algorithms must be developed from scratch, as C programming language libraries offer very limited support. Additionally, widely-used linear algebra libraries such as LAPACK [9] and Armadillo [10] are not suitable for use in constrained embedded systems. Although the Common Microcontroller Software Interface Standard (CMSIS) DSP Software Library is designed for these systems, it lacks some numerical algorithms used in DOA methods. Furthermore, DOA estimations are typically used in radars or large antenna arrays that employ much more powerful processors than those in constrained embedded systems. While accuracy is usually the primary performance criterion of interest, energy consumption, and memory usage are also crucial considerations for battery-powered IoT devices. Therefore, DOA methods should not occupy significant amounts of memory, blocking other IoT tasks. Additionally, complex numerical algorithms require considerable computation, which affects the execution time, an essential performance criterion for real-time applications.
The array of antenna used in this research is the L-shaped array. The L-shaped array has an interesting propriety since it is composed of two orthogonal Uniform Linear Arrays (ULAs). One-dimensional (1D) DOA methods can be applied separately for two ULAs to estimate the azimuth and elevation angles. Other shapes of planar arrays, such as Uniform Rectangular Arrays (URAs) and Uniform Rectangular Frame Arrays (URFAs), rely on two-dimensional (2D) DOA methods which are more complicated than their 1D versions. Moreover, well-known fast algorithms were specifically developed for ULAs. Additionally, some exploit the Vandermonde structure in the signal model found in such an array to speed up their computations. Namely, Root Multiple Signal Classification (Root MUSIC) [11], Estimation of Signal Parameters via Rotational Invariant Techniques (ESPRIT) [12], Fourier Domain MUSIC [13], and Rank-Reduction Method (RARE) [14]. Most recently, many modified versions of the cited methods have been devised, claiming to be a better version in a certain way. Nevertheless, ESPRIT was later designed for URAs and Uniform Circular Arrays (UCAs) [15], whereas Fourier Domain MUSIC can be applied to non-uniform linear arrays as well.
Among many DOA methods, Multiple Signal Classification (MUSIC) [16], invented in the 80s, is an important algorithm that has been extensively tested in simulation and the real world for many decades as well as comprehensively studied, culminating in some well-known modified versions. Notably, the standard MUSIC detects DOAs by searching for peaks in the spatial spectrum. It also can be extended to find azimuth and elevation angles by searching for peaks in a 2D spatial spectrum of the planar array of antennas. However, that 2D search is a prohibitively expensive operation which motivated the development of Reduced-Dimension (R-D) MUSIC [17] that exploits the structure of an L-shaped array of antennas to do that search in two 1D spatial spectra, one for each ULA. Knowing Root-MUSIC is a search-free method that exploits the Vandermonde structure of ULAs to apply a root-finding method that substantially reduces its execution time, the Reduced-Dimension (R-D) Root MUSIC came naturally as a modified version that executes Root MUSIC two times, one execution for each ULA.
Research about the BLE direction finding is still scarce as it is a recent feature released in 2017. In [18], research was conducted involving real-world experiments using that BLE feature in an indoor environment focused on array calibrations using two ULAs and one URA. It reported that mutual coupling had a minimal influence or even had no clear improvement on the accuracy of a variation of MUSIC. However, Carrier Frequency Offset (CFO) compensation substantially impacted it. In [5], four 4 × 4 URAs were employed in an indoor environment of 25 m × 15 m wide composed of pillars, walls, human movements, tables, chairs, devices, and lamps. In total, 8 eight distinct positions were estimated, and 32 different estimations were evaluated using BLE direction finding and the MUSIC method. The average error was 0.7 m, attaining a good result concerning the sub-1-meter accuracy purpose.
Papers about real-world implementations are less common than about in a simulation. In [19], a modified version of MUSIC was developed in a Digital Signal Processor (DSP) for underwater acoustic sources. Notably, it employed a Reduced-Order Root-MUSIC method that reduces the polynomial order of Root MUSIC to speed up computations. In [20,21], researchers conducted a small-scale experiment using a ULA with four elements and one single transmitter. The array of antennas was connected to the NI PXI platform, and an antenna transmitter was connected to another PX platform. After running a single-source MUSIC and a Total Least Squares ESPRIT in LabView NI hardware to estimate two different DOAs, they concluded both methods could be used in real-time applications. Moreover, the development of 2D MUSIC for an L-shaped antenna array to estimate the azimuth and elevation angles based on parallel computing was successfully devised for a Digital Signal Processor (DSP) [22]. More specifically, researchers parallelized the eigenvalue decomposition to construct the signal or noise subspace and the peak searching method to find DOAs by exhaustive search. Despite the parallelization, the execution time of parallel MUSIC was 190.39 ms, which can be seen as slow considering that the experiment used a DSP of 1 GHz.
Most of the research did not focus on the implementation aspect regarding the optimization of the method to be adapted to constrained embedded systems. Except for one cited paper, the rest did not evaluate other important performance criteria such as execution time, energy consumption, and memory footprint. This research takes a step further by adapting the R-D Root MUSIC for a Radio Frequency (RF) switch based on Bluetooth specification. Additionally, the adapted method applies unitary transformations to avoid complex arithmetic, and a real-valued Eigenvalue Decomposition Method (EVD) is employed to find the roots of complex-valued polynomials. The novel method also exploits the radio communication system design of Bluetooth to speed up its execution based on BLE receivers' capability of estimating one single DOA only.
Notation: by defining the complex number z = ce jθ , the operators Re (·) and Im (·) return the real and imaginary part of z, respectively. The argument of a complex number arg (·) is an operator that returns the phase angle (θ) in the interval [−π, π] while (·) is the complex conjugate. The complex absolute operator is defined as |·| = Re(·) 2 + Im(·) 2 . The operator diag (·) denotes a diagonal matrix formed by an input vector. The Hermitian transpose is (·) H which applies the transpose and complex conjugate on a matrix. The operator · is the floor function, that is, it outputs the greatest integer less than or equal to the input which is a real number. The Hadamard product (•) is an operator that takes two matrices, for example, A and B of the same dimension, and outputs another matrix whose elements are given by (A • B) ij = (A ij )(B ij ). I n is an identity matrix while 0 n is a zero column vector both of size n. Figure 3 shows the structure of the L-shaped uniform array. It is composed of two orthogonal uniform linear arrays of M antennas in the X-Y plane in which the distance between two adjacent antennas is ∆. All antennas are assumed to be identical, isotropic, and omnidirectional. Suppose there are d (d < M) independent far-field narrowband stationary signals, s i (t) such that i = 1, . . . , d, incident on the array plane at 2D angle (θ 1 , φ 1 ), (θ 2 , φ 2 ), . . . , (θ d , φ d ) in which θ i is the azimuth and φ i is the elevation angle. Let us also assume the signals propagate in an AWGN channel with linear and isotropic transmission medium. DOA methods compute the broadside angle, which is the signal direction measured relative to the line perpendicular to the array. Since that array is composed of two ULAs, there are two broadside angles, α i , and β i , as shown in Figure 3, which correspond to the x-axis and y-axis ULAs, respectively. From geometric properties, the relation between α i and β i with azimuth and elevation are shown in Equations (1) [23], Assume that the L-shaped array is not subject to nonidealities such as mutual coupling and cross-polarization effect. Additionally, consider that all antennas are identical and have omnidirectional gain functions, i.e., g(θ, φ) = 1. The model of the signal samples received by x-axis and y-axis arrays at a timestamp t can be expressed as Equations (2) and (3) [24],

Mathematical Model for L-Shaped Uniform Array
where T is the array observation at timestamp t of the x-axis ULA which is a vector of the signal samples for each individual antenna in the x-axis ULA, such that x i (t) corresponds to a single signal sample received from the antenna i at timestamp t. Likewise, for y(t) = y 1 (t) y 2 (t) . . . y M (t) T in the case of y-axis ULA.
Moreover, s(t) ∈ C d×1 is a vector of signals of d sources, n x (t), n y (t) ∈ C M×1 are the additive white Gaussian noise (AWGN) and A x , A y ∈ C M×d are the ideal steering matrices of the x-axis array and y-axis array, respectively, as shown in Equations (4) and (5), where the ideal array responses are defined in Equations (6) and (7), in which and c is the speed of light, f c is the carrier frequency Moreover, in our analyses, it would be useful in some cases to consider the model signal samples for the whole L-shaped array instead of two separate ULAs. Let us define T as the array observation of the L-shaped array at timestamp t which is a vector of the signal samples for each individual antenna in the Lshaped array, such that h i (t) corresponds a single signal sample received from the antenna i at timestamp t. The signal samples of an L-shaped array are composed of h i (t) = x i (t) such that i = 1, . . . , M − 1, the common antenna h M (t) = x M (t) = y 1 (t) for both ULAs in addition to h M+j−1 (t) = y j (t) such that j = 2, . . . , M. The model of the signal samples received by the L-shaped array at a timestamp t can be expressed as Equation (8), where n h (t) ∈ C (2M−1)×1 is the additive white Gaussian noise (AWGN) and A h ∈ C (2M−1)×d is the ideal steering matrix of the L-shaped array, respectively, shown in Equation (9), if each element of the ideal L-shaped array response, a(α, β) ∈ C (2M−1)×1 , is represented in Equation (10), where 1 ≤ k x ≤ M and 1 ≤ k y ≤ M, hence the L-shaped array response can be expressed in a compact form in accordance with Equation (

Unitary Reduced-Dimension Root MUSIC
In the standard 2D MUSIC, the array observation of both x-axis and y-axis ULAs are combined in such a way that it is possible to estimate the azimuth and elevation angle of each signal source by performing a 2D search on the spatial spectrum. In contrast, the Unitary Reduced-Dimension (R-D) Root MUSIC computes the two ULAs separately by applying a root-finding method that does not require a 2D search on the spatial spectrum. Consequently, that method is substantially faster than standard 2D MUSIC [17]. Moreover, the Unitary transformation turns centro-hermitian matrices into real-valued ones which further decreases the execution time and reduces memory consumption.
For the standard 2D MUSIC, let z(t) be the combination of the array observation of the x-axis and y-axis in a timestamp t as defined in Equation (12), After collecting N array observations, the method computes its sample covariance matrix expressed in Equation (13), . Under some assumptions [24], the covariance matrix, R zz , is non-singular and its eigenvectors are divided into two subspaces: a noise subspace (U N ) and a signal subspace (U S ). More specifically, the signal subspace (U S ) is composed of eigenvectors corresponding to the d largest eigenvalues. While the noise subspace (U N ) is composed of eigenvectors corresponding to the N − d smallest eigenvalues. Ideally, the N − d smallest eigenvalues are zeros; however, due to AGWN, they are nonzeros. Moreover, the noise subspace is orthogonal to the combined steering vector as indicated by Equation (14), From Equation (14), the spatial spectrum of 2D MUSIC method can be denoted in Function (15), and the angles (α i , β i ), i = 1, . . . , d can be found by performing a 2D search of the d peaks on Equation (15). As previously mentioned, since the 2D search peak finding is prohibitively time consuming, the Unitary R-D Root MUSIC overcomes this problem by performing a root-finding method instead. Before describing the Unitary R-D Root MUSIC, we have to define three matrices that are used in the unitary transformation to convert centro-hermitian matrices into real-valued ones. Let Π p ∈ C p×p be any anti-diagonal identity matrix in keeping with Definition (16), and Q n ∈ C n×n be an unitary transform matrix expressed in Definitions (17) and (18), depending if its size is even or odd. The algorithm is outlined below.

1.
Collect N array observations for timestamp t 1 , t 2 , . . . , t N . Afterward, we can estimate the covariance matrix of the x-axis and y-axis separately in line with Equations (19) and (20), where the N array observations of x-axis and y-axis respectfully.

2.
Apply forward-backward averaging on the covariance matrix, that is,R f b xx =R xx + Π MRxx Π M to deal with coherent signals due to the multipath reflections [25]. Sincê xx is a centro-hermitian matrix, we apply the unitary transformation to convert that complex-valued matrix into a real-valued one, that is, The same goes for the covariance matrix of the y-axis, Apply the real-valued EVD in C x and C y to construct the noise subspace U N,x and U N,y of the x-axis and y-axis, respectively. It is a time-consuming operation that is optimized in this paper.

4.
As previously mentioned, the Unitary R-D Root MUSIC estimates the DOAs based on the roots of two polynomials, so it avoids the exhaustive search of standard MUSIC. Defining z = e −j 2π λ ∆ cos α for the x-axis, the array response in Equation (6) can be redefined as a function of z, as shown in Equation (21), likewise for the y-axis if we define z = e −j 2π λ ∆ cos β . In addition, since a(z) H = a T ( 1 z ), the MUSIC spectrum can be viewed as a polynomial function whose DOA information is contained in some of its roots. The polynomial of the Unitary R-D Root MUSIC for the x-axis and y-axis are defined in Equations (22) and (23) [11], whose coefficients are defined in the relation (25), The same procedure goes to p y (z).

5.
The d complex roots of p x (z) and p y (z) that are inside of a unit circle and closest to it, namely,ẑ x,1 ,ẑ x,2 , . . . ,ẑ x,d andẑ y,1 ,ẑ y,2 , . . . ,ẑ y,d , respectively, contain information about the d DOAs. The azimuth and elevation angles can be found as indicated by the set of equation in (26), Note that finding all roots of a complex-valued polynomial is a difficult task and time-consuming, thus it needs an efficient and accurate root-finding method. The implemented solution circumvents complex arithmetic and finds them by a real-valued EVD, which is described in Section 6.

RF Switch Model
Theoretically, all antennas within an array should sample the signal at the same time at each antenna port. However, this would require each antenna to have its own RF front-end, which includes components such as analog-to-digital converters, filters, mixers, and lownoise amplifiers. Unfortunately, incorporating such analog components for each antenna would lead to increased power consumption, physical size, and higher costs for constrained embedded IoT devices. To address these challenges, it is more appropriate for the array to have a single RF front-end and an RF switch, enabling each antenna to utilize the RF front-end at different times. The Bluetooth protocol considers this radio architecture for its direction-finding capability, and in this research, we opted to utilize the switching protocol outlined in the Bluetooth 5.1 specification [4] with an L-shaped array.
Bluetooth utilizes Gaussian Frequency-Shift Keying (GFSK) where 0s and 1s are modulated into different frequency shifts [26]. The two frequencies are equal to the central frequency ( f c ) in addition to a frequency deviation (± f ∆ ). To comply with the theoretical assumption, the signal should be stationary, that is, a signal with a constant time-frequency is desirable. Thus, the transmitter (signal source) sends a Constant Tone Extension (CTE), composed of a continuous series of digital ones, so the frequency remains the same during the IQ sampling. Notably, if the signal were non-stationary, DOA methods would be more complex [27].
During the CTE, the first 4 µs is the guard period. The reference period, which takes 8 µs, is the beginning of the IQ sampling operation but only one antenna of the receiver (anchor node) carries out the sampling operation. Only one single IQ sample is performed per 1 µs, totaling eight IQ samples. Afterward, a series of sampling and switching operations begin, which is referred to switch-sample period. The switch and sample slot last 1 µs or 2 µs; in this research we consider 1 µs time slot. For every switch slot, the RF switch device switches from one antenna to another, so that another antenna can acquire a single IQ sample during the next sample slot. Since this research considers a 1 µs time slot, there are 74 sample slots in the switch-sample period. Figure 4 depicts all operations during the CTE.  Bluetooth low energy signals can be considered narrowband when using 1 MHz bandwidth in indoor scenarios where the typical delay spread is between 20 ns and 60 ns [28]. Therefore, BLE satisfies the narrowband premise in Section 3, if we take into account all the cited assumptions as well, the mathematical model is the same as Equations (2) and (3) in addition to the phase shift due to the RF switch. Without AWGN, the phase shift between two consecutive samples in the reference period was reported to be about 80 • -100 • [29]. These numbers double for two consecutive samples in the switch-sample period. As a result, it is imperative to develop a phase compensation to make the DOA method work properly. As previously mentioned, the transmitter sends a continuous signal representing digital ones which is the CTE where its carrier frequency f c is between 2402 MHz and 2480 MHz depending on the used channel. The narrowband incoming signal can be expressed in a complex format in Equation (27) [18], where t is the time, and s(t) = I(t) + jQ(t) is called the complex envelope. However, that frequency in order of gigahertz is too high for the ADC, so the receiver RF frontend performs complex downconversion also known as quadrature demodulation. That operation outputs the in-phase and quadrature (IQ) components of u(t) in the baseband in such a way that the central frequency ( f c ) corresponds to a DC [29]. As a result, the IQ components can be expressed in Equation (28), where A is the amplitude, ψ is the initial phase, f ∆ = 250 kHz is the frequency deviation considering LE 1M physical layer [4], and f o is the carrier frequency offset (CFO) which is in order of 10 kHz [29]. Without loss of generality, let us consider that IQ samples from the reference period correspond to the first antenna of the L-shaped array. From Equation (28), the phase shift between two consecutive samples of the reference period without considering AWGN is evidenced by Equation (29), where ∆ t = 1 µs and ∆ f = f ∆ + f o . In other words, it is possible to estimate the phase shift over a 1 µs time period by using the samples of the reference period. However, we are interested in ∆ f since we can use it to estimate the phase shift over other time periods.
Assuming the carrier frequency offset is constant during the CTE, we can estimate ∆ f by calculating the average of the phase difference between two consecutive IQ samples of the reference period, as shown in Equation (30), By adopting the Round Robin switch pattern, each antenna from the L-shaped array carries out IQ sampling sequentially, as shown in Figure 5. Note that the switch pattern begins in the last reference period slot. As a result, the L-shaped array samples 75 IQ samples in total, 74 samples from the switch sample period, and 1 sample from the last reference period slot. Moreover, the array observation, in this case, is defined as one single sequence of the Round Robin pattern, that is, when all antennas in the array complete the IQ sampling. Observe that antenna k performs an IQ sampling 2(k − 1)µs after the array observation starts. It means that the phase shift is e j2π∆ f ∆t k without considering AGWN as indicated by Equation (31), where ∆T k = 2(k − 1)µs. We can generalize the observation of Equation (31). As a result, let h ss (t) be an array observation of a Round Robin sequence that begins at timestamp t, the array h ss (t) relates to h(t) by the phase shift matrix due to the RF switch labeled as O in accordance with Equation (32), such that, where T slot = 1µs and the RF switch phase shift matrix is a diagonal matrix defined in Equation (33), DOA methods such as MUSIC can estimate multiple DOAs during their execution, so radar applications sending sounding signals and measuring when their own signal is received from different reflections can take full advantage of that capability by identifying multiple copies of their own reflected signal. However, in IoT radio communication systems where anchors are employed to locate multiple tags, this is not possible in practice with low-cost single receiver anchor nodes operating at a single RF channel at a given time such as in Bluetooth receivers [30]. That is, if more than one BLE tag sends a signal to an anchor node at the same time and frequency resources, the signal to interference and noise ratio would be too low for that radio receiver to detect transmission reliably. For example, a receiver could not be able to decode both transmitter IDs of the transmitters reliably as each transmission would interfere with the other. Therefore, in this scenario, DOA methods can only estimate a single DOA only per execution. As result, the L-shaped array observation model of one sequence of the Round Robin pattern is defined in Equation (34), Note, s(t) is a scalar that represents one single signal in opposition to the vector s(t) found in Equations (2)  MUSIC was devised considering that all antennas perform IQ sampling at the same time, which is not the case for Bluetooth specification. Thus, without a phase compensation, the accuracy of Unitary R-D Root MUSIC is totally compromised, as shown experimentally in Section 7. The DOA method receives N array observations as the input shown in Equation (35), From Equation (32), we know that O −1 h ss (t) = h(t). Thus, the DOA method needs to apply the RF switch compensation matrix (O −1 ) into the N array observation matrix (H ss ) as expressed in Equation (36), where H = h(t 1 ) . . . h(t N ) . Note that in the real world, the equality in Equation (36) is an approximation, since the RF switch compensation matrix (O −1 ) takes an estimation of ∆ f calculated in Equation (30). Moreover, the Unitary R-D Root MUSIC needs to obtain matrices X and Y, which are the N array observations from x-axis and y-axis ULA, respectively, as defined in step 1 Section 4. To do that, observe that X is equal to the first M rows of H, and Y is equal to the last M rows of H. As a result, step 1 needs an additional operation to obtain matrices X and Y prior to covariance calculations.

Modified Unitary R-D Root MUSIC
The implemented solution optimizes the Unitary R-D Root MUSIC by exploiting Bluetooth's radio communication system design where only a single BLE tag transmits a signal at a time, as discussed in Section 5. It means that the signal subspaces, U S,x and U S,y , is a column vector. As a result, the implemented solution can void applying the time-consuming EVD and instead apply the Power Method, which is a much simpler algorithm. Experimentally, we found the Power Method converge mostly in four iterations only in our solution. Moreover, the computation of the RF switch compensation matrix is performed in a linear time complexity instead of executing the inverse of a matrix with a cubic time complexity. The implemented solution utilizes a finding-root method based on EVD that does not require computing complex arithmetic despite the polynomial having complex coefficients and roots. However, the implemented solution employs the ideal array response. The real array response must be empirically found and plugged into the implemented solution. Refer to [18,31,32] to know how to compute the real array response.
The objective of the optimization is to reduce the memory consumption and execution time of Unitary R-D Root MUSIC to attain satisfactory portability to run in constrained embedded systems. Thus, the algorithms were implemented from scratch in the C99 programming language, except the inverse of sine, the inverse of a tangent, and squared root, which are functions from math.h. The tailor-made numerical methods include the Power Method, an EVD, which is an adaptation of [33] that consists of the Shifted QR Algorithm, the Balancing technique, Hessenberg decomposition, and auxiliary linear algebra algorithms such as the norm of a vector and multiplication of a matrix with a vector. Since one of the objectives of the implemented solution is to attain a minimal memory footprint as much as possible, it does not use the complex.h library from C programming language. Instead, it has a data structure for complex numbers with two variables representing the real and imaginary parts and functions for complex multiplication, addition, division, and conjugation. Notably, the implemented solution only employs math.h and stdint.h libraries, reassuring its minimal computational resources consumption goal and portability.
Due to the switching protocol of Bluetooth 5.1, more operations are required in step 1 of Section 4. That is, the method collects samples from the reference period and N array observations (H ss ) by performing the Round Robin switch pattern. Subsequently, the implemented solution calculates ∆ f from Equation (30) using the samples from the reference period. Afterward, it applies the RF switch phase compensation (Equation (36)) to estimate the matrix H. Then, it separates the IQ samples of the x-axis ULA from the y-axis one. More specifically, X is composed of the first M rows of the estimated H, while Y is the last M rows. Finally, it calculates the two covariance matrices, R xx and R yy from Equations (19) and (20).
The first and simpler optimization concerns Equation (19). As discussed in [34], the matrix X is big for constrained embedded devices since it contains many IQ samples that are complex numbers. In fact, if the implemented solution uses all the IQ samples, X will occupy 600 bytes considering the single-precision floating point. By performing the sample covariance matrix, the code may have to store a temporary matrix X H as well, which would double the memory consumption. The implemented solution does not store X H . To analyze how it is possible, let us define V as a matrix that stores X H . Normally, the standard way to multiply two matrices, particularlyR xx = XV, is evidenced by Equation (37),r Since V = X H , then v(k, j) = x(j, k); therefore, Equation (37) could be written as indicated by Equation (38), Moreover, sinceR xx is Hermitian, which meansr xx (j, i) =r xx (i, j), thus the solution applies matrix multiplication only on its upper triangular part; therefore, Equation (38) is remodeled as the set of relations in (39), From Equation (39), the implemented solution estimates the covariance matrix using the same matrix X twice by applying an element-wise conjugate transpose operation, thus it does not need to store X H . Furthermore, it only computes the upper triangular part of R xx . To sum up, that approach saves execution time by half and memory usage in the order of MN. That is an important improvement, since calculating the sample covariance matrix is the second most time-consuming operation, as shown in Section 7. The same optimization is carried out in Equation (20) for the y-axis ULA.
Another minor optimization concerns the computation of the RF switch compensation matrix, which requires the inverse matrix calculation of O defined in Equation (33). The Gauss-Jordan elimination is a popular algorithm to calculate the inverse of a matrix whose complex is O(n 3 ) [35]. However, since O is a diagonal matrix in which the generic form of its elements is known, we can avoid applying the time-consuming inverse matrix calculation. From complex arithmetic, we know that (e jθ )e −jθ = 1, thus, the inverse of O is in line with Equation (40), Since e −j2π∆ f c ∆T k = e −j2π∆ f c (k−1)∆T 2 , then e −j2π∆ f c ∆T k = (e −j2π∆ f c ∆T 2 ) k−1 for 2 ≤ k ≤ 2M − 1. By defining, z = e −j2π∆ f c ∆T 2 , the inverse of O can be redefined in a more compact form as evidenced by Equation (41), From the redefinition in Equation (41), we can derive the Relation (42), Thus, to calculate O −1 , the implemented solution only needs to set O −1 (1, 1) = 1, compute z = e −j2π∆ f c ∆T 2 , and apply Equation (42) successively for k = 2 . . . 2M − 1 to take advantage of the previous computation. As a result, the implemented solution does not need to compute each element of O −1 explicitly, that is e −j2π∆ f c ∆T k , which may require computing the finite Maclaurin series 2M − 2 times to calculate all of 2M − 2 elements, except the first, which is 1. Notably, the finite Maclaurin series is a well-known method to evaluate complex numbers by computers, as illustrated in Equation (43), where n is the number of elements. Moreover, to reduce memory consumption, the implemented solution just needs to store the diagonal of O −1 as a column vector and compute the element-wise (Hadamard product) of that vector with h ss (t) defined in Equation (32). More specifically, let d = z 0 z 1 . . . z 2M−2 T be the cited column vec-tor, since h ss (t) • d = O −1 h ss (t), the implemented solution calculates H by applying the Hadamard product of d in each element of H ss as shown in Equation (44), replacing Equation (36). Particularly, the computation in Equation (44) is faster than (36), since the RF switch compensation is a matrix in Equation (36), whereas in Equation (44) it is a vector d. Moreover, Algorithm 1 calculates the RF switch compensation (O −1 ) whose complexity is O(n).
Algorithm 1: computation of the RF switch compensation (O −1 ) Input: the carrier frequency offset (∆ f c ) and time slot (∆T slot ) Output: the elements of the diagonal of O −1 stored in d. 1 Define d ∈ C 2M−1 such that d 1 ← 1 and calculate z ← e −j2π∆ f c ∆T 2 .
We can compute the two covariance noise subspaces (U N,x and U N,y ) indirectly via their respective signal subspace (U S,x and U S,y ) by Equations (45) and (46) [36], Remember that the implemented solution only estimates one single DOA, which means, d = 1. Thus, the signal subspace is composed of only one eigenvector, which means we can apply the Power Method [37] that only estimates one eigenvector, which is the signal subspace, as proved in the next paragraph. The noise subspace is composed of N − 1 eigenvectors corresponding to the smallest eigenvalues. Therefore, if we calculate the covariance noise subspace directly, we would apply an EVD method that computes all eigenvectors and eigenvalues. In addition, computing all of them requires a very complicated and time-consuming algorithm in addition to more memory footprint.
Notably, the complexity of the QR Algorithm, a typical method for EVD, is 6n 3 + O(n 2 ) per iteration [38], not to mention the Hessenberg decomposition that could be performed before the QR Algorithm, and an algorithm to create the noise subspace from the probable unsorted pairs of eigenvalue-eigenvector. However, since they could be unsorted, to construct a noise subspace a reasonable approach seems to apply a sorting algorithm that could have an average complexity between O(n log(n)) to O(n 2 ) [39]. While the Power Method has a complexity of O(n 2 ) per iteration, it calculates the signal subspace, requires very simple computations, and experimentally we found it converges mostly in four iterations only in our solution. Figure 6 depicts the algorithm overview of the covariance noise subspace computation. The left one computes the covariance directly while its fastest version (the right one) calculates the covariance indirectly via signal subspace.
To guarantee the Power Method will converge, the matrix must be diagonalizable, there must exist only one eigenvalue with the greatest absolute value, and it must be a real number [40]. For example, considering λ i ∈ R, i = 1, . . . , M to be eigenvalues of a diagonalizable matrix, if |λ 1 | > |λ 2 | ≥ · · · ≥ |λ M | then the cited matrix satisfies the convergence requirements. The matrix C x is a real covariance matrix, thus it is symmetric [41]; therefore, it is diagonalizable, and its all eigenvalues are real numbers [42].  Figure 6. Algorithm overview of the two covariance noise subspace computations. The left method calculates that covariance directly, but the right one calculates indirectly via the signal subspace.
The Power Method outputs the eigenvector and its associated eigenvalue, which is the greatest in magnitude. Here, we prove that the greatest eigenvalue in magnitude is the one of the signal subspace. Note that C x is a real covariance matrix, hence, it is positive semi-definite [41], which means all eigenvalues are non-negative. The line-of-sight (LOS) component of the received signal that constitutes the eigenvalue of the signal subspace is greater than the eigenvalues of the noise subspace [24], and since they are all non-negative, it is not possible to have eigenvalues of the noise subspace equal to or greater than the one of signal subspace in magnitude. Therefore, the eigenvalue of the signal subspace is the greatest in magnitude, hence its corresponding eigenvector is the signal subspace. The same analysis goes to C y .
The implemented Power Method (Algorithm 2) does not compute the eigenvalue since the solution only needs the eigenvector. We considered K = 30 and tol = 10 −6 . We carried out thousands of experiments and we verified that in most cases the Algorithm 2 takes 4 to 5 iterations to converge, and 30 iterations are much more than enough in all experimental instances, hence for k > 30 we assume the algorithm fails to compute the signal subspace.
Finding all roots of a polynomial is a difficult computational task that requires timeconsuming methods, and to aggravate the problem, the polynomial in question has complex coefficients with complex roots. Classical algorithms that operate directly on the polynomial function, such as the Newton-Raphson method, Secant method, and Brent's method, may be hard to work in practice. They only estimate one single root, some are guaranteed to converge if only certain conditions are satisfied and might not work on complex-valued polynomials, and the initial point must be chosen wisely since it could impact their convergence [43,44]. Although they can be extended to estimate multiple roots, they are highly sensitive to computing error since they operate directly on the polynomial function. That is the reason practical computer eigenvalue solvers, such as in MATLAB [45], hardly resemble these root-finding algorithms [46]. Instead, they apply EVD on the polynomial's companion matrix to find the roots. Since such a matrix is non-symmetric, the implemented solution estimates the roots of polynomial p x (z) (or p y (z)) defined in Equation (22) (or (23)) by applying the Shifted QR Algorithm in which its implementation is an adaptation of the algorithm found in [33]. The Shifted QR Algorithm is a well-known method that performs exceptionally well in practice and works even on non-symmetric matrices. It is an EVD method and an improved version of the standard QR Algorithm by including the shifting technique for rapid convergence. This algorithm calculates the eigenvalues of the companion matrix of p x (z) (or p y (z)), which are its roots. To simplify, let us consider a polynomial of the form p(z), where it could be either p x (z) or p y (z).

Algorithm 2: Power Method
Input: covariance matrix C = C x or C = C y . Output: signal subspace U S,x (or U S,y ).
where c i = a i /a 2M−2 ∀i = 0, 1, . . . , 2M − 2. That is, the companion matrix by definition relates to a polynomial in which its highest degree coefficient is one (a 2M−2 = 1), that is the reason we have to divide all coefficients by a 2M−2 as indicated by Equation (48), The matrix P is in a complex domain, but the implemented solution executes the Shifted QR Algorithm for real-valued matrices only. To circumvent this problem, that algorithm solves the complex EVD via equivalent real formulation by converting the complex-valued companion matrix into a real-valued one. That is, the (2M − 2) × (2M − 2) complex eigenvalue problem in Equation (49), (Re(P) + jIm(P)) · (u + jv) = λ(u + jv) can be reformulated as (4M − 4) × (4M − 4) real matrix problem in accordance with Equation (50) [47], Re(P) −Im(P) Im(P) Re(P) However, the eigenvalue (λ) could still be a complex number since the matrix in problem (50) is non-symmetric ( [33], pp. 486-487). That would apparently require complex arithmetic, but the implemented Shifted QR Algorithm circumvents it. Refer to [33] for more detail. The eigenvalue decomposition of P R in the reformulated problem (Equation (50)) outputs two times the number of eigenvalues of P and the complex-valued ones come in pairs of (λ, λ) since P R is a real matrix [46,48]. The implemented solution can easily detect which one, λ or λ, is the eigenvalue of P by verifying which is the root of the polynomial p(z). Moreover, the EVD of P R will increase the computations since its size is two times greater than P, but since the number of antennas (M) is small for an ULA in IoT devices, we can afford this small overburden, especially because the elimination of complex arithmetic could partly or even totally compensate this computational increment.
However, before executing the Shifted QR Algorithm the implemented solution applies the balancing technique, and afterward, it reduces the matrix to Hessenberg form. Both algorithms are an adaptation of [33] for the implemented solution. The balancing technique is a method to reduce the rounding error sensitivity of eigenvalues during the execution of EVD. Hessenberg decomposition transforms a matrix into a simpler one (Hessenberg form) to speed up the execution time of the Shifted QR Algorithms.
Algorithm 3 outputs 4M − 4 eigenvalues in which 2M − 2 are the roots of the polynomial p(z), and the other 2M − 2 are not roots but the complex conjugate of the cited eigenvalues, as explained previously. Since d = 1, ideally the implemented solution needs to find only one single root closest to the unit circle and inside it. However, due to AWGN, that root does not need to be inside the unit circle, in mathematical terms,
Therefore, let's define a real matrix from P, that is, P R Re(P) −Im(P) Im(P) Re(P) ∈ R (4M−4)×(4M−4) . 3 Apply the Balancing technique in P R to reduce the rounding errors sensitivity of eigenvalues. 4 Reduce the balanced P R to Hessenberg form to speed up the execution of Shifted QR Algorithm. 5 Apply the Shifted QR Algorithm to the Hessenberg form of the balanced P R to get its eigenvalues, λ 1 , λ 2 , . . . , λ 4M−4 .
The implemented solution applies Algorithm 4 to solve the optimization problem (51). However, instead of applying the complex absolute operator (|·|), the algorithm uses its squared value (|·| 2 ), to avoid calling the square root method in every iteration of line three. The square root method usually is an iterative algorithm and it needs to converge to a point. However, modern processors have a built-in circuit for square roots. Despite that, by avoiding that operation the execution time of Algorithm 4 is slightly decreased. In lines 2-6, observe that the algorithm finds the eigenvalue (λ solution ), which is the closest to the unit circle. However, its conjugate also is the closest to the unit circle since (|λ solution | 2 − 1) 2 = (|λ solution | 2 − 1) 2 . Thus, in lines 7-9, the algorithm finds which one is the root of the polynomial p(z). That is, if the eigenvalue λ solution is also a root, then |p(λ solution )| 2 1 which means |p(λ solution )| 2 < |p(λ solution )| 2 , otherwise, its conjugate is the root. In that way, the algorithm does not need to check if the eigenvalue is the polynomial root in every iteration (lines 2-6) which saves execution time.
Output: λ solution which is the root of p(z) that is closest to the unit circle.

Experiments
The objective of the experiment consisted of showing the modified Unitary R-D Root MUSIC (implemented solution) works and is feasible for commercial embedded IoT devices. The experiment comprises two parts. In the first part, to check the effectiveness of the RF switch compensation, we compared the implemented solution with the cited compensation and without it in a MATLAB environment only. The second part is more complex. It is composed of a simulation in MATLAB and the real world. In summary, the baseband signals were artificially generated in MATLAB to be the input of the implemented solution developed in C99 programming language and executed in a constrained embedded IoT device. Therefore, we could measure the memory footprint, execution time, energy consumption, and accuracy. To be as accurate as possible, such a device executed the implemented solution only without any operating systems or software layer, that is, the implemented solution was bare-metal programmed. An overview of the experiment is depicted in Figure 7.

Experimental Setup
For both parts of the experiments, the artificial baseband signals were generated using the 5G Toolbox, Phased Array System Toolbox, and Communication Toolbox provided by MATLAB. The simulation parameters are shown in Table 1. The Tapped Delay Line TDL-E channel model (corresponding to Line of Sight propagation) was employed to simulate the multipath propagation phenomenon in indoor environments alongside Additive White Gaussian Noise (AWGN). The center carrier frequency and the frequency deviation correspond to the CTE, and the simulation also randomly generated the CFO between [−30 kHz,+30 kHz] using Gaussian distribution. The CFO values were chosen based on the empirical experiment in [49], which estimated that 99% of CFO values in Bluetooth were within such interval. The L-shaped array is composed of seven isotropic antennas, that is, four antennas for each ULA. This number of antennas is small, thus, it is reasonable for IoT devices. The distance between antennas is half of the Bluetooth signal wavelength.   To measure the memory footprint (RAM and Flash), execution time, energy consumption, and accuracy, we employed a PCA10056 development kit that comes with an nRF52840 System-on-Chip (SoC) having an Arm Cortex-M4 of 64 MHz with a Floating-Point Unit (FPU). The SoC did not use an operating system or software layers. The hardware floating-point instructions and hardware floating-point linkage (-mfloat-abi=hard) were activated so the processor could fully operate the FPU. All devices of the nRF52 series have support for BLE, and although nRF52840 does not have Bluetooth Direction Finding capability, it is almost identical to other nRF52 and nRF53 devices that do have it. Notably, the nRF52 and nRF53 devices are a well-known series of constrained IoT devices developed by Nordic Semiconductor with a radio module of Bluetooth 5.1 or later versions and come with an Arm Cortex-M4 or Arm Cortex-M33 processor. Another popular one is the EFR32BG22 SoC series developed by Silicon Labs, which has the direction-finding capability, ARM Cortex-M33 with 76.8 MHz, up to 512 kB of flash, and 32 kB of RAM. Table 2 shows some SoCs with Bluetooth Direction Finding capability.  [34], and achieve the same accuracy as the other two floating-point formats. The half-precision floating-point (FP16) is another format employed by ARM processors; however, that format is used as a storage format only for Arm Cortex-M4. When operating in FP16, the processor promotes FP16 into FP32 before and demotes it after every calculation [50]. Those operations create a small overhead that could increase the Flash consumption and the execution time. A slower execution time may translate into more energy consumption. Another supported format is double-precision floatingpoint (FP64), but the FPU of Arm Cortex-M4 does not have support for FP64 at all. Thus, the C compiler emulates FP64 calculations [51,52], and that emulation creates an excess of computations culminating in a substantial increment of execution time and Flash usage. In fact, the execution time of DOA methods using FP64 was shown to be about 20 times slower than ones using FP32 [34].
To measure the energy consumption, we connected the Otti Arc (power measurement tool) to the nRF52840, and to measure the execution time, we utilized the Saleae logic analyzer. However, in both measurements we needed to activate a General-Purpose Input/Output (GPIO) port, thus a GPIO was set high and low before and after the execution of the algorithm. So, we could check when the method started and finished to properly carry out the two measurements. Thus, energy usage is slightly overestimated.
Additionally, we measured the stack memory consumption and the relative execution time of the principal operations in the implemented solution. There is no dynamic memory usage. We define the relative execution time as the running time of an operation divided by the execution time of the implemented method in percentage.
Moreover, we calculated the Root Mean Squared Error (RMSE) of accuracy considering a 500 azimuth-elevation pair for each SNR. In mathematical terms, the RMSE of accuracy is defined in Equation (52), where L = 500 is the number of estimated azimuth-elevation pair, and θ andθ is the actual azimuth and its estimation in degrees, respectively. Similarly for the elevation (φ).
The SNR values were 5 dB, 10 dB, 15 dB, 20 dB, 25 dB, and 30 dB. In total, we analyzed 6000 different pairs. Figure 8 shows the comparison between the implemented solution with RF switch compensation (a) and the one without it (b). Both were implemented in MATLAB. The implemented solution without the RF switch compensation is totally inaccurate, whereas the one with it attains much better accuracy as the SNR increases which demonstrates the effectiveness of such compensation. As previously explained, this experiment was the only one carried out totally in a simulation environment. For the next experiments, the implemented solution was run in an nRF52840 SoC. Accuracy is the most important performance criterion of a DOA method. Some companies [53,54] reported an average accuracy of about 5 degrees with an average position accuracy of around 1 m. Thus, it would be desirable for the implemented solution to achieve such a value. Figure 9 shows the RMSE of the accuracy for each SNR run in an nrf52840 SoC, which have almost the same values as one in MATLAB. We clearly see that as the noise decreases the accuracy improves. However, low accuracy is observed between SNR 5-10 dB. However, it should not be a concern since the minimum SNR value for Bluetooth to operate reliably is between 10-15 dB. With less than 10 dB, it is most likely that the receiver fails to decode and the cyclic redundancy check fails as well [55]. With SNR slightly higher than 10 dB, the RMSE values attain less than 5 degrees, reaching our desired result.

Results and Discussions
Nevertheless, the experiments did not consider non-idealities of the antenna array and RF front-end; in fact, the antennas have an ideal isotropic characteristic. As mentioned previously, the implemented solution considers an ideal array response. Since those imperfections deteriorate the accuracy, one should compute a real array response and use it in the implemented solution. The real array response should attenuate the problem, but not completely solve it.  Table 3 shows each main operation of the implemented solution with its respective stack memory consumption and relative execution time. All operations have a stack memory consumption of a few hundred bytes; thus, setting maximum stack memory to be 512 kB in the microcontroller is enough since none of the stack memory surpasses that value. The fifth operation concerns the finding-root method, which took up an incredible 83.17% of the method's execution time. It is not difficult to verify that no other operation demands so much computation as the fifth one, since finding polynomial roots composed of complex roots and coefficients is a difficult numerical computing task. Notably, Algorithm 3 tackles this problem by applying the Shifted QR Algorithm, a complicated method that requires two pre-processing methods to speed up its convergence and accuracy. On the other side of the spectrum, computing the companion matrix (four operations) is the least demanding task. It computes the polynomial coefficients defined in Equation (25), and afterward, it constructs the real companion matrix, which is P R defined in Equation (50). Moreover, the second operation is the second most time-consuming operation. It involves computing Equation (39) and the conversion of a complex-valued into a real-valued covariance as explained in step 2 Section 4. Notably, even though the implemented solution applies the optimization (Equation 39), which reduces the computation by half and stack memory by 600 B, that operation comes in second, which shows the importance of that optimization in reducing both the execution time and stack memory. Furthermore, the third operation constitutes the Power Method (Algorithm 2), and Equations (45) and (46). As explained in Section 6, the optimization avoids the execution of an EVD method, which could be as computationally demanding as the fifth operation. Finally, the first operation is composed of Algorithm 1 and Equations (44) and (30), which are computationally inexpensive. Table 4 shows the memory footprint, execution time, and energy consumption. From these values, the implemented solution satisfies the memory requirements for an nRF52, nRF53, and EFR32BG22 SoC series, as can be verified in [56][57][58]. The execution time is 16.2 ms. By comparison, the standard MUSIC implemented in [59] takes about 18 ms to 133 ms with a median of roughly 31 ms to estimate a single DOA of one ULA in the same SoC (nRF52840). That means that even though the implemented solution estimates DOAs from two ULAs, it is still almost two times faster than the median of the standard MUSIC. A 2D standard MUSIC was implemented in C programming language based on parallel computing using a Digital Signal Processor of 1 GHz and an L-shaped antenna [22]; however, despite the parallelization and a much more powerful processor, it takes about 190 ms. Nonetheless, the implemented solution could be much slower than the modified 2D Unitary TLS ESPRIT in [34]. Its 1D version was developed in the same SoC, and it takes about 0.855 ms using the same precision format (single-precision floating-point). Hence, we can roughly estimate that its 2D version for L-shaped arrays could be two times that value, that is, 1.71 ms if it were developed.
Coin batteries are used for small electronic devices [60], including constrained IoT ones. We found that the capacity of such batteries ranges from 1 mAh to 2000 mAh [61] in a well-established global distributor of semiconductors and electronic components. That means, considering the implemented solution as the only source of energy consumption, the nRF52 series can run from 16,574 to more than 33 million times approximately. Therefore, the implemented solution can be used for battery-powered small embedded devices. However, it is worth mentioning the experiment did not measure the RF front-end, since we did not employ a real array of antennas. Therefore, in practice, we considered that the measurement was evaluated after IQ sampling.

Conclusions
This paper presented a novel Unitary R-D Root MUSIC for L-shaped arrays tailormade for constrained embedded systems using a switching protocol defined by Bluetooth, and a more insightful implementation perspective that is usually not addressed sufficiently in papers. More precisely, the implemented solution exploits the radio communication system design to speed up its execution, that is, it applies a simple Power Method instead of the time-consuming EVD. It also has a root-finding method that circumvents complex arithmetic despite being used for complex polynomials.
In theory, all antennas in the array sample the signal at each antenna port at the same time; however, Bluetooth specifies that the array has an RF switch, so each antenna samples the signal at a different time. Therefore, the theoretical model was slightly modified to consider the RF switch. We showed that without an RF switch phase compensation, the accuracy of the implemented solution was totally compromised. Therefore, we developed a method of RF switch phase compensation and conceived a linear complexity algorithm to compute the phase compensation matrix.
To prove the solution viability, we carried out experiments on energy consumption, memory footprint, accuracy, and execution time in a commercial constrained embedded IoT device (nRF52080) without operating systems and software layers. Notably, except for accuracy, other performance criterion usually are not carried out in research; however, in ours, they were too important to be neglected. With such measurements, we showed the implemented solution viability for IoT devices verified by us, its few milliseconds execution time, and its good accuracy achievement.