The Usage of ANN for Regression Analysis in Visible Light Positioning Systems

In this paper, we study the design aspects of an indoor visible light positioning (VLP) system that uses an artificial neural network (ANN) for position estimation under a multipath channel. Previous results usually rely on the simplistic line-of-sight model, which has limited validity. The study considers the influence of noise as a performance indicator for the comparison between different design approaches. Three ANN algorithms are considered, namely the Levenberg-Marquardt, Bayesian regularization, and scaled conjugate gradient algorithms, to minimize the positioning error (ε_p) of the VLP system. The ANN design is optimized with respect to the number of neurons in the hidden layers, the number of training epochs, and the size of the training set. It is shown that the ANN with Bayesian regularization outperforms the traditional received signal strength (RSS) technique using non-linear least squares estimation for all values of signal-to-noise ratio (SNR). Furthermore, in the inner region, which covers the area of the receiving plane bounded by the transmitters, the positioning accuracy is improved by 43, 55, and 50% for SNRs of 10, 20, and 30 dB, respectively. In the outer region, i.e., the remaining area of the room, the positioning accuracy is improved by 57, 32, and 6% for SNRs of 10, 20, and 30 dB, respectively. Moreover, we analyze the impact of different training dataset sizes on the ANN and show that it is possible to achieve a minimum ε_p of 2 cm at an SNR of 30 dB using a random selection scheme. Finally, it is observed that ε_p remains low even for lower values of SNR, i.e., ε_p is 2, 11, and 44 cm for SNRs of 30, 20, and 10 dB, respectively.


Introduction
The necessity for indoor location-based services has been growing over the past decades due to their significance in the development of various applications, such as smart home appliances, robots, supermarkets, shopping malls, and hospitals. Various conventional positioning techniques are based on radio frequency (RF) technologies; for instance, the global positioning system has been widely used in outdoor environments. However, in indoor environments, it suffers from multipath-induced fading, which can significantly affect the accuracy of the position estimation [1,2]. A number of RF-based positioning systems have also been introduced, including Bluetooth [3], ultrasound [4], wireless local area network [5], ultra-wideband [5], and RF identification [6].
Light-emitting diode (LED)-based visible light communication (VLC) systems have been introduced in recent years and have shown great potential for achieving high-precision indoor positioning due to the use of optical signals. These systems are known as visible light positioning (VLP) systems.

System Model
The proposed system consists of a standard empty room with several LED-based Txs and a single photodiode (PD)-based Rx, which faces upwards, as depicted in Figure 1. The Txs and the Rx are placed at the ceiling and floor levels, at heights h_t and h_r of three and zero meters above the ground, respectively. In the channel, we consider signals from both LoS and NLoS transmission paths. Note that, for the NLoS paths, we have limited the reflections to the first order (i) for the sake of simplicity [27] and (ii) because they contain most of the transmitted power [28]. In this work, we have adopted a simple Lambertian model with an order v of 1 [29].
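The Lambertian order v relates to the LED's half-power angle (HPA) through v = -ln(2)/ln(cos(HPA)), as given in the next section. As a quick sketch, the value v = 1 adopted above corresponds to an HPA of 60 degrees:

```python
import math

def lambertian_order(hpa_deg: float) -> float:
    """Lambertian order v = -ln(2) / ln(cos(HPA))."""
    return -math.log(2.0) / math.log(math.cos(math.radians(hpa_deg)))

# cos(60 deg) = 0.5, so v = -ln(2)/ln(0.5) = 1: the order adopted in this work.
print(lambertian_order(60.0))  # → 1.0 (up to floating point)
```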
The block diagram of the proposed scheme is depicted in Figure 2. We have not considered the synchronization issue and have assumed that each Tx transmits a unique ID, which is encoded and modulated in the on-off keying (OOK) signal format. At the Rx, the received power P_R,i due to each Tx is determined using correlation methods and is given by [26]:

P_R,i = P_LoS,i + P_NLoS,i + n_G,  (1)

where P_LoS,i and P_NLoS,i are the received powers from the i-th Tx due to the LoS and NLoS paths, respectively, and n_G is the additive white Gaussian noise with zero mean and variance σ², i.e., N(0, σ²), which arises from the thermal noise and the dark current, signal, and background radiation-induced shot noises. Note that, in VLC systems, the latter is the dominant noise source.

As shown in Figure 3a, for the LoS path the power is highest directly beneath the Txs and decreases gradually as the user moves toward the corners and walls of the room. Figure 3b shows that, for the NLoS paths, the power distribution is highest along the walls, resulting in a slight rise in the total power received at the Rx near the walls. Figure 3c depicts the total power at the Rx from both LoS and NLoS paths, showing higher peak and average power levels compared to Figure 3a,b. Note that the received power from the NLoS paths leads to overestimation of the transmission distances and, therefore, further degrades the positioning accuracy in the localization process.
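The correlation-based separation of the per-Tx powers in (1) can be sketched as follows. The bipolar Walsh codes below are an illustrative stand-in for the paper's unique (unipolar OOK) Tx IDs, and all values are toy numbers; with orthogonal codes, correlating the superimposed signal with each signature recovers the per-Tx received power:

```python
# Four orthogonal bipolar Walsh codes standing in for the unique Tx IDs.
WALSH = [
    [1,  1,  1,  1],
    [1, -1,  1, -1],
    [1,  1, -1, -1],
    [1, -1, -1,  1],
]

def received_signal(powers):
    """Superimpose the four Tx signatures, each weighted by its received power."""
    return [sum(p * code[n] for p, code in zip(powers, WALSH)) for n in range(4)]

def estimate_powers(rx):
    """Correlate the received signal with each signature to recover the powers."""
    return [sum(r * c for r, c in zip(rx, code)) / len(code) for code in WALSH]

rx = received_signal([0.4, 0.3, 0.2, 0.1])
print(estimate_powers(rx))  # → [0.4, 0.3, 0.2, 0.1] in the noiseless case
```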
The received power from the LoS path can be expressed as [30]:

P_LoS,i = (R P_t,i (v + 1) A_r) / (2π d_i²) · cos^v(ω_i) T_s(ϕ) g(ϕ) cos(ϕ),  (2)

where d_i is the distance between the i-th Tx and the Rx, ω_i is the irradiance angle from the i-th Tx to the Rx, ϕ and R are the incident angle and the PD responsivity, respectively, P_t,i is the transmit power of the i-th Tx, and A_r is the area of the PD. T_s(ϕ) and g(ϕ) are the transmittance function and the concentrator gain of the Rx, respectively, which are considered to be unity for simplicity's sake. The Lambertian order is given by:

v = −ln(2) / ln(cos(HPA)),  (3)

where HPA refers to the half-power angle of the light source. The RSS algorithm incorporates a distance estimation step based on the total received power P_R,i, from which the horizontal distance between the i-th Tx and the Rx is estimated as:

r_i = sqrt(d_i² − h²),  (4)

where r_i is the horizontal distance from the i-th Tx to the Rx and h is the difference in height between the Tx and the Rx, i.e., (h_t − h_r). The received power from the first-order reflections is given by [31]:

P_NLoS,i = Σ_ref (R P_t,i (v + 1) A_r ρ A_ref) / (2π² d_i,w² d_w,r²) · cos^v(ω_i,w) cos(ϕ_i,w) cos(ω_w,r) cos(ϕ_w,r) T_s(ϕ_w,r) g(ϕ_w,r),  (5)

where d_i,w, ϕ_i,w, and ω_i,w are the distance, receiving incident angle, and irradiance angle between the i-th Tx and the reflective area, respectively, and d_w,r, ϕ_w,r, and ω_w,r are the distance, receiving incident angle, and irradiance angle between the reflective area and the Rx, respectively.
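For the special case v = 1 with an upward-facing Rx (so that the irradiance and incidence angles coincide and cos(ω_i) = cos(ϕ) = h/d_i), (2) reduces to P = R P_t A_r h² / (π d⁴), which can be inverted in closed form. The sketch below uses illustrative parameter values, not the paper's, to round-trip the RSS range estimation of (4):

```python
import math

# Hypothetical parameter values for illustration only.
R, P_t, A_r, h, v = 0.5, 1.0, 1e-4, 3.0, 1

def p_los(r):
    """LoS received power of (2) for an upward-facing Rx at horizontal offset r."""
    d = math.hypot(r, h)
    cos_ang = h / d                  # irradiance and incidence angles coincide
    return R * P_t * (v + 1) * A_r / (2 * math.pi * d**2) * cos_ang**v * cos_ang

def rss_range(p):
    """Invert p_los for v = 1 to get d, then the horizontal range r as in (4)."""
    d = (R * P_t * A_r * h**2 / (math.pi * p)) ** 0.25
    return math.sqrt(max(d**2 - h**2, 0.0))

print(rss_range(p_los(1.5)))  # → 1.5 (noiseless, LoS-only round trip)
```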
ρ is the reflectance factor of the reflecting surfaces and A_ref is the reflective area. For the NLoS case, a significant error may occur when calculating the distance due to the existence of reflections, as noted in (5). Therefore, a polynomial fitted model is introduced to express the relation between P_R,i and the total distance from the i-th Tx to the Rx [32,33], which is given by:

d_i = a_0 + a_1 P_R,i + a_2 P_R,i² + ··· + a_g P_R,i^g,  (6)

where a_0 ··· a_g are the coefficients of the polynomial model for a g-th order polynomial.
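Fitting the polynomial model of (6) can be sketched with synthetic calibration data (numpy is assumed; the inverse-fourth-power decay and the grid below are illustrative stand-ins for the measured power-distance samples):

```python
import numpy as np

# Synthetic calibration samples standing in for a measured (P_R,i, d_i) grid.
d = np.linspace(3.0, 5.0, 50)        # total Tx-Rx distances [m]
p = 1e-4 / d**4                      # illustrative inverse-fourth-power decay

g = 3                                # polynomial order g of (6)
coeffs = np.polyfit(p, d, g)         # returns a_g ... a_0 (highest power first)
d_hat = np.polyval(coeffs, p)

print(float(np.max(np.abs(d_hat - d))))  # worst-case fit residual over the grid [m]
```

In a deployed system, the coefficients would be fitted once over a calibration grid and then applied to every new power reading.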

Estimation Algorithms
In the case of LLS, the a_g values are initially estimated through the fitting process for the given values of d_i and P_R,i. These values are then utilized to estimate d_i and, by substitution in (4), to determine r_i for each Tx. Note that LLS is used to find a coarse estimate of the Rx's position, which is given by [18]:

[x̂_Rx, ŷ_Rx]^T = (A^T A)^{−1} A^T B,

where [x̂_Rx, ŷ_Rx] is the estimated position of the Rx, and A and B are given as:

A = [2(x_1 − x_I), 2(y_1 − y_I); ...; 2(x_{I−1} − x_I), 2(y_{I−1} − y_I)],
B = [x_1² − x_I² + y_1² − y_I² + r_I² − r_1²; ...; x_{I−1}² − x_I² + y_{I−1}² − y_I² + r_I² − r_{I−1}²],

where I is the total number of Txs and (x_i, y_i) is the position of the i-th Tx. However, the LLS solution may not offer high positioning accuracy [18]. This is especially true for positions close to the walls and corners, where the signal power levels from the NLoS paths are higher. The NLLS estimation can be utilized as an alternative approach for position estimation, which minimizes the approximation error attained from the LLS estimation [25]. The trust region algorithm is employed to solve the unrestricted optimization problem to realize the 3D positioning [34]. The estimated location is found at the minimum of the averaged squared error C, which is given by:

C = (1/I) Σ_{i=1}^{I} ( sqrt((x̂_Rx − x_i)² + (ŷ_Rx − y_i)²) − r_i )²,

where (x̂_Rx, ŷ_Rx) is the estimated position of the Rx and r_i is computed from (4) and (6).
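The coarse LLS step above can be sketched as follows (numpy is assumed and the Tx coordinates are illustrative): subtracting the I-th circle equation from the others linearizes the trilateration problem, which is then solved in the least-squares sense:

```python
import numpy as np

def lls_position(tx_xy, r):
    """Coarse LLS position estimate: subtract the last circle equation from the
    others and solve the resulting linear system A [x, y]^T = B."""
    tx_xy, r = np.asarray(tx_xy, float), np.asarray(r, float)
    xI, yI, rI = tx_xy[-1, 0], tx_xy[-1, 1], r[-1]
    A = 2.0 * (tx_xy[:-1] - tx_xy[-1])       # rows: [2(x_i - x_I), 2(y_i - y_I)]
    B = (tx_xy[:-1] ** 2).sum(1) - xI**2 - yI**2 + rI**2 - r[:-1] ** 2
    est, *_ = np.linalg.lstsq(A, B, rcond=None)
    return est

# Four ceiling Txs (illustrative coordinates) and exact horizontal ranges to (2.5, 1.5).
tx = [(1.0, 1.0), (1.0, 4.0), (4.0, 1.0), (4.0, 4.0)]
r = [np.hypot(x - 2.5, y - 1.5) for x, y in tx]
print(lls_position(tx, r))  # → [2.5, 1.5] with exact ranges
```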
In this work, we consider NLLS with a polynomial fitted model for the distance as the baseline for performance comparison.
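The NLLS refinement minimizing the averaged squared error C can be sketched as below. The paper employs a trust region solver; a plain Gauss-Newton iteration is assumed here as a simpler stand-in, with illustrative Tx coordinates and exact ranges:

```python
import numpy as np

def nlls_position(tx_xy, r, x0, iters=50):
    """Gauss-Newton sketch of the NLLS step: minimize the squared mismatch between
    the estimated ranges and the ranges r_i recovered from the power model."""
    tx_xy, r = np.asarray(tx_xy, float), np.asarray(r, float)
    est = np.asarray(x0, float)
    for _ in range(iters):
        diff = est - tx_xy                    # (I, 2) offsets to each Tx
        rho = np.linalg.norm(diff, axis=1)    # current horizontal distances
        f = rho - r                           # range residuals
        J = diff / rho[:, None]               # Jacobian of the residuals
        est = est - np.linalg.solve(J.T @ J, J.T @ f)
    return est

tx = [(1.0, 1.0), (1.0, 4.0), (4.0, 1.0), (4.0, 4.0)]
r = [np.hypot(x - 3.2, y - 2.1) for x, y in tx]
print(nlls_position(tx, r, x0=(2.5, 2.5)))  # → [3.2, 2.1]
```

In practice, the LLS estimate would serve as the starting point x0, exactly as described above.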

Use of ANN for Regression
Even with the power-versus-distance relation for the NLoS paths described in (6), the room morphology (corners, walls, furniture, etc.) varies a great deal, making it difficult to infer an approximate model that is applicable to every scenario. As a result, using an ANN is advantageous, since it is trained using P_R,i from each Tx together with the corresponding transmission distance. Regression analysis is useful to model the relationship between a dependent variable and one or more independent variables (i.e., the input values of the model), and the ANN is one possible solution for any type of regression problem. The ANN is inspired by the processing of the human brain and is therefore composed of neurons that work in parallel. Each neuron individually performs a simple mathematical operation [35]. Collectively, the neurons can evaluate complex problems, emulating most of the functions and providing precise solutions. The ANN is an interconnected network of processing elements (neurons), and its use involves two different phases: (i) the training phase, where the ANN estimates an input-output map based on the training dataset; during this phase, the neuron weights are continuously adapted to minimize the error between the estimated output and the training data vectors, and the process terminates when the required performance is achieved or the complete training set has been used; and (ii) the operation phase, where the ANN is employed to perform estimates based on the input data alone. The ANN structure consists of at least three layers: a single input layer consisting of the inputs γ_N, one or several hidden layers (HLs), and a single output layer (see Figure 4a). These layers are linked together through a collection of connected units or nodes, called artificial neurons. The importance of these neurons is defined by their weights and the learning process.
The weight W^m_kn has the capability to acquire and store experimental knowledge, where k, n, and m represent the indices of the neurons, inputs, and layers, respectively. These weights are also known as synaptic weights, as their principle resembles the synapses present in biological brains; W^m_kn relates the n-th input to the k-th neuron. Note that the number of neurons in the hidden layer controls the weights and the biases in the network. Each neuron can be biased with a value b^m, as depicted in Figure 4b. For the HLs, a sigmoid transfer function is used as the activation function, which applies thresholding to the input data and produces outputs as continuous values between zero and one, while the output layer employs a linear transfer function. The performance of an ANN algorithm is measured by the mean square error, which can be expressed as a function F(p^m_k) as:

F(p^m_k) = (t^m_k − a^m_k)²,

where p_k is the vector containing all of the network weights and biases for the k-th neuron (i.e., p_k = [W_k, b_k]), a^m_k is the network output of the k-th neuron in the m-th layer, and t^m_k is the target output of the k-th neuron in the m-th layer. The weights and the biases are updated by the backpropagation method [35] as:

W^m_kn(j + 1) = W^m_kn(j) − G s^m (γ^m_kn)^T,
b^m_k,n(j + 1) = b^m_k,n(j) − G s^m,

where G is the learning rate, m = 0, 1, . . . , M − 1, M is the number of layers in the network, and (.)^T denotes the transpose. b^m_k,n is the bias vector, γ^m_kn is the input vector, n = 0, 1, . . . , N, and N is the total number of inputs in the network. s^m is the sensitivity matrix, which is evaluated from the least mean square error function F(p^m_k) for various values of j, where j is defined in matrix form as γ_k W_k + b_k.
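The sigmoid-hidden/linear-output structure and the gradient-descent weight updates described above can be sketched with a tiny numpy network; the layer sizes, data, and learning rate below are toy values, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

W1, b1 = rng.normal(0.0, 0.5, (8, 4)), np.zeros((8, 1))   # 4 inputs -> 8 hidden
W2, b2 = rng.normal(0.0, 0.5, (2, 8)), np.zeros((2, 1))   # 8 hidden -> 2 outputs (x, y)
G = 0.05                                                  # learning rate

gamma = rng.normal(0.0, 1.0, (4, 16))                     # inputs (estimated distances)
t = rng.normal(0.0, 1.0, (2, 16))                         # targets (true positions)

def forward():
    a1 = sigmoid(W1 @ gamma + b1)                         # sigmoid hidden layer
    return a1, W2 @ a1 + b2                               # linear output layer

losses = []
for _ in range(300):
    a1, a2 = forward()
    losses.append(float(np.mean((a2 - t) ** 2)))
    s2 = 2.0 * (a2 - t) / t.size                          # output-layer sensitivity
    s1 = (W2.T @ s2) * a1 * (1.0 - a1)                    # back-propagated sensitivity
    W2 -= G * s2 @ a1.T;  b2 -= G * s2.sum(1, keepdims=True)
    W1 -= G * s1 @ gamma.T;  b1 -= G * s1.sum(1, keepdims=True)

print(losses[0], losses[-1])                              # the MSE decreases
```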
The ANN structure in the proposed study is composed of four layers: an input layer, two HLs, and an output layer. Each layer has a different number of neurons, with the input and output layers having four and two neurons, respectively. The estimated x and y position coordinates are represented by the output neurons. The distances estimated from each Tx with the help of (6) are applied to the input layer.
In this work, we have investigated the number of HLs and have determined that a simple ANN with only one hidden layer would not provide the desired results, i.e., it yields high positioning errors. Using two hidden layers provided a more effective framework for achieving improved performance. Therefore, based on our preliminary research, we limited the number of hidden layers to two. The neurons in the HLs are activated using a sigmoid transfer function, which thresholds the input data and outputs a continuous value between zero and one. A linear transfer function is used in the output layer. All notations utilized in the paper are indicated in Table 1.

Table 1. Notation definitions.

P_t,i: Transmitted power from the i-th Tx
T_s(ϕ): Transmittance function
g(ϕ): Concentrator gain of the Rx
A_r: Area of the photodetector
r_i: Horizontal distance from the i-th Tx to the Rx
h: Difference in height between the Tx and Rx, i.e., (h_t − h_r)
d_i,w, ϕ_i,w, ω_i,w: Distances, receiving incident angle, and irradiance angle between the i-th Tx and the reflective area, respectively
d_w,r, ϕ_w,r, ω_w,r: Distances, receiving incident angle, and irradiance angle between the reflective area and the Rx, respectively
ρ: Reflectance factor depending on the material of the reflective surface
a_0 ··· a_g: Coefficients of the polynomial model for the g-th order polynomial
(x̂_Rx, ŷ_Rx): Estimated position of the Rx
C: Averaged squared error
x_Rx, y_Rx: Estimated position of the Rx
p_k: Vector containing all the network weights and biases for the k-th neuron
a_k: Network output for the k-th neuron
t_k: Target output of the network for the k-th neuron
G: Learning rate
tr(H^{−1}): Trace of the inverse of the Hessian matrix
E_qw(p_k): Quadratic approximation of the error function F(p_k)
p_1, p_2, . . . , p_k: Set of non-zero weight vectors

Table 1. Cont.

Notation Definition
η: Percentage of the confidence interval
Q: Quantile function
ε_p−min: Minimum positioning error
ξ_k: Step size

Following that, we have adopted a few well-known training algorithms and used them to analyze the ε_p of the proposed system. For this investigation, we have used the default values of MATLAB's fitnet tool to fix parameters such as the learning rate. Note that other parameters, such as the number of neurons in the HLs or the activation functions, could also be optimized based on the topology of the HLs. Since sigmoid and linear activation functions have been shown to perform well in regression tasks [36], they are used in the hidden and output layers, respectively. Having selected Bayesian regularization as the optimal learning algorithm, we then optimized the learning phase with respect to the number of epochs and the size of the training set.

ANN Training Methods
The network records the trained information in W^m_kn and b^m. Supervised learning algorithms are adopted in this work, as explained in the following subsections.

Levenberg-Marquardt Algorithm
The Levenberg-Marquardt (LM) algorithm is employed to solve NLLS problems. By combining the most used optimization algorithms (i.e., the Gauss-Newton algorithm and the steepest descent algorithm), the LM algorithm can avoid some of their problems, such as over-parameterization, local minima, and the non-existence of the inverse matrix [37]. Moreover, it inherits the speed advantage of the Gauss-Newton algorithm and the stability of the steepest descent algorithm. The update rule for the weights and biases, i.e., p_k, is given by:

p_{k+1} = p_k − (J_k^T J_k + µ_k I)^{−1} J_k^T F(p_k),

where J_k is the Jacobian matrix of the function F(p_k), µ_k ≥ 0 is a scalar, and I is the identity matrix.
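The LM update above can be sketched on a toy one-parameter least-squares problem (numpy assumed; the exponential model and all values are illustrative, not the ANN training itself):

```python
import numpy as np

def lm_step(p, J, F, mu):
    """One LM update: p <- p - (J^T J + mu I)^(-1) J^T F(p)."""
    return p - np.linalg.solve(J.T @ J + mu * np.eye(len(p)), J.T @ F)

# Toy 1-parameter fit y = exp(a x), applying the same update rule.
x = np.linspace(0.0, 1.0, 20)
y = np.exp(0.7 * x)                        # data generated with a = 0.7

a = np.array([0.0])                        # initial guess
for _ in range(30):
    F = np.exp(a[0] * x) - y               # residual vector F(p)
    J = (x * np.exp(a[0] * x))[:, None]    # Jacobian of the residuals
    a = lm_step(a, J, F, mu=1e-3)

print(a)  # → close to [0.7]
```

The damping term µI interpolates between a Gauss-Newton step (µ → 0) and a small steepest-descent step (large µ), which is the stability/speed trade-off noted above.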

Bayesian Regularization Algorithm
Bayesian regularization (BR) is an algorithm that updates the values of the weights and biases in accordance with LM optimization. In this algorithm, a linear combination of the squared errors and the weights is first minimized, and the linear combination is then modified with the aim of obtaining a network with good generalization qualities [35]. In BR, the mean squared error function can be defined as:

F(p_k) = β E_D + α E_W,  (15)

where E_D is the squared error and E_W is the sum of squared weights, which penalizes large weights to reach a better generalization and a smoother mapping. α and β are the regularization parameters (or objective function weights), which are given as:

α = γ_e / (2 E_W), β = (N − γ_e) / (2 E_D),

where γ_e = N_wb − 2α tr(H^{−1}) is called the effective number of parameters, H = ∇²F(p_k) is the Hessian matrix, N_wb is the total number of parameters (weights and biases) of the network, and tr(H^{−1}) is the trace of the inverse of the Hessian matrix. Note that the 2nd term in (15) is known as the weight decay, and therefore small values of W reduce the overfitting of the model.

Scaled Conjugate Gradient Algorithm
Most conjugate gradient algorithms use a line search at each iteration, making them computationally complex. To address this, we have adopted the scaled conjugate gradient (SCG) algorithm developed by Moller [38]. SCG is based on conjugate directions without performing a line search, and thus has reduced computational complexity. The SCG algorithm, a scaled conjugate gradient method for updating the weight and bias values, is robust and does not depend on user-defined parameters, given that the step size is a function of a quadratic approximation of the error [38]. The step size is estimated as:

ξ_k = µ_k / δ_k, with µ_k = −p_k^T ∇E_qw(p_k) and δ_k = p_k^T s_k + λ_k |p_k|²,

where E_qw(p_k) is the quadratic approximation of the error function F(p_k), p_1, p_2, . . . , p_k are the set of non-zero weight vectors, and s_k is the second-order information. λ_k is a scaling factor that is updated such that: if ∆_k > 0.75, then λ_k = λ_k/4, and if ∆_k < 0.25, then λ_k = λ_k + δ_k(1 − ∆_k)/|p_k|². ∆_k is a comparison parameter given by:

∆_k = 2δ_k [E(p_k) − E(p_k + ξ_k p_k)] / µ_k²,

which measures how well the quadratic approximation matches the true error for the current step.
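The λ_k adjustment rule quoted above translates directly into code; the numeric inputs below are illustrative:

```python
def update_lambda(lam, delta_k, Delta_k, p_sq):
    """SCG scale update: shrink lambda when the quadratic approximation is good
    (Delta_k > 0.75), grow it when the approximation is poor (Delta_k < 0.25)."""
    if Delta_k > 0.75:
        return lam / 4.0
    if Delta_k < 0.25:
        return lam + delta_k * (1.0 - Delta_k) / p_sq
    return lam

print(update_lambda(1.0, 0.5, 0.9, 2.0))   # good step  → 0.25
print(update_lambda(1.0, 0.5, 0.1, 2.0))   # poor step  → 1.225
```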

Results and Discussion
The proposed system adopted in Section 2 is implemented in the simulation environment using MATLAB. Both NLLS and different ANN algorithms are applied to the proposed VLP system, and the performance of all algorithms is compared. The ANN structure is composed of four layers, which include an input layer, two HLs, and an output layer. The number of neurons in each layer is variable, with four and two neurons in the input and output layers, respectively. The latter represents the estimated x and y position coordinates. Using (6), the calculated distances from each Tx are fed to the input layer. A sigmoid transfer function is used as the activation function for the neurons in the HLs, which thresholds the input data and provides the output as a continuous value between zero and one. The output layer employs a linear transfer function.
In addition, the proposed positioning process includes: (i) the total received power is computed at the Rx; (ii) the polynomial regression model is used to determine the power-distance relation and the distance from each Tx to the Rx; (iii) the computed distances are used as the inputs to the ANN algorithm for training purposes; and (iv) the position is estimated as the output of the ANN algorithm. Furthermore, a real implementation of these algorithms would involve two phases: the training phase, where previously collected data are used for training the ANN, and the stand-alone phase, where the trained ANN with fixed weights is used in the hardware for position estimation. Figure 5 illustrates the overview of the neural network used in the proposed system. The training data consist of ν samples of inputs and outputs, where ν is the total number of samples. The distances computed by (6) are considered as inputs, and the real position of the Rx, (X, Y), is considered as the output for the training data. The training data are fed to the neural network, and the prediction output (x_Rx, y_Rx) is obtained as the estimated position. These estimated positions are compared with the real positions, and the error is sent back to the training algorithm for the modification of the weights. This process continues until the network is fully trained.
In this study, two datasets are considered for the training, testing, and validation of the ANN, as detailed in Table 2. These datasets are composed of the received power information for a given grid of Rx positions with different noise power levels (according to the SNR). Note that: (i) the data samples are randomly scrambled; and (ii) different datasets are used to avoid biasing the training process, that is, ANN optimization is conducted using a single dataset, while another dataset is adopted for validation and testing. Therefore, 80% of dataset A is used for training, while 20% of dataset B is used for validation and testing. Data scrambling is used to feed the data randomly to the inputs of the neural network during training. We consider a grid (1 cm resolution) of 3600 Rx positions on the receiving plane, which is divided into two regions, i.e., the inner region, where the received power is more uniform and which includes the area of the receiving plane within the Txs (LEDs), and the outer region, representing the remaining area near the walls and corners, as depicted in Figure 1. All the other key system parameters are given in Table 3.
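The data handling described above can be sketched as follows; the integer lists are placeholders for the actual power/position samples:

```python
import random

random.seed(1)
dataset_a = list(range(18000))        # placeholder samples (power vectors + positions)
dataset_b = list(range(18000))

random.shuffle(dataset_a)             # data scrambling before feeding the network
random.shuffle(dataset_b)

train = dataset_a[: int(0.8 * len(dataset_a))]      # 80% of dataset A for training
val_test = dataset_b[: int(0.2 * len(dataset_b))]   # 20% of dataset B for val/test
print(len(train), len(val_test))      # → 14400 3600
```

Drawing validation/test samples from a second dataset, as done here, is what prevents the optimization from being evaluated on the very data it was tuned with.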

VLP Error Performance
Generally, RSS-based positioning algorithms are susceptible to ambient-induced shot noise, leading to an increased ε_p. In this work, we consider the impact of noise, modelled as Gaussian with N(0, σ²), on the performance of the VLP. A total of 1000 iterations are performed in this simulation to gain statistical significance. The performance evaluation of the VLP system is provided in terms of the quantile function Q, which is a valid performance metric to show the level of accuracy. The measurement of the confidence interval of ε_p is carried out through the metric Q, which is given by [26]:

Q(η) = CDF^{−1}(η),

where CDF represents the cumulative distribution function of ε_p and η is the percentage of the confidence interval. Figure 6 shows the measured Q(95%) as a function of the SNR for the different ANN algorithms in both the inner and outer regions. It is observed that LM and BR outperform SCG in both regions. For instance, in the inner region at an SNR of 10 dB, ε_p−min is 54, 62, and 66 cm for LM, BR, and SCG, respectively, which increases to 80, 95, and 170 cm, respectively, in the outer region. Note that the SNR thresholds for the inner and outer regions are 10 and 15 dB, respectively; beyond these values, the positioning errors remain almost constant at their lowest levels. Note that we have considered the average SNR values in the analysis. The decreasing trend in ε_p is justified by the increase in SNR: for high values of SNR, the effect of noise on the estimated position is reduced. On the contrary, for small values of SNR, the randomness of the input data leads to overfitting, making the estimated error larger. To improve the proposed VLP system, we further investigate the impact of the ANN algorithms, the number of neurons in the HLs, and the number of epochs in the following sections.
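The empirical quantile Q(η) used above can be computed directly from the sorted error samples; the error values below are illustrative:

```python
import math

def quantile(errors, eta):
    """Empirical Q(eta): smallest error value whose CDF reaches eta."""
    s = sorted(errors)
    idx = max(math.ceil(eta * len(s)) - 1, 0)
    return s[idx]

# 20 illustrative positioning-error samples [m]; Q(0.95) bounds 95% of them.
errors = [k / 100.0 for k in range(1, 21)]   # 0.01, 0.02, ..., 0.20
print(quantile(errors, 0.95))  # → 0.19
```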

Selection of the Training Algorithm and Number of Neurons in the HL
The number of neurons in the HLs and different training methods are investigated in this subsection to determine the optimum algorithm based on ε_p−min. The accuracy in the inner region is higher than in the outer region due to more reflections being considered in the corners of the room. Therefore, we have only considered the inner region for the selection of the number of neurons in both HLs. As depicted in Figure 6, both LM and BR have lower ε_p compared with SCG and are therefore considered for further analysis. Next, we investigate different numbers of neurons in the HLs and the training for an ideal scenario (i.e., no noise). Figure 7 shows the surface plots of Q of 95% for different numbers of neurons for LM and BR. As depicted in Figure 7, ε_p−min is 0.11 and 0.06 cm for: (i) LM with 36 neurons in each of HLs 1 and 2; and (ii) BR with 32 and 28 neurons in HLs 1 and 2, respectively. Based on ε_p−min, the number of neurons in the HLs is selected for LM and BR as detailed in Table 4. Note that the training performance is compared for 1000 epochs between LM and BR, with total computation times of 22 and ~10 min, respectively, achieved on an Intel(R) Core(TM) i9-9900K CPU @ 3.60 GHz (8 cores, 16 logical processors) with 32 GB RAM. The epochs represent the number of times the ANN algorithm runs over the full training dataset. BR offers a faster training phase and is therefore selected for further investigation of the impact of different numbers of epochs.

Impact of Epochs and Noise Performance on the VLP System
Firstly, the effect of the number of epochs on the proposed VLP system is observed, where we investigate different epoch values and their impact on the error performance. Figure 8 depicts the Q(95%) as a function of SNR for epochs of 500, 1000, and 3000 for the inner and outer regions. We can see that, for both regions, 3000 epochs offer the lowest Q for moderate and high values of SNR, and this setting is therefore considered for further analysis with noise. This shows that BR is strongly affected by the number of training epochs, with a larger number of epochs resulting in more finely tuned network weights. Note that 3000 epochs do not provide high accuracy for low values of SNR, because the network is not able to generalize as well as for moderate to high values of SNR. Generally, the precision of the ANN may improve with a higher number of epochs; however, this neglects the possibility of overfitting, which we observed for larger numbers of epochs.

Figure 9 depicts the Q(95%) as a function of SNR for the BR-based ANN algorithm and for RSS with NLLS, for the inner and outer regions and 3000 epochs. The results show that NLLS is more prone to the effects of noise and of proximity to walls and corners than BR. This can be explained by the ANN's ability to better estimate the positions near the walls and its inherent immunity to noise. As shown, ε_p is reduced significantly using the ANN. For instance, at an SNR of 30 dB and for the inner region, ε_p−min is 8 and 13 cm for BR and NLLS, respectively. Moreover, in the inner region, accuracy improvements of 46, 58, and 38% are observed for SNR values of 10, 20, and 30 dB, respectively.
In the case of the outer region, accuracy improvements of 50, 30, and 9% are observed for the SNR values of 10, 20, and 30 dB, respectively. Therefore, BR outperforms the traditional NLLS over the SNR range of 5–30 dB.
Figure 9. The measured 95% quantile function for NLLS and BR.
Figure 10 depicts the error distribution plots using the Bayesian regularization algorithm for different ranges of SNR. It can be observed that the positioning error εp decreases with increasing SNR values; the impact of noise is thus clearly visible in these error plots. The main observations are detailed in Table 5.
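The Q(95%) metric used throughout is simply the 95th percentile of the positioning-error samples. A short sketch, using synthetic half-normal error spreads that are illustrative rather than the paper's measured distributions:

```python
import numpy as np

# Hypothetical positioning-error samples (metres) for two estimators at one
# SNR; in the paper these come from the VLP simulation, here they are synthetic.
rng = np.random.default_rng(1)
err_br = np.abs(rng.normal(0.0, 0.05, 5000))    # ANN (BR): tight error spread
err_nlls = np.abs(rng.normal(0.0, 0.09, 5000))  # RSS + NLLS: wider spread

def q95(err):
    """95% quantile of the positioning error, i.e. the Q(95%) metric."""
    return float(np.quantile(err, 0.95))

print(f"Q(95%) BR   = {100 * q95(err_br):.1f} cm")
print(f"Q(95%) NLLS = {100 * q95(err_nlls):.1f} cm")
```

Repeating this per SNR value yields the curves of Figure 9; a lower curve means 95% of test positions fall within a smaller error radius.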

Impact of Different Training Dataset Sizes on the VLP System
Furthermore, we analyze the impact of different training dataset sizes, denoted by In, on Q. For this, we consider two training scenarios: random selection (RS) and uniform selection (US). In the former, the original dataset A is down-sampled from the original 18,000 samples to 9000, 4500, 2250, and 1125 samples. In the latter, the grid size is down-sampled from the original 60 × 60 samples to the aforementioned sizes. By doing so, we aim to show whether the system performance depends on the selection of the training dataset samples. Here, we have generated results considering only the data from the inner region. Figure 11 shows the error performance versus the SNR for a range of In and for both the RS and US scenarios. For the RS scenario, the εp−min values are 2, 11, and 44 cm for the SNR values of 30, 20, and 10 dB, respectively, with a lower In of 9000, compared to 15, 22, and 44 cm for the US scenario with a higher In of 18,000. The results show that the US scenario leads to larger errors as a consequence of down-sampling the grid resolution, which may lead to overfitting problems. With the RS scenario, the accuracy improves for high SNR values, showing that there is an optimum size for the training dataset. This can be attributed to the fact that the original grid resolution is fixed, leading to a lower probability of overfitting. Therefore, while the original dataset provides improved results, the proper selection of the training dataset size is also essential to properly design the system.
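The two down-sampling schemes can be contrasted in a short sketch. The room extent, the five noisy realizations per grid point, and the decimation step are assumptions chosen so that a 60 × 60 grid reproduces the 18,000-sample total; they are one plausible reading of the setup, not the paper's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical full training set: a 60 x 60 grid of receiver positions with
# 5 noisy realizations per grid point -> 18,000 samples in total.
grid, reps = 60, 5
gx, gy = np.meshgrid(np.linspace(0, 5, grid), np.linspace(0, 5, grid))
pos = np.repeat(np.column_stack([gx.ravel(), gy.ravel()]), reps, axis=0)

def random_selection(n):
    """RS: draw n samples at random -- the 60 x 60 grid resolution is kept."""
    return pos[rng.choice(len(pos), size=n, replace=False)]

def uniform_selection(step):
    """US: keep every step-th grid point -- the grid itself is coarsened."""
    mask = np.zeros((grid, grid), dtype=bool)
    mask[::step, ::step] = True
    return pos[np.repeat(mask.ravel(), reps)]

rs = random_selection(4500)   # 4500 samples spread over the full grid
us = uniform_selection(2)     # 30 x 30 grid points x 5 reps = 4500 samples

print("distinct positions, RS:", len(np.unique(rs, axis=0)))
print("distinct positions, US:", len(np.unique(us, axis=0)))
```

At equal sample counts, RS retains far more distinct receiver positions than US, which is consistent with the observation that coarsening the grid (US) degrades generalization while random down-sampling (RS) does not.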


Conclusions
An indoor VLP system using an artificial neural network for positioning estimation in the presence of both line-of-sight and non-line-of-sight multipath signals was analyzed. In order to implement a realistic scenario, we studied the influence of noise on the proposed system. Three different ANN algorithms, namely the Levenberg–Marquardt, Bayesian regularization, and scaled conjugate gradient algorithms, were explored for minimizing the positioning error. The optimization of the ANN was conducted based on the number of neurons in the hidden layers and the number of training epochs. We showed that the ANN with Bayesian regularization outperforms the traditional RSS technique using NLLS for the SNR range of 5–30 dB. We also observed improvements in the positioning accuracy of 43, 55, and 50% for the inner region, compared to 57, 32, and 6% for the outer region, for the SNR of 10, 20, and 30 dB, respectively. We further studied the impact of different dataset sizes for training the neural network. It is concluded that the ANN is an efficient method that allows us to achieve a minimum positioning error of 2 cm at 30 dB of SNR with a random selection of training samples. Finally, we observed that the positioning error remains moderate even for lower values of SNR, i.e., positioning error values of 2, 11, and 44 cm for the SNR of 30, 20, and 10 dB, respectively. In our future work, we will develop an experimental test-bed for verification of the simulated results.