Photography Trilateration Indoor Localization with Image Sensor Communication

Localization has become an important aspect of a wide range of mobile services with the integration of the Internet of Things and on-demand services. Numerous mechanisms have been proposed for localization, most of which are based on the estimation of distances. Depending on the channel modeling, each mechanism has its advantages and limitations in deployment, exhibiting different performance in terms of error rate and implementation. With the development of technology, these limitations are rapidly being overcome by hybrid systems and enhancement schemes. A successful approach depends on achieving a low error rate and on its controllability through integration with deployed products. In this study, we propose and analyze a new distance estimation technique employing photography and image sensor communications, also named optical camera communications (OCC). Distance estimation is one of the most important steps in the trilateration localization scheme, which we implement with real architectures and deployment conditions as the second contribution of this article. With the advantage of image sensor hardware being integrated in smart mobile devices, this technology has great potential for localization based on optical wireless communication.


Introduction
In the last few decades, localization techniques that obtain the user's location in a particular environment have been widely considered a core technology for on-demand applications and services. This technology mainly supports industrial business solutions and includes robotics, sensor networks, and everyday human activities such as navigation and tracking [1,2]. One of the well-known localization technologies is the Global Positioning System (GPS), which cross-references received signals from several satellites [3]. However, GPS faces many difficulties with line-of-sight signal transmission because of blockage by buildings and signal clock errors. For indoor environments, GPS can be employed by re-using its simple existing infrastructure: an additional outdoor antenna amplifies and relays the satellite signal to an indoor antenna [4]. Various algorithms are applied to enhance the location calculation. However, these algorithms are computationally expensive and achieve low accuracy. In practice, GPS is not the optimal technique for indoor localization [5]. The development of on-demand localization services, applicable both indoors and outdoors, has therefore been an important task in wireless communication. Most localization techniques operate in two phases: distance detection and location calculation with reference coordinates. The main factor affecting the cost and error performance of a localization technique is the distance estimation phase. Calculation methods are mainly classified by the signal measurement used, such as angle of arrival (AOA), time of arrival (ToA), received signal strength indication (RSSI), time of flight (ToF), or time difference of arrival (TDoA) [2,4,6-10]. Another classification is based on the signaling model.
The decoding operation comprises image processing functions applied to the captured color pixel information. Depending on whether a global or rolling shutter image sensor is used, the data take a spatial or temporal form in the captured image. With the global shutter image sensor, the entire image is captured at once. This architecture has the advantage of spatial modulation in the MIMO scenario. In contrast, the rolling shutter image sensor captures multiple states at different times within a single image. This architecture has the advantage of an improved data rate compared with the global shutter mode. The received OCC signal in this architecture consists of color pixel rolls, which arise from the ON-OFF light signal of the optical transmitter. By applying image processing algorithms for color classification, the receiver can decode the embedded data from the color or pixel pattern. This can take the form of color cell status in the global shutter image sensor or frequency separation of image rolls in the rolling shutter image sensor.
One of the well-known international specifications for VLC, covering the PHY and MAC layers from modulation techniques and error correction to medium access control functions, is IEEE 802.15.7. It is extended by IEEE 802.15.7m, which addresses image sensor communications. The modulation techniques in this specification include OOK (on-off keying), FSOOK (frequency shift on-off keying), and screen modulation. These modulation mechanisms can operate with both global shutter and rolling shutter image sensors. The undersampled frequency shift OOK, namely UFSOOK [13], applies two frequencies to the OOK light signal for data modulation and one frequency for start-of-frame synchronization. This modulation technique is based on the binary frequency shift subcarrier. The frequencies of OOK for encoding 1 and 0 bits are separated by a specific frequency distance. The sampling frequencies of 1- and 0-bit encoding are lower than the camera frame rate for the global shutter image sensor. With two frequencies for embedded data, the scheme can modulate 1 bit per sample. For synchronization, one more frequency, in addition to the data bit frequencies, is required. Synchronization is performed with a synchronization period in each data frame. The synchronization bits operate at high-frequency OOK (10 kHz), which is above the frequency response of the image sensor, so the image sensor records them as an average light intensity value. The main advantage of frequency shift OOK modulation is the high number of embedded bits per sample of the OCC system, which is one option for overcoming the data rate limitation. UFSOOK can likewise operate with the rolling shutter image sensor at low rates. For higher data rates, the modulation can be performed by OOK, which modulates the binary bit through the ON-OFF state of the light source. This technique has a limitation with respect to distance variation due to the captured FOV of the image sensor.

Structured Light
This technique uses a single image sensor in the camera, which captures the projected light pattern. Typically, it includes two components: the laser beams and the camera for laser triangulation. The laser beam projects a dot that appears at a specified coordinate on the image sensor. The position and depth of the object surface are determined by observing the light patterns and calculating from them.

AOA
The AOA technique estimates the distance using angles of arrival of signals from reference nodes. After obtaining the AOA, the position of the mobile device is determined as the intersection of multiple bearings. AOA has an advantage with respect to synchronization, because there is no requirement regarding the clock between the reference node and the mobile device. However, it has the disadvantage of complexity and cost, with the scan technique or sensor array for the LOS detection being based on the received power distribution.

TOA
Time of arrival is the core technology of the GPS system. This technique estimates the separation distance from the signal arrival time, which is indicated by the time stamp. The one-way propagation time is converted to distance using the speed of light and the carrier frequency. Because it requires a highly accurate clock, TOA is technologically challenging for distance estimation: a timing error of 1 µs corresponds to a ranging error of about 300 m. To apply this technique to location services in a 2-D topology, the distances to at least three reference base stations must be obtained.
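As a minimal sketch of the relation described above, the following Python fragment converts a one-way propagation time into a range and shows how a clock error maps to a ranging error. The function names are illustrative and not taken from any cited system; propagation at the vacuum speed of light is assumed.

```python
# Illustrative sketch: one-way ToA ranging (names are hypothetical).
C = 299_792_458.0  # speed of light in vacuum, m/s


def toa_distance(t_transmit_s, t_arrival_s):
    """Range in metres from a transmit time stamp and an arrival time stamp."""
    return C * (t_arrival_s - t_transmit_s)


def ranging_error(clock_error_s):
    """Ranging error in metres caused by a given timing error in seconds."""
    return C * abs(clock_error_s)


if __name__ == "__main__":
    # A 1 microsecond timing error gives roughly 300 m of ranging error.
    print(round(ranging_error(1e-6), 1))
```

The sketch makes the synchronization requirement concrete: keeping the ranging error below 1 m requires timing accuracy on the order of 3 ns.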

ToF
The concept of ToF relates to measuring the travel time of a light signal from the sensor to the target object, which is similar to the ToA technique. The measured quantity is the round-trip time of a modulated light signal (a waveform or pulse subcarrier). Because the measurement operates at the speed of light, the response of the light-sensing hardware must be sufficiently fast, namely, in the nanosecond range. The decoding process takes several duty cycles. The separation distance is calculated from the light speed, the signal duty cycle, and the angular frequency, and is defined by the traveling time of the signal in modulated ToF or by the proportion of pulse delay in pulsed ToF.
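The two ToF variants mentioned above can be sketched as follows. This is an illustrative example using the standard textbook relations (half the round-trip path for pulsed ToF, and the phase shift of the modulation for continuous-wave ToF); the function names and parameters are assumptions, not part of the cited systems.

```python
# Illustrative sketch: pulsed and phase-based (modulated) ToF ranging.
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def pulsed_tof_distance(round_trip_s):
    """Distance from the round-trip delay of a light pulse."""
    return C * round_trip_s / 2.0


def modulated_tof_distance(phase_shift_rad, f_mod_hz):
    """Distance from the phase shift of a modulated signal.

    Uses d = c * phi / (4 * pi * f_mod), valid within the unambiguous
    range c / (2 * f_mod).
    """
    return C * phase_shift_rad / (4.0 * math.pi * f_mod_hz)


def unambiguous_range(f_mod_hz):
    """Largest distance measurable before the phase wraps around."""
    return C / (2.0 * f_mod_hz)
```

For example, a 20 ns round trip corresponds to about 3 m, which illustrates why nanosecond-scale hardware response is needed.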

TDoA
The measurement is established by the time difference of the signal arriving between transmitters that simultaneously broadcast the reference signal. The TDOA technique determines the position of the mobile device based on multiple reference nodes with the distance variation. Similar to TOA, TDOA requires strict control of the signal propagation clock time, and additional communication hardware or time protocol is required to provide synchronization among the base stations and mobile devices.

RSSI
Distance estimation can be performed by an RSS measurement when the attenuation characteristics of the transmitted signal are known. The distance between the transmitter and receiver is estimated from the signal strength, which decreases during travel due to the propagation effect. This is a major mechanism for distance estimation in the development and research of wireless communication. The relation between transmission power, receiving power, and link distance is expressed through the channel model in terms of the channel modeling ratio, the distance between transmitter and receiver, the transmission power, and the receiving power.
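The channel model is not written out in this section. As a hedged illustration only, the sketch below inverts the widely used log-distance path-loss model, which is an assumption here and not necessarily the model intended by the authors (VLC links are often modeled with a Lambertian channel instead). The reference power `p0_dbm`, reference distance `d0_m`, and path-loss exponent `n` are hypothetical calibration parameters.

```python
# Illustrative sketch: RSS-based ranging with an ASSUMED log-distance model,
#   P(d) = P0 - 10 * n * log10(d / d0)


def rss_to_distance(p_dbm, p0_dbm, n, d0_m=1.0):
    """Invert the log-distance model to estimate the link distance in metres."""
    return d0_m * 10.0 ** ((p0_dbm - p_dbm) / (10.0 * n))


if __name__ == "__main__":
    # With P0 = -40 dBm at 1 m and n = 2, a reading of -60 dBm maps to 10 m.
    print(rss_to_distance(-60.0, -40.0, 2.0))
```

In practice, `p0_dbm` and `n` would have to be calibrated for the deployment environment, which is one source of the error variation discussed in this section.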

Vision Photogrammetry
The main mechanism behind vision photogrammetry is based on image geometry, perspective projection, and planar homography obtained by a mono or stereo camera. Using the image perspective information of the reference object and the intrinsic characteristics of the camera, the distance between the camera and the object can be calculated from the initial object surface area or from the moving distance between two captured images. The stereo camera can provide accurate distance calculations between the camera and objects, using information about the baseline, the focal length, and the corresponding disparity pixels of the left and right captured images.
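For the stereo case, the relation between baseline, focal length, and disparity can be illustrated with the standard rectified-stereo formula Z = f * B / disparity. The sketch below assumes rectified images and a focal length expressed in pixels; the function and parameter names are illustrative.

```python
# Illustrative sketch: depth from disparity for a rectified stereo pair.


def stereo_depth(focal_length_px, baseline_m, disparity_px):
    """Depth in metres of a point seen with the given pixel disparity."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px
```

Because depth is inversely proportional to disparity, a one-pixel disparity error matters far more for distant objects than for near ones.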

Survey of Distance Estimation Research
Precise distance information is one of the most important requirements for driving assistance in ITS systems and for localization in on-demand services. Most research on distance estimation uses RSSI techniques, due to their low cost and complexity. A study presented in Ref. [14] employs the TOA mechanism on UWB for high accuracy in a single base station positioning system by inverting the relations between the arriving signal amplitude and angle, which are obtained between the tag and the base stations. The distance is obtained by analyzing the flight time of the arrived signal based on the communication protocol. The ranging proposal from Ref. [7], which is based on a hybrid of TOA and RSS measurements, performs a theoretical analysis of cases of practical interest for wireless sensor network applications. Here, the inter-node separation overlaps with the region around the critical distance. The hybrid TOA/RSS maximum likelihood estimator is obtained by maximizing the joint pdf of the RSS and TOA observations. The enhanced RSS-based distance estimator [15,16] uses an iterative form of Newton's method for maximum likelihood-based distance estimation under the rain attenuation effect, achieving up to a 90% error reduction. The mechanism proposed in Ref. [17] uses the Monte Carlo algorithm with variance determination for visible light RSS-based estimation. The calculation is built on the maximization of the signal-to-noise ratio by means of matched filtering with intensive analytical elaborations. Likewise, the schemes from Refs. [18,19] adopt the RSS between a base and a mobile station to obtain accuracy with particle filtering under mixed line-of-sight and non-line-of-sight topologies in an indoor LTE-A network. Employing another technique, the distance estimation from Ref. [20] utilizes a real-time monocular-vision-based mechanism for a vehicle scenario.
The inter-vehicle distance estimation is defined by Haar-like features for vehicle detection, tail-light segmentation, and virtual symmetry detection. The detection process is based on a multi-feature fusion technique to enhance its accuracy and robustness. The distance is defined by the pixel information of the captured image with a reference angle and the distance from the camera position to the head/rear of the vehicle. This mechanism is expected to operate both day and night, and likewise for short- and long-range distances. Moreover, with the use of a single camera, the research from Ref. [21] determines the disparity map and the distance of the object from the camera system using two convex mirrors with a sufficiently long radius of curvature. The pair of images capturing the same scene from two different viewpoints is equivalent to two conventional cameras used in a stereo vision system. The study presented in Ref. [22] obtains distance based on the vanishing point, which is specified by the intersection points of straight lane lines in a vehicle scenario. Applying the Hough transform, the right and left edges are extracted and connected to the vanishing point to segment the road area and estimate distance. The solution using the infrastructure concept from Ref. [23], which represents an imaging mechanism with a blob-guided reference point, combines the pattern recognition of line laser image shapes at a certain angle. The estimation is based on the blobs-gaps relation of the position of the laser projector and the camera coordinates. The scheme proposed in Ref. [24] takes the width of the license plate as known information, and subsequently calculates the pixel size of the license plate appearing in the CCD image. From the distance-pixel relation between CCD images with known vehicle distances, the distance information can be obtained by a simple mathematical operation. Another study on the vehicle-to-vehicle topology presented in Ref. [25] uses a stereo camera for vehicle LED tracking and distance estimation. The calculation combines a physical model and the Kalman filter for outdoor on-road traveling scenarios. The real-time approach for traveled distance estimation in Ref. [26] is based on a double integral of acceleration data from the built-in sensors of a smartphone during the cart movement. A precise and robust distance measurement using frequency-domain analysis of images captured by a stereo camera was conducted in Ref. [27]. Here, distance information is calculated using the baseline of two monocular cameras, the focal length, and the pixel disparity, which can be calculated directly through the phase in the frequency domain.
The distances between two LEDs in the real-world plane, AB, with coordinates (x, y, z), and in the image plane, I_1 I_2, with coordinates (x', y'), are given by Equations (1) and (2).
Using the pinhole camera approximation, the cross-height relation between the object and its image is given by Equation (3). Therefore, the distance between the camera and the LED plane, d, is given by Equation (4),
where f is the camera focal length.
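As a hedged illustration of the pinhole relation, the sketch below assumes the standard similar-triangles form d = f * AB / I_1I_2, with AB the Euclidean distance between the LEDs and I_1I_2 the Euclidean distance between their images. The pixel pitch parameter, used to convert pixel coordinates to metres, is a hypothetical addition; the function names are illustrative.

```python
# Illustrative sketch: camera-to-LED-plane distance under the pinhole model,
# assuming the image sensor is parallel to the LED plane.
import math


def led_separation(a_xyz, b_xyz):
    """Real-world distance AB between two LEDs given (x, y, z) in metres."""
    return math.dist(a_xyz, b_xyz)


def image_separation(i1_px, i2_px, pixel_pitch_m):
    """Distance I1I2 on the sensor, converted from pixels to metres."""
    return math.dist(i1_px, i2_px) * pixel_pitch_m


def camera_to_led_plane(focal_length_m, ab_m, i1i2_m):
    """Pinhole similar triangles: d / AB = f / I1I2."""
    return focal_length_m * ab_m / i1i2_m
```

For instance, two LEDs 0.5 m apart that appear 250 pixels apart on a sensor with a 2 µm pixel pitch and a 4 mm focal length would be about 4 m from the camera under these assumptions.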

Survey of VLC Localization Estimation
The research and implementation of localization has a long history involving different levels of accuracy and complexity. VLC localization techniques can be classified into proximity, fingerprinting, triangulation, and photography vision. In the proximity technique [28][29][30], the relative position is estimated on the basis of the unique identification code of the light source, which defines one specific coordinate in the database. The accuracy level depends on the field of view of the transmitter. The mobile node can retrieve the reference location offline or online using wireless communication interfaces such as Wi-Fi, Bluetooth, or ZigBee. Although this technique has low accuracy, it is easy to implement and can be applied in proximity localization applications. The medical tracking system used in hospitals, presented in Ref. [28], estimates the location of moving devices based on the nearest ID from the illuminating LED base station. The coordinate retrieval and mapping process between mobile devices and the database server is based on the ZigBee or Wi-Fi connection interface. An enhancement of the proximity method with multiple reference light sources in the presence of LED combinations is considered in Ref. [30]. The optical element was deployed with full angular diversity illumination from multiple LEDs by shaping their emission patterns. A simple binary signal detection of LED transmitters was proposed to identify the overlapping regions and bound the receiver location. The intersection of the optical cones of the lighting sources bounds the receiver location to a specific pattern. The optical signal received by a single photodiode from the LED array defines the pattern of the overlapping region through the FFT of the modulated light signal, which has different frequencies ranging from 4 to 10 kHz. The accuracy depends on the density of LEDs and the FOV of the captured signal.
The fingerprinting method is a pattern recognition technique. It operates by matching the measured data with pre-measured reference location data based on the distribution of LEDs and the variance of received power due to reflections and scattering of the light source. The estimated coordinate is calculated on the basis of the linear static data or using an AI training model such as a neural network. In comparison with proximity, the fingerprinting technique is more complex and achieves a higher estimation accuracy. The reported system achieves an average distance error of 33.5 cm with a standard deviation of 16.6 cm using a single luminaire, 12.9 cm with a deviation of 8.5 cm using 9 lights, and an average error of 26 cm using 4 lights.
Another technique, which can achieve a higher accuracy level using the geometric properties of triangles, is triangulation. The core calculation of this technique is based on distance estimation, which includes the TOA, the time difference of arrival (TDOA), and the RSS, and on angulation estimation with the AOA method. The VLC positioning system from Ref. [31] uses the Cramer-Rao bound for TOA-based distance estimation. The LEDs are separated by different frequencies and perfectly synchronized with the PD receiver. The VLC TDOA-based indoor positioning system from Refs. [32,33] estimates the indoor location with a PD and multiple known LEDs. The separation of LEDs for reference node identification is based on the TDMA technique with a square pulse pilot signal. The pilot signal can also be a sinusoidal waveform, as in Ref. [34]. The position of the mobile node was estimated from the ID detection and the signal collected by the PD. The system proposed in Ref. [35] uses a PD array to detect the AOA of the source signal by comparing the received-power-weighted sum of the angles of the PDs in the array, which is controlled by the distance, radiance angle, and incidence angle. The ToA-based 3D localization experiments in Ref. [33] obtained a precision of 100 mm to 200 mm, which could be improved by compensating the phase estimation for time synchronization, whereas the AOA estimation scheme based on a circular PD array in Ref. [35] obtained 30 cm. Working with angle differences of arrival (ADOA), the VLC positioning framework from Ref. [36] defined the angle between the reference LEDs and the receiver using image sensors or a PD array. The scheme can achieve a 15 cm error with the least-squares method. The system from Ref. [37] works with RSSI and survey data for indoor localization. The signal strength of each LED transmitter was obtained from the photodiode receiver module. The approximate position of the mobile device holding the receiver was estimated by comparing the received values with the values of the captured dataset. The system includes four transmitters with a 16-cell label arrangement for the RSSI survey data.
With photography vision, the geometric relation between the world coordinates and the image plane is obtained from the lens transfer using image processing. The relative coordinates of the mobile node can be derived with a single camera using collinearity, or with a dual camera using geometric distance estimation. The simulation and implementation from Ref. [38] is based on distance estimation to four reference LEDs. The position of the mobile device is calculated by the trilateration method. The accuracy can be enhanced by increasing the image sensor resolution and improving the object detection method. The localization scheme from Ref. [39] converts the pixel coordinate system of the image sensor into a mesh coordinate system using the collinearity equation. The indoor localization scheme based on the OCC and PDR systems in Ref. [40] combines the identification information of the transmitter and its orientation vector to obtain an approximate location of the mobile device. The first part of the rotation vector is obtained from accelerometer and gravity data, and the second part from the magnetometer and gyroscope. This is a real-time localization scheme that recalculates and updates the position and direction information of the LED. The error varies from 0.2 to 1.0 m, with an average value of 35 cm. Another indoor localization proposal in Ref. [41] designed an LED-based beaconing infrastructure that could be integrated into the existing lighting infrastructure. The system estimates the unknown location based on the detected image and a priori information on the beacon positions. Each LED blinks at a high frequency to avoid flickering. The location is determined on the basis of LED classification using the beacon and distance estimation based on the LED blob size. The measurement results with 4 beacons showed an error of 17 cm.

Distance Estimation
The proposed scenario of distance estimation based on the image sensor is shown in Figure 2. The scheme uses at least two LEDs, which continuously broadcast their coordinate information using VLC technology. From the captured coordinate information on the image sensor, the distance between the two LEDs is given by Equation (5).
where X, Y, and Z are the coordinates of the LEDs in the real world, which are denoted A and B, and f is the camera focal length. In addition, I_1 I_2 is the distance between the images of the two LEDs on the image sensor, with coordinates x' and y', which is given by Equation (2).
The estimation process includes two main calculations: the image sensor rotation distance and the center shift distance. First, the image sensor rotation distance defines the proportion between the distance separating the two LEDs and that separating their images on the image sensor by means of a rotation angle. It includes two processing steps, which define the image of the LEDs on the OXY plane with a separation along the Z coordinate of the LEDs, and the image sensor plane with a rotation angle. If the camera image sensor is parallel to the LED plane, as in Figure 2, the distance calculated using Equation (4) is the proportion among the focal length, the distance between the reference LEDs, and the pixel length of the image of the LEDs. However, in a general capture case, such as the scenarios illustrated in Figure 3a-c, where the image sensor plane is not parallel to the LED plane, the distances between the LED images on the image sensor are the same. For the general scenario, the calculation must therefore consider the capture angle formed by the line between the two LEDs and the camera surface.
As shown in Figure 3d, assuming that A and B are known LEDs, the image of AB on the OXY plane is AB', the image of AB on the image sensor is I_A I_B, and the distance from AB to the image sensor, obtained by the pinhole camera approximation, is d. However, the desired value is h, which is calculated by Equation (6), where µ is the angle between AB and AB', which is also equal to the angle between HF and FD; Φ is the angle between AF and FD; and I_F is the center point of the image sensor. Considering the image sensor rotation status in Figure 4, the rotation angle with respect to the OXY plane is controlled by the pitch and roll axis information. These can be acquired from the variation detected by the gyroscope sensor with a reference coordinate or the direction coordinate of the two reference LEDs. The calculation for distance estimation in the scenario of Figure 3d assumes that the image sensor plane and the OXY plane are parallel. For a general rotation of the image sensor relative to the OXY plane, µ in Equation (6), which directly affects AB' (as in Figure 3d), is redefined as µ_OXY as in Figure 5, where µ_OXY is the angle between AB and the OXY plane. The rotated image sensor plane has a normal vector, which is defined in Equation (7),
where the vectors u and v are the rotation vectors of the x-world coordinate axis and the y-world coordinate axis, respectively, given by Equation (8), and β and α are the rotation angles of the image sensor about the roll axis and the pitch axis, respectively.
The second calculation is for the center shift distance, which defines the real distance between the plane of the two LEDs and the image sensor when the LED image is shifted from the center of the image sensor, as shown in Figure 6. The expected distance, EF, is defined by Equation (12).
Figure 5. Image of reference points in the image sensor plane.
Here, I_G is the center point of the image sensor, which is the intersection of the principal axis of the lens with the image sensor. FG, the distance between the LED plane and the lens at the center of the image sensor, is obtained from the first step. Subsequently, I_G I_E is the center shift of the LED image, which can be measured in image sensor pixels along the line through the image sensor center point perpendicular to the line joining the images of the two LEDs.
The equation of the line through the two points (I_A, I_B), with slope m and intercept b, is given by the conditions in (13).
In the image sensor plane (oxyz), the coordinates of I_E are defined by the intersection point between I_A I_B and the line perpendicular to I_A I_B passing through the image sensor center point, given by Equation (14).
Then, the coordinates of the intersection point, E, between the expected distance line and the line between the two LEDs in the LED plane are calculated by Equation (15),
where I_A I_E is calculated from the coordinates of the detected I_A on the image sensor and the formula in Equation (14) for point I_E.
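The construction of I_E, the foot of the perpendicular from the image sensor center to the line through I_A and I_B, can be sketched as follows. This illustration uses a vector projection, which is equivalent to intersecting the slope-intercept line with its perpendicular but also handles vertical lines; it is a sketch of the geometry only, not the authors' exact formulation, and the function names are hypothetical.

```python
# Illustrative sketch: center shift of the LED image on the sensor plane.
import math


def perpendicular_foot(i_a, i_b, center=(0.0, 0.0)):
    """Foot of the perpendicular from `center` onto the line through i_a, i_b.

    All points are (x, y) pixel coordinates on the image sensor.
    """
    ax, ay = i_a
    bx, by = i_b
    gx, gy = center
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        raise ValueError("I_A and I_B must be distinct points")
    # Projection parameter of the center onto the direction I_A -> I_B.
    t = ((gx - ax) * dx + (gy - ay) * dy) / length_sq
    return (ax + t * dx, ay + t * dy)


def center_shift(i_a, i_b, center=(0.0, 0.0)):
    """Length I_G I_E of the center shift, in pixels."""
    ex, ey = perpendicular_foot(i_a, i_b, center)
    return math.hypot(ex - center[0], ey - center[1])
```

The distance I_A I_E then follows as the Euclidean distance between the detected I_A and the returned foot point.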

Coordinate Estimation
In the general scenario shown in Figure 6, the intersection point, E, between the LED line, AB, and the line EF, which gives the distance from the LED plane to the camera obtained in the distance estimation step, has complete coordinates that can be used in the trilateration formulas. On the basis of this mechanism, every pair of transmitter LEDs defines one distance between the camera and the LED pair at the specified intersection point. With n LEDs, there are k pairwise combinations of the n LEDs, each providing a distance (between the camera and the LED line) and the full coordinates of its intersection point.
In the general scenario, the location of a mobile device can be estimated on the basis of reference nodes with known coordinates and measured distances. With n known anchor nodes L_i = (x_i, y_i) and distances r_i from the mobile node to the anchor nodes, where i = 1, ..., n, the estimated location of the mobile node M = (x, y) in two dimensions can be calculated by Equation (16) [42]. The anchor nodes and distances are defined in Section 3.1.1, in particular by Equations (12) and (15).
The calculation can be represented in matrix form as Ax = b, which defines the relationships between positions and distances according to Equation (17).
The position of the mobile node with coordinates (x,y) can be calculated using the least-squares matrix by Equation (18).
It can be seen that the accuracy of the coordinate calculation is mainly controlled by the distance estimation results. However, because the anchor positions and distance measurements are inaccurate, with errors that follow Gaussian distributions, the least-squares formula can be reformulated with weights based on the estimation variances, as in Equation (19). By applying more reference nodes whose distance estimation errors follow a Gaussian distribution, the error with respect to the coordinates can be reduced.
Here, Equations (16) to (19) for x, y, and z correspond to the least-squares system with three reference points in a 2D topology and four reference points in a 3D topology.
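As a rough illustration of the least-squares step, the sketch below uses the common linearization in which the circle equation of the last anchor is subtracted from the others, giving a system of the form Ax = b that is solved through the (weighted) normal equations. This is a minimal 2D sketch under stated assumptions and is not claimed to reproduce Equations (16)-(19) term by term; the function name `trilaterate_2d`, the per-row `weights` argument, and the sample anchor layout are hypothetical.

```python
import math


def trilaterate_2d(anchors, dists, weights=None):
    """Estimate (x, y) from >= 3 anchors by linearized least squares.

    anchors: list of (x_i, y_i); dists: list of r_i (same order).
    weights: optional list with one weight per linearized row (n - 1 values),
             e.g. inverse variances of the distance estimates (assumption).
    """
    (xn, yn), rn = anchors[-1], dists[-1]
    rows = []
    for (xi, yi), ri in zip(anchors[:-1], dists[:-1]):
        # Circle i minus circle n gives one linear equation a1*x + a2*y = b.
        a1 = 2.0 * (xn - xi)
        a2 = 2.0 * (yn - yi)
        b = ri**2 - rn**2 - xi**2 - yi**2 + xn**2 + yn**2
        rows.append((a1, a2, b))
    if weights is None:
        weights = [1.0] * len(rows)

    # Accumulate the 2x2 normal equations (A^T W A) p = A^T W b.
    s11 = s12 = s22 = t1 = t2 = 0.0
    for (a1, a2, b), w in zip(rows, weights):
        s11 += w * a1 * a1
        s12 += w * a1 * a2
        s22 += w * a2 * a2
        t1 += w * a1 * b
        t2 += w * a2 * b
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("anchors are (nearly) collinear; position is ambiguous")
    x = (s22 * t1 - s12 * t2) / det
    y = (s11 * t2 - s12 * t1) / det
    return x, y


# Hypothetical layout: four anchors and noise-free distances to the point (1, 2).
anchors = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (4.0, 3.0)]
true_pos = (1.0, 2.0)
dists = [math.hypot(true_pos[0] - ax, true_pos[1] - ay) for ax, ay in anchors]
estimate = trilaterate_2d(anchors, dists)  # close to (1.0, 2.0) for exact distances
```

With noisy distances, rows derived from less reliable measurements can be given smaller weights, which is the role the estimation variances play in the weighted formulation.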

OCC Transmitter
Because the FOV captured by the image sensor varies with distance, OOK modulation is limited in its tolerance to distance variation, whereas FSOOK has an advantage in terms of stripe width separation. The number of bits covered in the captured image is inversely proportional to the link distance, so fewer bits are covered at greater communication distances. This bit loss arises from the coverage of the camera FOV and cannot be recovered by any technique, because it lies outside the control of the protocol. This is one of the main limitations of OOK modulation for the OCC system. With a frequency subcarrier, where each modulated bit is represented by multiple cycles, the distance variation does not affect the decoding process. In this technique, the binary bit is classified on the basis of the waveform frequency of the subcarrier, which appears as a group of rolling stripes at the image sensor. At different link distances, the subcarrier frequency can be classified by the Fourier transform or by the stripe width pattern. The number of rolling stripes depends on the captured FOV of the image sensor and the link distance; however, as long as the width of the rolling stripes is constant, the recovered subcarrier frequency is the same. This is the basic advantage of FSOOK modulation over the OOK technique with respect to bit decoding.
The frequency shift OOK modulation operates with multiple frequency shifts of the on-and-off LED light. The number of frequency shifts determines the number of embedded bits in one symbol. The frequency selection should consider eye safety and the threshold frequency of the shutter. Basically, these frequencies range from 100 Hz to 8 kHz for complete illumination with flickering mitigation, as shown in Figure 7. However, to reduce the loss in coding rate caused by the line coding mechanism, one more frequency should be allocated for the preamble signal in addition to the frequencies used for data coding. The preamble signal is sent at the beginning of every super frame for synchronization. Most existing studies on frequency shift OOK modulation define the preamble frequency above the image sensor shutter speed threshold. This is about 10 kHz, at which the captured image is at the average brightness level of the light source.
Flickering is the first consideration in VLC deployment and is one of the most important factors of the transmitter. The frequency separation is defined by the difference in width between the black and white stripes, which is created by the on-and-off signal duration of the LED, as shown in Figure 8. If there is a large variation in the average illumination, the flickering point will be recognized by the human eye, as shown in Figure 9.
The higher the frequencies that are applied, the more flickering will occur because of the larger separation between the switched frequencies. This is a tradeoff between the embedded rate and the flickering index of the flickering point.
To minimize the flickering point, the proposed system applies two frequencies for the subcarrier on-off signal. The bit encoding mechanism and frequency selection are shown in Table 2. The line coding mechanism is defined with two sampling bits "00" for the synchronization signal. At the frequencies of 2 and 4 kHz, the flickering point is minimized, with a 10% variation in illumination.
The system can achieve a flickering index of 90%, as measured with the "VISO system flicker tester" application [43]. The frequency selection is based on two factors: the flickering point and the communication reliability. With regard to communication, if a low frequency is selected for the subcarrier, there is a tradeoff between the number and the pixel width of the captured LED rolls. The number of rolls decreases as the capture distance increases. A low frequency is reliable for decoding the signal; however, the flickering point is large because of the illumination variation between the on and off light signals at the frequencies f1 and f2. The optical clock rate of the transmitter should be the same as the camera frame rate of the receiver. The selection of the optical rate and the camera sampling rate should take into account the image processing required by the bit decoding algorithm.
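To make the two-frequency on-off signal concrete, the following sketch generates a sampled 2FSOOK drive waveform with one symbol per camera frame. The bit-to-frequency mapping (0 to 2 kHz, 1 to 4 kHz), the symbol rate of 20 symbols per second, the simulation sample rate, and the function names are assumptions made for illustration; Table 2 defines the actual encoding, and the "00" synchronization signal is not modeled here.

```python
def fsook_waveform(bits, symbol_rate=20, sample_rate=80_000,
                   f0=2_000, f1=4_000):
    """Return a list of 0/1 LED states for the given bits.

    Each bit occupies one symbol of 1/symbol_rate seconds and is sent as a
    50% duty-cycle square wave at f0 (bit 0) or f1 (bit 1). All parameter
    values are illustrative assumptions.
    """
    samples_per_symbol = sample_rate // symbol_rate
    wave = []
    for bit in bits:
        freq = f1 if bit else f0
        half_period = sample_rate // (2 * freq)  # samples per on (or off) state
        for n in range(samples_per_symbol):
            wave.append(1 if (n // half_period) % 2 == 0 else 0)
    return wave


def count_transitions(samples):
    """Number of on/off state changes inside a block of samples."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a != b)


wave = fsook_waveform([0, 1])
sym0, sym1 = wave[:4000], wave[4000:]
# Both symbols keep the LED on for half of the time, so the average
# illumination is the same for either bit value in this idealized model.
duty0 = sum(sym0) / len(sym0)
duty1 = sum(sym1) / len(sym1)
```

In this idealized model the average brightness does not depend on the transmitted bit, which is the property that keeps the flickering point small; a real LED driver with finite rise and fall times would deviate from it.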

OCC Receiver
The proposed OCC receiver architecture is based on the rolling shutter mode of the image sensor. The modulated optical signal is captured and converted to a color pixel array by the camera. The output stream is a buffer into which the output image data are dumped at each sampling frame. The data decoding time includes light conversion, memory access time, and image data processing. The total time must be less than 1/(frame rate); if it is larger, the buffer memory will overflow. The memory access time depends on the image format and resolution. To minimize the decoding time, we applied an uncompressed image format based on the gray-scale YUV color space. Figure 10 presents the data decoding with image processing. The gray-scale image data from the buffer are dumped to bitmap data using a two-dimensional array to minimize the processing of contour detection that results from the color conversion.
The data decoding process includes two steps: detection of the regions of interest (ROIs) and subsequent extraction of the cell data. The first step isolates the LED image areas from the captured image, and the second step decodes the frequency from the detected ROIs. The ROI separation step is the more important one, because an error there is fatal for the next step. To detect high-frequency changes in illumination, we exploit the fact that the rolling shutter of most modern CMOS cameras does not capture the whole image simultaneously. Instead, the sensor pipelines the data transfer with the exposure of the pixels. This means that a light pulsing at a frequency much higher than the frame rate will light up only some rows, thus producing bands in the image. By detecting the frequency of these bands in the frequency domain of the image and inferring the pulsing light frequency, we can use the relationship between them to build the demodulation for FSOOK.
The main difference in the OCC receiver system configuration is the exposure time, which is defined as the exposure duration of a pixel row. Depending on the configured exposure time, which follows the shutter speed, the light density is spread out onto the image sensor during absorption. A shorter exposure time can reduce the interference and the blur effect that occur near the lighting source, and it therefore has a direct effect on the frequency separation in the decoding process. A strongly reduced exposure time, however, has the disadvantage of low brightness, as the captured image is rendered at a dark gray level. Therefore, the shutter speed configuration should consider the brightness of the scenario, which depends on the LED power and the environment.
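The frequency-domain detection of the bands can be sketched as follows. Because each image row is read out one row readout time after the previous one, the row-averaged intensity of an ROI acts as a time series sampled at 1/(row readout time), and the subcarrier can be classified by comparing the spectral power at the candidate frequencies. This is a simplified sketch, not the authors' decoder; the function names, the row readout time of 20 µs used in the example, and the single-bin DFT approach are assumptions.

```python
import cmath
import math


def tone_power(profile, freq_hz, row_time_s):
    """Power of the row-intensity profile at one frequency (single-bin DFT).

    profile: mean gray level of each ROI row, top to bottom.
    row_time_s: time between the readout of consecutive rows (assumed known).
    """
    mean = sum(profile) / len(profile)
    acc = 0j
    for k, value in enumerate(profile):
        # Row k is sampled at time k * row_time_s; the DC level is removed first.
        acc += (value - mean) * cmath.exp(-2j * math.pi * freq_hz * k * row_time_s)
    return abs(acc) ** 2 / len(profile)


def classify_subcarrier(profile, row_time_s, candidates=(2000.0, 4000.0)):
    """Return the candidate frequency with the largest spectral power."""
    return max(candidates, key=lambda f: tone_power(profile, f, row_time_s))


def synthetic_profile(freq_hz, rows=200, row_time_s=20e-6):
    """Ideal black/white stripe profile for testing (hypothetical helper)."""
    return [255 if int(2 * freq_hz * k * row_time_s) % 2 == 0 else 0
            for k in range(rows)]


detected = classify_subcarrier(synthetic_profile(4000.0), 20e-6)
```

Evaluating only the candidate frequencies is cheaper than a full Fourier transform of the ROI, which matters when the decoding must finish within one frame period.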
The on-off signal is represented by black and white stripes of occupied pixels in the captured image. The width of the stripes depends on the modulation frequency. With a square waveform signal at a frequency $f_s$, each on or off state lasts $t_f = 1/(2f_s)$ seconds, so that one pair of black and white stripes corresponds to one full period of the signal. For an image of $h_s$ pixel rows and a given row readout time, the width of the stripe pair, $w_s$, in millimeters, is defined by Equation (20).
Here, $d_c$ is the pixel density, $t_r$ is the row readout time, and $h_s$ is the height of the image in pixels, as shown in Figure 11.
The blur effect, which arises from the inter-symbol interference and the offset shifting time of the exposure process, has a width given by Equation (21).
where t is the exposure time during which the sensor opens the shutter to receive photons.
If the transition time of the modulated optical signal is larger than the camera exposure time, the captured signal cannot be classified. The stripe width in Figure 12 shows the effect of different shutter speed configurations. At short shutter speeds, the stripe width is reduced to close to the optical clock rate.
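As a rough numerical illustration of how the subcarrier frequency and the exposure time translate into image rows, the sketch below uses the first-order rolling-shutter relation that one on or off state of duration 1/(2·f_s) spans 1/(2·f_s·t_r) rows, and that the transition between stripes is smeared over roughly t/t_r rows. These relations are simplifying assumptions and are not a reconstruction of Equations (20) and (21); the row readout time of 20 µs and the exposure time of 100 µs are hypothetical values.

```python
def stripe_rows(f_s, t_r):
    """Rows covered by one on (or off) state of a square wave at f_s hertz."""
    return 1.0 / (2.0 * f_s * t_r)


def transition_rows(t_exp, t_r):
    """Approximate number of rows over which a stripe edge is blurred."""
    return t_exp / t_r


def stripes_resolvable(f_s, t_exp):
    """Heuristic check: the exposure should be shorter than one on/off state."""
    return t_exp < 1.0 / (2.0 * f_s)


t_r = 20e-6     # hypothetical row readout time in seconds
t_exp = 100e-6  # hypothetical exposure time in seconds

rows_2k = stripe_rows(2_000, t_r)      # stripe height for the 2 kHz subcarrier
rows_4k = stripe_rows(4_000, t_r)      # stripe height for the 4 kHz subcarrier
blur = transition_rows(t_exp, t_r)     # rows affected by edge blur
```

Doubling the subcarrier frequency halves the stripe height, so for a fixed exposure time the blurred edges take up a larger share of each stripe at 4 kHz than at 2 kHz.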

Topology and Operation
The flowchart for distance estimation based on OCC is shown in Figure 13. It includes five steps: OCC transmitter classification, coordinate query, device rotation matrix calculation, OCC LED image coordinate tracking, and distance estimation. In the first step, the OCC LEDs are extracted from among the illumination LEDs to be used for data communication in the second step. The OCC LEDs broadcast their coordinates through the optical channel. The coordinates of the two LEDs are queried from a server with the unique IDs, which are obtained by the OCC technique at the third step. The query can be executed with an index table or through a communication technology such as Wi-Fi or LTE. The distance between the two LEDs is obtained from the coordinate information. Subsequently, the separation distance of the two LED images on the image sensor plane can be calculated by pixel measurement. By combining the rotation angle between the LED lines and the image sensor plane at the fourth step, the distance estimation can be finalized.

Figure 13. Distance estimation processes.

Error Analysis
The proportions of h and a in the camera architecture shown in Figure 14 are defined by Equation (22). The accuracy of the object image height a affects the error rate of d, where the focal length f and the object height h are defined values extracted from the camera information and the OCC reference coordinate calculation.

$$\frac{h}{a} = \frac{d}{f} \tag{22}$$

The estimation accuracy is directly dependent on the LED image landmark recognition. More specifically, it is strongly affected by the reference point for coordinate identification, which depends on the light conditions of the environment and the reference landmarks. The specified coordinates of the proposed scheme are defined on the basis of the center of the LED contours, as shown in Figure 15. However, because of the rolling effect portrayed in Figure 16, the captured stripes of the LED are not at a fixed location in the image plane. This generates an error with regard to the center position of the LED contours.
Given an error of some pixels in the image line of the two LEDs, the distance calculation is proportional to h and f. A 1-pixel error at the image sensor plane can generate an error in the distance, according to Equations (23)-(25). With common camera parameters, e.g., f = 600 pixels, the distance error is equal to ~3 cm for a 1-pixel error.

$$f_{pixels} = \frac{f_{mm} \times pixelDensity}{25.4}$$
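The sensitivity of the distance to a 1-pixel error can be checked numerically from the proportion h/a = d/f of Equation (22), which gives d = f·h/a. The sketch below is a worked example under stated assumptions: the LED separation h = 0.2 m and the image separation a = 60 pixels are hypothetical values chosen only to give a 2 m link with f = 600 pixels, and the pixel density is assumed to be expressed in pixels per inch (hence the factor 25.4).

```python
def distance_from_image(f_px, h_m, a_px):
    """Camera-to-LED-line distance d = f * h / a (pinhole proportion)."""
    return f_px * h_m / a_px


def one_pixel_error(f_px, h_m, a_px):
    """Change in the estimated distance when a is overestimated by one pixel."""
    return abs(distance_from_image(f_px, h_m, a_px)
               - distance_from_image(f_px, h_m, a_px + 1))


def focal_length_pixels(f_mm, pixels_per_inch):
    """Convert a focal length in millimeters to pixels (25.4 mm per inch)."""
    return f_mm * pixels_per_inch / 25.4


f_px = 600.0  # focal length in pixels, as in the text
h_m = 0.2     # hypothetical distance between the two LEDs in meters
a_px = 60.0   # hypothetical separation of the LED images in pixels

d = distance_from_image(f_px, h_m, a_px)  # 2 m for these values
err = one_pixel_error(f_px, h_m, a_px)    # a few centimeters
```

For these assumed values the 1-pixel error is about 3.3 cm, the same order as the ~3 cm figure quoted above; because the error grows roughly with d²/(f·h), it increases quickly at longer range or with closely spaced LEDs.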
The clarity of the stripes in the captured image affects the data decoding error and the distance estimation accuracy, which are based on the stripe width separation. Assuming that a white LED is used as a transmitter, the width of the stripe comes from the color band between the black and white regions of the two LED off-state stripes. The most important detection parameter is the on-off separation threshold value, which is controlled by the transition between black and white and by the LED driver frequency response.
The absorbed photon intensity of the LED signal at the image sensor with the LOS path loss model is given by Equation (26).
where $h_i$ is the color channel DC gain, P(t) is the transmission power of the light source, and $P_r$ is the received optical power.
The channel DC gain path loss can be defined with Equation (27).
where m is the order of the Lambertian emission, A is the image sensor pixel area, D is the distance between the transmitter and the receiver, θ denotes the angle of irradiance and the angle of incidence, $T_s(\theta)$ is the signal transmission coefficient of an optical filter, g(θ) is the gain of an optical concentrator, and $\theta_{FOV}$ is the receiver field of view. The values of m and g are defined by Equation (28).
where n denotes the internal refractive index of the optical concentrator.
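For orientation, the widely used Lambertian line-of-sight model can be evaluated as in the sketch below. It is assumed, but not confirmed by the text, that Equations (27) and (28) follow this standard form, in which the DC gain is (m + 1)·A/(2π·D²)·cos^m(φ)·T_s·g·cos(ψ) inside the field of view and zero outside, with m = −ln 2 / ln(cos Φ_1/2) and g = n²/sin²(θ_FOV). The separate symbols φ (irradiance angle) and ψ (incidence angle), the half-power angle parameter, and all numeric values are illustrative assumptions.

```python
import math


def lambertian_order(half_power_angle):
    """Lambertian order m from the LED half-power semi-angle (radians)."""
    return -math.log(2.0) / math.log(math.cos(half_power_angle))


def concentrator_gain(n, psi, fov):
    """Gain of an optical concentrator with internal refractive index n."""
    if 0.0 <= psi <= fov:
        return n ** 2 / math.sin(fov) ** 2
    return 0.0


def los_dc_gain(area, dist, phi, psi, half_power_angle, fov,
                n=1.5, filter_gain=1.0):
    """Line-of-sight channel DC gain of the standard Lambertian model.

    area: detector (pixel) area in m^2; dist: link distance in m;
    phi: irradiance angle at the LED; psi: incidence angle at the receiver.
    """
    if psi > fov:
        return 0.0  # the source lies outside the receiver field of view
    m = lambertian_order(half_power_angle)
    return ((m + 1.0) * area / (2.0 * math.pi * dist ** 2)
            * math.cos(phi) ** m
            * filter_gain
            * concentrator_gain(n, psi, fov)
            * math.cos(psi))


# Hypothetical geometry: LED directly above the camera, 60 degree half-power angle.
half_angle = math.radians(60.0)
fov = math.radians(45.0)
gain_1m = los_dc_gain(1e-6, 1.0, 0.0, 0.0, half_angle, fov)
gain_2m = los_dc_gain(1e-6, 2.0, 0.0, 0.0, half_angle, fov)
```

The inverse-square dependence on D is visible directly: doubling the link distance reduces the gain by a factor of four when the angles are unchanged.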
With the effect of the camera lens, the channel gain is defined by Equation (29).
where C blur is the blur concentration ratio, which is quantified by its standard deviation by Equation (30).
where s, f, and l are the pixel edge length, the camera focal length, and the diameter of the LED source, respectively.
To analyze the performance of the system's SNR over signal propagation, the noise model plays an important role. Optical noise includes shot noise ∂, thermal noise, and multipath inter-symbol interference [29], defined by Equations (31)-(34), respectively.
$$N_{shot} = 2qW\mu\left(P_r + P_{rISI}\right)B_{en} + 2qW I_{bg} I_2 B_{en} + \varepsilon,$$

where µ is the detector responsivity, q is the electronic charge, $B_{en}$ is the equivalent noise bandwidth, W is the sampling rate of the camera, $I_{bg}$ is the background current, $I_2$ is the noise bandwidth factor, and ε is the interference noise from the neighboring channel. K is the Boltzmann constant, $T_k$ is the absolute temperature, G is the open-loop voltage gain, δ is the fixed capacitance of the photodetector per unit area, E is the FET channel noise factor, $g_m$ is the FET transconductance, $I_3$ = 0.868, and $I_2$ = 0.562. $P_r$ denotes the received power, and $P_{rISI}$ is the received power resulting from the inter-symbol interference. T denotes the exposure time of the camera. The SNR value is defined by Equations (35) and (36).

$$S = \mu P_r, \qquad P_r = \int \left(h(t) \otimes P(x)\right)\,dx \tag{36}$$

With the proposed 2FSOOK modulation at subcarrier frequencies of 2 and 4 kHz, the BER is defined by Equation (37). Figures 17 and 18 show the performance of the SNR and SINR with different configurations of shutter speed and link distance. For the SNR, the fluctuation of the signal strength is controlled by the received power propagation and the lens blur effect. In contrast to the SNR, the main factor of the SINR is the exposure time, which limits the interference from neighboring modulated light pulses.

Image Sensor Communication Performance
The distance and localization estimation performance of the proposed scheme was evaluated with an experimental implementation of the transmitter circuit, including the microcontroller LED driver and the LED, as shown in Figure 19. The receiver comprises a smartphone rolling shutter camera and an application that can configure the camera shutter speed.
The data rate of the proposed 2FSOOK modulation scheme at different configured camera frame rates is shown in Figure 20. The system data rate is proportional to the camera frame rate. However, high camera frame rates create a problem with decoding processing. On the current commercial smartphone, operation at more than 20 fps is not reliable with respect to the processing capability, owing to the cache memory. The buffer is overloaded in some scenarios where the shot noise is high. Table 3 provides the configuration of the experiment that guarantees data encoding and decoding.
The roll stripes of the 2FSOOK subcarrier frequencies are shown in Figure 21, where the 2 and 4 kHz OOK frequencies are embedded. The experiment uses an LED with a diameter of 17 cm, and the power density distribution analysis of two neighboring LED optical pulses at the 2 and 4 kHz OOK frequencies is shown in Figure 22. The distance from the transmitter to the receiver can be up to 2 m for reliable data decoding with identifiable stripes in the captured image. At larger distances, the separation between the contours is not stable, and the system encounters errors from missing stripe widths, which arise from the limitation of the rolling mechanism. Therefore, to achieve high accuracy and minimize the complexity of error correction, the maximum number of rolling contours should be guaranteed for the demodulation of one frequency. The number of contours determines the maximum communication range between the transmitter and the receiver. This means that with FSK modulation, the position of the receiver can change dynamically, and the maximum communication range is reached while the number of contours is still more than one. In our implementation, we configure the image sensor with a resolution of 600 × 800 pixels and a maximum communication range of 2 m. The higher the configured resolution, the longer the communication distance that can be achieved. However, such a setup faces processing time issues on the device. The performance with regard to processing capability with respect to image formats, resolution configuration, and the rolling signal pixel width of 2FSOOK is shown in Figures 23-25.

Distance Estimation Performance
The performance of the proposed scheme for distance estimation is measured using photography with the setup illustrated in Figure 26. As described in the analysis in the previous section, the estimation error arises from the calculation of the distance between the LEDs, which is defined by the line connecting the two center points of the reference LEDs. The OCC LEDs can be separated from common LEDs by their continuous rolling contours. The LED ROI is bounded by the positions of the first and last contours, which vary in every captured frame because of the rolling effect during the on status of the LED. In the experiment, Figure 27a,b represent the features of the 4 and 2 kHz LED signals, which correspond to a maximum of 4 and 6 pixels, respectively, at 600 × 800-pixel resolution. The error of the coordinates of the LED contour on the image sensor ranges from 1 to 6 pixels under stable conditions.
A photograph depicting the experimental implementation of the proposed scheme is presented in Figure 28, portraying two OCC LEDs among three illumination LEDs. The coordinates of the two OCC LEDs are mapped with the index table through LED-ID [44] or by receiving the coordinates directly from the LED signal. In this scenario, we applied the LED-ID architecture with a 5-bit ID. The data are broadcast by the LEDs, which are 17 cm apart, and received by the rolling shutter camera in the smartphone. The coordinates of the LEDs are queried from a server via a Wi-Fi connection. The camera focal length is one of the most important parameters for distance estimation using the pinhole camera approximation, and it should be defined before the estimation processing begins. Its value can be obtained from the Camera API (for example, CameraIntrinsics or CameraCharacteristics library for Android devices). It can also be defined initially through manual configuration using a built-in database. The focal length of existing cameras will be surveyed and adapted for the application. By extracting the device model from the camera API, which most existing devices support, we can match the device camera focal length from the database and then configure these values.
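The role of the focal length in the pinhole camera approximation can be illustrated with a minimal sketch. It assumes that the LED plane is parallel to the image sensor (no rotation correction) and that the focal length has already been converted to pixel units. The function names, the 4.0 mm focal length, and the 4.8 mm sensor width are hypothetical values chosen for illustration and are not taken from the experiment; only the 17 cm LED separation comes from the setup described above.

```python
import math


def focal_length_in_pixels(focal_length_mm, sensor_width_mm, image_width_px):
    """Convert a focal length in millimetres to pixel units (hypothetical helper)."""
    return focal_length_mm * image_width_px / sensor_width_mm


def led_distance_m(center_a, center_b, led_separation_m, focal_px):
    """Estimate the camera-to-LED-plane distance with the pinhole model.

    center_a, center_b: (x, y) pixel coordinates of the two LED center points.
    led_separation_m:   real distance between the two LED centers in metres.
    focal_px:           focal length expressed in pixels.
    Assumes the LED plane is parallel to the image sensor.
    """
    pixel_separation = math.hypot(center_b[0] - center_a[0],
                                  center_b[1] - center_a[1])
    if pixel_separation <= 0:
        raise ValueError("LED centers must be distinct")
    # Similar triangles: distance / separation = focal length / pixel separation.
    return focal_px * led_separation_m / pixel_separation


# Hypothetical camera parameters; the 0.17 m separation follows the text.
focal_px = focal_length_in_pixels(4.0, 4.8, 800)
distance = led_distance_m((300, 300), (368, 300), 0.17, focal_px)
print(f"estimated distance: {distance:.2f} m")
```

Because the estimate is inversely proportional to the pixel separation, a center-point error of a few pixels has a larger effect at longer range, where the two LEDs appear closer together on the sensor.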
For the centralization error rate analysis of the image sensor photography technique, the experimental implementation considered only the optical camera communications and the pinhole camera approximation calculation. The camera focal length value is calibrated by manual configuration to minimize the error value. In addition, the rotation angle between the mobile device and the LED plane, which can be extracted from the gyroscope sensor, is also considered with the fixed scenario. The rotation angle with the gyroscope sensor was expected to be a minimum of 2 degrees in Ref. [45]. We repeated the measurement 1,000 times in order to calculate the mean and variance. The performance of distance estimation is shown in terms of the cumulative distribution function (CDF) in Figure 29. The scheme achieves an average estimation error of 10 cm, which arises from the error of the LED center point detection due to the projected contours.
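As an illustration of how repeated measurements can be summarized as a mean, a variance, and a CDF, the following sketch computes an empirical CDF from a list of absolute distance errors. The function names and the sample values are hypothetical and are not the measured data of the experiment.

```python
import statistics


def empirical_cdf(errors):
    """Return (error, cumulative probability) pairs sorted by error."""
    ordered = sorted(errors)
    n = len(ordered)
    return [(value, (i + 1) / n) for i, value in enumerate(ordered)]


def summarize(errors):
    """Return the mean and the population variance of the errors."""
    return statistics.fmean(errors), statistics.pvariance(errors)


# Hypothetical absolute distance errors in metres (illustrative only).
sample_errors = [0.08, 0.12, 0.05, 0.15, 0.10]
mean_error, variance = summarize(sample_errors)
print(f"mean = {mean_error:.3f} m, variance = {variance:.5f} m^2")
for value, probability in empirical_cdf(sample_errors):
    print(f"P(error <= {value:.2f} m) = {probability:.2f}")
```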

Localization Estimation Performance
The performance evaluation of the localization estimation is based on the scenario portrayed in Figure 30. It includes three main steps, extracted from four processes: OCC detection, coordinate download, ranging estimation, and coordinate estimation. Firstly, the mobile terminal downloads the coordinates of the three reference LEDs using OCC. Secondly, it calculates the range to each reference LED based on photography geometry. Finally, it performs the localization estimation using trilateration. Among these four processes, OCC detection plays an important role in all system functions, as it supports data decoding, coordinate mapping, and OCC receiver classification.
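The final trilateration step can be sketched as follows. This is a minimal two-dimensional example, assuming that the height difference between the camera and the LED plane is already known, so that each slant distance can be projected onto the horizontal plane. It uses the common linearization obtained by subtracting the first circle equation from the other two; this is one standard way to solve the problem and is not necessarily the exact formulation used in this article. All function names and coordinates are hypothetical.

```python
import math


def horizontal_range(slant_distance, height):
    """Project a slant distance onto the horizontal plane (assumes known height)."""
    return math.sqrt(max(slant_distance ** 2 - height ** 2, 0.0))


def trilaterate_2d(anchors, ranges):
    """Solve for (x, y) from three anchor positions and horizontal ranges.

    Subtracting the circle equation of anchor 1 from those of anchors 2 and 3
    removes the quadratic terms and leaves a 2 x 2 linear system.
    """
    (x1, y1), (x2, y2), (x3, y3) = anchors
    r1, r2, r3 = ranges
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = r1 ** 2 - r2 ** 2 + x2 ** 2 - x1 ** 2 + y2 ** 2 - y1 ** 2
    b2 = r1 ** 2 - r3 ** 2 + x3 ** 2 - x1 ** 2 + y3 ** 2 - y1 ** 2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; position is not unique")
    # Cramer's rule for the 2 x 2 system.
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - b1 * a21) / det
    return x, y


# Hypothetical LED coordinates (metres) and a receiver 2 m below the LED plane.
anchors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
true_position = (0.3, 0.4)
height = 2.0
slants = [math.sqrt((true_position[0] - ax) ** 2
                    + (true_position[1] - ay) ** 2 + height ** 2)
          for ax, ay in anchors]
ranges = [horizontal_range(s, height) for s in slants]
print(trilaterate_2d(anchors, ranges))
```

With noisy ranges the three circles no longer intersect at a single point, and the linearized solution then gives only an approximate position; with more than three anchors, the same linear system is usually solved in the least-squares sense.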
The experimental setup for localization estimation is shown in Figure 31a-c, which portray the off-status of the reference LEDs, the non-communicative mode of the camera, and the communication mode of the camera, respectively. In this scenario, there are three data LEDs that coexist with other illumination LEDs. In the non-communicative mode, the camera shutter speed is configured at 30 Hz, at which there is no classification between the illumination LEDs and communication LEDs. However, in the communication mode, the inner optical interference is removed by the high speed of the camera shutter. Using a computer vision algorithm, we can easily separate the OCC LEDs and illumination LEDs. As discussed in the previous section, the main consideration with respect to the OCC data rate is the processing time capacity at the receiver, which is affected by the image processing algorithm, resolution, and the number of sources.
The distance estimation in Section 4.2 is controlled by the camera focal length, the rotation matrix angle, and the LED ROI. The performance of system localization, which is defined by the trilateration formulas, depends on one further factor: the Gaussian-distributed error of the anchor distances. The performance of 300 localization estimation measurements with three reference nodes, as in the configuration scenario in Section 4.2, is shown in Figure 32. In this experiment, the vertical value, which is defined by the "z" coordinate, is calculated on the basis of the photography image sensor calculation of Section 3. The origin of the coordinate system and the directions of the axes are defined by the received reference coordinates and the image sensor coordinates. The performance accuracy of more reference coordinates with Gaussian weighting is shown in Figure 33.

Conclusions
Considering the importance of localization services in the business trends of on-demand applications, the proposed scheme for distance estimation based on OCC and photogrammetry shows the advantages of VLC in visible light optical channels and the development of image sensors in smart devices. The proposed system includes a non-flickering visible light transmitter and an OCC rolling shutter receiver, and employs a photogrammetry mechanism for distance estimation. The main contributions of this article are the OCC system design and implementation, and the distance estimation and localization calculation mechanism using OCC and photography. The performance results, together with the hardware advantages and low complexity, suggest a promising new technique for indoor localization services. In future work, the calibration of the camera's focal length and the rotation angle of the image sensor will be evaluated with a thorough analysis.