Performance Analysis of Visible Light Communication Using CMOS Sensors

This paper elucidates the fundamentals of visible light communication systems that use the rolling shutter mechanism of CMOS sensors. All related information involving different subjects, such as photometry, camera operation, photography and image processing, are studied in tandem to explain the system. Then, the system performance is analyzed with respect to signal quality and data rate. To this end, a measure of signal quality, the signal to interference plus noise ratio (SINR), is formulated. Finally, a simulation is conducted to verify the analysis.


Introduction
There are two types of light receivers that can be used for visible light communication (VLC): photo diodes and image sensors. Photo diodes have been widely used thanks to their low cost and high reception bandwidth. Image sensors, on the other hand, are still far behind, since traditionally, they are more expensive, and the achievable data rate is lower. In recent years, image sensor technology has made a big leap with regard to price and performance. Even the least expensive smartphones nowadays are equipped with high-resolution CMOS sensor cameras. Many of them can shoot full HD videos at 30 or even 60 fps. This motivates the use of CMOS sensors for VLC.
As in any communication system, achieving a high data rate is one of the first targets. There are some approaches for achieving a high data rate using an image sensor. First, a high-speed camera can be used to receive the high-modulation-frequency light from an LED. The clear disadvantage of this approach is the cost of a high speed camera. Even though the megapixel war in the camera industry has pushed the resolutions of image sensors to very high levels and enabled high-resolution image sensors to be sold at low prices, it is still impossible to have a high-frame-rate camera at low cost. The second approach is using an LED array to transmit LED patterns, which conveys multiple bits per frame, at a low frequency to a normal camera. This approach has the limitation that the low frequency blinking of the LED can cause flickering that is perceived by human eyes. The third approach takes advantage of the rolling shutter mechanism of CMOS sensors to receive multiple bits modulated at a high frequency within one frame [1]. As a result, flickering is unobserved regardless of the low frame rate. There are special approaches, such as the one proposed in [2], which developed an optical communication image sensor capable of responding promptly to the variation of LED light. While that technique can achieve a very high data rate up to 20 Mbps, it requires a very proprietary sensor, which is not available for every researcher to use for designing a VLC system.
In [1], the concept of using the rolling shutter effect of CMOS sensors for VLC has been proposed. This technique has been reviewed in many studies [3][4][5][6][7]. Some have applied this technique for vehicular communication [8] and for positioning [9,10]. However, the current knowledge of this technique has many gaps. That is because the theoretical foundation necessary for understanding the Figure 1 illustrates image acquisition with a CMOS sensor. With a CCD sensor, the whole sensor surface is exposed at the same time, and the data in all pixels are read out at the same time. With a CMOS sensor, the exposure and data readout are performed row by row. The delay time between the exposures of two rows is equal to the readout time of one row. This mechanism is called the rolling shutter of the CMOS sensor. The time for the integration of one frame starts when the first row is reset and finishes when the last row is read out. During this period, turning on and off the LED light would result in light and dark bands on the image. By this mechanism, several bits, which are represented by these bands, can be received in a single frame.

Frame Rate in a CMOS Sensor Camera
Basically, a row in a CMOS sensor is ready to be exposed for the next frame as soon as the readout in that row for the current frame has finished. Thus, the first row in the sensor might start its exposure for the next frame, while the last rows in the sensor are still being exposed for the current frame, as described in Figure 2. In other words, the start of the image acquisition for the next frame does not need to wait for the completion of the image acquisition for the current frame. This mechanism helps the CMOS sensor cameras achieve relatively higher frame rates than their CCD counterparts. Obviously, the sooner the first row in the sensor starts its exposure for the next frame, the higher the frame rate that is achieved. However, at any specific time, the readout process can occur at only one row. Hence, to keep the exposure time the same between frames, the first row needs to wait for a period of time so that the readout for the next frame in that row occurs after the readout for the current frame in the last row has finished. Consequently, the minimum interval between frames is equal the frame readout time t f , as described in Figure 2.
Corresponding to the minimum frame interval, the maximum frame rate is determined as: While some cameras, such as the ones in the Nikon 1 series, can manage to achieve the above maximum frame rate, many other cameras can only offer the frame rate of 60% to 90% of the maximum frame rate [11]. This is because in their design, these cameras have been allowed a period, a "guard" period, which is used for the setting of the next frame, as well as other purposes. In these cases, the first row in the sensor needs to wait for a longer time before starting its exposure for the next frame. More specifically, the frame interval in these cases would equal the readout time plus the guard period. Therefore, in general, the frame rate of the camera is determined as: where t g is the guard period between frames. The whole process for calculating a pixel value is described in Figure 3. The light goes through the lens and falls onto the image sensor. Depending on the exposure setting, which determines the lens aperture and exposure time in the camera, a specific luminous exposure, which represents the amount of light per unit area, will be received by the sensor. After that, the photon energy will be converted to a voltage. In the amplification process, a factor called International Standards Organization (ISO) speed will determine how much the voltage is amplified. Through the amplifier and analog-to-digital converter, this voltage will be represented by a digital number that is usually called the raw output (or raw pixel value) of the sensor. Finally, a gamma encoding operation will convert this raw output value to a pixel value. In this paper, to analyze the signal quality and to simulate the system, the pixel value at given lighting conditions and camera settings needs to be calculated. In the process shown in Figure 3, the luminous exposure can be calculated accurately with given assumptions. Additionally, the pixel value can also be calculated from the raw output value if the value of gamma is given. However, the voltage generated at a given luminous exposure, as well as the raw output value representing a given voltage are really difficult to calculate, since these values are determined by many parameters, which cannot easily be assumed. Therefore, instead of strictly following all of the steps in Figure 3, this paper derives a shortcut method for calculating the pixel value directly from the light source and camera settings without going through the calculations of luminous exposure, output voltage and raw output values. This shortcut calculation will be the foundation for the analysis and simulation in this paper. The derivation of this shortcut calculation will be presented in the following sections.

Calculating the Pixel Value from the Raw Output Value
The actual output voltage from each cell of an image sensor is directly proportional to the luminous exposure, and so is the raw output value. However, the response of human eyes to light is logarithmic in the sense that the eye response is proportional to the logarithm of the light intensity. Therefore, an operation called gamma encoding is applied to redistribute the raw outputs from the sensor to pixel values that are closer to how human eyes perceive them. Assume that an eight-bit RGB color space is used. The maximum pixel value is 255, and then, the gamma encoding is defined by the following power-law expression: where PV is the pixel value, raw is the raw output value and raw max is the maximum possible raw output value, which is determined by the number of bits that the camera uses for representing the raw output value.
According to the standard [12], an object with actual luminous value (i.e., the raw output value) of 18% of the full-scale would be rendered as middle gray, which is equivalent to 46% of the maximum digital brightness in the image. Therefore, the raw output value of 18% would be encoded into the pixel value of 118 with the full scale being 255. To accomplish this, the standard gamma value of 2.2 is used. Figure 4 shows the gamma encoding operation, which translates a raw output data to a corresponding RGB pixel value with the value of gamma of γ = 2.2. The middle gray point indicates the mapping of the raw output value of 18% of the full scale to the pixel value of 118. Note that 18%, 46%, 118 and 2.2 are rounded off values, which are conventionally presented in the literature. The actual value used by each camera manufacture for the gamma encoding might be slightly different. In this paper, the actual gamma used for the simulation is 2.22.

Calculating the Pixel Value from the Luminous Exposure Ratio
Since the raw output value is directly proportional to the luminous exposure, it can be assumed that: where H denotes the luminous exposure and Const is a specific constant. Let raw 18% denote the raw output value equal to 18% of the full scale and H SOS denote the luminous exposure corresponding to the raw output of raw 18% . This means: Then, Since the raw output value of raw 18% is encoded to become the pixel value of 118 (i.e., the middle gray point), from Equation (3), it can be seen that: From Equations (3) and (7), the pixel value can be given as: From Equations (6) and (8) The luminous exposure is given by [13]: where L v is the luminance of the object (in cd/m 2 ), N is the relative aperture (f-number) of the lens, t is the exposure time in seconds and q is a factor whose value is determined by the transmittance T of the lens, the vignetting factor v(θ) and the angle θ relative to the axis of the lens as: Tv(θ) cos 4 (θ) (11) For convenience, an exposure value EV is used to represent a combination of lens aperture and exposure time as [13]: As expressed in Equation (10), the luminous exposure is directly proportional to the ratio t N 2 . Therefore, for any object with fixed luminance, all exposure settings that have the same EV would make the sensor receive the same luminous exposure and, thus, make that object rendered to the same brightness in the image. Thus, given an object with specific luminance, there is a unique exposure value, the "indicated exposure value", that is required for the camera to receive the luminous exposure H SOS , which makes the object rendered as middle gray.
Given a light source with specific luminance, suppose that EV set is the exposure value set for the camera and EV ind is the indicated exposure value. The exposure difference between the indicated and the set exposures, denoted by ED, is defined as: From Equations (10) and (12), it can be seen that: where H is the luminous exposure received with the set exposure value and H SOS is the exposure received with the indicated exposure value. Thus, From Equations (9) and (15), the pixel value can be calculated from the exposure difference as: As can be seen from Figure 4, when the exposure exceeds the maximum value that can be represented, a phenomenon called clipping occurs. The clipped area of the image will appear as a uniform area of maximum brightness, which is 255 in the eight-bit RGB color space. Therefore, Equation (16) is valid only when the exposure difference is smaller than a specific value: In Equation (13), while the set exposure value EV set can be easily calculated from the lens aperture and exposure time using Equation (12), the method for calculating the indicated exposure value EV ind is still unknown now. Therefore, in the following sections, the method for calculating EV ind from the given camera setting and light source intensity will be explained. First, the photometry of LED and ambient light is presented to explain how to calculate the intensity of the light source.

Photometry of LED and Ambient Light
In the system considered in this paper, the subject of the image is the LED. Furthermore, it is assumed that the LED covers the whole image sensor. In other words, there are only two sources of light coming to the sensor: LED light radiated by the LED itself and ambient light bouncing off the LED surface.

Measurement of Radiated Light Intensity
The intensity of LED radiated light is measured in luminance, denoted as L v , which indicates how much luminous power is perceived by the human eye looking at the surface from a particular angle [14]. The unit for luminance is candela per square meter (cd/m 2 ). In practice, the luminance of the LED can be measured using a light meter or simply given by the LED manufacture.

Measurement of Incident Light Intensity
The intensity of the ambient light illuminating the LED surfaces is measured in illuminance, denoted as E v , which indicates how much the incident light illuminates the surface [14]. The unit of illuminance is lux (lx). The illuminance of the ambient light can be measured by a light meter or given as an assumption. Usually, the illuminance at different environments is assumed to have the value given in Table 1. Table 1. Illuminance at different environments.

Value (lux) Environment
Full moon 80 Hallways in office buildings 100 Very dark overcast day 300-500 Office lighting 400 Sunrise or sunset 1000 Overcast day 10,000-25,000 Full daylight 32,000-130,000 Direct sunlight Assume that the ambient light illuminance is E v and the reflectance of an object is R; the luminance of the reflected light from that object is given as [15]:

Calculating Indicated Exposure Value from Given Camera Settings and Light Source Intensity
As explained earlier, given a specific luminous exposure, which corresponds to a specific output voltage, the ISO speed will determine the corresponding raw output value. Therefore, basically, any level of luminous exposure received by the sensor can be rendered as middle gray given an appropriate ISO speed. According to the standard output sensitivity (SOS) [12], the ISO speed that makes a certain luminous exposure value of H to be rendered as middle gray, by the camera manufacturer's definition, has the value of S given as: Therefore, in any specific camera, given the ISO value set to S, the luminous exposure H SOS required for producing the middle gray tone in the image is given as: Indicated Exposure Value for Radiated Light From Equations (10) and (20), the camera setting required for producing the middle gray tone for an object having the luminance L v is determined as: where K = 10 q is the reflected light meter calibration constant. From Equations (12) and (21), the indicated exposure value EV LED corresponding to the light radiated from the LED is given as: where L v is the luminance in cd/m 2 of the light radiated from the LED.

Indicated Exposure Value for Incident Light
From Equations (18) and (21), the required camera setting for producing the middle gray tone for an object illuminated by an ambient light having the illuminance of E v is given as: where C = 10π qR = K π R is the incident light meter calibration constant. From Equations (12) and (23), the indicated exposure value EV BG corresponding to the background ambient light bouncing off the LED surface is given as: where E v is the illuminance in lux of the ambient light coming to the LED.

Indicated Exposure Value for the Combined Light Source
As mentioned earlier, the light coming from LED surface to the image sensor includes the light radiated by the LED and the ambient light bouncing off of the LED surface. Let H LED , H BG and H combine respectively denote the luminous exposure received by the sensor from the LED radiated light, the reflected ambient light and the combined light sources. Let EV set denote the set exposure value.
From Equations (13) and (15), H LED , H BG and H combine are given as: From Equation (14), the exposure difference is given as: From Equation (13), the indicated exposure value corresponding to the combined light source is given as: From the indicated exposure value calculated above and the set exposure value EV set calculated using Equation (12), the exposure difference ED is calculated using Equation (13), and finally, the pixel value PV can be obtained using Equation (16).

Performance Analysis of the System
The performance of the VLC system using the CMOS rolling shutter mechanism will be analyzed with respect to two major aspects: signal quality and data rate. It is important to note that the analysis in this section is valid when the condition described in Equation (17) is satisfied. In other words, the prerequisite of the analysis is that the image is exposed correctly so that the highlight, the white band, is not clipped. This is a fair assumption, since the exposure time, as can be seen later in this section, should be kept as short as possible, and thus, clipping would never be the case in practice.

Signal Quality
The signal quality of VLC using CMOS sensors is considered high when the white and black bands, which express the logical one and zero, can be distinguished easily. In image processing, the distinguishability of these two bands is determined by the clarity and contrast of the image. While clarity expresses the local difference between white and black bands in transition regions of the image, contrast indicates the global difference between the maximum and minimum brightness of the entire image. Clarity can be considered to be the measurement for the intersymbol interference (ISI) effect, and contrast can be considered to be the measurement for the effect of ambient light noise. Therefore, similar to other wireless communication technology, the signal quality of VLC using CMOS sensors can also be measured by the SINR.

Intersymbol Interference
Usually, the image clarity is reduced due to the presence of transitions between white and black bands. As the exposure time is longer than zero, there must be some rows at which the switching of the LED occurs during their exposure, and thus, the presence of transition bands is inevitable, as illustrated in Figure 5. In this figure: t f is the frame readout time; h i is the image height (i.e., the number of rows in the sensor); h t is the height (i.e., the number of rows) of the transition band; h c is the height (i.e., the number of rows) of the complete band; t is the exposure time of the camera; and t e f f and t o f f are respectively the effective and off exposure time of a row of the sensor. As can be seen in Figure 5, the exposure time t is the time period that the sensor opens to receive light. However, if the LED switches during the exposure of a row, the exposure period of that row can be divided into two periods: effective exposure and off exposure. In the first period, the LED is on, and thus, the row received both LED light and ambient light. In the second period, the LED is off, and thus, the row receives only ambient light.
For each row in the transition band, let i r denote the relative position of that row with respect to the first row of the transition band. In other words, i r is the total number of rows counted from the first row in the corresponding transition band to that row. Then, the relative positions of rows, from the first to the final one, in the transition band are i r = 1, 2, 3, ..., h t .
From the property of similar triangles in Figure 5, the off exposure time of a row i r in a transition band is given as: where t r is the row readout time given as: Then, the effective exposure time of a row i r in a transition band is given as: Furthermore, from the property of similar triangles in Figure 5, the height of the transition band is given as: The height of the complete white (or black) band in the number of rows h c in Figure 5 is given as: where f led is the blinking frequency of the LED. It is obvious that the heights of the transition bands should be as small as possible. The row readout time is a specification that cannot be changed for a specific camera. However, the transition between white and black bands, as shown in Equation (31), can be reduced by shortening the exposure time.
In [6], the authors conducted experiments with various LED frequencies and observed that when the LED frequency was higher than the reciprocal of the exposure time, the camera was not able to record the signal. While the reason for this phenomenon was not covered in [6], it can be seen clearly through Equation (32). From this equation, it turns out that the complete band has zero height when the exposure time equals the LED duration. Therefore, the exposure time must be shorter than the LED pulse duration: Figure 6 explains why the transition bands in the image correspond to the ISI of the signal. In this figure, the pixel value can be considered to be the signal amplitude, and thus, the white and black bands in the image correspond to pulses that represent the symbols 1 and 0. The black-to-white transition corresponds to the interference that symbol 0 introduces to symbol 1. In contrast, the white-to-black transition corresponds to the interference that symbol 1 introduces to symbol 0. These ISIs appear at the beginning of each symbol, causing the slow rising of symbol 1 and the slow falling of symbol 0, as well as reducing the distinguishability of the two symbols from each other.

Ambient Light Noise
In the indoor environment, the luminance of the LED is much higher than the ambient light. For example, the indicated exposure value at ISO 100 of a typical LED is 17, whereas that of a typical working office is seven. This means the LED is 2 10 -times brighter than the ambient light. When the camera exposure is set for the LED, the ambient light in this case has almost no impact on the image.
However, in an outdoor environment, the ambient light can bring both the highlight and shadow of the image up. However, because of the non-linear response between the exposure and pixel value, the ambient light shows a much stronger impact on the shadow than on the highlight. In other words, the ambient light significantly raises the pixel level of the logical zero (black) while it brings about a small increase in the pixel level of the logical one (white), as shown in Figure 7. Therefore, in an outdoor environment, the ambient light can considerably reduce the image contrast. To analyze the effect of ambient light, the minimum and maximum pixel values need to be calculated and compared.  The rows that fully receive the LED light and ambient light during their exposure period would have the maximum pixel value in the image. The maximum pixel value is given as (Appendix A): The rows that do not receive any LED light during their exposure period would have the minimum pixel value in the image. Since ambient light is the only light source, the exposure difference would be the difference between the indicated EV of ambient light and the set exposure. Then, the minimum pixel value is given as (Appendix B): Note that in the two equations above, E v R π is the luminance caused by the ambient light bouncing off the LED surface.
The contrast between the white and black bands is given by: The derivative of the contrast with respect to E v is given by: Thus, the contrast decreases when the ambient light increases. This explains the significant increasing of the pixel level of the logical zero compared to that of the logical one when the ambient light increases, as represented in Figure 7.

SINR
A typical signal with the presence of ISI and ambient noise is illustrated in Figure 8. PV max and PV min are the maximum and minimum pixel values in the image. PV i r denotes the pixel value at a row having the relative position i r in the transition band (i.e., the ISI part). The signal to interference plus noise ratio (SINR) is given as: Since the ISI at the beginning of symbol 1 and the ISI at the beginning of symbol 0 are symmetric, Equation (38) can be expressed as: The values of PV max and PV min are given by Equations (34) and (35), respectively. The value of PV i r is given by (Appendix C): From Equations (34), (35), (39) and (40), the value of SI NR is given as: To examine the effects of system parameters on SINR, the differentials of SINR to related parameters in Equation (41) are calculated. For the sake of simplicity of calculation, firstly, the finite sum in Equation (41) is approximated by the integral: From Equations (41) and (42), the value of SINR is approximated as: It can be seen through Equation (43) that the parameter t r can be canceled out from the equation for SINR. Therefore, SINR is independent of the frame readout time.
The results (Appendix D-F) show that: Therefore, SINR decreases when either the exposure time t, or the LED frequency f led , or the ambient light illuminance E v increases. The effect of LED luminance depends on the ambient light. Without the presence of ambient light, the SINR is independent of the LED luminance. In contrast, with the presence of ambient light, the SINR increases when the LED luminance increases.

Data Rate
From Figure 5, we see that the total number of complete white and black bands per frame N b can be calculated as: From Equations (2) and (45), the data rate is given by: It is easy to see that increasing the LED frequency will increase the data rate. However, there is a maximum value that the frequency of modulation of the LED cannot exceed. This maximum LED frequency can be determined through Equation (33) as f led < 1/t.
When t guard = 0, the camera has the maximum frame rate, and the data rate at that time is equal to the LED frequency. Additionally, because of Equation (33), the maximum data rate that can be achieved is:

Required Distance from LED to Camera
The major drawback of the technique using the rolling shutter CMOS sensor for VLC is that it require a close distance from the LED to the camera, so that the LED covers the entire or a big part of the sensor. For example, [1] and this paper assume that the LED covers the whole sensor. In [6], the LED only covers the vertical part of the sensor.
As shown in Figure 9, the required distance from the LED to the camera is determined by the LED size and the field of view (FOV) of the lens. A lens in a typical smartphone camera has a field of view equivalent to a focal length of a 35-mm lens on a full-frame camera. More specifically, a typical smartphone camera lens has a field of view of 54.4 • horizontally and 37.8 • vertically. Figure 9. Relation between LED size, camera FOV and distance from the LED to the camera.
From Figure 9, the required distance from the LED to the camera is given by: where LED_size is the diameter of the LED and FOV is the field of view of the lens. For the LED to cover the whole sensor as in [1] and this paper, the horizontal part of the sensor must be covered within the FOV. Thus, given that the LED diameter is 10 cm and the lens has a field of view equivalent to a focal length of a 35-mm lens of a full-frame camera, the required distance from the LED to the camera should be 9.7 cm. For the LED to cover just the vertical part of the sensor as in [6], the vertical part of the sensor must be covered within the FOV. In this case, the required distance from the LED to the camera should be 14.6 cm. A longer required distance from the LED to the camera can be obtained by using a longer lens or increasing the size of the LED.

Simulation Environment
To verify the analysis, we conducted a simulation of the system using MATLAB. The simulation reproduces the operation of a CMOS sensor as described in Section 2 to create a gray scale image. The LED luminance is assumed to have three levels, 4096 cd/m 2 , 8192 cd/m 2 and 16,384 cd/m 2 , which correspond to the indicated exposure values of 15, 16 and 17, respectively.
ISO 2720:1974 [16] recommends that the ranges for the reflected-light meter calibration constant K be from 10. 6-13.4. This paper assumes the usual value K = 12.5, which is used by Canon, Nikon and Sekonic. The incident-light meter calibration constant C is determined based on the reflectance of the LED surface. Following [17], the reflectance of the transparent surfaces was assumed to be 40%. Then, the value of C is assumed to be 12.5 × π 0.4 = 98. It is assumed that the LED has the typical modulation bandwidth of 10 MHz or above. Given that the maximum modulation frequency of the LED is 8000 Hz, the slow rising and falling of the LED would cause very little effect on the ISI. For simplicity, in the simulation, the LED is assumed to switch instantly between on and off.
All of the simulation parameters are listed in Table 2.

Simulation and Calculation Procedure
Given a set of parameters with specific values, the pixel value of each row is calculated using the equations in Section 2, and an image is created as shown in Figure 10a. After that, the pixel value of each row is tracked to find the signal, ISI and noise parts in the image. As illustrated in Figure 10b, first, the maximum and minimum pixel values of the image are found. Then, the signal component starts where the pixel value increases from the minimum and ends where the pixel value decreases from the maximum. The ISI component begins right behind the end of the signal part and completes where the pixel value reaches the minimum. The noise is determined as the part having the minimum pixel value after the ISI. The signal, ISI and noise are calculated as the summation of the pixel value from the starting row to the ending row of these components. Afterwards, the SINR is calculated using Equation (38).

SINR
According to the analysis, there are four parameters affecting SINR: exposure time, LED frequency, ambient light illuminance and LED luminance. Among them, the ambient light illuminance is usually given at certain environments and, thus, cannot be changed. The LED luminance is determined by the hardware and cannot be altered after the setting up of the system. However, the exposure time and LED frequency are two flexible parameters that can be changed to increase or decrease the system performance. Therefore, the simulation will focus on examining the effects of exposure time and LED pulse duration on the SINR. Nevertheless, the effects of ambient light illuminance and LED luminance are also tested. The simulation results are shown in Figure 11. (c) Figure 11. Effect of system parameters on the signal to interference plus noise ratio (SINR). (a) SINR with E v = 0, L v = 8192, t f = 1 100 , 1 8000 ≤ t ≤ 1 1000 , 500 ≤ f led ≤ 8000; (b) SINR with L v = 8192, t f = 1 100 , 1 8000 ≤ t ≤ 1 1000 , 500 ≤ f led ≤ 8000; (c) SINR with E v = 40,000, t f = 1 100 , 1 8000 ≤ t ≤ 1 1000 , 500 ≤ f led ≤ 8000.
The simulation shown in Figure 11a uses fixed values for ambient light illuminance and LED luminance. More specifically, no ambient light and the LED luminance of 8192 lux are assumed. The exposure time ranges from 1/8000 s-1/1000 s, and the LED frequency ranges from 500 Hz-8000 Hz. Note that the LED duration is the reciprocal of the LED frequency, and in the simulation, the condition of 1/ f led > t is always guaranteed. The results show that SINR decreases when either exposure time or the LED frequency increases. Figure 11b illustrates the effect of ambient light illuminance on SINR. Three kinds of environments are assumed: no ambient light, indoor office with 400 lux of illuminance and outdoor hazy sunlight with 40,000 lux of illuminance. It can be seen that SINR decreases when the ambient light illuminance increases. In addition, the ambient light in the indoor environment only has a subtle effect, as previously mentioned.
The LED luminance has no effect on SINR unless the ambient light exists. As previously shown, the presence of ambient light makes the SINR decrease. To lessen this undesired effect, the LED luminance should be increased. In the simulation shown in Figure 11c, an outdoor environment with strong ambient light of 40,000 lux is assumed. It can be seen that the SINR increases when the LED luminance increases.
As shown in the simulation, both shortening the exposure time and lengthening the LED duration increase the SINR. Lengthening the LED duration, however, would result in the decrease of the data rate. Consequently, the exposure time is the main parameter that can be changed to increase the SINR. Since reducing the exposure time makes the whole image darker, the ISO speed might need to be increased to compensate for the decrease of exposure. Note that increasing the ISO would introduce some noise in the image. The level of noise increase depends on the physical capability of the sensor. Therefore, when designing a VLC system using CMOS sensor, one needs to consider the effects of all of these parameters to get the most preferred performance.

Data Rate
The data rate is mainly determined by the LED frequency, and the relationship between them is straightforward. For example, in Equation (46), if the guard time t g is assumed to be zero, then the data rate will be simply identical to the LED frequency. Therefore, a simulation showing the effect of the LED frequency on the data rate is not conducted in this paper. Instead, we just explain the maximum data rate that can be achieved in normal cases and simulate the frame captured at such a data rate.
As explained earlier, the LED frequency is limited by the exposure time and so is the data rate. In most prosumer cameras, whether mechanical shutters or electronic shutters are used, the minimum exposure time is set to 1/8000th of a second. They are made that way since the exposure time hardly needs to be smaller than 1/8000 s in normal applications. Using these cameras, the maximum data rate that can be achieved is 8 kbps.
In fact, when using an electronic shutter, the exposure time of the camera can be as short as the time required for switching the status of each row in the sensor from exposure to readout. For example, the exposure time on the Panasonic GH and Nikon 1 series can be set at 1/16,000 s, while that on the Fuji RX series is 1/32,000 s. Some scientific cameras even allow the exposure time to be set at a few microseconds. In a VLC system, an exposure time of a few microseconds would be impractical, since the light received by the sensor would be insufficient. Even the exposure time of 1/32,000 s would place strict requirements on the LED, lens and ISO setting. More specifically, to receive sufficient light at 1/32,000 s, the LED luminance should be high; the lens should have a large aperture; and the ISO speed should be set high. Figure 12 shows a simulated frame corresponding to the setting: t = 1/32, 000, f -number = 2.8, L v = 16384, ISO = 800. The frame readout time is assumed to be 10 ms. Then, there are 320 bands in each frame. If the guard time between frames equals zero, the data rate would be 32 kbps. In normal cases, the lens aperture, LED luminance and ISO in this configuration almost reach their limits. For example, with prosumer cameras, an ISO higher than 800 would introduce too much noise in the image, and the lens aperture should be smaller than f/2.8. Therefore, a data rate of tens of kbps would be the maximum data rate of this technique in practical circumstances. Note that in the experiment conducted in [1], the maximum achieved data rate is 3.1 kbps.

Conclusions
Recently, visible light communication using rolling shutter CMOS sensors has been studied, and the results showed that it is a promising technique. In this study, the fundamentals of the system are explained in detail. After that, the system performance is analyzed with regard to signal quality and data rate. To accomplish this, a new measurement for signal quality is formally defined as a signal to interference plus noise ratio. Then, equations for calculating the SINR and data rate are formulated. Based on these equations, the effects of system parameters on the SINR and data rate are examined. Finally, a simulation is conducted to verify the analysis. Author Contributions: Trong-Hop Do proposed the idea, performed the analysis and wrote the manuscript. Myungsik Yoo provided the guidance for data analysis and paper writing.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A Calculate the Maximum Pixel Value
Presume that EV set is the exposure value to which the camera is set. EV LED and EV BG are the indicated exposure values corresponding to the LED light and background light provided by Equations (22) and (24), respectively.
From Equation (26), the exposure difference is given by: Then, the maximum pixel value is given by: From Equations (12), (22) and (24), we have: Therefore, the maximum pixel value is given by:

Appendix B Calculate the Minimum Pixel Value
In this case, the only light source coming to the sensor is the background ambient light. Presume that EV BG is the indicated exposure values corresponding to the background light provided by Equation (24). The exposure difference is given by: Therefore, the minimum pixel value is given by: PV min = 118 × 2 ED min /γ = 118 × 2 (EV BG −EV set )/γ = 118

Appendix E Derivative of SINR with Respect to LED Frequency
∂SI NR ∂ f led

Appendix F Derivative of SINR with Respect to Ambient Light Illuminance
∂SI NR Since t × f led = Exposure time LED duration < 1, 1/γ > 0 and