Iterative Pilot-Based Reference Frame Estimation for Improved Data Rate in Two-Dimensional Display Field Communications

Abstract: Recently, display-to-camera (D2C) communication, including display field communication (DFC), has gained attention due to advancements in display technology and the widespread availability of cameras in handheld devices. In this study, we propose an iterative pilot-based reference-frame estimation scheme to increase the data rate of a 2D-DFC system. To estimate the reference frame, pilot symbols are inserted between the data symbols of the transmitted image frames. Using the pilot symbols, we compensate for the distortion in the received frame and estimate the data pixels of the reference frame. After the first iteration, some of the data symbols are used as virtual pilot symbols for the next iteration. This process is repeated using both the original and virtual pilots until, after several iterations, all the data pixels of the reference frame have been estimated and the reference frame is reconstructed. Simulation results show that the proposed scheme nearly doubles the achievable data rate of the 2D-DFC system while maintaining the unobtrusiveness of the display.


Introduction
Display-to-camera (D2C) communication [1][2][3] is an application of visible light communication (VLC) [4] in which an LCD and a camera sensor form a device-to-device link. Owing to the vast popularity of mobile devices and the widespread availability of displays, D2C communication is a promising candidate to replace conventional approaches such as QR codes and 2D barcodes. In D2C communication, information is encoded on the display screens of smartphones, laptops, advertisement boards, etc. Another device with a camera sensor (such as a smartphone) can capture the screen and decode the data using image analysis. Conventional approaches have two drawbacks. First, they require space and are obtrusive to the human eye. For instance, a QR code placed at the corner of a commercial advertisement is not visually aesthetic and produces distractions. Second, the data transmission capability of a QR code, which typically carries content such as the URL of a product's homepage, is extremely limited. Although increasing attention is being paid to embedding images into QR codes to mitigate these limitations [5][6][7], a new approach that embeds the data directly into the image frames of a display might completely replace the conventional approach. This can be attributed to the inherent advantages of D2C communication, such as higher data rates and unobtrusiveness of the display to normal human viewers. D2C communication has the potential to enable a wide range of applications in areas such as security [8,9], healthcare, and smart homes [10,11]. A key advantage of D2C communication is its security. Because transmission occurs through light, it is much more difficult to intercept the signal than with radio-frequency methods such as Bluetooth. This makes D2C communication particularly useful for applications such as mobile banking, access-control systems, and healthcare devices. In addition, D2C communication can enable new types of interactive experiences.
For example, it can be used to allow users to interact with displays in public spaces by simply pointing their cameras at the screen. This could open new possibilities for interactive advertising, digital signage, and other types of public information displays [12].
Approaches for embedding data into image frames can be broadly divided into two categories: spatial domain and spectral domain embedding. For example, in the work of Wang et al. [13], information bits are carried with spatially complementary visual patterns assembled into complementary temporal frames. This study uses the concept of complementary frames displayed at a high frame rate to ensure a normal viewing experience. HiLight [14] encodes data into pixel translucency changes for any screen content using an alpha channel. Here, an additional image layer (a black matte, fully transparent) is created on top of the content image layer, which is dedicated to data communication and is referred to as the communication layer. To transmit the data, the communication layer was divided into grids, and data were encoded into the pixel translucency change of each grid without affecting the user's viewing experience.
By contrast, data can be embedded in the spectral domain of an image [15]. Spectral-domain data embedding captures the characteristics of the human visual system better [16]. In other words, spectral-domain techniques can take advantage of the fact that the human eye perceives different parts of the spectrum differently, allowing selective embedding in less perceptually important regions. In addition, spectral-domain techniques can be more robust to compression and other signal processing algorithms, which often affect the spatial domain more strongly than the spectral domain [17]. One spectral-domain data embedding approach is display field communication (DFC) [18,19], where data are embedded into the frequency domain of an image by exploiting the properties of the frequency coefficients of an image. The DFC approach is robust against visual artifacts observed on the screen, even at a lower frame rate.
In a previous study [20], we experimentally implemented a 1D-DFC approach based on the 1D discrete cosine transform (DCT) and machine learning. That study evaluated the scheme using an actual DFC link and demonstrated its practical implementation and performance for various system design parameters. In particular, we first adopted the DCT to transform a spatial-domain image into its spectral-domain equivalent. Addition allocation and subtraction data retrieval techniques were used to reduce computational complexity during the data-embedding process. Moreover, channel coding was applied to overcome the data errors caused by the D2C wireless channel. After capturing the displayed image using a camera, the display region was extracted using a deep-learning object detection technique. Extensive real-world experiments were performed, considering various geometric distortions, noise, and different standard input images.
Although DFC is robust against visual artifacts at low frame rates and against perspective distortion, it relies on reference frames, which the camera receiver needs to decode the data-embedded frames correctly. Although complementary (or reference) frames compensate for the visual artifacts on the screen and assist in data decoding, they significantly diminish the data rate of the overall system. To address this problem, it is essential to estimate the reference frame at the receiver end. Our previous study introduced a comparable method for reconstructing reference frames in one-dimensional DFC systems [21]. In this study, we extend the iterative spectral image estimation approach to enhance the data rate of two-dimensional DFC (2D-DFC) systems. To estimate the channel, we first embed pilot pixels into the data-embedded frames. Using these pilot pixels, we obtain the least squares (LS) estimate and then interpolate the channel at the data pixels based on the LS estimate at the pilot pixels. Subsequently, we start the iteration process by assuming that the decoded symbols are correct and using some of them as virtual pilots in the next iteration to re-estimate the information pixels. This iteration is repeated several times to estimate the information pixels more accurately. After fully estimating the reference image, we employ a zero-forcing (ZF) receiver to demodulate the data. The simulation results demonstrate that the proposed scheme outperforms the conventional 2D-DFC scheme by a factor of nearly two in terms of the achievable data rate (ADR) while maintaining the unobtrusiveness of the display by embedding the data in the high-frequency regions of the transmit frames. Additionally, we conducted simulations that considered perspective distortion, that is, the distortion of the captured image that arises when the camera and display are not perfectly aligned.
This simulation provides an opportunity to test the feasibility of conducting experiments under similar conditions. The remainder of this paper is organized as follows. Section 2 provides an overview of the pilot-based 2D-DFC system, including the data embedding and pilot insertion mechanism. In Section 3, we propose an iterative scheme for reconstructing the reference image along with virtual pilot pixel selection criteria. Section 4 presents the simulation results in terms of the symbol error rate (SER), achievable data rate (ADR), and peak signal-to-noise ratio (PSNR) for various system design criteria. In addition, we performed simulations considering the misalignment between the camera and display, and compared our proposed scheme with the conventional 2D-DFC scheme. Finally, Section 5 concludes the paper.

System Description
The DFC scheme involves pointing a digital camera at an electronic screen to capture the display output [18]. The 2D-DFC scheme embeds the data in both dimensions of an image frame [19]. Figure 1 shows a typical block diagram of a 2D-DFC system with pilot signal assistance. As shown in Figure 1, at the transmitter, the modulator maps binary input data bits to data symbols, which are then embedded into the spectral domain of an image. Then, the 2D inverse discrete Fourier transform (IDFT) is applied, and the resulting image is displayed on the screen. In conventional 2D-DFC, reference image frames (without data embedding) are inserted between neighboring data-embedded frames to minimize the visual artifacts that may be visible to the human eye. However, in the current system model, reference frames are not transmitted.
At the receiver, the camera captures a sequence of images from the screen, which are then transformed from the spatial domain into the frequency domain using a 2D-discrete Fourier transform (DFT). In the first iteration, the information symbols are decoded based on pilot observations. Once all the information symbols were decoded and presumed to be accurate, some pixels were used as virtual pilots for the second iteration, and the spectral domain image was reconstructed accordingly. The subsequent iterations refine the image pixel estimates using the feedback information symbol estimates.

Data Embedding
DFC embeds data in the frequency domain. To begin the process, the image frames are first converted into the spectral domain using the 2D-DFT, as shown in Figure 1. The DFC scheme exploits the fact that the information content of an image is concentrated in specific regions of its frequency-domain representation. Specifically, the low-frequency components are located at the corners of the 2D spectrum, whereas the high-frequency components are situated at the center [18]. In other words, the low-frequency components containing important information about the image are concentrated at the four corners of the spectral-domain image. This characteristic allows the DFC scheme to use the areas surrounding the corners for data and pilot embedding. These areas were chosen because they minimize the perceptual image distortion that may occur owing to data embedding. Mathematically, the 2D-DFT of a spatial-domain image, $I_t$, of size $P \times Q$ can be written as

$$I_F = F_P \, I_t \, F_Q,$$

where $I_F$ represents the spectral-domain image, and $F_P$ and $F_Q$ represent the $P \times P$ and $Q \times Q$ DFT matrices, respectively.
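The matrix form of the 2D-DFT can be checked numerically. The following is a minimal sketch, assuming unnormalised DFT matrices and a random stand-in image (the helper `dft_matrix` and the sizes are illustrative, not taken from the paper). It also shows that the lowest-frequency (DC) term sits at the corner index (0, 0) of the unshifted spectrum.

```python
import numpy as np

def dft_matrix(n: int) -> np.ndarray:
    """Return the n x n unnormalised DFT matrix (illustrative helper)."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)

rng = np.random.default_rng(0)
P, Q = 8, 6
I_t = rng.uniform(0, 255, size=(P, Q))   # stand-in grey-scale image

F_P, F_Q = dft_matrix(P), dft_matrix(Q)
I_F = F_P @ I_t @ F_Q                    # matrix form of the 2D-DFT

# The matrix product agrees with the FFT-based routine.
matches_fft = np.allclose(I_F, np.fft.fft2(I_t))

# The DC coefficient at index (0, 0) equals the sum of all pixels.
dc_is_sum = np.isclose(I_F[0, 0].real, I_t.sum())
```

In practice an FFT routine such as `np.fft.fft2` would be used instead of explicit matrices; the matrix form is shown only because it mirrors the notation of the text.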

Figure 1. Block diagram of the 2D-DFC system with pilot signal assistance (block labels include pilot observations and hard symbols).

Simultaneously, the binary information bits are modulated and embedded into the frequency-domain image. To ensure the invisibility of the embedded data and to keep the gray-scale intensity components of the data-embedded image real and positive, the data components must exhibit conjugate symmetry. Thus, the data matrix at the $i$th time index, $X[i]$, is defined as

$$X[i] = \begin{bmatrix} x_{1,1} & \cdots & x_{1,\,s_q+L_q} \\ \vdots & \ddots & \vdots \\ x_{s_p+L_p,\,1} & \cdots & x_{s_p+L_p,\,s_q+L_q} \end{bmatrix},$$

where $x_{p,q}$ denotes the data element at the $(p, q)$-pixel position. The starting pixel values at which the data are embedded in a row and column are denoted by $s_p$ and $s_q$, respectively. The pixel widths over which the data and pilots are loaded in the row and column directions are denoted by $L_p$ and $L_q$, respectively [19].
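In the 2D-DFC scheme, data embedding is performed by multiplying the data coefficients with the frequency components of the image. Therefore, the frequency-domain image loaded with data at the $i$th time index, denoted by $D_F[i]$, can be expressed as

$$D_F[i] = \begin{bmatrix} I_{F1} \circ X[i] & I_{F2} & I_{F3} \\ I_{F4} & I_{F5} & I_{F6} \\ I_{F7} & I_{F8} & I_{F9} \circ \mathrm{flip}(X[i]) \end{bmatrix},$$

where $\circ$ represents the Hadamard product operator, $I_{F1}, \ldots, I_{F9}$ represent the nine sub-images made out of the original image $I_F$, and $\mathrm{flip}(\cdot)$ is the flip operation. The operation $\mathrm{flip}(X[i])$ is expressed as follows:

$$\mathrm{flip}(X[i]) = \begin{bmatrix} \bar{x}_{s_p+L_p,\,s_q+L_q} & \cdots & \bar{x}_{s_p+L_p,\,1} \\ \vdots & \ddots & \vdots \\ \bar{x}_{1,\,s_q+L_q} & \cdots & \bar{x}_{1,1} \end{bmatrix},$$

where $\bar{x}$ denotes the conjugate of $x$.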
The above equation indicates that only the top-left and bottom-right corners of the frequency-domain image are utilized for embedding the data. Furthermore, the elements of the data matrix, denoted by $x_{p,q}$, can be expressed as

$$x_{p,q} = \begin{cases} 1, & (p, q) \text{ in the low-frequency portion}, \\ d_{p,q}, & (p, q) \text{ in the embedding sub-band}, \end{cases}$$

where $d_{p,q}$ is the data or pilot symbol loaded at position $(p, q)$. This equation indicates that the portion of the frequency-domain image where the low-frequency information is concentrated is set to one, while the remaining portion is utilized for embedding data. Finally, as depicted in Figure 1, the 2D-IDFT is employed to convert the frequency-domain image back into the spatial domain to be displayed on the electronic screen. Mathematically, this can be expressed as

$$D_t[i] = F_P^H \, D_F[i] \, F_Q^H, \tag{6}$$

where $F_P^H$ and $F_Q^H$ are the Hermitian transposes of the DFT matrices $F_P$ and $F_Q$. Thus, the spatial-domain image is rendered on the electronic display, and the data are transmitted simultaneously through the image.

Figure 2 depicts the data embedded in the frequency-domain image and their effect on the spatial-domain image. Figure 2a illustrates the location of the sub-band where the data and pilots are loaded. In particular, the white region in the frequency-domain image represents the sub-band region where the data and pilots are embedded. As mentioned previously, to introduce fewer detectable artifacts, the data and pilots are loaded in the high-frequency range. The low-frequency components of an image generally exhibit smooth color variations, whereas the high-frequency components exhibit sharp variations. Because the low-frequency regions of an image dominate what the human eye perceives, it is preferable to embed the data in the high-frequency region. In this way, when successive frames are displayed on the screen, any distortions between them are almost imperceptible to the human eye. Figure 2b shows the corresponding spatial-domain image. For comparison, a data-embedded image (without pilots) is displayed in Figure 2c [19]. Both images appear similar in the spatial domain, and the minor differences between them are imperceptible to the human eye.
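The embedding step can be illustrated with a short sketch. It is a simplified stand-in rather than the authors' implementation: the function `embed`, the random image, and the use of index mirroring `(-p) mod P, (-q) mod Q` to place the conjugate copy are assumptions made here so that the spectrum stays conjugate symmetric and the displayed image stays real. The sub-band position (90) and width (30) follow the simulation settings quoted in the paper.

```python
import numpy as np

def embed(I_t, symbols, s_p, s_q):
    """Multiply a high-frequency sub-band of the spectrum by data symbols.

    Hypothetical helper: the mirrored copy is placed by index negation,
    which is one way to keep the spectrum conjugate symmetric.
    """
    P, Q = I_t.shape
    L_p, L_q = symbols.shape
    I_F = np.fft.fft2(I_t)
    X = np.ones((P, Q), dtype=complex)          # 1 leaves a coefficient untouched
    rows = np.arange(s_p, s_p + L_p)
    cols = np.arange(s_q, s_q + L_q)
    X[np.ix_(rows, cols)] = symbols
    # Conjugate copy at the mirrored frequencies.
    X[np.ix_((-rows) % P, (-cols) % Q)] = np.conj(symbols)
    D_F = I_F * X                               # Hadamard product
    D_t = np.fft.ifft2(D_F)                     # back to the spatial domain
    return D_F, D_t

rng = np.random.default_rng(0)
I_t = rng.uniform(0, 255, size=(256, 256))      # stand-in image
bpsk = rng.choice([-1.0, 1.0], size=(30, 30))   # BPSK data symbols
D_F, D_t = embed(I_t, bpsk, s_p=90, s_q=90)

# With a conjugate-symmetric spectrum the imaginary part is rounding noise.
max_imag = np.abs(D_t.imag).max()
```

The sketch only verifies that the displayed image is real; it does not clip or rescale pixel values, which a practical transmitter would also need to handle.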

Pilot Insertion
Data embedding in the DFC system involves inserting $N_p$ uniformly spaced pilot symbols within each column of the data matrix $X[i]$. Each column consists of $s_p + L_p$ pixels, divided into $N_p$ groups, with each group containing $B$ adjacent vertical pixels. In each group, the first pixel is dedicated to transmitting the pilot signal. Thus, each column of the DFC data matrix consists of $N_p$ consecutive groups, each made up of one pilot symbol followed by $B - 1$ information symbols, where $B = (s_p + L_p)/N_p$ and $s_p + L_p = N_p + N_i$. Here, $N_i$ denotes the total number of information symbols per column in the data matrix. Figure 3 illustrates the insertion of pilot symbols and data symbols into the frequency-domain sub-image, where the pilot symbols are uniformly inserted in each column. The DFC symbol modulation on the $l$th pixel can be expressed as

$$X(l) = X(mB + b) = \begin{cases} X_p(m), & b = 1, \\ \text{information symbol}, & b = 2, \ldots, B, \end{cases}$$

where $l = 1, 2, \ldots, L$ indexes the pixels of a column, $m = 0, 1, \ldots, N_p - 1$ indexes the groups, and $X_p(m)$ represents the $m$th pilot symbol.
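The uniform pilot layout described above can be sketched for a single column as follows. The function name `insert_pilots`, the pilot value of +1, and the toy sizes are illustrative assumptions; indices are zero-based here, so the pilot of group m sits at index m·B.

```python
import numpy as np

def insert_pilots(info, B, pilot=1.0):
    """Build one column: each group of B pixels starts with a pilot."""
    info = np.asarray(info, dtype=float)
    if info.size % (B - 1) != 0:
        raise ValueError("information symbols must fill whole groups")
    N_p = info.size // (B - 1)                 # number of groups = number of pilots
    column = np.empty(N_p * B)
    pilot_idx = np.arange(N_p) * B             # first pixel of every group
    is_info = np.ones(column.size, dtype=bool)
    is_info[pilot_idx] = False
    column[pilot_idx] = pilot
    column[is_info] = info
    return column, pilot_idx

info = [-1, 1, 1, -1, -1, 1]                   # N_i = 6 BPSK information symbols
column, pilot_idx = insert_pilots(info, B=4)   # N_p = 2, column length N_p + N_i = 8
```

Here the column is `[1, -1, 1, 1, 1, -1, -1, 1]` with pilots at indices 0 and 4, which matches the relation that the column length equals the number of pilots plus the number of information symbols.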

Iterative Image Estimation
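Assuming perfect alignment between the data-transmitting screen and the camera, the images received through the D2C link can be represented as

$$Y_t[i] = H_t[i] * X_t[i] + N_t[i],$$

where $*$ denotes the convolution operation, $Y_t$ is the received data-embedded image, $H_t$ is the reference or channel image, $X_t$ is the spatial-domain counterpart of the data matrix, and $N_t$ is the additive white Gaussian noise (AWGN) matrix. After the images are received through the D2C link, they are converted into the frequency domain using the transformation matrices $F_P$ and $F_Q$, that is, $Y_F[i] = F_P \, Y_t[i] \, F_Q$. In the sub-band carrying $X[i]$, the received coefficients satisfy

$$Y_F[i] = H_F[i] \circ X[i] + N_F[i],$$

with $H_F[i]$ and $N_F[i]$ restricted to the same sub-band, and the data are recovered as

$$\hat{X}[i] = Y_F[i] \oslash \hat{H}_F[i],$$

where $\oslash$ denotes element-wise division and $\hat{H}_F[i]$ is the estimate of the channel image. After demodulation, the reconstructed source binary information data are obtained at the receiver output.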

Pilot Signal Estimation
The pilot signals are uniformly distributed within each column of the data-embedded images. Consequently, because the pilot signal is present only in certain pixels, the channel response of the non-pilot (or information) pixels must be estimated by interpolating the neighboring pilot pixels. The pilot pixels are first extracted from the received image frame, and the channel response is estimated using both the received and the known pilot pixels. The channel response of the data-bearing pixels is then interpolated using the neighboring pilot channel responses. For simplicity, we consider the first DFC symbol without loss of generality. Let

$$H^p_F = \left[ H^p_F(0), H^p_F(1), \ldots, H^p_F(N_p - 1) \right]^T$$

be the response of the pilot pixels, and

$$Y^p_F = \left[ Y^p_F(0), Y^p_F(1), \ldots, Y^p_F(N_p - 1) \right]^T$$

be a vector of the received pilot signals, both of size $N_p \times 1$, where $N_p$ denotes the number of pilot pixels. The received pilot signal vector $Y^p_F$ is expressed as follows:

$$Y^p_F = D^p_F \, H^p_F + N^p_F,$$

where $D^p_F = \mathrm{diag}\left( X_p(0), \ldots, X_p(N_p - 1) \right)$ is the diagonal matrix of known pilot symbols and $N^p_F$ is the noise at the pilot pixels. Then, the estimate of the pilot pixels based on the least squares (LS) criterion is given by

$$\hat{H}^p_F = \left( D^p_F \right)^{-1} Y^p_F.$$
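For a diagonal pilot matrix, the LS estimate reduces to an element-wise division of the received pilot pixels by the known pilot symbols. The sketch below illustrates this with made-up numbers; the coefficient range, noise level, and alternating ±1 pilots are assumptions chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
N_p = 8
X_p = np.array([1.0, -1.0] * (N_p // 2))        # known pilot symbols (assumed BPSK)
H_p = rng.uniform(50.0, 200.0, N_p)             # stand-in reference-image coefficients
noise = 0.1 * rng.standard_normal(N_p)

Y_p = X_p * H_p + noise                          # received pilot pixels
H_p_ls = Y_p / X_p                               # LS estimate: inverse of a diagonal matrix

# The LS error is the noise divided by the pilot symbol.
ls_error = H_p_ls - H_p
```

Because the pilots have unit magnitude, the estimation error has the same power as the noise; no averaging takes place at this stage.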

Data Pixel Interpolation
After the image pixels at the pilot tones are estimated, the channel responses of the data pixels are interpolated from the adjacent pilot tones. In this study, the piecewise-cubic interpolation method is considered because it provides a better fit to the channel response and produces a smooth, continuous polynomial fitted to the given pixel points. The interpolator follows the definition in [22] and is applied between consecutive pilot positions $m = 0, 1, \ldots, N_p - 1$.
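As an illustration of piecewise-cubic interpolation between pilot positions, the sketch below uses SciPy's `PchipInterpolator` (a shape-preserving piecewise-cubic Hermite interpolator). This is an assumption for illustration: the paper cites [22] for its interpolator and may use a different cubic variant. Because `PchipInterpolator` accepts only real data, the real and imaginary parts are interpolated separately.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

def interpolate_channel(pilot_idx, H_pilot, n_pixels):
    """Interpolate pilot estimates to every pixel of a column (sketch)."""
    k = np.arange(n_pixels)
    H_pilot = np.asarray(H_pilot, dtype=complex)
    re = PchipInterpolator(pilot_idx, H_pilot.real)(k)
    im = PchipInterpolator(pilot_idx, H_pilot.imag)(k)
    return re + 1j * im

n_pixels = 33
k = np.arange(n_pixels)
# Smooth stand-in for the reference-image coefficients along one column.
H_true = (100 + 20 * np.cos(2 * np.pi * k / n_pixels)) \
         + 1j * 10 * np.sin(2 * np.pi * k / n_pixels)
pilot_idx = np.arange(0, n_pixels, 4)            # pilots at 0, 4, ..., 32
H_hat = interpolate_channel(pilot_idx, H_true[pilot_idx], n_pixels)
```

The interpolant passes exactly through the pilot estimates, so any error at the information pixels comes from how quickly the underlying coefficients vary between pilots.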

Image Re-Estimation Using Virtual Pilots
In this section, we discuss an image estimation method that exploits both virtual pilot pixels and pilot pixels in the re-estimation of the image. As illustrated in Figure 3, the virtual pilot pixels are chosen from among the available data pixels obtained after the initial demodulation. The selection of virtual pilot pixels is based on two conditions. First, the magnitude of the selected data pixels should be sufficiently large to ensure their suitability for image estimation. Second, the channels for the virtual pilot pixels must be highly correlated with those for the pilot pixels; otherwise, they would not contribute to improving the quality of the data pixel estimates. By combining the selected virtual pilot pixels with the original pilot signals, the image is re-estimated, and the newly generated reference image estimate is used for symbol detection in the subsequent iteration. This process is repeated until a suitable termination condition is met. Figure 3 shows the locations of the pilot and virtual pilot pixels in the upper sub-band of the frequency-domain image, which is represented in Figure 2a. Here, $d_1$ represents the first column used to embed the data, with $s_p$ and $s_q$ denoting the starting pixel locations for the row and column, respectively. A total of $L_p$ and $L_q$ data bits were embedded in the upper sub-band of the frequency-domain image.
Let $N_v$ denote the number of virtual pilot pixels utilized for the reference image re-estimation. The virtual pilot observations can be expressed in vector form as

$$Y^v_F = D^v_F \, H^v_F + N^v_F,$$

where the data symbols in $D^v_F$ are unknown a priori to the receiver and should, therefore, be chosen from among all the available data pixels. By stacking the pilot observation vector $Y^p_F$ and the virtual pilot observation vector $Y^v_F$, the observation vector for image re-estimation can be obtained as follows:

$$Y^{pv}_F = \begin{bmatrix} Y^p_F \\ Y^v_F \end{bmatrix}.$$

For the next iteration, the detected symbols at the virtual pilot positions are treated as additional pilots. The LS estimate based on the new pilot observation vector can be expressed as

$$\hat{H}^{pv}_F = \left( \hat{D}^{pv}_F \right)^{-1} Y^{pv}_F,$$

where $\hat{D}^{pv}_F$ is the diagonal matrix containing the known pilot symbols followed by the detected symbols at the virtual pilot positions, yielding the pixel estimates at the pilot and virtual pilot positions in the first iteration. The pixel estimates for the remaining data symbols are calculated using the interpolation method discussed in Section 3.2. The image is then fully re-estimated using the pilots and virtual pilots, and the information pixels are demodulated using the ZF receiver as

$$\hat{X}[i] = Y_F[i] \oslash \hat{H}_F[i], \tag{21}$$

where $\oslash$ denotes element-wise division and $\hat{H}_F[i]$ is the re-estimated reference image. As Equation (21) relies on both the pilot and virtual pilot tones, it is anticipated that an improved reference image estimate, and therefore an improved SER, can be achieved with increasing iterations as the virtual pilot pixels become more refined.
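The iteration described in this section can be summarised for a single column in a short sketch. It is a simplified illustration under several assumptions that are not from the paper: the reference coefficients are real and positive, the modulation is BPSK, virtual pilots are chosen only by the magnitude criterion (the correlation criterion is omitted), and PCHIP stands in for the piecewise-cubic interpolator. The function name `iterative_estimate` and all numbers are hypothetical.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

def iterative_estimate(Y, pilot_idx, pilot_syms, n_virtual, n_iter):
    """LS at pilots, interpolate, ZF-detect, then re-estimate with virtual pilots."""
    k = np.arange(Y.size)
    is_data = np.ones(Y.size, dtype=bool)
    is_data[pilot_idx] = False
    data_idx = k[is_data]
    knots = np.asarray(pilot_idx)
    H_knots = Y[knots] / pilot_syms                    # LS estimate at the true pilots
    virt = np.array([], dtype=int)
    for it in range(n_iter + 1):
        H_hat = PchipInterpolator(knots, H_knots)(k)   # estimate at every pixel
        X_hat = np.where(Y / H_hat >= 0, 1.0, -1.0)    # ZF followed by a hard decision
        X_hat[pilot_idx] = pilot_syms                  # pilots are known exactly
        if it == n_iter:
            break
        # Virtual pilots: data pixels with the largest received magnitude.
        order = np.argsort(np.abs(Y[data_idx]))
        virt = data_idx[order[-n_virtual:]]
        knots = np.union1d(pilot_idx, virt)
        H_knots = Y[knots] / X_hat[knots]              # LS at pilots and virtual pilots
    return H_hat, X_hat, virt

rng = np.random.default_rng(0)
N, B = 121, 8
k = np.arange(N)
pilot_idx = np.arange(0, N, B)                         # pilots at 0, 8, ..., 120
H = 150.0 + 50.0 * np.cos(2 * np.pi * k / N)           # smooth positive stand-in column
X = rng.choice([-1.0, 1.0], size=N)
X[pilot_idx] = 1.0
Y = H * X                                              # noise-free, for a deterministic check
H_hat, X_hat, virt = iterative_estimate(
    Y, pilot_idx, np.ones(pilot_idx.size), n_virtual=20, n_iter=3)
```

In this noise-free toy case every symbol is detected correctly, so the estimates at the virtual pilot positions coincide with the true coefficients. With noise, a wrongly detected virtual pilot would inject an error into the next estimate, which is why the selection criteria matter.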

Simulations
The performance of the proposed reference-frame estimation scheme was evaluated over an AWGN channel. The simulation environment is similar to that used in [21]. Specifically, a 2D-DFC system with a 256 × 256-pixel data-embedded Lena image displayed on the screen was utilized (cf. Figure 2b). The camera was perfectly aligned with the screen to prevent energy loss. BPSK modulation was employed on the data symbols with uniformly spaced pilots, and the SER, ADR, and PSNR at the camera decoder output were used as performance metrics. The ADR was computed in terms of $A$, the pixel area for data embedding, and $N_i$, the number of information symbols per frame. A standard off-the-shelf camera receiver operating at 30 fps was considered, and the data were embedded in every frame. Of all the data pixels, 10% were designated as pilot pixels, and a maximum of five iterations were performed. However, the iteration was terminated when no significant improvement in the overall 2D-DFC performance was observed. The data were embedded in the high-frequency region of the Lena image to minimize visual artifacts on the screen, because a reference image frame was not transmitted. The starting pixel values of the symbols ($s_p$ and $s_q$) and the number of embedded data symbols ($L_p$ and $L_q$) were set to 90 and 30, respectively.

Figure 4 illustrates the SER performance of the proposed reference-frame estimation scheme. As the number of iterations increases, the SER gradually approaches that of the ideal scheme, that is, conventional 2D-DFC. This can be attributed to the iterative refinement of the pixel estimates at both the pilot and virtual pilot pixels, which results in better interpolated values for the information symbols. Furthermore, we observed that, as the SNR increased, the proposed method achieved significant performance improvements, and the virtual pilot pixels became more accurate with increasing iterations.
The uniqueness of the scheme lies in its ability to improve the performance iteratively by utilizing both pilot and virtual pilot pixels while keeping the pilot density low at only 10%.

Figure 5 depicts the ADR of the proposed reference frame estimation scheme, which is the primary motivation for this study. We observed that, even without iterations, the data rate was higher than that of the conventional 2D-DFC scheme. Furthermore, with an increasing number of iterations, the data rate becomes significantly higher, nearly doubling. The primary reason for this improvement is the elimination of reference frames in the proposed scheme. In conventional 2D-DFC, a reference frame is employed to decode each data frame at the receiver, whereas, in the proposed scheme, the reference frame is estimated at the receiver using the transmitted pilots, thereby eliminating the need for reference frames. To enhance the performance, virtual pilots are further used iteratively. This demonstrates that the use of reference frames severely limits the data rate of 2D-DFC systems, and the proposed scheme enhances the performance by eliminating their use. The iterations were terminated after no significant improvement in the data rate was observed.

Figure 5. Comparison of ADR performance between the proposed 2D-DFC scheme with iterative processing and the conventional 2D-DFC scheme that employs a reference image (vertical axis: ADR [bits/s], ×10^5; curves: no iteration, 1st to 5th iteration, and conventional 2D-DFC).

Figure 6 shows the PSNR as a function of the number of iterations. PSNR was used to measure the quality of the reconstructed image compared to the original image. Higher PSNR values indicate better image quality. To compute the PSNR between images, we first computed the mean square error (MSE) as

$$\mathrm{MSE} = \frac{1}{PQ} \sum_{p=1}^{P} \sum_{q=1}^{Q} \left( D_t[i](p, q) - \hat{D}_t[i](p, q) \right)^2,$$

where $\hat{D}_t[i]$ is the reconstructed image in the spatial domain and $D_t[i]$ is the transmitted spatial-domain image given by (6).
We can then compute the PSNR between the transmitted and reconstructed images as

$$\mathrm{PSNR} = 10 \log_{10} \left( \frac{R^2}{\mathrm{MSE}} \right),$$

where $R$ is the maximum pixel value. As shown in Figure 6, the proposed iterative 2D-DFC scheme ensures the perceptual unobtrusiveness of the data embedding, as the reconstructed image quality improves with an increasing number of iterations. The visual features of the image are primarily located at low frequencies, whereas details and noise are present at higher frequencies.
Because data embedding occurs in the high-frequency subband, visual artifacts are hardly noticeable. Although the proposed iterative method results in a slightly reduced PSNR performance compared to the conventional method, the data rate is nearly doubled. Therefore, the proposed method is beneficial for applications requiring high-speed data transmission through D2C links.
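The MSE and PSNR definitions above translate directly into a few lines of code. The sketch below assumes 8-bit gray-scale images, so the maximum pixel value is taken as R = 255; the function name `psnr` and the test images are illustrative.

```python
import numpy as np

def psnr(reference, reconstructed, R=255.0):
    """PSNR in dB between two equally sized gray-scale images."""
    ref = np.asarray(reference, dtype=float)
    rec = np.asarray(reconstructed, dtype=float)
    mse = np.mean((ref - rec) ** 2)            # mean square error over all pixels
    if mse == 0:
        return float("inf")                    # identical images
    return 10.0 * np.log10(R ** 2 / mse)

img = np.full((4, 4), 100.0)
value = psnr(img, img + 1.0)                   # every pixel off by one level, so MSE = 1
```

With an MSE of exactly one, the PSNR equals 10·log10(255²), roughly 48.13 dB, which gives a feel for the scale: a one-level error on every pixel is already far below what a viewer would notice.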

Perspective Distortion
Although the above results assume perfect alignment between the camera and display, in a real-life situation, a camera may not always be aligned frontally with the display screen, and distortion may arise from the tilting or rotation of the camera relative to the display. This type of distortion degrades DFC performance and is modeled as a perspective distortion [23]. Perspective distortion causes rectangular shapes to appear skewed, with parallel lines appearing to converge, and objects farther away from the camera appear smaller. In the simulations, the perspective transformation matrix is computed using the geometric transformation of a set of matched control points. A projective transformation can be represented by a 3 × 3 matrix known as a homography matrix. Given a point in homogeneous coordinates, represented as a 3 × 1 column vector $[x, y, w]^T$, a projective transformation can be written as

$$\begin{bmatrix} x' \\ y' \\ w' \end{bmatrix} = H \begin{bmatrix} x \\ y \\ w \end{bmatrix},$$

where $[x', y', w']^T$ is the transformed point in homogeneous coordinates, and $H$ is the homography matrix. The homography matrix $H$ can be computed from a set of corresponding points in the two images using a method known as the direct linear transformation (DLT). Given $n$ corresponding points $\mathbf{x}_i \leftrightarrow \mathbf{x}'_i$, each correspondence satisfies $\mathbf{x}'_i \times H \mathbf{x}_i = \mathbf{0}$, and the homography matrix can be computed by solving a system of linear equations of the form

$$A \mathbf{h} = \mathbf{0},$$

where $A$ is a $2n \times 9$ matrix, $\mathbf{h}$ is a 9 × 1 column vector containing the elements of $H$ in row-major order, and the symbol "×" denotes the vector cross-product. Once the homography matrix is computed, it can be used to transform the transmitted images. At the camera receiver, the boundaries of the electronic display should be accurately detected before data retrieval. Harris corner detection and the Hough transform can be used to recognize the borders of the display for precise image alignment. The distorted image can then be resized to its original size by obtaining the missing pixels through interpolation.
The image is then restored using a homography estimation. Figure 7 illustrates the effects of perspective distortion on an image, and the corresponding recovery using our proposed scheme. As shown in Figure 7a, the image was subjected to skewing and rotation, particularly at the edges. Figure 7b shows the corrected image, which is more accurately proportioned and shaped. Figure 8 presents the performance of the proposed scheme on an image affected by perspective distortion, which was subsequently corrected. Figure 8a shows the symbol error rate (SER), which shows an improvement in performance with an increasing number of iterations using virtual pilots. Our proposed scheme approaches the ideal 2D-DFC scheme despite the poor SER owing to substantial distortion in the image. Figure 8b shows the achievable data rate of the scheme, which is lower than that of the perfect alignment case. Nevertheless, our proposed scheme outperformed the conventional 2D-DFC scheme and achieved almost two-fold higher data rates even in the presence of perspective distortion.
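The DLT step mentioned above can be sketched with a singular value decomposition. This is a generic textbook formulation, not the authors' code: the function names `homography_dlt` and `apply_h`, the example matrix `H_true`, and the four screen corners are all assumptions, and coordinate normalisation (which is advisable with noisy points) is omitted for brevity.

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate H (scaled so that H[2, 2] = 1) from n >= 4 point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Two independent equations per correspondence, linear in h.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows, dtype=float)          # 2n x 9 system A h = 0
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)                   # null vector of A, row-major
    return H / H[2, 2]

def apply_h(H, pts):
    """Map 2D points through H using homogeneous coordinates."""
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    out = pts_h @ H.T
    return out[:, :2] / out[:, 2:3]

# Hypothetical distortion applied to the corners of a 256 x 256 display.
H_true = np.array([[0.9, 0.1, 5.0],
                   [-0.05, 1.1, -3.0],
                   [1e-4, 2e-4, 1.0]])
src = np.array([[0.0, 0.0], [255.0, 0.0], [255.0, 255.0], [0.0, 255.0]])
dst = apply_h(H_true, src)                     # corners as seen by the camera

H_est = homography_dlt(src, dst)
restored = apply_h(np.linalg.inv(H_est), dst)  # undo the distortion
```

Applying the inverse of the estimated homography maps the distorted corners back to their original positions; in a full receiver the same mapping would be applied to every pixel, with interpolation filling in non-integer coordinates.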

Conclusions
This study proposes a novel scheme for increasing the data rate of a 2D-DFC system by introducing a reference frame estimation method. The proposed scheme involves computing the estimates at the pilot pixels, followed by the interpolation of the estimates at the information pixels using piecewise cubic interpolation. The scheme then selects virtual pilots based on specific criteria among the data symbols, and uses them as pilot pixels in the next iteration. This iterative process was repeated, and the reference image was re-estimated each time. Simulation results show that the proposed scheme improves the data rate of the 2D-DFC system by almost two-fold at the cost of a slightly reduced PSNR; thus, the proposed scheme provides a way to eliminate the use of reference image frames that typically limit the data rate of 2D-DFC systems. Overall, this study presented a promising approach for enhancing the performance of D2C communication systems.