Screen-Cam Robust Image Watermarking with Feature-Based Synchronization

Abstract: The screen-cam process is the process of taking pictures of the content displayed on a screen with mobile phones or cameras.


Introduction
Introduction
Currently, the ubiquity of computer-based office work and the development of communication technology render digital image storage, copying, and transmission convenient and fast. However, security problems involving digital images, such as leakage, malicious theft, and illegal dissemination, still frequently occur. To protect the data on the computer side, data encryption [1][2][3][4][5][6] and software watermarking [7,8] schemes have been proposed. Similarly, some scholars have investigated access control technology [9][10][11] to prevent illegal copying and transmission of data by restricting internal operations. Although these methods can effectively prevent the illegal acquisition of digital data directly from a computer, they cannot prohibit using a camera to take a photo of sensitive information displayed on the screen.

Appl. Sci. 2020, 10, x FOR PEER REVIEW 3 of 26
Existing synchronization methods may not be able to restore the image to its original orientation and scale. Therefore, to address possible desynchronization attacks besides the screen-cam attack, a screen-cam, RST, and cropping invariant watermark synchronization method needs to be further investigated.
To solve these issues, a feature and Fourier-based screen-cam robust watermarking scheme is proposed in this paper. The main contributions are as follows:

• We analyze the performance of commonly used feature operators and the variation rules of DFT magnitude coefficients during the screen-cam process.
• We design an orientation and scale invariant local square feature region (LSFR) construction method, which achieves watermark synchronization against the screen-cam attack as well as common desynchronization attacks.
• We employ a non-rotating embedding algorithm based on the properties of the DFT coefficients, which avoids the further distortions that may be caused by orientation normalization.
• We present a preprocessing method for message embedding. Combined with the proposed local statistical feature-based message extraction method, it improves the extraction accuracy.

The remainder of the paper is organized as follows: Section 2 summarizes the different distortions in the screen-cam process. Section 3 describes the implementation details of the proposed method. The selection of parameters and the experimental results are presented in Sections 4 and 5. Finally, Section 6 concludes the paper.

Screen-Cam Process Analysis
The screen-cam process contains various distortions [34]. The subprocesses of the screen-cam process produce different types of distortions, as shown in Figure 1, and cause severe image quality degradation. This section aims to provide a basis for the design of screen-cam robust watermarking schemes by analyzing the different types of distortions generated in each step of the screen-cam process.
The screen-cam process can be divided into three subprocesses: screen display, while shooting, and camera imaging.
In the screen display process, the main factors that affect the image signal are the quality of different monitors and their settings. Regular user operations will also cause distortions.
With regard to the while-shooting process, the main factors are the shooting environment, the relative position of the screen and camera, and the moiré phenomenon. If shooting at a large angle, the focusing problem cannot be disregarded [23]. Besides, camera shake may occur when pressing the shutter.
The camera imaging process of a mobile phone is the process of converting optical signals to digital image signals and processing them. The main types of equipment in this process are the optical lens, the CMOS sensor [35], and the digital signal processor (DSP).

In Section 3.2, we analyze the embedding operations and present the detailed procedures of the proposed embedding method. The corresponding watermark detection method is given in Section 3.3.

Local Square Feature Region Construction
Due to the desynchronization attacks caused by the screen-cam process and user operations, we need to develop an appropriate synchronization method to locate the watermark. We test the feasibility of the Harris-Laplace, SIFT, and SURF operators, which are extensively employed to construct local scale-invariant feature regions (LFRs) as message embedding areas [37][38][39][40][41][42][43][44][45], in the screen-cam process. To select the most suitable operators for LFR construction, the variations of feature point coordinates, feature scale, and feature direction are quantitatively analyzed under different shooting distances. The images used here are shown in Figure 2. All host images are 1024 × 1024 pixels. Because of the blurring of the image edges caused by the low-pass filtering attack and the lens distortion, it is difficult to restore the captured image to correspond exactly to the original image. Inevitably, there will be a displacement between the coordinates of the corresponding pixels. Therefore, feature points are considered to be repeated when the offsets of their coordinates are smaller than five pixels. Besides, considering the requirements of watermark synchronization, the feature scale variation should also be below 10%. Furthermore, feature points at the edge of the image are excluded. In order to reduce the impact of noise, we apply a Gaussian filter to both the original images and the captured images. As shown in Figure 3a, after Gaussian filtering, we discover that the middle- and high-scale feature points (feature scale greater than 15) of the Harris-Laplace and SIFT operators can achieve high repeatability. The repeatability here refers to the ratio of the number of feature points that are extracted after a screen-cam attack and satisfy the above-mentioned pixel offset and scale variation criteria to the original number of feature points.
We also note that although the SIFT operator performs better at a long shooting distance, it does not work well at a close shooting distance, which indicates that it is more sensitive to moiré noise. Comparatively, the Harris-Laplace operator is more stable at different shooting distances, which makes it more suitable for watermark synchronization. Regarding the feature orientation descriptors, we note that the SURF orientation descriptor is more robust to the screen-cam process than the SIFT descriptor: the orientation variations of repeated SURF feature points are predominantly less than five degrees, as shown in Figure 3b. We attribute this to the integral-image and Haar-wavelet-based SURF orientation descriptor being more robust to the blurring and luminance changes in the screen-cam process than the Gaussian-image and histogram-based SIFT orientation descriptor. Therefore, in our method, a modified Harris-Laplace detector and the SURF orientation descriptor are integrated to construct the RST invariant LFRs. To increase the detection rate of feature points during the screen-cam process, we also employ a Gaussian function.
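The repeatability criterion above (coordinate offset under five pixels, scale variation under 10%) can be sketched as a small check. The function names and the feature representation, a pair of coordinates plus a scale, are illustrative assumptions, not the paper's code:

```python
import numpy as np

def is_repeated(orig_pt, cap_pt, orig_scale, cap_scale,
                max_offset=5.0, max_scale_var=0.10):
    """Repeatability criteria from the analysis: coordinate offset
    below five pixels and feature scale variation below 10%."""
    dx = orig_pt[0] - cap_pt[0]
    dy = orig_pt[1] - cap_pt[1]
    offset = np.hypot(dx, dy)
    scale_var = abs(cap_scale - orig_scale) / orig_scale
    return offset < max_offset and scale_var < max_scale_var

def repeatability(orig_feats, cap_feats):
    """Ratio of original feature points that survive the screen-cam
    attack under the criteria above. Each feature is ((x, y), scale)."""
    hits = 0
    for (p0, s0) in orig_feats:
        if any(is_repeated(p0, p1, s0, s1) for (p1, s1) in cap_feats):
            hits += 1
    return hits / len(orig_feats) if orig_feats else 0.0
```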
In previous LFR-based methods, circular feature regions are commonly constructed. These regions will involve zero-padding [43] or rearrangement [46] to a square region before message embedding, which will cause further distortions [47]. Therefore, we directly construct LSFRs. Figure 4 illustrates the subprocesses. The details are as follows.

Gaussian Function Preprocess
In the embedding and extraction processes, the detection of feature points is performed on the Gaussian-filtered images. A two-dimensional Gaussian function G(x, y) is obtained as the product of two one-dimensional Gaussian functions and can be defined as:

G(x, y) = \frac{1}{2\pi\sigma^2} \exp\!\left(-\frac{x^2 + y^2}{2\sigma^2}\right),

where \sigma is the standard deviation. The Gaussian kernel H_G, whose sigma is set to 2 and window size to 7, is employed here. The image convolution process is defined as:

I' = I * H_G,

where I is the input image and I' is the convolution result. * denotes the convolution operator.
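A minimal sketch of this preprocessing step, building the 7 × 7, σ = 2 kernel H_G as an outer product of one-dimensional Gaussians and convolving; the edge handling (replication) is our assumption, as the paper does not specify it:

```python
import numpy as np

def gaussian_kernel(size=7, sigma=2.0):
    """Normalized 2-D Gaussian kernel H_G as the outer product of two
    1-D Gaussians (window 7, sigma 2, as in the text)."""
    ax = np.arange(size) - size // 2
    g1d = np.exp(-ax**2 / (2.0 * sigma**2))
    g2d = np.outer(g1d, g1d)
    return g2d / g2d.sum()

def gaussian_filter(image, size=7, sigma=2.0):
    """I' = I * H_G : same-size convolution with edge replication."""
    k = gaussian_kernel(size, sigma)
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i+size, j:j+size] * k)
    return out
```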

Modified Harris-Laplace Detector
The Harris-Laplace detector [48,49] has been extensively employed in different watermarking schemes. Therefore, here we give a brief description of it, and the modified part is explained in detail.


First, Harris points are detected in the scale space. To obtain invariance to scale variation, we build a scale-space representation with the Harris function for preselected scales. The Harris detector is based on a specific image descriptor, referred to as the second moment matrix, which reflects the local distribution of gradient directions in the image [50]. To make the matrix independent of the image resolution, the scale-adapted second moment matrix is defined by:

\mu(\mathbf{x}, \sigma_I, \sigma_D) = \sigma_D^2\, G(\sigma_I) * \begin{bmatrix} L_x^2(\mathbf{x}, \sigma_D) & L_x L_y(\mathbf{x}, \sigma_D) \\ L_x L_y(\mathbf{x}, \sigma_D) & L_y^2(\mathbf{x}, \sigma_D) \end{bmatrix},

where \sigma_I and \sigma_D are the integration scale and local scale, respectively, and L_x and L_y are the derivatives computed in the associated directions by a Gaussian. Given \sigma_D, the uniform Gaussian multiscale space representation L is defined by:

L(\mathbf{x}, \sigma_D) = G(\sigma_D) * I(\mathbf{x}),

where G is the associated uniform Gaussian kernel with a standard deviation \sigma_D and a mean of zero. Given \sigma_I and \sigma_D, the scale-adapted Harris corner strength (cornerness), which quantitatively describes the stability under variations in imaging conditions, can be computed. The original cornerness measure function needs an empirical parameter, which may float for different images. Therefore, in this paper, we adopt another cornerness measure function, the Alison Noble measure [51]:

\mathrm{cornerness}(\mathbf{x}) = \frac{\det(\mu(\mathbf{x}, \sigma_I, \sigma_D))}{\operatorname{trace}(\mu(\mathbf{x}, \sigma_I, \sigma_D)) + eps},

where det(·) and trace(·) denote the determinant and the trace of the matrix, respectively, and eps is a small constant that ensures the denominator is nonzero. The feature points obtained by this measure are more robust under variations in imaging conditions [51]. At each level of the scale space, the candidate points are extracted as follows:

p \in \mathrm{candidates} \iff \mathrm{cornerness}(p) \ge \mathrm{cornerness}(q)\ \forall q \in A \ \text{and}\ \mathrm{cornerness}(p) > t_n,

where A represents the points within the 3\sigma_I-radius neighborhood of p, and t_n is the threshold, set to 0.1 \cdot \max(\mathrm{cornerness}).
The automatic scale selection of the feature points is then performed. To select the characteristic scale of the local structure, a scale-normalized LoG operator is defined as:

\mathrm{LoG}(\mathbf{x}, \sigma_I) = \sigma_I^2 \left| L_{xx}(\mathbf{x}, \sigma_I) + L_{yy}(\mathbf{x}, \sigma_I) \right|,

where L_{xx} and L_{yy} are the second partial derivatives with respect to x and y, respectively. For each candidate point, we apply an iterative method to determine the location and scale of the feature points. Given the initial point p with the scale \sigma_I, the iteration steps are as follows.
Step (1): Find the local extremum over the scales of the LoG for the point p_k; otherwise, reject the point. The investigated range of scales is limited to \sigma_I^{(k+1)} = t\,\sigma_I^{(k)} with t \in [0.7, 1.4].
Step (2): Detect the spatial point p_{k+1} of a maximum of the scale-adapted Harris corner strength closest to p_k for the selected \sigma_I^{(k+1)}.

SURF Orientation Descriptor
To obtain invariance to rotation, each feature point is assigned a direction based on the SURF orientation descriptor. We calculate the Haar wavelet responses on a circular region of the integral image, which is centered at the feature point and has a radius of six times the feature scale. A Gaussian weighting function, whose \sigma is two times the feature scale, is used to weight the Haar wavelet responses.
To obtain the dominant orientation, we calculate the sum of all responses within a sliding orientation window of size \pi/3. By summing the horizontal and vertical responses within the window, the vector (m_w, \theta_w) can be obtained, which is defined as:

m_w = \sqrt{\left(\sum_{w} dx\right)^2 + \left(\sum_{w} dy\right)^2}, \qquad \theta_w = \arctan\!\left(\sum_{w} dy \Big/ \sum_{w} dx\right),

where m_w is the length of the summed responses and \theta_w is the associated orientation. The dominant orientation \theta is defined as:

\theta = \theta_w \,\big|\, \max(m_w).
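The sliding-window selection can be sketched as below. The number of window start positions (72) and the flat-array response representation are assumptions for illustration:

```python
import numpy as np

def dominant_orientation(dx, dy, angles, window=np.pi / 3):
    """Sliding-window dominant orientation: sum the Haar responses
    (dx, dy) whose angle falls inside a pi/3 window, keep the window
    with the largest summed vector m_w, return its orientation."""
    dx = np.asarray(dx, dtype=float)
    dy = np.asarray(dy, dtype=float)
    angles = np.asarray(angles, dtype=float)
    best_m, best_theta = -1.0, 0.0
    for start in np.linspace(0, 2 * np.pi, 72, endpoint=False):
        # responses whose orientation lies inside [start, start + window)
        rel = (angles - start) % (2 * np.pi)
        mask = rel < window
        sx, sy = np.sum(dx[mask]), np.sum(dy[mask])
        m_w = np.hypot(sx, sy)        # length of the summed vector
        if m_w > best_m:
            best_m = m_w
            best_theta = np.arctan2(sy, sx)
    return best_theta
```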

LSFRs for Watermarking
Considering the severe distortion during the screen-cam process, the constructed LSFRs should cover a sufficient range to ensure that the information can survive. Thus, feature points with appropriate scale and location are selected, and the side length L_0 of an LSFR is designed as:

L_0 = k_1 \cdot s,

where k_1 is a constant coefficient and s is the feature scale value. Figure 5 shows the LSFRs for the 8-image test set. Because the watermark information will be embedded in the DFT coefficients, according to their characteristics, the following two situations are also feasible: when a small part of a candidate LSFR lies outside the image, as shown in Figure 5f, or when small parts of two LSFRs overlap, as shown in Figure 5g, these LSFRs can still be utilized as embedding areas.


Selection of Embedding Operations
As discussed in Section 3.1, there is an inevitable shift in the corresponding positions between the corrected image and the original image. Fortunately, due to the translation property of the DFT coefficients, the coefficients of captured images can be made to correspond to the original image, provided the four corners are carefully selected to avoid excessive rotation and scaling distortion after perspective rectification. Therefore, it is advantageous to select the DFT domain for embedding the watermark message.
In order to use DFT coefficients as a watermark carrier, we need to analyze their variation rules in the screen-cam process first. The variations of the DFT magnitude coefficients with different shooting conditions were analyzed in detail. As mid-frequency coefficients are commonly employed as watermark carriers, we take the mid-frequency spectrum of 512 × 512 sized Lena image and the variation after screen-cam as an example to illustrate the details of their variation rules, as shown in Figure 6. The axis scale value is the coordinate in the spectrum of the original image.
We find that most of the magnitude coefficients with high values are well preserved, for example, (301, 299), (301, 300), (302, 304), and the other points of deep warm color in Figure 6. Furthermore, the magnitude coefficients with low values commonly increase to higher values. In general, the more blurred the image is, the more the low-valued magnitude coefficients increase. Examples of this are the points (297, 305), (300, 296), and (302, 303) in Figure 6.
The changes can be summarized as follows. In the mid-frequency bands, the magnitude coefficients with high values are well preserved during the screen-cam process, while those with low values commonly increase to approximate their adjacent magnitude values. Therefore, we choose the mid-frequency bands and embed the message by modifying the selected magnitude coefficients to higher values.
Figure 7 illustrates the embedding process. Each selected LSFR is treated as an independent communication channel, and the same watermark message is embedded in every LSFR. Compared with the DCT-based method in [33], which embeds the message in the sub-blocks of feature regions, the proposed DFT-based method takes each LSFR as a whole and thus has better robustness against cropping attacks.
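For illustration, the centered magnitude spectrum and a mid-frequency band discussed above can be obtained as follows; the band limits r_lo and r_hi are placeholder values, not the paper's:

```python
import numpy as np

def midband_magnitudes(image, r_lo=0.25, r_hi=0.45):
    """Centered DFT magnitude spectrum of a grayscale image plus a
    boolean mask selecting a mid-frequency annulus."""
    F = np.fft.fftshift(np.fft.fft2(image.astype(float)))
    mag = np.abs(F)
    h, w = image.shape
    yy, xx = np.mgrid[0:h, 0:w]
    # normalized distance from the spectrum center
    r = np.hypot(yy - h / 2, xx - w / 2) / (min(h, w) / 2)
    band = (r >= r_lo) & (r < r_hi)
    return mag, band
```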

Message Embedding
To avoid the LSFR being further distorted during the rotation step of orientation normalization, we design a non-rotating embedding method based on the properties of the DFT coefficients. Furthermore, to improve the extraction accuracy, a preprocessing method for the DFT magnitude coefficients is proposed. The specific steps are as follows.

First, the minimum square regions that contain the LSFRs to be embedded are extracted in order. The luminance band of this area is converted to the DFT domain.
Second, for the watermark information, the pseudorandom sequence W = {w(j) | w(j) ∈ {−1, 1}, j = 0, ..., l − 1} is generated by the secret key, where l is the length of the sequence. In order to achieve blind detection that can cope with the situation where the original size is unknown, the embedding radius R of W is set to a fixed value. Correspondingly, the embedding radius R_1 in the square region is obtained by scaling R in proportion to the side length L_1 of the square region. Since the DFT magnitude spectrum is centrosymmetric, a 180-degree region suffices for embedding. The coordinates W_RS(x_j, y_j) of the message embedding positions in the square region are defined as:

x_j = x_0 + R_1 \cos(\theta_d + j\pi/l), \qquad y_j = y_0 + R_1 \sin(\theta_d + j\pi/l),

where j indexes the j-th element of W and (x_0, y_0) is the center of the spectrum. Therefore, the elements of the message are equally spaced around the center of the embedding region. \theta_d defines the angle between the feature orientation of the LSFR and the normalized orientation.
Third, to obtain a better detection rate, the magnitudes M need to be preprocessed before signal embedding. In theory, the more obvious the difference between the magnitudes where the embedded watermark information is "1" and where it is "−1", the better the message extraction results. Considering the variation rules of the magnitudes during the screen-cam process, we need to avoid high magnitude values at the positions that represent the watermark information "−1". Therefore, some extremely high magnitude values at these positions and in their neighborhoods need to be reduced. For a normal distribution, nearly 84% of the values are less than the sum of the mean and one standard deviation. Hence, the preprocessing is defined as follows:

m_p(x, y) \leftarrow \begin{cases} \bar{m}_p + \sigma_p, & m_p(x, y) > \bar{m}_p + \sigma_p \\ m_p(x, y), & \text{otherwise}, \end{cases}

where m_p(x, y) denotes the magnitudes at the positions that represent the watermark information "−1" and their eight neighboring magnitudes, and \bar{m}_p and \sigma_p denote the mean value and the standard deviation of these magnitudes, respectively.
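A sketch of the half-circle position layout and the mean-plus-one-standard-deviation clipping; the exact coordinate formula is a plausible reading of the description above rather than a verbatim reproduction of the paper's equation:

```python
import numpy as np

def embedding_positions(l, R1, theta_d, center):
    """Spread the l message elements at equal angular spacing over a
    half circle of radius R1; the DFT magnitude spectrum is
    centrosymmetric, so 180 degrees suffices."""
    cx, cy = center
    pts = []
    for j in range(l):
        theta = theta_d + j * np.pi / l
        pts.append((int(round(cx + R1 * np.cos(theta))),
                    int(round(cy + R1 * np.sin(theta)))))
    return pts

def preprocess_magnitudes(M, neg_positions):
    """Clip extreme magnitudes at the '-1' positions and their eight
    neighbours to mean + one standard deviation (about 84% of a
    normal distribution lies below that bound)."""
    Mp = M.astype(float).copy()
    idx = set()
    for (x, y) in neg_positions:
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                idx.add((x + dx, y + dy))
    vals = np.array([Mp[x, y] for (x, y) in idx])
    bound = vals.mean() + vals.std()
    for (x, y) in idx:
        if Mp[x, y] > bound:
            Mp[x, y] = bound
    return Mp
```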
The watermark signal is embedded in the preprocessed magnitudes M_P using the following rule:

M_w(x_j, y_j) = \begin{cases} M_P(x_j, y_j) + \beta, & w(j) = 1 \\ M_P(x_j, y_j), & w(j) = -1, \end{cases}

where M_w(x, y) denotes the watermarked magnitudes and \beta the embedding strength. We provide an initial value \beta = 0.1R, which is set based on experience, and adjust the value according to the calculated peak signal-to-noise ratio (PSNR). If the PSNR value is less than 42 dB, \beta is reduced by 0.2. This process iterates until the PSNR value is higher than 42 dB. Last, M_W is combined with the phase spectrum \phi to form the watermarked luminance band of the square region, which is then transformed back to the spatial domain. Only the pixel values within the LSFR are replaced. The result is a watermarked LSFR.
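The PSNR-guided strength adjustment can be sketched as a simple loop; embed_fn is a stand-in for the actual DFT-domain embedding, which is not reproduced here:

```python
import numpy as np

def psnr(original, watermarked, peak=255.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((original.astype(float) - watermarked.astype(float)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak**2 / mse)

def adjust_strength(original, embed_fn, beta, target_db=42.0, step=0.2):
    """Reduce the embedding strength beta until the watermarked image
    reaches the 42 dB PSNR target described in the text."""
    while beta > 0:
        wm = embed_fn(original, beta)
        if psnr(original, wm) >= target_db:
            return beta, wm
        beta -= step
    return 0.0, original.copy()
```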
After all selected LSFRs are embedded, the embedding process is completed.

Watermark Detection
Figure 8 illustrates the watermark detection process, which can be divided into the following three steps: perspective correction, candidate region locating, and message extraction.


Perspective Correction
Because different shooting angles and distances cause perspective distortion, we need to correct it and extract the needed portion from the captured images. The perspective correction function can be written as:

\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} \sim \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix},

where [x', y', 1]^T and [x, y, 1]^T denote the homogeneous point coordinates of the corrected image and the photo, respectively, and H is a nonsingular 3 × 3 homogeneous matrix. According to the formula, the matrix has eight degrees of freedom (DOF); therefore, at least four point correspondences are required to calculate H. We manually select the four needed vertices from the captured image. As the proposed watermarking scheme is designed for leak tracking, manual selection is acceptable. Since the watermark synchronization method is robust to scaling, the images do not have to be recovered to the original pixel size. In theory, without knowing the original size of the image, or if the image has been cropped, we can also use the four vertices of the screen to assist perspective correction, as shown in Figure 9. We at least need to know the size or aspect ratio of the screen, or the aspect ratio of the image if it has not been cropped.
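The homography estimation from four manually selected correspondences can be sketched with the standard direct linear transform; the solver choice is ours, as the paper does not specify one:

```python
import numpy as np

def homography_from_points(src, dst):
    """Direct linear transform: solve the 8-DOF homography H mapping
    four (or more) source points to destination points."""
    A = []
    for (x, y), (xp, yp) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y, -xp])
        A.append([0, 0, 0, x, y, 1, -yp * x, -yp * y, -yp])
    # the null vector of A (last row of V^T) holds the entries of H
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]   # fix the overall scale

def apply_homography(H, pt):
    """Map one point given in homogeneous coordinates [x, y, 1]^T."""
    v = H @ np.array([pt[0], pt[1], 1.0])
    return v[0] / v[2], v[1] / v[2]
```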
Because smartphones have high-megapixel cameras, the captured image commonly has substantially more pixels than the original image. To fully utilize the captured information, a judgment based on the shortest distance between the four selected points is made before the correction. If this distance is larger than 1500 pixels, the image is recovered at two different sizes. The recovered image I_1, obtained by recovering to the original size, if known, or to a relatively small size, is used to calculate the candidate LSFRs, which also accelerates the calculation. The image is recovered based on the shortest distance between two of the four vertices, as shown in Figure 9. The recovered image I_2 is used for message extraction. Otherwise, only one image is recovered.


Candidate Regions Locating
The calculation process of the candidate LSFRs is the same as in the embedding process and is performed on I_1. Gaussian filtering is performed first to reduce the impact of noise attacks. The feature points and their associated orientations are then calculated. To avoid missed detections, all feature points that may be used for watermark synchronization are selected based on scale and spatial location. We thus obtain the candidate LSFR set of I_1. The corresponding regions are extracted from I_2 for message extraction.

Message Extraction
Watermark detection is an iterative search over the candidate LSFRs. As soon as watermark information is detected in one LSFR, the watermark detection of the captured image is completed. Each time, one candidate LSFR is orientation normalized and discrete Fourier transformed. According to the nature of the DFT coefficients, although we do not know the original size, the radius of the watermark locations will not vary as long as the area corresponding to the feature scale has not varied. However, the feature scale and its corresponding area will vary slightly, resulting in a slight variation in the radius of the watermark locations. Therefore, the search is performed over radii R_i ∈ [R − 10, R + 10] at a step of 1 pixel. Besides, we also need to consider the variation in the feature orientation. As investigated in Section 3.1, the orientation variation is mostly less than five degrees. Therefore, at each radius R_i, the starting position is searched between −5° and +5° of the initial position at a step of one degree.
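The iterative search described above can be sketched as follows. Note that 21 radii × 11 starting angles gives the 231 candidate configurations per LSFR mentioned later; `extract_fn` is a hypothetical stand-in for orientation normalization, the DFT, and bit extraction at a given radius and starting angle.

```python
from itertools import product

def search_watermark(extract_fn, key_bits, R=60, T=8):
    """Sketch of the iterative detection loop: radii R-10..R+10 (step 1 px)
    and starting angles -5..+5 degrees (step 1 degree), i.e. 21 * 11 = 231
    candidate configurations per LSFR."""
    for r, a in product(range(R - 10, R + 11), range(-5, 6)):
        bits = extract_fn(r, a)  # hypothetical extractor at radius r, angle offset a
        n_e = sum(b != k for b, k in zip(bits, key_bits))
        if n_e < T:  # positive detection: fewer than T erroneous bits
            return True, r, a
    return False, None, None
```

The loop stops at the first positive detection, so in practice far fewer than 231 extractions are usually needed.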
The correction of perspective distortion will inevitably cause some shift of the coefficients and imperfections in resampling. This results in variations in the coefficients of adjacent points. An example is shown in Figure 10. In addition, because the feature orientation will vary, the starting position cannot be located directly. Therefore, each time, the maximum magnitude value of each candidate position and its neighborhood is extracted to obtain the message V. Based on the local statistical feature, the extracted message w is defined as:

w_i = 1, if V_i > M_w + k_2 · σ_w; w_i = 0, otherwise, (17)

where M_w and σ_w define the mean value and the standard deviation of all the magnitudes in the region, V_i is the extracted maximum value within the 3 × 3 magnitudes at the i-th candidate position, and k_2 is a parameter used to determine the threshold for message extraction.
The extracted w is compared with the pseudorandom sequence W generated by the secret key to calculate the number of erroneous bits. The watermark detection is positive if the number of erroneous bits is below the predefined threshold T. If the detection is negative, the iterative process continues.
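A minimal sketch of the local statistical extraction rule and the detection decision, assuming (as an illustration, not the exact implementation) that M_w and σ_w are computed over the magnitudes of the embedding region:

```python
import numpy as np

def extract_bits(candidate_mags, region_mags, k2=1.0):
    # Eq. (17)-style rule: a bit is 1 when the maximum magnitude around a
    # candidate position exceeds mean + k2 * std of the region's magnitudes.
    thresh = region_mags.mean() + k2 * region_mags.std()
    return (np.asarray(candidate_mags) > thresh).astype(int)

def detect(bits, key_bits, T=8):
    # Positive detection: fewer than T erroneous bits against the key sequence.
    n_e = int(np.sum(np.asarray(bits) != np.asarray(key_bits)))
    return n_e < T, n_e
```

In the full scheme this pair is evaluated once per candidate radius and starting angle; the first positive `detect` ends the search.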

Parameter Settings
For demonstration and experimental purposes, the watermark length l is set to 60, which can be considered a reasonable message length for real use cases. Based on this, we designed a series of experiments to select the most appropriate values for the parameters mentioned above.



The Selection of Embedding Radius
The magnitudes at different embedding radii R have different variation rules which affect the robustness of the algorithm. Considering the imperceptibility of the algorithm, the embedding strength β can vary according to different embedding radii.
To select the most suitable embedding radius for the algorithm, we designed an experiment. The eight host images are resized to 241 × 241, so that each can be treated as an LSFR. We generate the watermark information with the key K 1 , in which a total of 32 watermark bits are "1".
Based on the discussion in Section 3.3.3, the embedding radius should be no less than 55 to avoid the watermark bits being so close that they affect each other. According to the method in Section 3.2.2, the DFT magnitudes of the experiment images are preprocessed first. Then, watermark information is embedded at different radii for all images based on Equation (15). The PSNR value of the watermarked images is controlled to be around 42 dB by adjusting the embedding strength. The relationship between the embedding radius R and the average embedding strength β is shown in Figure 11a. As the embedding radius increases, the embedding strength can be increased.
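"Controlled to be around 42 dB by adjusting the embedding strength" can be realized in several ways; one simple sketch is a bisection on the strength β, with `embed_fn` as a hypothetical stand-in for the paper's Equation (15) magnitude embedding:

```python
import numpy as np

def psnr(orig, modified, peak=255.0):
    # Peak signal-to-noise ratio in dB.
    mse = np.mean((np.asarray(orig, float) - np.asarray(modified, float)) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

def tune_strength(orig, embed_fn, target_db=42.0, lo=0.0, hi=1000.0, iters=40):
    # Stronger embedding lowers PSNR, so keep the largest beta whose
    # PSNR stays above the target (assumes PSNR decreases with beta).
    for _ in range(iters):
        mid = (lo + hi) / 2
        if psnr(orig, embed_fn(mid)) > target_db:
            lo = mid
        else:
            hi = mid
    return lo
```

The monotonicity assumption holds for additive magnitude modifications, which is why a simple bisection suffices here.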
In order to compare the variation of the watermarked magnitudes at different radii and different shooting distances, we designed an index K_{r,d} as an evaluation indicator to describe the significance of the watermark information. Because only the magnitudes of the positions where the watermark bit is "1" are modified, K_{r,d} only needs to consider the modified magnitudes. Using the statistics of Equation (17), it is defined as:

K_{r,d} = (1/32) Σ_{i=1}^{32} (m_{c(r,i)} − M_w) / σ_w,

where K_{r,d} defines the index of the image captured at the distance of d with embedding radius r, and m_{c(r,i)} defines the magnitude at the i-th position where the watermark bit is "1" in the captured image with embedding radius r.
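One way to compute such a significance index, normalizing the modified magnitudes by the region's mean and standard deviation (our assumption of the intended definition, reusing the Equation (17) statistics):

```python
import numpy as np

def k_index(modified_mags, region_mags):
    # Average of the modified magnitudes, centered by the region mean and
    # scaled by the region standard deviation (assumed normalization).
    m_w = region_mags.mean()
    sigma_w = region_mags.std()
    return float(np.mean((modified_mags - m_w) / sigma_w))
```

A larger value means the embedded "1" positions still stand out clearly from the background magnitudes after the screen-cam process.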
The relationship between the average of the calculated K_{r,d} and different shooting distances with different embedding radii is shown in Figure 11b. When the shooting distance is close to the screen, the watermark information with a larger embedding radius is more significant due to the higher embedding strength. However, fewer details of the watermark are captured as the shooting distance increases, so the higher-frequency coefficients are poorly preserved. When the embedding radius is 56 or 60, the watermark information is better preserved at different shooting distances. Considering the real scene, in order to better capture the image displayed on the screen, we usually shoot at 40 to 60 cm. At these distances, results with an embedding radius of 60 are better. Therefore, R is set to 60 in our experiment.


The Selection of the Size and Number of LSFRs
The size and number of LSFRs determine the robustness of the proposed algorithm. Besides, the size of the constructed LSFRs determines the number of them. According to Equation (11), the size and number of constructed LSFRs in our experiment are determined by k 1 .
The 60 images from the database [52] are resized to 1024 × 1024 as experiment images. We statistically analyzed the average number of constructed LSFRs with different k 1 . In theory, the larger the LSFRs and the greater their number, the better the robustness of the algorithm. Therefore, we also count the number of constructed LSFRs with side lengths of 240–300 and the number with side lengths greater than 300, as shown in Table 1. When k 1 is set to 6 and 6.5, the most LSFRs with side lengths greater than 300 can be constructed, and the total number also satisfies the requirements. Therefore, k 1 is set to 6.5 in our experiment.

The Selection of the Threshold for Message Extraction
According to Equation (17), k 2 determines the threshold for message extraction, which will affect the success rate and validity of watermark information extraction. We performed a statistical analysis of the extraction results of the 29 LSFRs constructed from the eight host images with and without watermarks to select the most appropriate threshold. The experiment was set at a shooting angle of 0, 15, and 30 degrees and a shooting distance from 40 to 110 cm at intervals of 10 cm. Therefore, each LSFR was captured 24 times with different shooting conditions.
Based on the extraction method in Section 3.3.3, a total of 696 results of watermarked LSFRs and 648 results of unwatermarked LSFRs were obtained. The average number of erroneous bits with different k 2 values is shown in Figure 12a. The extraction results of watermarked LSFRs achieve the minimum number of erroneous bits when k 2 is set to 1. The distributions of erroneous bits with k 2 = 1 are shown in Figure 12b. The average number of detected erroneous bits for unwatermarked LSFRs is around nineteen, which is independent of k 2 .
Therefore, k 2 is set to 1 in our experiment.



The Selection of the Threshold for Watermark Detection
The selection of the threshold T determines the false-positive rate and the true-positive rate. T needs to be set low enough to ensure that the watermark can be detected from watermarked LSFRs and high enough to ensure that the watermark cannot be detected from unwatermarked LSFRs.
Messages extracted from an unwatermarked image can be considered as independent random variables [43]. Therefore, the probability that a single bit matches is 0.5. The relationship between the false-positive rate of a single detection P_f and the threshold T is:

P_f = Σ_{k=0}^{T−1} C(l, k) × 0.5^l, (19)

where l = 60 is the watermark length and C(l, k) is the binomial coefficient. As mentioned in Section 3.3.3, each LSFR will be iteratively detected at different radii and angles. The maximum number of iterations is 231. Supposing we complete all iterative detections, the false-positive rate of the detection of one LSFR, P_F, is:

P_F = 1 − (1 − P_f)^231. (20)

The false-positive rate curve with different thresholds is shown in Figure 13a. In order to choose an appropriate threshold, we further analyzed the influence of different secret keys and different host images on the positive detection rate. The eight host images were all embedded with three other keys: K 2 , K 3 , K 4 . Each watermarked image was captured 24 times under different shooting conditions. The experimental setting is the same as in Section 4.3.
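The false-positive formulas can be checked numerically; with l = 60, T = 8, and 231 iterations, this reproduces the 8.86 × 10−8 per-LSFR rate quoted in this section:

```python
from math import comb

def single_detection_fp(l=60, T=8):
    # Probability that a random l-bit guess has fewer than T erroneous
    # bits, each bit matching independently with probability 0.5.
    return sum(comb(l, k) for k in range(T)) / 2 ** l

def lsfr_fp(l=60, T=8, iterations=231):
    # False-positive rate for one LSFR after all search iterations.
    p = single_detection_fp(l, T)
    return 1 - (1 - p) ** iterations
```

Note that "erroneous bits below T = 8" means at most 7 errors, hence the sum runs over k = 0..T−1.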
Based on the 768 detection results of watermarked images with four different keys, eight different host images, and 24 different shooting conditions, we can calculate the true-positive rate with different thresholds. The true-positive rate curves between different keys and different host images are shown in Figure 13b,c. The true-positive rate can be seen to be stable for different embedding messages. However, different images have different variations during screen-cam, so the true-positive rate is also different when T is below 10.
According to the results, we set the threshold T to 8, which means that the detection is successful when the number of erroneous bits is below 8. According to Equation (20), the false-positive rate of the detection of one LSFR is 8.86 × 10−8.


Experimental Results and Analysis
We conducted a series of experiments to verify the robustness of the algorithm. Robustness refers to the ability to detect the watermark after the designated class of transformations [53]. The Bit Error Ratio (BER) is a commonly used metric to measure the robustness of watermarking methods. BER is defined as:

BER = n_e / l,

where n_e is the number of erroneous bits and l is the watermark length. A lower BER indicates that the extracted result is closer to the original watermark information, which means better robustness. Since the threshold T for watermark detection is set to 8 and the watermark length l is set to 60 in our method, watermark detection is successful when the BER is below 0.1333. In Section 5.1, the robustness to common image attacks is discussed. In Section 5.2, the proposed scheme is compared with two state-of-the-art schemes and the performance against screen-cam attack is analyzed in detail. In Section 5.3, considering real-life scenarios, some hypothetical scenarios were designed to verify the robustness of the algorithm.
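The BER criterion ties directly back to the detection threshold: with T = 8 and l = 60, the success boundary 8/60 ≈ 0.1333 is just T expressed as a ratio. A tiny sketch:

```python
def ber(n_e, l=60):
    # Bit Error Ratio: fraction of erroneous bits in the extracted message.
    return n_e / l

# Detection succeeds when the erroneous-bit count is below T = 8,
# i.e. when BER < 8/60 ~= 0.1333.
```

So 7 erroneous bits (BER ≈ 0.1167) is a successful detection, while 8 (BER ≈ 0.1333) is not.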
The experimental instruments are as follows: The display device in this scenario is a 23-inch monitor with 1920 × 1080 pixels. Since the ordinary users' monitors are not accurately corrected, to mimic a real-world scenario, the monitors are not explicitly calibrated. An iPhone X with dual 12 MP pixels is used as the photography equipment. The lens is well focused while shooting, and shooting quality is controlled as much as possible.
The host images are the eight images in Figure 2. The PSNR values of each square region that contains an LSFR are controlled to be no less than 39 dB in our experiment. Figure 14 shows the corresponding watermarked images generated by the proposed method.

Robustness Against Common Image Attacks
To prove that the algorithm also has excellent robustness against common image attacks without screen-cam attack, we performed corresponding experiments. The results are shown in Table 2, and the PSNR and mean structural similarity index (MSSIM) [54] values are also listed.
The robustness primarily depends on whether the feature points and the watermark information can be simultaneously detected. As shown in Table 2, the algorithm is robust to JPEG attacks and can mostly survive JPEG compression at a quality of 20%. Because scale attacks cause the frame to shrink, we restore the scaled images before detection. The algorithm works under a scaling 0.5 attack and basically works under a scaling 0.4 attack. For cropping-off attacks, which refer to a continuous crop from the right in this section, assuming more than one relatively complete embedded LSFR exists, the detection can be successful in theory. Because the watermark is repeatedly embedded in each LSFR, we can detect the watermark information under a cropping-off 50% attack in the experiments. A rotation attack may cause the loss of some feature points, but since only one successful detection is needed, the algorithm is still effective. The algorithm also works under a 3 × 3 median filter attack. Thus, our watermarking scheme has excellent robustness to common image attacks.


Robustness against Screen-Cam
In this section, we verify the robustness against a screen-cam attack. First, we compare the proposed method with two existing algorithms [21,33]. Since the size of the host images used in their articles differs from ours, we use the same host images here. In order to make the experimental results less dependent on the host images, we use twelve additional images from the database [52] to verify the performance. The PSNR values of the images generated by the proposed method are controlled to be no lower than those of the other methods, which are at around 42 dB. An example of Lena embedded with the different methods is shown in Table 3. All the watermarked images are displayed on the screen at the original resolution. The comparison of BER under different shooting conditions is shown in Figure 15. The results show that the method in [21], designed for the print-cam process, is not applicable to the screen-cam process, while the proposed method and the method in [33] both have good robustness against screen-cam attack.
In theory, without considering external interference, the distortion caused by shooting from the horizontal left and from the horizontal right is similar. Shooting at different vertical angles is also similar to shooting at different horizontal angles with a 90-degree rotation of the image. Therefore, as shown in Figure 16, the shooting angle is set from perpendicular to the screen up to 60 degrees of horizontal left at intervals of fifteen degrees. The shooting distance is set from 40 to 110 cm at intervals of 10 cm. When the shooting angle is 45 or 60 degrees, a shooting distance of 40 cm is too small to capture the entire image; therefore, the distance is selected to be over 50 cm.
The example of Lena images recovered from captured images with different angles and distances and their detected BER by the secret key 1 K are shown in Table 4. The detection results of eight images are shown in Figure 16, where the red mark indicates the camera position relative to the screen and the dotted straight line indicates the shooting direction.  The example of Lena images recovered from captured images with different angles and distances and their detected BER by the secret key K 1 are shown in Table 4. The detection results of eight images are shown in Figure 16, where the red mark indicates the camera position relative to the screen and the dotted straight line indicates the shooting direction.  In theory, without considering external interference, the distortion caused by shooting from the horizontal left and horizontal right is similar. Shooting at different vertical angles is also similar to shooting at different horizontal angles with a 90-degree rotation of the image. Therefore, as shown in Figure 16, the shooting angle is set from being perpendicular to the screen up to 60 degrees of horizontal left at intervals of fifteen degrees. The shooting distance is set from 40 to 110 cm at intervals of 10 cm. When the shooting angle is 45 or 60, the shooting distance of 40 cm is too small to capture the entire image. Therefore, the distance is selected to be over 50 cm.
The example of Lena images recovered from captured images with different angles and distances and their detected BER by the secret key 1 K are shown in Table 4. The detection results of eight images are shown in Figure 16, where the red mark indicates the camera position relative to the screen and the dotted straight line indicates the shooting direction.  In theory, without considering external interference, the distortion caused by shooting from the horizontal left and horizontal right is similar. Shooting at different vertical angles is also similar to shooting at different horizontal angles with a 90-degree rotation of the image. Therefore, as shown in Figure 16, the shooting angle is set from being perpendicular to the screen up to 60 degrees of horizontal left at intervals of fifteen degrees. The shooting distance is set from 40 to 110 cm at intervals of 10 cm. When the shooting angle is 45 or 60, the shooting distance of 40 cm is too small to capture the entire image. Therefore, the distance is selected to be over 50 cm.
The example of Lena images recovered from captured images with different angles and distances and their detected BER by the secret key 1 K are shown in Table 4. The detection results of eight images are shown in Figure 16, where the red mark indicates the camera position relative to the screen and the dotted straight line indicates the shooting direction.  In theory, without considering external interference, the distortion caused by shooting from the horizontal left and horizontal right is similar. Shooting at different vertical angles is also similar to shooting at different horizontal angles with a 90-degree rotation of the image. Therefore, as shown in Figure 16, the shooting angle is set from being perpendicular to the screen up to 60 degrees of horizontal left at intervals of fifteen degrees. The shooting distance is set from 40 to 110 cm at intervals of 10 cm. When the shooting angle is 45 or 60, the shooting distance of 40 cm is too small to capture the entire image. Therefore, the distance is selected to be over 50 cm.
The example of Lena images recovered from captured images with different angles and distances and their detected BER by the secret key 1 K are shown in Table 4. The detection results of eight images are shown in Figure 16, where the red mark indicates the camera position relative to the screen and the dotted straight line indicates the shooting direction.  In theory, without considering external interference, the distortion caused by shooting from the horizontal left and horizontal right is similar. Shooting at different vertical angles is also similar to shooting at different horizontal angles with a 90-degree rotation of the image. Therefore, as shown in Figure 16, the shooting angle is set from being perpendicular to the screen up to 60 degrees of horizontal left at intervals of fifteen degrees. The shooting distance is set from 40 to 110 cm at intervals of 10 cm. When the shooting angle is 45 or 60, the shooting distance of 40 cm is too small to capture the entire image. Therefore, the distance is selected to be over 50 cm.
The example of Lena images recovered from captured images with different angles and distances and their detected BER by the secret key 1 K are shown in Table 4. The detection results of eight images are shown in Figure 16, where the red mark indicates the camera position relative to the screen and the dotted straight line indicates the shooting direction.  In theory, without considering external interference, the distortion caused by shooting from the horizontal left and horizontal right is similar. Shooting at different vertical angles is also similar to shooting at different horizontal angles with a 90-degree rotation of the image. Therefore, as shown in Figure 16, the shooting angle is set from being perpendicular to the screen up to 60 degrees of horizontal left at intervals of fifteen degrees. The shooting distance is set from 40 to 110 cm at intervals of 10 cm. When the shooting angle is 45 or 60, the shooting distance of 40 cm is too small to capture the entire image. Therefore, the distance is selected to be over 50 cm.
The example of Lena images recovered from captured images with different angles and distances and their detected BER by the secret key 1 K are shown in Table 4. The detection results of eight images are shown in Figure 16, where the red mark indicates the camera position relative to the screen and the dotted straight line indicates the shooting direction.  In theory, without considering external interference, the distortion caused by shooting from the horizontal left and horizontal right is similar. Shooting at different vertical angles is also similar to shooting at different horizontal angles with a 90-degree rotation of the image. Therefore, as shown in Figure 16, the shooting angle is set from being perpendicular to the screen up to 60 degrees of horizontal left at intervals of fifteen degrees. The shooting distance is set from 40 to 110 cm at intervals of 10 cm. When the shooting angle is 45 or 60, the shooting distance of 40 cm is too small to capture the entire image. Therefore, the distance is selected to be over 50 cm.
The example of Lena images recovered from captured images with different angles and distances and their detected BER by the secret key 1 K are shown in Table 4. The detection results of eight images are shown in Figure 16, where the red mark indicates the camera position relative to the screen and the dotted straight line indicates the shooting direction.  In theory, without considering external interference, the distortion caused by shooting from the horizontal left and horizontal right is similar. Shooting at different vertical angles is also similar to shooting at different horizontal angles with a 90-degree rotation of the image. Therefore, as shown in Figure 16, the shooting angle is set from being perpendicular to the screen up to 60 degrees of horizontal left at intervals of fifteen degrees. The shooting distance is set from 40 to 110 cm at intervals of 10 cm. When the shooting angle is 45 or 60, the shooting distance of 40 cm is too small to capture the entire image. Therefore, the distance is selected to be over 50 cm.
The example of Lena images recovered from captured images with different angles and distances and their detected BER by the secret key 1 K are shown in Table 4. The detection results of eight images are shown in Figure 16, where the red mark indicates the camera position relative to the screen and the dotted straight line indicates the shooting direction.  In theory, without considering external interference, the distortion caused by shooting from the horizontal left and horizontal right is similar. Shooting at different vertical angles is also similar to shooting at different horizontal angles with a 90-degree rotation of the image. Therefore, as shown in Figure 16, the shooting angle is set from being perpendicular to the screen up to 60 degrees of horizontal left at intervals of fifteen degrees. The shooting distance is set from 40 to 110 cm at intervals of 10 cm. When the shooting angle is 45 or 60, the shooting distance of 40 cm is too small to capture the entire image. Therefore, the distance is selected to be over 50 cm.
As shown in Figure 16, when the horizontal shooting angle is lower than 30 degrees, watermarks are detected successfully in most cases. When the horizontal shooting angle is 45 degrees, the watermark can be detected within a shooting distance of 90 or 100 cm. At a large shooting angle of 60 degrees, the image cannot be well focused, so the watermark can generally be detected only at closer shooting distances, approximately 70 or 80 cm.
We also tested the performance at other tilted shooting angles with handheld shooting, as shown in Table 5; the performance remains excellent. Therefore, the proposed algorithm is robust to screen-cam attacks.

Robustness Against Screen-Cam with Additional Common Attacks
The scheme in [33] needs to record the four vertices of the image, which means it must know the original size. Furthermore, the scheme in [21] cannot deal with cropping attacks. However, in real-life scenarios, images may undergo common image processing attacks caused by normal user operations. Therefore, we experimented with several hypothetical scenarios to verify the effectiveness of the proposed algorithm against screen-cam combined with additional common attacks. We designed four realistic application scenarios in which the methods of [21,33] are not applicable: (a) the Lena image is 20 percent blocked by a window, which is equivalent to cropping; (b) the Peppers image is rotated five degrees and cropped; (c) the Building image is scaled to 80%; (d) the Pentagon image is scaled to 80% and rotated 90 degrees counterclockwise. An example of each of the four scenarios is shown in Table 6. During watermark detection, we assume that the specifics of the attacks are unknown; that is, we do not manually correct the image to its original scale or orientation. The coordinate points used for perspective correction are denoted in Table 6 as red dots.
Figure 17 shows the detection results for the four scenarios; its construction is the same as that of Figure 16. Because the experimental images differ in size, the shooting distances were adjusted accordingly. Since Scenarios (a) and (b) use the four corner points of the screen for perspective correction, the experimental shooting distance starts from 50 cm. In these two scenarios, the watermark detection performance is the same as the detection results for the same host images in Section 5.2. In Scenarios (c) and (d), because the images are scaled down, the starting shooting distance can be shortened, but the effective detection distance is also shortened. When the shooting angle is 15 or 30 degrees, the watermark can be detected at all shooting distances in the experiments. As the shooting angle increases, the detectable shooting distance is substantially reduced: when the horizontal shooting angle is 60 degrees, the watermark can be detected only within a shooting distance of 50 cm. Thus, scaling has a considerable influence on watermark detection from images captured at large angles, but the watermark can still be detected at close range.
Because Scenario (a) and Scenario (b) use the four corner points of the screen for perspective correction, the experiment shooting distance starts from 50 cm. In these two scenarios, the performance of watermark detection is the same as the detection results of the same host images in Section 5.2. In Scenario (c) and Scenario (d), because the images are scaled, the test starting shooting distance can be shortened, and the effective detection distance is also shortened. When the shooting angle is 15 and 30 degrees, the watermark information can be detected at all shooting distances in the experiments. As the shooting angle increases, the detectable shooting distance is substantially reduced. Watermark information can be detected within a shooting distance of 50 cm when the horizontal shooting angle is 60 degrees. Thus, the scaling of the images has a considerable influence on the watermark detection of the large angle captured image, but it can still size. Furthermore, the scheme in [21] cannot deal with the cropping attack. However, in a real-life scenario, images may under common image processing attacks caused by normal user operations. Therefore, we experimented with several hypothetical scenarios to verify the effectiveness of the proposed algorithm for screen-cam with additional common attacks. We designed four realistic application scenarios where method [21] and [33] are not applicable: a) the Lena image is blocked by the window at 20 percent, which is equal to being cropped; b) the Peppers image is rotated five degrees and cropped; c) the Building image is scaled by 80%; d) the Pentagon image is scaled by 80% and rotated 90 degrees counterclockwise. An example of the four scenarios is shown in Table 6. When doing the watermark detection, assume that we do not know the specifics of the attacks, which means we do not correct the image to its original scale or original orientation manually. 
The coordinate points that are used for perspective correction are denoted in Table 6 as red dots.  Figure 17 shows the detection results of the four scenarios. The construction of Figure 17 is the same as Figure 16. Furthermore, due to the different sizes of the experimental images, the shooting distance was adjusted accordingly. Because Scenario (a) and Scenario (b) use the four corner points of the screen for perspective correction, the experiment shooting distance starts from 50 cm. In these two scenarios, the performance of watermark detection is the same as the detection results of the same host images in Section 5.2. In Scenario (c) and Scenario (d), because the images are scaled, the test starting shooting distance can be shortened, and the effective detection distance is also shortened. When the shooting angle is 15 and 30 degrees, the watermark information can be detected at all shooting distances in the experiments. As the shooting angle increases, the detectable shooting distance is substantially reduced. Watermark information can be detected within a shooting distance of 50 cm when the horizontal shooting angle is 60 degrees. Thus, the scaling of the images has a considerable influence on the watermark detection of the large angle captured image, but it can still size. Furthermore, the scheme in [21] cannot deal with the cropping attack. However, in a real-life scenario, images may under common image processing attacks caused by normal user operations. Therefore, we experimented with several hypothetical scenarios to verify the effectiveness of the proposed algorithm for screen-cam with additional common attacks. 
We designed four realistic application scenarios where method [21] and [33] are not applicable: a) the Lena image is blocked by the window at 20 percent, which is equal to being cropped; b) the Peppers image is rotated five degrees and cropped; c) the Building image is scaled by 80%; d) the Pentagon image is scaled by 80% and rotated 90 degrees counterclockwise. An example of the four scenarios is shown in Table 6. When doing the watermark detection, assume that we do not know the specifics of the attacks, which means we do not correct the image to its original scale or original orientation manually. The coordinate points that are used for perspective correction are denoted in Table 6 as red dots.  Figure 17 shows the detection results of the four scenarios. The construction of Figure 17 is the same as Figure 16. Furthermore, due to the different sizes of the experimental images, the shooting distance was adjusted accordingly. Because Scenario (a) and Scenario (b) use the four corner points of the screen for perspective correction, the experiment shooting distance starts from 50 cm. In these two scenarios, the performance of watermark detection is the same as the detection results of the same host images in Section 5.2. In Scenario (c) and Scenario (d), because the images are scaled, the test starting shooting distance can be shortened, and the effective detection distance is also shortened. When the shooting angle is 15 and 30 degrees, the watermark information can be detected at all shooting distances in the experiments. As the shooting angle increases, the detectable shooting distance is substantially reduced. Watermark information can be detected within a shooting distance of 50 cm when the horizontal shooting angle is 60 degrees. Thus, the scaling of the images has a considerable influence on the watermark detection of the large angle captured image, but it can still size. Furthermore, the scheme in [21] cannot deal with the cropping attack. 
However, in a real-life scenario, images may under common image processing attacks caused by normal user operations. Therefore, we experimented with several hypothetical scenarios to verify the effectiveness of the proposed algorithm for screen-cam with additional common attacks. We designed four realistic application scenarios where method [21] and [33] are not applicable: a) the Lena image is blocked by the window at 20 percent, which is equal to being cropped; b) the Peppers image is rotated five degrees and cropped; c) the Building image is scaled by 80%; d) the Pentagon image is scaled by 80% and rotated 90 degrees counterclockwise. An example of the four scenarios is shown in Table 6. When doing the watermark detection, assume that we do not know the specifics of the attacks, which means we do not correct the image to its original scale or original orientation manually. The coordinate points that are used for perspective correction are denoted in Table 6 as red dots.  Figure 17 shows the detection results of the four scenarios. The construction of Figure 17 is the same as Figure 16. Furthermore, due to the different sizes of the experimental images, the shooting distance was adjusted accordingly. Because Scenario (a) and Scenario (b) use the four corner points of the screen for perspective correction, the experiment shooting distance starts from 50 cm. In these two scenarios, the performance of watermark detection is the same as the detection results of the same host images in Section 5.2. In Scenario (c) and Scenario (d), because the images are scaled, the test starting shooting distance can be shortened, and the effective detection distance is also shortened. When the shooting angle is 15 and 30 degrees, the watermark information can be detected at all shooting distances in the experiments. As the shooting angle increases, the detectable shooting distance is substantially reduced. 
Watermark information can be detected within a shooting distance of 50 cm when the horizontal shooting angle is 60 degrees. Thus, the scaling of the images has a considerable influence on the watermark detection of the large angle captured image, but it can still Figure 17 shows the detection results of the four scenarios. The construction of Figure 17 is the same as Figure 16. Furthermore, due to the different sizes of the experimental images, the shooting distance was adjusted accordingly. Because Scenario (a) and Scenario (b) use the four corner points of the screen for perspective correction, the experiment shooting distance starts from 50 cm. In these two scenarios, the performance of watermark detection is the same as the detection results of the same host images in Section 5.2. In Scenario (c) and Scenario (d), because the images are scaled, the test starting shooting distance can be shortened, and the effective detection distance is also shortened. When the shooting angle is 15 and 30 degrees, the watermark information can be detected at all shooting distances in the experiments. As the shooting angle increases, the detectable shooting distance is substantially reduced. Watermark information can be detected within a shooting distance of 50 cm when the horizontal shooting angle is 60 degrees. Thus, the scaling of the images has a considerable influence on the watermark detection of the large angle captured image, but it can still meet the actual needs. These results verified the fact that the proposed scheme can handle screen-cam with common attacks.
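The perspective correction step above maps the four marked corner points (the red dots in Table 6) back to an axis-aligned image plane. As a minimal, self-contained sketch of this step (not the authors' implementation; the corner coordinates and target resolution below are hypothetical), the homography can be estimated from the four correspondences with the standard direct linear transform:

```python
import numpy as np

def homography_from_points(src, dst):
    """Estimate the 3x3 homography mapping src -> dst from four
    point correspondences via the direct linear transform (DLT)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    A = np.asarray(A, dtype=float)
    # The solution is the null vector of A: last row of V^T from the SVD.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_homography(H, pt):
    """Map a point through H using homogeneous coordinates."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return (x / w, y / w)

# Hypothetical example: rectify a tilted quadrilateral (the captured
# screen corners) back to an axis-aligned 1920x1080 image plane.
captured = [(100, 80), (1700, 150), (1750, 1000), (60, 950)]
target = [(0, 0), (1920, 0), (1920, 1080), (0, 1080)]
H = homography_from_points(captured, target)
print(apply_homography(H, captured[0]))  # maps to approximately (0, 0)
```

With exactly four correspondences the eight constraints determine the eight degrees of freedom of the homography, so the marked corners map exactly to the target corners; interior pixels are then resampled through the same transform.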

Applicability and Limitations Analysis
The proposed scheme works well for most types of images, but it inevitably has limitations. Feature-point-based algorithms are limited by the feature point operator itself. For images with simple texture, the feature points are often unstable under severe image quality degradation. Therefore, for such images, the proposed method may not achieve accurate watermark synchronization, which will probably cause watermark detection to fail.
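A toy illustration of this limitation (not the paper's modified Harris-Laplace detector): the Harris corner response of a low-texture patch is far weaker than that of a textured one, so degradation easily destabilizes its detections. The single-window approximation below is an illustrative simplification:

```python
import numpy as np

def harris_response(img, k=0.05):
    """Harris corner response of a patch, using a single window
    covering the whole patch (a simplification for illustration)."""
    # Image gradients via finite differences (rows, then columns).
    Iy, Ix = np.gradient(img.astype(float))
    # Structure-tensor entries summed over the patch.
    Sxx, Syy, Sxy = (Ix * Ix).sum(), (Iy * Iy).sum(), (Ix * Iy).sum()
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    return det - k * trace ** 2

rng = np.random.default_rng(0)
flat = np.full((32, 32), 128.0)                        # simple texture
textured = rng.integers(0, 256, (32, 32)).astype(float)  # rich texture
print(harris_response(flat) < harris_response(textured))  # True
```

The flat patch has zero gradients and hence zero response; any response it does produce under noise is dominated by the degradation itself, which is why simple-texture images yield unstable synchronization.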
Another limitation is that the proposed scheme is not applicable when the image displayed on the screen is greatly zoomed out before it is captured with a camera. In this case, the displayed image is resampled, which causes a massive loss of image detail, and the screen-cam process amplifies this distortion. For high-resolution images in particular, users are likely to zoom out to view the entire image. Therefore, the proposed scheme could be used together with access control systems or other specific applications to avoid this situation.
Furthermore, because the motivation of this method is to hold leakage behavior accountable, the time complexity of the algorithm is not a primary consideration; nevertheless, it is one of our limitations. The computation time of watermark embedding comprises two parts: LSFR construction and message embedding. On a personal computer with an Intel Core i7-9700 CPU and 32 GB of RAM, the average computation times of LSFR construction and message embedding for the host images are 7.041 s and 0.106 s, respectively. The Harris-Laplace operation involves multiscale and iterative calculations, which account for most of the computation time. The time complexity of the embedding algorithm is O(Length · Width), where Length and Width are the dimensions of the image; hence, for high-resolution images, the computation time grows with image size. With respect to watermark detection, finding candidate LSFRs is similar to constructing LSFRs. Although the message extraction process iterates the extraction algorithm within our defined detection range, its computation time is still insignificant compared with finding candidate LSFRs. Hence, after the manual perspective correction process, the time complexity of watermark detection is similar to that of embedding. Therefore, considering the user experience, the algorithm is not recommended for real-time applications for now.

Conclusions
In this paper, a novel feature- and Fourier-based screen-cam robust watermarking scheme is proposed. The distortions introduced by the screen-cam process are analyzed. To resist possible desynchronization attacks caused by user operations and the screen-cam process, an LSFR construction method, based on the modified Harris-Laplace detector and the SURF orientation descriptor, is designed to achieve watermark synchronization. In the proposed message embedding scheme, we repeatedly embed the message sequence in the DFT domain of each selected LSFR to achieve robustness against the screen-cam process. To reduce quality degradation after embedding and improve extraction accuracy, we employ a non-rotating embedding method and a preprocessing method to modulate the DFT magnitude coefficients. On the extraction side, we restore the captured image based on the size of the image itself to improve detection accuracy. The experiments show that the proposed scheme is highly robust to both common image attacks and screen-cam attacks. Compared with existing methods, the proposed scheme further achieves robustness against screen-cam capture combined with additional common attacks.
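As a toy, self-contained sketch of the general idea of DFT-magnitude embedding (illustrative only; the coefficient positions, margin, and one-bit payload below are hypothetical and are not the paper's modulation scheme or parameters), one bit can be encoded in the relative magnitude of two mid-frequency coefficients:

```python
import numpy as np

def _set_mag(F, pos, mag):
    """Set |F[pos]| while preserving phase; mirror the change to the
    conjugate-symmetric coefficient so the inverse transform stays real."""
    h, w = F.shape
    F[pos] = mag * np.exp(1j * np.angle(F[pos]))
    F[-pos[0] % h, -pos[1] % w] = np.conj(F[pos])

def embed_bit(block, bit, p1=(4, 7), p2=(7, 4), margin=2.0):
    """Encode one bit in the magnitude ordering of two DFT coefficients."""
    F = np.fft.fft2(block.astype(float))
    m = max(abs(F[p1]), abs(F[p2]))
    hi, lo = (p1, p2) if bit else (p2, p1)
    _set_mag(F, hi, margin * m)
    _set_mag(F, lo, m / margin)
    return np.real(np.fft.ifft2(F))

def extract_bit(block, p1=(4, 7), p2=(7, 4)):
    F = np.fft.fft2(block.astype(float))
    return int(abs(F[p1]) > abs(F[p2]))

rng = np.random.default_rng(1)
block = rng.integers(0, 256, (64, 64)).astype(float)
print(extract_bit(embed_bit(block, 1)), extract_bit(embed_bit(block, 0)))  # 1 0
```

Encoding a bit as an ordering of magnitudes, rather than as an absolute value, is what gives DFT-domain schemes their tolerance to the global luminance and contrast changes typical of screen-cam capture.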
In future research, we aim to investigate automatic detection methods, which would be more practical in real applications. To achieve this goal, screen-cam robust invariants should be further investigated, either to help design novel local feature-based watermark synchronization methods or to develop novel synchronization watermark embedding and automatic detection methods.