A Study on Reversible Data Hiding Technique Based on Three-Dimensional Prediction-Error Histogram Modification and a Multilayer Perceptron

Abstract: In the past few years, with the development of information technology and the growing focus on information security, many studies have turned to data hiding technology. Such technology uses embedding and extraction algorithms to hide data that requires secret transmission in a multimedia carrier, so that the transmission goes unnoticed and secure communication is achieved. Among these techniques, reversible data hiding (RDH) serves applications that demand both extraction of the secret data and distortion-free recovery of the original carrier, such as remote medical diagnosis or military secret transmission. In this work, we hypothesize that RDH performance can be enhanced by a more accurate pixel value predictor. We propose a new RDH scheme of prediction-error expansion (PEE) based on a multilayer perceptron, an artificial neural network used extensively in many applications. The scheme exploits the correlation between image pixel values and their adjacent pixels to obtain a well-trained multilayer perceptron, which yields more accurate pixel predictions. Our data mapping method, based on three-dimensional prediction-error histogram modification, uses all eight octants of the three-dimensional space for secret data embedding. The experimental results of our RDH scheme show that the embedding capacity greatly increases while the image quality is still well maintained.


Background
With the rapid development of information technology, the internet has become ubiquitous around the world. Thanks to the development of optical communication systems (see [1] for more discussion), people can easily communicate with each other and share multimedia messages, including text, sound, images, and videos. The internet clearly has far more impact on human society than any other medium, while at the same time, issues regarding information security have received considerable and critical attention.
Data hiding is a technique for secure communication in which the secret data is imperceptibly embedded without drawing attention [2]. Multimedia is used as a cover carrier to hide the secret data that will be transmitted over the internet. Reversible data hiding (RDH) not only guarantees the safe transmission of data content but also recovers the hidden data as well as the cover images [3,4]. However, most of these RDH algorithms bring permanent distortions to the original carrier during the embedding

Prediction-Error Expansion
In this subsection, we introduce the RDH paradigm we mainly follow: the prediction-error expansion (PEE) approach, which was first proposed by Thodi and Rodriguez [8]. PEE is a kind of histogram-shifting technique in which histograms of the feature elements (e.g., pixel values, or errors between cover pixel values and their predicted values) are shifted to prepare vacant positions for embedding the secret bits. Since the most frequent feature elements determine the EC, and moreover, the peaks of prediction-error histograms are usually centered at zero, PEE has an advantage over the other histogram-shifting techniques in the spatial domain, especially for cover images with a flat pixel value histogram [9].
PEE can exploit spatial redundancy in the image by taking the correlation within the local neighborhood of each pixel into consideration. Following a certain scanning order over the original image, a predictor is used to predict each pixel. Denote by $\hat{x}$ the predicted value of a pixel $x$. The prediction-error of $x$ is defined as $e_x = x - \hat{x}$. One can expand the prediction-error $e_x$ to $e^*_x = f(e_x, m)$ for some shifting operation $f$ and a to-be-embedded bit $m \in \{0, 1\}$. When the context is clear, we omit the parameter $m$ in $f$ to keep the formulas concise. In the stego-image this pixel becomes $x' = \hat{x} + e^*_x$. As illustrated in [5], at a pixel $x$ with $e_x \in \{0, -1\}$ a secret bit is embedded, while at a pixel $x$ with $e_x \notin \{0, -1\}$ the pixel value is shifted by 1 or −1. With the prediction-errors at hand, the prediction-error histogram (PEH) can be created as $h(a) = |\{i : e_i = a\}|$ for each prediction-error $a$. Specifically, PEE can be implemented as a modification of the PEH, that is, expanding the bins −1 and 0 and shifting the other bins to create space while ensuring reversibility. Such a paradigm has been extended to the 2D-PEH (e.g., [10]), where the PEH is defined by $h_2(a, b) = |\{i : (e_{2i-1}, e_{2i}) = (a, b)\}|$, and also to the 3D-PEH (e.g., [5]), where the PEH is defined as $h_3(a, b, c) = |\{i : (e_{3i-2}, e_{3i-1}, e_{3i}) = (a, b, c)\}|$.
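As a minimal sketch of the expand-and-shift rule and the histograms described above, the following Python code assumes the common one-dimensional form of $f$: bin 0 expands to {0, 1}, bin −1 expands to {−1, −2}, positive errors shift by +1, and errors below −1 shift by −1. The function names `pee_embed`, `pee_extract`, and `peh` are hypothetical and are not taken from [5] or [8].

```python
from collections import Counter


def pee_embed(errors, bits):
    """Expand bins 0 and -1 with one bit each; shift every other bin outward by 1.

    Assumption: if the bits run out, remaining embeddable errors carry a 0.
    """
    bit_iter = iter(bits)
    marked = []
    for e in errors:
        if e == 0:
            marked.append(e + next(bit_iter, 0))   # 0 -> 0 or 1
        elif e == -1:
            marked.append(e - next(bit_iter, 0))   # -1 -> -1 or -2
        elif e > 0:
            marked.append(e + 1)                   # shift right
        else:
            marked.append(e - 1)                   # shift left
    return marked


def pee_extract(marked):
    """Invert pee_embed: return the original errors and the embedded bits."""
    errors, bits = [], []
    for e in marked:
        if e in (0, 1):
            errors.append(0)
            bits.append(e)
        elif e in (-1, -2):
            errors.append(-1)
            bits.append(-1 - e)
        elif e > 1:
            errors.append(e - 1)
        else:
            errors.append(e + 1)
    return errors, bits


def peh(errors, dim=1):
    """Count h(a) for dim=1, or the histogram over non-overlapping dim-tuples."""
    if dim == 1:
        return Counter(errors)
    usable = len(errors) - len(errors) % dim
    return Counter(tuple(errors[i:i + dim]) for i in range(0, usable, dim))


errors = [0, -1, 2, -3, 0, 1]
marked = pee_embed(errors, [1, 0, 1])
restored, extracted = pee_extract(marked)
```

Here `peh(errors, 3)` groups the sequence into triples $(e_{3i-2}, e_{3i-1}, e_{3i})$, matching the definition of $h_3$; the round trip through `pee_embed` and `pee_extract` illustrates why expanding two bins and shifting the rest remains reversible.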

Our Contribution
As the illustrating example of PEE shows, EC depends on the prediction accuracy of the pixels. When PEE is applied, the data bits are embedded only when the prediction-error is −1 or 0. Hence, we have the following hypothesis.

Hypothesis 1.
As the prediction accuracy is improved, the performance of the PEE techniques for RDH is enhanced.
In this paper, we devote our efforts to validating this hypothesis. Specifically, we aim at improving the prediction accuracy in PEE using an artificial neural network (ANN), a class of models that has developed rapidly and been studied extensively in the past decade. We propose a novel method based on a multilayer perceptron (MLP), which is a well-known ANN consisting of multiple sequential fully connected layers and providing a nonlinear mapping between input data and output data through nonlinear activation functions. Moreover, we consider all eight octants of the three-dimensional space for embedding, which makes better use of the space (cf. [5], which considers only the first octant for the embedding). We conduct experiments by applying our proposed method to six test images: Lena, Baboon, Boat, Peppers, Airplane (F-16), and House. The experimental results support our hypothesis well. The EC greatly increases and is 1.9 to 9.8 times that of previous methods. On the other hand, the image quality, measured by PSNR, is still well maintained and remains competitive with previous work.

Remark 1.
Our MLP consists of layers of nodes. The nodes between consecutive layers are fully connected by weighted edges. Each node receives input from nodes on the previous layer and sends output by passing the aggregated input to a nonlinear activation function. It has been shown that the well-trained MLP can be used to approximate any smooth and measurable function [28]. The MLP has been proven to be an effective alternative to more traditional statistical techniques [29]. Recently, the MLP has been widely used in many different fields of research (e.g., see [30][31][32][33][34] for more details). Our proposed method applies MLP to the pixel prediction phase of prediction-error histogram modification. We train the MLP network and use it to derive more accurate pixel prediction. Unlike other statistical techniques, the MLP makes no prior assumptions on the data distribution and can be accurately applied even when new or unseen data appear. These features of the MLP make it an attractive alternative when developing numerical models and choosing between statistical methods.

Related Work and Comparisons between the Methods
Shi et al. [6] reviewed the advances in RDH over the past two decades, including various RDH schemes in the image spatial domain, RDH for compressed images, robust RDH (which aims at recovering the hidden message from a lossily compressed image), RDH for encrypted images, and RDH for video and audio. RDH in the image spatial domain is the most investigated subject and is strongly related to this paper. We summarize the progress on this subject below.

1. Lossless compression-based methods. Most early RDH schemes were implemented based on lossless compression [35][36][37][38][39][40][41][42]. Partial space is released by losslessly compressing a feature set of the original image, and the data is embedded in the released space to achieve RDH. The performance of this type of method depends on the lossless compression algorithm used and on the selection of the compressed feature set. Experimental results suggest that algorithms based on lossless compression result in greater distortion and poorer embedding performance than later RDH methods.
2. Integer-transform-based methods. Methods of this type can be seen in [36,39,41]. The original image is initially divided so that multiple adjacent pixels form an embedding unit. Subsequently, the secret information is embedded into each unit using an integer transform. However, this type of method usually uses the average value of a pixel block to predict each pixel in the block, so the image redundancy cannot be well utilized. Moreover, the algorithm cannot control the maximum modification range of each pixel, so the embedding distortion cannot be controlled effectively. Due to these two defects, the embedding performance of integer-transform-based methods is limited. Their performance is significantly better than that of the lossless compression-based methods; however, they still cannot achieve good embedding performance.

3. Two-phase embedding with location maps. Some RDH schemes proceed in two phases (e.g., [43][44][45]) using location maps, which map each pixel to a certain value and also ensure the reversibility of the cover image. In [44], Malik et al. considered even-valued and odd-valued pixels separately and embedded a secret data bit in each pixel of the cover image by changing its value by at most 1. Their work improves on the earlier complementary embedding strategy of Chang and Kieu [43], which uses vertical embedding and horizontal embedding separately in two phases. Kumar et al. also considered even-valued and odd-valued pixels with location maps; in their scheme, the cover image is divided into non-overlapping 2-by-2 blocks of pixels, and the secret bits are converted into 2-bit segments and embedded into the blocks by increasing or decreasing the pixel values of the corresponding block by at most 1. Since the second-phase embedding acts as a complement of the first-phase embedding, this kind of approach preserves the stego-image's quality while doubling the EC.
4. Histogram modification-based methods. In this type of method, the original image is first mapped to a space of lower dimension by using the redundancy of the image. A histogram is then generated by counting the distribution in the low-dimensional space. Finally, reversible embedding is realized by modifying the histogram. The earliest influential method was proposed by Ni et al. in 2006 [46]. In this method, the secret data is embedded into the pixels with the highest frequency in the image histogram by expanding the histogram. The stego-image produced by this method maintains high image quality, but the embedding rate is low. Therefore, Lee et al. [47] improved the method of [46] by using the image difference histogram, whose shape resembles a Laplace distribution. This histogram has a very high peak and drops rapidly; therefore, the method achieves a better embedding capacity while maintaining image quality.
From the above points of view, the methods based on histogram modification, especially PEE based on PEH modification, have better embedding performance than other methods. Therefore, we focus on histogram generation and three-dimensional histogram modification. Note that the current RDH methods based on histogram modification mainly cover the following aspects:
• Generation method of histogram. Combined with PEE, the methods in this research direction mainly aim to generate a sharp and rapidly dropping PEH by using better image prediction methods, e.g., the methods of [12,13,19,20,23,24].
• Modification method of histogram. Unlike the early expansion methods [8,9,16,24], which use a peak in the histogram, several authors [15,25,26,27] proposed methods that expand the histogram adaptively according to the frequency of pixels in the image histogram. These methods can significantly reduce the embedding distortion of PEE.
• Selection of embedding location. This type of method first selects the image area that is more suitable for reversible embedding (usually smooth areas), and then uses the selected area as a new carrier for RDH. The effect of these methods is remarkable: combining them with PEE can effectively reduce the embedding distortion of PEE. The idea was first proposed by Kamstra et al. [18], and many subsequent works have applied it as an auxiliary means to further optimize the embedding performance.
• High-dimensional histogram modification. Several authors [10,21] proposed methods based on high-dimensional histogram modification. They map high-dimensional redundant features of images to a two-dimensional space and then modify the two-dimensional histogram to achieve reversible embedding. In recent works [5,14,17], methods based on three-dimensional or higher-dimensional histogram modification are proposed. By mapping the redundant features of the image to a higher-dimensional space, the embedding capacity is increased and the image quality is maintained. This type of method can greatly improve the embedding performance of existing PEE algorithms.
• Multi-histogram modification. In [11,22], reversible embedding methods based on multiple histograms are proposed. Compared with using a single histogram, the use of multiple histograms offers greater flexibility and can further improve the performance of PEE algorithms.
• PEH for color images. In [51], Zhan et al. applied the 3D-PEH to color images. Their approach is to predict the pixel values of each RGB channel of a color image and establish the 3D prediction-error histogram. Their results yield low distortion for color images.
Below we summarize two recent lines of progress on histogram modification-based methods from other perspectives.

• Histogram-shifting-imitated technique based on the human visual system (HVS). Kumar et al. take the human visual system into consideration [52] and improve on the earlier histogram-shifting-imitated reversible data hiding method in [53]. Since human eyes are more sensitive to changes in lower-intensity pixels than in higher-intensity ones, this approach divides the intensity levels into four groups of equal size and embeds fewer bits in the low-intensity pixels, so that the perceived distortion of the stego-image is reduced and the visual imperceptibility is improved.
• Pixel value ordering (PVO). Li et al. [20] proposed the pixel value ordering (PVO) technique, which is an advancement of PEE. After the cover image is divided into blocks, PVO first sorts the pixel values in each block and then computes the minimum, maximum, second-minimum, and second-maximum pixels, which are used for data embedding depending on the minimum and maximum prediction errors in the block. PVO changes each pixel value by at most 1; hence, it generates high-quality stego-images. Kaur et al. [54] proposed an RDH technique using PVO and pairwise PEE to improve the EC while retaining the quality of the stego-image. The embedding strategy is performed in two phases on three-pixel blocks. Pixels are traversed in a zig-zag order and then sorted based on their rhombus means. The key to increasing the EC with PVO is that smaller prediction errors are obtained after the pixels are sorted. Kaur et al. [55] also considered RDH based on PVO for rough-textured images. For a more thorough survey of RDH approaches based on PVO, see [54].

Comparisons and Highlight of Our Approach
According to the above discussion, we list the general comparisons of RDH methods in Table 1. The histogram modification-based framework has attracted much attention, is strongly related to our work, and covers the PEE paradigm we mainly follow. In Table 2, we therefore highlight our proposed approach by comparing it with other approaches of this type, namely Ni et al. [46], Lee et al. [47], Li et al. [21], and Cai et al. [5], using average experimental results for six gray-scale images. As Table 2 shows, the embedding capacity of our method is much higher than that of the four other methods, while the image quality is slightly sacrificed due to a somewhat larger image distortion; this is tolerable since the PSNR is still close to 50 dB. Our results reveal that, owing to the much better prediction accuracy of pixel values, our method is capable of achieving a high embedding capacity while suffering only slight image distortion.

Table 1. General comparison of RDH methods. ×: poor; ∆: unable to control effectively/limited; •: good; : even better.

Table 1 column headings: RDH Method Types; Image Quality; Embedding Capacity.

Table 2. Comparisons of histogram modification-based RDH methods. The image quality is measured by the average PSNR (dB) when the maximum embedding capacity is attained. Average embedding capacities are measured in bits.

The Proposed Approach
In this section, we introduce our reversible data hiding scheme based on 3D-PEH modification with an MLP as the pixel value predictor. As the hypothesis in Section 1 states, we expect that the performance of such an RDH scheme can be greatly enhanced by an accurate MLP predictor. The correlation between an image pixel value and its neighboring pixels is exploited, so that a well-trained MLP model can be expected to improve the accuracy of pixel prediction. This in turn leads to increased embedding capacity. Overall, our proposed method includes four parts: the pre-processing phase, the training and prediction phase, the embedding and shifting phase, and the extraction and recovery phase. The flowchart of the proposed method is shown in Figure 2. We specify the four phases in the following subsections.

The Pre-Processing Phase
The pixel values of the cover image will be modified by +1 or −1 when the secret data are embedded based on the 3D-PEH. Therefore, in order to avoid overflow and underflow, the cover image is pre-processed: each pixel with value 0 is changed to 1, and each pixel with value 255 is changed to 254. Meanwhile, a location map is created to record the positions of these modified pixels. The location map is a binary sequence, which can be losslessly compressed to reduce its size. The secret data and the compressed location map are then combined (hereinafter referred to together as the secret data), which completes the pre-processing phase. After that, they are embedded together in the pre-processed cover image.
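The pre-processing step can be sketched as follows. This is only an illustration under stated assumptions: the location map is taken to hold one bit per pixel in raster order, and `zlib` stands in for the unspecified lossless compressor; the names `preprocess`, `compress_map`, and `restore` are hypothetical.

```python
import zlib


def preprocess(cover):
    """Change 0 -> 1 and 255 -> 254; flag each changed pixel in a binary map."""
    processed, location_map = [], []
    for row in cover:
        new_row = []
        for v in row:
            if v == 0:
                new_row.append(1)
                location_map.append(1)
            elif v == 255:
                new_row.append(254)
                location_map.append(1)
            else:
                new_row.append(v)
                location_map.append(0)
        processed.append(new_row)
    return processed, location_map


def compress_map(location_map):
    """Losslessly compress the map (zlib is an assumption, not the paper's choice)."""
    return zlib.compress(bytes(location_map))


def restore(processed, location_map):
    """Undo preprocess: a flagged 1 was originally 0, a flagged 254 was 255."""
    flags = iter(location_map)
    restored = []
    for row in processed:
        new_row = []
        for v in row:
            if next(flags):
                v = 0 if v == 1 else 255
            new_row.append(v)
        restored.append(new_row)
    return restored


cover = [[0, 17, 255], [1, 254, 128]]
processed, location_map = preprocess(cover)
payload_prefix = compress_map(location_map)
```

Note that an unmodified pixel that already had value 1 or 254 keeps a 0 flag, so `restore` can tell it apart from a pixel that was clamped.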

The Training and Prediction Phase
The PEE method exploits the correlations between pixels to derive accurate predictions, and the resulting prediction-errors are then modified. However, the traditional PEE method uses the same algorithm to predict pixels for all images. This results in poor prediction accuracy, and the prediction error increases when the image is relatively complex. Our proposed method instead leverages a trained MLP model to predict the pixels of the cover image and significantly reduce the prediction-error, so that the embedding capacity is expected to increase.
In the MLP training stage, except for the pixels located on the borders, the pixels are scanned from left to right and top to bottom to derive the cover sequence $(y_1, \ldots, y_n)$. Consider the four-neighbor tuple $(x_{top}, x_{bottom}, x_{left}, x_{right})$ of a given pixel $y_i$, shown in the left part of Figure 3. The four-neighbor tuple is used as the input data of the neural network, and the desired output value is $y_i$. The MLP neural network has one input layer, two hidden layers, and one output layer, as shown in Figure 3. The four-neighbor tuples $(x_{top}, x_{bottom}, x_{left}, x_{right})$ from the cover image are fed into the input layer of the MLP. Between the input and output layers, the two hidden layers have 100 and 200 neurons, respectively. After the incoming information is processed by the network, the output layer provides a single output $\tilde{y}_i$ as the value predicted by the MLP, and the corresponding $y_i$ in the cover image is used as the reference data. We use the mean squared error (MSE) as the loss function, which is calculated by averaging the squared differences between the predicted pixel values and the reference pixel values. The MSE is defined in Equation (1):
$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \tilde{y}_i)^2. \qquad (1)$$
Here, $N$ is the number of data points, $\tilde{y}_i$ is the value returned by the model, and $y_i$ is the actual value for data point $i$. Clearly, there are no prediction errors if and only if the MSE value is 0. Based on the input and reference data, the MLP network is then trained with this loss function such that the edge weights of the MLP are optimized to best associate the given neighborhoods with the reference pixel values.
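To make the data flow of the training stage concrete, the sketch below builds the four-neighbor training set, evaluates the MSE loss, and runs one forward pass through a 4-100-200-1 network. Several parts are assumptions rather than details from the paper: the ReLU activation, the random initial weights, and the names `four_neighbor_dataset`, `mse`, `init_layer`, and `forward`. The weight optimization itself is omitted and would normally be delegated to a neural network library.

```python
import random


def four_neighbor_dataset(img):
    """Scan non-border pixels row by row; inputs are (top, bottom, left, right)."""
    inputs, targets = [], []
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            inputs.append([img[r - 1][c], img[r + 1][c],
                           img[r][c - 1], img[r][c + 1]])
            targets.append(img[r][c])
    return inputs, targets


def mse(targets, predictions):
    """Mean squared error between reference values and model outputs."""
    return sum((y - p) ** 2 for y, p in zip(targets, predictions)) / len(targets)


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, layers):
    """Forward pass; ReLU on hidden layers is an assumption, output is linear."""
    for k, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if k < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x[0]


rng = random.Random(0)
layers = [init_layer(4, 100, rng), init_layer(100, 200, rng), init_layer(200, 1, rng)]

img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
inputs, targets = four_neighbor_dataset(img)
predictions = [forward(x, layers) for x in inputs]
loss = mse(targets, predictions)   # untrained weights, so the loss is not small
```

Training would repeatedly adjust the weights in `layers` to reduce `loss` over all non-border pixels of the cover image.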

The Embedding and Shifting Phase
After the training and prediction phase is completed, the scheme enters the embedding and shifting phase. In order to embed the binary secret data in the cover image, three-dimensional PEH (3D-PEH) modification is used for embedding and shifting. However, in the previous work on 3D-PEH modification, only the points located in the first octant of the three-dimensional coordinate system are modified. This way of hiding secrets leaves most of the three-dimensional coordinate space unused for embedding; hence, there are relatively few embeddable pixels and the embedding capacity is lower. Instead, our proposed method embeds secret data in all eight octants of the three-dimensional space, so that much more space can be exploited than in previous approaches.
We adopt rhombus prediction and double-layered embedding, in the same way as in [5,24], to generate the non-overlapping prediction-error triples $(e_x, e_y, e_z) = (e_{3i-2}, e_{3i-1}, e_{3i})$ for feasible $i$ (i.e., each pixel in the triple has four neighboring pixels). A 3D-PEH is generated by counting the non-overlapping prediction-error triples, and the data embedding is realized by modifying the obtained 3D-PEH using the designed reversible mapping. The data embedding procedure is briefly described as follows.
First, double-layered embedding is adopted to divide the cover image into two sets denoted as "star" and "dot" (as shown in Figure 4a). The star and dot sets are each embedded with half of the secret data. Except for the pixels located on the borders, the pixels of the star or dot set are scanned from left to right and top to bottom to derive the cover sequence $(p_1, \ldots, p_n)$. The scan orders for star and dot pixels are shown in Figure 4b,c. Then, the 4-neighbor pixels of each $p_i$ are fed to the trained MLP to obtain its predicted value $\hat{p}_i$. The predicted values determine the prediction-error sequence $(e_1, \ldots, e_n)$, which is divided into the prediction-error triples $(e_x, e_y, e_z)$. The prediction-error $e_i$ is obtained as $e_i = p_i - \hat{p}_i$. Lastly, each prediction-error triple $(e_x, e_y, e_z)$ is modified to $(e^*_x, e^*_y, e^*_z)$, and the marked triple $(p'_x, p'_y, p'_z) = (\hat{p}_x + e^*_x, \hat{p}_y + e^*_y, \hat{p}_z + e^*_z)$ is computed to embed data based on the 3D-PEH, following the mapping shown in Tables 3-5. The 3D-PEH mapping method is divided into seven types: Type A to Type G.

Table 3. Types A-C: the marked values of the prediction-error triple $(e_x, e_y, e_z)$ and the cover pixel triple $(p_x, p_y, p_z)$ in the proposed method, with embedding as the data embedding operation on $(e_x, e_y, e_z)$.

Column headings of Tables 3-5: Type; $(e_x, e_y, e_z)$; Secret Bits; EC (bits); $(e^*_x, e^*_y, e^*_z)$; marked pixel triple.
Type D: $(e_x, e_y, e_z) = (0, \pm 1, \pm 1)$, $(e_x, e_y, e_z) = (\pm 1, 0, \pm 1)$, or $(e_x, e_y, e_z) = (\pm 1, \pm 1, 0)$.

Table 5. Type G: the marked values of the prediction-error triple $(e_x, e_y, e_z)$ and the cover pixel triple $(p_x, p_y, p_z)$ in the proposed method, with shifting as the data embedding operation on $(e_x, e_y, e_z)$.

Type G: condition $e_x, e_y, e_z \neq 0$ and $(e_x, e_y, e_z) \notin$ Type B, F; secret bits: none; EC: none; $(e^*_x, e^*_y, e^*_z) = (e_x \pm 1, e_y \pm 1, e_z \pm 1)$; marked pixel triple: $(p_x \pm 1, p_y \pm 1, p_z \pm 1)$.

Figure 5 visualizes how the secret data are embedded. The goal of this visualization is to provide an intuitive way to verify the reversibility of our proposed method. There are seven types of embedding in the proposed method, and the mapping relationships of Types A, B, ..., G are visualized in Figure 5. An arrow from a starting point $x$ to an end point $y$ represents that the data $x$ is transformed into the data $y$ under the mapping. That is, the prediction-error groups $(e_x, e_y, e_z)$ and the cover pixel groups $(p_x, p_y, p_z)$ are modified by Type A to Type F according to the secret bits to be embedded. For example, Type A hides data by transforming (0, 0, 0) into (0, 0, 0), (0, 0, 1), (0, 1, 0), (0, −1, 0), (1, 0, 0), (0, 0, −1), or (−1, 0, 0). Accordingly, Figure 5a shows the arrows that start from (0, 0, 0) and end at the destinations (0, 0, 0), (0, 0, 1), (0, 1, 0), (0, −1, 0), (1, 0, 0), (0, 0, −1), and (−1, 0, 0), respectively. One can therefore check whether the mapping for data hiding is reversible by checking that no point in the mapping diagram is reached from multiple points.

After the embedding and shifting phase, the stego-image embedded with the secret data is obtained. Then, the stego-image and the trained MLP model are sent to the receiver through the communication channel.
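Two steps of this phase lend themselves to a short sketch: splitting the non-border pixels into the star and dot sets, and checking that a mapping diagram is reversible. The checkerboard parity used for "star" is an assumption, since Figure 4a is not reproduced here, and the second mapping below is a deliberately broken, hypothetical example. Only the Type A destinations are taken from the text; `split_star_dot` and `is_reversible` are hypothetical names.

```python
def split_star_dot(height, width):
    """Checkerboard split of non-border pixels, scanned row by row.

    Assumption: positions with even (row + column) form the star set.
    """
    star, dot = [], []
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            (star if (r + c) % 2 == 0 else dot).append((r, c))
    return star, dot


def is_reversible(mapping):
    """True if no destination triple is reached by more than one arrow."""
    seen = set()
    for destinations in mapping.values():
        for d in destinations:
            if d in seen:
                return False
            seen.add(d)
    return True


# Type A destinations as listed in the text.
type_a = {
    (0, 0, 0): [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, -1, 0),
                (1, 0, 0), (0, 0, -1), (-1, 0, 0)],
}

# Hypothetical, intentionally conflicting addition: (1, 0, 0) is already a
# Type A destination, so leaving it in place would break reversibility.
broken = dict(type_a)
broken[(1, 0, 0)] = [(1, 0, 0)]

star, dot = split_star_dot(4, 4)
```

The conflict in `broken` illustrates why triples outside the embedding types must be shifted away: every destination occupied by an embedding arrow has to be vacated first.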

The Extraction and Recovery Phase
The receiver obtains the stego-image and the trained MLP model through the communication channel. The scheme then enters the extraction and recovery phase, in which the secret data are extracted from the stego-image and the cover image is recovered.
In the extraction and recovery stage, the procedure for extracting the secret data and recovering the image is similar to the procedure for embedding and shifting. The secret data extraction process is briefly described as follows.
First, rhombus prediction and double-layered embedding are adopted to divide the stego-image into two sets denoted as "star" and "dot" (as shown in Figure 4a), and half of the secret data is extracted from each of the star and dot sets. Except for the pixels located on the borders, the pixels of the star or dot set are scanned from top-left to bottom-right to derive the stego sequence $(p'_1, \ldots, p'_n)$.
Then, the 4-neighbor pixels of each $p'_i$ are fed to the trained MLP to obtain its predicted value $\hat{p}_i$. The predicted values determine the prediction-error sequence $(e_1, \ldots, e_n)$, which is divided into the prediction-error triples $(e_x, e_y, e_z)$. The prediction-error $e_i$ is obtained as $e_i = p'_i - \hat{p}_i$. Finally, each recovered triple $(p_x, p_y, p_z)$ is obtained based on the 3D-PEH, following the method shown in Tables 6-8. The 3D-PEH recovery method is divided into seven types: Type A to Type G. Note that $(e_x, e_y, e_z)$ here are the prediction-errors between the "marked pixels" (in the stego-image) and the predictions of the "marked pixels". When the prediction-error $e_i$ is 1, the recovered value is $p_i = p'_i - 1$, and when the prediction-error $e_i$ is −1, the recovered value is $p_i = p'_i + 1$.
The secret data bits are extracted by Type A to Type G according to the condition satisfied by the prediction-error group, and the stego pixel group $(p'_x, p'_y, p'_z)$ is restored to a recovered pixel group that has the same pixel values as the cover pixel group $(p_x, p_y, p_z)$. Type G carries no embedded data bits, so its stego pixel groups are only restored to the recovered pixel groups, without secret data extraction.

Table 6. Types A-C: the extracted secret bits and the recovered values of the prediction-error triple $(e_x, e_y, e_z)$ and the stego pixel triple $(p'_x, p'_y, p'_z)$ in the proposed method, with embedding as the data embedding operation on $(e_x, e_y, e_z)$.

Computational Complexity
Assume that the image has height M and width N. In the pre-processing phase, the training-and-prediction phase, the embedding-and-shifting phase, and the extraction-and-recovery phase, the computational complexity is basically O(MN), because there are O(MN) pixels to be scanned a constant number of times. (The pre-processing phase uses lossless compression for the location map; we assume that this can be done in time linear in the number of pixels if the space usage need not be as small as possible.) We remark that although the structure of the MLP neural network is fixed, so that this part contributes only a constant factor to the complexity, the constant factor hidden in the asymptotic notation can actually be huge. More specifically, for each input data point (i.e., a set of four pixels) fed to the input layer of the MLP in one iteration, 100 × 200 multiplications are required between the two hidden layers alone to compute the activations of the neurons.
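The 100 × 200 figure counts the connections between the two hidden layers. As a rough back-of-the-envelope sketch, the following code counts the weight multiplications of one forward pass across every layer of the 4-100-200-1 network described earlier (biases and activation functions are not counted), and scales that to the non-border pixels of a 512-by-512 image. The helper name `mlp_multiplications` is hypothetical.

```python
def mlp_multiplications(layer_sizes):
    """Weight multiplications in one forward pass of a fully connected network."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


# 4 inputs, hidden layers of 100 and 200 neurons, 1 output:
# 4*100 + 100*200 + 200*1 multiplications per predicted pixel.
per_pixel = mlp_multiplications([4, 100, 200, 1])

# One prediction for each non-border pixel of a 512-by-512 image.
height, width = 512, 512
per_image = per_pixel * (height - 2) * (width - 2)
```

The per-pixel cost is a constant, so the O(MN) bound still holds, but the constant is on the order of 10^4 multiplications per pixel for a single prediction pass.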

Experimental Results
The experimental results are shown in this section. Six grayscale images of size 512-by-512, including Lena, Baboon, Boat, Peppers, Airplane (F-16), and House, are used in our experiments. The cover images and the stego-images embedded with 10,000 bits of secret data are shown in Figure 6. In addition, the variations in image quality under different embedding capacities are compared (as shown in Figure 7). The most common way to measure image quality is the Peak Signal-to-Noise Ratio (PSNR), which is defined as
$$\mathrm{PSNR} = 10 \cdot \log_{10} \frac{255 \cdot 255}{\mathrm{MSE}},$$
where MSE is the mean squared error between the cover image and the stego-image. The results for the test image Lena are presented in Figure 7. From the line chart, it can be observed that when the embedding capacity is less than 60,000 bits, the PSNR decreases steadily. However, when the embedding capacity exceeds 60,000 bits, the PSNR begins to decline relatively quickly.
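For reference, the PSNR definition above can be computed as in the following sketch for 8-bit grayscale images stored as nested lists. The function name `psnr` and the choice to return infinity for identical images are assumptions made for illustration.

```python
import math


def psnr(cover, stego):
    """PSNR in dB between two equally sized 8-bit grayscale images."""
    n = len(cover) * len(cover[0])
    mse = sum((a - b) ** 2
              for row_a, row_b in zip(cover, stego)
              for a, b in zip(row_a, row_b)) / n
    if mse == 0:
        return math.inf   # identical images (assumed convention)
    return 10 * math.log10(255 * 255 / mse)


cover = [[10, 20], [30, 40]]
all_shifted = [[11, 19], [31, 39]]   # every pixel changed by +1 or -1
value = psnr(cover, all_shifted)     # MSE = 1, so about 48.13 dB
```

Because the scheme changes each pixel by at most 1, the MSE of a single embedding pass cannot exceed 1, which bounds the PSNR from below at roughly 48.13 dB.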

Performance Comparison between the Proposed Method and Baseline Approaches
In this subsection, the proposed method is compared with the previously mentioned schemes. The comparison is divided into two parts: the maximum embedding capacity, and the embedding capability at different embedding capacities. The comparison results show that the proposed method has a better embedding capacity while the image quality is still well maintained.

Maximum Embedding Capacity
We compared the embedding capacity and the image quality when the cover image was embedded once from beginning to end. The comparison is between the proposed method and the methods of Ni et al. [46], Lee et al. [47], Li et al. [21], and Cai et al. [5]. Table 9 compares the maximum embedding capacity for the six test images between the proposed method and the other schemes. In addition, Table 10 compares the PSNR at the maximum embedding capacity between the proposed method and the other schemes. According to Table 9, whether for a smooth image (such as Lena) or a complex image (such as Baboon), the proposed method has a better embedding capacity.
According to Table 10, the average PSNR values of the stego-images for the previous schemes [5,21,46,47] and the proposed method are 53.04 dB, 51.75 dB, 51.61 dB, 63.72 dB, and 48.55 dB, respectively. Clearly, the larger the embedding capacity is, the lower the image quality becomes. Although the PSNR of the proposed method is lower than that of the other methods, its embedding capacity is much higher. According to the above results, when the cover image is embedded only once, our proposed method attains the largest embedding capacity while maintaining good image quality.

Embedding Capability in Different Embedding Capacities
In this subsection, the variations in image quality under different embedding capacities are compared between the proposed method and the methods of Ni et al. [46], Lee et al. [47], Li et al. [21], and Cai et al. [5]. The image quality comparisons for the six test images at different embedding capacities are shown in Tables 11-13. In addition, the performance comparisons between the proposed method and the other related methods are shown in Figure 8 as line graphs.

Comparison between the Proposed Method and the Different Embedding Methods with Different Octant Embed Number
In this subsection, the variations in image quality under different embedding capacities are compared between the proposed method and several variant embedding methods. The variants are generated by reducing the number of octants of the 3D-PEH used for embedding in the proposed method. The comparison results are shown in Figure 9. According to these results, when only a few secret bits are embedded, the distortion of the image can be slightly reduced by embedding the secret in fewer octants; however, this reduction is limited. Conversely, when more secret bits are embedded, better image quality is kept by embedding the secret in more octants of the 3D-PEH. It can be expected that the more secret bits are embedded, the larger the gap between the different embedding methods becomes. Therefore, we embed secret data in all eight octants in the proposed method.

Conclusions
Machine learning, especially deep learning, has made significant progress in many research areas and applications such as visual recognition, image classification, and image processing. However, to the best of our knowledge, no deep learning approach has been successfully applied to RDH schemes, which require images to be completely restored and secret information to be extracted. This motivated us to apply such approaches to RDH. In this paper, we propose a reversible data hiding scheme based on three-dimensional prediction-error histogram modification and MLP networks. We utilize a trained MLP neural network to predict pixel values and combine it with PEE to achieve RDH. In addition, the proposed method of modifying the three-dimensional prediction-error histogram makes better use of the space in the three-dimensional coordinates for data embedding. Evaluation of the quality and embedding capacity of the stego-images shows that the proposed method still maintains a good PSNR and increases the maximum embedding capacity to 1.9 to 9.8 times that of previous methods. Nevertheless, the proposed method still has its disadvantages. Specifically, training the neural network and predicting pixels bit-by-bit are both time-consuming. Developing methods to enhance the efficiency of the proposed method, such as reducing the training time and predicting multiple bits at once, deserves further investigation in future work. Moreover, this work focused on proposing a novel reversible data hiding scheme that trains multilayer perceptrons by utilizing the correlation between image pixel values and their adjacent pixels so that accurate pixel predictions can be achieved. There should be a trade-off between performance and fragility. As a future research direction, it is worth discussing the impact of the fragility caused by transmission errors.