# CryptoDL: Predicting Dyslexia Biomarkers from Encrypted Neuroimaging Dataset Using Energy-Efficient Residue Number System and Deep Convolutional Neural Network

## Abstract

… _{FA} for n ≥ 3. FPGA implementation of the proposed encoder shows a 23.5% improvement in critical path delay and saves up to 42.4% power. Our proposed cascaded deep CNN also shows promising classification outcomes, with a highest accuracy of 73.2% on the encrypted data. Specifically, this study explores the potential of CNNs to discriminate cases of dyslexia from control subjects using an encrypted neuroimaging dataset of dyslexia biomarkers. This kind of research is expedient owing to the educational and medical importance of dyslexia.

## 1. Introduction

## 2. An Overview of Deep CNN for Image Dataset Classification

- Convolutional Layer: A convolutional layer is a series of small parameterized filters that operate on the input data domain. In this study, the inputs are raw brain images and encrypted brain images. The aim of the convolutional layers is to learn abstract features from the data [45]. Every filter is an n × n matrix called a kernel; in this case, n = 3. We slide the kernel over the input image and, at each position, evaluate the dot product of the filter values and the corresponding values in the pixel neighbourhood; the resulting outputs are called feature maps. The step size of this sliding is called the stride. For example, with a stride of (3,3), the filter moves three units across or down at each step. In summary, a brain MRI image I (Figure 1) consisting of R rows, C columns, and D layers is a function I (x, y, z), where 0 ≤ x < R, 0 ≤ y < C, and 0 ≤ z < D are spatial coordinates, and the amplitude of I at any point (x, y, z) is called the intensity [46]. The process of extracting feature maps is defined in Equation (1):$${I}_{f}\left(x,y,z\right)=\sum_{k=0}^{D-1}\sum_{i=0}^{n-1}\sum_{j=0}^{n-1}I\left(x+i,y+j,z+k\right)\ast {W}_{i,j,k}$$ where I_{f} is the convolved image and W_{i,j,k} are the kernel coefficients.
- Activation Layer: The feature maps from the convolutional layers are passed through a nonlinear activation function to produce another set of feature maps [4]. We used a nonlinear activation function after each convolutional layer. Each activation function performs a fixed mathematical operation on a single number, which it accepts as input. In practice, there are several activation functions from which to choose. These include the rectified linear unit (ReLU), defined as ReLU (z) = max (0, z), the sigmoid and tanh functions, and ReLU variants such as leaky ReLU and parametric ReLU [4,45].
- Pooling Layer: A pooling layer, also known as a sub-sampling layer, follows an activation layer. The pooling layer takes small grid regions as input and reduces each region to a single number. Different kinds of pooling layers have been implemented in previous studies, with max-pooling and average pooling being the two most common. The pooling layers give the CNN some translational invariance because a slight shift of the input image results in only a slight change in the activation maps. In max-pooling (Figure 2), the largest pixel value within the receptive field of the filter is retained, while in average pooling, the average of all the pixel values in the receptive field is retained.
- Fully Connected Layer: The fully connected layer has the same structure as the hidden layers of a classical feed-forward network. The layer is so named because each of its neurons is linked to all neurons in the previous layer, where each connection carries a value called a weight. Every neuron's output is the dot product of two vectors, namely the neuron outputs of the preceding layer and the corresponding weights of that neuron.
- Dropout Layer: This layer is also called dropout regularization. A model sometimes becomes skewed towards the training dataset and then produces high errors when the testing dataset is presented; this problem is known as overfitting. To avoid overfitting during the training process, we used a dropout layer, which randomly drops a set of connections in the fully connected layers by setting them to zero in each iteration. Dropping these values prevents the final model from fitting the training dataset too closely. Batch normalization is also used to mitigate internal covariate shift within the feature maps by smoothing the gradient flow, thus helping to improve network generalization. Figure 3 shows the building blocks of the simplified deep CNN classifier for brain images.
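The building blocks listed above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the architecture used in this study: the function names are hypothetical, the convolution assumes unit stride with no padding and sums over all D layers to give a 2D feature map, and the dropout uses the common "inverted" rescaling, which the text does not specify.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide an n x n x D kernel over an R x C x D image (stride 1, no padding).

    Each output value is the dot product of the kernel with the image
    patch underneath it, in the spirit of Equation (1).
    """
    R, C, _ = image.shape
    n = kernel.shape[0]
    out = np.zeros((R - n + 1, C - n + 1))
    for x in range(R - n + 1):
        for y in range(C - n + 1):
            out[x, y] = np.sum(image[x:x + n, y:y + n, :] * kernel)
    return out

def relu(z):
    """ReLU(z) = max(0, z), applied element-wise."""
    return np.maximum(0, z)

def max_pool(fmap, size=2):
    """Keep the largest value in each non-overlapping size x size region."""
    H, W = fmap.shape
    H2, W2 = H // size, W // size
    trimmed = fmap[:H2 * size, :W2 * size]  # drop rows/columns that do not fill a region
    return trimmed.reshape(H2, size, W2, size).max(axis=(1, 3))

def dropout(activations, rate, rng):
    """Zero a random fraction `rate` of the values (assumed inverted dropout)."""
    mask = rng.random(activations.shape) >= rate
    return activations * mask / (1.0 - rate)

# Toy 5 x 5 single-layer "image" and a 3 x 3 averaging-style kernel of ones.
toy_image = np.arange(25, dtype=float).reshape(5, 5, 1)
toy_kernel = np.ones((3, 3, 1))
toy_features = max_pool(relu(conv2d_valid(toy_image, toy_kernel)))
```

A real network would learn the kernel values by backpropagation and stack many such layers; the loops here favour readability over speed.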

## 3. Background of Residue Number System and Image Encryption

… $\{m_1, m_2, \dots, m_n\}$ in such a way that $\mathrm{GCD}(m_i, m_j) = 1$ provided $i \ne j$, where GCD denotes the greatest common divisor. The dynamic range M of this RNS system is defined in Equation (2):

$$M=\prod_{i=1}^{n}{m}_{i}$$

An integer $X \in Z_M$ in the RNS can be expressed as $X\to \left({x}_{1},{x}_{2},\dots ,{x}_{n}\right)$ using Equation (3):

$${x}_{i}={\left|X\right|}_{{m}_{i}}=X \bmod {m}_{i},\quad i=1,\dots ,n$$

where $x_1, \dots, x_n$ are the residues; X is a large integer; $m_i$ is a modulus; and M represents the system dynamic range, which must be sufficiently large. $Z_M$ covers the range [0, M), called the legitimate range of X.
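To make the forward conversion concrete, the following is a minimal Python sketch. The function names are hypothetical, and the reverse conversion uses the standard Chinese remainder theorem, which is a textbook technique assumed here for illustration rather than a method described in this section. The moduli {7, 8, 15} are an example choice.

```python
from math import gcd, prod

def is_pairwise_coprime(moduli):
    """Check that GCD(m_i, m_j) = 1 for every pair with i != j."""
    return all(gcd(a, b) == 1
               for idx, a in enumerate(moduli)
               for b in moduli[idx + 1:])

def dynamic_range(moduli):
    """Dynamic range M: the product of all moduli."""
    return prod(moduli)

def to_rns(X, moduli):
    """Forward conversion: X -> (x_1, ..., x_n) with x_i = X mod m_i."""
    return tuple(X % m for m in moduli)

def from_rns(residues, moduli):
    """Reverse conversion via the Chinese remainder theorem (assumed technique)."""
    M = prod(moduli)
    total = 0
    for x_i, m_i in zip(residues, moduli):
        M_i = M // m_i
        # pow(M_i, -1, m_i) is the modular inverse (Python 3.8+).
        total += x_i * M_i * pow(M_i, -1, m_i)
    return total % M

example_moduli = (7, 8, 15)                    # pairwise co-prime, M = 840
example_residues = to_rns(123, example_moduli)
```

Every integer in the legitimate range [0, M) maps to a unique residue tuple, which is why the round trip through `from_rns` recovers the original value.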

## 4. Materials and Methods

#### 4.1. Participants

#### 4.2. Brain Images Acquisition and Pre-Processing

#### 4.3. Proposed Conceptual Framework for Secure Brain Image Classification

#### 4.4. Design of RNS Pixel-Bitstream Encoder for Image Encryption

… $\{2^{n} - 1, 2^{n}, 2^{n+1} - 1\}$ is suggested to establish bitstream shares for each pixel present in the patched brain image dataset, where $m_1 = 2^{n} - 1$, $m_2 = 2^{n}$, and $m_3 = 2^{n+1} - 1$ represent the channel order of the moduli, with n ≥ 3. These shares were concatenated to generate a cipher-image. In this scenario, the order of the moduli represents the public key (pk), while the value of n represents the secret key (sk), which must be kept as confidential as possible. In this scheme, a maximum cryptographic key length (λ = 4048 bits) was used for each modulo-channel to resist attacks such as brute-force, statistical, chosen-plaintext, and chosen-ciphertext attacks, because keys of lower bit-length no longer provide sufficiently strong security. Designing a pixel-bitstream encoder requires three parallel RNS modulo-processors, as illustrated in Figure 5. Each processor performs modular arithmetic with respect to its corresponding modulus $m_i$ using fast adders, namely carry save adders (CSA) and carry propagate adders (CPA).

… _{i} of each modulo-processor using Equation (3). This RNS system's dynamic range M [60] is determined using Equation (2) as follows:

$$M=\left({2}^{n}-1\right)\cdot {2}^{n}\cdot \left({2}^{n+1}-1\right)$$

… where $x_i$ is either 0 or 1. Therefore, the residues $r_i$ can be computed based on the following assumptions:

- $r_2$ consists of the n least significant bits (LSBs) of the integer X and is obtained directly from the modulo-$2^{n}$ processor.
- For $r_1$ and $r_3$, X is partitioned into two n-bit blocks, $Z_1$ and $Z_2$, and one (n + 1)-bit block $Z_3$, where$$\begin{array}{l}{Z}_{1}=\sum_{j=0}^{n-1}{x}_{j}{2}^{j}\\ {Z}_{2}=\sum_{j=n}^{2n-1}{x}_{j}{2}^{j-n}\\ {Z}_{3}=\sum_{j=2n}^{3n}{x}_{j}{2}^{j-2n}\end{array}$$

… $r_2$ in relation to modulo-$2^{n}$, that is, $r_2 = Z_1$. The only requirement here is the determination of the values $|2^{i}|_{m}$, followed by the summation of the results with a reduction relative to the modulus [61,62]. Two cases are to be considered here:

#### 4.4.1. Case 1: Modulo-(2^{n} − 1)

… $2^{n} - 1$ processor yields residue $r_1$ as follows:

#### 4.4.2. Case 2: Modulo-(2^{n+1} − 1)

… $2^{n+1} - 1$ processor yields residue $r_3$ as follows:
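Both residues follow from writing the pixel value in terms of its blocks. The derivation below is a hedged reconstruction from the definitions, not necessarily the form of the authors' original equations. It assumes that $Z_1$ holds the n least significant bits of X (so that $r_2 = Z_1$), $Z_2$ the next n bits, and $Z_3$ the remaining n + 1 bits:

$$X={Z}_{1}+{2}^{n}{Z}_{2}+{2}^{2n}{Z}_{3}$$

Since $2^{n} \equiv 1 \pmod{2^{n}-1}$, every power of $2^{n}$ reduces to 1 in the first channel:

$${r}_{1}={\left|{Z}_{1}+{Z}_{2}+{Z}_{3}\right|}_{{2}^{n}-1}$$

Since $2^{n+1} \equiv 1 \pmod{2^{n+1}-1}$, the factor $2^{2n} = 2^{n-1}\cdot 2^{n+1}$ reduces to $2^{n-1}$ in the third channel:

$${r}_{3}={\left|{Z}_{1}+{2}^{n}{Z}_{2}+{2}^{n-1}{Z}_{3}\right|}_{{2}^{n+1}-1}$$

In both cases only additions and fixed shifts remain, which is what makes an adder-based realization possible.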

… $\{2^{n} - 1, 2^{n}, 2^{n+1} - 1\}$ where n = 3 and a pixel value X = 123 (i.e., 1111011 in binary). Then, the encoding process is as follows:

… $Z_1 = 011$, $Z_2 = 111$, and $Z_3 = 1$.

… $r_2 = Z_1 = 3$ (011). Using Equations (8) and (9), …
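The worked example can be checked in software. The sketch below is an assumption-laden illustration: the function names are hypothetical, the block layout takes $Z_1$ as the n least significant bits (matching $Z_1 = 011$ above), and the `%` operator stands in for the final reduction that the hardware performs with CSA and CPA stages.

```python
def partition_blocks(X, n):
    """Split X into Z1 (low n bits), Z2 (next n bits), Z3 (remaining bits)."""
    mask = (1 << n) - 1
    Z1 = X & mask
    Z2 = (X >> n) & mask
    Z3 = X >> (2 * n)
    return Z1, Z2, Z3

def encode_pixel(X, n):
    """Residues of X for the moduli set {2^n - 1, 2^n, 2^(n+1) - 1}."""
    if n < 3:
        raise ValueError("the scheme assumes n >= 3")
    m1, m2, m3 = (1 << n) - 1, 1 << n, (1 << (n + 1)) - 1
    if not 0 <= X < m1 * m2 * m3:
        raise ValueError("X must lie in the legitimate range [0, M)")
    Z1, Z2, Z3 = partition_blocks(X, n)
    r1 = (Z1 + Z2 + Z3) % m1                        # 2^n = 1 (mod 2^n - 1)
    r2 = Z1                                         # low n bits, no arithmetic needed
    r3 = (Z1 + (Z2 << n) + (Z3 << (n - 1))) % m3    # 2^(2n) = 2^(n-1) (mod 2^(n+1) - 1)
    return r1, r2, r3

blocks = partition_blocks(123, 3)   # (3, 7, 1), i.e. 011, 111, 1
shares = encode_pixel(123, 3)       # residues modulo 7, 8, and 15
```

For X = 123 and n = 3 the residues are (4, 3, 3), which agrees with computing 123 mod 7, 123 mod 8, and 123 mod 15 directly.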

… $Z_1$, $Z_2$, and $Z_3$, and produces two outputs: a partial-sum (PS) and a partial-carry (PC), which must be fed into a CPA so that the carries are propagated to produce an encoded result [61]. The architectural complexity, in terms of area (A) and time delay (D), imposed on the residues $r_i$ by the above pixel-bitstream encoder is calculated as follows:

#### 4.5. Deep CNN Architecture, Training, and Classification

… ($e_I$) was applied to each pixel value (I) in the images for the unencrypted patches. Here, $e_I = 5$, chosen such that Equation (12) holds true.

… $m_i$ is a modulus and M is the system dynamic range defined in Equation (2). Also, the pixel-bitstream encoder was designed to choose a floating-point random noise ($e_M$) from the range [0.1, 0.9) to create a confusion moduli set ($m_i'$) such that $m_i' < m_i$ and the confusion dynamic range satisfies $M' \le M - 225$. The encoder was set to select $e_M = 0.9$ in this study, thus ensuring that the resulting $m_i'$ are relatively co-prime.

## 5. Experimental Results and Discussion

#### 5.1. Implementation of the Proposed Pixel-Bitstream Encoder and Encryption Time Analysis

#### 5.2. Analysis of Pixel-Bitstream Encoder Performance

#### 5.2.1. Design Analysis

… $2^{n+1} - 1$ is the critical channel, where the system spent most of its processing time owing to the expensive computation resulting from the addition operator. Table 1 compares the proposed pixel-bitstream encoder with the state-of-the-art binary-to-residue converter in terms of critical path delay. The proposed pixel-bitstream encoder was first implemented on a pure adder-based Virtex-4 XC4VSX25 field-programmable gate array (FPGA) for different values of the critical path: $2^{5} - 1$, $2^{6} - 1$, $2^{9} - 1$, and $2^{12} - 1$ bits. The timing efficiency of the proposed design was very good, with maximum frequencies of 353.4 MHz, 292.8 MHz, 275.8 MHz, and 231.3 MHz for n = 4, 5, 8, and 11, respectively. The same implementation was repeated on a Spartan-3 XC3S200 FPGA. Owing to its lack of integrated block random access memory (BRAM), however, the encoder could only be implemented for two critical paths ($2^{5} - 1$ and $2^{6} - 1$ bits), with maximum frequencies of 383.4 MHz and 258.1 MHz for n = 4 and 5, respectively. Meanwhile, an unexpected observation is that the proposed encoder's critical path delay for n = 3 ($2^{4} - 1$ bits) on FPGAs is 42.4% better than that of the read-only memory (ROM)-based implementation. This explains why it was able to encode all the image patches in approximately 15 s, as shown in Figure 7. From Table 1, it is clear that the best performance improvement of the proposed pixel-bitstream encoder on the Virtex-4 FPGA is 23.5%, at n = 8. On the Spartan-3 FPGA, the best performance improvement is 15.3%, at n = 4. While the improvement decreases progressively for the latter, it increases progressively for the former, peaking at n = 8 before diminishing. By implication, it is not effective to implement the proposed encoder on FPGAs for applications requiring a large value of n, owing to the computational complexity of the $2^{n+1} - 1$ modulus and the lack of integrated BRAM. In fact, it is not desirable to use external ROMs, as they are considerably slower than the built-in ones. Therefore, for applications requiring a small value of n, for example, digital image processing, the ROM-based platform is adequate and can deliver better performance than adder-based ones. On the other hand, for applications that require a larger value of n, our proposed pixel-bitstream encoder is preferable.
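The percentage improvements quoted from Table 1 can be reproduced from the delay columns. The short sketch below assumes that improvement is defined as the relative reduction in delay with respect to the state-of-the-art converter; that definition is inferred from the tabulated numbers rather than stated in the text, and the function name is hypothetical.

```python
def improvement_pct(proposed_delay, baseline_delay):
    """Relative delay reduction versus the baseline, in percent (one decimal)."""
    return round(100.0 * (baseline_delay - proposed_delay) / baseline_delay, 1)

# (proposed, state-of-the-art) delay pairs copied from Table 1, keyed by n.
virtex4_delays = {4: (23.7, 25.6), 5: (29.9, 33.7), 8: (37.1, 48.5), 11: (56.8, 68.3)}
spartan3_delays = {4: (25.5, 30.1), 5: (35.2, 39.7)}

virtex4_improvement = {n: improvement_pct(p, b) for n, (p, b) in virtex4_delays.items()}
spartan3_improvement = {n: improvement_pct(p, b) for n, (p, b) in spartan3_delays.items()}
```

Under this definition the Virtex-4 figures come out as 7.4, 11.3, 23.5, and 16.8, and the Spartan-3 figures as 15.3 and 11.3, matching the table.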

#### 5.2.2. Cipher Image Analysis

#### 5.2.3. Histogram Analysis

#### 5.2.4. Correlation Coefficient Analysis

… $r^{2} \le 1$.

… $I_p$ and $I_c$ are functions with the condition that, if r = 1, there is a strong direct or positive correlation between the two images, which means there has been no encryption. If r = −1, however, there is a perfect inverse or negative correlation, suggesting strong encryption. Further, r = 0 means that there is no linear correlation, although there might be a non-linear correlation between the two images. Using Equation (13), the correlation coefficient between Figure 9a,b was found to be −0.0073. Meanwhile, to ensure poor correlation across all portions of our brain image dataset before classification, we took five randomly selected adjacent portions from Figure 9a,b for correlation analysis, as shown in Table 2. The purpose of this task is to ensure that the CNN classifier, which we assumed to be a cloud-based third-party platform, does not gain partial access that may be used to guess the encoded information using the dictionary of known cipher-text attacks.
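As an illustration of this analysis, the following sketch computes a correlation coefficient between a plain image and a cipher image. It assumes the standard Pearson formula, which may differ in form from Equation (13), and the function name and the random test image are hypothetical.

```python
import numpy as np

def correlation_coefficient(plain, cipher):
    """Pearson correlation between two equally sized images (assumed formula)."""
    p = np.asarray(plain, dtype=np.float64).ravel()
    c = np.asarray(cipher, dtype=np.float64).ravel()
    if p.shape != c.shape:
        raise ValueError("images must contain the same number of pixels")
    p_centered = p - p.mean()
    c_centered = c - c.mean()
    denom = np.sqrt(np.sum(p_centered ** 2) * np.sum(c_centered ** 2))
    if denom == 0:
        raise ValueError("correlation is undefined for a constant image")
    return float(np.sum(p_centered * c_centered) / denom)

rng = np.random.default_rng(0)
demo_image = rng.integers(0, 256, size=(16, 16))        # stand-in for an image patch
r_same = correlation_coefficient(demo_image, demo_image)           # identical images
r_inverted = correlation_coefficient(demo_image, 255 - demo_image) # photographic negative
```

Identical images give a coefficient of 1 and a pixel-wise negative gives −1, while values close to 0, such as those in Table 2, indicate that the cipher image carries no linear trace of the plain image.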

#### 5.3. Analysis of the Proposed Cascaded Deep CNN Classifier Performance

- Accuracy: Accuracy measures the percentage of all subjects, dyslexic and non-dyslexic, that are correctly classified. For computation of the classifier accuracy, Equation (14) is used.$$Accuracy=\frac{TP+TN}{TP+TN+FP+FN}$$
- Sensitivity: Sensitivity is the percentage of dyslexic subjects that are correctly classified or predicted to be positive by the classifier. It is also known as the true positive rate (TPR) or recall. For the computation of sensitivity, Equation (15) is used.$$Sensitivity=\frac{TP}{TP+FN}$$
- Specificity: Specificity, or the true negative rate (TNR), measures the percentage of non-dyslexic subjects that are correctly classified. This indicates accuracy in identifying non-dyslexic subjects [82], as shown in Equation (16).$$Specificity=\frac{TN}{TN+FP}$$
- ROC and Area under ROC (AUC): The receiver operating characteristics (ROC) curve plots sensitivity against 1 − specificity (the false positive rate), and thus provides a representation of the trade-off between correctly classified positive instances and incorrectly classified negative instances [83]. Area under ROC (AUC) is computed directly from this curve.
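Equations (14) to (16) translate directly into code. The sketch below uses a hypothetical function name and made-up confusion-matrix counts purely for illustration; they are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + tn + fp + fn
    if total == 0 or (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("each class needs at least one subject")
    return {
        "accuracy": (tp + tn) / total,      # Equation (14)
        "sensitivity": tp / (tp + fn),      # Equation (15), true positive rate
        "specificity": tn / (tn + fp),      # Equation (16), true negative rate
    }

# Hypothetical counts: 50 dyslexic and 50 control subjects.
demo_metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

With these illustrative counts, accuracy is 0.85, sensitivity is 0.80, and specificity is 0.90; one point of the ROC curve is then (1 − specificity, sensitivity) = (0.10, 0.80).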

#### 5.4. Summary of Discussion

## 6. Conclusions

… _{FA} and a total time delay of (3n + 3) D_{FA} when n ≥ 3. FPGA implementation of the proposed encoder revealed that it saves up to 42.4% energy compared with the ROM-based implementation, with a 23.5% decrease in critical path delay compared with the equivalent state-of-the-art binary-to-residue converter. When used for the encryption process, the proposed encoder took approximately 15 s to create cipher images for all image patches segmented during the pre-processing phase. The correlation between the plain normalized brain images and the cipher images was found to range between −0.0073 and 0.0082, with completely different histogram shapes.

## Author Contributions

## Funding

## Conflicts of Interest

**Figure 1.** Feature maps extraction [46].

**Figure 3.** Simplified deep convolutional neural network architecture [4].

**Figure 4.** Proposed conceptual framework. RNS, residue number system; CNN, convolutional neural network.

**Figure 12.** Receiver operating characteristics (ROC) curve at a training iteration of 500 epochs after encoding. AUC, area under ROC.

n | Critical Path (2^{n+1} − 1) | Virtex-4 FPGA Delay in Seconds: Proposed Encoder | Virtex-4 FPGA Delay in Seconds: State-of-the-Art (2) | Virtex-4 FPGA: % Improvement | Spartan-3 FPGA Delay in Seconds: Proposed Encoder | Spartan-3 FPGA Delay in Seconds: State-of-the-Art (2) | Spartan-3 FPGA: % Improvement |
---|---|---|---|---|---|---|---|
4 | 2^{5} − 1 | 23.7 | 25.6 | 7.4 | 25.5 | 30.1 | 15.3 |
5 | 2^{6} − 1 | 29.9 | 33.7 | 11.3 | 35.2 | 39.7 | 11.3 |
8 | 2^{9} − 1 | 37.1 | 48.5 | 23.5 | - | - | - |
11 | 2^{12} − 1 | 56.8 | 68.3 | 16.8 | - | - | - |

Adjacent Portions | Correlation Coefficient (r) |
---|---|
Portion 1 | −0.0293 |
Portion 2 | 0.0082 |
Portion 3 | −0.0275 |
Portion 4 | −0.0111 |
Portion 5 | −0.0659 |
Whole Images | −0.0073 |

**Table 3.** Classification results before and after encoding (mean ± SD after 50 repeated 10-fold cross validation (CV)).

Training Iterations | Before Encoding: Accuracy (%) | Before Encoding: Sensitivity (%) | Before Encoding: Specificity (%) | After Encoding: Accuracy (%) | After Encoding: Sensitivity (%) | After Encoding: Specificity (%) |
---|---|---|---|---|---|---|
150 | 57.47 ± 2.58 | 40.19 ± 2.13 | 53.62 ± 2.33 | 39.66 ± 2.09 | 33.28 ± 2.51 | 37.18 ± 2.85 |
225 | 59.13 ± 3.76 | 61.23 ± 3.72 | 54.91 ± 3.19 | 58.39 ± 3.44 | 58.72 ± 3.06 | 53.31 ± 3.41 |
350 | 70.68 ± 4.02 | 65.29 ± 2.97 | 66.84 ± 2.88 | 63.42 ± 3.19 | 61.97 ± 2.89 | 62.00 ± 2.99 |
450 | 80.22 ± 4.46 | 71.33 ± 3.85 | 72.53 ± 4.12 | 68.99 ± 3.87 | 67.81 ± 4.73 | 68.03 ± 3.96 |
500 | 84.56 ± 4.91 | 76.25 ± 4.64 | 78.21 ± 4.33 | 73.19 ± 4.18 | 70.33 ± 4.46 | 71.43 ± 4.11 |

**Table 4.** Performance comparison with existing privacy-preservation methods. MRI, magnetic resonance imaging; CNN, convolutional neural network.

Author(s) and Year | Image Encryption Algorithm | Deep Learning Classifier Used | Source of Dataset Used | Accuracy (%) | Reference No. |
---|---|---|---|---|---|
Tanaka (2018) | Block-based | Pyramidal Residue Network | CIFAR Dataset | 56.80 | [89] |
Sirichotedumrong et al. (2019) | Pixel-based | ResNet-18 | CIFAR Dataset | 86.99 | [35] |
Chao et al. (2019) | - | CaRENets | MNIST Dataset | 73.10 | [25] |
Proposed | Pixel-based | Two-Pathway Cascaded Deep CNN | Kaggle Brain MRI Dataset | 73.19 | - |

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

