Implementing a Chaotic Cryptosystem by Performing Parallel Computing on Embedded Systems with Multiprocessors

Profiling and parallel computing techniques are applied on a cluster of six multiprocessor embedded systems to implement a chaotic cryptosystem for digital color images. The proposed encryption method is based on stream encryption using a pseudo-random number generator with high-precision arithmetic and on parallel data processing with collective communication. The profiling and parallel computing techniques make it possible to find the optimal number of processors needed to improve the efficiency of the cryptosystem; that is, parallel processing reduces the time needed to generate the chaotic sequences and to execute the encryption algorithm. In addition, the high numerical precision reduces the digital degradation of the chaotic system and increases the security level of the cryptosystem. The security analysis confirms that the proposed cryptosystem is secure and robust against different attacks that have been widely reported in the literature. Accordingly, the proposed encryption method is potentially feasible for practical applications on modern telecommunication devices with multiprocessors, e.g., smartphones and tablets, and on any embedded system with multi-core hardware.


Introduction
Chaos theory is widely used in the encryption of information because of its particular properties [1][2][3], such as high sensitivity to initial conditions, ergodicity, randomness, and topological complexity, among others [4][5][6][7]. For instance, numerous works on the encryption of information using chaotic and hyperchaotic models have been reported in [8][9][10][11][12][13][14][15][16][17][18][19]. Chaotic maps in particular have been widely used for the encryption of digital images [20][21][22][23][24][25][26], because they generally require fewer arithmetic operations than continuous-time chaotic systems. Additionally, other methods of image encryption are reported in the literature, such as those based on substitution.

To our knowledge and based on the reviewed literature, few works report the use of parallel computing for the encryption of information; see, for example, [15,41,44,59,77,78,79,80,81,82]. Nevertheless, no reported work uses profiling techniques and parallel computing for the encryption of high-resolution digital images using PRNGs on a cluster of embedded systems with multiprocessors. The work in [41] reports a parallelizable chaos-based TRNG implemented on a mobile device, but it does not report any implementation of parallel encryption on an embedded system. Therefore, this paper proposes the use of hardware different from that commonly reported in the literature. The experiments are performed on a cluster of six Raspberry Pi 3 System on Chip (SoC) boards, which allows us to physically integrate up to 24 cores, to run as many parallel processes as feasible, and thus to improve the efficiency of the new generation of embedded cryptosystems. Additionally, the method reported in [73] is used for the encryption of information using multiple-precision arithmetic [83].
The Python programming language [84] is used to work with a number of significant decimals higher than that reported in the works [1][2][3][4][5][6][7][8][9][10][11][12][13][14][16][17][18][20][21][22][23][24][25][26], which are based on the IEEE 754 standard [54] typically used in computers and FPGAs. Python is a programming language widely used in scientific computing [84] with the following advantages: it is open source, free of license fees, and multiplatform. From this perspective, the main contribution of this paper is that, on a cluster of six embedded systems, the parallelizable processes are executed in parallel to improve the efficiency of the proposed cryptosystem while performing high-precision numerical operations under the scheme described in [73]. Additionally, the proposed cryptosystem must comply with the basic requirements of any chaos-based cryptographic system [85]. It is therefore evaluated with the NIST SP 800-22 statistical tests designed for cryptographic modules [86,87] and with other well-known analyses from the literature, such as the Number of Pixels Change Rate (NPCR) and Unified Average Changing Intensity (UACI) differential attack tests [88], entropy [89][90][91], and key space [92,93], which are commonly applied to cryptosystems.
The rest of this paper is organized as follows: Section 2 describes the proposed PRNG and dynamic keys generator. Section 3 describes the implementation of the proposed high-precision cryptosystem using parallel computing and profiling. Section 4 provides a security analysis according to [85][86][87][88] and performance analysis according to Amdahl's law [94]. The last section concludes this paper with a summary of the results achieved.

Proposed PRNG and Dynamic Keys Generator
The PRNG and the dynamic keys generator implemented in this paper are shown in Figure 1; both correspond to the method reported in [73], which uses multiple-precision arithmetic. In this work we introduce a method to encrypt and decrypt color images by applying profiling techniques and parallel computing, with the aim of improving the efficiency of the cryptosystem when it is implemented on multi-core embedded systems with limited computational resources. As an example, the Tinkerbell map [73,95,96] is defined by (1). It remains in a chaotic regime for control parameters in the ranges $0.84 < a \le 0.9$, $-0.61 < b < -0.59$, $1.9 < c \le 2$ and $0.45 < d \le 0.5$, where $-1.25 \le x_n \le 0.55$ and $-1.6 < y_n \le 0.55$, and for initial conditions $x_0$ and $y_0$ that lie within the space of the chaotic attractor, such as $x_0 = -0.72$ and $y_0 = -0.64$ [96].
$$x_{n+1} = x_n^2 - y_n^2 + a x_n + b y_n, \qquad y_{n+1} = 2 x_n y_n + c x_n + d y_n. \tag{1}$$

The generated pseudo-random sequences have two attributes: (i) the number of significant decimals is predefined to establish their level of precision $np$, for example $np = 99$, and (ii) each sequence is generated with different initial conditions taken from recurrent points of its attractor. For example, taking the initial conditions $x_0 = -0.72$, $y_0 = -0.64$ and iterating (1) $k = 2500$ times (a number greater than 2500 can also be chosen) with a numerical precision of $np = 99$, one obtains values of $x_k$ and $y_k$ with 99 significant decimals [73]. From these values, a new level of precision $np_{x_0}$ is predefined, to which the significant decimals are truncated to form the new initial conditions $x_0$ and $y_0$, for example $np_{x_0} = 50$ significant decimals. This procedure guarantees that the new initial conditions lie within the space of the chaotic attractor and remain valid in the long term; that is, new dynamic keys are generated for the encryption of large amounts of information. Initial conditions that differ greatly from each other produce chaotic series with different dynamic behaviors without losing their chaotic properties, which makes them suitable for the encryption of images regardless of their resolution. Figure 2 shows an example of the chaotic trajectories of $x_{n+1}$. Because of the sensitivity of chaotic systems to initial conditions, different trajectories are generated from the first iteration.
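The dynamic key procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the standard-library `decimal` module stands in for the multiple-precision library of [83], the control parameters `A`, `B`, `C`, `D` are values chosen inside the chaotic ranges quoted in the text, "significant decimals" is interpreted as decimal places, and the ten guard digits are an arbitrary choice. The helper names are hypothetical.

```python
from decimal import Decimal, getcontext, ROUND_DOWN

# Working precision: np = 99 digits as in the text, plus assumed guard digits.
NP = 99
getcontext().prec = NP + 10

# Assumed control parameters, chosen inside the chaotic ranges quoted above.
A, B, C, D = Decimal("0.9"), Decimal("-0.6013"), Decimal("2.0"), Decimal("0.5")


def tinkerbell_step(x, y):
    """Apply one iteration of Equation (1)."""
    x_next = x * x - y * y + A * x + B * y
    y_next = 2 * x * y + C * x + D * y
    return x_next, y_next


def iterate(x0, y0, k=2500):
    """Iterate the map k times starting from (x0, y0)."""
    x, y = x0, y0
    for _ in range(k):
        x, y = tinkerbell_step(x, y)
    return x, y


def truncate(value, decimals):
    """Truncate (not round) a value to a fixed number of decimal places."""
    return value.quantize(Decimal(1).scaleb(-decimals), rounding=ROUND_DOWN)


# Initial conditions from the text; the truncated results act as a new dynamic key.
x_k, y_k = iterate(Decimal("-0.72"), Decimal("-0.64"))
new_x0, new_y0 = truncate(x_k, 50), truncate(y_k, 50)
```

Repeating `iterate` from `(new_x0, new_y0)` would yield the next pair of initial conditions, which is how a sequence of dynamic keys could be produced in this sketch.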

Proposed Pseudo-Random Bit Generator (PRBG)
The proposed PRBG, shown in Figure 3, is based on the high-precision chaos generator (HPCG) described in [73]. In this paper, parallel computing techniques are used for the simultaneous generation of bit sequences, which makes the cryptosystem more efficient by using its multiple cores. In Figure 3, each pseudo-random sequence of bits is obtained by converting each number of the generated chaotic series into a binary equivalent through the [Chaotic series adjustment] and [Converter to binary sequence] blocks. The binary equivalent obtained in each iteration is concatenated to a resulting bit string identified with the letter B. In this way, a pseudo-random bit string B is generated for each equation of the chaotic map.

Figure 3. Block diagram of the pseudo-random bit series generator using HPCG, taken from [73].
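The conversion of a chaotic series into the bit string B can be illustrated as follows. The text does not specify what the [Chaotic series adjustment] and [Converter to binary sequence] blocks do internally, so the adjustment used here (drop the sign, shift 15 decimals into the integer part, keep the value modulo 256) is purely an assumption for illustration, and the function names are hypothetical.

```python
from decimal import Decimal


def value_to_bits(value, scale_digits=15, n_bits=8):
    """Map one chaotic value to a fixed-width bit string.

    Assumed adjustment (not taken from the paper): drop the sign, shift
    `scale_digits` decimals into the integer part, reduce modulo 2**n_bits.
    """
    adjusted = int(abs(value) * (10 ** scale_digits)) % (1 << n_bits)
    return format(adjusted, "0{}b".format(n_bits))


def build_bit_string(series, **kwargs):
    """Concatenate the binary equivalents of a chaotic series into B."""
    return "".join(value_to_bits(v, **kwargs) for v in series)


# Illustrative values only; a real series would come from the chaotic map.
sample_series = [Decimal("-0.72"), Decimal("0.3141592653589793238"), Decimal("-1.1")]
B = build_bit_string(sample_series)
```

One such string would be built per state variable of the map, so a two-dimensional map yields two bit strings in this sketch.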

Proposed Parallel Encryption Method
The proposed system for the encryption and decryption of digital color images is shown in Figure 4. It is based on stream encryption with a symmetric key [97] and on the method for generating pseudo-random numbers using multiple-precision arithmetic reported in [73]. In Figure 4a), the original image OI (object image) goes through a data distribution stage, illustrated in the [Data scattering] block, which divides the image into n blocks (sub-images) $OI_n$, where $o_{nk}$ is the k-th pixel of sub-image $OI_n$. Each sub-image is sent to one of the n processes of the embedded system cluster, where all sub-images are encrypted simultaneously (in parallel), as shown in Figure 4a). In Figure 4b), each process n encrypts the pixels $o_{nk}$ with the corresponding bytes of its associated encryption series $B_n = \{b_{n1}, b_{n2}, ..., b_{nk}\}$ through the simple logical operation XOR. Subsequently, the resulting sub-cryptograms $C_1 = \{c_{11}, c_{12}, ..., c_{1k}\}$, $C_2 = \{c_{21}, c_{22}, ..., c_{2k}\}$, ..., $C_n = \{c_{n1}, c_{n2}, ..., c_{nk}\}$ are sent to the [Data Gathering] stage, where all the encrypted information is integrated to obtain the full cryptogram C. To execute the scheme of Figure 4a) for parallel data processing with collective communication, the Message Passing Interface (MPI) library reported in [98,99] is used. MPI is commonly used on computers; in this paper, however, it is implemented on a cluster of six embedded systems with multi-core processors, which is described in Section 3.5. Because XOR is used as the encryption operator and is its own inverse, the same method can be used to encrypt and decrypt digital images [73].
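The scatter, XOR and gather stages of Figure 4 can be mimicked in a single process to show the data flow. The sketch below is not the MPI implementation of the paper: the blocks are processed in a plain loop rather than by separate processes, and `keystream` uses Python's `random` module as a stand-in for the chaotic PRBG, with integer seeds standing in for the dynamic keys. All names are hypothetical.

```python
import random


def keystream(seed, length):
    """Stand-in keystream; the paper uses the chaotic PRBG instead."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(length))


def scatter(data, n):
    """Split data into n contiguous blocks of nearly equal size."""
    size, extra = divmod(len(data), n)
    blocks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        blocks.append(data[start:end])
        start = end
    return blocks


def xor_block(block, key_bytes):
    """XOR every byte of a block with the matching keystream byte."""
    return bytes(p ^ k for p, k in zip(block, key_bytes))


def process(data, seeds):
    """Encrypt or decrypt: XOR with the same keystream undoes itself."""
    blocks = scatter(data, len(seeds))
    out = [xor_block(b, keystream(s, len(b))) for b, s in zip(blocks, seeds)]
    return b"".join(out)  # gather the sub-cryptograms in order


pixels = bytes(range(256)) * 3          # toy "image" data
seeds = [11, 22, 33, 44]                # one assumed key per process (n = 4)
cryptogram = process(pixels, seeds)
recovered = process(cryptogram, seeds)
```

Calling `process` on the cryptogram with a different number of seeds changes both the block boundaries and the keys, so the plain data is not recovered; this mirrors the dependence on the number n of processes noted in the text.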

Profiling Algorithms Using Python
A profiler is a program that analyzes and collects information about the behavior of another program, the object program, during its execution. The information analyzed includes the processing time of each subroutine of the program and the number of times each subroutine is called. Its purpose is to allow the developer to optimize the program code and improve the execution speed of the complete program by adjusting the code design and applying tuning techniques.

The profiling report of the encrypt() function initially displays a timer unit of 3.79968e-07 s and shows that the total execution time of the encrypt() function is 862.992 s. Additionally, the report indicates the execution time of each instruction per line of program code. From the results reported in [73], it is observed that the series for the state $y_{n+1}$ also present high levels of randomness, so they are potentially viable for use as a cipher series. In this paper, the proposed method therefore also uses the chaotic series $y_{n+1}$, and it is estimated that this reduces the total time of the encrypt() function to T(1) = 639 s, which corresponds to the iterations of the chaotic map with n = 1 process. With the information obtained through the profiling tool, one can identify the instructions that are executed most often and therefore require the most time. This makes it possible to identify specifically where to apply code optimization techniques, or where to divide the main program into a part with sequential processing and a part that can be parallelized.
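As a small illustration of how a profiler exposes the dominant part of a program, the sketch below uses the standard-library `cProfile` and `pstats` modules. This is a swapped-in tool: the paper reports line-by-line timings, whereas `cProfile` reports per-function timings. The `encrypt` and `chaotic_iterations` functions are hypothetical placeholders, and the logistic map is used only as a cheap stand-in for the chaotic iterations.

```python
import cProfile
import io
import pstats


def chaotic_iterations(k):
    """Placeholder for the expensive, parallelizable part."""
    x = 0.1
    for _ in range(k):
        x = 3.99 * x * (1.0 - x)  # logistic map, a cheap stand-in only
    return x


def encrypt(k=20000):
    """Hypothetical encrypt() with a short sequential part and a hot loop."""
    header = sum(range(100))  # sequential setup
    return header, chaotic_iterations(k)


profiler = cProfile.Profile()
profiler.enable()
encrypt()
profiler.disable()

# Collect the report as text, sorted by cumulative time per function.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats()
report = buffer.getvalue()
```

Printing `report` lists each function with its call count and cumulative time, which is the kind of information used in the text to separate the sequential part from the parallelizable part.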

Parameters of Parallel Processing
Profiling analysis can determine whether a function can be divided into a sequential part and a parallelizable part. Several parameters are used to evaluate parallel processing, such as the performance improvement factor or "speed-up" and the efficiency. If O(n) is defined as the number of elementary operations performed by a system with n processes, and T(n) as the execution time in unit time steps, then T(n) < O(n) if the n processes perform more than one operation per unit of time. The performance improvement factor S(n) for n processes is defined by (2), and the efficiency E(n) of a system with n processes is determined by (3).

$$S(n) = \frac{T(1)}{T(n)}, \tag{2}$$

$$E(n) = \frac{S(n)}{n} = \frac{T(1)}{n\,T(n)}. \tag{3}$$
The lowest system efficiency, E(n) → 0, occurs when all instructions of the object program are executed sequentially on a single processor. The maximum system efficiency, E(n) = 1, corresponds to the case in which all the processors of the system are fully used during the execution of the program. Another parameter for the evaluation of systems with parallel processing is scalability. A system is said to be scalable for a certain number n of processes if its efficiency E(n) is constant and at all times greater than 0.5 [100]. In practice, scalable programs can be divided among up to a certain number n of processes, beyond which the efficiency of the system begins to decrease. In the program analyzed through the profiling report, 99.7% of the total instructions are potentially parallelizable (lines 13 to 42 of the profiling report) and 0.3% are executed sequentially. Therefore, according to the analysis of the profiling report, if T(1) = 639 s, the values of S(n) and E(n) for n = 2 are:

$$T(1) = 639\ \text{s} = 99.7\% + 0.3\% = 637.08\ \text{s} + 1.92\ \text{s},$$

$$S(2) = \frac{T(1)}{T(2)} = \frac{639\ \text{s}}{318.54\ \text{s} + 1.92\ \text{s}} = \frac{639\ \text{s}}{320.46\ \text{s}} = 1.994,$$

$$E(2) = \frac{S(2)}{2} = \frac{T(1)}{2\,T(2)} = \frac{1.9940}{2} = 0.9970.$$
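The worked values above can be reproduced with a few lines of Python. The sketch assumes, as the text does, that the parallel part divides evenly among the n processes and that communication time is ignored; the function names are hypothetical.

```python
def predicted_time(t1, parallel_fraction, n):
    """Execution time with n processes when only the parallel part scales."""
    parallel = t1 * parallel_fraction
    sequential = t1 - parallel
    return sequential + parallel / n


def speed_up(t1, tn):
    """Performance improvement factor S(n) = T(1) / T(n)."""
    return t1 / tn


def efficiency(t1, tn, n):
    """Efficiency E(n) = S(n) / n."""
    return speed_up(t1, tn) / n


T1 = 639.0                              # seconds, from the profiling report
T2 = predicted_time(T1, 0.997, 2)       # about 320.46 s
S2 = speed_up(T1, T2)                   # about 1.994
E2 = efficiency(T1, T2, 2)              # about 0.997
```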
A first approximation to the number n of processes among which the analyzed program can be divided is obtained from Equation (3) by setting E(n) = 0.5 (50%). If T(1) = 639 s is taken as 100%, then $T(n) = \alpha + \beta_p(n) = \alpha + \beta_p(1)/n = \alpha + (100 - \alpha)/n$, and

$$0.5 = \frac{100}{n\left(\alpha + \frac{100 - \alpha}{n}\right)},$$

where $\alpha$ is the sequential time and $\beta_p$ is the parallel time, both in percent. Solving for n with $\alpha = 0.3$ gives $n = (100 + \alpha)/\alpha \approx 334$. Figure 5 shows the efficiency values E(n) of the "Encrypter.py" program. It is observed that for n > 334 the efficiency of the system is less than 0.5. According to [100], the performance factor can be classified into three types: (i) the load or problem size is fixed and the aim is to reduce the execution time; (ii) the time available to execute the problem is fixed, or the aim is to increase the size of the machine; and (iii) the problem is scalable and the main limitation on execution is the system memory. In this paper, the computational load or problem size is fixed, and the aim is to reduce the execution time of the encryption process. Therefore, it is necessary to analyze the performance factor based on Amdahl's law [94]. Amdahl's law considers problems in which the objective is to distribute a fixed load among more processes to decrease the total execution time. In this case, the Amdahl performance improvement factor $S_n$, described by (4), is determined by the fraction of sequential time of the algorithm, identified as $\alpha$.

$$S_n = \frac{1}{\alpha + \frac{1 - \alpha}{n}}. \tag{4}$$
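The efficiency threshold of 0.5 discussed above (Figure 5) can be checked numerically. In this sketch `alpha` is the sequential part expressed as a fraction (0.003 for 0.3%), communication time is ignored, and the function names are hypothetical.

```python
import math


def efficiency(n, alpha):
    """E(n) = T(1) / (n * T(n)) with T(1) = 1 and T(n) = alpha + (1 - alpha) / n."""
    return 1.0 / (n * alpha + (1.0 - alpha))


def max_processes(alpha, threshold=0.5):
    """Largest integer n for which E(n) >= threshold.

    E(n) >= threshold  <=>  n <= (1 / threshold - 1 + alpha) / alpha
    """
    return math.floor((1.0 / threshold - 1.0 + alpha) / alpha)


n_max = max_processes(0.003)  # sequential part of 0.3%
```

With a sequential part of 0.3%, `n_max` evaluates to 334, in agreement with the limit read from Figure 5.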
The fraction of parallel time is $\beta_p = 1 - \alpha$. For the proposed cryptosystem, the sequential fraction $\alpha = 0.3\%$ implies that $S_n$ asymptotically approaches $1/\alpha \approx 333$ as the number n of processes increases. Figure 6 shows the characteristic curve of the Amdahl performance factor for a sequential part $\alpha$ of 0.3%, compared with the ideal curve obtained when the sequential part is zero ($\alpha = 0\%$). For comparison purposes, curves are also shown for other values of $\alpha$; as the percentage of sequential time increases, the speed-up given by the Amdahl performance factor declines markedly.
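The behavior of the Amdahl performance factor for several sequential fractions can be tabulated with a short script. The values of `alphas` other than 0 and 0.003, and the list of process counts, are illustrative assumptions and are not taken from the curves of the paper.

```python
def amdahl_speed_up(n, alpha):
    """Amdahl performance factor for n processes and sequential fraction alpha."""
    return 1.0 / (alpha + (1.0 - alpha) / n)


process_counts = [1, 4, 16, 24, 128]
alphas = [0.0, 0.003, 0.05, 0.25]   # 0.0 is the ideal case, 0.003 is 0.3%

# One row of speed-up values per sequential fraction.
table = {
    alpha: [amdahl_speed_up(n, alpha) for n in process_counts]
    for alpha in alphas
}

for alpha, row in table.items():
    print(alpha, [round(s, 2) for s in row])
```

For `alpha = 0.003` the speed-up stays close to the ideal line up to 24 processes and saturates near 1/alpha for very large n, while larger sequential fractions flatten the curve much earlier.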

Figure 6. Curves of the Amdahl performance factor $S_n$ versus the number of processes n for different $\alpha$ values, compared to the $S_n$ curve for the proposed cryptosystem with $\alpha = 0.3\%$.
The analysis above assumes that the processes run under ideal conditions of software and hardware resources; that is, each time the algorithm is executed with a different number of processes, every process works under the same conditions and with the same computational resources. In practice, another factor that must be considered in the study of parallel processing is the communication between processors. In the cryptosystem, the computational load or problem size corresponds to the image to be encrypted or decrypted. Since parallelization distributes the computational load evenly among all the processors that execute the algorithm, there is a distribution time needed to split the image into equal parts to be encrypted and a collection time needed to gather the sub-cryptograms into the full cryptogram. Consequently, the total execution time of the algorithm is affected by the distribution and collection of the data load. Tools are available that optimize these times through collective communication, such as the MPI tool for Python, which provides collective communication functions for distributing and collecting data among processes [98,99].

Implementing a Machine Based on Raspberry Pi 3
The Raspberry Pi 3 SoC has a 64-bit quad-core Cortex-A53 processor at 1.2 GHz with 1 GB of DDR2 RAM. One of the aspects considered in the study of parallel processing is the machine on which the algorithms are executed. The main algorithm, which uses parallel processing through the MPI library [98,99], is executed together with the profiling tool by means of the command: mpiexec -n <n> kernprof -l -v Encrypter.py, where the parameter <n> indicates the number of processes of the machine that executes the algorithm. Figure 7 shows the composition of the Raspberry Pi 3 machine, which internally has a quad-core structure with an internal communication bus and 1 GB of RAM. When the previous command is executed, the Raspberry Pi 3 system creates a machine with n processes for program execution. Figure 8 shows some of the applications and services running on the Raspberry Pi 3 system when the command is executed with one process (n = 1). It is observed that a virtual machine is created with a CPU resource of %CPU = 100.0 and a memory resource of %MEM = 2.6 in a process identified as PID = 1542. Figure 9b shows the computational resource assigned to each process when the proposed method is executed with n = 8 processes. It is observed that the CPU resource assigned to each process changes as the number of processes of the machine increases. Because of this, the Amdahl performance factor ($S_n$) calculated and shown in Figure 6 is greatly affected, since the processes do not have the same hardware resources for execution as the number of processes increases. From the comparison shown in Figure 9a,b, the best use of computational resources on the Raspberry Pi 3 machine occurs when the number of processes that execute the program is n = 4. Therefore, the maximum gain in runtime is obtained with n = 4 processes. For n > 4 processes the gain in runtime decreases progressively, because physically there are no more processor cores in a Raspberry Pi 3.

Implementing a Machine Based on a Cluster of Six Embedded Systems (FIAD Cluster)
The FIAD cluster is composed of six Raspberry Pi 3 embedded systems, and its general organization is shown in Figure 10. When the complete machine is integrated, a total of 24 cores is obtained. In theory, the maximum gain is expected for executions of the program with n = 24 processes. However, in addition to the internal communication bus of each Raspberry Pi 3, each board communicates with the remaining five devices through an Ethernet channel at a speed of 100 Mbit/s, which affects the communication time between processes. Figure 11 shows the experimental arrangement implemented to run the encryption tests using parallel computing on six embedded systems.

Experimental Results
To test the robustness and security of the proposed cryptosystem, the following security analyses were performed: NIST statistical tests for cryptographic modules, histogram analysis, entropy, differential attack tests (NPCR and UACI), and key space. The quality of the randomness is evaluated to demonstrate the satisfactory security of the proposed chaos-based cryptosystem.
In addition, to test the efficiency of the cryptosystem, a performance analysis of parallel computing in information encryption is carried out. Furthermore, to test the proposed method on different hardware with multiprocessing capability, it was also implemented on a computer with an AMD A6 4400M APU (Accelerated Processing Unit) at 2.7 GHz, a mobile dual-core processor based on the Trinity architecture, running the Windows 10 operating system and Python version 3.5.2.

The NIST Statistical Test
For the NIST tests, each p-value is the probability that a perfect random number generator would have produced a sequence less random than the sequence that was tested, given the kind of non-randomness assessed by the test. A p-value equal to 1 indicates that the sequence appears to have perfect randomness, and a p-value of zero indicates that the sequence appears to be completely non-random. A significance level (α) is chosen for the tests: if p-value ≥ α, the sequence appears to be random; if p-value < α, the sequence appears to be non-random. For all 16 tests in the NIST suite [86] performed in this paper, the significance level (α) was set to 0.01. If a computed p-value is at least 0.01, the binary sequence is accepted as random with a confidence of 99%; otherwise, it is considered to be non-random [4,86,101]. In addition, the following setup parameters are considered: Block Frequency test, block length (M) = 128; Non-Overlapping Template test, block length (m) = 9; Overlapping Template test, block length (m) = 9; Approximate Entropy test, block length (m) = 10; Serial test, block length (M) = 16; Linear Complexity test, block length (M) = 500. All these tests were performed using 1000 sequences, each with a length of 1,000,000 bits. Table 1 lists the comparative results of the success percentages obtained with the Tinkerbell map [73,95,96] using a high precision of np = 99 significant decimals. If the proportion of success is greater than 0.98, it can be concluded that the sequences pass the NIST tests, i.e., they are random sequences. These results demonstrate that the proposed PRNG can be used in cryptosystems [73].

Figure 12 shows the original Lena RGB image of 512 × 512 × 3 and the resulting cryptogram obtained with the encryption method and the HPCG generator, using the states $x_{n+1}$ and $y_{n+1}$ of the Tinkerbell map [73] to generate the pseudo-random series. The cryptogram is a totally unintelligible image that shows no traces of the original image, and the histograms of its RGB components have a uniform distribution, which confirms that the cryptosystem is robust against statistical attacks [92,93].
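As a concrete illustration of the p-value decision rule used in the NIST tests of this section, the sketch below implements only the frequency (monobit) test, which is one of the tests in the NIST SP 800-22 suite. It is not the test harness used in the paper; the function names and the string representation of the bit sequence are assumptions.

```python
import math


def monobit_p_value(bits):
    """Frequency (monobit) test: p-value for a string of '0' and '1' characters."""
    n = len(bits)
    # Map 1 -> +1 and 0 -> -1, then sum.
    s = sum(1 if b == "1" else -1 for b in bits)
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


def is_random(p_value, alpha=0.01):
    """Decision rule: accept the sequence as random when p-value >= alpha."""
    return p_value >= alpha


p = monobit_p_value("1011010101")       # short demonstration sequence
accepted = is_random(p)
```

For the short demonstration sequence the p-value is about 0.527, so the sequence is accepted at α = 0.01; a sequence made only of ones yields a p-value far below α and is rejected.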

Entropy
In the works of Shannon [102,103], the mathematical foundations of information theory applied to communication and data storage were proposed. The entropy of information is a criterion that measures the randomness of the data [13], and it can also be used to evaluate the security of the encryption [104]. Equation (5) is used to calculate the entropy H(s) [89][90][91] of a source s, where $P(s_i)$ is the probability of the symbol $s_i$.

$$H(s) = \sum_{i=0}^{2^N - 1} P(s_i) \log_2 \frac{1}{P(s_i)}. \tag{5}$$

For a purely random source that emits $2^N$ symbols with equal probability, evaluating (5) gives H(s) = N; thus, for images with completely random pixels in 8-bit format, the ideal entropy is H(s) = 8 bits. When digital images are encrypted, their entropy should ideally be 8. When a cryptographic system emits symbols (cryptograms) with entropy less than 8, the encryptor has a certain degree of predictability, which puts its security at risk [13,89,90]. For comparison purposes, the entropy results are compared with those of related works that also use the Lena RGB image of 512 × 512 × 3 in 8-bit RGB format. Table 2 shows that the proposed method using the Tinkerbell map [73] presents better entropy results than most related works [4,11,25,32,36,59,77,81], except for the work [41], which reports the best entropy. This confirms that high arithmetic precision helps to improve the entropy of the encrypted information. On the other hand, using the FIAD cluster (see Figures 10 and 11), Table 3 shows the entropy results obtained from the Lena RGB cryptogram of 512 × 512 × 3 using n processes (n = 1 to n = 128) and the Tinkerbell map with high numerical precision np = 99 [73]. In all the results the entropy is about 7.999xxxx; hence, the entropy, and with it the security of the cryptosystem, is not affected by the use of parallel computing on embedded systems.
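The entropy of an 8-bit cryptogram can be estimated from the relative frequencies of its byte values. The following is a minimal sketch of that calculation; the function name is hypothetical and the input is assumed to be the raw bytes of one image or color channel.

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Entropy in bits per symbol, estimated from symbol frequencies."""
    total = len(data)
    counts = Counter(data)
    # Sum of P(s_i) * log2(1 / P(s_i)) over the symbols that occur.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


uniform = bytes(range(256))      # every byte value exactly once
constant = b"\x00" * 100         # a single repeated value

h_uniform = shannon_entropy(uniform)     # 8.0, the ideal value
h_constant = shannon_entropy(constant)   # 0.0, fully predictable
```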

NPCR and UACI Differential Attacks
To analyze resistance to differential attacks and quantify the differences between encrypted images [13], two common measures are used: NPCR and UACI. These measures test the influence of changing one pixel on the whole encrypted image.
Let us consider the cryptograms $C_1$ and $C_2$, obtained using 1 processor from a Lena RGB image in 8-bit RGB format with a size of M × N, where M = N = 512 pixels, with a tiny difference of $1 \times 10^{-90}$ in the encryption key. According to [88], the critical values for the NPCR test in image encryption at the significance levels $N_\alpha$ are: $N_{0.05} = 99.5994\%$, $N_{0.01} = 99.5952\%$, and $N_{0.001} = 99.5906\%$. Table 4 shows the results of applying the NPCR test to the cryptograms obtained through the proposed method with the Tinkerbell map [73]. If the values are less than $N_\alpha$, it is considered that $C_1$ and $C_2$ fail the test according to [88]. In Table 4, it can be observed that the results achieved with the Tinkerbell map pass the test for all NPCR critical values according to [88]. It can also be observed that the NPCR results are at levels similar to those reported in related works [4,11,25,32,34,36,59,77].

Table 5 presents the NPCR results obtained with the Lena image of 512 × 512 × 3 using the FIAD cluster (see Figures 10 and 11) with multiprocessors and the Tinkerbell map [73] for the encryption of the information. In most cases the test is passed according to the critical values established by [88], although some results fall short of the minimum critical value by a hundredth or a thousandth. Therefore, it can be said that the security of the cryptosystem is not affected by the use of parallel computing.

Table 6 shows the UACI results achieved with the Lena image of 512 × 512 × 3 using the FIAD cluster (illustrated in Figures 10 and 11) and the Tinkerbell map [73] for the encryption of the information. In all cases, the UACI test is passed according to the critical values established by [88]. Therefore, it is verified that security is not affected by the use of parallel computing.
Regarding the security analysis against the NPCR and UACI differential attacks, it can be concluded that using more processes in parallel for the encryption of information does not affect security: as shown in Tables 4-6, the NPCR and UACI security levels remain satisfactory regardless of the number of processes used to encrypt the information.
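The two measures used in this section can be computed directly from a pair of cryptograms. The sketch below uses the usual definitions for 8-bit data (NPCR as the percentage of positions whose values differ, UACI as the mean absolute difference normalized by 255); the function names are hypothetical and the cryptograms are assumed to be flat byte sequences of equal length.

```python
def npcr(c1, c2):
    """Number of Pixels Change Rate, in percent."""
    if len(c1) != len(c2):
        raise ValueError("cryptograms must have the same length")
    changed = sum(1 for a, b in zip(c1, c2) if a != b)
    return 100.0 * changed / len(c1)


def uaci(c1, c2):
    """Unified Average Changing Intensity, in percent, for 8-bit values."""
    if len(c1) != len(c2):
        raise ValueError("cryptograms must have the same length")
    total = sum(abs(a - b) for a, b in zip(c1, c2))
    return 100.0 * total / (255.0 * len(c1))


# Toy example: one of four values changes by the full 8-bit range.
c1 = bytes([0, 0, 0, 0])
c2 = bytes([255, 0, 0, 0])
npcr_value = npcr(c1, c2)   # 25.0
uaci_value = uaci(c1, c2)   # 25.0
```

The resulting NPCR value would then be compared against the critical values $N_\alpha$ quoted in the text.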

Key Space
The key space is the total number of different keys that can be used in the encryption or decryption process [13]. For a cryptographic system to be effective and safe, the key space must be large enough to make a brute-force attack unfeasible [92]. The key of the proposed cryptosystem consists of two parts: (i) the dynamic keys generator, and (ii) the control parameters of the chaotic map. If the key space of an encryption algorithm is large enough, typically greater than 128 bits, it is considered safe for most cryptographic applications given the speed of current computers. According to the result reported in [73], the key space is $2^{2041}$ using the Tinkerbell map with a high precision of np = 99 significant decimals. In this work, however, the use of the FIAD cluster (illustrated in Figures 10 and 11) virtually increases the key space depending on the number n of processes used. That is, the key space is $2^{n \times 2041}$; therefore, a brute-force attack on the cryptosystem is unfeasible with current computers [13,93]. Table 7 lists the key space obtained by the proposed embedded cryptosystem using the Tinkerbell chaotic map [73] and a comparison with other related works [4,25,32,33,[35][36][37][38][39][40]. It can be observed that there is an exponential increase in the key space when using multiple precision in the numerical calculation, with np = 99 and np = 999 significant decimals, compared with the methods using double precision [4,25,32,33,[35][36][37][38][39][40]. Hence, Kerckhoffs's principle is met, which states that the security of the system must rest on the security of the key, assuming that the rest of the parameters of the cryptosystem are known [37,38,105]. The key space results achieved in this paper using multiple precision are greater than those reported in related works [4,11,13,25,[32][33][34][35][36][37][38][39]59,77].
Hence, the proposed embedded cryptosystem has a virtually unlimited key space, as in [4,35,37,38,73].
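To give a sense of scale for the key space figures quoted above, the size of $2^{n \times 2041}$ can be expressed in bits and in decimal digits. The sketch takes the exponent 2041 from the text; the function names are hypothetical.

```python
import math

BITS_PER_KEY = 2041  # exponent of the key space reported in [73] for np = 99


def key_space_bits(n_processes, bits_per_key=BITS_PER_KEY):
    """Exponent of the key space 2**(n * bits_per_key) for n processes."""
    return n_processes * bits_per_key


def decimal_digits(bits):
    """Number of decimal digits of 2**bits."""
    return math.floor(bits * math.log10(2)) + 1


single = decimal_digits(key_space_bits(1))    # digits of 2**2041
cluster = decimal_digits(key_space_bits(24))  # digits of 2**(24 * 2041)
```

With one process the key space is a number of 615 decimal digits; the exponent, and therefore the digit count, grows linearly with the number of processes.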

Performance Results Using Parallel Computing
It is also important to mention that, in parallel computing, when the communication time is greater than the total processing time, the efficiency of the system begins to drop and there is no performance gain; in this case, it is recommended to increase the number of processors or cores, as can be observed in more detail in Figure 14. In the same way, Figure 16 shows the result of the decryption process with an error in the parameter corresponding to the number n of processes involved in the encryption. When the proposed method is implemented on a Raspberry Pi 3, the execution times of the program with parallel processing are affected by the execution of other applications and services, so the theoretical estimates of the performance factor and system efficiency may differ from the measured execution results. In addition, because the image is divided and encrypted in separate parts, n encryptions are performed, each with its own encryption key (dynamic keys). Finally, the gain in execution time depends on the number of processes involved in the execution of the program and on the software and hardware resources assigned to each processor, with n = 16 processes being the optimal value for the FIAD cluster, which is within the total number of available cores (24 cores). For the personal computer with a 2.7 GHz CPU, the optimal value was n = 2, because its hardware is a dual-core CPU.

Conclusions
In this paper, the implementation of a chaotic cryptosystem using profiling and parallel computing techniques on a cluster of embedded systems with multiprocessors was presented. The experimental results show that the efficiency of the embedded cryptosystem is improved, and its behavior was verified to be consistent with Amdahl's law. To verify that the proposed method can be ported to other hardware with multiprocessing capability, it was also implemented on a personal computer with a 2.7 GHz CPU, a mobile dual-core processor based on the Trinity architecture. In both hardware implementations, the results of the security and performance analyses were satisfactory. The proposed method helps to find the optimal number of parallel processes to be used in the cryptosystem. It was verified that using more parallel processes than the available hardware (cores) can support degrades performance, as shown in Figures 13 and 14. A great advantage of using parallel computing on embedded systems is that the total execution time can be reduced by identifying the sections of the algorithm that can be run simultaneously, i.e., in parallel. In addition, it was verified that the algorithms that generate the chaotic series can be adapted to obtain a high degree of parallelization. Using several processors and multiple precision in the cryptosystem adds a greater degree of difficulty for cryptanalysts. Regarding the security analysis of the algorithm against different types of attacks, such as statistical, entropy, differential (NPCR, UACI), and key space attacks, it can be concluded that using more processors in parallel for the encryption of information does not affect security; as shown in Tables 3-7, the security levels remain satisfactory regardless of the number n of processes used.
Regarding the key space, when the Tinkerbell map is implemented with a high precision of np = 99 significant decimals and n processes are used in parallel for the encryption of information, the key space is virtually increased up to $2^{n \times 2041}$, where n is the number of parallel processes (see Table 7). Therefore, Kerckhoffs's principle is met. Finally, the proposed cryptographic method can be implemented in practical applications and on different types of hardware with multiprocessing capabilities.