Computing a Group of Polynomials over a Galois Field in FPGA Architecture

A wide range of tasks, such as real-time data processing in intelligent transport systems, requires advanced computing hardware, including field-programmable gate arrays (FPGAs). This paper proposes a method for estimating in advance the hardware complexity of computing a group of polynomial functions on an FPGA, as a function of the number of input variables of these functions. For a group of polynomial functions, the estimates are reduced because the values of the elementary polynomials common to the group are computed only once. The implementation uses similar software IP-cores adapted to the FPGA architecture, which includes lookup tables and D flip-flops. As a result, pipelined data processing provides the highest operating speed of a device that implements the group of polynomial functions defined over a Galois field, independently of the number of variables of these functions. Because the polynomials of a group are computed from common variables, the input/output blocks of FPGAs are not a significant limiting factor for the hardware complexity estimates. The estimates obtained with the proposed method make it possible to evaluate the amount of reconfigurable FPGA resources required to implement a group of polynomial functions defined over a Galois field. This applies both to existing FPGAs and to prospective ones that have not yet been implemented.


Introduction
To ensure the seamless operation of intelligent transport systems (ITSs), large amounts of monitoring information of various formats, purposes, and confidentiality levels must be processed in real time [1][2][3]. General- and special-purpose computing machinery is essential to this processing. Computer hardware (CHW) can be sped up in two ways. The first, extensive, way requires continuously increasing computational capacity and developing special-purpose CHW focused on a predefined scope of tasks. The second, intensive, way requires flexibly adapting the CHW to a particular task, in particular abandoning the classical von Neumann common-bus architecture. Unlike the first, the second way makes it possible to implement devices that are faster than general-purpose CHW. Examples of special-purpose computing devices that use field-programmable gate arrays (FPGAs) [4] are given in [5,6]. High-speed CHW devices that can implement different distributed data processing algorithms at different times are relevant to various ITS-related tasks, such as data processing in large databases [7], analysis of discrete Markovian processes [8,9], and nonlinear filtering of data [10].
To solve these problems, FPGAs [4] can be used. They include reconfigurable logic elements, i.e., lookup tables with x inputs (LUT(x)s), D flip-flops, and input/output blocks (IOBs). In [11][12][13], an approach is presented that reduces the problem of processing data arrays (exemplified by maps) to implementing similar FPGA-based operations. Several studies [14][15][16] show that the problem of implementing an arbitrary map of one set of elements into another is reduced to the distributed computation of a group of nonlinear polynomial functions (polynomials, functions) of a given number of variables defined over a Galois field $GF(2^k)$ of a certain degree [17]. A cooperative distributed FPGA-based computation method is proposed for these polynomials. We obtain hardware complexity estimates for a group of polynomials in terms of the number of reconfigurable FPGA elements. It is shown that computing the elementary polynomials common to the functions of the group only once considerably saves the reconfigurable logic resources of the FPGA. Because the group of functions is computed in a pipeline built from similar IP-cores, the operation delay of the pipelined device computing the set of polynomials on the FPGA depends only weakly on the number of input variables of these functions.
The article contains three main sections: (1) Basic Terms and Definitions; (2) Hardware Complexity of Implementing a Group of Polynomials in the FPGA Architecture; and (3) Discussion. Section 1 introduces the concept of a polynomial of m variables and the operations on elements of a Galois field in accordance with [17]. Statements are made regarding the hardware and time complexity of the pipelined implementation of operations on elements of the Galois field in the FPGA architecture. The concept of a similar IP-core corresponding to the FPGA architecture is introduced [16]. According to [14][15][16], computing the polynomials of a group separately involves multiple recomputations of the partial polynomial functions common to the group, which significantly increases the complexity estimates in terms of the number of such IP-cores. Section 2 presents and theoretically justifies a representation of a group of polynomials over a Galois field. The complexity of implementing this group is estimated by the number of similar IP-cores, and the proportion of IP-cores required to compute the elementary polynomial functions common to the group is determined. Section 3 considers how the proposed method can be used to evaluate the characteristics of a group of polynomials (the number of variables, the number of polynomials, and the dimension of the Galois field) that are acceptable for implementation on a given FPGA device, both existing and prospective.

Basic Terms and Definitions
Let us define a polynomial of $m$ variables over a Galois field $GF(2^k)$ [17]:

$$f(x_1, \dots, x_m) = \sum_{i_1=0}^{r} \cdots \sum_{i_m=0}^{r} a_{i_1 \dots i_m}\, x_1^{i_1} \cdots x_m^{i_m} \qquad (1)$$

by defining the values of an $m$-dimensional matrix of coefficients $A = \|a_{i_1 \dots i_m}\|$, $r = 2^k - 1$. The symbol $\Sigma$ in (1) denotes the bitwise modulo-2 sum of the elementary polynomials $a_{i_1 \dots i_m} x_1^{i_1} \cdots x_m^{i_m}$, $i_j = 0, \dots, r$. A polynomial function of the form (1) is representable as a self-similar formula.
The coefficients of this polynomial are given by a two-dimensional matrix. Let us consider implementing a polynomial represented as (1) on an FPGA. According to [14], let us introduce the following statements.

Statement 1. $k$ LUT(x)s, $x \ge 2k$, allow performing the following operations: raising to the power $i$, $i = 1, \dots, r$, and multiplying two elements of $GF(2^k)$; multiplying two elements of $GF(2^k)$ by a constant, followed by addition; and the bitwise modulo-2 sum of $x$ elements of $GF(2^k)$.

Statement 2. The basic element for implementing a polynomial represented as (1) on an FPGA is an IP-core that includes $k$ LUT(x)s, $x \ge 2k$, and $k$ D flip-flops.

Statement 2 defines the hardware complexity of the basic element in terms of the number of programmable FPGA elements, i.e., $k$ LUT(x)s and $k$ D flip-flops. The pipelined operation delay of a device built from these modules does not depend on the number of variables of polynomial (1) and is defined by formula (2), where $t_D$, $t_{IC}$, and $t_{LUT(x)}$ are the operation delays of the D flip-flops, interconnections, and LUT(x)s of a given FPGA, while $t_{in}$ and $t_{out}$ are the delays of the IOBs that handle information input and output within the package of a given FPGA, respectively. The interconnection delay $t_{IC}$ is computed for a given device using a particular computer-aided design system, such as Vivado 2020.1 (Xilinx, Inc., San Jose, CA, USA) or Quartus II (Intel, Inc., Santa Clara, CA, USA). For example, according to [4], for the Virtex UltraScale device XCVU065, the operation delay $t_D$ in formula (2) is 2.36 ns, while $t_{in}$ and $t_{out}$ are approximately 0.42 ns and 0.66 ns, respectively. The value of $t_{LUT(x)}$ ($x = 6$) is less than the values of $t_{in}$ and $t_{out}$. According to [15,18], the interconnection delay does not exceed 70% of the total operation delay. As a result, according to (2), the upper bound of the pipelined operation delay of the XCVU065 device is 2.36 + (2.36 + 0.66)·0.7 + 0.66 = 5.134 ns.
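The arithmetic behind this upper bound can be reproduced with a minimal Python sketch. The function name and the grouping of terms are assumptions read off the numerical expression above, not a restatement of formula (2); the LUT delay is replaced by its bound of 0.66 ns, and 0.7 is the interconnection share reported in [15,18].

```python
def pipeline_delay_upper_bound(t_d, t_lut_bound, ic_share=0.7):
    """Upper bound on the pipelined operation delay, in ns.

    Mirrors the worked expression in the text:
        t_D + (t_D + t_LUT_bound) * ic_share + t_LUT_bound
    The grouping is an assumption inferred from that expression.
    """
    interconnect_bound = (t_d + t_lut_bound) * ic_share
    return t_d + interconnect_bound + t_lut_bound


# Values quoted in the text for the XCVU065 device (ns).
bound_ns = pipeline_delay_upper_bound(t_d=2.36, t_lut_bound=0.66)
print(f"{bound_ns:.3f} ns")
```

Keeping the interconnection share as a parameter makes it easy to see how the bound moves for a device whose routing delay is a smaller fraction of the total.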
The problem of implementing a broad class of digital CHW devices is reduced to the problem of implementing an arbitrary map of the elements of a set $F_1$ into a set $F_2$. In [15], it is shown that, if $|F_1| \le 2^{km}$ and $|F_2| \le 2^{kE}$, this map can be represented by $E$ polynomials of $m$ variables over the Galois field $GF(2^k)$.
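To make these definitions concrete, the following Python sketch evaluates a group of polynomials of the form (1) over GF(2^2) in software, computing each elementary product once and reusing it for every polynomial of the group. It is only an illustration under stated assumptions: field elements are stored as 2-bit integers, the reduction polynomial x^2 + x + 1 is an assumed choice, and all function names are hypothetical. It does not model the LUT-level IP-cores.

```python
from itertools import product

K = 2                 # field GF(2^K); K = 2 is chosen for illustration
R = (1 << K) - 1      # r = 2^k - 1, the largest exponent in (1)
IRREDUCIBLE = 0b111   # x^2 + x + 1, an assumed reduction polynomial


def gf_mul(a, b):
    """Multiply two elements of GF(2^K) given as K-bit integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^K) is bitwise XOR
        b >>= 1
        a <<= 1
        if a >> K:               # degree reached K: reduce
            a ^= IRREDUCIBLE
    return result


def gf_pow(a, i):
    """Raise a to the integer power i >= 0 (a^0 is taken as 1)."""
    result = 1
    for _ in range(i):
        result = gf_mul(result, a)
    return result


def monomials(xs):
    """Values of all products x_1^{i_1}...x_m^{i_m}, computed once."""
    table = {}
    for exps in product(range(R + 1), repeat=len(xs)):
        value = 1
        for x, i in zip(xs, exps):
            value = gf_mul(value, gf_pow(x, i))
        table[exps] = value
    return table


def eval_group(coeff_list, xs):
    """Evaluate E polynomials of form (1) sharing one monomial table."""
    shared = monomials(xs)
    outputs = []
    for coeffs in coeff_list:    # coeffs: {exponent tuple: coefficient}
        acc = 0
        for exps, a in coeffs.items():
            acc ^= gf_mul(a, shared[exps])
        outputs.append(acc)
    return outputs


# Two hypothetical polynomials of m = 2 variables:
# f1 = x1*x2 + 1 and f2 = 2*x1^2 + x2 (coefficients are field elements).
f1 = {(1, 1): 1, (0, 0): 1}
f2 = {(2, 0): 2, (0, 1): 1}
print(eval_group([f1, f2], xs=(2, 3)))
```

Sparse dictionaries are used for the coefficients so that zero entries of the coefficient matrix cost nothing, which loosely mirrors skipping elementary polynomials whose coefficients are zero.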

Hardware Complexity of Implementing a Group of Polynomials in the FPGA Architecture
According to Statement 1, let us find the hardware complexity estimates of implementing a group of $E$ polynomials represented as (1). The values of $Z^{(j)}_{1-E}$ and $Z_e$, $j = 1, \dots, m$, $e = 1, \dots, E$, are defined in accordance with Definitions 1-3. The total number of IP-cores required to compute a group of $E$ polynomials represented as (1) is

$$N_{EP} = N_1 + N_2 + N_3. \qquad (3)$$

Based on Definitions 1-3 and the above, Theorem 1 holds.

Theorem 1.
The hardware complexity of computing a group of $E$ polynomials represented as (1) of $m$ variables over $GF(2^k)$ in the FPGA architecture, measured by the number of IP-cores, is calculated according to (3); the operation delay is estimated according to (2); and the number of FPGA pins is $k(m + E)$.
Let us analyze the hardware complexity estimates of implementing a group of polynomials in the FPGA architecture. Figures 1 and 2 show the dependence of the total number of IP-cores, $N_{EP}$, for a group of polynomials represented as (1) over $GF(2^2)$ and $GF(2^3)$, respectively, on the number of variables $m$ and on the number of polynomials in the group, $E$. According to Figure 1, the estimated number of IP-cores for a group of polynomials over $GF(2^2)$ ranges from 59 ($m = 2$, $E = 5$) to approximately $3.44 \cdot 10^6$ ($m = 9$, $E = 12$). For a group implemented over $GF(2^3)$ (see Figure 2), these estimates range from 163 ($m = 2$, $E = 5$) to approximately $23.0 \cdot 10^6$ ($m = 7$, $E = 10$). As the number of variables increases, the estimate $N_{EP}$ grows exponentially, while the growth of $E$ results in a practically linear increase in this estimate.

Also of interest is the contribution of $N_1$, the hardware complexity estimate for computing the elementary polynomials common to all $E$ functions, to the total complexity estimate $N_{EP}$. According to Figures 3 and 4, the $N_1/N_{EP}$ ratio grows linearly as the number of variables increases. However, the higher the value of $E$, the more slowly $N_1/N_{EP}$ grows. This observation is true for both $GF(2^2)$ and $GF(2^3)$. It is explained by the fact that the number of IP-cores required for computing elementary polynomials increases exponentially with a linear increase in the number of variables $m$.

The value $N_1/N_{EP}$ shows the effect of implementing a group of $E$ polynomials according to the proposed method compared with implementing each of the $E$ polynomials separately over $GF(2^2)$ and $GF(2^3)$. For $GF(2^2)$ (see Figure 3), $N_1/N_{EP}$ varies from 15.3% ($m = 2$, $E = 5$) to 42.8% ($m = 9$, $E = 12$). For $GF(2^3)$ (see Figure 4), it varies from 30.1% ($m = 2$, $E = 5$) to 46.8% ($m = 7$, $E = 10$).
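The exponential dependence on the number of variables can be illustrated by counting the exponent tuples in (1): each exponent runs over 0, ..., r with r = 2^k - 1, so the coefficient matrix has (r + 1)^m = 2^(km) entries. The Python sketch below only counts these candidate elementary polynomials; it is not the IP-core estimate N_1, and the function name is hypothetical.

```python
def elementary_polynomial_count(k, m):
    """Number of exponent tuples (i_1, ..., i_m) with 0 <= i_j <= 2^k - 1."""
    r = 2 ** k - 1
    return (r + 1) ** m   # equals 2^(k*m)


# Largest cases mentioned in the text for each field.
for k, m in [(2, 2), (2, 9), (3, 2), (3, 7)]:
    print(f"k={k}, m={m}: {elementary_polynomial_count(k, m)} tuples")
```

Adding one variable multiplies the count by 2^k, which is consistent with the exponential growth in m described above.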
According to [15], a given system of $E$ polynomials of $m$ variables over $GF(2^k)$ can be implemented on one FPGA chip provided that conditions (4) are met.

Five elementary polynomials do not need to be calculated, since their coefficients are zero in both $f^{(1)}(x_1, x_2)$ and $f^{(2)}(x_1, x_2)$; according to Definition 2, $Z^{(1)}_{1-E} = Z^{(2)}_{1-E} = 1$; and for any nonzero value $x_j \in GF(2^2)$, $x_j^3 = 1$, so it is not necessary to raise $x_j$, $j = 1, 2$, to this power. The result of stage 2 for each of the polynomials $f^{(1)}(x_1, x_2)$ and $f^{(2)}(x_1, x_2)$ is represented as matrices. Stage 3 consists of the bitwise addition of the elementary polynomials and constants represented in these matrices.

Discussion
What is the advantage of the proposed approach to implementing a group of polynomials over a Galois field? Let us return to formula (3). If the $E$ polynomials of the group are calculated separately, the hardware complexity is $N^S_{EP} = E \cdot N_1 + N_2 + N_3$, that is, the value of $N_1$ is increased $E$ times. As a result, the complexity estimate $N_{EP}$ of the joint implementation of a group of $E$ polynomials over a Galois field is smaller than the estimate $N^S_{EP}$ of their separate implementation by a factor of

$$\frac{N^S_{EP}}{N_{EP}} = 1 + (E - 1)\,\frac{N_1}{N_{EP}}. \qquad (5)$$
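The saving can be checked numerically with a short Python sketch. It assumes, as the comparison above implies, that the joint estimate is N_1 + N_2 + N_3, so that the separate implementation adds (E - 1)·N_1 IP-cores; the function name is hypothetical. The shares used are the N_1/N_EP values quoted in Section 2 for GF(2^2).

```python
def joint_gain(n1_share, e):
    """Ratio N_S_EP / N_EP given n1_share = N_1 / N_EP and E polynomials.

    Follows from N_S_EP = E*N_1 + N_2 + N_3 and the assumed
    joint estimate N_EP = N_1 + N_2 + N_3.
    """
    return 1.0 + (e - 1) * n1_share


# N_1/N_EP shares quoted for GF(2^2): 15.3% (m=2, E=5), 42.8% (m=9, E=12).
print(round(joint_gain(0.153, 5), 2))
print(round(joint_gain(0.428, 12), 2))
```

Because the gain is linear in both E and the share of common IP-cores, large groups of polynomials with many variables benefit most from the joint implementation.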
The value $N_1/N_{EP}$ for a group of $E$ polynomials over the Galois fields $GF(2^2)$ and $GF(2^3)$ is shown for given $m$ and $E$ in Figures 3 and 4, respectively. By analogy, the estimate (5) ranges for $GF(2^2)$ from 1.61 ($m = 2$, $E = 5$) to 5.71 ($m = 9$, $E = 12$) and for $GF(2^3)$ from 1.90 ($m = 2$, $E = 3$) to 5.68 ($m = 7$, $E = 9$). These estimates are shown in Figures 5 and 6.

For example, let us consider the XC7V585T FPGA of the Virtex-7 family. This FPGA includes 585,720 LUT(6)s, 728,400 D flip-flops, and 850 IOBs. The utilization factor of each of these reconfigurable elements is taken to be 0.5. According to Theorem 1, FPGA-based implementation of a group of $E$ polynomials of $m$ variables over $GF(2^2)$ requires $2(m + E)$ IOBs, while that over $GF(2^3)$ requires $3(m + E)$ IOBs. According to Statement 2, implementing each IP-core over the elements of $GF(2^2)$ requires two LUT(6)s and two D flip-flops, while implementing each IP-core over $GF(2^3)$ requires three LUT(6)s and three D flip-flops. According to inequality (4) and formula (3), one XC7V585T FPGA can implement groups of, at most:
In all the above cases of implementing the maps on the XC7V585T FPGA, the limiting factor is the number of reconfigurable LUT(6)s.
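A resource check of this kind can be sketched in Python as follows. The sketch is an assumption-laden illustration rather than a restatement of inequality (4): it takes the IP-core count N_EP as an input, charges k LUTs and k D flip-flops per IP-core and k(m + E) IOBs for the group, and compares each demand with the device totals scaled by the utilization factor. The class and function names are hypothetical; the device figures are those quoted in the text.

```python
from dataclasses import dataclass


@dataclass
class FpgaResources:
    luts: int
    flip_flops: int
    iobs: int


def fits(n_ip_cores, k, m, e, fpga, use_factor=0.5):
    """Check whether a group of E polynomials fits a device.

    n_ip_cores is the IP-core estimate N_EP, supplied by the caller.
    Each IP-core takes k LUTs and k D flip-flops (Statement 2);
    the group needs k*(m + E) IOBs (Theorem 1).
    The comparison itself is an assumed form of the feasibility test.
    """
    need = FpgaResources(k * n_ip_cores, k * n_ip_cores, k * (m + e))
    return (need.luts <= use_factor * fpga.luts
            and need.flip_flops <= use_factor * fpga.flip_flops
            and need.iobs <= use_factor * fpga.iobs)


# Figures quoted in the text for the XC7V585T.
xc7v585t = FpgaResources(luts=585_720, flip_flops=728_400, iobs=850)

# N_EP values quoted for GF(2^2): 59 (m=2, E=5), about 3.44e6 (m=9, E=12).
print(fits(59, k=2, m=2, e=5, fpga=xc7v585t))
print(fits(3_440_000, k=2, m=9, e=12, fpga=xc7v585t))
```

With these figures the LUT budget is exhausted long before the flip-flop or IOB budgets, which agrees with the observation that LUTs are the limiting resource.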
The proposed method makes it possible to estimate whether a given group of $E$ polynomials of $m$ variables each, defined over $GF(2^k)$, can be implemented on a given FPGA device. The degree of the field $GF(2^k)$, $k = 2, 3$, is chosen with reference to the features of existing FPGA devices, which implement LUT(4) and LUT(6) (see Statement 1). A similar study can be performed for any FPGA, both existing and prospective.

Conclusions
Using the proposed method, we obtained hardware complexity estimates for the distributed computation of a group of $E$ polynomials of $m$ variables over the field $GF(2^k)$. Based on these estimates, it is possible to determine the values of $E$ and $m$ for which the group of polynomials can be implemented on an FPGA with a given number of reconfigurable elements available to the user. The method is suitable for estimating the hardware complexity of implementing a group of polynomials on both existing and prospective FPGAs.
When a pipeline is arranged, the average operating delay of a device implementing the FPGA-based computation of a group of polynomials depends only weakly on the number of polynomials in the group and on the number of their input variables. Owing to the cooperative computation of the polynomials of the group, the logic resources of the FPGA are saved considerably. A relatively small set of elements, defined by the input variables common to the group of polynomials, can be mapped into a much larger output set of values, whose cardinality considerably exceeds that of the input set.
The technique presented herein is a tool for the FPGA-based implementation of the intensive way of increasing data array processing speed using specialized hardware. Owing to the reconfigurable elements available in the FPGA architecture, different maps of one set into another can be implemented at different time intervals. This result is relevant to developing a hardware platform for intelligent transport systems, whose role is becoming increasingly important in the modern information society.
Funding: The publication of this work was funded by the Association for the Advancement of Digital Development (https://aсцр.рф, accessed on 18 October 2021).