An Improved Scheduling Algorithm for Data Transmission in Ultrasonic Phased Arrays with Multi-Group Ultrasonic Sensors

High data transmission efficiency is a key requirement for an ultrasonic phased array with multi-group ultrasonic sensors. Here, a novel FIFO scheduling algorithm is proposed that improves the data transmission efficiency in hardware. The algorithm uses FIFOs as caches for the ultrasonic scanning data obtained from the sensors and outputs the data in a bandwidth-sharing way. On this basis, an optimal length ratio of all the FIFOs is derived, allowing the reading operations to be switched among all the FIFOs without time slot waiting. The algorithm therefore enhances the utilization ratio of the reading bandwidth resources and obtains higher efficiency than traditional scheduling algorithms. The reliability and validity of the algorithm are substantiated by its implementation in field programmable gate array (FPGA) technology, which enhances the bandwidth utilization ratio and the real-time performance of the ultrasonic phased array.


Introduction
The technology of multi-group ultrasonic sensors that consist of lots of piezoelectric elements and various scanning patterns of an ultrasonic phased array (UPA) have recently attracted widespread attention in the non-destructive testing area [1,2]. The UPA produces a series of the ultrasonic waves controlled by the amplitudes and phases of the electrical pulses to excite a series of elements of the sensors. The waves can easily penetrate inside some materials by adjusting their radiation direction to synthesize flexible and rapidly focused scanning ultrasonic beams. The parameters of beams such as angles, focal distances, and focal spot sizes can be readily tuned with suitable software. Therefore, the beams can be used to detect defects that possibly occur at random positions of the materials [3][4][5][6].
To increase the focusing ability, a UPA instrument is often equipped with multiple ultrasonic sensors to collect the ultrasonic echo data from different directions. Each sensor can work in one or more groups so that a variety of scanning modes are generated [7][8][9][10]; this is called multi-group scanning, and each group scan includes many focused beams. Hence, the number of sensors and the number of scanning groups are two important factors that determine the accuracy with which the size, location, and orientation of defects are detected [11,12]. For example, Song et al. verified that a large-aperture hemispherical phased array can restore a sharp focus and maximize acoustic energy delivery at target tissue [11]. Regardless of the orientation of individual focused beams, the multiple focused beams can change their focal depths and sweeping angles through phase interference. As a consequence, the position and the size of defects can be detected precisely by increasing the number of sensors, scanning groups, and focused beams. However, this strategy in turn significantly increases the amount of scanning data produced during defect detection, which makes it difficult to transmit these data to a peripheral through a single high-speed serial bus (or a small number of them) and subsequently produce an ultrasound image.
Each focused beam often has a different sampling rate and data stream size. During the transmission process, the data streams compete with each other for access to the single high-speed serial bus. A good transmission scheduling algorithm should allow all the data streams to be transferred serially to a peripheral without any blocking. Otherwise, the data streams would be blocked or severely delayed. Therefore, it is very desirable to design an effective algorithm to serially transmit a large number of data streams. Several well-known scheduling algorithms have been proposed, such as Time Division Multiple Access (TDMA) [13] and Round Robin (RR) [14]. The verification, analysis, and comparison of the two algorithms were presented in [15], which shows that the TDMA strategy, based on the fixed allocation of a time slot to each master process, may lead to significant latencies, whereas the RR protocol allows any unused slots to be reallocated to a master process to provide higher bandwidth. Unfortunately, the reallocation makes the time slice resources more fragmented and increases the complexity of the scheduling algorithm. Multiple implementation examples of scheduling algorithms are available in the open literature [16][17][18][19][20][21][22][23]. Srinivasan et al. designed a self-configuring scheduling protocol for ultrasonic sensor systems by using a timeslot allocation algorithm, which simplified the deployment of the detection system [16]. Long et al. proposed a time-division-multiple-access-based energy consumption balancing algorithm for general k-hop wireless sensor networks, where one data packet is collected in one cycle, and the results demonstrated the effectiveness of the algorithm in terms of energy efficiency and time slot scheduling [19].
However, although these strategies can effectively improve the efficiency of the data transmission, they increase the complexity of both hardware and software, and their application scopes are limited. This makes them unsuitable for the UPA multi-group sensor scanning system, which has limited hardware and software resources and demanding real-time requirements.
An FPGA (field-programmable gate array) supports repeatable programming of a static system and reconfiguration of a dynamic system, so that hardware can be modified programmatically; it is also a special kind of ASIC with the advantages of parallel processing, high speed, and flexibility. In this paper, we use a series of FIFOs as high-speed caches, with cache times as weights, to propose a novel multi-FIFOs and bandwidth-sharing scheduling (MFBSS) algorithm for the data transmission, where the lengths of the FIFOs are obtained from a series of multivariate equations. The algorithm offers real-time operation and high efficiency when it is implemented by FPGA technology. For the UPA system of the array sensors, we designed a data stream transmission scheduling mode based on the MFBSS algorithm, in which the reading operations of all the FIFOs share a fast reading bus without time slot waiting when the reading bus switches between any two FIFOs. Hence, the algorithm gives the maximum bandwidth utilization ratio and improves the real-time performance of the UPA instruments with minimal consumption of time and space resources.
In Section 2 of the paper, we describe the data transmission of ultrasonic scanning for UPA [24][25][26]. In Section 3, we study the scheduling mechanism of the MFBSS algorithm for the data transmission. Section 4 describes the results of implementing the scheduling algorithm by FPGA technology. Finally, Section 5 summarizes the research and draws the conclusions.
Figure 1 shows the UPA data transmission framework of the bandwidth-sharing with multiple scanning patterns [7][8][9][10]. In order to realize the optimal sampling of the UPA's echoes, echoes of different frequencies should be digitized with different sampling frequencies [27][28][29]. A sensor with a frequency of $f_p$ Hz produces ultrasonic echoes with the same frequency after excitation, and thus the sampling frequency is $f_s = K \times f_p$ Hz (K is a scaling factor, and K ≥ 2). Hence, N-group sensors can form N-group scanning patterns, generating N sampling frequencies ($f_{s0}$ to $f_{s(N-1)}$, where 0 and N − 1 are the group indices) and forming N focusing beams with specific speeds and sizes. As shown in Figure 1, the data of the scanning groups Gp$_0$, Gp$_1$, …, Gp$_{N-1}$ produced from the ultrasonic sensors are written into FIFO$_0$, FIFO$_1$, …, FIFO$_{N-1}$, respectively, and are then cached by a DDR3 through the Avalon bus in the bandwidth-sharing way [30]. Then, the data from the DDR3 are transmitted to the host computer through the PCIe bus [31,32].
The entire data transmission process is controlled by a bandwidth scheduler, which is composed of a controller holding all the FIFOs' lengths and a reading arbiter, and which usually runs a scheduling algorithm such as First Come First Serve (FCFS), TDMA, or Equal Time Slice Polling Scheduling (ETSPS); the latter is based on the principle of RR scheduling and is discussed in Section 4. This paper adopts the MFBSS algorithm to realize reading operations from every FIFO without time slot waiting by adjusting the lengths of the FIFOs, the timings of the reading and writing, and the priority of the interrupts. Therefore, this algorithm can not only ensure the synchronization of the data transmission but also maximize the bandwidth utilization in all groups, and it is readily implemented by FPGA technology with parallel processing.
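As a minimal sketch of the sampling relation $f_s = K \times f_p$ described above, the following Python snippet computes the sampling frequency of each group. The function name `sampling_frequencies` is hypothetical; the example values (sensor frequencies of 2, 2.5, 5, and 10 MHz with K = 10) are those of the experiment in Section 4.

```python
def sampling_frequencies(sensor_freqs_hz, k=2.0):
    """Return f_s = K * f_p for each sensor group (K >= 2 is required)."""
    if k < 2:
        raise ValueError("K must be at least 2")
    return [k * fp for fp in sensor_freqs_hz]


# Sensor frequencies of the four groups used in the experiment, with K = 10.
fp = [2e6, 2.5e6, 5e6, 10e6]
fs = sampling_frequencies(fp, k=10)
print([f / 1e6 for f in fs])  # sampling frequencies in MHz
```

Each group therefore produces data at its own rate, which is why the FIFOs in Figure 1 fill at different speeds.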

The Principle of the Maximal Bandwidth Utilization
To evaluate the utilization ratio of the data transmission bandwidth of the N-group scanning in the multi-input and single-output interfaces of the UPA system, the following requirements must be satisfied:
• The data transmission models Gp(n), n = 0, 1, …, N − 1, are independent of each other and identically distributed for every group.
• The sum of the data bandwidths of all the groups, the sum of the memory bandwidth ($\sum B_{v-RAM}$), and the sum of the transmission bandwidth ($\sum B_{v-Trans}$) of the peripheral satisfy the inequality of Equation (1).
The defined parameters of the N-group scanning and the N FIFO caches are listed in Table 1. When Equation (1) becomes an equality, the maximum bandwidth utilization ratio is achieved, i.e., the single-output bandwidth equals the sum of the multi-input bandwidths from the FIFOs. Consequently, with $V_W(n)$ the writing bandwidth of FIFO$_n$ and $V_R$ the shared reading bandwidth, the mathematical principle of the maximal bandwidth utilization ratio can be written as Equation (2):
$$V_R = \sum_{n=0}^{N-1} V_W(n) \qquad (2)$$
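The condition behind Equation (2) can be illustrated with a short Python sketch. The function names are hypothetical and the bandwidth values are arbitrary illustrative numbers, not data from the paper; the sketch only assumes that the shared reading bandwidth must be at least the sum of the writing bandwidths, with equality giving maximal utilization.

```python
def required_read_bandwidth(write_bandwidths):
    """Equation (2): V_R equals the sum of V_W(n) over all groups."""
    return sum(write_bandwidths)


def bandwidth_headroom(write_bandwidths, read_bandwidth):
    """Spare read bandwidth; zero means the maximal utilization is reached."""
    headroom = read_bandwidth - sum(write_bandwidths)
    if headroom < 0:
        raise ValueError("read bandwidth too small: the FIFOs would overflow")
    return headroom


v_w = [10, 20, 30]                    # illustrative write bandwidths
print(required_read_bandwidth(v_w))   # read bandwidth that meets Equation (2)
print(bandwidth_headroom(v_w, 64))    # unused read bandwidth if V_R = 64
```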

Realization of the Maximal Bandwidth Utilization Ratio
According to Equation (2), the following conditions are assumed:
• At the moment $T_i^{j}$, when FIFO$_i$ has been read until empty, the reading operation of FIFO$_i$ is disabled.
• At the next moment $T_i^{j+1}$, when FIFO$_i$ is full and the amount of data is L(i) (i = 0, 1, …, N − 1), the reading operation of FIFO$_i$ is enabled.
When FIFO$_i$ goes from empty to full, a reading interrupt is produced and FIFO$_i$ gains access to the Avalon bus for reading. During the time that FIFO$_i$ takes to fill, each of the other FIFOs (all indices from 0 to N − 1 except i) has gone from full to empty. The time slot transition diagram of the reading operations of the N FIFOs is shown in Figure 2.
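The timing balance implied by these conditions can be sketched as follows. This is a reconstruction under a stated assumption, namely that FIFO$_k$ keeps being written at $V_W(k)$ while it is read at $V_R$, so that its net drain rate is $V_R - V_W(k)$; the symbols $t_{\mathrm{fill}}$, $t_{\mathrm{drain}}$ are introduced here for illustration, and the form is not necessarily identical to Equations (3) to (10).

```latex
% Assumption: FIFO_k is written at V_W(k) while it is being read at V_R.
\begin{align*}
t_{\mathrm{fill}}(i) &= \frac{L(i)}{V_W(i)}, \qquad
t_{\mathrm{drain}}(k) = \frac{L(k)}{V_R - V_W(k)}, \\
t_{\mathrm{fill}}(i) &= \sum_{k \neq i} t_{\mathrm{drain}}(k), \qquad i = 0, 1, \ldots, N-1, \\
L(i) = \alpha\, V_W(i)\bigl[V_R - V_W(i)\bigr]
 &\;\Longrightarrow\;
 \alpha \bigl[V_R - V_W(i)\bigr] = \alpha \sum_{k \neq i} V_W(k).
\end{align*}
```

The last equality holds exactly when $V_R = \sum_{n} V_W(n)$, i.e., when Equation (2) is satisfied. Under this assumption, lengths proportional to $V_W(i)[V_R - V_W(i)]$ form a one-parameter family of solutions (scaled by $\alpha > 0$) of the balance conditions.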
Equation (3) is transformed into the matrix form of Equation (5). The matrix A is reduced by elementary row transformations to a triangular form, which is done by using the recursion of Equation (6). According to Equation (4), $f_K(x_{i+1})$ can be described as Equation (7).
For the N-group scanning of the UPA system, when i = N, the matrix A can be transformed to A′ through elementary row transformations. Because the rank R(A) of the matrix A and the rank R(A′) of the matrix A′ satisfy R(A) = R(A′) < N, Equation (5) has an infinite number of solutions, and because A · L = 0 ⇔ A′ · L = 0, the solutions can be expressed in terms of L(i + 1) (i = 0, 1, …, N − 2), from which L(i) can be deduced step by step, as in Equation (8). Substituting the expression of $f_K(x_{i+1})$ from Equation (7) into Equation (8), the values of L(i), i = 0, 1, …, N − 1, are obtained, as shown in Equation (9).
When Equation (9) is multiplied by a scaling term, a set of fundamental solutions ξ to the equations A · L = 0 is obtained. Therefore, the solutions to the equations A · L = 0 can be expressed as L = α · ξ (α ∈ R+). The lengths L(i), i = 0, 1, …, N − 1, of the FIFOs have a proportional relation, as shown in Equation (10).
Equation (10) is the key result for realizing the MFBSS algorithm, which shares the transmission bandwidth among the N scanning groups of the UPA system. According to the ratios of the FIFOs' lengths, i.e., the cache time of each FIFO, the reading operation can be switched from one FIFO to the next without time slot waiting, thus maximizing the bandwidth utilization ratio.
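As an illustration of how a proportional length relation can be reduced to small integers, the sketch below assumes the rule L(i) ∝ V_W(i) · (V_R − V_W(i)) with V_R = Σ V_W(n). This rule is an assumption derived from a simple fill/drain timing model (each FIFO is still written while it is read); it is not confirmed to be the exact form of Equation (10), and the resulting numbers are not claimed to match the paper's tables. The function name `fifo_length_ratio` is hypothetical.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def fifo_length_ratio(write_rates):
    """Smallest integer ratio L(0):L(1):... under the assumed length rule."""
    rates = [Fraction(r) for r in write_rates]
    read_rate = sum(rates)  # Equation (2): V_R is the sum of the V_W(n)
    # Assumed rule: L(i) proportional to V_W(i) * (V_R - V_W(i)).
    weights = [w * (read_rate - w) for w in rates]
    # Clear denominators, then divide by the common factor.
    common_den = reduce(lambda a, b: a * b // gcd(a, b),
                        (w.denominator for w in weights))
    ints = [int(w * common_den) for w in weights]
    common = reduce(gcd, ints)
    return [v // common for v in ints]


# Write word rates (MHz) consistent with the experiment in Section 4.
print(fifo_length_ratio(["2.5", "3.125", "6.25", "12.5"]))  # -> [14, 17, 29, 38]
```

Exact rational arithmetic is used so that the ratio is not distorted by floating-point rounding.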
When the algorithm is implemented by an FPGA, in order to minimize the resources consumed by the FIFOs, the ratio L(0):L(1):…:L(N − 1) can often be simplified to a suitable integer ratio. In the system of the N-group scanning and the N FIFO caches, if the sampling rate $f_{sn}$ … (Table 2).
When a FIFO is full, it is immediately read until empty (the symbol R represents the reading state of the FIFO), and the reading then switches to the next FIFO without any idle time slot in the process of the data transmission. Likewise, when the next FIFO has just been written full, it is read immediately. Therefore, the whole process is carried out in cycles without any delay, maximizing the utilization ratio of the data transmission bandwidth.
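The gap-free rotation described above can be checked with a small exact simulation. This is a sketch under assumptions, not the authors' FPGA logic: each FIFO is assumed to be written continuously at V_W(i) and drained at V_R while it is being read, V_R is set by Equation (2), and the lengths follow the assumed rule L(i) ∝ V_W(i) · (V_R − V_W(i)). The function name `simulate_rotation` and the `scale` parameter are hypothetical.

```python
from fractions import Fraction


def simulate_rotation(write_rates, cycles=3, scale=1):
    """Rotate the read bus over the FIFOs; return (drain_times, period).

    Raises RuntimeError if a FIFO is not exactly full when its turn comes,
    which would mean waiting (too empty) or overflow (too full).
    """
    w = [Fraction(r) for r in write_rates]
    n = len(w)
    if n < 2:
        raise ValueError("at least two FIFOs are needed")
    v_r = sum(w)  # Equation (2)
    length = [scale * wi * (v_r - wi) for wi in w]        # assumed length rule
    drain = [length[i] / (v_r - w[i]) for i in range(n)]  # net drain rate V_R - V_W

    # Steady-state start: FIFO 0 is full; FIFO i becomes full only after
    # FIFOs 0..i-1 have been drained.
    level, elapsed = [], Fraction(0)
    for i in range(n):
        level.append(length[i] - w[i] * elapsed)
        elapsed += drain[i]

    for step in range(cycles * n):
        i = step % n
        if level[i] != length[i]:
            raise RuntimeError(f"FIFO {i} is not full at its turn")
        for k in range(n):
            if k != i:
                level[k] += w[k] * drain[i]  # the other FIFOs keep filling
        level[i] = Fraction(0)               # FIFO i has been read until empty
    return drain, sum(drain)


drain, period = simulate_rotation(["2.5", "3.125", "6.25", "12.5"])
print([float(d) for d in drain], float(period))
```

If the function returns without raising, every switch of the read bus happened at the exact moment the next FIFO became full, so the bus was never idle in the simulated cycles. In this model the drain times come out proportional to the write rates.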

Implementation and Performance Evaluation of the Scheduling Algorithm
The scheduling algorithm is realized by using a UPA instrument (PA2000 model), made by Guangzhou Doppler Electronic Technologies Co., Ltd. (Guangzhou, China), and a Cyclone V GT FPGA Development Board made by Intel Corporation (Santa Clara, CA, USA) as the PCIe communication module with the PC. The UPA data are transmitted to the PC through the PCIe interface, and the multi-group scanning images are processed.
The UPA system, with a work clock frequency ($f_s$) of 100 MHz, is fitted with four sensors with four different frequencies ($f_p$) of 2, 2.5, 5, and 10 MHz, and thus the system can implement 4-group scanning patterns. The echoes of all the groups are up-sampled ($f_s = 10 \times f_p$) by using digital signal processing technology, and thus the actual sampling frequencies $f_{s0}$ to $f_{s3}$ become 20, 25, 50, and 100 MHz. The bit-width ($\Delta B$) of the echo data is 8 bits, and the widths of both the input ($\Delta B_W$) and the output ($\Delta B_R$) ports of the FIFOs are 64 bits. Table 3 lists the writing frequency [$V_{Wf}(n)$] and the reading frequency ($V_{Rf}$) of the FIFO$_n$ caches. $V_W(n)$ equals $V_{Wf}(n) \times \Delta B_W$, and $V_R$ equals $V_{Rf} \times \Delta B_R$ in this case; hence, the scheduling algorithm can be used to allow the four FIFO caches to share the transmission bandwidth. The capacities of the FIFOs are $L(n) \times \Delta B_W$, and the length ratios of the FIFO caches can be calculated from Equation (10). From Table 3, the value of $V_{Rf}$ is calculated to be 24.375 MHz, but a value of 25.0 MHz ($V_{Rf} = f_s/4 = 25.0$ MHz ≈ $V'_{Rf}$) is easier to implement in the FPGA, and thus we set $V_{Rf}$ = 25.0 MHz for the experiment.
Figure 4 shows the reading timing waveforms of the four FIFOs under the MFBSS algorithm, captured with SignalTap, a software oscilloscope used to observe FPGA internal signals. The signals FIFO0_rd to FIFO3_rd respectively control the reading operations of the four FIFOs, enabling them to output data in a time slice polling way. The times for reading the four FIFOs until empty are $\Delta T_0$ to $\Delta T_3$, which stand in a fixed ratio $\Delta T_0 : \Delta T_1 : \Delta T_2 : \Delta T_3$. All the FIFOs are read in turn until empty in every cycle. The sum of the data written into the FIFOs ($D_{W-sum}$) and the sum of the data read out from the FIFOs ($D_{R-sum}$) are calculated for each cycle.
The experimental results show that $D_{W-sum}$ equals $D_{R-sum}$, which satisfies Equation (2) and agrees well with the theoretical analysis.
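The value of 24.375 MHz quoted above can be reproduced with a short calculation. The sketch assumes that eight 8-bit samples are packed into each 64-bit FIFO word, so that the writing word rate of each FIFO is its sampling frequency divided by 8; this packing is an inference that is consistent with the quoted figure, and the function name `write_word_rates` is hypothetical.

```python
from fractions import Fraction

SAMPLE_BITS = 8   # bit-width of one echo sample (from the text)
PORT_BITS = 64    # FIFO input/output port width (from the text)


def write_word_rates(sampling_mhz, sample_bits=SAMPLE_BITS, port_bits=PORT_BITS):
    """Word rate (MHz) at which each FIFO is written, assuming packed samples."""
    samples_per_word = port_bits // sample_bits
    return [Fraction(f) / samples_per_word for f in sampling_mhz]


rates = write_word_rates([20, 25, 50, 100])
theoretical_read = sum(rates)  # Equation (2), expressed in word rates
print([float(r) for r in rates], float(theoretical_read))
# -> [2.5, 3.125, 6.25, 12.5] 24.375
```

Because the input and output ports have the same width, the balance of Equation (2) can be checked directly on the word rates.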
In the N-group scanning system, the bandwidth utilization ratio $\eta_{bw}(N)$ of the MFBSS algorithm can be expressed by Equation (11). Therefore, in the experiment, when N = 4, the utilization ratio $\eta_{bw}(4)$ of the MFBSS algorithm used in the UPA system can be calculated by Equation (12).
The ETSPS scheduling algorithm is based on the equal allocation of a time slot to each task. Compared with the MFBSS algorithm in this work, the ETSPS scheduling algorithm has four characteristics, the last of which is: (iv) when FIFO$_i$ (i = 0, 1, 2, …, N − 1) is filled by writing, the reading operation of FIFO$_i$ is performed immediately. Therefore, the general utilization ratio $\eta'_{bw}(N)$ of the bandwidth-sharing transmission with N-group scanning of the UPA system can be calculated by Equation (13).
For an N-group scanning data stream with bandwidths {$V_W(0)$, $V_W(1)$, …, $V_W(N-1)$} (unit: Byte/s), we use FPGA technology to implement the MFBSS algorithm together with the traditional ETSPS scheduling algorithm, and analyze their bandwidth utilization ratios $\eta_{bw}(N)$ and $\eta'_{bw}(N)$.
For example, consider an FPGA (Arria-II EP2AGX65DF29I5) with a work clock frequency of $f_{clk}$ = 100 MHz. It is easy to produce clock frequencies such as $F_1$ = {1, 2, 3, …, $f_{clk}$} and $F_2$ = {$f_{clk}$/100, $f_{clk}$/99, $f_{clk}$/98, …, $f_{clk}$/1} (unit: MHz) from the clock $f_{clk}$ by using digital phase-locked loop technology.

• The MFBSS algorithm. According to Equation (11), the theoretical value of the shared output bandwidth can be obtained. The actual value of the shared output bandwidth is $V_{Rf}$, which satisfies the conditions of Equation (14). In the limiting case for max($V_{Wf}(i)$), the minimum theoretical value of $\eta_{bw}(N)$ can be expressed by Equation (15).
Figure 5 shows the bandwidth utilization ratio curves of the two scheduling algorithms (horizontal axis: the theoretical value of the shared output bandwidth $V_{Rf}$ (N = 4); vertical axis: the bandwidth utilization). $\eta_{bw}(N)$ and $\eta'_{bw}(N)$ are the bandwidth utilization ratios of the MFBSS algorithm and the ETSPS algorithm, respectively. The symbols $\eta_{bw}(N)$ and $\eta_{ideal}$ represent the experimental and ideal values for the MFBSS algorithm, respectively. The results show that the value of $\eta_{bw}(N)$ is between 92% and 100%; for example, for the above experiment of 4-group scanning based on the MFBSS algorithm, when the theoretical $V_{Rf}$ equals 24.375 MHz, $\eta_{bw}(N)$ equals 97.5% and $\eta_{ideal}$ equals 100%. In contrast, the value of $\eta'_{bw}(N)$ depends on N and lies between (100/N)% and $\eta_{bw}(N)$. For N-group scanning patterns, $\eta'_{bw}(N)$ equals $\eta_{bw}(N)$ only when all groups have the same bandwidth; otherwise, $\eta'_{bw}(N)$ is much smaller than $\eta_{bw}(N)$.
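The choice of an implementable reading clock can be sketched as follows. The selection policy used here (take the smallest frequency in F1 ∪ F2 that is not below the theoretical value) is an assumption for illustration; the exact conditions of Equation (14) are not reproduced. The function names are hypothetical.

```python
from fractions import Fraction


def realizable_clocks(f_clk=100):
    """Frequencies (MHz) in F1 = {1, ..., f_clk} and F2 = {f_clk/100, ..., f_clk/1}."""
    f1 = {Fraction(k) for k in range(1, f_clk + 1)}
    f2 = {Fraction(f_clk, k) for k in range(1, 101)}
    return sorted(f1 | f2)


def smallest_clock_at_least(target, clocks):
    """Assumed policy: the lowest realizable clock that is not below the target."""
    for clock in clocks:
        if clock >= target:
            return clock
    return None


theoretical = Fraction(195, 8)  # 24.375 MHz, the theoretical V_Rf of the experiment
actual = smallest_clock_at_least(theoretical, realizable_clocks(100))
print(float(actual), float(theoretical / actual))  # chosen clock and utilization
```

For the experiment's parameters this policy selects 25 MHz, which matches the value the authors chose and gives a ratio of 24.375/25 = 97.5%.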
Similarly, we used the FPGA to implement the traditional ETSPS algorithm with the same parameters as in Table 3, and collected the reading timing waveforms of the four FIFOs by using SignalTap. As shown in Figure 6, the signals FIFO0_rd to FIFO3_rd control the reading operations of the four FIFOs, and the time resources occupied by these signals are assigned by the signal FIFO_rd.
Let the symbols $f_{FIFO\_rd}$, $f_{FIFO0\_rd}$, $f_{FIFO1\_rd}$, $f_{FIFO2\_rd}$, and $f_{FIFO3\_rd}$ represent the frequencies of the signals FIFO_rd, FIFO0_rd, FIFO1_rd, FIFO2_rd, and FIFO3_rd, respectively; their values can be read from Figure 6. The utilization ratio of the data transmission with the 4-group scanning of the ETSPS algorithm can then be calculated by Equation (16), which gives 48.75%. As a consequence, the bandwidth utilization ratio of the MFBSS algorithm, $\eta_{bw}(4)$, reaches 97.5%, as shown in the inset of Figure 5, while the bandwidth utilization of the ETSPS algorithm, $\eta'_{bw}(4)$, is only 48.75%. The experimental results demonstrate that the MFBSS algorithm is efficient when used in the multi-group sensor scanning UPA system.
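The two reported utilization figures can be reproduced with a small calculation. The formulas below are reconstructions, not quotations of Equations (11) to (16): the MFBSS ratio is taken as the sum of the writing rates divided by the actual reading rate, and the ETSPS ratio assumes that every equal time slot must be sized for the fastest group, so that the reading rate is N times the largest writing rate. Function names are hypothetical.

```python
def mfbss_utilization(write_rates, actual_read_rate):
    """Assumed form: sum of write rates over the actual shared read rate."""
    return sum(write_rates) / actual_read_rate


def etsps_utilization(write_rates):
    """Assumed form: equal slots sized for the fastest group (read rate N * max)."""
    return sum(write_rates) / (len(write_rates) * max(write_rates))


v_wf = [2.5, 3.125, 6.25, 12.5]               # write word rates in MHz
print(f"{mfbss_utilization(v_wf, 25.0):.2%}")  # -> 97.50%
print(f"{etsps_utilization(v_wf):.2%}")        # -> 48.75%
```

Under these assumed forms, the ETSPS ratio equals 1/N when one group dominates the traffic and reaches 100% only when all groups have the same bandwidth, which is consistent with the bounds stated for Figure 5.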

Conclusions
The novel MFBSS algorithm was proposed on the basis of variable FIFO lengths and FPGA technology, and was used in the multi-sensor scanning UPA system to maximize the bandwidth utilization ratio. The mathematical model of the MFBSS algorithm was established, and the formula $V_R = \sum_{n=0}^{N-1} V_W(n)$ for maximizing the bandwidth utilization ratio in the N-group scanning patterns was deduced. The lengths of the N FIFOs were obtained from the designed equations, from which the length ratios were readily calculated. The algorithm was realized by FPGA technology, which made the reading operation switch from one FIFO to another without any time slot waiting. It thus obtained a data transmission bandwidth utilization of no less than 92%, higher than that of the traditional ETSPS algorithm. The MFBSS scheduling algorithm has important applications in multi-sensor systems, both for improving the transmission efficiency of the large data volumes generated by the sensors and, through multi-FPGA technology, for improving the real-time performance of the algorithm. Future research is likely to focus on designing special scheduling algorithm modules for different sensor systems.