Multi-GPU, Multi-Node Algorithms for Acceleration of Image Reconstruction in 3D Electrical Capacitance Tomography in Heterogeneous Distributed System

Majchrowicz, Michał; Kapusta, Paweł; Jackowska-Strumiłło, Lidia; Banasiak, Robert; Sankowski, Dominik

doi:10.3390/s20020391

Open AccessArticle

Multi-GPU, Multi-Node Algorithms for Acceleration of Image Reconstruction in 3D Electrical Capacitance Tomography in Heterogeneous Distributed System^†

by

Michał Majchrowicz

^*

,

Paweł Kapusta

,

Lidia Jackowska-Strumiłło

^*

,

Robert Banasiak

and

Dominik Sankowski

Institute of Applied Computer Science, Lodz University of Technology, Stefanowskiego 18/22, 90-537 Lodz, Poland

^*

Authors to whom correspondence should be addressed.

^†

This work is an extended version of conference paper published in: Majchrowicz, M.; Kapusta, P.; Jackowska-Strumiłło, L.; Sankowski, D. Acceleration of 3D ECT image reconstruction in heterogeneous, multi-GPU, multi-node distributed system. In Proceedings of the 2018 Federated Conference on Computer Science and Information Systems, Poznan, Poland, 9–12 September 2018; pp. 347–350.

Sensors 2020, 20(2), 391; https://doi.org/10.3390/s20020391

Submission received: 28 September 2019 / Revised: 19 December 2019 / Accepted: 4 January 2020 / Published: 10 January 2020

(This article belongs to the Special Issue Sensors Signal Processing and Visual Computing 2019)

Download

Browse Figures

Versions Notes

Abstract

:

Electrical capacitance tomography (ECT) is one of non-invasive visualization techniques which can be used for industrial process monitoring. However, acquiring images trough 3D ECT often requires performing time consuming complex computations on large size matrices. Therefore, a new parallel approach for 3D ECT image reconstruction is proposed, which is based on application of multi-GPU, multi-node algorithms in heterogeneous distributed system. This solution allows to speed up the required data processing. Distributed measurement system with a new framework for parallel computing and a special plugin dedicated to ECT are presented in the paper. Computing system architecture and its main features are described. Both data distribution as well as transmission between the computing nodes are discussed. System performance was measured using LBP and the Landweber’s reconstruction algorithms which were implemented as a part of the ECT plugin. Application of the framework with a new network communication layer reduced data transfer times significantly and improved the overall system efficiency.

Keywords:

electrical capacitance tomography; heterogeneus system; distributed systems; multi-GPU computations

1. Introduction

Electrical capacitance tomography (ECT) is a measurement technique that can be used for non-invasive monitoring of diverse industrial processes in 2D [1,2], 3D [3] and even 4D dynamic mode [4]. ECT is performing the task of imaging of materials with a contrast in dielectric permittivity by measuring capacitance from a set of electrodes placed around the investigated object. A fundamental ECT measurement setup consisting of the ECT sensor, ECT tomograph and a computer is shown in Figure 1a. In ECT measurement cycle the following phases can be distinguished: capacitance measurement, data acquisition (signal conditioning and analog to digital conversion), data collection, and finally calculating of electrical permittivity distribution, which is called image reconstruction. The entire measurement process is controlled automatically by a computer system (Figure 1b). Accuracy of ECT imaging is determined by hardware as well as by software and both of these factors are taken into consideration while new measurement setup is designed. Sensor resolution is limited by the number of electrodes and their size. In 3D ECT usually three or four rings with 8, 12 or 16 electrodes are used. Reducing the measurement time is limited by time of capacitance measurement. Also, data acquisition and processing have a significant impact on accuracy and time limits.

In order to achieve a high quality tomographic image [5], complex reconstruction algorithms performing many matrix calculations have to be applied [6,7,8,9]. However, choosing an optimal reconstruction algorithm and its implementation depends on an application. In medicine or material science, where an internal object or dangerous material crack [10] needs to be detected, the most important feature of the algorithm is accuracy and precision of image reconstruction, while time of data processing is not so crucial. In dynamic measurements of fast changing media, time of measurement and data processing is equally or even more important parameter than accuracy. Therefore different solutions accelerating these calculation have been reported in the past, especially dealing with parallel computing [11,12,13,14,15], sparse matrices and Finite Elements Method [16], Fourier-based sparse representations [17], neural networks approach [18,19], fuzzy logic [20] or field-programmable gate array (FPGA) implementation [21]. Usually a compromise between accuracy and rapidity of ECT measurement must be taken. While for control purposes determination of some important process parameters can be sufficient, fast and accurate 3D process visualisation is a challenging task.

In this work the authors concentrate on parallel computing in distributed system as computational capabilities of single PC proved to be one of the main limiting factors for applying 3D ECT for on-line industrial process monitoring, such as for example oil-gas flow [3] or silo flow [22] monitoring.

Distributed systems are one of the most important technological achievements in recent years [23,24], which have had a significant impact on the development of modern computing. The scope of distributed systems applications in everyday life is very wide [25,26], from local systems such as cars, ships, aircraft to global systems of millions of nodes used for data processing services; from simple built-in systems consisting of very small and simple sensors to those containing powerful computational components [27]; from built-in systems to those that support advanced interactive user interfaces.

A distributed system can be defined as one in which the hardware components or software on the computers in the network communicate and coordinate their actions only through the transmission of messages [28,29]. This simple definition covers a whole range of systems in which networked computers can be successfully used to perform tasks faster or more accurately than a single unit. The main motivation for the construction and use of distributed systems is the desire to separate resources [30,31,32]. The concept of “resources” that can be effectively shared on a computer network includes both hardware components such as disks and printers, and software such as files, databases, and other objects such as video streams that are transmitted from a digital video camera or a voice signal transmitted between mobile phones and stations.

In this paper, a novel heterogeneous, multi-graphics processing unit (GPU), multi-node distributed system is proposed, with a framework for parallel computing and a special plugin dedicated to ECT. The system’s architecture and features are presented, and also parallel algorithms of ECT data processing and image reconstruction. Test results comparing the developed system with new network communication layer are compared to the authors’ earlier solutions based on the Xgrid platform [13,33]. Finally, the results of the latest system modification and optimization are presented and compared to the previously developed solutions. [34,35].

2. Image Reconstruction in ECT

The scheme of image synthesis in electrical capacitance tomography is called image reconstruction. It is based on solving the so called inverse problem [36], in which the spatial distribution of electric permittivity is approximated from the measured values of capacitances between the sensor electrodes. Inverse problem in ECT is ill posed due to the small number of capacitance C measurements in relation to number of pixels in reconstructing image. This results a poor spatial resolution of ECT technique.

Inverse problem is nonlinear and therefore two main groups of image reconstruction algorithms can be distinguished: nonlinear and linear. Non-linear algorithms are more accurate but slower [16]. Linear algorithms use approximate linear model, which is less accurate but simple and useful in engineering praxis. These algorithms are used predominantly for monitoring fast-varying industrial processes, like oil-gas flows in pipelines [3] or gravitational flows and discharging of silo [22].

One of the most used reconstruction algorithms in ECT is linear back-projection (LBP). Even though it is characterized by low spatial resolution, it is not as computationally complex as other solutions. Moreover, there is still active research on improving its characteristics [37]). LBP is based on the following Equation [8,38]:

\begin{matrix} ε = S^{T} C_{m} \end{matrix}

(1)

where:

$ε$ —electric permittivity vector (output image),
$S^{T}$ —sensitivity matrix, transposed
$C_{m}$ —capacitance measurements vector.

The Landweber’s algorithm is a more accurate, iterative algorithm, which is based on the following Equation [8,38]:

\begin{matrix} ε_{k + 1} = ε_{k} - α S^{T} (S ε_{k} - C_{m}) \end{matrix}

(2)

where:

$ε_{k + 1}$ —image obtained in current iteration,
$ε_{k}$ —image from the previous iteration,
$α$ —convergence factor (scalar),
$S^{T}$ —sensitivity matrix, transposed,
$S$ —sensitivity matrix,
$C_{m}$ —capacitance measurements vector.

Computation of sensitivity matrix

S

for ECT tomography incorporates a sensitivity model based on an energy of electric field accumulated in a 3D ECT sensor space. This energy can be calculated as the energy of capacitor or as the energy of electric field:

\begin{matrix} W_{C} = \frac{1}{2} U^{2} C \end{matrix}

(3)

\begin{matrix} W_{E} = \frac{1}{2} \int_{Ω} ε (x, y, z) \vec{E} (x, y, z) \cdot \vec{E} (x, y, z) d Ω \end{matrix}

(4)

where:

W_{C}

is an electric field energy accumulated in capacitor with capacity C and with applied voltage

U

,

W_{E}

is an energy of electric field E with spatial electric permittivity distribution

ε (x, y, z)

. Of course both equations describe the same energy therefore they can be compared together:

\begin{matrix} \frac{1}{2} U^{2} C = \frac{1}{2} \int_{Ω} ε (x, y, z) \vec{E} (x, y, z) \cdot \vec{E} (x, y, z) d Ω \end{matrix}

(5)

and consequently:

C = \frac{1}{U^{2}} \int_{Ω} ε (x, y, z) {\vec{E}}^{2} (x, y, z) d Ω

(6)

The sensitivity value of any point

j \in Ω

describes the connections between changes of spatial permittivity distribution in that point and resulting changes of the capacity

i

, as it can be expressed by followed equation:

S_{i, j} = \frac{δ C_{i}}{δ ε_{j}}

(7)

Combining both Equations (6) and (7) and omitting

i

index which is connected with measurement sequence we can get:

\begin{matrix} S_{j} = \frac{1}{U^{2}} \frac{δ (\int_{Ω} ε (x, y, z) {\vec{E}}^{2} (x, y, z) d Ω)}{δ ε_{j}} \end{matrix}

(8)

However it is still not defined what

ε_{i}

means and how it is related to the spatial electric permittivity distribution function

ε (x, y, z)

. Typically when finite elements modeling for ECT is used the function

ε (x, y, z)

can be defined as constant value inside each point

j \in Ω

. Therefore Equation (8) can be rewritten as:

\begin{matrix} S_{j} = \frac{1}{U^{2}} \frac{δ (\int_{Ω} ε_{j} {\vec{E}}^{2} (x, y, z) d Ω)}{δ ε_{i}} \end{matrix}

(9)

Assuming that voltage

U

equals to a difference between potentials

φ_{e l 1}

and

φ_{e l 2}

applied to electrodes

e l 1

and

e l 1

and expanding Equation (8) we can compute the sensitivity matrix

S

using formula:

\begin{matrix} S_{i, j} (x, y, z) = \frac{1}{φ_{e l 1} \cdot φ_{e l 2}} \frac{δ (\int_{Ω} ε_{j} (x, y, z) \vec{E_{e l 1, i}} (x, y, z) \cdot \vec{E_{e l 2, i}} (x, y, z) d Ω)}{δ ε_{j}} \end{matrix}

(10)

where

S_{i, j} (x, y, z)

is a sensitivity value for point

j \in (x, y, z)

while i-th capacitance measurement between electrodes

{e l}_{1}

and

{e l}_{2}

formed,

φ_{e l 1}

and

φ_{e l 2}

are potentials applied to electrodes

{e l}_{1}

and

{e l}_{2}

respectively.

\vec{E_{e l 1, i}} (x, y, z)

and

\vec{E_{e l 2, i}} (x, y, z)

are electric field vectors,

ε_{j}

is a permittivity value for the point

j \in (x, y, z)

.

Potentials

φ_{e l 1}

and

φ_{e l 2}

can be determined numerically using FEM modelling according to general formula:

\begin{matrix} φ = Y^{- 1} F \end{matrix}

(11)

where

φ

is a sought distribution of the electric field—represented by the spatial distribution of nodal potential—partial solution of the forward problem in capacitance tomography;

Y

is a transformation matrix, built according to the geometric dependencies of sensor model mesh and Neumann boundary conditions;

F

is the right-hand side forces vector, defining the given Dirichlet boundary conditions

Electric field distribution

φ

can be further used for Gauss Law based inter-electrode capacitance computations:

\begin{matrix} C_{e 1 e 2} = \frac{{\int \int \int}_{Ω} ε (x, y, z) grad [φ (x, y, z)] d Ω}{φ_{e 1} - φ_{e 2}} \end{matrix}

(12)

where:

$C_{e 1 e 2}$ —Capacitance between electrodes e1 and e2
$ε (x, y, z)$ —distribution of electric permittivity in domain $Ω$
$φ (x, y, z)$ —distribution of electric potential in domain $Ω$
$φ_{e 1}$ —electric potential applied to electrode e1
$φ_{e 2}$ —electric potential applied to electrode e2
$x, y, z$ —Cartesian coordinates in domain $Ω$

The sensitivity matrix is an important part of the image reconstruction algorithms and has to be recalculated and tweaked for every sensor and specific visualization task. In some applications it is necessary to update it regularly and use non-linear reconstruction algorithm [16]. However, in this work only the linear Landweber’s algorithm is considered, so the sensitivity matrix calculation is a one-time process, after which the matrix is then distributed to each computing node. It is then treated as a constant in computations and does not contribute to the data transfer time during the reconstruction.

In the case of the Landweber’s algorithm each iteration improves the overall quality of the output image. The more iterations are performed the better the image quality is obtained, however the overall time of computations increases. As a result acceleration of image reconstruction process is a very important issue. Nowadays, industrial applications of 3D electrical capacitance tomography are mainly limited by the two combined factors: image quality and computational complexity of image reconstruction process [39].

The difference in quality of the reconstructed image, caused by the difference in number of Landweber’s algorithm iterations that were possible, can be seen in Figure (Figure 2), which shows reconstruction quality improvement rate for given iterations number, using two examples (two simple balls and challenging T letter shape).

Image reconstruction using deterministic methods requires execution of a large number of basic operations of linear algebra, such as transposition, multiplication, addition and subtraction [34,40]. Matrix calculations for a large number of elements is characterized by a high computational load [41]. Moreover, their computational complexity class is polynomial. For example, matrix multiplication is an algorithm of O(

n^{2 + ε}

) class [42], for any constant

ε

> 0, which means that n-fold increase in the multiplied matrix dimensions will

n^{2 + ε}

fold increase the execution time. Matrix multiplication is a key operation in ECT imaging and therefore some researchers decided even to build a custom hardware for this purpose to speed up computations [43].

In this work, an approach based on computations on GPUs is proposed, as it allows for high flexibility and relatively low application costs. Algorithms of ECT image reconstruction in a distributed multi-node, multi-GPU environment were developed and tested. Two image reconstruction algorithms were implemented: the LBP and the Landweber’s algorithm.

Due to the nature of the Landweber’s algorithm it is necessary to exchange the data (

ε_{k + 1}

) in every iteration, so its parallelization is a challenging task. The main concept of our implementation of the Landweber’s reconstruction in distributed system will be discussed in details in Section 4.

3. Developed Distributed System

As a result of the earlier studies [13,41] a new distributed system dedicated to ECT computations has been developed. The system is specifically designed to accelerate matrix computations that are a crucial part of reconstruction algorithms used in ECT [16]. The previously developed solution was based on the Xgrid platform, used as a network layer. However, the analysis of this system showed the limitations of this solution, and the main conclusion from the previous research [33] was that the new software for the system should be developed and new network communication should be implemented.

3.1. Design Assumptions

The use of a heterogeneous system for distributed computing in ECT required the solution of series of tasks without which the proposed system would not work properly. The most important of these are as follows:

Division of matrices between nodes
Basic operations of linear algebra (transposition, addition, subtraction, multiplication)
Data transfer between nodes
Planning and division of tasks
Support for heterogeneous devices
Support for calculations using graphics cards
Support for modern multi-core processors as a set of devices
Possibility to extend existing solutions with pseudo inheritance from implemented layouts

3.2. Implementation

The architecture of the designed system is shown in Figure 3.

The system contains both GPU and CPU based computing nodes for performing general purpose computation. It has an open structure and is designed to utilize currently sold hardware, but also for easy extension in the future with plugins and other upgrades.

In our new solution a special framework called KIS digital computing (KISDC), where KIS is an abbreviation of the Polish name of the Computer Engineering Department, was designed and built that provides software tools needed both for the system architecture expansion and new algorithms development and implementation in a distributed heterogeneous environment [35].

The proposed approach allows for a greater flexibility of the developed solutions, provides tools for their easy testing and enables further acceleration of ECT image reconstruction. The system was designed as a modular, layered architecture (Figure 4). This approach allows limiting the dependencies between the individual modules. Moreover, thanks to this architecture, it is possible to abstract the compute devices using KISDC-DEV module. It is an abstraction and management layer that hides the type of the underlying hardware from the user and makes all the algorithms written using the provided application programming interface (APIs) hardware-agnostic.

Expansion of the computing power of the system is possible through the use of “plug-in” architecture (by adding support for new devices, such as FPGAs). The basic operations of linear algebra were implemented in the system as a set of functions in the form of an API.

The framework was designed to ensure an efficient use of the computing power of all the devices present in the nodes. This architecture is scalable and allows users to expand the capabilities of the system by adding more nodes. The above assumptions pose many challenges in the architecture of the system itself, but their application makes it straightforward to use the environment to speed up computations in existing projects, thus testing and developing new distributed algorithms is much faster. As shown in the activity diagram (Figure 5), the process of performing calculations using the KISDC system allows for much more flexibility in the number and type of devices used.

The system is under active development and is modified according to the new demands. As part of the latest update a special algorithm was designed that computes an array of performance coefficients (Figure 6) which is then used to optimize the utilization of the system. In the proposed solution coefficients of data distribution are evaluated in experimental way. As part of this algorithm (which is used in setup phase of KISDC platform) special benchmark procedure has been developed (Figure 6), which allows to choose the coefficient values according to the current hardware configuration and to the system working conditions.

Computed coefficients are used in the device management layer to distribute the data between devices on each node (Figure 7). Experimental method of these coefficients evaluation is applied as well for data distribution between devices on each node as well between computers in distributed system. An analogous algorithm to this presented in Figure 6 and benchmark are used for coefficients computation for data distribution between computers.

The KISDC architecture aids in the development of image reconstruction algorithms in electrical capacitance tomography, as it simplifies their implementation for various hardware configurations in a distributed system. As shown in the previous section, the process of performing calculations application of the KISDC system allows for much more flexibility in the number and type of devices used than the OpenCL library.

In order to take full advantage of the distributed nature and the heterogeneity of the system, a new solution has been developed for performing computations on big sets of data. This problem is especially prominent in case of 3D ECT, where the size of used matrices is significant. When using GPUs as computational devices the memory available is usually much less than system memory (RAM). When the computational load is proportionally divided between the devices it is possible to only send necessary data to each device. Moreover, for some algorithms and compute kernel implementations, the data split can be done on matrix basis, i.e., the data will be divided between the devices in a way that will be hidden from the algorithm, based only on the available memory and computing capabilities of the nodes.

Furthermore, other minor optimisations of the algorithm have been implemented recently, what will be more accurately described in Section 5. The whole system software has been developed in C programming language with the use of object oriented programming paradigm, dynamic data structures and OpenCL libraries. Moreover, in order to guarantee correctness of the obtained computation results, we make sure that all devices present in the system are fully compliant with IEEE754-2008 standard, describing floating-point arithmetic. We also use techniques of increasing the precision and decreasing rounding errors, like utilizing hardware accelerated fused multiply add (FMA) operations, wherever possible.

4. Parallel Algorithm of ECT Image Reconstruction in Distributed System

The main concept of the Landweber’s algorithm (Equation (2)) implementation in a distributed system is shown in Figure 8. Matrix

S

is calculated from Equation (10) as it was explained in Section 2. Due to the high density meshes needed in this case to obtain sufficient computing accuracy for 3D ECT large number of numerical operations of high computational complexity has to be performed to solve this task. Parallel distributed algorithms developed for this purpose using the Finite Element Method, sparse matrices and multiple GPUs are described in the Authors’ earlier work [16]. Matrices

S

and

S^{T}

are calculated once at the beginning and remain constant, so they are sent only once to compute nodes before the other calculations. This is very important for the whole distributed system performance, because sensitivity matrices are big matrices, a few orders of magnitude bigger than measurements or output image vectors. Therefore they transferring over a LAN would have a huge impact on the system performance and would slow down the calculation process.

The next key problem is data distribution between the nodes in heterogeneous distributed system (Figure 3). In most 3D ECT systems measurement data are collected with higher frequency than they can be reconstructed. Moreover, because of the asynchronous nature of the developed solution, based on the commissioning of tasks to local GPUs using OpenCL technology, as well as remote computing nodes, delays can accumulate, therefore there is a need for their elimination by buffering systems. Measurement data

C

are buffered in a memory in

C_{m}

array and they need to be prepared for

ε_{k}

calculation before sending to the compute nodes. In the case of data distribution between computers larger amount of data containing full data frames are grouped and buffered in

CN

array for sending through the computer network. Within the computers smaller amounts of data are distributed between the compute nodes. Algorithms of data distribution between computers or compute nodes are based on experimentally determined coefficients corresponding to the available memory and computing capabilities of the nodes, as it was described in Section 3 and presented in Figure 6. These coefficients are also computed once for a specific hardware configuration.

In the first iteration, approximated electrical permittivity distribution

ε_{0}

is calculated from LBP algorithm (Equation (1)), then next

ε_{k + 1}

vectors are calculated from Equation (2) in distributed system.

Flow diagram of the algorithm for data and calculations distribution is shown in (Figure 9). The algorithm has been designed, implemented and optimized from the start as a solution suited to heterogeneous multi-thread CPUs and multi-GPUs distributed systems. CPUs are used for memory allocation, task division and then management of data collection from compute nodes, while GPUs are used for linear algebra operations on matrices. In order to split data according to devices, as it was shown in Figure 7, computational capabilities array of coefficents is used to calculated start (Sw) and end indeces (Ew) for data division. Due to the specific nature of the computations on GPU the most optimal solution is to start a separate thread for each GPU in the system, that are synchronised when reading the results.

The main idea of the developed algorithm is, that each GPU inside the compute node calculates the solution to the Landweber’s algorithm for a few picture elements, by assigning to each GPU calculations specific to the selected part of image, where the whole multi-GPU system computes the result of a single image reconstruction. In other words all computations of Landweber’s algorithm iterations are performed by GPUs whereas rest of the system is responsible for data transfer and division. This approach allows for more precise control of tasks allocation. This in turn enables its use in distributed systems with a high degree of heterogeneity.

Main problem of real-time image reconstruction are delays caused by computation, nevertheless data sending is also a very important issue since it introduces additional delays and hinders the whole process. Therefore, both these aspects have been tested while the system was developed. Problems of optimal hardware configuration in heterogeneous multi-CPU multi-GPU system were considered more detaily in our earlier publications [13,15]. In this work we focused more on communication issue and testing new developed network layer.

5. Test Results

All tests were conducted for the same hardware configuration consisted of two nodes of high computing power, using eight thread Intel i7 930 CPUs and Nvidia GPUs (Tesla S1070 + Tesla C2070 compute devices in the first and dual GeForce GTX 570 graphic cards in the second node), connected via 1 Gigabit LAN. The first part of this section is related to network layer testing. The next part deals with a performance of the whole new developed distributed system and its comparison to the previous system version. Finally, possibility of industrial process monitoring is discussed.

5.1. Network Layer Performance

By creating a network layer the existing solutions and network protocols can be used or the own ones can be developed. In the previously built distributed system a ready-made Xgrid platform was applied as a network layer [33].

In this work the author’s KISDC system with KISDC-NET network layer was designed and implemented. While designing the KISDC-NET layer, the existing network protocols were applied and tested in advance in order to choose the best solution.

The network characteristics of the previously developed solution based on the Xgrid system was compared with the new system using other data distribution protocols: hypertext transfer protocol (HTTP) [44], file transfer protocol (FTP) and server message block (SMB).

Based on the obtained results (see Figure 10) it can be stated that the use of each of the tested protocols ensured a shorter data transfer time than the Xgrid-based solution, although the best results were achieved for the HTTP protocol. Therefore, HTTP protocol was selected as the best one for the KISDC-NET. More accurate comparisons of HTTP and the Xgrid-based solutions are shown in Figure 11 and Figure 12. Average data transfer time for Xgrid was about 25 times longer than for HTTP (0.189 s and 0.008 s for 2 KiB and 0.481 s and 0.019 s for 3072 KiB respectively). It can also be noted that for Xgrid-based solution random disturbances in data transfer occurred frequently. They are visible in Figure 11 and Figure 12 as spikes with much longer than average data transfer time. For the HTTP protocol, these spikes occurred significantly less frequently and have many times smaller magnitude both for C and image vectors.

The comparison of the data transfer and communication times between the nodes shown in Figure 10 indicates very clearly that the use of HTTP protocol ensured much better results than the Xgrid platform. Already for the measurement data vector (2 KiB), it is evident that using the HTTP protocol allowed for significant reduction of the total image reconstruction time. Moreover, for the highest-resolution (3072 KiB) image vector, the use of HTTP allowed to improve significantly data transmission in a distributed environment compared to the results obtained with the Xgrid platform in the same configuration. In all cases data transmission was performed much faster in KISDC than on Xgrid platform.

5.2. System Performance

Both distributed systems, one based on the Xgrid platform and the KISDC, have been extensively tested and compared. In each test case the hardware configuration was the same as it was specified at the beginning of this section.

The comparison of times of a single frame reconstruction in the two node system are shown in Figure 13. Yellow color represents calculation time (the same for the both systems), blue color is related to data transfer time for the KISDC system, and orange color denotes data transfer time for the Xgrid system. For each of the analyzed data sizes, the speed up of image reconstruction expressed in the number of reconstructed frames per second was noted. The most significant relative acceleration was achieved for 48 KiB and 96 KiB image vectors.

In the recent system version a few optimizations of the algorithm have been implemented. The size field in the 4 byte header has been removed and replaced with single byte Enum, with defined values for all data sizes that the system supports. Moreover, calculation of

α S^{T}

is performed only once on the system start instead of in every iteration of the algorithm. Both versions of the distributed Landweber’s algorithm supporting frame buffering (Figure 8) were tested. The results are presented in Figure 14). It should be noted that implementation of the optimizations in calculations and data transfer layers allowed for significant reduction of the total image reconstruction time in comparison to previously reported results [35].

5.3. Monitoring of Industrial Processes

In order to monitor fast changing industrial processes in a satisfactory manner it is necessary to achieve high image quality at high frame rate per second. Therefore, possibility of industrial process monitoring and visualisation was also tested. At the beginning, calculations on a single computer were performed and at most 10–15 iterations of Landweber’s algorithm in one second have been achieved using single computer with CPU. In comparison, the KISDC platform (using two high performance nodes with multiple GPUs each) allowed for the reconstruction of 8 frames per second with 50 iterations of Landweber’s algorithm at the same time. Alternatively, if better image quality is demanded, it is possible to use highest possible image size that was tested (672KiB) and achieve 1 frame of image with 400 iterations of Landweber’s algorithm per second (Figure 13).

The system is flexible and can be extended by the next computing nodes, according to the needs.

6. Conclusions

A flexible, distributed computing system for tomographic image reconstruction called KISDC has been designed and developed. The system’s framework allows to accelerate any kind of computation dealing with a basic linear algebra operations. However, it should be noted that the KISDC is highly scalable and can be easily extended either by specific OpenCL kernel or by a plugin providing support for a special kind of calculations.

The work described in this paper was focused on improvement of data management in the distributed system and on reducing delays in the data transmission over the computer network. An original algorithm was designed that determines an array of performance coefficients for computing nodes in experimental way and allows to optimize the utilization of the system. The coefficient values are updated according to the current hardware configuration and to the system working conditions.

Also, a new network layer has been implemented. The comparison of times of data transfer and communication between the nodes shows very clearly that the use of the new developed system with HTTP protocol ensures much better results than with the Xgrid platform. It is also evident that the KISDC system allowed for a significant reduction of the total time of a single frame reconstruction and a major speed up in implementations of both the LBP and the Landweber’s algorithms.

Author Contributions

Conceptualization, all authors; methodology, all authors; software, M.M. and P.K.; validation, M.M., P.K., and L.J.-S.; formal analysis, M.M., P.K., and L.J.-S.; investigation, M.M., P.K., and L.J.-S.; resources, M.M. and P.K.; data curation, M.M. and P.K.; writing—original draft preparation, M.M., P.K., and L.J.-S.; writing—review and editing, R.B. and L.J.-S.; visualization, R.B., M.M., and P.K.; supervision, L.J.-S. and D.S.; funding acquisition, L.J.-S. and D.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Lodz University of Technology, Faculty of Electrical, Electronic, Computer and Control Engineering as a part of statutory activity (project no. 501/12- 24-1-5418).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

3D	three dimensional
API	Application programming interface
CPU	Central processing unit
ECT	Electrical capacitance tomography
FPGA	Field-programmable gate array
FTP	File transfer protocol
GPU	Graphics processing unit
HTTP	Hypertext transfer protocol
KIS	Computer engineering department (in Polish: Katedra Informatyki Stosowanej)
KISDC	KIS digital computing
LAN	Local area network
LBP	Linear back projection
RAM	Random access memory
SMB	Server message block

References

Grudzień, K.; Chaniecki, Z.; Romanowski, A.; Niedostatkiewicz, M.; Sankowski, D. ECT Image Analysis Methods for Shear Zone Measurements during Silo Discharging Process. Chin. J. Chem. Eng. 2012, 20, 337–345. [Google Scholar] [CrossRef]
Garbaa, H.; Jackowska-Strumiłło, L.; Grudzień, K.; Romanowski, A. Funnel Flow Dynamics Prediction Based on Electrical Capacitance Tomography and Artificial Neural Networks. In Proceedings of the International Interdisciplinary PhD Workshop, Brno, Czech Republic, 12–15 Sepember 2016; pp. 58–62. [Google Scholar]
Banasiak, R.; Wajman, R.; Fidos, H.; Jaworski, T.; Fiderek, P.; Kapusta, P.; Majchrowicz, M.; Sankowski, D. Fusion of three-dimensional electrical capacitance tomography and fuzzy logic inference for phases fraction and flow structures identification in horizontal and vertical gas-liquid flow pipelines. In Proceedings of the 7th World Congress on Industrial Process Tomography, Kraków, Poland, 2–5 September 2013; pp. 818–827. [Google Scholar]
Soleimani, M.; Mitchell, C.N.; Banasiak, R.; Wajman, R.; Adler, A. Four-dimensional electrical capacitance tomography imaging using experimental data. Prog. Electromagn. Res. PIER 2009, 90, 171–186. [Google Scholar] [CrossRef] [Green Version]
Minervini, M.; Rusu, C.; Tsaftaris, S. Computationally Efficient Data and Application Driven Color Transforms for the Compression and Enhancement of Images and Video. In Color Image and Video Enhancement; Celebi, E., Lecca, M., Smolka, B., Eds.; Springer: Cham, Switzerland, 2015; pp. 371–393. [Google Scholar]
Polakowski, K.; Filipowicz, S.; Sikora, J.; Rymarczyk, T. Quality of Imaging in Multipath Tomography. Prz. Elektrotech. 2009, 2009 85, 134–136. [Google Scholar]
Rymarczyk, T.; Sikora, J.; Adamkiewicz, P. Effective algorithm for tomography imaging in three-dimensional problems. Prz. Elektrotech. 2019, 95, 115–118. [Google Scholar]
Yang, W.Q.; Peng, L. Review of image reconstruction algorithms for electrical capacitance tomography, Part 1: Principles. In Proceedings of the International Symposium on Process Tomography, Wrocław, Poland, 11–12 September 2002; pp. 123–132. [Google Scholar]
Yang, W.Q.; Peng, L. Review of image reconstruction algorithms for electrical capacitance tomography, Part 2: Evaluation. In Proceedings of the International Symposium on Process Tomography, Wrocław, Poland, 11–12 September 2002; pp. 133–140. [Google Scholar]
Rashetnia, R.; Alla, O.; Gonzalez-berrios, G.; Seppanen, A.; Ghaz, M. Electrical Resistance Tomography Based Sensing Skin with Internal Electrodes for Crack Detection in Large Structures: Preliminary Results Standard Difference Imaging Study. Mater. Eval. 2018, 10, 1405–1413. [Google Scholar]
Bieberle, A.; Frust, T.; Wagner, M.; Bieberle, M.; Hampel, U. Data processing performance analysis for ultrafast electron beam X-ray CT using parallel processing hardware architectures. Flow Meas. Instrum. 2017, 53A, 180–188. [Google Scholar] [CrossRef]
Kapusta, P.; Majchrowicz, M. Accelerating Image reconstruction algorithms in Electrical Capacitance Tomography using Multi-GPU system. In Advanced Numerical Modelling, International Interdisciplinary PhD Workshop; Electrotechnical Institute: Warsaw, Poland, 2011; pp. 47–49. [Google Scholar]
Kapusta, P.; Majchrowicz, M.; Sankowski, D.; Jackowska-Strumiłło, L.; Banasiak, R. Distributed multi-node, multi-GPU, heterogeneous system for 3D image reconstruction in Electrical Capacitance Tomography—Network performance and application analysis. Przegląd Elektrotech. 2013, 89, 339–342. [Google Scholar]
Majchrowicz, M.; Kapusta, P.; Banasiak, R. Applying parallel and distributed computing for image reconstruction in 3D Electrical Capacitance Tomography. Zesz. Nauk. Agh Autom. 2010, 14, 711–722. [Google Scholar]
Majchrowicz, M.; Kapusta, P.; Jackowska-Strumiłło, L.; Sankowski, D. Acceleration of image reconstruction process in the Electrical Capacitance Tomography 3D in heterogeneous, multi-GPU system. Inform. Control Meas. Econ. Environ. Prot. (IAPGOS) 2017, 7, 37–41. [Google Scholar] [CrossRef]
Kapusta, P.; Majchrowicz, M.; Sankowski, D.; Jackowska-Strumiłło, L. Acceleration of image reconstruction in 3D Electrical Capacitance Tomography in heterogeneous, multi-GPU system using sparse matrix computations and Finite Element Method. In Proceedings of the 2016 Federated Conference on Computer Science and Information Systems (ACSIS), Gdańsk, Polan, 11–14 September 2016; Volume 8, pp. 679–683. [Google Scholar]
Gunes, C.; Acero, D.O.; Marashdeh, Q.M.; Teixeira, F.L. Acceleration of Electrical Capacitance Volume Tomography Imaging by Fourier-Based Sparse Representations. IEEE Sens. J. 2018, 18, 9649–9659. [Google Scholar] [CrossRef]
Garbaa, H.; Jackowska-Strumiłło, L.; Grudzień, K.; Romanowski, A. Neural network approach to ECT inverse problem solving for estimation of gravitational solids flow. In Proceedings of the 2014 Federated Conference on Computer Science and Inf. Systems (AAIA’14), Warsaw, Poland, 7–10 September 2014; Volume 2, pp. 19–26. [Google Scholar]
Garbaa, H.; Jackowska-Strumiłło, L.; Grudzień, K.; Romanowski, A. Application of electrical capacitance tomography and artificial neural networks to rapid estimation of cylindrical shape parameters of industrial flow structure. Arch. Electr. Eng. 2016, 65, 657–669. [Google Scholar] [CrossRef]
Fiderek, P.; Kucharski, J.; Wajman, R. Fuzzy inference for two-phase gas-liquid flow type evaluation based on raw 3D ECT measurement data. Flow Meas. Instrum. 2017, 54, 88–96. [Google Scholar] [CrossRef]
Deabes, W. FPGA Implementation of ECT Digital System for Imaging Conductive Materials. Algorithms 2019, 12, 28. [Google Scholar] [CrossRef] [Green Version]
Sankowski, D.; Grudzień, K.; Chaniecki, Z.; Banasiak, R.; Wajman, R.; Romanowski, A. Process tomograhy development at Technical University of Lodz. In Electrical Capacitance Tomography Theoretical Basis and Applications; Sankowski, D., Sikora, J., Eds.; Wydawnictwo Książkowe Instytutu Elektrotechniki: Warszawa, Poland, 2010; pp. 70–95. [Google Scholar]
Cugola, G.; Di Nitto, E.; Fuggetta, A. Exploiting an Event-Based Infrastructure to Develop Complex Distributed Systems. In Proceedings of the 20th International Conference on Software Engineering (ICSE’98), Kyoto, Japan, 19–25 April 1998; pp. 261–270. [Google Scholar]
Gabassi, M.; Dupouy, B. Przetwarzanie Rozproszone w Systemie UNIX; Lupus: Warszawa, Poland, 1995. [Google Scholar]
Czajkowski, K.; Fitzgerald, S.; Foster, I.; Kesselman, C. Grid Information Services for Distributed Resource Sharing. In Proceedings of the 10th IEEE International Symposium on High Performance Distributed Computing (HPDC-10), San Francisco, CA, USA, 7–9 August 2001; pp. 181–194. [Google Scholar]
Czajkowski, K.; Foster, I.; Kesselman, C.; Sanger, V.; Tuecke, S. SNAP: A Protocol for Negociating Service Level Agreements and Coordinating Resource Management in Distributed Systems. In Job Scheduling Strategies for Parallel Processing. JSSPP 2002; Springer: Berlin/Heidelberg, Germany, 2002; pp. 153–183. [Google Scholar]
Nawrocki, W. LAN-Based Measurement Systems. In Measurement Systems and Sensors, 2nd ed.; Nawrocki, W., Ed.; Artech House Remote Sensing Library: London, UK, 2016; pp. 339–358. [Google Scholar]
Carzaniga, A.; Rosenblum, D.; Wolf, A. Challenges for Distributed Event Services: Scalability vs. Expressiveness. In Proceedings of the ICSE’99 Workshop on Engineering Distributed Objects (EDO’99), Los Angeles, CA, USA, 17–18 May 1999. [Google Scholar]
Carzaniga, A.; Rosenblum, D.; Wolf, A. Interfaces and Algorithms for a Wide-Area Event Notification Service. In Technical Report CU-CS-888-99; Department of Computer Science, University of Colorado: Boulder, CO, USA, 1999. [Google Scholar]
Casanova, H.; Legrand, A.; Zagorodnov, D.; Berman, F. Heuristics for Scheduling Parameter Sweep Applications in Grid Environments. In Proceedings of the 9th Heterogeneous Computing Workshop (HCW’00), Cancun, Mexico, 1 May 2000; pp. 349–363. [Google Scholar]
Chen, Y.; Katz, R.; Kubiatowicz, J. Dynamic Replica Placement for Scalable Content Delivery. In Peer-to-Peer Systems. IPTPS 2002; Springer: Berlin/Heidelberg, Germany, 2002; pp. 306–318. [Google Scholar]
Chervenak, A.; Foster, I.; Kesselman, C.; Salisbury, C.; Tuecke, S. The Data Grid: Towards and Architecture for the Distributed Management and Analysis of Large Scientific Data Sets. J. Netw. Comput. Appl. 2000, 23, 187–200. [Google Scholar] [CrossRef] [Green Version]
Majchrowicz, M.; Kapusta, P.; Jackowska-Strumiłło, L.; Sankowski, D. Analysis of Application of Distributed Multi-Node, Multi-GPU Heterogeneous System for Acceleration of Image Reconstruction in Electrical Capacitance Tomography. Image Process. Commun. 2015, 20, 5–14. [Google Scholar] [CrossRef] [Green Version]
Majchrowicz, M.; Kapusta, P.; Jackowska-Strumiłło, L.; Sankowski, D. Acceleration of image reconstruction in 3D Electrical Capacitance Tomography in heterogeneous, multi-GPU, multi-node distributed system. In Proceedings of the International Interdisciplinary PhD Workshop 2017, Lodz, Poland, 9–11 Sepember 2017; pp. 164–169. [Google Scholar]
Majchrowicz, M.; Kapusta, P.; Jackowska-Strumiłło, L.; Sankowski, D. Acceleration of 3D ECT image reconstruction in heterogeneous, multi-GPU, multi-node distributed system. In Proceedings of the 2018 Federated Conference on Computer Science and Information Systems, Poznan, Poland, 9–12 September 2018; pp. 347–350. [Google Scholar]
Tchorzewski, P.; Rymarczyk, T.; Sikora, J. Using Topological Algorithms to Solve Inverse Problem in Electrical Impedance Tomography. In Proceedings of the International Interdisciplinary PhD Workshop, Brno, Czech Republic, 12–15 Sepember 2016; pp. 46–50. [Google Scholar]
Sun, B.; Yue, S.; Cui, Z.; Wang, H. A new linear back projection algorithm to electrical tomography based on measuring data decomposition. Meas. Sci. Technol. 2015, 26, 1270–1283. [Google Scholar] [CrossRef]
Cui, Z.; Wang, Q.; Xue, Q.; Fan, W.; Zhang, L.; Cao, Z.; Sun, B.; Wang, H.; Yang, W. A review on image reconstruction algorithms for electrical capacitance/resistance tomography. Sens. Rev. 2016, 36, 429–445. [Google Scholar] [CrossRef]
Wajman, R. New image reconstruction method for capacitance process tomography. Zesz. Nauk. Politech. Łódzkiej Elektr. 2008, 113, 107–113. [Google Scholar]
Kapusta, P.; Duch, P.; Jaworski, T.; Kucharski, J.; Ślot, K. Generative network input shaping for controlling visual object rendition in Adversarial Networks. In Proceedings of the International Interdisciplinary PhD Workshop 2017, Lodz, Poland, 9–11 Sepember 2017; pp. 380–385. [Google Scholar]
Majchrowicz, M.; Kapusta, P. Application of General-Purpose Computing on Graphics Processing Units for Acceleration of Basic Linear Algebra Operations and Principal Components Analysis Method. In Man-Machine Interactions 3, Advances in Intelligent Systems and Computing; Springer: Berlin/Heidelberg, Germany, 2014; Volume 242, pp. 519–527. [Google Scholar]
Gall, F.L. Faster Algorithms for Rectangular Matrix Multiplication. arXiv 2012, arXiv:1204.1111v2. [Google Scholar]
Romanowski, A.; Grudzien, K.; Banasiak, R.; Williams, R.A.; Sankowski, D. Hopper Flow Measurement Data Visualization: Developments Towards 3D. In Proceedings of the 5th World Congress on Industrial Process Tomography, Bergen, Norway, 3–6 September 2007. [Google Scholar]
Fielding, R.; Gettys, J.; Mogul, J.; Frystyk, H.; Masinter, L.; Leach, P.; Berners-Lee, T. Hypertext Transfer Protocol–HTTP/1.1. Request For Comments (RFC); Internet Engineering Task Force: Wilmington, DE, USA, 1999. [Google Scholar]

Figure 1. Diagram of the measuring system in the electrical capacitance tomography: (a) ideological— constructional, (b) block—functional.

Figure 2. Dependency between 3D image reconstruction quality and number of Landweber’s algorithm iterations (it. LW) for two example objects (simple two balls and challenging T letter shape). From the left to the right : (in red) LBP (0it. LW), 1it LW, 5it LW, 10it LW, 25it LW, 50it LW, reference phantom object (in green).

Figure 3. Schematic of a designed heterogeneous distributed computing system comprising graphics cards.

Figure 4. Block diagram of different layers in KIS digital computing (KISDC) platform.

Figure 5. Activity block diagram for performing calculations using the KISDC platform—each block contains a name of activity and a name of corresponding function in KISDC framework realizing this activity (in parenthesis).

Figure 6. Flow diagram for coefficients computation algorithm for KISDC platform, where WL is array of nodes, N_W is number of elements in WL array, W is an array of computed coefficients, T is an array of average time for each node, Max_T_Diff is maximal time difference between nodes, IsB is bolean flag indicating if recalculation (false value) is necessary.

Figure 7. Block diagram of data division between devices on each node, kisdc_push function is responsible for transferring data to devices whereas kisdc_pop transfer data from devices.

Figure 8. General schema of distributed Landweber’s algorithm with data buffering, where

ε_{k + 1}

is image obtained in current iteration,

ε_{k}

is image from the previous iteration,

α

is convergence factor (scalar),

S^{T}

is sensitivity matrix transposed,

S

is sensitivity matrix,

C_{m}

is capacitance measurements vector, CN is an array of capacitance measurements vectors, WL is an array of nodes, W is an array of computed coefficients, and Nk is a number of elements in W array.

Figure 8. General schema of distributed Landweber’s algorithm with data buffering, where

ε_{k + 1}

is image obtained in current iteration,

ε_{k}

is image from the previous iteration,

α

is convergence factor (scalar),

S^{T}

is sensitivity matrix transposed,

S

is sensitivity matrix,

C_{m}

is capacitance measurements vector, CN is an array of capacitance measurements vectors, WL is an array of nodes, W is an array of computed coefficients, and Nk is a number of elements in W array.

Figure 9. Flow diagram for data and calculations distribution algorithm.

Figure 10. Comparison of average data transmission times for different network protocols.

Figure 11. Graph showing comparison of results for hypertext transfer protocol (HTTP) and Xgrid platform for configuration with two computers and measurement data vector (C 2 KiB).

Figure 12. Graph showing performance comparison for HTTP protocol and Xgrid platform for two-computer configuration and 3072 KiB image data vector.

Figure 13. Comparison of image reconstruction time in Xgrid and KISDC systems for dual computer configuration using 400 iterations of Landweber’s algorithm.

Figure 14. Graph showing a difference in time of image reconstruction between the improved version of the algorithm over previously obtained results.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Majchrowicz, M.; Kapusta, P.; Jackowska-Strumiłło, L.; Banasiak, R.; Sankowski, D. Multi-GPU, Multi-Node Algorithms for Acceleration of Image Reconstruction in 3D Electrical Capacitance Tomography in Heterogeneous Distributed System. Sensors 2020, 20, 391. https://doi.org/10.3390/s20020391

AMA Style

Majchrowicz M, Kapusta P, Jackowska-Strumiłło L, Banasiak R, Sankowski D. Multi-GPU, Multi-Node Algorithms for Acceleration of Image Reconstruction in 3D Electrical Capacitance Tomography in Heterogeneous Distributed System. Sensors. 2020; 20(2):391. https://doi.org/10.3390/s20020391

Chicago/Turabian Style

Majchrowicz, Michał, Paweł Kapusta, Lidia Jackowska-Strumiłło, Robert Banasiak, and Dominik Sankowski. 2020. "Multi-GPU, Multi-Node Algorithms for Acceleration of Image Reconstruction in 3D Electrical Capacitance Tomography in Heterogeneous Distributed System" Sensors 20, no. 2: 391. https://doi.org/10.3390/s20020391

APA Style

Majchrowicz, M., Kapusta, P., Jackowska-Strumiłło, L., Banasiak, R., & Sankowski, D. (2020). Multi-GPU, Multi-Node Algorithms for Acceleration of Image Reconstruction in 3D Electrical Capacitance Tomography in Heterogeneous Distributed System. Sensors, 20(2), 391. https://doi.org/10.3390/s20020391

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Multi-GPU, Multi-Node Algorithms for Acceleration of Image Reconstruction in 3D Electrical Capacitance Tomography in Heterogeneous Distributed System^†

Abstract

1. Introduction

2. Image Reconstruction in ECT