FPGA-Based Portable Ultrasound Scanning System with Automatic Kidney Detection

: Bedsides diagnosis using portable ultrasound scanning (PUS) offering comfortable diagnosis with various clinical advantages, in general, ultrasound scanners


Introduction
Ultrasound scanning is a widely-used modality in healthcare, as it provides real-time, safer and noninvasive imaging of organs in the body, including heart, kidney, liver, fetus, etc. Conventional ultrasound machines cannot be used for point-of-care (POC) applications due to their high form factor. POC refers to treating the patients at their bedside, which has great potentiality in saving lives, especially in situations like ambulances, military, casualties, rural healthcare, etc. [1].Ultrasound scanning needs limited setup for scanning, and advancements in computing platforms enable portable ultrasound scanning (PUS) systems' use in POC applications.
Commercially-available PUS with application-specific integrated circuit (ASIC) cannot be updated with new algorithms, while programmable ultrasound scanning machines provide the flexibility for rapid prototyping, updating and validating new algorithms [2].Computing platforms, such as field-programmable gate array (FPGA), digital signal processor (DSP) and media processors, have capabilities in terms of hardware to realize complex ultrasound signal processing algorithms in real time.In [3], ultrasound signal processing is performed on a single media processor; due to the computational limitations of the processor, the beamformed data are acquired through an ultrasound research interface from a high-end system, and the real-time mid-end and back-end processing algorithms are implemented on a media processor.In [4], a complete standalone ultrasound scanning machine is realized using single FPGA and a proposed pseudo-dynamic received beamforming with an extended aperture technique targeting the PUS system.In [5], we proposed a low complexity programmable FPGA-based eight-channel ultrasound transmitter for research, which enables rapid prototyping of new algorithms.ASIC for ultrasound signal processing is beneficial for reducing the size of the ultrasound machine, and we proposed an ASIC design for integrated transmit and receive beamforming in [6].In [7], the authors demonstrated the feasibility of implementing the ultrasound signal processing on a high-end android mobile platform.Low complexity algorithms have been proposed for portable ultrasound machines, such that they can be easily implemented on computational platforms that have less computational capability [8][9][10][11].
In this paper, we propose an FPGA-based standalone portable ultrasound scanning system with real-time kidney detection.We chose FPGA for implementing complete ultrasound signal processing due to its efficiency in handling high data rates.The PUS machine is realized using Kintex-7 [12], which is integrated with an ARM1176JZF-S Raspberry Pi-2 processor [13].
For displaying and recording the scanned ultrasound data, the H.264 codec is used.H.264 is a computationally-expensive procedure, and several optimized versions are proposed to implement in real time on different computing platforms, like FPGAs, DSPs, microprocessors, etc.Here, we list some of the optimizations proposed for the H.264 codec to implement on an FPFA platform.In [14], low complexity and low energy adaptive hardware for H.264 with multiple frame reference motion estimation is proposed and implemented on an FPGA platform.A bioinspired custom-made reconfigurable hardware architecture is proposed in [15,16] for efficient computation of motion between image sequences.The work in [17] proposed a flexible and scalable fast estimation processor targeting the computational requirements for high definition video using the H.264 video codec on an FPGA platform.Optimization for accelerating the block matching algorithms on a Nios II microprocessor, which is designed for FPGA, is proposed in [18,19].A new early termination algorithm is proposed in [20] for effective computation of the correlation coefficient in template matching with fewer computations.Several optimizations are available for implementing the H.264 codec on an FPGA, but the FPGA resources in the proposed portable ultrasound system are preserved for implementing complex Doppler ultrasound.In the proposed portable ultrasound system, the H.264 codec is installed on a Raspberry Pi processor, which is downloaded from [21].
Even though ultrasound scanning is widely used, the ultrasound images have a low signal-to-noise ratio and contrast, and there will not be any significant distinction between the organ region and edges, making it difficult to identify the organ exactly.Portable ultrasound scanners in remote areas are mostly used by emergency physicians who may not be fully trained in sonography.Hence, there is a need for computer-assisted algorithms, which can assist physicians in making accurate decisions.These algorithms include speckle suppression, organ detection, computer-aided diagnosis (CAD), etc.Hence, automatic detection will be very beneficial for the development of applications related to CAD, image compression, segmentation, image-guided interventions, etc.
In the literature, portable ultrasound scanning machines are realized on FPGAs, DSPs and media processors [2][3][4], and the automatic detection of kidneys in CT images based on random forest is proposed.The algorithms proposed for CT images will not work for ultrasound images, as the characteristics of kidney change with imaging modality, and the kidney images acquired through ultrasound scanning depend on the person who scans.Therefore, manual cropping is employed for detecting the kidney in ultrasound images [22].In this paper, we propose an automatic kidney detection algorithm for detecting the kidneys in a real-time ultrasound video and evaluate the algorithm on an FPGA-based PUS machine in real-time.To the best of our knowledge, this is the first working prototype of a portable ultrasound scanner with automatic detection of kidney.The prototype of the PUS is evaluated by scanning the tissue-mimicking gelatin phantom.The reconstructed images of the PUS are visually compared to the images of commercially available platforms.The proposed portable ultrasound scanner with the kidney detection algorithm is successfully tested at the lab level.
The rest of the paper is organized as follows.In Section 2, we describe the flow adhered to for automated kidney detection.The hardware implementation of the portable ultrasound system is described in Section 3. The experimental setup and performance of the system are discussed in Section 4. Section 5 concludes the paper with discussions on the future scope of work.

Automatic Kidney Detection
Automatic organ detection is very beneficial for semi-skilled persons who operate the device remotely.Organ detection in ultrasound is strongly influenced by the quality of data.The characteristic artifacts that make the organ detection complicated are speckle noise, acoustic shadows, attenuation, signal dropout and missing boundaries of the organ.In this paper, we focus on detecting kidney in ultrasound images, which is different from segmentation of kidney, i.e., extracting the exact contour of kidney.
Kidney is made up of soft tissue, and it is nonrigid in nature.Moreover, the size of kidney in ultrasound images varies from patient to patient depending on the patient's age, anatomy, disease, orientation of acquisition, etc., so, it becomes a challenging task to automatically detect kidney in ultrasound images.
In the literature, the problem of automatic detection of kidney with ultrasound images has never been addressed before.However, there are some algorithms proposed in the literature to segment the kidney in ultrasound.Automatic organ detection is addressed as one of the intermediate steps involved in the segmentation procedure.Organs have to be accurately segmented in designing applications, like 3D reconstruction, CAD, automatic measurements, etc. Organ detection and localization are the two major steps involved in segmentation.In the literature, not much interest is shown in automatic detection of organs, while major research is done on detecting the exact contour of an organ.The organs in the images are detected by manually cropping along the contours [22], placing the landmarks on the contour [23] and marking the bounding boxes around the kidney [24].Automatic detection of kidney in CT slices using random forest has been used in [25] for reconstructing the 3D structure of kidney.A two-stage detection algorithm is employed for automatic detection of lymph nodes in CT data; Haar features with the Adaboost cascade classifier are used in the first stage, and in the second stage, self-assigning features with the Adaboost cascade classifier are used [26].The algorithm proposed in [22] needs manual cropping and assumes that kidney is present in every slice.However, when we scan kidney using 2D probes, we cannot expect kidney to be present in each frame, and kidneys also have been occluded by surrounding organs, like liver and spleen.As kidney is not present in every slice, the correlation between two frames is not useful in tracking the kidney.Therefore, the algorithm has to detect the kidney in each incoming slice.The kidney detection algorithm has to be fast enough at automatically detecting the kidney in real time from incoming ultrasound data.
Active contours [27], also known as snakes [28], pose contour detection as an optimization problem that moves a parameterized curve towards the image region with strong edges; though this model is highly successful, it suffers from initialization, and the optimization function being nonlinear, as a result, the solution is trapped in many places.In level-set methods [29,30], the contour of kidney is represented by a level-set zero of a distance function; this approach is less sensitive to initialization and significantly suffers from imaging conditions.Both active contours and level-set approaches are highly successful in contour extraction; they present the drawback of incorporating prior knowledge about the shape and texture distribution in the optimization function; the prior knowledge may be in the form of a probability distribution formed from the manually-segmented contours and textures.Active shape models (ASM) [23] and active appearance models (AAM) [31] are supervised learning approaches, which estimate the parameters from a joint distribution representing the shape and appearance of an organ; these methods need large databases, and the initialization of the contour should be close to local optima.In [32], kidney segmentation based on texture and shape priors is proposed; the texture features are extracted from a bank of Gabor filters on test images, then an iterative segmentation is proposed to combine texture measures into a parametric shape model.In [22], a probabilistic Bayesian method is employed to detect the contours in a three-dimensional ultrasound.
The models discussed above are constraint based on the terms of shapes and the condition of the organ (normal and abnormal).For example, active contours will fail to extract the exact contour of kidney in the presence of cysts in kidney.However, the ASM and AAM can capture the contour of kidney even in the presence of cysts, but fail to capture the diversified variations of contour.In this paper, we are interested in detecting the region of interest for kidney, which is useful to emphasize only that particular region further.The method we adopted for detecting kidney in an ultrasound images is based on a Viola-Jones detector [33].Viola-Jones is highly successful in detecting many object classes; it was primarily developed for face detection in images and later extended to detect faces in video sequences [34].Recently, the Viola-Jones algorithm framework was used in medical image analysis to detect various organs, such as pelvis and proximal femur of a hip joint in 3D CT images [35].In [36], a modified Viola-Jones algorithm is used to automatically locate the carotid artery in ultrasound images.
In this paper, we show the effectiveness of the Viola-Jones algorithm in detecting the region of interest for kidney in ultrasound images.The complete Viola-Jones algorithm is implemented on a Raspberry pi board, which is interfaced with the Xilinx Kintex-7 FPGA, where ultrasound signal processing algorithms are implemented.
The block diagram representation of the kidney detection algorithm is shown in Figure 1.The basic components corresponding to this algorithm include Haar-like features, the integral image, the Adaboost algorithm and the cascade of classifiers.

Haar-Like Features Used in the Kidney Detection Algorithm
Haar-like features are derived from the kernels [37]; a few of these kernels are shown in Figure 2. The white region in the kernel corresponds to weight w 0 = −1, and the black region corresponds to w 1 = +1.The value of these features are then computed using the formula f (x) = w 0 r 0 + w 1 r 1 , where f (x) is the response of a given Haar-like feature to the input image x, w 0 is the weight of the area r 0 and w 1 is the weight of the black area r 1 .The number of pixels in areas r 0 and r 1 varies because the features are generated for various possible combinations and positions in a given window.These dimensions start from a single pixel and extend up to the size of a given window.The features generated using these kernels are independent of image content.The process of feature generation is explained as follows: considering the kernel in Figure 2a, which is initially of a two-pixel column width (one pixel white and one pixel black), the feature value of f (x) is computed.The kernel is shifted from the top left of the image by one pixel, and a new feature value is calculated.Similarly, the kernel is then moved across the complete image, until it reaches the right bottom of the image with all of the features computed.Hence, the features are evaluated hundreds of times as the kernel moves across all of the rows of the image, and every time, a new feature value is updated in the feature list.Later, the kernel is increased to a four-pixel width (two white pixels and two black pixels), and the process is repeated to get new feature values.Five different kernels are used for generating the rectangular features.The process is repeated for all of the kernels; considering all of the variations in size and position, a total of 586,992 features were computed [33] for a window of size 32 × 54 [38].Figure 3 illustrates the flowchart for generating Haar-like features.

Integral Image
Computing the sum of pixels in a given area is a computationally-expensive procedure; hence, intermediate representation of the image is used to compute the features rapidly and efficiently; this representation of the image is called an integral image [39].The conversion of the image to an integral image is based on the following formulae: where u, v are the indices of the pixel, s(u, v) is the cumulative sum of pixel values in a row with initial conditions as s(u, −1) = 0 and i 1 (−1, v) = 0, i(u, v) is the pixel value of original image and i 1 (u, v) is the pixel value of the integral image.
In the integral image, the sum of all pixels under a rectangle can be evaluated using only the four-corner values of the image, which is similar to the summed area technique used in graphics [40].In Figure 4

Selection of Features for Automatic Kidney Detection Using the Adaboost Algorithm
Adaboost is a machine learning algorithm that helps with finding the best features among 586,000+ features [41,42] for detecting kidney.After evaluating the obtained features from the Adaboost algorithm, the weighted combination of these features is used to decide whether a given window has kidney or not.These selected features are also called weak classifiers.The output of a weak classifier is binary, either one or zero.One indicates that the feature is detected in the window, and zero indicates that there is no feature in the window.Adaboost constructs a strong classifier based on the linear combination of these weak classifiers.
where F (x) is a strong classifier, f i (x) is a weak classifier and α i is the weight corresponding to the error evaluated using classifier f i (x).
The flowchart for computing the features using the Adaboost algorithm is shown in Figure 5 [36].Adaboost starts with a uniform distribution of weights over training examples.The classifier with the lowest weighted error (a weak classifier) is selected.Later, the weights of the misclassified examples are increased, and the process is continued, till the required number of features is selected.Finally, a linear combination of all of these weak classifiers is evaluated, and a threshold is selected.If the linear combination of a new image in a given window is greater than this threshold, it is considered as kidney being present; if it is less than the threshold, it is classified as non-kidney.The Adaboost algorithm finds a single feature and threshold that best separate the positive (kidney) and negative (non-kidney) training examples in terms of weighted error.Firstly, the initial weights are set for positive (with kidney) and negative (without kidney) examples.Each classifier is used from 586,000+ features to determine the error, which in this case is to misclassify the presence of kidney in a window.If the error is high, the training process ends; else, the weights α t are set to the selected linear classifier.The α j are computed as: where j is the error occurring while classifying the images.Later, the weights of the positive and negative examples are adjusted, such that the weights of the misclassified examples are boosted, and the weights of the correctly-classified examples are not changed.Finally, a strong classifier is created, which is a combination of weak ones weighted according to the error that they had.
Figure 5. Flowchart of the Adaboost algorithm for selecting the features for kidney detection.

Cascade Classifier for Automatic Kidney Detection
The strong classifier formed from the linear combination of these best features is a computationally-expensive procedure.Therefore, a cascade classifier is used, which consists of stages, and each stage has a strong classifier [43].Therefore, all of the features are grouped into several stages, where each stage has a certain number of features along with a strong classifier.Thus, each stage is used to determine whether a kidney is present in a given sub-window.The block diagram of the cascade classifier is shown in Figure 6.A given sub-window is immediately discarded if kidney is not present and not considered for further stages.The cascade classifier works on the principle of rejection, as the majority of sub-windows will be negative.It rejects many negatives at the earliest stage possible.This reduces the computational cost, and hence, kidney in the image can be detected at a faster rate.The complete analysis and implementation of the Viola-Jones algorithm can be found in [44].

Database
Kidney images and videos were acquired using a Siemens S1000 from 400 patients during the period May 2014 to February 2015.These patients were in the age group of 14 to 65 years, including both genders.The database consists of 790 images with 400 kidney (normal: 332; cyst: 40; stone: 28) and 390 non-kidney (liver: 100; heart: 95; carotid: 90; spleen: 105) images.The kidney and non-kidney images were collected from the same patients.The data were collected by acknowledging the patients.The data received from the doctor were annotated with the patient's age, gender and disease.The names of the patients were not revealed in the process.

Metrics Used for Evaluating the Automatic Kidney Detection Algorithm
The performance of the organ detection algorithm is evaluated by tabulating the confusion matrix.The metrics used in this paper, referring to Figure 7, are given below: P ositive P redictive V alue = T P T P + F P (7) Sensitivity measures the proportion of actual positives, and specificity measures the proportion of actual negatives correctly identified as such, respectively.A positive predictive value measures the efficiency of an algorithm to correctly identify normal kidneys, and a negative predictive value measures the efficiency of an algorithm to correctly identify negative images.Accuracy gives the efficiency of an algorithm in correctly localizing kidney in the images.
For training the Viola-Jones algorithm, 640 images (positive images: 320; negative images: 320) are used.The positive and negative images used in training the cascade classifier are shown in Figures 8 and  9, respectively.The positive training dataset consists of kidneys with 270 normal, 30 cyst and 20 stone cases.The negative training dataset consists of 82 liver, 78 heart, 75 carotid and 85 spleen images.The region of interest for kidney differs from image to image.Therefore, in the training, all of the positive instances are rescaled to the expectation of width, the height of positive instances using nearest neighbor approach.The expectation for the region of interest is found to be 32 × 54 pixels.The kidney detection algorithm is trained using a 20-stage cascade classifier with a 0.2 false alarm rate.For training each stage in the cascade classifier, positive and negative examples are considered with a 1:2 ratio.In training the object model for kidney, each stage uses at most 320 positive instances and 640 negative instances.Negative instances are generated automatically from the negative images.The number of features used in first five stages of the cascade classifier are 3, 5, 4, 5 and 6, respectively.A total of 92 features are selected from 586,000+ features.The kidney detection algorithm is realized using functions of the Viola-Jones algorithm available in OpenCV.The inbuilt functions in OpenCV take the false alarm rate and the number of cascade classifiers as the input parameters and generate a model with feature selection and threshold parameters for cascade classifiers.
MATLAB 2015a running on a desktop with an i7 processor, 16 GB RAM and a 2.8-GHz clock takes 17.25 min to train the cascade model for kidney detection.The model generated from MATLAB is compatible with OpenCV and is used to detect the kidney in real time on an ARM processor.
The number of features and the threshold for each cascade classifier are automatically detected by the Adaboost algorithm.The number of features and the threshold are selected based on the given false alarm rate.The true positives of the detector increase with the increase in false positives, as shown in Table 1.The increase in false positives reduces the specificity and positive predictive value; ideally, these values should be equal to one.One hundred percent accuracy can be achieved with the cost of reducing the specificity and positive predictive value.An increase in parameters, like window enlargement in every step and displacement of the sliding window, slightly increases the detection speed at a cost of a slight decrease in accuracy.Detection accuracy heavily depends on false detections.A strong two-stage classifier with 100% detection accuracy can be constructed by keeping false alarms at more than 40%.The number of features selected for classification depends on the Adaboost algorithm.The detection time increased as the number of features used for classification increased.The displacement of the sliding window in detecting the kidney affected the accuracy of detection.The results presented here are for a one-pixel displacement.The detection time is improved with a two-pixel shift with slightly reduced accuracy.The kidney detection algorithm is initialized with the following parameters: the window enlargement in every step is set to 1.1, and every time, the sliding window is shifted by one pixel.A 10-fold cross-validation scheme is employed to test the variability in the database.The 10-fold cross-validation is applied on the test set ten times with different partitions.Ten random seeds are generated, each of them separating the test set into 10 disjoint sets.Each subset is used for testing, while the other remaining nine sets are used in training.Each 10-fold cross-validation produces classification results for 320 kidney and 320 non-kidney images.Thus, a total instance of 3200 and 3200 kidney and non-kidney images is tested.The overall classification accuracy is averaged over ten rounds.The Viola-Jones algorithm performed with an overall accuracy of 92.6% in detecting kidney.

System Description
Figure 10 shows an overall block level representation of the portable ultrasound imaging system using a single FPGA.The architecture consists of a 64-element linear transducer array, an eight-channel data acquisition module, a Kintex-7 FPGA and a Raspberry Pi-2 ARM processor.The data acquisition module consists of a beamformer and analog front end circuits, which include a high voltage (HV) pulser, analog receivers and 14-bit eight-channel analog to digital converters operating at 40 MHz.The signal processing algorithms, including the front-end, mid-end and back-end processing algorithms, are implemented on the Kintex-7 FPGA.The Raspberry Pi-2 processor running weezy-raspbian, a Debian-based Linux operating system with 1 GB RAM and a 900-MHz clock frequency, coordinates between all of the processing modules in ultrasound processing and also is used for displaying and communication purposes.The circuit-level organization and implementation of the portable ultrasound scanning system is shown in Figure 11.The Beamformer IC (LM96570 [45]) is programmed through Kintex-7 for generating eight P and N channels with 3.3-V logic pulses.The HV pulser acts like an amplifier, which converts 3.3-V logic pulses into transducer-compatible HV pulses.
The excitation pulses and received echoes traverse through the HV pulser.Therefore, when we apply HV pulses to the transducer, the transmit path will be shorted and the receive path will be opened to avoid damage to the analog circuitry.The analog front-end (AFE) receives reflected signals from the HV pulser and digitizes them.These digitized 14-bit ADC data are fed to the Kintex-7 implementing mid-end and back-end algorithms.The Raspberry Pi-2 processor displays the data received from the Kintex-7 board in the form of an image.Based on the functionality, the signal processing algorithms in ultrasound scanning are categorized into frond-end, mid-end and back-end processing modules.The functionality and implementation of each module is discussed below.

Front-End Processing
The front-end processing module consists of transmit (Tx) and receive (Rx) beamforming, low noise amplification and time gain compensation algorithms.Tx and Rx beamforming algorithms are essential to obtain good spatial and temporal resolution.In Tx beamforming, more than one element from the transducer is excited in such a way that all of the sound waves will converge at a particular point.In Rx beamforming, the echoes reach the different transducer elements with different delays.The echoes received from the transducer elements are delayed in such a way that all of the echoes are added coherently, as shown in Figure 12.The delay values for the transducer elements, referring to Figure 12, are computed in the following way: where R bf is the receive beamformed signal, E represents the number of active transducer elements used in acquiring the echo, n represents the number of focal zones, a e is the apodization coefficient and t e is the receive focusing delay of the e-th channel.The delay t e is computed as: here, (x e , y e ) is the location of the e-th channel, (x f , y f ) is the location of the focal point, y f c is the distance from the focal point to the central channel and v is the velocity of sound.In electronic beam steering, the delay value for the channels, referring to Figure 13, is computed in the following way: where where P represents the focal point, as shown in Figure 13, R f p(0) denotes the distance from the center element to point P , t i is the time required for the wavefront to reach point P , x i is the coordinate of the i-th element, t max is the maximum time required for the wavefront to reach point P , T i is the delay for the i-th element and α is the steering angle.The delay values are pre-computed, and lookup table (LUT)-based processing is employed to reduce the hardware complexity.The Rx beamforming is designed for 1024 focal points.Two bytes of memory are allocated for each delay value, constituting of a total of 2 KB of memory for each channel.LUT occupies 16 KB of memory to store the delay values of eight channels.Further processing of the beamformed signal, including low noise amplification, low pass filtering and analog to digital conversion, are implemented using AFE 5808 IC [46].

Mid-End Processing
Envelope detection and log compression operations are performed in mid-end processing.The significance of each operation is discussed below.

Envelope Detection
Echoes received from the interfaces have several cycles of oscillations with the same frequency of transmitted pulse.The high frequency nature of the signal cannot be viewed as an image, as the human eye cannot perceive the distinction between the samples.In Figure 14, the curves passing from above and below the baseline represent the envelope of the signal, thus eliminating the high frequency components present in the signal.The amplitude of the envelope signal corresponds to the pixel intensity of the B-mode image.The envelope of the signal is computed by taking the square root for the summation of the squared in phase and quadrature phase signals.The Hilbert transform is used to obtain in phase and quadrature phase components of the signal; it provides a +90 degrees phase shift to the positive frequency and a −90 degrees phase shift to the negative frequencies [47,48].The impulse response of an M length Hilbert filter is given by [49].
where α = (M − 1)/2 The Hilbert transform for echo signals is generated using a 32-tap FIR Hilbert filter; m is chosen based on the normalized root mean square error (RMSE) between the ideal Hilbert filter and the designed m-tap FIR Hilbert filter.Table 2 gives the RMSE of FIR Hilbert filter with respect to order m [50].The 32-tap FIR Hilbert filter is chosen as the RMSE is very low. Figure 14 shows the envelope detected data of the RF signal obtained from the AFE with the 32-tap FIR Hilbert filter.

Range Compression
The envelope detected RF signal has a high dynamic range over 80 dB and, hence, does not fit the resolution of the monitors.A dynamic range compression technique is used to compress the RF signal to display on devices having a low dynamic range, which is around seven or eight bits.A nonlinear compression technique, like log compression, is used to selectively compress the large input signals.
Figure 15 shows the dynamic range compressed data of the envelope signal (shown in Figure 14).To reduce the computational complexity, the LUT approach is employed, and 64 KB of memory are utilized for computing the log compression.

Back-End Processing
In back-end processing, the scan conversion and interpolation algorithms are performed on dynamic range compressed data.In phased array excitation, the dynamically compressed RF data are in the polar coordinate system.To display data on the monitor, the data in the polar coordinate system should be transformed to the rectangular coordinate system.The coordinate rotational digital computer (CORDIC) algorithm is employed for computing the square root and arctangent operations.The scanned data are inadequate to fit the resolution of the display monitor.A 2 × 2 linear interpolation method is used to increase the number of data points, such that it fits the resolution of the screen.
The processed data in the Kintex board are transmitted to Raspberry Pi through the RS-232 protocol with speeds up to 2,000,000 baud and transmit/receive 100 bytes in one transaction.The RS-232 protocol uses only Tx and Rx lines.A simple flow control between two boards is implemented through a transmit and wait protocol.The RS-232 controller of Kintex sends a frame triggered by the Tx signal from Raspberry Pi and becomes idle.A buffer matrix is created in the SDRAM of Raspberry Pi to store the frame.Raspberry Pi running a GUI application is programmed using the GTK+2.0libraries [51].The GTK toolkit allows the stored frame buffer to be displayed on a portion of the display window as a gray scale image, and an interrupt is generated for every one millisecond, which forces the processor to fetch a new frame from the Kintex, update the buffer and display the updated buffer in the window.The typical processing time for fetching the image, updating the buffer and displaying takes about 45 ms, and a streaming rate of approximately 22 fps is achieved through the RS-232 protocol.If a new frame arrives, it overwrites the existing frame in the buffer.Before overwriting, the existing frame is dumped into a data file.

Experimental Setup and Results
The prototype of the FPGA-based portable ultrasound scanning system is shown in Figure 16.The transducer elements are excited with a voltage of ± 50 V with a 1-A current.A custom-made power supply is designed to generate the required voltage, as shown in Figure 17.A 230-V, 50-Hz supply is given as the input to the power supply.A step down transformer is used to convert AC supply to ±50-V DC voltage with a 1-A current.These voltages are connected to the HV pulser for generating HV pulses to excite the transducer.The performance of the proposed portable ultrasound system is evaluated by comparing to the commercially available platforms, like the PCI Extensions for Instrumentation (PXI) system and Biosono.

Ultrasound Scanning System Based on PCI Extensions for Instrumentation
National Instrument's (NI) PXIe-1078 [52] is a rugged PC-based platform for measurement and automation systems.It has two eight-channel high speed analog input modules.We have interfaced our data acquisition board to the PXI analog channels.Spartan-3 FPGA [53] is used to establish synchronization between PXI and our data acquisition module.Figure 18 shows the PXI platform hardware setup.The portable ultrasound board is provided with control pins in the FPGA to program the transmission parameters for the logic pulse driver.The FPGA generates a transmit control signal, which is used for synchronization of the PXI with the ultrasound front-end board.The PXI system uses LABVIEW 2013 software to acquire high speed ultrasound data and to reconstruct the ultrasound image.A linear array transducer with a center frequency of 5 MHz is used to scan the gelatin phantom.

Biosono Ultrasound Platform
The proposed portable ultrasound data acquisition module is compared to the Biosono ultrasound platform.Figure 19a shows the Biosono ultrasound board designed for A-mode and M-mode imaging applications [54].The echos are sampled and digitized at a rate of 100 MHz and streamed out through the serial port at 115,200 bps.The MATLAB GUI (Figure 19b) is provided for viewing the real-time echo.The resources utilized in the Kintex-7 FPGA for implementing the beamforming, mid-end and back-end algorithms are analyzed using the Xilinx ISE 14.4 synthesis tool, and this is summarized in Table 4.The ultrasound signal processing algorithms use only 53% of the available LUT flip flops, so the remaining resources can be utilized for implementing other complex beamforming and image-processing algorithms.The performance of the automatic kidney detection algorithm is evaluated using the confusion matrix discussed in Section 2.2.The confusion matrix of the proposed automatic kidney detection algorithm is shown in Figure 21.The automatic kidney detection algorithm on Raspberry Pi-2 (900 MHz, 1 GB RAM) takes 60 ms to detect the presence of kidney in a 480 × 640 resolution frame.The kidney detection algorithm is used as an add-on feature in the Raspberry Pi board.By enabling the kidney detection option, the system directly applies the algorithm on the incoming frame, which is stored in the SDRAM.
For evaluating the kidney detection algorithm in real time, an ultrasound video is acquired from 10 patients using the Siemens S1000 through Ultrasonix 500RP [55].The scanned kidney videos have a frame rate of 15 fps, and this is transferred to the SDRAM of the Raspberry Pi processor.The kidney detection algorithm installed on the Raspberry Pi processor is successfully able to detect the kidney in live streaming video.
Implementing the Viola-Jones algorithm on an FPGA requires 11,505 slice registers, 7,872 slice flip flops, 20,901 four-input LUTs, 44 block RAM (BRAM) and one global clock (GCLK) buffer [56].Optimized algorithms have been proposed for fast real-time implementation of the Viola-Jones algorithm to detect faces in video having frame rates greater than 100.The Viola-Jones algorithm is effectively implemented on various computing platforms, like FPGAs, DSPs, GPUs, mobile platforms, etc. [56][57][58][59].The same computing platforms are also used for realizing PUS systems, so the Viola-Jones algorithm can be implemented on these platforms for real-time detection of kidney.From Table 4, the Kintex-7 has enough resources to implement the Viola-Jones algorithm.Implementing the kidney detection algorithm on an FPGA platform will reduce the load on the ARM processor, which can be used for other tasks, like real-time recording, compression, data transferring, etc.

Discussions and Conclusions
In this paper, we implemented entire B-mode ultrasound signal processing algorithms on a single FPGA-based Kintex-7 board with real-time automatic kidney detection in ultrasound video.The Viola-Jones algorithm proposed for face detection proved effective at detecting kidney in ultrasound images.The Viola-Jones algorithm proves very effective at capturing the diversified shapes of kidney, where other geometrical models fail to capture them.The Viola-Jones localizes the kidney by placing the bounding box around the kidney.The bounding box may not capture the entire organ inside it and can intersect the organ in the image.The bounding box can be used as the initialization to other algorithms, like ASM, active contours, etc., to capture the contour of a kidney.The automatic organ detection, which is also termed region of interest detection, can be very beneficial for performing high signal processing tasks, like segmentation and computer-aided diagnosis.Automatic organ detection is also beneficial for image compression, where non-organ parts are lossy compressed and loss-less compression is employed for organ parts.These sorts of data-dependent compression algorithms are very useful for tele-radiology applications.The hardware architecture proposed in this paper provides flexibility for installing new image processing algorithms in the system, this is done by installing the algorithms on an ARM processor.As the future extension of this work, we would like to develop signal processing algorithms to detect more organs in the image and CAD algorithms to automatically detect diseases in kidney without manual intervention.

Figure 1 .
Figure 1.Block diagram of the Viola-Jones algorithm for detecting kidney.

Figure 2 .
Figure 2. Kernels used to extract Haar-like features for the kidney detection algorithm.(a,b) Two-rectangle features.(c,d) Three-rectangle features and (e) Four-rectangle feature.

Figure 3 .
Figure 3. Flowchart to generate Haar-like features for the kidney detection algorithm.
, the value of the integral image at Location 1 is the sum of pixels in Rectangle A, at Location 2 A + B, at Location 3 A + C and at Location 4 A + B + C + D. The sum of pixels in D can be evaluated as (4 + 1) − (2 + 3).

Figure 4 .
Figure 4. Sum of the pixels in Region D using the four-array reference.

Figure 6 .
Figure 6.Block diagram of the cascade classifier for kidney detection.

Figure 8 .
Figure 8.Some of the kidney images of the positive training dataset used in training the Viola-Jones algorithm.

Figure 9 .
Figure 9.Some of the images of the negative training dataset used in training the Viola-Jones algorithm.

Figure 10 .
Figure 10.Block diagram representation of the portable ultrasound imaging system.

Figure 11 .
Figure 11.Circuit-level implementation of the portable ultrasound imaging system.

Figure 16 .
Figure 16.Prototype of the proposed FPGA-based portable ultrasound scanning system.

Figure 17 .
Figure 17.Power module designed for generating ±50 V DC.

Figure 18 .
Figure 18.Ultrasound system based on the PCI Extensions for Instrumentation (PXI) platform.

Figure 20 .
Figure 20.Image acquired and reconstructed from: (a) the Biosono platform; (b) the NIPXI platform; (c) the proposed FPGA-based portable ultrasound scanning system.

Figure 21 .
Figure 21.Confusion matrix for the proposed automatic kidney detection algorithm.

Figure 22 .
Figure 22.Multiple detections of kidney in ultrasound images.

Figure 24 .
Figure 24.False positives detected in localizing the kidney.

Table 1 .
Number of false detections vs. accuracy.

Table 2 .
Normalized RMSE of the FIR Hilbert filter with respect to order m.

Table 3 .
Parameters used for scanning the gelatin phantom.

Table 4 .
Resources utilized in the Kintex-7 board for implementing the beamforming, mid-end and back-end processing algorithms.