An Investigation of Signal Preprocessing for Photoacoustic Tomography

Photoacoustic tomography (PAT) is increasingly being used for high-resolution biological imaging at depth. Signal-to-noise ratios and resolution are the main factors that determine image quality. Various reconstruction algorithms have been proposed and applied to reduce noise and enhance resolution, but the efficacy of signal preprocessing methods which also affect image quality, are seldom discussed. We, therefore, compared common preprocessing techniques, namely bandpass filters, wavelet denoising, empirical mode decomposition, and singular value decomposition. Each was compared with and without accounting for sensor directivity. The denoising performance was evaluated with the contrast-to-noise ratio (CNR), and the resolution was calculated as the full width at half maximum (FWHM) in both the lateral and axial directions. In the phantom experiment, counting in directivity was found to significantly reduce noise, outperforming other methods. Irrespective of directivity, the best performing methods for denoising were bandpass, unfiltered, SVD, wavelet, and EMD, in that order. Only bandpass filtering consistently yielded improvements. Significant improvements in the lateral resolution were observed using directivity in two out of three acquisitions. This study investigated the advantages and disadvantages of different preprocessing methods and may help to determine better practices in PAT reconstruction.


Introduction
Photoacoustic or optoacoustic tomography (PAT) is an emerging modality that combines the high contrast of optical methods and good resolution in deep tissue with acoustic detection. Being able to break through the optical diffusion limit, PAT has shown great potential in the noninvasive detection of early-stage cancer [1] and the imaging of the brain [2]. When tissue is illuminated with a modulated light source, it absorbs energy followed by periodic thermal expansion to generate an ultrasonic wave. A transducer is employed to collect the propagated ultrasonic wave, which forms the final image by reconstruction. The signal-to-noise ratio (SNR) of the received PA signal is limited by several factors. First of all, the efficiency at each stage of energy conversion is not 100%, accompanied by additional thermal noise, electronic noise, etc. Moreover, in PAT, a high-power nanosecond pulsed laser is usually adopted to illuminate an area of tissue, and the amplitude of the generated PA signal in MHz is largely dependent on the optical absorption of the target. However, according to ANSI standards [3], there is a maximum permissible exposure (MPE) of lasers on human skin, which limits the excitation energy delivered into tissue and the strength of the generated signal. Because of these factors, the received PA signal could be weak and noisy, which may affect the final image quality, especially in deep tissue.
There are several preprocessing methods to enhance the raw PA signal before reconstruction. Excluded from this consideration is the averaging of multiple acquisitions, as increased acquisition time may affect the imaging of tissue motion. A signal is typically decomposed into components (the basis functions characterized by coefficients), with differences between the signal and noise coefficients allowing for the removal of noise. The Fourier transform method uses a simple basic function, the sinusoid, characterized by its frequency [4]. Hence lowpass, bandpass, and matched filters exclude noise-dominated frequencies, although this could also result in a signal loss if it occupies the same frequencies [5,6].
Improved results could be obtained by changing the basis function to waveforms matched to the signal, as in wavelet denoising [7,8]. The signal is decomposed using the discrete wavelet transform method into wavelet coefficients where, ideally, the signal and noise are separated. These noise coefficients are then removed based on thresholds, and the pure signal is reconstructed using the inverse wavelet transform method. This approach was demonstrated to yield SNR improvements in many applications, including chick embryos and rat brains [9], as well as in photoacoustic microscopy in melanoma cells [10]. It was shown to outperform both lowpass and bandpass filters, with bipolar "Symlet" family wavelets found most effective both in silico and in vivo, although the parameters depended on the frequency content and SNR of the image [7].
Empirical mode decomposition (EMD) sifts (decomposes) signals into intrinsic mode functions (IMFs), which have the advantage of being able to vary in frequency and amplitude. Feature selection is then used to identify noisy IMFs and remove them. This was previously demonstrated to improve photoacoustic images in several studies, including a simple approach of merely including the first two IMFs [11][12][13].
Singular value decomposition (SVD) was also used to remove laser-induced noise in photoacoustic images [11]. This is based on decomposing the imaging operator H, which is responsible for noise, into its component matrices USV T , where S is a diagonal matrix containing the singular values of each component. Noisier components could then be removed; indeed, better performance was achieved using just a single component [10].
It is also of relevance that sensors are not omnidirectional but less sensitive in particular directions [14]. The noise-related artefacts resulting from this were illustrated with simulations to date, including using the MATLAB k-wave toolbox [15] and Monte Carlo simulation of red blood cell signals, demonstrating a fourfold increase in accuracy when reconstruction accounted for sensor directivity, according to [16].
However, to our knowledge, there has not been a detailed comparison of the combinations of common preprocessing methods, and thus an optimal preprocessing workflow has not yet been defined. Non-classical methods, such as deep learning, were demonstrated for noise reduction, object detection, and segmentation [17]. However, these are not suitable for inclusion in the comparison because they are not a well-defined algorithm but are trained to datasets, which are typically simulated with limited translatability to experimental datasets [18]. We, therefore, restricted the scope of comparison to common classical methods.
In this paper, we aimed to investigate and optimize the preprocessing of PA signals from a 256-element multi-segment transducer [19] by comparing the reconstructed 2D PA image quality of the combinations of the unfiltered signal, bandpass filter, deconvolution, wavelet denoising, and sensor directivity. We used the metrics of noise reduction (measured as the contrast-to-noise ratio [CNR]) and resolution in both directions (measured as FWHM) to quantify the performance of each combination. These were investigated in a range of acquisitions from single-point sources, to multiple-point sources and finger imaging.

Image Acquisition
A 20 Hz tunable optical parametric oscillator (OPO, Ekspla) nanosecond pulsed laser was shone through a fiber bundle from one side of the transducer to illuminate the sample. The output energy at 700 nm was fixed at 45 mJ with a 3 cm 2 illumination area, corresponding to a fluence of~15 mJ/cm 2 . A customized multi-segment transducer with 256 elements arranged in an arc-linear-arc shape was used to acquire the PA data, as described in [19]. The sampling rate of the data acquisition unit (DAQ) was set to 40 MHz, and 3072 points were acquired for each channel. A PC was used to control the software written in MATLAB and save the data. In the phantom experiments, a 150 µm black fishing line was imaged first ("point" acquisition). A phantom composed of 17 fishing lines arranged in a pyramid shape was also used, as shown in Figure 1 ("multi-point" acquisition). This consisted of 1 point at the top, with 8 rows of increasing lateral separation below. The axial and lateral separation between successive points was 2 mm. Moreover, imaging on a finger was also performed ("finger acquisition").
The output energy at 700 nm was fixed at 45 mJ with a 3 cm 2 illumination area, corresponding to a fluence of ~15 mJ/cm 2 . A customized multi-segment transducer with 256 elements arranged in an arc-linear-arc shape was used to acquire the PA data, as described in [19]. The sampling rate of the data acquisition unit (DAQ) was set to 40 MHz, and 3072 points were acquired for each channel. A PC was used to control the software written in MATLAB and save the data. In the phantom experiments, a 150 µm black fishing line was imaged first ("point" acquisition). A phantom composed of 17 fishing lines arranged in a pyramid shape was also used, as shown in Figure 1 ("multi-point" acquisition). This consisted of 1 point at the top, with 8 rows of increasing lateral separation below. The axial and lateral separation between successive points was 2 mm. Moreover, imaging on a finger was also performed ("finger acquisition"). Figure 1. Schematic of the photoacoustic imaging system and the phantom. A total of 17 150µmfishing-lines were arranged in a pyramid shape with a metal frame. The axial and lateral separations were 2 mm between the two lines. Cross-sectional imaging was performed from the top of the phantom. OPO: optical parametric oscillator. DAQ: data acquisition unit.

Signal Processing
All image processing was carried out in MATLAB R2019b. The signal processing workflow for all compared methods is shown in Figure 2. The parameters for all methods are shown in Table 1.

Signal Processing
All image processing was carried out in MATLAB R2019b. The signal processing workflow for all compared methods is shown in Figure 2. The parameters for all methods are shown in Table 1.  The sensor signal underwent a bandpass filter, wavelet denoising, EMD, or SVD, or was unfiltered, followed by applying directivity or not. This yielded 10 combinations (unfiltered, bandpass, wavelet, EMD, SVD, unfiltered+directivity, bandpass+directivity, wavelet+directivity, EMD+directivity, SVD+directivity).

Optimization of Preprocessing Methods' Parameters
The bandpass filter's FWHM was set at 90% of the central frequency. The wavelet denoising parameters (wavelet family, order, denoising method, threshold rule, and noise estimation) were optimized with CNR maximization and the minimization of FWHMx and FWHMz (Appendix A). It was found that the wavelets of the Daubechies family performed best in the fourth order. The optimized values of all parameters are listed in Table 1. EMD was performed using the MATLAB emd function by iteratively decomposing the data X(t) into IMFs and a residual. This involved the following steps [20,21]: 1. Finding all local minima and maxima of the data; 2. The calculation of the lower and upper envelopes by the cubic spline interpolation of the minima and maxima, respectively; 3. Calculating the mean of the lower and upper envelopes, ; 4. Calculating the provisional component h1 as the difference between the data and : ℎ = ; 5. Repeat sifting (steps 1-4) but using the provisional component as the data, yielding ℎ = ℎ ; 6. Continue sifting k times until the stopping criterion is reached, yielding ℎ = ℎ . This is the first component, ; 7. Derive the residue : = ; 8. Repeat steps 1-8 using the residue r1 as the data, yielding = ; 9. Continue repeating steps 1-8 to yield all further components: = ; 10. Stop when the residue is a monotonic function, meaning no more minima and maxima exist, and thus no more components are extracted. The sensor signal underwent a bandpass filter, wavelet denoising, EMD, or SVD, or was unfiltered, followed by applying directivity or not. This yielded 10 combinations (unfiltered, bandpass, wavelet, EMD, SVD, unfiltered+directivity, bandpass+directivity, wavelet+directivity, EMD+directivity, SVD+directivity).

Optimization of Preprocessing Methods' Parameters
The bandpass filter's FWHM was set at 90% of the central frequency. The wavelet denoising parameters (wavelet family, order, denoising method, threshold rule, and noise estimation) were optimized with CNR maximization and the minimization of FWHM x and FWHM z (Appendix A). It was found that the wavelets of the Daubechies family performed best in the fourth order. The optimized values of all parameters are listed in Table 1. EMD was performed using the MATLAB emd function by iteratively decomposing the data X(t) into IMFs and a residual. This involved the following steps [20,21]: Finding all local minima and maxima of the data; 2.
The calculation of the lower and upper envelopes by the cubic spline interpolation of the minima and maxima, respectively; 3.
Calculating the mean of the lower and upper envelopes, m 1 ; 4.
Calculating the provisional component h 1 as the difference between the data and m 1 : Repeat sifting (steps 1-4) but using the provisional component as the data, yielding Continue sifting k times until the stopping criterion is reached, yielding h 1k = h 1(k−1) − m 1 . This is the first component, c 1 ; 7.
Continue repeating steps 1-8 to yield all further components: c n = r n−1 − r n ; 10. Stop when the residue r n is a monotonic function, meaning no more minima and maxima exist, and thus no more components are extracted. This was followed by summing the first two components and taking their absolute value.
SVD was performed using the MATLAB svd function to decompose the data matrix X of dimension n × p (where n is the number of sensors and p is the number of time points) into a matrix of left vectors U of the dimension n × n, a diagonal singular value matrix S of the dimension n × p, and a matrix of right vectors V of the dimension p × p.
As previously demonstrated, this decomposition characterized the laser-induced noise in particular because the noise was consistent across sensors, and using one singular value component was sufficient [10]. Therefore, all values in S except the first diagonal value were set to zero to form the laser-induced singular value matrix S L . The noise was then calculated as US L V T and subtracted from the original signal X.
After these preprocessing steps were completed, the universal back-projection algorithm [22] was used for image reconstruction.

CNR Measurement
CNR could be defined in the following way [23]: where S I and S o are the means of the ROIs located inside and outside the signal of interest, respectively, and σ o is the standard deviation of the ROI located outside.
A signal ROI of 20-pixel half-width was drawn centered on the pixel with the maximum signal. S I was thus the mean of this region. A noise ROI of 50-pixel half-width was drawn in the top left of the image, away from the signal or artefact. The S noise and σ noise were the mean and standard deviation of this ROI, respectively. We then calculated the CNR = |S I −S noise |/σ noise .
As another indicator of image quality, we also calculated the peak SNR where: where S Imax is the maximum value of the signal ROI, and σ o is the standard deviation of the noise ROI.

Resolution Measurement
FWHM x and FWHM z were used to represent the lateral and axial resolution of the reconstructed image. The FWHM x measurements were performed inside the signal ROI centered on the pixel with the maximum signal identified in 2.2.2. The signal was taken from a range of x-coordinates 20 pixels below and above this pixel, yielding 41 points in total. Its minimum was subtracted to eliminate the need to fit a constant as an additional parameter. This was then fitted to a three-term Lorentzian function (using the lorentzfit function from MATLAB Central) to yield its scale parameter γ. This was multiplied by 2 to yield FWHM x . The analogous procedure was repeated in the z-direction to yield FWHM z . Since these were in units of pixels, they were converted to µm via multiplication by the xand z-resolutions.
The multi-point acquisition had a range of intensities at different points, and we, therefore, needed to select a single measurement for comparison. We compared the CNR and resolution measurements for every point using the default unfiltered method (Appendix B). The best results were obtained in the middle row (Appendix B), and thus, the CNR and resolution results were reported as the average of the pair of points in this row.
The results were reported as CNR, FWHM x , and FWHM z . The changes relative to the control (unfiltered) method were reported as ∆CNR, ∆FWHM x , and ∆FWHM z .

Results
Whole images are shown, for example, the methods in the point, multi-point, and finger acquisitions (Figure 3) with the noise ROI annotated.

Results
Whole images are shown, for example, the methods in the point, multi-point, and finger acquisitions (Figure 3) with the noise ROI annotated. The CNR and FWHM measurements are shown in Table 2. A higher CNR represents better noise reduction, while a lower FWHM represents better resolution. Comparisons with the actual imaging phantom were listed in the brackets under the FWHM columns. The changes relative to the unfiltered method were calculated for all measurements (ΔCNR, ΔFWHMx, and ΔFWHMz) in Figure 4. The CNR and FWHM measurements are shown in Table 2. A higher CNR represents better noise reduction, while a lower FWHM represents better resolution. Comparisons with the actual imaging phantom were listed in the brackets under the FWHM columns. The changes relative to the unfiltered method were calculated for all measurements (∆CNR, ∆FWHM x , and ∆FWHM z ) in Figure 4.  The unfiltered+directivity method had the highest ΔCNR in two of three acquisitions, with bandpass+directivity being second and highest in the other. Averaged across all three acquisitions, bandpass+directivity had the highest ΔCNR, unfiltered+directivity was second, SVD+directivity was third, wavelet+directivity was fourth, and EMD+directivity was last. For the non-directivity methods, the same order resulted. The CNR of the directivity methods was significantly higher than the non-directivity methods (paired t-test, p < 0.05) for all three acquisitions.
There were various trends for FWHM. The five largest FWHMx decreases occurred using the five directivity methods in the point acquisition. In the multi-point acquisition, five of the six largest FWHMz increases were from the directivity methods. The FWHMx of the directivity methods was significantly lower than the non-directivity methods (paired t-test, p < 0.05) in all acquisitions except for the multi-point acquisition.

Discussion
The use of directivity significantly improved noise reduction in all acquisitions. Though it significantly improved lateral resolution in two of the acquisitions, its effects The unfiltered+directivity method had the highest ∆CNR in two of three acquisitions, with bandpass+directivity being second and highest in the other. Averaged across all three acquisitions, bandpass+directivity had the highest ∆CNR, unfiltered+directivity was second, SVD+directivity was third, wavelet+directivity was fourth, and EMD+directivity was last. For the non-directivity methods, the same order resulted. The CNR of the directivity methods was significantly higher than the non-directivity methods (paired t-test, p < 0.05) for all three acquisitions.
There were various trends for FWHM. The five largest FWHM x decreases occurred using the five directivity methods in the point acquisition. In the multi-point acquisition, five of the six largest FWHM z increases were from the directivity methods. The FWHM x of the directivity methods was significantly lower than the non-directivity methods (paired t-test, p < 0.05) in all acquisitions except for the multi-point acquisition.

Discussion
The use of directivity significantly improved noise reduction in all acquisitions. Though it significantly improved lateral resolution in two of the acquisitions, its effects on Sensors 2023, 23, 510 9 of 14 the resolution were mixed. There appeared to be two main effects of directivity: reductions in the intensity of the streaks broadly from the top-left to the bottom-right direction (green markers, Figure 3A vs 3P, Figure 3K vs 3Z), which is closest to the lateral direction and thus likely measured as an improvement of lateral resolution. They also reduced the noise (measured in the yellow box), resulting in the observed increase in the CNR. There are also increases in the streak length broadly from the top-right to the bottom-left direction (blue markers, Figure 3A vs 3P, Figure 3K vs 3Z), which is closest to the axial direction and thus likely measured as a worsening of axial resolution. The magnitude of the measured effects depended on how lateral the signal ROI was from the sensor array. This was relatively central in the point and finger acquisitions but more lateral in the multi-point acquisition because it was in the middle row. Considerable variations in streak lengths are seen at different locations using directivity in the multi-point acquisition (yellow marker, Figure 3U), which may be why the multi-point acquisition lacked a significant improvement in lateral resolution.
Adding directivity caused larger improvement in noise reduction, and to some extent, resolution, than different preprocessing methods. This is shown by the five largest axial resolution improvements occurring using the five directivity methods in the point acquisition and being significantly lower in all acquisitions except the multi-point acquisition, where, as suggested, the behavior is different due to the lateral signal ROI positioning. Even here, five of the highest six reductions in the axial resolution are from directivity methods, showing that directivity has a more significant effect.
The order of noise reduction is the same both with and without directivity. In decreasing order, it is: bandpass, unfiltered, SVD, wavelet denoising and EMD. Wavelet denoising had its best values for noise reduction, lateral resolution, and axial resolution only in the acquisition it was optimized for, the point acquisition, which suggests it is not generally optimized. This is supported by the noise reduction and lateral resolution becoming progressively worse with the increasing complexity of the signal, moving from the point to multi-point and finger acquisitions. Bandpass appeared to offer moderate improvements in noise reduction without much cost to the resolution and was the only method to outperform the unfiltered method consistently. Inconsistent results for lateral resolution were obtained using EMD and SVD. Distinctive artefacts are seen in the point acquisition (red markers, Figure 3D and 3E).
While this paper sought to provide a comparison of the common preprocessing methods and not the algorithm for deployment itself, we note the calculation time was, depending on the method, 7 to 43 s for a high-resolution (1200 × 1200) image. While this is below the speed required for real-time imaging, this could be achieved using parallelization and GPU acceleration which have previously yielded acceleration factors of 140-fold [24] and 125-to 1000-fold [25].
Deep learning methods were also used for preprocessing, such as the usage of convolutional neural networks (CNNs) [26] for artefact removal, the extension of bandwidth, and object detection using region-based CNNs [27][28][29] to separate signals from artefacts. While these could yield impressive results in terms of noise reduction, they are limited by the need for ground truth upon which to train, such as anatomical markers. This is typically difficult to obtain, resulting in the use of simulated data by a majority of studies which have shown poor translation to experimental data [18]. This ultimately limits the reliability of images for pathology and hence clinical diagnoses. As a black box, they generally are unable to show the paper trail of how the image was derived.
Furthermore, we note that this work was only focused on established preprocessing methods from a practical point of view demonstrated directly by experiments. The results may be affected by different experimental conditions such as laser fluence, transducer performance, target absorption, etc. Analytical models [30][31][32] may be helpful in predicting, verifying, and generalizing the results.

Conclusions
In conclusion, we tested combinations of classical signal preprocessing methods and directivity for 2D PA image reconstruction and found that directivity yields considerable improvements in both noise reduction and lateral resolution with the multi-segment transducer. With and without directivity, only bandpass filtering consistently yielded an improvement. Moreover, the advantages and disadvantages of other preprocessing methods were also investigated, which may be helpful in determining better preprocessing practices in PAT reconstruction. Data Availability Statement: Data will be provided upon reasonable request.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Appendix A Appendix A.1. Optimization of Wavelet Denoising
Wavelet denoising parameters (wavelet family, order, denoising method, threshold rule, and noise estimation) were optimized by the maximization of the CNR and the minimization of FWHM x and FWHM z on the point acquisition alone. Image processing was carried out using the default unfiltered method without directivity.

Appendix A.2. Selection of the Wavelet Family and Order
Each wavelet family available in MATLAB was compared (Table A1), namely coif (Coiflets), fk (Fejer-Korovkin), db (Daubechies), sym (Symlet), and haar (Haar). For all except haar, the wavelet orders also needed to be specified. Starting from the lowest order, the orders were tested until the maximum order specifiable was reached, or the CNR was observed to decrease monotonically for at least two orders.
Default settings were used for the other required parameters: "DenoisingMethod" was set to "universal threshold", "ThresholdRule" was set to "soft", and "NoiseEstimation" was set to "level-dependent". However, these settings were optimized in the following two stages.
The Daubechies family, with the fourth order, was selected as this was the highestranked.
Next, all permutations of "DenoisingMethod" and "ThresholdRule" were compared to find the best "ThresholdRule" in each "DenoisingMethod" (Table A2). Since using the absolute rank produced too many ties, the rank of the average % behind the best was used.
Finally, each of these combinations was compared using both level-dependent and level-independent noise estimation (Table A3). The highest-ranked combination was used. In total, this was Daubechies fourth order, universal threshold, hard thresholding, and level-dependent noise estimation.

Selection of the Optimal Row from the Multi-Point Acquisition
For each of the 17 points in the multi-point acquisition, the CNR, FWHM x , and FWHM z were individually calculated ( Figure A1). Sensors 2023, 23, x FOR PEER REVIEW 13 of 15 Figure A1. CNR, FWHMx, and FWHMz calculated for each of the 17 points in the multi-point acquisition.
The results (Table A4) were organized into rows, with the first row consisting of one point, followed by eight rows of two points. The average CNR, FWHM x , and FWHM z were calculated for each row. It was decided to use row five because it had the best values for the CNR and FWHM x .