An Atomic Force Acoustic Microscopy Image Fusion Method Based on Grayscale Inversion and Selection of Best-Fit Intensity

Abstract: Atomic force acoustic microscopy (AFAM) can simultaneously provide the surface morphology and internal structures of samples, and it has broad potential in the non-destructive imaging of cells. As the outputs of AFAM, morphology and acoustic images reflect different features of the cells. However, few studies have addressed the fusion of these images. In this paper, a novel method is proposed to fuse these two types of images based on grayscale inversion and selection of best-fit intensity. First, grayscale inversion is used to transform the morphology image into a series of inverted images with different average intensities. Then, the max rule is applied to fuse each inverted image with the acoustic image, yielding a group of pre-fused images. Finally, a selector extracts and exports the expected image with the best-fit intensity among these pre-fused images. The expected image preserves both the acoustic details of the cells and the gradient information of the background, which benefits the analysis of the cells' subcellular structure. The experimental results demonstrate that our method provides the clearest boundaries between the cells and the background. According to quantitative comparisons, it also preserves the most details from the morphology and acoustic images. The comparisons use standard deviation, mutual information, the Xydeas and Petrovic metric, feature mutual information, and visual information fidelity for fusion.


Introduction
High-resolution, nondestructive subcellular imaging instruments that make the activities of cells observable are in constant demand in biology. Since its introduction in the 1980s, the atomic force microscope (AFM) [1] has provided a high-resolution, non-destructive measurement tool at the nanoscale [2]. It is a powerful platform for visualizing and manipulating biological samples ranging from single molecules to living cells [3]. As a near-field microscope, AFM achieves high resolution at the sample surface, but it can hardly detect internal structures [4].
The atomic force acoustic microscope (AFAM) is a nanoscale microscope that provides ultra-high-resolution images of a sample without destroying its intracellular structures [5]. It combines AFM with an ultrasound imaging module. An ultrasound transducer excites a single-frequency acoustic wave beneath the cell, and the resulting vibration of the probe is detected by the optical spot tracking system. AFAM outputs morphology images and acoustic images, which provide different information. The morphology image, i.e., the AFM image, shows the topography of the cell, obtained by detecting the interaction between the probe and the cell. The acoustic image shows the phase of the acoustic wave at the same position, reflecting every contribution along the propagation path that shifts the phase of the acoustic signal. In recent years, AFAM has been applied in medical and biological research, for example, to observe liver cells [6], and it is now used to analyze small cells. Such cells are so thin that the phase shift caused by their thickness is slight; hence, the acoustic image reflects only the subcellular structure of the cell. Since the acoustic and morphology images provide completely different types of information, combining them is beneficial for studying the structure of cells.
Image fusion is an important research topic in many areas, such as computer vision, remote sensing, robotics, and medical imaging [7]. It combines multiple source images into a single fused image that contains more information than any individual source image. Image fusion is widely used in multi-modal image processing, for example, to fuse visible and infrared images, computed tomography and magnetic resonance images, or PET and CT images. Fusion strategies vary with the fusion task. Multi-scale transform (MST) is one of the most popular and thoroughly studied tools for image fusion. It decomposes an image into two parts. A low-frequency map shows the smoothed overall characteristics of the image, and a series of high-frequency maps contains its edge details. Commonly used MST methods include the Laplacian pyramid (LP) [8] and the nonsubsampled contourlet transform (NSCT) [9]. NSCT is an improved version of the contourlet transform that yields good fusion results; unfortunately, its computational efficiency is low. MST-based strategies can be improved further by fusing the low-frequency and high-frequency maps more effectively. Yu Liu [10] proposed an image fusion framework based on MSTs and sparse representation (SR) [11]. It performs well on multi-modality images but requires complex computation. Beyond MST-based methods, deep learning for image fusion has attracted considerable interest. In recent years, generative adversarial networks (GANs) [12] have been shown to generate more details than traditional convolutional neural networks (CNNs) in image generation. Jiayi Ma [13] introduced FusionGAN, an end-to-end GAN-based fusion method that avoids the manually designed fusion models of traditional methods. However, the network's loss function is still manually designed, and the method works well only for visible and infrared image fusion [13].
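The following minimal Python/NumPy sketch makes the low-/high-frequency split of an MST concrete. It is a simplification and not the LP of [8]. It assumes 2 × 2 block averaging as the low-pass and decimation step and nearest-neighbour upsampling in place of Gaussian filtering. The helper names are hypothetical.

```python
import numpy as np

def downsample(img):
    """Average each 2x2 block: a crude low-pass filter plus decimation."""
    h, w = img.shape
    img = img[: h - h % 2, : w - w % 2]          # drop an odd last row/column
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img, shape):
    """Nearest-neighbour upsampling to `shape` (odd edges padded by repetition)."""
    up = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
    return np.pad(up, ((0, shape[0] - up.shape[0]), (0, shape[1] - up.shape[1])), mode="edge")

def build_pyramid(img, levels):
    """Return [high_1, ..., high_n, low]: high-frequency detail maps plus the low-frequency map."""
    pyramid, current = [], img.astype(float)
    for _ in range(levels):
        low = downsample(current)
        pyramid.append(current - upsample(low, current.shape))  # detail lost by smoothing
        current = low
    pyramid.append(current)                                     # smoothed overall content
    return pyramid

def reconstruct(pyramid):
    """Invert build_pyramid by adding the detail maps back from coarse to fine."""
    current = pyramid[-1]
    for high in reversed(pyramid[:-1]):
        current = high + upsample(current, high.shape)
    return current
```

Each high-frequency map stores exactly what upsampling loses, so `reconstruct` recovers the input up to floating-point error. An MST fusion rule would combine the maps of two source images level by level before reconstruction.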
Various fusion methods perform well on medical, multi-focus, and visible-infrared images. However, methods for fusing AFAM images need further research, because it is difficult for an observer to mentally combine the different information in the morphology and acoustic images. Morphology and acoustic images provide different information about the cells, but the high intensity of the morphology image may cover the details of the acoustic image. We propose a new method that uses grayscale inversion to efficiently preserve information from both the morphology and acoustic images. It highlights the position and shape of the cells and preserves their acoustic details in a single image. First, grayscale inversion transforms the morphology image into a series of inverted images with different average intensities. Next, the max rule fuses each inverted image with the acoustic image to obtain a group of pre-fused images. Finally, a selector determines the expected image with the best-fit intensity among the pre-fused images.

Proposed Method
In this section, we introduce our fusion method and the details needed to replicate the experiment. As Figure 1 shows, the proposed fusion method contains three steps: inversion, fusion, and selection. First, the morphology image is inverted into a series of inverted images with different average intensities. The max rule then fuses each inverted image with the acoustic image, producing a sequence of pre-fused images. Finally, a selector chooses the expected image from this sequence as the fused output.

Inversion
The first step of our method inverts the morphology image into a series of inverted images whose average intensities differ from each other. Our grayscale inversion is a simple linear approach defined in terms of the following quantities. $I_m$ is the original morphology image and $I_m^k$ is the $k$th inverted morphology image. $V_{max}$ is the largest gray level of $I_m$. $k \in \{1, 2, \ldots, L\}$ indexes the inverted morphology images, and $L$ is the maximum gray value of the acoustic image. We apply grayscale inversion to match the gray-level distributions of the morphology and acoustic images. Figure 2 shows the morphology, inverted, and acoustic images. The intensity of the morphology image relates to the sample's height, so the bright area is the cell region and the dark area is the background. In the acoustic image, many high-intensity noise pixels surround the cell region, and the acoustic details in the cell are darker than the same region in the morphology image. Grayscale inversion aims to produce a high-intensity background and a low-intensity cell region. The high-intensity background reduces the contrast of the noise and can even cover it. At the same time, the low-intensity cell region makes the acoustic details easier to present. After inversion, the bright region, which is considered the background, may exceed the bounds of the gray level. To present the cell's shape and gradient from the morphology image, the dark areas, which are considered the cell region, should take priority in the fusion. As a trade-off, we truncate the inverted image to the gray-level bounds of the morphology image. For an 8-bit image, pixels whose intensity is larger than 255 are truncated to 255.
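The inversion equation itself did not survive in this text, so the sketch below assumes the form $I_m^k = V_{max} - I_m + k$, truncated to [0, 255]. This reading fits the description: the background can exceed the gray-level bound and must be truncated, and raising $k$ raises the average intensity. It remains our assumption, not the authors' code, and the function names are hypothetical.

```python
import numpy as np

def invert(morph, k, max_level=255):
    """k-th inverted morphology image.

    ASSUMED form: I_m^k = V_max - I_m + k, truncated to [0, max_level].
    """
    m = morph.astype(np.int32)              # widen to avoid uint8 wrap-around
    inverted = m.max() - m + k              # m.max() plays the role of V_max
    return np.clip(inverted, 0, max_level).astype(np.uint8)

def inversion_series(morph, acoustic):
    """Yield (k, I_m^k) for k = 1..L, where L is the largest gray value of I_a."""
    L = int(acoustic.max())
    for k in range(1, L + 1):
        yield k, invert(morph, k)

morph = np.array([[0, 100], [200, 250]], dtype=np.uint8)
print(invert(morph, 10))   # [[255 160]
                           #  [ 60  10]]
```

A generator keeps only one inverted image in memory at a time, rather than stacking up to 255 full-size copies.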

Max Rule
The inverted and acoustic images are fused with the max rule. At pixel $(i,j)$, the rule compares the intensities of the $k$th inverted image and the acoustic image and selects the larger one as the output:
$$I_f^k(i,j) = \max\left\{I_m^k(i,j),\, I_a(i,j)\right\},$$
where $I_f^k$ is the $k$th pre-fused image and $I_a$ is the acoustic image. This step fuses the information from both the morphology and acoustic images. In the background, because the inverted image has a higher average gray level, the pre-fused image has smooth, high-intensity areas. At the boundary of the cell, the gradient from the bright region to the dark region indicates the cell's shape. In the cell region, acoustic details are preserved because they have higher intensity than the inverted image. As Figure 3 shows, as $k$ increases, the pre-fused image provides more gradient information from the inverted morphology image but loses details of the acoustic image. Thus, the next step is to choose the image that preserves both morphology and acoustic information. We call this ideal image the best-fit intensity image.
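A minimal NumPy sketch of the max rule follows. The helper `acoustic_share` is a hypothetical diagnostic and not part of the method. It reports the fraction of pixels whose value comes from the acoustic image, a share that should fall as $k$ grows.

```python
import numpy as np

def max_rule(inverted, acoustic):
    """Pre-fused image I_f^k(i, j) = max(I_m^k(i, j), I_a(i, j))."""
    if inverted.shape != acoustic.shape:
        raise ValueError("source images must be registered and equally sized")
    return np.maximum(inverted, acoustic)

def acoustic_share(inverted, acoustic, region=None):
    """Fraction of pixels (optionally inside boolean `region`) where the acoustic value wins."""
    wins = acoustic > inverted
    if region is not None:
        wins = wins[region]
    return float(wins.mean())

inverted = np.array([[10, 200], [30, 250]], dtype=np.uint8)
acoustic = np.array([[50, 100], [90, 20]], dtype=np.uint8)
print(max_rule(inverted, acoustic).tolist())   # [[50, 200], [90, 250]]
print(acoustic_share(inverted, acoustic))      # 0.5
```

In the full pipeline, `max_rule` is applied once per $k$.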

Selector
To choose the best-fit intensity image, a selector measures the information loss of each pre-fused image and selects the one with the least loss as the final fused output.
AFAM fusion aims to combine the information of the cells. The cell regions should therefore be extracted first, which reduces the task to fusion within the cell regions.
To acquire the position and shape of the cells, we developed a cell extraction algorithm based on Otsu's approach [14]. Its erosion and dilation steps reduce the influence of noise and smooth the boundary of the cell region. The extraction is a three-step process. The first step is detection: Otsu's threshold $t$ segments the morphology image into $CEL_0$ and $BKG$,
$$CEL_0 = \{(i,j) \mid I_m(i,j) > t\}, \qquad BKG = \{(i,j) \mid I_m(i,j) \le t\},$$
where $t$ is the threshold that separates the cells from the background. Otsu's threshold is not an accurate segmentation method, because it does not segment smooth boundaries well; however, our method is not sensitive to segmentation accuracy. Small objects do harm fusion performance: they are not regions of interest but fall in the same gray-level range as the cells. To clear them, we apply the second step, erosion. We assume that the main part of the morphology image is the cell and that noise occupies small, separated regions. The erosion step is defined as
$$CEL_1 = CEL_0 \ominus A,$$
where $A$ is a disk-shaped structuring element and $L_A$ is its size. $p_1 = 1.2$ is the multiple coefficient that determines how much the original cell segmentation is narrowed. $A$ serves to clear noise regions whose intensity resembles that of the cells. A proper value of $p_1$ clears all the noise regions while still preserving the cell regions. Under our hypothesis, erosion removes the noise regions but also narrows the cell region. Therefore, the third step, dilation, recovers the cell region:
$$CEL = CEL_1 \oplus B,$$
where $B$ is a disk-shaped structuring element and $L_B$ is its size. $p_2$ is a multiple coefficient that determines the extension of $CEL_1$. $B$ serves to recover the cell regions and smooth their boundaries.
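The three-step extraction can be sketched with NumPy alone. Otsu's threshold is computed from the 256-bin histogram by maximizing the between-class variance. Erosion and dilation use disk-shaped structuring elements. The text specifies the element sizes only through the coefficients $p_1$ and $p_2$, so the sketch takes the disk radii directly as hypothetical parameters (`radius_a`, `radius_b`). It also assumes an 8-bit morphology image in which cells are brighter than the background, and it takes BKG as the complement of CEL.

```python
import numpy as np

def otsu_threshold(img):
    """Otsu's threshold t for an 8-bit image (maximizes between-class variance)."""
    values = np.asarray(img).astype(np.int64).ravel()
    prob = np.bincount(values, minlength=256)[:256] / values.size
    omega = np.cumsum(prob)                    # weight of the class {gray <= t}
    mu = np.cumsum(prob * np.arange(256))      # cumulative first moment
    denom = omega * (1.0 - omega)
    sigma_b2 = np.zeros(256)
    valid = denom > 1e-12                      # skip thresholds that leave a class empty
    sigma_b2[valid] = (mu[-1] * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b2))

def disk(radius):
    """Boolean disk-shaped structuring element."""
    r = np.arange(-radius, radius + 1)
    return r[:, None] ** 2 + r[None, :] ** 2 <= radius ** 2

def _shifted_views(mask, se):
    """Yield views of `mask` shifted by every offset in the structuring element."""
    rad = se.shape[0] // 2
    padded = np.pad(mask, rad, mode="constant", constant_values=False)
    h, w = mask.shape
    for dy, dx in zip(*np.nonzero(se)):
        yield padded[dy:dy + h, dx:dx + w]

def erode(mask, se):
    out = np.ones(mask.shape, dtype=bool)
    for view in _shifted_views(mask, se):
        out &= view                            # keep pixels whose whole disk is inside
    return out

def dilate(mask, se):
    out = np.zeros(mask.shape, dtype=bool)
    for view in _shifted_views(mask, se):
        out |= view                            # grow the region by the disk
    return out

def extract_cell(morph, radius_a=3, radius_b=4):
    """Detection, erosion, dilation; radius_a/radius_b are hypothetical sizes."""
    t = otsu_threshold(morph)
    cel0 = morph > t                           # detection: cells are brighter
    cel1 = erode(cel0, disk(radius_a))         # erosion: drop small noise blobs
    cel = dilate(cel1, disk(radius_b))         # dilation: recover, smooth the cell
    bkg = ~cel                                 # ASSUMPTION: BKG = complement of CEL
    return cel, bkg
```

Because `radius_b` is larger than `radius_a` here, the recovered cell region ends up slightly wider than the Otsu detection. This mirrors the role the text gives to $p_2$ relative to $p_1$.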
The value of $p_2$ is determined by $p_1$. Finally, we obtain $CEL$, which represents the cell region, and $BKG$, which represents the background, as the two regions of the morphology image. Next, to set the standard for the best-fit intensity image among the pre-fused images, we define a multi-regional information loss (MRIL). It is computed in two stages: the information loss inside the cells and the information loss outside them. Defining this criterion requires accounting for the characteristics of the inverted morphology and acoustic images. As Figure 2 shows, these two images carry different information in different regions. In the cell region, grayscale inversion makes the intensities of the inverted morphology image lower than in the background, and they provide little information about the inner structures. By contrast, the acoustic image provides much richer details in the cell region, so the fused image should preserve as many acoustic details as possible there. In the background, the inverted morphology image shows the cell's shape and position, which are not clear in the acoustic image. MRIL measures the total information loss as
$$L_{total}^k = L_{CEL}^k + L_{BKG}^k,$$
where $L_{CEL}^k$ is the loss in the cell region and $L_{BKG}^k$ is the loss in the background. $L_{CEL}^k$ is computed from the $k$th pre-fused image $I_f^k$ and the acoustic image $I_a$ in $CEL$. $L_{BKG}^k$ is given by $\mathrm{rSFe}(I_f^k, I_m^k)$, which is defined similarly to the ratio of spatial frequency error (rSFe) [15]. The difference is that our rSFe measures the spatial-frequency error between the pre-fused image and the inverted morphology image in the region $BKG$. The original rSFe compares the fused image with the source images. We choose rSFe to measure the loss in the background for two reasons. First, the spatial frequency of the inverted morphology image remains the same regardless of the value of $k$.
Second, rSFe is a suitable criterion: it drives the background to be as bright as possible so that it covers the acoustic noise, and it is not sensitive to the segmentation accuracy of the cells. Because the acoustic image is characterized by its pixel intensities, MRIL is sensitive to how many acoustic pixels are preserved in the cell region.
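One possible implementation of MRIL is sketched below under several assumptions. Only the background term follows the text directly: rSFe between the pre-fused and inverted morphology images over BKG. The spatial frequency uses the simple two-direction form $\sqrt{RF^2 + CF^2}$ rather than the full rSFe of [15], and the background loss takes the absolute value of rSFe. The text does not give the exact form of the cell term $L_{CEL}^k$. Here it is a hypothetical placeholder: the mean absolute deviation from the acoustic image inside CEL, scaled to [0, 1]. BKG is again taken as the complement of CEL.

```python
import numpy as np

def spatial_frequency(img, mask):
    """Two-direction spatial frequency sqrt(RF^2 + CF^2) over neighbour pairs inside `mask`."""
    img = img.astype(float)
    row_pairs = mask[:, 1:] & mask[:, :-1]          # horizontal neighbours both in region
    col_pairs = mask[1:, :] & mask[:-1, :]          # vertical neighbours both in region
    dh = np.diff(img, axis=1)[row_pairs]
    dv = np.diff(img, axis=0)[col_pairs]
    rf2 = np.mean(dh ** 2) if dh.size else 0.0
    cf2 = np.mean(dv ** 2) if dv.size else 0.0
    return float(np.sqrt(rf2 + cf2))

def rsfe(fused, reference, mask, eps=1e-12):
    """Ratio of SF error, (SF_fused - SF_ref) / SF_ref, restricted to `mask`."""
    sf_ref = spatial_frequency(reference, mask)
    return (spatial_frequency(fused, mask) - sf_ref) / (sf_ref + eps)

def mril(fused, inverted, acoustic, cel):
    """HYPOTHETICAL MRIL: L_total^k = L_CEL^k + L_BKG^k."""
    bkg = ~cel                                      # ASSUMPTION: BKG = complement of CEL
    # ASSUMPTION: cell loss = mean |I_f^k - I_a| inside CEL, scaled to [0, 1].
    diff = np.abs(fused.astype(float) - acoustic.astype(float))[cel]
    l_cel = diff.mean() / 255.0 if diff.size else 0.0
    # Background loss: |rSFe| between I_f^k and I_m^k inside BKG.
    l_bkg = abs(rsfe(fused, inverted, bkg))
    return float(l_cel + l_bkg)
```

With this placeholder, small $k$ lets acoustic noise raise the background rSFe. Large $k$ replaces acoustic pixels in the cell, which raises $L_{CEL}^k$, so the minimum falls between the two extremes.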
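According to this definition of information loss, the expected image is the pre-fused image with the lowest $L_{total}^k$. Thus, the selected image is
$$k^{*} = \arg\min_{k \in \{1,\ldots,L\}} L_{total}^k, \qquad I_F = I_f^{k^{*}},$$
where $I_F$ is the final fused image.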
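The selection rule is a plain argmin over $k$. The sketch below is generic: `loss_fn` is a hypothetical stand-in for MRIL (any callable that scores one pre-fused image), and $k$ is reported 1-based to match the text.

```python
import numpy as np

def select_best_fit(pre_fused, loss_fn):
    """Return (k*, I_f^{k*}) with k* = argmin_k loss_fn(I_f^k); k is 1-based."""
    losses = np.array([loss_fn(img) for img in pre_fused])
    idx = int(np.argmin(losses))        # the first minimum wins on ties
    return idx + 1, pre_fused[idx]

# Toy usage with a made-up loss that prefers a mean intensity near 64.
stack = [np.full((2, 2), v, dtype=np.uint8) for v in (10, 60, 120)]
k_star, best = select_best_fit(stack, lambda img: abs(img.mean() - 64))
print(k_star)   # 2
```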

Experiment
In this section, we introduce the experiment of our method, including data acquisition, performance metrics, and the results and discussion of the experiment.

Data Acquisition
All the AFAM images used in the experiment were acquired with a CSPM5500, an open atomic force acoustic microscope from Guangzhou Benyuan Nano-Instruments Ltd., Guangzhou, China. Escherichia coli (E. coli) and Staphylococcus aureus (S. aureus) were the samples measured with AFAM; they are stored by the China Center for Type Culture Collection in Wuhan (Wuhan H22). As shown in Figure 4, 16 pairs of source images are used to verify the effectiveness of the competing methods: 15 pairs of S. aureus images and 1 pair of E. coli images. Four fusion methods are used for comparison: the Laplacian pyramid (LP) [8], the nonsubsampled contourlet transform (NSCT) [9], the fusion method based on the Laplacian pyramid and sparse representation (LP-SR) [10], and FusionGAN [13]. We set the parameters of the competing methods as in their original papers but use our AFAM images as source images. In addition, visible and infrared image fusion follows a pattern similar to AFAM image fusion. We therefore retrained FusionGAN on our AFAM image database to obtain better performance on the AFAM fusion task.

Performance Metrics
Because no reference image exists, it is hard to evaluate the quality of a fused image quantitatively. In recent years, many fusion metrics have been proposed to evaluate the performance of fusion methods, but none of them is universally accepted. Therefore, several fusion metrics need to be applied. To better evaluate fusion performance in the cell region, we manually segment each source image into cell and background regions. All evaluation metrics measure fusion performance within the cell areas. The segmentation results are shown in Figure 5. For quantitative evaluation, we chose five metrics: standard deviation (SD) [16], mutual information (MI) [17], the Xydeas and Petrovic metric ($Q^{AB/F}$) [18], feature mutual information (FMI) [19], and visual information fidelity for fusion (VIFF) [20]. Their definitions are given below.

1. SD measures how widely the intensities of the fused image spread around their mean, reflecting its contrast. Mathematically,
$$\mathrm{SD} = \sqrt{\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(I_F(i,j)-\mu\right)^2},$$
where $I_F$ is the $M \times N$ fused image and $\mu$ is its mean value.

2. MI measures the degree of dependence between two random variables $X$ and $Y$:
$$\mathrm{MI}_{XY} = \sum_{u,v} p(u,v)\log\frac{p(u,v)}{p(u)\,p(v)},$$
where $p(u,v)$ is the joint distribution and $p(u)$ and $p(v)$ are the marginal distributions. For two source images $A$ and $B$ and the fused image $F$, MI is defined as $\mathrm{MI} = \mathrm{MI}_{AF} + \mathrm{MI}_{BF}$.

3. $Q^{AB/F}$ is an objective performance metric that measures the relative amount of edge information transferred from the source images to the fused image:
$$Q^{AB/F} = \frac{\sum_{i,j}\left[Q^{AF}(i,j)\,w^{A}(i,j) + Q^{BF}(i,j)\,w^{B}(i,j)\right]}{\sum_{i,j}\left[w^{A}(i,j) + w^{B}(i,j)\right]}, \qquad Q^{XF}(i,j) = Q_g^{XF}(i,j)\,Q_o^{XF}(i,j),$$
where $Q_g^{XF}(i,j)$ and $Q_o^{XF}(i,j)$ are the edge strength and orientation preservation values at pixel $(i,j)$ for $X \in \{A, B\}$. $w^A(i,j)$ and $w^B(i,j)$ weight $Q^{AF}(i,j)$ and $Q^{BF}(i,j)$.

4. FMI computes the amount of feature information transferred from the source images to the fused image, measured as the mutual information between their feature maps. Here the features are gradient maps, which carry texture, edge strength, and contrast information.

5. VIFF is a recently proposed metric that measures the visual information fidelity between the source images and the fused image. It measures the visual information of every block in each sub-band of the fused image based on a Gaussian scale mixture model, a distortion model, and a human visual system (HVS) model.
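SD and MI are simple enough to sketch directly with NumPy. The sketch assumes 8-bit images, a 256-bin joint histogram, and base-2 logarithms (MI in bits); the text does not state these choices. $Q^{AB/F}$, FMI, and VIFF need edge, feature, or wavelet-domain models and are not sketched here.

```python
import numpy as np

def sd(fused):
    """Standard deviation of the fused image (contrast)."""
    f = np.asarray(fused, dtype=float)
    return float(np.sqrt(np.mean((f - f.mean()) ** 2)))

def mutual_information(x, y, bins=256):
    """MI_XY from the joint gray-level histogram of two 8-bit images, in bits."""
    joint, _, _ = np.histogram2d(np.ravel(x), np.ravel(y), bins=bins,
                                 range=[[0, 256], [0, 256]])
    p_uv = joint / joint.sum()
    p_u = p_uv.sum(axis=1, keepdims=True)   # marginal of x
    p_v = p_uv.sum(axis=0, keepdims=True)   # marginal of y
    nz = p_uv > 0                           # 0 * log 0 is taken as 0
    return float(np.sum(p_uv[nz] * np.log2(p_uv[nz] / (p_u * p_v)[nz])))

def fusion_mi(a, b, f):
    """Fusion MI = MI_AF + MI_BF."""
    return mutual_information(a, f) + mutual_information(b, f)
```

For an image whose pixels are split evenly between two gray levels, `mutual_information(x, x)` equals its entropy, 1 bit. Pairing it with a constant image gives 0.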

Results and Analysis
First, we compare the fusion quality of the different methods. Figure 6 shows three typical images selected to give an intuitive view of fusion performance. The first and second rows test each method's ability to preserve the details of the acoustic image, and the third row requires the method to distinguish two cells. The first two columns are the morphology and acoustic images, and the last column presents the fusion result of our method. The other columns show the results of the compared methods. The results show that our fused images preserve acoustic information best. The inverted morphology component highlights the shape of the cells without covering the acoustic details. LP and LP-SR present similar fusion results: both capture the cell boundaries and some acoustic information, but they lose the acoustic information inside the cells. The NSCT result shows sharp edges of the inner structures and easily distinguishable cell shapes, but the contrast of the inner structures is low. FusionGAN preserves morphology information well, but it keeps only the gradient information of the acoustic images. As a result, the internal structures of the cells have low contrast and are hard to find. Our fusion result differs markedly from the others because grayscale inversion is applied to the morphology image. The area outside the cells is not the region of interest, so the high intensity of the inverted morphology image covers the acoustic noise, which is why the background is bright. The cell region is expected to contain much acoustic information. There, the low intensity of the inverted morphology information contrasts strongly with the acoustic information and highlights the details of the cells' internal structures. Moreover, the gradient from the bright background to the dark cell region forms the boundary.
Consequently, the images obtained by the proposed method have sharp edges, more details, enhanced contrast in the internal structures, and clear cell boundaries. This analysis of the fused images' features indicates that grayscale inversion significantly highlights the cells and prevents bright morphology pixels from covering the cell structures.
To assess fusion quality in the cell regions, we further give quantitative comparisons of the five methods. We first manually segment the morphology image to obtain a weight map of the cells. Then, we apply the weight map to pre-process the fused, morphology, and acoustic images. Next, the five fusion metrics are used to evaluate the fusion quality of the pre-processed fused images. Tables 1-5 list the metrics for the 16 images; the 9th image is E. coli and the others are S. aureus. Table 6 shows the results of the t-test between our proposed method and the method with the second-highest value. According to Tables 1-3, our method achieves the largest MI, $Q^{AB/F}$, and FMI on nearly all images. The largest MI implies that our method preserves the most information from the source images. This is because the method is designed to preserve the intensities of the acoustic image in the cell region, where acoustic information is much richer than morphology information. The largest $Q^{AB/F}$ demonstrates that our fused image has rich edge information. Since the acoustic image provides most of the edge information, this also suggests that our fused image retains the most acoustic information. The largest FMI further demonstrates that our method preserves the most features, including texture, edge strength, and contrast. The advantages in VIFF and SD are not as large as those in the above three metrics, but our method still has the highest averages among the compared methods. The t-test results indicate the significance of our method's advantages. The VIFF results also show that our method is consistent with the human visual system. Although the VIFF of NSCT is close to ours, the other metrics reveal a clear gap between NSCT and the proposed method. The SD results show that our fused images have high contrast, but the advantage is not clear enough to consider our method the best in SD.
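The region-restricted evaluation and the significance test can be sketched as follows. The sketch assumes that the weight map is a binary cell mask multiplied into each image. It also assumes the t-test is a paired test over the per-image scores; neither detail is stated explicitly. Only the paired t statistic is computed. A p-value can be obtained from the t distribution with $n - 1$ degrees of freedom, for example with `scipy.stats.ttest_rel`.

```python
import numpy as np

def apply_weight_map(img, weight):
    """Restrict an image to the cell region by multiplying it with a (binary) weight map."""
    return img.astype(float) * weight

def paired_t_statistic(ours, runner_up):
    """Paired t statistic over per-image metric values (one value per image pair)."""
    d = np.asarray(ours, dtype=float) - np.asarray(runner_up, dtype=float)
    if d.size < 2:
        raise ValueError("need at least two paired samples")
    s = d.std(ddof=1)                       # sample standard deviation of differences
    if s == 0:
        raise ValueError("differences are constant; t is undefined")
    return d.mean() / (s / np.sqrt(d.size))

t = paired_t_statistic([3, 5, 4, 6], [1, 2, 2, 3])
print(round(t, 4))   # 8.6603
```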
According to Table 5, FusionGAN performs much better than the other methods on the 10th image. This lowers the reliability of our SD advantage in the t-test shown in Table 6. Further analysis indicates that the SD of our method depends strongly on the acoustic image: the low SD of our fused image results from the low contrast of the 10th acoustic image. These fusion metrics indicate that our method is good at preserving acoustic details inside the cells, which is beneficial for analyzing their nanoscale, subcellular structure. To compare computational cost, Table 7 lists the time needed to process one pair of images. FusionGAN runs in TensorFlow, so it cannot be fairly compared with the other methods, which run in MATLAB. Our method is slower than LP-SR and LP, but it still fuses a pair of images in less than one second.

Conclusions
In this paper, a novel AFAM image fusion method based on grayscale inversion and best-fit intensity selection was proposed. Grayscale inversion changes the grayscale characteristics of the morphology image. The inversion process is improved by inverting a single image into a series of inverted images and selecting the most suitable one as the output. This avoids the trade-off between high-intensity morphology information and complex acoustic details. The segmentation method based on Otsu's approach separates the cells from the background. It greatly reduces interference from the morphology image and improves segmentation precision. Best-fit intensity selection measures the total information loss of each pre-fused image and chooses the best one as the output, with MRIL serving as the optimization criterion. Comparative experiments show that our method performs best at presenting the features of the source images and preserving information from the AFAM images. The quantitative comparisons also show that the contrast of our fused images depends on the acoustic source images. This reflects the design goal of preserving the acoustic image in the cell region. These results indicate that our method highlights the cells and reveals their subcellular structures, which is beneficial for analyzing cell structure. Currently, the quality of the acoustic image limits fusion performance. In the future, we will focus on the fusion rules for the inverted and acoustic images and make our fusion model more robust and efficient.