Classification of Breast Cancer Cells Using the Integration of High-Frequency Single-Beam Acoustic Tweezers and Convolutional Neural Networks

Single-beam acoustic tweezers (SBAT) is a widely used trapping technique to manipulate microscopic particles or cells. Recently, the characterization of a single cancer cell using high-frequency (>30 MHz) SBAT has been reported to determine its invasiveness and metastatic potential. Investigation of cell elasticity and invasiveness is based on the deformability of cells under SBAT’s radiation forces, and in general, more physically deformed cells exhibit higher levels of invasiveness and therefore higher metastatic potential. However, previous imaging analysis to determine substantial differences in cell deformation, where the SBAT is turned ON or OFF, relies on the subjective observation that may vary and requires follow-up evaluations from experts. In this study, we propose an automatic and reliable cancer cell classification method based on SBAT and a convolutional neural network (CNN), which provides objective and accurate quantitative measurement results. We used a custom-designed 50 MHz SBAT transducer to obtain a series of images of deformed human breast cancer cells. CNN-based classification methods with data augmentation applied to collected images determined and validated the metastatic potential of cancer cells. As a result, with the selected optimizers, precision, and recall of the model were found to be greater than 0.95, which highly validates the classification performance of our integrated method. CNN-guided cancer cell deformation analysis using SBAT may be a promising alternative to current histological image analysis, and this pretrained model will significantly reduce the evaluation time for a larger population of cells.


Introduction
Image analysis of cancer cells is an emerging technique with growing applications in cancer research and plays a vital role in the accurate diagnosis of cancer [1][2][3][4]. Histological image analysis has been extensively studied as a clinical diagnostic method for primary cancer cell classification after biopsy. However, because the quality and condition of histological images vary widely and manual analysis by experts is inherently subjective, several limitations have been found [5][6][7]. One limitation, common to all manual image analysis, is observational variation among histopathologists and clinicians. Another critical limitation is the use of complex, non-automated analysis protocols that increase evaluation times without increasing reliability. To overcome these hurdles, accurate and reliable quantitative analysis methods that directly measure the physical properties of cells are rapidly gaining attention.
Over the years, numerous studies have measured cell biomechanics using various techniques such as atomic force microscopy (AFM) [8][9][10], optical tweezers (OT) [11,12], magnetic tweezers [13], and stretchable substrates [14]. It is well documented that cell elasticity is closely linked to the invasion potential and infectibility of cells. However, the intrinsic limitations of those techniques, such as direct contact, limited forces, or labeling inside cells, hinder reliable measurement at the single-cell level [15]. Among various attempts to measure cell elasticity, approaches using single-beam acoustic tweezers (SBAT) have emerged as promising tools owing to their micrometer-sized trapping, non-contact exertion of force, no labeling requirement, and nanonewton trapping forces [16][17][18][19][20][21]. In the last decade, there has been tremendous success in studying the characteristics of a single cell using high-frequency ultrasound stimulation and acoustic tweezers [22]. In one SBAT study, a 200 MHz ultrasonic transducer was used to measure calcium responses of cultured breast cancer cells through local cell membrane deformation [20], and the pattern of deformability of various cancer cells was analyzed to identify cancer cell invasiveness [23].
In this study, experiments for cancer cell identification are presented using a high-frequency SBAT system that offers micrometer-scale spatial resolution. This system traps and deforms a single cell using acoustic forces and quantifies the degree of the deformation caused by the SBAT. In contrast to other tweezer techniques, SBAT can generate trapping forces up to a few hundred nanonewtons and can press the cell against the wall [17][24][25][26]. The level of cell deformation can be controlled by the input acoustic parameters. Usually, acoustic pressures lower than 1.0 MPa do not cause significant effects on the cell condition, as previously shown by live-cell viability tests [27][28][29]. An earlier examination of the cell deformation with the SBAT on and off was based on the extraction of boundaries from the cell image. The difference was clearly visible and was identified with boundary markings, but such manual analysis methods are time-consuming and prone to subjective interpretation. The highly variable shape and structure of cells, as well as the variable locations of cell abnormalities, pose further challenges. The development of computational image analysis that minimizes variability and subjectivity is therefore of utmost importance.
A convolutional neural network (CNN) offers a better solution than previous manual analysis of cancer cell invasiveness. Using numerous convolutional filters, CNN can compare cell deformability in images with the SBAT on and off. Since CNN can train the optimal filters for classifying cancer cells, we can obtain much higher accuracy than the conventional handcrafted filters. The accuracy of CNN can be improved continuously as new cases are added.
The present study demonstrates the fabrication of a highly focused ultrasonic transducer at 50 MHz, the cell deformation phenomenon based on the radiation and trapping force of the SBAT, and the investigation of cell invasiveness using CNN. Two cell lines with different degrees of metastasis, MDA-MB-231 (highly invasive) and MCF-7 (weakly invasive), were deformed under the SBAT. For analysis with the deep learning CNN model, cell images were preprocessed to emphasize cell boundaries and reduce noise. The CNN model was then trained to classify the images as MDA-MB-231 or MCF-7. The proposed model achieved high accuracy (precision: 0.96, recall: 0.99, F1 measure: 0.97). Derived values of cell membrane deformation under the static state demonstrate the capability of classifying human breast cancer cells. The integration of ultrasonic devices and CNN models may serve as meaningful groundwork, offering a high precision rate, for the development of a new diagnostic approach for cancer cell classification.

Results
Highly invasive and weakly invasive cancer cells have been implicated in different forms of metastatic potential, so numerous in-depth studies have investigated the invasiveness properties of cancer cells using various tools. The major challenges were related to cell safety issues caused by mechanical contact and to the limited forces these tools can generate. In contrast, SBAT, which offers micrometer-scale trapping and strong trapping forces, can trap and press the cell, leading to deformation along the transverse axis as depicted in Figure 1. For single-cell deformation, a focused ultrasonic transducer with a beam width comparable to a cell diameter was fabricated. Detailed profiles of the final product are shown in Figure 2. Biophysical characteristics, i.e., the elasticity of cells and accurate detection of their morphological changes, can serve as a new diagnostic approach for cancer conditions. We investigated MDA-MB-231 (higher metastatic potential) and MCF-7 (weaker metastatic potential) in a suspended state floating above the Petri dish and monitored the cell lines while the SBAT was applied. Increasing acoustic power facilitates cell deformation, causing area changes that are directly proportional to the applied acoustic pressure. Acoustic pressure was gradually increased from 0.0 to 1.0 MPa with a fixed duty factor of 1%, a fixed pulse repetition frequency (PRF) of 1 kHz, and various peak-to-peak input voltages (Vpp). As shown in Figure 3, the MDA-MB-231 was imaged with the SBAT on and off. Figure 4 presents the comparison between the shapes of the MDA-MB-231 and MCF-7 with the SBAT on and off. Figure 5 is an example of fluorescence live-cell images. A similar tendency was shown for both cells; however, MDA-MB-231 exhibited more deformation under the SBAT, which is consistent with the Young's modulus of MDA-MB-231 cells being lower than that of MCF-7 [9][30][31][32][33][34][35].
Previously, the deformability of a human breast cancer cell was measured in relative terms with an acoustic trap. That approach still required manual analysis to track the change in the area of the cell after the SBAT was turned on, by drawing the boundaries of the cell before and after deformation [23]. In this study, we developed a classification method using CNN to distinguish whether cells are highly or weakly invasive. For fast, precise, and automatic classification and detection of cancer cells after the SBAT experiment, we applied the CNN model to 40 cells consisting of 20 MDA-MB-231 and 20 MCF-7 cells. We conducted five-fold cross-validation. Each validation case used 80% of the cells for training the model and the remaining 20% for testing, so that every cell was used as a testing sample exactly once. By augmenting cell images, we generated 200 images for each cell (8000 augmented images in total), as shown in Figure 6; together with the original image, each cell therefore contributed 201 images. In each validation case, the CNN model was trained to classify 6432 (32 × 201) cell images into invasive and non-invasive classes, each comprising 3216 (16 × 201) images. Then, the model was tested by classifying the remaining 1608 (8 × 201) cell images according to their invasiveness. We measured the accuracy of the model on each validation case and evaluated the model using the average and variance of its accuracy over all the validation cases. The variance indicates whether the proposed model performs reliably in general.
Figure caption. Preprocessing procedures for cell images. The first column presents photomicrographs taken when the SBAT was on and off, displaying the cell membrane deformation with the SBAT. The second column shows the results of the contrast enhancement. In the third column, the enhanced photomicrographs are assigned to the color channels of the combined image: the red channel corresponds to the pre-deformation image, the green channel to the post-deformation image, and the blue channel, the average of the red and green channels, shows the background areas. The last column is the result of the combination. The same preprocessing method was applied to both more and less deformed cells (MDA-MB-231 and MCF-7). Scale bar indicates 10 µm.
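The image counts quoted for each validation case can be checked with a short sketch. It assumes that the split is made at the cell level, that both cell lines are balanced in every fold, and that each cell contributes 201 images (200 augmented plus the original); the cell identifiers and the helper name `stratified_cell_folds` are hypothetical.

```python
import random

IMAGES_PER_CELL = 201  # assumption: 200 augmented images plus the original


def stratified_cell_folds(cells_by_class, n_folds=5, seed=0):
    """Split cell IDs into folds while keeping the two cell lines balanced."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for label, cells in cells_by_class.items():
        shuffled = list(cells)
        rng.shuffle(shuffled)
        # Deal the cells of this class round-robin over the folds.
        for i, cell in enumerate(shuffled):
            folds[i % n_folds].append((cell, label))
    return folds


# Hypothetical identifiers for the 20 + 20 cells.
cells = {
    "MDA-MB-231": [f"mda_{i:02d}" for i in range(20)],
    "MCF-7": [f"mcf_{i:02d}" for i in range(20)],
}
folds = stratified_cell_folds(cells)

for k, test_fold in enumerate(folds):
    train_cells = [c for j, fold in enumerate(folds) if j != k for c in fold]
    n_train = len(train_cells) * IMAGES_PER_CELL
    n_test = len(test_fold) * IMAGES_PER_CELL
    print(f"fold {k}: {n_train} training images, {n_test} test images")
```

Splitting by cell rather than by image keeps all augmented copies of one cell on the same side of the split, which is what allows every cell to serve as a test sample exactly once.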
Our CNN model consists of three two-dimensional convolutional layers, three max-pooling layers, and two fully connected (FC) layers, as shown in Figure 7. We implemented the model using Keras in Python. To evaluate the performance of our model, we used four metrics: accuracy ($a$), precision ($p$), recall ($r$), and F1 measure ($F_1$). When $M$ is the set of automatically detected MDA-MB-231 cells, and $M^*$ denotes the set of actual MDA-MB-231 cells, the metrics can be formulated as:
$$a = \frac{|M \cap M^*| + |(U \setminus M) \cap (U \setminus M^*)|}{|U|}, \qquad p = \frac{|M \cap M^*|}{|M|}, \qquad r = \frac{|M \cap M^*|}{|M^*|}, \qquad F_1 = \frac{2pr}{p + r},$$
where $|\cdot|$ denotes the size of a set, and $U$ refers to all the cells in our dataset. The precision is the ratio of what we correctly found to what we found, the recall is the ratio of what we correctly found to what we should have found, and the F1 measure is their harmonic mean.
The CNN model contains various hyper-parameters. To determine the parameters, we conducted a grid search. Table 1 presents the ranges and step sizes of the search for each parameter. The batch size indicates how frequently we update the weights of our CNN model. When the batch size is 2, we update the weights according to the loss of every two images in the training set. One epoch indicates one iteration of training over the training set; thus, the number of epochs denotes how many iterations we conduct. There are various methods for searching optimal weights θ [36]. We applied five methods: stochastic gradient descent (SGD) [36], RMSprop (http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf), Adagrad [37], Adadelta [38], and Adam [39]. Since Adadelta applies a decay factor to the learning rate according to epochs and recommends setting the initial learning rate as 1.00, we did not search for the optimal learning rate for Adadelta. We also conducted the hyper-parameter search for all the methods. The performance of the CNN model over epochs indicates the stability of the proposed model. Figures 8 and 9 show performance fluctuations of the optimizers on the same validation case.
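The set-based definitions of the four metrics translate directly into code. The following is a minimal sketch with a made-up toy dataset; the function name `classification_metrics` and the example sets are illustrative assumptions, not part of the reported experiments.

```python
def classification_metrics(detected, actual, universe):
    """Accuracy, precision, recall, and F1 from sets of cell IDs.

    detected: cells the model labeled as MDA-MB-231 (M)
    actual:   cells that truly are MDA-MB-231 (M*)
    universe: all cells in the dataset (U)
    """
    true_pos = len(detected & actual)
    true_neg = len((universe - detected) & (universe - actual))
    accuracy = (true_pos + true_neg) / len(universe)
    precision = true_pos / len(detected) if detected else 0.0
    recall = true_pos / len(actual) if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f1


# Toy example (hypothetical): 10 cells, 5 of which are truly MDA-MB-231.
U = set(range(10))
M_star = {0, 1, 2, 3, 4}
M = {0, 1, 2, 3, 5}  # one miss (cell 4) and one false alarm (cell 5)

a, p, r, f1 = classification_metrics(M, M_star, U)
print(f"a={a:.2f} p={p:.2f} r={r:.2f} F1={f1:.2f}")
```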
We compared the optimizers using their accuracy and loss at the optimal hyper-parameters and epoch. We employed the binary cross-entropy loss, which is non-negative (Equation (7) in Section 4.6.2). Table 2 presents averages and standard deviations of the performance metrics over the validation cases.
We also examined whether the proposed method is applicable to calcium fluorescence live-cell images, which are widely used for tracking the oscillation of cytosolic calcium concentration. Fluorescence intensity caused by intracellular calcium also plays a fundamental role in determining cancer invasiveness. It is worthwhile to note that Figures 4 and 5 demonstrate that the deformability and fluorescence intensities of MDA-MB-231 are significantly higher than those of MCF-7, which is in agreement with existing literature [40,41]. We took photomicrographs of 10 fluorescent cells consisting of five MDA-MB-231 and five MCF-7 cells. We assessed whether the proposed model, which is trained on non-fluorescent cells, can be used for fluorescent cells. This experiment can validate the generalization of the proposed model by showing that the model is capable of handling the diversity of cell photomicrographs. We used models trained by the RMSprop and Adadelta optimizers, which have the highest accuracy. Table 3 presents the performance metrics for fluorescence cell images.

Discussion
In this study, we assumed that a conventional CNN model would be sufficient to classify cells according to their deformations. As shown in Table 2, the CNN model exhibited remarkably high accuracy. With Adadelta, the most suitable optimizer, the accuracy, precision, recall, and F1 measure of the model were all greater than 0.96. The accuracy was also stable across both validation cases and epochs. In particular, Adadelta and RMSprop exhibited significantly lower variances than the other optimizers. Their recall was 0.99 ± 0.01. Their standard deviations for the accuracy and F1 measure were 0.05. This result supports the finding that diseases that affect the deformation of cells can be diagnosed quickly and automatically by integrating ultrasonic devices and a CNN model.
In addition, Figures 8 and 9 show that the accuracy and F 1 measure of the proposed model were more stable than the precision and recall while converging according to epochs. In general, the precision and recall have tended to show a trade-off relationship, and our results in Figures 8 and 9 also revealed the same, but the precision decreased more while the recall increased. Validation losses revealed these problems: (i) The proposed model exhibited significant fluctuations in their validation losses according to epochs, while the training loss converged. (ii) The Adadelta and RMSprop optimizers exhibited higher and less stable validation losses than the other optimizers. The high loss with high accuracy in the binary cross-entropy indicates our model generated correct answers, but with low confidence. In other words, the cell deformation was an effective feature for diagnosing invasiveness of cancers automatically, but borderline cases still exist. This issue will be resolved by employing additional features or data samples as shown in the fluorescence experiments (Table 3).
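As a worked illustration of how a correct answer can still carry a high loss (the output values 0.55 and 0.95 are hypothetical, not measured), consider an MDA-MB-231 image ($Y = 1$) under the binary cross-entropy loss with the natural logarithm, for which the loss reduces to $L = -\ln f(X)$:

$$f(X) = 0.55 \;\Rightarrow\; L = -\ln 0.55 \approx 0.598, \qquad f(X) = 0.95 \;\Rightarrow\; L = -\ln 0.95 \approx 0.051.$$

Both outputs exceed the 0.5 decision threshold and therefore count as correct classifications, yet the low-confidence output contributes more than ten times the loss of the high-confidence one.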
Although the difference between the SBAT on and off images was much more vivid in the fluorescence cases than in the original cases (Figures 4 and 5), the accuracy of the proposed method was lower in the fluorescence images. This is because the proposed model was not trained for the fluorescence images. Yet, the RMSprop optimizer still exhibited high precision (0.97), F 1 measure (0.89), accuracy (0.90), which represents that the proposed model is applicable for both fluorescent and non-fluorescent cells. On the fluorescence images, their precision was commonly higher than their recall. It implies that these images showed not only the deformation caused by the SBAT but also brightness changes. As we expected, additional features were helpful for dealing with the borderline cases.
Additionally, the experimental results show that the proposed model was free from the overfitting issue. Although the number of samples is restricted, we exhibited that our CNN model had reliable and stable performances on overall samples by using the k-fold cross-validation. The experiment of fluorescence images also supports that the proposed model can handle the diversity of photomicrographs produced in this research domain. Moreover, by adopting the shallow CNN, we attempted to avoid the possibility of the overfitting and showed that the shallow model is enough to classify cancer cells according to their invasiveness. At this moment, we are not sure that the proposed model is generally applicable to other cancer cell lines or diseases. Nevertheless, the experimental results are enough to show the necessity and prominence of integrating the SBAT and machine learning techniques.
In summary, this study experimentally demonstrated the capability of the SBAT to deform cells and of the CNN to classify breast cancer cells based on their invasiveness. The proposed model exhibited reasonable accuracy for both non-fluorescent and fluorescent cells. The images used to train the CNN in this study contain only common features, i.e., the cell morphology and background. Therefore, the recall being lower than the precision might be caused by the limited variety of features available for training. To enhance the recall of the CNN, the number of samples and the types of cells can be increased.

Transducer Fabrication
Piezoelectric single-crystal lithium niobate (LiNbO3) is widely used to manufacture high-frequency ultrasonic transducers owing to its high electromechanical coupling coefficient ($k_t \sim 49\%$) and low dielectric permittivity ($\varepsilon^{s} \sim 39$). We fabricated a 50 MHz press-focused transducer using 36°-rotated Y-cut LiNbO3 (Boston Piezo-Optics, Bellingham, MA, USA) with the following steps [16,21]. Krimholtz, Leedom, and Matthaei (KLM) model software (PiezoCAD, Sonic Concepts, Bothell, WA, USA) was used to determine the optimal aperture size and the thicknesses of the acoustic stack, which includes piezoelectric, matching, and backing layers. The LiNbO3 was bonded to a glass plate and manually lapped down to 61 µm. After the lapping process, chrome (500 Å) and gold (1000 Å) electrodes (Cr/Au, Nano-Master, Austin, TX, USA) were sputtered on the matching side of the material. The first matching layer, a mixture of 2-3 µm silver particles (silver; Aldrich Chemical Co., St. Louis, MO, USA) and Insulcast 501 epoxy (Insulcast 501, American Safety Technologies, Roseland, NJ, USA), was bonded to the front side of the LiNbO3 layer and lapped down to 9 µm. Chrome and gold electrodes were sputtered on the backing side of the LiNbO3 layer. Conductive epoxy (E-solder 3022, Von Roll Isola, Schenectady, NY, USA) was attached to the backside of the material at a thickness of 1 mm, completing the acoustic stack. After the acoustic stack was turned down to a diameter of 5 mm using a lathe, it was wired with a single-lead wire at the backing layer. The stack was concentrically placed in a brass housing. The gap between the acoustic stack and the housing was filled with an insulating epoxy (Epo-tek 301, Epoxy Technologies, Billerica, MA, USA) to prevent a short circuit. A heated bearing ball was placed on the surface of the matching layer and mechanically pressed to generate a concave structure.
Another layer of chrome and gold electrodes with a thickness of 1500 Å was sputtered on top to form the ground connection. An SMA electrical connector was mounted, and a parylene film (10.5 µm) was coated on the outermost surface of the transducer using a PDS 2010 Labcoater (SCS, Indianapolis, IN, USA) to serve as the second matching layer and for protection.

Transducer Performance
A JSR pulser/receiver (DPR500, Pittsford, NY, USA) was used for a pulse-echo test of the fabricated transducer. It generated electrical impulses at a 500 Hz repetition rate and a damping ratio of 50. RF echo signals of the transducer received from a flat quartz reflector were analyzed. Figure 2a,b shows a measured pulse-echo response and the frequency spectrum, respectively. The center frequency was 50 MHz, and the −6 dB fractional bandwidth was 80%. Quantitative spatial peak temporal average intensity (I SPTA ) in two-dimensional lateral and axial directions was derived after calibration with a needle hydrophone (Precision Acoustics, UK) as shown in Figure 2c. The driving conditions were as follows: frequency of 50 MHz, pulse repetition frequency (PRF) of 1 kHz, cycle number of 10, and input peak to peak voltage of 25 V. The −3 dB lateral beam width was measured to be 32 µm. Lateral resolution is determined by the center frequency and f-number of the transducer. F-number (the focal distance of 4 mm/aperture diameter of 5 mm) was calculated to be 0.8, and the theoretical lateral resolution is 24 µm.
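The quoted F-number and theoretical lateral resolution can be reproduced with a short calculation. This sketch assumes the common approximation that lateral resolution equals the F-number times the wavelength, and a sound speed of 1500 m/s in the coupling medium; neither value is stated explicitly in the text, and the function names are illustrative.

```python
def f_number(focal_distance_m, aperture_diameter_m):
    """F-number = focal distance / aperture diameter."""
    return focal_distance_m / aperture_diameter_m


def lateral_resolution_m(f_num, center_frequency_hz, sound_speed_m_s=1500.0):
    """Approximate lateral resolution as F-number x wavelength.

    sound_speed_m_s is an assumed value for a water-like medium.
    """
    wavelength_m = sound_speed_m_s / center_frequency_hz
    return f_num * wavelength_m


fn = f_number(4e-3, 5e-3)             # 4 mm focus, 5 mm aperture
res = lateral_resolution_m(fn, 50e6)  # 50 MHz center frequency
print(f"F-number = {fn:.1f}, lateral resolution = {res * 1e6:.0f} um")
```

Under these assumptions the wavelength is 30 µm, which gives the 24 µm theoretical value; the measured −3 dB beam width of 32 µm is somewhat wider.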

Cell Preparation
Human breast cancer cell lines, MDA-MB-231 and MCF-7, were purchased from ATCC (Manassas, VA, USA) and maintained in modified complete medium (RPMI, 10% fetal bovine serum, 10 mM HEPES, 2 mM L-glutamine, 1 mM sodium pyruvate, 0.05 mM 2-mercaptoethanol, and 11 mM D-glucose). They were cultured in 5% CO2 at 37 °C. The SBAT traps and deforms a single cell in a suspended state, so a trypsin-ethylenediaminetetraacetic acid (trypsin-EDTA) solution obtained from Invitrogen (Grand Island, NY, USA) was used to detach cultured cells from the bottom of the Petri dish. After the addition of trypsin-EDTA to the culture dish, the cells were incubated at 37 °C for 2 min. An equivalent volume of modified complete medium was added to neutralize the trypsin. Phosphate buffer solution (PBS) was purchased from Invitrogen (Grand Island, NY, USA) for washing cells before acoustic tweezer experiments. With the inverted microscope, we confirmed that the cell was slightly touching or floating above the Petri dish without morphological damage. During experiments, cells with blebs were excluded from the measurements. The cell viability test also confirmed that there was no significant adverse effect on the cell's condition during the experiment.

Live Intracellular Calcium Imaging
For the fluorescence cell images, both cell lines, MDA-MB-231 and MCF-7, were seeded on culture dishes and kept in the CO2 incubator for 48 h before experiments. HBSS with Ca2+ and Mg2+ was used as the working solution. Cells were washed with HBSS three times and incubated with 3 µM Fluo-4 AM at room temperature for 30 min for Ca2+ imaging. After incubation, the cells were washed three times with HBSS and imaged with an epi-fluorescence inverted microscope during experiments.

SBAT for Cell Deformation
The acoustic tweezers system is illustrated in Figure 1. A focused 50 MHz ultrasonic transducer and ultrasound electronics, including a pulser-receiver, a function generator (Stanford Research Systems, Sunnyvale, CA, USA), and a 50 dB power amplifier (525LA, ENI, Rochester, NY, USA), were integrated with an inverted fluorescence microscope (Olympus IX-71, Center Valley, PA, USA) to monitor the SBAT. The movement of the transducer was controlled by a three-axis motorized stage (SGSP 20, Sigma KOKI Co., Midori, Tokyo, Japan). The focal point on the Petri dish was aligned using the pulser-receiver, and a 50 MHz sinusoidal burst signal, generated by the function generator and amplified by the power amplifier, was applied to the transducer to trap, manipulate, and deform a suspended single cell. The burst length and PRF were set to 500 cycles and 1 kHz, respectively, corresponding to a duty factor of 1%. The input peak-to-peak voltage was set to 0.00, 4.74, 9.48, 14.22, 18.96, or 23.70 Vpp (corresponding acoustic pressures: 0.00, 0.23, 0.43, 0.63, 0.82, and 1.00 MPa, respectively). An inverted microscope and a CMOS camera (ORCA-Flash2.8, Hamamatsu, Japan) were used to record the SBAT and cell deformation.
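The driving parameters can be cross-checked against the 1% duty factor quoted in the Results: a burst of 500 cycles at 50 MHz lasts 10 µs, and repeating it at 1 kHz gives a 1% duty factor. The sketch below performs this arithmetic and stores the reported voltage-to-pressure pairs as a lookup table; the names `duty_factor` and `PRESSURE_MPA_BY_VPP` are illustrative.

```python
def duty_factor(cycles_per_burst, frequency_hz, prf_hz):
    """Fraction of time the transducer is driven.

    Burst duration = cycles / frequency; duty factor = duration x PRF.
    """
    burst_duration_s = cycles_per_burst / frequency_hz
    return burst_duration_s * prf_hz


df = duty_factor(cycles_per_burst=500, frequency_hz=50e6, prf_hz=1e3)
print(f"duty factor = {df * 100:.1f}%")

# Reported input voltages (Vpp) and the corresponding acoustic pressures (MPa).
PRESSURE_MPA_BY_VPP = {
    0.00: 0.00,
    4.74: 0.23,
    9.48: 0.43,
    14.22: 0.63,
    18.96: 0.82,
    23.70: 1.00,
}

for vpp, mpa in PRESSURE_MPA_BY_VPP.items():
    print(f"{vpp:5.2f} Vpp -> {mpa:.2f} MPa")
```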

Cancer Cell Classification with Convolutional Neural Networks
The aim of this study was to validate whether invasive cancer cells can be detected automatically using a conventional CNN model. We applied the CNN model to 40 cells. Half of the cells had significant deformation and invasiveness (MDA-MB-231), and the other half did not (MCF-7). For each cell, we took photos with the SBAT on and off. Then, the CNN model was trained to classify cancer cells into invasive and non-invasive groups. The deformation is a major feature for the classification. However, since a CNN is a black-box model, our model may also learn various uninterpretable features from the cell images.

Preprocessing
In this study, we use a conventional CNN model, which cannot deal with time-serial data. However, since we expect that the deformation will be the key feature, the model has to consider changes in cell size. We propose an image preprocessing method to solve this problem. Most of the image files consist of multiple color channels (e.g., red, green, and blue channels). The CNN model also accepts multi-channel images. On the other hand, our input images (photomicrographs) are gray-level images. Thus, we deliver cell images with the SBAT on and off indicating the different cell deformability through the red and green channels of the input image, respectively, as shown in Figure 3.
The detailed preprocessing procedure is as follows.

1. Enhance contrast of cell images.
2. Put the SBAT on images as the red channel, the SBAT off images as the green channel, and the average of the SBAT on and off images as the blue channel.
3. Save the combined image.
Since some cell images include noise from reflected light, the CNN model is taught to recognize the noise by using two methods. First, cell areas and boundaries are emphasized by enhancing the contrast of the images. The enhancement is conducted by normalizing the pixel values of the images into [0, 255]. This can be formulated as:
$$p^*_{x,y} = \left\lfloor 255 \times \frac{p_{x,y} - \min_{(i,j)} p_{i,j}}{\max_{(i,j)} p_{i,j} - \min_{(i,j)} p_{i,j}} \right\rceil,$$
where $p_{x,y}$ is a pixel value, $(x, y)$ indicates a pixel coordinate, $p^*_{x,y}$ denotes a pixel value after the contrast enhancement, the minimum and maximum are taken over all pixels of the image, and $\lfloor \cdot \rceil$ denotes the rounding function.
Second, it is difficult for machines to identify which parts of the images are cells and which are background. Changes between the SBAT 'off' and 'on' images mainly occur in the cells. Thus, we made a new channel by averaging corresponding pixel values from the two images. The average will dilute the changes and preserve only the backgrounds. This can be formulated as:
$$p^N_{x,y} = \frac{p^B_{x,y} + p^A_{x,y}}{2},$$
where $p^N_{x,y}$, $p^B_{x,y}$, and $p^A_{x,y}$ are pixels on $(x, y)$ in the 'background,' 'SBAT off,' and 'SBAT on' channels, respectively. Therefore, on our input image ($I = (B, A, N)$), cells with the SBAT on and off are marked by red and green colors, respectively, as displayed in Figure 3.
Additionally, we took 40 pairs of images from the 40 cells. However, this number of samples is not enough to train the CNN model. Thus, image augmentation was conducted. We employed the augmentation tool supported by Keras (https://keras.io/preprocessing/image/#imagedatageneratorclass). The augmentation tool generates new images by rotating, scaling, and translating the original images, as displayed in Figure 6. This process also makes the CNN model robust to those transformations. When the proposed model is deployed, the quality of the input photomicrographs is not guaranteed, and operators of the model may not always be well-trained experts. Therefore, robustness is important for the practicality of the proposed model.
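The two preprocessing steps described above (min-max contrast enhancement and composition of a three-channel image) can be sketched with NumPy as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the rounding of the averaged channel to 8 bits is an assumption, and the channel order follows the numbered procedure (SBAT on as red, SBAT off as green) and may need to be swapped to match a given dataset.

```python
import numpy as np


def enhance_contrast(img):
    """Min-max normalize a gray-level image into [0, 255] and round."""
    img = np.asarray(img, dtype=np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:
        # A flat image carries no contrast; return zeros to avoid 0/0.
        return np.zeros(img.shape, dtype=np.uint8)
    return np.rint(255.0 * (img - lo) / (hi - lo)).astype(np.uint8)


def compose_input(sbat_off, sbat_on):
    """Stack enhanced 'on', 'off', and averaged images as R, G, B channels."""
    off = enhance_contrast(sbat_off)
    on = enhance_contrast(sbat_on)
    # Averaging dilutes the cell changes and keeps the shared background;
    # the result is rounded to fit 8-bit storage (assumption).
    background = np.rint((off.astype(np.float64) + on) / 2.0).astype(np.uint8)
    return np.stack([on, off, background], axis=-1)


# Tiny synthetic gray-level images standing in for photomicrographs.
off_img = np.array([[10, 20], [30, 50]])
on_img = np.array([[10, 30], [50, 90]])
combined = compose_input(off_img, on_img)
print(combined.shape, combined.dtype)
```

The resulting array has the height × width × 3 layout that image libraries and Keras-style CNN inputs commonly expect.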

CNN Model for Cancer Cell Classification
We applied the conventional CNN model to detect invasive cancer cells. We expected that the conventional model would be enough for this task, since deformability, our key feature, is vivid in the preprocessing results (Figures 4 and 5). In this research domain, it is difficult to collect an enormous amount of cell images to train deep CNN models that consist of hundreds of convolutional layers. Although we use the data augmentation, the deep models include too many weights to avoid the overfitting issue. In Sections 2 and 3, we exhibited that the shallow model has enough accuracy and is free from the overfitting by using the k-fold cross-validation and fluorescence cell images. This model consists of three two-dimensional convolutional layers, three max-pooling layers, and two FC layers. After the convolution parts, we flattened the outputs from matrices to a vector. Then, we put the vector as the input of the FC layers. Lastly, based on the output of the FC layers, we classified cells into two groups: MDA-MB-231 and MCF-7 cells. Figure 7 presents the structure of the CNN model in detail.
The convolutional layers consist of multiple convolutional filters. For example, our first convolutional layer consists of thirty-two 3 × 3 convolutional filters. The filters are square matrices, and their elements are weighting factors. Each convolutional filter calculates the weighted summation of pixel values in a part of the input image. The weighted summation reflects visual features in that part. Conventionally, the filters were designed to detect particular visual features using gradients of pixel values. For example, to detect horizontal edges, we can contrast pixels on the upper side with those on the lower side. With a 3 × 3 filter, for instance, this can be formulated as follows:
$$\begin{bmatrix} p_{x-1,y-1} & p_{x,y-1} & p_{x+1,y-1} \\ p_{x-1,y} & p_{x,y} & p_{x+1,y} \\ p_{x-1,y+1} & p_{x,y+1} & p_{x+1,y+1} \end{bmatrix} * \begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ -1 & -1 & -1 \end{bmatrix} = \sum_{i=-1}^{1} \left( p_{x+i,y-1} - p_{x+i,y+1} \right),$$
where $*$ denotes the weighted summation, the first matrix is a part of the input image, and the second matrix is a filter for detecting horizontal edges. The filter searches the input image by calculating the weighted summation every $n$ pixels. The step size $n$ is called the stride, and the filter moves $n$ pixels vertically or horizontally from the top-left corner to the bottom-right corner of the input image. However, there are limitations to designing all the filters heuristically. In particular, the shapes and deformations of the cells are highly atypical. In other words, we expect that the deformation is a distinctive feature of cancer cells. Nevertheless, it is a challenging task to design convolutional filters for detecting the deformation in noisy and atypical photomicrographs. Thus, we employed a CNN that can train convolutional filters for a specific purpose with a black-box approach. Simply speaking, we can express the layers in the CNN model, which consist of a number of convolutional filters, as linear functions. Each layer calculates weighted summations of input variables and applies activation functions to the summations. Outputs of lower layers are inputs of the upper layers.
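To make the sliding weighted summation concrete, the sketch below applies a handcrafted horizontal-edge filter to a small synthetic image. The 3 × 3 kernel values (a Prewitt-style filter), the test image, and the function name `apply_filter` are illustrative assumptions; a trained CNN learns its filter weights instead of using fixed ones.

```python
import numpy as np


def apply_filter(image, kernel, stride=1):
    """Slide the kernel over the image and take weighted sums (no padding)."""
    image = np.asarray(image, dtype=np.float64)
    kernel = np.asarray(kernel, dtype=np.float64)
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            patch = image[top:top + kh, left:left + kw]
            out[i, j] = np.sum(patch * kernel)  # weighted summation
    return out


# Upper row weighted +1, lower row weighted -1: responds to horizontal edges.
HORIZONTAL_EDGE = np.array([[1, 1, 1],
                            [0, 0, 0],
                            [-1, -1, -1]])

# Synthetic 6 x 6 image: bright upper half, dark lower half.
image = np.zeros((6, 6))
image[:3, :] = 1.0

response = apply_filter(image, HORIZONTAL_EDGE, stride=1)
print(response)
```

The response is zero in the uniform regions and non-zero only in the windows that straddle the bright-to-dark boundary; increasing `stride` evaluates fewer windows and shrinks the output accordingly.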
Since the linear functions are too simple to solve complicated problems, the activation functions transform output spaces of neural networks into non-linear spaces. All the layers of our model, excluding the output layer, use the rectified linear unit (ReLu) function as their activation, which is most widely used. The output layer uses the sigmoid function for the binary classification. Thus, our model f (·) can be formulated as: where f (n) (·) denotes the n-th layer, h (n) (·) refers to the activation function of the n-th layer, θ n indicates weights on the n-th layer. The training is conducted by the back propagation. Errors on the results of the CNN model are propagated to all the convolutional filters to update their weightings. Since our task is binary classification, we use the binary cross-entropy function to measure the loss (error) of our model. The output of the model is a real number in [0, 1]. When the output is greater than 0.5, we determine that a cell in the input image is in the MDA-MB-231 group; otherwise, in the MCF-7. Therefore, we train the model to print 1 for the MDA-MB-231 cells and 0 for the others. The loss function can be formulated as: where Y is the ground truth for the input image X, which is 1 for the MDA-MB-231 cells and 0 for the others. When the model makes correct answers, L(θ) will be 0; otherwise, positive real numbers. We train the model to find the optimal weights that minimize L(θ). This optimization is conducted by using gradients of L(θ) to the weights θ. To update the weights, most of the optimization methods move the weights according to the directions and sizes of the gradients, with an assumption that L(θ) is a convex function. This can be formulated as: where θ * is the updated weights, and ρ denotes the learning rate, which means how rapidly the weights are updated. For larger gradients and learning rates, θ moves more quickly. 
The learning rate ρ must be tuned so that the optimization neither becomes stuck in local optima nor overshoots convex regions. In Section 2, we searched for the optimization method and hyper-parameters appropriate for our model.
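The effect of the learning rate on the weight update can be illustrated with a minimal sketch of plain gradient descent on a one-dimensional convex loss. The toy loss L(θ) = θ², the starting point, and the three learning rates are assumptions chosen for illustration; they are not the optimizers or hyper-parameters evaluated in this study, and this convex example shows only slow convergence and overshooting, not local optima.

```python
def gradient(theta):
    """Gradient of the toy loss L(theta) = theta ** 2."""
    return 2.0 * theta


def gradient_descent(theta, learning_rate, steps):
    """Repeat the update theta* = theta - rho * dL/dtheta."""
    for _ in range(steps):
        theta = theta - learning_rate * gradient(theta)
    return theta


start = 1.0
too_small = gradient_descent(start, learning_rate=0.001, steps=50)
well_tuned = gradient_descent(start, learning_rate=0.1, steps=50)
too_large = gradient_descent(start, learning_rate=1.1, steps=50)

# The minimum of the toy loss is at theta = 0.
print(abs(too_small), abs(well_tuned), abs(too_large))
```

With the small rate, θ barely moves from its starting point in 50 steps; with the moderate rate, it approaches the minimum; with the large rate, each update jumps past the minimum and θ moves farther away at every step.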

Cell Viability Test
Cytotoxicity of SBATs on MDA-MB-231 and MCF-7 cells was evaluated using Calcein-AM (Thermo Fisher Scientific, Indianapolis, IN, USA). Calcein-AM is a dye that enters live cells, where it is converted into a green fluorescent form. Calcein-AM was prepared as a 1 mM stock solution in dimethylsulfoxide at room temperature and added to the cell culture dish at a final concentration of 10 µM. Fluorescence images of the cells were acquired using a microscope (excitation at 488 nm and emission at 532 nm). Figure 10 shows the results of the cell viability experiments: before SBAT (negative control) and after SBAT (experiment). The cells were exposed to SBAT at 1.0 MPa for 1 min. If ultrasound compromises cell membrane integrity, a decrease in fluorescence intensity is observed. The normalized mean viabilities (0, 60, 120 min after trapping) for the MDA-MB-231 and MCF-7 cells were 1.012 ± 0.039 and 1.020 ± 0.038, respectively. The value for each cell line is the average of 20 samples. The p-values of all three cell groups were greater than 0.05 (p = 0.822 and 0.624 for MDA-MB-231 and MCF-7, respectively). No significant sign of cytotoxicity was found in either the non-trapping or the trapping groups.

Conclusions
We demonstrated that SBAT with CNN-based image analysis could serve as a platform for cancer cell evaluation. The high-frequency SBAT is a non-contact and label-free technique for trapping and mechanically deforming micron-sized objects such as particles or cells. The deformation of MDA-MB-231 and MCF-7 cells in vitro using the SBAT was successfully demonstrated in this paper. Previous methods to evaluate the invasive potential of cancer cells, such as manual analysis, are time-consuming and subjective. Based on pretraining and optimization, the CNN classifies the two cell lines, highly and weakly invasive cancer cells. As a result, high precision and recall rates (>0.96) of the model have been achieved. As a further development of the integrated SBAT and CNN, this system can be used to estimate the elastic modulus of cancer cells automatically by applying image processing techniques to cell photomicrographs. After the images are segmented into cell and background areas, the ratios of changes in cell area can be measured automatically. The CNN can then process the correlation between cell deformability and acoustic pressure. Besides the mechanical deformation of a cancer cell, the calcium ion dynamics of a cell evoked by the SBAT are another important indicator for determining cell invasiveness and its mechanotransduction pathway. This system can perform automatic analysis of ultrasound-induced calcium elevation for a better understanding of various cellular functions.