1. Introduction
Droplet-based microfluidic platforms have been broadly adopted in biotechnology applications such as directed evolution, single-cell sequencing, and digital PCR, owing to their high-throughput capacity and single-molecule sensitivity. In addition, their miniaturization and large surface-to-volume ratio provide reproducible microreactors that discretize reagents into picoliter or nanoliter volumes [1,2]. Benefiting from these isolated microenvironments, precise manipulation of the number of cells inside each droplet makes it possible to study phenotypic and genetic heterogeneity at the single-cell level. Accordingly, it is essential to generate and monitor highly uniform droplets with single-cell encapsulation for single-cell analysis [3,4].
Several methods have been proposed for the passive generation of uniform microdroplets [5]. The coflowing structure was introduced first, in which the dispersed and continuous phases flow coaxially along the inner and outer capillaries, respectively [6]. The dispersed phase carrying the cells forms a jet inside the continuous phase and spontaneously breaks into cell-laden droplets due to capillary instability. Flow-focusing structures were subsequently applied, in which the dispersed phase experiences the combined action of the pressure drop and the shear stress exerted by the continuous phase [7]. This geometry offers an advantage in throughput, since smaller droplets are generated periodically at a higher frequency. Similarly, in the T-junction structure [8], the dispersed phase enters perpendicular to the continuous phase, which turns the symmetric forcing into an asymmetric one. In addition, an abrupt change in the channel dimension can sharply decrease the interface curvature of the two phases, lowering the Laplace pressure and producing an outward drag [9]. Considering their reproducibility and robustness, flow-focusing microfluidic chips fabricated by soft lithography have been widely used to generate homogeneous microdroplets for cell encapsulation.
With limited exceptions, the number of cells encapsulated per droplet by passive methods is governed by Poisson statistics [10]: at an average loading of one cell per droplet, a theoretical maximum of 36.79% (1/e) of all droplets contain exactly one cell, while 26.42% of droplets contain at least two cells (the co-encapsulation probability), reducing the effective rate of single-cell encapsulation. Recently, many efforts have been made to break this inherent limit by regularly ordering cells in inertial [11,12] or close-packed channels, by on-demand encapsulation, and by post-encapsulation sorting, while massive numbers of microdroplets are produced at generation rates on the order of 1~10 kHz. This progress has created an acute need for computational tools capable of processing many droplets, replacing manual cell counting and analysis. In addition, the parameters governing microdroplet formation affect the final size distribution and encapsulation rate and must be monitored periodically for quality control [13]. Therefore, it is highly necessary to develop complementary, automatic methods to evaluate the uniformity of the generated droplets and quantify their encapsulated contents.
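The Poisson fractions quoted above can be checked directly. A short sketch, assuming an average loading density of λ = 1 cell per droplet:

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a droplet encapsulates exactly k cells."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 1.0  # average cells per droplet (CPD)
p_single = poisson_pmf(1, lam)                   # exactly one cell
p_multi = 1.0 - poisson_pmf(0, lam) - p_single   # two or more cells

print(f"single-cell: {p_single:.4f}")  # 0.3679 (= 1/e)
print(f"multi-cell:  {p_multi:.4f}")   # 0.2642
```

The single-cell fraction λe^(−λ) is maximized exactly at λ = 1, which is why 1/e is the theoretical ceiling for passive encapsulation.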
There has been some research [
14,
15,
16,
17,
18] focusing on these tasks based on video frames acquired by a high-speed camera, in which the droplets are clearly separated. This dynamic strategy suffers from motion blur and neglects droplet fusion occurring downstream of the imaging field; hence, monitoring dynamic droplet generation is useful but insufficient. A two-stage object detection framework [
19,
20,
21] has been applied to static microscopic images to separately recognize the generated droplets and the encapsulated cells. In the first stage, droplet proposals are generated, with their boundaries segmented by masks. Exploiting the differences between droplets and the oil background, morphological analysis is first applied to yield droplet proposals. Some studies extract edge feature maps and adopt the Hough transform to find circle-like contours [
15,
22,
23]. Background models and connected component analysis [
16,
24] have been applied to segment the droplet foreground and locate droplet proposals. These methods work well on transparent, well-separated droplets but encounter difficulties with opaque or adherent droplets. In the second stage, two kinds of approaches are used to detect the encapsulated contents and classify droplets: morphological analysis and machine learning. Basu et al. [
14] directly judged whether droplets encapsulated any particles by quantifying the deviation of their internal grayscale. Similarly, the standard deviation of the distance between the droplet contour and its center of gravity has been employed [
19]. However, these morphological methods rely heavily on imaging quality and cannot count the cell number. Adopting machine learning, the random forest was utilized to precisely detect beads inside each droplet with manual labeling [
20]. Handcrafted features were fed into a support vector machine (SVM) to classify droplets into empty, single-cell, and multicell encapsulations [
25]. Convolutional neural networks (CNNs) were also employed [
26] to classify encapsulated droplets. Although these methods can extract superficial droplet or cellular information, they are often limited in their ability to quantify cell encapsulation and require large numbers of model parameters and long training times.
Reconsidering the recognition task for cell-encapsulated droplets, we realized that the key difference between categories is the cell quantity rather than any divergence in cell-like features. Traditional classifiers can generally learn the bias among categories; unfortunately, they cannot count the cell population in each droplet. Research on cell counting mostly adopts regression approaches to estimate the density map of a given medical image [
27], while the integral of the density map might intuitively indicate the cell number. These fully supervised learning approaches require tedious cell-level annotation for the training procedure, including a precise cell population [
28] and the accurate location of each cell [
To avoid time-consuming annotation, three droplet-level labels (empty, single-cell, and multicell encapsulation) are adopted in this paper instead of cell-level labels. Accordingly, we developed a microfluidic system embedded with a recognition algorithm to generate microfluidic droplets, monitor droplet size, and recognize the number of encapsulated cells. A morphological approach named adaptive scale template matching (ASTM) is first proposed to generate droplet proposals. Second, to categorize droplets by the number of encapsulated cells, the cell population inside each droplet is estimated by a weakly supervised cell counting network (WSCNet). The algorithm also predicts the location of each cell, which makes it more interpretable than a CNN-based classifier. We verified our approach on intricate droplet data collected from three different microfluidic structures under different experimental parameters. Quantitative results show that our approach not only distinguishes droplet encapsulations (F1 score > 0.88) but also locates each cell without any supervised location information (accuracy > 89%). We further demonstrate the feasibility of this combined microfluidic system and counting network for single-cell encapsulation analysis: the probability of "single cell in one droplet" encapsulation was systematically verified under different parameters and agrees well with the Poisson distribution. The whole system is self-contained, and the proposed counting network is small and efficient, so it can be readily employed as a comprehensive platform for the quantitative assessment of encapsulated microfluidic droplets.
3. Materials and Methods
3.1. Microfluidic Chips and Experimental Platform
Hydrodynamic flow-focusing structures were used to generate droplets, while bacterial cells were successively encapsulated. We constructed a microfluidic droplet generation platform, as shown in
Figure 1d, including different microfluidic chips, a multichannel syringe pump (TS-1B, Longer Precision Pump Co. Ltd., Baoding, China), an inverted microscope (Motic AE31, Panthera, Xiamen, China), a USB camera (acA1920, Basler Asia Pte. Ltd., Singapore) with a resolution of 1920 × 1200, and a computer embedded with our proposed algorithm. The camera and microscope with 10/20/40× objectives provided clear images of cells larger than 2 μm. Three microfluidic chips with different geometries were fabricated, as shown in
Figure 1a–c, to compare and validate the generalization ability of the algorithm on intricate data collected from different chips. The first two geometries generated the encapsulated droplets based on passive methods, while a serpentine inertial focusing channel [
11] was added to the third geometry to preorder cells and attempt to improve the single-cell encapsulation rate.
All microfluidic layers were fabricated using standard soft photolithography with patterns etched on silicon wafers [
1]. Master molds with 20–40 μm-thick SU-8 were fabricated in a clean room. The PDMS (polydimethylsiloxane) base and its curing agent (Sylgard 184, Dow Corning, Midland, MI, USA) were thoroughly mixed and degassed in a vacuum oven. Next, the PDMS mixture was cast onto the SU-8 molds, cured at 85 °C for 1 h, and peeled off the molds. The PDMS slab was cut to a suitable size, punched for inlets and outlets, and bonded to a glass substrate after oxygen plasma treatment (Femto, Diener Electronic, Ebhausen, Germany). Aquapel (PPG Industries, Pittsburgh, PA, USA) was injected into the microchannels and blown out after 5 min for surface modification.
The encapsulated droplets were imaged in two periods, for training and inference purposes, respectively. The first period lasted two months, during which 830 images with a resolution of 640 × 480 were collected (a mean of 191.55 droplets per image) for algorithm training. All images were randomly divided into training, validation, and test sets at a ratio of 4:1:1. Four types of samples were classified: background, empty, single, and multiple; the background samples were extracted by randomly sampling the background of the droplet images. Only the three droplet-level labels were provided for the training stage. To quantify the localization performance of our approach, we pinpointed the center of each cell (diameter 3–10 μm) in the multicell test dataset; these annotations were not used in the training procedure. The number of samples in each dataset is shown in
Table 1.
Since the population of empty droplets is far larger than the population of single and multiple encapsulations, a random sampling subset of the empty samples was adopted in the training, validation, and testing. The number of background samples was approximately equal to the sum population of the other three samples, while the former and the latter were used as the negative and positive samples for the binary classification branch, respectively.
The second period followed the first and lasted six months; its purpose was to verify the generalization performance of the proposed algorithm across tasks. Ninety-three groups of cell-encapsulated droplet generation experiments were performed for inference, each under different parameters, including the microchip structure (type A/B/C), the flow rates of the dispersed and continuous phases (Q1 and Q2), and the average number of cells per droplet (CPD, λ), to produce different image distributions. More than 1800 images were collected from the three microfluidic geometries (more than five chips were tested per geometry), each containing 100~250 droplets.
3.2. Convolutional Neural Network-Based Imaging Recognition
We propose a CNN-based recognition algorithm that evaluates droplet quality (size and distribution) and further recognizes the encapsulated cells (number and position) in two stages. In the first stage, ASTM is proposed to heuristically generate droplet proposals from the binary foreground of highly adherent droplet images segmented by the Otsu algorithm, as shown in
Figure 2b. Specifically, let a circular binary template be denoted by $T_r$, with initial diameter $r$. The matching response $D(x, y)$ in the foreground can be computed by
$$D(x, y) = \frac{\sum_{(u, v) \in T_r} B(x + u, y + v)}{\sum_{(u, v) \in T_r} 1},$$
where $B$ denotes the binary foreground image. $D(x, y)$ is essentially the ratio of the foreground area inside the template to the full template area. Consequently, its maximum corresponds to the largest foreground area covered by the template, computed as $D_{\max} = \max_{x, y} D(x, y)$. Droplet proposals are generated by a greedy search over all local maxima.
Nevertheless, the droplet diameter varies due to unpredicted fusion or flow disturbances. A template with a small $r$ can locate small droplets but yields an inaccurate $D(x, y)$ for large droplets, and vice versa; an adaptive scale template is therefore needed to find all droplets with different diameters. Pixels with $D(x, y)$ higher than a predefined threshold $\sigma$ are marked as the centers of droplet proposals, with the current template scale suggesting their diameters. In contrast, if all $D(x, y)$ are less than $\sigma$, the template scale is adaptively shrunk, e.g., by a multiplicative update $r \leftarrow \alpha r$ with $0 < \alpha < 1$, and the matching is repeated at the smaller scale.
Since overlapping bounding circles might occur in one true droplet, non-maximum suppression (NMS) [
36] is employed to remove redundant circles, as shown in
Supplementary Figure S3.
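A minimal numpy sketch of the ASTM proposal step on a synthetic foreground may clarify the procedure. The brute-force response computation, the shrink ratio, and the minimum radius are illustrative assumptions rather than the paper's exact settings, and the NMS step is omitted for brevity:

```python
import numpy as np

def circular_mask(r):
    """Binary disk template of diameter 2*r + 1."""
    yy, xx = np.mgrid[-r:r + 1, -r:r + 1]
    return (xx**2 + yy**2 <= r**2).astype(float)

def match_response(fg, r):
    """D(x, y): fraction of the disk template covered by foreground,
    evaluated wherever the template fits entirely inside fg."""
    t = circular_mask(r)
    area, k = t.sum(), 2 * r + 1
    h, w = fg.shape
    D = np.zeros((h - k + 1, w - k + 1))
    for y in range(D.shape[0]):
        for x in range(D.shape[1]):
            D[y, x] = (fg[y:y + k, x:x + k] * t).sum() / area
    return D

def astm_proposals(fg, r0, sigma=0.98, shrink=0.9, r_min=3):
    """Adaptive-scale matching: try the largest template first; if no
    response exceeds sigma, shrink the scale and retry.
    `shrink` and `r_min` are illustrative parameters."""
    proposals, r = [], r0
    while r >= r_min:
        D = match_response(fg, r)
        ys, xs = np.where(D >= sigma)
        for y, x in zip(ys, xs):
            proposals.append((int(x) + r, int(y) + r, r))  # center, radius
        if len(ys):
            break
        r = int(r * shrink)
    return proposals

# synthetic foreground: a single filled disk of radius 10 at (15, 15)
img = np.zeros((40, 40))
img[5:26, 5:26] = circular_mask(10)
print(astm_proposals(img, r0=10))  # one proposal at center (15, 15), r = 10
```

In practice the response can be computed far more efficiently as a convolution of the foreground with the disk kernel; the nested loop above is kept only for readability.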
In the second stage, the cell-encapsulated droplets were detected using droplet proposals. We developed the WSCNet to estimate the number of cells and predict their positions. To avoid tedious and manual cell-level annotation, only three droplet-level labels, including empty, single-cell, and multicell encapsulation (0, 1, >1), are adopted. The WSCNet consists of classification and counting branches, as shown in
Figure 2f. The former serves as a filter to remove false positives from previously generated proposals. Similar to other counting tasks [
27,
28], the output of the latter branch is a single-channel density map, and its integral and local maxima may indicate the number and location of cells, respectively. Cross entropy was adopted by the classification branch as the loss function
to provide a predicted label (droplet or false positive). The counting branch may employ the mean square error (MSE) between the label and the prediction as a loss function:
where
y suggests the supervision, i.e., the true counting label, and
and
denote the output density map and its counting prediction, respectively. Considering that the multicell encapsulation contains at least 2 cells, we quantify its label as 2 and truncate its counting prediction to 2, which can be formulated by:
where
indicates the integral of the density map obtained by global sum pooling and
γ represents a small constant that provides a gradient. A regularization performed on each density value is added to Equation (6) to avoid overestimating the counting prediction caused by the truncation:
where
represents the max value in the density map obtained by global max pooling. The loss function Equation (6) for the counting branch can be rewritten as follows:
Finally, the loss function of the whole network containing two branches is given with a weight of
ω:
The classification branch provides a predicted label, and the counting branch outputs a density map with the same resolution as the input image. The density map is valid only when the predicted label is a droplet.
suggests the number of encapsulated cells, and the first
maxima in the density map indicate the cell location. According to the predicted cell numbers, it is easy to reclassify the droplets into three categories (empty, single cell, and multicell) for comparison with other classification-based approaches:
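To make the counting loss concrete, the following numpy sketch implements a forward pass. The exact truncation and regularization forms are a plausible reading of the description above, not the paper's verbatim definitions, and the `gamma` value is the one quoted later in the implementation details:

```python
import numpy as np

def counting_loss(density_map, y, gamma=0.001):
    """Forward pass of the truncated counting loss (sketch).
    density_map : predicted per-pixel cell density
    y           : droplet-level label (0, 1, or 2 for 'multicell')"""
    s = density_map.sum()   # global sum pooling -> count estimate
    p = density_map.max()   # global max pooling  -> peak density
    # truncate the count at 2, keeping a small gradient via gamma
    s_trunc = s if s <= 2 else 2 + gamma * (s - 2)
    mse = (s_trunc - y) ** 2
    reg = max(0.0, p - 1.0)  # discourage over-peaked density values
    return mse + reg

# an empty droplet predicted correctly incurs zero loss
empty = np.zeros((32, 32))
print(counting_loss(empty, y=0))  # 0.0

# a multicell droplet predicted as 5 cells is only mildly penalized,
# since any count >= 2 is consistent with the weak label
dense = np.full((32, 32), 5.0 / 1024)
print(counting_loss(dense, y=2))
```

The truncation is what lets the network learn from the imprecise "at least two cells" label: any density map integrating to 2 or more matches the multicell supervision almost equally well.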
3.3. Network Implementation and Evaluation Metrics
We set the matching response threshold
σ, the small constant
γ, and the weight
ω to 0.98, 0.001, and 1, respectively. ReLU is adopted as the activation function throughout the network. The batch size is set to 1024, and the learning rate is initialized at $10^{-4}$ and adjusted according to the validation loss. Inspired by the intersection over union (IoU) of bounding boxes in object detection, we adopt a bounding-circle IoU to identify true positive predictions, as shown in
Figure S4, calculated by
$$\mathrm{IoU} = \frac{\mathrm{area}(C_{gt} \cap C_{pre})}{\mathrm{area}(C_{gt} \cup C_{pre})},$$
where $C_{gt}$ and $C_{pre}$ represent the true and predicted bounding circles of a droplet, respectively. A droplet proposal with an IoU higher than a threshold $\theta$ is counted as a true positive prediction.
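The bounding-circle IoU has a closed form via the circle-circle intersection (lens) area. A self-contained sketch:

```python
import math

def circle_iou(c1, c2):
    """IoU of two circles, each given as (x, y, r)."""
    x1, y1, r1 = c1
    x2, y2, r2 = c2
    d = math.hypot(x2 - x1, y2 - y1)
    if d >= r1 + r2:                 # disjoint circles
        inter = 0.0
    elif d <= abs(r1 - r2):          # one circle contains the other
        inter = math.pi * min(r1, r2) ** 2
    else:                            # lens-shaped overlap
        a1 = r1**2 * math.acos((d**2 + r1**2 - r2**2) / (2 * d * r1))
        a2 = r2**2 * math.acos((d**2 + r2**2 - r1**2) / (2 * d * r2))
        tri = 0.5 * math.sqrt((-d + r1 + r2) * (d + r1 - r2)
                              * (d - r1 + r2) * (d + r1 + r2))
        inter = a1 + a2 - tri
    union = math.pi * (r1**2 + r2**2) - inter
    return inter / union

print(circle_iou((0, 0, 10), (0, 0, 10)))   # identical circles -> 1.0
print(circle_iou((0, 0, 10), (25, 0, 10)))  # disjoint circles  -> 0.0
```

Unlike box IoU, no corner arithmetic is needed: the center distance and the two radii fully determine the overlap.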
We evaluate our algorithm from three angles. First, recall and precision are adopted to evaluate the performance of the ASTM in generating droplet proposals. Second, the F1 score, the harmonic mean of precision and recall, is employed to quantify the performance of the WSCNet in recognizing the number of encapsulated cells; the F1 score, model size, and training time are also used for comparison with classification-based approaches. Third, to quantify the localization performance of our approach, a circular mask with an
x-pixel radius centered at each annotated cell centroid is regarded as the valid area. A location prediction that is nearest to a cell centroid and falls within its valid area is counted as a true positive, and the same metrics are adopted to evaluate the localization performance of the WSCNet.
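The true-positive criterion for localization can be sketched as a greedy one-to-one matching between predictions and annotated centroids. The valid-area radius and the matching order here are illustrative assumptions:

```python
import math

def match_locations(preds, gts, radius=5.0):
    """Greedy one-to-one matching of predicted cell locations to
    annotated centroids; `radius` is the valid-area radius (an
    illustrative value). Returns (TP, FP, FN)."""
    unmatched = list(gts)
    tp = 0
    for p in preds:
        if not unmatched:
            break
        # nearest remaining ground-truth centroid
        g = min(unmatched, key=lambda q: math.dist(p, q))
        if math.dist(p, g) <= radius:
            tp += 1
            unmatched.remove(g)   # each centroid matched at most once
    fp = len(preds) - tp
    fn = len(gts) - tp
    return tp, fp, fn

preds = [(10, 10), (30, 31), (80, 80)]
gts = [(11, 10), (30, 30)]
print(match_locations(preds, gts))  # (2, 1, 0)
```

Precision and recall for localization then follow directly as TP/(TP+FP) and TP/(TP+FN).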
In addition, since our algorithm provides the exact cell population of each droplet, the distribution of cell numbers across droplets can be constructed. We therefore selectively labeled the exact cell population of the multicell encapsulations to evaluate the predicted numbers; these labels were not used in the training procedure. The counting performance is measured by the mean relative error (MRE).
3.4. Experimental Setup for Droplet Generation and Cell Encapsulation
The microfluidic chips were placed on the stage of an inverted microscope for observation and recording. Multichannel syringe pumps were used to inject the dispersed and continuous phases into the corresponding inlets. In most generation experiments, a mineral oil mixture was used as the continuous phase: 3% (w/w) EM90 (ABIL, Evonik, Essen, Germany), serving as a surfactant to decrease the surface tension, and 0.1% (v/v) Triton X-100 dissolved in light mineral oil (M5310, Sigma–Aldrich, St. Louis, MO, USA). In some experiments, Novec 7500 (3M Inc., St. Paul, MN, USA) with 1% dSURF surfactant (DR-RE-SU, Fluigent, Le Kremlin-Bicêtre, France) was used as the continuous phase.
A yeast cell suspension in PBS premix was used as the dispersed phase, consisting of 40% (v/v) OptiPrep medium (D1556, Sigma–Aldrich) or 30~50% glycerol (356350, Sigma–Aldrich) in PBS to prevent cell sedimentation. Yeast (Saccharomyces cerevisiae) was cultured by standard resuscitation at 28 °C in YPD medium (20 g of glucose, 10 g of yeast extract, and 10 g of peptone dissolved in 1 L of distilled water). After thorough mixing, the diluted cells were incubated at room temperature, and the cell fractions were washed twice by resuspension in PBS and discarding of the supernatant after centrifugation at 1000 rpm for 5 min to remove residual debris and doublets. Before sample mixing, 10 µL of the cell solution was stained for activity analysis and density calculation ($D_{cell}$).
In all experiments, droplets were generated within the microfluidic chips by injecting the dispersed and continuous phases at designated flow rates. The cell-encapsulated droplets were collected into EP tubes, and one drop was added onto glass slides pasted with rectangular enclosures prepacked with oil. Microscopic images of the droplets were acquired with the USB camera, while the mean radius ($R_{cell}$) and coefficient of variation (CV) were automatically calculated with the proposed algorithm. The average number of cells per microdroplet (CPD, λ) was then quantified as the product of the cell density $D_{cell}$ and the droplet volume $\frac{4}{3}\pi R_{cell}^3$. The cell counts of the encapsulated droplets were automatically analyzed and labeled by the proposed method. The statistical results were fitted to Poisson distributions, with error analysis conducted by chi-squared and residual sum of squares (RSS) tests.
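The CPD calculation can be illustrated with a worked example; the cell density and droplet radius below are hypothetical values chosen only to show the unit conversion:

```python
import math

def cpd(cell_density_per_mL, radius_um):
    """Average cells per droplet: lambda = D_cell * droplet volume.
    The droplet volume (4/3)*pi*R^3 is converted from um^3 to mL
    (1 mL = 1e12 um^3)."""
    vol_um3 = 4.0 / 3.0 * math.pi * radius_um**3
    return cell_density_per_mL * vol_um3 / 1e12

# hypothetical example: 1e7 cells/mL and a 20-um droplet radius
lam = cpd(1e7, 20)
p_single = lam * math.exp(-lam)  # Poisson P(k = 1) at this loading
print(round(lam, 3), round(p_single, 3))
```

Note that λ scales with the cube of the droplet radius, so small errors in the measured $R_{cell}$ propagate strongly into the predicted encapsulation statistics; this is one reason automated, accurate size monitoring matters.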
5. Conclusions and Perspectives
In this study, we have illustrated passive cell encapsulation in microfluidic droplets as well as the principles and performance of the accompanying image recognition algorithms. A novel weakly supervised algorithm, WSCNet, was designed to recognize cell-encapsulated droplets in highly adherent droplet images and was systematically verified in different experiments. Compared with classification-based approaches, our method can not only distinguish droplets encapsulating different numbers of cells but also locate the cells without any supervised location information. Given only weak droplet-level labels, the WSCNet learns cell features from the difference between empty and single-cell droplets and then applies this knowledge to multicell droplets. Because a multicell encapsulation contains at least two cells, the WSCNet learns to count the cell population from both the precise labels (empty and single-cell) and the imprecise multicell label. In addition, the local maxima of the density map provide the location of each cell. However, if the cells in a multicell encapsulation are highly crowded, resulting in deformed morphological features, the network may fail to count the cells and locate each of them.
In addition, the architecture of the proposed counting network contains only seven convolutional layers, making it compact and efficient. In the future, inertial cell ordering in the dispersed phase should be further investigated to improve the single-cell rate and surpass the Poisson limit [
11,
12]. We will attempt to extend the proposed method to real-time recognition of video frames. More experiments on cells with different morphologies and the differentiation and enumeration of cell subpopulations should be carried out to make our system available for clinical applications. Considering the integration of a microfluidic chip and imaging algorithm, this system is suitable for applications where rapid analysis of single-cell encapsulation is demanded, such as single-cell sequencing and droplet-based analysis.