Fast Analysis of Time-Domain Fluorescence Lifetime Imaging via Extreme Learning Machine

We present a fast and accurate analytical method for fluorescence lifetime imaging microscopy (FLIM), using the extreme learning machine (ELM). We used extensive metrics to evaluate ELM and existing algorithms. First, we compared these algorithms using synthetic datasets. The results indicate that ELM can obtain higher fidelity, even in low-photon conditions. Afterwards, we used ELM to retrieve lifetime components from human prostate cancer cells loaded with gold nanosensors, showing that ELM also outperforms the iterative fitting and non-fitting algorithms. By comparing ELM with a computational efficient neural network, ELM achieves comparable accuracy with less training and inference time. As there is no back-propagation process for ELM during the training phase, the training speed is much higher than existing neural network approaches. The proposed strategy is promising for edge computing with online training.


Introduction
Fluorescence lifetime imaging microscopy (FLIM) has attracted growing interest in biomedical applications, such as surgical procedures [1], tumor detection [2,3], cancer diagnosis [4], and the study of protein interaction networks using Förster resonance energy transfer (FRET) techniques [5]. It can quantitatively investigate local microenvironments of fluorophores by measuring fluorophores' lifetimes. For example, FLIM can observe dynamic metabolic changes in living cells by measuring autofluorescence lifetimes of NAD and NADP. This is utilized to mediate cell fate for diabetes and neurodegeneration research [6]. Fluorescence lifetime is the average time a fluorophore stays excited before releasing fluorescence. The process can be analyzed in the time or frequency domain. Timecorrelated single-photon counting (TCSPC) techniques [7] are more widely used [8][9][10] due to their superior signal-to-noise ratio (SNR) and precise temporal resolution (in picoseconds) compared with frequency-domain approaches. During data acquisitions, emitted photons are detected by a single-photon detector, wherein a high-precision stopwatch circuit records timestamps of detected photons. The stopwatch circuit generates an exponential histogram, from which the fluorescence lifetime is extracted.
Estimating lifetime parameters is an ill-posed problem with high computational complexity. Numerous algorithms have been developed to quantify lifetimes and relevant parameters. Iterative fitting and optimization approaches were reported to deduce fluorescence lifetimes. A convex optimization method [11] was utilized for high-resolution FLIM, where the accuracy is related to fine-tuned hyperparameters in the cost function. An F-value-based optimization algorithm [12] was used to minimize signal distortion introduced by pile-up effects and the dead time of single-photon detectors. A Laguerre expansion method [13][14][15] was reported to speed up least-squares deconvolutions.
On the other hand, non-iterative fitting methods were introduced to reduce computing complexity whilst maintaining high accuracy. A new nonparametric empirical Bayesian framework [16] was adopted for lifetime analysis based on a statistical model, where the expectation-maximization algorithm was employed to solve the optimization problem. A hardware-friendly fitting-free center of mass (CMM) [17][18][19] algorithm was proposed to deliver fast analysis and has been applied to a flow-cytometry system [20,21]. Integral equation methods (IEM) [22] were also implemented in FPGA devices to provide realtime analysis. Direction-of-arrivals estimation [23] was adopted to deliver a non-iterative and model-free lifetime reconstruction strategy, requiring a few time bins. A histogram cluster method [24] divides histograms into clusters instead of processing histograms pixel-by-pixel, enhancing the analysis speed. However, challenges remain. Firstly, most of these algorithms need a long acquisition time to guarantee the reconstruction fidelity, likely causing photobleaching. A fast algorithm suitable for low photon counts conditions is, therefore, desirable. Secondly, iterative or probabilistic methods are not portable to hardware, impeding the on-chip computing of TCSPC systems.
Artificial neural networks (ANNs) have proved promising for FLIM analysis. FLI-NET [25] used a 3-D convolutional neural network (CNN) to analyze bi-exponential decays via a branched architecture. Its compressed-sensing [26] version used a single-pixel detector and a digital micromirror device to reconstruct intensity and lifetime images. A 1-D CNN architecture [27] was introduced to reduce the computational load for multi-exponential analysis, using a similar branched structure. A multi-layer perceptron (MLP) method [28] was proposed for mono-exponential analysis with high spatial-resolution SPAD arrays. Another MLP [29] was reported combining maximum likelihood estimation algorithms and using fully connected layers to resolve bi-exponential decays. Moreover, another ANN technique [30] was introduced to fuse high-resolution fluorescence intensity and low-resolution lifetime images for wield-field FLIM systems. However, the training and inference of the ANNs are slow. Even with powerful GPUs, it usually takes a long training time (hours) to train a network. It is also time consuming to retrain a model when the lifetime range is altered.
Pixel-wise lifetime recovery has been widely used, since it is consistent with the sensor readout and more computationally economical than 3-D algorithms. The extreme learning machine (ELM) [31] is an efficient algorithm to process 1-D signals for biological applications, such as electrocardiogram (ECG) and electroencephalogram (EEG) signals [32]. Inspired by related literature, we used ELM to reconstruct lifetimes from 1-D histograms using multi-variable regression. Contributions of the ELM-based lifetime inference approach are that: (1) It is data-driven without a back-propagation learning strategy. It achieves less training time than existing ANN methods, paving the way for fast online training on embedded hardware for FLIM. (2) It can resolve mono-and bi-exponential models widely employed in practical experiments, wherein the amplitude and intensity average lifetimes were investigated. (3) Reconstructed lifetime parameters from ELM are more accurate than fitting and non-fitting algorithms regarding synthetic and experimental data under different photon-counting conditions whilst maintaining fast computing speed.
This paper presents a theory applying ELM to FLIM (Section 2), algorithms' comparisons regarding synthetic data with low-photon-count scenarios (Section 3), and algorithms' comparisons regarding an incubated living cell under different levels of photon counts (Section 4).
where ϕ(·) is the activation function, and usually, a sigmoid function can achieve a relatively good result, and w l = [w l1 , w l2 , . . . , w lm ] T and b l = [b 1 , b 2 , . . . , b L ] T , l = 1, . . . , L. are randomly assigned weights and biases between the input nodes and the hidden layer before training. Say β l is the weighting connecting the lth hidden layer and output nodes, defined as: To learn the parameter matrix of β with a dimension of L × n, the ridge loss function is widely adopted as: argmin where A is the matrix composed of the activation functions with dimensions H × L; Y is a matrix with dimensions H × n containing ground truth (GT) data: Through solving the loss function, we can obtain the matrix β by: where I is an identity matrix with dimensions L × L, the hyperparameter λ helps obtain a reliable result when the matrixA T A+ λI is not full rank.

TCSPC Model for FLIM
Fluorescence emission can be modeled with mono-or multi-exponential decay functions and a bi-exponential model can approximately deduce a signal following a multiexponential decay. Therefore, we focus on lifetime analysis from mono-and bi-exponential models in this work. Fluorescence functions can be adopted to formulate measured histograms containing multiple lifetime components and corresponding amplitude fractions. Therefore, for each pixel, the measured decay consisting of K lifetime components is formulated as: where the IRF(·) is the system's instrument response function, P is proportional to the fluorescence intensity, τ k is the kth lifetime component, α k is the kth amplitude fraction, and n(t) includes Poisson noise [33] and dark count rate of the sensor, t = [1, 2, . . . , T] is the time-bin index of the TCSPC module. As photon arrivals follow the Poisson distribution, with C cycles of laser excitation, the ultimate distribution in one pixel can be derived as: Based on this theoretical TCSPC model, we can generate training datasets for ELM. Synthetic curves correspond to column vectors in the input matrix x. Apart from multiexponential decays, we define the amplitude-weighted lifetime τ A and intensity-weighted average lifetime τ I to evaluate ELM.

Training Data Preparation
The training datasets contain 20,000 synthetic histograms, and ground truth (GT) lifetime parameters were generated to train the ELM network. Synthetic decays comply with Equation (6) and the IRF curve is modelled via a Gaussian curve: where FWHM (0.1673 ns) is compatible with the two-photon FLIM system for FLIM measurements, t 0 (14th) is the index of the peak, h (0.039 ns) is the bin width of the TCSPC system. Both mono-and bi-exponential decay models were generated for performance evaluation. Lifetime constants t were set in [0.1, 5] ns for the mono-exponential decay model and τ 1 , τ 2 are set in [0.1, 1], [1,3] ns for bi-exponential models. The structure of ELM is depicted in Figure 1. Suppose the input vector is a pixel-wise histogram measured by a TCSPC system containing 256 time bins in the inference phase. The number of output nodes depends on the number of lifetime components we defined in synthetic datasets. For instance, if the measured data consists of bi-exponential decay model, the output layer should be configured as three nodes, namely, τ 1 , τ 2 , and α. We can easily obtain average lifetimes from Equations (8) and (9). All the histograms from the sensor are fed into the network sequentially; lifetime parameters can be obtained from output nodes pixel by pixel. The number of nodes in the hidden layer can be flexibly adjusted to achieve a trade-off between accuracy and computing time consumption.

Synthetic Data Analysis
τA and τI are used to estimate energy transfer for FRET or indicate fluorescence quenching behaviours [34]. This section compares NLSF, BCMM, and ELM to retrieve τA from bi-exponential decays. Likewise, we also compared NLSF, CMM, and ELM to reconstruct τI. Besides, ELM was compared with existing ANNs for FLIM in terms of (1) the network scale and (2) training time. Multiple widely used metrics (F-value, SSIM, R 2 , MSE) were adopted for performance evaluations.

Comparisons of Individual Lifetime Components
As NLSF was usually adopted in previous studies [25,27,35], we compared the inference performances of ELM and deconvolution-based NLSF (implemented with lsqcurvefit(·) function in MATLAB using iterative Levenberg-Marquardt algorithm) in Figure 2. As such, 2000 simulated testing datasets were generated for recovery for single and double lifetimes. Here, we define the absolute error Δg = |g − gest|, where g = τ1, τ2, α, τA and gest is the estimated g. ΔgELM and ΔgNLSF are the absolute errors for ELM and NLSF. Figure 2a,b show the Δg of ELM and NLSF for mono-exponential decays, respectively. Δg decreases as the peak intensity increases, and ΔgELM is smaller than ΔgNLSF. Likewise, Figure 2c,d indicate Δg plots for g = τ1, τ2, and α, where ΔgELM is smaller than ΔgNLSF. Similarly, Figure 2e,f indicate ELM obtained a much more accurate τA than NLSF. Therefore, ELM can perform better than NLSF in mono-and bi-exponential decays. Additionally, as shown in Figure  3, we visually inspected estimated τ1, τ2, and α, based on pre-defined variables in synthetic 2-D images. We used the SSIM to evaluate reconstructed images in Figure 3a,b. The 2-D lifetime images were reconstructed from a 3-D synthetic data cube, composed of either mono-or bi-exponential decays (256 × 256 × 256, representing spatial and temporal dimensions). All the GT lifetime parameters (τ and α) are pre-defined in Equation (1). The 2-D lifetime images are recovered pixel by pixel from noisy synthetic 3-D data cubes. Figure 3a shows reconstructed 2-D images from mono-exponential decays with GT τ varying from 0.1 to 5 ns. Likely, Figure 3b shows estimated τ1, τ2, and α bi-exponential decays. Results obtained from ELM are more accurate than NLSF. Figure 3c shows the phasor plots of GT distributions of mono-( Figure 3a) and bi-exponential (Figure 3b) decays. From the phasor theory [36], cluster points of mono-exponential decays should locate on the semi-circle. For bi-exponential decays, two-lifetime components are indicated by the intersections of a fitted line and the semi-circle. We utilized R 2 defined as: Figure 1. ELM is used for lifetime analysis. The input data are a 1-D pixel-wise histogram from the raw point cloud that contains 256 time bins. The histogram is fed into a single-hidden-layer ELM, and lifetime parameters (τ 1 , τ 2 , and α) can be obtained from output nodes.

Synthetic Data Analysis
τ A and τ I are used to estimate energy transfer for FRET or indicate fluorescence quenching behaviours [34]. This section compares NLSF, BCMM, and ELM to retrieve τ A from bi-exponential decays. Likewise, we also compared NLSF, CMM, and ELM to reconstruct τ I . Besides, ELM was compared with existing ANNs for FLIM in terms of (1) the network scale and (2) training time. Multiple widely used metrics (F-value, SSIM, R 2 , MSE) were adopted for performance evaluations.

Comparisons of Individual Lifetime Components
As NLSF was usually adopted in previous studies [25,27,35], we compared the inference performances of ELM and deconvolution-based NLSF (implemented with lsqcurvefit(·) function in MATLAB using iterative Levenberg-Marquardt algorithm) in Figure 2. As such, 2000 simulated testing datasets were generated for recovery for single and double lifetimes. Here, we define the absolute error ∆g = |g − g est |, where g = τ 1 , τ 2 , α, τ A and g est is the estimated g. ∆g ELM and ∆g NLSF are the absolute errors for ELM and NLSF. Figure 2a,b show the ∆g of ELM and NLSF for mono-exponential decays, respectively. ∆g decreases as the peak intensity increases, and ∆g ELM is smaller than ∆g NLSF . Likewise, Figure 2c,d indicate ∆g plots for g = τ 1 , τ 2 , and α, where ∆g ELM is smaller than ∆g NLSF . Similarly, Figure 2e,f indicate ELM obtained a much more accurate τ A than NLSF. Therefore, ELM can perform better than NLSF in mono-and bi-exponential decays. Additionally, as shown in Figure 3, we visually inspected estimated τ 1 , τ 2 , and α, based on pre-defined variables in synthetic 2-D images. We used the SSIM to evaluate reconstructed images in Figure 3a,b. The 2-D lifetime images were reconstructed from a 3-D synthetic data cube, composed of either mono-or bi-exponential decays (256 × 256 × 256, representing spatial and temporal dimensions). All the GT lifetime parameters (τ and α) are pre-defined in Equation (1). The 2-D lifetime images are recovered pixel by pixel from noisy synthetic 3-D data cubes. Figure 3a shows reconstructed 2-D images from mono-exponential decays with GT τ varying from 0.1 to 5 ns. Likely, Figure 3b shows estimated τ 1 , τ 2 , and α bi-exponential decays. Results obtained from ELM are more accurate than NLSF. Figure 3c shows the phasor plots of GT distributions of mono- (Figure 3a) and bi-exponential (Figure 3b) decays. From the phasor theory [36], cluster points of mono-exponential decays should locate on the semi-circle. For bi-exponential decays, two-lifetime components are indicated by the intersections of a fitted line and the semi-circle. We utilized R 2 defined as: to evaluate the estimation consistency, where τ i A is the predicted parameter, τ i A_GT is the GT parameter, τ A_Ave is the average of GT parameters, P is the number of simulated decay curves. As shown in Figure 3d, the scatter plots show ELM is closer to GT, and NLSF shows more outliers. We further evaluated ELM and NLSF using the F-value defined as Equation (12) [37] with synthetic mono-and bi-exponential decays.
F > 1 and a lower F means higher precision, where I is the detected photon count, δx is the standard deviation of the estimated lifetime parameter, and x is the GT parameter. We generated 200 synthetic decays for given ranges of lifetimes and peak intensities in Figure  4. Figure 4a shows the F-value of mono-exponential decays versus the lifetime in the range ~ [0. 1,5] ns. Figure 4b shows the F-value of bi-exponential decays versus τ1, τ2, and α in [0.1, 1] ns, [1,3] ns, and [0, 1], respectively. We assigned 200 decays with a total photon count (<2000) per synthetic histogram for both scenarios. Both figures show that ELM obtained a smaller F than NLSF, meaning ELM can achieve better precision. Furthermore, we defined the bias Δτ/τ to evaluate ELM and NLSF versus the photon count. τ was set to 3.0 ns for mono-exponential decays. τ1, τ2, and α were set to 0.3 ns, 3.0 ns, and 0.5 for bi-exponential decays. Figure 4c shows that the bias of NLSF increases as the photon count increases, which is worse than ELM. Figure 4d shows that the bias of ELM is smaller than NLSF, and ELM is more robust to varying photon counts. Moreover, NLSF is also sensitive to initial conditions of lifetime parameters [34]. The bias decreases when the initial conditions are closed to GT values, meaning that users need to have prior knowledge about the parameters to be extracted.  F > 1 and a lower F means higher precision, where I is the detected photon count, δx is the standard deviation of the estimated lifetime parameter, and x is the GT parameter. We generated 200 synthetic decays for given ranges of lifetimes and peak intensities in Figure 4. Figure 4a shows the F-value of mono-exponential decays versus the lifetime in the rangẽ [0. 1,5] ns. Figure 4b shows the F-value of bi-exponential decays versus τ 1 , τ 2 , and α in [0.1, 1] ns, [1,3] ns, and [0, 1], respectively. We assigned 200 decays with a total photon count (<2000) per synthetic histogram for both scenarios. Both figures show that ELM obtained a smaller F than NLSF, meaning ELM can achieve better precision. Furthermore, we defined the bias ∆τ/τ to evaluate ELM and NLSF versus the photon count. τ was set to 3.0 ns for mono-exponential decays. τ 1 , τ 2 , and α were set to 0.3 ns, 3.0 ns, and 0.5 for bi-exponential decays. Figure 4c shows that the bias of NLSF increases as the photon count increases, which is worse than ELM. Figure 4d shows that the bias of ELM is smaller than NLSF, and ELM is more robust to varying photon counts. Moreover, NLSF is also sensitive to initial conditions of lifetime parameters [34]. The bias decreases when the initial conditions are closed to GT values, meaning that users need to have prior knowledge about the parameters to be extracted.

Comparisons of τ A
We evaluated ELM in estimating τ A in various count conditions. As shown in Figure 5a, we set three regions at three count levels, changing τ A from top to bottom. We refer to the three regions as low, middle, and high counts hereafter. Figure 5b depicts the GT τ A . From Figure 5c,d, ELM shows a more accurate τ A image than NLSF, with ELM producing a smaller MSE than NLSF in each region. We also included the non-fitting BCMM [18] for the comparison due to its fast speed and capacity to resolve bi-exponential decays. From Figure 5e, BCMM is not robust in low counts, outperforming NLSF in middle and high regions. Further, ELM obtained better results than BCMM. BCMM is less photon efficient, and it is sensitive to the measurement window T (T should be larger than 5 × τ 2 , otherwise bias correction is needed [18]).

Comparisons of τA
We evaluated ELM in estimating τA in various count conditions. As shown in Figure  5a, we set three regions at three count levels, changing τA from top to bottom. We refer to the three regions as low, middle, and high counts hereafter. Figure 5b depicts the GT τA. From Figure 5c,d, ELM shows a more accurate τA image than NLSF, with ELM producing a smaller MSE than NLSF in each region. We also included the non-fitting BCMM [18] for the comparison due to its fast speed and capacity to resolve bi-exponential decays. From Figure 5e, BCMM is not robust in low counts, outperforming NLSF in middle and high regions. Further, ELM obtained better results than BCMM. BCMM is less photon efficient, and it is sensitive to the measurement window T (T should be larger than 5 × τ2, otherwise bias correction is needed [18]). Table 1 compares ELM with NLSF regarding the time consumption for inference (forward-propagation) tasks in Figure 3a,b. NLSF resolving mono-exponential decays consumes more time than for bi-exponential decay models. In contrast, the analysis time of ELM is not affected by the number of lifetime components and it is substantially less than NLSF.

Comparisons of τI
CMM [17] achieves the fastest speed for intensity average lifetime analysis. We further compared CMM with ELM for τI reconstruction. As shown in Figure 6, the result from ELM is better than NLSF but slightly worse than CMM. However, CMM is sensitive to and biased by the measurement window if bias correction is not included. Although CMM obtained a smaller overall MSE, the bias occurs as τI becomes longer. It agrees with the conclusion from the previous work [34], indicating that CMM causes misleading inference when there are multi-lifetime species in the field of view. Further, τI sometimes generates a shorter dynamic lifetime range than τ as τI cannot correctly distinguish clusters with different lifetimes, especially for strong FRET phenomena [5]. ELM and CMM can achieve shorter processing time than NLSF and BCMM, as shown in Table 1. In this case, although ELM is slightly slower than CMM, the consumed time varies with the number of nodes in the hidden layer. Figure 7a shows training errors indicated by mean square errors (MAE) versus different numbers of nodes in the hidden layer. Here, the number of the hidden layer is set to 500 for both mono-and bi-exponential models, as there was no apparent MAE decrease, and a moderate processing time was achieved, as shown in Figure 7b. Moreover, we compared ELM with relevant ANNs for FLIM. Since ELM uses the Moore-Penrose matrix inversion strategy to learn parameters instead of back-propagation, it is much faster. As shown in Table 2, although ELM has more parameters than 1-D  Table 1 compares ELM with NLSF regarding the time consumption for inference (forward-propagation) tasks in Figure 3a,b. NLSF resolving mono-exponential decays consumes more time than for bi-exponential decay models. In contrast, the analysis time of ELM is not affected by the number of lifetime components and it is substantially less than NLSF.

Comparisons of τ I
CMM [17] achieves the fastest speed for intensity average lifetime analysis. We further compared CMM with ELM for τ I reconstruction. As shown in Figure 6, the result from ELM is better than NLSF but slightly worse than CMM. However, CMM is sensitive to and biased by the measurement window if bias correction is not included. Although CMM obtained a smaller overall MSE, the bias occurs as τ I becomes longer. It agrees with the conclusion from the previous work [34], indicating that CMM causes misleading inference when there are multi-lifetime species in the field of view. Further, τ I sometimes generates a shorter dynamic lifetime range than τ A as τ I cannot correctly distinguish clusters with different lifetimes, especially for strong FRET phenomena [5]. ELM and CMM can achieve shorter processing time than NLSF and BCMM, as shown in Table 1. In this case, although ELM is slightly slower than CMM, the consumed time varies with the number of nodes in the hidden layer. Figure 7a shows training errors indicated by mean square errors (MAE) versus different numbers of nodes in the hidden layer. Here, the number of the hidden layer is set to 500 for both mono-and bi-exponential models, as there was no apparent MAE decrease, and a moderate processing time was achieved, as shown in Figure 7b. Moreover, we compared ELM with relevant ANNs for FLIM. Since ELM uses the Moore-Penrose matrix inversion strategy to learn parameters instead of back-propagation, it is much faster. As shown in Table 2, although ELM has more parameters than 1-D CNN [27], the training time is much shorter than the other existing studies [25,[27][28][29]. Many CNN hyperparameters should be fine-tuned, and batch normalizations should be implemented to avoid gradient vanishing [38]. In contrast, ELM's architecture is much simpler, and we simply need to adjust the number of nodes in the hidden layer. Furthermore, the efficient training process enables online training and is suitable for embedded hardware implementations [39]. ELM is highly reconfigurable to provide a flexible solution to balance the trade-off between computing complexity and accuracy. The evaluations of ELM and NLSF were conducted in MATLAB R2016a, 64-bit CPU (Intel Core i5-4200H @ 2.80 GHz) with 8 GB memory. Notably, other studies in Table 2 used much more powerful GPU to train their models. Despite this, ELM still delivers the shortest training time.
Based on the analysis of synthetic datasets, ELM is more robust for analyzing monoand bi-exponential decays than traditional NLSF methods. We will evaluate ELM using realistic experiment data in the next section.

Experimental FLIM Data Analysis
To investigate the feasibility of ELM for experimental FLIM data, we utilized living prostate cancer cells incubated with functionalized gold nanorods (GNRs). A commercial two-photon FLIM system was used to acquire raw 3-D data cubes. This section compares ELM with 1D-CNN, NLSF, and BCMM. 4200H @ 2.80 GHz) with 8 GB memory. Notably, other studies in Table 2 used much more powerful GPU to train their models. Despite this, ELM still delivers the shortest training time.  [27] 48,675 7 ✓ 23 min MLP [28] 3,750,205 3  38 min MLP [29] 149,252 2 ✓ 4 h ✓ means the algorithm can resolve multiple-exponential decays,  means the algorithm cannot resolve multiple-exponential decays.
Based on the analysis of synthetic datasets, ELM is more robust for analyzing monoand bi-exponential decays than traditional NLSF methods. We will evaluate ELM using realistic experiment data in the next section.

Experimental FLIM Data Analysis
To investigate the feasibility of ELM for experimental FLIM data, we utilized living prostate cancer cells incubated with functionalized gold nanorods (GNRs). A commercial two-photon FLIM system was used to acquire raw 3-D data cubes. This section compares ELM with 1D-CNN, NLSF, and BCMM.   [27] 48,675 7 √ 23 min MLP [28] 3,750,205 3 Sensors 2022, 22, x FOR PEER REVIEW parameters than 1-D CNN [27], the train studies [25,[27][28][29]. Many CNN hyper normalizations should be implemented ELM's architecture is much simpler, and in the hidden layer. Furthermore, the effi is suitable for embedded hardware imple provide a flexible solution to balance th accuracy. The evaluations of ELM and N CPU (Intel Core i5-4200H @ 2.80 GHz) wi 2 used much more powerful GPU to tra the shortest training time. 1,084,045 1-D CNN [27] 48,675 MLP [28] 3,750,205 MLP [29] 149,252  means the algorithm can resolve multiple resolve multiple-exponential decays.


Based on the analysis of synthetic d and bi-exponential decays than tradition realistic experiment data in the next secti    [25,[27][28][29]. Many CNN hyperparameter normalizations should be implemented to avoid ELM's architecture is much simpler, and we simpl in the hidden layer. Furthermore, the efficient traini is suitable for embedded hardware implementation provide a flexible solution to balance the trade-of accuracy. The evaluations of ELM and NLSF were CPU (Intel Core i5-4200H @ 2.80 GHz) with 8 GB m 2 used much more powerful GPU to train their m the shortest training time. 1,084,045 7 1-D CNN [27] 48,675 7 MLP [28] 3,750,205 3 MLP [29] 149,252 2  means the algorithm can resolve multiple-exponenti resolve multiple-exponential decays.


Based on the analysis of synthetic datasets, EL and bi-exponential decays than traditional NLSF m realistic experiment data in the next section. Based on the analysis of synthetic datasets, ELM is more robust for analyzing monoand bi-exponential decays than traditional NLSF methods. We will evaluate ELM using realistic experiment data in the next section.

Experimental FLIM Data Analysis
To investigate the feasibility of ELM for experimental FLIM data, we utilized living prostate cancer cells incubated with functionalized gold nanorods (GNRs). A commercial two-photon FLIM system was used to acquire raw 3-D data cubes. This section compares ELM with 1D-CNN, NLSF, and BCMM.

Experimental Setup and Sample Preparation
We used the proposed ELM to analyze a living cellular sample, acquired by a twophoton FLIM system. To achieve an efficient imaging contrast, prostate cancer cells were treated with GNRs functionalized with Cy5 labeled ssDNA [40]. GNRs have tunable longitudinal surface plasmon resonance and enable the interactions between the strong electromagnetic field and activated fluorophore in biological samples [41,42]. Functionalizing GNRs with fluorophore-labelled DNA has been adopted to probe endocellular components [43,44], including microRNA detections for human breast cancer or monitoring the intracellular level of metal ions in human serums. Here, prostate cancer cells were incubated with nanoprobe for 6 h and washed three times with phosphate-buffered saline (PBS). Cells were blended with 4% paraformaldehyde for 15 min. After removing paraformaldehyde, cells were washed with distilled water three times. The two-photon FLIM platform consists of a confocal microscope (LSM 510, Carl Zeiss, Oberkochen, Germany) with 256 × 256 spatial resolution, where the scan module includes four individual PMTs. A TCSPC module (SPC-830, Becker & Hickl GmbH, Berlin, Germany) with 256 time bins and 39 picosecond timing resolution was mounted on the microscope. A tunable femtosecond Ti: sapphire laser (Chameleon, Coherent, Santa Clara, CA, USA) was configured with a repetition frequency 80 MHz and 850 nm wavelength to excite the sample. The emission light was collected using a 60× water-immersion objectives lens (numerical aperture = 1.0) and a 500-550 nm bandpass filter. One hundred scanning cycles were selected to prevent GNRs heating and obtain sufficient photons, where each cycle took three seconds.

Algorithm Evaluation
Due to the strong two-photon photoluminescence property of GNRs, high optical discernibility can be observed between the GNRs and cell tissues [45]. Figure 8a shows the grey-scale intensity image of the sample, where the bright spots are GNRs. As the background pixels with fewer photon counts imply less useful information, they can be neglected during the analysis. In this case, a threshold (100 photon counts) was considered to neglect these pixels. As conventional data readout from TCSPC systems is pixel by pixel, accumulated histograms can be directly fed into the ELM without data conversion. The biological sample should be illuminated with a long acquisition time to achieve a high SNR to obtain a reliable reference. However, a long acquisition time can easily lead to photobleaching. The previous study [27] reported that a phasor projection image could alternatively serve as a reference image to identify autofluorescence and gold nanoprobes. Two clusters representing autofluorescence of the cell and gold nanoprobes can be observed in the phasor plot shown in Figure 8b, after we had applied pixel filtering. Cluster 2 contains the majority of pixels with shorter lifetimes depicting gold nanoprobes. A fitted line was obtained by a linear regression fitting algorithm: where a and b are slope and intercept of the fitted line, g n and s n are locations of pixels in the phasor domain. The intersection points A(g a ,s a ) and B(g b ,s 2b ) can be obtained accordingly. As shown in Figure 8c, we employed the pixel-wise phasor score ρ to generate a phasor projection image by computing: where D is the Euclidean distance between A and B, n is the number of filtered pixels. three algorithms. This is because deconvolution was involved in NLSF, causing non-convergent results due to dealing with ultra-short decays caused by gold nanoprobes. As mentioned, BCMM is not robust in varying ranges of photon counts; many pixels are out of the defined range (0 to 2 ns), as the white pixels show in Figure 8g. Nevertheless, BCMM is a fast algorithm that only took 6.53 s to reconstruct the image. The inference time of 1-D CNN on a GPU (NVIDIA GTX 850M) is 116.43 s, whereas ELM only consumed 1.73 s during inference on the CPU.

Low Counts Scenarios
Fragile tissues, such as retinas, cannot be excited by laser for a long time. To avoid tissue damage and photobleaching caused by a long acquisition time, we investigated ELM's performance for data in low-photon scenarios. We kept the experimental setup identical to Section 4.1. To acquire less-emitted photons, we chose the field of view with fewer nanoprobes. Increased scanning cycles were set on the software. As the number of cycles increased, we changed the intensity threshold to guarantee sufficient pixels were saved. The value of the intensity threshold should be fine-tuned according to different bio-samples (5% of total counts in our experiments). Figure 9a,b depict intensity and reconstructed τA images, respectively. The lifetime of cells and nanoprobes can be consistently reconstructed, even if the cycle decreases to 10. Notably, nanoprobes and boundaries of cells cannot be identified in intensity images with 10 and 40 cycles, yet lifetime images can restore the lifetime and reveal cell boundaries. Below each lifetime image in Figure  9b, histograms of pixel occurrence were below τA images, showing means μ and standard deviations σ. There was no distinct shift in μ and σ at different collection cycles, indicating that ELM is robust, even at low counts.  (Figure 8g), the image from NLSF shows obvious bias because, as mentioned, NLSF is sensitive to initial values and fails to converge sometimes. Given that the 1-D CNN [27] achieved high speed and accuracy, we compared ELM and 1-D CNN in terms of τ A using the same training datasets. From Figure 8d,e, ELM is in good agreement with 1D-CNN, and they showed similar distributions of pixel counts, as shown in Figure 8h. However, in Figure 8g, the NLSF's result is significantly more biased than the other three algorithms. This is because deconvolution was involved in NLSF, causing non-convergent results due to dealing with ultra-short decays caused by gold nanoprobes. As mentioned, BCMM is not robust in varying ranges of photon counts; many pixels are out of the defined range (0 to 2 ns), as the white pixels show in Figure 8g. Nevertheless, BCMM is a fast algorithm that only took 6.53 s to reconstruct the image. The inference time of 1-D CNN on a GPU (NVIDIA GTX 850M) is 116.43 s, whereas ELM only consumed 1.73 s during inference on the CPU.

Low Counts Scenarios
Fragile tissues, such as retinas, cannot be excited by laser for a long time. To avoid tissue damage and photobleaching caused by a long acquisition time, we investigated ELM's performance for data in low-photon scenarios. We kept the experimental setup identical to Section 4.1. To acquire less-emitted photons, we chose the field of view with fewer nanoprobes. Increased scanning cycles were set on the software. As the number of cycles increased, we changed the intensity threshold to guarantee sufficient pixels were saved. The value of the intensity threshold should be fine-tuned according to different bio-samples (5% of total counts in our experiments). Figure 9a,b depict intensity and reconstructed τ A images, respectively. The lifetime of cells and nanoprobes can be consistently reconstructed, even if the cycle decreases to 10. Notably, nanoprobes and boundaries of cells cannot be identified in intensity images with 10 and 40 cycles, yet lifetime images can restore the lifetime and reveal cell boundaries. Below each lifetime image in Figure 9b, histograms of pixel occurrence were below τ A images, showing means µ and standard deviations σ. There was no distinct shift in µ and σ at different collection cycles, indicating that ELM is robust, even at low counts.

Conclusions
In summary, we presented an ELM architecture to accurately retrieve fluorescence lifetime parameters from mono-and bi-exponential decays. Both synthetic and realistic experimental FLIM datasets were employed to evaluate the proposed network. Our results show ELM outperforms fitting and non-fitting methods, regarding synthetic datasets at different photon counts. Further, ELM can better identify NRs and cells and yield a comparable result to the 1-D CNN method. Since ELM does not need back-propagation to train the network, it is more flexible to reconfigure the network topology. Due to the potential online training property, it is promising to implement it on embedded hardware in the future, coupling with sensors and readout circuits to achieve fast on-chip training and inference. More FLIM applications relying on gold nanoparticles will benefit from this study for cellular cancer diagnosis.

Conclusions
In summary, we presented an ELM architecture to accurately retrieve fluorescence lifetime parameters from mono-and bi-exponential decays. Both synthetic and realistic experimental FLIM datasets were employed to evaluate the proposed network. Our results show ELM outperforms fitting and non-fitting methods, regarding synthetic datasets at different photon counts. Further, ELM can better identify NRs and cells and yield a comparable result to the 1-D CNN method. Since ELM does not need back-propagation to train the network, it is more flexible to reconfigure the network topology. Due to the potential online training property, it is promising to implement it on embedded hardware in the future, coupling with sensors and readout circuits to achieve fast on-chip training and inference. More FLIM applications relying on gold nanoparticles will benefit from this study for cellular cancer diagnosis.