Review of Fluorescence Lifetime Imaging Microscopy (FLIM) Data Analysis Using Machine Learning

Abstract: Fluorescence lifetime imaging microscopy (FLIM) has emerged as a promising tool for many scientific studies in recent years. However, the utilization of FLIM data requires complex data modeling techniques, such as curve-fitting procedures. These conventional curve-fitting procedures are both computationally intensive and time-consuming. To address this limitation, machine learning (ML), particularly deep learning (DL), can be employed. This review focuses on ML and DL methods for FLIM data analysis. ML and DL strategies for evaluating FLIM data are discussed, consisting of preprocessing, data modeling, and inverse modeling. Additionally, the advantages of the reviewed methods are considered alongside future implications. Furthermore, several


Introduction
In 1845, Sir John F. W. Herschel made the first documented observation of fluorescence [1]. During his experiments, he observed that a quinine solution, such as tonic water, could be excited by UV radiation and emit blue light. Building upon this discovery, Sir George G. Stokes, a British scientist, further investigated fluorescence and noted that the emitted light had a longer wavelength than the UV radiation that initially excited the object [1]. The first applications of fluorophores in biological research were found several decades later, in the early 1900s, when they were used to stain tissues, bacteria, and other pathogens. This allowed scientists to visualize and study specific components within biological samples. In 1911, the first working fluorescence microscope was developed by Oskar Heimstaedt. Later, among many individual contributors, the companies Carl Zeiss and Carl Reichert played a significant role in advancing fluorescence microscopy [1]. In 1929, fluorescence labeling was first introduced by Ellinger and Hirt [2]. These contributions were instrumental in transforming fluorescence microscopy into a powerful tool in many research fields.

In 1988/1989, the first fluorescence lifetime imaging microscopy (FLIM) system based on an ultrafast laser scanning microscope was introduced in Jena, Germany [3]. The introduction of ultrafast lasers, combined with the advent of very fast semiconductor-based detection schemes such as single-photon avalanche diodes (SPADs) and time-correlated single-photon counting (TCSPC), has made FLIM on the picosecond-to-nanosecond timescale a readily available experimental technique. This microscopy technique offers detailed, high-resolution images of cell shape, intracellular concentrations of chemicals, etc., and it has broad applications in biology, chemistry, materials science, and pharmaceutical research [4]. As a result, FLIM has grown in recognition in recent years.

FLIM is often combined with Förster resonance energy transfer (FRET), enabling the investigation of molecular mechanisms, biosensor activities, and protein-protein interactions within live cells [4,5]. FRET is a non-radiative energy transfer process wherein an excited fluorescent molecule, known as the donor, transfers energy to a non-excited molecule called the acceptor [4][5][6]. This energy transfer occurs through dipole-dipole coupling when the emission spectrum of the donor overlaps with the excitation spectrum of the acceptor and the molecules are a small distance apart (within 10 nm) with suitable relative orientations (cf. Figure 1b-e) [6]. FLIM-FRET offers the advantage of generating images with high spatial and temporal resolution. However, FLIM itself records fluorescence decay profiles rather than directly measuring fluorescence lifetimes. Thus, the fluorescence decay rate, which we discuss in the following paragraph, serves as the fundamental principle for FLIM operation. The underlying physical background of FLIM, the measurement techniques, and the data acquisition process are covered in the subsequent paragraphs.
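The link between FRET and lifetime measurements can be made explicit with the standard Förster relations, given here as a brief aside. The symbols are notation assumed for this illustration and do not appear in the text above: $r$ is the donor-acceptor distance, $R_0$ the Förster radius of the pair, $\tau_D$ the donor lifetime without an acceptor, and $\tau_{DA}$ the donor lifetime in the presence of the acceptor.

```latex
E \;=\; \frac{1}{1 + \left(r / R_0\right)^{6}}
  \;=\; 1 - \frac{\tau_{DA}}{\tau_{D}}
```

The sixth-power distance dependence is why the transfer is only efficient at separations of a few nanometers, and the second equality is why a shortened donor lifetime measured by FLIM can serve as a readout of FRET.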
To understand the principles underlying FLIM, it is useful to refer to a Jablonski diagram (cf. Figure 1a), which shows the relevant processes observable in a typical photoexcited system. In Figure 1a, S0 refers to the electronic (singlet) ground state of a fluorophore system. Electrons in this ground state can be excited by absorption of light into various excited states (S1, S2, ...) depending on the wavelength of the incident light. Notably, the electronic states are further divided into various vibrational (and rotational) states. Typically, excitation by visible light populates a vibrationally excited state of one of the excited states Sn (n ≥ 1), as governed by the Franck-Condon principle.
From this excited state, the electrons return to the ground state either by radiative or non-radiative processes. The non-radiative decay is governed by two main processes [7]. First, vibrational relaxation (also called vibrational cooling), by which energy is transferred from vibrationally excited states into kinetic modes of the molecule (or neighboring molecules), causes a transition between vibrational states of the same electronic state and decreases the energy of the system. Second, internal conversion couples different electronic states (e.g., S1 and S0) via their vibrational modes. Here, the system transitions from an electronically excited state into a highly vibrationally excited state of a lower electronic state without loss of energy, i.e., via a horizontal transition. Both of these non-radiative processes are usually very fast, happening in the femtosecond-to-picosecond range.
Radiative decay is typically dominated by fluorescence. Here, the excited state is deactivated by emitting electromagnetic radiation in the form of a photon with a wavelength corresponding to the energy difference of the two states. However, due to the above-mentioned speed of the non-radiative deactivation of the vibrationally excited state, this emission almost always happens from the vibrational ground state of the first electronically excited state (S1), an empirical observation known as Kasha's rule [8]. Furthermore, due to the Franck-Condon principle, the deactivation typically ends in a vibrationally excited state of S0. Both of these effects cause the wavelength of the emitted photon to be red-shifted compared to the excitation light.
The easiest parameters to characterize the fluorescence process are the wavelength of the emitted photon, corresponding to the energy difference between the states, and the intensity of the emitted light (directly related to the number of photons), which is governed by the absorption coefficient as well as the quantum efficiency of the fluorophore. The quantum efficiency, in turn, is directly related to the ratio of the radiative to non-radiative decay processes described above. However, in more complicated systems, these parameters are often not sufficient to discriminate between similar fluorophores, as their spectral profiles can overlap.
Here, FLIM can offer many advantages over intensity-based fluorescence techniques. Measuring the lifetime of a fluorophore characterizes the non-radiative processes, because the timescale of vibrational cooling to the vibrational ground state of S1 and the efficiency of internal conversion between S1 and S0 directly influence the lifetime of the fluorophore [4]. As these processes are extremely sensitive even to slight changes in the chemical composition or the environment, the fluorescence lifetime can be used to discriminate between highly similar structures.
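The dependence of the lifetime on the competing decay channels can be written compactly with the textbook rate picture. The rate constants $k_r$ (radiative) and $k_{nr}$ (sum of all non-radiative channels) and the symbol $\Phi$ for the quantum efficiency are notation assumed here for illustration; they are not used elsewhere in this review.

```latex
\tau = \frac{1}{k_r + k_{nr}}, \qquad
\Phi = \frac{k_r}{k_r + k_{nr}} = k_r \, \tau
```

In this picture, any environmental change that opens or accelerates a non-radiative channel increases $k_{nr}$ and therefore shortens $\tau$, even when the emission spectrum is essentially unchanged.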
For example, nicotinamide adenine dinucleotide (NAD(P)H) and flavin adenine dinucleotide (FAD) play a vital role in cellular oxidation-reduction reactions [4,9]. These co-enzymes possess autofluorescence properties, enabling non-invasive imaging of the metabolic activity of living cells. However, their spectral properties in distinct cellular environments exhibit considerable similarity. In contrast, FLIM can effectively distinguish between these two fluorophores due to the disparity in their lifetimes, with NAD(P)H having a shorter lifetime than FAD [9]. See Table 1 for applications of FLIM.

Table 1. Applications of FLIM.

Protein-protein interaction studies: FLIM can be used to investigate protein-protein interactions based on FRET. This helps in understanding dynamic protein complexes and signaling pathways within living cells.

Cellular metabolism analysis: FLIM can monitor the autofluorescence of cellular metabolites like NAD(P)H and FAD. Changes in these metabolites can indicate an alteration in cellular energy production.

Live-cell imaging and biomedical applications: FLIM can be used as a non-invasive live-cell imaging technique, providing information for cellular analysis and on molecular interactions. It can provide useful information in biomedical research.

Super-resolution microscopy: FLIM can be integrated with super-resolution microscopy techniques like STED-FCS to achieve higher-resolution imaging.
J. Exp. Theor. Anal. 2023, 1, FOR PEER REVIEW

Figure 1. (a) Jablonski diagram. A molecule in the S0 level absorbs energy, leading to an electronic excitation to a higher energy level for a short period of time. Through internal conversion and vibrational relaxation, the electron moves to the lowest vibrational level of the excited state. From the S1 electronic state, the electron returns to the ground state in either a radiative or non-radiative way (adapted from [4]).

FLIM can be performed using various imaging modalities, such as laser scanning microscopy (LSM) and wide-field illumination (WFI) microscopy [4,11]. Depending on the excitation-detection technique used, LSM systems are classified as either confocal (CLSM) or multiphoton (MP-LSM) systems. These microscopic techniques provide 3D FLIM data. It is important to note that WFI has the advantages of higher frame rates and less photodamage than LSM [4].
Fluorescence lifetimes are measured either in the time domain (TD-FLIM) or in the frequency domain (FD-FLIM) [4]. TD-FLIM measures the time delay between excitation and emission photons using time-correlated single-photon counting (TCSPC) or time-gated detection, while FD-FLIM analyzes the phase and amplitude of fluorescence signals modulated at different frequencies. Both techniques have their strengths and are suitable for different experimental setups and sample characteristics.
Among all FLIM measurement methods, TCSPC is probably the most often used technique [4,12]. In TCSPC systems, a histogram of photon counts over arrival time is measured. The intensity I(t) is modeled as the convolution of the system's instrument response function (IRF) with a weighted sum of fluorescence decays, each typically described by a simple first-order exponential function, as shown in Equation (1).

$$I(t) = \mathrm{IRF}(t) * \sum_{i} a_i \, e^{-t/\tau_i} \qquad (1)$$
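The forward model described above (an IRF convolved with a weighted sum of exponential decays) can be sketched numerically as follows. This is a minimal illustration rather than the implementation of any particular package: the Gaussian IRF shape, the bin width, and all numerical values are assumptions chosen for the example, and the function names are hypothetical.

```python
import math

def gaussian_irf(n_bins, center, sigma):
    """Hypothetical Gaussian IRF sampled on the histogram bins, normalized to unit area."""
    irf = [math.exp(-0.5 * ((k - center) / sigma) ** 2) for k in range(n_bins)]
    total = sum(irf)
    return [v / total for v in irf]

def multi_exp_decay(n_bins, bin_width, amplitudes, lifetimes):
    """Weighted sum of first-order exponential decays, sampled at the bin starts."""
    return [
        sum(a * math.exp(-(k * bin_width) / tau) for a, tau in zip(amplitudes, lifetimes))
        for k in range(n_bins)
    ]

def convolve_causal(irf, decay):
    """Discrete causal convolution, truncated to the measurement window."""
    return [sum(irf[j] * decay[k - j] for j in range(k + 1)) for k in range(len(decay))]

# Illustrative values: 256 bins of 50 ps, two components with 0.5 ns and 2.5 ns lifetimes.
n_bins, bin_width = 256, 0.05
irf = gaussian_irf(n_bins, center=20, sigma=2.0)
decay = multi_exp_decay(n_bins, bin_width, amplitudes=[0.7, 0.3], lifetimes=[0.5, 2.5])
model = convolve_causal(irf, decay)

# The convolved trace peaks slightly after the IRF center because of the decay tail.
peak_bin = max(range(n_bins), key=model.__getitem__)
```

A measured histogram would additionally contain photon-counting noise and possibly a constant background, both of which are left out here.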
To utilize FLIM data, the lifetimes ($\tau_i$) and abundances ($a_i$) need to be extracted from the measured decay traces, where the lifetimes represent the different types of molecules and the abundances reflect the molecular concentrations. There are a few traditional methods for this purpose: curve fitting, the phasor approach, and deconvolution methods [4,13]. The curve-fitting method fits the photon histograms with Equation (1), using algorithms such as the Levenberg-Marquardt algorithm [14] or maximum likelihood estimation [15][16][17]. These algorithms yield parameters (lifetimes, abundances, offset, etc.) that are 'optimal' according to the chosen error function, but they require prior knowledge of the background fluorescence, the number of fluorophores, and the offset to converge to the correct values. Curve fitting is highly sensitive to the signal-to-noise ratio, which increases with the photon count, so a higher photon number usually increases the fitting accuracy. There are two strategies for curve fitting: local fitting and global fitting. In local fitting, the decay trace of each pixel is fitted with its own lifetime parameters. In global fitting, the decay traces of all pixels are fitted at once, under the assumption that the same fluorophores are present in every pixel and have the same lifetimes [18]. The low implementation complexity is one of the main advantages of fitting methods.
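To make the fitting idea concrete, the following sketch estimates a single lifetime by Poisson maximum likelihood for one pixel. It is deliberately simplified and is not the Levenberg-Marquardt routine of any package mentioned here: it assumes a mono-exponential decay, no IRF, and no offset, it profiles out the amplitude in closed form, and it uses a golden-section search over the lifetime. All function names and numerical values are hypothetical.

```python
import math

def profile_log_likelihood(tau, times, counts):
    """Poisson log-likelihood (up to a constant) with the amplitude profiled out."""
    basis = [math.exp(-t / tau) for t in times]
    amplitude = sum(counts) / sum(basis)  # closed-form maximizer for a fixed tau
    return sum(n * math.log(amplitude * b) - amplitude * b for n, b in zip(counts, basis))

def fit_lifetime_mle(times, counts, tau_min=0.05, tau_max=20.0, tol=1e-6):
    """Golden-section search for the lifetime that maximizes the profile likelihood."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = tau_min, tau_max
    while hi - lo > tol:
        a = hi - inv_phi * (hi - lo)
        b = lo + inv_phi * (hi - lo)
        if profile_log_likelihood(a, times, counts) > profile_log_likelihood(b, times, counts):
            hi = b  # the maximum lies in [lo, b]
        else:
            lo = a  # the maximum lies in [a, hi]
    return 0.5 * (lo + hi)

# Noise-free expected counts for a 2.5 ns decay, sampled in 256 bins of 50 ps.
times = [k * 0.05 for k in range(256)]
counts = [1000.0 * math.exp(-t / 2.5) for t in times]
tau_hat = fit_lifetime_mle(times, counts)
```

Applying such a fit independently to every pixel corresponds to local fitting. A global fit would instead share the lifetime across pixels and keep only the amplitudes pixel-specific.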
Another popular approach for analyzing lifetimes without fitting is the phasor approach [6,19]. It provides a 2D graphical view of lifetime distributions. A phasor diagram is derived from the TCSPC data using a Fourier transform, and each pixel in the image corresponds to a point in the phasor diagram, as shown in Figure 2. The phasor space is spanned by the two phasor coordinates G and S.
The time-domain phasor plot is defined by Equations (2) and (3).

$$g_i(\omega) = \frac{\int_0^{\infty} I(t)\cos(\omega t)\,dt}{\int_0^{\infty} I(t)\,dt} \qquad (2)$$

$$s_i(\omega) = \frac{\int_0^{\infty} I(t)\sin(\omega t)\,dt}{\int_0^{\infty} I(t)\,dt} \qquad (3)$$

In these equations, $g_i$ and $s_i$ are the coordinates along the horizontal (G) and vertical (S) axes, $\omega$ is the modulation frequency, and $I(t)$ is the TCSPC data at the $i$th pixel.
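The time-domain phasor transform described above can be sketched for a single pixel as follows. This is an illustrative discretization, not the code of any cited package: the integrals are replaced by sums over histogram bins evaluated at the bin centers, and the 80 MHz repetition rate, the bin layout, and the function name are assumptions made for the example.

```python
import math

def phasor_coordinates(histogram, bin_width, omega):
    """Discrete phasor transform of one pixel's decay histogram, returning (g, s)."""
    total = sum(histogram)
    if total == 0:
        return 0.0, 0.0  # empty pixel: mapped to the origin here by convention
    g = sum(n * math.cos(omega * (k + 0.5) * bin_width) for k, n in enumerate(histogram)) / total
    s = sum(n * math.sin(omega * (k + 0.5) * bin_width) for k, n in enumerate(histogram)) / total
    return g, s

period = 12.5                    # ns, corresponding to an assumed 80 MHz repetition rate
omega = 2.0 * math.pi / period   # angular frequency in rad/ns
n_bins = 250
bin_width = period / n_bins
tau = 2.5                        # ns, lifetime of the simulated mono-exponential decay
histogram = [1000.0 * math.exp(-((k + 0.5) * bin_width) / tau) for k in range(n_bins)]
g, s = phasor_coordinates(histogram, bin_width, omega)
```

For a mono-exponential decay recorded over exactly one period, the resulting point lies on the universal semicircle, with g close to 1/(1 + (ωτ)²) and s close to ωτ/(1 + (ωτ)²). Mixtures of lifetimes fall inside the semicircle.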

In frequency-domain FLIM measurements, the phasor plot is defined by the following equations:

$$g_{x,y}(\omega) = m_{x,y}\cos(\phi_{x,y}), \qquad s_{x,y}(\omega) = m_{x,y}\sin(\phi_{x,y})$$

Here, $m_{x,y}$ and $\phi_{x,y}$ are the modulation ratio and the phase delay for a particular frequency ($\omega$) at a pixel location $(x, y)$. The average lifetime at the $i$th pixel is obtained from the ratio of $s_i$ to $g_i$, scaled by $1/\omega$.
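The frequency-domain relations above can be checked with a few lines of code. The sketch below converts a modulation ratio and phase delay into phasor coordinates and recovers the phase lifetime from the ratio s/g divided by ω. The function names and the 80 MHz modulation frequency are assumptions made for this illustration.

```python
import math

def phasor_from_modulation(m, phi):
    """Map modulation ratio m and phase delay phi (radians) to phasor coordinates (g, s)."""
    return m * math.cos(phi), m * math.sin(phi)

def phase_lifetime(g, s, omega):
    """Phase lifetime (s / g) / omega, defined for g > 0."""
    return s / (g * omega)

# For a mono-exponential decay, m = 1 / sqrt(1 + (omega * tau)^2) and phi = atan(omega * tau).
omega = 2.0 * math.pi * 0.08  # rad/ns for an assumed 80 MHz modulation frequency
tau_true = 2.5                # ns
m = 1.0 / math.sqrt(1.0 + (omega * tau_true) ** 2)
phi = math.atan(omega * tau_true)
g, s = phasor_from_modulation(m, phi)
tau_phase = phase_lifetime(g, s, omega)
```

For a single-exponential decay the recovered phase lifetime equals the true lifetime. For mixtures it is an apparent, averaged value.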
Deconvolution is another common form of decay trace analysis [4,20]. Deconvolution-based methods recover the lifetime decay from the measured fluorescence signal by deconvolving the instrument response function (IRF). The Laguerre polynomial method is the most popular deconvolution method. In this method, the decay traces are represented by a Laguerre polynomial expansion, i.e., a series expansion of the decay and the IRF. The main advantage of the Laguerre polynomial method is that it is more precise than the phasor approach.
These techniques (or combinations of them) are implemented in various software packages, both commercial and freely available (cf. Figure 3). The commercial software SPCImage (all versions) [12,21] (Becker & Hickl) is widely used to estimate the lifetime parameters and is considered the standard method. It implements all of the above-mentioned methods and defaults to iterative decay fitting with mono-, bi-, and tri-exponential decay models. Some free software packages can also be used for lifetime extraction. For example, the Python package FLIMview [22] is based on the principle of curve fitting. FLIMJ, which is based on Fiji, allows for visualizing images and analyzing the fit using various methods [23]. FLIMfit [24] is a MATLAB-based package. See Table 2 for a comparison of these software packages.

Figure 3. (a) FLIMview, where the fitted curve, residuals, and pixel coordinates can be visualized together [22]. (b) Fiji FLIMJ package [23]. (c) FLIMfit package, which is connected with OMERO for image analysis [24].
Despite their many advantages, curve fitting, the phasor method, and deconvolution have several disadvantages. All methods are time-consuming and error-prone. Recently, machine learning (ML), especially deep learning, has received a considerable boost in popularity due to its outstanding performance in this area. This review focuses on the application of ML and DL methods for FLIM data analysis. For this, the review is divided into four sections. The first section discusses the preprocessing of FLIM data. The second section deals with data modeling, followed by inverse modeling in the third section. The last part consists of a summary and an outlook.

Preprocessing
The primary objective of preprocessing is to enhance the quality of images or data to enable proper analysis. In fluorescence microscopy, image preprocessing is necessary due to the limited number of photons captured by the detector, resulting in a weak signal [25,26]. This weak signal leads to fluorescence images exhibiting a significant contribution of Poisson-Gaussian noise [25]. To experimentally obtain improved images, two approaches can be employed. First, increasing the power of the excitation laser increases the emitted photon flux (albeit only up to a certain limit). Second, increasing the exposure time increases the total number of photons captured at a constant photon flux. However, both methods tend to cause photodamage and thus cannot be used for all samples or for unlimited signal increases [25,26]. Therefore, developing algorithms that effectively denoise fluorescence spectroscopy data and give high-quality images is necessary to analyze complex datasets.
A wide variety of denoising and image reconstruction algorithms for fluorescence images are known from the literature.Some recent approaches are listed in Table 3.In the next paragraph, we review several denoising algorithms.

Table 3. Different types of denoising methods and examples.

Designed mainly for Gaussian noise: BM3D [27], NLM [28], KSVD [29]
Designed mainly for Poisson noise and Poisson-Gaussian noise: PURE-LET [30]
Deep-learning-based denoisers: DnCNN [31], CARE [32]

First, a general problem in training and evaluating denoising methods was addressed by Zhang et al. [25]. They addressed the issue of simulating artificial datasets that accurately reflect the properties of real measured fluorescence datasets. They recognized that most datasets are generated using Gaussian noise, which does not accurately represent the Poisson noise typically present in fluorescence images. Therefore, their primary objective was to bridge this gap by creating a Poisson-Gaussian denoising dataset.
To achieve this, the authors developed a dataset called "Fluorescence microscopy denoising (FMD)". The FMD dataset consists of 12,000 real noisy microscopic images obtained from confocal, multiphoton, and wide-field microscopes. The ground truth images were created through image averaging. By using this dataset, they compared the performance of ten traditional denoising algorithms with deep learning methods, demonstrating that deep learning approaches outperformed traditional methods on this dataset with more realistic noise.
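The two ideas in this passage, Poisson-Gaussian noise and ground truth obtained by averaging repeated acquisitions, can be illustrated with a small simulation. This is a toy sketch under stated assumptions and does not reproduce the FMD acquisition protocol: the noise model (Poisson shot noise plus additive Gaussian read noise), the gain and noise level, the number of frames, and all function names are illustrative choices.

```python
import math
import random

def poisson_sample(rng, mean):
    """Knuth's method for a Poisson draw; adequate for the small means used here."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def noisy_frame(rng, clean, gain=1.0, read_sigma=1.5):
    """Poisson shot noise on the photon signal plus additive Gaussian read noise."""
    return [gain * poisson_sample(rng, v) + rng.gauss(0.0, read_sigma) for v in clean]

def average_frames(frames):
    """Pixel-wise mean over repeated acquisitions, used as an estimate of the clean image."""
    n = len(frames)
    return [sum(pixels) / n for pixels in zip(*frames)]

def rmse(a, b):
    """Root-mean-square error between two flattened images."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

rng = random.Random(0)
# Flattened toy image with expected photon counts between 2 and 20 per pixel.
clean = [2.0 + 18.0 * (i % 16) / 15.0 for i in range(256)]
frames = [noisy_frame(rng, clean) for _ in range(50)]
estimate = average_frames(frames)

single_error = rmse(frames[0], clean)
averaged_error = rmse(estimate, clean)
```

Averaging N independent frames reduces the noise standard deviation by roughly a factor of sqrt(N), which is why an averaged image can serve as an approximate ground truth for the individual noisy frames.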
Zhong et al. [26] proposed a two-step generative adversarial network (GAN)-based denoising model called the global noise modeling denoiser (GNMD). In the first step, the GAN-based model was trained using a combination of binary masked images and real images. This trained model could then generate synthetic images by taking a binary mask as input, which combined a synthetic foreground signal (gamma-distributed intensities) with synthetic background noise (global noise generated by Pix2Pix). In the second step, the output from the first step and a clean image were fed into the same network, which was then trained using both clean and noisy images. This training step aimed to enhance the denoising capacity. To validate the denoising capabilities of their model, the authors tested it using real fluorescence images of mitochondria acquired with a wide-field fluorescence microscope. The performance of their model was also compared to three existing denoising methods (PURE-LET, VST-BM3D, Noise2Self [33]), and the GNMD model outperformed the others. In summary, Zhang et al. [25] addressed the need for realistic training data by creating a Poisson-Gaussian denoising dataset (FMD), and Zhong et al. [26] introduced the GNMD model, a two-step GAN-based denoising approach, and demonstrated its effectiveness using real fluorescence images. Both studies highlighted the superiority of deep learning methods over traditional denoising algorithms in the context of fluorescence microscopy denoising.
Another deep-learning-based denoising technique was demonstrated by Mannam et al. [34], who used convolutional neural network (CNN)-based denoising to achieve a high signal-to-noise ratio (SNR). Their paper was divided into two parts: phasor-based denoising techniques were described in the first part, and the segmentation technique in the second. For denoising, they used the Noise2Noise and DnCNN [31] models, which were pretrained with 12,000 fluorescence intensity images. The authors made this denoising technique available as an ImageJ plugin. Additionally, they compared their results with traditional denoising methods (mean or median filter). To demonstrate the efficiency of their network, they argued that traditional denoising filters have to be applied several times to obtain a clear image, whereas the CNN can eliminate the noise in a single pass.
Although the deep-learning-based denoising approach has gained huge popularity in recent years, it still faces certain challenges [35]. First, deep learning algorithms are data-driven and data-hungry. For training a deep network, thousands of pairs of noisy and clean data are needed, and this data acquisition process is laborious and error-prone. Second, supervised learning methods can easily overfit to the training dataset ("memorize the training data"). This can lead the model to predict clean images similar to the training data independent of its input, a phenomenon known as hallucination [36].
To handle the above two problems, Wang et al. [36] described a transfer-learning-based denoising technique. First, they trained a network with a U-net architecture by supervised learning, using generic and synthetic pairs of noisy and clean images. They then transferred the weights to another U-net model and trained this model with a self-supervised training framework, Noise2Self [33], where only noisy data are required for training. Finally, they showed the results for the self-supervised method alone and for the combination of the supervised and self-supervised methods. They used the same FMD [25] dataset, and the comparison of the results is shown in Figure 4.
Although all proposed methods have successfully removed noise from the fluorescence image dataset, some improvements need to be made. For example, these methods need to be tested with various datasets of different tissue samples or other biomedical datasets.

Figure 4. [...] synthetic noisy image". Reprinted with permission from [36] under the terms of the OSA Open Access Publishing Agreement.

Data Modeling
Fluorescence lifetime imaging microscopy (FLIM) has gained significant recognition in the biomedical field due to its label-free nature and high sensitivity. However, the conventional approach to FLIM data analysis usually involves curve-fitting procedures that require manual tuning of parameters for the extraction of the lifetime values, which becomes inefficient when dealing with large amounts of sample data. Moreover, when there are subtle differences among data points, the distribution of fluorescence lifetime values can be very broad or very narrow, making it challenging to manually separate them into multiple classes. Fortunately, the recent popularity of ML and DL has greatly contributed to the advancement of data classification and data analysis in FLIM, as shown in Figure 5.

We divide data modeling into two subparts: classification and segmentation (see Figure 5). Classification and segmentation are related concepts, but they are used for different tasks and can complement each other in various applications. Classification involves assigning a label or category to an entire input based on its characteristics. Segmentation, on the other hand, involves dividing an image into meaningful regions or segments, typically on a pixel basis. Used together, segmentation and classification can provide a more comprehensive understanding of an image. In the following subsections, we discuss the segmentation and classification methods used in FLIM data modeling.

Segmentation
Zhang et al. [37] used K-means clustering to segment lifetime images by using a phasor plot (cf. Figure 6). In the phasor plot, pixels with similar decay phasors were sorted into the same cluster. This feature is useful for segmenting pixels based on the similarity of their fluorescence decays. The lifetime segmentation is thus simplified by using phasors, as the problem is transformed into a classical clustering problem of points in a 2D plane. This method achieved successful segmentation with greater speed than traditional methods. In reference [38], the authors applied Otsu's thresholding-based segmentation method to separate the background pixels from the foreground pixels [39]. They then used a morphological operation to remove the non-cellular region. Here, the authors used segmentation as the initial step for classification, so this work is discussed in detail in the classification part. Similarly, in reference [40], the authors used a simple thresholding-based segmentation to remove the background before training an ML model on the dataset. This work is also discussed in detail in the next section.
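The clustering step described above, grouping pixels by the position of their phasor points, can be sketched with a plain implementation of Lloyd's K-means algorithm. This is a generic illustration and not the code of [37]: the synthetic phasor points, the deterministic initialization, and the function name are assumptions made for the example.

```python
import random

def kmeans_2d(points, k, n_iter=50):
    """Plain Lloyd's algorithm on (g, s) points with a deterministic initialization."""
    # Initial centroids are taken at evenly spaced positions of the g-sorted points
    # (an arbitrary but reproducible choice for this sketch).
    ordered = sorted(points)
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append((sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members)))
            else:
                updated.append(centroids[c])  # keep the centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids

# Two synthetic phasor clouds, roughly matching 2.5 ns and 0.8 ns decays at 80 MHz.
rng = random.Random(1)
centers = [(0.39, 0.49), (0.86, 0.35)]
points = [(rng.gauss(cx, 0.02), rng.gauss(cy, 0.02)) for cx, cy in centers for _ in range(100)]
labels, centroids = kmeans_2d(points, k=2)
```

Because every phasor point belongs to one image pixel, mapping the cluster labels back to the pixel grid directly yields the segmented lifetime image.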
J. Exp.Theor.Anal.2023, 1, FOR PEER REVIEW 10 in a 2D plane.This method achieved success in segmentation with greater speed than traditional methods.In reference [38], the authors applied Otsu s thresholding-based segmentation method to separate the background pixels from the foreground pixels [39].They then used a morphological operation to remove the non-cellular region.This work is discussed in detail in the classification part.Here, the authors used segmentation as the initial step for classification.Also, in reference [40], the authors used a simple thresholding-based segmentation to remove background before training an ML model on the dataset.This work is also discussed in detail in the next section.

Classification

Lung Cancer Classification
Lung cancer ranks among the top causes of cancer-related deaths globally, and the 5-year survival rate after pneumonectomy is less than 14% [41]. Surgery can improve survival rates; however, the success is strongly dependent on the stage of detection. Currently, the gold standard for assessing and diagnosing diseases is hematoxylin and eosin (H&E) stained histopathology, which takes 20-30 min even under ideal conditions with expensive and laborious intraoperative cryosectioning [41].
Nicotinamide adenine dinucleotide (NADH) and flavin adenine dinucleotide (FAD) play a significant role in cellular energy metabolism [41]. Cancer cells have higher rates of glycolysis for rapid cell division compared to normal cells. These metabolic changes are often accompanied by alterations in fluorophores, which can be detected using fluorescence imaging and spectroscopy. Recent studies have demonstrated the potential of autofluorescence imaging and spectroscopy as diagnostic methods for various cancers, including oral, cervical, and breast cancers [41]. However, due to the irregular shape of the tissue, as well as varying concentrations of perturbing absorbers in the tissue, intensity-based fluorescence techniques are challenging to apply. Here, FLIM, which is insensitive to these perturbations, can be applied to clearly differentiate between cancerous and healthy tissue.
Wang et al. [40] aimed to classify healthy and cancerous lung tissue with four different ML methods: K-nearest neighbor (KNN), support vector classifier (SVC), neural network (NN), and random forest (RF). First, almost 20,000 fluorescence image frames (each frame contains one intensity image and the corresponding lifetime image) with a dimension of 128 × 128 px were collected from 10 patients (cancerous and non-cancerous). The measurements were performed with a fiber-based fluorescence lifetime imaging endomicroscope. In the preprocessing phase, the images went through several steps: thresholding, normalization, and Gaussian smoothing. Afterward, a dimension reduction was performed by principal component analysis (PCA) to remove zero values, boundary, and zero lifetime pixels. As a result, the number of features decreased from 16,384 (128 × 128) to 2100. These features were then used to classify the data into two classes with the four ML methods. Among these methods, RF performed the best. Though all these methods successfully distinguished cancer and non-cancer images, some points still need to be improved; e.g., this paper used PCA-based approaches that require flattening the input from 2D to 1D, and the employed ML methods use single-pixel values. This means that correlations between adjacent pixels were lost.
To fill this gap, the authors of [40] extended their work in paper [42], where they used deep learning for classification. The authors collected 70,000 images with a dimension of 128 × 128. These images then went through several preprocessing steps, an important one being a thresholding step that removed all pixel values for which the measured intensity was smaller than the square root of the mean intensity over the whole image. Thereafter, a normalization was applied. For the CNN model, they used different architectures and compared their results. For the performance evaluation, the authors created three types of combined image datasets: lifetime-only images, two-channel images, and three-channel images. In a three-channel image, a pair of intensity and lifetime images is placed into two different channels of an RGB image, leaving the remaining channel with zero values. A two-channel image, on the other hand, refers to a stack in which the intensity and lifetime images are merged. The authors compared the performance of different DL models on these three types of datasets; DenseNet121 achieved the highest accuracy with approximately 86%. They also compared their results with other ML methods, and the CNN results surpassed the ML results in almost every aspect. The authors conclude that using the three-channel dataset gave the best performance.
To improve this CNN performance, another work was performed by Wang et al. [43]. They replaced the bottleneck residual block of ResNet50 with a multiscale concatenated dilation (MSCD) block. The MSCD network reached an accuracy of almost 87%, the highest among the compared CNN methods. Though all these DL architectures, particularly ResNet [44] and DenseNet [45], performed well in classification problems, some general problems remain. For example, ResNet produces some redundant features but is unable to create new features. To handle this problem, Wang et al. [46] proposed a ResNetZ-based lung cancer classification method. They also compared the performance and architecture of their proposed network with ResNet [44] and Res2Net [47]. During the first stage, they collected 100,000 FLIM images from 18 patients. The preprocessing step was the same as in paper [34], and they chose three-channel images for lung cancer discrimination. The architecture of the three different networks is shown in Figure 7. From Figure 7, we can see that the authors replaced a 1D convolution splitting block with a 1D convolution block. Res2Net nevertheless performed best, although the ResNetZ and Res2Net performances were quite similar.
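The thresholding rule and the three-channel layout described above can be sketched as follows. This is a minimal illustration, assuming images stored as nested lists, a max-based normalization, and a channel order of (intensity, lifetime, zeros); none of these details are confirmed by [42], and the function names are hypothetical.

```python
import math

def threshold_by_sqrt_mean(intensity, lifetime):
    """Zero out pixels whose intensity is below sqrt(mean intensity of the image)."""
    flat = [v for row in intensity for v in row]
    cutoff = math.sqrt(sum(flat) / len(flat))
    keep = [[v >= cutoff for v in row] for row in intensity]
    masked_int = [[v if k else 0.0 for v, k in zip(r, kr)]
                  for r, kr in zip(intensity, keep)]
    masked_lt = [[v if k else 0.0 for v, k in zip(r, kr)]
                 for r, kr in zip(lifetime, keep)]
    return masked_int, masked_lt

def normalize(image):
    """Scale an image to [0, 1] by its maximum (assumed normalization scheme)."""
    peak = max(max(row) for row in image)
    if peak == 0:
        return image
    return [[v / peak for v in row] for row in image]

def to_three_channel(intensity, lifetime):
    """Pack intensity and lifetime into two channels; the third stays zero."""
    return [[(i, t, 0.0) for i, t in zip(ri, rt)]
            for ri, rt in zip(intensity, lifetime)]

# Toy 2 x 2 example: mean intensity is 46.5, so the cutoff is about 6.8.
intensity = [[100.0, 4.0], [81.0, 1.0]]
lifetime = [[2.0, 1.0], [3.0, 0.5]]
m_int, m_lt = threshold_by_sqrt_mean(intensity, lifetime)
rgb = to_three_channel(normalize(m_int), normalize(m_lt))
```

In this toy case the two dim pixels are removed from both images, so the network never sees lifetime values that come from background noise.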

Figure 7. Architectures of the three compared networks (ResNet, Res2Net, and ResNetZ). Reprinted from [46] under the terms of the CC-BY 4.0 license.

Skin Cancer Classification
The standard procedure for determining the stage and extent of skin cancer is biopsy [48]. However, due to its lack of specificity, this method's accuracy is greatly reliant on the dermatologist's experience, which could result in misdiagnosis. Recent advancements in biomedical studies could lead to rapid and proper diagnosis, but all these processes are time-consuming. We already discussed how small changes in cell metabolism can be identified by FLIM in the previous section. In this section, we discuss the detection of skin cancer by FLIM.
In paper [48], skin biopsies suspected to be skin cancer were classified into cancerous and normal tissue by machine learning methods. The authors collected FLIM images of 24 normal cases and 138 cancer cases for this study. Before classification, the data were partitioned into two samples: training and testing. Validation was performed by three methods: (1) bootstrapping, (2) the hold-out method, and (3) k-fold cross-validation. Then, four classification methods (RF, KNN, support vector machine (SVM), and linear discriminant analysis (LDA)) were used, and the results were compared. Using the bootstrapping sample partitioning scheme, all tested ML methods achieved accuracies over 80%.
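The three partitioning schemes named above differ only in how sample indices are assigned to training and testing. The sketch below illustrates them on 162 indices (24 + 138 cases, as reported); the test fraction, the number of folds, and the function names are assumptions for illustration and are not taken from [48].

```python
import random

def holdout_split(n, test_fraction, rng):
    """Single random split into training and testing indices."""
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = int(round(n * test_fraction))
    return idx[n_test:], idx[:n_test]

def kfold_splits(n, k, rng):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]

def bootstrap_split(n, rng):
    """Draw n training indices with replacement; unseen samples form the test set."""
    train = [rng.randrange(n) for _ in range(n)]
    chosen = set(train)
    test = [i for i in range(n) if i not in chosen]
    return train, test

rng = random.Random(0)
n_cases = 24 + 138  # normal + cancer cases reported in the study
folds = list(kfold_splits(n_cases, 5, rng))
boot_train, boot_test = bootstrap_split(n_cases, rng)
hold_train, hold_test = holdout_split(n_cases, 0.2, rng)
```

With such an imbalanced dataset, a stratified variant that preserves the class ratio in each partition would usually be preferable; that refinement is omitted here for brevity.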
Chen et al. [49] presented a study on a linear-kernel support vector machine (LSVM) model to distinguish basal cell carcinoma (BCC) from actinic keratosis (AK) and Bowen's disease (BD). The input parameters of the LSVM model consist of lifetime components and lifetime entropies, which were extracted from two-photon fluorescence lifetime imaging of H&E-stained biopsy sections. In constructing the SVM models, features obtained from the lifetime of the second component (τ2) were found to be significantly more predictive than the average fluorescence lifetime (τm) in terms of diagnostic accuracy, sensitivity, and specificity. These findings were confirmed based on the receiver operating characteristic (ROC) curves of the diagnostic models. Furthermore, the results showed that adding Shannon entropy as an independent feature could further improve the diagnostic accuracy. The establishment of the SVM training model involved the extraction of fluorescence lifetime features and the calculation of the information entropy. The fluorescence lifetime data were fitted with triple-exponential decays. The SVM model was optimized by leave-one-out 5-fold cross-validation of the training datasets. Lifetime calculations and fitting were performed using SPCImage software (Becker & Hickl GmbH, Berlin, Germany). From this study, the authors conclude that using the τm feature, the LSVM model effectively distinguished BCC from the precancerous lesions (AK and BD) with a prediction accuracy of 90.4%. However, this model failed to distinguish between the AK and BD subcategories. On the other hand, the LSVM model using the lifetime component τ2 as a training feature achieved better classification performance and was able to classify AK and BD. The prediction accuracy was 95.6% for BCC vs. (AK and BD) and 91.1% for AK vs. BD. The prediction accuracy increased further when entropy was added as a feature.
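To make the entropy feature concrete, the sketch below computes the Shannon entropy H = -Σ p_i log2(p_i) of a histogram of lifetime values. How [49] binned the lifetimes is not stated here, so the bin count, the value range, and the function name `shannon_entropy` are assumptions for illustration only.

```python
import math

def shannon_entropy(values, n_bins, lo, hi):
    """Shannon entropy (in bits) of the histogram of `values` on [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v <= hi:
            # Clamp so that v == hi falls into the last bin.
            b = min(int((v - lo) / width), n_bins - 1)
            counts[b] += 1
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

# A region with one dominant lifetime has low entropy ...
uniform_region = shannon_entropy([2.1, 2.1, 2.2, 2.15], n_bins=4, lo=0.0, hi=4.0)
# ... while a heterogeneous region spreads over many bins.
mixed_region = shannon_entropy([0.5, 1.5, 2.5, 3.5], n_bins=4, lo=0.0, hi=4.0)
```

The entropy therefore summarizes how heterogeneous the lifetime distribution of a region is, which is information that a mean lifetime alone does not carry.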

Cervical Cancer Classification
Cervical cancer is the fourth most common cancer in women, leading to over 300,000 deaths a year [50]. Again, the prognosis is highly dependent on the stage of detection, with early-stage detection significantly improving the outcome. To detect cancer, biopsies are currently performed, which are invasive, painful, and time-consuming procedures. Consequently, there is a need for a non-invasive and highly sensitive screening method. It is known that cell metabolism changes during cancer, and this change can be detected through the activity of a co-enzyme, namely NAD(P)H [51]. We discuss cervical cancer classification using FLIM in this section.
In paper [51], the authors showed cervical tissue classification using FLIM. Lifetime information was extracted with a bi-exponential model using expectation-maximization and the Bayesian information criterion algorithm (EM-BIC). Then, they applied the extreme learning machine (ELM) method to classify cancerous and non-cancerous tissue, and the classification gave more than 80% specificity.
In another work by the same authors [52], tissue was classified into normal and cervical intraepithelial neoplasia (CIN) based on the lifetime information. In the images, epithelium and stroma regions were annotated, and the calculated lifetimes were then used as the feature vector of the ELM method. Here, they used the commercial SPCImage software for lifetime calculation. Before applying the ML method, they used manual segmentation, which was performed by pathologists. Their ELM method, which the authors say has better generalizability than traditional ML methods like SVM and back-propagation (BP), was able to differentiate between the tissue types with accuracies of up to 94%.
The authors of the paper [38] used an unsupervised machine learning method for cancer classification. In total, FLIM images of 71 patient samples (cf. Figure 8) were taken for this study. Some preprocessing steps were applied to increase the classification accuracy. Before classification with k-means, they used a pretrained AlexNet [53] for feature extraction and PCA for dimension reduction. This method gave 90.9% sensitivity and 100% specificity, which is higher than the traditional liquid-based cytology (LBC) method. The authors used three kinds of FLIM images as input: mean lifetime (tm), second abundance component (a2), and tm and a2 together. They also showed a case study on real diagnosis, where tm images gave better accuracy than other image types. They also showed in the case study that FLIM gave a promising result against conventional diagnosis.
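Since sensitivity and specificity are the figures of merit quoted throughout this section, a short reminder of how they follow from a confusion matrix may help. The labels below are made-up toy values, not data from any of the reviewed studies, and the function name is hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, with 1 = cancerous."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    return sensitivity, specificity

# Toy example: 4 cancerous and 6 healthy samples, one error in each group.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

For an unsupervised method such as k-means, the cluster indices first have to be mapped to the two classes before these metrics can be computed.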

Microglia Classification
The authors of [54] studied a FLIM-based artificial neural network (ANN) approach to identify microglia position. They divided their study into two parts. In the first part, they trained the ANN with lifetime parameters (lifetime values), and in the second part, they trained the ANN directly with the exponential decay traces. In the first case, lifetimes were extracted with SPCImage. The resulting lifetime data were split into training, validation, and testing sets in a 70/15/15 ratio. The performance index was the mean square error (MSE). In the second approach, they used histograms with 256 time bins for ANN training. Of the two approaches, lifetime-based classification performed better (almost 40% better sensitivity for one test case), which the authors attribute to the experimental setup used. They suggest that the direct classification of the decay traces could be significantly improved with a more varied training dataset.

Other Classification
Jo et al. [55] demonstrated FLIM-based early-stage detection of oral cancer and dysplasia. In this experiment, the authors collected tissue samples from 73 patients. They developed a computer-aided multispectral FLIM endoscope. First, a deconvolution method was applied to the fluorescence decay traces of each pixel. From the deconvolved decay traces, spectral intensities, normalized fluorescence intensity, and average lifetime were used as features to design a quadratic discriminant analysis (QDA) classifier to discriminate cancerous oral tissue and mild dysplasia. This statistical model achieved 95% accuracy and 87% specificity.
Walsh et al. used a random forest (RF) classifier to monitor the activation of T-cells using NAD(P)H and FAD autofluorescence [56]. Uniform Manifold Approximation and Projection (UMAP) was used for dimension reduction. The authors achieved a classification accuracy of almost 98%. Additionally, the authors of [57] used FLIM images to distinguish the parathyroid gland from other glands. Twenty-one patients underwent parathyroid surgery, and three ML models (NN, RF, and SVM) were used to distinguish parathyroid glands from other glands in the pharynx. The RF model showed the highest sensitivity and specificity. After classification, the Laguerre deconvolution method was used to predict lifetime values. A significant difference was found in the average lifetime of the parathyroid gland compared to other tissues, such as thyroid, adipose tissue, and lymphoid tissue, enabling discrimination among them.

Inverse Modeling
As laid out in the introduction, FLIM measurements are carried out by obtaining fluorescence decay curves by means of TD or FD measurements. To use these data for further analysis, the relevant parameters describing the observed system, i.e., the fluorescence lifetimes, need to be extracted from these curves. However, this is an ill-posed problem [58]. Curve fitting, the traditional extraction technique, is a difficult procedure with numerous software options offered by various businesses and research organizations (as shown in the introduction).
Some authors suggested that lifetime extraction approaches based on machine learning (ML), especially deep learning (DL), can solve this issue. This is based on the assumption that the extraction of lifetime parameters from decay curves is essentially an inverse modeling problem of the measurement procedure (see Table 4 for examples).
There are several software packages to solve general inverse modeling problems. However, the application of these programs to FLIM remains challenging. One challenge is the multi-exponential nature of the decay curves: fluorescence decay in biological samples is often characterized by contributions of multiple fluorophores, which means that multiple lifetimes contribute to the overall fluorescence signal. Choosing an appropriate model or number of decay constants for fitting is therefore difficult. Another challenge is that the low photon counts in FLIM often cause high noise contributions in comparison with the signal of interest, which can affect lifetime parameter estimation. This section focuses on inverse modeling with ML/DL for lifetime feature extraction (see Figure 9) and on the advantages and disadvantages of these methods.
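For reference, the forward model that these inverse methods try to invert is often written as a multi-exponential decay convolved with the instrument response function (IRF). The notation below is a common textbook form and an assumption here, not taken from any single reviewed paper: y(t) is the measured histogram, a_i and τ_i are the abundance and lifetime of component i, n is the number of components, and ε(t) is the noise term (typically Poisson-distributed for photon counting).

```latex
y(t) = \mathrm{IRF}(t) \ast \sum_{i=1}^{n} a_i \, e^{-t/\tau_i} + \varepsilon(t),
\qquad
\tau_m = \frac{\sum_{i=1}^{n} a_i \tau_i}{\sum_{i=1}^{n} a_i}
```

Inverse modeling then means estimating the a_i and τ_i from y(t); the amplitude-weighted mean lifetime τ_m follows from them. Both challenges named above are visible in this form: n has to be chosen in advance, and at low photon counts ε(t) becomes comparable to the signal.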
Wu et al. [59] employed an ANN approach based on a bi-exponential model to estimate lifetime parameters from TCSPC raw data. The objective was to train an ANN model to approximate the function that maps the TCSPC raw data to the unknown lifetime parameters. The ANN model used in the study consisted of two hidden fully connected (FC) layers, taking the time bins and their respective photon counts as input values and outputting estimated lifetimes and the abundance ratio. The results demonstrated that the ANN-based method enabled the estimation of a lifetime image (256 × 256 pixels) in just 0.9 s, which was 180 times faster than the curve-fitting technique based on the least-squares method (LSM). The authors also observed that the LSM method struggled to accurately estimate lifetime parameters due to its sensitivity to initial conditions. The success rate for accurately estimating lifetime parameters from real experimental data was reported as 99.93% for the ANN method, while the LSM method achieved a success rate of 95.93%. This indicates that the ANN method improved the accuracy of lifetime estimation compared to conventional curve-fitting tools.
However, this ANN failed for low photon counts. To overcome this problem, Smith et al. [60] proposed an innovative approach for estimating lifetime parameters from a complete three-dimensional TCSPC dataset using a CNN. Their architecture, called FLI-Net (Fluorescence Lifetime Imaging Network, cf. Figure 10), was trained with synthetic data produced by simulating bi-exponential decays and IRFs for each pixel in the popular MNIST handwritten digits dataset. FLI-Net consists of a shared branch for temporal feature extraction and separate branches for reconstructing lifetime images and fractional amplitudes of short lifetimes. To enable spatially independent feature extraction and capture of each temporal point spread function (TPSF), 3D convolutions (Conv3D) are applied along the temporal dimension for each pixel location. By using a Conv3D layer (with a kernel size of 1 × 1 × 10), unwanted artifacts from neighboring pixels in the spatial dimensions were minimized during both the training and testing phases. Additionally, a residual block (ResBlock) with a reduced kernel length allows for further extraction of temporal information. This network has three output branches. Each branch employs a sequence of convolutions for down-sampling. By comparing the results obtained from FLI-Net with those from the conventional least-squares fitting (LSF) method, Smith et al. [60] found that FLI-Net gave high accuracy and was approximately 30 times faster than SPCImage. However, this method's performance depends on the spatial information of the FLIM data, which means the model needs more datasets to train.
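Training on simulated decays, as described above, requires a generator for noisy bi-exponential traces. The following is a minimal sketch of such a generator and not the simulation pipeline of [60]: the Gaussian IRF, the parameter values (bin width in ns, lifetimes, peak counts), and all function names are assumptions, and the MNIST-based spatial masking is not reproduced.

```python
import math
import random

def biexp_decay(n_bins, bin_width, tau1, tau2, frac1):
    """Noise-free bi-exponential decay sampled at the time-bin positions."""
    return [frac1 * math.exp(-k * bin_width / tau1)
            + (1.0 - frac1) * math.exp(-k * bin_width / tau2)
            for k in range(n_bins)]

def gaussian_irf(n_bins, center, sigma):
    """Gaussian instrument response, normalized to unit area (assumed shape)."""
    irf = [math.exp(-0.5 * ((k - center) / sigma) ** 2) for k in range(n_bins)]
    area = sum(irf)
    return [v / area for v in irf]

def convolve_causal(signal, irf):
    """Discrete convolution truncated to the length of the measurement window."""
    return [sum(irf[j] * signal[k - j] for j in range(k + 1))
            for k in range(len(signal))]

def poisson(mean, rng):
    """Knuth's Poisson sampler; adequate for the small means used here."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_trace(rng, n_bins=256, bin_width=0.05, tau1=0.5, tau2=2.5,
                   frac1=0.4, irf_center=20, irf_sigma=2.0, peak_counts=50):
    """One synthetic TCSPC histogram: IRF-convolved decay with Poisson noise."""
    ideal = convolve_causal(biexp_decay(n_bins, bin_width, tau1, tau2, frac1),
                            gaussian_irf(n_bins, irf_center, irf_sigma))
    scale = peak_counts / max(ideal)
    return [poisson(v * scale, rng) for v in ideal]

rng = random.Random(1)
trace = simulate_trace(rng)
```

Drawing tau1, tau2, frac1, and peak_counts at random for every trace yields a labeled training set of arbitrary size, which is the main practical appeal of training on synthetic data.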
Guo et al. [20] described a new method where lifetimes and abundances were inverse modeled from decay traces through ML. They trained a random forest (RF) model with 3000 artificial traces, which was then tested with experimental data to estimate lifetimes and abundances. The performance of the ML model was verified in two ways. First, the predicted values were compared with their true values for artificial data, which showed good prediction performance. Second, the model performance was verified on real-world data (using SPCImage software for lifetime estimation), and the results roughly matched each other. In summary, ML performed better than the traditional method. The authors also compared their model with FLI-Net [60]. Their RF-based model performs better than FLI-Net in three aspects. First, the RF model works on a pixel basis and thus does not need retraining when spatial dimensions change. Second, the employed Laguerre polynomial approach and a large span of training values make the model highly generalizable. Third, in contrast to FLI-Net, decay traces with more than two components can be analyzed.
Yao et al. [61] proposed a novel optical instrument for compressive macroscopic fluorescence lifetime imaging (MFLI). They demonstrated a method to enhance the resolution of fluorescence lifetime images, which is mainly based on a CNN model called Net-FLICS trained on a modified EMNIST dataset (with simulated decay traces in each pixel). The first segment aims to recover sparsity information from compressive data with a 1D CNN layer. The second segment is responsible for revealing the intensity of images through a 2D convolution layer, and the third segment utilizes 1D convolution layers to reconstruct lifetime images directly from raw data.
Another ANN technique to retrieve lifetimes from raw FLIM data was introduced in [62]. The ANN takes the raw decay trace acquired from a SPAD (single photon avalanche diode) pixel, and, using three hidden layers, it directly outputs the associated lifetime. The model, trained with simulated data, was evaluated in comparison to results obtained from least-squares (LSQ) deconvolution and also tested with real experimental data. The authors showed that the ANN successfully estimates lifetime values from synthetic data as well as real data and is 1000 times faster than LSQ.
In general, most convolutional neural networks are designed to handle 2-, 3-, or multidimensional data. However, these high-dimensional CNNs have a higher number of trainable parameters and thus increase the training and calculation complexity. To overcome this problem, Xiao et al. [63] used fluorescence data in a 1D CNN instead of 2D or 3D CNNs. In this paper, the 1D CNN was divided into two main parts. In the first part, the decay features were extracted, and in the second part, which contains n + 1 branches for an n-exponential decay fit, the lifetimes and abundances of the components were reconstructed. For training and testing, the authors simulated two different datasets, each containing 40,000 decay samples, and performances were compared with the traditional trust-region-reflective algorithm (TRRA) method, FLI-Net, and the DenseNet architecture. This model can resolve multi-exponential decays, and it is 8 times faster than the other CNN models and 300 times faster than TRRA. These are important advantages of using a 1D CNN. The authors also showed that their network successfully estimated fluorescence lifetimes from experimental data.
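The statement that higher-dimensional CNNs carry more trainable parameters can be checked with a short calculation. The sketch below counts the weights and biases of a single convolutional layer; the channel numbers and the kernel length of 3 are arbitrary illustrative values, not the configuration of [63], and `conv_params` is a hypothetical helper.

```python
from math import prod

def conv_params(in_channels, out_channels, kernel_shape, bias=True):
    """Trainable parameters of one convolutional layer.

    Each output channel owns one kernel of size in_channels * prod(kernel_shape)
    plus, optionally, one bias term.
    """
    per_filter = in_channels * prod(kernel_shape) + (1 if bias else 0)
    return out_channels * per_filter

# Same channel counts and kernel length, increasing dimensionality.
params_1d = conv_params(16, 32, (3,))
params_2d = conv_params(16, 32, (3, 3))
params_3d = conv_params(16, 32, (3, 3, 3))
```

For this example the counts are 1568, 4640, and 13,856, so the weight count grows by roughly the kernel length with each added dimension. This is one reason why a 1D network operating on single decay traces can be trained and evaluated faster.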
Xiao et al. [64] proposed a DL-based lifetime image estimation method for few-photon fluorescence lifetime imaging (FPFLI). They showed that traditional methods, like LSF, MLE, Bayesian analysis (BA), or phasor methods, need high photon counts for FLIM analysis, while DL models can deal with this problem. Here, they accelerated training by using a large synthetic dataset. They showed that FPFLI still performed well at low photon counts where MLE failed to estimate lifetimes.
Ochoa et al. [65] proposed NetFLICS-CR, where a compressed ratio (CR) block is added to the Net-FLICS [61] model. This CR block reduces the input dimension. They trained the network with a modified EMNIST dataset (with simulated decay traces for each pixel). Before training, they performed data augmentation by rotation and by combining two different datasets. The mean absolute error (MAE) between the predicted and reconstructed images was used as the evaluation metric. They compared their model performance with TVRecon: the trained Net-FLICS took approximately 2.2 s in total to reconstruct 800 samples, whereas TVRecon took 15 s. Net-FLICS thus performs better than TVRecon.
Overall, the reviewed DL and ML methods can estimate lifetimes from experimental data, usually faster and more accurately than conventional methods.

Discussion and Conclusions
In this comprehensive review, we have summarized recent investigations on the use of machine learning (ML) and deep learning (DL) techniques for fluorescence lifetime imaging microscopy (FLIM) data analysis and modeling. Traditional methods for extracting fluorescence lifetimes from FLIM data can be complex and time-consuming. ML and DL methods have emerged as promising alternatives that offer faster and more accurate extraction of fluorescence lifetimes. Combining different inverse modeling approaches, or combining inverse modeling with forward modeling to take advantage of each process, could be a new direction for FLIM inverse modeling.
Among the ML and DL approaches, convolutional neural network (CNN)-based techniques were prevalent in the majority of the reviewed papers. These techniques have demonstrated their effectiveness in various aspects of FLIM analysis. For instance, CNNs have been successfully applied to denoise and enhance fluorescence microscopy images, thereby improving the quality of fluorescence data. Additionally, novel DL techniques have been developed, such as the Net-FLICS network for fluorescence lifetime imaging using compressive sensing data and the 3D CNN-based FLI-Net network for lifetime extraction.
In this review, we highlighted several areas for further research and development. One key aspect is addressing the challenge of limited data availability. Strategies to overcome this limitation could involve data augmentation techniques or transfer learning approaches.

Figure 1. (a) Illustration of the Jablonski diagram. A molecule in the S0 level absorbs energy, leading to an electronic excitation to a higher energy level for a short period of time. Through internal conversion and vibrational relaxation processes, the electron moves to the lowest vibrational level of the excited state. From the S1 electronic state, the electron returns to the ground state in either a radiative or non-radiative way (adapted from [4]). (b) The emission spectrum of the donor (blue line) must overlap with the excitation spectrum of the acceptor (yellow line). (c) The distance between the donor and the acceptor molecule is important for the FLIM-FRET process. (d) If the distance is larger than the threshold value R0, no FRET occurs. (e) If both molecules are in very close proximity, the donor's energy can be transferred to the acceptor, and the acceptor molecule emits a photon. (b-e) Reprinted with permission from [10] © Leica Microsystems GmbH.

Figure 2. Relationship between a FLIM image and its phasor plot. The phasor distributions are calculated by Fourier transform after data acquisition with TCSPC. Each pixel of the intensity image is converted to a point in the phasor plot. Adapted with permission from [6] © Leica Microsystems GmbH.
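The Fourier transform step described in this caption can be sketched for a single pixel as follows. This is a minimal illustration rather than the vendor's implementation: the function name, the 80 MHz repetition rate, and the 1024 time bins are assumptions, and the instrument response and background are ignored.

```python
import math


def phasor(decay, times_ns, rep_rate_mhz=80.0, harmonic=1):
    """Return the phasor coordinates (g, s) of one decay histogram."""
    # Angular frequency in rad/ns (MHz * 1e-3 gives cycles per ns).
    omega = 2.0 * math.pi * harmonic * rep_rate_mhz * 1e-3
    total = sum(decay)
    if total <= 0:
        raise ValueError("decay must contain photons")
    g = sum(c * math.cos(omega * t) for c, t in zip(decay, times_ns)) / total
    s = sum(c * math.sin(omega * t) for c, t in zip(decay, times_ns)) / total
    return g, s


# Mono-exponential test decay with tau = 2 ns over one 12.5 ns laser period.
n_bins, period_ns, tau_ns = 1024, 12.5, 2.0
times = [i * period_ns / n_bins for i in range(n_bins)]
decay = [math.exp(-t / tau_ns) for t in times]
g, s = phasor(decay, times)
```

For a mono-exponential decay, the resulting point lies close to the universal semicircle, (g - 0.5)^2 + s^2 = 0.25, whereas mixtures of lifetimes fall inside it. This is why pixels with similar decays cluster together in the phasor plot.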

Figure 4. Performance of transfer learning in denoising: (a) Schematic of the transfer learning method. (b) Synthetic noisy images from microtube confocal images and the corresponding denoised images, with transfer learning denoising pre-trained on the FMD dataset compared to self-supervised denoising without pre-training. (c) The denoising performance, in terms of "Mean square error", "Structure similarity index", and "Mean Fourier ring correlation", as a function of the synthetic noisy image. Reprinted with permission from [36] under the terms of the OSA Open Access Publishing Agreement.

Figure 5. Schematic diagram of data modeling: First, true lifetime images are extracted from raw data by conventional methods such as the maximum likelihood method and least-squares fitting. The machine learning model is then used to predict lifetime parameters. Afterward, machine learning is used for classification and segmentation. In segmentation, the region of interest (ROI) is separated from the background; in classification, a hyperplane divides the data into different classes.

Figure 6. Segmentation by phasor plot: (a) two-photon intensity image; (b) fluorescence lifetime image; (c) phasor plot; (d,f) phasor labeling. Similar fluorescence decays have the same features, so they cluster together. Here, the red and blue clusters are selected manually by the user, which labels the corresponding pixels in the image with the respective colors; (e,g) clustered images (each color in (g) represents one cluster). Reprinted with permission from [37] © The Optical Society.

Figure 7. Schematic diagram of the different architectures of ResNet, ResNetZ, and Res2Net. The proposed network and Res2Net have a shortcut connection. Reprinted with permission from [46] under the terms of the CC BY 4.0 license.

Figure 8. Each column shows the FLIM images of cervical cancer from four participants: One row represents the tm value from NAD(P)H, and another row shows the a2 values from NAD(P)H. Lifetime values are shorter in cancerous cells than in normal cells because cancer cells tend to undergo glycolysis rather than oxidative phosphorylation. (a−d) Samples are from cervical cancer patients and (e−h) samples are from a normal patient. Reprinted with permission from [38] under the terms of the CC BY 4.0 license.

Figure 9. Workflow of inverse modeling. First, a machine learning model is trained with artificial FLIM data that mimic real experimental data. To generate the artificial data, the instrument response function (IRF) is convolved with an exponential function. The predicted parameters are compared with the original artificial data. Finally, the machine learning model is tested with real, measured data. Here, the thick line represents the data generation and training process, and the dotted line represents the evaluation process.
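The data generation step in this workflow, convolving an IRF with an exponential function, can be sketched as follows. This is a minimal illustration under stated assumptions: the IRF is taken to be Gaussian, and the function names, bin width, IRF position and width, and lifetime are hypothetical values not taken from the reviewed papers.

```python
import math


def gaussian_irf(times_ns, center_ns, fwhm_ns):
    """Gaussian stand-in for a measured IRF, normalized to unit area."""
    sigma = fwhm_ns / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    irf = [math.exp(-0.5 * ((t - center_ns) / sigma) ** 2) for t in times_ns]
    area = sum(irf)
    return [v / area for v in irf]


def convolve_causal(irf, decay):
    """Discrete convolution truncated to the measurement window."""
    n = len(decay)
    return [sum(irf[j] * decay[i - j] for j in range(i + 1)) for i in range(n)]


bin_ns, n_bins, tau_ns = 0.05, 256, 2.5
times = [i * bin_ns for i in range(n_bins)]
ideal = [math.exp(-t / tau_ns) for t in times]   # ideal mono-exponential
irf = gaussian_irf(times, center_ns=1.0, fwhm_ns=0.2)
artificial = convolve_causal(irf, ideal)         # what a detector would see
```

The convolved curve rises with the IRF and then decays with the chosen lifetime. In a full training pipeline, photon-counting noise would typically be added before the curves are passed to the network.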

Figure 10. (A) The architecture of FLI-Net. The input of FLI-Net is a 3D data cube. (B) MSE vs. epochs for lifetime 1, lifetime 2, and abundance ratio. (C) t-SNE projection from the last activation map. (D) FLI-Net performance in terms of the structural similarity index (SSIM). (E) LSF performance in terms of SSIM. From the plots, it is clear that FLI-Net performs better than LSF. Reprinted with permission from [60] under the terms of the PNAS license.

Table 1. Experimental applications of FLIM.

Table 2. Comparison between different FLIM data analysis software packages.
Omero client: It can load some specific data formats, such as .std, .txt, .tif, and .raw. Fitted parameters can be exported as .csv.

Table 3. Different denoising methods and their examples.

Table 4. Comparison between different inverse modeling methods.