A Deep Learning Approach for Improving Two-Photon Vascular Imaging Speeds

A potential method for tracking neurovascular disease progression over time in preclinical models is multiphoton fluorescence microscopy (MPM), which can image cerebral vasculature with capillary-level resolution. However, obtaining high-quality, three-dimensional images with traditional point scanning MPM is time-consuming and limits sample sizes for chronic studies. Here, we present a convolutional neural network-based (PSSR Res-U-Net architecture) algorithm for fast upscaling of low-resolution or sparsely sampled images and combine it with a segmentation-less vectorization process for 3D reconstruction and statistical analysis of vascular network structure. In doing so, we also demonstrate that the use of semi-synthetic training data can replace the expensive and arduous process of acquiring low- and high-resolution training pairs without compromising vectorization outcomes, and thus open the possibility of utilizing such approaches for other MPM tasks where collecting training data is challenging. We applied our approach to images with large fields of view from a mouse model and show that our method generalizes across imaging depths, disease states and other differences in neurovasculature. Our pretrained models and lightweight architecture can be used to reduce MPM imaging time by up to fourfold without any changes in underlying hardware, thereby enabling deployability across a range of settings.


Introduction
The neurovascular network transports chemicals (e.g., oxygen, nutrients, waste) to and from the brain to support neuronal activity [1,2]. Neurovascular function is disrupted by disorders such as stroke, Alzheimer's and other neurodegenerative diseases, and diabetes, with lasting effects that are not fully understood. Advances in multiphoton fluorescence microscopy (MPM) have enabled imaging with capillary-level resolution in vivo, and this noninvasive tool could be used to monitor capillary-level changes over time in cerebral vasculature as a potential predictor of disease progression/prognosis [3-6]. A constraint with MPM, however, is the slow acquisition process that is necessary for producing high-quality, three-dimensional images with a traditional point-scanning multiphoton imaging setup. Given the physical limitations of a live animal, the acquisition speed puts a limit on study sizes and the ability to reach statistically significant results. Although previous approaches have sought to improve imaging speeds by incorporating innovative imaging hardware, these implementations come at high cost and complexity and cannot be readily employed in existing infrastructure [7-10].
An alternative, more cost-effective and accessible approach might be to computationally improve the image acquisition process using convolutional neural networks (CNNs), which leverage existing datasets of MPM images. While several recent advances have been made in applying CNNs to improve MPM imaging results [11-16], to our knowledge, only one has focused on improving MPM imaging speed. Guan et al. presented a CNN for improving the imaging speed of a two-photon fiberscope for neuronal imaging using a conditional generative adversarial network (cGAN) [16]. They achieved a 10-fold speedup in frame rate. A drawback to their approach, however, is the requirement for a two-part training set, involving both ex vivo and in vivo imaging, which is extremely expensive and time-consuming. Several other models for general denoising or segmentation for MPM have also focused primarily on neuronal or calcium imaging [11-14], with only one to our knowledge focused on vascular segmentation [16], and none of them is used for improving vascular imaging speeds.
The aim of our work is to demonstrate and validate a novel CNN-based data acquisition pipeline that improves neurovascular imaging speeds by up to fourfold. We use a Res-U-Net CNN trained to take images captured at low resolution (128 × 128 pixels), and therefore at much faster speeds, and upscale them to a higher resolution (512 × 512 pixels) without compromising the accuracy of the extracted vascular morphological information or introducing additional noise. The upscaling process from low to high resolution using deep learning is referred to as image super-resolution. We then combined this with a vectorization pipeline to obtain quantitative statistics of neurovasculature. Our pretrained models and light architecture allow for fast acquisition, image super-resolution, and vectorization of MPM images without the limitations of added hardware and can be used to reduce imaging time by up to fourfold. To our knowledge, there currently do not exist any comparably low-cost and accessible methods for increasing neurovascular MPM imaging speeds. The remainder of this article describes the methods used, followed by the results and a discussion of those results.

Animal Preparation
Cranial window implants were prepared in C57 mice with dura intact, exposing a 4 × 3 mm portion of the skull that was then fixed with a cover glass to restore intracranial pressure as previously described [17]. During imaging, mice were anesthetized with isoflurane and body temperature was maintained at 37.5 °C. Blood plasma was fluorescently labeled with dextran-conjugated Texas red (70 kDa, D1830, Thermo Fisher, Waltham, MA, USA) dissolved in saline (5% w/v). The dye was administered intravenously via retroorbital injection (0.1 mL). All animal protocols were approved by the University of Texas at Austin Institutional Animal Care and Use Committee.
For the stroke model mice, photothrombotic ischemia was induced by retroorbitally injecting rose bengal (0.15 mL at 15 mg/mL) and irradiating a penetrating arteriole branching from the middle cerebral artery for 15 min. The laser source had a 532 nm wavelength and 20 mW average power, and was focused to a ~300 µm-diameter spot. Mice were anesthetized with isoflurane (1.5%, 0.6-0.8 LPM) and body temperature was maintained with a heating pad during the procedure. Pial anatomy was visualized using laser speckle contrast imaging to select which artery to target and to confirm occlusion.

Image Acquisition
All images were acquired using a custom-built two-photon microscope previously described [18]. The excitation source was an ytterbium fiber amplifier with an output beam of 1050 nm wavelength, 120 fs pulse width, and 80 MHz repetition rate [19]. High-resolution images were 512 × 512 pixels and low-resolution images were 128 × 128 pixels, both with a field size of 700 × 700 µm. Image stacks were acquired with 3 µm axial spacing.
A resonant-galvanometer scanning system was used [18], with average pixel dwell times of 87.8 ns and 20-frame averaging. Power at the sample did not exceed 170 mW and was identical between low- and high-resolution pairs. Images with large fields of view were taken as a 2-by-4 grid of standard images, with ~25-30% overlap between tiles.

Image Preprocessing
All images were normalized prior to use as a training image or semi-synthetic test image. A 3D median filter of size [1 1 1] was applied to raw image stacks, followed by a full-scale contrast stretch (FSCS) to fill the 16-bit range with 0.3% saturation across the entire stack, using the normalization function provided by Fiji ImageJ (v.2.35) [20]. This FSCS normalization method produced better images than the two alternatives tested, FSCS across the entire stack without saturation and FSCS by image slice (Supplementary Figure S1). Images were then converted from 16 bit to 8 bit and separated into individual frames.
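The stack-wide contrast stretch and bit-depth conversion described above can be sketched in NumPy as follows. This is an illustrative approximation and not the Fiji routine: the function name `contrast_stretch_stack`, the even split of the 0.3% saturation between the dark and bright tails, and the 16-to-8-bit conversion by dropping the low byte are all assumptions, and the median filter step is omitted.

```python
import numpy as np

def contrast_stretch_stack(stack, saturated_pct=0.3):
    """Stretch a whole 3D stack to the full 16-bit range, then convert to 8 bit."""
    stack = stack.astype(np.float64)
    # Assumption: the saturated fraction is split evenly between both tails,
    # and the limits are computed over the entire stack, not per slice.
    lo, hi = np.percentile(stack, [saturated_pct / 2, 100 - saturated_pct / 2])
    if hi <= lo:
        raise ValueError("stack has no usable intensity range")
    stretched = np.clip((stack - lo) / (hi - lo), 0.0, 1.0)
    as_uint16 = np.round(stretched * 65535).astype(np.uint16)
    # Assumption: 16-bit to 8-bit conversion keeps the high byte only.
    return (as_uint16 >> 8).astype(np.uint8)

rng = np.random.default_rng(0)
demo_stack = rng.integers(100, 4000, size=(4, 64, 64)).astype(np.uint16)
frames = list(contrast_stretch_stack(demo_stack))  # one 2D frame per slice
```

Computing the limits once for the whole stack keeps relative brightness consistent across depths, which is the property the by-slice alternative gives up.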

Stitching
Images with large fields of view acquired as a 2-by-4 grid of standard images were stitched together using ImageJ's Grid/Collection Stitching plugin [21].

Single-Frame Images
To create semi-synthetic low-resolution images for training, preprocessed real-world images received one of the following types of noise: Poisson, Gaussian (µ = 0, σ = 0.1), additive Gaussian (µ = 0, σ = 5), or no noise prior to fourfold downscaling (from 512 × 512 pixels to 128 × 128 pixels). For additive Gaussian noise, the local variance was scaled by 0.001. A range of parameters (i.e., mean, standard deviation, local variance) was tested to identify values for optimal performance. Models were trained for each combination of parameters and given test images. Output images were inspected visually, and image quality metrics (PSNR, SSIM) were calculated.
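A minimal sketch of the additive Gaussian variant of this procedure is given below. It assumes 8-bit input frames and uses 4 × 4 block averaging for the downscaling, which is a stand-in because the resampling method is not stated here; the local-variance scaling is not reproduced, and the function name `make_semisynthetic_lr` is hypothetical.

```python
import numpy as np

def make_semisynthetic_lr(hr_frame, sigma=5.0, factor=4, rng=None):
    """Add zero-mean Gaussian noise to an 8-bit frame, then downscale by `factor`."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = hr_frame.astype(np.float64) + rng.normal(0.0, sigma, hr_frame.shape)
    h, w = noisy.shape
    if h % factor or w % factor:
        raise ValueError("frame size must be divisible by the downscale factor")
    # Assumption: downscaling by block averaging (method not given in the text).
    blocks = noisy.reshape(h // factor, factor, w // factor, factor)
    lr = blocks.mean(axis=(1, 3))
    return np.clip(np.round(lr), 0, 255).astype(np.uint8)

hr = np.full((512, 512), 100, dtype=np.uint8)
lr = make_semisynthetic_lr(hr, rng=np.random.default_rng(0))
```

Noise is added before downscaling, matching the order given in the text, so the noise is partly averaged out in the low-resolution frame.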

Multi-Frame Images
Low-resolution images from the single-frame semi-synthetic image generation step were used to create multi-frame training images. Multi-frame images consisted of five low-resolution images sequential in axial space with 0.3 µm separation (axial distance between acquired images).
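The grouping of sequential slices into multi-frame inputs can be sketched as a sliding window over the stack. The pairing of each five-slice window with the high-resolution slice at its centre follows the description of the multi-frame model given in the results; the function name and the in-memory stack layout (slices along the first axis) are assumptions.

```python
import numpy as np

def make_multiframe_samples(lr_stack, hr_stack, n_frames=5):
    """Pair each window of consecutive LR slices with the HR slice at its centre."""
    if len(lr_stack) != len(hr_stack):
        raise ValueError("LR and HR stacks must have the same number of slices")
    half = n_frames // 2
    samples = []
    for start in range(len(lr_stack) - n_frames + 1):
        window = lr_stack[start:start + n_frames]   # shape (n_frames, H, W)
        target = hr_stack[start + half]             # centre slice of the window
        samples.append((window, target))
    return samples

lr_stack = np.zeros((8, 128, 128), dtype=np.uint8)
hr_stack = np.stack([np.full((512, 512), z, dtype=np.uint8) for z in range(8)])
samples = make_multiframe_samples(lr_stack, hr_stack)
```

With this windowing, the first and last two slices of a stack never appear as targets; how the edges were handled in practice is not stated.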

Neural Networks and Training
To perform upscaling of the low-resolution images to high-resolution images, we used an architecture called Point Scanning Super Resolution (PSSR), first described in Fang et al. [22]. The PSSR architecture is a ResNet-based U-Net convolutional neural network. The U-Net is in the standard form of encoder-decoder with skip-connections, where the encoder gradually downsizes an input image, followed by the decoder upsampling the image back to its original size. The encoder portion uses a ResNet architecture pretrained on ImageNet. The decoder utilizes learnable subpixel convolutional layers, which are trained for upsampling. A typical issue that occurs with deconvolution going from a low dimension to a higher dimension is uneven overlap, which leads to artifacts that appear in the form of a checkerboard. Instead of dealing with this issue through the use of carefully managed stride lengths, the model utilizes an additional blurring kernel to remove checkerboard artifacts. This blurring is achieved through the use of an interpolation kernel with a zero-order hold with the scaling factor after each upsampling layer (Figure 1). To train the model, we used a mean squared error (MSE) loss function after determining that L1 and feature loss did not perform as well (Supplementary Figure S2). A learning rate of 9 × 10⁻⁴ was used for single-frame training and 1 × 10⁻⁴ was used for multi-frame training.
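The rearrangement step at the heart of a subpixel convolutional layer can be illustrated in NumPy. The sketch below is not taken from the PSSR code; it assumes the channel ordering used by the common pixel-shuffle convention, in which a group of r² channels is interleaved into an r × r block of output pixels. The convolution that precedes it and the blurring that follows it are not shown.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) feature map into (C, H*r, W*r)."""
    c_r2, h, w = x.shape
    if c_r2 % (r * r):
        raise ValueError("channel count must be divisible by r*r")
    c = c_r2 // (r * r)
    x = x.reshape(c, r, r, h, w)       # split channels into (c, i, j)
    x = x.transpose(0, 3, 1, 4, 2)     # reorder to (c, h, i, w, j)
    return x.reshape(c, h * r, w * r)  # out[c, h*r + i, w*r + j]

features = np.arange(4 * 2 * 2).reshape(4, 2, 2)
upsampled = pixel_shuffle(features, 2)
```

Because every output pixel comes from exactly one input channel, the upsampling weights are learned in the preceding convolution, and the overlap pattern that causes checkerboard artifacts in strided deconvolution does not arise from the rearrangement itself.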

Training/Test Images
Preliminary training for finding the best noise model was undertaken with 3399 training (data from 5 mice, 16 stacks, 6 imaging sessions) and 676 validation image pairs (2 mice, 3 stacks, 2 imaging sessions). Final full-dataset training was completed using 24,069 training (6 mice, 114 stacks, 19 imaging sessions) and 4494 validation (6 mice, 22 stacks, 7 imaging sessions) image pairs. Real acquired image pairs (677 image pairs from 2 mice, 3 stacks, 2 imaging sessions) were used for testing and evaluating the models. These image pairs were also used for comparing the performance between training with real acquired pairs vs. semi-synthetic pairs (234 image pairs for training, 221 image pairs for validation, 222 image pairs for testing; each set from 1 mouse, 1 stack, 1 imaging session).

Hardware
Training was performed using Frontera at the Texas Advanced Computing Center (TACC) with four NVIDIA Quadro RTX 5000 GPUs using the CUDA version 10.0 toolkit.

Image Quality Evaluation
Image quality between upscaled images and the original image was preliminarily assessed with peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM). PSNR measures differences between images at the pixel level. The SSIM computes image similarity in terms of contrast, luminance, and structure [23]. Both metrics were computed using built-in MATLAB functions:
$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{R^2}{\mathrm{MSE}}\right)$$
where R is the peak signal value and MSE is the mean square error between the two images, and
$$\mathrm{SSIM} = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$
where for images x and y, µ_x and µ_y are the means, σ_x and σ_y are the standard deviations, σ_xy is the cross-variance, and C_1 and C_2 are constants. In combination, these metrics gave a general sense of image similarity, but were not indicators of morphological accuracy from vectorization. Generally, higher PSNR and SSIM values are desired. However, both metrics equate similarity in raw intensity values with similarity between images, whereas vectorization, the end goal of image acquisition for vascular networks, produces quantitative anatomical information that may or may not be of interest to a particular researcher. For simplicity, when possible, we chose image segmentation accuracy (with respect to the ground truth) to measure general vectorization performance. In the case of unknown ground truth, SLAVV was used to segment the original image and estimate the CNR, which was used to match the quality of the simulated and real acquired images. PSNR and SSIM values do not seem to fully reflect image quality improvement from denoising, as seen in Figure 2b, where image quality improves visually with the increased training data but PSNR and SSIM values decrease slightly. In addition, higher variations in predicted pixel intensity value are seen within white vessel regions, which can cause lower PSNR and SSIM values despite not affecting visual quality or vectorization performance. This especially affects images closer to the surface of the brain, where large arteries dominate the image, and accounts for the outlier points seen in the boxplots of Figures 2 and 3.
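For readers without MATLAB, the two metrics can be sketched in NumPy as below. The PSNR function follows the standard definition from a peak value and the mean squared error. The SSIM function is a deliberate simplification: it evaluates the formula once over the whole image, whereas typical implementations average it over local windows, so its values will not match those reported here. The constants `k1 = 0.01` and `k2 = 0.03` are the conventional defaults and are an assumption about the settings used.

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    diff = reference.astype(np.float64) - test.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))

def global_ssim(x, y, peak=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM over the whole image (simplified; no local windows)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1, c2 = (k1 * peak) ** 2, (k2 * peak) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(num / den)

flat = np.zeros((8, 8), dtype=np.uint8)
shifted = flat + 10  # every pixel differs by 10, so MSE = 100
textured = np.random.default_rng(0).integers(0, 256, size=(32, 32)).astype(np.uint8)
```

Both functions depend only on intensity statistics, which is why they can move in the opposite direction from vectorization accuracy.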

Vectorization
All vectorization was performed using SLAVV software (v.2.02) [24]. Real acquired low-resolution images were upscaled using PSSR, then vectorized and manually curated. To obtain an objective comparison of methods without manual curation bias, a simulated image with known ground truth was created from the manually curated high-resolution image. The simulated image had an identical CNR (0.94) to the original acquired high-resolution image, as measured by SLAVV from the average foreground intensity (I_foreground) and the average background intensity (I_background).
A low-resolution image was created using the previously described semi-synthetic image generation method. The low-resolution image was upscaled using bilinear interpolation, the single-frame model, and the multi-frame model. Vectorization of simulated images using automated curation was possible with the known ground truth, as previously described [24].
From the vectorized networks for each upscaled image, the original simulated image, and the ground truth network, cumulative distribution functions (CDFs) were calculated for strand statistics (length, average radius, average z-direction, and inverse tortuosity). Two comparisons were then made for each strand statistic: Pearson's correlation between each upscaled or original simulated image CDF and the ground truth CDF; and a Kolmogorov-Smirnov (K-S) test between each upscaled image and the original simulated image. The Pearson's correlation values compare the performance of each simulated (upscaled or original) image against the ground truth. The original simulated image serves as a baseline for vectorization performance. The K-S test is used to determine whether the performance of each model is significantly different compared to the original simulated image.
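The two comparisons can be sketched for a single strand statistic as follows. The sketch assumes each network is summarized as a 1D array of per-strand values (for example, strand lengths), evaluates both empirical CDFs on a shared grid before correlating them, and returns only the K-S statistic, without a p-value. The function names and the grid size are hypothetical.

```python
import numpy as np

def empirical_cdf(samples, points):
    """Fraction of samples less than or equal to each value in `points`."""
    ordered = np.sort(np.asarray(samples, dtype=np.float64))
    return np.searchsorted(ordered, points, side="right") / ordered.size

def compare_distributions(reference, candidate, n_grid=200):
    """Return (Pearson r between CDFs, two-sample K-S statistic)."""
    pooled = np.concatenate([reference, candidate]).astype(np.float64)
    grid = np.linspace(pooled.min(), pooled.max(), n_grid)
    cdf_ref = empirical_cdf(reference, grid)
    cdf_cand = empirical_cdf(candidate, grid)
    r = float(np.corrcoef(cdf_ref, cdf_cand)[0, 1])
    # The largest gap between two step functions occurs at a sample point.
    gap = np.abs(empirical_cdf(reference, pooled) - empirical_cdf(candidate, pooled))
    return r, float(gap.max())

lengths_a = np.array([1.0, 2.0, 3.0, 4.0])
lengths_b = np.array([10.0, 11.0, 12.0])
r_same, ks_same = compare_distributions(lengths_a, lengths_a)
r_diff, ks_diff = compare_distributions(lengths_a, lengths_b)
```

Correlating CDFs rewards agreement in overall shape, while the K-S statistic reports the single largest discrepancy, so the two can rank methods differently.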
Overall accuracy was calculated for comparison from sensitivity and specificity, defined as follows:
$$\mathrm{sensitivity} = \frac{TP}{TP + FN}, \qquad \mathrm{specificity} = \frac{TN}{TN + FP}$$
where TP is the number of true-positive detections, TN is true negative, FP is false positive, and FN is false negative. TP refers to when the predicted and actual values are both true, TN refers to when the predicted and actual values are both false, FP refers to when the predicted value is true but the actual value is false, and FN refers to when the predicted value is false but the actual value is true [25]. Balanced accuracy is usually defined as 50% sensitivity and 50% specificity. However, in our application, this is not a desirable measure, as we are interested in weighting the vasculature (foreground voxels) significantly more than the background. Foreground voxels occupy about 5-10% of the total volume, so we chose to use a weighted accuracy regime as a measure of the performance of our model.
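The relationship between these quantities can be sketched as below. Sensitivity and specificity follow their standard definitions. The weighted combination is only an illustration of the idea: the weights actually used are not stated here, so `foreground_weight` is a hypothetical parameter, with 0.5 reproducing balanced accuracy.

```python
def sensitivity(tp, fn):
    """Fraction of actual foreground voxels that were detected."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of actual background voxels that were correctly rejected."""
    return tn / (tn + fp)

def weighted_accuracy(tp, tn, fp, fn, foreground_weight=0.5):
    """Weighted mean of sensitivity and specificity.

    Assumption: a linear weighting; foreground_weight=0.5 gives balanced accuracy.
    """
    if not 0.0 <= foreground_weight <= 1.0:
        raise ValueError("foreground_weight must lie in [0, 1]")
    sens = sensitivity(tp, fn)
    spec = specificity(tn, fp)
    return foreground_weight * sens + (1.0 - foreground_weight) * spec

# Toy counts with a sparse foreground (100 of 1100 voxels are vessel).
balanced = weighted_accuracy(tp=80, tn=900, fp=100, fn=20)
vessel_heavy = weighted_accuracy(tp=80, tn=900, fp=100, fn=20, foreground_weight=0.75)
```

With a sparse foreground, plain voxel-wise accuracy is dominated by the background, which is the motivation for weighting the two rates explicitly.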

Statistical Analysis
All significance values were Bonferroni-adjusted from the standard p value of 0.05 to address the increased possibility of type I error.
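As a worked example of the adjustment, the Bonferroni threshold is the nominal significance level divided by the number of comparisons. The thresholds quoted in the results (0.0167 and 0.005) are consistent with 3 and 10 comparisons, respectively, but those counts are inferred from the arithmetic and are an assumption.

```python
def bonferroni_threshold(alpha, n_comparisons):
    """Per-test significance threshold after Bonferroni adjustment."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons

three_way = bonferroni_threshold(0.05, 3)   # approximately 0.0167
ten_way = bonferroni_threshold(0.05, 10)    # 0.005
```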

Structure and Analysis Pipeline Overview
Our pipeline to improve two-photon microscopy acquisition and vectorization accepts individual as well as multiple frames from MPM imaging. Our process is split into two main parts: an image super-resolution CNN designed to upscale low-resolution images and a vectorization pipeline that is designed to output morphological statistics on the acquired images (Figure 1). Low-resolution images (128 × 128 pixels) are acquired using two-photon microscopy. A deep learning (PSSR Res-U-Net)-based upscaling process generates high-resolution images (512 × 512 pixels), which would take much longer to acquire, from low-resolution images. Segmentation-less vascular vectorization (SLAVV) generates 3D renderings and calculates network statistics from an upscaled image stack. For super-resolving the images, we used the PSSR Res-U-Net architecture, which has been shown to restore images of presynaptic vesicles and neuronal mitochondria from a scanning electron microscope (SEM) and a laser scanning confocal microscope, respectively [22]. We utilized this architecture over several other possible ones because it: (a) allowed us to use semi-synthetic data for training, which circumvents the need to acquire real-world image pairs for training, which is difficult and expensive for large datasets; (b) enabled us to also employ multi-frame inputs that could leverage information across correlated images at similar depths; (c) does not utilize an adversarial network in the training, which is more challenging to train as well as to evaluate the generated models; and (d) allowed us to use a transfer learning approach to initialize our model with weights obtained with the architecture trained on ImageNet, a large natural image classification dataset [22]. For vectorization, we used segmentation-less, automated, vascular vectorization (SLAVV) [24], which uses simple models of vascular anatomy, and efficient linear filtering and vector extraction algorithms with manual or automated vector classification. Using a multi-frame PSSR approach and combining it with a vectorization pipeline, we show that we are able to restore two-photon vascular images sufficiently for extracting morphological characteristics through vectorization.

Transfer Learning, Creation and Evaluation of Semi-Synthetic Training Data
Traditional approaches to upscaling images involve acquiring paired high- and low-resolution real-world images for training the model [16,17]. For our task, however, this process is time-consuming, since both low- and high-resolution image pairs must be collected (as opposed to accessing existing or acquiring only high-resolution images that can be used to make semi-synthetic data); expensive, since the additional acquisition time means additional costs for anesthesia and dye injections plus labor hours; and in certain situations impossible for live animals, since animal movement during the transition from low- to high-resolution acquisition can cause image pairs to not perfectly match. This challenge greatly limits the practical sample size of training datasets. To overcome this difficulty, we sought to use semi-synthetic training data that mimic low-resolution acquisition to greatly improve sample size. Semi-synthetic training data were created by adding noise to, then downscaling, full-resolution images from a two-photon vascular image repository of previously acquired images (see Data Availability). We evaluated several approaches for the creation of this semi-synthetic data and compared our results to a model trained with a real-world dataset of the same sample size.
To mimic the noise observed in real acquired low-resolution images (i.e., real data), we tested models trained with semi-synthetic images that were created with the following noise filters: no noise (downscaled only, used as the reference), Poisson noise, Gaussian noise, and additive Gaussian-distributed noise (Figure 2a). Real acquired low-resolution images served as input images to test the model. We evaluated model performance using standard image quality metrics, specifically, by calculating and comparing the peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) of the model output and acquired full-resolution image.
The resulting median PSNR and SSIM values from each model, ranked from highest to lowest for both metrics, were as follows: Gaussian, additive Gaussian, Poisson, no noise (Figure 2b,c). This was determined using Wilcoxon signed-rank tests with p < 0.005 (Bonferroni-adjusted). We noticed that the Gaussian and additive Gaussian models performed similarly, and thus performed further testing to compare the two noise methods using a larger training set consisting of 24,069 semi-synthetic training image pairs, about 7× the preliminary training set of 3399 semi-synthetic image pairs. The test image outputs from the models trained with the larger dataset showed notable qualitative improvements, with fewer false detections, less noise, and smoother vessel shapes. With the larger dataset, the Gaussian model produced a slightly higher median PSNR value but did not produce a median SSIM value that was statistically significantly different from that produced by additive Gaussian (Wilcoxon signed-rank test, p < 0.005). Despite the slightly higher PSNR performance by the Gaussian model, however, a qualitative comparison suggested that the additive Gaussian results had somewhat less noise and higher sensitivity to fainter vessels. Additionally, PSNR only measures similarity in pixel values and does not necessarily predict vectorization performance, which is what we ultimately wish to optimize. To fully validate and compare the performance between the Gaussian- and additive Gaussian-trained models, we performed a final comparison test using the Segmentation-Less Automated Vascular Vectorization (SLAVV) (v2.1.0) software (further described in a later section). We found that the additive Gaussian model output allowed for a more accurate vessel detection overall compared to the Gaussian model output (95.7% vs. 95.6%). Based on these results, we chose to perform subsequent analyses using the model trained with the full dataset of semi-synthetic images created using the additive Gaussian noise method to maximize accurate vessel detection.
To evaluate the effectiveness of using semi-synthetic data in place of real-world training data, we compared the performance of models trained with each method (Figure 2d,e). For this comparison, both models were trained with 234 image pairs, due to the limited availability of acquired image pairs. The output image from the real acquired model appeared blurrier and over-predicted vessel diameters more significantly compared to the semi-synthetic model (Supplementary Figure S1). Nonetheless, the model trained with real-world data had higher median PSNR and SSIM values compared to the model trained with semi-synthetic data (Wilcoxon signed-rank test, p < 0.05), although the values were close (PSNR: 26.9 vs. 26.6; SSIM: 0.492 vs. 0.494). We deem the results similar enough for semi-synthetic training data to be used in place of real-world training data. The use of semi-synthetic data advantageously circumvents complications from imprecise alignment in the acquisition of image pairs, limited availability of existing images (677 pairs), and high material and labor costs for data collection.

Single-Frame vs. Multi-Frame Training
A key issue with low-resolution acquired images is the diminished amount of total signal capture. This can result in noisier images and cause spurious vessels to appear in the vectorization process. A potential method for reducing false detections is providing the model neighboring depth images on a stack, which are highly correlated in signal but not in noise. Thus, we sought to improve the performance of our model by using multi-frame image input.
We compare the performance of the single-frame model to a multi-frame model, with an additional comparison against the traditional bilinear upscaling method, for both semi-synthetic and real-world test images (Figure 3). The traditional bilinear upscaling method offers a baseline performance measure for a non-CNN approach. The multi-frame model is trained with input image stacks consisting of five sequential images in depth, with axial offsets of 0.3 µm, to predict a single output image (the third image in the input sequence). The multi-frame model yields images with higher PSNR and SSIM values than the single-frame model, and both PSSR methods outperform the bilinear upscaling method for both semi-synthetic and real-world test images (Wilcoxon signed-rank test, p < 0.0167, Bonferroni-adjusted). Overall PSNR and SSIM values are higher for semi-synthetic images compared to real-world images, which is unsurprising given the model was trained completely on semi-synthetic images. Nonetheless, the real-world output images from our models show that individual vessels can be resolved, which is much more important for the final vectorization process than the exact pixel values measured by PSNR.

Reconstruction and Stitching of Infarct Images
A major application of two-photon imaging that we aim to make more accessible with our approach is imaging using a large field of view (FOV) of diseased vasculature. Large-FOV imaging with high resolution is a time-consuming process and thus would benefit substantially from the speedup offered by low-resolution imaging. Large-FOV images are achieved by acquiring and then stitching standard-sized tiles together using ImageJ's Grid/Collection Stitching plugin [21]. An example application is acquiring images from a stroke model, which is of interest for studying disease effects on vascular morphology. Even with a sped-up resonant-galvo setup [18] (as opposed to a traditional, slower galvo-galvo setup), the large-FOV 1.18 mm × 2.10 mm × 0.636 mm image stack used to generate the high-resolution stroke model image in Figure 4 required close to an hour of imaging time and did not come close to capturing the entire region affected by the infarct. This issue is further amplified when trying to image an entire cohort of mice in a single day for a longitudinal study of significant sample size. These currently inadequate acquisition speeds limit our ability to collect substantial two-photon image sets of diseased vascular networks and result in the limited availability of images of diseased vasculature that could be used for training data. Thus, the ability to speed up imaging times for large-FOV images using our model, without the need for specialized training data, would allow for the collection of greater volumes of diseased vasculature, such as with stroke studies.
To investigate the feasibility of using our models to drastically reduce imaging times for large-FOV images of diseased vasculature with minimal information loss, we examined the ability of our single-frame and multi-frame models, trained only with images of normal vasculature, to restore a semi-synthetic large-FOV image of an ischemic infarct (four weeks post-stroke) collected in a preliminary study (Figure 4). The differences in morphology between vasculature in the peri-infarct region and normal vasculature are exemplified by the differences between the top half of the full image, which more closely resembles normal vasculature, and the bottom half of the image, which captures the infarct region and the more immediately surrounding vessels. Ischemic infarct vessels appear significantly more parallel to the imaging plane, thus creating image slices with higher vascular area density compared to the more perpendicularly oriented vessels further from the infarct. Despite these morphological differences and having only trained with images of normal vasculature, our models are able to resolve capillaries in the infarct region, as shown in the insets of Figure 4. The multi-frame output image more closely resembles the HR image than the single-frame image, as vessel radii are more consistent in the multi-frame image. In the case of the LR and bilinear-upscaled images, the individual capillaries in the inset cannot be resolved.

Vectorization
Vectorization is the ultimate step that extracts quantitative information for evaluating the vascular morphology of a network. Therefore, we are interested in comparing different image generation strategies by comparing performance after vectorization. We demonstrate successful vectorization of single- and multi-frame model output images from real acquired low-resolution images using manual curation-assisted SLAVV and visualization with VessMorphoVis [26] (Figure 5a). Additionally, we perform a more objective comparison of our models' performance using a previously described method [24], which uses simulated images from a known ground truth and automated vector classification (no manual assist). Using a known ground truth (derived from the real acquired high-resolution vectorized network shown in Figure 5a), we are able to quantify the sensitivity, specificity, and accuracy of the vectorized upscaled images. We generated the simulated vascular image to have the same contrast-to-noise ratio (CNR) of 0.94 as the real acquired high-resolution image and to be representative of the image quality of a typical image acquired by our two-photon microscope. We created a low-resolution version of the simulated image using the same method for creating semi-synthetic training data and then upscaled it using bilinear interpolation and our single-frame and multi-frame models. We then vectorized these images using fully automated (globally thresholded) SLAVV at peak segmentation performance (measured against the ground truth image). The resulting strand objects are the minimal set of one-dimensional traces that span the entire vascular network.

Figure 5. (a) Vectorized networks visualized with VessMorphoVis [26] for visual comparison between single- and multi-frame results and an acquired high-resolution image. We performed manual curation for this vectorization process. (b) Vectorized image statistics for the automated curation process with known ground truth (simulated from manually curated high-resolution image). CDFs shown for metrics of length, radius, z-direction, and inverse tortuosity for original (OG), simulated original (sOG), bilinear upscaled (BL), and PSSR single- and multi-frame (SF, MF, respectively) images. Pearson's correlation values (r²) were calculated between the original image and each simulated or upscaled image for each metric. (c) Statistics regarding maximum accuracy (%) achieved with vectorization or thresholding and % error in median length and radius for each method.
We plotted cumulative distribution functions (CDFs) for each image for each of the following strand metrics: length, radius, z-direction, and inverse tortuosity (Figure 5b). We included the simulated original image in the analysis as a control for the automated curation process, since the ground truth image was obtained through manual curation.
For each strand metric, we calculated Pearson's correlation (r²) values between the CDFs of the ground truth image and the simulated images. Of all the images, the simulated original image maintained the highest r² value for average strand radius and inverse tortuosity. Our multi-frame model had the highest r² value for strand length, while bilinear and the single-frame model produced the highest r² value for the z-direction. We performed a Kolmogorov-Smirnov (K-S) test to compare the CDFs of each upscaling method against that of the simulated original image. The multi-frame CDFs for strand length, radius, and z-direction, the single-frame CDFs for length and z-direction, and the bilinear CDF for z-direction were not significantly different from those of the simulated original image (Bonferroni-adjusted significance threshold of p < 0.0167). Thus, of the tested upscaling methods, the multi-frame model produced the most statistically comparable strand metrics to the simulated original image.
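The CDF comparison described above can be illustrated with a short sketch in plain Python. This is not the analysis code used in this study: the function names and the toy strand lengths are hypothetical, the K-S decision uses the standard large-sample critical value rather than an exact p-value, and the adjusted level assumes a family-wise level of 0.05 split across the three upscaling methods, which reproduces the 0.0167 threshold quoted above.

```python
import math
from bisect import bisect_right


def ecdf(sample, grid):
    """Empirical CDF of `sample` evaluated at each point of `grid`."""
    ordered = sorted(sample)
    n = len(ordered)
    return [bisect_right(ordered, x) / n for x in grid]


def pearson_r2(a, b):
    """Squared Pearson correlation between two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov * cov / (var_a * var_b)


def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic: largest gap between the two empirical CDFs."""
    grid = sorted(set(sample_a) | set(sample_b))
    cdf_a = ecdf(sample_a, grid)
    cdf_b = ecdf(sample_b, grid)
    return max(abs(p - q) for p, q in zip(cdf_a, cdf_b))


def ks_critical_value(n_a, n_b, alpha):
    """Large-sample approximation of the critical K-S distance at level alpha."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c_alpha * math.sqrt((n_a + n_b) / (n_a * n_b))


# Hypothetical strand lengths (micrometers) for a reference and an upscaled image.
reference = [12.0, 18.5, 25.0, 31.0, 40.5, 52.0, 66.0, 80.0]
upscaled = [13.0, 17.0, 26.5, 30.0, 43.0, 50.0, 70.0, 78.0]

# r² between the two CDFs, sampled on the pooled set of observed values.
grid = sorted(set(reference) | set(upscaled))
r2 = pearson_r2(ecdf(reference, grid), ecdf(upscaled, grid))

# Assumed Bonferroni adjustment: 0.05 divided by three comparisons.
alpha = 0.05 / 3
d_stat = ks_statistic(reference, upscaled)
d_crit = ks_critical_value(len(reference), len(upscaled), alpha)
significantly_different = d_stat > d_crit
```

With samples this small the large-sample critical value is only a rough guide; a statistics package that computes exact two-sample K-S p-values would be the more appropriate choice for real strand data.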
We calculated overall accuracy with respect to the ground truth for each image (Figure 5c). The original simulated image retains the highest vectorization accuracy (96.2%), followed by multi-frame (95.7%), single-frame (95.2%), and bilinear (94.5%). In terms of accuracy with raw image segmentation through intensity thresholding, however, multi-frame performs best (96.0%), followed by single-frame (95.3%), bilinear (94.4%), and the original simulated image (91.9%). We also calculated the percentage error in the median length and median radius, the characteristics that best represent the vessel morphology, between each image and the ground truth values. Multi-frame produced the lowest median length error (6.4%), followed by bilinear (7.2%), single-frame (8.1%), and the original simulated image (8.2%). The bilinear and single-frame images had notably higher median radius errors (40.1% and 41.6%, respectively) compared to the multi-frame and original simulated images (both 26.9%), which was also apparent on visual inspection of the images. These statistics further support that a multi-frame upscaled image produces comparable vectorization results to an original high-resolution image.
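The accuracy figures above rest on a few simple quantities, which can be sketched as follows. This is a minimal illustration under stated assumptions, not the SLAVV implementation: the function names, the flattened toy intensities, and the toy strand lengths are hypothetical, and the threshold sweep simply tries every observed intensity as a global threshold to find peak accuracy against a known binary ground truth.

```python
from statistics import median


def confusion_metrics(predicted, truth):
    """Return (sensitivity, specificity, accuracy) for two binary masks."""
    tp = tn = fp = fn = 0
    for p, t in zip(predicted, truth):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy


def best_global_threshold(intensities, truth):
    """Try each observed intensity as a global threshold; keep the most accurate."""
    best_threshold, best_accuracy = None, -1.0
    for threshold in sorted(set(intensities)):
        predicted = [value >= threshold for value in intensities]
        accuracy = confusion_metrics(predicted, truth)[2]
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = threshold, accuracy
    return best_threshold, best_accuracy


def percent_error_in_median(values, reference_values):
    """Percentage error of the median of `values` relative to the reference median."""
    reference_median = median(reference_values)
    return 100.0 * abs(median(values) - reference_median) / reference_median


# Hypothetical flattened voxel intensities and their ground-truth vessel labels.
intensities = [0.1, 0.4, 0.35, 0.8, 0.9, 0.2]
truth = [0, 0, 1, 1, 1, 0]

best_threshold, best_accuracy = best_global_threshold(intensities, truth)
predicted = [value >= best_threshold for value in intensities]
sensitivity, specificity, accuracy = confusion_metrics(predicted, truth)

# Hypothetical strand lengths from an upscaled image and from the ground truth.
length_error = percent_error_in_median([10.0, 12.0, 14.0], [9.0, 11.0, 13.0])
```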

Discussion
To our knowledge, this is the first time that a deep learning model has been demonstrated to improve imaging speeds for two-photon microscopy by upscaling and denoising low-resolution images of vasculature while retaining accuracy in extracted morphological characteristics. For this application, our model outperforms the traditional, non-CNN bilinear upscaling technique in output image quality (Figures 3 and 4) and vectorization accuracy (Figure 5). The performance of our model also improves notably with increased training data (Figure 2b); therefore, substantial time and material costs are reduced by training the model with semi-synthetic images generated from our database of 28,563 previously acquired two-photon vascular images. Real-world data also introduce further complications by requiring image registration. Not only would this add computational hours, but the image registration process does not produce perfect alignment either, because it is limited to being purely translational and free of interpolation to maintain the original recorded pixel values. Any rotational misalignments would not be accurately correctable.
We speculate that these misalignments in real-world training image pairs caused the overly blurry and enlarged vascular structures seen in the results of preliminary experiments (Figure 2d, Supplementary Figure S3).
Models trained from semi-synthetic images proved capable of restoring low-resolution vascular images and outperformed the standard bilinear interpolation method. For performing segmentation via intensity thresholding (Figure 5c), multi-frame had the highest accuracy of the upscaling methods and significantly outperformed the original standard-resolution image. We hypothesize that this is a result of the denoising that occurs in the PSSR process. Since all upscaled images had higher intensity thresholding accuracy compared to the original image, we further postulate that upscaled images have less noise overall because fewer pixels are physically captured: all pixels created during upscaling are interpolated from neighboring pixel values and thus free of noise from the image acquisition process [18]. For performing vectorization, however, the diminished noise does not offset accuracy losses from the upscaling process. We determined that the multi-frame model yielded the highest accuracy of the three upscaling methods but did not outperform the original standard-resolution image. Nonetheless, the multi-frame image also produced the greatest number of CDFs for strand metrics that were not significantly different from those of a standard-resolution image. We consider these results from the multi-frame model to be within acceptable tolerance for vectorization accuracy and similarity in strand statistics for our previously described purposes of characterizing the structural properties of the vessels of a particular network [27,28].
By acquiring low-resolution images, the imaging time could be reduced by up to half in a two-photon microscope with a resonant-galvo scanning system and by up to fourfold with a galvo-galvo scanning system, because the number of pixels collected along each of the x and y axes is halved. In a resonant-galvo scanning system, the time saving applies only along the galvo axis, because the galvo axis is the slow, rate-limiting factor in the scan time while the resonant axis is very fast [18]; the total time saving is therefore at most one half. In a galvo-galvo system, the one-half time saving applies along both slow galvo axes, resulting in a total time saving of up to fourfold. The reduction in imaging time can have several benefits. For instance, faster imaging reduces the risk of phototoxicity and thermal damage if excitation powers are kept constant [29,30]. Additionally, the amount of time for which the subject is under anesthesia is reduced, decreasing the risk of vascular dilation [31], which can skew vectorization statistics. The reduction in imaging time can also decrease the injection volume and frequency of fluorescent dye, which alone can save up to hundreds of dollars in addition to eliminating the risk of sample misalignment caused by a reinjection during an imaging session. A specific but major benefit for those wishing to conduct chronic studies is that the faster acquisition times will allow for larger cohorts, which are currently constrained by the number of animals that can reasonably be imaged within each timepoint. This would yield sample sizes with greater statistical power for studying and comparing healthy and diseased vasculatures over time.
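The scan-time argument above reduces to simple arithmetic, sketched below. This is an idealized upper bound under stated assumptions: the function name, the scanner labels, and the 60-minute acquisition time are hypothetical, and real systems lose some of the gain to flyback, mirror settling, and stage movement between slices and tiles.

```python
def max_speedup(downsample_factor, scanner):
    """Idealized upper bound on the frame-time speedup from collecting fewer pixels."""
    if downsample_factor < 1:
        raise ValueError("downsample_factor must be at least 1")
    if scanner == "resonant-galvo":
        # Only the slow galvo axis limits scan time, so only one axis contributes.
        return downsample_factor
    if scanner == "galvo-galvo":
        # Both axes are slow, so the savings compound across x and y.
        return downsample_factor ** 2
    raise ValueError(f"unknown scanner type: {scanner}")


# Hypothetical acquisition time for one full-resolution image stack.
full_resolution_minutes = 60.0

for scanner in ("resonant-galvo", "galvo-galvo"):
    speedup = max_speedup(2, scanner)  # pixels per axis reduced to half
    print(scanner, speedup, full_resolution_minutes / speedup)
```

Halving the pixel count per axis therefore gives at most a twofold gain on a resonant-galvo system and at most a fourfold gain on a galvo-galvo system, matching the figures quoted in the text.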
With the potential for future disease studies in mind, we tested our model on data from a mouse that was given a stroke. We show that our model, despite having only been trained with images of healthy vasculature, can reasonably restore images taken from a peri-infarct region with sufficient resolution for the image stitching algorithm to successfully create a large FOV image. These results from semi-synthetic test data are promising in terms of being able to apply the model broadly to different disease models, although further validation should be performed with real-world test data.
Another area of research that could benefit from increased imaging speeds is the study of light propagation through the brain for the development of noninvasive brain imaging devices. With access to a larger database of large-FOV two-photon images, light propagation models can be more thoroughly developed, tested, and refined [32]. As precise capillary capture is not necessary for these models, the use of even lower resolution and faster imaging could be further explored with PSSR.
With decreased acquisition times comes an inevitable decrease in the amount of signal collected. Thus, a limitation of this method is that 100% accuracy cannot be expected. This method is best suited for applications with some tolerance for error in the precise mapping of vasculature, such as studies where bulk morphological statistics are tracked over time. Although losses in accuracy in low-resolution images are inevitable due to less information being captured, the tradeoff between accuracy and speedup can be tuned experimentally to fit the tolerance of a given application.
The potential for further speedup by reducing frame averaging could also be explored. Lower frame averaging leads to higher noise levels, which PSSR can be used to reduce. With higher noise, we would expect increased false-positive and/or false-negative detections, leading to an overall reduction in restoration accuracy. A potential method that can be explored to combat this effect would be to modify the loss function to increase the penalty for false-negative detections with the tradeoff of accepting more noise in the image. Alternatively, to prioritize denoising over high sensitivity, the loss function could be modified to penalize false-positive detections more heavily.
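One way such a modified loss might look is an asymmetric squared error, sketched below in plain Python. This is an illustrative assumption rather than the loss used to train our models: the function name and weights are hypothetical, and treating underestimated pixels as potential false negatives (missed vessel signal) and overestimated pixels as potential false positives (retained noise) is a simplification of detection errors that only arise after vectorization.

```python
def asymmetric_mse(predicted, target, under_weight=1.0, over_weight=1.0):
    """Mean squared error with separate weights for under- and overestimation.

    Pixels where the prediction falls below the target are scaled by
    `under_weight`; all other pixels are scaled by `over_weight`.
    """
    total = 0.0
    for p, t in zip(predicted, target):
        error = p - t
        weight = under_weight if error < 0 else over_weight
        total += weight * error * error
    return total / len(target)


# Hypothetical normalized pixel intensities: one underestimate, one overestimate.
predicted = [0.2, 0.9]
target = [0.5, 0.6]

symmetric = asymmetric_mse(predicted, target)
favor_sensitivity = asymmetric_mse(predicted, target, under_weight=4.0)
favor_denoising = asymmetric_mse(predicted, target, over_weight=4.0)
```

Raising `under_weight` pushes a model toward preserving faint vessel signal at the cost of more residual noise, while raising `over_weight` does the reverse; in a training framework the same idea would be written with tensor operations so that gradients can flow through it.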
Finally, we believe that another area of follow-up would be to extend this approach to examine tissues beyond just neurovasculature, for example, to study renal artery disease, vascular diseases of the heart, or neovascularization of tumors.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/bioengineering11020111/s1, Figure S1: Test image results for models trained & tested with different normalization methods. Sagittal projections of model output image stacks that underwent full scale contrast stretch (FSCS) with respect to the entire stack, per slice, or entire stack with 0.3% saturation prior to input to each respectively trained model as test images. FSCS was done with Fiji ImageJ normalization function [20].

Funding:
V.M.N. was supported by a grant for human brain evolution from the Allen Discovery Center program, a Paul G. Allen Frontiers Group advised program of the Paul G. Allen Family Foundation, as well as a fellowship from the Good Systems for Ethical AI at the University of Texas at Austin. This work was also supported by the National Institutes of Health (NS108484 to A.K.D; 3T32EB007507 and 5T32LM012414 to A.Z.). Objective and mouse images in Figure 1 were adapted from BioRender. GPU and computing support for the project was supported by a director's discretionary fund at the Texas Advanced Computing Cluster.

Institutional Review Board Statement:
The animal study protocol was approved by the Institutional Review Board of Institutional Animal Care and Use Committee (protocol code AUP-2020-00281 approved 12/19/22).

Informed Consent Statement:
The trained network model is freely available at the following link (accessed on 14 January 2024): https://utexas.box.com/v/2p-DL-Upscale.

Figure 1. Structure and analysis pipeline. Low-resolution images (128 × 128 pixels) are acquired using two-photon microscopy. A deep learning (PSSR Res-U-Net)-based upscaling process generates high-resolution images (512 × 512 pixels), which take much longer to acquire, from low-resolution images. Segmentation-less vascular vectorization (SLAVV) generates 3D renderings and calculates network statistics from an upscaled image stack.

Figure 2. Generating and evaluating semi-synthetic training data. (a) Examples of semi-synthetic training images created using different types of added noise prior to downscaling: no noise (downscaling only), Poisson, Gaussian, and additive Gaussian. Acquired low-resolution (LR, 128 × 128 pixels)

Figure 3. Comparison of performance between bilinear upscaling, a single-frame model, and a multi-frame model for semi-synthetic and real acquired test images. All models were trained with 24,069 image pairs. (a) Semi-synthetic test images from bilinear upscaling and models trained using single-

Figure 4. Maximum-intensity projections (x-y) of ischemic infarct images consisting of 2 × 4 tiles with 213 slices (final dimensions 1.18 mm × 2.10 mm × 0.636 mm, pixel dimensions 1.34 µm × 1.36 µm × 3 µm) for a semi-synthetic low-resolution image, bilinear upscaled image, single- and multi-frame output images, and acquired high-resolution image. The black hole in the bottom-left corner represents the infarct itself.

Figure 5. Comparison of vectorization results using different upscaling methods against a ground truth image. (a) Blender rendering of vectorized images using VessMorphoVis [26] for visual comparison between single- and multi-frame results and an acquired high-resolution image. We

Figure S2: Test image results for models trained with different loss functions (single image slice); Figure S3: Vessel diameter comparison for test image outputs from models trained with real training data vs. semi-synthetic training data, vs. a real-acquired image. Approximate vessel diameters are as follows in units of pixels: real training: 13, semi-synthetic training: 10, real acquired: 7.

Author Contributions:
A.Z. conceived of the study. A.Z., S.A.E. and A.T. collected data. A.Z., S.A.M. and V.M.N. analyzed the imaging data and built the pipeline. A.Z. and V.M.N. wrote the manuscript with input from all coauthors. V.M.N. and A.K.D. supervised the study. All authors have read and agreed to the published version of the manuscript.